While WireGuard makes perfect sense for an FPGA due to its minimal design, I wonder why there isn't more interest in using QUIC as a modern tunneling protocol, especially for corporate use cases. QUIC already provides an almost complete WireGuard alternative via its datagrams, which can easily be combined with TUN devices and custom authentication schemes (e.g. mTLS, or bearer tokens obtained via OAuth2/OIDC) to build your own VPN. I am not sure about performance compared to kernel-mode WireGuard, since QUIC is a more complex state machine running in userspace, and it depends on the implementation and on the optimizations offered by the OS (e.g. GRO/GSO). Still, QUIC isn't just yet another tunneling protocol; it offers lots of benefits: it works well with dynamic endpoints found via DNS instead of static IP addresses; it uses modern TLS 1.3, which can be configured for FIPS compliance; it uses AES, which the underlying hardware can accelerate (e.g. AES-NI); it has implementations in almost every major programming language; it can work well with proxies and load balancers; you can bring your own, more fine-grained authentication scheme (bearer tokens, mTLS, etc.); it masquerades as ordinary QUIC/HTTP3 traffic, which almost all major websites now use, and is therefore less susceptible to being dropped by nodes in between; and it brings less obvious benefits such as congestion control and PMTUD.
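To make that concrete, here's a minimal sketch of the datagram-plus-TUN idea in Go. It assumes the quic-go datagram API (EnableDatagrams / SendDatagram / ReceiveDatagram) and the songgao/water TUN library; the address and ALPN string are placeholders, and a real VPN would add authentication, MTU clamping, and reconnect logic:

    package main

    import (
        "context"
        "crypto/tls"
        "log"

        "github.com/quic-go/quic-go"
        "github.com/songgao/water"
    )

    func main() {
        // Create a TUN device; the OS routes IP packets into it.
        tun, err := water.New(water.Config{DeviceType: water.TUN})
        if err != nil {
            log.Fatal(err)
        }

        // Dial the (hypothetical) VPN concentrator with QUIC datagrams enabled.
        tlsConf := &tls.Config{NextProtos: []string{"toy-vpn"}} // mTLS / bearer-token auth would hang off this
        conn, err := quic.DialAddr(context.Background(), "vpn.example.com:443",
            tlsConf, &quic.Config{EnableDatagrams: true})
        if err != nil {
            log.Fatal(err)
        }

        // TUN -> QUIC: each IP packet becomes one unreliable DATAGRAM frame (RFC 9221).
        go func() {
            buf := make([]byte, 1500)
            for {
                n, err := tun.Read(buf)
                if err != nil {
                    log.Fatal(err)
                }
                if err := conn.SendDatagram(buf[:n]); err != nil {
                    log.Fatal(err)
                }
            }
        }()

        // QUIC -> TUN: received datagrams are written back out as IP packets.
        for {
            pkt, err := conn.ReceiveDatagram(context.Background())
            if err != nil {
                log.Fatal(err)
            }
            if _, err := tun.Write(pkt); err != nil {
                log.Fatal(err)
            }
        }
    }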
QUIC is a corporate-supported black hole. Corporations are anti-human. It's a wonder that there is still some freedom to make useful protocols on the internet, and that people are nice enough to do that.
Why would anyone want to use a complex kludge like QUIC and be at the mercy of broken TLS libraries, when Wireguard implementations are ~ 5k LOC and easily auditable?
Have all the bugs in OpenSSL over the years taught us nothing?
MASQUE [0] is the protocol for this. Cloudflare already uses MASQUE instead of WireGuard in their Warp VPN.
[0] https://datatracker.ietf.org/wg/masque/about/
I was curious about this and did some digging around for an open-source implementation. This is what I found: https://github.com/iselt/masque-vpn
I've recently spent a bunch of time working on a mesh networking project that employs CONNECT-IP over QUIC [1].
There are a lot of benefits for sure, mTLS being a huge one (particularly when combined with ACME). For general-purpose, hub-and-spoke VPNs, tunneling over QUIC is a no-brainer. Trivial to combine with JWT bearer tokens etc. It's a neat solution that should be used more widely.
However, there are downsides, and they are primarily performance related: partly just poorly optimized library code, partly the relatively high message parsing/framing/coalescing/fragmenting costs, plus userspace UDP overheads. On fat pipes today you'll struggle to get more than a few Gbit/s of throughput at 1500 MTU (which is plenty for internet browsing, for sure).
For fat pipes and hardware/FPGA acceleration use cases, Google probably has the most mature approach here with their datacenter transport PSP [2]: basically a stripped-down, per-flow IPsec. In-kernel IPsec has also gotten a lot faster and more scalable in recent years with multicore/multiqueue support [3]. Internal benchmarking still shows IPsec on Linux absolutely dominating performance benchmarks (throughput and latency).
For the mesh project we ended up pivoting to a custom offload-friendly, kernel-bypass (AF_XDP) dataplane inspired by IPsec/PSP/Geneve.
I'm available for hire btw, if you've got an interesting networking project and need a remote Go/Rust developer (contract/freelance) feel free to reach out!
1. https://www.rfc-editor.org/rfc/rfc9484.html
2. https://cloud.google.com/blog/products/identity-security/ann...
3. https://netdevconf.info/0x17/docs/netdev-0x17-paper54-talk-s...
Is QUIC related to Chrome's WebTransport implementation? Seems pretty cool to have that as an in-browser API.
The purpose of Wireguard is to be simple. The purpose of QUIC is to be compatible with legacy web junk. You don't use the second one unless you need the second one.
QUIC isn't really about the web, it's more of a TCP+TLS replacement on top of UDP. You can build your own custom L7 on top of QUIC.
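For example (a fragment assuming quic-go; the ALPN string and address are made up), a custom L7 is little more than your own ALPN identifier plus whatever you choose to speak on the streams:

    // Sketch: dial with your own ALPN; QUIC gives you encryption,
    // multiplexed reliable streams, and connection migration for free.
    ctx := context.Background()
    conn, err := quic.DialAddr(ctx, "example.net:7878", // made-up address
        &tls.Config{NextProtos: []string{"my-custom-l7"}}, nil)
    if err != nil {
        log.Fatal(err)
    }
    stream, err := conn.OpenStreamSync(ctx) // one of many concurrent streams
    if err != nil {
        log.Fatal(err)
    }
    stream.Write([]byte("HELLO v1\n")) // from here on: your protocol, your rules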
You can build a custom L7 on top of anything, really. I think my favorite was tcp/ip over printers and webcams.
The question is what does QUIC get you that UDP alone does not? I don't know the answer to that. Is it because firewalls understand it better than native wireguard over UDP packets?
Mostly because WireGuard (intentionally) didn't bother with obfuscation https://www.wireguard.com/known-limitations/
> WireGuard does not focus on obfuscation. Obfuscation, rather, should happen at a layer above WireGuard, with WireGuard focused on providing solid crypto with a simple implementation. It is quite possible to plug in various forms of obfuscation, however.
This comment https://news.ycombinator.com/item?id=45562302 goes into a practical example of QUIC being that "layer above WireGuard" which gets plugged in. Once you have that, one may naturally wonder "why not also have an alternative tunnelling protocol with <the additional things built into QUIC originally listed> without the need to also layer Wireguard under it?".
Many of QUIC's design decisions are in direct opposition to WireGuard's design. E.g. WireGuard has no AES and no user-selectable ciphers (both intentionally); QUIC does. WireGuard has no obfuscation built in; QUIC does (plus the happy fact that when you obfuscate traffic this way, it looks like standard web traffic). WireGuard doesn't support custom authentication schemes; QUIC does. Both are reasonable tunneling protocol designs, just with different goals.
I think maybe it's easier for an adversarial network admin to block QUIC altogether.
The hope with QUIC is that encrypted tunnels which look and smell like standard web traffic are first in line among the tunneling methods that get allowed through. It works (surprisingly) a lot more often than hoping an adversarial network/security admin doesn't block known VPN protocols (even when they are put on 443). It also doesn't hurt that "normal" users (unknowingly) try to generate this traffic, so opening a QUIC connection on 443 and getting a failure makes you look like "every other user with a browser" instead of "an interesting note in the log".
I.e. the advantage here is any% + QUIC%, where QUIC% is the additional chances of getting through by looking and smelling like actual web traffic, not a promise of 100%.
Encryption and reliable transport.
You really don't want reliable transport as a feature of the tunnel unless you are _intimately_ familiar with what all of the tunneled traffic is already doing for reliable transport.
The net result of two reliable transports which are unaware of each other is awful: when the outer layer stalls or retransmits, the inner layer's timers fire and it retransmits too - the classic TCP-over-TCP meltdown.
I probably should have clarified that question.
What does QUIC get you that TCP over Wireguard over UDP does not?
Where is DNS on top of QUIC? Asking unironically.
There is, actually. A more interesting re-implementation of a popular L7 is SSH over QUIC. SSH has to implement its own mutual authentication and transport inside the protocol because it runs on top of plaintext TCP; with QUIC you can offload the authentication (e.g. JWT bearer tokens issued by IdPs and verified at L7, or automatic mTLS with x509 certs) and the transport to QUIC, and end up with a much more minimal implementation.
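A rough sketch of what that offload could look like on the server side, assuming quic-go (cert loading is elided and the variable names are placeholders): the TLS layer enforces mutual authentication before the application sees a single byte, which is exactly the part SSH normally hand-rolls:

    // Sketch (fragment): require and verify a client certificate during the
    // QUIC handshake, so the application protocol needs no auth of its own.
    tlsConf := &tls.Config{
        Certificates: []tls.Certificate{serverCert}, // placeholder
        ClientCAs:    clientCAPool,                  // placeholder
        ClientAuth:   tls.RequireAndVerifyClientCert,
        NextProtos:   []string{"ssh-over-quic"},
    }
    ln, err := quic.ListenAddr(":4433", tlsConf, nil)
    if err != nil {
        log.Fatal(err)
    }
    conn, err := ln.Accept(context.Background())
    if err != nil {
        log.Fatal(err)
    }
    // The authenticated peer identity falls out of the handshake; each QUIC
    // stream can then carry what would otherwise be an SSH channel.
    log.Printf("client: %s", conn.ConnectionState().TLS.PeerCertificates[0].Subject)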
We already have it. It is called DNS over HTTP/3 (DoH3).
That's DoQ, RFC 9250.
What legacy junk is QUIC compatible with? It doesn’t include anything HTTP-related at all. It’s just an encrypted transport layer.
It's multi-stream, reliable connections. WireGuard's encryption over UDP is none of those things. WireGuard's encryption is simpler and far more flexible, but also less capable.
Mullvad offers exactly this combination of WireGuard inside QUIC, for obfuscation and to make traffic look like HTTPS: https://mullvad.net/en/blog/introducing-quic-obfuscation-for...
WireGuard-over-QUIC does not make any sense to me; it lowers performance and possibly the inner WireGuard MTU. You can just replace WireGuard with QUIC altogether if all you want is obfuscation.
It's not about performance, of course. It's about looking like HTTPS, being impenetrable to inspection, separating the ad-hoc transport encryption from the WireGuard encryption (which also serves as authentication between endpoints), and avoiding TCP-inside-TCP.
You can get all of that by using QUIC-based tunneling directly, instead of using WireGuard-over-QUIC and stacking two state machines on top of one another.
TCP over WireGuard is two state machines stacked on each other. QUIC over WireGuard is the same thing. Yet both seem to work pretty well.
I think I see your argument, in that it's similar to what sshuttle does to eliminate TCP over TCP through ssh. sshuttle doesn't prevent HOL blocking though.
TCP over WireGuard is unavoidable because that's the whole point of tunneling. But TCP over WireGuard over QUIC just doesn't make any sense, from either a performance or a security perspective. Not to mention that with every additional tunneling layer you have to reduce the MTU (already a very restricted sub-1500 value without tunneling) of all inner tunnels.
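Back-of-envelope MTU arithmetic for the nested case (all per-layer overheads here are assumed typical values - roughly 20 B IPv4, 8 B UDP, 32 B for a WireGuard data message, and ~30 B for a QUIC short-header packet carrying a DATAGRAM frame; real numbers vary with IP version and connection ID length):

    package main

    import "fmt"

    func main() {
        const (
            link = 1500 // physical MTU
            ipv4 = 20   // outer IPv4 header
            udp  = 8    // outer UDP header
            wg   = 32   // WireGuard data-message header + AEAD tag
            quic = 30   // QUIC short header + DATAGRAM frame + tag (rough)
        )
        fmt.Println("inside WireGuard:          ", link-ipv4-udp-wg)      // 1440
        fmt.Println("inside WireGuard-over-QUIC:", link-ipv4-udp-quic-wg) // 1410
    }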
Probably simplifies their clients and backends I'd imagine?
See also Obscura's approach of QUIC bridges to Mullvad as a privacy layer: https://obscura.net/blog/bootstrapping-trust/
I think standards operate according to punctuated equilibrium so the market will only accept one new standard every ten years or so. I could imagine something like PQC causing a shift to QUIC in the future.
Tangentially related: I've experimented with Tailscale and ZeroTier and, though I guess they have different audiences, I prefer ZeroTier for reliability. Tailscale gets borked by existing VPN config, breaking things on local networks. I like both, but does anyone care to share their experiences, or explain in more depth the uses and differences as they see them?
Very cool project - hoping to see follow-up designs that can do more than 1Gbps per port!
I recently built a fully Layer2-transparent 25Gbps+ capable wireguard-based solution for LR fiber links at work based on Debian with COTS Zen4 machines and a purpose-tailored Linux kernel build - I'd be curious to know what an optimized FPGA can do compared to that.
When macsec exists?
No kidding.
Just to elaborate for others, MACsec is a standard (IEEE 802.1AE) and runs at line rate. Something like a Juniper PTX10008 can run it at 400Gbps, and it's just a feature you turn on for the port you'd be using for the link you want to protect anyway (PTXs are routers/switches, not security devices).
If I need to provide encryption on a DCI, I’m at least somewhat likely to have gear that can just do this with vendor support instead of needing to slap together some Linux based solution.
Unless, I suppose, there’s various layer 2 domains you’re stitching together with multiple L2 hops and you don’t control the ones in the middle. In which case I’d just get a different link where that isn’t true.
> When macsec exists?
When you say "exists" ... is there an OpenSource high-quality implementation ?
https://man7.org/linux/man-pages/man8/ip-macsec.8.html
Generally it's used when you have links going between two of your sites, so you typically only need it on the switch or router that terminates that link.
This is a flex!
I can't think of a scenario where this is useful. They claim "Full-throttle, wire-speed hardware implementation of Wireguard VPN" but then go on to implement it on a board with a puny set of four 1 Gbps ports... The standard software implementation of WireGuard (the Linux kernel module) can already saturate Gbps links (wire speed: check) and can even approach 10 Gbps on a mid-range CPU: https://news.ycombinator.com/item?id=42172082
If they had produced a platform with four 10 Gbps ports, then it would become interesting. But the whole hardware and bitstream would have to be redeveloped almost from scratch.
It's an educational project. No need to put it on blast over that. CE/EE students can buy a board for a couple hundred bucks and play around with this to learn.
A hypothetical ASIC implementation would beat a CPU rather soundly on a per watt and per dollar basis, which is why we have hardware acceleration for other protocols on high end network adaptors.
Personally, if I could buy a Wireguard appliance that was decent for the cost, I'd be interested in that. I ran a FreeBSD server in my closet to do similar things back in the day and don't feel the need to futz around with that again.
There's a strong air of grantware to it. The notion that it could be end-to-end auditable from the RTL up is interesting, though. And WireGuard performance generally tanks with a large routing table and small MTUs, as you might suffer on a VPN endpoint server, while this project seems to target line speed even in the absolute worst-case routing-times-packets scenario.
what do you mean by grantware?
The project got a grant from NLnet. I think they do a great job, they gave grants to many nice projects (and also some projects that are going nowhere, but I guess that is all in the game). NLnet really deserves praise for what they are doing!! https://nlnet.nl/thema/NGI0CommonsFund.html
Academic projects which receive grant money to produce papers and slides. This still can advance the state of the art, to be clear, and I like the papers and slides coming out of this project. But I wouldn’t cross my fingers for a working solution anytime soon.
I can see this as a hardened VPN in a mission-critical deployment, which could not be as easily compromised as a software stack.
Why would you even need dedicated hardware for just 40 Gb/s? That is within single-core decryption performance, which should be the bottleneck for any halfway decent transport protocol. Or are we talking 40 Gb/s at minimum packet size, so you need to handle ~60 M packets/s?
Because the entire stack is auditable here. There's no Cisco backdoor, no Intel ME, no hidden malware from a zombie NPM package. It's all your hardware.
IMO it would be cool if they added Wireguard to Corundum but it would be expensive enough that they wouldn't get any hobbyist cred.
If a PC can do 10Gbps, are there any cycles left for other stuff?
bps are easy; packets per second is the crunch. Say you've got 64 bytes per packet, which would be a worst-case scenario: at 10 Gbps that's roughly 15M packets/sec, and at 100 Gbps roughly 150M. Sending one byte after another is the easy bit; the decisions are made per-packet.
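The back-of-envelope math, for anyone who wants to plug in their own numbers (this assumes minimum-size Ethernet frames, which cost 84 bytes of wire time once preamble and inter-frame gap are counted):

    package main

    import "fmt"

    func main() {
        // A minimum 64-byte Ethernet frame occupies 84 bytes on the wire:
        // 64 + 8 (preamble) + 12 (inter-frame gap) = 672 bits.
        const wireBits = (64 + 8 + 12) * 8
        for _, gbps := range []float64{1, 10, 40, 100} {
            fmt.Printf("%5.0f Gbps -> %6.1fM packets/sec\n", gbps, gbps*1e9/wireBits/1e6)
        }
        // Prints ~1.5M, ~14.9M, ~59.5M, ~148.8M packets/sec respectively.
    }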
Amusingly, a lot of people have always been convinced that doing 10 Gbps over a VPN is impossible. I recall a two-year-old post on /r/mikrotik where everyone was telling the OP it was impossible, with citations and sources for why - but then it worked.
https://old.reddit.com/r/mikrotik/comments/112mo4v/is_there_...
Mikrotik's hardware often can't even do line speed beyond basic switching, not to mention VPN, so yeah.
I meant the comments. Sadly I've linked the wrong permalink and confused everyone.
> > > I see. I'll terminate at the Ryzen 7950 box behind the router and see what I get.
> > That will still be a no. Outside of very specialized solutions this level of performance is not available. It is rarely needed in real life anyways. Only a small amount of traffic needs to be protected this way; for everything else, point-to-point protection with ssh or tls is adequate. I studied different router devices and most (ipsec is dominant) have low encryption throughput compared to routing capabilities. I guess that matches market requirements.
> It looks like I can get 8 Gbps with low CPU utilization using one of my x86 machines as terminal. This is pretty good. Don't need 10 G precisely. 8G is enough.
I've done precisely this, easily. I just terminate the WG at a gateway node and switch in Linux. It's trivial, and throughput can easily max the 10G. I had a 40G network behind that on obsolete hardware, providing storage, with lots of machines reading from it.
Reading that thread was eye-opening, since they should have just told him to terminate on the first machine behind the router - which he eventually did, and it predictably worked.
They're discussing Mikrotik hardware specifically? Enterprise stuff or a powerful server can easily do it.
It's going to depend highly on the hardware in use.
This is conceptually interesting but seems quite a ways from a real end to end implementation - a bit of a smell of academic grantware that I hope can reach completion.
Fully available source from RTL up (although the license seems proprietary?) is very interesting from an audit standpoint, and 1G line speed performance, although easily achieved by any recent desktop hardware, is quite respectable in worst case scenarios (large routing table and small frames). The architecture makes sense (software managed handshakes configure a hardware packet pipeline). WireGuard really lacks acceleration in most contexts (newer Intel QAT supposedly can accelerate ChaCha20 but trying to figure out how one might actually make it work is truly mind bending), so it’s a pretty interesting place to do a hardware implementation.
> (although the license seems proprietary?)
Hm, "BSD 3-Clause License" is seems really proprietary to you?
But you are right: do the personal license in many(most?) Verilog files[1] overrules the LICENSE file[2] of a repo?
[1] https://github.com/chili-chips-ba/wireguard-fpga/blob/main/1...
[2] https://github.com/chili-chips-ba/wireguard-fpga/blob/main/L...
"With traditional solutions (such as OpenVPN / IPSec) starting to run out of steam" -- and then zero explanation or evidence of how that is true.
I can see an argument for IPSec. I haven't used that for many years. However, I see zero evidence that OpenVPN is "running out of steam" in any way shape or form.
I would be interested to know the reasoning behind this. Hopefully the sentiment isn't "this is over five years old so something newer must automatically be better". Pardon me if I am being too cynical, but I've just seen way too much of that recently.
Seems like you just haven’t been paying attention. Even commercial VPNs like PIA and others now use Wireguard instead of traditional VPN stacks. Tailscale and other companies in that space are starting to replace VPN stacks with Wireguard solutions.
The reasons are abundant, the main ones being performance is drastically better, security is easier to guarantee because the stack itself is smaller and simpler, and it’s significantly more configurable and easier to obtain the behavior you want.
I use and advocate for WireGuard, but I don't see its adoption in bigger orgs, at least the ones I've worked in. I appreciate this situation will change over time, but it'll be a long tail.
It’ll take a little bit of time. But for example Cloudflare’s Warp VPN also uses Wireguard under the hood.
So while corp environments may take a long time to switch for various reasons, it will happen eventually. But for stuff like this corp IT tends to be a lagging adopter, 10-20 years behind the curve.
Warp actually uses MASQUE (UDP/IP over QUIC) by default
Bigger orgs for the most part use whatever VPN solutions their (potentially decade-old) hardware firewalls support. Until you can manage and terminate a WireGuard tunnel on Cisco, Juniper, Fortigate (etc.) hardware, it's going to take a while to become more mainstream.
Which is a shame, because I have a number of problematic links (low bandwidth, high latency) that wireguard would be absolutely fantastic for, but neither end supports it and there's no chance they'll let me start terminating a tonne of VPNs in software on a random *nix box.
If you use Kubernetes and Calico you can use Wireguard to transparently encrypt in-cluster traffic[1] (or across clusters if you have cluster mesh configured). I wonder if we'll see more "automatic SDN over Wireguard" stuff like this as time goes on and the technology gets more proven.
Problem is, IIRC, if you need FIPS compliance you can't use WireGuard, since it doesn't support the mandated FIPS ciphers or what have you.
[1] https://docs.tigera.io/calico/latest/network-policy/encrypt-...
Sure, but I mean the "road warrior" client - typical, average company VPN users. Ironically, getting a technology like WireGuard into k8s is easier than replacing an established vendor/product that serves normal users.
The anti-FIPS position of the wireguard implementors is a big problem for adoption.
Exactly. We've looked at using Wireguard at my company, but because it can't be made FIPS compliant, it makes it a hard sell. There is a FIPS Wireguard implementation by WolfSSL, interestingly enough.
Yeah, it'll be running out of steam not only when regulators _understand_ WireGuard, but when it's the recommendation and orgs need to justify their old VPN solution.
I wouldn't say they're running out of steam (they never had any) but OpenVPN was always poorly designed and engineered and IPSec has poor interop because there are so many options.
Unfortunately (luckily?) I don't have enough knowledge about IPsec, but usually things make a lot more sense once you actually know the exact architecture and rationale behind them.
IPSec isn’t running out of steam anytime soon. Every commercial firewall vendor uses it, and it’s mandatory in any federal government installation.
WireGuard isn’t certified for any federal installation that I’m aware of and I haven’t heard of any vendors willing to take on the work of getting it certified when its “superiority” is of limited relevance in an enterprise situation.
Interestingly, I tried it out just now on one of my devices, and WireGuard VPN speed was 5x faster than OpenVPN with the same configuration.
OpenVPN has both terrible configuration and terrible performance compared to just about anything else. I've seen it drop off to next to no usage, both in companies and for personal use, over the past few years as WireGuard-based solutions have replaced it.
Same here. With OpenVPN, my somewhat modern CPU maxes out a whole core at 100% at around 200 megabits/s.
With WireGuard I instead max out the internet bandwidth (400 megabits/s) with maybe 20% CPU usage, if that.
I really don't understand why. We have AES acceleration; AES-NI can easily do more bps. Why is OpenVPN so slow?
WireGuard is slowly eating the space alive, and that's a good thing.
Here's a very educational comparison between WireGuard, OpenVPN and IPsec. It shows how easy WireGuard is to manage compared to the other solutions, and it measures and explains the noticeable differences in speed: https://www.youtube.com/watch?v=LmaPT7_T87g
Highly recommended!
Project page: https://nlnet.nl/project/KlusterLab-Wireguard/
I haven’t tinkered with an FPGA in years but this has my curiosity up. I’d love to separate the protocol handling from the routing and see how light (small of an FPGA, power efficiency) it could be made.
The routing isn't interesting to me - but protecting low-power IoT traffic certainly is.
Aside from Blackwire protocols, the sector for FPGAs in the AMD architectural framework (the Xilinx acquisition) is the tangential key-management software for VPN tunneling, which is contingent on whether ASICs (application-specific integrated circuits) can successfully test binaries.
This is a very cool project! I had never heard of SystemVerilog until today.
I’ll need someone more into this to break it down for me - how does VPN work on this and why do you need an FPGA version of it? Is this an internal VPN or one for connecting to the internet?
This part of the README answers the “why” pretty well:
> Both software and hardware implementations of Wireguard already exist. However, the software performance is far below the speed of wire.
> Existing hardware approaches are both prohibitively expensive and based on proprietary, closed-source IP blocks and tools.
> The intent of this project is to bridge these gaps with an FPGA open-source implementation of Wireguard, written in SystemVerilog HDL.
So having it on an FPGA gives you the best of both worlds, speed of a hardware implementation without the concerns of a proprietary black box.
"VPN" is just virtual emulated network cables that you would use to connect your laptops to Wi-Fi routers. It's just so happens that a lot of companies use that word for a paid, cloud based Internet-over-Internet service. It's as if taxi companies called themselves "wheels" companies that whether you're referring to the physical object or the service had become ambiguous.
VPNs are normally processed in software, and that processing is usually multi-step, so latency, jitter, per-packet processing time, etc. can vary. This is FPGA-based, and an FPGA can implement such algorithms as chained logic with fixed latency, without relying on software function calls. Presumably this is faster and more stable than software approaches thanks to that.
Not a member of the project but here is my take:
You run the WireGuard app on your computer/phone, tap Connect, and it creates an encrypted tunnel to a small network box (the “FPGA gateway”) at your office or in the cloud. From then on, your apps behave as if you’re on the company network, even if you’re at home or traveling.
Why the FPGA box: Because software implementations are too slow and existing hardware implementations cost too much.
Internal or Internet: Both.
Just a guess but I assume that this is (or rather, would be, judging by the README this isn't past the planning stage) for IoT and the like.
If you want your device to connect to a VPN you need something to implement the protocol. Cycles are precious in the embedded world so you don't want to do it in your microcontroller. You might offload it to another uC in your design but at that point it might make sense to just use an FPGA and have this at the hardware(-ish) level.
You can think of this as a "network interface chip" but speaking Wireguard instead of plain IP.
Integration of some of the compute-intensive bits into the NIC itself. The reason to do it in hardware is to increase efficiency (or sometimes performance, although software/CPU WireGuard is already pretty good). This could be baby steps toward lower-power / miniaturized / efficient hardware that supports the WireGuard protocol.
also just a fun project for the authors. :)
WireGuard is a protocol and program for making point-to-point VPN connections. It's notable because it's simple (compared to alternatives like OpenVPN) - so simple that it was accepted into the Linux kernel, which made it very fast. These guys implemented it in an FPGA because they could.
Here's a dumb question, tangentially related, since they mention a 10-gig L2 switch... how come (almost) nobody makes L2 10-gig switches? Ubiquiti has an 8-port L2, and that really seems to be it.
Do you mean specifically as consumer products?
There are loads of 10GbE switches from Cisco/Juniper/Arista/et al.
I'd guess so.
The last time I was checking (which was over 5 years ago now admittedly) there were no 10GbE switch options for reasonable prices. Juniper had good 16 port options with 1GbE interfaces at not crazy prices (which I have two of).
Going to 10GbE cost many multiples of the 1GbE price. The switches just seemed way too expensive, and prices were not dropping.
As it goes, maxing out 1GbE is fast enough for the sort of data and IOPS I send over my LAN. So 10GbE would probably have been overkill.
The 10Gb twisted pair cable requirements can bite you also. You may be working with who knows what installed cable that can't push it reliably. Or as a DIY person you may not understand exactly what to buy or limitations on running it.
1Gb is fast enough, cheap, and basically foolproof.
Enterprise 10G SFP+ switches have been pretty cheap on eBay for longer than that. And while you can plug in an RJ45 SFP+ module, it's just cheaper and better to use DAC cables.
Second-hand optics and preterminated fibre are cheap now too.
First-hand optics and fibre are cheap too, really. I just picked up some 10GBASE-SR SFPs for $25 USD each, while an equivalent copper 10GBASE-T SFP module is nearly $90.
Mikrotik has quite a few, I've been happily using CRS306 and CRS312 for some years now.
Do you mean like most vendors have moved onto faster port speeds? Mostly you can still use the slower 10G optics and the ports will clock down even if the nominal port speed is higher.
Not counting Cisco, Juniper, etc.? You can probably get a 32-port 10G switch on eBay for cheap. There are also some on Amazon and AliExpress, and tons of white-label options.
I think Wireguard is awesome and I use it exclusively.
That said, when traveling - on hotel Wi-Fi - TCP port 443 is always open (otherwise the internet wouldn't work), so OpenVPN will always work if you run it on that port.
For Wireguard, there isn’t a reliable always-open UDP port. Port 123 or 53 could work sometimes, but it’s not as guaranteed.
For any other application though, Wireguard would be my first choice.
Some VPN applications provide the means by which to tunnel WG over TCP. Some provide those as standalone tools: <https://github.com/mullvad/udp-over-tcp>
The one above has a very simple protocol: each datagram written to the TCP stream is preceded by a 16-bit unsigned integer in big-endian byte order, specifying the length of the datagram.
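That framing is simple enough to sketch in a few lines of Go (hypothetical helper names, not Mullvad's actual code):

    package udpovertcp

    import (
        "encoding/binary"
        "io"
    )

    // writeDatagram frames one UDP datagram onto the TCP stream:
    // a 2-byte big-endian length, then the payload.
    func writeDatagram(w io.Writer, datagram []byte) error {
        var hdr [2]byte
        binary.BigEndian.PutUint16(hdr[:], uint16(len(datagram)))
        if _, err := w.Write(hdr[:]); err != nil {
            return err
        }
        _, err := w.Write(datagram)
        return err
    }

    // readDatagram reverses the framing on the receiving side.
    func readDatagram(r io.Reader) ([]byte, error) {
        var hdr [2]byte
        if _, err := io.ReadFull(r, hdr[:]); err != nil {
            return nil, err
        }
        buf := make([]byte, binary.BigEndian.Uint16(hdr[:]))
        _, err := io.ReadFull(r, buf)
        return buf, err
    }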
Performance would of course suffer, but it's not likely that whichever service is blocking UDP is going to be offering high performance.

If you are doing it manually, you can include two peers, one over UDP and one over TCP, and prioritize traffic flow over the UDP one. Commercial VPN apps tend to handle that with "auto".
If you want to be fancy or you are confident that the UDP blocking service can offer high performance you can include a third peer using udp2raw: <https://github.com/wangyu-/udp2raw>
The reason why you may want to retain udp-over-tcp is that some sophisticated firewalls may block fake-TCP.
Yep, I really want to dote on wireguard and have contributed a little bit to it in its early years, but I've always found dsvpn to work at any cafe/hotel/hospital/etc. where I roam (except Sydney Airport - fuck their hostile wifi).
[dsvpn]: https://github.com/jedisct1/dsvpn
QUIC will hopefully help with this.
> For Wireguard, there isn’t a reliable always-open UDP port. Port 123 or 53 could work sometimes, but it’s not as guaranteed.
Couldn't you pipe it through something like udp2raw in those few cases? Probably performance would be worse/terrible, but then you say it's on hotel network so those tend to be terrible anyways.
SpinalHDL is so cool. There's been so, so much consolidation in the semiconductor market, and that's scary. But it feels like there's such an amazing base of new open design systems to work from now that getting new things started should be very possible! There's just a little too much of a gap in actually getting the silicon foundry model back up; things are all a bit too encumbered still. Fingers crossed that chip making has its next day.
> However, the Blackwire hardware platform is expensive and priced out of reach of most educational institutions. Its gateware is written in SpinalHDL, a nice and powerful but niche HDL, which has not taken root in the industry. While Blackwire is now released as open source, that decision came from their financial hardship -- it was originally meant for sale.
Here's a link for the old Blackwire 100GbE WireGuard project mentioned: https://github.com/FPGA-House-AG/BlackwireSpinal
Amusingly, after the commentary about niche HDLs, the authors seem to have turned to PipelineC in this project.
The problems with all non-SV HDLs are:
1. None of the commercial tools support them. All other HDLs compile to SV (or plain Verilog), and then you're wasting hours and hours debugging generated code. Not fun. Ask me how I know...
2. SV has an absolute mountain of features and other HDLs rarely come close. Especially when it comes to multi-clock designs (which are annoying and awkward but very common), and especially verification.
The only glimmer of hope I see on the horizon is Veryl, which hews close enough to SV that interop is going to be easy and the generated code is going to be very readable. Plus it's made by very experienced people. It's kind of the TypeScript of SystemVerilog.
What are the benefits of SV for multi-clock design? I found migen (and Amaranth) to be much nicer for multi-clock designs, providing a stdlib for CDCs and async FIFOs and keeping track of clock domains separately from normal signals.
My issue with SystemVerilog is the multitude of implementations with widely varying degrees of support, and little open source. Xsim poorly supports the more advanced constructs and crashes on them, leaving you to figure out which part causes the issue. Vivado only supports a subset. Toolchains for smaller FPGAs (Lattice, Chinese parts, ...) are much worse. The older ModelSim versions I used were also not great. You really have to figure out the basic common subset of all the tools, and for synthesis that basically leaves interfaces and logic. Interfaces are better than Verilog, but much worse than the equivalents in these neo-HDLs(?).
While tracing back compiled Verilog is annoying, you are also only using one implementation of the HDL, without needing to battle multiple buggy, poorly documented implementations - there is only the one, usually less buggy, poorly documented implementation.
Looking forward, it seems possible for Amaranth to become a full-fledged language unto itself, without needing Python. One could maybe still use Python as an embedded macro language - which could be very powerful.
SpinalHDL's multiple clock domain support via lexical scoping is excellent.
Save for things like SV interfaces (which are equivalently implemented in a far better way using Scala's type system), SpinalHDL can emit pretty much any Verilog you can imagine.