I previously worked at ByteDance, where we maintained a Rust zero-copy gRPC/Thrift implementation for 4 years: https://github.com/cloudwego/volo. It is built on the Bytes crate (reference-counted byte buffers, for folks not familiar with the Rust ecosystem). A fun fact: when we measured in our production environment, zero-copy doesn't mean higher performance in lots of scenarios; there are some trade-offs:
1. zero-copy means the bytes always stay inlined in the raw message buffer, so the app has to access them through a reference/pointer
2. You cannot compress the RPC message if you want to fully leverage the advantages of zero serdes/copy
3. The reference counting itself isn't free
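To make trade-offs 1 and 3 concrete, here is a minimal sketch of the refcounted-slice idea behind the Bytes crate, written with plain `Arc` (an illustration only, not the actual `bytes` implementation):

```rust
use std::sync::Arc;

// Minimal sketch of a reference-counted, zero-copy byte slice in the
// spirit of the `bytes` crate (not its real implementation): cloning or
// slicing only bumps an atomic refcount and adjusts offsets; the backing
// buffer is never copied.
#[derive(Clone)]
struct ZeroCopyBytes {
    buf: Arc<[u8]>, // shared backing buffer (the "RC itself" cost)
    start: usize,
    end: usize,
}

impl ZeroCopyBytes {
    fn new(data: Vec<u8>) -> Self {
        let len = data.len();
        ZeroCopyBytes { buf: Arc::from(data.into_boxed_slice()), start: 0, end: len }
    }

    // Zero-copy sub-slice: shares the buffer, only the offsets change.
    fn slice(&self, start: usize, end: usize) -> ZeroCopyBytes {
        assert!(start <= end && self.start + end <= self.end);
        ZeroCopyBytes {
            buf: Arc::clone(&self.buf),
            start: self.start + start,
            end: self.start + end,
        }
    }

    fn as_ref(&self) -> &[u8] {
        &self.buf[self.start..self.end]
    }
}

fn main() {
    let msg = ZeroCopyBytes::new(b"header:payload".to_vec());
    let payload = msg.slice(7, 14); // no memcpy: refcount bump + new offsets
    assert_eq!(payload.as_ref(), &b"payload"[..]);
    // Trade-off 1: every slice keeps the *whole* message buffer alive,
    // so the app must keep accessing bytes through these views.
    assert_eq!(Arc::strong_count(&msg.buf), 2);
}
```

The upside is that pulling a string field out of a parsed message costs an atomic increment instead of an allocation plus memcpy; the downside, as measured above, is that those increments and the pinned buffers are not free either.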
Same thing with io_uring zero-copy in my limited testing: buffer-usage accounting is not free, and copying memory makes things drastically simpler.
Speaking of volo, I'm trying to implement an etcd shim on top of SurrealKV. I haven't been able to get the original etcd E2E conformance tests to pass 100% yet, so I'm not releasing it just now.
True zero-copy is not achievable with Protobuf; you need something like FlatBuffers for that. What is presented here is more like zero-allocation.
I also find this misleading, and it could be solved so easily by just explaining that of course varints need resolving, and that things presumably happen lazily (I didn't read the code) when they are requested to be read rather than eagerly.
Is this still true? New versions of protobuf allow codegen of `std::string_view` rather than `const std::string&` (which forces a copy) for `string` and `repeated bytes` fields.
https://protobuf.dev/reference/cpp/string-view/
It allows avoiding allocations, but it doesn't allow using the serialised data as backing memory for an in-language type. Protobuf varints have to be decoded and written out somewhere. They cannot be lazily decoded efficiently either: the order of fields in the serialised message is unspecified, so the decoder either needs to iterate over the message again and again, finding each field on demand, or build a map of offsets, which negates any wins zero-copy strives to achieve.
This is true, but the relative overhead is highly dependent on the structure of one's schema. For example, fixed integer fields don't need to be decoded (including repeated fixed ints), and the main idea of the "zero copy" here is avoiding copies of string and bytes fields. If your protobufs are mostly varints, then yes, they all have to be decoded; if your protobufs contain a lot of string/bytes data, then most of the decode overhead could be memory copies of that data rather than varint decoding.
In some message schemas, even though this isn't truly zero copy, it may be close to it in terms of actual overhead and CPU time; in other schemas it doesn't help at all.
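For concreteness, the varints both comments refer to are protobuf's base-128 encoding. A minimal decoder sketch (not the library's actual code) shows why a decoded varint cannot simply alias the input buffer the way a `bytes` field can:

```rust
// Decode a protobuf base-128 varint (LEB128-style): each byte carries
// 7 bits of payload, and the high bit says whether more bytes follow.
// The decoded value has a different in-memory form than the wire bytes,
// so it must be materialised somewhere rather than aliased.
fn decode_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &b) in buf.iter().enumerate().take(10) {
        value |= u64::from(b & 0x7f) << (7 * i);
        if b & 0x80 == 0 {
            return Some((value, i + 1)); // (decoded value, bytes consumed)
        }
    }
    None // truncated or over-long varint
}

fn main() {
    // 300 = 0b1_0010_1100 -> wire bytes [0xAC, 0x02]
    assert_eq!(decode_varint(&[0xAC, 0x02]), Some((300, 2)));
    // Single-byte values still need the continuation-bit check.
    assert_eq!(decode_varint(&[0x7F]), Some((127, 1)));
}
```

By contrast, a `fixed64` or `bytes` field sits in the buffer in its final representation, which is exactly why the zero-copy wins concentrate on those field types.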
Those field accessors take and return string_view but they still copy. The official C++ library always owns the data internally and never aliases except in one niche use case: the field type is Cord, the input is large and meets some other criteria, and the caller had used kParseWithAliasing, which is undocumented.
To a very close approximation you can say that the official protobuf C++ library always copies and owns strings.
Well that is very disappointing news.
The decoder makes a copy even though it returns a string_view? What's the point, then?
I can understand encoders having to make copies, but not a decoder.
This is very cool! I'm most interested in the protobuf runtime - Rust has historically used Prost, which doesn't pass the protobuf conformance test suite and isn't Google-maintained. Google's priority internally is C++ interop, so they use unsafe for protobuf - which the community is understandably not excited about.
(For full disclosure, I started the ConnectRPC project - so of course I’m excited about that part of the announcement too.)
Exciting!
I have been on a similar odyssey, making a 'zero copy' Java library that supports protobuf, parquet, thrift (compact), and (schema'd) JSON. It does allocate a long[] and break out the structure for O(1) access, but it doesn't create a big clump of object wrappers, strings, and things; internally it just references a big pool buffer or the original byte[].
The speed demons use tail calls in Rust and C++ to eat protobuf at 2+ GB/sec (https://blog.reverberate.org/2021/04/21/musttail-efficient-i...). In Java, I'm super pleased to be getting 4 cycles per touched byte and 500 MB/sec.
Currently looking at how to merge a fast footer parser like this into the Apache Parquet Java project.
On protocols in this vicinity, I've been noticing a missing piece in OSS around transport as well. In Python, you often need incompatible dependency sets in one app, and the usual choices are either ad-hoc subprocess RPC that gets messy over time, or HTTP/containers that are overkill and force you to change your deployment strategy.
I ended up building a protocol for my own use around a very strict subprocess boundary for Python (initially at least; the protocol is meant to be universal). It has explicit payload shape, timeout, and error semantics. I already went a little too far beyond my use case with deterministic canonicalization for some common-pitfall data types (I think pickle users would understand, though). It still needs some documentation polish, but if anyone would actually use it, I can document it properly and publish it.
I've been running into _a lot_ of issues with Hyper/Tonic, like literal H2 spec violations. Try hosting a Tonic server behind nginx or an ALB: it will simply not work, as it can't handle GOAWAY retries in an H2 spec-compliant way.
If this fixes that I might consider switching.
However, Google is also working on a new grpc-rust implementation, and I have faith in them getting it right, so I'm holding tight a little bit longer.
It's 2026 and I'm still defining my own messaging and wire protocols.
Plain C structs that fit in a UDP datagram and that you can reinterpret_cast from are still best. You can still provide schemas and UUIDs for that, and dynamically transcode to JSON or whatever.
Until you have to work with both big- and little-endian systems. There is other weirdness in how different computers represent things as well: UTF-8 vs. UTF-16 strings (or other code pages), and not all floats are IEEE 754. Still, when you can ignore all those issues, what you did is really easy and often works.
Provided that:
- you agree never to care about endianness (can probably get away with this now)
- you don't want to represent anything complicated or variable-length, including strings
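A sketch of the same idea in Rust terms (the `Tick` struct and its fields are hypothetical; this assumes both ends share endianness and agree on the `#[repr(C)]` layout, per the caveats above):

```rust
use std::mem::size_of;

// "Plain C struct in a datagram": fixed size, no pointers, no strings.
// Fields are ordered largest-first so the layout has no padding bytes.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct Tick {
    price: i64,    // fixed-point, e.g. price * 10_000
    msg_type: u32,
    qty: u32,
}

// View the struct as raw bytes (what you'd hand to sendto()).
fn to_bytes(t: &Tick) -> &[u8] {
    // Sound here: Tick is #[repr(C)], Copy, and has no padding bytes.
    unsafe { std::slice::from_raw_parts(t as *const Tick as *const u8, size_of::<Tick>()) }
}

// The receiving side's reinterpret_cast equivalent.
fn from_bytes(buf: &[u8]) -> Option<Tick> {
    if buf.len() < size_of::<Tick>() {
        return None;
    }
    // read_unaligned avoids UB if the datagram buffer isn't 8-byte aligned.
    Some(unsafe { std::ptr::read_unaligned(buf.as_ptr() as *const Tick) })
}

fn main() {
    let t = Tick { price: 1_234_500, msg_type: 1, qty: 10 };
    let wire = to_bytes(&t); // this is the UDP payload
    assert_eq!(wire.len(), 16);
    assert_eq!(from_bytes(wire), Some(t));
}
```

No serialisation step, no allocation, and the schema is the struct definition itself; the price is that every caveat in the bullets above becomes your problem.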
If you decide to use UDP, do you ignore transmission errors or write the handling layer yourself?
I handle it in different ways by topic.
For topics which are sending the state of something, a gap naturally self-recovers so long as you keep sending the state even if it doesn't change.
For message buses that need to be incremental, you need to have a separate snapshot system to recover state. That's usually pretty rare outside of things like order books (I work in low-latency trading).
For requests/response, I find it's better to tell the requester their request was not received rather than transparently re-send it, since by the time you re-send it it might be stale already. So what I do at the protocol level is just have ack logic, but no retransmit. Also it's datagram-oriented rather than byte-oriented, so overall much nicer guarantees than TCP (so long as all your messages fit in one UDP payload).
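The gap-detection-without-retransmit idea above can be sketched as a per-topic sequence check (the names here are hypothetical; it assumes each datagram carries a monotonically increasing sequence number):

```rust
// Sketch of datagram gap detection without retransmission: the receiver
// only reports gaps so the application can resync from a snapshot or the
// next full-state message; it never asks the sender for a resend.
struct TopicReceiver {
    next_seq: u64,
}

#[derive(Debug)]
enum Delivery {
    InOrder,
    Gap { missed: u64 }, // messages lost; the caller decides how to recover
    Duplicate,           // late or replayed datagram
}

impl TopicReceiver {
    fn new() -> Self {
        TopicReceiver { next_seq: 0 }
    }

    fn on_message(&mut self, seq: u64) -> Delivery {
        if seq < self.next_seq {
            return Delivery::Duplicate;
        }
        let missed = seq - self.next_seq;
        self.next_seq = seq + 1;
        if missed == 0 { Delivery::InOrder } else { Delivery::Gap { missed } }
    }
}

fn main() {
    let mut rx = TopicReceiver::new();
    assert!(matches!(rx.on_message(0), Delivery::InOrder));
    assert!(matches!(rx.on_message(1), Delivery::InOrder));
    // Datagrams 2 and 3 were dropped by the network:
    assert!(matches!(rx.on_message(4), Delivery::Gap { missed: 2 }));
    assert!(matches!(rx.on_message(3), Delivery::Duplicate));
}
```

For state-carrying topics, `Gap` needs no action (the next state message heals it); for incremental topics it triggers the snapshot recovery described above; for request/response it is what lets you tell the requester their request was not received.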
Google really dropped the ball with protobuf by taking so long to make it zero-copy. There are 3rd-party implementations popping up now, and a real risk of future wire-level incompatibilities across languages.
"zero copy" in this context just means that the contents of the input buffer are aliased to string fields in the decoded representation. This is a language-level feature and has nothing to do with the wire format.
Commonly used crates should be blessed and go into an extended stdlib.
Ok, but this is not a commonly used crate. It's brand new!
Unless there’s a strict schedule for review to remove them, please no… because that’s how we get BerkeleyDB and CGI in the standard Perl libraries.
If anything, there should be “less than blessed” “*-awesome” libraries
No HTTP, Proto, or gRPC crate should ever find itself in the stdlib.
Didn't we learn this with python?
How many python http client libraries are in the dumping ground that is the python "batteries included" standard library?
And yet people always reach for the one that is outside stdlib.
On the other hand, having half of all packages depend on crates such as serde, syn, and proc-macro2 might not be such a good idea. First, it is annoying when creating new projects to have to pull in table stakes. Second, it is a security nightmare: most of the Rust ecosystem could be vulnerable if dtolnay decided to go rogue.
It is not that everything should go into the stdlib, but having syn, proc-macro2, and serde would be a good start imo. And, like Go, having a native HTTP stack would be really awesome: every time you have to do any HTTP, you end up pulling in some C-based crypto lib, which can really mess up your day when you want to cross-compile. With Go it mostly just works.
It isn't really in the flavor of Rust to do this, so I don't think it is going to happen, but when building services it is nice to be able to avoid most dependencies.
I agree with this. Rust has a Node-style dependency problem; any non-trivial Rust project ends up with dozens of dependencies in my experience. I would add tokio to the list of dependencies so common they should be moved to the stdlib.
A second-tier stdlib could turn out like the Boost C++ libraries -- an 800 lb gorilla of a common dependency that gets pulled in just to do something very simple; although, to be fair, most of the Boost functionality already is in Rust's stdlib.
As long as the "2nd-tier" stdlib was versioned & tied in with the edition system, it could work. The problem with most stdlibs (including Rust's) is that there's no way to remove anything & replace it with a better design. So the lib only ever grows, slowly adding complexity.
You don't think golang's http library is a good idea? I would have thought everyone is happy we have it
Would it still be a good idea if, instead of being created and owned by Google (an organization that made billions handling trillions of HTTP requests over decades), it had originally been made by someone without that track record, and you had to keep all of the bad initial API design choices going forward?
I would always go to the official docs page for the needs I have and use their HTTP library (or any other). It removes decision-making, the need to vet the practices of lesser-known libraries, and the risk of supply-chain attacks (assuming the stdlib of a language gets more attention to detail and security than any random 3rd-party library thrown onto GitHub by a small group of unpaid devs).
Only when it falls short of my needs would I drop the stdlib and go in search of a good-quality, reputable, and reliable 3rd-party lib (which is easier said than done).
This has worked well for me with Go and Python. I would enjoy the same with Rust, or, at a minimum, a list of libraries officially curated and directly pointed at by the language docs.