Bit of a tangent, but what I'm looking for is an S3-compatible server with transparent storage, i.e. storing each file (object) as an individual file on disk.
MinIO used to do that but changed many years ago. Production-grade systems don't do that, for good reason. The only tool I've found is Rclone, but it's not really meant to be exposed as a service.
Does anyone know of an option?
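To be clear about what I mean by "transparent", here is a toy sketch (hypothetical code, not any real server; key escaping, object metadata, and atomic rename-on-write are all elided): the object simply is the file.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Map an S3-style (bucket, key) pair to a plain filesystem path.
/// "Transparent" means the object IS the file: no chunking,
/// no erasure coding, no proprietary on-disk format.
fn object_path(root: &Path, bucket: &str, key: &str) -> PathBuf {
    root.join(bucket).join(key)
}

fn put_object(root: &Path, bucket: &str, key: &str, body: &[u8]) -> io::Result<()> {
    let path = object_path(root, bucket, key);
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?; // key prefixes become directories
    }
    fs::write(path, body)
}

fn get_object(root: &Path, bucket: &str, key: &str) -> io::Result<Vec<u8>> {
    fs::read(object_path(root, bucket, key))
}

fn main() -> io::Result<()> {
    let root = Path::new("/tmp/transparent-s3");
    put_object(root, "photos", "2025/cat.jpg", b"...jpeg bytes...")?;
    let body = get_object(root, "photos", "2025/cat.jpg")?;
    assert_eq!(body, b"...jpeg bytes...");
    Ok(())
}
```

The upside is that you can inspect or rescue your data with plain filesystem tools; the downside (and why production systems avoid it) is that metadata, multipart uploads, and crash consistency all get much harder.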
Great educational project! I'm curious why you're using both Raft and 2PC: are you sharding data and doing cross-shard transactions? Or is Raft only for cluster membership while 2PC replicates the data? If it's the latter, it kind of seems like overkill, but I'm not sure.
Few distributed filesystems/object stores seem to use Raft (or consensus at all) for replicating data because it's unnecessary overhead. Chain replication is one popular way for replicating data (which uses consensus to manage membership but the data path is outside of consensus).
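For concreteness, here is a toy single-process sketch of the chain replication idea (all names made up, purely illustrative): writes enter at the head and flow toward the tail, reads are served by the tail, and consensus only decides who is in the chain.

```rust
use std::collections::HashMap;

/// One replica in the chain. In a real system each node is a separate
/// process, and a consensus service (e.g. Raft) only manages which
/// nodes form the chain; the writes below never touch consensus.
struct Node {
    store: HashMap<String, String>,
}

struct Chain {
    nodes: Vec<Node>, // nodes[0] is the head, the last node is the tail
}

impl Chain {
    fn new(replicas: usize) -> Self {
        Chain { nodes: (0..replicas).map(|_| Node { store: HashMap::new() }).collect() }
    }

    /// Writes enter at the head and are forwarded down the chain;
    /// the write is acknowledged only once the tail has applied it.
    fn write(&mut self, key: &str, value: &str) {
        for node in self.nodes.iter_mut() {
            node.store.insert(key.to_string(), value.to_string());
        }
    }

    /// Reads are served by the tail, which only holds fully
    /// replicated writes; that is what gives strong consistency.
    fn read(&self, key: &str) -> Option<&String> {
        self.nodes.last().and_then(|tail| tail.store.get(key))
    }
}

fn main() {
    let mut chain = Chain::new(3);
    chain.write("user:1", "emilie");
    assert_eq!(chain.read("user:1").map(String::as_str), Some("emilie"));
}
```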
Thank you for this sharp and detailed question! In minikv, both Raft and 2PC are implemented on purpose. That may seem “overkill” in some contexts, but the combination serves both educational goals and production-grade guarantees:
- Raft is used for intra-shard strong consistency: within each "virtual shard" (256 in total), data and metadata are replicated via Raft (with leader election and log replication), not just for cluster membership;
- 2PC (Two-Phase Commit) is only used when a transaction spans multiple shards: this allows atomic, distributed writes across multiple partitions. Raft alone is not enough for atomicity here, hence the 2PC overlay;
- The design aims to illustrate real-world distributed transaction tradeoffs, not just basic data replication. It helps you understand what you gain and lose with a layered model versus simpler replication like chain replication (which, as you noted, is more common for the data path in some object stores).
So yes, in a pure object store, consensus for data replication is often skipped in favor of lighter-weight methods. Here, the explicit Raft+2PC combo is an architectural choice for anyone learning, experimenting, or wanting strong, multi-shard atomicity. In a production system focused only on throughput or simple durability, some of this could absolutely be streamlined.
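To make the layering concrete, here is a simplified sketch of the routing decision (illustrative only, not minikv's actual code; a plain modulo hash stands in for the real consistent hash ring):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::{Hash, Hasher};

const NUM_SHARDS: u64 = 256; // minikv's virtual shard count

/// Route a key to one of the virtual shards. A plain modulo stands in
/// here for the consistent hash ring, which serves the same purpose.
fn shard_of(key: &str) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish() % NUM_SHARDS
}

/// Pick the commit path for a transaction: if every key lands on the
/// same shard, that shard's Raft log alone provides atomicity; if the
/// keys span shards, a 2PC round coordinates the shard leaders.
fn commit_path(keys: &[&str]) -> &'static str {
    let shards: BTreeSet<u64> = keys.iter().copied().map(shard_of).collect();
    if shards.len() <= 1 {
        "single-shard: append to that shard's Raft log"
    } else {
        "cross-shard: 2PC (prepare/commit) across shard leaders"
    }
}

fn main() {
    println!("{}", commit_path(&["user:1:name"]));
    println!("{}", commit_path(&["user:1:name", "order:42:total"]));
}
```

The point of the split is that only transactions that actually cross a shard boundary pay the 2PC cost; single-shard writes go through Raft alone.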
Hello, cool project, did you think about maybe contributing to the key-value store feature of Garage, which is also a Rust project by open source development lab Deux Fleurs?
Hello! Thank you for your message. I don’t know this project, do you have a GitHub link maybe?
Sure, here you go:
- Documentation: https://garagehq.deuxfleurs.fr/
- Git repo: https://git.deuxfleurs.fr/Deuxfleurs/garage
Thanks!
Hi Emilie, nice project, thanks for sharing. I’m curious whether there were any decisions that you added mainly for educational value even though you wouldn’t make the same call in a production system?
Looks nice.
What is the memory consumption under significant load? That seems to be as important as throughput & latency.
Is there an official Docker image? I am looking for something more lightweight than MinIO. What are the requirements?
Have you checked Garage, https://garagehq.deuxfleurs.fr ? Not affiliated, nor trying to overshadow the posted project.
Yes! I'll check as soon as possible
Last posted 16 days ago: https://news.ycombinator.com/item?id=46661308
Yes, I know. I had the opportunity to request a review of my first post (which was flagged) after emailing the HN moderators. After checking, the moderator told me to repost, because I had indeed been wrongly flagged by some people here.
>All the code, architecture, logic, and design in minikv were written by me, 100% by hand.
Why do people always lie about this? Especially in this case, where they uploaded the entire log:

Date: Sat Dec 6 16:08:04 2025 +0100
Add hashing utilities and consistent hash ring
Date: Sat Dec 6 16:07:24 2025 +0100
Create mod.rs for common utilities in minikv
Date: Sat Dec 6 16:07:03 2025 +0100
Add configuration structures for minikv components
Date: Sat Dec 6 16:06:26 2025 +0100
Add error types and conversion methods for minikv
Date: Sat Dec 6 16:05:45 2025 +0100
Add main module for minikv key-value store

And this goes on until the project is complete (which probably took 2~3 hours total if you sum all the sessions). I doubt they learned anything at all. Well, other than that LLMs can solo-complete simple projects.
Comments in the previous submission are also obviously AI generated. No wonder it was flagged.
The log looks like that if you want logically separated commits from a chunk of programming you've already done: stage a file or a hunk or two (e.g. with git add -p), write a commit message, commit, rinse and repeat.
You have never split your working tree changes into separate commits?
Irrelevant question. The README has:
>Built in public as a learning-by-doing project
So, either the entire project was already written and was being uploaded one file at a time (the first modification since the lowest commit mentioned is a README update: https://github.com/whispem/minikv/commit/6fa48be1187f596dde8..., clearly AI generated, and clearly the AI used had codebase/architecture knowledge), and this claim is false, or they're implementing a new component every 30s.
I had the opportunity to request a review of my first post (which was flagged) after emailing the HN moderators. I didn't use AI for the codebase, only for the .md files, and there's no problem with that. My project was reviewed by moderators, don't worry. If the codebase or architecture had been AI generated, this post would not have been authorized, and therefore it would not have been published.
How does this deleted fix_everything.sh fit into your story?
https://github.com/whispem/minikv/commit/6e01d29365f345283ec...
I don't see the problem to be honest
Hmm. You doth protest too much, methinks :)
I thought that your “background in literature” contributed to the “well-written docs”, but that was LLMs!
No, AI helped me rewrite the .md files only; the majority of the doc was written by myself. I just asked the AI for help with formatting, for example.
I am not going to pretend to know what this person did, but I've definitely modified many things at once and made distinct commits after the fact (within 30s). I do not find it that abnormal.
Thanks a lot! I make distinct commits "every 30s" because I'm focused and I test my project. If the CI is green, I don't touch anything. If not, I work on the project until the CI is fully green.
What does that mean? You got feedback from the CI within 30 seconds and immediately pushed a fix?