We discovered this change recently because my dad was looking for a file that Dropbox accidentally overwrote. At first we said, "No problem, this is why we pay for Backblaze."
We then learned that this policy had changed a few months ago, and we were never notified. The file was unrecoverable.
If anyone at Backblaze is reading this: I pay for your product so I can install it on my parents' machine and never worry about it again. You decided saving on cloud storage was worth breaking this promise. Bad, bad call.
I'm going to drop Backblaze for my entire company over this.
I need it to capture local data, even though that local data is getting synced to Google Drive. Where we sync our data really has nothing to do with Backblaze backing up the endpoint. We don't wholly trust sync, that's why we have backup.
On my personal Mac I have iCloud Drive syncing my desktop, and a while back iCloud ate a file I was working on. Backblaze had it captured, thankfully. But if they are going to exclude iCloud Drive synced folders, and it sounds like that is their intention, Backblaze is useless to me.
Bidirectional auto file sync is a fundamentally broken pattern and I'm tired of pretending it's not. It's just complete chaos, with wrong files constantly getting overwritten on both ends.
I have no clue why people still use it, and I'd cut my losses if I were you: either back up to the cloud or pull from it, not both at the same time like an absolute tictac.
This is an instance of someone familiar with complex file access patterns not understanding the normal use case for these services.
The people using these bidirectional sync services want last-writer-wins behavior. The mildly and moderately technical people I work with all get it and work with it. They know how to use the UI to look for old versions if someone accidentally overwrites their file.
Your characterization of it as complete chaos with constant problems does not mesh with the reality of the countless low-tech teams I've seen use Dropbox-type services since they were launched.
This would be half OK if it worked, but you can't trust it to. OneDrive, for instance, has an open bug for years now where it will randomly revert some of your files to a revision from several months earlier. You can detect and recover this from the history, but only if you know that it happened and where, which you usually won't because it happens silently. I only noticed because it happened to an append-only text file I use daily.
You can build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software.
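For the curious, a rough sketch of those pieces glued together (hostname and credentials are placeholders, and I'd expect SVN's locking to be flaky over a network FUSE mount, so treat this as an illustration, not a recipe):

    # Mount the FTP account as a local filesystem (curlftpfs is a FUSE driver).
    mkdir -p ~/ftpmount
    curlftpfs ftp://user:password@ftp.example.com/ ~/ftpmount

    # Create a repository on the mounted share once, then version files against it.
    svnadmin create ~/ftpmount/repo
    svn checkout "file://$HOME/ftpmount/repo" ~/synced
    cd ~/synced && svn add --force . && svn commit -m "sync"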
1 out of a thousand people might do that, the others will buy the product. That's why people use it, most people don't want to build everything themselves.
But as usual it forgets the "For a Linux user" part.
If we remove the whole Linux section and just ask "why not map a folder in Explorer", it's a reasonable question, probably even more reasonable in 2026 than in 2007. The network got faster and more reliable, and Dropbox access got slower.
Obvious. Explorer even has transparent ‘native’ GUI support built in. I’m not even sure why you felt the need to explain it in detail. Next you’ll be explaining how to walk. (/s, I loved it)
It works perfectly fine if you're a user who knows how it works. I use it with Syncthing and it works because I know not to edit the same file at the same time on two devices (my third and fourth devices are always on the internet, so changes propagate reasonably fast even if the two devices aren't on at the same time).
I think this is a case of people using bidirectional file sync wrong. The point is to make the most up to date version of a file available across multiple devices, not to act as a backup or for collaboration between multiple users.
It works perfectly fine as long as you keep how it works in mind, and probably most importantly don't have multiple users working directly on the same file at once.
I've been using these systems for over a decade at this point and never had a problem. And if I ever do have one, my real backup solution has me covered.
“Every file is only ever written to from a single client, and will be asynchronously made available to all other clients, and after some period of time has elapsed you can safely switch to always writing to the file from a different client”.
> And if I ever do have one, my real backup solution has me covered.
What do you use and how do you test / reconcile to make sure it’s not missing files? I find OneDrive extremely hard to deal with because the backup systems don’t seem to be 100% reliable.
I think there are a lot of solutions these days that err on the side of claiming success.
I agree. I use Syncthing for syncing phones and laptops, for data like photos which aren't really updated, and it works very nicely. And for documents updated by one user, moving between devices is totally seamless.
That being said, I understand how it works at a high level.
Also taking recommendations for a simple service I can install on my dad's Windows machine and my mom's Mac that will just automatically back up the main drive to the cloud, just in case.
I've been extremely happy with Arq https://www.arqbackup.com/ for several years as a quiet backup solution, bring your own storage. I've done a few small restores and it's been just fine, and it automatically thins your backups to constrain storage costs.
Managing exclusions is something to keep vaguely on top of (I've accidentally had a few VM disk images get backed up when I don't need/want them) but the default exclusions are all very reasonable.
Installed Carbonite on my parents’ computer something like 15 years ago, and it still works (every now and then my dad tells me he used it to recover from a bug or a mistake).
But I have no idea where the company currently sits on the spectrum from good actor to fully enshittified.
I'm going to join the exodus, though for a different reason. I switched to OrbStack, and ever since, Backblaze refuses to back up, saying "disk full", because OrbStack uses an 8TB sparse disk image. You can exclude it, but if they won't (very easily) fix a known issue by measuring file sizes properly, I don't feel confident about the product.
Of course, I wouldn’t use their client anymore. Actually, I would have never used it from the start as it’s not open source. I think for backups there’s no better guarantee than that. I don’t mean because you could look at the source code, I mean because in my experience open source products tend to care more about their users than not. At least for such foundational tools.
The issue with a client app backing up Dropbox and OneDrive folders on your computer is the files-on-demand feature: you could sync a 1TB OneDrive to your 250GB laptop and it's OK because of smart/selective sync, aka files on demand. Then Backblaze backup tries to back the folder up, requests a download of every single file, and now you have zero bytes free, still no backup, and a sick laptop.
You could oauth the backblaze app to access onedrive directly, but if you want to back your onedrive up you need a different product IMO.
Shoutout to Arq backup which simply gives you an option in backup plans for what to do with cloud only files:
- report an error
- ignore
- materialize
Regardless, if you make backup software that doesn’t give this level of control to users, and you make a change about which files you’re going to back up, you should probably be a lot more vocal with your users about the change. Vanishingly few people read release notes.
I honestly didn't even realize Backblaze had a clientside app. Very happy user of Arq - been running a daily scheduled dual backup of my HDD to an external NAS and Backblaze B2 for years with zero issues.
And I've used enough "gold standard" commercial applications, like the one being discussed in this very article, that I don't trust those either. If you recoil in horror at code written by LLMs, I'm afraid that the vendors you're already working with have some really bad news for you. You can get over it now or get over it later. You will get over it.
I can audit and verify Claude's output. Code running at BackBlaze, not so much. Take some responsibility for your data. Rest assured, nobody else will.
You are not wrong, but I just don't have time. My choices are pay someone or throw my hands up. I have been paying backblaze. But I recently had a drive die, and discovered the backups are missing .exe and .dll files, and so that part of the restore was worthless.
What time I do have, I've been using to try and figure out photo libraries. Nothing is working the way I need it to. The providers are a mess of security restrictions and buggy software.
My favorite Peanuts comic was always the one where Linus is standing at an intersection next to a 'Push Button To Cross Street' sign. He is sucking his thumb and clutching his blanket despondently.
In the last panel, Charlie Brown tells him, "You have to move your feet, too."
That seems like a pretty straightforward issue to solve: simply back up only those files that are actually on the system, not the stubs. If it's on your computer, it should be able to get backed up. If it's just a shadow, a pointer, it shouldn't.
Making the change without making it clear, though, that's just awful. A clear recipe for catastrophic loss and a drip, drip, drip of news in the vein of "How Backblaze Lost My Stuff".
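For what it's worth, the detection really is tractable. A rough heuristic sketch on Linux with GNU stat (the 4KB floor and 10% threshold are made up, and sparse or filesystem-compressed files would false-positive; Windows exposes placeholders more directly via attributes like FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS):

    # Heuristic: a cloud placeholder reports its full logical size but has
    # almost no blocks actually allocated on disk.
    is_stub() {
      local f="$1"
      local size blocks
      size=$(stat -c %s "$f")     # logical size in bytes (GNU stat)
      blocks=$(stat -c %b "$f")   # allocated 512-byte blocks
      # Non-empty file with under ~10% of its bytes allocated: treat as a stub.
      [ "$size" -gt 4096 ] && [ $((blocks * 512)) -lt $((size / 10)) ]
    }

    is_stub "$HOME/Dropbox/bigfile.mov" && echo "placeholder - skip or materialize"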
Imagine if they could detect stub or real file, huh? Space technology, I know! Or just fucking copy the stubs as stubs and what's actually downloaded as actually downloaded! Boggles the mind!
Or maybe just do what they do now, but WARN about that in HUGE RED LETTERS, in the website and the app, instead of burying it in an update note like weasels!
The whole "just sync everything, and if you can't seek everything, pretend to sync everything with fake files and then download the real ones ad-hoc" model of storage feels a bit ill-conceived to me. It tries to present a simple facade but I'm not sure it actually simplifies things. It always results in nasty user surprises and sometimes data loss. I've seen Microsoft OneDrive do the same thing to people at work.
Same. I lost a lot of photos this way. I've recently moved over to Immich + Borg backup with a 3-2-1 backup between a local synology NAS and BorgBase. Painful lesson, but at least now I feel much more confident. I've even built some end-to-end monitoring with Grafana.
My own approach to simplicity generally means "hide complexity behind a simple interface" rather than pushing for simple implementations because I feel that too much emphasis on simplicity of implementations often means sacrificing correctness.
This particular example is a useful one for me to think about, because it's a version of hiding complexity in order to present a simple interface that I actually hate. (WYSIWYG editors is another one, for similar reasons: it always ends up being buggy and unpredictable.)
That would make sense for online-only files, but I have my Dropbox folder set to synchronize everything to my PC, and Backblaze still started skipping over it a few months ago. I reached out to support and they confirmed that they are just skipping Dropbox/OneDrive/etc. folders entirely, regardless of whether the files are stored locally or not.
That doesn't really make a lot of sense, though. Reading a file that's not actually on disk doesn't download it permanently. If I have zero of 10TB worth of files stored locally on my 1TB device, read them all serially, and measure my disk usage, there's no reason the disk should be full, or at least it should be cache that can be easily freed. The only time this is potentially a problem is if one of the files exceeds the total disk space available.
Hell, if I open a directory of photos and my OS tries to pull exif data for each one, it would be wild if that caused those files to be fully downloaded and consume disk space.
Right, but even if that’s working, it breaks the user-experience promise of services like this, that ‘files I used recently are on my device’.
After a backup, you’d go out to a coffee shop or on a plane only to find that the files in the synced folder you used yesterday, and expected to still be there, were not - but photos from ten years ago were available!
That shouldn't be seen as Backblaze's problem. It's Dropbox's problem that they made their product too complicated for users to reason about. The original Dropbox concept was "a folder that syncs" and there would be nothing problematic about Backblaze or anything else trying to back it up like any other folder.
Today's Dropbox is a network file system with inscrutable cache behavior that seeks to hide from the users the information about which files are actually present. That makes it impossible for normal users to correctly reason about its behavior, to have correct expectations for what will be available offline or what the side effects of opening a file will be, and Backblaze is stuck trying to cope with a situation where there is no right answer.
There’s no reason to think that would happen - files you had from ten years ago would have been backed up ten years ago and would be skipped over today.
Good point (I’m assuming you’re right here and it trusts file metadata and doesn’t read files it’s already backed up?)
It would still happen with the first backup - or first connection of the cloud drive - though, which isn’t a great post-setup new user experience. It probably drove complaints and cancellations.
I feel like I’ve accidentally started defending the concept of not backing up these folders, which I didn’t really intend to. I’d also want these backed up. I’m just thinking out loud about the reasons the decision was made.
It's generally handled decently well now, but with three or four of these things it can make backups take annoyingly long, as without "smarts" (which are not always present) it may force a download of the entire OneDrive/Box each time, even if it never crashes out.
The issue really isn't that it's not backing up the folder (which I can see an argument for both sides and various ways to do it) - it's that they changed what they did in a surprising way.
Your backup solution is not something you ever want to be the source of surprises!
This is a complexity that makes it harder, but not insurmountable.
It would be reasonable to say that if you run the file sync in a mode that keeps everything locally, then Backblaze should be backing it up. Arguably they should even when not in that mode, but it'll churn files repeatedly as you stream files in and out of local storage with the cloud provider.
> Arguably they should even when not in that mode, but it'll churn files repeatedly as you stream files in and out of local storage with the cloud provider.
When you have a couple terabytes of data in that drive, is it acceptable to cycle all that data and use all that bandwidth and wear down your SSD at the same time?
Also, high number of small files is a problem for these services. I have a large font collection in my cloud account and oh boy, if I want to sync that thing, the whole thing proverbially overheats from all the queries it's sending.
Reading your comments, it sounds like you are arguing it is impossible to backup files in Dropbox in any reasonable way, and therefore nobody should backup their cloud files. I know you haven’t technically said that, but that’s what it sounds like.
I assume you don’t think that, so I’m curious, what would you propose positively?
> I know you haven’t technically said that, but that’s what it sounds like.
Yes, I didn't technically say that.
> It sounds like you are arguing it is impossible to backup files in Dropbox in any reasonable way, and therefore nobody should backup their cloud files.
I'm not arguing either of those.
What I said is that with "on-demand file download", traditional backup software faces a hard problem. However, there are better ways to do it, the primary candidate being rclone.
You can register a new application ID for your rclone installation for your Google Drive and Dropbox accounts, and use rclone as a very efficient, rsync-like tool to backup your cloud storage. That's what I do.
I'm currently backing up my cloud storage to a local TrueNAS installation. rclone automatically hash-checks everything and downloads only the changed files. If you can mount Backblaze via FUSE or something similar, you can use rclone as an intelligent man-in-the-middle agent to smartly pull from the cloud and push to Backblaze.
Also, using restic or Borg as a backup container is a good idea, since they can deduplicate and/or store only the differences between snapshots, saving tons of space in the process, plus encrypting things for good measure.
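Concretely, the nightly job is something like this sketch (paths are made up; "dropbox:" is a remote configured beforehand with rclone config, and the restic repo has to be initialized once with restic init):

    # Mirror the cloud side locally, comparing by hash where the provider
    # exposes checksums instead of trusting size/modtime.
    rclone sync dropbox: /mnt/tank/mirrors/dropbox --checksum --transfers 8

    # Snapshot the mirror into a deduplicated, encrypted repository.
    # (restic prompts for the repo password unless RESTIC_PASSWORD is set.)
    restic -r /mnt/tank/restic-repo backup /mnt/tank/mirrors/dropbox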
My understanding of Backblaze Computer Backup is it is not a general purpose, network accessible filesystem.[0] If you want to use another tool to backup specific files, you'd use their B2 object storage platform.[1] It has an S3 compatible API you can interact with, Computer Backup does not.
But generally speaking, I'd agree with your sentiment.
This. You should not try to backup your local cache of cloud files as if those were your local files. Use a tool that talks to the cloud storage directly.
Use tools with straightforward, predictable semantics, like rclone, or Syncthing, or restic/Borg. (Deduplication rules, too.)
But if the files are only on the remote storage and not local, chances are they haven't been modified recently, so it shouldn't download them fully, just check the metadata cache for size / modification time and let them be if they didn't change.
So, in practice, you shouldn't have to download the whole remote drive when you do an incremental backup.
You can't trust size and modification time all the time; though mtime is a better indicator, it's not foolproof. The only reliable way is checksumming.
Interestingly, rclone supports that on many providers, but for Backblaze to support it, they'd need to integrate rclone, connect to the providers via that channel, and request checks, which is messy, complicated, and computationally expensive. And that's assuming you don't hit API rate limits on the cloud provider.
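For reference, the user-side version of that check with rclone, assuming a configured "dropbox:" remote and a local mirror at a made-up path:

    # Compare the remote against a local mirror using provider-side hashes
    # (no full download when the provider exposes checksums).
    rclone check dropbox: /mnt/tank/mirrors/dropbox

    # Providers without usable hashes need byte-for-byte comparison, which
    # downloads everything - exactly the expense described above.
    rclone check dropbox: /mnt/tank/mirrors/dropbox --download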
Sometimes the modification time of a file which is not downloaded on computer A, but modified by computer B, is not reflected immediately on computer A.
Hence, backup software running on computer A will think that the file has not been modified. This is a known problem in file synchronization. Also, some applications that modify files revert or preserve the file's mtime for various reasons. They are rare, but they're there.
The problem is, downloading files and disk management is not in your control; that part is managed by the cloud client (Dropbox, Google Drive, et al.) transparently. The application accessing the file just waits, akin to waiting for a disk to spin up.
The filesystem is a black box to this software, since it doesn't know where a file resides. If you want control, you need to talk with every party, including the cloud provider, à la rclone.
Unless it does something very weird it won't trigger all those files to download at the same time. That shouldn't be a worry.
And, as a separate note, they shouldn't be balking at the amount of data in a virtualized onedrive or dropbox either considering the user could get a many-terabyte hard drive for significantly less money.
> Unless it does something very weird it won't trigger all those files to download at the same time. That shouldn't be a worry.
The moment you call read() (or fopen(), or your favorite function), the download will be triggered. It's a hook sitting between you and the file. You can't ignore it.
The only way to bypass it is to remount the storage via rclone or something similar and use its "ls" and "lsd" commands to query filenames. Otherwise it'll download, and that's how it's expected to work.
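For example (assuming an rclone remote named "gdrive:" is already configured), these go through the provider's API and never touch file contents:

    rclone lsd gdrive:                 # list directories
    rclone ls gdrive:Photos            # file names and sizes, no content download
    rclone lsjson --hash gdrive:Photos # size/modtime/hashes as JSON, still no download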
I think you might be confusing Backblaze reading files with how Dropbox/OneDrive/Nextcloud/etc. work. NC doesn't enable this by default (I don't think), but Windows calls it virtual file support. There is no avoiding filling the upload buffer, because Backblaze has zero control over how Dropbox downloads files. When Backblaze requests that a file be opened and read, Windows will ask Dropbox or whatever to open the file for it, and to read it. How that is done is up to whatever handles the virtual files.

To Backblaze, your Dropbox folder is a normal directory, with all that that entails, so Backblaze thinks it can just zip through the directory and read data from disk, even though that isn't really what's happening. I had to exclude my Nextcloud directory from my Duplicati backups for precisely this reason: my Nextcloud is hosted on my server, and Duplicati was sending it so many requests it would cause my server to start sending back error 500s.

And no, my server isn't behind Cloudflare, primarily because I don't have $200 to throw at them to allow me to proxy arbitrary TCP/UDP ports through their network, and I don't know how to tell CF "hey, only proxy this traffic but let me handle everything else" (assuming that's even possible given that the usual flow is to put your entire domain behind them).
Dropbox and onedrive can handle backblaze zipping through and opening many files. The risk is getting too many gigabytes at once, but that shouldn't happen because backblaze should only open enough for immediate upload. If it does happen it's very easily fixed.
If it overloads nextcloud by hitting too many files too fast, that's a legitimate issue but it's not what OP was worried about.
Maybe it will, maybe it won't, but it'll cycle every file in the drive and stress everything from your cloud provider to Backblaze, including everything in between, software- and hardware-wise.
That sounds very acceptable to get those files backed up.
It shouldn't stress things to spend a couple weeks relaying a terabyte in small chunks. The most likely strain is on my upload bandwidth and yeah that's the cost of cloud backup, more ISPs need to improve upload.
I mean, cycling a couple of terabytes of data through a 512GB drive is at least 4 full drive writes, which is too much for that kind of thing.
> more ISPs need to improve upload.
I was yelling the same things to the void for the longest time, then I had a brilliant idea of reading the technical specs of the technology coming to my home.
Lo and behold, the numbers I got were the technical limits of the technology that I had at home (PON for the time being), and going higher would need a very large and expensive rewiring with new hardware and technology.
4 writes out of what, 3000? For something you'll need to do once or twice ever? It's fine. You might not even eat your whole Drive Write Per Day quota for the upload duration, let alone the entire month.
> the technical limits of the technology that I had at home (PON for the time being)
Depends on your device capacity and how much is in actual use. Wear leveling also adds wear of its own while it moves things around.
> For something you'll need to do once or twice ever?
I don't know about you, but my cloud storage is living, and even if it weren't, if the software can't smartly ignore files, it'll pull everything in, compare, and pass without uploading, causing churn in every backup cycle.
> Isn't that usually symmetrical? Is yours not?
GPON (Gigabit PON) is asymmetric. The theoretical limits are 2.4Gbps down, 1.2Gbps up. I have 1000Mbit/75Mbit at home.
How do you know how often those files need to be backed up without reading them? Timestamps and sizes are not reliable, only content hashes. How do you get a content hash? You read the file.
If timestamps aren’t reliable, you fall way outside the class of user that can trust a third-party backup provider. Name a time when the modification timestamp fails but a cloud provider will still catch the need to download the file.
I guess the problem with Backblaze's business model with respect to Backblaze Personal is that it is "unlimited". They specifically exclude linux users because, well, we're nerds, r/datahoarders exists, and we have different ideas about what "unlimited" means. [1]
This is another example in disguise of two people disagreeing about what "unlimited" means in the context of backup, even if they do claim to have "no restrictions on file type or size" [2].
Any company that does the "unlimited*" shenanigans is automatically out of any selection process I run, wherever they use it. It's a clear signal that the marketing/financial teams have taken over the business, and they'll be quick to offload you from the platform given the chance, and you'll have no recourse.
Always prefer businesses who are upfront and honest about what they can offer their users, in a sustainable way.
That’s an example of where unlimited can work (because the limit is a number of hours of degraded service which is quantifiable).
Storage was already a hairy beast with the original setup, and it would be much better if they had defined limits you could at least know about (and pay for).
Google Drive reneged on unlimited storage for Education accounts once they realized that universities also contain researchers who need to store huge amounts of data.
Massive fraud from abroad didn't help there either. A favorite backup spot for terabytes of pirated media, complete with guides on which schools had good @edu addresses for it.
Hadn't even considered your obvious point, a good one!
YouTube is constantly reencoding videos to save space at the expense of older content looking like mud, so arguably even they're having their struggles.
YT would have to start declining in growth pretty substantially for that to be the case. All the 360p video from 2010-2015 probably doesn't take up even 1% of the storage new videos added in 2025.
True, it's more likely to be aimed at stemming the tide of 4k video that nobody watches - but luckily they're worth more than Disney right now so we don't have to confront that ... yet.
You can only do it during growth phases or if there are complementary products with margin. The story I was told about Office 365 was that when they were using spinning disk, Exchange was IOPS-bound, so they had lots of high-volume, low-IOPS storage to offer for SharePoint. Google has a similar story, although neither is really unlimited, just approaching unlimited for large customers.
Once growth slows, churn eats much of the organic growth and you need to spend money on marketing.
Or they're selling their product to a market where the purchaser doesn't understand how much they would need to pay if they were paying by the gigabyte (or even how to check how much they would need). Telling those people they don't need to worry about that "detail" is a key selling point. Backblaze has a product for people who understand the limitations of their consumer product and don't find them acceptable: B2, which is priced by the gigabyte.
> And statistically-speaking, is viable as long as a company keeps its users to a normal distribution.
Doing a bait-and-switch on a percentage of your paying customers, no matter how small the percentage is, may be "viable" for the company, but it's a hostile experience for those users, and companies deserve to be called out for it.
On the other hand, subsidizing high-usage customers with low-usage customers is pretty generous to the high-usage customers, and there's no pricing model that doesn't suck a little.
Pricing tiers suck if your usage needs are at the bottom of a tier, or you need exactly one premium feature but not more. A la carte pricing is always at least a bit steep, since there's no minimum charge/bulk discount (consider a gym or museum's "day pass") so they have to charge you the full one-time costs every time in case that's your only time.
Base cost + extra per usage might be the best overall, but because nobody has solved micro transactions, the usage fees have to be pretty steep too. And frankly, everyone hates being metered - it means you have to think about pricing every time you go to use something.
I just read the Reddit post by their developer, and my takeaway is that they have a very good understanding of what "unlimited" really means. It's not a shenanigan; it's just calculated risk. It's clear to me that they simultaneously intend to offer truly unlimited backups while hoping that what the average user backs up is within a certain limit that they can easily predict and plan for. It's a statistical game that they are prepared to play.
> It’s a statistical game that they are prepared to play.
I understand this, and many others do too; the only difference seems to be that we're not willing to play those games. Others are, and that's OK. I'm just giving my point of view, which I know is shared by many others who are a bit stricter about where we host our backups. Instead of "statistical games" we prefer "upfront limitations", as one example.
The problem is you have to play with them - and sure, maybe they're willing to be the Costco to the unlimited backup's $1.50 hotdog - but for how long? Will their dedication to unlimited and particular price points mean you have to take Pepsi for awhile instead of Coke, or that your polish sausage dog disappears? Wait, where did the analogy go? I'm hungry.
It's a bit safer when you know your playbook. If there were unlimited (as it is now), unlimited plus (where they back up "cloud storage cached files"), and unlimited pro max premier (where they back up entire cloud storages), you'd at least know where you stand, and you'd change "holy shit, my important file I thought was backed up isn't and now it's gone forever" to "I have to pay $10 more a month or take on this risk".
In university we had computer labs, I worked in the office that handled all of engineering computing. You paid the fee for engineering school and you got to use the labs. They had printers. We wanted printing to be free. This didn't mean "you get to take reams of blank paper home with you", it meant "you get as much printing as you reasonably need for academic purposes". Nobody cared if you printed your resume, fliers for your book club, or whatever, we weren't sticklers. Honestly we wanted to think about printers as little as possible.
But we'd always have a few people at the end of the semester print 493 blank pages using up all of their print quota they'd "paid for". No sir, you didn't pay for 500 pages of printing a semester, we'd let you print as much as you needed, we just had to put a quota in place to prevent some joker from wallpapering the lecture hall.
It was hard to express what we meant and "unlimited" didn't cut it.
You meant “reasonable,” but you did not apply reason. Situations such as this can be handled with a quota set at something like 150% of median use, but then extended upon a justified request. It can work in a lab where there’s a human touch, but it fails at million-user scale where even that level of human support is too expensive.
If they limit the speed, it's technically limited, which really makes me wonder how they can legally say these things. I guess in a lot of cases it's like Comcast, where they perhaps also cap the data per month, but dang.
They mean that they're not going to limit the total amount of data that you send/receive beyond the natural limit implied by the maximum rate.
When a movie subscription says unlimited movies, we know they're not suggesting that they can break the laws of time, just that they won't turn you away from a screening. It's pretty normal language, used to communicate no additional limit, which is relevant when compared to cell phone data plans (which are actually, in my opinion, fraudulent) that shunt you to a lower tier after a certain amount of usage.
I mean, in this universe we live in everything is limited somehow.
I do wish it was a word that had to be completely dropped from marketing/adverting.
For example there is not unlimited storage, hell the visible universe has a storage limit. There is not unlimited upload and download speed, and what if when you start using more space they started exponentially slowing the speed you could access the storage? Unlimited CPU time in processing your request? Unlimited execution slots to process your request? Unlimited queue size when processing your requests.
Hence everything turns into the mess of assumptions.
> I mean, in this universe we live in everything is limited somehow.
Yes, indeed; most relevant in this case are probably "time" and "bandwidth" put together. Even if you saturate the line for a month, they won't throttle you, so for all intents and purposes the "data cap" is unlimited (or, more precisely, there is no data cap).
In almost all services this tends to get an asterisk that says "unless your usage interferes with other users" which in itself is poorly defined. But typically means once their system gets closer to its usage limit, you're the first to get booted off the service.
No ISP I've had in my adult life had such conditions; it truly is "whatever you manage to do with the bandwidth we give you". I've done hundreds of TBs for months without any impact to my bandwidth (transferring ML datasets, among other things), and I'm pretty sure an ISP in my country would break some law if it limited a typical broadband home connection based on data transfer quotas.
What? You are capped by bandwidth, and time is its own limit. You are capped at the max bandwidth in your service contract multiplied by the length of the contract. A bandwidth cap has an implied data cap.
The point is that you have access to a 100Mb/s connection, and your access to that connection is unlimited. It doesn't become a 10Mb/s connection at some point, and your access isn't cut off - there are no limits on your access.
Of course there are practical limits as you can't make your 100Mb/s connection into a gigabit one (ignoring that you can buy burstable in a datacenter, etc, etc).
Where unlimited falls down is when it refers to an endlessly consumable resource, like storage.
> And they have the necessary pipes to serve the rate they sell you 24/7
I doubt they have those pipes, at least if every one of their customers (or a sufficiently large share) actually made use of that.
The second question would be how long they would allow you to utilize your broadband 24/7 at max capacity without canceling your subscription. Which leads back to the point the person I replied to was making: if you truly make use of what is promised, they cancel you. Hence it is not a faithful offer in the first place.
> Nobody has turned the moon into a hard drive yet.
Not important here because backblaze only has to match the storage of your single device. Plus some extra versions but one year multiplied by upload speed is also a tractable amount.
Since I know how many of these businesses are run, I'll let you in on the very obvious secret: there's zero chance they have enough uplink to accommodate everyone using 100% of their bandwidth at the same time, and probably much less than that.
Residential network access is oversold, like everything else.
The only difference with storage is there’s a theoretical maximum on how much a single person can use.
But you could just as well limit backup upload speed for similar effect. Having something about fair use in ToS is really not that different.
Residential ISPs don’t work financially unless you oversell peak time full-rate bandwidth. If you do things right, you oversell at a level that your customers don’t actually slow down. Even today, you won’t have 100% of customers using 100% of their full line rate 100% of the time.
Back in the late 1990s we could run a couple dozen 56k lines on a 1.544 Mbps backhaul. We could have those to the same extent today, but there’s still a ratio that works fine.
Yes, yes. We know. The business environment can't be arsed to maintain its own integrity by actually building out the capacity they want to charge for. Everyone hides behind statistical multiplexing until the actuarial pants-shitting event occurs. Then it's bailout time, or "We're sorry. We used all the money for executive bonuses!"
Building out for 100% of theoretical capacity makes no sense but you can still easily accommodate the small handful of power users with plenty to spare. Most ISPs will not drop or throttle users trying to get their money's worth if it’s fiber or similar. LTE of course that’s another thing.
That sort of horrible abuse only happens in areas where some provider has strict monopoly, but that’s an aberration and with Starlink’s availability there’s an upper bound nowadays.
It’s not unlimited. The limit might be very high these days, but it’s at most bandwidth times duration. And while that sounds trivial, it does mean they aren’t selling you an infinity of a resource.
I've been running an RPi-based torrent client 24/7 in several countries and never experienced that. It eats a few TBs per month, not the full line, but a pretty decent amount. I guess it really depends on the country.
I’ve used Spectrum and their predecessors since the 90s. Never ran into this, although the upstream speeds are ridiculously slow, and they used to force Netflix traffic to an undersized peer circuit.
I'm unsure if you're being sarcastic or not, but never have I used an ISP that would throttle you, for any reason. This is unheard of in the countries I've lived in, and I'm not sure many people would even subscribe to something like that; it sounds completely backwards compared to how a typical at-home broadband connection works.
Of course, in countries where the internet isn't so developed as in other parts of the world, this might make sense, but modern countries don't tend to do that, at least in my experience.
I think most are familiar with throttling because most (all?) phone plans have some data cap at one point, but I don't think I've heard of any broadband connections here with data caps, that wouldn't make any sense.
Data caps are just documenting the reality that ISPs oversubscribe: if they sell a hundred 1Gb/s connections to a neighborhood, it's highly unlikely they're peering that neighborhood onto the Internet at large at 100Gb/s. I don't know what the current standard is, but in the past it's been 10:1 to 100:1, so a hundred 1Gb/s connections might be sharing 1-10Gb/s of uplink; and if usage starts to saturate that, they need a way of backing off that is "fair". Data caps are one of the ways they inform the customer of such.
I've seen it with my new fiber rollout: every single customer, no matter their purchased speed, had 1Gb up and down. As more customers came online and usage grew, they're not limiting anyone, but you get closer to just your advertised rate. My upload is still faster than my download because most of my neighborhood is downloading and few are uploading.
I have 5 Gbps symmetric at home. I and my fiancee both work from home, so our backup fiber connection from another provider is 2 Gbps. We can also both tether to cell phones if necessary. We can get 5G home wireless Internet here, too, and we might ditch our 2 Gbps line in favor of that as a backup. We moved from Texas back home to Illinois last year, and one of the biggest considerations was who had service at what tiers due to remote work. Some of the houses we looked at in the same three-county area in the Chicago suburbs didn’t even have 5G home available (not from AT&T, Verizon, or T-Mobile anyway).
My parents have 5G wireless home as their primary connection, and that was only introduced in their area a couple of years ago. Before that, they could get dial-up, 512 kbps wireless with about a $1000 startup cost, ISDN (although the phone company really didn’t want to sell it to them), Starlink, or HughesNet. The folks across the asphalt road from them had 20 Mbps Ethernet over power lines years ago, and that’s now I think 250 Mbps. It’s a different power company, though, so they aren’t eligible.
Around 80% of the US population lives in large urban areas. The other 20% of the population range from smaller towns to living many kilometers from any town at all. There’s a lot of land in the US.
Here in dense NYC, most apartments I've lived in have but a single ISP available. It's common to hunt for apartments by searching the address on service maps.
I'm pretty sure one landlord was cut in by his ISP, as he skipped town when I tried to ask about getting fiber, and his office locked their door and drew their shades when I went there with a technician on two occasions. The final time, we got there before they opened and the woman ran into the office and slammed the door on us.
"I guess the problem with Backblaze's business model with respect to Backblaze Personal is that it is "unlimited"."
The new and very interesting problem with their business model is that drive prices have doubled - and in some cases, more than doubled - in the last 12 months.
Backblaze has a lot of debt and at some point the numbers don't make sense anymore.
Yeah, I found that out recently when I had to purchase a new 16TB drive because one of them in my RAID died. I bought the original drive used about three years ago for about $130. To replace it I had to shop around, and I ended up paying about $270, and I think that was considered a decent deal right now.
Oh well, I guess this is why we're given two kidneys.
It’s funny that the same person asking for linux support would complain about B2 “not being for home users”. I sync my own backups to B2 and would set that up over installing linux any day of the week! It’s extremely easy.
Yea, that's pretty shady. Either don't call your service unlimited or bump up the prices so you can survive the occasional datahoarder. I called them out on this many years ago.
I actually emailed them years ago about it. Asked them point blank what'd happen if I dumped 20+ TB of encrypted, undeduplicable backups onto their storage servers. They actually replied that there'd be no problem, but I didn't buy it. Not at all surprised to see this now.
If a company uses the word unlimited to describe their service, but then attempts to weasel out of it via their T&Cs, that doesn't constitute a disagreement over the meaning of the word unlimited. It just means the company is lying.
From a philosophical standpoint, I agree, but in terms of service providers, "unlimited" has pretty much always been synonymous with "unmetered" (i.e., we don't charge you for traffic, but we will still throttle you if you are affecting service reliability for other customers).
Sorry, but unlimited has never meant unrestricted. ToSes always have restrictions. If it were unrestricted, it would be used for all kinds of illegal stuff they don't want on their servers, child pr0n and whatnot. They can't legally offer a service like this without restrictions, as they operate within an existing set of laws.
Unlimited, however, they can offer. I don't see how people get into a mental block of thinking something is nefarious when a company offers you unlimited hosting or data. Yes, they know it's impossible if everyone took full advantage of it. They also know most people won't, and so they don't have to spend time worrying about it. It's a simple actuarial exercise to work out pricing that covers the usage of your users.
Back in the early 2000s I ran a web hosting service that was predominantly a LAMP stack shared hosting environment. It had several unlimited plans and they were easy to estimate/price. The only times I had an issue of supporting a heavy user, it would turn out they were doing something unrestricted. Back then, it was usually something pron or mp3 related. So the user would get kicked off for that. I didn’t have any issues with supporting the usage load if it was within TOS. The margins were so high it was almost impossible to find a user that could give me any trouble from an economic standpoint.
When it comes to storage "unlimited" to me means a promise to be broken at some random point in the future. I'll never use a service that claims unlimited anything over having an actual cost model. Companies that charge by what you use have actually given consideration to the cost of doing business and have priced that in already.
I use them for the B2 bucket-style storage where this happens. It's expensive per gig compared to the cost of a working personal unlimited desktop account. I like to visit their Reddit page occasionally, and it's a constant stream of desktop client woes and stories of restoring problems, and any time B2 is mentioned it's like "but muh 50 terabytes" lol
It's cheaper if you have multiple computers with normal amounts of data though. My whole family is on my B2 account (Duplicati backing up eight computers each to a separate bucket), and it's $10/month.
I can understand in theory why they wouldn't want to back up .git folders as-is. Git has a serious object count bloat problem if you have any repository with a good amount of commit history, which causes a lot of unnecessary overhead in just scanning the folder for files alone.
I don't quite understand why it's still like this; it's probably the biggest reason why git tends to play poorly with a lot of filesystem tools (not just backups). If it'd been something like an SQLite database instead (just an example really), you wouldn't get so much unnecessary inode bloat.
At the same time Backblaze is a backup solution. The need to back up everything is sort of baked in there. They promise to be the third backup solution in a three layer strategy (backup directly connected, backup in home, backup external), and that third one is probably the single most important one of them all since it's the one you're going to be touching the least in an ideal scenario. They really can't be excluding any files whatsoever.
The cloud service exclusion is similar, only much worse. Imagine getting hit by a cryptoworm. Your cloud storage tool is dutifully going to sync everything encrypted, junking up your entire storage across devices, and because restoring old versions is both ass and near impossible at scale, you need an actual backup solution for that situation. Backblaze excluding files in those folders feels like a complete misunderstanding of what their purpose should be.
Why should a file backup solution adapt to work with git? Or any application? It should not try to understand what a git object is.
I'm paying them to copy files from a folder to their servers; just do that. No matter what the file is. Stay at the filesystem level, not the application level.
I'm not saying Backblaze should adapt to git; the issue isn't application related (besides git being badly configured by default; there's a solution with git gc, it's just that git gc basically never runs).
It's that to back up a folder on a filesystem, you need to traverse that folder and check every file in that folder to see if it's changed. Most filesystem tools usually assume a fairly low file count for these operations.
Git, rather unusually, tends to produce a lot of files in regular use; before packing, every commit/object/branch is simply stored as a file on the filesystem (branches only as pointers). Packing fixes that by compressing commit and object files together, but it's not done by default (only after an initial clone or when the garbage collector runs). Iterating over a .git folder can take a lot of time in a place that's typically not very well optimized (since most "normal" people don't have thousands of tiny files in their folders that contain sprawled out application state.)
The correct solution here is either for git to change, or for Backblaze to implement better iteration logic (which will probably require special handling for git..., so it'd be more "correct" to fix up git, since Backblaze's tools aren't the only ones with this problem.)
7za (the compression app) does blazingly fast iteration over any kind of folder. This doesn't require special code for git. Backblaze's backup app could do the same but rather than fix their code they excluded .git folders.
When I backup my computer the .git folders are among the most important things on there. Most of my personal projects aren't pushed to github or anywhere else.
Fortunately I don't use Backblaze. I guess the moral is don't use a backup solution where the vendor has an incentive to exclude things.
IMHO, you can't do blazingly fast iteration over folders with small files in Windows, because every open is hooked by the anti-virus, and there goes your performance.
Actually, once the initial backup is done, there is no reason to scan for changes. They can just use a Windows facility that tells them when any file is modified or created, and add that file to their backup list.
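On Windows that's ReadDirectoryChangesW or the NTFS USN journal. The same idea sketched with inotify on Linux (requires inotify-tools; the watched path and queue file are made up):

    # Append each finished write to a queue that the uploader drains later,
    # instead of rescanning the whole tree.
    inotifywait -m -r -e close_write -e create -e moved_to \
      --format '%w%f' "$HOME/Documents" |
    while read -r changed; do
      printf '%s\n' "$changed" >> /var/tmp/backup-queue.txt
    done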
No they don't. They just have to price the product to reflect changing user patterns. When Backblaze started, it was simply "we back up all the files on your drive"; they didn't even have a restore feature, that was your job when you needed it. Over time they realized user behavior changed: these cloud drives were a huge data source they hadn't priced in, git gave them problems they hadn't factored in, etc. The issue is that their solution is to exclude it, and that means they're now a half-baked solution for many of their users. They should have just changed the pricing and supported the backup solution people need today.
I think it's understandable for both Backblaze and most users, but surely the solution is to add `.git` to their default exclusion list which the user can manage.
I think they shouldn't back up git objects individually because git handles the versioning information. Just compress the .git folder itself and back it up as a single unit.
Better yet, include deduplication, incremental versioning, verification, and encryption. Wait, that's borg / restic.
This is a joke, but honestly anyone here shouldn't be directly backing up their filesystems and should instead be using the right tool for the job. You'll make the world a more efficient place, have more robust and quicker to recover backups, and save some money along the way.
Eh, you really shouldn't do that for any kind of file that acts like an (impromptu) database. This is how you get corruption, especially when change information can be split across more than one file.
Sorry, what are you saying shouldn't be done? Backing up untracked/modified files in a git repo? Or compressing the .git folder and backing it up as a unit?
> Backing up untracked/modified files in a git repo?
This. It's best to do it as an atomic operation, such as a VSS-style snapshot, so it's consistent and taken with no (or paused) operations on the files. Something like a zip is generally better because it occupies the filesystem for less time than the upload typically takes.
> SourceGear Vault Pro is a version control and bug tracking solution for professional development teams. Vault Standard is for those who only want version control. Vault is based on a client / server architecture using technologies such as Microsoft SQL Server and IIS Web Services for increased performance, scalability, and security.
It's probably primarily because Linus is a kernel and filesystem nerd, not a database nerd, so he preferred to just use the filesystem which he understood the performance characteristics of well (at least on linux).
I decided to look into this (git gc should also be doing this), and I think I figured out why it's such a consistent issue with git in particular. Running git gc does properly pack objects together and reduce inode count to something much more manageable.
It's the same reason the Postgres autovacuum daemon tends to be borderline useless unless you retune it[0]: the defaults are barmy. git gc only auto-runs once there are 6700 loose (unpacked) objects[1]. Most typical filesystem tools start balking at traversing ~1000 files in a structure (it depends a bit on the filesystem/OS as well; Windows tends to get slower a good bit earlier than Linux).
To fix it, run

    git config --global gc.auto 1000

and any subsequent commit to your repos will trigger garbage collection properly once there are around 1000 loose files. Pack file management seems to be properly tuned by default; at more than 50 packs, gc will repack into a larger pack.
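You can see whether a given repo is in the bloated state, and pack it down by hand, with stock git:

    git count-objects -v   # "count" = loose objects, "packs" = pack files
    git gc                 # pack loose objects down to a handful of files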
[0]: For anyone curious, the default postgres autovacuum setting runs only when 10% of the table consists of dead tuples (roughly: deleted+every revision of an updated row). If you're working with a beefy table, you're never hitting 10%. Either tune it down or create an external cronjob to run vacuum analyze more frequently on the tables you need to keep speedy. I'm pretty sure the defaults are tuned solely to ensure that Postgres' internal tables are fast, since those seem to only have active rows to a point where it'd warrant autovacuum.
A few thousand files shouldn't be a problem to a program designed to scan entire drives of files. Even in a single folder and considering sloppy programs I wouldn't worry just yet, and git's not putting them in a single folder.
They 100% should have communicated this change, absolutely unacceptable to change behavior without an extremely visible warning.
However, backing up these kinds of directories has always been ill-defined. Dropbox/Google Drive/etc. files are not actually present locally, at least not until you access the file or the client decides to cache it. Should backup software force you to download all 1TB+ of your cloud storage? What if the local system is low on space? What if the network is too slow? What if the actual data is in an already-excluded %AppData% location?
Similar issue with VCS: should you sync changes to .git every minute? Every hour? When is .git in a consistent state?
IMO, .git and other VCS directories should just be synced X times per day, waiting for .git to be unchanged for Y minutes before syncing. Hell, I bet Claude could write a special git-aware backup script.
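A sketch of that wait-for-quiescence idea with GNU find (repo path, quiet window, and destination are all made up):

    REPO="$HOME/src/myproject"
    QUIET_MINUTES=10
    DEST="$HOME/backup-staging"

    # Snapshot .git only if nothing inside it changed recently, so we never
    # archive a repository mid-operation.
    if [ -z "$(find "$REPO/.git" -type f -newermt "-$QUIET_MINUTES minutes" -print -quit)" ]; then
      mkdir -p "$DEST"
      tar -czf "$DEST/myproject-git-$(date +%F-%H%M).tar.gz" -C "$REPO" .git
    fi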
But Google Drive and Dropbox mount points are not real. It’s crazy to expect backup software to handle that unless explicitly advertised.
Dropbox and GDrive desktop clients can be configured to sync files to a local directory. Backing them up with an additional platform would probably need some sort of logic like you described for VCS.
I had a similar experience as well. They upgraded their client and server software something like 5 years ago, which put different restrictions on the character set allowed in passwords. I had used a special character which was no longer allowed. When I needed to restore files after a disk failure, I could not log in either in the app or on the website. Customer service was useless: we are sorry, your fault. I lost 1TB of personal photos due to this as a paying customer. Never trust Backblaze.
I have the same experience with Backblaze. 3 years ago I tried to restore my files from Backblaze, using their desktop client.
The first thing I noticed is that if it can't download a file due to a network or some other problem, it just skips it. You can force it to retry by modifying its job file, which is just an SQLite DB. It also stores and downloads files by splitting them into small chunks. It stores checksums of these chunks, but it doesn't store the complete checksum of the file, so judging by how badly the client is written, I can't be sure that restored files are not corrupted after the stitching.
Then I found out that it can't download some files even after dozens of retries because it seems they are corrupted on Backblaze side.
But the most jarring issue for me is that it mangled all non-ASCII filenames. They are stored as UTF-8 in the DB, but the client saves them as Windows-1252 or something. So I ended up with hundreds of gigabytes of files with names like фикац, and I can't just re-encode the names back, because some characters were dropped in the process.
I wanted to write a script that forces Backblaze Client to redownload files, logs all files that can't be restored, fixes the broken names and splits restored files back into chunks to validate their checksums against the SQLite DB, but it was too big of a task for me, so I just procrastinated for 3 years, while keeping paying monthly Backblaze fees because it's sad to let go of my data.
Do you have any more details? This is a pretty big deal. The differentiators between Backblaze and Hetzner mostly boil down to this kind of thing supposedly not being possible.
I’m on my phone so forgive the formatting, but here’s my entire support exchange:
- - -
Hey, I tried restoring a file from my backup — downloading it directly didn't work, and creating a restore with it also failed. I got an email telling me to contact y'all about it.
Can you explain to me what happened here, and what I can do to get my file(s?) back?
- - -
Hi Jan,
Thanks for writing in!
I've reached out to our engineers regarding your restore, and I will get back to you as soon as I have an update. For now, I will keep the ticket open.
- - -
Hi Jan,
Regarding the file itself - it was deleted back in 2022, but unfortunately, the deletion never got recorded properly, which made it seem like the file still existed.
Thus, when you tried to restore it, the restoration failed, as the file doesn't actually exist anymore. In this case, it shouldn't have been shown in the first place.
For that, I do apologize. As compensation, we've granted you 3 monthly backup credits which will apply on your next renewal. Please let me know if you have any further questions.
- - -
That makes me even more confused to be honest - I’ve been paying for forever history since January 2022 according to my invoices?
Do you know how/when exactly it got deleted?
- - -
Hi Jan,
Unfortunately, we don't have that information available to us. Again, I do apologize.
- - -
I really don’t want to be rude, but that seems like a very serious issue to me and I’m not satisfied with that response.
If I’m paying for a forever backup, I expect it to be forever - and if some file got deleted even despite me paying for the “keep my file history forever” option, “oh whoops sorry our bad but we don’t have any more info” is really not a satisfactory answer.
I don’t hold it against _you_ personally, but I really need to know more about what happened here - if this file got randomly disappeared, how am I supposed to trust the reliability of anything else that’s supposed to be safely backed up?
- - -
Hi Jan,
I'll inquire with our engineers tomorrow when they're back in, and I'll update you as soon as I can. For now, I will keep the ticket open.
- - -
Appreciate that, thank you! It’s fine if the investigation takes longer, but I just want to get to the bottom of what happened here :)
- - -
Hi Jan,
Thanks for your patience.
According to our engineers and my management team:
With the way our program logs information, we don't have the specific information that explains exactly why the file was removed from the backup. Our more recent versions of the client, however, have vastly improved our consistency checks and introduced additional protections and audits to ensure complete reliability from an active backup.
Looking at your account, I do see that your backup is currently not active, so I recommend running the Backblaze installer over your current installation to repair it, and inherit your original backup state so that our updates can check your backup.
I do apologize, and I know it's not an ideal answer, but unfortunately, that is the extent of what we can tell you about what has happened.
- - -
I gave up escalating at this point and just decided they can't be trusted anymore.
The files in question are four years old at this point, so it's hard for me to state anything conclusively. I guess there might have been a perfect storm where that specific file was deleted because it was due to expire before I upgraded to "keep history forever", but I don't think that's super likely, and I would absolutely expect them to have telemetry about that in any case.
If anyone from Backblaze stumbles upon it and wants to escalate/reinvestigate, the support ID is #1181161.
This reminds me of the Seinfeld riff on car rental reservations. Anyone can make a backup. The important part is holding the backup. If Backblaze doesn’t always do that then it is practically worthless to everyone.
This seems absurd from a company offering backups as a service.
Especially since they offer restoring all your data onto a drive and shipping it to you, they pretty clearly should have enough information available to test restorations of data. And the number of times I've heard of that failure mode ("oh, we didn't track deletions well enough, so we only found out we deleted it when you tried restoring"), plus them saying they have made improvements to avoid this exact failure mode in newer client versions, makes me think they should have enough reports to investigate it.
...which makes me wonder if they did, and decided they would go bankrupt if they told people how much data they lost, so they decided to bet on people not trying restores on a lot of the lost data.
Weirdly, reading this had the net impact of me signing up to Backblaze.
I had no idea that it was such a good bargain. I used to be a Crashplan user back in the day, and I always thought Backblaze had tiered limits.
I've been using Duplicati to sync a lot of data to S3's cheapest tape-based long term storage tier. It's a serious pain in the ass because it takes hours to queue up and retrieve a file. It's a heavy enough process that I don't do anything nearly close to enough testing to make sure my backups are restorable, which is a self-inflicted future injury.
Here's the thing: I'm paying about $14/month for that S3 storage, which makes $99/year a total steal. I don't use Dropbox/Box/OneDrive/iCloud so the grievances mentioned by the author are not major hurdles for me. I do find the idea that it is silently ignoring .git folders troubling, primarily because they are indeed not listed in the exclusion list.
I am a bit miffed that we're actively prevented from backing up the various Program Files folders, because I have a large number of VSTi instruments that I'll need to ensure are rcloned or something for this to work.
I'm happy to pay an annual fee for a one-size fits all approach that I don't have to think about. I read the post and I'm just saying that his blockers are not blockers for me.
I would ask you: what is the better alternative? That's not a rhetorical question; they don't have my credit card details for another two weeks.
You lose a bit of control. With S3 you can preprocess (transform, index, filter, downcode, etc) before storing. You can index metadata in place (names, sizes, metadata) for low-cost searching.
As for testing recovery, you can validate file counts, sizes + checksums without performing recovery.
A few shell scripts give you the power of advanced enterprise backup, whereas Backblaze only supports GUI restores.
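As a concrete illustration of that validation idea - the bucket name and prefix are hypothetical, and it assumes single-part, unencrypted uploads (so the S3 ETag really is the object's MD5) and no spaces in key names:

# Compare a local manifest (md5 + relative path) against S3 ETags.
BUCKET=my-backup-bucket
PREFIX=photos/

# Local side: md5 of every file, keyed by relative path
(cd ~/photos && find . -type f -exec md5sum {} +) | sort -k2 > local.md5

# Remote side: strip the quotes around ETags and align the format
aws s3api list-objects-v2 --bucket "$BUCKET" --prefix "$PREFIX" \
    --query 'Contents[].[ETag,Key]' --output text \
  | tr -d '"' | sed "s|$PREFIX|./|" | awk '{print $1"  "$2}' | sort -k2 > remote.md5

diff local.md5 remote.md5 && echo "manifests match"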
At some point, Backblaze just silently stopped backing up my encrypted (VeraCrypt) drives. Just stopped working without any announcement, warning or notification. After lots of troubleshooting and googling I found out that this was intentional from some random reddit thread. I stopped using their backup service after that.
Some companies are in the business of trust. These companies NEED to understand that trust is somewhat difficult to earn, but easy to lose and nearly IMPOSSIBLE to regain. After reading this article I will almost certainly never use or recommend Backblaze. (And while I don't use them currently, they WERE on the list of companies I would have recommended due to the length of their history.)
I noticed this (thankfully before it was critical) and I've decided to move on from BB. Easily a 10+ year customer. Totally bogus. Not only did it stop backing it up, the old history is totally gone as well.
The one thing they have to do is back up everything, and when you see it in their console you can rest assured they are going to continue to back it up.
They've let the desktop client linger; it's difficult to add meaningful exceptions. It's obvious they want everyone to use B2 now.
Not OP, but I have been using borg backup [1] against Hetzner Storage Box [2]
Borg backup is a good tool in my opinion and has everything that I need (deduplication, compression, mountable snapshots).
Hetzner Storage Box is nothing fancy but good enough for a backup, and is noticeably cheaper than the alternatives (I pay about 10 eur/month for 5TB of storage).
Before that I was using s3cmd [3] to backup on a S3 bucket.
Just a note of caution: sync != backup. When I was younger and dumber, I had my own rsync cron script to do a nightly sync of my documents to a remote server. One day I noticed files were gone from my local drive; I think there were block corruptions on the disk itself, and the files were dropped from the filesystem, or something like that. The nightly rsync propagated the deletions to the remote "backup."
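For anyone still running a bare rsync cron job as their "backup", one cheap mitigation is to keep overwritten and deleted files on the destination instead of discarding them. A sketch using rsync's own backup options (paths hypothetical; a real tool with proper history, like borg or restic, is still the better answer):

# Deletions and overwrites no longer destroy the remote copy outright;
# the previous versions get moved into a dated directory on the
# destination (a relative --backup-dir is relative to the destination).
rsync -a --delete \
      --backup --backup-dir="changed-$(date +%F)" \
      ~/documents/ backup-host:documents/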
Just this weekend, my backup tool went rogue and exhausted my quota on rsync.net (some bad config by me on Borg). Emailed them, and they promptly added 100 GB of storage for a day so that I could recover the situation. Plus, their product has been rock solid for the few years I've been using them.
rsync.net and rclone are great, my brain found restic easier to understand than borg for local backups over usb (ymmv), and plain old `rsync --archive` is most excellent wrt preserving file mod times and the like.
There is 100% a difference between "dead data" (eg: movie.mp4) and "live data" (eg: a git directory with `chmod` attributes)- S3 and similar often don't preserve "attributes and metadata" without a special secondary pass, even though the `md5` might be the same.
I have used Arq for way over a decade. It does incremental encrypted backups and supports a lot of storage providers. Also supports S3 object lock (to protect against ransomware). It’s awesome!
(Arq developer here) By default Arq tries to be unobtrusive. Edit your backup plan and slide the “CPU usage” slider all the way to the right to make it go faster.
It looks like the following line has been added to /Library/Backblaze.bzpkg/bzdata/bzexcluderules_mandatory.xml which excludes my Dropbox folder from getting backed up:
That is the exact path to my Dropbox folder, and I presume if I move my Dropbox folder this xml file will be updated to point to the new location. The top of the xml file states "Mandatory Exclusions: editing this file DOES NOT DO ANYTHING".
.git files seem to still be backing up on my machine, although they are hidden by default in the web restore (you must open Filters and enable Show Hidden Files). I don't see an option to show hidden files/folders in the Backblaze Restore app.
> .git files seem to still be backing up on my machine
Try checking bzexcluderules_editable.xml. A few years ago, Backblaze would back up .git folders for Mac but not Windows. Not sure if this is still the case.
After mucking around with various easy-to-use options, my lack of trust[1] pushed me into a more-complicated-but-at-least-under-my-control option: syncthing + restic + an s3-compatible cloud provider.
Basically it works like this:
- I have syncthing moving files between all my devices. The larger the device, the more stuff I move there[2]. My phone only has my keepass file and a few other docs, my gaming PC has that plus all of my photos and music, etc.
- All of this ends up on a raspberry pi with a connected USB harddrive, which has everything on it. Why yes, that is very shoddy and short term! The pi is mirrored on my gaming PC though, which is awake once every day or two, so if it completely breaks I still have everything locally.
- Nightly, a restic job runs, which backs up everything on the pi to an s3 compatible cloud[3] and cleans out old snapshots (30 days, 52 weeks, 60 months, then yearly; see the sketch just after this list)
- Yearly I test restoring a random backup, both on the pi, and on another device, to make sure there is no required knowledge stuck on there.
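That retention step, roughly, in restic terms - the repository URL is hypothetical, and the yearly count is arbitrary since restic's --keep-yearly wants a number:

# Prune snapshots down to the policy described above:
# 30 daily, 52 weekly, 60 monthly, then yearly ("100" as a stand-in
# for "forever"; pick your own horizon).
restic -r s3:s3.example.com/pi-backups forget \
  --keep-daily 30 --keep-weekly 52 --keep-monthly 60 --keep-yearly 100 \
  --prune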
This was somewhat of a pain to set up, but since the pi is never off it just ticks along, and I check it periodically to make sure nothing has broken.
[1] there is always weirdness with these tools. They don't sync how you think, or when you actually want to restore it takes forever, or they are stuck in perpetual sync cycles
[2] I sync multiple directories, broadly "very small", "small", "dumping ground", and "media", from smallest to largest.
[3] Currently Wasabi, but it really doesn't matter. Restic encrypts client side, you just need to trust the provider enough that they don't completely collapse at the same time that you need backups.
We need to talk about The Cone of Backups(tm), which you and I seem to have separately derived!
Props for getting this implemented and seemingly trusted... I wish there was an easier way to handle some of this stuff (eg: tiny secure key material => hot syncthing => "live" git files => warm docs and photos => cold bulk movies, isos, etc)... along with selective "on demand pass through browse/fetch/cache"
They all have different policy, size, cost, technical details, and overall SLA/quality tradeoffs.
~ 5 years ago, I had a development flow that involved a large source tree (1-10K files, including build output) that was syncthing-ed over a residential network connection to some k8s stuff.
Desyncs/corruptions happened constantly, even though it was a one-way send.
I've never had similar issues with rsync or unison (well, I have in unison, but that's two-way sync, and it always prompted to ask for help by design).
Anyway, my decade-old synology is dying, so I'm setting up a replacement. For other reasons (mostly a decade of systemd / pulse audio finding novel ways to ruin my day, and not really understanding how to restore my synology backups), I've jumped ship over to FreeBSD. I've heard good things about using zfs to get snapshots, checksumming, and send/receive replication.
So, similar to what you've landed on, but with one size tier. I have docker containers that the phone talks to for stuff like calendars, and just have the source of the backup flow host my git repos.
One thing to do no matter what:
Write at least 100,000 files to the source, then restore from backup (/ on a Linux VM is great for this). Run rsync in dry-run / checksum mode on the two trees. Confirm the metadata and contents match on both sides. I haven't gotten around to this yet with the flow I just proposed. Almost all consumer backup tools fail this test. Comments here suggest Backblaze's consumer offering fails it badly. I'm using B2, but I haven't scrubbed my backup sets in a while. I get the impression it has much higher consistency / durability.
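Concretely, that verification pass can be a one-liner; any line of output is a mismatch to investigate:

# Itemized, checksum-based dry run: compares content rather than just
# size/mtime, and --delete makes files missing from the restore show
# up too. Nothing is modified because of -n.
rsync -ain --checksum --delete /original-tree/ /restored-tree/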
I will say I specifically don't sync git repos (they are just local and pushed to github, which I consider good enough for now), and I am aware that syncthing is one more of those tools that does not work well with git.
syncthing is not perfect, and can get into weird states if you add and remove devices from it for example, but for my case it is I think the best option.
I've personally had no major issues with syncthing, it just works in the background, the largest folder I have synced is ~6TB and 200k files which is mirroring a backup I have on a large external.
One particular issue I've encountered is that syncthing 2.x does not work well for systems without an SSD, due to the storage backend switching to sqlite, which doesn't perform as well as leveldb on HDDs; scans of the 6TB folder were taking excessively long to complete compared to 1.x using leveldb. I haven't encountered any issues with mixing the use of 1.x and 2.x in my setup. The only other issues I've encountered are usually related to filename incompatibilities between filesystems.
Anecdotally, I've been managing a Syncthing network with a file count in the ~200k range, everything synced bidirectionally across a few dozen (Windows) computers, for 9 years now; I've never seen data loss where Syncthing was at fault.
Good to know. I wonder what the difference is. We were doing things like running go build inside the source directory. Maybe it can't handle write races well on Linux/MacOS?
I also have a lil script that rolls dice on a restic snapshot, then lists files and picks a random set to restore to /dev/null.
I trust that restic's checksums will actually verify the restore is correct, but this way a random part of the storage gets tested every so often, in case some old pack file has been damaged.
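A minimal sketch of that dice roll - it assumes jq and GNU shuf are installed and RESTIC_REPOSITORY/RESTIC_PASSWORD are set; restic dump emits a tar when the picked path is a directory, which exercises the pack data just the same:

# Pick a random snapshot, then a random path inside it, and stream it
# to /dev/null so restic must actually read and verify the pack files.
snap=$(restic snapshots --json | jq -r '.[].short_id' | shuf -n 1)
path=$(restic ls "$snap" | shuf -n 1)
restic dump "$snap" "$path" > /dev/null && echo "OK: $snap $path"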
I once had to restore around 2 TB of RAW photos.
The app was a mess. It crashed every few hours. I ended up manually downloading single folders over a timespan of 2 weeks to restore my data. Support only apologized and could not help with my restore problem. After this I cancelled my subscription immediately and use local drives for my backups now, drives which I rotate (in use and location).
I restored a 2 TB drive via the net no problem from them some years back, although I didn't use the client, but downloaded one massive ZIP file from the web interface.
Well, "no problem" is an overstatement. Once you need a restore, you learn that their promise of end-to-end encryption is actually a lie. (As in, you have to break the end-to-end encryption to restore since everything has to be decrypted on their servers.)
I can almost almost understand the logic behind not backing up OneDrive/Dropbox. I think it's bad logic but I can understand where it's coming from.
Not backing up .git folders however is completely unacceptable.
I have hundreds of small projects where I use git to track history locally, with no remote at all. The intention is never to push it anywhere. I don't like to say these sorts of things, and I don't say it lightly: someone should be fired over this decision.
I had a back and forth with them about .git folders a couple of years back and their defence was something like "we are a consumer product - not a professional developer product. Pay for our business offering"
But if that's truly their stance, then they are being deceptive about their non-business offering at the point of sale.
EDIT - see my other comment where I found the actual email
Well I do pay for their business product, I have a "Business Groups" package with a few dozen endpoints all backing up for $99/year per machine.
According to support's reply just now, my backups are crippled just like every other customer. No git, no cloud synced folders, even if those folders are fully downloaded locally.
(This is also my personal backup strategy for iCloud Drive: one Mac is set to fully download complete iCloud contents, and that Mac backs up to Backblaze.)
> My first troubling discovery was in 2025, when I made several errors then did a push -f to GitHub and blew away the git history for a half decade old repo. No data was lost, but the log of changes was.
I know this is besides the point somewhat, but: Learn your tools people. The commit history could probably have been easily restored without involving any backup. The commits are not just instantly gone.
Indeed, the commits and blobs might even still have been available on the GitHub remote. I'm not sure if they clean them up on some interval, but a bunch of the stuff you "delete" from git stays on the remote regardless of what you push.
AFAICT Backblaze does back up .git directories. I have many repos backed up. The .git directory is hidden by default in the web UI (along with all other hidden files), but there is an option to show them.
You should try downloading one of your backed up git repos to see if it actually does contain the full history, I just checked several and everything looks good.
> changes within .git directories occur far too often and over so many files that the Backblaze software simply would not be able to keep up.
I don’t really understand that. I’m using Windows File History, and while it’s limited to backing up changes only every 15 minutes, and is writing to a local network drive, it doesn’t seem to have any trouble with .git directories.
Thanks. Silently ignoring .git folders would be much more egregious than not backing up cloud drives in my opinion. The latter is at least somewhat understandable, though they should have been more transparent about it.
A lot of personal "nerd" options are listed in the thread (and the likes of restic/borg are really good!), but nothing really centralized. Backblaze was a great fire-and-forget option for deploying as a last-resort backup. I don't think there are any competitors in that space if you are looking for continuous backup, centralized management, and good pricing that doesn't require talking to a salesperson to get going and is pay as you go.
It's ironic that Backblaze themselves wrote a blog post a couple of years ago explaining why Dropbox isn't enough as a backup service and you need Backblaze as an additional layer of protection: https://www.backblaze.com/blog/whats-wrong-with-google-drive...
Everyone is acting like this is obviously wrong, and they clearly should have communicated the change and made it visible in the exclusion settings.
However, there is a very good reason for not backing up what is in effect network attached storage. Particularly for OneDrive, as it often adds company SharePoint sites you open files from as mountpoints under your OneDrive folder (business OneDrive is basically a personal SharePoint site under the hood). Trying to back them up would result in downloading potentially hundreds of gigabytes of files to the desktop only to then reupload them to the backup provider. That would also likely trigger data exfiltration flags at your corporate IT.
A Dropbox/OneDrive/Drive/etc folder is a network mount point by another name. (Many of them are now implemented as FUSE mounts or the equivalent OS API, not folders on disk.) It's fundamentally reasonable for software that promises to back up the local disk not to back up whatever network drives you happen to have signed in/mounted.
Ironically, Dropbox and OneDrive folders I can still somewhat understand, as they are "backed up" in other ways (though potentially not reliably, so I also understand why people do not like that).
But .git? Having one does not mean you have it synced to GitHub or anywhere else reliable.
If you do anything special, then back up only the .git folder and not the checkout.
But backing up the checkout and not the .git folder is crazy.
I have multiple drives that started out as their own os. Each of them has a Dropbox folder in the standard location. Each of them has a different set of files in them (I deduped at one point), with some overlap of different versions. I no longer use Dropbox, so none of these are synced anywhere.
They don't need to be in my case, I'm only using them now because of existing shortcuts and VM shares and programs configured to source information from them. That doesn't mean I don't want them backed up.
Same for OneDrive: Microsoft configured my account for OneDrive when I set it up. Then I immediately uninstalled it (I don't want it). But I didn't notice that my desktop and documents folders live there. I hate it. But by the time I noticed it, it was already being used as a location for multiple programs that would need to be reconfigured, and it was easier to get used to it than to fix it. Several things I've forgotten about would likely break in ways I wouldn't notice for weeks/months. Multiple self-hosted servers for connecting to my android devices would need to reindex (Plex, voidtools everything, several remote systems that mount via sftp and connected programs would decide all my files were brand new and had never been seen before)
Normally these folders are synced to Dropbox and/or OneDrive;
both services have internal backups to reduce the chance they lose data, and
both services allow some limited form of "going back to an older version" (as the article itself states).
Just because the article says "sync is not backup" doesn't mean that's true. I mean, it literally is a backup by definition: it makes a copy in another location and even has versioning.
It's just not a _good enough_ backup by their standards, maybe even by the standards of most people on HN. But out there, many people are happy with far worse backups, especially wrt versioning: for a lot of (mostly static) media, the only reason you need version rollback is a corrupted version getting backed up. And a lot of people mostly back up personal photos/videos and important documents, all static by nature.
Though:
1. It doesn't really fulfill the 3-2-1 rule; it's only 2-1-1 (local, one backup in the MS/Dropbox cloud, one offsite). Before, when it was also backed up to Backblaze, it was 3-2-1 (kinda). So them silently stopping is still a huge issue.
2. Newer readings of the 3-2-1 rule also say to treat the 2 not just as 2 backups, but as 2 "vendors/access accounts". With the OneDrive folder being pretty much OneDrive-controlled, this is 1 vendor across local and all backups, which is risky.
> You are using it to mean "maintaining full version history", I believe?
No, they are using it to mean "backed up". Like, "if this data gets deleted or is in any way lost locally, it's still backed up remotely (even years later, when finally needed)".
I’m astonished so many people here don’t know what a backup is! No wonder it’s easy for Backblaze to play them for fools.
But isn't that exactly what Dropbox does? If I delete a file on my PC, I can go to Dropbox.com and restore it, to some period in the past (I think it depends on what you pay for). In fact, I can see every version that's changed during the retention period and choose which version to restore.
Maintaining version history out to a set retention period is a backup...no?
The definition of the term "backup" in most sources is along the lines of:
> a copy of information held on a computer that is stored separately from the computer
There is nothing in there about _any_ versioning, or duration requirements, or similar.
To use your own words, I fear it's you who doesn't know what a backup is, assuming a lot of other additional (often preferable(1)) things are part of that term.
Which is a common problem, not just for the term backup.
There is a reason lawyers define technical terms in a precise, contract-specific way when drafting contracts.
Or just requirements engineering: fail there and you might end up with a backup of all your company's important data that is still susceptible to file-encrypting ransomware or similar.
---
(1): What is often preferable is also sometimes the thing you really don't want. Like, sometimes keeping data around too long is outright illegal. Sometimes that applies only to older versions. And sometimes just some short-term backups are more than enough for your use case. The point here is that the term backup can't mean what you imply it does, because a lot of existing use cases are incompatible with that.
Microsoft makes no guarantees on OneDrive; you are responsible for backing up that data. Of course they try hard to keep it safe, but contractually they make no promises.
Both Dropbox and OneDrive default to "online first" for most users (including Dropbox on macOS which has moved itself into File Provider). It is a technically sound and sane default for Backblaze to ignore these mounts, especially given their policy not to backup network drives. They really should have informed legacy users about it.
Technically speaking, imagine you're iterating over a million files and some of them are 1000x slower than the others; it's not Backblaze's fault that things have gone this way. Avoiding files that are well-known network mount points is likely necessary for them to be reliable at what they do for local files.
It's important to recognize that these new OS-level filesystem hooks are slow and inefficient - the use case is opening one file and not 10,000 - and this means that things you might want to do (like recursive grep) are now unworkably slow if they don't fit in some warmed-up cache on your device.
To fix it, Backblaze would need a "cloud to cloud" backup that is optimized for that access pattern, or a checkbox (or detection system) for people who manage to keep a full local mirror in a place where regular files are fast. This is rapidly becoming a less common situation. I do, however, think that they should have informed people about the change.
1. You have to check "show hidden files" in the web ui (or the app) when restoring and
2. If you restore a folder that has a '.git' folder inside of it (by checking it in the ui) but you DID NOT check "show hidden files", then the '.git' (or any other hidden file/folder) does not get restored.
Which is.. unexpected: if I check a folder to restore, I expect *everything* inside of it to be restored.
But the dropbox folder is, in fact, not there. Which is a surprise to me as well. :(
I highly recommend switching to something more like Arq and then using whatever backend storage that you want. There are probably some other open source ways to do it, etc, but Arq scratches the itch of having control over your backups and putting them where you want with a GUI to easily configure/keep track of what is going on.
Maybe there's something newer/better now (and I bought lifetime licenses of it long ago), but it works for me.
That said, I use Arq + Backblaze storage and I think my monthly bill is very low, like under $5. Though I haven't backed-up much media there yet, but I do have control over what is being backed-up.
Yeah this is the core problem with how most backup tools handle Dropbox / iCloud / OneDrive now.
Those folders aren’t really “normal files” anymore — a lot of the time they’re just placeholders, and touching them can trigger downloads or other weird behavior depending on the client.
That said, just skipping the entire folder is kind of the worst possible outcome. Backup should be predictable. If something is on disk, it should get backed up. If it’s not, you should at least know that, not find out later when you need it.
I’ve been working on Duplicati (https://github.com/duplicati/duplicati) and one thing we’ve tried to be careful about is not silently ignoring data. If something can’t be backed up, it should be visible to the user.
Feel free to reach out to me if you have any questions about setting up duplicati.
For those looking for something at a decent price for up to 5TB, take a look at JottaCloud, which is supported by rclone, and then you can layer restic on top for a complete backup solution.
JottaCloud is "unlimited" for $11.99 a month (your upload speed is throttled after 5TB).
I've been using them for a few years for backing up important files from my NAS (timemachine backups, Immich library, digitised VHS's, Proxmox Backup Server backups) and am sitting at about 3.5TB.
WJW. This sort of blanket policy change should be called out in ALL CAPS, bold-faced, and underlined, as it changes one of the implicit assumptions of the service's execution.
The technical and performance implications of backing up cloud mount points are real, but that's zero excuse for the way this change was communicated.
This is a royal screw-up in corporate communications and I would not be surprised if it makes a huge negative impact in their bottom line and results in a few terminations.
I think this is a risk with anything that promotes itself as "unlimited", or otherwise doesn't specify concrete limits. I'm always sceptical of services like this as it feels like the terms could arbitrarily change at any point, as we've found out here.
(as a side note, it's funny to see them promoting their native C app instead of using Java as a "shortcut". What I wouldn't give for more Java apps nowadays)
On the topic of backing up data from cloud platforms such as OneDrive, I suspect this is to stop the client machine from actively downloading 'files on demand', which are just pointers in Explorer until you go to open them.
If you've got huge amounts of files in OneDrive and the backup client starts downloading every one of them (before it can reupload them again), you're going to run into problems.
This is a pain, to be sure, but surely there is some sort of logic you could implement to detect whether a file is a Real File that actually exists on the device (if so, back it up) or a pointer to the cloud (ignore it by default, probably, but maybe provide a user setting to force it to back up even these).
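As a rough illustration only - a crude heuristic, not what any vendor actually ships; on Windows the proper signal is the offline/recall-on-data-access file attributes, and this sketch uses GNU stat on a hypothetical folder:

# Flag files whose apparent size is large but which occupy almost no
# disk blocks; in a cloud-synced folder those are likely placeholders.
find ~/OneDrive -type f | while IFS= read -r f; do
  size=$(stat -c '%s' "$f")                 # apparent size in bytes
  alloc=$(( $(stat -c '%b' "$f") * 512 ))   # allocated bytes on disk
  if [ "$size" -gt 4096 ] && [ "$alloc" -lt $((size / 2)) ]; then
    echo "placeholder? $f"
  fi
done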
It used to be the case that placeholder files were very obvious, but now OneDrive and iCloud (possibly others) work more like attached network storage with a local cache. That was a good move for most programs, because back then a file being evicted from storage looked like a file deletion.
Came here to say this. Files in OneDrive get removed from your local storage and are downloaded ON DEMAND. Given that you can have a 1TB+ OneDrive folder, Backblaze downloading all of that is gonna throttle your connection and fill up your disk real fast.
I think this should not be attributed to malice, however unfortunate. I had also developed a sync app once, and OneDrive folders were indeed problematic, causing cyclic updates on access and random metadata changes for no explicit reason.
Complete lack of communication (outside of release notes, which nobody really reads, as the article too states) is incompetence and indeed worrying.
Just show a red status bar that says "these folders will not be backed up anymore", why not?
If the constant meta changes (or other peculiarities involving those folders) make the sync unusable, then it can be both. In that case, you stop syncing and communicate.
So my idea is that it's a competency problem (lack of communication), not malice. But it's just a theory, based on my own experience.
In any case, this is a bad situation, however you look at it.
I've been on Backblaze for a few years now, ever since Crashplan decided it didn't want individuals to use its service any more.
It's always been just janky. A bad app that constantly throws low disk warnings and opens a webpage if you click anywhere on it. Being told the password change dialogue in the app doesn't work and having to use the website etc etc.
Just all round not an experience that inspires confidence. In comparison, Crashplan just worked.
Unrelated to the main point, and probably too late to matter, but you can access repo activity logs via Github's API. I had to clean up a bad push before and was able to find the old commit hash in the logs, then reset the branch to that commit, similarly to how you'd fix local messes using reflog.
Use restic with resticprofile and you won't need anything else. Point it to a Hetzner storagebox, the best value you can get. Don't trust fisher price backup plans
Isn't it challenging to back up a directory that's being synced with a 3rd party service? Especially if more than one person may be working on one of those files in OneDrive or DropBox?
I think the target of the anger here should be (at least in part): OneDrive.
My understanding is that a modern, default onedrive setup will push all your onedrive folder contents to the cloud, but will not do the same in reverse -- it's totally possible to have files in your cloud onedrive, visible in your onedrive folder, but that do not exist locally. If you want to access such a file, it typically gets downloaded from onedrive for you to use.
If that's the case, what is Backblaze or another provider to do? Constantly download your onedrive files (that might have been modified on another device) and upload them to backblaze? Or just sync files that actually exist locally? That latter option certainly would not please a consumer, who would expect the files they can 'see' just get magically backed up.
It's a tricky situation and I'm not saying Backblaze handled it well here, but the whole transparent cloud storage situation thing is a bit of a mess for lots of people. If Dropbox works the same way (no guaranteed local file for something you can see), that's the same ugly situation.
Most have pointed out that the OneDrive exclusion makes sense due to its complexity. But I see no one here defending the undocumented .git exclusion. That's pretty egregious - if I'm backing up that directory it's always 100% intentional, and it definitely feels like a sacrifice of product functionality for stability and performance. Not documenting it just twists the knife.
If you want to access your file, it gets downloaded. If Backblaze wants to check if your file has been changed, it doesn’t need to have the file downloaded - that’s what modification time is for. And file size.
Time Machine has a similar issue. OneDrive silently corrupted hundreds of my files, replacing their content with binary zeros while retaining the original file size. I have Time Machine backups going back years, but it turns out TM does not back up cloud files, even if you have them pinned to local storage! So I lost those files, including some irreplaceable family photos.
I’ve added restic to my backup routine, pointed at cloud files and other critical data
This is really disturbing to hear as I've incorporated B2 into a lot of my flow for backups as well as a storage backend for Nextcloud and planned as the object store for some upcoming archival storage products I'm working on.
I know the post is talking about their personal backup product but it's the same company and so if they sneak in a reduction of service like this, as others have already commented, it erodes difficult-to-earn trust.
I've been very content moving away from OneDrive/GDrive to a personal NAS setup with Synology/Ugreen. You can access a shared drive/photo drive and use Tailscale to mount your volume from anywhere.
I've also configured encrypted cloud backups to a different geographic region and off-site backups to a friend's NAS (following the 3-2-1 backup rule). It does help having 2.5Gb networking as well, but owning your data is more important in the coming age of sloppy/degrading infrastructure and ransomware attacks.
I've been using it for years, and the one time I needed to restore a file, I realized that VMware VM files were excluded from the backup. There are so many exclusions that I started doing physical backups again.
I assume they do some form of de-duplication across all files in their system. Most windows system files, and binaries would be duplicates, and only need to be stored once. I'm relatively sure this is true for most other systems, like Linux, MacOS, etc. Why not just back everything up for everyone?
It really shouldn't take up much more space or bandwidth.
Personally: I had to go in and edit the undisclosed exclusions file, and restart the backup process. I've got quite a few gigabytes of upload going now.
Having the option to not do that is great; we did it for our backup system, because our cloud stores were backed up separately.
Doing it silently is a disaster.
Making the excludes that do it hidden from the UI is outright malice, because it's far too easy to assume those would just be added as normal excludes and then go "huh, I probably just removed those from the excludes when I set it up".
It seems to me that Backblaze does NOT exclude ".git". It's not shown by default in the restore UI -- you must enable "show hidden files" to see it -- but it's there. I just did a test restore of my top-level Project directory (container for all of my personal Git projects) and all .git directories are included in the produced .zip file.
At least at the enterprise level, I've never seen anyone use Backblaze for this. You want to use a cloud level backup like Rubrik/Veeam/Cohesity. Trying to back up cloud based files locally is a fools errand. Granted it sucks that they dropped this without proper communication, but it's already a bad solution.
Thanks for publicising. I recently decided not to renew my Backblaze in favour of 'self hosting' encrypted backups outside the US. But I was horrified to learn that my git repos may not have been backed up, nor my Dropbox, whose subscription I also recently cancelled. Good riddance.
My experience using restic has been excellent so far, snapshots take 5 mins rather than 30 mins with backblaze's Mac client. I just hope I can trust it…
Glad I switched from their personal computer backup to using restic + B2 a while ago. Every night my laptop and homelab both back up to each other and to B2. It takes less than a minute and I have complete control over the exclusions and retention. And I can easily switch off B2 to something else if I want.
I was using Restic + B2 for a while, but recently switched to Restic + Hetzner Storage Box.
Storage Box is a little more effort to set up since it doesn't provide an S3 interface and I instead had to use WebDAV, but it's more affordable and has automated snapshots that add a layer of easy immutability.
I would love to see a summary of all of the various options being bandied about.
There are 2 components in my mind: the backup "agent" (what runs on your laptop/desktop/server) and the storage provider (which BB is in this context).
What do people recommend for the agent? (I understand some storage providers have their own agents) For Linux/MacOS/Windows.
What do people recommend for the storage provider? Let's assume there are 1TB of files to be backed up. 99.9% don't change frequently.
I feel that's a systemic problem with all consumer online-backup software: They often use the barest excuse to not back things up. At best, it's to show a fast progress bar to the average user, and at worst it's to quietly renege on the "unlimited" capacity they promised when they took your money. [1]
Trying to audit—let alone change—the finer details is a pain even for power users, and there's a non-zero risk the GUI is simply lying to everybody while undocumented rules override what you specified.
When I finally switched my default boot to Linux, I found many of those offerings didn't support it, so I wrote some systemd services around Restic + Backblaze B2. It's been a real breath of fresh air: I can tell what's going on, I can set my own snapshot retention rules, and it's an order of magnitude cheaper. [2]
____
[1] Along the lines of "We have your My Documents. Oh, you didn't manually add My Videos or My Music for every user? Too bad." Or in some cases, certain big-file extensions are on the ignore list by default for no discernible reason.
[2] Currently a dollar or two a month for ~200gb. It doesn't change very much, and data verification jobs redownload the total amount once a month. I don't back up anything I could get from elsewhere, like Steam games. Family videos are in the care of different relatives, but I'm looking into changing that.
Yes, you're exactly right. Once they decide to exclude certain filetypes, it puts the burden on the end users, who are unequipped to monitor these changes.
With restic I don't need some kind of special server daemon on the other end, I can point my backup destination to any mountable filesystem, or relatively dumb "bucket" stores like S3 or B2. I like having the sense of options and avoiding lock-in. [1]
As for GUIs in general... Well, like I said, I just finished several years of bad experiences with some proprietary ones, and I wanted to see and choose what was really going on.
At this point, I don't think I'd ever want a GUI beyond a basic status-reporting widget. It's not like I need to regularly micromanage the folder-set, especially when nobody else is going to tweak it by surprise.
_____
[1] The downside to the dumb-store is a ransomware scenario, where the malware is smart enough to go delete my old snapshots using the same connection/credentials. Enforcing retention policies on the server side necessarily needs a smarter server. B2 might actually have something useful there, but I haven't dug into it.
Same, I use Restic + Backrest (plus monitoring on Healthchecks, self-hosted + Prometheus/AlertManager/Pushover), with some decent structure - local backups every half-an-hour to raid1, every hour a backup to my old NAS, every day a backup to FTP in Helsinki, and once a week some backups to Backblaze (via Restic). Gives me local backups, observability, remote backups spread across different providers - seems quite safe :) I highly recommend to everyone figuring out a good backup strategy, takes a day or two.
Edit: on top of that I've built a custom one-page monitoring dashboard, so I see everything in one place (https://imgur.com/B3hppIW) - I'll open-source it, it's a decent architecture, I just need to clean up some secrets from Git history...
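The Healthchecks wiring, for anyone curious, can be as small as a curl after the backup command (the check UUID below is a placeholder); a missing or failed ping is what raises the alert:

# Ping /fail on error so the dashboard distinguishes "backup failed"
# from "backup never ran"; a missing success ping alerts either way.
if restic backup ~/documents; then
  curl -fsS -m 10 --retry 3 https://hc-ping.com/<check-uuid>
else
  curl -fsS -m 10 --retry 3 https://hc-ping.com/<check-uuid>/fail
fi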
What cloud backend are people using for restic? B2/S3/something else? I'm still just backing up to other machines using it (though I'd also heavily recommend restic)
I run restic with rclone, which is compatible not only with S3-like storage (which includes many providers, like Hetzner, OVH, Exoscale) but many others, from Mega to pCloud to Google Drive.
For stuff I care about (mostly photos), I back them up on two different services. I don't have TBs of those, so it's not very expensive. My personal code I store on git repositories anyway (like SourceHut or Codeberg or sometimes GitHub).
They need to be reminded they are a utility. It's not up to them to have an opinion about my data. "Utilities" with an opinion carry a lot more liability.
Anyone have suggestions for backing up Google Drive + local files? I keep reading the horror stories about people getting locked out of cloud services, and worry about my 20 years of history stored in Drive. Less worried about local files which are sync'd to an external disk, but it'd be nice to have something in place for everything.
I assume when asking such a question, you expect an honest answer like mine:
rclone is my favorite alternative. It supports encryption seamlessly and is loaded with features. Plus I can control exactly what gets synced/backed up and when it happens, and I pay for what I use (no unsustainable "unlimited" storage that always comes with annoying restrictions). There are never any surprises (which I experienced with nearly every other backup solution). I use Backblaze B2 as the backend. I pay like $50 a month (which I know sounds high), but I have many terabytes of data up there that matter to me (it's a decade or more of my life and work, including long videos of holidays like Christmas with my kids throughout the years).
For super-important stuff I keep a tertiary backup on Glacier. I also have a full copy on an external harddrive, though those drives are not very reliable so I don't consider it part of the backup strategy, more a convenience for restoring large files quickly.
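For reference, the client-side encryption in rclone is just a "crypt" remote layered over the storage remote. A sketch of what the config could look like - the remote names and bucket are hypothetical:

# ~/.config/rclone/rclone.conf
[b2raw]
type = b2
account = <key id>
key = <application key>

[backup]
type = crypt
remote = b2raw:my-backup-bucket
password = <obscured password generated with `rclone obscure`>

After which something like `rclone sync ~/important backup:important` uploads everything encrypted, while `rclone ls backup:` still shows readable names on your side.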
Backblaze's B2 storage is fine if used with a separate app over which you have more control. Others here have mentioned Arq. I have used it, as well as Kopia[0] and Blinkdisk[1] (Blinkdisk is essentially Kopia but with a nicer UI). Can recommend all three highly; the latter two are FOSS.
I was waiting for some kind of pop up, so I could click "Deny all" and then the text would be readable. But no. It just stayed essentially greyed out, like reading it is an invalid option (which turns out to be the case).
> I made several errors then did a push -f to GitHub and blew away the git history for a half decade old repo. No data was lost, but the log of changes was. No problem I thought, I’ll just restore this from Backblaze.
`git reflog` is your friend. You can recover from almost any mistake, including force-pushed branches.
This is an absolutely massive loss for me. I had no idea it wasn't backing up my OneDrive files. A horrible way to find out and a massive loss of trust.
This is why I use Arq with Backblaze. They just see a bunch of encrypted files with random GUID filenames. They don't need to know what I'm backing up, just that I am backing it up.
Hetzner storagebox. 1TB for under 5 bucks/month, 5TB for under 15. Sftp access. Point your restic there. Backup game done, no surprises, no MBAs involved.
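Setup is about two commands once the box exists - the username/host follow Hetzner's hypothetical uXXXXXX pattern, and restic speaks sftp natively:

# One-time repository creation, then backups over plain sftp.
restic -r sftp:u123456@u123456.your-storagebox.de:restic-repo init
restic -r sftp:u123456@u123456.your-storagebox.de:restic-repo backup ~/documents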
Hetzner Storage Box is not even remotely a similar product to B2, considering the only reliability offered is the disks being in RAID. There is no geo-redundancy at all. The number of simultaneous connections is also quite limited, though this might not matter for many use cases.
Self-hosting offsite is hard.
Accessing services via standard protocols like ssh/webdav and just pushing your encrypted blobs there is a good middle ground.
They can't control what you upload, and you can easily point your end-point somewhere else if you need to move.
* S3 is super expensive, unless you use Glacier, but that has a high overhead per file, so you should bundle them before uploading.
* If you value your privacy, you need to encrypt the files on the client before uploading.
* You need to keep multiple revisions of each file, and manage their lifecycle. Unless you're fine with losing any data that was overwritten at the time of the most recent backup.
* You need to de-duplicate files, unless you want bloat whenever you rename a file or folder.
* Plus you need to pay for Amazon's extortionate egress prices if you actually need to restore your data.
I certainly wouldn't want to handle all that on my own in a script. What can make sense is using open source backup software with S3/R2/B2 as backing storage.
In terms of software I've been impressed by restic, and, as a developer who wants to be able to not back up gitignored files, by the rustic clone of restic.
In terms of cloud storage... well, I was using Backblaze's B2, but the issues here are definitely making me reconsider doing business with the company, even if my use of it is definitely not impacted by any of them.
I like how you can set multiple keys (much like LUKS) so that the key used by scheduled backups can be changed without messing with the key that I have memorized to restore with when disaster strikes.
It also means you can have multiple computers backing up (sequentially, not simultaneously) to the same repository, each with their own key.
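That key management lives under `restic key` (the repository path is hypothetical):

# Each key independently unlocks the repository's master key, so the
# scheduled-backup key can be rotated without touching the one you
# memorized for disaster recovery.
restic -r /mnt/backup key add           # prompts for an existing and a new password
restic -r /mnt/backup key list
restic -r /mnt/backup key remove <ID>   # drop a rotated/compromised key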
I'm a Backblaze user -- multiple machines, multiple accounts. I'm going to be dropping Backblaze over this change, which I'm only learning about from this thread.
That's pretty crazy because I just set up personal backups with a different service (rsync.net, I was already using it for WP website backups) and my git folders were literally my first priority
I found out the hard way that backblaze just deletes backed up data from external hard drives that haven't been connected in a while. I had like 2TB total.
I like backblaze for backups, but I use restic and b2. You get what you pay for. Really lame behavior from backblaze as I always recommended their native backup solution to others and now need to reconsider.
Time to make the move over to linux and use Duplicati with Backblaze or any other bucket. You get the benefit of encrypted backups, have more control over what to back up, and will be notified upon failure.
not helpful for non-mac users, but i really like the way arq separates the backup utility from the backup location. I feel like the reason backblaze did this was to save money on "unlimited" storage and the associated complexity of cloud storage locations.
If this is true, I'll need to stop using Backblaze. I have been relying on them for years. If I had discovered this mid-restore, I think I would have lost my mind.
Well shit. If this is right, I'm dropping Backblaze and recommending all my friends/customers do the same. I pay for and rely on Backblaze as the "back up everything" they advertise.... to silently stop backing up the vast majority of my work is unacceptable!
I only use Backblaze as a cold storage service so this doesn't affect me but it's worth knowing about changes in the delivery of their other services as it might become widespread
The article links to a statement made by Backblaze:
"The Backup Client now excludes popular cloud storage providers [...] this change aligns with Backblaze’s policy to back up only local and directly connected storage."
I guess Windows 10 and 11 users aren't backing up much to Backblaze, since Microsoft is tricking so many into moving all of their data to OneDrive.
Not backing up cloud is a good default. I have had people complain about performance when they connected to our multi-TB shared drive, because their backup software fetched everything. There are of course reasons to back that up, and I am not belittling them, but not for people who just want temporary access to some 100GB files, i.e. most people in my situation.
This is terrifying. Aren't Backblaze users paying per-GB of storage/transfer? Why should it matter what's being stored, as long as the user is paying the costs? This will absolutely result in permanent data loss for some subset of their users.
I hope Backblaze responds to this with a "we're sorry and we've fixed this."
I just looked in my Backblaze restore program, and all my .git folders are in there. I did have to go to the Settings menu and toggle an option to show hidden files. This is the Mac version.
I was always roughly of the mind that Backblaze was just too close to "if it's too good to be true, it probably is"; seems like that may have been a good instinct.
I recently stopped using Backblaze after a decade because it was using over 20GB of RAM on my machine. I also realized that I mostly wanted it for backing up old archival data that doesn’t change ever really. So I created a B2 bucket and uploaded a .tar.xz file.
restic with Cloudflare R2 (safety) or the new Hetzner storage boxes (cost effectiveness) is almost cheaper than Backblaze 'unlimited', with full control and 'unlimited' history.
I still like Backblaze; they were nice back in the days when I was running Windows. Their desktop app is probably one of the best in the scene.
> My first troubling discovery was in 2025, when I made several errors then did a push -f to GitHub and blew away the git history for a half decade old repo
git reflog is your "backup": it contains every commit and the resulting log (DAG) going back 90 days. If you do blow away a remote commit, don't fret; it's in your reflog.
# list all of the remote HEAD commits you've ever worked with
git reflog origin/master
# double check it's the right one
git log -5 origin/master@{2}
# reset the remote to the right one
git push -f origin $(git rev-parse origin/master@{2}):master
# (optional) reset your local branch
git reset origin/master@{2}
# at this point your local branch has time-traveled, but your working dir will be in the present state (e.g. all the relevant files will show as changed)
This "let's not back up .git folders" thing bit me too. I had reinstalled windows and thought "Eh, no big deal, I'll just restore my source code directory from Backblaze". But, of course, I'm that kind of SWE who tends to accumulate very large numbers of git repositories over time (think hundreds at least), some big, some small. Some are personal projects. Some are forks of others. But either way, I had no idea that Backblaze had decided, without my consent, to not back up .git directories. So, of course, imagine how shocked and dismayed I was when I discovered that I had a bunch of git repositories which had the files at the time they were backed up, but absolutely no actual git repo data, so I couldn't sync them. At all. After that, I permanently abandoned Backblaze and have migrated to IDrive E2 with Duplicati as the backup agent. Duplicati, at least, keeps everything except that which I tell it not to, and doesn't make arbitrary decisions on my behalf.
Windows is constantly pushing my wife and inlaws to move all their files to OneDrive while Backblaze is no longer backing up OneDrive. There are similar things going on with Apple and iCloud.
What is the point of Backblaze at all at this point? If you are a consumer, all your files are probably IN OneDrive or iCloud or soon will be.
I use Backblaze to back up my gaming PC. While .git and Dropbox do not affect me, it's worrisome that OneDrive is not backed up, seeing as Windows 11 somehow automatically/dark-pattern stores local files in OneDrive.
You have to give Apple credit, they nailed Time Machine. I have fully restored from Time Machine backups when buying new Macs more times than I can count. It works and everything comes back to an identical state of the snapshot. Yet, Microsoft can’t seem to figure this out.
I use Kopia Backup software, sending all my important files to a compatible S3 bucket, using retention-mode: compliance as ransomware protection. I have access to every incremental snapshot Kopia makes using kopia-ui.
Holy Hannah, this is such bullshit from Backblaze. Both the .git directory (why would I not SPECIFICALLY want this backed up for my projects?) and the cloud directories.
I get that changing economics make it more difficult to honor the original "Backup Everything" promise but this feels very underhanded. I'll be cancelling.
The only right approach these days is a vps with a zfs partition with auto-snapshots, compression, and deduplication on, and a syncthing instance running. Everything else is bound to lose money and/or data (a comment mentions they lost a file and got 3 whole months FREE).
If the user didn't put it there, it's hidden. Nobody routinely inspects the detailed configuration settings of their backup system, especially when it does appear to be working if you see it transferring data to the cloud and spot-check a file or two.
Any addition to the exclusions list that wasn't added by explicit user action is a hidden change and a data loss bug.
Backblaze's personal backup solution is a mess in general. The client is clearly a giant pile of spaghetti code and I've had numerous issues with it; trying to figure out and change which files it does and doesn't back up is just one of them.
The configuration and logging formats they use are absolutely nonsensical.
I've recently been looking for online backup providers and Backblaze came highly recommended to me - but I think after reading this article I'll look elsewhere because this kind of behavior seems like the first step on the path of enshittification.
ANY company, and I do mean any, that offers "unlimited" anything is 100% a scam. At best it's a temporary growth hack to entice people who haven't had technology rug-pulls yet. And when profits dwindle and the S curve nears its upper plateau, you can guarantee that "unlimited" will get hidden restrictions, exclusions, "terms of service" changes, nebulous fair use policies that aren't fair, and more dark patterns. And every one of them is "how do we worsen unlimited to make more money on captive customers?"
We're also seeing this play out in real time with Anthropic with their poop-splatter-llm. They've gone through like 4 rug-pulls, and people STILL pay $200/month for it. Every round, their unlimited gets worse and worse, like I outlined above.
Pay as you go is probably the fairest. But SaaS providers reallllllly hate providing direct and easy-to-use tools to identify costs, or <gasp> limit them. A storage/backup provider could easily show this. LLM providers could show near-realtime token utilization.
But no. Dark patterns, rug-pulls, and "I am altering the deal, pray I do not alter it further".
Surprisingly, only the headings (2.05) and links (3.72) fail the Firefox accessibility check; the body text passes at 5.74 (WCAG AA wants 4.5:1 for normal-size text and 3:1 for large text). But subjectively it seems worse, and I definitely agree with you that the contrast is too low.
Contrast looks good for the text, but the font used has very thin lines. A thicker font would have been readable by itself. At 250% page zoom it's good enough, if you don't enable the browser built-in reader mode.
I wonder if it's because of the font-weight being decreased. If I disable the `font-weight` rule in Firefox's Inspector the text gets noticeably darker, but the contrast score doesn't change. Could be a bad interaction between anti-aliasing and thin text that the contrast checker isn't able to pick up.
I'd say it looks pretty readable on Android, although I still wouldn't describe it as good. I wouldn't say I feel encouraged to squint. But possibly different antialiasing explains it.
I think the accessibility checks only take into account the text color, not the actual real world readability of given text which in this case is impossible to read because of the font weight.
Like this, by word of mouth. That’s how Apple has done UI design since they stopped printing paper manuals.
- ctrl-shift-. to show hidden files on macOS
- pull down to see search box (iOS 18)
- swipe from top right corner for flashlight button
- swipe up from lower middle for home screen
> I have a gesture for whoever decided "find in page" should go under share.
You can also just type your search term into the normal address bar and there's an item at the bottom of the list for "on this page - find <search>". I'd never even seen the find-in-page button under share.
Not restricted to Apple, but TIL: Double-clicking on a word and keeping the second click pressed, then dragging, allows you to select per word instead of per character.
I found this to be a common theme in web design a while back, and in part led to an experiment developing a newspaper/Pocket-like interface to reading HN. It's not perfect, but is easier on the eyes for reading... https://times.hntrends.net/story/47762864
I'm also pretty sure 14-point font is a bit outdated at this point; 16 should probably be the minimum with current screens. It's not as if screens aren't wide enough to fit bigger text.
Haha I keep forgetting that. Fortunately the browser remembers my zoom settings per page. I'm pretty sure the font is now at 16 or something via repeated Cmd +.
10 point at 96 dpi or with correctly applied scaling is very readable. But some toolkits like GTK have huge paddings for their widgets, so the text will be readable, but you’ll lose density.
macOS/iOS Safari and Brave browsers have "Reader mode". Chrome has a "Reading mode" but it's more cumbersome to use because it's buried in a side menu.
For desktop browsers, I also have a bookmarklet on the bookmarks bar with the following Javascript:
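(Something along these lines - a minimal version that darkens all text and undoes thin font weights:)
javascript:(function(){document.querySelectorAll('body,body *').forEach(function(e){e.style.color='#111';e.style.fontWeight='400';});})();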
It doesn't darken the text on every webpage but it does work on this thread's article. (The Javascript code can probably be enhanced with more HTML heuristics to work on more webpages.)
One day try throwing a pair on, you'll be surprised. The small, thin font is causing this, not the text contrast. This and low-light scenarios are the first things to go.
On mobile they’ve hidden that under “customize this article”, which I never would have even noticed if I hadn’t specifically known that there is some sort of dropdown somewhere, heh. But now we know :)
Your iPhone has this cool feature called reader mode if you didn’t know.
As for mentioning WCAG - so what if it doesn’t adhere to those guidelines? It’s his personal website, he can do what he wants with it. Telling him you found it difficult to read properly is one thing but referencing WCAG as if this guy is bound somehow to modify his own aesthetic preference for generic accessibility reasons is laughable. Part of what continues to make the web good is differing personal tastes and unique website designs - it is stifling and monotonous to see the same looking shit on every site and it isn’t like there aren’t tools (like reader mode) for people who dislike another’s personal taste.
Didn't say it wasn't. I said invoking an accessibility standard when it comes to a guy's personal website is laughable because the way it was said implied he was compelled to change his site because some bureaucratic busybodies somewhere said he should. Unless you are a business or a government, most people aren't overly concerned about accessibility, nor should they be - especially if it comes about only through guilt tripping or insinuated threats.
My naive idea:
Download 100 TB every 3 months to a 2nd device, create a list of files restored, validate checksums against the original machine, make a list of files differing or missing, check which ones are supposed to be missing? That sounds like a full-time job.
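The checksum half of that is at least scriptable - a rough sketch, assuming both trees end up mounted side by side:
# hash the originals, then verify the restore against the list;
# --quiet prints only mismatches and missing files
(cd /data/original && find . -type f -print0 | sort -z | xargs -0 sha256sum) > /tmp/orig.sha256
(cd /data/restored && sha256sum --check --quiet /tmp/orig.sha256)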
Managing backup exclusions strikes again. It's impossible. Either commit to backing up the full disk, including the 80% of easily regenerated/redownloaded etc. data, or risk the 0.001% critical 16 byte file that turns out to contain your Bitcoin wallet key or god knows what else. I've been bitten by this more times than I'd like to admit managing my own backups, it's hard to expect a shrink-wrapped provider to do much better. It only takes one dumb simplification like "my Downloads folder is junk, no need to back that up" combined with (no doubt, years later) downloading say a 1Password recovery PDF that you lazily decide will live in that folder, and the stage is set.
Pinning this squarely on user error. Backblaze could clearly have done better, but it's such a well-known failure mode that it's not far off from refusing to test restores of a bunch of tapes left in the sun for a decade.
It isn't user error if it was working perfectly fine until the provider made a silent change.
Unless the user error you are referring to is not managing their own backups, like I do. Though this isn't free from trouble: I once had silent failures backing up a small section of my stuff for a while, because of an ownership/perms snafu and my script not sending the error reports anywhere I'd generally see them. Luckily an automated test caught it (every now & then it scans for differences between the whole backup and the current data): it could see the source and noticed a copy wasn't in the latest snapshot on the far-away copy. Reliable backups are a harder problem than most imagine.
If there is a footgun I haven't considered yet in backup exclusions, I'd like to know more. Shouldn't it be safe to exclude $XDG_CACHE_HOME? Unfortunately, since many applications don't bother with the XDG standard, I have to exclude a few more directories, so if you have any stories about unexpected exclusions, would you mind sharing?
I don't remember why I started doing it, but I don't bulk exclude .cache for some reason or other. I have a script that strips down larger known caches as part of the backup. But the logic, whatever it was, is easy to understand: you're relying on apps to correctly categorise what is vs. isn't cache.
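(The formalized version of that is the CACHEDIR.TAG convention - restic, for one, honors it via --exclude-caches - where only directories explicitly marked as caches get skipped, and you can tag the stragglers yourself; paths invented:)
restic -r /srv/backup backup "$HOME" --exclude-caches
# mark a directory that a misbehaving app treats as a cache
printf 'Signature: 8a477f597d28d172789f06886806bc55' > ~/.local/share/SomeApp/cache/CACHEDIR.TAG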
Also consider e.g. ~/.cache/thumbnails. It's easy to understand as a cache, but if the thumbnails were of photos on an SD card that gets lost or immediately dies, is it still a cache? It might be the only copy of some once-in-a-lifetime event or holiday where the card didn't make it back with you. Something like this actually happened to me, but in that case, the "cache" was a tarball of an old photo gallery generated from the originals that ought to have been deleted.
It's just really hard to know upfront whether something is actually important or not. Same for the Downloads folder. Vendor goes bankrupt, removes old software versions, etc. The only safe thing you can really do is hold your nose and save the whole lot.
We are going to drop Backblaze over this
> I have no clue why people still use it
I also have no clue why people use it.
This reference is 19 years old this month, in case anyone who recognized it was still feeling young.
Wait a moment, you just gave me an idea for a product
This cannot be a serious proposal. You should probably talk to people who don't use technology because they love it, but because they need it.
It's a reference to https://news.ycombinator.com/item?id=8863
Slow as fuck compared to 2 synced dirs
But the moment that hits normal users, yeah, mess
I think this is a case of people using bidirectional file sync wrong. The point is to make the most up to date version of a file available across multiple devices, not to act as a backup or for collaboration between multiple users.
It works perfectly fine as long as you keep how it works in mind, and probably most importantly don't have multiple users working directly on the same file at once.
I've been using these systems for over a decade at this point and never had a problem. And if I ever do have one, my real backup solution has me covered.
+1. It works perfectly if your mental model is:
“Every file is only ever written to from a single client, and will be asynchronously made available to all other clients, and after some period of time has elapsed you can safely switch to always writing to the file from a different client”.
Bidirectional file sync is also in hot demand from people who don't know the words, "file", "client", "write", "async", "available", or "time"
:P
The fact that lay people can and will use a tool incorrectly does not mean said tool is not useful
> And if I ever do have one, my real backup solution has me covered.
What do you use and how do you test / reconcile to make sure it’s not missing files? I find OneDrive extremely hard to deal with because the backup systems don’t seem to be 100% reliable.
I think there are a lot of solutions these days that err on the side of claiming success.
I agree. I use Syncthing for syncing phones and laptops, for data like photos, which aren't really updated. It works very nicely. And for documents updated by one user, moving between devices is totally seamless.
That being said, I understand how it works at a high level.
Throw some clock skew into the mix and it’s even more hilarious!
Why is this downvoted?
The insult to tictacs.
Also taking recommendations for a simple service I can install on my dad's Windows machine and my mom's Mac that will just automatically back up the main drive to the cloud, just in case
I've been extremely happy with Arq https://www.arqbackup.com/ for several years as a quiet backup solution, bring your own storage. I've done a few small restores and it's been just fine, and it automatically thins your backups to constrain storage costs.
Managing exclusions is something to keep vaguely on top of (I've accidentally had a few VM disk images get backed up when I don't need/want them) but the default exclusions are all very reasonable.
Installed Carbonite on my parents’ computer something like 15 years ago, and it still works (every now and then my dad tells me he used it to recover from a bug or a mistake).
But I have no idea where the company currently sits on the spectrum from good actor to fully enshittified.
I'm going to join the exodus, though for a different reason. Switched to OrbStack, and ever since, Backblaze refuses to back up, saying "disk full", as OrbStack uses an 8TB sparse disk image. You can exclude it, but if they won't (very easily) fix a known issue by measuring file sizes properly, I don't feel confident about the product.
Why not just use backblaze as cold storage and use restic or another tool with a GUI to backup to it?
Well this wasn’t the promise backblaze made a decade ago when we started using their products.
Now I need a new solution that will work for my parents
Of course, I wouldn’t use their client anymore. Actually, I would have never used it from the start as it’s not open source. I think for backups there’s no better guarantee than that. I don’t mean because you could look at the source code, I mean because in my experience open source products tend to care more about their users than not. At least for such foundational tools.
Backblaze Computer Backup != B2 Cloud Storage
You can't connect to their Computer Backup service through third-party software.
When an org quietly degrades one of their products, you should expect this behavior to occur again.
That is a lot more expensive if you have more than a small amount of data.
The issue with a client app backing up Dropbox and OneDrive folders on your computer is the files-on-demand feature: you can sync a 1TB OneDrive to a 250GB laptop, and it's OK because of smart/selective sync, aka files on demand. Then Backblaze tries to back the folder up, requests a download of every single file, and now you have zero bytes free, still no backup, and a sick laptop. You could OAuth the Backblaze app to access OneDrive directly, but if you want to back your OneDrive up, you need a different product IMO.
Shoutout to Arq backup which simply gives you an option in backup plans for what to do with cloud only files:
- report an error
- ignore
- materialize
Regardless, if you make backup software that doesn't give this level of control to users, and you change which files you're going to back up, you should probably be a lot more vocal with your users about the change. Vanishingly few people read release notes.
I honestly didn't even realize Backblaze had a clientside app. Very happy user of Arq - been running a daily scheduled dual backup of my HDD to an external NAS and Backblaze B2 for years with zero issues.
That was their whole business originally. The object storage is a newer offering.
Love Arq!
Why no linux support?
(Arq developer here) Haven't gotten many requests for it at all over the years. I presume it's because there are so many free options for Linux.
Just wanted to say that for many years, Arq has been my backup solution. It's amazing and I advise it to everyone I know.
If it's open-source, Linux support is only a few hours with Claude away.
If it's not open-source, but the protocol is documented, see above.
If it's not open-source, and the protocol isn't documented, well... that makes the decision easy, doesn't it?
Backups software written by Claude? No thanks.
I've used enough Claude coded applications that I wouldn't trust that with a backup, unless it had extensive tests along with it.
And I've used enough "gold standard" commercial applications, like the one being discussed in this very article, that I don't trust those either. If you recoil in horror at code written by LLMs, I'm afraid that the vendors you're already working with have some really bad news for you. You can get over it now or get over it later. You will get over it.
I can audit and verify Claude's output. Code running at BackBlaze, not so much. Take some responsibility for your data. Rest assured, nobody else will.
You are not wrong, but I just don't have time. My choices are pay someone or throw my hands up. I have been paying backblaze. But I recently had a drive die, and discovered the backups are missing .exe and .dll files, and so that part of the restore was worthless.
What time I do have, I've been using to try and figure out photo libraries. Nothing is working the way I need it to. The providers are a mess of security restrictions and buggy software.
The choices are maybe eat shit, or spend your own time auditing and polishing shit into something edible before eating it?
That's the general conclusion you will draw after reading the comments on this story, yes.
> You can get over it now or get over it later. You will get over it
You're forgetting the third option:
You can remain blissfully unaware of it.
> You can remain blissfully unaware of it.
And you can read many accounts of the outcome of that strategy in this very thread.
"Dear Claude, please create an eztensive testing suite for this app. Love, cobertos"
"Great idea wise customer, I will certainly mock one out just for you!"
My favorite Peanuts comic was always the one where Linus is standing at an intersection next to a 'Push Button To Cross Street' sign. He is sucking his thumb and clutching his blanket despondently.
In the last panel, Charlie Brown tells him, "You have to move your feet, too."
That seems like a pretty straightforward issue to solve: simply back up only those files that are actually on the system, not the stubs. If it's on your computer, it should be able to get backed up. If it's just a shadow, a pointer, it doesn't.
Making the change without making it clear though, that's just awful. A clear recipe for catastrophic loss & drip drip drip of news in the vein of "How Backblaze Lost my Stuff"
The OP’s complaint is that the files were not backed up. If they had discovered that only stubs were backed up, I don’t think they’d be any happier.
The stubs are the thing on your computer?
Imagine if they could detect stub or real file, huh? Space technology, I know! Or just fucking copy the stubs as stubs and what's actually downloaded as actually downloaded! Boggles the mind!
Or maybe just do what they do now, but WARN about that in HUGE RED LETTERS, in the website and the app, instead of burying it in an update note like weasels!
The whole "just sync everything, and if you can't seek everything, pretend to sync everything with fake files and then download the real ones ad-hoc" model of storage feels a bit ill-conceived to me. It tries to present a simple facade but I'm not sure it actually simplifies things. It always results in nasty user surprises and sometimes data loss. I've seen Microsoft OneDrive do the same thing to people at work.
I’ve lost data not realizing I was backing up placeholder files (iCloud).
Hiding the network always ends in pain. But never goes out of style.
Same. I lost a lot of photos this way. I've recently moved over to Immich + Borg backup with a 3-2-1 setup between a local Synology NAS and BorgBase. Painful lesson, but at least now I feel much more confident. I've even built some end-to-end monitoring with Grafana.
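The Borg half of that is pleasantly small, for anyone tempted (repo path and names invented):
borg init --encryption=repokey /mnt/nas/borg-repo
borg create --stats '/mnt/nas/borg-repo::photos-{now:%Y-%m-%d}' ~/Pictures
borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 12 /mnt/nas/borg-repo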
Careful w that Synology NAS, mine's now a brick that may also have led to permanent data loss.
My own approach to simplicity generally means "hide complexity behind a simple interface" rather than pushing for simple implementations because I feel that too much emphasis on simplicity of implementations often means sacrificing correctness.
This particular example is a useful one for me to think about, because it's a version of hiding complexity in order to present a simple interface that I actually hate. (WYSIWYG editors are another one, for similar reasons: they always end up being buggy and unpredictable.)
That would make sense for online-only files, but I have my Dropbox folder set to synchronize everything to my PC, and Backblaze still started skipping over it a few months ago. I reached out to support and they confirmed that they are just entirely skipping Dropbox/OneDrive/etc folders entirely, regardless of if the files are stored locally or not.
The primary trouble I have with Backblaze is that this change was not clearly communicated, even if it could perhaps be justified.
That doesn't really make a lot of sense, though. Reading a file that's not actually on disk doesn't download it permanently. If I have zero of 10TB worth of files stored locally on my 1TB device, read them all serially, and measure my disk usage, there's no reason the disk should be full, or at least it should be cache that can be easily freed. The only time this is potentially a problem is if one of the files exceeds the total disk space available.
Hell, if I open a directory of photos and my OS tries to pull exif data for each one, it would be wild if that caused those files to be fully downloaded and consume disk space.
Right, but even if that’s working it breaks the user experience of services like this that ‘files I used recently are on my device’.
After a backup, you’d go out to a coffee shop or on a plane only to find that the files in the synced folder you used yesterday, and expected to still be there, were not - but photos from ten years ago were available!
That shouldn't be seen as Backblaze's problem. It's Dropbox's problem that they made their product too complicated for users to reason about. The original Dropbox concept was "a folder that syncs" and there would be nothing problematic about Backblaze or anything else trying to back it up like any other folder.
Today's Dropbox is a network file system with inscrutable cache behavior that seeks to hide from the users the information about which files are actually present. That makes it impossible for normal users to correctly reason about its behavior, to have correct expectations for what will be available offline or what the side effects of opening a file will be, and Backblaze is stuck trying to cope with a situation where there is no right answer.
If I backup a file, I need to read that file. The rest is in the management layer underneath that file.
Seems simple enough to do for Backblaze, no?
Do you really want Backblaze to ignore all the side effects of scanning through the entire contents a badly-designed network filesystem?
What I actually want is not a backup. That is just an artefact of the process.
What i want is restores. The ability to restore anything from ideally any point back in time.
How that is achieved is not my concern.
Obviously Backblaze does not achieve that, today.
> How that is achieved is not my concern.
You're dodging the question. Wanting to ignore the side effects does not mean they won't affect you.
There’s no reason to think that would happen - files you had from ten years ago would have been backed up ten years ago and would be skipped over today.
Good point (I’m assuming you’re right here and it trusts file metadata and doesn’t read files it’s already backed up?)
It would still happen with the first backup - or first connection of the cloud drive - though, which isn’t a great post-setup new user experience. It probably drove complaints and cancellations.
I feel like I’ve accidentally started defending the concept of not backing up these folders, which I didn’t really intend to. I’d also want these backed up. I’m just thinking out loud about the reasons the decision was made.
It's generally handled decently well now, but with three or four of these things it can make backups take annoyingly long, as without "smarts" (which are not always present) it may force a download of the entire OneDrive/Box each time - even if it never crashes out.
> it may force a download of the entire OneDrive/Box each time - even if it never crashes out.
I am not aware of any evidence supporting this.
Cloud placeholders have been a feature for years, plenty of programs have mitigations for this behavior.
The issue really isn't that it's not backing up the folder (which I can see an argument for both sides and various ways to do it) - it's that they changed what they did in a surprising way.
Your backup solution is not something you ever want to be the source of surprises!
This is a complexity that makes it harder, but not insurmountable.
It would be reasonable to say that if you run the file sync in a mode that keeps everything locally, then Backblaze should be backing it up. Arguably they should even when not in that mode, but it'll churn files repeatedly as you stream files in and out of local storage with the cloud provider.
> Arguably they should even when not in that mode, but it'll churn files repeatedly as you stream files in and out of local storage with the cloud provider.
When you have a couple terabytes of data in that drive, is it acceptable to cycle all that data and use all that bandwidth and wear down your SSD at the same time?
Also, high number of small files is a problem for these services. I have a large font collection in my cloud account and oh boy, if I want to sync that thing, the whole thing proverbially overheats from all the queries it's sending.
Reading your comments, it sounds like you are arguing it is impossible to backup files in Dropbox in any reasonable way, and therefore nobody should backup their cloud files. I know you haven’t technically said that, but that’s what it sounds like.
I assume you don’t think that, so I’m curious, what would you propose positively?
> I know you haven’t technically said that, but that’s what it sounds like.
Yes, I didn't technically say that.
> It sounds like you are arguing it is impossible to backup files in Dropbox in any reasonable way, and therefore nobody should backup their cloud files.
I'm not arguing either of those, either.
What I said is that with "on demand file download", traditional backup software faces a hard problem. However, there are better ways to do it, the primary candidate being rclone.
You can register a new application ID for your rclone installation for your Google Drive and Dropbox accounts, and use rclone as a very efficient, rsync-like tool to backup your cloud storage. That's what I do.
I'm currently backing up my cloud storages to a local TrueNAS installation. rclone automatically hash-checks everything and downloads the changed ones. If you can mount Backblaze via FUSE or something similar, you can use rclone as an intelligent MITM agent to smartly pull from cloud and push to Backblaze.
Also, using restic or Borg as a backup container is a good idea, since they deduplicate and/or store only the differences between snapshots, saving tons of space in the process, plus encrypting things for good measure.
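The core of the rclone side is just this (the remote name is whatever you configured; the NAS path is invented):
# one-way pull into the NAS; --checksum compares provider hashes
# instead of size+modtime, so only changed files get downloaded
rclone sync dropbox: /mnt/tank/cloud-backups/dropbox --checksum --progress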
My understanding of Backblaze Computer Backup is it is not a general purpose, network accessible filesystem.[0] If you want to use another tool to backup specific files, you'd use their B2 object storage platform.[1] It has an S3 compatible API you can interact with, Computer Backup does not.
But generally speaking, I'd agree with your sentiment.
[0]: https://www.backblaze.com/computer-backup/docs/supported-bac...
[1]: https://www.backblaze.com/docs/cloud-storage-about-backblaze...
This. You should not try to backup your local cache of cloud files as if those were your local files. Use a tool that talks to the cloud storage directly.
Use tools with straightforward, predictable semantics, like rclone, or Syncthing, or restic/Borg. (Deduplication rules, too.)
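For example, restic speaks B2 natively - a minimal sketch (bucket name, paths, and keys are placeholders):
export B2_ACCOUNT_ID=... B2_ACCOUNT_KEY=...
restic -r b2:my-bucket:/machines/laptop init
restic -r b2:my-bucket:/machines/laptop backup ~/Documents
restic -r b2:my-bucket:/machines/laptop snapshots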
But if the files are only on the remote storage and not local, chances are they haven't been modified recently, so it shouldn't download them fully, just check the metadata cache for size / modification time and let them be if they didn't change.
So, in practice, you shouldn't have to download the whole remote drive when you do an incremental backup.
You can't trust size and modification time all the time; mtime is a better indicator, but it's not foolproof. The only reliable way is checksumming.
Interestingly, rclone supports that on many providers, but for Backblaze to support it, they'd need to integrate rclone, connect to the providers via that channel, and request checks, which is messy, complicated, and computationally expensive. And that's assuming you don't hit API rate limits on the cloud provider.
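For what it's worth, the one-liner version of that check with rclone (remote and paths invented):
# compares the provider's hashes against local files, no downloads needed
rclone check dropbox:Documents /mnt/tank/cloud-backups/dropbox
# fall back to sizes only where a provider exposes no usable hash
rclone check somedrive:stuff /mnt/backups/stuff --size-only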
If you can’t trust modification time you are doing something so unusual that you probably need to be handling your backups privately anyway.
I don't think so.
Sometimes the modification time of a file that is not downloaded on computer A, but was modified by computer B, is not reflected immediately on computer A.
Hence, backup software running on computer A will think the file has not been modified. This is a known problem in file synchronization. Also, some applications that modify files revert or preserve the file's mtime for their own reasons. They are rare, but they're there.
Then do it in memory, assuming those services allow you to read the files like that. It sounds like they do based on your other comments.
The problem is, downloading files and disk management are not in your control; that part is managed by the cloud client (Dropbox, Google Drive, et al.) transparently. The application accessing the file just waits, akin to waiting for a disk to spin up.
The filesystem is a black box for these software since they don't know where a file resides. If you want control, you need to talk with every party, incl. the cloud provider, a-la rclone style.
Why would they do new backups of old files all the time? They would just skip those.
Unless it does something very weird it won't trigger all those files to download at the same time. That shouldn't be a worry.
And, as a separate note, they shouldn't be balking at the amount of data in a virtualized onedrive or dropbox either considering the user could get a many-terabyte hard drive for significantly less money.
> Unless it does something very weird it won't trigger all those files to download at the same time. That shouldn't be a worry.
The moment you call read() (or fopen() or your favorite function), the download will be triggered. It's a hook sitting between you and the file. You can't ignore it.
The only way to bypass it is to remount it via rclone or something and use the "ls" and "lsd" commands to query filenames. Otherwise it'll download, and that's how it's expected to work.
Why would it use either of those on all the files at once? It should only be opening enough files to fill the upload buffer.
I think you might be confusing Backblaze reading files and how Dropbox/OneDrive/Nextcloud/etc. work. NC doesn't enable this by default (I don't think), but Windows calls it virtual file support. There is no avoiding filling the upload buffer, because Backblaze has zero control over how Dropbox downloads files. When Backblaze requests that a file be opened and read, Windows will ask Dropbox or whatever to open the file for it, and to read it. How that is done is up to whatever handles the virtual files. To Backblaze, your Dropbox folder is a normal directory with all that that entails, so Backblaze thinks that it can just zip through the directory and it'll read data from disk, even though that isn't really what's happening. I had to exclude my Nextcloud directory from my Duplicati backups for precisely this reason -- my Nextcloud is hosted on my server, and Duplicati was sending it so many requests it would cause my server to start sending back error 500s.
And no, my server isn't behind Cloudflare, primarily because I don't have $200 to throw at them to allow me to proxy arbitrary TCP/UDP ports through their network, and I don't know how to tell CF "hey, only proxy this traffic but let me handle everything else" (assuming that's even possible, given that the usual flow is to put your entire domain behind them).
No, I'm not confusing anything.
Dropbox and onedrive can handle backblaze zipping through and opening many files. The risk is getting too many gigabytes at once, but that shouldn't happen because backblaze should only open enough for immediate upload. If it does happen it's very easily fixed.
If it overloads nextcloud by hitting too many files too fast, that's a legitimate issue but it's not what OP was worried about.
Maybe it will, maybe it won't, but it'll cycle every file in the drive and stress everything from your cloud provider to Backblaze, including everything in between, software- and hardware-wise.
That sounds very acceptable to get those files backed up.
It shouldn't stress things to spend a couple weeks relaying a terabyte in small chunks. The most likely strain is on my upload bandwidth and yeah that's the cost of cloud backup, more ISPs need to improve upload.
I mean, cycling a couple of terabytes of data through a 512GB drive is at least 4 full drive writes, which is a lot for that kind of thing.
> more ISPs need to improve upload.
I was yelling the same things to the void for the longest time, then I had a brilliant idea of reading the technical specs of the technology coming to my home.
Lo and behold, the numbers I got were the technical limits of the technology that I had at home (PON for the time being), and going higher would need a very large and expensive rewiring with new hardware and technology.
4 writes out of what, 3000? For something you'll need to do once or twice ever? It's fine. You might not even eat your whole Drive Write Per Day quota for the upload duration, let alone the entire month.
> the technical limits of the technology that I had at home (PON for the time being)
Isn't that usually symmetrical? Is yours not?
> 4 writes out of what, 3000?
Depends on your device capacity and how much is in actual use. Wear leveling also adds writes of its own while it moves things around.
> For something you'll need to do once or twice ever?
I don't know about you, but my cloud storage is living, and even if it weren't: if the software can't smartly ignore files, it'll pull everything in, compare, and pass without uploading, causing churn in every backup cycle.
> Isn't that usually symmetrical? Is yours not?
GPON (Gigabit PON) is asymmetric. The theoretical limits are 2.4Gbps down, 1.2Gbps up. I have 1000Mbit/75Mbit at home.
> I don't know you, but my cloud storage is living
But you're probably changing less than 1% each day. And new changes are likely already in the cache, no need to download them.
> if the software can't smartly ignore files, it'll
Backblaze checks the modification date.
> GPON (Gigabit PON) is asymmetric. Theoretical limits is 2.4Gbps down, 1.2Gbps up. I have 1000Mbit/75Mbit at home.
2:1 is fine. If you're getting worse than 10:1 then that does sound like your ISP failed you?
How do you know how often those files need to be backed up without reading them? Timestamps and sizes are not reliable, only content hashes. How do you get a content hash? You read the file.
If timestamps aren’t reliable, you fall way outside the user that can trust a third party backup provider. Name a time when modification timestamp fails but a cloud provider will catch the need to download the file.
Backblaze already trusts the modification date.
Why would it do that more than once unless you are modifying 4TB of data every day, in which case you are causing the problem.
I don't know how your client works, but reading metadata (e.g. requesting size) off any file causes some cloud clients to download it completely.
Of course I'm not modifying 4TB on a cloud drive, every day.
Can you name such a client? That sounds like a terrible experience.
The fault is with the PC manufacturers screwing you on disk space, claiming 1TB when it's only 256GB. Bait and switch.
I guess the problem with Backblaze's business model with respect to Backblaze Personal is that it is "unlimited". They specifically exclude linux users because, well, we're nerds, r/datahoarders exists, and we have different ideas about what "unlimited" means. [1]
This is another example in disguise of two people disagreeing about what "unlimited" means in the context of backup, even if they do claim to have "no restrictions on file type or size" [2].
[1] https://www.reddit.com/r/backblaze/comments/jsrqoz/personal_... [2] https://www.backblaze.com/cloud-backup/personal
Any company that does the "unlimited*" shenanigans is automatically out of any selection process I have going, wherever they use it. It's a clear signal that the marketing/financial teams have taken over the business, and they'll be quick to offload you from the platform given the chance, and you'll have no recourse.
Always prefer businesses who are upfront and honest about what they can offer their users, in a sustainable way.
> It's a clear signal that the marketing/financial teams have taken over the businesses
Or that they're targeting the mass retail market, where people are technically ignorant, and "unlimited" is required to compete.
And, statistically speaking, it's viable as long as a company keeps its users to a normal distribution.
Is there an example of a consumer facing SaaS that's been able to handle the "unlimited" in a way you'd consider positive?
US cellular data plans? Where they're throttled after a soft cap?
Although I will say it's been nice to have them give more transparency around their actual soft cap numbers.
That’s an example of where unlimited can work (because the limit is a number of hours of degraded service which is quantifiable).
Storage was already a hairy beast with the original setup, and it would be much better if they had defined limits you could at least know about (and pay for).
Google and Youtube, especially Youtube.
Google does not have unlimited. I had to pay to increase my storage.
Google Drive reneged on unlimited storage for Education accounts once they realized that universities also contain researchers who need to store huge amounts of data.
Massive fraud from abroad didn't help there either. A favorite backup spot for terabytes of pirated media, complete with guides on which schools had good @edu addresses for it.
Hadn't even considered your obvious point, a good one!
Google forced everyone off their deprecated G Suite for Business plan (which had unlimited storage) and onto a Workspace plan.
I had to give up and delete plenty of data because of this. That data was important to me, but not important enough to pay their ransom.
YouTube is constantly reencoding videos to save space at the expense of older content looking like mud, so arguably even they're having their struggles.
We all know the "nobody has watched this video in ten years, login at least once or it'll be yeeted" email is coming, someday.
YT would have to start declining in growth pretty substantially for that to be the case. All the 360p video from 2010-2015 probably doesn't take up even 1% of the storage new videos added in 2025.
True, it's more likely to be aimed at stemming the tide of 4k video that nobody watches - but luckily they're worth more than Disney right now so we don't have to confront that ... yet.
YouTube shorts are incredibly highly compressed.
You can only do it during growth phases, or if there are complementary products with margin. The story I was told about Office 365 was that when they were using spinning disk, Exchange was IOPS-bound, so they had lots of high-volume, low-IOPS storage to offer for SharePoint. Google has a similar story, although neither is really unlimited, just approaching unlimited for large customers.
Once growth slows, churn eats much of the organic growth and you need to spend money on marketing.
Telegram?
>and "unlimited" is required to compete.
And there speaks marketing.
Or they're selling their product to a market where the purchaser doesn't understand how much they would need to pay if they were paying by the gigabyte (or even how to check how much they would need). Telling those people they don't need to worry about that "detail" is a key selling point. Backblaze has a product for people who understand the limitations of their consumer product and don't find them acceptable: B2, which is priced by the gigabyte.
> doesn't understand how much they would need to pay...how to check how much they would need...
...even nearly any frame of reference for anything storage related, much less gigabytes
> And statistically-speaking, is viable as long as a company keeps its users to a normal distribution.
Doing a bait-and-switch on a percentage of your paying customers, no matter how small the percentage is, may be "viable" for the company, but it's a hostile experience for those users, and companies deserve to be called out for it.
On the other hand, subsidizing high-usage customers with low-usage customers is pretty generous to the high-usage customers, and there's no pricing model that doesn't suck a little.
Pricing tiers suck if your usage needs are at the bottom of a tier, or you need exactly one premium feature but not more. A la carte pricing is always at least a bit steep, since there's no minimum charge/bulk discount (consider a gym or museum's "day pass") so they have to charge you the full one-time costs every time in case that's your only time.
Base cost + extra per usage might be the best overall, but because nobody has solved micro transactions, the usage fees have to be pretty steep too. And frankly, everyone hates being metered - it means you have to think about pricing every time you go to use something.
> Or that they're targeting the mass retail market, where people are technically ignorant, and "unlimited" is required to compete.
So… Marketing has taken over, just as parent comment said. Got it.
I just read the Reddit post by their developer, and my takeaway is that they have a very good understanding of what "unlimited" really means. It's not a shenanigan. It's just calculated risk. It's clear to me that they simultaneously intend to offer truly unlimited backups while hoping that what the average user backs up stays within a certain limit they can easily predict and plan for. It's a statistical game that they are prepared to play.
> It’s a statistical game that they are prepared to play.
I understand this, and many others do too; the only difference seems to be that we're not willing to play those games. Others are, and that's OK, just giving my point of view, which I know is shared by many others who are a bit stricter about where we host our backups. Instead of "statistical games" we prefer "upfront limitations", as one example.
The problem is you have to play with them - and sure, maybe they're willing to be the Costco to the unlimited backup's $1.50 hotdog - but for how long? Will their dedication to unlimited and particular price points mean you have to take Pepsi for awhile instead of Coke, or that your polish sausage dog disappears? Wait, where did the analogy go? I'm hungry.
It's a bit safer when you know your playbook - if there were unlimited (as it is now), unlimited plus (where they back up "cloud storage cached files"), and unlimited pro max premier (where they back up entire cloud storages), you'd at least know where you stand, and you'd change "holy shit, my important file I thought was backed up isn't, and now it's gone forever" to "I have to pay $10 more a month or take on this risk".
In university we had computer labs, I worked in the office that handled all of engineering computing. You paid the fee for engineering school and you got to use the labs. They had printers. We wanted printing to be free. This didn't mean "you get to take reams of blank paper home with you", it meant "you get as much printing as you reasonably need for academic purposes". Nobody cared if you printed your resume, fliers for your book club, or whatever, we weren't sticklers. Honestly we wanted to think about printers as little as possible.
But we'd always have a few people at the end of the semester print 493 blank pages using up all of their print quota they'd "paid for". No sir, you didn't pay for 500 pages of printing a semester, we'd let you print as much as you needed, we just had to put a quota in place to prevent some joker from wallpapering the lecture hall.
It was hard to express what we meant and "unlimited" didn't cut it.
You meant “reasonable,” but you did not apply reason. Situations such as this can be handled with a quota set at something like 150% of median use, but then extended upon a justified request. It can work in a lab where there’s a human touch, but it fails at million-user scale where even that level of human support is too expensive.
Most home broadband providers offer unlimited network traffic.
If they limit the speed, it's technically limited, which really makes me wonder how they can legally say these things. I guess in a lot of cases it's like Comcast, where they perhaps also cap the data per month. But dang.
They mean that they're not going to limit the total amount of data that you send/receive beyond the natural limit implied by the maximum rate.
When a movie subscription says unlimited movies, we know they're not suggesting that they can break the laws of time, just that they won't turn you away from a screening. It's pretty normal language, used to communicate no additional limit, which is relevant when compared to cell phone data plans (which are actually, in my opinion, fraudulent) that shunt you to a lower tier after a certain amount of usage.
In the language of marketing (in the USA at least) the word "unlimited" means "limited".
They offer "unlimited" where I live, not "unlimited*".
I mean, in this universe we live in everything is limited somehow.
I do wish it were a word that had to be completely dropped from marketing/advertising.
For example, there is no unlimited storage; hell, the visible universe has a storage limit. There is no unlimited upload and download speed. And what if, as you start using more space, they exponentially slowed the speed at which you could access the storage? Unlimited CPU time to process your request? Unlimited execution slots to process your request? Unlimited queue size when processing your requests?
Hence everything turns into a mess of assumptions.
> I mean, in this universe we live in everything is limited somehow.
Yes, indeed, most relevant in this case probably "time" and "bandwidth", put together, even if you saturate the line for a month, they won't throttle you, so for all intents and purposes, the "data cap" is unlimited (or more precise; there is no data cap).
In almost all services this tends to get an asterisk that says "unless your usage interferes with other users" which in itself is poorly defined. But typically means once their system gets closer to its usage limit, you're the first to get booted off the service.
No ISP I've had in my adult life had such conditions; it truly is "whatever you manage to do with the bandwidth we give you". I've done hundreds of TBs for months without any impact to my bandwidth (transferring ML datasets, among other things), and I'm pretty sure an ISP in my country would break some law if they limited a typical broadband home connection based on data transfer quotas.
What? You are capped by bandwidth and time is its own limit. You are capped at the max bandwidth in your service contract multiplied by the length of the contract. A bandwidth cap has an implied data cap
The point is that you have access to a 100Mb/s connection, and your access to that connection is unlimited. It doesn't become a 10Mb/s connection at some point, and your access isn't cut off - there are no limits on your access.
Of course there are practical limits as you can't make your 100Mb/s connection into a gigabit one (ignoring that you can buy burstable in a datacenter, etc, etc).
Where unlimited falls down is when it refers to a endlessly consumable resource, like storage.
Of course. You're always capped by rate. But you're not capped by the cumulative amount (other than as a function of rate and time).
Doesn't help when you still need a VPN to get rid of Telekom/Vodafones abysmal peering
And they have the necessary pipes to serve the rate they sell you 24/7.
Nobody has turned the moon into a hard drive yet.
> And they have the necessary pipes to serve the rate they sell you 24/7
I doubt they have those pipes, at least if all of their customers (or a sufficiently large share) actually made use of that.
Second question would be, how long they would allow you to utilize your broadband 24/7 at max capacity without canceling your subscription. Which leads back to the point the person I replied to was making: If you truly make use of what is promised, they cancel you. Hence it is not a faithful offer in the first place.
> Nobody has turned the moon into a hard drive yet.
Not important here because backblaze only has to match the storage of your single device. Plus some extra versions but one year multiplied by upload speed is also a tractable amount.
Since I know how many of those businesses are run I'll let you in on the very obvious secret: there’s zero chance they have enough uplink to accommodate everyone using 100% of their bandwidth at the same time, and probably much less than that.
Residential network access is oversold as everything else.
The only difference with storage is there’s a theoretical maximum on how much a single person can use.
But you could just as well limit backup upload speed for similar effect. Having something about fair use in ToS is really not that different.
Residential ISPs don’t work financially unless you oversell peak time full-rate bandwidth. If you do things right, you oversell at a level that your customers don’t actually slow down. Even today, you won’t have 100% of customers using 100% of their full line rate 100% of the time.
Back in the late 1990s we could run a couple dozen 56k lines on a 1.544 Mbps backhaul. We could have those to the same extent today, but there’s still a ratio that works fine.
Yes, yes. We know. The business environment can't be arsed to maintain its own integrity by actually building out the capacity they want to charge for. Everyone hides behind statistical multiplexing until the actuarial pants-shitting event occurs. Then it's bailout time, or "we're sorry, we used all the money for executive bonuses!"
Building out for 100% of theoretical capacity makes no sense but you can still easily accommodate the small handful of power users with plenty to spare. Most ISPs will not drop or throttle users trying to get their money's worth if it’s fiber or similar. LTE of course that’s another thing.
That sort of horrible abuse only happens in areas where some provider has strict monopoly, but that’s an aberration and with Starlink’s availability there’s an upper bound nowadays.
It’s not unlimited. The limit might be very high these days, but it’s at most bandwidth times duration. And while that sounds trivial, it does mean they aren’t selling you an infinity of a resource.
Unsure if sarcastic, but most ISPs will throttle and shape traffic long before you use anything close to <bandwidth rating> times <seconds in a month>.
I've been running RPI-based torrent client 24/7 in several countries and never experienced that. Eats a few TBs per month, not the full line, but pretty decent amount. I guess it really depends on the country.
I'm in the UK with Virgin Media on their 1Gbps package, going through multiple TB a month and I'm yet to be throttled in any way.
Well, multiple TB isn't close to your bandwidth rating. It only takes 2% of a 1Gbps connection in a single direction (20Mbps, about 2.5MB/s, or roughly 216GB a day) to hit 6TB a month.
Ha, yes I suppose that's correct.
I’ve used Spectrum and their predecessors since the 90s. Never ran into this, although the upstream speeds are ridiculously slow, and they used to force Netflix traffic to an undersized peer circuit.
I'm unsure if you're being sarcastic or not; never have I used an ISP that would throttle you, for any reason. This is unheard of in the countries I've lived in, and I'm not sure many people would even subscribe to something like that; it sounds very contrary to how a typical at-home broadband connection works.
Of course, in countries where the internet isn't as developed as elsewhere, this might make sense, but modern countries don't tend to do that, at least in my experience.
Alas, "isn't so developed" applies to the US: https://arstechnica.com/tech-policy/2020/06/cox-slows-intern...
My parents have gotten hit by this. Dad was downloading huge video files at one point on his WiFi and his ISP silently throttled him.
A common term is "data cap": https://en.wikipedia.org/wiki/Data_cap
> Alas, "isn't so developed" applies to the US
Wow, I knew that was generally true, didn't know it was true for internet access in the US too, how backwards...
> A common term is "data cap": https://en.wikipedia.org/wiki/Data_cap
I think most are familiar with throttling because most (all?) phone plans have some data cap at one point, but I don't think I've heard of any broadband connections here with data caps, that wouldn't make any sense.
Data caps are just documenting the reality that ISPs oversubscribe - if they sell a hundred 1Gb/s connections to a neighborhood, it's highly unlikely they're peering that neighborhood onto the Internet at large at 100Gb/s. I don't know what the current standard is, but in the past it's been 10/100 to 1 - so a hundred 1Gb/s connections might be sharing 1-10Gb/s of uplink; and if usage starts to saturate that they need a way of backing off that is "fair" - data caps are one of the ways they inform the customer of such.
I've seen it with my new fiber rollout - every single customer no matter their purchased speed had 1Gb up and down - as more customers came online and usage became higher, they're not limiting anyone, but you get closer to your advertised rate - but my upload is still faster than my download because most of my neighborhood is downloading, few are uploading.
I have 5 Gbps symmetric at home. I and my fiancee both work from home, so our backup fiber connection from another provider is 2 Gbps. We can also both tether to cell phones if necessary. We can get 5G home wireless Internet here, too, and we might ditch our 2 Gbps line in favor of that as a backup. We moved from Texas back home to Illinois last year, and one of the biggest considerations was who had service at what tiers due to remote work. Some of the houses we looked at in the same three-county area in the Chicago suburbs didn’t even have 5G home available (not from AT&T, Verizon, or T-Mobile anyway).
My parents have 5G wireless home as their primary connection, and that was only introduced in their area a couple of years ago. Before that, they could get dial-up, 512 kbps wireless with about a $1000 startup cost, ISDN (although the phone company really didn’t want to sell it to them), Starlink, or HughesNet. The folks across the asphalt road from them had 20 Mbps Ethernet over power lines years ago, and that’s now I think 250 Mbps. It’s a different power company, though, so they aren’t eligible.
Around 80% of the US population lives in large urban areas. The other 20% of the population range from smaller towns to living many kilometers from any town at all. There’s a lot of land in the US.
Here in dense NYC, most apartments I've lived in have but a single ISP available. It's common to hunt for apartments by searching the address on service maps.
I'm pretty sure one landlord was cut in by his ISP, as he skipped town when I tried to ask about getting fiber, and his office locked their door and drew their shades when I went there with a technician on two occasions. The final time, we got there before they opened and the woman ran into the office and slammed the door on us.
Our ISPs conspire to avoid competition (AKA "overbuilding") and so stuff like this just festers. It's truly a shame.
"I guess the problem with Backblaze's business model with respect to Backblaze Personal is that it is "unlimited"."
The new and very interesting problem with their business model is that drive prices have doubled - and in some cases, more than doubled - in the last 12 months.
Backblaze has a lot of debt and at some point the numbers don't make sense anymore.
Yeah, I found that out recently when I had to purchase a new 16TB drive because one in my RAID died. I bought the original drive used about three years ago for about $130. To replace it I had to shop around, and I ended up paying about $270, which I think was considered a decent deal right now.
Oh well, I guess this is why we're given two kidneys.
It’s funny that the same person asking for linux support would complain about B2 “not being for home users”. I sync my own backups to B2 and would set that up over installing linux any day of the week! It’s extremely easy.
What software/workflow do you use for this Linux to B2 backup please?
Restic + rclone personally, with a wrapper script to glue things together nicely
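A minimal sketch of the kind of wrapper I mean (the rclone remote name, bucket, and paths here are made up, not my real setup):

    #!/bin/sh
    set -eu
    # restic talks to B2 through rclone via the rclone: repository prefix
    export RESTIC_REPOSITORY="rclone:b2remote:my-backup-bucket"
    export RESTIC_PASSWORD_FILE="$HOME/.config/restic/password"
    restic backup "$HOME/documents" "$HOME/projects"
    restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 12 --prune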
What's the advantage of additionally using rclone vs. just restic?
rclone on a cron job
Yea, that's pretty shady. Either don't call your service unlimited or bump up the prices so you can survive the occasional datahoarder. I called them out on this many years ago.
I actually emailed them years ago about it. Asked them point blank what'd happen if I dumped 20+ TB of encrypted, undeduplicable backups onto their storage servers. They actually replied that there'd be no problem, but I didn't buy it. Not at all surprised to see this now.
Unlimited means without limits or restrictions.
If a company uses the word unlimited to describe their service, but then attempts to weasel out of it via their T&Cs, that doesn't constitute a disagreement over the meaning of the word unlimited. It just means the company is lying.
From a philosophical standpoint, I agree, but in terms of service providers, "unlimited" has pretty much always been synonymous with "unmetered" (i.e. we don't charge you for traffic, but we will still throttle you if you are affecting service reliability for other customers)
Sorry, but unlimited has never meant unrestricted. T&Cs always have restrictions. If it were unrestricted it would be used for all kinds of illegal stuff they don’t want on their servers, child pr0n and whatnot. They can’t legally offer a service like this without restrictions, as they operate within an existing set of laws.
Unlimited, however, they can offer. I don’t see how people get into a mental block of thinking something is nefarious when a company offers you unlimited hosting or data. Yes, they know it’s impossible if everyone took full advantage of that. They also know most people won’t, and so they don’t have to spend time worrying about it. It’s a simple actuarial exercise to work out the pricing that covers the use of your users.
Back in the early 2000s I ran a web hosting service that was predominantly a LAMP stack shared hosting environment. It had several unlimited plans and they were easy to estimate/price. The only times I had an issue supporting a heavy user, it would turn out they were doing something against the TOS. Back then, it was usually something pron or mp3 related. So the user would get kicked off for that. I didn’t have any issues with supporting the usage load if it was within TOS. The margins were so high it was almost impossible to find a user that could give me any trouble from an economic standpoint.
When it comes to storage "unlimited" to me means a promise to be broken at some random point in the future. I'll never use a service that claims unlimited anything over having an actual cost model. Companies that charge by what you use have actually given consideration to the cost of doing business and have priced that in already.
Why don't they charge by the Gigabyte
Because approximately no one wants that. Anyone who does already uses S3 etc.
I use them for the B2 bucket-style storage where this happens. It's expensive per gig compared to the cost of a working personal unlimited desktop account. I like to visit their reddit page occasionally and it's a constant stream of desktop client woes and stories of restoring problems, and any time B2 is mentioned it's like "but muh 50 terabytes" lol
It's cheaper if you have multiple computers with normal amounts of data though. My whole family is on my B2 account (Duplicati backing up eight computers each to a separate bucket), and it's $10/month.
They do, it's called B2 and is another product of theirs.
As an FYI you can recover from force pushes to GitHub using the GitHub UI[0] or their API[1]. And if you force push to one of your own machines you can use the reflog[2]. [0]: https://stackoverflow.com/a/78872853 [1]: https://stackoverflow.com/a/48110879 [2]: https://stackoverflow.com/a/24236065
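For the local-machine case, a minimal sketch (the reflog index @{5} is just an example; pick whichever entry predates the bad push):

    git reflog                     # find the entry from just before the force-push
    git branch rescued HEAD@{5}    # resurrect that history on a new branch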
And as a double FYI this means a force push does not permanently delete sensitive data! Beware. Rotate that API key, even if it's a pain in the arse.
I can understand in theory why they wouldn't want to back up .git folders as-is. Git has a serious object count bloat problem if you have any repository with a good amount of commit history, which causes a lot of unnecessary overhead in just scanning the folder for files alone.
I don't quite understand why it's still like this; it's probably the biggest reason why git tends to play poorly with a lot of filesystem tools (not just backups). If it'd been something like an SQLite database instead (just an example really), you wouldn't get so much unnecessary inode bloat.
At the same time Backblaze is a backup solution. The need to back up everything is sort of baked in there. They promise to be the third backup solution in a three layer strategy (backup directly connected, backup in home, backup external), and that third one is probably the single most important one of them all since it's the one you're going to be touching the least in an ideal scenario. They really can't be excluding any files whatsoever.
The cloud service exclusion is similarly bad, arguably worse. Imagine getting hit by a cryptoworm. Your cloud storage tool is dutifully going to sync everything encrypted, junking up your entire storage across devices, and because restoring old versions is both ass and near impossible at scale, you need an actual backup solution for that situation. Backblaze excluding files in those folders feels like a complete misunderstanding of what their purpose should be.
I don’t think this is the right way to see this.
Why should a file backup solution adapt to work with git? Or any application? It should not try to understand what a git object is.
I’m paying to copy files from a folder to their servers; just do that. No matter what the file is. Stay at the filesystem level, not the application level.
I'm not saying Backblaze should adapt to git; the issue isn't application related (besides git being badly configured by default; there's a solution with git gc, it's just that git gc basically never runs).
It's that to back up a folder on a filesystem, you need to traverse that folder and check every file in that folder to see if it's changed. Most filesystem tools usually assume a fairly low file count for these operations.
Git, rather unusually, tends to produce a lot of files in regular use; before packing, every commit/object/branch is simply stored as a file on the filesystem (branches only as pointers). Packing fixes that by compressing commit and object files together, but it's not done by default (only after an initial clone or when the garbage collector runs). Iterating over a .git folder can take a lot of time in a place that's typically not very well optimized (since most "normal" people don't have thousands of tiny files in their folders that contain sprawled out application state.)
The correct solution here is either for git to change, or for Backblaze to implement better iteration logic (which will probably require special handling for git..., so it'd be more "correct" to fix up git, since Backblaze's tools aren't the only ones with this problem.)
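To see the loose-object problem on a repo of your own, something like:

    git count-objects -v   # 'count' = number of loose object files on disk
    git gc                 # packs loose objects into a handful of pack files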
7za (the compression app) does blazingly fast iteration over any kind of folder. This doesn't require special code for git. Backblaze's backup app could do the same but rather than fix their code they excluded .git folders.
When I backup my computer the .git folders are among the most important things on there. Most of my personal projects aren't pushed to github or anywhere else.
Fortunately I don't use Backblaze. I guess the moral is don't use a backup solution where the vendor has an incentive to exclude things.
IMHO, you can't do blazingly fast iteration over folders with small files in Windows, because every open is hooked by the anti-virus, and there goes your performance.
Actually, once the initial backup is done there is no reason to scan for changes. They can just use a Windows facility (e.g. ReadDirectoryChangesW or the NTFS USN change journal) that tells them when any file is modified or created, and add that file to their backup list.
Backblaze offers 'unlimited' backup space, so they have to do this kind of thing as a result of that poor marketing choice.
No they don’t. They just have to price the product to reflect changing user patterns. When Backblaze started, it was simply “we back up all the files on your drive”; they didn’t even have a restore feature, that was your job when you needed it. Over time they realized user behavior changed: these cloud drives were a huge data source they hadn’t priced in, git gave them some problems they didn’t factor in, etc. The issue is their solution to dealing with it is to exclude it, and that means they’re now a half-baked solution for many of their users. They should have just changed the pricing and supported the backup solution people need today.
If they must scam, shouldn’t they be deduplicating on the server rather than the client?
FWIW some other people in this thread are saying the article is wrong about .git folders not being backed up: https://news.ycombinator.com/item?id=47765788
That's a really important fact that's getting buried so I'd like to highlight it here.
I think it's understandable for both Backblaze and most users, but surely the solution is to add `.git` to their default exclusion list which the user can manage.
I think they shouldn't back up git objects individually because git handles the versioning information. Just compress the .git folder itself and back it up as a single unit.
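e.g. something along these lines (paths are illustrative):

    # snapshot the repo's entire history as one file
    tar -czf myproject-git.tar.gz -C ~/code/myproject .git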
Better yet, include deduplication, incremental versioning, verification, and encryption. Wait, that's borg / restic.
This is a joke, but honestly anyone here shouldn't be directly backing up their filesystems and should instead be using the right tool for the job. You'll make the world a more efficient place, have more robust and quicker to recover backups, and save some money along the way.
This is a good point, but you might expect them to back up untracked and modified files, along with everything else on your filesystem.
Eh, you really shouldn't do that for any kind of file that acts like an (impromptu) database. This is how you get corruption, especially when change information can be split across more than one file.
Sorry, what are you saying shouldn't be done? Backing up untracked/modified files in a git repo? Or compressing the .git folder and backing it up as a unit?
> Backing up untracked/modified files in a git repo?
This. It's best to do this as an atomic operation, such as a VSS-style snapshot, so you copy a consistent state with no (or paused) operations on the files. Something like a zip is generally better because it takes less time on the filesystem than the upload process typically takes.
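For git specifically there's a purpose-built way to get a consistent single-file snapshot, if that's the goal:

    git bundle create myproject.bundle --all   # all refs and their history in one file
    git bundle verify myproject.bundle         # sanity-check it
    git clone myproject.bundle restored        # restore later by cloning the bundle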
> If it'd been something like an SQLite database instead (just an example really)
See Fossil (https://fossil-scm.org/)
P.S. There's also (https://www.sourcegear.com/vault/)
> SourceGear Vault Pro is a version control and bug tracking solution for professional development teams. Vault Standard is for those who only want version control. Vault is based on a client / server architecture using technologies such as Microsoft SQL Server and IIS Web Services for increased performance, scalability, and security.
It's probably primarily because Linus is a kernel and filesystem nerd, not a database nerd, so he preferred to just use the filesystem which he understood the performance characteristics of well (at least on linux).
Git packs objects into pack-files on a regular basis. If it doesn't, check your configuration, or do it manually with 'git repack'.
I decided to look into this (git gc should also be doing this), and I think I figured out why it's such a consistent issue with git in particular. Running git gc does properly pack objects together and reduce inode count to something much more manageable.
It's the same reason why the postgres autovacuum daemon tends to be borderline useless unless you retune it[0]: the defaults are barmy. git gc only runs if there's 6700 loose unpacked objects[1]. Most typical filesystem tools tend to start balking at traversing ~1000 files in a structure (depends a bit on the filesystem/OS as well, Windows tends to get slower a good bit earlier than Linux).
To fix it, running
> git config --global gc.auto 1000
should retune it, and any subsequent commit to your repos will trigger garbage collection properly when there's around 1000 loose files. Pack file management seems to be properly tuned by default; at more than 50 packs, gc will repack into a larger pack.
[0]: For anyone curious, the default postgres autovacuum setting runs only when 10% of the table consists of dead tuples (roughly: deleted+every revision of an updated row). If you're working with a beefy table, you're never hitting 10%. Either tune it down or create an external cronjob to run vacuum analyze more frequently on the tables you need to keep speedy. I'm pretty sure the defaults are tuned solely to ensure that Postgres' internal tables are fast, since those seem to only have active rows to a point where it'd warrant autovacuum.
[1]: https://git-scm.com/docs/git-gc
I needed to use
> git config --global gc.auto 1000
with the long option name, and no `=`.
I love nothing more than running strange git commands found in HN comments.
Let's ride the lightning and see if it does anything.
A few thousand files shouldn't be a problem to a program designed to scan entire drives of files. Even in a single folder and considering sloppy programs I wouldn't worry just yet, and git's not putting them in a single folder.
You don’t see ZFS/BTRFS block-based snapshot replication choking on git or any sort of dataset. Use the right tool for the job, or something.
They 100% should have communicated this change; it's absolutely unacceptable to change behavior without an extremely visible warning.
However, backing up these kinds of directories has always been ill-defined. Dropbox/Google Drive/etc. files are not actually present locally - at least not until you access the file or the client decides to cache it. Should backup software force you to download all 1TB+ of your cloud storage? What if the local system is low on space? What if the network is too slow? What if the actual data is in an already excluded %AppData% location?
Similar issue with VCS, should you sync changes to .git every minute? Every hour? When is .git in a consistent state?
IMO .git and other VCS directories should just be synced X times per day, and the tool should wait for .git to be unchanged for Y minutes before syncing it. Hell, I bet Claude could write a special git-aware backup script.
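A rough sketch of what such a script could look like (GNU find; the paths and the 10-minute threshold are made up):

    #!/bin/sh
    REPO="$HOME/code/myproject"
    QUIET_SECS=600
    # newest mtime of anything under .git, as a unix timestamp
    newest=$(find "$REPO/.git" -type f -printf '%T@\n' | sort -rn | head -n1)
    now=$(date +%s)
    # only archive once the repo has been quiet long enough
    if [ $(( now - ${newest%.*} )) -ge "$QUIET_SECS" ]; then
        tar -czf "$HOME/backups/myproject-git-$(date +%F).tar.gz" -C "$REPO" .git
    fi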
But Google Drive and Dropbox mount points are not real local folders. It’s crazy to expect backup software to handle that unless explicitly advertised.
Dropbox and GDrive desktop clients can be configured to sync files to a local directory. Backing them up with an additional platform would probably need some sort of logic like you described for VCS.
Exclusions are one thing, but I've had Backblaze _fail to restore a file_. I pay for unlimited history.
I contacted support asking WTF, got "oh the file got deleted at some point, sorry for that", and they offered me 3 months of credits.
I do not trust my Backblaze backups anymore.
I had a similar experience as well. They upgraded their client and server software something like 5 years ago, which introduced different restrictions on the character set allowed in passwords. I had used a special character which was no longer allowed. When I needed to restore files after a disk failure I could not log in either in the app or on the website. The customer service was useless -- we are sorry, your fault. I lost 1 TB of personal photos due to this as a paying customer. Never trust Backblaze.
I have the same experience with Backblaze. 3 years ago I tried to restore my files from Backblaze, using their desktop client.
First thing I noticed is that if it can't download a file due to network or some other problem then it just skips it. But you can force it to retry by modifying its job file which is just an SQLite DB. Also it stores and downloads files by splitting them into small chunks. It stores checksums of these chunks, but it doesn't store the complete checksum of the file, so judging by how badly the client is written I can't be sure that restored files are not corrupted after the stitching.
Then I found out that it can't download some files even after dozens of retries because it seems they are corrupted on Backblaze side.
But the most jarring issue for me is that it mangled all non-ascii filenames. They are stored as UTF-8 in the DB, but the client saves them as Windows-1252 or something. So I ended up with hundreds of gigabytes of files with names like фикац, and I can't just re-encode these names back, because some characters were dropped during the process.
I wanted to write a script that forces Backblaze Client to redownload files, logs all files that can't be restored, fixes the broken names and splits restored files back into chunks to validate their checksums against the SQLite DB, but it was too big of a task for me, so I just procrastinated for 3 years, while keeping paying monthly Backblaze fees because it's sad to let go of my data.
I wonder if they fixed their client since then.
And they talked so much about how great their redundancy is on the backend. I guess they don't count 404s.
Do you have any more details? This is a pretty big deal. The differentiators between Backblaze and Hetzner mostly boil down to this kind of thing supposedly not being possible.
I’m on my phone so forgive the formatting, but here’s my entire support exchange:
- - -
Hey, I tried restoring a file from my backup — downloading it directly didn't work, and creating a restore with it also failed – I got an email telling me to contact y'all about it.
Can you explain to me what happened here, and what can I do to get my file(s?) back?
- - -
Hi Jan,
Thanks for writing in!
I've reached out to our engineers regarding your restore, and I will get back to you as soon as I have an update. For now, I will keep the ticket open.
- - -
Hi Jan,
Regarding the file itself - it was deleted back in 2022, but unfortunately, the deletion never got recorded properly, which made it seem like the file still existed.
Thus, when you tried to restore it, the restoration failed, as the file doesn't actually exist anymore. In this case, it shouldn't have been shown in the first place.
For that, I do apologize. As compensation, we've granted you 3 monthly backup credits which will apply on your next renewal. Please let me know if you have any further questions.
- - -
That makes me even more confused to be honest - I’ve been paying for forever history since January 2022 according to my invoices?
Do you know how/when exactly it got deleted?
- - -
Hi Jan,
Unfortunately, we don't have that information available to us. Again, I do apologize.
- - -
I really don’t want to be rude, but that seems like a very serious issue to me and I’m not satisfied with that response.
If I’m paying for a forever backup, I expect it to be forever - and if some file got deleted even despite me paying for the “keep my file history forever” option, “oh whoops sorry our bad but we don’t have any more info” is really not a satisfactory answer.
I don’t hold it against _you_ personally, but I really need to know more about what happened here - if this file got randomly disappeared, how am I supposed to trust the reliability of anything else that’s supposed to be safely backed up?
- - -
Hi Jan,
I'll inquire with our engineers tomorrow when they're back in, and I'll update you as soon as I can. For now, I will keep the ticket open.
- - -
Appreciate that, thank you! It’s fine if the investigation takes longer, but I just want to get to the bottom of what happened here :)
- - -
Hi Jan,
Thanks for your patience.
According to our engineers and my management team:
With the way our program logs information, we don't have the specific information that explains exactly why the file was removed from the backup. Our more recent versions of the client, however, have vastly improved our consistency checks and introduced additional protections and audits to ensure complete reliability from an active backup.
Looking at your account, I do see that your backup is currently not active, so I recommend running the Backblaze installer over your current installation to repair it, and inherit your original backup state so that our updates can check your backup.
I do apologize, and I know it's not an ideal answer, but unfortunately, that is the extent of what we can tell you about what has happened.
- - -
I gave up escalating at this point and just decided these aren’t trusted anymore.
The files in question are four years old at this point so it’s hard for me to conclusively state, but I guess there might have been a perfect storm of that specific file being deleted because it was due to expire before I upgraded to “keep history forever”. I don’t think it’s super likely, though, and I absolutely would expect them to have telemetry about that in any case.
If anyone from Backblaze stumbles upon it and wants to escalate/reinvestigate, the support ID is #1181161.
Thank you for sharing this. A non-persistent backup service is on the same level as a zombie-insurance provider.
This reminds me of the Seinfeld riff on car rental reservations. Anyone can make a backup. The important part is holding the backup. If Backblaze doesn’t always do that then it is practically worthless to everyone.
This seems absurd from a company offering backups as a service.
Especially since they allow restoring all your data onto a drive and shipping it to you, they pretty clearly should have enough information available to them to test restorations of data. And the number of times I've heard of that failure mode ("oh, we didn't track deletions well enough, so we only found out we deleted it when you tried restoring"), plus them saying they have made improvements to avoid this exact failure mode in newer client versions, makes me think they should have enough reports to investigate it.
...which makes me wonder if they did, and decided they would go bankrupt if they told people how much data they lost, so they decided to bet on people not trying restores on a lot of the lost data.
wut
Weirdly, reading this had the net impact of me signing up to Backblaze.
I had no idea that it was such a good bargain. I used to be a Crashplan user back in the day, and I always thought Backblaze had tiered limits.
I've been using Duplicati to sync a lot of data to S3's cheapest tape-based long term storage tier. It's a serious pain in the ass because it takes hours to queue up and retrieve a file. It's a heavy enough process that I don't do anything nearly close to enough testing to make sure my backups are restorable, which is a self-inflicted future injury.
Here's the thing: I'm paying about $14/month for that S3 storage, which makes $99/year a total steal. I don't use Dropbox/Box/OneDrive/iCloud so the grievances mentioned by the author are not major hurdles for me. I do find the idea that it is silently ignoring .git folders troubling, primarily because they are indeed not listed in the exclusion list.
I am a bit miffed that we're actively prevented from backing up the various Program Files folders, because I have a large number of VSTi instruments that I'll need to ensure are rcloned or something for this to work.
"Maybe they're only incompetent in the ways that have been enumerated in this blog post" does not seem like much of a sales pitch. Baffling.
I'm happy to pay an annual fee for a one-size fits all approach that I don't have to think about. I read the post and I'm just saying that his blockers are not blockers for me.
I would ask you: what is the better alternative? That's not a rhetorical question; they don't have my credit card details for another two weeks.
You lose a bit of control. With S3 you can preprocess (transform, index, filter, downcode, etc) before storing. You can index metadata in place (names, sizes, metadata) for low-cost searching.
As for testing recovery, you can validate file counts, sizes + checksums without performing recovery.
A few shell scripts give you the power of advanced enterprise backup, whereas backblaze only supports GUI restores.
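For example, a cheap consistency check, roughly (bucket name and paths are made up; GNU find assumed):

    # remote listing: key and size for every object
    aws s3api list-objects-v2 --bucket my-backups \
        --query 'Contents[].[Key,Size]' --output text | sort > remote.txt
    # local listing in the same key<TAB>size shape
    find ~/backups -type f -printf '%P\t%s\n' | sort > local.txt
    diff remote.txt local.txt && echo "names and sizes match"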
If you don’t really want backups you can save a lot more money by not signing up for Backblaze.
Are they known for accidentally erasing your backups?
I get that this is not a restorable image, but for $100 a year I'm not expecting that.
At some point, Backblaze just silently stopped backing up my encrypted (VeraCrypt) drives. Just stopped working without any announcement, warning or notification. After lots of troubleshooting and googling I found out that this was intentional from some random reddit thread. I stopped using their backup service after that.
Some companies are in the business of trust. These companies NEED to understand that trust is somewhat difficult to earn, but easy to lose and nearly IMPOSSIBLE to regain. After reading this article I will almost certainly never use or recommend Backblaze. (And while I don't use them currently, they WERE on the list of companies I would have recommended due to the length of their history.)
> trust is somewhat difficult to earn, but easy to lose and nearly IMPOSSIBLE to regain
Eh, I don't agree. Case in point: Microsoft.
Or in other words: a sucker is born every minute.
I noticed this (thankfully before it was critical) and I’ve decided to move on from BB. Easily an over-10-year customer. Totally bogus. Not only did it stop backing up, the old history is totally gone as well.
The one thing they have to do is backup everything and when you see it in their console you can rest assured they are going to continue to back it up.
They’ve let the desktop client linger, it’s difficult to add meaningful exceptions. It’s obvious they want everyone to use B2 now.
What are you using now? Asking for a friend
Not OP, but I have been using borg backup [1] against Hetzner Storage Box [2]
Borg backup is a good tool in my opinion and has everything that I need (deduplication, compression, mountable snapshots).
Hetzner Storage Box is nothing fancy but good enough for a backup, and is considerably cheaper than the alternatives (I pay about 10 eur/month for 5TB of storage)
Before that I was using s3cmd [3] to backup on a S3 bucket.
[1] https://www.borgbackup.org/
[2] https://www.hetzner.com/storage/storage-box/
[3] https://s3tools.org/s3cmd
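For the curious, the basic flow looks roughly like this (a sketch; the storage box user/host, paths, and retention numbers are illustrative, not my real ones):

    export BORG_REPO="ssh://u123456@u123456.your-storagebox.de:23/./backups"
    borg init --encryption=repokey-blake2     # one-time repository setup
    borg create --compression zstd ::'{hostname}-{now}' ~/documents ~/photos
    borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6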
This is quite a bit more expensive than Backblaze if you have more than 5 TBs or so
I use rsync.net. You can use basically any SSH tool or rclone interface. They have a cheaper plan for "experts" if you want to forgo zfs snapshots: https://www.rsync.net/signup/order.html?code=experts
Just a note of caution: sync != backup. When I was younger and dumber, I had my own rsync cron script to do a nightly sync of my documents to a remote server. One day I noticed files were gone from my local drive; I think there were block corruptions on the disk itself, and the files were dropped from the filesystem, or something like that. The nightly rsync propagated the deletions to the remote "backup."
D'argh.
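These days I'd at least make rsync keep anything it would delete or overwrite, e.g. (paths illustrative):

    # deletions/overwrites land in a dated trash dir on the remote
    # instead of being propagated destructively
    rsync -a --delete \
        --backup --backup-dir="../trash/$(date +%F)" \
        ~/documents/ remote:backup/current/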
Musing: there's a step further along the spectrum that echoes the relationship, where "backup" != "backup that resists malware".
In other words, a backup can be degraded into a sync-to-nothing situation if the client logic is untrustworthy.
Rsync.net is really really good.
Just this weekend, my backup tool went rogue and exhausted my quota on rsync.net (some bad config by me on Borg). Emailed them, and they promptly added 100 GB of storage for a day so that I could recover the situation. Plus, their product has been rock solid for the few years I've been using them.
Thanks for your kind words.
Just to clarify - there are discounted plans that don't have free ZFS snapshots but you can still have them ... they just count towards your quota.
If your files don't change much - you don't have much "churn" - they might not take up any real space anyway.
rsync.net and rclone are great, my brain understood restic easier than borg for local backups over usb (ymmv), and plain old `rsync --archive` is most excellent wrt preserving file mod times and the like.
There is 100% a difference between "dead data" (eg: movie.mp4) and "live data" (eg: a git directory with `chmod` attributes)- S3 and similar often don't preserve "attributes and metadata" without a special secondary pass, even though the `md5` might be the same.
I have used Arq for way over a decade. It does incremental encrypted backups and supports a lot of storage providers. Also supports S3 object lock (to protect against ransomware). It’s awesome!
How is the performance? For me it takes Arq over an hour just to scan my files for changes.
(Arq developer here) By default Arq tries to be unobtrusive. Edit your backup plan and slide the “CPU usage” slider all the way to the right to make it go faster.
Wasabi + rclone works well for me. Previous BB customer.
It looks like the following line has been added to /Library/Backblaze.bzpkg/bzdata/bzexcluderules_mandatory.xml which excludes my Dropbox folder from getting backed up:
<excludefname_rule plat="mac" osVers="*" ruleIsOptional="f" skipFirstCharThenStartsWith="*" contains_1="/users/username/dropbox/" contains_2="*" doesNotContain="*" endsWith="*" hasFileExtension="*" />
That is the exact path to my Dropbox folder, and I presume if I move my Dropbox folder this xml file will be updated to point to the new location. The top of the xml file states "Mandatory Exclusions: editing this file DOES NOT DO ANYTHING".
.git files seem to still be backing up on my machine, although they are hidden by default in the web restore (you must open Filters and enable Show Hidden Files). I don't see an option to show hidden files/folders in the Backblaze Restore app.
I wonder if OP didn't realise there was this _Show Hidden Files_ option and their .git was indeed backed up.
That would be nice, they'd be able to get their history back!
> .git files seem to still be backing up on my machine
Try checking bzexcluderules_editable.xml. A few years ago, Backblaze would back up .git folders for Mac but not Windows. Not sure if this is still the case.
After mucking around with various easy to use options my lack of trust[1] pushed me into a more-complicated-but-at-least-under-my-control-option: syncthing+restic+s3 compatible cloud provider.
Basically it works like this:
- I have syncthing moving files between all my devices. The larger the device, the more stuff I move there[2]. My phone only has my keepass file and a few other docs, my gaming PC has that plus all of my photos and music, etc.
- All of this ends up on a raspberry pi with a connected USB harddrive, which has everything on it. Why yes, that is very shoddy and short term! The pi is mirrored on my gaming PC though, which is awake once every day or two, so if it completely breaks I still have everything locally.
- Nightly a restic job runs, which backs up everything on the pi to an s3 compatible cloud[3], and cleans out old snapshots (30 days, 52 weeks, 60 months, then yearly; see the sketch after the footnotes)
- Yearly I test restoring a random backup, both on the pi, and on another device, to make sure there is no required knowledge stuck on there.
This was somewhat of a pain to set up, but since the pi is never off it just ticks along, and I check it periodically to make sure nothing has broken.
[1] there is always weirdness with these tools. They don't sync how you think, or when you actually want to restore it takes forever, or they are stuck in perpetual sync cycles
[2] I sync multiple directories, broadly "very small", "small", "dumping ground", and "media", from smallest to largest.
[3] Currently Wasabi, but it really doesn't matter. Restic encrypts client side; you just need to trust the provider enough that they don't completely collapse at the same time that you need backups.
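The retention part of the nightly job is roughly this (a sketch, not my exact script; repository env vars omitted, paths made up):

    restic backup /mnt/usb-drive
    # 30 dailies, 52 weeklies, 60 monthlies, then yearlies
    restic forget --prune \
        --keep-daily 30 --keep-weekly 52 --keep-monthly 60 --keep-yearly 100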
We need to talk about The Cone of Backups(tm), which you and I seem to have separately derived!
Props for getting this implemented and seemingly trusted... I wish there was an easier way to handle some of this stuff (eg: tiny secure key material => hot syncthing => "live" git files => warm docs and photos => cold bulk movies, isos, etc)... along with selective "on demand pass through browse/fetch/cache"
They all have different policy, size, cost, technical details, and overall SLA/quality tradeoffs.
Does syncthing work yet?
~ 5 years ago, I had a development flow that involved a large source tree (1-10K files, including build output) that was syncthing-ed over a residential network connection to some k8s stuff.
Desyncs/corruptions happened constantly, even though it was a one-way send.
I've never had similar issues with rsync or unison (well, I have in unison, but that's two-way sync, and it always prompted to ask for help by design).
Anyway, my decade-old synology is dying, so I'm setting up a replacement. For other reasons (mostly a decade of systemd / pulse audio finding novel ways to ruin my day, and not really understanding how to restore my synology backups), I've jumped ship over to FreeBSD. I've heard good things about using zfs to get:
sanoid + syncoid -> zfs send -> zfs recv -> restic
In the absence of ZFS, I'd do:
rsync -> restic
Or:
unison <-> unison -> restic.
So, similar to what you've landed on, but with one size tier. I have docker containers that the phone talks to for stuff like calendars, and just have the source of the backup flow host my git repos.
One thing to do no matter what:
Write at least 100,000 files to the source then restore from backup (/ on a linux VM is great for this). Run rsync in dry run / checksum mode on the two trees. Confirm the metadata + contents match on both sides. I haven't gotten around to this yet with the flow I just proposed. Almost all consumer backup tools fail this test. Comments here suggest backblaze's consumer offering fails it badly. I'm using B2, but I haven't scrubbed my backup sets in a while. I get the impression it has much higher consistency / durability.
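Concretely, the comparison step looks something like:

    # -i itemizes every mismatch; --checksum forces content comparison
    rsync -ain --checksum /original/tree/ /restored/tree/
    # empty output = contents and metadata agree on both sides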
I will say I specifically don't sync git repos (they are just local and pushed to github, which I consider good enough for now), and I am aware that syncthing is one more of those tools that does not work well with git.
syncthing is not perfect, and can get into weird states if you add and remove devices from it for example, but for my case it is I think the best option.
I've personally had no major issues with syncthing, it just works in the background, the largest folder I have synced is ~6TB and 200k files which is mirroring a backup I have on a large external.
One particular issue I've encountered is that syncthing 2.x does not work well for systems w/o an SSD, due to the storage backend switching to sqlite, which doesn't perform as well as leveldb on HDDs: scans of the 6TB folder were taking excessively long to complete compared to 1.x using leveldb. I haven't encountered any issues with mixing the use of 1.x and 2.x in my setup. The only other issues I've encountered are usually related to filename incompatibilities between filesystems.
Anecdotally, I've been managing a Syncthing network with a file count in the ~200k range, everything synced bidirectionally across a few dozen (Windows) computers, for 9 years now; I've never seen data loss where Syncthing was at fault.
Good to know. I wonder what the difference is. We were doing things like running go build inside the source directory. Maybe it can't handle write races well on Linux/MacOS?
I also have a lil script that rolls dice on a restic snapshot, then lists files and picks a random set to restore to /dev/null.
I still trust restic checksums to actually check whether the restore is correct, but this way a random part of the storage gets tested every so often, in case some old pack file got damaged
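A sketch of the same idea, assuming jq and GNU shuf (sample size is arbitrary; restic ls also prints a header line and directory entries, which a real script would filter out):

    # pick a random snapshot, then spot-restore 20 random entries from it
    snap=$(restic snapshots --json | jq -r '.[].short_id' | shuf -n 1)
    restic ls "$snap" | shuf -n 20 | while read -r f; do
        restic dump "$snap" "$f" >/dev/null || echo "FAILED: $snap $f"
    done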
That's a really good idea, imma steal it.
I once had to restore around 2 TB of RAW photos. The app was a mess. It crashed every few hours. I ended up manually downloading single folders over a timespan of 2 weeks to restore my data. The support only apologized and could not help with my restore problem. After this I cancelled my subscription immediately and use local drives for my backups now, drives which I rotate (in use and locations).
I'll never trust them with my data again.
If you're pulling 2TB of data you should be ordering it on a drive from them and saving yourself the hassle.
I restored a 2 TB drive via the net no problem from them some years back, although I didn't use the client, but downloaded one massive ZIP file from the web interface.
Well, "no problem" is an overstatement. Once you need a restore, you learn that their promise of end-to-end encryption is actually a lie. (As in, you have to break the end-to-end encryption to restore since everything has to be decrypted on their servers.)
The fact that they’d exclude “.git” and other things without being transparent about it is scandalous
I can almost, almost understand the logic behind not backing up OneDrive/Dropbox. I think it's bad logic but I can understand where it's coming from.
Not backing up .git folders however is completely unacceptable.
I have hundreds of small projects where I use git to track history locally with no remote at all. The intention is never to push it anywhere. I don't like to say these sorts of things, and I don't say it lightly, but someone should be fired over this decision.
I had a back and forth with them about .git folders a couple of years back and their defence was something like "we are a consumer product - not a professional developer product. Pay for our business offering"
But if that's truly their stance, then they are being deceptive about their non-business offering at the point of sale.
EDIT - see my other comment where I found the actual email
Well I do pay for their business product, I have a "Business Groups" package with a few dozen endpoints all backing up for $99/year per machine.
According to support's reply just now, my backups are crippled just like every other customer's. No git, no cloud synced folders, even if those folders are fully downloaded locally.
(This is also my personal backup strategy for iCloud Drive: one Mac is set to fully download complete iCloud contents, and that Mac backs up to Backblaze.)
> My first troubling discovery was in 2025, when I made several errors then did a push -f to GitHub and blew away the git history for a half decade old repo. No data was lost, but the log of changes was.
I know this is besides the point somewhat, but: Learn your tools people. The commit history could probably have been easily restored without involving any backup. The commits are not just instantly gone.
> The commits are not just instantly gone.
Indeed, the commits and blobs might even have still been available on the GitHub remote, I'm not sure they clean them on some interval or something, but bunch of stuff you "delete" from git still stays in the remote regardless of what you push.
Why tho? Doesn't Backblaze sell cloud storage, where the more files you back up, the more profit they make? Or do I misunderstand what it is?
AFAICT Backblaze does back up .git directories. I have many repos backed up. The .git directory is hidden by default in the web UI (along with all other hidden files), but there is an option to show them.
You should try downloading one of your backed up git repos to see if it actually does contain the full history, I just checked several and everything looks good.
I commented on this topic elsewhere on this page. This is an email from 2021. Maybe they changed policy but here:
> Bob (Backblaze Help)
> Aug 5, 2021, 11:33 PDT
> Hello there,
> Thank you for taking the time to write in,
> Unfortunately .git directories are excluded by Backblaze by default. File
> changes within .git directories occur far too often and over so many files
> that the Backblaze software simply would not be able to keep up. It's beyond
> the scope of our application.
> The Personal Backup Plan is a consumer grade backup product. Unfortunately we
> will not be able to meet your needs in this regard.
> Let me know if you have any other questions.
> Regards,
> Bob The Backblaze Team
> changes within .git directories occur far too often and over so many files that the Backblaze software simply would not be able to keep up.
I don’t really understand that. I’m using Windows File History, and while it’s limited to backing up changes only every 15 minutes, and is writing to a local network drive, it doesn’t seem to have any trouble with .git directories.
Does the UI at least hint that there are hidden files, or is it only by going to the filters that you can find this out?
It seems incredibly stupid for a BACKUP PROGRAM to not list hidden files, or at least indicate that they're hidden (e.g. _(hidden)_.git)
Thanks. Silently ignoring .git folders would be much more egregious than not backing up cloud drives in my opinion. The latter is at least somewhat understandable, though they should have been more transparent about it.
A lot of personal “nerd” options are listed in the thread (and the likes of restic/borg are really good!) but nothing really centralized. Backblaze was a great fire-and-forget option for deploying as a last-resort backup. I don’t think there are any competitors in that space if you are looking for continuous backup, centralized management, and good pricing that doesn’t require talking to a salesperson to get things going and is pay as you go.
I just checked the Backblaze app and found that .iso was on the exclusion list. Just in case anyone here is as dumb as I...
It's ironic that Backblaze themselves wrote a blog post a couple of years ago explaining why Dropbox isn't enough as a backup service and you need Backblaze as an additional layer of protection: https://www.backblaze.com/blog/whats-wrong-with-google-drive...
That aged well...
Everyone is acting like this is obviously wrong, and they clearly should have communicated the change and made it visible in the exclusion settings.
However, there is a very good reason for not backing up what is in effect network-attached storage. Particularly for OneDrive, as it often adds company SharePoint sites you open files from as mountpoints under your OneDrive folder (business OneDrive is basically a personal SharePoint site under the hood). Trying to back them up would result in downloading potentially hundreds of gigabytes of files to the desktop only to then reupload them to the backup service. That would also likely trigger data exfiltration flags at your corporate IT.
A Dropbox/OneDrive/Drive/etc folder is a network mount point by another name. (Many of them are now implemented as FUSE mounts or an equivalent OS API, not as folders on disk.) It's fundamentally reasonable for software that promises backing up the local disk not to back up whatever network drives you happen to have signed in/mounted.
Great explanation, very reasonable.
Except that before, they did back these up, and then they stopped without any proper notification (release notes don't count for significant changes like this).
They should have added a pop-up or an email, or both: a heads-up in advance, and then again when the change actually kicked in.
Well, you missed the point.
The problem is not them not backing it up by default but:
* changing an existing setting to back up less by default
* essentially hiding the change from the user, as it is not shown on the directory exclude list
Ironically, Dropbox and OneDrive folders I can still somewhat understand, as they are "backed up" in other ways (but potentially not reliably, so I also understand why people do not like that).
But .git? Having a .git folder does not mean it's synced to GitHub or anywhere reliable.
If anything, back up only the .git folder and not the checkout.
But backing up the checkout and not the .git folder is crazy.
I use backblaze and have repos I don't push for this reason, so I am a bit stunned lol
I have multiple drives that started out as their own os. Each of them has a Dropbox folder in the standard location. Each of them has a different set of files in them (I deduped at one point), with some overlap of different versions. I no longer use Dropbox, so none of these are synced anywhere.
They don't need to be in my case, I'm only using them now because of existing shortcuts and VM shares and programs configured to source information from them. That doesn't mean I don't want them backed up.
Same for OneDrive: Microsoft configured my account for OneDrive when I set it up. Then I immediately uninstalled it (I don't want it). But I didn't notice that my desktop and documents folders live there. I hate it. But by the time I noticed it, it was already being used as a location for multiple programs that would need to be reconfigured, and it was easier to get used to it than to fix it. Several things I've forgotten about would likely break in ways I wouldn't notice for weeks/months. Multiple self-hosted servers for connecting to my android devices would need to reindex (Plex, voidtools everything, several remote systems that mount via sftp and connected programs would decide all my files were brand new and had never been seen before)
> Dropbox and OneDrive folders I can still somewhat understand, as they are "backed up" in other ways
No they are not. This is explicitly addressed in the article itself.
Normally these folders are synced to Dropbox and/or OneDrive.
Both services have internal backups to reduce the chance they lose data.
Both services allow some limited form of "going back to an older version" (like the article states itself).
Just because the article says "sync is not backup" doesn't mean that is true. I mean, it literally is a backup by definition: it makes a copy in another location and even has versioning.
It's just not a _good enough_ backup by their standards. Maybe not even by the standards of most people on HN, but out there many people are happy with way worse backups, especially wrt. versioning: for a lot of (mostly static) media, the only reason you need version rollback is a corrupted version being backed up. And a lot of people mostly back up personal photos/videos and important documents, all static by nature.
Though
1. It doesn't really fulfill the 3-2-1 rule; it's only 2-1-1 places (local, one backup on the MS/Dropbox cloud, one offsite). Before, when it was also backed up to Backblaze, it was 3-2-1 (kinda). So them silently stopping is still a huge issue.
2. Newer versions of the 3-2-1 rule also say to treat the 2 not just as 2 backups but as 2 "vendors/access accounts"; with the OneDrive folder pretty much being OneDrive-controlled, this is 1 vendor across local and all backups. Which is risky.
The author of the article explicitly ignored that both of them come with versioning, so it is not just sync; you have old versions of files too
Parent is using "backed up" to mean "likely in some cloud (latest version)". And that may explain why BB excludes .git folders.
You are using it to mean "maintaining full version history", I believe? Another important consideration.
> You are using it to mean "maintaining full version history", I believe?
No, they are using it to mean “backed up”. Like, “if this data gets deleted or is in any way lost locally, it’s still backed remotely (even years later, when finally needed)”.
I’m astonished so many people here don’t know what a backup is! No wonder it’s easy for Backblaze to play them for fools.
But isn't that exactly what Dropbox does? If I delete a file on my PC, I can go to Dropbox.com and restore it, to some period in the past (I think it depends on what you pay for). In fact, I can see every version that's changed during the retention period and choose which version to restore.
Maintaining version history out to a set retention period is a backup...no?
The definition of the term backup by most sources is along the lines of:
> a copy of information held on a computer that is stored separately from the computer
there is nothing about _any_ versioning, or duration requirements or similar
To use your own words, I fear it's you who doesn't know what a backup is, and assumes a lot of other additional (often preferable(1)) things are part of that term.
Which is a common problem, not just for the term backup.
There is a reason lawyers define technical terms in a for this contract specific precise way when making contracts.
Or just requirements engineering. Fail there and you might end up having a backup of all your company's important data in a way susceptible to file-encrypting ransomware or similar.
---
(1): What often is preferable is also sometimes the thing you really don't want. Like sometimes keeping data around too long is outright illegal. Sometimes that also applies to older versions only. And sometimes just some short-term backups are more than enough for your use case. The point here is the term backup can't mean what you imply it does, because a lot of existing use cases are incompatible with it.
Oftentimes the important data that needs restoring is in the checkout: uncommitted and unstaged changes that represent hours of work.
Microsoft makes no guarantees on OneDrive; you are responsible for backing up that data. Of course they try hard to keep it safe, but contractually they make no promises
Both Dropbox and OneDrive default to "online first" for most users (including Dropbox on macOS which has moved itself into File Provider). It is a technically sound and sane default for Backblaze to ignore these mounts, especially given their policy not to backup network drives. They really should have informed legacy users about it.
Technically speaking, imagine you're iterating over a million files and some of them are 1000x slower than the others; it's not Backblaze's fault that things have gone this way. Avoiding files that are well-known network mount points is likely necessary for them to be reliable at what they do for local files.
It's important to recognize that these new OS-level filesystem hooks are slow and inefficient - the use case is opening one file and not 10,000 - and this means that things you might want to do (like recursive grep) are now unworkably slow if they don't fit in some warmed-up cache on your device.
To fix it, Backblaze would need a "cloud to cloud" backup that is optimized for that access pattern, or a checkbox (or detection system) for people who manage to keep a full local mirror in a place where regular files are fast. This is rapidly becoming a less common situation. I do, however, think that they should have informed people about the change.
fwiw, the .git files are being backed up... but:
1. You have to check "show hidden files" in the web ui (or the app) when restoring and
2. If you restore a folder that has a '.git' folder inside of it (by checking it in the ui) but you DID NOT check "show hidden files", then the '.git' (or any other hidden file/folder) does not get restored.
Which is.. unexpected.. if I check a folder to restore, I expect *everything* inside of it to be restored.
But the dropbox folder is, in fact, not there. Which is a surprise to me as well. :(
I highly recommend switching to something more like Arq and then using whatever backend storage that you want. There are probably some other open source ways to do it, etc, but Arq scratches the itch of having control over your backups and putting them where you want with a GUI to easily configure/keep track of what is going on.
Maybe there's something newer/better now (and I bought lifetime licenses of it long ago), but it works for me.
That said, I use Arq + Backblaze storage and I think my monthly bill is very low, like under $5. Though I haven't backed-up much media there yet, but I do have control over what is being backed-up.
Yeah, this is the core problem with how most backup tools handle Dropbox / iCloud / OneDrive now. Those folders aren’t really “normal files” anymore — a lot of the time they’re just placeholders, and touching them can trigger downloads or other weird behavior depending on the client.
That said, just skipping the entire folder is kind of the worst possible outcome. Backup should be predictable. If something is on disk, it should get backed up. If it’s not, you should at least know that, not find out later when you need it.
I’ve been working on Duplicati (https://github.com/duplicati/duplicati) and one thing we’ve tried to be careful about is not silently ignoring data. If something can’t be backed up, it should be visible to the user.
Feel free to reach out to me if you have any questions about setting up duplicati.
For those looking for something at a decent price for up to 5TB, take a look at JottaCloud, which is supported by rclone, and then you can layer restic on top for a complete backup solution.
JottaCloud is "unlimited" for $11.99 a month (your upload speed is throttled after 5TB).
I've been using them for a few years for backing up important files from my NAS (timemachine backups, Immich library, digitised VHS's, Proxmox Backup Server backups) and am sitting at about 3.5TB.
WJW. This sort of blanket policy change should be called out in ALL CAPS, bold-faced, and underlined, as it changes one of the implicit assumptions underlying the service.
The technical and performance implications of backing-up cloud mount-points are real, but that's zero excuse for the way this change was communicated.
This is a royal screw-up in corporate communications, and I would not be surprised if it has a huge negative impact on their bottom line and results in a few terminations.
I think this is a risk with anything that promotes itself as "unlimited", or otherwise doesn't specify concrete limits. I'm always sceptical of services like this as it feels like the terms could arbitrarily change at any point, as we've found out here.
(As a side note, it's funny to see them promoting their native C app instead of using Java as a "shortcut". What I wouldn't give for more Java apps nowadays.)
On the topic of backing up data from cloud platforms such as OneDrive, I suspect this is to stop the client machine from actively downloading "files on demand", which are just pointers in Explorer until you go to open them.
If you've got huge amounts of files in OneDrive and the backup client starts downloading every one of them (before it can reupload them again), you're going to run into problems.
But ideally, they'd give you a choice.
This is a pain, to be sure, but surely there is some sort of logic you could implement to detect whether a file is a Real File that actually exists on the device (if so, back it up) or a pointer to the cloud (ignore it by default, probably, but maybe provide a user setting to force it to back up even these)
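Something along those lines is doable today. A rough macOS sketch, assuming (from memory, so verify) that APFS marks evicted iCloud files with a "dataless" flag that `ls -lO` prints; Windows has analogous offline/recall-on-access file attributes:

    # list iCloud Drive files whose bytes live only in the cloud,
    # using metadata reads only (no downloads are triggered)
    find "$HOME/Library/Mobile Documents" -type f -print0 \
      | xargs -0 ls -lO 2>/dev/null \
      | grep -w dataless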
It used to be the case that placeholder files were very obvious but now OneDrive and iCloud (possibly others) work more like an attached network storage with some local cache, and that was a good move for most programs because back then a file being evicted from storage looked like a file deletion.
Came here to say this. Files in OneDrive get removed from your local storage and are downloaded on demand. Given that you can have a 1TB+ OneDrive folder, Backblaze downloading all of that is gonna throttle your connection and fill up your disk real fast.
No reason for that to be true.
I think this should not be attributed to malice, however unfortunate. I developed a sync app once, and OneDrive folders were indeed problematic, causing cyclic updates on access and random metadata changes for no explicit reason.
Complete lack of communication (outside of release notes, which nobody really reads, as the article too states) is incompetence and indeed worrying.
Just show a red status bar that says "these folders will not be backed up anymore", why not?
What's worse, random metadata changes or completely missing data?
If the constant meta changes (or other peculiarities involving those folders) make the sync unusable, then it can be both. In that case, you stop syncing and communicate.
So my idea is that it's a competency problem (lack of communication), not malice. But it's just a theory, based on my own experience.
In any case, this is a bad situation, however you look at it.
I've been on Backblaze for a few years now, ever since Crashplan decided it didn't want individuals to use its service any more.
It's always been just janky. A bad app that constantly throws low disk warnings and opens a webpage if you click anywhere on it. Being told the password change dialogue in the app doesn't work and having to use the website etc etc.
Just all round not an experience that inspires confidence. In comparison, Crashplan just worked.
But Crashplan also had an absolute abomination of a bloated, sluggish, Java-based client.
It was a bit, but I never found it as bad as Backblaze.
Unrelated to the main point, and probably too late to matter, but you can access repo activity logs via Github's API. I had to clean up a bad push before and was able to find the old commit hash in the logs, then reset the branch to that commit, similarly to how you'd fix local messes using reflog.
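Roughly like this, if I remember the endpoint right (OWNER/REPO and the token are placeholders; the repository activity API reports a "before" SHA for each force push):

    # find the commit the branch pointed at before the bad force push
    curl -s -H "Authorization: Bearer $GITHUB_TOKEN" \
      "https://api.github.com/repos/OWNER/REPO/activity?activity_type=force_push" \
      | jq -r '.[0].before'
    # GitHub usually still serves the orphaned commit by hash for a while
    git fetch origin <before-sha>
    git branch rescue <before-sha>   # inspect, then reset/push the branch back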
Use restic with resticprofile and you won't need anything else. Point it to a Hetzner storagebox, the best value you can get. Don't trust fisher price backup plans
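A minimal sketch of that combo, going off my memory of resticprofile's TOML format (storage box user, paths, and retention numbers are placeholders, so double-check the docs):

    # profiles.toml
    [default]
    repository = "sftp:u123456@u123456.your-storagebox.de:restic-repo"
    password-file = "repo-password.txt"

    [default.backup]
    source = [ "/home", "/etc" ]
    exclude = [ ".cache" ]

    [default.retention]
    after-backup = true
    keep-daily = 7
    keep-weekly = 8

Then `resticprofile backup` runs the whole thing, retention included.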
Isn't it challenging to back up a directory that's being synced with a 3rd party service? Especially if more than one person may be working on one of those files in OneDrive or DropBox?
I think the target of the anger here should be (at least in part): OneDrive.
My understanding is that a modern, default onedrive setup will push all your onedrive folder contents to the cloud, but will not do the same in reverse -- it's totally possible to have files in your cloud onedrive, visible in your onedrive folder, but that do not exist locally. If you want to access such a file, it typically gets downloaded from onedrive for you to use.
If that's the case, what is Backblaze or another provider to do? Constantly download your onedrive files (that might have been modified on another device) and upload them to backblaze? Or just sync files that actually exist locally? That latter option certainly would not please a consumer, who would expect the files they can 'see' just get magically backed up.
It's a tricky situation and I'm not saying Backblaze handled it well here, but the whole transparent cloud storage situation thing is a bit of a mess for lots of people. If Dropbox works the same way (no guaranteed local file for something you can see), that's the same ugly situation.
Most have pointed out that the OneDrive exclusion makes sense due to its complexity. But I see no one here defending the undocumented .git exclusion. That's pretty egregious - if I'm backing up that directory it's always 100% intentional, and it definitely feels like a sacrifice of product functionality for stability and performance. Not documenting it just twists the knife.
If you want to access your file, it gets downloaded. If Backblaze wants to check if your file has been changed, it doesn’t need to have the file downloaded - that’s what modification time is for. And file size.
Time Machine has a similar issue. OneDrive silently corrupted hundreds of my files, replacing their content with binary zeros while retaining the original file size. I have Time Machine backups going back years, but it turns out TM does not back up cloud files, even if you have them pinned to local storage! So I lost those files, including some irreplaceable family photos.
I’ve added restic to my backup routine, pointed at cloud files and other critical data
This is really disturbing to hear as I've incorporated B2 into a lot of my flow for backups as well as a storage backend for Nextcloud and planned as the object store for some upcoming archival storage products I'm working on.
I know the post is talking about their personal backup product but it's the same company and so if they sneak in a reduction of service like this, as others have already commented, it erodes difficult-to-earn trust.
I had issues with the personal backup product and was told the solution was to create a new account. I moved to Wasabi immediately using rclone.
On macOS.
I've been very content moving away from OneDrive/GDrive to a personal NAS setup with Synology/Ugreen. You can access a shared drive/photo drive and use Tailscale to mount your volume from anywhere.
I've also configured encrypted cloud backups to a different geographic region and off-site backups to a friend's NAS (following the 3-2-1 backup rule). It does help having 2.5Gb networking as well, but owning your data is more important in the coming age of sloppy/degrading infrastructure and ransomware attacks.
I'd been using it for years, and the one time I needed to restore a file, I realized that VMware VM files were excluded from the backup. There are so many exclusions that I've started doing physical backups again.
I assume they do some form of de-duplication across all files in their system. Most windows system files, and binaries would be duplicates, and only need to be stored once. I'm relatively sure this is true for most other systems, like Linux, MacOS, etc. Why not just back everything up for everyone?
It really shouldn't take up much more space or bandwidth.
Personally: I had to go in and edit the undisclosed exclusions file, and restart the backup process. I've got quite a few gigabytes of upload going now.
Having the option to not do that is great; we did it for our backup system, because our cloud stores were backed up separately.
Doing it silently is disaster.
Hiding those excludes from the UI is outright malice, because it's far too easy to assume they would just be added as normal excludes and then go "huh, I probably just removed those from the excludes when I set it up".
I dropped them when they silently dropped support for veracrypt/truecrypt drives: https://old.reddit.com/r/backblaze/comments/1ol0pgf/backblaz...
They're really proving lately that they are a company that can't be trusted with your data.
It seems to me that Backblaze does NOT exclude ".git". It's not shown by default in the restore UI -- you must enable "show hidden files" to see it -- but it's there. I just did a test restore of my top-level Project directory (container for all of my personal Git projects) and all .git directories are included in the produced .zip file.
At least at the enterprise level, I've never seen anyone use Backblaze for this. You want to use a cloud-level backup like Rubrik/Veeam/Cohesity. Trying to back up cloud-based files locally is a fool's errand. Granted, it sucks that they dropped this without proper communication, but it was already a bad solution.
We just use restic with B2 and some local S3 servers rather than relying on proprietary solutions. If it goes to shit we will just change the provider
Thanks for publicising. I recently decided not to renew my Backblaze in favour of 'self hosting' encrypted backups outside the US. But I was horrified to learn that my git repos may not have been backed up, nor my Dropbox, whose subscription I also recently cancelled. Good riddance.
My experience using restic has been excellent so far, snapshots take 5 mins rather than 30 mins with backblaze's Mac client. I just hope I can trust it…
Glad I switched from their personal computer backup to using restic + B2 a while ago. Every night my laptop and homelab both back up to each other and to B2. It takes less than a minute and I have complete control over the exclusions and retention. And I can easily switch off B2 to something else if I want.
I was using Restic + B2 for a while, but recently switched to Restic + Hetzner Storage Box.
Storage Box is a little more effort to set up since it doesn't provide an S3 interface and I instead had to use WebDAV, but it's more affordable and has automated snapshots that add a layer of easy immutability.
I would love to see a summary of all of the various options being bandied about.
There are 2 components in my mind: the backup "agent" (what runs on your laptop/desktop/server) and the storage provider (which BB is in this context).
What do people recommend for the agent? (I understand some storage providers have their own agents) For Linux/MacOS/Windows.
What do people recommend for the storage provider? Let's assume there are 1TB of files to be backed up. 99.9% don't change frequently.
I discovered Backblaze through their disk reliability posts here on HN and became a customer for a family laptop many years ago.
Now I discover again through HN, that it's time to find another solution.
I feel that's a systemic problem with all consumer online-backup software: They often use the barest excuse to not back things up. At best, it's to show a fast progress bar to the average user, and at worst it's to quietly renege on the "unlimited" capacity they promised when they took your money. [1]
Trying to audit—let alone change—the finer details is a pain even for power users, and there's a non-zero risk the GUI is simply lying to everybody while undocumented rules override what you specified.
When I finally switched my default boot to Linux, I found many of those offerings didn't support it, so I wrote some systemd services around Restic + Backblaze B2. It's been a real breath of fresh air: I can tell what's going on, I can set my own snapshot retention rules, and it's an order of magnitude cheaper. [2]
____
[1] Along the lines of "We have your My Documents. Oh, you didn't manually add My Videos or My Music for every user? Too bad." Or in some cases, certain big-file extensions are on the ignore list by default for no discernible reason.
[2] Currently a dollar or two a month for ~200 GB. It doesn't change very much, and data verification jobs redownload the total amount once a month. I don't back up anything I could get from elsewhere, like Steam games. Family videos are in the care of different relatives, but I'm looking into changing that.
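For the curious, the wrapper amounts to a oneshot service plus a timer. A trimmed sketch, not my exact units (bucket name, paths, and the env file are placeholders):

    # ~/.config/systemd/user/restic-backup.service
    [Unit]
    Description=Restic backup to Backblaze B2

    [Service]
    Type=oneshot
    # b2.env holds B2_ACCOUNT_ID, B2_ACCOUNT_KEY, and RESTIC_PASSWORD
    EnvironmentFile=%h/.config/restic/b2.env
    ExecStart=/usr/bin/restic -r b2:my-bucket:laptop backup --exclude-caches %h
    ExecStartPost=/usr/bin/restic -r b2:my-bucket:laptop forget --keep-daily 7 --keep-weekly 5 --prune

    # ~/.config/systemd/user/restic-backup.timer
    [Unit]
    Description=Nightly restic backup

    [Timer]
    OnCalendar=daily
    Persistent=true

    [Install]
    WantedBy=timers.target

Enable with `systemctl --user enable --now restic-backup.timer`.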
Yes, you're exactly right. Once they decide to exclude certain filetypes, it puts the burden on end users, who are unequipped to monitor these changes.
Umm, why didn't you find a GUI manager like Vorta (this one is Borg-exclusive, IIRC)?
With restic I don't need some kind of special server daemon on the other end, I can point my backup destination to any mountable filesystem, or relatively dumb "bucket" stores like S3 or B2. I like having the sense of options and avoiding lock-in. [1]
As for GUIs in general... Well, like I said, I just finished several years of bad experiences with some proprietary ones, and I wanted to see and choose what was really going on.
At this point, I don't think I'd ever want a GUI beyond a basic status-reporting widget. It's not like I need to regularly micromanage the folder-set, especially when nobody else is going to tweak it by surprise.
_____
[1] The downside to the dumb-store is a ransomware scenario, where the malware is smart enough to go delete my old snapshots using the same connection/credentials. Enforcing retention policies on the server side necessarily needs a smarter server. B2 might actually have something useful there, but I haven't dug into it.
My takeaway is that for data that matters, don't trust the service. I back up with Restic, so that the service only sees encrypted blobs.
Same, I use Restic + Backrest (plus monitoring on Healthchecks, self-hosted + Prometheus/AlertManager/Pushover), with some decent structure - local backups every half-an-hour to raid1, every hour a backup to my old NAS, every day a backup to FTP in Helsinki, and once a week some backups to Backblaze (via Restic). Gives me local backups, observability, remote backups spread across different providers - seems quite safe :) I highly recommend to everyone figuring out a good backup strategy, takes a day or two.
Edit: on top of that I've built a custom one-page monitoring dashboard, so I see everything in one place (https://imgur.com/B3hppIW) - I'll opensource, it's decent architecture, I just need to cleanup some secrets from Git history...
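The Healthchecks wiring is a one-liner at the end of each job. A sketch, with a placeholder ping UUID:

    # ping on success, ping /fail on failure; a missed or failed ping raises the alert
    restic backup /data \
      && curl -fsS --retry 3 https://hc-ping.com/<uuid> > /dev/null \
      || curl -fsS --retry 3 https://hc-ping.com/<uuid>/fail > /dev/null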
What cloud backend are people using for restic? B2/S3/something else? I'm still just backing up to other machines using it (though I'd also heavily recommend restic)
I run restic with rclone, which is compatible not only with S3-like storage (of which there are many: Hetzner, OVH, Exoscale) but with many other backends, from Mega to pCloud to Google Drive.
For stuff I care about (mostly photos), I back them up on two different services. I don't have TBs of those, so it's not very expensive. My personal code I store on git repositories anyway (like SourceHut or Codeberg or sometimes GitHub).
Yep, I was wondering which services people would recommend. I had been thinking about B2, I just haven't prioritised it.
cloudflare
Dropped Backblaze over this when I learned about it in December (https://mjtsai.com/blog/2025/12/19/backblaze-no-longer-backs...) and went to Arq. Not as polished, especially on Windows, but works and is actually cheaper.
They need to be reminded they are a utility. It's not up to them to have an opinion about my data. "Utilities" with an opinion carry a lot more liability.
Anyone have suggestions for backing up Google Drive + local files? I keep reading the horror stories about people getting locked out of cloud services, and worry about my 20 years of history stored in Drive. Less worried about local files which are sync'd to an external disk, but it'd be nice to have something in place for everything.
So what are HN’s favorite alternatives?
Preferably cheap and rclone compatible.
Hetzner storagebox sounds good, what about S3 or Glacier-like options?
> So what are HN’s favorite alternatives?
I assume when asking such a question, you expect an honest answer like mine:
rclone is my favorite alternative. Supports encryption seamlessly, and loaded with features. Plus I can control exactly what gets synced/backed up, when it happens, and I pay for what I use (no unsustainable "unlimited" storage that always comes with annoying restrictions). There's never any surprises (which I experienced with nearly every backup solution). I use Backblaze B2 as the backend. I pay like $50 a month (which I know sounds high), but I have many terabytes of data up there that matters to me (it's a decade or more of my life and work, including long videos of holidays like Christmas with my kids throughout the years).
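Concretely, the encryption is just one extra "crypt" remote layered over the B2 one. A sketch (remote and bucket names are illustrative, and depending on your rclone version the interactive `rclone config` may be the easier path):

    # wrap an existing b2: remote in client-side encryption, then sync through it
    rclone config create secret crypt \
        remote b2:my-bucket/backup \
        password "$(rclone obscure 'a-long-passphrase')"
    rclone sync --fast-list ~/Documents secret:documents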
For super-important stuff I keep a tertiary backup on Glacier. I also have a full copy on an external harddrive, though those drives are not very reliable so I don't consider it part of the backup strategy, more a convenience for restoring large files quickly.
Backblaze's B2 storage is fine if used with a separate app over which you have more control. Others here have mentioned Arq. I have used it, as well as Kopia[0] and Blinkdisk[1] (Blinkdisk is essentially Kopia but with a nicer UI). Can recommend all three highly; the latter two are FOSS.
[0]: https://kopia.io/
[1]: https://blinkdisk.com/
The cheapest is a computer at a relative's or friend's house. I have my backup server at my parents' house. We both have gigabit fiber, so it works well.
Commenting on the presentation, not the content: Why is there a white haze over the entirety of this website?
Hi-DPI displays have convinced web designers it's okay to use ludicrously thin fonts with barely any contrast.
I was waiting for some kind of pop up, so I could click "Deny all" and then the text would be readable. But no. It just stayed essentially greyed out, like reading it is an invalid option (which turns out to be the case).
> I made several errors then did a push -f to GitHub and blew away the git history for a half decade old repo. No data was lost, but the log of changes was. No problem I thought, I’ll just restore this from Backblaze.
`git reflog` is your friend. You can recover from almost any mistake, including force-pushed branches.
This is an absolutely massive loss for me. I had no idea it wasn't backing up my OneDrive files. A horrible way to find out and a massive loss of trust.
Thank you. I will also immediately stop using backblaze. Its purpose is to independently back up my hard drive. Not to pick and choose.
This is why I use Arq with Backblaze. They just see a bunch of encrypted files with random GUID filenames. They don't need to know what I'm backing up, just that I am backing it up.
Hetzner storagebox. 1TB for under 5 bucks/month, 5TB for under 15. Sftp access. Point your restic there. Backup game done, no surprises, no MBAs involved.
Hetzner Storage Box is not even remotely a similar product to B2, considering the only reliability offered is the disks being in RAID. There is no geo-redundancy at all. The number of simultaneous connections is also quite limited, though this might not matter for many use cases.
I feel that's what you're choosing between: $5/month or a 99.999% SLA.
B2 has no geo-redundancy by default. It's RAID across a bunch of neighboring servers.
Until there is. Backblaze was also trusted years ago. Self-host; it has become easy enough.
Self-hosting offsite is hard. Accessing services via standard protocols like SSH/WebDAV and just pushing your encrypted blobs there is a good middle ground. They can't control what you upload, and you can easily point your endpoint somewhere else if you need to move.
I already dropped Backblaze over this stuff and I do not intend to ever consider using them again.
Now, I:
- Put important stuff in a SyncThing folder and sync that out to 2 different nodes.
- Clone stuff to an encrypted external drive at home.
- Clone stuff to an encrypted external drive at work and hide it out in the datacenter (fire suppression, HVAC, etc).
It's janky but it works.
I used to use a safe deposit box but that got too tedious.
I back up my data to S3 and R2 using local scripts; never had any issues.
Don't even know why people rely on these GUIs, which can change their magic anytime.
* S3 is super expensive unless you use Glacier, but Glacier has a high per-file overhead, so you should bundle files before uploading (a sketch follows this list).
* If you value your privacy, you need to encrypt the files on the client before uploading.
* You need to keep multiple revisions of each file and manage their lifecycle, unless you're fine with losing any data that was overwritten at the time of the most recent backup.
* You need to de-duplicate files, unless you want bloat whenever you rename a file or folder.
* Plus you need to pay Amazon's extortionate egress prices if you actually need to restore your data.
I certainly wouldn't want to handle all that on my own in a script. What can make sense is using open source backup software with S3/R2/B2 as backing storage.
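For a sense of what just the first two bullets mean in practice, a hand-rolled sketch (bucket and paths are placeholders; gpg will prompt for a passphrase):

    # bundle small files into one archive, encrypt client-side, stream to Glacier-class storage
    tar -cJf - ~/photos \
      | gpg --symmetric --cipher-algo AES256 -o - \
      | aws s3 cp - "s3://my-backup-bucket/photos-$(date +%F).tar.xz.gpg" \
          --storage-class DEEP_ARCHIVE

And that still leaves revisions, de-duplication, and restore costs unhandled, which is exactly why the pre-built tools exist.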
Even with Glacier, S3 is ridiculously expensive compared to almost anything else.
Which service do you recommend?
In terms of software I've been impressed by restic and, as a developer who wants to be able to skip backing up gitignored files, by rustic, the Rust clone of restic.
In terms of cloud storage... well, I was using Backblaze's B2, but the issues here are definitely making me reconsider doing business with the company, even if my use of it is not impacted by any of them.
> Don't even know why people
Most people (my mom) don't know what s3 and r2 is or how to use it.
This. I use Restic, the cloud service doesn't know about what I send, it's just encrypted blobs as far as it is concerned.
> encrypted blobs
I like how you can set multiple keys (much like LUKS) so that the key used by scheduled backups can be changed without messing with the key that I have memorized to restore with when disaster strikes.
It also means you can have multiple computers backing up (sequentially, not simultaneously) to the same repository, each with their own key.
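In restic terms (repository URL is a placeholder):

    # one repository, several independent passwords
    restic -r b2:my-bucket:repo key list
    restic -r b2:my-bucket:repo key add              # prompts for the new key's password
    restic -r b2:my-bucket:repo key remove a1b2c3d4  # drop a rotated key by its ID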
You don't understand why pre-rolled critical backup solutions might be appealing to (especially non-technical) people?
Also, with that you pay per GB; the author is on Backblaze's unlimited plan.
I'm a Backblaze user -- multiple machines, multiple accounts. I'm going to be dropping Backblaze over this change, which I'm only learning about from this thread.
Any suggestions for alternatives?
That's pretty crazy, because I just set up personal backups with a different service (rsync.net, I was already using it for WP website backups) and my git folders were literally my first priority.
I found out the hard way that backblaze just deletes backed up data from external hard drives that haven't been connected in a while. I had like 2TB total.
I like backblaze for backups, but I use restic and b2. You get what you pay for. Really lame behavior from backblaze as I always recommended their native backup solution to others and now need to reconsider.
Longtime Backblaze user. Time to vibecode myself a replacement.
> There was the time they leaked all your filenames to Facebook, but they probably fixed that.
That's a good warning
> Backblaze had let me down. Secondly within the Backblaze preferences I could find no way to re-enable this.
This - the nail in the coffin
Time to make the move over to Linux and use Duplicati with Backblaze or any other bucket. You get the benefit of encrypted backups, have more control over what to back up, and will be notified upon failure.
Not helpful for non-Mac users, but I really like the way Arq separates the backup utility from the backup location. I feel like the reason Backblaze did this was to save money on "unlimited" storage and the associated complexity of cloud storage locations.
Arq is on windows now!
If this is true, I'll need to stop using Backblaze. I have been relying on them for years. If I had discovered this mid-restore, I think I would have lost my mind.
Well shit. If this is right, I'm dropping Backblaze and recommending all my friends/customers do the same. I pay for and rely on Backblaze as the "back up everything" they advertise.... to silently stop backing up the vast majority of my work is unacceptable!
Seems Backblaze does not even read their own blog with articles about 3-2-1 backups and sync not being the same as backup.
I only use Backblaze as a cold storage service, so this doesn't affect me, but it's worth knowing about changes in the delivery of their other services, as this might become widespread.
I left them years ago when they wouldn't package a download for restore. Total waste of money and false sense of security.
The article links to a statement made by Backblaze:
"The Backup Client now excludes popular cloud storage providers [...] this change aligns with Backblaze’s policy to back up only local and directly connected storage."
I guess Windows 10 and 11 users aren't backing up much to Backblaze, since Microsoft is tricking so many into moving all of their data to OneDrive.
Not backing up cloud is a good default. I have had people complain about performance when they connected to our multiple-TB shared drive because their backup software fetched everything. There are of course reasons to back that up, I'm not belittling that, but not for people who want temporary access to some 100GB files, i.e. most people in my situation.
This is terrifying. Aren't Backblaze users paying per-GB of storage/transfer? Why should it matter what's being stored, as long as the user is paying the costs? This will absolutely result in permanent data loss for some subset of their users.
I hope Backblaze responds to this with a "we're sorry and we've fixed this."
I think the author is referring to the personal backup plan [1] which has a fixed monthly amount
[1] https://www.backblaze.com/cloud-backup/personal
Initially I thought this was about their B2 file versions/backups, where they keep older versions of your files.
B2 is not a backup service. It’s an object storage service.
Weird, because in the Reddit thread linked above they call themselves a backup service.
I just looked in my Backblaze restore program, and all my .git folders are in there. I did have to go to the Settings menu and toggle an option to show hidden files. This is the Mac version.
I think at this point I have had enough of the majority of consumer products and just use production-grade services.
Backup to real S3 storage.
LLMs on real API tokens.
Search on a real search API, no adverts.
Google account on Workspace and GCP, no selling the data.
Etc.
Only way to stop corpos treating you like a doormat.
I was always roughly of the mind that Backblaze was just too close to "if it seems too good to be true, it probably is". Seems like that may have been a good decision.
I recently stopped using Backblaze after a decade because it was using over 20GB of RAM on my machine. I also realized that I mostly wanted it for backing up old archival data that doesn’t change ever really. So I created a B2 bucket and uploaded a .tar.xz file.
restic with Cloudflare R2 (safety) or the new Hetzner storage boxes (cost-effectiveness) is almost cheaper than Backblaze "unlimited", with full control and "unlimited" history.
I still like backblaze, they've been nice for the days where I was running windows. Their desktop app is probably one of the best in the scene.
> My first troubling discovery was in 2025, when I made several errors then did a push -f to GitHub and blew away the git history for a half decade old repo
git reflog is your "backup": it contains every commit and the resulting log (DAG) going back 90 days. If you do blow away a remote commit, don't fret, it's in your reflog.
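A typical recovery, for anyone who hasn't had to do it yet (the reflog index is whatever `git reflog` shows for the pre-push state):

    git reflog                                       # find the entry from before the bad rebase/push
    git branch rescue HEAD@{3}                       # pin that state to a branch (index illustrative)
    git push --force-with-lease origin rescue:main   # put the remote branch back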
Ouch. The only reason their "we'll figure out what to include and exclude" policy made sense was an implicit assumption that they'd play it safe.
Just switched from Backblaze to Cloudflare R2 (using restic). Now it makes me wonder whether I should check for such issues with R2 as well.
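One cheap habit is restic's own integrity check, which can re-download and verify a random sample of the repository (the endpoint below is a placeholder R2 URL):

    # verify repo structure, plus actually read back 5% of the data
    restic -r s3:https://<account-id>.r2.cloudflarestorage.com/backups \
      check --read-data-subset=5%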
Should really qualify this headline with which Backblaze product.
If you can’t trust one of their products, you can’t trust the company.
This "let's not back up .git folders" thing bit me too. I had reinstalled windows and thought "Eh, no big deal, I'll just restore my source code directory from Backblaze". But, of course, I'm that kind of SWE who tends to accumulate very large numbers of git repositories over time (think hundreds at least), some big, some small. Some are personal projects. Some are forks of others. But either way, I had no idea that Backblaze had decided, without my consent, to not back up .git directories. So, of course, imagine how shocked and dismayed I was when I discovered that I had a bunch of git repositories which had the files at the time they were backed up, but absolutely no actual git repo data, so I couldn't sync them. At all. After that, I permanently abandoned Backblaze and have migrated to IDrive E2 with Duplicati as the backup agent. Duplicati, at least, keeps everything except that which I tell it not to, and doesn't make arbitrary decisions on my behalf.
Edit: spelling errors and cleanup
Windows is constantly pushing my wife and inlaws to move all their files to OneDrive while Backblaze is no longer backing up OneDrive. There are similar things going on with Apple and iCloud.
What is the point of Backblaze at all at this point? If you are a consumer, all your files are probably IN OneDrive or iCloud or soon will be.
This is not a Backblaze issue.
When trying to copy files from a OneDrive folder, the operation fails if the file must be sync'd first.
I, for one, do not think it is fair to blame Backblaze for the shortcomings of another application that breaks basic functionality like copying files.
https://techcommunity.microsoft.com/discussions/onedriveforb...
FWIW, you can put an RPi in gadget mode and use nbdkit to mount NFS/SMB shares..
I'd like to apologise to everyone for this situation. It's very likely because I've just started using it recently.
Restic+Backblaze
I use Backblaze to back up my gaming PC. While the .git and Dropbox exclusions don't affect me, it's worrisome that OneDrive is not backed up, seeing as Windows 11 somehow automatically/dark-pattern stores local files in OneDrive.
You have to give Apple credit, they nailed Time Machine. I have fully restored from Time Machine backups when buying new Macs more times than I can count. It works and everything comes back to an identical state of the snapshot. Yet, Microsoft can’t seem to figure this out.
I use Kopia Backup software, sending all my important files to a compatible S3 bucket, using retention-mode: compliance as ransomware protection. I have access to every incremental snapshot Kopia makes using kopia-ui.
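For anyone wanting to replicate that, the setup is roughly the following (flags from memory of Kopia's S3 object-lock support, so double-check against the current docs; bucket and keys are placeholders):

    # create a Kopia repository on S3 with compliance-mode object locking
    kopia repository create s3 \
        --bucket my-backups \
        --access-key "$AWS_ACCESS_KEY_ID" \
        --secret-access-key "$AWS_SECRET_ACCESS_KEY" \
        --retention-mode COMPLIANCE \
        --retention-period 720h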
Holy Hannah, this is such bullshit from Backblaze. Both the .git directory (why would I not SPECIFICALLY want this backed up for my projects?) and the cloud directories.
I get that changing economics make it more difficult to honor the original "Backup Everything" promise but this feels very underhanded. I'll be cancelling.
The only right approach these days is a VPS with a ZFS partition (auto-snapshots, compression, and deduplication on) and a Syncthing instance running. Everything else is bound to lose money and/or data (a comment mentions they lost a file and got 3 whole months FREE).
Ultimately the author is ranting about something that is likely an unintended bug where some update along the line reset the default exclusions list.
It almost seems like they're taking it personally, as some kind of intentional slight against them.
Most users would not want Backblaze to back up other cloud synced directories. This default is sensible.
A hidden default to not back up is incompetent, not sensible.
It's not hidden, it's right there in the exclusions list.
If the user didn't put it there, it's hidden. Nobody routinely inspects the detailed configuration settings of their backup system, especially when it does appear to be working if you see it transferring data to the cloud and spot-check a file or two.
Any addition to the exclusions list that wasn't added by explicit user action is a hidden change and a data loss bug.
This is just wild.
I mean, they do one thing.
Looking forward to seeing if they respond.
Backblaze's personal backup solution is a mess in general. The client is clearly a giant pile of spaghetti code and I've had numerous issues with it; trying to figure out and change which files it does and doesn't back up is just one of them.
The configuration and logging formats they use are absolutely nonsensical.
Is this grey-on-black just meant for LLMs to see for training, or is the intention that humans should be able to read it too?
I've recently been looking for online backup providers and Backblaze came highly recommended to me - but I think after reading this article I'll look elsewhere because this kind of behavior seems like the first step on the path of enshittification.
They also stopped taking my credit card and emailed me about it from a no-reply address, like they don't want to get paid.
ANY company, and I do mean any, that offers "unlimited" anything is 100% a scam. At best it's a temporary growth hack to entice people who haven't had technology rug-pulls yet. And when profits dwindle and the S-curve nears its upper plateau, you can guarantee that "unlimited" will get hidden restrictions, exclusions, "terms of service" changes, nebulous fair-use policies that aren't fair, and more dark patterns. And every one of them is "how do we worsen unlimited to make more money on captive customers?"
We're also seeing this play out in real time with Anthropic with their poop-splatter-llm. They've gone through like 4 rug-pulls, and people STILL pay $200/month for it. Every round, their unlimited gets worse and worse, like I outlined above.
Pay as you go is probably fairer. But SaaS providers realllllly hate providing direct and easy-to-use tools to identify costs, or <gasp> limit the costs. A storage/backup provider could easily show this. LLM providers could show near-realtime token utilization.
But no. Dark patterns, rug-pulls, and "i am altering the deal, pray i do not alter it further".
Dropping them like I accidentally picked up shit...
To the author: please use a darker font. Preferably black.
I’m only in my 40’s, I don’t require glasses (yet) and I have to actively squint to read your site on mobile. Safari, iPhone.
I'm pretty sure you're below the contrast levels permitted under WCAG.
Surprisingly only the headings (2.05) and links (3.72) fail the Firefox accessibility check, the body text is 5.74. But subjectively it seems worse and I definitely agree with you that the contrast is too low.
Contrast looks good for the text, but the font used has very thin lines. A thicker font would have been readable by itself. At 250% page zoom it's good enough, if you don't enable the browser built-in reader mode.
I wonder if it's because of the font-weight being decreased. If I disable the `font-weight` rule in Firefox's Inspector the text gets noticeably darker, but the contrast score doesn't change. Could be a bad interaction with anti-aliasing thin text that the contrast checker isn't able to pick up.
I'd say it looks pretty readable on android although I still wouldn't describe it as good. I wouldn't say I feel encouraged to squint. But possibly different antialiasing explains it.
I think the accessibility checks only take into account the text color, not the actual real world readability of given text which in this case is impossible to read because of the font weight.
The problem is less the color than the weight. If it was 500 rather than 300 it would be perfectly fine.
Safari’s reader mode is good for this. All you have to do is long press the icon on the left edge of the address bar.
LONG PRESS????!?! you legend. How does one find these things out.
Like this, by word of mouth. That’s how Apple has done UI design since they stopped printing paper manuals.
- Ctrl-Shift-. to show hidden files on macOS
- Pull down to see the search box (iOS 18)
- Swipe from the top right corner for the flashlight button
- Swipe up from the lower middle for the home screen
Etc, etc
It's so intuitive, how could I have missed that?
Good old iOS and hidden features. Great discoverability. Long press those, swipe that, gesture this.
I have a gesture for whoever decided "find in page" should go under share.
> I have a gesture for whoever decided "find in page" should go under share.
You can also just type your search term into the normal address bar and there's an item at the bottom of the list for "on this page - find <search>". I'd never even seen the find-in-page button under share.
That would be so much better if Find in page was the first item not the last.
Not restricted to Apple, but TIL: double-clicking on a word and keeping the second click pressed, then dragging, allows you to select per word instead of per character.
Long press is a shortcut, the longer way is to click on the icon beside the url and tap/click the enormous "reader mode" button.
That's what I've done for years.
Long pressing is much more pleasant.
I wish Apple would give us a hint rather than requiring us to chance upon this recommendation on HN.
That’s a nice use for AI - pop up hints when it sees you using the long way a few times.
The problem is Apple’s hints keep popping up even after you say no thanks or it’s fine.
So that’s why Reader mode sometimes shows up directly when I click on the icon, I must be long clicking it by accident.
cmd+shift+R for reader mode if you prefer a keyboard shortcut
Yes, it’s a great workaround but website owners should not make me do that.
I found this to be a common theme in web design a while back, and in part led to an experiment developing a newspaper/Pocket-like interface to reading HN. It's not perfect, but is easier on the eyes for reading... https://times.hntrends.net/story/47762864
Your feedback is noted! I'll darken it down a few notches and test it on mobile. Thanks for the feedback.
Please: Not "a few notches". All the way. Black. That is if you actually care if people read your posts.
I instinctively use Dark Reader on any page with a white background so I was genuinely surprised by your comment at first.
Completely agree with this comment. Had to cut / paste it into vim and q! when done, was getting a headache.
Even as a Vim user I find this completely overkill when you can just press the reader mode button on the browser.
document.querySelectorAll('p').forEach(p => p.style.color = 'black');
Use this command in the developer tools console to change the color.
I'm also pretty sure a 14-point font is a bit outdated at this point; 16 should probably be the minimum with current screens. It's not as if screens aren't wide enough to fit bigger text.
Those are good guidelines and all, but meanwhile you are posting it on a site with..
Haha I keep forgetting that. Fortunately the browser remembers my zoom settings per page. I'm pretty sure the font is now at 16 or something via repeated Cmd +.
Which is why Firefox has memorized that this site needs 170% zoom.
There’s a reason I have HN set to 200%
10 point at 96 dpi or with correctly applied scaling is very readable. But some toolkits like GTK have huge paddings for their widgets, so the text will be readable, but you’ll lose density.
I'm on my laptop and that font is too thin and too small. I'm in my mid 30's ;)
On my android phone it's perfectly legible. Moving my phone away it's only a tiny bit worse than HN.
Is this maybe an iPhone pixel-density issue?
I wouldn't mind a darker and higher weight font though.
macOS/iOS Safari and Brave browsers have "Reader mode" . Chrome has a "Reading mode" but it's more cumbersome to use because it's buried in a side menu.
For desktop browsers, I also have a bookmarklet on the bookmarks bar with Javascript along the lines of the console one-liner above.
It doesn't darken the text on every webpage, but it does work on this thread's article. (The Javascript code can probably be enhanced with more HTML heuristics to work on more webpages.) Some CSS files abuse !important, so you might have to counter that too - something along these lines:
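    javascript:document.querySelectorAll('p,li,div').forEach(e=>e.style.setProperty('color','#000','important'))

(Reconstructed from memory rather than pasted, but the key part is setProperty's third argument: an inline style set with the 'important' priority wins even over stylesheet !important rules.)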
The font is dark enough, yet the weight is too light. Hairline or ultrathin or something. It's eye straining.
>I don’t require glasses (yet)
One day try throwing a pair on; you'll be surprised. The small thin font is causing this, not the text contrast. This and low-light scenarios are the first things to go.
> The small thin font is causing this not the text contrast.
Whatever causes it, I do wear glasses (and on a recent prescription too) and the text is still very hard to read.
+1
Firefox users: press F9 or C-A-R
F9 doesn't seem to do anything for me on Linux... Neither on the posted page nor on HN.
What is it supposed to do?
There is no mention of F9 on this support page either:
https://support.mozilla.org/en-US/kb/keyboard-shortcuts-perf...
Am I missing something?
Yeah, reader mode it is. Didn't know it's different on Linux than on Windows; the support article listing it is here: https://support.mozilla.org/en-US/kb/keyboard-shortcuts-perf...
I assume they are trying to enable Reader mode which is Ctrl+Alt+R
According to http://web.archive.org/web/20260317212538/https://support.mo... it's:
F9 on Windows
Ctrl + Alt + R on Linux
Command + Option + R on macOS
(It uses JS to only show the one for your platform but with view source you can see it mentions all three of these different OSes.)
So I guess the first guy is a Windows user and you other two use Linux.
> (It uses JS to only show the one for your platform but with view source you can see it mentions all three of these different OSes.)
There is a dropdown at the top-right to select the platform - no need to view source.
On mobile they’ve hidden that under “customize this article”, which I never would have even noticed if I hadn’t specifically known that there is some sort of dropdown somewhere, heh. But now we know :)
Probably. When available, reader mode can also be activated by clicking the little "page with text" icon on the right of the address bar.
Reader mode?
Your iPhone has this cool feature called reader mode if you didn’t know.
As for mentioning WCAG - so what if it doesn’t adhere to those guidelines? It’s his personal website, he can do what he wants with it. Telling him you found it difficult to read properly is one thing but referencing WCAG as if this guy is bound somehow to modify his own aesthetic preference for generic accessibility reasons is laughable. Part of what continues to make the web good is differing personal tastes and unique website designs - it is stifling and monotonous to see the same looking shit on every site and it isn’t like there aren’t tools (like reader mode) for people who dislike another’s personal taste.
I don’t know, I got 140 upvotes on a nitpick so I think others agree with me it’s hard to read.
Didn't say it wasn't. I said invoking an accessibility standard when it comes to a guy's personal website is laughable because the way it was said implied he was compelled to change his site because some bureaucratic busybodies somewhere said he should. Unless you are a business or a government, most people aren't overly concerned about accessibility, nor should they be - especially if it comes about only through guilt tripping or insinuated threats.
Many here at HN find that site hard to read, not just the original commenter.
Why don't you just go tell the WCAG on him yourself?
The who?
> Please don't complain about tangential annoyances—e.g. article or website formats, name collisions, or back-button breakage.
As the author I certainly apreciate this feedback
This is not merely annoyance. This is usability failure.
This is the most broken rule in the history of time. Every thread
if I can't read TFA because of its formatting it isn't tangential
Worst case scenario you copy the text out. It's worth complaining sometimes, but yes it's tangential.
flying really close to the dropbox comment, sir
Meanwhile, Backblaze still happily backs up the 100TB+ I have on various hard drives with my Mac Pro.
Does it? How do you know?
If they start excluding random content (eg: .git) without effective notice, maybe they AREN'T backing up everything you think they are.
You don’t do quarterly restore tests?
How do you do that?
My naive idea: download all 100 TB every 3 months to a second device, create a list of files restored, validate checksums against the original machine, make a list of differing and missing files, and check which ones are supposed to be missing? That sounds like a full-time job.
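A cheaper version of that naive idea: hash everything at backup time, then restore and verify only a random sample (GNU coreutils; paths are illustrative):

    # at backup time: record content hashes alongside the backup
    find /data -type f -print0 | xargs -0 sha256sum > manifest.txt
    # at test time: restore a random 100-file sample into /tmp/restore-test, then verify it
    shuf -n 100 manifest.txt \
      | sed 's| /data/| /tmp/restore-test/data/|' \
      | sha256sum -c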
Why should Backblaze back up their competitors’ data? And what use is it to you for it to do so?
It’s my data not their competitors’.
Managing backup exclusions strikes again. It's impossible. Either commit to backing up the full disk, including the 80% of easily regenerated/redownloaded etc. data, or risk the 0.001% critical 16 byte file that turns out to contain your Bitcoin wallet key or god knows what else. I've been bitten by this more times than I'd like to admit managing my own backups, it's hard to expect a shrink-wrapped provider to do much better. It only takes one dumb simplification like "my Downloads folder is junk, no need to back that up" combined with (no doubt, years later) downloading say a 1Password recovery PDF that you lazily decide will live in that folder, and the stage is set.
Pinning this squarely on user error. Backblaze could clearly have done better, but it's such a well-known failure mode that it's not far off refusing to test restores of a bunch of tapes left in the sun for a decade.
> Pinning this squarely on user error.
It isn't user error if it was working perfectly fine until the provider made a silent change.
Unless the user error you are referring to is not managing their own backups, like I do. Though this isn't free from trouble: I once had silent failures backing up a small section of my stuff for a while, because of an ownership/perms snafu and my script not sending the error reports anywhere I'd generally see them. Luckily an automated test (every now and then it scans for differences between the whole backup and the current data) caught it, because it could see the source and noticed a copy wasn't in the latest snapshot on the far-away copy. Reliable backups are a harder problem than most imagine.
If there is a footgun I haven't considered yet in backup exclusions, I'd like to know more. Shouldn't it be safe to exclude $XDG_CACHE_HOME? Unfortunately, since many applications don't bother with the XDG standard, I have to exclude a few more directories, so if you have any stories about unexpected exclusions, would you mind sharing?
I don't remember why I started doing it, but I don't bulk exclude .cache for some reason or other. I have a script that strips down larger known caches as part of the backup. But the logic, whatever it was, is easy to understand: you're relying on apps to correctly categorise what is vs. isn't cache.
Also consider e.g. ~/.cache/thumbnails. It's easy to understand as a cache, but if the thumbnails were of photos on an SD card that gets lost or immediately dies, is it still a cache? It might be the only copy of some once-in-a-lifetime event or holiday where the card didn't make it back with you. Something like this actually happened to me, but in that case, the "cache" was a tarball of an old photo gallery generated from the originals that ought to have been deleted.
It's just really hard to know upfront whether something is actually important or not. Same for the Downloads folder. Vendor goes bankrupt, removes old software versions, etc. The only safe thing you can really do is hold your nose and save the whole lot.