Reverse-engineering and analysis of SanDisk High Endurance microSDXC card (2020)

274 points | by userbinator 8 months ago

139 comments

bityard 8 months ago
Every time Raspberry Pi comes up as a topic on HN, there are scores of comments decrying the futility and frustration of unreliable SD cards.
The thing with SD cards is that there are wide variances in quality, even among the top name-brand manufacturers. You can get the normal crappy cards that everyone buys which are likely to last not much more than 2-3 years under very occasional use all the way up to industrial cards with 10 year warranties and reliability matching the best enterprise SSDs. And everything in between.
The point is: If you buy the cheapest thing on Amazon, you're going to get exactly what you paid for. For my part, I have been using "High Endurance" cards in my Pis for several years as normal OS and data disks. They are not actually much more expensive than "normal" cards. I have not had any of them fail yet.
[-]
- NikolaNovak 8 months ago
  I don't disagree with you, but the other perennial unanswered question on HN is: how do I ensure I'm paying extra for actual quality and not just sucker tax?
  Memory cards and ssds are famously obtuse like this. There are the branded versions (custom made for your Switch, for your Xbox,etc) which are exactly the same as their cheaper counterparts. Sandisk itself recently started a "creator" line of cards and ssds which are, again, exactly the same but more expensive. Samsung had Evo Plus which was different than Evo+ but same as Evo Select. Go figure!
  Sometimes looking at specs helps but usually not - I.e. The "up to" speed marks are famously irrelevant for real time usage, and brand name companies will offer any number of cards with same specs and tags (u2/v3/class10/mark5/whatever) at varying price points. And then there's the WD Red saga where drives branded "NAS" were completely inappropriate for NAS usage.
  I ran a photography business a while back and always bought extreme pro because I just couldn't risk it, but honestly, I felt like a bit of a sucker. It's hard to know when you're actually buying real quality and endurance increase.
  [-]
  - ploxiln 8 months ago
    Looking for the most "extreme" high-quality micro-sd cards on Amazon, and also on B&H to avoid the chance of counterfeits, I'm struck by how they are all so damn cheap. Whether SanDisk Ultra, Extreme, Extreme Plus, Extreme Pro, or Max Endurance, the 64GB card is less than $20. (I've seen reports of Sandisk Extreme micro-sd cards dying, at close to the "typical raspi" rate ... I haven't seen that yet for Max Endurance, and that does sound like what I want, so I got one and I'm trying it. But it bothered me, ironically, that it's pretty much the same price.)
    Meanwhile, if you go to a professional electronics component vendor like Mouser or Newark, a SwissBit "pseudo-SLC" micro-sd card is over $50 for 32 GB. I'm pretty sure that is real industrial usage quality, and will have similar longevity to a "real SSD" of similar size. (I also got one of these, it works well so far, but that's not a conclusive test yet :)
    I guess I could instead get the 512 GB "Sandisk Extreme" whatever, and the additional size will probably give similar longevity? It just seems silly, and a bit less of a sure thing, than the priced-as-expected industrial thing. It is kinda strange/annoying that in the consumer side of the market, they keep adding superlatives, but it seems like the same cheap flash for the same cheap price? The real stuff surely costs more than $16, but it's not on the consumer market, at all.
    And why doesn't the Raspi foundation/company sell some of the high-priced industrial stuff, just to make it an obvious option, even if not for most people? If you're just messing around and learning, use the cheap one, but if you want a raspi to do real work for more than 4 months straight, you're gonna want one of these real industrial SD cards. I guess most people who care about storage use m.2 adapters, which you might want for the performance, or the better pricing at higher capacity. But if you don't need the performance or capacity, or don't have room for the m.2 in your application, and just need the damn thing to not break ... the real "extreme" does exist, it is the SwissBit.
    [-]
    - hnuser123456 8 months ago
      I feel like the DWPD should be given with at least 2 significant digits, and must be based on real world testing to find the amount of writes that 99.9% of the cards can take for the duration of their warranty without failing before warranty ends.
    - buccal 8 months ago
      Kingston Industrial pSLC are less than 30 € for 32 GB.
  - michaelt 8 months ago
    > how do I ensure I'm paying extra for actual quality and not just sucker tax?
    Objectively? You can't. You can make sure you don't get a counterfeit by never buying on amazon or ebay. And you can test things like the write speed are as claimed.
    But to actually measure the longevity of a card? You've got to buy a load of them, use them intensively in realistic conditions, and carefully investigate every failure to produce statistics.
    You can get plenty of anecdotes from DIYers, but let me tell you: My headless Raspberry Pi doesn't show up on the network? Could be a failed microsd card. Could be the disk's full so some log is blocking on a write. Could be a bad software update. Could be I powered it off mid-software-update. Could be weak wifi. Could be I brought a counterfeit microsd. Could be something I plugged into the GPIO. Could be a barely adequate power supply. Could be I messed up a setting and now the DHCP or the SSH server isn't working right. Could be it getting recruited into a botnet. Could be it changing IP addresses and me trying to connect to the old one. Are you going to take my word for it when I say it was the microsd card for sure?
    [-]
    - NikolaNovak 8 months ago
      I'm exactly with you - with the added Complication that when somebody does endurance testing (e.g. Backblaze), by then a) manufacturer started using different components for that SKU, and / or manufacturers no longer produce those SKUs. Almost by definition, you can only see endurance when it's too late (see the ibm desk star saga).
      Bottom line, you can try to trust the manufacturer reputation, purchase channel, and price points / specs, but you can never be sure - so backup like crazy :-)
- magicalhippo 8 months ago
  Especially with the Raspberry Pi, another point is power delivery. You need a beefy power source, but also a good cable. I bought a USB-C charging cable for my Pi4, and it was very unstable and crashed a lot. Corrupted the install a couple of times, and so I got some new SD cards.
  Well, turned out the "charging cable" had 1 ohm resistance and so when the Pi load spiked, the voltages would drop well below 4.5V...
  Tossed it away and got a proper cable, and the thing has been rock solid since.
  Highly recommend getting a USB cable tester, I had several black sheep in my collection.
- 1vuio0pswjnm7 8 months ago
  Why mount the SD card read/write. Why not mount read-only. After boot, I can remove the SD card. I can insert another card with more files, or chroot to other external storage. The rootfs is in RAM. Been booting RPi with tootfs in RAM since 2013.
  SD cards are not reliable for frequent, small writes, IME. Certainly worse than USB sticks. I wish RPi offered the option to boot from USB. Even an option to use a SD card to eMMC adapter and boot from eMMC would be an improvement.
  [-]
  - bityard 7 months ago
    But my point was that not all SD cards are made the same and lots of the tech-literate here on HN don't seem to know that. "SD cards are not reliable" as a blanket statement is false. You have to research to find out which ones are the best and use those, then you don't need to do weird workarounds like read-only filesystems and running from a RAM disk, you can just use the Pi like a normal computer.
  - bornfreddy 8 months ago
    This is the best solution - if you don't need persistent storage. I prefer N100 minipcs for this reason - the Minix Z100 I have uses 6.4W at the wall (linux, idle, wifi and bt disabled) and while this is probably more than rpi, I got a silent computer with a case and NVME out of the box. Being in x86-land is just an added bonus.
    My rpi4 cost me way too many sd cards.
  - erik 8 months ago
    > Why mount the SD card read/write. Why not mount read-only.
    I have seen dozens of name brand SD cards fail while mounted read only in a medium sized deployment of Pis. The problems mostly come from firmware bugs, not from nand wear.
  - 1vuio0pswjnm7 8 months ago
    An SD card removed after boot is not mounted at all.
    There is nothing in the slot to "fail", for whatever alleged reason.
  - 8 months ago
    [deleted]
  - 1vuio0pswjnm7 8 months ago
    s/tootfs/rootfs/
    I use external drives for persistant storage.
- LargoLasskhyfv 7 months ago
  Yah, well, the other points with early Raspberries were underpower/undervoltage due to shitty design and signalling of its own circuitry,
  only to be compensated with the one-and-only-official-wall-wart-with-attached-cable,
  bad config/buggy design of the upstream-driver itself,
  fixes of those only rolled out with much delay into their 'OS', and so forth.
  Also https://www.bunniestudios.com/blog/on-microsd-problems/ (2010),
  & https://www.bunniestudios.com/blog/2013/on-hacking-microsd-c... (2013)
  & https://media.ccc.de/v/30C3_-_5294_-_en_-_saal_1_-_201312291... (2014)
  and much many more...
  Just log to RAM, or something else, than that tiny tempting toy.
  That aside flash-fraud is still rampant, no matter what you think you buy from whomever.
- numpad0 8 months ago
  Is SD card "quality" really relevant? I've long wondered why it remains the problem only on Raspberry Pi. One quirk of Pi is it doesn't have proper power circuit, likely for legal reasons. Are such design oddities really not have to do with early SD failures specifically on Pis?
  [-]
  - Slartie 8 months ago
    It is not just a problem with Pis, but also other use cases in which the SD card is used 24/7, or at least for long stretches of time, especially if the use case involves writing of data. Dashcams are also notorious for destroying crappy SD cards due to their high write load, for example.
  - erik 8 months ago
    > I've long wondered why it remains the problem only on Raspberry Pi.
    SD card firmware is often buggy and only heavily tested with windows. Camera manufactures will specify SD cards that are know to work with their cameras for this reason.
  - JKCalhoun 8 months ago
    Guessing it's due to swapping? I imagine disk activity from a dashboard cam is a monotonous long write stream. Disk activity on a Pi is your modern VM OS constantly swapping memory between RAM and drive.
- Dylan16807 8 months ago
  A normal-quality microsd is already the same price per byte as a real SSD, so paying 50% more to get a high endurance card is pretty annoying. Also annoying is that the sandisk high endurance still doesn't have a 1TB model.
- londons_explore 8 months ago
  > you're going to get exactly what you paid for.
  And nearly always the cheap cards seem to fail due to firmware shortcomings rather than hardware faults. Ie. the built in wear levelling and error recovery is bad.
  Considering firmware is effectively free once written, the fact the cheap cards still don't have good firmware leads me to believe it might be deliberate market segmentation. Ie. "we need to make the cheap cards glitch and fail often, or nobody would buy our expensive cards".
  If so, it really raises moral questions about how many peoples wedding pictures have been gobbled simply so some executive can get his christmas bonus...
  [-]
  - userbinator 8 months ago
    I suspect the cheap cards are bad because of cheap flash, which ironically requires much stronger ECC and wear leveling to function at all.
  - mschuster91 8 months ago
    > And nearly always the cheap cards seem to fail due to firmware shortcomings rather than hardware faults. Ie. the built in wear levelling and error recovery is bad.
    Well, higher quality cards have better performing controllers that operate at higher speeds or have dedicated hardware offloads for stuff that a "cheaper" controller has to do in software.
  - pdimitar 8 months ago
    These moral questions stand unsolved for decades, and many people are acutely aware of them. IMO it's time we start putting those executives in jail for it. But we all know it ain't ever happening.
- emmelaich 8 months ago
  What specific brands and products have you been using successfully?
  [-]
  - sandos 8 months ago
    I work with embedded products that has a SD in them, and we have used Swissbit but moved to GreenLiant "Industrial Card" cards.
    We do _extensive_ testing of these in-house. Like restarting machines 10000 times etc. We have never seen a card wear out, as far as I know, and we have test systems that are used way more than any usage I would consider "normal".
- spintin 8 months ago
  [dead]
dekoruotas 8 months ago
Very interesting. This longevity aspect frequently comes up in the Wii U modding community. It is tempting to plug in a simple USB stick to store the game files, but, because of the intense read/write nature, it is deemed to be prone to failure. Recently use of high-endurance SD cards has grown, but some say it is still not as safe as an external HDD. It would be interesting to hear thoughts from someone more experienced about the safest storage option as the last thing you want is for your save files to get corrupted.
[-]
- staindk 8 months ago
  In the same vein as this I've wondered for a couple years now what the impact of flash storage longevity is on mobile phone performance over time. Felt like my Samsung S8 was very snappy when I got it, yet a couple years later things that used to be fast - like finding specific music, scrolling through the photos in my gallery, etc. - had slowed down considerably.
  Could also just be software updates or other things causing this but there should be some component of decreasing performance caused by wear on flash storage.
  [-]
  - izacus 8 months ago
    You're right, flash degradation and deterioration of write speeds is pretty much primary reason why older phones feel slow and laggy.
    A lot of - especially older or mid/low range - phones have cheap eMMC storage which is signifcantly worse at wear leveling than the higher end UFS storage.
    [-]
    - londons_explore 8 months ago
      > phones have cheap eMMC storage which is signifcantly worse at wear leveling than the higher end UFS storage.
      Which is shocking really - the phones should switch the eMMC to RAW flash mode (ie. no wear levelling), and then write an actually-smart wear levelling algorithm that runs in the OS.
      The OS has far better info for wear levelling anyway - it has more visibility into read-write patterns, it has more RAM to store more state, it can cron background scrubs and reorganisation to idle periods, it can be patched if a bug is found which only manifests after years, etc.
      Unfortunately, as far as I'm aware, most eMMC's can't be put into any kind of RAW mode anyway.
    - circuit10 8 months ago
      Could you get around this by using a custom ROM that installs the OS on a high-quality microSD card or something like that?
  - metalman 8 months ago
    The only part of a far future sci-fi that stayed with me is the use of memory chips as universal (ha) currency :ie capacity was face value and then total value was determined by the data contained on the chip and how much someone(thing) wanted that. Sometimes it looks like that is an inevitable outcome.
- merpkz 8 months ago
  What kind of intense read/write nature you are talking about in a video game console? It just reads the game ROM from storage and executes it, there is nothing to write back, the game is not being modified in any way while playing. All these talks about wearing out sdcards in game consoles or raspberry pi devices in my personal opinion are partially because of people encountering poor quality cards - counterfeits. There is an sdcard in my car camcorder which must have seen thousands of full write cycles and still has no issues functioning despite all the operating temperature differences it endures due to weather seasons.
  [-]
  - toast0 8 months ago
    Writes should be minimal yeah. But reads could be intense. My car has worn out two maps SD cards. One of them had a questionable chain of custody, but I went back to the almost certainly original card that came with the car, and it's starting to misbehave in the same ways, so I think it's only a matter of time. These cards are unwritable after factory initialization, so writes are definitely not a factor.
    [-]
    - progman32 8 months ago
      I understand that reads can technically cause read disturb, but isn't this normally handled by the controller? My intuition says that the writes caused by block rewrites should not significantly accelerate wear. I'd suspect more mundane issues such as bad solder, but would love to hear an expert take.
- Neywiny 8 months ago
  Safe is a multifaceted term. Essentially these storage media (I'm more experienced with solid state but HDDs may be included) the probability the data you wrote is the data you read is a function of how many program/erase cycles, how long ago that was, and naturally the part's specifics. For example, a lot of NOR flash is rated to 10 years at up to 100k cycles. But devices > 10 years old rarely make the news for their flash being dead. On the other hand, I believe there was a Tesla fiasco where their logs were wearing out the flash prematurely.
  There's usually trends to look for in regards to that third factor. The lower the # of bits per cell, the higher probability the voltage level is still working the right range. Which is why so much flash is still SLC or pSLC capable. Usually this is more industrial. Then you have entirely different technologies altogether. NVRAM/FRAM/MRAM are various terms for extremely high (or infinite) read/write technologies while still being non-volatile (keeps its data with power off). I don't know how much of a drop in replacement those are. I think LTT had one of those on a flash drive a while back https://youtu.be/oJ5fFph0AEM, but it's so low capacity it'll probably be useless.
  It may be possible to hack something up with a MR5A16A. It's a whole 4 MB but has unlimited endurance and over 20 years of endurance. It looks like it has more of an SRAM interface than NAND, but should be capable of saturating a USB high speed link. The drive would likely cost $75? TBH if there was a market it may be a fun project.
  If you sacrifice some endurance you can go up to 1Gb per device which might be interesting. But the cost scales.
  [-]
  - Scoundreller 8 months ago
    > But devices > 10 years old rarely make the news for their flash being dead.
    Accelerated stability testing is fraught with potential issues, and any output is intentionally conservative.
    An issue with estimating lifespan on new products is that they'll expose them to more extreme conditions, but those more extreme conditions may trigger (exponentially faster) higher order reactions that are relative non-issues at regular conditions.
    Then you have things like activation energy requirements for a reaction that just might not be met at regular conditions, but happen at higher temperatures.
    And an IC is quite the soup of molecules in varying combinations unlike a straightforward solution.
    [-]
    - userbinator 8 months ago
      and any output is intentionally conservative.
      Samsung still screwed up with the planar TLC flash used in the infamous 840 EVO SSD, which had a real-world retention measured in months. Their "fix" was to issue a firmware update that continuously rewrites data in the background, but of course this has no effect if the drive isn't always powered.
      https://forum.acelab.eu.com/viewtopic.php?t=8735
      https://goughlui.com/2024/07/20/salvage-tested-an-elderly-fo...
- mikepurvis 8 months ago
  I’ve run a modded Wii U for ~5 years (mocha cfw -> cbhc -> tiramisu -> aroma) and have always used a usb flash drive, but I did have one fail and just assumed it was a bad unit — I could well imagine the write patterns being particularly hard on them though.
- gibspaulding 8 months ago
  Probably overkill, but I wonder if anyone has experimented with setting up a raspberry pi or something to pass through access to a network share over USB, that way you could have all the data on a RAID array with a proper automated backup strategy somewhere.
- Shawnj2 8 months ago
  Considering SSD prices have crashed it’s the only way to go IMO, it solves literally all of the problems with other iptions like power draw for hard drives and longevity and speed over flash drives
- causality0 8 months ago
  Why is there that much writing? I would imagine a game system reads billions of times as much data as it write.
_benj 8 months ago
Something I found interesting when learning about flash was how it is manufactured and how much reliable single layer cells are (SLC)
There’s certainly a very visible increase in price and decrease in capacity but it’s certainly interesting when you get sd cards with a proper datasheet vs. customer level devices
https://www.digikey.com/en/products/filter/memory-cards/501?...
[-]
- msarnoff 8 months ago
  Most eMMC chips (basically the chip version of an SD card) can be configured to work in pseudo-SLC (pSLC) mode. This halves the capacity but improves the write endurance by several times.
  [-]
  - Firefishy 8 months ago
    Raspberry Pi's "Making a More Resilient File System" document https://pip.raspberrypi.com/categories/685-app-notes-guides-... has instructions on how to configure the eMMC on the CM4 and CM5 to run in pSLC mode. Halving the storage capacity.
    [-]
    - msarnoff 8 months ago
      Yup. mmc-utils is the way to go. Note that this change is irreversible once you send the command to finalize the settings.
      The single biggest thing you can do to improve the reliability of your embedded system is to use eMMC’s built-in hardware partitioning.
      - Each hardware partition is a separate block device. A firmware update cannot corrupt the device by overwriting a partition table.
      - There are two small boot partitions, and they can be made permanently read-only after programming, thus preventing corruption of your bootloader. You can also use the other one read-write for your uboot environment.
      - You can easily have two OS partitions for A/B firmware updates. In addition to mounting them readonly, temporary write protection can be enabled on a per-partition basis and disabled when needed for fw updates.
      - If you can’t afford the capacity hit from pSLC, I believe it can be enabled on a per-partition basis. (Don’t quote me on this, it could be wrong).
      All these settings can be configured with either mmc-utils or u-boot. In volumes, programming houses can take care of this for you. (You’ll have to list all the registers out very specifically in an Excel spreadsheet)
      The downside is that calculating all the correct register values is not a simple process, and you’ll have to spend a bit of time reading the eMMC spec.
      [-]
      - remram 8 months ago
        How come there is so much "permanent" config for SD cards?
        [-]
        hypercube33 8 months ago
        I believe it's because it originally was SD/MMC which was supposed to be a future media for audio and the like for retail sale. I had some read only palm pilot cards like this - books etc were also sold this way for a short period.
        userbinator 8 months ago
        The spec was developed with DRM as one of its use cases.
        msarnoff 8 months ago
        Probably the same reason microcontrollers have OTP fuses—to prevent accidental corruption in the field or from buggy programming.
    - dehrmann 8 months ago
      Could you also use ZFS or BTRFS with copies? I'm not sure I'd trust any of these drives.
  - anthk 8 months ago
    Guess what the PocketCHIP guys did in a upgrade. Exactly the oppsite. Yes, ubifs died every time.
    [-]
    - Dylan16807 8 months ago
      They switched it from SLC to MLC? I wouldn't expect 2 bits per cell to cause problems... how bad was the flash?
      Unless you're saying the switch corrupted the existing data, in which case wow what a screwup.
- userbinator 8 months ago
  a very visible increase in price and decrease in capacity
  Unfortunately SLC costs disproportionately more compared to MLC, TLC, and now QLC -- the actual die cost is only 3x for SLC compared to TLC of the same capacity, but the prices are closer to 10x or more.
  Related: https://news.ycombinator.com/item?id=40405578
  [-]
  - Panzer04 8 months ago
    I recall reading about running MLC+ SSDs in SLC mode for SLC reliability at the cost of the capacity multiplier afforded by the multilevel NAND.
    It always sucks when hardware that is effectively identical is charged at massively different prices, but I guess as a consumer this differentiation probably leads to ever so slightly cheaper MLC+ SSDs...
  - adgjlsfhk1 8 months ago
    most of the reason for this is that tlc still has plenty of endurance for any remotely normal use case. people get really worried about the decrease in reliability, but normal tlc drives are warrantied to 0.3 DWPD, which is still a ton (~900 hours of 8k compressed raw video for a 2 TB SSD)
    [-]
    - userbinator 8 months ago
      The problem isn't endurance, it's retention. Early SLC was rated for 10 years retention after 100K cycles. The latest QLC is only rated for several months after 300 cycles.
- dehrmann 8 months ago
  I back up important things on SLC USB-A flash drives and write refreshed copies at least annually. The drives are small (<=8GB), expensive, there are only a handful of manufacturers (mine's an ATP), and you're probably ordering it from Digikey or Mouser. SLC will supposedly hold data at rest longer, and USB-A means I can find readers for at least another decade. We'll see if retention at rest is actually good.
  [-]
  - userbinator 8 months ago
    You can get (real!) SLC USB drives from Aliexpress too, for around $0.70/GB. These aren't the ultra-cheap ones which are often fake in capacity, but prominently advertise the type of flash and controller they use.
    I also have some really small USB drives from around 2 decades ago, of course SLC flash (the only type of NAND flash at the time), and can confirm that their data is still intact. They've gone through probably a few dozen full-drive writes, while SLC flash back then was rated for 10 years retention after 100K cycles.
    [-]
    - pdimitar 8 months ago
      Yeah, I have 2 such from AliE and they perform well and haven't lost anything yet even though they have dozens of thousands of files at rest that are updated 1-2 times a month, for almost 3 years now. Quite pleased with them.
      They are indeed more expensive but nothing that a programmer salary would notice even if you bought 10 of them. I am wondering whether I should buy another pair and use it as a metadata mirrored vdev on my media NAS. Haven't decided yet.
      [-]
      - dehrmann 8 months ago
        > I am wondering whether I should buy another pair and use it as a metadata mirrored vdev on my media NAS. Haven't decided yet.
        Probably not worth the operational overhead, and in the past, I remember USB having significantly worse write latency than SATA.
        [-]
        pdimitar 8 months ago
        Yeah, thought the same. Not to mention that I tried to have a bunch of old USB sticks put into one of my servers and it added like two minutes to its reboot time. (Though admittedly I didn't fight hard with systemd to make sure their mounting does not block anything -- it's likely possible to do.)
        Ultimately I am not willing to shell out hundreds of EUR for f.ex. 2x SLC internal SSDs at 32 or 64 GB each so I guess I'll just have a few spare disks around when I am back to being employed, and that's going to be that.
- error9397 8 months ago
  These don't use SLC, they use TLC. SLC is hard to find in SD cards and SSDs due to cost.
  [-]
  - l11r 8 months ago
    In the provided link price seems to be reasonable for SLC, no? 15 bucks for 512 MB (not GB).
    Also datashits clearly say it's SLC and not TLC: https://static6.arrow.com/aropdfconversion/55056b1a2a966f599...
walrus01 8 months ago
Previously, Bunnie Huang on the problems of counterfeit sd cards
https://www.bunniestudios.com/blog/on-microsd-problems/
https://www.bunniestudios.com/blog/2013/on-hacking-microsd-c...
[-]
- everybodyknows 8 months ago
  2010, and 2013.
las_balas_tres 8 months ago
We have hundreds of raspberry pi s out in the field that experience sdcard failures far too often even though the filesystems are setup as readonly using a ram based overlay filesystem. I suspect something happens during reboot of these pi's and that the SoC generates spurious signals on the bus lines causing havoc with some of the sdcards. It doesn't seem to happen with high end cards or the same setup but with a compute module (which we now use exclusively) with an on board emmc chip.
[-]
- OptionOfT 8 months ago
  This is interesting, I got gifted an one of those hydroponic systems. The thing was 2 years old. Runs on a Raspberry Zero. The issue? SD card corrupt. I got an image from someone else and fixed it with that.
  What's worse is that these things are connected to the internet (with VNC installed!), and they don't do updates...
  The system is awesome, but I VERY quickly moved it to a separate VLAN.
- lysace 8 months ago
  These haven't failed for me:
  https://www.digikey.com/en/product-highlight/a/atp/advancedm...
  I'm cynically guessing it's down to a relatively simple workaround in the SD card firmware that fixes this very common corruption on power loss while writing. Or maybe that combined with a 0.1 cent capacitor.
- lxgr 8 months ago
  Writes aren't the only things that can degrade data stored on NAND flash. Not sure how SD cards are mitigating read disturb errors, for example.
- erik 8 months ago
  I've seen the same. I don't think it's the reboot. My understanding is that NAND undergoes wear-leveling even when it is read only. The card shuffles data around its storage even when it hasn't been written to. And the firmware is unreliable.
- dezgeg 8 months ago
  Are there other SBCs / SoMs that are as widely used with SD cards though? That's certainly a source of bias.
  [-]
  - imtringued 8 months ago
    99% of devices that use SD cards are battery powered. It's an issue with the power supply.
PaulKeeble 8 months ago
Its really hard to compare SD cards and especially durability because we get no information as to what they are doing differently. You can get a better idea of the performance characteristics than the broad categories (A1 or A2, largely useless) with a review on storagereview but they don't have anything further or a way to compare durability. It matters less in cameras but for Single board computers or dashcam uses it would be nice to have a better idea of the practical durability and the usage pattern that would preserve it.
wvh 8 months ago
There's a lack of transparency on what you're actually getting with storage these days. Akin to the actual flash architecture, I've been trying to find information on the level of OPAL/Sed support in SSDs and many manufacturers don't even mention that information anymore, nor do help desk people have any useful information. The whole flash industry has a shady feel to it when it comes to pricing, capabilities and reliability, not to mention the sheer amount of counterfeit productions flooding the market. That's a bit bizarre, data security is pretty paramount, this is not an industry that should be clouded in a shroud of secrecy and shadiness.
[-]
- bayindirh 8 months ago
  From what I understand, Flash manufacturing and management during operation are black arts at this point and nobody wants to give the smallest clue for anything related to Flash management and specifications.
  For example, Crucial (Micron) doesn't give any TBW for their external SSDs, giving indications that they're using mixed-binning as long as the speed specs are satisfied. Same for lower level Kingston (NV series) SSDs. At least they openly say that the drives can wildly differ from batch to batch (and oh boy, they do).
  As the industry is pinched by customers for more capacity, QLC becomes the norm, TBW numbers are continuously hidden, and you're only left with the internal endurance counters, if you can access them.
  Controllers are in the same boat. Wear leveling, write amplification, flash monitoring are all left to whims of the controller designers, resulting in strange situations.
  Off-shift/dark production is another problem, but it's not new by any means. BunnyStudios has a great write up which has been linked a couple of times.
  [-]
  - userbinator 8 months ago
    The datasheets for the early flash memories (SLC) were easily available and loudly proclaimed 100K or even 1M cycles of endurance and 10-20 years of retention. Now they're very difficult to obtain and are rated for only a few hundred cycles and a few months of retention. No wonder they're secretive about it.
    [-]
    - bayindirh 8 months ago
      True. I still remember Corsair’s words about original Flash Voyager GT 16GB:
      “We had to change our flash controller, but the performance is the same. Same 100K write SLC flash, same speed, same endurance, same Flash Voyager” (paraphrase mine).
      I still have that drive and it has a small file write performance which can rival a low level SSD despite being a USB 2.0 drive.
      Seriously, no kidding.
ck2 8 months ago
When are we going to get tiny RAID10 microSDXC USB3 devices?
Just plug in four microsd cards and the hardware takes care of the mirroring and rebuilding when dead card removed+replaced
Be sure to use cards from different batches or even makes so they don't all fail at the same time.
[-]
- ComputerGuru 8 months ago
  ZFS (or btrfs) will do that for you, no hardware RAID required.
  Most pro DSLRs have two SD card slots and will also do it (at the firmware level) if you toggle the appropriate option (usually the options are RAID 1 or a unionfs-like "use both SDs" option).
- Koshkin 8 months ago
  Wouldn't a USB hub and, say, BTRFS do the job?
  [-]
  - doubled112 8 months ago
    It would work.
    I RAIDed a bunch of cheap USB 2.0 flash drives on a hub with MDRAID as a learning tool back in the day.
    It was horrendously unreliable. USB wasn’t a good choice for storage back then, and I’m convinced the hub had issues. This would work much better now.
    I did, however, get to watch the blinkin lights, learn how to recover from failures, and discover quite a few gotchas.
    [-]
    - aaronmdjones 8 months ago
      MDRAID is good for availability and fault tolerance but no good for integrity.
      For example, in a RAID-1, if one of the drives has a silently corrupted block, MDRAID will happily return that if you're unlucky enough for it to decide to use that drive to satisfy that read request. If you have error detection at a higher level, you might start pulling all but one drive from the array at a time and re-issuing the read request until it gives you bad data again (then you know which drive is bad).
      If you have an 8-drive RAID-6 and one of the data blocks in a stripe is corrupt, again, it will happily return that (it won't even read the parity blocks, because every drive is present). Again you would have to pull one drive at a time and re-issue the read request until you get back good data, assuming you have a way to know that (e.g. a Zip archive with a CRC32). If you're still getting bad data, you didn't pull the bad drive; re-add it and pull the next one. This would happen when you pull the drive with the corrupted block, because then it would calculate what that block was supposed to contain based on the parity in that stripe.
      Most distros have something akin to a monthly scrub job, where MDRAID will confirm mirrors against each other and parity against data. Unfortunately this will only tell you when they don't agree; it's still your responsibility to identify which drive is at fault and correct it (by pulling the corrupted drive from the array, nuking the MDRAID metadata on it, and re-adding it, simulating a drive replacement and thus array rebuild).
      Worse still, the RAID-5 and RAID-6 levels in MDRAID have a "repair" action in addition to the "check" action that detects the above. This doesn't do what you think it does; instead it just iterates every stripe in the array and recalculates the parity blocks based on the data blocks, writing new parity back. Thus you lose the above option of pulling 1 drive at a time because now your parity is for corrupted data.
      You need a filesystem that detects and repairs corruption. Btrfs and ZFS both do this, but Btrfs' multi-device RAID support is (still) explicitly marked experimental and for throwaway data only.
      In ZFS, you can either do mirroring (equivalent to an MDRAID RAID-1), a RAID-Z (equivalent to an MDRAID RAID-5 in practice but not implementation), a RAID-Z2 (RAID-6), a RAID-Z3 (no MDRAID equivalent), set the "copies" property which writes the same data multiple times (e.g. creating a 1 GiB file with copies=2 uses 2 GiB of filesystem free space while still reporting that the file is 1 GiB in size), or some combination thereof.
      ZFS checksums everything (data and metadata blocks), and every read request (for metadata or data) is confirmed against the checksum when it is performed. If it does detect checksum mismatch and it has options for recovery (RAID-Z parity or another drive in a mirror or an extra filesystem-level copy created by the "copies" property being greater than 1), it will automatically correct this corruption and then return good data. If it doesn't, it will NOT return corrupted data; it will return -EIO. Better still, checksums are also metadata, so they are also replicated if you have any of the above topologies (except copies=). This protects against corruption that destroys a checksum (which would ordinarily prevent a read) rather than destroying (meta)data. A ZFS scrub will similarly detect all instances of checksum and (meta)data mismatch and automatically correct any corruption. Better still, a ZFS scrub only needs to operate on filesystem-level allocated space, not every stripe or block.
      tl;dr: Don't use MDRAID on questionable storage media. It won't go well.
takeda-kun 8 months ago
MLC is the opposite of "high-endurance".
[-]
- jsheard 8 months ago
  It's high endurance compared to TLC and QLC, which make up >99.9% of flash sold nowadays. Sure SLC is even better but it's all relative.
error9397 8 months ago
I'm a bit disappointed Industrial XI Ultra cards weren't evaluated.
jqpabc123 8 months ago
My experience:
1) All microSD cards have a high failure rate.
2) Cheap cards sourced directly from China are usually fake and will fail a format test. I tried multiple times and could never find a reliable source.
3) MS Bitlocker is a security measure that also seems to serve as a reliability test. Cards with physical memory reliability issues seem more prone to write failures with Bitlocker. It's better to know about this up front.
4) If your data is really important, always make a backup copy of any card.
5) Physically, microSD cards are fairly durable and even water resistant thanks to being encased in epoxy.
I have one on my watch strap and take it wherever I go, even the shower and pool. Just make sure it is dry before plugging it in.
https://www.thingiverse.com/thing:6784665
[-]
- jdietrich 8 months ago
  >All microSD cards have a high failure rate.
  You can buy genuinely reliable microSD cards, if you really need them. Delkin, Innodisk, Swissbit and others make industrial microSD cards with SLC flash, ECC, SMART and power loss protection. Capacities are small and you'll pay a steep premium, but they legitimately have endurance and MTBFs that compete with conventional SSDs.
  [-]
  - crispyambulance 8 months ago
    This is true.
    You can get "industrial" versions of these cards that are intended to be used as BOM items for products. This means you get a stable supply and they won't change the part number out from under you (which happens even with reputable suppliers when you buy as a consumer). When they're about to EOL the part, you get a heads-up multiple months in advance and a "last time buy" opportunity.
    In my experience, these aren't "better" performance-wise. You just "know" what you're getting, have an opportunity to qualify it, and you won't get surprise changes.
    Of course, the price is jacked up as a result.
    Generally speaking, it's usually better to make the customer responsible for the SD card if you have to use one at all.
- matsemann 8 months ago
  Ref 2: if you buy from Amazon, you never know if you get a genuine card. Even if you shop from an official seller, as inventory is commingled so you might as well get a fake one from another seller with the same SKU.
  Big problem in action cam related circles, where lots of people have broken recordings due to fake cards.
  [-]
  - jqpabc123 8 months ago
    if you buy from Amazon, you never know if you get a genuine card.
    Yes, but there is a significant difference --- I can always return it to Amazon postage free if the capacity is fake and it won't format properly. Presumably, Amazon extracts the appropriate weight in flesh from the seller.
    The way retail typically works in the USA, the vendor assumes all the risk --- any problems get unrolled backwards. You can sell a fake product on Amazon --- but it will likely cost you --- not Amazon (who controls the money) nor the end consumer if he is astute.
    Basically, Amazon acts like an escrow service.
    [-]
    - Palomides 8 months ago
      you're still feeding the machine, and assuming you can catch cases where something is merely lower quality rather than completely fake
      [-]
      - jqpabc123 8 months ago
        If there are too many returns, the machine doesn't get fed; it gets punished.
        Amazon and other retailers have played this game for a very long time and they aren't in it to lose money.
        In my experience, the thing most commonly faked with memory cards is the capacity --- for example, an 8Gb card altered to appear 64Gb, until it is formatted.
        If you want extra assurance, buy Amazon branded cards. In this case, Amazon assumes almost all the risk so there is little incentive for fakery.
        [-]
        BolexNOLA 8 months ago
        > If there are too many returns, the machine doesn't get fed; it gets punished.
        On Amazon? I highly doubt it. The threshold for “too many returns” is probably rarely hit. MicroSD’s are super cheap - people don’t return things that are cheap because they don’t consider it worth their time. Then these companies go and pay people for good reviews and just keep flipping inconsistent/garbage products.
        If punishment were a real thing that companies had to deal with then this wouldn’t continue to be a problem. It’s been this way for many, many years. Amazon has no equal competitor, there’s nowhere else for folks to go and people rarely look at the specific vendor they’re buying from. As far as they’re concerned it’s all Amazon.
        [-]
        nothrabannosir 8 months ago
        > Amazon has no equal competitor, there’s nowhere else for folks to go and people rarely look at the specific vendor they’re buying from.
        I’ve bought more off target.com than Amazon and I read at least one other commenter here who does the same. Walmart.com also has a wide range of products.
        Even sd cards. Which are still going strong.
        https://www.target.com/s/microsd
        [-]
        8 months ago
        [deleted]
        jqpabc123 8 months ago
        Amazon back charges the vendor for every return --- which likely includes return shipping and handling. I can assure you, Amazon doesn't just "eat" these costs.
        Remember, Amazon holds the cash for all sales. The cost of returns is extracted from vendor disbursements. A fake card sold for $10 may cost the vendor $20 if it gets returned.
        [-]
        BolexNOLA 8 months ago
        If enough people return the item which again, with microsd cards, is likely not nearly often enough on Amazon.
        If “the system worked” i.e. companies get appropriately punished for selling bad products, then this issue wouldn’t still be so widespread. It’s basically a feature of Amazon now. People just assume they’re going to randomly get junk. It’s baked into our expectations at this point.
        [-]
        jqpabc123 8 months ago
        Amazon returns are very easy --- maybe even too easy.
        I suspect the real issue with memory cards is that most people probably don't check them. The solution is --- don't be like most people.
        [-]
        BolexNOLA 8 months ago
        It is significantly more work to return an item than to buy one, it’s very asymmetric.
        circuit10 8 months ago
        Just formatting it isn't enough, you need to use a tool that does a test that writes to all the storage and reads it back
        [-]
        jqpabc123 8 months ago
        Formatting is a base level test that doesn't require any additional software.
        A lot of cheap cards from China can't even pass this basic test.
        reboot81 8 months ago
        Validrive[1] can validate actual vs advertised storage area. Maybe works on SD-cards aswell?
        1 https://www.grc.com/validrive.htm
  - lazide 8 months ago
    Personally I found an easy way to tell - write a script which fills the drive fully with random data (computing a check sum as it writes), then read it all back and verify you get the right checksum.
    Oh and compute read/write times while doing so.
    If the read back data doesn’t validate, or you can’t write the amount expected, or the read/write rates don’t match expected? faaaaaaake
    Pretty accurate too. I personally never had an actual fake Sandisk SD card from Amazon. I bought probably 50 of them in the space of 6 months at one point. Other brands were not great.
    USB flash drives though? Literally all trash. I tried at least 5 different ones before I just gave up.
    No idea what the market looks like now however.
  - everybodyknows 8 months ago
    Are you saying that the counterfeiting of name brand packaging is so skillful that even careful inspection by us the consumers cannot reasonably hope to detect it?
    [-]
    - cwillu 8 months ago
      https://www.bunniestudios.com/blog/on-microsd-problems/
      “One vendor in particular interested me; it was literally a mom, pop and one young child sitting in a small stall of the mobile phone market, and they were busily slapping dozens of non-Kingston marked cards into Kingston retail packaging. They had no desire to sell to me, but I was persistent; this card interested me in particular because it also had the broken “D” logo but no Kingston marking.”
    - AlexandrB 8 months ago
      Unless you have the real and the fake side by side it can be really hard.
      [-]
      - dboreham 8 months ago
        Even then it's hard because the legit vendors change their packaging frequently.
    - BolexNOLA 8 months ago
      Yes it’s a very big problem until you run speed tests and the like which most people have no clue how to do.
  - mikae1 8 months ago
    First rule: never buy flash storage from Amazon.
- KMnO4 8 months ago
  Important thing you should be aware of: not all counterfeit cards fail a format/f3 test.
  I recently bought a very expensive Sandisk Extreme UHS-II V90 card from Amazon. It passed without any issue when doing a full capacity check, but was still fake because they were using slower (150mb/s vs 300mb/s) flash.
  The average user[0] would never know, because it was definitely “faster” than other cards and maxed out the UHS-I reader in my MacBook Pro. I returned it and bought from my local camera store (Henry’s) and the performance difference was very obvious.
  [0]: I guess you could argue that the average user wouldn’t be buying a $200 V90 card, but I still think you could fall victim to this if you didn’t explicitly own a dedicated UHS-II reader.
- megous 8 months ago
  1) I have ~7 cards (all from China and Sandisk, solely because Sandisk has a website where you can verify the card is genuine) that are in daily 24/7 use for 6-8 years with no issues (mounted in rw mode) 2) ... 4) that's true for any storage media
  If you ever tried running uSD card in SDR104 mode, you'd notice that they tend to heat up way more than in lower speed interface modes. So for longetivity, I guess it's better to run them at HS mode (which is just 50MHz), or at least lower the SDR104 bus frequency from 200MHz to something lower.
- hashier 8 months ago
  Ref 2
  You mean a simple reformat with fat? Or what’s in your opinion the best test when getting a new SD card?
  [-]
  - megous 8 months ago
    Write pseudorandom data to the whole card with a fixed seed.
    Read back data and compare against the same pseudorandom data with a same fixed seed.
    Or just make a random data file the size of SD card and write/read back compare, if you don't care about having to store the big file and also testing for potential local disk corruption at the same time.
    [-]
    - noja 8 months ago
      Or use https://github.com/dkrahmer/MediaTester
      [-]
      - hashier 8 months ago
        Looks interesting but unfortunately a windows tool. Happen to know a Linux tool?
        [-]
        mohaba 8 months ago
        https://fight-flash-fraud.readthedocs.io/en/stable/
        [-]
        megous 8 months ago
        I'd rather use something that works with block device directly, rather than something that depends on the filesystem code and may lead to filesystem corruption and potential for kernel instability. Also it seems like a weird design decision to fill flash with files, when in Linux there's trivial access to block device directly.
        It's also possible to write 64bit address of each 8 byte block to every such block, and avoid pseudorandom generator, and potentially have more insight to what happened when the block ends up mapped to unexpected location.
  - jqpabc123 8 months ago
    The first thing I do is a simple format and check the formatted capacity.
    Cheap, fake cards will often fail this simple basic test.
    There are a number of readily available utilities to further test performance and reliability.
- 8 months ago
  [deleted]
mdp2021 8 months ago
Related: a few years ago had to select microSD models for reliability (longevity): turned out that because of scarcity of available technical details*, the best indicator seemed to be the length of the warranty (5yrs, 10yrs, lifetime...).
Do you have better advice?
* cpr the article:
> [manufacturer] doesn’t publish any detailed specifications about the cards’ internal workings [...] I shouldn’t have to go this far in hardware reverse-engineering to just ask a simple question of what Flash SanDisk used in their high-endurance card
--
About details in the article, instead: it states that 3D NAND should be more reliable (3D has less interference than planar) - but actually I read that 3D NAND can be more fragile physically, and that the bits-per-cell value remains crucial (the less the better)...
[-]
- megous 8 months ago
  That can just as well be marketing or a way to pad the price.
  64 GiB microSD still costs ~$14 locally, but the same thing can be had for $4 from China without any warranty.
  They stick endless limited warranty on it and increase the price so much that it's economical for them to just replace a few cards that people will actually bother to return, and still make profit. And warranty will tell you nothing about quality.
  In fact on many products you can explicitly choose to pay more for longer warranty.
  [-]
  - 8 months ago
    [deleted]
- muragekibicho 8 months ago
  For maximum clarity, is there a positive or negative relationship between warranty length and longevity?
  [-]
  - ptero 8 months ago
    Positive and strongly correlated, as long as the part is not a fake from a fly-by-night shop.
    I personally buy SD cards and USB memory sticks from bhphotovideo, whose customers (pro and enthusiast photographers), care greatly about not losing their images, so B&H watches their suppliers closely. My 2c.
  - Uvix 8 months ago
    Positive. A longer warranty means they don’t think it will fail during that period (i.e. higher longevity).
- 8 months ago
  [deleted]
spintin 8 months ago
[dead]
TacticalCoder 8 months ago
[dead]
casenmgreen 8 months ago
Very good work.
thrdbndndn 8 months ago
I don’t quite understand the sarcastic, if not slightly passive-aggressive, tone in this article. This kind of attitude seems quite common in hacking and reverse engineering write-ups.
In my opinion, manufacturers aren’t obligated to disclose every technical detail beyond what a typical spec sheet would cover, such as the specific type of flash used. It’s incredibly impressive that someone as talented as the author can figure it out independently, but I don’t see why there’s a need for the constant tone of frustration or an “us vs. the company” mindset throughout the article.
[-]
- bean-weevil 8 months ago
  As someone who's been in a position like this before, I suspect the author is angry at the manufacturers and feels they're not being honest by not providing proper specs. It's hard to spec out a project when no one will tell you exactly what their parts can do. It's like if you were trying to buy a light truck for work and intsead of telling you how much weight it could pull or how often maintenance would be needed, the manufacturer refused to say anything more than "it can tow a speedboat" and "it requires infrequent maintenance."
- hilbert42 8 months ago
  The reason why we know how these media storage devices work is that our valuable data is being stored on them! We need to know how fragile that storage is and the only way we can do that is to have all the engineering data/test info.
  SSD and MicroSD/thumb drive and even HD manufacturers have a damn hide by being so secretive about their devices—of course it's never the manufactures who suffer the burden of data loss, it's the customer.
  What's desperately needed are open-source manufacturers who will publish the necessary data.
  This problem isn't new, sleazebag Kodak knowingly released shoddy unstable color stock in the 1950s and decades later precious family and wedding photos had faded to nothing.
  Let that be a lesson, this solid-state storage shit hasn't been around long enough yet to know whether we'll be seeing a repeat of that Kodak fuckup.
- hulitu 8 months ago
  > In my opinion, manufacturers aren’t obligated to disclose every technical detail beyond what a typical spec sheet would cover, such as the specific type of flash used.
  Some write endurance and retention figures will be ok. A 1TB flash is useless if, the moment you wrote 1TB, you cannot read it anymore or it gives erroneous values.
- mdp2021 8 months ago
  > to disclose every technical detail
  Basic fundamental technical details are regularly missing: e.g., bits per cell.
- grues-dinner 8 months ago
  The manufacturer may not be obliged to give the details but the hacker also doesn't need to be pathetically grateful for what they do deign to give.
  Most companies are quite happily to lie, obfuscate and omit to the hilt if they can: nearly every labelling regulation is a patch over some fuckery. The relationship is often pretty adversarial, especially at retail.
  Doubly so in computer memory devices, which is an industry particularly filled with flimflam and chancers.
- foxhop 8 months ago
  When i buy 512G microsd it becomes my property. If i need to repair or replace my property, it should be disclosed what technology i bought, so that i or my repairman could understand my property enough to decide a proper path forward depending on my data & other environmental devices.
  The author needed to reverse engineer what could have been on the spec sheet...