This post’s conclusions are odd. It has a bunch of extensive benchmarks showing that zstd is by far the worst performing across every metric except a slight increase in compression ratio and then says the conclusion is zstd is the best choice. Unless I’m missing something in the data.
In the first benchmark it gets a ratio of 4 instead of 2.7, fitting 36-40% more data with 75% more CPU. It looks great.
The next two show it fitting 20% more data with 2-3x the CPU, which is a tougher tradeoff but still useful in a lot of situations.
The rest of the post analyzes the CPU cost in more detail, so yeah it's worse in every subcategory of that. But the increase in compression ratio is quite valuable. The conclusion says it "provides the highest compression ratio while still maintaining acceptable speeds" and that's correct. If you care about compression ratio, strongly consider zstd.
I've had a similar experience: with ZFS, zstd dropped IOPS and throughput by 2-4x compared to lz4, and that was on a 64-core Milan server chip…
ZFS lz4 in my experience is faster in every metric than no compression.
Only if the data in question is at least somewhat compressible
Not really: it goes through the CPU so fast that disk speed is at worst the same, and the CPU overhead is tiny (in other words, it's not fast while saturating the CPU, it's fast while consuming a couple percent of the CPU).
Technically sure, you're correct, but the actual overhead of lz4 was more or less at the noise floor of everything else going on on the system, to the extent that I think "lz4, without thought or analysis" is always the best advice.
Unless you have a really specialized use case the additional compression from other algorithms isn't at all worth the performance penalty in my opinion.
the context is missing.
But for a VPS, where CPU usage is extremely low and RAM is expensive, it might make sense to sacrifice a little performance for more DB cache. Can't say without more context.
An alternative is zswap https://old.reddit.com/r/linux/comments/11dkhz7/zswap_vs_zra... which I believe, despite the name, can also compress RAM without hitting disk.
It's only an alternative if you have a backing swap device. zram does not have this requirement, so (aside from using no compression) it's basically the only solution for some scenarios (e.g. using entire disk(s) for ZFS).
Can't you use a ramdisk as your backing swap device?
Using a ramdisk for zswap is basically just zram with extra steps.
It is not the same at all. The swapping algorithm can make a big difference in performance, for better or worse depending on workload
Zram is just swap but in RAM. It uses the same algorithms as normal swap
Extra steps are fine if the result works better.
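For reference, setting up zram as swap takes only a few commands; the size, algorithm, and priority below are examples, not recommendations:

```shell
# create a zram block device compressed with zstd (size is an example)
modprobe zram
zramctl --find --size 4G --algorithm zstd   # allocates and prints a device, e.g. /dev/zram0

# format it as swap and enable it with a higher priority than any disk swap
mkswap /dev/zram0
swapon --priority 100 /dev/zram0
```

With a higher priority than disk-backed swap, the kernel fills the zram device first and only spills to disk once it is exhausted.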
If you use hibernation, I think it also compresses your RAM image for potentially less wear and faster loading/saving
Why would hibernation not compress to begin with? You'd more likely just end up running zstd twice.
Swap isn't compressed by default, hibernation dumps memory to swap
Hibernation uses compression regardless of zswap
Thanks for the correction
zram tends to change the calculus of how to set up the memory behavior of your kernel.
On a system with integrated graphics and 8 (16 logical) cores and 32 GB of system memory I achieve what appears to be optimal performance using:
Compression factor tends to stay above 3.0. At very little cost I more than doubled my effective system memory. If an individual workload uses a significant fraction of system memory at once, complications may arise.

This seems like a great place to ask: how does one go about optimizing something like zram, which has a tremendous number of parameters [1]?
I had considered some kind of test where each parameter is perturbed a bit in sequence, so that you get an estimate of a point partial derivative. You would then do an iterative hill climb. That probably won't work well in my case since the devices I'm optimizing have too much variance to give a clear signal on benchmarks of a reasonable duration.
[1] https://docs.kernel.org/admin-guide/sysctl/vm.html
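That perturbation idea can be sketched as a simple sweep; the parameter list, step sizes, and `my_benchmark` command below are all placeholders for whatever workload you actually care about:

```shell
#!/bin/bash
# Perturb each vm.* parameter around its current value and time a benchmark.
# "my_benchmark" is a hypothetical stand-in for your real workload.
params=(vm.swappiness vm.page-cluster vm.watermark_scale_factor)
deltas=(-10 0 10)

for p in "${params[@]}"; do
  base=$(sysctl -n "$p")
  for d in "${deltas[@]}"; do
    sysctl -qw "$p=$((base + d))"
    t=$( { time -p my_benchmark >/dev/null; } 2>&1 | awk '/^real/ {print $2}' )
    echo "$p=$((base + d)) real=${t}s"
  done
  sysctl -qw "$p=$base"   # restore before moving to the next parameter
done
```

As noted above, noisy benchmarks may need many repetitions per point before the per-parameter differences rise above the variance.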
optuna, probably coupled with a VM to automate testing
You can have multiple layers of compression, but you need a simple daemon (basically a for loop in bash).

I use lz4-rle as the first layer, but if a page sits idle for 1h it gets recompressed with zstd level 22 in the background.

It's a great balance between responsiveness and compression ratio.
a comment here about zram caught my eye a day or two ago and I've been meaning to look into it. Glad to see this post (and I'm sure many others saw the same comment and shared my obsession)
You saw a comment a day or two ago about zram, but never got around to looking into it more even though you are obsessed by it?
I was just trying to find a benchmark about this; I wondered which algorithm would work best for video games. Thanks!
Video games and compute-heavy tasks tend not to achieve a large compression factor. The good thing is that you can test your own setup using zramctl.
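As a concrete check, the first two fields of zram's `mm_stat` file (original vs. compressed bytes) give the current compression factor; `zramctl` reports the same numbers in its DATA and COMPR columns:

```shell
# print the current compression factor of /dev/zram0
# mm_stat fields: orig_data_size compr_data_size mem_used_total ...
awk '{ printf "compression factor: %.2f\n", $1 / $2 }' /sys/block/zram0/mm_stat
```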
Has anyone tried using zram inside of various K8s pods? If so, I'd be interested in knowing the outcome.
Inside the pods it makes no sense, but I do enable it on some memory-constrained worker nodes. Note that the kubelet by default refuses to start if the machine has any swap at all.
LZ4 looks like the sweet spot to me, you get OK compression and the performance hit is minimal.
As with all tradeoffs, it depends on your requirements. lz4 is ridiculously fast, so it essentially gets you more RAM for free; zstd is a lot more CPU-intensive but also has a much higher compression ratio. So if your RAM is severely undersized for some of your workloads, and/or you're not especially CPU-bound until disk swap takes you out, then zstd gives you a lot more headroom.