@psvri psvri commented Oct 10, 2025

Hello,

In my WSL environment, I found that GNU md5sum was faster than coreutils' hashsum --md5. Investigating with strace, I found that GNU reads files using 32KiB buffers while coreutils uses 8KiB buffers, as shown below. This PR switches to a 32KiB buffer as well, thereby closing the performance gap to an extent.

md5sum

truncate -s 10M nullbytes
strace md5sum nullbytes 
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768) = 32768

strace ./coreutils_main hashsum --md5 nullbytes
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
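Why the buffer size matters can be shown with a small, self-contained sketch. It is not the PR's actual diff. It assumes the hashing loop pulls data through std::io::BufReader, whose default capacity in the Rust standard library is 8 KiB on most platforms. The CountingReader wrapper and the hash_with_capacity helper are hypothetical names used only for illustration. The wrapper counts inner read calls the way strace counts read(2) syscalls. The closure stands in for an MD5 context's update method, and the additive checksum is only a placeholder, not MD5.

```rust
use std::io::{self, BufRead, BufReader, Cursor, Read};

// 32 KiB, the read size strace showed for GNU md5sum (same value as the PR's constant).
const READ_BUFFER_SIZE: usize = 32 * 1024;

// Hypothetical wrapper: counts calls to `read` on the inner reader, which for a
// real file would correspond to the read(2) syscalls seen in strace.
struct CountingReader<R> {
    inner: R,
    reads: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.reads += 1;
        self.inner.read(buf)
    }
}

// Hypothetical helper: streams `reader` through a BufReader of `capacity` bytes and
// hands each chunk to `update` (where an MD5 context's update call would go).
// Returns (bytes processed, inner read calls including the final 0-byte EOF read).
fn hash_with_capacity<R: Read>(
    reader: R,
    capacity: usize,
    mut update: impl FnMut(&[u8]),
) -> io::Result<(u64, usize)> {
    let mut buffered =
        BufReader::with_capacity(capacity, CountingReader { inner: reader, reads: 0 });
    let mut total = 0u64;
    loop {
        let chunk = buffered.fill_buf()?;
        if chunk.is_empty() {
            break; // EOF: the inner read returned 0 bytes
        }
        let len = chunk.len();
        update(chunk);
        total += len as u64;
        buffered.consume(len);
    }
    Ok((total, buffered.get_ref().reads))
}

fn main() -> io::Result<()> {
    // 10 MiB of zero bytes, like `truncate -s 10M nullbytes`.
    let data = vec![0u8; 10 * 1024 * 1024];
    for capacity in [8 * 1024, READ_BUFFER_SIZE] {
        // Placeholder checksum standing in for MD5; it is NOT a real digest.
        let mut checksum = 0u64;
        let (bytes, reads) = hash_with_capacity(Cursor::new(data.as_slice()), capacity, |chunk| {
            checksum = chunk.iter().fold(checksum, |acc, &b| acc.wrapping_add(u64::from(b)));
        })?;
        println!("capacity {capacity:>5}: {bytes} bytes, {reads} read calls, checksum {checksum}");
    }
    Ok(())
}
```

For the 10 MiB zero-filled input, this should report 1281 inner reads with an 8 KiB buffer and 321 with a 32 KiB buffer. In each case, one of those reads is the zero-byte read that signals EOF. That is four times fewer syscalls, matching the 8192- vs 32768-byte reads in the strace output.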

Version details

md5sum --version
md5sum (GNU coreutils) 9.4
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Ulrich Drepper, Scott Miller, and David Madore.

With this PR, I don't see much improvement over the main branch for small inputs; the difference only becomes noticeable for larger inputs.

Benchmarks compared to main
truncate -s 1K nullbytes;
hyperfine --warmup 3 "./coreutils_main hashsum --md5 nullbytes" "./coreutils_patch hashsum --md5 nullbytes";
Benchmark 1: ./coreutils_main hashsum --md5 nullbytes
  Time (mean ± σ):      11.3 ms ±   0.3 ms    [User: 1.4 ms, System: 0.8 ms]
  Range (min … max):     9.7 ms …  12.9 ms    243 runs

Benchmark 2: ./coreutils_patch hashsum --md5 nullbytes
  Time (mean ± σ):      10.8 ms ±   0.9 ms    [User: 1.4 ms, System: 0.8 ms]
  Range (min … max):     9.0 ms …  12.2 ms    243 runs

Summary
  ./coreutils_patch hashsum --md5 nullbytes ran

truncate -s 10K nullbytes;
hyperfine --warmup 3 "./coreutils_main hashsum --md5 nullbytes" "./coreutils_patch hashsum --md5 nullbytes";
Benchmark 1: ./coreutils_main hashsum --md5 nullbytes
  Time (mean ± σ):      11.5 ms ±   0.4 ms    [User: 1.5 ms, System: 0.8 ms]
  Range (min … max):     9.3 ms …  12.2 ms    232 runs

Benchmark 2: ./coreutils_patch hashsum --md5 nullbytes
  Time (mean ± σ):      11.2 ms ±   0.6 ms    [User: 1.5 ms, System: 0.7 ms]
  Range (min … max):     8.9 ms …  12.4 ms    248 runs

Summary
  ./coreutils_patch hashsum --md5 nullbytes ran
    1.02 ± 0.06 times faster than ./coreutils_main hashsum --md5 nullbytes

truncate -s 100K nullbytes;
hyperfine --warmup 3 "./coreutils_main hashsum --md5 nullbytes" "./coreutils_patch hashsum --md5 nullbytes";
Benchmark 1: ./coreutils_main hashsum --md5 nullbytes
  Time (mean ± σ):      13.0 ms ±   0.3 ms    [User: 1.7 ms, System: 0.8 ms]
  Range (min … max):    12.4 ms …  14.0 ms    206 runs

Benchmark 2: ./coreutils_patch hashsum --md5 nullbytes
  Time (mean ± σ):      12.0 ms ±   0.3 ms    [User: 1.5 ms, System: 0.9 ms]
  Range (min … max):    11.4 ms …  13.4 ms    231 runs

Summary
  ./coreutils_patch hashsum --md5 nullbytes ran
    1.08 ± 0.03 times faster than ./coreutils_main hashsum --md5 nullbytes

truncate -s 1M nullbytes;
hyperfine --warmup 3 "./coreutils_main hashsum --md5 nullbytes" "./coreutils_patch hashsum --md5 nullbytes";
Benchmark 1: ./coreutils_main hashsum --md5 nullbytes
  Time (mean ± σ):      27.9 ms ±   0.4 ms    [User: 2.8 ms, System: 2.4 ms]
  Range (min … max):    27.1 ms …  29.1 ms    102 runs

Benchmark 2: ./coreutils_patch hashsum --md5 nullbytes
  Time (mean ± σ):      17.3 ms ±   0.4 ms    [User: 2.8 ms, System: 1.2 ms]
  Range (min … max):    15.0 ms …  18.4 ms    165 runs

Summary
  ./coreutils_patch hashsum --md5 nullbytes ran
    1.61 ± 0.04 times faster than ./coreutils_main hashsum --md5 nullbytes

truncate -s 10M nullbytes;
hyperfine --warmup 3 "./coreutils_main hashsum --md5 nullbytes" "./coreutils_patch hashsum --md5 nullbytes";
Benchmark 1: ./coreutils_main hashsum --md5 nullbytes
  Time (mean ± σ):     181.7 ms ±   2.2 ms    [User: 18.6 ms, System: 13.8 ms]
  Range (min … max):   174.8 ms … 183.8 ms    16 runs

Benchmark 2: ./coreutils_patch hashsum --md5 nullbytes
  Time (mean ± σ):      69.6 ms ±   3.5 ms    [User: 15.3 ms, System: 4.3 ms]
  Range (min … max):    60.3 ms …  73.2 ms    40 runs

Summary
  ./coreutils_patch hashsum --md5 nullbytes ran
    2.61 ± 0.13 times faster than ./coreutils_main hashsum --md5 nullbytes
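To make the "only larger inputs benefit" observation concrete, a rough two-point linear fit can be drawn through the 1M and 10M mean times reported above. The fit separates a fixed per-invocation cost from a per-MiB cost. This is an illustrative model under the assumption that run time grows linearly with input size. LinearFit and fit are hypothetical names, and the numbers are simply the means posted in this thread.

```rust
// Hypothetical linear cost model: time(n) = fixed_ms + per_mib_ms * n,
// fitted exactly through two (size in MiB, mean time in ms) measurements.
struct LinearFit {
    fixed_ms: f64,
    per_mib_ms: f64,
}

fn fit(a: (f64, f64), b: (f64, f64)) -> LinearFit {
    let per_mib_ms = (b.1 - a.1) / (b.0 - a.0);
    LinearFit { fixed_ms: a.1 - per_mib_ms * a.0, per_mib_ms }
}

fn main() {
    // Mean times from the 1M and 10M hyperfine runs above (main vs patch).
    let main_fit = fit((1.0, 27.9), (10.0, 181.7));
    let patch_fit = fit((1.0, 17.3), (10.0, 69.6));
    for (name, f) in [("main", &main_fit), ("patch", &patch_fit)] {
        println!("{name:>5}: ~{:.1} ms fixed + ~{:.2} ms per MiB", f.fixed_ms, f.per_mib_ms);
    }
}
```

Both builds come out at roughly 11 ms of fixed cost, which is close to the ~11 ms measured for the 1K and 10K inputs. The patch cuts the marginal cost from about 17.1 to about 5.8 ms per MiB. That is why inputs of a few kilobytes show no real difference, while the 10M input shows the 2.6x speedup. With only two points, this is a sanity check rather than a proper regression.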

Even with this PR, GNU md5sum is still about 20% faster than the patched build on my system for large inputs (10M):

truncate -s 10M nullbytes;
hyperfine --warmup 3 "./coreutils_patch hashsum --md5 nullbytes" "md5sum nullbytes"
Benchmark 1: ./coreutils_patch hashsum --md5 nullbytes
  Time (mean ± σ):      69.1 ms ±   1.1 ms    [User: 15.8 ms, System: 4.3 ms]
  Range (min … max):    66.9 ms …  71.2 ms    42 runs

Benchmark 2: md5sum nullbytes
  Time (mean ± σ):      57.7 ms ±   1.7 ms    [User: 13.1 ms, System: 2.6 ms]
  Range (min … max):    51.0 ms …  61.3 ms    50 runs

Summary
  md5sum nullbytes ran
    1.20 ± 0.04 times faster than ./coreutils_patch hashsum --md5 nullbytes

Since I ran these benchmarks under WSL, it would be nice if others could also test this patch on their Linux systems.

@sylvestre
Contributor

Please run hyperfine with all three commands (without your change, with your change, and GNU); it is easier to understand.

use uucore::translate;

const NAME: &str = "hashsum";
const READ_BUFFER_SIZE: usize = 32 * 1024;
Contributor

Please add a comment explaining what this magic number is.

(Something like "same as GNU" is fine.)

Author

Done

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/usage_vs_getopt (passes in this run but fails in the 'main' branch)

@sylvestre sylvestre force-pushed the improve_hashsum branch 2 times, most recently from 833e079 to 6220e13 Compare October 11, 2025 14:31
@cakebaker cakebaker changed the title hashsum: improve hashsump using 32KiB bufreader hashsum: improve hashsum using 32KiB bufreader Oct 11, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

@psvri
Author

psvri commented Oct 11, 2025

please run hyperfine with the three commands without your change, with your change and gnu it is easier to understand

hyperfine results comparing main, PR and GNU
truncate -s 10K nullbytes;
hyperfine --warmup 3 "./coreutils_main hashsum --md5 nullbytes" "./coreutils_patch hashsum --md5 nullbytes" "md5sum nullbytes";
Benchmark 1: ./coreutils_main hashsum --md5 nullbytes
  Time (mean ± σ):      11.7 ms ±   0.3 ms    [User: 1.4 ms, System: 0.8 ms]
  Range (min … max):    10.1 ms …  13.1 ms    276 runs

Benchmark 2: ./coreutils_patch hashsum --md5 nullbytes
  Time (mean ± σ):      11.6 ms ±   0.3 ms    [User: 1.5 ms, System: 0.8 ms]
  Range (min … max):    10.1 ms …  12.5 ms    237 runs

Benchmark 3: md5sum nullbytes
  Time (mean ± σ):       1.9 ms ±   0.2 ms    [User: 0.7 ms, System: 0.1 ms]
  Range (min … max):     1.5 ms …   2.7 ms    1152 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.

Summary
  md5sum nullbytes ran
    6.06 ± 0.57 times faster than ./coreutils_patch hashsum --md5 nullbytes
    6.11 ± 0.57 times faster than ./coreutils_main hashsum --md5 nullbytes


truncate -s 100K nullbytes;
hyperfine --warmup 3 "./coreutils_main hashsum --md5 nullbytes" "./coreutils_patch hashsum --md5 nullbytes" "md5sum nullbytes";
Benchmark 1: ./coreutils_main hashsum --md5 nullbytes
  Time (mean ± σ):      13.1 ms ±   0.6 ms    [User: 1.5 ms, System: 1.0 ms]
  Range (min … max):    10.9 ms …  13.9 ms    211 runs

Benchmark 2: ./coreutils_patch hashsum --md5 nullbytes
  Time (mean ± σ):      12.1 ms ±   0.6 ms    [User: 1.6 ms, System: 0.8 ms]
  Range (min … max):     9.6 ms …  13.7 ms    227 runs

Benchmark 3: md5sum nullbytes
  Time (mean ± σ):       2.5 ms ±   0.2 ms    [User: 0.8 ms, System: 0.2 ms]
  Range (min … max):     2.0 ms …   3.3 ms    1084 runs

  Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.

Summary
  md5sum nullbytes ran
    4.84 ± 0.41 times faster than ./coreutils_patch hashsum --md5 nullbytes
    5.24 ± 0.43 times faster than ./coreutils_main hashsum --md5 nullbytes



truncate -s 1M nullbytes;
hyperfine --warmup 3 "./coreutils_main hashsum --md5 nullbytes" "./coreutils_patch hashsum --md5 nullbytes" "md5sum nullbytes";
Benchmark 1: ./coreutils_main hashsum --md5 nullbytes
  Time (mean ± σ):      28.9 ms ±   0.5 ms    [User: 3.1 ms, System: 2.2 ms]
  Range (min … max):    27.9 ms …  31.0 ms    98 runs

Benchmark 2: ./coreutils_patch hashsum --md5 nullbytes
  Time (mean ± σ):      17.2 ms ±   0.4 ms    [User: 2.6 ms, System: 1.4 ms]
  Range (min … max):    15.2 ms …  18.3 ms    165 runs

Benchmark 3: md5sum nullbytes
  Time (mean ± σ):       7.5 ms ±   0.4 ms    [User: 1.8 ms, System: 0.5 ms]
  Range (min … max):     6.2 ms …   8.9 ms    376 runs

Summary
  md5sum nullbytes ran
    2.30 ± 0.14 times faster than ./coreutils_patch hashsum --md5 nullbytes
    3.85 ± 0.22 times faster than ./coreutils_main hashsum --md5 nullbytes


truncate -s 10M nullbytes;
hyperfine --warmup 3 "./coreutils_main hashsum --md5 nullbytes" "./coreutils_patch hashsum --md5 nullbytes" "md5sum nullbytes";
Benchmark 1: ./coreutils_main hashsum --md5 nullbytes
  Time (mean ± σ):     186.4 ms ±   1.5 ms    [User: 19.0 ms, System: 13.3 ms]
  Range (min … max):   183.1 ms … 188.2 ms    16 runs

Benchmark 2: ./coreutils_patch hashsum --md5 nullbytes
  Time (mean ± σ):      71.1 ms ±   1.7 ms    [User: 14.4 ms, System: 5.5 ms]
  Range (min … max):    65.5 ms …  74.3 ms    40 runs

Benchmark 3: md5sum nullbytes
  Time (mean ± σ):      56.7 ms ±   3.2 ms    [User: 9.3 ms, System: 5.7 ms]
  Range (min … max):    52.0 ms …  62.8 ms    50 runs

Summary
  md5sum nullbytes ran
    1.25 ± 0.08 times faster than ./coreutils_patch hashsum --md5 nullbytes
    3.28 ± 0.19 times faster than ./coreutils_main hashsum --md5 nullbytes


GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@sylvestre
Contributor

Neither CodSpeed nor my system shows any difference:

Summary
  /usr/bin/md5sum nullbytes ran
    1.09 ± 0.32 times faster than ./target/release/coreutils.prev hashsum --md5 nullbytes
    1.14 ± 0.35 times faster than ./target/release/coreutils hashsum --md5 nullbytes

or

  /usr/bin/md5sum moby64.txt ran
    1.30 ± 0.02 times faster than ./target/release/coreutils.prev hashsum --md5 moby64.txt
    1.30 ± 0.02 times faster than ./target/release/coreutils hashsum --md5 moby64.txt

@sylvestre
Contributor

I guess it is Windows-specific?!
