How do you convert a hash to a string? Perl is TIMTOWTDI, but which way is the fastest? I need a checksum (hash, digest) of the hash, so the string must be identical for the same hash every time. Hash keys are not returned in any fixed order, so even a simple join('', keys(%hash)) may differ between calls (if the hash has at least two keys).
I started a little race, searching for different ways and benchmarking them. The result was:
perl -MStorable=freeze -MBenchmark -MYAML -MJSON::XS -MJSON=to_json -MData::Dumper=Dumper -le '$Storable::canonical = 1;my $jxss = JSON::XS->new->canonical;timethese(0,{ Dumper => sub { Dumper(\%INC); }, self => sub { join("\x00",map { $_."\x01".$INC{$_} } sort keys %INC); }, JSON => sub { to_json(\%INC); }, JSONsort => sub { JSON->new->canonical->encode(\%INC) }, JSONXS => sub { JSON::XS::encode_json(\%INC); }, JSONXSsort => sub { JSON::XS->new->canonical->encode(\%INC) }, JSONXSpre => sub { $jxss->encode(\%INC); }, YAML => sub { YAML::Dump(\%INC); }, Storable => sub { freeze(\%INC); }, } );'
The Benchmark module reports, for each candidate, the elapsed time (wallclock seconds), the CPU time used and the number of iterations it completed (n=XXX at the end). The most important figure is the rate, the number of runs per second each candidate managed: it is the best (only?) value for comparing the results.
Benchmark: running Dumper, JSON, JSONXS, JSONXSsort, JSONsort, Storable, YAML, self for at least 3 CPU seconds...
Dumper: 3 wallclock secs ( 3.17 usr + 0.00 sys = 3.17 CPU) @ 7704.10/s (n=24422)
JSON: 4 wallclock secs ( 3.08 usr + 0.03 sys = 3.11 CPU) @ 62825.72/s (n=195388)
JSONsort: 4 wallclock secs ( 3.11 usr + 0.00 sys = 3.11 CPU) @ 42986.17/s (n=133687)
JSONXS: 3 wallclock secs ( 3.12 usr + 0.00 sys = 3.12 CPU) @ 88461.54/s (n=276000)
JSONXSpre: 4 wallclock secs ( 3.16 usr + 0.00 sys = 3.16 CPU) @ 48809.49/s (n=154238)
JSONXSsort: 3 wallclock secs ( 3.19 usr + 0.00 sys = 3.19 CPU) @ 44238.24/s (n=141120)
Storable: 4 wallclock secs ( 3.13 usr + 0.00 sys = 3.13 CPU) @ 19036.74/s (n=59585)
YAML: 4 wallclock secs ( 3.14 usr + 0.00 sys = 3.14 CPU) @ 175.48/s (n=551)
self: 3 wallclock secs ( 3.11 usr + 0.00 sys = 3.11 CPU) @ 20418.65/s (n=63502)
Dumper, JSON, JSONXS and YAML don't work because they don't sort the hash keys. self only works for flat hashes, because it doesn't support hash trees (references as the value of a hash item).
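To illustrate the limitation of self: a plain join stringifies a nested reference as something like HASH(0x...), which changes between runs. A rough sketch of a recursive variant follows. The helper name canonical_string and the type markers H, A, S and U are made up for this example, and the separators are not escaped, so it is not collision-safe for values that contain "\x00" or "\x01".

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the benchmark above): walks nested
# hashes and arrays and emits hash keys in sorted order at every level.
sub canonical_string {
    my ($data) = @_;
    my $type = ref $data;

    if ($type eq 'HASH') {
        # Same key/value separators as the "self" candidate, applied per level
        return 'H(' . join("\x00",
            map { $_ . "\x01" . canonical_string($data->{$_}) } sort keys %$data
        ) . ')';
    }
    if ($type eq 'ARRAY') {
        # Array order is meaningful, so it is kept as-is
        return 'A(' . join("\x00", map { canonical_string($_) } @$data) . ')';
    }

    # Plain scalars; undef gets its own marker. Other reference types
    # (code refs, objects) are not handled and would stringify unstably.
    return defined $data ? "S$data" : 'U';
}
```

Two hashes with the same content, built in a different key order, then give the same string. This is pure Perl, so expect it to be slower than the XS-based candidates.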
JSONXSpre is the winner: creating a sorting JSON::XS object once and reusing it every time is slightly faster (48809/s vs. 44238/s) than creating the object on each run. I didn't expect Storable, Dumper and finally YAML to be that bad, though.
It's so easy to run benchmarks with Perl, no matter whether you prefer a one-liner (like the one above) or put your code into a small Perl script file.
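Here is a minimal sketch of how the JSONXSpre approach could feed the checksum I was after. The function name hash_checksum and the choice of MD5 are assumptions for this example, and the fallback to the core module JSON::PP is only there so the snippet also runs where JSON::XS is not installed.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Prefer JSON::XS when available; otherwise fall back to core JSON::PP,
# which offers the same new/canonical/utf8/encode methods.
my $json_class = eval { require JSON::XS; 1 }
    ? 'JSON::XS'
    : do { require JSON::PP; 'JSON::PP' };

# Create the key-sorting encoder once and reuse it for every call.
# utf8 makes encode() return bytes, which md5_hex requires.
my $encoder = $json_class->new->canonical->utf8;

# Hypothetical helper: stable checksum for a (possibly nested) hash
sub hash_checksum {
    my ($hashref) = @_;
    return md5_hex($encoder->encode($hashref));
}

my %config = (name => 'example', options => { verbose => 1, color => 'auto' });
print hash_checksum(\%config), "\n";
```

One thing to keep in mind: the JSON encoders distinguish numbers from strings, so a value stored as 1 and one stored as "1" may serialize differently and therefore give different checksums.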
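As a rough illustration of the script-file variant, here is a sketch that uses only core modules (Data::Dumper, Storable, JSON::PP), so it is not the same candidate set as the one-liner above. The sample data and the candidate names are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);
use Data::Dumper ();
use Storable ();
use JSON::PP ();

# Ask every candidate to emit hash keys in sorted order
$Data::Dumper::Sortkeys = 1;
$Storable::canonical    = 1;

# Made-up sample data; replace it with the structure you actually serialize
my %data = (
    name   => 'example',
    list   => [1, 2, 3],
    nested => { b => 2, a => 1 },
);

# Encoder object created once, as with JSONXSpre in the one-liner
my $json = JSON::PP->new->canonical;

my %candidates = (
    Dumper   => sub { Data::Dumper::Dumper(\%data) },
    Storable => sub { Storable::freeze(\%data) },
    JSONPP   => sub { $json->encode(\%data) },
);

# A negative count means: run each candidate for at least that many CPU seconds
my $results = timethese(-1, \%candidates);

# Print a comparison table of the rates
cmpthese($results);
```

cmpthese prints the rates side by side with relative percentages, which is often easier to read than the raw timethese lines.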
7 comments:
ilmari (1.08.2012 3:54)
Max (4.08.2012 8:58)
Joshua Keroes (6.08.2012 18:29)
Reini Urban (6.08.2012 19:14)
demerphq (8.08.2012 10:31)
beefreak@freenet.de (23.08.2012 15:37)
Sebastian (23.08.2012 21:33)
Have you tried Data::Pond? It's written in XS for speed and uses a subset of Perl syntax to represent the data (similar to what JSON is to JavaScript).
There is also Data::MessagePack, which is pretty fast AND gives small results:
my $mp = Data::MessagePack->new->canonical;
then
MessagePack => sub { $mp->pack(\%INC); },
gives 53537.36/s while JSONXSpre gives 63099.54/s, on my host.
It would be worthwhile to add all of the YAML implementations. YAML, if YAML::XS isn't installed, won't be competitive.
perl -MBenchmark -MYAML -MYAML::XS -MYAML::Tiny -MYAML::Syck -e 'timethese(0,{"YAML::XS" => sub { YAML::XS::Dump(\%INC) }, "YAML" => sub { YAML::Dump(\%INC) }, "YAML::Tiny" => sub { YAML::Tiny::Dump(\%INC) }, "YAML::Syck" => sub { YAML::Syck::Dump(\%INC) } } )'
Benchmark: running YAML, YAML::Syck, YAML::Tiny, YAML::XS for at least 3 CPU seconds...
YAML: 3 wallclock secs ( 3.16 usr + 0.00 sys = 3.16 CPU) @ 468.04/s (n=1479)
YAML::Syck: 3 wallclock secs ( 3.16 usr + 0.00 sys = 3.16 CPU) @ 20095.57/s (n=63502)
YAML::Tiny: 3 wallclock secs ( 3.16 usr + 0.00 sys = 3.16 CPU) @ 7174.05/s (n=22670)
YAML::XS: 4 wallclock secs ( 3.21 usr + 0.01 sys = 3.22 CPU) @ 6829.81/s (n=21992)
For small hashes Data::MessagePack is usually faster than JSON::XS; for bigger hashes JSON::XS is fastest.
serialize:
             Rate storable  json    mp
storable  91022/s       --  -33%  -51%
json     136437/s      50%    --  -26%
mp       185579/s     104%   14%    --
Just wanted to add that this is a pretty crap benchmark.
Unless you really are serializing a small simple hash of string values I wouldn't put any faith in the numbers posted here.
What about a simple sort: join('', sort keys %hash)?
Simple, but doesn't work with multi-level hash trees.