Dieser Post wurde aus meiner alten WordPress-Installation importiert. Sollte es Darstellungsprobleme, falsche Links oder fehlende Bilder geben, bitte einfach hier einen Kommentar hinterlassen. Danke.
How to concat an array into a string? Perl is TIMTOWTDI and I could imagine two common ways - but which one is the best?
Actually, "best" isn't obvious, it depends on the data and project preferences. I had to improve the memory usage of a module handling a huge amount of data. I thought joinmight be the best choice to concat the array into a string (not doing this at all would be best, but it's not an option here), but it isn't.
First, get a base number of the data size. It's not exact and includes other parts, but ok to get an idea of the sizes we're working on:
perl -le '@a = (map { "x"x1024000 } (0..100)); system "ps u $$";'USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMANDsewi 28588 6.0 0.8 124344 104180 pts/0 S+ 16:31 0:00 perl -le @a = (map { "x"x1024000 } (0..100)); systNext, compare join vs. a element-by-element concat using a for loop:
perl -le '@a = (map { "x"x1024000 } (0..100)); $y = join("\n",@a); system "ps u $$";'USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMANDsewi 28602 0.0 2.5 326352 306136 pts/0 S+ 16:32 0:00 perl -le @a = (map { "x"x1024000 } (0..100)); $y =You should look at the RSS colum, it's not 100% perfect, because swapped memory isn't shown here and Perl's own memory manager is sitting between the script and the kernel, but it's enough to compare things.perl -le '@a = (map { "x"x1024000 } (0..100)); for my $b (@a) { $y .= $b; } system "ps u $$";'USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMANDsewi 28606 0.0 1.6 245596 205148 pts/0 S+ 16:32 0:00 perl -le @a = (map { "x"x1024000 } (0..100)); for
A very quick-and-dirty benchmark but still shows that (after substracting the "base" data usage) join is using twice the amount of the concat loop: (306136 - 104180) vs. (205148 - 104180). All numbers are kilobytes (kb)*.
But memory isn't important if you're dealing with small pieces of data, maybe speed is an issue:
perl -MBenchmark -le '@a = (map { "x"x1024000 } (0..100));timethese(0, { _join => sub { my $y = join("\n",@a); }, _concat => sub { my $y; for my $b (@a) { $y .= $b; } }});'The manual concat loop is even faster than Perl's internal join: 59 runs per second compared to 32 runs per second for join.Benchmark: running _concat, _join for at least 3 CPU seconds...
_concat: 3 wallclock secs ( 3.26 usr + 0.00 sys = 3.26 CPU) @ 59.20/s (n=193)
_join: 3 wallclock secs ( 3.10 usr + 0.00 sys = 3.10 CPU) @ 32.58/s (n=101)
But wait... there is not "the one best" way. The same benchmark with a very small change:
perl -MBenchmark -le '@a = (map { "x"x1024 } (0..10));timethese(0, { _join => sub { my $y = join("\n",@a); }, _concat => sub { my $y; for my $b (@a) { $y .= $b; } }});'The internal join clearly wins this time some 100.000 runs per second ahead of the concat loop. Most arrays are usually closer to "10 times 1 kB" than "100 times 1 MB", so join seems to be the right one for most operations - but not all.Benchmark: running _concat, _join for at least 3 CPU seconds...
_concat: 3 wallclock secs ( 3.15 usr + 0.00 sys = 3.15 CPU) @ 644350.48/s (n=2029704)
_join: 3 wallclock secs ( 3.11 usr + 0.00 sys = 3.11 CPU) @ 749435.69/s (n=2330745)
I retried the same test with "100 times 10 bytes" and join turned out to be nearly twice as fast as the loop.
* Don't give me that silly creations like "kiB" or "MiB". One kB are 1024 bytes and one MB are 1024 * 1024 bytes, noone needs that "i" junk. I started developing at a time where people knew that a "kB" is 1024 bytes, there is absolutely no reason why someone should start with 1000 bytes. Show me a binary memory chip having 1000 bytes and I might start accepting kiB's. "MiB" is "Men in Black", nothing else. Read this before complaining.
15 Kommentare. Schreib was dazu-
Alexander Hartmaier (abraxxa)
11.03.2013 18:54
Antworten
-
Sebastian Willing
11.03.2013 18:56
Antworten
-
Sebastian Willing
11.03.2013 18:56
Antworten
-
Sebastian
11.03.2013 20:48
Antworten
-
Sebastian Willing
11.03.2013 20:56
Antworten
-
Sebastian Willing
11.03.2013 20:56
Antworten
-
Andrew
15.03.2013 14:45
Antworten
-
Sebastian Willing
15.03.2013 16:06
Antworten
-
Sebastian Willing
15.03.2013 16:06
Antworten
-
Chas. Owens
5.04.2013 18:54
Antworten
-
Sebastian Willing
5.04.2013 18:57
Antworten
-
Sebastian Willing
5.04.2013 18:57
Antworten
-
Sebastian
5.04.2013 21:30
Antworten
-
Sebastian Willing
5.04.2013 21:36
Antworten
-
Sebastian Willing
5.04.2013 21:36
Antworten
Note that the bandwidth of network links is specified in KiB/s (1000 based), so those units are useful as well!
[Imported from blog]
[Imported from blog]
base-1000 is also useless there. We used to use calculated them base-1024 and noone complained.
[Imported from blog]
[Imported from blog]
Those of us who actually learned to count in school know that k is a prefix for 1000, and M is a prefix for 1,000,000. To blow off the SI and create confusion by inventing different values for them is the height of arrogance and stupidity.
[Imported from blog]
[Imported from blog]
You are making the join do extra work (concatenating with "\n"). If you make the loop perform the same work, the performance advantage drops to next to nothing (in fact, the join eventually starts to beat the concat):
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw/cmpthese/;
my @a = qw/a b c d e/;
my %subs = (
join => sub {
my $s = join "-", @a;
return $s;
},
concat => sub {
my $s = @a[0];
for my $v (@a[1 .. $#a]) {
$s .= "-$v";
}
return $s;
},
);
for my $sub (keys %subs) {
print "$sub: ", $subs{$sub}(), "\n";
}
for my $n (10, 50, 100) {
@a = ("x" x 1024000) x $n;
print "\n$n 1,024,000 byte items\n";
cmpthese -3, \%subs;
}
[Imported from blog]
[Imported from blog]
Thank you, you're right.
[Imported from blog]
[Imported from blog]