469,270 Members | 1,114 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,270 developers. It's quick & easy.

How to do a quicker array replace

785 Expert 512MB
Is there a faster way for replacing cats with dogs in every string of the array?
Is there a way with a shorter expression, but not necessarily faster?
Expand|Select|Wrap|Line Numbers
  1. @temp = map { $_ =~ s/cats/dogs/g; $_ } @temp;
Sep 8 '10 #1

✓ answered by Oralloy

@chaarmann,

How about this:

Expand|Select|Wrap|Line Numbers
  1. for(@temp){s/cats/dogs/g}
I'm not sure it's quicker, but it's definitely a few characters smaller.

9 20105
Oralloy
983 Expert 512MB
@chaarmann,

How about this:

Expand|Select|Wrap|Line Numbers
  1. for(@temp){s/cats/dogs/g}
I'm not sure it's quicker, but it's definitely a few characters smaller.
Sep 8 '10 #2
chaarmann
785 Expert 512MB
@Oralloy

Thank you very much.
It's not only smaller but also quicker (more than double speed).

Here is the test result:
Expand|Select|Wrap|Line Numbers
  1.            Rate test_map test_for
  2. test_map 6.25/s       --     -55%
  3. test_for 14.0/s     125%       --
  4.  
Here is the test:
Expand|Select|Wrap|Line Numbers
  1. package test;
  2.  
  3. use warnings; 
  4. use strict;
  5. use Benchmark qw(:all);
  6.  
  7. # main program
  8. my @array;
  9. foreach my $number (0..9999) {
  10.     $array[$number] = "cats...cats......cats";
  11. }
  12.  
  13. cmpthese(-1, {
  14.   'test_map' => \&test_map,
  15.   'test_for' => \&test_for,
  16. });
  17.  
  18. sub test_map {
  19.     @array = map { $_ =~ s/cats/dogs/g; $_ } @array;
  20.     @array = map { $_ =~ s/dogs/cats/g; $_ } @array;
  21. }
  22.  
  23. sub test_for {
  24.     for (@array) { s/cats/dogs/g };
  25.     for (@array) { s/dogs/cats/g };
  26. }
An explanation could be that it makes a deep copy with "map", but not with "for".
Sep 9 '10 #3
Oralloy
983 Expert 512MB
@chaarmann,

I'm not sure that you need to do the reassignment after executing your maps, given that map() exposes the actual element to the substitution. If it does, then what you're effectively doing is overwriting @array, which is already modified, with the output of map, and thus changing the array's spine.

I'm pretty sure that the times will be a lot closer, if you drop the assignments in test_map.

Now, the question is, of course, does the alternate map implementation work? Perl always surprises me by where it has side effects (in this case, the string updates).

Cheers!
Sep 9 '10 #4
chaarmann
785 Expert 512MB
@Oralloy,

I was assuming wrongly that Perl would copy the modified lines of the map-command to a new array, but it's replacing them. Thanks for pointing that out. So I added a third test:
Expand|Select|Wrap|Line Numbers
  1. sub test_map_unassigned {
  2.     map { $_ =~ s/cats/dogs/g; $_ } @array;
  3.     map { $_ =~ s/dogs/cats/g; $_ } @array;
  4. }
  5.  
The result is:

Expand|Select|Wrap|Line Numbers
  1.                       Rate         test_map test_map_unassigned         test_for
  2. test_map            6.00/s               --                -32%             -58%
  3. test_map_unassigned 8.82/s              47%                  --             -38%
  4. test_for            14.2/s             136%                 60%               --
That is strange. I was expecting the second test as quick as the first. A command like "@a=@b" should only copy references (shallow copy), that means the array address itself and not all its elements. This is very quick and should take no measurable time overhead.
Sep 9 '10 #5
Oralloy
983 Expert 512MB
@Chaarmann,

I'm a belt and suspenders guy, so I'd say you should inspect the computed values to make sure that the updates are happening as I believe they're supposed to. I've screwed myself more than once with Perl by assuming (or not assuming) a side effect.

That said, there are a couple possibilities - first is the garbage collector - is it being invoked during one of the tests? Garbage collection can take a lot of time at random times, so repeating tests is also a good thing.

Also, the reason map is probably slower than the for loop is that it is building a result list, even though your code is dropping it on the floor (void context). That means that you pay the cost of memory allocation as an unexpected side effect.

Re-assigning to @array doubles that allocation cost, and suffers the cost of the reference count updates as the old spine of @array is destroyed. That explains the execution time differences, I think.

I wonder if the release of the spine of an array is O(n) time, or O(n^2). That would explain a lot of the time cost you're seeing for the test.

BTW, what happens if you test 100,000 times, rather than just 10,000 times?

Cheers!
Sep 9 '10 #6
chaarmann
785 Expert 512MB
@Oralloy.
Thank's, that's a good explanation.
So as what I understand is that
  1. if I use "for", it doesn't change the array spine (object), only all its string references to the new strings.
  2. When I use "map without assignment", it creates a new array object, assigns to the new strings, and when finished it drops the old array object and assigns the new array object to variable @array.
  3. When I use "map with assignment", it does the same as "map without assignment", then additionally builds a new array (a third object) and loops through all string references again to assign all strings from the second to the new array, then drops the second array object.

Question: Is Perl not doing any performance optimization when compiling the code?
By the way:

I have verified the values with
Expand|Select|Wrap|Line Numbers
  1. print "map: array[0]=$array[0]\n";
in all tests after every array, and it's really changing cats to dogs, and backward later on, in all test. I just commented out these lines before rerunning for performance, and I stripped all comments before listing it here, to make it small.

I repeated the test a lot of times, I even switched the order of the tests, but always the same results. So garbage collection is no issue (it's a powerful Sun Solaris station with a lot of RAM).

I tested with 100,000, but then it takes more than 1 second for a sigle run so the benchmark warns me that the result is not reliable:
Expand|Select|Wrap|Line Numbers
  1.             (warning: too few iterations for a reliable count)
  2.             (warning: too few iterations for a reliable count)
  3.             (warning: too few iterations for a reliable count)
  4.                     s/iter         test_map test_map_unassigned         test_for
  5. test_map              1.68               --                -32%             -59%
  6. test_map_unassigned   1.14              47%                  --             -39%
  7. test_for             0.695             142%                 64%               --
Here is the output with 1,000 elements array size:
Expand|Select|Wrap|Line Numbers
  1.                       Rate         test_map test_map_unassigned         test_for
  2. test_map            66.0/s               --                -27%             -54%
  3. test_map_unassigned 90.7/s              37%                  --             -36%
  4. test_for             142/s             115%                 57%               --
Sep 9 '10 #7
Oralloy
983 Expert 512MB
@chaarman,

Thanks for rounding out the tests. It is good to see that they scale approximately linearly. I guess my faith in programmers isn't destroyed yet. :)

That said, I appreciate going through the exercise with you. I learned something about Perl, too. It's always good to sharpen our pencils.

Thanks and Good Luck!
Sep 9 '10 #8
KennB
2
@temp = map { $_ =~ s/cats/dogs/g; $_ } @temp;

I think that what is happening is that, if a replacement occurs, the value '1' gets placed into @Temp. Otherwise, a null value gets inserted. And for some reason, the original array gets the replacement changes.

Romano
Jan 6 '13 #9
KennB
2
update on my reply

@temp = map { $_ =~ s/cats/dogs/g; $_ } @temp;

I think that what is happening is that, if a substitution occurs, the $Count value for the number of substitutions gets placed into @temp. Otherwise, a null value gets inserted. And for some reason, the original array gets the replacement changes.

My solution was to make a copy of the original array and a third copy to receive the $Count values.

@Test = ("XXXXX", "NNNNN", "XNNNN", "XXNNN");
@Test1 = @Test;
@Test2 = map(s/X/x/g, @Test1);

for($i = 0; $i <= $#Test; $i++)
{
print "$i Test = $Test[$i] Test1 = $Test1[$i] Test2 = $Test2[$i]\n";
}


0 Test = XXXXX Test1 = xxxxx Test2 = 5
1 Test = NNNNN Test1 = NNNNN Test2 =
2 Test = XNNNN Test1 = xNNNN Test2 = 1
3 Test = XXNNN Test1 = xxNNN Test2 = 2

@Test is unchanged. @Test1 gets the substitution changes and @Test2 gets the $Count values.
One can discard @Test2 or use it to get indices for the strings in which substitutions occurred.

Both of these statements give the same result:
@Test2 = map(~s/X/x/g, @Test1);
@Test2 = map($_ =~s/X/x/g, @Test1);

I do not know why the original array values gets changed.

Romano
Jan 7 '13 #10

Post your reply

Sign in to post your reply or Sign up for a free account.

Similar topics

13 posts views Thread by Chris Mantoulidis | last post: by
13 posts views Thread by J. J. Cale | last post: by
2 posts views Thread by J. J. Cale | last post: by
6 posts views Thread by CodeRazor | last post: by
2 posts views Thread by Frank | last post: by
3 posts views Thread by able | last post: by
20 posts views Thread by silverburgh.meryl | last post: by
1 post views Thread by CARIGAR | last post: by
reply views Thread by suresh191 | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.