Array#push causes “stack level too deep” error with large arrays
I made two arrays, each with 1 million items:
a1 = 1_000_000.times.to_a
a2 = a1.clone
I tried to push a2 into a1:
a1.push *a2
This returns SystemStackError: stack level too deep.
However, when I try with concat, I don't get the error:
a1.concat a2
a1.length # => 2_000_000
I also don't get the error with the splat operator:
a3 = [*a1, *a2]
a3.length # => 2_000_000
Why is this the case? I looked at the documentation for Array#push, and it is written in C. I suspect that it may be doing some recursion under the hood, and that's why it's causing this error for large arrays. Is this correct? Is it not a good idea to use push for large arrays?
I think that this is not a recursion error, but an argument stack error. You are running up against the limit of the Ruby VM stack depth for arguments.
The problem is the splat operator, which is passed as an argument to push. The splat operator is expanded into a million element argument list for push.
As function arguments are passed as stack elements, and the pre-configured max size of the Ruby VM stack size is:
RubyVM::DEFAULT_PARAMS[:thread_vm_stack_size]
=> 1048576
..this is where the limit comes from.
You can try the following:
RUBY_THREAD_VM_STACK_SIZE=10000000 ruby array_script.rb
..and it will work fine.
This is also the reason you want to use concat instead, as the whole array can be passed as one single reference, and concat will then process the array internally. As opposed to push + splat, which will try to use the stack as a temporary storage for all the array elements.
Casper has already answered the question in the title and given you a solution you can use to make a1.push *a2 work, but I'd like to talk about the last question you asked, on whether it's a good idea.
More specifically, if you are going to work with arrays that are millions of items long in production code, performance becomes something to keep in mind. http://www.continuousthinking.com/2011/09/07/ruby_array_plus_vs_push.html has an overview of 4 different ways to handle array concatenation in ruby: +, .push, << and .concat.
There they mention that array.push will effectively handle each argument separately, and increase the array size by 50% every time the array is too small. This means that in your example, a will be increased in size 2 times and get 1 million appends. Meanwhile, array.concat will first calculate the new size of the array, extend the original array and then copy the new array into the right place.
For situations like yours, concat will most likely be more performant, both from a memory and from a CPU usage perspective. However, without benchmarks I can't say for sure. My recommendation is to measure the time and memory usage to do both operations for the size of the arrays you want to handle. concat will most likely come out on top, but I might be mistaken on that front.
a1 = 1_000_000.times.to_a
a2 = a1.clone
I tried to push a2 into a1:
a1.push *a2
This returns SystemStackError: stack level too deep.
However, when I try with concat, I don't get the error:
a1.concat a2
a1.length # => 2_000_000
I also don't get the error with the splat operator:
a3 = [*a1, *a2]
a3.length # => 2_000_000
Why is this the case? I looked at the documentation for Array#push, and it is written in C. I suspect that it may be doing some recursion under the hood, and that's why it's causing this error for large arrays. Is this correct? Is it not a good idea to use push for large arrays?
I think that this is not a recursion error, but an argument stack error. You are running up against the limit of the Ruby VM stack depth for arguments.
The problem is the splat operator, which is passed as an argument to push. The splat operator is expanded into a million element argument list for push.
As function arguments are passed as stack elements, and the pre-configured max size of the Ruby VM stack size is:
RubyVM::DEFAULT_PARAMS[:thread_vm_stack_size]
=> 1048576
..this is where the limit comes from.
You can try the following:
RUBY_THREAD_VM_STACK_SIZE=10000000 ruby array_script.rb
..and it will work fine.
This is also the reason you want to use concat instead, as the whole array can be passed as one single reference, and concat will then process the array internally. As opposed to push + splat, which will try to use the stack as a temporary storage for all the array elements.
Casper has already answered the question in the title and given you a solution you can use to make a1.push *a2 work, but I'd like to talk about the last question you asked, on whether it's a good idea.
More specifically, if you are going to work with arrays that are millions of items long in production code, performance becomes something to keep in mind. http://www.continuousthinking.com/2011/09/07/ruby_array_plus_vs_push.html has an overview of 4 different ways to handle array concatenation in ruby: +, .push, << and .concat.
There they mention that array.push will effectively handle each argument separately, and increase the array size by 50% every time the array is too small. This means that in your example, a will be increased in size 2 times and get 1 million appends. Meanwhile, array.concat will first calculate the new size of the array, extend the original array and then copy the new array into the right place.
For situations like yours, concat will most likely be more performant, both from a memory and from a CPU usage perspective. However, without benchmarks I can't say for sure. My recommendation is to measure the time and memory usage to do both operations for the size of the arrays you want to handle. concat will most likely come out on top, but I might be mistaken on that front.
Comments
Post a Comment