Array#flatten -> Array#collapse (my first C extension)
If you work in Ruby, and probably in other dynamically typed languages,
you’ve likely had to make use of flatten
. Say you have an array like
this: [[3, [3, 3, 3]], [3, [3, 3, 3]], [3, [3, 3, 3]]]
and what you
want is an array like this: [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
.
Trivial, you say: array.flatten
.
What if you didn’t have access to flatten
? If you had to reimplement it,
what would you write?
Well, it’s helpful, to me anyway, to think about how lists (arrays) are recursive structures like trees. When we want to deal with deeply nested recursive stuff, we can always turn to a recursive function:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
def flatten(array)
flat = []
rec_flat = proc {|a,f|
a.each do |e|
if e.is_a? Array
rec_flat.call(e,f)
else
f << e
end
end
}
rec_flat.call(array,flat)
flat
end
Though as I discussed in more recursion we want to be careful about exhausting the stack. So, let’s manage the stack ourselves manually:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
class Array
def flat_stack
flat = []
stack = [] << self
until stack.empty?
e = stack.pop
e.each do |a|
if a.is_a? Array
stack << a
else
flat << a
end
end
end
flat
end
end
This is roughly how the Ruby source code handles the situation, though you’ll see they also take care to make sure recursive references to the array inside the array don’t cause an infinite loop.
I don’t really recommend rewriting this in Ruby, but the slowdown/memory bloat is only a constant factor:
So, not super useful to rewrite this in Ruby. But there’s one thing that
bugs me about flatten
in Ruby. It’s not open to a block
. You can do
flatten {|e| p e }
but you’ll see Ruby just throws it away. Another
common use case is flatten.compact
. Likewise, there’s no way to tell
flatten
to ignore any nil
values. This is kinda a bummer to me because
given flatten
has to traverse the entire data structure, we might want
to save ourselves a constant factor by piggybacking on that traversal.
So here’s a version of flat_stack
that gives us that flexibility:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
class Array
def flat_stack(keep_nil = true)
flat = []
stack = [] << self
until stack.empty?
e = stack.pop
e.each do |a|
if a.is_a? Array
stack << a
else
a = yield a if block_given?
if keep_nil
flat << a
else
flat << a unless a.nil?
end
end
end
end
flat
end
end
However, it still is better, performance-wise, to just use
flatten.compact
:
(Unpictured: Just utilizing the block and not the compaction is a little better performing, still not worth using over the normal idiom.)
This got me a bit curious, and my curiosity was such to overcome my fear: why not try to write a C extension to accomplish this?
Unfortunately, this is me writing C:
But! With help from my partner, a slew of blog posts, and the kind people on LGBT#programming I dove in and gave it a shot. First, here’s the resulting performance:
Colors might be a bit hard to see, but collapse
performs better than
flatten
+ compact
. And because I wrote mine with the exact same
allocations as the original core Ruby, the memory performance is the same:
If you want to write your own C extension, my original experiment
(patterned on
this)
is a decent template of what you’ll minimally need, and it’s on
github. Essentially all I needed
was an extconf.rb
, a .c
file, and for my own TDD preference I also
used a _spec.rb
file to actually drive the C code.
If you’d like to make use of Array#collapse
because you trust what
a golden lab writes on his day off, feel free to grab the
gem off
Rubygems.