The best kittens, technology, and video games blog in the world.

Sunday, July 18, 2010

If only Ruby had macros

Kicius Gustaw Czarowny by K0P from flickr (CC-NC-ND)


Blogger will most likely totally destroy code formatting again, sorry about that.

Ruby annoys me a lot - the code gets so close to being Just Right, with only that last little bit of wrongness that won't go away no matter what. With everything except Ruby at least I know it will be crap no matter what, so I never get this.

For example, it's really easy to make a function that generates a successive value on each call:

def counter(v)
  return counter(v, &:succ) unless block_given?
  proc{ v = yield(v) }
end
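To make the off-by-one concrete, here's the first counter run on a few seeds. The variable names are just for illustration:

```ruby
# The first counter from above: each call replaces v with its successor
# and returns the new value, so the seed itself is never produced.
def counter(v)
  return counter(v, &:succ) unless block_given?
  proc{ v = yield(v) }
end

numbers = counter(0)
numbers.call  # => 1
numbers.call  # => 2

letters = counter("a")
letters.call  # => "b" (there's no natural "value before a" to seed with)

doubling = counter(1){|x| x * 2}
doubling.call # => 2
doubling.call # => 4
```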


But you must give it the value before the first one, and sometimes no such value exists, as when generating successive labels "a", "b", "c", ... A counter starting from the value passed in isn't exactly difficult; it just doesn't feel right:

def counter(v)
  return counter(v, &:succ) unless block_given?
  proc{ old, v = v, yield(v); old }
end

Useless variables like old, which exist only to express control flow, just annoy me. Not to mention the lack of a default block argument. I'm undecided whether this tap version makes things better or worse:

def counter(v)
  return counter(v, &:succ) unless block_given?
  proc{v.tap{ v = yield(v) }}
end
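A quick check that the tap version really starts from the value it's given. The trick is that tap returns the receiver (the current v), while the block inside it has already rebound v to its successor for the next call:

```ruby
# tap-based counter: returns the current value, advancing v as a side effect.
def counter(v)
  return counter(v, &:succ) unless block_given?
  proc{v.tap{ v = yield(v) }}
end

labels = counter("a")
labels.call # => "a"
labels.call # => "b"
labels.call # => "c"
```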

Another example: this wrapper for the Ruby executable makes rubygems and -r compatible. It's so close to being able to use Array#map, and yet so far away:

args = []
while arg = ARGV.shift
  if arg =~ /\A-r(.*)\z/
    lib = $1.empty? ? ARGV.shift : $1
    args << "-e" << "require 'rubygems'; require '#{lib}'"
  else
    args << arg
  end
end
exec "ruby", *args
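Presumably map doesn't fit because an -r given as two separate arguments ("-r", "yaml") consumes two input elements, so an output can't always be computed from a single input element. To see what the loop actually produces, here's the same logic pulled into a function. rewrite_ruby_args is a hypothetical name, and it takes an argv array instead of consuming ARGV and calling exec:

```ruby
# Same loop as the wrapper, but side-effect free so it can be inspected.
def rewrite_ruby_args(argv)
  argv = argv.dup
  args = []
  while arg = argv.shift
    if arg =~ /\A-r(.*)\z/
      # Accept both "-rlib" and "-r lib"; the latter consumes the next argument.
      lib = $1.empty? ? argv.shift : $1
      args << "-e" << "require 'rubygems'; require '#{lib}'"
    else
      args << arg
    end
  end
  args
end

rewrite_ruby_args(["-rjson", "-r", "yaml", "-e", "puts 1"])
# => ["-e", "require 'rubygems'; require 'json'",
#     "-e", "require 'rubygems'; require 'yaml'",
#     "-e", "puts 1"]
```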


Yes, these are tiny things, but it's frustrating to get almost there. By the way, -r should just call require; that's another thing which is almost right, but not quite.

I could go on with these small examples, but I want to talk about something bigger. A very common pattern in all programming languages is something like this:

collection.each{|item|
  if item.test_1
    item.action_1 
  elsif item.test_2
    item.action_2
  elsif item.test_3
    item.action_3
  else
    item.otherwise
  end
}


Or the very similar:

collection.each{|item|
  case item
  when pattern_1
    item.action_1
  when pattern_2
    item.action_2
  when pattern_3
    item.action_3
  else
    item.otherwise
  end
}

Tests and actions are all next to each other, where they belong. But what if instead of executing an action on a single item at a time, we wanted to do so on all matching items together?

If Ruby had proper macros this would be trivial; unfortunately, Ruby forces us to choose among bad options. First, the most straightforward:

yes1, no1 = collection.partition{|item| item.test_1}
yes2, no12 = no1.partition{|item| item.test_2}
yes3, no123 = no12.partition{|item| item.test_3}

yes1.action_1
yes2.action_2
yes3.action_3
no123.otherwise

Rather awful. Or perhaps this?

groups = collection.group_by{|item|
  if item.test_1 then 1
  elsif item.test_2 then 2
  elsif item.test_3 then 3
  else 4
  end
}
(groups[1]||[]).action_1
(groups[2]||[]).action_2
(groups[3]||[]).action_3
(groups[4]||[]).otherwise


By the way, we cannot use a series of selects here: action_3 should apply only to items which pass test_3 but not test_1 or test_2.
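A concrete illustration of the difference, with stand-in tests chosen purely for the example (test_1 = multiple of 3, test_2 = even, test_3 = greater than 5):

```ruby
collection = (1..10).to_a

# Cascading partitions: each stage only sees what earlier stages rejected.
yes1, no1   = collection.partition{|n| n % 3 == 0}  # [3, 6, 9] / [1, 2, 4, 5, 7, 8, 10]
yes2, no12  = no1.partition{|n| n.even?}            # [2, 4, 8, 10] / [1, 5, 7]
yes3, no123 = no12.partition{|n| n > 5}             # [7] / [1, 5]

# Independent selects: 6, 8, 9 and 10 count as "big" even though
# test_1 or test_2 already claimed them.
collection.select{|n| n > 5}  # => [6, 7, 8, 9, 10]
```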

We can imagine adding extra methods to Enumerable to get syntax like this:

collection.run_for_each_group(
  proc{|item| item.test_1}, proc{|group| group.action_1},
  proc{|item| item.test_2}, proc{|group| group.action_2},
  proc{|item| item.test_3}, proc{|group| group.action_3},
                            proc{|group| group.otherwise})
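Such a method isn't hard to write. Here's a minimal sketch under one assumption about the calling convention: arguments come in (test, action) pairs, optionally followed by a single unpaired "otherwise" action. The name run_for_each_group_with_procs is hypothetical, chosen so it doesn't clash with anything else:

```ruby
module Enumerable
  # Hypothetical helper, not part of Ruby: (test, action) proc pairs,
  # optionally followed by one lone "otherwise" action proc.
  def run_for_each_group_with_procs(*procs)
    remaining = to_a
    procs.each_slice(2){|test, action|
      if action
        # Each action sees only items not claimed by earlier tests.
        matching, remaining = remaining.partition(&test)
        action.call(matching)
      else
        # A trailing unpaired proc receives whatever is left.
        test.call(remaining)
        remaining = []
      end
    }
    nil
  end
end

seen = {}
(1..10).run_for_each_group_with_procs(
  proc{|n| n % 3 == 0}, proc{|group| seen[:threes] = group},
  proc{|n| n.even?},    proc{|group| seen[:evens] = group},
                        proc{|group| seen[:rest] = group})
```

It works, but the tests and actions are already drifting away from the if/elsif shape.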

Or maybe like this (looks even worse if you need to assign groups to a variable before performing the relevant action):

tmp = collection.dup
tmp.destructive_select!{|item| item.test_1}.action_1
tmp.destructive_select!{|item| item.test_2}.action_2
tmp.destructive_select!{|item| item.test_3}.action_3
tmp.otherwise

Here #destructive_select! would be a method in the style of Perl's splice, removing some items from the collection and returning the removed values.
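A minimal sketch of such a method, monkeypatched onto Array for illustration only (destructive_select! is not a Ruby method; ActiveSupport 6 later added a similar Array#extract!):

```ruby
class Array
  # Hypothetical: remove the items for which the block is true,
  # keep the rest in place, and return the removed items.
  def destructive_select!(&test)
    removed, kept = partition(&test)
    replace(kept)
    removed
  end
end

tmp = (1..10).to_a
threes = tmp.destructive_select!{|n| n % 3 == 0}  # => [3, 6, 9]
evens  = tmp.destructive_select!{|n| n.even?}     # => [2, 4, 8, 10]
tmp                                               # => [1, 5, 7]
```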

Possibly wrapping it in something like:

collection.filter{|item| item.test_1}.action{|group| group.action_1}
          .filter{|item| item.test_2}.action{|group| group.action_2}
          .filter{|item| item.test_3}.action{|group| group.action_3}
                                     .action{|group| group.otherwise}
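The chain needs an intermediate object that remembers the pending test. Here's one way it might look, as a sketch that wraps the collection in a hypothetical GroupDispatch class instead of adding filter to Enumerable (newer Rubies already use Enumerable#filter as an alias for select):

```ruby
# Hypothetical wrapper; the class name is illustrative.
class GroupDispatch
  def initialize(items, test = nil)
    @items = items.to_a
    @test = test
  end

  # Remember the test; the next #action decides what it applies to.
  def filter(&test)
    GroupDispatch.new(@items, test)
  end

  # With a pending test, yield the matching items and pass the rest on.
  # Without one, yield everything that is left (the "otherwise" case).
  def action
    if @test
      matching, rest = @items.partition(&@test)
      yield matching
      GroupDispatch.new(rest)
    else
      yield @items
      GroupDispatch.new([])
    end
  end
end

seen = {}
GroupDispatch.new(1..10)
  .filter{|n| n % 3 == 0}.action{|group| seen[:threes] = group}
  .filter{|n| n.even?}.action{|group| seen[:evens] = group}
  .action{|group| seen[:rest] = group}
```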


It's Kicius by starlightexpress from flickr (CC-NC-ND)


A few more bad ideas (David Allen says the way you can tell a highly creative person is that they generate bad ideas faster than anyone else). With instance_eval we could do something like this, with item and group being appropriate method calls.

collection.run_for_each_group{
  rule{ item.test_1 }
  action{ group.action_1 }

  rule{ item.test_2 }
  action{ group.action_2 }

  rule{ item.test_3 }
  action{ group.action_3 }

  action{ group.otherwise }
}
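For the curious, here's roughly how the rule/action version could be wired up with instance_eval. GroupRules and GroupRules.run are hypothetical names, and this is a sketch rather than a finished design. Local variables like seen stay visible inside the blocks, but self is the GroupRules object, so methods of the caller's own object are out of reach:

```ruby
# Hypothetical sketch: the spec block runs with instance_eval, so rule,
# action, item and group resolve to methods of this class.
class GroupRules
  def self.run(collection, &spec)
    new(collection).instance_eval(&spec)
  end

  attr_reader :item, :group

  def initialize(collection)
    @remaining = collection.to_a
    @rule = nil
  end

  # Store the test; the next action applies it.
  def rule(&test)
    @rule = test
  end

  def action(&body)
    if @rule
      # Evaluate the rule once per remaining element, exposing it as item.
      @group, @remaining = @remaining.partition{|it|
        @item = it
        instance_eval(&@rule)
      }
      @rule = nil
    else
      # An action without a preceding rule gets everything left over.
      @group, @remaining = @remaining, []
    end
    instance_eval(&body)
  end
end

seen = {}
GroupRules.run(1..10){
  rule{ item % 3 == 0 }
  action{ seen[:threes] = group }

  rule{ item.even? }
  action{ seen[:evens] = group }

  action{ seen[:rest] = group }
}
```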

It would be pretty hard to do that while still letting inner blocks keep your current object's context. By the way, while trying this out I found that it's impossible to call a block specifying self and passing arguments at the same time: it's only one or the other, and no combination of the two makes it work. Those tiny limitations are just infuriating.
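For reference, whether this bites depends on how the block is invoked. yield can never change self, but when the block is held as an explicit proc, Object#instance_exec (available since Ruby 1.8.7) takes arguments and a new self at the same time. Greeter is just an illustrative class:

```ruby
class Greeter
  def initialize(name)
    @name = name
  end
end

# @name is looked up on whatever self instance_exec supplies,
# while punctuation arrives as an ordinary block argument.
greeting = proc{|punctuation| "Hello, #{@name}#{punctuation}"}
Greeter.new("kitten").instance_exec("!", &greeting)  # => "Hello, kitten!"
```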

I also tried overriding ===. That would only work for a small subset of cases, but it was worth a try:

collection.run_for_each_group{|item, group|
  case item
  when pattern_1
    group.action_1
  when pattern_2
    group.action_2
  when pattern_3
    group.action_3
  else
    group.otherwise
  end
}


This item would actually be a special object, and any === comparison against it would callcc, partition the collection in two, and resume twice, modifying the group variable (initially set to the entire collection). That would be pretty cool, except Ruby doesn't use double dispatch, so === is not a CLOS-style generic function. It's a method defined on the pattern objects, and while adding new pattern types is easy, making old patterns match new kinds of objects is hard. It would require finding every pattern class and manually overriding it to handle our magic item type, then a lot of hackery to make Regexp#=== work, and then it would fail anyway, as Range#=== and such seem to be handled specially by Ruby.
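The single-dispatch problem is easy to demonstrate. case/when calls pattern === item, so only the pattern's class gets a say. MagicItem below is an illustrative class whose === always answers yes, and it never gets consulted when it's the thing being matched:

```ruby
class MagicItem
  def ===(_other)
    true
  end
end

def classify(x)
  case x
  when String   then :string
  when Integer  then :integer
  when /kitten/ then :kitten
  else :other
  end
end

# Module#=== and Regexp#=== are asked, not MagicItem#===.
classify(MagicItem.new)  # => :other

# Only when the magic object is itself the pattern does its === run.
case "anything"
when MagicItem.new then :magic
end                      # => :magic
```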

There was a related possibility of not doing anything weird to item, but requiring special patterns:

collection.run_for_each_group{|item, group, all|
  case item
  when all[pattern_1]
    group.action_1
  when all[pattern_2]
    group.action_2
  when all[pattern_3]
    group.action_3
  else
    group.otherwise
  end
}

We're not actually using item here at all, so we don't really need to pass it:

collection.run_for_each_group{|group, all|
  if all[pattern_1]
    group.action_1
  elsif all[pattern_2]
    group.action_2
  elsif all[pattern_3]
    group.action_3
  else
    group.otherwise
  end
}

Totally implementable, only somewhat ugly with all these all[]s. The all function would test every item in the current group, and if they all returned the same value it would simply return that value. Otherwise it would divide the collection, and there are two good ways to continue from there: use callcc, or throw something and restart the whole block twice, once for each half. The latter assumes tests are cheap and deterministic.
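The throw-and-restart option might look like this. It's a minimal sketch, and run_for_each_pattern_group is a hypothetical name. Every rerun re-tests the earlier patterns against the smaller group, which is exactly where the cheap-and-deterministic assumption comes in:

```ruby
module Enumerable
  # Hypothetical sketch of the all[pattern] variant: the block receives the
  # current group and an `all` callback. Mixed results throw, split the
  # group, and rerun the block on each half.
  def run_for_each_pattern_group(&blk)
    items = to_a
    return if items.empty?
    split = catch(:split) do
      blk.call(items, proc{|pattern|
        results = items.map{|it| !!(pattern === it)}
        # Disagreement: abandon this run and report each item's result.
        throw :split, items.zip(results) if results.uniq.size > 1
        results.first
      })
      nil # the block reached an action without any mixed test
    end
    return unless split
    yes = split.select{|_, matched| matched}.map(&:first)
    no  = split.reject{|_, matched| matched}.map(&:first)
    yes.run_for_each_pattern_group(&blk)
    no.run_for_each_pattern_group(&blk)
  end
end

seen = []
(1..6).run_for_each_pattern_group{|group, all|
  if all[1..2]
    seen << [:low, group]
  elsif all[3..4]
    seen << [:mid, group]
  else
    seen << [:rest, group]
  end
}
# seen == [[:low, [1, 2]], [:mid, [3, 4]], [:rest, [5, 6]]]
```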

It looks good, but it doesn't make me happy, as I want all kinds of tests, not just pattern matches. And eventually I came up with this:

collection.run_for_each_group{|item, group, all|
  if all[item.test_1]
    group.action_1
  elsif all[item.test_2]
    group.action_2
  elsif all[item.test_3]
    group.action_3
  else
    group.otherwise
  end
}

This way, you can do any test on item you want - just pass the result to all[] before proceeding.

How is it implemented? I could callcc for every element, but unlike Scheme's, Ruby's callcc is rather expensive, and not every version of Ruby has it. So it's the naive throw-and-restart-twice instead. This means tests on each item can be rerun many times, so they'd better be cheap. Determinism is also advised, even though my implementation caches the first value returned to avoid trouble.

But first, here's a usage example you can actually run:

require "pathname"
files = Pathname("/etc").children
files.run_for_each_group{|x,xs,all|
  if all[x.directory?]
    puts "Subdirectories: #{xs*' '}"
  elsif all[x.symlink?]
    puts "Symlinks: #{xs*' '}"
  elsif all[x.size > 2**16]
    puts "Big files: #{xs*' '}"
  else
    puts "The rest: #{xs.size} files"
  end
}


Doesn't it look a lot better than a long cascade of #partitions?

And now #run_for_each_group:


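module Enumerable
  def run_for_each_group(expected=[], &blk)
    # expected holds the test results shared by every item in this group,
    # newest first, so answers.pop replays them oldest first.
    # (Enumerable itself has no empty?, hence to_a.)
    return if to_a.empty?
    xst, xsf = [], []
    each{|it|
      answers = expected.dup
      catch :item_tested do
        yield(it, self, proc{|v|
          if answers.empty?
            # A test not yet recorded: file the item by its result
            # and abandon this run of the block.
            (v ? xst : xsf) << it
            throw :item_tested
          end
          # A test already recorded for this group: replay its answer.
          answers.pop
        })
        # The block finished without a new test, so it has just run the
        # action for this whole group; don't run it again.
        return
      end
    }
    # Rerun the block on each half, with one more answer known.
    xst.run_for_each_group([true, *expected], &blk)
    xsf.run_for_each_group([false, *expected], &blk)
  end
end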

It shouldn't be that difficult to understand. expected tracks the list of test results shared by all items in the current collection. We iterate over the collection, passing the block each element, the entire group, and the all callback function.

The first few times all is called, it just returns the recorded answers, which are the same for every element. If all is called again after the recorded answers run out, we record its result, throw out of the block, and later rerun the block twice, once for each half, with expanded expectations.

On the other hand, if we didn't get any calls to all other than those already recorded, it means we reached the action, and the group it sees is every element with the same test history. This must happen only once per group, so we return from the function.

The block runs once per test an item has to go through: once for each directory, twice for each symlink, and three times for each big file and for each of the rest. On top of that there's one extra run per group to perform its action. Avoiding these reruns would be totally possible with callcc, but it's rather ugly, and often these tests aren't an issue.
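To check the rerun arithmetic without depending on what's in /etc, here's the method from above copied verbatim (so the snippet runs on its own) and driven over the numbers 1 to 10, with a counter of block runs. The four tests are arbitrary stand-ins for directory?, symlink? and size:

```ruby
module Enumerable
  # Same run_for_each_group as above.
  def run_for_each_group(expected=[], &blk)
    return if to_a.empty?
    xst, xsf = [], []
    each{|it|
      answers = expected.dup
      catch :item_tested do
        yield(it, self, proc{|v|
          if answers.empty?
            (v ? xst : xsf) << it
            throw :item_tested
          end
          answers.pop
        })
        return
      end
    }
    xst.run_for_each_group([true, *expected], &blk)
    xsf.run_for_each_group([false, *expected], &blk)
  end
end

block_runs = 0
groups = {}
(1..10).to_a.run_for_each_group{|x, xs, all|
  block_runs += 1
  if all[x % 5 == 0]
    groups[:fives] = xs
  elsif all[x.even?]
    groups[:evens] = xs
  elsif all[x > 6]
    groups[:big_odds] = xs
  else
    groups[:rest] = xs
  end
}
# groups[:fives] == [5, 10], groups[:evens] == [2, 4, 6, 8],
# groups[:big_odds] == [7, 9], groups[:rest] == [1, 3]
# block_runs == 26
```

The 26 runs break down as 2 multiples of 5 run once, 4 evens run twice, 4 remaining odd numbers run three times each (2 + 8 + 12 = 22), plus one action run for each of the 4 groups.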


So problem solved? Not really. I keep finding myself in situations where a new control structure would make a big difference, and there just doesn't seem to be any way of making it work in Ruby without enough boilerplate code to make it not worthwhile.

I'll end this post with some snippets of code which are just not quite right. Any ideas for making them suck less?

urls = Hash[file.map{|line| id, url = line.split; [id.to_i, url]}] 
 
each_event{|type, *args| 
  case type
  when :foo
    one, two = *args
    # ...
  when :bar
    one, = *args
    # ...
  end
}

if dir
  Dir.chdir(dir){ yield(x) }
else
  yield(x)
end

5 comments:

The Sum of All Beers said...

Hi,
this isn't related to the main thrust of your post, but if you are having hassle with Blogger respecting your code formatting, you might want to consider using the PrettyPrint.js library.

It's the lowest ceremony method for integrating code formatting into Blogger so far, and this has been a real bugbear for me.

I threw up some basic notes with links on my blog here:
http://richardconroy.blogspot.com/2010/07/howto-pretty-printing-code-snippets-in.html

It's far from perfect (performance can be sluggish) but the formatting is tidy.

taw said...

Thanks, I'll take a look at it.

rogerdpack said...

oh man I wish those were formatted more cleanly. Can you use gists perhaps?
Also maybe a short-circuit operator might help?
-roger-

taw said...

rogerdpack: Blogger is shit and destroys any indentation I use when I preview. Too bad I cannot change it to anything else easily now. Sorry.

Victor Piousbox said...

Not having macros is a very good thing. Macros are a source of obscure bugs. Ruby is already, arguably, too flexible in some contexts. Having macros in there would make some people (inexperienced devs in particular) go nuts.