The best kittens, technology, and video games blog in the world.

Sunday, June 28, 2015

Design patterns for Ruby file format converters

Baby tiger by Diego Cambiaso from flickr (CC-SA)

It's common to write scripts that convert files from one format to another. Everybody has written tons of utilities with an interface like water_to_wine_converter source.water target.wine or some such. That's the easy part. Pretty much every time, the immediate next step is: well, what if I want to convert a whole directory of those?

I keep running into this problem over and over, and the solutions I end up writing have converged on a fairly similar form every time, so I thought I'd write about it.

Well, first, we're going to require the pathname library. Once upon a time I used to just use Strings for file paths, but the more I use Pathname the more I like it. It lacks a bunch of methods I often need, and could definitely get better, but it's still a big improvement over using raw Strings.

I'll assume for simplicity we don't need any fancy command line argument processing, but if you do, it doesn't change the rest of the pattern.

require "pathname"

class WaterToWineConverter
  def initialize(input_path, output_path)
    @input_path  = input_path
    @output_path = output_path
  end

  # Actual code
end

unless ARGV.size == 2
  STDERR.puts "Converts water to wine format"
  STDERR.puts "Usage #{$0} deck.water deck.wine"
  STDERR.puts "   or #{$0} water_folder/ wine_folder/"
  exit 1
end

input_path  = Pathname(ARGV[0])
output_path = Pathname(ARGV[1])
WaterToWineConverter.new(input_path, output_path).run!

So far so good. Alternatively we could pass raw Strings to the constructor and convert them to Pathname there:

require "pathname"

class WaterToWineConverter
  def initialize(input_path, output_path)
    @input_path  = Pathname(input_path)
    @output_path = Pathname(output_path)
  end

  # Actual code
end

unless ARGV.size == 2
  STDERR.puts "Converts water to wine format"
  STDERR.puts "Usage #{$0} deck.water deck.wine"
  STDERR.puts "   or #{$0} water_folder/ wine_folder/"
  exit 1
end

WaterToWineConverter.new(ARGV[0], ARGV[1]).run!

You might even do both just for extra robustness: passing a Pathname object to the Pathname() constructor works just fine.
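A quick sketch of that last point, using a made-up path. The variable names are just for illustration:

```ruby
require "pathname"

# Pathname() accepts either a String or an existing Pathname.
from_string   = Pathname("water_folder/deck.water")
from_pathname = Pathname(from_string)

puts from_pathname.class           # Pathname
puts from_string == from_pathname  # true
```

So a constructor that wraps its arguments in Pathname() doesn't have to care which of the two the caller passed in.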

Use of Pathname is usually a matter of preference, but in this case it's part of the pattern.

Well, let's write the #run! method:

class WaterToWineConverter
  def run!
    if @input_path.directory?
      @input_path.find do |source_path|
        next if source_path.directory?
        target_path = map_path(source_path)
        next if target_path.exist?
        target_path.parent.mkpath
        convert!(source_path, target_path)
      end
    else
      convert!(@input_path, @output_path)
    end
  end
end


That's some nice code. If the input path is a file, we just call the convert! method.

If it's a directory, we use #find to walk every file in the input directory, use map_path to decide where each file goes, and create the folder for that file if it doesn't exist yet. target_path.parent.mkpath is an extremely common pattern that frees you from ever worrying about whether directories exist or not. Just do that before you open any file for writing and you're good to go.

In this example we decided to skip the file (next) if the target already exists. This is common if you're trying to synchronize two directories, let's say converting your .epubs to .mobis, and you don't want to redo the work. But we could just as well decide to overwrite, raise an exception, print a warning, or do whatever makes the most sense.

convert!(source_path, target_path) is just a straightforward method that doesn't need to care about any of that. In the directory case, it already knows that the target is safe to write, that the directory for the target has been created, and so on.

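Here's a small sketch of the mkpath-then-write and skip-if-exists ideas on their own, run inside a temporary directory. The write_unless_exists helper is hypothetical, made up for this demo, and not part of the converter:

```ruby
require "pathname"
require "tmpdir"

# Hypothetical helper: returns true if it wrote the file, false if it skipped.
def write_unless_exists(target_path, content)
  return false if target_path.exist?
  # mkpath creates every missing ancestor and does nothing if they already exist.
  target_path.parent.mkpath
  target_path.write(content)
  true
end

Dir.mktmpdir do |dir|
  target = Pathname(dir) + "wine_folder/red/deck.wine"
  p write_unless_exists(target, "merlot")   # true: folders created, file written
  p write_unless_exists(target, "vinegar")  # false: target already exists
  puts target.read                          # merlot
end
```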
Now the last remaining part of the pattern is to write the #map_path(path) method. If both source and target use the same extension, it's really simple thanks to the power of Pathname:

class WaterToWineConverter
  def map_path(path)
    @output_path + path.relative_path_from(@input_path)
  end
end
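To see what that one-liner does, here is the same computation spelled out step by step. The folder and file names are invented for the example:

```ruby
require "pathname"

input_path  = Pathname("water_folder")
output_path = Pathname("wine_folder")
source_path = Pathname("water_folder/reds/merlot.txt")

# Strip the input folder prefix, then graft the remainder onto the output folder.
relative    = source_path.relative_path_from(input_path)
target_path = output_path + relative

puts relative     # reds/merlot.txt
puts target_path  # wine_folder/reds/merlot.txt
```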

Unfortunately there's no such easy way if we need to change extension as well. I feel like they should add a few methods to Pathname, especially for file extension manipulation, but we'll avoid monkeypatching and do it the hard way.

Fortunately it's not too messy if we're only working with one extension type, and the somewhat ugly bit is encapsulated in one method:

class WaterToWineConverter
  def map_path(path)
    @output_path +
      path.relative_path_from(@input_path).dirname +
      "#{path.basename(".water")}.wine"
  end
end
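For what it's worth, Pathname does have a #sub_ext method that swaps the extension, so a shorter variant is possible. This is a sketch of an alternative, not the version used above, and the two behave differently for files that don't end in .water:

```ruby
require "pathname"

input_path  = Pathname("water_folder")
output_path = Pathname("wine_folder")
source_path = Pathname("water_folder/reds/merlot.water")

# sub_ext replaces whatever extension the path has with the given one.
target_path = output_path + source_path.relative_path_from(input_path).sub_ext(".wine")
puts target_path  # wine_folder/reds/merlot.wine

# Difference from the basename(".water") approach for a non-.water file:
other = Pathname("notes.txt")
puts other.sub_ext(".wine")              # notes.wine
puts "#{other.basename(".water")}.wine"  # notes.txt.wine
```

Which behavior you want depends on whether stray files in the input folder should keep their original extension in the output name.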


And that's it. It's fairly short, elegant (except for that extension changing part), and robust code that's easy to adapt to pretty much every converter's needs.
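Putting the pieces together, here is a sketch of the whole pattern that you can run against a temporary directory. The convert! body is a placeholder I made up for the demo (it just replaces the word "water" with "wine" in the file contents); a real converter would put its actual format logic there.

```ruby
require "pathname"
require "tmpdir"

class WaterToWineConverter
  def initialize(input_path, output_path)
    @input_path  = Pathname(input_path)
    @output_path = Pathname(output_path)
  end

  def run!
    if @input_path.directory?
      @input_path.find do |source_path|
        next if source_path.directory?
        target_path = map_path(source_path)
        next if target_path.exist?
        target_path.parent.mkpath
        convert!(source_path, target_path)
      end
    else
      convert!(@input_path, @output_path)
    end
  end

  private

  def map_path(path)
    @output_path +
      path.relative_path_from(@input_path).dirname +
      "#{path.basename(".water")}.wine"
  end

  # Placeholder conversion, an assumption for this demo only.
  def convert!(source_path, target_path)
    target_path.write(source_path.read.gsub("water", "wine"))
  end
end

Dir.mktmpdir do |dir|
  root = Pathname(dir)
  (root + "water_folder/reds").mkpath
  (root + "water_folder/reds/merlot.water").write("still water")

  WaterToWineConverter.new(root + "water_folder", root + "wine_folder").run!

  result = root + "wine_folder/reds/merlot.wine"
  puts result.read  # still wine
end
```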

2 comments:

Anonymous said...

Instead of using + use File.join.

taw said...

Anonymous: Never ever use File.join, forget it even exists. Pathname#+ is superior to methods like File.join in every way.