Nov 10, 2013

Today at RubyConf in Miami, David Copeland gave a great talk titled, “Eliminating branching, nil and attributes – let’s get weird.” It’s always fun to get weird, and I sat in on his talk to see what direction it would take.

David mentioned that it’s fun to see how we can implement basic constructs in a language by using basic building blocks, and I wholeheartedly agree. In fact, the exercise he posed is a great one in this case, since it leads us into the lambda calculus, a simple language that is actually just as powerful as languages like Ruby. Because Ruby supports lambdas, it’s beautifully simple (almost as much so as in Scheme or Clojure) to implement a construct like the standard ‘if’ statement. This kind of thing is often done as a learning exercise in a Lisp dialect, but I couldn’t find an example in Ruby, so I decided to translate it here.

The example that I wrote using lambdas in Ruby also takes advantage of currying: the idea that a function taking more than one argument can always be expressed as a series of one-argument functions, each returning a function that accepts the next argument. In other words, not only can we dispense with the ‘if’ statement, we can also get rid of functions that take more than one argument to further ‘simplify’ our programs.
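To make currying concrete, here is a minimal sketch in Ruby. The names `add`, `curried_add`, `auto_curried_add` and `increment` are made up for illustration and are not part of the program in this post; `Proc#curry` is a standard Ruby method.

```ruby
# An ordinary two-argument lambda.
add = ->(x, y) { x + y }

# The same function written by hand as nested one-argument lambdas.
curried_add = ->(x) { ->(y) { x + y } }

# Proc#curry performs the same transformation automatically.
auto_curried_add = add.curry

add.call(1, 2)                    # => 3
curried_add.call(1).call(2)       # => 3
auto_curried_add.call(1).call(2)  # => 3

# Supplying only the first argument yields a new one-argument function.
increment = curried_add.call(1)
increment.call(41)                # => 42
```

Supplying arguments one at a time is what lets the conditional in this post be built from nothing but one-argument lambdas.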

I’d encourage you to read up on this if these concepts or the following code fragment seem foreign. Without further ado, here is my attempt at if..then..else without the Ruby ‘if’ statement, along with MiniTest specs.

Note that this program will only work on Ruby 1.9 and above. For Ruby 1.8.7 and below you will have to change the ‘arrow’ (->) lambda syntax to a syntax compatible with those interpreters.
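As a rough illustration of the two syntaxes, the sketch below defines the same function both ways. The variable names `tru_arrow` and `tru_lambda` are hypothetical and exist only for this comparison.

```ruby
# Ruby 1.9+ "arrow" (stabby lambda) syntax, as used in this post.
tru_arrow = ->(x) { ->(_) { x } }

# The same function using the older `lambda` keyword with block parameters,
# which older interpreters also understand.
tru_lambda = lambda { |x| lambda { |_| x } }

tru_arrow.call('first').call('second')   # => "first"
tru_lambda.call('first').call('second')  # => "first"
```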

require 'minitest/spec'
require 'minitest/autorun'

# We first define lambdas for true and false expressions. 'tru' is a function
# taking one argument that returns a function accepting a second argument, and
# which returns the argument given to the outer function. 'fls' also takes one
# argument and returns a function taking a second argument, but it returns the
# argument given to the inner function.
# Note that we use a convention of indicating variables whose values are ignored
# with the '_' character.
tru = -> (x) { -> (_) { x } }
fls = -> (_) { -> (y) { y } }

# We now define a sequence of three nested lambdas corresponding to the
# conditional test followed by lambdas representing the true and the false
# "branches" in the conditional. Again this could have been represented as a
# single lambda accepting three arguments, but to keep the language as simple as
# possible and to demonstrate currying and higher-order functions we implement
# the conditional by a series of one-argument lambdas.
# For a true expression, you can think of this in terms of the following
# evaluation:
# if_then_else.call(tru).call('true expression').call('false expression')
# We call the if_then_else lambda with the expression that evaluates to either
# tru or fls. The next two invocations of 'call' are evaluated by the
# functions above (tru and fls), which return the value given to either the
# outer function or the inner function.
if_then_else = -> (p) { -> (a) { -> (b) { p.call(a).call(b) } } }

describe 'Implementing conditionals in the lambda calculus' do
  describe 'the tru (true) lambda expression' do
    it 'returns the argument given to the first function' do
      tru.call('first').call('second').must_equal 'first'
    end
  end

  describe 'the fls (false) lambda expression' do
    it 'returns the argument given to the second function' do
      fls.call('first').call('second').must_equal 'second'
    end
  end

  describe 'if..then..else using the lambda calculus' do
    it 'returns the value of the "then" branch if the conditional is true' do
      if_then_else.call(tru).call('true value').call('false value').
        must_equal 'true value'
    end

    it 'returns the value of the "else" branch if the conditional is false' do
      if_then_else.call(fls).call('true value').call('false value').
        must_equal 'false value'
    end
  end
end

This code is also available on GitHub.
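One difference from Ruby’s built-in ‘if’ is worth noting: when the branches are plain values, both are evaluated before either is selected. A possible workaround, sketched below, is to pass zero-argument lambdas as the branches and call whichever one is selected. The names `log`, `then_branch` and `else_branch` are hypothetical, and the three definitions at the top are repeated only so the fragment runs on its own.

```ruby
tru = ->(x) { ->(_) { x } }
fls = ->(_) { ->(y) { y } }
if_then_else = ->(p) { ->(a) { ->(b) { p.call(a).call(b) } } }

# Each branch records that it ran, so we can see which one was evaluated.
log = []
then_branch = -> { log << :then; 'true value' }
else_branch = -> { log << :else; 'false value' }

# The conditional selects one of the two lambdas; the final `call` runs it.
result = if_then_else.call(tru).call(then_branch).call(else_branch).call

result  # => "true value"
log     # => [:then]
```

Only the selected branch runs, which matters as soon as a branch has side effects or is expensive to compute.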

Nov 07, 2013

Update: Thanks to Edward Kmett, this issue was quickly fixed for those of you who depend on Bytes >= 0.13. Cereal was also updated. There is some interesting discussion on this issue on the Haskell Reddit page. In the end I learned about some of the inner workings of serialization in Haskell, and learned to appreciate the extremely helpful and responsive Haskell community that is strongly dedicated to making their ecosystem and tools better in response to feedback.

Recently I’ve been making some minor contributions to Elm, a fantastic functional language that compiles to JavaScript. If you haven’t already seen some of the things that you can create with Elm using a relatively small amount of code, you should spend some time checking out the project’s web site.

Last weekend I did some work on adding a bit more robustness to the Elm compiler’s mechanism of loading interface definitions of compiled programs (1, 2). I’m not going to go into details about the checks that I added since they’re probably not that interesting (hopefully they just work as intended and it makes it easier and more predictable for you to use the Elm compiler!), but I did come across some interesting behavior with regard to Haskell’s serialization of certain data types while I was hacking on this part of the code.

Basically, if you serialize a data structure such as a Hash/Map/Dictionary in a given language, and then re-read that data structure, you’d expect to get back something like the original dictionary, right? This may or may not be the case in Haskell. Here’s a simple example:

import qualified Data.Binary as B
import qualified Data.Map as M

myMap :: M.Map String String
myMap = B.decode b
    where m = [("one", "blue"), ("two", "red"), ("three", "green")]
          b = B.encode m

main = do
  putStrLn $ "The third key is: " ++ show (M.member "three" myMap)

On line 10 above we use M.member, which takes a key and returns a Boolean indicating whether that key is in the Map. If you’re new to Haskell you may expect this to return True. Look closely at the structure that is serialized on line 6, though. It’s a List of two-element tuples. We could have turned this into a Map with M.fromList, but we didn’t – the actual structure that we serialize is a List!

Now, when we decode the serialized structure on line 5, what do you think will happen? We don’t specify a type on that line, and if you look at the type of the function B.decode it’s as follows:

λ: :t B.decode
B.decode
  :: Binary a => Data.ByteString.Lazy.Internal.ByteString -> a

It takes a ByteString and returns an ‘a’ which, according to the type constraint on the function, should be an instance of the Binary typeclass. Readers of this post familiar with Haskell may notice that Haskell will try to return a Map, specifically one with keys and values of type String. This is because of the type signature on line 4.

If you’re still following, now it’s time for the real fun to begin. To summarize the process above, Haskell used type inference to figure out which type it should attempt to deserialize. You may think that you would get a runtime error when Haskell determines that a given ByteString can’t be decoded into a Map since it’s actually a serialized List, but you’d be wrong! Instead it returns a corrupt Map which doesn’t know about all of the keys it contains (for some reason related to the internal representation of the Map it sees some of them, though):

λ: M.member "one" myMap
λ: M.member "two" myMap
λ: M.member "three" myMap

When requested, Haskell happily deserializes into a Map with a broken internal representation without warning but with incorrect behavior. Interestingly, the Map can tell you it’s broken when asked (the function M.valid on the map returns False). In my case though the warning came too late, after I had already spent a bunch of time tracking down a bug in the program.

Although I haven’t spent much time in Haskell, it really surprised me how deserialization can lead to strangely broken data structures. In conjunction with the way that type inference is used to determine the target type to deserialize into, it led to an error that not only went uncaught by the compiler, but caused strange behavior at runtime as well. This is in contrast to pretty much all of my other experiences in Haskell, where I’ve seen the language and the libraries really be a great help in detecting trouble before even running the program.

In conclusion, if you’re using Haskell’s serialization on types like a Map (and perhaps others?), think carefully about the assumptions that type inference is making for you in terms of deserialization, and carefully check your work by hand (or better, with automated tests). Please let me know in the comments if others have had this experience, and if you think that serialization of data structures like Data.Map could be made safer by providing better defaults. Alternatively, let me know if there is something that could have helped me catch this error earlier, since I’m new to the language. Thanks!

Nov 06, 2013

We all know that responsible programming in a highly dynamic language like Ruby requires a significant amount of time spent writing tests. We also know that these tests don’t just serve to demonstrate correct operation of the program under certain conditions, they can also give rise to well-structured code [1]. Still, we want to make sure that the tests we’re writing add the most possible value to our program, rather than becoming a burden to the development process.

The “test pyramid” is one of the most useful concepts that I’ve come across to make sure that tests are helpful without slowing down your development. Basically, this concept says most of your automated tests should be in the form of isolated unit tests, and more integrative tests should be used sparingly. There are several reasons that this makes sense:

  1. Unit tests are fast to run, since they run without dependencies. In contrast, integration tests, and especially tests that drive a web UI, are relatively slow.
  2. Unit tests give you the most direct feedback about the source of problems. A single change in code could break many integration tests, and you would have to dig through layers of code to find out the single line that caused an error. In contrast an isolated unit test should indicate the cause of a bug within a few lines.

You may have some arguments against following such a practice. Won’t an emphasis on unit testing miss bugs that occur when components are integrated? It’s true that you do want some integration tests for high-value scenarios in your application. They make sure that components generally work together and don’t cause bugs. However, remember that any non-trivial program can’t be integration-tested in its entirety. If you tried to do that, your development process would drown under the weight of the tests, and you would probably still be missing coverage. In contrast, abundant, well-focused unit tests combined with a few high-level integration tests will cover the details of your code while still verifying that the pieces work together.
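As a small sketch of what an isolated unit test at the base of the pyramid might look like, consider the following. The `Invoice` class, the `FixedRate` fake and the numbers are all hypothetical, invented for this illustration. The point is only that the dependency is injected, so the test needs no database or web service and a failure points directly at one method.

```ruby
require 'minitest/autorun'

# Hypothetical class under test: totals billable hours at an hourly rate.
# The rate source is passed in so that a test can substitute its own.
class Invoice
  def initialize(rate_source)
    @rate_source = rate_source
  end

  def total(hours)
    hours * @rate_source.hourly_rate
  end
end

# Hand-rolled fake standing in for a slow dependency (database, web service).
class FixedRate
  def hourly_rate
    50
  end
end

class InvoiceTest < Minitest::Test
  def test_total_multiplies_hours_by_the_hourly_rate
    assert_equal 100, Invoice.new(FixedRate.new).total(2)
  end
end
```

An integration test for the same feature would exercise the real rate source and everything around it, which is why you want only a few of those.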


Aug 27, 2013

I’m currently in Quito, Ecuador training new developers joining the Stack Builders team. As we’re working on a practice project, a time billing system, I came across a really useful class that I hadn’t really used much in the past. Ruby’s Method class helped us to clean up a part of our code and I wanted to share how we used it.

In our system, we had a method that would either add or subtract time from a Date object. The problem is, you can’t just pass ‘+’ or ‘-’ to be applied as concisely as you would in more functional languages. For example, in Haskell (using the interactive shell, ghci) you can do the following to pass addition or subtraction to a function that combines two numbers:

λ> let modifyNumbers x y f = x `f` y
λ> modifyNumbers 3 4 (+)
7
λ> modifyNumbers 3 4 (-)
-1

What’s the closest way to approach this elegance in Ruby? I’ll cover a few different possibilities including Ruby’s `method` method.

Perhaps the most straightforward approach, for someone not used to a language with higher-order functions, is a simple conditional that decides which calculation to apply to a pair of numbers:

def modify_numbers(x, y, modification_type)
  if modification_type == :add
    x + y
  elsif modification_type == :subtract
    x - y
  end
end

modify_numbers(1, 2, :add)
=> 3

In Ruby, we can use higher-order functions – that is, we can write functions that take functions as arguments. Let’s take a look at this approach:

add = -> (x, y) { x + y }
subtract = -> (x, y) { x - y }

def compute_time(time_a, time_b, operation)
  operation.call(time_a, time_b)
end

compute_time(1, 2, add)
=> 3

Passing an anonymous block of code around may help to write more generic functions that you can compose in different ways later. But it still feels a bit kludgy to have to wrap a function inside another function just to pass it around. After all, this kind of thing comes for free in Haskell and Clojure! Instead of passing an anonymous function, you could just pass in a symbol representing the method to be invoked and use send:

def modify_numbers(x, y, modification_type)
  x.send(modification_type, y)
end

modify_numbers(1, 2, :+)
 => 3 

This is nice since you don’t have to wrap the method in a lambda in order to use it. But I think that we can do a bit better using Ruby’s ‘method’ method:

def modify_numbers(x, y, method_name)
  x.method(method_name).call(y)
end

modify_numbers(1, 2, :+)
=> 3

I like this approach the best since it avoids having to explicitly wrap the method in a lambda. You also get something that acts like a lambda (i.e., the instance of Method that ‘method’ returns responds to ‘call’ just like a lambda). This may help in refactoring later if you decide that you really need a higher-order function.
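To show what the Method object gives you, here is a short sketch. The helper name `shift_date` and the sample date are made up for illustration; `Object#method`, `Method#call` and `Method#to_proc` (used implicitly by `&`) are standard Ruby.

```ruby
require 'date'

# Object#method returns a Method object bound to its receiver.
plus_one = 1.method(:+)
plus_one.class    # => Method
plus_one.call(2)  # => 3

# A Method converts to a block with &, so it works wherever a block does.
[1, 2, 3].map(&10.method(:+))  # => [11, 12, 13]

# Hypothetical helper in the spirit of the Date example from this post:
# add or subtract a number of days depending on the method name passed in.
def shift_date(date, method_name, days)
  date.method(method_name).call(days)
end

start = Date.new(2013, 8, 27)
shift_date(start, :+, 3)  # => Date for 2013-08-30
shift_date(start, :-, 3)  # => Date for 2013-08-24
```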

After writing this, I’m curious how many other Rubyists have found a use for the Method class in their code. I know that I spent six years programming in Ruby without using it a single time. Let me know in the comments!