Update: Thanks to Edward Kmett, this issue was quickly fixed for those of you who depend on Bytes >= 0.13. Cereal was also updated. There is some interesting discussion on this issue on the Haskell Reddit page. In the end I learned about some of the inner workings of serialization in Haskell, and learned to appreciate the extremely helpful and responsive Haskell community that is strongly dedicated to making their ecosystem and tools better in response to feedback.
Last weekend I did some work on adding a bit more robustness to the Elm compiler’s mechanism of loading interface definitions of compiled programs (1, 2). I’m not going to go into details about the checks that I added since they’re probably not that interesting (hopefully they just work as intended and it makes it easier and more predictable for you to use the Elm compiler!), but I did come across some interesting behavior with regard to Haskell’s serialization of certain data types while I was hacking on this part of the code.
Basically, if you serialize a data structure such as a Hash/Map/Dictionary in a given language, and then re-read that data structure, you’d expect to get back something like the original dictionary, right? This may or may not be the case in Haskell. Here’s a simple example:
import Data.Binary as B import Data.Map as M myMap :: M.Map String String myMap = B.decode b where m = [("one", "blue"), ("two", "red"), ("three", "green")] b = B.encode m main = do putStrLn $ "The third key is: " ++ show (M.member "three" myMap)
On line 10 above we use M.member, which takes a key and returns a Boolean indicating if that key is in the Map. If you’re new at Haskell you may expect this to return True. Look closely at the structure that is serialized on line 6, though. It’s a List of two-element tuples. We could have turned this into a Map with M.fromList, but we didn’t – the actual structure that we serialize is a List!
Now, when we decode the serialized structure on line 5, what do you think will happen? We don’t specify a type on that line, and if you look at the type of the function B.decode it’s as follows:
λ: :t B.decode B.decode :: Binary a => Data.ByteString.Lazy.Internal.ByteString -> a
It takes a ByteString, and returns an ‘a’ which, according to the type constraint on the function should be something in the Binary typeclass. Readers of this post familiar with Haskell may notice that Haskell will try to return a Map, specifically with keys and values of type String. This is because of the type signature on line 4.
If you’re still following, now it’s time for the real fun to begin. To summarize the process above, Haskell used type inference to figure out what the type should be that it attempts to deserialize. You may think that you would get a runtime error when Haskell determines that a given ByteString can’t be decoded into a Map since it’s actually a serialized List, but you’d be wrong! Instead it returns a corrupt Map which doesn’t know about all of the keys it contains (for some reason related to the internal representation of the Map it sees some of them, though):
λ: M.member "one" myMap True λ: M.member "two" myMap True λ: M.member "three" myMap False
When requested, Haskell happily deserializes into a Map with a broken internal representation without warning but with incorrect behavior. Interestingly, the Map can tell you it’s broken when asked (the function M.valid on the map returns False). In my case though the warning came too late, after I had already spent a bunch of time tracking down a bug in the program.
Although I haven’t spent much time in Haskell, it really surprised me how deserialization can lead to strangely broken data structures. In conjunction with the way that type inference is used to determine the target class to which to deserialize, it lead to an error that not only went uncaught by the compiler, but caused strange behavior at runtime as well. This is in contrast to pretty much all of my other experiences in Haskell, where I’ve seen the language and the libraries really be a great help in detecting trouble before even running the program.
In conclusion, if you’re using Haskell’s serialization on types like a Map (and perhaps others?) think carefully about the assumptions that type inference is making for you in terms of deserialization, and carefully check your work by hand (or better, with automated tests). Please let me know in the comments if others have had this experience, and if you think that this is something that can be addressed to make serialization of data structures like Data.Map more safe through providing safer defaults. Alternatively, let me know if there is something that could have helped me catch this error earlier since I’m new in the language. Thanks!