# Transit in Perl

I’ve just released the first version of Data::Transit, an implementation of the transit format in Perl. This is an early release, so be warned that there are still a lot of sharp edges.

## Motivation

Despite decades of web frameworks and cool new languages, there’s still a large portion of the web built in Perl. It’s one of the original server-side scripting languages, and so we’ll be dealing with it for many years to come. One of Transit’s goals is to allow disparate languages to communicate with richer data; by building a Perl version, it becomes easier to build new subsystems in other languages.

Because I find that Perl, Ruby and even Python look extremely similar when you get past cosmetic differences, I studied those implementations heavily. I was able to avoid some pitfalls, but there was sufficient difference in Perl hashes to make it difficult to solve one of the problems in the same way.

Like many other languages, Perl uses 1 and 0 to represent true and false instead of having actual booleans. The Python implementation of the spec introduced special true and false types to avoid key collisions in hashes, but Perl can’t use that approach: only strings are allowed as hash keys, which limits my design choices.
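The collision those special types avoid is easy to demonstrate. Here is a small Python sketch; the `Boolean` wrapper is an invented stand-in for illustration, not the actual type from the Python implementation:

```python
# In Python, bool is a subclass of int and True == 1, so plain booleans
# collide with integers when used as dict keys.
d = {1: "int key", True: "bool key"}
assert len(d) == 1         # the two keys collapsed into one entry
assert d[1] == "bool key"  # the later value overwrote the earlier one

# A distinct wrapper type keeps the keys separate (invented stand-in for
# the kind of special boolean type described above):
class Boolean:
    def __init__(self, value):
        self.value = bool(value)

    def __eq__(self, other):
        return isinstance(other, Boolean) and self.value == other.value

    def __hash__(self):
        return hash(("Boolean", self.value))

d2 = {1: "int key", Boolean(True): "bool key"}
assert len(d2) == 2        # no collision: Boolean(True) != 1
```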

There are basically two options: accepting the possibility of key collisions or some sort of string-based encoding. While it’s tempting to do the latter for completeness, that sacrifices the feel of dealing with types native to the language. Thinking about the common usage path, it’s unlikely both true and 1 will be passed as keys to the same hash. I generally prefer not to complicate the interface to a library to serve the needs of the last 5%, so I ended up choosing the former.

The other complication unique to Perl is that many types usually included in a language’s standard library (Date, UUID, etc.) are provided as CPAN modules. If these types were part of the base package, those modules would have to be included as dependencies, exploding the dependencies pulled in for types that may never be used. As with hash keys, I chose the needs of the common path over a perfect implementation and left those portions out of the base implementation.

## Future Work

As mentioned in the beginning of the article, this is a pretty early release. More extensive testing is required before it can be taken out of alpha. Most notably, Cognitect has a list of Seattle user group meetings they use as an unofficial benchmark for the format, which needs to be run with some profiling to ensure reasonable performance.

In the design trade-offs, I discussed choosing not to pull in a bunch of CPAN modules that may never be used. Many people may want to use these packages, though, so there should be a way to supply a set of extension distributions that can be easily included if desired. This will be largely driven by feedback, though, as I’m not a fan of writing a bunch of code nobody will use.

# How to Study Design Patterns

One of the schisms in the development community is between devotees and detractors of the book Design Patterns. For those unfamiliar with the book, it curates several patterns that arise frequently in object-oriented design, describing when it is appropriate to use them.

As it turns out, there’s a similar rift in the game of Go around Joseki. In Toshiro Kageyama’s amazing book Lessons in the Fundamentals of Go, there is a chapter titled “How to Study Joseki”, which reconciles the two positions.

In this article, we will be looking at how these ideas can be applied to Design Patterns. We will also spend some time applying this approach to a pattern, to see how these ideas work in practice.

## How not to Use Design Patterns

Consider the greenhorn developer, having just read the book cover to cover. He sees a situation looking similar to the State pattern, and sets to work applying it. Let’s take a look at his code:
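As a hypothetical sketch (names invented, shown here in Python), the misapplication might look like this: a full State hierarchy wrapped around a state that is set once and never changes.

```python
from abc import ABC, abstractmethod

# Hypothetical illustration: the State machinery is all here, but the
# state is fixed at construction and never transitions, so the pattern
# only adds indirection.
class GreetingState(ABC):
    @abstractmethod
    def greet(self, name):
        ...

class FormalGreeting(GreetingState):
    def greet(self, name):
        return f"Good day, {name}."

class Greeter:
    def __init__(self):
        self._state = FormalGreeting()  # never reassigned anywhere

    def greet(self, name):
        return self._state.greet(name)
```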

This reveals a poor understanding of the pattern’s underlying purpose. The State pattern is intended to allow the class of the state object to change throughout the life of the containing object; otherwise it needlessly adds complexity. It would have been much better to do this instead:
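A hypothetical sketch of the simpler alternative in Python: when the behavior never changes at runtime, a plain method says the same thing with none of the ceremony.

```python
# With no state transitions in play, the indirection disappears.
class Greeter:
    def greet(self, name):
        return f"Good day, {name}."
```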

Situations like this are a source of despair for experienced developers, fostering the opinion that studying Design Patterns is misleading and damaging to designs. They may even mutter something about how nothing can replace experience.

## The Proper Way to Study Design Patterns

1. Don’t just read the pattern; a deeper understanding is required to apply it correctly.
2. Each pattern is intended to solve a specific problem. Examine every line of an idealized implementation and contemplate the implications of every conceivable variation.
3. Consider how surrounding context can influence the needs of the pattern. A variation isn’t necessarily invalid if external factors introduce additional needs.

Put another way, the 23 published Design Patterns are by no means a definitive list, and there is endless variation even within covered patterns. There are thousands of slight variations to these patterns in the world, with more being created daily. The only hope one has of putting them into practice is to have a deep understanding of their intent.

## Case Study: The Memento Pattern

Let’s look at the Memento Pattern to better understand this approach. Keep in mind that this is not a complete treatment of the pattern, but a sample to give you an idea. I encourage you to continue studying the pattern independently.

The first step in studying any pattern is reading about it. If you have the book, you should go through the relevant section now. Then let’s look at an idealized version of the pattern in Java:
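As a stand-in sketch of the idea (in Python rather than Java; Java’s private nested class is approximated with a leading-underscore name, and everything beyond the Originator and OriginatorMemento names is an assumption for illustration):

```python
class Originator:
    """Object whose internal state we want to snapshot and restore."""

    class _OriginatorMemento:
        # Stands in for a private nested class: outside code should
        # treat this type as opaque.
        def __init__(self, state):
            self._state = state

    def __init__(self, state):
        self._state = state

    def set_state(self, state):
        self._state = state

    def get_state(self):
        return self._state

    def save(self):
        # Capture the current state without exposing internals.
        return Originator._OriginatorMemento(self._state)

    def restore(self, memento):
        self._state = memento._state
```

A caretaker can hold the returned memento and hand it back later, without ever inspecting what’s inside it.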

### Varying Visibility

OriginatorMemento’s visibility is private to Originator, which is designed to hide the mechanics of Originator from the outside world. If we made the class public, how would that affect the pattern? For one thing, it would create a way to mutate Originator outside of its official interface. This could cause problems if the implementation needs to change, such as for performance or to add new features.

Now let’s think about the broader context: is there anything that could make us want to modify the visibility? Suppose we want to use Originator as a base class, with subclasses whose implementations vary enough that the original state no longer makes sense. This seems like a reasonable scenario for switching the visibility to protected.

Is there a broader context in which making OriginatorMemento public is a good idea? My gut reaction is no, as a crucial piece of the pattern is the ability to hide internal implementation. If that state can be modified outside of the Originator, we might as well do this instead:
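If the memento’s contents are open to the world, the hiding is gone, and the design collapses to exposing the state directly. A hypothetical sketch:

```python
class Originator:
    # With the internals exposed, callers can snapshot and restore the
    # state by hand, and a separate memento class earns nothing.
    def __init__(self, state):
        self.state = state

o = Originator("initial")
snapshot = o.state   # the "memento" is just the raw state
o.state = "changed"
o.state = snapshot   # restore by plain assignment
```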

# Simple Collision Detection

I started looking at collision detection for a side project I’ve been tinkering with lately. The basic problem I needed to solve was: given a bunch of dots of uniform size on a plane, how do you quickly determine whether any of them are touching or overlapping?

Obviously there’s the brute force solution of comparing every dot with every other dot, but an $O(n^2)$ solution will fall over when the number of dots starts scaling up. It really feels like a hash table should be possible, but that kind of lookup only works well on discrete values, not the continuous ranges we’re dealing with here.

If it were possible to convert the coordinates into some small set of discrete values, we could use a hash table. To do this, imagine dividing the plane up into equally sized squares. Within the hash table, we could maintain, for each square, the set of all dots that at least partially overlap it.

Now, instead of having to compare against every dot on the plane, the problem has been reduced to just those candidates within one of these squares. If we contrive the squares to have sides matching the diameter of the dots, then in the worst case the dot can exist in four squares.

Now, obviously there can be any number of dots within these squares, but I was trying to prevent dots from overlapping, which means there can be at most four dots in each square. So there is a constant bound on the number of dots to compare against, and so the problem has been reduced to $O(n)$.
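The whole scheme fits in a few lines. Here is a Python sketch under the assumptions above (uniform diameter, cell size equal to the diameter):

```python
from collections import defaultdict
from math import floor, hypot

def find_collisions(dots, diameter):
    """Return pairs of dots (as (x, y) centers) that touch or overlap.

    Grid approach: cells are squares with sides equal to the dot
    diameter, so each dot overlaps at most four cells and we only
    compare against dots sharing a cell.
    """
    r = diameter / 2
    grid = defaultdict(list)  # (col, row) -> dots overlapping that cell
    collisions = []
    for x, y in dots:
        # Cells touched by the dot's bounding box (at most 2x2 = 4).
        cells = {(floor((x + dx) / diameter), floor((y + dy) / diameter))
                 for dx in (-r, r) for dy in (-r, r)}
        # Candidates are dots already registered in any shared cell.
        candidates = {other for c in cells for other in grid[c]}
        for ox, oy in candidates:
            if hypot(x - ox, y - oy) <= diameter:  # touching or overlapping
                collisions.append(((x, y), (ox, oy)))
        for c in cells:
            grid[c].append((x, y))
    return collisions
```

Because each insertion inspects only a bounded number of candidate cells, processing n dots takes O(n) expected time overall.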

# Alternative clojure.test Integration With test.check

I’ve enjoyed using test.check lately, but its integration with clojure.test doesn’t fit well with how I want to use it. In this post we’re going to explore a different approach, which I’m going to try to get into test.chuck.

## The Problem

To demonstrate my difficulty, consider these basic tests in clojure.test:

Instead of checking hard-coded integers, I would prefer test.check to do its magic. To do this using defspec (the existing integration option), I would have to do something like this:

There’s a lot of boilerplate here, and it’s not communicating that these two properties are related. Alternatively, I could join everything into one big condition:

This is also unsatisfying, as I wouldn’t be able to tell easily what failed.

Setting aside that issue for now, let’s look at the output:

Unlike deftest, defspec prints a report even when it passes. If I start using properties liberally my test output will quickly get too noisy.

The transition path is also difficult. Look at the original deftest code and notice that moving to defspec feels like a complete rewrite, as opposed to an upgrade from hard-coded to generated values.

Also, consider the case where we’re performing multiple assertions stepping through a stateful piece of code.

Moving this to defspec can be tricky:

## The Alternative

What I’d really like to say is exactly the same thing as the original deftest code, with just a little bit of variation for the generated values:

Notice how simple it is to move from testing to checking. With defspec the code screams generative testing and mentions the assertions as an aside. With checking, generative testing becomes the aside and the assertions become the focus.

## Naive Implementation

A first pass on checking looks something like this:

This works because the is macro completely bypasses the tc/quick-check construct. While this gets us limping along, there are a couple problems.

First, let’s force a failure:

Now look at the output for this failure:

We’re seeing a failure for every attempt when tc/quick-check tries to narrow down to the smallest failure. All we really want to know about is the result of this search.

The other problem is that tc/quick-check only traces when the final sexp is false:

Which becomes obvious in the test output:

## Intercepting Reporting

We’re seeing the failure, but getting a lot of noise as well. What we really need here is an alternative test-reporting framework that can be nested within the checking macro. Fortunately, clojure.test is designed to replace that framework by overriding the report multimethod. Let’s look at what one of these reports looks like:

So the condition quick-check is looking for is actually whether :type is not :pass. If we could catch that, investigations would work correctly:

Output:

## Tracking Reports

This guarantees that the investigation happens, but we’ve lost the individual assertions in the process. What we need is a way to get at only those reports which were generated in the final failing execution. There’s not a mechanism for passing information out of a failure, so we’ll have to use a closure with some state to simulate the effect:

And the output:

## Including Results

So things are functioning correctly, and the failures are easy to read. In more complex scenarios the value may have been transformed heavily before the assertion is made, in which case we want to present the tc/quick-check return value.

Output:

## Cleaning Up

I’m happy with how things are reported now, but the code is a pretty big mess. Decomposing the logic should help with that:

## Number of Tests

One final touch is that the number of tests really shouldn’t be hard-coded. This adds to the checking footprint slightly, but that seems worth it to simplify our ability to control the number of tests being run.

## Update

The macro has been accepted to test.chuck in release 0.1.12. Gary Fredericks helped me work out some concurrency problems with the above implementation.

First, with-redefs rebinds things globally, which means that we can end up saving to the wrong atom. It’s much better to use binding:

Second, using reset! in save-to-final-reports causes a race condition between checking the condition and assigning to the atom. To get around this we can rearrange the arguments of save-to-final-reports and call swap! instead:

# Clojure Routing Libraries

There are a lot of different routing libraries available for Clojure, and it’s hard to understand which one best fits your needs. What I really wanted when I was researching them was an article comparing them.

There are a lot of libraries in the Clojure Toolbox, so to limit this article I’m only going to talk about libraries that have had a commit within the past 6 months.

# Ring

First, I need to point out that everything’s a layer on top of Ring. As such, I’m going to establish some basic terminology:

- Request: a Clojure map representing the HTTP request.
- Response: a Clojure map representing the HTTP response.
- Handler: a function translating requests into responses.
- Middleware: a function that wraps around a handler.

I’m not going to replicate Ring’s documentation, but keep in mind that all of these frameworks are fundamentally just functions and maps.

As a simple example, suppose you want your server to respond to GET requests for /foo and /bar. It would look something like this:
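In spirit, a bare handler is just a function from a request map to a response map that dispatches by hand. A hypothetical sketch (in Python for illustration, with keys mirroring Ring’s):

```python
# Hypothetical analogue of a bare handler: a function from a request
# map to a response map, conditioning on method and path by hand.
def handler(request):
    method, uri = request["request-method"], request["uri"]
    if method == "get" and uri == "/foo":
        return {"status": 200, "body": "foo!"}
    if method == "get" and uri == "/bar":
        return {"status": 200, "body": "bar!"}
    return {"status": 404, "body": "not found"}
```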

# Compojure

As you can see in the previous example, a common pattern in Ring handlers is to condition on the :request-method and :uri. Compojure standardizes this process, allowing you to instead say this:

# Pedestal

Instead of using functions to define routing, Pedestal prefers a data structure called a routing table. It’s still built on top of Ring, but ends up looking a little different.

This will actually end up being expanded into a much more complicated data structure, but you get the idea. I haven’t had a huge amount of exposure to Pedestal, but a general principle seems to be to prefer data to functions wherever possible.

# Twixt

One of the cool things about how Ring is designed is that it allows for a fairly arbitrary amount of layering in the middleware. Twixt is an asset pipeline (CSS, JavaScript, etc.) that’s designed to complement other routing by intercepting requests to /assets/.

# Conclusion

When I started writing this article I thought the landscape was larger, but actually there are only a few active projects. I think Compojure is more broadly used, but Pedestal has a lot of power I don’t understand yet.

# Shadowing at Zipcar

For my second shadowing experience, I followed Steve Vance at Zipcar. Steve is an agile coach, so my experience was very different from my time at Brightcove.

# Release Day

I found out when I arrived that it was release day. One of the first things I noticed was what a non-event the release was. Coming from an atmosphere where releases happen on a Friday night and everybody involved has to work a Saturday, weekly releases with little impact on the workday were a refreshing change.

# Scrum

I’ve participated in scrum many times over the years, and almost every time it has devolved into status updates for the manager. I know this isn’t how scrum is supposed to work, but that’s how it’s always worked in practice.

Steve did a great job of avoiding this problem, and I think one of the key factors was that he didn’t put individuals on the spot. Instead of circling the group asking each person the infamous three questions, he went through the work in progress. I feel this better honors the idea of Respect from XP, and I look forward to trying this idea with my own teams.

## Multiple Scrums

Another interesting component was that, instead of a single scrum at some point during the day where all three questions are asked, the team I was observing had two scrums. In the morning they talked about what they were going to do during the day, and in the afternoon they discussed what they had done.

This helped solve a major timing problem I’ve seen with scrum over the years. If it’s placed at the beginning of the day, people have a hard time remembering what they did the previous day. If it’s placed at the end of the day, people haven’t had a chance to fully process the day’s major ideas.

That being said, I’m not completely sold on the idea. I find recurring meetings pretty frustrating, and it might be that interrupting my day a second time would trigger this reaction.

# Conclusion

I had a really good time at Zipcar, and I really appreciate the opportunity. I learned a great deal from seeing how a well-functioning agile team behaves, and I can’t wait to apply many of these ideas to my own teams.

# Shadowing at Brightcove

Recently I had the chance to shadow Zach Shaw at Brightcove. This was a lot of fun and I really appreciate the opportunity.

## Cloud Fluency

The first thing that struck me was how central a role cloud-based tools played. I’ve been aware of the plethora of services provided by companies like Amazon and Rackspace for a while, but I hadn’t given them a serious look. This area has been popping up much more for me recently, so I’m trying to figure out some hobby system to build with them. Expect more blog posts in this area.

## Code Review

One of the activities I worked on with Zach was a code review. I’ve done a lot of code reviews, but I’ve never watched anybody else as they’ve gone through the process. There were a couple things I found interesting about Zach’s approach.

At the beginning of the code review, Zach gave me an overview of the context, and then he listed his objectives for the review. I’ve always dived right into walking through the code, so I found this approach much more disciplined.

As we went through the changes, occasionally one of us would say something along the lines of “It would be better if…” Normally when I have those thoughts I add them to a laundry list of notes to give back to the original developer. This has always been a really awkward exchange, and there’s usually a feeling of annoyance that I nitpicked enough to say something needed to be renamed. It never occurred to me that I could just change the code.

## Conclusion

As I mentioned before, this was a lot of fun and I really appreciate it. Thanks to Brightcove and Zach for giving me the opportunity.

If you’re interested in letting me shadow, drop me an email at shadowing@colinwilliams.name.

# Refactoring to Something More Expressive

Another fun tidbit from going through SICP.

## Exercise 2.2

Consider the problem of representing line segments in a plane. Each segment is represented as a pair of points: a starting point and an ending point. Define a constructor make-segment and selectors start-segment and end-segment that define the representation of segments in terms of points. Furthermore, a point can be represented as a pair of numbers: the x coordinate and the y coordinate. Accordingly, specify a constructor make-point and selectors x-point and y-point that define this representation. Finally, using your selectors and constructors, define a procedure midpoint-segment that takes a line segment as argument and returns its midpoint (the point whose coordinates are the average of the coordinates of the endpoints).
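The exercise is stated in Scheme, but the shape carries over directly. Here is a sketch in Python of the constructors, selectors, and midpoint-segment:

```python
# Points and segments as plain pairs, with the selectors from the
# exercise, plus a named average helper for the midpoint computation.
def make_point(x, y):
    return (x, y)

def x_point(p):
    return p[0]

def y_point(p):
    return p[1]

def make_segment(start, end):
    return (start, end)

def start_segment(s):
    return s[0]

def end_segment(s):
    return s[1]

def average(a, b):
    return (a + b) / 2

def midpoint_segment(s):
    start, end = start_segment(s), end_segment(s)
    return make_point(average(x_point(start), x_point(end)),
                      average(y_point(start), y_point(end)))
```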

## First Approach

My first approach was to use the constructors they described, modernizing the data structures slightly to make it easier to understand.

With these data structures, make-point and make-segment aren’t incredibly useful. I won’t reference them again; I ended up just defining the data structures directly and deleting the constructors.

The first duplication I eliminated was between the two averages, as the only thing that changed was the axis.

Better, but I’m not sure things got easier to read, and there’s still that duplicated structure between extracting the start and end points.

Something that will make this easier to understand is naming the averaging concept with a function.

That’s a little easier to understand, but still working at a pretty low level. One thing about Clojure is that it has so few data types that there’s often a higher-level concept provided by the language.

## Summary

I was amazed at how clear and expressive the final representation was. The intermediate refactorings helped me see what I was trying to do from a higher level, and this ultimately gave me the insight that I was actually merging the points together with an average.

# Types of Tests

Testing has become a really overloaded term, and I think this creates barriers to writing or changing tests. There was all that activity earlier this year following DHH’s indictment of TDD, including a really interesting conversation between DHH, Kent Beck and Martin Fowler.

What inspired this post, though, was Beck’s snarky response to DHH before the conversation. In it he lists eight separate bullet points on what he gets out of TDD, which is asking a lot of a few lines of code. A major theme in software design is that each piece of code should only try to do one thing; we even apply that in testing by encouraging one assertion per test. Somehow, though, the idea was lost when we started conflating objectives.

I suggest we have a clear separation between the purposes each test is trying to serve. This isn’t a radically new idea; acceptance and unit tests were distinct entities in Extreme Programming Explained, but I think we should take this separation even further.

## Acceptance Tests

As I mentioned before, this is the piece that is already treated somewhat separately. Unfortunately, for a lot of shops, these can be indistinguishable from other kinds of tests. When I work in codebases like this, I become afraid to delete superfluous tests on the off chance they’re enforcing some important piece of business logic.

I’ve also found that this kind of test works best when passing through as high a level as possible. If it happens at a lower level, I become resistant to refactorings that might change that portion dramatically.

## Documentation Tests

Here’s where I’m going to depart from the norm. I’m completely behind having documentation be executable code, enforcing that it never gets out of date, but making them subordinate to other tests is just awkward.

When I’m trying to understand a new system, the last thing I want to do is pore over a bunch of low level tests to get an idea of what’s going on. We may be able to make our tests read like natural language, but that does nothing for how they’re organized.

If tests serve as documentation, they should stand apart. They should start with high level concepts and defer details until later. Keeping the information up-to-date automatically is important, but not at the cost of organizing it in a coherent way.

## Comprehension Tests

Another place I often write tests is when I’m trying to understand a piece of code. I tend to think of this kind of test as completely expendable. It’s ok to leave these tests around; think of them as your notes on the system. As soon as they get in the way, though, get rid of them: the system has changed too much for the notes to remain accurate.

## Tools

Some tools can be used to enforce this separation, even if that wasn’t their original intent.

- Cucumber is a good option for acceptance tests, pushing down the implementation and emphasizing high-level concerns.
- Something like RSpec could be useful for documentation tests, as it is designed to be more expressive.
- Any tool will work for comprehension tests; they’re all about gaining understanding quickly, so use whatever is fastest.

# Shadowing

Over the next couple of months, I will be shadowing developers at various companies. This is primarily a journey of self-improvement; I want to see how others work to learn more about what makes a great programmer.

## Complexity

One of the biggest challenges we face as programmers is understanding complexity. There is a lot of effort and literature around how to avoid creating it, but we still have to tackle complex code on a daily basis.

What I’m interested in is the process people go through to deal with this challenge. Our community is filled with really smart people, and I suspect that we’ve each solved this problem in different ways.

I have my own process, but I want to see these other approaches in action. I want to look into a tangled mess with someone and see it slowly start to make sense.

## Tools

I like Emacs. No, this isn’t the start of a flame war; I’ve just become incredibly proficient with this tool. There are a lot of other tools out there, and I’d like to see them used by someone who knows what they’re doing. Not just editors, either: anything that could have a profound impact on how I work.

## What To Expect

This is largely up to the person being shadowed. I can be an outside observer or participate more like a pair programming session. I can ask a bunch of questions or keep my mouth shut all day. The key thing is I don’t want to get in the way, and I’m happy to lend a hand if asked.

There are a few things I’d like to include, but are by no means necessary:

- Start the day with an overview of what to expect
- End the day with a chat about what happened
- Write about it here on my blog

## Sign Me Up

If this sounds interesting to you, drop me a line at shadowing@colinwilliams.name. If you don’t think your company would be interested, I’d be just as happy shadowing you on an open source project.