I’ve just released the first version of Data::Transit, an implementation of the Transit format in Perl. This is an early release, so be warned that there are still a lot of sharp edges.
Despite decades of web frameworks and cool new languages, there’s still a large portion of the web built in Perl. It’s one of the original server-side scripting languages, so we’ll be dealing with it for many years to come. One of Transit’s goals is to allow disparate languages to communicate with richer data; with a Perl version available, it becomes easier to build new subsystems in other languages.
Because I find that Perl, Ruby and even Python look extremely similar once you get past cosmetic differences, I studied those implementations heavily. I was able to avoid some pitfalls, but Perl hashes differ enough that one of the problems couldn’t be solved the same way.
Like many other languages, Perl uses 1 and 0 to represent true and false instead of having real booleans. The Python implementation of the spec introduced special true and false types to avoid key collisions in hashes, but Perl can’t use that approach: hash keys can only be strings, which limits my design choices.
There are basically two options: accepting the possibility of key collisions or some sort of string-based encoding. While it’s tempting to do the latter for completeness, that sacrifices the feel of dealing with types native to the language. Thinking about the common usage path, it’s unlikely both true and 1 will be passed as keys to the same hash. I generally prefer not to complicate the interface to a library to serve the needs of the last 5%, so I ended up choosing the former.
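The collision the first option accepts is easy to demonstrate outside Perl; Python happens to have the same problem, since True and 1 compare and hash as equal:

```python
# True == 1 and hash(True) == hash(1), so they collide as dict keys --
# the same kind of collision Perl's string-only hash keys force.
d = {}
d[True] = "boolean"
d[1] = "number"        # silently replaces the value stored under True
print(len(d))          # prints: 1
print(d[True])         # prints: number
```

Under the common-path assumption, a caller rarely mixes true and 1 as keys in the same hash, so in practice the collision stays theoretical.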
The other complication unique to Perl is that many types usually built into a language (dates, UUIDs, etc.) are provided as CPAN modules. If those types were part of the base package, the modules would have to be included as dependencies, exploding what the package pulls in for types that may never be used. As with hash keys, I chose the needs of the common path over a perfect implementation and left those portions out of the base implementation.
As mentioned in the beginning of the article, this is a pretty early release. More extensive testing is required before it can be taken out of alpha. Most notably, Cognitect has a list of Seattle user group meetings they use as an unofficial benchmark for the format, which needs to be run with some profiling to ensure reasonable performance.
In the design trade-offs, I discussed choosing not to pull in a bunch of CPAN modules that may never be used. Many people may want to use these packages, though, so there should be a way to supply a set of extension distributions that can be easily included if desired. This will be largely driven by feedback, though, as I’m not a fan of writing a bunch of code nobody will use.
One of the schisms in the development community is between devotees and detractors of the book Design Patterns. For those unfamiliar with the book, it curates several patterns that arise frequently in OOD, describing when it is appropriate to use them.
This reveals a poor understanding of the pattern’s underlying purpose. The State pattern is intended to allow the state’s concrete class to change throughout the life of the object; otherwise it needlessly adds complexity. It would have been much better to do this instead:
Situations like this are a source of despair for experienced developers, leading to the opinion that studying Design Patterns is misleading and damaging to designs. They may even mutter something about how nothing can replace experience.
The Proper Way to Study Design Patterns
Don’t just read the pattern; a deeper understanding is required to apply it correctly.
Each pattern is intended to solve a specific problem. Examine every line of an idealized implementation and contemplate the implications of every conceivable variation.
Consider how surrounding context can influence the needs of the pattern. A variation isn’t necessarily invalid if external factors introduce additional needs.
Put another way, the 23 published Design Patterns are by no means a definitive list, and there is endless variation even within covered patterns. There are thousands of slight variations to these patterns in the world, with more being created daily. The only hope one has of putting them into practice is to have a deep understanding of their intent.
Case Study: The Memento Pattern
Let’s look at the Memento Pattern to better understand this approach. Keep in mind that this is not a complete treatment of the pattern, but a sample to give you an idea. I encourage you to continue studying the pattern independently.
The first step in studying any pattern is reading about it. If you have the book, you should go through the relevant section now. Then let’s look at an idealized version of the pattern in Java:
OriginatorMemento’s visibility is private to Originator, which is designed to hide the mechanics of Originator from the outside world. If we made the class public, how would that affect the pattern? For one thing, it would create a way to mutate Originator outside of its official interface. This could cause problems if the implementation needs to change, such as for performance or new features.
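To keep the discussion concrete, here is a minimal sketch of that shape. It’s in Python rather than the book’s Java, a leading underscore stands in for private visibility, and all the names are illustrative:

```python
class Originator:
    def __init__(self, state):
        self._state = state

    class _Memento:
        # Stands in for a private OriginatorMemento: by convention,
        # nothing outside Originator constructs or inspects one.
        def __init__(self, state):
            self._state = state

    def set_state(self, state):
        self._state = state

    def get_state(self):
        return self._state

    def save(self):
        # Snapshot the internal state without exposing its representation.
        return Originator._Memento(self._state)

    def restore(self, memento):
        self._state = memento._state

origin = Originator("initial")
snapshot = origin.save()
origin.set_state("changed")
origin.restore(snapshot)   # back to "initial"
```

The outside world holds the memento as an opaque token; only Originator can look inside it.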
Now let’s think about the broader context: is there anything that could make us want to modify the visibility? Suppose we want to use Originator as a base class, with subclasses whose implementations vary enough that the original state no longer makes sense. This seems like a reasonable scenario for switching the visibility to protected.
Is there a broader context in which making OriginatorMemento public is a good idea? My gut reaction is no, as a crucial piece of the pattern is the ability to hide internal implementation. If that state can be modified outside of the Originator, we might as well do this instead:
I started looking at collision detection for a side project I’ve been tinkering with lately. The basic problem I needed to solve was: given a bunch of dots of uniform size on a plane, how do you quickly determine whether any of them are touching or overlapping?
Obviously there’s the brute-force solution of comparing every dot with every other dot, but an $O(n^2)$ solution will fall over as the number of dots scales up. It really feels like a hash table should help, but that kind of lookup only works well on discrete values, not the continuous coordinates we’re dealing with here.
If we could convert the coordinates into some small set of discrete values, we could use a hash table. To do this, imagine dividing the plane into equally sized squares. Within the hash table, we maintain, for each square, the set of dots at least partially covered by it.
Now, instead of having to compare against every dot on the plane, the problem is reduced to just the candidates within one of these squares. If we contrive the squares to have sides matching the diameter of the dots, then in the worst case a dot can overlap four squares.
Obviously any number of dots could lie within one of these squares in general, but I was trying to prevent dots from overlapping, which means there can be at most four dots in each square. With a constant bound on the number of comparisons per dot, the problem is reduced to $O(n)$.
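Here’s a minimal sketch of the scheme in Python (names and structure are my own, not the project’s actual code): the grid is a hash table from cell coordinates to the dots touching that cell.

```python
from collections import defaultdict

def grid_cells(x, y, d):
    # A dot of diameter d has a d-by-d bounding box, which can overlap at
    # most four d-by-d grid cells; the box's corners identify all of them.
    r = d / 2
    return {(int(cx // d), int(cy // d))
            for cx in (x - r, x + r)
            for cy in (y - r, y + r)}

def find_collisions(dots, d):
    # Spatial hash: register each dot under the cells it touches, and only
    # compare it against dots already registered in those same cells.
    grid = defaultdict(list)
    hits = set()
    for i, (x, y) in enumerate(dots):
        candidates = set()
        for cell in grid_cells(x, y, d):
            candidates.update(grid[cell])
            grid[cell].append(i)
        for j in candidates:
            ox, oy = dots[j]
            if (x - ox) ** 2 + (y - oy) ** 2 < d * d:
                hits.add((j, i))
    return hits
```

Each dot is checked against only the handful of dots sharing its cells, so the total work stays linear in the number of dots.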
I’ve enjoyed using test.check lately, but its integration with clojure.test doesn’t fit with how I want to use it very well. In this post we’re going to explore a different approach, which I’m going to try to get into test.chuck.
To demonstrate my difficulty, consider these basic tests in clojure.test:
Notice how simple it is to move from testing to checking. With defspec the code screams generative testing and mentions the assertions as an aside. With checking, generative testing becomes the aside and the assertions become the focus.
A first pass on checking looks something like this:
We’re seeing the failure, but getting a lot of noise as well. What we really need here is an alternative test-reporting framework that can be nested within the checking macro. Fortunately, clojure.test is designed to let you replace that framework by overriding the report multimethod. Let’s look at what one of these reports looks like:
This guarantees that the investigation happens, but we’ve lost the individual assertions in the process. What we need is a way to get at only those reports which were generated in the final failing execution. There’s not a mechanism for passing information out of a failure, so we’ll have to use a closure with some state to simulate the effect:
So things are functioning correctly, and the failures are easy to read. In more complex scenarios the value may have been transformed heavily before the assertion is made, in which case we want to present the tc/quick-check return value.
One final touch is that the number of tests really shouldn’t be hard-coded. This adds to the checking footprint slightly, but that seems worth it to simplify our ability to control the number of tests being run.
Second, using reset! in save-to-final-reports causes a race condition between checking the condition and assigning to the atom. To get around this we can rearrange the arguments of save-to-final-reports and call swap! instead:
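To illustrate the difference outside Clojure, here’s a rough Python model of atom semantics (the Atom class and the report shapes are hypothetical stand-ins, not the actual test.chuck code). With swap, the condition check and the update happen in one atomic step:

```python
import threading

class Atom:
    """Minimal stand-in for a Clojure atom, using a lock for atomicity."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def reset(self, value):
        # Like reset!: a blind assignment. Any "check, then reset" sequence
        # in the caller is a check-then-act race between threads.
        with self._lock:
            self._value = value

    def swap(self, fn, *args):
        # Like swap!: read, transform, and write happen under one lock,
        # so the update function always sees a consistent current value.
        with self._lock:
            self._value = fn(self._value, *args)

    def deref(self):
        with self._lock:
            return self._value

def save_to_final_reports(reports, report):
    # Pure update function handed to swap: the condition lives inside
    # the atomic step instead of racing ahead of the assignment.
    if report.get("type") == "fail":
        return reports + [report]
    return reports

final_reports = Atom([])
final_reports.swap(save_to_final_reports, {"type": "fail"})
final_reports.swap(save_to_final_reports, {"type": "pass"})
```

Putting the atom’s current value first in the argument list is exactly what makes the function usable with swap.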
There are a lot of different routing libraries available for Clojure, and it’s hard to understand which one best fits your needs. What I really wanted when I was researching them was an article comparing them.
There are a lot of libraries in the Clojure Toolbox, so to limit this article I’m only going to talk about libraries that have had a commit within the past 6 months.
First, I need to point out that everything’s a layer on top of Ring. As such, I’m going to establish some basic terminology:
* Request: a Clojure map representing the HTTP request.
* Response: a Clojure map representing the HTTP response.
* Handler: a function translating a request into a response.
* Middleware: a function that wraps a handler.
I’m not going to replicate Ring’s documentation, but keep in mind that all of these frameworks are fundamentally just functions and maps.
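Ring itself is Clojure, but the functions-and-maps model is easy to sketch in another language. Here’s a hypothetical Python rendering, where a handler maps a request dict to a response dict and middleware wraps one handler in another:

```python
def hello_handler(request):
    # Handler: request map in, response map out.
    name = request.get("name", "world")
    return {"status": 200, "headers": {}, "body": "Hello, " + name}

def wrap_content_type(handler, content_type):
    # Middleware: takes a handler and returns a new handler that
    # post-processes the response before returning it.
    def wrapped(request):
        response = handler(request)
        response["headers"]["Content-Type"] = content_type
        return response
    return wrapped

app = wrap_content_type(hello_handler, "text/plain")
response = app({"uri": "/hello", "name": "Ring"})
# response is {"status": 200,
#              "headers": {"Content-Type": "text/plain"},
#              "body": "Hello, Ring"}
```

Everything a routing library does sits on top of this: routing is just a handler that picks another handler based on the request map.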
As a simple example, suppose you want your server to respond to GET requests for /foo and /bar. It would look something like this:
This will actually end up being expanded into a much more complicated data structure, but you get the idea. I haven’t had a huge amount of exposure to Pedestal, but a general principle seems to be to prefer data to functions wherever possible.
When I started writing this article I thought the landscape was larger, but actually there are only a few active projects. I think Compojure is more broadly used, but Pedestal has a lot of power I don’t understand yet.
For my second shadowing experience, I followed Steve Vance at Zipcar. Steve is an agile coach, so my experience was very different from that at Brightcove.
I found out when I arrived that it was release day. One of the first things I noticed was how much of a non-event the release was. Coming from an atmosphere where releases happen on a Friday night and everybody involved has to work a Saturday, I found weekly releases that barely affect the workday a refreshing change.
I’ve participated in scrum many times over the years, and almost every time it has devolved into status updates for the manager. I know this isn’t how scrum is supposed to work, but that’s how it’s always worked in practice.
Steve did a great job of avoiding this problem, and I think one of the key factors was that he didn’t put individuals on the spot. Instead of circling the group asking each person the infamous three questions, he went through the work in progress. I feel this better honors the idea of Respect from XP, and I look forward to trying it with my own teams.
Another interesting component was that, instead of a single scrum at some point during the day where all three questions are asked, the team I was observing had two scrums. In the morning they talked about what they were going to do during the day, and in the afternoon they discussed what they had done.
This helped solve a major timing problem I’ve seen with scrum over the years. If it’s placed at the beginning of the day, people have a hard time remembering what they did the previous day. If it’s placed at the end of the day, people don’t have a chance to fully process the major ideas from that day.
That being said, I’m not completely sold on the idea. I find recurring meetings pretty frustrating, and it might be that interrupting my day a second time would trigger this reaction.
I had a really good time at Zipcar, and I appreciate the opportunity. I learned a great deal from seeing how a well-functioning agile team behaves, and I can’t wait to apply many of these ideas to my own teams.
Recently I had the chance to shadow Zach Shaw at Brightcove. This was a lot of fun and I really appreciate the opportunity.
The first thing that struck me was how central a role cloud-based tools played. I’ve been aware of the plethora of services provided by companies like Amazon and Rackspace for a while, but I hadn’t given them a serious look. This area has been coming up much more for me recently, so I’m trying to figure out some hobby system to build with them. Expect more blog posts in this area.
One of the activities I worked on with Zach was a code review. I’ve done a lot of code reviews, but I’ve never watched anybody else go through the process. There were a couple of things I found interesting about Zach’s approach.
At the beginning of the code review, Zach gave me an overview of the context, and then he listed his objectives for the review. I’ve always dived right into walking through the code, so I found this much more disciplined.
As we went through the changes, occasionally one of us would say something along the lines of “It would be better if…” Normally when I have those thoughts, I add them to a laundry list of notes to hand back to the original developer. That has always been a really awkward exchange, and there’s usually a feeling of annoyance that I nitpicked enough to say something needed to be renamed. It never occurred to me that I could just change the code.
As I mentioned before, this was a lot of fun and I really appreciate it. Thanks to Brightcove and Zach for giving me the opportunity.
Consider the problem of representing line segments in a plane. Each segment is represented as a pair of points: a starting point and an ending point. Define a constructor make-segment and selectors start-segment and end-segment that define the representation of segments in terms of points. Furthermore, a point can be represented as a pair of numbers: the x coordinate and the y coordinate. Accordingly, specify a constructor make-point and selectors x-point and y-point that define this representation. Finally, using your selectors and constructors, define a procedure midpoint-segment that takes a line segment as argument and returns its midpoint (the point whose coordinates are the average of the coordinates of the endpoints).
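A direct sketch of the constructors and selectors the exercise asks for might look like this (in Python rather than Scheme, with tuples standing in for pairs):

```python
# Points: pairs of coordinates.
def make_point(x, y):
    return (x, y)

def x_point(p):
    return p[0]

def y_point(p):
    return p[1]

# Segments: pairs of points.
def make_segment(start, end):
    return (start, end)

def start_segment(s):
    return s[0]

def end_segment(s):
    return s[1]

def midpoint_segment(s):
    # Average the coordinates of the two endpoints.
    p1, p2 = start_segment(s), end_segment(s)
    return make_point((x_point(p1) + x_point(p2)) / 2,
                      (y_point(p1) + y_point(p2)) / 2)

mid = midpoint_segment(make_segment(make_point(0, 0), make_point(4, 2)))
# mid is (2.0, 1.0)
```

The point of the exercise is that midpoint-segment only touches points through the constructors and selectors, never the underlying pairs.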
My first approach was to use the constructors they described, modernizing the data structures slightly to make it easier to understand.
That’s a little easier to understand, but it’s still working at a pretty low level. One nice thing about Clojure is that, with so few data types, the language often provides a higher-level concept.
I was amazed at how clear and expressive the final representation was. The intermediate refactorings helped me see what I was trying to do from a higher level, and this ultimately gave me the insight that I was actually merging the points together with an average.
What inspired this post, though, was Beck’s snarky response to DHH before the conversation. In it he lists eight separate bullet points on what he gets out of TDD, which is asking a lot of a few lines of code. A major theme in software design is that each piece of code should try to do only one thing; we even apply it in testing by encouraging one assertion per test. Somehow, though, that idea was lost when we started conflating objectives.
I suggest we have a clear separation between the purposes each test is trying to serve. This isn’t a radically new idea; acceptance and unit tests were distinct entities in Extreme Programming Explained, but I think we should take this separation even further.
As I mentioned before, this is the piece that is already treated somewhat separately. Unfortunately, for a lot of shops, these can be indistinguishable from other kinds of tests. When I work in codebases like this, I become afraid to delete superfluous tests on the off chance they’re enforcing some important piece of business logic.
I’ve also found that this kind of test works best when passing through as high a level as possible. If it happens at a lower level, I become resistant to refactorings that might change that portion dramatically.
Here’s where I’m going to depart from the norm. I’m completely behind having documentation be executable code, enforcing that it never gets out of date, but making them subordinate to other tests is just awkward.
When I’m trying to understand a new system, the last thing I want to do is pore over a bunch of low level tests to get an idea of what’s going on. We may be able to make our tests read like natural language, but that does nothing for how they’re organized.
If tests serve as documentation, they should stand apart. They should start with high level concepts and defer details until later. Keeping the information up-to-date automatically is important, but not at the cost of organizing it in a coherent way.
Another place I often write tests is when I’m trying to understand a piece of code. I tend to think of this kind of test as completely expendable. It’s fine to leave these tests around; think of them as your notes on the system. As soon as they get in the way, though, get rid of them: the system has changed too much for the notes to remain accurate.
Some tools can be used to enforce this separation, even if that wasn’t their original intent.
- Cucumber is a good option for acceptance tests, pushing down the implementation and emphasizing high level concerns.
- Something like rspec could be useful for documentation tests, as it is designed to be more expressive.
- Any tool will work for comprehension tests; they’re all about gaining understanding quickly, so use whatever is fastest.
Over the next couple of months, I will be shadowing developers at various companies. This is primarily a journey of self-improvement: I want to see how others work to learn more about what makes a great programmer.
One of the biggest challenges we face as programmers is understanding complexity. There is a lot of effort and literature around how to avoid creating it, but we still have to tackle complex code on a daily basis.
What I’m interested in is the process people go through to deal with this challenge. Our community is filled with really smart people, and I suspect that we’ve each solved this problem in different ways.
I have my own process, but I want to see these other approaches in action. I want to look into a tangled mess with someone and see it slowly start to make sense.
I like emacs. No, this isn’t the start of a flame war, but I’ve become incredibly proficient with this tool. There are a lot of other tools out there, and I’d like to see them used by someone who knows what they’re doing. Not just editors, either, anything that could have a profound impact on how I work.
What To Expect
This is largely up to the person being shadowed. I can be an outside observer or participate more like a pair programming session. I can ask a bunch of questions or keep my mouth shut all day. The key thing is I don’t want to get in the way, and I’m happy to lend a hand if asked.
There are a few things I’d like to include, though they are by no means necessary:

- Start the day with an overview of what to expect
- End the day with a chat about what happened
- Write about it here on my blog
Sign Me Up
If this sounds interesting to you, drop me a line at email@example.com. If you don’t think your company would be interested, I’d be just as happy shadowing you on an open source project.