20 common mistakes when doing Test-Driven Development

These are based on my own experience as a developer and what I've heard and seen from other people working with me.

(All the notes are from my talk TDD - Making sure everything works, at the Agile Transformation Summit 2015. You can download the full deck in PDF here).

  1. Thinking that your code is flawless: Quit thinking that you are above the rest because chances are that you are not: your code is as bad as mine. We all need help, and TDD is a great way to get it.

  2. Thinking that it's all about the tests: It's also about design. It's about confidence. It's also about documentation. In a way, tests are a very nice byproduct of the process.

  3. Asking for permission: Why in the world would you consider asking for permission to do your job well in the first place? And if you are asking, it means that there's a different viable alternative, right?

  4. Not having your entire team on board: Your team should be able to speak your language and give you support whenever needed. During TDD, you want to keep everyone engaged and contributing.

  5. Pursuing a specific code coverage: Code coverage is just about the quantity, not the quality. It doesn't tell how good your tests are. Are they testing the right thing? Are they easy to read and maintain?

  6. Failing to be consistent: An abandoned test suite is worse than not having anything at all. It's useless code that sits there needing maintenance and gathering dust every single day.

  7. Not running your tests frequently enough: If you are not, something is really wrong with your TDD process. Your test suite is your thermometer, telling you how you are doing every step of the way.

  8. Failing to define proper conventions: You need solid naming, structural, and architectural conventions to keep the entire team speaking the same language throughout the project.

  9. Testing other people's code: By definition, you can't do this. Splitting this process is like having one person drinking a beer, and another person puking when the former gets drunk.

  10. Writing code without having a failing test: This is probably the main problem you'll face. It's a habit that will take you a long time to break, but eventually the right way will become second nature.

  11. Testing code that’s already written: Sometimes the code is already there, and if so, why do you need to write tests for it? I'd rather not write tests unless I have to modify the original code.

  12. Writing a test before all other tests are passing: Go one step at a time. You can't concentrate on multiple failing tests at the same time. Being methodical will pay off big.

  13. Refactoring before all tests are passing: Don't change code without having working tests. Tests are the only way you have to know whether your changes were successful.

  14. Writing code unrelated to the failing test: If the code is not covered by the failing test, nothing can tell you whether the code works as intended. Unrelated code is untested code.

  15. Testing more than one thing at a time: A test testing more than one thing is a complicated test. Keep tests as simple as possible. If you have too many assertions, you'll have to debug when the test fails.

  16. Writing slow tests: If your tests are slow, you aren't going to run them frequently enough. I'd rather have 99 fast tests that I can run every minute than 99 fast tests plus 1 slow test that never get run.

  17. Writing overlapping tests: Avoid multiple tests testing the same thing. The clue is getting multiple tests failing with every error. This makes the feedback from your test suite hard to interpret.

  18. Testing trivial code: Only test what could possibly fail and forget about everything else. By testing trivial code, you get distracted from testing the code that really matters.

  19. Creating dependencies between tests: It makes it impossible to run tests individually or in a different order. A problem in one test will cause other tests to fail.

  20. Writing tests with external dependencies: This usually produces slow tests. If dependencies break or cease to exist, your tests will stop working. This makes your test suite very fragile and unstable.
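To make a few of the points above concrete (one thing per test, no dependencies between tests, no external dependencies), here is a minimal sketch in Python. Everything in it is made up for illustration: `FakeScoreService` is a hypothetical hand-written stand-in for a remote service, and `total_score` is a hypothetical function under test.

```python
class FakeScoreService:
    """Hypothetical in-memory stand-in for a slow or unreliable remote service."""

    def __init__(self, scores):
        self._scores = scores

    def score_for(self, player):
        return self._scores[player]


def total_score(service, players):
    """Hypothetical code under test: add up the score of every player."""
    return sum(service.score_for(player) for player in players)


def test_total_score_adds_each_players_score():
    # The test builds its own fake, so it doesn't touch the network
    # and doesn't depend on state left behind by any other test.
    service = FakeScoreService({"ana": 10, "bob": 5})
    assert total_score(service, ["ana", "bob"]) == 15


def test_total_score_of_no_players_is_zero():
    service = FakeScoreService({})
    assert total_score(service, []) == 0
```

Each test checks a single behavior and can run alone or in any order, and both should finish in well under a second.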

I'm sure you have seen more. I'd love to hear about them.

My first public speaking engagement was great!

Yesterday I was honored to give a presentation about Test-Driven Development at the Agile Transformation Summit 2015 (ATS 2015) that took place at Nova Southeastern University in Fort Lauderdale, FL.

This was my first public speaking engagement. I'm very happy (and surprised) with how well it came out, and I hope all the people who were there had as much fun as I did.

(The room was packed beyond what I expected. I'm so glad to see how many people are interested in TDD.)

The major highlight was having James Grenning listening to my talk and nodding throughout it. (James is one of the original signers of the Agile Manifesto, and considered by many one of the fathers of TDD.) When the talk was over, he came over and not only gave me a lot of great feedback, but mentioned how much he enjoyed my presentation, which was extremely gratifying. (And then tweeted about it, which was also great.)

Here is the link to download my presentation.

One of my pet peeves is to look at presentations that don't say much without the speaker, so I took special care to make this one "readable" even if you didn't listen to my talk.

If you want a quick summary of the presentation without downloading the PDF, I also wrote a post about it.

Solution to the position of the element

Here is my solution to last Sunday's programming challenge about finding the position of an element in an array (You can read the challenge here.)

The handwritten notes I took when thinking about this problem:

As you probably guessed, this problem is a whole lot like a regular search algorithm. A simple loop to find the element and return the position of it solves it (or return the position of the first element greater than our target.) I decided to also check for the first and last element of the array to cover those cases separately.

I called this simple loop solution "sequential". You can see the Java code in this Gist (method sequential).

The only problem with this sequential solution is that it doesn't scale well for large arrays: in the worst case we have to visit every single element in the array, so this solution has O(n) complexity.
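A minimal sketch of the sequential idea, written in Python rather than the Java of the Gist (so this is not the Gist's code, and the function name is just borrowed from it). Unlike my solution, it doesn't special-case the first and last elements; the loop covers them.

```python
def sequential(values, target):
    """Return the index of target in the sorted list, or the index
    where it would be inserted to keep the list sorted."""
    for index, value in enumerate(values):
        # The first element that is not smaller than the target is either
        # the target itself or the element it would be inserted before.
        if value >= target:
            return index
    # Every element is smaller, so the target belongs at the end.
    return len(values)
```

With the examples from the challenge, `sequential([1, 3, 5, 6], 5)` returns 2 and `sequential([1, 3, 5, 6], 7)` returns 4.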

When I published the problem, I asked to think about this case:

In order to make the problem a little bit more complicated, I'd ask you to think about an algorithm that can scale well to sufficiently big arrays.

Since the array is sorted, we can look into a binary search algorithm to find our solution. (Remember that binary search has an O(log n) complexity, which outperforms O(n).)

The method "binary" in this Gist contains this solution. Notice that this is exactly a regular binary search with the only difference that I'm returning the left bound of the array in case the target is not found (instead of returning -1.)
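Here is a sketch of that idea in Python (again, not the Java code from the Gist, just the same technique): a regular binary search whose only twist is the final `return left`.

```python
def binary(values, target):
    """Binary search over a sorted list without duplicates.
    Returns the index of target, or the insertion index if it is absent."""
    left, right = 0, len(values) - 1
    while left <= right:
        middle = (left + right) // 2
        if values[middle] == target:
            return middle
        if values[middle] < target:
            left = middle + 1   # target can only be to the right
        else:
            right = middle - 1  # target can only be to the left
    # Not found: left now points at the first element greater than
    # the target, which is where the target would be inserted.
    return left
```

For a sorted list without duplicates, Python's standard library function `bisect.bisect_left` returns the same index, so it makes a handy cross-check.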

I tried hard to provide the best solution I could in the time I had, but I might very well be wrong (or you might have a better way of solving the problem.) If that's the case, please let me know and I'll update the post.

In case you are interested, here is the challenge from last week: Programming challenge: merging overlapping intervals.

Have you reached all your potential as a programmer?

This is not about what you know, but about how fast you can apply what you already know.

Think for a second about how you usually try to solve a particular problem. To oversimplify this, I'm going to divide people into two different groups:

  1. The first group are those people who start from the beginning, and step by step work on their problem until they reach a solution. When they do, they are done. (And everything works, and it also looks good.)

  2. The second group are those people who care most about reaching the end as soon as possible (thus not paying attention to the details along the way.) After they find the right path, they come back and clean everything that has to be cleaned. (And then everything works as well, and it also looks good.)

(Just to be clear, I'm assuming both groups reach exactly the same place at the end. Not only are their solutions the same, but there's also no difference in the quality of their programs.)

Granted, I think most people are in (have tried) both categories. I'm also not sure whether one of these is better than the other every single time.

But I try to be in the second group whenever I can (meaning that I prefer the second approach.) Here is what this means:

  • When I'm trying to solve a problem, I don't care about anything other than finding the solution (the feedback I need to know that I'm done.)

  • From my coding, I remove every detail that gets in the way of speed. Remember, my goal is to move fast to get feedback as soon as possible.

  • I heavily use mocks, hardcoded values, ugly code, and anything that I need to move on and get to the end.

  • I usually have to ask anyone pair programming with me to be patient: people freak out when they see the trail of debt I leave when I'm moving fast.

  • When I find the light at the end of the tunnel, I come back cleaning: mocks, hardcoded values, ugly code, and every compromise I made gets fixed.

(Some people would tell you that this approach is called "prototyping". It might be, but I don't like to call it that way: the word "prototype" forms a picture of an incomplete product in my mind.)

Why do I like this approach anyway?

Because I've seen how fast it is.

I've worked with people who know it all, and the only difference between them has been how fast they can get from point A to point B: the time it takes to get things right when you aren't yet sure those are the things you need adds up against you really quickly. Being messy while you find answers, and then taking the time to clean up your mess (the mess that really matters), comes out ahead most of the time.

Is this for you?

Seems so obvious and simple, but I promise it's very powerful!

I've taught this approach to several colleagues before. They've tried it and now they can't go back. They feel they work faster now. I definitely know they do.

The key is to constantly evaluate whether you are ready to clean things up or keep moving along. If you've proven your point, start cleaning. If you haven't, don't worry about the details yet.

If you find yourself in the first group most of the time, try this for a change. You might not like it, and that's fine, but who knows? There might be something there for you.

Programming challenge: the position of the element

Let's try something simple this week (because it's Memorial Day week. I promise to raise the bar next week):

Given a sorted array and a target value, return the index if the target is found. If not, return the index where it would be if it were inserted in order. There won't be duplicate values in the array.

For example:

  • [1, 3, 5, 6] with target value 5 should return 2.
  • [1, 3, 5, 6] with target value 2 should return 1.
  • [1, 3, 5, 6] with target value 7 should return 4.
  • [1, 3, 5, 6] with target value 0 should return 0.

In order to make the problem a little bit more complicated, I'd ask you to think about an algorithm that can scale well to sufficiently big arrays.

As always, feel free to tweet me your solutions and ideas. I will publish mine next Wednesday (and will update this post with a link to it.) Here is my solution to this problem.

In case you are interested, here is the challenge from last week: Programming challenge: merging overlapping intervals.

Solution to merging overlapping intervals

Here is my solution to last Sunday's programming challenge about merging overlapping intervals (You can read the challenge here.)

First of all here are the notes I took while solving the problem, just in case someone finds them interesting:

Comparing each interval with all the other intervals is too inefficient, and I didn't want to do that. Instead, I sorted all the intervals: [1,3] [2,6] [8,10] [7,11] turns into [1,3] [2,6] [7,11] [8,10] (notice how I'm sorting only by the first value of the interval.)

Why sort? Because now I don't have to compare each interval with all the rest, since overlapping intervals end up next to each other.

From here, I can start comparing each interval with its neighbor, and merge them if they overlap (my code just prints the merged interval to the console.) Since the list of intervals is sorted, if an interval doesn't overlap with its neighbor, I don't need to keep comparing the same interval anymore (I can move on to the next one.)
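The sort-then-sweep idea can be sketched in a few lines of Python. This is not my Java code: it returns the merged intervals in a list instead of printing them, and it assumes that intervals which merely touch (like [1, 3] and [3, 5]) count as overlapping.

```python
def merge_intervals(intervals):
    """Merge overlapping [start, end] intervals and return them sorted by start."""
    merged = []
    # Sort only by the first value, so candidates to merge become neighbors.
    for start, end in sorted(intervals, key=lambda interval: interval[0]):
        if merged and start <= merged[-1][1]:
            # Overlaps the last merged interval: extend it if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # No overlap: this interval starts a new merged interval.
            merged.append([start, end])
    return merged


print(merge_intervals([[1, 3], [2, 6], [8, 10], [7, 11]]))  # [[1, 6], [7, 11]]
```

Sorting dominates the cost, so the whole thing is O(n log n) instead of the O(n²) of comparing every pair.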

Here is my Java code to solve the problem.

I tried hard to provide the best solution I could in the time I had, but I might very well be wrong (or you might have a better way of solving the problem.) If that's the case, please let me know and I'll update the post.

An interesting part of this exercise is the data structure to store all the intervals. In my solution I used a regular matrix (where each row represents an interval, and there are only 2 columns.) This is fine since I'm not merging the intervals in memory (but printing them instead.) If I wanted to store the merged intervals, I would have needed a different structure for sure (resizing arrays in Java is a pain in the neck.)

In case you are interested, here is the challenge from last week: Programming challenge: rotating a matrix 90 degrees in place. And here is the next challenge in the list: Programming challenge: the position of the element.

The Google App Engine Pipeline API

This is how the documentation describes it:

The Google App Engine Pipeline API connects together complex workflows (including human tasks). The goals are flexibility, workflow reuse, and testability.

In order to properly illustrate what that means, I've been trying to find a suitable example to write a blog post about it. I've used the API in 3 different projects, and none of them seem like a good candidate for the post, so I made up a totally fictitious example instead. (Please, bear with me because I'll stretch it all I can to fit my purpose.)

A totally made up example

Let's imagine you need to compute the total number of points accumulated by several players across multiple games. All we care about is the total score.

Each game has several levels, and each player played one or more of these levels, accumulating a certain number of points on each one of them. We have access to a REST API that, given a specific level and a player, returns the accumulated score.

Of course, so far this is an ordinary, non-interesting problem, but let me try to make it a little bit more complicated:

  • Let's assume that the REST API call takes approximately 2 seconds to return the player's score in a specific level
  • Let's assume we have approximately 1,000 games
  • Let's assume each game has approximately 10 levels
  • Let's assume each level is played by approximately 1,000 players

Assuming we are going to put together some regular for-loops to do all the work synchronously, computing the total score will take approximately 231 days.

(What?!?!)

  • 1,000 games * 10 levels * 1,000 players * 2 seconds = 20,000,000 seconds
  • 20,000,000 seconds = 231.48 days
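If you want to play with the assumptions, the same arithmetic fits in a few lines of Python (the constant names are mine):

```python
GAMES = 1_000
LEVELS_PER_GAME = 10
PLAYERS_PER_LEVEL = 1_000
SECONDS_PER_CALL = 2

# One REST call per (game, level, player), all executed one after another.
total_seconds = GAMES * LEVELS_PER_GAME * PLAYERS_PER_LEVEL * SECONDS_PER_CALL
total_days = total_seconds / (24 * 60 * 60)

print(f"{total_seconds:,} seconds = {total_days:.2f} days")
# 20,000,000 seconds = 231.48 days
```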

(Remember I'm forcing these crazy assumptions to have an example that fits my purpose here. Don't get mad at me because the REST API takes too damn long to return a value.)

What's the right approach?

If you are a regular Google App Engine user, you know about all the deadlines here and there that force developers to be very careful with how we spend our limited time. Backends are exempt from deadline exceptions but 231 days is still too long.

The first obvious thing we need to do is to somehow parallelize our solution. Unfortunately this is not always a simple task: coordinating all the computations for games, levels, and players can get really messy whenever we try to break our problem into multiple, asynchronous threads.
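To give an idea of what "parallelize" means here, this is a minimal sketch using a plain thread pool from Python's standard library. It is not the Pipeline API and it wouldn't survive App Engine's deadlines; `fetch_score` is a hypothetical stand-in for the slow REST call, and the worker count is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def fetch_score(game, level, player):
    """Hypothetical stand-in for the slow REST call; always returns 1 point."""
    time.sleep(0.01)  # pretend to wait for the network
    return 1


def total_score(games, levels, players, workers=50):
    """Fan out one call per (game, level, player) and add up the results."""
    calls = product(games, levels, players)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda call: fetch_score(*call), calls))


print(total_score(range(2), range(3), range(4)))  # 24
```

This only covers the easy part. Retries, partial failures, deadlines, and spreading the work across more than one machine are exactly where things get messy.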

So yes, even though we know what the right approach is, we are probably hours away from the final solution.

The App Engine Pipeline API (finally!)

Here is one of the cases where the Pipeline API is very useful.

The Pipeline API uses the Task Queue in the background (so no backends necessary), and the beauty of it is that it takes care of all the plumbing required to coordinate the entire workflow (creating tasks, executing them, waiting for other tasks to complete, dependencies between tasks, etc.)

For our purposes, with minimal code we can set up a workflow to retrieve all the data we need, and behind the scenes the API will go nuts working and coordinating tasks (but of course, we don't care about that).

So in a nutshell, the Pipeline API will let you create a very sophisticated parallelized solution to a problem like this. (For some reason I always picture a cop in the middle of traffic making sure the flow is as efficient as possible.)

Semi-functional code for this problem

It's not completely functional because we haven't talked about all the needed structures (classes) and we made a bunch of raw assumptions.

You can find the Python code in this Gist.

Approximately 30 lines of code (not counting the comments) to solve the problem. (I'm sure it will take more code to fill all the gaps, but it shouldn't be a ton more.) Pretty impressive for an otherwise complicated problem.

Aren't you going to explain the code?

I could, but in this case I believe the documentation does a great job. Check the following links:

In case you are looking to learn about Git and GitHub

(The following is part of a documentation repository we are creating internally at my company to help onboard new team members on different technical areas.)

Git is a distributed revision control system with an emphasis on speed, data integrity, and support for distributed, non-linear workflows. Git was initially designed and developed by Linus Torvalds for the development of the Linux kernel, and has since become the most widely adopted version control system for software development.

  • The Official Git Site - Offers documentation for all Git commands as well as binary downloads of Git for all supported platforms.

  • Git Tutorial - A step by step Git tutorial covering from the very beginning to more advanced concepts.

  • Git Interactive Tutorial - This is an amazing interactive tutorial where you'll learn Git by actually using it in a simulated web terminal window.

  • Pro Git - This is a book, free and online. It's great whenever you want to use it as a reference or as a starting guide to learn Git.

  • Comparing Workflows - You can use different workflows to collaborate with your team using Git. This page covers some of the most popular workflows: Centralized, Feature Branch, Gitflow, and Forking Workflow.

GitHub

GitHub is a web-based Git repository hosting service, which offers all of the distributed revision control and source code management functionality of Git as well as adding its own features. Unlike Git, which is strictly a command-line tool, GitHub provides a web-based graphical interface and desktop as well as mobile integration. It also provides access control and several collaboration features such as wikis, task management, and bug tracking and feature requests for every project.

Programming challenge: merging overlapping intervals

Here is the problem for this week:

Given a collection of intervals, write a function that merges all overlapping intervals and prints them out.

For example, given [1, 3], [2, 6], [8, 10], and [7, 11], the function should print [1, 6], [7, 11]. Or given [5, 12], and [8, 10] the function should print [5, 12].

You can assume that the first element of each interval is always less than or equal to the second element of the interval.

The solution will come later

Last week I posted the solution at the same time I published the challenge (and in the same post.) Some people asked me to do it separately so they have some time to think and produce a solution by themselves.

Fair enough.

Later in the week (probably Thursday) I'll publish my solution in a separate post and link to it from here. In the meantime, feel free to tweet me your solutions.

Update: Here is my solution to this problem.

In case you are interested, here is the challenge from last week: Programming challenge: rotating a matrix 90 degrees in place. And here is the next challenge in the list: Programming challenge: the position of the element.

What could go wrong with pair programming?

I believe pair programming is great. I've been doing it for more than 10 years and I firmly believe that it's helped me become a better engineer. (Night after night coding with somebody better than me paid off for sure.)

But of course, we tend to find things that we like, and exploit them to the point where they hurt more than help. Pair programming is no exception, and I've seen the bad side of it whenever it's abused (or misused.)

The good about pair programming

I just can't talk crap about pair programming without first saying that I love it. I've experienced first hand most of its benefits: I've watched fairly junior developers turn into rock stars by consistently pairing with more experienced teammates. I've seen features get developed by two people in probably a quarter of the time that it would have taken one person to finish them. I've seen the effect on code quality, communication, and design.

I've seen the good about pair programming, and that's why I consider it an indispensable tool for most development teams out there.

But this post is about the bad and the ugly

This is not about the good though. This post is about the bad and the ugly.

Pair programming is not a silver bullet (I don't think we've discovered one yet.) If not executed correctly, it can negatively affect your teams and your projects.

My goal with this post is to share with you several aspects that you should keep in mind when introducing pair programming to your teams. Of course, what hasn't worked for me might well work perfectly fine for you, so take into account everything below but feel free to explore and decide for yourself.

Uneven pairs

Two junior developers doing pair programming will definitely make a lot of progress. Two senior developers will make magic stuff happen. But what about a very junior person with a rock star?

Some people say the junior will be able to absorb a lot of knowledge, and this is probably true. But is this what you want to get out of a pair programming session? In this equation, what are the benefits for your rock star? And what are the benefits for your product?

Big egos

We developers have big egos. (I personally don't like people telling me how to do my job. I'm that stupid.) Sometimes, pairing two developers together can turn into a fight in no time.

It's just that neither of them wants to lose an argument. And they will have plenty of those during a pair programming session.

Enemies

Pairs have to like each other.

Think about it for a second: you are asking two individuals to work together sharing a keyboard and a display. But they would rather be 10 miles apart.

(This doesn't happen every day. But I've seen it. More than once.)

Fork in the road

Should we go left? Should we go right?

Opinionated people working together are likely to get stuck on every branch along the way (when usually any answer would be just fine.) If you also add the ego factor to this equation, things can get out of control.

(By the way, I think that having teammates with different opinions working together is a very healthy way to foster creativity, as long as they are willing to make the right concessions when the time comes.)

Scrutiny

Certain people are not comfortable being scrutinized.

Guess what? Pair programming is all about having another person looking at every single keystroke. Pairs will see their raw thoughts as they come out of their minds and get reflected in code.

People who don't like this will tend to hide and overthink everything they type. Or they would rather watch. More and more, they will become a pair too passive to be effective.

Time waste

There's a lot of thinking involved in developing software. Papers, sticky notes, and whiteboards are very common tools when developing software.

You can always share with your pair, but before that happens you need to mature your thoughts. More often than not you need a little bit of time alone before you are ready to move forward.

Is this the right time to have your pair weirdly looking at you while you think? Isn't there anything better for him to do?

Divide and conquer

Programming is not always about solving creative, exciting, and difficult problems: there's also a bunch of trivial stuff that we have to do.

Stuff that anyone can do. Stuff that probably won't benefit much from having two people looking at it.

In these cases it's usually more effective to divide and conquer, covering twice the ground by having the pair split for a while.

(Purists will tell you all the benefits of also pairing in these situations. I get it. But they aren't paying the bill.)

Explore and fail

I've seen much less appetite to fail when there's someone looking at you every step of the way. (You know, that stupid idea that I just had might not come across as a smart move, so let's not do it.)

And I think if you aren't willing to fail (hence playing it safe all the time), your chances of doing something great are very slim.

Money

And there's always the money factor.

Sometimes two people working together on the same thing is a hard sell for someone who doesn't fully understand the benefits. (And we are here to explain, and tell them why they're probably wrong.)

But sometimes, pair programming is actually more costly than what we can afford.

Everything has a cost, and pair programming delivers a ton of benefits, but is your client willing to pay for them? When the benefits you get aren't valued in proportion to the cost you incur, you shouldn't try to force it.

Parting thoughts

Pair programming is not always the right answer, and it might well be the wrong one. Your job (and mine) is to find every one of those cases where we should look in a different direction.

(An interesting thought is that I'd never be able to write a blog post with somebody looking over my shoulder. That's not "pair programming", I know, but if it were, I can easily see myself having a really hard time enjoying my work.)

The above situations don't mean you should stop doing pair programming with your teams. You should continue, but avoid forcing it every time, all the time. The key is to find when and where it will pay the highest dividends.