Differential Synchronization

From Neil Fraser:

Keeping two or more copies of the same document synchronized with each other in real-time is a complex challenge. This paper describes the differential synchronization algorithm. Differential synchronization offers scalability, fault-tolerance, and responsive collaborative editing across an unreliable network.

Think about two people working on the same document at the same time. You need to keep that document synchronized over the network.

I knew about three-way merges, but I hadn't thought much about their drawbacks:

[Three-way merge] could be compared to an automobile with a windshield which becomes opaque while driving. Look at the road ahead, then drive blindly for a bit, then stop and look again. Major collisions become commonplace when everyone else on the road has the same type of "look xor drive" cars.

In comparison:

[When using Differential Synchronization] there is no requirement that "the chickens stop moving so we can count them" which plagues server-side three-way merges.

Long (complex) article from there. Interesting right now, but will become vital as soon as I need to implement something like this.

Despite the inherent complexity, this synchronization system works extremely well. It is robust, self-healing and impressively accommodating of users who are working on the same text.

Filed this under "I'm sure I'm going to need it".

Mobile browsers are not the problem. Web pages are

I just finished reading an article by The Verge about how The mobile web sucks.

I agree. It sucks, but I don't think it's because mobile browsers are bad (as the article tries to sell), but because web pages are poorly built for devices several orders of magnitude less powerful than desktop computers.

For a fun exercise try to load The Verge on your mobile phone. (Or better don't. You aren't going to enjoy it anyway.)

I spent some time digging into the article's page that so vehemently claims how mobile browsers don't work. Here is what I found:

  • Browser caching is not fully used. Several files report zero or very short caching expiration.
  • Multiple images are not properly compressed. Google PageSpeed Insights reports that almost 1MB could be saved by compressing the images.
  • Several JavaScript and HTML files can be minified.
  • Compression is not enabled in several of the domains that serve content for the page.
  • There's render-blocking JavaScript and CSS in the above-the-fold content area of the page.
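
The caching and compression items in the list above can be checked from a response's headers alone. Here is a minimal sketch, assuming the headers arrive as a plain dictionary; the function name and the wording of the messages are made up for illustration and are not what PageSpeed does internally:

```python
def audit_headers(headers):
    """Return a list of problems found in one response's HTTP headers."""
    # Header names are case-insensitive, so normalize them first.
    h = {name.lower(): value.lower() for name, value in headers.items()}
    problems = []

    cache_control = h.get("cache-control", "")
    directives = [d.strip() for d in cache_control.split(",") if d.strip()]
    if not directives and "expires" not in h:
        problems.append("no caching headers")
    elif "no-store" in directives or "max-age=0" in directives:
        problems.append("caching disabled or expires immediately")

    # Only meaningful for text resources (HTML, CSS, JavaScript).
    if h.get("content-encoding") not in ("gzip", "br", "deflate"):
        problems.append("response not compressed")

    return problems
```

The compression check only makes sense for text resources. Images such as JPEG or PNG are already compressed and are normally served without a Content-Encoding header.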

PageSpeed gives the mobile version of The Verge a score of 13 out of 100. (The desktop version scores 38/100, but that's for another story.)

With those numbers, I don't think The Verge has the necessary credibility to stand and criticize mobile browsers out there. Their problem is in their offices, not anywhere else.

Utilization is not the final answer

Utilization represents how much time you are actually working on stuff that makes money. When you aren't 100% "utilized", it means that some of your time is spent doing "non-billable" work.

At least that's how we think about "utilization" at work.

Common sense dictates that a highly utilized team makes more money than a poorly utilized team. The tendency is to always find ways to keep everyone busy all the time.

This works up to a point.

There's much more that goes into making a team really efficient. You can easily have a group of people at 100% utilization performing much worse than a well-allocated team at 75%.

The trick is to think about what you are giving up by having someone scheduled for every minute of every day.

(I think that when a person has to work on only one project, a higher utilization is usually a good thing. The more projects you add to the mix, the less efficient people become.)

I don't have the right formula. There probably isn't one (it depends on the person, the project, the rest of the team, the environment, etc.). I think it's a good exercise to always think about the trade-offs instead of letting the numbers fool us.

At work we are moving away from the cold numbers into a more strategic team organization. I expect utilization percentages to go down. I also expect us to make much more money.

Free Python course if you rush and use my code

I know a bunch of you are Python fans. I also know that most of you probably like free stuff.

If that's the case, you might want to check the "Learn Python GUI programming using Qt framework" course by Udemy.

This course is about Python GUI programming and building GUI applications using Python and Qt framework. We will see how we can build powerful desktop applications using nothing but Python and one of its Qt bindings.

The first 50 users that use the coupon code svpino will get totally free access to the course (a $79 value!)

I haven't taken the course myself, but since it's going to be free for you, I don't see how you can go wrong.

Let me know if you liked it.

When cookies don't work and you have to stay late a couple of nights

Most of the time you can't see what's hidden around the corner. No matter how much time you spend thinking about doing something, you won't be able to properly measure the effort until the job is done.

The assignment looks simple on the surface: moving all the pages of a hybrid mobile application to be locally stored on the mobile device to create a better experience for the user.

The mobile application was working fine before. But it's using online pages that (duh!) take time to load. If only we could store all those pages as part of the mobile package, the browsing experience would be faster for the user, since only dynamic content would have to be loaded (and we are using AJAX calls for that already.)

To finally paint the full picture, the application authenticates using OAuth2 with the backend services. The token information is stored on a server (that we started calling "proxy" and haven't stopped since) and a surrogate session ID is created for the client to use (thus avoiding sharing the token and refresh_token with the client app.)
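
To make the surrogate session idea concrete, here is a minimal sketch of what the proxy keeps. It assumes an in-memory dictionary, and the class and method names are hypothetical; a real proxy would need persistent storage and expiration:

```python
import secrets


class SessionStore:
    """In-memory map from surrogate session IDs to OAuth2 tokens (sketch only)."""

    def __init__(self):
        self._sessions = {}

    def create(self, access_token, refresh_token):
        # The ID is random and carries no token material, so it is
        # safe to hand to the client.
        session_id = secrets.token_urlsafe(32)
        self._sessions[session_id] = {
            "access_token": access_token,
            "refresh_token": refresh_token,
        }
        return session_id

    def tokens_for(self, session_id):
        # Returns None for unknown IDs.
        return self._sessions.get(session_id)
```

The client only ever sees the session ID. The token and refresh_token never leave the proxy.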

By the way, to make matters a little bit more complicated, since the mobile application is just a native wrapper to a responsive web page, we are reusing all assets and code in our regular web application: only one web application that works on your desktop browser and is also embedded in a wrapper running on your mobile device.

Here is my best attempt to draw the entire architecture of the application (sorry for the messy illustration):

From online to offline

Back to what we wanted to do: making our hybrid mobile application load pages stored offline in the mobile device instead of going out there to the server to load them.

I thought this was going to be easy. We had it like that before anyways (before OAuth2 was introduced in the picture, and early enough that authentication was hardcoded in the app.) Probably we just needed to flip the switch again and move it all back offline.

We quickly realized that things were going to be more interesting this time around. Look at this workflow:

  1. User loads the mobile app
  2. A locally-stored (offline) page loads up asking them to sign in.
  3. User is redirected to the OAuth2 (online) authentication screen (this happens through the proxy server.)
  4. After specifying username and password, user is taken to the OAuth2 (online) authorization screen.
  5. After allowing access to our application, the OAuth2 server redirects the user back to our proxy server, including all the necessary authentication information.
  6. Our proxy server then creates a surrogate session (to avoid exposing the token to an unsecured client), saves it in a cookie for the client to read, and redirects the user to the main page of the site.

So far, so good for the web, but remember we want that "main" page of the site to also be locally stored on the mobile device. How can we get our proxy server to redirect to a page that isn't online?

Mobile-in-the-middle

A while ago I learned how to intercept web requests in the embedded browser of the mobile device and cancel or change them to whatever I needed.

So that's what we started doing: whenever we were ready to display our main (online) page, it was as simple as stopping the request from the browser, and replacing it with the locally stored offline version of the page.

When using the regular web version, the user would get the online main page after authenticating, but when running from inside our mobile wrapper, the user would get the locally stored version of the same page, because our code replaced it on the fly.
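
The actual interception happens in the native wrapper, but the decision it makes boils down to a small mapping from online URLs to bundled files. Here is a rough sketch of that decision in Python; the host name, the path table, and the function name are all hypothetical:

```python
from urllib.parse import urlsplit

# Hypothetical values: the online host and the pages bundled with the app.
ONLINE_HOST = "app.example.com"
LOCAL_PAGES = {"/main": "file:///app/www/main.html"}


def rewrite_request(url):
    """Return the local replacement for `url`, or `url` unchanged."""
    parts = urlsplit(url)
    if parts.hostname == ONLINE_HOST and parts.path in LOCAL_PAGES:
        local = LOCAL_PAGES[parts.path]
        # Keep the query string so the page still receives its parameters.
        return f"{local}?{parts.query}" if parts.query else local
    return url
```

Any request that isn't in the table (the OAuth2 screens, the AJAX calls) goes through untouched.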

Unfortunately, I failed to see where this model gets a little more complicated.

Where the heck is my cookie?

Offline pages and cookies do not get along.

A cookie needs a domain. An offline page stored on a mobile device doesn't have one. It's a page that we load using the file:// protocol, so it can't access cookies saved by an actual website.

Our proxy server creates a session that's served and saved as a cookie on the client side. As soon as we made our pages work offline, there was no session saved, so users couldn't log in anymore.

Oh crap!

I thought this was going to be easy. Now it had turned out to be 10 times more complicated than I anticipated.

Cookieless sessions

Right away we discussed a cookieless session. You know, those where you send the session information as part of the URL of the page.

This was very common back in the day, when users' companies loved to disable cookies in the browser. There are still several websites that use this approach.

The advantage of this mechanism was that we could fix our mobile issue with a well-known and easy-to-implement approach, but security-wise I had a concern: what happens if a user shares a URL to our application that includes the session information? We would have to protect against this scenario by checking IP addresses or I-don't-know-what-else, but this was definitely something to think about.

There had to be a better way.

Cookieless for mobile. Cookies for the web.

We decided to use both approaches: cookies for the regular web application, and an embedded session string as part of the URL for the mobile application. This way we avoided the security issue with the cookieless session (since the mobile application doesn't display the URL) and kept the web application working as it was.

Depending on the application, we started asking the OAuth2 server to redirect the user to a different URL on the proxy server. This way, we implemented a handler for creating and managing cookieless sessions and another one for the regular cookie mechanism.
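
The two handlers differ only in how they hand the session back to the client. Here is a minimal sketch of that difference, assuming each handler returns a status code and a dictionary of headers; the URLs, the cookie name, and the function names are hypothetical:

```python
from urllib.parse import urlencode

# Hypothetical redirect targets on our side.
WEB_MAIN_PAGE = "https://app.example.com/main"
MOBILE_MAIN_PAGE = "https://app.example.com/mobile/main"


def web_callback(session_id):
    """Handler for the web app: deliver the session in a cookie."""
    headers = {
        "Location": WEB_MAIN_PAGE,
        "Set-Cookie": f"session={session_id}; Secure; Path=/",
    }
    return 302, headers


def mobile_callback(session_id):
    """Handler for the mobile wrapper: deliver the session in the URL."""
    query = urlencode({"session": session_id})
    return 302, {"Location": f"{MOBILE_MAIN_PAGE}?{query}"}
```

The web handler never puts the session in the URL, and the mobile handler never sets a cookie, so each client only sees the mechanism that works for it.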

A couple of nights (and days) later, the final solution was working perfectly fine.

What I got from the experience

First and foremost, this was another example of why estimation is so hard in a complex field like Software Engineering. Sometimes, what seems super simple can blow up to unimagined proportions.

The second lesson is how hard work pays off in the end. We could have easily settled for the online (slow) version of the application, but because we pushed hard we accomplished what we really wanted.

Finally, we all ended up with the enormous satisfaction of having implemented something clever that makes using our application a little bit better.

Nothing like this feeling to fuel your passion to return to work the next day.

I don't have enough time to develop but that's alright

I've always wanted to write while drunk, sitting on a plane, waiting for a storm to go away before taking off.

What could be better than that?

Lately a couple of people have asked me where I find time to develop anymore. They see me a lot in different meetings, so they wonder how in hell I find time to code.

The answer is that I don't have time to develop anymore like before. But that's alright. I still find myself doing some things during weekends or late nights, but most of my days are filled with different tasks that don't involve direct coding (but I still spend 90% of my time interacting with source code in one way or another.)

Here is the thing: I don't mind as much as I thought I would.

Coding is great and I love it, but nowadays I get to do something that I love even more: improving other people's careers.

We developers spend so much time coding that we usually forget about everything else needed to be a great professional in our industry. Lately I've got a lot of time to think and help my team with that, and I'm really happy about it.

Just like a developer feels great writing code, I've found an enormous potential in helping them with everything else around their code: communication, managing expectations, estimation, architecture, flexibility, and anything that helps people be more efficient.

The idea of contributing not only to a particular project with my code, but to several different applications thanks to my team is incredibly rewarding. This is the type of work I've always dreamed about doing, so I'm psyched I get paid for it.

All that being said, I still code. Mostly during my free time and mostly stuff that I really enjoy and that keeps me on top of my game. And I love it.

The plane just took off. Finally.

Programming challenge: boolean parenthesizations

I thought I wasn't going to be able to post a challenge this week, because I'm traveling out of the country (with no Internet connection most likely.) However, last week I implemented the ability to schedule future posts on the blog, and this is how this one was created.

(I wrote the post on Friday, left the country on Saturday, and hoped my code worked to publish this post on Sunday. I'm back this coming Saturday hoping to see that everything was fine.)

For this week here is a pretty cool problem:

Given a boolean expression consisting of a string of the symbols 'true' and 'false' and the operators 'and', 'or', and 'xor', write a program that determines all the ways you can add parentheses to the expression so that it evaluates to true.

For example, given the expression "true and false xor true" there are two possible solutions:

  • (true and false) xor true = true
  • true and (false xor true) = true

I haven't thought about a solution for the problem yet, but I can easily see how a recursive brute-force algorithm should be pretty simple to create. If you want to make it more exciting, try to think of a non-recursive way of solving the problem.

(This problem looks a whole lot like a dynamic programming exercise.)
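
For reference, here is one possible shape for the recursive brute-force approach, offered as a sketch rather than as the intended solution. It assumes the tokens are separated by single spaces and the input is well formed; the function and variable names are my own:

```python
from functools import lru_cache

OPS = {
    "and": lambda a, b: a and b,
    "or": lambda a, b: a or b,
    "xor": lambda a, b: a != b,
}


def true_parenthesizations(expression):
    """Return every full parenthesization of `expression` that is true."""
    tokens = expression.split()
    operands = [t == "true" for t in tokens[0::2]]
    operators = tokens[1::2]

    @lru_cache(maxsize=None)
    def build(lo, hi):
        # All (text, value) pairs for operands lo..hi inclusive.
        if lo == hi:
            return ((tokens[2 * lo], operands[lo]),)
        results = []
        for k in range(lo, hi):  # operator k is applied last
            for left_text, left_val in build(lo, k):
                for right_text, right_val in build(k + 1, hi):
                    text = f"({left_text} {operators[k]} {right_text})"
                    value = OPS[operators[k]](left_val, right_val)
                    results.append((text, value))
        return tuple(results)

    return [text for text, value in build(0, len(operands) - 1) if value]
```

Calling true_parenthesizations("true and false xor true") returns the two solutions from the example, each wrapped in an extra outer pair of parentheses. The number of parenthesizations grows very quickly with the length of the expression, so if only the count is needed, a dynamic programming table of true/false counts per sub-range avoids building the strings at all.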

In case you are interested, here is the challenge from last week: Programming challenge: towers of Hanoi.

Adam Harris about the RSS feed

Adam Harris asks regarding my previous post about moving the RSS feed generation to a static process:

Why do you generate your rss feed each hour instead of just whenever you publish a new post?

This is a very good question. It was a long time ago when I wrote about the way my blog works behind the scenes, so to anyone reading the last post, what I did might have sounded stupid.

A new post in my blog is just a new file uploaded in the right folder with a specific name convention. Whenever I post something new, I clear the cache, and the engine reloads all the files and the new post shows up.

This means that I don't have a specific signal for when a new post is created (other than when there's no cache and I load all the existing files.) This makes it hard for me to determine when exactly the new RSS feed needs to be refreshed.

Right now it's every hour. I might be able to think of a better way to do this process without having to regenerate the feed every time, but I'm not sure the performance gain will justify the time to implement this.

I'll look into it. It's already on my TODO list for the blog.

Thanks Adam for your question!

Moving from a dynamically generated to a static RSS feed

Before, I didn't get much traffic to the blog. At least not like now.

Things are always easier with few users hitting your code (remember, I wrote the blog engine displaying the text you are reading right now.) One of my goals has always been to not pay for Google App Engine, and to host this site as cheaply as I can.

And with more subscribers to my RSS feed, that goal was getting harder and harder to achieve.

I don't know a whole lot about RSS feed readers, but apparently they constantly ping your feed to detect new posts ("constantly" meaning every 10 to 60 minutes depending on the software.) The RSS specification includes a ttl element to specify how long readers can cache the feed, but the readers accessing my site ignored it (or so it seemed.)

So I started to see that most of my traffic was to the /rss path, coming from a bunch of feed readers (after Google Reader died, nobody has settled on a "standard" reader.) My feed was dynamically created (and cached) with each request, but this wasn't working in my favor.

I decided then to move the feed to Google Cloud Storage as a static file. Every hour I generate a new feed and save it in GCS as an XML file. All readers are now redirected to that file, so they don't have to hit my App Engine instances.
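
The hourly job itself is small: render the feed once, then save the result as a file. Here is a minimal sketch of the rendering half using only the Python standard library. The function name and its parameters are assumptions for illustration, and the upload to GCS is left out:

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, posts, ttl_minutes=60):
    """Render an RSS 2.0 document. `posts` is a list of (title, url) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    # Hint to readers about how long they may cache the feed, in minutes.
    ET.SubElement(channel, "ttl").text = str(ttl_minutes)
    for post_title, post_url in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post_title
        ET.SubElement(item, "link").text = post_url
    return ET.tostring(rss, encoding="unicode")
```

The returned string is what would be written to the static XML file once an hour, so feed readers never trigger this code themselves.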

I removed so much (costly) traffic that it almost feels impossible.

The downside, of course, is that feed readers can be up to an hour late receiving the latest post from my blog, but I really don't care.

It's been working like this for 3 months now and I haven't heard any complaints.

The lesson here is that things will always change. No matter how "right" you think you are today, and how cool your solution is, scaling your application will bring a whole lot of new problems that will render all your textbook solutions completely invalid.

Just be prepared for that.

Programming challenge: towers of Hanoi

I'm sorry I missed last weekend's programming challenge. I was on vacation and away from a computer (which was great!) I'll also be absent next weekend since I'll be traveling to Cuba, so I apologize in advance.

But here is the challenge for this week; a classic mathematical puzzle that I find super interesting to solve with a computer:

Given a stack of N disks arranged from largest on the bottom to smallest on top placed on a rod, together with two empty rods, write a program that finds the minimum number of moves required to move the stack from the first rod to the last one, where only one disk moves at a time and a disk may only be placed on an empty rod or on top of a larger disk.

For example, for 3 disks, the solution requires moving the disks 7 times:

  • Move disk from [1] to [3]
  • Move disk from [1] to [2]
  • Move disk from [3] to [2]
  • Move disk from [1] to [3]
  • Move disk from [2] to [1]
  • Move disk from [2] to [3]
  • Move disk from [1] to [3]

Ideally, your program will print the number of necessary moves together with each one of them.
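
For reference, here is a minimal sketch of the classic recursive approach (not necessarily the solution linked in the update below). The rods are numbered 1 to 3 as in the example, and the function name is my own:

```python
def hanoi(n, source=1, target=3, spare=2):
    """Return the list of (from_rod, to_rod) moves that shifts n disks."""
    if n == 0:
        return []
    return (
        hanoi(n - 1, source, spare, target)    # clear the way
        + [(source, target)]                   # move the largest disk
        + hanoi(n - 1, spare, target, source)  # put the rest back on top
    )


moves = hanoi(3)
print(len(moves))
for from_rod, to_rod in moves:
    print(f"Move disk from [{from_rod}] to [{to_rod}]")
```

For 3 disks this prints 7 followed by the same seven moves listed above. In general the minimum is 2^N - 1 moves.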

I'm excited to see what you can come up with! I'm also thinking about not publishing a post with a solution anymore. Turns out that you guys are sending me solutions every week that are as good as or better than mine, so I'll probably tweet one of your solutions instead of writing one. (The exception will be those problems that I actually want to solve for fun or to learn something new.)

Update: Here is my solution to the problem, and here is another pretty cool solution that Jose Fidalgo tweeted me.

In case you are interested, here is the challenge from 2 weeks ago: Programming challenge: the position of the element.