Dec 12

Friday Post: Time Drift Sucks

Hey All,

I’ve been plagued by a bug at work for the past 6 months. Every so often, my OAuth library would fail out of the blue, causing some of my apps to die. It never occurred in production, only in our development environments. If I made the OAuth token by hand, or used the console, it worked fine. If I waited a few minutes, it would authenticate, and everything worked. Totally random, no rhyme or reason.

It wasn’t until this week, when I was trying to use a library to upload some test files to an AWS S3 bucket, that I got a useful error: my request timestamp was 19 minutes behind the Amazon server. AWS (and my OAuth lib) allow a 15-minute time window. But why was this happening? My machine’s clock was synced using NTP. I went to check my app and it had also failed. Then it hit me. I hadn’t rebooted my machine in a week. Our dev environments use Vagrant, which runs a small virtual machine that our software runs inside. The virtual machine has its own clock, which gets synced when it boots. Over the week, its clock slowly drifted just enough that my timestamps failed for Amazon, and then for OAuth. Every so often it will resync, but not often enough.
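
For the curious, the check that bit me boils down to something like the sketch below. This is not the actual AWS or OAuth code; `MAX_SKEW` and `is_within_window` are names I made up for illustration, the example instant is arbitrary, and the 15-minute figure is just the window mentioned above.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a 15-minute tolerance, matching the window described above.
MAX_SKEW = timedelta(minutes=15)


def is_within_window(request_time, server_time, max_skew=MAX_SKEW):
    """Return True if the request timestamp is close enough to the server's clock."""
    # Absolute difference: a clock running ahead fails the same way as one running behind.
    return abs(server_time - request_time) <= max_skew


# Arbitrary example instant standing in for "the server's idea of now".
server_now = datetime(2014, 12, 12, 12, 0, tzinfo=timezone.utc)
drifted_vm = server_now - timedelta(minutes=19)  # a VM clock 19 minutes behind
fresh_vm = server_now - timedelta(seconds=2)     # a VM clock shortly after a resync

print(is_within_window(drifted_vm, server_now))  # False
print(is_within_window(fresh_vm, server_now))    # True
```

Nineteen minutes of drift is outside the window, so the request gets rejected even though the credentials are perfectly fine.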

In 2014, I thought time drift was a thing of the past, a relic of dying CMOS batteries, but it still exists today. So if you ever use Vagrant to develop your applications, set up an NTP daemon on the virtual machine, or just run vagrant halt / vagrant up every so often 🙂
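
If you want to see how far a VM has drifted before it bites you, one rough approach is to compare its clock against the Date header of any HTTP response. Here is a minimal sketch of that comparison; `clock_offset_seconds` is a hypothetical helper of mine, and the header and VM time below are made-up values, not anything from my logs.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_offset_seconds(date_header, local_now):
    """Seconds the local clock is ahead of (+) or behind (-) the server's Date header."""
    server_time = parsedate_to_datetime(date_header)  # timezone-aware for a "GMT" header
    return (local_now - server_time).total_seconds()


# Made-up values: a Date header as a server might send it, and a VM clock 19 minutes behind.
header = "Fri, 12 Dec 2014 12:00:00 GMT"
vm_now = datetime(2014, 12, 12, 11, 41, tzinfo=timezone.utc)

offset = clock_offset_seconds(header, vm_now)
print(offset)  # -1140.0
```

In real use you would pass `datetime.now(timezone.utc)` as `local_now` and take the header from whatever response you have handy. The Date header only has one-second resolution and ignores network latency, so treat this as a smoke test for minutes of drift, not a replacement for NTP.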

Happy Friday!