So I was at work talking with a colleague about various nerdy topics like what Linux distro makes for the best desktop experience (or laptop, as the case was) and trying to figure out why a CentOS Spacewalk box was throwing 79,000+ syslog events in only a week… and then an email came in from Red Hat titled “Be prepared for leap second 2015.”
Sigh, where to begin.
Here’s a link to an article describing what the “leap second” is all about, but the basics are that the Earth’s rotation is gradually slowing, so a day now runs roughly a couple thousandths of a second longer than 86,400 atomic seconds, while atomic time is constant. So, every so often, the powers that be basically stall clocks by a single second so that UTC stays lined up with the Earth’s rotation. So, on one day (June 30th, to be exact) the clocks will read 23:59:60 (or 11:59:60 PM), whereas they would ordinarily tick over to 00:00:00 after 23:59:59.
The problem is that the majority of computer systems these days are set up to synchronize time to an NTP source. The server(s) running this blog subscribe to pool.ntp.org, for instance. Every few minutes the servers go out to a time source that ultimately traces back to an atomic clock and say “Hey, what time is it?” The source replies, and if the server’s clock is out, it is adjusted. The real issue is that 23:59:60 does not exist in the way most systems implement UTC: POSIX time assumes every day is exactly 86,400 seconds long, so the kernel must know to step the system clock back one second at that specific date/time. In some cases, 23:59:60 is interpreted the same as 00:00:00 by the kernel and the single second offset is not accounted for.
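To see why 23:59:60 is awkward for software, here is a minimal Python sketch (my own illustration, not anything from the Red Hat advisory). It uses only the standard library and shows that, under the usual Unix-style arithmetic, the leap second lands on exactly the same timestamp as the following midnight, and that Python’s `datetime` refuses to represent it at all. The helper name `can_represent_leap_second` is made up for this example.

```python
import calendar
import datetime

# Unix-style timestamps assume every day is exactly 86,400 seconds long,
# so 23:59:60 on June 30th works out to the same number as 00:00:00 on July 1st.
leap_second = calendar.timegm((2015, 6, 30, 23, 59, 60))
next_midnight = calendar.timegm((2015, 7, 1, 0, 0, 0))


def can_represent_leap_second():
    """Hypothetical helper: can datetime hold 23:59:60 at all?"""
    try:
        datetime.datetime(2015, 6, 30, 23, 59, 60)
        return True
    except ValueError:
        # datetime only accepts seconds in the range 0..59
        return False


if __name__ == "__main__":
    print("same timestamp:", leap_second == next_midnight)
    print("datetime accepts 23:59:60:", can_represent_leap_second())
```

Since the two moments collapse into one number, something has to repeat or stall a second to make room for the extra one.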
If a server subscribes to NTP-synchronized time, then the NTP source should warn the server ahead of time and the system clock should read 23:59:59, 23:59:59, 00:00:00. Systems not running NTP will simply end up one second ahead afterwards, assuming their time was exactly correct to begin with. Many older Linux kernels were programmed to write a message to the syslog when they insert the leap second, and on affected kernels the system can hang at that moment due to a deadlock on xtime_lock.
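The repeated 23:59:59 is easier to picture with a toy model. The sketch below is only a simulation in plain Python, not kernel code, and the names `LEAP_AT` and `wall_clock_readings` are invented for illustration. It walks through real seconds one at a time and steps the wall clock back once when it first reaches midnight.

```python
import datetime

# The UTC midnight that immediately follows the 2015 leap second.
LEAP_AT = datetime.datetime(2015, 7, 1, 0, 0, 0)
ONE_SECOND = datetime.timedelta(seconds=1)


def wall_clock_readings(start, ticks):
    """Return the HH:MM:SS shown for each of `ticks` real seconds from `start`.

    The first time the clock would reach LEAP_AT it is stepped back one
    second, so 23:59:59 is shown twice (a toy model of a leap insertion).
    """
    readings = []
    current = start
    leap_done = False
    for _ in range(ticks):
        if current == LEAP_AT and not leap_done:
            current -= ONE_SECOND  # stall the clock for one extra second
            leap_done = True
        readings.append(current.strftime("%H:%M:%S"))
        current += ONE_SECOND
    return readings


if __name__ == "__main__":
    start = datetime.datetime(2015, 6, 30, 23, 59, 58)
    print(wall_clock_readings(start, 4))
    # prints ['23:59:58', '23:59:59', '23:59:59', '00:00:00']
```

A box that never gets told about the leap second skips the step back, which is why it ends up a second ahead of everyone else.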
The fix is simple but pretty significant – patch the kernel. Not a huge deal in the grand scheme of things but if you work at an MSP or large corporation with tons of servers, getting everyone to sign off on a kernel update can be a serious challenge. Well… you’ve got 3.5 months, so… get going!
It’s not the first time a leap second has been added (this one is in fact the 26th), but considering that RHEL4/5/6/7 are all affected by this potential glitch, it is definitely worth investigating and resolving ahead of time. It’s also a good idea to investigate or patch any other distribution running an older kernel.