Reliable Embedded Systems: Recovering Arduino i2c Bus Lock-ups

It’s not unheard of for the i2c bus to “lock up” on you, which can stop your system from working. If you are trying to operate a remote or embedded system, this is not ideal. This article discusses techniques that you can use to make your system more fault-tolerant and to attempt to recover from errors.

The techniques are broadly:

  • Prevent a bus lock-up from hanging your system
  • Recover from and remove the bus lock-up condition

The i2c bus

The i2c (Inter-Integrated Circuit) bus consists of two wires (hence it is often called the Two-Wire Interface, or TWI, when you want to avoid using the licensed i2c term): Data (SDA) and Clock (SCL). Both lines should normally sit high at whatever voltage your system is running at, e.g. 5V or 3V3.

It is an incredibly useful interface and you can hang a multitude of very useful chips off just these two wires, for example Real-Time Clocks, EEPROM memories, accelerometers, I/O expanders, etc. With a potential multitude of devices hanging off this same bus, they must play nicely with each other, and in order to do this, there is often a Master and a number of Slave devices. Each device has an address on the bus and will only speak when spoken to by the Master.

For the purposes of this article I shall just look at what can go wrong and how we might recover from it.

Symptoms of a stuck i2c bus

There are two “Bus Stuck” scenarios: either the SDA or SCL lines become held low. The bus relies on orderly communication and if a device holds one of these lines low you’re in trouble. 

There are many scenarios in which something like this can occur. The Analog Devices application note “Implementing an I2C Reset” suggests:

Frequently the master, which is usually a microcontroller or a gate array, will be interrupted in the middle of its communication with an I2C slave and, upon return, find a stuck bus. Initially this looks like a device problem, but it is not. The slave is still waiting to send the remainder of the data requested by the master. The problem is that the master has forgotten where it was when it was interrupted or reset. An extraneous reset on the processor will generally create this condition, especially if the processor cannot save its status. At this point, the slave will have put the next bit out on the SDA line (because the SCL line may have dropped to a low on reset) and awaits the next clock on SCL. Of course the processor does not send it, and as a result this slave just waits and waits. If the bit the slave puts on the SDA line is a 0, the newly awakened processor sees what appears to be a hung bus. The bus is in a nonoperational mode; however, it is not due to the slave. It is the processor’s fault for not finishing the message it started.

Analog Devices “Implementing an I2C reset” app note.

There are also times when a slave device needs to slow down the Master and it can do this via something called “clock stretching”, where it holds down the clock line until it is ready again. Sometimes this can go wrong and the SCL gets stuck low – which is a right pain by the way. 

Whatever the reason, a stuck bus is very bad news, particularly on Arduinos because of the Wire library which is used to implement i2c.

Preventing a stuck i2c bus from hanging your system

In order to recover from a stuck bus, you must first prevent it from causing irretrievable errors in your system. If you can do that, then you have a chance to try and recover the situation.

On Arduinos you can take one or both of two defensive steps:

  • Replace the i2c Wire Library
  • Use the onboard Watchdog
  • (or do both!)

Changing the i2c “Wire” library

On the Arduino, i2c communication is handled by the standard “Wire” library, which has been around since the dawn of time. However, that doesn’t mean it’s any good. In my humble opinion its biggest drawback is the use of infinite loops with no timeout. What this means in technical terms is that your code will disappear up its own backside.

A more defensive approach to programming is to always provide an “out”. If you have a loop that is waiting for something to happen, instead of assuming that it is absolutely guaranteed to happen guv’nor, provide a timeout if it hasn’t happened within a certain “reasonable” period of time. This gives you an option to “handle the error” in a controlled manner.
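As a minimal sketch of that idea, the helper below polls a condition but gives up after a fixed time. The names `waitFor`, `ConditionFn` and `ClockFn` are made up for illustration and are not part of the Wire library; on an Arduino the clock would typically be `millis()`, and the condition would test whatever flag your code is waiting on.

```cpp
#include <stdint.h>

// Hypothetical helper types, not part of any Arduino library.
typedef bool (*ConditionFn)();   // returns true once the awaited event has happened
typedef uint32_t (*ClockFn)();   // returns elapsed milliseconds, e.g. millis() on Arduino

// Polls `done` until it returns true or `timeoutMs` milliseconds have passed.
// Returns true on success and false on timeout, so the caller can handle the error.
bool waitFor(ConditionFn done, ClockFn nowMs, uint32_t timeoutMs) {
  const uint32_t start = nowMs();
  while (!done()) {
    // Unsigned subtraction stays correct even when the millisecond counter wraps.
    if ((uint32_t)(nowMs() - start) >= timeoutMs) {
      return false;
    }
  }
  return true;
}
```

The caller decides what a “reasonable” timeout is and what to do when `waitFor` returns false, for example logging the fault and attempting to free the bus.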

Your homework, then, is to rewrite the Arduino Wire libraries with timeouts. Fame and fortune await you…

Still here? Not finished yet? OK, I know this is a bit of a major undertaking, so while you’re rolling your own and then sharing it with me, you could give this library from DSS Circuits a go. It does a good job if all you need is Master mode.

Enable the Arduino Watchdog

The watchdog, to refresh your memory, will reset the Arduino if you don’t regularly “kick the dog”. I’ve spoken about it more in the previous articles of this series, but in short, it was designed to rescue you from situations like this. It’ll get you out of the brown smelly stuff.

If you stick with the default Wire library, then you need the watchdog and you need to use it in combination with the techniques to free the bus described below.

Attempt to detect and free a stuck i2c bus

If we have successfully kept our system running in the event of an i2c bus lock-up condition, we must now move on to Phase 2 and try to clear the condition. The techniques on offer vary depending on whether the SDA or SCL line is stuck low.

You can attempt recovery in various places within your code. If you have changed the i2c library to one with a timeout, then you may have the option to recover the bus at the point of failure of your i2c communication. If not, then at the very least you should attempt to recover the bus within the Arduino ‘setup’ code. The theory goes that if the watchdog has fired (i.e. the Wire library got stuck in its infinite loop), then you have entered “setup” as a result of a watchdog reboot. A reboot will not necessarily clear the i2c bus, so you are in danger of repeatedly triggering the watchdog and rebooting in a new endless loop of resets. You therefore need to analyse the bus and attempt to clear it on reboot in the setup.

The options open to you depend on what the bus looks like: SDA low or SCL low? If it’s both, then you are right royally screwed and you need a Microsoft-style fix – i.e. switch it on and off.
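The “what does the bus look like” decision can be sketched as a small classifier. This is only an illustration: `BusState` and `classifyBus` are hypothetical names, and the two inputs are assumed to be the levels read from SDA and SCL while both pins are still plain inputs, i.e. before the i2c library has been started.

```cpp
#include <stdint.h>

// Possible readings of an idle bus; the names are illustrative only.
enum BusState : uint8_t {
  BUS_OK,        // both lines high: the bus is idle and usable
  BUS_SDA_LOW,   // data line held low: try clocking it free
  BUS_SCL_LOW,   // clock line held low: needs a device reset or power cycle
  BUS_BOTH_LOW   // both lines held low: switch it off and on
};

// sdaHigh and sclHigh are the levels sampled with both pins configured as inputs
// (on an Arduino, for example, digitalRead(pin) == HIGH).
BusState classifyBus(bool sdaHigh, bool sclHigh) {
  if (sdaHigh && sclHigh) return BUS_OK;
  if (!sdaHigh && !sclHigh) return BUS_BOTH_LOW;
  return sdaHigh ? BUS_SCL_LOW : BUS_SDA_LOW;
}
```

Calling something like this at the top of `setup` lets the code choose between carrying on, the SDA procedure, and the hardware reset route.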

If the SDA line is stuck low

Of the two conditions, this is the easier to deal with. There’s even a defined procedure in the i2c Specification:

If the data line (SDA) is stuck LOW, the master should send nine clock pulses. The device that held the bus LOW should release it sometime within those nine clocks. If not, then use the HW reset or cycle power to clear the bus

i2c Specification, NXP

The “Implementing an i2c Reset” application note expands this to some pseudo code:

  1. Master tries to assert a Logic 1 on the SDA line 
  2. Master still sees a Logic 0 and then generates a clock pulse on SCL (1-0-1 transition) 
  3. Master examines SDA. If SDA = 0, go to Step 2; if SDA = 1, go to Step 4
  4. Generate a STOP condition 

You can implement this recovery with a variation on this code from Forward Computing. The aim of this code is to inject clock pulses onto the i2c line and hopefully unstick the bus.
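The four steps can be sketched roughly as follows. This is not the Forward Computing code, just a hardware-independent illustration: the `I2cPins` hooks, the `RecoverResult` names and `recoverSda` are assumptions made for the example. On an Arduino the hooks would be thin wrappers around `pinMode()`, `digitalRead()` and a short delay on the SDA and SCL pins, run before the i2c library takes over the pins. Clock stretching by the slave is not handled here.

```cpp
#include <stdint.h>

// Hypothetical pin-access hooks, so the routine does not depend on any board.
struct I2cPins {
  void (*releaseSda)();    // let SDA float high via the pull-up
  void (*driveSdaLow)();   // actively pull SDA low
  void (*releaseScl)();    // let SCL float high via the pull-up
  void (*driveSclLow)();   // actively pull SCL low
  bool (*readSda)();       // true if SDA currently reads high
  bool (*readScl)();       // true if SCL currently reads high
  void (*delayHalfBit)();  // about half a clock period, e.g. 5 us at 100 kHz
};

enum RecoverResult : uint8_t { BUS_RECOVERED, BUS_SCL_STUCK, BUS_SDA_STUCK };

// Tries to free a slave that is holding SDA low by clocking SCL up to maxPulses times.
RecoverResult recoverSda(const I2cPins& p, uint8_t maxPulses = 9) {
  p.releaseSda();                           // Step 1: try to put a logic 1 on SDA
  p.releaseScl();
  p.delayHalfBit();
  if (!p.readScl()) return BUS_SCL_STUCK;   // cannot clock while SCL is held low

  // Steps 2 and 3: one 1-0-1 pulse on SCL at a time, re-checking SDA before each.
  for (uint8_t i = 0; i < maxPulses && !p.readSda(); ++i) {
    p.driveSclLow();
    p.delayHalfBit();
    p.releaseScl();
    p.delayHalfBit();
  }
  if (!p.readSda()) return BUS_SDA_STUCK;   // needs a hardware reset or power cycle

  // Step 4: with SCL high, pull SDA low (a START) and release it again (a STOP).
  p.driveSdaLow();
  p.delayHalfBit();
  p.releaseSda();
  p.delayHalfBit();
  return p.readSda() ? BUS_RECOVERED : BUS_SDA_STUCK;
}
```

If the result is anything other than `BUS_RECOVERED`, clocking alone has not helped and the device needs a hardware reset or a power cycle.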

If the SCL line is stuck low

To quote the i2c Specification again:

In the unlikely event where the clock (SCL) is stuck LOW, the preferential procedure is to reset the bus using the HW reset signal if your I2C devices have HW reset inputs. If the I2C devices do not have HW reset inputs, cycle power to the devices to activate the mandatory internal Power-On Reset (POR) circuit.

i2c Specification, NXP

So there you have it – you have to do a variation of the classic Microsoft fix 😂.

Designing a system to recover from this type of error requires some hardware forethought or even a hardware wiring change. Some i2c devices have a reset line that you can wire to an Arduino I/O pin. A “reset” of the device has a chance of clearing the stuck i2c line. In the absence of a reset line, you can implement a “kill switch”: something to cycle the power of the device to physically reset it.
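Either approach boils down to pulsing one control line from an I/O pin. The sketch below is only an illustration under assumptions: `pulseControlLine`, `SetPinFn` and `DelayMsFn` are hypothetical names, the hooks would map to `digitalWrite()` and `delay()` on an Arduino, and the polarity and timings depend entirely on your device’s datasheet and on how the kill switch is wired.

```cpp
#include <stdint.h>

// Hypothetical hooks; on an Arduino these would wrap digitalWrite() and delay().
typedef void (*SetPinFn)(bool high);
typedef void (*DelayMsFn)(uint16_t ms);

// Drives a control line to its active level, holds it, then returns it to idle.
// "Active" means asserting the device's reset input, or cutting its power
// when the pin drives a kill switch.
void pulseControlLine(SetPinFn setPin, DelayMsFn delayMs,
                      bool activeHigh, uint16_t holdMs, uint16_t settleMs) {
  setPin(activeHigh);    // assert reset, or switch the power off
  delayMs(holdMs);       // hold long enough for the device to register it
  setPin(!activeHigh);   // release reset, or switch the power back on
  delayMs(settleMs);     // let the device finish its power-on reset
}
```

After the pulse it is worth re-reading SDA and SCL to confirm that the bus really has been released before restarting i2c communication.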

Summary

To create robust systems we need to anticipate as many problems as we can and put in defensive measures. At the very least I would implement the watchdog and the SDA recovery routine. Both are fairly painless to implement. Of the two scenarios, the SDA line being low is the more common. Changing the i2c library is a bit of a pain – there is no ready-made replacement for the Wire library (and of course it could bring its own issues). The kill switch/reset requires designing in from the outset.
