Embedded systems are generally extremely reliable; they are closer to the “bare metal” and don’t have the layers that a desktop operating system has, so there is less to go wrong*. However, they are not without their own challenges and can still hang on occasion. This can be for a variety of reasons, from self-inflicted coding errors to hardware not behaving. For example, the standard Arduino I2C “Wire” library by default has no timeouts and contains infinite while loops. If something goes amiss on the I2C bus, the Arduino will hang, waiting forever for a change that never comes.
If we want to create as “reliable” a system as we can, then we need a mechanism to rescue us from these situations. The “watchdog” is one technique in our toolbox that we can use to catch a “hung” system. This article explores the use of this simple but powerful device.
Internal Watchdogs
One technique/facility that developers have at their disposal is the so-called “watchdog timer” aka WDT. In essence, this is a separate piece of hardware within the processor that is like a bomb in a film with the countdown sequence. When the count gets to zero, the “bomb” goes off, which in our case is resetting the Arduino. Except this “bomb” has a big red “Reset” button next to the countdown display. If we manage to press this before the count reaches zero we get to buy ourselves more time – the count resets to the start and begins again. To prevent detonation ever occurring we have to remember to keep pressing that button – if we ever forget, or we get captured by the baddies and prevented from pressing it, then Boom, game over… (but we do get to play the game again with a respawn).
It can actually be a bit more nuanced than that simple analogy: the watchdog can let you do other things on timeout, or even just act like an alarm clock.
The ATmega328P Watchdog
The watchdog timer in the ATmega328P, the chip at the heart of the current Sleepy Pis, is fed from a separate on-chip 128 kHz oscillator and can be set to time out after any of the following periods:
- 16 ms
- 32 ms
- 64 ms
- 0.125 s
- 0.25 s
- 0.5 s
- 1 s
- 2 s
- 4 s
- 8 s
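These values follow from the oscillator frequency. According to the ATmega328P datasheet, the shortest setting counts 2048 oscillator cycles and each step up doubles the count, ending at 1024K cycles. A minimal sketch of that arithmetic in plain C++ follows; the function name wdtTimeoutMs is made up for illustration and the code assumes the oscillator runs at exactly 128 kHz.

```cpp
#include <stdint.h>

// Nominal watchdog timeout in milliseconds for prescaler setting 0..9,
// assuming the 128 kHz watchdog oscillator runs exactly on frequency.
// Setting 0 counts 2048 cycles; each step up doubles the count.
// Returns 0 for settings outside 0..9.
uint32_t wdtTimeoutMs(uint8_t prescaler) {
  if (prescaler > 9) return 0;
  const uint32_t cycles = 2048UL << prescaler;  // 2K ... 1024K cycles
  return cycles * 1000UL / 128000UL;            // cycles / 128 kHz, in ms
}
```

Setting 0 works out to 16 ms and setting 6 to 1024 ms, so the figures in the list are rounded. The real oscillator also drifts with voltage and temperature, so treat all of them as nominal.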
As long as you “kick” the watchdog before the timer expires, you are good. The 328P also has some very useful features should you forget to kick it. It’s not restricted to just resetting the CPU; the watchdog can be set to any of these modes:
- Stopped (the watchdog is off and nothing happens)
- Interrupt
- Reset
- Interrupt and then reset
The interrupt options are the really interesting modes – they allow us to do something if the watchdog fires. You can just use the interrupt mode to do something on a regular interval – remember you are not necessarily resetting the chip. This facility is used extensively in the Arduino Low Power library; you can put the chip to “sleep” and the watchdog wakes it up on a regular basis.
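On the ATmega328P the mode is chosen by two bits in the watchdog control register: WDE (reset enable) and WDIE (interrupt enable). The sketch below only models that lookup in plain C++, assuming the WDTON fuse is left unprogrammed; WdtMode and wdtModeFor are illustrative names, not part of any library.

```cpp
// The four watchdog configurations (assumes the WDTON fuse is unprogrammed).
enum class WdtMode { Stopped, Interrupt, Reset, InterruptThenReset };

// Map the WDE (reset enable) and WDIE (interrupt enable) bits to a mode.
WdtMode wdtModeFor(bool wde, bool wdie) {
  if (wde && wdie) return WdtMode::InterruptThenReset;
  if (wde)         return WdtMode::Reset;
  if (wdie)        return WdtMode::Interrupt;
  return WdtMode::Stopped;
}
```

For instance, wdt_enable() from avr/wdt.h sets WDE and leaves WDIE clear, which corresponds to wdtModeFor(true, false), the plain reset mode.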
Basic Watchdog Use
The watchdog is pretty easy to use in normal Arduino code. Simply include the header with #include <avr/wdt.h>, then use wdt_enable to enable the watchdog and wdt_reset to “kick” the dog. You can hack the standard Blink example to show the simple operation of the watchdog.
#include <avr/wdt.h>

void setup() {
  // Initialize serial communication:
  Serial.begin(9600);
  Serial.println("Start..");
  pinMode(LED_BUILTIN, OUTPUT);
  // Enable the Watchdog
  wdt_enable(WDTO_1S);
}

void loop() {
  digitalWrite(LED_BUILTIN, HIGH); // turn the LED on (HIGH is the voltage level)
  delay(1000);                     // wait for a second
  digitalWrite(LED_BUILTIN, LOW);  // turn the LED off by making the voltage LOW
  delay(1000);                     // wait for a second
  wdt_reset();                     // kick the dog
}
In this simple example, we have started the watchdog with a 1 second timeout. With the standard delay(1000) calls, each pass of the loop takes 2 seconds, so the watchdog times out before the loop gets to wdt_reset and the Arduino keeps resetting. If you open up the “Serial Monitor”, you will see it repeatedly printing out “Start..”. If you reduce the delays, say from 1000 to 400, then the loop takes 0.8 seconds, the watchdog is kicked in time and there is no restart.
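If you genuinely need a wait that is longer than the watchdog timeout, one option is to split it into slices and kick the dog after each one. The helper below is a hypothetical sketch, not something from the Arduino core: watchedDelay and its parameters are made-up names, and the delay and kick actions are passed in so that the same code can be tried on a desktop compiler.

```cpp
// Wait totalMs in slices of at most sliceMs, kicking the watchdog after each slice.
// delayFn(ms) must block for ms milliseconds; kickFn() must reset the watchdog.
template <typename DelayFn, typename KickFn>
void watchedDelay(unsigned long totalMs, unsigned long sliceMs,
                  DelayFn delayFn, KickFn kickFn) {
  if (sliceMs == 0) sliceMs = 1;  // avoid looping forever on a zero slice
  while (totalMs > 0) {
    const unsigned long step = (totalMs < sliceMs) ? totalMs : sliceMs;
    delayFn(step);
    kickFn();
    totalMs -= step;
  }
}
```

On the Arduino the call might look like watchedDelay(1000, 250, [](unsigned long ms) { delay(ms); }, [] { wdt_reset(); }); the lambda around wdt_reset is needed because avr-libc defines it as a macro. Keep the slice comfortably shorter than the timeout, and bear in mind that kicking inside a wait means the watchdog no longer notices if that wait is the thing that has gone wrong.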
To switch off the watchdog use wdt_disable.
Note that the time-out periods are defined for you. In the code above I used WDTO_1S (1 second), but I could have used any of the following “defined” settings (from wdt.h):
- WDTO_15MS
- WDTO_30MS
- WDTO_60MS
- WDTO_120MS
- WDTO_250MS
- WDTO_500MS
- WDTO_1S
- WDTO_2S
- WDTO_4S
- WDTO_8S
NOTE: You may have noticed that it is WDTO_15MS rather than WDTO_16MS or WDTO_30MS rather than WDTO_32MS – maybe the code authors only like round numbers?
More Advanced Watchdog Options
Interrupt Only Mode
The section above described all the “standard” watchdog functions. To use the more advanced features we have to get a bit more ninja.
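To enable the “interrupt only” mode we have to set the interrupt enable bit (WDIE) in the Watchdog Timer Control Register (WDTCSR) while leaving the reset enable bit (WDE) clear. Note that wdt_enable() sets WDE, so simply setting WDIE after calling it gives interrupt and reset, not interrupt only. Instead, write the register directly in setup, using the timed sequence that the ATmega328P requires for changing WDE and the timeout:
cli(); // the timed sequence below must not be interrupted
MCUSR &= ~(1 << WDRF); // WDE cannot be cleared while the watchdog reset flag is set
WDTCSR |= (1 << WDCE) | (1 << WDE); // unlock WDE and timeout changes for 4 clock cycles
WDTCSR = (1 << WDIE) | (1 << WDP2) | (1 << WDP1); // interrupt only (WDE clear), 1 second timeout
sei();
The WDP bits select the timeout; WDP2 and WDP1 together give 1 second. The datasheet lists the combinations for the other periods.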
This will fire off an interrupt when the WDT expires. To “catch” the interrupt, add the following routine to your code and do something within it:
ISR (WDT_vect)
{
// Add your own funky code here (but do keep it short and sweet).
}
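The mode and the timeout all live in the one WDTCSR byte, and the fourth timeout bit (WDP3) sits apart from the other three, which is easy to get wrong. The helper below is a minimal sketch of how that byte could be put together in plain C++. It assumes the bit positions given in the ATmega328P datasheet (WDIE bit 6, WDP3 bit 5, WDE bit 3, WDP2..WDP0 bits 2..0); wdtcsrValue and the k-prefixed constants are hypothetical names chosen so they do not clash with the avr-libc macros.

```cpp
#include <stdint.h>

// Bit positions in WDTCSR, as listed in the ATmega328P datasheet.
const uint8_t kWDIE = 6;  // watchdog interrupt enable
const uint8_t kWDP3 = 5;  // timeout bit 3 (not adjacent to bits 2..0)
const uint8_t kWDE  = 3;  // watchdog system reset enable

// Build the WDTCSR value for a mode and a timeout setting 0..9
// (0 = 16 ms ... 9 = 8 s). Settings above 9 are clamped to 9.
uint8_t wdtcsrValue(bool interruptEnable, bool resetEnable, uint8_t prescaler) {
  if (prescaler > 9) prescaler = 9;
  uint8_t value = prescaler & 0x07;              // WDP2..WDP0
  if (prescaler & 0x08) value |= (1 << kWDP3);   // WDP3
  if (interruptEnable)  value |= (1 << kWDIE);
  if (resetEnable)      value |= (1 << kWDE);
  return value;
}
```

For example, wdtcsrValue(true, false, 6) gives 0x46, which is interrupt only with a 1 second timeout. On the chip itself, changing WDE or the timeout bits still has to go through the timed sequence described in the datasheet: set WDCE and WDE first, then write the new value within four clock cycles.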
Interrupt and Reset Mode
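In this mode the first timeout runs the interrupt code and the chip then resets: the hardware clears WDIE when the interrupt executes, so the next timeout causes a reset. To use it, set both the interrupt enable bit and the reset enable bit in the WDTCSR register. wdt_enable() already sets WDE, so the last line below is redundant, but it makes the intent explicit. You could be fancy and do this all on one line, but at its simplest it can be this in the setup: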
wdt_enable(timePeriod);
WDTCSR |= (1 << WDIE); // Watchdog Interrupt Enable
WDTCSR |= (1 << WDE); // Watchdog System Reset Enable
Add the interrupt code again to catch the interrupt. After it has executed, the Arduino resets at the next watchdog timeout.
ISR (WDT_vect)
{
// Add your own funky code here (but do keep it short and sweet).
}
You can use this mode to capture troubleshooting information, a feature that I will be exploring in a future article.
The Watchdog with Bootloaders: The Loop of Death
It’s worth mentioning that nearly all the Arduinos you will develop with operate with a bootloader. This is tremendously useful and allows you to upload code to the Arduino over a comms link, like serial or USB. Most bootloaders wait a few seconds at startup to see if anything is trying to talk to them to upload new code. If nothing is, the bootloader passes control to your code and your code starts up.
However, if your code has enabled the watchdog and the Arduino resets, you can have a situation where the bootloader wait is longer than the watchdog timeout, so the watchdog resets the Arduino again before your code ever runs. Alternatively, if the timeout is longer than the bootloader wait, your system will work fine until you try to upload new code: the upload may take longer than the timeout, the Arduino resets part-way through, and so you can never upload new code. As you can see, you are in a bit of a pickle. You are in the Bootloader Watchdog Loop of Death.
To escape the Loop of Death, you have to go back to basics and re-burn the bootloader with an in-circuit programmer.
To avoid the Loop of Death, just use a bootloader that is compatible with a watchdog (one that disables the watchdog first before doing its bootloadery stuff). Fortunately, this is nearly every modern bootloader you encounter, so you may never have to navigate the Loop of Death. But at least you are now “in the know”.
NOTE: The Sleepy Pi bootloaders are fully compatible with the use of the watchdog.
*I was once at Microsoft UK when they were launching Windows Embedded and they were telling a tale of a series of petrol pumps that stopped working when the clocks went forward. The Windows XP they were using had popped up a dialogue asking the user if they wanted to apply daylight savings! I’ve never been a fan of Microsoft, but I can confirm their canteen is damn good. Back in the day, Microsoft and Apple were pretty much neck and neck in their canteens. Of course, there was always Aston Martin’s “Food Theatre”. You can read more about these in my bestselling book: Works Canteens and I : A personal journey through the World’s Canteens.
Nice tutorial; clearly written. However, I have been unsuccessful implementing your Interrupt Only advice on the UNO. There has to be something more because, with your partial code, the System Resets after the ISR.
Could you present the entire code for Interrupt Only? Thanks.