There was some recent news that an Arduino programmed to count up to 1,000,000,000, speaking each count, had malfunctioned after 10 years, not reaching the one billion count. This had many people (on Reddit) conjecturing how long it would take an Arduino Uno R3 to count to one billion if it didn’t have to output each value.
This seemed to me to be a good test of AI: a simple program to write and test. I expected AI to write the program in less time than it would take to run. As I was interested in running LLMs locally on my M4 iMac, I installed LM Studio and tried both the Gemma 3 12B and DeepSeek R1 models. Each model produced a program quickly, along with, especially in Gemma's case, a lengthy explanation of the thought (?) behind the design. In both cases, the loop looked like this (Gemma 3 shown):
// Count from 1 to 1000000000 (one billion)
for (unsigned long i = 1; i <= 1000000000; ++i) {
  // Do nothing in the loop. The time is spent just incrementing i.
  // Adding a delay here would defeat the purpose of measuring the counting time.
}
Unfortunately, this program won’t work correctly because the compiler optimizes the loop out of existence: the value of i is never used, so removing the loop does not change anything the program can observe. Then I tried ChatGPT (version 5). Here is its entire solution:
void setup() {
  Serial.begin(9600);
  while (!Serial) ; // Wait for Serial to be ready (on some boards)
  Serial.println("Starting count...");
  unsigned long start = millis(); // Get start time
  volatile unsigned long counter = 0;
  for (unsigned long i = 1; i <= 1000000000UL; i++) {
    counter = i; // Prevent optimization
  }
  unsigned long end = millis(); // Get end time
  unsigned long elapsed = end - start;
  Serial.print("Finished counting to ");
  Serial.println(counter);
  Serial.print("Elapsed time: ");
  Serial.print(elapsed);
  Serial.println(" milliseconds");
  Serial.print("Elapsed time: ");
  Serial.print(elapsed / 1000.0);
  Serial.println(" seconds");
}
void loop() {
  // Do nothing
}
Because counter is declared volatile, the loop must be executed and cannot be optimized away. Declaring a variable volatile tells the compiler that it might be accessed from another execution context, such as an interrupt service routine, so the compiler cannot skip or combine any of the reads and writes that the source code asks for. So i must be incremented 1,000,000,000 times, with the value stored into counter on each increment. Execution time is 1320 seconds, which is 22 minutes.
So I thought, what if we make the loop variable volatile, simplifying the program and potentially making it faster? The new loop:
volatile unsigned long counter = 0;
for (counter = 0; counter < 1000000000UL; counter++) {
}
Execution time is now 2201 seconds (over 36 minutes). So why is it slower? In the ChatGPT program the loop variable, i, can be kept in machine registers for optimum performance, so each iteration only needs to store i into counter. In the second program the loop variable itself is volatile, so it must live in memory. Because the microcontroller only performs arithmetic in registers, counter must be loaded into registers, incremented, and then stored back into memory on every iteration. The extra load increases execution time by 67%.
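To put the two timings in per-iteration terms, here is a minimal sketch of the arithmetic. It is plain C++ for a desktop compiler, not an Arduino sketch. It assumes the Uno R3 runs at its usual 16 MHz clock, and the function and constant names are purely illustrative.

```cpp
#include <cassert>

// Assumption: the Uno R3's microcontroller is clocked at 16 MHz.
const double kUnoClockHz = 16000000.0;
// Loop iterations in each measured run (one billion).
const double kIterations = 1000000000.0;

// Average number of clock cycles spent per loop iteration,
// given the measured run time in seconds.
double cyclesPerIteration(double seconds) {
    assert(seconds >= 0.0);
    return seconds * kUnoClockHz / kIterations;
}

// Ratio of two run times; 1.67 means the slower run took 67% longer.
double slowdown(double slowerSeconds, double fasterSeconds) {
    assert(fasterSeconds > 0.0);
    return slowerSeconds / fasterSeconds;
}
```

Under that clock assumption, cyclesPerIteration(1320) comes to about 21.1 cycles for the ChatGPT loop and cyclesPerIteration(2201) to about 35.2 cycles for the volatile loop variable. slowdown(2201, 1320) is about 1.67, which is where the 67% figure comes from.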
However, we can still beat the ChatGPT program by knowing something about the microcontroller architecture. The Arduino UNO uses an 8-bit microcontroller, which means it can only add 8 bits at a time, so the 32-bit increment must be done as a sequence of four instructions. We can improve the execution time, at the expense of a much more complicated program, by incrementing only the least significant byte and then, after every 256 increments, incrementing the upper three bytes. We use a union to separate out the least significant byte. Note that this relies on little-endian byte ordering and is considered bad form, but sometimes whatever gets the job done is what’s needed!
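union {
  uint32_t count;        // full 32-bit counter
  uint8_t lowOrderCount; // aliases the least significant byte (little-endian only)
} volatile cnt;
#define MAXCOUNT (1000000000UL)
#define UPPER_END (MAXCOUNT & ~0xffUL)           // target with the low byte cleared
#define LOWER_END ((uint8_t)(MAXCOUNT & 0xffUL)) // low byte of the target
do { // count up
  while (++cnt.lowOrderCount != 0)
    ; // least significant byte: 256 increments until it wraps back to 0
  cnt.count += 256; // carry into upper bytes
} while (cnt.count != UPPER_END);
for (; cnt.lowOrderCount < LOWER_END; cnt.lowOrderCount++)
  ; // remaining increments that do not fill a whole block of 256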
This takes 510 seconds (8.5 minutes).
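The byte-wise counting scheme can be sanity-checked on a desktop compiler. The sketch below is my own portable restatement, not the program that was timed. It keeps the low byte and the upper bits in separate variables instead of a union, so it does not depend on byte order. It also adds a step counter purely for verification, and the name countBytewise is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Host-side sketch: counts from 0 to target by incrementing only a low byte
// and carrying into the upper bits once every 256 steps.
// Returns the number of single increments performed, which should equal target.
uint32_t countBytewise(uint32_t target) {
    const uint32_t upperEnd = target & ~UINT32_C(0xff);            // target with the low byte cleared
    const uint8_t lowerEnd = static_cast<uint8_t>(target & 0xff);  // low byte of the target

    uint8_t low = 0;     // least significant byte of the counter
    uint32_t upper = 0;  // upper 24 bits, kept as a multiple of 256
    uint32_t steps = 0;  // bookkeeping for the check only; a timed loop would omit it

    while (upper != upperEnd) {
        do {
            ++low;
            ++steps;
        } while (low != 0);  // 256 increments until the byte wraps back to 0
        upper += 256;        // carry into the upper bytes
    }
    while (low != lowerEnd) {  // remaining increments below a full block of 256
        ++low;
        ++steps;
    }
    assert((upper | low) == target);  // the two parts together hold the target
    return steps;
}
```

For example, countBytewise(1000) performs three full blocks of 256 increments followed by 232 more, for 1000 in total. Because the outer loop here tests before running a block, targets below 256 also work. One billion is 0x3B9ACA00 in hexadecimal, so its low byte is zero and the trailing partial block is empty for that particular target.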
If we need to perform arithmetic on large numbers, a 32-bit microcontroller is a big win. The counter can be incremented in a single instruction, usually in one clock cycle, and 32-bit microcontrollers typically run at faster clock speeds as well. For instance, on the new Arduino UNO R4 the ChatGPT program takes 105 seconds, while an Arduino with a SAMD microcontroller takes 126 seconds. Even faster, an RP2040 takes 48 seconds and an ESP32 takes 42 seconds.
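As a rough way to compare the boards, the reported run times can be turned into speedup factors relative to the Uno R3. This is only a sketch of the arithmetic in desktop C++. The struct and function names are illustrative, and the numbers are just the timings quoted in this post for the ChatGPT program.

```cpp
#include <cassert>

// Run time in seconds of the ChatGPT program on each board, as reported above.
struct BoardTime {
    const char* board;
    double seconds;
};

const BoardTime kMeasured[] = {
    {"UNO R3", 1320.0},  // baseline
    {"UNO R4", 105.0},
    {"SAMD", 126.0},
    {"RP2040", 48.0},
    {"ESP32", 42.0},
};

// How many times faster a run is than the UNO R3 baseline.
double speedupOverUnoR3(double seconds) {
    assert(seconds > 0.0);
    return kMeasured[0].seconds / seconds;
}
```

Calling speedupOverUnoR3 on each entry gives roughly 12.6 for the UNO R4, 10.5 for the SAMD board, 27.5 for the RP2040, and 31.4 for the ESP32.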