Before potentially diving into writing a Far Inside SAMD21 book, I decided to spend some time just looking at the architecture and at what Arduino does with it. I found that there is a lot of code overhead, so the potential savings of coding to “bare metal” are very much reduced. A simple program that does nothing but toggle a pin takes 902 bytes of ROM and 22 bytes of RAM on the Arduino Nano Every, but 12080 bytes of ROM and 2204 bytes of RAM on an Arduino Nano 33 IoT. Although I haven’t spent much time comparing them, I think the generated code is significantly larger for most tasks on the ARM, so even though its microcontroller has far more ROM and RAM than the Nano Every, the advantage is less than it might first appear. The GCC compiler for AVR seems to be much more sophisticated at performing optimizations.
I’ve looked at the shortest possible pulses that can be generated with the GPIO (General Purpose I/O) pins. Using the Arduino library, this would be, for instance:
digitalWrite(13,1);
digitalWrite(13,0);
This generates a pulse 1.2µs wide on the Nano Every, which happens to be less than half the width obtainable on an Uno, but 1.5µs wide on the Nano 33 IoT, even though the clock speed of the SAMD21 microcontroller is three times that of the ATmega4809 in the Nano Every (48MHz versus 16MHz). A lot of “stuff” happens when these functions are used, and that overhead is greater on the Nano 33 IoT.
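To give a feel for the kind of “stuff” involved, here is a simplified, hypothetical sketch of what a portable digitalWrite() has to do on every call: validate the pin number, look up the port register and bit mask in a table, and then update the register. It is not the Arduino core's code; the names `PinDesc`, `pin_table`, `fake_port_a` and `sketch_digitalWrite`, and the pin-to-bit mapping, are assumptions, and an ordinary variable stands in for the hardware register so the sketch runs on a desktop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pin description: which output register a pin lives in and
   which bit it occupies. Real cores keep richer per-pin tables. */
typedef struct {
    volatile uint32_t *out;   /* port output register for this pin */
    uint32_t mask;            /* bit mask of the pin within that port */
} PinDesc;

/* Stand-in for a hardware port output register (host-side assumption). */
static volatile uint32_t fake_port_a;

/* Only pin 13 is mapped in this sketch; the other entries stay NULL. */
static const PinDesc pin_table[] = {
    [13] = { &fake_port_a, 1u << 17 },
};
#define PIN_COUNT (sizeof pin_table / sizeof pin_table[0])

/* Every call pays for the range check, the table lookup and a
   read-modify-write of the port, all of which direct register access
   avoids. A real core may also disable interrupts around the update
   and turn off PWM on the pin. */
static void sketch_digitalWrite(unsigned pin, int value)
{
    if (pin >= PIN_COUNT || pin_table[pin].out == NULL)
        return;                          /* unknown or unmapped pin */
    const PinDesc *d = &pin_table[pin];  /* table lookup */
    if (value)
        *d->out |= d->mask;
    else
        *d->out &= ~d->mask;
}
```

Each of those steps costs instructions that a hand-written register access never executes.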
It is much faster to control the GPIO registers directly. The AVR microcontrollers have instructions that set or clear individual bits of a register. This reduces the number of instructions needed to generate the pulse to two: one to drive the output high and the next to drive it low. Using my provided Pins.h definitions, this can be done with two statements, each of which compiles into a single machine instruction:
port.digital_13 = 1;
port.digital_13 = 0;
This generates a pulse 62.5ns wide, the width of a single 16MHz system clock cycle.
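Syntax like `port.digital_13` can be provided by overlaying a structure of one-bit fields on a port register. The sketch below is a hypothetical illustration of that idea, not the contents of Pins.h; the names `PortBits`, `b0` to `b7` and `fake_vport` are assumptions, and a plain variable stands in for the register so it runs on a desktop. Bit-field layout is implementation-defined, so it relies on GCC's convention (on AVR and common little-endian hosts) of allocating the first field to bit 0. On the AVR, when such a structure is placed on a register in the low I/O space, avr-gcc can turn a constant store to one field into a single SBI or CBI instruction.

```c
#include <stdint.h>

/* One-bit fields overlaid on an 8-bit port register. uint8_t bit-fields are
   a GCC extension; GCC allocates the first field to bit 0 on these targets. */
typedef union {
    uint8_t byte;                 /* the whole register */
    struct {
        uint8_t b0 : 1, b1 : 1, b2 : 1, b3 : 1,
                b4 : 1, b5 : 1, b6 : 1, b7 : 1;
    } bit;                        /* individual pins */
} PortBits;

/* Stand-in for a port output register (host-side assumption). */
static volatile PortBits fake_vport;
```

Writing `fake_vport.bit.b2 = 1;` then changes only bit 2 of `fake_vport.byte`, leaving the other pins alone.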
The ARM microcontroller's instruction set lacks the traditional microcontroller instructions for altering single bits. Changing a single bit would therefore take a read-modify-write sequence of (at least) three instructions, which causes a problem if an interrupt occurs mid-sequence: if the interrupt handler changes another bit of the same port, that change is lost when the sequence writes back its stale copy. We really need an atomic (non-interruptible) operation.
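The hazard can be simulated on a desktop. In this hypothetical sketch an ordinary variable stands in for the port output register, and the “interrupt” is simply a function called between the read and the write of the read-modify-write sequence; the names `rmw_set_bit` and `isr_sets_bit3` are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*isr_fn)(volatile uint32_t *port);

/* Hypothetical interrupt handler that drives a different pin (bit 3) high. */
static void isr_sets_bit3(volatile uint32_t *port)
{
    *port |= 1u << 3;
}

/* Set one bit with a non-atomic read-modify-write. If `isr` is not NULL it
   is called between the read and the write, mimicking an interrupt that
   arrives mid-sequence. */
static void rmw_set_bit(volatile uint32_t *port, unsigned bit, isr_fn isr)
{
    uint32_t tmp = *port;      /* 1. read the port into a register      */
    if (isr != NULL)
        isr(port);             /*    the interrupt runs here             */
    tmp |= 1u << bit;          /* 2. modify the private copy             */
    *port = tmp;               /* 3. write back, discarding the ISR's change */
}
```

Interrupted mid-sequence, `rmw_set_bit` leaves only bit 17 set and the handler's bit 3 is silently lost; when the handler runs after the sequence completes, both bits end up set.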
So the GPIO interface has not only a register for writing to the port, but also registers for setting, clearing, and toggling bits in the port. This way we can set, clear, or toggle individual pins with a single instruction. We can create the pulse with the following code, which reduces to two consecutive instructions that store into the “set” and “clear” registers:
PORT_IOBUS->Group[0].OUTSET.reg = 1 << 17;
PORT_IOBUS->Group[0].OUTCLR.reg = 1 << 17;
By using macro expansions, this can be cleaned up into something as simple as what was used for the AVR. In any case, the pulse is now about 20ns wide, the width of a single 48MHz system clock cycle.
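A minimal sketch of such macros follows, assuming the `PORT_IOBUS` definition from the SAMD21 device header used in the code above. The pin names `D13_GROUP` and `D13_MASK` and the macros `PIN_HIGH`, `PIN_LOW` and `PIN_TOGGLE` are hypothetical; the group and bit simply match the example above. Since the device header is expected to define `PORT_IOBUS` as a macro, the simplified stand-in inside `#ifndef PORT_IOBUS` is compiled only when that header has not been included first, which lets the macros be checked on a desktop.

```c
#include <stdint.h>

/* Host-only stand-in, compiled only when no device header has defined
   PORT_IOBUS. The register layout is simplified (an assumption). */
#ifndef PORT_IOBUS
typedef struct { volatile uint32_t reg; } MockReg;
typedef struct { MockReg OUT, OUTCLR, OUTSET, OUTTGL; } MockPortGroup;
typedef struct { MockPortGroup Group[2]; } MockPort;
static MockPort mock_port;
#define PORT_IOBUS (&mock_port)
#endif

/* Hypothetical description of pin 13: port group 0, bit 17. */
#define D13_GROUP 0
#define D13_MASK  (1u << 17)

/* Each macro expands to one store into the set, clear or toggle register. */
#define PIN_HIGH(pin)   (PORT_IOBUS->Group[pin##_GROUP].OUTSET.reg = pin##_MASK)
#define PIN_LOW(pin)    (PORT_IOBUS->Group[pin##_GROUP].OUTCLR.reg = pin##_MASK)
#define PIN_TOGGLE(pin) (PORT_IOBUS->Group[pin##_GROUP].OUTTGL.reg = pin##_MASK)
```

The pulse then reads `PIN_HIGH(D13); PIN_LOW(D13);`, much like the AVR version, and expands to the same two register stores.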
If only things were that simple. The instruction set requires up to four additional instructions to load the constants for the register address and bit mask before doing the store. That means five instructions are necessary, taking about 100ns, so the faster ARM microcontroller actually ends up being slower for GPIO!
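The arithmetic behind these figures is just cycles divided by clock frequency. This small helper (its name is illustrative) treats every instruction as a single clock cycle, as the estimates above do; on the Cortex-M0+ some instructions, such as loads from memory, take two cycles, so real sequences can be somewhat longer.

```c
/* Duration, in nanoseconds, of `cycles` clock cycles at `clock_hz`,
   assuming one instruction per cycle. */
static double cycles_to_ns(unsigned cycles, double clock_hz)
{
    return cycles * 1e9 / clock_hz;
}
```

For example, `cycles_to_ns(1, 16e6)` gives 62.5 (the AVR pulse), `cycles_to_ns(1, 48e6)` about 20.8, and `cycles_to_ns(5, 48e6)` about 104.2, the roughly 100ns five-instruction sequence on the SAMD21.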
Also note that while a single statement stores a variable boolean value into a port pin on the AVR, this cannot be done on the ARM. AVR:
port.digital_13 = value;
ARM:
if (value)
    PORT_IOBUS->Group[0].OUTSET.reg = 1 << 17;
else
    PORT_IOBUS->Group[0].OUTCLR.reg = 1 << 17;
Again, the ugliness can be hidden in macros.
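For example, a macro can hide the choice between the set and clear registers. This is a minimal sketch assuming the `PORT_IOBUS` definition from the SAMD21 device header; `PIN_WRITE`, `D13_GROUP` and `D13_MASK` are hypothetical names, and the `#ifndef PORT_IOBUS` stand-in exists only so the macro can be compiled off-target. The `do { ... } while (0)` wrapper lets the macro be used like a single statement, even as the body of an unbraced `if`.

```c
#include <stdint.h>

/* Host-only stand-in, compiled only when no device header has defined
   PORT_IOBUS. The register layout is simplified (an assumption). */
#ifndef PORT_IOBUS
typedef struct { volatile uint32_t reg; } MockReg;
typedef struct { MockReg OUT, OUTCLR, OUTSET, OUTTGL; } MockPortGroup;
typedef struct { MockPortGroup Group[2]; } MockPort;
static MockPort mock_port;
#define PORT_IOBUS (&mock_port)
#endif

/* Hypothetical description of pin 13: port group 0, bit 17. */
#define D13_GROUP 0
#define D13_MASK  (1u << 17)

/* Drive `pin` to a run-time boolean. Choosing the register still costs a
   test and branch, but whichever path runs is a single store, so the
   update itself stays atomic. */
#define PIN_WRITE(pin, value)                                       \
    do {                                                            \
        if (value)                                                  \
            PORT_IOBUS->Group[pin##_GROUP].OUTSET.reg = pin##_MASK; \
        else                                                        \
            PORT_IOBUS->Group[pin##_GROUP].OUTCLR.reg = pin##_MASK; \
    } while (0)
```

With it, the ARM version of the assignment becomes `PIN_WRITE(D13, value);`.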
I have used ARM-based microcontrollers professionally in my former working life, and they do work well. I’m just saying that all is not rosy. The Nano 33 IoT provides the benefit of having WiFi/Bluetooth for about $5 more than the Nano Every, and for about $5 less than a traditional Nano or Uno. But if you don’t need the radio, there really isn’t much to recommend it over the Nano Every.