GPIO on Arduino Nano Every

Using the Arduino library, a pin (here digital pin 13) can be pulsed as fast as the library allows with:

 void setup() {
   pinMode(13, OUTPUT);
 }
 

 void loop() {
   digitalWrite(13, HIGH);
   digitalWrite(13, LOW);
 }

On a Nano, the pulse width is 3.2µs with a period of 6.7µs. On the Nano Every, the pulse width is 0.95µs with a period of 2.15µs. The improvement comes from a cleaner software implementation as well as the Nano Every's faster clock (20MHz, after modifying the boards.txt file, versus 16MHz on the Nano).

GPIO (digital I/O) on the Arduino Nano (ATmega328P) is done with the DDR, PIN, and PORT registers for each of the B, C, and D ports. On the Nano Every (ATmega4809), each port (there are six, A through F) has a larger set of registers:

  • DIR direction control, like the DDR register of the ATmega328P
  • DIRSET set bits of DIR (avoids read-modify-write)
  • DIRCLR clear bits of DIR
  • DIRTGL toggle bits of DIR
  • OUT write to port, much like the PORT register of ATmega328P.
  • OUTSET set bits of OUT
  • OUTCLR clear bits of OUT
  • OUTTGL toggle bits of OUT (like writing to the PIN register of the ATmega328P)
  • IN read the port, like the PIN register of the ATmega328P
  • INTFLAGS interrupt flags for pin change interrupts
  • Eight other registers, one per pin, for configuration, such as pullups.
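
To make the SET/CLR/TGL semantics concrete, here is a small host-side model in C++. It runs on a PC, not on the board, and PortModel and its method names are hypothetical stand-ins rather than anything from the Arduino or avr-libc headers. On the device, each call corresponds to a single store such as PORTE.OUTSET = mask, which is what spares the CPU a read-modify-write of OUT.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model of one port's output latch. PortModel is a hypothetical
// name used only for illustration; it is not device code.
struct PortModel {
    uint8_t out = 0;  // stands in for PORTx.OUT

    // Each method mimics the effect of writing a mask to the matching
    // register: only the bits that are 1 in the mask are affected.
    void outset(uint8_t mask) { out = static_cast<uint8_t>(out | mask); }   // PORTx.OUTSET = mask
    void outclr(uint8_t mask) { out = static_cast<uint8_t>(out & ~mask); }  // PORTx.OUTCLR = mask
    void outtgl(uint8_t mask) { out = static_cast<uint8_t>(out ^ mask); }   // PORTx.OUTTGL = mask
};
```

In this model the other bits of out are never disturbed, which mirrors why the hardware registers are safe to use from both main code and interrupt handlers without masking interrupts around the update.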

In addition, there are “virtual ports”, which sit in the low part of the address space that the CPU's single-bit instructions can reach. VPORTA through VPORTF have these registers, which mirror the port registers of the same name:

  • DIR 
  • OUT
  • IN
  • INTFLAGS
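
The address arithmetic behind the virtual ports can be sketched as follows. This is a host-side illustration, assuming the layout quoted in the Pins.h comments further down (VPORTA.DIR at 0x00, four bytes per virtual port) and the AVR rule that single-bit instructions such as SBI and CBI only reach I/O addresses 0x00 to 0x1F. The names VportReg, vport_address, and bit_addressable are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Register order inside one virtual port (hypothetical enum for illustration).
enum class VportReg : uint8_t { Dir = 0, Out = 1, In = 2, IntFlags = 3 };

// Address of a virtual port register, assuming VPORTA starts at 0x00 and
// each virtual port occupies four consecutive bytes.
constexpr uint8_t vport_address(char port_letter, VportReg reg) {
    return static_cast<uint8_t>((port_letter - 'A') * 4 + static_cast<uint8_t>(reg));
}

// Single-bit instructions (SBI, CBI, SBIS, SBIC) only reach 0x00 to 0x1F.
constexpr bool bit_addressable(uint8_t address) {
    return address <= 0x1F;
}

// Compile-time check: VPORTE.OUT lands at 0x11 under these assumptions.
static_assert(vport_address('E', VportReg::Out) == 0x11, "VPORTE.OUT");
```

Under these assumptions all six virtual ports (24 bytes) fit inside the bit-addressable range, while the full port registers, which live at higher addresses, do not.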

We can access the ports directly with this code on the Nano:

 void setup() {
    DDRB |= (1<<5);
 }
 
 void loop() {
   PORTB |= (1<<5);
   PORTB &= ~(1<<5);
 }

This creates a 125ns pulse with a 500ns period, and program memory use drops by 284 bytes. Because the register names differ, this won’t build natively for the Nano Every. However, there is “ATMEGA328 EMULATION”, a software layer, that does allow it to run, and efficiently as well, although it actually takes more program memory than the first example. The pulse is 50ns wide, exactly one clock period at 20MHz, with a 300ns period. The bit operations on the ATmega4809 seem to be more efficient than those on the ATmega328P.
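
It can help to translate these scope measurements into clock cycles. The helper below is a minimal sketch for that arithmetic; ns_to_cycles is a hypothetical name, and the clock values (16MHz for the Nano, 20MHz for the Nano Every as configured here) are the ones assumed throughout this article.

```cpp
#include <cassert>

// Convert a measured time in nanoseconds to CPU clock cycles.
// clock_mhz is the CPU clock in MHz.
constexpr double ns_to_cycles(double ns, double clock_mhz) {
    return ns * clock_mhz / 1000.0;
}
```

With the figures above, the Nano's 125ns pulse and 500ns period work out to 2 and 8 cycles, and the Nano Every's 50ns pulse and 300ns period to 1 and 6 cycles.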

The native equivalent on the Nano Every uses the virtual ports. Digital 13 corresponds to port E, bit 2. With the virtual ports we have:

 void setup() {
    VPORTE.DIR |= (1<<2);
 }
 
 void loop() {
   VPORTE.OUT |= (1<<2);
   VPORTE.OUT &= ~(1<<2);
 }

Performance is the same, but eliminating the emulation code overhead saves 90 bytes of program memory. Using the regular port registers instead of the virtual ports, as in the next sketch, slows it down a bit and takes 10 more bytes. Frankly, I’m not sure why it’s that many more.

Pulsing a digital output pin, ATmega4809, 20MHz clock.
 void setup() {
    PORTE.DIRSET = (1<<2);
 }
 void loop() {
   PORTE.OUTSET = (1<<2);
   PORTE.OUTCLR = (1<<2);
 }

The pulse width is now 100ns with a period of 400ns.

The Pins.h library I provide in Far Inside The Arduino gives access to the digital pins just as efficiently (same performance) but with greater legibility. I wrote it to be compatible across all AVR parts:

 void setup() {
    ddr.digital_13 = 1;
 }
 
 void loop() {
   port.digital_13 = 1;
   port.digital_13 = 0;
 }

The part of the definition for the ATmega4809 is:

#if defined (__AVR_ATmega4809__)
 // Arduino Nano Every
 

 /* The gcc compiler will compress reads and stores of these registers
  * to single (atomic) instructions because the addresses are in range to
  * do so. It is not obvious that something like VPORTA.DIR |= 1 << 3 will
  * do the same instead of a load/OR/store sequence; however, it does.
  */
 

 struct bits
 {
   /* VPORTA.DIR at 0x00, VPORTA.OUT at 0x01, VPORTA.IN at 0x02, VPORTA.INTFLAGS at 0x03 */
   int digital_2:1;
   int digital_7:1;
   int analog_4:1;
   int analog_5:1;
   int :1;
   int :1;
   int :1;  
   int :1;
   
   /* Skip three bytes to get to Port B's register */
   int :16;
   int :8;
   
   /* VPORTB.DIR at 0x04, VPORTB.OUT at 0x05, VPORTB.IN at 0x06, VPORTB.INTFLAGS at 0x07 */
   int digital_9:1;
   int digital_10:1;
   int digital_5:1;
   int :1;
   int :1;
   int :1;
   int :1;  
   int :1;
   
   /* Skip three bytes to get to Port C's register */
   int :16;
   int :8;
 

   /* VPORTC as above */
   int :1;
   int :1;
   int :1;
   int :1;
   int digital_1:1;   /* PC4 (TX) */
   int digital_0:1;   /* PC5 (RX) */
   int digital_4:1;  
   int :1;
   
   /* Skip three bytes to get to Port D's register */
   int :16;
   int :8;
 

   /* VPORTD as above */
   int analog_3:1;
   int analog_2:1;
   int analog_1:1;
   int analog_0:1;
   int analog_6:1;
   int analog_7:1;
   int :1;  
   int :1;
   
   /* Skip three bytes to get to Port E's register */
   int :16;
   int :8;
   
   /* VPORTE as above */
   int digital_11:1;
   int digital_12:1;
   int digital_13:1;
   int digital_8:1;
   int :1;
   int :1;
   int :1;  
   int :1;
   
   /* Skip three bytes to get to Port F's register */
   int :16;
   int :8;
   
   /* VPORTF as above */
   int :1;
   int :1;
   int :1;
   int :1;
   int digital_6:1;
   int digital_3:1;
   int :1;  
   int :1;
   };
 
 /* These macros are used to access the different register bits.
    For example, port.digital_8 gives VPORTE.OUT bit 3.
    I'm using the same names for these as on the other AVRs. */
 
 #define ddr (*(/*volatile*/ struct bits *)(&VPORTA.DIR))
 #define pin (*(volatile struct bits *)(&VPORTA.IN))
 #define port (*(volatile struct bits *)(&VPORTA.OUT))
 #define intflg (*(volatile struct bits *)(&VPORTA.INTFLAGS))
 #endif
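
The overlay idea can be checked on a PC with a one-byte cut-down version of the struct. This is only a sketch under stated assumptions: PortEBits and raw_byte are hypothetical names, the fields are unsigned so that a set bit reads back as 1, and it relies on the compiler allocating bit-fields starting from the least significant bit, as GCC and Clang do on little-endian targets (the C++ standard leaves this to the implementation).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// One byte of the overlay, with the same field order as the VPORTE group
// in the listing above. Hypothetical host-side type, not part of Pins.h.
struct PortEBits {
    uint8_t digital_11 : 1;  // bit 0
    uint8_t digital_12 : 1;  // bit 1
    uint8_t digital_13 : 1;  // bit 2
    uint8_t digital_8  : 1;  // bit 3
    uint8_t unused     : 4;  // bits 4 to 7
};
static_assert(sizeof(PortEBits) == 1, "overlay must be exactly one byte");

// Copy the struct out as a raw byte so the bit positions can be inspected.
inline uint8_t raw_byte(const PortEBits &bits) {
    uint8_t value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, starting from a zeroed PortEBits, setting digital_13 should leave raw_byte at 0x04 under the bit-order assumption above, matching port E, bit 2.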

Note: this hasn’t been fully tested and verified yet. Use at your own risk!