Alignment and Packing

Here’s a topic relevant to both the new books I’m writing. Consider the following small program:

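char foo[] = {1, 2, 3, 4, 5, 6, 7, 8};
void setup() {
  Serial.begin(9600);
  while (!Serial);  // wait for the serial port to be ready (matters on native-USB boards)
  // Reinterpret four bytes starting at offsets 0, 1, 2 and 3 as a 32-bit integer.
  // Only addresses divisible by four are word aligned, so at most one of these is.
  Serial.println(*(uint32_t *)&foo[0], HEX);
  Serial.println(*(uint32_t *)&foo[1], HEX);
  Serial.println(*(uint32_t *)&foo[2], HEX);
  Serial.println(*(uint32_t *)&foo[3], HEX);
}
void loop() {
}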

If you run this on an AVR-based Arduino board, like an UNO, it will display:

4030201
5040302
6050403
7060504

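This shows that integers (in this case 32-bit integers) can be located at any address on the AVR, an 8-bit processor that fetches multi-byte values one byte at a time. It also shows that the byte ordering is “little-endian”: the byte at the lowest address is the least significant. However, if you run this on a SAMD21-based Arduino, you get this output: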
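The byte order can also be checked directly by storing a known value and looking at its bytes in memory. The following is a minimal portable C++ sketch, not an Arduino sketch; `is_little_endian` is a hypothetical helper name. It uses `memcpy` so the check itself never performs an unaligned or type-punned load.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: store a known 32-bit value and inspect its bytes in
// memory order. On a little-endian processor (the AVR, the SAMD21, x86)
// the least significant byte, 0x01, is stored first.
bool is_little_endian() {
  const std::uint32_t value = 0x04030201;
  std::uint8_t bytes[sizeof value];
  std::memcpy(bytes, &value, sizeof value);
  return bytes[0] == 0x01;
}
```

On a little-endian machine `bytes` holds 01 02 03 04, which is exactly why the UNO prints `4030201` for the first four bytes of `foo`.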

4030201

And the microcontroller appears to “hang”. The problem is that the SAMD21’s ARM Cortex-M0+ core cannot access 32-bit words unless they are aligned (their addresses must be divisible by four); an unaligned word load raises a HardFault, which stops the program. The compiler and linker ensure that integers and floats are always placed at aligned addresses, adding padding as necessary, so normally everything works out fine (except for programs like the one above!).
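Whether an address is aligned for a type comes down to the remainder when the address is divided by that type’s alignment. Here is a small C++11 sketch; `is_aligned` and `aligned_bytes` are hypothetical names, and the `alignas(4)` on the array is an assumption added so the result is predictable (an ordinary `char` array like `foo` is not guaranteed to start on a word boundary).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p is suitably aligned for an object of type T.
template <typename T>
bool is_aligned(const void *p) {
  return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// alignas(4) forces this array onto a 4-byte boundary, so element 0 is
// word aligned and elements 1, 2 and 3 are not.
alignas(4) std::uint8_t aligned_bytes[8] = {1, 2, 3, 4, 5, 6, 7, 8};
```

With `alignof(std::uint32_t)` equal to 4 (as on the SAMD21), `is_aligned<std::uint32_t>(&aligned_bytes[1])` is false, which corresponds to the second load in the original program, the one that faults.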

Oh my! Since I originally wrote this post, there seems to have been a change in the compiler’s code generation. It appears that if the compiler can’t vouch for an address being word aligned, it generates code that reads one byte at a time. So the above program now works!
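Rather than depend on how the compiler treats a possibly misaligned pointer cast, a common portable idiom in C and C++ is to copy the bytes into a properly aligned variable with `memcpy`, which has no alignment requirement. This is a general technique, not something the original program used, and `read_u32` is a hypothetical name. GCC and Clang typically turn a fixed 4-byte `memcpy` into a single load where the target permits unaligned access, or into byte loads where it doesn’t.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a uint32_t from any address without performing
// an unaligned word load. The bytes are copied into an aligned local first.
std::uint32_t read_u32(const void *p) {
  std::uint32_t value;
  std::memcpy(&value, p, sizeof value);
  return value;  // interpreted in the processor's native byte order
}

// On the board, a line from the original setup() could become:
//   Serial.println(read_u32(&foo[1]), HEX);
```

Because the value is assembled in native byte order, this prints the same `5040302` on both the UNO and the SAMD21, since both are little-endian.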

However, consider the following situation. We usually use structures (struct) to show how memory is used in hardware interfaces or software protocols. In this case, consider a device that sends our microcontroller a data packet that is five bytes long. It consists of a single byte that represents a command, followed by an unsigned four-byte value (thankfully little-endian) that is a data argument. We could declare this:

struct FOO {
  uint8_t cmd;
  uint32_t data;
} packet;

We could get this data from the serial port with, say, Serial.readBytes((char *)&packet, sizeof(struct FOO)). Everything would work fine on the UNO, but if we tried it on a Nano 33 IoT (which has the SAMD21), we would find that while it wouldn’t crash, it wouldn’t work either.

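When we compile for the Nano 33 IoT, the compiler adds three bytes of fill after the command byte so that the data field is aligned to an address divisible by four. (The structure itself always starts at an address divisible by four.) The size of the structure becomes eight bytes instead of five, even though we specified the field sizes explicitly. So readBytes() asks for eight bytes, the first three data bytes land in the fill, and data ends up holding the last data byte plus whatever arrives after the packet.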
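The padding can be confirmed with `sizeof` and `offsetof`. The sketch below is plain C++ that compiles on a desktop machine; it assumes a target where `uint32_t` has 4-byte alignment (true for the SAMD21 and for typical desktop ABIs) and a little-endian byte order. `simulate_read` is a hypothetical stand-in for `Serial.readBytes()`, and the zeros after the packet stand in for whatever bytes arrive next.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct FOO {
  std::uint8_t cmd;
  std::uint32_t data;  // the compiler inserts 3 fill bytes before this member
};

// Hypothetical stand-in for Serial.readBytes((char *)&packet, sizeof(struct FOO)):
// copy sizeof(FOO) bytes from the incoming stream straight into the struct.
FOO simulate_read(const std::uint8_t *stream) {
  FOO packet;
  std::memcpy(&packet, stream, sizeof(FOO));
  return packet;
}

// A five-byte packet (command 0x01, data 0x12345678 in little-endian order)
// followed by three bytes of whatever comes next (zeros here).
const std::uint8_t stream[8] = {0x01, 0x78, 0x56, 0x34, 0x12, 0x00, 0x00, 0x00};
```

Here `offsetof(FOO, data)` is 4 and `sizeof(FOO)` is 8, so after `simulate_read(stream)` the `cmd` field is correct but `data` holds `0x00000012` instead of `0x12345678`.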

The solution to the problem is to force the structure to be packed, which means that no fill bytes will be added and integers won’t necessarily be aligned to word boundaries. The struct definition becomes:

struct __attribute__((__packed__)) FOO {
   uint8_t cmd;
   uint32_t data; 
} packet;

Now the structure is five bytes, as it should be. By the way, adding this attribute when compiling for the UNO has no effect, because the AVR compiler gives every type one-byte alignment, so its structures are always packed anyway.
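Repeating the layout check on the packed version shows the effect of the attribute. This sketch assumes GCC or Clang (both accept `__attribute__((__packed__))`) and, for the decoded value, a little-endian target; `decode` is a hypothetical helper that copies one packet into the struct the way `readBytes()` would.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct __attribute__((__packed__)) FOO {
  std::uint8_t cmd;
  std::uint32_t data;  // starts at offset 1: no fill, possibly misaligned
};

static_assert(sizeof(FOO) == 5, "packed struct should be exactly five bytes");

// Hypothetical helper: copy one five-byte packet into the packed struct.
// Read the member directly (result.data) rather than through &result.data,
// since a plain uint32_t pointer to it may be misaligned.
FOO decode(const std::uint8_t (&packet)[5]) {
  FOO result;
  std::memcpy(&result, packet, sizeof result);
  return result;
}
```

With the fill gone, `offsetof(FOO, data)` is 1 and the five received bytes map one-to-one onto the fields.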

So what happens when we access packet.data, which is no longer on a word boundary? In this case the compiler knows the member may be misaligned, so it generates code that fetches the value one byte at a time and combines the bytes into a word. Access is much slower this way, but at least it works.
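The byte-at-a-time access the compiler generates is roughly equivalent to assembling the value by hand. Writing it out explicitly is an alternative to packing that the post itself doesn’t use: parse the raw bytes into an ordinary, unpacked struct, so the compiler’s layout choices no longer matter and the wire format is spelled out as little-endian on any processor. `Packet` and `parse_packet` are hypothetical names in this minimal sketch.

```cpp
#include <cassert>
#include <cstdint>

// Ordinary struct: the compiler may pad it freely, because raw bytes are
// never copied into it directly.
struct Packet {
  std::uint8_t cmd;
  std::uint32_t data;
};

// Parse a five-byte packet: byte 0 is the command, bytes 1-4 hold the data
// argument in little-endian order. Each byte is fetched individually and
// shifted into place, so no aligned word access is needed.
Packet parse_packet(const std::uint8_t *buf) {
  Packet p;
  p.cmd = buf[0];
  p.data = static_cast<std::uint32_t>(buf[1])
         | static_cast<std::uint32_t>(buf[2]) << 8
         | static_cast<std::uint32_t>(buf[3]) << 16
         | static_cast<std::uint32_t>(buf[4]) << 24;
  return p;
}
```

On the board, you would read five bytes into a buffer with `Serial.readBytes()` and pass that buffer to `parse_packet()`.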