Locked (Atomic) Register read/write

Question

I'm coding something that uses direct control of the GPIO. There are some good resources around for this, such as http://elinux.org/RPi_Low-level_peripherals#GPIO_hardware_hacking ; the process involves open("/dev/mem") followed by an mmap operation that maps the desired physical address into your virtual address space. You then read section 6 of http://www.raspberrypi.org/wp-content/uploads/2012/02/BCM2835-ARM-Peripherals.pdf to find out how the I/O pins are controlled.
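A minimal sketch of that mapping step, under stated assumptions: the GPIO block is taken to sit at physical address 0x20200000 (the BCM2835 value used on the elinux.org page; other SoC revisions differ), and the helper names `map_registers` and `open_gpio` are invented here for illustration.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: BCM2835 (original Raspberry Pi) physical GPIO base. */
#define GPIO_PHYS_BASE 0x20200000
#define GPIO_MAP_LEN   4096

/* Map 'len' bytes at 'offset' of an already open descriptor and
 * return them as 32-bit registers, or NULL on failure. */
volatile uint32_t *map_registers(int fd, off_t offset, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
    return (p == MAP_FAILED) ? NULL : (volatile uint32_t *)p;
}

/* Open /dev/mem (normally requires root) and map the GPIO register page. */
volatile uint32_t *open_gpio(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) {
        perror("open /dev/mem");
        return NULL;
    }
    volatile uint32_t *regs = map_registers(fd, GPIO_PHYS_BASE, GPIO_MAP_LEN);
    if (regs == NULL)
        perror("mmap");
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    return regs;
}
```

The returned pointer addresses the first GPIO register (GPFSEL0), so register offsets from the datasheet are divided by 4 to index it.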

To change the function of a pin (input, output, or various special functions) you modify the 3-bit fields in the GPFSELx I/O registers (000 = input, 001 = output, for instance). These modifications compile to ordinary load and store operations. For example, to change GPIO0 to input: *(regptr) &= ~7; which compiles to something like

    ldr     r2, [r3, #0]     ; r2 = *ptr (load r2 from I/O register)
    bic     r2, r2, #7       ; r2 &= ~7
    str     r2, [r3, #0]     ; *ptr = r2 (store r2 to I/O register)

The problem is this: if an interrupt occurs between the load and store, and another process or ISR modifies the same I/O register, the store operation (based on a stale read into r2) will revert the effects of that other operation. So changing these I/O registers really needs to be done with an atomic (locked) read/modify/write operation. The examples I've seen do not use a locked operation.
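To make the hazard concrete, here is a sketch of the plain (unlocked) field update for an arbitrary pin. It relies on the datasheet layout of ten 3-bit fields per GPFSELn register; the function and enum names are made up for illustration, and `gpio` is assumed to point at the start of the mapped GPIO block.

```c
#include <assert.h>
#include <stdint.h>

/* Function-select codes from the BCM2835 datasheet. */
enum { GPIO_FSEL_INPUT = 0, GPIO_FSEL_OUTPUT = 1 };

/* Each GPFSELn register holds ten 3-bit fields, so pin p lives in
 * register p / 10 at bit offset (p % 10) * 3. */
unsigned gpfsel_index(unsigned pin) { return pin / 10; }
unsigned gpfsel_shift(unsigned pin) { return (pin % 10) * 3; }

/* Plain read-modify-write of one function-select field (NOT atomic). */
void gpio_set_function(volatile uint32_t *gpio, unsigned pin, unsigned fsel)
{
    assert(pin < 54 && fsel < 8);
    volatile uint32_t *reg = gpio + gpfsel_index(pin);
    unsigned shift = gpfsel_shift(pin);

    uint32_t v = *reg;               /* load */
    v &= ~(UINT32_C(7) << shift);    /* clear the 3-bit field */
    v |= (uint32_t)fsel << shift;    /* insert the new code */
    *reg = v;                        /* store: any change made to *reg
                                        since the load is overwritten */
}
```

Nine other pins share each register, so two unrelated programs touching different pins can still collide in the window between the load and the store.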

Since these I/O registers are generally changed only when setting something up, it's unlikely that problems will occur, but 'never' is always better than 'unlikely'. Also, if you have an application where you are bit-bashing to emulate an open-collector output, then (as far as I can tell) this involves programming the output to 0 and then switching it between output (for low) or input (for off/high). So in that case there would be frequent mods to these I/O registers, and unsafe modifications would be far more likely to cause a problem.

So, there's probably an ARM 'compare and set' or similar operation that can be used here. Can anyone point me to it, and explain how to use it from C code?

[Note: nothing special is needed when you have programmed an I/O as output and are just changing it from 0 to 1 or vice versa, since there is one I/O register you write to in order to set selected bits to 1 and another to clear selected bits to 0. No read-modify-write is needed for this operation, so there is no hazard from interrupts.]
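For comparison, a sketch of that write-only path. The byte offsets 0x1C (GPSET0) and 0x28 (GPCLR0) are taken from the BCM2835 datasheet; the function names are hypothetical, and `gpio` is assumed to point at the start of the mapped GPIO block.

```c
#include <assert.h>
#include <stdint.h>

/* Word offsets from the start of the GPIO block:
 * GPSET0 is at byte offset 0x1C, GPCLR0 at 0x28. */
enum { GPSET0 = 0x1C / 4, GPCLR0 = 0x28 / 4 };

/* Drive an output pin high. Only the 1 bits in the written word have
 * any effect, so a single store is enough and no load is needed. */
void gpio_set_high(volatile uint32_t *gpio, unsigned pin)
{
    assert(pin < 54);
    gpio[GPSET0 + pin / 32] = UINT32_C(1) << (pin % 32);
}

/* Drive an output pin low through the separate clear register. */
void gpio_set_low(volatile uint32_t *gpio, unsigned pin)
{
    assert(pin < 54);
    gpio[GPCLR0 + pin / 32] = UINT32_C(1) << (pin % 32);
}
```

Because each call is one store that names only its own pin, an interrupt before or after it cannot cause another writer's bits to be lost.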

Maybe I didn't understand this correctly but since you open `/dev/mem` it seems that your code is userspace code. I don't think that in any modern OS one has to be careful about interrupts changing registers values in userspace code. I believe that this wouldn't be a problem even in kernel space code since Linux restores all the registers when interrupt handler finishes its job. — Krzysztof Adamski, Apr 13 '13 at 17:51
My understanding is that the load/store goes to a physical register via the VM mapping set up by mmap (an I/O register, not a CPU register). In this case there is no reason that another process, or a device driver, can't be doing the same thing concurrently and modifying the same register. (I assume it's modifying a different set of bits in the reg, or clearly we have bigger problems). There is no save/restore of I/O registers as there is for processor registers. — greggo, Apr 13 '13 at 18:23
I've edited a bit to clarify 'I/O register' as opposed to r2 etc. — greggo, Apr 13 '13 at 18:28
I can see your point now. It's more of a preemption than interrupt handling problem, though. Using atomic operations would help at least when two processes are trying to set different bits at the same time. — Krzysztof Adamski, Apr 14 '13 at 10:34
ldrex/strex does not work on uncached memory. The exclusive monitor relies on the caches. In fact, it used to be possible to lock up the CPU hard if you attempted that on a Cortex-A9 SMP system, for example. — thinkfat, Oct 17 '18 at 14:51

greggo · Answer 1 · 2013-04-13T21:48:37.097

I looked into this. ARM has the 'ldrex' and 'strex' instructions; strex returns a fail result if exclusivity is lost (or may have been lost) since the ldrex, which includes a context switch (or another processor modifying the same register in a multi-processor environment). So it can be done using those: if the strex fails you loop back and redo the operation (with a fresh ldrex).

ref: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0008a/ch01s02s01.html

The routines below seem to work on the Raspberry Pi, in that they generate the assembler I was expecting and the effect on the bits when I use them is as expected. I haven't verified that they protect against the context-switch issue. Note that these are inlines rather than functions, so they should be put in a header file.

[ EDIT: This does not work for the purpose discussed, it seems it's disallowed somehow. If I use these routines where *addr is an ordinary variable, it works fine. When I use it where *addr is pointed to a mapped GPIO register, the process gets a bus error. (When I change the ldrex/strex to ldr/str and disable the do loop, it then works). So it seems that the ARM exclusive monitor is not able to, or is not set up to, function on memory-mapped I/O regs, and the question remains open.]

    //
    // Routines to atomically modify 32-bit registers using ldrex and strex.
    //
    //  locked_bic_to_reg( volatile unsigned * addr, unsigned val )
    //                 *addr &= ~val
    //  locked_or_to_reg( volatile unsigned * addr, unsigned val )
    //                 *addr |= val
    //  locked_insert_to_reg( volatile unsigned * addr, unsigned val, int width, int pos )
    //           insert 'width' lsbs of 'val' into *addr, with the lsb at bit 'pos'.
    //           Caller must ensure 1 <= width <= 32 and 0 <= pos <= 32-width
    //
    // In each asm block 'fail' is an early-clobber output ("=&r"), because strex
    // requires its status register to differ from its address and data registers.
    // The "memory" clobber tells the compiler that *addr is modified.
    //
    static inline void
    locked_bic_to_reg( volatile unsigned * addr, unsigned val )
    {
        int fail;
        do{
            asm volatile ("ldrex r0,[%1]\n"
               "   bic r0,r0,%2\n"
               "   strex %0,r0,[%1]": "=&r"(fail) : "r"(addr), "r"(val): "r0", "memory" );
        }while(fail!=0);
    }

    static inline void
    locked_or_to_reg( volatile unsigned * addr, unsigned val)
    {
        int fail;
        do{
            asm volatile ("ldrex r0,[%1]\n"
               "   orr r0,r0,%2\n"
               "   strex %0,r0,[%1]": "=&r"(fail) : "r"(addr), "r"(val): "r0", "memory" );
        }while(fail!=0);
    }

    static inline void
    locked_insert_to_reg( volatile unsigned * addr, unsigned val, int width, int pos )
    {
        int fail;
        if(width >=32 ) {
            *addr = val;    // assume width = 32, pos = 0: whole register is replaced
        }else{
            unsigned m=(1u<<width)-1;   // unsigned shift avoids overflow when width = 31
            val = (val&m) << pos;   // mask and position
            m <<= pos;

            do{
                asm volatile ("ldrex r0,[%1]\n"
                   "   bic r0,r0,%2\n"    // bic with mask
                   "   orr r0,r0,%3\n"    // or in the new field
                   "   strex %0,r0,[%1]": "=&r"(fail) : "r"(addr), "r"(m), "r"(val): "r0", "memory" );
            }while(fail!=0);
        }
    }
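As a compiler-level alternative to hand-written assembler, the same three operations can be sketched with GCC's `__atomic` builtins. The function names below are invented for illustration. On ARMv6/ARMv7 targets GCC typically lowers these builtins to an ldrex/strex loop, so the limitation on memory-mapped I/O registers described in the EDIT above presumably applies here too; treat this as a sketch for ordinary memory only.

```c
#include <stdint.h>

/* *addr &= ~val as one atomic read-modify-write. */
void atomic_bic(volatile uint32_t *addr, uint32_t val)
{
    (void)__atomic_fetch_and(addr, ~val, __ATOMIC_SEQ_CST);
}

/* *addr |= val as one atomic read-modify-write. */
void atomic_or(volatile uint32_t *addr, uint32_t val)
{
    (void)__atomic_fetch_or(addr, val, __ATOMIC_SEQ_CST);
}

/* Replace the 'width'-bit field at bit 'pos' with the low bits of 'val'.
 * Caller must ensure 1 <= width <= 32 and pos + width <= 32. */
void atomic_insert(volatile uint32_t *addr, uint32_t val,
                   unsigned width, unsigned pos)
{
    uint32_t mask = (width >= 32) ? UINT32_MAX
                                  : ((UINT32_C(1) << width) - 1);
    uint32_t old = __atomic_load_n(addr, __ATOMIC_RELAXED);
    uint32_t desired;
    do {
        desired = (old & ~(mask << pos)) | ((val & mask) << pos);
        /* On failure 'old' is refreshed with the current value and
         * the new word is recomputed from it. */
    } while (!__atomic_compare_exchange_n(addr, &old, desired, 1,
                                          __ATOMIC_SEQ_CST,
                                          __ATOMIC_RELAXED));
}
```

The compare-exchange loop plays the role of the strex retry: the store only succeeds if the word still holds the value the new field was computed from.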

Seems to me this is the kind of thing that should be in the processor-specific .h files, but no .h file under /usr/include or /usr/lib/gcc/arm-linux-gnueabihf/ contains the string 'ldrex'. Maybe a __builtin__ , or one of the kernel headers? — greggo, Apr 13 '13 at 17:16
ldrex/strex are intended for multi-core sharing of resources (shared RAM). swp is traditionally used for single-core locking of a single-core resource. ldrex/strex happens to work as a single-core solution (DEPENDING ON THE CHIP VENDOR), so it is misused. It does appear to work on the Raspberry Pi processor though. — old_timer, Aug 01 '13 at 18:12
