Using the Data Cache on Cortex M7 STM32H7 (Rust)
Without breaking DMA peripherals, that is
This is specifically for the STM32H7S3L8 Nucleo board, but applies to
many Cortex M7 platforms. Using the cortex_m crate,
enabling the cache itself is simple:
#[embassy_executor::main]
async fn main(_spawner: Spawner) {
    let mut cor = cortex_m::Peripherals::take().unwrap();
    cor.SCB.enable_icache(); // Instruction cache, easy.
    cor.SCB.enable_dcache(&mut cor.CPUID); // Bad! Breaks DMA unless the MPU is set up first.
}
Doing this will break everything that uses DMA, though, because CPU reads and writes now hit the cache first and only maybe, eventually, main memory. DMA only reads and writes actual main memory, so we would need to make sure every memory write involving DMA is flushed before starting any transfer. Since that’s very tedious and error-prone, a nicer way is to configure designated memory regions that bypass the caches completely and use these for all DMA purposes. That’s what the Memory Protection Unit (MPU) [1] on the Cortex M7 is for.
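To give a feel for why the manual route is tedious: cache maintenance by address works on whole cache lines, which are 32 bytes on the Cortex M7, so every DMA buffer has to be thought of in terms of the lines it touches. The sketch below is plain arithmetic, not part of any driver; `cache_line_span` and `is_cache_line_exact` are hypothetical helper names made up for illustration.

```rust
/// Data cache line size on the Cortex-M7, in bytes.
const DCACHE_LINE: usize = 32;

/// Hypothetical helper: expand `[addr, addr + len)` outwards to whole cache
/// lines. Returns `(aligned_start, aligned_len)`, i.e. the range a clean or
/// invalidate by address would really affect.
fn cache_line_span(addr: usize, len: usize) -> (usize, usize) {
    let start = addr & !(DCACHE_LINE - 1);
    let end = (addr + len + DCACHE_LINE - 1) & !(DCACHE_LINE - 1);
    (start, end - start)
}

/// Hypothetical helper: true if the buffer starts and ends on cache line
/// boundaries, so maintenance on it cannot touch unrelated neighbouring data.
fn is_cache_line_exact(addr: usize, len: usize) -> bool {
    addr % DCACHE_LINE == 0 && len % DCACHE_LINE == 0
}
```

An 8-byte buffer at 0x24000010 still drags in the full 32-byte line starting at 0x24000000, together with whatever else lives in that line. Keeping all of that straight for every transfer is the bookkeeping the uncached region avoids.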
The first step is to edit the linker script and slice off a piece of RAM into a new region. Yours will probably look different, so this is just a reference, and of course configure the size (which has to be a power of 2) and the alignment according to your needs.
MEMORY
{
    RAM            : ORIGIN = 0x24000000, LENGTH = 256K
    DMA_ACCESSIBLE : ORIGIN = 0x24040000, LENGTH = 32K
}
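The size and alignment constraints come from the MPU itself: on ARMv7-M a region must be a power of two in size, at least 32 bytes, and its base address must be a multiple of its size. A minimal sketch of that rule, handy for sanity-checking candidate values before putting them in the linker script (`is_valid_mpu_region` is a hypothetical helper, not something from cortex_m):

```rust
/// Smallest region size the ARMv7-M MPU supports, in bytes.
const MPU_MIN_REGION: u32 = 32;

/// Hypothetical helper: checks whether `base`/`size` can be described by a
/// single MPU region (power-of-two size, at least 32 bytes, base aligned to size).
fn is_valid_mpu_region(base: u32, size: u32) -> bool {
    // `is_power_of_two()` is false for 0, so the modulo below never divides by zero.
    size.is_power_of_two() && size >= MPU_MIN_REGION && base % size == 0
}
```

With the values above, `is_valid_mpu_region(0x2404_0000, 32 * 1024)` holds, while a 24K region (not a power of two) or a 512K region at the same origin (base not aligned to the size) would be rejected.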
We’ll fill that entire region with one section. The important part
here is the exported symbols __dma_accessible_start and
__dma_accessible_end, which will be accessible in our program so we know
the addresses where the uncached region starts and ends.
SECTIONS
{
    .dma_accessible (NOLOAD) : {
        __dma_accessible_start = .;
        KEEP(*(.dma_accessible));
        . = ALIGN(16384);
        __dma_accessible_end = .;
    } > DMA_ACCESSIBLE
}
Now declare the variables exported from the linker script:
unsafe extern "C" {
    static __dma_accessible_start: u8;
    static __dma_accessible_end: u8;
}
And then enable the MPU and create a region by writing to the
appropriate registers. The important part is the TEX field (0b001, with
the C and B bits cleared), which marks the region as normal,
non-cacheable memory. After this it’s safe to call enable_dcache.
// Credit: Alexandros Liarokapis
unsafe fn set_mpu_dma_region(cor: &mut cortex_m::Peripherals) {
    let dma_accessible_start = unsafe { &__dma_accessible_start as *const u8 as u32 };
    let dma_accessible_end = unsafe { &__dma_accessible_end as *const u8 as u32 };
    let dma_accessible_size = dma_accessible_end - dma_accessible_start;
    unsafe {
        // Memory barrier to serialize accesses
        cortex_m::asm::dmb();
        cor.MPU.ctrl.write(0);
    }
    unsafe {
        configure_uncached_region(&mut cor.MPU, 1, dma_accessible_start, dma_accessible_size);
    }
    unsafe {
        cor.MPU.ctrl.modify(|w| {
            w
            // PRIVDEFENA[2] - Use default memory map as background region.
            | (1 << 2)
            // ENABLE[0] - Enable MPU
            | (1 << 0)
        });
        cortex_m::asm::dsb();
        cortex_m::asm::isb();
    };
}
use cortex_m::peripheral::MPU;

unsafe fn configure_uncached_region(mpu: &mut MPU, region: u32, addr: u32, size: u32) {
    assert!(size.is_power_of_two());
    assert!(addr % size == 0);
    assert!(region <= 7);
    let n = size.ilog2();
    unsafe {
        // choose region
        mpu.rnr.write(region);
        // disable region
        mpu.rasr.modify(|w| {
            w
            // ENABLE[0] - Region enable bit
            & !(1 << 0)
        });
        mpu.rbar.write(addr);
        mpu.rasr.write(
            // XN[28] - Instruction access bit (1 : No Execute)
            (1 << 28)
            // AP[26:24] - Access Permission (011 : Full Access)
            | (0b011 << 24)
            // (TEX(0b001), S(0), C(0), B(0) : Normal memory, non-cacheable, not shareable)
            // TEX[21:19]
            | (0b001 << 19) // <-- this bit is the entire point
            // S[18] - Shareable
            | (0 << 18)
            // C[17] - Cacheable
            | (0 << 17)
            // B[16] - Bufferable
            | (0 << 16)
            // SIZE[5:1] - Actual size = 2^(SIZE + 1) => SIZE = log2(Actual size) - 1 = N - 1
            | ((n - 1) << 1)
            // ENABLE[0] - Region enable bit
            | (1 << 0),
        );
    };
}
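The RASR value is easier to double-check when the bit fiddling is pulled out into a pure function that can be run on the host. This is only a sketch mirroring the fields used above; `rasr_uncached` and `rasr_region_size` are hypothetical names, not part of cortex_m.

```rust
const XN: u32 = 1 << 28; // no instruction fetches from this region
const AP_FULL_ACCESS: u32 = 0b011 << 24; // read/write for privileged and unprivileged code
const TEX_NORMAL_NONCACHEABLE: u32 = 0b001 << 19; // with C = 0 and B = 0
const REGION_ENABLE: u32 = 1 << 0;

/// Hypothetical helper: builds the same RASR value as the register write
/// above for a region of `size` bytes.
fn rasr_uncached(size: u32) -> u32 {
    assert!(size.is_power_of_two() && size >= 32);
    let n = size.ilog2();
    // SIZE[5:1] holds log2(size) - 1.
    XN | AP_FULL_ACCESS | TEX_NORMAL_NONCACHEABLE | ((n - 1) << 1) | REGION_ENABLE
}

/// Hypothetical helper: decodes the region size in bytes from a RASR value.
/// Returns u64 because SIZE = 31 describes a 4 GiB region.
fn rasr_region_size(rasr: u32) -> u64 {
    let size_field = (rasr >> 1) & 0x1F;
    1u64 << (size_field + 1)
}
```

For a 32K region this gives 0x1308001D (SIZE = 14), and decoding that value returns 32768 again, which makes a handy round-trip check against what a debugger shows in the MPU registers.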
Strictly speaking you should invalidate the cache before doing this, but if it’s the first thing the program does there’s no real need.
To allocate a static global buffer in the new uncached region, use
the link_section attribute. Also be sure to sprinkle
#[used] everywhere in case the compiler decides that the
variable is dead code, which can do very funny things.
#[used]
#[unsafe(link_section = ".dma_accessible")]
static mut I2S_DMA_BUFFER: GroundedArrayCell<u32, 2048> = GroundedArrayCell::uninit();
(See the embassy-stm32 I2S example.)
So this driver, for example, ALSO needs to be a global placed in the uncached section, or else nothing will work:
#[used]
#[unsafe(link_section = ".dma_accessible")]
static mut MY_I2S_DRIVER: GroundedCell<I2S<u32>> = GroundedCell::uninit();
That last bug is absolutely nasty and the reason for this note. Hopefully it helps someone some day.
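One way to catch this class of bug early might be a startup check that everything DMA touches really lives inside the uncached region. The sketch below is just the range test; `is_in_region` is a hypothetical helper, and on the target the addresses would presumably come from a pointer to the static and from the two linker symbols (that wiring is an assumption and not shown here).

```rust
/// Hypothetical helper: true if the object occupying `[addr, addr + len)`
/// lies entirely inside the region `[start, end)`.
fn is_in_region(addr: usize, len: usize, start: usize, end: usize) -> bool {
    match addr.checked_add(len) {
        Some(obj_end) => addr >= start && obj_end <= end,
        // The object would wrap around the address space: never valid.
        None => false,
    }
}
```

With a region spanning 0x24040000 to 0x24044000, an 8K buffer at the region start passes, while an object in ordinary RAM at 0x24000100, or one that straddles the end of the region, fails.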
[1] It’s like an MMU, but carefully designed to never enable anything resembling virtual memory in lower-end product lines. To run a real OS you’ll have to pay more.