ENGINEERING BLOG

Building a Custom Edge AI Accelerator

How we achieved 45% faster inference through RISC-V hardware-software co-design.

By IdeaWorksInnovations Engineering Team


At IdeaWorksInnovations, we don't just run models; we build the silicon they run on. In our latest R&D project, we set out to break the memory bottleneck in edge AI inference. The result? A custom Neural Processing Unit (NPU) coupled with a RISC-V core that delivers a 45% performance boost over a standard software implementation.

The Challenge

Running quantized neural networks on standard embedded CPUs often hits a wall. The limit usually isn't raw compute; it's keeping the datapath fed. In a conventional inner loop, the CPU spends thousands of cycles just shuffling data: loading weights, packing them into registers, and only then executing the multiply-accumulates.
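
To make that overhead concrete, here is an illustrative scalar baseline of the kind of loop we mean (a sketch, not our production kernel): every single multiply-accumulate pays for its own loads and sign-extensions.

```c
#include <stdint.h>

/* Illustrative scalar baseline: the CPU itself loads and widens every
 * int8 weight/activation pair before it can perform a single MAC. */
int32_t scalar_dot(const int8_t *weights, const int8_t *acts, int len)
{
    int32_t acc = 0;
    for (int i = 0; i < len; i++) {
        int32_t w = weights[i]; /* load + sign-extend */
        int32_t a = acts[i];    /* load + sign-extend */
        acc += w * a;           /* finally, one MAC   */
    }
    return acc;
}
```

On a simple in-order core, the two loads and the widening dominate the loop body; the arithmetic itself is a single cycle.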

The Solution: Hardware-Software Co-Design

We chose an industrial-grade RISC-V core and extended it with our custom NPU via the APU (Auxiliary Processing Unit) interface.

Architecture Highlights

- An 8x8 systolic array sustaining 64 MACs per cycle
- A DMA engine that streams pre-packed weights into the NPU's vector register file with no CPU load instructions
- Tight coupling to the RISC-V core via the APU interface, with a fence primitive for synchronization

Code Spotlight: Simplicity by Design

We designed our software stack to be as elegant as our hardware. Here is how simple it is to program a matrix multiplication layer with our DMA intrinsics:

// Pre-packed weights are loaded in the background by DMA
// No CPU cycles wasted on "load" instructions!
npu_dma_load(VRF_B_BASE, &weights[tile_idx], 8);

// RISC-V core can do other work, or wait for sync
npu_fence(); 

// Fire the 8x8 Systolic Array
// Computes 64 MACs per cycle
npu_matmul(VRF_A_BASE, VRF_B_BASE, count);
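
To show how these calls compose into a full tile computation, here is a minimal software model of the flow above. The intrinsic signatures and the register-file layout are assumptions inferred from the snippet (with `vrf_a`/`vrf_b` standing in for `VRF_A_BASE`/`VRF_B_BASE`); the stub bodies only model the semantics, where the real versions drive the DMA engine and the systolic array.

```c
#include <stdint.h>
#include <string.h>

#define TILE 8  /* 8x8 systolic array tile */

/* Software model of the assumed NPU state: two vector register files
 * holding one 8x8 int8 tile each, plus an int32 accumulator tile. */
static int8_t  vrf_a[TILE * TILE];
static int8_t  vrf_b[TILE * TILE];
static int32_t acc[TILE * TILE];

/* Stand-ins for the intrinsics in the snippet above. */
static void npu_dma_load(int8_t *vrf, const int8_t *src, int rows)
{
    /* Real HW: background DMA transfer. Modeled here as a plain copy. */
    memcpy(vrf, src, (size_t)rows * TILE);
}

static void npu_fence(void)
{
    /* Real HW: stall until outstanding DMA completes. No-op in the model. */
}

static void npu_matmul(const int8_t *a, const int8_t *b, int n)
{
    /* acc += A x B over one n x n tile; the array does this at 64 MACs/cycle. */
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                acc[i * TILE + j] += (int32_t)a[i * TILE + k] * b[k * TILE + j];
}

/* The programming flow from the snippet, run against the model. */
static void run_tile(const int8_t *a_tile, const int8_t *b_tile)
{
    npu_dma_load(vrf_a, a_tile, TILE);
    npu_dma_load(vrf_b, b_tile, TILE);
    npu_fence();                       /* sync before compute */
    npu_matmul(vrf_a, vrf_b, TILE);
}
```

Because the DMA loads run in the background, a real driver would double-buffer: kick off the load for tile N+1, then issue the matmul for tile N, hiding the memory latency entirely.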

The Results

We benchmarked this on standard edge workloads. The numbers speak for themselves:

Metric             Improvement
Inference Speed    45% faster
Cycle Count        31% reduction
Power Efficiency   Significant savings

This isn't just theory. We have reduced the cycle count from 138k to 95k for standard classification tasks, purely by optimizing how data moves through the silicon.
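
As a quick sanity check, the two headline figures are the same measurement viewed two ways. The helpers below are just that arithmetic, not part of our stack:

```c
/* 138k -> 95k cycles: speedup and cycle reduction from the same data. */
static double speedup_pct(double cycles_before, double cycles_after)
{
    return (cycles_before / cycles_after - 1.0) * 100.0;   /* ~45% faster */
}

static double reduction_pct(double cycles_before, double cycles_after)
{
    return (cycles_before - cycles_after) / cycles_before * 100.0; /* ~31% fewer cycles */
}
```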

Take the Next Step

This design demonstrates how IdeaWorksInnovations solves the toughest edge AI challenges. Whether you need custom IP for your SoC or optimized deployment for your models, we have the expertise to make it happen.

Interested in seeing the full benchmark data?

Discuss how this IP can accelerate your product.

Contact Our Engineering Team