Meta Reality Labs presentation on Data-Driven Processor Core Selection

Another highlight at the Synopsys Virtual Prototyping Day 2025 was the presentation by Anusha Vasan, Performance Architect at Meta Reality Labs. Anusha talked about core benchmarking with Platform Architect to enable informed, data-driven processor core selection in the early architecture phase.

Her key takeaways are:

  • The systematic approach using Synopsys Platform Architect streamlines early architecture exploration, enabling data-driven decisions about core selection, configuration, and memory hierarchy. Anusha was able to run, root-cause, and analyze over 500 experiments in 3–4 months.
  • Both silicon and firmware teams benefit from informed architectural decisions early in the SoC design phase, leading to more efficient and performant SoC designs.

Anusha performed experiments along 3 main dimensions in the design space:

  1. Core Type: evaluate different third-party core models within Platform Architect to assess their performance.
  2. Core Configuration: sweep through core parameters such as cache size, Tightly Coupled Memory (TCM) size, and prefetch capability to understand their impact.
  3. Memory Hierarchy: vary memory latencies (on-chip vs. off-chip) to assess performance trade-offs.

The simulation setup in Synopsys Platform Architect consists of cycle-accurate SystemC core models connected to generic, configurable interconnect, memory, and peripheral components. This provides a flexible environment for running parametric sweeps. Based on this setup, Anusha shared results from her experiments with 3 different workloads.

1. Key results from generic benchmark programs:

  • Mem Copy (Memory-Bound): small cache sizes (32 KB) are sufficient if the code runs from memory with a latency below 13 clock cycles. Performance degrades sharply beyond this point.
  • Matrix Multiplication (Compute-Bound): optimized, cache-friendly code reduces the need for large caches, while unoptimized code benefits from larger caches as memory latency increases.
  • Read Bandwidth: Bandwidth depends on cache hit rates and memory latency. Understanding this helps to analyze contention for system-level memories.

2. Key results from Realistic Workloads focused on Augmented Reality use-cases:

  • Workloads differ in instruction mix, e.g. dominated by floating-point vs. integer operations.
  • For workloads dominated by floating point operations, increasing core frequency yields better performance.
  • For workloads dominated by integer operations, increasing cache size is more beneficial than frequency.
  • This highlights the importance of matching silicon and firmware optimizations to workload characteristics.

3. Key results from Zephyr RTOS Boot:

  • Mapping requests from instruction and data caches to different memories with optimized latencies can improve boot performance.
  • Utilizing prefetch capabilities and strategic buffer/memory placement are key levers to improve firmware performance.

Anusha summarized the following Recommendations & Outcomes from her benchmarking experiments:

  • Optimize cache sizes and memory hierarchy based on workload needs.
  • Select appropriate core frequency for specific workload types.
  • Optimize SRAM/DRAM sizing and placement relative to cores.
  • Optimize Firmware code for cache alignment and prefetching.
  • Place performance-critical code and data into low-latency memory regions.

In case you missed it, Anusha's presentation is available on the Synopsys Virtual Prototyping Day 2025 event website.
