Another highlight at the Synopsys Virtual Prototyping Day 2025 was the presentation by Anusha Vasan, Performance Architect at Meta Reality Labs. Anusha showed how core benchmarking with Platform Architect supports informed processor core selection in the early architecture phase.
- The systematic approach using Synopsys Platform Architect streamlines early architecture exploration, enabling data-driven decisions about core selection, configuration, and memory hierarchy. Anusha was able to run, root-cause, and analyze over 500 experiments in 3–4 months.
- Both silicon and firmware teams benefit from informed architectural decisions early in the SoC design phase, leading to more efficient and performant SoC designs.
Anusha performed experiments along 3 main dimensions in the design space:
- Core Type: evaluate different third-party core models within Platform Architect to assess their performance.
- Core Configuration: sweep through core parameters such as cache size, Tightly Coupled Memory (TCM) size, and prefetch capability to understand their impact.
- Memory Hierarchy: vary memory latencies (on-chip vs. off-chip) to assess performance trade-offs.
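The three dimensions above combine naturally into a full-factorial sweep. The following is a minimal Python sketch of how such an experiment list could be enumerated. All parameter names and values are hypothetical placeholders, not taken from the presentation, and the code does not use any Platform Architect API.

```python
from itertools import product

# Hypothetical sweep values; none of these come from the presentation.
CORE_TYPES = ["core_a", "core_b"]
CACHE_SIZES_KB = [16, 32, 64]
TCM_SIZES_KB = [0, 128]
PREFETCH = [False, True]
MEMORY_LATENCY_CYCLES = [5, 13, 40]


def build_sweep():
    """Return one dict per point in the full-factorial design space."""
    axes = {
        "core_type": CORE_TYPES,
        "cache_kb": CACHE_SIZES_KB,
        "tcm_kb": TCM_SIZES_KB,
        "prefetch": PREFETCH,
        "mem_latency_cycles": MEMORY_LATENCY_CYCLES,
    }
    names = list(axes)
    # product() yields every combination of one value per axis.
    return [dict(zip(names, values)) for values in product(*axes.values())]


if __name__ == "__main__":
    experiments = build_sweep()
    print(f"{len(experiments)} experiments")
    print(experiments[0])
```

Even these small placeholder ranges multiply into dozens of runs, which gives a sense of how a sweep can grow to several hundred experiments.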
The simulation setup in Synopsys Platform Architect consists of cycle-accurate SystemC core models connected to generic, configurable interconnect, memory, and peripheral components. This provides a flexible environment for running parametric sweeps. Based on this setup, Anusha shared results from her experiments with three different workloads.
1. Key results from generic benchmark programs:
- Mem Copy (Memory-Bound): A small cache (32 KB) is sufficient if the code runs from memory with a latency below 13 clock cycles. Performance degrades sharply beyond this point.
- Matrix Multiplication (Compute-Bound): Optimized, cache-friendly code reduces the need for large caches. Unoptimized code benefits from larger caches as the memory latency increases.
- Read Bandwidth: Bandwidth depends on cache hit rates and memory latency. Understanding this helps to analyze contention for system-level memories.
2. Key results from realistic workloads focused on augmented reality use cases:
- Workloads differ in instruction mix, e.g. dominated by floating point vs. integer operations.
- For workloads dominated by floating point operations, increasing core frequency yields better performance.
- For workloads dominated by integer operations, increasing cache size is more beneficial than frequency.
- This highlights the importance of matching silicon and firmware optimizations to workload characteristics.
3. Key results from Zephyr RTOS Boot:
- Mapping requests from instruction and data caches to different memories with optimized latencies can improve boot performance.
- Utilizing prefetch capabilities and strategic buffer/memory placement are key levers to improve firmware performance.
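Several of the results above turn on how cache hit rate and memory latency interact. The textbook average memory access time (AMAT) formula gives a rough first-order feel for that interaction. The sketch below is only an analytical approximation with hypothetical numbers. It is not the cycle-accurate simulation used in the presentation, and it assumes a single blocking access stream with no overlap between misses.

```python
def amat_cycles(hit_rate, hit_cycles, miss_penalty_cycles):
    """Average memory access time: hit time plus miss rate times miss penalty."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_cycles + (1.0 - hit_rate) * miss_penalty_cycles


def read_bandwidth_bytes_per_cycle(bytes_per_access, hit_rate, hit_cycles, miss_penalty_cycles):
    """Sustained read bandwidth for one blocking access stream (assumes no overlap)."""
    return bytes_per_access / amat_cycles(hit_rate, hit_cycles, miss_penalty_cycles)


if __name__ == "__main__":
    # Hypothetical latencies and hit rates, chosen only for illustration.
    for latency in (5, 13, 40):
        for hit_rate in (0.90, 0.99):
            amat = amat_cycles(hit_rate, hit_cycles=1, miss_penalty_cycles=latency)
            bw = read_bandwidth_bytes_per_cycle(4, hit_rate, 1, latency)
            print(f"latency={latency:>2} hit_rate={hit_rate:.2f} "
                  f"AMAT={amat:.2f} cycles bw={bw:.2f} B/cycle")
```

In this model a higher hit rate shrinks the share of accesses that pay the memory latency, which is why a modest cache can be enough when the backing memory is fast.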
Anusha summarized the following Recommendations & Outcomes from her benchmarking experiments:
- Optimize cache sizes and memory hierarchy based on workload needs.
- Select appropriate core frequency for specific workload types.
- Optimize SRAM/DRAM sizing and placement relative to cores.
- Optimize firmware code for cache alignment and prefetching.
- Place performance-critical code and data into low-latency memory regions.
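One way to reason about the frequency and cache recommendations is a first-order runtime model that splits execution into core time, which shrinks as frequency rises, and memory-stall time, which does not. The sketch below is illustrative only. It assumes that miss latency is fixed in nanoseconds (for example, a memory on its own clock), and all workload numbers are hypothetical rather than taken from the presentation. It does not explain why a given instruction mix falls into one regime or the other; it only shows that the two levers act on different terms.

```python
def runtime_us(instructions, cpi_core, freq_mhz, misses, miss_latency_ns):
    """First-order runtime: core time scales with frequency, stall time does not."""
    core_time_us = instructions * cpi_core / freq_mhz  # 1 MHz = 1 cycle per microsecond
    stall_time_us = misses * miss_latency_ns / 1000.0  # convert ns to microseconds
    return core_time_us + stall_time_us


if __name__ == "__main__":
    # Hypothetical workloads: same instruction count, different miss counts.
    workloads = {"core-dominated": 1_000, "stall-dominated": 50_000}
    for name, misses in workloads.items():
        base = runtime_us(1_000_000, 1.0, 500, misses, 100)
        faster_clock = runtime_us(1_000_000, 1.0, 1000, misses, 100)
        # A larger cache is modeled here, as an assumption, by halving the misses.
        bigger_cache = runtime_us(1_000_000, 1.0, 500, misses // 2, 100)
        print(f"{name}: base={base:.0f} us, 2x frequency={faster_clock:.0f} us, "
              f"half the misses={bigger_cache:.0f} us")
```

With these placeholder numbers, doubling the frequency helps most when core time dominates, while cutting misses helps most when stall time dominates.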
In case you missed it, Anusha's presentation is available on the Synopsys Virtual Prototyping Day 2025 event website.
Good job, Anusha Vasan!