Meta Reality Labs presentation on Data-Driven Processor Core Selection

Tim Kogel

Published Jul 14, 2025

Another highlight at the Synopsys Virtual Prototyping Day 2025 was the presentation by Anusha Vasan, Performance Architect at Meta Reality Labs. Anusha talked about core benchmarking with Platform Architect, which uses architecture analysis to enable informed processor core selection in the early architecture phase.

Her key takeaways are:

The systematic approach using Synopsys Platform Architect streamlines early architecture exploration, enabling data-driven decisions about core selection, configuration, and memory hierarchy. Anusha was able to run, root-cause, and analyze over 500 experiments in 3–4 months.
Both silicon and firmware teams benefit from informed architectural decisions early in the SoC design phase, leading to more efficient and performant SoC designs.

Anusha performed experiments along 3 main dimensions in the design space:

Core Type: evaluate different third-party core models within Platform Architect to assess their performance.
Core Configuration: sweep through core parameters such as cache size, Tightly Coupled Memory (TCM) size, and prefetch capability to understand their impact.
Memory Hierarchy: vary memory latencies (on-chip vs. off-chip) to assess performance trade-offs.
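
A parametric sweep over these three dimensions can be pictured as the Cartesian product of the individual parameter axes. The following is a minimal sketch in plain Python, not Platform Architect's own sweep mechanism; the core names, parameter values, and the `build_sweep` helper are all hypothetical and only illustrate how quickly the experiment count grows.

```python
from itertools import product

# Hypothetical sweep axes. Names and values are illustrative assumptions,
# not figures from the presentation.
CORE_TYPES = ["core_a", "core_b"]
CACHE_KB = [16, 32, 64]
TCM_KB = [0, 128]
PREFETCH = [False, True]
MEM_LATENCY_CYCLES = [5, 13, 40]


def build_sweep():
    """Return one dict per experiment: the Cartesian product of all axes."""
    axes = {
        "core": CORE_TYPES,
        "cache_kb": CACHE_KB,
        "tcm_kb": TCM_KB,
        "prefetch": PREFETCH,
        "mem_latency": MEM_LATENCY_CYCLES,
    }
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]


experiments = build_sweep()
print(len(experiments))  # 2 * 3 * 2 * 2 * 3 combinations
```

Even these small, made-up axes yield dozens of configurations per workload, which gives a feel for how a study can reach several hundred experiments.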

The simulation setup in Synopsys Platform Architect consists of cycle-accurate SystemC core models connected to generic, configurable interconnect, memory, and peripheral components. This provides a flexible environment for running parametric sweeps. Based on this setup, Anusha shared results from her experiments with 3 different workloads.

1. Key results from generic benchmark programs:

Mem Copy (Memory-Bound): Small cache sizes (32 KB) are sufficient if the code runs from memory with a latency of less than 13 clock cycles. Performance degrades sharply beyond this point.
Matrix Multiplication (Compute-Bound): Optimized, cache-friendly code reduces the need for large caches. Unoptimized code benefits from larger caches as the memory latency increases.
Read Bandwidth: Bandwidth depends on cache hit rates and memory latency. Understanding this helps to analyze contention for system-level memories.
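
The dependence of read bandwidth on hit rate and memory latency can be illustrated with the textbook average-memory-access-time relation. This is a first-order sketch, not the model used in the presentation: it assumes a blocking core with a single outstanding access, and the function names and the example numbers are hypothetical.

```python
def average_access_cycles(hit_rate, hit_cycles, miss_penalty_cycles):
    """Textbook AMAT: hit time plus miss rate times miss penalty."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_cycles + (1.0 - hit_rate) * miss_penalty_cycles


def read_bandwidth_bytes_per_cycle(bytes_per_access, hit_rate,
                                   hit_cycles, miss_penalty_cycles):
    """Sustained read bandwidth, assuming one outstanding access at a time."""
    return bytes_per_access / average_access_cycles(
        hit_rate, hit_cycles, miss_penalty_cycles)


# Illustrative values only: 4-byte reads, 1-cycle hits, 13-cycle miss penalty.
for rate in (1.0, 0.9, 0.5):
    bw = read_bandwidth_bytes_per_cycle(4, rate, 1, 13)
    print(f"hit rate {rate:.1f}: {bw:.2f} bytes/cycle")
```

Under these assumptions, bandwidth falls as the hit rate drops or the miss penalty grows. A real core with prefetching or multiple outstanding misses would deviate from this simple estimate.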

2. Key results from realistic workloads focused on augmented reality use cases:

Workloads differ in instruction mix, e.g., dominated by floating point vs. integer operations.
For workloads dominated by floating point operations, increasing core frequency yields better performance.
For workloads dominated by integer operations, increasing cache size is more beneficial than frequency.
This highlights the importance of matching silicon and firmware optimizations to workload characteristics.
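
The summary does not spell out why the two workload classes respond differently. One first-order model that is consistent with the observations is sketched below; it is an assumption, not the presenter's analysis. It splits execution time into a compute part that scales with core frequency and a memory-stall part whose latency is fixed in nanoseconds. The function name and all numbers are hypothetical.

```python
def exec_time_us(core_cycles, misses, freq_mhz, miss_latency_ns):
    """Execution time in microseconds under a simple two-term model.

    Assumption: memory latency is fixed in absolute time, so raising the
    core frequency only shrinks the compute term.
    """
    compute_us = core_cycles / freq_mhz          # cycles / (cycles per us)
    stall_us = misses * miss_latency_ns / 1000.0
    return compute_us + stall_us


# Two made-up profiles: one dominated by compute, one dominated by misses.
profiles = {
    "compute_heavy": {"core_cycles": 1_000_000, "misses": 100},
    "memory_heavy": {"core_cycles": 100_000, "misses": 10_000},
}

for name, p in profiles.items():
    base = exec_time_us(p["core_cycles"], p["misses"], 500, 100)
    faster_clock = exec_time_us(p["core_cycles"], p["misses"], 1000, 100)
    # Halving the miss count stands in for the effect of a larger cache.
    bigger_cache = exec_time_us(p["core_cycles"], p["misses"] // 2, 500, 100)
    print(name, base, faster_clock, bigger_cache)
```

In this model, doubling the frequency helps the compute-heavy profile far more than halving the misses does, while the opposite holds for the memory-heavy profile. Whether a given floating point or integer workload falls into either class depends on its actual memory behavior.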

3. Key results from Zephyr RTOS Boot:

Mapping requests from instruction and data caches to different memories with optimized latencies can improve boot performance.
Prefetch capabilities and strategic buffer/memory placement are key levers to improve firmware performance.

Anusha summarized the following recommendations and outcomes from her benchmarking experiments:

Optimize cache sizes and memory hierarchy based on workload needs.
Select appropriate core frequency for specific workload types.
Optimize SRAM/DRAM sizing and placement relative to cores.
Optimize firmware code for cache alignment and prefetching.
Place performance-critical code and data into low-latency memory regions.
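
The last recommendation can be approached as a small packing problem. The sketch below is one possible heuristic, not a method described in the presentation: it greedily fills a low-latency region (for example a TCM) with the sections that have the most accesses per byte. The section names, sizes, access counts, and the `place_sections` helper are hypothetical; in practice the counts would come from profiling or simulation traces.

```python
def place_sections(sections, fast_capacity_bytes):
    """Greedy placement by access density (accesses per byte).

    sections: list of (name, size_bytes, access_count) tuples.
    Returns a dict mapping each section name to "fast" or "slow".
    """
    ranked = sorted(sections, key=lambda s: s[2] / s[1], reverse=True)
    placement = {}
    free = fast_capacity_bytes
    for name, size, _accesses in ranked:
        if size <= free:
            placement[name] = "fast"
            free -= size
        else:
            placement[name] = "slow"
    return placement


# Illustrative input only.
example = [
    ("isr_handlers", 1024, 100_000),
    ("boot_init", 4096, 10),
    ("dsp_kernel", 2048, 50_000),
]
print(place_sections(example, fast_capacity_bytes=3072))
```

A greedy density ranking is simple and usually reasonable, but it is not guaranteed to be optimal. The resulting assignment would then be applied through the linker script or the build system's section placement mechanism.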

In case you missed it, Anusha's presentation is available on the Synopsys Virtual Prototyping Day 2025 event website.
