Virtual Memory - Part 3
In the last two articles, we learned why virtual memory is needed and how it works. In this final article on virtual memory, we will take a detailed look at the Translation Lookaside Buffer (TLB).
Translation Lookaside Buffer (TLB)
A translation lookaside buffer (TLB) is a memory cache used to reduce the time taken to access a user memory location. It is part of the chip's memory-management unit (MMU). The TLB stores recent translations of virtual addresses to physical addresses and can be thought of as an address-translation cache. A TLB may reside between the CPU and the CPU cache, between the CPU cache and main memory, or between levels of a multi-level cache.
Overview
A TLB has a fixed number of slots containing page-table entries; page-table entries map virtual addresses to physical addresses. The virtual memory is the memory space as seen from a process; this space is often split into pages of a fixed size (in paged memory). The page table, generally stored in main memory, keeps track of where the virtual pages are stored in physical memory. Getting the actual address therefore takes two memory accesses: first, the page table is looked up for the frame number; second, the frame number combined with the page offset gives the actual address. Thus any straightforward virtual memory scheme would have the effect of doubling the memory access time. Hence, the TLB is used to reduce the time taken to access memory locations in the page-table method.
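The two-step translation described above can be sketched in a few lines of Python. The page size and page-table contents here are hypothetical values chosen for illustration:

```python
# Minimal sketch of page-table address translation (hypothetical values).
PAGE_SIZE = 4096          # 4 KiB pages, so the offset occupies the low 12 bits

# Hypothetical page table: virtual page number -> physical frame number
page_table = {0: 5, 1: 2, 2: 7}

def translate(virtual_address):
    """Split the virtual address, look up the frame, rebuild the physical address."""
    page_number = virtual_address // PAGE_SIZE   # first memory access: the page table
    offset = virtual_address % PAGE_SIZE
    frame_number = page_table[page_number]
    return frame_number * PAGE_SIZE + offset     # second access: the data itself

print(translate(4100))   # page 1, offset 4 -> frame 2 -> 2*4096 + 4 = 8196
```

Without a TLB, every data reference would pay for both accesses; the TLB short-circuits the first one on a hit.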
Interesting Note: In a Harvard architecture, separate virtual address space may exist for instructions and data. This can lead to distinct TLBs for each access type, an instruction translation lookaside buffer (ITLB) and a data translation lookaside buffer (DTLB).
The TLB can be used as a fast hardware lookup cache. The figure shows the working of a TLB. Each entry in the TLB consists of two parts: a tag and a value. If the tag of the incoming virtual address matches a tag in the TLB, the corresponding value is returned. A common optimization for physically addressed caches is to perform the TLB lookup in parallel with the cache access. Upon each virtual-memory reference, the hardware checks the TLB to see whether the page number is held therein. If it is, we have a TLB hit: the frame number is returned and used to access memory. If the page number is not in the TLB, the page table must be consulted. In that case, the page number and frame number are also added to the TLB so they will be found quickly on the next reference. If the TLB is already full, a suitable entry must be selected for replacement, using a policy such as least recently used (LRU) or first-in, first-out (FIFO).
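To make the tag/value lookup and LRU replacement concrete, here is a toy TLB model in Python. The class name, slot count, and page/frame numbers are all illustrative, not part of any real hardware interface:

```python
from collections import OrderedDict

class TLB:
    """Toy TLB: a fixed number of slots with LRU replacement (illustrative only)."""
    def __init__(self, slots):
        self.slots = slots
        self.entries = OrderedDict()   # tag (page number) -> value (frame number)

    def lookup(self, page_number):
        if page_number in self.entries:            # TLB hit
            self.entries.move_to_end(page_number)  # mark as most recently used
            return self.entries[page_number]
        return None                                # TLB miss: caller walks the page table

    def insert(self, page_number, frame_number):
        if len(self.entries) >= self.slots:
            self.entries.popitem(last=False)       # evict the least recently used entry
        self.entries[page_number] = frame_number

tlb = TLB(slots=2)
tlb.insert(0, 5)
tlb.insert(1, 2)
tlb.lookup(0)        # hit: page 0 becomes most recently used
tlb.insert(2, 7)     # TLB full, so the LRU entry (page 1) is evicted
print(tlb.lookup(1)) # None -> miss, fall back to the page table
```

A real TLB does this comparison in parallel across all slots in hardware; the sequential dictionary here only models the policy, not the circuitry.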
TLB-miss handling
Two schemes for handling TLB misses are commonly found in modern architectures:
- With hardware TLB management, the CPU automatically walks the page tables to see whether there is a valid page-table entry for the specified virtual address. If an entry exists, it is brought into the TLB, and the TLB access is retried; this time the access will hit, and the program can proceed normally. If the CPU finds no valid entry for the virtual address in the page tables, it raises a page fault exception, which the operating system must handle. Handling page faults usually involves bringing the requested data into physical memory, setting up a page table entry to map the faulting virtual address to the correct physical address, and resuming the program.
- With software-managed TLBs, a TLB miss generates a TLB miss exception, and operating system code is responsible for walking the page tables and performing the translation in software. The operating system then loads the translation into the TLB and restarts the program from the instruction that caused the TLB miss.
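The software-managed scheme above can be sketched as a miss handler in Python. The dictionary-based TLB and page table, the `present` flag, and the function name are assumptions made for illustration; a real handler runs in a trap context and refills hardware registers:

```python
def access(virtual_page, tlb, page_table):
    """Sketch of software-managed TLB miss handling (names are illustrative)."""
    frame = tlb.get(virtual_page)
    if frame is not None:
        return frame                       # TLB hit: translation proceeds at once
    # TLB miss exception: the OS walks the page table in software
    entry = page_table.get(virtual_page)
    if entry is None or not entry["present"]:
        raise RuntimeError("page fault")   # OS must bring the page into memory first
    tlb[virtual_page] = entry["frame"]     # load the translation into the TLB ...
    return access(virtual_page, tlb, page_table)  # ... and restart the access
```

The recursive call mirrors how the faulting instruction is restarted after the refill: the retried access now hits in the TLB.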
Performance Impact
The CPU has to access main memory for an instruction-cache miss, data-cache miss, or TLB miss. The third case (conceptually the simplest) is where the desired data itself is in a cache, but the information for virtual-to-physical translation is not in the TLB. These are all slow, due to the need to access a slower level of the memory hierarchy, so a well-functioning TLB is important. Indeed, a TLB miss can be more expensive than an instruction- or data-cache miss, because it requires not just a load from main memory but a page walk, involving several memory accesses.
If a TLB miss occurs, the CPU checks the page table for the page-table entry. If the present bit is set, the page is in main memory, and the processor can retrieve the frame number from the page-table entry to form the physical address. The processor also updates the TLB to include the new page-table entry. If the present bit is not set, the desired page is not in main memory, and a page fault is issued; a page-fault interrupt then invokes the page-fault handling routine. TLB thrashing occurs when a program touches more pages than the TLB can hold, so translations are constantly evicted and reloaded; this slows down the entire process, degrading performance in much the same way as thrashing of the instruction or data cache does.
The TLB reduces memory-access time, and the improvement is directly proportional to the hit ratio: a higher hit ratio means a lower average memory-access time. Reduced memory-access time decreases turnaround time and increases throughput, and so ultimately improves overall system performance.
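The effect of the hit ratio can be quantified with the standard effective-access-time formula. The timing numbers below (10 ns TLB lookup, 100 ns memory access) are assumed values for illustration:

```python
def effective_access_time(hit_ratio, tlb_time_ns, mem_time_ns):
    """Effective memory-access time with a TLB (textbook formula).
    Hit: TLB lookup + one memory access.
    Miss: TLB lookup + two memory accesses
    (one for the page-table entry, one for the data itself)."""
    hit_cost = tlb_time_ns + mem_time_ns
    miss_cost = tlb_time_ns + 2 * mem_time_ns
    return hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost

print(effective_access_time(0.90, 10, 100))  # 0.9*110 + 0.1*210 = 120.0 ns
print(effective_access_time(0.99, 10, 100))  # 0.99*110 + 0.01*210 = 111.0 ns
```

Even a modest rise in hit ratio (90% to 99%) brings the average close to the 110 ns cost of a pure hit, which is why TLB reach matters so much for performance.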
That wraps up virtual memory. I hope I have covered most of the concepts related to it. Please let me know in the comments if anything is missing, and I will try to edit the article to add it. I hope you all enjoyed this virtual-memory marathon.
Pratik Parvati - Software Engineer at Vayavya Labs Pvt Ltd.