# **Virtual Memory Management**

\***Throughout the course we will use overheads that were adapted from those distributed from the textbook website. Slides are from the book authors, modified and selected by Jean Mayo, Shuai Wang and C-K Shene.** 

> *The danger of computers becoming like humans is not as great as the danger of humans becoming like computers.*

**Spring 2019**

*Konrad Zuse*

#### Definitions

- Cache
	- Copy of data that is faster to access than the original
	- **Hit**: if cache has copy
	- **Miss**: if cache does not have copy
- Cache block
	- Unit of cache storage (multiple memory locations)
- Temporal locality
	- Programs tend to reference the same memory locations multiple times
	- Example: instructions in a loop
- Spatial locality
	- Programs tend to reference nearby locations
	- Example: data in a loop

#### Locality of Reference



During any phase of execution, the process references only a relatively small fraction of pages.

#### Main Points

- Can we provide the illusion of near-infinite memory in limited physical memory?
	- Demand-paged virtual memory
	- Memory-mapped files
- How do we choose which page to replace?
	- FIFO (First-In-First-Out), MIN (Optimal), LRU (Least Recently Used), LFU (Least Frequently Used), Second Chance, Clock

#### **Observations**

- A complete program does not have to be in memory, because
	- error-handling code is not frequently used
	- arrays, tables, and large data structures are usually allocated more memory than necessary, and many parts are not used at the same time
	- some options and cases may be used rarely
- If they are not needed, why must they be in memory?

#### Benefits

- Program length is not restricted to real memory size. That is, virtual address size can be larger than physical memory size.
- More programs can run because space originally allocated for the unloaded parts can be used by other programs.
- Load/swap I/O time is saved because we do not have to load/swap a complete program.

#### Virtual Memory

- **Virtual memory** is the separation of user logical memory from physical memory.
- This permits an extremely large virtual memory, which makes programming large systems easier.
- Because memory segments can be shared, this further improves performance and saves time.
- Virtual memory is commonly implemented with demand paging, demand segmentation, or demand paging + segmentation.

### Demand Paging



## Demand Paging (Before)



## Demand Paging (After)

- The kernel finds the page in virtual memory and brings it into physical memory. If no page frame is available, the kernel finds an "occupied" one.
- Suppose page A was chosen. The kernel brings page B into physical memory, replacing the original page A, and updates the page table: page B becomes valid (its entry now holds the frame, with R/W access), and page A becomes invalid.

#### Address Translation

- Address translation from a *virtual address* to a *physical address* is the same as in a paging system.
- However, there is an additional check. If the needed page is not in physical memory (*i.e.*, its valid bit is not set), a page fault (*i.e.*, a trap) occurs.
- If a page fault occurs, we need to do the following:
	- Find an unused page frame. If no such page frame exists, a victim must be found and evicted.
	- Write the old page out and load the new page in.
	- Update both page tables.
	- Resume the interrupted instruction.
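The valid-bit check can be sketched in a few lines. This is only an illustration, assuming 4 KB pages and a hypothetical page table stored as a dictionary from page number to a `(valid, frame)` pair; the names `PAGE_SIZE`, `PageFault`, and `translate` are not from the slides.

```python
PAGE_SIZE = 4096  # assumed page size (4 KB)


class PageFault(Exception):
    """Raised when the referenced page is not in physical memory."""


def translate(vaddr, page_table):
    """Translate a virtual address to a physical address.

    page_table is a hypothetical layout: page number -> (valid, frame).
    """
    page, offset = divmod(vaddr, PAGE_SIZE)
    valid, frame = page_table.get(page, (False, None))
    if not valid:
        # Valid bit not set: trap to the OS.
        raise PageFault(page)
    return frame * PAGE_SIZE + offset


page_table = {0: (True, 5), 1: (False, None)}
print(translate(10, page_table))  # page 0 is resident in frame 5
```

In real hardware the trap is raised by the address translation unit; the exception here only stands in for that trap.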

#### Details of Handling a Page Fault

1. Trap to the OS (a context switch occurs).
2. Make sure it is a page fault.
3. If the address is not a legal one, report an address error and return.
4. Find a page frame (page replacement algorithm).
5. Write the victim page back to disk (page out, if modified).
6. Load the new page from disk (page in).
7. Update both page tables (two pages are involved!).
8. Resume the execution of the interrupted instruction.

#### Hardware Support

- Page Table Base Register, Page Table Length Register, and a Page Table.
- Each entry of a page table must have a valid/invalid bit. *Valid* means that the page is in physical memory. The address translation hardware must recognize this bit and generate a page fault if the valid bit is not set.
- Secondary memory: use a disk.
- Other hardware components may be needed and will be discussed later.

#### Too Many Memory Accesses?!

- Each address reference may use at least two memory accesses, one for the page table lookup and the other for accessing the page. It may be worse! See below:



#### Performance Issue: 1/2

- Let $p$ be the probability of a page fault (the page fault rate), $0 \leq p \leq 1$.
- The effective access time is

$$
(1-p) \times \text{memory access time} + p \times \text{page fault time}
$$

- The page fault rate $p$ should be small, and memory access time is usually between 10 and 200 nanoseconds.
- To complete a page fault, three components are important:
	- Serve the page-fault trap
	- Page-in and page-out, *a bottleneck*
	- Resume the interrupted process

#### Performance Issue: 2/2

- Suppose memory access time is 100 nanoseconds and paging requires 25 milliseconds (software and hardware). Then the effective access time is

$$
\begin{aligned}
(1-p) \times 100 + p \times (25 \text{ milliseconds}) &= (1-p) \times 100 + p \times 25{,}000{,}000 \text{ nanoseconds} \\
&= 100 + 24{,}999{,}900 \times p \text{ nanoseconds}
\end{aligned}
$$

- If the page fault rate is 1/1000, the effective access time is 25,099.9 nanoseconds, or about 25 microseconds. It is 250 times slower!
- If we wish it to be only 10% slower, the effective access time must be no more than 110 nanoseconds, which requires $p \leq 0.0000004$.
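The arithmetic above can be checked with a short script. This is a minimal sketch using the slide's numbers (100 ns memory access, 25 ms page fault service) as defaults; the function names are illustrative.

```python
def effective_access_time(p, mem_ns=100.0, fault_ns=25_000_000.0):
    """Effective access time in nanoseconds for page fault rate p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    return (1.0 - p) * mem_ns + p * fault_ns


def max_fault_rate(slowdown, mem_ns=100.0, fault_ns=25_000_000.0):
    """Largest p such that the effective access time is at most
    (1 + slowdown) * mem_ns, obtained by solving the formula for p."""
    return slowdown * mem_ns / (fault_ns - mem_ns)


print(effective_access_time(1 / 1000))  # roughly 25,100 ns
print(max_fault_rate(0.10))             # roughly 4e-7
```

Solving $(1-p) \times 100 + p \times 25{,}000{,}000 \leq 110$ for $p$ gives $p \leq 10 / 24{,}999{,}900$, which is what `max_fault_rate(0.10)` evaluates.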

#### Three Important Issues in V.M.

- **Page tables can be very large.** If an address has 32 bits and the page size is 4K, then there are $2^{32}/2^{12} = 2^{20} = (2^{10})^2 = 1$M entries in a page table per process!
- **Virtual to physical address translation must be fast.** This is done with a TLB. When a page table entry becomes invalid, any TLB entries that are copies of it must be removed.
- **Page replacement.** When a page fault occurs and there is no free page frame, a victim page must be found. If the victim is not selected properly, system degradation may be high.
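The page table size in the first point is a one-line computation. The helper below is only an illustrative sketch; its name and parameters are not from the slides.

```python
def page_table_entries(address_bits, page_size_bytes):
    """Number of entries in a single-level page table."""
    return 2 ** address_bits // page_size_bytes


# 32-bit addresses with 4K pages: 2^32 / 2^12 = 2^20 entries.
print(page_table_entries(32, 4096))
```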

## How Do We Know If Page Has Been Modified?

- Every page table entry has some bookkeeping
	- Has the page been modified?
		- Set by hardware on a store instruction
		- In both the TLB and the page table entry
	- Has the page been recently used?
		- Set by hardware in the page table entry on every TLB miss
- Bookkeeping bits can be reset by the OS kernel
	- When changes to the page are flushed to disk
	- To track whether the page is recently used

## Keeping Track of Page Modifications (Before)



## Keeping Track of Page Modifications (After)



## Modified/Dirty & Referenced/Used Bits

- Most machines keep dirty/use bits in the page table entry
- A physical page is
	- Modified if *any* page table entry that points to it is modified (Modified/Dirty bit)
	- Recently used if *any* page table entry that points to it is recently used (Referenced/Used bit)
- On MIPS, it is simpler to keep dirty/use bits in the core map
	- Core map: map of physical page frames

## Page Replacement: 1/2

The following is a basic scheme:

- Find the desired page on disk
- Find a free page frame in physical memory
	- If there is a free page frame, use it
	- If there is no free page frame, use a page-replacement algorithm to find a victim page
	- Write this victim page back to disk and update the page table and page frame table
- Read the desired page into the selected frame and update the page tables and page frame table
- Restart the interrupted instruction

### Page Replacement: 2/2

- If there is no free page frame, two page transfers (*i.e.*, page-in and page-out) may be required.
- A modified bit may be added to a page table entry. The modified bit is set if that page has been modified (*i.e.*, information has been stored into it). It is initialized to 0 when a page is loaded into memory.
- Thus, if a page is not modified (*i.e.*, modified bit = 0), it does not have to be written back to disk.
- Some systems may also have a referenced bit. When a page is referenced (*i.e.*, read or written), its referenced bit is set. It is initialized to 0 when a page is brought in.
- Both bits are set by hardware automatically.
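The basic scheme, including the modified-bit shortcut, might look like the sketch below. It is a simulation under stated assumptions rather than kernel code: the page table is a hypothetical dictionary of resident pages, `choose_victim` is a placeholder for any replacement algorithm, and `disk_writes` simply records which pages would be paged out.

```python
def handle_page_fault(page, page_table, free_frames, choose_victim, disk_writes):
    """Bring `page` into memory and return the frame it occupies.

    page_table: resident page -> {"frame": int, "dirty": bool} (assumed layout)
    free_frames: list of free frame numbers
    choose_victim: function(page_table) -> page to evict (hypothetical hook)
    disk_writes: list that records pages written back to disk
    """
    if free_frames:
        frame = free_frames.pop()           # a free frame exists: use it
    else:
        victim = choose_victim(page_table)  # page replacement algorithm
        entry = page_table.pop(victim)      # victim becomes invalid
        if entry["dirty"]:
            disk_writes.append(victim)      # page-out only if modified
        frame = entry["frame"]
    # Page-in: the new page starts with its modified bit cleared.
    page_table[page] = {"frame": frame, "dirty": False}
    return frame


table, free, writes = {}, [0], []
oldest = lambda t: next(iter(t))            # evict the first resident page
handle_page_fault("A", table, free, oldest, writes)
table["A"]["dirty"] = True                  # pretend a store hit page A
handle_page_fault("B", table, free, oldest, writes)
print(writes)                               # A was dirty, so it was written back
```

Only the second fault causes a disk write, because page A was modified before being chosen as the victim.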

## Page Replacement Algorithms

- We shall discuss the following page replacement algorithms:
	- First-In-First-Out (FIFO)
	- The Least Recently Used (LRU)
	- The Optimal Algorithm
	- The Second Chance Algorithm
	- The Clock Algorithm
- The fewer page faults an algorithm generates, the better the algorithm performs.
- Page replacement algorithms work on page numbers. A string of page numbers is referred to as a *page reference string*.

#### The FIFO Algorithm

- The FIFO algorithm always selects the "oldest" page to be the victim. Columns are organized by "age".



## Belady Anomaly

- Intuitively, increasing the number of page frames should reduce the number of page faults.
- However, some page replacement algorithms do not satisfy this "intuition." The FIFO algorithm is an example.
- **Belady Anomaly:** Page faults may increase as the number of page frames increases.
- FIFO was used in the DEC VAX-78*xx* series and NT because it is easy to implement: append the new page to the tail and select the head to be a victim!
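A small simulation makes the anomaly concrete. The sketch below assumes the 12-reference string 0 1 2 3 0 1 4 0 1 2 3 4 that appears in these slides; `fifo_faults` is an illustrative name.

```python
from collections import deque


def fifo_faults(refs, num_frames):
    """Count page faults under FIFO replacement."""
    frames = deque()              # head = oldest page, tail = newest
    faults = 0
    for page in refs:
        if page in frames:
            continue              # hit: FIFO order is not changed
        faults += 1
        if len(frames) == num_frames:
            frames.popleft()      # select the head to be the victim
        frames.append(page)       # append the new page to the tail
    return faults


refs = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4]
print(fifo_faults(refs, 3), fifo_faults(refs, 4))
```

For this string FIFO incurs 9 faults with 3 frames but 10 faults with 4 frames, which is exactly the Belady anomaly.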



#### The LRU Algorithm: 1/2

*page faults* = 8, *miss ratio* = 8/12 = 66.7%, *hit ratio* = 33.3%

#### The LRU Algorithm: 2/2

- The memory content of 3 frames is a subset of the memory content of 4 frames. This is the *inclusion property*. With this property, the Belady anomaly never occurs. Why?
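LRU can be simulated with an ordered dictionary that keeps the least recently used page first. This is a sketch for counting faults, again assuming the reference string 0 1 2 3 0 1 4 0 1 2 3 4; it is not how hardware tracks recency.

```python
from collections import OrderedDict


def lru_faults(refs, num_frames):
    """Count page faults under LRU replacement."""
    frames = OrderedDict()              # least recently used page comes first
    faults = 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)    # hit: page becomes most recently used
            continue
        faults += 1
        if len(frames) == num_frames:
            frames.popitem(last=False)  # evict the least recently used page
        frames[page] = True
    return faults


refs = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4]
print(lru_faults(refs, 3), lru_faults(refs, 4))
```

With this string the count drops from 10 faults with 3 frames to 8 faults with 4 frames, so adding a frame does not hurt.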




## The Optimal Algorithm: 1/2



*page faults* = 6, *miss ratio* = 6/12 = 50%, *hit ratio* = 50%

## The Optimal Algorithm: 2/2

- The optimal algorithm always delivers the fewest page faults, if it can be implemented. It also satisfies the inclusion property (*i.e.*, no Belady anomaly).
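Because a simulation knows the whole reference string in advance, the optimal algorithm is easy to run offline. The sketch below evicts the resident page whose next use lies farthest in the future; the reference string is the one assumed throughout these examples.

```python
def opt_faults(refs, num_frames):
    """Count page faults under the optimal (MIN) algorithm."""
    frames = set()
    faults = 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) == num_frames:
            future = refs[i + 1:]

            def next_use(q):
                # Pages never used again are treated as farthest away.
                return future.index(q) if q in future else len(future)

            frames.remove(max(frames, key=next_use))
        frames.add(page)
    return faults


refs = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4]
print(opt_faults(refs, 3), opt_faults(refs, 4))
```

This yields 7 faults with 3 frames and 6 faults with 4 frames, fewer than FIFO or LRU on the same string.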



*Page reference string:* 0 1 2 3 0 1 4 0 1 2 3 4

## The Inclusion Property

- Define the following notation:
	- $P = \langle p_1, p_2, \ldots, p_n \rangle$: a page trace
	- $m$: the number of page frames
	- $M_t(P, \alpha, m)$: the memory content after page $p_t$ is referenced with respect to a page replacement algorithm $\alpha$
- A page replacement algorithm satisfies the **inclusion property** if $M_t(P, \alpha, m) \subseteq M_t(P, \alpha, m+1)$ holds for every $t$.
- **Homework:** The inclusion property means no Belady anomaly.
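The definition can be tested empirically on a single trace. The sketch below records $M_t$ after every reference for FIFO and LRU and checks the subset condition; it only examines the given trace, so a `True` result is evidence, not a proof. The function names and the `policy` strings are assumptions of this example.

```python
def memory_states(refs, m, policy):
    """Return [M_1, ..., M_n] for policy "fifo" or "lru" with m frames."""
    order = []                      # index 0 is the next victim
    states = []
    for page in refs:
        if page in order:
            if policy == "lru":
                order.remove(page)
                order.append(page)  # most recently used goes last
        else:
            if len(order) == m:
                order.pop(0)
            order.append(page)
        states.append(set(order))
    return states


def has_inclusion(refs, m, policy):
    """Check M_t(P, a, m) <= M_t(P, a, m+1) for every t on this trace."""
    small = memory_states(refs, m, policy)
    large = memory_states(refs, m + 1, policy)
    return all(s <= l for s, l in zip(small, large))


refs = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4]
print(has_inclusion(refs, 3, "lru"), has_inclusion(refs, 3, "fifo"))
```

On this trace LRU satisfies the condition while FIFO violates it, consistent with FIFO showing the Belady anomaly on the same string.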

#### LRU Revisited



## LRU Approximation Algorithms

- FIFO has the Belady anomaly, the Optimal algorithm requires knowledge of the future, and the LRU algorithm requires accurate information about the past.
- The optimal and LRU algorithms are difficult to implement, especially the optimal algorithm. Thus, LRU approximation algorithms are needed. We will discuss three:
	- The Second-Chance Algorithm
	- The Clock Algorithm
	- The Enhanced Second-Chance Algorithm

## Second-Chance Algorithm: 1/3

- The second chance algorithm is a FIFO algorithm. It uses the referenced bit of each page.
- The page frames are in page-in order (linked list).
- If a page frame is needed, check the oldest (head):
	- If its referenced bit is 0, take this one
	- Otherwise, clear the referenced bit, move it to the tail, and (perhaps) set the current time. This gives this page frame a second chance.
- Repeat this procedure until a page with a 0 referenced bit is found. Do page-out and page-in if necessary, and move it to the tail.
- **Problem:** Page frames are moved too frequently.
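Victim selection under second chance can be sketched as follows. The queue and the dictionary of referenced bits are hypothetical data structures chosen for the illustration, and the queue is assumed to be non-empty.

```python
from collections import deque


def second_chance_victim(queue, referenced):
    """Remove and return the victim page.

    queue: deque of resident pages, head = oldest (assumed non-empty)
    referenced: page -> referenced bit (0 or 1)
    """
    while True:
        page = queue.popleft()
        if referenced[page]:
            referenced[page] = 0   # clear the bit ...
            queue.append(page)     # ... and move the page to the tail
        else:
            return page            # referenced bit is 0: take this one


queue = deque(["A", "B", "C"])
bits = {"A": 1, "B": 0, "C": 1}
print(second_chance_victim(queue, bits), list(queue))
```

Here A gets a second chance and moves to the tail, and B is selected. The loop always terminates because every inspected page has its bit cleared.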

#### Second-Chance Algorithm: 2/3

*new page = X*, *rc* = referenced and changed/modified bit pair



#### Second-Chance Algorithm: 3/3

*new page = X*


# The Clock Algorithm: 1/2

- If the second chance algorithm is implemented with a *circular* list, we have the clock algorithm.
- A "next" pointer is needed.
- When a page frame is needed, we examine the page under the "next" pointer:
	- If its referenced bit is 0, take it
	- Otherwise, clear the referenced bit and advance the "next" pointer.
- Repeat this until a frame with a 0 referenced bit is found.
- Do page-in and page-out, if necessary
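With a circular list, no frames need to be moved; only the "next" pointer advances. The sketch below keeps one referenced bit per frame in a list and treats the list as circular. Advancing the pointer past the chosen frame afterwards is an assumption of this example, not something the slide specifies.

```python
def clock_victim(ref_bits, hand):
    """Find a victim frame with the clock algorithm.

    ref_bits: list of referenced bits, one per frame (modified in place)
    hand: index of the frame under the "next" pointer
    Returns (victim_index, new_hand).
    """
    n = len(ref_bits)
    while ref_bits[hand]:
        ref_bits[hand] = 0         # clear the referenced bit
        hand = (hand + 1) % n      # advance the "next" pointer
    # Assumed convention: the pointer moves past the chosen frame.
    return hand, (hand + 1) % n


bits = [1, 1, 0, 1]
print(clock_victim(bits, 0), bits)
```

Starting at frame 0, the hand clears the bits of frames 0 and 1 and selects frame 2.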

#### The Clock Algorithm: 2/2

*new page = X*



# Enhanced Second-Chance Algorithm: 1/5

- Four page lists based on their reference-modify bits (*r*, *c*) are used:
	- Q00: pages were not recently referenced and not modified, the best candidates!
	- Q01: pages were changed but not recently referenced. Need a page-out.
	- Q10: pages were recently used but clean.
	- Q11: pages were recently used and modified. Need a page-out.

# Enhanced Second-Chance Algorithm: 2/5

- We still need a "next" pointer.
- When a page frame is needed:
	- Does the "next" frame have the 00 combination? If yes, the victim is found. Otherwise, reset the referenced bit and move this page to the corresponding list (*i.e.*, Q10 or Q11).
	- If Q00 becomes empty, check Q01. If there is a frame with the 01 combination, it is the victim. Otherwise, reset the referenced bit and move the frame to the corresponding list (*i.e.*, Q10 or Q11).
	- If Q01 becomes empty, move Q10 to Q00 and Q11 to Q01. Restart the scanning process.

## Enhanced Second-Chance Algorithm: 3/5







#### Enhanced Second-Chance Algorithm: 4/5












This algorithm was used in IBM DOS/VS and MacOS!



### Other Important Issues

- Global vs. Local Allocation
- Locality of Reference
- Thrashing
- The Working Set Model
- The Working Set Clock Algorithm
- Page-Fault Frequency Replacement Algorithm

#### Global vs. Local Replacement

- **Global** replacement allows a process to select a victim from the set of all page frames, even if the page frame is currently allocated to another process.
- **Local** replacement requires that each process select a victim from its own set of allocated frames.
- With global replacement, the number of frames allocated to a process may change over time, and, as a result, the paging behavior of a process is affected by other processes and may be unpredictable.

#### Global vs. Local: A Comparison

- With a global replacement algorithm, a process cannot control its own page fault rate, because the behavior of a process depends on the behavior of other processes. The same process running on a different system may have totally different behavior.
- With a local replacement algorithm, the set of pages of a process in memory is affected by the paging behavior of that process only. A process does not have the opportunity to use other, less used frames. Performance may be lower.
- With a global strategy, throughput is usually higher, so it is commonly used.

#### Locality of Reference



During any phase of execution, the process references only a relatively small fraction of pages.

#### Thrashing

- **Thrashing** means a process spends more time paging than executing (*i.e.*, low CPU utilization and high paging rate).
- If CPU utilization is too low, the medium-term scheduler is invoked to swap in one or more swapped-out processes or bring in one or more new jobs. The number of processes in memory is referred to as the *degree of multiprogramming*.

## Degree of Multiprogramming: 1/3

- We cannot increase the degree of multiprogramming arbitrarily, as throughput will drop at a certain point and thrashing occurs.
- Therefore, the medium-term scheduler must maintain the optimal degree of multiprogramming.




# Degree of Multiprogramming: 2/3

1. Suppose we use a global strategy and the CPU utilization is low. The medium-term scheduler will add a new process.
2. Suppose this new process requires more pages. It starts to have more page faults, and page frames of other processes will be taken by this process.
3. Other processes also need these page frames. Thus, they start to have more page faults.
4. Because pages must be paged in and out, these processes must wait, and the number of processes in the ready queue drops. CPU utilization is lower.

## Degree of Multiprogramming: 3/3

5. Consequently, the medium-term scheduler brings more processes into memory. These new processes also need page frames to run, causing more page faults.
6. Thus, CPU utilization drops further, causing the medium-term scheduler to bring in even more processes.
7. If this continues, the page fault rate increases dramatically, and the effective memory access time increases. Eventually, the system is paralyzed because the processes are spending almost all of their time paging!

#### The Working Set Model: 1/4

- The **working set** of a process at virtual time $t$, written as $W(t, \theta)$, is the set of pages that were referenced in the interval $(t-\theta, t]$, where $\theta$ is the window size. These are "most recently used" pages, which can be ordered in the LRU way.
- Let $\theta = 3$. The result is identical to that of LRU:



*page faults* = 10, *miss ratio* = 10/12 = 83.3%, *hit ratio* = 16.7%
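The definition of $W(t, \theta)$ translates directly into a list slice. In the sketch below, time is 1-based so that `refs[t - 1]` is $p_t$, and a reference to $p_t$ counts as a fault when the page is not in $W(t-1, \theta)$. That fault rule and the reference string 0 1 2 3 0 1 4 0 1 2 3 4 are assumptions of this example.

```python
def working_set(refs, t, theta):
    """W(t, theta): pages referenced in the interval (t - theta, t]."""
    start = max(0, t - theta)
    return set(refs[start:t])      # refs[t - 1] is the page referenced at time t


def working_set_faults(refs, theta):
    """Count references that fall outside the previous working set."""
    faults = 0
    for t, page in enumerate(refs, start=1):
        if page not in working_set(refs, t - 1, theta):
            faults += 1
    return faults


refs = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4]
print(working_set(refs, 7, 3), working_set_faults(refs, 3))
```

With $\theta = 3$ this produces 10 faults, the same count as LRU with 3 frames on this string.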

# The Working Set Model: 2/4

- However, the result of $\theta = 4$ is different from that of LRU.



## The Working Set Model: 3/4

- **The Working Set Policy:** Find a good $\theta$, and keep $W(t, \theta)$ in memory for every $t$.
- What is the best value of $\theta$? This is a system tuning issue. This value can change as needed from time to time.



# The Working Set Model: 4/4

- Unfortunately, like LRU, the working set policy cannot be implemented directly, and an approximation is necessary.
- But the working set model does satisfy the inclusion property.
- A commonly used algorithm is the Working Set Clock algorithm, WSClock. This is a good and efficient approximation.



# Example VMOS: 1/2

- VMOS (**V**irtual **M**emory **O**perating **S**ystem) was an early OS (1970s) using working set.\*
- This OS was designed for the UNIVAC Spectra 70, similar to the IBM System/370.
- Times for adjusting the working set:
	1. Page fault
	2. A process finishes executing 4000 instructions. This time is the window size $\theta$.
	3. For a process waiting for I/O, unless its working set has been adjusted within $\theta$ time, its working set has to be adjusted.

#### Example VMOS: 2/2



# Virtual Relocation Virtual Memory in a VM: 1/4



**What if the VM supports virtual memory?**

**A page-in in a VM brings its page into its VS; but, Actually the page should**



# Virtual Relocation Virtual Memory in a VM: 2/4



**SGT: Segment Table, PGT: Page Table, FMT: Page Frame Table**

# Virtual Relocation Virtual Memory in a VM: 3/4



# Virtual Relocation Virtual Memory in a VM: 4/4




# Virtual Memory Management in Control Program (CP): 1/5

- The control program CP of VM/370 views each virtual machine as a process.
- CP uses second chance page replacement and working sets.
- All page frames are in two lists, FREELIST and FLUSHLIST.
- FREELIST has all free page frames.
- If for some reason a VM cannot hold its page frames, all of its page frames are moved to FLUSHLIST. However, page tables are not modified; they only show that these pages are not available for use.

# Virtual Memory Management in Control Program (CP): 2/5

- If the RM needs a page frame, then ...
	- Take one from FREELIST.
	- Or, if FREELIST has no page frame available, take one from FLUSHLIST.
	- Or, if FLUSHLIST is also empty, search the used page frames with the clock algorithm.
	- Note that the page table entry has to be modified and a page-out may be needed.

# Virtual Memory Management in Control Program (CP): 3/5

- A VM may or may not be allowed to get page frames.
- When a VM is allowed to get pages, the memory management (MM) component in RM monitors the paging activity of this VM.
- Once this VM causes a page fault, MM monitors the number of in-memory pages of this VM until this VM is no longer allowed to get page frames.
- At this moment, RM calculates the Average Resident Pages (ARP) of this VM.
- Note that among these page faults, some cause the removal of the VM's own pages, while the others steal other VMs' pages.

# Virtual Memory Management in Control Program (CP): 4/5

- The MM in CP determines the rate of page faults of this VM that require stealing other VMs' pages.
- If this rate is larger than 8%, this rate is recorded in $S$. Otherwise, $S = 0$.
- During this period (*i.e.*, the time a VM is allowed to get pages), let $P$ be the average life span: the time span in this period divided by the number of page faults.
- Let $I$ be the global average life span: total CPU time so far divided by the number of page faults.

# Virtual Memory Management in Control Program (CP): 5/5

- The MM component of CP uses the following to predict the average resident pages in the next period:

$$
\mathbf{newARP} = \max\left( (\mathbf{ARP} + S)\sqrt{\frac{I}{P}}, 5\right)
$$

- This prediction is usually close to the actual working set size except for some odd situations. Note that newARP is at least 5 pages.
- If system performance goes down because of this VM's high page faults ($I > P$ and/or $S > 0$), the new prediction is larger.
- Otherwise, $I$ may be less than $P$, and hence the new prediction of ARP may be smaller.
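The prediction formula is simple to evaluate. The sketch below is a direct transcription with illustrative argument names; the units of `s`, `i`, and `p` are whatever the slides intend, and the sample values are made up for demonstration.

```python
import math


def new_arp(arp, s, i, p, minimum=5):
    """newARP = max((ARP + S) * sqrt(I / P), 5)."""
    return max((arp + s) * math.sqrt(i / p), minimum)


# I > P: this VM faults more often than the global average, so ARP grows.
print(new_arp(20, 0, 4, 1))
# I < P: the prediction shrinks but never goes below 5 pages.
print(new_arp(2, 0, 1, 4))
```

The first call doubles the estimate because $\sqrt{4/1} = 2$; the second would give 1 page, so the floor of 5 pages applies.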

### Cache Concept (Read)



- The cache either returns the value stored at that memory location, or it forwards the request onward to the next level cache.

## Memory Hierarchy



**i7 has 8MB as shared 3rd level cache; 2nd level cache is per-core**

### Cache Concept (Write)



- Memory requests are buffered and then sent to the cache in the background.
- Typically, the cache stores a block of data, so each write ensures that the rest of the block is in the cache before updating the cache.
- If the cache is write-through, the data is then sent onward to the next level of cache or memory.

## Memory Hierarchy

- Cache memory can be between CPU and memory, external device and memory, etc.



# Cache Memory: 1/12

- It is possible to build a computer using only static RAM.
- This would be very fast, but the cost can be very high.
- During the course of the execution of a program, memory references tend to cluster (e.g., loops).
- Thus, we only need a small amount of fast memory between physical memory and CPU, or even on the CPU or module.

### Cache Memory: 2/12



### Cache Memory: 3/12



### Cache Memory: 4/12



# Cache Memory: 5/12

- Three mapping schemes: direct mapping, associative mapping and set associative mapping.
- Direct mapping: each block of physical memory maps to only one cache line.



### Cache Memory: 6/12

- Use the tag field to check whether a block is in cache. If the block is in cache, we have a cache hit!
- On a miss, load the block into cache. Only one choice!
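A direct-mapped lookup can be sketched by splitting the address into tag, line, and word fields. The field widths below (2 bits each) are arbitrary assumptions for the illustration, and the cache stores only tags, not data.

```python
def split_address(addr, word_bits, line_bits):
    """Split an address into (tag, line, word) fields."""
    word = addr & ((1 << word_bits) - 1)
    line = (addr >> word_bits) & ((1 << line_bits) - 1)
    tag = addr >> (word_bits + line_bits)
    return tag, line, word


class DirectMappedCache:
    """Tag store of a direct-mapped cache (data is omitted)."""

    def __init__(self, word_bits, line_bits):
        self.word_bits = word_bits
        self.line_bits = line_bits
        self.tags = [None] * (1 << line_bits)

    def access(self, addr):
        """Return True on a hit; on a miss, load the block (only one choice)."""
        tag, line, _ = split_address(addr, self.word_bits, self.line_bits)
        hit = self.tags[line] == tag
        if not hit:
            self.tags[line] = tag
        return hit


cache = DirectMappedCache(word_bits=2, line_bits=2)
print([cache.access(a) for a in (0, 1, 16, 0)])
```

Addresses 0 and 16 map to the same line with different tags, so they keep evicting each other even though the other lines are empty.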




# Cache Memory: 7/12

- Associative mapping: A physical memory block can be loaded into any line of cache.
- Memory address is interpreted as tag and word.
- Tag uniquely identifies a block of memory.



# Cache Memory: 8/12

- Associative mapping: A physical memory block can be loaded into any line of cache.
- LRU is usually used to find a victim for the new block.




## Cache Memory: 9/12

- Set associative mapping: Cache is divided into a number of sets, each of which contains a number of lines.



### Cache Memory: 10/12

#### *k*-way Set Associative mapping:



(b) k direct-mapped caches

# Cache Memory: 11/12

#### *k*-way Set Associative mapping:



# Cache Memory: 12/12

#### Replacement Algorithms:

- **Direct Mapping:** There is no choice. Because each block maps to one line, a miss always replaces that line.
- **Associative and Set Associative Mapping:** Random, FIFO, LRU (e.g., in a 2-way set associative cache, it is easy to find the LRU line), LFU (i.e., Least Frequently Used: replacing the block which has had the fewest hits).
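A *k*-way set associative cache with LRU replacement inside each set can be sketched as below. Choosing the set as the block number modulo the number of sets is a common convention assumed here, and the class tracks block numbers only, not data.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """k-way set associative cache with LRU replacement per set."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, block):
        """block: memory block number. Returns True on a hit."""
        lines = self.sets[block % self.num_sets]  # assumed set selection
        if block in lines:
            lines.move_to_end(block)              # now most recently used
            return True
        if len(lines) == self.ways:
            lines.popitem(last=False)             # evict the LRU line
        lines[block] = True
        return False


cache = SetAssociativeCache(num_sets=2, ways=2)
print([cache.access(b) for b in (0, 2, 0, 4, 2)])
```

Blocks 0, 2, and 4 all fall into set 0. With two ways, the re-reference to block 0 hits, and block 4 then evicts block 2, the least recently used line of that set.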

# **The End**