1. SOF108 - Computer Architecture Week 11: Memory Hierarchy

MALAYSIA
SOF108 - Computer Architecture
Week 11: Memory Hierarchy
1

2. Outline

• Fundamentals of Memory Technology
• Why Memory Hierarchy?
• Cache Memory
• Cache Mapping Functions
2

3. Overview

• Computer memory exhibits perhaps the widest range of type, technology,
organization, performance, and cost of any feature of a computer system
• No single technology is optimal in satisfying the memory requirements for
a computer system
• As a consequence, the typical memory system is equipped with a hierarchy
of memory subsystems
• Some internal to the system (we will mainly look into these)
• Some external to the system
3

4. Overview of Storage System

• Key characteristics of computer memory system:
• Location
• Capacity
• Access method
• Physical type
• Physical characteristics
* See Appendix for more details
4

5. An expanded view of the memory system

5

6. Memory Performance Gap

Why do we need a memory hierarchy?
[Figure: processor vs. memory performance, 1980–2010, log scale. Processor performance has grown much faster than memory performance, so the processor-memory performance gap keeps growing.]
6

7. Memory Hierarchy

• The total memory capacity of a computer can be visualized as a hierarchy of
components.
• Auxiliary/secondary memory – Slow but high capacity
• Main memory – Relatively faster than secondary memory, but comparatively smaller
capacity
• Cache memory – Fast memory, but….
• The overall goal of Memory Hierarchy:
• to obtain the highest possible average access speed while minimizing the total cost
of the entire memory system.
7

8.

Storage
[Figure: the storage pyramid: cache, RAM, disk drives.]
9.

Disk I/O is very slow, and fast memory is also expensive.
[Figure: processor (CPU with cache), RAM, and disk drive.]
The memory hierarchy allows maximum performance at the lowest cost.
9

10. Location

• Processor (CPU) Local Memory
• registers (e.g. PC, IR etc.)
• Internal / Primary Memory
• main memory (RAM)
• cache
• External / Secondary Memory
• Peripheral storage devices e.g. disk (e.g. floppy, hard disk, CD, DVD etc.), tape
accessible via I/O controllers.
10

11. Capacity

• Internal / Main memory
• Typically expressed in terms of bytes (1 byte = 8 bits)
• Typically in the range of gigabytes (GB)
• External memory
• Typically expressed in terms of bytes
• Typically in the range of GB to terabytes (TB)
11

12. Access Methods

Sequential access
• Memory is organized into units of data called records
• Involves a shared read-write mechanism
• Access must be made in a specific linear sequence
• Access time is variable
Direct access
• Individual blocks or records have a unique address based on physical location
• Access time is variable
Random access
• Each addressable location in memory has a unique, physically wired-in addressing mechanism
• The time to access a given location is independent of the sequence of prior accesses and is constant
• Any location can be selected at random and directly addressed and accessed
• Main memory and some cache systems are random access
12

13. Access Methods

• Sequential access (e.g. tapes)
• Memory is organized into units of data called records; access must be made in a
specific linear sequence. The access time is highly variable.
• Direct access (e.g. disks)
• Individual blocks or records have a unique address based on physical location.
• Access is accomplished by direct access to reach a general vicinity, plus sequential searching to reach the final location.
• Access time is variable
• Random access (e.g. main memory)
• Each addressable location (word or byte) in memory has a unique, physically wired-in
addressing mechanism. Access time is constant.
13

14. Physical Types

• Semiconductor memory
• Magnetic surface memory – used for disk and tape
• Optical
• Magneto-optical
14

15. Physical Characteristics

• Volatile
• Information decays naturally when electrical power is switched off - semiconductor
memory. (e.g. main memory)
• Non-volatile memory
• Information remains after power switched off unless deliberately changed - magnetic
surface, optical and semiconductor memories. E.g.: magnetic surface memory (e.g.
hard disk, floppy, flash drive, CD-RW, DVD etc.)
• Non-erasable memory
• Information cannot be altered, except by destroying the storage unit.
• Semiconductor of this type is known as read-only memory (ROM).
15

16. Memory Hierarchy

• Programmers want unlimited amounts of memory with low latency to store
unlimited programs.
• Fast memory technology is more expensive per bit than slower memory
• Solution: organize memory system into a memory hierarchy
• Entire addressable memory space available in largest, slowest memory
• Each level maps addresses from slower, larger memory to faster, smaller
• Incrementally smaller and faster memories, each containing a subset of the memory below it,
proceed in steps up toward the processor
16

17. Memory Hierarchy

17

18.

Cache
CPU
Cache
Cache is typically integrated directly into CPU chip or placed on a
separate chip that has a separate bus interconnect with the CPU
18

19.

Cache
CPU
Cache
Smaller, faster storage device that acts as staging area for subset of
data in a larger, slower device
19

20.

Cache
• L1 cache: 16 KB – 128 KB, ~1 ns
• L2 cache: 256 KB – 1 MB, 3–10 ns
• L3 cache: 2–32 MB, 10–20 ns
20

21.

Cache
[Figure: CPU with cache.]
Cache provides an illusion of fast access, provided that most accesses are satisfied by the cache.
How does the cache know what data to load and store so that most accesses will be satisfied by the cache?
21

22. Principle of locality

• Principle of locality / Locality of reference: Programs access a
relatively small portion of the address space at any instant of time
• Example: 90/10 rule: "A program spends 90% of its time in 10% of its code"
• Implication: we can predict with reasonable accuracy what
instructions and data a program will use in the near future based on
its accesses in the recent past.
22

23. Principle of Locality

• When the computer uses something, it will probably use it again very
soon, and it will probably use the "neighbors" of this item very soon.
• There are two basic types of reference locality – temporal and spatial
locality.
• Temporal locality: states that recently accessed items are likely to be
accessed in the near future.
• Spatial locality: says that items whose addresses are near one another tend
to be referenced close together in time.
23
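As an illustration, the two kinds of locality show up in how a loop traverses a 2-D array. A minimal Python sketch (the array size and function names are invented; Python lists are not truly contiguous, so this only illustrates the access pattern, whereas in a language like C the row-major version is dramatically faster):

```python
N = 256
matrix = [[1] * N for _ in range(N)]

def row_major_sum(m):
    # Good spatial locality: consecutive inner-loop accesses touch
    # neighbouring addresses, so each fetched cache line is fully used.
    # Good temporal locality: `total` is reused on every iteration.
    total = 0
    for i in range(N):
        for j in range(N):
            total += m[i][j]
    return total

def column_major_sum(m):
    # Poor spatial locality: each inner-loop access jumps to a different
    # row, landing on a different cache line almost every time.
    total = 0
    for j in range(N):
        for i in range(N):
            total += m[i][j]
    return total
```

Both functions compute the same sum; only the order of accesses, and hence the cache behaviour, differs.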

24. How do memory hierarchies work?

• Fundamental idea of a memory hierarchy:
• For each k, the faster, smaller device at level k serves as cache for larger, slower
device at level k+1
• How do memory hierarchies work?
• Programs tend to access data at level k more often than they access data at level
k+1
• Thus, storage at level k+1 can be slower, and thus larger and cheaper per bit
• Net effect: Large pool of memory that costs as little as the cheap storage near the
bottom, but that serves data to programs at ≈ rate of the fast storage near the top.
24

25. Cache Memory

• Cache: Smaller, faster storage device that acts as staging area for subset
of data in a larger, slower device (RAM)
• Basic concept for Cache:
• higher levels are smaller and faster
• maintain copies of data from lower levels
• provide illusion of fast access to larger storage, provided that most accesses are
satisfied by cache
• work if memory accesses exhibit good locality of references
• Analogy: Think of it as something like a refrigerator
25

26. Cache Operation

1. CPU requests contents of a memory location
2. Check cache for this data (level k)
3. If present (a hit), get it from the cache (fast)
4. When a word is not found in the cache, a miss occurs:
a) Fetch the word from the lower level (higher cache level, e.g. k+1) in the hierarchy, incurring higher latency
b) The lower level may be another cache or the main memory
c) Place the block into the cache
5. Then deliver it from the cache to the CPU
26
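The sequence above can be sketched as a toy single-level, direct-mapped cache with a block size of one word (all names and sizes here are invented for illustration; real caches track whole blocks and add replacement and write policies):

```python
class ToyCache:
    """Toy single-level, direct-mapped cache; block size = 1 word."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = {}              # line index -> (tag, data)
        self.hits = 0
        self.misses = 0

    def read(self, address, memory):
        line = address % self.num_lines   # which line the block maps to
        tag = address // self.num_lines   # identifies the block in that line
        entry = self.lines.get(line)
        if entry is not None and entry[0] == tag:
            self.hits += 1                # step 3: hit, serve from the cache
            return entry[1]
        self.misses += 1                  # step 4: miss
        data = memory[address]            # 4a/4b: fetch from the lower level
        self.lines[line] = (tag, data)    # 4c: place the block into the cache
        return data                       # step 5: deliver to the CPU

memory = list(range(100, 116))            # pretend main memory, 16 words
cache = ToyCache(num_lines=4)
cache.read(5, memory)                     # first access: miss
cache.read(5, memory)                     # same word again: hit
```

Reading the same address twice shows temporal locality paying off: the first access misses and fills the line, the second hits.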

27.

Cache Operation
[Figure: two adjacent levels of the hierarchy. The smaller, faster, more expensive device at level k caches a subset of the blocks from level k+1. The larger, slower, cheaper storage device at level k+1 is partitioned into blocks (numbered 0–15). Data is copied between levels in block-sized transfer units.]
27

28. Basic Cache Design

• Cache only holds a portion of a program
• Which part of the program does the cache
contain?
• Most recently accessed references
• Cache is divided into units called cache blocks
(also known as cache lines); each block holds a
contiguous set of memory addresses
• How does the CPU know which part of the
program the cache is holding?
• Each cache block has extra bits, called cache tag,
which hold the main memory address of the data
in the block
28

29. Cache Example

[Slides 29–43: a step-by-step worked example of cache accesses, hits and misses (slide images not included in this transcript).]

44. No Cache vs Cache

[Slide 44: comparison of memory access without a cache vs with a cache (slide image not included in this transcript).]
44

45. Elements of Cache Design

• Cache Placement Policy / Mapping function: Where can a block be placed in the
upper level?
• direct, fully associative, set associative
• Replacement policy: Which block should be replaced on a miss?
• least-recently used, first-in-first-out, least-frequently used, random
• Write policy : What happens on a write?
• write through, write back, write once
• Number of caches: How many caches?
• single- or two-level, unified or split
45

46. Cache and Main Memory Structure

• Main memory
• Memory size: 2^n addressable words (n-bit
address)
• Block size: K words (word length: x bits)
• Number of blocks: M = 2^n / K
• Cache
• Number of lines: C (C << M)
• The number of memory blocks is much larger than the
number of cache lines
• Note: the Tag is used to identify a block
46

47. Cache Structure: TAG

• A cache line contains two fields
• Data from RAM
• The address of the block currently in the cache.
• TAG field: the part of the cache line that stores the address of the
block
• Many different blocks can map to any given cache line; the tag field
specifies which block is currently in the cache line
47

48. Cache Lines

• The cache memory is divided into blocks or lines.
• Currently lines can range from 16 to 64 bytes
• Data is copied to and from the cache one line at a time
• The lower log2(line size) bits of a memory address specify a particular
byte in a line
48

49. Cache Lines: Example

• With a line size of 4, the offset is log2(4) = 2 bits
• The lower 2 bits specify which byte in the line
49
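The offset arithmetic can be checked in a few lines of Python (a sketch; the helper name `offset_bits` is invented):

```python
from math import log2

def offset_bits(line_size_bytes):
    # The lower log2(line size) bits of an address select a byte in the line.
    return int(log2(line_size_bytes))

# With a line size of 4 bytes, the lower 2 bits pick the byte:
# address 0b1101 is byte 0b01 (byte 1) of its line.
assert offset_bits(4) == 2
assert 0b1101 & 0b11 == 0b01
```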

50. Cache Mapping Function

• Which memory blocks are allowed into which cache lines?
• Direct mapped (block can go to only one line)
• Associative / Fully Associative (block can go to any line)
• Set-associative (block can go to one of N lines)
Direct
• The simplest technique
• Maps each block of main memory into only
one possible cache line
Associative
• Permits each main memory block to be
loaded into any line of the cache
• The cache control logic interprets a
memory address simply as a Tag and a
Word field
Set Associative
• A compromise that exhibits the strengths
of both the direct and associative
approaches while reducing their
disadvantages
• To determine whether a block is in the
cache, the cache control logic must
simultaneously examine the Tag of every line
in the selected set for a match
50

51. Cache Mapping: Direct

• Each location (address) in RAM has one specific place in cache where
the data will be held
• Since a data block from RAM can only be in one specific line in the
cache, it must always replace the one block that was already there.
• Map each block of main memory into only one possible cache line.
• Which cache line: (Block number) mod (Number of lines)
51
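The placement rule can be sketched directly (the function name is invented for illustration):

```python
def direct_mapped_line(block_number, num_lines):
    # Direct mapping: each memory block has exactly one possible line.
    return block_number % num_lines

# With 4 cache lines, blocks 0, 4, 8, ... all compete for line 0,
# so loading block 4 must evict block 0 if it is resident.
assert direct_mapped_line(0, 4) == 0
assert direct_mapped_line(4, 4) == 0
assert direct_mapped_line(9, 4) == 1
```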

52. Direct Mapping: Address Structure

• Offset/word: The lower log2(line size) bits define which byte in the block
• Line: The next log2(number of lines) bits defines which line of the cache
• Tag: The remaining upper bits are the tag field
***The bits in the tag field and the line field define the block number
52

53. Direct Mapping

Advantage:
• By looking at the address alone, you can go straight
to the single line in the cache and check whether the data
is there.
Disadvantage:
• Because of the fixed mapping, there is a restriction on
placement
• E.g.: block 0 has to be placed in line 0 only, not in
another slot even though other slots are free, which can cause
unnecessary replacement overhead.

54. Direct Mapping: Address Structure

• Memory address is interpreted as tag, line and offset*
• Address length = tag + line + offset = (s + w) bits
• Total number of addressable memory units = 2^(s+w) words or bytes
• Depending on word-addressable or byte-addressable memory
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s = 2^(t+l)
• Number of lines in cache = m = 2^l
| Tag, t = s - l | Line, l | Offset, w |
*words or bytes, depending on whether it is word-addressable or byte-addressable memory
54

55. A Hypothetical Direct Mapping

Memory
• 8-bit memory address
• One memory address contains 1 byte of data
• Let's assume 1 word is equal to 1 byte
Where in the cache does the content of memory address 0b00000000 map to?
• L0 (line 0)

56. Direct Addressing: Example

• 32 bit addresses (can address 4 GB)
• 32 KB of cache, 64 byte lines
• Bits to specify offset?
• offset is 6 bits
• Number of lines?
• 32 KB / 64 = 512
• Bits to specify which line?
• log2(512) = 9
• Bits to specify tag?
• 32 - (9 + 6) = 17 bits

57. Direct Mapping: Example Address

• Using the previous direct mapping scheme with a 17-bit tag, 9-bit index/line and 6-bit offset:
01111101011101110001101100111000
• Compare the tag field of line 001101100 (108 in decimal) against the value 01111101011101110. If it
matches, return byte 111000 (56 in decimal) of the line
*Index here refers to the line number (for direct mapping)
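The field extraction in this example can be reproduced with shifts and masks in Python (a sketch; the variable names are invented, and the 17/9/6 field widths come from the example above):

```python
addr = 0b01111101011101110001101100111000  # the 32-bit example address

offset = addr & 0b111111             # low 6 bits: byte within the line
line   = (addr >> 6) & 0b111111111   # next 9 bits: cache line (index)
tag    = addr >> 15                  # remaining 17 bits: tag

assert offset == 0b111000 == 56
assert line == 0b001101100 == 108
assert tag == 0b01111101011101110
```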

58. Cache Mapping: Fully Associative

• In associative cache mapping, the data from any location in RAM can be
stored in any location in cache
• When the processor wants an address, all tag fields in the cache are checked
to determine if the data is already in the cache
• Each tag line requires circuitry to compare the desired address with the tag
field
• All tag fields are checked in parallel

59. Fully Associative Mapping: Address Structure

• Offset/word: The lower log2(line size) bits define which byte in the
block
• Tag: The remaining upper bits are the tag field.
***The bits in the tag field define the block number

60. Fully Associative Mapping: Address Structure

Memory address is interpreted as tag and offset* (word or byte)
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
• Number of lines in cache = undetermined / not relevant
| Tag, s | Offset, w |
*words or bytes, depending on whether it is word-addressable or byte-addressable memory

61. A Hypothetical Fully Associative Mapping

Memory
• 8-bit memory address
• One memory address contains 1 byte of data
• Let's assume 1 word is equal to 1 byte
Where in the cache does the content of memory address 0b00000000 map to?
• It can map to any cache line, i.e., L0, L1, L2, L3 (check which line
is empty; if all the lines are full, a replacement policy is needed)

62. Fully Associative Mapping: Example

• For a 4 GB address space with 128 KB cache and 32 byte blocks:
• How many bits to specify the offset/word?
• log2(32) = 5 bits
• How many bits to specify the tag?
• 32 - 5 = 27 bits

63. Fully Associative Mapping: Example Address

• Using the previous fully associative mapping scheme with a 27-bit tag and 5-bit
offset:
01111101011101110001101100111000
• Compare all tag fields against the value 011111010111011100011011001. If a
match is found, return byte 11000 (24 in decimal) of the line
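The fully associative split of the same address (27-bit tag, 5-bit offset) can be checked the same way (a sketch; variable names invented):

```python
addr = 0b01111101011101110001101100111000

offset = addr & 0b11111   # low 5 bits: byte within the line
tag    = addr >> 5        # remaining 27 bits: compared against every line's tag

assert offset == 0b11000 == 24
assert tag == 0b011111010111011100011011001
```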

64. Cache Mapping: Set Associative

• Set: The cache lines are grouped into sets
• The data can be stored in any of the lines in the set
• When the processor wants an address, it indexes to the set and then searches the
tag fields of all lines in the set for the desired address
• Example: 2 way set associative mapping: A given block can be in one of 2 lines in only one set
• Map each block of main memory into only one possible cache set.
• Which cache set: (Block number) mod (Number of sets)

65. Set Associative Mapping: Address Structure

• Set: Identifies the set
• Offset/word: The lower log2(line size) bits define which byte in the
block
• Tag: The remaining upper bits are the tag field
***The bits in the tag field and the set field define the block number

66. Set Associative Mapping: Address Structure

• Address length = (s + w) bits
• Total number of addressable units = 2^(s+w)
• Block size = line size = 2^w
• Number of blocks in main memory = 2^s
• Number of lines in a set = k
• Number of sets = v = 2^d
• Number of lines in cache = k × v = k × 2^d
| Tag, s - d | Set, d | Offset, w |

67. A Hypothetical Set Associative Mapping

Memory
• 8-bit memory address
• One memory address contains 1 byte of data
• Let's assume 1 word is equal to 1 byte
Where in the cache does the content of memory address 0b00000000 map to?
• Set 0 (can map to either L0 or L1 in S0)

68. Set Associative Mapping: Example

• 32 bit addresses
• 32 KB of cache, 64 byte lines, 4 way set associative
• Bits to specify offset?
• log2(64) = 6 bits
• Number of lines?
• 32 KB / 64 = 512
• Number of sets?
• 512 / 4 = 128
• Bits to specify set?
• log2(128) = 7
• Bits to specify tag?
• 32-(7+6) = 19 bits

69. Set Associative Mapping: Example

• Using the previous set-associative mapping with a 19-bit tag, 7-bit index/set and 6-bit offset:
01111101011101110001101100111000
• Compare the tag fields in all four lines of set 1101100 (108 in decimal) against the value 0111110101110111000. If a
match is found, return byte 111000 (56 in decimal) of that line
*Index here refers to the set number (for set associative mapping)
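The set-associative split of the example address (19-bit tag, 7-bit set, 6-bit offset) can also be verified with shifts and masks (a sketch; variable names invented):

```python
addr = 0b01111101011101110001101100111000

offset  = addr & 0b111111          # low 6 bits: byte within the line
set_idx = (addr >> 6) & 0b1111111  # next 7 bits: which set
tag     = addr >> 13               # remaining 19 bits: compared against the
                                   # tags of all 4 lines in the selected set

assert offset == 0b111000 == 56
assert set_idx == 0b1101100 == 108
assert tag == 0b0111110101110111000
```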

70. Advantages and Disadvantages of Direct Mapping

• Advantages:
• Simple
• Inexpensive
• Disadvantage:
• Fixed location for given block - If a program accesses 2 blocks that map to the
same line repeatedly, cache misses are very high (low hit ratio)
70

71. Advantages and Disadvantages of Fully Associative Mapping

• Advantage:
• Flexibility in replacing a block (on a line) when a new block is read into the
cache
• Disadvantage:
• Complex circuitry required to examine the tags of all cache lines in parallel.
• Expensive.
71

72. Why Set Associative?

• Advantage: a 2-way or 4-way set associative cache normally improves the hit ratio over direct mapping, without the full hardware cost of a fully associative cache

73. Summary

• Memory hierarchy is an organizational approach to improve the
performance of the computer
• Different types of cache addressing
• Each has its own trade-offs

74. End of Lecture 8! 

• Additional readings:
• Chapter 4 (Cache Memory)
• Stallings, W. Computer Organization and Architecture: Designing for Performance, Tenth Edition. Upper Saddle River, N.J.: Prentice
Hall.
• Chapter 2 (Memory Hierarchy Design)
• John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, Fourth Edition, Morgan Kaufmann.
• Appendix B (Review of Memory Hierarchy)
• John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, Fourth Edition, Morgan Kaufmann.
• Acknowledgement:
• Some of the animations (Slides 8 to 13) used in this presentation were prepared by 2018/09 batch students: Kenneth Tan, Tuan Hong
Chua, Jiahao Liu, Chenxi Xiong, ZiXi Hee for their SOF108 class presentation
• Next lecture:
• Number of caches, cache replacement policy, cache write policy, cache performance metrics, virtual memory (maybe)
• Announcements:
Mid-semester exam will be scheduled on Week 10 (20th Nov. 2023)
Topics covered: Lecture 1 – 7 (Chapter 1.1 – Chapter 3.3)
• Questions?
74

75.

Appendix: Additional Self Readings
75

76.

Appendix: University's vision and mission
Vision
Xiamen University Malaysia aspires to become a university with a
distinct global outlook, featuring first-class teaching and research, and
embracing cultural diversity.
Mission
To nurture young talents with dignity and wisdom, turning them into
fine citizens of the region who will contribute to the prosperity of the
people and social progress of Malaysia, China and Southeast Asia.
76
