Computer structure pipeline
- File type: ppt / pptx (PowerPoint)
- Slides: 64
- File size: 672.04 kB
- Author: unknown
Slides and text from this presentation:
Slide 5: Pipelining
![Pipelining Pipelining does](/documents_6/e67377346d0ee936888b2394cb90b3f4/img4.jpg)
Pipelining does not reduce the latency of single task,
it increases the throughput of entire workload
Potential speedup = Number of pipe stages
Pipeline rate is limited by the slowest pipeline stage
Partition the pipe to many pipe stages
Make the longest pipe stage as short as possible
Balance the work in the pipe stages
Pipeline adds overhead (e.g., latches)
Time to “fill” pipeline and time to “drain” it reduces speedup
Stall for dependencies
Too many pipe stages start to lose performance
IPC of an ideal pipelined machine is 1
Every clock one instruction finishes
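The throughput-vs-latency point above can be made concrete with a small sketch. The numbers (1000 tasks, a 5 ns task split into 5 stages) are illustrative assumptions, not from the slides:

```python
# Sketch: pipelining leaves per-task latency unchanged but raises throughput.
# Assumes perfectly balanced stages and no stalls (the ideal case).

def unpipelined_time(n_tasks, task_time):
    return n_tasks * task_time

def pipelined_time(n_tasks, task_time, stages):
    stage_time = task_time / stages              # perfectly balanced partition
    # "fill" takes (stages - 1) extra stage-times; then one task per cycle
    return (stages + n_tasks - 1) * stage_time

t_seq  = unpipelined_time(1000, 5.0)    # 5000.0
t_pipe = pipelined_time(1000, 5.0, 5)   # (5 + 999) * 1.0 = 1004.0
speedup = t_seq / t_pipe                # just under 5 = number of stages
```

The fill/drain term `(stages - 1)` is why the achieved speedup only approaches, and never reaches, the stage count.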
Slide 7: Structural Hazard
![Structural Hazard Different](/documents_6/e67377346d0ee936888b2394cb90b3f4/img6.jpg)
Different instructions using the same resource at the same time
Register File:
Accessed in 2 stages:
Read during stage 2 (ID)
Write during stage 5 (WB)
Solution: 2 read ports, 1 write port
Memory
Accessed in 2 stages:
Instruction Fetch during stage 1 (IF)
Data read/write during stage 4 (MEM)
Solution: separate instruction cache and data cache
Each functional unit can only be used once per instruction
Each functional unit must be used at the same stage for all instructions
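A structural hazard can be detected mechanically: map each pipeline stage to the resource it uses and check whether two in-flight instructions claim the same resource in the same cycle. The stage/resource table below is a hypothetical sketch assuming a unified memory (the very conflict the slide's split caches remove):

```python
from collections import defaultdict

# Hypothetical mapping for a 5-stage pipe with a single unified memory.
STAGE_RESOURCE = {1: "memory",   # IF: instruction fetch
                  2: "regfile",  # ID: register read
                  4: "memory",   # MEM: data access for loads/stores
                  5: "regfile"}  # WB: register write

def resources_in_cycle(start_cycles):
    """start_cycles[i] = cycle in which instruction i entered the pipe."""
    use = defaultdict(list)
    for instr, start in enumerate(start_cycles):
        for stage, res in STAGE_RESOURCE.items():
            use[(start + stage - 1, res)].append(instr)
    # keep only cycles where a resource is claimed by more than one instruction
    return {k: v for k, v in use.items() if len(v) > 1}

# Four instructions issued back to back: instr 0 is in MEM while instr 3
# is in IF (both want "memory"), and instr 0's WB overlaps instr 3's ID.
conflicts = resources_in_cycle([1, 2, 3, 4])
```

The two conflicts found are exactly the slide's two cases: the memory one is solved by separate I- and D-caches, the register-file one by giving it 2 read ports and 1 write port.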
Slide 17: Forwarding Control
![Forwarding Control Forwarding](/documents_6/e67377346d0ee936888b2394cb90b3f4/img16.jpg)
Forwarding from EXE (L3)
if (L3.RegWrite and (L3.dst == L2.src1)) ALUSelA = 1
if (L3.RegWrite and (L3.dst == L2.src2)) ALUSelB = 1
Forwarding from MEM (L4)
if (L4.RegWrite and
((not L3.RegWrite) or (L3.dst != L2.src1)) and
(L4.dst == L2.src1)) ALUSelA = 2
if (L4.RegWrite and
((not L3.RegWrite) or (L3.dst != L2.src2)) and
(L4.dst == L2.src2)) ALUSelB = 2
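The same control logic can be sketched in executable form. Field and signal names (`RegWrite`, `dst`, the mux encodings) follow the slide; the slide's "(not L3.RegWrite) or (L3.dst != src)" guard is expressed here by checking EXE first, so the younger result wins:

```python
# Sketch of the forwarding-mux select logic (assumed encodings):
# 0 = value from register file, 1 = forward from EXE (L3), 2 = forward from MEM (L4).

def alu_select(src, L3, L4):
    if L3["RegWrite"] and L3["dst"] == src:
        return 1      # EXE stage holds the newest value of src
    if L4["RegWrite"] and L4["dst"] == src:
        return 2      # MEM stage holds it (and EXE does not shadow it)
    return 0          # no hazard: read the register file

L3 = {"RegWrite": True, "dst": 5}
L4 = {"RegWrite": True, "dst": 7}
alu_sel_a = alu_select(5, L3, L4)   # 1: forwarded from EXE
alu_sel_b = alu_select(7, L3, L4)   # 2: forwarded from MEM
```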
Slide 29: Control Hazard: Stall
![Control Hazard Stall Stall](/documents_6/e67377346d0ee936888b2394cb90b3f4/img28.jpg)
Stall pipe when branch is encountered until resolved
Stall impact: assumptions
CPI = 1
20% of instructions are branches
Stall 3 cycles on every branch
CPI new = 1 + 0.2 × 3 = 1.6
(CPI new = CPI Ideal + avg. stall cycles / instr.)
We lose 60% of the performance
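The calculation above is one instance of the general formula CPI new = CPI ideal + avg. stall cycles per instruction. A small sketch with the slides' numbers (20% branches, 3-cycle penalty); the 1.3 and 1.18 results match the predict-not-taken and BTB cases on the later slides:

```python
# CPI_new = CPI_ideal + (branch fraction) * (fraction paying penalty) * (penalty cycles)

def cpi_with_branches(cpi_ideal, branch_frac, pay_frac, penalty):
    return cpi_ideal + branch_frac * pay_frac * penalty

stall_always = cpi_with_branches(1.0, 0.2, 1.0, 3)  # 1.6: stall on every branch
predict_nt   = cpi_with_branches(1.0, 0.2, 0.5, 3)  # 1.3: ~50% of branches taken
btb_p70      = cpi_with_branches(1.0, 0.2, 0.3, 3)  # 1.18: P = 0.7 correct predictions
```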
Slide 30: Control Hazard: Predict Not Taken
![Control Hazard Predict Not](/documents_6/e67377346d0ee936888b2394cb90b3f4/img29.jpg)
Execute instructions from the fall-through (not-taken) path
As if there is no branch
If the branch is not-taken (~50%), no penalty is paid
If branch actually taken
Flush the fall-through path instructions before they change the machine state (memory / registers)
Fetch the instructions from the correct (taken) path
Assuming ~50% branches not taken on average
CPI new = 1 + (0.2 × 0.5) × 3 = 1.3
Slide 32: BTB
![BTB Allocation Allocate](/documents_6/e67377346d0ee936888b2394cb90b3f4/img31.jpg)
Allocation
Allocate instructions identified as branches (after decode)
Both conditional and unconditional branches are allocated
Not taken branches need not be allocated
BTB miss implicitly predicts not-taken
Prediction
BTB lookup is done in parallel with the IC lookup
BTB provides
Indication that the instruction is a branch (BTB hits)
Branch predicted target
Branch predicted direction
Branch predicted type (e.g., conditional, unconditional)
Update (when branch outcome is known)
Branch target
Branch history (taken / not-taken)
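The allocate / predict / update flow above can be sketched as a tiny table keyed by branch PC. This is an assumed minimal structure (real BTBs are set-associative, tag-matched, and store more state); a miss implicitly predicts not-taken, as on the slide:

```python
# Minimal BTB sketch (assumed structure: a dict from branch PC to its entry).

class BTB:
    def __init__(self):
        self.entries = {}  # pc -> {"taken": bool, "target": int}

    def predict(self, pc):
        """Looked up at fetch; returns (predicted taken?, predicted next PC)."""
        entry = self.entries.get(pc)
        if entry is None or not entry["taken"]:
            return False, pc + 4          # miss or not-taken: fall through
        return True, entry["target"]

    def update(self, pc, taken, target):
        """Allocate/refresh once the branch outcome is known (post-decode)."""
        self.entries[pc] = {"taken": taken, "target": target}

btb = BTB()
taken, next_pc = btb.predict(0x100)       # miss: (False, 0x104)
btb.update(0x100, True, 0x200)            # branch resolved taken to 0x200
taken, next_pc = btb.predict(0x100)       # hit:  (True, 0x200)
```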
Slide 33: BTB (cont.)
![BTB cont. Wrong prediction](/documents_6/e67377346d0ee936888b2394cb90b3f4/img32.jpg)
Wrong prediction
Predict not-taken, actual taken
Predict taken, actual not-taken, or actual taken but wrong target
In case of a wrong prediction – flush the pipeline
Reset latches (same as turning all in-flight instructions into NOPs)
Select the PC source to be from the correct path
Need to keep the fall-through address along with the branch
Start fetching instructions from the correct path
Assuming P% correct prediction rate
CPI new = 1 + (0.2 × (1-P)) × 3
For example, if P=0.7
CPI new = 1 + (0.2 × 0.3) × 3 = 1.18
Slide 41: The Memory Space
![The Memory Space Each memory](/documents_6/e67377346d0ee936888b2394cb90b3f4/img40.jpg)
Each memory location
is 8 bit = 1 byte wide
has an address
We assume a 32-bit address
An address space of 2^32 bytes
Memory stores both instructions and data
Each instruction is 32 bits wide, stored in 4 consecutive bytes in memory
Various data types have different width
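The numbers above are easy to check: 32 address bits give 2^32 byte locations, and a 32-bit instruction spans 4 consecutive byte addresses, so instruction addresses are word-aligned and step by 4:

```python
# Sketch: byte-addressable memory with 32-bit addresses.

ADDR_BITS = 32
address_space = 2 ** ADDR_BITS           # 4,294,967,296 bytes = 4 GiB

def instruction_bytes(addr):
    """Byte addresses occupied by the 32-bit instruction at addr."""
    assert addr % 4 == 0, "instructions are stored word-aligned"
    return [addr, addr + 1, addr + 2, addr + 3]
```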
Slide 43: Memory Components
![Memory Components Inputs](/documents_6/e67377346d0ee936888b2394cb90b3f4/img42.jpg)
Inputs
Address: address of the memory location we wish to access
Read: read data from location
Write: write data into location
Write data (relevant when Write=1): data to be written into the specified location
Outputs
Read data (relevant when Read=1): data read from the specified location
Slide 44: The Program Counter (PC)
![The Program Counter PC Holds](/documents_6/e67377346d0ee936888b2394cb90b3f4/img43.jpg)
Holds the address (in memory) of the next instruction to be executed
After each instruction, the PC is advanced to point to the next instruction
If the current instruction is not a taken branch,
the next instruction resides right after the current instruction
PC ← PC + 4
If the current instruction is a taken branch,
the next instruction resides at the branch target
PC ← target (absolute jump)
PC ← PC + 4 + offset×4 (relative jump)
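The three update rules above can be collected into one small function. Signal names are assumptions for the sketch; note the relative case adds the offset (counted in instructions, hence ×4) to the already-incremented PC:

```python
# Sketch of the PC update rules (assumed parameter names).

def next_pc(pc, taken_branch=False, absolute=False, target=0, offset=0):
    if not taken_branch:
        return pc + 4                 # sequential: instruction right after
    if absolute:
        return target                 # PC <- target (absolute jump)
    return pc + 4 + offset * 4        # PC <- PC + 4 + offset*4 (relative jump)

pc = next_pc(100)                                  # 104
pc = next_pc(100, taken_branch=True, offset=3)     # 100 + 4 + 12 = 116
```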
Slide 45: Instruction Execution Stages
![Instruction Execution Stages](/documents_6/e67377346d0ee936888b2394cb90b3f4/img44.jpg)
Fetch
Fetch instruction pointed by PC from I-Cache
Decode
Decode instruction (generate control signals)
Fetch operands from register file
Execute
For a memory access: calculate effective address
For an ALU operation: execute operation in ALU
For a branch: calculate condition and target
Memory Access
For load: read data from memory
For store: write data into memory
Write Back
Write result back to register file
Update the program counter
Slide 59: Five Execution Steps
![Five Execution Steps](/documents_6/e67377346d0ee936888b2394cb90b3f4/img58.jpg)
Instruction Fetch
Use PC to get instruction and put it in the Instruction Register.
Increment the PC by 4 and put the result back in the PC.
IR = Memory[PC];
PC = PC + 4;
Instruction Decode and Register Fetch
Read registers rs and rt
Compute the branch address
A = Reg[IR[25-21]];
B = Reg[IR[20-16]];
ALUOut = PC + (sign-extend(IR[15-0]) << 2);
We aren't setting any control lines based on the instruction type
(we are busy "decoding" it in our control logic)
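The field slicing in the decode step can be sketched directly: `IR[25-21]` and `IR[20-16]` are 5-bit register specifiers, and `IR[15-0]` is a 16-bit immediate that is sign-extended and shifted left by 2 before being added to the (already incremented) PC. Helper names here are illustrative:

```python
# Sketch of the register-fetch/branch-address step (MIPS-style I-format fields).

def bits(word, hi, lo):
    """Extract bit field word[hi:lo], inclusive on both ends."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def sign_extend16(value):
    """Interpret a 16-bit field as a signed number."""
    return value - (1 << 16) if value & 0x8000 else value

def decode(ir, pc):
    rs = bits(ir, 25, 21)                  # A = Reg[IR[25-21]] reads this index
    rt = bits(ir, 20, 16)                  # B = Reg[IR[20-16]] reads this index
    imm = sign_extend16(bits(ir, 15, 0))   # IR[15-0]
    branch_target = pc + (imm << 2)        # ALUOut; pc holds PC + 4 already
    return rs, rt, branch_target
```

For example, an instruction word with rs=9, rt=10, and immediate 4 decodes to a branch target 16 bytes past the incremented PC.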
Slide 60: Five Execution Steps (cont.)
![Five Execution Steps cont.](/documents_6/e67377346d0ee936888b2394cb90b3f4/img59.jpg)
Execution
ALU is performing one of three functions, based on instruction type:
Memory Reference: effective address calculation.
ALUOut = A + sign-extend(IR[15-0]);
R-type:
ALUOut = A op B;
Branch:
if (A==B) PC = ALUOut;
Memory Access or R-type instruction completion
Write-back step
Slide 63: Delayed Branch
![Delayed Branch Define branch](/documents_6/e67377346d0ee936888b2394cb90b3f4/img62.jpg)
Define the branch to take effect AFTER the n following instructions
HW executes the n instructions following the branch regardless of whether the branch is taken
SW fills the n slots following the branch with instructions that must execute regardless of branch resolution
Instructions that are before the branch instruction, or
Instructions from the converged path after the branch
If independent instructions cannot be found, insert NOPs
Slide 64: Delayed Branch Performance
![Delayed Branch Performance](/documents_6/e67377346d0ee936888b2394cb90b3f4/img63.jpg)
Filling 1 delay slot is easy, 2 is hard, 3 is harder
Assuming we can effectively fill d% of the delayed slots
CPI new = 1 + 0.2 × (3 × (1-d))
For example, for d=0.5, we get CPI new = 1.3
Mixing architecture with micro-architecture
New generations require more delay slots
Causes compatibility issues between generations