[Computer Architecture] Multi-cycle Performance


Multicycle Processor Performance

Instructions take different numbers of cycles:
- 3 cycles: beq, j
- 4 cycles: R-Type, sw, addi
- 5 cycles: lw
CPI is a weighted average of these cycle counts over the instruction mix.
SPECINT2000 benchmark:
- 25% loads
- 10% stores
- 11% branches
- 2% jumps
- 52% R-type

Average CPI = (0.11 + 0.02)(3) + (0.52 + 0.10)(4) + (0.25)(5) = 4.12
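The weighted average can be checked with a short Python sketch. The names `cycles`, `mix`, and `average_cpi` are illustrative only; addi is left out because the benchmark mix above does not list it separately.

```python
# Cycles per instruction class on the multicycle processor (from the list above).
cycles = {"lw": 5, "sw": 4, "r_type": 4, "beq": 3, "j": 3}

# SPECINT2000 instruction mix as fractions of all executed instructions.
mix = {"lw": 0.25, "sw": 0.10, "r_type": 0.52, "beq": 0.11, "j": 0.02}


def average_cpi(cycles, mix):
    """Return the mix-weighted average number of cycles per instruction."""
    return sum(mix[name] * cycles[name] for name in mix)


cpi = average_cpi(cycles, mix)
print(f"Average CPI = {cpi:.2f}")  # prints: Average CPI = 4.12
```

Changing a fraction in `mix` shows how sensitive the CPI is to the share of 5-cycle loads.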

Multicycle critical path:

Tc = t_{pcq_ALUout} + t_mux + max(t_ALU + t_mux, t_mem) + t_setup

= t_{pcq_ALUout} + t_mux + t_mem + t_setup

= [30 + 25 + 250 + 20] ps

= 325 ps
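The same cycle time can be reproduced with a minimal Python sketch. The values for t_pcq, t_mux, t_mem, and t_setup are the ones used above; `t_alu = 200` ps is an assumption (this post does not state it), chosen only so that the memory access dominates the max term as it does in the derivation.

```python
# Element delays in picoseconds.
t_pcq = 30     # register clock-to-Q delay
t_mux = 25     # multiplexer delay
t_mem = 250    # memory read delay
t_setup = 20   # register setup time
t_alu = 200    # ASSUMPTION: not given in this post


def multicycle_tc(t_pcq, t_mux, t_alu, t_mem, t_setup):
    """Cycle time: register output, one mux, the slower of ALU+mux or memory, setup."""
    return t_pcq + t_mux + max(t_alu + t_mux, t_mem) + t_setup


tc_ps = multicycle_tc(t_pcq, t_mux, t_alu, t_mem, t_setup)
print(f"Tc = {tc_ps} ps")  # prints: Tc = 325 ps
```

Any assumed ALU delay up to 225 ps gives the same result, because t_mem then remains the larger term.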

For a program with 100 billion instructions executing on a multicycle MIPS processor:

Execution Time

= (# instructions) × CPI × Tc

= (100 × 10^9)(4.12)(325 × 10^{-12})

= 133.9 seconds
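As a quick check of the arithmetic, here is a small Python sketch of the execution-time formula. The function name `execution_time` is hypothetical; the inputs are the instruction count, CPI, and Tc used above.

```python
def execution_time(instructions, cpi, tc_seconds):
    """Execution time = (# instructions) x CPI x Tc, in seconds."""
    return instructions * cpi * tc_seconds


time_s = execution_time(instructions=100e9, cpi=4.12, tc_seconds=325e-12)
print(f"Execution time = {time_s:.1f} seconds")  # prints: Execution time = 133.9 seconds
```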

This is slower than the single-cycle processor (92.5 seconds). Why?

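The slowest step sets the clock cycle, and the instruction was not divided into exactly equal steps.
- Not evenly balanced
- Not all steps are the same length
Cutting the datapath into steps also inserts registers (flip-flops) between them to hold values for the next cycle. Each register adds the overhead of t_pcq and t_setup, so the clock cannot become exactly 5 times faster.
- Sequencing overhead for each step (t_pcq + t_setup = 50 ps)
- As a result, the shorter clock cycle could not make up for the loss in CPI.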
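The trade-off can be put in numbers with a rough Python sketch. It assumes the single-cycle processor has CPI = 1, so its cycle time follows from the 92.5 seconds quoted above; the variable names are illustrative.

```python
instructions = 100e9

# Single-cycle processor: CPI = 1, so Tc = time / instructions.
single_time_s = 92.5
single_tc_ps = single_time_s / instructions * 1e12   # about 925 ps

# Multicycle processor values from this post.
multi_tc_ps = 325
multi_cpi = 4.12

ideal_tc_ps = single_tc_ps / 5                 # perfectly balanced 5-way split, no overhead
clock_speedup = single_tc_ps / multi_tc_ps     # how much faster the clock actually got
slowdown = multi_cpi / clock_speedup           # net effect on execution time

print(f"Ideal Tc      = {ideal_tc_ps:.0f} ps")   # prints: Ideal Tc      = 185 ps
print(f"Clock speedup = {clock_speedup:.2f}x")   # prints: Clock speedup = 2.85x
print(f"Net slowdown  = {slowdown:.2f}x")        # prints: Net slowdown  = 1.45x
```

The clock gets only about 2.85 times faster while the CPI rises to 4.12, so the multicycle design ends up about 1.45 times slower overall.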

Q) How can the steps be divided evenly?

A) In MIPS, the imbalance comes from the data memory.