EECS 151: Complete Study Guide

Introduction to Digital Design and ICs — UC Berkeley, Spring 2026

中文概要

EECS 151是UC Berkeley的数字集成电路设计入门课。涵盖：布尔逻辑、SystemVerilog硬件描述语言、有限状态机、RISC-V处理器设计、流水线、CMOS晶体管电路、延迟分析、逻辑努力、导线与能耗、加法器与乘法器、时序分析与存储器、功耗分析、移位器与交叉开关、除法器、FPGA与ASIC比较。本网站覆盖全部讲义内容加上61C背景知识复习，提供三份考试速查表、交互式动画和练习题。

Course Map

61C Background / Prerequisites

Review of CS 61C concepts needed for EECS 151

中文

这一节回顾61C的核心基础：数的表示(无符号、二进制补码、符号扩展)，RISC-V指令集基础(6种格式)，内存模型(字节寻址、大小端、对齐)，以及C语言到汇编的映射。就像盖房子之前要先了解砖块和水泥一样，这些是设计数字电路的"原材料"。

Number Representation

UNSIGNED & TWO'S COMPLEMENT

Unsigned N-bit: range [0, 2^N-1]. Simply binary place values.
Two's complement N-bit: range [-2^N-1, 2^N-1-1]. MSB has weight -2^N-1.
Negation: flip all bits, add 1. Example: -5 in 8-bit = ~(00000101)+1 = 11111011
Sign extension: replicate the MSB to fill wider register. E.g., 1011 (4-bit, -5) → 11111011 (8-bit, still -5)
Zero extension: pad with 0s (for unsigned values)

中文

二进制补码像一个"环形数轴"：从0往上数到最大正数(01...1)，再往上就跳到最小负数(10...0)。符号扩展就是复制最高位来保持数值不变，就像把"-5"从4位搬到8位时，前面都填1。

WORKED EXAMPLE

Convert -13 to 8-bit two's complement:

13 in binary = 00001101
Flip bits = 11110010
Add 1 = 11110011
Check: -128+64+32+16+0+0+2+1 = -128+115 = -13

RISC-V ISA Quick Reference

6 INSTRUCTION FORMATS

Format	Key Fields	Examples
R-type	funct7 \| rs2 \| rs1 \| funct3 \| rd \| opcode	add, sub, and, or, xor, slt
I-type	imm[11:0] \| rs1 \| funct3 \| rd \| opcode	addi, lw, jalr, slli
S-type	imm[11:5] \| rs2 \| rs1 \| funct3 \| imm[4:0] \| opcode	sw, sb, sh
B-type	imm bits \| rs2 \| rs1 \| funct3 \| imm bits \| opcode	beq, bne, blt, bge
U-type	imm[31:12] \| rd \| opcode	lui, auipc
J-type	imm bits \| rd \| opcode	jal

All instructions are 32 bits. x0 is hardwired to 0. rd = destination register.

中文

RISC-V的6种指令格式就像6种不同的"信封"模板。R型用于寄存器间运算(如加减)，I型用于带立即数的操作(如addi、加载)，S型用于存储(把数据写入内存)，B型用于条件跳转，U/J型用于大跳转和上半立即数。

Memory Model

BYTE ADDRESSING & ENDIANNESS

Byte addressing: each memory address points to one byte (8 bits)
Word: 4 bytes (32 bits) in RV32. lw loads 4 bytes starting at address.
Little-endian (RISC-V): least significant byte at lowest address
Alignment: word access should be at addresses divisible by 4; half-word by 2
Misaligned access may cause exceptions or be slower

中文

内存就像一排编了号的"小格子"，每格存1字节。小端序(Little-endian)就是低位字节放在低地址，像把数字"从右往左"排列。对齐要求4字节数据必须放在4的倍数地址上，否则硬件访问会出问题。

C to Assembly Mapping

COMMON PATTERNS

// C: int x = a + b;
// ASM: add x0_reg, a_reg, b_reg

// C: if (a == b) { ... }
// ASM: beq a_reg, b_reg, label

// C: int arr[10]; x = arr[3];
// ASM: lw x_reg, 12(base_reg)   # 3*4=12 byte offset

// C: for (int i=0; i





Boolean Logic & Combinational Circuits
Lectures 2-3, 7-8 | Midterm 1 scope
中文概要布尔代数是数字设计的数学基础。变量只有0和1两个值。基本运算：AND(与)、OR(或)、NOT(非)。关键定律：德摩根定律(NOT(AB)=A'+B')、分配律、吸收律。组合逻辑的输出仅取决于当前输入。可用真值表、卡诺图、SOP/POS形式表达和化简。噪声容限衡量信号在传输中抗干扰的能力。

Boolean Algebra Laws

IDENTITIES
X + 0 = X, X · 1 = X
X + 1 = 1, X · 0 = 0
X + X = X, X · X = X
X + X' = 1, X · X' = 0

KEY LAWS

De Morgan's: (AB)' = A' + B', (A+B)' = A'B'
Distributive: A(B+C) = AB+AC, also A+BC = (A+B)(A+C)
Absorptive: A+AB = A, A(A+B) = A
Consensus: AB+A'C+BC = AB+A'C



中文
德摩根定律是最重要的化简工具，可以记为"打破长杠，变换运算"：AND变OR，OR变AND，各变量取反。吸收律(A+AB=A)的直觉是：如果A已经是1，加上AB不改变结果。共识定律说多余项BC可以去掉，因为它被AB和A'C覆盖。


Canonical Forms


SOP (Sum of Products): OR of minterms where f=1. Each minterm is an AND of all variables.
POS (Product of Sums): AND of maxterms where f=0. Each maxterm is an OR of all variables.


中文
SOP就是把真值表中输出为1的行写成"乘积之和"，每一项包含所有变量(取值为0的变量加非号)。POS则相反，取输出为0的行写成"和之积"。两者互为对偶。


Noise Margins
FORMULAS





中文
噪声容限就像信号的"安全边距"。NM_H是高电平输出和下一级高电平输入阈值之间的差距，NM_L是低电平的对应差距。越大越抗干扰，就像围墙越高越安全。


Design Metrics


Propagation delay (t_p): time from input 50% to output 50%. t_p = (t_pHL + t_pLH)/2
Rise time (t_r): output 10% to 90%. Fall time (t_f): 90% to 10%.
Throughput: operations per unit time. Latency: time from start to finish of one task.


中文
传播延迟是信号从输入到输出经过50%点的时间。吞吐量是单位时间能完成多少操作(像流水线工厂每秒出几个产品)，延迟是一个操作从头到尾的时间(一个产品走完整条流水线要多久)。





SystemVerilog
Lectures 4-6 | Midterm 1 scope
中文概要SystemVerilog是硬件描述语言，分为结构化（门级连接）和行为化（算法描述）两种风格。关键区别：assign是连续赋值（组合逻辑），always_ff用于时序逻辑（触发器），always_comb用于组合逻辑。阻塞赋值(=)按顺序执行，非阻塞赋值(<=)并行更新（用于触发器）。模块可参数化、可用generate循环实例化。

Two Styles

STRUCTURAL
module mux2(input a,b,sel, output out);
  logic s0, w0, w1;
  not (s0, sel);
  and (w0, s0, a);
  and (w1, sel, b);
  or  (out, w0, w1);
endmodule
Instantiates gates explicitly. Output is always first argument.

BEHAVIORAL
module mux2(input a,b,sel, output out);
  assign out = sel ? b : a;
endmodule
assign = continuous assignment (combinational logic). Changes on RHS immediately reflected.


中文
结构化描述就像画电路图，一个一个门连起来。行为化描述更像写代码，描述"做什么"而不是"怎么连"。两种方式综合后可以得到一样的电路。assign相当于"永远保持这个等式成立的导线连接"。


Always Blocks
KEY RULES

Block Use For Assignment Sensitivity
always_comb Combinational logic = (blocking) Auto (all inputs)
always_ff Flip-flops <= (non-blocking) @(posedge clk)
always_latch Latches (avoid!) <= Level-sensitive


中文
always_comb像一个"随时更新的计算器"，输入一变输出立刻变。always_ff像一个"快门"，只在时钟上升沿那一刻拍一张"快照"存起来。记住：组合逻辑用=（阻塞），触发器用<=（非阻塞）。


Blocking vs Non-Blocking
CRITICAL DIFFERENCE

Non-blocking (<=) - for FF
always_ff @(posedge clk) begin
  q1 <= in;  // all evaluate
  q2 <= q1;  // using OLD values
  q3 <= q2;  // = shift register!
end
Blocking (=) - WRONG for FF
always_ff @(posedge clk) begin
  q1 = in;   // updates immediately
  q2 = q1;   // uses NEW q1 = in
  q3 = q2;   // all get same value!
end



Parameterized Modules & Generate

module counter #(parameter N = 4) (
  input clk, rst, enb,
  output logic [N-1:0] count
);
  always_ff @(posedge clk)
    count <= rst ? '0 : (enb ? count+1 : count);
endmodule

中文
参数化模块像一个"可调尺寸的模具"。parameter定义可调参数(如位宽N)，generate/for可以批量生成重复硬件结构(如N个加法器)。实例化时用 #(.N(8)) 指定参数值。





Sequential Logic & FSMs
Lectures 6-7 | Midterm 1 scope
中文概要时序逻辑的输出取决于当前输入和历史状态（通过触发器存储）。有限状态机(FSM)是常见的时序电路。Moore机输出仅取决于当前状态，Mealy机输出取决于当前状态和输入。FSM用两段式always block实现：一个always_ff更新状态寄存器，一个always_comb计算下一状态和输出。同步/异步复位的区别在于复位是否依赖时钟。

Moore vs Mealy


Moore Mealy
Output depends on State only State + Input
Output changes On clock edge Asynchronously with input
Typically needs More states Fewer states


中文
Moore机像一个"只看当前房间决定行为"的机器人，Mealy机还会"听门外的声音来决定"。Moore更安全(输出同步)，Mealy更省状态但输出可能有毛刺。


FSM in SystemVerilog (Two-Block Style)

typedef enum logic [1:0] {IDLE, RUN, DONE} state_t;
state_t state, next_state;

always_ff @(posedge clk)
  state <= rst ? IDLE : next_state;

always_comb begin
  next_state = state;  // default: stay
  case (state)
    IDLE: if (start) next_state = RUN;
    RUN:  if (finish) next_state = DONE;
    DONE: next_state = IDLE;
  endcase
end

中文
两段式FSM：第一段(always_ff)是"记忆"，每个时钟存一次当前状态；第二段(always_comb)是"决策"，根据当前状态和输入决定下一步。注意always_comb里要有默认赋值(next_state=state)，否则会综合出latch。


Sync vs Async Reset


Synchronous Reset
always_ff @(posedge clk)
  if (!reset_n) q <= 0;
  else q <= d;
Reset only on clock edge.
Asynchronous Reset
always_ff @(posedge clk or
           negedge reset_n)
  if (!reset_n) q <= 0;
  else q <= d;
Reset immediately, no clock needed.






RISC-V & Datapath
Lectures 8-10 | Midterm 1 scope
中文概要RISC-V是开源指令集架构，有6种指令格式(R/I/S/B/U/J)。处理器是一个大型状态机：每个时钟周期取指、译码、执行、访存、写回。数据通路包括PC、指令存储器、寄存器文件、ALU、数据存储器、立即数生成器。控制信号(PCSel, ALUSel, RegWEn等)由指令的opcode/funct字段决定。JAL/JALR用于跳转，Branch用于条件分支。

Instruction Formats


Format Fields (31..0) Used By
R-type funct7 | rs2 | rs1 | funct3 | rd | opcode ADD, SUB, AND, OR, XOR, SLT...
I-type imm[11:0] | rs1 | funct3 | rd | opcode ADDI, LW, JALR, SLLI...
S-type imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode SW, SB, SH
B-type imm[12|10:5] | rs2 | rs1 | funct3 | imm[4:1|11] | opcode BEQ, BNE, BLT, BGE...
U-type imm[31:12] | rd | opcode LUI, AUIPC
J-type imm[20|10:1|11|19:12] | rd | opcode JAL


中文
RISC-V指令格式里，立即数的位分布看起来很乱，但其实是为了让不同格式的相同位尽量对齐(简化硬件)。记住关键：R型用于寄存器运算，I型用于立即数和加载，S型用于存储，B型用于分支。


Single-Cycle Datapath




Data flows: PC → IMEM → Decode/RegRead → ALU → DMEM → WriteBack

中文
单周期数据通路就像一条"流水线工厂"：PC提供地址 → 指令存储器取出指令 → 译码并读寄存器 → ALU计算 → 访问数据存储器 → 结果写回寄存器。控制逻辑就像"车间主任"，根据指令类型发出不同控制信号。





Pipelining & Hazards
Lectures 10-11 | Midterm 1 scope
中文概要流水线将数据通路分成多个阶段(取指F、译码D、执行X、访存M、写回W)，用寄存器分隔，使多条指令可以同时在不同阶段执行，提高吞吐量。但引入三种冒险：结构冒险(资源冲突)、数据冒险(数据依赖)、控制冒险(分支)。解决方法：转发(forwarding)、停顿(stalling)、分支预测。最大频率由最慢阶段决定。

5-Stage Pipeline










Hazards


Type Cause Solution
Structural Two stages need same resource Separate IMEM/DMEM, multi-port regfile
Data Instruction depends on result of prior instr Forwarding, stalling
Control Branch/jump changes PC Branch prediction, flush, delay slots


中文
流水线冒险就像高速公路上的"堵车"：结构冒险是两辆车抢同一车道(加车道解决)；数据冒险是后车需要前车的结果(转发=直接传递，停顿=等一等)；控制冒险是不知道该走哪个出口(分支预测=猜一个方向先走着)。


Timing
FORMULAS

Max frequency: f = 1/T_min
T_min = t_clk-q + t_logic,max + t_setup
Hold constraint: t_clk-q + t_logic,min > t_hold
Retiming: move registers to balance critical path across pipeline stages






CMOS Fundamentals
Lectures 13-14 | Midterm 2 scope
中文概要CMOS是现代芯片的基础。NMOS在栅极高电平时导通（拉低输出），PMOS在栅极低电平时导通（拉高输出）。每个门有互补的上拉(PUN,PMOS)和下拉(PDN,NMOS)网络，是对偶关系：串联变并联。静态CMOS门只能实现反相功能(NAND/NOR)。晶体管尺寸设计：串联时加倍宽度以保持等效驱动能力。

NMOS vs PMOS






V_G = 0 (NMOS OFF)



中文
NMOS就像一个"高电平才开的水龙头"，导通时把输出拉到地(输出0)。PMOS相反，"低电平才开"，导通时把输出拉到电源(输出1)。NMOS传递强0弱1，PMOS传递强1弱0，所以CMOS两者配合使用。


CMOS Gate Structure
KEY RULES

PDN (NMOS, to GND): Series = AND, Parallel = OR
PUN (PMOS, to V_DD): Dual of PDN (Series↔Parallel)
Output = complement of PDN expression
Static CMOS is always inverting: can build NAND, NOR, not AND/OR directly


中文
PDN决定什么时候输出拉低(接地)，所以如果PDN中两个NMOS串联(都导通才接地=AND条件)，输出就是(AB)的反=NAND。对偶规则：PDN串联↔PUN并联，像镜子里的倒影。


Interactive: Build a CMOS Gate











Transistor Sizing
METHOD
Goal: equal pull-up/pull-down strength as reference inverter (W_p/W_n = β)

NMOS in series × N: each width = N × W_n,min
PMOS in series × N: each width = N × W_p,min
Parallel: keep minimum width


Gate NMOS config PMOS config Each NMOS W Each PMOS W Total (β=2)
NAND2 2 series 2 parallel 2 2 8
NAND3 3 series 3 parallel 3 2 15
NOR2 2 parallel 2 series 1 4 10
NOR3 3 parallel 3 series 1 6 21

NAND gates are smaller than NOR gates! (because PMOS in parallel, not series)

中文
晶体管串联时电阻加倍，所以要加倍宽度来补偿。NAND中NMOS串联(需要加倍)而PMOS并联(不用加倍)，NOR则相反。因为PMOS本身就比NMOS大(beta=2)，所以NOR中PMOS串联会变得很大，这就是为什么NAND比NOR更受欢迎。





CMOS Delay
Lecture 15 | Midterm 2 scope
中文概要CMOS延迟用RC模型分析。每个晶体管导通时有电阻R(与宽度成反比)，栅极和漏极有电容。传播延迟：t = ln(2) × R × (寄生电容 + 负载电容)。Elmore延迟用于复杂RC网络：对每个电容，乘以从源到该电容的共享电阻，求和。关键：包含非通路电容，不包含非通路电阻。反相器链最优扇出约为e≈2.7。

RC Model
FORMULAS

On-resistance: R = R_on / W (wider = lower resistance)
Gate capacitance: C_g = W × C_i
Drain capacitance: C_d = γ × C_g (γ ≈ 1 in this class)
Inverter delay: t_p = ln(2) × R_eq × (C_parasitic + C_load)


中文
RC模型把晶体管简化为"水管"(电阻R)和"水桶"(电容C)。宽度大=水管粗=电阻小=充电快。延迟就是给电容充到50%所需的时间=ln(2)*RC。就像用水管往桶里灌水，管越粗(R小)桶越小(C小)就越快。


Elmore Delay
METHOD
t_D = ln(2) × ∑_i R_shared,i × C_i
For each capacitor: multiply by total resistance from source to that cap. Sum all.
Include off-path CAPACITANCES. Exclude off-path RESISTANCES.

中文
Elmore延迟就像算"每个水桶被上游所有水管拖慢了多少"。对每个电容C_i，找从源到它经过的所有电阻加起来(共享路径)，乘以C_i，最后全部求和。非通路上的电容要算(因为也要充电)，但非通路上的电阻不算(因为电流不经过那里)。


Interactive: RC Charging





V(t) = V_DD(1 - e^-t/RC), 50% at t = ln(2)·RC



Inverter Chain Optimization
FORMULA

Overall fanout: F = C_L/C_in
N stages, optimal fanout per stage: f = F^1/N
Optimal f ≈ e ≈ 2.718 (practical: 3-4)
Chain delay: D = N × t_p0(1 + γ/f)


中文
如果要驱动一个很大的负载(比如C_L是输入的1000倍)，直接用一个反相器会太慢。解决方案：用一串逐渐变大的反相器，每级放大约e倍(2.7倍)。级数N=ln(F)/ln(e)，总延迟最小。就像搬重物时用多个滑轮逐级放大力量。





Logical Effort
Lecture 16 | Midterm 2 scope
中文概要逻辑努力是比较不同门速度并优化尺寸的方法。门延迟 d = g×h + p（逻辑努力×电气努力 + 寄生延迟）。路径总努力 F = G×B×H，最优每级努力 f = F^(1/N)。从输出往回算每个门的输入电容来确定尺寸。NAND比NOR快（逻辑努力更小）。

Gate Delay Formula
CORE FORMULA
d = g × h + p

g = logical effort (gate type property)
h = C_out/C_in = electrical effort (fanout)
p = parasitic delay
f = g×h = stage effort


中文
逻辑努力g衡量"这个门比反相器慢多少"。电气努力h衡量"负载有多重"。寄生延迟p是门自身的固有延迟。总延迟d就像做一件事的难度=事情本身的难度(g) × 负担大小(h) + 准备时间(p)。


Logical Effort Table


Gate g (logical effort) p (parasitic delay)
Inverter 1 1
NAND2 4/3 2
NAND3 5/3 3
NOR2 5/3 2
NOR3 7/3 3



Path Optimization
OPTIMIZATION METHOD

Compute: G = ∏g_i, B = ∏b_i (branching), H = C_out/C_in
Path effort: F = G × B × H
Optimal stage effort: f̂ = F^1/N
Min delay: D = N × f̂ + ∑p_i
Size backwards: C_in,i = g_i × C_out,i / f̂


中文
路径优化的步骤：(1)算总努力F=G*B*H；(2)求最优每级努力f=F^(1/N)；(3)从输出往回推每个门的输入电容C_in = g*C_out/f。就像规划旅程：先算总路程，再平均分配每天的行程，然后从目的地往回安排住宿点。





Wires & Energy
Lecture 17-18 | Midterm 2 scope
中文概要导线有电阻和电容，先进工艺中导线延迟可超过门延迟。总功耗 = 动态 + 短路 + 漏电。动态功耗 P = αCV²f，其中α是活动因子=2p(1-p)。降压可大幅降低功耗(P∝V³)但频率也降(f∝V)。每次开关能量=CV²。短路功耗在PUN/PDN同时导通时产生，峰值在V_DD/2。

Wire Delay
FORMULAS

Wire R: R = ρ × L / (W × t) = R_sq × L/W
Wire C: C = ε × A / d
Lumped: t = 0.7 RC
Distributed (N→∞): t ≈ 0.35 RC


中文
导线不是理想的，有电阻(越长越细越大)和电容(与邻近导线的耦合)。集中式模型把整条导线的RC集中在一点，分布式模型更准确(延迟减半)。在先进工艺中导线延迟已经超过门延迟，成为瓶颈。


Dynamic Power
THE MOST IMPORTANT POWER FORMULA
P_dynamic = α × C_L × V_DD² × f_clk

Activity factor: α = 2p(1-p) where p = P(output=1)
Energy per switch: E = C_L × V_DD²
Short-circuit: P_sc = I_sc × V_DD, peaks at V_in = V_DD/2
Static: P_leak = I_leak × V_DD


中文
动态功耗P=alpha*C*V^2*f是最重要的公式。alpha是"活动因子"=信号翻转的概率(对于随机信号，p=0.5时alpha最大=0.5)。降压效果显著：电压降一半，功耗降到1/8(因为f也降一半)。但不能无限降压，低于阈值电压晶体管就不工作了。


Interactive: Voltage Scaling



V_DD: 1.0V




Frequency: 200 MHz
Power: 200 mW
Energy/op: 1.00 μJ


Base: C=10nF, f=200MHz@1.0V, α=0.1. Assumes I_D ∝ V_DD.





Adders
Lectures 18-19 | Midterm 2 scope
中文概要加法器核心信号：生成G=AB，传播P=A⊕B，C_out=G+P·C_in。行波进位O(N)最慢但最省面积和能量。进位跳跃O(√N)利用分组跳过全传播段。进位选择O(√N)并行计算两种进位假设再选择。进位前瞻O(logN)最快但面积最大。

Full Adder
CORE EQUATIONS

S = A ⊕ B ⊕ C_in
C_out = G + P · C_in   where G = AB, P = A⊕B
Kill: K = ĀB̄ (carry always killed)


中文
全加器是加法器的"砖块"。G(生成)=两个输入都是1时自己产生进位。P(传播)=恰好一个是1时把进位传递下去。K(杀死)=两个都是0时进位被吃掉。C_out = G + P*C_in 就是"要么自己生成，要么传递进来的"。


Adder Comparison


Type Delay Area Energy Key Idea
Ripple-Carry O(N) O(N) Lowest Chain carries
Carry-Bypass O(√N) O(N) Low Skip groups if all P=1; b=√(N/2)
Carry-Select O(√N) ~2N Med Compute Cin=0,1 in parallel
Carry-Lookahead O(log N) O(NlogN) High Parallel G/P tree



Interactive: Carry Propagation













Multipliers
Lectures 19-20 | Midterm 2 scope
中文概要N位乘法三步：部分积生成(AND)、部分积压缩(CSA)、最终加法。CSA将3个数变2个数仅需O(1)。Wallace树用多层CSA，O(log N)层压缩。Booth编码(基4)把N个部分积减半为N/2：3A=4A-A，避免困难乘法。Baugh-Wooley用于有符号乘法的符号扩展优化。

Three Stages





中文
乘法器就像手算竖式乘法的硬件版：(1)算出所有部分积(每一位乘以被乘数)；(2)用CSA(进位保留加法器)快速压缩，每次3个数变2个数；(3)最后用快速加法器把剩下的2个数加起来。Wallace树就是一层层CSA嵌套。


Booth Recoding (Radix-4)
ENCODING TABLE

B_k+1B_kB_k-1 000 001 010 011 100 101 110 111
Partial Product 0 +A +A +2A -2A -A -A 0

N partial products → N/2. Key insight: 3A = 4A - A (defer 4A to next group).

中文
Booth编码的核心思想：把连续的1变成"高位+1、低位-1"。比如0111=1000-0001=8-1。基4版本每次看3位(重叠1位)，把部分积数量减半。硬件只需要做0、+A、-A、+2A、-2A五种操作，其中2A就是左移一位。





Timing & Latches
Lecture 21 | Final scope
中文概要时序分析：建立时间(setup)是数据在时钟边沿前必须稳定的时间，保持时间(hold)是之后必须保持的时间。最小周期T ≥ t_clk-q + t_logic,max + t_setup + t_skew + t_jitter。保持约束：t_clk-q + t_logic,min > t_hold + t_skew。锁存器(latch)是电平敏感的，透明时D直通到Q；触发器(flip-flop)是边沿触发的，由两个锁存器组成。时钟偏移(skew)是空间变化，抖动(jitter)是时间变化。

Setup and Hold
TIMING CONSTRAINTS

Setup (max delay): T ≥ t_clk-q + t_logic,max + t_setup + t_skew + t_jitter
Hold (min delay): t_clk-q + t_logic,min ≥ t_hold + t_skew + t_jitter
Setup slack: T - (t_clk-q + t_logic,max + t_setup - t_skew + t_jitter)
Hold slack: (t_clk-q + t_logic,min - t_skew - t_jitter) - t_hold

Setup too slow? Reduce logic, retime, pipeline.   Hold too fast? Add buffers.

中文
建立时间就像"提前到机场的时间"——数据必须在时钟来之前就稳定好。保持时间就像"不能提前离开考场"——时钟来了之后数据还要保持一段时间。建立违例可以通过降频解决(给更多时间)，但保持违例不行(与频率无关)，只能加缓冲器。


Flip-Flop vs Latch


Flip-Flop Latch
Triggered by Clock edge Clock level
Transparent when Never (samples at edge) CLK high (or low)
Built from Two latches (primary+secondary) Transmission gate + inverter
Verilog always_ff @(posedge clk) always_latch if(clk) q<=d;


中文
锁存器像一扇"电动门"——时钟高电平时门开着，输入直接穿过到输出(透明)；时钟低电平时门关闭，保持最后的值。触发器像"快门相机"——只在时钟上升沿那一瞬间"拍照"(采样)。触发器由两个锁存器组成(主从结构)。


Clock Skew & Jitter


Clock skew (t_SK): spatial variation in clock arrival. Deterministic + random.
Clock jitter (t_J): temporal variation in consecutive clock edges. Random noise.
Positive skew (receiver late) helps setup, hurts hold
Negative skew (receiver early) hurts setup, helps hold


中文
时钟偏移(skew)是"空间上的不同步"——芯片不同角落的时钟到达时间不同(像广播信号到达不同城市有延迟)。抖动(jitter)是"时间上的不稳定"——同一个点的时钟周期忽长忽短。正偏移(接收器晚收到时钟)给了数据更多时间到达(利于建立)，但也给了旧数据更多时间残留(不利于保持)。





SRAM & Memory
Final scope (from FA25 notes)
中文概要SRAM用6个晶体管存储1位数据。读操作：预充电位线到V_DD，开启字线，感应放大器检测位线差异。读稳定性要求存储节点不被翻转(拉下>拉起)。写操作：拉低一条位线迫使单元翻转。写能力要求访问管能覆盖存储反相器。行译码器用NAND门选中字线。列电路包括位线调节、写驱动、感应放大器。

6T SRAM Cell





中文
6T SRAM像两个互相"掐架"的反相器(交叉耦合)，稳定在两个状态之一(存0或存1)。读的时候先把两条"天平秤"(位线)都充到高电平，然后打开门(字线)，存储的值会让一边稍微降低，感应放大器放大这个微小差异。写的时候强行把一边拉到低，迫使状态翻转。


Read vs Write


Read Write
Requirement Stability (don't flip the cell) Writability (must flip the cell)
Pull-down vs Access PD > Access (W/L₁ > W/L₅) Access > PU (W/L₅ > W/L₂)
Typical ratio 1.5:1 (planar), 2:1 (finFET) 1:1 (planar), 2:1 (finFET)






Power & Energy Analysis
Lectures 17, 19 | Final scope

中文
功耗分析是芯片设计的核心挑战之一。总功耗=动态功耗+静态功耗。动态功耗来自充放电(开关活动)，静态功耗来自漏电流。就像一辆汽车：动态功耗是踩油门时消耗的油(取决于踩了多少次和踩多深)，静态功耗是停车时发动机空转消耗的油(一直在漏)。现代芯片两者都很重要，需要用各种技术来管理。


Power Breakdown
TOTAL POWER
P_total = P_dynamic + P_{short-circuit} + P_static

Dynamic: P_dyn = α × C_L × V_DD² × f — from charging/discharging load capacitances
Short-circuit: P_sc = I_sc × V_DD — when PMOS and NMOS both on during transition
Static (leakage): P_static = I_leak × V_DD — subthreshold and gate leakage


中文
动态功耗：每次信号翻转，给电容充电(0→1)从电源取能量CV^2，放电(1→0)能量消散到地。短路功耗：输入翻转过程中PMOS和NMOS同时短暂导通，产生直通电流(V_in在V_DD/2时最大)。静态功耗：即使不翻转，晶体管也有微小漏电流。


Activity Factor
ACTIVITY FACTOR alpha
The activity factor α represents the probability of a 0→1 transition per clock cycle.

α = 2 × p × (1-p) where p = P(output = 1)
Maximum when p = 0.5: α_max = 0.5
Clock signal: α = 1 (transitions every cycle)
For an AND2 with independent random inputs: p = 0.25, α = 2(0.25)(0.75) = 0.375
For an OR2 with independent random inputs: p = 0.75, α = 2(0.75)(0.25) = 0.375


中文
活动因子alpha衡量"信号有多忙"。alpha=2p(1-p)中p是输出为1的概率。如果信号总是0或总是1(p=0或1)，alpha=0(没有翻转)。如果p=0.5(随机信号)，alpha=0.5(最大翻转率)。时钟信号每个周期都翻转，所以alpha=1。


Energy Per Operation
ENERGY FORMULAS

Energy per switching event: E_switch = C_L × V_DD²
Energy per operation (considering activity): E_op = α × C_L × V_DD²
Note: energy is independent of frequency! Lowering f reduces power but not energy/op.
Lowering V_DD reduces energy/op quadratically (E ∝ V_DD²)


中文
能量和功率的区别：功率P=能量/时间。降低频率可以降低功率(单位时间做的事少了)，但每次操作的能量不变。降低电压才能真正降低每次操作的能量(E正比于V^2)。这就是为什么降压是最有效的省电手段。


Power Management Techniques
TECHNIQUES

Technique What It Does Saves
Clock Gating Disable clock to idle blocks (alpha → 0) Dynamic power
Voltage Scaling Lower V_DD (P ∝ V³ since f ∝ V) Dynamic + Energy
Power Gating Cut V_DD to idle blocks entirely Static (leakage)
Multi-V_th High V_th on non-critical paths (less leakage) Static power
Body Biasing Adjust V_th dynamically via substrate voltage Static power


中文
时钟门控：给不用的模块"关掉闹钟"，不翻转就不耗动态功耗。电压缩放：降低供电电压，功耗按V^3下降(因为频率也按V下降)。电源门控：给不用的模块彻底"断电"，消除漏电。多阈值：关键路径用快(低阈值)晶体管，其他用省电(高阈值)的。


DVFS (Dynamic Voltage and Frequency Scaling)
DVFS

Adjust V_DD and f together based on workload
High load → high V/f for performance
Low load → low V/f to save power
Key relationship: f_max ∝ (V_DD - V_th)² / V_DD (simplified: f ∝ V_DD)
Power scales as P ∝ V_DD³ (because both V² and f scale)


中文
DVFS就像汽车的"巡航模式"：高速公路上加大油门(高电压高频率)跑得快，城市里降低油门(低电压低频率)省油。手机CPU就大量使用DVFS：刷微博时降频省电，玩游戏时全速运行。功耗按电压的三次方下降，所以降压10%可以省电约27%。


WORKED EXAMPLE
Q: A chip has C_total=20nF, V_DD=1.0V, f=1GHz, alpha=0.15, I_leak=100mA. Find total power.

P_dyn = 0.15 × 20×10^-9 × 1.0² × 10⁹ = 3.0 W
P_static = 0.1 × 1.0 = 0.1 W
P_total ≈ 3.0 + 0.1 = 3.1 W (ignoring short-circuit)

If we scale V_DD to 0.8V: f scales to 0.8GHz, P_dyn = 0.15 × 20e-9 × 0.64 × 0.8e9 = 1.536 W (49% reduction!)





Shifters & Crossbars
Lecture 21 | Final scope

中文
移位器用于将二进制数左移或右移若干位(乘以或除以2的幂)。桶形移位器用多级MUX实现对数级延迟的任意移位。漏斗移位器把左移和右移统一成一个操作。交叉开关是N个输入到M个输出的全连接开关网络，用于路由器和网络芯片。就像火车站的"道岔"系统，可以把任意一条入轨连到任意一条出轨。


Barrel Shifter
LOGARITHMIC MUX STAGES
An N-bit barrel shifter uses log₂(N) stages of 2:1 muxes.

Stage k shifts by 2^k positions if shift_amount[k] = 1
For 8-bit: 3 stages (shift by 1, 2, 4)
For 32-bit: 5 stages (shift by 1, 2, 4, 8, 16)
Each stage: N muxes selecting between shifted and unshifted
Delay: O(log N) mux delays
Area: O(N log N) muxes total


中文
桶形移位器的巧妙之处：把移位量分解成二进制位。比如移位5=101(二进制)，就先移1位(第0级)，跳过移2位(第1级)，再移4位(第2级)。每一级是一排MUX，选择"移还是不移"。就像调音量旋钮，每个刻度都是上一个的两倍。


EXAMPLE: 8-BIT BARREL SHIFTER
// Shift amount = 5 = 101 in binary
// Input:    A B C D E F G H

// Stage 0 (shift by 1, enabled): B C D E F G H 0
// Stage 1 (shift by 2, skip):    B C D E F G H 0
// Stage 2 (shift by 4, enabled): F G H 0 0 0 0 0

// Result: left shift by 5


Funnel Shifter
UNIFIED SHIFT

Combines left shift, right shift, and rotate into one structure
Place two copies of input side-by-side: [A | A] (2N bits)
Select N consecutive bits from this 2N-bit word
Left shift by k: select bits starting at position N-k
Right shift by k: select bits starting at position k (with sign/zero extension for upper copy)
Rotate by k: select bits starting at position k (upper copy = lower copy)


中文
漏斗移位器的思路很优雅：把输入复制一份拼成2N位的"长条"，然后从中选取连续的N位。左移就是选靠后的位置，右移就是选靠前的位置，旋转就是两份都用原数据。就像一个"取景框"在两倍长的胶片上滑动。


Crossbar Switch
N x M CROSSBAR

Connects any of N inputs to any of M outputs
Each output has an N:1 mux
Total: M muxes, each N-to-1
Area: O(N × M) — grows quadratically
Delay: O(1) mux delay per output (single stage)
Used in: network switches, NoC routers, register file read ports


中文
交叉开关就像电话交换机：N个输入和M个输出之间可以任意连接。每个输出端有一个N选1的MUX。面积是N*M(每个交叉点一个开关)，所以规模大了很贵。在片上网络(NoC)和路由器中广泛使用。











Binary Division
Lecture 21 | Final scope

中文
二进制除法比乘法更复杂，因为每一步都依赖上一步的结果(无法像乘法那样并行)。恢复除法最直观：试减，如果不够就加回来。不恢复除法更快：不加回去，而是下一步补偿。SRT除法用更高基数(基4/8)每步产生多位商，是现代处理器的标准做法。就像手算长除法，但每一步可以"猜"多位。


Restoring Division
ALGORITHM

Initialize: remainder R = dividend (2N bits), divisor D, quotient Q = 0
For each bit position (N iterations):
  
  Left-shift R by 1
  Trial subtract: R' = R - D
  If R' ≥ 0: set Q bit = 1, R = R'
  If R' < 0: set Q bit = 0, restore R (don't update)
  

After N steps: Q = quotient, R = remainder

Delay: O(N) iterations, each with an N-bit subtraction
Hardware: N-bit adder/subtractor + shift registers

中文
恢复除法就像手算除法：每次"试探"能不能减去除数。能减(够减)就商1，不能减(不够减)就商0并把减掉的加回来(恢复)。缺点是"加回来"这一步浪费时间。


EXAMPLE: 7 / 3
Dividend = 0111 (7), Divisor = 011 (3)
Step 1: R=0111, shift left -> 01110
        Trial: 01110 - 01100 = 00010 >= 0, Q[1]=1, R=00010
Step 2: R=00010, shift left -> 00100
        Trial: 00100 - 01100 = -01000 < 0, Q[0]=0, R=00100(restore)
Result: Q = 10 (2), R = 01 (1)
Check: 7 = 3*2 + 1 correct!


Non-Restoring Division
FASTER ALGORITHM

Instead of restoring, add D in the next step if previous step was negative
If previous remainder ≥ 0: shift left, subtract D, Q bit = 1
If previous remainder < 0: shift left, add D, Q bit = 0
At the end: if remainder is negative, add D to correct
Advantage: only one add/subtract per step (no restore step)
Quotient bits are in {0,1} but may need final correction


中文
不恢复除法的思路：如果这一步减多了(结果为负)，下一步不是"加回来再减"，而是直接"加"来补偿。省去了恢复那一步。就像在天平上称重：放错了砝码不用拿下来重放，直接在另一边加砝码补偿。


SRT Division
HIGH-RADIX DIVISION

Named after Sweeney, Robertson, Tocher
Uses radix-4 or higher: produces 2+ quotient bits per iteration
Quotient digits from redundant set {-2, -1, 0, 1, 2} for radix-4
Uses a lookup table (PLA) indexed by top bits of remainder and divisor
Redundancy allows imprecise remainder estimation → faster
Delay: O(N/2) for radix-4, O(N/k) for radix-2^k
Used in modern Intel/AMD/ARM processors


中文
SRT除法是现代CPU的标准。基4版本每步产生2位商，速度翻倍。关键技巧是使用"冗余表示"：商的每一位可以是-2到+2(不只是0和1)，这样不需要精确知道余数的值就能选择商，容错性更强。就像去超市不用精确到分地算钱，四舍五入也能结账。(著名的Pentium FDIV bug就是SRT表格里少了几个条目。)


EXAM TIP
Division is inherently sequential (each step depends on previous remainder). Unlike multiplication, you cannot parallelize the iterations. This is why division is 10-40x slower than multiplication in hardware.





FPGA vs ASIC
Project-relevant | Final scope

中文
FPGA(现场可编程门阵列)就像"乐高积木"——买回来后可以反复重新编程配置。ASIC(专用集成电路)就像"定制模具"——一旦制造出来就不能改了。FPGA适合原型验证和小批量，ASIC适合大批量生产(手机芯片等)。我们151的项目就是在FPGA上实现RISC-V处理器。


FPGA Architecture
CORE COMPONENTS

LUT (Look-Up Table): Small SRAM that can implement any N-input boolean function (typically 4-6 inputs). Acts as a universal gate.
CLB (Configurable Logic Block): Contains multiple LUTs + flip-flops + carry chains + muxes
Routing: Programmable interconnect (switch boxes + connection boxes) that connects CLBs
IOBs: Input/Output blocks at chip periphery
Block RAM: Dedicated SRAM blocks (e.g., 36Kbit)
DSP slices: Hardened multiply-accumulate units


中文
FPGA的核心是LUT(查找表)——一小块SRAM存储真值表，可以实现任何N输入布尔函数。比如4输入LUT有16个存储单元(2^4)，可以实现任何4输入逻辑。CLB把多个LUT、触发器、进位链打包在一起。可编程路由网络把它们连接起来。就像一块"万能面包板"。


FPGA vs ASIC Comparison


Aspect FPGA ASIC
Reconfigurable Yes (reprogram anytime) No (fixed after fabrication)
NRE Cost Low ($0 - $10K) Very high ($1M - $100M+)
Unit Cost High ($10 - $10K per chip) Low ($0.10 - $100 per chip)
Performance ~100-500 MHz typical ~1-5 GHz
Power Efficiency 10-100x worse Best (optimized transistors)
Area Efficiency 10-40x worse Best (custom layout)
Time to Market Days to weeks Months to years
Best For Prototyping, low volume, reconfigurable High volume, max performance


中文
选择FPGA还是ASIC就像选择"3D打印"还是"开模具"：3D打印(FPGA)灵活快速但单件贵，开模具(ASIC)前期投入大但大量生产便宜。FPGA因为用通用LUT和可编程路由，面积和功耗都比ASIC大很多，但开发周期短，适合我们做项目验证。


FPGA Design Flow
DESIGN STEPS

RTL Design: Write SystemVerilog (behavioral)
Simulation: Verify with testbenches (iverilog, VCS)
Synthesis: Convert RTL to LUT/FF netlist (Vivado)
Place & Route: Map netlist to physical FPGA resources
Bitstream Generation: Create configuration file
Program FPGA: Load bitstream onto device



ASIC Design Flow
DESIGN STEPS

RTL Design: Write SystemVerilog
Simulation: Functional verification
Logic Synthesis: Map to standard cell library (Design Compiler)
Floor Planning: Arrange major blocks on die
Place & Route: Place cells and connect wires
Timing Closure: Meet setup/hold across all paths (STA)
DRC/LVS: Design rule check / layout vs schematic
Tapeout: Send to foundry (TSMC, Samsung, etc.)
Fabrication: Weeks to months


中文
FPGA流程：写代码 → 仿真 → 综合(变成LUT网表) → 布局布线 → 生成比特流 → 下载到板子上。几小时就能完成。ASIC流程多了很多步：综合到标准单元库、时序收敛、物理验证(DRC/LVS)、然后送去代工厂流片，整个过程几个月到一年。


PROJECT NOTE
In EECS 151, your RISC-V processor targets the Xilinx PYNQ-Z1 FPGA. Use Vivado for synthesis. Key FPGA-specific considerations: LUT utilization, BRAM usage, timing constraints (100MHz target), and using always @(*) for iverilog compatibility.





Cheatsheet: Midterm 1


Boolean Algebra
De Morgan: (AB)'=A'+B', (A+B)'=A'B'
Absorption: A+AB=A
Consensus: AB+A'C+BC=AB+A'C
SOP: OR of AND minterms
POS: AND of OR maxterms
Noise Margins
NM_H = V_OH - V_IH
NM_L = V_IL - V_OL
t_p = (t_pHL + t_pLH)/2
SystemVerilog
assign = continuous (comb)
always_comb: blocking (=)
always_ff: non-blocking (<=)
Module instantiation: .port(signal)
generate/for: parameterized hardware
FSMs
Moore: output = f(state)
Mealy: output = f(state, input)
Two-block style: 1 always_ff + 1 always_comb
RISC-V Formats
R: funct7|rs2|rs1|f3|rd|op (ADD,SUB,AND)
I: imm[11:0]|rs1|f3|rd|op (ADDI,LW,JALR)
S: imm[11:5]|rs2|rs1|f3|imm[4:0]|op (SW)
B: imm|rs2|rs1|f3|imm|op (BEQ,BNE)
U: imm[31:12]|rd|op (LUI,AUIPC)
J: imm|rd|op (JAL)
Pipelining
5 stages: F D X M W
T_min = t_cq + t_logic + t_su
Hold: t_cq + t_min > t_hold
Hazards: structural, data, control
Fix: forwarding, stalling, prediction
Control Signals
PCSel: pc+4 or branch target
ImmSel: I/S/B/U/J format
ALUSel: ADD/SUB/AND/OR/XOR/SLT
MemRW: Read/Write
WBSel: ALU/mem/PC+4
RegWEn: write enable
Key Constants
x0 always = 0
JAL: rd=PC+4, PC=PC+imm
JALR: rd=PC+4, PC=rs1+imm
LUI: rd=imm<<12
AUIPC: rd=PC+(imm<<12)





Cheatsheet: Midterm 2


CMOS Basics
NMOS ON: V_GS > V_th (strong 0)
PMOS ON: V_GS < -|V_th| (strong 1)
PDN: series=AND, parallel=OR
PUN = dual of PDN
Always inverting: NAND, NOR only
Sizing: N in series => width x N
CMOS Delay
R = R_on/W, C_g = W*C_i, C_d = gamma*C_g
t_p = ln(2)*R*(C_p + C_L)
Elmore: t = ln(2)*sum(R_shared * C_i)
Off-path caps: YES. Off-path R: NO
Chain: f_opt = F^(1/N), f ~= e ~= 2.7
Logical Effort
d = g*h + p
INV: g=1,p=1. NAND2: g=4/3,p=2
NAND3: g=5/3,p=3. NOR2: g=5/3,p=2
F = G*B*H, f_hat = F^(1/N)
D_min = N*f_hat + sum(p_i)
Size: C_in,i = g_i * C_out,i / f_hat
Power & Energy
P_dyn = alpha * C * V_DD^2 * f
alpha = 2*p*(1-p)
E_switch = C * V_DD^2
P_sc = I_sc * V_DD (peak at V_DD/2)
P_leak = I_leak * V_DD
V scaling: f prop V, P prop V^3
DVFS: adjust V and f with workload
Wires
Lumped: t = 0.7RC
Distributed: t = 0.35RC
R = rho*L/(W*t), C = eps*A/d
Use Elmore delay for RC trees
Adders
G=AB, P=A xor B, C_out=G+P*Cin
RCA: O(N). Bypass: O(sqrt N), b=sqrt(N/2)
Select: O(sqrt N). CLA: O(log N)
Bypass delay: 2b + N/b - 2
Multipliers
CSA: 3 nums -> 2 in O(1)
Wallace: O(log_1.5 N) CSA levels
Booth radix-4: N->N/2 partial products
3A = 4A - A trick
Constants
ln(2) = 0.693
e = 2.718 (optimal fanout)
gamma = C_d/C_g (usually 1)
NAND preferred over NOR





Cheatsheet: Final Exam

Cumulative: includes all MT1 + MT2 content plus the following.

Timing Constraints
T >= t_cq + t_logic,max + t_su + t_sk + t_j
Hold: t_cq + t_min >= t_hold + t_sk + t_j
Setup slack = T - (t_cq+t_max+t_su-t_sk+t_j)
Hold slack = (t_cq+t_min-t_sk-t_j) - t_hold
Setup fail: reduce logic, retime, pipeline
Hold fail: add buffers (cannot fix with freq)
Clock Non-Idealities
Skew (t_SK): spatial (deterministic+random)
Jitter (t_J): temporal (cycle-to-cycle)
+skew: helps setup, hurts hold
-skew: hurts setup, helps hold
Both reduce effective cycle time
Latch vs Flip-Flop
Latch: level-sensitive (transparent)
FF: edge-triggered (two latches)
Latch has t_cq, t_dq, t_su, t_hold
FF has t_cq, t_su, t_hold
Transmission gate latch: NMOS+PMOS parallel
SRAM
6T cell: 2 access + 2 inverters
Read: precharge BL/BLbar, assert WL
Read stability: PD > Access transistor
Write: force BL low, assert WL
Writability: Access > PU
Ratio: PD:Access:PU = 3:2:1 (approx)
Memory Periphery
Row decoder: NAND gates -> wordline
Predecoder: smaller groups first
Bitline conditioning: precharge high
Write driver: pull one BL to 0
Sense amplifier: detect BL vs BLbar diff
Power Analysis
P_total = P_dyn + P_sc + P_leak
P_dyn = alpha*C*V^2*f
E_op = alpha*C*V^2 (freq independent)
Clock gating: reduce alpha
DVFS: adjust V,f with workload
Power gating: cut V_DD to idle blocks
Division Algorithms
Restoring: trial subtract, restore if neg
Non-restoring: add next step if neg
SRT: radix-4, 2 bits/iter, redundant digits
Division is sequential: O(N) iterations
~10-40x slower than multiplication
Shifters & FPGA
Barrel: log2(N) mux stages, O(NlogN)
Funnel: 2N concat, select N bits
Crossbar: N*M muxes, O(1) delay
FPGA: LUT + CLB + routing + BRAM
ASIC: standard cells, tapeout, months
FPGA 10-100x less efficient than ASIC





Flashcards
Click to flip. Covers all exam material.


Loading...
Loading...

1/1















Practice Quiz

Block	Use For	Assignment	Sensitivity
`always_comb`	Combinational logic	`=` (blocking)	Auto (all inputs)
`always_ff`	Flip-flops	`<=` (non-blocking)	`@(posedge clk)`
`always_latch`	Latches (avoid!)	`<=`	Level-sensitive

	Moore	Mealy
Output depends on	State only	State + Input
Output changes	On clock edge	Asynchronously with input
Typically needs	More states	Fewer states

Format	Fields (31..0)	Used By
R-type	funct7 \| rs2 \| rs1 \| funct3 \| rd \| opcode	ADD, SUB, AND, OR, XOR, SLT...
I-type	imm[11:0] \| rs1 \| funct3 \| rd \| opcode	ADDI, LW, JALR, SLLI...
S-type	imm[11:5] \| rs2 \| rs1 \| funct3 \| imm[4:0] \| opcode	SW, SB, SH
B-type	imm[12\|10:5] \| rs2 \| rs1 \| funct3 \| imm[4:1\|11] \| opcode	BEQ, BNE, BLT, BGE...
U-type	imm[31:12] \| rd \| opcode	LUI, AUIPC
J-type	imm[20\|10:1\|11\|19:12] \| rd \| opcode	JAL

Type	Cause	Solution
Structural	Two stages need same resource	Separate IMEM/DMEM, multi-port regfile
Data	Instruction depends on result of prior instr	Forwarding, stalling
Control	Branch/jump changes PC	Branch prediction, flush, delay slots

Gate	NMOS config	PMOS config	Each NMOS W	Each PMOS W	Total (β=2)
NAND2	2 series	2 parallel	2	2	8
NAND3	3 series	3 parallel	3	2	15
NOR2	2 parallel	2 series	1	4	10
NOR3	3 parallel	3 series	1	6	21

Type	Delay	Area	Energy	Key Idea
Ripple-Carry	O(N)	O(N)	Lowest	Chain carries
Carry-Bypass	O(√N)	O(N)	Low	Skip groups if all P=1; b=√(N/2)
Carry-Select	O(√N)	~2N	Med	Compute Cin=0,1 in parallel
Carry-Lookahead	O(log N)	O(NlogN)	High	Parallel G/P tree

B_k+1B_kB_k-1	000	001	010	011	100	101	110	111
Partial Product	0	+A	+A	+2A	-2A	-A	-A	0

	Flip-Flop	Latch
Triggered by	Clock edge	Clock level
Transparent when	Never (samples at edge)	CLK high (or low)
Built from	Two latches (primary+secondary)	Transmission gate + inverter
Verilog	`always_ff @(posedge clk)`	`always_latch if(clk) q<=d;`

	Read	Write
Requirement	Stability (don't flip the cell)	Writability (must flip the cell)
Pull-down vs Access	PD > Access (W/L₁ > W/L₅)	Access > PU (W/L₅ > W/L₂)
Typical ratio	1.5:1 (planar), 2:1 (finFET)	1:1 (planar), 2:1 (finFET)

Technique	What It Does	Saves
Clock Gating	Disable clock to idle blocks (alpha → 0)	Dynamic power
Voltage Scaling	Lower V_DD (P ∝ V³ since f ∝ V)	Dynamic + Energy
Power Gating	Cut V_DD to idle blocks entirely	Static (leakage)
Multi-V_th	High V_th on non-critical paths (less leakage)	Static power
Body Biasing	Adjust V_th dynamically via substrate voltage	Static power

Aspect	FPGA	ASIC
Reconfigurable	Yes (reprogram anytime)	No (fixed after fabrication)
NRE Cost	Low ($0 - $10K)	Very high ($1M - $100M+)
Unit Cost	High ($10 - $10K per chip)	Low ($0.10 - $100 per chip)
Performance	~100-500 MHz typical	~1-5 GHz
Power Efficiency	10-100x worse	Best (optimized transistors)
Area Efficiency	10-40x worse	Best (custom layout)
Time to Market	Days to weeks	Months to years
Best For	Prototyping, low volume, reconfigurable	High volume, max performance

Prerequisites

MT1 Topics

MT2 Topics

Final Topics

Exam Prep

EECS 151: Complete Study Guide

Course Map

61C Background / Prerequisites

Number Representation

RISC-V ISA Quick Reference

Memory Model

C to Assembly Mapping

Boolean Logic & Combinational Circuits

Boolean Algebra Laws

Canonical Forms

Noise Margins

Design Metrics

SystemVerilog

Two Styles

Always Blocks

Blocking vs Non-Blocking

Non-blocking (<=) - for FF

Blocking (=) - WRONG for FF

Parameterized Modules & Generate

Sequential Logic & FSMs

Moore vs Mealy

FSM in SystemVerilog (Two-Block Style)

Sync vs Async Reset

Synchronous Reset

Asynchronous Reset

RISC-V & Datapath

Instruction Formats

Single-Cycle Datapath

Pipelining & Hazards

5-Stage Pipeline

Hazards

Timing

CMOS Fundamentals

NMOS vs PMOS

CMOS Gate Structure

Interactive: Build a CMOS Gate

Transistor Sizing

CMOS Delay

RC Model

Elmore Delay

Interactive: RC Charging

Inverter Chain Optimization

Logical Effort

Gate Delay Formula

Logical Effort Table

Path Optimization

Wires & Energy

Wire Delay

Dynamic Power

Interactive: Voltage Scaling

Adders

Full Adder

Adder Comparison

Interactive: Carry Propagation

Multipliers

Three Stages

Booth Recoding (Radix-4)

Timing & Latches

Setup and Hold

Flip-Flop vs Latch

Clock Skew & Jitter

SRAM & Memory

6T SRAM Cell

Read vs Write

Power & Energy Analysis

Power Breakdown

Activity Factor

Energy Per Operation

Power Management Techniques

DVFS (Dynamic Voltage and Frequency Scaling)

Shifters & Crossbars

Barrel Shifter

Funnel Shifter

Crossbar Switch

Binary Division