EECS 151: Complete Study Guide

Introduction to Digital Design and ICs — UC Berkeley, Spring 2026

中文概要

EECS 151是UC Berkeley的数字集成电路设计入门课。涵盖:布尔逻辑、SystemVerilog硬件描述语言、有限状态机、RISC-V处理器设计、流水线、CMOS晶体管电路、延迟分析、逻辑努力、导线与能耗、加法器与乘法器、时序分析与存储器、功耗分析、移位器与交叉开关、除法器、FPGA与ASIC比较。本网站覆盖全部讲义内容加上61C背景知识复习,提供三份考试速查表、交互式动画和练习题。

Course Map

MIDTERM 1 (Lec 1-11) L1-2: Introduction, Design Process L3: Design Metrics, Noise Margins L4-6: SystemVerilog (Structural, Behavioral, Sequential) L7: Finite State Machines L8: Combinational Logic, Boolean Algebra L9-11: RISC-V, Pipelining, FPGAs, Hazards L12: Formal Verification (guest) MIDTERM 2 (Lec 13-20) L13: CMOS Process (fabrication, finFET) L14: CMOS Inverter (MOS transistor, VTC) L15: CMOS Delay (RC model, Elmore delay) L16: Logical Effort (gate sizing optimization) L17: Wires & Energy (dynamic/static power) L18: Adders (RCA, carry-bypass, carry-select) L19-20: Adders 2, Multipliers, Dividers FINAL EXAM (Cumulative + Lec 21+) L21: Timing & Latches, Shifters, Division, Power Analysis + SRAM Memory, FPGA vs ASIC + All MT1 and MT2 content Click any topic in the sidebar to study it. Each section has: concepts, formulas, diagrams, worked examples, and a Chinese summary. Cheatsheets are printable (Ctrl+P). Flashcards and quizzes at the bottom for self-testing.

61C Background / Prerequisites

Review of CS 61C concepts needed for EECS 151

中文

这一节回顾61C的核心基础:数的表示(无符号、二进制补码、符号扩展),RISC-V指令集基础(6种格式),内存模型(字节寻址、大小端、对齐),以及C语言到汇编的映射。就像盖房子之前要先了解砖块和水泥一样,这些是设计数字电路的"原材料"。

Number Representation

UNSIGNED & TWO'S COMPLEMENT
  • Unsigned N-bit: range [0, 2N-1]. Simply binary place values.
  • Two's complement N-bit: range [-2N-1, 2N-1-1]. MSB has weight -2N-1.
  • Negation: flip all bits, add 1. Example: -5 in 8-bit = ~(00000101)+1 = 11111011
  • Sign extension: replicate the MSB to fill wider register. E.g., 1011 (4-bit, -5) → 11111011 (8-bit, still -5)
  • Zero extension: pad with 0s (for unsigned values)
中文

二进制补码像一个"环形数轴":从0往上数到最大正数(01...1),再往上就跳到最小负数(10...0)。符号扩展就是复制最高位来保持数值不变,就像把"-5"从4位搬到8位时,前面都填1。

WORKED EXAMPLE

Convert -13 to 8-bit two's complement:

  1. 13 in binary = 00001101
  2. Flip bits = 11110010
  3. Add 1 = 11110011
  4. Check: -128+64+32+16+0+0+2+1 = -128+115 = -13

RISC-V ISA Quick Reference

6 INSTRUCTION FORMATS
FormatKey FieldsExamples
R-typefunct7 | rs2 | rs1 | funct3 | rd | opcodeadd, sub, and, or, xor, slt
I-typeimm[11:0] | rs1 | funct3 | rd | opcodeaddi, lw, jalr, slli
S-typeimm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcodesw, sb, sh
B-typeimm bits | rs2 | rs1 | funct3 | imm bits | opcodebeq, bne, blt, bge
U-typeimm[31:12] | rd | opcodelui, auipc
J-typeimm bits | rd | opcodejal

All instructions are 32 bits. x0 is hardwired to 0. rd = destination register.

中文

RISC-V的6种指令格式就像6种不同的"信封"模板。R型用于寄存器间运算(如加减),I型用于带立即数的操作(如addi、加载),S型用于存储(把数据写入内存),B型用于条件跳转,U/J型用于大跳转和上半立即数。

Memory Model

BYTE ADDRESSING & ENDIANNESS
  • Byte addressing: each memory address points to one byte (8 bits)
  • Word: 4 bytes (32 bits) in RV32. lw loads 4 bytes starting at address.
  • Little-endian (RISC-V): least significant byte at lowest address
  • Alignment: word access should be at addresses divisible by 4; half-word by 2
  • Misaligned access may cause exceptions or be slower
中文

内存就像一排编了号的"小格子",每格存1字节。小端序(Little-endian)就是低位字节放在低地址,像把数字"从右往左"排列。对齐要求4字节数据必须放在4的倍数地址上,否则硬件访问会出问题。

C to Assembly Mapping

COMMON PATTERNS
// C: int x = a + b;
// ASM: add x0_reg, a_reg, b_reg

// C: if (a == b) { ... }
// ASM: beq a_reg, b_reg, label

// C: int arr[10]; x = arr[3];
// ASM: lw x_reg, 12(base_reg)   # 3*4=12 byte offset

// C: for (int i=0; i

Boolean Logic & Combinational Circuits

Lectures 2-3, 7-8 | Midterm 1 scope

中文概要

布尔代数是数字设计的数学基础。变量只有0和1两个值。基本运算:AND(与)、OR(或)、NOT(非)。关键定律:德摩根定律(NOT(AB)=A'+B')、分配律、吸收律。组合逻辑的输出仅取决于当前输入。可用真值表、卡诺图、SOP/POS形式表达和化简。噪声容限衡量信号在传输中抗干扰的能力。

Boolean Algebra Laws

IDENTITIES
  • X + 0 = X, X · 1 = X
  • X + 1 = 1, X · 0 = 0
  • X + X = X, X · X = X
  • X + X' = 1, X · X' = 0
KEY LAWS
  • De Morgan's: (AB)' = A' + B', (A+B)' = A'B'
  • Distributive: A(B+C) = AB+AC, also A+BC = (A+B)(A+C)
  • Absorptive: A+AB = A, A(A+B) = A
  • Consensus: AB+A'C+BC = AB+A'C
中文

德摩根定律是最重要的化简工具,可以记为"打破长杠,变换运算":AND变OR,OR变AND,各变量取反。吸收律(A+AB=A)的直觉是:如果A已经是1,加上AB不改变结果。共识定律说多余项BC可以去掉,因为它被AB和A'C覆盖。

Canonical Forms

  • SOP (Sum of Products): OR of minterms where f=1. Each minterm is an AND of all variables.
  • POS (Product of Sums): AND of maxterms where f=0. Each maxterm is an OR of all variables.
中文

SOP就是把真值表中输出为1的行写成"乘积之和",每一项包含所有变量(取值为0的变量加非号)。POS则相反,取输出为0的行写成"和之积"。两者互为对偶。

Noise Margins

FORMULAS
NM_H Undefined NM_L V_OH V_IH V_IL V_OL NM_H = V_OH - V_IH NM_L = V_IL - V_OL "1" "0" Amount of noise tolerable at worst-case output
中文

噪声容限就像信号的"安全边距"。NM_H是高电平输出和下一级高电平输入阈值之间的差距,NM_L是低电平的对应差距。越大越抗干扰,就像围墙越高越安全。

Design Metrics

  • Propagation delay (t_p): time from input 50% to output 50%. t_p = (t_pHL + t_pLH)/2
  • Rise time (t_r): output 10% to 90%. Fall time (t_f): 90% to 10%.
  • Throughput: operations per unit time. Latency: time from start to finish of one task.
中文

传播延迟是信号从输入到输出经过50%点的时间。吞吐量是单位时间能完成多少操作(像流水线工厂每秒出几个产品),延迟是一个操作从头到尾的时间(一个产品走完整条流水线要多久)。

SystemVerilog

Lectures 4-6 | Midterm 1 scope

中文概要

SystemVerilog是硬件描述语言,分为结构化(门级连接)和行为化(算法描述)两种风格。关键区别:assign是连续赋值(组合逻辑),always_ff用于时序逻辑(触发器),always_comb用于组合逻辑。阻塞赋值(=)按顺序执行,非阻塞赋值(<=)并行更新(用于触发器)。模块可参数化、可用generate循环实例化。

Two Styles

STRUCTURAL
module mux2(input a,b,sel, output out);
  logic s0, w0, w1;
  not (s0, sel);
  and (w0, s0, a);
  and (w1, sel, b);
  or  (out, w0, w1);
endmodule

Instantiates gates explicitly. Output is always first argument.

BEHAVIORAL
module mux2(input a,b,sel, output out);
  assign out = sel ? b : a;
endmodule

assign = continuous assignment (combinational logic). Changes on RHS immediately reflected.

中文

结构化描述就像画电路图,一个一个门连起来。行为化描述更像写代码,描述"做什么"而不是"怎么连"。两种方式综合后可以得到一样的电路。assign相当于"永远保持这个等式成立的导线连接"。

Always Blocks

KEY RULES
BlockUse ForAssignmentSensitivity
always_combCombinational logic= (blocking)Auto (all inputs)
always_ffFlip-flops<= (non-blocking)@(posedge clk)
always_latchLatches (avoid!)<=Level-sensitive
中文

always_comb像一个"随时更新的计算器",输入一变输出立刻变。always_ff像一个"快门",只在时钟上升沿那一刻拍一张"快照"存起来。记住:组合逻辑用=(阻塞),触发器用<=(非阻塞)。

Blocking vs Non-Blocking

CRITICAL DIFFERENCE

Non-blocking (<=) - for FF

always_ff @(posedge clk) begin
  q1 <= in;  // all evaluate
  q2 <= q1;  // using OLD values
  q3 <= q2;  // = shift register!
end

Blocking (=) - WRONG for FF

always_ff @(posedge clk) begin
  q1 = in;   // updates immediately
  q2 = q1;   // uses NEW q1 = in
  q3 = q2;   // all get same value!
end

Parameterized Modules & Generate

module counter #(parameter N = 4) (
  input clk, rst, enb,
  output logic [N-1:0] count
);
  always_ff @(posedge clk)
    count <= rst ? '0 : (enb ? count+1 : count);
endmodule
中文

参数化模块像一个"可调尺寸的模具"。parameter定义可调参数(如位宽N),generate/for可以批量生成重复硬件结构(如N个加法器)。实例化时用 #(.N(8)) 指定参数值。

Sequential Logic & FSMs

Lectures 6-7 | Midterm 1 scope

中文概要

时序逻辑的输出取决于当前输入和历史状态(通过触发器存储)。有限状态机(FSM)是常见的时序电路。Moore机输出仅取决于当前状态,Mealy机输出取决于当前状态和输入。FSM用两段式always block实现:一个always_ff更新状态寄存器,一个always_comb计算下一状态和输出。同步/异步复位的区别在于复位是否依赖时钟。

Moore vs Mealy

MooreMealy
Output depends onState onlyState + Input
Output changesOn clock edgeAsynchronously with input
Typically needsMore statesFewer states
中文

Moore机像一个"只看当前房间决定行为"的机器人,Mealy机还会"听门外的声音来决定"。Moore更安全(输出同步),Mealy更省状态但输出可能有毛刺。

FSM in SystemVerilog (Two-Block Style)

typedef enum logic [1:0] {IDLE, RUN, DONE} state_t;
state_t state, next_state;

always_ff @(posedge clk)
  state <= rst ? IDLE : next_state;

always_comb begin
  next_state = state;  // default: stay
  case (state)
    IDLE: if (start) next_state = RUN;
    RUN:  if (finish) next_state = DONE;
    DONE: next_state = IDLE;
  endcase
end
中文

两段式FSM:第一段(always_ff)是"记忆",每个时钟存一次当前状态;第二段(always_comb)是"决策",根据当前状态和输入决定下一步。注意always_comb里要有默认赋值(next_state=state),否则会综合出latch。

Sync vs Async Reset

Synchronous Reset

always_ff @(posedge clk)
  if (!reset_n) q <= 0;
  else q <= d;

Reset only on clock edge.

Asynchronous Reset

always_ff @(posedge clk or
           negedge reset_n)
  if (!reset_n) q <= 0;
  else q <= d;

Reset immediately, no clock needed.

RISC-V & Datapath

Lectures 8-10 | Midterm 1 scope

中文概要

RISC-V是开源指令集架构,有6种指令格式(R/I/S/B/U/J)。处理器是一个大型状态机:每个时钟周期取指、译码、执行、访存、写回。数据通路包括PC、指令存储器、寄存器文件、ALU、数据存储器、立即数生成器。控制信号(PCSel, ALUSel, RegWEn等)由指令的opcode/funct字段决定。JAL/JALR用于跳转,Branch用于条件分支。

Instruction Formats

FormatFields (31..0)Used By
R-typefunct7 | rs2 | rs1 | funct3 | rd | opcodeADD, SUB, AND, OR, XOR, SLT...
I-typeimm[11:0] | rs1 | funct3 | rd | opcodeADDI, LW, JALR, SLLI...
S-typeimm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcodeSW, SB, SH
B-typeimm[12|10:5] | rs2 | rs1 | funct3 | imm[4:1|11] | opcodeBEQ, BNE, BLT, BGE...
U-typeimm[31:12] | rd | opcodeLUI, AUIPC
J-typeimm[20|10:1|11|19:12] | rd | opcodeJAL
中文

RISC-V指令格式里,立即数的位分布看起来很乱,但其实是为了让不同格式的相同位尽量对齐(简化硬件)。记住关键:R型用于寄存器运算,I型用于立即数和加载,S型用于存储,B型用于分支。

Single-Cycle Datapath

PC IMEM Reg[] rs1,rs2,rd ALU DMEM WB Imm Gen wb Control Logic: PCSel, ImmSel, RegWEn, BrSel, ALUSel, MemRW, WBSel

Data flows: PC → IMEM → Decode/RegRead → ALU → DMEM → WriteBack

中文

单周期数据通路就像一条"流水线工厂":PC提供地址 → 指令存储器取出指令 → 译码并读寄存器 → ALU计算 → 访问数据存储器 → 结果写回寄存器。控制逻辑就像"车间主任",根据指令类型发出不同控制信号。

Pipelining & Hazards

Lectures 10-11 | Midterm 1 scope

中文概要

流水线将数据通路分成多个阶段(取指F、译码D、执行X、访存M、写回W),用寄存器分隔,使多条指令可以同时在不同阶段执行,提高吞吐量。但引入三种冒险:结构冒险(资源冲突)、数据冒险(数据依赖)、控制冒险(分支)。解决方法:转发(forwarding)、停顿(stalling)、分支预测。最大频率由最慢阶段决定。

5-Stage Pipeline

Hazards

TypeCauseSolution
StructuralTwo stages need same resourceSeparate IMEM/DMEM, multi-port regfile
DataInstruction depends on result of prior instrForwarding, stalling
ControlBranch/jump changes PCBranch prediction, flush, delay slots
中文

流水线冒险就像高速公路上的"堵车":结构冒险是两辆车抢同一车道(加车道解决);数据冒险是后车需要前车的结果(转发=直接传递,停顿=等一等);控制冒险是不知道该走哪个出口(分支预测=猜一个方向先走着)。

Timing

FORMULAS
  • Max frequency: f = 1/Tmin
  • Tmin = tclk-q + tlogic,max + tsetup
  • Hold constraint: tclk-q + tlogic,min > thold
  • Retiming: move registers to balance critical path across pipeline stages

CMOS Fundamentals

Lectures 13-14 | Midterm 2 scope

中文概要

CMOS是现代芯片的基础。NMOS在栅极高电平时导通(拉低输出),PMOS在栅极低电平时导通(拉高输出)。每个门有互补的上拉(PUN,PMOS)和下拉(PDN,NMOS)网络,是对偶关系:串联变并联。静态CMOS门只能实现反相功能(NAND/NOR)。晶体管尺寸设计:串联时加倍宽度以保持等效驱动能力。

NMOS vs PMOS

V_G = 0 (NMOS OFF)
中文

NMOS就像一个"高电平才开的水龙头",导通时把输出拉到地(输出0)。PMOS相反,"低电平才开",导通时把输出拉到电源(输出1)。NMOS传递强0弱1,PMOS传递强1弱0,所以CMOS两者配合使用。

CMOS Gate Structure

KEY RULES
  • PDN (NMOS, to GND): Series = AND, Parallel = OR
  • PUN (PMOS, to VDD): Dual of PDN (Series↔Parallel)
  • Output = complement of PDN expression
  • Static CMOS is always inverting: can build NAND, NOR, not AND/OR directly
中文

PDN决定什么时候输出拉低(接地),所以如果PDN中两个NMOS串联(都导通才接地=AND条件),输出就是(AB)的反=NAND。对偶规则:PDN串联↔PUN并联,像镜子里的倒影。

Interactive: Build a CMOS Gate

Transistor Sizing

METHOD

Goal: equal pull-up/pull-down strength as reference inverter (Wp/Wn = β)

  • NMOS in series × N: each width = N × Wn,min
  • PMOS in series × N: each width = N × Wp,min
  • Parallel: keep minimum width
GateNMOS configPMOS configEach NMOS WEach PMOS WTotal (β=2)
NAND22 series2 parallel228
NAND33 series3 parallel3215
NOR22 parallel2 series1410
NOR33 parallel3 series1621

NAND gates are smaller than NOR gates! (because PMOS in parallel, not series)

中文

晶体管串联时电阻加倍,所以要加倍宽度来补偿。NAND中NMOS串联(需要加倍)而PMOS并联(不用加倍),NOR则相反。因为PMOS本身就比NMOS大(beta=2),所以NOR中PMOS串联会变得很大,这就是为什么NAND比NOR更受欢迎。

CMOS Delay

Lecture 15 | Midterm 2 scope

中文概要

CMOS延迟用RC模型分析。每个晶体管导通时有电阻R(与宽度成反比),栅极和漏极有电容。传播延迟:t = ln(2) × R × (寄生电容 + 负载电容)。Elmore延迟用于复杂RC网络:对每个电容,乘以从源到该电容的共享电阻,求和。关键:包含非通路电容,不包含非通路电阻。反相器链最优扇出约为e≈2.7。

RC Model

FORMULAS
  • On-resistance: R = Ron / W (wider = lower resistance)
  • Gate capacitance: Cg = W × Ci
  • Drain capacitance: Cd = γ × Cg (γ ≈ 1 in this class)
  • Inverter delay: tp = ln(2) × Req × (Cparasitic + Cload)
中文

RC模型把晶体管简化为"水管"(电阻R)和"水桶"(电容C)。宽度大=水管粗=电阻小=充电快。延迟就是给电容充到50%所需的时间=ln(2)*RC。就像用水管往桶里灌水,管越粗(R小)桶越小(C小)就越快。

Elmore Delay

METHOD

tD = ln(2) × ∑i Rshared,i × Ci

For each capacitor: multiply by total resistance from source to that cap. Sum all.

Include off-path CAPACITANCES. Exclude off-path RESISTANCES.

中文

Elmore延迟就像算"每个水桶被上游所有水管拖慢了多少"。对每个电容C_i,找从源到它经过的所有电阻加起来(共享路径),乘以C_i,最后全部求和。非通路上的电容要算(因为也要充电),但非通路上的电阻不算(因为电流不经过那里)。

Interactive: RC Charging

V(t) = VDD(1 - e-t/RC), 50% at t = ln(2)·RC

Inverter Chain Optimization

FORMULA
  • Overall fanout: F = CL/Cin
  • N stages, optimal fanout per stage: f = F1/N
  • Optimal f ≈ e ≈ 2.718 (practical: 3-4)
  • Chain delay: D = N × tp0(1 + γ/f)
中文

如果要驱动一个很大的负载(比如C_L是输入的1000倍),直接用一个反相器会太慢。解决方案:用一串逐渐变大的反相器,每级放大约e倍(2.7倍)。级数N=ln(F)/ln(e),总延迟最小。就像搬重物时用多个滑轮逐级放大力量。

Logical Effort

Lecture 16 | Midterm 2 scope

中文概要

逻辑努力是比较不同门速度并优化尺寸的方法。门延迟 d = g×h + p(逻辑努力×电气努力 + 寄生延迟)。路径总努力 F = G×B×H,最优每级努力 f = F^(1/N)。从输出往回算每个门的输入电容来确定尺寸。NAND比NOR快(逻辑努力更小)。

Gate Delay Formula

CORE FORMULA

d = g × h + p

  • g = logical effort (gate type property)
  • h = Cout/Cin = electrical effort (fanout)
  • p = parasitic delay
  • f = g×h = stage effort
中文

逻辑努力g衡量"这个门比反相器慢多少"。电气努力h衡量"负载有多重"。寄生延迟p是门自身的固有延迟。总延迟d就像做一件事的难度=事情本身的难度(g) × 负担大小(h) + 准备时间(p)。

Logical Effort Table

Gateg (logical effort)p (parasitic delay)
Inverter11
NAND24/32
NAND35/33
NOR25/32
NOR37/33

Path Optimization

OPTIMIZATION METHOD
  1. Compute: G = ∏gi, B = ∏bi (branching), H = Cout/Cin
  2. Path effort: F = G × B × H
  3. Optimal stage effort: f̂ = F1/N
  4. Min delay: D = N × f̂ + ∑pi
  5. Size backwards: Cin,i = gi × Cout,i / f̂
中文

路径优化的步骤:(1)算总努力F=G*B*H;(2)求最优每级努力f=F^(1/N);(3)从输出往回推每个门的输入电容C_in = g*C_out/f。就像规划旅程:先算总路程,再平均分配每天的行程,然后从目的地往回安排住宿点。

Wires & Energy

Lecture 17-18 | Midterm 2 scope

中文概要

导线有电阻和电容,先进工艺中导线延迟可超过门延迟。总功耗 = 动态 + 短路 + 漏电。动态功耗 P = αCV²f,其中α是活动因子=2p(1-p)。降压可大幅降低功耗(P∝V³)但频率也降(f∝V)。每次开关能量=CV²。短路功耗在PUN/PDN同时导通时产生,峰值在VDD/2。

Wire Delay

FORMULAS
  • Wire R: R = ρ × L / (W × t) = Rsq × L/W
  • Wire C: C = ε × A / d
  • Lumped: t = 0.7 RC
  • Distributed (N→∞): t ≈ 0.35 RC
中文

导线不是理想的,有电阻(越长越细越大)和电容(与邻近导线的耦合)。集中式模型把整条导线的RC集中在一点,分布式模型更准确(延迟减半)。在先进工艺中导线延迟已经超过门延迟,成为瓶颈。

Dynamic Power

THE MOST IMPORTANT POWER FORMULA

Pdynamic = α × CL × VDD² × fclk

  • Activity factor: α = 2p(1-p) where p = P(output=1)
  • Energy per switch: E = CL × VDD²
  • Short-circuit: Psc = Isc × VDD, peaks at Vin = VDD/2
  • Static: Pleak = Ileak × VDD
中文

动态功耗P=alpha*C*V^2*f是最重要的公式。alpha是"活动因子"=信号翻转的概率(对于随机信号,p=0.5时alpha最大=0.5)。降压效果显著:电压降一半,功耗降到1/8(因为f也降一半)。但不能无限降压,低于阈值电压晶体管就不工作了。

Interactive: Voltage Scaling


Frequency: 200 MHz
Power: 200 mW
Energy/op: 1.00 μJ

Base: C=10nF, f=200MHz@1.0V, α=0.1. Assumes ID ∝ VDD.

Adders

Lectures 18-19 | Midterm 2 scope

中文概要

加法器核心信号:生成G=AB,传播P=A⊕B,Cout=G+P·Cin。行波进位O(N)最慢但最省面积和能量。进位跳跃O(√N)利用分组跳过全传播段。进位选择O(√N)并行计算两种进位假设再选择。进位前瞻O(logN)最快但面积最大。

Full Adder

CORE EQUATIONS
  • S = A ⊕ B ⊕ Cin
  • Cout = G + P · Cin   where G = AB, P = A⊕B
  • Kill: K = ĀB̄ (carry always killed)
中文

全加器是加法器的"砖块"。G(生成)=两个输入都是1时自己产生进位。P(传播)=恰好一个是1时把进位传递下去。K(杀死)=两个都是0时进位被吃掉。C_out = G + P*C_in 就是"要么自己生成,要么传递进来的"。

Adder Comparison

TypeDelayAreaEnergyKey Idea
Ripple-CarryO(N)O(N)LowestChain carries
Carry-BypassO(√N)O(N)LowSkip groups if all P=1; b=√(N/2)
Carry-SelectO(√N)~2NMedCompute Cin=0,1 in parallel
Carry-LookaheadO(log N)O(NlogN)HighParallel G/P tree

Interactive: Carry Propagation

Multipliers

Lectures 19-20 | Midterm 2 scope

中文概要

N位乘法三步:部分积生成(AND)、部分积压缩(CSA)、最终加法。CSA将3个数变2个数仅需O(1)。Wallace树用多层CSA,O(log N)层压缩。Booth编码(基4)把N个部分积减半为N/2:3A=4A-A,避免困难乘法。Baugh-Wooley用于有符号乘法的符号扩展优化。

Three Stages

Partial Products AND gates or Booth Reduction (CSA) 3:2 compress, Wallace tree Final Adder CLA or fast adder
中文

乘法器就像手算竖式乘法的硬件版:(1)算出所有部分积(每一位乘以被乘数);(2)用CSA(进位保留加法器)快速压缩,每次3个数变2个数;(3)最后用快速加法器把剩下的2个数加起来。Wallace树就是一层层CSA嵌套。

Booth Recoding (Radix-4)

ENCODING TABLE
Bk+1BkBk-1000001010011100101110111
Partial Product0+A+A+2A-2A-A-A0

N partial products → N/2. Key insight: 3A = 4A - A (defer 4A to next group).

中文

Booth编码的核心思想:把连续的1变成"高位+1、低位-1"。比如0111=1000-0001=8-1。基4版本每次看3位(重叠1位),把部分积数量减半。硬件只需要做0、+A、-A、+2A、-2A五种操作,其中2A就是左移一位。

Timing & Latches

Lecture 21 | Final scope

中文概要

时序分析:建立时间(setup)是数据在时钟边沿前必须稳定的时间,保持时间(hold)是之后必须保持的时间。最小周期T ≥ tclk-q + tlogic,max + tsetup + tskew + tjitter。保持约束:tclk-q + tlogic,min > thold + tskew。锁存器(latch)是电平敏感的,透明时D直通到Q;触发器(flip-flop)是边沿触发的,由两个锁存器组成。时钟偏移(skew)是空间变化,抖动(jitter)是时间变化。

Setup and Hold

TIMING CONSTRAINTS
  • Setup (max delay): T ≥ tclk-q + tlogic,max + tsetup + tskew + tjitter
  • Hold (min delay): tclk-q + tlogic,min ≥ thold + tskew + tjitter
  • Setup slack: T - (tclk-q + tlogic,max + tsetup - tskew + tjitter)
  • Hold slack: (tclk-q + tlogic,min - tskew - tjitter) - thold

Setup too slow? Reduce logic, retime, pipeline.   Hold too fast? Add buffers.

中文

建立时间就像"提前到机场的时间"——数据必须在时钟来之前就稳定好。保持时间就像"不能提前离开考场"——时钟来了之后数据还要保持一段时间。建立违例可以通过降频解决(给更多时间),但保持违例不行(与频率无关),只能加缓冲器。

Flip-Flop vs Latch

Flip-FlopLatch
Triggered byClock edgeClock level
Transparent whenNever (samples at edge)CLK high (or low)
Built fromTwo latches (primary+secondary)Transmission gate + inverter
Verilogalways_ff @(posedge clk)always_latch if(clk) q<=d;
中文

锁存器像一扇"电动门"——时钟高电平时门开着,输入直接穿过到输出(透明);时钟低电平时门关闭,保持最后的值。触发器像"快门相机"——只在时钟上升沿那一瞬间"拍照"(采样)。触发器由两个锁存器组成(主从结构)。

Clock Skew & Jitter

  • Clock skew (tSK): spatial variation in clock arrival. Deterministic + random.
  • Clock jitter (tJ): temporal variation in consecutive clock edges. Random noise.
  • Positive skew (receiver late) helps setup, hurts hold
  • Negative skew (receiver early) hurts setup, helps hold
中文

时钟偏移(skew)是"空间上的不同步"——芯片不同角落的时钟到达时间不同(像广播信号到达不同城市有延迟)。抖动(jitter)是"时间上的不稳定"——同一个点的时钟周期忽长忽短。正偏移(接收器晚收到时钟)给了数据更多时间到达(利于建立),但也给了旧数据更多时间残留(不利于保持)。

SRAM & Memory

Final scope (from FA25 notes)

中文概要

SRAM用6个晶体管存储1位数据。读操作:预充电位线到VDD,开启字线,感应放大器检测位线差异。读稳定性要求存储节点不被翻转(拉下>拉起)。写操作:拉低一条位线迫使单元翻转。写能力要求访问管能覆盖存储反相器。行译码器用NAND门选中字线。列电路包括位线调节、写驱动、感应放大器。

6T SRAM Cell

WL (Word Line) BL BL̄ M5 M6 INV 1 M1(N) M2(P) INV 2 M3(N) M4(P) Q VDD GND Read: precharge BL,BL̄ high, assert WL Write: force one BL low, assert WL Sizing: PD > Access > PU for stability
中文

6T SRAM像两个互相"掐架"的反相器(交叉耦合),稳定在两个状态之一(存0或存1)。读的时候先把两条"天平秤"(位线)都充到高电平,然后打开门(字线),存储的值会让一边稍微降低,感应放大器放大这个微小差异。写的时候强行把一边拉到低,迫使状态翻转。

Read vs Write

ReadWrite
RequirementStability (don't flip the cell)Writability (must flip the cell)
Pull-down vs AccessPD > Access (W/L1 > W/L5)Access > PU (W/L5 > W/L2)
Typical ratio1.5:1 (planar), 2:1 (finFET)1:1 (planar), 2:1 (finFET)

Power & Energy Analysis

Lectures 17, 19 | Final scope

中文

功耗分析是芯片设计的核心挑战之一。总功耗=动态功耗+静态功耗。动态功耗来自充放电(开关活动),静态功耗来自漏电流。就像一辆汽车:动态功耗是踩油门时消耗的油(取决于踩了多少次和踩多深),静态功耗是停车时发动机空转消耗的油(一直在漏)。现代芯片两者都很重要,需要用各种技术来管理。

Power Breakdown

TOTAL POWER

Ptotal = Pdynamic + Pshort-circuit + Pstatic

  • Dynamic: Pdyn = α × CL × VDD2 × f — from charging/discharging load capacitances
  • Short-circuit: Psc = Isc × VDD — when PMOS and NMOS both on during transition
  • Static (leakage): Pstatic = Ileak × VDD — subthreshold and gate leakage
中文

动态功耗:每次信号翻转,给电容充电(0→1)从电源取能量CV^2,放电(1→0)能量消散到地。短路功耗:输入翻转过程中PMOS和NMOS同时短暂导通,产生直通电流(V_in在V_DD/2时最大)。静态功耗:即使不翻转,晶体管也有微小漏电流。

Activity Factor

ACTIVITY FACTOR alpha

The activity factor α represents the probability of a 0→1 transition per clock cycle.

  • α = 2 × p × (1-p) where p = P(output = 1)
  • Maximum when p = 0.5: αmax = 0.5
  • Clock signal: α = 1 (transitions every cycle)
  • For an AND2 with independent random inputs: p = 0.25, α = 2(0.25)(0.75) = 0.375
  • For an OR2 with independent random inputs: p = 0.75, α = 2(0.75)(0.25) = 0.375
中文

活动因子alpha衡量"信号有多忙"。alpha=2p(1-p)中p是输出为1的概率。如果信号总是0或总是1(p=0或1),alpha=0(没有翻转)。如果p=0.5(随机信号),alpha=0.5(最大翻转率)。时钟信号每个周期都翻转,所以alpha=1。

Energy Per Operation

ENERGY FORMULAS
  • Energy per switching event: Eswitch = CL × VDD2
  • Energy per operation (considering activity): Eop = α × CL × VDD2
  • Note: energy is independent of frequency! Lowering f reduces power but not energy/op.
  • Lowering VDD reduces energy/op quadratically (E ∝ VDD2)
中文

能量和功率的区别:功率P=能量/时间。降低频率可以降低功率(单位时间做的事少了),但每次操作的能量不变。降低电压才能真正降低每次操作的能量(E正比于V^2)。这就是为什么降压是最有效的省电手段。

Power Management Techniques

TECHNIQUES
TechniqueWhat It DoesSaves
Clock GatingDisable clock to idle blocks (alpha → 0)Dynamic power
Voltage ScalingLower VDD (P ∝ V3 since f ∝ V)Dynamic + Energy
Power GatingCut VDD to idle blocks entirelyStatic (leakage)
Multi-VthHigh Vth on non-critical paths (less leakage)Static power
Body BiasingAdjust Vth dynamically via substrate voltageStatic power
中文

时钟门控:给不用的模块"关掉闹钟",不翻转就不耗动态功耗。电压缩放:降低供电电压,功耗按V^3下降(因为频率也按V下降)。电源门控:给不用的模块彻底"断电",消除漏电。多阈值:关键路径用快(低阈值)晶体管,其他用省电(高阈值)的。

DVFS (Dynamic Voltage and Frequency Scaling)

DVFS
  • Adjust VDD and f together based on workload
  • High load → high V/f for performance
  • Low load → low V/f to save power
  • Key relationship: fmax ∝ (VDD - Vth)2 / VDD (simplified: f ∝ VDD)
  • Power scales as P ∝ VDD3 (because both V2 and f scale)
中文

DVFS就像汽车的"巡航模式":高速公路上加大油门(高电压高频率)跑得快,城市里降低油门(低电压低频率)省油。手机CPU就大量使用DVFS:刷微博时降频省电,玩游戏时全速运行。功耗按电压的三次方下降,所以降压10%可以省电约27%。

WORKED EXAMPLE

Q: A chip has Ctotal=20nF, VDD=1.0V, f=1GHz, alpha=0.15, Ileak=100mA. Find total power.

  1. Pdyn = 0.15 × 20×10-9 × 1.02 × 109 = 3.0 W
  2. Pstatic = 0.1 × 1.0 = 0.1 W
  3. Ptotal ≈ 3.0 + 0.1 = 3.1 W (ignoring short-circuit)

If we scale VDD to 0.8V: f scales to 0.8GHz, Pdyn = 0.15 × 20e-9 × 0.64 × 0.8e9 = 1.536 W (49% reduction!)

Shifters & Crossbars

Lecture 21 | Final scope

中文

移位器用于将二进制数左移或右移若干位(乘以或除以2的幂)。桶形移位器用多级MUX实现对数级延迟的任意移位。漏斗移位器把左移和右移统一成一个操作。交叉开关是N个输入到M个输出的全连接开关网络,用于路由器和网络芯片。就像火车站的"道岔"系统,可以把任意一条入轨连到任意一条出轨。

Barrel Shifter

LOGARITHMIC MUX STAGES

An N-bit barrel shifter uses log2(N) stages of 2:1 muxes.

  • Stage k shifts by 2k positions if shift_amount[k] = 1
  • For 8-bit: 3 stages (shift by 1, 2, 4)
  • For 32-bit: 5 stages (shift by 1, 2, 4, 8, 16)
  • Each stage: N muxes selecting between shifted and unshifted
  • Delay: O(log N) mux delays
  • Area: O(N log N) muxes total
中文

桶形移位器的巧妙之处:把移位量分解成二进制位。比如移位5=101(二进制),就先移1位(第0级),跳过移2位(第1级),再移4位(第2级)。每一级是一排MUX,选择"移还是不移"。就像调音量旋钮,每个刻度都是上一个的两倍。

EXAMPLE: 8-BIT BARREL SHIFTER
// Shift amount = 5 = 101 in binary
// Input:    A B C D E F G H

// Stage 0 (shift by 1, enabled): B C D E F G H 0
// Stage 1 (shift by 2, skip):    B C D E F G H 0
// Stage 2 (shift by 4, enabled): F G H 0 0 0 0 0

// Result: left shift by 5

Funnel Shifter

UNIFIED SHIFT
  • Combines left shift, right shift, and rotate into one structure
  • Place two copies of input side-by-side: [A | A] (2N bits)
  • Select N consecutive bits from this 2N-bit word
  • Left shift by k: select bits starting at position N-k
  • Right shift by k: select bits starting at position k (with sign/zero extension for upper copy)
  • Rotate by k: select bits starting at position k (upper copy = lower copy)
中文

漏斗移位器的思路很优雅:把输入复制一份拼成2N位的"长条",然后从中选取连续的N位。左移就是选靠后的位置,右移就是选靠前的位置,旋转就是两份都用原数据。就像一个"取景框"在两倍长的胶片上滑动。

Crossbar Switch

N x M CROSSBAR
  • Connects any of N inputs to any of M outputs
  • Each output has an N:1 mux
  • Total: M muxes, each N-to-1
  • Area: O(N × M) — grows quadratically
  • Delay: O(1) mux delay per output (single stage)
  • Used in: network switches, NoC routers, register file read ports
中文

交叉开关就像电话交换机:N个输入和M个输出之间可以任意连接。每个输出端有一个N选1的MUX。面积是N*M(每个交叉点一个开关),所以规模大了很贵。在片上网络(NoC)和路由器中广泛使用。

4x3 Crossbar In0In1In2In3 Out0Out1Out2

Binary Division

Lecture 21 | Final scope

中文

二进制除法比乘法更复杂,因为每一步都依赖上一步的结果(无法像乘法那样并行)。恢复除法最直观:试减,如果不够就加回来。不恢复除法更快:不加回去,而是下一步补偿。SRT除法用更高基数(基4/8)每步产生多位商,是现代处理器的标准做法。就像手算长除法,但每一步可以"猜"多位。

Restoring Division

ALGORITHM
  1. Initialize: remainder R = dividend (2N bits), divisor D, quotient Q = 0
  2. For each bit position (N iterations):
    • Left-shift R by 1
    • Trial subtract: R' = R - D
    • If R' ≥ 0: set Q bit = 1, R = R'
    • If R' < 0: set Q bit = 0, restore R (don't update)
  3. After N steps: Q = quotient, R = remainder

Delay: O(N) iterations, each with an N-bit subtraction

Hardware: N-bit adder/subtractor + shift registers

中文

恢复除法就像手算除法:每次"试探"能不能减去除数。能减(够减)就商1,不能减(不够减)就商0并把减掉的加回来(恢复)。缺点是"加回来"这一步浪费时间。

EXAMPLE: 7 / 3
Dividend = 0111 (7), Divisor = 011 (3)
Step 1: R=0111, shift left -> 01110
        Trial: 01110 - 01100 = 00010 >= 0, Q[1]=1, R=00010
Step 2: R=00010, shift left -> 00100
        Trial: 00100 - 01100 = -01000 < 0, Q[0]=0, R=00100(restore)
Result: Q = 10 (2), R = 01 (1)
Check: 7 = 3*2 + 1 correct!

Non-Restoring Division

FASTER ALGORITHM
  • Instead of restoring, add D in the next step if previous step was negative
  • If previous remainder ≥ 0: shift left, subtract D, Q bit = 1
  • If previous remainder < 0: shift left, add D, Q bit = 0
  • At the end: if remainder is negative, add D to correct
  • Advantage: only one add/subtract per step (no restore step)
  • Quotient bits are in {0,1} but may need final correction
中文

不恢复除法的思路:如果这一步减多了(结果为负),下一步不是"加回来再减",而是直接"加"来补偿。省去了恢复那一步。就像在天平上称重:放错了砝码不用拿下来重放,直接在另一边加砝码补偿。

SRT Division

HIGH-RADIX DIVISION
  • Named after Sweeney, Robertson, Tocher
  • Uses radix-4 or higher: produces 2+ quotient bits per iteration
  • Quotient digits from redundant set {-2, -1, 0, 1, 2} for radix-4
  • Uses a lookup table (PLA) indexed by top bits of remainder and divisor
  • Redundancy allows imprecise remainder estimation → faster
  • Delay: O(N/2) for radix-4, O(N/k) for radix-2k
  • Used in modern Intel/AMD/ARM processors
中文

SRT除法是现代CPU的标准。基4版本每步产生2位商,速度翻倍。关键技巧是使用"冗余表示":商的每一位可以是-2到+2(不只是0和1),这样不需要精确知道余数的值就能选择商,容错性更强。就像去超市不用精确到分地算钱,四舍五入也能结账。(著名的Pentium FDIV bug就是SRT表格里少了几个条目。)

EXAM TIP

Division is inherently sequential (each step depends on previous remainder). Unlike multiplication, you cannot parallelize the iterations. This is why division is 10-40x slower than multiplication in hardware.

FPGA vs ASIC

Project-relevant | Final scope

中文

FPGA(现场可编程门阵列)就像"乐高积木"——买回来后可以反复重新编程配置。ASIC(专用集成电路)就像"定制模具"——一旦制造出来就不能改了。FPGA适合原型验证和小批量,ASIC适合大批量生产(手机芯片等)。我们151的项目就是在FPGA上实现RISC-V处理器。

FPGA Architecture

CORE COMPONENTS
  • LUT (Look-Up Table): Small SRAM that can implement any N-input boolean function (typically 4-6 inputs). Acts as a universal gate.
  • CLB (Configurable Logic Block): Contains multiple LUTs + flip-flops + carry chains + muxes
  • Routing: Programmable interconnect (switch boxes + connection boxes) that connects CLBs
  • IOBs: Input/Output blocks at chip periphery
  • Block RAM: Dedicated SRAM blocks (e.g., 36Kbit)
  • DSP slices: Hardened multiply-accumulate units
中文

FPGA的核心是LUT(查找表)——一小块SRAM存储真值表,可以实现任何N输入布尔函数。比如4输入LUT有16个存储单元(2^4),可以实现任何4输入逻辑。CLB把多个LUT、触发器、进位链打包在一起。可编程路由网络把它们连接起来。就像一块"万能面包板"。

FPGA vs ASIC Comparison

AspectFPGAASIC
ReconfigurableYes (reprogram anytime)No (fixed after fabrication)
NRE CostLow ($0 - $10K)Very high ($1M - $100M+)
Unit CostHigh ($10 - $10K per chip)Low ($0.10 - $100 per chip)
Performance~100-500 MHz typical~1-5 GHz
Power Efficiency10-100x worseBest (optimized transistors)
Area Efficiency10-40x worseBest (custom layout)
Time to MarketDays to weeksMonths to years
Best ForPrototyping, low volume, reconfigurableHigh volume, max performance
中文

选择FPGA还是ASIC就像选择"3D打印"还是"开模具":3D打印(FPGA)灵活快速但单件贵,开模具(ASIC)前期投入大但大量生产便宜。FPGA因为用通用LUT和可编程路由,面积和功耗都比ASIC大很多,但开发周期短,适合我们做项目验证。

FPGA Design Flow

DESIGN STEPS
  1. RTL Design: Write SystemVerilog (behavioral)
  2. Simulation: Verify with testbenches (iverilog, VCS)
  3. Synthesis: Convert RTL to LUT/FF netlist (Vivado)
  4. Place & Route: Map netlist to physical FPGA resources
  5. Bitstream Generation: Create configuration file
  6. Program FPGA: Load bitstream onto device

ASIC Design Flow

DESIGN STEPS
  1. RTL Design: Write SystemVerilog
  2. Simulation: Functional verification
  3. Logic Synthesis: Map to standard cell library (Design Compiler)
  4. Floor Planning: Arrange major blocks on die
  5. Place & Route: Place cells and connect wires
  6. Timing Closure: Meet setup/hold across all paths (STA)
  7. DRC/LVS: Design rule check / layout vs schematic
  8. Tapeout: Send to foundry (TSMC, Samsung, etc.)
  9. Fabrication: Weeks to months
中文

FPGA流程:写代码 → 仿真 → 综合(变成LUT网表) → 布局布线 → 生成比特流 → 下载到板子上。几小时就能完成。ASIC流程多了很多步:综合到标准单元库、时序收敛、物理验证(DRC/LVS)、然后送去代工厂流片,整个过程几个月到一年。

PROJECT NOTE

In EECS 151, your RISC-V processor targets the Xilinx PYNQ-Z1 FPGA. Use Vivado for synthesis. Key FPGA-specific considerations: LUT utilization, BRAM usage, timing constraints (100MHz target), and using always @(*) for iverilog compatibility.

Cheatsheet: Midterm 1

Boolean Algebra

  • De Morgan: (AB)'=A'+B', (A+B)'=A'B'
  • Absorption: A+AB=A
  • Consensus: AB+A'C+BC=AB+A'C
  • SOP: OR of AND minterms
  • POS: AND of OR maxterms

Noise Margins

  • NM_H = V_OH - V_IH
  • NM_L = V_IL - V_OL
  • t_p = (t_pHL + t_pLH)/2

SystemVerilog

  • assign = continuous (comb)
  • always_comb: blocking (=)
  • always_ff: non-blocking (<=)
  • Module instantiation: .port(signal)
  • generate/for: parameterized hardware

FSMs

  • Moore: output = f(state)
  • Mealy: output = f(state, input)
  • Two-block style: 1 always_ff + 1 always_comb

RISC-V Formats

  • R: funct7|rs2|rs1|f3|rd|op (ADD,SUB,AND)
  • I: imm[11:0]|rs1|f3|rd|op (ADDI,LW,JALR)
  • S: imm[11:5]|rs2|rs1|f3|imm[4:0]|op (SW)
  • B: imm|rs2|rs1|f3|imm|op (BEQ,BNE)
  • U: imm[31:12]|rd|op (LUI,AUIPC)
  • J: imm|rd|op (JAL)

Pipelining

  • 5 stages: F D X M W
  • T_min = t_cq + t_logic + t_su
  • Hold: t_cq + t_min > t_hold
  • Hazards: structural, data, control
  • Fix: forwarding, stalling, prediction

Control Signals

  • PCSel: pc+4 or branch target
  • ImmSel: I/S/B/U/J format
  • ALUSel: ADD/SUB/AND/OR/XOR/SLT
  • MemRW: Read/Write
  • WBSel: ALU/mem/PC+4
  • RegWEn: write enable

Key Constants

  • x0 always = 0
  • JAL: rd=PC+4, PC=PC+imm
  • JALR: rd=PC+4, PC=rs1+imm
  • LUI: rd=imm<<12
  • AUIPC: rd=PC+(imm<<12)

Cheatsheet: Midterm 2

CMOS Basics

  • NMOS ON: V_GS > V_th (strong 0)
  • PMOS ON: V_GS < -|V_th| (strong 1)
  • PDN: series=AND, parallel=OR
  • PUN = dual of PDN
  • Always inverting: NAND, NOR only
  • Sizing: N in series => width x N

CMOS Delay

  • R = R_on/W, C_g = W*C_i, C_d = gamma*C_g
  • t_p = ln(2)*R*(C_p + C_L)
  • Elmore: t = ln(2)*sum(R_shared * C_i)
  • Off-path caps: YES. Off-path R: NO
  • Chain: f_opt = F^(1/N), f ~= e ~= 2.7

Logical Effort

  • d = g*h + p
  • INV: g=1,p=1. NAND2: g=4/3,p=2
  • NAND3: g=5/3,p=3. NOR2: g=5/3,p=2
  • F = G*B*H, f_hat = F^(1/N)
  • D_min = N*f_hat + sum(p_i)
  • Size: C_in,i = g_i * C_out,i / f_hat

Power & Energy

  • P_dyn = alpha * C * V_DD^2 * f
  • alpha = 2*p*(1-p)
  • E_switch = C * V_DD^2
  • P_sc = I_sc * V_DD (peak at V_DD/2)
  • P_leak = I_leak * V_DD
  • V scaling: f prop V, P prop V^3
  • DVFS: adjust V and f with workload

Wires

  • Lumped: t = 0.7RC
  • Distributed: t = 0.35RC
  • R = rho*L/(W*t), C = eps*A/d
  • Use Elmore delay for RC trees

Adders

  • G=AB, P=A xor B, C_out=G+P*Cin
  • RCA: O(N). Bypass: O(sqrt N), b=sqrt(N/2)
  • Select: O(sqrt N). CLA: O(log N)
  • Bypass delay: 2b + N/b - 2

Multipliers

  • CSA: 3 nums -> 2 in O(1)
  • Wallace: O(log_1.5 N) CSA levels
  • Booth radix-4: N->N/2 partial products
  • 3A = 4A - A trick

Constants

  • ln(2) = 0.693
  • e = 2.718 (optimal fanout)
  • gamma = C_d/C_g (usually 1)
  • NAND preferred over NOR

Cheatsheet: Final Exam

Cumulative: includes all MT1 + MT2 content plus the following.

Timing Constraints

  • T >= t_cq + t_logic,max + t_su + t_sk + t_j
  • Hold: t_cq + t_min >= t_hold + t_sk + t_j
  • Setup slack = T - (t_cq+t_max+t_su-t_sk+t_j)
  • Hold slack = (t_cq+t_min-t_sk-t_j) - t_hold
  • Setup fail: reduce logic, retime, pipeline
  • Hold fail: add buffers (cannot fix with freq)

Clock Non-Idealities

  • Skew (t_SK): spatial (deterministic+random)
  • Jitter (t_J): temporal (cycle-to-cycle)
  • +skew: helps setup, hurts hold
  • -skew: hurts setup, helps hold
  • Both reduce effective cycle time

Latch vs Flip-Flop

  • Latch: level-sensitive (transparent)
  • FF: edge-triggered (two latches)
  • Latch has t_cq, t_dq, t_su, t_hold
  • FF has t_cq, t_su, t_hold
  • Transmission gate latch: NMOS+PMOS parallel

SRAM

  • 6T cell: 2 access + 2 inverters
  • Read: precharge BL/BLbar, assert WL
  • Read stability: PD > Access transistor
  • Write: force BL low, assert WL
  • Writability: Access > PU
  • Ratio: PD:Access:PU = 3:2:1 (approx)

Memory Periphery

  • Row decoder: NAND gates -> wordline
  • Predecoder: smaller groups first
  • Bitline conditioning: precharge high
  • Write driver: pull one BL to 0
  • Sense amplifier: detect BL vs BLbar diff

Power Analysis

  • P_total = P_dyn + P_sc + P_leak
  • P_dyn = alpha*C*V^2*f
  • E_op = alpha*C*V^2 (freq independent)
  • Clock gating: reduce alpha
  • DVFS: adjust V,f with workload
  • Power gating: cut V_DD to idle blocks

Division Algorithms

  • Restoring: trial subtract, restore if neg
  • Non-restoring: add next step if neg
  • SRT: radix-4, 2 bits/iter, redundant digits
  • Division is sequential: O(N) iterations
  • ~10-40x slower than multiplication

Shifters & FPGA

  • Barrel: log2(N) mux stages, O(NlogN)
  • Funnel: 2N concat, select N bits
  • Crossbar: N*M muxes, O(1) delay
  • FPGA: LUT + CLB + routing + BRAM
  • ASIC: standard cells, tapeout, months
  • FPGA 10-100x less efficient than ASIC

Flashcards

Click to flip. Covers all exam material.

Loading...
Loading...
1/1

Practice Quiz