中文总结:这是61C的基础回顾——数字怎么在计算机里表示,二进制补码怎么处理负数,以及RISC-V指令集。就像盖房子前要先打地基。
61C Prerequisites Review
Number Representation
Computers represent everything in binary. Key systems:
Unsigned: Range [0, 2n -1]. Direct binary-to-decimal.
Sign-magnitude: MSB is sign. Two zeros problem. Range [-(2n-1 -1), 2n-1 -1]
Two's complement: MSB has weight -2n-1 . Range [-2n-1 , 2n-1 -1]. One representation of zero.
RISC-V ISA Review
RISC-V is a load-store architecture with 32 registers (x0-x31, x0 always 0).
R-type: add, sub, and, or, xor, sll, srl, sra, slt, sltu
I-type: addi, lw, jalr, loads
S-type: sw, sb, sh (stores)
B-type: beq, bne, blt, bge (branches)
U-type: lui, auipc
J-type: jal
Interactive: Two's Complement Converter
输入十进制数,查看它的二进制补码表示(8位)
Decimal value (-128 to 127):
Convert
Quick Check
What is -5 in 8-bit two's complement?
0b10000101
0b11111011
0b01111011
解释:5 = 00000101,取反得 11111010,加1得 11111011。
中文总结:布尔逻辑是数字电路的语言。就像搭积木,用AND、OR、NOT这些"逻辑门"组合出任何功能。德摩根定律是最重要的变换工具,K-map(卡诺图)帮你化简表达式。
Boolean Algebra & Logic
Key Laws
Canonical Forms
SOP (Sum of Products / 最小项之和): OR of AND terms (minterms). Each minterm = 1 row of truth table.
POS (Product of Sums / 最大项之积): AND of OR terms (maxterms). Each maxterm = 0 row of truth table.
Noise Margins
噪声容限就像"安全距离"——信号在传输中可以有多少噪声而不出错。
Interactive: Gate Simulator
点击输入(A/B)切换0和1,观察不同逻辑门的输出变化。这就是数字电路最基本的运算!
AND
OR
NOT
NAND
NOR
XOR
Click on input circles (A, B) to toggle between 0 and 1
Karnaugh Maps (卡诺图)
K-map是化简布尔表达式的图形工具。把真值表画成网格,圈出相邻的1来合并项。记住:相邻格子只差一个变量!
Interactive 2-Variable K-Map
Click cells to toggle 0/1, see the simplified expression update.
Active Recall
Apply De Morgan's to: NOT(A AND B OR C)
Show Answer
NOT(AB + C) = NOT(AB) · NOT(C) = (A'+B') · C'
先对整体用德摩根:OR变AND,取反各项。再对AB用德摩根。
中文总结:组合逻辑是"无记忆"的电路——输出只取决于当前输入。MUX(多路选择器)像个开关,选择哪个输入传到输出。解码器把二进制代码变成独热码。ALU是处理器的"计算引擎"。
Combinational Logic
Multiplexers (MUX)
2:1 MUX: Y = S ? I1 : I0 = S'I0 + SI1
4:1 MUX: 2 select bits, 4 data inputs
Any Boolean function can be implemented with a MUX (connect truth table outputs to data inputs)
Decoders & Encoders
Decoder: n inputs → 2n outputs (one-hot). Used for memory address decoding.
Encoder: 2n inputs → n outputs. Priority encoder handles multiple active inputs.
ALU (Arithmetic Logic Unit)
ALU就像处理器的计算器——根据控制信号选择做加法、减法、AND、OR等运算。
Combines adder + logic operations via MUX
ALU control signals select operation
Flags: zero, negative, overflow, carry
Interactive: 4-bit ALU Simulator
输入两个4位数和操作类型,查看ALU运算结果
Active Recall
How many select bits does a 16:1 MUX need?
3
4
16
因为 2^4 = 16,需要4个选择位来选16个输入之一。
中文总结:SystemVerilog是描述硬件的语言,不是软件编程!always @(*)描述组合逻辑,always @(posedge clk)描述时序逻辑。用 = 做阻塞赋值(组合),用 <= 做非阻塞赋值(时序)。搞混了电路就会出bug!
SystemVerilog for Digital Design
always Blocks
Parameterized Modules & Generate
module adder #(parameter WIDTH=32) (
input [WIDTH-1:0] a, b,
output [WIDTH-1:0] sum
);
assign sum = a + b;
endmodule
// Generate for repeated structures
genvar i;
generate
for (i = 0; i < N; i = i + 1) begin : gen_block
// instantiate modules here
end
endgenerate
Common Pitfalls
Latch inference: Missing else or missing default in case → infers latch (bad!)
Mixing = and <=: Never mix in same always block
Incomplete sensitivity list: Use always @(*) to avoid this
Interactive: Spot the Bug
看下面的Verilog代码,找出错误!
always @(posedge clk) begin
if (enable)
count = count + 1; // Bug here!
end
Reveal Bug
Should use non-blocking assignment (<=) in sequential always block!
时序逻辑块中应该用 <= 而不是 =
always @(*) begin
case(sel)
2'b00: y = a;
2'b01: y = b;
2'b10: y = c;
// Missing 2'b11!
endcase
end
Reveal Bug
Missing default case → infers a latch! Add: default: y = 0;
缺少default分支会推断出锁存器(latch),这是组合逻辑中的大忌!
Active Recall
What type of assignment should you use in a combinational always block?
Blocking (=)
Non-blocking (<=)
Either one works
组合逻辑用阻塞赋值(=),时序逻辑用非阻塞赋值(<=)。这是铁律!
中文总结:时序逻辑有"记忆"——输出取决于当前输入和过去的状态。触发器(Flip-Flop)是最基本的存储单元,在时钟边沿才更新。FSM(有限状态机)就像流程图,根据当前状态和输入决定下一个状态。Moore机输出只看状态,Mealy机输出还看输入。
Sequential Logic & Finite State Machines
Flip-Flops
D Flip-Flop: Q(next) = D. Captures input on clock edge.
Register: Array of flip-flops, stores multi-bit value.
Enable: Q(next) = en ? D : Q
Reset: Synchronous (checked at clock edge) vs Asynchronous (immediate)
Moore vs Mealy
Interactive: FSM Animator
这是一个2位计数器FSM。点击"Clock Tick"或按输入按钮,观察状态如何转换!当前状态会高亮显示。
2-bit Up/Down Counter
Current State: S0 (00)
Output: 00
Direction: UP
Clock Tick
Toggle Direction
Reset
State Transition History:
Active Recall
In a Moore machine, when does the output change?
Immediately when input changes
Only at clock edges (when state updates)
Between clock edges
Moore机的输出只取决于状态,状态只在时钟边沿更新,所以输出也只在时钟边沿变化。
中文总结:数据通路是CPU的"管道系统",数据在其中流动。单周期CPU一个时钟周期执行一条指令。关键部件:PC(指向当前指令)、指令存储器、寄存器文件、ALU、数据存储器。控制信号就像阀门,决定数据怎么流。
RISC-V Single-Cycle Datapath
Datapath Components
PC (Program Counter): Holds address of current instruction
Instruction Memory: ROM, outputs 32-bit instruction
Register File: 32 registers, 2 read ports, 1 write port
Immediate Generator: Extracts and sign-extends immediate from instruction
ALU: Performs computation (add, sub, and, or, etc.)
Data Memory: For load/store instructions
Control Signals
Signal R-type Load Store Branch JAL
RegWrite 1 1 0 0 1
MemRead 0 1 0 0 0
MemWrite 0 0 1 0 0
ALUSrc reg imm imm reg -
MemToReg ALU Mem - - PC+4
Interactive: Instruction Decoder
输入一条RISC-V指令,查看它的类型、控制信号和数据流向
add x1, x2, x3
lw x1, 0(x2)
sw x1, 0(x2)
beq x1, x2, label
addi x1, x2, 5
jal x1, label
Active Recall
For a store instruction (sw), is RegWrite 0 or 1?
0 (store doesn't write to register file)
1 (store writes data)
Store指令把寄存器的值存到内存,不需要写回寄存器文件,所以RegWrite=0。
中文总结:流水线就像洗衣服——不用等一批全洗完烘干叠好,而是同时有衣服在洗、在烘、在叠。5级流水线把指令分成:取指(IF)、译码(ID)、执行(EX)、访存(MEM)、写回(WB)。但会有冲突(hazard):结构冲突、数据冲突、控制冲突。前递(forwarding)和暂停(stall)是解决方案。
Pipelining
5-Stage Pipeline
IF (Instruction Fetch)
ID (Instruction Decode)
EX (Execute)
MEM (Memory)
WB (Write Back)
Hazards
Interactive: Pipeline Hazard Simulator
输入指令序列,模拟器会显示流水线时序图、标出数据冲突(红色)、前递路径(绿色)和气泡/暂停(灰色)。
Instructions (one per line):
Enable Forwarding
Simulate
Active Recall
A load instruction is followed by an instruction that uses the loaded value. With forwarding, how many stall cycles are needed?
0 (forwarding handles it)
1 (load-use hazard requires 1 stall even with forwarding)
2 (need to wait for WB)
Load指令在MEM阶段才拿到数据,即使有前递,下一条指令在EX阶段就需要数据,所以必须暂停1个周期。这就是"load-use hazard"。
中文总结:CMOS用两种晶体管——NMOS(接地的拉下网络)和PMOS(接电源的拉上网络)。它们互补工作:当输出该为1时PMOS导通、NMOS截止;输出为0时反过来。这样就不会有直接从电源到地的通路(除了切换瞬间),所以静态功耗极低。尺寸(sizing)影响速度和功耗。
CMOS Fundamentals
NMOS vs PMOS
CMOS Design Rules
Pull-up network (PUN) is complement of pull-down network (PDN)
Series NMOS ↔ Parallel PMOS (AND function in PDN ↔ OR function in PUN)
Parallel NMOS ↔ Series PMOS (OR function in PDN ↔ AND function in PUN)
CMOS gates are inherently inverting (NAND, NOR, NOT — not AND, OR)
Interactive: CMOS Gate Builder
切换输入A和B,观察NMOS和PMOS晶体管的通断状态。绿色=导通(ON),红色=截止(OFF)。看看输出如何由拉上和拉下网络决定!
INV
NAND
NOR
Click on A or B labels to toggle inputs
Active Recall
In a CMOS NAND gate, the NMOS transistors are in _____ and PMOS transistors are in _____.
Series, Parallel
Parallel, Series
Series, Series
NAND门:NMOS串联(两个都导通才拉低),PMOS并联(任一导通就拉高)。这是对偶关系!
中文总结:逻辑努力(Logical Effort)是分析和优化门延迟的方法。每个门有"固有延迟"(parasitic delay)和"努力延迟"(effort delay)。逻辑努力g衡量门比反相器慢多少,电气努力h是负载与输入电容的比。最优设计让每级的努力相等。就像分配工作——让每个人承担相同的任务量最高效。
CMOS Delay & Logical Effort
Delay Model
Logical Effort of Common Gates
Gate g (n=1) g (n=2) g (n=3) g (n=4) p
Inverter 1 - - - 1
NAND - 4/3 5/3 6/3 n
NOR - 5/3 7/3 9/3 n
Interactive: Logical Effort Calculator
输入门的参数,自动计算延迟。也可以计算多级路径的最优设计。
Multi-Stage Path Optimizer
Practice Problem
Design a path from C_in=1 to C_out=64 using NAND2-INV-NAND2-INV. Calculate minimum delay.
Show Step 1: Calculate G
G = gNAND2 × gINV × gNAND2 × gINV = (4/3)(1)(4/3)(1) = 16/9 ≈ 1.78
Show Step 2: Calculate F and f-hat
H = 64/1 = 64, B = 1
F = G × H × B = 1.78 × 64 = 113.8
f-hat = F^(1/4) = 113.8^0.25 ≈ 3.27
Show Step 3: Calculate Delay
P = 2 + 1 + 2 + 1 = 6
D = 4 × 3.27 + 6 = 13.07 + 6 = 19.07 τ
最小延迟约19个tau单位。每级的最优努力(f-hat)约3.27。
中文总结:导线不是理想的——它有电阻(R)和电容(C),会产生延迟。长导线的延迟与长度的平方成正比(RC延迟)。解决方案是插入中继器(repeater),把长线分成短段。导线也消耗能量,在现代芯片中,导线延迟可能比门延迟更大!
Wires & Interconnect
Wire Delay Model
Repeaters
Break long wire into k segments with buffers
Delay becomes proportional to L (instead of L²)
Optimal number of repeaters minimizes total delay
Active Recall
Why does wire delay scale as L² without repeaters?
Show Answer
Because both R and C are proportional to L. Delay = RC ∝ L × L = L².
电阻和电容都和长度成正比,延迟=RC正比于L×L=L的平方。加中继器后每段的R和C都变小,总延迟变成线性。
中文总结:加法器是最基本的算术电路。行波进位加法器(RCA)最简单但最慢——进位要一级一级传。超前进位加法器(CLA)提前计算进位,快得多但更复杂。进位选择加法器同时计算两种情况(carry=0和carry=1),等进位来了直接选。还有更高级的前缀加法器(Kogge-Stone, Brent-Kung)。
Adder Architectures
Full Adder (基本单元)
Adder Comparison
Type Delay Area Key Idea
Ripple Carry O(n) O(n) Chain full adders, carry ripples through
Carry Lookahead O(log n) O(n log n) Pre-compute carries using G, P
Carry Select O(√n) O(n) Compute both results, MUX select
Prefix (KS/BK) O(log n) O(n log n) Tree structure for parallel prefix
Interactive: Adder Comparison Visualizer
选择位宽,观察行波进位加法器和超前进位加法器的进位传播速度对比!蓝色方块表示进位到达该位。
Bit width:
8
Animate!
Reset
Carry Lookahead (O(log n))
Active Recall
What is the critical path delay (in gate delays) of a 32-bit ripple carry adder?
32 gate delays
~64 gate delays (2 per bit for carry chain)
~5 gate delays (log2(32))
每个全加器的进位链有~2个门延迟,32位连起来就是~64个门延迟。CLA只需要log2(32)=5级。
中文总结:乘法器比加法器复杂得多。最简单的阵列乘法器就是小学竖式乘法的硬件版——生成部分积然后加起来。Booth编码减少部分积数量,Wallace树用进位保存加法器(CSA)并行压缩部分积。除法器就更慢了——需要逐位迭代。SRT除法每次可以产生多位商。
Multipliers & Dividers
Array Multiplier
Partial products: PPi = A × B[i], shifted left by i
For n-bit × n-bit: n partial products, each n bits
Result: 2n bits wide
Delay: O(n) through carry chain of final adder
Booth Encoding
Wallace Tree
Uses Carry-Save Adders (CSA) to compress partial products
3:2 compressors (full adders) reduce 3 rows to 2
Delay: O(log n) compression stages + final CPA
Much faster than sequential addition of partial products
Binary Division
Restoring division: subtract divisor, if negative, add back (restore)
Non-restoring: don't add back, adjust in next step
SRT division: produces quotient bits from lookup table, can generate multiple bits per cycle
Interactive: Binary Multiplication Visualizer
输入两个4位数,查看部分积生成过程
中文总结:移位器用于数据的位移操作。桶形移位器(Barrel Shifter)可以在一个周期内完成任意位数的移位,用多级MUX实现。漏斗移位器(Funnel Shifter)把左移、右移、旋转统一处理。交叉开关(Crossbar)是通用的互连网络,任何输入可以连到任何输出。
Shifters & Crossbars
Barrel Shifter
N-bit barrel shifter uses log2 (N) stages of MUXes
Each stage shifts by 2i positions (controlled by shift amount bit i)
Example: 8-bit barrel shifter = 3 MUX stages (shift by 1, 2, 4)
Delay: O(log n) MUX delays
Funnel Shifter
Concatenate {A, B} into 2n-bit value, shift right, take lower n bits
Handles left shift, right shift, and rotate with one structure
Left shift by k = right shift by (n-k) of {input, 0...0}
Crossbar Switch
N inputs × N outputs, any permutation possible
N² switches (MUXes), expensive but fully flexible
Used in network switches, FPGA routing
Interactive: Barrel Shifter Simulator
输入一个8位值和移位量,观察每级MUX如何逐步完成移位
中文总结:时序约束是数字设计中最关键的概念之一。Setup time(建立时间):数据必须在时钟边沿之前稳定多长时间。Hold time(保持时间):数据在时钟边沿之后必须保持稳定多长时间。违反这些约束会导致亚稳态(metastability)——输出在0和1之间"犹豫不决",可能导致系统崩溃。
Timing Constraints
Key Timing Parameters
Metastability
When setup/hold violated, FF enters metastable state
Output is undefined for some time (MTBF = mean time between failures)
Solution for async inputs: use synchronizer (2+ flip-flops in series)
Interactive: Setup/Hold Time Visualizer
拖动数据信号的变化时间点,查看是否满足建立时间和保持时间约束。绿色=满足,红色=违反。
Practice Problem
Given: t_clk-to-q = 1ns, t_setup = 0.5ns, t_hold = 0.3ns, t_comb = 4ns. What is the minimum clock period?
Show Solution
t_clk ≥ t_clk-to-q + t_comb + t_setup = 1 + 4 + 0.5 = 5.5 ns
f_max = 1/5.5ns ≈ 181.8 MHz
Hold check: t_clk-to-q + t_comb,min ≥ t_hold → 1 + 0 ≥ 0.3 ✓ (assume t_comb,min ≥ 0)
最小时钟周期=1+4+0.5=5.5ns,最大频率约182MHz。保持时间约束也满足。
中文总结:SRAM(静态随机存储器)是缓存的核心。经典6T SRAM单元用6个晶体管存储1位——两个交叉耦合的反相器保持数据,两个访问晶体管连接位线。读操作会干扰存储的数据(所以要小心设计比例),写操作要用强信号覆盖原有值。感应放大器(Sense Amplifier)检测位线上的微小电压差来加速读取。
SRAM & Memory
6T SRAM Cell
2 cross-coupled inverters (4 transistors) = storage
2 access transistors (NMOS) = controlled by word line (WL)
Connected to bit line (BL) and complement bit line (BLB)
SRAM Sizing Constraints
Read stability: Cell ratio (pull-down / access) > 1 (typically ~1.5). Prevents read disturb.
Write-ability: Pull-up ratio (pull-up / access) < 1 (typically ~0.8). Allows bit line to flip cell.
Memory Hierarchy
存储器层次:寄存器 > L1缓存 > L2缓存 > L3缓存 > 主存(DRAM) > 磁盘/SSD。越靠上越快越小越贵。
Registers ~1 cycle ~1KB
L1 Cache ~2-4 cycles ~64KB
L2 Cache ~10 cycles ~256KB
DRAM ~100 cycles ~GBs
Active Recall
Why must the cell ratio (pull-down / access transistor width) be > 1 in 6T SRAM?
Show Answer
During read, the access transistor tries to charge the internal node storing '0'. If the access transistor is too strong relative to pull-down, it can flip the cell (read disturb). Cell ratio > 1 ensures pull-down is stronger, keeping the stored '0' stable.
读操作时,访问管会给存储"0"的节点充电。如果访问管太强(相对于下拉管),可能把单元翻转(读干扰)。cell ratio > 1确保下拉管更强,保持数据稳定。
中文总结:功耗是现代芯片设计的头号问题。动态功耗来自电容充放电(开关活动),与频率和电压的平方成正比。静态功耗来自漏电流,即使不切换也在消耗。DVFS(动态电压频率调节)降低电压和频率来省电。时钟门控(Clock Gating)关闭不需要的模块时钟来减少无用切换。
Power & Energy
Power Reduction Techniques
DVFS: Reduce VDD and f together. Power drops as VDD ²!
Clock Gating: Gate the clock to idle modules. Eliminates switching power.
Power Gating: Cut off VDD to idle blocks. Eliminates leakage too.
Multi-Vth : Use high-Vth (low leakage) for non-critical paths.
Interactive: Power Calculator
调整参数,观察动态功耗和静态功耗如何变化
中文总结:FPGA和ASIC是两种实现数字电路的方式。FPGA(现场可编程门阵列)像乐高积木——通过编程配置LUT(查找表)和连线来实现任何功能,可以反复修改,但速度慢、功耗高、面积大。ASIC(专用集成电路)是定制芯片——性能最优但不能改,设计周期长、成本高。我们151的项目就是先在FPGA上验证,真正的芯片用ASIC流程。
FPGA vs ASIC
FPGA Architecture
LUT (Look-Up Table): k-input truth table stored in SRAM. A k-LUT can implement ANY k-input Boolean function.
CLB (Configurable Logic Block): Contains LUTs + flip-flops + MUXes + carry chain
Routing: Programmable interconnect (switch boxes, connection boxes)
I/O Blocks: Interface with external world
Block RAM: Dedicated SRAM blocks
DSP Slices: Hardened multiply-accumulate units
FPGA vs ASIC Comparison
FPGA ASIC
Reconfigurable Yes No (fixed)
Design Time Days-Weeks Months-Years
NRE Cost Low ($100s) Very High ($Ms)
Per-unit Cost High Low at volume
Performance ~3-10x slower Fastest
Power ~10x more Most efficient
Area ~20-40x larger Smallest
ASIC Design Flow
→ RTL Design
→ Synthesis
→ Place & Route
→ Timing Closure
→ DRC/LVS
→ Tapeout
→ Fabrication
Interactive: LUT Truth Table Builder
设置一个3-input LUT的真值表,查看它实现的布尔函数。FPGA中的每个LUT就是这样工作的!
A B C Out (click to toggle)
0 0 0 0
0 0 1 0
0 1 0 0
0 1 1 0
1 0 0 0
1 0 1 0
1 1 0 0
1 1 1 0
Function: F = 0 (constant zero)
SRAM contents: 00000000
Active Recall
A 4-input LUT can implement how many different Boolean functions?
16
65536 (2^16)
4
4-input LUT有2^4=16行真值表,每行可以是0或1,所以有2^16=65536种可能的函数。