EECS 151 Interactive Study Guide

中文总结：这是61C的基础回顾——数字怎么在计算机里表示，二进制补码怎么处理负数，以及RISC-V指令集。就像盖房子前要先打地基。

61C Prerequisites Review

Number Representation

Computers represent everything in binary. Key systems:

Unsigned: Range [0, 2ⁿ-1]. Direct binary-to-decimal.
Sign-magnitude: MSB is sign. Two zeros problem. Range [-(2^n-1-1), 2^n-1-1]
Two's complement: MSB has weight -2^n-1. Range [-2^n-1, 2^n-1-1]. One representation of zero.

Key Formulas:

Two's complement value: $-b_{n-1} \cdot 2^{n-1} + \sum_{i=0}^{n-2} b_i \cdot 2^i$

Negation trick: flip all bits, add 1

Sign extension: copy MSB to fill new upper bits

RISC-V ISA Review

RISC-V is a load-store architecture with 32 registers (x0-x31, x0 always 0).

R-type: add, sub, and, or, xor, sll, srl, sra, slt, sltu
I-type: addi, lw, jalr, loads
S-type: sw, sb, sh (stores)
B-type: beq, bne, blt, bge (branches)
U-type: lui, auipc
J-type: jal

Interactive: Two's Complement Converter

输入十进制数，查看它的二进制补码表示（8位）

Decimal value (-128 to 127):

Quick Check

What is -5 in 8-bit two's complement?

0b10000101

0b11111011

0b01111011

中文总结：布尔逻辑是数字电路的语言。就像搭积木，用AND、OR、NOT这些"逻辑门"组合出任何功能。德摩根定律是最重要的变换工具，K-map（卡诺图）帮你化简表达式。

Boolean Algebra & Logic

Key Laws

De Morgan's Laws:

\overline{A \cdot B} = \bar{A} + \bar{B}

\overline{A + B} = \bar{A} \cdot \bar{B}

Other Important Laws:

Distributive: A(B+C) = AB + AC

Consensus: AB + A'C + BC = AB + A'C

Shannon expansion: f = x·f(x=1) + x'·f(x=0)

Canonical Forms

SOP (Sum of Products / 最小项之和): OR of AND terms (minterms). Each minterm = 1 row of truth table.
POS (Product of Sums / 最大项之积): AND of OR terms (maxterms). Each maxterm = 0 row of truth table.

Noise Margins

噪声容限就像"安全距离"——信号在传输中可以有多少噪声而不出错。

NM_H = V_OH - V_IH (high noise margin)

NM_L = V_IL - V_OL (low noise margin)

For valid logic: NM_H > 0 and NM_L > 0

Interactive: Gate Simulator

点击输入（A/B）切换0和1，观察不同逻辑门的输出变化。这就是数字电路最基本的运算！

Click on input circles (A, B) to toggle between 0 and 1

Truth Table for AND:

Karnaugh Maps (卡诺图)

K-map是化简布尔表达式的图形工具。把真值表画成网格，圈出相邻的1来合并项。记住：相邻格子只差一个变量！

Interactive 2-Variable K-Map

Click cells to toggle 0/1, see the simplified expression update.

	B=0	B=1
A=0
A=1

F = 0

Active Recall

Apply De Morgan's to: NOT(A AND B OR C)

中文总结：组合逻辑是"无记忆"的电路——输出只取决于当前输入。MUX（多路选择器）像个开关，选择哪个输入传到输出。解码器把二进制代码变成独热码。ALU是处理器的"计算引擎"。

Combinational Logic

Multiplexers (MUX)

2:1 MUX: Y = S ? I1 : I0 = S'I0 + SI1
4:1 MUX: 2 select bits, 4 data inputs
Any Boolean function can be implemented with a MUX (connect truth table outputs to data inputs)

Decoders & Encoders

Decoder: n inputs → 2ⁿ outputs (one-hot). Used for memory address decoding.
Encoder: 2ⁿ inputs → n outputs. Priority encoder handles multiple active inputs.

ALU (Arithmetic Logic Unit)

ALU就像处理器的计算器——根据控制信号选择做加法、减法、AND、OR等运算。

Combines adder + logic operations via MUX
ALU control signals select operation
Flags: zero, negative, overflow, carry

Interactive: 4-bit ALU Simulator

输入两个4位数和操作类型，查看ALU运算结果

Input A (0-15):

Operation:

Input B (0-15):

Active Recall

How many select bits does a 16:1 MUX need?

3

4

16

中文总结：SystemVerilog是描述硬件的语言，不是软件编程！always @(*)描述组合逻辑，always @(posedge clk)描述时序逻辑。用 = 做阻塞赋值（组合），用 <= 做非阻塞赋值（时序）。搞混了电路就会出bug！

SystemVerilog for Digital Design

always Blocks

Combinational (always @(*) or always_comb)

always @(*) begin
  // Use blocking (=)
  if (sel)
    y = a;
  else
    y = b;
end

组合逻辑用 always @(*) + 阻塞赋值 =

Sequential (always @(posedge clk))

always @(posedge clk) begin
  // Use non-blocking (<=)
  if (rst)
    q <= 0;
  else
    q <= d;
end

时序逻辑用 posedge clk + 非阻塞赋值 <=

Parameterized Modules & Generate

module adder #(parameter WIDTH=32) (
  input  [WIDTH-1:0] a, b,
  output [WIDTH-1:0] sum
);
  assign sum = a + b;
endmodule

// Generate for repeated structures
genvar i;
generate
  for (i = 0; i < N; i = i + 1) begin : gen_block
    // instantiate modules here
  end
endgenerate

Common Pitfalls

Latch inference: Missing else or missing default in case → infers latch (bad!)
Mixing = and <=: Never mix in same always block
Incomplete sensitivity list: Use always @(*) to avoid this

Interactive: Spot the Bug

看下面的Verilog代码，找出错误！

always @(posedge clk) begin
  if (enable)
    count = count + 1;  // Bug here!
end

always @(*) begin
  case(sel)
    2'b00: y = a;
    2'b01: y = b;
    2'b10: y = c;
    // Missing 2'b11!
  endcase
end

Active Recall

What type of assignment should you use in a combinational always block?

Blocking (=)

Non-blocking (<=)

Either one works

中文总结：时序逻辑有"记忆"——输出取决于当前输入和过去的状态。触发器（Flip-Flop）是最基本的存储单元，在时钟边沿才更新。FSM（有限状态机）就像流程图，根据当前状态和输入决定下一个状态。Moore机输出只看状态，Mealy机输出还看输入。

Sequential Logic & Finite State Machines

Flip-Flops

D Flip-Flop: Q(next) = D. Captures input on clock edge.
Register: Array of flip-flops, stores multi-bit value.
Enable: Q(next) = en ? D : Q
Reset: Synchronous (checked at clock edge) vs Asynchronous (immediate)

Moore vs Mealy

Moore Machine

Output = f(state only)

Output on states, changes only at clock edge

输出只看当前状态，更稳定但可能多一个周期延迟

Mealy Machine

Output = f(state, input)

Output on transitions, can react faster

输出同时看状态和输入，反应更快但可能有毛刺

Interactive: FSM Animator

这是一个2位计数器FSM。点击"Clock Tick"或按输入按钮，观察状态如何转换！当前状态会高亮显示。

2-bit Up/Down Counter

Current State: S0 (00)

Output: 00

Direction: UP

State Transition History:

Active Recall

In a Moore machine, when does the output change?

Immediately when input changes

Only at clock edges (when state updates)

Between clock edges

中文总结：数据通路是CPU的"管道系统"，数据在其中流动。单周期CPU一个时钟周期执行一条指令。关键部件：PC（指向当前指令）、指令存储器、寄存器文件、ALU、数据存储器。控制信号就像阀门，决定数据怎么流。

RISC-V Single-Cycle Datapath

Datapath Components

PC (Program Counter): Holds address of current instruction
Instruction Memory: ROM, outputs 32-bit instruction
Register File: 32 registers, 2 read ports, 1 write port
Immediate Generator: Extracts and sign-extends immediate from instruction
ALU: Performs computation (add, sub, and, or, etc.)
Data Memory: For load/store instructions

Control Signals

Signal	R-type	Load	Store	Branch	JAL
RegWrite	1	1	0	0	1
MemRead	0	1	0	0	0
MemWrite	0	0	1	0	0
ALUSrc	reg	imm	imm	reg	-
MemToReg	ALU	Mem	-	-	PC+4

Interactive: Instruction Decoder

输入一条RISC-V指令，查看它的类型、控制信号和数据流向

Active Recall

For a store instruction (sw), is RegWrite 0 or 1?

0 (store doesn't write to register file)

1 (store writes data)

中文总结：流水线就像洗衣服——不用等一批全洗完烘干叠好，而是同时有衣服在洗、在烘、在叠。5级流水线把指令分成：取指(IF)、译码(ID)、执行(EX)、访存(MEM)、写回(WB)。但会有冲突(hazard)：结构冲突、数据冲突、控制冲突。前递(forwarding)和暂停(stall)是解决方案。

Pipelining

5-Stage Pipeline

IF (Instruction Fetch) ID (Instruction Decode) EX (Execute) MEM (Memory) WB (Write Back)

Hazards

Structural

Two instructions need same hardware. Fix: duplicate resource or separate memories.

两条指令抢同一个硬件资源

Data

RAW: read-after-write. Fix: forwarding from EX/MEM or stall.

后面的指令需要前面还没算完的结果

Control

Branch outcome unknown. Fix: predict, flush if wrong, branch delay slot.

分支指令还没决定跳不跳，后面的指令不知道该不该执行

Pipeline Performance:

\text{Speedup} = \frac{n \cdot k}{n + k - 1} \approx k \text{ for large } n

k = pipeline stages, n = number of instructions. Ideal speedup = k, but hazards reduce it.

Interactive: Pipeline Hazard Simulator

输入指令序列，模拟器会显示流水线时序图、标出数据冲突（红色）、前递路径（绿色）和气泡/暂停（灰色）。

Instructions (one per line):

Enable Forwarding

Active Recall

A load instruction is followed by an instruction that uses the loaded value. With forwarding, how many stall cycles are needed?

0 (forwarding handles it)

1 (load-use hazard requires 1 stall even with forwarding)

2 (need to wait for WB)

中文总结：CMOS用两种晶体管——NMOS（接地的拉下网络）和PMOS（接电源的拉上网络）。它们互补工作：当输出该为1时PMOS导通、NMOS截止；输出为0时反过来。这样就不会有直接从电源到地的通路（除了切换瞬间），所以静态功耗极低。尺寸(sizing)影响速度和功耗。

CMOS Fundamentals

NMOS vs PMOS

NMOS

ON when gate = 1 (V_GS > V_th)
Good at pulling DOWN (passing 0)
Bad at pulling up (weak 1)
In pull-down network (connected to GND)

NMOS：高电平打开，擅长拉低输出

PMOS

ON when gate = 0 (V_GS < -|V_tp|)
Good at pulling UP (passing 1)
Bad at pulling down (weak 0)
In pull-up network (connected to V_DD)

PMOS：低电平打开，擅长拉高输出

CMOS Design Rules

Pull-up network (PUN) is complement of pull-down network (PDN)
Series NMOS ↔ Parallel PMOS (AND function in PDN ↔ OR function in PUN)
Parallel NMOS ↔ Series PMOS (OR function in PDN ↔ AND function in PUN)
CMOS gates are inherently inverting (NAND, NOR, NOT — not AND, OR)

Transistor Sizing:

I_{DS} \propto \frac{W}{L} \cdot \mu \cdot C_{ox}

PMOS is ~2-3x wider than NMOS for equal drive strength (because μ_p < μ_n)

Interactive: CMOS Gate Builder

切换输入A和B，观察NMOS和PMOS晶体管的通断状态。绿色=导通(ON)，红色=截止(OFF)。看看输出如何由拉上和拉下网络决定！

Click on A or B labels to toggle inputs

Active Recall

In a CMOS NAND gate, the NMOS transistors are in _____ and PMOS transistors are in _____.

Series, Parallel

Parallel, Series

Series, Series

中文总结：逻辑努力(Logical Effort)是分析和优化门延迟的方法。每个门有"固有延迟"(parasitic delay)和"努力延迟"(effort delay)。逻辑努力g衡量门比反相器慢多少，电气努力h是负载与输入电容的比。最优设计让每级的努力相等。就像分配工作——让每个人承担相同的任务量最高效。

CMOS Delay & Logical Effort

Delay Model

Gate Delay:

d = g \cdot h + p = g \cdot \frac{C_{out}}{C_{in}} + p

Where:

g = logical effort (how much worse than inverter)
h = electrical effort = C_out / C_in
p = parasitic delay (intrinsic)
f = g · h = stage effort

Logical Effort of Common Gates

Gate	g (n=1)	g (n=2)	g (n=3)	g (n=4)	p
Inverter	1	-	-	-	1
NAND	-	4/3	5/3	6/3	n
NOR	-	5/3	7/3	9/3	n

Multi-Stage Path Optimization:

F = G \cdot B \cdot H = \prod f_i \quad \text{(path effort)}

\hat{f} = F^{1/N} \quad \text{(optimal stage effort)}

D = N \cdot F^{1/N} + P \quad \text{(minimum delay)}

G = product of logical efforts, H = C_out/C_in of path, B = branching effort, N = number of stages

Interactive: Logical Effort Calculator

输入门的参数，自动计算延迟。也可以计算多级路径的最优设计。

Single Gate Delay

Gate Type:

C_out (load capacitance, in units of C_inv):

C_in (input capacitance, in units of C_inv):

Multi-Stage Path Optimizer

Path logical effort G (product of g's):

Path electrical effort H (C_out/C_in):

Branching effort B:

Number of stages N:

Total parasitic delay P:

Practice Problem

Design a path from C_in=1 to C_out=64 using NAND2-INV-NAND2-INV. Calculate minimum delay.

G = g_NAND2 × g_INV × g_NAND2 × g_INV = (4/3)(1)(4/3)(1) = 16/9 ≈ 1.78

H = 64/1 = 64, B = 1

F = G × H × B = 1.78 × 64 = 113.8

f-hat = F^(1/4) = 113.8^0.25 ≈ 3.27

P = 2 + 1 + 2 + 1 = 6

D = 4 × 3.27 + 6 = 13.07 + 6 = 19.07 τ

最小延迟约19个tau单位。每级的最优努力(f-hat)约3.27。

中文总结：导线不是理想的——它有电阻(R)和电容(C)，会产生延迟。长导线的延迟与长度的平方成正比(RC延迟)。解决方案是插入中继器(repeater)，把长线分成短段。导线也消耗能量，在现代芯片中，导线延迟可能比门延迟更大！

Wires & Interconnect

Wire Delay Model

Lumped RC Model:

t_{wire} = 0.69 \cdot R_{wire} \cdot C_{wire}

Distributed RC (Elmore delay):

t_{wire} = 0.38 \cdot R_{wire} \cdot C_{wire} \quad \text{(for distributed RC)}

Since R and C both scale with length L:

t_{wire} \propto L^2

Repeaters

Break long wire into k segments with buffers
Delay becomes proportional to L (instead of L²)
Optimal number of repeaters minimizes total delay

Wire Energy:

E_{wire} = C_{wire} \cdot V_{DD}^2 \quad \text{(per transition)}

In modern chips, wire energy can dominate gate energy!

Active Recall

Why does wire delay scale as L² without repeaters?

中文总结：加法器是最基本的算术电路。行波进位加法器(RCA)最简单但最慢——进位要一级一级传。超前进位加法器(CLA)提前计算进位，快得多但更复杂。进位选择加法器同时计算两种情况(carry=0和carry=1)，等进位来了直接选。还有更高级的前缀加法器(Kogge-Stone, Brent-Kung)。

Adder Architectures

Full Adder (基本单元)

S = A \oplus B \oplus C_{in}

C_{out} = AB + C_{in}(A \oplus B) = AB + AC_{in} + BC_{in}

Generate: G = AB, Propagate: P = A ⊕ B

Adder Comparison

Type	Delay	Area	Key Idea
Ripple Carry	O(n)	O(n)	Chain full adders, carry ripples through
Carry Lookahead	O(log n)	O(n log n)	Pre-compute carries using G, P
Carry Select	O(√n)	O(n)	Compute both results, MUX select
Prefix (KS/BK)	O(log n)	O(n log n)	Tree structure for parallel prefix

Interactive: Adder Comparison Visualizer

选择位宽，观察行波进位加法器和超前进位加法器的进位传播速度对比！蓝色方块表示进位到达该位。

Bit width: 8

Ripple Carry (O(n))

Carry Lookahead (O(log n))

Active Recall

What is the critical path delay (in gate delays) of a 32-bit ripple carry adder?

32 gate delays

~64 gate delays (2 per bit for carry chain)

~5 gate delays (log2(32))

中文总结：乘法器比加法器复杂得多。最简单的阵列乘法器就是小学竖式乘法的硬件版——生成部分积然后加起来。Booth编码减少部分积数量，Wallace树用进位保存加法器(CSA)并行压缩部分积。除法器就更慢了——需要逐位迭代。SRT除法每次可以产生多位商。

Multipliers & Dividers

Array Multiplier

Partial products: PP_i = A × B[i], shifted left by i
For n-bit × n-bit: n partial products, each n bits
Result: 2n bits wide
Delay: O(n) through carry chain of final adder

Booth Encoding

Radix-2 Booth: examine pairs of multiplier bits to encode {-1, 0, +1}

Radix-4 (Modified Booth): examine triplets, encode {-2, -1, 0, +1, +2}

Benefit: Reduces number of partial products by half (radix-4)

Booth编码通过合并相邻位减少部分积数量，加速乘法运算

Wallace Tree

Uses Carry-Save Adders (CSA) to compress partial products
3:2 compressors (full adders) reduce 3 rows to 2
Delay: O(log n) compression stages + final CPA
Much faster than sequential addition of partial products

Binary Division

Restoring division: subtract divisor, if negative, add back (restore)
Non-restoring: don't add back, adjust in next step
SRT division: produces quotient bits from lookup table, can generate multiple bits per cycle

Interactive: Binary Multiplication Visualizer

输入两个4位数，查看部分积生成过程

A (0-15):

B (0-15):

中文总结：移位器用于数据的位移操作。桶形移位器(Barrel Shifter)可以在一个周期内完成任意位数的移位，用多级MUX实现。漏斗移位器(Funnel Shifter)把左移、右移、旋转统一处理。交叉开关(Crossbar)是通用的互连网络，任何输入可以连到任何输出。

Shifters & Crossbars

Barrel Shifter

N-bit barrel shifter uses log₂(N) stages of MUXes
Each stage shifts by 2ⁱ positions (controlled by shift amount bit i)
Example: 8-bit barrel shifter = 3 MUX stages (shift by 1, 2, 4)
Delay: O(log n) MUX delays

Funnel Shifter

Concatenate {A, B} into 2n-bit value, shift right, take lower n bits
Handles left shift, right shift, and rotate with one structure
Left shift by k = right shift by (n-k) of {input, 0...0}

Crossbar Switch

N inputs × N outputs, any permutation possible
N² switches (MUXes), expensive but fully flexible
Used in network switches, FPGA routing

Interactive: Barrel Shifter Simulator

输入一个8位值和移位量，观察每级MUX如何逐步完成移位

8-bit Input (binary):

Shift amount (0-7):

Direction:

中文总结：时序约束是数字设计中最关键的概念之一。Setup time（建立时间）：数据必须在时钟边沿之前稳定多长时间。Hold time（保持时间）：数据在时钟边沿之后必须保持稳定多长时间。违反这些约束会导致亚稳态(metastability)——输出在0和1之间"犹豫不决"，可能导致系统崩溃。

Timing Constraints

Key Timing Parameters

Setup Time Constraint (determines max frequency):

t_{clk} \geq t_{clk \to q} + t_{comb,max} + t_{setup}

Hold Time Constraint (determines correctness):

t_{clk \to q} + t_{comb,min} \geq t_{hold}

With Clock Skew:

t_{clk} \geq t_{clk \to q} + t_{comb,max} + t_{setup} + t_{skew}

t_{clk \to q} + t_{comb,min} \geq t_{hold} + t_{skew}

Metastability

When setup/hold violated, FF enters metastable state
Output is undefined for some time (MTBF = mean time between failures)
Solution for async inputs: use synchronizer (2+ flip-flops in series)

Interactive: Setup/Hold Time Visualizer

拖动数据信号的变化时间点，查看是否满足建立时间和保持时间约束。绿色=满足，红色=违反。

Data transition time (drag on canvas or use slider):

t_setup (ns):

t_hold (ns):

Practice Problem

Given: t_clk-to-q = 1ns, t_setup = 0.5ns, t_hold = 0.3ns, t_comb = 4ns. What is the minimum clock period?

t_clk ≥ t_clk-to-q + t_comb + t_setup = 1 + 4 + 0.5 = 5.5 ns

f_max = 1/5.5ns ≈ 181.8 MHz

Hold check: t_clk-to-q + t_comb,min ≥ t_hold → 1 + 0 ≥ 0.3 ✓ (assume t_comb,min ≥ 0)

最小时钟周期=1+4+0.5=5.5ns，最大频率约182MHz。保持时间约束也满足。

中文总结：SRAM（静态随机存储器）是缓存的核心。经典6T SRAM单元用6个晶体管存储1位——两个交叉耦合的反相器保持数据，两个访问晶体管连接位线。读操作会干扰存储的数据（所以要小心设计比例），写操作要用强信号覆盖原有值。感应放大器(Sense Amplifier)检测位线上的微小电压差来加速读取。

SRAM & Memory

6T SRAM Cell

2 cross-coupled inverters (4 transistors) = storage
2 access transistors (NMOS) = controlled by word line (WL)
Connected to bit line (BL) and complement bit line (BLB)

Read Operation:

Pre-charge BL and BLB to V_DD
Assert word line (WL = 1)
Cell pulls one bit line slightly lower
Sense amplifier detects small voltage difference

Write Operation:

Drive BL and BLB to desired values (one to V_DD, other to GND)
Assert word line — strong bit line drivers overpower cell

SRAM Sizing Constraints

Read stability: Cell ratio (pull-down / access) > 1 (typically ~1.5). Prevents read disturb.
Write-ability: Pull-up ratio (pull-up / access) < 1 (typically ~0.8). Allows bit line to flip cell.

Memory Hierarchy

存储器层次：寄存器 > L1缓存 > L2缓存 > L3缓存 > 主存(DRAM) > 磁盘/SSD。越靠上越快越小越贵。

Registers
~1 cycle
~1KB

L1 Cache
~2-4 cycles
~64KB

L2 Cache
~10 cycles
~256KB

DRAM
~100 cycles
~GBs

Active Recall

Why must the cell ratio (pull-down / access transistor width) be > 1 in 6T SRAM?

中文总结：功耗是现代芯片设计的头号问题。动态功耗来自电容充放电(开关活动)，与频率和电压的平方成正比。静态功耗来自漏电流，即使不切换也在消耗。DVFS（动态电压频率调节）降低电压和频率来省电。时钟门控(Clock Gating)关闭不需要的模块时钟来减少无用切换。

Power & Energy

Dynamic Power (switching):

P_{dynamic} = \alpha \cdot C \cdot V_{DD}^2 \cdot f

α = activity factor, C = capacitance, f = frequency

Static Power (leakage):

P_{static} = I_{leak} \cdot V_{DD}

Total Power:

P_{total} = P_{dynamic} + P_{static} + P_{short-circuit}

Power Reduction Techniques

DVFS: Reduce V_DD and f together. Power drops as V_DD²!
Clock Gating: Gate the clock to idle modules. Eliminates switching power.
Power Gating: Cut off V_DD to idle blocks. Eliminates leakage too.
Multi-V_th: Use high-V_th (low leakage) for non-critical paths.

Interactive: Power Calculator

调整参数，观察动态功耗和静态功耗如何变化

V_DD (V): 1.0

Frequency (GHz): 1.0

Activity factor α: 0.3

Capacitance (pF): 100

中文总结：FPGA和ASIC是两种实现数字电路的方式。FPGA（现场可编程门阵列）像乐高积木——通过编程配置LUT（查找表）和连线来实现任何功能，可以反复修改，但速度慢、功耗高、面积大。ASIC（专用集成电路）是定制芯片——性能最优但不能改，设计周期长、成本高。我们151的项目就是先在FPGA上验证，真正的芯片用ASIC流程。

FPGA vs ASIC

FPGA Architecture

LUT (Look-Up Table): k-input truth table stored in SRAM. A k-LUT can implement ANY k-input Boolean function.
CLB (Configurable Logic Block): Contains LUTs + flip-flops + MUXes + carry chain
Routing: Programmable interconnect (switch boxes, connection boxes)
I/O Blocks: Interface with external world
Block RAM: Dedicated SRAM blocks
DSP Slices: Hardened multiply-accumulate units

FPGA vs ASIC Comparison

	FPGA	ASIC
Reconfigurable	Yes	No (fixed)
Design Time	Days-Weeks	Months-Years
NRE Cost	Low ($100s)	Very High ($Ms)
Per-unit Cost	High	Low at volume
Performance	~3-10x slower	Fastest
Power	~10x more	Most efficient
Area	~20-40x larger	Smallest

ASIC Design Flow

→ RTL Design → Synthesis → Place & Route → Timing Closure → DRC/LVS → Tapeout → Fabrication

Interactive: LUT Truth Table Builder

设置一个3-input LUT的真值表，查看它实现的布尔函数。FPGA中的每个LUT就是这样工作的！

A	B	C
0	0	0
0	0	1
0	1	0
0	1	1
1	0	0
1	0	1
1	1	0
1	1	1

Function: F = 0 (constant zero)

SRAM contents: 00000000

Active Recall

A 4-input LUT can implement how many different Boolean functions?

16

65536 (2^16)

4