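Translation: [Translation] Implementing a Virtual Machine in 400 Lines of C (2018) (arthurchiao.art)
Original: Write your Own Virtual Machine (jmeiners.com)
[Translation] Implementing a Virtual Machine in 400 Lines of C (2018)
Translator's preface
This article is a translation of the 2018 English blog post Write your Own Virtual Machine. It shows how to implement, in roughly 400 lines of C, a virtual machine that can run standard LC-3 assembly programs.
Note that "virtual machine" here means an interpreter-style VM such as the JVM or Erlang's BEAM, not the kind of VM that VirtualBox or VMware boots with a complete operating system.
The translator's ability is limited, so omissions and errors are inevitable. When in doubt, consult the original.
The translation follows.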
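1. Introduction
In this article I will teach you how to write your own virtual machine (VM) that can run assembly language programs, such as my friend's 2048 or my own Roguelike. If you know how to program but want a deeper understanding of what goes on inside a computer and how programming languages work, this article is for you. Writing a VM from scratch may sound daunting, but by the end you will be surprised at how simple it is, and you will learn a lot along the way.
The finished VM is about 400 lines of C. To follow the code you only need basic C/C++ and binary arithmetic. The VM runs on Unix systems (including macOS). It contains a small amount of platform-specific code for configuring the terminal and display, but that code is not central to the project. (Contributions adding Windows support are welcome.)
Note: This VM is a product of literate programming. The article explains each piece of code in turn, and the final implementation is simply those pieces joined together.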
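What is a virtual machine?
A VM is a program that acts like a computer. It simulates a CPU along with a few other hardware components, so it can perform arithmetic, read and write memory, and interact with I/O devices. Most importantly, it understands a machine language, which you can use to program it.
How much hardware a VM simulates depends on its purpose. Some VMs are designed to reproduce a particular computing device, as video game emulators do. NES consoles are no longer common, but we can still play NES games on an NES hardware emulator. Such emulators must faithfully recreate every detail and every major component of the original hardware.
Other VMs are entirely made up and do not imitate any real hardware. Their main purpose is to make software development easier. For example, to build a program that runs on several computer architectures, you do not have to rewrite it in each architecture's assembly dialect; you write it once in the assembly language of a cross-platform VM.
Note: A compiler solves a similar portability problem by compiling a standard high-level language into programs for different CPU architectures. A VM instead defines one standard CPU architecture and simulates it on different physical devices. The advantage of a compiler is that it adds no runtime overhead, but writing a compiler that targets many platforms is very difficult, whereas writing a VM is much simpler. In practice, VMs and compilers are mixed according to need, because they work at different levels.
The Java Virtual Machine (JVM) is a very successful example. The JVM itself is a moderately sized program that a single programmer can understand, which has made it easy to port to thousands of kinds of devices, including phones. Once the JVM is implemented on a device, any Java, Kotlin, or Clojure program runs on it without modification. The only cost is the overhead of the VM itself and the extra layer of abstraction above the machine. Most of the time, that is perfectly acceptable.
A VM does not have to be large or general-purpose. Old video games often used small VMs to provide simple scripting systems.
VMs are also useful for executing code in a secure or isolated environment. Garbage collection (GC) is one example. It is not easy to implement automatic garbage collection on top of C or C++, because a program cannot see its own stack or variables. A VM, however, sits "outside" the program it runs, so it can see every memory reference on the stack.
Another example is Ethereum smart contracts. Smart contracts are small programs executed by the validating nodes of a blockchain network. Node operators therefore have to run code written by strangers on their own machines, with no chance to review it beforehand. To keep a contract from doing anything malicious, it is run inside a VM that has no access to the file system, network, disk, and so on. Ethereum also shows off the portability of VMs: because Ethereum nodes run on many kinds of computers and operating systems, using a VM lets smart contracts be written without regard to the platform they will run on.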
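2. LC-3 architecture
Our VM will simulate a fictional computer called the LC-3. The LC-3 is popular in schools for teaching students how to program in assembly. Its instruction set is much simpler than x86, but it contains all the main ideas of modern CPUs.
First we need to simulate the most basic hardware components of the machine. Try to understand what each component does, and do not worry if you cannot yet fit them into a complete picture.
2.1 Memory
The LC-3 has 65,536 memory locations (2^16, the number of addresses a 16-bit unsigned integer can express), each of which stores a 16-bit value. That gives a total of 128 KB (64K * 2 bytes), far less than the computers we normally use! In our program this memory is stored in a simple array: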
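/* UINT16_MAX is 65535, so the array needs 1 << 16 entries to cover every address */
#define MEMORY_MAX (1 << 16)
uint16_t memory[MEMORY_MAX]; /* 65536 locations */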
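2.2 Registers
A register is a slot on the CPU that stores a single value. Registers are like the CPU's "workbench": for the CPU to work on a piece of data, the data must first be placed in a register. Because there are only a few registers, only a small amount of data can be loaded at any one time. Programs work around this by loading values from memory into registers, computing results into other registers, and finally writing the results back to memory.
The LC-3 has 10 registers in total, each 16 bits wide. Most are general purpose; a few have designated roles.
- 8 general purpose registers (R0-R7)
- 1 program counter (PC) register
- 1 condition flags (COND) register
The general purpose registers can be used for any program calculation. The program counter (PC) is an unsigned integer holding the memory address of the next instruction to execute. The condition flags register records the sign of the previous calculation's result.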
enum {
R_R0 = 0,
R_R1,
R_R2,
R_R3,
R_R4,
R_R5,
R_R6,
R_R7,
R_PC, /* program counter */
R_COND,
R_COUNT
};
Just like the memory, we store the registers in an array:
uint16_t reg[R_COUNT];
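2.3 Instruction set
An instruction is a command that tells the CPU to perform some task, such as adding two numbers. An instruction has two parts:
- an opcode, which indicates the kind of task
- the parameters the task needs
Each opcode represents one task the CPU "knows" how to do. The LC-3 has only 16 opcodes, and everything the computer can calculate is some sequence of these simple instructions. Each instruction is 16 bits long: the left-most 4 bits store the opcode and the remaining 12 bits store the parameters.
We will discuss what each instruction does later. For now, define the following opcodes, and make sure they stay in this order so that each one receives the correct enum value: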
enum {
OP_BR = 0, /* branch */
OP_ADD, /* add */
OP_LD, /* load */
OP_ST, /* store */
OP_JSR, /* jump register */
OP_AND, /* bitwise and */
OP_LDR, /* load register */
OP_STR, /* store register */
OP_RTI, /* unused */
OP_NOT, /* bitwise not */
OP_LDI, /* load indirect */
OP_STI, /* store indirect */
OP_JMP, /* jump */
OP_RES, /* reserved (unused) */
OP_LEA, /* load effective address */
OP_TRAP /* execute trap */
};
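Note: The Intel x86 architecture has hundreds of instructions, while others such as ARM and LC-3 have very few. Small instruction sets are called RISC (reduced instruction set computer) and larger ones CISC (complex instruction set computer). A larger instruction set usually adds no fundamentally new capability; it just makes assembly more convenient to write. One CISC instruction may do the work of several RISC instructions. On the other hand, CISC processors are more complex and more expensive for engineers to design and manufacture. Tradeoffs like this one keep instruction set design changing over time.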
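To make the 16-bit layout concrete, here is a minimal sketch of pulling fields out of an instruction word. The helper names decode_opcode and decode_field are assumptions made for this illustration; they are not part of the VM's code, which extracts fields inline.

```c
#include <stdint.h>

/* Hypothetical helpers (not from the article's VM) that pull fields
   out of a 16-bit LC-3 instruction word. */

/* The opcode lives in the top 4 bits, [15:12]. */
uint16_t decode_opcode(uint16_t instr) {
    return (uint16_t)(instr >> 12);
}

/* Extract `width` bits starting at bit `low` (bit 0 is the right-most). */
uint16_t decode_field(uint16_t instr, int low, int width) {
    return (uint16_t)((instr >> low) & ((1u << width) - 1u));
}
```

For example, the word 0x1021 has 0001 in its top four bits, so decode_opcode returns 1, which is OP_ADD in the enum above.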
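2.4 Condition flags
The R_COND register stores condition flags, which record information about the most recently executed calculation. This lets programs check logical conditions such as if (x > 0) { ... }.
Every CPU has a variety of condition flags to signal different situations. The LC-3 uses only 3 condition flags, which indicate the sign of the previous calculation's result: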
enum {
FL_POS = 1 << 0, /* P */
FL_ZRO = 1 << 1, /* Z */
FL_NEG = 1 << 2, /* N */
};
Note: << and >> are the bit shift operators. For example, 1 << 2 equals 4.
That completes the simulation of the VM's hardware components.
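3. Assembly examples
Let's look at an LC-3 assembly program to get a feel for what the VM actually runs. You do not need to know how to write assembly or understand how it works yet; just get a general impression. Here is a "Hello World" example: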
.ORIG x3000                        ; this is the address in memory where the program will be loaded
LEA R0, HELLO_STR                  ; load the address of the HELLO_STR string into R0
PUTs                               ; output the string pointed to by R0 to the console
HALT                               ; halt the program
HELLO_STR .STRINGZ "Hello World!"  ; store this string here in the program
.END                               ; mark the end of the file
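As in C, the program starts at the top and executes one statement at a time. Unlike C, there are no nested scopes {} or control structures such as if and while; there is only a flat list of statements. That makes the program much easier to execute.
Notice that some of the statements have names that match the opcodes we defined earlier. We said that every instruction is 16 bits, yet each line of the assembly program has a different number of characters. How can that be?
It is because the assembly is written in a human-readable, human-writable form, as plain text. A tool called an assembler converts each textual instruction into a 16-bit binary instruction that the VM understands. This binary form is called machine code; it is what the VM actually runs, and it is essentially an array of 16-bit instructions.
Note: Although a compiler and an assembler play similar roles in development, they are different tools. An assembler simply encodes the text the programmer wrote into binary, replacing symbols with their binary representation and packing them into instructions.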
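As a rough illustration of what "packing into instructions" means, the sketch below encodes one immediate-mode ADD by hand, using the bit layout described in the ADD section later in this article. The function encode_add_imm is a hypothetical helper written for this example; a real assembler also parses text and resolves labels, which is not shown.

```c
#include <stdint.h>

/* Hypothetical helper: encode "ADD DR, SR1, imm5" (immediate mode)
   into a single 16-bit LC-3 instruction word. */
uint16_t encode_add_imm(unsigned dr, unsigned sr1, int imm5) {
    return (uint16_t)((0x1u << 12)                /* opcode 0001 = ADD   */
                    | ((dr & 0x7u) << 9)          /* destination register */
                    | ((sr1 & 0x7u) << 6)         /* first source register */
                    | (1u << 5)                   /* immediate-mode flag  */
                    | ((unsigned)imm5 & 0x1Fu));  /* 5-bit two's complement */
}
```

Under these assumptions, the text "ADD R0, R0, 1" becomes the word 0x1021 and "ADD R1, R0, -10" becomes 0x1236.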
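.ORIG and .STRINGZ look like instructions but are not. They are assembler directives, which generate a piece of code or data. For example, .STRINGZ inserts a string at the place where it appears.
Loops and conditions are implemented with a goto-like instruction. Here is an example that counts to 10: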
AND R0, R0, 0      ; clear R0
LOOP               ; label at the top of our loop
ADD R0, R0, 1      ; add 1 to R0 and store back in R0
ADD R1, R0, -10    ; subtract 10 from R0 and store back in R1
BRn LOOP           ; go back to LOOP if the result was negative
...                ; R0 is now 10!
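Note: You do not need to be able to write assembly to follow this article. If you are interested, though, you can use the LC-3 tools to write and assemble your own programs.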
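For readers more comfortable with C, the counting loop can be mirrored almost line for line with goto. This is only a sketch for intuition: the function name count_to_ten is made up here, and the registers are modeled as local variables.

```c
#include <stdint.h>

/* Mirrors the assembly loop: registers become local variables
   and BRn becomes a conditional goto. */
uint16_t count_to_ten(void) {
    uint16_t r0 = 0;                   /* AND R0, R0, 0   */
    int16_t r1;
loop:
    r0 = (uint16_t)(r0 + 1);           /* ADD R0, R0, 1   */
    r1 = (int16_t)(r0 - 10);           /* ADD R1, R0, -10 */
    if (r1 < 0) goto loop;             /* BRn LOOP        */
    return r0;                         /* R0 is now 10    */
}
```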
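4. Executing programs
The examples above are meant to give you an intuition for what the VM does. You do not need to be fluent in assembly to write a VM: as long as it follows the correct procedure for reading and executing instructions, any LC-3 program will run correctly, however complicated it is. In theory, such a VM could even run a web browser or an operating system like Linux.
If you think about this property deeply, it is a philosophically remarkable idea. Programs can do all kinds of intelligent things, some of which we can hardly imagine, yet every one of them is ultimately executed with the handful of instructions we are about to write. We both understand and do not understand what happens when a program runs. Turing explored this striking idea: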
“The view that machines cannot give rise to surprises is due, I believe, to a fallacy to which philosophers and mathematicians are particularly subject. This is the assumption that as soon as a fact is presented to a mind all consequences of that fact spring into the mind simultaneously with it. It is a very useful assumption under many circumstances, but one too easily forgets that it is false.” — Alan M. Turing
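Procedure
The procedure we will write is as follows:
1. Load one instruction from memory at the address in the PC register.
2. Increment the PC register.
3. Look at the opcode field to determine the instruction type.
4. Execute the instruction using the parameters it carries.
5. Go back to step 1.
You may wonder: "If the loop keeps incrementing the PC and there is no if or while, won't execution quickly run off the end of memory?" It will not. As mentioned earlier, goto-like instructions change the flow of execution by modifying the PC.
Here is an outline of this procedure in code: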
int main(int argc, const char* argv[]) {
{Load Arguments, 12}
{Setup, 12}
/* set the PC to starting position */
enum { PC_START = 0x3000 }; /* 0x3000 is the default */
reg[R_PC] = PC_START;
int running = 1;
while (running) {
uint16_t instr = mem_read(reg[R_PC]++); /* FETCH */
uint16_t op = instr >> 12;
switch (op) {
case OP_ADD: {ADD, 6} break;
case OP_AND: {AND, 7} break;
case OP_NOT: {NOT, 7} break;
case OP_BR: {BR, 7} break;
case OP_JMP: {JMP, 7} break;
case OP_JSR: {JSR, 7} break;
case OP_LD: {LD, 7} break;
case OP_LDI: {LDI, 6} break;
case OP_LDR: {LDR, 7} break;
case OP_LEA: {LEA, 7} break;
case OP_ST: {ST, 7} break;
case OP_STI: {STI, 7} break;
case OP_STR: {STR, 7} break;
case OP_TRAP: {TRAP, 8} break;
case OP_RES:
case OP_RTI:
default:
{BAD OPCODE, 7}
break;
}
}
{Shutdown, 12}
}
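5. Implementing instructions
What remains is to implement each instruction correctly. Each instruction is described in detail in the PDF specification included in the GitHub repo, and you will implement the instructions yourself by following it. This is easier than it sounds. I will walk through two of them as examples; the rest are in the next chapter.
5.1 ADD
The ADD instruction adds two numbers and stores the result in a register. Its specification is on page 526. ADD is encoded as follows:
[Figure: the two ADD encodings, register mode and immediate mode]
Two diagrams are given because ADD has two different "modes". Before explaining the modes, look at what the two diagrams have in common:
- Both start with the 4 bits 0001, the opcode for OP_ADD.
- The next 3 bits are DR (destination register), where the sum is stored.
- The next 3 bits are SR1, the register holding the first number to add.
So we know where the result goes and what the first number is. All we need now is the second number, and this is where the two modes differ. Look at bit 5: this flag indicates whether the instruction is in immediate mode or register mode. In register mode the second number is stored in a register, just like the first. That register is called SR2 and is encoded in bits 0-2. Bits 3 and 4 are unused. In assembly this looks like: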
ADD R2 R0 R1 ; add the contents of R0 to R1 and store in R2.
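In immediate mode the second number is embedded in the instruction itself rather than stored in a register. This is convenient because the program needs no extra instruction to load the number from memory into a register; the value comes straight from the instruction. The tradeoff is that the number must be small: the field has only 5 bits, so it can hold 2^5 = 32 distinct values (read as a signed field, -16 to 15). Immediate mode is well suited to incrementing a value. In assembly:
ADD R0 R0 1 ; add 1 to R0 and store back in R0
The LC-3 specification explains: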
If bit [5] is 0, the second source operand is obtained from SR2. If bit [5] is 1, the second source operand is obtained by sign-extending the imm5 field to 16 bits. In both cases, the second source operand is added to the contents of SR1 and the result stored in DR. (Pg. 526)
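This is what we just discussed. But what is "sign-extending"? The immediate value has only 5 bits, yet it must be added to a 16-bit number, so the 5-bit value has to be extended to 16 bits to match. For positive numbers we can simply pad with 0s and the value is unchanged. For negative numbers this causes a problem. For example, -1 in 5 bits is 11111. If we padded it with 0s, the result 0000 0000 0001 1111 would equal 31! Sign extension fixes this by padding with 0s for positive numbers and 1s for negative numbers.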
uint16_t sign_extend(uint16_t x, int bit_count) {
if ((x >> (bit_count - 1)) & 1) {
x |= (0xFFFF << bit_count);
}
return x;
}
Note: If you are interested in how negative numbers are represented in binary, read up on two's complement. For this article, knowing how to sign extend is enough.
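A few worked values may help here. The sketch below repeats the sign_extend logic from above so that it compiles on its own, and adds a helper, as_signed, that is an assumption of this example (it is not in the VM) for viewing a 16-bit pattern as a signed number.

```c
#include <stdint.h>

/* Same logic as the article's sign_extend, repeated so this compiles alone. */
uint16_t sign_extend(uint16_t x, int bit_count) {
    if ((x >> (bit_count - 1)) & 1) {
        x |= (uint16_t)(0xFFFFu << bit_count); /* fill the upper bits with 1s */
    }
    return x;
}

/* Hypothetical helper: interpret a 16-bit pattern as a two's complement value. */
int as_signed(uint16_t x) {
    return (x & 0x8000) ? (int)x - 0x10000 : (int)x;
}
```

With these definitions, the 5-bit pattern 11111 extends to 0xFFFF, which as_signed reads as -1, while 01111 stays 0x000F (15).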
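The specification adds one more sentence:
The condition codes are set, based on whether the result is negative, zero, or positive. (Pg. 526)
The condition flags enum we defined earlier now comes into play. Any time a value is written to a register, we need to update the flags to indicate its sign. For convenience, we write a function to do this: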
void update_flags(uint16_t r) {
if (reg[r] == 0) {
reg[R_COND] = FL_ZRO;
}
else if (reg[r] >> 15) { /* a 1 in the left-most bit indicates negative */
reg[R_COND] = FL_NEG;
} else {
reg[R_COND] = FL_POS;
}
}
Now we can implement the logic for ADD:
{
uint16_t r0 = (instr >> 9) & 0x7; /* destination register (DR) */
uint16_t r1 = (instr >> 6) & 0x7; /* first operand (SR1) */
uint16_t imm_flag = (instr >> 5) & 0x1; /* whether we are in immediate mode */
if (imm_flag) {
uint16_t imm5 = sign_extend(instr & 0x1F, 5);
reg[r0] = reg[r1] + imm5;
} else {
uint16_t r2 = instr & 0x7;
reg[r0] = reg[r1] + reg[r2];
}
update_flags(r0);
}
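This section contained a lot of information, so let's summarize:
- ADD takes two values and stores the result in a register.
- In register mode, the second value is stored in a register.
- In immediate mode, the second value is embedded in the right-most 5 bits of the instruction.
- Values shorter than 16 bits need to be sign extended.
- Any time an instruction modifies a register, the condition flags need to be updated.
That is ADD. You may feel that implementing 15 more instructions this way will be tedious. The good news is that the functions above are largely reusable, because most of the other instructions combine sign extension, the different modes, and updating the flags.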
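To see the pieces of this section working together, here is a small stand-alone sketch that runs ADD instructions against a register array. It reuses the article's reg, sign_extend and update_flags; the wrapper exec_add is an assumption of this example, since in the VM the same logic sits directly inside the switch.

```c
#include <stdint.h>

enum { R_R0 = 0, R_R1, R_R2, R_R3, R_R4, R_R5, R_R6, R_R7,
       R_PC, R_COND, R_COUNT };
enum { FL_POS = 1 << 0, FL_ZRO = 1 << 1, FL_NEG = 1 << 2 };

uint16_t reg[R_COUNT];

uint16_t sign_extend(uint16_t x, int bit_count) {
    if ((x >> (bit_count - 1)) & 1) {
        x |= (uint16_t)(0xFFFFu << bit_count);
    }
    return x;
}

void update_flags(uint16_t r) {
    if (reg[r] == 0) {
        reg[R_COND] = FL_ZRO;
    } else if (reg[r] >> 15) { /* a 1 in the left-most bit means negative */
        reg[R_COND] = FL_NEG;
    } else {
        reg[R_COND] = FL_POS;
    }
}

/* Hypothetical wrapper: execute one ADD instruction word. */
void exec_add(uint16_t instr) {
    uint16_t r0 = (instr >> 9) & 0x7;       /* DR  */
    uint16_t r1 = (instr >> 6) & 0x7;       /* SR1 */
    if ((instr >> 5) & 0x1) {               /* immediate mode */
        uint16_t imm5 = sign_extend(instr & 0x1F, 5);
        reg[r0] = (uint16_t)(reg[r1] + imm5);
    } else {                                /* register mode */
        uint16_t r2 = instr & 0x7;
        reg[r0] = (uint16_t)(reg[r1] + reg[r2]);
    }
    update_flags(r0);
}
```

For instance, with R1 holding 5, the word 0x107A (ADD R0, R1, -6) leaves 0xFFFF in R0 and sets the negative flag.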
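5.2 LDI
LDI stands for "load indirect". It loads a value from memory into a register; the specification is on page 532. The binary layout of LDI is as follows:
[Figure: the LDI encoding]
Compared with ADD, LDI has only one mode and fewer parameters. Its opcode is 1010, which corresponds to the OP_LDI enum value. Like ADD, it contains a 3-bit DR (destination register) that receives the loaded value. The remaining bits form the PCoffset9 field, an immediate value embedded in the instruction, similar to imm5. Since this instruction loads from memory, we can guess that PCoffset9 is some kind of address that tells us where to load from. The LC-3 specification gives more detail:
An address is computed by sign-extending bits [8:0] to 16 bits and adding this value to the incremented PC. What is stored in memory at this address is the address of the data to be loaded into DR. (Pg. 532)
As before, we need to sign extend the 9-bit PCoffset9 to 16 bits, but this time we add it to the current PC. (If you look back at the while loop, the PC was incremented right after this instruction was fetched.) The sum is an address in memory; the value stored at that address is another address, and that second address holds the value to load into DR.
This may seem like a roundabout way to read memory, but it is indispensable. The LD instruction can only reach addresses within a 9-bit offset of the PC, whereas the full address space is 16 bits. LDI is useful for loading values stored far from the current PC, but to use it, the address of the final location must be stored somewhere near the PC. You can think of it as a local variable in C that is a pointer to some data: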
// the value of far_data is an address
// of course far_data itself (the location in memory containing the address) has an address
char* far_data = "apple";
// In memory it may be laid out like this:
// Address Label      Value
// 0x123:  far_data = 0x456
// ...
// 0x456:  string   = 'a'
// if PC was at 0x100
// LDI R0 0x023
// would load 'a' into R0
As with ADD, the condition flags must be updated after the value is placed in DR:
The condition codes are set based on whether the value loaded is negative, zero, or positive. (Pg. 532)
Here is my implementation of LDI (mem_read is covered in a later section):
{
uint16_t r0 = (instr >> 9) & 0x7; /* destination register (DR) */
uint16_t pc_offset = sign_extend(instr & 0x1ff, 9); /* PCoffset 9*/
/* add pc_offset to the current PC, look at that memory location to get the final address */
reg[r0] = mem_read(mem_read(reg[R_PC] + pc_offset));
update_flags(r0);
}
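The double lookup is easier to follow with concrete addresses. The sketch below replays the far_data example against a plain memory array. The function load_indirect and its parameters are assumptions of this example, and it reads the array directly instead of going through mem_read, so memory mapped registers are ignored.

```c
#include <stdint.h>

#define MEMORY_MAX (1 << 16)
uint16_t memory[MEMORY_MAX];

uint16_t sign_extend(uint16_t x, int bit_count) {
    if ((x >> (bit_count - 1)) & 1) {
        x |= (uint16_t)(0xFFFFu << bit_count);
    }
    return x;
}

/* Hypothetical stand-alone LDI: `pc` is the already-incremented PC and
   `offset9` is the raw 9-bit PCoffset9 field from the instruction. */
uint16_t load_indirect(uint16_t pc, uint16_t offset9) {
    uint16_t pointer_addr = (uint16_t)(pc + sign_extend(offset9 & 0x1FF, 9));
    uint16_t data_addr = memory[pointer_addr]; /* first read: an address */
    return memory[data_addr];                  /* second read: the data  */
}
```

With memory[0x123] = 0x456 and memory[0x456] = 'a', calling load_indirect(0x100, 0x023) follows 0x100 + 0x23 = 0x123 to 0x456 and returns 'a'.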
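As you will see, most of the helper functions are reused across these instruction implementations.
With these two examples in hand, you can implement the remaining instructions the same way. Note that two opcodes are not used in this article: OP_RTI and OP_RES. You can ignore them, or report an error if they are executed. Once you have filled in the unimplemented cases of the switch in main(), the core of your VM is done!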
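6. Reference implementation of all instructions
This section gives an implementation of every instruction. If you get stuck on your own, you can refer to the versions here.
6.1 RTI & RES
These two opcodes are not used in this article.
abort();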
6.2 Bitwise and
{
    uint16_t r0 = (instr >> 9) & 0x7;
    uint16_t r1 = (instr >> 6) & 0x7;
    uint16_t imm_flag = (instr >> 5) & 0x1;
    if (imm_flag) {
        uint16_t imm5 = sign_extend(instr & 0x1F, 5);
        reg[r0] = reg[r1] & imm5;
    } else {
        uint16_t r2 = instr & 0x7;
        reg[r0] = reg[r1] & reg[r2];
    }
    update_flags(r0);
}
6.3 Bitwise not
{
    uint16_t r0 = (instr >> 9) & 0x7;
    uint16_t r1 = (instr >> 6) & 0x7;
    reg[r0] = ~reg[r1];
    update_flags(r0);
}
6.4 Branch
{
    uint16_t pc_offset = sign_extend(instr & 0x1ff, 9);
    uint16_t cond_flag = (instr >> 9) & 0x7;
    if (cond_flag & reg[R_COND]) {
        reg[R_PC] += pc_offset;
    }
}
6.5 Jump
RET is listed as a separate instruction in the specification because it is a separate keyword in assembly. However, it is really a special case of JMP: RET happens whenever R1 is 7.
{
    /* Also handles RET */
    uint16_t r1 = (instr >> 6) & 0x7;
    reg[R_PC] = reg[r1];
}
6.6 Jump Register
{
    uint16_t long_flag = (instr >> 11) & 1;
    uint16_t return_addr = reg[R_PC];
    if (long_flag) {
        uint16_t long_pc_offset = sign_extend(instr & 0x7ff, 11);
        reg[R_PC] += long_pc_offset; /* JSR */
    } else {
        uint16_t r1 = (instr >> 6) & 0x7;
        reg[R_PC] = reg[r1]; /* JSRR: read the base register before R7 is overwritten */
    }
    reg[R_R7] = return_addr;
}
6.7 Load
{
    uint16_t r0 = (instr >> 9) & 0x7;
    uint16_t pc_offset = sign_extend(instr & 0x1ff, 9);
    reg[r0] = mem_read(reg[R_PC] + pc_offset);
    update_flags(r0);
}
6.8 Load Register
{
    uint16_t r0 = (instr >> 9) & 0x7;
    uint16_t r1 = (instr >> 6) & 0x7;
    uint16_t offset = sign_extend(instr & 0x3F, 6);
    reg[r0] = mem_read(reg[r1] + offset);
    update_flags(r0);
}
6.9 Load Effective Address
{
    uint16_t r0 = (instr >> 9) & 0x7;
    uint16_t pc_offset = sign_extend(instr & 0x1ff, 9);
    reg[r0] = reg[R_PC] + pc_offset;
    update_flags(r0);
}
6.10 Store
{
    uint16_t r0 = (instr >> 9) & 0x7;
    uint16_t pc_offset = sign_extend(instr & 0x1ff, 9);
    mem_write(reg[R_PC] + pc_offset, reg[r0]);
}
6.11 Store Indirect
{
    uint16_t r0 = (instr >> 9) & 0x7;
    uint16_t pc_offset = sign_extend(instr & 0x1ff, 9);
    mem_write(mem_read(reg[R_PC] + pc_offset), reg[r0]);
}
6.12 Store Register
{
    uint16_t r0 = (instr >> 9) & 0x7;
    uint16_t r1 = (instr >> 6) & 0x7;
    uint16_t offset = sign_extend(instr & 0x3F, 6);
    mem_write(reg[r1] + offset, reg[r0]);
}
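7. Trap routines
The LC-3 provides a few predefined routines for performing common tasks and interacting with I/O devices, for example reading input from the keyboard and displaying strings on the console. These are called trap routines; you can think of them as the operating system or API of the LC-3. Each trap routine is assigned a trap code that identifies it. To execute one, the TRAP instruction is called with the trap code of the desired routine.
Define all the trap codes: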
enum {
TRAP_GETC = 0x20, /* get character from keyboard, not echoed onto the terminal */
TRAP_OUT = 0x21, /* output a character */
TRAP_PUTS = 0x22, /* output a word string */
TRAP_IN = 0x23, /* get character from keyboard, echoed onto the terminal */
TRAP_PUTSP = 0x24, /* output a byte string */
TRAP_HALT = 0x25 /* halt the program */
};
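You may wonder why the trap codes are not included in the instruction set. It is because they add no new functionality to the LC-3; they only provide a convenient way to perform a task (similar to system functions in C). In the official LC-3 simulator, trap routines are written in assembly. When a trap code is called, the PC moves to that code's address, the CPU executes the routine's instructions, and when the routine finishes the PC is reset to the location following the trap call.
Note: This is why programs start at address 0x3000 instead of 0x0. The lower addresses are deliberately left free for the trap routine code.
The specification defines only what trap routines do, not how they must be implemented. In our VM we will write them in C: when a trap code is invoked, a C function is called, and when it returns, execution continues with the original instruction stream.
Although trap routines can be written in assembly, and a physical LC-3 computer would do so, that is not the best fit for a VM. Instead of writing our own primitive I/O routines, we can use the ones the operating system already provides. This makes the VM run better, simplifies the code, and provides a higher-level abstraction that is easier to port.
Note: Getting input from the keyboard is one example. The assembly version uses a loop to keep checking the keyboard for input, which burns a lot of CPU time for nothing! With a suitable OS input function, the program can sleep until input arrives.
TRAP handling logic: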
switch (instr & 0xFF) {
case TRAP_GETC: {TRAP GETC, 9} break;
case TRAP_OUT: {TRAP OUT, 9} break;
case TRAP_PUTS: {TRAP PUTS, 8} break;
case TRAP_IN: {TRAP IN, 9} break;
case TRAP_PUTSP: {TRAP PUTSP, 9} break;
case TRAP_HALT: {TRAP HALT, 9} break;
}
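As with the instructions, I will show how to implement one trap routine and leave the rest to you.
7.1 PUTS
The PUTS trap code outputs a null-terminated string (similar to printf in C). The specification is on page 543.
To display a string, we put the address of its first character in R0 and then trigger the trap. The specification says:
Write a string of ASCII characters to the console display. The characters are contained in consecutive memory locations, one character per memory location, starting with the address specified in R0. Writing terminates with the occurrence of x0000 in a memory location. (Pg. 543)
In other words, the string is stored in consecutive memory locations. Note that this differs from a C string: in C each character takes one byte, whereas LC-3 memory is addressed in 16-bit words, so each character occupies a full 16-bit word (two bytes). To print the characters with a C function, each value must first be converted to char and then output: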
{
/* one char per word */
uint16_t* c = memory + reg[R_R0];
while (*c) {
putc((char)*c, stdout);
++c;
}
fflush(stdout);
}
That is the PUTS trap routine. If you are familiar with C, it should be easy to follow. You can now go back to the LC-3 specification and implement the other trap routines yourself.
8. Trap routine reference implementation
This section gives a reference implementation of every trap routine.
8.1 Input Character
/* read a single ASCII char */
reg[R_R0] = (uint16_t)getchar();
8.2 Output Character
putc((char)reg[R_R0], stdout);
fflush(stdout);
8.3 Prompt for Input Character
printf("Enter a character: ");
char c = getchar();
putc(c, stdout);
reg[R_R0] = (uint16_t)c;
8.4 Output String
{
/* one char per byte (two bytes per word) here we need to swap back to
big endian format */
uint16_t* c = memory + reg[R_R0];
while (*c) {
char char1 = (*c) & 0xFF;
putc(char1, stdout);
char char2 = (*c) >> 8;
if (char2) putc(char2, stdout);
++c;
}
fflush(stdout);
}
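The packed layout used by this routine (two characters per word, low byte first) can be tried out without a terminal. The sketch below unpacks such a string into an ordinary C buffer; unpack_packed_string is a hypothetical helper for this example and is not part of the VM.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: unpack a PUTSP-style string (two characters per
   16-bit word, low byte first, terminated by a zero word) into a
   NUL-terminated C string. Returns the number of characters written,
   not counting the NUL, and never writes more than `cap` bytes. */
size_t unpack_packed_string(const uint16_t* words, char* out, size_t cap) {
    size_t n = 0;
    if (cap == 0) return 0;
    while (*words && n + 1 < cap) {
        char lo = (char)(*words & 0xFF);
        char hi = (char)(*words >> 8);
        out[n++] = lo;
        if (hi && n + 1 < cap) out[n++] = hi; /* an odd-length string has hi == 0 */
        ++words;
    }
    out[n] = '\0';
    return n;
}
```

For example, the words 0x6548, 0x006C, 0x0000 hold 'H' and 'e' in the first word and 'l' alone in the second, so they unpack to "Hel".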
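8.5 Halt Program
puts("HALT");
fflush(stdout);
running = 0;
9. Loading programs
We have talked a lot about loading and executing instructions from memory, but how do the instructions get into memory in the first place? When an assembly program is converted to machine code, the result is a file containing an array of instructions and data. Loading it is just a matter of copying the file's contents into memory.
The first 16 bits of the program file specify the address in memory where the program should start. This address is called the origin. It must be read first, after which the rest of the instructions and data are read and placed in memory starting at the origin.
Here is the code for reading an LC-3 program into memory: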
void read_image_file(FILE* file) {
uint16_t origin; /* the origin tells us where in memory to place the image */
fread(&origin, sizeof(origin), 1, file);
origin = swap16(origin);
/* we know the maximum file size so we only need one fread */
uint16_t max_read = UINT16_MAX - origin;
uint16_t* p = memory + origin;
size_t read = fread(p, sizeof(uint16_t), max_read, file);
/* swap to little endian */
while (read-- > 0) {
*p = swap16(*p);
++p;
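}
}
Notice that swap16() is called on the first 16 bits after they are read. LC-3 programs are big-endian, but most modern computers are little-endian, so each value has to be byte-swapped. If you happen to be running on a big-endian machine (such as PPC), no swap is needed.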
uint16_t swap16(uint16_t x) {
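return (x << 8) | (x >> 8);
}
Note: Endianness refers to how the bytes of an integer are interpreted. In little-endian, the first byte is the least significant; in big-endian it is the reverse, and the first byte is the most significant. As far as I know, the choice is mostly arbitrary. Different companies made different decisions, so those of us who came later have to deal with both. You do not need to know more than this to follow the endianness-related parts of this article.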
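As a small illustration of the byte order issue, the sketch below shows swap16 next to an alternative that builds the value from individual bytes. The function read_be16 is an assumption of this example, not something the article's loader uses; its appeal is that it gives the same result on little- and big-endian hosts.

```c
#include <stdint.h>

/* Same operation as the article's swap16: exchange the two bytes. */
uint16_t swap16(uint16_t x) {
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Hypothetical alternative: assemble a big-endian 16-bit value from two
   bytes in file order. No knowledge of the host's endianness is needed. */
uint16_t read_be16(const uint8_t bytes[2]) {
    return (uint16_t)((bytes[0] << 8) | bytes[1]);
}
```

An image whose first two bytes are 0x30 0x00 has the origin 0x3000; a little-endian host that reads those bytes directly into a uint16_t sees 0x0030, which swap16 turns back into 0x3000.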
For convenience, we wrap the loader in a function that takes a file path as a string:
int read_image(const char* image_path) {
FILE* file = fopen(image_path, "rb");
if (!file) { return 0; }
read_image_file(file);
fclose(file);
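return 1;
}
10. Memory mapped registers
Some special registers are not accessible from the normal register table. Instead, a special address is reserved for them in memory, and you read and write them by reading and writing that memory location. These are called memory mapped registers (MMRs). They are commonly used to interact with special hardware devices.
The LC-3 has two memory mapped registers that need to be implemented:
- KBSR: the keyboard status register, which indicates whether a key has been pressed
- KBDR: the keyboard data register, which identifies which key was pressed
Although you can request keyboard input with GETC, that trap routine blocks execution until input is received. KBSR and KBDR let a program poll the state of the device and continue executing, so the program stays responsive while waiting for input.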
enum {
MR_KBSR = 0xFE00, /* keyboard status */
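MR_KBDR = 0xFE02 /* keyboard data */
};
Memory mapped registers make memory access slightly more complicated. We cannot read and write the memory array directly; instead we use setter and getter functions. When memory is read from KBSR, the getter checks the keyboard and updates both memory locations.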
void mem_write(uint16_t address, uint16_t val) {
memory[address] = val;
}
uint16_t mem_read(uint16_t address) {
if (address == MR_KBSR) {
if (check_key()) {
memory[MR_KBSR] = (1 << 15);
memory[MR_KBDR] = getchar();
} else {
memory[MR_KBSR] = 0;
}
}
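return memory[address];
}
That is the last component of the VM! Provided you have implemented the remaining trap routines and instructions, you are almost ready to try it out.
11. Platform specifics
This section contains the tedious details needed to access the keyboard and behave nicely. If you are not interested, feel free to copy and paste.
If you are running the program on a non-Unix system such as Windows, the code in this section needs to be replaced with the platform's equivalent.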
uint16_t check_key() {
fd_set readfds;
FD_ZERO(&readfds);
FD_SET(STDIN_FILENO, &readfds);
struct timeval timeout;
timeout.tv_sec = 0;
timeout.tv_usec = 0;
return select(1, &readfds, NULL, NULL, &timeout) != 0;
}
Here is the Unix-specific code for configuring terminal input:
struct termios original_tio;
void disable_input_buffering() {
    tcgetattr(STDIN_FILENO, &original_tio);
    struct termios new_tio = original_tio;
    new_tio.c_lflag &= ~ICANON & ~ECHO;
    tcsetattr(STDIN_FILENO, TCSANOW, &new_tio);
}
void restore_input_buffering() {
    tcsetattr(STDIN_FILENO, TCSANOW, &original_tio);
}
When the program is interrupted, we want to restore the terminal settings back to normal:
void handle_interrupt(int signal) {
restore_input_buffering();
printf("\n");
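exit(-2);
}
signal(SIGINT, handle_interrupt);
disable_input_buffering();
12. Running the VM
You can now build and run the LC-3 VM!
Compile the VM with your favorite C compiler (lc3.c, or the more compactly formatted lc3-vm.c), then download the assembled versions of the two small games.
Run it with the following command: lc3-vm path/to/2048.obj.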
Play 2048!
Control the game using WASD keys.
Are you on an ANSI terminal (y/n)? y
+--------------------------+
| |
| |
| |
| 2 |
| |
| 2 |
| |
| |
| |
+--------------------------+
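Debugging
If the program does not work correctly, there is probably a problem in your implementation, and debugging it can be tricky. I recommend reading through the assembly source of the LC-3 program while stepping through the VM's instructions one at a time in a debugger, checking that the VM executes the instructions you expect. When you find unexpected behavior, go back to the LC-3 specification and check whether your implementation matches it.
13. C++ implementation (optional)
Using C++ makes the code shorter. This section describes a few C++ implementation techniques.
C++ has powerful compile-time generics that can generate parts of the instruction implementations for us. The basic idea is to reuse the steps that instructions have in common. For example, several instructions use indirect addressing, or sign extend a value and add it to the PC or a base register. The template looks like this: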
template <unsigned op>
void ins(uint16_t instr) {
uint16_t r0, r1, r2, imm5, imm_flag;
uint16_t pc_plus_off, base_plus_off;
uint16_t opbit = (1 << op);
if (0x4EEE & opbit) { r0 = (instr >> 9) & 0x7; }
if (0x12F3 & opbit) { r1 = (instr >> 6) & 0x7; } // bit 4 (JSR) is included because JSRR reads reg[r1]
if (0x0022 & opbit) {
r2 = instr & 0x7;
imm_flag = (instr >> 5) & 0x1;
imm5 = sign_extend((instr) & 0x1F, 5);
}
if (0x00C0 & opbit) { // Base + offset
base_plus_off = reg[r1] + sign_extend(instr & 0x3f, 6);
}
if (0x4C0D & opbit) { // Indirect address
pc_plus_off = reg[R_PC] + sign_extend(instr & 0x1ff, 9);
}
if (0x0001 & opbit) {
// BR
uint16_t cond = (instr >> 9) & 0x7;
if (cond & reg[R_COND]) { reg[R_PC] = pc_plus_off; }
}
if (0x0002 & opbit) { // ADD
if (imm_flag) {
reg[r0] = reg[r1] + imm5;
} else {
reg[r0] = reg[r1] + reg[r2];
}
}
if (0x0020 & opbit) { // AND
if (imm_flag) {
reg[r0] = reg[r1] & imm5;
} else {
reg[r0] = reg[r1] & reg[r2];
}
}
if (0x0200 & opbit) { reg[r0] = ~reg[r1]; } // NOT
if (0x1000 & opbit) { reg[R_PC] = reg[r1]; } // JMP
if (0x0010 & opbit) { // JSR
uint16_t long_flag = (instr >> 11) & 1;
pc_plus_off = reg[R_PC] + sign_extend(instr & 0x7ff, 11);
reg[R_R7] = reg[R_PC];
if (long_flag) {
reg[R_PC] = pc_plus_off;
} else {
reg[R_PC] = reg[r1];
}
}
if (0x0004 & opbit) { reg[r0] = mem_read(pc_plus_off); } // LD
if (0x0400 & opbit) { reg[r0] = mem_read(mem_read(pc_plus_off)); } // LDI
if (0x0040 & opbit) { reg[r0] = mem_read(base_plus_off); } // LDR
if (0x4000 & opbit) { reg[r0] = pc_plus_off; } // LEA
if (0x0008 & opbit) { mem_write(pc_plus_off, reg[r0]); } // ST
if (0x0800 & opbit) { mem_write(mem_read(pc_plus_off), reg[r0]); } // STI
if (0x0080 & opbit) { mem_write(base_plus_off, reg[r0]); } // STR
if (0x8000 & opbit) { // TRAP
{TRAP, 8}
}
//if (0x0100 & opbit) { } // RTI
if (0x4666 & opbit) { update_flags(r0); }
}
static void (*op_table[16])(uint16_t) = {
ins<0>, ins<1>, ins<2>, ins<3>,
ins<4>, ins<5>, ins<6>, ins<7>,
NULL, ins<9>, ins<10>, ins<11>,
ins<12>, NULL, ins<14>, ins<15>
};
This technique comes from Bisqwit's NES emulator. If you are interested in emulation or the NES, I highly recommend his videos.
The complete C++ implementation is in lc3-alt.cpp.
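The hexadecimal constants in the template are bit sets with one bit per opcode. As a sketch of where such a constant comes from, the C code below rebuilds two of them from the opcode enum. The BIT macro and the function names are assumptions made for this illustration; the C++ version simply writes the constants out.

```c
#include <stdint.h>

enum { OP_BR = 0, OP_ADD, OP_LD, OP_ST, OP_JSR, OP_AND, OP_LDR, OP_STR,
       OP_RTI, OP_NOT, OP_LDI, OP_STI, OP_JMP, OP_RES, OP_LEA, OP_TRAP };

#define BIT(op) (1u << (op))

/* Opcodes that write DR and therefore update the condition flags. */
uint16_t flags_mask(void) {
    return (uint16_t)(BIT(OP_ADD) | BIT(OP_LD) | BIT(OP_AND) | BIT(OP_LDR)
                    | BIT(OP_NOT) | BIT(OP_LDI) | BIT(OP_LEA));
}

/* Opcodes that compute PC + sign-extended PCoffset9. */
uint16_t pc_offset_mask(void) {
    return (uint16_t)(BIT(OP_BR) | BIT(OP_LD) | BIT(OP_ST)
                    | BIT(OP_LDI) | BIT(OP_STI) | BIT(OP_LEA));
}

/* Returns 1 when opcode `op` belongs to the set described by `mask`. */
int op_in_mask(uint16_t mask, unsigned op) {
    return (int)((mask >> op) & 1u);
}
```

Under these definitions flags_mask() evaluates to 0x4666 and pc_offset_mask() to 0x4C0D, matching the constants used in the template.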
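14. Implementations in other languages
Community-contributed implementations in other languages (see the original article for links):
- C
- C++
- Common Lisp
- Go
- Haskell
- JavaScript
- Python
- Racket
- Rust
- Swift
- TypeScript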
Write your Own Virtual Machine
By: Justin Meiners and Ryan Pendleton
View the final code and other resources in the GitHub repo.
In this tutorial, I will teach you how to write your own virtual machine (VM) that can run assembly language programs, such as my friend’s 2048 or my Roguelike. If you know how to program, but would like to gain a deeper understanding of what is going on inside a computer and better understand how programming languages work, then this project is for you. Writing your own VM may sound a little scary, but I promise that you will find it to be surprisingly simple and enlightening.
The final code is about 250 lines of C (unix, windows). All you need to know is how to read basic C or C++ and how to do binary arithmetic.
Note: This tutorial is a literate program. This means you are reading the source code right now! Each piece of code from the VM project will be shown and explained thoroughly, so you can be sure nothing is left out. The final code was created by “tangling” the blocks of code together.
1. What is a virtual machine?
A VM is a program that acts like a computer. It simulates a CPU along with a few other hardware components, allowing it to perform arithmetic, read and write to memory, and interact with I/O devices, just like a physical computer. Most importantly, it can understand a machine language which you can use to program it.
The amount of computer hardware the VM attempts to simulate depends on its purpose. Some VMs are designed to reproduce the behavior of some particular computer, such as video game emulators. Most people don’t have an NES lying around anymore, but we can still play NES games by simulating the NES hardware in a program. These emulators must faithfully recreate every detail and major hardware component of the original device.
Other VMs don’t act like any real computer and are entirely made up! This is primarily done to make software development easier. Imagine you wanted to create a program that ran on multiple computer architectures. A VM could offer a standard platform which provided portability for all of them. Instead of rewriting a program in different dialects of assembly for each CPU architecture, you would only need to write the small VM program in each assembly language. Each program would then be written only once in the VM’s assembly language.
Note: A compiler solves a similar problem by compiling a standard high-level language to several CPU architectures. A VM creates one standard CPU architecture which is simulated on various hardware devices. One advantage of a compiler is that it has no runtime overhead while a VM does. Even though compilers do a pretty good job, writing a new one that targets multiple platforms is very difficult, so VMs are still helpful here. In practice, VMs and compilers are mixed at various levels.
The Java Virtual Machine (JVM) is a very successful example. The JVM itself is a moderately sized program that is small enough for one programmer to understand. This has made it possible to be written for thousands of devices including phones. Once the JVM is implemented on a new device, any Java, Kotlin, or Clojure program ever written can run on it without modification. The only cost is the overhead of the VM itself and the further abstraction from the machine. Most of the time, this is a pretty good tradeoff.
A VM doesn’t have to be large or pervasive to provide a similar benefit. Old video games often used small VMs to provide simple scripting systems.
VMs are also useful for executing code in a secure or isolated way. One application of this is garbage collection. There is no trivial way to implement automatic garbage collection on top of C or C++ since a program cannot see its own stack or variables. However, a VM is “outside” the program it is running and can observe all of the memory references on the stack.
Another example of this behavior is demonstrated by Ethereum smart contracts. Smart contracts are small programs which are executed by each validating node in the blockchain network. This requires the node operators to run programs on their machines that have been written by complete strangers, without any opportunity to scrutinize them beforehand. To prevent a contract from doing malicious things, they are run inside a VM that has no access to the file system, network, disc, etc. Ethereum is also a good application of the portability features that result when using a VM. Since Ethereum nodes can be run on many kinds of computers and operating systems, the use of a VM allows smart contracts to be written without any consideration of the many platforms they run on.
2. LC-3 architecture
Our VM will simulate a fictional computer called the LC-3. The LC-3 is popular for teaching university students how to program in assembly language. It has a simplified instruction set compared to x86, but contains all the main ideas used in modern CPUs.
First, we need to simulate the essential hardware components of the machine. Try to understand what each component is, but don’t worry right now if you are unsure of how it fits into the larger picture. Start by creating a C file. Each of the code snippets in this section should be placed in the global scope of this file.
Memory
The LC-3 has 65,536 memory locations (the maximum that is addressable by a 16-bit unsigned integer, 2^16), each of which stores a 16-bit value. This means it can store a total of only 128KB, which is a lot smaller than you may be used to! In our program, this memory will be stored in a simple array:
#define MEMORY_MAX (1 << 16)
uint16_t memory[MEMORY_MAX]; /* 65536 locations */
Registers
A register is a slot for storing a single value on the CPU. Registers are like the “workbench” of the CPU. For the CPU to work with a piece of data, it has to be in one of the registers. However, since there are just a few registers, only a minimal amount of data can be loaded at any given time. Programs work around this by loading values from memory into registers, calculating values into other registers, and then storing the final results back in memory.
The LC-3 has 10 total registers, each of which is 16 bits. Most of them are general purpose, but a few have designated roles.
- 8 general purpose registers (R0-R7)
- 1 program counter (PC) register
- 1 condition flags (COND) register
The general purpose registers can be used to perform any program calculations. The program counter is an unsigned integer which is the address of the next instruction in memory to execute. The condition flags tell us information about the previous calculation.
enum {
    R_R0 = 0,
    R_R1,
    R_R2,
    R_R3,
    R_R4,
    R_R5,
    R_R6,
    R_R7,
    R_PC, /* program counter */
    R_COND,
    R_COUNT
};
Just like the memory, we will store the registers in an array:
uint16_t reg[R_COUNT];
Instruction set
An instruction is a command which tells the CPU to do some fundamental task, such as add two numbers. Instructions have both an opcode which indicates the kind of task to perform and a set of parameters which provide inputs to the task being performed.
Each opcode represents one task that the CPU “knows” how to do. There are just 16 opcodes in LC-3. Everything the computer can calculate is some sequence of these simple instructions. Each instruction is 16 bits long, with the left 4 bits storing the opcode. The rest of the bits are used to store the parameters.
We will discuss, in detail, what each instruction does later. For now, define the following opcodes. Make sure they stay in this order so that they are assigned the proper enum value:
enum {
    OP_BR = 0, /* branch */
    OP_ADD,    /* add */
    OP_LD,     /* load */
    OP_ST,     /* store */
    OP_JSR,    /* jump register */
    OP_AND,    /* bitwise and */
    OP_LDR,    /* load register */
    OP_STR,    /* store register */
    OP_RTI,    /* unused */
    OP_NOT,    /* bitwise not */
    OP_LDI,    /* load indirect */
    OP_STI,    /* store indirect */
    OP_JMP,    /* jump */
    OP_RES,    /* reserved (unused) */
    OP_LEA,    /* load effective address */
    OP_TRAP    /* execute trap */
};
Note: The Intel x86 architecture has hundreds of instructions, while others such as ARM and LC-3 have very few. Small instruction sets are referred to as RISCs while larger ones are called CISCs. Larger instruction sets typically do not provide any fundamentally new possibilities, but they often make it more convenient to write assembly for. A single instruction in CISC might take the place of several in RISC. However, they tend to be more complex and expensive for engineers to design and manufacture. This and other tradeoffs cause the designs to come in and out of style.
Condition flags
The R_COND register stores condition flags which provide information about the most recently executed calculation. This allows programs to check logical conditions such as if (x > 0) { ... }.
Each CPU has a variety of condition flags to signal various situations. The LC-3 uses only 3 condition flags which indicate the sign of the previous calculation.
enum {
    FL_POS = 1 << 0, /* P */
    FL_ZRO = 1 << 1, /* Z */
    FL_NEG = 1 << 2, /* N */
};
Note: The << symbol is called the left bitshift operator. (n << k) shifts the bits of n to the left k places. Thus 1 << 2 will equal 4. Read up on bit shifting if you are not familiar with it. It will be important.
We are finished setting up the hardware components of our VM! After adding standard includes (see reference), your file should look something like this:
@{Includes}
@{Registers}
@{Condition Flags}
@{Opcodes}
3. Assembly examples
Now let’s look at an LC-3 assembly program to get an idea of what the VM actually runs. You don’t need to know how to program assembly or understand everything that is going on; a general idea is enough. Here is a simple “Hello World”:
.ORIG x3000                        ; this is the address in memory where the program will be loaded
LEA R0, HELLO_STR                  ; load the address of the HELLO_STR string into R0
PUTs                               ; output the string pointed to by R0 to the console
HALT                               ; halt the program
HELLO_STR .STRINGZ "Hello World!"  ; store this string here in the program
.END                               ; mark the end of the file
Just like in C, the program starts at the top and executes one statement at a time. However, unlike C, there are no nested scopes {} or control structures such as if or while; just a flat list of statements. This makes it much easier to execute.
Notice that some of the statements have names which match the opcodes we defined earlier. Previously, we learned that every instruction was 16 bits, but each line looks like it is a different number of characters. How is this inconsistency possible?
This is because the code we are reading is written in assembly which is a human-readable and writable form, encoded in plain text. A tool called an assembler is used to transform each line of text into a 16-bit binary instruction the VM can understand. This binary form, which is essentially an array of 16-bit instructions, is called machine code and is what the VM actually runs.
Note: Although a compiler and an assembler play a similar role in development, they are not the same. An assembler simply encodes what the programmer has written in text into binary, replacing symbols with their binary representation and packing them into instructions.
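As a rough illustration of what the assembler does for a single line, the sketch below packs an immediate-mode ADD into one 16-bit word. It relies on the ADD bit layout that section 5 walks through; the function name encode_add_imm is hypothetical and not part of any LC-3 tool.

```c
#include <stdint.h>

/* Hypothetical helper: packs "ADD DR, SR1, imm5" into one 16-bit word. */
uint16_t encode_add_imm(uint16_t dr, uint16_t sr1, int imm5)
{
    return (uint16_t)((0x1u << 12)                 /* opcode 0001 = ADD         */
                    | ((dr & 0x7u) << 9)           /* destination register      */
                    | ((sr1 & 0x7u) << 6)          /* first source register     */
                    | (0x1u << 5)                  /* immediate-mode flag       */
                    | ((unsigned)imm5 & 0x1Fu));   /* low 5 bits of the value   */
}
```

Under these assumptions, the text "ADD R0, R0, 1" becomes the word 0x1021 and "ADD R1, R0, -10" becomes 0x1236. A real assembler also resolves labels and handles directives, which this sketch ignores.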
The commands .ORIG and .STRINGZ look like instructions but they aren’t. They are assembler directives which generate a piece of code or data (like macros). For example, the .STRINGZ directive inserts a string of characters into the program binary at the location where it is written.
Loops and conditions are done with a goto-like instruction. Here is another example which counts to 10.
AND R0, R0, 0                      ; clear R0
LOOP                               ; label at the top of our loop
ADD R0, R0, 1                      ; add 1 to R0 and store back in R0
ADD R1, R0, -10                    ; subtract 10 from R0 and store back in R1
BRn LOOP                           ; go back to LOOP if the result was negative
... ; R0 is now 10!
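For readers more comfortable with C, the counting loop can be mirrored line by line with goto. This is only an analogy: the function name count_to_ten and the use of signed local variables in place of registers are assumptions made for the sketch.

```c
#include <stdint.h>

/* Hypothetical function mirroring the assembly listing line by line. */
int16_t count_to_ten(void)
{
    int16_t r0, r1;

    r0 = 0;            /* AND R0, R0, 0   */
loop:                  /* LOOP            */
    r0 = r0 + 1;       /* ADD R0, R0, 1   */
    r1 = r0 - 10;      /* ADD R1, R0, -10 */
    if (r1 < 0)        /* BRn LOOP        */
        goto loop;

    return r0;         /* R0 is now 10    */
}
```

The branch is taken nine times, while R0 - 10 is still negative, and falls through once R0 reaches 10.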
Note: Learning to write assembly is not necessary for this tutorial. However, if you are interested, you can write and assemble your own LC-3 programs using the LC-3 Tools.
4. Executing programs
Once again, the previous examples are just to give you an idea of what the VM does. To write a VM, you don’t need to be fluent in assembly. As long as you follow the proper procedure for reading and executing instructions, any LC-3 program will run correctly, no matter how complicated it is. In theory, it could even run a web browser or an operating system like Linux!
If you think about this property deeply, it is a philosophically remarkable idea. The programs themselves can do all kinds of intelligent things that we never anticipated and may not be able to understand, but at the same time, everything they can do is limited to the simple code we will write! We simultaneously know everything and nothing about how each program works. Turing observed this marvelous idea:
“The view that machines cannot give rise to surprises is due, I believe, to a fallacy to which philosophers and mathematicians are particularly subject. This is the assumption that as soon as a fact is presented to a mind all consequences of that fact spring into the mind simultaneously with it. It is a very useful assumption under many circumstances, but one too easily forgets that it is false.” — Alan M. Turing
Procedure
Here is the procedure we need to write:
1. Load one instruction from memory at the address of the PC register.
2. Increment the PC register.
3. Look at the opcode to determine which type of instruction it should perform.
4. Perform the instruction using the parameters in the instruction.
5. Go back to step 1.
You may be wondering, “if the loop keeps incrementing the PC, and we don’t have if or while, won’t it quickly run out of instructions?” No. As we mentioned before, some goto-like instructions change the execution flow by jumping the PC around.
Let’s start outlining this process in the main loop:
int main(int argc, const char* argv[])
{
    @{Load Arguments}
    @{Setup}

    /* since exactly one condition flag should be set at any given time, set the Z flag */
    reg[R_COND] = FL_ZRO;

    /* set the PC to starting position */
    /* 0x3000 is the default */
    enum { PC_START = 0x3000 };
    reg[R_PC] = PC_START;

    int running = 1;
    while (running)
    {
        /* FETCH */
        uint16_t instr = mem_read(reg[R_PC]++);
        uint16_t op = instr >> 12;

        switch (op)
        {
            case OP_ADD:
                @{ADD}
                break;
            case OP_AND:
                @{AND}
                break;
            case OP_NOT:
                @{NOT}
                break;
            case OP_BR:
                @{BR}
                break;
            case OP_JMP:
                @{JMP}
                break;
            case OP_JSR:
                @{JSR}
                break;
            case OP_LD:
                @{LD}
                break;
            case OP_LDI:
                @{LDI}
                break;
            case OP_LDR:
                @{LDR}
                break;
            case OP_LEA:
                @{LEA}
                break;
            case OP_ST:
                @{ST}
                break;
            case OP_STI:
                @{STI}
                break;
            case OP_STR:
                @{STR}
                break;
            case OP_TRAP:
                @{TRAP}
                break;
            case OP_RES:
            case OP_RTI:
            default:
                @{BAD OPCODE}
                break;
        }
    }
    @{Shutdown}
}
While we are at the main loop let’s handle command line input to make our program usable. We expect one or more paths to VM images and present a usage string if none are given.
if (argc < 2)
{
    /* show usage string */
    printf("lc3 [image-file1] ...\n");
    exit(2);
}

for (int j = 1; j < argc; ++j)
{
    if (!read_image(argv[j]))
    {
        printf("failed to load image: %s\n", argv[j]);
        exit(1);
    }
}
5. Implementing instructions
Your task now is to fill in each opcode case with a correct implementation. This is easier than it sounds. A detailed specification for each instruction is included in the project documents. The specification for each translates pretty easily to several lines of code. I will demonstrate how to implement two of them here. The code for the rest can be found in the next section.
ADD
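The ADD instruction takes two numbers, adds them together, and stores the result in a register. Its specification is found on page 526. Each ADD instruction is encoded in one of the following two ways:
    bits:  15-12  11-9  8-6  5  4-3  2-0
           0001   DR    SR1  0  00   SR2
           0001   DR    SR1  1  imm5 (bits 4-0)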
The encoding shows two rows because there are two different “modes” for this instruction. Before I explain modes, let’s try to find the similarities between them. In both rows, we can see that we start with 4 bits, 0001. This is the opcode value for OP_ADD. The next 3 bits are marked DR. This stands for destination register. The destination register is where the added sum will be stored. The next 3 bits are SR1. This is the register containing the first number to add.
So we know where we want to store the result and we know the first number to add. The last bit of information we need is the second number to add. At this point, the two rows start to look different. Notice that on the top row bit 5 is a 0 and in the second row it is 1. This bit indicates whether it is immediate mode or register mode. In register mode, the second number is stored in a register just like the first. This is marked SR2 and is contained in bits 2-0. Bits 3 and 4 are unused. In assembly this would be written as:
ADD R2 R0 R1 ; add the contents of R0 to R1 and store in R2.
Immediate mode is a convenience which reduces the length of a typical program. Instead of adding two values stored in separate registers, the second value is embedded in the instruction itself, marked imm5 in the diagram. This removes the need to write instructions to load the value from memory. The tradeoff is that the instruction only has room for a small number: 5 bits give 2^5=32 distinct values, which, because the field is sign-extended, cover -16 through 15. This makes immediate mode primarily useful for incrementing and decrementing. In assembly, it could be written as:
ADD R0 R0 1 ; add 1 to R0 and store back in R0
Here is a summary from the specification:
If bit [5] is 0, the second source operand is obtained from SR2. If bit [5] is 1, the second source operand is obtained by sign-extending the imm5 field to 16 bits. In both cases, the second source operand is added to the contents of SR1 and the result stored in DR. (Pg. 526)
That sounds just like the behaviour we discussed, but what is “sign-extending”? The immediate mode value has only 5 bits, but it needs to be added to a 16-bit number. To do the addition, those 5 bits need to be extended to 16 to match the other number. For positive numbers, we can simply fill in 0’s for the additional bits. For negative numbers, this causes a problem. For example, -1 in 5 bits is 1 1111. If we just extended it with 0’s, this is 0000 0000 0001 1111 which is equal to 31. Sign extension corrects this problem by filling in 0’s for positive numbers and 1’s for negative numbers, so that original values are preserved.
uint16_t sign_extend(uint16_t x, int bit_count)
{
    if ((x >> (bit_count - 1)) & 1)
    {
        x |= (0xFFFF << bit_count);
    }
    return x;
}
Note: If you are interested in exactly how negative numbers can be represented in binary, you can read about Two’s Complement. However, this is not essential. You can just copy the code above and use it whenever the specification says to sign extend numbers.
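To see sign extension at work on the -1 example, here is a small self-contained sketch. It repeats sign_extend so that it compiles on its own and adds a hypothetical helper, as_signed, which is not part of the VM and only reinterprets a 16-bit pattern as a signed number for checking.

```c
#include <stdint.h>

/* Same logic as sign_extend above, repeated so the snippet compiles alone. */
uint16_t sign_extend(uint16_t x, int bit_count)
{
    if ((x >> (bit_count - 1)) & 1)
    {
        x |= (0xFFFF << bit_count);
    }
    return x;
}

/* Hypothetical helper: read a 16-bit two's complement pattern as an int. */
int as_signed(uint16_t x)
{
    return (x & 0x8000) ? (int)x - 0x10000 : (int)x;
}
```

The 5-bit pattern 1 1111 (0x1F) extends to 0xFFFF, which reads as -1, while 0 1111 (0x0F) is left unchanged at 15.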
There is one last sentence in the specification:
The condition codes are set, based on whether the result is negative, zero, or positive. (Pg. 526)
Earlier we defined a condition flags enum, and now it’s time to use them. Any time a value is written to a register, we need to update the flags to indicate its sign. We will write a function so that this can be reused:
void update_flags(uint16_t r)
{
    if (reg[r] == 0)
    {
        reg[R_COND] = FL_ZRO;
    }
    else if (reg[r] >> 15) /* a 1 in the left-most bit indicates negative */
    {
        reg[R_COND] = FL_NEG;
    }
    else
    {
        reg[R_COND] = FL_POS;
    }
}
Now we are ready to write the code for the ADD case:
{
    /* destination register (DR) */
    uint16_t r0 = (instr >> 9) & 0x7;
    /* first operand (SR1) */
    uint16_t r1 = (instr >> 6) & 0x7;
    /* whether we are in immediate mode */
    uint16_t imm_flag = (instr >> 5) & 0x1;

    if (imm_flag)
    {
        uint16_t imm5 = sign_extend(instr & 0x1F, 5);
        reg[r0] = reg[r1] + imm5;
    }
    else
    {
        uint16_t r2 = instr & 0x7;
        reg[r0] = reg[r1] + reg[r2];
    }

    update_flags(r0);
}
This section contained a lot of information, so let’s summarize.
- ADD adds two values and stores the result in a register.
- In register mode, the second value to add is found in a register.
- In immediate mode, the second value is embedded in the right-most 5 bits of the instruction.
- Values which are shorter than 16 bits need to be sign extended.
- Any time an instruction modifies a register, the condition flags need to be updated.
You may be feeling overwhelmed about writing 15 more instructions. However, all of what you learned here will be reused. Most of the instructions use some combination of sign extension, different modes, and updating flags.
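To check the pieces together, the sketch below wraps the ADD case in a function and runs it against a bare register file. The wrapper name exec_add is hypothetical, and the register enum is assumed to match the one defined in the Registers section (R_R0 through R_R7, then R_PC and R_COND).

```c
#include <stdint.h>

/* Assumed to match the enums defined earlier in the tutorial. */
enum { R_R0 = 0, R_R1, R_R2, R_R3, R_R4, R_R5, R_R6, R_R7, R_PC, R_COND, R_COUNT };
enum { FL_POS = 1 << 0, FL_ZRO = 1 << 1, FL_NEG = 1 << 2 };

uint16_t reg[R_COUNT];

uint16_t sign_extend(uint16_t x, int bit_count)
{
    if ((x >> (bit_count - 1)) & 1)
    {
        x |= (0xFFFF << bit_count);
    }
    return x;
}

void update_flags(uint16_t r)
{
    if (reg[r] == 0)
    {
        reg[R_COND] = FL_ZRO;
    }
    else if (reg[r] >> 15) /* a 1 in the left-most bit indicates negative */
    {
        reg[R_COND] = FL_NEG;
    }
    else
    {
        reg[R_COND] = FL_POS;
    }
}

/* Hypothetical wrapper: the body is the ADD case shown above. */
void exec_add(uint16_t instr)
{
    uint16_t r0 = (instr >> 9) & 0x7;       /* destination register (DR) */
    uint16_t r1 = (instr >> 6) & 0x7;       /* first operand (SR1) */
    uint16_t imm_flag = (instr >> 5) & 0x1; /* immediate mode? */

    if (imm_flag)
    {
        uint16_t imm5 = sign_extend(instr & 0x1F, 5);
        reg[r0] = reg[r1] + imm5;
    }
    else
    {
        uint16_t r2 = instr & 0x7;
        reg[r0] = reg[r1] + reg[r2];
    }
    update_flags(r0);
}
```

With R0 holding 5, executing 0x1021 (ADD R0, R0, 1) leaves 6 in R0 and sets FL_POS. Executing 0x1236 (ADD R1, R0, -10) afterwards leaves 0xFFFC, the 16-bit pattern for -4, in R1 and sets FL_NEG.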
LDI
LDI stands for “load indirect.” This instruction is used to load a value from a location in memory into a register. The specification is found on page 532.
Here is what the binary layout looks like:
    bits:  15-12  11-9  8-0
           1010   DR    PCoffset9
In contrast to ADD, there are no modes and fewer parameters. This time, the opcode is 1010 which corresponds with the OP_LDI enum value. Just like ADD, it contains a 3-bit DR (the destination register) for storing the loaded value. The remaining bits are labeled PCoffset9. This is an immediate value embedded in the instruction (similar to imm5). Since this instruction loads from memory, we can guess that this number is some kind of address which tells us where to load from. The specification provides more detail:
An address is computed by sign-extending bits [8:0] to 16 bits and adding this value to the incremented PC. What is stored in memory at this address is the address of the data to be loaded into DR. (Pg. 532)
Just like before, we need to sign extend this 9-bit value, but this time add it to the current PC. (If you look back at the execution loop, the PC was incremented right after this instruction was loaded.) The resulting sum is an address to a location in memory, and that address contains yet another value, which is the address of the value to load.
This may seem like a roundabout way to read from memory, but it is indispensable. The LD instruction is limited to address offsets that are 9 bits, whereas the memory requires 16 bits to address. LDI is useful for loading values that are stored in locations far away from the current PC, but to use it, the address of the final location needs to be stored in a neighborhood nearby. You can think of it like having a local variable in C which is a pointer to some data:
// the value of far_data is an address
// of course far_data itself (the location in memory containing the address) has an address
char* far_data = "apple";

// In memory it may be laid out like this:

// Address Label      Value
// 0x123:  far_data = 0x456
// ...
// 0x456:  string   = 'a'

// if PC was at 0x100
// LDI R0 0x023
// would load 'a' into R0
Same as before, the flags need to be updated after putting the value into DR:
The condition codes are set based on whether the value loaded is negative, zero, or positive. (Pg. 532)
Here is the code for this case: (mem_read will be discussed in a later section.)
{
    /* destination register (DR) */
    uint16_t r0 = (instr >> 9) & 0x7;
    /* PCoffset 9 */
    uint16_t pc_offset = sign_extend(instr & 0x1FF, 9);
    /* add pc_offset to the current PC, look at that memory location to get the final address */
    reg[r0] = mem_read(mem_read(reg[R_PC] + pc_offset));
    update_flags(r0);
}
As I said, this instruction shares a lot of the code and knowledge learned from ADD. You will find this is the case with the remaining instructions.
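The far_data example can be replayed against a toy memory to watch the double lookup happen. In this sketch the wrapper name exec_ldi is hypothetical, mem_read is a simplified stand-in that only indexes the array (the real one is covered in the memory mapped registers section), and the register enum is assumed to match the Registers section.

```c
#include <stdint.h>

/* Assumed to match the enums defined earlier in the tutorial. */
enum { R_R0 = 0, R_R1, R_R2, R_R3, R_R4, R_R5, R_R6, R_R7, R_PC, R_COND, R_COUNT };
enum { FL_POS = 1 << 0, FL_ZRO = 1 << 1, FL_NEG = 1 << 2 };

uint16_t memory[1 << 16]; /* 65536 16-bit locations */
uint16_t reg[R_COUNT];

/* Simplified stand-in for the tutorial's mem_read: no keyboard handling. */
uint16_t mem_read(uint16_t address)
{
    return memory[address];
}

uint16_t sign_extend(uint16_t x, int bit_count)
{
    if ((x >> (bit_count - 1)) & 1)
    {
        x |= (0xFFFF << bit_count);
    }
    return x;
}

void update_flags(uint16_t r)
{
    if (reg[r] == 0)       { reg[R_COND] = FL_ZRO; }
    else if (reg[r] >> 15) { reg[R_COND] = FL_NEG; }
    else                   { reg[R_COND] = FL_POS; }
}

/* Hypothetical wrapper: the body is the LDI case shown above. */
void exec_ldi(uint16_t instr)
{
    uint16_t r0 = (instr >> 9) & 0x7;
    uint16_t pc_offset = sign_extend(instr & 0x1FF, 9);
    /* first read yields an address, second read yields the data */
    reg[r0] = mem_read(mem_read(reg[R_PC] + pc_offset));
    update_flags(r0);
}
```

With memory[0x123] set to 0x456, memory[0x456] set to 'a', and the (already incremented) PC at 0x100, executing 0xA023 (LDI R0 with offset 0x023) reads the address 0x456 from location 0x123 and then loads 'a' from there into R0.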
You now need to go back and implement the rest of the switch cases for the instructions. Follow the specification and use the code listed here to complete the others. The code for all instructions is listed at the end of the tutorial. Two of the opcodes specified before will not be used: OP_RTI and OP_RES. You can ignore these cases or throw an error if they are executed. After you are done, the bulk of your VM will be completed!
6. Instruction cheat sheet
This section contains the full implementations of the remaining instructions if you get stuck.
RTI & RES
(these are unused)
Bitwise and
{
    uint16_t r0 = (instr >> 9) & 0x7;
    uint16_t r1 = (instr >> 6) & 0x7;
    uint16_t imm_flag = (instr >> 5) & 0x1;

    if (imm_flag)
    {
        uint16_t imm5 = sign_extend(instr & 0x1F, 5);
        reg[r0] = reg[r1] & imm5;
    }
    else
    {
        uint16_t r2 = instr & 0x7;
        reg[r0] = reg[r1] & reg[r2];
    }
    update_flags(r0);
}
Bitwise not
{
    uint16_t r0 = (instr >> 9) & 0x7;
    uint16_t r1 = (instr >> 6) & 0x7;

    reg[r0] = ~reg[r1];
    update_flags(r0);
}
Branch
{
    uint16_t pc_offset = sign_extend(instr & 0x1FF, 9);
    uint16_t cond_flag = (instr >> 9) & 0x7;
    if (cond_flag & reg[R_COND])
    {
        reg[R_PC] += pc_offset;
    }
}
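Two details of the branch case are easy to miss: the n, z and p bits of the instruction (bits 11, 10 and 9) line up with FL_NEG, FL_ZRO and FL_POS once shifted right by 9, and a negative offset moves the PC backwards because the 16-bit addition wraps around. The sketch below shows both; the wrapper name exec_br is hypothetical and the enums are assumed to match the earlier sections.

```c
#include <stdint.h>

/* Assumed to match the enums defined earlier in the tutorial. */
enum { R_R0 = 0, R_R1, R_R2, R_R3, R_R4, R_R5, R_R6, R_R7, R_PC, R_COND, R_COUNT };
enum { FL_POS = 1 << 0, FL_ZRO = 1 << 1, FL_NEG = 1 << 2 };

uint16_t reg[R_COUNT];

uint16_t sign_extend(uint16_t x, int bit_count)
{
    if ((x >> (bit_count - 1)) & 1)
    {
        x |= (0xFFFF << bit_count);
    }
    return x;
}

/* Hypothetical wrapper: the body is the branch case shown above. */
void exec_br(uint16_t instr)
{
    uint16_t pc_offset = sign_extend(instr & 0x1FF, 9);
    uint16_t cond_flag = (instr >> 9) & 0x7; /* n, z, p bits of the instruction */
    if (cond_flag & reg[R_COND])
    {
        /* adding 0xFFFD wraps around, which moves the PC back by 3 */
        reg[R_PC] += pc_offset;
    }
}
```

The word 0x09FD encodes BRn with an offset of -3. With the PC at 0x3005 it jumps back to 0x3002 when the N flag is set, and leaves the PC untouched when the P flag is set.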
Jump
RET is listed as a separate instruction in the specification, since it is a different keyword in assembly. However, it is actually a special case of JMP. RET happens whenever the base register field (bits 8-6 of the instruction) is 7, that is, when the jump target is read from R7.
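A minimal sketch of the JMP case, assuming the same reg array and register numbering as the other cases. It agrees with the JMP line of the C++ variant in section 13; the wrapper name exec_jmp is hypothetical.

```c
#include <stdint.h>

/* Assumed to match the register enum defined earlier in the tutorial. */
enum { R_R0 = 0, R_R1, R_R2, R_R3, R_R4, R_R5, R_R6, R_R7, R_PC, R_COND, R_COUNT };

uint16_t reg[R_COUNT];

/* Hypothetical wrapper around the JMP case. */
void exec_jmp(uint16_t instr)
{
    /* base register field, bits 8-6; RET is the case where it is 7 */
    uint16_t r1 = (instr >> 6) & 0x7;
    reg[R_PC] = reg[r1];
}
```

Here 0xC080 encodes JMP R2 and 0xC1C0 encodes RET, so the same code covers both. Note that JMP does not touch the condition flags.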
Jump register
{
    uint16_t long_flag = (instr >> 11) & 1;
    reg[R_R7] = reg[R_PC];
    if (long_flag)
    {
        uint16_t long_pc_offset = sign_extend(instr & 0x7FF, 11);
        reg[R_PC] += long_pc_offset;  /* JSR */
    }
    else
    {
        uint16_t r1 = (instr >> 6) & 0x7;
        reg[R_PC] = reg[r1]; /* JSRR */
    }
}
Load
{
    uint16_t r0 = (instr >> 9) & 0x7;
    uint16_t pc_offset = sign_extend(instr & 0x1FF, 9);
    reg[r0] = mem_read(reg[R_PC] + pc_offset);
    update_flags(r0);
}
Load register
{
    uint16_t r0 = (instr >> 9) & 0x7;
    uint16_t r1 = (instr >> 6) & 0x7;
    uint16_t offset = sign_extend(instr & 0x3F, 6);
    reg[r0] = mem_read(reg[r1] + offset);
    update_flags(r0);
}
Load effective address
{
    uint16_t r0 = (instr >> 9) & 0x7;
    uint16_t pc_offset = sign_extend(instr & 0x1FF, 9);
    reg[r0] = reg[R_PC] + pc_offset;
    update_flags(r0);
}
Store
{
    uint16_t r0 = (instr >> 9) & 0x7;
    uint16_t pc_offset = sign_extend(instr & 0x1FF, 9);
    mem_write(reg[R_PC] + pc_offset, reg[r0]);
}
Store indirect
{
    uint16_t r0 = (instr >> 9) & 0x7;
    uint16_t pc_offset = sign_extend(instr & 0x1FF, 9);
    mem_write(mem_read(reg[R_PC] + pc_offset), reg[r0]);
}
Store register
{
    uint16_t r0 = (instr >> 9) & 0x7;
    uint16_t r1 = (instr >> 6) & 0x7;
    uint16_t offset = sign_extend(instr & 0x3F, 6);
    mem_write(reg[r1] + offset, reg[r0]);
}
7. Trap routines
The LC-3 provides a few predefined routines for performing common tasks and interacting with I/O devices. For example, there are routines for getting input from the keyboard and for displaying strings to the console. These are called trap routines which you can think of as the operating system or API for the LC-3. Each trap routine is assigned a trap code which identifies it (similar to an opcode). To execute one, the TRAP instruction is called with the trap code of the desired routine.
Define an enum for each trap code:
enum
{
    TRAP_GETC = 0x20,  /* get character from keyboard, not echoed onto the terminal */
    TRAP_OUT = 0x21,   /* output a character */
    TRAP_PUTS = 0x22,  /* output a word string */
    TRAP_IN = 0x23,    /* get character from keyboard, echoed onto the terminal */
    TRAP_PUTSP = 0x24, /* output a byte string */
    TRAP_HALT = 0x25   /* halt the program */
};
You may be wondering why the trap codes are not included in the instructions. This is because they do not actually introduce any new functionality to the LC-3, they just provide a convenient way to perform a task (similar to OS system calls). In the official LC-3 simulator, trap routines are written in assembly. When a trap code is called, the PC is moved to that code’s address. The CPU executes the procedure’s instructions, and when it is complete, the PC is reset to the location following the initial call.
Note: This is why programs start at address 0x3000 instead of 0x0. The lower addresses are left empty to leave space for the trap routine code.
There is no specification for how trap routines must be implemented, only what they are supposed to do. In our VM, we are going to do things slightly differently by writing them in C. When a trap code is invoked, a C function will be called. When it is completed, execution will return to the instructions. (If you are curious about trap codes in assembly, see Ryan’s implementation.)
Even though the trap routines can be written in assembly and this is what a physical LC-3 computer would do, it isn’t the best fit for a VM. Instead of writing our own primitive I/O routines, we can take advantage of the ones available on our OS. This will make the VM run better on our computers, simplify the code, and provide a higher level of abstraction for portability.
Note: Getting input from the keyboard is one specific example of this. The assembly version uses a loop to continuously check the keyboard for input. This consumes a lot of CPU time for nothing! Using a proper OS input function allows the program to sleep until input is received.
In the switch case for the TRAP opcode, add another switch:
reg[R_R7] = reg[R_PC];

switch (instr & 0xFF)
{
    case TRAP_GETC:
        @{TRAP GETC}
        break;
    case TRAP_OUT:
        @{TRAP OUT}
        break;
    case TRAP_PUTS:
        @{TRAP PUTS}
        break;
    case TRAP_IN:
        @{TRAP IN}
        break;
    case TRAP_PUTSP:
        @{TRAP PUTSP}
        break;
    case TRAP_HALT:
        @{TRAP HALT}
        break;
}
As with instructions, I will show you how to implement a single trap routine and leave the rest to you.
PUTS
The PUTS trap code is used to output a null-terminated string (similar to printf in C). The specification is found on page 543.
To display a string, we must give the trap routine a string to display. This is done by storing the address of the first character in R0 before beginning the trap.
The specification says:
Write a string of ASCII characters to the console display. The characters are contained in consecutive memory locations, one character per memory location, starting with the address specified in R0. Writing terminates with the occurrence of x0000 in a memory location. (Pg. 543)
Notice that unlike C strings, characters are not stored in a single byte, but in a single memory location. Memory locations in LC-3 are 16 bits, so each character in the string is 16 bits wide. To display this with a C function, we will need to convert each value to a char and output them individually.
{
    /* one char per word */
    uint16_t* c = memory + reg[R_R0];
    while (*c)
    {
        putc((char)*c, stdout);
        ++c;
    }
    fflush(stdout);
}
That’s all for this routine. The trap routines are pretty straightforward if you are familiar with C. Go back to the specification and implement the others now. As with the instructions, the full code can be found at the end of the tutorial.
8. Trap routine cheat sheet
This section contains the full implementations of the remaining trap routines.
Input Character
/* read a single ASCII char */
reg[R_R0] = (uint16_t)getchar();
update_flags(R_R0);
Output Character
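A minimal sketch of what the output character routine could look like, assuming it mirrors the per-character step of PUTS: write the character held in R0 and flush. The wrapper name trap_out and its return value are hypothetical and exist only so the result can be checked.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed to match the register enum defined earlier in the tutorial. */
enum { R_R0 = 0, R_R1, R_R2, R_R3, R_R4, R_R5, R_R6, R_R7, R_PC, R_COND, R_COUNT };

uint16_t reg[R_COUNT];

/* Hypothetical wrapper: returns what putc returned (the character, or EOF). */
int trap_out(void)
{
    int written = putc((char)reg[R_R0], stdout);
    fflush(stdout);
    return written;
}
```

Inside the VM, the two statements in the body would sit directly in the TRAP_OUT case of the trap switch.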
Prompt for Input Character
{
    printf("Enter a character: ");
    char c = getchar();
    putc(c, stdout);
    fflush(stdout);
    reg[R_R0] = (uint16_t)c;
    update_flags(R_R0);
}
Output String
{
    /* one char per byte (two bytes per word)
       here we need to swap back to
       big endian format */
    uint16_t* c = memory + reg[R_R0];
    while (*c)
    {
        char char1 = (*c) & 0xFF;
        putc(char1, stdout);
        char char2 = (*c) >> 8;
        if (char2) putc(char2, stdout);
        ++c;
    }
    fflush(stdout);
}
Halt Program
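A minimal sketch of the halt routine, assuming its only jobs are to announce the halt and to clear the running variable that controls the main loop. The wrapper name trap_halt and the pointer parameter are hypothetical; inside main the case could assign to running directly.

```c
#include <stdio.h>

/* Hypothetical wrapper: `running` points at the loop variable from main. */
void trap_halt(int* running)
{
    puts("HALT");
    fflush(stdout);
    *running = 0; /* the while (running) loop exits after this instruction */
}
```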
9. Loading programs
We have mentioned a lot about loading and executing instructions from memory, but how do instructions get into memory in the first place? When an assembly program is converted to machine code, the result is a file containing an array of instructions and data. This can be loaded by just copying the contents right into an address in memory.
The first 16 bits of the program file specify the address in memory where the program should start. This address is called the origin. It must be read first, after which the rest of the data can be read from the file into memory starting at the origin address.
Here is the code for reading an LC-3 program into memory:
void read_image_file(FILE* file)
{
    /* the origin tells us where in memory to place the image */
    uint16_t origin;
    fread(&origin, sizeof(origin), 1, file);
    origin = swap16(origin);

    /* we know the maximum file size so we only need one fread */
    /* size_t avoids truncation if MEMORY_MAX - origin does not fit in 16 bits
       (for example origin 0 with a 65536-word memory) */
    size_t max_read = MEMORY_MAX - origin;
    uint16_t* p = memory + origin;
    size_t read = fread(p, sizeof(uint16_t), max_read, file);

    /* swap to little endian */
    while (read-- > 0)
    {
        *p = swap16(*p);
        ++p;
    }
}
Notice that swap16 is called on each loaded value. LC-3 programs are big-endian, but most modern computers are little-endian. So, we need to swap each uint16 that is loaded. (If you happen to be using an obscure computer, like an old PPC Mac, then do not swap.)
Note: Endianness refers to how bytes of an integer are interpreted. In little-endian, the first byte is the least significant digit, and in big-endian, it is reversed. As far as I know, the decision is mostly arbitrary. Different companies made different decisions, so now we are left with varying implementations. You do not need to know anything else about endianness for this project.
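A minimal sketch of what swap16 could look like, assuming it simply exchanges the two bytes of a 16-bit word:

```c
#include <stdint.h>

/* Exchange the high and low bytes of a 16-bit value. */
uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}
```

For example, an origin stored in the file as the bytes 0x30 0x00 is read on a little-endian machine as 0x0030, and swapping turns it into the intended 0x3000. Applying the function twice gives back the original value.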
Let’s also add a convenience function for read_image_file which takes a path as a string:
int read_image(const char* image_path)
{
    FILE* file = fopen(image_path, "rb");
    if (!file) { return 0; }
    read_image_file(file);
    fclose(file);
    return 1;
}
10. Memory mapped registers
Some special registers are not accessible from the normal register table. Instead, a special address is reserved for them in memory. To read and write to these registers, you just read and write to their memory location. These are called memory mapped registers. They are commonly used to interact with special hardware devices.
The LC-3 has two memory mapped registers that need to be implemented. They are the keyboard status register (KBSR) and keyboard data register (KBDR). The KBSR indicates whether a key has been pressed, and the KBDR identifies which key was pressed.
Although you can request keyboard input using GETC, this blocks execution until input is received. KBSR and KBDR allow you to poll the state of the device and continue execution, so the program can stay responsive while waiting for input.
enum
{
    MR_KBSR = 0xFE00, /* keyboard status */
    MR_KBDR = 0xFE02  /* keyboard data */
};
Memory mapped registers make memory access a bit more complicated. We can’t read and write to the memory array directly, but must instead call setter and getter functions. When memory is read from KBSR, the getter will check the keyboard and update both memory locations.
void mem_write(uint16_t address, uint16_t val)
{
    memory[address] = val;
}

uint16_t mem_read(uint16_t address)
{
    if (address == MR_KBSR)
    {
        if (check_key())
        {
            memory[MR_KBSR] = (1 << 15);
            memory[MR_KBDR] = getchar();
        }
        else
        {
            memory[MR_KBSR] = 0;
        }
    }
    return memory[address];
}
That completes the last component of the VM! Provided that you implemented the rest of the trap routines and instructions, you are almost ready to try it out!
11. Platform specifics
This section contains some tedious details that are needed to access the keyboard and behave nicely. These are not insightful or relevant to learning about VMs. Feel free to copy paste! These functions should be declared above your main function.
Linux/macOS/UNIX
NOTE: Skip to the next section for the Windows versions of these functions.
struct termios original_tio;

void disable_input_buffering()
{
    tcgetattr(STDIN_FILENO, &original_tio);
    struct termios new_tio = original_tio;
    new_tio.c_lflag &= ~ICANON & ~ECHO;
    tcsetattr(STDIN_FILENO, TCSANOW, &new_tio);
}

void restore_input_buffering()
{
    tcsetattr(STDIN_FILENO, TCSANOW, &original_tio);
}

uint16_t check_key()
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(STDIN_FILENO, &readfds);

    struct timeval timeout;
    timeout.tv_sec = 0;
    timeout.tv_usec = 0;
    return select(1, &readfds, NULL, NULL, &timeout) != 0;
}
#include <stdio.h>
#include <stdint.h>
#include <signal.h>
/* unix only */
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/termios.h>
#include <sys/mman.h>
Windows
NOTE: If you already included the Unix versions don’t add these!
HANDLE hStdin = INVALID_HANDLE_VALUE;
DWORD fdwMode, fdwOldMode;

void disable_input_buffering()
{
    hStdin = GetStdHandle(STD_INPUT_HANDLE);
    GetConsoleMode(hStdin, &fdwOldMode); /* save old mode */
    fdwMode = fdwOldMode
            ^ ENABLE_ECHO_INPUT  /* no input echo */
            ^ ENABLE_LINE_INPUT; /* return when one or
                                    more characters are available */
    SetConsoleMode(hStdin, fdwMode); /* set new mode */
    FlushConsoleInputBuffer(hStdin); /* clear buffer */
}

void restore_input_buffering()
{
    SetConsoleMode(hStdin, fdwOldMode);
}

uint16_t check_key()
{
    return WaitForSingleObject(hStdin, 1000) == WAIT_OBJECT_0 && _kbhit();
}
#include <stdio.h>
#include <stdint.h>
#include <signal.h>
/* windows only */
#include <Windows.h>
#include <conio.h>  // _kbhit
All platforms
To properly handle input to the terminal, we need to adjust some buffering settings. The implementation for these varies for each platform and should have been defined above.
We include this setup code at the beginning of the program (start of main).
When the program is interrupted, we want to restore the terminal settings back to normal. This should be done at the end of the program.
Settings should also be restored if we receive a signal to end the program.
void handle_interrupt(int signal)
{
    restore_input_buffering();
    printf("\n");
    exit(-2);
}
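One plausible way to wire these pieces into the Setup and Shutdown steps of main is sketched below. The function names vm_setup and vm_shutdown are hypothetical, and the two buffering functions are replaced by stand-ins that only record a flag so the snippet compiles on its own; in the VM they would be the platform-specific versions defined above.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins for the platform-specific functions: they only track state. */
int buffering_disabled = 0;
void disable_input_buffering(void) { buffering_disabled = 1; }
void restore_input_buffering(void) { buffering_disabled = 0; }

/* Same handler as above, with the parameter renamed to avoid shadowing signal(). */
void handle_interrupt(int sig)
{
    (void)sig; /* unused */
    restore_input_buffering();
    printf("\n");
    exit(-2);
}

/* Hypothetical Setup step: run once at the start of main. */
void vm_setup(void)
{
    signal(SIGINT, handle_interrupt); /* restore the terminal on Ctrl-C */
    disable_input_buffering();
}

/* Hypothetical Shutdown step: run once after the main loop exits. */
void vm_shutdown(void)
{
    restore_input_buffering();
}
```

Registering the handler before changing the terminal mode means an interrupt arriving at any later point still restores the original settings.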
Everything we have written so far should have been added to the C file in the following order:
@{Memory Mapped Registers}
@{TRAP Codes}
@{Memory Storage}
@{Register Storage}
@{Input Buffering}
@{Handle Interrupt}
@{Sign Extend}
@{Swap}
@{Update Flags}
@{Read Image File}
@{Read Image}
@{Memory Access}
@{Main Loop}
12. Running the VM
You can now build and run the LC-3 VM!
1. Compile the VM with your favorite C compiler. (Final source for unix, windows).
   $ gcc lc3.c -o lc3-vm
2. Run the VM with the .obj file as an argument:
   $ lc3-vm path/to/2048.obj
3. Play 2048!
   Control the game using WASD keys.
   Are you on an ANSI terminal (y/n)? y
   +--------------------------+
   |                          |
   |                          |
   |                          |
   |                     2    |
   |                          |
   |   2                      |
   |                          |
   |                          |
   |                          |
   +--------------------------+
Debugging
If the program doesn’t work correctly, it is likely because you programmed an instruction incorrectly. This can be tricky to debug. I recommend reading through the assembly source code of an LC-3 program while simultaneously using a debugger to step through the VM instructions one at a time. As you read the assembly, make sure the VM goes to the instruction that you expect it to. If a discrepancy occurs, you will then know which instruction caused the issue. Reread its specification and double check your code.
13. Alternate C++ technique
This section shows an advanced way of executing instructions that makes the code a whole lot smaller. This section is entirely optional.
Since C++ has powerful compile-time generics, we can use the compiler to generate parts of the instructions for us. This method reduces code duplication and is actually closer to how the computer would be wired in hardware.
The idea here is to reuse the steps common to each instruction. For example, several of the instructions use indirect addressing or sign extend a value and add it to the current PC value. Wouldn’t it be nice if we could write this code once for all instructions?
By thinking of the instruction as a pipeline of steps we can see that each instruction is just a permutation of several of the smaller steps. We will use bitwise flags to mark which steps to perform for each instruction. A 1 in the bit corresponding to the instruction number indicates the compiler should include this section of code for that instruction.
template <unsigned op>
void ins(uint16_t instr)
{
    uint16_t r0, r1, r2, imm5, imm_flag;
    uint16_t pc_plus_off, base_plus_off;

    constexpr uint16_t opbit = (1 << op);
    if (0x4EEE & opbit) { r0 = (instr >> 9) & 0x7; }
    if (0x12F3 & opbit) { r1 = (instr >> 6) & 0x7; }
    if (0x0022 & opbit)
    {
        imm_flag = (instr >> 5) & 0x1;

        if (imm_flag)
        {
            imm5 = sign_extend(instr & 0x1F, 5);
        }
        else
        {
            r2 = instr & 0x7;
        }
    }
    if (0x00C0 & opbit)
    {
        // Base + offset
        base_plus_off = reg[r1] + sign_extend(instr & 0x3F, 6);
    }
    if (0x4C0D & opbit)
    {
        // Indirect address
        pc_plus_off = reg[R_PC] + sign_extend(instr & 0x1FF, 9);
    }
    if (0x0001 & opbit)
    {
        // BR
        uint16_t cond = (instr >> 9) & 0x7;
        if (cond & reg[R_COND]) { reg[R_PC] = pc_plus_off; }
    }
    if (0x0002 & opbit)  // ADD
    {
        if (imm_flag)
        {
            reg[r0] = reg[r1] + imm5;
        }
        else
        {
            reg[r0] = reg[r1] + reg[r2];
        }
    }
    if (0x0020 & opbit)  // AND
    {
        if (imm_flag)
        {
            reg[r0] = reg[r1] & imm5;
        }
        else
        {
            reg[r0] = reg[r1] & reg[r2];
        }
    }
    if (0x0200 & opbit) { reg[r0] = ~reg[r1]; } // NOT
    if (0x1000 & opbit) { reg[R_PC] = reg[r1]; } // JMP
    if (0x0010 & opbit)  // JSR
    {
        uint16_t long_flag = (instr >> 11) & 1;
        reg[R_R7] = reg[R_PC];
        if (long_flag)
        {
            pc_plus_off = reg[R_PC] + sign_extend(instr & 0x7FF, 11);
            reg[R_PC] = pc_plus_off;
        }
        else
        {
            reg[R_PC] = reg[r1];
        }
    }

    if (0x0004 & opbit) { reg[r0] = mem_read(pc_plus_off); } // LD
    if (0x0400 & opbit) { reg[r0] = mem_read(mem_read(pc_plus_off)); } // LDI
    if (0x0040 & opbit) { reg[r0] = mem_read(base_plus_off); }  // LDR
    if (0x4000 & opbit) { reg[r0] = pc_plus_off; } // LEA
    if (0x0008 & opbit) { mem_write(pc_plus_off, reg[r0]); } // ST
    if (0x0800 & opbit) { mem_write(mem_read(pc_plus_off), reg[r0]); } // STI
    if (0x0080 & opbit) { mem_write(base_plus_off, reg[r0]); } // STR
    if (0x8000 & opbit)  // TRAP
    {
        @{TRAP}
    }
    //if (0x0100 & opbit) { } // RTI
    if (0x4666 & opbit) { update_flags(r0); }
}
static void (*op_table[16])(uint16_t) = {
    ins<0>, ins<1>, ins<2>, ins<3>,
    ins<4>, ins<5>, ins<6>, ins<7>,
    NULL, ins<9>, ins<10>, ins<11>,
    ins<12>, NULL, ins<14>, ins<15>
};
Note: I learned about this technique from Bisqwit’s NES emulator. If you are interested in emulation, or the NES, I highly recommend his videos.
The rest of the C++ version uses the code we already wrote! The full source is here: unix, windows.
14. Contributions
atul-g has contributed a handy reference card that summarizes how the whole system works.
Many programmers have completed this tutorial and shared their implementations in various languages. A selection of these used to be listed, but since there were so many, we decided to utilize GitHub tags to organize them.
To list your own project, just make sure it is tagged with the GitHub topic lc3
. If your language is missing, feel free to submit a pull request.
Special thanks to inkydragon for contributing Windows platform support.
Want to contribute? We need help with an integration test. This is a good first issue to learn from.