Aho, Alfred V., Lam, Monica S., Sethi, Ravi
and Ullman, Jeffrey D.: Compilers: Principles, Techniques,
and Tools. 2nd edition. Addison-Wesley, 2006.
The first edition of the textbook is also quite workable, especially for the earlier units of this course.
Students might find the O'Reilly book on Lex and Yacc (two tools
used to automate the front end of the compiler) helpful.
However, much of the information in this book is dated, and the
same material is available for free online or in the man pages,
which students should absolutely read thoroughly!
The make program will prove very useful in developing the compiler.
There is an O'Reilly book on make, but again for the purposes of
this project, there is enough free information available online.
The definitive reference for the C language is the ISO standard. The 1999 standard introduced a number of language extensions, some of which we may choose to ignore in class. A working draft of that standard was openly available and a copy may be downloaded here. A more recent revision known as "C11", which was adopted in 2011, is here.
For the purposes of this course, in which we are not actually trying to
build a production compiler, any of these standards can be used for
reference.
Another recommended book is:
Harbison, Samuel P. and Steele, Guy L.: C: A Reference Manual.
5th edition. Prentice Hall, 2006.
This is basically a regurgitation of the ISO C standard, but with better examples and explanations, and a discussion of backwards compatibility issues with ANSI C (1989 standard) and classic K&R C.
This is a graduate-level course. Students are expected to do the
necessary readings and research on their own. Lecture notes will
be posted here, but they do not comprise the entirety of the course material
for which the student is responsible.
The course is divided into several units, outlines of which are
presented below. These units will not necessarily be confined to
a single week.
Note: lecture notes are in PostScript, which can be output directly
to any PostScript printer, or viewed with a free viewer such as ghostscript.
Assignments may be in text or PostScript form.
Special COVID note: Online lectures will follow the lecture notes,
all of which are published here in advance. You are strongly encouraged to
read these notes beforehand. Since this is a graduate-level course, it is
also appreciated if you participate in the online class and come prepared
to discuss the material, ask questions, etc.
Topics: Errors vs warnings, error recovery, error reporting,
memory allocation strategies within the compiler.
Lecture Notes (PS Format)
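One memory allocation strategy often used inside compilers is an arena (region) allocator: nodes are carved out of one large block and released all at once. The following is only a minimal sketch of that idea, not the approach taken in the lecture notes. The names `struct arena`, `arena_init`, `arena_alloc`, `arena_release` and the 16-byte alignment are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed alignment; large enough for common types on typical targets. */
enum { ARENA_ALIGN = 16 };

/* Hypothetical arena: one backing block plus a bump pointer. */
struct arena {
    char *base;   /* start of the backing block */
    size_t used;  /* bytes handed out so far (always a multiple of ARENA_ALIGN) */
    size_t cap;   /* total size of the backing block */
};

/* Returns 0 on success, -1 if the backing block cannot be allocated. */
int arena_init(struct arena *a, size_t cap)
{
    assert(a != NULL);
    a->base = malloc(cap);
    a->used = 0;
    a->cap = a->base ? cap : 0;
    return a->base ? 0 : -1;
}

/* Bump-pointer allocation; returns NULL when the arena is exhausted. */
void *arena_alloc(struct arena *a, size_t n)
{
    /* Round the request up to the next multiple of ARENA_ALIGN. */
    size_t padded = (n + ARENA_ALIGN - 1) / ARENA_ALIGN * ARENA_ALIGN;

    if (padded < n || padded > a->cap - a->used)
        return NULL;  /* arithmetic overflow or not enough room */

    void *p = a->base + a->used;
    a->used += padded;
    return p;
}

/* Frees every object in the arena at once. */
void arena_release(struct arena *a)
{
    free(a->base);
    a->base = NULL;
    a->used = 0;
    a->cap = 0;
}
```

A compiler built this way would typically call `arena_init` once, allocate each AST node or symbol with `arena_alloc`, and call `arena_release` when the translation unit is finished, so no per-node `free` calls are needed.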
Topics: compile-time vs run-time, syntax-directed translation, abstract
syntax trees, type systems, symbol tables and scoping rules, feedback
between lexer, parser and semantic stages, overview of C language
type and namespace issues.
Lecture Notes (PS Format)
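Scoping rules can be pictured as a stack of symbol tables, where lookup walks outward from the innermost scope. The sketch below is an illustration under simplifying assumptions, not the design required for the project: it keeps a single namespace (C itself has separate namespaces for tags, labels, members and ordinary identifiers), it rejects every same-scope redeclaration (stricter than C), and it uses linked lists where a real table would probably hash. All names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct symbol {
    const char *name;     /* identifier spelling (not copied) */
    int decl_line;        /* line of the declaration, for diagnostics */
    struct symbol *next;  /* next symbol in the same scope */
};

struct scope {
    struct symbol *symbols;  /* symbols declared directly in this scope */
    struct scope *parent;    /* enclosing scope, NULL at file scope */
};

/* Opens a new scope nested inside parent; returns NULL on allocation failure. */
struct scope *scope_push(struct scope *parent)
{
    struct scope *s = calloc(1, sizeof *s);
    if (s)
        s->parent = parent;
    return s;
}

/* Closes a scope, frees its symbols, and returns the enclosing scope. */
struct scope *scope_pop(struct scope *s)
{
    struct scope *parent = s->parent;
    while (s->symbols) {
        struct symbol *next = s->symbols->next;
        free(s->symbols);
        s->symbols = next;
    }
    free(s);
    return parent;
}

/* Searches one scope only, without looking at enclosing scopes. */
static struct symbol *scope_find_local(const struct scope *s, const char *name)
{
    for (struct symbol *sym = s->symbols; sym; sym = sym->next)
        if (strcmp(sym->name, name) == 0)
            return sym;
    return NULL;
}

/* Searches the innermost scope first, then each enclosing scope in turn. */
struct symbol *scope_lookup(const struct scope *s, const char *name)
{
    for (; s; s = s->parent) {
        struct symbol *sym = scope_find_local(s, name);
        if (sym)
            return sym;
    }
    return NULL;
}

/* Declares name in s; returns NULL on same-scope redeclaration or if
   memory runs out. */
struct symbol *scope_declare(struct scope *s, const char *name, int line)
{
    if (scope_find_local(s, name))
        return NULL;
    struct symbol *sym = malloc(sizeof *sym);
    if (!sym)
        return NULL;
    sym->name = name;
    sym->decl_line = line;
    sym->next = s->symbols;
    s->symbols = sym;
    return sym;
}
```

With this layout, a declaration in an inner block shadows an outer one simply because `scope_lookup` finds it first, and popping the block makes the outer declaration visible again.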
Graphical vs linear IR, internal vs external IR. 1-address and 3-address
linear IRs. Overview of IR approaches of popular languages including
Perl, Python, Java, PHP. Overview of LLVM IR. Techniques for generating
IR from AST.
Lecture notes
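One common way to generate a 3-address linear IR from an AST is a post-order walk that emits one quad per operator node and names each intermediate result with a fresh temporary. The sketch below illustrates that idea for binary expressions only. The structures, the `t0`, `t1`, ... temporary naming, and the fixed-size buffers are assumptions for illustration, not the quad format used in the course.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { MAX_QUADS = 64, NAME_LEN = 16 };

/* Hypothetical expression AST: binary operator nodes and named leaves. */
struct ast {
    char op;                    /* '+', '-', '*', '/' for binary nodes, 0 for a leaf */
    const char *name;           /* leaf only: variable name or literal text */
    struct ast *left, *right;   /* binary nodes only */
};

/* One 3-address instruction: dst = src1 op src2. */
struct quad {
    char op;
    char dst[NAME_LEN], src1[NAME_LEN], src2[NAME_LEN];
};

struct quad_list {
    struct quad q[MAX_QUADS];
    int count;       /* quads emitted so far */
    int next_temp;   /* number of the next fresh temporary */
};

void quad_list_init(struct quad_list *ql)
{
    memset(ql, 0, sizeof *ql);
}

/* Emits quads for the subtree rooted at n (post-order) and writes the name
   that holds its value into out. Names longer than the buffer are truncated.
   Returns 0 on success, -1 if the quad list is full. */
int gen_expr(struct quad_list *ql, const struct ast *n, char *out, size_t outsz)
{
    if (n->op == 0) {                       /* leaf: no code, value is the name */
        snprintf(out, outsz, "%s", n->name);
        return 0;
    }

    char l[NAME_LEN], r[NAME_LEN];
    if (gen_expr(ql, n->left, l, sizeof l) != 0 ||
        gen_expr(ql, n->right, r, sizeof r) != 0)
        return -1;
    if (ql->count >= MAX_QUADS)
        return -1;

    struct quad *q = &ql->q[ql->count++];
    q->op = n->op;
    /* A real compiler would keep temporaries apart from user identifiers. */
    snprintf(q->dst, sizeof q->dst, "t%d", ql->next_temp++);
    snprintf(q->src1, sizeof q->src1, "%s", l);
    snprintf(q->src2, sizeof q->src2, "%s", r);
    snprintf(out, outsz, "%s", q->dst);
    return 0;
}
```

For the expression `a + b * c`, this walk emits the multiplication first (`t0 = b * c`) and then the addition (`t1 = a + t0`), which mirrors the evaluation order implied by the tree.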
Having completed AST generation and declaration/symbol handling for the
entire language, it is now time to enter the final phase: "how do we do it?"
Theory of optimization at the quad level. Examples of several optimization
algorithms.
Lecture notes
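As one small example of a quad-level optimization, constant folding replaces a quad whose operands are both compile-time constants with a plain copy of the computed value. The sketch below is an illustration only and is not taken from the lecture notes. The quad layout, the `'='` copy opcode, and the function name `fold_constants` are assumptions. Division is left out to avoid dealing with division by zero, and a real compiler would also have to fold using the target's arithmetic rules (for instance, overflow behavior).

```c
#include <assert.h>
#include <stddef.h>

enum operand_kind { OPERAND_CONST, OPERAND_TEMP };

struct operand {
    enum operand_kind kind;
    int value;  /* the constant itself, or the temporary's number */
};

/* dst = src1 op src2, or dst = src1 when op is '=' (src2 unused). */
struct quad {
    char op;             /* '+', '-', '*', or '=' for a copy */
    int dst;             /* number of the destination temporary */
    struct operand src1, src2;
};

/* Single pass over n quads: rewrites "dst = c1 op c2" as "dst = c".
   Returns the number of quads folded. Does not propagate the folded
   constants into later quads; that would be a separate pass. */
int fold_constants(struct quad *q, int n)
{
    assert(q != NULL || n == 0);

    int folded = 0;
    for (int i = 0; i < n; i++) {
        if (q[i].src1.kind != OPERAND_CONST || q[i].src2.kind != OPERAND_CONST)
            continue;

        int a = q[i].src1.value, b = q[i].src2.value, r;
        switch (q[i].op) {
        case '+': r = a + b; break;
        case '-': r = a - b; break;
        case '*': r = a * b; break;
        default:  continue;  /* copies and unknown opcodes are left alone */
        }

        q[i].op = '=';
        q[i].src1.value = r;
        q[i].src2.value = 0;
        folded++;
    }
    return folded;
}
```

Running this over `t0 = 6 * 7; t1 = t0 + 1` rewrites the first quad to `t0 = 42` and leaves the second untouched, since one of its operands is still a temporary. Combining it with constant propagation would allow the second quad to fold as well.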
We'll explore the X86 assembly language in some depth, with special
attention to variables, addressing modes, and function calling/register
usage conventions. If time permits, we'll contrast this CISC architecture
with a RISC architecture.
Overview of instruction selection and register allocation.
Challenges in one- and two-address architectures.
Lecture notes unit 8
It's the final curtain!
Here are some resources to help you complete this last assignment:
Note: these are old; I'm looking for more recent versions.
Intel X86 reference manual (over 2000 pages!)
Note that the instruction set is presented according to Intel syntax,
which differs from the "UNIX" or "AT&T" syntax used on
Solaris, Linux and BSD systems.
Sun Microsystems x86 Assembly Language Reference
This booklet presents an overview of the X86 assembler in UNIX notation and
provides a translation of opcode names and addressing mode nomenclature between
UNIX and Intel formats. It does not describe the individual instructions
in detail.
SPARC Assembly Reference Manual
Provides an overview of the SPARC V8 and V9 assemblers. It doesn't give a
detailed explanation of each instruction, but there is a summary of all
instructions and addressing modes.
SPARC Architecture Manual
Provides detailed coverage of the SPARC V9 architecture including
operation of the processor, register windowing, addressing modes,
opcodes and instruction formats. This manual was written by the
SPARC International group and as such is OS and assembler agnostic.
But unlike the Intel reference manuals, the notation used follows
the UNIX standard very closely.