ccmic is a C++ implementation of a C compiler for a non-trivial subset of the C programming language, conforming to the C17 standard. The compiler's design is based on the principles and practices outlined in Writing a C Compiler by Nora Sandler. It is continuously developed and tested against the the book's companion test suite (hosted by Nora Sandler) (also included in the repository as a submodule (tests/)). As new features and optimizations are integrated into the codebase, the compiler's capabilities continue to expand modularly.
The compiler transforms C source code into X86-64 assembly through a multi-stage pipeline:
- Lexing (Lexical Analysis): Regex-based tokenization of C source.
- Parsing (Syntactic Analysis): Recursive descent parsing with precedence climbing for abstract syntax tree (AST) construction.
- Leverages the Visitor design pattern for AST traversal(s), where
ASTnodes accept aVisitorinterface (defined insrc/frontend/visitor.h), in which the design separates algorithms (e.g., pretty-printing, semantic analysis, IR generation) from the object structure, enabling the addition of new operations without modifying the AST classes.
- Leverages the Visitor design pattern for AST traversal(s), where
- Semantic Analysis: Type checking, symbol resolution, and loop labeling for AST validation.
- IR Generation: AST lowering to a custom intermediate representation (IR).
- Code Generation (Assembly Generation): IR-to-assembly translation, stack allocation, and fixup passes for X86-64.
- Assembly Emission: Final X86-64 assembly output ready for assembling and linking to an executable.
╭────────────────────────╮
│ Lexing │
│ (Lexical Analysis) │
╰────────────────────────╯
│
▼
╭──────────────────────────╮
│ Parsing │
│ (Syntactic Analysis) │
╰──────────────────────────╯
│
▼
╭───────────────────────╮
│ Semantic Analysis │
╰───────────────────────╯
│
▼
╭───────────────────────╮
│ IR Generation │
╰───────────────────────╯
│
▼
╭───────────────────────────╮
│ Code Generation │
│ (Assembly Generation) │
╰───────────────────────────╯
│
▼
╭───────────────────────╮
│ Assembly Emission │
╰───────────────────────╯
- Signed integers:
int(32-bit) andlong(64-bit). - Unsigned integers:
unsigned int(32-bit) andunsigned long(64-bit).
- Unary operations:
-,~, and!. - Binary operations: Arithmetic (
+,-,*,/,%), bitwise (&,|), assignment (=), logical (&&,||), and relational (<,>,<=,>=,==,!=). - Type conversions: Explicit casts and implicit conversions following the usual arithmetic conversion rules.
- Local variables: Declarations and usage within functions and scopes.
- Control flow:
if/elsestatements and conditional expressions (? :). - Compound statements: Nested blocks using
{}. - Loops:
for,while, anddo-whileloops withbreakandcontinue.
- Function definitions: Parameters and return types.
- Function calls: User-defined and standard library functions.
- Global variables: File-scope declarations and initialization.
- Storage specifiers:
staticandexternfor visibility and linkage control.
The implementation is organized into several key directories and file(s):
- src/frontend/: Lexer, parser, AST, and semantic analysis.
- src/midend/: IR generation (and optimization passes to be implemented).
- src/backend/: Assembly generation, stack allocation, and fixup passes.
- src/utils/: Pipeline orchestration, including assembly emission, and pretty-printers for debugging.
- src/main.cpp: Entry point of the compiler, orchestrating the compilation pipeline based on command-line arguments.
➜ ccmic git:(main) tree src
src
├── backend
│ ├── assembly.cpp
│ ├── assembly.h
│ ├── assemblyGenerator.cpp
│ ├── assemblyGenerator.h
│ ├── backendSymbolTable.cpp
│ ├── backendSymbolTable.h
│ ├── fixupPass.cpp
│ ├── fixupPass.h
│ ├── pseudoToStackPass.cpp
│ └── pseudoToStackPass.h
├── frontend
│ ├── ast.h
│ ├── block.cpp
│ ├── block.h
│ ├── blockItem.cpp
│ ├── blockItem.h
│ ├── constant.cpp
│ ├── constant.h
│ ├── declaration.cpp
│ ├── declaration.h
│ ├── expression.cpp
│ ├── expression.h
│ ├── forInit.cpp
│ ├── forInit.h
│ ├── frontendSymbolTable.h
│ ├── function.cpp
│ ├── function.h
│ ├── lexer.cpp
│ ├── lexer.h
│ ├── operator.cpp
│ ├── operator.h
│ ├── parser.cpp
│ ├── parser.h
│ ├── printVisitor.cpp
│ ├── printVisitor.h
│ ├── program.cpp
│ ├── program.h
│ ├── semanticAnalysisPasses.cpp
│ ├── semanticAnalysisPasses.h
│ ├── statement.cpp
│ ├── statement.h
│ ├── storageClass.cpp
│ ├── storageClass.h
│ ├── type.cpp
│ ├── type.h
│ └── visitor.h
├── main.cpp
├── midend
│ ├── ir.cpp
│ ├── ir.h
│ ├── irGenerator.cpp
│ ├── irGenerator.h
│ ├── irOptimizationPasses.cpp
│ └── irOptimizationPasses.h
└── utils
├── compilerDriver.cpp
├── compilerDriver.h
├── constants.h
├── pipelineStagesExecutors.cpp
├── pipelineStagesExecutors.h
├── prettyPrinters.cpp
└── prettyPrinters.hgit clone --recurse-submodules https://github.com/zzmic/ccmic.git- Clang that supports C++23 (or above) for building the compiler.
- GCC that supports X86-64 and C17 for preprocessing, assembling, and linking.
# Remove any previous build artifacts if necessary.
make clean
# Build the compiler with the number of available CPU cores for parallel compilation.
make -j$(nproc)bin/main [--lex] [--parse] [--validate] [--tacky] [--codegen] [-S] [-s] [-c] [-o <outputFile>] <sourceFile>- Pipeline control:
--lex(lexical analysis),--parse(syntactic analysis),--validate(semantic analysis),--tacky(IR generation), and--codegen(code generation). - Output options:
-Sor-s(assembly emission),-c(object file emission), and-o <outputFile>(specify output file, default to the program name (i.e., the base name of the source file)). - Optimizations (to be implemented):
--fold-constants(constant folding),--eliminate-unreachable-code(dead code elimination),--propagate-copies(copy propagation),--eliminate-dead-stores(dead store elimination), and--optimize(enable all optimizations).
To generate a compile_commands.json file for use with tools that support the JSON compilation database format, run the following command (assuming that compiledb is installed):
compiledb make -j$(nproc)or
make -j$(nproc) compiledbTo perform static analysis on the codebase using Clang-Tidy, run the following command (assuming that the LLVM toolchain is installed) with the compilation database generated in the previous step/section:
run-clang-tidy -j$(nproc) -p .or
make -j$(nproc) tidyThe checks and options for Clang-Tidy are configured (and can be further customized) in .clang-tidy.
- Adding language features: Expand the frontend in
src/frontend/by modifying the lexer, parser, and AST nodes, as well as updating semantic analysis. - Adding optimizations: Implement new optimization passes in
src/midend/by following the existingIR::OptimizationPasspattern. - Extending code generation: Modify or add new code generation strategies in
src/backend/. - Debugging: Leverage the implemented pretty-printers for IR and assembly inspection in
src/utils/(andgdb/lldb). - Testing: Run tests using the companion test suite linked in the overview section.
This project is not an orthodox implementation or artifact from the referenced literature. Therefore, I should be accountable for any errors and flaws in the implementation and documentation, not the original authors of the sources.