Writing a simple Compiler on my own - Introduction

    Hello it's a me again Drifter Programming! Today we start a new parted series, where I will get into all the steps needed to write a simple and basic compiler for some programming language. This post will give you information about what you have to know before getting into this series and the stuff that we will cover in this series. So, without further do, let's get started!

What is a Compiler?

    Compilers are programs that take a source code written in some programming language as input and translate it into machine code so that a specific CPU can run it.

    To do this more easily we split the compilation process into some steps, where each step can be approached on it's own using the output of the previous step mostly, but for simplicity here in this series we might also combine some steps into one program.

The steps are:

  1. Lexical Analysis (split the input up into tokens or lexical units)
  2. Syntax Analysis or Parsing (syntax tree of the tokens)
  3. Semantic Analysis (check correctness of syntax tree)
  4. Intermediate Code Generation (abstract syntax tree with important nodes only)
  5. Optimization (Optimize the AST by doing some operations)
  6. Machine code generation (Generate Assembly code for some CPU)

    I will not get more deeply into this, cause each step will be discussed in one or more parts of this series, where I will get into the theory behind it and also the implementation for most of it.


What you have to know before getting into Compiler Design

    Compiler Design is difficult and needs knowledge from various topics of Computer Science. You need to know the programming language C pretty good, cause the Lexer and Parser will be written in Flex and Bison that are C-based tools that help us with programming all those state-machines.


    You will also need knowledge in Data structures like Hashtables, Trees and Lists, cause the data of the ST, AST and the Symbol Table will be stored in such structures.

    

    Knowing some algorithms like Graph Coloring (that I will get into the following days, but implemented in Java for simplicity) is also good, cause the register-assigment-problem can be solved using such an algorithm. 


    The final and last thing you have to know is the Assembly language of the CPU you want the code to run. I know MIPS Assembly pretty well and also did a series about everything you need to know, and so we will write the Compiler for an MIPS CPU and so in MIPS Assembly that is actually not so difficult to write with.

    In my Recap of the year you can find my C (that also contain Datastructures) and Assembly posts and even more that might help you or that you might be interested in.


What will I learn watching this series?

    I have not selected a specific programming language to write with yet, but I'm thinking of some older language like C, Fortran, Pascal, Basic etc. that has simple syntax rules and that we can simplify even more if we want to.

    In this series we will get into all the steps needed to write an Compiler for an programming language by using the C-tools Flex and Bison that will help us in mostly everything. We will start by getting into Lexical Analysis and Regular Expressions to get the tokens, and then combine those Tokens with Bison in an Syntax Tree that will actually be an Abstract Syntax Tree, where we also check the semantics. After that we will get into some simple optimizations and finally into the machine code generation for an MIPS CPU.

The steps and the whole procedure look pretty simple in theory, but the coding will be difficult and will need a lot of thinking in some cases. The easiest part will be the Lexical and Syntax Analysis. After that "sh*t gets real" :P

I hope that I didn't scared you away :)

The whole process of writing an Compiler can give you a lot of useful knowledge!

    It's not about the compiler only. Think about the Datastructures and Algorithms that we need to get things running. The Tools that we will use. How much better we will understand how code is executed on an CPU and how much better we will get writing in C!


Image sources:


And this is actually it for today's post!

I hope that you will stay and start watching this series, cause the journey will be worth it!

Next time we will start getting into Lexical Analysis.

Bye!

H2
H3
H4
3 columns
2 columns
1 column
Join the conversation now
Logo
Center