Writing a simple Compiler on my own - Revisit Queue and Parameter Checking (part 3) [C][Flex][Bison]




[Custom Thumbnail]

All the Code of the series can be found at the Github repository:
https://github.com/drifter1/compiler


Introduction

    Hello it's a me again @drifter1! Today we continue with my Compiler Series, a series where we implement a complete compiler for a simple C-like language by using the C-tools Flex and Bison. In this article we will get into the rest of the changes in functions that are needed for the Revisit Queue..

The topics that we will cover today are:

  1. Having a common Symbol Table Entry
  2. Function Parameter Check Function

Requirements:

    Actually you need to read and understand all the topics that I covered in the series as a whole, as these articles will give you access to knowledge about:
  • What Compiler Design is (mainly the steps)
  • For which exact Language the Compiler is build for (Tokens and Grammar)
  • How to use Flex and Bison
  • How to implement a lexer and parser for the language using those tools
  • What the Symbol Table is and how we implement it
  • How we combine Flex and Bison together
  • How we can pass information from the Lexer to the Parser
  • How we define operator priorities, precedencies and associativity
  • What Semantic Analysis is (Attributes, SDT etc.)
  • How we do the so called "Scope Resolution"
  • How we declare types and check the type inter-compatibility for different cases, including function parameters
  • How we check function calls later on using a "Revisit queue", as function declarations mostly happen after functions get used (in our Language)
  • Intermediate Code Representations, including AST's
  • How we implement an AST (structure and management)
  • Action Rules for AST nodes and more

Difficulty:

Talking about the series in general this series can be rated:
  • Intermediate to Advanced
Today's topic(s) can be rated:
  • Medium
So, without further ado, let's now finally start with the actual Tutorial...

Actual Tutorial Content

Having a common Symbol Table Entry

    As we described the problem till now, the symbol table entry is something that the function call and declaration have in common. The symbol table entry is representing the function identifier itself and is clearly independent of the way that we use this identifier. So, what needs to be managed now is making sure that the function calls and declaration are all "connected" to the same identifier, and so symbol table entry/item. This can be made even clearer if you take a look back to the visualization of part 1, which was:


    Thinking about the way that we managed declarations of identifiers in general, we can easily understand that we will use the same principle for function declarations too! We will simply define a new flag variable, let's say "function_decl", that will get a value of '1' whenever we are declaring a function. This will help the "insert" function distinguish that the change in scope of the function (scope == '1' instead of '0' which was the function call) is unimportant, so that we can store the return datatype, parameter count and datatype of each parameter, in the previous entry, without creating a new one. Having no other entry in the Symbol Table, the declaration is the first thing that occurs (the case of declaring "print" by ourselves), creating a new entry normally.

In Code all that looks like this:
symtab.c:
...
/* flag variable for function declaring */ int function_decl = 0; // 1: declaring function, 0: not
...
void insert(char *name, int len, int type, int lineno){ unsigned int hashval = hash(name); list_t *l = hash_table[hashval];
while ((l != NULL) && (strcmp(name,l->st_name) != 0)) l = l->next;
/* variable not yet in table */ if (l == NULL){ /* check if we are declaring */ if(declare == 1){ /* set up entry */ l = (list_t*) malloc(sizeof(list_t)); strncpy(l->st_name, name, len); l->st_size = len; l->st_type = type; l->scope = cur_scope; l->lines = (RefList*) malloc(sizeof(RefList)); l->lines->lineno = lineno; l->lines->next = NULL;
/* add to hashtable */ l->next = hash_table[hashval]; hash_table[hashval] = l; // printf("Inserted %s for the first time with linenumber %d!\n", name, lineno); } else{ /* add it to check it again later */ l = (list_t*) malloc(sizeof(list_t)); strncpy(l->st_name, name, len); l->st_size = len; l->st_type = type; l->scope = cur_scope; l->lines = (RefList*) malloc(sizeof(RefList)); l->lines->lineno = lineno; l->lines->next = NULL; l->next = hash_table[hashval]; hash_table[hashval] = l; // printf("Inserted %s at line %d to check it again later!\n", name, lineno);
/* Adding identifier to the revisit queue */ add_to_queue(l, l->st_name, PARAM_CHECK); } } /* found in table */ else{ // just add line number if(declare == 0){ /* find last reference */ RefList *t = l->lines; while (t->next != NULL) t = t->next;
/* add linenumber to reference list */ t->next = (RefList*) malloc(sizeof(RefList)); t->next->lineno = lineno; t->next->next = NULL; // printf("Found %s again at line %d!\n", name, lineno); } /* new entry */ else{ /* same scope - multiple declaration error! */ if(l->scope == cur_scope){ fprintf(stderr, "A multiple declaration of variable %s at line %d\n", name, lineno); exit(1); } /* other scope - but function declaration */ else if(function_decl == 1){ /* find last reference */ RefList *t = l->lines; while (t->next != NULL) t = t->next;
/* add linenumber to reference list */ t->next = (RefList*) malloc(sizeof(RefList)); t->next->lineno = lineno; t->next->next = NULL; } /* other scope - create new entry */ else{ /* set up entry */ l = (list_t*) malloc(sizeof(list_t)); strncpy(l->st_name, name, len); l->st_size = len; l->st_type = type; l->scope = cur_scope; l->lines = (RefList*) malloc(sizeof(RefList)); l->lines->lineno = lineno; l->lines->next = NULL;
/* add to hashtable */ l->next = hash_table[hashval]; hash_table[hashval] = l; // printf("Inserted %s for a new scope with linenumber %d!\n", name, lineno); } } } }
...

    Note that this all is easy in our language as we don't have multiple scopes and managing the two scopes "global (0)" and "function (1)" is pretty easy. The interesting part of the "insert" function is that it get's called whenever an identifier is being found by the lexer. This means that managing how this function works is quite important! Either way, making this new change and managing the print function (next part) the symbol table dump file would return the following global scope for the "full_example.c" program:


Without hiding the scope (commenting out the "hide_scope" function) we have:


    This clearly shows us that the function call and declaration don't have different entries anymore! That way only the function parameters and variables declared inside of a function's body have a different scope...

Function Parameter Check Function

    We know that the correct parameter count and datatype of each parameter, which are given by the function declaration, are stored inside of the symbol table entry of the function. In the same notion the parameter datatypes and count for each function call, are stored inside of the revisit queue entry. Knowing all that we can easily loop through each function call, based on the "num_of_calls" variable, and check the count and afterwards the datatype compatibility for each parameter. The number of parameters is a simple if check, where we check if the number of parameters of the function call and declaration are equal, whilst the other get's managed by the semantic check function: "get_result_type".

In Code all this looks like this:
symtab.h:
...
int func_param_check(char *name, int num_of_calls, int** par_types, int *num_of_pars);
...
------------------------------------------------------------------------ symtab.c:
...
int func_param_check(char *name, int num_of_calls, int** par_types, int *num_of_pars){ int i, j, type_1, type_2;
/* lookup entry */ list_t *l = lookup(name);
/* for all function calls */ for(i = 0 ; i < num_of_calls ; i++){ /* check number of parameters */ if(l->num_of_pars != num_of_pars[i]){ fprintf(stderr, "Function call of %s has wrong num of parameters!\n", name); exit(1); } /* check if parameters are compatible */ for(j = 0; j < num_of_pars[i]; j++){ /* type of parameter in function declaration */ type_1 = l->parameters[j].par_type;
/* type of parameter in function call*/ type_2 = par_types[i][j];
/* check compatibility for function call */ get_result_type(type_1, type_2, NONE); /* error occurs automatically in the function */ } }
return 0; /* success */ }

    As you can also see from the comments, the function will return '0' when the check is successful! On an error we will exit the whole compilation procedure, printing out an error message before-hand.

    Another small thing that we should tweak is the "func_declare" function, to get rid of the segmentation fault error that occurs, whenever we use this function while we don't have a symbol table entry returned by the "lookup" function. To fix this we just have to add a simple NULL check for the returned symbol table entry 'l'. In code this looks like this:
symtab.h:
...
int func_declare(char *name, int ret_type, int num_of_pars, Param *parameters);
...
--------------------------------------------------------------- symtab.c:
...
int func_declare(char *name, int ret_type, int num_of_pars, Param *parameters){ /* lookup entry */ list_t *l = lookup(name);
if(l != NULL){ /* if type is not defined yet */ if(l->st_type == UNDEF){ /* entry is of function type */ l->st_type = FUNCTION_TYPE;
/* return type is ret_type */ l->inf_type = ret_type;
/* parameter stuff */ l->num_of_pars = num_of_pars; l->parameters = parameters;
return 0; /* success */ } /* already declared error */ else{ fprintf(stderr, "Function %s already declared!\n", name); exit(1); } } }
...

RESOURCES

References:

No references, just using code that I implemented in my previous articles.

Images:

All of the images are custom-made!

Previous parts of the series

General Knowledge and Lexical Analysis

Syntax Analysis

  • Syntax Analysis Theory
  • Bison basics
  • Creating a grammar for our Language
  • Combine Flex and Bison
  • Passing information from Lexer to Parser
  • Finishing Off The Grammar/Parser (part 1)
  • Finishing Off The Grammar/Parser (part 2)
  • Semantic Analysis (1)

  • Semantic Analysis Theory
  • Semantics Examples
  • Scope Resolution using the Symbol Table
  • Type Declaration and Checking
  • Function Semantics (part 1)
  • Function Semantics (part 2)
  • Intermediate Code Generation (AST)

  • Abstract Syntax Tree Principle
  • Abstract Syntax Tree Structure
  • Abstract Syntax Tree Management
  • Action Rules for Declarations and Initializations
  • Action Rules for Expressions
  • Action Rules for Assignments and Simple Statements
  • Action Rules for If-Else Statements
  • Action Rules for Loop Statements and some Fixes
  • Action Rules for Function Declarations (part 1)
  • Action Rules for Function Declarations (part 2)
  • Action Rules for Function Calls
  • Semantic Analysis (2)

  • Datatype attribute for Expressions
  • Type Checking for Assignments
  • Revisit Queue and Parameter Checking (part 1)
  • Revisit Queue and Parameter Checking (part 2)

  • Final words | Next up on the project

         And this is actually it for today's post! I hope that I explained everything as much as I needed to, meaning that you learned something out of it.
    Next up on this series are:
    • Semantic analysis (using even more action rules in Bison)
    • Machine Code generation (MIPS Assembly)
         Which are all topics that will need more than one article to complete. Also, note that we might also get into Optimizations later on, or could even extend the Language by adding complex datatypes (structs and unions), more rules etc.
    So, see ya next time!

    GitHub Account:

    https://github.com/drifter1

    Keep on drifting! ;)

    H2
    H3
    H4
    3 columns
    2 columns
    1 column
    4 Comments
    Ecency