CMSC 129 – Principles of Compiler Design – Compilers



CMSC 129 – Principles of Compiler Design – Compilers

0 0


cmsc129-slide2


On Github tjmonsi / cmsc129-slide2

CMSC 129

Principles of Compiler Design

Created by TJ Monserrat

Brief introduction of Designing Compilers

Structure of Compiler

Compilers

For the whole semester: We will learn how to design and implement compilers.

What are Compilers?

Programs that can turn everyday human language to things that can be executed by computers

Type of Compilers

Type of Compilers

Type of Compilers

What will we be doing for the project?

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

Parts of a Compiler

What will we be doing for the project?

What will we be doing for the project?

What will we be doing for the project?

Parts of Compiler: Lexical Analysis

Parts of Compiler: Lexical Analysis

Takes in Source Code or Character Stream and turns it into an array of tokens or Lexemes

Parts of Compiler: Lexical Analysis

Each token is in the form of:

  [token-name, attribute-value]
  • token-name - abstract symbol or key identifier that is used during syntax analysis
  • attribute-value - Points to an entry in the symbol table for this token.
    • That's why we need Symbol Tables...
    • Information from the symbol-table entry is needed for semantic analysis and code generation.

These tokens will be passed to the Syntax Analyzer

Parts of Compiler: Lexical Analysis

For Example

  position = initial + rate * 60
  • position is a lexeme that can mapped to a token of [id, 0].
    • id is an abstract symbol for it being an identifier or a variable
    • 0 points to the index of the symbol-table entry for the word position

Parts of Compiler: Lexical Analysis

For Example

  position = initial + rate * 60
  • = is a lexeme that can mapped to a token of [eq].
    • Since this token doesn't need an attribute, we can omit the second component
    • We could use any other token-name like '=' or 'assign'... as long as it serves the purpose: to identify that this is an assign statement.

Parts of Compiler: Lexical Analysis

For Example

  position = initial + rate * 60
  • initial is a lexeme that can mapped to a token of [id, 1].
    • 1 points to the index of the symbol-table entry for the word initial
  • + is a lexeme that can mapped to a token of [add].
  • rate is a lexeme that can mapped to a token of [id, 2].
    • 2 points to the index of the symbol-table entry for the word rate

Parts of Compiler: Lexical Analysis

For Example

  position = initial + rate * 60
  • * is a lexeme that can mapped to a token of [mul].
  • 60 is a lexeme that can mapped to a token of [number, 3].
    • 3 points to the index of the symbol-table entry for the value 60

Parts of Compiler: Lexical Analysis

This makes the example:

  position = initial + rate * 60

into...

  Lexemes:
  [ [id, 0], [eq], [id, 1], [add], [id, 2], [mul], [number, 3] ]

  SymbolTable: {
    0: { value: 'position', ... },
    1: { value: 'initial', ... },
    2: { value: 'rate', ... },
    3: { value: 60, ... }
  }

Parts of Compiler: Syntax Analysis

Parts of Compiler: Syntax Analysis

The Lexemes...

  [ [id, 0], [eq], [id, 1], [add], [id, 2], [mul], [number, 3] ]

...will need to be fed to the syntax analyzer to produce a syntax tree like this...

  [eq]
    |--------
    |       |
  [id, 0] [add]
            |--------
            |       |
         [id, 2]  [mul]
                    |---------
                    |        |
                 [id, 3]  [number, 60]
            

Parts of Compiler: Syntax Analysis

To be able to do this...

  [eq]
    |--------
    |       |
  [id, 0] [add]
            |--------
            |       |
         [id, 2]  [mul]
                    |---------
                    |        |
                 [id, 3]  [number, 60]
            

We need to know how to parse them using Context-free grammars.

Parts of Compiler: Intermediate Code Generation

Parts of Compiler: Intermediate Code Generation

  [eq]
    |--------
    |       |
  [id, 0] [add]
            |--------
            |       |
         [id, 2]  [mul]
                    |---------
                    |        |
                 [id, 3]  [number, 60]
            

The system can create an intermediate code before translating it into the target program code.

  t1 = 60
  t2 = id3 * t1
  t3 = id2 + t2
  id1 = t3

Parts of Compiler: Intermediate Code Generation

  t1 = 60
  t2 = id3 * t1
  t3 = id2 + t2
  id1 = t3

We need to transform parse trees into a three-address code sequence. Note that three-address code sequences will be of the following...

  • It has AT MOST ONE OPERATOR on the right side
  • The compiler must generate a temporary name to hold the value computed by a three-address instruction.
  • Some "three-address instructions" have fewer than three operands.

Parts of Compiler: Code Generation

Parts of Compiler: Code Generation

  t1 = 60
  t2 = id3 * t1
  t3 = id2 + t2
  id1 = t3

We can then transform the intermediate code generation to the target code that can be understood by the system

  LDF R2, id3
  MULF R2, R2, #60.0
  LDF R1, id2
  ADDF R1, R1, R2
  STF id1, R1

Parts of Compiler: Code Generation

  t1 = 60
  t2 = id3 * t1
  t3 = id2 + t2
  id1 = t3

But for our class, we need to transform the intermediate code above to our pseudo-ASM that we will discuss next time.

  load t1 
  60
  load t2
  mul id3 t1
  load t3
  add id2 t2
  load id1
  t3

Parts of Compiler: Symbol Table Management

Parts of Compiler: Symbol Table Management

An essential function of a compiler is to record the variable names used in the source program and collect information about various attributes of each name.

Parts of Compiler: Symbol Table Management

These attributes may provide information about the storage allocated for:

  • name
  • type
  • scope (where in the program its value may be used)
  • in the case of procedure names:
    • the number of arguments
    • types of each of the arguments
    • the method of passing each argument
    • the type returned

Programming Language Basics

Things you need to ponder when designing a language:

Programming Language Basics - Static/Dynamic Distinction

  • Static - a language uses a policy that allows the compiler to decide an issue
  • Dynamic - a policy that only allows a decision to be made when we execute the program

Programming Language Basics - Environments and States

  • Environment - mapping from names to locations in the store.
  • State - mapping from locations in store to their values.the state maps left-values to their corresponding right-values (x = 3)

Programming Language Basics - Static Scope and Block Structure

We should consider static-scope rules for a language with blocks, where a block is a grouping of declarations and statements.

Programming Language Basics - Static Scope and Block Structure

  const y = 10;
  const x = (z, a, b = 10) => {
    let q = 0;
    const m = (p) => {
      return p + 5;
    }
    q = (b === 10) ? m(z) + a : m(b);
    setTimeout(() => {
      y = q + 5
    }, 5000);
  }
  x(y, 25);

Programming Language Basics - Static Scope and Block Structure

  const y = 10;
  const x = (z, a, b = 10) => {
   ...
  }
  x(y, 25);

  // Outer scope has variables y and x, where x is a function

Programming Language Basics - Static Scope and Block Structure

  const x = (z, a, b = 10) => {
    let q = 0;
    const m = (p) => {
      ...
    }
    q = (b === 10) ? m(z) + a : m(b);
    setTimeout(() => {
      ...
    }, 5000);
  }
  // the function x has parameter values z, a, b, q, and m, 
  // where m is also a function

Programming Language Basics - Static Scope and Block Structure

  const y = 10;
  const x = (z, a, b = 10) => {
    let q = 0;
    const m = (p) => {
      ...
    }
    q = (b === 10) ? m(z) + a : m(b);
    setTimeout(() => {
      ...
    }, 5000);
  }

  // Take note that function x also includes variable y
  // as part of its scope.

Programming Language Basics - Static Scope and Block Structure

  const m = (p) => {
    return p + 5;
  }

  // function m includes variable p.

Programming Language Basics - Static Scope and Block Structure

  const y = 10;
  const x = (z, a, b = 10) => {
    let q = 0;
    const m = (p) => {
      return p + 5;
    }
    q = (b === 10) ? m(z) + a : m(b);
    setTimeout(() => {
      ...
    }, 5000);
  }

  // But the same as function x, function m also includes:
  // z, a, b, q, and y as part of its scope

Programming Language Basics - Static Scope and Block Structure

  setTimeout(() => {
    y = q + 5
  }, 5000);

  // The anonymous function inside the parameter callback 
  // of the setTimeout function includes y and q even if not initialized...

Programming Language Basics - Static Scope and Block Structure

  const y = 10;
  const x = (z, a, b = 10) => {
    let q = 0;
    const m = (p) => {
      ...
    }
    q = (b === 10) ? m(z) + a : m(b);
    setTimeout(() => {
      y = q + 5
    }, 5000);
  }

  // Because the anonymous function is within the scope of
  // function x.

Programming Language Basics - Static Scope and Block Structure

Note that Scoping affects the Symbol Table structure.

Programming Language Basics - Parameter Passing

  • Pass-by-value - the actual parameter is evaluated (if it is an expression) or copied (if it is a variable)
  • Pass-by-reference - the address of the actual parameter is passed to the callee as the value of the corresponding formal parameter.

Exercises

Every exercise will have an effect on your project.

Exercises

  • Grammar Creation
    • 1. Create the Grammar of the Intermediate Code
    • 2. Create your own Source Code Grammar
  • Lexical Analysis
    • 3. DFA of the Intermediate Code
    • 4. DFA of your own Source Code
    • 5. Token list creation of the Intermediate Code
    • 6. Token list creation of your own Source Code

Exercises

  • Syntax Analysis
    • 7. Error Reporting of the Intermediate Code
    • 8. Error Reporting of your own Source Code
    • 9. Syntax Tree creation of the Intermediate Code
    • 10. Syntax Tree creation of your own Source Code

Exercises

  • Intermediate Code Generation
    • 11. Three-address Code generation from the Syntax Tree of Intermediate Code
    • 12. Three-address Code generation from the Syntax Tree of your own Source Code

Exercises

All exercises will building towards the project, Except the Target-Code Generation, because...

Project

Translate own Source code to Target-code generation, passing through all of the phases:

  • Lexical Analysis: Given Source Code, Show Lexemes
  • Syntax Analysis: Given Lexemes, Show Parse Tree
  • Intermediate Code Generation: Given Parse Tree, Show Three-Address Code
  • Target Code Generation: Given Three-Address Code, Show Target-Code generation
  • Run Target-Code Generation: Given input, show output

New Grading System

  • 1.0 - All 14 exercises done and working, Project working with documentation

New Grading System - Subtractions

  • Did 8-11 projects, all working: -0.25
  • Did 5-7 projects, all working: -0.5
  • Did 1-4 projects, all working: -0.75
  • Did 0 projects, all working: -1.0

New Grading System - Subtractions

  • Did not do final project: -1.0
  • Did final project but has major bugs: -0.5
  • Did not do documentation: -0.5
  • Documentation lacking in substance: -0.25

New Grading System - Notes to live by...

  • If after the grade subtraction, grade is < 3.0 then grade is INC

Target-Code: Pseudo-ASM Language

// SOURCE CODE
var y;
var x = 2;
y = x + 1
// PSEUDO-ASM
start main
var y
var x
load x 2
load y
add x 1
end

Target-Code: Pseudo-ASM Language

// SOURCE CODE
var x;
var y;
...
if ((x < 3) && (y >= 3)) {
  x = 4
}
else {
  x = 2
}

Target-Code: Pseudo-ASM Language

// PSEUDO-ASM
start main
var x
...
var temp1
load temp1
lt x 3
var temp2
load temp2
load gte y 3
var temp3
load temp3
and temp1 temp2
cond temp3
load x 4
else
load x 2
endcond
end

Target-Code: Pseudo-ASM Language

// SOURCE CODE
for (x=0; x<10; x++) {
  console.log(x)
}

// or

var x = 0
while (x<10) {
  console.log(x)
  x++
}

Target-Code: Pseudo-ASM Language

// PSEUDO-ASM
start main
var x
load x 0
var temp1
load temp1
lt x 10
loop temp1
print x
load x
add x 1
endloop
end

Target-Code: Pseudo-ASM Language

// SOURCE CODE
var x = 0
do {
  console.log(x)
  x++;
} while(x<10)

Target-Code: Pseudo-ASM Language

// PSEUDO-ASM
start main
var x
load x 0
floop
print x
add x 1
var temp1
load temp1
lt x 10
endfloop temp1
end

Target-Code: Pseudo-ASM Language

// SOURCE CODE
function c() {
  console.log("Hah")
}

function b(y, z) {
  console.log(y)
  console.log(z)
  console.log("Hey")
}

function a() {
  var x = 0;
  var y = 1;
  b(x, y);
  console.log("Yoh")
  c();
  return x
}

var x;
x = a();

Target-Code: Pseudo-ASM Language

// PSEUDO-ASM
start c
print "Hah"
end

start b param var y var z
print y
print z
print "Hey"
end

start a
var x
var y
load x 0
load y 1
call b x y
print "Yoh"
call c
return x 
end

start main
var x
load x
call a
end

How to get the slides or Class Specs

Google Docs: https://goo.gl/SLMzVB

Slides: https://github.com/tjmonsi/cmsc129-slide2

CMSC 129 Principles of Compiler Design Created by TJ Monserrat