Syntax analysis

Description

Syntax analysis is the second phase in compiler design where the lexical tokens generated by the lexical analyzer are validated against a grammar defining the language syntax.

I - Grammar

A language syntax is determined by a set of productions forming a grammar. Constructed grammars must satisfy the LL(1) (left to right, leftmost derivation, 1 lookahead) conditions.

LL(1) Conditions

A - No left recursion

Example of a left recursion

A -> A a | b

Solution for a left recursion

A -> b B
B -> a B | ε

B - Intersection of First sets in same production must be empty

Example of a non-empty intersection of First sets in a same production

A -> B C | D E
B -> F G
D -> F H

In the above grammar, First(F) ∈ First(B) and First(F) ∈ First(D), therefore First(B) ∩ First(D) ≠ {}

One solution for this problem

A -> F I
I -> G C | H E

C - Intersection of First and Follow sets of a non-terminal must be empty

Example of a non-empty intersection of First and Follow sets of a non-terminal

A -> B a
B -> a | ε

In the above grammar, First(B) ∩ Follow(B) = {a}

One solution for this problem

A -> a C
C -> a | ε

JSON structure

Terminals and Non-terminals

  • Non-terminals must be composed of upper case letters and underscore only. (Cannot be part of Reserved syntax keywords)
  • Terminals must begin and end with a single quote. The text in between the single quotes defines the lexical token name (case sensitive) and should not contain spaces. (Cannot be part of Reserved syntax keywords)
  • EPSILON represents an epsilon production.
  • Whitespaces between tokens are delimiters.

Global keys table

Key Key description Value
non-terminal (e.g. T_IF) The non-terminal that can be replaced by its value. Array of non-terminals string (e.g. ["NT_A NT_B NT_C"])
Array of a terminal string (e.g. ["'if'"])
Array of an epsilon string (e.g. ["EPSILON"])
Array of a mix strings (e.g. ["NT_A NT_B NT_C", "'if'", "EPSILON"])
Note: The above row can be applied several times in this object

Reserved syntax keyword table

Lexical Keyword
:any
$

II - Syntax error messages

When the user input does not align with the language grammar, the syntax analyzer will try to recover from the panic mode and will report customized error messages describing each situation.

JSON structure

Global keys table

Key Description Value Value description Required
default_message A default message, used if no specific message is defined for a particular situation. string General error message. The string can contain optional placeholders. Yes
error_messages Specific message for a particular situation. Array of error message objects Check table Yes

Error message object table

Key Key description Value Value description
non_terminal Non-terminal expected by the syntax analyzer at a specific location in the input string Non-terminal (e.g. T_VAR_NAME)
:any Any non-terminal
$ End of grammar
terminal Actual terminal read by the syntax analyzer string Terminal (e.g. 'equal_sign')
:any Any terminal
$ End of file
message Error message string Meaningful message on what was added or unexpected.
(e.g. 'Missing variable name before the assignment operator at line ${lexical.line}').
The string can contain optional placeholders.

Placeholders table

Placeholder Description
${lexical[.next|.previous]*.value} Display the value of the token object
${lexical[.next|.previous]*.column} Display the column number in the line starting from 1
${lexical[.next|.previous]*.line} Display the line number in the text starting from 1
${filename} Input file name.