Syntax analysis
Description
Syntax analysis is the second phase in compiler design where the lexical tokens generated by the lexical analyzer are validated against a grammar defining the language syntax.
I - Grammar
A language syntax is determined by a set of productions forming a grammar. Constructed grammars must satisfy the LL(1) (left to right, leftmost derivation, 1 lookahead) conditions.
LL(1) Conditions
A - No left recursion
Example of a left recursion
A -> A a | b
Solution for a left recursion
A -> b B
B -> a B | ε
B - Intersection of First sets in same production must be empty
Example of a non-empty intersection of First sets in a same production
A -> B C | D E
B -> F G
D -> F H
In the above grammar, First(F) ∈ First(B) and First(F) ∈ First(D), therefore First(B) ∩ First(D) ≠ {}
One solution for this problem
A -> F I
I -> G C | H E
C - Intersection of First and Follow sets of a non-terminal must be empty
Example of a non-empty intersection of First and Follow sets of a non-terminal
A -> B a
B -> a | ε
In the above grammar, First(B) ∩ Follow(B) = {a}
One solution for this problem
A -> a C
C -> a | ε
JSON structure
Terminals and Non-terminals
- Non-terminals must be composed of upper case letters and underscore only. (Cannot be part of Reserved syntax keywords)
- Terminals must begin and end with a single quote. The text in between the single quotes defines the lexical token name (case sensitive) and should not contain spaces. (Cannot be part of Reserved syntax keywords)
EPSILON
represents an epsilon production.- Whitespaces between tokens are delimiters.
Global keys table
Key | Key description | Value | |
---|---|---|---|
non-terminal (e.g. T_IF ) |
The non-terminal that can be replaced by its value. | Array of non-terminals string (e.g. ["NT_A NT_B NT_C"] ) |
|
Array of a terminal string (e.g. ["'if'"] ) |
|||
Array of an epsilon string (e.g. ["EPSILON"] ) |
|||
Array of a mix strings (e.g. ["NT_A NT_B NT_C", "'if'", "EPSILON"] ) |
|||
Note: The above row can be applied several times in this object |
Reserved syntax keyword table
Lexical Keyword |
---|
:any |
$ |
II - Syntax error messages
When the user input does not align with the language grammar, the syntax analyzer will try to recover from the panic mode and will report customized error messages describing each situation.
JSON structure
Global keys table
Key | Description | Value | Value description | Required |
---|---|---|---|---|
default_message | A default message, used if no specific message is defined for a particular situation. | string | General error message. The string can contain optional placeholders. | Yes |
error_messages | Specific message for a particular situation. | Array of error message objects | Check table | Yes |
Error message object table
Key | Key description | Value | Value description |
---|---|---|---|
non_terminal | Non-terminal expected by the syntax analyzer at a specific location in the input | string | Non-terminal (e.g. T_VAR_NAME ) |
:any |
Any non-terminal | ||
$ |
End of grammar | ||
terminal | Actual terminal read by the syntax analyzer | string | Terminal (e.g. 'equal_sign' ) |
:any |
Any terminal | ||
$ |
End of file | ||
message | Error message | string | Meaningful message on what was added or unexpected. (e.g. 'Missing variable name before the assignment operator at line ${lexical.line}'). The string can contain optional placeholders. |
Placeholders table
Placeholder | Description |
---|---|
${lexical[.next|.previous]*.value} |
Display the value of the token object |
${lexical[.next|.previous]*.column} |
Display the column number in the line starting from 1 |
${lexical[.next|.previous]*.line} |
Display the line number in the text starting from 1 |
${filename} |
Input file name. |