Compiler Design
Chapter 02: Parsing Techniques & Tools – Shift-Reduce, Operator Precedence, LR Parsers, and Parser Generators.
Mastering the fundamental concepts of parsing, different parser types, and automatic tools for grammar analysis.
📘 Topic: Basic Parsing Techniques: Parsers
What is Parsing?
English: Parsing is the process of analyzing a sequence of symbols (like code) to check its grammatical structure according to a given grammar.
Hinglish: Parsing ka matlab hai code ya symbols ke sequence ko check karna ki wo grammar ke rules ke according sahi hai ya nahi.
What is a Parser?
English: A parser is a program or tool that performs parsing. It takes input from the lexical analyzer (tokens) and checks if the input follows the syntax rules of the programming language.
Hinglish: Parser ek program hota hai jo parsing karta hai. Ye lexical analyzer se tokens leta hai aur verify karta hai ki wo syntax rules ko follow karte hain ya nahi.
Why Are Parsers Important?
- 💡 Ensure the source code is syntactically correct.
- 💡 Build a structure called parse tree or syntax tree.
- 💡 Help compiler understand the program structure.
- 💡 Enable further steps like semantic analysis and code generation.
Types of Parsers
Type | Description (English + Hinglish) |
---|---|
Top-Down Parsers | Parse from the start symbol down to the leaves; the parse tree is built from the root. (Start se parse karke leaves tak jata hai.) |
Bottom-Up Parsers | Parse from the input symbols up to the start symbol; the parse tree is built from the leaves. (Input se start symbol tak parse karta hai.) |
Common Parsing Techniques
- Top-Down Parsing
  - Recursive Descent Parser
  - Predictive Parser (LL Parsers)
- Bottom-Up Parsing
  - Shift-Reduce Parser
  - LR Parsers (SLR, CLR, LALR)
Example: Recursive Descent Parser
- 💡 Uses functions for each non-terminal in grammar.
- 💡 Parses input by calling functions recursively.
- 💡 Easy to implement but not suitable for left-recursive grammars.
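To see the "one function per non-terminal" idea concretely, here is a minimal recursive descent sketch in C for the tiny grammar E → id + E | id. The token encoding ('i' for id) and the function names are made up for this illustration only.
/* Minimal recursive descent sketch for E -> id '+' E | id.            */
/* Tokens: 'i' for id, '+', and '\0' for end of input.                 */
#include <stdio.h>
#include <stdlib.h>
const char *input;          /* token stream, e.g. "i+i" */
int pos = 0;
void error(void) { printf("Syntax error at position %d\n", pos); exit(1); }
void match(char expected) { /* consume one terminal or fail */
    if (input[pos] == expected) pos++;
    else error();
}
void parse_E(void) {        /* one function per non-terminal */
    match('i');             /* E always starts with id */
    if (input[pos] == '+') {/* one-symbol lookahead decides whether to recurse */
        match('+');
        parse_E();
    }
}
int main(void) {
    input = "i+i+i";
    parse_E();
    if (input[pos] == '\0') printf("Input accepted\n");
    else error();
    return 0;
}
Because the recursion follows the grammar directly, left-recursive rules such as E → E + id would make parse_E call itself forever, which is why left recursion must be removed first.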
Parse Tree
- 💡 A tree representation showing how a string is generated by grammar.
- 💡 Helps visualize the parsing process.
Summary in Hinglish
- Parsing ka matlab hai code ke tokens ko grammar ke rules ke hisaab se check karna.
- Parser wo tool hai jo ye check karta hai aur parse tree banata hai.
- Parsing do tarah ka hota hai: top-down aur bottom-up.
- Har type ke parsing ke liye alag techniques use hoti hain.
📘 Topic: Shift-Reduce Parsing
What is Shift-Reduce Parsing?
English: Shift-Reduce Parsing is a bottom-up parsing technique used by compilers to analyze the syntax of the input by shifting input symbols onto a stack and reducing them to non-terminals according to the grammar's productions.
Hinglish: Shift-Reduce Parsing ek bottom-up parsing technique hai jisme compiler input symbols ko stack me daalta hai (shift karta hai) aur phir unhe grammar ke rules ke hisaab se reduce karta hai.
Basic Idea (Shift and Reduce)
- Shift: Input symbol ko stack par daalo (push karo).
- Reduce: Stack ke top symbols ko grammar ke right side ke rule se match karo, aur unhe replace karo us rule ke left side non-terminal se.
Why Use Shift-Reduce Parsing?
- 💡 Efficient bottom-up parsing technique.
- 💡 Can handle a wide range of grammars including LR grammars.
- 💡 Useful in real-world compiler designs.
Main Components
- Stack: Holds input symbols and partial parses.
- Input Buffer: Holds the rest of the input to be parsed.
- Parsing Table (for LR parsers): Helps decide when to shift or reduce.
Actions in Shift-Reduce Parsing
Action | Explanation (English + Hinglish) |
---|---|
Shift | Move next input symbol to stack. (Agla symbol stack me daal do.) |
Reduce | Replace symbols on stack matching RHS of a production with LHS non-terminal. (Stack ke symbols ko production ke left side se replace karo.) |
Accept | Parsing successful when whole input is parsed correctly. (Parsing successful hai.) |
Error | If no valid action is possible, parsing fails. (Parsing me galti ho gayi hai.) |
Example: Shift-Reduce Parsing Steps
Consider grammar:
E → E + T | T
T → id
Input: id + id
Parsing Steps (simplified):
Stack | Input | Action |
---|---|---|
(empty) | id + id $ | Shift id |
id | + id $ | Reduce id to T (using T → id) |
T | + id $ | Reduce T to E (using E → T) |
E | + id $ | Shift + |
E + | id $ | Shift id |
E + id | $ | Reduce id to T (using T → id) |
E + T | $ | Reduce E + T to E (using E → E + T) |
E | $ | Accept |
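The same trace can be hand-coded. The C sketch below hard-wires the reduce decisions for exactly this toy grammar (E → E + T | T, T → id), so it illustrates the shift and reduce moves rather than a general shift-reduce parser.
/* Hand-coded shift-reduce sketch for E -> E + T | T ; T -> id.         */
/* Tokens: 'i' (id), '+', '$' (end). The reduce checks mirror the table above. */
#include <stdio.h>
int main(void) {
    const char *input = "i+i$";
    char stack[64]; int top = -1, ip = 0;
    while (1) {
        /* Reduce while a handle is on top of the stack. */
        if (top >= 0 && stack[top] == 'i') {
            stack[top] = 'T'; printf("Reduce id to T\n");
        } else if (top == 0 && stack[0] == 'T') {
            stack[0] = 'E'; printf("Reduce T to E\n");
        } else if (top >= 2 && stack[top] == 'T' &&
                   stack[top-1] == '+' && stack[top-2] == 'E') {
            top -= 2; stack[top] = 'E'; printf("Reduce E + T to E\n");
        } else if (input[ip] == '$') {
            break;                       /* nothing left to shift or reduce */
        } else {
            stack[++top] = input[ip++];  /* shift next input symbol */
            printf("Shift %c\n", stack[top]);
        }
    }
    if (top == 0 && stack[0] == 'E') printf("Accept\n");
    else printf("Error\n");
    return 0;
}
Real shift-reduce parsers (the LR family below) replace these hard-wired checks with a parsing table.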
Summary in Hinglish
- Shift-Reduce Parsing bottom-up parsing hai.
- Stack me symbols ko daalte hain (shift) aur grammar ke rules se unhe replace karte hain (reduce).
- Jab poora input sahi parse ho jata hai to accept karte hain.
- Parsing me galti ho to error aata hai.
📘 Topic: Operator Precedence Parsing
What is Operator Precedence Parsing?
English: Operator Precedence Parsing is a bottom-up parsing technique used to parse expressions where operators have different priorities (precedence). It helps decide the order in which operators are applied without using full grammar rules.
Hinglish: Operator Precedence Parsing ek bottom-up parsing technique hai jo expressions ko parse karti hai jisme alag-alag operators ki priority (precedence) hoti hai. Isse pata chalta hai ki kaunsa operator pehle apply karna hai bina poore grammar use kiye.
Why do we need Operator Precedence Parsing?
- 💡 Expressions have multiple operators like +, *, and - with different priorities.
- 💡 Need a simple way to parse expressions respecting operator precedence.
- 💡 Avoids ambiguity in parsing mathematical expressions.
Basic Idea (Precedence Relations)
- Define precedence relations between operators:
  - < (less than) means the operator on the left has lower precedence.
  - = (equal) means the two symbols belong to the same handle (used, for example, between matching parentheses).
  - > (greater than) means the operator on the left has higher precedence.
- Use a precedence table to decide when to shift and when to reduce.
Precedence Table Example
Stack top \ Input | + | * | id | ( | ) | $ |
---|---|---|---|---|---|---|
+ | > | < | < | < | > | > |
* | > | > | < | < | > | > |
id | > | > | | | > | > |
( | < | < | < | < | = | |
) | > | > | | | > | > |
$ | < | < | < | < | | accept |
- $ is the end marker.
- For example, + < * means + has lower precedence than *, so shift *.
Parsing Steps
- Initialize the stack with $.
- Compare the top terminal on the stack with the next input symbol using the precedence table.
- If stack top < input, shift the input symbol onto the stack.
- If stack top > input, reduce (pop the handle and replace it using the grammar).
- Repeat until both the input and the stack contain only $.
Example: Operator Precedence Parsing
Expression: id + id * id
- Start with stack: $
- Input: id + id * id $
Stack | Input | Action |
---|---|---|
$ | id + id * id $ | Shift id |
$ id | + id * id $ | Reduce id to E |
$ E | + id * id $ | Shift + |
$ E + | id * id $ | Shift id |
$ E + id | * id $ | Reduce id to E |
$ E + E | * id $ | Shift * (+ < * ) |
$ E + E * | id $ | Shift id |
$ E + E * id | $ | Reduce id to E |
$ E + E * E | $ | Reduce E * E to E (* > $) |
$ E + E | $ | Reduce E + E to E |
$ E | $ | Accept |
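Such a precedence table is usually stored as a small 2D array. The C sketch below (with made-up names and index choices) shows only the table lookup that decides between shift and reduce, not a complete parser.
/* Operator precedence table as a 2D array. Indices: 0:'+' 1:'*' 2:id 3:'(' 4:')' 5:'$'. */
/* 'A' marks accept, ' ' marks an error entry.                                            */
#include <stdio.h>
char prec[6][6] = {
    /*          +    *    id   (    )    $   */
    /* +  */ { '>', '<', '<', '<', '>', '>' },
    /* *  */ { '>', '>', '<', '<', '>', '>' },
    /* id */ { '>', '>', ' ', ' ', '>', '>' },
    /* (  */ { '<', '<', '<', '<', '=', ' ' },
    /* )  */ { '>', '>', ' ', ' ', '>', '>' },
    /* $  */ { '<', '<', '<', '<', ' ', 'A' },
};
const char *names[6] = { "+", "*", "id", "(", ")", "$" };
void decide(int stack_top, int input_sym) {   /* look up the relation and report the move */
    char r = prec[stack_top][input_sym];
    if (r == '<' || r == '=')  printf("%s vs %s: shift\n", names[stack_top], names[input_sym]);
    else if (r == '>')         printf("%s vs %s: reduce\n", names[stack_top], names[input_sym]);
    else if (r == 'A')         printf("%s vs %s: accept\n", names[stack_top], names[input_sym]);
    else                       printf("%s vs %s: error\n", names[stack_top], names[input_sym]);
}
int main(void) {
    decide(0, 1);   /* + vs *  -> shift  (+ < *) */
    decide(1, 5);   /* * vs $  -> reduce (* > $) */
    decide(5, 5);   /* $ vs $  -> accept         */
    return 0;
}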
Summary in Hinglish
- Operator precedence parsing expressions ko operators ki priority ke hisaab se parse karta hai.
- Ek precedence table use karke decide karta hai kab shift karna hai aur kab reduce.
- Example me, * ki precedence + se zyada hoti hai, isliye * pehle reduce hoga.
- Jab stack aur input dono me sirf $ bacha to parsing successful hoti hai.
📘 Topic: Top-Down Parsing
What is Top-Down Parsing?
English: Top-Down Parsing is a parsing technique that starts from the start symbol of a grammar and tries to rewrite it to match the input string by applying production rules from top to bottom.
Hinglish: Top-Down Parsing ek parsing technique hai jo grammar ke start symbol se shuru hoti hai aur input string se match karne ke liye rules ko upar se niche tak apply karti hai.
How does Top-Down Parsing work?
- Starts from the start symbol (like S).
- Predicts which production rule to use to match the input string.
- Expands non-terminals into terminals step by step.
- Checks if the derived string matches the input string.
Main Types of Top-Down Parsing
- Recursive Descent Parsing: Simple top-down parsing implemented using recursion.
- Predictive Parsing: A form of recursive descent parsing without backtracking, uses lookahead to decide next rule.
Advantages of Top-Down Parsing
- 👍 Easy to implement using recursion.
- 👍 Works well for grammars without left recursion.
- 👍 Good for hand-written parsers.
Disadvantages of Top-Down Parsing
- 👎 Cannot handle left recursive grammars.
- 👎 May require backtracking in some cases (slow).
- 👎 Not suitable for all types of grammars.
Example: Top-Down Parsing Logic
Grammar:
S → aSb | ε
Input: aabb
Parsing steps (simplified):
- Start with S.
- Try S → aSb. Match a from the input.
- Recur on S for the next substring.
- Continue until the input is matched or the parse fails.
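A minimal recursive descent sketch in C of the same logic; since S → aSb | ε has no left recursion, one character of lookahead is enough to pick the production. The variable names are illustrative only.
/* Recursive descent sketch for S -> a S b | epsilon, input "aabb". */
#include <stdio.h>
const char *input = "aabb";
int pos = 0, ok = 1;
void parse_S(void) {
    if (input[pos] == 'a') {            /* choose S -> a S b */
        pos++;                          /* match a */
        parse_S();                      /* recurse on the inner S */
        if (input[pos] == 'b') pos++;   /* match b */
        else ok = 0;
    }
    /* otherwise choose S -> epsilon and consume nothing */
}
int main(void) {
    parse_S();
    if (ok && input[pos] == '\0') printf("aabb is accepted\n");
    else printf("Rejected\n");
    return 0;
}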
Summary in Hinglish
- Top-Down Parsing start symbol se shuru hota hai aur input match karta hai rules ke through.
- Recursive descent parsing sabse common type hai.
- Left recursion ko handle nahi kar sakta.
- Simple grammars ke liye best hai.
📘 Topic: Predictive Parsers
What is a Predictive Parser?
English: Predictive Parsing is a top-down parsing technique that uses lookahead (usually one symbol) to decide which production to use, without backtracking.
Hinglish: Predictive Parser ek top-down parsing technique hai jo input ka agla symbol (lookahead) dekh kar bina backtracking ke rule decide karta hai.
How does it work?
- 💡 Uses a parsing table (called LL(1) parsing table).
- 💡 For current non-terminal and lookahead terminal, chooses the correct production.
- 💡 Reads the input from Left to right and produces a Leftmost derivation (the two L's in LL(1)).
- 💡 No backtracking, so faster and efficient.
Steps to Construct Predictive Parser
- Remove left recursion from grammar.
- Compute FIRST and FOLLOW sets for grammar symbols.
- Build parsing table using FIRST and FOLLOW sets.
- Use stack to simulate parsing.
Example: Predictive Parsing Table
Grammar:
E → T E'
E' → + T E' | ε
T → int
Parsing table:
Non-terminal | int | + | $ |
---|---|---|---|
E | E → T E' | | |
E' | | E' → + T E' | E' → ε |
T | T → int | | |
Input: int + int $
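The C sketch below shows how a stack driven by that table parses int + int $. The single-character encodings ('i' for int, 'e' for E') and the if/else chain standing in for the table are assumptions made just for this illustration.
/* Table-driven LL(1) sketch for E -> T E' ; E' -> + T E' | eps ; T -> int. */
/* Encodings: 'i' = int, '+', '$' = end, 'E' = E, 'e' = E', 'T' = T.        */
#include <stdio.h>
int main(void) {
    const char *input = "i+i$";     /* int + int $ */
    char stack[64] = "$E";          /* bottom marker, start symbol on top */
    int top = 1, ip = 0;
    while (1) {
        char X = stack[top], a = input[ip];
        if (X == '$' && a == '$') { printf("Accept\n"); return 0; }
        if (X == 'i' || X == '+') {            /* terminal on stack: must match input */
            if (X == a) { top--; ip++; } else break;
        } else if (X == 'E') {                 /* table[E][int] = E -> T E' */
            if (a == 'i') { stack[top] = 'e'; stack[++top] = 'T'; } else break;
        } else if (X == 'e') {                 /* table[E'][+] and table[E'][$] */
            if (a == '+')      { stack[top] = 'e'; stack[++top] = 'T'; stack[++top] = '+'; }
            else if (a == '$') { top--; }      /* E' -> eps : just pop */
            else break;
        } else if (X == 'T') {                 /* table[T][int] = T -> int */
            if (a == 'i') { stack[top] = 'i'; } else break;
        } else break;
    }
    printf("Error\n");
    return 1;
}
Note that each production's right-hand side is pushed in reverse so that its first symbol ends up on top of the stack.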
Advantages of Predictive Parsing
- 👍 Simple and efficient for suitable grammars.
- 👍 No backtracking.
- 👍 Easy to implement with parsing table.
Disadvantages of Predictive Parsing
- 👎 Grammar must be LL(1), so no left recursion or common prefixes.
- 👎 Not all grammars are LL(1).
📘 Topic: LR Parsers (Automatic Construction of Efficient Parsers)
What is an LR Parser?
English: LR Parser is a bottom-up parser that reads input Left-to-right and produces a Rightmost derivation in reverse. It is very powerful and can parse most programming language grammars.
Hinglish: LR Parser ek bottom-up parser hai jo input ko Left se Right padhta hai aur Rightmost derivation ko reverse me produce karta hai. Ye bahut powerful hota hai aur zyada tar programming languages ke grammar ko parse kar sakta hai.
How does LR Parser work?
- 💡 Uses a stack to hold states and symbols.
- 💡 Reads next input symbol and decides action: Shift, Reduce, Accept, or Error.
- 💡 Uses a parsing table with ACTION and GOTO parts.
- 💡 Automatically built from grammar.
Types of LR Parsers
- SLR (Simple LR): Basic version, simpler parsing table.
- LR(1): Uses one lookahead symbol, more precise than SLR.
- LALR (Look-Ahead LR): Combines states from LR(1) to reduce table size but retains power.
LR Parsing Table (Components)
- ACTION Table:
- Shift (push symbol/state on stack)
- Reduce (replace symbols by non-terminal)
- Accept (successful parse)
- Error (parsing error)
- GOTO Table:
- Decides next state after reduction.
Example (Simple LR parsing step)
Grammar:
S → CC
C → cC | d
Input: c c d d
Simplified Steps:
- Shift c, shift c, shift d
- Reduce by C → d, then by C → cC twice (the first C is complete)
- Shift d, reduce by C → d (the second C is complete)
- Reduce by S → CC and accept
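The sketch below shows an LR driver in C for this grammar with one possible hand-built SLR(1) table (how such tables are constructed is covered in the SLR topic later in this chapter). The state numbering and encodings are illustrative assumptions, not the only valid choice.
/* LR driver sketch for S -> CC ; C -> cC | d with hand-built SLR(1) tables. */
/* ACTION encoding: n > 0 shift to state n, n < 0 reduce by rule -n,         */
/* 99 = accept, 0 = error. Rules: 1: S -> CC, 2: C -> cC, 3: C -> d.         */
#include <stdio.h>
enum { TC = 0, TD = 1, TEND = 2 };                 /* terminals c, d, $ */
int ACTION[7][3] = {
    /*  c    d    $  */
    {   3,   4,   0 },   /* state 0 */
    {   0,   0,  99 },   /* state 1: S' -> S .  */
    {   3,   4,   0 },   /* state 2: S -> C . C */
    {   3,   4,   0 },   /* state 3: C -> c . C */
    {  -3,  -3,  -3 },   /* state 4: C -> d .   */
    {   0,   0,  -1 },   /* state 5: S -> C C . */
    {  -2,  -2,  -2 },   /* state 6: C -> c C . */
};
int GOTO_S[7] = { 1, 0, 0, 0, 0, 0, 0 };           /* GOTO on S */
int GOTO_C[7] = { 2, 0, 5, 6, 0, 0, 0 };           /* GOTO on C */
int rhs_len[4] = { 0, 2, 2, 1 };                   /* |RHS| of each rule */
char lhs[4]    = { 0, 'S', 'C', 'C' };             /* LHS of each rule */
int tok(char ch) { return ch == 'c' ? TC : ch == 'd' ? TD : TEND; }
int main(void) {
    const char *input = "ccdd$";
    int states[64], top = 0, ip = 0;
    states[0] = 0;
    for (;;) {
        int act = ACTION[states[top]][tok(input[ip])];
        if (act == 99) { printf("Accept\n"); return 0; }
        if (act > 0) {                              /* shift */
            printf("Shift %c, go to state %d\n", input[ip], act);
            states[++top] = act; ip++;
        } else if (act < 0) {                       /* reduce by rule -act */
            int r = -act;
            top -= rhs_len[r];                      /* pop |RHS| states */
            int s = states[top];
            states[++top] = (lhs[r] == 'S') ? GOTO_S[s] : GOTO_C[s];
            printf("Reduce by rule %d, GOTO state %d\n", r, states[top]);
        } else { printf("Error\n"); return 1; }
    }
}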
Advantages of LR Parsers
- 👍 Can parse a large class of grammars (more than predictive parsers).
- 👍 No backtracking needed.
- 👍 Used in most compiler tools (like YACC).
Disadvantages of LR Parsers
- 👎 More complex parsing tables.
- 👎 Harder to implement by hand (usually tool-generated).
Summary (Hinglish)
- Predictive Parser: Simple top-down, uses lookahead, no backtracking, only for LL(1) grammars.
- LR Parser: Powerful bottom-up parser, reads input left to right, uses states and tables, can parse almost all programming languages’ grammars.
- LR Types: SLR (simple), LR(1) (better), LALR (balanced).
📘 Topic: Canonical Collection of LR(0) Items
What is an LR(0) Item?
English: An LR(0) item is a production rule of a grammar with a dot (•) placed at some position on the right-hand side to show how much of the production has been seen (or parsed).
Hinglish: LR(0) item ek grammar ka production rule hota hai jisme • (dot) lagaya hota hai, jo dikhata hai ki production ka kitna part dekha ya parse hua hai.
Format of an LR(0) Item (Example)
If production is:
A → XYZ
Then LR(0) items can be:
A → •XYZ (dot at start, nothing parsed yet)
A → X•YZ (X parsed, now expect Y)
A → XY•Z (XY parsed, now expect Z)
A → XYZ• (all parsed, ready to reduce)
What is the Canonical Collection of LR(0) Items?
English: The canonical collection is the complete set of LR(0) items (states) built for a grammar, used to construct the LR parsing table.
Hinglish: Canonical collection wo pura set hota hai LR(0) items ka (states), jo grammar ke liye banta hai LR parsing table banane ke liye.
How to Construct Canonical Collection?
- Start with the augmented grammar (add S' → S, where S is the start symbol).
- Begin with the item [S' → •S] and find its closure (add items for each non-terminal that appears right after the dot).
- For each item set, find all possible transitions (shifting the dot over the next grammar symbol) and form new item sets.
- Repeat until no new item sets can be added.
Important Concepts for LR(0) Collection
- 💡 Closure: Add all items that can be derived from a non-terminal immediately after dot.
- 💡 Goto: Move the dot over a grammar symbol and find the closure again.
- 💡 States: Each set of LR(0) items is called a state in the LR parser.
Example: Building LR(0) States (Partial)
Grammar:
S → CC
C → cC | d
Start with augmented grammar:
S' → S
Initial item:
[S' → •S]
Closure adds:
[S → •CC]
[C → •cC]
[C → •d]
(This process continues to create the full canonical collection by moving the dot over S, C, c, and d to find new item sets and their closures.)
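As an illustration, the closure step can be coded directly for this small grammar. The item representation below (production index plus dot position) is an assumption made for the sketch; running it prints exactly the items of I0 listed above.
/* CLOSURE sketch for LR(0) items over S' -> S ; S -> CC ; C -> cC | d. */
#include <stdio.h>
const char *LHS[4] = { "S'", "S", "C", "C" };
const char *RHS[4] = { "S",  "CC", "cC", "d" };
typedef struct { int prod, dot; } Item;
int contains(Item *set, int n, Item it) {
    for (int i = 0; i < n; i++)
        if (set[i].prod == it.prod && set[i].dot == it.dot) return 1;
    return 0;
}
/* Keep adding [B -> .gamma] for every non-terminal B that follows a dot. */
int closure(Item *set, int n) {
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < n; i++) {
            char next = RHS[set[i].prod][set[i].dot]; /* symbol right after the dot */
            if (next != 'S' && next != 'C') continue; /* only non-terminals matter */
            for (int p = 1; p < 4; p++) {             /* skip the augmented S' rule */
                if (LHS[p][0] != next) continue;
                Item it = { p, 0 };
                if (!contains(set, n, it)) { set[n++] = it; changed = 1; }
            }
        }
    }
    return n;
}
int main(void) {
    Item I0[16] = { { 0, 0 } };                       /* start from [S' -> .S] */
    int n = closure(I0, 1);
    for (int i = 0; i < n; i++)
        printf("%s -> %.*s•%s\n", LHS[I0[i].prod],
               I0[i].dot, RHS[I0[i].prod], RHS[I0[i].prod] + I0[i].dot);
    return 0;
}
The Goto operation would then move the dot over one symbol in each of these items and call closure again on the result.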
Why Is the Canonical Collection Important?
- 💡 Helps in building LR(0) parsing tables.
- 💡 Represents all possible parser states.
- 💡 Determines parser actions (shift, reduce, accept).
Summary (Hinglish)
- LR(0) item ek production with dot dikhaata hai parsing progress.
- Canonical collection sare LR(0) item sets ka poora group hai.
- Isko banane ke liye closure aur goto ka use karte hain.
- Yeh LR parser ki state machine banata hai.
📘 Topic: Constructing SLR Parsing Tables
What is an SLR Parser?
English: SLR (Simple LR) parser is a bottom-up parser that uses LR(0) items and FOLLOW sets to build parsing tables (ACTION and GOTO). It’s simpler than full LR(1) parsers.
Hinglish: SLR parser ek bottom-up parser hai jo LR(0) items aur FOLLOW sets ka use karta hai parsing table (ACTION aur GOTO) banane ke liye. Ye full LR(1) se thoda simple hai.
Components of SLR Parsing Table
- ACTION Table: Decides shift, reduce, accept, or error.
- GOTO Table: Tells next state after reducing a non-terminal.
Steps to Construct SLR Parsing Table
- Step 1: Augment the Grammar
  - Add a new start symbol: S' → S, where S is the original start symbol.
- Step 2: Construct the Canonical Collection of LR(0) Items
  - Start with the item [S' → •S] and find its closure.
  - For each item set, find all GOTO transitions to new sets.
  - Continue until no new sets appear.
- Step 3: Calculate FOLLOW Sets for all Non-terminals
  - FOLLOW(A) = the set of terminals that can appear immediately after the non-terminal A in any sentential form.
- Step 4: Fill the ACTION and GOTO Tables
  - For each state (item set) I:
    - For an item [A → α • a β] where a is a terminal, ACTION[I, a] = shift to state GOTO(I, a).
    - For an item [A → α •] (dot at the end), for each terminal b in FOLLOW(A), ACTION[I, b] = reduce by A → α.
    - If [S' → S •] is in I, ACTION[I, $] = accept.
    - For each non-terminal A, GOTO[I, A] = state GOTO(I, A).
- Step 5: Handle Conflicts (if any)
  - SLR parsing tables may have conflicts if the grammar is not SLR(1).
  - Conflicts are shift-reduce or reduce-reduce conflicts.
Example: SLR Table Construction
Grammar:
S → CC
C → cC | d
Augmented:
S' → S
Canonical Collection Items (States): (Partial)
I0:
S' → •S
S → •CC
C → •cC
C → •d
I1: the state reached from I0 on S
I2: the state reached from I0 on C, and so on.
FOLLOW Sets:
FOLLOW(S) = {$}
FOLLOW(C) = {c, d, $}
Filling ACTION and GOTO example:
- In a state containing [C → c • C] with next input c, shift.
- In a state containing [C → d •], reduce by C → d on every terminal in FOLLOW(C).
Summary
- SLR uses LR(0) items + FOLLOW sets.
- ACTION for shift if dot before terminal, reduce if dot at end.
- Accept when start symbol production is complete.
- GOTO for non-terminals.
- Simple but less powerful than LR(1).
📘 Topic: Constructing Canonical LR Parsing Tables
What is Canonical LR Parser?
English: Canonical LR parser (also called LR(1) parser) is a powerful bottom-up parser. It uses LR(1) items which have lookahead symbols to decide parsing actions more precisely than SLR or LR(0).
Hinglish: Canonical LR parser ek strong bottom-up parser hai. Ye LR(1) items use karta hai jisme lookahead symbol hota hai, jo parsing decisions ko SLR ya LR(0) se zyada accurate banata hai.
What is an LR(1) Item?
- 💡 An LR(1) item is a production with a dot (•) and a lookahead symbol (terminal or $).
- Example: [A → α • β, a]
Means:
- We have parsed α and expect β next,
- The lookahead symbol is a (the next input symbol expected after this production).
Steps to Construct Canonical LR Parsing Table
- Step 1: Augment the Grammar
  - Add a new start symbol: S' → S.
- Step 2: Construct the Canonical Collection of LR(1) Items
  - Start with the item [S' → • S, $].
  - Use Closure to add all items that can appear, given the lookahead.
  - Use the Goto function to move the dot over terminals or non-terminals.
  - Repeat until all states (sets of LR(1) items) are found.
- Step 3: Build ACTION and GOTO Tables
  - For each state (set of LR(1) items):
    - If an item is [A → α • a β, b] where a is a terminal, ACTION[state, a] = shift to GOTO(state, a).
    - If an item is [A → α •, a] (dot at the end), ACTION[state, a] = reduce by A → α.
    - If an item is [S' → S •, $], ACTION[state, $] = accept.
    - For a non-terminal A, GOTO[state, A] = the next state from the GOTO function.
- Step 4: Handle Conflicts
  - Canonical LR parsers usually have fewer conflicts than SLR or LR(0).
  - If conflicts appear, the grammar may not be LR(1).
Example (Simple Grammar for Canonical LR)
Grammar:
S → CC
C → cC | d
Augmented:
S' → S
Start with [S' → •S, $]; its closure gives items with lookahead $.
Build sets of LR(1) items by moving dot and adjusting lookaheads.
Construct ACTION and GOTO tables using rules above.
Difference from SLR
- 💡 SLR uses FOLLOW sets for reduce actions; Canonical LR uses specific lookaheads in items.
- 💡 Canonical LR is more precise, detects fewer conflicts.
Summary (Hinglish)
- Canonical LR parser LR(1) items ke saath kaam karta hai (dot + lookahead).
- Ye parsing tables ACTION aur GOTO banata hai states ke basis par.
- Reduce tab hota hai jab dot end mein ho aur lookahead match kare.
- Shift tab hota hai jab dot terminal ke aage ho.
- Sabse powerful bottom-up parser hai.
📘 Topic: Constructing LALR Parsing Tables
What is LALR Parser?
English: LALR (Look-Ahead LR) parser is a simplified version of the Canonical LR parser. It merges states with the same LR(0) items but different lookaheads to reduce the size of the parsing table, making it more memory efficient while keeping much of LR(1)’s power.
Hinglish: LALR parser Canonical LR ka simplified form hai. Ye un states ko jodta hai jinke paas same LR(0) items hain lekin alag lookahead hote hain, isse parsing table chhota aur memory efficient hota hai, par power bhi kaafi achhi rehti hai.
Why LALR?
- 💡 Canonical LR tables bahut bade ho sakte hain (states bahut zyada).
- 💡 LALR tables chhote hote hain, isliye zyada use kiye jate hain compiler design mein.
Main Idea of Constructing LALR Tables
- Step 1: Construct the canonical collection of LR(1) items (states with dot + lookahead).
- Step 2: Combine (merge) states that have the same LR(0) core (same items ignoring lookahead).
- Step 3: Merge the lookahead sets of these states.
- Step 4: Build ACTION and GOTO tables from these merged states.
Step-by-step Construction (LALR)
- Step 1: Build Canonical LR(1) Collection
- Like Canonical LR, build all LR(1) item sets with dot and lookahead.
- Step 2: Identify LR(0) Cores
- For each LR(1) state, find the LR(0) core by ignoring lookahead symbols.
- Step 3: Merge States with Same LR(0) Core
- Combine all states whose cores are the same.
- Merge their lookahead sets to form one state.
- Step 4: Construct ACTION and GOTO Tables
- Use the merged states to fill ACTION and GOTO tables similarly to Canonical LR.
- Shift and reduce decisions are based on merged lookahead sets.
- Step 5: Resolve Conflicts
- Because states are merged, some conflicts may appear which were not in Canonical LR.
- Mostly LALR grammars work well for programming languages.
Example (Simple Explanation) for LALR
Consider grammar:
S → CC
C → cC | d
Build Canonical LR(1) states (with lookaheads).
Identify LR(0) cores (just items without lookahead).
Merge states with same cores, combine their lookaheads.
Build ACTION and GOTO from merged states.
Comparison Table: Canonical LR vs. LALR
Feature | Canonical LR | LALR |
---|---|---|
Number of States | Many (larger) | Fewer (smaller) |
Lookahead Handling | Precise (full LR1) | Merged lookaheads |
Table Size | Large | Smaller |
Conflict Handling | Fewer conflicts | May have some conflicts |
Usage | Powerful, theoretical | Practical, widely used |
Summary
- LALR parser is practical, memory-efficient version of Canonical LR.
- Build Canonical LR(1) states first, then merge states with same LR(0) core.
- Construct parsing table from merged states.
- Commonly used in many compilers (e.g., yacc, bison).
📘 Topic: Using Ambiguous Grammars
What is an Ambiguous Grammar?
English: A grammar is called ambiguous if there exists at least one string in the language that can have more than one distinct parse tree or derivation. In simple words, the grammar gives multiple ways to understand or analyze the same sentence.
Hinglish: Jab kisi grammar se koi ek sentence ko do ya usse zyada alag-alag tarike se parse ya derive kiya ja sake, to wo grammar ambiguous kehlata hai. Matlab ek sentence ka multiple meaning ya structure nikal sakta hai.
Why Is Ambiguity a Problem?
- 👎 Compiler ko samajhne mein dikkat hoti hai ki actual meaning kya hai.
- 👎 Multiple parse trees mean confusion in meaning or operations.
- 👎 Parsing algorithms (especially deterministic ones) can fail or be inefficient.
Example of Ambiguous Grammar (Arithmetic Expression)
Grammar:
E → E + E
E → E * E
E → (E)
E → id
Sentence: id + id * id
Two parse trees possible:
(id + id) * id
id + (id * id)
Use of Ambiguous Grammars
- 💡 Ambiguous grammars are sometimes used temporarily during design to describe language intuitively.
- 💡 But compilers cannot use ambiguous grammars directly because they cause parsing confusion.
- 💡 Ambiguity must be resolved for deterministic parsing.
Resolving Ambiguity
- Write unambiguous grammar versions by adding precedence and associativity rules.
- For example, rewrite above grammar with precedence:
E → E + T | T
T → T * F | F
F → (E) | id
This grammar is unambiguous and enforces * precedence over +.
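In parser generators such as Yacc/Bison (covered below), the ambiguous grammar can also be kept as-is and the ambiguity resolved with explicit precedence and associativity declarations. A minimal grammar fragment as a sketch (it still needs a lexer and driver code, like the full YACC example later in this chapter):
%{
#include <stdio.h>
%}
%token NUM
%left '+'              /* lower precedence, left-associative  */
%left '*'              /* higher precedence, left-associative */
%%
expr: expr '+' expr    { printf("Add\n"); }
    | expr '*' expr    { printf("Multiply\n"); }
    | '(' expr ')'
    | NUM
    ;
%%
In Yacc/Bison, declarations listed later have higher precedence, so '*' binds tighter than '+', and %left makes both operators left-associative.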
Summary (Hinglish)
- Ambiguous grammar ka matlab hai ek sentence ke liye multiple parse trees possible.
- Ye compiler ke liye problem create karta hai parsing mein.
- Sometimes design mein use hota hai, but final compiler grammar should be unambiguous.
- Ambiguity resolve karne ke liye grammar ko rewrite karte hain precedence aur associativity rules ke saath.
📘 Topic: Automatic Parser Generator
What is an Automatic Parser Generator?
English: An automatic parser generator is a software tool that automatically creates a parser from a given grammar. Instead of writing the parsing code by hand, you just provide the grammar rules, and the tool generates the parser program for you.
Hinglish: Automatic parser generator ek aisa software hai jo grammar dene par apne aap parser bana deta hai. Matlab tumhe manually parsing code nahi likhna padta, bas grammar do aur parser ready ho jata hai.
Why Use Automatic Parser Generators?
- 💡 Writing parsers manually is time-consuming and error-prone.
- 💡 Automates the process of parser construction.
- 💡 Ensures correctness of parsing logic according to the grammar.
- 💡 Easy to modify parser by just changing grammar rules.
Common Examples of Parser Generators
- 💡 Yacc (Yet Another Compiler Compiler) — Mostly for LALR parsers.
- 💡 Bison — GNU version of yacc.
- 💡 ANTLR (ANother Tool for Language Recognition) — Supports LL(*) parsers.
- 💡 JavaCC — For Java parsers.
How Automatic Parser Generator Works?
- User provides the grammar specification in a specific format.
- Tool analyzes the grammar and creates parsing tables (like LR or LL tables).
- It generates the parser code (in C, Java, or other languages).
- This parser code can be integrated into your compiler or interpreter.
Example of Parser Generator Use
Suppose you have grammar rules for arithmetic expressions:
expr → expr + term | term
term → term * factor | factor
factor → (expr) | id
You give this grammar to Yacc or ANTLR.
It generates parser code that can parse expressions like id + id * id.
Summary
- Automatic parser generators save time and effort.
- They automatically handle complicated parsing tables and code generation.
- Most modern compilers use these tools.
- You only need to focus on defining grammar correctly.
📘 Topic: YACC Tool (Yet Another Compiler Compiler)
What is YACC?
English: YACC stands for Yet Another Compiler Compiler. It is a popular tool used to automatically generate a parser based on a given grammar. YACC takes the grammar rules as input and produces C code for a parser that can recognize strings of that grammar.
Hinglish: YACC ka matlab hai Yet Another Compiler Compiler. Ye ek tool hai jo grammar ko lekar apne aap parser banata hai. Tum grammar do, YACC C language mein parser ka code bana ke deta hai.
Purpose of YACC
- 💡 Automatically generate parsers for programming languages or data formats.
- 💡 Helps compiler writers avoid manual parser coding.
- 💡 Supports LALR(1) parsing technique (efficient and powerful).
How YACC Works?
- Input: Grammar rules written in YACC syntax (similar to BNF).
- YACC generates: C source code for the parser (y.tab.c).
- Output parser: Reads tokens from a lexical analyzer (usually Lex) and builds parse trees or performs semantic actions.
Basic Structure of a YACC File
%{
// C declarations and headers
%}
%%
// Grammar rules and actions
%%
// Additional C functions
Example of YACC Grammar with Actions
Suppose grammar for simple arithmetic expressions:
%{
#include <stdio.h>
%}
%token NUM
%%
expr: expr '+' term { printf("Add\n"); }
| term
;
term: term '*' factor { printf("Multiply\n"); }
| factor
;
factor: '(' expr ')'
| NUM
;
%%
int main() {
yyparse();
return 0;
}
int yyerror(char *s) {
fprintf(stderr, "Error: %s\n", s);
return 0;
}
- %token NUM declares tokens coming from the lexer (like numbers).
- Actions such as { printf("Add\n"); } execute when the corresponding rules match.
Integration with Lex
- 💡 YACC parser usually gets input tokens from a lexical analyzer tool like Lex.
- 💡 Lex breaks input into tokens (like NUM, '+', '*').
- 💡 YACC uses these tokens to parse input.
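As a sketch, a minimal Lex specification that could feed the YACC grammar above might look like this (y.tab.h is the token definition header produced by running YACC with the -d option; the token names must match the %token declarations):
%{
#include "y.tab.h"   /* token definitions generated by YACC (-d option) */
%}
%%
[0-9]+      { return NUM; }        /* numbers become NUM tokens */
[+*()]      { return yytext[0]; }  /* single-character operator tokens */
[ \t\n]     ;                      /* skip whitespace */
%%
int yywrap(void) { return 1; }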
Advantages of YACC
- 👍 Automates parser generation from grammar.
- 👍 Saves manual coding effort.
- 👍 Supports error recovery and debugging features.
- 👍 Widely used and supported.
Disadvantages of YACC
- 👎 Generated code is in C only.
- 👎 LALR(1) parsing may not handle all complex grammars.
- 👎 Learning curve for writing proper grammar and actions.
Summary (Hinglish)
- YACC ek tool hai jo grammar se parser C code generate karta hai.
- Mostly compiler banane mein use hota hai.
- Lex ke sath milke input tokenize karta hai, phir YACC parse karta hai.
- Grammar ko proper format mein likhna padta hai YACC mein.