- 使用 [[https://github.com/westes/flex|flex]] 和 [[http://www.gnu.org/software/bison/|Bison]]。
* [[http://westes.github.io/flex/manual/|Lexical Analysis With Flex]]
* [[https://www.gnu.org/software/bison/manual/bison.html|Bison 3.0.4]]
- 參考書籍
* [[https://www.amazon.com/flex-bison-Text-Processing-Tools/dp/0596155972|flex & bison: Text Processing Tools]]
* [[https://www.amazon.com/Language-Implementation-Patterns-Domain-Specific-Programming/dp/193435645X|Language Implementation Patterns]]
* [[http://www.ituring.com.cn/book/1159|自製編程語言]]
- [[ANTLR]]
====== 基礎教學 ======
{{http://epaperpress.com/lexandyacc/images/fig12.gif}}
- 撰寫 bas.y (yacc 檔) 和 bas.l (lex 檔)。
- yacc 編譯 bas.y 產生定義 token 的 y.tab.h 給 lex.yy.c include; 產生定義 parser 的 y.tab.c,yyparse 是 parser 的入口函式。
- lex 編譯 bas.l 產生定義 scanner 的 lex.yy.c,yylex 是 scanner 的入口函式。y.tab.c 的 yyparse 調用 yylex。
- cc (C 編譯器) 編譯 y.tab.c 和 lex.yy.c,得到自製編譯器 bas.exe。
- bas.exe 編譯 source,得到 compiled output。
* [[http://www.ituring.com.cn/book/1159|自制编程语言]]
* [[http://download.csdn.net/detail/matrixvirus/6573659|自制编程语言]].
* [[http://web.iitd.ac.in/~sumeet/flex__bison.pdf|flex & bison]]
* [[http://www.cse.iitb.ac.in/~br/courses/cs699-autumn2013/refs/lexyacc-epaperpress.pdf|A COMPACT GUIDE TO LEX & YACC]]
* 寫 lex 需要懂 [[wp>Regular expression]],寫 yacc 需要懂 [[wp>Backus–Naur form|Backus–Naur form (BNF)]]。
* 要懂怎麼建 ast tree, walk ast tree, 建 symbol table。
* 要懂怎麼為輸入輸出建模,錯誤訊息怎麼有條理的輸出。
===== BNF =====
* [[http://good-ed.blogspot.tw/2010/02/ebnf.html|(E)BNF表示式]]
* [[http://good-ed.blogspot.tw/2010/03/blog-post.html|手工打造算式計算機]]
* [[http://good-ed.blogspot.tw/2010/04/lexyacc.html|以lex/yacc實作算式計算機]]
* [[wp>LR parser]]
* LR(k) 代表往前看幾個 token。
* [[wp>GLR parser]]
* [[http://www.gnu.org/software/bison/manual/html_node/GLR-Parsers.html|1.5 Writing GLR Parsers]]
* [[http://www.gnu.org/software/bison/manual/html_node/Debugging.html|8 Debugging Your Parser]]
* [[http://www.gnu.org/software/bison/manual/html_node/Understanding.html|8.1 Understanding Your Parser]]
* 小技巧
* [[http://stackoverflow.com/questions/14665976/flex-use-text-file-as-input-stream|Flex: Use Text File as Input Stream]]
* [[http://stackoverflow.com/questions/8794239/can-flex-and-bison-accept-input-from-other-sources|Can flex and bison accept input from other sources?]]
* [[http://stackoverflow.com/questions/1756275/bison-end-of-file|bison end of file]]
* 問題在於 Bison 遇到 EOF,其動作是中止文法推導 (return 0)。此時 stack 中尚有 symbol,因此 Bison 判定為錯誤。故人為預先在文本或是輸入流加上 \n,使得文法推導成功。
* [[http://stackoverflow.com/questions/9611682/flexlexer-support-for-unicode|Flex(lexer) support for unicode]]
====== Flex & Bison 應用 ======
* [[http://www.di-mgt.com.au/flex_and_bison_in_msvc.html|Using flex and bison in MSVC++]]
* [[http://blog.sina.com.cn/s/blog_6476250d0100wq2q.html|【计算机】在Visual Studio 2008中使用Flex、Bison工具的配置]]
* 在 Windows 上建議使用 Cygwin 提供的 Flex & Bison。
* [[http://flex.sourceforge.net/manual/Makefiles-and-Flex.html|A.1 Makefiles and Flex]]
===== Flex =====
definitions
%%
rules
%%
user code
* [[http://flex.sourceforge.net/manual/Definitions-Section.html|5.1 Format of the Definitions Section]]
* 以 {name} 的形式使用,避免不必要的錯誤。
* [[http://flex.sourceforge.net/manual/Rules-Section.html|5.2 Format of the Rules Section]]
* [[http://flex.sourceforge.net/manual/Patterns.html|6 Patterns]]
* These expressions all designate a set of characters equivalent to the corresponding standard C isXXX function.
* [[http://marvin.cs.uidaho.edu/Handouts/regex.html|A Regular Expression Primer]]
* [[http://flex.sourceforge.net/manual/User-Code-Section.html|5.3 Format of the User Code Section]]
* [[http://flex.sourceforge.net/manual/Comments-in-the-Input.html|5.4 Comments in the Input]]
* 新起一行,一個以上空白,後接 /* comment */。
* 常用選項
* [[http://flex.sourceforge.net/manual/Code_002dLevel-And-API-Options.html|16.3 Code-Level And API Options]]
* %option bison-locations
* 產生位置資訊供 bison 使用。此選項隱含 %option bison-bridge
* %option bison-bridge
* 產生適用於 bison 的接口。
* %option reentrant
* 產生可重入的 scanner。
* %option nounistd
* 避免引入非標準的 unistd 標頭檔。
* %option noyywrap
* %option nodefault
* %option yylinenumber
* %option case-insensitive
* 小技巧
* [[http://flex.sourceforge.net/manual/Start-Conditions.html|10 Start Conditions]]
* Lex 規則可以用 start condition 分組。start condition 可以是 inclusive 或是 exclusive。
* 切換至 inclusive start condition 時,沒有隸屬任何 start condition 的規則也會被納入匹配。If the start condition is inclusive, then rules with no start conditions at all will also be active.
* 切換至 exclusive start condition 時,只有隸屬該 exclusive start condition 會被納入匹配。If it is exclusive, then only rules qualified with the start condition will be active.
* BEGIN(0) 或是 BEGIN(INITIAL) 都會切換回預設的 start condition (INITIAL)。
* [[http://stackoverflow.com/questions/22779138/when-should-i-use-begin-to-change-start-condition|When should I use BEGIN to change start condition?]]
%s example ; example 是 inclusive start condition。
%%
; 當切換到 example start condition 時,foo 和 bar 都會被激活。
foo do_something();
bar something_else();
* [[http://flex.sourceforge.net/manual/How-can-I-match-C_002dstyle-comments_003f.html|How can I match C-style comments?]]
* [[http://stackoverflow.com/questions/15396322/the-quotation-marks-in-flex|The quotation marks “ ” in flex]]
* 雙引號用意在於告訴 lex 不用解釋其中內容。正則表達式所用的符號不起作用。
* 如果要匹配雙引號,使用 \"。
* 單引號不起作用,單純代表單引號本身。
* Bison 用雙引號代表要匹配的字串,單引號代表要匹配的字元。
===== Bison =====
Bison 3.0 版之後有較大改變。注意手冊版本。
* [[http://www.gnu.org/software/bison/manual/html_node/Grammar-Outline.html|3.1 Outline of a Bison Grammar]]
* [[http://www.gnu.org/software/bison/manual/html_node/Grammar-File.html|3 Bison Grammar Files]]
* [[http://www.gnu.org/software/bison/manual/html_node/Recursion.html|3.3.3 Recursive Rules]]
/* left recursion */
expseq1:
exp
| expseq1 ',' exp
;
/* right recursion */
expseq1:
exp
| exp ',' expseq1
;
* right recursion 有可能會造成 bison 內部的棧空間耗盡,但優點在於鏈結串列的節點是由頭串聯至尾,無須反向。
* 可以透過 YYMAXDEPTH ([[http://www.gnu.org/software/bison/manual/html_node/Memory-Management.html|5.10 Memory Management, and How to Avoid Memory Exhaustion]]) 加大 bison 內部的棧空間。
* [[http://stackoverflow.com/questions/1697363/left-recursion-in-grammar-results-in-conflicts|Left Recursion in Grammar Results in Conflicts]]
* [[http://www.gnu.org/software/bison/manual/html_node/Declarations.html|3.7 Bison Declarations]]
* [[http://www.gnu.org/software/bison/manual/html_node/Decl-Summary.html|3.7.12 Bison Declaration Summary]]
* [[http://www.gnu.org/software/bison/manual/html_node/Interface.html|4 Parser C-Language Interface]]
* [[http://www.gnu.org/software/bison/manual/html_node/Parser-Function.html|4.1 The Parser Function yyparse]]
* 可以藉由檢查 yyparse 返回值,簡單查驗 parse 是否成功。
* [[http://www.gnu.org/software/bison/manual/html_node/Algorithm.html|5 The Bison Parser Algorithm]]
* [[http://www.gnu.org/software/bison/manual/html_node/Lookahead.html|5.1 Lookahead Tokens]]
* [[http://www.gnu.org/software/bison/manual/html_node/Shift_002fReduce.html|5.2 Shift/Reduce Conflicts]]
* 不管是 shift 匹配到一個 symbol,或是當下 reduce (stack) 到一個 rule,都是合法行徑的時候,即為 Shift/Reduce Conflict。
* 觀察 *.output,可以得知發生 Shift/Reduce Conflict 的狀態為何。如果有吃進同一個 token, 不論 shift 或是 reduce 皆可,即是 Shift/Reduce Conflict。
* [[http://www.gnu.org/software/bison/manual/html_node/Non-Operators.html|5.3.6 Using Precedence For Non Operators]]
* [[http://www.gnu.org/software/bison/manual/html_node/Expect-Decl.html|3.7.8 Suppressing Conflict Warnings]]
* [[http://www.gnu.org/software/bison/manual/html_node/Multiple-Types.html#Multiple-Types|More Than One Value Type]]
* [[http://www.gnu.org/software/bison/manual/html_node/Token-Decl.html|Token Type Names]]
%token TOKEN
Each call to yylex() returns an integer value which represents a token type. This tells YACC what kind of token it has read. The token may optionally have a value, which should be placed in the variable yylval.
By default yylval is of type int, but you can override that from the YACC file by re#defining YYSTYPE.
* token 有以下三種含義:
* 回傳給 yacc 的 token type。
* token 的原始字串,yytext。
* 回傳給 yacc 的 token value,yylval。
使用 %union 讓 yylval 可以有多種型別。
%union
{
int number;
char *string;
}
%token STATE
%token NUMBER
%token WORD
[0-9]+ yylval.number=atoi(yytext); return NUMBER;
[a-z0-9]+ yylval.string=strdup(yytext);return WORD;
====== 優先權與結合性 ======
* [[http://www.gnu.org/software/bison/manual/html_node/Precedence-Decl.html|3.7.3 Operator Precedence]]
* [[http://www.gnu.org/software/bison/manual/html_node/Precedence.html|5.3 Operator Precedence]]
* [[http://www.gnu.org/software/bison/manual/html_node/Using-Precedence.html#Using-Precedence|5.3.2 Specifying Operator Precedence]]
* The relative precedence of different operators is controlled by the order in which they are declared. The first precedence/associativity declaration in the file declares the operators whose precedence is lowest, the next such declaration declares the operators whose precedence is a little higher, and so on.
====== 建立符號表 ======
* [[http://www.d.umn.edu/~rmaclin/cs5641/Notes/L15_SymbolTable.pdf|Symbol Table Implementations]]
* [[http://pages.cs.wisc.edu/~cs536-1/readings/6.SYMBOL-TABLES.html|Symbol Tables and Static Checks]]
* 2 Using Flex
====== 建立語法樹 ======
Bison 中所寫的語法規則,有些僅是用來處理運算子優先順序,和程式無關。抽象語法樹即是將前述語法規則剔除。
* [[http://web.eecs.utk.edu/~bvz/teaching/cs461Sp11/notes/parse_tree/|Building Abstract Syntax Trees]]
* 遍歷抽象語法樹
* [[http://stackoverflow.com/questions/2644597/how-do-i-implement-if-statement-in-flex-bison|How do i implement If statement in Flex/bison]]
* [[http://efxa.org/2014/05/25/how-to-create-an-abstract-syntax-tree-while-parsing-an-input-stream/|How to create an abstract syntax tree while parsing an input stream]]
* 3 Using Bison
====== Mid-Rule Action ======
* [[https://www.gnu.org/software/bison/manual/html_node/Using-Mid_002dRule-Actions.html|3.4.8.1 Using Mid-Rule Actions]]
* A mid-rule action may refer to the components preceding it using $n, but it may not refer to subsequent components because it is run before they are parsed.
* mid-rule action 可以參照到前面 symbol 的值,但無法存取其後 symbol 的值。
* The mid-rule action itself counts as one of the components of the rule. This makes a difference when there is another action later in the same rule (and usually there is another at the end): you have to count the actions along with the symbols when working out which number n to use in $n.
* mid-rule action 本身也算成一個 symbol。
* The mid-rule action can also have a semantic value. The action can set its value with an assignment to $$, and actions later in the rule can refer to the value using $n. Since there is no symbol to name the action, there is no way to declare a data type for the value in advance, so you must use the ‘$<…>n’ construct to specify a data type each time you refer to this value.
* mid-rule action 本身可以有值。
* There is no way to set the value of the entire rule with a mid-rule action, because assignments to $$ do not have that effect. The only way to set the value for the entire rule is with an ordinary action at the end of the rule.
* mid-rule action 的值無法賦值給本身所屬的規則。
====== 錯誤回報 ======
* [[https://www.ibm.com/developerworks/opensource/library/l-flexbison/|Better error handling using Flex and Bison]]
* [[http://stackoverflow.com/questions/15429834/flex-bison-building-c-compiler-line-number-in-error-message|Flex Bison: Building C Compiler - Line number in error message]]
* [[http://stackoverflow.com/questions/656703/how-does-flex-support-bison-location-exactly|How does flex support bison-location exactly?]]
* [[http://www.gnu.org/software/bison/manual/html_node/Error-Reporting.html|4.7 The Error Reporting Function yyerror]]
* 取得更精確的錯誤訊息。
* 不同版本 Bison 有不同控制項
* %error-verbose
* %define parse.error verbose
If you invoke ‘%define parse.error verbose’ in the Bison declarations section (see The Bison Declarations Section), then Bison provides a more verbose and specific error message string instead of just plain "syntax error". However, that message sometimes contains incorrect information if LAC is not enabled (see LAC).
* [[http://www.gnu.org/software/bison/manual/html_node/LAC.html|5.8.3 LAC]]
* [[http://flex.sourceforge.net/manual/Generated-Scanner.html|9 The Generated Scanner]]
* 自訂 YY_INPUT
* BeginToken 紀錄 token 起終點。
* Lex 讀取 token 時,有可能會多讀取字元,以便判定是否辨識出一個 token。此時需要修正 token 位置,輸出錯誤訊息時才能正確定位。
* [[http://www.gnu.org/software/bison/manual/html_node/Tracking-Locations.html|3.5 Tracking Locations]]
* [[http://www.gnu.org/software/bison/manual/html_node/Location-Type.html|3.5.1 Data Type of Locations]]
* YYLTYPE
* 位置的預設資料結構。可以自行定義加以擴充。
* [[http://www.gnu.org/software/bison/manual/html_node/Actions-and-Locations.html|3.5.2 Actions and Locations]]
* @1 代表 $1 的位置資料結構。
* [[http://www.gnu.org/software/bison/manual/html_node/Location-Default-Action.html|3.5.3 Default Action for Locations]]
* YYLLOC_DEFAULT
* 處理位置的預設動作
* [[http://www.gnu.org/software/bison/manual/html_node/Error-Recovery.html|6 Error Recovery]]
* [[http://marvin.cs.uidaho.edu/Teaching/CS445/bisonErrorToken.html|Using the Error Token in Bison]]
* 如果需要累計錯誤,不中斷剖析,使用 error token。
* [[http://stackoverflow.com/questions/19005355/how-to-count-syntax-errors-in-bison-flex|how to count syntax errors in bison-flex]]
* 在 yyerror 裡面計數。yacc 提供的 error token 不是用來記數錯誤,而是用來做錯誤回復 (error recovery)。
* 8. Error Reporting and Recovery
===== 記錄位置 =====
* [[http://lc.life.blog.163.com/blog/static/2405905920086201624532/|yylineno]]
* [[http://stackoverflow.com/questions/13317634/flex-yylineno-set-to-1|Flex yylineno set to 1]]
* 必須要有處理 \n 的規則,yylineno 才會累加。
* [[http://hi.baidu.com/gao_dennis/item/8fb10642a38136efa5c066c7|lex 和 yacc 学习笔记5 (bison和flex 记录token的位置信息)]]
* 要開啟 location 的支援,必須同時修改 flex 和 bison。
* flex 添加 ''%option bison-locations'' (''%option bison-bridge'' 可移除)。
* flex 添加如下代碼:
/* handle locations */
//static int yycloumn = 1; // yycloumn 自動產生
#define YY_USER_ACTION yylloc->first_line = yylloc->last_line = yylineno; \
yylloc->first_column = yycloumn; yylloc->last_column = yycloumn+yyleng-1; \
yycolumn += yyleng;
* flex 在有處理 \n 的規則將 yycloumn 歸為 1。
* bison 添加 ''%locations''。函式 yyerror 第一個參數為 YYLTYPE*。
* lex 把 ' ' 和 '\t' 都視做長度為一,所以針對 '\t' 計算位置會有問題。
* [[http://stackoverflow.com/questions/8197812/how-does-one-configure-notepad-to-use-spaces-instead-of-tabs|How does one configure Notepad++ to use spaces instead of tabs?]]
* 編輯器將 tab 轉換成固定個數的 space 傳給 parser。
* [[http://websystemsengineering.blogspot.tw/2012/12/replacing-tabs-with-spaces-in-notepad.html|Replacing tabs with spaces in Notepad++]]
* 原來文本中的 tab 不受新設定的影響,不會被轉換成 space。
====== 上下文相關語法 ======
* [[http://www.gnu.org/software/bison/manual/bison.html#Context-Dependency|7 Handling Context Dependencies]]
* [[http://www.gnu.org/software/bison/manual/bison.html#Semantic-Tokens|7.1 Semantic Info in Token Types]]
* 由 flex 負責決定回傳 token 的型別。
* [[http://www.gnu.org/software/bison/manual/bison.html#Lexical-Tie_002dins|7.2 Lexical Tie-ins]]
* 由 Bison 設置標誌,藉此更改 flex 的行為。
* [[http://www.gnu.org/software/bison/manual/bison.html#Tie_002din-Recovery|7.3 Lexical Tie-ins and Error Recovery]]
* 錯誤回復需要調整。
* [[wp>The lexer hack]]
* [[http://eli.thegreenplace.net/2012/07/05/how-clang-handles-the-type-variable-name-ambiguity-of-cc/|How Clang handles the type / variable name ambiguity of C/C++]]
* [[http://www.drdobbs.com/cpp/practical-parsing-for-ansi-c/196603535|Practical Parsing for ANSI C]]
* Lexer 和 Parser 並非各自獨立。
* check_type (SDCC.lex) 負責辨識 typename 和 identifier。
====== 錯誤回復 ======
* [[http://shape-of-code.coding-guidelines.com/2010/04/19/brief-history-of-syntax-error-recovery/|Brief history of syntax error recovery]]
* 設計語言時,如果有所謂的 statement terminiator (如: 分號或斷行),該處會是適合的同步點 (synchronization point)。
stmt : whileStmt
| ifStmt
| ...
| error NEWLINE /* 此處選擇斷行符做為同步點。*/
;
* [[http://inst.eecs.berkeley.edu/~cs164/sp11/lectures/lecture15-2x2.pdf]]
* 如果 lex 必須忽略分隔符 (如: 斷行),則在 yacc 端處理。
* [[http://stackoverflow.com/questions/13096879/bison-error-recovery|bison error recovery]]
* [[http://stackoverflow.com/questions/19005355/how-to-count-syntax-errors-in-bison-flex|how to count syntax errors in bison-flex]]
* [[http://www.gnu.org/software/bison/manual/html_node/Simple-Error-Recovery.html|2.3 Simple Error Recovery]]
* [[https://www.gnu.org/software/bison/manual/html_node/Error-Recovery.html|6 Error Recovery]]
* [[http://marvin.cs.uidaho.edu/Teaching/CS445/bisonErrorToken.html|Using the Error Token in Bison]]
* [[https://www.gnu.org/software/bison/manual/html_node/Destructor-Decl.html|3.7.6 Freeing Discarded Symbols]]
* 進行錯誤回復的過程中,yacc 可能會丟棄已處理的 symbol。此時需要將被丟棄的 symbol 使用的內存釋放掉。
====== 衝突處理 ======
根本原因: 當 Bison 試圖 reduce 時,有其它 shift 或是 reduce 的可能,即為 shift/reduce 或是 reduce/reduce conflict。以下圖來看,"." 出現在規則結尾處時,代表有 reduce 發生。此時,如果有其它 "." 出現,代表衝突發生。
State 14 conflicts: 1 shift/reduce
...
State 14
12 input_state_path: T_INPUT_STATE_NO_PATH . T_DIVIDE identifier
13 | T_INPUT_STATE_NO_PATH .
T_DIVIDE shift, and go to state 19
T_DIVIDE [reduce using rule 13 (input_state_path)]
$default reduce using rule 13 (input_state_path)
* . 代表目前正讀入 T_INPUT_STATE_NO_PATH,規則 12 和規則 13 同時處於此狀態 (State 14)。
* 如果下一個 token 是 T_DIVIDE,Bison 可以選擇使用規則 12,shift T_DIVIDE,轉換到狀態 19。
* 如果下一個 token 是 T_DIVIDE,Bison 可以選擇使用規則 13,reduce。
* 對於 shift/reduce conflict,Bison 預設選擇 shift。因此沒有被用到的規則 13,以中括號表示。
* shift/reduce conflict
* Bison 預設選擇 shift
* 解決方法:
* 從 *.output 檔找到有 shift/reduce conflict 的狀態。
* 辨識出 reduce rule。
* 辨識出與之相衝突的 shift rule。"reduce using" 語句前面有 reduce 預期的下一個 token,看哪一條 shift 預期的下一個 token 和它相同。
* 辨識出 reduce rule 會 reduce 到的狀態。
* 推導出導致 shift/reduce conflict 的字串。
* reduce/reduce conflict
* Bison 預設選擇第一條規則 reduce
* [[http://www.codeproject.com/Articles/5221/The-Bison-v-verbose-option|The Bison -v/--verbose option]]
* 容易發生衝突的情況 (基本上都是因為 recursive 性質所引起):
* expression grammers: 同一條規則,但此規則屬於 recursive rule,因此衝突有機會發生在同一條規則上。
* if/then/else: 具有相同 LHS 的規則。
* nest lists of items
$ grep "State 14" -n Syntax_InputState.output
$ sed -n '211,222p' Syntax_InputState.output > Syntax_InputState.output.new
* [[http://serverfault.com/questions/133692/how-to-display-certain-lines-from-a-text-file-in-linux|How to display certain lines from a text file in Linux?]]
===== GLR 剖析器 =====
* Bison 是 LALR(1),1 代表 lookahead 的 token 個數為 1。如果文法需要 2 個 lookahead,考慮改用 GLR。
* LALR: Look Ahead, Left-to-Right, rightmost derivation
* [[wiki>LALR parser]]
* scanner 和 parser 現在都需要引入 tab.h
* [[http://stackoverflow.com/questions/4946070/bison-flex-peek-at-the-next-letter-or-token|bison/flex: Peek at the next letter or token]]
* 搭配 lex 的 r/s 樣式或許可以解決 lookahread 的問題。
* [[http://flex.sourceforge.net/manual/Patterns.html#Patterns|6 Patterns]]
* [[http://flex.sourceforge.net/manual/Limitations.html#Limitations|24 Limitations]]
對於原本文法存在的 shift/reduce 或是 reduce/reduce conflict,GLR parser 於衝突處分成多個 parser 同時並行處理。注意! GLR parser 處理時間隨 conflict 個數增加。
%glr-parser
%expect 2
%expect-rr
* [[http://www.gnu.org/software/bison/manual/html_node/Expect-Decl.html|3.7.8 Suppressing Conflict Warnings]]
* [[http://www.gnu.org/software/bison/manual/bison.html#GLR-Semantic-Actions|1.5.3.1 Deferred semantic actions]]
[[http://stackoverflow.com/questions/24748594/additional-syntax-error-message-on-glr-parser-when-syntax-is-ambiguous|Additional syntax error message on GLR parser when syntax is ambiguous]]
* 使用 %dprec 解決 reduce/reduce conflict
* 使用 %merge
===== Q & A =====
* [[http://flex.sourceforge.net/manual/Whenever-flex-can-not-match-the-input-it-says-_0022flex-scanner-jammed_0022_002e.html|Whenever flex can not match the input it says "flex scanner jammed".]]
* [[http://stackoverflow.com/questions/10267307/flex-default-rule|Flex default rule]]
* [[http://flex.sourceforge.net/manual/Flex-is-not-matching-my-patterns-in-the-same-order-that-I-defined-them_002e.html|Flex is not matching my patterns in the same order that I defined them.]]
* [[http://stackoverflow.com/questions/8379299/is-it-possible-to-set-priorities-for-rules-to-avoid-the-longest-earliest-match|Is it possible to set priorities for rules to avoid the “longest-earliest” matching pattern?]]
* Lex 會匹配最長的 input。可以用文中方法匹配自己想要的樣式。
* [[http://stackoverflow.com/questions/9611682/flexlexer-support-for-unicode|Flex(lexer) support for unicode]]
* 匹配 []
* [][]
* 注意,如果要在 character class 內匹配到 [],] 必須緊跟在 [ 之後。
* [[http://flex.sourceforge.net/manual/A-Note-About-yytext-And-Memory.html|21.3 A Note About yytext And Memory]]
* [[http://bbs.csdn.net/topics/260011540|请教:strdup与strcpy具体的区别]]
* [[http://stackoverflow.com/questions/14020380/strcpy-vs-strdup|strcpy vs strdup]]
* [[http://stackoverflow.com/questions/7582394/strdup-or-strdup|strdup or _strdup?]]
* 如果看到取得字串有問題,可能是沒有用 strdup 複製 yytext。
* [[http://stackoverflow.com/questions/5613073/yacc-rule-useless-due-to-conflicts|Yacc “rule useless due to conflicts”]]
* [[http://www.gnu.org/software/bison/manual/html_node/Mid_002dRule-Conflicts.html|3.4.8.3 Conflicts due to Mid-Rule Actions]]
* 當 Yacc 無法藉由有限個數的 lookahead 來推導規則會引發此錯誤。Mid-Rule Action 有時也會引發此錯誤。
* [[http://stackoverflow.com/questions/6797221/bison-how-to-ignore-a-token-if-it-doesnt-fit-into-a-rule|Bison: How to ignore a token if it doesn't fit into a rule]]
* Bison 無法忽略 Flex 傳回的 token。
* [[http://stackoverflow.com/questions/17954253/how-to-get-rid-of-warnings-in-bison|How to get rid of warnings in Bison?]]
* [[http://www.gnu.org/software/bison/manual/html_node/Bison-Options.html|9.1 Bison Options]]
# 先關閉 waring,再開啟 conflicts-sr
% bison -Wnone,conflicts-sr
* [[http://stackoverflow.com/questions/16452737/is-it-possible-to-call-one-yacc-parser-from-another-to-parse-specific-token-subs|Is it possible to call one yacc parser from another to parse specific token substream?]]
====== 處理 INCLUDE ======
YY_BUFFER_STATE 是 lex 對特定輸入流所產生的句柄。其中包含輸入流的緩衝區,和其它相關變數。
#define YY_STRUCT_YY_BUFFER_STATE
struct yy_buffer_state
{
FILE *yy_input_file;
char *yy_ch_buf; /* input buffer */
char *yy_buf_pos; /* current position in input buffer */
/* Size of input buffer in bytes, not including room for EOB
* characters.
*/
yy_size_t yy_buf_size;
...
}
* [[http://westes.github.io/flex/manual/Multiple-Input-Buffers.html|11 Multiple Input Buffers]]
* yypush_buffer_state 和 yypop_buffer_state 將 YY_BUFFER_STATE 入棧和出棧。但除了 YY_BUFFER_STATE 之外,一般還需要其它資訊,因此較建議自行維護此類棧空間。
* [[https://stackoverflow.com/questions/2269929/how-can-i-implement-include-constructs-with-flex-and-yacc|How can I implement #include constructs with Flex and YACC?]]
* [[http://efxa.org/2014/05/11/implementing-the-include-directive-to-support-nested-input-files/|Implementing the “include” directive to support nested input files]]
* YY_CURRENT_BUFFER 代表當前的輸入流。
* Chapter 2: Using Flex
* Start States and Nested Input Files
* Chapter 5: A Reference for Flex Specifications
* Input Management
* Lex 的輸入流可以是:
* 標準輸入
* 檔案
* 字串: [[http://stackoverflow.com/questions/780676/string-input-to-flex-lexer|String input to flex lexer]]
* YYINPUT 將輸入流填充到緩衝區,使用者可以自行替換 YYINPUT,添加自己所需的功能。
yy_switch_to_buffer (yy_scan_buffer (char *base, yy_size_t size) );
* 第零階段。處理巨集定義和調用、條件編譯。
====== Lex & Yacc Internal ======
===== Flex =====
token 由兩部分組成: token 和 token value。
* Lex
* yytext: token
* yylval: token value,傳給 yacc
* yylex: lexer
%{
// declaration and option setting
#include
#include // yacc 根據 *.y 產生的 *.h,定義 token (number)
%}
%%
// pattern and action
REGEXP1 ACTION
REGEXP2 {ACTIONS}
%%
// helper function
* Yacc
* YYSTYPE 定義 yylval 的型別,這裡用改用 union
* 用 $- 取得第幾號 yylval
* yyparse: parser
%{
// include C header files
%}
// 定義 token。parser 據此判斷該符號是 token。
// 其餘非 token 的符號必須出現在 rule 左邊至少一次。
%%
// rule (grammer)
// EXPS ($$)
// EXP ($1)
// FACTORS ($2)
// TOKEN ($3)
EXPS: /* empty, match the beginning of the input */
| EXP FACTORS TOKEN
{
ACTIONS
}
; /* end of rule */
%%
// helper function
* [[wp>Flex lexical analyser]]
* [[http://flex.sourceforge.net/manual/|Flex Manual]]
* [[http://flex.sourceforge.net/manual/Start-Conditions.html#Start-Conditions|Start Conditions]]
* 該條規則只有在 START COND 下才會被啟動
REGEXP ACTION
* [[http://flex.sourceforge.net/manual/Multiple-Input-Buffers.html|Multiple Input Buffers]]
* 指定 lex 從哪一個緩衝區讀入 token
* lex 可以設定多種 start condition。不同 start condition 會採用不同的 rule。使用 BEGIN 切換 start condition。
%x SN_CONDITION_MACRO_DEFINITION SN_CONDITION_MACRO_CALL SN_CONDITION_MACRO_CALL_DELIM
%x SN_CONDITION_MACRO_EXPANSION SN_CONDITION_MACRO_DECL SN_CONDITION_INCLUDE
%x SN_CONDITION_FALSE
%x SN_CONDITION_COPY SN_CONDITION_MESSAGE
%x SN_NO_BOUNDARY_TWO_STRING SN_NO_BOUNDARY_ONE_STRING
%x SN_COMMENT_CONDITION SN_COMMENT_CONDITION2
%x SN_NO_BOUNDARY_THREE_TOKEN
* yyscan_t 是可重入 (reentrant) 的 Lex 之狀態。
// 透過 yyscan_t 傳遞 scanner 狀態
// struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
typedef void* yyscan_t;
/* For convenience, these vars (plus the bison vars far below)
are macros in the reentrant scanner. */
#define yyin yyg->yyin_r
#define yyout yyg->yyout_r
#define yyextra yyg->yyextra_r
#define yyleng yyg->yyleng_r
#define yytext yyg->yytext_r
#define yylineno (YY_CURRENT_BUFFER_LVALUE->yy_bs_lineno)
#define yycolumn (YY_CURRENT_BUFFER_LVALUE->yy_bs_column)
#define yy_flex_debug yyg->yy_flex_debug_r
/* Holds the entire state of the reentrant scanner. */
struct yyguts_t
{
/* User-defined. Not touched by flex. */
YY_EXTRA_TYPE yyextra_r;
/* The rest are the same as the globals declared in the non-reentrant scanner. */
FILE *yyin_r, *yyout_r;
size_t yy_buffer_stack_top; /**< index of top of stack. */
size_t yy_buffer_stack_max; /**< capacity of stack. */
YY_BUFFER_STATE * yy_buffer_stack; /**< Stack as an array. */
char yy_hold_char;
int yy_n_chars;
int yyleng_r;
char *yy_c_buf_p;
int yy_init;
int yy_start;
int yy_did_buffer_switch_on_eof;
int yy_start_stack_ptr;
int yy_start_stack_depth;
int *yy_start_stack;
yy_state_type yy_last_accepting_state;
char* yy_last_accepting_cpos;
int yylineno_r;
int yy_flex_debug_r;
char *yytext_r;
int yy_more_flag;
int yy_more_len;
YYSTYPE * yylval_r;
}; /* end struct yyguts_t */
* [[http://stackoverflow.com/questions/2634998/writing-re-entrant-lexer-with-flex|Writing re-entrant lexer with Flex]]
*yyscanner is available to all your actions. In addition, yylex has a local variable called yyg that holds the entire state, and most macros conveniently refer to yyg.
* [[http://flex.sourceforge.net/manual/Reentrant.html|Reentrant C Scanners]]
===== Bison =====
* [[http://www.cs.uic.edu/~spopuri/cparser.html|Understanding C parsers generated by GNU Bison]]
* [[https://lists.gnu.org/archive/html/help-bison/2006-09/msg00018.html|Understanding C parsers generated by GNU Bison]]
* [[http://www.dickgrune.com/|Dr. Dick Grune]]
* [[http://dickgrune.com/Books/PTAPG_1st_Edition/BookBody.pdf|Parsing Techniques - A Practical Guide]]
====== 系列文章 ======
* [[http://efxa.org/2014/05/11/implementing-the-include-directive-to-support-nested-input-files/|Implementing the “include” directive to support nested input files]]
* [[http://efxa.org/2014/05/17/techniques-for-resolving-common-grammar-conflicts-in-parsers/|Techniques for resolving common grammar conflicts in parsers]]
* [[http://efxa.org/2014/05/25/how-to-create-an-abstract-syntax-tree-while-parsing-an-input-stream/|How to create an abstract syntax tree while parsing an input stream]]
====== 網路教程 ======
* [[http://web.cs.wpi.edu/~kal/courses/compilers/|Compilers Programming Language Translation]]
* [[http://www.progtools.org/compilers/tutorials/cxx_and_bison/cxx_and_bison.html|How to use C++ with Bison, V 1.0.2]]
* [[http://gnuu.org/2009/09/18/writing-your-own-toy-compiler/|Writing Your Own Toy Compiler Using Flex, Bison and LLVM]]
* [[http://www.saltycrane.com/blog/2007/02/example-using-bison-and-flex-with/|Example using bison and flex with cygwin on Windows]]
* [[http://www.tldp.org/HOWTO/Lex-YACC-HOWTO.html#toc1|Lex and YACC primer]]
* [[http://dinosaur.compilertools.net/|The Lex & Yacc Page]]
* [[http://www.lemoda.net/c/reentrant-parser/index.html|Make a reentrant parser with Flex and Bison]]
====== 進階議題 ======
* GLR parser
* [[http://scottmcpeak.com/elkhound/|Elkhound: A GLR Parser Generator]]
* Push and Pull parser
* 以 parser 為主,調用 scanner 的模式,稱之為 pull parser,意味著將 token pull 進 parser。
* 以 scanner 為主,將 token 餵給 scanner 的模式,稱之為 push parser,意味著將 token push 進 parser。
* [[https://www.gnu.org/software/bison/manual/html_node/Push-Decl.html|3.7.11 A Push Parser]]
===== 斷行問題 =====
* [[http://www.editpadpro.com/tricklinebreak.html|Line Breaks in Windows, UNIX & Macintosh Text Files]]
* Windows 系統以 CR+LF 斷行。
* UNIX 和 OS X 系統以 LF 斷行。
* 傳統 Mac 系統以 CR 斷行。
===== 編碼問題 =====
* 由編輯器將文本轉成 [[UTF8|UTF-8]] 編碼,以 char * 傳給,Lex 撰寫 UTF-8 編碼匹配字符,後續再轉成寬字符 (wchar_t)。
* [[http://csliu.com/2009/04/unicode-support-in-flex|Unicode Support in Flex]]
* [[http://lists.gnu.org/archive/html/help-flex/2001-05/msg00014.html|Multi-byte support in 'flex']]
* [[http://stackoverflow.com/questions/9611682/flexlexer-support-for-unicode|Flex(lexer) support for unicode]]
===== 語法剖析術語=====
* [[http://www.panopticoncentral.net/2009/02/12/lexing-and-parsing/|Lexing and Parsing]]
* [[http://www.panopticoncentral.net/2009/03/13/ll-vs-lr-vs-glr/|LL vs. LR vs. GLR]]
* [[http://stackoverflow.com/questions/19663564/what-is-the-difference-between-lalr-and-lr-parsing|What is the difference between LALR and LR parsing?]]
* LR (0) < LALR(1) < LR(1)
* [[wp>Shift-reduce parser]]
* Shift 表示將 token/non-terminal 置於堆疊上; reduce 代表匹配到一個 rule,將該 rule 相關的 token/non-terminal 推出堆疊。
* [[wp>Operator-precedence parser]]
* [[wp>Off-side rule]]
* [[wp>Earley parser]]
* [[http://shape-of-code.coding-guidelines.com/2014/09/10/is-early-parsing-now-practical/|Is Early parsing now practical?]]
* [[wp>Dangling else]]
====== 外部連結 ======
* [[https://fbb-git.github.io/bisoncpp/|BisonC++]]
* [[http://gppg.codeplex.com/|Gardens Point Parser Generator]]
* C# 版的 Flex & Bison。