- 使用 [[https://github.com/westes/flex|flex]] 和 [[http://www.gnu.org/software/bison/|Bison]]。 * [[http://westes.github.io/flex/manual/|Lexical Analysis With Flex]] * [[https://www.gnu.org/software/bison/manual/bison.html|Bison 3.0.4]] - 參考書籍 * [[https://www.amazon.com/flex-bison-Text-Processing-Tools/dp/0596155972|flex & bison: Text Processing Tools]] * [[https://www.amazon.com/Language-Implementation-Patterns-Domain-Specific-Programming/dp/193435645X|Language Implementation Patterns]] * [[http://www.ituring.com.cn/book/1159|自製編程語言]] - [[ANTLR]] ====== 基礎教學 ====== {{http://epaperpress.com/lexandyacc/images/fig12.gif}} - 撰寫 bas.y (yacc 檔) 和 bas.l (lex 檔)。 - yacc 編譯 bas.y 產生定義 token 的 y.tab.h 給 lex.yy.c include; 產生定義 parser 的 y.tab.c,yyparse 是 parser 的入口函式。 - lex 編譯 bas.l 產生定義 scanner 的 lex.yy.c,yylex 是 scanner 的入口函式。y.tab.c 的 yyparse 調用 yylex。 - cc (C 編譯器) 編譯 y.tab.c 和 lex.yy.c,得到自製編譯器 bas.exe。 - bas.exe 編譯 source,得到 compiled output。 * [[http://www.ituring.com.cn/book/1159|自制编程语言]] * [[http://download.csdn.net/detail/matrixvirus/6573659|自制编程语言]]. * [[http://web.iitd.ac.in/~sumeet/flex__bison.pdf|flex & bison]] * [[http://www.cse.iitb.ac.in/~br/courses/cs699-autumn2013/refs/lexyacc-epaperpress.pdf|A COMPACT GUIDE TO LEX & YACC]] * 寫 lex 需要懂 [[wp>Regular expression]],寫 yacc 需要懂 [[wp>Backus–Naur form|Backus–Naur form (BNF)]]。 * 要懂怎麼建 ast tree, walk ast tree, 建 symbol table。 * 要懂怎麼為輸入輸出建模,錯誤訊息怎麼有條理的輸出。 ===== BNF ===== * [[http://good-ed.blogspot.tw/2010/02/ebnf.html|(E)BNF表示式]] * [[http://good-ed.blogspot.tw/2010/03/blog-post.html|手工打造算式計算機]] * [[http://good-ed.blogspot.tw/2010/04/lexyacc.html|以lex/yacc實作算式計算機]] * [[wp>LR parser]] * LR(k) 代表往前看幾個 token。 * [[wp>GLR parser]] * [[http://www.gnu.org/software/bison/manual/html_node/GLR-Parsers.html|1.5 Writing GLR Parsers]] * [[http://www.gnu.org/software/bison/manual/html_node/Debugging.html|8 Debugging Your Parser]] * [[http://www.gnu.org/software/bison/manual/html_node/Understanding.html|8.1 Understanding Your Parser]] * 小技巧 * [[http://stackoverflow.com/questions/14665976/flex-use-text-file-as-input-stream|Flex: Use Text File as Input Stream]] * [[http://stackoverflow.com/questions/8794239/can-flex-and-bison-accept-input-from-other-sources|Can flex and bison accept input from other sources?]] * [[http://stackoverflow.com/questions/1756275/bison-end-of-file|bison end of file]] * 問題在於 Bison 遇到 EOF,其動作是中止文法推導 (return 0)。此時 stack 中尚有 symbol,因此 Bison 判定為錯誤。故人為預先在文本或是輸入流加上 \n,使得文法推導成功。 * [[http://stackoverflow.com/questions/9611682/flexlexer-support-for-unicode|Flex(lexer) support for unicode]] ====== Flex & Bison 應用 ====== * [[http://www.di-mgt.com.au/flex_and_bison_in_msvc.html|Using flex and bison in MSVC++]] * [[http://blog.sina.com.cn/s/blog_6476250d0100wq2q.html|【计算机】在Visual Studio 2008中使用Flex、Bison工具的配置]] * 在 Windows 上建議使用 Cygwin 提供的 Flex & Bison。 * [[http://flex.sourceforge.net/manual/Makefiles-and-Flex.html|A.1 Makefiles and Flex]] ===== Flex ===== definitions %% rules %% user code * [[http://flex.sourceforge.net/manual/Definitions-Section.html|5.1 Format of the Definitions Section]] * 以 {name} 的形式使用,避免不必要的錯誤。 * [[http://flex.sourceforge.net/manual/Rules-Section.html|5.2 Format of the Rules Section]] * [[http://flex.sourceforge.net/manual/Patterns.html|6 Patterns]] * These expressions all designate a set of characters equivalent to the corresponding standard C isXXX function. * [[http://marvin.cs.uidaho.edu/Handouts/regex.html|A Regular Expression Primer]] * [[http://flex.sourceforge.net/manual/User-Code-Section.html|5.3 Format of the User Code Section]] * [[http://flex.sourceforge.net/manual/Comments-in-the-Input.html|5.4 Comments in the Input]] * 新起一行,一個以上空白,後接 /* comment */。 * 常用選項 * [[http://flex.sourceforge.net/manual/Code_002dLevel-And-API-Options.html|16.3 Code-Level And API Options]] * %option bison-locations * 產生位置資訊供 bison 使用。此選項隱含 %option bison-bridge * %option bison-bridge * 產生適用於 bison 的接口。 * %option reentrant * 產生可重入的 scanner。 * %option nounistd * 避免引入非標準的 unistd 標頭檔。 * %option noyywrap * %option nodefault * %option yylinenumber * %option case-insensitive * 小技巧 * [[http://flex.sourceforge.net/manual/Start-Conditions.html|10 Start Conditions]] * Lex 規則可以用 start condition 分組。start condition 可以是 inclusive 或是 exclusive。 * 切換至 inclusive start condition 時,沒有隸屬任何 start condition 的規則也會被納入匹配。If the start condition is inclusive, then rules with no start conditions at all will also be active. * 切換至 exclusive start condition 時,只有隸屬該 exclusive start condition 會被納入匹配。If it is exclusive, then only rules qualified with the start condition will be active. * BEGIN(0) 或是 BEGIN(INITIAL) 都會切換回預設的 start condition (INITIAL)。 * [[http://stackoverflow.com/questions/22779138/when-should-i-use-begin-to-change-start-condition|When should I use BEGIN to change start condition?]] %s example ; example 是 inclusive start condition。 %% ; 當切換到 example start condition 時,foo 和 bar 都會被激活。 foo do_something(); bar something_else(); * [[http://flex.sourceforge.net/manual/How-can-I-match-C_002dstyle-comments_003f.html|How can I match C-style comments?]] * [[http://stackoverflow.com/questions/15396322/the-quotation-marks-in-flex|The quotation marks “ ” in flex]] * 雙引號用意在於告訴 lex 不用解釋其中內容。正則表達式所用的符號不起作用。 * 如果要匹配雙引號,使用 \"。 * 單引號不起作用,單純代表單引號本身。 * Bison 用雙引號代表要匹配的字串,單引號代表要匹配的字元。 ===== Bison ===== Bison 3.0 版之後有較大改變。注意手冊版本。 * [[http://www.gnu.org/software/bison/manual/html_node/Grammar-Outline.html|3.1 Outline of a Bison Grammar]] * [[http://www.gnu.org/software/bison/manual/html_node/Grammar-File.html|3 Bison Grammar Files]] * [[http://www.gnu.org/software/bison/manual/html_node/Recursion.html|3.3.3 Recursive Rules]] /* left recursion */ expseq1: exp | expseq1 ',' exp ; /* right recursion */ expseq1: exp | exp ',' expseq1 ; * right recursion 有可能會造成 bison 內部的棧空間耗盡,但優點在於鏈結串列的節點是由頭串聯至尾,無須反向。 * 可以透過 YYMAXDEPTH ([[http://www.gnu.org/software/bison/manual/html_node/Memory-Management.html|5.10 Memory Management, and How to Avoid Memory Exhaustion]]) 加大 bison 內部的棧空間。 * [[http://stackoverflow.com/questions/1697363/left-recursion-in-grammar-results-in-conflicts|Left Recursion in Grammar Results in Conflicts]] * [[http://www.gnu.org/software/bison/manual/html_node/Declarations.html|3.7 Bison Declarations]] * [[http://www.gnu.org/software/bison/manual/html_node/Decl-Summary.html|3.7.12 Bison Declaration Summary]] * [[http://www.gnu.org/software/bison/manual/html_node/Interface.html|4 Parser C-Language Interface]] * [[http://www.gnu.org/software/bison/manual/html_node/Parser-Function.html|4.1 The Parser Function yyparse]] * 可以藉由檢查 yyparse 返回值,簡單查驗 parse 是否成功。 * [[http://www.gnu.org/software/bison/manual/html_node/Algorithm.html|5 The Bison Parser Algorithm]] * [[http://www.gnu.org/software/bison/manual/html_node/Lookahead.html|5.1 Lookahead Tokens]] * [[http://www.gnu.org/software/bison/manual/html_node/Shift_002fReduce.html|5.2 Shift/Reduce Conflicts]] * 不管是 shift 匹配到一個 symbol,或是當下 reduce (stack) 到一個 rule,都是合法行徑的時候,即為 Shift/Reduce Conflict。 * 觀察 *.output,可以得知發生 Shift/Reduce Conflict 的狀態為何。如果有吃進同一個 token, 不論 shift 或是 reduce 皆可,即是 Shift/Reduce Conflict。 * [[http://www.gnu.org/software/bison/manual/html_node/Non-Operators.html|5.3.6 Using Precedence For Non Operators]] * [[http://www.gnu.org/software/bison/manual/html_node/Expect-Decl.html|3.7.8 Suppressing Conflict Warnings]] * [[http://www.gnu.org/software/bison/manual/html_node/Multiple-Types.html#Multiple-Types|More Than One Value Type]] * [[http://www.gnu.org/software/bison/manual/html_node/Token-Decl.html|Token Type Names]] %token TOKEN
Each call to yylex() returns an integer value which represents a token type. This tells YACC what kind of token it has read. The token may optionally have a value, which should be placed in the variable yylval. By default yylval is of type int, but you can override that from the YACC file by re#defining YYSTYPE.
* token 有以下三種含義: * 回傳給 yacc 的 token type。 * token 的原始字串,yytext。 * 回傳給 yacc 的 token value,yylval。 使用 %union 讓 yylval 可以有多種型別。 %union { int number; char *string; } %token STATE %token NUMBER %token WORD [0-9]+ yylval.number=atoi(yytext); return NUMBER; [a-z0-9]+ yylval.string=strdup(yytext);return WORD; ====== 優先權與結合性 ====== * [[http://www.gnu.org/software/bison/manual/html_node/Precedence-Decl.html|3.7.3 Operator Precedence]] * [[http://www.gnu.org/software/bison/manual/html_node/Precedence.html|5.3 Operator Precedence]] * [[http://www.gnu.org/software/bison/manual/html_node/Using-Precedence.html#Using-Precedence|5.3.2 Specifying Operator Precedence]] * The relative precedence of different operators is controlled by the order in which they are declared. The first precedence/associativity declaration in the file declares the operators whose precedence is lowest, the next such declaration declares the operators whose precedence is a little higher, and so on. ====== 建立符號表 ====== * [[http://www.d.umn.edu/~rmaclin/cs5641/Notes/L15_SymbolTable.pdf|Symbol Table Implementations]] * [[http://pages.cs.wisc.edu/~cs536-1/readings/6.SYMBOL-TABLES.html|Symbol Tables and Static Checks]] * 2 Using Flex ====== 建立語法樹 ====== Bison 中所寫的語法規則,有些僅是用來處理運算子優先順序,和程式無關。抽象語法樹即是將前述語法規則剔除。 * [[http://web.eecs.utk.edu/~bvz/teaching/cs461Sp11/notes/parse_tree/|Building Abstract Syntax Trees]] * 遍歷抽象語法樹 * [[http://stackoverflow.com/questions/2644597/how-do-i-implement-if-statement-in-flex-bison|How do i implement If statement in Flex/bison]] * [[http://efxa.org/2014/05/25/how-to-create-an-abstract-syntax-tree-while-parsing-an-input-stream/|How to create an abstract syntax tree while parsing an input stream]] * 3 Using Bison ====== Mid-Rule Action ====== * [[https://www.gnu.org/software/bison/manual/html_node/Using-Mid_002dRule-Actions.html|3.4.8.1 Using Mid-Rule Actions]] * A mid-rule action may refer to the components preceding it using $n, but it may not refer to subsequent components because it is run before they are parsed. * mid-rule action 可以參照到前面 symbol 的值,但無法存取其後 symbol 的值。 * The mid-rule action itself counts as one of the components of the rule. This makes a difference when there is another action later in the same rule (and usually there is another at the end): you have to count the actions along with the symbols when working out which number n to use in $n. * mid-rule action 本身也算成一個 symbol。 * The mid-rule action can also have a semantic value. The action can set its value with an assignment to $$, and actions later in the rule can refer to the value using $n. Since there is no symbol to name the action, there is no way to declare a data type for the value in advance, so you must use the ‘$<…>n’ construct to specify a data type each time you refer to this value. * mid-rule action 本身可以有值。 * There is no way to set the value of the entire rule with a mid-rule action, because assignments to $$ do not have that effect. The only way to set the value for the entire rule is with an ordinary action at the end of the rule. * mid-rule action 的值無法賦值給本身所屬的規則。 ====== 錯誤回報 ====== * [[https://www.ibm.com/developerworks/opensource/library/l-flexbison/|Better error handling using Flex and Bison]] * [[http://stackoverflow.com/questions/15429834/flex-bison-building-c-compiler-line-number-in-error-message|Flex Bison: Building C Compiler - Line number in error message]] * [[http://stackoverflow.com/questions/656703/how-does-flex-support-bison-location-exactly|How does flex support bison-location exactly?]] * [[http://www.gnu.org/software/bison/manual/html_node/Error-Reporting.html|4.7 The Error Reporting Function yyerror]] * 取得更精確的錯誤訊息。 * 不同版本 Bison 有不同控制項 * %error-verbose * %define parse.error verbose
If you invoke ‘%define parse.error verbose’ in the Bison declarations section (see The Bison Declarations Section), then Bison provides a more verbose and specific error message string instead of just plain "syntax error". However, that message sometimes contains incorrect information if LAC is not enabled (see LAC).
* [[http://www.gnu.org/software/bison/manual/html_node/LAC.html|5.8.3 LAC]] * [[http://flex.sourceforge.net/manual/Generated-Scanner.html|9 The Generated Scanner]] * 自訂 YY_INPUT * BeginToken 紀錄 token 起終點。 * Lex 讀取 token 時,有可能會多讀取字元,以便判定是否辨識出一個 token。此時需要修正 token 位置,輸出錯誤訊息時才能正確定位。 * [[http://www.gnu.org/software/bison/manual/html_node/Tracking-Locations.html|3.5 Tracking Locations]] * [[http://www.gnu.org/software/bison/manual/html_node/Location-Type.html|3.5.1 Data Type of Locations]] * YYLTYPE * 位置的預設資料結構。可以自行定義加以擴充。 * [[http://www.gnu.org/software/bison/manual/html_node/Actions-and-Locations.html|3.5.2 Actions and Locations]] * @1 代表 $1 的位置資料結構。 * [[http://www.gnu.org/software/bison/manual/html_node/Location-Default-Action.html|3.5.3 Default Action for Locations]] * YYLLOC_DEFAULT * 處理位置的預設動作 * [[http://www.gnu.org/software/bison/manual/html_node/Error-Recovery.html|6 Error Recovery]] * [[http://marvin.cs.uidaho.edu/Teaching/CS445/bisonErrorToken.html|Using the Error Token in Bison]] * 如果需要累計錯誤,不中斷剖析,使用 error token。 * [[http://stackoverflow.com/questions/19005355/how-to-count-syntax-errors-in-bison-flex|how to count syntax errors in bison-flex]] * 在 yyerror 裡面計數。yacc 提供的 error token 不是用來記數錯誤,而是用來做錯誤回復 (error recovery)。 * 8. Error Reporting and Recovery ===== 記錄位置 ===== * [[http://lc.life.blog.163.com/blog/static/2405905920086201624532/|yylineno]] * [[http://stackoverflow.com/questions/13317634/flex-yylineno-set-to-1|Flex yylineno set to 1]] * 必須要有處理 \n 的規則,yylineno 才會累加。 * [[http://hi.baidu.com/gao_dennis/item/8fb10642a38136efa5c066c7|lex 和 yacc 学习笔记5 (bison和flex 记录token的位置信息)]] * 要開啟 location 的支援,必須同時修改 flex 和 bison。 * flex 添加 ''%option bison-locations'' (''%option bison-bridge'' 可移除)。 * flex 添加如下代碼: /* handle locations */ //static int yycloumn = 1; // yycloumn 自動產生 #define YY_USER_ACTION yylloc->first_line = yylloc->last_line = yylineno; \ yylloc->first_column = yycloumn; yylloc->last_column = yycloumn+yyleng-1; \ yycolumn += yyleng; * flex 在有處理 \n 的規則將 yycloumn 歸為 1。 * bison 添加 ''%locations''。函式 yyerror 第一個參數為 YYLTYPE*。 * lex 把 ' ' 和 '\t' 都視做長度為一,所以針對 '\t' 計算位置會有問題。 * [[http://stackoverflow.com/questions/8197812/how-does-one-configure-notepad-to-use-spaces-instead-of-tabs|How does one configure Notepad++ to use spaces instead of tabs?]] * 編輯器將 tab 轉換成固定個數的 space 傳給 parser。 * [[http://websystemsengineering.blogspot.tw/2012/12/replacing-tabs-with-spaces-in-notepad.html|Replacing tabs with spaces in Notepad++]] * 原來文本中的 tab 不受新設定的影響,不會被轉換成 space。 ====== 上下文相關語法 ====== * [[http://www.gnu.org/software/bison/manual/bison.html#Context-Dependency|7 Handling Context Dependencies]] * [[http://www.gnu.org/software/bison/manual/bison.html#Semantic-Tokens|7.1 Semantic Info in Token Types]] * 由 flex 負責決定回傳 token 的型別。 * [[http://www.gnu.org/software/bison/manual/bison.html#Lexical-Tie_002dins|7.2 Lexical Tie-ins]] * 由 Bison 設置標誌,藉此更改 flex 的行為。 * [[http://www.gnu.org/software/bison/manual/bison.html#Tie_002din-Recovery|7.3 Lexical Tie-ins and Error Recovery]] * 錯誤回復需要調整。 * [[wp>The lexer hack]] * [[http://eli.thegreenplace.net/2012/07/05/how-clang-handles-the-type-variable-name-ambiguity-of-cc/|How Clang handles the type / variable name ambiguity of C/C++]] * [[http://www.drdobbs.com/cpp/practical-parsing-for-ansi-c/196603535|Practical Parsing for ANSI C]] * Lexer 和 Parser 並非各自獨立。 * check_type (SDCC.lex) 負責辨識 typename 和 identifier。 ====== 錯誤回復 ====== * [[http://shape-of-code.coding-guidelines.com/2010/04/19/brief-history-of-syntax-error-recovery/|Brief history of syntax error recovery]] * 設計語言時,如果有所謂的 statement terminiator (如: 分號或斷行),該處會是適合的同步點 (synchronization point)。 stmt : whileStmt | ifStmt | ... | error NEWLINE /* 此處選擇斷行符做為同步點。*/ ; * [[http://inst.eecs.berkeley.edu/~cs164/sp11/lectures/lecture15-2x2.pdf]] * 如果 lex 必須忽略分隔符 (如: 斷行),則在 yacc 端處理。 * [[http://stackoverflow.com/questions/13096879/bison-error-recovery|bison error recovery]] * [[http://stackoverflow.com/questions/19005355/how-to-count-syntax-errors-in-bison-flex|how to count syntax errors in bison-flex]] * [[http://www.gnu.org/software/bison/manual/html_node/Simple-Error-Recovery.html|2.3 Simple Error Recovery]] * [[https://www.gnu.org/software/bison/manual/html_node/Error-Recovery.html|6 Error Recovery]] * [[http://marvin.cs.uidaho.edu/Teaching/CS445/bisonErrorToken.html|Using the Error Token in Bison]] * [[https://www.gnu.org/software/bison/manual/html_node/Destructor-Decl.html|3.7.6 Freeing Discarded Symbols]] * 進行錯誤回復的過程中,yacc 可能會丟棄已處理的 symbol。此時需要將被丟棄的 symbol 使用的內存釋放掉。 ====== 衝突處理 ====== 根本原因: 當 Bison 試圖 reduce 時,有其它 shift 或是 reduce 的可能,即為 shift/reduce 或是 reduce/reduce conflict。以下圖來看,"." 出現在規則結尾處時,代表有 reduce 發生。此時,如果有其它 "." 出現,代表衝突發生。 State 14 conflicts: 1 shift/reduce ... State 14 12 input_state_path: T_INPUT_STATE_NO_PATH . T_DIVIDE identifier 13 | T_INPUT_STATE_NO_PATH . T_DIVIDE shift, and go to state 19 T_DIVIDE [reduce using rule 13 (input_state_path)] $default reduce using rule 13 (input_state_path) * . 代表目前正讀入 T_INPUT_STATE_NO_PATH,規則 12 和規則 13 同時處於此狀態 (State 14)。 * 如果下一個 token 是 T_DIVIDE,Bison 可以選擇使用規則 12,shift T_DIVIDE,轉換到狀態 19。 * 如果下一個 token 是 T_DIVIDE,Bison 可以選擇使用規則 13,reduce。 * 對於 shift/reduce conflict,Bison 預設選擇 shift。因此沒有被用到的規則 13,以中括號表示。 * shift/reduce conflict * Bison 預設選擇 shift * 解決方法: * 從 *.output 檔找到有 shift/reduce conflict 的狀態。 * 辨識出 reduce rule。 * 辨識出與之相衝突的 shift rule。"reduce using" 語句前面有 reduce 預期的下一個 token,看哪一條 shift 預期的下一個 token 和它相同。 * 辨識出 reduce rule 會 reduce 到的狀態。 * 推導出導致 shift/reduce conflict 的字串。 * reduce/reduce conflict * Bison 預設選擇第一條規則 reduce * [[http://www.codeproject.com/Articles/5221/The-Bison-v-verbose-option|The Bison -v/--verbose option]] * 容易發生衝突的情況 (基本上都是因為 recursive 性質所引起): * expression grammers: 同一條規則,但此規則屬於 recursive rule,因此衝突有機會發生在同一條規則上。 * if/then/else: 具有相同 LHS 的規則。 * nest lists of items $ grep "State 14" -n Syntax_InputState.output $ sed -n '211,222p' Syntax_InputState.output > Syntax_InputState.output.new * [[http://serverfault.com/questions/133692/how-to-display-certain-lines-from-a-text-file-in-linux|How to display certain lines from a text file in Linux?]] ===== GLR 剖析器 ===== * Bison 是 LALR(1),1 代表 lookahead 的 token 個數為 1。如果文法需要 2 個 lookahead,考慮改用 GLR。 * LALR: Look Ahead, Left-to-Right, rightmost derivation * [[wiki>LALR parser]] * scanner 和 parser 現在都需要引入 tab.h * [[http://stackoverflow.com/questions/4946070/bison-flex-peek-at-the-next-letter-or-token|bison/flex: Peek at the next letter or token]] * 搭配 lex 的 r/s 樣式或許可以解決 lookahread 的問題。 * [[http://flex.sourceforge.net/manual/Patterns.html#Patterns|6 Patterns]] * [[http://flex.sourceforge.net/manual/Limitations.html#Limitations|24 Limitations]] 對於原本文法存在的 shift/reduce 或是 reduce/reduce conflict,GLR parser 於衝突處分成多個 parser 同時並行處理。注意! GLR parser 處理時間隨 conflict 個數增加。 %glr-parser %expect 2 %expect-rr * [[http://www.gnu.org/software/bison/manual/html_node/Expect-Decl.html|3.7.8 Suppressing Conflict Warnings]] * [[http://www.gnu.org/software/bison/manual/bison.html#GLR-Semantic-Actions|1.5.3.1 Deferred semantic actions]] [[http://stackoverflow.com/questions/24748594/additional-syntax-error-message-on-glr-parser-when-syntax-is-ambiguous|Additional syntax error message on GLR parser when syntax is ambiguous]] * 使用 %dprec 解決 reduce/reduce conflict * 使用 %merge ===== Q & A ===== * [[http://flex.sourceforge.net/manual/Whenever-flex-can-not-match-the-input-it-says-_0022flex-scanner-jammed_0022_002e.html|Whenever flex can not match the input it says "flex scanner jammed".]] * [[http://stackoverflow.com/questions/10267307/flex-default-rule|Flex default rule]] * [[http://flex.sourceforge.net/manual/Flex-is-not-matching-my-patterns-in-the-same-order-that-I-defined-them_002e.html|Flex is not matching my patterns in the same order that I defined them.]] * [[http://stackoverflow.com/questions/8379299/is-it-possible-to-set-priorities-for-rules-to-avoid-the-longest-earliest-match|Is it possible to set priorities for rules to avoid the “longest-earliest” matching pattern?]] * Lex 會匹配最長的 input。可以用文中方法匹配自己想要的樣式。 * [[http://stackoverflow.com/questions/9611682/flexlexer-support-for-unicode|Flex(lexer) support for unicode]] * 匹配 [] * [][] * 注意,如果要在 character class 內匹配到 [],] 必須緊跟在 [ 之後。 * [[http://flex.sourceforge.net/manual/A-Note-About-yytext-And-Memory.html|21.3 A Note About yytext And Memory]] * [[http://bbs.csdn.net/topics/260011540|请教:strdup与strcpy具体的区别]] * [[http://stackoverflow.com/questions/14020380/strcpy-vs-strdup|strcpy vs strdup]] * [[http://stackoverflow.com/questions/7582394/strdup-or-strdup|strdup or _strdup?]] * 如果看到取得字串有問題,可能是沒有用 strdup 複製 yytext。 * [[http://stackoverflow.com/questions/5613073/yacc-rule-useless-due-to-conflicts|Yacc “rule useless due to conflicts”]] * [[http://www.gnu.org/software/bison/manual/html_node/Mid_002dRule-Conflicts.html|3.4.8.3 Conflicts due to Mid-Rule Actions]] * 當 Yacc 無法藉由有限個數的 lookahead 來推導規則會引發此錯誤。Mid-Rule Action 有時也會引發此錯誤。 * [[http://stackoverflow.com/questions/6797221/bison-how-to-ignore-a-token-if-it-doesnt-fit-into-a-rule|Bison: How to ignore a token if it doesn't fit into a rule]] * Bison 無法忽略 Flex 傳回的 token。 * [[http://stackoverflow.com/questions/17954253/how-to-get-rid-of-warnings-in-bison|How to get rid of warnings in Bison?]] * [[http://www.gnu.org/software/bison/manual/html_node/Bison-Options.html|9.1 Bison Options]] # 先關閉 waring,再開啟 conflicts-sr % bison -Wnone,conflicts-sr * [[http://stackoverflow.com/questions/16452737/is-it-possible-to-call-one-yacc-parser-from-another-to-parse-specific-token-subs|Is it possible to call one yacc parser from another to parse specific token substream?]] ====== 處理 INCLUDE ====== YY_BUFFER_STATE 是 lex 對特定輸入流所產生的句柄。其中包含輸入流的緩衝區,和其它相關變數。 #define YY_STRUCT_YY_BUFFER_STATE struct yy_buffer_state { FILE *yy_input_file; char *yy_ch_buf; /* input buffer */ char *yy_buf_pos; /* current position in input buffer */ /* Size of input buffer in bytes, not including room for EOB * characters. */ yy_size_t yy_buf_size; ... } * [[http://westes.github.io/flex/manual/Multiple-Input-Buffers.html|11 Multiple Input Buffers]] * yypush_buffer_state 和 yypop_buffer_state 將 YY_BUFFER_STATE 入棧和出棧。但除了 YY_BUFFER_STATE 之外,一般還需要其它資訊,因此較建議自行維護此類棧空間。 * [[https://stackoverflow.com/questions/2269929/how-can-i-implement-include-constructs-with-flex-and-yacc|How can I implement #include constructs with Flex and YACC?]] * [[http://efxa.org/2014/05/11/implementing-the-include-directive-to-support-nested-input-files/|Implementing the “include” directive to support nested input files]] * YY_CURRENT_BUFFER 代表當前的輸入流。 * Chapter 2: Using Flex * Start States and Nested Input Files * Chapter 5: A Reference for Flex Specifications * Input Management * Lex 的輸入流可以是: * 標準輸入 * 檔案 * 字串: [[http://stackoverflow.com/questions/780676/string-input-to-flex-lexer|String input to flex lexer]] * YYINPUT 將輸入流填充到緩衝區,使用者可以自行替換 YYINPUT,添加自己所需的功能。 yy_switch_to_buffer (yy_scan_buffer (char *base, yy_size_t size) ); * 第零階段。處理巨集定義和調用、條件編譯。 ====== Lex & Yacc Internal ====== ===== Flex ===== token 由兩部分組成: token 和 token value。 * Lex * yytext: token * yylval: token value,傳給 yacc * yylex: lexer %{ // declaration and option setting #include #include // yacc 根據 *.y 產生的 *.h,定義 token (number) %} %% // pattern and action REGEXP1 ACTION REGEXP2 {ACTIONS} %% // helper function * Yacc * YYSTYPE 定義 yylval 的型別,這裡用改用 union * 用 $- 取得第幾號 yylval * yyparse: parser %{ // include C header files %} // 定義 token。parser 據此判斷該符號是 token。 // 其餘非 token 的符號必須出現在 rule 左邊至少一次。 %% // rule (grammer) // EXPS ($$) // EXP ($1) // FACTORS ($2) // TOKEN ($3) EXPS: /* empty, match the beginning of the input */ | EXP FACTORS TOKEN { ACTIONS } ; /* end of rule */ %% // helper function * [[wp>Flex lexical analyser]] * [[http://flex.sourceforge.net/manual/|Flex Manual]] * [[http://flex.sourceforge.net/manual/Start-Conditions.html#Start-Conditions|Start Conditions]] * 該條規則只有在 START COND 下才會被啟動 REGEXP ACTION * [[http://flex.sourceforge.net/manual/Multiple-Input-Buffers.html|Multiple Input Buffers]] * 指定 lex 從哪一個緩衝區讀入 token * lex 可以設定多種 start condition。不同 start condition 會採用不同的 rule。使用 BEGIN 切換 start condition。 %x SN_CONDITION_MACRO_DEFINITION SN_CONDITION_MACRO_CALL SN_CONDITION_MACRO_CALL_DELIM %x SN_CONDITION_MACRO_EXPANSION SN_CONDITION_MACRO_DECL SN_CONDITION_INCLUDE %x SN_CONDITION_FALSE %x SN_CONDITION_COPY SN_CONDITION_MESSAGE %x SN_NO_BOUNDARY_TWO_STRING SN_NO_BOUNDARY_ONE_STRING %x SN_COMMENT_CONDITION SN_COMMENT_CONDITION2 %x SN_NO_BOUNDARY_THREE_TOKEN * yyscan_t 是可重入 (reentrant) 的 Lex 之狀態。 // 透過 yyscan_t 傳遞 scanner 狀態 // struct yyguts_t * yyg = (struct yyguts_t*)yyscanner; typedef void* yyscan_t; /* For convenience, these vars (plus the bison vars far below) are macros in the reentrant scanner. */ #define yyin yyg->yyin_r #define yyout yyg->yyout_r #define yyextra yyg->yyextra_r #define yyleng yyg->yyleng_r #define yytext yyg->yytext_r #define yylineno (YY_CURRENT_BUFFER_LVALUE->yy_bs_lineno) #define yycolumn (YY_CURRENT_BUFFER_LVALUE->yy_bs_column) #define yy_flex_debug yyg->yy_flex_debug_r /* Holds the entire state of the reentrant scanner. */ struct yyguts_t { /* User-defined. Not touched by flex. */ YY_EXTRA_TYPE yyextra_r; /* The rest are the same as the globals declared in the non-reentrant scanner. */ FILE *yyin_r, *yyout_r; size_t yy_buffer_stack_top; /**< index of top of stack. */ size_t yy_buffer_stack_max; /**< capacity of stack. */ YY_BUFFER_STATE * yy_buffer_stack; /**< Stack as an array. */ char yy_hold_char; int yy_n_chars; int yyleng_r; char *yy_c_buf_p; int yy_init; int yy_start; int yy_did_buffer_switch_on_eof; int yy_start_stack_ptr; int yy_start_stack_depth; int *yy_start_stack; yy_state_type yy_last_accepting_state; char* yy_last_accepting_cpos; int yylineno_r; int yy_flex_debug_r; char *yytext_r; int yy_more_flag; int yy_more_len; YYSTYPE * yylval_r; }; /* end struct yyguts_t */ * [[http://stackoverflow.com/questions/2634998/writing-re-entrant-lexer-with-flex|Writing re-entrant lexer with Flex]] *yyscanner is available to all your actions. In addition, yylex has a local variable called yyg that holds the entire state, and most macros conveniently refer to yyg. * [[http://flex.sourceforge.net/manual/Reentrant.html|Reentrant C Scanners]] ===== Bison ===== * [[http://www.cs.uic.edu/~spopuri/cparser.html|Understanding C parsers generated by GNU Bison]] * [[https://lists.gnu.org/archive/html/help-bison/2006-09/msg00018.html|Understanding C parsers generated by GNU Bison]] * [[http://www.dickgrune.com/|Dr. Dick Grune]] * [[http://dickgrune.com/Books/PTAPG_1st_Edition/BookBody.pdf|Parsing Techniques - A Practical Guide]] ====== 系列文章 ====== * [[http://efxa.org/2014/05/11/implementing-the-include-directive-to-support-nested-input-files/|Implementing the “include” directive to support nested input files]] * [[http://efxa.org/2014/05/17/techniques-for-resolving-common-grammar-conflicts-in-parsers/|Techniques for resolving common grammar conflicts in parsers]] * [[http://efxa.org/2014/05/25/how-to-create-an-abstract-syntax-tree-while-parsing-an-input-stream/|How to create an abstract syntax tree while parsing an input stream]] ====== 網路教程 ====== * [[http://web.cs.wpi.edu/~kal/courses/compilers/|Compilers Programming Language Translation]] * [[http://www.progtools.org/compilers/tutorials/cxx_and_bison/cxx_and_bison.html|How to use C++ with Bison, V 1.0.2]] * [[http://gnuu.org/2009/09/18/writing-your-own-toy-compiler/|Writing Your Own Toy Compiler Using Flex, Bison and LLVM]] * [[http://www.saltycrane.com/blog/2007/02/example-using-bison-and-flex-with/|Example using bison and flex with cygwin on Windows]] * [[http://www.tldp.org/HOWTO/Lex-YACC-HOWTO.html#toc1|Lex and YACC primer]] * [[http://dinosaur.compilertools.net/|The Lex & Yacc Page]] * [[http://www.lemoda.net/c/reentrant-parser/index.html|Make a reentrant parser with Flex and Bison]] ====== 進階議題 ====== * GLR parser * [[http://scottmcpeak.com/elkhound/|Elkhound: A GLR Parser Generator]] * Push and Pull parser * 以 parser 為主,調用 scanner 的模式,稱之為 pull parser,意味著將 token pull 進 parser。 * 以 scanner 為主,將 token 餵給 scanner 的模式,稱之為 push parser,意味著將 token push 進 parser。 * [[https://www.gnu.org/software/bison/manual/html_node/Push-Decl.html|3.7.11 A Push Parser]] ===== 斷行問題 ===== * [[http://www.editpadpro.com/tricklinebreak.html|Line Breaks in Windows, UNIX & Macintosh Text Files]] * Windows 系統以 CR+LF 斷行。 * UNIX 和 OS X 系統以 LF 斷行。 * 傳統 Mac 系統以 CR 斷行。 ===== 編碼問題 ===== * 由編輯器將文本轉成 [[UTF8|UTF-8]] 編碼,以 char * 傳給,Lex 撰寫 UTF-8 編碼匹配字符,後續再轉成寬字符 (wchar_t)。 * [[http://csliu.com/2009/04/unicode-support-in-flex|Unicode Support in Flex]] * [[http://lists.gnu.org/archive/html/help-flex/2001-05/msg00014.html|Multi-byte support in 'flex']] * [[http://stackoverflow.com/questions/9611682/flexlexer-support-for-unicode|Flex(lexer) support for unicode]] ===== 語法剖析術語===== * [[http://www.panopticoncentral.net/2009/02/12/lexing-and-parsing/|Lexing and Parsing]] * [[http://www.panopticoncentral.net/2009/03/13/ll-vs-lr-vs-glr/|LL vs. LR vs. GLR]] * [[http://stackoverflow.com/questions/19663564/what-is-the-difference-between-lalr-and-lr-parsing|What is the difference between LALR and LR parsing?]] * LR (0) < LALR(1) < LR(1) * [[wp>Shift-reduce parser]] * Shift 表示將 token/non-terminal 置於堆疊上; reduce 代表匹配到一個 rule,將該 rule 相關的 token/non-terminal 推出堆疊。 * [[wp>Operator-precedence parser]] * [[wp>Off-side rule]] * [[wp>Earley parser]] * [[http://shape-of-code.coding-guidelines.com/2014/09/10/is-early-parsing-now-practical/|Is Early parsing now practical?]] * [[wp>Dangling else]] ====== 外部連結 ====== * [[https://fbb-git.github.io/bisoncpp/|BisonC++]] * [[http://gppg.codeplex.com/|Gardens Point Parser Generator]] * C# 版的 Flex & Bison。