<note>
  - 使用 [[https://github.com/westes/flex|flex]] 和 [[http://www.gnu.org/software/bison/|Bison]]。
    * [[http://westes.github.io/flex/manual/|Lexical Analysis With Flex]]
    * [[https://www.gnu.org/software/bison/manual/bison.html|Bison 3.0.4]]
  - 參考書籍
    * [[https://www.amazon.com/flex-bison-Text-Processing-Tools/dp/0596155972|flex & bison: Text Processing Tools]]
    * [[https://www.amazon.com/Language-Implementation-Patterns-Domain-Specific-Programming/dp/193435645X|Language Implementation Patterns]]
    * [[http://www.ituring.com.cn/book/1159|自製編程語言]]
  - [[ANTLR]]
</note>
====== 基礎教學 ======
{{http://epaperpress.com/lexandyacc/images/fig12.gif}}

  - 撰寫 bas.y (yacc 檔) 和 bas.l (lex 檔)。
  - yacc 編譯 bas.y 產生定義 token 的 y.tab.h 給 lex.yy.c include; 產生定義 parser 的 y.tab.c，yyparse 是 parser 的入口函式。
  - lex 編譯 bas.l 產生定義 scanner 的 lex.yy.c，yylex 是 scanner 的入口函式。y.tab.c 的 yyparse 調用 yylex。
  - cc (C 編譯器) 編譯 y.tab.c 和 lex.yy.c，得到自製編譯器 bas.exe。
  - bas.exe 編譯 source，得到 compiled output。

  * [[http://www.ituring.com.cn/book/1159|自制编程语言]]
    * [[http://download.csdn.net/detail/matrixvirus/6573659|自制编程语言]].
  * [[http://web.iitd.ac.in/~sumeet/flex__bison.pdf|flex & bison]]
    * [[http://www.cse.iitb.ac.in/~br/courses/cs699-autumn2013/refs/lexyacc-epaperpress.pdf|A COMPACT GUIDE TO LEX & YACC]]
  * 寫 lex 需要懂 [[wp>Regular expression]]，寫 yacc 需要懂 [[wp>Backus–Naur form|Backus–Naur form (BNF)]]。
    * 要懂怎麼建 ast tree, walk ast tree, 建 symbol table。
    * 要懂怎麼為輸入輸出建模，錯誤訊息怎麼有條理的輸出。
===== BNF =====
  * [[http://good-ed.blogspot.tw/2010/02/ebnf.html|(E)BNF表示式]]
  * [[http://good-ed.blogspot.tw/2010/03/blog-post.html|手工打造算式計算機]]
  * [[http://good-ed.blogspot.tw/2010/04/lexyacc.html|以lex/yacc實作算式計算機]]


  * [[wp>LR parser]]
    * LR(k) 代表往前看幾個 token。
  * [[wp>GLR parser]]
    * [[http://www.gnu.org/software/bison/manual/html_node/GLR-Parsers.html|1.5 Writing GLR Parsers]]

  * [[http://www.gnu.org/software/bison/manual/html_node/Debugging.html|8 Debugging Your Parser]]
    * [[http://www.gnu.org/software/bison/manual/html_node/Understanding.html|8.1 Understanding Your Parser]]

  * 小技巧
    * [[http://stackoverflow.com/questions/14665976/flex-use-text-file-as-input-stream|Flex: Use Text File as Input Stream]]
    * [[http://stackoverflow.com/questions/8794239/can-flex-and-bison-accept-input-from-other-sources|Can flex and bison accept input from other sources?]]
    * [[http://stackoverflow.com/questions/1756275/bison-end-of-file|bison end of file]]
      * 問題在於 Bison 遇到 EOF，其動作是中止文法推導 (return 0)。此時 stack 中尚有 symbol，因此 Bison 判定為錯誤。故人為預先在文本或是輸入流加上 \n，使得文法推導成功。
    * [[http://stackoverflow.com/questions/9611682/flexlexer-support-for-unicode|Flex(lexer) support for unicode]]
====== Flex & Bison 應用 ======
  * [[http://www.di-mgt.com.au/flex_and_bison_in_msvc.html|Using flex and bison in MSVC++]]
  * [[http://blog.sina.com.cn/s/blog_6476250d0100wq2q.html|【计算机】在Visual Studio 2008中使用Flex、Bison工具的配置]]
  * 在 Windows 上建議使用 Cygwin 提供的 Flex & Bison。
  * [[http://flex.sourceforge.net/manual/Makefiles-and-Flex.html|A.1 Makefiles and Flex]]
===== Flex =====
<code>
definitions
%%
rules
%%
user code
</code>
  * [[http://flex.sourceforge.net/manual/Definitions-Section.html|5.1 Format of the Definitions Section]]
    * 以 {name} 的形式使用，避免不必要的錯誤。
  * [[http://flex.sourceforge.net/manual/Rules-Section.html|5.2 Format of the Rules Section]]
    * [[http://flex.sourceforge.net/manual/Patterns.html|6 Patterns]]
      * <q>These expressions all designate a set of characters equivalent to the corresponding standard C isXXX function.</q> 
    * [[http://marvin.cs.uidaho.edu/Handouts/regex.html|A Regular Expression Primer]]
  * [[http://flex.sourceforge.net/manual/User-Code-Section.html|5.3 Format of the User Code Section]]
  * [[http://flex.sourceforge.net/manual/Comments-in-the-Input.html|5.4 Comments in the Input]]
    * 新起一行，一個以上空白，後接 /* comment */。

  * 常用選項
    * [[http://flex.sourceforge.net/manual/Code_002dLevel-And-API-Options.html|16.3 Code-Level And API Options]]
      * %option bison-locations
        * 產生位置資訊供 bison 使用。此選項隱含 %option bison-bridge
      * %option bison-bridge
        * 產生適用於 bison 的接口。
      * %option reentrant
        * 產生可重入的 scanner。
      * %option nounistd
        * 避免引入非標準的 unistd 標頭檔。  
      * %option noyywrap
      * %option nodefault
      * %option yylinenumber
      * %option case-insensitive

  * 小技巧
    * [[http://flex.sourceforge.net/manual/Start-Conditions.html|10 Start Conditions]]
    * Lex 規則可以用 start condition 分組。start condition 可以是 inclusive 或是 exclusive。
      * 切換至 inclusive start condition 時，沒有隸屬任何 start condition 的規則也會被納入匹配。<q>If the start condition is inclusive, then rules with no start conditions at all will also be active.</q>
      * 切換至 exclusive start condition 時，只有隸屬該 exclusive start condition 會被納入匹配。<q>If it is exclusive, then only rules qualified with the start condition will be active.</q>
      * BEGIN(0) 或是 BEGIN(INITIAL) 都會切換回預設的 start condition (INITIAL)。
    * [[http://stackoverflow.com/questions/22779138/when-should-i-use-begin-to-change-start-condition|When should I use BEGIN to change start condition?]]<code>
%s example     ; example 是 inclusive start condition。
%%

; 當切換到 example start condition 時，foo 和 bar 都會被激活。   
<example>foo   do_something();
     
bar            something_else(); 
</code>

  * [[http://flex.sourceforge.net/manual/How-can-I-match-C_002dstyle-comments_003f.html|How can I match C-style comments?]]

  * [[http://stackoverflow.com/questions/15396322/the-quotation-marks-in-flex|The quotation marks “ ” in flex]]
    * 雙引號用意在於告訴 lex 不用解釋其中內容。正則表達式所用的符號不起作用。
    * 如果要匹配雙引號，使用 \"。
    * 單引號不起作用，單純代表單引號本身。
    * Bison 用雙引號代表要匹配的字串，單引號代表要匹配的字元。
===== Bison =====
<note>Bison 3.0 版之後有較大改變。注意手冊版本。</note>
  * [[http://www.gnu.org/software/bison/manual/html_node/Grammar-Outline.html|3.1 Outline of a Bison Grammar]]

  * [[http://www.gnu.org/software/bison/manual/html_node/Grammar-File.html|3 Bison Grammar Files]]
    * [[http://www.gnu.org/software/bison/manual/html_node/Recursion.html|3.3.3 Recursive Rules]]<code yacc>
    /* left recursion */
expseq1:
  exp
| expseq1 ',' exp
;

    /* right recursion */
expseq1:
  exp
| exp ',' expseq1
;
</code>
      * right recursion 有可能會造成 bison 內部的棧空間耗盡，但優點在於鏈結串列的節點是由頭串聯至尾，無須反向。
      * 可以透過 YYMAXDEPTH ([[http://www.gnu.org/software/bison/manual/html_node/Memory-Management.html|5.10 Memory Management, and How to Avoid Memory Exhaustion]]) 加大 bison 內部的棧空間。
      * [[http://stackoverflow.com/questions/1697363/left-recursion-in-grammar-results-in-conflicts|Left Recursion in Grammar Results in Conflicts]]
    * [[http://www.gnu.org/software/bison/manual/html_node/Declarations.html|3.7 Bison Declarations]]
      * [[http://www.gnu.org/software/bison/manual/html_node/Decl-Summary.html|3.7.12 Bison Declaration Summary]]

  * [[http://www.gnu.org/software/bison/manual/html_node/Interface.html|4 Parser C-Language Interface]]
    * [[http://www.gnu.org/software/bison/manual/html_node/Parser-Function.html|4.1 The Parser Function yyparse]]
      * 可以藉由檢查 yyparse 返回值，簡單查驗 parse 是否成功。

  * [[http://www.gnu.org/software/bison/manual/html_node/Algorithm.html|5 The Bison Parser Algorithm]]
    * [[http://www.gnu.org/software/bison/manual/html_node/Lookahead.html|5.1 Lookahead Tokens]]
    * [[http://www.gnu.org/software/bison/manual/html_node/Shift_002fReduce.html|5.2 Shift/Reduce Conflicts]]
      * 不管是 shift 匹配到一個 symbol，或是當下 reduce (stack) 到一個 rule，都是合法行徑的時候，即為 Shift/Reduce Conflict。
      * 觀察 *.output，可以得知發生 Shift/Reduce Conflict 的狀態為何。如果有吃進同一個 token， 不論 shift 或是 reduce 皆可，即是 Shift/Reduce Conflict。
      * [[http://www.gnu.org/software/bison/manual/html_node/Non-Operators.html|5.3.6 Using Precedence For Non Operators]]
      * [[http://www.gnu.org/software/bison/manual/html_node/Expect-Decl.html|3.7.8 Suppressing Conflict Warnings]]

  * [[http://www.gnu.org/software/bison/manual/html_node/Multiple-Types.html#Multiple-Types|More Than One Value Type]]
  * [[http://www.gnu.org/software/bison/manual/html_node/Token-Decl.html|Token Type Names]]<code>
%token <TYPE> TOKEN
</code>

<blockquote>Each call to yylex() returns an integer value which represents a token type. This tells YACC what kind of token it has read. The token may optionally have a value, which should be placed in the variable yylval. 

By default yylval is of type int, but you can override that from the YACC file by re#defining YYSTYPE. 
</blockquote>
    * token 有以下三種含義:
      * 回傳給 yacc 的 token type。
      * token 的原始字串，yytext。
      * 回傳給 yacc 的 token value，yylval。  

使用 %union 讓 yylval 可以有多種型別。<code>
%union 
{
        int number;
        char *string;
}

%token <number> STATE
%token <number> NUMBER
%token <string> WORD

[0-9]+                  yylval.number=atoi(yytext); return NUMBER;
[a-z0-9]+               yylval.string=strdup(yytext);return WORD;
</code>

====== 優先權與結合性 ======
  * [[http://www.gnu.org/software/bison/manual/html_node/Precedence-Decl.html|3.7.3 Operator Precedence]]
  * [[http://www.gnu.org/software/bison/manual/html_node/Precedence.html|5.3 Operator Precedence]]
    * [[http://www.gnu.org/software/bison/manual/html_node/Using-Precedence.html#Using-Precedence|5.3.2 Specifying Operator Precedence]]
    * <q>The relative precedence of different operators is controlled by the order in which they are declared. The first precedence/associativity declaration in the file declares the operators whose precedence is lowest, the next such declaration declares the operators whose precedence is a little higher, and so on.</q>
====== 建立符號表 ======
  * [[http://www.d.umn.edu/~rmaclin/cs5641/Notes/L15_SymbolTable.pdf|Symbol Table Implementations]]
  * [[http://pages.cs.wisc.edu/~cs536-1/readings/6.SYMBOL-TABLES.html|Symbol Tables and Static Checks]]
  * 2 Using Flex
====== 建立語法樹 ======
Bison 中所寫的語法規則，有些僅是用來處理運算子優先順序，和程式無關。抽象語法樹即是將前述語法規則剔除。

  * [[http://web.eecs.utk.edu/~bvz/teaching/cs461Sp11/notes/parse_tree/|Building Abstract Syntax Trees]]
    * 遍歷抽象語法樹
  * [[http://stackoverflow.com/questions/2644597/how-do-i-implement-if-statement-in-flex-bison|How do i implement If statement in Flex/bison]]
  * [[http://efxa.org/2014/05/25/how-to-create-an-abstract-syntax-tree-while-parsing-an-input-stream/|How to create an abstract syntax tree while parsing an input stream]]

  * 3 Using Bison
====== Mid-Rule Action ======
  * [[https://www.gnu.org/software/bison/manual/html_node/Using-Mid_002dRule-Actions.html|3.4.8.1 Using Mid-Rule Actions]]
    * <q>A mid-rule action may refer to the components preceding it using $n, but it may not refer to subsequent components because it is run before they are parsed.</q>
      * mid-rule action 可以參照到前面 symbol 的值，但無法存取其後 symbol 的值。
    * <q>The mid-rule action itself counts as one of the components of the rule. This makes a difference when there is another action later in the same rule (and usually there is another at the end): you have to count the actions along with the symbols when working out which number n to use in $n.</q>
      * mid-rule action 本身也算成一個 symbol。
    * <q>The mid-rule action can also have a semantic value. The action can set its value with an assignment to $$, and actions later in the rule can refer to the value using $n. Since there is no symbol to name the action, there is no way to declare a data type for the value in advance, so you must use the ‘$<…>n’ construct to specify a data type each time you refer to this value.</q>
      * mid-rule action 本身可以有值。
    * <q>There is no way to set the value of the entire rule with a mid-rule action, because assignments to $$ do not have that effect. The only way to set the value for the entire rule is with an ordinary action at the end of the rule.</q>
      * mid-rule action 的值無法賦值給本身所屬的規則。

====== 錯誤回報 ======
  * [[https://www.ibm.com/developerworks/opensource/library/l-flexbison/|Better error handling using Flex and Bison]]
    * [[http://stackoverflow.com/questions/15429834/flex-bison-building-c-compiler-line-number-in-error-message|Flex Bison: Building C Compiler - Line number in error message]]
    * [[http://stackoverflow.com/questions/656703/how-does-flex-support-bison-location-exactly|How does flex support bison-location exactly?]]

  * [[http://www.gnu.org/software/bison/manual/html_node/Error-Reporting.html|4.7 The Error Reporting Function yyerror]]
    * 取得更精確的錯誤訊息。
    * 不同版本 Bison 有不同控制項
      * %error-verbose
      * %define parse.error verbose
<blockquote>If you invoke ‘%define parse.error verbose’ in the Bison declarations section (see The Bison Declarations Section), then Bison provides a more verbose and specific error message string instead of just plain "syntax error". However, that message sometimes contains incorrect information if LAC is not enabled (see LAC).</blockquote>
    * [[http://www.gnu.org/software/bison/manual/html_node/LAC.html|5.8.3 LAC]]

  * [[http://flex.sourceforge.net/manual/Generated-Scanner.html|9 The Generated Scanner]]
    * 自訂 YY_INPUT
    * BeginToken 紀錄 token 起終點。
    * Lex 讀取 token 時，有可能會多讀取字元，以便判定是否辨識出一個 token。此時需要修正 token 位置，輸出錯誤訊息時才能正確定位。
    * [[http://www.gnu.org/software/bison/manual/html_node/Tracking-Locations.html|3.5 Tracking Locations]]
      * [[http://www.gnu.org/software/bison/manual/html_node/Location-Type.html|3.5.1 Data Type of Locations]]
        * YYLTYPE
        * 位置的預設資料結構。可以自行定義加以擴充。 
      * [[http://www.gnu.org/software/bison/manual/html_node/Actions-and-Locations.html|3.5.2 Actions and Locations]]
        * @1 代表 $1 的位置資料結構。
      * [[http://www.gnu.org/software/bison/manual/html_node/Location-Default-Action.html|3.5.3 Default Action for Locations]]
        * YYLLOC_DEFAULT
        * 處理位置的預設動作

  * [[http://www.gnu.org/software/bison/manual/html_node/Error-Recovery.html|6 Error Recovery]]
    * [[http://marvin.cs.uidaho.edu/Teaching/CS445/bisonErrorToken.html|Using the Error Token in Bison]]  
    * 如果需要累計錯誤，不中斷剖析，使用 error token。

  * [[http://stackoverflow.com/questions/19005355/how-to-count-syntax-errors-in-bison-flex|how to count syntax errors in bison-flex]]
    * 在 yyerror 裡面計數。yacc 提供的 error token 不是用來記數錯誤，而是用來做錯誤回復 (error recovery)。

  * 8. Error Reporting and Recovery
===== 記錄位置 =====
  * [[http://lc.life.blog.163.com/blog/static/2405905920086201624532/|yylineno]]
    * [[http://stackoverflow.com/questions/13317634/flex-yylineno-set-to-1|Flex yylineno set to 1]]
    * 必須要有處理 \n 的規則，yylineno 才會累加。

  * [[http://hi.baidu.com/gao_dennis/item/8fb10642a38136efa5c066c7|lex 和 yacc 学习笔记5 （bison和flex 记录token的位置信息）]]
    * 要開啟 location 的支援，必須同時修改 flex 和 bison。
      * flex 添加 ''%option bison-locations'' (''%option bison-bridge'' 可移除)。
      * flex 添加如下代碼:<code c>
/* handle locations */
//static int yycloumn = 1; // yycloumn 自動產生

#define YY_USER_ACTION yylloc->first_line = yylloc->last_line = yylineno; \
    yylloc->first_column = yycloumn; yylloc->last_column = yycloumn+yyleng-1; \
    yycolumn += yyleng;
</code>
      * flex 在有處理 \n 的規則將 yycloumn 歸為 1。
      * bison 添加 ''%locations''。函式 yyerror 第一個參數為 YYLTYPE*。
  * lex 把 ' ' 和 '\t' 都視做長度為一，所以針對 '\t' 計算位置會有問題。
    * [[http://stackoverflow.com/questions/8197812/how-does-one-configure-notepad-to-use-spaces-instead-of-tabs|How does one configure Notepad++ to use spaces instead of tabs?]]
    * 編輯器將 tab 轉換成固定個數的 space 傳給 parser。
    * [[http://websystemsengineering.blogspot.tw/2012/12/replacing-tabs-with-spaces-in-notepad.html|Replacing tabs with spaces in Notepad++]]
    * 原來文本中的 tab 不受新設定的影響，不會被轉換成 space。
====== 上下文相關語法 ======
  * [[http://www.gnu.org/software/bison/manual/bison.html#Context-Dependency|7 Handling Context Dependencies]]
    * [[http://www.gnu.org/software/bison/manual/bison.html#Semantic-Tokens|7.1 Semantic Info in Token Types]]
      * 由 flex 負責決定回傳 token 的型別。  
    * [[http://www.gnu.org/software/bison/manual/bison.html#Lexical-Tie_002dins|7.2 Lexical Tie-ins]]
      * 由 Bison 設置標誌，藉此更改 flex 的行為。
    * [[http://www.gnu.org/software/bison/manual/bison.html#Tie_002din-Recovery|7.3 Lexical Tie-ins and Error Recovery]]
      * 錯誤回復需要調整。  

  * [[wp>The lexer hack]]
    * [[http://eli.thegreenplace.net/2012/07/05/how-clang-handles-the-type-variable-name-ambiguity-of-cc/|How Clang handles the type / variable name ambiguity of C/C++]]
    * [[http://www.drdobbs.com/cpp/practical-parsing-for-ansi-c/196603535|Practical Parsing for ANSI C]]
      * Lexer 和 Parser 並非各自獨立。
      * check_type (SDCC.lex) 負責辨識 typename 和 identifier。
====== 錯誤回復 ======
  * [[http://shape-of-code.coding-guidelines.com/2010/04/19/brief-history-of-syntax-error-recovery/|Brief history of syntax error recovery]]
    * 設計語言時，如果有所謂的 statement terminiator (如: 分號或斷行)，該處會是適合的同步點 (synchronization point)。<code>
stmt : whileStmt
     | ifStmt
     | ...
     | error NEWLINE /* 此處選擇斷行符做為同步點。*/
     ;
</code>
  * [[http://inst.eecs.berkeley.edu/~cs164/sp11/lectures/lecture15-2x2.pdf]]
    * 如果 lex 必須忽略分隔符 (如: 斷行)，則在 yacc 端處理。
      * [[http://stackoverflow.com/questions/13096879/bison-error-recovery|bison error recovery]]
    * [[http://stackoverflow.com/questions/19005355/how-to-count-syntax-errors-in-bison-flex|how to count syntax errors in bison-flex]]

  * [[http://www.gnu.org/software/bison/manual/html_node/Simple-Error-Recovery.html|2.3 Simple Error Recovery]]
  * [[https://www.gnu.org/software/bison/manual/html_node/Error-Recovery.html|6 Error Recovery]]
  * [[http://marvin.cs.uidaho.edu/Teaching/CS445/bisonErrorToken.html|Using the Error Token in Bison]]

  * [[https://www.gnu.org/software/bison/manual/html_node/Destructor-Decl.html|3.7.6 Freeing Discarded Symbols]]
    * 進行錯誤回復的過程中，yacc 可能會丟棄已處理的 symbol。此時需要將被丟棄的 symbol 使用的內存釋放掉。  
====== 衝突處理 ======
根本原因: 當 Bison 試圖 reduce 時，有其它 shift 或是 reduce 的可能，即為 shift/reduce 或是 reduce/reduce conflict。以下圖來看，"." 出現在規則結尾處時，代表有 reduce 發生。此時，如果有其它 "." 出現，代表衝突發生。

<code>
State 14 conflicts: 1 shift/reduce

   ...
   
State 14

   12 input_state_path: T_INPUT_STATE_NO_PATH . T_DIVIDE identifier
   13                 | T_INPUT_STATE_NO_PATH .

    T_DIVIDE  shift, and go to state 19

    T_DIVIDE  [reduce using rule 13 (input_state_path)]
    $default  reduce using rule 13 (input_state_path)
</code>
  * . 代表目前正讀入 T_INPUT_STATE_NO_PATH，規則 12 和規則 13 同時處於此狀態 (State 14)。
  * 如果下一個 token 是 T_DIVIDE，Bison 可以選擇使用規則 12，shift T_DIVIDE，轉換到狀態 19。
  * 如果下一個 token 是 T_DIVIDE，Bison 可以選擇使用規則 13，reduce。
  * 對於 shift/reduce conflict，Bison 預設選擇 shift。因此沒有被用到的規則 13，以中括號表示。

  * shift/reduce conflict
    * Bison 預設選擇 shift
    * 解決方法:
      * 從 *.output 檔找到有 shift/reduce conflict 的狀態。
      * 辨識出 reduce rule。
      * 辨識出與之相衝突的 shift rule。"reduce using" 語句前面有 reduce 預期的下一個 token，看哪一條 shift 預期的下一個 token 和它相同。
      * 辨識出 reduce rule 會 reduce 到的狀態。
      * 推導出導致 shift/reduce conflict 的字串。
  * reduce/reduce conflict
    * Bison 預設選擇第一條規則 reduce

  * [[http://www.codeproject.com/Articles/5221/The-Bison-v-verbose-option|The Bison -v/--verbose option]]

  * 容易發生衝突的情況 (基本上都是因為 recursive 性質所引起):
    * expression grammers: 同一條規則，但此規則屬於 recursive rule，因此衝突有機會發生在同一條規則上。
    * if/then/else: 具有相同 LHS 的規則。
    * nest lists of items

<code bash>
$ grep "State 14" -n Syntax_InputState.output
$ sed -n '211,222p' Syntax_InputState.output > Syntax_InputState.output.new
</code>
  * [[http://serverfault.com/questions/133692/how-to-display-certain-lines-from-a-text-file-in-linux|How to display certain lines from a text file in Linux?]]

===== GLR 剖析器 =====
  * Bison 是 LALR(1)，1 代表 lookahead 的 token 個數為 1。如果文法需要 2 個 lookahead，考慮改用 GLR。
    * LALR: Look Ahead, Left-to-Right, rightmost derivation
    * [[wiki>LALR parser]]
    * scanner 和 parser 現在都需要引入 tab.h
    * [[http://stackoverflow.com/questions/4946070/bison-flex-peek-at-the-next-letter-or-token|bison/flex: Peek at the next letter or token]]

  * 搭配 lex 的 r/s 樣式或許可以解決 lookahread 的問題。
    * [[http://flex.sourceforge.net/manual/Patterns.html#Patterns|6 Patterns]]
    * [[http://flex.sourceforge.net/manual/Limitations.html#Limitations|24 Limitations]]  

對於原本文法存在的 shift/reduce 或是 reduce/reduce conflict，GLR parser 於衝突處分成多個 parser 同時並行處理。注意! GLR parser 處理時間隨 conflict 個數增加。

<code yacc>
%glr-parser
%expect 2
%expect-rr
</code>
  * [[http://www.gnu.org/software/bison/manual/html_node/Expect-Decl.html|3.7.8 Suppressing Conflict Warnings]]
  * [[http://www.gnu.org/software/bison/manual/bison.html#GLR-Semantic-Actions|1.5.3.1 Deferred semantic actions]]
[[http://stackoverflow.com/questions/24748594/additional-syntax-error-message-on-glr-parser-when-syntax-is-ambiguous|Additional syntax error message on GLR parser when syntax is ambiguous]]
  * 使用 %dprec 解決 reduce/reduce conflict
  * 使用 %merge
===== Q & A =====
  * [[http://flex.sourceforge.net/manual/Whenever-flex-can-not-match-the-input-it-says-_0022flex-scanner-jammed_0022_002e.html|Whenever flex can not match the input it says "flex scanner jammed".]]
    * [[http://stackoverflow.com/questions/10267307/flex-default-rule|Flex default rule]]
  * [[http://flex.sourceforge.net/manual/Flex-is-not-matching-my-patterns-in-the-same-order-that-I-defined-them_002e.html|Flex is not matching my patterns in the same order that I defined them.]]
    * [[http://stackoverflow.com/questions/8379299/is-it-possible-to-set-priorities-for-rules-to-avoid-the-longest-earliest-match|Is it possible to set priorities for rules to avoid the “longest-earliest” matching pattern?]]
    * Lex 會匹配最長的 input。可以用文中方法匹配自己想要的樣式。
  * [[http://stackoverflow.com/questions/9611682/flexlexer-support-for-unicode|Flex(lexer) support for unicode]]
  * 匹配 []
    * [][]
    * 注意，如果要在 character class 內匹配到 []，] 必須緊跟在 [ 之後。
  * [[http://flex.sourceforge.net/manual/A-Note-About-yytext-And-Memory.html|21.3 A Note About yytext And Memory]]
    * [[http://bbs.csdn.net/topics/260011540|请教：strdup与strcpy具体的区别]]
    * [[http://stackoverflow.com/questions/14020380/strcpy-vs-strdup|strcpy vs strdup]]
    * [[http://stackoverflow.com/questions/7582394/strdup-or-strdup|strdup or _strdup?]]
    * 如果看到取得字串有問題，可能是沒有用 strdup 複製 yytext。
  * [[http://stackoverflow.com/questions/5613073/yacc-rule-useless-due-to-conflicts|Yacc “rule useless due to conflicts”]]
    * [[http://www.gnu.org/software/bison/manual/html_node/Mid_002dRule-Conflicts.html|3.4.8.3 Conflicts due to Mid-Rule Actions]]
    * 當 Yacc 無法藉由有限個數的 lookahead 來推導規則會引發此錯誤。Mid-Rule Action 有時也會引發此錯誤。
  * [[http://stackoverflow.com/questions/6797221/bison-how-to-ignore-a-token-if-it-doesnt-fit-into-a-rule|Bison: How to ignore a token if it doesn't fit into a rule]]
    * Bison 無法忽略 Flex 傳回的 token。
  * [[http://stackoverflow.com/questions/17954253/how-to-get-rid-of-warnings-in-bison|How to get rid of warnings in Bison?]]
    * [[http://www.gnu.org/software/bison/manual/html_node/Bison-Options.html|9.1 Bison Options]]<code>
# 先關閉 waring，再開啟 conflicts-sr
% bison -Wnone,conflicts-sr
</code>
  * [[http://stackoverflow.com/questions/16452737/is-it-possible-to-call-one-yacc-parser-from-another-to-parse-specific-token-subs|Is it possible to call one yacc parser from another to parse specific token substream?]]
====== 處理 INCLUDE ======
YY_BUFFER_STATE 是 lex 對特定輸入流所產生的句柄。其中包含輸入流的緩衝區，和其它相關變數。<code c>
#define YY_STRUCT_YY_BUFFER_STATE
struct yy_buffer_state
    {
    FILE *yy_input_file;

    char *yy_ch_buf;        /* input buffer */
    char *yy_buf_pos;       /* current position in input buffer */

    /* Size of input buffer in bytes, not including room for EOB
     * characters.
     */
    yy_size_t yy_buf_size;

    ...
}
</code>
  * [[http://westes.github.io/flex/manual/Multiple-Input-Buffers.html|11 Multiple Input Buffers]]
    * yypush_buffer_state 和 yypop_buffer_state 將 YY_BUFFER_STATE 入棧和出棧。但除了 YY_BUFFER_STATE 之外，一般還需要其它資訊，因此較建議自行維護此類棧空間。
      * [[https://stackoverflow.com/questions/2269929/how-can-i-implement-include-constructs-with-flex-and-yacc|How can I implement #include constructs with Flex and YACC?]]
      * [[http://efxa.org/2014/05/11/implementing-the-include-directive-to-support-nested-input-files/|Implementing the “include” directive to support nested input files]]
    * YY_CURRENT_BUFFER 代表當前的輸入流。

  * Chapter 2: Using Flex
    * Start States and Nested Input Files
  * Chapter 5: A Reference for Flex Specifications
    * Input Management
  * Lex 的輸入流可以是:
    * 標準輸入
    * 檔案
    * 字串: [[http://stackoverflow.com/questions/780676/string-input-to-flex-lexer|String input to flex lexer]]
  * YYINPUT 將輸入流填充到緩衝區，使用者可以自行替換 YYINPUT，添加自己所需的功能。


<code c>
yy_switch_to_buffer (yy_scan_buffer (char *base, yy_size_t size) );
</code>
  * 第零階段。處理巨集定義和調用、條件編譯。
====== Lex & Yacc Internal ======


===== Flex =====
token 由兩部分組成: token 和 token value。

  * Lex
    * yytext: token
    * yylval: token value，傳給 yacc
    * yylex: lexer<code>
%{
// declaration and option setting
#include <stdio.h>
#include <y.tab.h>    // yacc 根據 *.y 產生的 *.h，定義 token (number)
%}

%%
// pattern and action
REGEXP1     ACTION

REGEXP2     {ACTIONS}

%%
// helper function
</code>
  * Yacc
    * YYSTYPE 定義 yylval 的型別，這裡用改用 union
    * 用 $- 取得第幾號 yylval
    * yyparse: parser<code>
%{
// include C header files
%}

// 定義 token。parser 據此判斷該符號是 token。
// 其餘非 token 的符號必須出現在 rule 左邊至少一次。

%%

// rule (grammer)

// EXPS     ($$)
// EXP      ($1)
// FACTORS  ($2)
// TOKEN    ($3)
EXPS: /* empty, match the beginning of the input */
  | EXP FACTORS TOKEN
    {
        ACTIONS
    }
; /* end of rule */    
%%
// helper function
</code>

  * [[wp>Flex lexical analyser]]
  * [[http://flex.sourceforge.net/manual/|Flex Manual]]
    * [[http://flex.sourceforge.net/manual/Start-Conditions.html#Start-Conditions|Start Conditions]]
      * 該條規則只有在 START COND 下才會被啟動<code>
<START COND>REGEXP    ACTION
</code>
    * [[http://flex.sourceforge.net/manual/Multiple-Input-Buffers.html|Multiple Input Buffers]]
      * 指定 lex 從哪一個緩衝區讀入 token

  * lex 可以設定多種 start condition。不同 start condition 會採用不同的 rule。使用 BEGIN 切換 start condition。
<code>
%x SN_CONDITION_MACRO_DEFINITION    SN_CONDITION_MACRO_CALL  SN_CONDITION_MACRO_CALL_DELIM
%x SN_CONDITION_MACRO_EXPANSION     SN_CONDITION_MACRO_DECL  SN_CONDITION_INCLUDE
%x SN_CONDITION_FALSE
%x SN_CONDITION_COPY                SN_CONDITION_MESSAGE
%x SN_NO_BOUNDARY_TWO_STRING        SN_NO_BOUNDARY_ONE_STRING
%x SN_COMMENT_CONDITION             SN_COMMENT_CONDITION2
%x SN_NO_BOUNDARY_THREE_TOKEN
</code>

  * yyscan_t 是可重入 (reentrant) 的 Lex 之狀態。
<code cpp>
// 透過 yyscan_t 傳遞 scanner 狀態
//  struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
typedef void* yyscan_t;

/* For convenience, these vars (plus the bison vars far below)
   are macros in the reentrant scanner. */
#define yyin yyg->yyin_r
#define yyout yyg->yyout_r
#define yyextra yyg->yyextra_r
#define yyleng yyg->yyleng_r
#define yytext yyg->yytext_r
#define yylineno (YY_CURRENT_BUFFER_LVALUE->yy_bs_lineno)
#define yycolumn (YY_CURRENT_BUFFER_LVALUE->yy_bs_column)
#define yy_flex_debug yyg->yy_flex_debug_r

/* Holds the entire state of the reentrant scanner. */
struct yyguts_t
    {
    /* User-defined. Not touched by flex. */
    YY_EXTRA_TYPE yyextra_r;

    /* The rest are the same as the globals declared in the non-reentrant scanner. */
    FILE *yyin_r, *yyout_r;
    size_t yy_buffer_stack_top; /**< index of top of stack. */
    size_t yy_buffer_stack_max; /**< capacity of stack. */
    YY_BUFFER_STATE * yy_buffer_stack; /**< Stack as an array. */
    char yy_hold_char;
    int yy_n_chars;
    int yyleng_r;
    char *yy_c_buf_p;
    int yy_init;
    int yy_start;
    int yy_did_buffer_switch_on_eof;
    int yy_start_stack_ptr;
    int yy_start_stack_depth;
    int *yy_start_stack;
    yy_state_type yy_last_accepting_state;
    char* yy_last_accepting_cpos;

    int yylineno_r;
    int yy_flex_debug_r;

    char *yytext_r;
    int yy_more_flag;
    int yy_more_len;

    YYSTYPE * yylval_r;

    }; /* end struct yyguts_t */

</code>
  * [[http://stackoverflow.com/questions/2634998/writing-re-entrant-lexer-with-flex|Writing re-entrant lexer with Flex]]
    *<q>yyscanner is available to all your actions. In addition, yylex has a local variable called yyg that holds the entire state, and most macros conveniently refer to yyg.
</q>
    * [[http://flex.sourceforge.net/manual/Reentrant.html|Reentrant C Scanners]]
===== Bison =====
  * [[http://www.cs.uic.edu/~spopuri/cparser.html|Understanding C parsers generated by GNU Bison]]
    * [[https://lists.gnu.org/archive/html/help-bison/2006-09/msg00018.html|Understanding C parsers generated by GNU Bison]]
  * [[http://www.dickgrune.com/|Dr. Dick Grune]]
    * [[http://dickgrune.com/Books/PTAPG_1st_Edition/BookBody.pdf|Parsing Techniques - A Practical Guide]]

====== 系列文章 ======
  * [[http://efxa.org/2014/05/11/implementing-the-include-directive-to-support-nested-input-files/|Implementing the “include” directive to support nested input files]]
  * [[http://efxa.org/2014/05/17/techniques-for-resolving-common-grammar-conflicts-in-parsers/|Techniques for resolving common grammar conflicts in parsers]]
  * [[http://efxa.org/2014/05/25/how-to-create-an-abstract-syntax-tree-while-parsing-an-input-stream/|How to create an abstract syntax tree while parsing an input stream]]
====== 網路教程 ======
  * [[http://web.cs.wpi.edu/~kal/courses/compilers/|Compilers Programming Language Translation]]
  * [[http://www.progtools.org/compilers/tutorials/cxx_and_bison/cxx_and_bison.html|How to use C++ with Bison, V 1.0.2]]
  * [[http://gnuu.org/2009/09/18/writing-your-own-toy-compiler/|Writing Your Own Toy Compiler Using Flex, Bison and LLVM]]
  * [[http://www.saltycrane.com/blog/2007/02/example-using-bison-and-flex-with/|Example using bison and flex with cygwin on Windows]]
  * [[http://www.tldp.org/HOWTO/Lex-YACC-HOWTO.html#toc1|Lex and YACC primer]]
  * [[http://dinosaur.compilertools.net/|The Lex & Yacc Page]]
  * [[http://www.lemoda.net/c/reentrant-parser/index.html|Make a reentrant parser with Flex and Bison]]
====== 進階議題 ======
  * GLR parser
    * [[http://scottmcpeak.com/elkhound/|Elkhound: A GLR Parser Generator]]
  * Push and Pull parser
    * 以 parser 為主，調用 scanner 的模式，稱之為 pull parser，意味著將 token pull 進 parser。 
    * 以 scanner 為主，將 token 餵給 scanner 的模式，稱之為 push parser，意味著將 token push 進 parser。
      * [[https://www.gnu.org/software/bison/manual/html_node/Push-Decl.html|3.7.11 A Push Parser]]
===== 斷行問題 =====
  * [[http://www.editpadpro.com/tricklinebreak.html|Line Breaks in Windows, UNIX & Macintosh Text Files]]
    * Windows 系統以 CR+LF 斷行。
    * UNIX 和 OS X 系統以 LF 斷行。
    * 傳統 Mac 系統以 CR 斷行。
===== 編碼問題 =====
  * 由編輯器將文本轉成 [[UTF8|UTF-8]] 編碼，以 char * 傳給，Lex 撰寫 UTF-8 編碼匹配字符，後續再轉成寬字符 (wchar_t)。
    * [[http://csliu.com/2009/04/unicode-support-in-flex|Unicode Support in Flex]]
    * [[http://lists.gnu.org/archive/html/help-flex/2001-05/msg00014.html|Multi-byte support in 'flex']]
    * [[http://stackoverflow.com/questions/9611682/flexlexer-support-for-unicode|Flex(lexer) support for unicode]]

===== 語法剖析術語=====
  * [[http://www.panopticoncentral.net/2009/02/12/lexing-and-parsing/|Lexing and Parsing]]
  * [[http://www.panopticoncentral.net/2009/03/13/ll-vs-lr-vs-glr/|LL vs. LR vs. GLR]]
    * [[http://stackoverflow.com/questions/19663564/what-is-the-difference-between-lalr-and-lr-parsing|What is the difference between LALR and LR parsing?]]
    * LR (0) < LALR(1) < LR(1)
  * [[wp>Shift-reduce parser]]
    * Shift 表示將 token/non-terminal 置於堆疊上; reduce 代表匹配到一個 rule，將該 rule 相關的 token/non-terminal 推出堆疊。

  * [[wp>Operator-precedence parser]]
  * [[wp>Off-side rule]]
  * [[wp>Earley parser]]
    * [[http://shape-of-code.coding-guidelines.com/2014/09/10/is-early-parsing-now-practical/|Is Early parsing now practical?]]
  * [[wp>Dangling else]]

====== 外部連結 ======
  * [[https://fbb-git.github.io/bisoncpp/|BisonC++]]
  * [[http://gppg.codeplex.com/|Gardens Point Parser Generator]]
    * C# 版的 Flex & Bison。