-
- ISO/IEC Joint Technical Committee (JTC), Subcommittee (SC), Working Group (WG).
- WG 14 is responsible for the C language; WG 21 is responsible for C++.
-
- Page 157: read it sentence by sentence, side by side with C99.
-
- The C++ standard explains some terms more precisely.
-
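- Note: the C grammar is not context-free. For example, whether T * p; is a declaration or a multiplication depends on whether T names a type.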
The Standard
1. Scope
-
- This standard specifies how programs written in C are to be interpreted.
2. Normative references
-
- Other standards that this standard refers to.
3. Terms, definitions, and symbols
-
- Terms, definitions, and symbols used in this standard.
-
- Argument (actual argument): an expression passed when a function is called.
-
external appearance or action
- Behavior, as the standard uses the term, is about what can be observed from outside once the program runs.
-
unspecified behavior where each implementation documents how the choice is made
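- Implementation-defined behavior: behavior that the standard leaves to the implementation (that is, the compiler) and requires it to document. If the choice need not be documented, it is plain unspecified behavior instead. Note: implementation-defined behavior does not always change a program's result. The register keyword, for example, only affects performance.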
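As a small illustration of an implementation-defined choice, the sketch below reports whether plain char is signed on the current compiler. The function name plain_char_is_signed is made up for this note; only CHAR_MIN from <limits.h> comes from the standard library.

```c
#include <limits.h>

/* Returns 1 if plain char is signed on this implementation, 0 otherwise.
   The standard leaves the choice to the implementation, which must document it. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Either answer is correct C; a portable program simply must not depend on one of them.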
-
behavior that depends on local conventions of nationality, culture, and language that each implementation documents
-
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
- Undefined behavior: behavior on which the standard imposes no requirements. Example: integer division by zero.
-
use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance
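- Unspecified behavior: the standard allows two or more possibilities. A C program with undefined behavior is considered erroneous, while a program with unspecified behavior is considered correct; its behavior is just not pinned down. Example: in the expression b + c, the standard does not say whether b or c is evaluated first, but the final result is unaffected as long as the operands have no interacting side effects.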
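The b + c example can be made visible with operands that leave a trace. This is a minimal sketch; eval_b, eval_c, sum_with_trace, and last_trace are hypothetical names invented for the illustration.

```c
#include <stddef.h>

static char trace[3];      /* records the order in which the operands were evaluated */
static size_t trace_len;

static int eval_b(void) { trace[trace_len++] = 'b'; return 1; }
static int eval_c(void) { trace[trace_len++] = 'c'; return 2; }

/* Returns b + c. The sum is always 3, but whether the trace reads "bc" or
   "cb" is unspecified: the compiler may evaluate either operand first. */
int sum_with_trace(void)
{
    trace_len = 0;
    trace[0] = trace[1] = trace[2] = '\0';
    return eval_b() + eval_c();
}

/* Returns the evaluation order recorded by the most recent call. */
const char *last_trace(void) { return trace; }
```

A program that only uses the sum is portable; one that depends on the trace being "bc" is not.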
-
unit of data storage in the execution environment large enough to hold an object that may have one of two values
- Bit: a unit of storage (usually memory) large enough to hold a true/false value.
-
addressable unit of data storage large enough to hold any member of the basic character set of the execution environment
- Byte: an addressable unit of storage (usually memory) large enough to hold a char.
-
〈abstract〉 member of a set of elements used for the organization, control, or representation of data
- In the abstract sense, a character is an element used to organize, control, or represent data.
-
single-byte character 〈C〉 bit representation that fits in a byte
- In C, a (single-byte) character is any bit representation that fits in one byte.
-
sequence of one or more bytes representing a member of the extended character set of either the source or the execution environment
- Multibyte character: a member of the extended character set, represented by one or more bytes.
-
bit representation that fits in an object of type wchar_t, capable of representing any character in the current locale
- Wide character: a character of the current locale that can be represented in a wchar_t.
-
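- Multibyte characters are the older mechanism for non-English character sets; the number of bytes per character varies. Wide characters came later and use a fixed number of bytes per character. They are often used for Unicode, although the standard does not require wchar_t to hold Unicode.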
-
region of data storage in the execution environment, the contents of which can represent values
- An object is a region of storage (usually memory) whose contents can represent a value. How the contents are interpreted as a value depends on the object's type.
-
- Parameter (formal parameter): an object declared in the parameter list of a function declaration or definition.
-
precise meaning of the contents of an object when interpreted as having a specific type
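To make "the same contents, a different value" concrete, the sketch below copies one byte into an unsigned char and a signed char. The helper name interpret_byte is hypothetical, and the signed result assumes a two's complement machine with 8-bit bytes.

```c
#include <string.h>

/* Copies the same one-byte object representation into two objects of
   different types. Which value each object then holds depends on its type. */
void interpret_byte(unsigned char raw, unsigned char *as_unsigned, signed char *as_signed)
{
    memcpy(as_unsigned, &raw, 1);
    memcpy(as_signed, &raw, 1);   /* 0xFF reads as -1 on two's complement (assumption) */
}
```

Called with 0xFF, the unsigned object holds 255, while the signed one holds -1 under the stated assumption.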
4. Conformance
-
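- Defines what it means for a C program to conform to this standard. On the compiler side, a conforming implementation must accept every strictly conforming program.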
5. Environment
-
- The environments in which C programs are translated and executed.
- 5.1 Conceptual models
- 5.1.1 Translation environment
- The environment in which the program is compiled.
- 5.1.2 Execution environments
- The environment in which the program runs.
-
- Running a C program on bare metal (no operating system). Program startup and termination are both implementation-defined.
-
-
- Defines main as the program entry point.
-
- Defines the value returned when main finishes.
-
-
The semantic descriptions in this International Standard describe the behavior of an abstract machine in which issues of optimization are irrelevant.
- The semantics in this standard describe how an abstract machine (a model) must behave; optimization plays no part in that description.
Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.
- Put simply, a side effect changes the state of the execution environment (for example, assigning to a variable).
At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place.
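- All side effects before a sequence point must be complete, and no side effect after it may have taken place. Since side effects are changes in the state of the execution environment, a sequence point can be viewed as a point where that state is stable.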
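A short sketch of two sequence points in practice: the && operator and the end of a full expression. The function names ratio_exceeds and increment_twice are made up for this note.

```c
/* The && operator is a sequence point: its left operand is fully evaluated
   before the right operand is considered, and the right operand is skipped
   when the left one is 0. The division therefore never sees b == 0. */
int ratio_exceeds(int a, int b, int limit)
{
    return b != 0 && a / b > limit;
}

/* Writing i = i++ would modify i twice with no sequence point in between,
   which is undefined behavior. Separate statements are safe, because the
   end of each full expression is a sequence point. */
int increment_twice(int i)
{
    i++;
    i++;
    return i;
}
```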
6. Language
- Language specification.
- 6.1 Notation
- 6.2 Concepts
-
An identifier declared in different scopes or in the same scope more than once can be made to refer to the same object or function by a process called linkage.
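- Linkage is the process of binding an identifier to a particular object (a variable that occupies storage) or function. The linkage of an identifier is determined mainly by extern and static. Linkage here is a language-level concept.
- It is usually implemented by the linker.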
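A minimal single-file sketch of internal versus external linkage. The identifiers next_value, bump, and next_id are hypothetical.

```c
/* Internal linkage: visible only inside this translation unit. */
static int next_value = 0;

/* Internal linkage as well: a helper that other files cannot call. */
static int bump(void) { return ++next_value; }

/* External linkage (the default for functions): another translation unit
   can refer to this function through the declaration int next_id(void); */
int next_id(void) { return bump(); }
```

Another file that declares its own static int next_value gets a separate object; only next_id is shared across files.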
-
An object has a storage duration that determines its lifetime.
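- Storage duration is the lifetime of an object in storage (usually memory). For declared objects it is determined mainly by where the object is declared and whether static is used.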
If an object is referred to outside of its lifetime, the behavior is undefined.
- Accessing an object outside its lifetime is undefined behavior (for example, using a pointer to a local variable after the function has returned).
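The sketch below contrasts the broken pattern (kept as a comment) with two common ways to stay inside an object's lifetime. The names label_static and label_into are invented for the example.

```c
#include <stdio.h>

/* Broken (kept as a comment): buf's lifetime ends when the function returns,
   so the caller would receive a dangling pointer.
   char *broken(void) { char buf[8] = "ok"; return buf; }                    */

/* Fine: an object with static storage duration lives for the whole run. */
const char *label_static(void)
{
    static const char label[] = "ok";
    return label;
}

/* Fine: the caller owns the buffer, so it outlives the call. */
char *label_into(char *buf, size_t size)
{
    snprintf(buf, size, "%s", "ok");
    return buf;
}
```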
-
The meaning of a value stored in an object or returned by a function is determined by the type of the expression used to access it.
- How a value stored in an object or returned by a function is interpreted depends on its type. That is what types are for.
Types are partitioned into object types (types that fully describe objects), function types (types that describe functions), and incomplete types (types that describe objects but lack information needed to determine their sizes).
- Types fall into three categories:
- object type: describes variables.
- function type: describes functions.
- incomplete type: a type whose size is not known.
Any number of derived types can be constructed from the object, function, and incomplete types, as follows:
- Derived types can be built from object, function, and incomplete types in the following ways:
- array type
- structure type
- union type
- function type
- pointer type
A pointer type may be derived from a function type, an object type, or an incomplete type, called the referenced type.
- T * is a pointer type, and T is its referenced type (note the trailing "ed": the type being referred to). This referenced type is not the same thing as a reference type in C++.
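The categories above can be seen side by side in a few declarations. This is a sketch with hypothetical names (node, node_ptr, visitor, visit_all, identity).

```c
#include <stddef.h>

struct node;                        /* incomplete type: its size is not yet known */
typedef struct node *node_ptr;      /* pointer type whose referenced type is struct node */
typedef int (*visitor)(int value);  /* pointer to a function type */

struct node {                       /* this definition completes the type */
    int value;
    node_ptr next;
};

/* Walks the list and adds up whatever the visitor returns for each value. */
int visit_all(const struct node *head, visitor fn)
{
    int total = 0;
    for (; head != NULL; head = head->next)
        total += fn(head->value);
    return total;
}

int identity(int value) { return value; }
```

A pointer to an incomplete type is usable before the type is completed, which is what makes self-referential structures and opaque handles possible.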
- 6.2.6 Representations of types
- Defines how values of each type are represented (implemented).
-
-
- Type conversion rules.
- 6.3.1 Arithmetic operands
- 6.3.2 Other operands
-
An lvalue is an expression with an object type or an incomplete type other than void;
- The address of an lvalue can usually be taken with & (bit-fields and objects declared register are exceptions).
-
The (nonexistent) value of a void expression (an expression that has type void) shall not be used in any way, and implicit or explicit conversions (except to void) shall not be applied to such an expression.
- A void expression (such as a call to a function returning void) has no value, so it cannot be used where a value is needed.
-
A pointer to void may be converted to or from a pointer to any incomplete or object type.
- void * can be converted to T * and back, where T is an object or incomplete type.
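The standard library's qsort relies on exactly this conversion. A minimal sketch, with compare_ints and sort_ints as hypothetical helper names:

```c
#include <stdlib.h>

/* qsort hands elements over as const void *. Converting them back to
   const int * needs no cast in C. */
static int compare_ints(const void *lhs, const void *rhs)
{
    const int *a = lhs;
    const int *b = rhs;
    return (*a > *b) - (*a < *b);   /* avoids the overflow that *a - *b could cause */
}

void sort_ints(int *values, size_t count)
{
    qsort(values, count, sizeof values[0], compare_ints);
}
```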
-
-
- Defines the tokens that the preprocessor and the compiler process.
-
The value of a constant shall be in the range of representable values for its type.
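- One way to read this: an unsuffixed decimal integer constant whose value exceeds the range of long long int (the largest type in its list) violates this rule, unless the implementation offers a suitable extended integer type.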
-
The type of an integer constant is the first of the corresponding list in which its value can be represented.
- A decimal integer constant with no suffix takes the first of int, long int, and long long int that can represent its value.
-
An unsuffixed floating constant has type double.
-
An identifier declared as an enumeration constant has type int.
-
An integer character constant has type int.
- A character constant has type int.
If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int.
int i = '\xFF';          // Assumes plain char is signed on this implementation (compiler).
printf("value: %d", i);  // Prints -1 under that assumption.
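The typing rules for constants can be checked mechanically. The sketch below uses _Generic, which is a C11 feature and therefore not available in strict C99; the macro KIND_OF, the enum kind, and the helper functions are all hypothetical names for this note.

```c
enum kind { K_INT, K_UNSIGNED, K_LONG, K_LONG_LONG, K_DOUBLE, K_FLOAT, K_OTHER };
enum sample { SAMPLE_ENUMERATOR };

/* Maps the type of an expression to a tag (C11 _Generic). */
#define KIND_OF(x) _Generic((x),   \
    int: K_INT,                     \
    unsigned int: K_UNSIGNED,       \
    long: K_LONG,                   \
    long long: K_LONG_LONG,         \
    double: K_DOUBLE,               \
    float: K_FLOAT,                 \
    default: K_OTHER)

enum kind kind_of_plain_int(void)     { return KIND_OF(1); }                  /* int */
enum kind kind_of_char_constant(void) { return KIND_OF('a'); }                /* int, not char */
enum kind kind_of_enumerator(void)    { return KIND_OF(SAMPLE_ENUMERATOR); }  /* int */
enum kind kind_of_unsuffixed_fp(void) { return KIND_OF(1.0); }                /* double */
enum kind kind_of_suffixed_fp(void)   { return KIND_OF(1.0f); }               /* float */
```

Applying KIND_OF to a constant such as 2147483648 gives long or long long depending on the widths of int and long on the platform, which is the "first type in the list" rule at work.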
-
An expression is a sequence of operators and operands that specifies computation of a value, or that designates an object or a function, or that generates side effects, or that performs a combination thereof.
- An expression is made up of operators and operands.
-
- The following are listed by precedence, from highest to lowest (C Operator Precedence).
-
- Array subscripting and function calls belong here.
-
-
- address operator: &
- indirection operator: *
-
- * / %
-
- + -
-
- >> <<
-
- > < >= <=
-
- ? :
-
- = *= /= %= += -= <<= >>= &= ^= |=
-
The left operand of a comma operator is evaluated as a void expression;
- The left operand of the comma operator is evaluated and its value is then discarded.
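The usual place to meet the comma operator is the third clause of a for loop. A minimal sketch, with reverse_ints as a hypothetical name:

```c
#include <stddef.h>

/* Reverses an array in place. In the clause i++, j-- the comma operator
   evaluates i++ first, discards its value, then evaluates j--. The comma
   in the declaration size_t i = 0, j = ... is a separator, not an operator. */
void reverse_ints(int *values, size_t count)
{
    if (count == 0)
        return;
    for (size_t i = 0, j = count - 1; i < j; i++, j--) {
        int tmp = values[i];
        values[i] = values[j];
        values[j] = tmp;
    }
}
```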
-
A constant expression can be evaluated during translation rather than runtime, and accordingly may be used in any place that a constant may be.
- An expression whose value can be computed at translation time is a constant expression. A constant expression may appear wherever a constant is required.
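Three places that require a constant expression are enumerator values, file-scope array sizes, and case labels. The sketch below uses all three; every identifier in it is invented for the illustration.

```c
#include <stddef.h>

/* Enumerator values must be integer constant expressions. */
enum { WORDS = 4, WORD_BYTES = 8, BUFFER_BYTES = WORDS * WORD_BYTES };

/* So must the size of an array declared at file scope. */
static unsigned char buffer[BUFFER_BYTES];

size_t buffer_size(void) { return sizeof buffer; }

int is_buffer_size(int n)
{
    switch (n) {
    case WORDS * WORD_BYTES:    /* a case label also needs a constant expression */
        return 1;
    default:
        return 0;
    }
}
```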
-
The declaration specifiers consist of a sequence of specifiers that indicate the linkage, storage duration, and part of the type of the entities that the declarators denote.
-
- typedef, extern, static, auto, and register.
-
- void, char, short, int, long, float, double, signed, and unsigned.
-
- const, restrict, and volatile.
-
- inline
-
-
- Types built from basic types, pointers, arrays, and functions.
-
- typedef
-
- A designator names the member (or array element) being initialized; it is the identifier to the left of the equals sign.
#include <stdbool.h>

typedef struct MY_TYPE {
    bool flag;
    short int value;
    double stuff;
} MY_TYPE;

void function(void) {
    MY_TYPE a = { .flag = true, .value = 123, .stuff = 0.456 };
    /*            ^^^^^ designator */
}
-
A statement specifies an action to be performed.
- A statement is an action to perform: either a plain expression statement or a control-flow statement.
A block allows a set of declarations and statements to be grouped into one syntactic unit.
- A compound statement is a block; syntactically it is treated as one unit.
-
- Used with goto or switch statements.
-
- An expression followed by a semicolon (;) is an expression statement.
-
- The if-else and switch statements.
A selection statement selects among a set of statements depending on the value of a controlling expression.
- The controlling expression is the condition expression of an if or switch.
-
- The while, do-while, and for statements.
-
- The goto, continue, break, and return statements.
-
There shall be no more than one external definition for each identifier declared with internal linkage in a translation unit.
- The text that results from preprocessing is a translation unit.
- 6.9.1 Function definitions
function-definition: declaration-specifiers declarator declaration-list(opt) compound-statement
                     ^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^                       ^^^^^^^^^^^^^^^^^^
                     void                   foo(int i)                       { ... }
-
If the declaration of an identifier for an object has file scope and an initializer, the declaration is an external definition for the identifier.
- A file-scope object declaration with an initializer is a definition.
A declaration of an identifier for an object that has file scope without an initializer, and without a storage-class specifier or with the storage-class specifier static, constitutes a tentative definition.
- A file-scope object declaration with no initializer, and with either no storage-class specifier or static, is a tentative definition.
int gFoo;         // tentative definition
static int gBAR;  // tentative definition
-
- Defines preprocessing directives.
Type conversion
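- Summary:
- integer promotion
- When the parameter types of a function are unknown (no prototype in scope, or the variable part of a variadic call), the arguments undergo integer promotion.
- In an arithmetic expression, the usual arithmetic conversions first apply integer promotion to each operand, which may convert it to int or unsigned int.
unsigned char c1 = 255, c2 = 2;
int n = c1            + c2;
        ^^              ^^
        unsigned char   unsigned char
        |               |
        int             int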
If an int can represent all values of the original type, the value is converted to an int;
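- If int can represent every value of the operand's original type, the operand is converted to int; otherwise it is converted to unsigned int. (integer promotion)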
If both operands have the same type, then no further conversion is needed.
- After integer promotion, if the operands have the same type, conversion stops. (usual arithmetic conversions)
- usual arithmetic conversions
-
Otherwise, the integer promotions are performed on both operands.
- The rules before this one handle operands of floating type. This rule covers the case where both operands have integer types, which is where integer promotion comes in.
Then the following rules are applied to the promoted operands:
- The rules below apply to operands that have already undergone integer promotion.
If both operands have the same type, then no further conversion is needed.
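- After integer promotion, if the operands have the same type, conversion stops; otherwise further conversion follows the integer rank rules.
- Broadly, the conversion goes from lower rank to higher rank.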
-
-
The meaning of a value stored in an object or returned by a function is determined by the type of the expression used to access it.
- How a value stored in an object or returned by a function is interpreted depends on its type. That is what types are for.
- arithmetic types
- integer types
- floating types
- scalar types
- arithmetic types
- pointer types
- aggregate types (union types are excluded, because a union holds only one of its members at a time)
- array types
- struct types
- 6.2.6 Representations of types
-
-
Every integer type has an integer conversion rank defined as follows:
- The integer conversion rank orders the integer types; this ordering decides the outcome of the usual arithmetic conversions.
No two signed integer types shall have the same rank, even if they have the same representation.
- No two signed integer types may share a rank, even if the compiler implements them identically (that is, with the same number of bits).
The rank of a signed integer type shall be greater than the rank of any signed integer type with less precision.
- A signed integer type ranks above every signed integer type with less precision.
The rank of long long int shall be greater than the rank of long int, which shall be greater than the rank of int, which shall be greater than the rank of short int, which shall be greater than the rank of signed char.
- long long int > long int > int > short int > signed char
- Whether char is signed or unsigned is up to the implementation, so signed char has to be named explicitly here.
The rank of any unsigned integer type shall equal the rank of the corresponding signed integer type, if any.
- Every unsigned integer type has the same rank as its corresponding signed integer type.
The rank of any standard integer type shall be greater than the rank of any extended integer type with the same width.
- A standard integer type ranks above any compiler-specific extended integer type of the same width.
The rank of char shall equal the rank of signed char and unsigned char.
- char, signed char, and unsigned char all have the same rank.
The rank of _Bool shall be less than the rank of all other standard integer types.
- _Bool ranks below every other standard integer type.
The rank of any enumerated type shall equal the rank of the compatible integer type (see 6.7.2.2).
- An enumerated type has the same rank as its compatible integer type.
For all integer types T1, T2, and T3, if T1 has greater rank than T2 and T2 has greater rank than T3, then T1 has greater rank than T3.
- The rank relation on integer types is transitive.
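If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
- If int can represent every value of the operand's original type, the operand is converted to int; otherwise it is converted to unsigned int. Both cases are called integer promotion. The decision depends on the operand's type, not on the value it happens to hold.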
The integer promotions are applied only: as part of the usual arithmetic conversions, to certain argument expressions, to the operands of the unary +, -, and ~ operators, and to both operands of the shift operators, as specified by their respective subclauses.
- States when integer promotion takes place. Note: in some situations integer promotion is not performed.
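Two of the listed situations, arithmetic and the unary ~ operator, are easy to observe. The sketch below assumes 8-bit bytes, an int wider than a byte, and two's complement; the function names are hypothetical.

```c
/* Both operands are promoted to int before the addition, so 255 + 2 is 257
   rather than wrapping around to 1. */
int add_bytes(unsigned char c1, unsigned char c2)
{
    return c1 + c2;
}

/* The operand of ~ is promoted too: the complement is computed on an int,
   so ~0xFF is -256 on two's complement, not 0. */
int complement_byte(unsigned char c)
{
    return ~c;
}

/* Converting back to unsigned char discards the promoted upper bits. */
unsigned char complement_byte_narrow(unsigned char c)
{
    return (unsigned char)~c;
}
```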
-
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
- When a value of integer type T1 is converted to another integer type T2 and T2 can represent the value, the value is unchanged.
-
Many operators that expect operands of arithmetic type cause conversions and yield result types in a similar way. The purpose is to determine a common real type for the operands and result.
- The usual arithmetic conversions determine the type of the result and, where needed, convert the operands to that type.
Otherwise, the integer promotions are performed on both operands.
- If both operands have integer types, integer promotion is applied first.
If both operands have the same type, then no further conversion is needed.
- After integer promotion, if both operands have the same type, the process ends.
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
- If both operands are signed or both are unsigned, the operand of lower rank is converted to the type of the operand of higher rank.
Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
- If the unsigned operand's rank is greater than or equal to the signed operand's rank, the signed operand is converted to the unsigned type.
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
- If the signed type can represent every value of the unsigned type, the unsigned operand is converted to the signed type.
Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
- If the signed type cannot represent every value of the unsigned type, both operands are converted to the unsigned type that corresponds to the signed type.
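The mixed signed/unsigned rules are easiest to see with a comparison. This sketch assumes that long long can represent every unsigned int value (true where unsigned int is 32 bits and long long is 64 bits); the function names are made up.

```c
/* int vs unsigned int: equal rank, so the int operand is converted to
   unsigned int. -1 becomes UINT_MAX and "-1 < 1u" is false. */
int int_less_than_unsigned(int s, unsigned int u)
{
    return s < u;
}

/* long long vs unsigned int: long long can represent every unsigned int
   value (assumption above), so the unsigned operand is converted to
   long long and "-1 < 1u" is true. */
int long_long_less_than_unsigned(long long s, unsigned int u)
{
    return s < u;
}
```

Compilers typically flag the first comparison under -Wextra (sign-compare), which is a good reason to keep that warning on.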
-
-
Some operators (the unary operator ~, and the binary operators <<, >>, &, ^, and |, collectively described as bitwise operators) are required to have operands that have integer type. These operators yield values that depend on the internal representations of integers, and have implementation-defined and undefined aspects for signed types.
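Because of those signed-type caveats, bit manipulation is usually done on unsigned operands. A minimal sketch, assuming shift and width are both smaller than the number of bits in unsigned int; extract_bits is a hypothetical helper.

```c
/* Extracts `width` bits starting at bit position `shift`.
   Assumes shift and width are each less than the bit width of unsigned int. */
unsigned int extract_bits(unsigned int value, unsigned int shift, unsigned int width)
{
    unsigned int mask = (1u << width) - 1u;
    return (value >> shift) & mask;
}
```

With unsigned operands, shifts and masks are fully defined for in-range shift counts, so the result does not depend on how negative numbers are represented.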
Undefined behavior
- Enable -Wall and -Wextra in GCC.
Miscellaneous
-
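- The C standard only specifies the minimum range that int must cover (at least -32767 to 32767), so an int is at least 16 bits wide (2 bytes when a byte is 8 bits).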
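The limits for a given implementation are published in <limits.h>. The sketch below checks them against the minimum range; both function names are invented for this note.

```c
#include <limits.h>

/* Number of bits an int occupies in storage on this implementation
   (padding bits, if any, are included). */
int int_storage_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}

/* Returns 1 if int covers at least the range the standard requires. */
int int_meets_minimum_range(void)
{
    return INT_MAX >= 32767 && INT_MIN <= -32767;
}
```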
-
- Implementation-defined behavior of C in GCC.
Pitfalls
-
- Signed or not signed: that is where the problem lies
unsigned int u = 1234;
int i = -5678;
unsigned int result = u + i;  // i is converted to unsigned int, so its value is no longer negative.
-
-
- Pointers and arrays are not equivalent in C (Question 6.2). They are interchangeable in array indexing and pointer arithmetic (Question 6.3) and in parameter passing (Question 6.4).
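A short sketch of where arrays and pointers differ (sizeof) and where they coincide (indexing). The function names are hypothetical.

```c
#include <stddef.h>

size_t array_object_bytes(void)
{
    int a[10];
    return sizeof a;      /* size of the whole array: 10 * sizeof(int) */
}

size_t pointer_object_bytes(void)
{
    int a[10];
    int *p = a;           /* a decays to &a[0] */
    return sizeof p;      /* size of a pointer, unrelated to the array length */
}

int third_element(const int *p)
{
    return *(p + 2);      /* identical in meaning to p[2] */
}
```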
-
- C only has arrays of arrays; it has no true multidimensional arrays.
Appendix
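What "array of arrays" means in practice: each row is itself an array object. A minimal sketch with hypothetical names grid, row_bytes, and sum_row:

```c
#include <stddef.h>

/* An array of 2 elements, each of which is an array of 3 int. */
static int grid[2][3] = { { 1, 2, 3 }, { 4, 5, 6 } };

size_t row_bytes(void) { return sizeof grid[0]; }   /* 3 * sizeof(int) */

int sum_row(size_t row)
{
    const int *cells = grid[row];   /* grid[row] has type int[3] and decays to a pointer */
    int total = 0;
    for (size_t i = 0; i < sizeof grid[0] / sizeof grid[0][0]; i++)
        total += cells[i];
    return total;
}
```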
- A1: Lexer
- A2: Parser