@node Lexical conventions
@chapter Lexical conventions

This section gives an informal account of some of the lexical
conventions used in writing Scheme programs. For a formal syntax of
Scheme, see @ref{Formal syntax}.

@menu
* Identifiers::
* Whitespace and comments::
* Other notations::
* Datum labels::
@end menu

@node Identifiers
@section Identifiers

An identifier is any sequence of letters, digits, and ``extended
identifier characters'' provided that it does not have a prefix which
is a valid number. However, the @code{.} token (a single period) used
in the list syntax is not an identifier.

All implementations of Scheme must support the following extended
identifier characters:

@example
! $ % & * + - . / : < = > ? @@ ^ _ ~
@end example

Alternatively, an identifier can be represented by a sequence of
zero or more characters enclosed within vertical lines (@samp{|}),
analogous to string literals. Any character, including whitespace
characters, but excluding the backslash and vertical line characters,
can appear verbatim in such an identifier. In addition, characters
can be specified using either an @svar{inline hex escape} or the same
escapes available in strings.

For example, the identifier @code{|H\x65;llo|} is the same identifier
as @code{Hello}, and in an implementation that supports the
appropriate Unicode character the identifier @code{|\x3BB;|} is the
same as the identifier @theultimate{}. What is more, @code{|\t\t|}
and @code{|\x9;\x9;|} are the same. Note that @code{||} is a valid
identifier that is different from any other identifier.

Here are some examples of identifiers:

@example
...                      +
+soup+                   <=?
->string                 a34kTMNs
lambda                   list->vector
q                        V17a
|two words|              |two\x20;words|
the-word-recursion-has-many-meanings
@end example

@xref{Formal syntax}, for the formal syntax of identifiers.
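
The equivalences above can be checked by comparing the symbols that
the identifiers denote when quoted. The following is a minimal
sketch, assuming an implementation that provides the
@code{(scheme base)} library; the values shown follow from the rules
in this section.

@lisp
(import (scheme base))

;; Escapes inside vertical lines name the same characters as the
;; plain spelling, so both forms denote one symbol.
(eq? '|H\x65;llo| 'Hello)             ; @result{} #t
(eq? '|\t\t| '|\x9;\x9;|)             ; @result{} #t

;; Whitespace inside vertical lines is part of the name.
(symbol->string '|two words|)         ; @result{} "two words"
(eq? '|two words| '|two\x20;words|)   ; @result{} #t

;; The empty identifier is a symbol whose name has length zero.
(string-length (symbol->string '||))  ; @result{} 0
@end lisp
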

Identifiers have two uses within Scheme programs:

@itemize
@item
Any identifier can be used as a variable or as a syntactic keyword
(see @ref{Variables syntactic keywords and regions,,
Variables@comma{} syntactic keywords@comma{} and regions}
and @ref{Macros}).

@item
When an identifier appears as a literal or within a literal (see
@ref{Literal expressions}), it is being used to denote a @define{symbol}
(see @ref{Symbols}).

@end itemize

In contrast with earlier revisions of the report [@ref{R5RS}], the syntax
distinguishes between upper and lower case in identifiers and in
characters specified using their names. However, it does not
distinguish between upper and lower case in numbers, nor in
@svar{inline hex escapes} used in the syntax of identifiers,
characters, or strings. None of the identifiers defined in this report
contain upper-case characters, even when they appear to do so as a
result of the English-language convention of capitalizing the first
word of a sentence.

The following directives give explicit control over case folding.

@lisp
#!fold-case
#!no-fold-case
@end lisp

These directives can appear anywhere comments are permitted (see
@ref{Whitespace and comments}) but must be followed by a delimiter.
They are treated as comments, except that they affect the reading of
subsequent data from the same port. The @code{#!fold-case} directive
causes subsequent identifiers and character names to be case-folded as
if by @code{string-foldcase} (see @ref{Strings}). It has no effect on
character literals. The @code{#!no-fold-case} directive causes a return
to the default, non-folding behavior.

@node Whitespace and comments
@section Whitespace and comments

@define{Whitespace} characters include the space, tab, and newline
characters. (Implementations may provide additional whitespace
characters such as page break.)
Whitespace is used for improved
readability and as necessary to separate tokens from each other,
a token being an indivisible lexical unit such as an identifier or
number, but is otherwise insignificant. Whitespace can occur between
any two tokens, but not within a token. Whitespace occurring inside
a string or inside a symbol delimited by vertical lines is significant.

The lexical syntax includes several comment forms. Comments are
treated exactly like whitespace.

A semicolon (@samp{;}) indicates the start of a line comment. The
comment continues to the end of the line on which the semicolon
appears.

@sharpindex{;}

Another way to indicate a comment is to prefix a @svar{datum}
(cf. @ref{External representations formal}) with @code{#;} and
optional @svar{whitespace}. The comment consists of the comment prefix
@code{#;}, the space, and the @svar{datum} together. This notation
is useful for ``commenting out'' sections of code.

Block comments are indicated with properly nested @code{#|} and
@code{|#} pairs.

@lisp
#|
   The FACT procedure computes the factorial
   of a non-negative integer.
|#
(define fact
  (lambda (n)
    (if (= n 0)
        #;(= n 1)
        1        ;Base case: return 1
        (* n (fact (- n 1))))))
@end lisp

@node Other notations
@section Other notations

For a description of the notations used for numbers, see @ref{Numbers}.

@table @t

@item @code{. + -}
These are used in numbers, and can also occur anywhere in an
identifier. A delimited plus or minus sign by itself is also an
identifier. A delimited period (not occurring within a number or
identifier) is used in the notation for pairs (@ref{Pairs and lists}),
and to indicate a rest-parameter in a formal parameter list
(@ref{Procedures}). Note that a sequence of two or more periods is an
identifier.

@item @code{( )}
Parentheses are used for grouping and to notate lists
(@ref{Pairs and lists}).

@item @code{'}
The apostrophe (single quote) character is used to indicate literal
data (@ref{Literal expressions}).

@item @code{`}
The grave accent (backquote) character is used to indicate partly
constant data (@ref{Quasiquotation}).

@item @code{, ,@@}
The character comma and the sequence comma at-sign are used in
conjunction with quasiquotation (@ref{Quasiquotation}).

@item @code{"}
The quotation mark character is used to delimit strings
(@ref{Strings}).

@item @code{\}
Backslash is used in the syntax for character constants
(@ref{Characters}) and as an escape character within string constants
(@ref{Strings}) and identifiers (@ref{Lexical structure}).

@item @code{[ ] @{ @}}
Left and right square and curly brackets (braces) are reserved for
possible future extensions to the language.

@item @code{#}
The number sign is used for a variety of purposes depending on the
character that immediately follows it:

@item @code{#t #f}
These are the boolean constants (@ref{Booleans}), along with the
alternatives @code{#true} and @code{#false}.

@item @code{#\}
This introduces a character constant (@ref{Characters}).

@item @code{#(}
This introduces a vector constant (@ref{Vectors}). Vector constants are
terminated by @code{)}.

@item @code{#u8(}
This introduces a bytevector constant (@ref{Bytevectors}). Bytevector
constants are terminated by @code{)}.

@item @code{#e #i #b #o #d #x}
These are used in the notation for numbers
(@ref{Syntax of numerical constants}).

@item @code{#@r{@svar{n}}= #@r{@svar{n}}#}
These are used for labeling and referencing other literal data
(@ref{Datum labels}).

@end table

@node Datum labels
@section Datum labels

@deffn {lexical syntax} #@svar{n}=@svar{datum}
@deffnx {lexical syntax} #@svar{n}#

The lexical syntax @code{#}@svar{n}@code{=}@svar{datum} reads the same
as @svar{datum}, but also results in @svar{datum} being labelled by
@svar{n}. It is an error if @svar{n} is not a sequence of digits.

The lexical syntax @code{#}@svar{n}@code{#} serves as a reference to
some object labelled by @code{#}@svar{n}@code{=}; the result is the
same object as the @code{#}@svar{n}@code{=} (see
@ref{Equivalence predicates}). Together, these syntaxes permit the
notation of structures with shared or circular substructure.

@lisp
(let ((x (list 'a 'b 'c)))
  (set-cdr! (cddr x) x)
  x)                       @result{} #0=(a b c . #0#)
@end lisp

The scope of a datum label is the portion of the outermost datum in
which it appears that is to the right of the label. Consequently, a
reference @code{#}@svar{n}@code{#} can occur only after a label
@code{#}@svar{n}@code{=}; it is an error to attempt a forward
reference. In addition, it is an error if the reference appears as the
labelled object itself (as in @code{#}@svar{n}@code{=}
@code{#}@svar{n}@code{#}), because the object labelled by
@code{#}@svar{n}@code{=} is not well defined in this case.

It is an error for a @svar{program} or @svar{library} to include
circular references except in literals. In particular, it is an error
for quasiquote (@ref{Quasiquotation}) to contain them.

@lisp
#1=(begin (display #\x) #1#)  @result{} @r{error}
@end lisp
@end deffn
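
As a further illustration of shared structure, the sketch below
quotes literals that contain datum labels and checks object identity
with @code{eq?}. It assumes an implementation that provides the
@code{(scheme base)} library; the names @code{shared} and
@code{circ} are chosen only for this example.

@lisp
(import (scheme base))

;; Label 0 marks the list (a b); the reference #0# is that same
;; object, so the outer list holds one list twice.
(define shared '(#0=(a b) #0#))
(eq? (car shared) (cadr shared))   ; @result{} #t

;; A circular literal: the tail after c is the list itself.
(define circ '#0=(a b c . #0#))
(eq? circ (cdr (cddr circ)))       ; @result{} #t
@end lisp

Both definitions are acceptable because the labels occur inside
quoted literals; the same labels placed in unquoted program text, as
in the @code{begin} example above, would be an error.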