annotate lexer/Token.d @ 44:495188f9078e new_gen
Big update - Moving towards a better, more separated parser
The parser no longer creates the AST directly, but through
callbacks (actions). This means the parser can be run with a different set
of actions that do something else.
The parser is not back to full strength yet; the main things missing are the
various statements and structs.
Also added a SmallArray that uses only the stack until a given size is
exceeded, after which the array is copied onto the heap.
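
The SmallArray itself is not part of the file shown on this page. As a rough illustration of the idea described in the message (inline storage first, heap only after a size limit is exceeded), here is a minimal sketch in present-day D. Every name in it (`SmallArray`, `put`, `items`, `usesHeap`) is an assumption made for the example and may not match the actual code in the repository.

```d
import std.stdio : writeln;

// Hypothetical sketch only: the real SmallArray from this changeset is not
// shown on this page, so every name and signature here is an assumption.
struct SmallArray(T, size_t N)
{
    private T[N] buffer;   // inline storage; on the stack for a local variable
    private T[] spill;     // heap storage, used once more than N items are stored
    private size_t count;
    private bool onHeap;

    // Append one element, moving everything to the heap on the first overflow.
    void put(T value)
    {
        if (!onHeap && count < N)
        {
            buffer[count] = value;
        }
        else
        {
            if (!onHeap)
            {
                spill = buffer[0 .. count].dup; // copy inline items to the GC heap
                onHeap = true;
            }
            spill ~= value;
        }
        ++count;
    }

    // View of the stored elements, wherever they currently live.
    T[] items()
    {
        return onHeap ? spill : buffer[0 .. count];
    }

    size_t length() const { return count; }
    bool usesHeap() const { return onHeap; }
}

void main()
{
    SmallArray!(int, 4) a;
    foreach (i; 0 .. 4)
        a.put(i);
    assert(!a.usesHeap);   // still within the inline capacity

    a.put(4);              // the fifth element exceeds N = 4
    assert(a.usesHeap);
    writeln(a.items);      // prints [0, 1, 2, 3, 4]
}
```

The slice returned by `items` points into the struct while the data is still inline, so in this sketch it should not outlive the `SmallArray` value it came from.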
author | Anders Halager <halager@gmail.com> |
---|---|
date | Wed, 23 Apr 2008 00:57:45 +0200 |
parents | 4e879f82dd64 |
children | b6c1dc30ca4b |
```d
module lexer.Token;

public
import misc.Location;

import Integer = tango.text.convert.Integer;

/**
  The Token struct will be used throughout the Lexer, Parser and other
  modules as a location into source.

  The Token should always be optimized for size to limit unnecessary
  memory usage.
  */
struct Token
{
    Tok type;
    Location location;
    uint length;

    /**
      Create a new token with a Tok type, a Location in the source and a
      length giving how many chars the Token spans in the source.
      */
    static Token opCall (Tok type, Location location, uint length)
    {
        Token t;
        t.type = type;
        t.location = location;
        t.length = length;
        return t;
    }

    /**
      Get the type of the Token as a string
      */
    char[] getType ()
    {
        return typeToString[this.type];
    }

    /**
      A human readable dump of a Token
      */
    char[] toString ()
    {
        return this.getType()~": Len: "~Integer.toString(this.length)
            ~", Loc: "~location.toString;
    }

    /**
      Get the string in the source that matches what this Token is
      covering.
      */
    char[] get ()
    {
        return location.get(length);
    }

    /**
      Returns true if the type of this token is a basic type (int, float, ...).
      Void is included, although void in itself is not really a type.
      */
    bool isBasicType()
    {
        return type >= Tok.Byte && type <= Tok.Void;
    }

    /**
      Just a shortcut to avoid `token.type == Tok.Identifier`.
      */
    bool isIdentifier()
    {
        return type == Tok.Identifier;
    }
}
```

```d
/**
  Tok is short for TokenType. This enum list is to supply the Token
  with a type.

  This enum is switched on in many places.
  */
enum Tok : ushort
{
    /* Non-code related tokens */
    EOF,

    /* Basic types */
    Identifier,
    Integer,

    /* Basic operators */
    Assign,
    Add, Sub,
    Mul, Div,
    Comma,

    /* Symbols */
    OpenParentheses,
    CloseParentheses,
    OpenBrace,
    CloseBrace,
    Seperator,
    Colon,
    Dot,

    /* Comparator operators */
    Eq, Ne,
    Lt, Gt,
    Le, Ge,

    Not,

    /* Keywords */
    Byte, Ubyte,
    Short, Ushort,
    Int, Uint,
    Long, Ulong,

    Float, Double,

    Bool,

    Void,

    Struct,

    If, Else,
    While,
    Switch, Case, Default,
    Return,

}
```

```d
/**
  An associative array mapping each Tok to its name as a string.

  Always keep this list updated when adding a new Tok.
  */
public char[][Tok] typeToString;

static this()
{
    typeToString =
    [
        Tok.EOF:"EOF"[],
        Tok.Identifier:"Identifier",
        Tok.Byte:"Byte",
        Tok.Ubyte:"Ubyte",
        Tok.Short:"Short",
        Tok.Ushort:"Ushort",
        Tok.Int:"Int",
        Tok.Uint:"Uint",
        Tok.Long:"Long",
        Tok.Ulong:"Ulong",
        Tok.Float:"Float",
        Tok.Double:"Double",
        Tok.Bool:"Bool",
        Tok.Void:"Void",
        Tok.Eq:"Eq",
        Tok.Ne:"Ne",
        Tok.Lt:"Lt",
        Tok.Le:"Le",
        Tok.Gt:"Gt",
        Tok.Ge:"Ge",
        Tok.Not:"Not",
        Tok.OpenParentheses:"OpenParentheses",
        Tok.CloseParentheses:"CloseParentheses",
        Tok.OpenBrace:"OpenBrace",
        Tok.CloseBrace:"CloseBrace",
        Tok.Dot:"Dot",
        Tok.Assign:"Assign",
        Tok.Add:"Add",
        Tok.Sub:"Sub",
        Tok.Mul:"Mul",
        Tok.Div:"Div",
        Tok.Integer:"Integer",
        Tok.If:"If",
        Tok.Else:"Else",
        Tok.While:"While",
        Tok.Switch:"Switch",
        Tok.Case:"Case",
        Tok.Default:"Default",
        Tok.Comma:"Comma",
        Tok.Return:"Return",
        Tok.Struct:"Struct",
        Tok.Colon:"Colon",
        Tok.Seperator:"Seperator"
    ];
}
```