Mercurial > projects > ldc
annotate dmd2/utf.c @ 1650:40bd4a0d4870
Update to work with LLVM 2.7.
Removed use of dyn_cast, llvm no compiles
without exceptions and rtti by
default. We do need exceptions for the libconfig stuff, but rtti isn't
necessary (anymore).
Debug info needs to be rewritten, as in LLVM 2.7 the format has
completely changed. To have something to look at while rewriting, the
old code has been wrapped inside #ifndef DISABLE_DEBUG_INFO , this means
that you have to define this to compile at the moment.
Updated tango 0.99.9 patch to include updated EH runtime code, which is
needed for LLVM 2.7 as well.
author | Tomas Lindquist Olsen |
---|---|
date | Wed, 19 May 2010 12:42:32 +0200 |
parents | f04dde6e882c |
children |
rev | line source |
---|---|
758
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
1 // utf.c |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
2 // Copyright (c) 2003 by Digital Mars |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
3 // All Rights Reserved |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
4 // written by Walter Bright |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
5 // http://www.digitalmars.com |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
6 // License for redistribution is by either the Artistic License |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
7 // in artistic.txt, or the GNU General Public License in gnu.txt. |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
8 // See the included readme.txt for details. |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
9 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
10 // Description of UTF-8 at: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
11 // http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
12 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
13 #include <stdio.h> |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
14 #include <assert.h> |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
15 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
16 #include "utf.h" |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
17 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
18 int utf_isValidDchar(dchar_t c) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
19 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
20 return c < 0xD800 || |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
21 (c > 0xDFFF && c <= 0x10FFFF && c != 0xFFFE && c != 0xFFFF); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
22 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
23 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
24 /******************************************** |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
25 * Decode a single UTF-8 character sequence. |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
26 * Returns: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
27 * NULL success |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
28 * !=NULL error message string |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
29 */ |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
30 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
31 const char *utf_decodeChar(unsigned char *s, size_t len, size_t *pidx, dchar_t *presult) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
32 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
33 dchar_t V; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
34 size_t i = *pidx; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
35 unsigned char u = s[i]; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
36 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
37 assert(i >= 0 && i < len); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
38 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
39 if (u & 0x80) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
40 { unsigned n; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
41 unsigned char u2; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
42 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
43 /* The following encodings are valid, except for the 5 and 6 byte |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
44 * combinations: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
45 * 0xxxxxxx |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
46 * 110xxxxx 10xxxxxx |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
47 * 1110xxxx 10xxxxxx 10xxxxxx |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
48 * 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
49 * 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
50 * 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
51 */ |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
52 for (n = 1; ; n++) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
53 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
54 if (n > 4) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
55 goto Lerr; // only do the first 4 of 6 encodings |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
56 if (((u << n) & 0x80) == 0) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
57 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
58 if (n == 1) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
59 goto Lerr; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
60 break; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
61 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
62 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
63 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
64 // Pick off (7 - n) significant bits of B from first byte of octet |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
65 V = (dchar_t)(u & ((1 << (7 - n)) - 1)); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
66 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
67 if (i + (n - 1) >= len) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
68 goto Lerr; // off end of string |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
69 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
70 /* The following combinations are overlong, and illegal: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
71 * 1100000x (10xxxxxx) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
72 * 11100000 100xxxxx (10xxxxxx) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
73 * 11110000 1000xxxx (10xxxxxx 10xxxxxx) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
74 * 11111000 10000xxx (10xxxxxx 10xxxxxx 10xxxxxx) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
75 * 11111100 100000xx (10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
76 */ |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
77 u2 = s[i + 1]; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
78 if ((u & 0xFE) == 0xC0 || |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
79 (u == 0xE0 && (u2 & 0xE0) == 0x80) || |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
80 (u == 0xF0 && (u2 & 0xF0) == 0x80) || |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
81 (u == 0xF8 && (u2 & 0xF8) == 0x80) || |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
82 (u == 0xFC && (u2 & 0xFC) == 0x80)) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
83 goto Lerr; // overlong combination |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
84 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
85 for (unsigned j = 1; j != n; j++) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
86 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
87 u = s[i + j]; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
88 if ((u & 0xC0) != 0x80) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
89 goto Lerr; // trailing bytes are 10xxxxxx |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
90 V = (V << 6) | (u & 0x3F); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
91 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
92 if (!utf_isValidDchar(V)) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
93 goto Lerr; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
94 i += n; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
95 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
96 else |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
97 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
98 V = (dchar_t) u; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
99 i++; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
100 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
101 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
102 assert(utf_isValidDchar(V)); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
103 *pidx = i; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
104 *presult = V; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
105 return NULL; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
106 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
107 Lerr: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
108 *presult = (dchar_t) s[i]; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
109 *pidx = i + 1; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
110 return "invalid UTF-8 sequence"; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
111 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
112 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
113 /*************************************************** |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
114 * Validate a UTF-8 string. |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
115 * Returns: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
116 * NULL success |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
117 * !=NULL error message string |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
118 */ |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
119 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
120 const char *utf_validateString(unsigned char *s, size_t len) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
121 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
122 size_t idx; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
123 const char *err = NULL; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
124 dchar_t dc; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
125 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
126 for (idx = 0; idx < len; ) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
127 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
128 err = utf_decodeChar(s, len, &idx, &dc); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
129 if (err) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
130 break; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
131 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
132 return err; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
133 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
134 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
135 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
136 /******************************************** |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
137 * Decode a single UTF-16 character sequence. |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
138 * Returns: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
139 * NULL success |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
140 * !=NULL error message string |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
141 */ |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
142 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
143 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
144 const char *utf_decodeWchar(unsigned short *s, size_t len, size_t *pidx, dchar_t *presult) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
145 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
146 const char *msg; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
147 size_t i = *pidx; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
148 unsigned u = s[i]; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
149 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
150 assert(i >= 0 && i < len); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
151 if (u & ~0x7F) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
152 { if (u >= 0xD800 && u <= 0xDBFF) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
153 { unsigned u2; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
154 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
155 if (i + 1 == len) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
156 { msg = "surrogate UTF-16 high value past end of string"; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
157 goto Lerr; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
158 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
159 u2 = s[i + 1]; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
160 if (u2 < 0xDC00 || u2 > 0xDFFF) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
161 { msg = "surrogate UTF-16 low value out of range"; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
162 goto Lerr; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
163 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
164 u = ((u - 0xD7C0) << 10) + (u2 - 0xDC00); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
165 i += 2; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
166 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
167 else if (u >= 0xDC00 && u <= 0xDFFF) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
168 { msg = "unpaired surrogate UTF-16 value"; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
169 goto Lerr; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
170 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
171 else if (u == 0xFFFE || u == 0xFFFF) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
172 { msg = "illegal UTF-16 value"; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
173 goto Lerr; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
174 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
175 else |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
176 i++; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
177 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
178 else |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
179 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
180 i++; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
181 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
182 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
183 assert(utf_isValidDchar(u)); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
184 *pidx = i; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
185 *presult = (dchar_t)u; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
186 return NULL; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
187 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
188 Lerr: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
189 *presult = (dchar_t)s[i]; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
190 *pidx = i + 1; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
191 return msg; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
192 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
193 |