Mercurial > projects > ldc
annotate dmd2/utf.c @ 984:4c0df37d0421
Removing ldc.conf. (IMPORTANT: run 'cmake .' after pull)
Added it to .hgignore.
This gets rid of spurious differences caused by CMake regenerating it differently.
Just run 'cmake .' to get it back in your local checkout.
author | Frits van Bommel <fvbommel wxs.nl> |
---|---|
date | Thu, 19 Feb 2009 13:50:05 +0100 |
parents | f04dde6e882c |
children |
rev | line source |
---|---|
758
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
1 // utf.c |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
2 // Copyright (c) 2003 by Digital Mars |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
3 // All Rights Reserved |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
4 // written by Walter Bright |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
5 // http://www.digitalmars.com |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
6 // License for redistribution is by either the Artistic License |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
7 // in artistic.txt, or the GNU General Public License in gnu.txt. |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
8 // See the included readme.txt for details. |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
9 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
10 // Description of UTF-8 at: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
11 // http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
12 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
13 #include <stdio.h> |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
14 #include <assert.h> |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
15 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
16 #include "utf.h" |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
17 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
18 int utf_isValidDchar(dchar_t c) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
19 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
20 return c < 0xD800 || |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
21 (c > 0xDFFF && c <= 0x10FFFF && c != 0xFFFE && c != 0xFFFF); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
22 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
23 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
24 /******************************************** |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
25 * Decode a single UTF-8 character sequence. |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
26 * Returns: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
27 * NULL success |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
28 * !=NULL error message string |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
29 */ |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
30 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
31 const char *utf_decodeChar(unsigned char *s, size_t len, size_t *pidx, dchar_t *presult) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
32 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
33 dchar_t V; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
34 size_t i = *pidx; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
35 unsigned char u = s[i]; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
36 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
37 assert(i >= 0 && i < len); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
38 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
39 if (u & 0x80) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
40 { unsigned n; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
41 unsigned char u2; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
42 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
43 /* The following encodings are valid, except for the 5 and 6 byte |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
44 * combinations: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
45 * 0xxxxxxx |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
46 * 110xxxxx 10xxxxxx |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
47 * 1110xxxx 10xxxxxx 10xxxxxx |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
48 * 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
49 * 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
50 * 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
51 */ |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
52 for (n = 1; ; n++) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
53 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
54 if (n > 4) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
55 goto Lerr; // only do the first 4 of 6 encodings |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
56 if (((u << n) & 0x80) == 0) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
57 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
58 if (n == 1) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
59 goto Lerr; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
60 break; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
61 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
62 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
63 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
64 // Pick off (7 - n) significant bits of B from first byte of octet |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
65 V = (dchar_t)(u & ((1 << (7 - n)) - 1)); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
66 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
67 if (i + (n - 1) >= len) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
68 goto Lerr; // off end of string |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
69 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
70 /* The following combinations are overlong, and illegal: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
71 * 1100000x (10xxxxxx) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
72 * 11100000 100xxxxx (10xxxxxx) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
73 * 11110000 1000xxxx (10xxxxxx 10xxxxxx) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
74 * 11111000 10000xxx (10xxxxxx 10xxxxxx 10xxxxxx) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
75 * 11111100 100000xx (10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
76 */ |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
77 u2 = s[i + 1]; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
78 if ((u & 0xFE) == 0xC0 || |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
79 (u == 0xE0 && (u2 & 0xE0) == 0x80) || |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
80 (u == 0xF0 && (u2 & 0xF0) == 0x80) || |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
81 (u == 0xF8 && (u2 & 0xF8) == 0x80) || |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
82 (u == 0xFC && (u2 & 0xFC) == 0x80)) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
83 goto Lerr; // overlong combination |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
84 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
85 for (unsigned j = 1; j != n; j++) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
86 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
87 u = s[i + j]; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
88 if ((u & 0xC0) != 0x80) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
89 goto Lerr; // trailing bytes are 10xxxxxx |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
90 V = (V << 6) | (u & 0x3F); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
91 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
92 if (!utf_isValidDchar(V)) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
93 goto Lerr; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
94 i += n; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
95 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
96 else |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
97 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
98 V = (dchar_t) u; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
99 i++; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
100 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
101 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
102 assert(utf_isValidDchar(V)); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
103 *pidx = i; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
104 *presult = V; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
105 return NULL; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
106 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
107 Lerr: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
108 *presult = (dchar_t) s[i]; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
109 *pidx = i + 1; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
110 return "invalid UTF-8 sequence"; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
111 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
112 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
113 /*************************************************** |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
114 * Validate a UTF-8 string. |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
115 * Returns: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
116 * NULL success |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
117 * !=NULL error message string |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
118 */ |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
119 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
120 const char *utf_validateString(unsigned char *s, size_t len) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
121 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
122 size_t idx; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
123 const char *err = NULL; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
124 dchar_t dc; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
125 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
126 for (idx = 0; idx < len; ) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
127 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
128 err = utf_decodeChar(s, len, &idx, &dc); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
129 if (err) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
130 break; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
131 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
132 return err; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
133 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
134 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
135 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
136 /******************************************** |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
137 * Decode a single UTF-16 character sequence. |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
138 * Returns: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
139 * NULL success |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
140 * !=NULL error message string |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
141 */ |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
142 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
143 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
144 const char *utf_decodeWchar(unsigned short *s, size_t len, size_t *pidx, dchar_t *presult) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
145 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
146 const char *msg; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
147 size_t i = *pidx; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
148 unsigned u = s[i]; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
149 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
150 assert(i >= 0 && i < len); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
151 if (u & ~0x7F) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
152 { if (u >= 0xD800 && u <= 0xDBFF) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
153 { unsigned u2; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
154 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
155 if (i + 1 == len) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
156 { msg = "surrogate UTF-16 high value past end of string"; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
157 goto Lerr; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
158 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
159 u2 = s[i + 1]; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
160 if (u2 < 0xDC00 || u2 > 0xDFFF) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
161 { msg = "surrogate UTF-16 low value out of range"; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
162 goto Lerr; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
163 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
164 u = ((u - 0xD7C0) << 10) + (u2 - 0xDC00); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
165 i += 2; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
166 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
167 else if (u >= 0xDC00 && u <= 0xDFFF) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
168 { msg = "unpaired surrogate UTF-16 value"; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
169 goto Lerr; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
170 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
171 else if (u == 0xFFFE || u == 0xFFFF) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
172 { msg = "illegal UTF-16 value"; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
173 goto Lerr; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
174 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
175 else |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
176 i++; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
177 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
178 else |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
179 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
180 i++; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
181 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
182 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
183 assert(utf_isValidDchar(u)); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
184 *pidx = i; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
185 *presult = (dchar_t)u; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
186 return NULL; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
187 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
188 Lerr: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
189 *presult = (dchar_t)s[i]; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
190 *pidx = i + 1; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
191 return msg; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
192 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
193 |