Mercurial > projects > ldc
annotate dmd2/utf.c @ 1187:a95fc9fcad5c
Make sure -defaultlib and -debuglib don't get cut off if longer than 63 chars,
and clean up some overly verbose code.
author | Frits van Bommel <fvbommel wxs.nl> |
---|---|
date | Wed, 01 Apr 2009 00:52:31 +0200 |
parents | f04dde6e882c |
children |
rev | line source |
---|---|
758
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
1 // utf.c |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
2 // Copyright (c) 2003 by Digital Mars |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
3 // All Rights Reserved |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
4 // written by Walter Bright |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
5 // http://www.digitalmars.com |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
6 // License for redistribution is by either the Artistic License |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
7 // in artistic.txt, or the GNU General Public License in gnu.txt. |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
8 // See the included readme.txt for details. |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
9 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
10 // Description of UTF-8 at: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
11 // http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
12 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
13 #include <stdio.h> |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
14 #include <assert.h> |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
15 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
16 #include "utf.h" |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
17 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
18 int utf_isValidDchar(dchar_t c) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
19 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
20 return c < 0xD800 || |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
21 (c > 0xDFFF && c <= 0x10FFFF && c != 0xFFFE && c != 0xFFFF); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
22 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
23 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
24 /******************************************** |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
25 * Decode a single UTF-8 character sequence. |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
26 * Returns: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
27 * NULL success |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
28 * !=NULL error message string |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
29 */ |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
30 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
31 const char *utf_decodeChar(unsigned char *s, size_t len, size_t *pidx, dchar_t *presult) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
32 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
33 dchar_t V; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
34 size_t i = *pidx; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
35 unsigned char u = s[i]; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
36 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
37 assert(i >= 0 && i < len); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
38 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
39 if (u & 0x80) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
40 { unsigned n; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
41 unsigned char u2; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
42 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
43 /* The following encodings are valid, except for the 5 and 6 byte |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
44 * combinations: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
45 * 0xxxxxxx |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
46 * 110xxxxx 10xxxxxx |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
47 * 1110xxxx 10xxxxxx 10xxxxxx |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
48 * 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
49 * 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
50 * 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
51 */ |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
52 for (n = 1; ; n++) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
53 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
54 if (n > 4) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
55 goto Lerr; // only do the first 4 of 6 encodings |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
56 if (((u << n) & 0x80) == 0) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
57 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
58 if (n == 1) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
59 goto Lerr; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
60 break; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
61 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
62 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
63 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
64 // Pick off (7 - n) significant bits of B from first byte of octet |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
65 V = (dchar_t)(u & ((1 << (7 - n)) - 1)); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
66 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
67 if (i + (n - 1) >= len) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
68 goto Lerr; // off end of string |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
69 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
70 /* The following combinations are overlong, and illegal: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
71 * 1100000x (10xxxxxx) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
72 * 11100000 100xxxxx (10xxxxxx) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
73 * 11110000 1000xxxx (10xxxxxx 10xxxxxx) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
74 * 11111000 10000xxx (10xxxxxx 10xxxxxx 10xxxxxx) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
75 * 11111100 100000xx (10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
76 */ |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
77 u2 = s[i + 1]; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
78 if ((u & 0xFE) == 0xC0 || |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
79 (u == 0xE0 && (u2 & 0xE0) == 0x80) || |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
80 (u == 0xF0 && (u2 & 0xF0) == 0x80) || |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
81 (u == 0xF8 && (u2 & 0xF8) == 0x80) || |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
82 (u == 0xFC && (u2 & 0xFC) == 0x80)) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
83 goto Lerr; // overlong combination |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
84 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
85 for (unsigned j = 1; j != n; j++) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
86 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
87 u = s[i + j]; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
88 if ((u & 0xC0) != 0x80) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
89 goto Lerr; // trailing bytes are 10xxxxxx |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
90 V = (V << 6) | (u & 0x3F); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
91 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
92 if (!utf_isValidDchar(V)) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
93 goto Lerr; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
94 i += n; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
95 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
96 else |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
97 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
98 V = (dchar_t) u; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
99 i++; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
100 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
101 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
102 assert(utf_isValidDchar(V)); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
103 *pidx = i; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
104 *presult = V; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
105 return NULL; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
106 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
107 Lerr: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
108 *presult = (dchar_t) s[i]; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
109 *pidx = i + 1; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
110 return "invalid UTF-8 sequence"; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
111 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
112 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
113 /*************************************************** |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
114 * Validate a UTF-8 string. |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
115 * Returns: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
116 * NULL success |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
117 * !=NULL error message string |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
118 */ |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
119 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
120 const char *utf_validateString(unsigned char *s, size_t len) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
121 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
122 size_t idx; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
123 const char *err = NULL; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
124 dchar_t dc; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
125 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
126 for (idx = 0; idx < len; ) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
127 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
128 err = utf_decodeChar(s, len, &idx, &dc); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
129 if (err) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
130 break; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
131 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
132 return err; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
133 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
134 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
135 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
136 /******************************************** |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
137 * Decode a single UTF-16 character sequence. |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
138 * Returns: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
139 * NULL success |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
140 * !=NULL error message string |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
141 */ |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
142 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
143 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
144 const char *utf_decodeWchar(unsigned short *s, size_t len, size_t *pidx, dchar_t *presult) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
145 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
146 const char *msg; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
147 size_t i = *pidx; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
148 unsigned u = s[i]; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
149 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
150 assert(i >= 0 && i < len); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
151 if (u & ~0x7F) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
152 { if (u >= 0xD800 && u <= 0xDBFF) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
153 { unsigned u2; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
154 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
155 if (i + 1 == len) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
156 { msg = "surrogate UTF-16 high value past end of string"; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
157 goto Lerr; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
158 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
159 u2 = s[i + 1]; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
160 if (u2 < 0xDC00 || u2 > 0xDFFF) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
161 { msg = "surrogate UTF-16 low value out of range"; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
162 goto Lerr; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
163 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
164 u = ((u - 0xD7C0) << 10) + (u2 - 0xDC00); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
165 i += 2; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
166 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
167 else if (u >= 0xDC00 && u <= 0xDFFF) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
168 { msg = "unpaired surrogate UTF-16 value"; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
169 goto Lerr; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
170 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
171 else if (u == 0xFFFE || u == 0xFFFF) |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
172 { msg = "illegal UTF-16 value"; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
173 goto Lerr; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
174 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
175 else |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
176 i++; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
177 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
178 else |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
179 { |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
180 i++; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
181 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
182 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
183 assert(utf_isValidDchar(u)); |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
184 *pidx = i; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
185 *presult = (dchar_t)u; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
186 return NULL; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
187 |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
188 Lerr: |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
189 *presult = (dchar_t)s[i]; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
190 *pidx = i + 1; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
191 return msg; |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
192 } |
f04dde6e882c
Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff
changeset
|
193 |