annotate dmd2/utf.c @ 1083:c1e9f612e2e2

Fix for dual operand form of fistp, also make reg ST(0) explicit and fix lindquists previous code that allowed dual operand form of fstp but dissallowed the single operand form accidently
author Kelly Wilson <wilsonk cpsc.ucalgary.ca>
date Tue, 10 Mar 2009 06:23:26 -0600
parents f04dde6e882c
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
758
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
1 // utf.c
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
2 // Copyright (c) 2003 by Digital Mars
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
3 // All Rights Reserved
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
4 // written by Walter Bright
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
5 // http://www.digitalmars.com
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
6 // License for redistribution is by either the Artistic License
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
7 // in artistic.txt, or the GNU General Public License in gnu.txt.
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
8 // See the included readme.txt for details.
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
9
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
10 // Description of UTF-8 at:
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
11 // http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
12
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
13 #include <stdio.h>
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
14 #include <assert.h>
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
15
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
16 #include "utf.h"
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
17
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
18 int utf_isValidDchar(dchar_t c)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
19 {
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
20 return c < 0xD800 ||
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
21 (c > 0xDFFF && c <= 0x10FFFF && c != 0xFFFE && c != 0xFFFF);
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
22 }
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
23
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
24 /********************************************
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
25 * Decode a single UTF-8 character sequence.
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
26 * Returns:
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
27 * NULL success
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
28 * !=NULL error message string
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
29 */
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
30
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
31 const char *utf_decodeChar(unsigned char *s, size_t len, size_t *pidx, dchar_t *presult)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
32 {
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
33 dchar_t V;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
34 size_t i = *pidx;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
35 unsigned char u = s[i];
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
36
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
37 assert(i >= 0 && i < len);
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
38
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
39 if (u & 0x80)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
40 { unsigned n;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
41 unsigned char u2;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
42
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
43 /* The following encodings are valid, except for the 5 and 6 byte
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
44 * combinations:
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
45 * 0xxxxxxx
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
46 * 110xxxxx 10xxxxxx
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
47 * 1110xxxx 10xxxxxx 10xxxxxx
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
48 * 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
49 * 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
50 * 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
51 */
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
52 for (n = 1; ; n++)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
53 {
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
54 if (n > 4)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
55 goto Lerr; // only do the first 4 of 6 encodings
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
56 if (((u << n) & 0x80) == 0)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
57 {
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
58 if (n == 1)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
59 goto Lerr;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
60 break;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
61 }
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
62 }
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
63
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
64 // Pick off (7 - n) significant bits of B from first byte of octet
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
65 V = (dchar_t)(u & ((1 << (7 - n)) - 1));
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
66
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
67 if (i + (n - 1) >= len)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
68 goto Lerr; // off end of string
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
69
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
70 /* The following combinations are overlong, and illegal:
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
71 * 1100000x (10xxxxxx)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
72 * 11100000 100xxxxx (10xxxxxx)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
73 * 11110000 1000xxxx (10xxxxxx 10xxxxxx)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
74 * 11111000 10000xxx (10xxxxxx 10xxxxxx 10xxxxxx)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
75 * 11111100 100000xx (10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
76 */
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
77 u2 = s[i + 1];
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
78 if ((u & 0xFE) == 0xC0 ||
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
79 (u == 0xE0 && (u2 & 0xE0) == 0x80) ||
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
80 (u == 0xF0 && (u2 & 0xF0) == 0x80) ||
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
81 (u == 0xF8 && (u2 & 0xF8) == 0x80) ||
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
82 (u == 0xFC && (u2 & 0xFC) == 0x80))
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
83 goto Lerr; // overlong combination
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
84
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
85 for (unsigned j = 1; j != n; j++)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
86 {
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
87 u = s[i + j];
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
88 if ((u & 0xC0) != 0x80)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
89 goto Lerr; // trailing bytes are 10xxxxxx
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
90 V = (V << 6) | (u & 0x3F);
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
91 }
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
92 if (!utf_isValidDchar(V))
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
93 goto Lerr;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
94 i += n;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
95 }
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
96 else
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
97 {
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
98 V = (dchar_t) u;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
99 i++;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
100 }
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
101
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
102 assert(utf_isValidDchar(V));
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
103 *pidx = i;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
104 *presult = V;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
105 return NULL;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
106
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
107 Lerr:
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
108 *presult = (dchar_t) s[i];
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
109 *pidx = i + 1;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
110 return "invalid UTF-8 sequence";
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
111 }
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
112
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
113 /***************************************************
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
114 * Validate a UTF-8 string.
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
115 * Returns:
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
116 * NULL success
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
117 * !=NULL error message string
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
118 */
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
119
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
120 const char *utf_validateString(unsigned char *s, size_t len)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
121 {
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
122 size_t idx;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
123 const char *err = NULL;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
124 dchar_t dc;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
125
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
126 for (idx = 0; idx < len; )
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
127 {
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
128 err = utf_decodeChar(s, len, &idx, &dc);
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
129 if (err)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
130 break;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
131 }
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
132 return err;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
133 }
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
134
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
135
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
136 /********************************************
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
137 * Decode a single UTF-16 character sequence.
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
138 * Returns:
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
139 * NULL success
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
140 * !=NULL error message string
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
141 */
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
142
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
143
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
144 const char *utf_decodeWchar(unsigned short *s, size_t len, size_t *pidx, dchar_t *presult)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
145 {
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
146 const char *msg;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
147 size_t i = *pidx;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
148 unsigned u = s[i];
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
149
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
150 assert(i >= 0 && i < len);
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
151 if (u & ~0x7F)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
152 { if (u >= 0xD800 && u <= 0xDBFF)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
153 { unsigned u2;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
154
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
155 if (i + 1 == len)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
156 { msg = "surrogate UTF-16 high value past end of string";
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
157 goto Lerr;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
158 }
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
159 u2 = s[i + 1];
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
160 if (u2 < 0xDC00 || u2 > 0xDFFF)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
161 { msg = "surrogate UTF-16 low value out of range";
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
162 goto Lerr;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
163 }
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
164 u = ((u - 0xD7C0) << 10) + (u2 - 0xDC00);
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
165 i += 2;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
166 }
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
167 else if (u >= 0xDC00 && u <= 0xDFFF)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
168 { msg = "unpaired surrogate UTF-16 value";
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
169 goto Lerr;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
170 }
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
171 else if (u == 0xFFFE || u == 0xFFFF)
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
172 { msg = "illegal UTF-16 value";
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
173 goto Lerr;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
174 }
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
175 else
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
176 i++;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
177 }
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
178 else
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
179 {
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
180 i++;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
181 }
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
182
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
183 assert(utf_isValidDchar(u));
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
184 *pidx = i;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
185 *presult = (dchar_t)u;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
186 return NULL;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
187
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
188 Lerr:
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
189 *presult = (dchar_t)s[i];
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
190 *pidx = i + 1;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
191 return msg;
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
192 }
f04dde6e882c Added initial D2 support, D2 frontend and changes to codegen to make things compile.
Tomas Lindquist Olsen <tomas.l.olsen@gmail.com>
parents:
diff changeset
193