comparison com.ibm.icu/src/com/ibm/icu/mangoicu/UChar.d @ 92:ebefa5c2eab4

moving ICU bindings to com.ibm.icu
author Frank Benoit <benoit@tionex.de>
date Sun, 19 Apr 2009 13:49:38 +0200
parents base/src/java/mangoicu/UChar.d@1bf55a6eb092
children 536e43f63c81
91:2755ef2c8ef8 92:ebefa5c2eab4
/*******************************************************************************

        @file UChar.d

        Copyright (c) 2004 Kris Bell

        This software is provided 'as-is', without any express or implied
        warranty. In no event will the authors be held liable for damages
        of any kind arising from the use of this software.

        Permission is hereby granted to anyone to use this software for any
        purpose, including commercial applications, and to alter it and/or
        redistribute it freely, subject to the following restrictions:

        1. The origin of this software must not be misrepresented; you must
           not claim that you wrote the original software. If you use this
           software in a product, an acknowledgment within documentation of
           said product would be appreciated but is not required.

        2. Altered source versions must be plainly marked as such, and must
           not be misrepresented as being the original software.

        3. This notice may not be removed or altered from any distribution
           of the source.

        4. Derivative works are permitted, but they must carry this notice
           in full and credit the original source.


                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


        @version        Initial version, October 2004
        @author         Kris


        Note that this package and documentation is built around the ICU
        project (http://oss.software.ibm.com/icu/). Below is the license
        statement as specified by that software:


                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


        ICU License - ICU 1.8.1 and later

        COPYRIGHT AND PERMISSION NOTICE

        Copyright (c) 1995-2003 International Business Machines Corporation and
        others.

        All rights reserved.

        Permission is hereby granted, free of charge, to any person obtaining a
        copy of this software and associated documentation files (the
        "Software"), to deal in the Software without restriction, including
        without limitation the rights to use, copy, modify, merge, publish,
        distribute, and/or sell copies of the Software, and to permit persons
        to whom the Software is furnished to do so, provided that the above
        copyright notice(s) and this permission notice appear in all copies of
        the Software and that both the above copyright notice(s) and this
        permission notice appear in supporting documentation.

        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
        OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
        MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT
        OF THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
        HOLDERS INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL
        INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING
        FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
        NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION
        WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

        Except as contained in this notice, the name of a copyright holder
        shall not be used in advertising or otherwise to promote the sale, use
        or other dealings in this Software without prior written authorization
        of the copyright holder.

        ----------------------------------------------------------------------

        All trademarks and registered trademarks mentioned herein are the
        property of their respective owners.

*******************************************************************************/

module com.ibm.icu.mangoicu.UChar;

private import com.ibm.icu.mangoicu.ICU;

/*******************************************************************************

        This API provides low-level access to the Unicode Character
        Database. In addition to raw property values, some convenience
        functions calculate derived properties, for example for Java-style
        programming.

        Unicode assigns values for many properties to each code point,
        not just to each assigned character. Most of these are simple
        boolean flags or constants from a small enumerated list; for some
        properties, the values are strings or other more complex types.

        For more information see "About the Unicode Character Database"
        (http://www.unicode.org/ucd/) and the ICU User Guide chapter on
        Properties (http://oss.software.ibm.com/icu/userguide/properties.html).

        Many functions are designed to match java.lang.Character functions.
        See the individual function documentation, and see the JDK 1.4.1
        java.lang.Character documentation at
        http://java.sun.com/j2se/1.4.1/docs/api/java/lang/Character.html

        There are also functions that provide easy migration from C/POSIX
        functions like isblank(). Their use is generally discouraged because
        the C/POSIX standards do not define their semantics beyond the ASCII
        range, which means that different implementations exhibit very different
        behavior. Instead, Unicode properties should be used directly.
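
        Example (an illustrative sketch, not part of the original
        documentation; it assumes the ICU library has been loaded so
        that the bound function pointers are valid). U+2160 ROMAN
        NUMERAL ONE is a letter number (general category Nl). The
        Unicode Alphabetic property includes it, but isAlpha() does not,
        because isAlpha() tests only the letter categories:
        ---
        dchar c = '\u2160';
        assert (UChar.isUAlphabetic (c));    // Alphabetic property
        assert (! UChar.isAlpha (c));        // general category L* only
        ---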

        There are also only a few, broad C/POSIX character classes, and they
        tend to be used for conflicting purposes. For example, the "isalpha()"
        class is sometimes used to determine word boundaries, while a more
        sophisticated approach would at least distinguish initial letters from
        continuation characters (the latter including combining marks). (In
        ICU, BreakIterator is the most sophisticated API for word boundaries.)
        Another example: There is no "istitle()" class for titlecase characters.

        A summary of the behavior of some C/POSIX character classification
        implementations for Unicode is available at
        http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/posix_classes.html

        See <A HREF="http://oss.software.ibm.com/icu/apiref/uchar_8h.html">
        this page</A> for full details.

*******************************************************************************/

class UChar : ICU
{
        public enum Property
        {
                Alphabetic = 0,
                BinaryStart = Alphabetic,
                AsciiHexDigit,
                BidiControl,
                BidiMirrored,
                Dash,
                DefaultIgnorableCodePoint,
                Deprecated,
                Diacritic,
                Extender,
                FullCompositionExclusion,
                GraphemeBase,
                GraphemeExtend,
                GraphemeLink,
                HexDigit,
                Hyphen,
                IdContinue,
                IdStart,
                Ideographic,
                IdsBinaryOperator,
                IdsTrinaryOperator,
                JoinControl,
                LogicalOrderException,
                Lowercase,
                Math,
                NoncharacterCodePoint,
                QuotationMark,
                Radical,
                SoftDotted,
                TerminalPunctuation,
                UnifiedIdeograph,
                Uppercase,
                WhiteSpace,
                XidContinue,
                XidStart,
                CaseSensitive,
                STerm,
                VariationSelector,
                NfdInert,
                NfkdInert,
                NfcInert,
                NfkcInert,
                SegmentStarter,
                BinaryLimit,
                BidiClass = 0x1000,
                IntStart = BidiClass,
                Block,
                CanonicalCombiningClass,
                DecompositionType,
                EastAsianWidth,
                GeneralCategory,
                JoiningGroup,
                JoiningType,
                LineBreak,
                NumericType,
                Script,
                HangulSyllableType,
                NfdQuickCheck,
                NfkdQuickCheck,
                NfcQuickCheck,
                NfkcQuickCheck,
                LeadCanonicalCombiningClass,
                TrailCanonicalCombiningClass,
                IntLimit,
                GeneralCategoryMask = 0x2000,
                MaskStart = GeneralCategoryMask,
                MaskLimit,
                NumericValue = 0x3000,
                DoubleStart = NumericValue,
                DoubleLimit,
                Age = 0x4000,
                StringStart = Age,
                BidiMirroringGlyph,
                CaseFolding,
                IsoComment,
                LowercaseMapping,
                Name,
                SimpleCaseFolding,
                SimpleLowercaseMapping,
                SimpleTitlecaseMapping,
                SimpleUppercaseMapping,
                TitlecaseMapping,
                Unicode1Name,
                UppercaseMapping,
                StringLimit,
                InvalidCode = -1
        }

        public enum Category
        {
                Unassigned = 0,
                GeneralOtherTypes = 0,
                UppercaseLetter = 1,
                LowercaseLetter = 2,
                TitlecaseLetter = 3,
                ModifierLetter = 4,
                OtherLetter = 5,
                NonSpacingMark = 6,
                EnclosingMark = 7,
                CombiningSpacingMark = 8,
                DecimalDigitNumber = 9,
                LetterNumber = 10,
                OtherNumber = 11,
                SpaceSeparator = 12,
                LineSeparator = 13,
                ParagraphSeparator = 14,
                ControlChar = 15,
                FormatChar = 16,
                PrivateUseChar = 17,
                Surrogate = 18,
                DashPunctuation = 19,
                StartPunctuation = 20,
                EndPunctuation = 21,
                ConnectorPunctuation = 22,
                OtherPunctuation = 23,
                MathSymbol = 24,
                CurrencySymbol = 25,
                ModifierSymbol = 26,
                OtherSymbol = 27,
                InitialPunctuation = 28,
                FinalPunctuation = 29,
                Count
        }

        public enum Direction
        {
                LeftToRight = 0,
                RightToLeft = 1,
                EuropeanNumber = 2,
                EuropeanNumberSeparator = 3,
                EuropeanNumberTerminator = 4,
                ArabicNumber = 5,
                CommonNumberSeparator = 6,
                BlockSeparator = 7,
                SegmentSeparator = 8,
                WhiteSpaceNeutral = 9,
                OtherNeutral = 10,
                LeftToRightEmbedding = 11,
                LeftToRightOverride = 12,
                RightToLeftArabic = 13,
                RightToLeftEmbedding = 14,
                RightToLeftOverride = 15,
                PopDirectionalFormat = 16,
                DirNonSpacingMark = 17,
                BoundaryNeutral = 18,
                Count
        }

        public enum BlockCode
        {
                NoBlock = 0,
                BasicLatin = 1,
                Latin1Supplement = 2,
                LatinExtendedA = 3,
                LatinExtendedB = 4,
                IpaExtensions = 5,
                SpacingModifierLetters = 6,
                CombiningDiacriticalMarks = 7,
                Greek = 8,
                Cyrillic = 9,
                Armenian = 10,
                Hebrew = 11,
                Arabic = 12,
                Syriac = 13,
                Thaana = 14,
                Devanagari = 15,
                Bengali = 16,
                Gurmukhi = 17,
                Gujarati = 18,
                Oriya = 19,
                Tamil = 20,
                Telugu = 21,
                Kannada = 22,
                Malayalam = 23,
                Sinhala = 24,
                Thai = 25,
                Lao = 26,
                Tibetan = 27,
                Myanmar = 28,
                Georgian = 29,
                HangulJamo = 30,
                Ethiopic = 31,
                Cherokee = 32,
                UnifiedCanadianAboriginalSyllabics = 33,
                Ogham = 34,
                Runic = 35,
                Khmer = 36,
                Mongolian = 37,
                LatinExtendedAdditional = 38,
                GreekExtended = 39,
                GeneralPunctuation = 40,
                SuperscriptsAndSubscripts = 41,
                CurrencySymbols = 42,
                CombiningMarksForSymbols = 43,
                LetterlikeSymbols = 44,
                NumberForms = 45,
                Arrows = 46,
                MathematicalOperators = 47,
                MiscellaneousTechnical = 48,
                ControlPictures = 49,
                OpticalCharacterRecognition = 50,
                EnclosedAlphanumerics = 51,
                BoxDrawing = 52,
                BlockElements = 53,
                GeometricShapes = 54,
                MiscellaneousSymbols = 55,
                Dingbats = 56,
                BraillePatterns = 57,
                CjkRadicalsSupplement = 58,
                KangxiRadicals = 59,
                IdeographicDescriptionCharacters = 60,
                CjkSymbolsAndPunctuation = 61,
                Hiragana = 62,
                Katakana = 63,
                Bopomofo = 64,
                HangulCompatibilityJamo = 65,
                Kanbun = 66,
                BopomofoExtended = 67,
                EnclosedCjkLettersAndMonths = 68,
                CjkCompatibility = 69,
                CjkUnifiedIdeographsExtensionA = 70,
                CjkUnifiedIdeographs = 71,
                YiSyllables = 72,
                YiRadicals = 73,
                HangulSyllables = 74,
                HighSurrogates = 75,
                HighPrivateUseSurrogates = 76,
                LowSurrogates = 77,
                PrivateUse = 78,
                PrivateUseArea = PrivateUse,
                CjkCompatibilityIdeographs = 79,
                AlphabeticPresentationForms = 80,
                ArabicPresentationFormsA = 81,
                CombiningHalfMarks = 82,
                CjkCompatibilityForms = 83,
                SmallFormVariants = 84,
                ArabicPresentationFormsB = 85,
                Specials = 86,
                HalfwidthAndFullwidthForms = 87,
                OldItalic = 88,
                Gothic = 89,
                Deseret = 90,
                ByzantineMusicalSymbols = 91,
                MusicalSymbols = 92,
                MathematicalAlphanumericSymbols = 93,
                CjkUnifiedIdeographsExtensionB = 94,
                CjkCompatibilityIdeographsSupplement = 95,
                Tags = 96,
                CyrillicSupplementary = 97,
                CyrillicSupplement = CyrillicSupplementary,
                Tagalog = 98,
                Hanunoo = 99,
                Buhid = 100,
                Tagbanwa = 101,
                MiscellaneousMathematicalSymbolsA = 102,
                SupplementalArrowsA = 103,
                SupplementalArrowsB = 104,
                MiscellaneousMathematicalSymbolsB = 105,
                SupplementalMathematicalOperators = 106,
                KatakanaPhoneticExtensions = 107,
                VariationSelectors = 108,
                SupplementaryPrivateUseAreaA = 109,
                SupplementaryPrivateUseAreaB = 110,
                Limbu = 111,
                TaiLe = 112,
                KhmerSymbols = 113,
                PhoneticExtensions = 114,
                MiscellaneousSymbolsAndArrows = 115,
                YijingHexagramSymbols = 116,
                LinearBSyllabary = 117,
                LinearBIdeograms = 118,
                AegeanNumbers = 119,
                Ugaritic = 120,
                Shavian = 121,
                Osmanya = 122,
                CypriotSyllabary = 123,
                TaiXuanJingSymbols = 124,
                VariationSelectorsSupplement = 125,
                Count,
                InvalidCode = -1
        }

        public enum EastAsianWidth
        {
                Neutral,
                Ambiguous,
                Halfwidth,
                Fullwidth,
                Narrow,
                Wide,
                Count
        }

        public enum CharNameChoice
        {
                Unicode,
                Unicode10,
                Extended,
                Count
        }

        public enum NameChoice
        {
                Short,
                Long,
                Count
        }

        public enum DecompositionType
        {
                None,
                Canonical,
                Compat,
                Circle,
                Final,
                Font,
                Fraction,
                Initial,
                Isolated,
                Medial,
                Narrow,
                Nobreak,
                Small,
                Square,
                Sub,
                Super,
                Vertical,
                Wide,
                Count
        }

        public enum JoiningType
        {
                NonJoining,
                JoinCausing,
                DualJoining,
                LeftJoining,
                RightJoining,
                Transparent,
                Count
        }

        public enum JoiningGroup
        {
                NoJoiningGroup,
                Ain,
                Alaph,
                Alef,
                Beh,
                Beth,
                Dal,
                DalathRish,
                E,
                Feh,
                FinalSemkath,
                Gaf,
                Gamal,
                Hah,
                HamzaOnHehGoal,
                He,
                Heh,
                HehGoal,
                Heth,
                Kaf,
                Kaph,
                KnottedHeh,
                Lam,
                Lamadh,
                Meem,
                Mim,
                Noon,
                Nun,
                Pe,
                Qaf,
                Qaph,
                Reh,
                Reversed_Pe,
                Sad,
                Sadhe,
                Seen,
                Semkath,
                Shin,
                Swash_Kaf,
                Syriac_Waw,
                Tah,
                Taw,
                Teh_Marbuta,
                Teth,
                Waw,
                Yeh,
                Yeh_Barree,
                Yeh_With_Tail,
                Yudh,
                Yudh_He,
                Zain,
                Fe,
                Khaph,
                Zhain,
                Count
        }

        public enum LineBreak
        {
                Unknown,
                Ambiguous,
                Alphabetic,
                BreakBoth,
                BreakAfter,
                BreakBefore,
                MandatoryBreak,
                ContingentBreak,
                ClosePunctuation,
                CombiningMark,
                CarriageReturn,
                Exclamation,
                Glue,
                Hyphen,
                Ideographic,
                Inseperable,
                Inseparable = Inseperable,
                InfixNumeric,
                LineFeed,
                Nonstarter,
                Numeric,
                OpenPunctuation,
                PostfixNumeric,
                PrefixNumeric,
                Quotation,
                ComplexContext,
                Surrogate,
                Space,
                BreakSymbols,
                Zwspace,
                NextLine,
                WordJoiner,
                Count
        }

        public enum NumericType
        {
                None,
                Decimal,
                Digit,
                Numeric,
                Count
        }

        public enum HangulSyllableType
        {
                NotApplicable,
                LeadingJamo,
                VowelJamo,
                TrailingJamo,
                LvSyllable,
                LvtSyllable,
                Count
        }

        /***********************************************************************

                Get the property value for an enumerated or integer
                Unicode property for a code point. Also returns binary
                property values (0 or 1) and mask property values.

                Unicode, especially in version 3.2, defines many more
                properties than the original set in UnicodeData.txt.

                The properties APIs are intended to reflect Unicode
                properties as defined in the Unicode Character Database
                (UCD) and Unicode Technical Reports (UTR). For details
                about the properties see http://www.unicode.org/ . For
                the names of Unicode properties see the UCD file
                PropertyAliases.txt.
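
                Example (an illustrative sketch, not from the original
                documentation; it assumes the ICU library has been loaded
                and bound):
                ---
                auto uc = new UChar;    // these wrappers are not static

                // enumerated property: the result is a Category value
                assert (uc.getProperty ('A', UChar.Property.GeneralCategory)
                        == UChar.Category.UppercaseLetter);

                // binary property: the result is 1 (true) or 0 (false)
                assert (uc.getProperty ('A', UChar.Property.Alphabetic) == 1);
                ---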

        ***********************************************************************/

        uint getProperty (dchar c, Property p)
        {
                return u_getIntPropertyValue (cast(uint) c, cast(uint) p);
        }

        /***********************************************************************

                Get the minimum value for an enumerated/integer/binary
                Unicode property.

        ***********************************************************************/

        uint getPropertyMinimum (Property p)
        {
                return u_getIntPropertyMinValue (p);
        }

        /***********************************************************************

                Get the maximum value for an enumerated/integer/binary
                Unicode property.
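
                Example (an illustrative sketch, not from the original
                documentation; it assumes the ICU library has been loaded
                and bound). A binary property ranges from 0 to 1:
                ---
                auto uc = new UChar;
                assert (uc.getPropertyMinimum (UChar.Property.Alphabetic) == 0);
                assert (uc.getPropertyMaximum (UChar.Property.Alphabetic) == 1);
                ---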

        ***********************************************************************/

        uint getPropertyMaximum (Property p)
        {
                return u_getIntPropertyMaxValue (p);
        }

        /***********************************************************************

                Returns the bidirectional category value for the code
                point, which is used in the Unicode bidirectional algorithm
                (UAX #9 http://www.unicode.org/reports/tr9/).
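
                Example (an illustrative sketch, not from the original
                documentation; it assumes the ICU library has been loaded
                and bound):
                ---
                auto uc = new UChar;
                assert (uc.charDirection ('A') == UChar.Direction.LeftToRight);
                assert (uc.charDirection ('1') == UChar.Direction.EuropeanNumber);
                // U+05D0 HEBREW LETTER ALEF
                assert (uc.charDirection ('\u05D0') == UChar.Direction.RightToLeft);
                ---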

        ***********************************************************************/

        Direction charDirection (dchar c)
        {
                return cast(Direction) u_charDirection (c);
        }

        /***********************************************************************

                Returns the Unicode allocation block that contains the
                character.
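
                Example (an illustrative sketch, not from the original
                documentation; it assumes the ICU library has been loaded
                and bound):
                ---
                auto uc = new UChar;
                assert (uc.getBlockCode ('A') == UChar.BlockCode.BasicLatin);
                // U+03B1 GREEK SMALL LETTER ALPHA
                assert (uc.getBlockCode ('\u03B1') == UChar.BlockCode.Greek);
                ---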
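
        ***********************************************************************/

        BlockCode getBlockCode (dchar c)
        {
                return cast(BlockCode) ublock_getCode (c);
        }

        /***********************************************************************

                Retrieve the name of a Unicode character. The name is
                written into dst, which must be large enough to hold it,
                and the returned value is the slice of dst that holds the
                name. A failure, such as a buffer that is too small, is
                reported through testError().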
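
                Example (an illustrative sketch, not from the original
                documentation; it assumes the ICU library has been loaded
                and bound). The caller owns the buffer:
                ---
                auto uc = new UChar;
                char[] buf = new char[128];
                char[] name = uc.getCharName ('A', UChar.CharNameChoice.Unicode, buf);
                assert (name == "LATIN CAPITAL LETTER A");
                ---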

        ***********************************************************************/

        char[] getCharName (dchar c, CharNameChoice choice, inout char[] dst)
        {
                UErrorCode e;

                uint len = u_charName (c, choice, dst.ptr, dst.length, e);
                testError (e, "failed to extract char name (buffer too small?)");
                return dst [0..len];
        }

        /***********************************************************************

                Get the ISO 10646 comment for a character.

        ***********************************************************************/

        char[] getComment (dchar c, inout char[] dst)
        {
                UErrorCode e;

                uint len = u_getISOComment (c, dst.ptr, dst.length, e);
                testError (e, "failed to extract comment (buffer too small?)");
                return dst [0..len];
        }

        /***********************************************************************

                Find a Unicode character by its name and return its code
                point value.
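
                Example (an illustrative sketch, not from the original
                documentation; it assumes the ICU library has been loaded
                and bound):
                ---
                auto uc = new UChar;
                dchar c = uc.charFromName (UChar.CharNameChoice.Unicode, "LATIN SMALL LETTER A");
                assert (c == 'a');
                ---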

        ***********************************************************************/

        dchar charFromName (CharNameChoice choice, char[] name)
        {
                UErrorCode e;

                dchar c = u_charFromName (choice, toString(name), e);
                testError (e, "failed to locate char name");
                return c;
        }

        /***********************************************************************

                Return the Unicode name for a given property, as given in the
                Unicode database file PropertyAliases.txt.
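
                Example (an illustrative sketch, not from the original
                documentation; it assumes the ICU library has been loaded
                and bound, and that toArray() returns the name without its
                terminating null):
                ---
                auto uc = new UChar;
                auto gc = UChar.Property.GeneralCategory;
                assert (uc.getPropertyName (gc, UChar.NameChoice.Long) == "General_Category");
                assert (uc.getPropertyName (gc, UChar.NameChoice.Short) == "gc");
                ---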

        ***********************************************************************/

        char[] getPropertyName (Property p, NameChoice choice)
        {
                return toArray (u_getPropertyName (p, choice));
        }

        /***********************************************************************

                Return the Unicode name for a given property value, as given
                in the Unicode database file PropertyValueAliases.txt. Note
                that this wrapper takes the value as its last argument,
                whereas the underlying u_getPropertyValueName() takes
                (property, value, choice).
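
                Example (an illustrative sketch, not from the original
                documentation; it assumes the ICU library has been loaded
                and bound, and that toArray() returns the name without its
                terminating null):
                ---
                auto uc = new UChar;
                auto gc = UChar.Property.GeneralCategory;
                auto lu = UChar.Category.UppercaseLetter;
                assert (uc.getPropertyValueName (gc, UChar.NameChoice.Short, lu) == "Lu");
                assert (uc.getPropertyValueName (gc, UChar.NameChoice.Long, lu) == "Uppercase_Letter");
                ---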

        ***********************************************************************/

        char[] getPropertyValueName (Property p, NameChoice choice, uint value)
        {
                return toArray (u_getPropertyValueName (p, value, choice));
        }

        /***********************************************************************

                Get the version of the Unicode standard that the loaded
                ICU library implements.

        ***********************************************************************/

        void getUnicodeVersion (inout Version v)
        {
                u_getUnicodeVersion (v);
        }

        /***********************************************************************

                Get the "age" of the code point: the Unicode version in
                which it was first designated (as a noncharacter or for
                private use) or assigned a character.

        ***********************************************************************/

        void getCharAge (dchar c, inout Version v)
        {
                u_charAge (c, v);
        }

        /***********************************************************************

                These are exposed directly to the client (without a
                wrapper). This may have to change for Linux, depending
                on how ICU names the functions it exports from the POSIX
                shared libraries (the exported names may carry a version
                suffix).

        ***********************************************************************/

        static extern (C)
        {
                /***************************************************************

                        Check if a code point has the Alphabetic Unicode
                        property.

                ***************************************************************/

                bool function (dchar c) isUAlphabetic;

                /***************************************************************

                        Check if a code point has the Lowercase Unicode
                        property.

                ***************************************************************/

                bool function (dchar c) isULowercase;

                /***************************************************************

                        Check if a code point has the Uppercase Unicode
                        property.

                ***************************************************************/

                bool function (dchar c) isUUppercase;

                /***************************************************************

                        Check if a code point has the White_Space Unicode
                        property.

                ***************************************************************/

                bool function (dchar c) isUWhiteSpace;

                /***************************************************************

                        Determines whether the specified code point has the
                        general category "Ll" (lowercase letter).

                ***************************************************************/

                bool function (dchar c) isLower;

                /***************************************************************

                        Determines whether the specified code point has the
                        general category "Lu" (uppercase letter).

                ***************************************************************/

                bool function (dchar c) isUpper;

                /***************************************************************

                        Determines whether the specified code point is a
                        titlecase letter.

                ***************************************************************/

                bool function (dchar c) isTitle;

                /***************************************************************

                        Determines whether the specified code point is a
                        digit character according to Java.

                ***************************************************************/

                bool function (dchar c) isDigit;

                /***************************************************************

                        Determines whether the specified code point is a
                        letter character.

                ***************************************************************/

                bool function (dchar c) isAlpha;

                /***************************************************************

                        Determines whether the specified code point is an
                        alphanumeric character (letter or digit) according
                        to Java.

                ***************************************************************/

                bool function (dchar c) isAlphaNumeric;

                /***************************************************************

                        Determines whether the specified code point is a
                        hexadecimal digit.

                ***************************************************************/

                bool function (dchar c) isHexDigit;

                /***************************************************************

                        Determines whether the specified code point is a
                        punctuation character.

                ***************************************************************/

                bool function (dchar c) isPunct;

                /***************************************************************

                        Determines whether the specified code point is a
                        "graphic" character (printable, excluding spaces).

                ***************************************************************/

                bool function (dchar c) isGraph;

                /***************************************************************

                        Determines whether the specified code point is a
                        "blank" or "horizontal space", a character that
                        visibly separates words on a line.

                ***************************************************************/

                bool function (dchar c) isBlank;

                /***************************************************************

                        Determines whether the specified code point is
                        "defined", which usually means that it is assigned
                        a character.

                ***************************************************************/

                bool function (dchar c) isDefined;

                /***************************************************************

                        Determines if the specified character is a space
                        character or not.

                ***************************************************************/

                bool function (dchar c) isSpace;

                /***************************************************************

                        Determine if the specified code point is a space
                        character according to Java.

                ***************************************************************/

                bool function (dchar c) isJavaSpaceChar;

                /***************************************************************

                        Determines if the specified code point is a whitespace
                        character according to Java/ICU.
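
                        Example (an illustrative sketch, not from the
                        original documentation; it assumes the ICU library
                        has been loaded and bound). U+00A0 NO-BREAK SPACE
                        has the Unicode White_Space property, but this
                        Java-style test excludes non-breaking spaces:
                        ---
                        assert (UChar.isUWhiteSpace ('\u00A0'));
                        assert (! UChar.isWhiteSpace ('\u00A0'));
                        assert (UChar.isWhiteSpace ('\t'));
                        ---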

                ***************************************************************/

                bool function (dchar c) isWhiteSpace;

                /***************************************************************

                        Determines whether the specified code point is a
                        control character as defined by ICU: general
                        category Cc (control), Cf (format), Zl (line
                        separator) or Zp (paragraph separator).

                ***************************************************************/

                bool function (dchar c) isCtrl;

                /***************************************************************

                        Determines whether the specified code point is an ISO
                        control code.

                ***************************************************************/

                bool function (dchar c) isISOControl;

                /***************************************************************

                        Determines whether the specified code point is a
                        printable character.

                ***************************************************************/

                bool function (dchar c) isPrint;

                /***************************************************************

                        Determines whether the specified code point is a
                        base character.

                ***************************************************************/

                bool function (dchar c) isBase;

                /***************************************************************

                        Determines if the specified character is permissible
                        as the first character in an identifier according to
                        Unicode (The Unicode Standard, Version 3.0, chapter
                        5.16 Identifiers).

                ***************************************************************/

                bool function (dchar c) isIDStart;

                /***************************************************************

                        Determines if the specified character is permissible
                        in an identifier according to Java.

                ***************************************************************/

                bool function (dchar c) isIDPart;

                /***************************************************************

                        Determines if the specified character should be
                        regarded as an ignorable character in an identifier,
                        according to Java.

                ***************************************************************/

                bool function (dchar c) isIDIgnorable;

                /***************************************************************

                        Determines if the specified character is permissible
                        as the first character in a Java identifier.

                ***************************************************************/

                bool function (dchar c) isJavaIDStart;

                /***************************************************************

                        Determines if the specified character is permissible
                        in a Java identifier.

                ***************************************************************/

                bool function (dchar c) isJavaIDPart;

                /***************************************************************

                        Determines whether the code point has the
                        Bidi_Mirrored property.

                ***************************************************************/

                bool function (dchar c) isMirrored;

                /***************************************************************

                        Returns the decimal digit value of a decimal digit
                        character (general category Nd). ICU returns -1 for
                        any other code point. Because this binding declares
                        a ubyte result, treat any value above 9 as "not a
                        decimal digit".

                ***************************************************************/

                ubyte function (dchar c) charDigitValue;

                /***************************************************************

                        Maps the specified character to a "mirror-image"
                        character, or returns the character itself if it
                        has no such mapping.
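
                        Example (an illustrative sketch, not from the
                        original documentation; it assumes the ICU library
                        has been loaded and bound):
                        ---
                        assert (UChar.isMirrored ('('));
                        assert (UChar.charMirror ('(') == ')');
                        assert (UChar.charMirror ('A') == 'A');   // no mirror: unchanged
                        ---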

                ***************************************************************/

                dchar function (dchar c) charMirror;

                /***************************************************************

                        Returns the general category value for the code
                        point. Cast the result to Category to interpret it.

                ***************************************************************/

                ubyte function (dchar c) charType;

                /***************************************************************

                        Returns the combining class of the code point as
                        specified in UnicodeData.txt.

                ***************************************************************/

                ubyte function (dchar c) getCombiningClass;

                /***************************************************************

                        The given character is mapped to its lowercase
                        equivalent according to UnicodeData.txt; if the
                        character has no lowercase equivalent, the
                        character itself is returned.

                ***************************************************************/

                dchar function (dchar c) toLower;

                /***************************************************************

                        The given character is mapped to its uppercase equivalent
                        according to UnicodeData.txt; if the character has no
                        uppercase equivalent, the character itself is returned.

                ***************************************************************/

                dchar function (dchar c) toUpper;

                /***************************************************************

                        The given character is mapped to its titlecase
                        equivalent according to UnicodeData.txt; if none
                        is defined, the character itself is returned.

                ***************************************************************/

                dchar function (dchar c) toTitle;

                /***************************************************************

                        The given character is mapped to its case folding
                        equivalent according to UnicodeData.txt and
                        CaseFolding.txt; if the character has no case folding
                        equivalent, the character itself is returned. The
                        options value selects the mappings: 0
                        (U_FOLD_CASE_DEFAULT) for the default mappings, or 1
                        (U_FOLD_CASE_EXCLUDE_SPECIAL_I) for the Turkic
                        handling of dotted and dotless i.
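
                        Example (an illustrative sketch, not from the
                        original documentation; it assumes the ICU library
                        has been loaded and bound). Only single code point
                        (simple) foldings are available through this call:
                        ---
                        assert (UChar.foldCase ('A', 0) == 'a');
                        // U+00DF folds to "ss" only under full folding,
                        // which needs a string API, so here it is unchanged
                        assert (UChar.foldCase ('\u00DF', 0) == '\u00DF');
                        ---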

                ***************************************************************/

                dchar function (dchar c, uint options) foldCase;

                /***************************************************************

                        Returns the digit value of the code point in the
                        specified radix (2 to 36). The letters A-Z and a-z
                        count as the digits 10 to 35. ICU returns -1 if the
                        radix or the digit is invalid. Because this binding
                        declares a uint result, test for a value >= radix
                        rather than for -1.

                ***************************************************************/

                uint function (dchar ch, ubyte radix) digit;

                /***************************************************************

                        Determines the character representation for a specific
                        digit in the specified radix. Digits 10 and above map
                        to the lowercase letters a-z. If the radix or the digit
                        is invalid, U+0000 is returned.
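
                        Example (an illustrative sketch, not from the
                        original documentation; it assumes the ICU library
                        has been loaded and bound):
                        ---
                        assert (UChar.digit ('f', 16) == 15);
                        assert (UChar.forDigit (15, 16) == 'f');
                        assert (UChar.digit ('g', 16) >= 16);  // ICU's -1, seen as a uint
                        ---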

                ***************************************************************/

                dchar function (uint digit, ubyte radix) forDigit;

                /***************************************************************

                        Get the numeric value for a Unicode code point as
                        defined in the Unicode Character Database. A code
                        point without a numeric value yields ICU's
                        U_NO_NUMERIC_VALUE, which is -123456789.0.
1127
1128 ***************************************************************/
1129
1130 double function (dchar c) getNumericValue;
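
                /*
                        Example (a sketch, assuming the bound pointer is
                        reached as UChar.getNumericValue). Fractions and
                        Roman numerals carry numeric values too:

                        UChar.getNumericValue ('\u00BD');   // 0.5, VULGAR FRACTION ONE HALF
                        UChar.getNumericValue ('\u216B');   // 12.0, ROMAN NUMERAL TWELVE
                        UChar.getNumericValue ('A');        // -123456789.0 (U_NO_NUMERIC_VALUE)
                */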
        }


        /***********************************************************************

                Bind the ICU functions from a shared library at runtime.
                This is complicated by the issues regarding D and DLLs on
                the Windows platform. The library handle is kept so that it
                can be released again when the module is unloaded.

        ***********************************************************************/

        private static void* library;

        /***********************************************************************

                Raw ICU entry points, bound from the same library and used
                internally by the wrapper methods of this class.

        ***********************************************************************/

        private static extern (C)
        {
                uint function (uint, uint) u_getIntPropertyValue;
                uint function (uint) u_getIntPropertyMinValue;
                uint function (uint) u_getIntPropertyMaxValue;
                uint function (dchar) u_charDirection;
                uint function (dchar) ublock_getCode;
                uint function (dchar, uint, char*, uint, inout UErrorCode) u_charName;
                uint function (dchar, char*, uint, inout UErrorCode) u_getISOComment;
                uint function (uint, char*, inout UErrorCode) u_charFromName;
                char* function (uint, uint) u_getPropertyName;
                char* function (uint, uint, uint) u_getPropertyValueName;
                void function (inout Version) u_getUnicodeVersion;
                void function (dchar, inout Version) u_charAge;
        }

        /***********************************************************************

                Maps each function pointer declared in this class to the
                name of the ICU export it is bound to. The table is handed
                to FunctionLoader.bind() by the static constructor below.

        ***********************************************************************/

        static FunctionLoader.Bind[] targets =
        [
                {cast(void**) &forDigit, "u_forDigit"},
                {cast(void**) &digit, "u_digit"},
                {cast(void**) &foldCase, "u_foldCase"},
                {cast(void**) &toTitle, "u_totitle"},
                {cast(void**) &toUpper, "u_toupper"},
                {cast(void**) &toLower, "u_tolower"},
                {cast(void**) &charType, "u_charType"},
                {cast(void**) &charMirror, "u_charMirror"},
                {cast(void**) &charDigitValue, "u_charDigitValue"},
                {cast(void**) &isJavaIDPart, "u_isJavaIDPart"},
                {cast(void**) &isJavaIDStart, "u_isJavaIDStart"},
                {cast(void**) &isIDIgnorable, "u_isIDIgnorable"},
                {cast(void**) &isIDPart, "u_isIDPart"},
                {cast(void**) &isIDStart, "u_isIDStart"},
                {cast(void**) &isMirrored, "u_isMirrored"},
                {cast(void**) &isBase, "u_isbase"},
                {cast(void**) &isPrint, "u_isprint"},
                {cast(void**) &isISOControl, "u_isISOControl"},
                {cast(void**) &isCtrl, "u_iscntrl"},
                {cast(void**) &isWhiteSpace, "u_isWhitespace"},
                {cast(void**) &isJavaSpaceChar, "u_isJavaSpaceChar"},
                {cast(void**) &isSpace, "u_isspace"},
                {cast(void**) &isDefined, "u_isdefined"},
                {cast(void**) &isBlank, "u_isblank"},
                {cast(void**) &isGraph, "u_isgraph"},
                {cast(void**) &isPunct, "u_ispunct"},
                {cast(void**) &isHexDigit, "u_isxdigit"},
                {cast(void**) &isAlpha, "u_isalpha"},
                {cast(void**) &isAlphaNumeric, "u_isalnum"},
                {cast(void**) &isDigit, "u_isdigit"},
                {cast(void**) &isTitle, "u_istitle"},
                {cast(void**) &isUpper, "u_isupper"},
                {cast(void**) &isLower, "u_islower"},
                {cast(void**) &isUAlphabetic, "u_isUAlphabetic"},
                {cast(void**) &isUWhiteSpace, "u_isUWhiteSpace"},
                {cast(void**) &isUUppercase, "u_isUUppercase"},
                {cast(void**) &isULowercase, "u_isULowercase"},
                {cast(void**) &getNumericValue, "u_getNumericValue"},
                {cast(void**) &getCombiningClass, "u_getCombiningClass"},
                {cast(void**) &u_getIntPropertyValue, "u_getIntPropertyValue"},
                {cast(void**) &u_getIntPropertyMinValue,"u_getIntPropertyMinValue"},
                {cast(void**) &u_getIntPropertyMaxValue,"u_getIntPropertyMaxValue"},
                {cast(void**) &u_charDirection, "u_charDirection"},
                {cast(void**) &ublock_getCode, "ublock_getCode"},
                {cast(void**) &u_charName, "u_charName"},
                {cast(void**) &u_getISOComment, "u_getISOComment"},
                {cast(void**) &u_charFromName, "u_charFromName"},
                {cast(void**) &u_getPropertyName, "u_getPropertyName"},
                {cast(void**) &u_getPropertyValueName, "u_getPropertyValueName"},
                {cast(void**) &u_getUnicodeVersion, "u_getUnicodeVersion"},
                {cast(void**) &u_charAge, "u_charAge"},
        ];

        /***********************************************************************

                Load the ICU common library named by icuuc and resolve
                every entry in the targets table when the module is
                initialized.

        ***********************************************************************/

        static this ()
        {
                library = FunctionLoader.bind (icuuc, targets);
        }

        /***********************************************************************

                Release the library handle obtained by the static
                constructor when the module is unloaded.

        ***********************************************************************/

        static ~this ()
        {
                FunctionLoader.unbind (library);
        }
}