view com.ibm.icu/src/com/ibm/icu/mangoicu/UCollator.d @ 120:536e43f63c81

Comprehensive update for Win32/Linux32 dmd-2.053/dmd-1.068+Tango-r5661

===D2===
* added [Try]Immutable/Const/Shared templates to work with differences in D1/D2 instead of version statements
  used these templates to work with strict type storage rules of dmd-2.053
* com.ibm.icu now also compilable with D2, but not tested yet
* small fixes
  Snippet288 - shared data is in TLS

===Phobos===
* fixed critical bugs in Phobos implementation
  completely incorrect segfault prone fromStringz (Linux's port ruthless killer)
  terrible, incorrect StringBuffer implementation (StyledText killer)
* fixed small bugs as well
  Snippet72 - misprint in the snippet
* implemented missing functionality for Phobos
  ByteArrayOutputStream implemented (image loading available)
  formatting correctly works for all DWT's cases

As a result, the following snippets now work with Phobos (Snippet### - what is fixed):
  Snippet24, 42, 111, 115, 130, 235, 276 - bad string formatting
  Snippet48, 282 - crash on image loading
  Snippet163, 189, 211, 213, 217, 218, 222 - crash on copy/cut in StyledText
  Snippet244 - hang-up

===Tango===
* few changes for the latest Tango trunk-r5661
* few small performance improvements

===General===
* implMissing-s for only one version changed to implMissingInTango/InPhobos
* incorrect calls to Format in toString-s fixed
* fixed loading \uXXXX characters in ResourceBundle
* added good UTF-8 support for StyledText, TextLayout (Win32) and friends
  UTF functions revised and tested. It is now in java.nonstandard.*Utf modules
  StyledText and TextLayout (Win32) modules revised for UTF-8 support
* removed small differences in most identical files in *.swt.* folders
  *.swt.internal.image, *.swt.events and *.swt.custom are identical in Win32/Linux32 now
  179 of 576 (~31%) files in *.swt.* folders are fully identical
* Win32: snippets now have right subsystem, pretty icons and native system style controls
* small fixes in snippets
  Snippet44 - it's not Snippet44
  Snippet212 - functions work with different images and offsets arrays
  Win32: Snippet282 - crash on close if the button has an image
  Snippet293 - setGrayed is commented
  and others

Win32: As a result, the following snippets now work
  Snippet68 - color doesn't change
  Snippet163, 189, 211, 213, 217, 218, 222 - UTF-8 issues (see above)
  Snippet193 - no table headers
author Denis Shelomovskij <verylonglogin.reg@gmail.com>
date Sat, 09 Jul 2011 15:50:20 +0300
parents ebefa5c2eab4
children

/*******************************************************************************

        @file UCollator.d

        Copyright (c) 2004 Kris Bell

        This software is provided 'as-is', without any express or implied
        warranty. In no event will the authors be held liable for damages
        of any kind arising from the use of this software.

        Permission is hereby granted to anyone to use this software for any
        purpose, including commercial applications, and to alter it and/or
        redistribute it freely, subject to the following restrictions:

        1. The origin of this software must not be misrepresented; you must
           not claim that you wrote the original software. If you use this
           software in a product, an acknowledgment within documentation of
           said product would be appreciated but is not required.

        2. Altered source versions must be plainly marked as such, and must
           not be misrepresented as being the original software.

        3. This notice may not be removed or altered from any distribution
           of the source.

        4. Derivative works are permitted, but they must carry this notice
           in full and credit the original source.


                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


        @version        Initial version, November 2004
        @author         Kris

        Note that this package and documentation is built around the ICU
        project (http://oss.software.ibm.com/icu/). Below is the license
        statement as specified by that software:


                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


        ICU License - ICU 1.8.1 and later

        COPYRIGHT AND PERMISSION NOTICE

        Copyright (c) 1995-2003 International Business Machines Corporation and
        others.

        All rights reserved.

        Permission is hereby granted, free of charge, to any person obtaining a
        copy of this software and associated documentation files (the
        "Software"), to deal in the Software without restriction, including
        without limitation the rights to use, copy, modify, merge, publish,
        distribute, and/or sell copies of the Software, and to permit persons
        to whom the Software is furnished to do so, provided that the above
        copyright notice(s) and this permission notice appear in all copies of
        the Software and that both the above copyright notice(s) and this
        permission notice appear in supporting documentation.

        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
        OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
        MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT
        OF THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
        HOLDERS INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL
        INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING
        FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
        NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION
        WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

        Except as contained in this notice, the name of a copyright holder
        shall not be used in advertising or otherwise to promote the sale, use
        or other dealings in this Software without prior written authorization
        of the copyright holder.

        ----------------------------------------------------------------------

        All trademarks and registered trademarks mentioned herein are the
        property of their respective owners.

*******************************************************************************/

module com.ibm.icu.mangoicu.UCollator;

private import  com.ibm.icu.mangoicu.ICU,
                com.ibm.icu.mangoicu.USet,
                com.ibm.icu.mangoicu.ULocale,
                com.ibm.icu.mangoicu.UString;
private import java.lang.util;

/*******************************************************************************

        The API for Collator performs locale-sensitive string comparison.
        You use this service to build searching and sorting routines for
        natural language text. Important: The ICU collation service has been
        reimplemented in order to achieve better performance and UCA compliance.
        For details, see the collation design document.

        For more information about the collation service see the users guide.

        The collation service provides correct sorting orders for most
        locales supported in ICU. If specific data for a locale is not
        available, the order eventually falls back to the UCA sort order.

        Sort ordering may be customized by providing your own set of rules.
        For more on this subject see the Collation customization section of
        the users guide.

        See <A HREF="http://oss.software.ibm.com/icu/apiref/ucol_8h.html">
        this page</A> for full details.

*******************************************************************************/

class UCollator : ICU
{
        package Handle handle;

        enum    Attribute
                {
                FrenchCollation,
                AlternateHandling,
                CaseFirst,
                CaseLevel,
                NormalizationMode,
                DecompositionMode = NormalizationMode,
                strength,
                HiraganaQuaternaryMode,
                NumericCollation,
                AttributeCount
                }

        enum    AttributeValue
                {
                Default = -1,
                Primary = 0,
                Secondary = 1,
                Tertiary = 2,
                DefaultStrength = Tertiary,
                CeStrengthLimit,        // implicitly 3 (Tertiary + 1)
                Quaternary = 3,
                Identical = 15,
                strengthLimit,          // implicitly 16 (Identical + 1)
                Off = 16,
                On = 17,
                Shifted = 20,
                NonIgnorable = 21,
                LowerFirst = 24,
                UpperFirst = 25,
                AttributeValueCount     // implicitly 26 (UpperFirst + 1)
                }

         enum   RuleOption
                {
                TailoringOnly,
                FullRules
                }

         enum   BoundMode
                {
                BoundLower = 0,
                BoundUpper = 1,
                BoundUpperLong = 2,
                BoundValueCount
                }

        typedef AttributeValue Strength;

        /***********************************************************************

                Open a UCollator for comparing strings. The locale specified
                determines the required collation rules. Special values for
                locales can be passed in - if ULocale.Default is passed for
                the locale, the default locale collation rules will be used.
                If ULocale.Root is passed, UCA rules will be used.

        ***********************************************************************/

        this (ULocale locale)
        {
                UErrorCode e;

                handle = ucol_open (toString(locale.name), e);
                testError (e, "failed to open collator");
        }

        /***********************************************************************

                Produce a UCollator instance according to the rules supplied.

                The rules are used to change the default ordering defined in
                the UCA, in a process called tailoring. For the syntax of the
                rules please see the users guide.

        ***********************************************************************/

        this (UStringView rules, AttributeValue mode, Strength strength)
        {
                UErrorCode e;

                handle = ucol_openRules (rules.get.ptr, rules.len, mode, strength, null, e);
                testError (e, "failed to open rules-based collator");
        }

        /***********************************************************************

                Open a collator defined by a short form string. The
                structure and the syntax of the string is defined in
                the "Naming collators" section of the users guide:
                http://oss.software.ibm.com/icu/userguide/Collate_Concepts.html#Naming_Collators
                Attributes are overridden by the subsequent attributes.
                So, for "S2_S3", final strength will be 3. 3066bis
                locale overrides individual locale parts.

                The call to this constructor is equivalent to a plain
                constructor, followed by a series of calls to setAttribute
                and setVariableTop

        ***********************************************************************/

        this (char[] shortName, bool forceDefaults)
        {
                UErrorCode e;

                handle = ucol_openFromShortString (toString(shortName), forceDefaults, null, e);
                testError (e, "failed to open short-name collator");
        }
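
        /***********************************************************************

                Illustration only, not part of the original module. The
                "S2_S3" rule described for the short-name constructor (a
                later attribute overrides an earlier one) can be sketched
                in D2 as below. The sketch assumes that items are separated
                by '_' and that the first character of each item is its
                key; lastValue is a hypothetical helper that does not call
                ICU and ignores the rest of the short-string syntax.

                ```d
                // Return the value of the last item whose key letter matches
                // `key`, or null when no item carries that key.
                string lastValue(string shortName, char key)
                {
                    string value = null;
                    size_t start = 0;
                    for (size_t i = 0; i <= shortName.length; ++i)
                    {
                        // An item ends at '_' or at the end of the string.
                        if (i == shortName.length || shortName[i] == '_')
                        {
                            auto item = shortName[start .. i];
                            if (item.length > 0 && item[0] == key)
                                value = item[1 .. $];   // later items win
                            start = i + 1;
                        }
                    }
                    return value;
                }

                unittest
                {
                    assert(lastValue("S2_S3", 'S') == "3");
                    assert(lastValue("S2_S3", 'A') is null);
                }
                ```

        ***********************************************************************/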

        /***********************************************************************

                Internal constructor invoked via USearch

        ***********************************************************************/

        package this (Handle handle)
        {
                this.handle = handle;
        }

        /***********************************************************************

                Close a UCollator

        ***********************************************************************/

        ~this ()
        {
                ucol_close (handle);
        }

        /***********************************************************************

                Get a set containing the contractions defined by the
                collator.

                The set includes both the UCA contractions and the
                contractions defined by the collator. This set will
                contain only strings. If a tailoring explicitly
                suppresses contractions from the UCA (like Russian),
                removed contractions will not be in the resulting set.

        ***********************************************************************/

        void getContractions (USet set)
        {
                UErrorCode e;

                ucol_getContractions (handle, set.handle, e);
                testError (e, "failed to get collator contractions");
        }

        /***********************************************************************

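                Compare two strings. The return value is negative if source
                sorts before target, zero if they are equal, and positive
                if source sorts after target.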

        ***********************************************************************/

        int strcoll (UStringView source, UStringView target)
        {
                return ucol_strcoll (handle, source.get.ptr, source.len, target.get.ptr, target.len);
        }

        /***********************************************************************

                Determine if one string is greater than another. This
                function is equivalent to strcoll() > 0

        ***********************************************************************/

        bool greater (UStringView source, UStringView target)
        {
                return ucol_greater (handle, source.get.ptr, source.len, target.get.ptr, target.len) != 0;
        }

        /***********************************************************************

                Determine if one string is greater than or equal to
                another. This function is equivalent to strcoll() >= 0

        ***********************************************************************/

        bool greaterOrEqual (UStringView source, UStringView target)
        {
                return ucol_greaterOrEqual (handle, source.get.ptr, source.len, target.get.ptr, target.len) != 0;
        }

        /***********************************************************************

                Determine if two strings are equal. This function is
                equivalent to strcoll() == 0

        ***********************************************************************/

        bool equal (UStringView source, UStringView target)
        {
                return ucol_equal (handle, source.get.ptr, source.len, target.get.ptr, target.len) != 0;
        }
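
        /***********************************************************************

                Illustration only, not part of the original module. The
                three predicates above are all views of one three-way
                comparison. The D2 sketch below shows that relationship
                with a hypothetical stand-in, compare, that orders by
                UTF-16 code unit. It is NOT locale-aware and does not
                call ICU.

                ```d
                // Stand-in three-way comparison: returns -1, 0 or +1.
                int compare(const(wchar)[] a, const(wchar)[] b)
                {
                    size_t n = a.length < b.length ? a.length : b.length;
                    for (size_t i = 0; i < n; ++i)
                    {
                        if (a[i] != b[i])
                            return a[i] < b[i] ? -1 : 1;
                    }
                    if (a.length == b.length)
                        return 0;
                    return a.length < b.length ? -1 : 1;
                }

                // Each predicate is a test on the sign of the comparison.
                bool greater(const(wchar)[] a, const(wchar)[] b)
                {
                    return compare(a, b) > 0;
                }

                bool greaterOrEqual(const(wchar)[] a, const(wchar)[] b)
                {
                    return compare(a, b) >= 0;
                }

                bool equal(const(wchar)[] a, const(wchar)[] b)
                {
                    return compare(a, b) == 0;
                }

                unittest
                {
                    assert(greater("b"w, "a"w));
                    assert(!greater("a"w, "a"w));
                    assert(greaterOrEqual("a"w, "a"w));
                    assert(equal("abc"w, "abc"w));
                    assert(!equal("abc"w, "abd"w));
                }
                ```

        ***********************************************************************/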

        /***********************************************************************

                Get the collation strength used in a UCollator. The
                strength influences how strings are compared.

        ***********************************************************************/

        Strength getStrength ()
        {
                return ucol_getStrength (handle);
        }

        /***********************************************************************

                Set the collation strength used in this UCollator. The
                strength influences how strings are compared. It is one of
                Primary, Secondary, Tertiary, Quaternary, Identical, or
                Default

        ***********************************************************************/

        void setStrength (Strength s)
        {
                ucol_setStrength (handle, s);
        }

        /***********************************************************************

                Get the display name for a UCollator. The display name is
                suitable for presentation to a user

        ***********************************************************************/

        void getDisplayName (ULocale obj, ULocale display, UString dst)
        {
                uint fmt (wchar* p, uint len, ref UErrorCode e)
                {
                        // write into the buffer and capacity supplied by dst.format
                        return ucol_getDisplayName (toString(obj.name), toString(display.name), p, len, e);
                }

                dst.format (&fmt, "failed to get collator display name");
        }

        /***********************************************************************

                Returns current rules. Options define whether full rules
                are returned or just the tailoring.

        ***********************************************************************/

        void getRules (UString dst, RuleOption o = RuleOption.FullRules)
        {
                uint fmt (wchar* p, uint len, ref UErrorCode e)
                {
                        // write into the buffer and capacity supplied by dst.format
                        uint needed = ucol_getRulesEx (handle, o, p, len);
                        if (needed > len)
                            e = e.BufferOverflow;
                        return needed;
                }

                dst.format (&fmt, "failed to get collator rules");
        }
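
        /***********************************************************************

                Illustration only, not part of the original module. The
                nested fmt callbacks above receive a buffer pointer and a
                capacity, return the length required, and flag
                BufferOverflow when the capacity is too small. The D2
                sketch below shows the grow-and-retry loop such a callback
                is presumably driven by. This is an assumption, since
                UString.format is not shown in this file. Status, produce
                and fetch are hypothetical names.

                ```d
                enum Status { OK, BufferOverflow }  // stand-in for UErrorCode

                // Hypothetical producer: copies `text` into the buffer when it
                // fits and always reports the length required.
                uint produce(const(wchar)[] text, wchar* p, uint len, ref Status e)
                {
                    uint needed = cast(uint) text.length;
                    if (needed > len)
                        e = Status.BufferOverflow;
                    else
                        p[0 .. needed] = text[];
                    return needed;
                }

                // Call the producer, and on overflow retry with a buffer of
                // exactly the size it asked for.
                wchar[] fetch(const(wchar)[] text, uint initialCapacity)
                {
                    auto buffer = new wchar[initialCapacity];
                    while (true)
                    {
                        Status e = Status.OK;
                        uint needed = produce(text, buffer.ptr,
                                              cast(uint) buffer.length, e);
                        if (e != Status.BufferOverflow)
                            return buffer[0 .. needed];
                        buffer = new wchar[needed];
                    }
                }

                unittest
                {
                    assert(fetch("tailoring rules"w, 4) == "tailoring rules"w);
                    assert(fetch("abc"w, 16) == "abc"w);
                }
                ```

        ***********************************************************************/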

        /***********************************************************************

                Get the short definition string for a collator.

                This API harvests the collator's locale and the attribute
                set and produces a string that can be used for opening a
                collator with the same properties using the char[] style
                constructor. This string will be normalized.

                The structure and the syntax of the string is defined in the
                "Naming collators" section of the users guide:
                http://oss.software.ibm.com/icu/userguide/Collate_Concepts.html#Naming_Collators

        ***********************************************************************/

        char[] getShortDefinitionString (ULocale locale = ULocale.Default)
        {
                UErrorCode    e;
                char[64] dst;

                uint len = ucol_getShortDefinitionString (handle, toString(locale.name), dst.ptr, dst.length, e);
                testError (e, "failed to get collator short name");
                return dst[0..len].dup;
        }

        /***********************************************************************

                Verifies and normalizes short definition string. Normalized
                short definition string has all the option sorted by the
                argument name, so that equivalent definition strings are the
                same

        ***********************************************************************/

        char[] normalizeShortDefinitionString (char[] source)
        {
                UErrorCode    e;
                char[64] dst;

                uint len = ucol_normalizeShortDefinitionString (toString(source), dst.ptr, dst.length, null, e);
                testError (e, "failed to normalize collator short name");
                return dst[0..len].dup;
        }

        /***********************************************************************

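                Get a sort key for a string from a UCollator. Sort keys
                may be compared using strcmp. Returns the filled slice of
                result, or null when result is too small (the required
                length must be less than result.length).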

        ***********************************************************************/

        ubyte[] getSortKey (UStringView t, ubyte[] result)
        {
                uint len = ucol_getSortKey (handle, t.get.ptr, t.len, result.ptr, result.length);
                if (len < result.length)
                    return result [0..len];
                return null;
        }
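
        /***********************************************************************

                Illustration only, not part of the original module. Since
                sort keys may be compared using strcmp, ordering two keys
                needs no collator at all. The D2 sketch below compares two
                keys byte by byte and stops at the zero terminator;
                compareKeys is a hypothetical helper and the key bytes in
                the test are made up for illustration.

                ```d
                // Byte-wise three-way comparison in the spirit of strcmp:
                // returns -1, 0 or +1.
                int compareKeys(const(ubyte)[] a, const(ubyte)[] b)
                {
                    size_t n = a.length < b.length ? a.length : b.length;
                    for (size_t i = 0; i < n; ++i)
                    {
                        if (a[i] != b[i])
                            return a[i] < b[i] ? -1 : 1;
                        if (a[i] == 0)      // both keys end at the terminator
                            return 0;
                    }
                    if (a.length == b.length)
                        return 0;
                    return a.length < b.length ? -1 : 1;
                }

                unittest
                {
                    ubyte[] first  = [0x19, 0x1B, 0x1D, 0x01, 0x05, 0x00];
                    ubyte[] second = [0x1F, 0x21, 0x23, 0x01, 0x05, 0x00];
                    assert(compareKeys(first, second) < 0);
                    assert(compareKeys(second, first) > 0);
                    assert(compareKeys(first, first) == 0);
                }
                ```

        ***********************************************************************/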

        /***********************************************************************

                Merge two sort keys. The levels are merged with their
                corresponding counterparts (primaries with primaries,
                secondaries with secondaries etc.). Between the values
                from the same level a separator is inserted. Example
                (uncompressed): 191B1D 01 050505 01 910505 00 and
                1F2123 01 050505 01 910505 00 will be merged as
                191B1D 02 1F2123 01 050505 02 050505 01 910505 02 910505 00
                This allows for concatenating of first and last names for
                sorting, among other things. If the destination buffer is
                not big enough, the results are undefined. If any of source
                lengths are zero or any of source pointers are null/undefined,
                result is of size zero.

        ***********************************************************************/

        ubyte[] mergeSortkeys (ubyte[] left, ubyte[] right, ubyte[] result)
        {
                uint len = ucol_mergeSortkeys (left.ptr, left.length, right.ptr, right.length, result.ptr, result.length);
                if (len < result.length)
                    return result [0..len];
                return null;
        }
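
        /***********************************************************************

                Illustration only, not part of the original module. The
                level-wise merge described for mergeSortkeys can be
                reproduced on uncompressed keys with the D2 sketch below.
                It assumes 0x01 separates levels, 0x02 separates the two
                merged values within a level, and 0x00 terminates a key, as
                in the example given there. splitLevels and mergeKeys are
                hypothetical helpers, not ICU's implementation.

                ```d
                enum ubyte terminator     = 0x00;
                enum ubyte levelSeparator = 0x01;
                enum ubyte mergeSeparator = 0x02;

                // Split a sort key into its levels (primary, secondary, ...).
                ubyte[][] splitLevels(ubyte[] key)
                {
                    ubyte[][] levels;
                    size_t start = 0;
                    for (size_t i = 0; i < key.length; ++i)
                    {
                        if (key[i] == terminator)
                        {
                            levels ~= key[start .. i];
                            return levels;
                        }
                        if (key[i] == levelSeparator)
                        {
                            levels ~= key[start .. i];
                            start = i + 1;
                        }
                    }
                    levels ~= key[start .. $];  // tolerate a missing terminator
                    return levels;
                }

                // Merge level by level: left, 0x02, right, then 0x01 between
                // levels and a final 0x00.
                ubyte[] mergeKeys(ubyte[] left, ubyte[] right)
                {
                    auto l = splitLevels(left);
                    auto r = splitLevels(right);
                    size_t n = l.length > r.length ? l.length : r.length;
                    ubyte[] result;
                    for (size_t i = 0; i < n; ++i)
                    {
                        if (i > 0)
                            result ~= levelSeparator;
                        if (i < l.length)
                            result ~= l[i];
                        result ~= mergeSeparator;
                        if (i < r.length)
                            result ~= r[i];
                    }
                    result ~= terminator;
                    return result;
                }

                unittest
                {
                    ubyte[] left  = [0x19, 0x1B, 0x1D, 0x01, 0x05, 0x05, 0x05,
                                     0x01, 0x91, 0x05, 0x05, 0x00];
                    ubyte[] right = [0x1F, 0x21, 0x23, 0x01, 0x05, 0x05, 0x05,
                                     0x01, 0x91, 0x05, 0x05, 0x00];
                    ubyte[] expected =
                        [0x19, 0x1B, 0x1D, 0x02, 0x1F, 0x21, 0x23, 0x01,
                         0x05, 0x05, 0x05, 0x02, 0x05, 0x05, 0x05, 0x01,
                         0x91, 0x05, 0x05, 0x02, 0x91, 0x05, 0x05, 0x00];
                    assert(mergeKeys(left, right) == expected);
                }
                ```

        ***********************************************************************/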

        /***********************************************************************

                Produce a bound for a given sortkey and a number of levels.

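                The underlying ucol_getBound call always reports the number
                of bytes needed, regardless of whether the result buffer was
                big enough or even valid. This wrapper returns the filled
                slice of result, or null if that number is not less than
                result.length.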

                Resulting bounds can be used to produce a range of strings
                that are between upper and lower bounds. For example, if
                bounds are produced for a sortkey of string "smith", strings
                between upper and lower bounds with one level would include
                "Smith", "SMITH", "sMiTh".

                There are two upper bounds that can be produced. If BoundUpper
                is produced, strings matched would be as above. However, if
                bound produced using BoundUpperLong is used, the above example
                will also match "Smithsonian" and similar.

        ***********************************************************************/

        ubyte[] getBound (BoundMode mode, ubyte[] source, ubyte[] result, uint levels = 1)
        {
                UErrorCode e;

                uint len = ucol_getBound (source.ptr, source.length, mode, levels, result.ptr, result.length, e);
                testError (e, "failed to get sortkey bound");
                if (len < result.length)
                    return result [0..len];
                return null;
        }

        /***********************************************************************

                Gets the version information for a Collator.

                Version is currently an opaque 32-bit number which depends,
                among other things, on major versions of the collator
                tailoring and UCA

        ***********************************************************************/

        void getVersion (ref Version v)
        {
                ucol_getVersion (handle, v);
        }

        /***********************************************************************

                Gets the UCA version information for this Collator

        ***********************************************************************/

        void getUCAVersion (ref Version v)
        {
                ucol_getUCAVersion (handle, v);
        }

        /***********************************************************************

                Universal attribute setter

        ***********************************************************************/

        void setAttribute (Attribute attr, AttributeValue value)
        {
                UErrorCode e;

                ucol_setAttribute (handle, attr, value, e);
                testError (e, "failed to set collator attribute");
        }

        /***********************************************************************

                Universal attribute getter

        ***********************************************************************/

        AttributeValue getAttribute (Attribute attr)
        {
                UErrorCode e;

                AttributeValue v = ucol_getAttribute (handle, attr, e);
                testError (e, "failed to get collator attribute");
                return v;
        }

        /***********************************************************************

                Variable top is a two byte primary value which causes all
                the codepoints with primary values that are less than or
                equal to the variable top to be shifted when alternate
                handling is set to Shifted.

        ***********************************************************************/

        void setVariableTop (UStringView t)
        {
                UErrorCode e;

                ucol_setVariableTop (handle, t.get.ptr, t.len, e);
                testError (e, "failed to set variable-top");
        }

        /***********************************************************************

                Sets the variable top to the collation element value
                supplied. Variable top is set to the upper 16 bits.
                Lower 16 bits are ignored.

        ***********************************************************************/

        void setVariableTop (uint x)
        {
                UErrorCode e;

                ucol_restoreVariableTop (handle, x, e);
                testError (e, "failed to restore variable-top");
        }

        /***********************************************************************

                Gets the variable top value of this Collator. Lower 16 bits
                are undefined and should be ignored.

        ***********************************************************************/

        uint getVariableTop ()
        {
                UErrorCode e;

                uint x = ucol_getVariableTop (handle, e);
                testError (e, "failed to get variable-top");
                return x;
        }

        /***********************************************************************

                Gets the locale name of the collator. If the collator is
                instantiated from the rules, then this function will throw
                an exception

        ***********************************************************************/

        void getLocale (ULocale locale, ULocale.Type type)
        {
                UErrorCode e;

                locale.name = cast(String) toArray (ucol_getLocaleByType (handle, type, e));
                if (isError(e) || locale.name is null)
                    exception ("failed to get collator locale");
        }

        /***********************************************************************

                Get the Unicode set that contains all the characters and
                sequences tailored in this collator.

        ***********************************************************************/

        USet getTailoredSet ()
        {
                UErrorCode e;

                Handle h = ucol_getTailoredSet (handle, e);
                testError (e, "failed to get tailored set");
                return new USet (h);
        }


        /***********************************************************************

                Bind the ICU functions from a shared library. This is
                complicated by the issues regarding D and DLLs on the
                Windows platform

        ***********************************************************************/

        mixin(genICUNative!("in"
                ,"void            function (Handle)", "ucol_close"
                ,"Handle          function (char *loc, ref UErrorCode e)", "ucol_open"
                ,"Handle          function (wchar* rules, uint rulesLength, AttributeValue normalizationMode, Strength strength, UParseError *parseError, ref UErrorCode e)", "ucol_openRules"
                ,"Handle          function (char *definition, byte forceDefaults, UParseError *parseError, ref UErrorCode e)", "ucol_openFromShortString"
                ,"uint            function (Handle, Handle conts, ref UErrorCode e)", "ucol_getContractions"
                ,"int             function (Handle, wchar* source, uint sourceLength, wchar* target, uint targetLength)", "ucol_strcoll"
                ,"byte            function (Handle, wchar* source, uint sourceLength, wchar* target, uint targetLength)", "ucol_greater"
                ,"byte            function (Handle, wchar* source, uint sourceLength, wchar* target, uint targetLength)", "ucol_greaterOrEqual"
                ,"byte            function (Handle, wchar* source, uint sourceLength, wchar* target, uint targetLength)", "ucol_equal"
                ,"Strength        function (Handle)", "ucol_getStrength"
                ,"void            function (Handle, Strength strength)", "ucol_setStrength"
                ,"uint            function (char *objLoc, char *dispLoc, wchar* result, uint resultLength, ref UErrorCode e)", "ucol_getDisplayName"
                ,"uint            function (Handle, char *locale, char *buffer, uint capacity, ref UErrorCode e)", "ucol_getShortDefinitionString"
                ,"uint            function (char *source, char *destination, uint capacity, UParseError *parseError, ref UErrorCode e)", "ucol_normalizeShortDefinitionString"
                ,"uint            function (Handle, wchar* source, uint sourceLength, ubyte *result, uint resultLength)", "ucol_getSortKey"
                ,"uint            function (ubyte *source, uint sourceLength, BoundMode boundType, uint noOfLevels, ubyte *result, uint resultLength, ref UErrorCode e)", "ucol_getBound"
                ,"void            function (Handle, Version info)", "ucol_getVersion"
                ,"void            function (Handle, Version info)", "ucol_getUCAVersion"
                ,"uint            function (ubyte *src1, uint src1Length, ubyte *src2, uint src2Length, ubyte *dest, uint destCapacity)", "ucol_mergeSortkeys"
                ,"void            function (Handle, Attribute attr, AttributeValue value, ref UErrorCode e)", "ucol_setAttribute"
                ,"AttributeValue  function (Handle, Attribute attr, ref UErrorCode e)", "ucol_getAttribute"
                ,"uint            function (Handle, wchar* varTop, uint len, ref UErrorCode e)", "ucol_setVariableTop"
                ,"uint            function (Handle, ref UErrorCode e)", "ucol_getVariableTop"
                ,"void            function (Handle, uint varTop, ref UErrorCode e)", "ucol_restoreVariableTop"
                ,"uint            function (Handle, RuleOption delta, wchar* buffer, uint bufferLen)", "ucol_getRulesEx"
                ,"char*           function (Handle, ULocale.Type type, ref UErrorCode e)", "ucol_getLocaleByType"
                ,"Handle          function (Handle, ref UErrorCode e)", "ucol_getTailoredSet"
        ));
}