diff mde/mergetag/doc/file-format-text.txt @ 0:d547009c104c
Repository creation.
committer: Diggory Hardy <diggory.hardy@gmail.com>
author: Diggory Hardy <diggory.hardy@gmail.com>
date: Sat, 27 Oct 2007 18:05:39 +0100
parents: none
children: 78eb491bd642
This is the file format for mergetag text files.
Version: 0.1 unfinalised


The encoding should be Unicode UTF-8, UTF-16 or UTF-32; anything other than UTF-8 must include a BOM.


Hierarchy:
+ Sections (special section: see header)
++ Data Tags


IDs:
IDs are used for several purposes; they are always stored as a uint number (0-4294967295). They may
be given in the file as a base-10 or hex number or, where a lookup table is provided to the reader,
as a double-quoted string (with no escape sequences).
Multiple section or data tags with the same ID are allowed; see the "Merging rules" section.


Outside of tags, only whitespace and valid tags are allowed. Whitespace is ignored.
The following tags are valid (see below for details):
tag       purpose
{...}     section identifiers
<...>     data items
!{...}    simple comment block
!<...>    comment block parsed the same as <...>
Within tags, type specifications and data items, whitespace is allowed between symbols.


Section identifier tags:
Format: {ID} or {ID|ID}
In the {ID|ID} case, the first ID is the section type and the second ID is the section name.
In the {ID} case, the single ID is the section name; the section type ID has been omitted and the
default type (0) is used.
A section identifier marks the beginning of a new section, extending until the next section
identifier or the end of the file. When a section is read, a new


Data item tags:
Format: <tp|ID=dt>
A data item with type tp, identifier ID and data dt. If the data does not fit the given type, it is
an error and the tag is ignored. Once split into a type string, ID and data string, the contents
are passed to an addTag() function within the DataSection class, which parses tags of a
recognised format and either ignores other tags or prints a warning about them.
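The ID rules and the first step of reading a data tag (splitting it into a type string, ID string and data string before addTag() sees it) can be sketched in Python. This is a minimal illustration under stated assumptions, not mde's implementation: parse_id, split_data_tag and _skip_quoted are hypothetical names, the lookup table is modelled as a plain dict, hex IDs are assumed to use the 0x prefix that hex data elements use, and a > inside a quoted string or char literal is assumed not to end a normal data tag (the text states this explicitly only for commented data tags).

```python
import re

UINT_MAX = 4294967295  # IDs are stored as a uint (0-4294967295)


def parse_id(text, lookup=None):
    """Convert an ID token to a uint (hypothetical helper, not part of mde).

    Accepts base-10, hex (ASSUMED to be written with a 0x prefix) or a
    double-quoted string resolved through an optional lookup dict.
    """
    text = text.strip()
    if len(text) >= 2 and text[0] == text[-1] == '"':
        if lookup is None:
            raise ValueError("string ID given but no lookup table provided")
        value = lookup[text[1:-1]]  # ID strings have no escape sequences
    elif re.fullmatch(r"0[xX][0-9a-fA-F]+", text):
        value = int(text, 16)
    elif re.fullmatch(r"[0-9]+", text):
        value = int(text, 10)
    else:
        raise ValueError(f"invalid ID: {text!r}")
    if not 0 <= value <= UINT_MAX:
        raise ValueError(f"ID out of uint range: {value}")
    return value


def _skip_quoted(text, pos):
    """Return the index just past the '...' or "..." literal starting at pos."""
    quote = text[pos]
    pos += 1
    while pos < len(text):
        if text[pos] == "\\":
            pos += 2  # skip a backslash escape such as \" or \\
        elif text[pos] == quote:
            return pos + 1
        else:
            pos += 1
    raise ValueError("unterminated quoted literal")


def split_data_tag(tag):
    """Split one '<tp|ID=dt>' tag into (type string, ID string, data string)."""
    # The type is a token ending at any of <>|=; only '|' is valid there.
    m = re.match(r"<([^<>|=]*)([<>|=])", tag)
    if m is None or m.group(2) != "|":
        raise ValueError("expected '<type|'")
    tp, pos = m.group(1).strip(), m.end()

    # The ID runs up to '='; a quoted ID may contain anything except '"'.
    id_start = pos
    while pos < len(tag) and tag[pos] != "=":
        pos = tag.index('"', pos + 1) + 1 if tag[pos] == '"' else pos + 1
    if pos == len(tag):
        raise ValueError("missing '=' after ID")
    ident = tag[id_start:pos].strip()

    # The data runs to the closing '>', skipping quoted strings and chars.
    pos += 1
    data_start = pos
    while pos < len(tag) and tag[pos] != ">":
        pos = _skip_quoted(tag, pos) if tag[pos] in "\"'" else pos + 1
    if pos != len(tag) - 1:
        raise ValueError("tag must end with a single closing '>'")
    return tp, ident, tag[data_start:pos].strip()
```

The ID string is deliberately returned unparsed, so a reader can hand it to parse_id together with whatever lookup table it has; type and data parsing are left to the later stage that addTag() represents.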


Data item tags: Type format:
Note:
    The type is not initially parsed; it is read as a token terminated by any of these
    characters: <>|=
    Any terminating character other than | is an error.
Format:
    tp              a basic type
    tp[]            a dynamic list of sub-type tp
Possible future additions:
    tp()            a dynamic merging list of sub-type tp (only valid as the primary type, i.e.
                    <subtype()|...>, not as a sub-type of a tuple or another dynamic list)
    {t1,t2,...,tn}  a tuple with sub-types t1, t2, ..., tn

Basic types (only items marked with a + are currently supported):
    abbrev./full name (each type has two names, either of which can be used):

    0       void        --- less useful type
+   1       bool        --- integer types
+   s8      byte
+   u8      ubyte
+   s16     short
+   u16     ushort
+   s32     int
+   u32     uint
+   s64     long
+   u64     ulong
    s128    cent
    u128    ucent

+           binary      --- alias for ubyte[]

+   fp32    float       --- floating point types
+   fp64    double
+   fp      real
    im32    ifloat
    im64    idouble
    im      ireal
    cpx32   cfloat
    cpx64   cdouble
    cpx     creal

+   UTF8    char        --- character types (these CANNOT hold UTF-8 characters longer than one byte)
    UTF16   wchar
    UTF32   dchar
+           string      --- alias for char[] (DOES support multi-byte UTF-8 characters)
            wstring     --- alias for wchar[]
            dstring     --- alias for dchar[]


Data item tags: Data format:
Valid chars: [](){},+-.0-9eEixXa-fA-F '.' ".*"
Format:
    [d1,d2,...,dn]  data, all of type t, corresponding to t[]
    (d1,d2,...,dn)  data, all of type t, corresponding to t()
    {d1,d2,...,dn}  data corresponding to a type declaration of {t1,t2,...,tn}
    d               a single data element

Single data elements:
    z               an integer number (regexp: [+-]?[0-9]+)
    z               a floating point number (rough regexp: [+-]?[0-9]*[.]?[0-9]*(e[+-]?[0-9]+)?)
    zi              an imaginary floating point number (z is a floating point number)
    y+zi, y-zi      a complex number (4+0i may be written as 4, etc.) (y, z are floating point numbers)
    0xz, -0xz       a hexadecimal integer z (composed of chars 0-9, a-f, A-F)
    'c'             a char/wchar/dchar character, depending on the type specified (c may be any
                    single character except ', or an escape sequence)
    "string"        equivalent to ['s','t','r','i','n','g'] (for a string/wstring/dstring type);
                    may contain escape sequences
                    Escape sequences are a subset of those supported by D: \" \' \\ \a \b \f \n \r \t \v
    XX...XX         binary (ubyte[]); each pair of chars is read as a hex ubyte
    <void>          void "data" has no symbols


Data format: Escape sequences:
To be created and written.


Comment tags (there are no line comments):
Simple comment blocks:
Format: !{...}
This is a simple comment block, and only curly braces ({,}) are treated specially. A {, whether or
not it is preceded by a !, starts an embedded comment block, and a } ends either an embedded block
or the actual comment block. Note: beware of commenting out {...} tags with a string ID containing
curly braces that are not in matching pairs.
Commented data tags:
Format: !<tp|ID=dt>
Basically a commented-out data tag. Conformance to the above spec may not be checked as strictly as
normal, but the dt section is checked for strings so that a > within a string will not end the tag.
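The two comment forms can each be skipped by a small scanner, sketched here in Python. The function names are hypothetical (not from mde); each takes the text and the index of the leading ! and returns the index just past the comment. Treating only double-quoted strings, with backslash escapes, as "strings" inside !<...> is this sketch's reading of the rule above, not a confirmed detail of the format.

```python
def skip_simple_comment(text, pos):
    """Return the index just past the !{...} block that starts at text[pos].

    Only '{' and '}' are special: every '{' (with or without a preceding '!')
    opens an embedded block and every '}' closes the innermost open block.
    """
    if not text.startswith("!{", pos):
        raise ValueError("expected '!{'")
    depth = 0
    for i in range(pos + 1, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    raise ValueError("unterminated comment block")


def skip_commented_data_tag(text, pos):
    """Return the index just past the !<...> tag that starts at text[pos].

    The contents are not validated; double-quoted strings (with backslash
    escapes) are skipped so that a '>' inside a string does not end the tag.
    """
    if not text.startswith("!<", pos):
        raise ValueError("expected '!<'")
    i, in_string = pos + 2, False
    while i < len(text):
        c = text[i]
        if in_string:
            if c == "\\":
                i += 1  # skip the escaped character
            elif c == '"':
                in_string = False
        elif c == '"':
            in_string = True
        elif c == ">":
            return i + 1
        i += 1
    raise ValueError("unterminated commented data tag")
```

Because every { increments the depth, commenting out a {...} tag whose string ID contains an unmatched brace (for example {"a{b"}) leaves the count unbalanced, which is exactly the pitfall the note above warns about.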


Merging rules:
If, when a data item is read, a data item with the same identifier
within the same section already exists in the DataSet being read into:
+ if the types are identical:
++ if the primary type is a tp() mergeable dynamic list:
+++ the entries from the item being read are concatenated to those in the item
+++ in the DataSet
++ else:
++- the item already in the DataSet takes priority and is left untouched
+ else:
+- a warning is issued, and the data item within the DataSet is left untouched
This allows a user config file holding only some settings to be merged with a complete system
config file supplying the remaining settings, and gives some support for modifications that
override or add to other data.


Header:
The header is a standard section which is mandatory and must be the first section. Its section
identifier must start at the very beginning of the file, with no preceding whitespace, and is
declared as:
    {MTXY}      where XY is a two-digit CAPITAL hex version number giving the
                mergetag format version, e.g. {MT01}.
If these are not the first 6 characters of the file, the file will not be regarded as valid.
This formatting is very strict to allow reliable low-level parsing.


The data tags within the header have no special meaning; any may be used, such as the following:
    <string|"Author"="...">
    <string|"Name"="...">
    <string|"Description"="...">
    <string|"Program"="...">     (which program created/uses this?)
    <*|"Version"=...>            (use any supported type)
    <string|"Date"="YYYYMMDD">   (reverse date format; optionally "YYYYMMDDhhmmss")
    <{u16,u8,u8}|"Date"={YYYY,MM,DD}>   (this type probably won't be supported by a standard section)
    <string|"Copyright"=...>


Example:
{MT01}
{"example section"}
<u32|"num"=5>
<{u32,UTF8[]}()|"DATA"=(
    {1,['a']},
    {59,['w','o','r','d']},
    {2,"strings can be written like this"} )>
<wchar[]|"name"="This string is stored in UTF16, regardless of the file's encoding.">
<{u32,UTF8[]}()|"DATA"=(
    {3,"this is appended to the previous 'DATA' item"} )>
{"section: section identifiers and tuples are not confused since tuples only occur inside <...> items"}
<void|"Empty tag"= >
!{this is a comment {containing a comment}}
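The strict header check and the merging rules, as they would apply to the example's "num" and "DATA" items, can be sketched in Python. This is an illustration under assumptions, not mde's code: header_version and merge_item are hypothetical names; a DataSet is modelled as a dict mapping section ID to a dict of item ID -> (type string, value); values of a tp() list are Python lists; type strings are compared after removing whitespace, without mapping abbreviations to full names (so u32 and uint would count as different types here); and IDs are shown as strings for readability where a real reader would store uints.

```python
import re
import warnings


def header_version(text):
    """Return the version XY from a {MTXY} header at the very start of text.

    XY must be two capital hex digits and the header must be the first
    6 characters; anything else means the file is not valid.
    """
    m = re.match(r"\{MT([0-9A-F]{2})\}", text)
    if m is None:
        raise ValueError("file does not start with a {MTXY} header")
    return int(m.group(1), 16)


def merge_item(dataset, section, ident, tp, value):
    """Read one data item into dataset, following the merging rules.

    dataset: {section ID: {item ID: (type string, value)}} (an assumed model).
    """
    items = dataset.setdefault(section, {})
    tp = "".join(tp.split())  # whitespace between symbols is insignificant
    if ident not in items:
        items[ident] = (tp, value)
        return
    old_tp, old_value = items[ident]
    if old_tp != tp:
        # Types differ: warn and leave the existing item untouched.
        warnings.warn(f"type mismatch for item {ident!r}; existing item kept")
    elif tp.endswith("()"):
        # Mergeable dynamic list: append the new entries to the existing ones.
        items[ident] = (tp, list(old_value) + list(value))
    # Otherwise the item already in the DataSet takes priority: do nothing.


if __name__ == "__main__":
    ds = {}
    sec = "example section"  # a real reader would store a uint ID
    print(header_version('{MT01}\n{"example section"}'))
    merge_item(ds, sec, "num", "u32", 5)
    merge_item(ds, sec, "num", "u32", 9)  # ignored: the existing item wins
    merge_item(ds, sec, "DATA", "{u32,UTF8[]}()",
               [(1, "a"), (59, "word"), (2, "strings can be written like this")])
    merge_item(ds, sec, "DATA", "{u32, UTF8[]}()",
               [(3, "this is appended to the previous 'DATA' item")])
    print(ds[sec]["num"], len(ds[sec]["DATA"][1]))
```

Since the first item read wins for non-list types, reading a user config before the system config lets the user's values take priority while the system file fills in everything else.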