diff mde/mergetag/doc/file-format-text.txt @ 0:d547009c104c

Repository creation. committer: Diggory Hardy <diggory.hardy@gmail.com>
author Diggory Hardy <diggory.hardy@gmail.com>
date Sat, 27 Oct 2007 18:05:39 +0100
parents
children 78eb491bd642
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/mde/mergetag/doc/file-format-text.txt	Sat Oct 27 18:05:39 2007 +0100
@@ -0,0 +1,184 @@
+This is the file format for mergetag text files.
+Version: 0.1 unfinalised
+
+
+The encoding should be Unicode (UTF-8, UTF-16 or UTF-32); anything other than UTF-8 must include a BOM.
+
+
+Hierarchy:
++	Sections	(special section: see header)
+++	Data Tags
+
+
+IDs:
+IDs are used for several purposes; they are always stored as a uint number (0-4294967295). They may
+be given in the file as a base-10 or hex number or, where a lookup table is provided to the reader,
+as a double-quoted string (with no escape sequences).
+Multiple section or data tags with the same ID are allowed; see the "Merging rules" section.
+
+
+Outside of tags, only whitespace is allowed (anything else must be a valid tag). Whitespace is ignored.
+The following tags are valid (see below for details):
+tag		purpose
+{...}		section identifiers
+<...>		data items
+!{...}		simple comment block
+!<...>		comment block parsed the same as <...>
+Within tags, type specifications or data items, whitespace is allowed between symbols.
+
+
+Section identifier tags:
+Format: {ID} or {ID|ID}
+In the {ID|ID} case, the first ID is the section type, and the second ID the section name.
+In the {ID} case, the section type ID has been omitted and the default type (0) is used.
+A section identifier marks the beginning of a new section, extending until the next section
+identifier or the end of the file. When a section is read, a new 
+
+
+Data item tags:
+Format: <tp|ID=dt>
+A data item with type tp, identifier ID and data dt. If the data does not fit the given type it is
+an error and the tag is ignored. Once split into a type string, ID and data string, the contents
+are passed to an addTag() function within the DataSection class which will parse tags of a
+recognised format and either ignore or print a warning about other tags.
+
+
+Data item tags: Type format:
+Note:
+	The type is not initially parsed; it is read as a token terminated by any of these
+	characters:	<>|=
+	Of course any character other than a | terminating the token is an error.
+Format:
+	tp		a basic type
+	tp[]		a dynamic list of sub-type tp
+Possible future additions:
+	tp()		a dynamic merging list of sub-type tp (only valid as the primary type, i.e.
+			<subtype()|...>, not as a sub-type of a tuple or another dynamic list)
+	{t1,t2,...,tn}	a tuple with sub-types t1, t2, ..., tn
+
+Basic types (only items with a + are currently supported):
+	abbrev./full name (each type has two names which can be used):
+	
+	0	void	--- less useful type
++	1	bool	--- integer types
++	s8	byte
++	u8	ubyte
++	s16	short
++	u16	ushort
++	s32	int
++	u32	uint
++	s64	long
++	u64	ulong
+	s128	cent
+	u128	ucent
+	
++		binary	--- alias for ubyte[]
+	
++	fp32	float	--- floating point types
++	fp64	double
++	fp	real
+	im32	ifloat
+	im64	idouble
+	im	ireal
+	cpx32	cfloat
+	cpx64	cdouble
+	cpx	creal
+	
++	UTF8	char	--- character types (actually these CANNOT support UTF8 chars with length > 1)
+	UTF16	wchar
+	UTF32	dchar
++		string	--- alias for char[] --- (DOES support UTF8)
+		wstring	--- alias for wchar[]
+		dstring	--- alias for dchar[]
+
+
+Data item tags: Data format:
+Valid chars:	[](){},+-.0-9eEixXa-fA-F '.' ".*"
+Format:
+	[d1,d2,...,dn]	data all of type t corresponding to t[]
+	(d1,d2,...,dn)	data all of type t corresponding to t()
+	{d1,d2,...,dn}	data corresponding to a type declaration of {t1,t2,...,tn}
+	d		a single data element
+
+Single data elements:
+	z		an integer number (regexp: [+-]?[0-9]+)
+	z		a floating point number (rough regexp: [+-]?[0-9]*[.]?[0-9]*(e[+-]?[0-9]+)?)
+	zi		an imaginary floating point number (z is a floating point number)
+	y+zi, y-zi	a complex number (4+0i may be written as 4, etc.) (y and z are floating point numbers)
+	0xz, -0xz	a hexadecimal integer z (composed of chars 0-9,a-f,A-F)
+	'c'		a char/wchar/dchar character, depending on the type specified (c may be either
+			an escape sequence or any single character other than ')
+	"string"	equivalent to ['s','t','r','i','n','g'] (for a string/wstring/dstring type)
+			may contain escape sequences
+			Escape sequences are a subset of those supported by D: \" \' \\ \a \b \f \n \r \t \v
+	XX...XX		Binary (ubyte[]); each pair of chars is read as a hex ubyte
+	<void>		void "data" has no symbols
+
+
+Data format: Escape sequences:
+To be created and written.
+
+
+Comment tags (there are no line comments):
+Simple comment blocks:
+Format: !{...}
+This is a simple comment block, and only curly braces ({,}) are treated specially. A {, whether or
+not it is preceded by a !, starts an embedded comment block, and a } ends either an embedded block
+or the actual comment block. Note: beware commenting out {...} tags with a string ID containing
+curly braces which aren't in matching pairs.
+Commented data tags:
+Format: !<tp|ID=dt>
+Essentially a commented-out data tag. Conformance to the above spec may not be checked as strictly as
+normal, but the dt section is checked for strings so that a > within a string won't end the tag.
+
+
+Merging rules:
+If, when a data item is read, a data item with the same identifier
+within the same section already exists in the DataSet being read into:
++	if the types are identical:
+++		if the primary type is a tp() mergeable dynamic list:
++++			the entries from the item being read are concatenated to those in the item
++++			in the DataSet
+++		else:
+++-			the item already in the DataSet takes priority and is left untouched
++	else:
++-		a warning is issued, and the data item within the DataSet is left untouched
+This allows settings in a user config file to be merged with the remaining settings in a complete
+system config file, and gives some support for modifications that override or add to existing data.
+
+
+Header:
+The header is a standard section which is mandatory and must be the first section. Its section
+identifier must start at the beginning of the file with no whitespace, declared with:
+	{MTXY}		where XY is a two digit CAPITAL HEX version number representing the
+			mergetag format version, e.g. {MT01} .
+If these are not the first 6 characters of the file, the file will not be regarded as valid.
+This formatting is very strict to allow reliable low-level parsing.
+
+
+The data tags within the header have no special meaning; any may be used, such as the following:
+	<string|"Author"="...">
+	<string|"Name"="...">
+	<string|"Description"="...">
+	<string|"Program"="...">	(which program created/uses this?)
+	<*|"Version"=...>		(use any supported type)
+	<string|"Date"="YYYYMMDD">	(reverse date format; optionally "YYYYMMDDhhmmss")
+	<{u16,u8,u8}|"Date"={YYYY,MM,DD}>	(actually this type probably won't be supported by
+						a standard section)
+	<string|"Copyright"=...>
+
+
+Example:
+{MT01}
+{"example section"}
+<u32|"num"=5>
+<{u32,UTF8[]}()|"DATA"=(
+	{1,['a']},
+	{59,['w','o','r','d']},
+	{2,"strings can be written like this"} )>
+<wchar[]|"name"="This string is stored in UTF16, regardless of the file's encoding.">
+<{u32,UTF8[]}()|"DATA"=(
+	{3,"this is appended to the previous 'DATA' item"} )>
+{"section: section identifiers and tuples are not confused since tuples only occur inside <...> items"}
+<void|"Empty tag"= >
+!{this is a comment {containing a comment}}
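
The merging rules in the spec above can be illustrated with a short sketch. This is not part of the committed file and is not the mergetag implementation (which is written in D); it is a minimal Python model. The names DataItem, normalise_type, is_mergeable and merge_item are hypothetical, and a section is modelled as a plain dict keyed by ID. It also assumes that a type is a tp() mergeable list when its type string ends in "()", and that types are "identical" when their type strings match after whitespace is removed.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    type_str: str   # type as written in the tag, e.g. "u32" or "{u32,UTF8[]}()"
    value: object   # already-parsed data


def normalise_type(type_str):
    # Whitespace is allowed between symbols, so drop it before comparing types.
    return "".join(type_str.split())


def is_mergeable(type_str):
    # Assumption: a primary type ending in "()" is a tp() mergeable list.
    return normalise_type(type_str).endswith("()")


def merge_item(section, item_id, new_item, warnings):
    """Apply the merging rules for one data item read into `section` (a dict)."""
    existing = section.get(item_id)
    if existing is None:
        # First occurrence of this ID in the section: just store it.
        section[item_id] = new_item
    elif normalise_type(existing.type_str) != normalise_type(new_item.type_str):
        # Types differ: warn and leave the existing item untouched.
        warnings.append(f"type mismatch for ID {item_id!r}; keeping existing item")
    elif is_mergeable(existing.type_str):
        # Same tp() type: append the new entries to the existing ones.
        existing.value = list(existing.value) + list(new_item.value)
    # Otherwise the item already in the section takes priority.


# Usage, loosely following the two "DATA" items in the example above.
section, warnings = {}, []
merge_item(section, "DATA", DataItem("{u32,UTF8[]}()", [(1, "a"), (59, "word")]), warnings)
merge_item(section, "DATA", DataItem("{u32,UTF8[]}()", [(3, "appended")]), warnings)
merge_item(section, "num", DataItem("u32", 5), warnings)
merge_item(section, "num", DataItem("u32", 7), warnings)       # ignored: existing wins
merge_item(section, "num", DataItem("string", "x"), warnings)  # warning, ignored
```

Because the first item read always wins for non-mergeable types, the order in which files are read decides which value survives.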