diff udis86-1.4/docs/doc.html @ 1:4a9dcbd9e54f

-files of 0.13 beta -fixes so that it now compiles with the current dmd version
author marton@basel.hu
date Tue, 05 Apr 2011 20:44:01 +0200
parents
children
line wrap: on
line diff
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/udis86-1.4/docs/doc.html	Tue Apr 05 20:44:01 2011 +0200
@@ -0,0 +1,537 @@
+<html>
+<head>
+<title>Documentation: Udis86 - Disassembler Library for x86 and AMD64</title>
+<style type="text/css">@import 'style.css';</style>
+<style>
+#api_tab td {
+	vertical-align: top;
+	padding: .4em;
+}
+#api_tab {
+	width: 100%;
+	padding: 0em;
+}
+.func {
+	text-align: center;
+	font-family: monospace;
+}
+
+</style>
+</head>
+<body>
+<div id="topbar"><h1>Udis86 - Disassembler Library for x86 and AMD64</h1></div>
+<div id="wrapper">
+<div id="content">
+
+<a href="index.html">Home</a>
+
+<h1>Documentation</h1>
+<small><i>Dec 19, 2006</i> - Added new API function and Standalone build mode.</small>
+<hr size="1"/>
+<br/>
+<a href="#sec1">1. Using the library - libudis86</a>
+<blockquote>
+	<a href="#sec11">1.1 Compiling and Installing</a> <br/>
+	<blockquote>
+		<a href="#sec111">1.1.1 Standalone Udis86</a> <br/>
+	</blockquote>
+	<a href="#sec12">1.2 Interfacing With Your Program</a> <br/>
+	<a href="#sec13">1.3 A Quick Example</a><br/>
+	<a href="#sec14">1.4 The Udis86 Object</a><br/>
+	<a href="#sec15">1.5 Functions</a><br/>
+	<a href="#sec16">1.6 Examining an Instruction</a>
+</blockquote>
+<a href="#sec2">2. Using the command-line tool - udcli</a>
+<blockquote>
+	<a href="#sec21">2.1 Usage</a> <br/>
+	<a href="#sec22">2.2 Command-line options</a> <br/>
+	<a href="#sec23">2.3 The hexadecimal input mode</a> <br/>
+</blockquote>
+<br/>
+<hr size="1"/>
+
+<a name="sec1"></a>
+<h2>1. Using the library - libudis86</h2>
+
+<p>libudis86 can be used in a variety of situations, and the extent to which you
+need to know the API depends on the functionality you are looking for. At its
+core, libudis86 is a disassembler engine, which when given an input stream of 
+machine code bytes, disassembles them for you to inspect. You could use it,
+simply, to generate assembly language output of the code, or to inspect
+individual instructions, their operands, etc.</p>
+
+<a name="sec11"></a>
+<h3>1.1 Compiling and installing</h3>
+
+<p>libudis86 is developed for Unix-like environments, and steps to installing
+it is very simple. Get the source tarball, unpack it, and,</p>
+<pre>
+$ ./configure
+$ make
+$ make install
+</pre>
+<p>is all you need to do. Ofcourse, you may need to have root privileges to make 
+install. The install scripts copy the necessary header and library files to 
+appropriate locations of in system.
+</p>
+
+<a name="sec111"></a>
+<h4>1.1.1 Standalone Udis86</h4>
+
+<p>Standalone udis86 is (for now) a simple build hack that lets you build
+a single relocatable object file with all of the libudis86 functionality, that you
+could possibly use in any environment, say, a kernel module. The standalone object 
+assumes the availablity of the following functionality in the environment.</p>
+
+<pre>
+- memset()
+- vprintf()
+- sscanf()
+</pre>
+
+<p> To build udis86 in standalone mode, do the following.</p>
+
+<pre>
+$ ./configure
+$ make standalone
+</pre>
+
+which will generate <code>ud_standalone.o</code> in <code>ud/libudis86</code>. This build
+mode is a little hack'ish in nature and hopefully is replaced with better configure scripts
+in the future. If you have ideas, let me know!
+
+<a name="sec12"></a>
+<h3>1.2 Interfacing with your program</h3>
+
+<p>Once you have installed libudis86, to use it with your program, first,
+include in your program the udis86.h header file,
+</p>
+
+<pre>
+#include &lt;udis86.h&gt;
+</pre>
+
+<p>and then, add the following flag to your GCC command-line options.</p>
+
+<pre>
+-ludis86
+</pre>
+
+<a name="sec13"></a>
+<h3>1.3 A Quick Example</h3>
+
+The following code is an example of a program that interfaces with libudis86 and
+uses the API to generate assembly language output for 64-bit code, input from
+STDIN.
+<pre>
+/* simple_example.c */
+
+#include &lt;stdio.h&gt;
+#include &lt;udis86.h&gt;
+
+int main()
+{
+	ud_t ud_obj;
+
+	ud_init(&ud_obj);
+	ud_set_input_file(&ud_obj, stdin);
+	ud_set_mode(&ud_obj, 64);
+	ud_set_syntax(&ud_obj, UD_SYN_INTEL);
+
+	while (ud_disassemble(&ud_obj)) {
+		printf("\t%s\n", ud_insn_asm(&ud_obj));
+	}
+
+ 	return 0;
+}
+</pre>
+To compile the program:
+<pre>
+$ gcc -ludis86 simple_example.c -o simple_example
+</pre>
+
+<p>This example should give you an idea of how this library can
+be used. The following sections describe, in detail, the complete API of 
+libudis86.</p>
+
+<a name="sec14"></a>
+<h3>1.4 The Udis86 Object</h3>
+
+<p>To maintain reentrancy and thread safety, udis86 does not use static data.
+All data related to the disassembly process are stored in a single object, 
+called the udis86 object. So, to use libudis86 you must create an instance of 
+this object,</p>
+
+<pre>
+ud_t my_ud_obj;
+</pre>
+
+and initialize it,
+
+<pre>
+ud_init(&my_ud_obj);
+</pre>
+
+Ofcourse, you can create multiple instances of libudis86 and spawn multiple 
+threads of disassembly. Thats entirely upto how you want to use the library. 
+libudis86 guarantees reentrancy and thread safety.
+
+<a name="sec15"></a>
+<h3>1.5 Functions</h3>
+
+All functions in libudis86 take a pointer to the udis86 object (ud_t) as the first
+argument. The following is a list of all functions available.
+
+
+<ol>
+	<li>
+	<pre>void ud_init (ud_t* ud_obj)</pre>
+	ud_t object initializer. This function must be called on a udis86 object
+	before it can used anywhere else.
+	</li>
+
+	<li>
+	<pre>void ud_set_input_hook(ud_t* ud_obj, int (*hook)())</pre>
+
+	This function sets the input source for the library. To retrieve each 
+	byte in the stream, libudis86 calls back the function pointed to by 
+	"hook".
+
+	The hook function, defined by the user code, must return a single byte
+	of code each time it is called. To signal end-of-input, it must return the 
+	constant, <code>UD_EOI</code>.
+	</li>
+
+	<li>
+	<pre>void ud_set_input_buffer(ud_t* ud_obj, unsigned char* buffer, size_t size);</pre>
+
+	This function sets the input source for the library to a buffer of fixed
+	size.
+	</li>
+
+	<li>
+	<pre>void ud_set_input_file(ud_t* ud_obj, FILE* filep);</pre>
+
+	This function sets the input source for the library to a file pointed to by
+	the passed FILE pointer. Note that the library does not perform any checks,
+	assuming the file pointer to be properly initialized.
+	</li>
+	
+	<li>
+	<pre>void ud_set_mode(ud_t* ud_obj, uint8_t mode_bits);</pre>
+
+	Sets the mode of disassembly. Possible values are 16, 32, and 64. By
+	default, the library works in 32bit mode.
+	</li>
+
+	<li>
+	<pre>void ud_set_pc(ud_t*, uint64_t pc);</pre>
+
+	Sets the program counter (EIP/RIP). This changes the offset of the 
+	assembly output	generated, with direct effect on branch instructions.
+	</li>
+
+	<li>
+	<pre>void ud_set_syntax(ud_t*, void (*translator)(ud_t*));</pre>
+
+	libudis86 disassembles one instruction at a time into an intermediate
+	form that lets you inspect the instruction and its various aspects
+	individually. But to generate the assembly language output, this 
+	intermediate form must be translated. This function sets the translator.
+	There are two inbuilt translators, 
+	<br/>
+	<ul>
+	<li>UD_SYN_INTEL - for INTEL (NASM-like) syntax.</li>
+	<li>UD_SYN_ATT - for AT&amp;T (GAS-like) syntax.</li>
+	</ul>
+	If you do not want libudis86 to translate, you can pass a NULL to the 
+	function, with no more translations thereafter. This is particularily
+	useful for cases when you only want to identify chunks of code and then
+	create the assembly output if needed.
+	<br/><br/>
+	If you want to create your own translator, you must pass a pointer to 
+	function that accepts a pointer to ud_t. This function will be called
+	by libudis86 after each instruction is decoded.
+	</li>
+
+	<li>
+	<pre>void ud_set_vendor(ud_t*, unsigned vendor);</pre>
+
+	Sets the vendor of whose instruction to choose from. This is only useful for
+	selecting the VMX or SVM instruction sets at which point INTEL and AMD have 
+	diverged significantly. At a later stage, support for a more granular 
+	selection of instruction sets maybe added.
+	<br/>
+	<ul>
+	<li>UD_VENDOR_INTEL - for INTEL instruction set.</li>
+	<li>UD_VEDNOR_ATT - for AMD instruction set.</li>
+	</ul>
+	</li>
+
+	<li>
+	<pre>unsigned int ud_disassemble(ud_t*);</pre>
+
+	This function disassembles the next instruction in the input stream.
+	<i>RETURNS</i>, the number of bytes disassembled. A 0 indicates end of 
+	input. 
+	<i>NOTE</i>, to restart disassembly, after the end of input, you must 
+	call one of the input setting functions with the new input source.
+	</li>
+
+	<li>
+	<pre>unsigned int ud_insn_len(ud_t* u);</pre>
+	Returns the number of bytes disassembled.
+	</li>
+
+	
+	<li>
+	<pre>uint64_t ud_insn_off(ud_t*);</pre>
+	Returns the starting offset of the  disassembled instruction relative
+	to the program counter value specified initially.
+	</li>
+
+	<li>
+	<pre>char* ud_insn_hex(ud_t*);</pre>
+	Returns pointer to character string holding the hexadecimal 
+	representation of the disassembled bytes.
+	</li>
+
+	<li>
+	<pre>uint8_t* ud_insn_ptr(ud_t* u);</pre>
+	Returns pointer to the buffer holding the instruction bytes. Use
+	ud_insn_len(), to determine the length of this buffer.
+	</li>
+
+	<li>
+	<pre>char* ud_insn_asm(ud_t* u);</pre>
+	If the syntax is specified, returns pointer to the character string holding
+	assembly language representation of the disassembled instruction.
+	</li>
+
+	<li>
+	<pre>void ud_input_skip(ud_t*, size_t n);</pre>
+	Skips <code>n</code> number of bytes in the input stream.
+	</li>
+</ol>
+
+<a name="sec16"></a>
+<h3>1.6 Examining an Instruction</h3>
+
+After calling ud_disassembly, instructions can be examined by accessing fields 
+of the ud_t object, as described below.
+
+<ol>
+	<li>
+	<pre>ud_mnemonic_code_t ud_obj->mnemonic</pre>
+
+	The mnemonic code for the disassembled instruction. All codes are
+	prefixed by UD_I, such as, <code>UD_Imov, UD_Icall, UD_Ijmp</code>, etc.
+	See a list of mnemonic codes in mnemonics.h.
+	</li>
+
+	<li>
+	<pre>ud_operand_t ud_obj->operand[n]</pre>
+	
+	The array of operands of the disassembled instruction. A maximum of 
+	three operands are allowed, indexed as 0, 1, and 2. Operands can be
+	examined using the their sub-fields as described below.
+	</li>
+
+	<li>
+	<pre>ud_type_t ud_obj->operand[n].type</pre>
+	This field represents the type of the operand n. Possible values are -
+<pre>
+UD_OP_MEM	-	A Memory Addressing Operand.
+UD_OP_REG	-	A Register Operand.
+UD_OP_PTR	-	A Segment:Offset Pointer Operand.
+UD_OP_IMM	-	An Immediate Operand
+UD_OP_JIMM	-	An Immediate Operand for Branch Instructions.
+UD_OP_CONST	-	A Constant Value Operand.
+UD_NONE		-	No Operand.</pre>
+
+	</li>
+
+	<li>
+	<pre>ud_obj->operand[n].size</pre>
+	This field gives the size of operand n. Possible values are - 8, 16, 32, 48, 64.
+	</li>
+
+	<li>
+	<pre>
+ud_obj->operand[n].base
+ud_obj->operand[n].index
+ud_obj->operand[n].scale
+ud_obj->operand[n].offset
+ud_obj->operand[n].lval</pre>
+
+	For operands of type <code>UD_OP_MEM</code>, 
+	<ul>
+	<li><code>ud_obj->operand[n].base</code> is the base register (if any),</li>
+	<li><code>ud_obj->operand[n].index</code> is the index register (if any),</li>
+	<li><code>ud_obj->operand[n].scale</code> is the scale (if any), </li>
+	<li><code>ud_obj->operand[n].offset</code> is the size of displacement/offset to be added (8,16,32,64),</li>
+	<li><code>ud_obj->operand[n].lval</code> is displacement/offset (if any).</li>
+	</ul>
+
+	For operands of type <code>UD_OP_REG</code>, 
+	<ul>
+	<li><code>ud_obj->operand[n].base</code> field gives the register.</li>
+	</ul>
+
+	For operands of type <code>UD_OP_PTR</code>,
+	<ul> 
+	<li><code>ud_obj->operand[n].lval</code> holds the segment:offset.</li>
+	<li><code>ud_obj->operand[n].size</code> can have two values 32 
+		(for 16:16 seg:off) and 48 (for 16:32 seg:off).</li>
+	</ul>
+
+	For operands of type <code>UD_OP_IMM, UD_OP_JIMM, UD_OP_CONST</code>,
+	<ul> 
+	<li><code>ud_obj->operand[n].lval</code> holds the value.</li>
+	</ul>
+
+	Possible values for <code>ud_obj->operand[n].base</code> and <code>ud_obj->operand[n].index</code>.
+  <pre>
+  /* No register */
+  UD_NONE, 
+
+  /* 8 bit GPRs */
+  UD_R_AL,	UD_R_CL,	UD_R_DL,	UD_R_BL,
+  UD_R_AH,	UD_R_CH,	UD_R_DH,	UD_R_BH,
+  UD_R_SPL,	UD_R_BPL,	UD_R_SIL,	UD_R_DIL,
+  UD_R_R8B,	UD_R_R9B,	UD_R_R10B,	UD_R_R11B,
+  UD_R_R12B,	UD_R_R13B,	UD_R_R14B,	UD_R_R15B,
+
+  /* 16 bit GPRs */
+  UD_R_AX,	UD_R_CX,	UD_R_DX,	UD_R_BX,
+  UD_R_SP,	UD_R_BP,	UD_R_SI,	UD_R_DI,
+  UD_R_R8W,	UD_R_R9W,	UD_R_R10W,	UD_R_R11W,
+  UD_R_R12W,	UD_R_R13W,	UD_R_R14W,	UD_R_R15W,
+	
+  /* 32 bit GPRs */
+  UD_R_EAX,	UD_R_ECX,	UD_R_EDX,	UD_R_EBX,
+  UD_R_ESP,	UD_R_EBP,	UD_R_ESI,	UD_R_EDI,
+  UD_R_R8D,	UD_R_R9D,	UD_R_R10D,	UD_R_R11D,
+  UD_R_R12D,	UD_R_R13D,	UD_R_R14D,	UD_R_R15D,
+	
+  /* 64 bit GPRs */
+  UD_R_RAX,	UD_R_RCX,	UD_R_RDX,	UD_R_RBX,
+  UD_R_RSP,	UD_R_RBP,	UD_R_RSI,	UD_R_RDI,
+  UD_R_R8,	UD_R_R9,	UD_R_R10,	UD_R_R11,
+  UD_R_R12,	UD_R_R13,	UD_R_R14,	UD_R_R15,
+
+  /* segment registers */
+  UD_R_ES,	UD_R_CS,	UD_R_SS,	UD_R_DS,
+  UD_R_FS,	UD_R_GS,	
+
+  /* control registers*/
+  UD_R_CR0,	UD_R_CR1,	UD_R_CR2,	UD_R_CR3,
+  UD_R_CR4,	UD_R_CR5,	UD_R_CR6,	UD_R_CR7,
+  UD_R_CR8,	UD_R_CR9,	UD_R_CR10,	UD_R_CR11,
+  UD_R_CR12,	UD_R_CR13,	UD_R_CR14,	UD_R_CR15,
+	
+  /* debug registers */
+  UD_R_DR0,	UD_R_DR1,	UD_R_DR2,	UD_R_DR3,
+  UD_R_DR4,	UD_R_DR5,	UD_R_DR6,	UD_R_DR7,
+  UD_R_DR8,	UD_R_DR9,	UD_R_DR10,	UD_R_DR11,
+  UD_R_DR12,	UD_R_DR13,	UD_R_DR14,	UD_R_DR15,
+
+  /* mmx registers */
+  UD_R_MM0,	UD_R_MM1,	UD_R_MM2,	UD_R_MM3,
+  UD_R_MM4,	UD_R_MM5,	UD_R_MM6,	UD_R_MM7,
+
+  /* x87 registers */
+  UD_R_ST0,	UD_R_ST1,	UD_R_ST2,	UD_R_ST3,
+  UD_R_ST4,	UD_R_ST5,	UD_R_ST6,	UD_R_ST7, 
+
+  /* extended multimedia registers */
+  UD_R_XMM0,	UD_R_XMM1,	UD_R_XMM2,	UD_R_XMM3,
+  UD_R_XMM4,	UD_R_XMM5,	UD_R_XMM6,	UD_R_XMM7,
+  UD_R_XMM8,	UD_R_XMM9,	UD_R_XMM10,	UD_R_XMM11,
+  UD_R_XMM12,	UD_R_XMM13,	UD_R_XMM14,	UD_R_XMM15,
+
+  UD_R_RIP</pre>
+
+  Possible values for <code>ud_obj->operand[n].lval</code> depend on <code>ud_obj->operand[n].size</code>,
+  based on which you could use its sub-fields to access the integer values.
+
+  <pre>
+ud_obj->operand[n].lval.sbyte		- Signed Byte
+ud_obj->operand[n].lval.ubyte		- Unsigned Byte
+ud_obj->operand[n].lval.sword		- Signed Word
+ud_obj->operand[n].lval.uword		- Unsigned Word
+ud_obj->operand[n].lval.sdword		- Signed Double Word
+ud_obj->operand[n].lval.udword		- Unsined Double Word
+ud_obj->operand[n].lval.sqword		- Signed Quad Word
+ud_obj->operand[n].lval.uqword		- Unsigned Quad Word
+ud_obj->operand[n].lval.ptr.seg		- Pointer Segment in Segment:Offset
+ud_obj->operand[n].lval.ptr.off		- Pointer Offset in Segment:Offset </pre>
+
+	<li>Prefix Fields - These fields store prefixes (if found). If a prefix
+	does not exists then the field corresponding to it has the value <code>UD_NONE</code>.
+	<pre>
+ud_obj->operand[n].pfx_rex		- 64-bit mode REX prefix
+ud_obj->operand[n].pfx_seg		- Segment register prefix
+ud_obj->operand[n].pfx_opr		- Operand-size prefix (66h)
+ud_obj->operand[n].pfx_adr		- Address-size prefix (67h)
+ud_obj->operand[n].pfx_lock		- Lock prefix
+ud_obj->operand[n].pfx_rep		- Rep prefix
+ud_obj->operand[n].pfx_repe		- Repe prefix
+ud_obj->operand[n].pfx_repne		- Repne prefix</pre>
+	</li>
+
+	Possible values for <code>ud_obj->operand[n].pfx_seg</code> are,
+	<pre>
+UD_R_ES,	UD_R_CS,	UD_R_SS,	UD_R_DS,
+UD_R_FS,	UD_R_GS,	UD_NONE</pre>
+	</li>
+
+	<li>
+	<pre>uint4_t ud_obj->pc</pre>
+	The program counter.
+	</li>
+</ol>
+
+<a name="sec2"></a>
+<h2>2. Using the command-line tool - udcli</h2>
+
+A front-end incarnation of this library, udcli is a small command-line tool for 
+your quick disassembly needs.
+
+<a name="sec21"></a>
+<h3>2.1 Usage</h3>
+<pre>$ udcli [-option[s]] file</pre>
+
+<a name="sec22"></a>
+<h3>2.2 Options</h3>
+<pre>
+-16      : Set the disassembly mode to 16 bits.
+-32      : Set the disassembly mode to 32 bits. (default)
+-64      : Set the disassembly mode to 64 bits.
+-intel   : Set the output to INTEL (NASM like) syntax. (default)
+-att     : Set the output to AT&amp;T (GAS like) syntax.
+-v &lt;v&gt;   : Set vendor. &lt;v&gt; = {intel, amd} 
+-o &lt;pc&gt;  : Set the value of program counter to <pc>. (default = 0)
+-s &lt;pc&gt;  : Set the number of bytes to skip before disassembly to <n>.
+-c &lt;pc&gt;  : Set the number of bytes to disassemble to <n>.
+-x       : Set the input mode to whitespace seperated 8-bit numbers in
+           hexadecimal representation. Example: 0f 01 ae 00
+-noff    : Do not display the offset of instructions.
+-nohex   : Do not display the hexadecimal code of instructions.
+-h       : Display help message.</pre>
+
+<a name="sec23"></a>
+<h3>2.3 The hexadecimal input mode</h3>
+
+Noteworthy among the command-line options of the udcli is "-x" which sets the
+input mode to whitespace seperated 8-bit numbers in hexadecimal representation.
+This could come as a handy tool, for quickly disassembling hexadecimal 
+representation of machine code, like those generated during software crashes, etc.
+
+<div style="text-align:center; padding: 1em;">
+<img src="ss.jpg" style="border: 1px double; padding: 2px;"/>
+</div>
+
+<div style="text-align:center"><small>&copy; 2006 Vivek Mohan</small></div>
+</body>
+</html>