view udis86-1.4/docs/doc.html @ 1:4a9dcbd9e54f
-files of 0.13 beta
-fixes so that it now compiles with the current dmd version
author | marton@basel.hu |
---|---|
date | Tue, 05 Apr 2011 20:44:01 +0200 |
<html>
<head>
<title>Documentation: Udis86 - Disassembler Library for x86 and AMD64</title>
<style type="text/css">@import 'style.css';</style>
<style> #api_tab td { vertical-align: top; padding: .4em; } #api_tab { width: 100%; padding: 0em; } .func { text-align: center; font-family: monospace; } </style>
</head>
<body>
<div id="topbar"><h1>Udis86 - Disassembler Library for x86 and AMD64</h1></div>
<div id="wrapper">
<div id="content">
<a href="index.html">Home</a>
<h1>Documentation</h1>
<small><i>Dec 19, 2006</i> - Added new API function and Standalone build mode.</small>
<hr size="1"/> <br/>
<a href="#sec1">1. Using the library - libudis86</a>
<blockquote>
<a href="#sec11">1.1 Compiling and installing</a> <br/>
<blockquote> <a href="#sec111">1.1.1 Standalone Udis86</a> <br/> </blockquote>
<a href="#sec12">1.2 Interfacing with your program</a> <br/>
<a href="#sec13">1.3 A Quick Example</a><br/>
<a href="#sec14">1.4 The Udis86 Object</a><br/>
<a href="#sec15">1.5 Functions</a><br/>
<a href="#sec16">1.6 Examining an Instruction</a>
</blockquote>
<a href="#sec2">2. Using the command-line tool - udcli</a>
<blockquote>
<a href="#sec21">2.1 Usage</a> <br/>
<a href="#sec22">2.2 Command-line options</a> <br/>
<a href="#sec23">2.3 The hexadecimal input mode</a> <br/>
</blockquote>
<br/> <hr size="1"/>
<a name="sec1"></a> <h2>1. Using the library - libudis86</h2>
<p>libudis86 can be used in a variety of situations, and how much of the API you need to know depends on the functionality you are looking for. At its core, libudis86 is a disassembler engine that, given an input stream of machine code bytes, disassembles them for you to inspect. You can use it simply to generate assembly language output for the code, or to inspect individual instructions, their operands, and so on.</p>
<a name="sec11"></a> <h3>1.1 Compiling and installing</h3>
<p>libudis86 is developed for Unix-like environments, and installing it is simple. 
Get the source tarball, unpack it, and,</p>
<pre>
$ ./configure
$ make
$ make install
</pre>
<p>is all you need to do. Of course, you may need root privileges to run <code>make install</code>. The install scripts copy the necessary header and library files to the appropriate locations in the system. </p>
<a name="sec111"></a> <h4>1.1.1 Standalone Udis86</h4>
<p>Standalone udis86 is (for now) a simple build hack that lets you build a single relocatable object file with all of the libudis86 functionality, which you can then use in almost any environment, say, a kernel module. The standalone object assumes the availability of the following functions in the environment.</p>
<pre>
- memset()
- vprintf()
- sscanf()
</pre>
<p> To build udis86 in standalone mode, do the following.</p>
<pre>
$ ./configure
$ make standalone
</pre>
This generates <code>ud_standalone.o</code> in <code>ud/libudis86</code>. This build mode is somewhat hackish and will hopefully be replaced with better configure scripts in the future. If you have ideas, let me know!
<a name="sec12"></a> <h3>1.2 Interfacing with your program</h3>
<p>Once you have installed libudis86, to use it with your program, first include the udis86.h header file in your program, </p>
<pre>
#include <udis86.h>
</pre>
<p>and then add the following flag to your GCC command-line options.</p>
<pre>
-ludis86
</pre>
<a name="sec13"></a> <h3>1.3 A Quick Example</h3>
The following code is an example of a program that interfaces with libudis86 and uses the API to generate assembly language output for 64-bit code, input from STDIN. 
<pre>
/* simple_example.c */
#include <stdio.h>
#include <udis86.h>

int main()
{
    ud_t ud_obj;

    ud_init(&ud_obj);                      /* must be called first */
    ud_set_input_file(&ud_obj, stdin);     /* read machine code from STDIN */
    ud_set_mode(&ud_obj, 64);              /* disassemble as 64-bit code */
    ud_set_syntax(&ud_obj, UD_SYN_INTEL);  /* INTEL (NASM-like) output */

    /* ud_disassemble() returns 0 at end of input */
    while (ud_disassemble(&ud_obj)) {
        printf("\t%s\n", ud_insn_asm(&ud_obj));
    }

    return 0;
}
</pre>
To compile the program (the library flag goes after the source file so that the linker can resolve the libudis86 symbols it references):
<pre>
$ gcc simple_example.c -o simple_example -ludis86
</pre>
<p>This example should give you an idea of how this library can be used. The following sections describe, in detail, the complete API of libudis86.</p>
<a name="sec14"></a> <h3>1.4 The Udis86 Object</h3>
<p>To maintain reentrancy and thread safety, udis86 does not use static data. All data related to the disassembly process is stored in a single object, called the udis86 object. So, to use libudis86 you must create an instance of this object,</p>
<pre>
ud_t my_ud_obj;
</pre>
and initialize it,
<pre>
ud_init(&my_ud_obj);
</pre>
Of course, you can create multiple instances of the udis86 object and run multiple threads of disassembly. That is entirely up to how you want to use the library. libudis86 guarantees reentrancy and thread safety.
<a name="sec15"></a> <h3>1.5 Functions</h3>
All functions in libudis86 take a pointer to the udis86 object (ud_t) as the first argument. The following is a list of all functions available.
<ol>
<li> <pre>void ud_init (ud_t* ud_obj)</pre> ud_t object initializer. This function must be called on a udis86 object before it can be used anywhere else. </li>
<li> <pre>void ud_set_input_hook(ud_t* ud_obj, int (*hook)())</pre> This function sets the input source for the library. To retrieve each byte in the stream, libudis86 calls back the function pointed to by "hook". The hook function, defined by the user code, must return a single byte of code each time it is called. To signal end-of-input, it must return the constant <code>UD_EOI</code>. 
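<br/><br/> As a rough illustration of that contract, here is a minimal sketch of a hook that serves bytes from a fixed array. The names <code>demo_code</code>, <code>demo_pos</code> and <code>demo_input_hook</code> are hypothetical and exist only for this example. The fallback definition of <code>UD_EOI</code> as -1 is likewise an assumption, made so that the fragment compiles without udis86.h; in a real program the constant comes from the header.
<pre>
#include <stddef.h>

/* Normally supplied by udis86.h. The fallback value of -1 is an
   assumption made only so this sketch compiles on its own. */
#ifndef UD_EOI
#define UD_EOI -1
#endif

/* Hypothetical input state: a fixed byte array and a read cursor. */
static const unsigned char demo_code[] = { 0x90, 0xc3 };  /* nop; ret */
static size_t demo_pos = 0;

/* Input hook: returns the next byte (0..255) on each call and
   UD_EOI once every byte of the array has been handed out. */
int demo_input_hook(void)
{
    if (demo_pos >= sizeof demo_code)
        return UD_EOI;
    return demo_code[demo_pos++];
}
</pre>
Going by the prototype above, such a hook would then be registered with <code>ud_set_input_hook(&ud_obj, demo_input_hook);</code> after the call to <code>ud_init()</code>. 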
</li>
<li> <pre>void ud_set_input_buffer(ud_t* ud_obj, unsigned char* buffer, size_t size);</pre> This function sets the input source for the library to a buffer of fixed size. </li>
<li> <pre>void ud_set_input_file(ud_t* ud_obj, FILE* filep);</pre> This function sets the input source for the library to the file pointed to by the passed FILE pointer. Note that the library does not perform any checks and assumes the file pointer is properly initialized. </li>
<li> <pre>void ud_set_mode(ud_t* ud_obj, uint8_t mode_bits);</pre> Sets the mode of disassembly. Possible values are 16, 32, and 64. By default, the library works in 32-bit mode. </li>
<li> <pre>void ud_set_pc(ud_t*, uint64_t pc);</pre> Sets the program counter (EIP/RIP). This changes the offsets shown in the generated assembly output, with a direct effect on the targets of branch instructions. </li>
<li> <pre>void ud_set_syntax(ud_t*, void (*translator)(ud_t*));</pre> libudis86 disassembles one instruction at a time into an intermediate form that lets you inspect the instruction and its various aspects individually. But to generate the assembly language output, this intermediate form must be translated. This function sets the translator. There are two built-in translators, <br/>
<ul>
<li>UD_SYN_INTEL - for INTEL (NASM-like) syntax.</li>
<li>UD_SYN_ATT - for AT&T (GAS-like) syntax.</li>
</ul>
If you do not want libudis86 to translate, you can pass NULL to the function, and no translation is performed thereafter. This is particularly useful when you only want to identify chunks of code and generate the assembly output later, if needed. <br/><br/> If you want to create your own translator, you must pass a pointer to a function that accepts a pointer to ud_t. This function will be called by libudis86 after each instruction is decoded. </li>
<li> <pre>void ud_set_vendor(ud_t*, unsigned vendor);</pre> Sets the vendor whose instruction set is to be used. 
This is only useful for selecting between the VMX and SVM instruction sets, the point at which INTEL and AMD have diverged significantly. At a later stage, support for a more granular selection of instruction sets may be added. <br/>
<ul>
<li>UD_VENDOR_INTEL - for INTEL instruction set.</li>
<li>UD_VENDOR_AMD - for AMD instruction set.</li>
</ul>
</li>
<li> <pre>unsigned int ud_disassemble(ud_t*);</pre> This function disassembles the next instruction in the input stream. <i>RETURNS</i> the number of bytes disassembled. A 0 indicates end of input. <i>NOTE</i>: to restart disassembly after the end of input, you must call one of the input setting functions with the new input source. </li>
<li> <pre>unsigned int ud_insn_len(ud_t* u);</pre> Returns the number of bytes disassembled. </li>
<li> <pre>uint64_t ud_insn_off(ud_t*);</pre> Returns the starting offset of the disassembled instruction relative to the program counter value specified initially. </li>
<li> <pre>char* ud_insn_hex(ud_t*);</pre> Returns a pointer to the character string holding the hexadecimal representation of the disassembled bytes. </li>
<li> <pre>uint8_t* ud_insn_ptr(ud_t* u);</pre> Returns a pointer to the buffer holding the instruction bytes. Use ud_insn_len() to determine the length of this buffer. </li>
<li> <pre>char* ud_insn_asm(ud_t* u);</pre> If the syntax is specified, returns a pointer to the character string holding the assembly language representation of the disassembled instruction. </li>
<li> <pre>void ud_input_skip(ud_t*, size_t n);</pre> Skips <code>n</code> bytes in the input stream. </li>
</ol>
<a name="sec16"></a> <h3>1.6 Examining an Instruction</h3>
After calling ud_disassemble, instructions can be examined by accessing fields of the ud_t object, as described below.
<ol>
<li> <pre>ud_mnemonic_code_t ud_obj->mnemonic</pre> The mnemonic code for the disassembled instruction. All codes are prefixed by UD_I, such as <code>UD_Imov, UD_Icall, UD_Ijmp</code>, etc. 
See a list of mnemonic codes in mnemonics.h. </li>
<li> <pre>ud_operand_t ud_obj->operand[n]</pre> The array of operands of the disassembled instruction. A maximum of three operands are allowed, indexed as 0, 1, and 2. Operands can be examined using their sub-fields as described below. </li>
<li> <pre>ud_type_t ud_obj->operand[n].type</pre> This field represents the type of operand n. Possible values are -
<pre>
UD_OP_MEM   - A Memory Addressing Operand.
UD_OP_REG   - A Register Operand.
UD_OP_PTR   - A Segment:Offset Pointer Operand.
UD_OP_IMM   - An Immediate Operand.
UD_OP_JIMM  - An Immediate Operand for Branch Instructions.
UD_OP_CONST - A Constant Value Operand.
UD_NONE     - No Operand.</pre>
</li>
<li> <pre>ud_obj->operand[n].size</pre> This field gives the size of operand n. Possible values are - 8, 16, 32, 48, 64. </li>
<li> <pre>
ud_obj->operand[n].base
ud_obj->operand[n].index
ud_obj->operand[n].scale
ud_obj->operand[n].offset
ud_obj->operand[n].lval</pre>
For operands of type <code>UD_OP_MEM</code>,
<ul>
<li><code>ud_obj->operand[n].base</code> is the base register (if any),</li>
<li><code>ud_obj->operand[n].index</code> is the index register (if any),</li>
<li><code>ud_obj->operand[n].scale</code> is the scale (if any), </li>
<li><code>ud_obj->operand[n].offset</code> is the size of the displacement/offset to be added (8, 16, 32, 64),</li>
<li><code>ud_obj->operand[n].lval</code> is the displacement/offset (if any).</li>
</ul>
For operands of type <code>UD_OP_REG</code>,
<ul>
<li><code>ud_obj->operand[n].base</code> field gives the register.</li>
</ul>
For operands of type <code>UD_OP_PTR</code>,
<ul>
<li><code>ud_obj->operand[n].lval</code> holds the segment:offset.</li>
<li><code>ud_obj->operand[n].size</code> can have two values: 32 (for 16:16 seg:off) and 48 (for 16:32 seg:off).</li>
</ul>
For operands of type <code>UD_OP_IMM, UD_OP_JIMM, UD_OP_CONST</code>,
<ul>
<li><code>ud_obj->operand[n].lval</code> holds the value.</li>
</ul>
Possible values for 
<code>ud_obj->operand[n].base</code> and <code>ud_obj->operand[n].index</code>.
<pre>
/* No register */
UD_NONE,

/* 8 bit GPRs */
UD_R_AL, UD_R_CL, UD_R_DL, UD_R_BL,
UD_R_AH, UD_R_CH, UD_R_DH, UD_R_BH,
UD_R_SPL, UD_R_BPL, UD_R_SIL, UD_R_DIL,
UD_R_R8B, UD_R_R9B, UD_R_R10B, UD_R_R11B,
UD_R_R12B, UD_R_R13B, UD_R_R14B, UD_R_R15B,

/* 16 bit GPRs */
UD_R_AX, UD_R_CX, UD_R_DX, UD_R_BX,
UD_R_SP, UD_R_BP, UD_R_SI, UD_R_DI,
UD_R_R8W, UD_R_R9W, UD_R_R10W, UD_R_R11W,
UD_R_R12W, UD_R_R13W, UD_R_R14W, UD_R_R15W,

/* 32 bit GPRs */
UD_R_EAX, UD_R_ECX, UD_R_EDX, UD_R_EBX,
UD_R_ESP, UD_R_EBP, UD_R_ESI, UD_R_EDI,
UD_R_R8D, UD_R_R9D, UD_R_R10D, UD_R_R11D,
UD_R_R12D, UD_R_R13D, UD_R_R14D, UD_R_R15D,

/* 64 bit GPRs */
UD_R_RAX, UD_R_RCX, UD_R_RDX, UD_R_RBX,
UD_R_RSP, UD_R_RBP, UD_R_RSI, UD_R_RDI,
UD_R_R8, UD_R_R9, UD_R_R10, UD_R_R11,
UD_R_R12, UD_R_R13, UD_R_R14, UD_R_R15,

/* segment registers */
UD_R_ES, UD_R_CS, UD_R_SS, UD_R_DS, UD_R_FS, UD_R_GS,

/* control registers*/
UD_R_CR0, UD_R_CR1, UD_R_CR2, UD_R_CR3,
UD_R_CR4, UD_R_CR5, UD_R_CR6, UD_R_CR7,
UD_R_CR8, UD_R_CR9, UD_R_CR10, UD_R_CR11,
UD_R_CR12, UD_R_CR13, UD_R_CR14, UD_R_CR15,

/* debug registers */
UD_R_DR0, UD_R_DR1, UD_R_DR2, UD_R_DR3,
UD_R_DR4, UD_R_DR5, UD_R_DR6, UD_R_DR7,
UD_R_DR8, UD_R_DR9, UD_R_DR10, UD_R_DR11,
UD_R_DR12, UD_R_DR13, UD_R_DR14, UD_R_DR15,

/* mmx registers */
UD_R_MM0, UD_R_MM1, UD_R_MM2, UD_R_MM3,
UD_R_MM4, UD_R_MM5, UD_R_MM6, UD_R_MM7,

/* x87 registers */
UD_R_ST0, UD_R_ST1, UD_R_ST2, UD_R_ST3,
UD_R_ST4, UD_R_ST5, UD_R_ST6, UD_R_ST7,

/* extended multimedia registers */
UD_R_XMM0, UD_R_XMM1, UD_R_XMM2, UD_R_XMM3,
UD_R_XMM4, UD_R_XMM5, UD_R_XMM6, UD_R_XMM7,
UD_R_XMM8, UD_R_XMM9, UD_R_XMM10, UD_R_XMM11,
UD_R_XMM12, UD_R_XMM13, UD_R_XMM14, UD_R_XMM15,

UD_R_RIP</pre>
Possible values for <code>ud_obj->operand[n].lval</code> depend on <code>ud_obj->operand[n].size</code>, based on which you could use its sub-fields to access the integer values. 
<pre>
ud_obj->operand[n].lval.sbyte   - Signed Byte
ud_obj->operand[n].lval.ubyte   - Unsigned Byte
ud_obj->operand[n].lval.sword   - Signed Word
ud_obj->operand[n].lval.uword   - Unsigned Word
ud_obj->operand[n].lval.sdword  - Signed Double Word
ud_obj->operand[n].lval.udword  - Unsigned Double Word
ud_obj->operand[n].lval.sqword  - Signed Quad Word
ud_obj->operand[n].lval.uqword  - Unsigned Quad Word
ud_obj->operand[n].lval.ptr.seg - Pointer Segment in Segment:Offset
ud_obj->operand[n].lval.ptr.off - Pointer Offset in Segment:Offset
</pre>
</li>
<li>Prefix Fields - These fields of the ud_t object store prefixes (if found). If a prefix does not exist, then the field corresponding to it has the value <code>UD_NONE</code>.
<pre>
ud_obj->pfx_rex   - 64-bit mode REX prefix
ud_obj->pfx_seg   - Segment register prefix
ud_obj->pfx_opr   - Operand-size prefix (66h)
ud_obj->pfx_adr   - Address-size prefix (67h)
ud_obj->pfx_lock  - Lock prefix
ud_obj->pfx_rep   - Rep prefix
ud_obj->pfx_repe  - Repe prefix
ud_obj->pfx_repne - Repne prefix</pre>
Possible values for <code>ud_obj->pfx_seg</code> are,
<pre>
UD_R_ES, UD_R_CS, UD_R_SS, UD_R_DS, UD_R_FS, UD_R_GS, UD_NONE</pre>
</li>
<li> <pre>uint64_t ud_obj->pc</pre> The program counter. </li>
</ol>
<a name="sec2"></a> <h2>2. Using the command-line tool - udcli</h2>
A front-end incarnation of this library, udcli is a small command-line tool for your quick disassembly needs.
<a name="sec21"></a> <h3>2.1 Usage</h3>
<pre>$ udcli [-option[s]] file</pre>
<a name="sec22"></a> <h3>2.2 Command-line options</h3>
<pre>
-16      : Set the disassembly mode to 16 bits.
-32      : Set the disassembly mode to 32 bits. (default)
-64      : Set the disassembly mode to 64 bits.
-intel   : Set the output to INTEL (NASM like) syntax. (default)
-att     : Set the output to AT&T (GAS like) syntax.
-v <v>   : Set vendor. <v> = {intel, amd}
-o <pc>  : Set the value of program counter to <pc>. 
           (default = 0)
-s <n>   : Set the number of bytes to skip before disassembly to <n>.
-c <n>   : Set the number of bytes to disassemble to <n>.
-x       : Set the input mode to whitespace-separated 8-bit numbers in
           hexadecimal representation. Example: 0f 01 ae 00
-noff    : Do not display the offset of instructions.
-nohex   : Do not display the hexadecimal code of instructions.
-h       : Display help message.</pre>
<a name="sec23"></a> <h3>2.3 The hexadecimal input mode</h3>
Noteworthy among the command-line options of udcli is "-x", which sets the input mode to whitespace-separated 8-bit numbers in hexadecimal representation. This can be handy for quickly disassembling a hexadecimal representation of machine code, such as the dumps generated during software crashes.
<div style="text-align:center; padding: 1em;"> <img src="ss.jpg" style="border: 1px double; padding: 2px;"/> </div>
<div style="text-align:center"><small>© 2006 Vivek Mohan</small></div>
</div>
</div>
</body>
</html>