04/02/97
Numbering Systems
3 systems we will use in this class: binary, decimal, and hexadecimal.
There is a difference between a number's representation and the amount that it represents. Ex: the digits 1010 represent 10 in binary, but 1,010 in decimal. 10_16 (10 in base 16) = 16 in decimal, not 10.
Positional Numbering System
Examples:
123_10 = 1 * 10^2 + 2 * 10^1 + 3 * 10^0 = 100 + 20 + 3 = 123
1010_2 = 1 * 2^3 + 0 * 2^2 + 1 * 2^1 + 0 * 2^0 = 8 + 0 + 2 + 0 = 10
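The positional expansions above can be checked mechanically. The following Python sketch is illustrative only; the helper name `positional_value` is an assumption and not part of the course material.

```python
def positional_value(digits, base):
    """Sum digit * base**position, where position 0 is the rightmost digit."""
    total = 0
    for position, ch in enumerate(reversed(digits)):
        digit = int(ch, 16)  # accepts 0-9 and A-F
        if digit >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        total += digit * base ** position
    return total
```

For example, `positional_value("1010", 2)` follows the same steps as the binary expansion above, and `positional_value("10", 16)` shows why 10 in base 16 is sixteen.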
Conventions for specifying a number's base:
| Suffix Representation | |
| 1010 (no suffix) | decimal |
| 1010d or 1010t | decimal |
| 1010b | binary |
| 1010h | hexadecimal |
| Subscript Representation | |
| N_B | The number N in base B |
Problem with binary representation: You have to write a lot of digits to represent integers. Ex: 101010111011111 = 21,983
Solution: It is easy to convert between binary and hexadecimal representations. Memorize the following chart:
| 1111 | F | 0111 | 7 |
| 1110 | E | 0110 | 6 |
| 1101 | D | 0101 | 5 |
| 1100 | C | 0100 | 4 |
| 1011 | B | 0011 | 3 |
| 1010 | A | 0010 | 2 |
| 1001 | 9 | 0001 | 1 |
| 1000 | 8 | 0000 | 0 |
Now you can quite easily convert from binary to hexadecimal numbers:
| 1010 | 1011 | 1011 | 1111 | Divide the number into 4 bit nibbles |
| A | B | B | F | Convert each nibble into hexadecimal according to the chart |
If we were to write 10256781056, it would be difficult to look at the number and tell right away whether it is 100 million, 1 billion, 10 billion, etc. However, if we separate the number into triples with commas, as in 10,256,781,056, it is much easier to read and tell that the number is about 10 billion. Likewise, when writing binary numbers, separate the digits into 4-bit nibbles, ex: 10111011 = 1011 1011. This form is much easier to read (as well as to convert quickly to hexadecimal).
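The nibble-by-nibble conversion described above can be sketched in Python. The function name `binary_to_hex` is hypothetical, and the sketch assumes the input contains only the characters 0, 1, and spaces.

```python
def binary_to_hex(bits):
    """Convert a binary string to hex one 4-bit nibble at a time."""
    bits = bits.replace(" ", "")
    # Pad on the left with zeros so the length is a multiple of 4.
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    nibbles = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    # Each nibble indexes directly into the 16-entry chart.
    return "".join("0123456789ABCDEF"[int(nibble, 2)] for nibble in nibbles)
```

Padding on the left matters because nibbles are always counted from the low-order end of the number.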
Unsigned Integers
With n bits, we can represent 2^n different values. As a convention, those values will be all integers in the range from 0 to 2^n - 1. Some common examples are:
| 2^8 | 256 |
| 2^16 | 65,536 |
| 2^32 | ~4 billion |
| Bit Number | 76543210 |
| Value | 10110111 |
Bit number 7 is the H.O. (High Order) bit, also called the most significant bit (MSB). Bit number 0 is the L.O. (Low Order) bit, also called the least significant bit (LSB). Likewise, a word can be divided into H.O. and L.O. bytes (bits 15-8, and 7-0, respectively). It also follows that a dword can be divided into H.O. and L.O. words (bits 31-16, and 15-0, respectively).
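As a rough illustration of the unsigned range and of the H.O./L.O. terminology, here is a small Python sketch. The helper names (`unsigned_range`, `bit`, `split_word`) are assumptions made up for this example.

```python
def unsigned_range(n_bits):
    """Smallest and largest unsigned values representable in n_bits."""
    return 0, 2 ** n_bits - 1

def bit(value, number):
    """Return bit `number` of value, where bit 0 is the L.O. bit."""
    return (value >> number) & 1

def split_word(word):
    """Return (H.O. byte, L.O. byte) of a 16-bit word."""
    return (word >> 8) & 0xFF, word & 0xFF
```

With the byte 10110111 from the table, `bit(0b10110111, 7)` gives the H.O. bit and `bit(0b10110111, 0)` gives the L.O. bit.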
04/04/97
Number representation systems
| 0100 0000 | Start off with 64 |
| 1011 1111 | Negate all of the digits |
| 1100 0000 | Add 1, this is the result (-64) |
| 0100 0000 | Start off with 64 |
| 1100 0000 | Add -64 |
| 0000 0000 | Ignoring the carry out of the H.O. bit, the answer is zero. |
| 0000 0000 | Start off with 0 |
| 1111 1111 | Negate all of the digits |
| 0000 0000 | Add 1 (ignoring the carry out of the H.O. bit), the result is zero |
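The invert-and-add-one steps in the tables above are two's complement negation on 8-bit patterns. A minimal Python sketch follows; `negate8` and `to_signed8` are hypothetical helper names, and 8-bit values are assumed throughout.

```python
def negate8(pattern):
    """Two's complement negation: invert every bit, add 1, drop any carry."""
    inverted = ~pattern & 0xFF
    return (inverted + 1) & 0xFF

def to_signed8(pattern):
    """Interpret an 8-bit pattern as a signed two's complement value."""
    return pattern - 256 if pattern & 0x80 else pattern
```

The pattern 0100 0000 negates to 1100 0000, adding a pattern to its negation gives zero once the carry out of bit 7 is dropped, and zero negates to itself.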
Operations
| | Binary | Hex |
| X | 1010 1110 | AE |
| Y | 1111 0101 | F5 |
| X AND Y | 1010 0100 | A4 |
| X OR Y | 1111 1111 | FF |
| X XOR Y | 0101 1011 | 5B |
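The table above can be reproduced with Python's bitwise operators. This is only a sketch for checking the arithmetic; `show` and `RESULTS` are names invented for the example.

```python
X = 0b10101110  # AEh
Y = 0b11110101  # F5h

def show(value):
    """Format an 8-bit value as 'binary hex', e.g. '10100100 A4'."""
    return f"{value & 0xFF:08b} {value & 0xFF:02X}"

RESULTS = {
    "X AND Y": X & Y,  # 1 only where both bits are 1
    "X OR Y": X | Y,   # 1 where either bit is 1
    "X XOR Y": X ^ Y,  # 1 where the bits differ
}
```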
Interesting properties of bitwise operators:
04/07/97
Shifts and Rotates
| 1000 0000 | -128 |
| 0100 0000 | SHR by one digit, the result is +64 |
| ((B SHL 3) AND 1000b) OR | Shift the 3 bit into the third value and mask off all the other bits |
| ((B SHR 2) AND 101b) OR | Shift the value right 2 so that the 2 and 0 bits are in the proper place, then mask off all the other values |
| ((B ROL 3) AND 10b) | Now shift the 1 bit into the proper place and mask it |
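The -128 example above shows that SHR does not preserve the sign of a two's complement value. The Python sketch below models 8-bit SHR and ROL, and adds SAR (the 8086 arithmetic shift right) for contrast. The function names are assumptions for illustration.

```python
def shr8(pattern, count=1):
    """Logical shift right: zeros enter at the H.O. bit."""
    return (pattern & 0xFF) >> count

def sar8(pattern, count=1):
    """Arithmetic shift right: the sign bit is copied into vacated bits."""
    signed = pattern - 256 if pattern & 0x80 else pattern
    return (signed >> count) & 0xFF

def rol8(pattern, count):
    """Rotate left: bits leaving the H.O. end re-enter at the L.O. end."""
    count %= 8
    pattern &= 0xFF
    return ((pattern << count) | (pattern >> (8 - count))) & 0xFF
```

Shifting 1000 0000 right with `shr8` gives 0100 0000 (+64), while `sar8` gives 1100 0000 (-64).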
Memory
04/09/97
Computer architecture continued...
| Register Type | Size | Name |
| General Purpose Registers | 8 bit | AH, AL, BH, BL, CH, CL, DH, DL |
| | 16 bit | AX, BX, CX, DX, SI, DI, BP, SP |
| | 32 bit | EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP |
| Special Purpose Registers | 16 bit | IP, FLAGS |
| | 32 bit | EIP, EFLAGS |
| Segment Registers | 16 bit | CS, DS, ES, SS, FS, GS |
Some points to note about the registers:
04/11/97
Quick discussion on memory and CPU clock speed and their relation to overall system performance:
Segments continued...
Segmented Addresses - Take the format seg:offset, where seg is 16 bits, allowing up to 65,536 segments, and offset is:
Consider the following HLL code fragment:
int i, j
i = 100
How will these instructions be represented in assembly? The compiler will create a symbol table where each label will be stored along with its address in memory.
| name | address |
| i | 1000h |
| j | 1004h |
How does the PC access memory?
The MOV instruction is the primary way to access memory. It takes the following form:
MOV dest, src
Where dest and src must take one of the following forms:
| dest | src |
| reg | reg/mem |
| reg/mem | reg |
| reg/mem | constant |
Addressing Modes on the 8086
There are 17 different addressing modes on the 8086. On the 80386 and above, there are something like 1000! You will not be required to know the 80386 modes, but you must memorize the 17 8086 addressing modes.
| MOV AX, [1000h] | Moves data from DS:1000h into AX |
| MOV AX, ES:[1000h] | Moves data from ES:1000h into AX |
| MOV AX, [BX] | Suppose the value in BX is 1000h, move the data from DS:[1000h] into AX |
| MOV AX, [BP] | Suppose the value in BP is 1000h, move the data from SS:[1000h] into AX |
04/14/97
Addressing modes continued...
These forms make up the 17 addressing modes on the 8086. Here are some interesting points to note:
Notice, a semantically equivalent way to write the last command would be MOV A[BX], AL, which looks more like a High Level Language array access.
| disp | bx bp | si di |
Assembly Language Programming
Shell.asm - template file for writing programs. Contains four segment directives to use for programming:
| cseg | Code segment : all machine instructions go here |
| dseg | Data segment : declare variables here |
| sseg | Stack segment : where values go when you use push and pop; you don't need to worry about this segment. |
| zzzzzzseg | Heap where data is allocated dynamically with malloc and free |
Segment definitions in Assembly
| Dseg | segment | para | public | 'data' |
| Name of segment | Identifier | Align the segment on 16-byte (paragraph) boundaries | Other modules can see the segment | For identifying this segment when there are other segments with the same name. |
| Dseg | ends |
Variable declarations in Assembly
name type value(s)
Points to Note:
char typedef byte
integer typedef sword
A integer ?
04/16/97
Variable declarations continued...
integer typedef sword
char typedef byte
longint typedef sdword
Dseg segment
...
i integer 0
j integer 10
a char 'A'
b longint ?
Dseg ends
Here is what the segment looks like in memory:
| location | value |
| 5 | b |
| 4 | a |
| 2 | j |
| 0 | i |
mov ax, i <==> mov ax, ds:[0]
mov cx, j <==> mov cx, ds:[2]
; The next two lines do ax = i
mov bx, 0
mov ax, [bx]
Each segment has a location counter that tells the assembler where to put new data in that segment. It then inserts the data and bumps the location counter by the size of that data.
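The location counter behaviour described above can be modelled in a few lines of Python. This is a simplified sketch, not how the assembler is actually implemented; the `SIZES` table and `lay_out` function are assumptions made for illustration.

```python
# Size in bytes of each type name used in these notes.
SIZES = {
    "byte": 1, "char": 1,
    "word": 2, "sword": 2, "integer": 2,
    "dword": 4, "sdword": 4, "longint": 4,
}

def lay_out(declarations):
    """Assign each (name, type, count) declaration an offset in the segment.

    Returns the offsets and the final value of the location counter.
    """
    counter = 0
    offsets = {}
    for name, type_name, count in declarations:
        offsets[name] = counter             # data goes at the current location
        counter += SIZES[type_name] * count  # bump by the size of the data
    return offsets, counter
```

Feeding it the declarations of i, j, a, and b from the data segment above reproduces the offsets 0, 2, 4, and 5 shown in the memory table.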
Array declarations
Dseg segment
i integer 0,1,2,3
Dseg ends
Here are some code examples of how to access the values of i:
mov bx, 0
mov ax, [bx+i] ; ax = i[0]
mov bx, 2
mov cx, [bx+i] ; cx = i[1]
Characters
Dseg segment
A char 'R','A','N','D','Y'
B char 'RANDY', 0
C char 'Randy', ' Hyde', 0
D char 'Randy Hyde',0
E char 'Randy'
char ' Hyde'
byte 0
Dseg ends
In this data segment declaration, A and B hold the same five characters (B adds a terminating zero byte), and C, D, and E are equivalent.
This leads to a new way to declare arrays:
Dseg segment
A char 64 dup(?)
Dseg ends
This makes 64 duplicate copies of ?, the base address of which is at A.
04/18/97
Arrays
Consider the following data segment declarations:
Dseg segment ...
A word 64 dup(?)
B word 64 dup(0,1,2,3)
S byte 256 dup('stack')
D word 4 dup(4 dup(?))
Dseg ends
A is a simple one-dimensional array of 64 words (128 bytes). B is a one-dimensional array of 256 words (512 bytes), where the values are initialized to 0,1,2,3,0,1,2,3,0,1,... S is a declaration typical of what you would see in the Shell.asm file to define and initialize the stack. Each letter in 'stack' is a byte, so the size of the stack would be 256 * 5 = 1,280 bytes. The final array D is a two-dimensional 4x4 array, totaling 16 words, or 32 bytes in overall size. Notice that the assembler doesn't really attach any particular attributes to a 2-dimensional array declared this way. You could just as easily have declared the array this way:
D word 16 dup(?)
The fact that the array has two dimensions is only relevant in how you treat the contiguous block of memory that is set aside by the declaration. There are two different ways that you can access two-dimensional arrays:
Row-Major and Column-Major Ordering
Suppose you have a two-dimensional array B declared as follows:
B word 4 dup(4 dup(?))
Which will result in a two dimensional array appearing as follows:
| | Column 0 | Column 1 | Column 2 | Column 3 |
| Row 0 | 0,0 | 0,1 | 0,2 | 0,3 |
| Row 1 | 1,0 | 1,1 | 1,2 | 1,3 |
| Row 2 | 2,0 | 2,1 | 2,2 | 2,3 |
| Row 3 | 3,0 | 3,1 | 3,2 | 3,3 |
| Row Major Ordering | Column Major Ordering |
| 0,0 | 0,0 |
| 0,1 | 1,0 |
| 0,2 | 2,0 |
| 0,3 | 3,0 |
| 1,0 | 0,1 |
| 1,1 | 1,1 |
| 1,2 | 2,1 |
| 1,3 | 3,1 |
| ... | ... |
| 3,2 | 2,3 |
| 3,3 | 3,3 |
Here are formulas to use for both schemes, supposing you want to access B[y,x]
mov bx, i
imul bx, 4 ; multiply by rowsize
add bx, j
add bx, bx ; multiply by element size (2)
mov ax, B[bx]
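The index computation in the assembly above is the usual row-major formula. The Python sketch below states it alongside the corresponding column-major formula. Both are the standard textbook formulas rather than text recovered from these notes, and the function names are hypothetical.

```python
def row_major_offset(row, col, num_cols, elem_size):
    """Byte offset when whole rows are stored one after another."""
    return (row * num_cols + col) * elem_size

def column_major_offset(row, col, num_rows, elem_size):
    """Byte offset when whole columns are stored one after another."""
    return (col * num_rows + row) * elem_size
```

For the 4x4 word array B, element 1,0 lies 8 bytes from the start in row-major order but only 2 bytes from the start in column-major order, matching the ordering table above.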
04/21/97
Structures and Pointers
A structure in assembly language is basically the same as a C++ class. For example:
Declaring Structures
In C++:
struct MyStruct {
    int i;
    int j;
    float f;
};
In Assembly:
MyStruct struct
i sword ?
j sword ?
f real4 ?
MyStruct ends
Using Structures
In C++:
MyStruct m;
m.i = ...
m.j = ...
m.f = ...
In Assembly:
m MyStruct {}
mov m.i, ...
mov m.j, ...
mov m.f, ...
Array of structs:
x MyStruct 10 dup({})
Accessing:
mov bx, index
imul bx, 8 ; multiply by the size of the struct
mov x[bx].i, 0
Notice, you can also use the assembly language sizeof operator to compute the size of the structure, which would make the second line above be:
imul bx, sizeof(MyStruct)
There are two reasons why this is a good idea:
Pointers
2 types: near pointers (a 16-bit offset) and far pointers (a 32-bit segment:offset pair).
Declaring pointers: (in data segment)
I word ?
myPtr dword I ; declare a far pointer to I
myNPtr word I ; initialize myNPtr with address of I
Using Pointers:
mov ES, word ptr myPtr+2
mov BX, word ptr myPtr
mov AX, ES:[BX] ; mov ax, I
Notice, instead of the first two mov instructions, there is an assembly language instruction
les bx, myPtr
that will automatically load ES:BX with the 32-bit operand, where the H.O. word goes into ES, and the L.O. word goes into the register specified (in this case, BX). There are also lds, lfs, lgs, and lss, which do the same with those respective registers. The general form of the instruction is:
l*s reg, 32-bit value
Where * is e, d, f, g, or s.
Problems with far pointers: They are larger and instructions used to access them are slow.
Problems with near pointers: They hold only the offset, so you have to know what segment the data is in. However, they are smaller and you can use the mov instruction to access them, instead of l*s (which is much slower).
Physical address: Suppose you have a seg:offset address, xxxx:yyyy, how do you compute the physical address in memory where it will be mapped?
| xxxx0 | Append a zero to the segment |
| +yyyy | Add the offset |
| ppppp | Result is physical address in memory |
| 100:1000 | 200:0 | |
| 1000 | 2000 | Append zero to segment |
| 1000 | 0000 | Add in offset |
| 2000 | 2000 | Physical address |
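The append-a-zero-and-add rule above amounts to shifting the segment left by four bits and adding the offset. A minimal Python sketch follows; the function name is hypothetical, and the 20-bit mask is an assumption that models the 8086's 20 address lines.

```python
def physical_address(segment, offset):
    """8086 real-mode address: segment * 16 + offset, kept to 20 bits."""
    # Appending a hex zero to the segment is the same as shifting left 4 bits.
    return ((segment << 4) + offset) & 0xFFFFF
```

Both 100h:1000h and 200h:0 map to physical address 2000h, as in the table above, so different seg:offset pairs can name the same memory location.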
04/23/97
Declaring Constants in Assembly:
RowSize=4
ColSize=4
; in data segment...
MatrixOne word RowSize dup(ColSize dup(?)) ; creates 4 x 4 matrix
Note: Put all constants at the beginning of the source file (right after the includes), so if someone sees a constant they know right where they can look up its value.
2 attributes of a variable in memory: value and address. Example: consider the following data segment declaration:
I word 10
Ptr dword I
J word 20
Here are the attributes of the three variables:
| Variable | Address | Value |
| I | DS:0 | 10 |
| Ptr | DS:2 | DS:0 |
| J | DS:6 | 20 |
lea ax, J
mov Ptr, ax
mov Ptr+2, ds
This is not an acceptable way to set the pointer because the mov instructions have mismatched operand types (the pointer is 32 bits, the registers only 16). To remedy this, use the following method:
lea ax, J
mov word ptr Ptr, ax
mov word ptr Ptr+2, ds
This tells the assembler to treat each half of Ptr as a 16-bit value, thus making the mov instructions have compatible operand types. Note: word ptr will only work on memory locations, not registers. The les instruction goes the other way: it loads the pointer so that you can store through it:
les bx, Ptr
mov es:[bx], ax ; stores ax into the word that Ptr points at (Ptr itself is unchanged)
Another instruction you can use is the lea instruction (load effective address), which loads a 16-bit destination with the offset of its source operand. Example:
lea ax, i
Which is basically the same as the following in C:
int *pi; int i; pi=&i;
Suppose you want to move a constant into a memory location, as in the following:
mov [bx], const
This is unacceptable to the assembler because it doesn't know the size of what bx is pointing at. To alleviate this problem, use
mov word ptr [bx], const
This tells the assembler to treat the value pointed at by bx as a 16-bit word.
Dynamic memory allocation:
malloc: routine will allocate memory on the heap and return a pointer to it in es:di. The number of bytes allocated is specified by the value in the cx register. Example: consider the following declaration of a 64 x 16 matrix of words in the data segment:
A word 64 dup(16 dup(?))
Here is how you would allocate it dynamically (A must then be declared as a dword pointer rather than as the array itself):
mov cx, 64*16*2 ; cx now holds number of bytes to allocate
; (rowsize * colsize * itemsize)
malloc ; allocate memory on heap
mov word ptr A, di ; set A equal to the pointer to the newly
mov word ptr A+2, es ; allocated memory
Now, if we want to access A[y][x] = ax:
; the following 4 instructions compute the index using row-major ordering
mov bx, y
imul bx, 16
add bx, x
add bx, bx
les di, A ; load es:di with the base address of the
; array (previously stored in A)
mov es:[di+bx], ax ; use base + offset addressing mode to
; access array element.
Accessing fields in structures through pointers:
Suppose you have the following structure, and a pointer to that structure, declared in the data segment:
S struct
i word ?
j word ?
...
S ends
MyStruct S {}
SPtr dword MyStruct
Now, how would you access fields in the structure through the pointer? Consider the following:
les di, SPtr
mov ax, es:[di].i
This method will not work because the assembler doesn't know which structure es:di is pointing at, so it doesn't know which i you mean to access (a field named i could appear in many different structures). You must tell the assembler which structure you mean to access in the following manner:
mov ax, es:[di].S.i
This method indicates to the assembler what structure es:di is pointing at.
04/25/97
Introduction to the UCR Standard Library
Detailed documentation on the standard library routines can be found in Chapter 7 in the Art of Assembly Text.
To use the library, you must have the following include directives in your source file:
include stdlib.a
includelib stdlib.lib
Note: these are already included for you in the shell.asm template file. You must also have the include & lib environment variables set up with paths to these files. This has already been done on the machines in the lab under Windows 95.
Overview of some useful routines:
print
byte "Hello World",cr,lf,0
cr and lf are pre-defined constants that indicate a carriage return and a line feed, respectively. The 0 byte at the end is what terminates the routine, which basically will print everything it sees until it finds that zero byte. Neglecting to put the zero byte will yield some strange results. Note: You cannot print variables with print, only string constants.
SomeStr char "Hello World",cr,lf,0
ptr2SomeStr dword SomeStr
To print out the contents of SomeStr using puts:
les di, ptr2SomeStr
puts
However, if you don't want to have to use a pointer, you could do the following:
mov di, ds
mov es, di
lea di, SomeStr
Note: you cannot just load es with ds, because you cannot mov segment registers into segment registers. You must go through a general purpose register. The above operation is so common that the library provides a macro that does exactly that:
lesi SomeStr ; loads es:di with the address of SomeStr
; in data segment...
Input char 128 dup(?)
...
; in code segment...
lesi Input ; es:di now points at Input
gets ; reads STDIN, stores results in Input
print
byte "You just entered:",0
puts ; prints Input to screen (es:di still points
; at Input)
mov cx, 3
mov ax, 5
lp: putisize
sub ax, 1
cmp ax, 0
jg lp
putcr
This code sequence will output the following to the screen:
  5  4  3  2  1
Each integer is printed right-justified in a field at least 3 characters wide (the value in cx).
Conditional jumps based on comparisons:
| Signed | Unsigned | ||
| jg | jump greater | ja | jump above |
| jge | jump greater or equal | jae | jump above or equal |
| jl | jump less than | jb | jump below |
| jle | jump less than or equal | jbe | jump below or equal |
| je | jump equal | ||
| jne | jump not equal | ||
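The signed and unsigned jumps in the table above differ only in how the same 16-bit patterns are interpreted after a cmp. The Python sketch below models that difference for jg and ja. The function names are assumptions, and the model compares values directly rather than simulating the flags.

```python
def as_signed16(pattern):
    """Interpret a 16-bit pattern as a two's complement value."""
    pattern &= 0xFFFF
    return pattern - 0x10000 if pattern & 0x8000 else pattern

def jg_taken(a, b):
    """Would jg be taken after cmp a, b? (signed comparison)"""
    return as_signed16(a) > as_signed16(b)

def ja_taken(a, b):
    """Would ja be taken after cmp a, b? (unsigned comparison)"""
    return (a & 0xFFFF) > (b & 0xFFFF)
```

With a = FFFFh and b = 1, ja is taken (65,535 is above 1) but jg is not (-1 is not greater than 1), which is why the choice of jump must match the type of the data.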
Conditional Jumps based on Flags:
| jc | Jump if carry |
| jnc | Jump if no carry |
| js | Jump if sign |
| jns | Jump if no sign |
| jo | Jump if overflow |
| jno | Jump if no overflow |
| jz | Jump if zero |
| jnz | Jump if not zero |
04/28/97
Stack Operations
| push mem/reg | sp = sp-2 ss:[sp] = mem/reg |
| pop mem/reg | mem/reg = ss:[sp] sp = sp+2 |
push ax
; series of instructions affecting ax
pop ax ; restores original value of ax
push ax
push bx
push cx
push dx
...
pop ax
pop bx
pop cx
pop dx
In this example, ax will get the old value of dx, bx will get cx's old value, and so on.
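The register mix-up above follows directly from the stack being last-in, first-out. The Python sketch below models 16-bit push and pop as defined in the table earlier. The `Stack16` class, its starting sp value, and `pops_in_push_order` are all assumptions made for illustration.

```python
class Stack16:
    """Toy model of the 8086 stack: sp drops by 2 on push, rises by 2 on pop."""

    def __init__(self, sp=0x0100):
        self.sp = sp
        self.memory = {}

    def push(self, value):
        self.sp = (self.sp - 2) & 0xFFFF
        self.memory[self.sp] = value & 0xFFFF

    def pop(self):
        value = self.memory[self.sp]
        self.sp = (self.sp + 2) & 0xFFFF
        return value

def pops_in_push_order(ax, bx, cx, dx):
    """Push ax, bx, cx, dx, then pop into ax, bx, cx, dx (the wrong order)."""
    stack = Stack16()
    for value in (ax, bx, cx, dx):
        stack.push(value)
    return tuple(stack.pop() for _ in range(4))  # new (ax, bx, cx, dx)
```

Popping in the reverse order of the pushes (dx, cx, bx, ax) would restore every register to its original value.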
myputs proc near
...
ret
myputs endp
...
call myputs
When the call myputs instruction is executed, the location of the next instruction to execute (stored in the IP register) is pushed onto the stack. Then, control is transferred to myputs. When myputs is done, the ret instruction will pop that address off the stack back into IP, so execution resumes at the instruction after the call.
Here's the code for myputs:
; assumes es:di points to a string to print
myputs proc near
push di ; save di, since we'll be changing its value
whilelp: mov al, es:[di]
cmp al, 0 ; check for zero byte (end of string)
je done
putc ; print character in al
inc di
jmp whilelp
done: pop di ; restore di's old value
ret
myputs endp
3 common errors to avoid when working with the stack:
Here is the code for getu, which reads an unsigned integer from the stdin.
; Assumes:
; In data segment ...
; Input byte 128 dup(?)
getu proc near
push es
push di
BadNum: lesi Input ; es:di points at Input string
gets ; reads string into es:di (Input)
SkipSpcs: cmp byte ptr es:[di], ' '
jne NotSpace
inc di
jmp SkipSpcs
NotSpace: cmp byte ptr es:[di], '0'
jb RetryMsg
cmp byte ptr es:[di], '9'
jna GoodNum
RetryMsg: print
byte "Invalid Integer, try again.",cr,lf,0
jmp BadNum
GoodNum: atou ; convert string to unsigned int
pop di
pop es
ret
getu endp
04/30/97
Text Equates
Text equates (defined using the textequ directive) are used to literally substitute one string for another. You define text equates in the following manner:
name textequ <substitute>
When your program is assembled, everywhere that name appears in the source code, the assembler will insert "substitute." For the following example, assume you have written a procedure _geti. Using text equates allows you to do something like the following:
geti textequ <call _geti>
Now, instead of writing a call instruction, you can simply write geti (the text equated name; the real name of the procedure is _geti), effectively making it appear like an instruction rather than a procedure. The routines in the standard library are declared in this fashion.
Parameter Passing
There are many different methods of passing parameters and places to pass them. The main methods we will study are pass by reference and pass by value. The main place we'll be passing our parameters is on the stack. Here's an example:
; ReadArray[i][j] - assumes pointer to the array to read values into
; passed as parameter in es:di
ReadArray proc near
getu ; returns unsigned int in ax
mov bx, i
imul bx, numcols
add bx, j
add bx, bx
mov es:[di+bx], ax
ret
ReadArray endp
Here's an example of the usage:
lesi A1 ; es:di holds base address of A1
ReadArray
lesi A2 ; es:di holds base address of A2
ReadArray
Note: The ReadArray procedure clearly affects the ax and bx registers. These registers should be pushed on the stack upon entry, and popped off the stack on exit, to preserve their values.