04/02/97

Numbering Systems

3 systems we will use in this class: decimal, binary, and hexadecimal.

There is a difference between a number's representation and the amount that it represents. Ex: the digit string 1010 represents 10 when read as binary, but 1,010 when read as decimal. Likewise, 10_16 (10 in base 16) represents 16 in decimal, not 10.

Positional Numbering System

Examples:

123_10 = 1 * 10^2 + 2 * 10^1 + 3 * 10^0 = 100 + 20 + 3 = 123

1010_2 = 1 * 2^3 + 0 * 2^2 + 1 * 2^1 + 0 * 2^0 = 8 + 0 + 2 + 0 = 10
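The two expansions above follow the same rule: each digit is multiplied by the base raised to the digit's position. A minimal Python sketch of that rule (the function name positional_value is made up for illustration, not part of the course material):

```python
def positional_value(digits: str, base: int) -> int:
    """Evaluate a digit string in the given base, one position at a time."""
    symbols = "0123456789ABCDEF"
    total = 0
    # Position 0 is the rightmost digit, so walk the string in reverse.
    for position, ch in enumerate(reversed(digits.upper())):
        digit = symbols.index(ch)          # numeric value of this symbol
        if digit >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        total += digit * base ** position  # digit * base^position
    return total

print(positional_value("123", 10))   # 1*10^2 + 2*10^1 + 3*10^0 = 123
print(positional_value("1010", 2))   # 1*2^3 + 0*2^2 + 1*2^1 + 0*2^0 = 10
```

The same digit string gives different amounts in different bases, which is the distinction between representation and value made earlier.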

Conventions for specifying a number's base:

Suffix              Representation
1010 (no suffix)    decimal
1010d or 1010t      decimal
1010b               binary
1010h               hexadecimal

Subscript           Representation
N_B                 The number N in base B

Problem with binary representation: you have to write a lot of digits to represent integers. Ex: 101010111011111 = 21,983

Solution: It is easy to convert between binary and hexadecimal representations. Memorize the following chart:

1111 F 0111 7
1110 E 0110 6
1101 D 0101 5
1100 C 0100 4
1011 B 0011 3
1010 A 0010 2
1001 9 0001 1
1000 8 0000 0

Now you can quite easily convert from binary to hexadecimal numbers:
1010 1011 1011 1111    Divide the number into 4-bit nibbles
A    B    B    F       Convert each nibble into hexadecimal according to the chart

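The nibble-by-nibble conversion can be sketched in Python as a table lookup. The names NIBBLE_TO_HEX and binary_to_hex are hypothetical; the table is generated here rather than typed in, but it holds the same 16 entries as the chart.

```python
# Build the 16-entry chart: '0000' -> '0', ..., '1111' -> 'F'.
NIBBLE_TO_HEX = {format(v, "04b"): format(v, "X") for v in range(16)}

def binary_to_hex(bits: str) -> str:
    """Convert a binary digit string to hexadecimal, one nibble at a time."""
    bits = bits.replace(" ", "")                    # allow spaced nibbles
    # Pad on the left with zeros so the length is a multiple of 4.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    nibbles = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(NIBBLE_TO_HEX[n] for n in nibbles)

print(binary_to_hex("1010 1011 1011 1111"))   # ABBF
```

Padding on the left matters when the digit count is not a multiple of four: the nibbles must be counted off from the low-order end.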
Conventions to Follow When Writing Decimal Numbers:

If we were to write 10256781056, it would be difficult to look at the number and tell right away whether it was 100 million, 1 billion, 10 billion, etc. However, if we separate the number into triples with commas, as in 10,256,781,056, it is much easier to read and to tell that the number is about 10 billion. Likewise, when writing binary numbers, separate the digits into 4-bit nibbles, ex: 10111011 = 1011 1011. This form is much easier to read (and to convert quickly to hexadecimal).

Unsigned Integers

With n bits, we can represent 2^n different values. As a convention, those values will be all integers in the range from 0 to 2^n - 1. Some common examples are:

2^8     256
2^16    65,536
2^32    ~4 billion
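As a quick check of these figures, here is a small Python sketch (unsigned_range is a hypothetical helper name) that returns the number of distinct values and the largest unsigned value for a given bit count:

```python
def unsigned_range(n_bits: int) -> tuple[int, int]:
    """Return (number of distinct values, largest unsigned value) for n bits."""
    count = 2 ** n_bits
    return count, count - 1

for n in (8, 16, 32):
    count, largest = unsigned_range(n)
    print(f"{n:2} bits: {count:,} values, range 0 to {largest:,}")
```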

Bit Identification
Bit Number   7 6 5 4 3 2 1 0
Value        1 0 1 1 0 1 1 1

Bit number 7 is the H.O. (High Order) bit, also called the most significant bit (MSB). Bit number 0 is the L.O. (Low Order) bit, also called the least significant bit (LSB). Likewise, a word can be divided into H.O. and L.O. bytes (bits 15-8, and 7-0, respectively). It also follows that a dword can be divided into H.O. and L.O. words (bits 31-16, and 15-0, respectively).
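The bit numbering and the H.O./L.O. splits can be illustrated with shifts and masks. This is a Python sketch with made-up helper names (bit, split_word, split_dword), not anything from the course library:

```python
def bit(value: int, n: int) -> int:
    """Return bit number n of value (bit 0 is the L.O. bit)."""
    return (value >> n) & 1

def split_word(word: int) -> tuple[int, int]:
    """Split a 16-bit word into (H.O. byte, L.O. byte)."""
    return (word >> 8) & 0xFF, word & 0xFF

def split_dword(dword: int) -> tuple[int, int]:
    """Split a 32-bit dword into (H.O. word, L.O. word)."""
    return (dword >> 16) & 0xFFFF, dword & 0xFFFF

example = 0b10110111
print(bit(example, 7), bit(example, 0))         # H.O. bit, L.O. bit
print([hex(b) for b in split_word(0xABBF)])     # ['0xab', '0xbf']
```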


04/04/97

Number representation systems


04/07/97

Shifts and Rotates

Memory


04/09/97

Computer architecture continued...


04/11/97

Quick discussion on memory and CPU clock speed and their relation to overall system performance:

Segments continued...
Segmented Addresses - Take the format seg:offset, where seg is 16 bits, allowing up to 65,536 segments, and offset is:

We will limit ourselves to using 16 bit segments, but because our programs will never be more than 64K, it won't matter. Really, we don't need to concern ourselves with segments.
Segment Registers - These are special registers that point to certain segments so that we do not need to specify what segment we are accessing in our instructions (they default to the segments pointed at by these registers)
CS - Code Segment : register points to the 64K block where you put your machine instructions.
DS - Data Segment
ES - Extra Segment : this extra segment can be used as an extra data segment if you use more than 64K.
SS - Stack Segment
In most programs we will be writing in this class, all we'll need is CS, DS, & SS.

Consider the following HLL code fragment:
int i, j
i = 100
How will these instructions be represented in assembly? The compiler will create a symbol table where each label is stored along with its address in memory.
name address
i 1000h
j 1004h
So, if we want to set i to 100, then the instruction will be MOV DS:[1000h], 100.

How does the PC access memory?
The MOV instruction is the primary way to access memory. It takes the following form:
MOV dest, src
Where dest and src must take one of the following forms:
dest src
reg reg/mem
reg/mem reg
reg/mem constant
Some points to note about the MOV instruction:

Addressing Modes on the 8086
There are 17 different addressing modes on the 8086. On the 80386 and above, there are something like 1000! You will not be required to know the 80386 modes, but you must memorize the 17 8086 addressing modes.


04/14/97

Addressing modes continued...

These forms make up the 17 addressing modes on the 8086. Here are some interesting points to note:

Assembly Language Programming

Shell.asm - template file for writing programs. Contains four segment directives to use for programming:
cseg Code segment : all machine instructions go here
dseg Data segment : declare variables here
sseg Stack segment : where values go when you use push and pop; you don't need to worry about this segment.
zzzzzzseg Heap where data is allocated dynamically with malloc and free
Point to Note: it is the program's responsibility to make sure that the CS register points at the code segment and that the DS register points at the data segment. However, you won't need to worry about this either, because it is already done for you in the Shell.asm file.
Here's another point: For each of your programs, make a copy of the Shell.asm file and delete the comments that tell you where to put things. These are only for your benefit, and are not reasonable comments. Remove them.

Segment definitions in Assembly

Dseg	segment	para public 'data'
Dseg	ends

Dseg       Name of segment
segment    Identifier
para       Align the segment on 16-byte (paragraph) boundaries
public     Other modules can see the segment
'data'     For identifying this segment when there are other segments with the same name
Variables in the Dseg are global.

Variable declarations in Assembly

 name		type	value(s) 
Points to Note:

04/16/97

Variable declarations continued...


integer		typedef	sword

char		typedef byte

longint		typedef sdword



Dseg		segment	...

i		integer	0

j		integer 10

a		char	'A'

b		longint ?

Dseg		ends

Here is what the segment looks like in memory:
location value
5 b
4 a
2 j
0 i
Here are some code sequence examples:

mov ax, i <==> mov ax, ds:[0] 

mov cx, j <==> mov cx, ds:[2] 



; The next two lines do ax = i



mov bx, 0 

mov ax, [bx] 		

Each segment has a location counter that tells the assembler where to put new data in that segment. It then inserts the data and bumps the location counter by the size of that data.
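The location counter can be modeled in a few lines of Python. This is only a sketch: the SIZES table and the lay_out function are hypothetical names, and the type sizes are the ones implied by the typedefs above (integer = sword = 2 bytes, char = byte = 1 byte, longint = sdword = 4 bytes).

```python
# Size in bytes of each type used in the data segment example.
SIZES = {"byte": 1, "char": 1, "word": 2, "sword": 2, "integer": 2,
         "dword": 4, "sdword": 4, "longint": 4}

def lay_out(declarations):
    """Assign an offset to each (name, type, count) declaration in order."""
    location = 0                 # the segment's location counter
    offsets = {}
    for name, type_name, count in declarations:
        offsets[name] = location                 # place the data here
        location += SIZES[type_name] * count     # bump by the data's size
    return offsets, location

dseg = [("i", "integer", 1), ("j", "integer", 1),
        ("a", "char", 1), ("b", "longint", 1)]
offsets, size = lay_out(dseg)
print(offsets)   # {'i': 0, 'j': 2, 'a': 4, 'b': 5}
print(size)      # 9 bytes in total
```

The offsets match the memory table shown for this segment, which is why mov ax, i and mov ax, ds:[0] are interchangeable there.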
Note: C & CH should not be used as variable names for characters as they are reserved words. C is used to tell the assembler you will be linking your code with C code, and CH is a register (the H.O. byte of CX).

Array declarations


Dseg		segment

i		integer	0,1,2,3

Dseg		ends

Here are some code examples of how to access the values of i:

mov	bx, 0

mov	ax, [bx+i]		; ax = i[0]

mov	bx, 2	

mov	cx, [bx+i]		; cx = i[1]

Characters


Dseg		segment

A		char	'R','A','N','D','Y'

B		char	'RANDY', 0

C		char	'Randy', ' Hyde', 0

D		char	'Randy Hyde',0

E		char	'Randy'

		char	' Hyde'

		byte	0

Dseg		ends

In this data segment declaration, A & B hold the same characters (B also ends with a zero byte), and C, D, and E are equivalent to one another.

This leads to a new way to declare arrays:


Dseg		segment

A		char	64 dup(?)

Dseg		ends

This makes 64 duplicate copies of ?, the base address of which is at A.

04/18/97

Arrays

Consider the following data segment declarations:


Dseg		segment	...

A		word	64 dup(?)

B		word	64 dup(0,1,2,3)

S		byte	256 dup('stack')

D		word	4 dup(4 dup(?))

Dseg		ends

A is a simple one-dimensional array of 64 words (128 bytes). B is a one-dimensional array of 256 words (512 bytes), where the values are initialized to 0,1,2,3,0,1,2,3,0,1,... S is a declaration typical of what you would see in the Shell.asm file to define and initialize the stack. Each letter in 'stack' is a byte, so the size of the stack would be 256 * 5 = 1,280 bytes. The final array D is a two-dimensional 4x4 array, totaling 16 words, or 32 bytes in overall size. Notice that the assembler doesn't really attach any particular attributes to a 2-dimensional array declared this way. You could just as easily have declared the array this way:

D		word	16 dup(?)
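To check the sizes quoted above, here is a rough Python model of what dup does: it repeats its operand list count times, and nested dups simply multiply out. The dup function below is an illustration only, with None standing in for the uninitialized value ?.

```python
def dup(count, values):
    """Model of the dup operator: repeat the operand list count times."""
    return list(values) * count

A = dup(64, [None])              # 64 dup(?)
B = dup(64, [0, 1, 2, 3])        # 64 dup(0,1,2,3)
S = dup(256, b"stack")           # 256 dup('stack'), one byte per letter
D = dup(4, dup(4, [None]))       # 4 dup(4 dup(?))

print(len(A), len(B), len(S), len(D))   # element counts: 64 256 1280 16
print(B[:8])                            # [0, 1, 2, 3, 0, 1, 2, 3]
```

Note that D comes out as one flat list of 16 elements, the same as 16 dup(?) would give.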

The fact that the array has two dimensions is only relevant in how you treat the contiguous block of memory that is set aside by the declaration. There are two different ways that you can access two-dimensional arrays:

Row-Major and Column-Major Ordering

Suppose you have a two-dimensional array B declared as follows:


B		word	4 dup(4 dup(?))

Which will result in a two dimensional array appearing as follows:

             Columns
Rows    0,0  0,1  0,2  0,3
        1,0  1,1  1,2  1,3
        2,0  2,1  2,2  2,3
        3,0  3,1  3,2  3,3

Conventionally, there are two different methods used to map this two-dimensional block into a one-dimensional block of memory. Here is what the memory will look like for each scheme:

Row Major Ordering    Column Major Ordering
0,0                   0,0
0,1                   1,0
0,2                   2,0
0,3                   3,0
1,0                   0,1
1,1                   1,1
1,2                   2,1
1,3                   3,1
...                   ...
3,2                   2,3
3,3                   3,3

For this class, we will mostly use Row Major ordering. This is mainly due to the fact that C/C++ also uses Row Major Ordering, so if we want to link our assembly programs with C or C++, they will both need to use the same array ordering scheme.

Here are formulas to use for both schemes, supposing you want to access B[y,x]

Instruction sequences used to access array elements in assembly (using Row Major Ordering), supposing you want to do AX = B[i,j], where B is a 4 x 4 array of words:

mov	bx, i

imul 	bx, 4		; multiply by rowsize

add	bx, j

add	bx, bx		; multiply by element size (2)

mov	ax, B[bx]
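The index computation in the instruction sequence above, together with its column-major counterpart, can be written out in Python. These are the standard textbook formulas; the function and parameter names are made up for this sketch.

```python
def row_major_offset(row, col, num_cols, element_size):
    """Byte offset of element [row, col] when rows are stored one after another."""
    return (row * num_cols + col) * element_size

def column_major_offset(row, col, num_rows, element_size):
    """Byte offset of element [row, col] when columns are stored one after another."""
    return (col * num_rows + row) * element_size

# B is a 4 x 4 array of words (2 bytes per element).
print(row_major_offset(1, 0, num_cols=4, element_size=2))      # 8
print(column_major_offset(1, 0, num_rows=4, element_size=2))   # 2
```

In the row-major case the steps line up with the assembly: multiply the row index by the row size, add the column index, then multiply by the element size (done with add bx, bx for words).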


04/21/97

Structures and Pointers

A structure in assembly language is basically the same as a C++ class. For example:
Declaring Structures
In C++:

struct MyStruct {

	int i;

	int j;

	float f;

};

In Assembly:

MyStruct	struct

i		sword	?

j		sword	?

f		real4	?

MyStruct 	ends

Using Structures

In C++:

MyStruct m;

m.i = ...

m.j = ...

m.f = ...

In Assembly:

m		MyStruct {}

mov		m.i, ...

mov		m.j, ...

mov		m.f, ...

Array of structs:
Declaring:


x		MyStruct 10 dup({})

Accessing:

mov	bx, index

imul	bx, 8		; multiply by the size of the struct

mov	x[bx].i, 0	

Notice, you can also use the assembly language sizeof function to compute the size of the structure, which would make the second line above be:

imul	bx, sizeof(MyStruct)
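As a cross-check on the constant 8, Python's struct module can compute the packed size of the same three fields. This is a sketch under the assumption that the fields are laid out with no padding (sword = 2 bytes, sword = 2 bytes, real4 = 4 bytes); LAYOUT, FIELD_OFFSETS and field_offset are hypothetical names.

```python
import struct

# "<" means little-endian with no padding: h = 16-bit int, f = 32-bit float.
LAYOUT = "<hhf"                      # i: sword, j: sword, f: real4
STRUCT_SIZE = struct.calcsize(LAYOUT)
FIELD_OFFSETS = {"i": 0, "j": 2, "f": 4}

def field_offset(index: int, field: str) -> int:
    """Byte offset of x[index].field from the start of the array x."""
    return index * STRUCT_SIZE + FIELD_OFFSETS[field]

print(STRUCT_SIZE)             # 8
print(field_offset(3, "i"))    # 24
```

Computing the size from the layout plays the same role as sizeof in the assembly version: the multiplier stays correct if a field is added to the structure.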

There are two reasons why this is a good idea:

Pointers

2 types:

Declaring pointers: (in data segment)


I		word	?

myPtr		dword	I 		; declare a far pointer to I

myNPtr		word	I		; initialize myNPtr with address of I

Using Pointers:

mov	ES, word ptr myPtr+2		; H.O. word of myPtr holds the segment

mov	BX, word ptr myPtr		; L.O. word of myPtr holds the offset

mov	AX, ES:[BX]			; mov ax, I

Notice, instead of the two move instructions, there is an assembly language instruction
 les	bx, myPtr 
that will automatically load ES:BX with the 32-bit operand, where the H.O. word goes into ES, and the L.O. word goes into the register specified (in this case, BX). There are also lds, lfs, lgs, and lss, which do the same with those respective registers. The general form of the instruction is:

l*s	reg, 32-bit value

Where * is e,d,f,g, or s.
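To make the memory layout of a far pointer concrete, here is a Python sketch of how the four bytes are stored and how a load like les splits them. The helper names store_far_pointer and les are for illustration only; the layout (offset in the L.O. word, segment in the H.O. word, each little-endian) follows the description above.

```python
import struct

def store_far_pointer(segment: int, offset: int) -> bytes:
    """Lay out a far pointer in memory: offset word first, then segment word."""
    return struct.pack("<HH", offset, segment)

def les(pointer_bytes: bytes) -> tuple[int, int]:
    """Model of les: return (segment, offset) from a 4-byte far pointer."""
    offset, segment = struct.unpack("<HH", pointer_bytes)
    return segment, offset

raw = store_far_pointer(0x1234, 0x5678)
print(raw.hex())                       # 78563412
print([hex(v) for v in les(raw)])      # ['0x1234', '0x5678']
```

This is why the segment half is found at myPtr+2: the offset occupies the first two bytes.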

Problems with far pointers: They are larger and instructions used to access them are slow.

Problems with near pointers: They only hold the offset, so you have to know what segment the data is in. However, they are smaller and you can use the mov instruction to access them, instead of l*s (which is much slower).

Physical address: Suppose you have a seg:offset address, xxxx:yyyy, how do you compute the physical address in memory where it will be mapped?

  xxxx0    Append a zero to the segment
+  yyyy    Add the offset
  ppppp    Result is physical address in memory

Given this method, where would the following addresses be mapped? 100:1000 & 200:0. (All values are hexadecimal.)

100:1000    200:0
  1000       2000    Append zero to segment
+ 1000     + 0000    Add in offset
  2000       2000    Physical address

Both of these addresses map to the same spot in memory. If we were to compare pointers holding these two values, the program would say they are different, when really they refer to the same location. In fact, any physical address can be written as many different seg:offset combinations (up to 4,096 of them). The solution: Normalized Addresses, of the form xxxx:y. Take the five-digit physical address and insert a colon between the least significant digit and the one before it, so the offset is always a single hexadecimal digit.
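The address arithmetic and the normalization step can be sketched in Python. The function names are hypothetical, and the 20-bit mask is an assumption about the 8086's 1 MB address space rather than something stated in these notes.

```python
def physical_address(segment: int, offset: int) -> int:
    """Append a hex zero to the segment (multiply by 16), then add the offset."""
    return ((segment << 4) + offset) & 0xFFFFF   # assumed 20-bit address space

def normalize(segment: int, offset: int) -> tuple[int, int]:
    """Rewrite seg:offset in normalized form xxxx:y, with offset in 0..F."""
    phys = physical_address(segment, offset)
    return phys >> 4, phys & 0xF

print(hex(physical_address(0x100, 0x1000)))   # 0x2000
print(hex(physical_address(0x200, 0x0)))      # 0x2000
print(normalize(0x100, 0x1000) == normalize(0x200, 0x0))   # True
```

Once both pointers are normalized, comparing their segment and offset parts directly gives the right answer.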


04/23/97

Declaring Constants in Assembly:


RowSize=4

ColSize=4

; in data segment...

MatrixOne	word	RowSize dup(ColSize dup(?)) ; creates 4 x 4 matrix

Note: Put all constants at the beginning of the source file (right after the includes), so if someone sees a constant they know right where they can look up its value.

2 attributes of a variable in memory: value and address. Example: consider the following data segment declaration:


I	word	10

Ptr	dword 	I

J	word	20

Here are the attributes of the three variables:
Variable Address Value
I DS:0 10
Ptr DS:2 DS:0
J DS:6 20
Suppose you want to set Ptr as a pointer to J. Consider the following code:

lea	ax, J

mov	Ptr, ax	

mov	Ptr+2, ds	

This is not an acceptable way to move the pointer because the mov instruction has mis-matched types (pointer is 32-bits, the registers only 16). To remedy this, use the following method:

lea	ax, J

mov	word ptr Ptr, ax

mov	word ptr Ptr+2, ds

This tells the assembler to treat the L.O. word of Ptr as a 16-bit value, thus making the mov instruction have compatible operand types. Note: word ptr will only work on memory locations, not registers. To go the other way and store a value through the pointer, load the pointer with the les instruction:

les	bx, Ptr

mov	es:[bx], ax		; store ax into the word that Ptr points at

Another instruction you can use is the lea instruction (load effective address), which loads a 16-bit destination with the offset of its source operand. Example:

lea	ax, i

Which is basically the same as the following in C:

int *pi;

int i;

pi=&i;	

Suppose you want to move a constant into the memory location that a register points at, as in the following:

mov	[bx], const

This is unacceptable to the assembler because it doesn't know the size of what bx is pointing at. To alleviate this problem, use

mov	word ptr [bx], const

This tells the assembler to treat the value pointed at by bx as a 16-bit word.

Dynamic memory allocation:

malloc: this routine will allocate memory on the heap and return a pointer to it in es:di. The number of bytes to allocate is specified by the value in the cx register. Example: consider the following declaration of a 64 x 16 matrix of words in the data segment:

 

A	word	64 dup(16 dup(?))

Here is how you would allocate it dynamically:

mov	cx, 64*16*2		; cx now holds number of bytes to allocate 

				; (rowsize * colsize * itemsize)

malloc				; allocate memory on heap

mov	word ptr A, di		; set A equal to the pointer to the newly

mov	word ptr A+2, es	; allocated memory

Now, if we want to do A[y][x] = ax (y selects the row, x the column):

; the following 4 instructions compute the index using row-major ordering

mov	bx, y			

imul	bx, 16

add	bx, x

add	bx, bx

les	di, A			; load es:di with the base address of the

				; array (previously stored in A)

mov	es:[di+bx], ax		; use base + offset addressing mode to 

				; access array element.

Accessing fields in structures through pointers:

Suppose you have the following structure, and a pointer to that structure, declared in the data segment:


S		struct

i		word	?

j		word 	?

...

S		ends

MyStruct	S	{}

SPtr		dword	MyStruct

Now, how would you access fields in the structure through the pointer? Consider the following:

les	di, SPtr

mov	ax, es:[di].i

This method will not work because the assembler doesn't know which structure es:di is pointing at, so it doesn't know which i you mean to access (a field named i could appear in many different structures). You must tell the assembler which structure you mean to access in the following manner:

mov	ax, es:[di].S.i 	

This method indicates to the assembler what structure es:di is pointing at.

04/25/97

Introduction to the UCR Standard Library

Detailed documentation on the standard library routines can be found in Chapter 7 in the Art of Assembly Text.

To use the library, you must have the following include directives in your source file:


include 	stdlib.a

includelib	stdlib.lib

Note: these are already included for you in the shell.asm template file. You must also have include & lib environment variables set up with paths to these files. This has already been done on the machines in the lab under Windows 95.

Overview of some useful routines:

Conditional jumps based on comparisons:

Signed                            Unsigned
jg    jump greater                ja    jump above
jge   jump greater or equal       jae   jump above or equal
jl    jump less than              jb    jump below
jle   jump less than or equal     jbe   jump below or equal
              je    jump equal
              jne   jump not equal

Note: all of these instructions have a jn* counterpart, which is the exact opposite. For example, jng is the opposite of jg. Also, jng is exactly the same instruction as jle. However, it is bad form to use jle when what you mean is the opposite of jg, because it can be very confusing. Your intent is much clearer if you use the jn* form of these conditional jump operations.
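The reason for having separate signed and unsigned jumps is that the same 16-bit pattern stands for different amounts under the two interpretations. A Python sketch of the idea (the function names mirror the jump mnemonics but are only models of the comparison each jump tests, and two's complement is assumed for signed values):

```python
def as_unsigned(value: int) -> int:
    """Interpret the low 16 bits as an unsigned integer (0..65535)."""
    return value & 0xFFFF

def as_signed(value: int) -> int:
    """Interpret the low 16 bits as a two's complement integer (-32768..32767)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def ja(a, b):  return as_unsigned(a) > as_unsigned(b)    # unsigned "above"
def jg(a, b):  return as_signed(a) > as_signed(b)        # signed "greater"
def jle(a, b): return as_signed(a) <= as_signed(b)       # signed "less or equal"
def jng(a, b): return not jg(a, b)                       # opposite of jg

print(ja(0xFFFF, 1))   # True:  65535 is above 1
print(jg(0xFFFF, 1))   # False: -1 is not greater than 1
```

In this model jng and jle always agree, which matches the remark that they are the same instruction under two names.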

Conditional Jumps based on Flags:
jc     Jump if carry
jnc    Jump if no carry
js     Jump if sign
jns    Jump if no sign
jo     Jump if overflow
jno    Jump if no overflow
jz     Jump if zero
jnz    Jump if not zero
These comparisons test the conditions of the flags and jump if the said condition is true.


04/28/97

Stack Operations
push mem/reg    sp = sp-2
                ss:[sp] = mem/reg
pop mem/reg     mem/reg = ss:[sp]
                sp = sp+2
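The two definitions above can be simulated directly. This Python sketch uses a small bytearray as a stand-in for the stack segment; the class name Stack8086 and the 16-byte size are made up for illustration, and no overflow or underflow checking is done.

```python
class Stack8086:
    """Toy model of a 16-bit stack that grows downward in memory."""

    def __init__(self, size: int = 16):
        self.memory = bytearray(size)   # stands in for the stack segment
        self.sp = size                  # sp starts just past the top

    def push(self, value: int) -> None:
        self.sp -= 2                                         # sp = sp-2
        self.memory[self.sp:self.sp + 2] = (value & 0xFFFF).to_bytes(2, "little")

    def pop(self) -> int:
        value = int.from_bytes(self.memory[self.sp:self.sp + 2], "little")
        self.sp += 2                                         # sp = sp+2
        return value

stack = Stack8086()
stack.push(0x1234)
stack.push(0x5678)
print(hex(stack.pop()))   # 0x5678, the last value pushed comes off first
print(hex(stack.pop()))   # 0x1234
```

Note the asymmetry: push adjusts sp before storing, pop reads before adjusting, so sp always points at the most recently pushed word.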
Points to note about stack:

Here's the code for myputs:


; assumes es:di points to a string to print 

myputs		proc	near

		push	di		; save di, since we'll be changing its value

whilelp:	mov	al, es:[di]

		cmp	al, 0		; check for zero byte (end of string)

		je	done

		putc			; print character in al

		inc 	di		

		jmp	whilelp

done:		pop	di		; restore di's old value

		ret

myputs		endp

3 common errors to avoid when working with the stack:

Here is the code for getu, which reads an unsigned integer from stdin.


; Assumes:

; In data segment ... 

; Input		byte	128 dup(?)

getu		proc	near

		push 	es

		push	di

BadNum:		lesi	Input		; es:di points at Input string

		gets			; reads string into es:di (Input)

SkipSpcs:	cmp	byte ptr es:[di], ' '

		jne	NotSpace

		inc	di

		jmp	SkipSpcs

NotSpace:	cmp	byte ptr es:[di], '0'

		jb	RetryMsg

		cmp	byte ptr es:[di], '9'

		jna	GoodNum

RetryMsg:	print

		byte "Invalid Integer, try again.",cr,lf,0

		jmp	BadNum

GoodNum:	atou			; convert string to unsigned int

		pop	di

		pop	es

		ret

getu		endp


04/30/97

Text Equates

Text equates (defined using the textequ directive) are used to literally substitute one string for another. You define text equates in the following manner:


name		textequ	<substitute>

When your program is assembled, everywhere that name appears in the source code, the assembler will insert "substitute." For the following example, assume you have written a procedure _geti. Using text equates allows you to do something like the following:

geti		textequ	<call _geti>

Now, instead of calling the procedure, you can simply write the name of the procedure (actually the text equated name, the real name is _geti), effectively making it appear like an instruction rather than a procedure. The routines in the standard library are declared in this fashion.

Parameter Passing

There are many different methods of passing parameters and places to pass them. The main methods we will study are pass by reference & pass by value. The main place we'll be passing our parameters is on the stack. Here's an example:


; ReadArray[i][j] - assumes pointer to the array to read values into 

;                   passed as parameter in es:di

ReadArray	proc	near

		getu			; returns unsigned int in ax

		mov	bx, i

		imul	bx, numcols	 

		add	bx, j

		add	bx, bx	

		mov 	es:[di+bx], ax

		ret			; return to the caller

ReadArray	endp

Here's an example of the usage:

		lesi	A1		; es:di holds base address of A1

		ReadArray	

		lesi	A2		; es:di holds base address of A2

		ReadArray	

Note: The ReadArray procedure clearly affects the ax and bx registers. These registers should be pushed on the stack upon entry, and popped off the stack on exit, to preserve their values.