Some of the most important and most frequently used instructions are those that move data. Without them, there would be no way for registers or memory to even have anything in them to operate on.
Data transfer instructions
Move
| mov src, dest | GAS Syntax | 
| mov dest, src | Intel Syntax | 
mov stands for move.
Despite its name the mov instruction copies the src operand into the dest operand.
After the operation both operands contain the same contents.
Operands
| dest operand | src operand: immediate value | src operand: register | src operand: memory |
|---|---|---|---|
| register | Yes (into larger register) | Yes (same size) | Yes (register determines size of retrieved memory) |
| memory | Yes (up to 32-bit values) | Yes | No |
Modified flags
- No FLAGS are modified by this instruction
Example
.data
value:
	.long 2
.text
	.globl _start
_start:
	movl $6, %eax                         # eax ≔ 6
	                                      #  └───────┐
	movw %ax, value                       # value ≔ ax (low 16 bits of eax)
	                                      #   └───────────┐
	movl $0, %ebx                         # ebx ≔ 0  │    │
	                                      #       ┌──┘    │
	movb %al, %bl                         # bl ≔ al       │
	                                      # %ebx is now 6 │
	                                      #         ┌─────┘
	movl value, %ebx                      # ebx ≔ value
	
	movl $value, %esi                     # esi ≔ @value
	# %esi is now the address of value
	
	xorl %ebx, %ebx                       # ebx ≔ ebx ⊻ ebx
	                                      # %ebx is now 0
	
	movw value(, %ebx, 1), %bx            # bx ≔ value[ebx*1]
	                                      # %ebx is now 6
	
# Linux sys_exit
	movl $1, %eax                         # eax ≔ 1
	xorl %ebx, %ebx                       # ebx ≔ 0
	int $0x80
Data swap
| xchg src, dest | GAS Syntax | 
| xchg dest, src | Intel Syntax | 
xchg stands for exchange.
The xchg instruction swaps the src operand with the dest operand.
It is like doing three mov operations:
- from dest to a temporary (another register),
- then from src to dest, and finally
- from the temporary storage to src,
except that no register needs to be reserved for temporary storage.
This exchange pattern of three consecutive mov instructions can be detected by the DFU present in some architectures, which will trigger special treatment.
The opcode for xchg is shorter though.
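The three-step description above can be written down as a small C model. This is only a sketch of the semantics, not of how the processor implements xchg; the function name xchg32 and the 32-bit operand size are assumptions made for the illustration.

```c
#include <stdint.h>

/* Model of "xchg" on two 32-bit locations: afterwards each location
 * holds the other's former value. The local variable tmp plays the
 * role of the temporary register that xchg itself does not need. */
void xchg32(uint32_t *src, uint32_t *dest)
{
    uint32_t tmp = *dest;  /* mov: dest      -> temporary */
    *dest = *src;          /* mov: src       -> dest      */
    *src = tmp;            /* mov: temporary -> src       */
}
```

Called with pointers to a variable holding 2 and one holding 54, the model leaves 54 in the first and 2 in the second.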
Operands
Any combination of register or memory operands, except that at most one operand may be a memory operand. You cannot exchange two memory blocks.
Modified Flags
None.
Example
 .data
 
 value:
        .long   2
 
 .text
        .global _start
 
 _start:
        movl    $54, %ebx
        xorl    %eax, %eax
 
        xchgl   value, %ebx
        # %ebx is now 2
        # value is now 54
 
        xchgw   %ax, value
        # value is now 0
        # %eax is now 54
 
        xchgb   %al, %bl
        # %ebx is now 54
        # %eax is now 2
 
        xchgw   value(%eax), %ax
        # value is now 0x00020000 = 131072
        # %eax is now 0
 
 # Linux sys_exit 
        mov     $1, %eax
        xorl    %ebx, %ebx
        int     $0x80
Application
If one of the operands is a memory address, then the operation has an implicit lock prefix, that is, the exchange operation is atomic.
This can have a large performance penalty.
However, on some platforms exchanging two (non-partial) registers will trigger the register renamer. The register renamer is a unit in the processor that merely renames registers, so no data actually have to be moved. This is very fast (branded as “zero-latency”). Renaming registers can be useful since
- some instructions require certain operands to be located in a specific register, but the data in that register will be needed later on,
- or encoding some opcodes is shorter if one of the operands is the accumulator register.
The xchg instruction is also used for changing the byte order (LE ↔ BE) of 16-bit values, because the bswap instruction is only available for 32- and 64-bit values.
You do so by addressing the two partial registers of a 16-bit register, e. g. xchg ah, al.
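As a rough illustration of what xchg ah, al does to the value in ax, here is a C model of the byte swap. The function name swap_bytes16 is made up for this sketch; it mirrors only the result, not the instruction encoding.

```c
#include <stdint.h>

/* Model of "xchg ah, al": exchange the high and the low byte of a
 * 16-bit value, which converts between little and big endian. */
uint16_t swap_bytes16(uint16_t ax)
{
    uint8_t al = (uint8_t)(ax & 0xffu);  /* low byte  */
    uint8_t ah = (uint8_t)(ax >> 8);     /* high byte */
    return (uint16_t)((al << 8) | ah);   /* al moves up, ah moves down */
}
```

Applying the model twice returns the original value, just as executing the exchange twice would.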
It is also worth noting that the common nop (no operation) instruction, 0x90, is the opcode for xchgl %eax, %eax.
Data swap based on comparison
| cmpxchg arg2, arg1 | GAS Syntax | 
| cmpxchg arg1, arg2 | Intel Syntax | 
cmpxchg stands for compare and exchange.
Exchange is misleading as no data are actually exchanged.
The cmpxchg instruction has one implicit operand: the al/ax/eax depending on the size of arg1.
- The instruction compares arg1 to al/ax/eax.
- If they are equal, arg1 becomes arg2. (arg1 = arg2)
- Otherwise, al/ax/eax becomes arg1.
Unlike xchg there is no implicit lock prefix, and if the instruction is required to be atomic, lock has to be prefixed.
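The behaviour described above can be modelled in a few lines of C. This is a non-atomic sketch for 32-bit operands only; the function name cmpxchg32, the pointer parameter standing in for eax, and the boolean return value standing in for ZF are all assumptions of the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Non-atomic model of "cmpxchg arg1, arg2" (Intel operand order).
 * *eax stands for the implicit accumulator operand.
 * The return value models the zero flag. */
bool cmpxchg32(uint32_t *arg1, uint32_t arg2, uint32_t *eax)
{
    if (*arg1 == *eax) {
        *arg1 = arg2;   /* equal: arg1 receives arg2, ZF = 1 */
        return true;
    }
    *eax = *arg1;       /* not equal: accumulator receives arg1, ZF = 0 */
    return false;
}
```

With a lock word of 0 and an accumulator of 0, the first call stores the new value and reports success. A second call with the accumulator reset to 0 fails and loads the current lock value into the accumulator instead.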
Operands
arg2 has to be a register.
arg1 may be either a register or memory operand.
Modified flags
- ZF ≔ arg1 = (al|ax|eax) [depending on arg1’s size]
- CF, PF, AF, SF, OF are altered, too.
Application
The following example shows how to use the cmpxchg instruction to create a spin lock which will be used to protect the result variable.
The last thread to grab the spin lock will get to set the final value of result:
| example for a spin lock | 
|---|
| global main 
extern printf
extern pthread_create
extern pthread_exit
extern pthread_join
section .data
	align 4
	sLock:		dd 0	; The lock, values are:
				; 0	unlocked
				; 1	locked	
	tID1:		dd 0
	tID2:		dd 0
	fmtStr1:	db "In thread %d with ID: %02x", 0x0A, 0
	fmtStr2:	db "Result %d", 0x0A, 0
section .bss
	align 4
	result:		resd 1
section .text
	main:			; Using main since we are using gcc to link
				;
				; Call pthread_create(pthread_t *thread, const pthread_attr_t *attr,
				;			void *(*start_routine) (void *), void *arg);
				;
	push	dword 0		; Arg Four: argument pointer
	push	thread1		; Arg Three: Address of routine
	push	dword 0		; Arg Two: Attributes
	push	tID1		; Arg One: pointer to the thread ID
	call	pthread_create
	push	dword 0		; Arg Four: argument pointer
	push	thread2		; Arg Three: Address of routine
	push	dword 0		; Arg Two: Attributes
	push	tID2		; Arg One: pointer to the thread ID
	call	pthread_create
				;
				; Call int pthread_join(pthread_t thread, void **retval) ;
				;
	push	dword 0		; Arg Two: retval
	push	dword [tID1]	; Arg One: Thread ID to wait on
	call	pthread_join
	push	dword 0		; Arg Two: retval
	push	dword [tID2]	; Arg One: Thread ID to wait on
	call	pthread_join
	push	dword [result]
	push	dword fmtStr2
	call	printf
	add	esp, 8		; Pop stack 2 times 4 bytes
	call exit
thread1:
	pause
	push	dword [tID1]
	push	dword 1	
	push	dword fmtStr1
	call	printf
	add	esp, 12		; Pop stack 3 times 4 bytes
	call	spinLock
	mov	[result], dword 1
	call	spinUnlock
	push	dword 0		; Arg one: retval
	call	pthread_exit
thread2:
	pause
	push	dword [tID2]
	push	dword 2	
	push	dword fmtStr1
	call	printf
	add	esp, 12		; Pop stack 3 times 4 bytes
	call	spinLock
	mov	[result], dword 2
	call	spinUnlock
	push	dword 0		; Arg one: retval
	call	pthread_exit
spinLock:
	push	ebp
	mov	ebp, esp
	mov	edx, 1		; Value to set sLock to
spin:	mov	eax, [sLock]	; Check sLock
	test	eax, eax	; If it was zero, maybe we have the lock
	jnz	spin		; If not try again
	;
	; Attempt atomic compare and exchange:
	; if (sLock == eax):
	;	sLock		<- edx
	;	zero flag	<- 1
	; else:
	;	eax		<- sLock
	;	zero flag	<- 0
	;
	; If sLock is still zero then it will have the same value as eax and
	; sLock will be set to edx which is one and therefore we acquire the
	; lock. If the lock was acquired between the first test and the
	; cmpxchg then eax will not be zero and we will spin again.
	;
	lock	cmpxchg [sLock], edx
	test	eax, eax
	jnz	spin
	pop	ebp
	ret
spinUnlock:
	push	ebp
	mov	ebp, esp
	mov	eax, 0
	xchg	eax, [sLock]
	pop	ebp
	ret
exit:
				;
				; Call exit(3) syscall
				;	void exit(int status)
				;
	mov	ebx, 0		; Arg one: the status
	mov	eax, 1		; Syscall number:
	int 	0x80
In order to assemble, link and run the program we need to do the following:
$ nasm -felf32 -g cmpxchgSpinLock.asm
$ gcc -o cmpxchgSpinLock cmpxchgSpinLock.o -lpthread
$ ./cmpxchgSpinLock
 | 
Move with zero extend
| movz src, dest | GAS Syntax | 
| movzx dest, src | Intel Syntax | 
movz stands for move with zero extension.
Like the regular mov the movz instruction copies data from the src operand to the dest operand, but the remaining bits in dest that are not provided by src are filled with zeros.
This instruction is useful for copying a small, unsigned value to a bigger register.
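In C terms, zero extension is what happens when an unsigned value is converted to a wider unsigned type. The following sketch models the byte-to-long form; the function is named after the GAS mnemonic movzbl purely for illustration.

```c
#include <stdint.h>

/* Model of movzbl: widen an 8-bit value to 32 bits.
 * The upper 24 bits of the result are always zero. */
uint32_t movzbl(uint8_t src)
{
    return (uint32_t)src;
}
```

The value is preserved for every possible byte, including those with the most significant bit set.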
Operands
Dest has to be a register, and src can be either another register or a memory operand.
For this operation to make sense dest has to be larger than src.
Modified flags
There are none.
Example
 .data
 
 byteval:
        .byte   204
 
 .text
        .global _start
 
 _start:
        movzbw  byteval, %ax
        # %ax is now 204 (the upper half of %eax is unchanged)
 
        movzwl  %ax, %ebx
        # %ebx is now 204
 
        movzbl  byteval, %esi
        # %esi is now 204
 
 # Linux sys_exit 
        mov     $1, %eax
        xorl    %ebx, %ebx
        int     $0x80
Move with sign extend
| movs src, dest | GAS Syntax | 
| movsx dest, src | Intel Syntax | 
movsx stands for move with sign extension.
The movsx instruction copies the src operand into the dest operand and pads the remaining bits not provided by src with the sign bit (the MSB) of src.
This instruction is useful for copying a signed small value to a bigger register.
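The padding with the sign bit can be spelled out explicitly in C. This is a sketch of the byte-to-long form only; the function is named after the GAS mnemonic movsbl for illustration, and it returns the raw 32-bit pattern rather than a signed number.

```c
#include <stdint.h>

/* Model of movsbl: widen an 8-bit value to 32 bits, filling the
 * upper 24 bits with copies of the sign bit (bit 7) of src. */
uint32_t movsbl(uint8_t src)
{
    uint32_t dest = src;      /* low 8 bits come from src      */
    if (src & 0x80u)          /* is the sign bit of src set?   */
        dest |= 0xffffff00u;  /* yes: fill upper bits with 1s  */
    return dest;              /* no: upper bits stay 0         */
}
```

For the byte 0xe8 (that is, -24) the model yields 0xffffffe8, while a byte without the sign bit set is widened unchanged.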
Operands
movsx accepts the same operands as movzx.
Modified Flags
movsx does not modify any flags, either.
Example
 .data
 
 byteval:
        .byte   -24 # = 0xe8
 
 .text
        .global _start
 
 _start:
        movsbw  byteval, %ax
        # %ax is now -24 = 0xffe8
 
        movswl  %ax, %ebx
        # %ebx is now -24 = 0xffffffe8
 
        movsbl  byteval, %esi
        # %esi is now -24 = 0xffffffe8
 
 # Linux sys_exit 
        mov     $1, %eax
        xorl    %ebx, %ebx
        int     $0x80
Move String
movsb
Move byte.
The movsb instruction copies one byte from the memory location specified in esi to the location specified in edi.
If the direction flag is cleared, then esi and edi are incremented after the operation. Otherwise, if the direction flag is set, then the pointers are decremented.
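With the rep prefix the instruction is repeated, decrementing ecx each time, until ecx is zero. If the direction flag is set, the copy therefore happens in the reverse direction, starting at the highest address and moving toward lower addresses.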
Operands
There are no explicit operands, but
- ecx determines the number of iterations (when the rep prefix is used),
- esi specifies the source address,
- edi the destination address, and
- DF is used to determine the direction (it can be altered by the cld and std instructions).
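The interplay of these implicit operands can be sketched as a C model of rep movsb. Memory is modelled as a plain byte array and the registers as array indices; the struct string_regs and the function name rep_movsb are inventions of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Registers used implicitly by the string move, modelled as indices
 * into a byte array instead of real addresses. */
struct string_regs {
    size_t esi;  /* source index      */
    size_t edi;  /* destination index */
    size_t ecx;  /* repeat count      */
};

/* Model of "rep movsb". df models the direction flag:
 * 0 = forward (after cld), 1 = backward (after std). */
void rep_movsb(uint8_t *mem, struct string_regs *r, int df)
{
    while (r->ecx != 0) {
        mem[r->edi] = mem[r->esi];  /* movsb: copy one byte */
        if (df == 0) {
            r->esi++;               /* DF clear: move up    */
            r->edi++;
        } else {
            r->esi--;               /* DF set: move down    */
            r->edi--;
        }
        r->ecx--;                   /* rep: count down      */
    }
}
```

For a backward copy, esi and edi have to start at the last byte of the source and destination, not at the first.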
Modified flags
No flags are modified by this instruction.
Example
section .text
  ; copy mystr into mystr2
  mov esi, mystr    ; loads address of mystr into esi
  mov edi, mystr2   ; loads address of mystr2 into edi
  cld               ; clear direction flag (forward)
  mov ecx,6
  rep movsb         ; copy six times
 
section .bss
  mystr2: resb 6
 
section .data
  mystr db "Hello", 0x0
movsw
Move word
The movsw instruction copies one word (two bytes) from the location specified in esi to the location specified in edi. It basically does the same thing as movsb, except with words instead of bytes.
Operands
There are no explicit operands; the implicit operands are the same as for movsb.
Modified flags
- No FLAGS are modified by this instruction
Example
section .text
  ; copy mystr into mystr2
  mov esi, mystr
  mov edi, mystr2
  cld
  mov ecx,4
  rep movsw
  ; mystr2 is now AaBbCca\0
 
section .bss
  mystr2: resb 8
 
section .data
  mystr db "AaBbCca", 0x0
Load Effective Address
| lea src, dest | GAS Syntax | 
| lea dest, src | Intel Syntax | 
lea stands for load effective address.
The lea instruction calculates the address of the src operand and loads it into the dest operand.
Operands
src
- Memory (any valid address expression; the address is only calculated, the memory is not accessed)
dest
- Register
Modified flags
- No FLAGS are modified by this instruction
Note
Load Effective Address calculates its src operand in the same way as the mov instruction does, but rather than loading the contents of that address into the dest operand, it loads the address itself.
lea can be used not only for calculating addresses, but also general-purpose unsigned integer arithmetic (with the caveat and possible advantage that FLAGS are unmodified).
This can be quite powerful, since the src operand can take up to 4 parameters: base register, index register, scalar multiplier and displacement, e.g. [eax + edx*4 -4] (Intel syntax) or -4(%eax, %edx, 4) (GAS syntax).
The scalar multiplier is limited to constant values 1, 2, 4, or 8 for byte, word, double word or quad word offsets respectively.
This by itself allows for multiplication of a general register by constant values 2, 3, 4, 5, 8 and 9, as shown below (using NASM syntax):
lea ebx, [ebx*2]      ; Multiply ebx by 2
lea ebx, [ebx*8+ebx]  ; Multiply ebx by 9, which totals ebx*18
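To see why this works as arithmetic, the address calculation can be modelled as an ordinary C expression. The function name lea and its four parameters are assumptions of this sketch. Note that the model accepts any scale value, whereas the real instruction only encodes 1, 2, 4 or 8.

```c
#include <stdint.h>

/* Model of the address calculation base + index*scale + disp.
 * The real instruction only encodes scale = 1, 2, 4 or 8.
 * Arithmetic wraps around modulo 2^32, and no flags are involved. */
uint32_t lea(uint32_t base, uint32_t index, uint32_t scale, int32_t disp)
{
    return base + index * scale + (uint32_t)disp;
}
```

Multiplying by 9 corresponds to lea(x, x, 8, 0), and the address expression [eax + edx*4 - 4] corresponds to lea(eax, edx, 4, -4).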
Conditional Move
| cmovcc src, dest | GAS Syntax | 
| cmovcc dest, src | Intel Syntax | 
cmov stands for conditional move.
It behaves like mov but the execution depends on various flags.
The following instructions are available:
| condition | condition = 1 | condition = 0 |
|---|---|---|
| ZF | cmovz, cmove | cmovnz, cmovne |
| OF | cmovo | cmovno |
| SF | cmovs | cmovns |
| CF | cmovc, cmovb, cmovnae | cmovnc, cmovnb, cmovae |
| CF ∨ ZF | cmovbe, cmovna | cmovnbe, cmova |
| PF | cmovp, cmovpe | cmovnp, cmovpo |
| SF = OF | cmovge, cmovnl | cmovnge, cmovl |
| ZF ∨ (SF ≠ OF) | cmovng, cmovle | cmovg, cmovnle |
Operands
Dest has to be a register.
Src can be either a register or memory operand.
Application
The cmov instructions can be used to eliminate branches, and thereby avoid branch mispredictions.
However, the cmov instructions need to be used wisely:
unlike a correctly predicted branch, a cmov always depends on both of its operands and on the flags, so the dependency chain will become longer.
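A typical candidate for a conditional move is selecting the larger of two values. The C sketch below is written so that each statement corresponds to one instruction of a cmp/cmovg sequence; the function name max_cmov is made up, and whether a compiler actually emits cmov for it depends on the compiler and its options.

```c
#include <stdint.h>

/* Select the larger of two signed values without needing a jump.
 * The comments show the instruction each step corresponds to. */
int32_t max_cmov(int32_t a, int32_t b)
{
    int32_t result = a;  /* mov   result, a                  */
    if (b > a)           /* cmp   b, a                       */
        result = b;      /* cmovg result, b  (taken if b > a) */
    return result;
}
```

Both values are available before the selection happens, which is exactly the extra data dependency mentioned above.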
Data transfer instructions of 8086 microprocessor
General
General purpose byte or word transfer instructions:
- mov: copy byte or word from specified source to specified destination
- push: copy specified word to top of stack
- pop: copy word from top of stack to specified location
- pusha: copy all registers to stack
- popa: copy words from stack to all registers
- xchg: exchange bytes or exchange words
- xlat: translate a byte in al using a table in memory
Input/Output
These are I/O port transfer instructions:
- in: copy a byte or word from specific port to accumulator
- out: copy a byte or word from accumulator to specific port
Address Transfer Instruction
Special address transfer Instructions:
- lea: load effective address of operand into specified register
- lds: load DS register and other specified register from memory
- les: load ES register and other specified register from memory
Flags
Flag transfer instructions:
- lahf: load ah with the low byte of flag register
- sahf: store ah register to low byte of flag register
- pushf: copy flag register to top of stack
- popf: copy top of stack word to flag register