NASM - The Netwide Assembler

Chapter 4: Syntax Quirks and Summaries

4.1. Summary of the `JMP` and `CALL` Syntax

The JMP and CALL instructions support several syntaxes, each suited to a particular use case. Some of the following sections explain how these two instructions interact with the special symbols that NASM uses; others document non-obvious behavior when mixing operating modes of different sizes.

4.1.1. Near Jumps

Near jumps are jumps within a single segment. The most common way to use them is probably through labels, as explained in section 3.9. APX adds a near jump instruction, JMPABS, that can jump to any 64-bit address given as an immediate operand. The instruction works with absolute addresses; its syntax options are shown in section 4.1.6.

4.1.2. Infinite Loop Trick

One quick way to write an infinite loop is to use the $ token, which evaluates to the assembly position at the beginning of the current line. A one-line infinite loop can therefore simply look like:

     jmp $
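
The same loop can also be written with an explicit label. In this sketch the label name hang is arbitrary and chosen only for illustration; each of the two instructions jumps to its own first byte:

hang:
     jmp hang                ; jumps to its own address
     jmp $                   ; also jumps to its own address (never reached)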

4.1.3. Jumps and Mixed Sizes

In some special circumstances one might need to jump between 16-bit and 32-bit code. A related issue is addressing between 16-bit and 32-bit segments. The possible cases and the relevant syntax for the two problems are explained in section 12.1 and section 12.2 respectively.

4.1.4. Calling Procedures Outside of a Shared Library

When writing shared libraries it is often necessary to call external code. In the ELF format the WRT keyword takes on a different meaning from its usual one of referencing a segment: it is used to refer to certain special symbols (see section 9.7.1). In the case described here, wrt ..plt references a PLT (procedure linkage table) entry, which can be used to call external routines as explained in section 11.2.5.
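
As a minimal sketch, a call through the PLT could look like the following. The routine name printf is only an assumption for illustration; any externally defined function is referenced the same way:

        extern  printf

        call    printf wrt ..plt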

4.1.5. `FAR` Calls and Jumps

NASM supports FAR (inter-segment) calls and jumps by means of the syntax call segment:offset, where segment and offset both represent immediate values. So to call a far procedure, you could code either of

        call    (seg procedure):procedure 
        call    weird_seg:(procedure wrt weird_seg)

(The parentheses are included for clarity, to show the intended parsing of the above instructions. They are not necessary in practice.)

NASM also supports the syntax call far procedure as a synonym for the first of the above usages. JMP works identically to CALL in these examples.

To declare a far pointer to a data item in a data segment, you must code

        dw symbol, seg symbol                ; 16 bit 
        dd symbol, word seg symbol           ; 32 bit

NASM supports no convenient synonym for this, though you can always invent one using the macro processor.
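
One possible pair of macros is sketched below. The names farptr16 and farptr32 are made up for this example and are not part of NASM; each macro simply wraps the corresponding declaration shown above:

%macro  farptr16 1
        dw      %1, seg %1              ; 16-bit offset, then segment
%endmacro

%macro  farptr32 1
        dd      %1, word seg %1         ; 32-bit offset, then 16-bit segment
%endmacro

        farptr16 symbol                 ; same as: dw symbol, seg symbol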

4.1.6. 64-bit absolute jump (`JMPABS`)

Defined as part of the APX specification, JMPABS is a new near jump instruction that takes a 64-bit absolute address as an immediate operand. It is the only direct jump instruction that can jump anywhere in the address space in 64-bit mode.

NASM allows this instruction to be specified either as:

     jmpabs target

... or:

     jmp abs target

The generated code is identical. The ABS is required regardless of the DEFAULT setting.

4.1.7. Optimizing jump lengths and sizes

JMP lengths can be specified using keywords such as SHORT and NEAR. The keyword also determines how many bytes are emitted for the assembled instruction. Consider the behavior of SHORT and NEAR in 16-bit mode, for example. If the 3-byte encoding of the jump is specifically required, the NEAR form always produces it, even when the target is within SHORT range (a displacement that fits in a signed byte). If the optimized form is wanted, it is best to omit the length specifier and let the assembler pick the encoding itself.

Size specifiers such as BYTE and WORD, by contrast, do not force a jump length: a label is just an immediate, and the size specifier applies to the operation size, not to the jump distance. Such jumps are therefore still optimized down to the shortest possible encoding:

00000000 EB09                    jmp label 
00000002 EB07                    jmp SHORT label 
00000004 E90400                  jmp NEAR label 
00000007 EB02                    jmp BYTE label 
00000009 EB00                    jmp WORD label 

                                 label:

4.2. Compact NDS/NDD Operands

Some instructions that use the VEX prefix, mainly AVX ones, take NDS (Non-Destructive Source) or NDD (New Data Destination) operands. Semantically, the instruction is given an additional operand so that none of the source operands is modified by the operation.

Syntactically, NASM allows both the full three-operand format and a compact format: if the user passes two operands instead of three, one of them is simply reused as the source or destination. These two instructions therefore have exactly the same encoding:

     vaddpd xmm0, xmm0, xmm1 
     vaddpd xmm0, xmm1

Here the XMM0 register is used as the "non-destructive source", even though in this case it is of course modified.

4.3. 64-bit moffs

The moffs operand can be used with the MOV instruction, but only together with the "A" register (AL, AX, EAX, or RAX). For non-64-bit operand sizes it addresses memory at an offset from a segment; for 64-bit operands it simply accesses memory at the specified offset (since segment-based addressing is mostly unavailable in 64-bit mode). The syntax for using 64-bit offsets to address memory is shown in section 13.2.2.
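
A minimal sketch of such an access, assuming 64-bit mode; the address is an arbitrary value chosen for illustration, and section 13.2.2 remains the authoritative description of the syntax:

     bits 64
     mov rax, [qword 0x1122334455667788]   ; load from a full 64-bit absolute address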

4.4. Split EA Addressing Syntax

Instructions that take a mib operand (that is, memory addressed by a base register plus a displacement, plus an index register multiplied by a scale factor) can also use the split EA (effective address) syntax. This form is mainly intended for MPX instructions that use mib operands, but it can be used for any memory reference. The basic idea is to separate the base and the index with a comma:

     mov eax,[ebx+8,ecx*4]   ; ebx=base, ecx=index, 4=scale, 8=disp

NASM supports all currently possible forms of the mib syntax:

     ; bndstx 
     ; the next 5 lines are parsed identically
     ; base=rax, index=rbx, scale=1, displacement=3
     bndstx [rax+0x3,rbx], bnd0      ; NASM - split EA
     bndstx [rbx*1+rax+0x3], bnd0    ; GAS - '*1' indicates an index reg
     bndstx [rax+rbx+3], bnd0        ; GAS - without hints 
     bndstx [rax+0x3], bnd0, rbx     ; ICC-1 
     bndstx [rax+0x3], rbx, bnd0     ; ICC-2

4.5. No Syntax for Ternary Logic Instruction

VPTERNLOGD and VPTERNLOGQ implement an arbitrary logic function of three inputs. They take three register operands and one immediate value that selects the logic function to apply. Specifically, the 8-bit immediate encodes the output column of the function's truth table: three binary inputs have 8 possible combinations, each with one output bit, so the immediate can select any one of 256 possible logic functions. It is therefore not practical to provide dedicated syntax for each function.

However, some macro solutions can help avoid writing out truth tables by hand. The simplest, more manual way is to compute the encoding of the logic operation with a few lines of assembly-time arithmetic:

     a equ 0xaa 
     b equ 0xcc 
     c equ 0xf0 
     imm equ a | b & c

Here, the values of "a", "b" and "c" together enumerate all possible bit combinations that the inputs of a 3-input function can take ("a" being the least significant bit and "c" the most significant). The "imm" symbol is then computed by evaluating the desired logic function, in this case "a or (b and c)", which yields the output column one would get by writing out the truth table.

Note that the expression must be written using only the bitwise operators &, |, ^, and ~. Using the boolean operators &&, ||, ^^, ! and ? : will not work correctly.
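
As a worked sketch, here are the immediates for two common three-input functions, using the same a, b and c values as above. The names xor3 and maj3 are made up for this example. Both functions are symmetric in their inputs, so the result does not depend on which operand corresponds to which symbol:

     a equ 0xaa
     b equ 0xcc
     c equ 0xf0
     xor3 equ a ^ b ^ c                       ; = 0x96, three-way XOR
     maj3 equ (a & b) | (a & c) | (b & c)     ; = 0xe8, majority of three bits

     vpternlogd xmm1, xmm2, xmm3, xor3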

The vtern standard macro package, section 7.6, allows for these kinds of expressions without introducing the symbols a, b and c into the global namespace:

%use vtern 
     vpternlogd xmm1, xmm2, xmm3, a | b & c 
     vpternlogq ymm4, ymm5, ymm6, (b ^ c) & ~a
     ; a, b, and c are not defined as symbols elsewhere

4.6. APX Instruction Syntax

Intel APX (Advanced Performance Extensions) introduces multiple new features, mostly to existing instructions. APX is only available in 64-bit mode.

There are 16 new general purpose registers, R16 to R31.
Many instructions now support a non-destructive destination operand.
The ability to suppress the setting of the arithmetic flags.
The ability to zero the upper parts of a full 64-bit register for 8- and 16-bit operation size instructions. (This zeroing is always performed for 32-bit operations; this has been the case since 64-bit mode was first introduced.)
New instructions to conditionally set the arithmetic flags to a user-specified value.
Performance-enhanced versions of the PUSH and POP instructions.
A 64-bit absolute jump instruction.
A new REX2 prefix.

See https://www.nasm.us/specs/apx for a link to the APX technical documentation. NASM generally follows the syntax specified in the Assembly Syntax Recommendations for Intel APX document, although some syntax is relaxed, as described below.

4.6.1. Extended General Purpose Registers (EGPRs)

When it comes to register size, the new registers (R16–R31) work the same way as registers R8–R15 (see also section 13.1):

R31 is the 64-bit form of register 31,
R31D is the 32-bit form,
R31W is the 16-bit form, and
R31B is the 8-bit form. The form R31L can also be used if the altreg macro package is used (%use altreg), see section 7.1.

Extended registers require that either a REX2 prefix (the default, if possible) or an EVEX prefix is used.

Some instructions don't support EGPRs; NASM generates an error if an EGPR is used with one of them.

4.6.2. New Data Destination (NDD)

A new data destination register (when supported) is specified by adding an extra register as the first operand. For example, the ADD instruction:

     add rax, rbx, rcx

... adds RBX and RCX and stores the result in RAX, without modifying either RBX or RCX.
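
For comparison, a rough legacy (non-APX) equivalent needs two instructions. This is only a sketch of the semantics, not something NASM emits for the NDD form:

     mov rax, rbx            ; copy the first source
     add rax, rcx            ; rax = rbx + rcx, flags set by the ADD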

4.6.3. Suppress Modifying Flags (NF)

The {nf} prefix on a supported instruction inhibits the update of the flags, for example:

     {nf} add rax, rbx

... will add RAX and RBX together, storing the result in RAX, while leaving the flags register unchanged.

NASM also allows the {nf} prefix (or any other curly-brace prefix) to be specified after the instruction mnemonic. Spaces around curly-brace prefixes are optional:

     {nf} add rax, rbx       ; Standard syntax 
     {nf}add  rax, rbx       ; Prefix without space 
     add {nf} rax, rbx       ; Suffix syntax 
     add{nf}  rax, rbx       ; Suffix without space

4.6.4. Zero Upper (ZU)

The {zu} ("zero-upper") prefix disables retaining the upper parts of the destination register and instead zero-extends the result into the full 64-bit register when the operand size is 8 or 16 bits (this is always done when the operand size is 32 bits, even without APX). For example:

     {zu} setb al

... zeroes out bits [63:8] of the RAX register. For this specific instruction, NASM also accepts these alternate syntaxes:

     {zu} setb ax 
     setb {zu} al 
     setb {zu} ax 
     setb {zu} eax 
     setb {zu} rax 
     setb eax 
     setb rax

4.6.5. Source Condition Code (Scc) and Default Flags Value (DFV)

The source condition code (Scc) instructions, CCMPScc and CTESTScc, evaluate the condition Scc against the current flags. If it holds, the comparison or test is performed and sets the arithmetic flags as usual; otherwise the flags are set to a user-specified default value (DFV).

NASM allows the default flags value to be specified either using the {dfv=...} syntax, containing a comma-separated list of zero or more of the CPU flags OF, SF, ZF or CF, or simply as a numeric immediate (with OF, SF, ZF and CF represented by bits 3 to 0, in that order).

The PF flag is always set to the same value as the CF flag, and the AF flag is always cleared. NASM allows {dfv=pf} as an alias for {dfv=cf}, but do note that it still affects both flags.

NASM allows, but does not require, a comma after the {dfv=} value; when using the immediate syntax, a comma is required. These examples all produce the same instruction:

     ccmpl {dfv=of,cf} rdx, r30 
     ccmpl {dfv=of,cf}, rdx, r30 
     ccmpl 0x9, rdx, r30                     ; Comma required
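
The bit assignment can be spelled out with a few constants. This is only a sketch: the DFV_* names are made up for this example and are not defined by NASM, and it assumes that the immediate form accepts any constant expression:

DFV_OF equ 1 << 3
DFV_SF equ 1 << 2
DFV_ZF equ 1 << 1
DFV_CF equ 1 << 0

     ccmpl DFV_OF | DFV_CF, rdx, r30         ; 0x9, same as {dfv=of,cf}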

The immediate syntax also allows the {dfv=} values to be stored in a symbol or used in arithmetic. Note that when a {dfv=} value is used in an expression, or in any context other than EQU or one of the Scc instructions, parentheses are required; this is a safety measure (the programmer needs to indicate explicitly that use as an expression is intended):

     ccmpl ({dfv=of}|{dfv=cf}), rdx, r30     ; Parens, comma required 
ocf1 equ {dfv=of,cf}                         ; Parens not required 
     ccmpl ocf1, rdx, r30                    ; Comma required 
ofcf2 equ ({dfv=of,sf,cf} & ~{dfv=sf})       ; Parens required
     ccmpl ofcf2, rdx, r30                   ; Comma required

4.6.6. `PUSH` and `POP` Extensions

APX adds variations of the PUSH and POP instructions that:

inform the CPU that a specific PUSH and POP constitute a matched pair, allowing the hardware to optimize for this common use case: PUSHP and POPP;
operate on two registers at the same time: PUSH2 and POP2, with paired variants PUSH2P and POP2P.

These extensions only apply to register forms; they are not supported for memory or immediate operands.

The standard syntax for PUSH2(P) and POP2(P) specifies the registers in the order in which they are pushed onto and popped from the stack:

     push2p rax, rbx 
     ; rax in [rsp+8]
     ; rbx in [rsp+0]
     pop2p rbx, rax

... would be the equivalent of:

     push rax 
     push rbx 
     ; rax in [rsp+8]
     ; rbx in [rsp+0]
     pop rbx 
     pop rax

NASM also allows the registers to be specified as a register pair separated by a colon. In this form the registers are always given in the order high:low, which is the same for PUSH2 and POP2. This means that the operand order of the POP2 instruction differs from the standard syntax:

     push2p rax:rbx 
     ; rax in [rsp+8]
     ; rbx in [rsp+0]
     pop2p rax:rbx

4.6.7. APX and the NASM optimizer

When the optimizer is enabled (see section 2.1.24), NASM may apply a number of optimizations, some of which substitute non-APX instructions for what would otherwise be APX forms. Some examples are:

The {nf} prefix may be ignored on instructions that already don't modify the arithmetic flags.
When the {nf} prefix is specified, NASM may generate another instruction which would not modify the flags register. For example, {nf} ror rax, rcx, 3 can be translated into rorx rax, rcx, 3.
The {zu} prefix may be ignored on instructions that already zero the upper parts of the destination register.
When the {zu} prefix is specified, NASM may generate another instruction which would zero the upper part of the register. For example, {zu} mov ax, cs can be translated into mov eax, cs.
New data destination or nondestructive source operands may be contracted if they are the same (and the semantics are otherwise identical). For example, add eax, eax, edx could be encoded as add eax, edx using legacy encoding. NASM does not perform this optimization as of version 3.00, but it probably will in the future.

4.6.8. Force APX Encoding

APX encoding, using REX2 or EVEX, can be forced with the {rex2} or {evex} instruction prefix, respectively.