## SSE2 — OpCode List

(under construction -- this list might be incomplete?!)

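Additionally, AMD64's register extensions change some of the functionality: in 64bit mode there are 16 `XMM` registers instead of 8, and several instructions that take a `GPR` operand (such as `cvtsi2sd`, `cvtsd2si` and `movnti`) gain 64bit forms.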

Note that a few entries below (`rcpps`, `rcpss`, `andnps`, `cvtsi2ss`, `cvtss2si`, `cvttps2pi`, `cvttss2si`) were introduced with SSE rather than SSE2.

Arithmetic:

`addpd`

- Adds 2 64bit doubles.

`addsd`

- Adds bottom 64bit doubles.

`subpd`

- Subtracts 2 64bit doubles.

`subsd`

- Subtracts bottom 64bit doubles.

`mulpd`

- Multiplies 2 64bit doubles.

`mulsd`

- Multiplies bottom 64bit doubles.

`divpd`

- Divides 2 64bit doubles.

`divsd`

- Divides bottom 64bit doubles.

`maxpd`

- Gets largest of 2 64bit doubles for 2 sets.

`maxsd`

- Gets largest of 2 64bit doubles for bottom set.

`minpd`

- Gets smallest of 2 64bit doubles for 2 sets.

`minsd`

- Gets smallest of 2 64bit values for bottom set.
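
The difference between the packed (`pd`) and scalar (`sd`) forms above can be sketched with compiler intrinsics. This is a minimal illustration, assuming a C compiler that provides `<emmintrin.h>`; the helper names `add_packed` and `add_scalar` are hypothetical, and `_mm_add_pd` / `_mm_add_sd` are the intrinsics that usually compile to `addpd` / `addsd`.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Packed add (addpd): both 64bit lanes are summed. */
static void add_packed(const double a[2], const double b[2], double out[2])
{
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_add_pd(va, vb));
}

/* Scalar add (addsd): only the bottom lane is summed; the top lane of the
   result is copied from the first operand. */
static void add_scalar(const double a[2], const double b[2], double out[2])
{
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_add_sd(va, vb));
}
```

The same packed/scalar split applies to the `sub`, `mul`, `div`, `max`, `min` and `sqrt` entries.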

`paddb`

- Adds 16 8bit integers.

`paddw`

- Adds 8 16bit integers.

`paddd`

- Adds 4 32bit integers.

`paddq`

- Adds 2 64bit integers.

`paddsb`

- Adds 16 8bit signed integers using saturation.

`paddsw`

- Adds 8 16bit signed integers using saturation.

`paddusb`

- Adds 16 8bit unsigned integers using saturation.

`paddusw`

- Adds 8 16bit unsigned integers using saturation.

`psubb`

- Subtracts 16 8bit integers.

`psubw`

- Subtracts 8 16bit integers.

`psubd`

- Subtracts 4 32bit integers.

`psubq`

- Subtracts 2 64bit integers.

`psubsb`

- Subtracts 16 8bit signed integers using saturation.

`psubsw`

- Subtracts 8 16bit signed integers using saturation.

`psubusb`

- Subtracts 16 8bit unsigned integers using saturation.

`psubusw`

- Subtracts 8 16bit unsigned integers using saturation.
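
To illustrate what "using saturation" means compared with the plain adds, here is a small sketch using intrinsics. It assumes `<emmintrin.h>` is available; the function names are hypothetical, and `_mm_add_epi8` / `_mm_adds_epu8` are the intrinsics that normally map to `paddb` / `paddusb`.

```c
#include <emmintrin.h>
#include <stdint.h>

/* paddb: each byte wraps around modulo 256. */
static void add_wrap_u8(const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi8(va, vb));
}

/* paddusb: each unsigned byte is clamped at 255 instead of wrapping. */
static void add_sat_u8(const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_adds_epu8(va, vb));
}
```

With every byte of `a` set to 200 and every byte of `b` set to 100, the wrapping version yields 44 in each byte, while the saturating version yields 255.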

`pmaddwd`

- Multiplies 8 pairs of 16bit integers into 32bit results and adds adjacent results, giving 4 32bit sums.

`pmulhw`

- Multiplies 8 pairs of 16bit integers and returns the high 16bits of each 32bit result.

`pmullw`

- Multiplies 8 pairs of 16bit integers and returns the low 16bits of each 32bit result.

`pmuludq`

- Multiplies 2 pairs of unsigned 32bit integers and stores 2 64bit results.
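
As an example of how the multiply-and-add behaviour of `pmaddwd` is typically used, the following sketch computes a dot product of 8 signed 16bit values. The function name `dot8_i16` is hypothetical; `_mm_madd_epi16` is the intrinsic that usually maps to `pmaddwd`, and the final horizontal sum is done in plain C for clarity rather than speed.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Dot product of two vectors of 8 signed 16bit integers. */
static int32_t dot8_i16(const int16_t a[8], const int16_t b[8])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* pmaddwd: 4 x 32bit lanes holding a0*b0+a1*b1, a2*b2+a3*b3, ... */
    __m128i sums = _mm_madd_epi16(va, vb);

    /* Add the 4 partial sums together. */
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, sums);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```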

`rcpps`

- Approximates the reciprocal of 4 32bit singles.

`rcpss`

- Approximates the reciprocal of bottom 32bit single.

`sqrtpd`

- Returns square root of 2 64bit doubles.

`sqrtsd`

- Returns square root of bottom 64bit double.

Logic:

`andnpd`

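- Logically inverts the first operand and ANDs it with the second, for 2 64bit doubles.

`andnps`

- Logically inverts the first operand and ANDs it with the second, for 4 32bit singles.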

`andpd`

- Logically ANDs 2 64bit doubles.

`pand`

- Logically ANDs 2 128bit registers.

`pandn`

- Logically inverts the first 128bit operand and ANDs it with the second.

`por`

- Logically ORs 2 128bit registers.

`pslldq`

- Logically left shifts 1 128bit value by a number of bytes.

`psllq`

- Logically left shifts 2 64bit values.

`pslld`

- Logically left shifts 4 32bit values.

`psllw`

- Logically left shifts 8 16bit values.

`psrad`

- Arithmetically right shifts 4 32bit values.

`psraw`

- Arithmetically right shifts 8 16bit values.

`psrldq`

- Logically right shifts 1 128bit value by a number of bytes.

`psrlq`

- Logically right shifts 2 64bit values.

`psrld`

- Logically right shifts 4 32bit values.

`psrlw`

- Logically right shifts 8 16bit values.

`pxor`

- Logically XORs 2 128bit registers.

`orpd`

- Logically ORs 2 64bit doubles.

`xorpd`

- Logically XORs 2 64bit doubles.
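
Two of the less obvious entries in this group can be sketched with intrinsics: the operand order of `andnpd` (the first operand is the one that gets inverted) and the fact that `pslldq` shifts by bytes rather than bits. The helper names are hypothetical; `_mm_andnot_pd` and `_mm_slli_si128` are the intrinsics that normally map to `andnpd` and `pslldq`.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Absolute value of 2 doubles: clear the sign bits with andnpd. */
static void abs_pd(const double in[2], double out[2])
{
    __m128d v    = _mm_loadu_pd(in);
    __m128d sign = _mm_set1_pd(-0.0);            /* only the sign bit set per lane */
    _mm_storeu_pd(out, _mm_andnot_pd(sign, v));  /* (~sign) & v */
}

/* pslldq counts bytes: a shift of 4 bytes moves every 32bit lane up one
   slot and fills the bottom lane with zero. */
static void shift_lanes_up(const int32_t in[4], int32_t out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_slli_si128(v, 4));
}
```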

Compare:

`cmppd`

- Compares 2 pairs of 64bit doubles, setting each 64bit result to all ones if the comparison is true and all zeros otherwise.

`cmpsd`

- Compares bottom 64bit doubles.

`comisd`

- Compares bottom 64bit doubles and stores result in `EFLAGS`.

`ucomisd`

- Compares bottom 64bit doubles and stores result in `EFLAGS`. (QNaNs don't throw exceptions with `ucomisd`, unlike `comisd`.)

`pcmpxxb`

- Compares 16 8bit integers.

`pcmpxxw`

- Compares 8 16bit integers.

`pcmpxxd`

- Compares 4 32bit integers.

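Compare Codes (the `xx` parts above; the full list applies to `cmppd` and `cmpsd`, written as `cmpxxpd` and `cmpxxsd`, while the integer `pcmpxx` compares only exist with `eq` and `gt`):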

`eq`

- Equal to.

`lt`

- Less than.

`le`

- Less than or equal to.

`ne`

- Not equal.

`nlt`

- Not less than.

`nle`

- Not less than or equal to.

`ord`

- Ordered.

`unord`

- Unordered.
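
The two styles of comparison above (mask-producing `cmppd` versus flag-setting `comisd`) can be sketched as follows. This is only an illustration under the assumption that `<emmintrin.h>` is available; the function names are hypothetical. `_mm_cmplt_pd` corresponds to `cmppd` with the `lt` code, `_mm_comilt_sd` is built on `comisd`, and `_mm_movemask_pd` (the SSE2 instruction `movmskpd`, which is not in the list above) is used just to turn the mask into an ordinary integer.

```c
#include <emmintrin.h>

/* Packed compare: each lane becomes all ones where a < b, else all zeros.
   The returned integer has bit 0 set for lane 0 and bit 1 set for lane 1. */
static int less_than_mask(const double a[2], const double b[2])
{
    __m128d m = _mm_cmplt_pd(_mm_loadu_pd(a), _mm_loadu_pd(b));
    return _mm_movemask_pd(m);
}

/* Scalar compare through the flags: returns 1 if a < b, else 0. */
static int bottom_less_than(double a, double b)
{
    return _mm_comilt_sd(_mm_set_sd(a), _mm_set_sd(b));
}
```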

Conversion:

`cvtdq2pd`

- Converts 2 32bit integers into 2 64bit doubles.

`cvtdq2ps`

- Converts 4 32bit integers into 4 32bit singles.

`cvtpd2pi`

- Converts 2 64bit doubles into 2 32bit integers in an `MMX` register.

`cvtpd2dq`

- Converts 2 64bit doubles into 2 32bit integers in the bottom of an `XMM` register.

`cvtpd2ps`

- Converts 2 64bit doubles into 2 32bit singles in the bottom of an `XMM` register.

`cvtpi2pd`

- Converts 2 32bit integers in an `MMX` register into 2 64bit doubles in an `XMM` register.

`cvtps2dq`

- Converts 4 32bit singles into 4 32bit integers.

`cvtps2pd`

- Converts 2 32bit singles into 2 64bit doubles.

`cvtsd2si`

- Converts 1 64bit double to a 32bit integer in a `GPR`.

`cvtsd2ss`

- Converts bottom 64bit double to a bottom 32bit single. Tops are unchanged.

`cvtsi2sd`

- Converts a 32bit integer to the bottom 64bit double.

`cvtsi2ss`

- Converts a 32bit integer to the bottom 32bit single.

`cvtss2sd`

- Converts bottom 32bit single to bottom 64bit double.

`cvtss2si`

- Converts bottom 32bit single to a 32bit integer in a `GPR`.

`cvttpd2pi`

- Converts 2 64bit doubles to 2 32bit integers using truncation into an `MMX` register.

`cvttpd2dq`

- Converts 2 64bit doubles to 2 32bit integers using truncation.

`cvttps2dq`

- Converts 4 32bit singles to 4 32bit integers using truncation.

`cvttps2pi`

- Converts 2 32bit singles to 2 32bit integers using truncation into an `MMX` register.

`cvttsd2si`

- Converts a 64bit double to a 32bit integer using truncation into a `GPR`.

`cvttss2si`

- Converts a 32bit single to a 32bit integer using truncation into a `GPR`.
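
The difference between the plain and the truncating (`cvtt...`) conversions can be sketched like this. The helper names are hypothetical; `_mm_cvtsd_si32` and `_mm_cvttsd_si32` are the intrinsics that usually compile to `cvtsd2si` and `cvttsd2si`. The sketch assumes the rounding mode in `MXCSR` is still at its default (round to nearest, ties to even), which is what the non-truncating form uses.

```c
#include <emmintrin.h>

/* cvtsd2si: rounds according to the current MXCSR rounding mode. */
static int to_int_rounded(double x)
{
    return _mm_cvtsd_si32(_mm_set_sd(x));
}

/* cvttsd2si: always truncates toward zero. */
static int to_int_truncated(double x)
{
    return _mm_cvttsd_si32(_mm_set_sd(x));
}
```

For example, 2.7 becomes 3 with the rounding form and 2 with the truncating form, and -2.7 becomes -3 and -2 respectively.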

Load/Store:

(is "minimize cache pollution" the same as "without using cache"??)

`movq`

- Moves a 64bit value, clearing the top 64bits of an `XMM` register.

`movsd`

- Moves a 64bit double, leaving tops unchanged if move is between two `XMM` registers.

`movapd`

- Moves 2 aligned 64bit doubles.

`movupd`

- Moves 2 unaligned 64bit doubles.

`movhpd`

- Moves top 64bit value to or from an `XMM` register.

`movlpd`

- Moves bottom 64bit value to or from an `XMM` register.

`movdq2q`

- Moves bottom 64bit value into an `MMX` register.

`movq2dq`

- Moves an `MMX` register value to the bottom of an `XMM` register. Top is cleared to zero.

`movntpd`

- Moves a 128bit value to memory without using the cache. NT is "Non Temporal."

`movntdq`

- Moves a 128bit value to memory without using the cache.

`movnti`

- Moves a 32bit value without using the cache.

`maskmovdqu`

- Moves 16 bytes based on sign bits of another `XMM` register.

`pmovmskb`

- Generates a 16bit mask from the sign bits of each byte in an `XMM` register.
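
A common use of `pmovmskb` is to turn the result of a byte compare into a bit mask that ordinary integer code can inspect. The sketch below searches a 16 byte block for a value; the function name `find_byte16` is hypothetical. `_mm_cmpeq_epi8` and `_mm_movemask_epi8` are the intrinsics for `pcmpeqb` and `pmovmskb`, and the unaligned load `_mm_loadu_si128` maps to `movdqu`, which is an SSE2 instruction not listed above.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Returns the index of the first byte equal to `needle` in a 16 byte
   block, or -1 if there is none. */
static int find_byte16(const uint8_t block[16], uint8_t needle)
{
    __m128i data = _mm_loadu_si128((const __m128i *)block);
    __m128i eq   = _mm_cmpeq_epi8(data, _mm_set1_epi8((char)needle)); /* 0xFF where equal */
    int mask     = _mm_movemask_epi8(eq);  /* bit i = sign bit of byte i */

    if (mask == 0)
        return -1;

    /* Find the lowest set bit with a plain loop to stay portable. */
    int index = 0;
    while ((mask & 1) == 0) {
        mask >>= 1;
        ++index;
    }
    return index;
}
```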

Shuffling:

`pshufd`

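- Shuffles 4 32bit values according to an 8bit immediate that selects the source value for each destination position.

`pshufhw`

- Shuffles the 4 high 16bit values according to an 8bit immediate; the low 64bits are copied unchanged.

`pshuflw`

- Shuffles the 4 low 16bit values according to an 8bit immediate; the high 64bits are copied unchanged.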

`unpckhpd`

- Unpacks and interleaves top 64bit doubles from 2 128bit sources into 1.

`unpcklpd`

- Unpacks and interleaves bottom 64bit doubles from 2 128bit sources into 1.

`punpckhbw`

- Unpacks and interleaves top 8 8bit integers from 2 128bit sources into 1.

`punpckhwd`

- Unpacks and interleaves top 4 16bit integers from 2 128bit sources into 1.

`punpckhdq`

- Unpacks and interleaves top 2 32bit integers from 2 128bit sources into 1.

`punpckhqdq`

- Unpacks and interleaves top 64bit integers from 2 128bit sources into 1.

`punpcklbw`

- Unpacks and interleaves bottom 8 8bit integers from 2 128bit sources into 1.

`punpcklwd`

- Unpacks and interleaves bottom 4 16bit integers from 2 128bit sources into 1.

`punpckldq`

- Unpacks and interleaves bottom 2 32bit integers from 2 128bit sources into 1.

`punpcklqdq`

- Unpacks and interleaves bottom 64bit integers from 2 128bit sources into 1.

`packssdw`

- Packs 32bit integers to 16bit integers using saturation.

`packsswb`

- Packs 16bit integers to 8bit integers using saturation.

`packuswb`

- Packs signed 16bit integers to 8bit unsigned integers using saturation.
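
The shuffle, unpack and pack instructions are easier to follow with a concrete case. The sketch below reverses 4 32bit values with `pshufd`, and widens 8 bytes to 16bit with `punpcklbw`, adjusts them, then narrows them again with `packuswb`. The function names are hypothetical; the intrinsics (`_mm_shuffle_epi32`, `_mm_unpacklo_epi8`, `_mm_packus_epi16`, plus `_mm_loadl_epi64` / `_mm_storel_epi64` for `movq`) are the usual mappings in `<emmintrin.h>`.

```c
#include <emmintrin.h>
#include <stdint.h>

/* pshufd: destination lanes 0..3 take source lanes 3, 2, 1, 0. */
static void reverse4_i32(const int32_t in[4], int32_t out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out,
                     _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3)));
}

/* Adds `delta` to 8 unsigned bytes, clamping each result to 0..255. */
static void brighten8(const uint8_t in[8], int16_t delta, uint8_t out[8])
{
    __m128i v    = _mm_loadl_epi64((const __m128i *)in);       /* movq: load 8 bytes */
    __m128i wide = _mm_unpacklo_epi8(v, _mm_setzero_si128());  /* punpcklbw: 8 x 16bit */
    wide = _mm_add_epi16(wide, _mm_set1_epi16(delta));         /* paddw */
    __m128i packed = _mm_packus_epi16(wide, wide);             /* packuswb: clamp to 0..255 */
    _mm_storel_epi64((__m128i *)out, packed);                  /* movq: store low 8 bytes */
}
```

Interleaving with a zero register is a common way to zero-extend values, since SSE2 itself has no dedicated widening instruction.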

Cache Control:

`clflush`

- Flushes a Cache Line from all levels of cache.

`lfence`

- Guarantees that all memory loads issued before the `lfence` instruction are completed before any loads after the `lfence` instruction.

`mfence`

- Guarantees that all memory reads and writes issued before the `mfence` instruction are completed before any reads or writes after the `mfence` instruction.

`pause`

- Hints to the processor that the code is in a spin-wait loop. The resulting delay is short and depends on the processor.
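
As an illustration of where `pause` is normally placed, here is a minimal bounded spin-wait. It is a sketch under a few assumptions: a C11 compiler with `<stdatomic.h>`, the hypothetical function name `spin_wait`, and an iteration limit added only so the example always terminates. `_mm_pause` is the intrinsic for `pause`.

```c
#include <emmintrin.h>
#include <stdatomic.h>

/* Spins until *flag becomes non-zero or max_spins iterations have passed.
   Returns 1 if the flag was observed set, 0 if the limit was reached. */
static int spin_wait(atomic_int *flag, long max_spins)
{
    for (long i = 0; i < max_spins; ++i) {
        if (atomic_load_explicit(flag, memory_order_acquire) != 0)
            return 1;
        _mm_pause();  /* pause: tell the CPU this is a spin loop */
    }
    return 0;
}
```

In real code another thread would set the flag; the other entries in this group are available as `_mm_lfence`, `_mm_mfence` and `_mm_clflush` in the same header.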