SSE4 Instruction Set

SSE4 — An Overview

SSE4 was formally announced on September 27th, 2006, and first reached hardware in late 2007: SSE4.1 in Intel's Penryn processors and SSE4a in AMD's Barcelona (K10) processors. SSE4.2 followed in 2008 with Intel's Nehalem. Partial details had circulated earlier, but they were incomplete (old versions of this page were based on such reports).

SSE4 now comes in 3 flavors: SSE4.1, SSE4.2, and SSE4a. Intel's SSE4 comprises 54 instructions: 47 belong to SSE4.1 and the remaining 7 to SSE4.2. SSE4a comes from AMD (which did not support all of the SSE4 instructions) and is listed here with 6 instructions covering bit counting, bit-field manipulation, and non-temporal scalar stores.

SSE4 — The Instructions

SSE4.1
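mpsadbw - multiple sums of absolute differences (eight 16-bit sums over 4-byte blocks).
phminposuw - minimum value and its index among eight unsigned 16-bit words.
pmuldq - packed multiply of signed dwords (lanes 0 and 2), producing two 64-bit products.
pmulld - packed multiply of dwords, keeping the low 32 bits of each product.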
dpps - dot product, single precision.
dppd - dot product, double precision.
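blendps - conditional copy of single precision floats, selected by an immediate mask.
blendpd - conditional copy of double precision floats, selected by an immediate mask.
blendvps - conditional copy of single precision floats, selected by a variable mask in xmm0.
blendvpd - conditional copy of double precision floats, selected by a variable mask in xmm0.
pblendvb - conditional copy of bytes, selected by a variable mask in xmm0.
pblendw - conditional copy of words, selected by an immediate mask.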
pminsb - packed minimum signed byte.
pmaxsb - packed maximum signed byte.
pminuw - packed minimum unsigned word.
pmaxuw - packed maximum unsigned word.
pminud - packed minimum unsigned dword.
pmaxud - packed maximum unsigned dword.
pminsd - packed minimum signed dword.
pmaxsd - packed maximum signed dword.
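roundps - packed round single precision float to an integral value (the result stays in floating-point format).
roundss - scalar round single precision float to an integral value (the result stays in floating-point format).
roundpd - packed round double precision float to an integral value (the result stays in floating-point format).
roundsd - scalar round double precision float to an integral value (the result stays in floating-point format).
insertps - insert one single precision float into a selected lane, optionally zeroing lanes.
pinsrb - insert a byte into a selected lane.
pinsrd - insert a dword into a selected lane.
pinsrq - insert a qword into a selected lane (64-bit mode only).
extractps - extract one single precision float to a general-purpose register or memory.
pextrb - extract a byte from a selected lane.
pextrw - extract a word from a selected lane (SSE4.1 adds the memory-destination form).
pextrd - extract a dword from a selected lane.
pextrq - extract a qword from a selected lane (64-bit mode only).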
pmovsxbw - packed sign extension.
pmovzxbw - packed zero extension.
pmovsxbd - packed sign extension.
pmovzxbd - packed zero extension.
pmovsxbq - packed sign extension.
pmovzxbq - packed zero extension.
pmovsxwd - packed sign extension.
pmovzxwd - packed zero extension.
pmovsxwq - packed sign extension.
pmovzxwq - packed zero extension.
pmovsxdq - packed sign extension.
pmovzxdq - packed zero extension.
ptest - logical compare that sets ZF and CF without modifying its operands (like test, but for SSE registers).
pcmpeqq - quadword compare for equality.
packusdw - pack signed dwords into unsigned words with saturation.
movntdqa - non-temporal aligned load (a streaming-load hint, intended for write-combining memory).
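
To make a few of the SSE4.1 entries above more concrete, here is a minimal scalar sketch in Python of what four of them compute. These are reference models written for illustration only: the function names simply reuse the mnemonics, registers are represented as plain Python lists with lane 0 first, and floating-point results use Python's double precision rather than the hardware's single precision and summation order.

```python
def phminposuw(words):
    """Model: (minimum, index) over eight unsigned 16-bit words.
    On ties the lowest index wins."""
    assert len(words) == 8 and all(0 <= w <= 0xFFFF for w in words)
    idx = min(range(8), key=lambda i: words[i])  # min() keeps the first minimum
    return words[idx], idx


def dpps(a, b, imm8):
    """Model: dot product of four floats.
    imm8 bits 4..7 choose which lane products are summed,
    imm8 bits 0..3 choose which result lanes receive the sum (others get 0.0)."""
    total = sum(a[i] * b[i] for i in range(4) if imm8 & (0x10 << i))
    return [total if imm8 & (1 << i) else 0.0 for i in range(4)]


def blendps(dst, src, imm8):
    """Model: lane i comes from src when imm8 bit i is set, else from dst."""
    return [src[i] if imm8 & (1 << i) else dst[i] for i in range(4)]


def packusdw(dst, src):
    """Model: clamp eight signed dwords to 0..65535.
    The four dst lanes form the low half of the result, the src lanes the high half."""
    return [min(max(x, 0), 0xFFFF) for x in list(dst) + list(src)]
```

For example, dpps([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], 0xF1) sums all four products and writes the result to lane 0 only, giving [70.0, 0.0, 0.0, 0.0].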

SSE4.2
crc32 - CRC32C function (using 0x11edc6f41 as the polynomial).
pcmpestri - Packed compare explicit length string, Index.
pcmpestrm - Packed compare explicit length string, Mask.
pcmpistri - Packed compare implicit length string, Index.
pcmpistrm - Packed compare implicit length string, Mask.
pcmpgtq - Packed compare, greater than (signed quadwords).
popcnt - Population count.
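
The crc32 entry above can be illustrated with a bit-at-a-time software model. This is a sketch, not how the hardware is implemented: the helper names crc32_u8 and crc32c are made up for this example, and the initial value and final inversion are the usual CRC32C software convention that the caller applies around the instruction, not something the instruction does itself.

```python
# 0x82F63B78 is the bit-reversed form of the polynomial 0x1EDC6F41
# (0x11EDC6F41 with the implicit top bit written out).
POLY_REFLECTED = 0x82F63B78


def crc32_u8(crc, byte):
    """Model of one crc32 step on a byte: fold 8 input bits into the running value."""
    crc ^= byte
    for _ in range(8):
        crc = (crc >> 1) ^ (POLY_REFLECTED if crc & 1 else 0)
    return crc


def crc32c(data):
    """CRC32C of a bytes object, using the conventional all-ones
    initial value and final inversion."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = crc32_u8(crc, b)
    return crc ^ 0xFFFFFFFF
```

With this model, crc32c(b"123456789") returns 0xE3069283, the standard check value for CRC32C.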

SSE4a
lzcnt - Leading Zero count.
popcnt - Population count.
extrq - extract a bit field from the low quadword.
insertq - insert a bit field into the low quadword.
movntsd - non-temporal scalar double precision store.
movntss - non-temporal scalar single precision store.
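
The bit-manipulation entries above can be sketched as scalar Python models. This is an illustration under stated assumptions: the functions reuse the mnemonics as names, only the low 64 bits of the SSE register are modelled for extrq and insertq, and a field length of 0 is taken to mean 64 bits. Field positions where index plus length exceeds 64 are undefined on hardware and are not meaningful here either.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def lzcnt(x, bits=32):
    """Model: number of leading zero bits; returns the operand size when x is 0."""
    assert 0 <= x < (1 << bits)
    return bits - x.bit_length()


def popcnt(x):
    """Model: number of set bits in a non-negative integer."""
    assert x >= 0
    return bin(x).count("1")


def extrq(low64, length, index):
    """Model: take `length` bits starting at bit `index`, zero-extended."""
    width = (length & 0x3F) or 64  # assumption: 0 encodes a 64-bit field
    index &= 0x3F
    return (low64 >> index) & ((1 << width) - 1)


def insertq(dst_low64, src_low64, length, index):
    """Model: copy the low `length` bits of src into dst at bit `index`."""
    width = (length & 0x3F) or 64  # assumption: 0 encodes a 64-bit field
    index &= 0x3F
    mask = (((1 << width) - 1) << index) & MASK64
    return (dst_low64 & ~mask & MASK64) | ((src_low64 << index) & mask)
```

For example, extrq(0x123456789ABCDEF0, 16, 8) returns 0xBCDE, and lzcnt(0) returns 32 for the default 32-bit operand size.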