AMD MMX Extensions
MMX Extensions Overview
With the release of the Athlon in 1999, AMD extended the standard MMX
instruction set with some new instructions. These provide enhanced
conversion and selection operations, as well as some advanced cache
management instructions. These extensions are also found in SSE.
During its development, SSE was sometimes called MMX2. This is incorrect.
To further complicate matters, in 2005 Intel introduced another SIMD
technology named MMX2 for their XScale processor line, which should
actually be called WMMX2.
MMX Extensions — Cache Management
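Similar to the 3DNow! prefetch instructions, these instructions give
programmers control over which cache level data is loaded into. A program
can fetch data only as far as the outer caches, or all the way into the
cache closest to the processor core. The cache levels are hints, and the
exact behavior depends on the processor.
prefetchnta fetches data toward the CPU as non-temporal data, minimizing
pollution of the L1 and L2 caches.
prefetcht0 fetches data to all cache levels.
prefetcht1 fetches data to the L2 cache and any outer levels, skipping L1.
prefetcht2 fetches data to the outermost cache level only, which is the L2
cache on processors without an L3 cache.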
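As a rough illustration of issuing prefetch hints from C, the sketch below uses the __builtin_prefetch extension of GCC and Clang rather than the instructions directly. It assumes one of those compilers; on x86 targets GCC generally maps locality 0 to prefetchnta and locality 3 to prefetcht0, but that mapping is a compiler detail. The function name sum_with_prefetch and the distance AHEAD are hypothetical and untuned.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum an array while hinting that elements further ahead will be read soon.
   AHEAD is an illustrative guess, not a measured value. */
uint64_t sum_with_prefetch(const uint32_t *data, size_t n)
{
    enum { AHEAD = 64 };
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + AHEAD < n) {
            /* second argument 0 = prefetch for reading,
               third argument 0 = lowest temporal locality */
            __builtin_prefetch(&data[i + AHEAD], 0, 0);
        }
        total += data[i];
    }
    return total;
}
```

A prefetch is only a hint, so the function returns the same sum whether or not the processor acts on it.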
We are also provided with an instruction that controls write ordering,
which is useful for multi-processor operation.
sfence makes all previous writes globally visible. It forces any later
write to wait until all earlier writes have completed.
MMX Extensions — Data Movement
These extensions also provide a number of data-moving instructions.
These include some configurable data-shuffling instructions, masked
moves, and word insertion/extraction.
movntq is a non-temporal write. This means that it bypasses
the cache, keeping its contents unchanged. This instruction can only be
used to write to memory from a register.
maskmovq is a conditional move. This instruction also bypasses the cache,
like movntq. It uses edi to point to a destination address, and stores
each byte of an MMX register to the corresponding byte at that address
only if the top bit of the matching byte in another MMX register is set.
Bytes whose mask bit is clear are left unchanged in memory.
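The byte selection can be sketched with a scalar model in portable C. This is not the instruction itself: it ignores the cache-bypassing behavior, treats an MMX register as a uint64_t whose byte 0 is the least significant byte, and the name maskmovq_model is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the masked store: byte i of src is written to dst[i]
   only when bit 7 of byte i of mask is set. */
void maskmovq_model(uint8_t *dst, uint64_t src, uint64_t mask)
{
    for (size_t i = 0; i < 8; i++) {
        uint8_t m = (uint8_t)(mask >> (8 * i));
        if (m & 0x80) {
            dst[i] = (uint8_t)(src >> (8 * i));
        }
    }
}
```

With a mask byte of 0x7F the store is skipped and with 0x80 it happens, since only the top bit of each mask byte matters.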
pmovmskb moves the top bit of each byte of an MMX register into the
bottom 8 bits of a regular 32-bit register. The remaining bits of the
32-bit register are set to zero.
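A minimal scalar sketch of that bit gathering, assuming the MMX register is held in a uint64_t with byte 0 least significant (pmovmskb_model is a hypothetical name):

```c
#include <stdint.h>

/* Collect bit 7 of each of the 8 bytes into bits 0..7 of the result. */
uint32_t pmovmskb_model(uint64_t mm)
{
    uint32_t result = 0;
    for (int i = 0; i < 8; i++) {
        uint32_t top = (uint32_t)((mm >> (8 * i + 7)) & 1u);
        result |= top << i;
    }
    return result;
}
```

This is commonly paired with a byte comparison, so that one ordinary integer test can tell which bytes matched.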
pextrw extracts a selected word from an MMX register into the
bottom half of a 32-bit register. Selection is performed by an 8-bit
value, of which the bottom 2 bits are used to select which of the 4 words
in an MMX register to extract. The top 16 bits of the 32-bit
register are set to zero.
pinsrw is the reverse of pextrw: it moves data from the bottom 16 bits of
a 32-bit register, or from a 16-bit memory location, into a selected word
of an MMX register. None of the other words in the destination MMX
register are changed.
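The word selection used by pextrw and pinsrw can be modeled in plain C as follows. This is a sketch under the assumption that the MMX register is a uint64_t with word 0 in the lowest 16 bits; pextrw_model and pinsrw_model are hypothetical names.

```c
#include <stdint.h>

/* Extract the word chosen by the bottom 2 bits of imm, zero-extended. */
uint32_t pextrw_model(uint64_t mm, uint8_t imm)
{
    unsigned sel = imm & 3u;
    return (uint32_t)((mm >> (16 * sel)) & 0xFFFFu);
}

/* Replace the chosen word with the low 16 bits of value,
   leaving the other three words untouched. */
uint64_t pinsrw_model(uint64_t mm, uint32_t value, uint8_t imm)
{
    unsigned shift = 16 * (imm & 3u);
    uint64_t cleared = mm & ~((uint64_t)0xFFFF << shift);
    return cleared | ((uint64_t)(value & 0xFFFFu) << shift);
}
```

Only the bottom 2 bits of the selector are used, so a selector of 7 picks the same word as a selector of 3.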
pshufw is a very flexible instruction that lets a programmer shuffle the
four words of a source MMX register or memory location into a destination
MMX register in 1 of 256 possible ways. It takes 3 parameters: the
destination, the source, and an 8-bit permutation value. The permutation
byte is read 2 bits at a time, starting from the bottom, and each 2-bit
field selects which source word is copied into the corresponding
destination word.
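To make the permutation byte concrete, here is a scalar sketch in C. It assumes the source register is a uint64_t with word 0 in the lowest 16 bits, and pshufw_model is a hypothetical name.

```c
#include <stdint.h>

/* Destination word i is the source word named by bits (2i+1):(2i) of order. */
uint64_t pshufw_model(uint64_t src, uint8_t order)
{
    uint64_t dst = 0;
    for (int i = 0; i < 4; i++) {
        unsigned sel = (order >> (2 * i)) & 3u;
        uint64_t word = (src >> (16 * sel)) & 0xFFFFu;
        dst |= word << (16 * i);
    }
    return dst;
}
```

Under this model a permutation byte of 0xE4 (binary 11 10 01 00) leaves the words in place, 0x1B (binary 00 01 10 11) reverses them, and 0x00 copies word 0 into all four positions.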
MMX Extensions — Integer Operations
AMD's extensions provide several new integer operations as well.
pavgb stores the rounded-up averages of the bytes in an MMX register and
another register or memory location. This instruction is identical to the
3DNow! pavgusb instruction. It operates on bytes, and treats them as
unsigned.
pavgw is just like pavgb, except it operates on 16-bit words instead of
8-bit bytes. These are also treated as unsigned.
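The rounded-up average amounts to (a + b + 1) >> 1 computed per element without overflow. The following scalar sketch models both instructions, assuming a uint64_t stands in for the MMX register; pavgb_model and pavgw_model are hypothetical names.

```c
#include <stdint.h>

/* Rounded-up unsigned average of each of the 8 byte pairs. */
uint64_t pavgb_model(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        unsigned x = (unsigned)((a >> (8 * i)) & 0xFFu);
        unsigned y = (unsigned)((b >> (8 * i)) & 0xFFu);
        r |= (uint64_t)((x + y + 1u) >> 1) << (8 * i);
    }
    return r;
}

/* Rounded-up unsigned average of each of the 4 word pairs. */
uint64_t pavgw_model(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        unsigned x = (unsigned)((a >> (16 * i)) & 0xFFFFu);
        unsigned y = (unsigned)((b >> (16 * i)) & 0xFFFFu);
        r |= (uint64_t)((x + y + 1u) >> 1) << (16 * i);
    }
    return r;
}
```

Because of the rounding, averaging 1 and 2 gives 2 rather than 1.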
pmaxsw loads a register with the maximum value between that
register and another register or memory location. It operates on 16-bit
words, and treats them as signed values.
pmaxub is similar to pmaxsw, except it operates on 8-bit bytes instead of
words, and treats the values as unsigned.
pminsw loads a register with the minimum value between that
register and another register or memory location. Like
pmaxsw, it operates on 16-bit signed words.
pminub is just like
pminsw except it operates on
8-bit unsigned bytes instead of 16-bit signed words.
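The difference between the signed word forms and the unsigned byte forms shows up when the top bit of an element is set. The sketch below models one instruction of each kind in scalar C. It assumes a uint64_t stands in for the MMX register and a two's complement int16_t; pmaxsw_model and pminub_model are hypothetical names.

```c
#include <stdint.h>

/* Per-word signed maximum of the 4 word pairs. */
uint64_t pmaxsw_model(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        int16_t x = (int16_t)(uint16_t)(a >> (16 * i));
        int16_t y = (int16_t)(uint16_t)(b >> (16 * i));
        int16_t m = x > y ? x : y;
        r |= (uint64_t)(uint16_t)m << (16 * i);
    }
    return r;
}

/* Per-byte unsigned minimum of the 8 byte pairs. */
uint64_t pminub_model(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        uint8_t m = x < y ? x : y;
        r |= (uint64_t)m << (8 * i);
    }
    return r;
}
```

In the signed word model 0xFFFF means -1, so the maximum of 0xFFFF and 0x0001 is 0x0001. In the unsigned byte model 0xFF means 255, so the minimum of 0xFF and 0x01 is 0x01.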
pmulhuw multiplies 4 16-bit unsigned words in a register with
4 more in another register or memory location, and stores the top 16 bits
of each 32-bit result.
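A scalar sketch of the high-half multiply, again treating a uint64_t as the MMX register (pmulhuw_model is a hypothetical name):

```c
#include <stdint.h>

/* Multiply each pair of unsigned 16-bit words and keep bits 31..16
   of the 32-bit product. */
uint64_t pmulhuw_model(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t x = (uint32_t)((a >> (16 * i)) & 0xFFFFu);
        uint32_t y = (uint32_t)((b >> (16 * i)) & 0xFFFFu);
        uint32_t product = x * y;
        r |= (uint64_t)(product >> 16) << (16 * i);
    }
    return r;
}
```

For example, 0xFFFF times 0xFFFF is 0xFFFE0001, so the stored word is 0xFFFE.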
psadbw calculates the sum of absolute differences. What this
means is that it calculates the byte difference between a register and
another register or memory location, and sums the absolute value of all 8
differences. The result goes into the bottom 16 bits of the first
register, and the top 48 bits are set to zero.
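The sum of absolute differences can be written out as a scalar loop. This sketch assumes a uint64_t stands in for each MMX operand, and psadbw_model is a hypothetical name.

```c
#include <stdint.h>

/* Sum |a_i - b_i| over the 8 unsigned byte pairs. The largest possible
   sum is 8 * 255 = 2040, so it always fits in the bottom 16 bits. */
uint64_t psadbw_model(uint64_t a, uint64_t b)
{
    uint64_t sum = 0;
    for (int i = 0; i < 8; i++) {
        int x = (int)((a >> (8 * i)) & 0xFFu);
        int y = (int)((b >> (8 * i)) & 0xFFu);
        sum += (uint64_t)(x > y ? x - y : y - x);
    }
    return sum;
}
```

This operation is the core of block-matching searches such as motion estimation in video encoders, where a smaller sum means a closer match between two pixel rows.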
MMX is a registered trademark of Intel Corporation or its subsidiaries in
the United States and other countries.
Athlon is a registered trademark of Advanced Micro Devices, Inc.