AMD MMX Extensions
MMX Extensions Overview
With the release of the Athlon in 1999, AMD extended the standard MMX instruction set with several new instructions. These add enhanced conversion and selection operations, as well as cache management instructions. All of these instructions are also part of SSE.
During its development, SSE was sometimes called MMX2, which is incorrect. To further complicate matters, in 2005 Intel announced another SIMD technology named MMX2 for its XScale processor line; that technology should actually be called WMMX2 (Wireless MMX 2).
MMX Extensions — Cache Management
Like the 3DNow! prefetch and prefetchw instructions, these instructions load data into the cache before it is needed. In addition, they give the programmer control over which cache level the data is loaded into. This allows a program to fetch data only into the outer cache levels, or all the way into the cache closest to the processor core.
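prefetchnta fetches data with a non-temporal hint: the data is brought close to the processor while polluting the cache hierarchy as little as possible. Exactly where the data is placed is implementation-specific.
prefetcht0 fetches data into all cache levels.
prefetcht1 fetches data into the L2 cache and any higher levels, but not into the L1 cache.
prefetcht2 fetches data into only the outermost cache levels (L3 and higher where present). On processors with just two cache levels, this is typically the L2 cache.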
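As a rough sketch of how these hints are typically used from C, the function below sums an array while requesting data a fixed distance ahead. It relies on the GCC/Clang builtin __builtin_prefetch(addr, rw, locality), which is not part of the text above; on x86 targets these compilers typically map locality 3, 2, 1 and 0 to prefetcht0, prefetcht1, prefetcht2 and prefetchnta. The name sum_with_prefetch and the distance of 64 elements are illustrative assumptions, not tuned values.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative prefetch distance in elements (an assumption, not a tuned value). */
#define PREFETCH_AHEAD 64

/* Sums n 32-bit values while hinting that data further ahead will be needed.
 * A prefetch is only a hint: it never changes the computed result. */
uint64_t sum_with_prefetch(const uint32_t *data, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* rw = 0 (read), locality = 0: non-temporal, like prefetchnta. */
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 0, 0);
        }
        total += data[i];
    }
    return total;
}
```

Each element is read only once here, which is why the sketch uses the non-temporal hint; data that will be reused soon would more likely call for locality 3.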
We are also provided with an instruction that controls write ordering,
which is useful for multi-processor operation.
sfence makes all earlier writes globally visible before any later write. A write that follows the sfence in program order must wait until every write that precedes it has completed.
MMX Extensions — Data Movement
These extensions also provide a number of data-moving instructions.
These include some configurable data-shuffling instructions, masked
moves, and word insertion/extraction.
movntq is a non-temporal write. This means that it bypasses the cache, leaving the cache contents unchanged. This instruction can only be used to write from an MMX register to memory.
maskmovq is a byte-masked store. Like movntq, it bypasses the cache. It takes two MMX registers, a source and a mask, and uses edi as an implicit pointer to the destination address. Each byte of the source register is written to the corresponding byte at the edi memory location only if the top bit of the matching byte in the mask register is set. The other destination bytes are left unchanged.
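To make the byte selection concrete, here is a small portable C model of what maskmovq does to memory. It is a sketch of the semantics only: the function name maskmovq_model is hypothetical, the destination is passed explicitly instead of through edi, and the cache-bypassing behavior is not modeled.

```c
#include <stdint.h>

/* Portable model of maskmovq. Byte 0 is the least significant byte of the
 * 64-bit value, matching x86 memory order. A source byte is stored only
 * when the top bit of the matching mask byte is set. */
void maskmovq_model(uint8_t dst[8], uint64_t src, uint64_t mask)
{
    for (int i = 0; i < 8; i++) {
        uint8_t mask_byte = (uint8_t)(mask >> (8 * i));
        if (mask_byte & 0x80) {
            dst[i] = (uint8_t)(src >> (8 * i));
        }
    }
}
```

With a source of 0x8877665544332211 and a mask of 0x0000000000FF7F80, only bytes 0 and 2 are written: mask byte 1 is 0x7F, whose top bit is clear, so that destination byte keeps its old value.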
pmovmskb copies the top bit of each byte of an MMX register into the bottom 8 bits of a regular 32-bit register. The remaining 24 bits are set to zero.
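The bit gathering can be sketched in portable C as follows. This models the result only; pmovmskb_model is a hypothetical name, and the MMX register is represented as a plain 64-bit integer.

```c
#include <stdint.h>

/* Portable model of pmovmskb: bit i of the result is the top bit of
 * byte i of the MMX value; bits 8..31 of the result are zero. */
uint32_t pmovmskb_model(uint64_t mm)
{
    uint32_t result = 0;
    for (int i = 0; i < 8; i++) {
        uint32_t top_bit = (uint32_t)(mm >> (8 * i + 7)) & 1u;
        result |= top_bit << i;
    }
    return result;
}
```

A common use is to follow a byte comparison such as pcmpeqb, which produces 0xFF or 0x00 per byte, and then test the resulting 8-bit mask with ordinary integer instructions.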
pextrw extracts a selected word from an MMX register into the bottom half of a 32-bit register. Selection is performed by an 8-bit immediate value, of which only the bottom 2 bits are used to select one of the 4 words in the MMX register. The top 16 bits of the 32-bit register are set to zero.
pinsrw is the counterpart of pextrw: it moves data from the bottom 16 bits of a 32-bit register, or from a 16-bit memory location, into a selected word of an MMX register. None of the other words in the destination MMX register are changed.
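Word extraction and insertion can be modeled in portable C like this. The names pextrw_model and pinsrw_model are hypothetical, and the MMX register is represented as a 64-bit integer with word 0 in the lowest 16 bits.

```c
#include <stdint.h>

/* Portable model of pextrw: only the bottom 2 bits of imm8 select the word;
 * the top 16 bits of the 32-bit result are zero. */
uint32_t pextrw_model(uint64_t mm, uint8_t imm8)
{
    unsigned shift = 16u * (imm8 & 3u);
    return (uint32_t)((mm >> shift) & 0xFFFFu);
}

/* Portable model of pinsrw: replaces the selected word with the bottom
 * 16 bits of r32 and leaves the other three words unchanged. */
uint64_t pinsrw_model(uint64_t mm, uint32_t r32, uint8_t imm8)
{
    unsigned shift = 16u * (imm8 & 3u);
    uint64_t cleared = mm & ~((uint64_t)0xFFFF << shift);
    return cleared | ((uint64_t)(r32 & 0xFFFFu) << shift);
}
```

For example, with mm = 0x4444333322221111, extracting word 2 yields 0x00003333, and inserting 0xDEADBEEF at word 1 yields 0x44443333BEEF1111 because only the low 16 bits of the source are used.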
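pshufw is a remarkably flexible instruction that lets a programmer rearrange the 4 words of a source MMX register or 64-bit memory location into a destination MMX register in 1 of 256 possible ways. It takes 3 parameters: the destination register, the source, and an 8-bit permutation value. The permutation byte is read 2 bits at a time: bits 0-1 select which source word goes into destination word 0, bits 2-3 select the source for word 1, bits 4-5 for word 2, and bits 6-7 for word 3.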
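The permutation byte is easier to follow with a small model. The sketch below reproduces the word selection in portable C; pshufw_model is a hypothetical name, and the registers are represented as 64-bit integers with word 0 in the lowest 16 bits.

```c
#include <stdint.h>

/* Portable model of pshufw: destination word i is the source word chosen
 * by bits (2*i+1):(2*i) of imm8. */
uint64_t pshufw_model(uint64_t src, uint8_t imm8)
{
    uint64_t dst = 0;
    for (int i = 0; i < 4; i++) {
        unsigned sel = (imm8 >> (2 * i)) & 3u;
        uint64_t word = (src >> (16 * sel)) & 0xFFFFu;
        dst |= word << (16 * i);
    }
    return dst;
}
```

Taking src = 0x4444333322221111: a permutation byte of 0xE4 (binary 11 10 01 00) leaves the words in place, 0x1B (binary 00 01 10 11) reverses them to 0x1111222233334444, and 0x00 broadcasts word 0 to give 0x1111111111111111.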
MMX Extensions — Integer Operations
AMD's extensions provide several new integer operations as well.
pavgb stores the rounded-up averages of the bytes in an MMX register and the corresponding bytes in another register or memory location. It operates on bytes, and treats them as unsigned. This instruction is identical to the 3DNow! pavgusb instruction.
pavgw is just like pavgb, except it operates on 16-bit words instead of 8-bit bytes. These are also treated as unsigned values.
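"Rounded-up average" means (a + b + 1) >> 1 computed without overflow for each element. A minimal portable C model, with the hypothetical names pavgb_model and pavgw_model and MMX registers represented as 64-bit integers, might look like this:

```c
#include <stdint.h>

/* Portable model of pavgb: per-byte unsigned average, rounded up. */
uint64_t pavgb_model(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        uint32_t x = (uint32_t)(a >> (8 * i)) & 0xFFu;
        uint32_t y = (uint32_t)(b >> (8 * i)) & 0xFFu;
        uint64_t avg = (x + y + 1u) >> 1;  /* 9-bit sum fits easily in 32 bits */
        result |= avg << (8 * i);
    }
    return result;
}

/* Portable model of pavgw: the same operation on four 16-bit words. */
uint64_t pavgw_model(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t x = (uint32_t)(a >> (16 * i)) & 0xFFFFu;
        uint32_t y = (uint32_t)(b >> (16 * i)) & 0xFFFFu;
        uint64_t avg = (x + y + 1u) >> 1;  /* 17-bit sum fits in 32 bits */
        result |= avg << (16 * i);
    }
    return result;
}
```

Averaging the bytes 1 and 2 gives 2 rather than 1, and averaging 0xFF with 0xFF gives 0xFF, since the intermediate sum is kept wider than the element.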
pmaxsw replaces each word of a register with the larger of that word and the corresponding word of another register or memory location. It operates on 16-bit words, and treats them as signed values.
pmaxub is similar to pmaxsw, except it operates on 8-bit bytes instead of words, and treats the values as unsigned.
pminsw replaces each word of a register with the smaller of that word and the corresponding word of another register or memory location. Like pmaxsw, it operates on 16-bit signed words.
pminub is just like pminsw, except it operates on 8-bit unsigned bytes instead of 16-bit signed words.
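The difference between the signed-word and unsigned-byte variants shows up clearly in a small model. The sketch below covers pmaxsw and pminub in portable C; pminsw and pmaxub follow by swapping the comparison. The function names are hypothetical, and the conversion to int16_t assumes the two's complement behavior that mainstream compilers provide.

```c
#include <stdint.h>

/* Portable model of pmaxsw: per-word maximum, words treated as signed. */
uint64_t pmaxsw_model(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        /* Assumes two's complement conversion from uint16_t to int16_t. */
        int16_t x = (int16_t)(uint16_t)(a >> (16 * i));
        int16_t y = (int16_t)(uint16_t)(b >> (16 * i));
        int16_t m = (x > y) ? x : y;
        result |= (uint64_t)(uint16_t)m << (16 * i);
    }
    return result;
}

/* Portable model of pminub: per-byte minimum, bytes treated as unsigned. */
uint64_t pminub_model(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        uint8_t m = (x < y) ? x : y;
        result |= (uint64_t)m << (8 * i);
    }
    return result;
}
```

Note that the word 0xFFFF counts as -1 for pmaxsw, so it loses against 0x0000, while the byte 0xFF counts as 255 for pminub, so it loses against 0x01.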
pmulhuw multiplies the 4 unsigned 16-bit words in a register by the 4 corresponding words in another register or memory location, and stores the top 16 bits of each 32-bit product.
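The "top 16 bits of each product" rule can be sketched in portable C as follows. This is a model of the arithmetic only; pmulhuw_model is a hypothetical name and the registers are represented as 64-bit integers.

```c
#include <stdint.h>

/* Portable model of pmulhuw: per-word unsigned multiply, keeping the
 * high 16 bits of each 32-bit product. */
uint64_t pmulhuw_model(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t x = (uint32_t)(a >> (16 * i)) & 0xFFFFu;
        uint32_t y = (uint32_t)(b >> (16 * i)) & 0xFFFFu;
        uint32_t product = x * y;  /* at most 0xFFFE0001, fits in 32 bits */
        result |= (uint64_t)(product >> 16) << (16 * i);
    }
    return result;
}
```

For instance, 0xFFFF times 0xFFFF gives 0xFFFE0001, so the stored word is 0xFFFE. The signed MMX instruction pmulhw would instead treat both inputs as -1 and store 0x0000.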
psadbw calculates the sum of absolute differences. It computes the difference between each unsigned byte of a register and the corresponding byte of another register or memory location, then sums the absolute values of all 8 differences. The result goes into the bottom 16 bits of the first register, and the top 48 bits are set to zero.
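The calculation can be modeled in portable C as shown below. The name psadbw_model is hypothetical, and the registers are represented as 64-bit integers. Since the largest possible sum is 8 * 255 = 2040, the result always fits in the bottom 16 bits.

```c
#include <stdint.h>

/* Portable model of psadbw: sum of the absolute differences of the
 * 8 unsigned byte pairs. The upper 48 bits of the result are zero. */
uint64_t psadbw_model(uint64_t a, uint64_t b)
{
    uint64_t sum = 0;
    for (int i = 0; i < 8; i++) {
        int x = (int)((a >> (8 * i)) & 0xFFu);
        int y = (int)((b >> (8 * i)) & 0xFFu);
        sum += (uint64_t)(x > y ? x - y : y - x);
    }
    return sum;
}
```

With a = 0x0305 and b = 0x0A01 in the low bytes, the result is |5 - 1| + |3 - 10| = 11. This kind of sum is a common similarity measure when comparing blocks of pixels, for example in video motion estimation.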
Trademark Information
MMX is a registered trademark of Intel Corporation or its subsidiaries in the United States and other countries.
Athlon is a registered trademark of Advanced Micro Devices, Inc.