3DNow! Instruction Set
3DNow! - The Registers
Like MMX, 3DNow! maps onto the FPU registers, giving the programmer eight 64-bit registers to use. As in MMX, they are addressed as MM0 - MM7. 3DNow!'s floating point register format isn't as flexible as MMX's integer formats: each register holds two 32-bit single-precision floating point values. For integer operations, the formats are identical to the MMX formats (one 64-bit quantity, two 32-bit quantities, four 16-bit quantities, or eight 8-bit quantities).
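The register formats can be pictured as different views of the same 8 bytes. The following is a minimal sketch in C rather than assembly; the `mm_reg` type and the `mm_from_floats` helper are hypothetical names used only for illustration, and the code assumes a 32-bit IEEE `float`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 64-bit register (MM0 - MM7).
   Every member overlays the same 8 bytes. */
typedef union {
    float    f32[2]; /* 3DNow!: two single-precision values */
    uint64_t u64;    /* MMX: one 64-bit quantity */
    int32_t  i32[2]; /* MMX: two 32-bit quantities */
    int16_t  i16[4]; /* MMX: four 16-bit quantities */
    uint8_t  u8[8];  /* MMX: eight 8-bit quantities */
} mm_reg;

/* Build a register image from two floats; element 0 is the low half. */
mm_reg mm_from_floats(float lo, float hi) {
    mm_reg r;
    assert(sizeof(float) == 4); /* assumption: 32-bit float */
    r.f32[0] = lo;
    r.f32[1] = hi;
    return r;
}
```

Reading `i32[0]` after storing `f32[0]` gives the raw bit pattern of the low float, which is exactly what an integer instruction would see if it were applied to a register holding floating point data.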
3DNow! - State Management
3DNow! uses the same register space as MMX, so its state can be cleared with the MMX emms instruction. It should be used in the same places as well (when transitioning from MMX/3DNow! code to regular floating point code).
In addition to the above, 3DNow! adds one more state management instruction, femms (faster emms), which operates very much like emms. However, femms leaves the contents of the MMX/3DNow! registers undefined, allowing it to execute faster.
3DNow! - Cache Management
Beginning with 3DNow!, SIMD instruction sets have added a few instructions that deal with managing the data cache of the processors. These allow the programmer to fetch data into cache while other data is being operated on, effectively hiding the RAM latency and preventing the CPU from stalling on cache misses.
3DNow! adds two new instructions, prefetch and prefetchw, for these purposes. prefetch and prefetchw are almost identical. The only difference is that prefetchw prepares the cache line to be written to. This is useful if the programmer knows that the values located there are about to be changed. In contrast, prefetch just loads the data into cache without expecting it to be written. Early AMD processors such as the K6-2 and K6-III treated prefetchw exactly the same as prefetch. On the AMD Athlon, however, prefetchw causes the processor to mark the cache line as modified.
prefetch and prefetchw take 1 parameter, which is the address from which to start loading data. A full cache line is loaded, which is at least 32 bytes.
It's safe to prefetch an invalid memory location, so going off the end of an array is OK.
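In C, the same idea is usually expressed through a compiler builtin rather than the raw instructions. The sketch below assumes GCC or Clang, whose `__builtin_prefetch` takes an address and a read (0) or write (1) hint; which instruction the compiler actually emits depends on the target. `PREFETCH_AHEAD` and both function names are hypothetical, and the distance of 16 elements is an untuned guess.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning constant: how many elements ahead to prefetch. */
enum { PREFETCH_AHEAD = 16 };

/* Sum an array while hinting upcoming data into cache for reading. */
float sum_with_prefetch(const float *data, size_t n) {
    float total = 0.0f;
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* second argument 0: read hint, in the spirit of prefetch */
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 0);
        }
        total += data[i];
    }
    return total;
}

/* Scale an array in place while hinting upcoming data for writing. */
void scale_with_prefetchw(float *data, size_t n, float k) {
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* second argument 1: write hint, in the spirit of prefetchw */
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 1);
        }
        data[i] *= k;
    }
}
```

The bounds check is there only because forming a pointer far past the end of an array is undefined in C; the prefetch itself would not fault.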
3DNow! - Integer Instructions
While 3DNow! is mainly for floating point use, it also adds a few integer instructions to MMX.
pavgusb gives the rounded-up average of 8 unsigned 8-bit quantity pairs. It takes 2 parameters. One parameter must be an MMX register, and the other can be an MMX register or a memory location.
pmulhrw multiplies 4 signed 16-bit quantity pairs and returns the high 16 bits of each 32-bit product, rounded. This is similar to the MMX instruction pmulhw, except that this one rounds instead of truncating. It takes 2 parameters, one of which is an MMX register. The other can be another MMX register or a memory location.
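The rounding behaviour is easiest to see one lane at a time. The following C sketch models a single lane of each instruction; the function names are hypothetical, and the code assumes that right-shifting a negative signed integer is an arithmetic shift, which holds on mainstream compilers but is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of pavgusb: average of two unsigned bytes, rounded up. */
uint8_t pavgusb_lane(uint8_t a, uint8_t b) {
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* One lane of MMX pmulhw: high 16 bits of the signed product, truncated. */
int16_t pmulhw_lane(int16_t a, int16_t b) {
    int32_t product = (int32_t)a * (int32_t)b;
    return (int16_t)(product >> 16);
}

/* One lane of pmulhrw: add half of the discarded range (0x8000)
   before taking the high 16 bits, so the result is rounded. */
int16_t pmulhrw_lane(int16_t a, int16_t b) {
    int32_t product = (int32_t)a * (int32_t)b;
    assert(((int32_t)-2 >> 1) == -1); /* assumption: arithmetic shift */
    return (int16_t)((product + 0x8000) >> 16);
}
```

For example, 16384 times 2 is 0x8000, exactly half of the discarded range: the truncating model returns 0, while the rounding model returns 1.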
3DNow! - Conversion Instructions
3DNow! provides 2 instructions to convert between integer and floating point types. They are pi2fd and pf2id, which convert integers to floating point and floating point to integers, respectively.
pi2fd takes 2 parameters. One is the destination, which gets the floating point values, and the second is an MMX register or a memory location that holds the integers to convert.
pf2id also takes 2 parameters, for the same purposes. Of course, this converts the other way.
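As a rough illustration, each lane of the two conversions can be modelled in C. This is a sketch under stated assumptions: the function names are hypothetical, pf2id is modelled as truncating toward zero with out-of-range inputs clamped to the 32-bit limits, and NaN inputs are excluded. The cast in `pi2fd_lane` is exact only for magnitudes below 2^24; rounding of larger values is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of pi2fd: signed 32-bit integer to single precision.
   Exact for |x| < 2^24; larger values are not modelled precisely. */
float pi2fd_lane(int32_t x) {
    return (float)x;
}

/* One lane of pf2id: single precision to signed 32-bit integer,
   truncating toward zero. Clamping is an assumption of this sketch. */
int32_t pf2id_lane(float x) {
    assert(x == x); /* NaN is outside the scope of this model */
    if (x >= 2147483648.0f) {
        return INT32_MAX;
    }
    if (x <= -2147483648.0f) {
        return INT32_MIN;
    }
    return (int32_t)x; /* a C cast also truncates toward zero */
}
```

Because the conversion truncates, callers that want round-to-nearest have to add or subtract 0.5 themselves before converting.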
3DNow! - Floating Point Instructions
Floating point operation is the real power behind 3DNow!'s instruction set. There are instructions for all kinds of operations, including max and min functions, reciprocals, square roots (and reciprocal square roots), as well as ordinary add, subtract and multiply functions.
Max and Min
pfmax is the instruction used to get the maximum of each of the 2 pairs of floating point values held in its operands. Its first parameter is an MMX register, and its second is another register or a memory location. Once completed, the first register will hold the larger value of each pair.
pfmin operates in the same way as pfmax, except it stores the minimum instead of the maximum.
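A common use of the pair is clamping both values to a range without branching. The sketch below models the two instructions in C; the `v2f` type and all function names are hypothetical, and NaN handling is not modelled.

```c
#include <assert.h>

/* Hypothetical model of a register holding two floats. */
typedef struct { float lo, hi; } v2f;

/* Model of pfmax: the larger value of each pair. */
v2f pfmax_model(v2f dst, v2f src) {
    v2f r;
    r.lo = dst.lo > src.lo ? dst.lo : src.lo;
    r.hi = dst.hi > src.hi ? dst.hi : src.hi;
    return r;
}

/* Model of pfmin: the smaller value of each pair. */
v2f pfmin_model(v2f dst, v2f src) {
    v2f r;
    r.lo = dst.lo < src.lo ? dst.lo : src.lo;
    r.hi = dst.hi < src.hi ? dst.hi : src.hi;
    return r;
}

/* Clamp both values to [low, high] with one max and one min. */
v2f clamp_model(v2f x, v2f low, v2f high) {
    assert(low.lo <= high.lo && low.hi <= high.hi);
    return pfmin_model(pfmax_model(x, low), high);
}
```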
Comparison
3DNow! gives us a few instructions for comparing MMX registers.
pfcmpeq checks for equality between a register and another register or a memory location. It compares both 32-bit values at the same time. For each pair, it sets the corresponding 32 bits of the first register to all ones if the comparison is true and to all zeros if it is false.
pfcmpge operates the same way pfcmpeq does, only it checks for greater than or equal to, not just equal to.
pfcmpgt operates the same way pfcmpeq does, only it checks for strictly greater than (equal values compare false).
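The all-ones or all-zeros result is meant to be used as a bit mask. The following C sketch models the three comparisons and shows how a mask can select between two values using only AND, AND-NOT and OR, which is how the MMX logical instructions would be combined. All type and function names are hypothetical, and the code assumes a 32-bit `float`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { float lo, hi; } v2f;       /* two packed floats */
typedef struct { uint32_t lo, hi; } v2mask; /* two 32-bit masks  */

static uint32_t mask_of(int condition) {
    return condition ? 0xFFFFFFFFu : 0u;
}

v2mask pfcmpeq_model(v2f a, v2f b) {
    v2mask m = { mask_of(a.lo == b.lo), mask_of(a.hi == b.hi) };
    return m;
}

v2mask pfcmpge_model(v2f a, v2f b) {
    v2mask m = { mask_of(a.lo >= b.lo), mask_of(a.hi >= b.hi) };
    return m;
}

v2mask pfcmpgt_model(v2f a, v2f b) {
    v2mask m = { mask_of(a.lo > b.lo), mask_of(a.hi > b.hi) };
    return m;
}

/* Branchless select on one value: (t AND mask) OR (f AND NOT mask). */
float select_lane(uint32_t mask, float if_true, float if_false) {
    uint32_t t, f, r;
    float out;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&t, &if_true, sizeof t);
    memcpy(&f, &if_false, sizeof f);
    r = (t & mask) | (f & ~mask);
    memcpy(&out, &r, sizeof out);
    return out;
}
```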
pfadd adds an MMX register and another MMX register or a memory location, and stores the sums in the first register. Fairly simple.
pfacc performs an accumulation operation. It adds the top and bottom values of the first register and stores the sum in the bottom of that register, and it stores the sum of the second register or memory location's top and bottom values in the top of the first register.
pfsub subtracts another register or a memory location from an MMX register (r1=r1-r2). It works just like pfadd, but subtracts instead of adding.
pfsubr performs a reverse-subtract. Instead of subtracting the second parameter from the first (r1=r1-r2), this one subtracts the first from the second (r1=r2-r1).
pfmul multiplies two registers or a register and a memory location, and stores the results in the first register.
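Of these, pfacc and pfsubr are the two whose operand order is easy to get wrong. The C sketch below models them alongside pfsub and pfmul, and combines pfmul with pfacc into a two-element dot product. The `v2f` type and all function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical model of a register holding two floats. */
typedef struct { float lo, hi; } v2f;

/* Model of pfsub: r1 = r1 - r2. */
v2f pfsub_model(v2f r1, v2f r2) {
    v2f r = { r1.lo - r2.lo, r1.hi - r2.hi };
    return r;
}

/* Model of pfsubr: r1 = r2 - r1. */
v2f pfsubr_model(v2f r1, v2f r2) {
    v2f r = { r2.lo - r1.lo, r2.hi - r1.hi };
    return r;
}

/* Model of pfmul: r1 = r1 * r2, pair by pair. */
v2f pfmul_model(v2f r1, v2f r2) {
    v2f r = { r1.lo * r2.lo, r1.hi * r2.hi };
    return r;
}

/* Model of pfacc: bottom = sum of r1's halves, top = sum of r2's halves. */
v2f pfacc_model(v2f r1, v2f r2) {
    v2f r = { r1.lo + r1.hi, r2.lo + r2.hi };
    return r;
}

/* Dot product of two 2-element vectors: one pfmul, then one pfacc. */
float dot2_model(v2f a, v2f b) {
    v2f products = pfmul_model(a, b);
    v2f sums = pfacc_model(products, products);
    assert(sums.lo == sums.hi); /* holds when the sum is not NaN */
    return sums.lo;
}
```

Passing the same register as both operands of pfacc, as `dot2_model` does, leaves the horizontal sum in both halves.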
pfrcp stores an approximation of the reciprocal of the low 32-bit value of a register or memory location into a register. This instruction is only accurate to 14 bits and takes 2 clock cycles to complete. Higher precision can be obtained by using a few more instructions (listed later). This instruction duplicates the result into the top and bottom halves of the destination register.
pfrsqrt computes an approximation of the reciprocal square root of a memory location or register, and stores it in the top and bottom halves of a destination register, similar to pfrcp. This instruction is only accurate to about 15 bits, and full precision can be obtained by using a few more instructions (listed below).
High Precision Reciprocals and Square Roots
For some applications, a quick approximation of a reciprocal or square root may be satisfactory. However, if more precision is needed, 3DNow! provides a few more instructions that extend pfrcp and pfrsqrt above to higher-precision operations. These improve accuracy by using a Newton-Raphson algorithm.
pfrcpit1 is the first iteration of Newton-Raphson. It takes two input operands, the first being the number being reciprocated, and the second being the output of that number passed through pfrcp.
pfrcpit2 is essentially the same as pfrcpit1, except it is the second iteration. Its inputs are the outputs of pfrcpit1 and pfrcp.
pfrsqit1 is the first iteration of Newton-Raphson after using pfrsqrt. It parallels pfrcpit1.
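The mathematics behind the refinement can be sketched in C. This illustrates the Newton-Raphson steps themselves, not the exact way the hardware divides the work between the individual instructions. All function names are hypothetical, and `rcp_seed` only imitates a low-precision estimate by discarding mantissa bits of a full-precision result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a low-precision hardware estimate:
   keep only the top `bits` mantissa bits of x. */
static float truncate_mantissa(float x, int bits) {
    uint32_t u;
    assert(bits >= 1 && bits <= 23);
    memcpy(&u, &x, sizeof u);
    u &= ~((1u << (23 - bits)) - 1u);
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Imitates a 14-bit reciprocal estimate of b. */
float rcp_seed(float b) {
    return truncate_mantissa(1.0f / b, 14);
}

/* One Newton-Raphson step for 1/b:  x1 = x0 * (2 - b * x0). */
float rcp_refine(float b, float x0) {
    return x0 * (2.0f - b * x0);
}

/* One Newton-Raphson step for 1/sqrt(b):
   y1 = y0 * (3 - b * y0 * y0) / 2. */
float rsqrt_refine(float b, float y0) {
    return 0.5f * y0 * (3.0f - b * y0 * y0);
}
```

Each step roughly doubles the number of correct bits, which is why a single refinement of a 14 or 15 bit estimate is enough to approach full single precision.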
Trademark Information
3DNow! is a registered trademark of Advanced Micro Devices, Inc.
MMX is a registered trademark of Intel Corporation or its subsidiaries in the United States and other countries.