Using CPUID for SIMD Detection

CPUID

cpuid is an instruction introduced with the Intel Pentium (and some later 80486s) that lets programmers determine which features the current CPU supports, who made it, which extensions and abilities it offers, and how its caches are configured.

This article will show you how to get information using cpuid, and how to interpret that information to detect support for MMX and its extensions, 3DNow! and its extensions, SSE and its extensions, and some other useful features.

Using CPUID

There are many ways to use cpuid, depending on where you are working. GCC and MSVC each have their own (different) inline assembly syntax, and you can also use it in plain assembly files (for NASM et al.).

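cpuid reads the value in eax and returns data in eax, ebx, ecx, and edx. The eax input is known as the "Function" input. Valid values run from 0x00000000 up to the maximum reported by Function 0, and from 0x80000000 up to the maximum reported by Function 0x80000000. A few functions, such as Function 0x4, also read a sub-function index from ecx.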

To use cpuid in GCC, it is normally easiest to just define a macro, like this:

/* ecx is zeroed on input because some functions (e.g. 0x4) also read it. */
#define cpuid(func,ax,bx,cx,dx)\
	__asm__ __volatile__ ("cpuid":\
	"=a" (ax), "=b" (bx), "=c" (cx), "=d" (dx) : "0" (func), "2" (0))


In the above, you simply pass whatever function number you want as func, along with 4 variables that receive the output values of eax, ebx, ecx, and edx, respectively. Use unsigned variables, since some feature flags (3DNow!, for example) live in bit 31.

	unsigned int a,b,c,d;
	...
	cpuid(0,a,b,c,d);
	...
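GCC and Clang also ship a <cpuid.h> header with an equivalent __cpuid macro and a checked helper, __get_cpuid(), that refuses to issue a function above the maximum the CPU reports. A minimal sketch for x86 targets; the wrapper names max_standard_function and query_cpuid are hypothetical:

```c
#include <cpuid.h>

/* Highest standard function, read with <cpuid.h>'s __cpuid macro,
   which has the same shape as the cpuid() macro above. */
unsigned int max_standard_function(void)
{
    unsigned int a, b, c, d;
    __cpuid(0, a, b, c, d);
    (void)b; (void)c; (void)d;   /* only eax is needed here */
    return a;
}

/* Checked query: __get_cpuid() returns 0 (and leaves regs untouched)
   when func exceeds the maximum reported for its range, or when the
   CPU has no cpuid instruction at all. Returns 1 on success. */
int query_cpuid(unsigned int func, unsigned int regs[4])
{
    return __get_cpuid(func, &regs[0], &regs[1], &regs[2], &regs[3]);
}
```

Checking the return value of query_cpuid() before reading regs is what keeps a program from trusting data from a function the CPU never implemented.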


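In NASM, you simply use the cpuid instruction and handle the outputs as desired. In MSVC, it's a bit longer, but still not too crazy. MSVC only accepts inline assembly when targeting 32-bit x86, and inside a macro each instruction needs its own __asm keyword. (Please keep in mind that I don't use MSVC much at all anymore, so this isn't tested.)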
#define cpuid(func,a,b,c,d)\
	__asm {\
	__asm mov	eax, func\
	__asm xor	ecx, ecx\
	__asm cpuid\
	__asm mov	a, eax\
	__asm mov	b, ebx\
	__asm mov	c, ecx\
	__asm mov	d, edx\
	}
You can then call it with the same snippet as above.
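MSVC also provides a __cpuid intrinsic in <intrin.h>, declared as void __cpuid(int cpuInfo[4], int function_id), which works on both 32-bit and 64-bit targets, whereas inline assembly is not available when compiling for x64. A sketch of a wrapper (the name cpuid_regs is hypothetical) that uses the intrinsic under MSVC and GCC/Clang's <cpuid.h> macro elsewhere:

```c
#if defined(_MSC_VER)
#include <intrin.h>
#else
#include <cpuid.h>
#endif

/* Runs cpuid for func and stores eax, ebx, ecx, edx in regs[0..3]. */
void cpuid_regs(unsigned int func, unsigned int regs[4])
{
#if defined(_MSC_VER)
    int info[4];
    __cpuid(info, (int)func);            /* MSVC intrinsic: x86 and x64 */
    for (int i = 0; i < 4; i++)
        regs[i] = (unsigned int)info[i];
#else
    __cpuid(func, regs[0], regs[1], regs[2], regs[3]);  /* GCC/Clang macro */
#endif
}
```

Keeping the compiler-specific part behind one small function means the rest of the detection code can stay identical across compilers.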

CPUID — Functions

As mentioned above, cpuid has many functions, depending on the microprocessor it runs on. We'll start from the beginning.

Function 0x00000000:
Function 0 is used to get the Vendor String from the CPU. It also tells us the maximum function supported by cpuid. Every cpuid-supporting CPU supports at least this function. I'll describe as many functions as I can find information about, but please keep in mind that not all CPUs handle all function values, so check the maximum before calling a higher function.

When called, eax gets the maximum function call value.
ebx gets the first 4 bytes of the Vendor String.
edx gets the second 4 bytes of the Vendor String.
ecx gets the last 4 bytes of the Vendor String.

When all the Vendor String bytes are lined up, they'll spell a clever phrase depending on who manufactured it.
Intel uses "GenuineIntel",
AMD uses "AuthenticAMD", and
Transmeta uses "GenuineTMx86".
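Because the three registers come back in the order ebx, edx, ecx, assembling the string is an easy place to slip. A sketch assuming GCC or Clang on x86, which is little-endian, so each register's bytes are already in string order; cpu_vendor is a hypothetical name:

```c
#include <cpuid.h>
#include <string.h>

/* Writes the 12-character Vendor String plus a terminating NUL into out. */
void cpu_vendor(char out[13])
{
    unsigned int max_func, ebx, ecx, edx;
    __cpuid(0, max_func, ebx, ecx, edx);
    (void)max_func;              /* eax (maximum function) is not needed here */

    /* Note the order: ebx, then edx, then ecx. */
    memcpy(out + 0, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
}
```

A caller can then compare the result with strcmp(), e.g. strcmp(v, "AuthenticAMD") == 0.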

There were other players back in the day, but they seem to have gone out of business or left the CPU market. They were SiS, Cyrix (now owned by VIA), Centaur (also owned by VIA), NexGen (sold to AMD in 1996), National Semiconductor (their processor was called the 'Geode'), UMC (which violated Intel patents, so its chips were never sold Stateside; it seems to have produced only some 486s), and Rise (IP sold to SiS in 1999).

Function 0x00000001:
Function 0x1 returns the Processor Family, Model, and Stepping information in eax. edx gets the Standard Feature Flags.


bits (eax)   field
0-3          Stepping number
4-7          Model number
8-11         Family number
12-13        Processor Type
16-19        Extended Model Number
20-27        Extended Family Number
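The fields can be pulled out of eax with shifts and masks. A sketch with hypothetical names; the display_family() rule (add the Extended Family only when the base Family is 0xF) is the convention Intel and AMD document, not something stated in this article:

```c
/* Fields of the Function 0x1 eax value, per the table above. */
struct cpu_signature {
    unsigned int stepping, model, family, type, ext_model, ext_family;
};

struct cpu_signature decode_signature(unsigned int eax)
{
    struct cpu_signature s;
    s.stepping   = eax & 0xF;          /* bits 0-3   */
    s.model      = (eax >> 4) & 0xF;   /* bits 4-7   */
    s.family     = (eax >> 8) & 0xF;   /* bits 8-11  */
    s.type       = (eax >> 12) & 0x3;  /* bits 12-13 */
    s.ext_model  = (eax >> 16) & 0xF;  /* bits 16-19 */
    s.ext_family = (eax >> 20) & 0xFF; /* bits 20-27 */
    return s;
}

/* The Extended Family is only added when the base Family is 0xF. */
unsigned int display_family(struct cpu_signature s)
{
    return s.family == 0xF ? s.family + s.ext_family : s.family;
}
```

For example, a signature of 0x000306C3 decodes to Family 6, Model 0xC, Stepping 3, Extended Model 3.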


What we're really interested in is what's in edx, the Standard Feature Flags. This register holds a bitmask with most of the bits we need to determine which SIMD extensions our CPU supports. If we're trying to detect SSE3, we'll also want to look at ecx.

bit (edx)   feature
18          PN
19          CLFlush
23          MMX
25          SSE
26          SSE2
28          HTT


bit (ecx)   feature
0           SSE3

Please note that these tables are heavily abridged. There are many other features in the other bits of these registers.
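With the bit positions above, detection reduces to masking ecx and edx. A sketch with hypothetical names, split into a pure decoding step and a live query; it reports CPU support only, and the OS also has to support SSE, as discussed further down:

```c
#include <cpuid.h>

#define STD_EDX_MMX  (1u << 23)
#define STD_EDX_SSE  (1u << 25)
#define STD_EDX_SSE2 (1u << 26)
#define STD_ECX_SSE3 (1u << 0)

struct simd_flags { int mmx, sse, sse2, sse3; };

/* Pure decoding step: turns the Function 0x1 ecx/edx values into flags. */
struct simd_flags decode_std_flags(unsigned int ecx, unsigned int edx)
{
    struct simd_flags f;
    f.mmx  = (edx & STD_EDX_MMX) != 0;
    f.sse  = (edx & STD_EDX_SSE) != 0;
    f.sse2 = (edx & STD_EDX_SSE2) != 0;
    f.sse3 = (ecx & STD_ECX_SSE3) != 0;
    return f;
}

/* Queries the running CPU; all flags stay 0 if Function 0x1 is unavailable. */
struct simd_flags detect_std_flags(void)
{
    unsigned int a, b, c, d;
    struct simd_flags none = {0, 0, 0, 0};
    if (!__get_cpuid(1, &a, &b, &c, &d))
        return none;
    return decode_std_flags(c, d);
}
```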

Function 0x00000002:
Function 0x2 tells us some information about the processor's cache and TLB configuration.

Function 0x00000003:
Function 0x3 returns the bottom 64 bits of the processor's 96-bit serial number in edx:ecx (if PN is enabled). This was only used on Pentium III processors. The top 32 bits are the Processor Signature, which is the value of eax after executing cpuid with eax=1 (Function 1).

Function 0x00000004:
Function 0x4 returns more detailed information about the caches, as well as information about how many cores the processor has. It is meant to be called multiple times: set ecx to 0, 1, 2, and so on, and each call describes one cache, until a call returns a cache type (eax bits 0-4) of 0.
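A sketch of that enumeration, using GCC/Clang's __cpuid_count macro to set ecx. The field layout (ebx holds ways, partitions, and line size, each minus one; ecx holds sets minus one) is taken from Intel's documentation rather than this article, and the struct and function names are hypothetical. Processors that don't implement Function 0x4 may return all zeros, in which case the loop simply finds no caches:

```c
#include <cpuid.h>

/* One cache as described by a single Function 0x4 call. */
struct cache_info {
    unsigned int level;      /* 1, 2, 3, ... */
    unsigned int type;       /* 1 = data, 2 = instruction, 3 = unified */
    unsigned int line_size;  /* bytes */
    unsigned long size;      /* bytes */
};

/* Enumerates caches with ecx = 0, 1, 2, ... until the cache type field
   (eax bits 0-4) reads 0. Returns the number of entries written. */
int enumerate_caches(struct cache_info *out, int max_out)
{
    unsigned int a, b, c, d;
    int n = 0;

    if (__get_cpuid_max(0, 0) < 4)
        return 0;                        /* Function 0x4 not supported */

    for (unsigned int i = 0; n < max_out; i++) {
        __cpuid_count(4, i, a, b, c, d);
        unsigned int type = a & 0x1F;
        if (type == 0)
            break;                       /* no more caches */
        unsigned int ways       = ((b >> 22) & 0x3FF) + 1;
        unsigned int partitions = ((b >> 12) & 0x3FF) + 1;
        unsigned int line       = (b & 0xFFF) + 1;
        unsigned int sets       = c + 1;
        out[n].level     = (a >> 5) & 0x7;
        out[n].type      = type;
        out[n].line_size = line;
        out[n].size      = (unsigned long)ways * partitions * line * sets;
        n++;
    }
    return n;
}
```

The max_out bound also protects against looping forever if a virtual machine reports nonsense.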

Function 0x00000005:
Function 0x5 reports parameters for the SSE3 monitor and mwait instructions. These are normally used only by the Operating System, so they're not very useful for user applications.

Function 0x00000006:
Function 0x6 provides some information on Power Management and Temperature Control. Again, these are Operating System features, and we won't bother with them much here.

When Intel first introduced cpuid, they didn't mention how they planned to use the bits. So when other vendors started making extensions of their own, they needed a way to use cpuid to indicate their features without stepping on any toes. To do this, they use the second function range (0x80000000 and up). These functions act just like their lower-valued counterparts.

Function 0x80000000:
Function 0x80000000 is just like Function 0, except it returns the highest Extended function supported. This is probably something like 0x80000008 (at least, for Athlons). On AMD processors the other registers hold the same Vendor String as Function 0; Intel documents them as reserved, so don't rely on them there.
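Before calling any Extended function, it is worth confirming that the extended range exists at all. A sketch with a hypothetical name; it assumes, as a heuristic, that a processor without extended functions returns a value below 0x80000000 here:

```c
#include <cpuid.h>

/* Returns the highest Extended function, or 0 if the extended range
   appears to be missing (assumption: such CPUs return a value
   below 0x80000000 for this function). */
unsigned int max_extended_function(void)
{
    unsigned int eax, ebx, ecx, edx;
    __cpuid(0x80000000u, eax, ebx, ecx, edx);
    (void)ebx; (void)ecx; (void)edx;  /* vendor-specific contents */
    return (eax & 0x80000000u) ? eax : 0;
}
```

A caller would then only issue Function 0x80000001 when max_extended_function() >= 0x80000001.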

Function 0x80000001:
Function 0x80000001 returns values into the same registers as Function 1, but some of the meanings have changed.
eax gets the Extended Stepping/Model/Family Numbers.
edx gets the Extended Feature Flags.


bit (edx)   feature
22          AMD MMX Extensions
30          3DNow!2
31          3DNow!

Please note that this is abridged. There are many other features in the other bits of this register.
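3DNow! detection follows the same pattern as the standard flags. A sketch with hypothetical names; __get_cpuid() checks the maximum from Function 0x80000000 before issuing 0x80000001. The bit meanings in the table are AMD's, so also checking the Vendor String is a cautious choice. Note the unsigned constants: 1 << 31 on a plain int is not well-defined in C.

```c
#include <cpuid.h>

#define EXT_EDX_MMXEXT   (1u << 22)  /* AMD MMX Extensions */
#define EXT_EDX_3DNOW2   (1u << 30)  /* 3DNow!2 */
#define EXT_EDX_3DNOW    (1u << 31)  /* 3DNow! */

struct amd_simd_flags { int mmxext, tdnow2, tdnow; };

/* Pure decoding step for the Function 0x80000001 edx value. */
struct amd_simd_flags decode_ext_flags(unsigned int edx)
{
    struct amd_simd_flags f;
    f.mmxext = (edx & EXT_EDX_MMXEXT) != 0;
    f.tdnow2 = (edx & EXT_EDX_3DNOW2) != 0;
    f.tdnow  = (edx & EXT_EDX_3DNOW) != 0;
    return f;
}

/* Queries the running CPU; all flags stay 0 if 0x80000001 is unavailable. */
struct amd_simd_flags detect_ext_flags(void)
{
    unsigned int a, b, c, d;
    struct amd_simd_flags none = {0, 0, 0};
    if (!__get_cpuid(0x80000001u, &a, &b, &c, &d))
        return none;
    return decode_ext_flags(d);
}
```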

SSE3:
In later revisions of the Pentium 4 line, Intel extended SSE2 a bit further. Later AMD Athlon 64 revisions support these instructions as well. These new instructions are known as SSE3 or Prescott New Instructions (PNI), and are reported in bit 0 of ecx by Function 1.

Please note that SSE adds new registers (the XMM registers) to the CPU state, so not only does the CPU have to support it, but the OS also has to save and restore those registers when switching tasks. The way to detect OS support varies from system to system.
Windows 98 and later support SSE.
Linux 2.4 supports SSE (patches are available for 2.2, if it's not native by now).
Various BSD flavors have support for SSE as well.

With the above information in hand, detecting available SIMD extensions should be simple.

CPUID — Detecting Cache Line Sizes

Finding a processor's Cache Line Size is helpful if you want to aggressively flush data out of the cache and prefetch data into it using instructions like prefetch and clflush. How to detect it depends on who makes the CPU.

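For Intel Microprocessors, the Cache Line Size can be calculated by multiplying bh (bits 8-15 of ebx) by 8 after calling cpuid function 0x1. This value is the clflush line size, and is only valid when the CLFlush flag (edx bit 19) is set.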

For AMD Microprocessors, the L1 data Cache Line Size is in cl and the L1 instruction Cache Line Size is in dl after calling cpuid function 0x80000005.
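Both methods fit in a few lines. A sketch with hypothetical names, assuming GCC/Clang's <cpuid.h>; on Intel processors Function 0x80000005 is reserved, so the AMD helper typically reports zeros there:

```c
#include <cpuid.h>

/* Intel method: bh (ebx bits 8-15) of Function 0x1, in 8-byte units.
   Returns 0 if Function 0x1 or the CLFlush flag (edx bit 19) is missing. */
unsigned int clflush_line_size(void)
{
    unsigned int a, b, c, d;
    if (!__get_cpuid(1, &a, &b, &c, &d) || !(d & (1u << 19)))
        return 0;
    return ((b >> 8) & 0xFF) * 8;
}

/* AMD method: Function 0x80000005, L1 data line size in cl and
   L1 instruction line size in dl. Returns 0 if the function is unavailable. */
int amd_l1_line_sizes(unsigned int *data_line, unsigned int *code_line)
{
    unsigned int a, b, c, d;
    if (!__get_cpuid(0x80000005u, &a, &b, &c, &d))
        return 0;
    *data_line = c & 0xFF;
    *code_line = d & 0xFF;
    return 1;
}
```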

Knowing the Cache Line Size allows you to dispatch effective prefetches and flushes, and can also help you align data to Cache Line boundaries so that loads don't span two Cache Lines, which can hurt performance.
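A sketch of allocating a buffer on a Cache Line boundary with C11's aligned_alloc(). The line argument would come from one of the detection methods above; 64 bytes is common, but that is an assumption worth not hard-coding. The helper names are hypothetical:

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates at least size bytes starting on a line-byte boundary.
   line must be a power of two (cache line sizes are); size is rounded
   up because C11 aligned_alloc() expects a multiple of the alignment. */
void *alloc_cache_aligned(size_t size, size_t line)
{
    size_t rounded = (size + line - 1) & ~(line - 1);
    return aligned_alloc(line, rounded);
}

/* Returns 1 if p sits exactly on a line-byte boundary. */
int is_cache_aligned(const void *p, size_t line)
{
    return ((uintptr_t)p & (line - 1)) == 0;
}
```

Memory from alloc_cache_aligned() is released with free() as usual.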

Trademark Information

AMD, Athlon, and 3DNow! are registered trademarks of Advanced Micro Devices, Inc.

MMX and Pentium are registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Windows is either a registered trademark or trademark of Microsoft Corporation.
