The ARM Diaries, Part 2: Understanding the Cortex A12
by Anand Lal Shimpi on July 17, 2013 12:30 PM EST- Posted in
- CPUs
- Arm
- SoCs
- Cortex A12
A couple of weeks ago I began this series on ARM with a discussion of the company’s unique business model. In covering semiconductor companies we’ve come across many that are fabless, but it’s very rare that you come across a successful semiconductor company that doesn’t even sell a chip. ARM’s business entirely revolves around licensing IP for its instruction set as well as its own CPU (and now GPU and video) cores.
Before we get into discussions of specific cores, it’s important to talk about ARM’s portfolio as a whole. In the PC space we’re used to focusing on Intel’s latest and greatest microarchitectures, which are then scaled in various ways to hit lower price targets. We might see different core counts, cache sizes, frequencies and maybe even some unfortunate instruction set tweaking but for the most part Intel will deliver a single microarchitecture to cover the vast majority of the market. These days, this microarchitecture is simply known as Core.
Back in 2008, Intel introduced a second microarchitecture under the Atom brand to target lower cost (and lower power) markets. The combination of Atom and Core spans the overwhelming majority of the client computing market for Intel. The prices of these CPUs range from the low double digits with Atom to many hundreds of dollars for the highest end Core processors (the most expensive desktop Haswell is $350, however mobile now extends up to $1100). There are other designs that target servers (which are then repurposed for ultra high-end desktops), but those are beyond the scope of this discussion for now.
If we limit our discussion to personal computing devices (smartphones, tablets, laptops and desktops), where Intel uses two microarchitectures ARM uses three. The graphic below illustrates the roadmap:
You need to somewhat ignore the timescale on the x-axis since those dates really refer to when ARM IP is first available to licensees, not when products are shipping to consumers, but you get an idea for the three basic vectors of ARM’s Cortex A-series of processor IP. Note that there are also Cortex R (embedded) and Cortex M (microcontroller) series of processor IP offered as well, but once again those are beyond the scope of our discussion here.
If we look at currently available cores, there’s the Cortex A15 on the high end, Cortex A9 for the mainstream and Cortex A7 for entry/low cost markets. If we’re to draw parallels with Intel’s product lineup, the Cortex A15 is best aligned with ultra low power/low frequency Core parts (think Y-series SKUs), while the Cortex A9 vector parallels Atom. Cortex A7 on the other hand targets a core size/cost/power level that Intel doesn’t presently address. It’s this third category labeled high efficiency above that Intel doesn’t have a solution for. This answers the question of why ARM needs three microarchitectures while Intel only needs two: in mobile, ARM targets a broader spectrum of markets than Intel.
Dynamic Range
If you’ve read any of our smartphone/tablet SoC coverage over the past couple of years you’ll note that I’m always talking about an increasing dynamic range of power consumption in high-end smartphones and tablets. Each generation performance goes up, and with it typically comes a higher peak power consumption. Efficiency improvements (either through architecture, process technology or both) can make average power in a reasonable workload look better, but at full tilt we’ve been steadily marching towards higher peak power consumption regardless of SoC vendor. ARM provided a decent overview of the CPU power/area budget as well as expected performance over time of its CPU architectures:
Looking at the performance segment alone, we’ll quickly end up with microarchitectures that are no longer suited for mobile, either because they’re too big/costly or they draw too much power (or both).
The performance vector of ARM CPU IP exists because ARM has its sights set higher than conventional smartphones. Starting with the Cortex A57, ARM hopes to have a real chance in servers (and potentially even higher performance PCs, Windows RT and Chrome OS being obvious targets).
Although we see limited use of ARM’s Cortex A15 in smartphones today (some international versions of the Galaxy S 4), it’s very clear that for most phones a different point on the power/performance curve makes the most sense.
The Cortex A8 and A9 were really the ARM microarchitectures that drove smartphone performance over the past couple of years. The problem is that while ARM’s attentions shifted higher up the computing chain with Cortex A15, there was no successor to take the A9’s place. ARM’s counterpoint would be that Cortex A15 can be made suitable for lower power operation, however its partners (at least to date) seemed to be focused on extracting peak performance from the A15 rather than pursuing a conservative implementation designed for lower power operation. In many ways this makes sense. If you’re an SoC vendor that’s paying a premium for a large die CPU, you’re going to want to get the most performance possible out of the design. Only Apple seems to have embraced the idea of using die area to deliver lower power consumption.
The result of all of this is that the Cortex A9 needed a successor. For a while we’d been hearing about a new ARM architecture that would be faster than Cortex A9, but lower power (and lower performance) than Cortex A15. Presently, the only architecture in between comes from Qualcomm in the form of some Krait derivative. For ARM to not let its IP licensees down, it too needed a solution for the future of the mainstream smartphone market. Last month we were introduced to that very product: ARM’s Cortex A12.
Slotting in numerically between A9 and A15, the initial disclosure unfortunately didn’t come with a whole lot of actual information. Thankfully, we now have some color to add.
65 Comments
View All Comments
JDG1980 - Wednesday, July 17, 2013 - link
So, roughly speaking, how does ARM IPC compare to x86? Obviously it's not going to be as high as on modern big-core desktop x86 parts like SB/IB/Haswell, but how does it compare to Atom (both the current generation and the new one in the pipeline) and Bobcat/Kabini?nathanddrews - Wednesday, July 17, 2013 - link
I think that was covered before:http://www.anandtech.com/show/6936/intels-silvermo...
coder543 - Wednesday, July 17, 2013 - link
The performance expectations (which relate to IPC) were misguided though. Intel's compiler was, essentially, cheating by skipping entire sections of the benchmark. http://www.androidauthority.com/analyst-says-intel...DanNeely - Wednesday, July 17, 2013 - link
If I'm interpreting the labels on that graph correctly ("K900 at 1.0"); they've under clocked the atom by 50% (from 2ghz to 1ghz) from what it normally operates at; which would flip the results back to Intel winning the majority of the tests.Krysto - Wednesday, July 17, 2013 - link
No, that's actually the *real* clock speed of Atom, which Intel misleadingly calls a "2 Ghz" core. The 2 Ghz speed is the turbo-boost speed, just like for laptops Haswell will really be clocked at 1.3 Ghz (same performance level as 1.5 Ghz IVB), and it goes up to 2.3 Ghz, or whatever, with Turbo-Boost.The problem is that I think benchmarks do use the Turbo-Boost speed fully, which means Atom will do very well in benchmarks, while that may not be the case in real day to day life, where the phone might never activate the Turbo-Boost speed, and just use the slower "real clock speed" all the time.
name99 - Wednesday, July 17, 2013 - link
The fact of Turbo-Boost is a problem for lazy benchmarking, it is NOT a problem for Intel.Why do you want a high speed core in your phone? There is a population that wants to run aggressive games, or transcode video or whatever, and these people care about sustained performance. But they're in the dramatic minority. For most users, the value of a high speed core is that it makes the phone more zippy, meaning that operations are fast when they need to be fast, after which the phone can go back to sleeping. The user-level "speed" of a phone is measured by how fast it draws a (single) PDF page, or renders a (single) complex web page, or launches an app, not by how it performs over any task that takes longer than a second. In such a world, if Turbo-Boost allows the app to sprint for a second, then go back to low-power mode, the user is very happy with that behavior.
The only "problem" with this strategy, for Intel, is that it is obvious and will be copied by everyone. Intel is there first and most aggressively for historical and process reasons, but there's no reason they will remain the only player.
(It's also quite likely that competitors will adopt the ideas of Turbo-Boost, just never call it that. After all the problem to be solved for phones is different from the REAL Turbo-Boost problem. Turbo-Boost comes from a world where you run the chip as hot as it can go --- till it just about to overheat. If an ARM core has no danger of actually overheating, then the design space is different. Now it's simply "we'll rate the core for 2GHz, but at that speed it uses up 10 nJ/op, so as far as possible we'll try to run it a 1GHz (using up 2 nJ/op) or better yet 100MHz (using up 10 pJ/op) [all numbers made up, but you get the point].)
Wilco1 - Wednesday, July 17, 2013 - link
All CPUs typically run at a far lower frequency than the maximum - I'm sure nobody believes eg. Krait 800 runs all 4 cores at 2.3GHz all the time. So if you call a specific frequency the "base" then anything faster than that is automatically a turbo boost. In that sense Intel's turbo boost is largely marketing, a way to claim a low TDP by setting the base frequency arbitrarily low, and allowing to go well over that TDP for a certain amount of time at a much higher boost frequency.inighthawki - Wednesday, July 17, 2013 - link
My 3.5/3.9GHz advertised i7 4770K runs at 800Mhz at idle :)TomWomack - Thursday, July 18, 2013 - link
My desktop Haswell with Intel's retail cooler runs all four cores 24/7 turbo-boosted - I'm not quite sure what to, it reports 3401MHz in Linux but that's because /proc/cpuinfo asks ACPI which isn't fully compatible with turbo-boost. The machine draws 100W from the mains while doing so, which (given that it idles at 28W) is entirely consistent with the 84W TDP.And, indeed, it runs at 800MHz at idle; and I suspect often slower than that, but /proc/cpuinfo doesn't report C-states
RicDavis - Friday, July 19, 2013 - link
Intel's turbostat has proven very useful for getting good reporting of CPU clock speeds under Linux. with the -v option it also displays the maximum speeds that CPU will run at as the no of active cores varies. Recommended. i7z is another option, but I've seen it do a bad job of showing which cores are active when hyperthreading is enabled.