AMD: Addressing the challenge of energy-efficient computing


Back in 2014, Advanced Micro Devices set an aggressive goal of 25×20, or reaching 25 times better energy efficiency for its processors and graphics chips by 2020. The company exceeded that goal, and now it has set a new 30×25 goal, or 30 times better energy efficiency by 2025 in the machine learning and high-performance computing space in data centers.

I talked about this ambition with Sam Naffziger, who is AMD senior vice president, corporate fellow and product technology architect. Naffziger said that AMD’s graphics processing units (GPUs) and central processing units (CPUs) have undergone big changes over the past few generations as the company tries to balance the demands of enthusiast gamers, data center computing, and the need to deliver better power efficiency and performance-per-watt.

It’s a recognition that performance isn’t the only valuable metric to pursue. If our data centers melt the polar ice caps, they’re not very valuable anymore. While the chip industry is bumping up against the limits of Moore’s Law, Naffziger says he has a lot of confidence in the industry and his fellow engineers to innovate.

Here’s an edited transcript of our interview.

Samuel Naffziger is AMD senior vice president, corporate fellow and product technology architect. Image courtesy AMD.

VentureBeat: Can you tell us about your background and AMD’s interest in energy efficiency?

Sam Naffziger: I’ve been at AMD 16 years. I’ve been leading our power efficiency, power technology for much of that time. For the last few years I’ve been in a product architecture role across the company, optimizing all of our products to make them the best possible in the world. Starting in late 2017, I went to the graphics division to lead an effort to drive the performance-per-watt and overall performance and efficiency to regain competitiveness and leadership there. That’s what I’ve been focused on for a number of years.

We’ve developed an extremely strong track record now that we’re pretty excited about. It comes at a compelling time in where the industry is at. The power consumption of pretty much everything, from servers to high-performance computing to gaming, is going up and to the right. It’s a very opportune time to focus on efficiency improvements. That’s what we’ve been doing for quite some time. In fact, it goes back – I don’t know if you’re familiar with the 25 by 20 initiative that kicked off long ago. It seems like a whole different world now. But that was a bold goal set in 2014 to develop our notebook processors to a 25X efficiency improvement.

The way we like to do things at AMD is very transparent. We don’t set broad, unmeasurable goals, the kind that sound compelling but that you can’t be held accountable to. We’re very transparent with the methodology for measuring there. We tracked generational improvements over time. By the 2020 product deployment, we had met and exceeded that 25X goal, which was not an easy thing to do. It required driving performance up and power down simultaneously, and a lot of innovation at the engineering level.

We wanted to build on that success. Notebooks are great, and certainly efficiency and battery life drive a lot of the consumer experience improvements there. But as far as having a big environmental impact and improving the overall energy footprint of IT equipment, we raised our sights to the data center as well, with the 30 by 25 goal that we rolled out last year to drive a 30X efficiency gain in the machine learning and high-performance computing space. That’s an area that you watch closely. I was super excited that we got into the most recent Top 500 and Green 500 lists and took the top spots there with our Epyc products. That’s the first step on the road to 30X efficiency.

Those CDNA products go hand in glove with RDNA. They share a common core of graphics IP and components. The methodologies and approaches apply to both. That’s where we’ve been focusing on the gaming side as well. What we did is, back when I joined the graphics group, we set out a long-term road map. These sorts of improvements take many years to develop and to deliver to the market. We set a long-term plan which encompassed four generations of GPU development. We started with the ground-up RDNA architecture, with the Navi 10 product. With 7nm and everything else we got a good 50% performance-per-watt boost with that product. Then, in 2020 we delivered what people called the Big Navi, Navi 21, which was the same 7nm technology, but it was the recipient of many of the methodologies and approaches that we drove in the intervening years to deliver another 50% plus on top of the first RDNA generation.

What was particularly interesting about that achievement, and something that we continue to build on, is we are leveraging the unique strengths of AMD in having leadership CPU and GPU technology. Our competitors either have good CPUs or good GPUs, but nobody has both, at least not yet. We have a very collaborative engineering culture here. We just thrive on innovating, solving hard problems, working together across the company. As we looked at what it would require to hit our efficiency goals for graphics, we engaged our CPU designers, who had done a fantastic job with the Zen architecture and delivery there.

Graphics architecture is a very different design space. It’s handling textures and pixels, highly parallel. It has historically been hovering around 1 GHz forever. We did a bunch of deep dives and design reviews to figure out what we could do to leverage CPU capabilities and radically improve what graphics could deliver for efficiency. That’s where a lot of the RDNA 2 gains came from.

CPUs are on a relentless path for more performance. Image courtesy AMD.

VentureBeat: My impression over the years has been that Nvidia always pushed for performance, and quite often didn’t care so much about the power consumption. They tried to set themselves apart on that front relatively, and relative to someone like Intel that made sense. Whereas AMD was in a different space that looked at some tradeoffs between performance and energy efficiency. You could compete well against someone like Nvidia by putting two graphics cards into the space where one Nvidia card would fit, because the Nvidia card was using so much power. I thought that was an interesting way to position, but is there more nuance you can bring to that picture as far as how you see some of these competitive dynamics? Maybe you would leapfrog at one point, but then they would leapfrog at another. The competition and market share would constantly swing back and forth.

Naffziger: There are various games that can be played. A dual GPU can be operating at a more efficient point, delivering more performance-per-watt. Whether that’s beneficial to the average gaming experience is another question. That’s difficult to coordinate. But it is a matter of focus. We certainly were – not short-changing Nvidia’s contributions, because they do have very power-efficient designs, and have had that. We were behind for a number of years. We made a strategic plan to never fall behind again on performance-per-watt.

Power efficiency provides more flexibility in design. With a more power-efficient design, we can choose to either maximize performance, still burning a lot of power, or optimize the efficiency. That was another aspect that we’ve exploited and invested in substantially: power management. It takes advantage of the wide operating range of these products. We’ve driven the frequency up, and that is something unique to AMD. Our GPU frequencies are 2.5 GHz plus now, which is hitting levels not before achieved. It’s not that the process technology is that much faster, but we’ve systematically gone through the design, re-architected the critical paths at a low level, the things that get in the way of high frequency, and done that in a power-efficient way.

Frequency tends to have a reputation of resulting in high power. But in reality, if it’s done right, and we just re-architect the paths to reduce the levels of logic required, without adding a bunch of huge gates and extra pipe stages and such, we can get the work done faster. If you know what drives power consumption in silicon processors, it’s voltage. That’s a quadratic effect on power. To hit 2.5 GHz, Nvidia could do that, and in fact they do it with overclocked parts, but that drives the voltage up to very high levels, 1.2 or 1.3 volts. That’s a squared impact on power. Whereas we achieve those high frequencies at modest voltages and do so much more efficiently.
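To put rough numbers on the quadratic voltage effect described here, the following is a minimal sketch using the textbook CMOS switching-power approximation, P = a × C × V² × f. The capacitance value, the 1.0 V baseline and the 1.25 V comparison point are illustrative assumptions, not AMD or Nvidia figures; only the 2.5 GHz frequency and the 1.2 to 1.3 volt range come from the interview.

```python
def dynamic_power(c_eff_farads, voltage, freq_hz, activity=1.0):
    """Textbook CMOS switching-power approximation: P = a * C * V^2 * f."""
    return activity * c_eff_farads * voltage ** 2 * freq_hz


# Hypothetical effective switched capacitance, chosen only for illustration.
C_EFF = 1e-9
FREQ_HZ = 2.5e9  # 2.5 GHz, the frequency mentioned in the interview

# Assumed "modest" voltage versus a voltage in the 1.2-1.3 V range.
modest = dynamic_power(C_EFF, 1.00, FREQ_HZ)
pushed = dynamic_power(C_EFF, 1.25, FREQ_HZ)

# At the same frequency, power scales with the square of the voltage ratio.
ratio = pushed / modest
print(f"Power at 1.25 V is {ratio:.2f}x the power at 1.00 V")
```

Under these assumptions, a 25% higher voltage at the same clock costs a bit more than 56% extra switching power, which is why reaching a frequency target at lower voltage matters so much.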

With the smart power management we can detect if we’re in a phase of a game that needs high frequency, or if we’re in a phase that’s limited by memory bandwidth, for instance. We can modulate the operating point of the processor to be as power efficient as possible. No need to run the engine at maximum frequency if you’re waiting on memory access. We invested heavily in that with some very high-bandwidth microcontrollers that tap into the performance monitors deep in the design to get insights into what’s going on in the engine and modulate the operating point up and down very rapidly. When you combine that capability with the high frequency, we can end up with a much more balanced design.
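The idea of lowering the operating point when the engine is waiting on memory can be illustrated with a toy governor. This is only a sketch under stated assumptions: the `OperatingPoint` type, the three frequency and voltage pairs, and the stall thresholds are all hypothetical, and real firmware running on the microcontrollers described above would be far more sophisticated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_ghz: float
    voltage: float


# Hypothetical operating points, ordered from most frugal to fastest.
POINTS = [
    OperatingPoint(1.5, 0.80),
    OperatingPoint(2.0, 0.90),
    OperatingPoint(2.5, 1.00),
]


def pick_operating_point(memory_stall_fraction: float) -> OperatingPoint:
    """Choose a lower clock when the workload is mostly waiting on memory.

    memory_stall_fraction is an assumed performance-monitor reading in [0, 1].
    The 0.3 and 0.6 thresholds are illustrative, not measured values.
    """
    if not 0.0 <= memory_stall_fraction <= 1.0:
        raise ValueError("memory_stall_fraction must be between 0 and 1")
    if memory_stall_fraction > 0.6:
        return POINTS[0]  # memory-bound: extra frequency would be wasted
    if memory_stall_fraction > 0.3:
        return POINTS[1]  # mixed phase: middle operating point
    return POINTS[2]      # compute-bound: run at the highest frequency


for stall in (0.1, 0.45, 0.8):
    point = pick_operating_point(stall)
    print(f"stall={stall:.2f} -> {point.freq_ghz} GHz at {point.voltage} V")
```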

The other thing is just the bread-and-butter of switching capacitance optimizations. Most of my background is in CPU design. I drove a lot of the power improvements there that culminated in the Zen architecture. There’s a lot of detailed engineering metrics that we drive that analyze the efficiency of the architecture. As you can imagine, we have billions of transistors in these things. We should only be wiggling the ones that are delivering useful work. We would burn thousands of watts if we switched all the transistors simultaneously. Only a tiny fraction of them are necessary to do the work at a given point in time.

We analyze our design pre-silicon, as we’re in the process of developing it, to assess that efficiency. In other words, when a gate switches, did we actually need to switch it? It’s a mentality change that is analyzing the implementations to look at every bit of activity and see whether it’s required for performance. If it’s not, shut it off. We took those kinds of approaches and that thinking from our CPU side and drove a pretty dramatic improvement in all of those switching metrics. We absolutely analyzed heavily the Nvidia designs and what they were doing, and of course targeted doing much better.

It isn’t easy keeping up with user demands. Image courtesy AMD.

VentureBeat: I remember when Raja Koduri shifted over to Intel in 2017. I know that one person can’t make that huge a difference, but is there anything you would trace to pre-Raja and post-Raja in terms of how AMD looks at graphics? Is there anything you gravitated more or less toward?

Naffziger: Raja is a visionary. He paints a great and compelling picture of the gaming future and features that are required to drive the gaming experience to the next level. He’s great at that. As far as hands-on silicon execution, his background is in software. He definitely helped AMD to improve our software game and feature sets. I worked closely with Raja, but I didn’t join the graphics group until after he had left. He had a sabbatical there and went to Intel. So as far as the performance-per-watt, that was not really Raja’s footprint. His footprint was more in some of the software dimensions and such.

VentureBeat: How much do you credit things like, say, manufacturing staying on track and design taking the right approach as well? It was an interesting time in the last few years, where TSMC outdid Intel. That was such a shock to the system. It was so different from what people were used to. How important was it to have these things happening at the same time? Interesting directions in design, but also much more competitive foundries.

Naffziger: That’s a very important point. The underlying manufacturing technology is absolutely critical. In fact, usually when we do the product launches, we break out the percentage gains that we got from each dimension – performance-per-watt, power efficiency optimizations, process technology. That was key. We placed our bets with TSMC and the 7nm delivered. Of course we’re continuing to leverage their latest generation of technology. Nvidia has the freedom to choose TSMC as well. As you know, Intel is going to be leveraging TSMC also, especially for graphics. Their new Arc line has the same process technology as our GPUs. In some sense, with freedom of choice we have a level playing field there in tech. But it’s key.

The other thing to point out is that from RDNA 1 to RDNA 2, that was the same 7nm, and we still managed to squeeze a doubling of performance and a 50% gain in performance-per-watt. That’s just design prowess. We’re proud of that. Some of that was not just the basics of optimizing switching. We also did innovative architecture developments. The Infinity Cache in particular was an exciting thing to bring to market. That, as well as some of the power optimizations, was a CPU-leveraged capability. At the core of that is the same dense SRAM array that we use in our CPU designs for the L3 cache. It’s very power-efficient, very high bandwidth, and it turned out it was a great fit for graphics. No one had done such a large last-level cache like that. In fact, there was a lot of uncertainty as to whether the hit rates would be high enough to justify it. But we placed a bet, because going to a much wider GDDR6 interface is certainly a high-power solution for getting that bandwidth. We went with a narrower bus interface and a large cache. That’s worked well for us. We see Nvidia following suit with larger last-level caches. But no one’s at 128MB yet.

VentureBeat: What has it been like for AMD to get in the data center in a much bigger way with graphics, and getting into supercomputers as well?

Naffziger: It’s been a great engineering challenge. We made a strategic choice to bifurcate our graphics line. They share a lot of common components, but different architecture lines, the Compute DNA and Radeon DNA. That enabled us to optimize the compute architecture to be the best possible on just those functions. Much wider math data paths, much higher bandwidth to the caches and to memory of course, using HBM. And also jettisoning the overhead for 3D rendering. There’s no need for pixel processing if you’re just deploying in a supercomputer or an AI-training network. That freed up more area for high-bandwidth memory, for big math data paths, and the capabilities that compute needs.

GPU power consumption is also climbing. Image courtesy AMD.

That was a lot of fun once we had that separate sandbox, if you will, where it’s just a compute optimized design. Let’s go and just kill it for that market space. And the same approaches of optimizing the switching, the clocking, the power management, everything else, those of course could be leveraged between gaming and compute. That’s been great. It’s a continual learning process. But as you can see, we’ve achieved great efficiency.

The other thing we rolled out at our financial analyst day that we’re looking forward to delivering later this year is RDNA 3. We’re not going to let our momentum slow at all in the efficiency gains. We publicly went out with a commitment to another 50% performance-per-watt improvement. That’s three generations of compounded efficiency gains there, 1.5X or more each. We’re not talking about all the details of how we’re going to do it, but one component is leveraging our chiplet expertise to unlock the full capabilities of the silicon we can purchase. It’s going to be fun as we get more of that detail out.
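As a quick check on what three compounded generations add up to, here is a small sketch. It assumes exactly 1.5X performance-per-watt per generation, the floor implied by the "50%" and "50% plus" figures quoted in the interview; the actual per-generation gains may be higher.

```python
from math import prod

# Assumed floor of 1.5X performance-per-watt for each of the three
# generations mentioned in the interview (RDNA, RDNA 2, RDNA 3).
generation_gains = [1.5, 1.5, 1.5]

# Gains multiply across generations rather than add.
cumulative_gain = prod(generation_gains)
print(f"Cumulative performance-per-watt gain: {cumulative_gain:.3f}x")
```

Because the gains multiply, three 50% steps come to roughly 3.4X over the starting point rather than the 2.5X that simple addition would suggest.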

VentureBeat: As far as the concern that we were running into walls with things like Moore’s Law hitting limits and other physical limitations looming, how concerned are you about that at this point?

Naffziger: I’m concerned in the sense that it drives new dimensions of innovation to get the efficiencies. The silicon technology is not going to do it for us. We’ve seen this coming for a long time. Like I said, lead times are long. We’ve been investing in things like the Infinity Cache, chiplet architecture and all these approaches that exploit new dimensions to keep the gains coming. So yes, it’s a big concern, but for those who prepare in advance and invest in the right technology, we have a lot of opportunity still.

Energy efficiency trends. Will servers melt the ice caps? Image courtesy AMD.

VentureBeat: Compared to Nvidia and Intel, do you feel like we’re in a state of divergence when it comes to designs, or some kind of convergence?

Naffziger: It’s hard to speculate. Nvidia certainly hasn’t jumped on the chiplet bandwagon yet. We have a big lead there and we see big opportunities with that. They’ll be forced to do so. We’ll see when they deploy it. Intel certainly has jumped on that. Ponte Vecchio is the poster child for chiplet extremes. I would say that there’s more convergence than divergence. But the companies that innovate in the right space the soonest gain an advantage. It’s when you deliver the new technology as much as what the technology is. Whoever is first with innovation has the advantage.

Originally appeared on: TheSpuzz