
23 Jun 2011 Hamburg - At ISC11 we had a long interview with Dave Jursik, Vice President of Deep Computing at IBM, about what it takes to build an Exascale system. But we started by talking about the new SuperMUC system at LRZ, which has some new cooling techniques and software that allow it to run at the lowest clock speed possible while still delivering the maximum application performance.
Primeur Magazine: Can you tell us a bit about your recent and planned installations in Europe?
Dave Jursik: IBM has a two-pronged strategy in HPC. We have always been focused on large, complex problems; examples of that would be any of the national labs. In Germany, obviously, we have had a long-standing relationship with Forschungszentrum Jülich (Jülich). They have installed large production systems, like BlueGene, as well as experimental systems, like QPACE, which was built out of Cell processors for Jülich and two universities. That was an experiment. We did some custom development in both cases. In systems like those in Jülich, we are trying to achieve a specific objective. An example in the Netherlands is SARA, sponsored by NCF.
But in the last year, the most significant success we had in Europe was the selection by LRZ in Munich to replace their installed production system. What was most interesting about that large system, called "SuperMUC", was how LRZ conducted their procurement: they were very specific in setting out some aggressive goals for energy efficiency. This was a focus of the Bavarian government and therefore, of course, of their installation. They challenged all suppliers to find ways to reduce the energy consumed as much as possible while still providing a very high performance service for their users.
In collaboration with our German lab in Böblingen we developed a technology that allows the system to be cooled with, and that sounds a bit strange, hot water of 35 - 40 degrees Celsius, which is completely unusual. In addition to that, we also developed software that allows for the optimal performance of the system, based on the requirements of each unique job. So it was really not that we were just putting hot water in it; we were building a solution that would optimize the energy efficiency of that system. That was interesting because it came out of that specific collaboration with our development lab in Germany. Obviously Intel was a direct party to that. We were developing a solution which will probably be beneficial in many other areas of Europe where there are similarly high power costs.
Primeur Magazine: Is it a dynamic cooling system? In the way that it can adapt to the kind of application you are running?
Dave Jursik: The performance of the system will be based on the needs of the application. So if an application is not dependent upon a fast clock, then it will not use one, and therefore it will not use the amount of electricity that would have been required otherwise. Most of what the system does, and it is not just turning nodes on and off, is making sure that the clock frequency is optimized for that particular application. So the scheduling and resource management software controls the execution of the job.
Primeur Magazine: How does the resource management know what the application needs?
Dave Jursik: In general, the first time a job runs, the system collects all sorts of statistics that say how the job runs: whether it actually needs more clock or less clock. At that point, it allows the user to run it differently the next time. So the system captures the information, and the user or administrator gets to choose how the code runs the next time the job is submitted. The system administrator has a choice whether to allow the user to select how to run it, or to do it themselves. The ability to do that is, I think, quite unique today. Currently it is job dependent, but I think ultimately it will be job step dependent.
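As an editorial illustration of the profile-then-tune idea described above, here is a minimal Python sketch. It is not IBM's scheduler: the `JobProfile` fields, the runtime model (only a "clock-bound" fraction of the runtime scales with frequency), the cubic power model and all constants are assumptions chosen only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass
class JobProfile:
    """Statistics captured on a job's first run (hypothetical fields)."""
    runtime_s: float             # measured wall-clock time at ref_ghz
    ref_ghz: float               # clock frequency used for the profiling run
    clock_bound_fraction: float  # share of runtime that scales with clock (0..1)


def predicted_runtime(profile: JobProfile, ghz: float) -> float:
    """Assumed model: only the clock-bound share stretches at lower clocks."""
    c = profile.clock_bound_fraction
    return profile.runtime_s * ((1.0 - c) + c * profile.ref_ghz / ghz)


def node_power_w(ghz: float, static_w: float = 100.0,
                 dynamic_w_per_ghz3: float = 10.0) -> float:
    """Assumed power model: fixed static part plus a cubic dynamic part."""
    return static_w + dynamic_w_per_ghz3 * ghz ** 3


def pick_frequency(profile: JobProfile, available_ghz: list[float],
                   max_slowdown: float = 1.05) -> float:
    """Return the lowest-energy clock whose predicted slowdown is tolerable."""
    limit = max_slowdown * profile.runtime_s
    candidates = [f for f in available_ghz
                  if predicted_runtime(profile, f) <= limit]
    if not candidates:
        # Nothing meets the limit: fall back to the profiled frequency.
        return profile.ref_ghz
    # Energy = predicted runtime * predicted power at that frequency.
    return min(candidates,
               key=lambda f: predicted_runtime(profile, f) * node_power_w(f))


steps = [1.2, 1.6, 2.0, 2.4, 2.7]
memory_bound = JobProfile(runtime_s=1000.0, ref_ghz=2.7, clock_bound_fraction=0.1)
clock_bound = JobProfile(runtime_s=1000.0, ref_ghz=2.7, clock_bound_fraction=0.95)
print(pick_frequency(memory_bound, steps))  # 2.0
print(pick_frequency(clock_bound, steps))   # 2.7
```

In this toy model the job that barely depends on the clock is recommended a lower frequency, while the clock-bound job keeps the full clock. In line with the interview, such a value would only be a recommendation: the user or the administrator still decides how the job runs next time.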
That was one of the significant things we did. Thinking about Europe, that was a very big one. From an IBM standpoint, as I said, we have a two-pronged approach, and one prong is being able to develop large, energy-efficient systems. This was our first very, very large x86 solution. We have done nothing of this size and complexity before. It also has a unique InfiniBand architecture, from Mellanox.
But at the same time, if you look at today's TOP500 announcement here at ISC11 as an example, we are just as interested in our ability to provide solutions to a broad audience. So it is not just the big national labs; while they are always a very important part of our business, we also want to be able to provide effective solutions for mid-sized companies. It may be strange to say that, but automobile companies are not the ones that have national labs; they do not have Jülich-sized systems installed. But whether it is Audi or Daimler, they have difficult problems to solve. So the thing that you will notice from our TOP500 submissions is that we have a large number that spans government research, classified research, academic research, life sciences, petroleum and electronics. It is a very broad base and we are proud of being able to continue that, because the solutions we develop for them are not necessarily the same as those for the very large national labs.
An example of where we have extended the value proposition is last week's announcement of a set of HPC-class Cloud tools: a management suite of software that takes the basic HPC software stack and extends it so someone can use it for private Clouds. So they can actually define, implement and execute jobs that dynamically provision the infrastructure within the Cloud for a specific set of users. Within that, our belief with Cloud environments is that you have to have an operational environment that is unique to the type of user you are serving. So we announced HPC Cloud for engineering. It is specifically designed for automotive, aerospace and electronics, for what engineering designers are going to need. One example: in automotive and aerospace there is a very heavy dependence on visualization. Our solution will include a visualization component, so that they are able to do everything from the design to the analysis, whether it runs on their workstation, on a large computational cluster, or is managed in a private Cloud environment.
Primeur Magazine: What do you mean by private Cloud exactly?
Dave Jursik: It is not public.
Primeur Magazine: Private or public is the easy part. But what do you mean by Cloud?
Dave Jursik: The idea is that an automotive company could have a collection of different technologies, but managed by a single resource manager that is able to dynamically provision.
Primeur Magazine: It is more Platform as a Service?
Dave Jursik: The argument would be it is more platform independent. But a cluster of resources would be managed by software that is able to provision or re-provision the architecture, including the right amount of disk capacity or capability and be able to optimize that as they are running their job.
Primeur Magazine: So you optimize resources, it is not like you are provisioning virtual machines or things like that?
Dave Jursik: The idea is that the HPC environment demands lots of resources, not just little resources. So where the commercial environment is virtualizing a single CPU, we attempt to aggregate as much capacity of the right type as is needed, with the right storage architecture, for a given job. The management tools we put as extensions into our job scheduler, resource manager and some of the communication software will allow that to happen. When a company is interested, they can, of course, take open source software and write their own extensions. We are trying to provide an alternative to that if they want something that is developed and supported, and something that can be extended with specific applications that they might have.
The basis is a software suite. There is a set of services that goes with implementing it, and a set of hardware that can be defined. But the thing that I think is the most interesting is that, in our belief, Cloud cannot be generic: it has to be specific to a set of workloads. That is why we started with this. Anyone can use this software suite, but the additional function of adding a specific automotive engineering stack is what we will continue to do in other industries. We gained some experience with this.
So the two prongs are really big things, and we are hopefully going to have installations next year in the US as well that will be very large systems. LRZ is the one that has been published. But we hope to continue having a strong presence below the peak of the pyramid too. Again, I put the emphasis on breadth: these systems will not necessarily be on the TOP500, but you will see the breadth. We have 213 systems in the TOP500 this year, and the numbers have been increasing. But the breadth is important to us. We will continue to be able to deliver large systems: you will see more of those popping up. But more of our focus has been on software and services for the mid tier of the technical computing market. We are trying to balance both of those, with more of an emphasis on software, which is what our Cloud solution really is.
Primeur Magazine: How are you doing with PRACE, new architectures and that kind of thing?
Dave Jursik: We are actively involved in both PRACE and Prospect. You know the first PRACE system that became available was the IBM BlueGene in Jülich. And practically I think that HLRS is now contributing with the Cray machines. The Germans are investing very heavily. The French have begun in fact with CEA, announced this morning, and they will be even bigger. I did not know that. So, you know, from an IBM standpoint we have a huge research and development investment in Europe: labs in La Gaude in France, Zürich in Switzerland, Böblingen in Germany, Mainz in Germany, Hursley in England. We have a very significant research population in Europe, and our interest - we explained this to the EU recently - is not just collaborating, but ensuring that what we are doing in development here in Europe benefits Europe. Two interesting examples: remember I talked about LRZ and the cooling solution; that was actually done initially as an experiment for an institution in Switzerland, and we extended it with further enhancements for LRZ in Germany. So those investments are important, hence we are very involved in the large initiatives, like PRACE.
I think the European Union is very interested in ensuring that research is done here that contributes to science and the economy. We have been investing, with thousands of research staff who are here, which is unique.
We started with Deep Computing on demand a long time ago. Nobody was calling that Cloud at the time. We started with a centre in Poughkeepsie, New York. It was physical hardware built for whoever wanted to use it, for whatever purpose. The challenge with that was that provisioning took hours. Cleaning up what was done also took hours, and the scheduling therefore had to be done initially by the quarter, then by the month, then by the week. Now, with what I describe, even though it is a private Cloud, the software is able to provision in minutes. So it was an experiment before Cloud was Cloud, but we got a lot of experience, and some of the software functions that we developed there are now part of the software stack that we have today. So for us it is not new. But, as I told you before, we aim at breadth, and what we did learn here is that the generic Cloud is a difficult business model, as there is not enough value added: you just end up providing cycles. We want to provide more value, which is why we do things like the engineering Cloud, which is very specific. So that was learned, even though it was not really meant as an experiment. It was a business that we did OK at, but it did not grow at the pace we wanted, and one of our conclusions was that the solution was not specific enough to any particular class of users. But anyway, yes, we have been at it for a very long time already.
Primeur Magazine: Could you tell us a little bit more about processors and where they are heading? In the TOP500 you still have the Power processor, for example. Is that continuing?
Dave Jursik: IBM is continuing to develop Power-based processors. We also have a huge business with x86. LRZ obviously is a good example and there are many more. But from the 213 TOP500 entries, most are Power-based. Which is typical, but on the other hand, the Power processor will be the core that is used in whatever we develop for Exascale. The reason for that is that we believe that, in order to be successful with Exascale, the natural progression of technology will not magically provide a solution.
So our belief is that whatever happens to x86 will happen, but do not think that you can build a system out of it with millions of cores that an application can actually use in a system that is intolerant to failure. It is impossible, at least for IBM, to envisage one person doing chips, another doing memory and someone else doing software. That is not going to work. Our belief is that it has got to be an end-to-end system design that begins with fundamental research and needs the invention of new technology that does not yet exist. It is going to be 3D chip technology, for instance. It is going to be optics used for the interconnects on the chip. It is going to be different types of memory. It is going to require invention, which is why we have the research team here in Europe, and that is matched by the one in the US.
Our belief is that we have to have a view of the entire Exascale thing and enhance the Power chip as it will be at the base of whatever we do. The Power7 has been announced, we are delivering commercial systems, and HPC systems with that. It will have a follow on life. As you step forward to whenever Exascale comes, it is going to be a Power chip at the core of it. Hence we are going to continue the investment.
Again, this is the benefit of having a broad enough business, and you have to look at the IDC data: we have several billion dollars' worth of commercial Power-based business. So we have volume there, and the same core will be used with completely different implementations for our future large HPC systems. Meanwhile, I fully expect we are going to continue to exploit whatever technologies Intel and AMD come up with for the broader base of solutions that we provide. We have a good relationship with both companies, and that will continue.
The interconnect right now is InfiniBand. Faster versions of Ethernet will continue to grow too, so I think we will continue to build what we hope are differentiated x86 solutions, as we did for LRZ, and scale them up or down depending on what somebody needs. But at the very, very high end, for Exascale, it will be the Power chip. That will continue past our collective retirements, I am sure.
Primeur Magazine: What do you estimate the power consumption will be for Exascale systems?
Dave Jursik: The prediction that IBM Research came up with was that if you just left things to themselves and the industry, an Exascale system would take 120 MW to run. That is impossible. So something needs to happen, and it is not going to happen through the natural progression without really significant invention.
Primeur Magazine: You try to put that all back in the Power architecture.
Dave Jursik: Yes, we will. We are designing it, and again, it is not that people like Intel or AMD cannot design good chips, but Exascale is not just the chip, it is the interconnect, it is all the various components, it is the software architecture, it is the whole thing.
Primeur Magazine: Thanks for this interview.