ASTRON & IBM Center for Exascale Technology

Newsletter 4

Dear reader,

We are now approximately 2.5 years into the Dome project, which means we are about halfway. What looked like a very, very difficult, even close to impossible task when we began in early 2012 is gradually developing into a project with some light at the end of the tunnel!

Saving on energy

Our initial take on the full-scale SKA radio telescope, using then established methods, would have resulted in a piece of IT equipment easily consuming more than 5 gigawatts of electrical power, even taking into account technology scaling as we have enjoyed it over the last 50 years (also known as Moore's law). Five gigawatts is an inordinate amount of power, hardly affordable over time, and impossible to obtain in the deserts of Southern Africa and Western Australia.

However, with a lot of groundbreaking work we have identified a few pieces of technology which can save the day. Firstly, we have proposed an integrated design using the latest high-speed analog-to-digital conversion technology to digitize the signals from the well over 250,000 antennas while consuming as little energy as possible. This not only saves energy during digital conversion, but also an equal amount of energy in cooling the associated IT equipment - a doubling of the savings. Secondly, our experiments with the transmission of analog data over fibre are showing some really exciting results. You can read about them in this newsletter.

Gauss Award

With several exabytes of raw data coming from the antennas and a daily data production of at least 1 petabyte, one needs to be really, really careful transporting this data: the transport of 1 exabyte of data at 100 gigabits per second takes well over 2.5 years! Transmitting 1 petabyte still takes close to a day. It is obvious that the transport and the processing of this data need to be closely integrated. Transporting too much data will jeopardize the system operation.
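
For the technically curious, the arithmetic behind those numbers is easy to check; a minimal sketch, assuming a single sustained link at the 100 gigabit-per-second rate quoted above:

```python
# Back-of-the-envelope transfer times for the volumes quoted above,
# assuming one sustained 100 gigabit-per-second link.
LINK_BPS = 100e9            # 100 Gbit/s
SECONDS_PER_DAY = 86400

def transfer_days(num_bytes, link_bps=LINK_BPS):
    """Wire time in days for num_bytes over the given link."""
    return num_bytes * 8 / link_bps / SECONDS_PER_DAY

print(f"1 EB: {transfer_days(1e18) / 365:.1f} years")   # ~2.5 years
print(f"1 PB: {transfer_days(1e15) * 24:.1f} hours")    # ~22 hours
```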

At the 2014 International Supercomputing Conference (ISC'14) in Leipzig, Germany, the Dome team received the prestigious Gauss Award for its contribution outlining which kinds of systems are likely to meet the SKA requirements. It comes as no surprise that even the most powerful graphics processors (GPUs) available today cannot do it: they are not optimized for processing such huge amounts of data.

Talking research

In this newsletter you will find interviews with a number of Dome researchers from IBM and ASTRON, speaking about each of the 7 research streams that are now active. They will share some of the main intermediate results with you, ranging from innovative, intelligent storage concepts and water-cooled microservers to totally new ways of addressing the processing requirements for radio telescopes.

Enjoy reading!

 

Albert-Jan Boonstra and Ton Engbersen

Scientific Directors of Dome for ASTRON and IBM

 


 

1: Algorithms & Machines

A working calculation tool for compute speed and power consumption

 

Dome research package number 1, Algorithms & Machines, was the first to really kick off. This team has created a model that calculates the amount of compute power and electricity needed to turn a radio signal into a picture. Change one parameter and the model will tell you whether that makes the system more efficient. Stefan Wijnholds and Rik Jongerius explain why this tool is so important and what is still on their to-do list.

 

Rik Jongerius and Stefan Wijnholds

 

End-to-end baseline

“When we started off two years ago, the first thing we had to do was to create a tool that would calculate how much compute power and how much electrical power the SKA will need for each data step”, Stefan Wijnholds explains. “So we built a model that shows you exactly what happens when you change anything in the SKA architecture. How will it affect both performance and energy consumption? And how will that look on the energy bill?”

“The model works, as we have shown during conferences in Australia and England”, Rik Jongerius says. “And even though we are still tweaking this tool and we may want to adapt it to future data processing innovations, we now have an end-to-end baseline of the entire compute process, from the antenna to the astronomical sky picture. That in itself is a good intermediate result; it is valuable input for optimizing the SKA design. It also creates a frame of reference for the other Dome research teams. Our friends in workstream number 6, Novel Algorithms, for instance, may use this tool to identify the elements that consume the most or the least energy. It is a means of prioritizing.”
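
To give a flavour of what such a tool does, here is an illustrative sketch, not the actual Dome model: the station count, channel count and platform efficiencies below are invented for the example. A parametric model essentially multiplies a workload estimate by a hardware cost per operation.

```python
# Toy parametric model in the spirit of the Algorithms & Machines tool:
# each pipeline stage has an operation count driven by telescope
# parameters, and each hardware platform has an energy cost per
# operation. All numbers are invented for illustration.

STATIONS = 1000          # hypothetical station count
CHANNELS = 64_000        # hypothetical number of frequency channels

def correlator_flops_per_s(stations, channels):
    """Rough operation rate for correlating every pair of stations."""
    baselines = stations * (stations - 1) // 2
    return 8 * baselines * channels      # complex multiply-accumulates

def stage_power_watts(flops_per_s, flops_per_watt):
    """Power draw of one stage on a platform with the given efficiency."""
    return flops_per_s / flops_per_watt

rate = correlator_flops_per_s(STATIONS, CHANNELS)
for platform, efficiency in [("platform A", 20e9), ("platform B", 80e9)]:
    print(f"{platform}: {stage_power_watts(rate, efficiency):.1f} W")
```

Change any parameter - more stations, more channels, a different platform - and the estimated power bill changes with it, which is exactly the kind of what-if question the real tool answers.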

What astronomers want

Judging by the name Algorithms & Machines, calculations are not the only thing on their plate. So Stefan and Rik continue: “We have two important steps to take. One is collecting science cases. What exactly will astronomers want from the SKA? Most likely, they will not always need the full pipeline, maybe just a section of it. We need to identify which scientific queries consume the largest amount of energy.”

“The other thing is hardware. We can calculate the effects of changes in SKA architecture. But what about the hardware: processing units such as CPUs, GPUs, various accelerators? Calculating all the scenarios by hand would be drudgery. So we will have to apply smart algorithms to handle specific hardware platforms. Together with our fellow scientists at IBM Research in Zurich, we have made a lot of progress on this.”

 


 

2: Access patterns

How the road to the future could be paved with good old tape

 

The SKA is going to gather so much information that we simply won't be able to store it all on hard disks. Even if it were physically possible, it would be too expensive, because of the energy it would consume and because hard disks are relatively costly. So Dome researchers are looking for reliable high-volume, low-cost storage. But how to make sure that the way the information is stored won't hold back performance? Meet Yusik Kim and Yan Grange.

 

Yan Grange and Yusik Kim

 

Tape is cheap

Yusik Kim explains why good old tape might be part of the answer: “Tape is cheap. You save the data and store it away. It won’t use any energy, as opposed to disk drives, which are always spinning.” But tape also has its disadvantages. For one thing, re-accessing data takes more time when the information is stored on tape. Pieces of information are placed one after the other on magnetic tape, so searching through the data is a lengthy operation.

Obviously, Dome researchers want the best of both worlds: “Yes, we want to optimize both response time and cost”, Yusik says. “So we’re looking at tiered storage: assigning different categories of data to different types of storage media. In order to do that, you need to choose the right tier every time you want to store data. If you can identify specific access patterns, recognizing how data is written to a file and then read, processed and re-stored later, you can make the right choice.”
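
A minimal sketch of that idea, with made-up tiers and thresholds (the real model is far more sophisticated):

```python
# Toy tiered-storage chooser: route a dataset to disk or tape based on
# a simple access-pattern fingerprint. Thresholds are illustrative only.
from dataclasses import dataclass

@dataclass
class AccessPattern:
    reads_per_day: float      # how often the data is re-read
    days_until_cold: int      # expected days before it goes dormant

def choose_tier(pattern: AccessPattern) -> str:
    # Hot, frequently re-read data earns its keep on spinning disk;
    # anything that goes cold quickly is cheaper to park on tape.
    if pattern.reads_per_day >= 1 or pattern.days_until_cold > 30:
        return "disk"
    return "tape"

print(choose_tier(AccessPattern(reads_per_day=5, days_until_cold=60)))    # disk
print(choose_tier(AccessPattern(reads_per_day=0.01, days_until_cold=2)))  # tape
```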

Predictive caching

His colleague Yan Grange continues: “Over the past two years, we've put most of our time and effort into creating a model that will choose the best storage tier, based on those access patterns. Each application has its own fingerprint in terms of access patterns. We need to predict what kind of information will be accessed, when and how. For the long term, we can make fair predictions based on historic behaviour; now, the model works.”

And then there is the short term. “There are lots of fluctuations in activity in the short term. If you can successfully predict those, you can really optimize data placement by putting the right information in a fast cache device for quick access. We call that short-term predictive caching. And it's really difficult. We don't even know exactly what is going to be stored in the SKA archives, just as we don't know what astronomers will want to do with that data. We simply have to make assumptions and integrate new insights as they come along, just like other researchers do.”
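
One textbook way to attack short-term prediction - a sketch under our own assumptions, not the Dome model - is to keep an exponentially weighted estimate of each object's recent access rate and cache whatever is predicted to be in demand:

```python
# Sketch of short-term predictive caching: an exponentially weighted
# moving average (EWMA) of recent accesses predicts near-future demand.
# The decay factor and cache size are arbitrary illustration values.
from collections import defaultdict

ALPHA = 0.3          # weight of the newest observation
CACHE_SLOTS = 2

scores = defaultdict(float)

def record_interval(access_counts):
    """Fold one time interval's access counts into the running scores."""
    for obj in scores.keys() | access_counts.keys():
        scores[obj] = ALPHA * access_counts.get(obj, 0) + (1 - ALPHA) * scores[obj]

def cache_candidates():
    """Objects with the highest predicted short-term demand."""
    return sorted(scores, key=scores.get, reverse=True)[:CACHE_SLOTS]

record_interval({"obs_42": 10, "obs_17": 2})
record_interval({"obs_42": 8, "obs_99": 6})
print(cache_candidates())   # ['obs_42', 'obs_99']
```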

 


 

3: Nanophotonics

Where the fun lies in making smart choices

 

‘Nano’ derives from ‘nanos’, the ancient Greek word for ‘dwarf’. A nanometre is one millionth of a millimetre. In nanophotonics, pulses of light are sent through, and manipulated by, silicon chips with optical circuits that are only a few hundred nanometres thick and wide. Glass fibre cables and waveguides transport information between these chips over distances from a few millimetres up to many kilometres. And these are some of the questions Peter Maat and Jonas Weiss are asking themselves: what is the best technology to turn the radio waves from the SKA antennas into an optical signal (light)? And how do we transport it reliably over the required distances to the receiver electronics?

 

Peter Maat and Jonas Weiss

 

Radio over fibre

Peter Maat is happy to share a cost-cutting concept: “There will be about 250,000 low-frequency antennas at the SKA. And there are two good reasons for not wanting a lot of electronics on site. The signals they emit might disturb the radio signals we want to pick up from deep space. Also, if we were to process each radio signal locally, at each of the 250,000 antennas, it would cost a great deal of money. So the idea is to combine signals and put them through one processing unit.”

Jonas Weiss jumps in: “This is where we go from electrical to optical. We’ve already tested our own design for such a link and it performed better than expected. We’ve been lucky so far, we’ve had very few failures. We’re involved in two SKA consortia, working on getting the right specifications for our analog-optical link technology. That has proven to be difficult, as the general system specs won’t tell us exactly what the SKA is going to need once it’s up and running. But it’s fun to make smart choices, picking low-cost items to create high-end solutions.” Peter adds: “We’re proud to say that we are able to build a good link for under 50 euros.”

Reaching out to industry

Then there is the matter of getting the signal from the transmitter to a receiver, and on to the computer. “Distances vary from 200 metres to about 2 kilometres”, Jonas explains. “Over short distances, we could use multi-mode technology, with light travelling through the glass fibre along various paths at the same time. This has its limits. You could compare it to many people in a room talking at the same time; the risk is that you get a mixed-up signal. So for very long distances, we're focussing on single-mode: just one signal, one path. You can even go transatlantic that way. Each technology has its own advantages and disadvantages. We're trying to find the optimal mix to meet the SKA's very specific requirements.”

Peter is happy his Dome work also takes him out of the lab. “In order to create solutions that are truly viable, we need to reach out to industry. I’ve been talking to some highly specialized Dutch companies for instance, some of them might join the Dome Users Platform at some point. They are coming up with innovative ideas and production infrastructures, to make the technology affordable. It’s an interesting journey.”

 


 

4: Microserver

Establishing a new class of server: small, more powerful and energy-efficient

 

On Thursday 3 July, a prototype of the Dome microserver was unveiled at the ASTRON & IBM Center for Exascale Technology in Dwingeloo. A revolutionary piece of power-saving technology with a 64-bit server CPU placed on a 133 x 55 mm board, roughly the size of a smartphone. It functions like a full-fledged business class server, but it is 4 to 10 times smaller. The microserver boasts innovations like a hot-water cooling system which also supplies electrical power. Andreas Doering and Matteo Cossale tell us more. 

 

Matteo Cossale and Andreas Doering

 

A new class of server

“Actually, we had a head start”, Andreas says. “We had been doing lots of preparatory work and high-level design studies since 2011, before the Dome contract was even signed. We had a very clear idea of what we wanted to do and how. The fact that we’d already started working on the microserver was one of the things that helped make Dome convincing as a science alliance.”

“To me, it was simply the right time to establish a new class of server. When you look back, we started with mainframes: big servers with dedicated hardware, not for the consumer market. In the 1980s the desktop PC arrived and its technology found its way into rack servers; around 1990 laptops were introduced and their technology led to blade servers. In the same way, the technology of today's tablets and smartphones will find its way into the microserver we are now working on. Business servers and consumer computers have a parallel development in that sense.”

“IBM has a vested interest in this type of innovation, of course. From a scientific point of view, it is also very appealing. I’ve always had a fascination for parallelism and there is a lot of that in the microserver.” Parallelism works on the principle of using more than one thread in order to complete a query faster. “We have many nodes on it and each node is more power-efficient than most existing servers, so the ratio of energy consumption to compute power is good”, Andreas enthuses.
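
As a minimal, generic illustration of that principle (nothing microserver-specific): split one query over several workers so the parts are computed simultaneously.

```python
# Minimal illustration of parallelism: one query split over several
# workers finishes sooner than the same work done one thread at a time.
from concurrent.futures import ProcessPoolExecutor

def partial_sum(chunk):
    """One worker's share of the query."""
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]   # four interleaved slices
    with ProcessPoolExecutor(max_workers=4) as pool:
        total = sum(pool.map(partial_sum, chunks))
    print(total)   # same answer as the serial sum, computed in parallel
```

On a node with four cores this ideally takes about a quarter of the serial time, minus the overhead of distributing the chunks - the same trade-off, at much larger scale, that makes many power-efficient nodes attractive.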

Doing the right thing right

Matteo sheds some light on the multifunctional water-cooling system. “Using a liquid coolant instead of air has various advantages, which make it possible to have such a densely packed, compact device. The water-cooling system performs three tasks at once. Firstly, water runs through microchannels to keep the operating temperature below 85 degrees Celsius. Secondly, we use the copper plate we call thermal packaging to feed electric power into the microserver. And thirdly, this packaging adds mechanical stability to the microserver as a whole. Although copper is a great conductor of both electricity and heat, it also has its drawbacks, being heavy and expensive. We will probably replace it with an even better kind of material within the next few years, as soon as we find it.”

Working with various Dutch companies like TPC Electronics, Strukton and Transfer DSW has been a positive experience. “They have been extremely flexible, constantly adapting to new insights and requests, coming up with solutions. That's impressive.” Nevertheless, the microserver still poses several challenges. “The microserver consists of many components”, Andreas says, “and we are not specialists on all of them. Having to make the best choice for each of the individual parts has led me to areas I never dreamt of going. I've learned a lot!” Or, as Matteo puts it: “The biggest headache is double-checking whether we're doing the right thing the right way.”

 


 

5: Accelerators

Award-winning insight: what's on the market today is not good enough for SKA

 

About one year ago, Dome research package number 5 took off in search of the best system design for creating sky images. As it turns out, the hardware on the market today is simply not good enough to do what SKA needs. In the meantime, their research has resulted in an award-winning paper. Meet Bram Veenboer, John Romein, Leandro Fiorin and Erik Vermij.

 

Erik Vermij, Leandro Fiorin, John Romein and Bram Veenboer

 

Award-winners

We’re talking to four team members. Two of them are particularly happy right now. Erik Vermij and Leandro Fiorin, together with their IBM Research colleague Christoph Hagleitner and Koen Bertels of the Delft University of Technology, wrote an award-winning paper.

PhD researcher Erik Vermij spoke at the International Supercomputing Conference in Leipzig on 23 June, upon receiving the prestigious Gauss Award for best technological research paper.

“Apparently, they liked the paper”, Erik says humbly. And its conclusion is clear-cut, he continues: “Our analysis shows we need something special for the SKA. New algorithms, new hardware. Luckily, this completely justifies the research we’ve been doing in our Dome research package. At this point, we’re trying to come up with a solid proposal of what the new hardware could look like.” The hardware Erik is referring to is called an accelerator, which performs certain tasks faster than a CPU, the central processing unit of a computer. Accelerators are necessary to combine the radio signals from over 250,000 antennas and to create images out of those combined or correlated radio signals, among other things.

Saving a whole lot of money

John Romein adds: “We've tested everything on the market today, from NVIDIA to DAS-4, from AMD GPUs to Intel's new Xeon Phi. We know none of them is fit for the job, but we need to understand why. So what we want to learn is: why are certain combinations of algorithms and accelerators efficient or not? To what extent do the accelerators reach their absolute performance? How programmable are they? And what is their energy consumption?”
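
A common yardstick for that question of absolute performance is the roofline model; here is a sketch with invented hardware numbers, not measurements from the Dome study:

```python
# Roofline sketch: attainable performance is capped either by peak
# compute or by memory bandwidth times the kernel's arithmetic
# intensity (flops per byte moved). Hardware numbers are illustrative.

def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)

PEAK, BW = 4000.0, 300.0   # hypothetical accelerator: GFLOP/s and GB/s
for name, intensity in [("data-heavy kernel", 1.0),
                        ("compute-heavy kernel", 30.0)]:
    bound = attainable_gflops(PEAK, BW, intensity)
    print(f"{name}: at most {bound:.0f} GFLOP/s ({bound / PEAK:.0%} of peak)")
```

Kernels that move a lot of data per operation sit on the bandwidth-limited side of the roofline, which illustrates why peak compute figures alone say little about fitness for the SKA.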

“Based on that knowledge, our goal is to design the best architecture in terms of energy-efficiency and cost control”, Leandro says. “If we can come up with a new kind of high-speed accelerator with great compute power and combine it with smart algorithms running smoothly on those accelerators, that will save the SKA operation a whole lot of money.”

 


 

6: Novel algorithms

How much data does your science need?

 

The SKA, with exabytes of data coming in day after day, demands exascale computing, orders of magnitude beyond what today's supercomputers can deliver. So we can build better hardware, which is what the team of research package 5 is looking into. But we will also need smarter ways of making all the necessary calculations. Sanaz Kazemi and Ronald Nijboer tell us what they've been up to so far.

 

Sanaz Kazemi and Ronald Nijboer

 

Being really smart

Ronald introduces three ways of optimizing the SKA processing chain: “The existing algorithms consume way too much power when scaled up to SKA level, so something has to change. One option is to improve the algorithms we already have. Another option would be to come up with completely new ones. Or we can be really smart and find a way to use a smaller amount of data. In terms of electrical power, most is used for moving data around. If there is less data to move around, SKA will spend less on energy.”

Sanaz adds: “One important milestone we are working towards is answering this question: how many samples do we need from each antenna to make sure we have a certain level of accuracy? We don’t always have better accuracy when we get more samples, more correlator data. So there’s no point in collecting data samples that do not add to the final result. What we’re trying to establish now is the optimal sampling of data.”
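
The diminishing returns Sanaz describes follow from a textbook result: averaging N independent noisy samples shrinks the statistical error only as one over the square root of N. A minimal sketch, using a generic noise model rather than SKA numbers:

```python
# Why more samples stop paying off: the standard error of an average of
# N independent samples falls as 1/sqrt(N), so each halving of the error
# costs four times the data. Generic noise model, not SKA numbers.
import math

SIGMA = 1.0   # noise level of a single sample (arbitrary units)

for n in [100, 1_000, 10_000, 100_000]:
    stderr = SIGMA / math.sqrt(n)
    print(f"N = {n:>7,}: standard error ~ {stderr:.4f}")
```

Each extra decimal of accuracy costs a hundred times more samples, so knowing the accuracy astronomers actually need puts a direct bound on the data volume.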

A new name

Research package number 6 started in March 2014, two years into the Dome project. Originally, its name was Compressive sampling. Why the name change? Ronald and Sanaz shed some light on the matter: “Novel algorithms gives our research a broader scope than Compressive sampling did. We’re looking at the whole chain, from the telescope to the image, evaluating different approaches for data reduction, calibration and imaging. When our research started, the Dome project was already two years underway. Insights had progressed and the focus of our research has shifted slightly compared to the initial plans. We do keep track of other research groups working on compressive sampling or compressed sensing, of course.”

The SKA timeframe allows for phased development of these novel algorithms. “When you're doing fundamental research,” Ronald says, “you're hoping for a breakthrough, a giant leap forward. By their nature, these things are hard to predict and hard to time. Construction of the SKA starts in 2017, and that may be too soon for something really revolutionary. That is why we're also looking into optimizing existing algorithms, so we have a working solution ready in time. But we will also pursue the revolutionary plan; maybe we can introduce something brand new a little further down the line.”

 


 

7: Realtime communication models for exascale computing

High-risk, high-reward research with remarkable features

 

Research package number 7 got going in March this year, so it has only been active for a few months. Nevertheless, it is already out of its preparatory phase, working on a custom-designed communication protocol to fit the SKA's unique needs. With exabytes of data coming in every day and senders that may be dedicated devices instead of computers, Przemek Lenkiewicz and Chris Broekema have some challenging work to do.

Chris Broekema and Przemek Lenkiewicz

Technological handshake

Scientific director Ton Engbersen explains why it is vital to create a new communication model for the SKA: “With several exabytes of raw data coming from the antennas and a daily data production of at least 1 petabyte, one needs to be really, really careful transporting this data. The transport of 1 exabyte at 100 gigabits per second takes well over 2.5 years. Transmitting 1 petabyte still takes close to 1 day. Transporting too much data will jeopardize the system operation.”

“Yes, we have a well-defined problem”, Chris agrees, “and therefore a well-defined goal: finding the best way to transport exascale amounts of data fast, reliably and energy-efficiently. We're looking into various RDMA industry standards.” RDMA stands for remote direct memory access, where a sending party writes directly into a receiving party's memory, without a central processing unit (CPU) intervening. “This creates security issues that need solving. It also creates the need for a protocol to match the two machines; we have to make sure there is a technological handshake.”
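
To make the idea of such a handshake concrete, here is a toy sketch; the message names and fields are entirely hypothetical, though real RDMA setup exchanges connection and memory-registration information in a similar spirit:

```python
# Toy handshake sketch: before any remote write, sender and receiver
# agree on capabilities and the receiver grants a bounded memory
# window. Message names and fields are hypothetical.
from dataclasses import dataclass

@dataclass
class Hello:                 # sender -> receiver
    protocol_version: int
    max_message_bytes: int

@dataclass
class WindowGrant:           # receiver -> sender
    window_id: int
    base_offset: int
    length: int              # sender may write only inside this range

def negotiate(hello: Hello, supported_version: int = 1) -> WindowGrant:
    if hello.protocol_version != supported_version:
        raise ValueError("version mismatch: no handshake, no transfer")
    # Grant a window no larger than what the sender can use per message.
    return WindowGrant(window_id=7, base_offset=0,
                       length=min(hello.max_message_bytes, 1 << 20))

grant = negotiate(Hello(protocol_version=1, max_message_bytes=262_144))
print(grant)
```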

Remarkable

“We are dealing with some remarkable features”, Chris continues. “For one thing, the SKA will generate a continuous bulk stream of data. And some of the machines sending data may not be computers but dedicated devices located hundreds of kilometres away. Those challenges are unique to our application. Nevertheless, the IBM Research team in Zurich we are co-operating with sees many possible applications beyond radio astronomy. So the new solutions we will develop may very well spin off into software that is applicable throughout many other industries.”

Przemek adds: “The first thing we did after we formulated a proper work plan was to analyse the communication technology that is on the market today. We found that a lot is already available, yet none of it covers all the aspects we have in mind for the SKA. Our main goal is to have a design and a prototype of our new communication protocol around February 2015.”

 


Dome microserver premiere

Today, July 3rd 2014, the Dome project unveiled its first prototype of the microserver, a full-fledged server about the size of a mobile phone, equipped with water cooling to allow for very dense packaging of server and memory. Members of the press were able to watch this Dome premiere live through an online video stream. Around 65 participants, among them many members of the Dome Users Platform, were present in Dwingeloo. Some of them also outlined how their companies contributed to reaching this important milestone.

 

Prototype of the 64-bit Dome microserver

 


 

IBM, ASTRON and University of Groningen launch ERCET

Mind you, the advances being made in Dome research are not only relevant to radio telescopes. The more we as a society want to unlock the secrets hidden in the massive amounts of data generated by billions of sensors around us, the more each and every enterprise needs to understand how to do that. Following traditional ICT architectures will no longer be enough. Novel approaches are needed, and Dome is showing how to go about this.

This is also the fundamental background of the collaboration announced by the University of Groningen, ASTRON and IBM on June 26th, which is expected to expand the Dome research into important domains like health care, energy and water management. The collaboration goes by the name of ERCET: European Research Center for Exascale Technology. The agreement was signed by Marco de Vos (ASTRON), Sibrand Poppema (University of Groningen) and Harry van Dorenmalen (IBM).

 

Signing the ERCET agreement, from left to right: Marco de Vos, Sibrand Poppema, Harry van Dorenmalen

 


 

Users Platform meeting

On 25 and 26 June, seventeen design experts from Transfer DSW, Sintecs, Strukton, ASTRON and IBM came together at the Hanze Institute of Technology (HIT) in Assen to learn about and discuss the newest printed circuit board (PCB) technologies presented by Optiprint and IBM Italy.

The parties present at the workshop are closely involved in designing the microserver printed circuit board. They received information about the newest technologies in PCB design, rigid and flexible printed circuit boards, and application areas and constraints. This led to lively discussion of design approaches and constraints, for example how much data can be transported over the flexible connections between the rigid and flexible parts of such electronic circuit boards.

This particular seminar was a semi-closed Users Platform event. Future technology seminars organized by the Exascale Centre can be either closed or open, depending on the topic. 

 

Participants of the Dome Users Platform meeting

 


 

Upcoming events

 

3 July: Dome microserver premiere in Dwingeloo

7 – 11 July: SKA CSP consortium 3rd technical interchange meeting

16 – 18 September: Dome face-to-face meeting

29 September – 2 October: SKA engineering meeting in Fremantle, Western Australia

Fall 2014: Users Platform open networking event, date to be announced soon