Preparing for the SKA data

The world is wholly underprepared for the big-data challenges presented by the Square Kilometre Array (SKA) – the largest radio telescope ever built. This is because we have never seen data volumes on this scale before. By the time the SKA comes online in 2020, the scientific community needs to be ready with the necessary hardware and software so that the data can be put to good use immediately. UCT eResearch has been helping with two specific challenges: delivering the data sets to researchers around the world, and working to enable visualisation of the data.

The SKA has established precursor projects – pathfinders – to prototype the tools required to transport, process, analyse and visualise their data. The South African MeerKAT project is one of these pathfinder radio telescopes.

Two members of UCT eResearch – data scientist David Aikema, and Adrianna Pińska, a scientific software developer – have been seconded to the Inter-University Institute for Data-Intensive Astronomy (IDIA) to help with these challenges.

“eResearch offers centralised research services to the whole community, as well as embedded services within specialised research teams,” explains UCT eResearch Director, Dr Dale Peters. “Working integrally with the IDIA team, David and Adrianna are able to respond directly to the big-data challenges presented by the SKA.”

Delivering the data

With colleagues at IDIA, ASTRON in the Netherlands, the Institute of Astrophysics of Andalusia in Spain, and the Canadian Astronomy Data Centre, Aikema has been working on solutions to deliver the massive data sets produced by the SKA to astronomers in South Africa and around the world.

The challenge, explains Aikema – which will start with the full MeerKAT array and expand exponentially with the SKA – will be in ensuring that the data are archived and stored in a way that is accessible to the various (geographically diverse) research projects related to the SKA.

Key to the data delivery architecture are the SKA regional centres. IDIA and ASTRON in the Netherlands are pathfinder SKA regional centres, and more such centres are in the pipeline, scattered around the world. SKA data sets will be sent to these regional centres.

“The end goal of the MeerKAT data-delivery architecture – which should then give us a blueprint for handling the SKA data when it comes – is that researchers will be able to access and work with big data through services and systems provided by the SKA regional centres,” explains Aikema.

Visualising the data

Visualisation is the human brain’s best way of understanding large volumes of data. It lets us represent large, otherwise incomprehensible data sets in a way that reveals patterns and makes the information within the data understandable.

Because of the size and complexity of astronomical data, an effective visualisation tool is key. In her work for IDIA, Pińska is collaborating with a team at the Academia Sinica Institute of Astronomy and Astrophysics in Taiwan, and the National Radio Astronomy Observatory (NRAO) in the United States, to upscale the Cube Analysis and Rendering Tool for Astronomy (CARTA) – a platform for viewing astronomical data – to enable it to handle the data requirements of MeerKAT and the SKA.

CARTA was originally designed to visualise multi-dimensional data sets that vary in size and volume. “CARTA has a client-server architecture, so users can connect through a web browser,” says Pińska. “The large data files then sit on a server, where they are processed, and the rendered image data is sent back to the viewer.”
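To make the client-server idea concrete, here is a minimal sketch in Python of one way a server could shrink a large image before sending it to a browser: averaging blocks of pixels so that only a small, screen-sized version travels over the network. This is an illustration only, not CARTA's actual code. The function name `downsample` and the tiny 4×4 test image are assumptions made up for this example.

```python
def downsample(image, factor):
    """Block-average a 2D image (a list of equal-length rows) by an integer factor.

    Hypothetical helper for illustration; not part of CARTA.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    rows, cols = len(image), len(image[0])
    out = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            # Collect the pixels in this block, clipping at the image edge.
            block = [
                image[r][c]
                for r in range(r0, min(r0 + factor, rows))
                for c in range(c0, min(c0 + factor, cols))
            ]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# A made-up 4x4 "image" standing in for a much larger file on the server.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]

# The server would send this 2x2 summary to the viewer instead of all 16 pixels.
small = downsample(image, 2)
print(small)
```

In this sketch the full image never leaves the server. The viewer receives only the reduced version, which is the general pattern Pińska describes.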

The challenge is that CARTA was designed for much smaller data sets than those expected from MeerKAT and SKA. Pińska’s job is therefore to optimise CARTA so it can open really large data files.

“There are two major challenges here. One is speed: if the viewer needs to make calculations over the data, CARTA may do so quickly on a small image, but take much longer on a larger image,” explains Pińska. “The other issue is memory: you need enough memory to open the image, and some calculations may have more memory requirements proportional to the size of the image.”
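The memory problem can be illustrated with a short Python sketch. Instead of loading a whole image and then computing on it, the calculation reads the data one piece at a time and keeps only a few running totals, so memory use stays the same however large the image is. This is a general technique shown under simplifying assumptions, not the algorithm used in CARTA. The names `read_in_chunks` and `streaming_stats` are hypothetical, and a short list of numbers stands in for a large file on disk.

```python
import math


def read_in_chunks(values, chunk_size):
    """Hypothetical stand-in for reading a large file piece by piece."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def streaming_stats(chunks):
    """Compute count, min, max and mean while holding only one chunk at a time."""
    count = 0
    total = 0.0
    lo = math.inf
    hi = -math.inf
    for chunk in chunks:
        for v in chunk:
            count += 1
            total += v
            if v < lo:
                lo = v
            if v > hi:
                hi = v
    if count == 0:
        raise ValueError("no data")
    return {"count": count, "min": lo, "max": hi, "mean": total / count}


# Ten made-up pixel values, processed three at a time.
stats = streaming_stats(read_in_chunks(list(range(10)), 3))
print(stats)
```

The chunk size trades speed against memory: larger chunks mean fewer reads but more data held at once.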

To solve these two issues, Pińska is working on applying new algorithms to CARTA for efficient viewing of massive data-intensive images, with the option to upscale as those images increase in size.

Fortunately – with international teams grappling with these obstacles now, and working out solutions according to the scale of the expected data volumes – by the time SKA data sets are ready for science in the 2020s, we will be ready to meet the challenge.
