Objects in Space: “Herschel” the largest telescope ever flown.
Managing telemetry data and information on steering and calibrating scientific on-board instruments with an object database.
Interview with Dr. Johannes Riedinger, Herschel Mission Manager, European Space Agency (ESA), and Dr. Jon Brumfitt, System Architect of the Herschel Science Ground Segment, European Space Agency (ESA).
More Objects in Space…
I became aware of another very interesting project at the European Space Agency (ESA). On May 14, 2009, the European Space Agency launched an Ariane 5 rocket carrying the largest telescope ever flown: the “Herschel” telescope, 3.5 meters in diameter. The satellite, whose orbit is some 1.6 million kilometers from the Earth, will operate for 48 months.
One interesting aspect of this project, for us at odbms.org, is that they use an object database to manage telemetry data and information on steering and calibrating scientific on-board instruments.
I had the pleasure of interviewing Dr. Johannes Riedinger, Herschel Mission Manager, and Dr. Jon Brumfitt, System Architect of Herschel Scientific Ground Segment, both at the European Space Agency.
Hope you’ll enjoy the interview!
Q1. What is the mission of the “Herschel” telescope?
Johannes: The Herschel Space Observatory is the latest in a series of space telescopes that observe the sky at far-infrared wavelengths that cannot be observed from the ground. The most prominent predecessors were the Infrared Astronomical Satellite (IRAS), a joint US, UK, NL project launched in 1983; the Infrared Space Observatory (ISO), a telescope the European Space Agency launched in 1995; and the Spitzer Space Telescope, a NASA mission launched in 2003. The two features that distinguish Herschel from all its predecessors are its large primary mirror, which translates into a high sensitivity because of the large photon collecting area, and the fact that it is sensitive out to a wavelength of 670 μm. Extending the wavelength range of its predecessors by more than a factor of 3, Herschel is closing the last big gap that has remained in viewing celestial objects at wavelengths between 200 μm and the sub-millimetre regime. In the wavelength range in which Herschel overlaps to varying degrees with its predecessors, from 57 to 200 μm, Herschel’s big advantage is once more the size of its primary mirror. Because spatial resolution at a given wavelength improves linearly with telescope diameter, Herschel has a 4 to 6 times sharper vision than all earlier far-infrared telescopes.
Q2. In a report in 2010 you wrote that “you collect every day an average of six to seven gigabit raw telemetry data and additional information for steering and calibrating three scientific on-board instruments”. Is this still accurate? What are the main technical challenges with respect to data processing, manipulation and storage that this project poses?
Johannes: The average rate at which we have been receiving raw data from Herschel over the past 18 months is about 12 gigabits/day, which increases somewhat through the addition of related data by the time it is ingested into the database. The amount of data, and its distribution to the three national data centres that monitor the health and calibration of the instruments, is quite manageable. More challenging are the data volumes that need to be handled in the form of various levels of scientific data products, which are generated in the data processing pipeline. Another challenge, even for today’s computers, is the amount of memory some products require during processing. This goes significantly beyond the amount of memory a standard desktop or laptop computer comes with, and even the amount of disk space such machines used to come with until a few years ago: only recently have we procured two computers with 256 GB of RAM for the grid on which we run the pipeline to efficiently process some of the data.
Q3. What are the requirements of the “Herschel” data processing software? And what are the main components and functionalities of the Herschel software?
Jon: The software is required to plan daily sequences of astronomical observations and associated spacecraft manoeuvres, based on proposals submitted by the astronomical community, and to turn these into telecommand sequences to uplink to the spacecraft. The data processing system is required to process the resulting telemetry and convert it into scientific products (images, spectra, etc.) for the astronomers. The “uplink” system is particularly critical, because it must have a high availability to keep the spacecraft supplied with commands. Significant parts of the software were required to be working several years before launch, so that they could be used to test the scientific instruments in the laboratory. This approach, which we call “smooth transition”, ensures that certain critical parts of the software are very mature by the time we come to launch and that the instruments have been extensively tested with the real control system.
The uplink chain starts with a proposal submission system that allows astronomers to submit their observation requests. The astronomer downloads a Java client which communicates with a JBOSS application server to ingest proposals into the Versant database. Next, the mission planning system allows the mission planner to schedule sequences of observations. This is a complex task which is subject to many constraints on the allowed spacecraft orientation, the target visibility, stray light, ground station availability, etc. It is important to minimize the time wasted by slewing between targets. Each schedule is expanded into a sequence of several thousand telecommands to control the spacecraft and scientific instruments for a period of 24 hours. The spacecraft is in contact with the Earth only once a day for typically 3 hours, during which time a new schedule is uplinked and data from the previous day is downlinked and ingested into the Versant database. Between contact periods, the spacecraft executes the commands autonomously. The sequences of instrument commands have to be accurately synchronized with each little manoeuvre of the spacecraft when we are mapping areas of the sky. This is achieved by a system known as the “Common Uplink System” in which the detailed commanding is programmed in a special language that models the on-board execution timing.
The other major component of the software is the Data Processing system. This consists of a pipeline that processes the telemetry each day and places the resulting data products in an archive. Astronomers can download the products. We also provide an interactive analysis system which they can download to carry out further specialized analysis of their data.
Overall, it is quite a complex system with about 3 million lines of Java code (including test code) organized as 1100 Java packages and 300 CVS modules. There are 13,000 classes of which 300 are persistent.
Q4. You have chosen to have two separate database systems for Herschel: a relational database for storing and managing processed data products and an object database for storing and managing proposal data, mission planning data, telecommands and raw (unprocessed) telemetry. Is this correct? Why two databases? What is the size of the data expected for each database?
Jon: The processed data products, which account for the vast bulk of the data, are kept in a relational database, which forms part of a common infrastructure shared by many of our ESA science projects. This archive provides a uniform way of accessing data from different missions and performing multi-mission queries. Also, the archive will need to be maintained for years after the Herschel project finishes. The Herschel data in this archive is expected to grow to around 50 TB.
The proposal data, scheduling data, telecommands and raw (unprocessed) telemetry are kept in the Versant object database. This database will only grow to around 2 TB. In fact, back in 2000, we envisaged that all the data would be in the object database, but it soon became apparent that there were enormous benefits from adopting a common approach with other projects.
The object database is very good for the kind of data involved in the uplink chain. This typically requires complex linking of objects and navigational access. The data in the relational archive is stored in the form of FITS files (a format widely used by astronomers) and the appropriate files can be found by querying the meta-data that consist only of a few hundred keywords even for files that can be several gigabytes in size. We need to deliver the data as FITS files, so it makes sense to store it in that form, rather than to convert objects in a database into files in response to each astronomer query. Our interactive analysis system retrieves the files from the archive and converts them back into objects for processing.
Johannes: At present the Science Data Archive contains about 100 TB of data. This includes older versions of the same products that were generated by earlier versions of the pipeline software. We regenerate the products with the latest software and the best current knowledge of instrument calibration about every 6 months. Several years from now, when we have generated the “best and final” products for the Herschel legacy archive and have discarded all the earlier versions of each product, we expect to end up, as Jon says, with an archive volume of about 50 to 100 TB. This is up by two orders of magnitude from the legacy archive of ISO which, given today’s technology, you can carry around in your pocket on a 512 GB external disk drive.
Q5. How do the two databases interact with each other? Are there any similarities with the Gaia data processing system?
Jon: The two database systems are largely separate as they perform rather different roles. The gap is bridged by the data processing pipeline which takes the telemetry from the Versant database, processes it and puts the resulting products in the archive.
We do, in fact, have the ability to store products in the Versant database and this is used by experts within the instrument teams, although for the normal astronomer everything is archive based. There are many kinds of products and in principle new kinds of products can be defined to meet the needs of data processing as the mission progresses. If it were necessary to define a new persistent class for each new product type, we would have a major schema evolution problem. Consequently, product classes are defined by “composition” of a fixed set of building blocks to build an appropriate hierarchical data-structure. These blocks are implemented as a fixed set of persistent classes, allowing new product types to be defined without a schema change.
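To illustrate the composition idea Jon describes, here is a minimal sketch in Java. It is not the Herschel code: the names `Dataset`, `ArrayDataset`, `CompositeDataset`, `Product` and `ProductFactory` are hypothetical, and the real building blocks are certainly richer. The point it tries to show is that an "image" and a "spectrum" are both assembled from the same fixed set of classes, so adding a new product type adds no new class to the schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo: two different product "types" built from the same fixed blocks.
class ProductCompositionDemo {
    public static void main(String[] args) {
        Product image = ProductFactory.image("map-1", new double[] {1.0, 2.0, 3.0});
        Product spectrum = ProductFactory.spectrum("spec-1",
                new double[] {57.0, 200.0}, new double[] {0.4, 0.9});
        System.out.println(image.describe());
        System.out.println(spectrum.describe());
    }
}

// Building block: common interface for everything a product can contain.
interface Dataset {
    String describe();
}

// Building block: a leaf holding numeric values.
final class ArrayDataset implements Dataset {
    private final double[] values;

    ArrayDataset(double[] values) { this.values = values.clone(); }

    public String describe() { return "array[" + values.length + "]"; }
}

// Building block: named children, which may themselves be composites.
final class CompositeDataset implements Dataset {
    private final Map<String, Dataset> children = new LinkedHashMap<>();

    CompositeDataset put(String name, Dataset child) {
        children.put(name, child);
        return this;
    }

    Dataset get(String name) { return children.get(name); }

    public String describe() {
        StringBuilder sb = new StringBuilder("{");
        String sep = "";
        for (Map.Entry<String, Dataset> e : children.entrySet()) {
            sb.append(sep).append(e.getKey()).append('=').append(e.getValue().describe());
            sep = ", ";
        }
        return sb.append('}').toString();
    }
}

// Building block: metadata plus a root composite. The product type is plain
// data (a metadata entry), not a Java class, so new types need no schema change.
final class Product {
    private final Map<String, String> meta = new LinkedHashMap<>();
    private final CompositeDataset root = new CompositeDataset();

    Product(String type, String name) {
        meta.put("type", type);
        meta.put("name", name);
    }

    String type() { return meta.get("type"); }
    CompositeDataset root() { return root; }
    String describe() { return meta + " " + root.describe(); }
}

// New product types are defined here by composition only.
final class ProductFactory {
    private ProductFactory() {}

    static Product image(String name, double[] pixels) {
        Product p = new Product("image", name);
        p.root().put("pixels", new ArrayDataset(pixels));
        return p;
    }

    static Product spectrum(String name, double[] wavelength, double[] flux) {
        Product p = new Product("spectrum", name);
        p.root().put("wavelength", new ArrayDataset(wavelength))
                .put("flux", new ArrayDataset(flux));
        return p;
    }
}
```

In this sketch only the four building-block types would ever need to be persistent; `ProductFactory` can grow freely.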
The Versant database requires that first-class objects in the database are “enhanced” to make them persistence capable. However, we need to use Plain Old Java Objects for the products so that our interactive analysis system does not require a Versant installation. We have solved this by using “product persistor” classes that act as adaptors to make the POJOs persistent.
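The adaptor arrangement can be sketched roughly as follows. This is an illustration under stated assumptions, not the Herschel "product persistor" implementation: `Spectrum`, `SpectrumRecord`, `RecordStore` and `SpectrumPersistor` are invented names, and an in-memory map stands in for the database, since Versant's enhancement step is not modelled here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo of a persistor that adapts a POJO to a storable record.
class PersistorDemo {
    public static void main(String[] args) {
        RecordStore store = new RecordStore();
        SpectrumPersistor persistor = new SpectrumPersistor(store);
        persistor.save("obs-1", new Spectrum("INSTR-A", new double[] {0.4, 0.9}));
        Spectrum loaded = persistor.load("obs-1");
        System.out.println(loaded.instrument() + ", " + loaded.flux().length + " samples");
    }
}

// Plain Old Java Object: nothing here depends on any database library,
// so an analysis tool can use it without a database installation.
final class Spectrum {
    private final String instrument;
    private final double[] flux;

    Spectrum(String instrument, double[] flux) {
        this.instrument = instrument;
        this.flux = flux.clone();
    }

    String instrument() { return instrument; }
    double[] flux() { return flux.clone(); }
}

// Stand-in for a persistence-capable class (assumption: in a real system
// this is the only form of the data that the database layer sees).
final class SpectrumRecord {
    final String instrument;
    final double[] flux;

    SpectrumRecord(String instrument, double[] flux) {
        this.instrument = instrument;
        this.flux = flux.clone();
    }
}

// Stand-in for the database: an in-memory map keyed by product name.
final class RecordStore {
    private final Map<String, SpectrumRecord> records = new HashMap<>();

    void put(String key, SpectrumRecord record) { records.put(key, record); }
    SpectrumRecord get(String key) { return records.get(key); }
}

// The adaptor: converts between the POJO and its storable counterpart.
final class SpectrumPersistor {
    private final RecordStore store;

    SpectrumPersistor(RecordStore store) { this.store = store; }

    void save(String key, Spectrum spectrum) {
        store.put(key, new SpectrumRecord(spectrum.instrument(), spectrum.flux()));
    }

    Spectrum load(String key) {
        SpectrumRecord record = store.get(key);
        if (record == null) {
            throw new IllegalArgumentException("no product stored under " + key);
        }
        return new Spectrum(record.instrument, record.flux);
    }
}
```

The design choice this illustrates is the direction of the dependency: the persistor knows about both sides, while `Spectrum` knows about neither the record nor the store.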
Johannes: Concerning Gaia, and here I need to make a point that has little to do with database technology per se but it has a lot to do with using databases as a tool for different purposes even in the same generic area of science—astronomy in this case—I want to emphasize the differences between Herschel and Gaia rather than the similarities.
Herschel is an observatory, i.e. the satellite collects data from any celestial object which is selected by a user to be observed in a particular observing mode with one or more of Herschel’s three on-board instruments. Each observing mode generates its own, specific set of products that is characteristic of the observing mode rather than characteristic of the celestial object that is observed: A set of products can e.g. consist of a set of 2-dimensional, monochromatic images at infrared wavelengths, from which various false-colour composites can be generated by superposition. In the lingo of this journal: Observing modes are the classes, the data products are the objects. If I observe two different celestial objects in the same observing mode, I can directly compare the results. E.g. the distribution of colours in a false-colour image immediately tells me something about the temperature distribution of the material that is present; the intensity of the colour tells me something about the amount of material of a given temperature. And if I also have a spectrum of the celestial object, this tells me something about the chemical composition of the material and its physical state, such as pressure, density, and temperature.
Gaia, on the other hand, is not an observatory and it does not offer different observing modes that can be requested. It measures the same set of a dozen or so parameters for each and every object time and time again, with some of these parameters being time, brightness in different colours, position relative to other objects that appear in the same snapshot, and radial velocity. But every time it measures these parameters for a particular object it measures them in a different context, i.e. in combination with a different set of other objects that appear in the same snapshot. So you end up with a billion objects, each appearing on a different ensemble of several dozen snapshots, and you have to find the “global best fit” that minimizes the residual errors of a dozen parameters fitted to a billion objects. Computationally, and compared to Herschel, this is a gargantuan task. But it is a single task—derive the mechanical state of an ensemble of a billion objects whose motion is controlled by relativistic celestial mechanics in the presence of a few possible disturbances, such as orbiting planets or unstable stellar states (pulsating or otherwise variable stars). On Herschel, the scientific challenge is somewhat different: We are not so intensely interested in the state of motion of individual objects, we are interested in the chemical composition of matter—much of which does not consist of stars but of clouds of gas and dust which Gaia cannot “see”—and its physical state.
Q6. You have chosen an object database, from Versant, for storing and managing raw (unprocessed) telemetry data. What is the rationale for this choice? How do you map raw data into database objects? What is a typical object database representation of these subsets of “Herschel” data stored in the Versant Object Database? What are the typical operations on such data?
Jon: Back in 2000, we looked at the available object databases. Versant and Objectivity appeared to have the scalability to cope with the amount of data we were envisaging. After a detailed comparison, we chose Versant although I think either would have done the job. At this stage, we still envisaged storing all the product data in the object database.
The uplink objects, such as proposals, observations, telecommands, schedules, etc, are stored directly as objects in the Versant database, as you might expect. The telemetry is a special case because we want to preserve the original data exactly as it arrives in binary packets, so that we can always get back to the raw data. So we have a TelemetryPacket class that encapsulates the binary packet and provides methods for accessing its contents. To support queries, we decode key meta-data from the binary packet when the object is constructed and store it as attributes of the object. This allows querying to retrieve, for example, packets for a given time range for a specified instrument or packets for a particular observation.
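A class along these lines might look like the sketch below. The class name `TelemetryPacket` comes from the interview, but everything else is an assumption made for illustration: the 14-byte header layout (2-byte instrument id, 8-byte time stamp, 4-byte observation id) is invented and is not the real Herschel packet format, and the `query` helper is a plain loop standing in for a database query.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: ingest three packets, then query by instrument and time.
class TelemetryDemo {
    public static void main(String[] args) {
        List<TelemetryPacket> packets = new ArrayList<>();
        packets.add(new TelemetryPacket(TelemetryPacket.encode(1, 1000L, 42, new byte[] {7})));
        packets.add(new TelemetryPacket(TelemetryPacket.encode(2, 2000L, 42, new byte[] {8})));
        packets.add(new TelemetryPacket(TelemetryPacket.encode(1, 9000L, 43, new byte[] {9})));
        System.out.println(TelemetryPacket.query(packets, 1, 0L, 5000L).size() + " match(es)");
    }
}

final class TelemetryPacket {
    // Invented layout: instrument id (2 bytes), time in ms (8), observation id (4).
    static final int HEADER_BYTES = 2 + 8 + 4;

    private final byte[] raw;          // the packet exactly as received
    private final int instrumentId;    // decoded once, used for queries
    private final long timeMillis;
    private final int observationId;

    TelemetryPacket(byte[] raw) {
        if (raw.length < HEADER_BYTES) {
            throw new IllegalArgumentException("packet shorter than header");
        }
        this.raw = raw.clone();
        ByteBuffer buf = ByteBuffer.wrap(this.raw); // big-endian by default
        this.instrumentId = buf.getShort() & 0xFFFF;
        this.timeMillis = buf.getLong();
        this.observationId = buf.getInt();
    }

    byte[] raw() { return raw.clone(); }
    int instrumentId() { return instrumentId; }
    long timeMillis() { return timeMillis; }
    int observationId() { return observationId; }

    // Test helper: builds a packet in the invented layout.
    static byte[] encode(int instrumentId, long timeMillis, int observationId, byte[] payload) {
        return ByteBuffer.allocate(HEADER_BYTES + payload.length)
                .putShort((short) instrumentId)
                .putLong(timeMillis)
                .putInt(observationId)
                .put(payload)
                .array();
    }

    // Stand-in for a database query: packets of one instrument with from <= time < to.
    static List<TelemetryPacket> query(List<TelemetryPacket> packets,
                                       int instrumentId, long from, long to) {
        List<TelemetryPacket> hits = new ArrayList<>();
        for (TelemetryPacket p : packets) {
            if (p.instrumentId == instrumentId && p.timeMillis >= from && p.timeMillis < to) {
                hits.add(p);
            }
        }
        return hits;
    }
}
```

Keeping the untouched byte array alongside the decoded attributes is what lets the queries run on metadata while the original data stays recoverable bit for bit.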
The persistent data model is known as the Core Class Model (CCM). This was developed by starting with a domain model describing the key entities in the problem domain, such as proposals and observations, and then adding methods by analysing the interactions implied by the use-cases. The model then evolved by the introduction of various design patterns and further classes related to the implementation domain.
Q7. How did you specifically use the Versant object database? Are there any specific components of the Versant object database that were (are) crucial for the data storage and management part of the project? If yes, which ones? Were there any technical challenges? If yes, which ones?
Jon: With a project that spans 20 years from the initial concept at the science ground segment level to generation and availability of the final archive of scientific products, it is important to allow for technology becoming obsolete. At the start of the science ground segment development part of the project, a decade ago, we looked at the standards that were then emerging (ODMG and JDO) to see if these could provide a vendor-independent database interface. However, the standards were not sufficiently mature, so we designed our own abstraction layer for the database and the persistent data. That meant that, in principle, we could change to a different database if needed. We tried to only include features that you would reasonably expect from any object database.
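As a rough illustration of what such an abstraction layer involves, consider the sketch below. The interface `ObjectStore` and its three methods are hypothetical and much smaller than anything a real ground segment would need; `InMemoryObjectStore` is a toy back end standing in for a vendor-specific one. What the sketch tries to show is that application code names only the interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical application code: it depends on ObjectStore only.
class AbstractionLayerDemo {
    public static void main(String[] args) {
        ObjectStore store = new InMemoryObjectStore(); // the only line naming a back end
        store.store("proposal: map a star-forming region");
        store.store("proposal: spectral survey");
        store.store(Integer.valueOf(42));
        List<String> proposals = store.query(String.class, s -> s.startsWith("proposal"));
        System.out.println(proposals.size() + " proposals found");
    }
}

// Vendor-neutral interface, restricted to operations one would reasonably
// expect any object database to offer.
interface ObjectStore {
    long store(Object obj);                                 // returns the new object's id
    Object load(long id);                                   // null if the id is unknown
    <T> List<T> query(Class<T> type, Predicate<T> filter);  // filter objects by type
}

// Toy back end. A vendor-specific implementation would live behind the same
// interface, so replacing the database would not touch application code.
final class InMemoryObjectStore implements ObjectStore {
    private final Map<Long, Object> objects = new HashMap<>();
    private long nextId = 1;

    public long store(Object obj) {
        long id = nextId++;
        objects.put(id, obj);
        return id;
    }

    public Object load(long id) {
        return objects.get(id);
    }

    public <T> List<T> query(Class<T> type, Predicate<T> filter) {
        List<T> hits = new ArrayList<>();
        for (Object o : objects.values()) {
            if (type.isInstance(o)) {
                T candidate = type.cast(o);
                if (filter.test(candidate)) {
                    hits.add(candidate);
                }
            }
        }
        return hits;
    }
}
```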
We organized all the persistent classes in the system into a single package. This was important to keep changes to the schema under tight control by ensuring that the schema version was defined by a single deliverable item. Consequently, the CCM classes are responsible for persistence and data integrity, but have little application-specific functionality. The persistent objects can be customized by the applications for example by using the decorator pattern. For various reasons, the persistent classes tend to contain Versant-specific code, so by keeping them all in one package it keeps the vendor-specific code together in one module behind the abstraction layer.
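The decorator pattern mentioned here can be sketched in a few lines. The names are hypothetical: `StoredObservation` stands in for a persistent class that only holds data and guards its integrity, and `PlannedObservation` is an imagined planning-side decorator that adds behaviour without changing the persistent class.

```java
// Hypothetical demo: application behaviour is layered on top of a persistent object.
class DecoratorDemo {
    public static void main(String[] args) {
        Observation stored = new StoredObservation("target-1", 600);
        PlannedObservation planned = new PlannedObservation(stored, 120);
        System.out.println(planned.target() + " needs " + planned.totalSeconds() + " s");
    }
}

interface Observation {
    String target();
    int durationSeconds();
}

// Stand-in for a persistent class: data and integrity checks only,
// no application-specific functionality.
final class StoredObservation implements Observation {
    private final String target;
    private final int durationSeconds;

    StoredObservation(String target, int durationSeconds) {
        if (target == null || target.isEmpty()) {
            throw new IllegalArgumentException("target must be named");
        }
        if (durationSeconds <= 0) {
            throw new IllegalArgumentException("duration must be positive");
        }
        this.target = target;
        this.durationSeconds = durationSeconds;
    }

    public String target() { return target; }
    public int durationSeconds() { return durationSeconds; }
}

// Decorator owned by the application: delegates the Observation interface to
// the wrapped object and adds planning-specific behaviour. The slew time is
// application state and is never written into the persistent class.
final class PlannedObservation implements Observation {
    private final Observation inner;
    private final int slewSeconds;

    PlannedObservation(Observation inner, int slewSeconds) {
        this.inner = inner;
        this.slewSeconds = slewSeconds;
    }

    public String target() { return inner.target(); }
    public int durationSeconds() { return inner.durationSeconds(); }

    // Added behaviour: time on target plus the slew needed to get there.
    int totalSeconds() { return slewSeconds + inner.durationSeconds(); }
}
```

Because the decorator lives in the application, the schema package stays unchanged when an application needs new behaviour.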
We needed to propagate copies of subsets of the data to databases at the three instrument control centres. At the time, there wasn’t an out-of-the-box solution that did what we wanted, so we developed our own data propagation. This was quite a substantial amount of work. One nice thing is that this is all hidden within our database abstraction layer, so that it is transparent to the applications. We continuously propagate the database to all three instrument teams and we also propagate it to our test system and backup system.
Another important factor is backup and recovery and it makes a big difference how you organize the data. We have an uplink node and a set of telemetry nodes. There is a current telemetry node into which new data is ingested. All the older telemetry nodes are read-only as the data never changes, which means you only have to back them up once. The abstraction layer is able to hide the mapping of objects onto databases.
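The backup benefit of this organisation can be illustrated with a small model. This is a sketch under assumptions, not the Herschel design: `TelemetryNode` and `TelemetryArchive` are invented names, nodes are in-memory lists, and the rule for when to start a new node (`rollOver`) is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: only nodes that changed since their last backup are copied.
class TelemetryNodesDemo {
    public static void main(String[] args) {
        TelemetryArchive archive = new TelemetryArchive();
        archive.ingest(new byte[] {1});
        archive.rollOver();              // the first node becomes read-only
        archive.ingest(new byte[] {2});
        System.out.println("first backup:  " + archive.backup());
        archive.ingest(new byte[] {3});
        System.out.println("second backup: " + archive.backup());
    }
}

final class TelemetryNode {
    private final String name;
    private final List<byte[]> packets = new ArrayList<>();
    private boolean readOnly;
    private boolean backedUp;

    TelemetryNode(String name) { this.name = name; }

    String name() { return name; }

    void ingest(byte[] packet) {
        if (readOnly) {
            throw new IllegalStateException(name + " is read-only");
        }
        packets.add(packet.clone());
        backedUp = false;                // new data invalidates the last backup
    }

    void freeze() { readOnly = true; }
    boolean needsBackup() { return !backedUp; }
    void markBackedUp() { backedUp = true; }
}

// Routes new packets to the current node, so callers never pick a node themselves.
final class TelemetryArchive {
    private final List<TelemetryNode> nodes = new ArrayList<>();
    private TelemetryNode current;
    private int counter;

    TelemetryArchive() { rollOver(); }

    void ingest(byte[] packet) { current.ingest(packet); }

    // Freezes the current node and opens a fresh one for ingestion.
    void rollOver() {
        if (current != null) {
            current.freeze();
        }
        current = new TelemetryNode("telemetry-" + (++counter));
        nodes.add(current);
    }

    // Returns the names of the nodes copied in this run. A frozen node can no
    // longer change, so after its final backup it never appears here again.
    List<String> backup() {
        List<String> copied = new ArrayList<>();
        for (TelemetryNode node : nodes) {
            if (node.needsBackup()) {
                copied.add(node.name());
                node.markBackedUp();
            }
        }
        return copied;
    }
}
```

In this model the cost of a routine backup depends on the size of the current node, not on the total amount of telemetry accumulated over the mission.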
Q8. Is scalability an important factor in your application?
Jon: In the early stages, when we were considering putting all the data into the object database, scalability was a very important issue. It became much less of an issue when we decided to store the products in the common archive.
Q9. Do you have specific time requirements for handling data objects?
Jon: We need to ingest the 1.5 GB of telemetry each day and propagate it to the instrument sites. In general the performance of the Versant database is more than adequate. It is also important that we can plan (and if necessary replan) the observations for an operational day within a few hours, although this does not pose a problem in terms of database performance.
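The figures quoted in the interview can be cross-checked with some simple arithmetic, sketched below. The inputs of 12 gigabits/day and a 3-hour contact period come from the answers above; the rest are assumptions made for the estimate: decimal units, a constant data rate, the whole contact period being available for downlink, and a nominal mission of four years. The class name is hypothetical.

```java
// Back-of-envelope estimate; the assumptions are listed next to each constant.
class TelemetryVolumeEstimate {
    static final double GIGABITS_PER_DAY = 12.0;   // average raw rate quoted in the interview
    static final double CONTACT_HOURS = 3.0;       // typical daily contact period
    static final double MISSION_DAYS = 4 * 365.25; // assumption: nominal four-year mission

    // 8 bits per byte: gigabits/day to gigabytes/day.
    static double gigabytesPerDay() {
        return GIGABITS_PER_DAY / 8.0;
    }

    // Mean rate if one day's data is downlinked within a single contact period.
    static double meanDownlinkMegabitsPerSecond() {
        return GIGABITS_PER_DAY * 1000.0 / (CONTACT_HOURS * 3600.0);
    }

    // Raw telemetry accumulated over the assumed mission length, in terabytes.
    static double missionTerabytes() {
        return gigabytesPerDay() * MISSION_DAYS / 1000.0;
    }

    public static void main(String[] args) {
        System.out.printf("%.2f GB/day%n", gigabytesPerDay());
        System.out.printf("%.2f Mbit/s during contact%n", meanDownlinkMegabitsPerSecond());
        System.out.printf("%.2f TB over the mission%n", missionTerabytes());
    }
}
```

Under these assumptions, 12 gigabits/day corresponds to the 1.5 GB/day Jon mentions, and the accumulated raw telemetry comes to roughly 2.2 TB, which is in line with the "around 2 TB" expected for the object database.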
Q10. By the end of the four-year life cycle of “Herschel” you are expecting a minimum of 50 terabytes of data which will be available to the scientific community. How do you plan to store, maintain and process this amount of data?
Johannes: At the European Space Astronomy Centre (ESAC) in Madrid we keep the archives of all of ESA’s astrophysics missions, and a few years ago we started to also import and organise into archives the data of the planetary missions that deal with observations of the sun, planets, comets and asteroids. The user interface to all these archives is basically the same. This generates a kind of “corporate identity” of these archives and if you have used one you know how to use all of them.
For many years, the Science Archive Team at ESAC has played a leading role in Europe in the area of “Virtual Observatories”. If you are looking for some specific information on a particular astronomical object, chances are good, and they are getting better by the day, that some archive in the world already has this information. ESA’s archives, and amongst them the Herschel legacy archive once it has been built, are accessible through this virtual observatory from practically anywhere in the world.
Q11. You are basically halfway through the life cycle of “Herschel”. What are the main lessons learned so far from the project? What are the next steps planned for the project and the main technical challenges ahead?
Johannes: We are very lucky, and without doubt the more than 1,300 individuals who are listed on a web page as having made major contributions to the success of this project have every reason to be proud, that we are operating this world class observatory with only minor problems in satellite hardware which are well under control.
We are approximately half way through the in-orbit life time of the mission after launch. But this is not the same as saying that we have reaped half of the science results from the mission.
For the first 4 months of the mission, i.e. until mid-September 2009, the satellite and the ground segment underwent a checkout, a commissioning and a performance verification phase. Through a large number of “engineering” and “calibration” observations in these initial phases we ensured that the observing modes we had planned to provide and had advertised to our users worked in principle, that the telescope pointed in the right direction, that the temperatures did not drift beyond specification, etc. From mid-September to about the end of 2009, we performed an increasing number of scientific observations of astronomical targets in what we called the “Science Demonstration Phase”. These observations were “the proof of the pudding” and showed that, indeed, the astronomers could do the science they had intended to do with the data they were getting from Herschel: More than 250 scientific papers were published in mid-2010 that are based on observations made during this Science Demonstration Phase.
We have been in the “Routine Science Mission Phase” for about one year now, and we expect to remain in this phase for at least another year and a half, i.e. we have collected perhaps 40% of the scientific data Herschel will eventually have collected when the Liquid Helium coolant runs out. Already now we can see that Herschel will revolutionize some of the currently accepted theories and concepts of how and where stars are born, how they enrich the interstellar medium with elements heavier than Helium when they die, and in many other areas such as how the birth rate of stars has changed over the age of the universe, which is over 13 billion years old. But, most importantly, we need to realise and remember that sustained progress in science does not come about on short time scales. The initial results that have already been published are spectacular, but they are only the tip of the iceberg, they are results that stare you in the face. Many more results, which together will change our view of the cosmos as profoundly as the cream of the coffee results we already see now, will only be found after years of archival research, by astronomers who plough through vast amounts of archive data in the search of new, exciting similarities between seemingly different types of objects and stunning differences between objects that had been classified to be of the same type. And this kind of knowledge will accumulate from Herschel data for many years beyond the end of the Herschel in-orbit lifetime. ESA will support this slower but more sustained scientific progress through archival research by keeping together a team of about half a dozen software experts and several instrument specialists for up to 5 years after the end of the in-orbit mission. 
In addition, members of the astronomical community will, as they have done on previous missions, contribute to the archive their own, interactively improved products which extract features from the data that no automatic pipeline process can extract no matter how clever it has been set up to be.
I believe it is fair to say that no-one yet knows the challenges the software developers and the software users will face.
But I do expect that a lot more problems will arise from the scientific interpretation of the data than from the software technologies we use, which are up to the task that is at hand.
Dr. Johannes Riedinger, Herschel Mission Manager, European Space Agency.
Johannes Riedinger has a background in Mathematics and Astronomy and has worked on space projects since the start of his PhD thesis in 1980, when he joined a team at Max-Planck-Institut für Astronomie in Heidelberg in the development of an instrument for a space shuttle payload. Working on a successor project, the ISO Photo-Polarimeter ISOPHOT, he worked in industry from 1985 to 1988 as the System Engineer for the phase B study of this payload. Johannes joined the European Space Agency in 1988 and, having contributed to the implementation of the Science Ground Segments of the Infrared Space Observatory (ISO, launched in 1995) and the X-ray Multi Mirror telescope (XMM-Newton, launched in 1999), he became the Herschel Science Centre Development Manager in 2000. Following successful commissioning of the satellite and ground segment, he became Herschel Mission Manager in 2009.
Dr. Jon Brumfitt, System Architect of Herschel Scientific Ground Segment, European Space Agency.
Jon Brumfitt has a background in Electronics with Physics and Mathematics and has worked on several of ESA’s astrophysics missions, including IUE, Hipparcos, ISO, XMM and currently Herschel. After completing his PhD and a post-doctoral fellowship in image processing, Jon worked on data reduction for the IUE satellite before joining Logica Space and Defence in 1980. In 1984 he moved to Logica’s research centre in Cambridge and then in 1993 to ESTEC in the Netherlands to work on the scientific ground segments for ISO and XMM. In January 2000, he joined the newly formed Herschel team as ground segment System Architect. As Herschel approached launch, he moved down to the European Space Astronomy Centre in Madrid to become part of the Herschel Science Operations Team.
For Additional Reading
 Data From Outer Space.
This short paper describes a case study: the handling of telemetry data and information of the “Herschel” telescope from outer space with the Versant object database. The telescope was launched by the European Space Agency (ESA) with the Ariane 5 rocket on 14 May 2009.
Paper | Introductory | English | PDF | 2010
– HERSCHEL: Exploring the formation of galaxies and stars
– ESA latest_news
– ESA Press Releases