Archive Team Header Policy
The main data we are interested in for the Archive is the science-ready level 1 FITS data. We use these files as the source for describing observations, so our descriptions can only be as precise as the headers of the files. To determine whether a specific header format is sufficient for our purposes, we ask:
Does the header contain sufficient information to completely describe the underlying observation solely from its constituent FITS files?
This is the paramount requirement that must be fulfilled for successful integration into our archive. Further specifications can make the data more interoperable and easier to handle, but if this requirement is not fulfilled, it is a total showstopper. In this context, "completely" refers to the full set of fields of an observation description. These include, but are not limited to:
- Observation: Is it possible to find the unique parent observation solely from the header?
- Coordinates for the observation: Which part of the sky/the sun is covered?
- Temporal coverage: When does the observation start and end?
- Spectral coverage: Which wavelengths are covered by the observation?
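The four fields above can be checked mechanically before a file is considered for integration. The following is a minimal Python sketch, assuming the header has already been read into a dict-like object (for example with astropy's `fits.getheader`). The keyword mapping is an assumption for illustration only: `DATE-BEG`/`DATE-END` and the `CTYPEn`/`CRVALn` WCS keywords come from the FITS standard, `WAVEMIN`/`WAVEMAX` are keywords used in the SOLARNET recommendations, and `OBS_ID` is a hypothetical placeholder for whatever keyword identifies the parent observation.

```python
# Hypothetical mapping from the description fields to header keywords.
# OBS_ID is a placeholder assumption; the others follow FITS/WCS and SOLARNET usage.
REQUIRED_FIELDS = {
    "observation": ["OBS_ID"],
    "coordinates": ["CTYPE1", "CRVAL1", "CTYPE2", "CRVAL2"],
    "temporal": ["DATE-BEG", "DATE-END"],
    "spectral": ["WAVEMIN", "WAVEMAX"],
}


def missing_fields(header):
    """Return {field: [missing keywords]} for every field the header cannot describe.

    `header` is any mapping with a .get() method (a dict or an astropy Header).
    A keyword counts as missing if it is absent or holds an empty string.
    """
    missing = {}
    for field, keywords in REQUIRED_FIELDS.items():
        absent = [k for k in keywords if header.get(k) in (None, "")]
        if absent:
            missing[field] = absent
    return missing


if __name__ == "__main__":
    header = {
        "OBS_ID": "example-obs-0001",
        "CTYPE1": "HPLN-TAN", "CRVAL1": 0.0,
        "CTYPE2": "HPLT-TAN", "CRVAL2": 0.0,
        "DATE-BEG": "2020-01-01T10:00:00",
        "DATE-END": "2020-01-01T10:30:00",
        # WAVEMIN/WAVEMAX deliberately omitted
    }
    print(missing_fields(header))  # {'spectral': ['WAVEMIN', 'WAVEMAX']}
```

A non-empty result means the header alone cannot fully describe the observation. Presence checks are only a first step; they do not verify that the values themselves are consistent or correct.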
In general, the benefits of well-structured header data are:
- Interoperability: Third party tools that follow the same standards are compatible with your data
- Attractiveness: Data that is easier to use and understand is more attractive for others to use and will thus have a higher scientific impact.
Any file structure that satisfies the requirement above is formally fit to be integrated into the archive. In practice, it makes sense to specify these requirements further, both to enhance interoperability and to provide a proven structure that new instrument concepts can follow.
When dealing with potential files to integrate into the archive, we distinguish between:
- old files from existing archives and possibly discontinued instruments
- files from new instruments that are still under development.
Our current policy is that we will only try to reshape the headers of existing instruments if necessary and if the required effort is proportionate to the expected scientific value of the archive. For new data, we strongly encourage developers to adhere to the recommendations we define below.
Our recommendations for header structure are based on the requirements stated above. We have found that the recommendations for header structure presented by the SOLARNET project cover most standard use cases and consistently include all the information needed to describe observations. Below, we list the header standards we have defined for some of the existing instruments.
In some cases additional help is necessary for understanding header keywords. We will include entries in the list below as needed:
File Name Recommendations
Excerpt from the SOLARNET recommendations for file names:
We recommend that file names only contain the letters A-Z and a-z, the digits 0-9, periods, underscores and plus/minus signs. Each component of the file name should be separated with an underscore, not a minus sign. In this regard, a range may be considered a single component with a minus sign between the min and max values (such as start/end date). File name components with numerical values should be a) preceded by one or more identifying letters, and b) given in a fixed-decimal format, e.g. (00.0300). Variable-length string values should be post-fixed with underscores to a fixed length.
Another common practice has been to start the file name with the “instrument name”, although it is typically defined in a consistent manner only on a per-mission or per-observatory basis, i.e. collisions may occur with other missions. Thus, we recommend prefixing the instrument name with a mission or observatory identifier (e.g. iris for IRIS or sst for SST). After the instrument name, the data level is normally encoded as e.g. “l0” and “l1” for level 0 and 1. Note, however, that the definitions of data levels are normally entirely project/instrument-specific and do not by themselves uniquely identify what kinds of processing have been applied.
Within each data set it is often very useful to have file names that can be sorted by time when subject to a lexical sort (such as with “ls”). This requires that the next item in the file name be the date and time (YYYYMMDD_HHMMSS[.d]). The “d” part is fractional seconds, with enough digits to distinguish between any two consecutive observations.
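The file name rules above can be sketched as a small helper. This is a minimal Python illustration, not an official SOLARNET tool: all function names are hypothetical, and it assumes that the mission prefix and instrument name are separate underscore-separated components, that times are truncated (not rounded) to a fixed number of fractional-second digits, and that files carry a `.fits` extension.

```python
import re
from datetime import datetime

# Allowed characters: letters, digits, period, underscore, plus and minus.
ALLOWED = re.compile(r"[A-Za-z0-9._+\-]+")


def numeric_component(prefix, value, width=7, decimals=4):
    """Identifying letters followed by a fixed-decimal number, e.g. 'w00.0300'."""
    return f"{prefix}{value:0{width}.{decimals}f}"


def padded_string(value, length):
    """Variable-length string post-fixed with underscores to a fixed length."""
    if len(value) > length:
        raise ValueError(f"{value!r} is longer than {length} characters")
    return value.ljust(length, "_")


def timestamp(t, digits=1):
    """Format as YYYYMMDD_HHMMSS[.d], truncating to `digits` fractional digits (0-6)."""
    base = t.strftime("%Y%m%d_%H%M%S")
    if digits == 0:
        return base
    frac = str(t.microsecond).zfill(6)[:digits]
    return f"{base}.{frac}"


def build_filename(mission, instrument, level, t, extra=(), digits=1, ext=".fits"):
    """Hypothetical builder: mission_instrument_l<level>_<time>[_extra...]<ext>."""
    parts = [mission, instrument, f"l{level}", timestamp(t, digits), *extra]
    name = "_".join(parts) + ext
    if not ALLOWED.fullmatch(name):
        raise ValueError(f"disallowed character in {name!r}")
    return name


if __name__ == "__main__":
    t = datetime(2020, 1, 1, 10, 0, 0, 500000)
    print(build_filename("sst", "crisp", 1, t, extra=[numeric_component("w", 0.03)]))
    # sst_crisp_l1_20200101_100000.5_w00.0300.fits
```

Because every timestamp has the same fixed width, a plain lexical sort of names that share the same mission, instrument and level prefix matches chronological order, which is the purpose of placing the date and time directly after the data level.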
TL;DR: make your file names like this: