Commit b33637b7 authored by Drew Leonard's avatar Drew Leonard
Browse files

Add demo docs from KIS tutorial

parent cf19cfc1
=========
User Demo
=========
Programmatic access to the SDC data will be provided by a custom client for ``Fido``, SunPy's search and download interface.
This approach will allow the user to query and download data using SunPy, which will provide consistency with users' experiences working with other solar data in Python, and compatibility with the large and growing number of available scientific Python packages.
A brief introduction to SunPy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SunPy is an open-source, community-developed package which provides a wide range of general tools for research in solar physics.
It is written in Python, which also allows easy integration with the large and growing ecosystem of scientific Python packages.
The main features of SunPy are:
- the ``Map`` object which provides a single, coordinate-aware interface for loading and plotting solar image data;
- ``TimeSeries``, a similar interface for time-series data;
- solar physical constants;
- use of Astropy's ``units`` submodule for physical quantities;
- tools for parsing time strings and dealing with time ranges;
- ``Fido``, a unified search and download interface for obtaining solar data from a variety of different sources;a database submodule for tracking locally downloaded files.
A full tutorial on any of these topics would be beyond the scope of this document, but for a quick primer you can check out `this tutorial <https://github.com/aperiosoftware/stfc_sunpy_intro_2021>`_ which was run by Aperio Software for an STFC summer school in Durham.
It demonstrates some basic analysis using several of the tools listed above.
An introduction to all of these features can be found on the `SunPy website <https://docs.sunpy.org/en/stable/guide/tour.html>`_, and the documentation provides more detailed explanations and examples.
Some of you may also be interested in the `SSWIDL/sunpy Cheat Sheet <https://docs.sunpy.org/en/stable/guide/ssw.html>`_.
For the purposes of this tutorial, the feature that will interest us most is ``Fido``.
Getting help or helping out
---------------------------
SunPy is a community-driven project, and the developers are always willing to welcome new users and provide help where needed.
The best place to start if you want help with anything in SunPy is the `SunPy matrix room <https://matrix.to/#/#sunpy:openastronomy.org>`_.
As well as being open-source, SunPy has a philosophy of being open development.
This means that anyone can get involved, either by contributing to the package itself (adding features, fixing bugs, writing documentation, etc) or by supporting the project more broadly (giving the developers feedback, helping other users, and so on).
Again, the best place to start with this is `matrix <https://matrix.to/#/#sunpy:openastronomy.org>`_, and you can find repositories for the code base, documentation, website and more at `SunPy's GitHub page <https://github.com/sunpy>`_.
A brief introduction to Fido
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``Fido`` is SunPy's unified search and download interface.
It allows users to query many different data sources at once using a system of attributes to specify the particular data the user is looking for.
A query returns a table of results which can be inspected and sliced, such that the user can either download all of those results or a subset of them.
The below is adapted from the `documentation on the SunPy website <https://docs.sunpy.org/en/stable/guide/acquiring_data/fido.html#fido-guide>`_.
``Fido`` and ``attrs`` can both be imported from ``sunpy.net``.
>>> from sunpy.net import Fido, attrs as a
Attributes are used to specify particular aspects of the data, such as instrument, time range or wavelength.
>>> a.Time('2012/03/04', '2012/03/06')
<sunpy.net.attrs.Time(2012-03-04 00:00:00.000, 2012-03-06 00:00:00.000)>
>>> a.Instrument('aia')
<sunpy.net.attrs.Instrument: aia object at 0x7fbc17391870>
To query data we pass one or more attributes to ``Fido.search()``, which returns data that match those attributes.
So to find level one AIA data in a particular two-day period we can do:
>>> result = Fido.search(a.Time('2012/3/4', '2012/3/6'), a.Instrument.aia, a.Level.one)
This will output a table of the data found by ``Fido`` which satisfy the specified constraints.
Searches can be made arbitrarily complex by including many different attributes, and also by using the ``|`` (OR) operator.
So for example we can search for data from either of two different instruments:
>>> result = Fido.search(a.Time('2012/3/4', '2012/3/4 02:00'), a.Instrument.lyra | a.Instrument.rhessi)
You may notice here that this search has returned results from several different data providers.
These can be indexed either numerically or with the name of the provider, and each one can be further indexed individually, e.g.:
>>> result[2]
>>> result['vso'][1:] # or equivalently, result[2, 1:]
Once we've selected which files we want, these can then be downloaded using ``Fido.fetch()``, which takes as an argument a results table of the kind returned by ``search()``.
>>> files = Fido.fetch(result)
``fetch()`` downloads these results to the location specified in the SunPy configuration (by default ``/home/<user>/sunpy/data``) and returns a list of the names of the downloaded files.
Here we've passed the whole search result to download all of the files, but we could also have sliced it as we did above, to download only certain files.
If you run the command above more than once, you will notice that it is much faster the second time.
This is because internally Fido checks whether or not the files already exist.
If they do, it skips the download and just returns the filename.
KIS Data Interface
^^^^^^^^^^^^^^^^^^
The custom client is required to provide access to the SDC servers.
Importing this automatically registers the client with Fido, allowing the user to interact with the existing SunPy interfaces rather than directly with the custom client.
>>> from sdc.client import KISClient
Querying SDC data is then done in the same way as any other Fido query, e.g.:
>>> result = Fido.search(a.Instrument("GRIS"), a.Time("2014/04/26", "2014/04/27"), a.Level(1))
This will return a table of the results found, which can be inspected or used to download the corresponding files.
**Note:** it is currently necessary to specify the processing level of the data you want - in this case level one.
This is expected to change at a later date so that level one is the default, but it will still be possible to specify other levels.
In this particular case we are only given results from a single provider - the KISClient, which is querying the Science Data Centre's servers.
As we saw earlier, in principle Fido can return results from any number of providers, so the results table is separated by providers, and can be indexed by those providers.
So in our current case, all our results are accessible by
>>> result['kis']
Again as we saw earlier, we can download these files using ``Fido.fetch()``.
In this case, due to how the ``KISClient`` and the Data Centre are set up, it possible to download either the actual fits files containing the data, or json files containing the observation records.
To download the data it is necessary to specify the keyword ``binary=True``.
>>> files = Fido.fetch(result['kis'], binary=True)
Interacting with the data
^^^^^^^^^^^^^^^^^^^^^^^^^
These files can now be loaded and interacted with in a variety of ways.
The most basic would be to use ``astropy.io.fits`` to open files individually.
This would return a HDU (Header-Data Unit) for each file, containing the metadata and the data array.
However, for a large number of files this quickly becomes cumbersome.
To quickly inspect this many files, we are instead going to use ``ndcube`` to load and plot all of the data at once.
First, since the files are not necessarily ordered how we want them, we will have to sort them first.
We will inspect the headers of the files and sort them by the ``"ISTEP"`` keyword, which provides the slit position ID, and we will store the headers in the corresponding order:
>>> from astropy.io import fits
>>> sortedfiles = sorted(files, key=lambda f: fits.getheader(f)["ISTEP"])
>>> headers = [fits.getheader(f) for f in sortedfiles]
Now, in order to construct the ``ndcube`` object we need the WCS (World Coordinate System) information from the file headers, which describes how the pixel grid of the images corresponds to physical coordinates.
>>> from astropy.wcs import WCS
>>> with fits.open(files[0]) as hdul:
>>> wcs0 = WCS(hdul[0]) # extract WCS info from PrimaryHDU of first fits file
>>> wcs0
WCS Keywords
Number of WCS axes: 4
CTYPE : 'HPLN-TAN' 'HPLT-TAN' 'WAVE' 'STOKES'
CRVAL : 0.026489444444444444 -0.09147444444444444 1.5640199000000002e-06 1.0
CRPIX : 0.0 0.0 0.0 0.0
PC1_1 PC1_2 PC1_3 PC1_4 : -0.594823 -0.803857 0.0 0.0
PC2_1 PC2_2 PC2_3 PC2_4 : 0.803857 -0.594823 0.0 0.0
PC3_1 PC3_2 PC3_3 PC3_4 : 0.0 0.0 1.0 0.0
PC4_1 PC4_2 PC4_3 PC4_4 : 0.0 0.0 0.0 1.0
CDELT : -3.793055555555556e-05 -3.793055555555556e-05 3.9903e-12 1.0
NAXIS : 1 471 1010 4
So we can see from this that the data have two spatial dimensions, a wavelength axis and a stokes axis.
In principle the WCS headers should be constructed such that they all describe the same coordinate system - therefore the WCS keywords from any one of the files should enable us to construct a correct WCS object.
That being the case, we will use only the WCS from the first file.
**Important workaround:** There is currently a bug in ``astropy`` which means that one of the WCS keywords is not processed properly for this data.
This will be fixed in the future but for now it is necessary to use this workaround in order to avoid errors:
>>> wcs0.wcs.specsys = 'TOPOCENT'
Next let's inspect a single file before we combine them.
To do this we will load the data from the first file, and pass that and the WCS object to ndcube.
>>> import ndcube
>>> data = fits.getdata(sortedfiles[0])
>>> ndc = ndcube.NDCube(data, wcs0)
>>> data.shape
(4, 1010, 471, 1)
>>> ndc
<ndcube.ndcube.NDCube object at 0x7f7bb096bf10>
NDCube
------
Dimensions: [4.00e+00 1.01e+03 4.71e+02 1.00e+00] pix
Physical Types of Axes: [('phys.polarization.stokes',), ('em.wl',), ('custom:pos.helioprojective.lon', 'custom:pos.helioprojective.lat'), ('custom:pos.helioprojective.lon', 'custom:pos.helioprojective.lat')]
Unit: None
Data Type: >i4
Now we can use ``NDCube``'s built-in plotting capabilities to display the data.
>>> import matplotlib.pyplot as plt
>>> ndc.plot()
.. image:: demo-ndcube-onefile.png
This is obviously not a very interesting plot, because the file only contains a single raster position.
``NDCube`` does have tools to slice the data differently, but we'll demonstrate this later with the full dataset.
To do this we will extract the data from the fits files and use ``numpy``'s ``concatenate()`` function to stack the arrays in the desired order, contructing a single array.
>>> import numpy as np
>>> data = np.concatenate([fits.getdata(f) for f in sortedfiles], axis=-1)
>>> data.shape
(4, 1010, 471, 105)
Notice that now we have an array with the same axes, but we have stacked the raster scans from each of the 105 files on the appropriate axis, so that we have a complete cube of data.
>>> ndc = ndcube.NDCube(data, wcs0)
>>> ndc
<ndcube.ndcube.NDCube object at 0x7f7b9c175a50>
NDCube
------
Dimensions: [ 4. 1010. 471. 105.] pix
Physical Types of Axes: [('phys.polarization.stokes',), ('em.wl',), ('custom:pos.helioprojective.lon', 'custom:pos.helioprojective.lat'), ('custom:pos.helioprojective.lon', 'custom:pos.helioprojective.lat')]
Unit: None
Data Type: int32
Now if we plot the data in the same way, we get a spatial slice through the cube at a given wavelength and Stokes position, which we can move through interactively or run as an animation.
>>> ndc.plot()
.. image:: demo-ndcube-allfiles.png
By default NDCube uses the WCS to determine the spatial axes and plots those, and provides the sliders at the bottom of the plot for the remaining axes.
We may want to look at the data differently though, for example to see one raster position across the whole wavelength range.
To do this ``NDCube.plot()`` takes a ``plot_axes`` keyword which can be used to specify which dimensions are plotted and which are given sliders.
>>> import astropy.units as u
>>> ndc.plotter.plot(plot_axes=[0, 'y', 'x', 0], clip_interval=[0.1,99.9]*u.percent)
.. image:: demo-ndcube-newaxes.png
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment