TWODFDR – 2dFdr Python Bindings

The 2dFdr package is an automatic data reduction pipeline for multi-fibre spectroscopy data that AAO staff have developed for over 20 years (see e.g. Farrell et al. 2019). It is the primary means of reducing data obtained with multi-object spectroscopy (MOS) and integral field unit (IFU) instruments on the AAT (see e.g. 2dfdr Data Reduction Software). 2dFdr remains in high demand for reducing data from ongoing surveys (e.g. Hector) and from the AAT archive hosted by Data Central.

Why are we making a Python interface to 2dFdr?

The 2dFdr source code hosted by the AAO is predominantly written in Fortran, but there are also large components written in C and Tcl/Tk. In the past decade or so the astronomical community has embraced the Python programming language in earnest (see Figure 1 of The Astropy Collaboration et al. 2022). Given the mature ecosystem of astronomical and scientific Python modules now available, many astronomers write their data reduction pipelines and data analysis code exclusively in Python. As a consequence, relatively few astronomers are confident programmers in Fortran, C and Tcl/Tk.

It is clear that the maintenance and longevity of 2dFdr depend upon making the data reduction routines accessible to astronomers via Python. Recently, aaorun commands have added a scripting interface to the main reduction routines of 2dFdr (Farrell et al. 2019), which can in turn be called from Python code (using e.g. subprocess.run). This approach has worked to some extent, allowing survey teams to reduce large amounts of data; however, special care is required to handle instances where routines clash with each other and fail. Miszalski et al. 2022 found that adding automatic retries upon failure could address this problem and allow aaorun commands to run reliably within a web service.
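The retry-on-failure idea can be sketched with the standard library alone. The following is a minimal illustration, not the implementation used by any survey team or by Miszalski et al. 2022: the function name run_with_retries and its parameters are hypothetical, and a harmless Python one-liner stands in for a real aaorun invocation so that the sketch runs on any machine.

```python
import subprocess
import sys
import time


def run_with_retries(command, max_attempts=3, delay_seconds=0.0):
    """Run `command` until it exits with status 0 or attempts run out.

    Returns the CompletedProcess of the first successful attempt and
    raises RuntimeError if every attempt fails.
    (Hypothetical helper; not part of 2dFdr or twodfdr.)
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last = None
    for attempt in range(1, max_attempts + 1):
        # capture_output keeps stdout/stderr available for diagnostics
        last = subprocess.run(command, capture_output=True, text=True)
        if last.returncode == 0:
            return last
        # Pause before the next try, but not after the final attempt
        if attempt < max_attempts:
            time.sleep(delay_seconds)

    raise RuntimeError(
        f"command failed after {max_attempts} attempts "
        f"(last exit status {last.returncode}): {last.stderr.strip()}"
    )


if __name__ == "__main__":
    # Stand-in command; in practice this list would hold an aaorun
    # command line, whose arguments are deliberately not shown here.
    result = run_with_retries([sys.executable, "-c", "print('reduced')"])
    print(result.stdout.strip())
```

Passing the command as a list rather than a shell string avoids quoting problems with file names, and raising an exception after the final attempt lets the caller decide whether to skip the frame or stop the whole reduction.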

However, the aaorun commands are essentially a stop-gap measure and do not allow individual routines to be modified or extended in a meaningful way. Ideally, the entire 2dFdr codebase would be rewritten in Python, but this would take considerable effort. The most practical step forward for 2dFdr is to implement Python bindings directly to the 2dFdr code.

What is twodfdr?

In this project we use the Fortran to Python interface generator (F2PY), distributed as part of NumPy, to generate Python functions that directly call the individual Fortran and C functions in 2dFdr. These functions include 2dFdr file operations, algorithms and data reduction routines. We then build Python classes that provide a more user-friendly and robust interface to these low-level functions. The twodfdr API is documented using docstrings and we use pytest to thoroughly test all functions.
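The layering described above, a thin generated binding underneath and a friendlier Python class on top, can be illustrated with a small sketch. Everything named here is hypothetical: FakeBackend merely imitates the status-code style that Fortran routines commonly use, and ReductionFile is not a twodfdr class. The sketch only shows how a wrapper can turn status codes into exceptions and guarantee that a file is closed.

```python
class LowLevelError(RuntimeError):
    """Raised when a low-level routine reports a non-zero status."""


class FakeBackend:
    """Hypothetical stand-in for an F2PY-generated extension module.

    Each method reports errors through a status code (0 = success)
    rather than by raising an exception.
    """

    def __init__(self):
        self.open_handles = {}
        self._next_handle = 1

    def open_file(self, path):
        """Return (handle, status); an empty path is treated as an error."""
        if not path:
            return 0, 1
        handle = self._next_handle
        self._next_handle += 1
        self.open_handles[handle] = path
        return handle, 0

    def close_file(self, handle):
        """Return a status code; closing an unknown handle is an error."""
        if handle not in self.open_handles:
            return 1
        del self.open_handles[handle]
        return 0


class ReductionFile:
    """Context manager that converts status codes into exceptions."""

    def __init__(self, path, backend):
        self.path = path
        self._backend = backend
        self._handle = None

    @property
    def is_open(self):
        return self._handle is not None

    def open(self):
        handle, status = self._backend.open_file(self.path)
        if status != 0:
            raise LowLevelError(
                f"open failed for {self.path!r} (status {status})"
            )
        self._handle = handle
        return self

    def close(self):
        if self._handle is None:
            return  # already closed; nothing to do
        status = self._backend.close_file(self._handle)
        self._handle = None
        if status != 0:
            raise LowLevelError(f"close failed (status {status})")

    def __enter__(self):
        return self.open()

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not suppress exceptions from the with-block


if __name__ == "__main__":
    backend = FakeBackend()
    with ReductionFile("example.fits", backend) as f:
        print("open inside block:", f.is_open)
    print("open after block:", f.is_open)
```

Because the backend is passed in rather than imported, the wrapper can be exercised in unit tests with a stand-in like the one above, without compiling any Fortran.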

In addition, we are developing PyCPL recipes, written entirely in Python, that serve as a robust and regularised framework for users to execute 2dFdr data reduction routines in the same style as ESO data reduction pipelines. Initially these recipes call the aaorun commands, retrying on failure, but over time we aim to replace the aaorun commands with calls to the twodfdr Python classes. This will allow for a fully modular environment in which to maintain and extend the 2dFdr data reduction routines.

Current twodfdr components

Component       Description                                                                          Test Coverage
--------------  -----------------------------------------------------------------------------------  -------------
Tdfio           Python interface to the tdfio module (2dFdr file i/o routines)                       100%
TdfArgs         Python interface to SDS ARGS (2dFdr function arguments)                              100%
Tdfdr           Python interface to data reduction routines in the drexec module (work in progress)  TBD
TdfCore         Python interface to everything else in the drexec module (work in progress)          TBD
AAORun recipes  PyCPL recipes that call aaorun                                                       TBD
Tdf recipes     PyCPL recipes that call twodfdr Python routines (work in progress)                   TBD

The Test Coverage column in the table above gives the percentage of functions that have unit tests implemented using pytest. These tests ensure that the functions work as intended, which is not automatically guaranteed in projects like twodfdr that combine two or more programming languages.

Prior to the twodfdr project, tdfio had no tests at all. While our main aim in this project is to provide access to 2dFdr routines, we have an equally strong commitment to demonstrating via test-driven development that the codebase is robust and can be trusted for research purposes.

What is PyCPL? What is ESO software doing in an AAO data reduction pipeline?

The Python Language Bindings for CPL (PyCPL) package is a core dependency of twodfdr. It provides Python bindings to ESO’s Common Pipeline Library (CPL) and was developed by AAO staff in close collaboration with members of ESO’s pipeline development group. Readers may naturally ask: why are we adding ESO software dependencies to an already well-featured AAO data reduction pipeline? The main answer is that PyCPL is more than just bindings to CPL. It provides the capabilities to:

  • Define data reduction recipes entirely in Python, and

  • Execute those recipes using Pyesorex.

For those familiar with ESO’s data reduction pipelines, the Pyesorex program is a Python replacement for the esorex program, which is responsible for executing ESO pipeline recipes. The Pyesorex program can run both traditional ESO pipelines (written in C) and new pipelines written in Python.

We refer users to the PyCPL Getting Started documentation for more information about its capabilities.

What are the benefits of PyCPL for twodfdr?

The PyCPL recipes that Pyesorex can execute are developed in a regularised framework, providing a clean and stable interface to data reduction recipes. Each recipe is free to execute any Python code it needs. In the first instance we have developed PyCPL recipes that call the aaorun commands repeatedly, up to a user-specified maximum number of attempts, until the command completes successfully. Later, we will call 2dFdr routines directly from the recipes to replace the usage of aaorun.

The twodfdr module also benefits from the cpl.ui.ParameterList data structure of PyCPL to manage:

  • Parameters passed to 2dFdr functions

  • Parameters defining instrument profiles (replacing the old 2dFdr .idx files)

Both of these tasks can be unwieldy for larger pipelines, and twodfdr benefits substantially from using PyCPL.

In the near future, these PyCPL recipes for 2dFdr will also fit into the Pythonic AAO Reduction Environment (PARE), which allows users to chain together PyCPL recipes to automatically reduce their data. PARE handles all the data organisation, input and output file handling, recipe parameters and calibration files necessary for running complex pipelines. More information will be provided in due course.

Release Notes

Development on twodfdr is ongoing. A more formal release will be made at a later date.

Downloads

The latest twodfdr source code is available here:

User Guide

For detailed instructions please see the TWODFDR User Guide.

API Reference

For complete documentation of the twodfdr interfaces please refer to the API Reference.

Installation and Developer Guide

Contributions to twodfdr are welcome. We strongly encourage contributors to write new 2dFdr reduction routines or algorithms, and improvements to existing ones, in Python only. The future maintenance of 2dFdr depends on a growing Python codebase that steadily replaces the legacy Fortran routines over time.

For detailed information on how to install and contribute to twodfdr please refer to the TWODFDR Developer Guide.

Getting Support

Bug reports or other constructive feedback may be submitted via the Data Central Help Desk and should contain detailed information.

Frequently Asked Questions

  • Who is developing twodfdr?
  • Who is the scientific team behind twodfdr?
    • While the RDS group are leading the technical aspects of the project, the scientific aspects are led by Prof. Scott Croom (USyd), Prof. Chris Lidman (ANU) and Dr. Madusha Gunawardhana (USyd).