AstroAsciiData – a Python module for working with ASCII tables

[Last updated: 2011/01/14]

Introduction

AstroAsciiData is a Python module for manipulating ASCII tables. It can handle metadata such as header comments and per-column information, for example name and unit. It can save a table, including its metadata, to a text file. Data in a table column can be converted into Numpy arrays.

In addition to ASCII, it can output tables in several other formats: Numpy/Numarray data, HTML, LaTeX and FITS tables.

This module is developed by M. Kümmel and J. Haase at the Space Telescope European Coordinating Facility, and is part of the Astrolib project.

Requirements

The module can be downloaded from the AstroAsciiData website.

For working with plain ASCII data no software other than Python itself is required. For converting data into Numpy and Numarray formats, the Numpy and Numarray modules are needed, respectively. For working with FITS data, the PyFITS package is required.

An Example

>>> import asciidata as asd 
 
# Table with 2 columns and 3 rows, with "|" as delimiter and 'Null' 
# for null/missing values. 
>>> table = asd.create(2,3,null='Null',delimiter="|")
 
# Create column objects and assign them to the columns in the table. 
>>> table[0] = asd.AsciiColumn(element=[1,2,3])
>>> table[1] = asd.AsciiColumn(element=[2,4,9])
 
# Add 1 row at the end and then fill each cell of that row. 
>>> table.insert(1,start=-1)
>>> table[0][3] = 4 
>>> table[1][3] = 16 
 
# Print a string representation of the table. 
>>> print table
  1|    2 
  2|    4 
  3|    9 
  4|   16 
 
# Save table to text file. 
>>> table.writeto("example_table.txt")
 
# Create a table by reading data from file. Default delimiter is one 
# or more spaces; here we use "|". 
>>> table2 = asd.open("example_table.txt",delimiter = "|")
 
# Convert to SExtractor format; the original table itself is changed. 
>>> table2.toSExtractor()
 
# Append a comment to the header. 
>>> table2.header.append("A simple table")
 
# Change the default names; default column names are 'column1', 
# 'column2' and so on. 
>>> table2['column1'].rename("Number")
>>> table2['column2'].rename("Square")
 
# Add comments for each column; SExtractor format. 
>>> table2['Number'].set_colcomment("A number")
>>> table2['Square'].set_colcomment("Square of Number")
 
# Describe the unit of data in each column; only a description. 
>>> table2['Number'].set_unit("Integer")
>>> table2['Square'].set_unit("Integer")
>>> print table2
# 1  Number  A number  [Integer] 
# 2  Square  Square of Number  [Integer] 
#A simple table 
    1     2 
    2     4 
    3     9 
    4    16 
 
# Print the column 'Number'. 
>>> print table2['Number']
Column: Number
    1 
    2 
    3 
    4 
 
# Print the column 'Square'. 
>>> print table2['Square']
Column: Square
    2 
    4 
    9 
   16 
 
# Write to a new file. 
>>> table2.writeto("example_table_sExt.txt")
 
# Change delimiter to "|". 
>>> table2.newdelimiter("|")
>>> print table2
# 1  Number  A number  [Integer] 
# 2  Square  Square of Number  [Integer] 
#A simple table 
    1|    2 
    2|    4 
    3|    9 
    4|   16 

Contents of the example_table.txt file:

 
    1|    2
    2|    4
    3|    9
    4|   16
 

Contents of the example_table_sExt.txt file:

 
# 1  Number  A number  [Integer]
# 2  Square  Square of Number  [Integer]
#A simple table
    1     2
    2     4
    3     9
    4    16
 

Concepts

ASCII data is represented in the SExtractor format. In this format, data is stored in files as columns, with some delimiter separating the columns. Metadata is provided in the header of the file; each header line starts with a comment character, usually "#".

For each column, the metadata stored are: column number, column name, a comment and a unit; the unit is used only as a label. Following the metadata for the columns, there can be zero or more lines of normal header comments, each starting with a comment character.

The following is an example SExtractor table, taken from http://www.stecf.org/software/PYTHONtools/astroasciidata/manual/asciidata/node2.html.

 
#   1 NUMBER          Running object number
#   2 XWIN_IMAGE      Windowed position estimate along x              [pixel]
#   3 YWIN_IMAGE      Windowed position estimate along y              [pixel]
#   4 ERRY2WIN_IMAGE  Variance of windowed pos along y                [pixel**2]
#   5 AWIN_IMAGE      Windowed profile RMS along major axis           [pixel]
#   6 ERRAWIN_IMAGE   RMS windowed pos error along major axis         [pixel]
#   7 BWIN_IMAGE      Windowed profile RMS along minor axis           [pixel]
#   8 ERRBWIN_IMAGE   RMS windowed pos error along minor axis         [pixel]
#   9 MAG_AUTO        Kron-like elliptical aperture magnitude         [mag]
#  10 MAGERR_AUTO     RMS error for AUTO magnitude                    [mag]
#  11 CLASS_STAR      S/G classifier output
#  12 FLAGS           Extraction flags
  1 100.523  11.911   2.783 0.0693 2.078 0.0688  -5.3246   0.0416  0.00  19
  2 100.660   4.872   7.005 0.1261 3.742 0.0989  -6.4538   0.0214  0.00  27
  3 131.046  10.382   1.965 0.0681 1.714 0.0663  -4.6836   0.0524  0.00  17
  4 338.959   4.966  11.439 0.1704 4.337 0.1450  -7.1747   0.0173  0.00  25
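
The column lines in the header above follow a regular pattern: comment character, column number, name, comment and an optional unit in square brackets. The following is a minimal sketch of how such a line could be split up with the standard library. The pattern and the function parse_column_line are assumptions made for illustration; this is not how AstroAsciiData itself parses headers.

```python
import re

# Hypothetical pattern for one column-description line: "#", column number,
# name, free-text comment, and an optional unit in square brackets at the end.
HEADER_LINE = re.compile(r"^#\s*(\d+)\s+(\S+)\s*(.*?)\s*(?:\[([^\]]*)\])?\s*$")


def parse_column_line(line):
    """Return (number, name, comment, unit), or None if the line does not match."""
    match = HEADER_LINE.match(line)
    if match is None:
        return None
    number, name, comment, unit = match.groups()
    return int(number), name, comment, unit


lines = [
    "#   1 NUMBER          Running object number",
    "#   2 XWIN_IMAGE      Windowed position estimate along x              [pixel]",
    "#A simple table",
]
parsed = [parse_column_line(line) for line in lines]
```

Lines without a unit yield None for the unit, and ordinary header comments such as "#A simple table" do not match at all.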

Note that a table can also be treated as a plain, non-SExtractor file, i.e., one without column information in the header. In this case the column metadata will not be included when the data is written to a file.

An AstroAsciiData table is an instance of the AsciiData class. It contains several AsciiColumn instances, each of which represents a column of data. A column can be accessed using an integer index, or by name if it has one. When a table instance is used in a loop, i.e., as an iterator, the iteration takes place over the columns. A cell can be accessed by indexing its column.

In short, treat a table as a list of columns and a column as a list of cells.
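
To make the "list of columns" picture concrete, here is a small sketch that uses plain nested Python lists as a stand-in for a table. It does not use AstroAsciiData at all; the variable names are only illustrative.

```python
# Hypothetical stand-in for a table: a list of columns, each a list of cells.
# The values are the "Number" and "Square" columns from the example above.
table = [
    [1, 2, 3, 4],    # column 0
    [2, 4, 9, 16],   # column 1
]

# Indexing the table gives a column; indexing the column gives a cell.
first_column = table[0]
cell = table[1][3]          # column 1, row 3

# Iterating over the table visits columns, not rows.
column_lengths = [len(column) for column in table]

# A row has to be assembled by taking the same index from every column.
row_2 = [column[2] for column in table]
```

The consequence of this layout is that the first index is always the column and the second the row, which is the reverse of the usual row-major habit.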

A table also contains an instance of the Header class, which is used to store metadata of the table. This header contains the generic header comments mentioned above and can also be treated as a list.

Working with data

There are several ways of creating a table, adding columns and rows, and manipulating data in them. The following sections describe these methods.

Creating tables

Tables can be created either by reading data from a file or by creating an empty table with a certain number of rows and columns.

Creating table from data in a file

The function open() is used for this. The following code reads data from a file using the given delimiter, with the string 'Null' representing null (empty) values.

table = asd.open(filename="table", null='Null', delimiter="|")
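
The roles of the delimiter and the null marker can be illustrated without the module. The sketch below is an assumption-laden toy reader: read_columns is a hypothetical helper, it keeps every value as a string, and it maps the null marker to None. It is not a description of what open() does internally.

```python
def read_columns(lines, delimiter=None, null="Null", comment_char="#"):
    """Split text lines into columns of strings, mapping the null marker to None."""
    columns = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(comment_char):
            continue  # skip blank lines and header comments
        # delimiter=None makes str.split() break on runs of whitespace.
        cells = [cell.strip() for cell in stripped.split(delimiter)]
        while len(columns) < len(cells):
            columns.append([])
        for column, cell in zip(columns, cells):
            column.append(None if cell == null else cell)
    return columns


text = [
    "#A simple table",
    "    1|    2",
    "    2| Null",
    "    3|    9",
]
columns = read_columns(text, delimiter="|")
```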

Creating empty tables

There are two functions for creating empty tables: one creates a plain table and the other creates a SExtractor format table. A plain table, when written to disk, will have only simple comments in the header and no detailed information on columns, unlike a SExtractor table.

plain_table = asd.create(ncols=2,nrows=3)
SExt_table = asd.createSEx(ncols=2,nrows=3)

Optional parameters null and delimiter can be used to set the respective properties of the table.

At any time after creation, the format of a table can be changed in the following manner:

table.toplain()
table.toSExtractor()
 
# Converting to FITS table 
table.tofits()

Adding and manipulating data

Data can be added to an existing table in three ways:

  1. Create a column and assign it to an existing column in the table.
  2. Insert empty row(s) into the table and then add data into the cells.
  3. Insert data directly into existing rows.

In addition, data in a column can be converted into a Numpy array using the method tonumpy and into a Numarray array using the method tonumarray. This allows complex manipulation of data in the table. If needed, the data can be written back into the appropriate column, after performing required calculations.

To add a new column, we first append a column to the existing table using the append method. An AsciiColumn instance can then be created and assigned to that column.

table = asd.create(1,10) # 1 column, 10 rows. 
data = range(1,11) # The list [1,2,...,10]. 
# Create a column object with some initial data. 
col = asd.AsciiColumn(element=data)
# Assign it to the first column. 
table[0] = col
 
# Add a new column. 
data = range(10) # list [0,1,...,9] 
col1 = asd.AsciiColumn(element=data)
# Add a column filled with Null values. 
table.append("new_column")
# Replace the above column with the ``col1`` object. 
table[1] = col1

The AsciiColumn constructor takes the following parameters in addition to the element parameter. All these are set to None by default.

  • null : Sets the null value representation for the column.
  • colname : Sets the name of the column.
  • nrows : Sets the number of rows in the column.

Columns can be deleted using the Python del operator. The table metadata will be automatically updated.

>>> table.ncols
2 
>>> del table[0]
>>> table.ncols
1 

Empty rows can be inserted into a table using the insert method of a table object. We can specify the number of rows to be added and the index where the first row has to be inserted; existing rows are pushed downwards, unless the indicated position is at the end of the table.

# Add 4 rows at the beginning of the table. 
table.insert(4,start=0)
 
# Add 4 rows at the end of the table. 
table.insert(4,start=-1)
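
The insertion behaviour described above can be mimicked on plain Python lists. This is only a sketch: insert_rows is a hypothetical helper, not part of AstroAsciiData, and it assumes that start=-1 means "after the last row", as in the call above.

```python
def insert_rows(columns, nrows, start=0, null=None):
    """Insert nrows null cells into every column, in place.

    start is the index of the first new row; -1 is taken to mean
    "after the last row" (an assumption based on the calls above).
    """
    for column in columns:
        position = len(column) if start == -1 else start
        # Slice assignment pushes the existing cells downwards.
        column[position:position] = [null] * nrows


columns = [[1, 2, 3], [2, 4, 9]]
insert_rows(columns, 1, start=-1)   # one empty row at the end
columns[0][3] = 4
columns[1][3] = 16
insert_rows(columns, 2, start=0)    # two empty rows at the beginning
```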

The following example illustrates adding data into cells. The number of rows and columns in a table are stored in the properties nrows and ncols, respectively, of the table object.

# Add square of column number into first row. 
for i in range(table.ncols):
  table[i][0] = i**2 
 
# Add sum of row number and column number to each cell. 
for i in range(table.ncols):
  for j in range(table.nrows):
    table[i][j] = i+j

Rows can be deleted from a table using the delete method of the table object.

>>> print table
 0     1 
 1     2 
 2     3 
 3     4 
 4     5 
 5     6 
 6     7 
 7     8 
 8     9 
 9    10 
>>> table.delete(start=0)
>>> print table
 1     2 
 2     3 
 3     4 
 4     5 
 5     6 
 6     7 
 7     8 
 8     9 
 9    10 
>>> table.delete(start=0, end=3)
>>> print table
 4     5 
 5     6 
 6     7 
 7     8 
 8     9 
 9    10 
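
Judging from the output above, delete(start=0) removes a single row and delete(start=0, end=3) removes rows 0 to 2, i.e., end appears to be exclusive. Here is a minimal sketch of that behaviour on plain lists, with delete_rows as a hypothetical helper that is not part of the module:

```python
def delete_rows(columns, start, end=None):
    """Remove rows start..end-1 from every column, in place.

    With end omitted, only the row at index start is removed.
    """
    if end is None:
        end = start + 1
    for column in columns:
        del column[start:end]


# Same starting data as the table printed above.
columns = [list(range(0, 10)), list(range(1, 11))]
delete_rows(columns, start=0)
after_first = [list(column) for column in columns]
delete_rows(columns, start=0, end=3)
```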

The sort method sorts the table based on values in a given column.

>>> table.sort(colname=0,descending=1)
>>> print(table)
 9    10 
 8     9 
 7     8 
 6     7 
 5     6 
 4     5 
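
Because a table is stored column by column, sorting on one column means reordering every other column the same way. The sketch below shows one way to do that with plain lists. The function sort_by_column is hypothetical and returns new lists, whereas the sort method above changes the table itself.

```python
def sort_by_column(columns, index, descending=False):
    """Return new columns with rows reordered by the values in columns[index]."""
    nrows = len(columns[index])
    # Compute the row order once, from the key column only.
    order = sorted(range(nrows), key=lambda row: columns[index][row],
                   reverse=descending)
    # Apply the same order to every column so that rows stay aligned.
    return [[column[row] for row in order] for column in columns]


columns = [[4, 5, 6, 7, 8, 9], [5, 6, 7, 8, 9, 10]]
sorted_columns = sort_by_column(columns, 0, descending=True)
```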

Adding metadata

The metadata stored in an AsciiData table can be viewed with the method info of the table object.

A table knows the file its data is associated with (table.filename), the number of rows and columns in it (table.nrows and table.ncols, respectively), the delimiter, the comment character and how null values are represented.

All of this information can be listed with the info method, i.e., table.info().

The header information can be accessed and modified using the header member of the table.

header = table.header
 
# List the information in the header. 
print header
 
# Add a new comment line to the table header. 
header.append("Another comment line.")
 
# Update the file associated with the table. 
table.flush()

The following lists the various methods that can be used to access and modify table metadata.

  • table.info() : Returns a string with table metadata, except the header.
  • table.header : Reference to the table header. Use print table.header to list the header.
  • table.header.append() : Adds comment lines to the header.
  • table.newcomment_char() : Sets the comment character used in the table to the character supplied.
  • table.newdelimiter() : Sets the delimiter to the character supplied.
  • table.newnull() : Sets the value used to represent null values to the value supplied.

The delimiter and null value in use are stored in the members _delimiter and _null of the table object; treat these as read-only variables if direct access to the values is needed.

For each column a table stores the column name, the type of data in the column, the format of the data, the null value representation, the unit of the data and comments for the column.

The unit is only a label and is not used in any other manner. The format of the data is what is used when data is converted into a string, i.e., when the table is printed or written to a file.
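
The format strings are ordinary Python "%" format specifications. The short sketch below shows what such a format does to a value, using only built-in string formatting. The choice of "%5i" and "%8.3f" here is for illustration and is not taken from the module's defaults.

```python
# Format strings are applied with Python's "%" operator.
fmt = "%5i"
cell = fmt % 16          # right-aligned integer in a field of width 5

# Formatting every cell of a row and joining with the delimiter gives one
# line of table output.
row = [4, 16]
line = "|".join(fmt % value for value in row)

# A float format for comparison: width 8, 3 digits after the decimal point.
mag = "%8.3f" % -5.3246
```

With these formats, cell is '   16', line is '    4|   16' and mag is '  -5.325'.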

Information on each column can be listed using the info method of the column object, e.g., table[0].info().

In the following, col is a reference to a column object in the table, for example col = table[0].

  • col.info() : Lists the metadata associated with the column.
  • col.get_colcomment() and col.set_colcomment() : Get and set the comment for the column.
  • col.get_format() : Gets the format used when the data is converted to a string. For example, if the format is "%5i" then, when the table is written to a file or printed to the screen, the data in the column is displayed as an integer occupying 5 spaces.
  • col.reformat() : Changes the display format of the column.
  • col.colname and col.rename() : The first returns the name of the column and the second renames it to the value provided.
  • col.get_type() and col.set_type() : Get and set the datatype of the data in the column.
  • col.get_unit() and col.set_unit() : Get and set the unit for the data in the column.

Writing to files

AstroAsciiData can save tables in the following formats: plain text, FITS, HTML and LaTeX. Each of the methods below takes the name of the output file as its argument.

Format       Method
plain text   writeto()
FITS         writetofits()
HTML         writetohtml()
LaTeX        writetolatex()