O'Reilly Open Source Convention
Sheraton San Diego Hotel, San Diego, CA
July 23-27, 2001


Designing a Masked Array Facility for Python

Paul Dubois, Lawrence Livermore Laboratory

Track: Python
Date: Wednesday, July 25
Time: 3:45pm - 4:15pm
Location: Bel Aire North

MA, a masked array facility for Python, works as a nearly transparent drop-in replacement for Numerical Python and offers almost all of the same functions. It also provides additional methods and functions for handling the complexities of masked arrays. This talk discusses the design choices and features of the MA extension.

Numerical Python has proven to be a powerful and successful array-language extension to Python. Its use permits compact and efficient manipulations of large amounts of data. The Numerical Python package defines a large number of functions on arrays, such as trigonometric functions, logs and exponentials, inner and outer products, logical functions, and specialized array manipulation functions.

Unfortunately, many real-life applications work with data that, while essentially full arrays, contain elements whose values are unknown. This arises most commonly in observational data: a weather station that was out of operation for a time, an area of the Earth or of an experimental domain that must be excluded from a calculation, gaps in an otherwise uniform sequence (such as daily financial data not recorded on weekends or holidays), or missing pixels in a picture. Additionally, many numerical algorithms are easier to implement using arrays with "missing" data.

The masked array facility, MA, was designed to meet this need. A masked array is conceptually an array with missing values. MA represents such an array as a data array and a mask array. The mask array, if present, is an array of 1s and 0s with the same shape as the data array, where a 1 marks a location with invalid data. When operations are performed, the result carries an appropriate mask. For example, if we add two masked arrays, the mask of the result is the logical OR of the operands' masks. All operations avoid using the data from invalid locations.
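The data-plus-mask representation and the rule for combining masks can be illustrated with a small pure-Python sketch. This is not MA's implementation or API: the class name SimpleMasked, its valid() method, and the use of None as a placeholder at masked positions are all assumptions made for illustration, and plain lists stand in for Numeric arrays.

```python
# Illustrative sketch only: a tiny stand-in for the data/mask pair described
# above. SimpleMasked and its methods are hypothetical names, not MA's API.

class SimpleMasked:
    def __init__(self, data, mask=None):
        self.data = list(data)
        # A missing mask means every element is valid (all zeros).
        self.mask = [0] * len(self.data) if mask is None else list(mask)
        if len(self.mask) != len(self.data):
            raise ValueError("data and mask must have the same length")

    def __add__(self, other):
        # The result mask is the logical OR of the operands' masks.
        mask = [1 if (a or b) else 0 for a, b in zip(self.mask, other.mask)]
        # Data at invalid locations is never used; None marks those slots.
        data = [None if m else x + y
                for x, y, m in zip(self.data, other.data, mask)]
        return SimpleMasked(data, mask)

    def valid(self):
        # Return only the elements whose mask entry is 0.
        return [x for x, m in zip(self.data, self.mask) if not m]


# The third reading of temps_a and the second of temps_b are marked invalid.
temps_a = SimpleMasked([20.5, 21.0, 0.0, 19.5], mask=[0, 0, 1, 0])
temps_b = SimpleMasked([1.0, 0.0, 2.0, 3.0], mask=[0, 1, 0, 0])

total = temps_a + temps_b
print(total.mask)     # [0, 1, 1, 0]
print(total.valid())  # [21.5, 22.5]
```

A location is invalid in the sum if it was invalid in either operand, so only the first and last positions survive here.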

For technical reasons it is not possible to inherit from the Numeric array object. While it would be possible to write a separate extension in C, it was felt important that MA arrays be convertible to and from Numeric arrays, and that writing a compiled extension not introduce a maintenance burden. Therefore, MA is implemented entirely in Python. As such, MA is a great case study of Python's power and flexibility. The MaskedArray class illustrates nearly every feature available to a Python class, and classes such as masked_binary_operator illustrate the design of function-like class instances.
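A "function-like class instance" is an object that defines __call__ and can therefore be used wherever a function is expected. The sketch below shows the general pattern under stated assumptions; it is not the source of MA's masked_binary_operator. The class name MaskedBinaryOperation, the (data, mask) tuple representation, and the None placeholder are hypothetical choices for this example.

```python
import operator

# Illustrative sketch only: a callable class that wraps an ordinary
# two-argument function and makes it mask-aware. All names are hypothetical.

class MaskedBinaryOperation:
    def __init__(self, func, name):
        self.func = func          # the underlying element-wise function
        self.__name__ = name      # lets the instance report a function-like name

    def __call__(self, a, b):
        # Each operand is a (data, mask) pair of equal-length lists.
        (a_data, a_mask), (b_data, b_mask) = a, b
        # Combine masks with logical OR, as for any binary operation.
        mask = [1 if (m or n) else 0 for m, n in zip(a_mask, b_mask)]
        # Apply the wrapped function only where both inputs are valid.
        data = [None if m else self.func(x, y)
                for x, y, m in zip(a_data, b_data, mask)]
        return data, mask


# One class yields many function-like objects, each wrapping a different operation.
add = MaskedBinaryOperation(operator.add, "add")
multiply = MaskedBinaryOperation(operator.mul, "multiply")

x = ([1, 2, 3], [0, 0, 1])
y = ([10, 20, 30], [0, 1, 0])

print(add(x, y))       # ([11, None, None], [0, 1, 1])
print(multiply(x, y))  # ([10, None, None], [0, 1, 1])
```

The appeal of this design is that the mask-handling logic is written once in __call__, while each instance supplies only the element-wise function it wraps.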


© 2001, O'Reilly Media, Inc.