NetEpi Collection and NetEpi Analysis: Open-Source Solutions to Some Pressing Data Management and Data Analysis Problems in Public Health Practice
Public health is an area of endeavour concerned with protecting and improving the health of the entire population or of sub-groups within it, rather than with the care of individual patients. The global SARS epidemic of 2003 reminded everyone that communicable diseases remain a threat, and there is now real concern over the threat posed by avian influenza and by future human pandemic influenza viruses.

The first part of this presentation will describe NetEpi Collection, a free, open-source, Web-based disease outbreak data management tool written in Python, whose development began in 2003 during the SARS epidemic, and will discuss its strengths, limitations and future needs. In particular, it will explore the need for multi-master distributed database features that are robust in the face of slow and unreliable networks and frequent network partitions, as well as the problem of semantic encoding and metadata management during rapidly evolving outbreaks and epidemics. The distributed task management functions, involving thousands of health care workers, that are needed to deal effectively with an influenza pandemic will also be described.

The second part of the presentation will describe NetEpi Analysis, a tool for interactive exploratory data analysis of large population health data sets ("large" meaning in the range of 10 to 100 million records). It is written primarily in Python, NumPy and R, and is also available under an open-source licence. The simple but somewhat novel data reduction and summarisation approach it uses, involving an object-oriented implementation of fast set operations on sorted inverse ordinal mappings of vertically-disaggregated dataset columns, will be described, and its strengths and weaknesses will be explored. Future directions will be discussed, including the use of parallel computing, to which our approach lends itself. Both tools will be briefly demonstrated.
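As a rough illustration of the NetEpi Analysis data reduction idea, here is a minimal Python/NumPy sketch of one plausible reading of "sorted inverse ordinal mappings of vertically-disaggregated dataset columns". Each column is held as its own array. For each distinct value in a column, we keep the sorted array of row ordinals (record positions) that hold that value. Filters and cross-tabulations then become set operations on those arrays. The names `OrdinalSet`, `build_inverse_index` and `crosstab` are hypothetical and are not NetEpi Analysis's actual API. Using NumPy's `intersect1d`, `union1d` and `setdiff1d` for the fast set operations is likewise an assumption.

```python
import numpy as np


class OrdinalSet:
    """Hypothetical wrapper around a sorted, duplicate-free array of row ordinals."""

    def __init__(self, ordinals, _trusted=False):
        arr = np.asarray(ordinals, dtype=np.int64)
        # np.unique sorts and de-duplicates; skip it when the caller
        # already guarantees sorted, unique input.
        self.ordinals = arr if _trusted else np.unique(arr)

    def __and__(self, other):
        # Intersection: rows meeting both conditions (logical AND).
        return OrdinalSet(
            np.intersect1d(self.ordinals, other.ordinals, assume_unique=True),
            _trusted=True,
        )

    def __or__(self, other):
        # Union: rows meeting either condition (logical OR).
        return OrdinalSet(np.union1d(self.ordinals, other.ordinals), _trusted=True)

    def __sub__(self, other):
        # Difference: rows in self but not in other (AND NOT).
        return OrdinalSet(
            np.setdiff1d(self.ordinals, other.ordinals, assume_unique=True),
            _trusted=True,
        )

    def __len__(self):
        return len(self.ordinals)


def build_inverse_index(column):
    """Map each distinct value of one column to the ordinals of rows holding it."""
    column = np.asarray(column)
    # A stable sort keeps row ordinals ascending within each run of equal values.
    order = np.argsort(column, kind="stable")
    values, starts = np.unique(column[order], return_index=True)
    ends = np.append(starts[1:], len(column))
    return {
        v.item(): OrdinalSet(order[s:e], _trusted=True)
        for v, s, e in zip(values, starts, ends)
    }


def crosstab(index_a, index_b):
    """Count rows in every (a, b) cell by intersecting ordinal sets."""
    return {
        (a, b): len(set_a & set_b)
        for a, set_a in index_a.items()
        for b, set_b in index_b.items()
    }


# Toy data: each column is stored as its own array (vertical disaggregation).
sex = ["F", "M", "F", "F", "M", "F"]
age_group = [1, 2, 2, 1, 1, 2]

sex_idx = build_inverse_index(sex)
age_idx = build_inverse_index(age_group)

print((sex_idx["F"] & age_idx[2]).ordinals.tolist())  # [2, 5]
print(crosstab(sex_idx, age_idx))
```

Each column's index is built and queried independently. The intersections for different table cells, or for different blocks of rows, could therefore in principle be computed in parallel. This is one way to read the remark that the approach lends itself to parallel computing. It is an interpretation, not a description of the actual implementation.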
Keywords: public health, disease outbreaks, web applications, data analysis, Python, PostgreSQL, R
Dr Tim Churches
Medical epidemiologist, Population Health Division, New South Wales Department of Health
Dr James Farrow
Farrow-Norris Pty Ltd and School of IT, University of Sydney
Ref: OS7P0061