Towards a Global Lake Observatory Research and Education Network

10 January 2005

 

1. Scientific Background:

 

Water has played a critical role in society for thousands of years and will increasingly constrain human economic and social development. Aquatic systems can play important roles as conduits of inorganic carbon from terrestrial systems to the atmosphere and as mineralization sites of organic carbon (Cole et al. 1994). The carbon cycle in lakes is key to understanding land-water interactions, the trophic state of lakes, and lake clarity. Lake metabolism, described as the balance between the complementary processes of gross primary production (GPP) and respiration (R) (Hanson et al. 2003), is a fundamental lake characteristic that helps describe the source of carbon incorporated into all trophic levels of the ecosystem[1].

 

Lake metabolism can be measured using high-frequency (~0.002 Hz, roughly one reading every eight minutes) observations of dissolved oxygen or carbon dioxide concentrations in the surface waters of lakes (Cole et al.; Hanson et al.).  These measurements are made using sensors deployed on a lake, with data sent wirelessly to a base station and incorporated into an easily accessible database.  Metabolism can then be calculated from the diel (24-hour) cycle in these data. In addition, measurements such as wind speed, wind direction, water temperature profiles, and three-dimensional water circulation patterns (measured by an acoustic Doppler current profiler) are desirable to refine the GPP and R calculations.  While these data are sufficient to calculate metabolism variables, additional information is needed to interpret the values.  Ancillary information on weather (precipitation, photosynthetically active radiation), changes in bacterial, algal and zooplankton abundance, and nutrient status is often necessary to understand patterns of lake metabolism.  These data are typically not collected on the instrumented buoy, but many are available after the fact in the NTL LTER or other databases.
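
As a rough illustration of how diel data yield metabolism estimates, the sketch below applies the common free-water oxygen bookkeeping: the change in dissolved oxygen over an interval, corrected for exchange with the atmosphere, is attributed to biology; night-time intervals (when GPP is zero) give R, and daytime intervals give GPP once respiration is added back. This is a minimal sketch under stated assumptions, not the procedure used at NTL or YYL. The names Sample and diel_metabolism, the gas transfer velocity k, the mixed-layer depth z_mix, and the assumption that daytime respiration equals the night-time rate are all illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sample:
    hour: float     # time of the reading, in hours
    do: float       # dissolved oxygen, mg/L
    do_sat: float   # oxygen concentration at saturation, mg/L
    is_day: bool    # True if the interval starting at this reading is in daylight


def diel_metabolism(samples: List[Sample], k: float, z_mix: float) -> Tuple[float, float, float]:
    """Return (GPP, R, NEP) in mg O2/L/day from about one day of readings.

    k is an assumed gas transfer velocity (m/h) and z_mix an assumed
    mixed-layer depth (m); both are hypothetical inputs for this sketch.
    """
    day_change = night_change = 0.0   # biological O2 change, mg/L
    day_hours = night_hours = 0.0
    for a, b in zip(samples, samples[1:]):
        dt = b.hour - a.hour
        # Oxygen gained from the atmosphere during the interval (negative if lost).
        exchange = k * (a.do_sat - a.do) / z_mix * dt
        biological = (b.do - a.do) - exchange
        if a.is_day:
            day_change += biological
            day_hours += dt
        else:
            night_change += biological
            night_hours += dt
    if night_hours == 0:
        raise ValueError("need night-time readings to estimate respiration")
    r_hourly = -night_change / night_hours    # GPP is zero in the dark
    r = r_hourly * 24.0                       # assume the same rate all day
    gpp = day_change + r_hourly * day_hours   # add back daytime respiration
    return gpp, r, gpp - r
```

A positive NEP (the third value returned) means GPP exceeds R, which corresponds to the autotrophic case described in footnote [1]; a negative NEP corresponds to the heterotrophic case.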

 

To understand the role of lakes in the global carbon balance, a system of sensor-equipped buoys has been deployed on lakes in Wisconsin and Taiwan (Taiwan’s sensor buoy became operational in April 2004). The two sites are linked by a prototype data federation with a cross-query interface for sensor-produced data (lakemetabolism.org and ecogrid.nchc.org.tw).

 

This first step towards a global network has already improved our understanding of lake dynamics, especially the role of episodic events that influence the physics, chemistry and biology of the lake, by using moored meteorological instruments, thermistor chains and oxygen sensors. The additional data enabled calculation of gross primary production, respiration, and net ecosystem production.  These sensors also recorded the response of Yuan Yang Lake in Taiwan to several typhoons and were helpful in interpreting the apparent resetting of the microbial community composition due to mixing and increased runoff.  Furthermore, these sensors sample at a higher spatial and temporal frequency than is possible manually and provide data during important events when access is limited.

 

Because lake metabolism is a property of fundamental interest to limnologists, resource managers and the public, and because it can be estimated using field-deployed sensors such as those at NTL or YYL, lake metabolism is an excellent choice as a science driver of biological database and informatics research.

 

We envision a global network of thousands of lake metabolism buoys, deployed strategically around the globe in disparate lakes, to understand at local, regional, continental and global scales such issues as the direction and rate of change of lake metabolism; the factors controlling daily, seasonal, and among-year variability of lake metabolism; and the reciprocal interactions between human use of lakes and lake metabolism.  A global network of automated lake observatories, each collecting and transferring data in near real time, is within our grasp in the next decade.  But this vision, as well as those of other large-scale projects, cannot come to fruition without real and meaningful partnerships between ecological scientists, developers of middleware, and information management specialists to solve key information technology and database issues.

 

In the next section we describe the activities that have led to our current accomplishments and lay out some design principles for a global network of lakes.

 

2. Architectural Considerations and Conceptual Design:

 

To realize our vision of a global lake observatory network, much more work needs to be done.  The pieces of the infrastructure that will be developed and implemented with current or pending support (see Appendix) will put in place some critical components and will provide stepping stones to the next level of infrastructure. But more technology needs to be developed and deployed.  In particular, we envision a system that will allow a researcher to obtain not just the sensor data, but also the ancillary data needed to interpret lake metabolism calculations. We envision a system that will allow a researcher to launch a calculation (e.g., lake metabolism, lake evaporation) and have the system return the result. We envision a system in which a researcher, with permission, will be able to adjust any of the sensors in the network to capture data from unique events, perhaps automatically, as the event unfolds.

 

At this stage of development, we feel the next critical deployment issue is to redesign our working prototype, which currently links two lakes, so that it can be extended to other lakes (see section 3 for a list of individuals invited to the workshop and their lakes).

 

The philosophy we have followed of making data available to the scientists as quickly as possible and in a sustained way has allowed us to see limitations of our system while demonstrating the value of our approach to gain new scientific insights.

 

We will continue to follow that overarching philosophy in this project.

 

2.a. Architectural Design of Global Lake Observatory Network

 

To prepare for this next stage of expansion several key design and implementation strategies are proposed:

·        Each lake or lake system is a separate (autonomous) administrative domain, which is responsible for the deployment and maintenance of equipment (from sensor to database) and curation of data (including providing access to data and creation of metadata).

·        For lakes participating in the network, a core set of sensor data is openly available as soon as it moves from the sensor/data logger to the database. We assume that the data will “flow” in near real time, automatically from sensor to database.

·        The system must be easily extensible to new lakes. Our approach will be to design an interface that allows each lake to register its data; the system will then allow others to see those data. We are using a service-oriented architecture (i.e., one based on web services) to implement such a prototype.

·        We will standardize core data (e.g., terminology, language) as far as possible, using ontologies (or a lookup table) for other data.

·        The network design must automatically detect and reflect changes within a lake administrative domain that are manifest in the lake database, e.g., changes in sensor location or in the number of sensors at a site. Note: within an administrative domain, the responsibility to update the database will rest with the lake administrator (the automated updating of the lake database given changes in sensors or sensor locations is part of the NSF BDI proposal; see Appendix). For the global design linking many lakes, we are planning that once a lake is registered, the system will automatically check all databases and update the interface, reflecting the data that are available.
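
The registration and automatic-update strategies above can be sketched as a small in-memory registry. This is only an illustration of the idea, not the design of the prototype: LakeDomain, NetworkRegistry, list_variables, and the variable names used with them are hypothetical, and a real system would call each domain's web service rather than a local function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LakeDomain:
    """One autonomous administrative domain (hypothetical structure)."""
    name: str
    # Returns the variables the lake's own database currently exposes.
    list_variables: Callable[[], List[str]]


@dataclass
class NetworkRegistry:
    domains: Dict[str, LakeDomain] = field(default_factory=dict)
    catalog: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, domain: LakeDomain) -> None:
        """Add a lake to the network and rebuild the shared catalog."""
        self.domains[domain.name] = domain
        self.refresh()

    def refresh(self) -> None:
        """Re-read every registered domain so the shared view tracks local changes."""
        self.catalog = {
            name: sorted(domain.list_variables())
            for name, domain in self.domains.items()
        }

    def lakes_with(self, variable: str) -> List[str]:
        """Names of registered lakes that currently offer the given variable."""
        return sorted(name for name, vs in self.catalog.items() if variable in vs)
```

In this sketch each lake keeps control of its own database; the network only stores what each domain reports, and a periodic call to refresh() picks up sensors that have been added or removed locally.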

 

Lake administrative domains: We are currently working with two lake administrative domains: North Temperate Lakes in Wisconsin and Yuan Yang Lake in Taiwan (Phase 0 of our network).  Each system is responsible for providing the infrastructure to move key sensor data (lake temperature at various depths, dissolved oxygen at several depths, wind speed and direction, ambient air temperature, barometric pressure, precipitation, lake elevation) to an Internet-accessible database with a known schema. These data come from different locations around the lakes.
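
To make "a known schema" concrete, the following is a minimal sketch of one possible layout, with a table describing sensors and a table of time-stamped readings. It is an assumption for illustration only and is not the schema used at NTL or YYL; SQLite stands in for the production database, and the table, column, and function names are hypothetical.

```python
import sqlite3

# Hypothetical two-table layout: what each sensor measures, and its readings.
SCHEMA = """
CREATE TABLE sensor (
    sensor_id INTEGER PRIMARY KEY,
    lake      TEXT NOT NULL,
    variable  TEXT NOT NULL,
    units     TEXT NOT NULL,
    depth_m   REAL
);
CREATE TABLE observation (
    sensor_id   INTEGER NOT NULL REFERENCES sensor(sensor_id),
    observed_at TEXT NOT NULL,   -- ISO 8601 timestamp, UTC
    value       REAL NOT NULL,
    PRIMARY KEY (sensor_id, observed_at)
);
"""


def open_lake_db(path=":memory:"):
    """Create an empty lake database with the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_sensor(conn, lake, variable, units, depth_m=None):
    """Describe a newly deployed sensor and return its id."""
    cur = conn.execute(
        "INSERT INTO sensor (lake, variable, units, depth_m) VALUES (?, ?, ?, ?)",
        (lake, variable, units, depth_m),
    )
    return cur.lastrowid


def record(conn, sensor_id, observed_at, value):
    """Store one reading from a sensor."""
    conn.execute(
        "INSERT INTO observation (sensor_id, observed_at, value) VALUES (?, ?, ?)",
        (sensor_id, observed_at, value),
    )


def series(conn, lake, variable):
    """Return (depth_m, observed_at, value) rows for one variable at one lake."""
    return conn.execute(
        "SELECT s.depth_m, o.observed_at, o.value "
        "FROM observation o JOIN sensor s USING (sensor_id) "
        "WHERE s.lake = ? AND s.variable = ? "
        "ORDER BY o.observed_at, s.depth_m",
        (lake, variable),
    ).fetchall()
```

Keeping sensor descriptions in their own table means that adding or moving a sensor is a change to data rather than to the table structure, which is one way a local design could keep the database in step with the instruments.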

 

This is the simplest version of a lake infrastructure. More complicated versions would include one or more of the following:

·        Different sensor data (e.g. acoustic Doppler current profiles) (to be tested with GBMF funding)

·        Sensor data from more than one lake in a lake system (as is being implemented with funds from GBMF)

·        Ancillary data from lake samples, to give information on dissolved organic carbon, dissolved inorganic carbon, total phosphorus, total nitrogen, chlorophyll concentration, bacteria counts and types, phytoplankton, and major ions. Many of these data are available online from NTL, but not yet from YYL.

·        Sensors that are added, moved, or removed at the will of the researchers. Any local design must therefore translate these changes to the database.

·        See Figure XX for a schematic of what a complex but single administrative domain would address.

 

Global (Networked) Architecture: A global (networked) architecture would provide access to data from several lake administrative domains. A prototype, supported by PRAGMA, EcoGrid, and the YYL projects, currently links two lakes through a single interface. This prototype uses JDBC connections directly into the databases at both NTL and YYL (Phase 0 sites). It has proven a very useful demonstration: scientists have already seen the impact of extreme weather (typhoons) on YYL and the lake's response. This work will result in a paper in 2005.

 

However, that system cannot be extended to new lakes without additional manual labor. Furthermore, we expect that not all databases will offer JDBC connections. We are redesigning this system, with both PRAGMA and GBMF funding, to use web services to access data. However, we will limit the prototype to sensor data. At the March 2005 meeting, supported by GBMF and other NSF funds, we will ask the community to review the new prototype, establish a core set of sensor data to be collected by all lakes and a set of ancillary data that could be collected, and discuss ways to extend the system to more lakes. The version of the system with just sensor data is illustrated in Figure YY.
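
One way to picture the move away from direct database connections is a common access interface that every lake implements in whatever way suits it, so that the cross-lake query code never needs to know how a given lake stores its data. The sketch below is an assumption about how such an interface might look, not the design of the prototype: LakeSource, InMemorySource, and cross_lake_query are hypothetical names, and the in-memory class stands in for a real backend such as a web service.

```python
from typing import Dict, List, Protocol, Tuple

Reading = Tuple[str, float]   # (ISO 8601 UTC timestamp, value)


class LakeSource(Protocol):
    """What the network needs from a lake, regardless of its backend."""

    def fetch(self, variable: str, start: str, end: str) -> List[Reading]:
        ...


class InMemorySource:
    """Stand-in backend; a real one might wrap a web service or a database."""

    def __init__(self, data: Dict[str, List[Reading]]):
        self._data = data

    def fetch(self, variable: str, start: str, end: str) -> List[Reading]:
        # Timestamps in one fixed ISO 8601 format compare correctly as strings.
        return [(t, v) for t, v in self._data.get(variable, []) if start <= t <= end]


def cross_lake_query(
    sources: Dict[str, LakeSource], variable: str, start: str, end: str
) -> Dict[str, List[Reading]]:
    """Ask every registered lake for the same variable and time window."""
    return {lake: src.fetch(variable, start, end) for lake, src in sources.items()}
```

Under this arrangement, adding a lake means supplying one more object with a fetch method; the query code itself does not change.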

 

In short,

·        Phase 0: We established a prototype, using JDBC connections to allow query of two lakes.

·        Phase 1: We are redesigning the interface for two or more lakes to provide registration of lakes into the larger system and to allow access to the data in each database via web services. We focus exclusively on data from sensors. We will have a prototype of this system by March 2005.

·        Phase 2: We want to move the Phase 1 prototype into use and extend it to three other lakes by the end of 2005. How to do this will be a key discussion at the March meeting. It will entail understanding not only the physical infrastructure at various lakes, but also what data are currently being collected, what terms are used to refer to the data, what units and other metadata are used, and how data can be accessed.
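
The Phase 2 questions about terms and units can be illustrated with a simple lookup table that maps each lake's local variable name to a shared core name and converts values to core units. This is a minimal sketch under assumed names: the lake identifiers, local variable names, and core names below are invented for illustration and do not describe any participating lake.

```python
from typing import Callable, Dict, Tuple

# (lake, local variable name) -> (core variable name, converter to core units)
# All entries are hypothetical examples.
LOOKUP: Dict[Tuple[str, str], Tuple[str, Callable[[float], float]]] = {
    ("demo_lake_a", "wtemp_f"): ("water_temperature_degC", lambda f: (f - 32.0) * 5.0 / 9.0),
    ("demo_lake_a", "wind_kmh"): ("wind_speed_m_per_s", lambda v: v / 3.6),
    ("demo_lake_b", "WaterTemp"): ("water_temperature_degC", lambda c: c),
}


def to_core(lake: str, local_name: str, value: float) -> Tuple[str, float]:
    """Translate one local reading into the shared vocabulary and units.

    Raises KeyError if the lake has not registered this variable.
    """
    core_name, convert = LOOKUP[(lake, local_name)]
    return core_name, convert(value)
```

For example, to_core("demo_lake_a", "wtemp_f", 50.0) gives the core name water_temperature_degC with a value of 10.0. An ontology could replace the table later without changing how callers use to_core.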

 

There are many other issues we need to deal with at the global, networked level:

·        Extending the system to other types of sensor data, such as Acoustic Doppler Current Profile data, or visual data

·        Extending the system beyond sensor data to other ancillary data (e.g., data that come from water samples analyzed in the lab, such as the amount of total phosphorus)

·        Instituting security features to protect the data from malicious attack.

·        Integrating data with computational or presentational tools

·        Developing cross-site query tools.

·        Linking data to other data, e.g. remote sensing data.

 

 


3. Lakes Participating in the Meeting

 

Lake             Project or Institution                 Location                Contact
NTL (Trout Bog)  NTL – LTER U Wisconsin                 Wisconsin USA           Tim Kratz, Paul Hanson
YYL              TFRI, AS, NCHC                         Hsinchu County, Taiwan  Hen-biau King, Fang-Pang Lin
Several          Lammi Bio. Station                     Finland                 Lauri Arvola, Marko Jarvinen
Lake Rotoiti     CBER, U Waikato                        New Zealand             David Hamilton, Eloise Ryan
Lake Soyang      KNU                                    Korea                   Bomchul Kim
(not listed)     (not listed)                           Japan                   Toshio Iwakuma
(not listed)     (not listed)                           Israel                  Ami Nishri
(not listed)     Dorset Research Center                 Canada                  Andrew Patterson
(not listed)     Nanjing Inst of Geography & Limnology  China                   Boqiang Qin
(not listed)     (not listed)                           Australia               Jason Antenucci
(not listed)     (not listed)                           UK                      Glen George


Appendix: Projects contributing to a global lake observatory system:

 

The core of our joint progress to date has leveraged efforts in the North Temperate Lakes LTER, the International LTER, and PRAGMA. From August 2003 through April 2004, we took the first step along the path from concept to implementation of a global network of lakes measuring and sharing lake metabolism data, by creating a prototype network of lakes in Wisconsin and Taiwan.  The recent Gordon and Betty Moore Foundation award and a pending BDI proposal build on this concrete success and on the strong foundation of ongoing interdisciplinary research described in this section.

 

NTL: Comparative Study of a Suite of Lakes in Wisconsin -- North Temperate Lakes Long-Term Ecological Research (NTL-LTER), S.R. Carpenter, T.K. Kratz, B.J. Benson, and 16 co-PIs. DEB 0217533, 10/15/02 – 10/15/08, $6.72M, http://lter.limnology.wisc.edu.  The goals of the NTL-LTER program, established in 1981, are to detect long-term change in lakes and surrounding landscapes; understand physical, chemical, and biological linkages at lake, landscape, and regional scales; and understand feedbacks between lake and human processes. Seven lakes in northern Wisconsin and four lakes in southern Wisconsin serve as foci for this work; three lakes are instrumented with buoys capable of measuring lake metabolism. Since 2000, the project has produced more than 190 peer-reviewed publications, 15 graduate student theses, and dozens of core datasets of physical, chemical and biological limnology, as well as data characterizing human activities. The NTL research drives the scientific questions, and the corresponding practical experience with deploying sensor networks motivates our proposed activities.

 

YYL: Yuan Yang Lake. T. Kratz, http://lakemetabolism.org. A recent supplemental award has allowed the NTL team to partner with the PRAGMA program as well as with colleagues at the Taiwan Forestry Research Institute, Academia Sinica, and the Taiwan National Center for High-Performance Computing (NCHC) to deploy a lake metabolism buoy in Yuan Yang Lake (YYL), Taiwan.  Data from the buoys in Taiwan and Wisconsin move to an Oracle database and are web-accessible in near real time [NN04]. This prototype forms the cornerstone for the present project and a testbed for tools we develop.

 

PRAGMA: Pacific Rim Application and Grid Middleware Assembly, P. Arzberger, P. Papadopoulos, INT-0314015, 7/15/03 – 6/30/06, $1,201,299, http://www.pragma-grid.net. PRAGMA, an open, institution-based organization consisting of 21 institutional members, with a steering committee (which includes Donald F. McMullen and Fang-Pang Lin), has built sustained collaborations and advanced the use of the grid by applications. Publications documenting success include telescience [Lee03]; quantum chemistry [Sudholt04]; and assisting colleagues in Taiwan combating SARS [PARS04]. Finally, PRAGMA helped connect NTL and Taiwan researchers. PRAGMA will assist in expanding this project and will provide a platform for testing software developed by the BDI project.

 

PRIME: Pacific Rim Undergraduate Experiences, G. Wienhausen, L. Feldman, P. Arzberger, INT-0407508, 4/1/04 – 3/31/07, $156,039, http://prime.ucsd.edu. PRIME supports 9-week international research internships for undergraduate students at select PRAGMA sites. Students have co-mentors, one at UCSD and one at the host site, and either develop or use cyberinfrastructure to advance scientific applications. In 2004, the initial host sites include the National Center for High-performance Computing (NCHC), Taiwan; Cybermedia Center (CMC), Osaka University, Japan; and Monash University, Australia. PRIME students have contributed infrastructure for the global lake observatory system and will be encouraged to work in this area.

 

The Gordon and Betty Moore Foundation (GBMF): Toward a Distributed Information System for Marine Biology and Limnology. P. Arzberger, A. Gupta, K. Stocks, T. Kratz. $1,762,421, 15 Oct 2004 – 14 Oct 2006. The Biomedical Informatics Research Network (BIRN), a National Institutes of Health (NIH) initiative, has constructed an infrastructure that allows researchers nationwide to share and analyze biomedical data, with mediation of the data as a key technology. The GBMF award will extend that infrastructure, designed for the biomedical community, to handle queries on spatial and temporal data. The test cases for the oceans include OBIS and overfishing of seamounts. The funding for the lakes component will provide new equipment on an entire lake system, thus allowing first-of-its-kind analysis of an entire lake system, and tools to make those sensor data accessible.  In addition, the GBMF funding will initiate a process, via a workshop, to organize the international community to build a global network of lake sensors as well as linkages between coral reef sites. The GBMF project will extend a tool that will be useful to the global environmental community, establish a prototype lake system observatory, and initiate a global lake observatory network.

 

Biological Databases and Informatics (BDI): Automating Scaling and Data Processing in a Network of Sensors: Towards a Global Network for Lake Metabolism Research and Education. P. Arzberger, F. Vernon, R. McMullen, T. Kratz. DBI 0446802, Pending. This pending proposal to the NSF’s Biological Databases and Informatics (BDI) program addresses two issues critical to scaling the current system to a network of hundreds to thousands of lake metabolism sensors: 1. automating the reconfiguration of databases as sensors are dynamically deployed in a network; 2. automating the quality assurance and event detection algorithms that determine the validity of the sensor signals or the occurrence of a biological or physical event.  The technologies employed range from intelligent agents in a grid-services framework to machine learning for event detection.  If funded, the BDI proposal would provide tools to automate, and thereby scale up, these two key activities within a lake system.
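
To give a sense of what automated quality assurance and event detection must distinguish, the sketch below labels each reading in a series as missing, out of range, a sudden jump, or acceptable. It is a deliberately simple rule-based stand-in, not the intelligent-agent or machine-learning approach named in the proposal; the function name qa_flags and the thresholds passed to it are hypothetical.

```python
from typing import List, Optional


def qa_flags(
    values: List[Optional[float]], lo: float, hi: float, max_step: float
) -> List[str]:
    """Label each reading as 'missing', 'out_of_range', 'jump', or 'ok'.

    lo and hi bound physically plausible values; max_step is the largest
    change between consecutive plausible readings that is treated as routine.
    """
    flags = []
    prev = None   # last reading that passed the range check
    for v in values:
        if v is None:
            flags.append("missing")
        elif not (lo <= v <= hi):
            flags.append("out_of_range")   # likely a sensor or logger fault
        elif prev is not None and abs(v - prev) > max_step:
            flags.append("jump")           # a fault, or a real event such as mixing
            prev = v
        else:
            flags.append("ok")
            prev = v
    return flags
```

A "jump" flag alone cannot say whether the sensor failed or the lake changed; separating the two is where additional sensors, ancillary data, and the learning methods mentioned above would come in.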

 

EcoGrid (ecogrid.nchc.org.tw), Taiwan, led by the National Center for High-performance Computing (NCHC), a founding PRAGMA partner, is a collaboration with the Taiwan Forestry Research Institute (TFRI) and the Taiwan Ecological Research Network (TERN), which is part of the International Long Term Ecological Research network.  EcoGrid has deployed infrastructure at several of the TERN and TFRI sites (including Fushan, Yuan Yang Lake, and Kenting), developed technology for storing and sorting data, and provided the database expertise needed to supply the YYL data to the prototype.

 



[1] If GPP is greater than R, then the lake is autotrophic and internally produces reduced carbon sufficient to fuel higher trophic levels.  Alternatively, if GPP is less than R, then the lake is heterotrophic and must receive an external source of reduced carbon to fuel higher trophic levels.