2018 Summit

Artificial Intelligence in Environmental Health Science and Decision Making



15 TW Alexander Dr, Durham, NC 27703

Thursday Afternoon, October 18
Noon-1 pm    Registration
1-5 pm            Brief welcome followed by BayesiaLab seminar

Friday, October 19
8-8:30              Registration
8:30-8:40         Brief Opening
8:40-10:00         Opening Plenaries (Tom Dietterich, Paul Whaley, Samuel Adams)
10:00-10:15       Break
10:15-11:45       5 Case Studies Addressing Key Questions (15-20 minutes each)
11:45-1:00       Lunch
1:00-3:00         2 to 3 Breakout Sessions further answering all these questions and developing some recommendations
3:00-3:45         Share Findings back in Auditorium
3:45-4:00         Conclusion


Bayesian Networks: Artificial Intelligence for Research, Analytics, and Reasoning
Stefan Conrady, Managing Partner, Bayesia USA

“Currently, Bayesian Networks have become one of the most complete, self-sustained and coherent formalisms used for knowledge acquisition, representation and application through computer systems.” (Bouhamed et al., 2015)

In this workshop, we illustrate how scientists in many fields of study — rather than only computer scientists — can employ Bayesian networks as a very practical form of Artificial Intelligence for exploring complex problems. We present the remarkably simple theory behind Bayesian networks and then demonstrate how to utilize them for research and analytics tasks with the BayesiaLab software platform. More specifically, we illustrate supervised and unsupervised machine learning algorithms for knowledge discovery in high-dimensional domains.

Also, while Artificial Intelligence is commonly associated with another buzzword, “Big Data,” we show that Bayesian networks can bring Artificial Intelligence to problems for which we possess little or no data. Here, expert knowledge modeling is critical, and we describe how even a minimal amount of expertise can serve as a basis for robust reasoning under uncertainty with Bayesian networks.

Finally, expert theoretical knowledge remains mandatory for performing causal inference with non-experimental data. However, we show how the number of required causal assumptions can be substantially and conveniently reduced by using a machine-learned Bayesian network as the model.
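As a taste of the "remarkably simple theory" the workshop refers to, the inference at the heart of a Bayesian network is Bayes' theorem, which can be applied in a few lines of code. The diagnostic-test scenario and all numbers below are illustrative assumptions of ours, not material from the seminar:

```python
# A minimal sketch of reasoning under uncertainty with Bayes' theorem,
# using a hypothetical diagnostic-test scenario (all numbers invented).

def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

# A rare condition (1% prevalence) and a fairly accurate test:
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(f"P(condition | positive) = {p:.3f}")  # about 0.161
```

Even with a 95%-sensitive test, the posterior probability stays low because of the low base rate, which is exactly the kind of intuition failure that formal probabilistic reasoning guards against.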

Seminar Program Part I (Approx. 60 minutes)

  • The Promise, the Peril, and the Limitations of Artificial Intelligence
  • Human Cognitive Limitations & Biases in Reasoning
  • Human-Machine Teaming
  • Practical Artificial Intelligence for Here & Now

Background: A Conceptual Map of Analytic Modeling and Reasoning

  • X — Inference Type: Probabilistic vs. Deterministic
  • Y — Model Purpose: Observational vs. Causal Inference
  • Z — Model Source: Data vs. Theory

Introducing Bayesian Networks as a Research Framework

  • Under the Hood: The Simple Math of Bayesian Networks
  • Key Advantages of Bayesian Networks as a Modeling Framework

Seminar Program Part II (Approx. 180 minutes)

Bayesian Networks in Practice

  1. Knowledge Encoding & Probabilistic Inference

o    Introductory Example: Prosecutor’s Fallacy

o    Reinventing the Delphi Method with the Bayesia Expert Knowledge Elicitation Environment (BEKEE)

o    Policy Development Under Extreme Uncertainty: Antibiotic and Anti-Malarial Prescription Guidelines in Sub-Saharan Africa

  2. Knowledge Discovery for Classification/Prediction

o    Optimizing the Resources Required for the Diagnosis of Coronary Artery Disease

o    Leukemia Classification with Microarray Analysis

  3. Knowledge Discovery for Human Interpretation

o    Example t.b.d.

o    2D/3D/VR Visualization of Network Structures

  4. Knowledge Encoding + Knowledge Discovery for Causal Inference

o    Simpson’s Paradox Rears its Ugly Head

o    Example t.b.d.

Examples similar to the ones in this seminar can be found in our book, “Bayesian Networks & BayesiaLab: A Practical Introduction for Researchers,” which can be downloaded free of charge (https://www.bayesia.com/book).

All slides, networks, and datasets will be made available for download after the event.
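The Simpson's Paradox topic in Part II can be previewed numerically: within every stratum a treatment can outperform a control, yet lose in the aggregate, when treatment assignment is confounded with a stratifying variable. The counts below are invented purely for illustration:

```python
# Hypothetical counts illustrating Simpson's paradox: the treatment wins
# within every severity stratum but loses overall, because treatment
# assignment is confounded with case severity. All numbers are invented.

groups = {
    # stratum: (treated successes, treated total, control successes, control total)
    "mild":   (81, 87, 234, 270),
    "severe": (192, 263, 55, 80),
}

def rate(successes, total):
    return successes / total

for stratum, (ts, tn, cs, cn) in groups.items():
    assert rate(ts, tn) > rate(cs, cn)  # treatment wins within each stratum

# Aggregated over strata, the direction reverses:
TS = sum(v[0] for v in groups.values()); TN = sum(v[1] for v in groups.values())
CS = sum(v[2] for v in groups.values()); CN = sum(v[3] for v in groups.values())
print(rate(TS, TN) < rate(CS, CN))  # True: the treatment loses overall
```

A causal model that makes the confounder explicit, as a Bayesian network does, resolves which comparison is the right one to act on.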


Modern Machine Learning: Probabilistic Modeling and Functional Prediction
Tom Dietterich, Distinguished Professor Emeritus, Oregon State University, School of Electrical Engineering and Computer Science

Machine learning pursues two main paradigms for data analysis: probabilistic programming and function fitting. Probabilistic programming methods provide rich languages for defining probabilistic models and efficient algorithms for fitting those models to data. Function fitting algorithms seek to fit a highly accurate prediction function drawn from a highly flexible (non-parametric) class of functions. Building on the Bayesia tutorial from the previous afternoon, this talk will illustrate probabilistic programming by discussing multilevel modeling in the probabilistic programming language Stan. Then the talk will describe the random forest method and present recent techniques for making inferences based on these versatile models. Finally, the talk will discuss deep neural networks and their potential application to problems of environmental health science. These novel methods are best applied to problems of converting unstructured data (images, text) to structured data for subsequent statistical analysis. They also have potential to revolutionize medical imaging.
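To make the function-fitting paradigm concrete, the random forest idea can be sketched in miniature as bootstrap-aggregated decision stumps voting on a one-dimensional threshold problem. The synthetic data, thresholds, and parameters below are our own illustrative assumptions, not the methods or datasets from the talk:

```python
import random

# Toy "random forest": many decision stumps, each fit to a bootstrap
# resample of noisy synthetic data, combined by majority vote.

random.seed(0)
xs = [random.random() for _ in range(200)]
# True rule: y = 1 when x > 0.5; roughly 10% of labels are flipped as noise.
ys = [int(x > 0.5) if random.random() > 0.1 else 1 - int(x > 0.5) for x in xs]
train = list(zip(xs, ys))

def fit_stump(sample):
    """Choose the threshold (and direction) that best fits a bootstrap sample."""
    best, best_acc = None, -1.0
    for t in [i / 20 for i in range(1, 20)]:
        correct = sum(int(x > t) == y for x, y in sample)
        acc = max(correct, len(sample) - correct) / len(sample)
        if acc > best_acc:
            best, best_acc = (t, correct < len(sample) - correct), acc
    return best  # (threshold, flip_direction)

def random_forest(data, n_trees=25):
    return [fit_stump([random.choice(data) for _ in data]) for _ in range(n_trees)]

def predict(stumps, x):
    votes = sum(int(x > t) != flip for t, flip in stumps)
    return int(votes > len(stumps) / 2)

stumps = random_forest(train)
accuracy = sum(predict(stumps, x) == y for x, y in train) / len(train)
print(f"training accuracy: {accuracy:.2f}")
```

Real random forests grow full decision trees and also subsample features at each split; the bagging-plus-voting structure shown here is the core idea.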


Systematic Reviews, Machine Learning and the Liberation of Knowledge from Information in Environmental Health Research
Paul Whaley, Evidence Based Toxicology Collaboration Research Fellow, Lancaster Environment Centre, Lancaster University, United Kingdom

Research synthesis methodologists have more or less won the argument that systematic methods represent the gold standard for reviewing and mapping scientific evidence in the service of chemicals policy and risk-management decision-making. The problem is, being systematic requires a lot of resources: even to answer very focused questions that include only a few dozen primary manuscripts, it can take a typical research team 12-18 months to complete a systematic review, and there are thousands of chemicals that need reviewing across hundreds of health end-points, with results refreshed every couple of years or so. This is already more data than humans can process, and new data is piling up at an exponential rate.

Maintaining the highest standards for evidence reviews therefore requires that we get machines to read scientific documents on our behalf. This in turn requires data infrastructures that can store, and allow nuanced querying of, the very large number of data points and complex relationships between them that are expressed in scientific documents. In this context, this presentation will give an overview of at least one way in which people who are not computer scientists can contribute to the AI revolution in chemical risk research, why we should get excited about graph databases, and what the likely milestones are between where we are now and the bold new world which is still, let’s be honest, a little way off just yet.


Environmental Health … in Context
Samuel Adams, Senior Artificial Intelligence Researcher, RTI International

Why isn’t the Holy Grail of data science and AI standing before us right now filled to overflowing with all the knowledge and insight we “just know” is there? One of the biggest reasons is the lack of context. The environment is a vast, intermeshed metasystem of metasystems where nearly everything is connected to, and interacts with, everything else, at least at some distance across a network of intermediary systems. New tools like Deep Learning hold great promise, but without putting both the inputs AND the outputs into a greater integrated context, the ultimate value delivered by our collective efforts will still be minimal. This talk will discuss both the opportunities for developing and maintaining large scale contextual knowledge graphs as well as approaches to overcoming the technical challenges along the way.


Case Studies Addressing the Following Questions

What data sets are currently available and what kind of data sets do people need to start collecting?

How can these techniques and methods be brought to scientists and decision makers? How can people working in risk assessment be trained? How can they come to understand that this is a different paradigm?

What are the curriculum adjustments?

Harnessing Machine Learning to Predict Toxicities
Nicole Kleinstreuer, Deputy Director, National Institute of Environmental Health Sciences, NTP Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM)

Traditional toxicology has relied upon testing chemicals in animals, using methods that are costly, time-consuming, and low-throughput, and that often provide little insight into the mechanisms by which chemicals might affect human health and disease pathways. Toxicology testing in the 21st century, as demonstrated by the federal Tox21 research consortium, provides new methods to rapidly and reliably test tens of thousands of chemicals against a vast array of cellular, molecular, and genetic targets. When combined with large, well-curated reference datasets for substances with demonstrated toxic potential, machine learning approaches such as support vector machines, random forests, and deep learning can be used in supervised and unsupervised ways to predict complex toxicities and find associations between chemical structure and mechanisms. Such models demonstrate the power of artificial intelligence to inform regulatory decision making and help industry design safer, more sustainable products.


Using Bayesian networks to discover relations between genes, environment, and disease
Mark Borsuk, Pratt School of Engineering, Duke University

We review the applicability of Bayesian networks (BNs) for discovering relations between genes, environment, and disease. By translating probabilistic dependencies among variables into graphical models and vice versa, BNs provide a comprehensible and modular framework for representing complex systems. We first describe the Bayesian network approach and its applicability to understanding the genetic and environmental basis of disease. We then describe a variety of algorithms for learning the structure of a network from observational data. Because of their relevance to real-world applications, the topics of missing data and causal interpretation are emphasized. The BN approach is then exemplified through application to data from a population-based study of bladder cancer in New Hampshire. We find that allowing for network structures that depart from a strict causal interpretation enhances our ability to discover complex associations including gene-gene (epistasis) and gene-environment interactions. While BNs are already powerful tools for the genetic dissection of disease and generation of prognostic models, there remain some conceptual and computational challenges. These include the proper handling of continuous variables and unmeasured factors, the explicit incorporation of prior knowledge, and the evaluation and communication of the robustness of substantive conclusions to alternative assumptions and data manifestations.
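The score-based structure learning the abstract describes can be sketched on synthetic data: candidate parent sets for a disease node are compared by a penalized-likelihood (BIC) score, and the best-scoring edge is retained. The variables, probabilities, and single-node scoring below are illustrative assumptions only, not the study's actual algorithms or data:

```python
import math
import random
from collections import Counter

# Score-based structure learning in miniature: which parent set for the
# disease node D best explains synthetic (gene, environment, disease) records?

random.seed(1)
records = []
for _ in range(1000):
    g = int(random.random() < 0.3)                  # gene variant (invented)
    e = int(random.random() < 0.5)                  # environmental exposure (invented)
    d = int(random.random() < (0.6 if g else 0.1))  # disease depends on gene only
    records.append((g, e, d))

def bic(parent):
    """BIC score for the disease node (column 2) given one parent column, or None."""
    joint = Counter((r[parent] if parent is not None else 0, r[2]) for r in records)
    marg = Counter(r[parent] if parent is not None else 0 for r in records)
    loglik = sum(n * math.log(n / marg[p]) for (p, d), n in joint.items())
    k = 1 if parent is None else 2  # free parameters in the conditional table
    return loglik - 0.5 * k * math.log(len(records))

scores = {"no parent": bic(None), "gene -> disease": bic(0), "env -> disease": bic(1)}
print(max(scores, key=scores.get))  # the gene edge should score best
```

Full structure-learning algorithms search over whole graphs and must also handle missing data, continuous variables, and equivalence classes of structures, which is where the conceptual and computational challenges the abstract mentions arise.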


Machine Learning in Dose-Response Assessment: Translating Science to Decisions
Jackie MacDonald Gibson, Associate Professor, UNC Chapel Hill, Department of Environmental Sciences and Engineering; RTI University Scholar, 2017-2018

In the United States, decisions about whether to implement new drinking water regulations are based on quantitative risk assessments of the potential health benefits of these regulations. The methods used for these risk assessments do not yet incorporate major advances in artificial intelligence that could improve risk predictions.  To demonstrate the potential advantages of adopting artificial intelligence methods for United States environmental regulatory risk assessment, this talk will present a case study predicting the benefits of decreasing arsenic exposure in drinking water using a machine-learned Bayesian belief network model. The model was learned from a data set of 1,050 individuals from an arsenic-endemic region of Chihuahua, Mexico. The model integrates arsenic exposure data with biomarkers of arsenic metabolism and demographic characteristics to quantify the probability of diabetes for different exposure levels and population subgroups. The predictive ability of the Bayesian network model will be compared to that of a reference dose model and of a model estimated with Benchmark Dose Software, which are the prevailing approaches in current United States regulatory risk assessment practice. Implications for policymaking will be discussed.


Bayesian Inference for Substance and Chemical Toxicity (BISCT)
Lyle D. Burgoon, Leader, US Army Engineer Research and Development Center, Bioinformatics and Computational Toxicology Group

Lyle Burgoon leads research on Artificial Intelligence to Drive the Military Environment. The focus of his work is the creation of Artificial Intelligence that can augment human decision-making in primarily military and humanitarian environments. He is an expert in AI sensor fusion (including Internet of Things, ground-based, space-based, and laboratory-based sensors) to augment human decision-making in difficult national-security settings. The environmental public health challenges he works on range from predicting the impacts of the environment and environmental changes on military intelligence and warfighter readiness, to AI sensor fusion for understanding urban warfare environments and their health impacts, to technologies for food security and logistics, to forecasting the potential secondary and tertiary impacts of warfare on human health, to the automated global identification of human infrastructure networks anywhere in the world. His recent work also includes forecasting the potential toxicity of military materials based on structural information, and the use of Bayesian networks to fuse data from laboratory assays to predict potential toxicity.


Rare Diseases and AI Analysis with Potential Use of the Environmental Genome
Dr. Michael Kowolenko, CEO, NoviSystems
Dr. Michael Overcash, Executive Director, Environmental Genome Initiative

The ability to develop analytical infrastructure that allows for the fusion of different data sets can result in applications that let users collect information and draw conclusions in a less cumbersome and confusing fashion. This application of “backend” compute techniques such as machine learning and natural language processing allows complex relationships to be explored and conveyed to individuals with rare diseases. In a related effort, individuals with an occurrence of rare diseases are grouped into familial or nonfamilial categories. The latter can be examined with several analytics built on the Environmental Genome, which estimates specific chemical emissions from manufacturing plants, transportation areas, energy-grid locations, and agriculture. The future challenge is how to link these chemical sources to the exposure zones of those with nonfamilial rare diseases. This is the first step in a much more transparent, though complicated, approach to environmental pollutants and the use of artificial intelligence.




Samuel Adams, RTI International
Surafel Adere, Duke
Michelle Angrish, US EPA
Martin Armes, The Collaborative
Scott Auerbach, NIEHS
Maureen Avakian, MDB Inc.
David Aylor, NC State
Maryam Azimi, Lenovo
Mamta Behl, NIEHS
Shannon Bell, ILS
Alexandre Borrel, NIEHS
Mark Borsuk, Duke
David Brown, The Collaborative
Lyle Burgoon, US Army
Neal Cariello, Integrated Laboratory Systems
Xialquing Chang, Integrated Laboratory Systems
Rada Chirkova, NC State
Sarah Catherine Colley, UNC Chapel Hill
Gwen Collman, NIEHS
Stefan Conrady, Bayesia USA
Jesse Cushman, NIEHS
Sivanesan Dakshanamurthy, Georgetown
Sally Darney, NIEHS
Deepa Dawadi
Demosmita, NC State
Rob DeWoskin, etioLogic
Tom Dietterich, Oregon State
Cedric Dongmo, University of Yaounde
Chris Duncan, NIEHS
Steve Dutton, US EPA
Steve Edwards, RTI International
Neeraja Erraguntla, American Chemistry Council
Jianing Fan, Duke
Lydia Feinstein, Social & Scientific Systems
Kenda Freeman, MDB Inc.
Jim French, Live Learn Innovate Foundation
Stavros Garantziotis, NIEHS
Patrick Gray, Duke
Lauren Gridley, RTI International
Hanbing Guan, Cisco
John Hardin, NC Board of Science, Technology & Innovation
Linchen He, Duke
Gina Hilton, PETA International Science Consortium
Stephanie Holmgren, NIEHS
Kennedy Holt, UNC Chapel Hill/NC Public Health
Beibei Hu, Duke
Sara Imhof, NC Biotech Center
Jaronda Ingram, Zoetis
Kristin Inman, NIEHS
Agnes Janoshazi, NIEHS
Peer Karmaus, NIEHS
Chandana Kasireddy, NIEHS
Channa Keshava, US EPA
Manal Khan, UNC Chapel Hill
Nicole Kleinstreuer, NIEHS
Les Klimczak, NIEHS
Michael Kowolenko, NoviSystems
Hamid Krim, NC State
Dhirendra Kumar, NIEHS
Archana Lamichhane, NIEHS
Christopher Lavender, NIEHS
Janice Lee, US EPA
Tess Leuthner, Duke
Jian-Liang Li, NIEHS
Xing Li, Duke
Yuanyuan Li, NIEHS
Rui Liu, Social & Scientific Systems
Yun Liu, UNC Chapel Hill
Ming Lu, Health Canada
Jackie MacDonald Gibson, UNC/RTI
Alexandra Maertens, Johns Hopkins
Elizabeth Mannshardt, US EPA
Dwi Sianto Mansjur, IBM
Kamel Mansouri, ILS
Courtney McCortsin, Duke
Sena McCrory, Duke
Vanessa Michelou, Novozymes
Erika Munshi, Duke
Reshma Nargund, Duke
Emmanuel Obeng-Gyasi, NC A&T
Michael Overcash, Environmental Genome Initiative
Okan Pala, NC State
Shannon Parker, Duke
Rajneesh Pathania, NIEHS
John Phillips, Georgia International
Terry Pierson, RTI International
Sunil Rajgopal Prasad, MResult
Asif Rashid, DS Technologies
Caroline Ridley, US EPA
Leon Rosentsvit, Technion IIT
Marianna Rosentsvit, NIEHS
Risa Sayre, US EPA
Kate Scholfield, US EPA
Frederic Seidler, Duke
Joseph Shaw, Indiana
Mina Shehee, NC DHHS
Shanshan Shi, Duke
Susan Simmons, NC State
Linda Smail, Zayed University
Raquel Silva, US EPA
Solomon Tamkabari, Rivers State College
Michele Taylor, US EPA
Shane Thacker, US EPA
Kimberly Thigpen Tart, NIEHS
Jacob Traverse, Triangle Global Health Consortium
Natalia Vinas, ERDC
Jimmy Washington, NIEHS
James Weaver, US EPA
Leah Wehmas, US EPA
Paul Whaley, Lancaster University UK
Emily Woolard, US EPA
Ya Xue, Infinia ML
Chis Yoo
Hong Zu, NIEHS
Hal Zenick, US EPA
Yongjie Zhou, US FDA


Michelle Angrish, US Environmental Protection Agency
Martin Armes, Research Triangle Environmental Health Collaborative
Scott Auerbach, National Institute of Environmental Health Sciences
Maureen Avakian, MDB, Inc.
Mark Borsuk, Duke University
David Brown, Research Triangle Environmental Health Collaborative
Kenda Freeman, MDB, Inc.
Stephanie Holmgren, National Institute of Environmental Health Sciences
Jackie MacDonald Gibson, UNC-Chapel Hill/RTI International
Michael Overcash, Environmental Genome Initiative
Terrence Pierson, RTI International
Michele Taylor, US Environmental Protection Agency
Kimberly Thigpen Tart, National Institute of Environmental Health Sciences
Hal Zenick (Retired), US Environmental Protection Agency

1. Applications and Data

  1. When does your organization or project use artificial intelligence to solve environmental health sciences research questions?
    • What are these questions?
  2. What AI components (approaches and methods) are used to solve those problems?
    • What are the key skill sets needed to solve those problems?
  3. What infrastructure and data/database challenges exist?
    • How are data made consistent and interoperable?
    • What semantic querying capabilities exist?

2.  Decision-Makers

  1. How might AI enhance chemical risk assessment?
  2. Who (if anyone) in your organization is thinking about how artificial intelligence can be used to improve decision-making?
  3. What specific challenges (e.g. technical or institutional) do you or does your organization face?
  4. How can risk analysts be educated about artificial intelligence methods?
  5. What are the institutional barriers to wider use by decision-makers and risk analysts?

3.  Educating the Next Generation

  1. How can the next generation of environmental health scientists, risk assessors, and decision-makers be trained in artificial intelligence?
  2. How can risk analysts be “trained-up” to use AI components or be connected to data scientists?
  3. Which methods are most important to include in future educational curricula?
  4. What curriculum adjustments are needed?
  5. What institutional changes are needed to promote this training?

Work Groups

Hotel Information

Contact Martin Armes for details