Bootcamp on Systems-Data Science: Practical Combining Data Science and Systems Science for Health (June 27-June 30, 2022)

Systems Science and Data Science are two rapidly developing areas of computational science that have demonstrated tremendous capacity for informing health understanding, and that are applied by a growing number of projects in health and health care.  While each approach taps the power of computational models, the two have traditionally been pursued largely in isolation from each other.  Such fragmentation is particularly unfortunate because the techniques are not merely highly compatible (for example, each uses computational or informatics mechanisms to provide temporally and spatially fine-grained longitudinal understanding across multiple generative pathways) but synergistic: each tradition opens strong opportunities for empowering the other, and the combination yields opportunities for insight and improved decision making far beyond the sum of what each can bring in isolation.  We present here a proven set of Systems-Data Science methods that achieve this “whole greater than the sum of its parts” and that allow each approach to better fulfill its potential.

This bootcamp will explore the repeatedly demonstrated power of Systems-Data Science methods by systematically presenting and explaining a half-dozen distinct but complementary techniques, along with the settings in which each applies.  These techniques cross-leverage Systems Science (in the form of dynamic modeling) and Data Science, with particular emphasis on cross-linking models with “big data” offering high volume, velocity, variety, and veracity.

While some of the techniques explored will be simpler (such as model parameterization with dynamic network data and calibration of models to empirical evidence gathered via big data), most involve leveraging machine learning techniques such as Approximate Bayesian Computation, Particle Filtering, Markov Chain Monte Carlo, Particle Markov Chain Monte Carlo, State Space Embedding, Convergent Cross-Mapping and Deep Learning methods.  Coverage of the overwhelming majority of the methods presented will include 1) a health application case study, 2) conceptual background in the major methods used, 3) an exemplar implementation that can be adapted by participants, and 4) provision of or access to codebases or libraries implementing the underlying methodology.  In many cases (such as with smartphone-based data collection and Google search query volumes) the bootcamp will further provide participants with access to tools or codebases to collect such data, together with examples of such data and analyses using it.  We will also feature case studies of the use of these technologies for day-to-day COVID-19 decision support provincially and across Canada.

Intended Audience

Combining “big data” with “dynamic modeling” and “machine learning” for health decision-making insight.

This workshop is targeted at professionals from a variety of health fields, including quantitatively grounded health researchers and health decision makers.  While material is presented in a way that is accessible to a quantitative health researcher, the bootcamp will also confer much value to researchers from STEM backgrounds interested in applying such techniques.  Because of time constraints, the event does not include a thorough introduction to dynamic modeling (such as agent-based and system dynamics modeling); participants unfamiliar with such techniques are advised to refer to the instructor's introductory online videos (https://www.youtube.com/NathanielOsgood) or to seek out literature on the basics of such approaches.

Purpose

Over the past decade, public health and health care decision making has increasingly been impacted by the growth of two recent, versatile and deep computational traditions:  Data Science and Systems Science. Each of these traditions offers great promise for both sharpened understanding of traditional health questions and assistance in addressing complex health challenges confronting the nation and the world.

While both Systems Science and Data Science represent cutting-edge computational traditions that offer great promise for temporally fine-grained longitudinal understanding across multiple pathways of complex systems, these two traditions are typically pursued in parallel rather than jointly.  Within this bootcamp, we will discuss and provide a means of tapping the great promise of cross-leveraging Systems Science techniques (in the form of dynamic modeling) and Data Science, particularly emphasizing the role of cross-linking models with “big data” offering high volume, velocity, variety and veracity.

To that end, this bootcamp will systematically survey well-tested approaches for combining Systems Science techniques (in the form of dynamic modeling) and rich data sources, particularly emphasizing means of cross-linking models with big data offering high volume, velocity, variety and veracity.  While some of these linkage methods will be simpler (such as model parameterization with dynamic network data and calibration of models to empirical evidence gathered via big data), most involve leveraging machine learning techniques such as Hidden Markov Models, Particle Filters, Markov Chain Monte Carlo, Particle Markov Chain Monte Carlo, and Deep Learning methods.  Coverage of the overwhelming majority of the methods presented will include 1) a health application case study, 2) conceptual background in the major methods used, 3) an exemplar implementation that can be adapted by participants, and 4) provision of or access to codebases or libraries implementing the underlying methodology.  In many cases (such as with smartphone-based data collection, Twitter harvesting, and Google search query volumes) the bootcamp will further provide participants with access to tools or codebases to collect such data, together with examples of such data and analyses using it.

Examples of data discussed in this bootcamp include fine-grained temporal, geographic and network information collected by smartphones and wearables (including data from sensors, survey instruments and crowdsourcing mechanisms), social media posts, data involving online search behaviour, website accesses, historical epidemiological time series and rich cross-linked databases.  Dynamic models grounded by such novel data sources can allow for articulated theory building regarding difficult-to-observe aspects of human behavior. Both models and such data can aid in understanding the dynamics of multiple pathways-to-effect associated with interventions. Such models can also aid in informing evaluation of and judicious selection between interventions to lessen health burdens.  Such models secure particularly powerful benefits from big data when they are complemented by machine learning and computational statistics techniques that permit recurrent regrounding of the model in the newest evidence, that allow a model to knit together a holistic portrait of the system (including many latent factors), and that support grounded investigation of intervention effects.

This hands-on tutorial introduces health researchers and practitioners to concrete tools, practical skills and the conceptual background required to combine dynamic modeling and “big data” related to health behavior, and offers guidance to participants in getting started in applying such techniques to studies and applications of specific interest to them.

Finally, the bootcamp discusses limitations, shortcomings and hurdles of, and gaps in, state-of-the-art techniques for combining dynamic modeling and “big data”, and discusses lessons learned from case studies that encountered acute barriers or limited gains from the techniques presented here.  The bootcamp will further characterize areas where methodological innovation is underway.

Instructor

Dr. Nathaniel D. Osgood

Dr. Nathaniel D. Osgood is a Professor in the Department of Computer Science and Associate Faculty in the Department of Community Health & Epidemiology at the University of Saskatchewan. His research is focused on providing cross-linked simulation, ubiquitous sensing, and machine learning tools to inform understanding of population health trends and health policy tradeoffs. His applications work has addressed challenges in the communicable, zoonotic, environmental, and chronic disease areas.  Dr. Osgood is further the co-creator of two novel mobile sensor-based epidemiological monitoring systems, most recently the Google Android- and iPhone-based iEpi (now Ethica Health) mobile epidemiological monitoring systems.  He has additionally contributed innovations to improve dynamic modeling quality and efficiency, and introduced novel techniques that hybridize multiple simulation approaches, that combine simulation models with decision analysis tools, and that leverage such models using data gathered from wireless epidemiological monitoring systems.  Dr. Osgood has led many courses in simulation modeling and health around the world, and his online videos on the subject attract thousands of views per month.  Prior to joining the U of S faculty, he graduated from MIT with a PhD in Computer Science in 1999, served as a Senior Lecturer at MIT and worked for a number of years in a variety of academic, consulting and industry positions.

Teaching Assistants

The course will be staffed with a broad set of graduate-level teaching assistants, who will provide assistance both during the tutorial sessions and during the open times and post-tutorial brainstorming sessions.  To better address the questions of participants from a wide variety of backgrounds, the teaching assistants will be drawn from both health science and technical backgrounds.

Location

The bootcamp will be held at the University of Saskatchewan.  Lectures on the conceptual basics of the approaches (illustrated by reference to the examples) will be offered, and we will endeavour to record all sessions and make the recordings available to participants with quick turnaround during and following the bootcamp.  Slides and summaries will be available for participants to take home from the bootcamp. Topics are anticipated to include the following, with details of coverage of these and additional topics depending on participant interests expressed via pre-study surveys.

  • Big data in health
    • Examples
    • Basic characteristics: The 4 V’s and their significance
    • Fundamental compatibility with dynamic models:  Pathway specific, multi-pathway, temporally fine grained, longitudinal, and use in intervention evaluation
  • Systematic high-level survey and summary of
    • Means of interfacing “big data” into dynamic models
    • Ways dynamic modeling can support collection of big data
    • Ways dynamic modeling can support machine learning with big data
    • Machine learning approaches responsive to particular types of research questions
  • Particle Filtering (Sequential Monte Carlo) for providing an evolving view of a model’s latent state
    • Conceptual basis
    • Particle Filtering as CT scan: Combining multiple lines of evidence into an evolving picture of the latent state of the system
    • For compartmental (System Dynamics) models, as conducted in a visually transparent fashion in AnyLogic
    • Look at specifics of implementation in AnyLogic
    • Case studies using AnyLogic: Pertussis, H1N1 Influenza, Measles (example for TB available upon request)
    • Use with stochastically evolving parameters
    • Use to estimate mixing matrices in compartmental transmission models
    • Understanding options for retrospective estimation:  Sampling Cross-Sectional state vs. Sampling particle trajectories
    • Use with agent-based models: Example, barriers and prospects
    • The importance of latent parameters in planning interventions
    • Guidelines for particle count selection
  • Markov Chain Monte Carlo
    • Conceptual basis
    • Case study: Flu-like communicable disease
    • Look at specifics of implementation in R/Vensim
  • Particle MCMC (Marginal Metropolis-Hastings variant) for jointly estimating parameters and latent state in compartmental (System Dynamics) models
    • Conceptual basis
    • Computational challenges
    • Tradeoffs with respect to Particle Filtering
    • Case studies: Opioid modeling, Chickenpox, Influenza
    • Look at specifics of implementation in R (via CEPHIL library codebase)
  • Deep learning, big data and dynamic modeling
    • Conceptual basis
    • Modes of combination with big data
    • Case studies:  Communicable disease detection, tweet classification
    • Look at specifics of implementation in TensorFlow
  • Hidden Markov Models
    • Conceptual basis
    • Diverse means of interacting with dynamic model: Estimating initial state, determining underlying rates of behaviours, decision-making strategies in model
    • Case studies using AnyLogic:  Foodborne illness outbreaks
    • Look at specifics of implementation in R (via mhsmm package)
  • Simulation modeling in support of Data Science
    • Support for generation of synthetic ground truth for evaluation of
      • Study plans
      • Machine learning inference
    • Identification of temporal granularity required
    • Prioritization of data collection at baseline and following interventions

Summary

While Data Science/Big Data and Systems Science both offer rich computational toolboxes for confronting foremost health challenges, work using each is typically pursued in isolation.  This forgoes great opportunity: the two are not only complementary but synergistic. This bootcamp will systematically present participants with the practical tools, conceptual foundation, and case study exemplars associated with multiple means of combining Systems Science techniques (in the form of dynamic modeling using Agent-Based and compartmental/System Dynamics approaches) with big data.

Registration Cost

$100.00 (plus GST) – Students

$500.00 (plus GST) – Online Attendees

$1,000.00 (plus GST) – In Person Attendees

Registration Link:

https://communityconferences.usask.ca/index.aspx?cid=582