About Introduction to Biomedical Data Science

Data science has grown steadily in popularity over the past decade and now touches every industry, including healthcare and the biomedical sciences. In healthcare, the goal is to use data science methods to improve medical quality and safety and to reduce costs.


There is optimism that machine learning and artificial intelligence (AI) will be major drivers of predictive analytics and of image, voice, and text recognition. Recently, applied AI models have outperformed medical experts at classifying medical images, particularly in cardiology, dermatology, ophthalmology, and radiology.

This textbook was written for anyone in the medical or informatics fields who feels they need a stronger background in data science. Understanding the textbook content and completing the data exercises does not require programming skills or higher math. Chapter exercises are based on healthcare data, and supplemental YouTube videos are available for most chapters. The content begins with spreadsheet tips and tricks and ends with artificial intelligence.

Bill Hersh MD announced in February 2021 that OHSU will launch a new course on applied data science and machine learning for health and clinical informatics students using this textbook. 

Special thanks to Ann Yoshihashi MD for her help with the publication of this textbook. If you have any questions, corrections, or suggestions, please use the Contact Us page.

Instructors: please register under the Register tab so you are eligible to download a PDF version of the book, PowerPoint slides, and an instructor manual. 

Bios: Bob Hoyt MD and Bob Muenchen MS PSTAT

Below you will find a list of the textbook authors and a detailed table of contents.

1. Authors


Brenda Griffith

Technical Writer


Austin, TX



Associate Clinical Professor

Department of Internal Medicine

Virginia Commonwealth University

Richmond, VA


David Hurwitz MD, FACP, ABPM-CI

Associate CMIO

Allscripts Healthcare Solutions

Chicago, IL


Madhurima Kaushal MS


Washington University at St. Louis, School of Medicine

St. Louis, MO



Assistant Professor

New York Medical College

Department of Emergency Medicine

Valhalla, NY


Karen A. Monsen PhD, RN, FAMIA, FAAN


School of Nursing

University of Minnesota

Minneapolis, MN


Robert Muenchen MS, PSTAT

Manager, Research Computing Support

University of Tennessee

Knoxville, TN


Dallas Snider PhD

Chair, Department of Information Technology

University of West Florida

Pensacola, FL

2. Table of Contents


  1. Introduction

  2. Background and history

  3. Conflicting perspectives

    1. the statistician’s perspective

    2. the machine learner’s perspective

    3. the database administrator’s perspective

    4. the data visualizer’s perspective

  4. Data analytical processes

    1. raw data

    2. data pre-processing

    3. exploratory data analysis (EDA)

    4. predictive modeling approaches

    5. types of models

    6. types of software

  5. Major types of analytics

    1. descriptive analytics

    2. diagnostic analytics

    3. predictive analytics (modeling)

    4. prescriptive analytics

  6. Putting it all together

  7. Biomedical data science tools

  8. Biomedical data science education

  9. Biomedical data science careers

  10. Importance of soft skills in data science

  11. Biomedical data science resources

  12. Biomedical data science challenges

  13. Future trends

  14. Conclusion

  15. References


  1. Introduction

    1. basic spreadsheet functions

    2. download the sample spreadsheet

  2. Navigating the worksheet

  3. Clinical application of spreadsheets

    1. formulas and functions

    2. filter

    3. sorting data

    4. freezing panes

    5. conditional formatting

    6. pivot tables

    7. visualization

    8. data analysis

  4. Tips and tricks

    1. Microsoft Excel shortcuts – Windows users

    2. Google Sheets tips and tricks

  5. Conclusions

  6. Exercises

  7. References


  1. Introduction

  2. Measures of central tendency & dispersion

    1. the normal and log-normal distributions

  3. Descriptive and inferential statistics

  4. Categorical data analysis

  5. Diagnostic tests

  6. Bayes’ theorem

  7. Types of research studies

    1. observational studies

    2. interventional studies

    3. meta-analysis

  8. Correlation

  9. Linear regression

  10. Comparing two groups

    1. the independent-samples t-test

    2. the Wilcoxon-Mann-Whitney test

  11. Comparing more than two groups

  12. Other types of tests

    1. generalized tests

    2. exact or permutation tests

    3. bootstrap or resampling tests

  13. Stats packages and online calculators

    1. commercial packages

    2. non-commercial or open source packages

    3. online calculators

  14. Challenges

  15. Future trends

  16. Conclusion

  17. Exercises

  18. References


  1. Introduction

    1. historical data visualizations

    2. visualization frameworks

  2. Visualization basics

  3. Data visualization software

    1. Microsoft Excel

    2. Google Sheets

    3. Tableau

    4. R programming language

    5. other visualization programs

  4. Visualization options

    1. visualizing categorical data

    2. visualizing continuous data

  5. Dashboards

  6. Geographic maps

  7. Challenges

  8. Conclusion

  9. Exercises

  10. References


  1. Introduction

  2. Definitions

  3. A brief history of database models

    1. hierarchical model

    2. network model

    3. relational model

  4. Relational database structure

  5. Clinical data warehouses (CDWs)

  6. Structured query language (SQL)

  7. Learning SQL

  8. Conclusion

  9. Exercises

  10. References


  1. Introduction

  2. The seven v’s of big data related to health care data

  3. Technical background

  4. Application

  5. Challenges

    1. technical

    2. organizational

    3. legal

    4. translational

  6. Future trends

  7. Conclusion

  8. References


  1. Introduction

  2. History

  3. Definitions

  4. Biological data analysis - from data to discovery

  5. Biological data types

    1. genomics

    2. transcriptomics

    3. proteomics

    4. bioinformatics data in public repositories

    5. biomedical cancer data portals

  6. Tools for analyzing bioinformatics data

    1. command line tools

    2. web-based tools

  7. Genomic data analysis

  8. Genomic data analysis workflow

    1. variant calling pipeline for whole exome sequencing data

    2. quality check

    3. alignment

    4. variant calling

    5. variant filtering and annotation

    6. downstream analysis

    7. reporting and visualization

  9. Precision medicine - from big data to patient care

  10. Examples of precision medicine

  11. Challenges

  12. Future trends

  13. Useful resources

  14. Conclusion

  15. Exercises

  16. References


  1. Introduction

  2. History

  3. R language

    1. installing R & RStudio

    2. an example R program

    3. getting help in R

    4. user interfaces for R

    5. R’s default user interface: RGui

    6. RStudio

    7. menu & dialog GUIs

    8. some popular R GUIs

    9. R graphical user interface comparison

    10. R resources

  4. Python language

    1. installing Python

    2. an example Python program

    3. getting help in Python

    4. user interfaces for Python

    5. reproducibility

  5. R vs. Python

  6. Future trends

  7. Conclusion

  8. Exercises

  9. References


  1. Brief history

  2. Introduction

    1. data refresher

    2. training vs test data

    3. bias and variance

    4. supervised and unsupervised learning

  3. Common machine learning algorithms

  4. Supervised learning

  5. Unsupervised learning

    1. dimensionality reduction

    2. reinforcement learning

    3. semi-supervised learning

  6. Evaluation of predictive analytical performance

    1. classification model evaluation

    2. regression model evaluation

  7. Machine learning software

    1. Weka

    2. Orange

    3. RapidMiner Studio

    4. KNIME

    5. Google TensorFlow 2

    6. honorable mention

    7. summary

  8. Programming languages and machine learning

  9. Machine learning challenges

  10. Machine learning examples

    1. example 1 classification

    2. example 2 regression

    3. example 3 clustering

    4. example 4 association rules

  11. Conclusion

  12. Exercises

  13. References


  1. Introduction

    1. definitions

  2. History

  3. AI architectures

  4. Deep learning

  5. Image analysis (computer vision)

    1. Radiology

    2. Ophthalmology

    3. Dermatology

    4. Pathology

    5. Cardiology

    6. Neurology

    7. Wearable devices

    8. image libraries and packages

  6. Natural language processing

    1. NLP libraries and packages

    2. text mining and medicine

    3. speech recognition

  7. Electronic health record data and AI

  8. Genomic analysis

  9. AI platforms

    1. deep learning platforms and programs

  10. Artificial intelligence challenges

    1. general

    2. data issues

    3. technical

    4. socioeconomic and legal

    5. regulatory

    6. adverse unintended consequences

    7. need for more ML and AI education

  11. Future trends

  12. Conclusion

  13. Exercises

  14. References

