DATA SCIENCE TRAINING

 

New to data science or looking to pick up a few new skills? Don’t miss these free webinars, guided practical tutorials and online resources featuring CANUE data.

Developed in partnership with Population Data BC


Module 1: Introduction to Machine Learning

  • What is machine learning?
  • Supervised vs unsupervised learning
  • Model- and kernel-based methods
  • Measures of accuracy (train/test split and cross-validation)
  • Causality and accuracy
  • Unsupervised learning as feature reduction

Module 2: Regression and Regularization Algorithms

  • Regression with many correlated variables
  • Automatic variable selection, early approaches and problems
  • Gradient descent
  • Regularization (L1 vs L2 vs ElasticNet)

Module 3: Advanced Supervised Learning

  • Decision trees
  • Problems with overfitting
  • Random Forest
  • Out-of-bag error vs cross-validation

Module 4: Advanced Unsupervised Learning

  • Who uses unsupervised learning?
  • K-means
  • Expectation-maximization
  • Susceptibility to outliers
  • Dangers of labeling clusters

Dr. Aman Verma is a Data Engineer with a PhD in Epidemiology from McGill University and an undergraduate degree in Computer Science. He has experience developing machine learning systems with large databases, particularly for scientific data in healthcare. While he’s comfortable learning any programming language, he’s recently become particularly interested in R. Aman is currently involved in a number of projects, including measuring how following opioid prescription guidelines can decrease the risk of opioid overdose, modelling trajectories of chronic obstructive pulmonary disease, and assessing how best to prioritize ambulance calls using secondary healthcare data.

 


AN INTRODUCTION TO DATA MANAGEMENT AND CLEANING FOR ANALYSIS IN ‘R’

This free, self-paced online course introduces data management and cleaning for analysis using R. Each of the four modules includes a PowerPoint slide deck, CANUE training data, R code and associated practice exercises.

To access this resource, please create a Population Data BC account here: https://my.popdata.bc.ca/accounts/register/

Once your account has been approved, you will be able to access the Education and Training site and self-enroll in this and other free online courses.

Topics covered include:

  • Introduction and theory of data cleaning and management
  • Getting started with R software
  • Subsetting variables and data cleaning
  • Creating variables, subsetting observations and data cleaning
  • Merging, joining and reshaping data

 

Megan Striha currently works as a Data Analyst. She has a Master of Public Health degree and three years of experience in health data analysis, including work with survey, administrative and census data.