class: top, left, title-slide

.title[
# 10 simple rules for teaching R for Data Science
]
.author[
### Tiffany Timbers (UBC)
]
.author[
### 2022-09-17
]
.date[
### bit.ly/timbers-cascadia-r-conf-2022
]

---

<style type="text/css">
/* source: https://community.rstudio.com/t/using-multiple-font-sizes-for-code-chunks/26405/6 */
.remark-slide-content {
  font-size: 20px;
  padding: 20px 80px 20px 80px;
}
.remark-code, .remark-inline-code {
  background: #f0f0f0;
}
.remark-code {
  font-size: 24px;
}
.huge .remark-code { /*Change made here*/
  font-size: 200% !important;
}
.small .remark-code { /*Change made here*/
  font-size: 80% !important;
}
.tiny .remark-code { /*Change made here*/
  font-size: 60% !important;
}
</style>

### 1. Teach R by doing data analysis

**What?** In your first lesson, load the data, do some "mild" wrangling and make a plot!

**Why?** Motivation!

***

.pull-left[
.tiny[

```r
library(tidyverse)

# load the data set
can_lang <- read_csv("data/can_lang.csv")

# obtain the 10 most common Aboriginal languages
aboriginal_lang <- filter(can_lang, category == "Aboriginal languages")
arranged_lang <- arrange(aboriginal_lang, by = desc(mother_tongue))
ten_lang <- slice(arranged_lang, 1:10)

# create the visualization
ggplot(ten_lang, aes(x = mother_tongue, y = reorder(language, mother_tongue))) +
  geom_bar(stat = "identity") +
  xlab("Mother Tongue (Number of Canadian Residents)") +
  ylab("Language")
```
]
]

.pull-right[
<img src="img/nachos-to-cheesecake-1.png" width="100%" style="display: block; margin: auto auto auto 0;" />
]

Source: [Chapter 1 of *Data Science: A First Introduction*](https://datasciencebook.ca/intro.html) by Tiffany Timbers, Trevor Campbell & Melissa Lee

---

<!---Teaching the tidyverse can facilitate this:--->

.pull-left[
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">When you start writing a loop then turn it into dplyr <a href="https://twitter.com/hashtag/rstats?src=hash&ref_src=twsrc%5Etfw">#rstats</a>; David Robinson (@drob) <a
href="https://twitter.com/drob/status/701790246111825925">Feb 22, 2016</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
]

.pull-right[
<img src="img/jenny-teach-tidyverse.png" width="1736" style="display: block; margin: auto;" />

#### Explore more:

David Robinson:

- [Teach the tidyverse to beginners](http://varianceexplained.org/r/teach-tidyverse/) blog post
- [Announcing "Introduction to the Tidyverse", my new DataCamp course](http://varianceexplained.org/r/intro-tidyverse/) blog post
- [Introduction to the Tidyverse](https://www.datacamp.com/courses/introduction-to-the-tidyverse) online course
]

---

### 2. Use participatory live coding!

**What?** Narrate and type out the code you are teaching. Have participants follow along.

**Why?** Demonstrates process and makes you appear human!

***

<img src="img/lex-live-coding.png" width="70%" style="display: block; margin: auto auto auto 0;" />

Source: [A video introduction to live coding part 2](https://www.youtube.com/watch?v=SkPmwe_WjeY&t=111s) by Lex Nederbragt

---

<img src="img/carpentries.png" width="70%" style="display: block; margin: auto;" />

#### Explore more:

- [Ten quick tips for teaching with participatory live coding](https://doi.org/10.1371/journal.pcbi.1008090) by Lex Nederbragt, Rayna Harris, Alison Hill & Greg Wilson
- [Live coding section of Teaching Tech Together](https://teachtogether.tech/en/index.html#s:performance-live) by Greg Wilson
- Example live coding videos by Lex Nederbragt ([bad example](https://www.youtube.com/watch?v=bXxBeNkKmJE&t=5s), [good example](https://www.youtube.com/watch?v=SkPmwe_WjeY&t=111s))

---

### 3. Give tons and tons of practice!

**What?** Give the learners many, many problems to solve.

**Why?** Repetition leads to learning!

***

Read the data files listed in the table below and bind the names listed in the table to them.
| File | Object name to bind to the data frame | File location |
|---|---|----|
| `abbotsford_lang.xlsx` | `abbotsford` | `data` directory of this repo |
| `calgary_lang.csv` | `calgary` | `data` directory of this repo |
| `edmonton_lang.xlsx` | `edmonton` | https://github.com/ttimbers/canlang/tree/master/inst/extdata |
| `kelowna_lang.csv` | `kelowna` | `data` directory of this repo |
| `vancouver_lang.csv` | `vancouver` | `data` directory of this repo |
| `victoria_lang.csv` | `victoria` | https://github.com/ttimbers/canlang/tree/master/inst/extdata |

---

.pull-left[
<img src="img/swirl.png" width="30%" style="display: block; margin: auto;" />

<img src="img/swirl-demo.png" width="90%" style="display: block; margin: auto;" />
]

.pull-right[
#### Explore more:

- [Swirl R package](https://swirlstats.com/) by Sean Kross and others
- [Practice worksheets to accompany Data Science: A First Introduction](https://github.com/UBC-DSCI/data-science-a-first-intro-worksheets#readme) by Tiffany Timbers, Trevor Campbell and Melissa Lee
- [Supervised Machine Learning Case Studies in R course](https://supervised-ml-course.netlify.app) by Julia Silge
- [Coding exercise types in Teaching Tech Together](https://teachtogether.tech/en/index.html#s:exercises) by Greg Wilson
]

---

### 4. Give tons and tons of timely feedback!

**What?** Automated software tests to give hints and feedback.

**Why?** Effective practice requires feedback!
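A minimal sketch of what such a test can look like, assuming the `testthat` package is installed. The exercise, the helper name `test_answer` and the hint wording are hypothetical illustrations, not taken from the course materials:

```r
library(testthat)

# Hypothetical exercise: "Bind the mean of 2, 4 and 6 to the name `answer`."
answer <- mean(c(2, 4, 6))

# Hypothetical feedback helper (illustrative; not from the talk).
# Each expectation carries a hint through `info`, which testthat includes
# in the failure message if that check fails.
test_answer <- function(answer) {
  test_that("answer is the mean of 2, 4 and 6", {
    expect_true(is.numeric(answer), info = "Hint: `answer` should be a number.")
    expect_equal(answer, 4, info = "Hint: did you average all three values?")
  })
  print("Success!")
}

test_answer(answer)
```

Learners run `test_answer(answer)` right after their own code, so the hint arrives while the problem is still fresh.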
***

<img src="img/tests-as-feedback.png" width="70%" style="display: block; margin: auto auto auto 0;" />

---

.pull-left[
<img src="img/teaching-tech-together.jpeg" width="75%" style="display: block; margin: auto;" />
]

.pull-right[
#### Explore more:

- [Automatic Grading chapter of *Teaching Tech Together*](https://teachtogether.tech/en/index.html#s:exercises-grading) by Greg Wilson
- [learnr](https://rstudio.github.io/learnr/) R package for automated feedback of R Markdown documents
- [nbgrader](https://nbgrader.readthedocs.io/en/stable/) tool for automated feedback & autograding of Jupyter notebooks
- [ottergrader](https://otter-grader.readthedocs.io/en/latest/) tool for automated feedback & autograding of both Jupyter notebooks & R Markdown documents
]

---

## 5. Use tractable or toy data examples

**What?** Use a countable number of things when teaching a new tool, method or algorithm.

**Why?** Allows one to see how all the elements are manipulated.

***

A small subset of the `palmerpenguins` data set for teaching K-means clustering:

<img src="img/10-toy-example-clustering-1.png" width="35%" style="display: block; margin: auto;" />

Source: [Clustering chapter of *Data Science: A First Introduction*](https://datasciencebook.ca/clustering.html) by Tiffany Timbers, Trevor Campbell & Melissa Lee

---

.pull-left[
<img src="img/jenny-dplyr-join-tweet.png" width="100%" style="display: block; margin: auto;" />

.small[

```r
left_join(superheroes, publishers)
```
]

We basically get `x = superheroes` back, but with the addition of variable `yr_founded`, which is unique to `y = publishers`. Hellboy, whose publisher does not appear in `y = publishers`, has an `NA` for `yr_founded`.
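The same join can be reproduced on a toy data set small enough to read in full. The sketch below types in a three-row stand-in for each table with `tribble`; the rows and founding years are illustrative values chosen for this example rather than the full tables from the cheat sheet:

```r
library(tibble)
library(dplyr)

# Toy stand-ins, typed by hand (assumed values, for illustration only)
superheroes <- tribble(
  ~name,     ~publisher,
  "Magneto", "Marvel",
  "Batman",  "DC",
  "Hellboy", "Dark Horse Comics"
)

publishers <- tribble(
  ~publisher, ~yr_founded,
  "DC",       1934,
  "Marvel",   1939,
  "Image",    1992
)

# Keep every row of x = superheroes and add yr_founded where publisher matches
joined <- left_join(superheroes, publishers, by = "publisher")
joined
```

With only three rows per table, learners can check every row by eye: Hellboy keeps his row with `NA` for `yr_founded`, and Image, which has no superhero in this toy table, does not appear in the result.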
<img src="img/jenny-bryan-left-join.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
#### Explore more:

- [Jenny Bryan's join cheat sheet](https://stat545.com/join-cheatsheet.html)
- [`dplyr::tibble`](https://tibble.tidyverse.org/reference/tibble.html) or [`dplyr::tribble`](https://tibble.tidyverse.org/reference/tribble.html) R functions
- [`sample`](https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/sample) or [`dplyr::sample_n`](https://dplyr.tidyverse.org/reference/sample_n.html) R functions
- [charlatan](https://docs.ropensci.org/charlatan/) R package for creating fake data
- [drawdata.xyz](https://drawdata.xyz/) app to draw a data set and download it
]

---

## 6. Use real and rich, but accessible, data sets

**What?** Use real and interesting data that is also easily understood by all.

**Why?** Motivating!

***

<img src="img/README-example-plot-from-data-1.png" width="70%" style="display: block; margin: auto auto auto 0;" />

Source: [`canlang`](https://ttimbers.github.io/canlang/) R package

---

.pull-left[
Jenny Bryan's [`gapminder`](https://github.com/jennybc/gapminder) R data package:

| variable  | meaning                  |
| :-------- | :----------------------- |
| country   |                          |
| continent |                          |
| year      |                          |
| lifeExp   | life expectancy at birth |
| pop       | total population         |
| gdpPercap | per-capita GDP           |
]

.pull-right[
<img src="10-simple-rules-for-teaching-r-for-data-science_files/figure-html/gapminder-1.png" style="display: block; margin: auto;" />
]

#### Explore more:

- [`canlang`](https://ttimbers.github.io/canlang/) R data package
- [`palmerpenguins`](https://allisonhorst.github.io/palmerpenguins/) R data package
- [`Lahman`](https://github.com/cdalzell/Lahman) baseball R data package
- [`datateachr`](https://github.com/UBC-MDS/datateachr) R data package
- [`taxyvr`](https://github.com/UBC-MDS/taxyvr) R data package
- [UCI Machine Learning Repository](https://archive-beta.ics.uci.edu/) for data sets

---

## 7. Provide cultural and historical context

**What?** When R things seem "odd", take time to explain why they are the way they are.

**Why?** Cultivates an appreciation, or at least an understanding, of R's quirks (compared to other programming languages). Helps prevent frustration and/or annoyances. Helps with motivation again!

***

When teaching programming with the `tidyverse`, I make sure to introduce R's history and its design purpose, so that the role and rationale for unquoted column names are clear:

>#### Now, a bit of history about R
>- An implementation of the S programming language (created at Bell Labs in 1976)
>- written in C, Fortran, and R itself
>- R was created by Ross Ihaka and Robert Gentleman (Statisticians from NZ)
>- R is named partly after the authors and partly as a play on the name of S
>- First stable beta version in 2000

---

.pull-left[
<img src="img/why-assignment-1.png" width="100%" style="display: block; margin: auto;" />

(Or why `<-` instead of `=`?)

<img src="img/why-assignment-2.png" width="100%" style="display: block; margin: auto;" />

Source: https://colinfay.me/r-assignment/
]

.pull-right[
#### Explore more:

- [History and Overview of R chapter](https://bookdown.org/rdpeng/rprogdatascience/history-and-overview-of-r.html) from R Programming for Data Science by Roger Peng
- [Introduction to *R for Data Engineers*](https://tidynomicon.github.io/tidynomicon/) by Greg Wilson
- [Wikipedia article on R](https://en.wikipedia.org/wiki/R_(programming_language))
]

---

## 8. Build a safe, inclusive and welcoming community

**What?** Put in place scaffolding and guidelines to facilitate a safe learning environment.

**Why?** Students cannot effectively learn if they do not feel safe.
***

<img src="img/dsci-100-coc.png" width="75%" style="display: block; margin: auto auto auto 0;" />

Source: https://ubc-dsci.github.io/dsci-100-student/CODE_OF_CONDUCT.html

---

<img src="img/carpentries-coc.png" width="100%" style="display: block; margin: auto;" />

Source: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html

#### Explore more:

- [Ally Skills workshop](https://frameshiftconsulting.com/ally-skills-workshop/)

---

### 9. Use checklists to focus and facilitate peer learning

**What?** Focus and structure student peer review by giving students review checklists.

**Why?** Peer review is hard and often new to many. Also, reviewing something you are new at is even harder! What to focus on?

***

##### Data analysis review checklist:

.pull-left[
<img src="img/da-review-checklist-1.png" width="85%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="img/da-review-checklist-2.png" width="85%" style="display: block; margin: auto;" />
]

Source: https://github.com/UBC-MDS/data-analysis-review-checklist

---

#### rOpenSci R package reviews utilize checklists:

.pull-left[
<img src="img/ropensci-1.png" width="80%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="img/ropensci-2.png" width="80%" style="display: block; margin: auto;" />
]

Source: https://github.com/ropensci/software-review/issues/495

#### Explore more:

- [Using rOpenSci Software Peer Review Guidelines for Teaching](https://ropensci.org/blog/2019/08/27/software-peer-review-guidelines-for-teaching/) blog post by Tiffany Timbers
- [rOpenSci review template](https://devguide.ropensci.org/reviewtemplate.html)
- [Journal of Open Source Software review checklist](https://joss.readthedocs.io/en/latest/review_checklist.html)
- [Machine Learning Reproducibility Checklist](https://www.cs.mcgill.ca/~jpineau/ReproducibilityChecklist.pdf) by Joelle Pineau
- [Checklist Manifesto](http://atulgawande.com/book/the-checklist-manifesto/) by Atul Gawande

---

## 10. Have students do projects

**What?** Have students perform a (scoped) data analysis project on a topic of their choosing.

**Why?** Motivation and exposure to the messiness of real data analysis.

***

Student project from DSCI 524 (Collaborative Software Development):

<img src="img/student-project.png" width="90%" style="display: block; margin: auto auto auto 0;" />

Source: https://github.com/UBC-MDS/rlyrics

---

Projects are central to UBC's founding data science course, STAT 545 (created by Jenny Bryan).

<img src="img/stat-545.png" width="60%" style="display: block; margin: auto auto auto 0;" />

Source: https://stat545.stat.ubc.ca/

#### Explore more:

- [Project courses in MDS](https://ubc-mds.github.io/2019-08-22-project-courses/) blog post by Tiffany Timbers and Mike Gelbart
- [Project-based learning: A review of the literature](https://doi.org/10.1177/1365480216659733) by Dimitra Kokotsaki, Victoria Menzies and Andy Wiggins

---

## 10 simple rules for teaching R for Data Science

--

1. Teach R by doing data analysis

--

2. Use participatory live coding

--

3. Give tons and tons of practice

--

4. Give tons and tons of feedback

--

5. Use tractable or toy data examples

--

6. Use real and rich, but accessible, data sets

--

7. Provide cultural and historical context

--

8. Build a safe, inclusive and welcoming community

--

9. Use checklists to focus and facilitate peer learning

--

10. Have students do projects

---

### Acknowledgements

- UBC Master of Data Science students
- UBC DSCI 100 and 310 students
- [UBC Master of Data Science teaching team](https://ubc-mds.github.io/team/) and the DSCI 100 teaching team
- [The Carpentries](https://carpentries.org/)
- [Greg Wilson](https://third-bit.com/), Software Engineer, Deep Genomics
- [Jenny Bryan](https://jennybryan.org/), Software Engineer, Posit
- [David Robinson](http://varianceexplained.org/), Director of Data Science, Heap
- [Roger Peng](https://rdpeng.org/), Professor of Statistics and Data Sciences, University of Texas, Austin
- [Lex Nederbragt](https://lexnederbragt.com/), Senior Lecturer, University of Oslo
- [Sean Kross](https://seankross.com/), Data Staff Scientist, Fred Hutch Data Science Lab
- [Colin Fay](https://colinfay.me/), Data Scientist & R Hacker, ThinkR
- And many more educators in the R community!!! There are too many to list!

---

class: inverse, center, middle

## 10 simple rules for teaching R for Data Science

#### talk slides: *[https://bit.ly/timbers-cascadia-r-conf-2022](https://bit.ly/timbers-cascadia-r-conf-2022)*