These notes have evolved from a presentation at Li & Fung in the Fall of 2016. This revision was inspired by Allan Miller and me leading a discussion on “How to Learn R” at the Berkeley R Beginners meetup.
A quick start to R and data science.
In the olden days, 10 years ago, I spent a lot of time building links and making tricks & tips lists. Now I just point to RStudio resources and a couple of my favorite books.
The classical way to learn R is to work through all the basic ideas in “base R” and then learn the modern enhancements and packages. If you like this approach, the just-released 832-page tome The Book of R: A First Course in Programming and Statistics will appeal to you. It is actually a pretty good book. But…
My preferred way to teach R today is to leverage the new way of working in R, much of which has been developed by folks now on the RStudio team. In particular, Hadley Wickham and his students created the “tidyverse” (formerly known as the “Hadleyverse”, but Hadley is getting modest in his old age).
Jim’s Principles for Getting Started in R
- Use RStudio!!!
- Live in the tidyverse
- Use RStudio’s project framework
- Invest in learning ggplot2 concepts (don’t start with qplot()!)
- Deliver your results via RMarkdown
- And use R Notebooks for development and EDA.
- Use git & GitHub. See:
  - Josh’s Version Control with Git (& SVN)
  - Git/GitHub chapter in Hadley’s R Packages
  - Jenny Bryan’s complete notes http://happygitwithr.com/ from her useR! 2016 tutorial
- Package up your tools
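On the ggplot2 principle above: here is a minimal sketch of what "learning the concepts" looks like, as opposed to reaching for the qplot() shortcut. A plot is built from explicit pieces (data, aesthetic mappings, layers, facets, labels) that are added together. The choice of the mpg dataset (it ships with ggplot2) and of these particular layers is mine, purely for illustration.

```r
library(ggplot2)

# Data + aesthetic mappings shared by every layer
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  # Layer 1: points, with colour mapped to vehicle class for this layer only
  geom_point(aes(colour = class)) +
  # Layer 2: a straight-line fit drawn over the points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Facets: one panel per drive train type
  facet_wrap(~ drv) +
  # Labels: human-readable axis and legend titles
  labs(
    x = "Engine displacement (litres)",
    y = "Highway miles per gallon",
    colour = "Vehicle class"
  )
```

Printing `p` at the console, or in an R Notebook chunk, draws the plot. Because each piece is a separate `+` term, you can swap a geom or drop the facets without rewriting the rest, which is the habit worth building early.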
Step-by-step guide
- Load R
- Load RStudio Desktop (IDE)
- See RStudio’s guide to on-line learning, in particular
  - Their archive of excellent webinars, and
  - Garrett & friends have great cheat sheets
- Read Garrett & Hadley’s R for Data Science, now complete & in print (AKA The Tidyverse Guide)
  - Hadley just finished the Graphics for Communication chapter, in which he mentions the ggplot2 extensions site.
- Practice on real data! Perhaps while finishing R for Data Science. You now know enough to do real work.
- Catch up with ggplot2 developments
- Winston’s Cookbook for R (with link to his R Graphics Cookbook)
- Read & follow advice in Hadley’s R Packages book
- For the brave programmers: Hadley’s Advanced R
Other Resources
- datascience+ R tutorials
- Cran Task Views to help you find the right package(s) for your area of work
- R tagged questions on stackoverflow – for when you are stuck
- R-bloggers – first place to look to check for ideas
- R Journal – Open Access R Journal
Favorite Books
(In addition to the above; these predate the tidyverse)
- Norm Matloff’s The Art of R Programming: A Tour of Statistical Software Design
  - Really good if you have programming background in some other language.
- Nina & John’s Practical Data Science with R
  - Emphasis on practical. How to do data science in the real world.
- Robert’s R in Action: Data Analysis and Graphics with R 2nd Edition
Tricks & Tips
dplyr & tidyr Quick Start
I like Brad Boehmke’s Data Processing with dplyr & tidyr
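For a taste of what a dplyr & tidyr workflow looks like, here is a small sketch of my own (it is not taken from Brad's tutorial). The `sales_wide` table and its numbers are made up for illustration. It uses `pivot_longer()`, which needs tidyr 1.0.0 or later; older tutorials do the same reshaping with `gather()`.

```r
library(dplyr)
library(tidyr)

# Hypothetical "wide" table: one column per year
sales_wide <- tibble(
  region = c("East", "West"),
  `2015` = c(10, 20),
  `2016` = c(15, 25)
)

# tidyr: reshape to "long" form, one row per region-year
sales_long <- sales_wide %>%
  pivot_longer(cols = -region, names_to = "year", values_to = "sales")

# dplyr: group, summarise, and sort
sales_summary <- sales_long %>%
  group_by(region) %>%
  summarise(total = sum(sales), mean_sales = mean(sales)) %>%
  arrange(desc(total))

print(sales_summary)
```

The pattern to notice is tidy first, then transform: once each row is a single observation, the dplyr verbs chain together with `%>%` and read top to bottom.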
Google Drive Access
- Kay’s blog post http://thebiobucket.blogspot.com/2014/03/download-all-documents-from-google.html
  - Note RGoogleDrive link is wrong
  - also search his blog – does a lot of neat stuff w/ Drive
Google Sheets in R
Use the googlesheets package (duh!). See Jenny’s GitHub: https://github.com/jennybc/googlesheets
Interesting Projects
- the cloudyr project – making R cloudier!
- sparklyr – RStudio’s connector to Spark, which provides a complete dplyr backend & more!