Introduction to R

What is R?

R is a versatile, open source programming/scripting language and an important tool for computational statistics, visualization and data science. It was inspired by the programming language S.

  • Open source software under GPL
  • Comparable, and often superior, to commercial alternatives. R has over 7,000 user-contributed packages at this time and is widely used in both academia and industry
  • Available on all platforms
  • Not just for statistics, but also general purpose programming
  • Both object-oriented and functional
  • Large and growing community of peers
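
As a small illustration of the "object-oriented and functional" point, the sketch below uses only base R. It passes a function as an argument (functional style) and dispatches a method on an object's class (R's S3 system). The names `square`, `area`, `area.circle` and `c1` are made up for this example and are not part of the lesson material.

```r
# Functional style: functions are ordinary values that can be
# passed to other functions such as sapply().
square <- function(x) x^2
squares <- sapply(1:3, square)   # apply `square` to each element of 1:3

# Object-oriented style (S3): a generic function chooses which
# method to run based on the class attribute of its argument.
area <- function(shape, ...) UseMethod("area")      # hypothetical generic
area.circle <- function(shape, ...) pi * shape$r^2  # method for class "circle"

c1 <- structure(list(r = 2), class = "circle")      # hypothetical object
circle_area <- area(c1)                             # dispatches to area.circle
```

Both idioms come up often in everyday R code: the `apply` family of functions takes other functions as arguments, and familiar generics such as `print` and `summary` behave differently depending on the class of the object they receive.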

The Big Data problem

Your major thesis project involves studying the effects of a gene called GGL, which appears to play a large role in addiction to the internet. To evaluate the gene's function you are comparing two different groups of mice: one set of mice with a normal GGL gene (wildtype) and one set in which the GGL gene is turned off (knockout). Within each set of mice you are also asked to evaluate two different cell types (A and B). The samples are sent to a sequencing facility and rather large data matrices are returned to you. Every time you open them in Excel, your laptop freezes and crashes. After much frustration you decide to use R to analyze the data.

Objectives

In this lesson, students will learn to use R within the RStudio environment. They will learn basic R syntax and become familiar with important data structures and how to work with them.

Topics

  1. Getting Started
  2. Syntax and Data Structures
  3. Dataframes
  4. Manipulating data
  5. Re-organizing data
  6. Analyzing and plotting data

Other Resources