R

Welcome to R

R is a powerful, specialized programming language and environment for statistical computing and graphics. Developed in the early 1990s, it has become the tool of choice for statisticians, data analysts, and researchers worldwide. R is highly extensible through user-contributed packages, offering capabilities for data manipulation, complex statistical modeling, machine learning, and the creation of publication-quality visualizations. Its strong focus on data analysis, combined with an active community, makes R an indispensable skill for anyone working deeply with data.

Introduction to R

How to install R and RStudio and write your first "Hello, World!" program.

R is one of the most popular languages for statistical analysis and data visualization. Its ecosystem, especially with RStudio, provides an excellent environment for both beginners and experts.

Step 1: Download and Install R

1. Visit the Official Website: Go to The R Project website and download R for your operating system (Windows, macOS, or Linux).

2. Run the Installer: Follow the setup instructions for your OS.

Step 2: Install RStudio (Recommended IDE)

RStudio is an Integrated Development Environment (IDE) that makes working with R much easier.

1. Download RStudio Desktop from the RStudio website.

2. Install it on your system.

Step 3: Write and Run Your First R Script

1. Open RStudio.

2. Go to File → New File → R Script.

3. Type the following code in the script editor:

print("Hello, World!")

4. To run the code, you can:

- Select the line and press Ctrl+Enter (Cmd+Return on Mac).

- Click the 'Run' button in the script editor.

5. You should see the output in the Console pane below:

> print("Hello, World!")
[1] "Hello, World!"

Congratulations! You have successfully installed R and run your first script. 🎉

For more details, visit the official R documentation at The R Manuals. Happy coding! 🚀

Syntax of R

1. R Uses Curly Braces for Code Blocks

R uses curly braces {} to define blocks of code, such as in functions, loops, and conditionals. While indentation is not enforced, it is highly recommended for readability.

Correct Example:
if (5 > 3) {
  print("Five is greater than three")  # Code block inside braces
}
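Without Braces (valid for a single statement):
if (5 > 3)
  print("This runs without error")  # A single statement does not need braces

Braces are optional when the body is a single statement, but only that one statement belongs to the if. Always use braces for multi-statement blocks, and keep else on the same line as the closing brace, to avoid errors and improve code clarity.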

2. One-Based Indexing

R follows one-based indexing, meaning the first element of a vector, list, or other data structure is at index 1, not 0.

Example:
fruits <- c("apple", "banana", "cherry")
print(fruits[1])  # Outputs: "apple" (first element)
print(fruits[3])  # Outputs: "cherry" (third element) 
Incorrect Example (Returns empty vector):
print(fruits[0])  # Returns character(0), an empty vector

Since R starts counting from 1, accessing index 0 does not give an error but returns an empty vector of the same type.

3. Comments in R

Comments are used to explain code and are ignored by R. R only supports single-line comments, which start with a #.

Single-Line Comment:
# This is a single-line comment
print("Hello, World!")  # This is an inline comment

For multi-line explanations, you must start each line with a #. There is no native multi-line comment syntax in R.

4. Case Sensitivity

R is a case-sensitive language. Variable names, function names, and keywords must be used with consistent capitalization.

Example:
myVar <- 5
MyVar <- 10
print(myVar)  # Outputs: 5
print(MyVar)  # Outputs: 10 (different variable)
Incorrect Example:
Print("Hello")  # Error: could not find function "Print"

Function names must be typed exactly as they are defined. print() is all lowercase, but other built-in functions, such as Sys.time(), contain capital letters.

5. Variables and Dynamic Typing

R is dynamically typed. You do not need to declare a variable's type; it is determined by the value you assign.

Example:
x <- 10       # Whole number, but stored as a double
class(x)     # "numeric" (numbers without an L suffix are doubles)
y <- 3.14     # Float/Double
class(y)     # "numeric"
z <- "Hello"  # String/Character
class(z)     # "character"
Reassigning Different Data Types:
a <- 5        # Initially numeric
a <- "R"      # Now reassigned as a character
print(a)     # Outputs: "R"

Because R uses dynamic typing, a variable's type can change during execution.
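
Since a name can hold different types over time, code sometimes needs to check what it currently holds. The sketch below uses the base functions is.numeric() and is.character(); describe_type() is a hypothetical helper written for this example, not part of R.

```r
# describe_type() is a hypothetical helper, not a base R function.
describe_type <- function(value) {
  if (is.numeric(value)) {
    "number"
  } else if (is.character(value)) {
    "text"
  } else {
    "other"
  }
}

a <- 5
first_check <- describe_type(a)   # a currently holds a number
a <- "R"
second_check <- describe_type(a)  # the same name now holds text
print(c(first_check, second_check))
```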

Conclusion

Understanding R's basic syntax is essential for writing correct and efficient code. Key takeaways include:

  • R uses {} for code blocks and encourages good indentation.
  • One-based indexing means the first element is at index 1.
  • Comments are single-line and start with #.
  • R is case-sensitive.
  • R supports dynamic typing, allowing variables to change types.

print and cat

The print() and cat() functions in R are used to display output. print() is the standard function for displaying any R object, while cat() is better for concatenating and outputting multiple items in a more controlled format.

1. Basic print() Usage

The print() function displays a single R object, such as a string, number, or variable.

print("Hello, World!")  # Outputs: [1] "Hello, World!"
print(42)               # Outputs: [1] 42

2. Using cat() for Concatenated Output

The cat() function is useful for outputting multiple items. It converts its arguments to character strings, concatenates them, and prints them. Unlike print(), it does not add a newline by default or include vector indices like [1].

cat("Hello", "R", "World", "!")  # Outputs: Hello R World !

You can control the separator using the sep parameter.

cat("Hello", "R", sep = "-")  # Outputs: Hello-R

3. Using the sep Parameter

The sep parameter in cat() defines the separator between items.

cat("apple", "banana", "cherry", sep = ", ")
# Outputs: apple, banana, cherry

4. Printing Variables

You can print variables directly with print() or concatenate them with cat().

name <- "Alice"
age <- 25
print(name)  # Outputs: [1] "Alice"
cat("Name:", name, "Age:", age, "\n")
# Outputs: Name: Alice Age: 25

For more complex formatting, use sprintf() or paste().

cat(sprintf("Name: %s, Age: %d\n", name, age))
# Outputs: Name: Alice, Age: 25

5. Printing Special Characters

Use escape sequences for formatting:

cat("Hello\nWorld!")  # \n adds a new line
cat("This is a tab:\tR")  # \t adds a tab space

Conclusion

The print() and cat() functions are fundamental for displaying output in R. Key takeaways:

  • Use print(object) for standard output of any R object.
  • Use cat(...) for concatenating and controlling the output format with sep.
  • cat() does not add a newline by default; use \n when needed.
  • Use sprintf() for precise, C-style string formatting.

Arithmetic Operators in R

Arithmetic operators perform mathematical calculations. R supports addition, subtraction, multiplication, division, exponentiation, modulus, and integer division.

Examples:

a <- 15
b <- 4

cat("a + b =", a + b, "\n")   # 19 (addition)
cat("a - b =", a - b, "\n")   # 11 (subtraction)
cat("a * b =", a * b, "\n")   # 60 (multiplication)
cat("a / b =", a / b, "\n")   # 3.75 (true division)
cat("a %/% b =", a %/% b, "\n")  # 3 (integer division)
cat("a %% b =", a %% b, "\n")   # 3 (modulus)
cat("a ^ b =", a ^ b, "\n")   # 50625 (exponentiation, also a ** b)

Arithmetic operators are vectorized, meaning they work element-wise on vectors.

x <- c(1, 2, 3)
y <- c(4, 5, 6)
print(x + y)  # Outputs: 5 7 9

Comparison Operators in R

Comparison operators compare two values and return a logical (boolean) result, TRUE or FALSE. They include equality, inequality, and greater/less than comparisons.

Examples:

x <- 7
y <- 10

print(x == y)   # FALSE (equal to)
print(x != y)   # TRUE (not equal to)
print(x > y)    # FALSE
print(x < y)    # TRUE
print(x >= 7)   # TRUE
print(y <= 7)   # FALSE

Comparison results are often used in if statements or for subsetting data.

vec <- 1:5
print(vec > 3)  # Outputs: FALSE FALSE FALSE TRUE TRUE
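
As a small illustration of the subsetting use mentioned above, a logical vector can be placed inside square brackets to keep only the matching elements. The variable names here are chosen for the example.

```r
vec <- 1:5
is_large <- vec > 3            # logical vector, one value per element
large_values <- vec[is_large]  # keeps only positions where is_large is TRUE
count_large <- sum(is_large)   # TRUE counts as 1, FALSE as 0
print(large_values)
print(count_large)
```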

Logical Operators in R

Logical operators combine logical values and return TRUE or FALSE. The main operators are & (and), | (or), and ! (not).

Examples:

is_sunny <- TRUE
is_warm <- FALSE

print(is_sunny & is_warm)   # FALSE (both must be TRUE)
print(is_sunny | is_warm)    # TRUE  (at least one is TRUE)
print(!is_warm)              # TRUE  (negates the value)

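For control flow (e.g., in if conditions), use the short-circuiting operators && and ||. They expect length-one operands and skip the right-hand side when the left-hand side already decides the result. Since R 4.3.0, passing a vector longer than one element to them is an error.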

if (x > 0 && y > 0) {
  print("Both are positive.")
}
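
The difference between the element-wise and short-circuiting forms is easier to see side by side. This is a minimal sketch with made-up data; the stop() call is only there to show that the right-hand side is skipped.

```r
temps <- c(12, 25, 31)
humid <- c(TRUE, FALSE, TRUE)

# & compares element by element and returns a vector
warm_and_humid <- temps > 20 & humid
print(warm_and_humid)

# && works on single values and stops as soon as the answer is known;
# the stop() call on the right is never evaluated here
short_circuit <- FALSE && stop("this is never reached")
print(short_circuit)
```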

Bitwise Operators in R

Bitwise operators work on the binary representations of integers. They are less common in R for data analysis but are available for low-level programming. R uses functions like bitwAnd() for these operations.

Examples:

a <- as.raw(6)   # binary: 110
b <- as.raw(3)   # binary: 011

print(as.integer(a & b))  # 2 (binary 010, AND)
print(as.integer(a | b))  # 7 (binary 111, OR)
print(as.integer(xor(a, b))) # 5 (binary 101, XOR)

For integer inputs, use the bitw family of functions.

print(bitwAnd(6, 3))  # 2
print(bitwOr(6, 3))   # 7
print(bitwShiftL(1, 2)) # 4 (left shift)
print(bitwShiftR(8, 1)) # 4 (right shift)
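
One common use of these functions is packing several on/off flags into a single integer. The following is only a sketch: the READ, WRITE, and EXEC constants are hypothetical names invented for the example.

```r
# Hypothetical permission flags, one bit each
READ  <- 1L  # binary 001
WRITE <- 2L  # binary 010
EXEC  <- 4L  # binary 100

perms <- bitwOr(READ, WRITE)             # combine flags: binary 011
can_write <- bitwAnd(perms, WRITE) != 0  # test whether a bit is set
can_exec  <- bitwAnd(perms, EXEC) != 0
toggled <- bitwXor(perms, WRITE)         # flip the WRITE bit off again
print(c(perms, toggled))
print(c(can_write, can_exec))
```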

Assignment Operators in R

The main assignment operator in R is <-. The equals sign = can also be used, but <- is the conventional choice. R has no native compound assignment operators such as +=; update a variable by reassigning it.

Examples:

x <- 10   # standard assignment
x = 10    # alternative assignment (less common in scripts)
print(x)

# Reassignment with arithmetic
x <- x + 5  # x is now 15
x <- x - 3  # x is now 12
x <- x * 2  # x is now 24

Base R has no syntax for assigning several variables at once. The zeallot package adds it through its %<-% operator.

library(zeallot)
c(a, b) %<-% c(4, 5)  # a is 4, b is 5 (requires the zeallot package)
# Or, more simply:
a <- 4
b <- 5

Integers in R

In R, whole numbers are usually stored as the numeric type (double-precision floating point). To explicitly create an integer, use the L suffix or convert with as.integer().

1. Creating Integers and Numerics

a <- 10     # This is a 'numeric' by default
class(a)   # "numeric"
b <- 10L    # This is an 'integer'
class(b)   # "integer"
c <- as.integer(10)  # Another way to create an integer

2. Integer Operations

Integers and numerics work together in arithmetic operations. R will often return a numeric result.

print(5L + 3L)   # 8 (integer)
print(5L / 2L)  # 2.5 (numeric, even with integers)
print(5L %/% 2L) # 2 (integer division)
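
To see which operations keep the integer type and which promote to numeric, class() can be applied to each result. The variable names are chosen for this sketch; .Machine$integer.max is a base R constant.

```r
sum_class      <- class(5L + 3L)    # integer + integer stays integer
division_class <- class(5L / 2L)    # / always produces a double
mixed_class    <- class(5L + 2.5)   # mixing with a double promotes to double
int_div_class  <- class(5L %/% 2L)  # %/% on integers stays integer
print(c(sum_class, division_class, mixed_class, int_div_class))

# Integers are 32-bit; going past the maximum yields NA with a warning
too_big <- suppressWarnings(.Machine$integer.max + 1L)
print(too_big)
```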

3. Type Conversion

num <- as.integer("123")
print(num)  # 123 (integer)

num <- as.integer(3.1415)
print(num)  # 3 (truncates towards zero)

Numeric (Doubles/Floats) in R

The numeric data type in R is used for real numbers (floats/doubles). It is the default type for numbers.

1. Creating Numerics

a <- 3.1415
b <- -0.5
c <- 100.0  # Even this is numeric
print(c(a, b, c))
class(a) # "numeric"

2. Numeric Operations

print(2.5 + 1.5)  # 4
print(5.0 / 2)   # 2.5
print(3.1415 ^ 2) # Exponentiation
print(7.3 %% 3)   # Modulus (remainder)

3. Type Conversion and Special Values

num <- as.numeric("123.45")
print(num)  # 123.45

num <- as.numeric(TRUE)  # TRUE coerces to 1
print(num)  # 1

# Special numeric values
print(Inf - Inf)  # NaN (Not a Number)
print(1 / 0)      # Inf
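
Special values can be detected with base R predicates, and because doubles are binary approximations, comparing them with == can give surprising answers. The sketch below shows both points with illustrative variable names.

```r
values <- c(1, Inf, -Inf, NaN, NA)
print(is.finite(values))  # TRUE only for ordinary numbers
print(is.nan(values))     # TRUE only for NaN
print(is.na(values))      # TRUE for both NaN and NA

# Doubles are binary approximations, so exact equality can surprise
exact_match <- (0.1 + 0.2) == 0.3
close_match <- isTRUE(all.equal(0.1 + 0.2, 0.3))
print(c(exact_match, close_match))
```

When comparing computed doubles, all.equal() wrapped in isTRUE() is usually a safer test than ==.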

Character Strings in R

Strings in R are stored in the character vector type. They can be defined using either single (') or double (") quotes.

1. Creating Character Strings

a <- 'Hello, World!'
b <- "R is great"
# Both are equivalent
print(a)
print(b)

R has no triple-quote syntax, but an ordinary quoted string may span several lines, and the line breaks become part of the string. You can also create a vector of strings or use the paste() function with a newline character.

multiline <- paste("This is", "a multi-line", "string", sep = "\n")
cat(multiline)

2. String Operations

String concatenation is done with paste().

print(paste("Hello", "World", sep = " "))  # Concatenation
print(paste0("Hello", "World"))     # Concatenation with no separator

# Get string length with nchar()
print(nchar("Hello"))  # 5

3. String Functions

text <- "hello world"
print(toupper(text))      # "HELLO WORLD"
print(tolower("HELLO"))  # "hello"
print(sub("world", "earth", text))  # Substitute first occurrence: "hello earth"

String Functions: Trimming, Splitting, and Replacing

R has many useful functions for string manipulation, often found in base R or the stringr package.

Trimming with trimws()

text <- "   hello world   "
print(trimws(text))            # "hello world"
print(trimws(text, "left"))   # "hello world   "
print(trimws(text, "right"))  # "   hello world"

Splitting Strings with strsplit()

line <- "apple,banana,cherry"
print(strsplit(line, ","))   # Returns a list: [[1]] "apple" "banana" "cherry"

words <- "one two three"
print(strsplit(words, " ")[[1]])  # Extract the vector: "one" "two" "three"

Replacing Text with sub() and gsub()

sentence <- "I like cats and cats are nice"
print(sub("cats", "dogs", sentence))  # Replaces only first: "I like dogs and cats are nice"
print(gsub("cats", "dogs", sentence)) # Replaces all: "I like dogs and dogs are nice"

Joining Strings with paste()

parts <- c("2025", "09", "26")
date <- paste(parts, collapse = "-")
print(date)   # "2025-09-26"

Common Pitfalls

  • trimws() only removes whitespace by default.
  • strsplit() returns a list; use [[1]] to get the first element as a vector.
  • sub() replaces only the first occurrence; gsub() replaces all.
  • paste() with collapse turns a vector into a single string.
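
Two of these pitfalls can be sketched in a few lines: strsplit() returns one list element per input string, and its split pattern is treated as a regular expression unless fixed = TRUE is given. The data here is made up for illustration.

```r
rows <- c("a,b", "c,d,e")
fields <- strsplit(rows, ",")   # one list element per input string
print(lengths(fields))          # number of pieces in each element

# The split pattern is a regular expression by default, so "." would match
# every character; fixed = TRUE treats it as a literal dot
dotted <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]
print(dotted)
```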

String Formatting & Case Methods

R provides several ways to format strings and combine them with variables.

Formatting with sprintf()

# %s for strings, %d for integers, %f for floats
template <- "The sum of %d and %d is %d"
msg <- sprintf(template, 2, 4, 2+4)
print(msg)   # The sum of 2 and 4 is 6

# Control decimal places with %.2f
print(sprintf("Pi is approximately %.2f", pi)) # "Pi is approximately 3.14"

Case Conversion

text <- "Hello World"
print(toupper(text))   # "HELLO WORLD"
print(tolower(text))   # "hello world"

The stringr Package

The stringr package provides a more consistent and user-friendly set of string functions.

library(stringr)
str_to_upper("hello")  # "HELLO"
str_replace_all("a-a-a", "a", "b") # "b-b-b" (like gsub)

Common Pitfalls

  • Using wrong format specifiers in sprintf() can cause errors or unexpected output.
  • toupper() and tolower() are base R functions, not methods on the string object.

Vectors

Vectors are the fundamental data structure in R. They are one-dimensional arrays that can hold numeric, character, or logical data, but all elements must be of the same type. They are created using the c() function (combine).

1. Creating a Vector

num_vec <- c(1, 2, 3, 4, 5)       # Numeric vector
char_vec <- c("a", "b", "c")      # Character vector
log_vec <- c(TRUE, FALSE, TRUE)   # Logical vector
print(num_vec)
print(char_vec)

2. Vectorized Operations

A key feature of R is that most operations are vectorized, meaning they are applied to each element of the vector without the need for explicit loops.

v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6)
print(v1 + v2)  # 5, 7, 9
print(v1 * 2)  # 2, 4, 6

3. Vector Recycling

When performing operations on two vectors of different lengths, R will recycle the shorter vector to match the length of the longer one.

short <- c(1, 2)
long <- c(10, 20, 30, 40)
print(short + long)  # 11, 22, 31, 42 (short is recycled to c(1,2,1,2))

Warning: If the longer vector's length is not a multiple of the shorter one's length, R still recycles the shorter vector but issues a warning, and the result is rarely what you intended.
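
The uneven case can be tried safely as follows. This sketch silences the warning once to inspect the result, then captures the warning message with tryCatch() instead.

```r
# Lengths 3 and 2: 3 is not a multiple of 2
uneven <- suppressWarnings(c(1, 2, 3) + c(10, 20))
print(uneven)  # the shorter vector was recycled as c(10, 20, 10)

# Capture the warning instead of silencing it
warning_text <- tryCatch(
  c(1, 2, 3) + c(10, 20),
  warning = function(w) conditionMessage(w)
)
print(warning_text)
```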

Indexing Vectors

You access elements of a vector using square brackets []. R uses 1-based indexing, so the first element is at position 1.

Basic Indexing

fruits <- c("apple", "banana", "cherry")
print(fruits[1])   # Outputs: "apple" (first element)
print(fruits[3])   # Outputs: "cherry" (third element)

Negative Indexing

Negative indices are used to exclude elements.

print(fruits[-1])  # All elements except the first: "banana" "cherry"
print(fruits[-c(1,3)])  # Excludes first and third: "banana"

Logical Indexing

You can use a logical vector to select elements where the condition is TRUE.

numbers <- c(10, 20, 30, 40)
print(numbers[numbers > 25])  # Outputs: 30, 40

Common Pitfalls

  • Accessing an index of 0 returns an empty vector.
  • Accessing an index beyond the vector's length returns NA.
  • Negative and positive indices cannot be mixed in a single subset operation.
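
The three pitfalls above can be checked directly. In this sketch the error from mixing signs is caught with tryCatch() and replaced by a message written for the example.

```r
fruits <- c("apple", "banana", "cherry")

zero_index <- fruits[0]   # empty character vector
past_end   <- fruits[5]   # NA, because there is no fifth element
print(length(zero_index))
print(past_end)

# Mixing signs is an error; tryCatch() turns it into a message here
mixed <- tryCatch(
  fruits[c(-1, 2)],
  error = function(e) "error: cannot mix positive and negative indices"
)
print(mixed)
```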

Slicing Vectors

Slicing allows you to extract a contiguous portion of a vector using the syntax vector[start:end]. Both start and end are inclusive.

Basic Slicing

numbers <- c(0, 1, 2, 3, 4, 5)
print(numbers[2:4])    # Outputs: 1, 2, 3 (elements 2 through 4)
print(numbers[1:3])    # Outputs: 0, 1, 2 (first three elements)
print(numbers[4:6])    # Outputs: 3, 4, 5 (from 4th to last)

Using Sequences

print(numbers[seq(1, 5, by=2)])  # Outputs: 0, 2, 4 (every 2nd element)
print(numbers[c(1,3,5)])        # Outputs: 0, 2, 4 (specific indices)

Reversing a Vector

print(rev(numbers))  # Outputs: 5, 4, 3, 2, 1, 0
print(numbers[6:1])  # Also reverses

Common Pitfalls

  • The end index is inclusive, unlike Python where it's exclusive.
  • Slicing returns a new vector; it does not modify the original.

Vector-Specific Functions & Methods

R provides many built-in functions for creating, manipulating, and summarizing vectors.

Useful Vector Creation Functions

seq1 <- 1:5          # Creates a sequence: 1,2,3,4,5
seq2 <- seq(1, 10, by=2) # 1,3,5,7,9
rep1 <- rep(1, times=5)  # 1,1,1,1,1 (repeat)
rep2 <- rep(1:2, each=2) # 1,1,2,2

Summary Functions

nums <- c(1, 2, 3, 4, NA, 6)  # Note the NA (missing value)
print(length(nums))   # Number of elements (6)
print(sum(nums, na.rm = TRUE))   # Sum, removing NA (16)
print(mean(nums, na.rm = TRUE))  # Mean, removing NA (3.2)
print(max(nums, na.rm = TRUE))   # Maximum value (6)
print(min(nums, na.rm = TRUE))   # Minimum value (1)

Logical Vector Functions

log_vec <- c(TRUE, FALSE, TRUE)
print(all(log_vec))  # FALSE (are all values TRUE?)
print(any(log_vec))  # TRUE (is any value TRUE?)

Common Pitfalls

  • Many functions return NA if the vector contains missing values; use na.rm = TRUE to ignore them.
  • The length() function returns the total number of elements, not the count of TRUE values.
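
A short sketch of both points: a single NA makes sum() return NA unless it is removed, and sum() on a logical vector counts the TRUE values while length() counts every element.

```r
nums <- c(1, 2, 3, 4, NA, 6)
print(sum(nums))            # NA, because one value is missing
missing_count <- sum(is.na(nums))
print(missing_count)

flags <- c(TRUE, FALSE, TRUE)
true_count <- sum(flags)    # TRUE counts as 1
print(c(length(flags), true_count))
```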

Lists

Lists are versatile R objects that can contain elements of different types (e.g., numbers, strings, vectors, even other lists). They are created with the list() function.

1. Creating a List

my_list <- list(1, "a", TRUE, c(2, 5, 7))
print(my_list)

2. Accessing List Elements

Elements can be accessed by position using single brackets [] (which returns a list) or double brackets [[]] (which returns the element itself).

print(my_list[2])     # Returns a list containing "a"
print(my_list[[2]])   # Returns the element "a" itself

# Accessing a vector inside the list
print(my_list[[4]][2]) # Accesses the 2nd element of the vector in the 4th list item: 5
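
The difference between the two bracket forms shows up clearly when the class of each result is inspected. The names as_list and as_value are chosen for this sketch.

```r
my_list <- list(1, "a", TRUE, c(2, 5, 7))

as_list  <- my_list[2]     # still a list, of length 1
as_value <- my_list[[2]]   # the character value itself
print(class(as_list))
print(class(as_value))

# Single brackets can select several elements at once
subset_list <- my_list[c(1, 4)]
print(length(subset_list))
```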

3. Named Lists

List elements can be named, which allows for access with the $ operator.

named_list <- list(name = "Alice", age = 30, scores = c(85, 92, 78))
print(named_list$name)    # "Alice"
print(named_list[["age"]]) # 30
print(named_list$scores[2]) # 92

Data Frames

Data frames are the most important data structure for data analysis in R. They are used to store tabular data, where each column can be a different type (e.g., numeric, character), but all elements within a column must be the same type. This is analogous to a spreadsheet or a Python pandas DataFrame.

1. Creating a Data Frame

df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  score = c(85, 92, 78)
)
print(df)

2. Accessing Data Frame Elements

You can access columns, rows, and individual elements using various methods.

print(df$name)        # Access the 'name' column as a vector
print(df[["age"]])    # Another way to access a column
print(df[2, ])        # Get the second row
print(df[2, 3])       # Get the element in the 2nd row, 3rd column
print(df[, "score"])   # Get the 'score' column

3. Adding and Modifying Columns

df$new_col <- c("A", "B", "C")  # Add a new column
df$age <- df$age + 1              # Modify an existing column
print(df)

Data Frame Functions

R provides many functions to inspect and manipulate data frames.

Inspecting Data Frames

print(dim(df))        # Dimensions (rows, columns)
print(nrow(df))       # Number of rows
print(ncol(df))       # Number of columns
print(names(df))      # Column names
str(df)               # Structure of the data frame (str() prints its own output)
print(summary(df))    # Summary statistics for each column

Subsetting Data Frames

# Select specific columns
print(df[, c("name", "score")])

# Filter rows based on a condition
print(df[df$score > 80, ])

# Use the subset() function
print(subset(df, age >= 30 & score < 90))

Common Pitfalls

  • Using $ for a column name that doesn't exist returns NULL.
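  • Selecting a single column with df[, "score"] returns a vector, because drop = TRUE is the default; use df["score"] or df[, "score", drop = FALSE] to keep a data frame. Tibbles, by contrast, always stay tibbles under [.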
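
What each subsetting form returns for a base data.frame can be checked with is.data.frame(). This is a minimal sketch using a small made-up data frame.

```r
df <- data.frame(name = c("Alice", "Bob"), score = c(85, 92))

keeps_frame  <- is.data.frame(df["score"])                  # list-style subsetting
drops_vector <- is.data.frame(df[, "score"])                # matrix-style, drop = TRUE
stays_frame  <- is.data.frame(df[, "score", drop = FALSE])
print(c(keeps_frame, drops_vector, stays_frame))

missing_col <- df$not_a_column   # no such column
print(is.null(missing_col))
```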

Variables

Variables in R are used to store data. They are dynamically typed, meaning the type is determined by the assigned value. Variable names can contain letters, numbers, dots, and underscores. A name must start with a letter, or with a dot that is not followed by a digit; it cannot start with a number or an underscore.

1. Creating Variables

a <- 10
b <- "Hello"
c <- 3.14
print(a)
print(b)
print(c)

2. Assigning Multiple Variables

# R does not have built-in multiple assignment like Python.
# You can assign separately or use a list:
a <- 4
b <- 5

# Or, for multiple return values from a function:
values <- list(4, 5)
a <- values[[1]]
b <- values[[2]]

3. Checking and Converting Variable Types

x <- "123"
print(class(x))      # "character"

# Type conversion
num <- as.numeric(x)
print(class(num))    # "numeric"
print(num)           # 123

4. The Environment

You can see all defined variables in the environment using ls().

ls()  # Lists all variables in the current environment

Conditionals: if, else and ifelse

Conditional statements in R allow you to execute different code blocks based on logical conditions. The basic structure uses if and else. For vectorized conditional checks, use ifelse().

1. Basic if Statement

x <- 10
if (x > 5) {
  print("x is greater than 5")
}

2. if-else Statement

x <- 3
if (x > 5) {
  print("x is greater than 5")
} else {
  print("x is not greater than 5")
}

3. if-else-if Chain

x <- 5
if (x > 5) {
  print("x is greater than 5")
} else if (x == 5) {
  print("x is exactly 5")
} else {
  print("x is less than 5")
}

4. Vectorized ifelse()

The ifelse() function is useful for applying a conditional check to each element of a vector.

vec <- 1:5
result <- ifelse(vec > 3, "High", "Low")
print(result)  # "Low" "Low" "Low" "High" "High"

for-loop (Basics)

A for loop in R repeats a block of code for each element in a sequence (like a vector or list).

Looping Over a Vector

fruits <- c("apple", "banana", "cherry")
for (fruit in fruits) {
  print(paste("I like", fruit))
}

Looping with an Index

for (i in seq_along(fruits)) {  # seq_along() is safe even when fruits is empty
  print(paste("Fruit", i, "is", fruits[i]))
}

Using break and next

Use break to exit a loop early and next to skip to the next iteration (similar to continue in Python).

for (i in 1:5) {
  if (i == 3) {
    next  # skip number 3
  }
  if (i == 5) {
    break  # stop the loop at 5
  }
  print(i)
}
# Prints 1, 2, 4

Common Pitfalls

  • Looping is often not the most efficient way to work with data in R; vectorized operations are preferred.
  • The loop variable (e.g., fruit) is not limited to the loop's scope; it remains in the environment after the loop finishes.
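
Both points can be seen in a short sketch: the loop variable is still defined once the loop ends, and the same strings can be built without a loop because paste() is vectorized.

```r
fruits <- c("apple", "banana", "cherry")

for (fruit in fruits) {
  message_text <- paste("I like", fruit)
}
print(fruit)  # still defined after the loop, holding the last element

# The same strings without a loop: paste() is vectorized
all_messages <- paste("I like", fruits)
print(all_messages)
```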

while-loop (Basics)

A while loop repeats a block of code as long as its condition is TRUE. The number of repetitions is not fixed in advance.

Simple Counting Example

x <- 1
while (x <= 5) {
  print(paste("Count is", x))
  x <- x + 1   # Crucial: update the counter
}

Stopping Early with break

x <- 1
while (TRUE) {       # runs until broken
  if (x %% 5 == 0) {
    print(paste(x, "is a multiple of 5"))
    break
  }
  x <- x + 1
}

Common Pitfalls

  • Infinite loops: If the condition never becomes FALSE, the loop will run forever. Always ensure the condition can change.
  • Forgetting to update the variable in the condition (e.g., x <- x + 1) is a common cause of infinite loops.

Loop Control Statements: break and next

R provides break and next to control the flow of loops.

break — Exiting a Loop

for (i in 1:10) {
  if (i > 5) {
    break
  }
  print(i)
}
# Prints 1,2,3,4,5

next — Skipping an Iteration

for (i in 1:5) {
  if (i %% 2 == 0) {
    next  # Skip even numbers
  }
  print(paste("Odd:", i))
}
# Prints Odd: 1, Odd: 3, Odd: 5

There is no pass

R does not have a pass statement. If a block needs to be empty, you can simply use an empty block {} or a comment.

for (i in 1:3) {
  # Placeholder for future logic - do nothing for now
  # An empty block is valid
}

Nested Loops

Nested loops are loops inside loops. They are useful for working with multi-dimensional data, like matrices, or for generating combinations.

Example: Multiplication Table

for (i in 1:3) {
  for (j in 1:3) {
    cat(i, "x", j, "=", i * j, "  ")
  }
  cat("\n")  # New line after each inner loop
}

Nested while Loop

i <- 1
while (i <= 2) {
  j <- 1
  while (j <= 2) {
    print(paste(i, j))
    j <- j + 1
  }
  i <- i + 1
}

Pitfalls

  • Complexity: Nested loops can quickly become slow if the inner loops run many times.
  • Consider using vectorized operations or the outer() function instead of nested loops for mathematical operations.
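
As a sketch of the outer() alternative, the multiplication table from earlier can be produced in one call. outer() is a base R function; the variable names are chosen for the example.

```r
# outer() applies a function to every pair of elements; the default is "*"
times_table <- outer(1:3, 1:3)
dimnames(times_table) <- list(1:3, 1:3)
print(times_table)

# Any vectorized two-argument function works, for example addition
sums <- outer(1:2, 1:3, FUN = "+")
print(sums)
```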

Functions in R

Functions in R are defined using the function keyword and are assigned to a variable. They are first-class objects, meaning you can pass them as arguments to other functions.

1. Defining Functions

greet <- function(name) {
  return(paste0("Hello, ", name, "!"))  # paste0() avoids a stray space before "!"
}
print(greet("Alice"))  # Outputs: [1] "Hello, Alice!"

2. Default Arguments

power <- function(x, exponent = 2) {
  return(x ^ exponent)
}
print(power(3))    # Uses default: 9
print(power(3, 3)) # Overrides default: 27

3. Anonymous Functions

You can create functions without a name (anonymous functions), often used with functions like lapply().

# An anonymous function that adds two numbers
(function(a, b) { a + b })(3, 4)  # Returns 7

# More commonly used with apply functions
lapply(1:3, function(x) x^2)  # Returns list(1, 4, 9)
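
Because functions are first-class objects, they can be passed to other functions and returned from them. The sketch below illustrates both; apply_twice(), add_three(), and make_multiplier() are hypothetical names made up for this example.

```r
# apply_twice() and make_multiplier() are hypothetical names for illustration.

# A function that takes another function as an argument
apply_twice <- function(f, x) {
  f(f(x))
}
add_three <- function(x) x + 3
print(apply_twice(add_three, 10))
print(apply_twice(sqrt, 16))

# A function that returns a new function (a closure)
make_multiplier <- function(factor) {
  function(x) x * factor
}
triple <- make_multiplier(3)
print(triple(5))
```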

Importing Packages in R

R's functionality is extended through packages (libraries). Packages can be installed from CRAN (Comprehensive R Archive Network) or other repositories and then loaded into your session.

1. Installing Packages

# Install a package from CRAN
install.packages("dplyr")

# Install multiple packages at once
install.packages(c("ggplot2", "tidyr"))

2. Loading Packages

# Load a package into the current session
library(dplyr)

# Alternatively, require() returns FALSE with a warning instead of stopping when the package is missing; library() is more common
require(ggplot2)

3. Using Package Functions

Once a package is loaded, you can use its functions directly.

library(dplyr)
# Now you can use dplyr functions like filter(), select(), etc.

4. Accessing Functions Without Loading

You can use a specific function from a package without loading the entire package using ::.

dplyr::filter(mtcars, mpg > 20)

5. Common Data Science Packages

  • dplyr, tidyr: Data manipulation and cleaning.
  • ggplot2: Data visualization.
  • readr, readxl: Reading data from files.
  • shiny: Building interactive web apps.

Reading and Writing Files

R provides several functions to read data from and write data to files. Common formats include CSV, text, and Excel files.

1. Reading a CSV File

# Read a CSV file into a data frame
df <- read.csv("path/to/your/file.csv")

# stringsAsFactors controls whether text columns become factors
# Since R 4.0.0 the default is FALSE, so strings stay as character vectors
df <- read.csv("file.csv", stringsAsFactors = FALSE)

2. Writing to a CSV File

write.csv(df, "path/to/output/file.csv", row.names = FALSE)
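
To try reading and writing without touching real project files, a round trip through a temporary file works well. This is a sketch using tempfile() from base R; the scores data frame is made up for the example.

```r
scores <- data.frame(name = c("Alice", "Bob"), score = c(85, 92))

csv_path <- tempfile(fileext = ".csv")   # throwaway file in the session temp dir
write.csv(scores, csv_path, row.names = FALSE)

restored <- read.csv(csv_path)
print(restored)
print(file.exists(csv_path))

unlink(csv_path)   # clean up
```

Setting row.names = FALSE keeps R from writing an extra column of row numbers, so the file reads back with the same two columns.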

3. Reading Text Files

# Read entire file as a character vector
lines <- readLines("textfile.txt")

# Read with a connection, useful for large files
con <- file("textfile.txt", "r")
first_line <- readLines(con, n = 1)
close(con)

4. Basic File Functions

file.exists("myfile.csv")  # Check if file exists
file.remove("old_file.csv") # Delete a file
dir.create("new_folder")    # Create a directory

5. The here Package (Recommended)

The here package helps manage file paths in a project, making your code more reproducible.

library(here)
csv_path <- here("data", "my_data.csv")  # Constructs a reliable path
df <- read.csv(csv_path)

dplyr — Install & Import

dplyr is a core package for data manipulation in R. It provides a grammar of data manipulation with easy-to-understand verb-based functions.

Install and Load

install.packages("dplyr")  # Install the package
library(dplyr)         # Load it into your session

Why use dplyr?

  • Intuitive function names (verbs) like filter(), select(), mutate().
  • Efficient computation, including on large datasets.
  • Consistent syntax and excellent integration with the pipe operator %>%.

dplyr — Key Verbs

dplyr's functionality is built around a set of core 'verbs' for data manipulation.

filter() — Select Rows

filter(mtcars, mpg > 20, cyl == 4)  # Cars with mpg>20 and 4 cylinders

select() — Select Columns

select(mtcars, mpg, cyl, hp)        # Select only these columns
select(mtcars, -mpg)                # Select all columns except mpg

mutate() — Create or Modify Columns

mutate(mtcars, kpl = mpg * 0.425144) # Add a new column for km per liter

arrange() — Sort Rows

arrange(mtcars, desc(mpg))  # Sort by mpg, highest first

summarize() — Aggregate Data

summarize(mtcars, avg_mpg = mean(mpg, na.rm = TRUE)) # Average mpg

These verbs are most powerful when used with group_by().

mtcars %>%
  group_by(cyl) %>%
  summarize(avg_mpg = mean(mpg)) # Average mpg for each cylinder group

dplyr — The Pipe Operator %>%

The pipe operator %>% (from the magrittr package, included with dplyr) allows you to chain multiple operations together in a readable, left-to-right fashion. It takes the output of the expression on its left and passes it as the first argument to the function on its right.

Example Without Pipes

cyl_4 <- filter(mtcars, cyl == 4)
cyl_4_mpg <- select(cyl_4, mpg, cyl)
result <- arrange(cyl_4_mpg, mpg)

Example With Pipes

result <- mtcars %>%
  filter(cyl == 4) %>%
  select(mpg, cyl) %>%
  arrange(mpg)

The pipe makes the sequence of operations clear and avoids creating intermediate variables.

The Native Pipe |>

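R 4.1.0 introduced a native pipe operator |>. It behaves much like %>%, with a few differences: the right-hand side must be a function call (filter(), not a bare filter), and the argument placeholder is _ (available since R 4.2.0, and only as a named argument) rather than magrittr's dot.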

result <- mtcars |>
  filter(cyl == 4) |>
  select(mpg, cyl) |>
  arrange(mpg)
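
The native pipe needs no packages, so it can be tried with base functions alone. This minimal sketch assumes R 4.1.0 or later and compares the piped form with the equivalent nested call.

```r
# Requires R 4.1.0 or later
total <- c(1, 4, 9) |>
  sqrt() |>
  sum()
print(total)

# Equivalent nested call
nested <- sum(sqrt(c(1, 4, 9)))
print(identical(total, nested))
```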

ggplot2 — Install & Import

ggplot2 is a powerful and popular package for creating static, publication-quality graphics in R based on the Grammar of Graphics.

Install and Load

install.packages("ggplot2")
library(ggplot2)

Why use ggplot2?

  • Consistent and logical syntax based on layers.
  • High flexibility and customization for complex plots.
  • Produces elegant graphics with relatively little code.

ggplot2 — Basic Usage

The fundamental syntax for a ggplot2 graph involves:

  1. The ggplot() function, which defines the data and aesthetic mappings (aes()).
  2. Adding layers with geom_ functions (e.g., geom_point(), geom_line()).
  3. Using + to add components together.

Simple Scatter Plot

ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point()

Adding a Smooth Line

ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = "lm")

Using Color for Groups

ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) +
  geom_point()

ggplot2 — Modifiers & Styling

You can customize almost every aspect of a ggplot2 graph by adding more layers and theme elements.

Labels and Title

ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  labs(
    title = "Car Weight vs. MPG",
    x = "Weight (1000 lbs)",
    y = "Miles Per Gallon"
  )

Axis Limits and Scales

ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  xlim(0, 6) +
  ylim(10, 35)

Themes

ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  theme_bw()  # Use a black-and-white theme

# Customize the theme in detail
ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Advanced R Concepts

Beyond the basics, R offers many advanced features for writing efficient, powerful, and reusable code.

1. The Apply Family

The apply family of functions (apply(), lapply(), sapply(), vapply()) applies a function to the margins of an array or to the elements of a list or vector, often as an alternative to loops.

# Apply a function to each column of a data frame (margin=2)
apply(mtcars, 2, mean)

# Apply a function to each element of a list
my_list <- list(a = 1:3, b = 4:6)
lapply(my_list, mean)  # Returns a list
sapply(my_list, mean)  # Tries to simplify the result to a vector
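
vapply() is listed above but not shown. It works like sapply() except that the expected type and length of each result are declared up front. A minimal sketch, with the error message text written for this example:

```r
my_list <- list(a = 1:3, b = 4:6)

# FUN.VALUE declares that each result must be a single double
means <- vapply(my_list, mean, FUN.VALUE = numeric(1))
print(means)

# A mismatch is reported as an error rather than silently changing shape
mismatch <- tryCatch(
  vapply(my_list, range, FUN.VALUE = numeric(1)),
  error = function(e) "error: result length differs from FUN.VALUE"
)
print(mismatch)
```

Because the output shape is fixed in advance, vapply() tends to be the safer choice inside functions and scripts.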

2. Functional Programming with purrr

The purrr package enhances R's functional programming capabilities, providing a more consistent and powerful set of tools than the base apply functions.

library(purrr)
map(my_list, mean)        # Similar to lapply
map_dbl(my_list, mean)    # Returns a numeric vector

3. Writing Efficient R Code

R can be slow with loops on large data. Key strategies for efficiency include:

  • Vectorization: Use built-in vectorized functions whenever possible.
  • Avoid growing objects in loops: Pre-allocate memory for results.
  • Use efficient data structures: Data frames for tabular data, matrices for homogeneous numeric data.
  • Profile your code: Use system.time() or the profvis package to find bottlenecks.
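
The strategies above can be compared with system.time(). The sketch below builds the same vector three ways; timings depend on the machine, so none are quoted here, and the value of n is an arbitrary choice for illustration.

```r
n <- 10000

# Growing a vector: each c() call copies everything collected so far
grow_time <- system.time({
  grown <- c()
  for (i in seq_len(n)) grown <- c(grown, i^2)
})

# Pre-allocating: the result vector is created once and filled in place
prealloc_time <- system.time({
  filled <- numeric(n)
  for (i in seq_len(n)) filled[i] <- i^2
})

# Vectorized: no explicit loop at all
vectorized <- seq_len(n)^2

print(grow_time["elapsed"])
print(prealloc_time["elapsed"])
print(all.equal(grown, filled))
print(all.equal(filled, vectorized))
```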