Learning Objectives
- Understand packages and libraries in R and how they work.
- Learn the basics of ggplot2.
- Calculate and visualize probabilities from:
  - Binomial distribution.
  - Normal distribution.
Key Functions
ggplot(), dbinom(), pbinom(), geom_segment(), pnorm(), stat_function(), dnorm(), geom_area()
Introduction to ggplot2
- ggplot2 is a powerful visualization package based on the Grammar of Graphics.
- Three key components in ggplot():
  - Data – the dataset.
  - Aesthetics (aes) – variables mapped to axes, color, size, etc.
  - Geometric object – the type of plot (geom_histogram, geom_point, etc.).
ggplot Basics Example
Dataset: penguins.csv.
- Start with data:
  ggplot(data = penguins)
- Add aesthetics:
  ggplot(data = penguins, aes(x = body_mass_g))
- Add geometry:
  ggplot(data = penguins, aes(x = body_mass_g)) + geom_histogram()
Mnemonic: Data → Aesthetics → Geometric object.
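The three steps above can be put together as one runnable sketch. Since penguins.csv may not be on hand, the data frame below is a hypothetical stand-in with made-up body masses (not real measurements); the bin count and label text are likewise illustrative choices, not part of the original example.

```r
# install.packages("ggplot2")  # install once per machine
library(ggplot2)               # load in every session

# Hypothetical stand-in for penguins.csv: made-up values for illustration only
penguins <- data.frame(
  body_mass_g = c(3750, 3800, 3250, 3450, 3650, 4675, 5000, 4200)
)

# Data -> Aesthetics -> Geometric object
p <- ggplot(data = penguins, aes(x = body_mass_g)) +
  geom_histogram(bins = 5) +
  labs(title = "Penguin body mass", x = "Body mass (g)", y = "Count")

print(p)
```

With the real file in the working directory, the stand-in could presumably be replaced by penguins <- read.csv("penguins.csv").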
Binomial Distribution in R
Probability Mass Function
- dbinom(x, size, prob) → P(X = x)
  - x: number of successes.
  - size: number of trials.
  - prob: probability of success.
Example: exactly 3 heads in 4 flips, p = 0.5.
dbinom(x = 3, size = 4, prob = 0.5)
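As a quick check on what dbinom() computes, the same probability can be written out from the binomial formula C(n, k) p^k (1 - p)^(n - k). The variable names below are illustrative, not from the original notes.

```r
# Parameters from the example: exactly 3 heads in 4 fair flips
k <- 3
n <- 4
p <- 0.5

# Binomial PMF written out by hand: C(n, k) * p^k * (1 - p)^(n - k)
by_formula <- choose(n, k) * p^k * (1 - p)^(n - k)

# Same quantity from the built-in function
by_dbinom <- dbinom(x = k, size = n, prob = p)

print(by_formula)  # 0.25
print(by_dbinom)   # 0.25
```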
Cumulative Distribution
- pbinom(q, size, prob, lower.tail) → P(X ≤ q) if lower.tail = TRUE.
- Default: lower.tail = TRUE.
  - TRUE → P(X ≤ q).
  - FALSE → P(X > q).
Example: at most 1 head in 4 flips.
pbinom(q = 1, size = 4, prob = 0.5)
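A small sketch of how lower.tail behaves, using the same 4-flip setup. It checks that pbinom() matches a sum of dbinom() values and that the two tails add up to 1; the variable names are illustrative.

```r
n <- 4
p <- 0.5

# P(X <= 1): lower tail (the default)
at_most_1 <- pbinom(q = 1, size = n, prob = p)

# Same thing as P(X = 0) + P(X = 1)
summed <- sum(dbinom(0:1, size = n, prob = p))

# P(X > 1): upper tail, strictly greater than q
more_than_1 <- pbinom(q = 1, size = n, prob = p, lower.tail = FALSE)

print(at_most_1)    # 0.3125
print(summed)       # 0.3125
print(more_than_1)  # 0.6875
```

Because the upper tail is strict, "at least 2 heads" is P(X ≥ 2) = P(X > 1), so it uses q = 1 with lower.tail = FALSE, not q = 2.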
Binomial Visualization
- Create a data frame with outcomes and probabilities:
  df1 <- data.frame(x = 0:4, y = dbinom(0:4, size = 4, prob = 0.5))
- Plot with geom_segment:
  ggplot(df1, aes(x = x, xend = x, y = 0, yend = y)) +
    geom_segment() +
    labs(title = "Binomial(4, 0.5)", x = "Number of Heads", y = "Probability")
Reminder: always add an informative title and axis labels.
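Before plotting, it can help to sanity-check the data frame: the probabilities over all outcomes 0 to 4 should sum to 1. This is a minimal check in base R, not part of the original example.

```r
# Same data frame as in the plotting example
df1 <- data.frame(x = 0:4, y = dbinom(0:4, size = 4, prob = 0.5))
print(df1)

# A valid PMF sums to 1 over all possible outcomes
total <- sum(df1$y)
print(total)  # 1

# Outcome with the tallest segment in the plot
most_likely <- df1$x[which.max(df1$y)]
print(most_likely)  # 2
```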
Normal Distribution in R
Cumulative Probabilities
- Function: pnorm(q, mean, sd, lower.tail).
- Example: P(X < 50) when mean = 80, sd = 15.
  pnorm(q = 50, mean = 80, sd = 15)
- For the standard normal distribution (Z): mean = 0, sd = 1 (the defaults).
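The link between the general and standard normal can be sketched as follows: standardizing with z = (x - mean) / sd and calling pnorm() with its defaults gives the same probability. Variable names are illustrative.

```r
x <- 50
mu <- 80
sigma <- 15

# Standardize: how many standard deviations x lies from the mean
z <- (x - mu) / sigma
print(z)  # -2

# P(X < 50) computed directly and via the standard normal
direct <- pnorm(q = x, mean = mu, sd = sigma)
via_z <- pnorm(z)

# Upper tail: P(X > 50)
upper <- pnorm(q = x, mean = mu, sd = sigma, lower.tail = FALSE)

print(direct)  # about 0.0228
print(via_z)   # same value
print(upper)   # about 0.9772
```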
Visualizing Normal PDF
- Use stat_function() with dnorm():
  ggplot(data.frame(x = c(-4, 4)), aes(x = x)) +
    stat_function(fun = dnorm, args = list(0, 1))
- Improved with labels and formatting:
  ggplot(data.frame(x = c(-4, 4)), aes(x = x)) +
    stat_function(fun = dnorm, args = list(0, 1), col = "black", lwd = 1) +
    labs(title = "Normal(0, 1)", x = "", y = "Density")
Shading Areas in Normal Distribution
- Use geom_area() to shade regions.
- Example: Shade region where Z ≤ 1.4.
  ggplot(data.frame(x = c(-4, 4)), aes(x = x)) +
    geom_area(stat = "function", fun = dnorm, args = list(0, 1), fill = "lightblue", xlim = c(-4, 1.4)) +
    stat_function(fun = dnorm, args = list(0, 1), col = "black", lwd = 1) +
    labs(title = "Normal(0, 1)", y = "Density")
Key: specify xlim for shaded area bounds.
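The shaded region is the picture of a probability: its area equals P(Z ≤ 1.4). A minimal base R check, assuming numerical integration with integrate() is an acceptable stand-in for the exact area:

```r
# Area under the standard normal curve to the left of 1.4
shaded <- pnorm(1.4)
print(shaded)  # about 0.9192

# Same area obtained by numerically integrating the density
numeric_area <- integrate(dnorm, lower = -Inf, upper = 1.4)$value
print(numeric_area)
```

The plot only draws from -4 to 1.4, but the area left of -4 is negligible, so the shaded region still represents essentially the whole probability.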
Uniform Distribution in R
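- Function: punif(q, min, max, lower.tail).
- Example: wait time uniformly distributed between 20 and 70 minutes; find P(X ≥ 30).
  punif(q = 30, min = 20, max = 70, lower.tail = FALSE)
- With lower.tail = FALSE, punif() returns P(X > 30); for a continuous distribution this equals P(X ≥ 30).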
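For the uniform distribution the probability is just a ratio of interval lengths, which makes the punif() result easy to verify by hand. Variable names below are illustrative.

```r
cutoff <- 30
lo <- 20
hi <- 70

# Length of [30, 70] divided by length of [20, 70]: 40 / 50
by_geometry <- (hi - cutoff) / (hi - lo)

# Same probability from the built-in function (upper tail)
by_punif <- punif(q = cutoff, min = lo, max = hi, lower.tail = FALSE)

print(by_geometry)  # 0.8
print(by_punif)     # 0.8
```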
General Reminders
- Packages: install once with install.packages(), then load in every session with library().
- Binomial distribution: use dbinom() for exact probabilities, pbinom() for cumulative.
- Normal distribution: use pnorm() for probabilities, dnorm() for density plotting.
- Uniform distribution: use punif().
- Visualizations should always include:
  - Title.
  - Clear axis labels.
  - Professional style suitable for reports.