How to support loop drawing in ggplot2? - r

data <- data.frame(a=1:10, b=1:10 * 2, c=1:10 * 3)
library(ggplot2)
p <- ggplot(NULL, aes(x = 1:10))
# With a for loop, only the last line is drawn.
for (i in names(data)){
p <- p + geom_line(aes(y = data[[i]], colour = i))
}
# The lines below work fine.
# p <- p + geom_line(aes(y = data[["a"]], colour = "a"))
# p <- p + geom_line(aes(y = data[["b"]], colour = "b"))
# p <- p + geom_line(aes(y = data[["c"]], colour = "c"))
print(p)
Why doesn't the loop version work as expected?
Is ggplot2 evaluating the plot lazily?

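You don't actually have to loop to get your lines. You just need to reshape your data and include x in your data frame. Your data is wide, and ggplot2 works best with long data, where one column holds the values and another says which series each value belongs to. That makes it easy to draw multiple lines in a single plot.
As an aside, your loop does add three layers to p, but aes() does not evaluate data[[i]] when the layer is created. The expression is evaluated only when the plot is printed, by which time i holds its final value "c", so all three layers draw the same line. So yes, the evaluation is lazy.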
library(ggplot2)
library(tidyr)
data <- data.frame(x = 1:10, a=1:10, b=1:10 * 2, c=1:10 * 3)
df <- gather(data, name, value, -x)
ggplot(df, aes(x = x, y = value, color = name)) +
geom_line()
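If you want to see the lazy evaluation for yourself, here is a minimal sketch. It assumes ggplot2 3.0.0 or later, where aes() supports the `!!` injection operator; the names p_lazy and p_eager are made up for this illustration. The first loop is the one from the question, the second injects the current value of i into each mapping so that every layer keeps its own column.

```r
library(ggplot2)

data <- data.frame(a = 1:10, b = 1:10 * 2, c = 1:10 * 3)

# Loop from the question: aes() stores `data[[i]]` unevaluated,
# so every layer looks up i when the plot is finally built.
p_lazy <- ggplot(NULL, aes(x = 1:10))
for (i in names(data)) {
  p_lazy <- p_lazy + geom_line(aes(y = data[[i]], colour = i))
}

# Same loop, but `!!i` replaces i with its current value at the
# moment the layer is created.
p_eager <- ggplot(NULL, aes(x = 1:10))
for (i in names(data)) {
  p_eager <- p_eager + geom_line(aes(y = data[[!!i]], colour = !!i))
}

# Compare the y values each plot actually draws in its first layer.
lazy_first  <- ggplot_build(p_lazy)$data[[1]]$y
eager_first <- ggplot_build(p_eager)$data[[1]]$y
```

Both plots contain three layers; the difference is only in what each layer evaluates to when it is built. Reshaping to long format is still the simpler route.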

Related

In R ggplot, how to add labels to points in a stat_qq?

df <- data.frame(y = rt(26, df = 5), name = letters)
p <- ggplot(df, aes(sample = y))
p + stat_qq() + stat_qq_line()
The above produced the plot as expected.
But now I need labels at each point, so:
df <- data.frame(y = rt(26, df = 5), name = letters)
p <- ggplot(df, aes(sample = y))
p + stat_qq() + stat_qq_line() + geom_text(label = letters)
But it complains that geom_text needs x and y aesthetics.
How do I fix it?
I found out how to compute the y,
but I don't know how to compute the x.
You can use ggplot_build() to get the coordinates of the points in your plot. In your case these are found in data[[1]], the data for the first layer (stat_qq).
By default the labels would sit right on top of the points, so they are shifted upward by the variable offset, here one twentieth of the y range, which seems to look good.
library(ggplot2)
df <- data.frame(y = rt(26, df = 5), name = letters)
myplot <- ggplot(df, aes(sample = y)) +
stat_qq() +
stat_qq_line()
x.pnts <- ggplot_build(myplot)$data[[1]]$x
y.pnts <- ggplot_build(myplot)$data[[1]]$y
offset <- (max(y.pnts) - min(y.pnts)) / 20
# stat_qq() sorts the sample, so reorder the labels to match the plotted points.
myplot +
  geom_text(label = df$name[order(df$y)],
            x = x.pnts,
            y = y.pnts + offset)
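As an alternative to pulling the coordinates out of the built plot, you could compute them yourself. This is only a sketch and rests on the assumption that stat_qq() is used with its defaults (normal distribution, quantiles from ppoints()); the data frame qq and its column names are made up for this example. Because the sample is sorted before plotting, the labels are reordered the same way.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(y = rt(26, df = 5), name = letters)

# Recreate what stat_qq() computes by default:
# sorted sample values against normal quantiles at ppoints(n).
ord <- order(df$y)
qq <- data.frame(
  theoretical = qnorm(ppoints(nrow(df))),
  sample      = df$y[ord],
  name        = df$name[ord]   # keep each label with its own point
)

p <- ggplot(df, aes(sample = y)) +
  stat_qq() +
  stat_qq_line() +
  geom_text(data = qq,
            aes(x = theoretical, y = sample, label = name),
            inherit.aes = FALSE,  # don't inherit aes(sample = y)
            vjust = -0.8)         # nudge labels above the points
```

Keeping the labels in a regular data frame also makes it easy to label only some points, for example the extremes, by filtering qq first.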

Why does R behave differently when parsing parameters of plotting?

I am attempting to plot multiple time series variables on a single line chart using ggplot. I am using a data.frame which contains n time series variables and a column of time periods. Essentially, I want to loop through the data.frame and add exactly n geom_line layers to a single chart.
Initially I tried the following code, where:
df = data.frame containing n time series variables and 1 column of time periods
wid = n (number of time series variables)
p <- ggplot() +
scale_color_manual(values=c(colours[1:wid]))
for (i in 1:wid) {
p <- p + geom_line(aes(x=df$Time, y=df[,i], color=var.lab[i]))
}
ggplotly(p)
However, this only produces a plot of the final time series variable in the data.frame. I then investigated further and found that the following two sets of code produce completely different results:
p <- ggplot() +
scale_color_manual(values=c(colours[1:wid]))
i = 1
p = p + geom_line(aes(x=df$Time, y=df[,i], color=var.lab[i]))
i = 2
p = p + geom_line(aes(x=df$Time, y=df[,i], color=var.lab[i]))
i = 3
p = p + geom_line(aes(x=df$Time, y=df[,i], color=var.lab[i]))
ggplotly(p)
Plot produced by code above
p <- ggplot() +
scale_color_manual(values=c(colours[1:wid]))
p = p + geom_line(aes(x=df$Time, y=df[,1], color=var.lab[1]))
p = p + geom_line(aes(x=df$Time, y=df[,2], color=var.lab[2]))
p = p + geom_line(aes(x=df$Time, y=df[,3], color=var.lab[3]))
ggplotly(p)
Plot produced by code above
In my mind, these two sets of code are identical, so could anyone explain why they produce such different results?
I know this could probably be done quite easily using autoplot, but I am more interested in the behavior of these two snippets of code.
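What you're doing, adding one layer per column, is a 'hack' and works against how ggplot2 is designed. It fails because aes() does not evaluate df[,i] when the layer is created; the expression is evaluated when the plot is drawn, by which time i holds its last value, so every layer draws the same column. If you do want a loop, I'd use aes_string, which takes the column name as a string that is evaluated immediately (note that aes_string has been soft-deprecated since ggplot2 3.0.0). But it's still a hack.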
df <- data.frame(Time = 1:20,
Var1 = rnorm(20),
Var2 = rnorm(20, mean = 0.5),
Var3 = rnorm(20, mean = 0.8))
vars <- paste0("Var", 1:3)
col_vec <- RColorBrewer::brewer.pal(3, "Accent")
library(ggplot2)
p <- ggplot(df, aes(Time))
for (i in 1:length(vars)) {
p <- p + geom_line(aes_string(y = vars[i]), color = col_vec[i], lwd = 1)
}
p + labs(y = "value")
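If you'd rather avoid aes_string, here is a sketch of the same idea using the .data pronoun (this assumes ggplot2 3.0.0 or later; the names layers and v, and the seed, are made up for this illustration). Each call of the anonymous function has its own v, so the mapping is not affected by a loop variable changing later, and mapping colour inside aes() gives you a legend.

```r
library(ggplot2)

set.seed(42)
df <- data.frame(Time = 1:20,
                 Var1 = rnorm(20),
                 Var2 = rnorm(20, mean = 0.5),
                 Var3 = rnorm(20, mean = 0.8))
vars <- paste0("Var", 1:3)

# Build one geom_line layer per column name. `.data[[v]]` looks the
# column up in the plot data; `colour = v` labels the line in the legend.
layers <- lapply(vars, function(v) {
  geom_line(aes(y = .data[[v]], colour = v), linewidth = 1)
})

# A list of layers can be added to a plot in one go.
p <- ggplot(df, aes(Time)) +
  layers +
  labs(y = "value", colour = "var")
```

The linewidth argument needs ggplot2 3.4.0 or later; on older versions use size instead.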
How to do it properly
To build this plot properly, you need to pivot the data first, so that each aesthetic (aes) is mapped to a variable in your data frame. That means we need a single variable in the data frame to map to color. Hence, we use pivot_longer and plot again:
library(tidyr)
df_melt <- pivot_longer(df, cols = Var1:Var3, names_to = "var")
ggplot(df_melt, aes(Time, value, color = var)) +
geom_line(lwd = 1) +
scale_color_manual(values = col_vec)

What is the equivalent of seaborn's hue in ggplot?

I'm starting to program in R and I'm stuck on this plot.
This is the plot I'm trying to make:
I can do it in base R with this code:
x <- seq(0, 10,1 )
y = x**2
z= x**3
plot(x, y, type="o", col="blue",xlab='x',ylab="y = x2")
lines(x,z,col="green")
I need to do it using ggplot, since I have to add further formatting, but I can't find a way to do it. I'm looking for the equivalent of the "hue" parameter in seaborn:
sns.catplot(x="sex", y="survived", hue="class", kind="point", data=titanic);
To use ggplot2, it is better to prepare a data frame with all the values. Furthermore, a "long-format" data frame is recommended. We can then map color to class, a column that records whether each value came from y or z in your example.
x <- seq(0, 10,1 )
y <- x**2
z <- x**3
# Load the tidyverse package, which contains ggplot2 and tidyr
library(tidyverse)
# Create example data frame
dat <- data.frame(x, y, z)
# Convert to long format
dat2 <- dat %>% gather(class, value, -x)
# Plot the data
ggplot(dat2,
# Map x to x, y to value, and color to class
aes(x = x, y = value, color = class)) +
# Add point and line
geom_point() +
geom_line() +
# Set the colors manually: y is blue and z is green
scale_color_manual(values = c("y" = "blue", "z" = "green")) +
# Adjust the format to mimic the base R plot
theme_classic() +
theme(panel.grid = element_blank())
One way would be to create two separate data frames:
library(ggplot2)
df1 <- data.frame(x, y)
df2 <- data.frame(x, z)
ggplot(df1, aes(x, y)) +
geom_line(color = "blue") +
geom_point(color = "blue", shape = 1, size = 2) +
geom_line(data = df2, aes(x, z), color = "green") +
ylab("y = x2")

Handle ggplot2 axis text face programmatically

(x-posted to community.rstudio.com)
I'm wondering if it's possible to change the axis text in ggplot2 programmatically, or if there is some native way to do this in ggplot2. In this reprex, I want to bold the axis text for each value of y whose x has an absolute value over 1.5. I can add it manually via theme(), and that works fine:
library(ggplot2)
library(dplyr)
library(forcats)
set.seed(2939)
df <- data.frame(x = rnorm(15), y = paste0("y", 1:15), group = rep(1:3, 5))
df <- mutate(df, big_number = abs(x) > 1.5, face = ifelse(big_number, "bold",
"plain"))
p <- ggplot(df, aes(x = x, y = fct_inorder(y), col = big_number)) + geom_point() +
theme(axis.text.y = element_text(face = df$face))
p
Plot 1 with no facets
But if I facet it by group, y gets reordered and ggplot2 has no idea how face is connected to df and thus y, so it just bolds in the same order as the first plot.
p + facet_grid(group ~ .)
Plot 2 with facets
And it's worse if I use a different scale for each.
p + facet_grid(group ~ ., scales = "free")
Plot 3 with facets and different scales
What do you think? Is there a general way to handle this that would work consistently here?
Idea: Don't change the theme; change the y-axis labels instead. Create a call for every y with an if/else condition and turn the result into expressions with parse().
Not the most elegant solution (it uses a for loop), but it works (the loop is needed because bquote doesn't work with ifelse). I always get confused when trying to work with multiple expressions (more on that here).
Code:
# Create data
library(tidyverse)
set.seed(2939)
df <- data.frame(x = rnorm(15), y = paste0("y", 1:15), group = rep(1:3, 5)) %>%
mutate(yF = fct_inorder(y),
big_number = abs(x) > 1.5)
# Expressions for y-axis
# ifelse doesn't work
# ifelse(df$big_number, bquote(bold(1)), bquote(plain(2)))
yExp <- c() # Ignore terrible way of concatenating
for(i in 1:nrow(df)) {
if (df$big_number[i]) {
yExp <- c(yExp, bquote(bold(.(as.character(df$yF[i])))))
} else {
yExp <- c(yExp, bquote(plain(.(as.character(df$yF[i])))))
}
}
# Plot with facets
ggplot(df, aes(x, yF, col = big_number)) +
geom_point() +
scale_y_discrete(breaks = levels(df$yF),
labels = parse(text = yExp)) +
facet_grid(group ~ ., scales = "free")
Result:
Inspired by @PoGibas, I also used a function in scale_y_discrete(), which works, too.
bold_labels <- function(breaks) {
big_nums <- filter(df, y %in% breaks) %>%
pull(big_number)
labels <- purrr::map2(
breaks, big_nums,
~ if (.y) bquote(bold(.(.x))) else bquote(plain(.(.x)))
)
parse(text = labels)
}
ggplot(df, aes(x, fct_inorder(y), col = big_number)) +
geom_point() +
scale_y_discrete(labels = bold_labels) +
facet_grid(group ~ ., scales = "free")

How to plot three point lines using ggplot2 instead of the default plot in R

I have three matrices and I want to plot them using ggplot2. I have the data below.
library(cluster)
require(ggplot2)
require(scales)
require(reshape2)
data(ruspini)
x <- as.matrix(ruspini[-1])
w <- matrix(W[4,])
df <- melt(data.frame(max_Wmk, min_Wmk, w, my_time = 1:10), id.var = 'my_time')
ggplot(df, aes(colour = variable, x = my_time, y = value)) +
geom_point(size = 3) +
geom_line() +
scale_y_continuous(labels = comma) +
theme_minimal()
I want to combine the three plots into a single, good-looking ggplot2 plot.
Moreover, I want points with different values to have different colors.
I'm not quite sure what you're after, so here's a guess.
Your data...
max <- c(175523.9, 33026.97, 21823.36, 12607.78, 9577.648, 9474.148, 4553.296, 3876.221, 2646.405, 2295.504)
min <- c(175523.9, 33026.97, 13098.45, 5246.146, 3251.847, 2282.869, 1695.64, 1204.969, 852.1595, 653.7845)
w <- c(175523.947, 33026.971, 21823.364, 5246.146, 3354.839, 2767.610, 2748.689, 1593.822, 1101.469, 1850.013)
Slight modification to your base plot code to make it work...
plot(1:10,max,type='b',xlab='Number',ylab='groups',col=3)
points(1:10,min,type='b', col=2)
points(1:10,w,type='b',col=1)
Is this what you meant?
If you want to reproduce this with ggplot2, you might do something like this...
# ggplot likes a long table, rather than a wide one, so reshape the data, and add the 'time' variable explicitly (ie. my_time = 1:10)
require(reshape2)
df <- melt(data.frame(max, min, w, my_time = 1:10), id.var = 'my_time')
# now plot, with some minor customisations...
require(ggplot2); require(scales)
ggplot(df, aes(colour = variable, x = my_time, y = value)) +
geom_point(size = 3) +
geom_line() +
scale_y_continuous(labels = comma) +
theme_minimal()
UPDATE after the question was edited and the example data changed, here's an edit to suit the new example data:
Here's your example data (there's scope for simplification and speed gains here, but that's another question):
library(cluster)
require(ggplot2)
require(scales)
require(reshape2)
data(ruspini)
x <- as.matrix(ruspini[-1])
wss <- NULL
W=matrix(data=NA,ncol=10,nrow=100)
for(j in 1:100){
k=10
for(i in 1: k){
wss[i]=kmeans(x,i)$tot.withinss
}
W[j,]=as.matrix(wss)
}
max_Wmk <- matrix(data=NA, nrow=1,ncol=10)
for(i in 1:10){
max_Wmk[,i]=max(W[,i],na.rm=TRUE)
}
min_Wmk <- matrix(data=NA, nrow=1,ncol=10)
for(i in 1:10){
min_Wmk[,i]=min(W[,i],na.rm=TRUE)
}
w <- matrix(W[4,])
Here's what you need to do to make the three objects into vectors so you can make the data frame as expected:
max_Wmk <- as.numeric(max_Wmk)
min_Wmk <- as.numeric(min_Wmk)
w <- as.numeric(w)
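As for the simplification mentioned above, here is one possible sketch. It assumes W is the 100 x 10 matrix filled by the kmeans loop; to keep the example self-contained, W is replaced here by a synthetic matrix of random values, which is purely an assumption for illustration. apply() over the columns returns plain numeric vectors, so the max/min loops and the as.numeric() step are no longer needed.

```r
set.seed(1)

# Synthetic stand-in for W (100 runs x 10 values of k). In the real
# example W comes from the kmeans loop above.
W <- matrix(runif(1000, min = 500, max = 200000), nrow = 100, ncol = 10)

# Column-wise maximum and minimum, returned as numeric vectors of length 10.
max_Wmk <- apply(W, 2, max, na.rm = TRUE)
min_Wmk <- apply(W, 2, min, na.rm = TRUE)

# Indexing a single row of a matrix already gives a plain vector.
w <- W[4, ]
```

These three vectors can go straight into data.frame() for the reshape step.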
Now reshape and plot as before...
df <- melt(data.frame(max_Wmk, min_Wmk, w, my_time = 1:10), id.var = 'my_time')
ggplot(df, aes(colour = variable, x = my_time, y = value)) +
geom_point(size = 3) +
geom_line() +
scale_y_continuous(labels = comma) +
theme_minimal()
And here's the result:

Resources