There is a good discussion about using ggplot in a loop and other creative ways at Looping over variables in ggplot. However, that discussion does not quite solve my problem.
I have a vertical (long-format) dataset that I need to create plots from in a loop. The code runs without error, but it only prints the last plot, and I can't figure out why. Here is a reproducible example:
df <- cbind.data.frame(var = sample(c('a','b'), size = 100, replace = TRUE),
grp = sample(c('x','y'), size = 100, replace = TRUE), value = rnorm(100))
for (i in 2) {
plot.df <- df[which(df$var == c('a','b')[i]),]
print(ggplot(plot.df, aes(x = 1:nrow(plot.df), y = value, color = grp)) +
geom_line() + ggtitle(c('a','b')[i]))
}
As an alternative, you might also consider using lapply, as it makes the code a lot more readable.
If I am not mistaken, you want to produce a plot for each of the levels of the variable var.
You can first define a function and then apply it to all levels:
my_plot <- function(x){
# debug: x <- "a"
plot.df <- df[df$var %in% x,]
ggplot(plot.df, aes(x = 1:nrow(plot.df), y = value, color = grp)) +
geom_line() + ggtitle(x)
}
lapply(unique(df$var), my_plot)
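The comment by @EJJ is correct: your loop isn't iterating over both values. for (i in 2) runs the body once, with i = 2, so only the last plot is printed. You need something like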
for (i in seq_len(nlevels(factor(df$var))))
library(ggplot2)
library(dplyr)
df <- cbind.data.frame(var = sample(c('a','b'), size = 100, replace = TRUE),
grp = sample(c('x','y'), size = 100, replace = TRUE), value = rnorm(100))
for (i in seq_len(nlevels(factor(df$var)))) {
plot.df <- df[which(df$var == c('a','b')[i]),]
print(ggplot(plot.df, aes(x = 1:nrow(plot.df), y = value, color = grp)) +
geom_line() + ggtitle(c('a','b')[i]))
}
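To see why the original loop printed only one plot, it can help to run the two loop headers on their own. The sketch below does no plotting at all; the names values, seen_single and seen_all are made up for illustration.

```r
values <- c("a", "b")

# for (i in 2) iterates over the length-one vector c(2),
# so the body runs exactly once, with i = 2
seen_single <- character()
for (i in 2) {
  seen_single <- c(seen_single, values[i])
}

# seq_along(values) yields 1, 2, so the body runs once per value
seen_all <- character()
for (i in seq_along(values)) {
  seen_all <- c(seen_all, values[i])
}

print(seen_single)
print(seen_all)
```

The first loop only ever visits the second value, which matches the symptom of seeing just the last plot.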
A question posted here shows how to declare some of the values missing. I have a similar problem, except I wish to highlight a single value with a different color, e.g. mpg = 20. Ideally, I would like it to show up in the legend as well.
To be clear, I wish to highlight a specific value on the gradient.
I am reusing the code that was used in the other post to seed the effort. This code specifies the lower limit of the data but does not allow for an arbitrarily chosen value.
I was wondering if people know how to do this with or without using something like scale_colour_gradientn.
library(ggplot2)
dat <- head(mtcars)
dat$model <- head(colnames(mtcars))
dat$is_low <- ifelse(dat$mpg < 20, TRUE, FALSE)
ggplot(dat, aes(x = model, y = mpg, fill = mpg)) +
geom_col() +
scale_fill_continuous(limits=c(20,max(dat$mpg)))
This is adapted from the answer I gave here, but it requires some messing around with the palette.
This is a custom palette function that replaces the colours of values between the two target values with replace_colour, but it requires knowing the range of the data first. Note that the function isn't very user friendly, but it does the job.
library(ggplot2)
library(scales)
my_palette <- function(colours, target = c(20.5, 21.5),
range = range(target), values = NULL,
replace_colour = "green") {
target <- (target - range[1]) / diff(range)
ramp <- scales::colour_ramp(colours)
force(values)
function(x) {
# Decide what values to replace
replace <- x > target[1] & x < target[2]
if (length(x) == 0)
return(character())
if (!is.null(values)) {
xs <- seq(0, 1, length.out = length(values))
f <- stats::approxfun(values, xs)
x <- f(x)
}
out <- ramp(x)
# Actually replace values
out[replace] <- replace_colour
out
}
}
You can then use that function with a custom scale as follows. I chose to highlight around 21 because 20 doesn't occur in dat$mpg.
dat <- head(mtcars)
dat$model <- head(colnames(mtcars))
dat$is_low <- ifelse(dat$mpg < 20, TRUE, FALSE)
colours <- seq_gradient_pal("#132B43", "#56B1F7")(seq(0, 1, length.out = 12))
ggplot(dat, aes(x = model, y = mpg, fill = mpg)) +
geom_col() +
continuous_scale(
"fill", "my_pal",
my_palette(colours, range = range(dat$mpg), target = c(20.9, 21.1)),
guide = guide_colourbar(nbin = 500) # Give guide plenty bins
)
Created on 2021-04-13 by the reprex package (v1.0.0)
Applying this to log scaled values requires you to log scale all the input data to my_palette too.
dat <- head(mtcars)
dat$model <- head(colnames(mtcars))
dat$mpg <- c(1e-6, 1e-4, 1e-2, 1e0, 1e2, 1e4)
colours <- seq_gradient_pal("#132B43", "#56B1F7")(seq(0, 1, length.out = 12))
ggplot(dat, aes(x = model, y = mpg, fill = mpg)) +
geom_col() +
scale_y_log10() +
continuous_scale(
"fill", "my_pal", trans = "log10",
my_palette(colours, range = log10(range(dat$mpg)),
target = log10(1e2) * c(0.9, 1.1)),
guide = guide_colourbar(nbin = 500) # Give guide plenty bins
)
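The key step in my_palette is rescaling target into the 0 to 1 interval that the palette function receives. The sketch below reproduces only that arithmetic for the linear and the log10 example, under the assumption that the scale hands the palette values already rescaled to the data range. rescale01 and the *_hit variables are hypothetical helpers, not part of ggplot2 or scales.

```r
# Rescale values into [0, 1] relative to a data range,
# mirroring what my_palette does with `target`
rescale01 <- function(x, rng) (x - rng[1]) / diff(rng)

# Linear case: highlight values between 20.9 and 21.1
mpg <- head(mtcars)$mpg
lin_range <- range(mpg)
lin_target <- rescale01(c(20.9, 21.1), lin_range)
lin_x <- rescale01(mpg, lin_range)
lin_hit <- lin_x > lin_target[1] & lin_x < lin_target[2]

# Log case: range, target and data are all log10-transformed first
log_vals <- c(1e-6, 1e-4, 1e-2, 1e0, 1e2, 1e4)
log_range <- log10(range(log_vals))
log_target <- rescale01(log10(1e2) * c(0.9, 1.1), log_range)
log_x <- rescale01(log10(log_vals), log_range)
log_hit <- log_x > log_target[1] & log_x < log_target[2]

print(data.frame(mpg = mpg, highlighted = lin_hit))
print(data.frame(value = log_vals, highlighted = log_hit))
```

If the range passed to my_palette were left untransformed in the log case, the rescaled target would no longer line up with the rescaled data, which is why every input has to be logged.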
My first Q here, so please go lightly if I'm out of step anywhere.
I'm trying to code R to produce a single chart containing a number of data series lines. The number of data series may vary but will be provided in the data frame. I have tried to rearrange another thread's content to print the geom_line, but not successfully.
The logic is:
#desire to replace loop of 1:5 with ncol(df)
print(ggplot(df,aes(x=time))
for (i in 1:5) {
print (+ geom_line(aes(y=df[,i]))
}
#functioning geom point loops ggplot production:
for (i in 1:5) {
print(ggplot(df,aes(x=time,y=df[,i]))+geom_point())
}
#functioning multi-line ggplot where n is explicit:
ggplot(data=df, aes(x=time), group=1) +
geom_line(aes(y=df$`3`))+
geom_line(aes(y=df$`4`))
The functioning example code produces n number of point charts, 5 in this case. I would like just one chart to contain n line series.
This may be similar to How to plot n dimensional matrix?, for which there are currently no relevant answers.
Any contributions much appreciated, thanks
You can use gather from tidyverse "world" to do that.
As you didn't supply sample data, I used mtcars.
I created two data frames, one with 3 variables besides mpg and one with 9. For each of them I plotted all of the variables against mpg.
library(tidyverse)
df3Columns <- mtcars[, 1:4]
df9Columns <- mtcars[, 1:10]
df3Columns %>%
gather(var, value, -mpg) %>%
ggplot(aes(mpg, value, group = var, color = var)) +
geom_line()
df9Columns %>%
gather(var, value, -mpg) %>%
ggplot(aes(mpg, value, group = var, color = var)) +
geom_line()
Edit - using the sample data in comments.
library(tidyverse)
df %>%
rownames_to_column("time") %>%
gather(var, value, -time) %>%
ggplot(aes(time, value, group = var, color = var)) +
geom_line()
Sample data:
df <- structure(list("39083" = c(96, 100, 100), "39090" = c(99, 100, 100), "39097" = c(99, 100, 100)), row.names = 3:5, class = "data.frame")
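In current tidyr, gather() is superseded by pivot_longer(). A minimal sketch of the same reshaping with the sample data follows; using the row position as the time column is an assumption made here for illustration.

```r
library(tidyr)

df <- structure(list("39083" = c(96, 100, 100),
                     "39090" = c(99, 100, 100),
                     "39097" = c(99, 100, 100)),
                row.names = 3:5, class = "data.frame")

# Assumption: the row position serves as the time axis
df$time <- seq_len(nrow(df))

# One row per (time, series) pair: the long layout ggplot works with
long <- pivot_longer(df, cols = -time, names_to = "var", values_to = "value")
print(long)
```

The resulting data frame can be passed to ggplot(aes(time, value, group = var, color = var)) + geom_line() in the same way as the gather() output.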
To strictly answer your question, you can simply store your ggplot in a variable and add the geom_line one by one:
df <- structure(list("39083" = c(96, 100, 100), "39090" = c(99, 100, 100), "39097" = c(99, 100, 100)), row.names = 3:5, class = "data.frame")
g <- ggplot(df, aes(x = 1:nrow(df)))
for (i in colnames(df))
{
g <- g + geom_line(y = df[,i])
}
g <- g + scale_y_continuous(limits = c(min(df), max(df)))
print(g)
However, this is not a very convenient solution. I would highly recommend reshaping your data frame into the long format that ggplot expects, with one row per observation:
df.ultimate <- data.frame(time = numeric(), value = numeric(), group = character())
for (i in colnames(df))
{
df.ultimate <- rbind(df.ultimate, data.frame(time = 1:nrow(df), value = df[, i], group = i))
}
g <- ggplot(df.ultimate, aes(x = time, y = value, color = group))
g <- g + geom_line()
print(g)
A one-line solution:
ggplot(data.frame(time = rep(1:nrow(df), ncol(df)),
value = as.vector(as.matrix(df)),
group = rep(colnames(df), each = nrow(df))),
aes(x = time, y = value, color = group)) + geom_line()
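The one-liner relies on as.vector(as.matrix(df)) flattening the data column by column, so that time cycles within each column and group repeats each column name nrow(df) times. The following sketch checks that alignment on the sample data; long and first_col are names chosen here for illustration.

```r
df <- structure(list("39083" = c(96, 100, 100),
                     "39090" = c(99, 100, 100),
                     "39097" = c(99, 100, 100)),
                row.names = 3:5, class = "data.frame")

# as.vector() on a matrix reads it column by column
long <- data.frame(time  = rep(seq_len(nrow(df)), ncol(df)),
                   value = as.vector(as.matrix(df)),
                   group = rep(colnames(df), each = nrow(df)))

# The rows tagged with the first column name should hold that column's values
first_col <- long[long$group == colnames(df)[1], ]
print(first_col)
```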
I've got an exponentially distributed variable that I'd like to plot using ggplot2. I'm going to take the log of the variable. However, instead of having the axis label be the log format, I'd like it to be the original exponentially distributed values. Here's an example.
set.seed(1000)
aero_df <-
data_frame(
x = rnorm(100,100,99),
y = sample(c('dream on',
'dude looks like a lady'),
100,
replace = T)) %>%
mutate(x = x*x,
log_x = log(x)) %>%
gather(key,value,-y)
aero_plot <- ggplot(aero_df,aes(value,color = y,fill = y))+
geom_density(show.legend = F)+
facet_wrap(key~y,scales = 'free')
I'd like to have the x variable labels on the log_x.
aero_plot
I started off with this, but the issue here is that you can see the normal log_x labels also in the x plots.
ticks <- c(3,6,9,12)
logticks <- c(exp(9),exp(10),exp(11))
ggplot(aero_df,aes(value,color = y,fill = y))+
geom_density(show.legend = F)+
scale_x_continuous(breaks = c(ticks,logticks), labels = c(ticks,log(logticks))) +
facet_wrap(key~y,scales = 'free')
ggplot's scale_x_log10 to the rescue, maybe? I'm not 100% sure I understand your question, because I didn't understand your example code. Hopefully this is what you mean...
library(tidyverse)
set.seed(1000)
aero_df <-
tibble( # data_frame() is deprecated in favour of tibble()
x = rnorm(100,100,99),
y = sample(c('dream on',
'dude looks like a lady'),
100,
replace = T))
aero_plot <- ggplot(aero_df,aes(x,color = y,fill = y)) +
geom_density(show.legend = F) +
scale_x_log10() +
facet_wrap(~y,scales = 'free')
print(aero_plot)
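If you would rather keep a manually logged column, the same labelling can be approximated by placing breaks at the log of the original values and labelling them with the originals. This is only a sketch for a plot that shows log_x alone; orig_breaks is a hypothetical choice of tick values, not something taken from the question.

```r
# Hypothetical tick values in the original units of x
orig_breaks <- c(100, 1000, 10000, 100000)

# Positions on an axis that shows log(x) (natural log, as in the question)
log_positions <- log(orig_breaks)

# Labels to print at those positions, in the original units
tick_labels <- format(orig_breaks, big.mark = ",", scientific = FALSE, trim = TRUE)

print(data.frame(position = log_positions, label = tick_labels))

# These could then be passed to the plot as
#   scale_x_continuous(breaks = log_positions, labels = tick_labels)
```

Note the direction of the mapping: the breaks are logged values and the labels are the unlogged ones, which is the reverse of the breaks and labels in the question's attempt.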
I am trying to insert labels into a proportional bar chart: one label per segment, showing the percentage of that segment as text. With the help of thothal I managed to do this:
var1 <- factor(as.character(c(1,1,2,3,1,4,3,2,3,2,1,4,2,3,2,1,4,3,1,2)))
var2 <- factor(as.character(c(1,4,2,3,4,2,1,2,3,4,2,1,1,3,2,1,2,4,3,2)))
data <- data.frame(var1, var2)
dat <- ddply(data, .(var1), function(.) {
res <- cumsum(prop.table(table(factor(.$var2))))
data.frame(lab = names(res), y = c(res))
})
ggplot(data, aes(x = var1)) + geom_bar(aes(fill = var2), position = 'fill') +
geom_text(aes(label = lab, x = var1, y = y), data = dat)
I would like the labels to show the percentage of each level, not the level name.
Any help appreciated!
You are telling geom_text to use var2 as your y variable. That is in fact as.numeric(data$var2), which translates to a range of 1-4. However, your barplot uses the cumulative percentages.
Hence you have to calculate these positions before:
library(ggplot2)
library(plyr) # just for convenience
var1 <- factor(as.character(c(1,1,2,3,1,4,3,2,3,2,1,4,2,3,2,1,4,3,1,2)))
var2 <- factor(as.character(c(1,4,2,3,4,2,1,2,3,4,2,1,1,3,2,1,2,4,3,2)))
data <- data.frame(var1, var2)
dat <- ddply(data, .(var1), function(.) {
res <- cumsum(prop.table(table(factor(.$var2)))) # re-factor to use only used levels
res2 <- prop.table(table(factor(.$var2))) # re-factor to use only used levels
data.frame(lab = names(res), y = c(res), lab2 = c(res2))
})
ggplot(data, aes(x = var1)) + geom_bar(aes(fill = var2), position = 'fill') +
geom_text(aes(label = round(lab2, 2), x = var1, y = y), data = dat)
This places the labels at the end of each bar segment. If you want to have them slightly offset, you should play around with the creation of dat.
Another way to get non-cumulative percentage plus centering the labels, for future reference:
dat <- ddply(data, .(var1), function(.) {
good <- prop.table(table(factor(.$var2)))
res <- cumsum(prop.table(table(factor(.$var2))))
data.frame(lab = names(res), y = c(res), good = good, pos = cumsum(good) - 0.5*good)
})
ggplot(data, aes(x = var1)) + geom_bar(aes(fill = var2), position = 'fill') +
geom_text(aes(label = round(good.Freq, 2), x = var1, y = pos.Freq), data = dat)
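The label positions above boil down to simple arithmetic: within each bar, a segment's midpoint is the cumulative proportion minus half of its own proportion. A base R sketch of that calculation, without plyr, is shown below; labels_df and its column names are made up for illustration. It assumes segments are stacked in level order from the bottom; newer ggplot2 versions may stack in the opposite order, so check the positions against the bars.

```r
var1 <- factor(as.character(c(1,1,2,3,1,4,3,2,3,2,1,4,2,3,2,1,4,3,1,2)))
var2 <- factor(as.character(c(1,4,2,3,4,2,1,2,3,4,2,1,1,3,2,1,2,4,3,2)))
data <- data.frame(var1, var2)

# One vector of var2 values per bar (level of var1)
pieces <- split(data$var2, data$var1)

labels_df <- do.call(rbind, lapply(names(pieces), function(g) {
  tab <- table(factor(pieces[[g]]))      # drop levels unused in this bar
  p <- as.numeric(tab) / sum(tab)        # share of each segment
  data.frame(var1  = g,
             var2  = names(tab),
             prop  = p,
             pos   = cumsum(p) - p / 2,  # midpoint of each segment
             label = sprintf("%.0f%%", 100 * p))
}))

print(labels_df)
```

The label column can be used directly as the text aesthetic, and pos as the y position.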
I used the following code and it works well for me; give it a try.
geom_text(aes(label = paste(round(dat2$value,0), "%"),
vjust = ifelse(value >= 0, -0.05, 1.15)
),
size = 4, position = position_stack(vjust=0.5)
)
Basically, you need label = paste(y value, "%"). In my code, dat2 is the name of the data frame and value is the y value in the figure. In this case, I rounded the number to 0 decimal places. Good luck.
Alright, this has got me stumped. I have this function:
tf <- function(formula = NULL, data = NULL) {
res <- as.character(formula[[2]])
fac2 <- as.character(formula[[3]][3])
fac1 <- as.character(formula[[3]][2])
# Aesthetic & Data 1
p <- ggplot(aes_string(x = fac1, y = res, color = fac1), data = data) +
facet_grid(paste(".~", fac2)) + geom_point() # OK if we only go this far
facCounts <- count(data, vars = c(fac2, fac1))
facCounts$label <- paste("n = ", facCounts$freq , sep = "")
facCounts$y <- min(data$res) - 0.1*diff(range(data$res))
facCounts <- facCounts[,-3]
names(facCounts) <- c("f2", "f1", "lab", "y") # data frame looks correct
# Aesthetic & Data 2
p <- p + geom_text(aes(x = f1, y = y, label = lab),
color = "black", size = 4.0, data = facCounts) + facet_grid(".~f2")
p
}
Which when run with this data and call:
set.seed(1234)
mydf <- data.frame(
resp = rnorm(40),
cat1 = sample(LETTERS[1:3], 40, replace = TRUE),
cat2 = sample(letters[1:2], 40, replace = TRUE))
p <- tf(formula = resp~cat1*cat2, data = mydf); print(p)
Produces this picture:
If you look carefully, you'll see that the data in the two facets are actually the same. The counts are correct for the data that should be displayed (and are stored in facCounts). If the call to geom_text is commented out, then the plot is correct. A variety of changes to the geom_text call leave me with either what you see above, or the correct data is present but the count texts overlap. I can't find the way out of this labyrinth! An attempt with + annotate("text", ...) doesn't work either. What change is needed to keep the data faceted and the counts correct? Thanks. This is ggplot 0.9.3 btw.
Now that I've convinced myself that this will work:
tf <- function(formula = NULL, data = NULL) {
res <- as.character(formula[[2]])
fac2 <- as.character(formula[[3]][3])
fac1 <- as.character(formula[[3]][2])
# Aesthetic & Data 1
p <- ggplot(aes_string(x = fac1, y = res, color = fac1), data = data) +
facet_grid(paste(".~", fac2)) + geom_point() # OK if we only go this far
facCounts <- count(data, vars = c(fac2, fac1))
facCounts$label <- paste("n = ", facCounts$freq , sep = "")
# data$res only works through partial matching of `$` against "resp"; index by name instead
facCounts$y <- min(data[[res]]) - 0.1*diff(range(data[[res]]))
facCounts <- facCounts[,-3]
names(facCounts) <- c("cat2", "f1", "lab", "y") # data frame looks correct
# Aesthetic & Data 2
p <- p + geom_text(aes(x = f1, y = y, label = lab),
color = "black", size = 4.0, data = facCounts)
p
}
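You were calling facet_grid a second time using a differently named faceting variable, which replaced the original faceting. Removing the second call and renaming f2 to cat2, so that the label data carries the faceting variable under the same name as the main data, seems to work.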
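One way to avoid the renaming problem altogether is to build the counts so that they keep the original column names. The sketch below does this in base R for the example data; counts, lab and y are names chosen here, and it hard-codes cat1, cat2 and resp rather than taking them from a formula.

```r
set.seed(1234)
mydf <- data.frame(
  resp = rnorm(40),
  cat1 = sample(LETTERS[1:3], 40, replace = TRUE),
  cat2 = sample(letters[1:2], 40, replace = TRUE))

# Naming the table() arguments keeps cat1 and cat2 as column names,
# so a text layer using this data can be faceted by cat2 like the points
counts <- as.data.frame(table(cat1 = mydf$cat1, cat2 = mydf$cat2),
                        stringsAsFactors = FALSE)
counts$lab <- paste0("n = ", counts$Freq)

# Place the labels slightly below the lowest response value
counts$y <- min(mydf$resp) - 0.1 * diff(range(mydf$resp))

print(counts)
```

A layer such as geom_text(aes(x = cat1, y = y, label = lab), data = counts) would then pick up the existing facet specification without a second facet_grid call.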