I have a dataset with three variables: year, age group and result. There are 9 different age groups. I am trying to create a 3x3 grid of 9 subplots, each a geom_line() plot of one age group. So far I have a 3x3 grid in which every subplot shows the results for all groups, which was my step 1. However, I can't find a way to restrict each subplot to its own group. Help is appreciated.
Here is my code so far:
# n is a list of unique values aka age groups in the Data1
n <- unique(Data1$age)
# Preparation for plotting all subplots
plot_lst <- vector("list", length = length(n))
for (i in 1:length(n)) {
g <- ggplot(Data1, aes(x=year, y=data, color=age)) +
geom_line()
plot_lst[[i]] <- g
}
# Plotting all subplots into one
cowplot::plot_grid(plotlist = plot_lst, ncol = 3)
I know it is currently missing the filtering, which I mainly tried to do within geom_line(), but I wasn't able to find a working solution.
Found a solution with:
ggplot(Data1, aes(year, data)) +
geom_line() +
geom_point() +
facet_wrap(~ age)
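For completeness, the loop from the question can also be made to work by filtering the data before it reaches ggplot(). The following is only a sketch: the Data1 built here is a made-up stand-in with the column names used in the question (year, age, data), and the real dataset may differ.

```r
library(ggplot2)

# Hypothetical stand-in for Data1: 9 age groups, 5 years each
set.seed(1)
Data1 <- expand.grid(year = 2015:2019,
                     age = paste0("group", 1:9),
                     stringsAsFactors = FALSE)
Data1$data <- runif(nrow(Data1))

ages <- unique(Data1$age)

# One plot per age group, each built from that group's rows only
plot_lst <- lapply(ages, function(a) {
  ggplot(Data1[Data1$age == a, ], aes(x = year, y = data)) +
    geom_line() +
    ggtitle(a)
})

# Combine as in the question (requires the cowplot package):
# cowplot::plot_grid(plotlist = plot_lst, ncol = 3)
```

facet_wrap() remains the shorter route; the list of plots is mainly useful when each panel needs its own scales or theme.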
I have a very simple loop trying to draw four curves on the same graph using ggplot. Here is the code:
df = data.frame(x=0:10/10)
gg = ggplot(df)
for (t in 4:1/4)
gg = gg + geom_path(aes(x,x^t))
gg
When I run it, it only shows the last curve. If I add them one at a time, e.g.:
df = data.frame(x=0:10/10)
gg = ggplot(df)
gg = gg + geom_path(aes(x,x^1.00))
gg = gg + geom_path(aes(x,x^0.75))
gg = gg + geom_path(aes(x,x^0.50))
gg = gg + geom_path(aes(x,x^0.25))
gg
it works just fine. Can someone explain the magic?
Baptiste suggested creating the entire data.frame with all variables first and then plotting it (preferably in long format). The answer provided by Gene creates the data in wide format, which requires looping over the columns.
The code below creates the data in long format and plots all curves in one call:
# create data in long format
df <- expand.grid(x = 0:10/10, exp = 1:4/4)
df$y <- df$x^df$exp
# plot
library(ggplot2)
gg <- ggplot(df, aes(x, y, group = exp)) + geom_line()
gg
Note that geom_line() is used here because it connects the observations in order of the variable on the x axis. geom_path() connects the observations in the order in which they appear in the data.
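The ordering difference can be checked on a tiny data frame whose rows are deliberately out of x order. This is a small sketch with made-up values; layer_data() returns the data as each geom prepares it for drawing.

```r
library(ggplot2)

# Rows are intentionally not sorted by x
d <- data.frame(x = c(3, 1, 2), y = c(30, 10, 20))

p_line <- ggplot(d, aes(x, y)) + geom_line()
p_path <- ggplot(d, aes(x, y)) + geom_path()

# x values in the order each geom will connect them
line_x <- layer_data(p_line)$x
path_x <- layer_data(p_path)$x

line_x
path_x
```

geom_line() should report the points sorted along x, while geom_path() should keep the row order of d.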
The different curves can be colour-coded as well:
# continuous scale
gg + aes(colour = exp)
# discrete scale
gg + aes(colour = factor(exp))
Note that by including the colour aesthetic in the call to aes() an appropriate legend is created by default.
you could substitute the value explicitly,
eval(substitute(expr = {gg = gg + geom_path(aes(x,x^t))}, env = list(t=t)))
but a better solution would be to create the entire data.frame with all variables first, and then plot it (preferably in long format).
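A sketch of the same substitution idea using the unquoting operator !!, which aes() supports in ggplot2 3.0.0 and later (an assumption about the installed version). It also shows that the naive loop does add four layers, but all of them resolve t to its final value when the plot is built.

```r
library(ggplot2)

df <- data.frame(x = 0:10 / 10)

# Naive loop: each layer stores the expression x^t, not its value
lazy <- ggplot(df)
for (t in 4:1 / 4) lazy <- lazy + geom_path(aes(x, x^t))

# Unquoting loop: !! inlines the current value of t into each mapping
eager <- ggplot(df)
for (t in 4:1 / 4) eager <- eager + geom_path(aes(x, x^(!!t)))

# Computed y values per layer
lazy_y  <- lapply(ggplot_build(lazy)$data,  function(d) d$y)
eager_y <- lapply(ggplot_build(eager)$data, function(d) d$y)

length(unique(lazy_y))   # number of distinct curves in the naive plot
length(unique(eager_y))  # number of distinct curves with unquoting
```

Building the data in long format first is still the more idiomatic approach; unquoting is mainly useful when layers really have to be added in a loop.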
As alluded to by baptiste and the solution above, the for loop doesn't work because of lazy evaluation: aes() stores the expression x^t unevaluated, so every layer sees the final value of t when the plot is drawn. Here's a for loop approach that works by supplying updated data in every iteration. As mentioned elsewhere, there are more efficient ways to plot this.
#make the data and put it all into a single df
df = data.frame(x=0:10/10)
df = cbind(df,sapply(4:1/4, function(t) df$x^t))
# initiate ggplot
g <- ggplot(df)
# make some colours
cols = colorRampPalette(c("blue",'green'))(ncol(df))
# loop over columns
for (j in 2:ncol(df)){
# update the data within the loop
gg.data <- data.frame(x = df[,1], y = df[,j])
# add the line
g <- g + geom_path(data = gg.data, aes(x,y), col = cols[j])
}
g
library(alr4)
par(mfrow = c(2,2))
ggplot(walleye, aes(x= age)) + geom_histogram() + facet_grid(~age)
I would like to create 4 histograms from the data set walleye. The histograms should show the length of the walleye, with one histogram per age. I would like to restrict the ages to 1 through 4. How can I do that with ggplot?
If I understand what you are trying to do correctly, this should help:
library(alr4)
library(ggplot2)
ggplot(subset(walleye, age<5), aes(x=length)) + geom_histogram() + facet_grid(~age)
This way you are only plotting the subset of the data where age is 1-4, and you are actually plotting histograms of length.
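You could try this too: keep the plot object and swap in the filtered data with %+% (note that x is mapped to length here, since the histograms should show length rather than age):
library(alr4)
library(ggplot2)
p <- ggplot(walleye, aes(x = length)) + geom_histogram() + facet_grid(~age)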
p %+% subset(walleye, age %in% 1:4)
I want to create a (time-series) plot from 40 million data points that shows two regression lines, each with one specific event marked on it (the first occurrence of an optimum in the time series).
Currently, I draw the regression lines and add a geom_vline to each to indicate the event.
As I don't want the plot to depend on colours, it would be better if I could draw the marker as a point on the regression line instead of a geom_vline.
Do you have any idea how to solve this using ggplot2?
My current approach is this here (replaced data points with test data):
library(ggplot2)
# Generate data
m1 <- "method 1"
m2 <- "method 2"
data1 <- data.frame(Time=seq(100), Value=sample(1000, size=100), Type=rep(as.factor(m1), 100))
data2 <- data.frame(Time=seq(100), Value=sample(1000, size=100), Type=rep(as.factor(m2), 100))
df <- rbind(data1, data2)
rm(data1, data2)
# Calculate first minima for each Type
m1_intercept <- df[which(df$Type == m1), ][which.min(df[which(df$Type == m1), ]$Value),]
m2_intercept <- df[which(df$Type == m2), ][which.min(df[which(df$Type == m2), ]$Value),]
# Plot regression and vertical lines
p1 <- ggplot(df, aes(x=Time, y=Value, group=Type, colour=Type), linetype=Type) +
geom_smooth(se=F) +
geom_vline(aes(xintercept=m1_intercept$Time, linetype=m1_intercept$Type)) +
geom_vline(aes(xintercept=m2_intercept$Time, linetype=m2_intercept$Type)) +
scale_linetype_manual(name="", values=c("dotted", "dashed")) +
guides(colour=guide_legend(title="Regression"), linetype=guide_legend(title="First occurrence of optimum")) +
theme(legend.position="bottom")
ggsave("regression.png", plot=p1, height=5, width=7)
which generates this plot:
My desired plot would be something like this:
So my questions are
Does it make sense to indicate a minimum value on a regression line? The point's y-axis position would in fact be wrong, but it would only serve to indicate the time point.
If yes, how can I achieve such a behaviour?
If no, what would you think could be better?
Thank you very much in advance!
Robin
If you first run your ggplot() call with only geom_smooth(), you can access plotted values through ggplot_build(), which we then can use to plot points on the two fitted lines. Example:
# Create initial plot
p1<-ggplot(df, aes(x=Time, y=Value, colour=Type)) +
geom_smooth(se=F)
# Now we can access the fitted values
smooths <- ggplot_build(p1)$data[[1]]
smooths_1 <- smooths[smooths$group==1,] # First group (method 1)
smooths_2 <- smooths[smooths$group==2,] # Second group (method 2)
# Then we find the closest plotted values to the minima
smooth_1_x <- smooths_1$x[which.min(abs(smooths_1$x - m1_intercept$Time))]
smooth_2_x <- smooths_2$x[which.min(abs(smooths_2$x - m2_intercept$Time))]
# Subset the previously defined datasets for respective closest values
point_data1 <- smooths_1[smooths_1$x==smooth_1_x,]
point_data2 <- smooths_2[smooths_2$x==smooth_2_x,]
Now we use point_data1 and point_data2 to place the points on your plot:
ggplot(df, aes(x=Time, y=Value, colour=Type)) +
geom_smooth(se=F) +
geom_point(data=point_data1, aes(x=x, y=y), colour = "red",size = 5) +
geom_point(data=point_data2, aes(x=x, y=y), colour = "red", size = 5)
To reproduce this plot, you can use set.seed(42) for your data generation step.
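An alternative sketch that avoids ggplot_build() is to fit the smoother yourself and predict at the time of each minimum. This assumes geom_smooth() is using its default loess fit, which applies only to small groups (fewer than 1,000 observations); with 40 million points the default is a different smoother, so the model below would need to be changed to match. The test data mirrors the question's, and the name marker is made up for illustration.

```r
# Test data in the shape used in the question
set.seed(42)
df <- data.frame(Time  = rep(seq(100), 2),
                 Value = sample(1000, size = 200, replace = TRUE),
                 Type  = rep(c("method 1", "method 2"), each = 100))

# For each Type: find the time of the first minimum, then evaluate
# a loess fit (default span and degree) at that time
marker <- do.call(rbind, lapply(split(df, df$Type), function(d) {
  t_min <- d$Time[which.min(d$Value)]
  fit <- loess(Value ~ Time, data = d)
  data.frame(Type  = d$Type[1],
             Time  = t_min,
             Value = predict(fit, newdata = data.frame(Time = t_min)))
}))

marker

# The points can then be added to the existing plot, e.g.:
# p1 + geom_point(data = marker, aes(x = Time, y = Value), size = 5)
```

Because the point is predicted at the exact time of the minimum, there is no need to search for the nearest x value among the plotted ones.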
I am having trouble making my faceted plot display only facets that contain data, rather than also showing empty facets.
The following code:
p<- ggplot(spad.data, aes(x=Day, y=Mean.Spad, color=Inoc))+
geom_point()
p + facet_grid(N ~ X.CO2.)
Gives the following graphic:
I have played around with it for a while but can't seem to figure out a solution.
Dataframe viewable here: https://docs.google.com/spreadsheets/d/11ZiDVRAp6qDcOsCkHM9zdKCsiaztApttJIg1TOyIypo/edit?usp=sharing
Reproducible Example viewable here: https://docs.google.com/document/d/1eTp0HCgZ4KX0Qavgd2mTGETeQAForETFWdIzechTphY/edit?usp=sharing
Your issue lies in the missing observations for your x and y variables. Missing values there don't influence which facets are created; that depends only on the levels of the faceting variables present in the data. Here is an illustration using sample data:
#generate some data
nobs=100
set.seed(123)
dat <- data.frame(G1=sample(LETTERS[1:3],nobs, TRUE),
G2 = sample(LETTERS[1:3], nobs, TRUE),
x=rnorm(nobs),
y=rnorm(nobs))
#introduce some missings in one group
dat$x[dat$G1=="C"] <- NA
#attempt to plot
p1 <- ggplot(dat, aes(x=x,y=y)) + facet_grid(G1~G2) + geom_point()
p1 #facets are generated according to the present levels of the grouping factors
#possible solution: remove the missing data before plotting
p2 <- ggplot(dat[complete.cases(dat),], aes(x=x, y=y)) + facet_grid(G1 ~G2) + geom_point()
p2
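Before plotting, it can help to count how many usable rows each facet would receive. The sketch below rebuilds the same sample data and tabulates complete cases per combination of the two grouping variables; the name usable is made up for illustration.

```r
# Same sample data as in the illustration above
nobs <- 100
set.seed(123)
dat <- data.frame(G1 = sample(LETTERS[1:3], nobs, TRUE),
                  G2 = sample(LETTERS[1:3], nobs, TRUE),
                  x  = rnorm(nobs),
                  y  = rnorm(nobs))
dat$x[dat$G1 == "C"] <- NA

# Number of rows with no missing values in each G1 x G2 cell
usable <- tapply(complete.cases(dat), list(G1 = dat$G1, G2 = dat$G2), sum)
usable
```

Cells with a count of zero correspond to the panels that would be drawn empty if the incomplete rows were left in.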
I like the stat_ecdf() feature of the ggplot2 package, which I find quite useful for exploring a data series. However, this is only visual, and I wonder whether it is feasible, and if so how, to get the associated table.
Please have a look at the following reproducible example:
p <- ggplot(iris, aes_string(x = "Sepal.Length")) + stat_ecdf() # building of the cumulated chart
p
attributes(p) # chart attributes
p$data # data is the iris dataset, not the series used for displaying the chart
As @krfurlong showed me in this question, the layer_data() function in ggplot2 can get you exactly what you're looking for without the need to recreate the data.
p <- ggplot(iris, aes_string(x = "Sepal.Length")) + stat_ecdf()
p.data <- layer_data(p)
The first column in p.data, "y", contains the ecdf values. "x" is the Sepal.Length values on the x-axis in your plot.
We can recreate the data:
#Recreate ecdf data
dat_ecdf <-
data.frame(x=unique(iris$Sepal.Length),
y=ecdf(iris$Sepal.Length)(unique(iris$Sepal.Length))*length(iris$Sepal.Length))
#rescale y to 0,1 range
dat_ecdf$y <-
scale(dat_ecdf$y,center=min(dat_ecdf$y),scale=diff(range(dat_ecdf$y)))
The two plots below should look the same:
#plot using new data
ggplot(dat_ecdf,aes(x,y)) +
geom_step() +
xlim(4,8)
#plot with built-in stat_ecdf
ggplot(iris, aes_string(x = "Sepal.Length")) +
stat_ecdf() +
xlim(4,8)
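If only the table is needed, a minimal sketch with base R's ecdf() gives the cumulative proportions directly, without rescaling and without building a plot. The name ecdf_tab is made up for illustration.

```r
sl <- iris$Sepal.Length

# ecdf() returns a function giving the proportion of values <= its argument
Fn <- ecdf(sl)

# One row per distinct value, in increasing order
ecdf_tab <- data.frame(x = sort(unique(sl)))
ecdf_tab$y <- Fn(ecdf_tab$x)

head(ecdf_tab)
```

Here y already lies between 0 and 1, with the first value equal to the share of observations at the minimum rather than 0.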