I have a list of data frames that all share the same structure, and I want to plot information from all of them on the same diagram in R using ggplot2, the way facet_wrap shows multiple panels in a single image, but I am having trouble. Below I have created a reproducible example.
library(ggplot2)
#Designating 3 datasets:
data_1 <- mtcars
data_2 <- mtcars
data_3 <- mtcars
#Making them into a list:
mylist <- list(data_1, data_2, data_3)
#What things should look like, with facet_wrap being by "dataset", and thus a panel for each of the
#three datasets presented.
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point() + facet_wrap(~Species)
But instead, when I run the following, I get an error saying that the data must be presented as a dataframe, not a list:
ggplot(mylist, aes(x = cyl, y = mpg)) + geom_point() + facet_wrap(~.x)
Does anyone know the best way to use ggplot to plot from a list like this? Do you have to somehow wrap ggplot within lapply()?
One option would be to bind your dataframes by row using e.g. dplyr::bind_rows:
library(ggplot2)
data_1 <- mtcars
data_2 <- mtcars
data_3 <- mtcars
mylist <- list(data_1, data_2, data_3) |>
  dplyr::bind_rows(.id = "id")
ggplot(mylist, aes(x = cyl, y = mpg)) + geom_point() + facet_wrap(~id)
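If the list is named, bind_rows(.id = ...) uses those names as the identifier instead of 1, 2, 3, which gives more readable panel titles. A minimal sketch, assuming dplyr is installed; the list names first/second/third and the column name dataset are made up for illustration:

```r
library(ggplot2)
library(dplyr)

# Hypothetical names for the three datasets; they become the facet labels
datasets <- list(first = mtcars, second = mtcars, third = mtcars)

# Stack the data frames and record each row's origin in a "dataset" column
combined <- bind_rows(datasets, .id = "dataset")

# One panel per original data frame
p <- ggplot(combined, aes(x = cyl, y = mpg)) +
  geom_point() +
  facet_wrap(~dataset)
p
```

So there is no need to wrap ggplot() in lapply() for this; that would only be needed if you wanted a separate plot object per data frame.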
I am trying to display grouped boxplots and a combined boxplot in one plot. Take the iris data, for instance:
library(ggplot2)
data(iris)
p1 <- ggplot(iris, aes(x=Species, y=Sepal.Length)) +
  geom_boxplot()
p1
I am trying to compare the overall distribution with the distributions within each category. So is there a way to display a boxplot of all samples to the left of these three grouped boxplots?
Thanks in advance.
You can rbind a modified copy of iris, in which Species equals "All" for every row, to iris before piping to ggplot:
library(dplyr)   # for %>% and mutate()
library(ggplot2)
p1 <- iris %>%
  rbind(iris %>% mutate(Species = 'All')) %>%
  ggplot(aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()
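Note that with the rbind approach the "All" level is appended after the existing species levels, so its box is drawn on the right. To get it on the left, as the question asks, one option is to set the factor levels explicitly. A minimal sketch under that assumption; iris_all is a name made up for this example:

```r
library(ggplot2)
library(dplyr)

iris_all <- iris %>%
  mutate(Species = as.character(Species)) %>%      # work with plain strings
  bind_rows(mutate(iris, Species = "All")) %>%     # add a copy labelled "All"
  mutate(Species = factor(Species,
                          levels = c("All", levels(iris$Species))))  # "All" first

p1 <- ggplot(iris_all, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()
p1
```

ggplot2 orders a discrete axis by factor level, so putting "All" first in levels places that box leftmost.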
Yes, you can just create a column for all species as follows:
iris = iris %>% mutate(all = "All Species")
p1 <- ggplot(iris) +
  geom_boxplot(aes(x=Species, y=Sepal.Length)) +
  geom_boxplot(aes(x=all, y=Sepal.Length))
p1
I have composed a function that develops histograms using ggplot2 on the numerical columns of a dataframe that will be passed to it. The function stores these plots into a list and then returns the list.
However, when I run the function, I get the same plot again and again.
My code is below, together with a reproducible example.
hist_of_columns = function(data, class, variables_to_exclude = c()){
  library(ggplot2)
  library(ggthemes)
  data = as.data.frame(data)
  variables_numeric = names(data)[unlist(lapply(data, function(x){is.numeric(x) | is.integer(x)}))]
  variables_not_to_plot = c(class, variables_to_exclude)
  variables_to_plot = setdiff(variables_numeric, variables_not_to_plot)
  indices = match(variables_to_plot, names(data))
  index_of_class = match(class, names(data))
  plots = list()
  for (i in (1 : length(variables_to_plot))){
    p = ggplot(data, aes(x= data[, indices[i]], color= data[, index_of_class], fill=data[, index_of_class])) +
      geom_histogram(aes(y=..density..), alpha=0.3,
                     position="identity", bins = 100)+ theme_economist() +
      geom_density(alpha=.2) + xlab(names(data)[indices[i]]) + labs(fill = class) + guides(color = FALSE)
    name = names(data)[indices[i]]
    plots[[name]] = p
  }
  plots
}
data(mtcars)
mtcars$am = factor(mtcars$am)
data = mtcars
variables_to_exclude = 'mpg'
class = 'am'
plots = hist_of_columns(data, class, variables_to_exclude)
If you check the list plots you will discover that it contains the same plot repeated.
Simply use aes_string to pass string variables into the ggplot() call. Right now, your aesthetics are vectors extracted from data by hand rather than columns named in ggplot's data argument. In the call below, x, color, and fill are separate, unrelated vectors; they derive from the same source, but ggplot does not know that:
ggplot(data, aes(x= data[, indices[i]], color= data[, index_of_class], fill=data[, index_of_class]))
However, with aes_string, passing string names to x, color, and fill will point to data:
ggplot(data, aes_string(x= names(data)[indices[i]], color= class, fill= class))
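A note on why the plots come out identical: aes() stores its expressions unevaluated and only evaluates them when the plot is drawn, by which time the loop index i holds its final value. Also, aes_string is soft-deprecated in recent ggplot2 releases, and the .data pronoun is the suggested replacement. Below is a minimal sketch of the same idea with .data, assuming ggplot2 3.3.0 or later (for after_stat); the function name hist_by_class, the argument exclude, and bins = 30 are choices made for this example, and the ggthemes theme is left out to keep it self-contained:

```r
library(ggplot2)

hist_by_class <- function(data, class, exclude = character()) {
  data <- as.data.frame(data)
  # numeric columns, minus the class column and anything explicitly excluded
  numeric_vars <- names(data)[vapply(data, is.numeric, logical(1))]
  vars <- setdiff(numeric_vars, c(class, exclude))

  # lapply gives each plot its own `v`, so nothing depends on a loop index
  plots <- lapply(vars, function(v) {
    ggplot(data, aes(x = .data[[v]],
                     colour = .data[[class]],
                     fill = .data[[class]])) +
      geom_histogram(aes(y = after_stat(density)), alpha = 0.3,
                     position = "identity", bins = 30) +
      geom_density(alpha = 0.2) +
      labs(x = v, fill = class) +
      guides(colour = "none")
  })
  setNames(plots, vars)
}

# Same example as in the question, on a copy so mtcars itself is untouched
cars <- mtcars
cars$am <- factor(cars$am)
plots <- hist_by_class(cars, "am", exclude = "mpg")
```

Printing plots$cyl and plots$wt should now show different histograms.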
Here is a strategy using tidyeval that does what you are after:
library(rlang)
library(tidyverse)
hist_of_cols <- function(data, class, drop_vars) {
  # tidyeval overhead
  class_enq <- enquo(class)
  drop_enqs <- enquo(drop_vars)
  data %>%
    group_by(!!class_enq) %>% # keep the 'class' column always
    select(-!!drop_enqs) %>% # drop any 'drop_vars'
    select_if(is.numeric) %>% # keep only numeric columns
    gather("key", "value", -!!class_enq) %>% # go to long form
    split(.$key) %>% # make a list of data frames
    map(~ ggplot(., aes(value, fill = !!class_enq)) + # plot as usual
          geom_histogram() +
          geom_density(alpha = .5) +
          labs(x = unique(.$key)))
}
hist_of_cols(mtcars, am, mpg)
hist_of_cols(mtcars, am, c(mpg, wt))
My data set features a factor (TypeOfCat) and a numeric variable (AgeOfCat).
I've made the box plot below. In addition to a box representing each type of cat, I've also tried to add a box representing the ungrouped data (i.e. the entire cohort of cats and their ages). What I've got is not quite what I'm after, though, as sum() of course won't provide all the information needed to create such a plot. Any help would be much appreciated.
Data set and current code:
library(ggplot2)
Df1 <- data.frame(TypeOfCat=c("A","B","B","C","C","A","B","C","A","B","A","C"),
                  AgeOfCat=c(14,2,5,8,4,5,2,6,3,6,12,7))
Df2 <- data.frame(TypeOfCat=c("AllCats"),
                  AgeOfCat=sum(Df1$AgeOfCat))
Df1 <- rbind(Df1, Df2)
qplot(Df1$TypeOfCat,Df1$AgeOfCat, geom = "boxplot") + coord_flip()
No need for sum. Just take all the values individually for AllCats:
# Your original code:
library(ggplot2)
Df1 <- data.frame(TypeOfCat=c("A","B","B","C","C","A","B","C","A","B","A","C"),
                  AgeOfCat=c(14,2,5,8,4,5,2,6,3,6,12,7))
# this is the different part:
Df2 <- data.frame(TypeOfCat=c("AllCats"),
                  AgeOfCat=Df1$AgeOfCat)
Df1 <- rbind(Df1, Df2)
qplot(Df1$TypeOfCat,Df1$AgeOfCat, geom = "boxplot") + coord_flip()
You can see you have all the observations if you add geom_point to the boxplot:
ggplot(Df1, aes(TypeOfCat, AgeOfCat)) +
  geom_boxplot() +
  geom_point(color='red') +
  coord_flip()
Like this?
library(ggplot2)
# first double your data frame, but change "TypeOfCat", since it contains all:
df <- rbind(Df1, transform(Df1, TypeOfCat = "AllCats"))
# then plot it:
ggplot(data = df, mapping = aes(x = TypeOfCat, y = AgeOfCat)) +
  geom_boxplot() + coord_flip()
I would like to draw multiple separate plots, and so far I have the code below.
However, I don't want the final column of my dataset included; it makes ggplot2 plot the x-variable against the x-variable.
library(ggplot2)
library(reshape)
d <- read.table("C:/Users/trinh/Desktop/Book1.csv", header=FALSE, sep=",", skip=24)
t <- c(0.25,1,2,3,4,6,8,10)
d2 <- d[,3:13]     # removing unwanted columns
d2 <- cbind(d2,t)  # adding x-variable
df <- melt(d2, id = 't')
ggplot(data=df, aes(y=value,x=t)) + geom_point(shape=1) +
  geom_smooth(method='lm',se=FALSE) + facet_grid(.~variable)
I tried adding
data=subset(df,df[,3:12])
but I don't think I am writing it correctly. Please advise. Thanks.
Here's how you could do it, using data(iris) as an example:
(i) plot with all variables
library(ggplot2)
df <- reshape2::melt(iris, id="Species")
ggplot(df, aes(y=value, x=Species)) + geom_point() + facet_wrap(~ variable)
(ii) plot without "Petal.Width"
library(dplyr)
df2 <- df %>% filter(variable != "Petal.Width")
ggplot(df2, aes(y=value, x=Species)) + geom_point() + facet_wrap(~ variable)
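Another option is to drop the unwanted column before reshaping, so it never reaches the long data at all. A minimal sketch on iris, assuming dplyr and tidyr are available; tidyr::pivot_longer stands in here for melt, and the name long is made up for the example:

```r
library(ggplot2)
library(dplyr)
library(tidyr)

long <- iris %>%
  select(-Petal.Width) %>%                  # drop the column you do not want plotted
  pivot_longer(-Species,                    # reshape everything except the id column
               names_to = "variable",
               values_to = "value")

p <- ggplot(long, aes(y = value, x = Species)) +
  geom_point() +
  facet_wrap(~ variable)
p
```

With the data from the question, the same idea would mean removing the extra column from d2 before calling melt.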
library(ggplot2)
x<-c(1,2,3,4,5)
a<-c(3,8,4,7,6)
b<-c(2,9,4,8,5)
df1 <- data.frame(x, a, b)
x<-c(1,2,3,4,5)
a<-c(6,5,9,4,1)
b<-c(9,5,8,6,2)
df2 <- data.frame(x, a, b)
df.lst <- list(df1, df2)
plotdata <- function(x) {
  ggplot(data = x, aes(x=x, y=a, color="blue")) +
    geom_point() +
    geom_line()
}
lapply(df.lst, plotdata)
I have a list of data frames and I am trying to plot the same columns on the same ggplot. I tried the code above, but it seems to return only one plot.
There should be two ggplots: one with the "a" column plotted and the other with the "b" column plotted, each using both data frames in the list.
I've looked at many examples, and it seems that this should work.
They are both plotted. If you are using RStudio, click the back arrow to toggle between the plots. If you want to see them together, do:
library(gridExtra)
do.call(grid.arrange,lapply(df.lst, plotdata))
If you want them on the same plot, it's as simple as:
ggplot(data = df1, aes(x=x, y=a)) +
  geom_point(color="blue") +   # colour is set in the layers; ggplot() itself ignores it
  geom_line(color="blue") +
  geom_line(data = df2, aes(x=x, y=a), color="red") +
  geom_point(data = df2, aes(x=x, y=a), color="red")
Edit: if you have several of these, you are probably better off combining them into a big data set while keeping the df of origin for use in the aesthetic. Example:
df.lst <- list(df1, df2)
# put an identifier so that you know which table the data came from after rbind
for(i in seq_along(df.lst)){
  df.lst[[i]]$df_num <- i
}
big_df <- do.call(rbind,df.lst) # you could also use `rbindlist` from `data.table`
# now use the identifier for the coloring in your plot
ggplot(data = big_df, aes(x=x, y=a, color=as.factor(df_num))) +
  geom_point() +
  geom_line() + scale_color_discrete(name="which df did I come from?")
#if you wanted to specify the colors for each df, see ?scale_color_manual instead
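To get what the question describes, one plot for column "a" and one for column "b", each showing both data frames, the combined data can be reused with the column name passed in as a string. A minimal sketch, assuming a ggplot2 version that supports the .data pronoun (3.0.0 or later); plot_column and the legend title are names made up for this example:

```r
library(ggplot2)

df1 <- data.frame(x = 1:5, a = c(3, 8, 4, 7, 6), b = c(2, 9, 4, 8, 5))
df2 <- data.frame(x = 1:5, a = c(6, 5, 9, 4, 1), b = c(9, 5, 8, 6, 2))
df.lst <- list(df1, df2)

# Tag each data frame with its position in the list, then stack them
tagged <- Map(function(d, i) transform(d, df_num = factor(i)),
              df.lst, seq_along(df.lst))
big_df <- do.call(rbind, tagged)

# Build one plot for a given column name, coloured by data frame of origin
plot_column <- function(col) {
  ggplot(big_df, aes(x = x, y = .data[[col]], colour = df_num)) +
    geom_point() +
    geom_line() +
    labs(y = col, colour = "data frame")
}

# A named list with two plots: plots$a and plots$b
plots <- lapply(c(a = "a", b = "b"), plot_column)
```

Print plots$a and plots$b separately, or pass the list to grid.arrange as shown earlier to see them side by side.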