I have a dataframe in R consisting of 104 columns, appearing as so:
id vcr1 vcr2 vcr3 sim_vcr1 sim_vcr2 sim_vcr3 sim_vcr4 sim_vcr5 sim_vcr6 sim_vcr7
1 2913 -4.782992840 1.7631999 0.003768704 1.376937 -2.096857 6.903021 7.018855 6.135139 3.188382 6.905323
2 1260 0.003768704 3.1577108 -0.758378208 1.376937 -2.096857 6.903021 7.018855 6.135139 3.188382 6.905323
3 2912 -4.782992840 1.7631999 0.003768704 1.376937 -2.096857 6.903021 7.018855 6.135139 3.188382 6.905323
4 2914 -1.311132669 0.8220594 2.372950077 -4.194246 -1.460474 -9.101704 -6.663676 -5.364724 -2.717272 -3.682574
5 2915 -1.311132669 0.8220594 2.372950077 -4.194246 -1.460474 -9.101704 -6.663676 -5.364724 -2.717272 -3.682574
6 1261 2.372950077 -0.7022792 -4.951318264 -4.194246 -1.460474 -9.101704 -6.663676 -5.364724 -2.717272 -3.682574
The "sim_vcr*" variables go all the way through sim_vcr100
I need two overlapping density density curves contained within one plot, looking something like this (except here you see 5 instead of 2):
I need one of the density curves to consist of all values contained in columns vcr1, vcr2, and vcr3, and I need another density curve containing all values in all of the sim_vcr* columns (so 100 columns, sim_vcr1-sim_vcr100)
Because the two curves overlap, they need to be transparent, like in the attached image. I know that there is a pretty straightforward way to do this using the ggplot command, but I am having trouble with the syntax, as well as getting my data frame oriented correctly so that each histogram pulls from the proper columns.
Any help is much appreciated.
With df being the data you mentioned in your post, you can try this:
Separate dataframes with next code, then plot:
library(tidyverse)
library(gdata)
#Index
i1 <- which(startsWith(names(df),pattern = 'vcr'))
i2 <- which(startsWith(names(df),pattern = 'sim'))
#Isolate
df1 <- df[,c(1,i1)]
df2 <- df[,c(1,i2)]
#Melt
M1 <- pivot_longer(df1,cols = names(df1)[-1])
M2 <- pivot_longer(df2,cols = names(df2)[-1])
#Plot 1
ggplot(M1) + geom_density(aes(x=value,fill=name), alpha=.5)
#Plot 2
ggplot(M2) + geom_density(aes(x=value,fill=name), alpha=.5)
Update
Use next code for one plot:
#Unique plot
#Melt
M <- pivot_longer(df,cols = names(df)[-1])
#Mutate
M$var <- ifelse(startsWith(M$name,'vcr',),'vcr','sim_vcr')
#Plot 3
ggplot(M) + geom_density(aes(x=value,fill=var), alpha=.5)
Using the dplyr package, first you can convert your data to long format using the function pivot_longer as follows:
df %<>% pivot_longer(cols = c(starts_with('vcr'), starts_with('sim_vcr')),
names_to = c('type'),
values_to = c('values'))
After using filter function you can create separate plots for each value type
For vcr columns:
df %>%
filter(str_detect(type, '^vcr')) %>%
ggplot(.) +
geom_density(aes(x = values, fill = type), alpha = 0.5)
The above produces the following plot:
for sim_vcr columns:
df %>%
filter(str_detect(type, '^sim_vcr')) %>%
ggplot(.) +
geom_density(aes(x = values, fill = type), alpha = 0.5)
The above code produces the following plot:
Another simple way to subset and prepare your data for ggplot is with gather() from tidyr which you can read more about. Heres how I do it. df being your data frame provided.
# Load tidyr to use gather()
library(tidyr)
#Split appart the data you dont want on their own, the first three columns, and gather them
df_vcr <- gather(data = df[,2:4])
#Gather the other columns in the dataframe
df_sim<- gather(data = df[,-c(1:4)])
#Plot the first
ggplot() +
geom_density(data = df_vcr,
mapping = aes(value, group = key, color = key, fill = key),
alpha = 0.5)
#Plot the second
ggplot() +
geom_density(data = df_sim,
mapping = aes(value, group = key, color = key, fill = key),
alpha = 0.5)
However I am a little unclear on what you mean by "all values in all of the sim_vcr* columns". Perhaps you want all of those values in one density curve? To do this, simply do not give ggplot any grouping info in the second case.
ggplot() + geom_density(data = df_sim,
mapping = aes(value),
fill = "grey50",
alpha = 0.5)
Notice here I can still specify the 'fill' for the curve outside of the aes() function and it will apply it too all curves instead of give each group specified in 'key' a different color.
Related
I am trying to plot 400 ecdf graphs in one image using ggplot.
As far as I know ggplot does not support the par(new=T) option.
So the first solution I thought was use the grid.arrange function in gridExtra package.
However, the ecdfs I am generating are in a for loop format.
Below is my code, but you could ignore the steps for data processing.
i=1
for(i in 1:400)
{
test<-subset(df,code==temp[i,])
test<-test[c(order(test$Distance)),]
test$AI_ij<-normalize(test$AI_ij)
AI = test$AI_ij
ggplot(test, aes(AI)) +
stat_ecdf(geom = "step") +
scale_y_continuous(labels = scales::percent) +
theme_bw() +
new_theme +
xlab("Calculated Accessibility Value") +
ylab("Percent")
}
So I have values stored in "AI" in the for loop.
In this case how should I plot 400 graphs in the same chart?
This is not the way to put multiple lines on a ggplot. To do this, it is far easier to pass all of your data together and map code to the "group" aesthetic to give you one ecdf line for each code.
By far the hardest part of answering this question was attempting to reverse-engineer your data set. The following data set should be close enough in structure and naming to allow the code to be run on your own data.
library(dplyr)
library(BBmisc)
library(ggplot2)
set.seed(1)
all_codes <- apply(expand.grid(1:16, LETTERS), 1, paste0, collapse = "")
temp <- data.frame(sample(all_codes, 400), stringsAsFactors = FALSE)
df <- data.frame(code = rep(all_codes, 100),
Distance = sqrt(rnorm(41600)^2 + rnorm(41600)^2),
AI_ij = rnorm(41600),
stringsAsFactors = FALSE)
Since you only want the first 400 codes from temp that appear in df to be shown on the plot, you can use dplyr::filter to filter out code %in% test[[1]] rather than iterating through the whole thing one element at a time.
You can then group_by code, and arrange by Distance within each group before normalizing AI_ij, so there is no need to split your data frame into a new subset for every line: the data is processed all at once and the data frame is kept together.
Finally, you plot this using the group aesthetic. Note that because you have 400 lines on one plot, you need to make each line faint in order to see the overall pattern more clearly. We do this by setting the alpha value to 0.05 inside stat_ecdf
Note also that there are multiple packages with a function called normalize and I don't know which one you are using. I have guessed you are using BBmisc
So you can get rid of the loop and do:
df %>%
filter(code %in% temp[[1]]) %>%
group_by(code) %>%
arrange(Distance, by_group = TRUE) %>%
mutate(AI = normalize(AI_ij)) %>%
ggplot(aes(AI, group = code)) +
stat_ecdf(geom = "step", alpha = 0.05) +
scale_y_continuous(labels = scales::percent) +
theme_bw() +
xlab("Calculated Accessibility Value") +
ylab("Percent")
I've tried about every iteration I can find on Stack Exchange of for loops and lapply loops to create ggplots and this code has worked well for me. My only problem is that I can't assign unique titles and labels. From what I can tell in the function i takes the values of my response variable so I can't index the title I want as the ith entry in a character string of titles.
The example I've supplied creates plots with the correct values but the 2nd and 3rd plots in the plot lists don't have the correct titles or labels.
Mock dataset:
library(ggplot2)
nms=c("SampleA","SampleB","SampleC")
measr1=c(0.6,0.6,10)
measr2=c(0.6,10,0.8)
measr3=c(0.7,10,10)
qual1=c("U","U","")
qual2=c("U","","J")
qual3=c("J","","")
df=data.frame(nms,measr1,qual1,measr2,qual2,measr3,qual3,stringsAsFactors = FALSE)
identify columns in dataset that contain response variable
measrsindex=c(2,4,6)
Create list of plots that show all samples for each measurement
plotlist=list()
plotlist=lapply(df[,measrsindex], function(i) ggplot(df,aes_string(x="nms",y=i))+
geom_col()+
ggtitle("measr1")+
geom_text(aes(label=df$qual1)))
Create list of plots that show all measurements for each sample
plotlist2=list()
plotlist2=lapply(df[,measrsindex],function(i)ggplot(df,aes_string(x=measrsindex, y=i))+
geom_col()+
ggtitle("SampleA")+
geom_text(aes(label=df$qual1)))
The problem is that I cant create unique title for each plot. (All plots in the example have the title "measr1" or "SampleA)
Additionally I cant apply unique labels (from qual columns) for each bar. (ex. the letter for qual 2 should appear on top of the column for measr2 for each sample)
Additionally in the second plot list the x-values aren't "measr1","measr2","measr3" they're the index values for those columns which isn't ideal.
I'm relatively new to R and have never posted on Stack Overflow before so any feedback about my problem or posting questions is welcomed.
I've found lots of questions and answers about this sort of topic but none that have a data structure or desired plot quite like mine. I apologize if this is a redundant question but I have tried to find the solution in previous answers and have been unable.
This is where I got the original code to make my loops, however this example doesn't include titles or labels:
Looping over ggplot2 with columns
You could loop over the names of the columns instead of the column itself and then use some non-standard evaluation to get column values from the names. Also, I have included label in aes.
library(ggplot2)
library(rlang)
plotlist3 <- purrr::map(names(df)[measrsindex],
~ggplot(df, aes(nms, !!sym(.x), label = qual1)) +
geom_col() + ggtitle(.x) + geom_text(vjust = -1))
plotlist3[[1]]
plotlist3[[2]]
The same can be achieved with lapply as well
plotlist4 <- lapply(names(df)[measrsindex], function(x)
ggplot(df, aes(nms, !!sym(x), label = qual1)) +
geom_col() + ggtitle(x) + geom_text(vjust = -1))
I would recommend putting your data in long format prior to using ggplot2, it makes plotting a much simpler task. I also recoded some variables to facilitate constructing the plot. Here is the code to construct the plots with lapply.
library(tidyverse)
#Change from wide to long format
df1<-df %>%
pivot_longer(cols = -nms,
names_to = c(".value", "obs"),
names_sep = c("r","l")) %>%
#Separate Sample column into letters
separate(col = nms,
sep = "Sample",
into = c("fill","Sample"))
#Change measures index to 1-3
measrsindex=c(1,2,3)
plotlist=list()
plotlist=lapply(measrsindex, function(i){
#Subset by measrsindex (numbers) and plot
df1 %>%
filter(obs == i) %>%
ggplot(aes_string(x="Sample", y="meas", label="qua"))+
geom_col()+
labs(x = "Sample") +
ggtitle(paste("Measure",i, collapse = " "))+
geom_text()})
#Get the letters A : C
samplesvec<-unique(df1$Sample)
plotlist2=list()
plotlist2=lapply(samplesvec, function(i){
#Subset by samplesvec (letters) and plot
df1 %>%
filter(Sample == i) %>%
ggplot(aes_string(x="obs", y = "meas",label="qua"))+
geom_col()+
labs(x = "Measure") +
ggtitle(paste("Sample",i,collapse = ", "))+
geom_text()})
Watching the final plots, I think it might be useful to use facet_wrap to make these plots. I added the code to use it with your plots.
#Plot for Measures
ggplot(df1, aes(x = Sample,
y = meas,
label = qua)) +
geom_col()+
facet_wrap(~ obs) +
ggtitle("Measures")+
labs(x="Samples")+
geom_text()
#Plot for Samples
ggplot(df1, aes(x = obs,
y = meas,
label = qua)) +
geom_col()+
facet_wrap(~ Sample) +
ggtitle("Samples")+
labs(x="Measures")+
geom_text()
Here is a sample of the plots using facet_wrap.
I want to plot a chart in R where it will show me vertical lines for each type in facet.
df is the dataframe with person X takes time in minutes to reach from A to B and so on.
I have tried below code but not able to get the result.
df<-data.frame(type =c("X","Y","Z"), "A_to_B"= c(20,56,57), "B_to_C"= c(10,35,50), "C_to_D"= c(53,20,58))
ggplot(df, aes(x = 1,y = df$type)) + geom_line() + facet_grid(type~.)
I have attached image from excel which is desired output but I need only vertical lines where there are joins instead of entire horizontal bar.
I would not use facets in your case, because there are only 3 variables.
So, to get a similar plot in R using ggplot2, you first need to reformat the dataframe using gather() from the tidyverse package. Then it's in long or tidy format.
To my knowledge, there is no geom that does what you want in standard ggplot2, so some fiddling is necessary.
However, it's possible to produce the plot using geom_segment() and cumsum():
library(tidyverse)
# First reformat and calculate cummulative sums by type.
# This works because factor names begins with A,B,C
# and are thus ordered correctly.
df <- df %>%
gather(-type, key = "route", value = "time") %>%
group_by(type) %>%
mutate(cummulative_time = cumsum(time))
segment_length <- 0.2
df %>%
mutate(route = fct_rev(route)) %>%
ggplot(aes(color = route)) +
geom_segment(aes(x = as.numeric(type) + segment_length, xend = as.numeric(type) - segment_length, y = cummulative_time, yend = cummulative_time)) +
scale_x_discrete(limits=c("1","2","3"), labels=c("Z", "Y","X"))+
coord_flip() +
ylim(0,max(df$cummulative_time)) +
labs(x = "type")
EDIT
This solutions works because it assigns values to X,Y,Z in scale_x_discrete. Be careful to assign the correct labels! Also compare this answer.
It's often the case I melt my dataframes to show multiple variables on one barplot. The goal is to create a geom_bar with one par for each variable, and one summary label for each bar.
For example, I'll do this:
mtcars$id<-rownames(mtcars)
tt<-melt(mtcars,id.vars = "id",measure.vars = c("cyl","vs","carb"))
ggplot(tt,aes(variable,value))+geom_bar(stat="identity")+
geom_text(aes(label=value),color='blue')
The result is a barplot in which the label for each bar is repeated for each case (it seems):
What I want to have is one label for each bar, like this:
A common solution is to create aggregated values to place on the graph, like this:
aggr<-tt %>% group_by(variable) %>% summarise(aggrLABEL=mean(value))
ggplot(tt,aes(variable,value))+geom_bar(stat="identity")+
geom_text(aes(label=aggr$aggrLABEL),color='blue')
or
ggplot(tt,aes(variable,value))+geom_bar(stat="identity")+
geom_text(label=dplyr::distinct(tt,value),color='blue')
However, these attempts result in errors, respectively:
For solution 1: Error: Aesthetics must be either length 1 or the same as the data (96): label, x, y
For solution 2: Error in [<-.data.frame(*tmp*, aes_params, value = list(label = list( : replacement element 1 is a matrix/data frame of 7 rows, need 96
So, what to do? Setting geom_text to stat="identity" does not help either.
What I would do is create another dataframe with the summary values of your columns. I would then refer to that dataframe in the geom_text line. Like this:
library(tidyverse) # need this for the %>%
tt_summary <- tt %>%
group_by(variable) %>%
summarize(total = sum(value))
ggplot(tt, aes(variable, value)) +
geom_col() +
geom_text(data = tt_summary, aes(label = total, y = total), nudge_y = 1) # using nudge_y bc it looks better.
I am trying to plot the following data
factor <- as.factor(c(1,2,3))
V1_mean <- c(100,200,300)
V2_mean <- c(350,150,60)
V1_stderr <- c(5,9,3)
V2_stderr <- c(12,9,10)
plot <- data.frame(factor,V1_mean,V2_mean,V1_stderr,V2_stderr)
I want to create a plot with factor on the x-axis, value on the y-axis and seperate lines for V1 and V2 (hence the points are the values of V1_mean on one line and V2_mean on the other). I would also like to add error bars for these means based on V1_stderr and V2_stderr
Many thanks
I'm not sure regarding your desired output, but here's a possible solution.
First of all, I wouldn't call your data plot as this is a stored function in R which is being commonly used
Second of all, when you want to plot two lines in ggplot you'll usually have to tide your data using functions such as melt (from reshape2 package) or gather (from tidyr package).
Here's an a possible approach
library(ggplot2)
library(reshape2)
dat <- data.frame(factor, V1_mean, V2_mean, V1_stderr, V2_stderr)
mdat <- cbind(melt(dat[1:3], "factor"), melt(dat[c(1, 4:5)], "factor"))
names(mdat) <- make.names(names(mdat), unique = TRUE)
ggplot(mdat, aes(factor, value, color = variable)) +
geom_point(aes(group = variable)) + # You can also add `geom_point(aes(group = variable)) + ` if you want to see the actual points
geom_errorbar(aes(ymin = value - value.1, ymax = value + value.1))