I have data from three doses of a treatment, with three replicates per dose:
df <- data.frame(dose=c(rep(1,300),rep(3,300),rep(5,300)),
replicate=rep(c(rep("X1",100),rep("X2",100),rep("X3",100)),3),
value=c(rnorm(300,1,1),rnorm(300,3,1),rnorm(300,5,1)),stringsAsFactors=FALSE)
df$dose <- factor(df$dose,levels=c(1,3,5))
I want to display it using an ECDF plot. For each replicate, I can simply plot the ECDFs of the three doses with:
library(ggplot2)
for(r in c("X1","X2","X3")){
  p <- ggplot(dplyr::filter(df,replicate==r),aes(x=value,color=dose))+
    stat_ecdf(geom="step")+
    theme_bw()+
    theme(panel.border=element_blank(),strip.background=element_blank())
  print(p) # ggplot objects are not auto-printed inside a for loop
}
But I'm looking for a way to display all replicates of each dose in one figure, with standard error shading around the mean curve, similar to the band that stat_smooth draws.
Can this be achieved?
Also, either for this or for a single replicate's plot:
r <- "X1"
ggplot(dplyr::filter(df,replicate==r),aes(x=value,color=dose))+
stat_ecdf(geom="step")+
theme_bw()+
theme(panel.border=element_blank(),strip.background=element_blank())
Is there a way to compute the area under each of the ECDFs?
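The area under an ECDF is only defined once you fix the interval you integrate over, so all curves must share one interval to be comparable. A minimal base-R sketch, assuming the interval is the overall range of value; `ecdf_area()` is a hypothetical helper, not part of ggplot2. Because an ECDF is a step function, the area is an exact sum of rectangles, and when the lower limit is at or below the smallest observation it equals the upper limit minus the sample mean.

```r
# Hypothetical helper: exact area under the ECDF of x over [lo, hi].
# Assumes lo <= min(x); the ECDF is constant between consecutive knots,
# so the area is a sum of rectangles (width times ECDF height).
ecdf_area <- function(x, lo = min(x), hi = max(x)) {
  Fn <- ecdf(x)
  knots <- sort(unique(c(lo, x[x > lo & x < hi], hi)))
  widths <- diff(knots)
  heights <- Fn(head(knots, -1))  # ECDF height on [knot_i, knot_i+1)
  sum(widths * heights)
}

# Simulated data in the layout of the question (seeded for reproducibility)
set.seed(42)
df <- data.frame(dose = factor(rep(c(1, 3, 5), each = 300), levels = c(1, 3, 5)),
                 replicate = rep(rep(c("X1", "X2", "X3"), each = 100), 3),
                 value = c(rnorm(300, 1, 1), rnorm(300, 3, 1), rnorm(300, 5, 1)),
                 stringsAsFactors = FALSE)

# One common interval for every curve, so the areas are comparable
lo <- min(df$value)
hi <- max(df$value)

# Areas for replicate X1, one per dose
sub <- df[df$replicate == "X1", ]
areas <- sapply(split(sub$value, sub$dose), ecdf_area, lo = lo, hi = hi)
print(areas)
```

A larger area means the ECDF rises earlier, i.e. that dose has smaller values. Repeating the `sapply()` call per replicate gives one area per replicate and dose.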
You can use interaction() to group the data in ggplot by two columns:
ggplot(df,aes(x=value,color = interaction(replicate, dose),
group=interaction(replicate, dose)))+
stat_ecdf(geom="step")+
theme_bw()+
theme(panel.border=element_blank(),strip.background=element_blank())
This draws one ECDF per replicate and dose combination, all in a single plot.
As the question is a little vague: if you want lines per dose, you can drop replicate from the interaction, or filter on dose instead of replicate in your dplyr::filter() call:
ggplot(dplyr::filter(df,dose==1),aes(x=value,color = interaction(replicate, dose),
group=interaction(replicate, dose)))+
stat_ecdf(geom="step")+
theme_bw()+
theme(panel.border=element_blank(),strip.background=element_blank())
This shows the three replicate curves for dose 1.
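The interaction approach shows every replicate, but not the mean curve with a standard-error band that the question asks for; stat_ecdf() does not compute such a band itself. One way to get it, sketched under assumptions: evaluate each replicate's ECDF on a common grid of x-values (200 points, an arbitrary choice), take the mean and standard error across the three replicates at each grid point, and draw the band with geom_ribbon(). The grid, the column names and the use of geom_line() rather than a true step are choices of this sketch.

```r
# Simulated data in the layout of the question (seeded for reproducibility)
set.seed(42)
df <- data.frame(dose = factor(rep(c(1, 3, 5), each = 300), levels = c(1, 3, 5)),
                 replicate = rep(rep(c("X1", "X2", "X3"), each = 100), 3),
                 value = c(rnorm(300, 1, 1), rnorm(300, 3, 1), rnorm(300, 5, 1)),
                 stringsAsFactors = FALSE)

# Common evaluation grid over the full data range
grid <- seq(min(df$value), max(df$value), length.out = 200)

# ECDF of each dose x replicate subset, evaluated on the grid
curves <- do.call(rbind, lapply(split(df, list(df$dose, df$replicate), drop = TRUE),
  function(d) data.frame(dose = d$dose[1], replicate = d$replicate[1],
                         x = grid, cdf = ecdf(d$value)(grid))))

# Mean and standard error of the ECDF value across replicates, per dose and x
se <- function(v) sd(v) / sqrt(length(v))
summ <- aggregate(cdf ~ dose + x, data = curves,
                  FUN = function(v) c(mean = mean(v), se = se(v)))
summ <- do.call(data.frame, summ)  # flattens the matrix column into cdf.mean, cdf.se

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(summ, aes(x = x, y = cdf.mean, colour = dose, fill = dose)) +
    geom_ribbon(aes(ymin = cdf.mean - cdf.se, ymax = cdf.mean + cdf.se),
                alpha = 0.2, colour = NA) +
    geom_line() +
    labs(y = "ECDF (mean of replicates)") +
    theme_bw() +
    theme(panel.border = element_blank())
  print(p)
}
```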
I am trying to make a plot of proportions of a binomial outcome (yes/no) depending on one ordinal and one continuous variable. Somehow, when I include the continuous one as the colour of the dots, the appearance of the plot changes radically. Can someone help me include the third variable without the plot turning into the table-looking result below?
Code as follows:
#making table with proportions of people who switch (1),
## after arsenic level and education.
educ_switch <- prop.table(table(welldata$educ[welldata$switch==1],
welldata$arsenic[welldata$switch==1],
welldata$switch[welldata$switch==1]))
educ_switch <- as_data_frame(educ_switch, make.names=TRUE)
#remove observations where the proportion is 0
educ_switch1 <- educ_switch[which (educ_switch$proportion>0),]
p <- ggplot(educ_switch1, aes(x = educ, y=proportion))
If I do p + geom_point()
I get the following picture:
But when I try to distinguish the third variable by colouring the points with p + geom_point(aes(colour = arsenic)), I get this weird-looking thing instead:
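The plots are not shown, so this is an assumption: a likely cause is that table() and its conversion to a data frame turn the classifying variables into factors, so arsenic reaches ggplot as a discrete variable with one legend entry per distinct value. A sketch on simulated stand-in data (welldata is not available here), using base R's as.data.frame() in place of as_data_frame(); converting the factor back to numbers gives arsenic a continuous colour scale.

```r
# Hypothetical stand-in for welldata (the real data set is not shown)
set.seed(1)
welldata <- data.frame(educ = sample(c(0, 5, 10), 200, replace = TRUE),
                       arsenic = round(runif(200, 0.5, 3), 2),
                       switch = rbinom(200, 1, 0.6))

# Proportions among people who switch, by education and arsenic level
sw <- welldata[welldata$switch == 1, ]
tab <- prop.table(table(educ = sw$educ, arsenic = sw$arsenic))
educ_switch <- as.data.frame(tab, responseName = "proportion")
educ_switch <- educ_switch[educ_switch$proportion > 0, ]

# The conversion produced factor columns, i.e. a discrete colour scale
was_factor <- is.factor(educ_switch$arsenic)

# Convert back to numbers so arsenic gets a continuous colour gradient
educ_switch$arsenic <- as.numeric(as.character(educ_switch$arsenic))
educ_switch$educ <- as.numeric(as.character(educ_switch$educ))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(ggplot(educ_switch, aes(x = educ, y = proportion, colour = arsenic)) +
          geom_point())
}
```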
So I have 10,000 values in a vector from a Monte Carlo simulation. I want to plot this data as a histogram and a density plot. Doing this with the hist() function is easy, and it calculates the frequency of the different values automatically. My ambition, however, is to do this in ggplot.
My biggest problem right now is how to transform the data so ggplot can handle it. I would like my x-axis to show the "price" while the y-axis shows the frequency or density. My data has a lot of decimals, as shown in the example data below.
myData <- c(266.8997, 271.5137, 225.4786, 223.3533, 258.1245, 199.5601, 234.2341, 231.7850, 260.2091, 184.5102, 272.8287, 203.7482, 212.5140, 220.9094, 221.2627, 236.3224)
My current code, using the hist() function, is shown below, together with the plot.
hist(myData,
xlab ="Price",
prob=TRUE)
lines(density(myData))
Histogram for the data vector containing 10000 values
How would you sort the data, and how would you do this with ggplot? I am also wondering whether I should round the numbers.
Hard to say exactly without seeing a sample of your data, but have you tried:
ggplot(data.frame(Price = myData), aes(Price)) + geom_histogram()
or:
ggplot(data.frame(Price = myData), aes(Price)) + geom_density()
Just try this:
ggplot() +
geom_histogram(aes(x = myData, y = after_stat(density))) +
geom_density(aes(x = myData))
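ggplot expects a data frame, while myData is a plain vector, so the vector has to be wrapped first; the histogram stat then assigns every value to a bin, so no sorting or rounding is needed. A sketch, assuming simulated draws stand in for the 10,000 Monte Carlo values: after_stat(density) puts the histogram on the density scale so the density curve overlays it, like hist(prob = TRUE) plus lines(density()). after_stat() requires ggplot2 3.3.0 or later, and bins = 30 is an arbitrary choice.

```r
# Hypothetical stand-in for the 10,000 Monte Carlo draws
set.seed(1)
mc <- rnorm(10000, mean = 235, sd = 25)

# ggplot2 needs a data frame, so wrap the vector in one (no rounding needed)
price_df <- data.frame(Price = mc)

# Base R binning without drawing: every value lands in exactly one bin
h <- hist(price_df$Price, plot = FALSE)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(price_df, aes(x = Price)) +
    geom_histogram(aes(y = after_stat(density)), bins = 30,
                   fill = "grey80", colour = "grey40") +
    geom_density(colour = "blue") +
    labs(x = "Price", y = "Density")
  print(p)
}
```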
I am trying to find the best way to create barplots in R with standard errors displayed. I have seen other articles, but I cannot figure out the code to use with my own data (I have not used ggplot before, but it seems to be the most common approach, and barplot does not cooperate with data frames). I need to use this in two cases, for which I have created two example data frames:
Plot df1 so that the x-axis has sites A-C, with the y-axis displaying the mean value of V1 and the standard errors highlighted, similar to this example with a grey colour. Here, plant biomass should be the mean V1 value and treatments should be each of my sites.
Plot df2 in the same way, but so that before and after are located next to each other in a similar way to this, so pre-test and post-test equate to before and after in my example.
x <- factor(LETTERS[1:3])
site <- rep(x, each = 8)
values <- as.data.frame(matrix(sample(0:10, 3*8, replace=TRUE), ncol=1))
df1 <- cbind(site,values)
z <- factor(c("Before","After"))
when <- rep(z, each = 4)
df2 <- data.frame(when,df1)
Apologies to more experienced R users, particularly those who use ggplot, for the simplicity of this question, but I cannot apply snippets of code that I have found elsewhere to my data. I cannot even put together enough code to start a graph, so I hope my descriptions are sufficient. Thank you in advance.
Something like this?
library(ggplot2)
get.se <- function(y) {
se <- sd(y)/sqrt(length(y))
mu <- mean(y)
c(ymin=mu-se, ymax=mu+se)
}
ggplot(df1, aes(x=site, y=V1)) +
stat_summary(fun=mean, geom="bar", fill="lightgreen", color="grey70")+
stat_summary(fun.data=get.se, geom="errorbar", width=0.1)
ggplot(df2, aes(x=site, y=V1, fill=when)) +
stat_summary(fun=mean, geom="bar", position="dodge", color="grey70")+
stat_summary(fun.data=get.se, geom="errorbar", width=0.1, position=position_dodge(width=0.9))
So this takes advantage of the stat_summary(...) function in ggplot to, first, summarize y for given x using mean(...) (for the bars), and then to summarize y for given x using the get.se(...) function for the error-bars. Another option would be to summarize your data prior to using ggplot, and then use geom_bar(...) and geom_errorbar(...).
Also, plotting +/- 1 SE is not great practice (although it is used often enough). You'd be better served plotting legitimate confidence limits, which you could do, for instance, with stat_summary(fun.data=mean_cl_normal, geom="errorbar") instead of the contrived get.se(...). mean_cl_normal (a ggplot2 wrapper around Hmisc::smean.cl.normal, so Hmisc must be installed) returns the 95% confidence limits based on the assumption that the data are normally distributed (or you can set the CL to something else; read the documentation).
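Summarising before plotting, the alternative mentioned above, can be sketched in base R. This assumes a t-based 95% interval per site (the normal-theory interval that mean_cl_normal computes) and uses a fixed seed for the example data; the column names of summ are arbitrary.

```r
# Example data as in the question (seeded for reproducibility)
set.seed(123)
x <- factor(LETTERS[1:3])
site <- rep(x, each = 8)
values <- as.data.frame(matrix(sample(0:10, 3 * 8, replace = TRUE), ncol = 1))
df1 <- cbind(site, values)

# One row per site: mean, standard error and a t-based 95% confidence interval
summ <- do.call(rbind, lapply(split(df1$V1, df1$site), function(y) {
  n <- length(y)
  m <- mean(y)
  se <- sd(y) / sqrt(n)
  half <- qt(0.975, df = n - 1) * se  # half-width of the 95% interval
  data.frame(mean = m, se = se, lower = m - half, upper = m + half)
}))
summ$site <- rownames(summ)
print(summ)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(ggplot(summ, aes(x = site, y = mean)) +
          geom_col(fill = "lightgreen", colour = "grey70") +
          geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.1))
}
```

Swapping lower and upper for mean - se and mean + se gives the +/- 1 SE bars instead.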
I used the group_by() and summarise() functions from dplyr for this, and the std.error() function from the plotrix package.
library(plotrix) # for the std.error function
library(dplyr) # for group_by and summarise
library(ggplot2) # for creating the plot
For df1 plot
# Group data by site
grouped_df1 <- group_by(df1, site)
# Summarise grouped data: mean and standard error (std.error from plotrix)
summarised_df1 <- summarise(grouped_df1, mean = mean(V1), std_error = std.error(V1))
# Define the top and bottom of the error bars
limits <- aes(ymax = mean + std_error, ymin = mean - std_error)
# Begin your ggplot: site on the x-axis, mean on the y-axis
g <- ggplot(summarised_df1, aes(site, mean))
# One bar per site; stat = "identity" uses the precomputed means as bar heights
g <- g + geom_bar(stat = "identity")
# Add the error bars
g <- g + geom_errorbar(limits, width = 0.25)
# Print the graph
g
For df2 plot
# Group data by when and site
grouped_df2 <- group_by(df2, when, site)
# Summarise grouped data: mean and standard error (std.error from plotrix)
summarised_df2 <- summarise(grouped_df2, mean = mean(V1), std_error = std.error(V1))
# Define the top and bottom of the error bars
limits <- aes(ymax = mean + std_error, ymin = mean - std_error)
# Begin your ggplot: site vs mean, with bars filled by the factor variable when
g <- ggplot(summarised_df2, aes(site, mean, fill = when))
# position_dodge() places the Before and After bars side by side
g <- g + geom_bar(stat = "identity", position = position_dodge())
# Add the error bars, dodged by the same amount as the bars
g <- g + geom_errorbar(limits, width = 0.25, position = position_dodge(width = 0.9))
# Print the graph
g
I am trying to present the results of a logistic regression analysis for the maturity schedule of a fish species. Below is my reproducible code.
#coded with R version 3.0.2 (2013-09-25)
#Frisbee Sailing
rm(list=ls())
library(ggplot2)
library(FSA)
#generate sample data 1 mature, 0 non mature
m<-rep(c(0,1),each=25)
tl<-seq(31,80, 1)
dat<-data.frame(m,tl)
# add some non mature individuals at random in the middle of df to
#prevent glm.fit: fitted probabilities numerically 0 or 1 occurred error
tl<-sample(50:65, 15)
m<-rep(c(0),each=15)
dat2<-data.frame(tl,m)
#final dataset
data3<-rbind(dat,dat2)
ggplot can produce a logistic regression graph showing each of the data points employed, with the following code:
#plot logistic model
ggplot(data3, aes(x=tl, y=m)) +
stat_smooth(method="glm", method.args=list(family="binomial"), se=FALSE)+
geom_point()
I want to combine it with the probability of being mature at a given size, which I obtain and plot with the following code:
#plot proportion of mature
#clump data in 5 cm size classes
l50<-lencat(~tl,data=data3,startcat=30,w=5)
#table of frequency of mature individuals by size
mat<-with(l50, table(LCat, m))
#proportion of mature
mat_prop<-as.data.frame.matrix(prop.table(mat, margin=1))
colnames(mat_prop)<-c("nm", "m")
mat_prop$tl<-as.factor(seq(30,80, 5))
# Bar plot probability mature
ggplot(mat_prop, aes(x=tl,y=m)) +
geom_bar(stat="bin")
What I've been trying to do, without success, is to make a graph that combines both. Since the axes are the same it should be straightforward, but I can't seem to make it work. I have tried:
ggplot(mat_prop, aes(x=tl,y=m)) +
geom_bar(stat="bin")+
stat_smooth(method="glm", family="binomial", se=FALSE)
but it does not work. Any help would be greatly appreciated. I am new, so I am not able to add the resulting graphs to this post.
I see three problems with your code:
Using stat="bin" in your geom_bar() is inconsistent with giving values for the y-axis (y=m). If you bin, you count the number of x-values in an interval and use that count as the y-value, so there is no need to map your data to the y-axis.
The data for the glm-plot is in data3, but your combined plot only uses mat_prop.
The x-axes of the two plots are actually not quite the same. In the bar plot, you use a factor variable on the x-axis, making the axis discrete, while in the glm plot, you use a numeric variable, which leads to a continuous x-axis.
The following code gave a graph combining your two plots:
mat_prop$tl<-seq(30,80, 5)
ggplot(mat_prop, aes(x=tl,y=m)) +
geom_bar(stat="identity") +
geom_point(data=data3) +
geom_smooth(data=data3,aes(x=tl,y=m),method="glm",method.args=list(family="binomial"),se=FALSE)
I could run it after first sourcing your script to define all the variables. The three problems mentioned above are addressed as follows:
I used geom_bar(stat="identity") in order not to use binning in the bar plot.
I use the data-argument in geom_point and geom_smooth in order to use the correct data (data3) for these parts of the plot.
I redefine mat_prop$tl to make it numeric. It is then consistent with the column tl in data3, which is numeric as well.
(I also added the points. If you don't want them, just remove geom_point(data=data3).)
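The answer builds the size classes with FSA::lencat(). The same two ingredients can be sketched in base R, assuming 5 cm classes labelled by their lower bound as in the question: per-class proportions from aggregate() and a logistic curve from glm() predicted on a fine grid, each drawn as its own layer. Fitting the model outside ggplot2 also avoids passing family to stat_smooth(), which current ggplot2 versions expect inside method.args. The seed and the grid spacing are arbitrary.

```r
# Recreate the question's simulated data (seeded, since sample() is random)
set.seed(2014)
dat  <- data.frame(m = rep(c(0, 1), each = 25), tl = seq(31, 80, 1))
dat2 <- data.frame(tl = sample(50:65, 15), m = 0)
data3 <- rbind(dat, dat2)

# Proportion mature per 5 cm class, labelled by the class's lower bound
data3$lcat <- 5 * floor(data3$tl / 5)
mat_prop <- aggregate(m ~ lcat, data = data3, FUN = mean)

# Logistic fit on the individual fish, predicted on a fine length grid
fit <- glm(m ~ tl, family = binomial, data = data3)
curve_df <- data.frame(tl = seq(30, 80, by = 0.5))
curve_df$p <- predict(fit, newdata = curve_df, type = "response")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(ggplot() +
          # shift by half the class width so each bar is centred on its class
          geom_col(data = mat_prop, aes(x = lcat + 2.5, y = m),
                   width = 5, fill = "grey80") +
          geom_point(data = data3, aes(x = tl, y = m)) +
          geom_line(data = curve_df, aes(x = tl, y = p)) +
          labs(x = "Total length (cm)", y = "Proportion mature"))
}
```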
What I have is a three-level repeated-measures factor (time) and a continuous variable (NEO: scores on a psychological questionnaire, measured only once, before the experiment), which showed a significant interaction in a linear mixed-effects model with a dependent variable (DV; IAS: state scores measured at each time level).
To see the nature of this interaction, I would like to create a plot with the time levels on the x-axis, the state score on the y-axis and multiple curves for the continuous variable, similar to this. The continuous variable should be categorized into, say, quartiles (so I get 4 different curves), which is exactly what I can't achieve. So far I get a separate curve for each value of the continuous variable.
My goal is also comparable to this, but I need the categorical (time) variable not as separate curves but on the x-axis.
I tried out a lot of different plot functions in R but didn't manage to get what I want, maybe because I am not very skilled in R.
For example,
ggplot(Data_long, aes(x = time, y = IAS, colour = NEO, group = NEO)) +
geom_line()
from the first link shows me dozens of curves (one for each value of NEO), and I can't find how to group a continuous variable in a meaningful way in that ggplot call.
Edit:
Original Data:
http://www.pastebin.ca/2598926
(I hope it is not too inconvenient.)
This object (Data_long) was created/converted with the following line:
Data_long <- transform(Data_long0, neo.binned=cut(NEO,c(25,38,46,55,73),labels=c(".25",".50",".75","1.00")))
Every value in the neo.binned col seems to be set correctly with enough cases per quantile.
What I then tried and didn't work:
ggplot(Data_long, aes(x = time, y = ias, color = neo.binned)) + stat_summary(fun.y="median",geom="line")
geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?
I have 92 subjects, with values for NEO between 26 and 73. Any hints on what to enter for the cut() breaks and labels? The quantiles (0%, 25%, 50%, 75%, 100%) are 26, 38, 46, 55 and 73.
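Rather than hard-coding the breaks, cut() can take them straight from quantile(). A sketch on simulated scores (the real data are only on pastebin), with arbitrary labels: include.lowest = TRUE keeps the smallest score in the first bin, which cut()'s right-closed intervals would otherwise turn into NA. cut() stops with an error if two quantiles coincide, which can happen with heavily tied scores.

```r
# Hypothetical NEO scores for 92 subjects in the 26-73 range from the question
set.seed(7)
NEO <- sample(26:73, 92, replace = TRUE)

# Quartile breaks: 0%, 25%, 50%, 75% and 100%
breaks <- quantile(NEO, probs = c(0, 0.25, 0.5, 0.75, 1))

# Right-closed intervals; include.lowest keeps the minimum in the first bin
neo.binned <- cut(NEO, breaks = breaks, include.lowest = TRUE,
                  labels = c("Q1", "Q2", "Q3", "Q4"))
print(table(neo.binned))
```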
Do you mean something like this? Here, your data is binned according to NEO into three classes, and then the median of IAS over these bins is drawn. Check out ?cut.
Data_long <- transform(Data_long, neo.binned=cut(NEO,c(0,3,7,10),labels=c("lo","med","hi")))
Plot everything in one plot.
ggplot(Data_long, aes(x = time, y = IAS, color = neo.binned)) +
  stat_summary(aes(group = neo.binned), fun = median, geom = "line")
And, stealing from CMichael's answer, you can also split it into multiple panels (you linked to faceted plots in your question):
ggplot(Data_long, aes(x = time, y = IAS, group = 1)) +
  stat_summary(fun = median, geom = "line") +
  facet_grid(neo.binned ~ .)
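The error in the question comes from the grouping: with a factor such as time on the x-axis, ggplot2 forms one group per combination of the discrete variables, so every group holds a single summary value and there is nothing for a line to connect. The group aesthetic fixes this. A sketch of the same idea with the medians computed beforehand by aggregate(), on simulated data; the one-row-per-subject-and-time layout of Data_long and the column names are assumed from the question, and the IAS values are invented.

```r
# Hypothetical subjects with NEO scores, binned into quartiles at subject level
set.seed(11)
subj <- data.frame(id = 1:92, NEO = sample(26:73, 92, replace = TRUE))
subj$neo.binned <- cut(subj$NEO, breaks = quantile(subj$NEO),
                       include.lowest = TRUE,
                       labels = c(".25", ".50", ".75", "1.00"))

# Long format: one row per subject and time point (Cartesian product)
Data_long <- merge(subj, data.frame(time = factor(c("t1", "t2", "t3"))))
# Invented outcome: IAS drifts with time, more strongly for high NEO
Data_long$IAS <- 20 + 0.05 * Data_long$NEO * as.integer(Data_long$time) +
  rnorm(nrow(Data_long))

# Median IAS for every time x NEO-bin combination
med <- aggregate(IAS ~ time + neo.binned, data = Data_long, FUN = median)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # group = neo.binned joins the time points of each bin into one line
  print(ggplot(med, aes(x = time, y = IAS, colour = neo.binned,
                        group = neo.binned)) +
          geom_line() +
          geom_point())
}
```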
Do you mean faceting @ziggystar's initial plot?
quantiles = quantile(Data_long$NEO,c(0.25,0.5,0.75))
Data_long$NEOQuantile = ifelse(Data_long$NEO<=quantiles[1],"first NEO Quantile",
ifelse(Data_long$NEO<=quantiles[2],
"second NEO Quantile",
ifelse(Data_long$NEO<=quantiles[3],
"third NEO Quantile","fourth NEO Quantile")))
library(ggplot2)
p = ggplot(Data_long,aes(x=time,y=IAS)) + stat_quantile(quantiles=c(1),formula=y ~ x)
p = p + facet_grid(.~NEOQuantile)
p