ggplot on point estimates and confidence intervals - r

I am trying to use ggplot to plot a data frame so that it looks like the plot at the bottom of http://www.ats.ucla.edu/stat/r/dae/logit.htm.
a<-data.frame(Year=c("2012","2012","2012","2013","2013","2013","2014","2014","2014"),
Engagement=rep(c("low","med","high"),3),
cost=c(4464.88,4690.14,4342.72,5326.63,5000.03,3967.02,4646.27,4282.38,3607.79),
lower=c(4151.4,5027.51,4095.73,4366.82,4682.85,3715.86,3775.25,3642.41,3235.43),
upper=c(4778.35,5625.75,5196.81,5013.45,5317.2,4848.89,4910.19,4291.64,3980.14))
I tried:
k<-ggplot(a,aes(x=Year,y=cost))
k+geom_ribbon(aes(ymin=lower,ymax=upper,fill=Engagement),alpha=0.2)+
geom_pointrange(aes(x=Year,y=cost,ymin=lower,ymax=lower),size=1,width=0.2,color="blue")
I appreciate any help.
I also tried:
pd <- position_dodge(0.1)
k<-ggplot(a,aes(x=Year,y=cost))
k+geom_ribbon(aes(ymin=lower,ymax=upper,fill=Engagement),alpha=0.2)+
geom_line(position=pd,aes(color=Engagement))
error message:
ymax not defined: adjusting position using y instead
geom_path: Each group consist of only one observation.
Do you need to adjust the group aesthetic?

Thanks everybody, problem solved:
k+geom_line(aes(group=Engagement,color=Engagement))+
geom_errorbar(aes(ymin=lower,ymax=upper,color=Engagement,width=0.2))

I'm assuming by "look like" you mean add the ribbons to your graph. If so then the issue is stemming from the Year variable in the a data.frame. It currently has factor class and it needs to be numeric.
If you add this before you call your ggplot graph you should see the ribbons:
a$Year <- as.numeric(as.character(a$Year))
You could also modify your entire assignment of a to the following:
a<-data.frame(Year=as.numeric(c("2012","2012","2012","2013","2013","2013","2014","2014","2014")),
Engagement=rep(c("low","med","high"),3),
cost=c(4464.88,4690.14,4342.72,5326.63,5000.03,3967.02,4646.27,4282.38,3607.79),
lower=c(4151.4,5027.51,4095.73,4366.82,4682.85,3715.86,3775.25,3642.41,3235.43),
upper=c(4778.35,5625.75,5196.81,5013.45,5317.2,4848.89,4910.19,4291.64,3980.14))
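A side note on the conversion: calling as.numeric() directly on a factor returns the integer level codes (1, 2, 3) rather than the years, so going through as.character() first is the safer route. Below is a minimal sketch, assuming ggplot2 is installed and reusing the data frame from the question with Year stored as a factor. The layer order, the alpha value and the scale_x_continuous() call are illustrative choices, not the only way to do this.

```r
library(ggplot2)

a <- data.frame(
  Year = factor(c("2012","2012","2012","2013","2013","2013","2014","2014","2014")),
  Engagement = rep(c("low", "med", "high"), 3),
  cost  = c(4464.88,4690.14,4342.72,5326.63,5000.03,3967.02,4646.27,4282.38,3607.79),
  lower = c(4151.4,5027.51,4095.73,4366.82,4682.85,3715.86,3775.25,3642.41,3235.43),
  upper = c(4778.35,5625.75,5196.81,5013.45,5317.2,4848.89,4910.19,4291.64,3980.14)
)

# as.numeric() on a factor yields the level codes, not the labels
codes <- as.numeric(a$Year)

# Converting via character keeps the actual years
a$Year <- as.numeric(as.character(a$Year))

# Ribbon first, then line and points on top so they stay visible
p <- ggplot(a, aes(x = Year, y = cost, group = Engagement)) +
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = Engagement), alpha = 0.2) +
  geom_line(aes(color = Engagement)) +
  geom_point(aes(color = Engagement)) +
  scale_x_continuous(breaks = unique(a$Year))  # one tick per year
print(p)
```

With a numeric x, each Engagement level gets its own continuous ribbon across the three years.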

Related

ggplot function with only the spectrum of color

I'm practicing ggplot for data visualization.
However, when I apply the code as follow:
> ggplot(halloweenn,aes(x=Time,y=Count,color=Date,group=Date))+ geom_point()
The scatter plot looks like this:
[1]: https://i.stack.imgur.com/7Z2D9.png
The axis step for the year becomes 2 and the colors are very hard to tell apart. This is very different from the examples I saw online and from my teacher: their plots use distinct colors and the year axis keeps a step of 1, as in my original data.
Is there anything wrong with my code, or what should I specify to get the scatter plot I want?
Thanks in advance!
I found the answer myself.
The Date and Time columns here were not stored as factors.
After converting them to factors, the scatter plot looks the way I want.
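The conversion described above can be sketched as follows. The data frame here is a hypothetical stand-in (the real halloweenn data is not shown in the question), so the column values are assumptions made for illustration only.

```r
library(ggplot2)

# Hypothetical stand-in for the halloweenn data; the real columns may differ
halloween_demo <- data.frame(
  Date  = rep(c(2015, 2016, 2017), each = 3),
  Time  = rep(c(18, 19, 20), times = 3),
  Count = c(12, 30, 22, 15, 41, 28, 9, 35, 25)
)

# Numeric columns map to a continuous colour gradient and automatic axis breaks;
# factors give one discrete colour per level and one tick per level
halloween_demo$Date <- factor(halloween_demo$Date)
halloween_demo$Time <- factor(halloween_demo$Time)

p <- ggplot(halloween_demo, aes(x = Time, y = Count, color = Date, group = Date)) +
  geom_point()
print(p)
```

Wrapping the columns inline, as in aes(color = factor(Date)), should have the same effect without modifying the data frame.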

R: creating a barplot depicting a percentage of a percentage in ggplot2

I'm having a lot of trouble using my current dataset to create the barplot I need. It seems straightforward enough, but I am getting an error whenever I run my code.
link to my data set
some background information
Percent_Calls is calculated by Call/(Call+Noise)
Percent_Total is calculated by (Call+Noise)/(sum(Call)+sum(Noise));
PercentofCall is calculated by Percent_Calls*Percent_Total
I am trying to create a barplot (with percentages on the y axis) with CRF_Score as the x-variable and the Percent_Total values as the bars. Eventually, I would like to highlight the portion of PercentofCall in Percent_Total.
require(ggplot2)
ggplot(FD2_CAna, aes(CRF_Score, fill=Percent_Total)) + geom_bar(binwidth=0.05)
The above code usually works for me, however I am getting this error instead:
Error in unit(tic_pos.c, "mm") : 'x' and 'units' must have length > 0
I have tried using as.factor(x) as suggested in another thread, but the graph output is not what I need.
This is more along the lines of what I want, except it was made in JMP.
Sorry for the long explanation, what am I doing wrong here?
To get a plot similar to the JMP one, use Percent_Total as the y values rather than as the fill= values, and then use stat="identity" in geom_bar().
In your JMP plot it seems that Percent_Total is treated as a factor rather than as a numeric variable: you can see this by comparing the bars with values 23 and 2, which are almost the same height. If the file FD2_CAna.csv is imported properly, the values are numeric.
FD2_CAna<-read.csv2(file="FD2_CAna.csv",header=T,sep=",",dec=".")
ggplot(FD2_CAna, aes(CRF_Score, Percent_Total)) + geom_bar(stat="identity")
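For the follow-up goal of highlighting the PercentofCall portion inside each Percent_Total bar, one possible approach is to draw a second, shorter bar layer on top of the first. This is only a sketch: the counts below are invented (the linked data set is not reproduced here), and the fill colours and percent formatting are arbitrary choices.

```r
library(ggplot2)

# Hypothetical counts; the real FD2_CAna.csv is not reproduced here
calls_demo <- data.frame(
  CRF_Score = c(0.1, 0.3, 0.5, 0.7, 0.9),
  Call  = c(2, 5, 10, 20, 40),
  Noise = c(38, 25, 20, 10, 5)
)

# Same definitions as in the question
total_all <- sum(calls_demo$Call) + sum(calls_demo$Noise)
calls_demo$Percent_Calls <- calls_demo$Call / (calls_demo$Call + calls_demo$Noise)
calls_demo$Percent_Total <- (calls_demo$Call + calls_demo$Noise) / total_all
calls_demo$PercentofCall <- calls_demo$Percent_Calls * calls_demo$Percent_Total

# Full bar = Percent_Total; overlaid bar = the share of it that is calls
p <- ggplot(calls_demo, aes(x = factor(CRF_Score))) +
  geom_bar(aes(y = Percent_Total), stat = "identity", fill = "grey70") +
  geom_bar(aes(y = PercentofCall), stat = "identity", fill = "steelblue") +
  scale_y_continuous(labels = function(v) paste0(round(100 * v), "%")) +
  labs(x = "CRF_Score", y = "Percent of total")
print(p)
```

Because PercentofCall can never exceed Percent_Total, the overlaid bar always sits inside the grey one.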

Colour points by x-axis values in ggplot

I have the following code to make a graph of my data:
library(ggplot2)
library(reshape)
sdata <- read.csv("http://dl.dropbox.com/u/58164604/sdata.csv", stringsAsFactors = FALSE)
pdata<-melt(sdata, id.vars="Var")
p<-ggplot(pdata, aes(Var,value,col=variable))
p+geom_point(aes(shape = variable),alpha=0.7)
This creates a graph with 'Var' being the x-axis and 'value' being the y-axis.
What I would like to do is change how the points are coloured. Instead of being by the variable name, I would like them to be by their 'Var' value. So I would like all points that have a Var value between 1-10 to be one colour, 11-20 to be another, and so on for 21-30, 31-35 and 36-41. What I would also like is there to be a ribbon/area shaded behind these points that extends from the highest to lowest value for each Var value, but this ribbon would also have to have the same colour as the points, just with a lower transparency level.
For a bonus question I am also having trouble getting the 'mean' variable from my example to appear as a geom_line rather than a geom_point. I have been playing around with this:
p+geom_point()+geom_line(data=pdata[which(pdata$variable=="Mean")])
but I can't get it to work. If anyone can help with any of this that would be great. Thanks.
Using cut with the option labels=F, I add a new variable for coloring.
pdata <- transform(pdata,varc =cut(pdata$Var,10,labels=F))
p<-ggplot(subset(pdata,variable!='Mean'), aes(Var,value,col=varc))
p+geom_point(aes(shape = variable),alpha=0.7)+
geom_line(data=subset(pdata,variable =='Mean'),size=2)
Edit:ribbon part
I don't understand the ribbon part (maybe you can explain the upper and lower values in more detail), but I think here we can simply use geom_polygon:
last_plot()+ geom_polygon(aes(fill=varc, group=variable),alpha=0.3,linetype=3)
In regard to your first question, you can use the cut function to classify your continuous data into categories. For example:
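with(mtcars, cut(mpg, seq(min(mpg), max(mpg), length.out = 6), include.lowest = TRUE))
This cuts the continuous values in the mpg column into 5 classes; include.lowest = TRUE keeps the minimum value from falling outside the first interval.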
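Since the question asks for uneven bands (1-10, 11-20, 21-30, 31-35 and 36-41), explicit break points may fit better than equally spaced ones. A small sketch in base R, assuming Var holds whole numbers from 1 to 41; the vector var_values and the label strings are made up for illustration.

```r
# Hypothetical Var values sitting on the edges of each requested band
var_values <- c(1, 10, 11, 20, 21, 30, 31, 35, 36, 41)

# Intervals are right-closed by default: (0,10], (10,20], (20,30], (30,35], (35,41]
var_band <- cut(
  var_values,
  breaks = c(0, 10, 20, 30, 35, 41),
  labels = c("1-10", "11-20", "21-30", "31-35", "36-41")
)

table(var_band)
```

The resulting factor can then be mapped to col= and fill= in aes() so that points and any shaded area share the same colour per band.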

In R, how to plot bars only in a certain interval of data?

My problem is very simple.
I have to plot a data series in R, using bars. Data are contained in a vector vet.
I've used barplot, that plots my data from the first to the last:
barplot(vet), and everything was fine.
Now, on the contrary, I would like to plot not all my data, but just a part of them: from 10% to the end.
How could I do this with barplot()?
How could I do this with plot()?
Thanks
You need to subset your data before plotting:
##Work out the 10% quantile and subset
v = vet[vet > quantile(vet, 0.1)]
It is not clear exactly what you want to do.
If you want to plot only a subset of the bars (but the whole bars) then you could just subset the data before passing it to barplot.
If you want to plot all the bars, but only the part beyond 10% (not including 0), then you can do this by setting the ylim argument. But a barplot that does not include 0 is strongly discouraged. You may be better off using a dotplot instead of a barplot if 0 is not meaningful.
If you want the regular plot, but want to exclude plotting outside of a given window within the plot then the clip function may be what you want.
The gap.barplot function from the plotrix package may also be what you want.
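Because "from 10% to the end" can be read in more than one way, here is a short sketch of two readings of the subsetting approach: dropping the first 10% of the elements by position, and dropping the values at or below the 10% quantile. The contents of vet are invented for the example.

```r
# Hypothetical data; the real vet vector is not shown in the question
vet <- c(12, 5, 9, 20, 3, 15, 8, 11, 7, 18)

# Reading 1: skip the first 10% of the elements by position
first_kept <- floor(0.1 * length(vet)) + 1
by_position <- vet[first_kept:length(vet)]

# Reading 2: keep only values above the 10% quantile
by_value <- vet[vet > quantile(vet, 0.1)]

barplot(by_position, main = "Positions after the first 10%")
barplot(by_value, main = "Values above the 10% quantile")
```

Either subset can be passed to plot(..., type = "h") instead of barplot() if base plot() is preferred.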

Geom_ribbon() just turns the graph blank

Hi, I have a data frame weekly.mean.values with the following structure:
week:mean:ci.lower:ci.upper
Where week is a factor; mean, ci.lower and ci.upper are numeric. For each week, there is only one mean, and one ci.lower or ci.upper.
I was trying to plot a shaded area inside of the 95% confidence interval around the mean, with the following code:
ggplot(weekly.mean.values,aes(x=week,y=mean)) +
geom_line() +
geom_ribbon(aes(ymin=ci.lower,ymax=ci.upper))
The plot, however, came out blank (that is only with x-axis and y-axis present, but no lines, or points, let alone shaded areas).
If I removed the geom_ribbon part, I did get a line. I know that this should be a very simple task but I don't know why I couldn't get geom_ribbon to plot what I wanted. Any hint would be truly appreciated.
I realize this thread is super old, but Google still finds it.
The answer is that you need to set ymin and ymax from the data you are plotting on the y-axis. If you set them to scalar values, then the ribbon covers the entire plot from top to bottom.
You can use
ymin=0
ymax=mean
to go from 0 to your y-point or even
ymin=mean-1
ymax=mean+1
to have the ribbon cover a strip encompassing your actual data.
I may be missing something, but the ribbon will be plotted filled with grey20 by default. You are plotting this layer on top of the data, so it is no wonder that it obscures it. It is also possible that the limits for the plot axes derived from the data provided to the initial ggplot() call will not be sufficient to contain the confidence interval ribbon. In that case, I would not be surprised to see a grey/blank plot.
To see if this is the problem, try altering your geom_ribbon() line to:
geom_ribbon(aes(ymin=ci.lower,ymax=ci.upper), alpha = 0.5)
which will plot the ribbon with transparency, which should show the data underneath if the problem is what I think it is.
If so, set the x and y limits to the range of the data +/- the confidence interval you wish to plot and swap the order of the layers (i.e. draw the line on top of the ribbon), and use transparency in the ribbon to show the grid through it.
From ggplot's docs for geom_ribbon (2.1.0):
For each continuous x value, geom_interval displays a y interval. geom_area is a special case of geom_ribbon, where the minimum of the range is fixed to 0.
In this case, x values cannot be factors for geom_ribbon. One solution would be to convert week from a factor to a numeric. e.g.
ggplot(weekly.mean.values,aes(x=as.numeric(week),y=mean)) +
geom_line() +
geom_ribbon(aes(ymin=ci.lower,ymax=ci.upper))
geom_line should handle the switch from factor to numeric without incident, although the X axis scale may display differently.
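An alternative that keeps week as a factor is to add group = 1 to the aesthetics, so that all weeks are treated as a single series. A minimal sketch, assuming ggplot2 is installed; the data frame below is hypothetical and only mirrors the structure described in the question.

```r
library(ggplot2)

# Hypothetical data with the columns described in the question
weekly_demo <- data.frame(
  week = factor(1:5),
  mean = c(10, 12, 11, 14, 13)
)
weekly_demo$ci.lower <- weekly_demo$mean - 1.5
weekly_demo$ci.upper <- weekly_demo$mean + 1.5

# group = 1 joins the factor levels into one series;
# the ribbon is drawn first and made transparent so the line stays visible
p <- ggplot(weekly_demo, aes(x = week, y = mean, group = 1)) +
  geom_ribbon(aes(ymin = ci.lower, ymax = ci.upper), alpha = 0.3) +
  geom_line()
print(p)
```

This keeps the original week labels on the x-axis, whereas as.numeric(week) replaces them with the level codes.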
