Colour points by x-axis values in ggplot - r

I have the following code to make a graph of my data:
library(ggplot2)
library(reshape)
sdata <- read.csv("http://dl.dropbox.com/u/58164604/sdata.csv", stringsAsFactors = FALSE)
pdata<-melt(sdata, id.vars="Var")
p<-ggplot(pdata, aes(Var,value,col=variable))
p+geom_point(aes(shape = variable),alpha=0.7)
This creates a graph with 'Var' being the x-axis and 'value' being the y-axis.
What I would like to do is change how the points are coloured. Instead of being by the variable name, I would like them to be by their 'Var' value. So I would like all points that have a Var value between 1-10 to be one colour, 11-20 to be another, and so on for 21-30, 31-35 and 36-41. What I would also like is there to be a ribbon/area shaded behind these points that extends from the highest to lowest value for each Var value, but this ribbon would also have to have the same colour as the points, just with a lower transparency level.
For a bonus question I am also having trouble getting the 'mean' variable from my example to appear as a geom_line rather than a geom_point. I have been playing around with this:
p+geom_point()+geom_line(data=pdata[which(pdata$variable=="Mean")])
but I can't get it to work. If anyone can help with any of this that would be great. Thanks.

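Using cut() with labels = FALSE, I add a new integer variable for colouring. Note that cut(x, 10) makes ten equal-width bins; pass an explicit vector of breaks instead if you need the exact ranges from the question.
pdata <- transform(pdata, varc = cut(pdata$Var, 10, labels = FALSE))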
p<-ggplot(subset(pdata,variable!='Mean'), aes(Var,value,col=varc))
p+geom_point(aes(shape = variable),alpha=0.7)+
geom_line(data=subset(pdata,variable =='Mean'),size=2)
Edit: ribbon part
I don't fully understand the ribbon part (it would help if you could explain what the upper and lower values should be), but I think we can simply use geom_polygon here:
last_plot()+ geom_polygon(aes(fill=varc, group=variable),alpha=0.3,linetype=3)
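If the ribbon is meant to span the lowest to the highest value at each Var, one possible reading of the question, the limits can be computed first and then handed to geom_ribbon. The sketch below uses a small made-up data frame in place of pdata, and the band column and its breaks are assumptions chosen only for illustration.

```r
# Toy long-format data standing in for pdata (values are made up)
pdata <- data.frame(
  Var      = rep(1:4, times = 3),
  variable = rep(c("A", "B", "C"), each = 4),
  value    = c(1, 2, 3, 4, 2, 4, 6, 8, 0, 1, 5, 3)
)

# Hypothetical colour bands on the x-axis variable
pdata$band <- cut(pdata$Var, breaks = c(0, 2, 4), labels = c("1-2", "3-4"))

# Lowest and highest value at each Var; FUN = range yields a two-column matrix
agg <- aggregate(value ~ Var + band, data = pdata, FUN = range)
ranges <- data.frame(Var = agg$Var, band = agg$band,
                     ymin = agg$value[, 1], ymax = agg$value[, 2])
print(ranges)

# Optional plotting step, only attempted when ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(ranges, aes(x = Var)) +
    geom_ribbon(aes(ymin = ymin, ymax = ymax, fill = band), alpha = 0.3) +
    geom_point(data = pdata, aes(y = value, colour = band, shape = variable))
}
```

Because fill and colour are both mapped to band, the ribbon and the points share a palette, and alpha only makes the ribbon lighter.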

In regard to your first question, you can use the cut function to classify your continuous data into categories. For example:
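with(mtcars, cut(mpg, seq(min(mpg), max(mpg), length.out = 6), include.lowest = TRUE))
This cuts the continuous values in the mpg column into 5 classes: six break points delimit five intervals, and include.lowest = TRUE keeps the minimum value from becoming NA.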
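For the unequal ranges asked about in the question (1-10, 11-20, 21-30, 31-35 and 36-41), cut() also accepts an explicit vector of breaks. A minimal sketch, assuming Var holds whole numbers from 1 to 41; the vector var_values and the label strings are made up for illustration:

```r
# Sample values sitting on the edges of each requested range
var_values <- c(1, 10, 11, 20, 21, 30, 31, 35, 36, 41)

# Intervals are right-closed by default: (0,10], (10,20], (20,30], (30,35], (35,41]
var_band <- cut(var_values,
                breaks = c(0, 10, 20, 30, 35, 41),
                labels = c("1-10", "11-20", "21-30", "31-35", "36-41"))

print(data.frame(var_values, var_band))
```

The resulting factor can then be mapped to colour in aes() in place of variable.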

Related

Is there an option to limit the presented axis values in a ggplot with significance label without cutting them off in R

I'm analyzing numeric data with values between 1 and 7. I want to plot boxplots and show the significance across categories. My problem is that adding the labels also extends the values on the y-axis. This might imply that the possible data range goes beyond 7, which is misleading. I tried using ylim(), but that cuts off the significance labels. Is there a way to make the axis values run from 1 to 7 without cutting off the information that should appear beyond this range?
[Images not preserved: the current plot, the plot when using ylim(), and the desired outcome.]
As mentioned in the comments, the solution is setting breaks:
gboxplot(...)+ scale_y_continuous(breaks = seq(0, 7, by = 1))

How to prevent geom_text_repel from labeling points on scatter plot with default number ordering list?

My dataset looks like this:
I'm trying to create a simple scatter plot with data labels that are names (first and last name).
I used geom_text_repel in ggrepel to create data labels, but the labels on the plot are just numbers in the order of the data points in my dataset.
For example, if you look at the first datapoint, instead of the label being "Stephen Curry" it is "1"
I have no idea why this is happening and I can't find anyone else who even has my problem, let alone a solution.
Code:
ggplot(gravity,
aes(TS., USG., label = rownames(gravity))) +
geom_point(aes(TS., USG.), color='black') +
geom_text_repel(aes(TS., USG., label = rownames(gravity)))
The image above shows the plot created by the code. As you can see, the labels are just the ordering numbers instead of the names. I don't see why this is happening, considering those ordering numbers are not part of the dataset I imported.
Thanks in advance
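A likely explanation is that rownames() of a data frame default to the row numbers "1", "2", ... unless they were set explicitly, so label = rownames(gravity) prints those numbers. The sketch below assumes the names live in an ordinary column; the column name Player and all the values are made up for illustration, since the real dataset is not shown.

```r
# Made-up stand-in for the gravity data; "Player" is a hypothetical column name
gravity <- data.frame(
  Player = c("Stephen Curry", "Player B", "Player C"),
  TS.    = c(0.65, 0.60, 0.58),
  USG.   = c(30, 25, 28)
)

# Default row names are just the row numbers, stored as character strings
print(rownames(gravity))

# Mapping the label to the name column instead; only attempted if both packages are installed
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggrepel", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(gravity, aes(TS., USG., label = Player)) +
    geom_point(color = "black") +
    ggrepel::geom_text_repel()
}
```

Since the aesthetics are set once in ggplot(), they do not need to be repeated in each layer.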

What is this kind of plot and how do I change its color

I really like taking advantage of the 120+ plotting methods of the base R plot function, throwing random objects at it and checking what comes out.
This time, I just sent a basic data.frame with one factor variable. I like the plot that came out, but I couldn't manage to change its color.
To make the plot:
library(dplyr)
set.seed(123)
dd <- data.frame(FF = cut(runif(1000,0,100), seq(0,100,10)),
XX = rnorm(1000, 10, 2)) %>%
group_by(FF) %>%
summarize(XX=mean(XX))
Plotting it with plot(dd) gives:
I want the bars to be a color other than black. I tried the obvious plot(dd, col="red"), but it doesn't do anything. The same goes for fg and bg, which I also tried.
I looked at ?plot.data.frame and ?plot.factor but didn't see any hint on how to change the color. What actual function is building this plot, how do we call it and how to change the colors on it?
It looks like plot is trying to make a boxplot, but since there's only one value per category, you're getting a single horizontal line. border="red" will change the color to red. If you run x = plot(dd) and then inspect x, you'll see it's a list in which the first element contains the boxplot stats.
plot is a "generic" function that "dispatches" different actual plotting functions (called "methods"), depending on what type of object you provide to plot. In this case, you have a categorical x-axis variable, so plot dispatches a "method" that produces a boxplot.
To see what specific functions plot can call, run methods(plot). I haven't checked, but I suspect the plot.data.frame function ends up calling a function that produces a boxplot when the x variable is categorical and the y variable is numeric (run graphics:::plot.data.frame(dd), to see this).
If you run plot before summarizing, you can see the boxplot appear:
library(dplyr)
set.seed(123)
dd <- data.frame(FF = cut(runif(1000,0,100), seq(0,100,10)),
XX = rnorm(1000, 10, 2))
plot(dd, border="red")
dd %>%
group_by(FF) %>%
summarize(XX=mean(XX)) %>%
plot(border="red")
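The claim about the returned list can be checked without dplyr. The following is a rough base R sketch: aggregate() stands in for the group_by/summarize step, and the plot is drawn to a null PDF device so that nothing is written to disk.

```r
set.seed(123)
raw <- data.frame(FF = cut(runif(1000, 0, 100), seq(0, 100, 10)),
                  XX = rnorm(1000, 10, 2))

# One mean per category, like the summarized data frame in the question
dd <- aggregate(XX ~ FF, data = raw, FUN = mean)

pdf(NULL)                       # null device: draw without creating a file
res <- plot(dd, border = "red")
invisible(dev.off())

# The returned value is the list produced by boxplot()
print(names(res))

# With a single value per category, all five box statistics coincide,
# which is why each box collapses to one horizontal line
print(res$stats[, 1:3])
```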

ggplot on point estimates and confidence intervals

I am trying to plot the data frame using ggplot that looks like the plot at the bottom of http://www.ats.ucla.edu/stat/r/dae/logit.htm.
a<-data.frame(Year=c("2012","2012","2012","2013","2013","2013","2014","2014","2014"),
Engagement=rep(c("low","med","high"),3),
cost=c(4464.88,4690.14,4342.72,5326.63,5000.03,3967.02,4646.27,4282.38,3607.79),
lower=c(4151.4,5027.51,4095.73,4366.82,4682.85,3715.86,3775.25,3642.41,3235.43),
upper=c(4778.35,5625.75,5196.81,5013.45,5317.2,4848.89,4910.19,4291.64,3980.14))
I tried:
k<-ggplot(a,aes(x=Year,y=cost))
k+geom_ribbon(aes(ymin=lower,ymax=upper,fill=Engagement),alpha=0.2)+
geom_pointrange(aes(x=Year,y=cost,ymin=lower,ymax=lower),size=1,width=0.2,color="blue")
I appreciate any help.
I also just tried:
pd <- position_dodge(0.1)
k<-ggplot(a,aes(x=Year,y=cost))
k+geom_ribbon(aes(ymin=lower,ymax=upper,fill=Engagement),alpha=0.2)+
geom_line(position=pd,aes(color=Engagement))
error message:
ymax not defined: adjusting position using y instead
geom_path: Each group consist of only one observation.
Do you need to adjust the group aesthetic?
Thanks everybody, problem solved:
k+geom_line(aes(group=Engagement,color=Engagement))+
geom_errorbar(aes(ymin=lower,ymax=upper,color=Engagement,width=0.2))
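I'm assuming that by "look like" you mean adding the ribbons to your graph. If so, the issue stems from the Year variable in the data frame a. It is currently a factor (or a character vector in R 4.0 and later, where stringsAsFactors defaults to FALSE), and it needs to be numeric for the ribbon to be drawn across years.
If you add this before you call ggplot, you should see the ribbons. The as.character() step matters: calling as.numeric() directly on a factor returns the level codes 1, 2, 3 rather than the years.
a$Year <- as.numeric(as.character(a$Year))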
You could also modify your entire assignment of a to the following:
a<-data.frame(Year=as.numeric(c("2012","2012","2012","2013","2013","2013","2014","2014","2014")),
Engagement=rep(c("low","med","high"),3),
cost=c(4464.88,4690.14,4342.72,5326.63,5000.03,3967.02,4646.27,4282.38,3607.79),
lower=c(4151.4,5027.51,4095.73,4366.82,4682.85,3715.86,3775.25,3642.41,3235.43),
upper=c(4778.35,5625.75,5196.81,5013.45,5317.2,4848.89,4910.19,4291.64,3980.14))
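One detail worth checking when converting Year: if the column is a factor, as.numeric() returns the internal level codes rather than the years. A small illustration with a made-up vector:

```r
year_factor <- factor(c("2012", "2012", "2013", "2014"))

# Direct conversion gives the level codes, not the years
codes <- as.numeric(year_factor)
print(codes)

# Going through character first recovers the actual years
years <- as.numeric(as.character(year_factor))
print(years)
```

Converting a plain character vector, as in the data.frame() call above, does not have this problem.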

ggplot2: Use every N-th legend key label

I have a data.frame with 72 discrete categories. When I colour by these categories I get 72 different keys in the legend. I would prefer to only use every 2nd or 3rd key.
Any idea how I cut down on the number of lines in the legend?
Thanks
H.
Code that reproduces my problem is given below.
t=seq(0,2*pi,length.out=10)
RR=rep(cos(t),72)+0.1*rnorm(720)
dim(RR)=c(10,72)
# note: alt (an altitude vector of length 10) is not defined in the original post
stuff=data.frame(alt,RR)
names(stuff)=c("altitude",
paste(rep(15:20,each=12),
rep(c("00","05",as.character(seq(from=10,to=55,by=5))),6),
sep=":"))
bb=melt(stuff,id.vars=c(1))
names(bb)[2:3]=c("period","velocity")
ggplot(data=bb,aes(altitude,velocity))+geom_point(aes(color=period))+geom_smooth()
You can treat your period values as numeric in geom_point(). That maps colour to a continuous gradient (values from 1 to 72, one per factor level). Then, with scale_colour_gradient(), you can set the breaks you need and label them with your actual period values.
ggplot(data=bb,aes(altitude,velocity))+
geom_point(aes(color=as.numeric(period)))+
geom_smooth()+
scale_colour_gradient("Period",low="red", high="blue",
breaks=c(seq(1,72,12),72),labels=unique(bb$period)[c(seq(1,72,12),72)])
It looks hard to customize the legend here for a discrete colour scale, so I propose a lattice solution. You just need to give the right text to the auto.key list.
library(latticeExtra)
labels.time <- unique(bb$period)[c(FALSE, FALSE, TRUE)] ## the logical index is recycled, keeping every third label
g <- xyplot(velocity~altitude, data=bb,groups=period,
auto.key=list(text=as.character(labels.time),
columns=length(labels.time)/3),
par.settings = ggplot2like(), axis = axis.grid,
panel = function(x,y,...){
panel.xyplot(x,y,...)
panel.smoother(x,y,...)
})
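As another possible route, ggplot2's discrete scales take a breaks argument that limits which keys appear in the legend while every level keeps its own colour. The sketch below rebuilds the 72 period labels from the question and keeps every third one; the names period_levels, keep and legend_scale are made up for illustration.

```r
# The 72 period labels from the question: "15:00", "15:05", ..., "20:55"
period_levels <- paste(rep(15:20, each = 12),
                       rep(c("00", "05", as.character(seq(10, 55, by = 5))), 6),
                       sep = ":")

# Every third label
keep <- period_levels[seq(1, length(period_levels), by = 3)]
print(keep)

# Only attempted when ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  # breaks restricts the legend keys; all 72 levels are still coloured
  legend_scale <- ggplot2::scale_colour_discrete(breaks = keep)
  # usage: ggplot(bb, aes(altitude, velocity)) +
  #          geom_point(aes(color = period)) + legend_scale
}
```

Changing by = 3 to by = 2 keeps every second key instead.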
