I have a data.frame with 72 discrete categories. When I colour by these categories I get 72 different keys in the legend. I would prefer to only use every 2nd or 3rd key.
Any idea how I cut down on the number of lines in the legend?
Thanks
H.
Code that reproduces my problem is given below.
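library(ggplot2)
library(reshape2)  # provides melt()
t=seq(0,2*pi,length.out=10)
RR=rep(cos(t),72)+0.1*rnorm(720)
dim(RR)=c(10,72)
alt=1:10  # assumption: alt was not defined in the post; any 10 altitude values will do
stuff=data.frame(alt,RR)
# column names: "altitude" followed by 72 time labels, 15:00 to 20:55 in 5-minute steps
names(stuff)=c("altitude",
paste(rep(15:20,each=12),
rep(c("00","05",as.character(seq(from=10,to=55,by=5))),6),
sep=":"))
bb=melt(stuff,id.vars=c(1))
names(bb)[2:3]=c("period","velocity")
ggplot(data=bb,aes(altitude,velocity))+geom_point(aes(color=period))+geom_smooth()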
You can treat your period values as numeric in geom_point(). The colors then form a gradient over the values 1 to 72, which correspond to the factor levels. With scale_colour_gradient() you can then set the breaks you need and label them with your actual period values.
ggplot(data=bb,aes(altitude,velocity))+
geom_point(aes(color=as.numeric(period)))+
geom_smooth()+
scale_colour_gradient("Period",low="red", high="blue",
breaks=c(seq(1,72,12),72),labels=levels(bb$period)[c(seq(1,72,12),72)])
It looks hard to customize the legend here for a discrete color scale, so I propose a lattice solution. You just need to give the right text to the auto.key list.
library(latticeExtra)
## c(F,F,T) repeated each=3 gives F,F,F,F,F,F,T,T,T; the mask is recycled, so the last three of every nine labels are kept
labels.time <- unique(bb$period)[rep(c(FALSE,FALSE,TRUE),each=3)]
g <- xyplot(velocity~altitude, data=bb,groups=period,
auto.key=list(text=as.character(labels.time),
columns=length(labels.time)/3),
par.settings = ggplot2like(), axis = axis.grid,
panel = function(x,y,...){
panel.xyplot(x,y,...)
panel.smoother(x,y,...)
})
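As an alternative to both answers above (not taken from either of them), ggplot2's discrete scales accept a breaks argument that controls which levels get a legend key while all levels are still drawn. A minimal sketch, assuming a small made-up data frame called demo with 12 period levels instead of 72:

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for bb: 12 periods, 10 points each
demo <- data.frame(
  x = rep(1:10, times = 12),
  y = rnorm(120),
  period = factor(rep(sprintf("15:%02d", seq(0, 55, by = 5)), each = 10))
)

# Keep every third level as a legend key
keep <- levels(demo$period)[seq(1, nlevels(demo$period), by = 3)]

p <- ggplot(demo, aes(x, y, colour = period)) +
  geom_point() +
  scale_colour_discrete(breaks = keep)  # all 12 colours are plotted, 4 keys are shown
print(p)
```

For the data in the question, the same idea would be breaks = levels(bb$period)[seq(1, 72, by = 3)].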
par(mfrow=c(1,2))
Trigen <- data.frame(OTriathlon$Gender,OTriathlon$Swim,OTriathlon$Bike,OTriathlon$Run)
colnames(Trigen) <- c("Gender","Swim","Bike","Run")
res <- split(Trigen[,2:4],Trigen$Gender)
pairs(res$Male, pch="M", col = 4)
points(res$Female, pch ="F", col= 2)
Basically, I want to customize the pairs plot so that the plot symbol and color of each data point represent gender.
I tried a few things in the code, but the problem is that I can't add the female points to the existing plot. After running the points() call the plot stays the same; it doesn't get updated.
There is no need to call points several times, because you can use the factor directly as a color. Example:
plot(iris[,c(2,3)], col=iris$Species)
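The same idea carries over to pairs(): instead of adding points afterwards, pass one symbol and one color per row. Calling points() after pairs() most likely has no visible effect because pairs() draws a grid of separate panels. A minimal sketch, assuming a simulated data frame tri in place of the OTriathlon data, which is not available here:

```r
set.seed(42)
# Hypothetical stand-in for the Trigen data frame from the question
tri <- data.frame(
  Gender = factor(sample(c("Female", "Male"), 40, replace = TRUE)),
  Swim   = rnorm(40, mean = 30, sd = 4),
  Bike   = rnorm(40, mean = 70, sd = 8),
  Run    = rnorm(40, mean = 45, sd = 6)
)

# Look up a symbol and a colour for every row by gender
sym  <- unname(c(Female = "F", Male = "M")[as.character(tri$Gender)])
colr <- unname(c(Female = 2, Male = 4)[as.character(tri$Gender)])

# One call draws both groups in every panel
pairs(tri[, c("Swim", "Bike", "Run")], pch = sym, col = colr)
```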
Here's a fiddle for a simplified version of a plot I am trying to generate.
On line 44 the plot points are sized according to 1/Error:
main_aes = aes(x = Date, y = Popular_Support, size=1/Error)
But instead of displaying 1/Error values in the legend, I want it to display Sample Size, which is 1/Error^2, with the legend title being Sample Size.
I only want this displayed in the legend, but I still want the original values to weight the point sizes.
How can I do this? How can I perform a calculation on the legend text that is displayed and change the legend title?
You can do this as follows:
plot + scale_size_continuous(breaks=seq(40,70,10), labels=seq(40,70,10)^2,
name="Sample Size")
Also, plot is an R function, so it's probably better to use a different name for your plot objects.
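If you would rather not hard-code the breaks, labels can also be a function that ggplot2 applies to whatever breaks it picks. A minimal sketch, assuming a made-up data frame polls, since the fiddle's data is not reproduced here:

```r
library(ggplot2)

set.seed(7)
# Hypothetical polling data; the column names follow the question
polls <- data.frame(
  Date            = as.Date("2012-01-01") + seq(0, 90, by = 10),
  Popular_Support = runif(10, 40, 60),
  Error           = runif(10, 0.015, 0.025)
)

p <- ggplot(polls, aes(x = Date, y = Popular_Support, size = 1 / Error)) +
  geom_point() +
  # Points are still sized by 1/Error; only the legend text shows (1/Error)^2
  scale_size_continuous(name = "Sample Size",
                        labels = function(b) round(b^2))
print(p)
```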
I created several plots in R. Occasionally, the program does not match the color of the variables in the plot to the variable colors in the legend. In the attached file (Unfortunately, I can't yet attach images b/c of reputation), the first 2 graphs are assigned a black/red color scheme. But, the third chart automatically uses a green/black and keeps the legend with black/red. I cannot understand why this happens.
How can I prevent this from happening?
I know it's possible to assign color, but I am struggling to find a clear way to do this.
Code:
plot(rank, abundance, pch=16, col=type, cex=0.8)
legend(60,50,unique(type),col=1:length(type),pch=16)
plot(rank, abundance, pch=16, col=Origin, cex=0.8)
legend(60,50,unique(Origin),col=1:length(Origin),pch=16)
Below is where color pattern won't match
plot(rank, abundance, pch=16, col=Lifecycle, cex=0.8)
legend(60,50,unique(Lifecycle),col=1:length(Lifecycle),pch=16)
The data frame looks like this:
Plant  rank  abundance  Lifecycle  Origin  type
X      1     23         Perennial  Native  Weedy
Y      2     10         Annual     Exotic  Ornamental
Z      3     9          Perennial  Native  Ornamental
First, I create some fake data.
df <- data.frame(rank = 1:10, abundance = runif(10,10,100),
Lifecycle = factor(sample(c('Perennial', 'Annual'), 10, replace=TRUE)))  # factor() is needed in R >= 4.0, where strings are no longer converted automatically
Then I explicitly say what colors I want my points to be.
cols=c('dodgerblue', 'plum')
Then I plot, using the factor df$Lifecycle to color points.
plot(df$rank, df$abundance, col = cols[df$Lifecycle], pch=16)
When the factor df$Lifecycle is used as an index above, R uses its integer codes to pick elements of cols, and factor levels are sorted alphabetically by default (Annual = 1, Perennial = 2). Therefore, in the legend, we just need to sort the unique df$Lifecycle values, and then hand it our color vector (cols).
legend(5, 40, sort(unique(df$Lifecycle)), col=cols, pch=16, bty='n')
Hopefully this helps.
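The mismatch in the question most likely comes from pairing unique(Lifecycle), which is in order of first appearance, with col=1:length(Lifecycle), while the points are colored by the alphabetically ordered factor codes. A named color vector avoids relying on either order. A minimal sketch, with plants and lifecycle_cols as illustrative names:

```r
set.seed(3)
# Hypothetical data in the shape of the question's data frame
plants <- data.frame(
  rank      = 1:10,
  abundance = runif(10, 10, 100),
  Lifecycle = sample(c("Perennial", "Annual"), 10, replace = TRUE)
)

# Name each colour after the category it stands for
lifecycle_cols <- c(Annual = "dodgerblue", Perennial = "plum")

# Look colours up by name, so level order no longer matters
point_cols <- lifecycle_cols[as.character(plants$Lifecycle)]

plot(plants$rank, plants$abundance, col = point_cols, pch = 16, cex = 0.8)
legend("topright", legend = names(lifecycle_cols),
       col = lifecycle_cols, pch = 16, bty = "n")
```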
I had some problems while trying to plot a histogram to show the frequency of every value while plotting the value as well. For example, suppose I use the following code:
x <- sample(1:10,1000,replace=T)
hist(x,label=TRUE)
The result is a plot with labels over the bars, but the frequencies of 1 and 2 are merged into a single bar.
Apart from separating this bar into two bars for 1 and 2, I also need to put the values under each bar.
For example, with the code above the number 10 appears under the tick at the right edge of its bar, whereas I need the values to be plotted directly under the bars.
Is there any way to do both in a single histogram with hist function?
Thanks in advance!
Calling hist invisibly returns an object with information you can use to modify the plot. You can pull out the midpoints and the heights and use them to put the labels where you want them. The pos argument of text specifies where the label goes relative to the point (thanks @rawr).
x <- sample(1:10,1000,replace=T)
## Histogram
info <- hist(x, breaks = 0:10)
with(info, text(mids, counts, labels=counts, pos=1))
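To also get the values centered under the bars, one option is to center the breaks on the integers and draw the axis yourself. This is a variation on the code above, not the original answer: the half-integer breaks and the xaxt = "n" choice are assumptions made for this sketch.

```r
set.seed(1)
x <- sample(1:10, 1000, replace = TRUE)

# Breaks at 0.5, 1.5, ..., 10.5 give one bar per integer, centred on it
info <- hist(x, breaks = seq(0.5, 10.5, by = 1), xaxt = "n",
             main = "Frequency of each value")

# Put the value labels directly under the bars
axis(1, at = info$mids, labels = info$mids)

# Put the counts just above the bars (xpd = TRUE avoids clipping at the top)
text(info$mids, info$counts, labels = info$counts, pos = 3, xpd = TRUE)
```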
I have the following code to make a graph of my data:
library(ggplot2)
library(reshape)
sdata <- read.csv("http://dl.dropbox.com/u/58164604/sdata.csv", stringsAsFactors = FALSE)
pdata<-melt(sdata, id.vars="Var")
p<-ggplot(pdata, aes(Var,value,col=variable))
p+geom_point(aes(shape = variable),alpha=0.7)
This creates a graph with 'Var' being the x-axis and 'value' being the y-axis.
What I would like to do is change how the points are coloured. Instead of being by the variable name, I would like them to be by their 'Var' value. So I would like all points that have a Var value between 1-10 to be one colour, 11-20 to be another, and so on for 21-30, 31-35 and 36-41. What I would also like is there to be a ribbon/area shaded behind these points that extends from the highest to lowest value for each Var value, but this ribbon would also have to have the same colour as the points, just with a lower transparency level.
For a bonus question I am also having trouble getting the 'mean' variable from my example to appear as a geom_line rather than a geom_point. I have been playing around with this:
p+geom_point()+geom_line(data=pdata[which(pdata$variable=="Mean")])
but I can't get it to work. If anyone can help with any of this that would be great. Thanks.
Using cut with the option labels=F (so it returns integer bin codes), I add a new variable for coloring.
pdata <- transform(pdata,varc =cut(pdata$Var,10,labels=F))
p<-ggplot(subset(pdata,variable!='Mean'), aes(Var,value,col=varc))
p+geom_point(aes(shape = variable),alpha=0.7)+
geom_line(data=subset(pdata,variable =='Mean'),size=2)
Edit: ribbon part
I don't fully understand the ribbon part (it would help if you could explain what the upper and lower values are), but I think we can simply use geom_polygon here:
last_plot()+ geom_polygon(aes(fill=varc, group=variable),alpha=0.3,linetype=3)
In regard to your first question, you can use the cut function to classify your continuous data into categories. For example:
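with(mtcars, cut(mpg, seq(min(mpg), max(mpg), length.out = 5), include.lowest = TRUE))
This cuts the continuous values in the mpg column into 4 classes (five break points delimit four intervals). include.lowest = TRUE keeps the minimum value from becoming NA.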
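For the unequal ranges asked about in the question (1-10, 11-20, 21-30, 31-35 and 36-41), cut also takes explicit break points. A minimal sketch, using a plain vector var_values as a stand-in for the Var column:

```r
# Hypothetical stand-in for pdata$Var
var_values <- 1:41

# Intervals are (0,10], (10,20], (20,30], (30,35], (35,41]
var_group <- cut(var_values,
                 breaks = c(0, 10, 20, 30, 35, 41),
                 labels = c("1-10", "11-20", "21-30", "31-35", "36-41"))

table(var_group)
```

The resulting factor can then be mapped to colour (and fill) in aes() in place of variable.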