I have a plot generated by the following R code: basically a panel of many histograms/bars. To each one I'd like to add a vertical line, but the position of the line differs from facet to facet. Alternatively, I'd like to colour the bars red depending on whether the x value is higher than a threshold. How do I do this to such a plot with ggplot2 / R?
I generated the chart like so:
Histogramplot3 <- ggplot(completeFrame, aes(P_Value)) + geom_bar() + facet_wrap(~ Generation)
Where completeFrame is my dataframe, P_Value is my x variable, and the Facet Wrap Variable Generation is a factor.
It's easier to help with specific examples, but simulating some data, maybe this will help:
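library(ggplot2)
# simulate data
completeFrame <- data.frame(P_Value=rnorm(200,0.8,0.1), Generation=rep(1:4,times=50))
# draw the basic plot; geom_histogram() replaces geom_bar(binwidth=...), which current ggplot2 rejects
# boundary=0.8 puts a bin edge exactly on the threshold so no bar is half black, half red
h3 <- ggplot(completeFrame, aes(x=P_Value)) +
  geom_histogram(binwidth=0.02, boundary=0.8, col="black", fill="black") +
  # overlay the "red" bars for the subset of data
  geom_histogram(data=completeFrame[completeFrame$P_Value>0.8,], binwidth=0.02, boundary=0.8, col="black", fill="red") +
  facet_wrap(~ Generation)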
# add a line to individual facets by giving each layer only that facet's rows
# (these lines are horizontal; for vertical lines use geom_vline(aes(xintercept=...)) in the same way)
h3 <- h3+geom_hline(data=completeFrame[completeFrame$Generation==2,],aes(yintercept=max(P_Value)))
h3 <- h3+geom_hline(data=completeFrame[completeFrame$Generation==1,],aes(yintercept=2.5))
h3 <- h3+geom_hline(data=completeFrame[completeFrame$Generation==3,],aes(yintercept=mean(P_Value)))
h3
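Since the question asks for a vertical line at a different position in each facet, here is a minimal sketch of one common way to do that: keep one row per facet in a small data frame and pass it to geom_vline(). The names vlines, threshold and above are made up for illustration, and using the facet mean as the threshold is only an assumption.

```r
library(ggplot2)

set.seed(1)
completeFrame <- data.frame(P_Value = rnorm(200, 0.8, 0.1),
                            Generation = rep(1:4, times = 50))

# One row per facet: the x position of that facet's line.
# The facet mean is used here purely as an example threshold.
vlines <- aggregate(P_Value ~ Generation, data = completeFrame, FUN = mean)
names(vlines)[2] <- "threshold"

# Flag every observation against the threshold of its own facet.
completeFrame <- merge(completeFrame, vlines, by = "Generation")
completeFrame$above <- completeFrame$P_Value > completeFrame$threshold

p <- ggplot(completeFrame, aes(x = P_Value, fill = above)) +
  geom_histogram(binwidth = 0.02, colour = "black") +
  # facet_wrap() matches the rows of vlines to panels via the Generation column
  geom_vline(data = vlines, aes(xintercept = threshold), linetype = "dashed") +
  scale_fill_manual(values = c(`FALSE` = "black", `TRUE` = "red"), guide = "none") +
  facet_wrap(~ Generation)
```

Printing p draws the panels. A bin that straddles a facet's threshold is drawn as a stacked black and red bar.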
Related
I have an excel table with the data of the Odds Ratios of different diseases for my study. I want to make a forestplot with the R package ggplot2. I have used this script:
library(ggplot2)
library(readxl)
df <- read_excel("excel.xlsx") # placeholder file name for the Excel table
fp <- ggplot(data=df, aes(x=Disease, y=OR, ymin=Lower, ymax=Upper)) +
geom_pointrange() +
geom_hline(yintercept=1, lty=2) + # dashed reference line at OR = 1 (vertical after the flip)
coord_flip() + # flip coordinates (puts labels on y axis)
xlab("Disease") + ylab("OR (95% CI)") +
theme_bw() # use a white background
print(fp)
This makes round black spots for all diseases. I would like to change the shape of the points to squares or some other shape, but only for some diseases: the points corresponding to rows 6, 8, 14 and 16. The rest of the points should stay as they are now.
Thank you in advance.
I have tried this script but it makes only black spots.
The example code is not reproducible as I write this answer, but I think you just need to specify shape in the aes.
This question includes a complete example with multiple shapes
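A minimal sketch of what "specify shape in the aes" could look like, using simulated odds ratios because the original Excel file is not available. The column name special and the simulated values are assumptions for illustration only.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for the Excel table: 16 diseases with an OR and a made-up interval.
df <- data.frame(Disease = sprintf("Disease %02d", 1:16),
                 OR = exp(rnorm(16, 0, 0.3)))
df$Lower <- df$OR * 0.7
df$Upper <- df$OR * 1.4

# Flag the rows whose points should get a different shape.
df$special <- seq_len(nrow(df)) %in% c(6, 8, 14, 16)

fp <- ggplot(df, aes(x = Disease, y = OR, ymin = Lower, ymax = Upper, shape = special)) +
  geom_pointrange() +
  # 16 = filled circle, 15 = filled square
  scale_shape_manual(values = c(`FALSE` = 16, `TRUE` = 15), guide = "none") +
  geom_hline(yintercept = 1, linetype = 2) +
  coord_flip() +
  xlab("Disease") + ylab("OR (95% CI)") +
  theme_bw()
```

Mapping a logical flag to shape keeps the row selection in the data rather than in the plotting code, so changing which diseases are highlighted only means changing the vector of row numbers.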
I am trying to color my stacked histogram according to a factor column, but all the bars end up with a "green" roof: a green horizontal line at the top of every bar. I want the bar top to be the same color as the bar itself. The figure below shows clearly what is wrong.
Here is a dummy data set :
BodyLength <- rnorm(100, mean = 50, sd = 3)
vector <- c(80,10,5,5) # repeat counts must be numeric, not character
colors <- c("black","blue","red","green")
color <- rep(colors,vector)
data <- data.frame(BodyLength,color)
And the program I used to generate the plot below :
plot <- ggplot(data = data, aes(x=data$BodyLength, color = factor(data$color), fill=I("transparent")))
plot <- plot + geom_histogram()
plot <- plot + scale_colour_manual(values = c("Black","blue","red","green"))
Also, since the data column itself contains color names, is there any way to avoid specifying them again in scale_color_manual? Can ggplot identify them from the data itself? But I would really like help with the first problem right now. Thanks.
Here is a quick way to get your colors to scale_colour_manual without writing out a vector:
data <- data.frame(BodyLength,color)
data$color<- factor(data$color)
and then later,
scale_colour_manual(values = levels(data$color))
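If the column already holds valid R color names, another option is scale_colour_identity(), which uses the values in the data directly as colors. A minimal sketch on the same dummy data (set.seed() and bins = 30 are added here only to make the example reproducible and quiet):

```r
library(ggplot2)

set.seed(1)
BodyLength <- rnorm(100, mean = 50, sd = 3)
color <- rep(c("black", "blue", "red", "green"), times = c(80, 10, 5, 5))
data <- data.frame(BodyLength, color)

p <- ggplot(data, aes(x = BodyLength, colour = color)) +
  geom_histogram(fill = "transparent", position = "identity", bins = 30) +
  # use the strings in the color column as the actual outline colors
  scale_colour_identity()
```

By default the identity scale draws no legend, which is usually fine when the color names are self-explanatory.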
Now, with respect to your first problem, I don't know exactly why your bars have green roofs. However, you may want to look at some different options for the position argument in geom_histogram, such as
plot + geom_histogram(position="identity")
...or position="dodge". The identity option is closer to what you want, but since green is the last line drawn, it overwrites the previous colors.
I like density plots better for these problems myself.
ggplot(data=data, aes(x=BodyLength, color=color)) + geom_density()
ggplot(data=data, aes(x=BodyLength, fill=color)) + geom_density(alpha=.3)
I am working on visualising some patterns in network data and have some issues labelling lines, where I have multiple classes of lines:
1. loess lines for each factor (network)
2. a baseline at y=4000
3. a gam line that acts on all of the data (not factored)
Now, stack overflow has helped get me to this point (thanks!), but I feel like I have run into a brick wall for what I need to do:
A. provide a legend entry for the line #3
B. label each line on the graph (as per #1 #2 #3 - so 8 lines total)
Here is the code that I have so far:
p <- ggplot(network_data, aes(x=timeofday,y=dspeed, colour=factor(network)))+stat_smooth(method="loess",formula=y~x,se=FALSE)
p <- p + stat_function(fun=function(x)4000, geom="line", linetype="dashed", aes(colour="Baseline"))
p <- p + xlab("Time of Day (hr)") + ylab("Download Speed (ms)")
p <- p + theme(axis.line=element_line(colour="black"))
# add the gam line, colouring it purple for now
# (layer() no longer accepts stat_params/geom_params, so the layer is built with geom_smooth() instead)
q <- geom_smooth(data=network_data, mapping=aes(x=timeofday,y=dspeed), method="gam", formula=y~s(x), se=FALSE, colour="purple", inherit.aes=FALSE)
graph <- p+q # add the layer
#legend
graph <- graph+scale_colour_discrete(name="network")
# set up the origin correctly and axes etc
graph2 <- graph +
  scale_y_continuous(limits=c(0,6500), expand=c(0,0), breaks=seq(0,6000,by=1000)) +
  scale_x_datetime(limits=as.POSIXct(c("2015-04-13 00:00:01","2015-04-13 23:59:59")), expand=c(0,0), date_breaks="1 hour", date_labels="%H")
Happy to consider other packages, but ggplot2 seems to be the best so far.
Is there any way to do this 'automatically' (through programming), as I am trying to automate the generation of these graphs?
I have made the data available here as a .Rda file:
https://dl.dropboxusercontent.com/u/5268020/network_data.Rda
And here is an image of the current plot:
For question B, try annotate() and manually code the location and text of the label for each line, although this seems unnecessary given the legend.
http://docs.ggplot2.org/current/annotate.html
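A rough sketch of both labelling approaches on simulated data (the original .Rda file is not used here, and toy, labels and the numbers below are invented for illustration). annotate() places a single hand-positioned label, while geom_text() fed with one row per line places the labels programmatically, which may suit automated graph generation better.

```r
library(ggplot2)

set.seed(42)
# Invented stand-in for network_data: two networks sampled hourly.
toy <- data.frame(timeofday = rep(0:23, times = 2),
                  network = rep(c("net1", "net2"), each = 24))
toy$dspeed <- 3000 + 500 * sin(toy$timeofday / 4) +
  ifelse(toy$network == "net1", 0, 800) + rnorm(48, sd = 100)

# One row per network: the last observation, used as the label position.
labels <- toy[toy$timeofday == max(toy$timeofday), ]

p <- ggplot(toy, aes(x = timeofday, y = dspeed, colour = network)) +
  geom_line() +
  geom_hline(yintercept = 4000, linetype = "dashed") +
  # manual label for the baseline
  annotate("text", x = 1, y = 4100, label = "Baseline", hjust = 0) +
  # automatic labels, one per network, taken from the labels data frame
  geom_text(data = labels, aes(label = network), hjust = 1, vjust = -0.5,
            show.legend = FALSE)
```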
I'm new to ggplot2 and I have a question that I couldn't find the answer to.
I've created the following toy data to help in the explanation:
data <- data.frame(tech=c(rep(letters[1:15],2)),
sep=c(rep(c("SitutationA", "SitutationB"),each=15)),
error=c(runif(15,min=-0.2, max=0.5), runif(15, min=0.3, max=1)))
I want to plot a geom_bar graph showing the "error" (y axis) for each technique "tech" (x axis), divided into two different situations (SitutationA and SitutationB) using facet_grid. The color (fill) of each bar should represent the "error" of each technique, not the technique (as a factor). The errors for situations A and B are measured on different scales. However, in my code, an error of the same value has the same color in both situations. I do not want this behavior, since the errors were measured on different scales. Thus, I would like the colors in situations A and B to be independent.
The following code plots the graph, but using the same color for both situations.
ggplot(data, aes(x=tech, y=error)) +
geom_bar(aes(fill=error), stat="identity", position="dodge") +
facet_grid(sep ~ ., scales="free_y") +
scale_fill_continuous(guide=FALSE)
How could I use different continuous fills for each facet (situationA and situationB)?
Thank you.
You can't have two different fill scales on the same plot in ggplot2 itself.
Solution to the problem could be to make two plots and then put them together with grid.arrange() from library gridExtra.
The first plot contains only the SitutationA values. The y-axis breaks are set explicitly so that the labels show two digits after the decimal point, matching the second plot. The x-axis title, text and ticks are removed, and the bottom plot margin is set to -0.4 lines to reduce the space between the plots.
library(grid)
library(gridExtra)
p1<-ggplot(subset(data,sep=="SitutationA"), aes(x=tech, y=error)) +
geom_bar(aes(fill=error), stat="identity", position="dodge") +
facet_grid(sep ~ ., scales="free_y") +
scale_fill_continuous(guide="none")+
scale_y_continuous(breaks=c(0,0.25,0.50))+
theme(axis.text.x=element_blank(),
axis.title.x=element_blank(),
axis.ticks.x=element_blank(),
plot.margin=unit(c(1,1,-0.4,1),"lines"))
For the second plot (SitutationB), the top plot margin is set to -0.4 lines to reduce the space between the plots, and the fill scale is given new colors.
p2<-ggplot(subset(data,sep=="SitutationB"), aes(x=tech, y=error)) +
geom_bar(aes(fill=error), stat="identity", position="dodge") +
facet_grid(sep ~ ., scales="free_y") +
scale_fill_gradient(guide="none",low="red",high="purple") +
theme(plot.margin=unit(c(-0.4,1,1,1),"lines"))
Now put both plots together.
grid.arrange(p1,p2)
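If the same palette is acceptable in both facets and only the color ranges need to be independent, a different technique from the two-plot approach above is to rescale the error within each facet and map the fill to the rescaled value. This is only a sketch: error_scaled is a hypothetical helper column, and the blue-to-red palette is an arbitrary choice.

```r
library(ggplot2)

set.seed(1)
data <- data.frame(tech = rep(letters[1:15], 2),
                   sep = rep(c("SitutationA", "SitutationB"), each = 15),
                   error = c(runif(15, min = -0.2, max = 0.5),
                             runif(15, min = 0.3, max = 1)))

# Rescale error to [0, 1] separately within each situation,
# so the lowest and highest bar of every facet get the palette extremes.
data$error_scaled <- ave(data$error, data$sep,
                         FUN = function(v) (v - min(v)) / (max(v) - min(v)))

p <- ggplot(data, aes(x = tech, y = error)) +
  geom_col(aes(fill = error_scaled)) +
  facet_grid(sep ~ ., scales = "free_y") +
  scale_fill_gradient(low = "blue", high = "red", guide = "none")
```

The bar heights still show the raw error. Only the fill uses the per-facet rescaling, so a given color no longer means the same error value in both facets.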
In the following example, how do I set separate ylims for each of my facets?
qplot(x, value, data=df, geom=c("smooth")) + facet_grid(variable ~ ., scale="free_y")
In each of the facets, the y-axis takes a different range of values and I would like to set different ylims for each of the facets.
The default ylims are too wide for the trend that I want to see.
This was brought up on the ggplot2 mailing list a short while ago. What you are asking for is currently not possible but I think it is in progress.
As far as I know this has not been implemented in ggplot2 yet. However, a workaround that gives you ylims exceeding what ggplot provides automatically is to add "artificial data". To reduce the ylims, simply remove the data you don't want to plot (see the end for an example).
Here is an example:
Let's just set up some dummy data that you want to plot
df <- data.frame(x=rep(seq(1,2,.1),4),f1=factor(rep(c("a","b"),each=22)),f2=factor(rep(c("x","y"),22)))
df <- within(df,y <- x^2)
Which we could plot using line graphs
p <- ggplot(df,aes(x,y))+geom_line()+facet_grid(f1~f2,scales="free_y")
print(p)
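Assume we want y to start at -10 in the first row and at 0 in the second row, so we add a point at (0,-10) to the upper left plot and a point at (0,0) to the lower right plot (with facet_grid and scales="free_y" the y scale is shared across each row, so one point per row is enough):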
ylim <- data.frame(x=rep(0,2),y=c(-10,0),f1=factor(c("a","b")),f2=factor(c("x","y")))
dfy <- rbind(df,ylim)
Now by limiting the x-scale between 1 and 2 those added points are not plotted (a warning is given):
p <- ggplot(dfy,aes(x,y))+geom_line()+facet_grid(f1~f2,scales="free_y")+xlim(c(1,2))
print(p)
The same approach works for extending the upper limit: add points with higher y values at x values that lie outside the range of xlim.
This will not work if you want to reduce the ylim. In that case, subsetting your data is a solution. For example, to limit the upper row to between -10 and 1.5 you could use:
p <- ggplot(dfy,aes(x,y))+geom_line(data=subset(dfy, y < 1.5 | f1 != "a"))+facet_grid(f1~f2,scales="free_y")+xlim(c(1,2))
print(p)
There are actually two packages that solve that problem now:
https://github.com/zeehio/facetscales, and https://cran.r-project.org/package=ggh4x.
I would recommend ggh4x because it has very useful tools, such as nested facets (two variables defining the rows or columns), custom x- and y-axis scales for each facet, and multiple fill and colour scales.
For your problems the solution would be like this:
library(ggplot2)
library(ggh4x)
scales <- list(
  # Here you have to specify all the scales, one for each facet row in your case
  scale_y_continuous(limits = c(2, 10)),
  scale_y_continuous(breaks = c(3, 4))
)
ggplot(df, aes(x, value)) +
  geom_smooth() +
  facet_grid(variable ~ ., scales = "free_y") +
  facetted_pos_scales(y = scales)
Here is an example that uses facet_wrap:
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
facet_wrap(vars(class), scales = "free",
nrow=2,ncol=4)
The code above generates this plot:
My reputation is too low to upload an image; click here to see the plot.