I have a dataset myData which contains x and y values for various Samples. I can create a line plot for a dataset which contains a few Samples with the following pseudocode, and it is a good way to represent this data:
library(tidyverse)
myData <- data.frame(x = 290:450, X52241 = c(..., ..., ...), X75123 = c(..., ..., ...))
myData <- myData %>% gather(Sample, y, -x)  # wide to long: one row per (x, Sample)
ggplot(myData, aes(x, y)) + geom_line(aes(color=Sample))
Which generates:
This turns into a spaghetti plot when I add many more Samples, which makes the information hard to read, so I want to represent the "hills" of each sample in another way. Preferably, I would like to represent the data as a series of stacked bars, one for each myData$Sample, with transparency inversely related to the value of myData$y. I've tried to mock this up in Photoshop (badly) here:
Is there a way to do this? Creating faceted plots using facet_wrap() or facet_grid() doesn't give me what I want (far too many Samples). I would also be open to stacked ridgeline plots using ggridges, but I am not understanding how I would be able to convert absolute values to a stat(density) value needed to plot those.
Any suggestions?
Thanks to u/Joris for the helpful suggestion! Since I did not find this question elsewhere, I'll go ahead and post the pretty simple solution to my question here for others to find.
Basically, I needed to apply the alpha aesthetic via aes(alpha=y, ...). In theory, I could apply this over any geom. I tried geom_col(), which worked, but the best solution was to use geom_segment(), since all my "bars" were going to be the same length. Also note that I had to "slice" up the segments in order to avoid the problem of overplotting similar to those found here, here, and here.
ggplot(myData, aes(x, Sample)) +
geom_segment(aes(x=x, xend=x-1, y=Sample, yend=Sample, alpha=y), color='blue3', size=14)
That gives us the nice gradient:
Since the maximum y values are not the same for both lines, I normalized the data (myDataNorm) to "match" the intensities and could then make the same plot. In my particular case, I preferred bars without a gradient that instead showed a hard edge around the maximum values of y. Here was one solution:
ggplot(myDataNorm, aes(x, Sample)) +
geom_segment(aes(x=x, xend=x-1, y=Sample, yend=Sample, alpha=ifelse(y>0.9,1,0)), color='blue3', size=14) +
theme(legend.position='none')
Better, but I did not like the faint-colored areas that were left. The final code is what gave me something that perfectly captured what I was looking for. I simply moved the ifelse() statement to apply to the x aesthetic, so the parts of the segment drawn were only those with high enough y values. Note my data "starts" at x=290 here. Probably more elegant ways to combine those x and xend terms, but whatever:
ggplot(myDataNorm, aes(x, Sample)) +
geom_segment(aes(
x=ifelse(y>0.9,x,290), xend=ifelse(y>0.9,x-1,290),
y=Sample, yend=Sample), color='blue3', size=14) +
xlim(290,400) # needed to show entire scale
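The normalization step that produces myDataNorm is not shown above. Here is a minimal sketch, assuming it simply divides each Sample's y by that Sample's maximum so that every sample peaks at 1. That would fit the y>0.9 threshold, but it is my assumption, and the two bell-shaped toy samples are made up for illustration:

```r
# Made-up long-format data standing in for myData (two "hills" of different height)
x <- 290:450
myData <- data.frame(
  x      = rep(x, times = 2),
  Sample = rep(c("X52241", "X75123"), each = length(x)),
  y      = c(100 * dnorm(x, mean = 330, sd = 15),
              40 * dnorm(x, mean = 380, sd = 20))
)

# Assumed normalization: scale y within each Sample so its maximum is 1
myDataNorm <- myData
myDataNorm$y <- ave(myData$y, myData$Sample, FUN = function(v) v / max(v))

# Per-sample maxima after scaling
tapply(myDataNorm$y, myDataNorm$Sample, max)
```

The resulting myDataNorm has the same columns as myData, so it can be passed to the ggplot() calls above unchanged.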
I am trying to create a plot that displays a line with two x axes: one is continuous numeric and the other is discrete.
This is an example of the data:
df <-cbind.data.frame("Category"=c("A","A","A","A","A","B","B","B","B","B"),
"Y"=c(5,6,4,8,9,4,5,3,7,8),
"X1"=c(0,10,20,30,40,0,10,20,30,40),
"X2"=c(0,0,1,1,2,0,1,2,2,3))
I tried to add a secondary axis and rescale it, but since my two variables are not proportional, I don't know how to rescale it so that the same Y point on the line fits both x axes.
ggplot(data=df) +
geom_path(aes(y=Y,x=X1),color="red")+
geom_path(aes(y=Y,x=X2*10),color="blue")+
facet_wrap(~Category)+
scale_y_continuous("Y")+
scale_x_continuous("X1",sec.axis = sec_axis(~ .*1/10, "X2"))
I have read about various problems with two axes, but was not able to find a solution to mine.
I am looking for something like this:
I would greatly appreciate any help with this!
The plot you provide does not evidence a clear algebraic relationship, so I'm going to give you an example of a completely-arbitrary second x-axis.
library(ggplot2)
ggplot(mtcars, aes(mpg, disp)) +
geom_point() +
scale_x_continuous(sec.axis=sec_axis(~., breaks=c(15,20,30), labels=c('a','b','c')))
The first argument is the transformation "~." (essentially x2=x1) and is required; in this case it's a 1-for-1 transformation. The other two are relatively clear: you place 'a' at x=15, 'b' at x=20, etc. I don't think there's a way to put both on the same axis (with ggplot2 alone).
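Applied to the question's data, the same trick might look like the sketch below, where the X2 values are printed as labels at the matching X1 positions. It handles only one Category at a time, because the same breaks and labels apply to every facet while the X1-to-X2 pairing differs between A and B. Restricting to Category A (and the name df_a) is my simplification, not part of the answer above:

```r
library(ggplot2)

df <- data.frame(
  Category = rep(c("A", "B"), each = 5),
  Y  = c(5, 6, 4, 8, 9, 4, 5, 3, 7, 8),
  X1 = c(0, 10, 20, 30, 40, 0, 10, 20, 30, 40),
  X2 = c(0, 0, 1, 1, 2, 0, 1, 2, 2, 3)
)

# Keep one category, since the X1 -> X2 pairing differs between A and B
df_a <- subset(df, Category == "A")

p <- ggplot(df_a, aes(X1, Y)) +
  geom_path(color = "red") +
  scale_x_continuous(
    "X1",
    # identity transformation; X2 values are written at their X1 positions
    sec.axis = sec_axis(~ ., name = "X2",
                        breaks = df_a$X1,
                        labels = as.character(df_a$X2))
  )
p
```

For both categories, one option would be to build one such plot per Category and arrange them side by side.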
I have a dataframe that I want to reorder to make a ggplot so I can easily see which items have the highest and lowest values in them. In my case, I've grouped the data into two groups, and it'd be nice to have a visual representation of which group tends to score higher. Based on this question I came up with:
library(ggplot2)
cor.data<- read.csv("https://dl.dropbox.com/s/p4uy6uf1vhe8yzs/cor.data.csv?dl=0",stringsAsFactors = F)
cor.data.sorted = cor.data[with(cor.data,order(r.val,pic)),] #<-- line that doesn't seem to be working
ggplot(cor.data.sorted,aes(x=pic,y=r.val,size=df.val,color=exp)) + geom_point()
which produces this:
I've tried quite a few variants to reorder the data, and I feel like this should be pretty simple to achieve. To clarify, if I had successfully reorganised the data, the y-values would increase as the plot moves along the x-axis. So maybe I'm focusing on the wrong part of the code to achieve this in a ggplot figure?
You could do something like this?
library(tidyverse);
cor.data %>%
mutate(pic = factor(pic, levels = as.character(pic)[order(r.val)])) %>%
ggplot(aes(x = pic, y = r.val, size = df.val, color = exp)) + geom_point()
This obviously still needs some polishing to deal with the x axis label clutter etc.
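As for why sorting the data frame has no effect: on a discrete axis ggplot2 follows the factor levels, and a character column gets alphabetically sorted levels no matter what order the rows are in. A small base-R sketch with made-up values (the toy pic and r.val vectors are not the real data):

```r
# Toy stand-in for cor.data: three items with made-up scores
d <- data.frame(pic = c("b", "a", "c"),
                r.val = c(0.2, 0.9, 0.5),
                stringsAsFactors = FALSE)

# Sorting the rows changes the row order ...
sorted <- d[order(d$r.val), ]
sorted$pic

# ... but the levels a plot would use are still alphabetical
levels(factor(sorted$pic))

# Setting the levels explicitly (as in the answer above) fixes the axis order
f <- factor(d$pic, levels = d$pic[order(d$r.val)])
levels(f)

# reorder() does the same in one step: levels sorted by r.val
levels(reorder(d$pic, d$r.val))
```

Note that the explicit levels= approach needs each pic value to appear only once; reorder() also copes with repeated values by summarising r.val per level (mean by default).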
Rather than try to order the data before creating the plot, I can reorder the data at the time of writing the plot:
cor.data<- read.csv("https://dl.dropbox.com/s/p4uy6uf1vhe8yzs/cor.data.csv?dl=0",stringsAsFactors = F)
cor.data.sorted = cor.data[with(cor.data,order(r.val,pic)),] #<-- controls the order in which points are drawn, for a (slightly) more readable plot
ggplot(cor.data.sorted,aes(x=reorder(pic,r.val),y=r.val,size=df.val,color=exp)) + geom_point()
to create
Here is an example of the code I'm working with
x<-as.factor(rep(c("tree_mean","tree_qmean","tree_skew"),3))
factor<-c(rep("mfn2_burned_99",3),rep("mfna_burned_5_7",3),rep("mfna_burned_5_7_10_12",3))
y<-c(0.336457409,-0.347422910,-0.318945621,1.494109367, 0.003578698,-0.019985780,-0.484171146, 0.611589217,-0.322292664)
dat<-data.frame(x,factor,y) # data.frame() keeps y numeric; cbind() would coerce every column to character
head(dat)
         x                factor          y
 tree_mean        mfn2_burned_99 -0.3364574
tree_qmean        mfn2_burned_99 -0.3474229
 tree_skew        mfn2_burned_99 -0.3189456
 tree_mean       mfna_burned_5_7 -0.8269814
tree_qmean       mfna_burned_5_7 -0.8088810
 tree_skew       mfna_burned_5_7 -2.5429226
 tree_mean mfna_burned_5_7_10_12 -0.8601206
tree_qmean mfna_burned_5_7_10_12 -0.8474920
 tree_skew mfna_burned_5_7_10_12 -2.9854178
I am trying to plot how much x deviates from 0, and facet it by each factor, as so:
ggplot(dat) +
geom_point(aes(x=x,y=y),shape=1,size=3)+
geom_linerange(aes(x=x,ymin=0,ymax=y))+
geom_hline(yintercept=0)+
facet_grid(factor~.)
This works fine when I have three factors (ignore the *: I had a significance column which I have since removed).
Example below:
However, I have 8 factors in total, and faceting obscures the plot such that the distance from zero for each x value gets very distorted.
Example below
So, my question is this: given my large number of x values and factors, what would be a better way of coding/rendering this plot, using either faceting or color coding by factor in ggplot?
I would be very open to color-coding each distance for x by factor rather than faceting, but I have been beating my head against the wall trying to figure out how to even do that in ggplot (very new to ggplot), so I can't yet say if it would make the figure much more interpretable.
One option as you note is to color your point and/or linerange by a factor. You can then use position_dodge to move the points slightly on the x axis.
For example:
ggplot(dat, aes(color = factor)) +
geom_point(aes(x=x,y=y),shape=1,size=3, position = position_dodge(width = 0.5))+
geom_linerange(aes(x=x,ymin=0,ymax=y), position = position_dodge(width =0.5))+
geom_hline(yintercept=0)
I think this would still be difficult with many factors, but with 8 it might suit your purposes.
I am making a graph in ggplot2 consisting of a set of datapoints plotted as points, with the lines predicted by a fitted model overlaid. The general idea of the graph looks something like this:
names <- c(1,1,1,2,2,2,3,3,3)
xvals <- c(1:9)
yvals <- c(1,2,3,10,11,12,15,16,17)
pvals <- c(1.1,2.1,3.1,11,12,13,14,15,16)
ex_data <- data.frame(names,xvals,yvals,pvals)
ex_data$names <- factor(ex_data$names)
graph <- ggplot(data=ex_data, aes(x=xvals, y=yvals, color=names))
print(graph + geom_point() + geom_line(aes(x=xvals, y=pvals)))
As you can see, both the lines and the points are colored by a categorical variable ('names' in this case). I would like the legend to contain 2 entries: a dot labeled 'Data', and a line labeled 'Fitted' (to denote that the dots are real data and the lines are fits). However, I cannot seem to get this to work. The (awesome) guide here is great for formatting, but doesn't deal with the actual entries, while I have tried the technique here to no avail, i.e.
print(graph + scale_colour_manual("", values=c("green", "blue", "red"))
+ scale_shape_manual("", values=c(19,NA,NA))
+ scale_linetype_manual("",values=c(0,1,1)))
The main trouble is that, in my actual data, there are >200 different categories for 'names,' while I only want the 2 entries I mentioned above in the legend. Doing this with my actual data just produces a meaningless legend that runs off the page, because the legend is trying to be a key for the colors (of which I have way too many).
I'd appreciate any help!
I think this is close to what you want:
ggplot(ex_data, aes(x=xvals, group=names)) +
geom_point(aes(y=yvals, shape='data', linetype='data')) +
geom_line(aes(y=pvals, shape='fitted', linetype='fitted')) +
scale_shape_manual('', values=c(19, NA)) +
scale_linetype_manual('', values=c(0, 1))
The idea is that you specify two aesthetics (linetype and shape) for both lines and points, even though it makes no sense, say, for a point to have a linetype aesthetic. Then you manually map these "nonsense" aesthetics to "null" values (NA and 0 in this case), using a manual scale.
This has been answered already, but based on feedback I got to another question (How can I fix this strange behavior of legend in ggplot2?) this tweak may be helpful to others and may save you headaches (sorry couldn't put as a comment to the previous answer):
ggplot(ex_data, aes(x=xvals, group=names)) +
geom_point(aes(y=yvals, shape='data', linetype='data')) +
geom_line(aes(y=pvals, shape='fitted', linetype='fitted')) +
scale_shape_manual('', values=c('data'=19, 'fitted'=NA)) +
scale_linetype_manual('', values=c('data'=0, 'fitted'=1))
I have some data for which, at one level of a factor, there is a significant correlation. At the other level, there is none. Plotting these side-by-side is simple. Adding a line to both of them with stat_smooth, also straightforward. However, I do not want the line or its fill displayed in one of the two facets. Is there a simple way to do this? Perhaps specifying a blank color for the fill and colour of one of the lines somehow?
Don't think about picking a facet; think about supplying a subset of your data to stat_smooth:
ggplot(df, aes(x, y)) +
geom_point() +
geom_smooth(data = subset(df, z =="a")) +
facet_wrap(~ z)
Of course, I later answered my own question. Although, is there a less hack-y way to do this? I wonder if one could even fit different functions to different panels.
One technique is to use + scale_fill_manual and scale_colour_manual. They allow one to specify what colors will be used. So, in this case, let's say you have
a<-qplot(x, y, facets=~z)+stat_smooth(method="lm", aes(colour=z, fill=z))
You can specify colors for the fill and colour using the following. Note that the second color is clear: it is an 8-digit hex value whose final two digits are the alpha (opacity) channel, so 00 = fully transparent.
a+scale_fill_manual(values=c("grey", "#11111100"))+scale_colour_manual(values=c("blue", "#11111100"))
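To check what such a colour string means, base R can decode it. A small sketch (the variable names are just for illustration); "transparent" is also a colour name that R understands, which might read more clearly than a hex code:

```r
clear <- "#11111100"                    # 8-digit hex: RRGGBBAA
rgba  <- col2rgb(clear, alpha = TRUE)   # 4 x 1 matrix: red, green, blue, alpha
rgba["alpha", 1]                        # 0, i.e. fully transparent

# Same idea starting from a named colour: keep the hue, set alpha to 0
clear_grey <- adjustcolor("grey", alpha.f = 0)
col2rgb(clear_grey, alpha = TRUE)["alpha", 1]

# The built-in name "transparent" also has an alpha of 0
col2rgb("transparent", alpha = TRUE)["alpha", 1]
```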