I'm trying to make a dotplot where numerical y values are grouped according to a character variable. That works fine, but I also want to change the sizes of the dots according to another variable, so that there are three different sizes of dots in the plot. I can change the dot sizes, it's just that R doesn't seem to be getting it right.
I couldn't find a good sample dataset, so I've made a quick example:
#Making some sufficient data:
y1 <- c(1,1,2,3,4,5,6,6)
x1 <- c('A','A','B','C','A','A','B','B')
size1 <- c(0.3,0.3,0.3,0.3,0.3,0.6,0.6,1.0)
data1 <- data.frame(x1,y1,size1)
data1
#define size as a vector: apparently it helps some problems
size2 <- data1$size1
#plot my dotplot!
ggplot(data1, aes(x=x1,y=y1)) +
geom_dotplot(binaxis="y", stackdir="center", dotsize=size2)
Overall, the dotplot works fine. The y variables are grouped according to their group of A, B, or C. However, the dotsizes are incorrect: The only dot in group C should be small (dotsize=0.3), the two dots at y=1 of group A should both be of equal size... and so on.
Dotplot with all sorts of dotsize inaccuracies
The question 'geom_dotplot dot sizes change when plotting different datasets in loop' said that the dotsize of geom_dotplot isn't exactly a dot size, but is relative to the bin width. That could explain why I'm having trouble. However, I'm unsure how to fix this. Is there a way to reliably vary dot sizes in ggplot2's dotplots, or should I try making a dotplot with a more flexible tool than geom_dotplot? (Restarting R and my computer didn't help.)
Cheers!
The Stack Overflow thread you shared clarifies what you can do with geom_dotplot: if you add a binwidth parameter, you can see the effect of dotsize. Here is an example:
base <- ggplot(data1, aes(x=x1,y=y1))
base + geom_dotplot(binaxis="y", stackdir="center", dotsize=size1, binwidth = 1)
Output
Using geom_point instead of geom_dotplot should solve the problem:
ggplot(data1, aes(x=x1,y=y1)) +
geom_point(aes(size=size1))
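A minimal sketch of why the geom_point version behaves differently: size is an aesthetic, so it is matched to the data row by row, whereas dotsize in geom_dotplot is (as far as I can tell) a plain parameter rather than a per-row mapping. The scale_size_continuous() call and its range and breaks values are my own additions for illustration, not part of the answer above.

```r
library(ggplot2)

# Same example data as in the question
data1 <- data.frame(
  x1 = c('A','A','B','C','A','A','B','B'),
  y1 = c(1,1,2,3,4,5,6,6),
  size1 = c(0.3,0.3,0.3,0.3,0.3,0.6,0.6,1.0)
)

# size is mapped inside aes(), so each row gets its own point size;
# range = c(2, 8) is an arbitrary choice of smallest and largest drawn size
p <- ggplot(data1, aes(x = x1, y = y1)) +
  geom_point(aes(size = size1)) +
  scale_size_continuous(range = c(2, 8), breaks = c(0.3, 0.6, 1.0))

# Inspect the sizes ggplot2 actually computed for the point layer
built_sizes <- sort(unique(layer_data(p)$size))
print(p)
```

Note that geom_point does not stack: the two rows at A, y=1 are drawn on top of each other, so this is not a drop-in replacement if the stacking matters.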
I have a dataset myData which contains x and y values for various Samples. I can create a line plot for a dataset which contains a few Samples with the following pseudocode, and it is a good way to represent this data:
myData <- data.frame(x = 290:450, X52241 = c(..., ..., ...), X75123 = c(..., ..., ...))
myData <- myData %>% gather(Sample, y, -x)
ggplot(myData, aes(x, y)) + geom_line(aes(color=Sample))
Which generates:
This turns into a Spaghetti Plot when I have a lot more Samples added, which makes the information hard to understand, so I want to represent the "hills" of each sample in another way. Preferably, I would like to represent the data as a series of stacked bars, one for each myData$Sample, with transparency inversely related to what is in myData$y. I've tried to represent that data in photoshop (badly) here:
Is there a way to do this? Creating faceted plots using facet_wrap() or facet_grid() doesn't give me what I want (far too many Samples). I would also be open to stacked ridgeline plots using ggridges, but I am not understanding how I would be able to convert absolute values to a stat(density) value needed to plot those.
Any suggestions?
Thanks to u/Joris for the helpful suggestion! Since I did not find this question elsewhere, I'll go ahead and post the pretty simple solution to my question here for others to find.
Basically, I needed to apply the alpha aesthetic via aes(alpha=y, ...). In theory, I could apply this over any geom. I tried geom_col(), which worked, but the best solution was to use geom_segment(), since all my "bars" were going to be the same length. Also note that I had to "slice" up the segments in order to avoid the problem of overplotting similar to those found here, here, and here.
ggplot(myData, aes(x, Sample)) +
geom_segment(aes(x=x, xend=x-1, y=Sample, yend=Sample, alpha=y), color='blue3', size=14)
That gives us the nice gradient:
Since the max y values are not the same for both lines, I normalized the data (myDataNorm) to "match" the intensities and could then make the same plot. In my particular case, I kind of preferred bars that did not have a gradient, but which showed a hard edge for the maximum values of y. Here was one solution:
ggplot(myDataNorm, aes(x, Sample)) +
geom_segment(aes(x=x, xend=x-1, y=Sample, yend=Sample, alpha=ifelse(y>0.9,1,0)), color='blue3', size=14) +
theme(legend.position='none')
Better, but I did not like the faint-colored areas that were left. The final code is what gave me something that perfectly captured what I was looking for. I simply moved the ifelse() statement to apply to the x aesthetic, so the parts of the segment drawn were only those with high enough y values. Note my data "starts" at x=290 here. Probably more elegant ways to combine those x and xend terms, but whatever:
ggplot(myDataNorm, aes(x, Sample)) +
geom_segment(aes(
x=ifelse(y>0.9,x,290), xend=ifelse(y>0.9,x-1,290),
y=Sample, yend=Sample), color='blue3', size=14) +
xlim(290,400) # needed to show entire scale
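The normalization step that produced myDataNorm is not shown above. A minimal sketch, assuming it means scaling each Sample's y values by that Sample's own maximum, so every Sample peaks at 1. The small data frame here is a hypothetical stand-in with made-up values, not the real myData.

```r
# Hypothetical long-format data standing in for the gathered myData
myData <- data.frame(
  x = rep(290:294, times = 2),
  Sample = rep(c("X52241", "X75123"), each = 5),
  y = c(1, 4, 8, 4, 1, 2, 5, 20, 5, 2)
)

# Assumption: "normalized" means dividing by the per-Sample maximum
myDataNorm <- myData
myDataNorm$y <- ave(myData$y, myData$Sample, FUN = function(v) v / max(v))

# Each Sample should now have a maximum of exactly 1
tapply(myDataNorm$y, myDataNorm$Sample, max)
```

With the data scaled this way, a single threshold such as y>0.9 means the same thing for every Sample.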
I am trying to create a plot that displays a line with two x axes: one is continuous numeric and the other is discrete.
This is an example of the data:
df <-cbind.data.frame("Category"=c("A","A","A","A","A","B","B","B","B","B"),
"Y"=c(5,6,4,8,9,4,5,3,7,8),
"X1"=c(0,10,20,30,40,0,10,20,30,40),
"X2"=c(0,0,1,1,2,0,1,2,2,3))
I tried to add a secondary axis and re-scale it, but since my two variables are not proportional I don't know how to re-scale so that the same Y point on the line fits both x axes.
ggplot(data=df) +
geom_path(aes(y=Y,x=X1),color="red")+
geom_path(aes(y=Y,x=X2*10),color="blue")+
facet_wrap(~Category)+
scale_y_continuous("Y")+
scale_x_continuous("X1",sec.axis = sec_axis(~ .*1/10, "X2"))
I have read about various problems with two axes, but was not able to find a solution for my problem.
I am looking for something like this:
I would really appreciate any help with this!
The plot you provide does not evidence a clear algebraic relationship, so I'm going to give you an example of a completely-arbitrary second x-axis.
library(ggplot2)
ggplot(mtcars, aes(mpg, disp)) +
geom_point() +
scale_x_continuous(sec.axis=sec_axis(~., breaks=c(15,20,30), labels=c('a','b','c')))
The first argument is the transformation "~." (essentially x2=x1) and is required, so in this case it's a 1-for-1 transformation. The other two are relatively clear, you place 'a' at x=15, 'b' at x=20, etc. I don't think there's a way to put both on the same axis (with ggplot2 alone).
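As a rough sketch of how the same idea might be applied to the data in the question: place the X2 values as labels at the matching X1 positions. This is only an illustration for a single Category (the subset dfA is my own helper), because as far as I know one plot gets one secondary axis, so facets whose X1-to-X2 mappings differ cannot each get their own labels this way.

```r
library(ggplot2)

# Data from the question
df <- data.frame(
  Category = rep(c("A", "B"), each = 5),
  Y  = c(5, 6, 4, 8, 9, 4, 5, 3, 7, 8),
  X1 = c(0, 10, 20, 30, 40, 0, 10, 20, 30, 40),
  X2 = c(0, 0, 1, 1, 2, 0, 1, 2, 2, 3)
)

# Assumption: only one category is plotted at a time
dfA <- subset(df, Category == "A")

# Identity transformation; the X2 values are written at the X1 positions
p <- ggplot(dfA, aes(X1, Y)) +
  geom_path(color = "red") +
  scale_x_continuous(
    "X1",
    sec.axis = sec_axis(~ ., name = "X2",
                        breaks = dfA$X1,
                        labels = as.character(dfA$X2))
  )
print(p)
```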
Here is an example of the code I'm working with
x<-as.factor(rep(c("tree_mean","tree_qmean","tree_skew"),3))
factor<-c(rep("mfn2_burned_99",3),rep("mfna_burned_5_7",3),rep("mfna_burned_5_7_10_12",3))
y<-c(0.336457409,-0.347422910,-0.318945621,1.494109367, 0.003578698,-0.019985780,-0.484171146, 0.611589217,-0.322292664)
# data.frame() keeps y numeric; as.data.frame(cbind(...)) would coerce every column to character
dat<-data.frame(x,factor,y)
head(dat)
x factor y
tree_mean mfn2_burned_99 -0.3364574
tree_qmean mfn2_burned_99 -0.3474229
tree_skew mfn2_burned_99 -0.3189456
tree_mean mfna_burned_5_7 -0.8269814
tree_qmean mfna_burned_5_7 -0.8088810
tree_skew mfna_burned_5_7 -2.5429226
tree_mean mfna_burned_5_7_10_12 -0.8601206
tree_qmean mfna_burned_5_7_10_12 -0.8474920
tree_skew mfna_burned_5_7_10_12 -2.9854178
I am trying to plot how much each x value deviates from 0, and facet it by each factor, like so:
ggplot(dat) +
geom_point(aes(x=x,y=y),shape=1,size=3)+
geom_linerange(aes(x=x,ymin=0,ymax=y))+
geom_hline(yintercept=0)+
facet_grid(factor~.)
This works fine when I have three factors (ignore the *: I had a significance column which I have since removed).
Example below:
However, I have 8 factors in total, and faceting obscures the plot such that the distance from zero for each x value gets very distorted.
Example below
So, my question is this: what would be a better way of coding/rendering this plot, given my large number of x values and factors, using faceting or color coding by factor in ggplot?
I would be very open to color-coding each distance for x by factor rather than faceting, but I have been beating my head against the wall trying to figure out how to even do that in ggplot (very new to ggplot), so I can't yet say if it would make the figure much more interpretable.
One option as you note is to color your point and/or linerange by a factor. You can then use position_dodge to move the points slightly on the x axis.
For example:
ggplot(dat, aes(color = factor)) +
geom_point(aes(x=x,y=y),shape=1,size=3, position = position_dodge(width = 0.5))+
geom_linerange(aes(x=x,ymin=0,ymax=y), position = position_dodge(width =0.5))+
geom_hline(yintercept=0)
I think this would still be difficult with many factors, but with 8 it might suit your purposes.
Here is the code:
set.seed (12)
library(ggplot2)
dat = data.frame(a=runif(40,0,1),b=c('a','b','c','d','e'),c=c('Hi','Hello'))
ggplot(dat,aes(x=b,y=a,shape=factor(c))) + stat_summary(fun.data=mean_cl_normal)
The graph it creates has error bars that overlap, so it is hard to distinguish the limits. I've often seen graphs where the different series (given by the factor c) are slightly shifted horizontally so that the error bars do not overlap. Is there a way to achieve this in R when using a categorical variable on x?
Thank you
You can use something like position_dodge():
ggplot(dat,aes(x=b,y=a,shape=factor(c))) +
stat_summary(fun.data=mean_cl_normal, position=position_dodge(width=0.2))
Example plot:
I'm trying to make a boxplot with ggplot2 using the following code:
p <- ggplot(
data,
aes(d$score, reorder(d$names d$scores, median))
) +
geom_boxplot()
I have factors called names and integers called scores.
My code produces a plot, but the graphic does not depict the boxes (only shows lines) and I get a warning message, "position_dodge requires non-overlapping x intervals." I've tried to adjust the height and width with geom_boxplot(width=5), but this does not seem to fix the problem. Can anyone suggest a possible solution to my problem?
I should point out that my boxplot is rather large (it has about 200 name values on the y-axis). Perhaps this is the problem?
The number of groups is not the problem; I can see the same thing even when there are only 2 groups. The issue is that ggplot2 draws boxplots vertically (continuous along y, categorical along x) and you are trying to draw them horizontally (continuous along x, categorical along y).
Also, your example has several syntax errors and isn't reproducible because we don't have data/d.
Start with some mock data
dat <- data.frame(scores=rnorm(1000,sd=500),
names=sample(LETTERS, 1000, replace=TRUE))
Corrected version of your example code:
ggplot(dat, aes(scores, reorder(names, scores, median))) + geom_boxplot()
These are the horizontal lines you saw.
If you instead put the categorical on the x axis and the continuous on the y you get
ggplot(dat, aes(reorder(names, scores, median), scores)) + geom_boxplot()
Finally, if you want to flip the coordinate axes, you can use coord_flip(). There can be some additional problems with this if you are doing even more sophisticated things, but for basic boxplots it works.
ggplot(dat, aes(reorder(names, scores, median), scores)) +
geom_boxplot() + coord_flip()
In case anyone else arrives here wondering why they're seeing
Warning message:
position_dodge requires non-overlapping x intervals
Why this happens
This happens because some of the boxplots / violin plots (or other plot types) may be overlapping. In many cases you may not care, but in some cases it matters, which is why it warns you.
How to fix it
You have two options. One is to suppress warnings when generating/printing the ggplot.
The other option is simply to alter the width of the plots so that they don't overlap; then the warning goes away. Try altering the width argument to the geom, e.g. geom_boxplot(width = 0.5) (the same works for geom_violin()).
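A minimal sketch of both options together, using hypothetical mock data (the names dat, scores and names are placeholders). Whether width = 0.5 is narrow enough depends on your own data, so treat the value as a starting point.

```r
library(ggplot2)

# Hypothetical mock data: four groups of 25 values each
set.seed(1)
dat <- data.frame(
  scores = rnorm(100),
  names  = rep(LETTERS[1:4], times = 25)
)

# Option 2: narrower boxes so neighbouring boxes do not overlap
p <- ggplot(dat, aes(names, scores)) +
  geom_boxplot(width = 0.5)

# Option 1: silence any warnings raised while the plot is drawn.
# The warning is emitted at print time, so print() is what gets wrapped.
suppressWarnings(print(p))
```

Note that suppressWarnings() hides every warning raised during printing, not only the position_dodge one, so fixing the width is usually the safer choice.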
In addition to @stevec's options, if you're seeing
position_stack requires non-overlapping x intervals
position_fill requires non-overlapping x intervals
position_dodge requires non-overlapping x intervals
position_dodge2 requires non-overlapping x intervals
and if your x variable is supposed to overlap for different aesthetics such as fill, you can try making the x_var into a factor:
geom_bar(aes(x = factor(x_var), fill = type))
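For completeness, a self-contained sketch of the factor approach. The data frame bars and its values are made up for illustration; x_var and type simply follow the names used in the snippet above.

```r
library(ggplot2)

# Hypothetical data: a numeric x with unevenly spaced values
bars <- data.frame(
  x_var = c(1, 1, 2, 2, 2, 5),
  type  = c("a", "b", "a", "a", "b", "b")
)

# Converting x_var to a factor gives each distinct value its own
# discrete position, and fill stacks the types within each position
p <- ggplot(bars) +
  geom_bar(aes(x = factor(x_var), fill = type))

# Number of distinct bar positions on the x axis
n_positions <- length(unique(layer_data(p)$x))
print(p)
```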