I want to create a scatterplot in ggplot2 with one or more lines overlaid. Having looked at the documentation for geom_smooth() and geom_line(), it remains unclear to me how I can specify the equations for lines that I want to add to a plot. I understand that this must be very basic, so please feel free simply to point me toward the appropriate documentation that I must have overlooked.
geom_abline() is the name of the geom you're looking for, e.g. geom_abline(aes(intercept=a,slope=b)). There are examples in the online documentation.
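As a minimal sketch of that suggestion (the data and the column names a and b are made up for illustration), straight lines can be supplied as intercept/slope pairs in their own data frame. For an equation that is not a straight line, stat_function() is one option:

```r
library(ggplot2)

set.seed(1)
# Simulated scatter data; the question supplies none
df <- data.frame(x = runif(50, 0, 10))
df$y <- 1 + 0.5 * df$x + rnorm(50)

# One row per line: intercept a, slope b (hypothetical values)
lines_df <- data.frame(a = c(1, 3), b = c(0.5, 0.2))

p <- ggplot(df, aes(x, y)) +
  geom_point() +
  # straight lines y = a + b * x
  geom_abline(data = lines_df, aes(intercept = a, slope = b), colour = "blue") +
  # an arbitrary curve given as a function of x
  stat_function(fun = function(x) 2 + 0.05 * x^2, colour = "red")

print(p)
```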
For example, if I were to plot a trendline in Excel, I could choose to include its formula on the chart; it would display something like myY = 0.5 + 0.5 * myX.
R has quite advanced functionality for creating sophisticated mathematical expressions, so I assume this is fairly straightforward to do. I'm just not sure how to do it.
I like to do this in the legend, so here's how I do that:
plot(myX,myY);fit=lm(myY~myX);abline(fit);fit2=round(coef(fit),3)
legend('bottomright',lty=1,legend=paste('myY =',fit2[1],if(fit2[2]>0)'+'else'-',abs(fit2[2]),'myX'))
Using this dataset, here's how it looks:
You may have to change 'bottomright' depending on where you have a free corner in the scatterplot, or you may need to supply coordinates to put it outside the box if you don't have any free room for the legend inside the plot. You may also prefer to use ggplot2, in which case the code would be different...
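The sign handling in the legend label can be wrapped in a small helper. This is only a sketch: format_line_eq is a hypothetical name, not an existing R function, and the data are simulated because the original dataset is not shown.

```r
# Hypothetical helper: build "y = a + b x" text from two coefficients
format_line_eq <- function(coefs, y_name = "myY", x_name = "myX", digits = 3) {
  b <- round(unname(coefs), digits)
  paste(y_name, "=", b[1], if (b[2] >= 0) "+" else "-", abs(b[2]), x_name)
}

set.seed(1)
myX <- 1:20
myY <- 0.5 + 0.5 * myX + rnorm(20, sd = 0.3)  # simulated data

fit <- lm(myY ~ myX)
plot(myX, myY)
abline(fit)
legend("bottomright", lty = 1, legend = format_line_eq(coef(fit)))
```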
I'd like to achieve what this person has achieved without using ggplot. Any ideas?
How do I create a continuous density heatmap of 2D scatter data in R?
You can see what I get when using the solution detailed in that question.
ggplot(df,aes(x=x,y=y))+
stat_density2d(aes(alpha=..level..), geom="polygon") +
scale_alpha_continuous(limits=c(0,1),breaks=seq(0,1,by=0.1))+
geom_point(colour="red",alpha=0.2)+
theme_bw()
The heatmap is so sparse. I want it to cover much more than what it is covering now. It's terribly hard to see anything about the density. Any ideas of different ways to make density heatmaps from 2D data besides this ggplot solution?
One idea I had was instead of using linear color labeling (see the black to white spectrum on the left, which is linear), using logarithmic scale for the density labeling. Any ideas how I could do this?
"The heatmap is so sparse. I want it to cover much more than what it is covering now. It's terribly hard to see anything about the density."
Please be specific: what do you want to see in areas with most or all NAs?
If you use geom_point with alpha-blending and position_jitter, the current plot is as good as it gets.
If you want some solid color, then use geom_hex(); see http://mfcovington.github.io/r_club/solutions/2013/02/28/peer-produced-plots-solutions/ for code. Then experiment with the continuous color scale; you probably want a nonlinear transform. Post your revised attempt if you want a critique.
I actually ended up using smoothScatter, which works well and uses classic R plotting.
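A minimal sketch of the smoothScatter approach, using simulated data since the original data frame is not shown. The transformation argument controls how density maps to color. The fourth root below is the function's documented default and is written out here only to show where a different nonlinear scale could be plugged in.

```r
set.seed(42)
# Simulated 2D data: a broad cloud plus a tighter cluster
x <- c(rnorm(5000), rnorm(2000, mean = 3, sd = 0.5))
y <- c(rnorm(5000), rnorm(2000, mean = 2, sd = 0.5))

# Nonlinear mapping from density to color so sparse regions stay visible
dens_transform <- function(d) d^0.25

smoothScatter(x, y,
              nbin = 128,                      # grid resolution for the density estimate
              colramp = colorRampPalette(c("white", "black")),
              transformation = dens_transform,
              nrpoints = 100)                  # plot the 100 most isolated points as dots
```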
This is a repeat of a question originally asked here: Indicating the statistically significant difference in bar graph, but asked for R instead of Python.
My question is very simple. I want to produce barplots in R, using ggplot2 if possible, with an indication of significant difference between the different bars, e.g. produce something like this. I have had a search around but can't find another question asking exactly the same thing.
I know that this is an old question and the answer by Didzis Elferts already provides one solution for the problem. But I recently created a ggplot-extension that simplifies the whole process of adding significance bars: ggsignif
Instead of tediously adding the geom_path and annotate to your plot you just add a single layer geom_signif:
library(ggplot2)
library(ggsignif)
ggplot(iris, aes(x=Species, y=Sepal.Length)) +
geom_boxplot() +
geom_signif(comparisons = list(c("versicolor", "virginica")),
map_signif_level=TRUE)
Full documentation of the package is available at CRAN.
You can use geom_path() and annotate() to get a similar result. For this example you have to determine suitable positions yourself. In geom_path(), four coordinates are provided for each bracket to get those small ticks at the ends of the connecting lines.
df<-data.frame(group=c("A","B","C","D"),numb=c(12,24,36,48))
g<-ggplot(df,aes(group,numb))+geom_bar(stat="identity")
g+geom_path(x=c(1,1,2,2),y=c(25,26,26,25))+
geom_path(x=c(2,2,3,3),y=c(37,38,38,37))+
geom_path(x=c(3,3,4,4),y=c(49,50,50,49))+
annotate("text",x=1.5,y=27,label="p=0.012")+
annotate("text",x=2.5,y=39,label="p<0.0001")+
annotate("text",x=3.5,y=51,label="p<0.0001")
I used the method suggested above, but I found the annotate function easier than geom_path for drawing the lines. Just use "segment" instead of "text". You have to break each bracket into segments and define starting and ending x and y values for each line segment.
Example of making three line segments:
annotate("segment", x=c(1,1,2),xend=c(1,2,2), y= c(125,130,130), yend=c(130,130,125))
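For context, here is a sketch of the same idea as a complete plot. It reuses the four-bar data frame from the geom_path() answer and draws a single bracket; the y positions are chosen by eye for that data.

```r
library(ggplot2)

df <- data.frame(group = c("A", "B", "C", "D"), numb = c(12, 24, 36, 48))

p <- ggplot(df, aes(group, numb)) +
  geom_col() +
  # three segments: left tick, horizontal bar, right tick
  annotate("segment",
           x = c(1, 1, 2), xend = c(1, 2, 2),
           y = c(25, 26, 26), yend = c(26, 26, 25)) +
  annotate("text", x = 1.5, y = 27, label = "p=0.012")

print(p)
```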
Is there a function in R that does the same job as Matlab's "bar" function?
R does have a "barplot" function in the graphics package; however, it is not the same.
The Matlab bar(X,Y) (verbatim excerpt from MATLAB documentation) "draws a bar for each element in Y at locations specified in X, where X is a vector defining the x-axis intervals for the vertical bars." (emphasis mine)
However, the R barplot function does not allow one to specify locations.
Perhaps there is a method in ggplot2 that supports this? I am only able to find standard bar charts in ggplot2.
No, barplot is not the same as bar, but you should read the whole help. You can do many things to position the bars. The first is simply their order in Y. You could insert spaces if you wish (additional 0s). If you have X and Y then sort Y on X (Y[order(X)]) and plot it. If you need to change positions use the "space" and "width" arguments. It's not as straightforward as specifying X values I suppose but it's definitely more useful in most situations. Generally what you want to adjust is widths of bars and spaces between bars. Their position on the X-axis should be arbitrary. If the position on the X-axis is really meaningful then you should be using line plots, not bar graphs.
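To make the "space" and "width" suggestion concrete, here is a sketch that derives the gaps from a vector of desired bar centers. It assumes equal-width bars that do not overlap, and three or more bars, since a two-element space has a separate meaning for grouped bars. The values of x, y, and w are illustrative.

```r
x <- c(1, 2, 4, 5, 6)   # desired bar centers
y <- c(3, 4, 3, 4, 2)   # bar heights
w <- 0.8                # bar width in x units (arbitrary choice)

# 'space' is the gap left before each bar, expressed as a fraction of the
# mean bar width, so the gaps follow from the distances between centers.
gaps <- c(x[1] - w / 2, diff(x) - w)

mids <- barplot(y, width = w, space = gaps / w, xlim = c(0, max(x) + w))
axis(1, at = x)         # numeric x-axis at the bar centers
```

barplot() returns the bar midpoints, so comparing mids against x is a quick check that the bars landed where intended.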
In R:
barplot(rbind(1:10, 2:11), beside=T, names.arg=1:10)
In MATLAB:
>> bar(1:10, [(1:10)' (2:11)'])
Read up on par. Then observe, for example:
x<-c(1,2,4,5,6)
y<-c(3,4,3,4,2)
plot(x,y,type='h',lwd=6)
Edit: yes, I know this doesn't (yet) plot multiple data sets, but I would hope you can see simple ways to make that happen, with spacings, colors, etc. specified to your exact liking :-)
Sounds vaguely like the R stepfun. On the other hand, one would need to know what "draws a bar" means before saying it is not the same as barplot(..., horiz=TRUE). One would, of course, need to examine more detailed evidence, such as data and plots, before arriving at a conclusion. @John Colby should be congratulated for adding some specificity to the discussion. The axis function is probably what Quant Guy needs education regarding.
I'm creating a plot and adding a basic loess smooth line to it.
qplot(Age.GTS2004., X.d18O,data=deepsea, geom=c('point')) +
geom_smooth(method="loess",se=T,span=0.01, alpha=.5, fill='light blue',color='navy')
The problem is that the line is coming out really choppy. I need more evaluation points for the curve in certain areas. Is there a way to increase the number of evaluation points without having to reconstruct geom_smooth?
Use the n parameter, as documented in stat_smooth.
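A sketch of passing n through geom_smooth(), using simulated data in place of the deepsea data frame, which is not shown. The column names age and d18O are stand-ins, and the span value is illustrative. The stat_smooth default is 80 evaluation points.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for the deepsea data
d <- data.frame(age = sort(runif(400, 0, 60)))
d$d18O <- sin(d$age / 3) + rnorm(400, sd = 0.2)

p <- ggplot(d, aes(age, d18O)) +
  geom_point() +
  # n is forwarded to stat_smooth: evaluate the fit at 500 points
  geom_smooth(method = "loess", formula = y ~ x, se = TRUE,
              span = 0.1, n = 500,
              alpha = 0.5, fill = "lightblue", colour = "navy")

print(p)
```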
Hadley: The documentation leads people astray. geom_smooth does not document that it accepts parameters on behalf of stat_smooth, nor is there any link on that page to stat_smooth for continued reading.
I figured the parameter was buried on some other help page, but I landed here to find out where.