Identifying values in an R plot

I have been trying to identify extreme values in an R ggplot2 plot.
Is there any way to have a plot where besides the point (or instead of it) representing the values, it also shows the index? Or any other thing that allows you to quickly identify it?
The closest thing I found was with the identify() function, but it didn't work very well for me.
Any recommendations?
I'll give a basic ggplot plot:
library(ggplot2)
df = data.frame(x = runif(10,0,1), y = runif(10,0,1))
ggplot(df, aes(x,y)) +
geom_point(col="red") + theme_bw()

Update:
I've been trying new things. I finally got exactly what I wanted.
df = data.frame(x = runif(10,0,1), y = runif(10,0,1))
ggplot(df, aes(x,y, label = rownames(df))) +
geom_point() + geom_text() + theme_bw()
Now I can easily identify the values that I want. Hope it helps other people that are new to ggplot.
If anyone knows ways to improve it, feel free to do so.
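With more points, labelling every row can get crowded. A minimal sketch of labelling only the extreme points follows. The definition of "extreme" used here (the two largest y values) and the column names id and is_extreme are assumptions made for illustration, not part of the original post.

```r
library(ggplot2)

set.seed(42)
df <- data.frame(x = runif(10, 0, 1), y = runif(10, 0, 1))
df$id <- rownames(df)

# Assumption: "extreme" means the two largest y values; swap in your own rule.
df$is_extreme <- rank(-df$y) <= 2

p <- ggplot(df, aes(x, y)) +
  geom_point(col = "red") +
  # Only the flagged rows get a label, nudged above the point.
  geom_text(data = df[df$is_extreme, ], aes(label = id), vjust = -0.8) +
  theme_bw()
p
```

Passing a subset as the data argument of geom_text() keeps the points layer untouched while restricting the labels.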

I'd suggest installing the plotly package and then running:
plotly::ggplotly(.Last.value)

Related

Different rendering between choroplethr package and ggplot2

I have a visualisation problem that I can't get my head around. From some data that includes position information, I want to create a map. I found the great choroplethr package, which was an excellent starting point and really helped me understand how to process data for meaningful results. Here is the map the way I would like to have it:
But when I try to replicate the steps (cf. self$render of the choroplethr package) using ggplot2, I get the following result:
Does anyone have an idea where my parameters are wrong/lacking something? Here is the code:
fig <- ggplot(data=merge.shp, aes(x = long, y = lat, group = group)) +
geom_polygon(aes(fill = sqm_cat), na.rm=FALSE, rule="evenodd", position="identity") +
coord_equal() +
scale_fill_brewer("", drop = FALSE, na.value = "black") +
ggplot2::theme_void() +
ggtitle("People per square kilometers")
Edit: I think I found the problem. If I just plot the map for one region
merge.shp %>% filter(plz %in% c("645")) %>%
ggplot(aes(x=long, y=lat, group=group)) + geom_path()
I get the following result:
So everything may be related to the "wrong" connection of coordinates. If I replace geom_path with geom_point there are some reasonable outlines. But how do I translate this to the map?
The solution is actually quite simple. The order is essential for geom_polygon and I was mistakenly assuming my dataframe merge.shp was in an ascending order for all factors, which it wasn't. Introducing
merge.shp = merge.shp[order(merge.shp$order), ]
made it work.
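To see why row order matters, here is a small sketch with a made-up square (the data frame square and its values are hypothetical, not the poster's shapefile). geom_polygon() connects vertices in row order, so shuffled rows draw a self-crossing shape until they are sorted again.

```r
library(ggplot2)

# Hypothetical polygon: four corners of a unit square with a drawing order.
square <- data.frame(long = c(0, 1, 1, 0),
                     lat = c(0, 0, 1, 1),
                     group = 1,
                     order = 1:4)

# Simulate a merge that scrambled the rows: this would draw a "bow tie".
shuffled <- square[c(1, 3, 2, 4), ]

# Sort by group first, then by drawing order within each group.
restored <- shuffled[order(shuffled$group, shuffled$order), ]

p <- ggplot(restored, aes(long, lat, group = group)) +
  geom_polygon(fill = "grey70", colour = "black") +
  coord_equal()
p
```

Sorting by group as well as order is a small precaution for data with several polygons. With a single order column that is unique across the whole data frame, sorting on that column alone has the same effect.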

Can I convert a base plot in R to a ggplot object?

I'm somewhat new to R and I love ggplot. It's all I use for plotting, so I don't know all the archaic syntax needed for base plots in R (and I'd rather not have to learn it). I'm running pROC::roc and I would like to plot the output in ggplot (so I can fine-tune how it looks). I can immediately get a plot as follows:
size <- 100
response <- sample(c(0,1), replace=TRUE, size=size)
predictor <- rnorm(100)
rocobject <- pROC::roc(response, predictor,smooth=T)
plot(rocobject)
To use ggplot instead, I can create a data frame from the output and then use ggplot (this is NOT my question). What I want to know is if I can somehow 'convert' the plot made in the code above into ggplot automatically so that I can then do what I want in ggplot? I've searched all over and I can't seem to find the answer to this 'basic' question. Thanks!!
Better late than never? I think the ggplotify package might do what you want. You basically plug in your plot generating code to the as.ggplot() function like so:
library(ggplotify)
p6 <- as.ggplot(~plot(iris$Sepal.Length, iris$Sepal.Width, col=iris$Species, pch=15))
https://cran.r-project.org/web/packages/ggplotify/vignettes/ggplotify.html
No, I think unfortunately this is not possible.
Even though this does not answer your real question, building it with ggplot is actually not difficult.
Your original plot:
plot(rocobject)
In ggplot:
library(ggplot2)
df<-data.frame(y=unlist(rocobject[1]), x=unlist(rocobject[2]))
ggplot(df, aes(x, y)) + geom_line() + scale_x_reverse() + geom_abline(intercept=1, slope=1, linetype="dashed") + xlab("Specificity") + ylab("sensitivity")
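As a possible shortcut, recent versions of pROC ship a ggroc() function that returns a ggplot object directly. The sketch below assumes a pROC version that includes ggroc() and uses an unsmoothed curve for simplicity. It also shows the named components sensitivities and specificities as an alternative to positional indexing; roc_df is a name chosen here for illustration.

```r
library(ggplot2)
library(pROC)

set.seed(1)
response <- sample(c(0, 1), size = 100, replace = TRUE)
predictor <- rnorm(100)
rocobject <- roc(response, predictor)

# Named access to the curve coordinates, if you prefer to build the plot by hand.
roc_df <- data.frame(specificity = rocobject$specificities,
                     sensitivity = rocobject$sensitivities)

# ggroc() returns a ggplot object, so the usual layers and themes can be added.
p <- ggroc(rocobject) +
  theme_bw() +
  ggtitle("ROC curve")
p
```

Because the result is an ordinary ggplot object, it can be fine-tuned with further scales, labels, and themes.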

ggplot2: setting guide ticks in scale_color_gradient()

I'd like to change x values in the legend below so that they are nice round numbers (e.g. 0.01, 1, 10) instead of the crazy long, seemingly arbitrary decimals. How do I do so? I generated the plot using the code further below. This seems like there should be a really easy option to set, but I googled around and read the help files and for the life of me, I can't figure it out.
I'm currently using ggplot2_1.0.1.
library(ggplot2)
d = data.frame(x = 10^seq(-2,2,length.out=10),y=10^(seq(-2,2,length.out=10)))
g = ggplot(data=d,aes(x=x,y=y,group=x,color=x)) + geom_point()
g = g + scale_color_gradient(trans="log",guide="legend")
print(g)
Just add the breaks argument to scale_color_gradient() to set the desired levels.
ggplot(data=d,aes(x=x,y=y,group=x,color=x)) + geom_point() +
scale_color_gradient(trans="log",guide="legend",breaks=c(0.01,1,10))
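A slightly fuller sketch of the same idea, with one break per decade and explicit labels. The choice of "log10" as the transformation and the variable name legend_breaks are assumptions for illustration. Newer ggplot2 releases also accept transform in place of trans.

```r
library(ggplot2)

d <- data.frame(x = 10^seq(-2, 2, length.out = 10),
                y = 10^seq(-2, 2, length.out = 10))

# One break per decade across the data range.
legend_breaks <- 10^(-2:2)

g <- ggplot(d, aes(x = x, y = y, color = x)) +
  geom_point() +
  scale_color_gradient(trans = "log10",
                       guide = "legend",
                       breaks = legend_breaks,
                       # Explicit labels avoid long decimal expansions.
                       labels = as.character(legend_breaks))
g
```

Supplying labels alongside breaks gives full control over how each legend key is printed.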

Getting counts on bins in a heat map using R

This question follows from these two topics:
How to use stat_bin2d() to compute counts labels in ggplot2?
How to show the numeric cell values in heat map cells in r
In the first topic, a user wants to use stat_bin2d to generate a heat map, and then wants the count of each bin written on top of the heat map. The method the user initially wants to use doesn't work; the best answer states that stat_bin2d is designed to work with geom = "rect" rather than "text". No satisfactory response is given.
The second question is almost identical to the first, with one crucial difference: the variables in the second question are text, not numeric. The answer produces the desired result, placing the count value for a bin over the bin in a stat_bin2d heat map.
To compare the two methods I've prepared the following code:
library(ggplot2)
data <- data.frame(x = rnorm(1000), y = rnorm(1000))
ggplot(data, aes(x = x, y = y)) +
geom_bin2d() +
stat_bin2d(geom="text", aes(label=..count..))
We know this first gives you the error:
"Error: geom_text requires the following missing aesthetics: x, y".
Same issue as in the first question. Interestingly, changing from stat_bin2d to stat_binhex works fine:
library(ggplot2)
data <- data.frame(x = rnorm(1000), y = rnorm(1000))
ggplot(data, aes(x = x, y = y)) +
geom_hex() +
stat_binhex(geom="text", aes(label=..count..))
Which is great and all, but generally I don't think hex binning is very clear, and for my purposes it won't work for the data I'm trying to describe. I really want to use stat_bin2d.
To get this to work, I've prepared the following workaround based on the second answer:
library(ggplot2)
data <- data.frame(x = rnorm(1000), y = rnorm(1000))
# round() treats digits = .1 as 0, so this bins to whole numbers
x_t<-as.character(round(data$x))
y_t<-as.character(round(data$y))
x_x<-as.character(seq(-3,3,1))
y_y<-as.character(seq(-3,3,1))
data<-cbind(data,x_t,y_t)
ggplot(data, aes(x = x_t, y = y_t)) +
geom_bin2d() +
stat_bin2d(geom="text", aes(label=..count..))+
scale_x_discrete(limits =x_x) +
scale_y_discrete(limits=y_y)
This workaround allows one to bin numerical data, but to do so you have to determine the bin width (I did it via rounding) before bringing the data into ggplot. I actually figured it out while writing this question, so I may as well finish.
This is the result: (turns out I can't post images)
So my real question here is: does anyone have a better way to do this? I'm happy I at least got it to work, but so far I haven't seen an answer for putting labels on stat_bin2d bins when using a numerical variable.
Does anyone have a method for passing the x and y arguments on to geom_text from stat_bin2d without having to use a workaround? Can anyone explain why it works with text variables but not with numbers?
Another workaround (but perhaps less work). Similar to the ..count.. method, you can extract the counts from the plot object in two steps.
library(ggplot2)
set.seed(1)
dat <- data.frame(x = rnorm(1000), y = rnorm(1000))
# plot
p <- ggplot(dat, aes(x = x, y = y)) + geom_bin2d()
# Get data - this includes counts and x,y coordinates
newdat <- ggplot_build(p)$data[[1]]
# add in text labels
p + geom_text(data=newdat, aes((xmin + xmax)/2, (ymin + ymax)/2,
label=count), col="white")
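The pre-binning idea from the question can also be kept numeric, which avoids converting values to text. The following is a sketch under assumptions: the bin width of 1 and the names binwidth, x_bin, y_bin, and counts are chosen here for illustration. Counts are computed with base R and then drawn with geom_tile() and geom_text().

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(x = rnorm(1000), y = rnorm(1000))

# Assumption: square bins of width 1; each point is assigned its bin centre.
binwidth <- 1
dat$x_bin <- floor(dat$x / binwidth) * binwidth + binwidth / 2
dat$y_bin <- floor(dat$y / binwidth) * binwidth + binwidth / 2

# Count the points falling in each (x_bin, y_bin) cell.
counts <- aggregate(n ~ x_bin + y_bin, data = transform(dat, n = 1), FUN = sum)

p <- ggplot(counts, aes(x_bin, y_bin)) +
  geom_tile(aes(fill = n)) +
  geom_text(aes(label = n), colour = "white")
p
```

Since the counts live in an ordinary data frame, the tile and text layers share the same x and y columns and no stat has to pass coordinates to geom_text().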

Add rectangles around common values in ggplot

When I make an experimental design, I use ggplot to show the layout. Here's a simple example:
df <- data.frame(Block=rep(1:2, each=18),
Row=rep(1:9, 4),
Col=rep(1:4, each=9),
Treat=sample(c(1:6),replace=F))
Which I'll plot like:
df.p <- ggplot(df, aes(Row, Col)) + geom_tile(aes(fill=as.factor(Treat)))
to give:
Sometimes I have a structure within the design I would like to highlight by putting a box around it, for example a mainplot. In this case:
df$Mainplot <- ceiling(df$Row/3) + 3*(ceiling(df$Col/2) - 1)
I then use geom_rect and some messy code that needs adjusting for each design to generate something like:
Question: How do I add the rectangles around the mainplots in a simple way? It seems like a simple enough problem, but I haven't found an obvious way. I can map colour or some other aesthetic to mainplot, but I can't seem to surround them with a box. Any pointers greatly appreciated.
Here is a possible solution where I create an auxiliary data.frame for plotting borders with geom_rect(). I'm not sure if this is as simple as you would like! I hope the code that computes the rectangle coordinates will be reusable/generalizable with just a bit of additional effort.
library(ggplot2)
# Load example data.
df = data.frame(Block=rep(1:2, each=18),
Row=rep(1:9, 4),
Col=rep(1:4, each=9),
Treat=sample(c(1:6),replace=F))
df$Mainplot = ceiling(df$Row/3) + 3*(ceiling(df$Col/2) - 1)
# Create an auxiliary data.frame for plotting borders.
group_dat = data.frame(Mainplot=sort(unique(df$Mainplot)),
xmin=0, xmax=0, ymin=0, ymax=0)
# Fill data.frame with appropriate values.
for(i in 1:nrow(group_dat)) {
item = group_dat$Mainplot[i]
tmp = df[df$Mainplot == item, ]
group_dat[i, "xmin"] = min(tmp$Row) - 0.5
group_dat[i, "xmax"] = max(tmp$Row) + 0.5
group_dat[i, "ymin"] = min(tmp$Col) - 0.5
group_dat[i, "ymax"] = max(tmp$Col) + 0.5
}
p2 = ggplot() +
geom_tile(data=df, aes(x=Row, y=Col, fill=factor(Treat)),
colour="grey30", size=0.35) +
geom_rect(data=group_dat, aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax),
size=1.4, colour="grey30", fill=NA)
ggsave(filename="plot_2.png", plot=p2, height=3, width=6.5)
Here's a solution that might be easier. Just use geom_tile with alpha set to 0. I didn't take the time to give you an exact solution, but here's an example. To achieve what you want I'm guessing you'll need to actually create a new data frame, which should be easy enough.
df <- data.frame(Block=rep(1:2, each=18),Row=rep(1:9, 4),Col=rep(1:4, each=9),Treat=sample(c(1:6),replace=F))
df$blocking <- rep(sort(rep(1:3,3)),4)
df.p <- ggplot(df, aes(Row, Col)) + geom_tile(aes(fill=as.factor(Treat)))
df.p+ geom_tile(data=df,aes(x=Row,y=blocking),colour="black",fill="white",alpha=0,lwd=1.4)
Setting alpha=0 creates a blank tile, and you can then set the line width using lwd. That's probably easier than specifying all the rectangles. Hope it helps.
I thought it would be worth posting my own (non-ideal) solution, since it seems there's nothing obvious I'm missing. I'm going to leave the question unanswered in the hope someone will come up with something.
At present, I use geom_rect in a fashion that would probably be able to be made general (perhaps into a geom_border addition to ggplot??). For the example in my question, the essential information is that each mainplot is 3 x 2.
Adding onto df.p from the original question, this is what I do currently:
df.p1 <- df.p + geom_rect(aes(xmin=((Mainplot- 3*(ceiling(Col/2)-1) )-1)*3 + 0.5,
xmax=((Mainplot - 3*(ceiling(Col/2)-1))-1)*3 + 3.5,
ymin=ceiling(ceiling(Col/2)/2 + 2*(ceiling(Col/2)-1))-0.5,
ymax=2*ceiling(Col/2)+0.5),
colour="black", fill="transparent",size=1)
Ugly, I know - hence the question. That code generates the second plot from the question. Maybe the best option is building this all into a function.
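One way the rectangle bookkeeping might be wrapped in a function is sketched below. The helper name border_boxes and its group argument are hypothetical, and the sketch assumes each group occupies a contiguous rectangular block of unit-sized tiles, as the mainplots do in this example.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(Block = rep(1:2, each = 18),
                 Row = rep(1:9, 4),
                 Col = rep(1:4, each = 9),
                 Treat = sample(1:6))
df$Mainplot <- ceiling(df$Row / 3) + 3 * (ceiling(df$Col / 2) - 1)

# Hypothetical helper: bounding box of each group, padded by half a tile.
border_boxes <- function(data, group) {
  lo <- aggregate(data[c("Row", "Col")], by = data[group], FUN = min)
  hi <- aggregate(data[c("Row", "Col")], by = data[group], FUN = max)
  data.frame(group = lo[[group]],
             xmin = lo$Row - 0.5, xmax = hi$Row + 0.5,
             ymin = lo$Col - 0.5, ymax = hi$Col + 0.5)
}

boxes <- border_boxes(df, "Mainplot")

p <- ggplot(df, aes(Row, Col)) +
  geom_tile(aes(fill = factor(Treat))) +
  # size follows the thread's usage; newer ggplot2 versions prefer linewidth.
  geom_rect(data = boxes,
            aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
            inherit.aes = FALSE, colour = "black", fill = NA, size = 1)
p
```

Calling border_boxes(df, "Block") instead would outline the two blocks, so the same helper can highlight any grouping column that forms rectangular regions.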
