R: point at which geom_smooth drops below a certain value - r

Hi stack overflow community,
I hope the two interrelated questions I am asking are not too nooby. I tried several google searches but could not find a solution.
I use R to plot the findings of a linguistic "experiment" in which I checked to what extent two grammatical constructions yield acceptable descriptions of an event, depending on how far it has unfolded. My data look similar to this:
event,PFV.alone,PFV.and.PART
0.01,0,1
0.01,0,1
0.05,0,1
0.05,0,1
0.05,0,1
0.1,0,1
0.1,0,1
0.25,0,1
0.25,0,1
0.25,0,1
0.3,0,1
0.3,0,1
0.33,0,1
0.33,0,1
0.33,0,1
0.33,0,1
....
0.67,1,0.5
0.75,1,0.5
0.75,1,0
0.75,1,0
0.75,1,0
0.8,1,1
0.8,1,0
0.8,1,0
0.8,1,0
0.85,1,1
0.85,1,0
0.9,1,0
0.9,1,0
0.9,1,0
0.95,1,0
As you can see, for each of the two constructions there are "plateaus" where acceptability is 0 or 1 and then there's a "transitional" area. In order to illustrate the "plateaus" I use geom_segment and to create a smooth "transition" for the scattered data in between, I use geom_smooth. Here's my code:
#after loading datafile into "Daten":
p <- ggplot(data = Daten,
aes(x=event, y=PFV.and.PART, xmin=0, ymin=0, xmax=1, ymax=1))
p + geom_blank() +
coord_fixed()+
xlab("Progress of the event") +
ylab("Acceptability") +
geom_segment(x=0, xend=1, y=0.5,yend=0.5, linetype="dotted") +
geom_smooth(data=(subset(Daten, event==0.33 | event ==0.9)),
aes(color="chocolate"),
method="loess", fullrange=FALSE, level=0.95, se=FALSE) +
geom_segment(x=0,xend=0.33,y=1,yend=1, color="chocolate", size=1) +
geom_segment(x=0.9,xend=1,y=0,yend=0, color="chocolate", size=1) +
geom_smooth(data=(subset(Daten, event==0.33 | event==0.67)),
aes(x = event, y = PFV.alone, color="cyan4"),
method="lm",fullrange=FALSE, level=0.95, se=FALSE) +
geom_segment(color="cyan4",x=0,xend=0.33,y=0,yend=0,size=1) +
geom_segment(color="cyan4", x=0.67,xend=1,y=1,yend=1, size=1) +
scale_x_continuous(labels = scales::percent) +
scale_y_continuous(breaks = c (0,0.5,1), labels = scales::percent)+
labs(color='Construction')+
scale_color_manual(labels = c("PFV + PART", "PFV alone"),
values = c("chocolate", "cyan4")) +
theme(legend.position=c(0.05, 0.8),
legend.justification = c("left", "top"),
legend.background = element_rect(fill = "darkgray"))
This code produces a nice graph, but there's one calculation and one plot-related issue that I need help with.
First, and most importantly, I'd like to find out at what point exactly the geom_smooth (loess) curve for "PFV.and.PART" drops down to 0.5, i.e. hits 50% acceptability. I fear that this might involve some quite complex code?
Related to the preceding point, I'd like to mark the area/line where both curves are above 0.5 (50% acceptability), or, to speak in terms of what I am trying to show: the percentages of the event at which both constructions yield a description that is at least 50% acceptable. This, of course, would be based on point 1, as it is necessary to determine the right limit, whereas the left limit does not constitute a problem, as it seems to lie at x=0.5, y=0.5.
I'd really appreciate any help and I hope that I have provided all the necessary information. Please excuse me if this question has been addressed elsewhere.

Here's one approach, which involves fitting a loess model outside of ggplot:
library(tidyverse) # provides tibble(), add_column(), %>% and ggplot2
# Generate some data
set.seed(2019)
my_dat <- c(sample(c(1,0.5, 0),33, prob = c(0.85,0.15,0), replace = TRUE),
sample(c(1,0.5, 0),33, prob = c(0.1,0.7,0.1), replace = TRUE),
sample(c(1,0.5,0),34, prob = c(0,0.15,0.85), replace = TRUE))
df <- tibble(x = 1:100, y = my_dat)
# fit a loess model
m1 <- loess(y~x, data = df)
df <- df %>%
add_column(pred = predict(m1)) # predict using the loess model
# plot
df %>%
ggplot(aes(x,y))+
geom_point() +
geom_line(aes(y = pred))
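# search for a value of x that gives a prediction of 0.5
# (predict(m1)[x] indexes the fitted values, so a non-integer x is truncated to a whole row number)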
f <- function(x) { 0.5 - predict(m1)[x]}
uniroot(f, interval = c(1, 100))
# $root
# [1] 53.99997
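Since the function above looks up fitted values by row index, the root it finds is tied to whole-numbered rows. A possible variation, sketched here on simulated data of the same kind (base R only; the object name crossing is just illustrative), evaluates the loess fit at arbitrary x through the newdata argument of predict, so uniroot can search along the continuous curve:

```r
# Simulated data in the same spirit as the answer above (not the asker's data)
set.seed(2019)
df <- data.frame(
  x = 1:100,
  y = c(sample(c(1, 0.5, 0), 33, prob = c(0.85, 0.15, 0), replace = TRUE),
        sample(c(1, 0.5, 0), 33, prob = c(0.1, 0.7, 0.1), replace = TRUE),
        sample(c(1, 0.5, 0), 34, prob = c(0, 0.15, 0.85), replace = TRUE))
)

# Fit the loess model outside of ggplot
m1 <- loess(y ~ x, data = df)

# Distance of the fitted curve from 0.5 at any x, not only at observed rows
f <- function(x) predict(m1, newdata = data.frame(x = x)) - 0.5

# x position where the fitted curve crosses 0.5
crossing <- uniroot(f, interval = c(1, 100), tol = 1e-6)$root
```

The resulting value could then be used in the plot, for example as the xintercept of a geom_vline or as the right-hand limit of a shaded region.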

Related

Looking for a way to plot the following correlation data using ggplot

I'm trying to visualize the following data and would appreciate some advice. Basically I ran a bunch of correlations and want to visualize if variable A or variable B is more strongly correlated with Height, Weight, Volume, etc.
variable <- c('A','B','A','B','A','B')
outcome <- c('Height','Height','Weight', 'Weight', 'Volume', 'Volume')
correlation_coeff <- c(0.76, 0.65, 0.77,0.56,0.91,-0.34)
p_value<- c(0.04,0.03,0.01,0.02,0.001,0.09)
data <- data.frame(variable, outcome, correlation_coeff, p_value)
Since this is not a matrix of correlation coefficients (I never looked at the correlation between Height and Weight, for example) I'm not really sure what to do. Normally I just use the ggcorrplot() function but in this case it's obviously not going to work. Any ideas?
You could plot the correlation plot directly with geom_tile, which closely resembles the look of ggcorrplot.
You can optionally have the p values overlaid:
ggplot(data, aes(variable, outcome, fill = correlation_coeff)) +
geom_tile(color = "black") +
geom_text(aes(label = paste("p =", p_value)), size = 10) +
scale_fill_gradientn(colors = c("blue", "white", "red"), limits = c(-1, 1)) +
theme_minimal() +
theme(axis.title = element_blank(),
axis.text = element_text(size = 16)) +
labs(fill = "Correlation coefficient") +
coord_equal()
ggcorrplot() takes a matrix as an input, so you just need to turn your data into a matrix:
variable <- c('A','B')
outcome <- c('Height','Weight', 'Volume')
correlation_coeff <- c(0.76, 0.65, 0.77,0.56,0.91,-0.34)
p_value<- c(0.04,0.03,0.01,0.02,0.001,0.09)
data <- matrix(correlation_coeff, nrow = 2, ncol = 3, dimnames = list(variable,outcome))
library(ggcorrplot)
ggcorrplot(data)
That should make a 2x3 correlation chart in the style you're looking for.
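The matrix() call above relies on the coefficients being listed in exactly the right order. As a hedged alternative, the matrix can be built from the long data frame by name, so the row order no longer matters. This is only a sketch in base R; corr_mat is an illustrative name:

```r
variable <- c("A", "B", "A", "B", "A", "B")
outcome <- c("Height", "Height", "Weight", "Weight", "Volume", "Volume")
correlation_coeff <- c(0.76, 0.65, 0.77, 0.56, 0.91, -0.34)
data <- data.frame(variable, outcome, correlation_coeff)

# One cell per (variable, outcome) pair; each pair occurs exactly once,
# so taking the first element simply copies the value into the matrix
corr_mat <- with(data, tapply(correlation_coeff,
                              list(variable, outcome),
                              function(v) v[1]))
```

Note that tapply sorts the dimension names, so the columns come out as Height, Volume, Weight. The result is an ordinary numeric matrix and could be passed to ggcorrplot() in the same way as the matrix in the answer above.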

How to split 300 months into groups of 60 for ggplot2 x-axis readability in r

I have been searching for 2 or 3 days now trying to find a resolution for my problem without success. I apologize if this is really easy or is already out there, but I can't find it. I have 3 data frames that can vary in length from 1 to 300. It is only possible to display about 60 values along the ggplot x-axis without it becoming unreadable and I don't want to omit values, so I am trying to find a way to calculate how long each data frame is and split it into "x" plots of no more than 60 each.
So far I have tried: facet_grid, facet_wrap, transform and split. "Split" works ok for splitting the data, but it adds an "X1." "X2." ... "Xn." to the front of the variable names (where n is the number of partitions it broke the data into). So when I call ggplot2, it can't find my original variable names ("Cost" and "Month") because they look like X1.Cost X1.Month, X2.Cost etc...How do I fix this?
I'm open to any suggestions, especially if I can fix both issues (not hard coding into 60 rows at a time AND breaking into graphs with smaller x-axis ranges). Thanks in advance for your patience and help.
Stephanie (desperate grad student)
Here is some stub code:
```{r setup, include=FALSE}
xsz <- 60 # would like not to have to hardcode this
ix1 <- seq(1:102) # would like to break into 2 or 3 approx equal graphs #
fcost <- sample(0:200, 102)
f.df <- data.frame("Cost" = fcost, "Month" = ix1)
fn <- nrow(f.df)
fr <- rep(1:ceiling(fn/xsz),each=xsz)[1:fn]
fd <- split(f.df,fr)
fc <- numeric(length(fd))
for (i in 1:length(fd)){
print(ggplot(as.data.frame(fd[i]), aes(Month, Cost)) +
geom_line(colour = "darkred", size = .5) +
geom_point(colour = "red", size = 1) +
labs(x = "Projected Future Costs (monthly)", y = "Dollars") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5)) +
theme(axis.text.x = element_text(angle = 60, vjust = .6)))
}
```
When I run it, I get:
Error in eval(expr, envir, enclos) : object 'Month' not found
When I do:
names(as.data.frame(fd[1]))
I get:
[1] "X1.Cost" "X1.Month"
Use [[]] for lists.
print(ggplot(as.data.frame(fd[[i]]), aes(Month, Cost)) +
To answer your other question, you have to create a new variable with a plot number. Here I'm using rep.
f.df$plot_number <- rep(1:ceiling(nrow(f.df)/60), each=60, len=nrow(f.df)) # ceiling, not round, so a partial last chunk gets its own plot number
Then, you create a list of plots in a loop
plots <- list() # new empty list
for (i in unique(f.df$plot_number)) {
p = ggplot(f.df[f.df$plot_number==i,], aes(Month, Cost)) +
geom_line(colour = "darkred", size = .5) +
geom_point(colour = "red", size = 1) +
labs(x = "Projected Future Costs (monthly)", y = "Dollars") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5)) +
theme(axis.text.x = element_text(angle = 60, vjust = .6))
plots[[paste0("p",i)]] <- p # add each plot into plot list
}
With package gridExtra, you can then arrange your plots in a single one.
library(gridExtra)
do.call("grid.arrange", c(plots, ncol=1))
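The question also asked for roughly equal-sized graphs rather than a full chunk of 60 followed by a short remainder. One way this might be done, assuming 60 is kept only as an upper bound (max_per_plot and chunks are illustrative names, and the Cost values are made up), is to derive the number of plots first and then spread the rows evenly across them:

```r
max_per_plot <- 60                         # upper bound on points per plot
f.df <- data.frame(Cost = (1:102) * 2,     # made-up costs for illustration
                   Month = 1:102)

n <- nrow(f.df)
n_plots <- ceiling(n / max_per_plot)       # fewest plots that respect the bound

# Assign rows 1..n to plots 1..n_plots in near-equal blocks
f.df$plot_number <- ceiling(seq_len(n) * n_plots / n)

# split() on a data frame keeps the original column names in each piece
chunks <- split(f.df, f.df$plot_number)
sapply(chunks, nrow)
```

Each element of chunks can be accessed with [[ ]] and plotted with the original Month and Cost names.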

How to get a really periodic polar surface plot with ggplot

Sample data:
mydata="theta,rho,value
0,0.8400000,0.0000000
40,0.8400000,0.4938922
80,0.8400000,0.7581434
120,0.8400000,0.6675656
160,0.8400000,0.2616592
200,0.8400000,-0.2616592
240,0.8400000,-0.6675656
280,0.8400000,-0.7581434
320,0.8400000,-0.4938922
360,0.8400000,0.0000000
0,0.8577778,0.0000000
40,0.8577778,0.5152213
80,0.8577778,0.7908852
120,0.8577778,0.6963957
160,0.8577778,0.2729566
200,0.8577778,-0.2729566
240,0.8577778,-0.6963957
280,0.8577778,-0.7908852
320,0.8577778,-0.5152213
360,0.8577778,0.0000000
0,0.8755556,0.0000000
40,0.8755556,0.5367990
80,0.8755556,0.8240077
120,0.8755556,0.7255612
160,0.8755556,0.2843886
200,0.8755556,-0.2843886
240,0.8755556,-0.7255612
280,0.8755556,-0.8240077
320,0.8755556,-0.5367990
360,0.8755556,0.0000000
0,0.8933333,0.0000000
40,0.8933333,0.5588192
80,0.8933333,0.8578097
120,0.8933333,0.7553246
160,0.8933333,0.2960542
200,0.8933333,-0.2960542
240,0.8933333,-0.7553246
280,0.8933333,-0.8578097
320,0.8933333,-0.5588192
360,0.8933333,0.0000000
0,0.9111111,0.0000000
40,0.9111111,0.5812822
80,0.9111111,0.8922910
120,0.9111111,0.7856862
160,0.9111111,0.3079544
200,0.9111111,-0.3079544
240,0.9111111,-0.7856862
280,0.9111111,-0.8922910
320,0.9111111,-0.5812822
360,0.9111111,0.0000000
0,0.9288889,0.0000000
40,0.9288889,0.6041876
80,0.9288889,0.9274519
120,0.9288889,0.8166465
160,0.9288889,0.3200901
200,0.9288889,-0.3200901
240,0.9288889,-0.8166465
280,0.9288889,-0.9274519
320,0.9288889,-0.6041876
360,0.9288889,0.0000000
0,0.9466667,0.0000000
40,0.9466667,0.6275358
80,0.9466667,0.9632921
120,0.9466667,0.8482046
160,0.9466667,0.3324593
200,0.9466667,-0.3324593
240,0.9466667,-0.8482046
280,0.9466667,-0.9632921
320,0.9466667,-0.6275358
360,0.9466667,0.0000000
0,0.9644444,0.0000000
40,0.9644444,0.6512897
80,0.9644444,0.9997554
120,0.9644444,0.8803115
160,0.9644444,0.3450427
200,0.9644444,-0.3450427
240,0.9644444,-0.8803115
280,0.9644444,-0.9997554
320,0.9644444,-0.6512897
360,0.9644444,0.0000000
0,0.9822222,0.0000000
40,0.9822222,0.6751215
80,0.9822222,1.0363380
120,0.9822222,0.9125230
160,0.9822222,0.3576658
200,0.9822222,-0.3576658
240,0.9822222,-0.9125230
280,0.9822222,-1.0363380
320,0.9822222,-0.6751215
360,0.9822222,0.0000000
0,1.0000000,0.0000000
40,1.0000000,0.6989533
80,1.0000000,1.0729200
120,1.0000000,0.9447346
160,1.0000000,0.3702890
200,1.0000000,-0.3702890
240,1.0000000,-0.9447346
280,1.0000000,-1.0729200
320,1.0000000,-0.6989533
360,1.0000000,0.0000000"
read in a data frame:
foobar <- read.csv(text = mydata)
You can check (if you really want to!) that the data are periodic in the theta direction, i.e., for each given rho, the values at theta=0 and theta=360 are precisely the same. I would like to plot a nice polar surface plot, in other words an annulus colored according to value. I tried the following:
library(viridis) # just because I very much like viridis: if you don't want to install it, just comment this line and uncomment the scale_fill_distiller line
library(ggplot2)
p <- ggplot(data = foobar, aes(x = theta, y = rho, fill = value)) +
geom_tile() +
coord_polar(theta = "x") +
scale_x_continuous(breaks = seq(0, 360, by = 45), limits=c(0,360)) +
scale_y_continuous(limits = c(0, 1)) +
# scale_fill_distiller(palette = "Oranges")
scale_fill_viridis(option = "plasma")
I'm getting:
Yuck! Why the nasty hole in the annulus? If I generate a foobar data frame with more rows (more theta and rho values) the hole gets smaller. This isn't a viable solution, both because computing data at more rho/theta values is costly and time-consuming, and because even with 100x100=10^4 rows I still get a hole. Also, with a bigger data frame, ggplot takes forever to render the plot: the combination of geom_tile and coord_polar is incredibly inefficient. Isn't there a way to get a nice-looking polar plot without unnecessarily wasting memory & CPU time?
Edit: all rows with theta=360 were removed (they repeat the values at theta=0)
ggplot(data = foobar, aes(x = theta, y = rho, fill = value)) +
geom_tile() +
coord_polar(theta = "x",start=-pi/9) +
scale_y_continuous(limits = c(0, 1))+
scale_x_continuous(breaks = seq(0, 360, by = 45))
I just removed limits from scale_x_continuous
That gives me:
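The edit mentions removing the theta=360 rows but does not show how. A minimal sketch of that step, using a small made-up stand-in for foobar (the value column here is invented and does not reproduce the original numbers; foobar_open is an illustrative name):

```r
# Stand-in data: theta every 40 degrees, including the duplicate at 360
foobar <- expand.grid(theta = seq(0, 360, by = 40),
                      rho = c(0.84, 0.92, 1))
foobar$value <- foobar$rho * sin(foobar$theta * pi / 180)  # invented values

# Drop the rows at theta = 360, which repeat the rows at theta = 0
foobar_open <- foobar[foobar$theta != 360, ]
```

The reduced data frame would then be used as the data argument of the ggplot call above.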

Density graph using ggplot2

I created a craps simulator and generated some data as a result. I'm looking at the probability of wins vs losses given the number of rolls. Here's the first 25 results of my data, the rest looks exactly the same (just 50,000 rows long):
Here's the code I'm using to create a density graph with my data:
ggplot(df, aes(x=rollCount, fill = winOrLoss)) +
#geom_histogram(binwidth = 1, position = "identity", alpha=0.6) +
scale_y_sqrt() +
scale_x_continuous(limits=c(1, 32), breaks=1:32) +
labs(title="Roll Count Histogram", x="Roll Count", y="Count") +
geom_hline(aes(yintercept=0)) +
geom_density()
and here's the resulting graph:
My hope is that the density graph looked something like this:
My main question is how I can get it to be much smoother instead of the up-and-down shape it currently has. Do I need to do something to my data before I put it in the graph? I just put it into a data frame with df <- data.frame(rollCount, winOrLoss) and let ggplot take care of the rest.
You have a discrete distribution. stat_density assumes a continuous distribution. Use geom_histogram instead:
set.seed(42)
rollCount <- sample(1:20, 50, TRUE, prob = 20:1)
winOrLoss <- sample(c("W", "L"), 50, TRUE)
DF <- data.frame(rollCount, winOrLoss)
library(ggplot2)
ggplot(DF, aes(x=rollCount, fill = winOrLoss)) +
geom_histogram(binwidth = 1, position = "identity", alpha=0.6,
aes(y = ..density..))
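Because the roll counts are discrete, the same information can be tabulated directly as per-group proportions. The following base R sketch uses the simulated data from the answer above (pmf is an illustrative name); with a bin width of 1, these proportions should correspond to the bar heights of the density-scaled histogram:

```r
set.seed(42)
rollCount <- sample(1:20, 50, TRUE, prob = 20:1)
winOrLoss <- sample(c("W", "L"), 50, TRUE)

# Counts per outcome and roll count, rescaled so each outcome's row sums to 1
pmf <- prop.table(table(winOrLoss, rollCount), margin = 1)
round(pmf, 2)
```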

ggplot2 and regression lines and R^2 values

I know there have been a number of entries with regards to adding R^2 values to plots, but I am having trouble following the codes. I am graphing a scatter plot with three categories. I have added a linear regression line for each one. I would now like to add r^2 values for each but I can't figure out how to do this.
My code:
veg <- read.csv("latandwtall2.csv", header=TRUE)
library("ggplot2")
a <- ggplot(veg, aes(x=avglat, y=wtfi, color=genus)) + geom_point(shape=19, size=4)
b <- a + scale_colour_hue(l=50) + stat_smooth(method = "lm", formula = y ~ x, size = 1, se = FALSE)
c <- b + labs(x="Latitude", y="Weight (g)")
d <- c + theme_bw()
e <- d + theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank())
#changes size of text
f <- e + theme(
axis.title.x = element_text(color="black", vjust=-0.35, size=15, face="bold"),
axis.title.y = element_text(color="black" , vjust=0.35, size=15, face="bold")
)
g <- f + theme(legend.key=element_rect(fill='white')) # build on f so the axis title settings are kept
g
Any help with how to add R^2 values would be greatly appreciated. Thanks!
If you build a data frame with the r-squared values, you might be able to (mostly) automate the positioning of the annotation text by including it as a call to geom_text.
Here's a toy example. The rsq data frame is used in geom_text to place the r-squared labels. In this case, I've set it up to put the labels just after the highest x-value and the predict function gets the y-value. It's probably too much work for a single plot, but if you're doing this a lot, you can turn it into a function so that you don't have to repeat the set-up code every time, and maybe add some fancier logic to make label placement more flexible:
library(reshape2) # For melt function
# Fake data
set.seed(12)
x = runif(100, 0, 10)
dat = data.frame(x, y1 = 2*x + 3 + rnorm(100, 0, 5),
y2 = 4*x + 20 + rnorm(100, 0, 10))
dat.m = melt(dat, id.var="x")
# linear models
my1 = lm(y1 ~ x, data=dat)
my2 = lm(y2 ~ x, data=dat)
# Data frame for adding r-squared values to plot
rsq = data.frame(model=c("y1","y2"),
r2=c(summary(my1)$adj.r.squared,
summary(my2)$adj.r.squared),
x=max(dat$x),
y=c(predict(my1, newdata=data.frame(x=max(dat$x))),
predict(my2, newdata=data.frame(x=max(dat$x)))))
ggplot() +
geom_point(data=dat.m, aes(x, value, colour=variable)) +
geom_smooth(data=dat.m, aes(x, value, colour=variable),
method="lm", se=FALSE) +
geom_text(data=rsq, aes(label=paste("r^2 == ", round(r2,2)),
x=1.05*x, y=y, colour=model, hjust=0.5),
size=4.5, parse=TRUE)
I can't really reproduce what you're doing but you need to use annotate()
Something that could work (putting the R2 on the 10th point) would be:
R2 = 0.4
i = 10
text = paste("R-squared = ", R2, sep="")
g = g + annotate("text", x=veg$avglat[i], y=veg$wtfi[i], label=text, family="Calibri", colour="red", vjust = -2, hjust = 1)
Use vjust and hjust to adjust the position of the text to the point (change the i), and just fill the variable R2 with your computed rsquared. You can choose the point you like or manually enter the x,y coordinate it's up to you. Does that help ?
PS. I put extra parameters (font, colours) so that you have the flexibility to change them.
Build the model separately, get the R^2 from there, and add it to the plot. I'll give you some dummy code, but it would be of better quality if you had given us a sample data frame.
r2 = summary(lm(wtfi ~ avglat, data=veg))$r.squared
#to piggyback on Romain's code...
i=10
g = g + annotate("text", x=veg$avglat[i], y=veg$wtfi[i], label=round(r2,2), family="Calibri", colour="red", vjust = -2, hjust = 1)
The way I wrote it here you don't need to hard-code the R^2 value in.
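Since the original plot has one regression line per genus, a single overall R^2 may not be what is wanted. A hedged sketch of computing one value per group in base R follows; the data are fake, the column names are borrowed from the question, and r2_by_genus and r2_labels are illustrative names:

```r
# Fake data with the question's column names (not the real CSV)
set.seed(1)
veg <- data.frame(
  genus = rep(c("g1", "g2", "g3"), each = 20),
  avglat = runif(60, 30, 50)
)
veg$wtfi <- 2 * veg$avglat + rnorm(60, sd = 5)

# Fit a separate linear model per genus and extract its R^2
r2_by_genus <- sapply(split(veg, veg$genus), function(d) {
  summary(lm(wtfi ~ avglat, data = d))$r.squared
})

# One label per genus, ready to be placed on the plot
r2_labels <- data.frame(genus = names(r2_by_genus),
                        label = sprintf("R^2 = %.2f", r2_by_genus))
```

A data frame like r2_labels could be given x and y positions and passed to geom_text, in the way the rsq data frame is used in the first answer.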
