Adding geom_rug of observed values in ggplot - r

Using the goats data set from the ResourceSelection package I can look at the relationship between ELEVATION and a binary response (STATUS) using glm.
library(ResourceSelection)
library(ggplot2)
mod <- glm(STATUS ~ ELEVATION, family=binomial, data = goats)
summary(mod)
I then want to predict over a larger range of ELEVATION and do so with the following code.
df <- data.frame(ELEVATION = seq(0,5000,1))
df$Preds <- predict(mod, newdata = df, type="response")
ggplot(df, aes(x=ELEVATION, y = Preds)) + geom_point()
Now, with the resulting ggplot, how can I add a rug to the bottom of the figure that shows the observed values of ELEVATION from the goats data set when STATUS == 1? That is, I want a rug showing goats$ELEVATION[goats$STATUS == 1].
I have tried adding geom_rug(), but am not sure how to include the values from the goats data frame rather than the df that I used in the ggplot code. In other words, how can I include a rug of the observed values (subset as indicated above) from the original data in the plot with the new predicted data from the df data frame?
Thanks in advance!

geom_rug has a data argument (all geoms do), so you can give it just the data you want plotted.
ggplot(df, aes(x=ELEVATION, y = Preds)) + geom_point() +
  geom_rug(data = subset(goats, STATUS == 1),
           aes(x = ELEVATION), inherit.aes = FALSE)
In this case, the main ggplot() call maps y = Preds, a column that is not present in the goats data, so we need to set inherit.aes = FALSE for the rug layer that uses the goats data. This prevents ggplot from looking for the nonexistent column.
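For readers without the ResourceSelection package, here is a minimal self-contained sketch of the same idea. The `obs` data frame is simulated and only stands in for goats (its values are made up), and `newd` plays the role of df. The `sides = "b"` argument places the rug along the bottom axis.

```r
library(ggplot2)

# Simulated stand-in for the goats data (hypothetical values, not the real data set)
set.seed(1)
obs <- data.frame(ELEVATION = runif(200, 500, 3000))
obs$STATUS <- rbinom(200, 1, plogis((obs$ELEVATION - 1500) / 400))

mod <- glm(STATUS ~ ELEVATION, family = binomial, data = obs)

# Predict over a wider range than was observed
newd <- data.frame(ELEVATION = seq(0, 5000, by = 10))
newd$Preds <- predict(mod, newdata = newd, type = "response")

p <- ggplot(newd, aes(x = ELEVATION, y = Preds)) +
  geom_line() +
  # The rug layer gets its own data and its own mapping
  geom_rug(data = subset(obs, STATUS == 1),
           aes(x = ELEVATION), inherit.aes = FALSE, sides = "b")
p
```

The rug layer draws one tick per observed row with STATUS == 1, independent of how many rows the prediction grid has.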

Related

how to make the Expected value curve for a longitudinal data in r

I have a longitudinal data where I would like to make the expected value curve. In the x-axis I have time and in the y-axis I have a continuous variable.
Without data it is hard to reproduce your problem, so I first generated some random data:
df <- data.frame(Age = sample(1:50),
variable = runif(50, 0, 1))
I am not sure if this is what you want, but you can use geom_smooth to create an expected value curve using this code:
library(tidyverse)
df %>%
ggplot(aes(x = Age, y = variable)) +
geom_point() +
geom_smooth()
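If "expected value curve" means the average response at each time point across subjects, one possible alternative to a smoother is stat_summary. This is only a sketch under that assumption. The `long_df` data, with its `id`, `time` and `value` columns, is simulated for illustration.

```r
library(ggplot2)

# Simulated longitudinal data: 20 subjects, each measured at 5 time points
set.seed(10)
long_df <- data.frame(
  id   = rep(1:20, each = 5),
  time = rep(1:5, times = 20)
)
long_df$value <- 2 + 0.5 * long_df$time + rnorm(nrow(long_df))

p <- ggplot(long_df, aes(x = time, y = value)) +
  # one faint line per subject
  geom_line(aes(group = id), alpha = 0.3) +
  # mean of value at each time point, drawn as a single line
  stat_summary(fun = mean, geom = "line", colour = "red")
p
```

With irregular measurement times the per-time mean is less useful, and geom_smooth as shown above is the more natural choice.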

How can I manually add labels to multiple ggplot2 mappings created through a for-loop?

I have been working on plotting several lines according to different probability levels and am stuck adding labels to each line to represent the probability level.
Since each curve plotted has varying x and y coordinates, I cannot simply have a large data-frame on which to perform usual ggplot2 functions.
The end goal is to have each line with a label next to it according to the p-level.
What I have tried:
To access the data comfortably, I have created a list df with for example 5 elements, each element containing a nx2 data frame with column 1 the x-coordinates and column 2 the y-coordinates. To plot each curve, I create a for loop where at each iteration (i in 1:5) I extract the x and y coordinates from the list and add the p-level line to the plot by:
plot = plot +
geom_line(data=df[[i]],aes(x=x.coor, y=y.coor),color = vector_of_colors[i])
where vector_of_colors contains varying colors.
I have looked at using ggrepel and its geom_label_repel() or geom_text_repel() functions, but being unfamiliar with ggplot2 I could not get it to work. Below is a simplification of my code so that it may be reproducible. I could not include an image of the actual curves I am trying to add labels to since I do not have 10 reputation.
# CREATION OF DATA
plevel0.5 = cbind(c(0,1),c(0,1))
colnames(plevel0.5) = c("x","y")
plevel0.8 = cbind(c(0.5,3),c(0.5,1.5))
colnames(plevel0.8) = c("x","y")
data = list(data1 = as.data.frame(plevel0.5), data2 = as.data.frame(plevel0.8))
# CREATION OF PLOT
plot = ggplot()
for (i in 1:2) {
plot = plot + geom_line(data=data[[i]],mapping=aes(x=x,y=y))
}
Thank you in advance and let me know what needs to be clarified.
EDIT :
I have now attempted the following :
Using bind_rows(), I have created a single dataframe with columns x.coor and y.coor as well as a column called "groups" detailing the p-level of each coordinate.
This is what I have tried:
plot = ggplot(data) +
geom_line(aes(coors.x,coors.y,group=groups,color=groups)) +
geom_text_repel(aes(label=groups))
But it gives me the following error:
geom_text_repel requires the following missing aesthetics: x and y
I do not know how to specify x and y in the correct way since I thought it did this automatically. Any tips?
Your approach is probably a bit too complicated. As far as I understand it, you can work with a single dataset and use the group aesthetic to get the same result you are trying to achieve with your for loop and multiple geom_line calls. To this end I use dplyr::bind_rows to bind your datasets together. Whether ggrepel is needed depends on your real dataset. In my code below I simply use geom_text to add a label at the rightmost point of each line:
plevel0.5 <- data.frame(x = c(0, 1), y = c(0, 1))
plevel0.8 <- data.frame(x = c(0.5, 3), y = c(0.5, 1.5))
library(dplyr)
library(ggplot2)
data <- list(data1 = plevel0.5, data2 = plevel0.8) |>
bind_rows(.id = "id")
ggplot(data, aes(x = x, y = y, group = id)) +
geom_line(aes(color = id)) +
geom_text(data = ~ group_by(.x, id) |> filter(x %in% max(x)), aes(label = id), vjust = -.5, hjust = .5)
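The "missing aesthetics: x and y" error in the question's edit is consistent with x and y being mapped only inside geom_line(): a mapping given to one layer is not shared with the other layers. A minimal sketch of the fix is to put the shared mapping in ggplot() so every layer inherits it. Here geom_text() stands in for geom_text_repel() to avoid the ggrepel dependency, and `lines_df` and `ends` are illustrative names.

```r
library(ggplot2)

lines_df <- data.frame(
  id = rep(c("data1", "data2"), each = 2),
  x  = c(0, 1, 0.5, 3),
  y  = c(0, 1, 0.5, 1.5)
)

# One label row per line: the rightmost point of each id
ends <- do.call(rbind, lapply(split(lines_df, lines_df$id),
                              function(d) d[which.max(d$x), ]))

p <- ggplot(lines_df, aes(x = x, y = y, group = id)) +  # mapping shared by all layers
  geom_line(aes(colour = id)) +
  geom_text(data = ends, aes(label = id), vjust = -0.5)  # inherits x and y
p
```

The same placement of aes() should apply when the text layer is geom_text_repel().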

How can I add rows to a training set for Arima Model?

I obtain both sets from a single column; the test set is the continuation of the training set. But when I plot them, both start at the same point instead of the test set following on from the training set.
training_set = series[1:500,]
test_set = series[501:790,]
plot.default(training_set, type = "l")
lines(test_set, col="blue", type = "l")
How can I plot both sets one starting after one finishes?
If you have data from one column, it will be difficult to create the plot you expect.
Take a look at my example. I plot all the data and mark the test points in red.
library(tidyverse)
n=100
df = tibble(
x = 1:n,
y = rnorm(n)
)
training_set_idx = sample(1:n, n*0.8)
df %>% ggplot(aes(x, y))+
geom_line()+
geom_point(data=df[-training_set_idx,], color="red", size=3)
If instead you want the test data to be a block of rows chosen by hand (here rows 40:60), you can do it like this:
df %>% ggplot(aes(x, y))+
geom_line()+
geom_line(data=df[40:60,], color="blue", size=3)
However, you should not divide the data into training and checking sets arbitrarily!
Going back to my first example, the training data will be df[training_set_idx, ] and the test data will be df[-training_set_idx, ].
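For the contiguous split in the question, both curves most likely start at the same point because plot() and lines() place a single vector at x positions 1, 2, 3, ... when no x is given. Here is a base R sketch that passes the original positions explicitly. The `series` vector is simulated and stands in for the asker's column, which is not shown.

```r
set.seed(42)
series <- cumsum(rnorm(790))   # simulated stand-in for the single column

train_idx <- 1:500
test_idx  <- 501:790

# Plot the training set at its own positions, leaving room for the test set
plot(train_idx, series[train_idx], type = "l",
     xlim = range(c(train_idx, test_idx)), ylim = range(series),
     xlab = "Index", ylab = "Value")

# Explicit x positions make the test set start where the training set ends
lines(test_idx, series[test_idx], col = "blue")
```

Setting xlim and ylim up front matters because lines() does not rescale the axes of an existing plot.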

How can I produce a scatterplot using ggplot in R where each column is a different colour?

I have a list of model-output in R that I want to plot using ggplot. I want to produce a scatter plot within which every column of data is a different colour. In the example here, I have three model outputs which I want to plot against 'measured'. What I want in the end is a scatter with three different 'clouds' of points, each of which is a different colour. Here is a reproducible example of what I have so far:
library(ggplot2)
library(tidyverse)
#data for three different models as well as a column for 'observations' (measured)
output <- list(model1 = 1:10, model2 = 22:31, model3=74:83)
#create the dataframe
df <- data.frame(
predicted = output,
measured = 1:length(output[[1]]),
#year = as.factor(data$year),
#site = data$site
#model = as.factor(names(output)),
#stringsAsFactors=TRUE)
fix.empty.names = TRUE)
#fix the column names
colnames(df)<-names(output)
#plot the data with a different colour for each column of data
p <- ggplot(df) +
geom_point(
aes(
measured,
predicted,
colour =colnames(df)
)
) +
ylim(-5, 90)+
theme_minimal()
p + geom_hline(yintercept=0)
print(p)
I am getting the error: Error in FUN(X[[i]], ...) : object 'measured' not found
Why is 'measured' not being found? I can see it in the df.
Perhaps I need to collapse all the model outputs into one column and create a 'factor' column to assign each data point to a particular model?
The first issue is that your output list only has as many elements as you have models, so it has no name for the last "measured" column and that gets overwritten with NA.
Compare:
colnames(df)<-names(output) # NA in last col
colnames(df)<-c(names(output), "measured") # fixed
Then, to plot your data in ggplot2 it's almost always better to convert to longer, "tidy" format, with one row per observation. pivot_longer from tidyr is great for that.
df %>%
pivot_longer(-measured, # don't pivot "measured" -- keep in every row
names_to = "model",
values_to = "predicted") %>%
ggplot() +
geom_point(
aes(
measured,
predicted,
colour = model
)
) +
ylim(-5, 90)+
theme_minimal() +
geom_hline(yintercept=0)
You changed the column names of your data frame:
colnames(df)<-names(output)
So the columns you refer to in aes() are no longer found.
I reorganized your object into a data frame that can be easily understood by ggplot2. Do not hesitate to look at your objects.
Here is one option :
library(ggplot2)
library(tidyverse)
#data for three different models as well as a column for 'observations' (measured)
output <- list(model1 = 1:10, model2 = 22:31, model3=74:83)
#create the dataframe
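df <- data.frame(
  predicted = unlist(output),
  # repeat the observations once per model so each prediction is paired with its measurement
  measured = rep(seq_along(output[[1]]), times = length(output)),
  # label every value with the model it came from (names(output) alone would be recycled in the wrong order)
  model = rep(names(output), times = lengths(output))
)
#plot the data with a different colour for each model
p <- ggplot(df) +
  geom_point(aes(measured, predicted, colour = model)) +
  ylim(-5, 90) +
  theme_minimal()
#store the horizontal line in p so that print(p) shows it
p <- p + geom_hline(yintercept=0)
print(p)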
If you add this line to the plot definition:
facet_grid(~model) +
you get one panel per model, which sounds like what you were asking for.

How to create facet_grid on timeseries data?

I have a dataset like this:
I want to show the trend, with the x-axis containing the year values and the y-axis the corresponding values from the columns, so maybe
ggplot(data,aes(year,bm))
I want to plot not just one column but several of them. As a single plot seems too detailed, I wanted to use facet_grid to arrange the plots nicely next to each other. However, it did not work for my data, as I think I have no 'real' objects to compare.
Does anyone have an idea how I can use facet_grid so that it looks something like this (in my case p1=BM and p2=BMSW):
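The problem is your data format. facet_grid splits the panels by the values of a column, so the variables first need to be gathered into one long column. Here is an example with some fake data: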
library(tidyverse)
##Create some fake data
set.seed(3)
data <- tibble(
year = 1991:2020,
bm = rnorm(30),
bmsw = rnorm(30),
bmi = rnorm(30),
bmandinno = rnorm(30),
bmproc = rnorm(30),
bmart = rnorm(30)
)
##Gather the variables to create a long dataset
new_data <- data %>%
gather(model, value, -year)
##plot the data
ggplot(new_data, aes(x = year, y = value)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
facet_grid(~model)
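In current tidyr, gather() is superseded by pivot_longer(), so the reshaping step could also be written as in the sketch below. Only two of the fake columns are used to keep it short, and a plain data.frame replaces the tibble so that only tidyr and ggplot2 are needed.

```r
library(tidyr)
library(ggplot2)

# Fake data in wide format: one column per series
set.seed(3)
data <- data.frame(
  year = 1991:2020,
  bm   = rnorm(30),
  bmsw = rnorm(30)
)

# Long format: one row per year and series
new_data <- pivot_longer(data, cols = -year,
                         names_to = "model", values_to = "value")

p <- ggplot(new_data, aes(x = year, y = value)) +
  geom_point() +
  facet_grid(~ model)
p
```

The `model` column created by pivot_longer() is what facet_grid uses to split the data into panels.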

Resources