facet_grid() causing crash - r

I cannot figure out what I'm missing. I keep crashing R or getting very weird plots.
> head(vData)
vix.Close vstoxx vxfxi.Close Date
2011-03-16 29.40 35.2293 35.84 2011-03-16
2011-03-17 26.37 30.6133 31.77 2011-03-17
2011-03-18 24.44 28.5337 29.31 2011-03-18
2011-03-21 20.61 25.2355 25.95 2011-03-21
2011-03-22 20.21 24.3914 24.52 2011-03-22
2011-03-23 19.17 23.9226 24.03 2011-03-23
The below works:
p1.1<-ggplot(data = vData, aes(x = Date, y = vix.Close)) + geom_line(col= "red")
p1.1
p2<-p1.1 + geom_line(data = vData[!is.na(vData$vstoxx),], aes(x = Date, y = vstoxx), col="blue")
p2
p3<-p2 + geom_line(data = vData[!is.na(vData$vxfxi.Close),], aes(x = Date, y = vxfxi.Close), col="green")
p3
p4<-p3 + labs(title = "Volatility Indexes", x = "Time", y = "Index")
p4
But this is the part that is giving me trouble:
p5<- p4 + facet_grid(Date~., scales = Date)
p5

I echo what baptiste said: what is it you're trying to do? The code you've provided suggests that you're trying to create a separate line chart for each date in the dataset, which doesn't make much sense. For this demonstration, I'll show you how to facet the data by year to see the correlations between the different measurements of volatility over time. If you provide more detail as a comment, I'll revisit the code.
First let's take a look at what you've already done.
library(tidyverse)
library(gridExtra)
library(lubridate)
library(reshape2)
#Generate dummy data
vData <- tibble(
vix.Close = rnorm(1000, mean = 12, sd = 5),
vstoxx = rnorm(1000, mean = 12, sd = 5),
vxfxi.Close = rnorm(1000, mean = 12, sd = 5),
Date = as.Date(1:1000, origin = '2011-01-01')
)
# Generate individual plots per your question
p1.1 <-
ggplot(data = vData, aes(x = Date, y = vix.Close)) + geom_line(col = "red")
p1.1
p2 <-
p1.1 + geom_line(data = vData[!is.na(vData$vstoxx), ], aes(x = Date, y = vstoxx), col =
"blue")
p2
p3 <-
p2 + geom_line(data = vData[!is.na(vData$vxfxi.Close), ], aes(x = Date, y = vxfxi.Close), col =
"green")
p3
p4 <-
p3 + labs(title = "Volatility Indexes", x = "Time", y = "Index")
p4
You're building the plot by adding one geom_line() layer per index. This approach works here, but it's cumbersome: every change has to be repeated in each layer, and adding or removing a variable means adding or removing a layer. Let's move your data to a "long" format and simplify the ggplot call.
# Melt the data into three columns (Date, variable, value) and remove NAs
vData <- melt(vData, id = "Date") %>%
filter(!is.na(value)) %>%
as_tibble() # tbl_df() is deprecated in current dplyr
# Create one ggplot for all three indexes
ggplot(data = vData, aes(x = Date, y = value, color = variable)) +
geom_line() +
labs(title = "Volatility Indexes", x = "Time", y = "Index")
Now back to the big problem: you shouldn't be faceting by date because that would give you a huge number of tiny unreadable line charts. There are a number of other facets that might make sense. For example, you could look at the distribution of the three indexes by year.
ggplot(data = vData, aes(x = variable, y = value, color = variable)) +
geom_boxplot() +
labs(title = "Volatility Indexes", x = "", y = "") +
facet_grid(year(Date) ~ .)
So put some thought into what exactly you want to show.
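As for the original call, facet_grid(Date ~ ., scales = Date) asks for one panel per day, and scales expects a string such as "fixed", "free", "free_x" or "free_y", not a column. A minimal sketch of faceting the line chart by year instead is below. It assumes dummy data with the same column names as the question; the vLong and Year names are my own, and it uses tidyr::pivot_longer() rather than melt().

```r
library(ggplot2)
library(tidyr)

set.seed(1)
# Dummy data with the same column names as in the question
vData <- data.frame(
  vix.Close   = rnorm(1000, mean = 12, sd = 5),
  vstoxx      = rnorm(1000, mean = 12, sd = 5),
  vxfxi.Close = rnorm(1000, mean = 12, sd = 5),
  Date        = as.Date(1:1000, origin = "2011-01-01")
)

# Long format: one row per (Date, index) pair
vLong <- pivot_longer(vData, cols = -Date,
                      names_to = "variable", values_to = "value")

# Facet on a coarse grouping (calendar year), not on the raw date
vLong$Year <- format(vLong$Date, "%Y")

p <- ggplot(vLong, aes(x = Date, y = value, colour = variable)) +
  geom_line() +
  facet_wrap(~ Year, ncol = 1, scales = "free_x") +  # scales is a string
  labs(title = "Volatility Indexes", x = "Time", y = "Index")
p
```

facet_wrap() is used here because, with a single column of panels, facet_grid() shares the x axis across all of them, so "free_x" would have no visible effect.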

Related

Dodging vertical lines for median_hilow in ggplot

I need to plot lines that show median and IQR for 3 replicates, across multiple samples.
Data:
sampleid <- rep(1:20, each = 3)
replicate <- rep(1:3, 20)
sample1 <- seq(120,197, length.out = 60)
sample2 <- seq(113, 167, length.out = 60)
sample3 <- seq(90,180, length.out = 60)
What I have done so far?
df <- as.data.frame(cbind(sampleid,replicate,sample1, sample2, sample3))
library(reshape2)
long <- melt(df,id.vars = c('sampleid', 'replicate'))
ggplot(data = long, aes(x = variable, y = value, colour = factor(replicate))) + stat_summary(fun.data=median_hilow, conf.int=.5)
However, the plot of the IQR for replicates that I am getting are overlapped with each other for each sample. I would like to find out a way to "dodge" these 3 lines so that they are visible next to each other, without changing other parameters of the plot that I have achieved. Is this achievable?
You have to introduce jitter to the lines:
ggplot(data = long, aes(x = variable, y = value, colour = factor(replicate))) +
stat_summary(fun.data=median_hilow, fun.args = list(conf.int=.5), position = "jitter")
Please note you also need to wrap your conf.int=.5 in a list and pass it as fun.args.
Alternatively, change your x to factor(replicate) and add facet_wrap:
ggplot(data = long, aes(x = factor(replicate), y = value, colour = factor(replicate))) +
stat_summary(fun.data=median_hilow, fun.args = list(conf.int=.5)) +
facet_wrap(~variable)
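Jitter places the lines at random offsets, so a deterministic alternative may be preferable. The sketch below swaps in position_dodge(), which is not what the answer above uses. To avoid depending on Hmisc (which median_hilow needs), it also assumes a small helper, median_iqr(), that I made up for illustration; it returns the median and the 25th and 75th percentiles.

```r
library(ggplot2)
library(tidyr)

# Data from the question
df <- data.frame(
  sampleid  = rep(1:20, each = 3),
  replicate = rep(1:3, 20),
  sample1   = seq(120, 197, length.out = 60),
  sample2   = seq(113, 167, length.out = 60),
  sample3   = seq(90, 180, length.out = 60)
)
long <- pivot_longer(df, cols = c(sample1, sample2, sample3),
                     names_to = "variable", values_to = "value")

# Hypothetical helper: median with the interquartile range as ymin/ymax
median_iqr <- function(y) {
  q <- quantile(y, probs = c(0.25, 0.5, 0.75), names = FALSE)
  data.frame(y = q[2], ymin = q[1], ymax = q[3])
}

p <- ggplot(long, aes(x = variable, y = value, colour = factor(replicate))) +
  stat_summary(fun.data = median_iqr,
               position = position_dodge(width = 0.5))  # side by side, not random
p
```

The width argument controls how far apart the three replicates sit within each sample; 0.5 is an arbitrary starting value.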

ggplot change line color specified by x axis values

Code to reproduce:
myDat <- data.frame(Event = rep(c("Arrival", "Departure"), 3),
AtNode = c("StationA", "StationA", "Track", "Track", "StationB", "StationB"),
Lane = c("Lane1", "Lane1", "Lane2", "Lane2", "Lane1", "Lane1"),
atTime = c(10, 12, 18, 20, 34, 36),
Type = c("Station", "Station", "Track", "Track", "Station", "Station"),
Train = 1 )
ggplot(data =myDat, aes(x = atTime, y=factor(AtNode, levels = unique(paste(myDat[order(myDat$atTime),"AtNode"]))), group = Train, colour = Lane ))+
geom_point(data = myDat)+
geom_path(data = myDat[which(!grepl(pattern = "Track", myDat$Type)),])
Now I need to project the two green points (Y = "Track") onto the orange line and color the line between the projected points the same color as the points.
Expected result (without the points at Y = "Track"):
Thanks in advance for every hint or trick!
Cheers
I don't think your output is the right way of showing what you want. You have factors on your y-axis, which means it ranges between 1 and 3.
Therefore, projecting a line there means nothing in terms of y-axis values.
For me, the correct way of showing your data would be like this
ggplot(data =myDat,
aes(x = atTime, y=factor(AtNode, levels = unique(paste(myDat[order(myDat$atTime),"AtNode"]))),
group = AtNode, colour = Lane ))+
geom_point()+
geom_line() +
labs(y = 'AtNode')
However, to do it how you asked, you can do some simple trigonometry to project your line segment
x1 = 1 + tan(asin(2/sqrt(488)))*6 #y projection given x = 18; the hypotenuse is sqrt(22^2 + 2^2) = sqrt(488)
x2 = 1 + tan(asin(2/sqrt(488)))*8 #y projection given x = 20
foo = data.frame(x = c(18,20), y = c(x1, x2), Lane = "Lane2")
ggplot(data = myDat, aes(x = atTime, y=factor(AtNode, levels = unique(paste(myDat[order(myDat$atTime),"AtNode"]))), group = 1, colour = Lane ))+
geom_path(data = myDat[which(!grepl(pattern = "Track", myDat$Type)),]) +
geom_line(data = foo, aes(x = x, y = y, color = Lane), size = 1) +
scale_y_discrete(drop = FALSE)
I don't think there is a quick solution to this, but you could do something like this:
myDat$AtNode <- factor(myDat$AtNode, levels = unique(paste(myDat[order(myDat$atTime),"AtNode"]))) #Generate factor here so we can use in imputation calculation
impute_rows <- which(myDat$Type == "Track") #Select rows to impute
slope_df <- myDat[impute_rows + c(-1,1), ] #Select rows before and after imputation to calculate slope
line <- lm(as.numeric(AtNode) ~ atTime, data = slope_df) #Get slope of line so we can do the calculations
df <- data.frame(x = myDat[impute_rows, "atTime"], y = myDat[impute_rows, "atTime"]*line$coefficients[["atTime"]] + line$coefficients[["(Intercept)"]], Lane = myDat[impute_rows,"Lane"], Train = myDat[impute_rows,"Train"])
ggplot(data =myDat, aes(x = atTime, y=AtNode, group = Train, colour = Lane ))+
geom_path(data = myDat[which(!grepl(pattern = "Track", myDat$Type)),]) +
geom_path(data = df, aes(x = x, y = y), size = 2) +
scale_y_discrete(drop = FALSE)
The idea is as follows:
1. Identify the rows you want to impute, using which().
2. Select the rows immediately before and after them (slope_df).
3. Fit a line through those neighbouring rows with lm() to get the slope and intercept.
4. Evaluate that line at the atTime values of the rows to impute to generate the new data: df <- data.frame(...).
Note that you also need the scale_y_discrete(drop = FALSE) so that the Track level isn't removed from the plot.
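Both the trigonometry and the lm() fit above amount to linear interpolation between the departure from StationA and the arrival at StationB. As a rough cross-check, the same y positions can be obtained with base R's approx(). This sketch assumes the factor levels map to 1 (StationA), 2 (Track) and 3 (StationB), as they do with the ordering used above; the names proj and foo are only illustrative.

```r
# Endpoints of the orange line: departure from StationA (t = 12, level 1)
# and arrival at StationB (t = 34, level 3)
proj <- approx(x = c(12, 34), y = c(1, 3), xout = c(18, 20))

# Projected segment for the Track events, ready to pass to geom_line()
foo <- data.frame(x = proj$x, y = proj$y, Lane = "Lane2")
foo
```

Because approx() works directly from the two endpoints, there is no angle or hypotenuse to get wrong.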

How to create multiple (6) plots with ggplot and save them to a pdf file?

I have a matrix (pred_matrix, dim = 1e6, 250); the rows are "pixel stacks" of 250 NDVI values of a Landsat scene, on which I ran a fuzzy c-means classification with 6 centers (classes), stored in the list results. I now want to plot a random subset of the 1e6 rows for each class. This is my quick and dirty code so far:
random_index <- floor(runif(10000, 1, 1e6+1))
random_cluster <- results[[6]]$cluster[random_index]
random_pred_matrix <- pred_matrix[random_index, ]
dates_subse_after_pred <- rdn_num[rm_na_pred_df]
random_res <- cbind(random_pred_matrix, random_cluster)
random_res <- t(random_res)
random_res <- cbind(c(dates_subse_after_pred, 1), random_res)
df_1 <- data.frame(random_res[1:250,c(TRUE, random_cluster==1)])
df_2 <- data.frame(random_res[1:250,c(TRUE, random_cluster==2)])
df_3 <- data.frame(random_res[1:250,c(TRUE, random_cluster==3)])
df_4 <- data.frame(random_res[1:250,c(TRUE, random_cluster==4)])
df_5 <- data.frame(random_res[1:250,c(TRUE, random_cluster==5)])
df_6 <- data.frame(random_res[1:250,c(TRUE, random_cluster==6)])
df_1.long <- melt(df_1, id.vars = 1)
df_1.long$X1 <- as.Date(df_1.long$X1)
df_2.long <- melt(df_2, id.vars = 1)
df_2.long$X1 <- as.Date(df_2.long$X1)
df_3.long <- melt(df_3, id.vars = 1)
df_3.long$X1 <- as.Date(df_3.long$X1)
df_4.long <- melt(df_4, id.vars = 1)
df_4.long$X1 <- as.Date(df_4.long$X1)
df_5.long <- melt(df_5, id.vars = 1)
df_5.long$X1 <- as.Date(df_5.long$X1)
df_6.long <- melt(df_6, id.vars = 1)
df_6.long$X1 <- as.Date(df_6.long$X1)
ggplot(df_1.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "lightblue")
ggplot(df_2.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "blue")
ggplot(df_3.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "lightgreen")
ggplot(df_4.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "green")
ggplot(df_5.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "pink")
ggplot(df_6.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "red")
After this I just hit the export button in RStudio six times and inserted everything into a Word document...
Is there a way to do this in a loop? Or even to produce a final PDF containing the 6 plots?
Separate file
I think what you are after is a ggsave() call like the following, once for each of your six plots.
ggsave("filename.png", # or pdf if you like
plot = last_plot(), # or give ggplot object name as in myPlot,
width = 5, height = 5,
units = "in", # other options c("in", "cm", "mm"),
dpi = 300)
For example,
library(ggplot2)
p1 <- ggplot(df_1.long) +
geom_line( aes(x = X1, y= value, group = variable),
color = "lightblue")
ggsave("df1.png", plot = p1, dpi = 300)
All in one
If you want all six plots in one PDF file, then first do
pdf("file_name.pdf")
# do your ggplots here
p1
p2
p6
dev.off()
If you are using RStudio, I would recommend writing your code in an R Markdown file and then exporting to PDF directly.
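Since the question asks about a loop, here is a minimal sketch of one. It assumes the six per-cluster data frames are combined into a single long data frame with a cluster column. The data below is a made-up stand-in for df_1.long to df_6.long, and the file is written to a temporary directory only to keep the example self-contained.

```r
library(ggplot2)

set.seed(42)
# Hypothetical stand-in for the real data: 30 pixel series, 5 per cluster
dates <- seq(as.Date("2015-01-01"), by = "week", length.out = 50)
long <- expand.grid(X1 = dates, variable = paste0("px", 1:30))
long$cluster <- rep(rep(1:6, each = 5), each = length(dates))
long$value <- rnorm(nrow(long))

cols <- c("lightblue", "blue", "lightgreen", "green", "pink", "red")

# One plot per cluster, built in a loop instead of six copies of the code
plots <- lapply(1:6, function(k) {
  ggplot(long[long$cluster == k, ]) +
    geom_line(aes(x = X1, y = value, group = variable), colour = cols[k]) +
    labs(title = paste("Cluster", k))
})

# One page per plot in a single PDF
out_file <- file.path(tempdir(), "clusters.pdf")
pdf(out_file, width = 7, height = 5)
for (p in plots) print(p)  # inside a loop, ggplot objects must be print()ed
dev.off()
```

If the six panels should share one page, adding facet_wrap(~ cluster) to a single ggplot() call on the combined data frame is another option.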

Having trouble plotting multiple data sets and their confidence intervals on the same GGplot. Data Frame included

First off, here is my data frame:
> df.combined
MLSupr MLSpred MLSlwr BPLupr BPLpred BPLlwr
1 1.681572 1.392213 1.102854 1.046068 0.8326201 0.6191719
2 3.363144 2.784426 2.205708 2.112885 1.6988250 1.2847654
3 5.146645 4.232796 3.318946 3.201504 2.5999694 1.9984346
4 6.930146 5.681165 4.432184 4.368555 3.6146180 2.8606811
5 8.713648 7.129535 5.545422 5.480557 4.5521112 3.6236659
6 10.497149 8.577904 6.658660 6.592558 5.4896044 4.3866506
7 12.280651 10.026274 7.771898 7.681178 6.3907488 5.1003198
8 14.064152 11.474644 8.885136 8.924067 7.4889026 6.0537381
9 15.847653 12.923013 9.998373 10.125539 8.5444783 6.9634176
10 17.740388 14.429805 11.119222 11.327011 9.6000541 7.8730970
11 19.633122 15.936596 12.240071 12.620001 10.7425033 8.8650055
12 21.525857 17.443388 13.360919 13.821473 11.7980790 9.7746850
13 23.535127 19.010958 14.486789 15.064362 12.8962328 10.7281032
14 25.544397 20.578528 15.612659 16.307252 13.9943865 11.6815215
15 27.553667 22.146098 16.738529 17.600241 15.1368357 12.6734300
16 29.562937 23.713668 17.864399 18.893231 16.2792849 13.6653384
17 31.572207 25.281238 18.990268 20.245938 17.4678163 14.6896948
18 33.581477 26.848807 20.116138 21.538928 18.6102655 15.6816033
19 35.590747 28.416377 21.242008 22.891634 19.7987969 16.7059597
20 37.723961 30.047177 22.370394 24.313671 21.0352693 17.7568676
So, as you can see, I have predicted values along with the upper and lower bounds of their 95% CI. I'd like to plot the lines and their ribbons for MLS and BPL in the same plot, but I'm not quite sure how.
Right now, for a single data set, I am using this command:
ggplot(BULISeason, aes(x = 1:length(BULISeason$`Running fit`), y = `Running fit`)) +
geom_line(aes(fill = "black")) +
geom_ribbon(aes(ymin = `Running lwr`, ymax = `Running upr`, fill = "red"),alpha = 0.25)
Note: The variables are different for the independent data frames.
You can, of course, construct your plot as a series of layers, as you imply in your question. Note that df.combined has no x column, so the code below adds a row index first:
df.combined$x <- seq_len(nrow(df.combined))
ggplot(data = df.combined) +
geom_ribbon(aes(x = x, ymin = MLSlwr, ymax = MLSupr),
fill = "blue", alpha = 0.25) +
geom_line(aes(x = x, y = MLSpred), color = "black") +
geom_ribbon(aes(x = x, ymin = BPLlwr, ymax = BPLupr),
fill = "red", alpha = 0.25) +
geom_line(aes(x = x, y = BPLpred), color = "black")
and obtain something like this:
However, reshaping your dataset to a "tidy", or long, format has some advantages. For example, you could map the origin of the predictions to color and the type of prediction to line type in the resulting plot:
You can achieve that using the following code:
library(dplyr) # provides %>% and mutate()
library(tidyr)
tidy.data <- df.combined %>%
# add id variable
mutate(x = 1:20) %>%
# reshape to long format
gather("variable", "value", 1:6) %>%
# separate variable names at position 3
separate(variable,
into = c("model", "line"),
sep = 3,
remove = TRUE)
# plot
ggplot(data = tidy.data, aes(x = x,
y = value,
linetype = line,
color = model)) +
geom_line() +
scale_linetype_manual(values = c("dashed", "solid", "dashed"))
You can still use ribbons in your plot by spreading your dataframe back to a wide(r) format:
# back to wide
wide.data <- tidy.data %>%
spread(line, value)
# plot with ribbon
ggplot(data = wide.data, aes(x = x, y = pred)) +
geom_ribbon(aes(ymin = lwr, ymax = upr, fill = model), alpha = .5) +
geom_line(aes(group = model))
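In current tidyr, gather() and spread() are superseded, and the round trip through the long format can be skipped when only the ribbon plot is wanted. The sketch below is one possible way to do it with pivot_longer() and the ".value" sentinel. It uses only the first three rows of the data frame above to stay short, and the regular expression assumes the column names follow the MLS/BPL plus upr/pred/lwr pattern exactly.

```r
library(tidyr)
library(ggplot2)

# First three rows of df.combined from the question
df.combined <- data.frame(
  MLSupr  = c(1.681572, 3.363144, 5.146645),
  MLSpred = c(1.392213, 2.784426, 4.232796),
  MLSlwr  = c(1.102854, 2.205708, 3.318946),
  BPLupr  = c(1.046068, 2.112885, 3.201504),
  BPLpred = c(0.8326201, 1.6988250, 2.5999694),
  BPLlwr  = c(0.6191719, 1.2847654, 1.9984346)
)
df.combined$x <- seq_len(nrow(df.combined))

# Split each column name into a model part and a upr/pred/lwr part;
# ".value" turns the second part into separate value columns
wide.data <- pivot_longer(
  df.combined,
  cols = -x,
  names_to = c("model", ".value"),
  names_pattern = "^(MLS|BPL)(upr|pred|lwr)$"
)

p <- ggplot(wide.data, aes(x = x, y = pred)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr, fill = model), alpha = 0.5) +
  geom_line(aes(group = model))
p
```

The result has one row per x and model with upr, pred and lwr columns, which is the same shape as the wide.data built above.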
Hope this helps!

Plot time series and forecast simultaneously using ggplot2

I have a time series with forecast and confidence interval data, I wanted to plot them simultaneously using ggplot2. I'm doing it by the code below:
set.seed(321)
library(ggplot2)
#create some dummy data similar to mine
sample<-rnorm(350)
forecast<-rnorm(24)
upper<-forecast+2*sd(forecast)
lower<-forecast-2*sd(forecast)
## wrap data into a data.frame
df1 = data.frame(time = seq(325,350,length=26), M = sample[325:350], isin = "observations")
df2 = data.frame(time = seq(351,374,length=24), M = forecast , isin = "my_forecast")
df3 = data.frame(time = seq(351,374,length=24), M = upper ,isin = "upper_bound")
df4 = data.frame(time = seq(351,374,length=24), M = lower, isin = "lower_bound")
df = rbind(df1, df2, df3, df4)
## ggplot object
ggplot(df, aes(x = time, y = M, color = isin)) + geom_line()
How can I join upper and lower lines in one color? and also how can I set specific colors to forecast and sample?
Use scale_colour_manual:
ggplot(df, aes(x = time, y = M, color = isin)) + geom_line() +
scale_colour_manual(values=c(observations='blue', my_forecast='red', upper_bound='black', lower_bound='black'))
edit
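This is another option, inspired by @rnso's answer. It needs a data frame (df5) that holds the forecast and its bounds as separate columns:
df5 <- data.frame(time = df2$time, M = forecast, upper_bound = upper, lower_bound = lower)
ggplot(df1, aes(x = time, y = M)) + geom_line(colour='blue') +
geom_smooth(aes(x=time, y=M, ymax=upper_bound, ymin=lower_bound),
colour='red', data=df5, stat='identity')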
The following may be useful:
ggplot() +
geom_line(data=df1, aes(x = time, y = M, color = isin)) +
stat_smooth(data=df2, aes(x = time, y = M, color = isin))
The 'method' option can also be used in stat_smooth() to choose the smoother.
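Another way to show the two bounds in one color is to draw them as a single band with geom_ribbon() rather than as two lines. This is a sketch under the assumption that the bounds are kept as columns next to the forecast; the obs and fc data frames are my own names and use dummy data similar to the question's.

```r
library(ggplot2)

set.seed(321)
# Dummy data similar to the question's
obs <- data.frame(time = 325:350, M = rnorm(26))
fc  <- data.frame(time = 351:374, M = rnorm(24))
fc$upper <- fc$M + 2 * sd(fc$M)
fc$lower <- fc$M - 2 * sd(fc$M)

p <- ggplot() +
  # The confidence band: one layer, one fill color
  geom_ribbon(data = fc, aes(x = time, ymin = lower, ymax = upper),
              fill = "grey70", alpha = 0.5) +
  geom_line(data = obs, aes(x = time, y = M, colour = "observations")) +
  geom_line(data = fc, aes(x = time, y = M, colour = "my_forecast")) +
  # Specific colors for the observed series and the forecast
  scale_colour_manual(values = c(observations = "blue", my_forecast = "red"),
                      name = NULL)
p
```

Mapping colour to a constant string inside aes() is what creates the legend entries that scale_colour_manual() then colors.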
