I need help plotting using smooth.spline in r - r

I am using spline.smooth in r and need some help. Here are my codes:
my_df_1 <- data.frame(date = seq(as.Date("2020/1/15"), by = "day", length.out = 100),
rate = runif(100, min=0, max=0.7),
count=sample(2:50, 100, replace = TRUE))
fit <- smooth.spline(my_df_1$date, my_df_1$rate)
plot(my_df_1$date, my_df_1$rate)
lines(fit)
First, I want to change the size of points on plot using column count. Second, is there a way to plot smooth spline in ggplot? Third, how can I format my x-axis for the date. I want to have a day and month in the x-axis. Thanks in advance everyone!

How about this:
my_df_1 %>%
ggplot(aes(x = date, y = rate)) +
geom_smooth() +
geom_point(aes(size = count)) +
scale_x_date(date_labels = "%d-%B")

Related

How to (re)arrange panels of facet_wrap/_grid?

R noob here. I have been stumped on this graph all day and solutions like this and this this seem to hold my answer but I cannot get them to work for me.
I have a data frame that is a large version of the below sample which I am trying to plot using ggplot.
# create data
df <- data.frame(
"ID" = rep(1:5, each = 4),
"Date" = c(seq(as.Date("2019/09/18"), by = "day", length.out = 4),
seq(as.Date("2019/09/18"), by = "day", length.out = 4),
seq(as.Date("2020/08/07"), by = "day", length.out = 4),
seq(as.Date("2020/09/12"), by = "day", length.out = 4),
seq(as.Date("2020/09/29"), by = "day", length.out = 4)),
"MaxDepth" = round(runif(20, min = 10, max = 50), 1),
"Trip" = rep(1:5, each = 4)
)
# plot using ggplot
ggplot(df, aes(Date, MaxDepth, col = factor(Trip))) +
geom_line() +
facet_grid(ID ~ format(Date, "%Y"), scales = "free_x") +
scale_y_reverse() +
scale_x_date(date_labels = "%b") +
labs(title = "Daily maximum depth\n",
x = "",
y = "Depth [m]\n",
col = "Fishing trip")
This turns out nicely as a two column, eleven row faceted graph with the fishing trips as colours.
However, it includes a lot of empty panels which I would like to avoid by creating a one column graph still with all eleven ID rows but that are separated by the same split label the two columns had. I.e. I would like the two individuals that were in the LHS 2019 plot to have that 2019 label on top, separated by the 2020 label from the other 9 individuals.
.
Hope this is clear. Please correct me or let me know what to improve for a better question.
Grateful for any help! Even if those are suggestions that this is not a good way of representation or something like this is simply not possible. Thank you all!
Here is a possible way. I am not sure whether it works for your real data.
library(ggplot2)
library(patchwork)
library(dplyr)
plot_fun <- function(dtt){
ggplot(dtt, aes(Date, MaxDepth, col = factor(Trip))) +
geom_line() +
facet_grid(ID ~ format(Date, "%Y"), scales = "free_x") +
scale_y_reverse() +
scale_x_date(date_labels = "%b") +
labs(x = NULL, y = NULL, col = "Fishing trip")
}
p1 <- plot_fun(df %>% filter(format(Date, '%Y') == '2019'))
p2 <- plot_fun(df %>% filter(format(Date, '%Y') == '2020'))
p1 / p2
ggsave('~/Downloads/test.png', width = 6, height = 6)

Triple variable stacked line/area graph in R

I'm trying to generate a stacked line/area graph utilizing the ggplot and geom_area functions. I have my data loaded into R correctly from what I can tell. Every time I generate the plot, the graph is empty (even though the axis looks correct except for the months being organized in alpha).
I've tried utilizing the data.frame function to define my variables but was unable to generate my plot. I've also looked around Stack Overflow and other websites, but no one seems to have the issue of no errors but still an empty plot.
Here's my data set:
Here's the code I'm using currently:
ggplot(OHV, aes(x=Month)) +
geom_area(aes(y=A+B+Unknown, fill="A")) +
geom_area(aes(y=B, fill="B")) +
geom_area(aes(y=Unknown, fill="Unknown"))
Here's the output at the end:
I have zero error messages, simply just no data being plotted on my graph.
Your dates are being interpreted as a factor. You must transform them.
ibrary(tidyverse)
set.seed(1)
df <- data.frame(Month = seq(lubridate::ymd('2018-01-01'),
lubridate::ymd('2018-12-01'), by = '1 month'),
Unknow = sample(17, replace = T, size = 12),
V1 = floor(runif(12, min = 35, max = 127)),
V2 = floor(runif(12, min = 75, max = 275)))
df <- df %>%
dplyr::mutate(Month = format(Month, '%b')) %>%
tidyr::gather(key = "Variable", value = "Value", -Month)
ggplot2::ggplot(df) +
geom_area(aes(x = Month, y = Value, fill = Variable),
position = 'stack')
Note that I used tidyr::gather to be able to stack the areas in an easier way.
Now assuming your year of analysis is 2018, you need to transform the date of your data frame to something continuous, in the interpretation of r.
df2 <- df %>%
dplyr::mutate(Month = paste0("2018-", Month, "-01"),
Month = lubridate::parse_date_time(Month,"y-b-d"),
Month = as.Date(Month))
library(scales)
ggplot2::ggplot(df2) +
geom_area(aes(x = Month, y = Value, fill = Variable),
position = 'stack') +
scale_x_date(labels = scales::date_format("%b"))

Gap between forecast and actual data in ggplot

I am trying to plot some data, fitted values and forecasts on a nice ggplot format but when I plot my data the way I think should work I get a gap between the real data and the forecast. The gap is meaningless but it would be nice if it was gone.
Some R code you can use to recreate my problem is:
library(xts)
library(tidyverse)
library(forecast)
dates <- seq(as.Date("2016-01-01"), length = 100, by = "days")
realdata <- arima.sim(model = list(ar = 0.7, order = c(1,1,0)), n = 99)
data <- xts(realdata, order.by = dates)
user_arima <- arima(data, order = c(1,1,0))
user_arimaf <- forecast(user_arima)
fits <- xts(user_arimaf$fitted, order.by = dates)
fcastdates <- as.Date(dates[100]) + 1:10
meancast <- xts(user_arimaf$mean[1:10], order.by = fcastdates)
lowercast95 <- xts(user_arimaf$lower[1:10], order.by = fcastdates)
uppercast95 <- xts(user_arimaf$upper[1:10], order.by = fcastdates)
frame <- merge(data, fits, meancast, uppercast95, lowercast95, all = TRUE, fill = NA)
frame <- as.data.frame(frame) %>%
mutate(date = as.Date(dates[1] + 0:(109)))
frame %>%
ggplot() +
geom_line(aes(date, data, color = "Data")) +
geom_line(aes(date, fits, color = "Fitted")) +
geom_line(aes(date, meancast, color = "Forecast")) +
geom_ribbon(aes(date, ymin=lowercast95,ymax=uppercast95),alpha=.25) +
scale_color_manual(values = c(
'Data' = 'black',
'Fitted' = 'red',
'Forecast' = 'darkblue')) +
labs(color = 'Legend') +
theme_classic() +
ylab("some data") +
xlab("Date") +
labs(title = "chart showing a gap",
subtitle = "Shaded area is the 95% CI from the ARIMA")
And the chart is below
I know there is a geom_forecast in ggplot now but I would like to build this particular plot the way i'm doing it. Although if there's no other solution to the gap then i'll use the geom_forecast.
Closing the gap requires providing a data point in the meancast column for the blank area. I guess it makes sense just to use the value for the last "real" data point.
# Grab the y-value corresponding to the date just before the gap.
last_data_value = frame[frame$date == as.Date("2016-04-09"), "data"]
# Construct a one-row data.frame.
extra_row = data.frame(data=NA_real_,
fits=NA_real_,
meancast=last_data_value,
uppercast95=last_data_value,
lowercast95=last_data_value,
date=as.Date("2016-04-09"))
# Add extra row to the main data.frame.
frame = rbind(frame, extra_row)

R heat map of annual time series by entire year

I am trying to make a heatmap of several years of daily averages of salinity in an estuary in R.
I would like the format to include month on the x-axis and year on the y-axis, so each Jan 1st directly above another Jan. 1st. In other words, NOT like a typical annual calendar style (not like this: http://www.r-bloggers.com/ggplot2-time-series-heatmaps/).
So far I have only been able to plot by the day of the year using:
{r}
d <- read.xlsx('GC salinity transposed.xlsx', sheetName = "vert-3", header = TRUE, stringsAsFactors = FALSE, colClasses = c("integer", "integer", "numeric"), endRow = 2254)
{r}
ggplot(d, aes(x = Day.Number, y = Year)) + geom_tile(aes(fill = Salinity)) + scale_fill_gradient(name = 'Mean Daily Salinity', low = 'white', high = 'blue') + theme(axis.title.y = element_blank())
And get this:
heat map not quite right
Could someone please tell me a better way to do this - a way that would include month, rather than day of the year along the x-axis? Thank you. New to R.
The lubridate package comes in handy for stuff like this. Does this code do what you want? I'm assuming you only have one salinity reading per month and there's no need to average across multiple values in the same month.
library(lubridate)
library(ggplot2)
# Define some data
df <- data.frame(date = seq.Date(from = as.Date("2015-01-01"), by = 1, length.out = 400),
salinity = runif(400, min=5, max=7))
# Create fields for plotting
df$day <- paste0(ifelse(month(df$date)<10,"0",""),
month(df$date),
"-",
ifelse(day(df$date)<10,"0",""),
day(df$date))
df$month <- paste0(ifelse(month(df$date)<10,"0",""),
month(df$date))
df$year <- year(df$date)
library(lubridate)
library(ggplot2)
# Define some data
df <- data.frame(date = seq.Date(from = as.Date("2015-01-01"), by = 1, length.out = 400),
salinity = runif(400, min=5, max=7))
# Create fields for plotting
df$day <- paste0(ifelse(month(df$date)<10,"0",""),
month(df$date),
"-",
ifelse(day(df$date)<10,"0",""),
day(df$date))
df$month <- paste0(ifelse(month(df$date)<10,"0",""),
month(df$date))
df$year <- year(df$date)
#Plot results by month
ggplot(data=df) +
geom_tile(aes(x = month, y = year, fill = salinity)) +
scale_y_continuous(breaks = c(2015,2016))
#Plot results by day
ggplot(data=df) +
geom_tile(aes(x = day, y = year, fill = salinity)) +
scale_y_continuous(breaks = c(2015,2016))
Results by month:
Results by day (do you really want this? It's very hard to read with 366 x-axis values):

How to Create a Graph of Statistical Time Series

I have data in the following format:
Date Year Month Day Flow
1 1953-10-01 1953 10 1 530
2 1953-10-02 1953 10 2 530
3 1953-10-03 1953 10 3 530
I would like to create a graph like this:
Here is my current image and code:
library(ggplot2)
library(plyr)
library(reshape2)
library(scales)
## Read Data
df <- read.csv("Salt River Flow.csv")
## Convert Date column to R-recognized dates
df$Date <- as.Date(df$Date, "%m/%d/%Y")
## Finds Water Years (Oct - Sept)
df$WY <- as.POSIXlt(as.POSIXlt(df$Date)+7948800)$year+1900
## Normalizes Water Years so stats can be applied to just months and days
df$w <- ifelse(month(df$Date) %in% c(10,11,12), 1903, 1904)
##Creates New Date (dat) Column
df$dat <- as.Date(paste(df$w,month(df$Date),day(df$Date), sep = "-"))
## Creates new data frame with summarised data by MonthDay
PlotData <- ddply(df, .(dat), summarise, Min = min(Flow), Tenth = quantile(Flow, p = 0.05), TwentyFifth = quantile(Flow, p = 0.25), Median = quantile(Flow, p = 0.50), Mean = mean(Flow), SeventyFifth = quantile(Flow, p = 0.75), Ninetieth = quantile(Flow, p = 0.90), Max = max(Flow))
## Melts data so it can be plotted with ggplot
m <- melt(PlotData, id="dat")
## Plots
p <- ggplot(m, aes(x = dat)) +
geom_ribbon(aes(min = TwentyFifth, max = Median), data = PlotData, fill = alpha("black", 0.1), color = NA) +
geom_ribbon(aes(min = Median, max = SeventyFifth), data = PlotData, fill = alpha("black", 0.5), color = NA) +
scale_x_date(labels = date_format("%b"), breaks = date_breaks("month"), expand = c(0,0)) +
geom_line(data = subset(m, variable == "Mean"), aes(y = value), size = 1.2) +
theme_bw() +
geom_line(data = subset(m, variable %in% c("Min","Max")), aes(y = value, group = variable)) +
geom_line(data = subset(m, variable %in% c("Ninetieth","Tenth")), aes(y = value, group = variable), linetype = 2) +
labs(x = "Water Year", y = "Flow (cfs)")
p
I am very close but there are some issues I'm having. First, if you can see a way to improve my code, please let me know. The main problem I ran into was that I needed two dataframes to make this graph: one melted, and one not. The unmelted dataframe was necessary (I think) to create the ribbons. I tried many ways to use the melted dataframe for the ribbons, but there was always a problem with the aesthetic length.
Second, I know to have a legend - and I want one, I need to have something in the aesthetics of each line/ribbon, but I am having trouble getting that to work. I think it would involve scale_fill_manual.
Third, and I don't know if this is possible, I would like to have each month label in between the tick marks, not on them (like in the above image).
Any help is greatly appreciated (especially with creating more efficient code).
Thank you.
Something along these lines might get you close with base:
library(lubridate)
library(reshape2)
# simulating data...
Date <- seq(as.Date("1953-10-01"),as.Date("2010-10-01"),by="day")
Year <- year(Date)
Month <- month(Date)
Day <- day(Date)
set.seed(1)
Flow <- rpois(length(Date), 2000)
Data <- data.frame(Date=Date,Year=Year,Month=Month,Day=Day,Flow=Flow)
# use acast to get it in a convenient shape:
PlotData <- acast(Data,Year~Month+Day,value.var="Flow")
# apply for quantiles
Quantiles <- apply(PlotData,2,function(x){
quantile(x,probs=c(1,.9,.75,.5,.25,.1,0),na.rm=TRUE)
})
Mean <- colMeans(PlotData, na.rm=TRUE)
# ugly way to get month tick separators
MonthTicks <- cumsum(table(unlist(lapply(strsplit(names(Mean),split="_"),"[[",1))))
# and finally your question:
plot(1:366,seq(0,max(Flow),length=366),type="n",xlab = "Water Year",ylab="Discharge",axes=FALSE)
polygon(c(1:366,366:1),c(Quantiles["50%",],rev(Quantiles["75%",])),border=NA,col=gray(.6))
polygon(c(1:366,366:1),c(Quantiles["50%",],rev(Quantiles["25%",])),border=NA,col=gray(.4))
lines(1:366,Quantiles["90%",], col = gray(.5), lty=4)
lines(1:366,Quantiles["10%",], col = gray(.5))
lines(1:366,Quantiles["100%",], col = gray(.7))
lines(1:366,Quantiles["0%",], col = gray(.7), lty=4)
lines(1:366,Mean,lwd=3)
axis(1,at=MonthTicks, labels=NA)
text(MonthTicks-15,-100,1:12,pos=1,xpd=TRUE)
axis(2)
The plotting code really isn't that tricky. You'll need to clean up the aesthetics, but polygon() is usually my strategy for shaded regions in plots (confidence bands, whatever).
Perhaps this will get you closer to what you're looking for, using ggplot2 and plyr:
library(ggplot2)
library(plyr)
library(lubridate)
library(scales)
df$MonthDay <- df$Date - years( year(df$Date) + 100 ) #Normalize points to same year
df <- ddply(df, .(Month, Day), mutate, MaxDayFlow = max(Flow) ) #Max flow on day
df <- ddply(df, .(Month, Day), mutate, MinDayFlow = min(Flow) ) #Min flow on day
p <- ggplot(df, aes(x=MonthDay) ) +
geom_smooth(size=2,level=.8,color="black",aes(y=Flow)) + #80% conf. interval
geom_smooth(size=2,level=.5,color="black",aes(y=Flow)) + #50% conf. interval
geom_line( linetype="longdash", aes(y=MaxDayFlow) ) +
geom_line( linetype="longdash", aes(y=MinDayFlow) ) +
labs(x="Month",y="Flow") +
scale_x_date( labels = date_format("%b") ) +
theme_bw()
Edit: Fixed X scale and X scale label
(Partial answer with base plotting function and not including the min, max, or mean.) I suspect you will need to construct a dataset before passing to ggplot, since that is typical for that function. I already do something similar and then pass the resulting matrix to matplot. (It doesn't do that kewl highlighting, but maybe ggplot can do it>
HDL.mon.mat <- aggregate(dfrm$Flow,
list( dfrm$Year + dfrm$Month/12),
quantile, prob=c(0.1,0.25,0.5,0.75, 0.9), na.rm=TRUE)
matplot(HDL.mon.mat[,1], HDL.mon.mat$x, type="pl")

Resources