Errors when mixing ddply and interval - r

I've been trying to calculate intervals for individuals and have run into a weird error. Specifically, in this code:
library(lubridate)
library(tidyverse)
library(plyr)
df<-tibble(dates=mdy(c("2/20/20","2/25/20","3/1/20","3/11/20","3/20/20")),recips=c("x","x","a","a","a"),treatment=c("T","P","T","P","P"),eventtype=c("a","real","y","z","real"))
df%>%mutate(window=interval(start=dates,end=dates+weeks(2)))
ddply(df,.(recips),mutate,window=interval(start=dates,end=dates+weeks(2)))
The last line throws an error that the second-to-last line doesn't. Any tips?

The issue is likely the class of the object returned by interval() (lubridate's S4 Interval class), which ddply() does not handle. One option is to convert it to character with as.character():
plyr::ddply(df, c("recips"), plyr::mutate,
window = as.character(interval(start = dates, end = dates + weeks(2))))
-output
# dates recips treatment eventtype window
#1 2020-03-01 a T y 2020-03-01 UTC--2020-03-15 UTC
#2 2020-03-11 a P z 2020-03-11 UTC--2020-03-25 UTC
#3 2020-03-20 a P real 2020-03-20 UTC--2020-04-03 UTC
#4 2020-02-20 x T a 2020-02-20 UTC--2020-03-05 UTC
#5 2020-02-25 x P real 2020-02-25 UTC--2020-03-10 UTC
Based on the data shown, the interval is created from each element of 'dates' separately, so the grouping step is not needed:
library(dplyr)
df %>%
mutate(window = interval(start=dates,end=dates+weeks(2)))
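If you do need to work per group with ddply() or base R, one way to avoid the Interval class altogether (my assumption, not part of either answer) is to keep the window as two plain Date columns and only build an Interval when you actually need one. A minimal base R sketch on the question's data, using 14 days in place of weeks(2); the helper in_window() is hypothetical:

```r
# Same data as the question, built with base R only (no lubridate needed)
df <- data.frame(
  dates     = as.Date(c("2020-02-20", "2020-02-25", "2020-03-01", "2020-03-11", "2020-03-20")),
  recips    = c("x", "x", "a", "a", "a"),
  treatment = c("T", "P", "T", "P", "P"),
  eventtype = c("a", "real", "y", "z", "real")
)

# Store the two-week window as start/end Date columns instead of an Interval
df <- transform(df, window_start = dates, window_end = dates + 14)

# Hypothetical helper: is date `d` inside each row's window (inclusive)?
in_window <- function(d, start, end) d >= start & d <= end

df
in_window(as.Date("2020-03-05"), df$window_start, df$window_end)
```

If you later need a real Interval, lubridate::interval(df$window_start, df$window_end) rebuilds it from the two columns.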

Related

attempt to use fct_collapse() with class Date, must be factor or character vector, not an s3 object

I have a dataset I want to plot, and the date, which will be my x-axis, needs some simplifying. Right now I have every single day from March 2020 to November 2022, but I want to use manually defined 6-month periods, with the leftover days as their own group. (This is my first question here, so let me know if more context is needed.)
My instinct was to use fct_collapse(), but I get this error:
.f must be a factor or character vector, not an S3 object with class Date
I understand this is because my column by_date_total$date has class Date.
I don't see a forcats function that would work. Is my only option to convert the Date to another class and then convert it back? If I do convert it, how will the groups I defined be read? I saw another answer that used as.data.frame to coerce the Date into a character class, but once it is character I can no longer use ('%y-%m-%d - %y-%m-%d'), though I guess that never worked in the first place.
My data frame, by_date_total:
my dataframe by_date_total:
date total_deaths total_cases
<date> <dbl> <dbl>
2020-03-15 68 3595
2020-03-16 91 4502
2020-03-17 117 5901
2020-03-18 162 8345
2020-03-19 212 12387
2020-03-20 277 17998
2020-03-21 359 24507
This is what I tried that produced the error:
plot_by_date <- by_date_total%>%
mutate(
date2 =
fct_collapse(date,
'6 months' = c("2020-03-15" - "2020-09-14"),
'12 months' = c("2021-09-15" - "2021-03-14"),
'18 months' = c("2022-03-15" - "2022-09-14"),
'18 months+' = c("2022-09-15" - "2022-11-14"))
)
plot_by_date
I left out the rest of the ggplot(aes()) code because I want to verify this step works first.
I also tried converting the date to character class with the code below and then running the fct_collapse() code above again. That gives this error:
non-numeric argument to binary operator
plot_by_date <- as_data_frame(by_date_total) %>%
rename(Date = date) %>%
mutate(Date = str_replace_all(Date, "\\D", "-"),
Date = as.character(Date))
plot_by_date
case_when() from dplyr is a good alternative:
library(dplyr)
data.frame(date=as.Date(c("2020-03-16", "2020-03-14", "2021-09-16", "2022-03-16", "2022-09-16", "2022-11-15"))) %>%
mutate(date2 = case_when(date >= "2022-09-15" ~ "18+ months",
date >= "2022-03-15" ~ "18 months",
date >= "2021-09-15" ~ "12 months",
date >= "2020-03-15" ~ "6 months",
TRUE ~ "other"))
# date date2
#1 2020-03-16 6 months
#2 2020-03-14 other
#3 2021-09-16 12 months
#4 2022-03-16 18 months
#5 2022-09-16 18+ months
#6 2022-11-15 18+ months
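For reference, the non-numeric argument to binary operator error comes from expressions like "2020-03-15" - "2020-09-14", which try to subtract one string from another. If you want a factor (what fct_collapse() would have returned) rather than a character column, base R's cut() can bin the dates directly. A sketch, assuming the cut-offs from the case_when() answer are the ones you want; each bin is closed on the left, matching date >= cutoff:

```r
# Dates to classify (same values as the case_when() example)
dates <- as.Date(c("2020-03-16", "2020-03-14", "2021-09-16",
                   "2022-03-16", "2022-09-16", "2022-11-15"))

# Cut-off dates taken from the case_when() answer
cutoffs <- as.Date(c("2020-03-15", "2021-09-15", "2022-03-15", "2022-09-15"))

# cut() on the numeric day counts; -Inf/Inf catch everything before/after
date2 <- cut(
  as.numeric(dates),                                # days since 1970-01-01
  breaks = c(-Inf, as.numeric(cutoffs), Inf),
  labels = c("other", "6 months", "12 months", "18 months", "18+ months"),
  right  = FALSE                                    # [start, next start), like date >= cutoff
)

data.frame(date = dates, date2 = date2)
```

Because the levels come out in chronological order, ggplot2 should place them on a discrete x-axis in that order without releveling.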

Trends and seasonality in Multiple Time Series (in R)

Over the past few days I have been struggling to handle my data. The problem is that none of the information I find online or in books suits my data.
My original data has 100+ columns of time series (independent of each other), each covering 48 months, from 08/2017 to 07/2021.
The objective is to obtain a value/metric representing the trend/seasonality of each time series, so I can then compare them.
Below are a data sample and two approaches I tried that failed.
Data sample (only 6 columns, named in order from 287 to 293):
287 288 289 290 292 293
2017-08-01 0.1613709 0.09907194 0.2542814 0.2179386 0.08020622 0.07926023
2017-09-01 0.1774719 0.10227714 0.2211257 0.1979846 0.09384094 0.10182659
2017-10-01 0.1738235 0.11191972 0.2099357 0.1930938 0.08038543 0.09304474
2017-11-01 0.1999949 0.14005038 0.2282944 0.2140095 0.08814765 0.10820706
2017-12-01 0.2203560 0.16408010 0.1864422 0.1890152 0.08735655 0.11958204
2018-01-01 0.2728642 0.22230381 0.1906515 0.1954573 0.10269819 0.13728082
2018-02-01 0.2771547 0.24142554 0.2287340 0.2431592 0.12353792 0.15428189
2018-03-01 0.2610135 0.24747148 0.2631311 0.2862447 0.18993516 0.17344621
2018-04-01 0.3502901 0.32087711 0.3012136 0.3339466 0.18706540 0.20857209
2018-05-01 0.3669179 0.36063092 0.3789247 0.3781572 0.18566273 0.20633488
2018-06-01 0.2643827 0.27359616 0.3415491 0.3172041 0.19025036 0.18735599
2018-07-01 0.2335092 0.29352583 0.3298348 0.2986179 0.17155325 0.15914827
2018-08-01 0.1994154 0.24043388 0.2868625 0.2659566 0.16226752 0.14772256
2018-09-01 0.1709875 0.20753322 0.2648888 0.2465150 0.15494714 0.14099699
2018-10-01 0.1843677 0.20504727 0.2600666 0.2480716 0.14583226 0.13660546
2018-11-01 0.2662550 0.23209503 0.1921081 0.2067601 0.14891306 0.14775722
2018-12-01 0.3455008 0.25827029 0.1825465 0.2222157 0.15189449 0.15854924
2019-01-01 0.3562984 0.28744854 0.1726661 0.2381863 0.15497530 0.16970100
2019-02-01 0.3596556 0.29504905 0.2190216 0.2532990 0.16528823 0.17614880
2019-03-01 0.3676633 0.30941445 0.2663822 0.3146126 0.19225333 0.19722699
2019-04-01 0.3471219 0.32011859 0.3318789 0.3620176 0.21693162 0.21269362
2019-05-01 0.3391499 0.33623537 0.3498372 0.3514615 0.22655705 0.21467237
2019-06-01 0.2134116 0.23256447 0.3097683 0.2937520 0.20671346 0.18182811
2019-07-01 0.1947303 0.25061919 0.3017159 0.2840877 0.16773642 0.12524420
2019-08-01 0.1676979 0.23042951 0.2933951 0.2741012 0.17294869 0.14598469
2019-09-01 0.1574564 0.20590697 0.2507077 0.2448338 0.16662829 0.14514487
2019-10-01 0.1670441 0.21569649 0.2239352 0.2349953 0.15196066 0.14107334
2019-11-01 0.2314212 0.23944840 0.1962703 0.2248290 0.16566737 0.18157745
2019-12-01 0.2937217 0.26243412 0.2524490 0.2844418 0.17893194 0.22077498
2020-01-01 0.3023854 0.28244002 0.2816947 0.3094329 0.16686343 0.22517501
2020-02-01 0.3511840 0.30870934 0.3109404 0.3344240 0.15479491 0.22957504
2020-03-01 0.3968343 0.33328386 0.3382992 0.3578028 0.14350501 0.23369119
2020-04-01 0.3745884 0.34262505 0.3675449 0.3827939 0.19862225 0.23809122
2020-05-01 0.3530601 0.35166492 0.3709603 0.3476905 0.25196152 0.24234931
2020-06-01 0.2282214 0.20867654 0.3517663 0.3336991 0.24879937 0.22456414
2020-07-01 0.2057477 0.21648387 0.3331914 0.3201591 0.20879761 0.18008671
2020-08-01 0.2000177 0.19419089 0.3040352 0.2979807 0.19359850 0.16924703
2020-09-01 0.1848961 0.19882785 0.2737280 0.2814912 0.17682968 0.15218477
2020-10-01 0.3177567 0.22982973 0.2646506 0.2804482 0.20588015 0.20085790
2020-11-01 0.3710144 0.28390520 0.2552706 0.2793703 0.18294126 0.15860050
2020-12-01 0.3783443 0.27966508 0.2316715 0.2586552 0.17646898 0.17848388
2021-01-01 0.3458173 0.25866979 0.2361880 0.2659490 0.17908497 0.18354894
2021-02-01 0.3604397 0.27641854 0.2407045 0.2732429 0.19147607 0.18462597
2021-03-01 0.3736471 0.29244967 0.2685608 0.2918238 0.20266803 0.18559877
2021-04-01 0.3581235 0.31151629 0.3729554 0.3619925 0.22856252 0.20997657
2021-05-01 0.3513976 0.34056181 0.4269086 0.4071241 0.26643216 0.24394560
2021-06-01 0.2306971 0.29087504 0.3798922 0.2053191 0.25745857 0.23557143
2021-07-01 0.2577626 0.26011944 0.3343924 0.3452438 0.21910554 0.19516812
I tried to approach the issue with an xts object:
projsxts <- xts(x= projs_2017Jul_t, order.by = projs_2017Jul_time)
plot(projsxts, main="NDVI values for oak projects with ESR (fitted values)", xlab="Time", ylab="NDVI")
[Xts timeseries plot][1]
[1]: https://i.stack.imgur.com/M46YQ.png
I also tried the standard ts approach, using "mts" as the class for a multiple time series:
projs_2017Jul_ts1 <- ts(projs_2017Jul_t, frequency = 12, start=c(2017,8), end = c(2021,8), class = "mts", names = names2017)
print(projs_2017Jul_ts1)
I can obtain a summary, but when I try decompose() I get the error "time series has no or less than 2 periods", even though the series has 48 months.
If I try stl(), it says only univariate series are allowed.
describe2017 <- summary.matrix(projs_2017Jul_ts1) #########gives Min, Median, Mean, Max (...) Values per column
projs_2017Jul_ts1 <- decompose(projs_2017Jul_ts1)
*"Error in decompose(projs_2017Jul_ts1) : time series has no or less than 2 periods"*
decompose_ts <- stl(projs_2017Jul_ts1)
*Error in stl(projs_2017Jul_ts1) : only univariate series are allowed*
Any advice/suggestion on how to do this, please? Thank you !
Your basic approach is correct (create a time-series object, then use methods to decompose it). I was able to reproduce your error, which is good.
The stl function only takes a single (univariate) time series, but when you feed a single time series into stl, it gives a similar error to the one you got with decompose. I think your data are not long enough for the algorithm to decompose. Typically you need two full periods of data; in this case the period is likely supra-annual, and four years (48 months) is not long enough for the algorithm to identify the periodicity of the series.
See this post: error in stl, series has less than two periods (erroneous?)
## code I used to get your data into R
x <- readClipboard()  # readClipboard() is only available on Windows
ts.data <- read.table(text = x, header = TRUE)
## code to create a time-series object for 287
ts1 <- xts::xts(ts.data[,"X287"], order.by = as.Date(row.names(ts.data)))
## check the plot
plot(ts1)
plot of ts for 287
## use stl - Cleveland et al 1990 method for decomposing timeseries into seasonal, trend and remainder
stl.ts1 <- stl(ts1)
Error in stl(as.ts(ts1)) :
series is not periodic or has less than two period
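As a hedged alternative to the conclusion above: judging from the stats source, stl() raises this error when the series it receives has a frequency below 2 or no more than two full periods, so 48 monthly values may be accepted once each column is a plain ts with frequency = 12. stl() also requires an s.window argument. The sketch below runs on simulated data, not the question's, and summarises each series with the trend and seasonal strength measures from Hyndman and Athanasopoulos's Forecasting: Principles and Practice, F = max(0, 1 - Var(remainder) / Var(component + remainder)). That metric is one possible choice for comparing series, not something the question or answer prescribes; the helper strength() is hypothetical.

```r
set.seed(1)

# Simulated stand-in for the real data: 48 monthly values for 3 series,
# each with a linear trend, a 12-month cycle and some noise
n <- 48
m <- sapply(1:3, function(i) {
  0.2 + 0.001 * i * (1:n) + 0.05 * sin(2 * pi * (1:n) / 12) + rnorm(n, sd = 0.01)
})
colnames(m) <- c("287", "288", "289")

# Monthly multiple series starting 2017-08; frequency = 12 sets the period
series <- ts(m, start = c(2017, 8), frequency = 12)

# Hypothetical helper: decompose one series and return trend/seasonal strength
strength <- function(x) {
  fit  <- stl(x, s.window = "periodic")  # s.window has no default
  comp <- fit$time.series                # columns: seasonal, trend, remainder
  r    <- comp[, "remainder"]
  c(
    trend    = max(0, 1 - var(r) / var(comp[, "trend"] + r)),
    seasonal = max(0, 1 - var(r) / var(comp[, "seasonal"] + r))
  )
}

# stl() is univariate, so apply it column by column; one row per series
metrics <- t(sapply(colnames(series), function(col) strength(series[, col])))
metrics
```

Each row of metrics corresponds to one original column, with values between 0 and 1; to try it on real data, replace the simulated m with your own matrix (one row per month, one column per series).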

changing date/time variable to time that starts at 00:00:00 in r

I'm looking for a simple and correct way to convert a date/time (POSIXct) column into elapsed time that starts at 00:00:00.
I couldn't find an answer to this in R language, but if I overlooked one, please tell me :)
So I have this:
date/time            v1
2022-02-16 15:07:15  38937
2022-02-16 15:07:17  39350
And I would like this:
time      v1
00:00:00  38937
00:00:02  39350
Can somebody help me with this?
Thanks :)
You can calculate the difference in seconds between each datetime and the first one, add it to an arbitrary date at "00:00:00", and then format the result to include only the time. See the time column in the reprex below:
library(dplyr)
library(lubridate)
df %>%
mutate(
date = lubridate::ymd_hms(date),
seconds = as.numeric(date - first(date)),
time = format(
lubridate::ymd_hms("2022-01-01 00:00:00") + seconds,
format = "%H:%M:%S"
)
)
#> # A tibble: 2 × 4
#> date v1 seconds time
#> <dttm> <dbl> <dbl> <chr>
#> 1 2022-02-16 15:07:15 38937 0 00:00:00
#> 2 2022-02-16 15:07:17 39350 2 00:00:02
Created on 2022-03-30 by the reprex package (v2.0.1)
Note that this will be misleading if you ever have over 24 hours between two datetimes. In these cases you should probably include the date.
Data
df <- tibble::tribble(
~date, ~v1,
"2022-02-16 15:07:15", 38937,
"2022-02-16 15:07:17", 39350
)
You can subtract the first date/time from every date/time and convert the result to an hms time with the hms() function from the hms package.
library(dplyr)
library(hms)
df %>%
mutate(`date/time` = hms::hms(as.numeric(as.POSIXct(`date/time`) - as.POSIXct(first(`date/time`)))))
date/time v1
1 00:00:00 38937
2 00:00:02 39350
Note that with this method, time differences greater than one day are still reflected in the result, for example:
df <- data.frame(
`date/time` = c("2022-02-16 15:07:15", "2022-02-18 15:07:17"),
v1 = c(38937, 39350),
check.names = FALSE
)
df %>%
mutate(`date/time` = hms::hms(as.numeric(as.POSIXct(`date/time`) - as.POSIXct(first(`date/time`)))))
date/time v1
1 00:00:00 38937
2 48:00:02 39350
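If you want to avoid extra packages and still keep durations over 24 hours, the elapsed seconds can be formatted by hand. A base R sketch; elapsed_hms() is a hypothetical helper, and it assumes the timestamps are sorted, parsed in UTC (to avoid daylight-saving jumps), and whole seconds after rounding:

```r
# Hypothetical helper: elapsed time since the first timestamp as "HH:MM:SS",
# with the hours field allowed to exceed 24
elapsed_hms <- function(x) {
  secs <- round(as.numeric(difftime(x, x[1], units = "secs")))
  sprintf("%02d:%02d:%02d",
          as.integer(secs %/% 3600),          # whole hours
          as.integer((secs %% 3600) %/% 60),  # remaining minutes
          as.integer(secs %% 60))             # remaining seconds
}

df <- data.frame(
  `date/time` = c("2022-02-16 15:07:15", "2022-02-18 15:07:17"),
  v1 = c(38937, 39350),
  check.names = FALSE
)
df$time <- elapsed_hms(as.POSIXct(df$`date/time`, tz = "UTC"))
df
```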

Compute average over 20 second intervals and group by another column

I'm working with a large dataset of variables collected during elephant seal dives. I would like to analyze the data at a fine scale by binning it into 20-second intervals and taking the mean of each bin, so I can run further analyses on those intervals. However, I need to group the data by dive number so that I'm not binning information from separate dives together.
There are three methods I've tried so far:
period.apply() from xts, but I cannot group with this function.
split() to subset my data by dive number, but I can't find a way to then calculate the means of different columns over 20-second intervals within these subsets.
the openair package's timeAverage(), but I keep getting an error (see code below).
Below is what the data looks like, and the code I've tried. I would like the means of Depth, MSA, rate_s, and HR for each 20-second window, grouped by diveNum and, ideally, also by D_phase.
> head(seal_dives)
datetime seal_ID Depth MSA D_phase diveNum rate_s HR
1 2018-04-06 14:47:51 Congaree 4.5 0.20154042 D 1 NA 115.3846
2 2018-04-06 14:47:51 Congaree 4.5 0.20154042 D 1 NA 117.6471
3 2018-04-06 14:47:52 Congaree 4.5 0.11496760 D 1 NA 115.3846
4 2018-04-06 14:47:52 Congaree 4.5 0.11496760 D 1 NA 122.4490
5 2018-04-06 14:47:53 Congaree 4.5 0.05935992 D 1 NA 113.2075
6 2018-04-06 14:47:53 Congaree 4.5 0.05935992 D 1 NA 113.2075
#openair package using timeaverage, results in error message
> library(openair)
> seal_20<-timeAverage(
seal_dives,
avg.time = "20 sec",
data.thresh = 0,
statistic = "mean",
type = c("diveNum","D_phase"),
percentile = NA,
start.date = NA,
end.date = NA,
vector.ws = FALSE,
fill = FALSE
)
Error in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
Can't find the variable(s) date
#converting to time series and using period.apply(), but can't find a way to group them by dive #, or use split() then convert to time series.
#create a time series data class from our data frame
> seal_dives$datetime<-as.POSIXct(seal_dives$datetime,tz="GMT")
> seal_xts <- xts(seal_dives, order.by=seal_dives[,1])
> seal_20<-period.apply(seal_xts$Depth, endpoints(seal_xts$datetime, "seconds", 20), mean)
#split data by dive # but don't know how to do averages over 20 seconds
> seal_split<-split(seal_dives, seal_dives$diveNum)
Maybe there is a magical way to do this that I haven't found on the internet yet, or maybe I'm just doing something wrong in one of my methods.
You can use the floor_date() function from lubridate to bin the data into 20-second intervals. Group by those bins together with diveNum and D_phase, then average the other columns using across():
library(dplyr)
library(lubridate)
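result <- seal_dives %>%
# seal_dives$datetime must already be POSIXct (see the as.POSIXct() step in the question)
group_by(diveNum, D_phase, datetime = floor_date(datetime, '20 sec')) %>%
summarise(across(c(Depth, MSA, rate_s, HR), ~ mean(.x, na.rm = TRUE)), .groups = 'drop')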
result
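The same idea works in base R without dplyr: floor each timestamp to the start of its 20-second bin, then aggregate() by dive, phase and bin. A sketch on a small made-up sample in the shape of seal_dives; only Depth and HR are included to keep it short (MSA and rate_s can be added inside cbind()), and a bin where a column is entirely NA gives NaN for that column:

```r
# Made-up sample with the same column layout as seal_dives
seal <- data.frame(
  datetime = as.POSIXct("2018-04-06 14:47:51", tz = "GMT") + c(0, 0, 1, 1, 25, 26, 60),
  diveNum  = c(1, 1, 1, 1, 1, 1, 2),
  D_phase  = "D",
  Depth    = c(4.5, 4.5, 4.5, 4.5, 10, 12, 20),
  HR       = c(115, 117, 115, 122, 100, NA, 90)
)

# Floor each timestamp to the start of its 20-second bin
secs <- as.numeric(seal$datetime)
seal$bin <- as.POSIXct(floor(secs / 20) * 20, origin = "1970-01-01", tz = "GMT")

# Mean of each column per dive, phase and bin; na.pass keeps rows with NAs,
# and mean(na.rm = TRUE) then drops them column by column
res <- aggregate(cbind(Depth, HR) ~ diveNum + D_phase + bin, data = seal,
                 FUN = function(v) mean(v, na.rm = TRUE), na.action = na.pass)
res
```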

Having trouble correctly producing time series plot

I am trying to plot a time series from an Excel file in RStudio. The file has a single column named 'Dates', which contains the datetimes of customer visits in the form 2/15/2014 6:17:22 AM. The datetimes were originally stored as character, and I converted them to a large POSIXct vector using lubridate:
tsData <- mdy_hms(fullUsage$Dates)
Which gives me a value:
POSIXct[1:25,354], format: "2018-04-13 10:18:14" "2018-04-14 13:27:11" .....
I then tried converting it into a time series object using the code below:
require(xts)
visitTimes.ts <- xts(tsData, start = 1, order.by=as.POSIXct(tsData))
plot(visitTimes.ts)
ts_plot(visitTimes.ts)
ts_info(visitTimes.ts)
I'm not 100% sure, but the plot seems to show the summed count of visits. I believe my problem may be in correctly indexing my data by date. Apologies in advance if this is a simple issue; I am still learning R. I have included a screenshot of my plot.
Yes, you are right: you need to provide both the date column (x-axis) and the value (y-axis).
Here's a simple example:
library(lubridate)
library(xts)
v1 <- data.frame(Date = mdy_hms(c("1-1-2020-00-00-00", "1-2-2020-00-00-00", "1-3-2020-00-00-00")), Value = c(1, 3, 6))
v2 <- xts(v1["Value"], order.by = v1[, "Date"])
plot(v2)
The first argument of xts() takes the data values (plotted on the y-axis); order.by takes the time index (the x-axis).
You need to count the number of events in each time period and plot these values on the y axis. You didn't provide enough data for a reproducible example, so I have created a small example. We'll use the tidyverse packages dplyr and lubridate to help us out here:
library(lubridate)
library(dplyr)
library(ggplot2)
set.seed(69)
fullUsage <- data.frame(Dates = as.POSIXct("2020-01-01") +
minutes(round(cumsum(rexp(10000, 1/25))))
)
head(fullUsage)
#> Dates
#> 1 2020-01-01 00:02:00
#> 2 2020-01-01 00:15:00
#> 3 2020-01-01 00:22:00
#> 4 2020-01-01 00:29:00
#> 5 2020-01-01 01:13:00
#> 6 2020-01-01 01:27:00
First of all, we will create columns that show the hour of day and the month that events occurred:
fullUsage$hours <- hour(fullUsage$Dates)
fullUsage$month <- floor_date(fullUsage$Dates, "month")
Now we can count the number of events per month and plot that number for each month:
fullUsage %>%
group_by(month) %>%
summarise(n = length(hours)) %>%
ggplot(aes(month, n)) +
geom_col()
And we can do the same for the hour of day:
fullUsage %>%
group_by(hours) %>%
summarise(n = length(hours)) %>%
ggplot(aes(hours, n)) +
geom_col() +
scale_x_continuous(breaks = 0:23) +
labs(x = "Hour of day")
Created on 2020-08-05 by the reprex package (v0.3.0)
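The counting step can also be done in base R, which may be handy if you prefer base graphics. A sketch with three made-up visit times standing in for fullUsage$Dates: cut() bins the timestamps by calendar month, and a factor with levels 0:23 keeps hours with no visits in the hourly counts:

```r
# Made-up visit times standing in for fullUsage$Dates
visits <- as.POSIXct(c("2020-01-01 06:17:22", "2020-01-15 08:00:00",
                       "2020-02-03 06:45:00"), tz = "UTC")

# Number of visits per calendar month
per_month <- table(cut(visits, breaks = "month"))

# Number of visits per hour of day, including hours with zero visits
per_hour <- table(factor(as.integer(format(visits, "%H")), levels = 0:23))

# Counts go on the y-axis; only draw when running interactively
if (interactive()) {
  barplot(per_month, ylab = "Visits")
  barplot(per_hour, xlab = "Hour of day", ylab = "Visits")
}
```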
