Transforming data into time series - R

I have a CSV dataset that I would like to transform into time series data for time series analysis.
The data looks like this (there are additional columns, and there are 17,190 observations):
temp interval
10.0 2014-04-01 00:00:00
10.0 2014-04-01 00:15:00
10.0 2014-04-01 00:30:00
10.0 2014-04-01 00:45:00
7.8 2014-04-01 01:00:00
The interval column is in POSIXct format.
I would appreciate help with the code for transforming it into a time series, please.
Thank you

CSV stands for comma-separated values. The data shown in the question is not in that form. If we assume the data is a data frame DF (shown reproducibly in the Note at the end), the following code creates a zoo series z and also converts it to a ts series tt whose times are the number of seconds since 1970-01-01 00:00:00 UTC. See ?read.zoo for more information on that function. The zoo package also contains an entire vignette with many read.zoo examples.
z can be used for plotting, and tt may be useful if you are calling functions that only accept ts input.
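library(zoo)
# use the interval column as the time index, interpreted in the local time zone
z <- read.zoo(DF, index = "interval", tz = "")
# convert to a ts; the times become seconds since the epoch
tt <- as.ts(z)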
Note
Lines <- "
temp interval
10.0 2014-04-01 00:00:00
10.0 2014-04-01 00:15:00
10.0 2014-04-01 00:30:00
10.0 2014-04-01 00:45:00
7.8 2014-04-01 01:00:00"
# read into separate lines, trim whitespace from the ends and
# replace only the first space on each line with a comma so that
# the date and time stay together in the interval field
L <- sub(" ", ",", trimws(readLines(textConnection(Lines))))
DF <- read.csv(text = L)
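If the goal is a plain ts without zoo, a minimal base-R sketch is below. It assumes the readings are equally spaced, which it checks with diff(). It also chooses frequency = 96, meaning 96 fifteen-minute readings per day, so that seasonal methods treat one day as a cycle. That frequency is an assumption, not something stated in the question. The vectors interval and temp are a hand-typed copy of the five rows shown, with UTC assumed.

```r
# Hand-typed copy of the five rows shown in the question (UTC assumed to avoid DST gaps)
interval <- as.POSIXct(c("2014-04-01 00:00:00", "2014-04-01 00:15:00",
                         "2014-04-01 00:30:00", "2014-04-01 00:45:00",
                         "2014-04-01 01:00:00"), tz = "UTC")
temp <- c(10.0, 10.0, 10.0, 10.0, 7.8)

# ts() only stores a start and a frequency, so the spacing must be constant
gap_secs <- unique(as.numeric(diff(interval), units = "secs"))
stopifnot(length(gap_secs) == 1)

# 86400 seconds per day / 900 seconds per reading = 96 readings per day
obs_per_day <- 86400 / gap_secs
tt_daily <- ts(temp, frequency = obs_per_day)
print(tt_daily)
```

If some readings are missing, the spacing check fails. In that case, the gaps would need to be filled (for example with NA rows) before calling ts().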

If you read the CSV with read_csv() from the tidyverse, the interval column is parsed as POSIXct automatically.
dput output below:
library(tidyverse)
df <- structure(list(temp = c(10, 10, 10, 10, 7.8), interval = structure(c(1396310400,
1396311300, 1396312200, 1396313100, 1396314000), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), class = c("spec_tbl_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -5L), spec = structure(list(
cols = list(temp = structure(list(), class = c("collector_double",
"collector")), interval = structure(list(format = ""), class = c("collector_datetime",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
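You can then convert to zoo, using interval as the time index through order.by. Calling zoo() on the whole data frame would instead index the rows by row number and keep interval as an ordinary data column.
library(zoo)
z <- zoo(df$temp, order.by = df$interval)
class(z)
[1] "zoo"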

Related

How can I convert number to hh:mm:ss?

I have a column in a data frame that holds the number of seconds past midnight. How would I go about converting that number to a time displayed as hh:mm:ss? For instance:
hrsecs
1563
13088
14309
becomes
Time
00:26:03
03:38:08
03:58:29
Convert the seconds to a period with seconds_to_period() from lubridate, and then to a time with hms() from the hms package:
library(lubridate)
library(dplyr)
df1 <- df1 %>%
transmute(Time = hms::hms(seconds_to_period(hrsecs)))
Output:
df1
Time
1 00:26:03
2 03:38:08
3 03:58:29
data
df1 <- structure(list(hrsecs = c(1563L, 13088L, 14309L)),
class = "data.frame", row.names = c(NA,
-3L))
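A package-free alternative uses plain integer arithmetic. %/% and %% split the seconds into hours, minutes and seconds, and sprintf() pads each part to two digits. Unlike formatting a date-time, this keeps counting hours past 24 instead of wrapping. secs_to_hms is a hypothetical helper name.

```r
# Hypothetical helper: seconds past midnight -> "hh:mm:ss" string
secs_to_hms <- function(s) {
  h   <- s %/% 3600            # whole hours
  m   <- (s %% 3600) %/% 60    # whole minutes left over
  sec <- s %% 60               # remaining seconds
  sprintf("%02d:%02d:%02d", h, m, sec)
}

secs_to_hms(c(1563, 13088, 14309))
## [1] "00:26:03" "03:38:08" "03:58:29"
```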
1) Character output. Convert to POSIXct and then format. No packages are used.
x <- c(1563, 13088, 14309)
tt <- format(as.POSIXct("1970-01-01") + x, "%T"); tt
## [1] "00:26:03" "03:38:08" "03:58:29"
or
tt <- format(structure(x, class = c("POSIXct", "POSIXt"), tzone = "UTC"), "%T")
tt
## [1] "00:26:03" "03:38:08" "03:58:29"
2) times class output. If you want to be able to manipulate the times, the chron package's times class stores them internally as fractions of a day but displays them as times.
library(chron)
times(tt)
## [1] 00:26:03 03:38:08 03:58:29
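You may keep the character output from 1) and later need to do arithmetic without chron. In that case, the strings can be converted back to seconds in base R. hms_to_secs is a hypothetical helper, and it assumes every string has exactly the hh:mm:ss form.

```r
# Hypothetical helper: "hh:mm:ss" strings -> seconds past midnight
hms_to_secs <- function(tt) {
  parts <- do.call(rbind, strsplit(tt, ":", fixed = TRUE))  # one row per time
  storage.mode(parts) <- "numeric"                           # "03" -> 3
  parts[, 1] * 3600 + parts[, 2] * 60 + parts[, 3]
}

secs <- hms_to_secs(c("00:26:03", "03:38:08", "03:58:29"))
secs
## [1]  1563 13088 14309
as.difftime(secs, units = "secs")  # difftime object for time arithmetic
```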

Separate columns with space

df<-separate(df$ALISVERIS_TARIHI, c("key","value")," ", extra=merge)
Error in UseMethod("separate_") :
no applicable method for 'separate_' applied to an object of class "character"
"20190901" how can I separate this into 3 columns like 2019 09 01?
If you want to separate the 1st column based on number of characters you can use extract as -
df <- tidyr::extract(df, ALISVERIS_TARIHI, c('year', 'month', 'day'), '(.{4})(..)(..)')
df
# year month day a
#1 2019 09 01 a
#2 2019 09 08 b
The same pattern can be used with strcapture in base R:
data <- strcapture('(.{4})(..)(..)', df$ALISVERIS_TARIHI,
proto = list(year = integer(), month = integer(), day = integer()))
data
It is easier to help if you provide data in a reproducible format:
df <- data.frame(ALISVERIS_TARIHI = c('20190901', '20190908'), a = c('a', 'b'))
We could also use read.table from base R after converting the strings to Date:
cbind(read.table(text = as.character(as.Date(df$ALISVERIS_TARIHI,
format = "%Y%m%d")), sep="-", header = FALSE,
col.names = c("year", "month", "day")), df['a'])
year month day a
1 2019 9 1 a
2 2019 9 8 b
data
df <- structure(list(ALISVERIS_TARIHI = c("20190901", "20190908"),
a = c("a", "b")), class = "data.frame", row.names = c(NA,
-2L))
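A further base-R option, with no pattern matching, is substr(). It works because the year, month and day always occupy fixed character positions in a yyyymmdd string. Unlike the read.table approach, this keeps the leading zeros because the pieces stay character. stringsAsFactors = FALSE is set so the sketch behaves the same in R versions before 4.0.

```r
df <- data.frame(ALISVERIS_TARIHI = c('20190901', '20190908'), a = c('a', 'b'),
                 stringsAsFactors = FALSE)

# Fixed positions: characters 1-4 = year, 5-6 = month, 7-8 = day
x <- df$ALISVERIS_TARIHI
res <- data.frame(year  = substr(x, 1, 4),
                  month = substr(x, 5, 6),
                  day   = substr(x, 7, 8),
                  a     = df$a,
                  stringsAsFactors = FALSE)
res
```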

Find closest date in a df to a date in another df and subtract the difference

I have 2 dataframes:
df1<-structure(list(id = c(1, 2), date = structure(c(1483636800, 1485192000
), class = c("POSIXct", "POSIXt"), tzone = "")), class = "data.frame", row.names = c(NA,
-2L))
df2<-structure(list(id.1 = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), sunrise = structure(c(1483617946, 1485198384,
1485205584), class = c("POSIXct", "POSIXt"), tzone = "")), class = "data.frame", row.names = c(NA,
-3L))
df1:
id date
1 1 2017-01-05 12:20:00
2 2 2017-01-23 12:20:00
df2:
id.1 sunrise
1 A 2017-01-05 07:05:46
2 B 2017-01-23 14:06:24
3 C 2017-01-23 16:06:24
I would like to find the closest sunrise in df2 to each date in df1, calculate the time difference in hours, and store it in a new column "closest" in df1. The goal is the time until sunrise: a negative value indicates a time after sunrise, and a positive value a time before sunrise.
df1 is very large (more than 100 million rows), so the solution needs to be fast and memory-efficient. The solution I found cannot handle a dataset of this size, not even for a single column. It returns "Error: vector memory exhausted (limit reached?)", and my attempts to fix this have so far been unsuccessful.
library(data.table)
sDTA <- data.table(df1)[2]
rDTB <- data.table(df2)
# rolling join: for each date, take the nearest sunrise
try1 <- sDTA[, closest := rDTB[sDTA, on = .(sunrise = date), roll = "nearest", x.sunrise]][]
try1$Time_until_sunrise <- difftime(try1$closest, try1$date, units = "hours")
I previously ran into exhausted-memory errors when using aggregate(), but was able to work around them by switching to dplyr equivalents. Perhaps that could be a solution here too.
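The thread shown here has no answer. One approach, not taken from the thread, is to sort the sunrise times once and then use base R's findInterval(), a binary search. For every date, it locates the sunrise just before and just after it, and the sketch keeps whichever is closer. It is vectorised and avoids materialising a join. However, it still allocates a few numeric vectors as long as df1, so for 100 million rows you may still need to process df1 in chunks. nearest_time is a hypothetical helper name, and UTC is assumed for the example times.

```r
# Hypothetical helper: nearest element of `ref` for every element of `x`
nearest_time <- function(x, ref) {
  ref <- sort(ref)
  xn <- as.numeric(x)
  rn <- as.numeric(ref)
  i  <- findInterval(xn, rn)         # index of the last ref <= x (0 if none)
  lo <- pmax(i, 1L)                   # candidate at or before x
  hi <- pmin(i + 1L, length(rn))      # candidate after x
  ref[ifelse(abs(rn[hi] - xn) < abs(xn - rn[lo]), hi, lo)]
}

df1 <- data.frame(id = c(1, 2),
                  date = as.POSIXct(c(1483636800, 1485192000),
                                    origin = "1970-01-01", tz = "UTC"))
sunrise <- as.POSIXct(c(1483617946, 1485198384, 1485205584),
                      origin = "1970-01-01", tz = "UTC")

df1$closest <- nearest_time(df1$date, sunrise)
# Positive = sunrise still ahead, negative = sunrise already passed
df1$time_until_sunrise <- as.numeric(difftime(df1$closest, df1$date,
                                              units = "hours"))
df1
```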

Convert the date into an individual date and time in R

I would like to split a date-time that I have in R into a separate date and time. At the moment the column is of class POSIXct.
An example is given here:
"2019-03-29 20:42:07"
I want the date in one column and the corresponding time in another. I have found something similar here, but it doesn't answer my question.
Many thanks
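If the column is of class POSIXct, create two new columns: coerce to Date with as.Date for the date part and use format for the time part. Pass tz = "" to as.Date so that the local calendar date is used. Depending on the R version, as.Date on a POSIXct with an empty tzone may default to the UTC date, which can be a day off.
df1 <- transform(df1, date = as.Date(datetime, tz = ""), time = format(datetime, "%T"))
df1
# datetime date time
#1 2019-03-29 20:42:07 2019-03-29 20:42:07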
data
df1 <- structure(list(datetime = structure(1553910127, class = c("POSIXct",
"POSIXt"), tzone = "")), class = "data.frame", row.names = c(NA,
-1L))
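The date part depends on the time zone used for the conversion, because the same instant can fall on different calendar days in UTC and in local time. The small sketch below shows how to make that choice explicit. It assumes the data were recorded in America/New_York, which the question does not say.

```r
# Assumed time zone for illustration; substitute the zone your data use
datetime <- as.POSIXct("2019-03-29 20:42:07", tz = "America/New_York")

as.Date(datetime, tz = "UTC")               # calendar date in UTC: 2019-03-30
as.Date(datetime, tz = "America/New_York")  # local calendar date: 2019-03-29
format(datetime, "%H:%M:%S")                # local clock time: "20:42:07"
```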

Subsetting data frame with multiple date conditions for ranges in between

I need the subsets of rows that fall between multiple pairs of dates.
Example data frame:
testdf <- data.frame(short_date = seq(as.Date("2007-03-01"),
as.Date("2008-09-01"), by = 'day'))
An example data frame holding the date ranges:
dates_cut <- structure(list(emergence = structure(c(13627, 13997), class = "Date"), disease_onset = structure(c(13694, 14062), class = "Date")), .Names = c("emergence", "disease_onset"), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"))
Obviously this is just a sample; there are a number of years for which I need the subset of data between the emergence and disease_onset dates.
This works for one data range:
library(dplyr)
testdf %>% filter(short_date >= dates_cut$emergence[1], short_date <= dates_cut$disease_onset[1])
The problem is when there are multiple date ranges.
Thanks.
One option would be to lapply over the rows of dates_cut and then store each subset in a list. After that you can rbind them all together with do.call:
# $ extracts a plain Date vector; [i, "col"] on a tibble would return a 1x1 tibble
subsets <- lapply(seq_len(nrow(dates_cut)), function(i) {
testdf[which(testdf$short_date >= dates_cut$emergence[i] &
testdf$short_date <= dates_cut$disease_onset[i]), , drop = FALSE]})
res <- do.call(rbind, subsets)
head(res)
# short_date
#55 2007-04-24
#56 2007-04-25
#57 2007-04-26
#58 2007-04-27
#59 2007-04-28
#60 2007-04-29
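If the ranges can overlap, stacking subsets with rbind repeats the rows that fall in more than one range. An alternative, still in base R, is to build one logical mask by OR-ing a condition per range with Reduce() and then subset once. The data frames are rebuilt below as plain data frames rather than tibbles, so the sketch runs on its own.

```r
testdf <- data.frame(short_date = seq(as.Date("2007-03-01"),
                                      as.Date("2008-09-01"), by = "day"))
dates_cut <- data.frame(
  emergence     = as.Date(c(13627, 13997), origin = "1970-01-01"),
  disease_onset = as.Date(c(13694, 14062), origin = "1970-01-01"))

# One logical vector per range, combined with |, so each row is kept at most once
mask <- Reduce(`|`, lapply(seq_len(nrow(dates_cut)), function(i)
  testdf$short_date >= dates_cut$emergence[i] &
    testdf$short_date <= dates_cut$disease_onset[i]))

res2 <- testdf[mask, , drop = FALSE]
head(res2)
```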
