I have a column in a dataframe that is the number of seconds past midnight. How would I go about converting that number to a time displayed as hh:mm:ss? For instance:
hrsecs
1563
13088
14309
becomes
Time
00:26:03
03:38:08
03:58:29
Convert the seconds to a period (seconds_to_period) and use hms from the hms package
library(lubridate)
library(dplyr)
df1 <- df1 %>%
transmute(Time = hms::hms(seconds_to_period(hrsecs)))
-output
df1
Time
1 00:26:03
2 03:38:08
3 03:58:29
data
df1 <- structure(list(hrsecs = c(1563L, 13088L, 14309L)),
class = "data.frame", row.names = c(NA,
-3L))
1) character output Convert to POSIXct and then format. No packages are used.
x <- c(1563, 13088, 14309)
tt <- format(as.POSIXct("1970-01-01") + x, "%T"); tt
## [1] "00:26:03" "03:38:08" "03:58:29"
or
tt <- format(structure(x, class = c("POSIXct", "POSIXt"), tzone = "UTC"), "%T")
tt
## [1] "00:26:03" "03:38:08" "03:58:29"
2) times class output If you want to be able to manipulate the times then this will express them internally as fractions of a day but render them as times.
library(chron)
times(tt)
## [1] 00:26:03 03:38:08 03:58:29
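For comparison, the same conversion can be sketched with plain integer arithmetic, which avoids any time zone handling. This is only an illustration of the idea; the names h, m, s and hhmmss are made up for the example and are not part of the answers above.

```r
x <- c(1563, 13088, 14309)

# Split seconds past midnight into hours, minutes and seconds
h <- as.integer(x %/% 3600)
m <- as.integer((x %% 3600) %/% 60)
s <- as.integer(x %% 60)

# Zero-pad each part to two digits
hhmmss <- sprintf("%02d:%02d:%02d", h, m, s)
hhmmss
## [1] "00:26:03" "03:38:08" "03:58:29"
```

Unlike the POSIXct route, this does not wrap at 24 hours, so values of 86400 or more would show hour counts of 24 and above.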
Related
How can I add one day in a datetime in R?
structure(list(timestamp = structure(1667523601, tzone = "UTC", class = c("POSIXct",
"POSIXt"))), class = "data.frame", row.names = c(NA, -1L))
1) Add seconds If xx is the input then add one day's worth of seconds. This works because POSIXct is internally expressed in seconds and R does not use leap seconds (although it does have a builtin vector, .leap.seconds, available). No packages are used.
xx$timestamp
## [1] "2022-11-04 01:00:01 UTC"
xx$timestamp + 24 * 60 * 60
## [1] "2022-11-05 01:00:01 UTC"
2) POSIXlt Another way is to convert it to POSIXlt, add one to the mday component and then convert back. No packages are used.
lt <- as.POSIXlt(xx$timestamp)
lt$mday <- lt$mday + 1
as.POSIXct(lt)
## [1] "2022-11-05 01:00:01 UTC"
3) seq seq with by = "day" can be used. No packages are used.
do.call("c", lapply(xx$timestamp, function(x) seq(x, length = 2, by = "day")[2]))
## [1] "2022-11-05 01:00:01 UTC"
If we knew that there was only one element in timestamp it could be simplified to
seq(xx$timestamp, length = 2, by = "day")[2]
## [1] "2022-11-05 01:00:01 UTC"
4) lubridate The lubridate package supports adding days like this:
library(lubridate)
xx$timestamp + days(1)
## [1] "2022-11-05 01:00:01 UTC"
Note
The input shown in the question is:
xx <- structure(list(timestamp = structure(1667523601, tzone = "UTC",
class = c("POSIXct", "POSIXt"))),
class = "data.frame", row.names = c(NA, -1L))
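The approaches above agree for a UTC timestamp, but they can differ in a time zone that observes daylight saving time. The following sketch assumes the zone "America/New_York" purely for illustration (it is not part of the question) and compares adding 24 hours of seconds with stepping one calendar day across the end of DST on 2022-11-06.

```r
# Assumption: a DST-observing zone chosen only for illustration
tz <- "America/New_York"
x <- as.POSIXct("2022-11-05 12:00:00", tz = tz)

# Adding 24 hours of seconds: the wall-clock time shifts by one hour
plus_seconds <- x + 24 * 60 * 60

# Stepping by one calendar day: the wall-clock time is preserved
plus_day <- seq(x, by = "day", length.out = 2)[2]

format(plus_seconds, "%Y-%m-%d %H:%M:%S")
## [1] "2022-11-06 11:00:00"
format(plus_day, "%Y-%m-%d %H:%M:%S")
## [1] "2022-11-06 12:00:00"
```

Which result is wanted depends on whether "one day" means 24 elapsed hours or the same clock time on the next date.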
With lubridate
library(tidyverse)
library(lubridate)
df %>%
mutate(
timestamp = timestamp %m+% days(1)
)
# A tibble: 1 × 1
timestamp
<dttm>
1 2022-11-05 01:00:01
With lubridate you can add days with ...+days(1).
library(tidyverse)
dd <- structure(list(timestamp = structure(1667523601, tzone = "UTC", class = c(
"POSIXct",
"POSIXt"
))), class = "data.frame", row.names = c(NA, -1L))
dd |> mutate(timestamp + lubridate::days(1))
#> timestamp timestamp + lubridate::days(1)
#> 1 2022-11-04 01:00:01 2022-11-05 01:00:01
I have 2 dataframes:
df1<-structure(list(id = c(1, 2), date = structure(c(1483636800, 1485192000
), class = c("POSIXct", "POSIXt"), tzone = "")), class = "data.frame", row.names = c(NA,
-2L))
df2<-structure(list(id.1 = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), sunrise = structure(c(1483617946, 1485198384,
1485205584), class = c("POSIXct", "POSIXt"), tzone = "")), class = "data.frame", row.names = c(NA,
-3L))
df1:
id date
1 1 2017-01-05 12:20:00
2 2 2017-01-23 12:20:00
df2:
id.1 sunrise
1 A 2017-01-05 07:05:46
2 B 2017-01-23 14:06:24
3 C 2017-01-23 16:06:24
I would like to find the closest sunrise date in df2 to each of the dates in df1, then calculate the time difference (in hours) and put these values in a new column "closest" in df1. The goal is the time until sunrise, where a negative value indicates a time after sunrise and a positive value a time before sunrise.
My df1 is very large (> 100 million rows), so it's important that the solution is fast and memory-efficient. The solution I found cannot accommodate the size of my data set, not even on a single column (it returns "Error: vector memory exhausted (limit reached?)"). Attempts to remedy this issue have so far been unsuccessful.
library(data.table)
sDTA <- data.table(df1)[2]
rDTB <- data.table(df2)
# rolling join: for each date, look up the nearest sunrise
try1 <- sDTA[, closest := rDTB[sDTA, on = .(sunrise = date), roll = "nearest", x.sunrise]][]
try1$Time_until_sunrise <- difftime(try1$closest, try1$date, units = "hours")
I previously ran into exhausted-memory errors when using aggregate(), and was able to work around them by switching to dplyr equivalents. Perhaps something similar could work here.
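One possible alternative, sketched here in base R rather than with the data.table rolling join from the question, is to sort the sunrise times once and locate each date with findInterval, which works on plain numeric vectors. The function name nearest_sunrise_diff is hypothetical, and whether this fits in memory for 100 million rows is an assumption that would need testing (the input could also be processed in chunks).

```r
# Hypothetical helper: hours from each date to its nearest sunrise.
# Positive = before sunrise, negative = after sunrise.
nearest_sunrise_diff <- function(date, sunrise) {
  s <- sort(as.numeric(sunrise))
  d <- as.numeric(date)
  i <- findInterval(d, s)          # index of the last sunrise <= date (0 if none)
  lo <- pmax(i, 1L)                # candidate at or before the date
  hi <- pmin(i + 1L, length(s))    # candidate after the date
  # Pick whichever candidate is closer; ties go to the earlier sunrise
  nearest <- ifelse(abs(d - s[lo]) <= abs(s[hi] - d), s[lo], s[hi])
  (nearest - d) / 3600
}

# Data from the question, as seconds since the epoch
date    <- as.POSIXct(c(1483636800, 1485192000), origin = "1970-01-01", tz = "UTC")
sunrise <- as.POSIXct(c(1483617946, 1485198384, 1485205584),
                      origin = "1970-01-01", tz = "UTC")

closest <- nearest_sunrise_diff(date, sunrise)
round(closest, 4)
## [1] -5.2372  1.7733
```

The first date is about 5.24 hours after its nearest sunrise and the second about 1.77 hours before it, matching the sign convention described above.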
I would like to split a datetime that I have in R into a separate date and time. At the moment the column is of class POSIXct.
An example is given here:
"2019-03-29 20:42:07"
I want the date to be in one column and the time of that date in a corresponding column. I have found something similar here, but it doesn't answer my question.
Many thanks
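If the column is of POSIXct class, create two new columns: the date part by coercing to Date (as.Date) and the time part with format. Note that as.Date on a POSIXct value can use UTC rather than the local time zone, which may shift the date by a day, so the date is taken from the formatted local value here.
df1 <- transform(df1, date = as.Date(format(datetime, "%Y-%m-%d")), time = format(datetime, "%T"))
df1
# datetime date time
#1 2019-03-29 20:42:07 2019-03-29 20:42:07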
data
df1 <- structure(list(datetime = structure(1553910127, class = c("POSIXct",
"POSIXt"), tzone = "")), class = "data.frame", row.names = c(NA,
-1L))
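The effect of the time zone on the date part can be illustrated with a small sketch. The zone "America/Chicago" is an assumption chosen only so that the example is reproducible; it is not stated in the question.

```r
# Assumption: an explicit zone so the result does not depend on the local machine
x <- as.POSIXct("2019-03-29 20:42:07", tz = "America/Chicago")

d_local <- as.Date(x, tz = "America/Chicago")  # date as seen in that zone
d_utc   <- as.Date(x, tz = "UTC")              # same instant, viewed in UTC

d_local
## [1] "2019-03-29"
d_utc
## [1] "2019-03-30"
format(x, "%T")
## [1] "20:42:07"
```

Passing tz explicitly to as.Date keeps the date column consistent with the time column produced by format.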
I have a csv data set that I would like to transform into time series data for time series analysis.
The data looks like this (there are additional columns, and there are 17,190 obs.):
temp interval
10.0 2014-04-01 00:00:00
10.0 2014-04-01 00:15:00
10.0 2014-04-01 00:30:00
10.0 2014-04-01 00:45:00
7.8 2014-04-01 01:00:00
The interval column is in POSIXct format.
I would appreciate help with the code for transforming it into time series please.
Thank you
CSV stands for comma separated values. The data shown in the question is not in that form but if we assume that the data is a data frame DF shown reproducibly in the Note at the end then the following code gives a zoo series z and also converts it to a ts series tt where the times are the number of seconds since 1970-01-01 00:00:00. See ?read.zoo for more information on that function. Also the zoo package contains an entire vignette with many read.zoo examples.
z can be used for plotting and tt might be useful if you are using functions that only accept ts class input.
library(zoo)
z <- read.zoo(DF, index = "interval", tz = "")
tt <- as.ts(z)
Note
Lines <- "
temp interval
10.0 2014-04-01 00:00:00
10.0 2014-04-01 00:15:00
10.0 2014-04-01 00:30:00
10.0 2014-04-01 00:45:00
7.8 2014-04-01 01:00:00"
# read into separate lines, trim whitespace from ends and
# replace the first run of spaces (between temp and interval) with a comma
L <- sub(" +", ",", trimws(readLines(textConnection(Lines))))
DF <- read.csv(text = L)
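If a plain ts object with a daily seasonal period is wanted instead, one option is to derive the frequency from the spacing of the timestamps. This is a sketch that assumes the series is perfectly regular with no gaps and that the times are in UTC; the names step, per_day and temp_ts are made up for the example.

```r
# Same five rows as in the question, built directly (UTC is an assumption)
DF <- data.frame(
  temp = c(10, 10, 10, 10, 7.8),
  interval = as.POSIXct(c("2014-04-01 00:00:00", "2014-04-01 00:15:00",
                          "2014-04-01 00:30:00", "2014-04-01 00:45:00",
                          "2014-04-01 01:00:00"), tz = "UTC")
)

# Spacing between observations in seconds; must be a single value
step <- unique(diff(as.numeric(DF$interval)))
stopifnot(length(step) == 1)

# Number of observations per day, used as the seasonal frequency
per_day <- 86400 / step
temp_ts <- ts(DF$temp, frequency = per_day)
frequency(temp_ts)
## [1] 96
```

With gaps or irregular spacing the stopifnot check fails, in which case the zoo series is the safer representation.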
If you read in the csv with read_csv in tidyverse you'll get the interval column in POSIXct class automatically.
dput below:
library(tidyverse)
df <- structure(list(temp = c(10, 10, 10, 10, 7.8), interval = structure(c(1396310400,
1396311300, 1396312200, 1396313100, 1396314000), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), class = c("spec_tbl_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -5L), spec = structure(list(
cols = list(temp = structure(list(), class = c("collector_double",
"collector")), interval = structure(list(format = ""), class = c("collector_datetime",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
You can then convert to zoo, using the interval column as the time index.
library(zoo)
z <- zoo(df$temp, order.by = df$interval)
class(z)
[1] "zoo"
I wrote a function that fills in a column based on a datetime column, using start and end dates passed as parameters, but I can't get it to work.
df is a data frame object.
create_gv <- function(df, s_ymd, e_ymd, char) {
df<-get(df)
for (i in (1:nrow(df))) {
ymd <- format(df[i,1],"%y%m%d")
if ((strptime(ymd,format = "%y%m%d") >= strptime(s_ymd,format = "%y%m%d") & strptime(ymd,format = "%y%m%d") <= strptime(e_ymd,format = "%y%m%d")) == TRUE) {
df$group_var[i]<-char
}
}
}
create_gv("example","171224","171224","D")
I get
> example
start_time group_var
1 2017-12-24 10:42:39 NA
2 2017-12-24 10:44:31 NA
3 2018-01-14 12:05:53 NA
4 2018-01-14 12:22:12 NA
Reproducible data frame named example here:
example <- structure(list(start_time = structure(c(1514112159, 1514112271, 1515931553, 1515932532), class = c("POSIXct", "POSIXt"), tzone = ""), group_var = c(NA, NA, NA, NA)), .Names = c("start_time", "group_var"), row.names = c(NA, -4L), class = "data.frame")
Desired output:
start_time group_var
1 2017-12-24 10:42:39 D
2 2017-12-24 10:44:31 D
3 2018-01-14 12:05:53 NA
4 2018-01-14 12:22:12 NA
From your description, my understanding is that you want to check whether the date in each row is between the start and end date (which are scalars), and update the value of group_var accordingly.
The lubridate package provides a set of tools that make it easy to work with dates. In order to compare dates you don't need to format them; format only affects how the dates are displayed. I have used the dplyr package, which allows you to easily perform data transformations.
To solve the problem, we use the dplyr::mutate function, which creates or transforms a column as a function of other columns. In this case, the date column in our dataset (start_time) is compared with the scalar start and end times in order to set the group_var variable.
library(lubridate)
library(magrittr)
df <- example  # the data frame from the question
char <- "D"
# Arbitrary start and end times for the purpose of the example; any values can be used.
s_ymd <- df$start_time[1] - 5000
e_ymd <- df$start_time[2] + 5000
# Flag rows whose start_time falls strictly between s_ymd and e_ymd
df %>% dplyr::mutate(group_var = ifelse(start_time > s_ymd & start_time < e_ymd, char, NA)) -> df
df
To use a function directly, write:
create_gv <- function(start_time, s_ymd, e_ymd, char){
g_var <- ifelse(start_time > s_ymd & start_time < e_ymd,
char, NA)
return(g_var)
}
df %>% dplyr::mutate(group_var = create_gv(start_time, !!s_ymd, !!e_ymd,
!!char))
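Here s_ymd, e_ymd and char are scalars (i.e. not columns in the data frame). Unquoting them with !! is optional: mutate would find them in the calling environment anyway, but !! forces that lookup so that a column with the same name cannot mask them. Note that mutate expects vectorized functions, which create_gv is because it relies on ifelse.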
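The original function does not work because it modifies a local copy of the data frame and returns nothing. A base R sketch that keeps the question's interface (dates passed as "%y%m%d" strings, inclusive bounds) might look as follows. Taking the data frame itself instead of its name and fixing the time zone to UTC are assumptions made so that the example is reproducible.

```r
# Vectorized version: takes the data frame, returns the modified copy
create_gv <- function(df, s_ymd, e_ymd, char) {
  d <- as.Date(format(df$start_time, "%Y-%m-%d"))  # date part as displayed
  s <- as.Date(s_ymd, format = "%y%m%d")
  e <- as.Date(e_ymd, format = "%y%m%d")
  in_range <- d >= s & d <= e                      # inclusive on both ends
  df$group_var[in_range] <- char
  df
}

# Data from the question, with the time zone fixed to UTC (assumption)
example <- data.frame(
  start_time = as.POSIXct(c(1514112159, 1514112271, 1515931553, 1515932532),
                          origin = "1970-01-01", tz = "UTC"),
  group_var = c(NA, NA, NA, NA)
)

# The result must be assigned back, since R copies on modification
example <- create_gv(example, "171224", "171224", "D")
example$group_var
## [1] "D" "D" NA  NA
```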