I'm having trouble converting character values into date-times (hours + minutes). I have the following code:
start <- c("2022-01-10 9:35PM","2022-01-10 10:35PM")
end <- c("2022-01-11 7:00AM","2022-01-11 8:00AM")
dat <- data.frame(start,end)
These are all in character form. I would like to:
Convert all the date-times into a date-time class in 24-hour format, e.g. "2022-01-10 9:35PM" into "2022-01-10 21:35"
and "2022-01-11 7:00AM" into "2022-01-11 7:00", because I would like to calculate the difference between them in hours.
I would also like to add an ID column with a specific ID. The desired data would look like this:
ID <- c(101,101)
start <- c("2022-01-10 21:35","2022-01-10 22:35")
end <- c("2022-01-11 7:00","2022-01-11 8:00")
diff <- c(9,10) # I'm not sure how the calculations would turn out to be
dat <- data.frame(ID,start,end,diff)
I would appreciate all the help there is! Thanks!!!
You can use lubridate::ymd_hm(). Drop floor() if you want the exact (fractional) number of hours.
library(dplyr)
library(lubridate)
dat %>%
  mutate(ID = 101,
         across(c(start, end), ymd_hm),
         # explicit units, so the result is always expressed in hours
         diff = floor(difftime(end, start, units = "hours")))
start end ID diff
1 2022-01-10 21:35:00 2022-01-11 07:00:00 101 9 hours
2 2022-01-10 22:35:00 2022-01-11 08:00:00 101 9 hours
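The base R equivalent uses strptime(). Note that %p (AM/PM) is only honoured together with the 12-hour %I; with %H the PM marker is silently ignored and 9:35PM comes out as 09:35:
strptime(dat$start, "%Y-%m-%d %I:%M%p", tz = "UTC")
[1] "2022-01-10 21:35:00 UTC" "2022-01-10 22:35:00 UTC"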
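For a base R version of the whole task (parse, add the ID, compute the hours), here is a minimal sketch without lubridate. It assumes tz = "UTC" (which also sidesteps daylight-saving jumps) and a locale in which %p recognises "AM"/"PM", such as an English or C locale. The difference is kept as a plain number of hours rather than floored.

```r
# Sample data from the question (character columns)
start <- c("2022-01-10 9:35PM", "2022-01-10 10:35PM")
end   <- c("2022-01-11 7:00AM", "2022-01-11 8:00AM")
dat   <- data.frame(start, end)

# %I = 12-hour clock, %p = AM/PM marker (only honoured together with %I)
fmt <- "%Y-%m-%d %I:%M%p"
dat$start <- as.POSIXct(dat$start, format = fmt, tz = "UTC")
dat$end   <- as.POSIXct(dat$end,   format = fmt, tz = "UTC")

# ID column and elapsed time in hours as a plain numeric value
dat$ID   <- 101
dat$diff <- as.numeric(difftime(dat$end, dat$start, units = "hours"))

# Show the start times in 24-hour notation
format(dat$start, "%Y-%m-%d %H:%M")
dat
```

Both rows span 9 hours 25 minutes, so diff is about 9.42 rather than the 9 and 10 guessed in the question.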
Related
I have a data frame Y with a column time.
The time column holds values such as 20211201000010, which means 2021-12-01 00:00:10.
time <- strptime(Y$time, format = "%Y%m%d%H%M%S")
start_time <- min(time)
In this code, start_time is 2021-12-01 00:00:02.
But I want to round start_time up to 2021-12-01 00:00:10, since my data are recorded at 10-second intervals.
How can I round up 2021-12-01 00:00:02 as 2021-12-01 00:00:10 ?
The lubridate package handles this with ceiling_date():
library(lubridate)
xx1 <- '20211201010002'
# nested call, so no pipe operator (and no package providing %>%) is needed
ceiling_date(ymd_hms(xx1), unit = '10s')
[1] "2021-12-01 01:00:10 UTC"
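Without lubridate, a base R sketch can round up by working on the seconds-since-1970 number that a POSIXct stores: divide by 10, apply ceiling(), multiply back. The two input stamps are hypothetical stand-ins for Y$time, and tz = "UTC" is an assumption.

```r
# Hypothetical raw stamps in the question's YYYYMMDDHHMMSS layout
raw  <- c("20211201000002", "20211201000013")
time <- strptime(raw, format = "%Y%m%d%H%M%S", tz = "UTC")

start_time <- min(as.POSIXct(time))   # 2021-12-01 00:00:02 UTC

# POSIXct is seconds since 1970-01-01 UTC; round that up to a multiple of 10
secs <- as.numeric(start_time)
start_time_up <- as.POSIXct(ceiling(secs / 10) * 10,
                            origin = "1970-01-01", tz = "UTC")

format(start_time_up, "%Y-%m-%d %H:%M:%S")
```

A value already on a boundary (for example 00:00:10) is left unchanged, because ceiling() of a whole number is that number.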
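Alternatively, compute the remainder after dividing by 10 before you convert the data. The last digit of each stamp is the units digit of the seconds, so a remainder of 0 marks a 10-second boundary.
e.g.
20211201000002 %% 10  # 2
20211201000010 %% 10  # 0
Then take the first stamp whose remainder is 0.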
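The remainder idea can be sketched in R as follows; the raw values are made up for illustration. Note that it returns the first observed stamp that sits on a 10-second boundary, which differs from rounding the minimum up if that boundary itself is missing from the data.

```r
# Hypothetical stamps stored as numbers (exactly representable as doubles)
raw <- c(20211201000002, 20211201000006, 20211201000010, 20211201000020)

# Seconds' units digit is 0 exactly on a 10-second boundary
on_boundary   <- raw %% 10 == 0
first_aligned <- raw[which(on_boundary)[1]]

# sprintf("%.0f") avoids scientific notation before parsing
as.POSIXct(sprintf("%.0f", first_aligned), format = "%Y%m%d%H%M%S", tz = "UTC")
```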
I have a column called trip_start_timestamp with values like "01/23/2020 03:00:00 PM", and the column's type is factor. I would like the values as "2020/01/23 15:00:00", plus the weekday of each value, like "thursday".
You can use mdy_hms() from lubridate to convert the data to POSIXct and weekdays() to get the day of the week.
library(dplyr)
library(lubridate)
df %>%
mutate(trip_start_timestamp = mdy_hms(trip_start_timestamp),
weekday = weekdays(trip_start_timestamp))
# trip_start_timestamp weekday
#1 2020-01-23 15:00:00 Thursday
#2 2020-01-25 01:00:00 Saturday
In base R:
df$trip_start_timestamp <- as.POSIXct(df$trip_start_timestamp,
format = '%m/%d/%Y %I:%M:%S %p', tz = 'UTC')
df$weekday <- weekdays(df$trip_start_timestamp)
df
Data used:
df <- data.frame(trip_start_timestamp = factor(c("01/23/2020 03:00:00 PM",
"01/25/2020 01:00:00 AM")))
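The question asks for the exact string "2020/01/23 15:00:00" and a lower-case weekday, which the answers above leave in POSIXct form. A base R sketch that formats the parsed time back to text, assuming tz = "UTC"; weekdays() returns day names in the current locale, so "thursday" assumes an English locale.

```r
df <- data.frame(trip_start_timestamp = factor(c("01/23/2020 03:00:00 PM",
                                                 "01/25/2020 01:00:00 AM")))

# factor -> character -> POSIXct; %I and %p handle the 12-hour clock
parsed <- as.POSIXct(as.character(df$trip_start_timestamp),
                     format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# format() returns character in the requested layout (no longer a date-time)
df$trip_start_timestamp <- format(parsed, "%Y/%m/%d %H:%M:%S")
# locale-dependent day names, lower-cased to match "thursday"
df$weekday <- tolower(weekdays(parsed))
df
```

Keep parsed (or the POSIXct column) around if you still need date arithmetic, since the formatted column is plain text.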
I want to make a time series of how often each date and time is observed. The raw data look something like this:
dd-mm-yyyy hh:mm
28-2-2018 0:12
28-2-2018 11:16
28-2-2018 12:12
28-2-2018 13:22
28-2-2018 14:23
28-2-2018 14:14
28-2-2018 16:24
R does not parse this date-time layout by default, so I converted it with an explicit format:
extracted_times <- as.POSIXct(bedrijf.CSV$viewed_at, format = "%d-%m-%Y %H:%M")
I counted the frequencies in a table using the following code:
timeserieswithoutzeros <- table(extracted_times)
The data looks something like this now:
2018-02-28 00:11:00 2018-02-28 01:52:00 2018-02-28 03:38:00
1 2 5
2018-02-28 04:10:00 2018-02-28 04:40:00 2018-02-28 04:45:00
2 1 1
As you can see, there are many unobserved dates and times.
I want to add these unobserved dates and times with a frequency of 0.
I tried the complete() function, but the error says it can't be used because I use as.POSIXct().
Any ideas?
As already mentioned in the comments by eric-lecoutre, you can combine your observations with a sequence (built with seq()) running from the earliest to the latest time, tabulate the result, and subtract 1 from every count to remove the added sequence values.
timeseriesWithzeros <- table(c(extracted_times, seq(min(extracted_times), max(extracted_times), "1 min")))-1
Maybe the following is what you want.
First, coerce the data to class "POSIXct" and create the sequence of all date-times between the minimum and maximum in steps of 1 minute.
bedrijf.CSV$viewed_at <- as.POSIXct(bedrijf.CSV$viewed_at, format = "%d-%m-%Y %H:%M")
new <- seq(min(bedrijf.CSV$viewed_at),
max(bedrijf.CSV$viewed_at),
by = "1 mins")
tmp <- data.frame(viewed_at = new)
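Now check whether each of these minutes occurs in the original data. Note that this records presence (0 or 1), not the number of observations per minute.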
tmp$viewed <- tmp$viewed_at %in% bedrijf.CSV$viewed_at
tbl <- xtabs(viewed ~ viewed_at, tmp)
sum(tbl != 0)
#[1] 7
Final clean up.
rm(new, tmp)
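If you need the actual count per minute (including zeros) rather than presence/absence, one base R sketch gives table() a factor whose levels are the full minute sequence, so empty minutes appear with count 0. The three input times are made up for illustration (one minute deliberately repeated), and tz = "UTC" is an assumption.

```r
# Hypothetical raw values in the question's d-m-yyyy h:mm layout
viewed_at <- c("28-2-2018 0:12", "28-2-2018 0:12", "28-2-2018 0:15")
extracted_times <- as.POSIXct(viewed_at, format = "%d-%m-%Y %H:%M", tz = "UTC")

# Every minute between the first and last observation
all_minutes <- seq(min(extracted_times), max(extracted_times), by = "1 min")

# Factor levels = full minute grid, so unobserved minutes are counted as 0
key    <- "%Y-%m-%d %H:%M"
counts <- table(factor(format(extracted_times, key),
                       levels = format(all_minutes, key)))
counts
```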
I have a data frame with time-of-day stamps and the corresponding measured temperatures. The measurements are taken continuously at irregular intervals. I would like to convert the times to full date-times alongside the measured temperatures. My data frame looks like this (the measurements started on 20/05/2016):
Time, Temp
09.25,28
10.35,28.2
18.25,29
23.50,30
01.10,31
12.00,36
02.00,25
I would like to create a data.frame with the corresponding date-time and Temp, like below:
Time, Temp
2016-05-20 09:25,28
2016-05-20 10:35,28.2
2016-05-20 18:25,29
2016-05-20 23:50,30
2016-05-21 01:10,31
2016-05-21 12:00,36
2016-05-22 02:00,25
I would be thankful for any comments or tips on R packages or functions I could look at to do this. Thanks for your time.
A possible solution in base R:
df$Time <- as.POSIXct(strptime(paste('2016-05-20', sprintf('%05.2f',df$Time)), format = '%Y-%m-%d %H.%M', tz = 'GMT'))
df$Time <- df$Time + cumsum(c(0,diff(df$Time)) < 0) * 86400 # 86400 = 60 * 60 * 24
which gives:
> df
Time Temp
1 2016-05-20 09:25:00 28.0
2 2016-05-20 10:35:00 28.2
3 2016-05-20 18:25:00 29.0
4 2016-05-20 23:50:00 30.0
5 2016-05-21 01:10:00 31.0
6 2016-05-21 12:00:00 36.0
7 2016-05-22 02:00:00 25.0
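The day-rollover step in the base R answer can be seen in isolation: a drop in clock time relative to the previous reading marks a new day, and cumsum() turns those markers into day offsets that are then multiplied by 86400 seconds. Using the question's clock values as plain numbers:

```r
clock <- c(9.25, 10.35, 18.25, 23.50, 1.10, 12.00, 2.00)

# TRUE where the clock time is earlier than the previous reading
rollover <- c(FALSE, diff(clock) < 0)

# running count of rollovers = number of days to add to the start date
day_offset <- cumsum(rollover)
day_offset
```

This yields 0 0 0 0 1 1 2, matching the dates in the output above.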
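An alternative with data.table, using cumsum() over a comparison with shift() to count day rollovers (rleid() is not a drop-in here, because it also increments when the comparison switches back to FALSE and would add too many days):
library(data.table)
setDT(df)[, Time := as.POSIXct(strptime(paste('2016-05-20', sprintf('%05.2f', Time)), format = '%Y-%m-%d %H.%M', tz = 'GMT')) +
            cumsum(Time < shift(Time, fill = Time[1])) * 86400]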
Or with dplyr:
library(dplyr)
df %>%
mutate(Time = as.POSIXct(strptime(paste('2016-05-20',
sprintf('%05.2f',Time)),
format = '%Y-%m-%d %H.%M', tz = 'GMT')) +
cumsum(c(0,diff(Time)) < 0)*86400)
Both give the same result as the base R version.
Used data:
df <- read.table(text='Time, Temp
09.25,28
10.35,28.2
18.25,29
23.50,30
01.10,31
12.00,36
02.00,25', header=TRUE, sep=',')
You can use a custom date format combined with code that detects when a new day begins (assuming each day's first measurement has an earlier clock time than the previous day's last measurement).
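# starting day
start_date = "2016-05-20"
# read Time as character so zero-padded values like "09.25" compare correctly
values = read.csv('values.txt', colClasses=c("character",NA))
# previous reading for each row (the first row is compared with itself)
last = c(values$Time[1], head(values$Time, -1))
# an earlier clock time than the previous reading means a new day started
day = cumsum(values$Time < last)
Time = strptime(paste(start_date, values$Time), "%Y-%m-%d %H.%M", tz = "GMT")
Time = Time + day*86400
values$Time = Time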
The format of my excel data file is:
day value
01-01-2000 00:00:00 4
01-01-2000 00:01:00 3
01-01-2000 00:02:00 1
01-01-2000 00:04:00 1
I open my file with this:
ts = read.csv(file=pathfile, header=TRUE, sep=",")
How can I add the missing rows, with 0 in the “value” column, to the data frame? Output example:
day value
01-01-2000 00:00:00 4
01-01-2000 00:01:00 3
01-01-2000 00:02:00 1
01-01-2000 00:03:00 0
01-01-2000 00:04:00 1
The padr package automates this; it takes only one line of code.
original <- data.frame(
day = as.POSIXct(c("01-01-2000 00:00:00",
"01-01-2000 00:01:00",
"01-01-2000 00:02:00",
"01-01-2000 00:04:00"), format="%m-%d-%Y %H:%M:%S"),
value = c(4, 3, 1, 1))
library(padr)
library(dplyr) # for the pipe operator
original %>% pad %>% fill_by_value(value)
See vignette("padr") or this blog post for how it works.
I think this is a more general solution, which relies on creating a sequence of all timestamps, using that as the basis for a new data frame, and then filling in your original values in that df where applicable.
# convert original `day` to POSIXct
ts$day <- as.POSIXct(ts$day, format="%m-%d-%Y %H:%M:%S", tz="GMT")
# generate a sequence of all minutes in the first day
# (946684800 is 2000-01-01 00:00:00 GMT in seconds since 1970-01-01)
minAsNumeric <- 946684800 + seq(0, 60*60*24 - 60, by=60)
minAsPOSIX <- as.POSIXct(minAsNumeric, origin="1970-01-01", tz="GMT") # convert those minutes to POSIXct
# build complete data frame
newdata <- data.frame(day = minAsPOSIX)
newdata$value <- ts$value[match(newdata$day, ts$day)] # fill in original `value`s where present
newdata$value[is.na(newdata$value)] <- 0 # replace NAs with 0
Try:
ts = read.csv(file=pathfile, header=TRUE, sep=",", stringsAsFactors=FALSE)
ts.tmp = rbind(ts,list("01-01-2000 00:03:00",0))
ts.out = ts.tmp[order(ts.tmp$day),]
Note that you need to read the first column as character rather than factor; otherwise rbind() cannot add the new timestamp (it is not an existing factor level and becomes NA). Since R 4.0.0, stringsAsFactors = FALSE is the default. To turn the day column into a factor afterwards, just do:
ts.out$day = as.factor(ts.out$day)
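tidyr offers the complete() function to generate rows for implicitly missing data. In a second step, replace_na() turns the resulting NA values into 0. The day column must be converted to a date-time first, since read.csv() reads it as text:
library(dplyr) # for %>%
ts %>%
  dplyr::mutate(day = as.POSIXct(day, format = "%m-%d-%Y %H:%M:%S", tz = "GMT")) %>%
  tidyr::complete(day = seq.POSIXt(min(day), max(day), by = "min")) %>%
  dplyr::mutate(value = tidyr::replace_na(value, 0))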
Note that the sequence uses a granularity of one minute, since your dataset expects a row every minute.
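For comparison, a base R sketch of the same complete-then-fill idea uses merge() with all.x = TRUE as a left join against a full minute grid. It rebuilds the question's four rows inline and assumes the mm-dd-yyyy order and GMT time zone used in the other answers.

```r
ts <- data.frame(day = c("01-01-2000 00:00:00", "01-01-2000 00:01:00",
                         "01-01-2000 00:02:00", "01-01-2000 00:04:00"),
                 value = c(4, 3, 1, 1))
ts$day <- as.POSIXct(ts$day, format = "%m-%d-%Y %H:%M:%S", tz = "GMT")

# full minute grid between the first and last timestamp
grid <- data.frame(day = seq(min(ts$day), max(ts$day), by = "min"))

# left join keeps every grid row; unmatched minutes get NA, then 0
out <- merge(grid, ts, by = "day", all.x = TRUE)
out$value[is.na(out$value)] <- 0
out
```

merge() sorts by the key by default, so the rows come out in time order.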