Separating data with respect to month, day, year and hour in R

I have two columns in a data frame: the first is water consumption and the second is date + hour. For example:
Value Time
12.2 1/1/2016 1:00
11.2 1/1/2016 2:00
10.2 1/1/2016 3:00
The data covers 4 years, and I want to create separate columns for month, day, year and hour.
I would appreciate any help

We can convert the column to a date-time and then extract the components. We assume the format of the 'Time' column is 'dd/mm/yyyy H:M' (if it is 'mm/dd/yyyy H:M' instead, change dmy_hm to mdy_hm).
library(dplyr)
library(lubridate)
df1 %>%
mutate(Time = dmy_hm(Time), month = month(Time),
year = year(Time), hour = hour(Time))
# Value Time month year hour
#1 12.2 2016-01-01 01:00:00 1 2016 1
#2 11.2 2016-01-01 02:00:00 1 2016 2
#3 10.2 2016-01-01 03:00:00 1 2016 3
In base R, we can either use strptime or as.POSIXct and then use either format or extract components
df1$Time <- strptime(df1$Time, "%d/%m/%Y %H:%M")
transform(df1, month = Time$mon+1, year = Time$year + 1900, hour = Time$hour)
# Value Time month year hour
#1 12.2 2016-01-01 01:00:00 1 2016 1
#2 11.2 2016-01-01 02:00:00 1 2016 2
#3 10.2 2016-01-01 03:00:00 1 2016 3
data
df1 <- structure(list(Value = c(12.2, 11.2, 10.2), Time = c("1/1/2016 1:00",
"1/1/2016 2:00", "1/1/2016 3:00")), class = "data.frame", row.names = c(NA,
-3L))
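The question also asks for the day, which the outputs above do not include. A minimal base R sketch that adds it, assuming the same 'dd/mm/yyyy H:M' format as above; tz = "UTC" is an assumption made only so the result does not depend on the local time zone:

```r
# Same sample data as above
df1 <- data.frame(Value = c(12.2, 11.2, 10.2),
                  Time = c("1/1/2016 1:00", "1/1/2016 2:00", "1/1/2016 3:00"))

# Parse once, then pull each component out with format()
tm <- as.POSIXct(df1$Time, format = "%d/%m/%Y %H:%M", tz = "UTC")
df1$year  <- as.integer(format(tm, "%Y"))
df1$month <- as.integer(format(tm, "%m"))
df1$day   <- as.integer(format(tm, "%d"))
df1$hour  <- as.integer(format(tm, "%H"))
df1
```

With lubridate, the equivalent extra column would be day = day(Time) inside the mutate call.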

Related

Converting mixed times into 24 hour format

I currently have a dataset with several different time formats (AM/PM, numeric, 24hr format) and I'm trying to turn them all into 24hr format. Is there a way to standardize mixed format columns?
Current sample data
time
12:30 PM
03:00 PM
0.961469907
0.913622685
0.911423611
09:10 AM
18:00
Desired output
new_time
12:30:00
15:00:00
23:04:31
21:55:37
21:52:27
09:10:00
18:00:00
I know how to do them all individually (an example below), but is there a way to do it all in one go? I have a large amount of data and can't go line by line.
#for numeric time
> library(chron)
> x <- c(0.961469907, 0.913622685, 0.911423611)
> times(x)
[1] 23:04:31 21:55:37 21:52:27
The decimal times are a pain, but we can parse them first, feed them back as character, and then use lubridate's parse_date_time to do them all at once.
library(tidyverse)
library(chron)
# Create reproducible dataframe
df <-
tibble::tibble(
time = c(
"12:30 PM",
"03:00 PM",
0.961469907,
0.913622685,
0.911423611,
"09:10 AM",
"18:00")
)
# Parse times
df <-
df %>%
dplyr::mutate(
time_chron = chron::times(as.numeric(time)),
time_chron = if_else(
is.na(time_chron),
time,
as.character(time_chron)),
time_clean = lubridate::parse_date_time(
x = time_chron,
orders = c(
"%I:%M %p", # HH:MM AM/PM 12 hour format
"%H:%M:%S", # HH:MM:SS 24 hour format
"%H:%M")), # HH:MM 24 hour format
time_clean = hms::as_hms(time_clean)) %>%
select(-time_chron)
Which gives us
> df
# A tibble: 7 × 2
time time_clean
<chr> <time>
1 12:30 PM 12:30:00
2 03:00 PM 15:00:00
3 0.961469907 23:04:31
4 0.913622685 21:55:37
5 0.911423611 21:52:27
6 09:10 AM 09:10:00
7 18:00 18:00:00
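For reference, the decimal values are fractions of a day, so the chron step can also be reproduced with plain arithmetic. A minimal sketch in base R; frac_to_hms is a hypothetical helper name, not part of chron or lubridate, and it assumes fractions in [0, 1) rounded to the nearest second:

```r
# Hypothetical helper: day fraction -> "HH:MM:SS"
frac_to_hms <- function(x) {
  secs <- as.integer(round(x * 86400))   # 86400 seconds per day
  sprintf("%02d:%02d:%02d",
          secs %/% 3600L,                # whole hours
          (secs %% 3600L) %/% 60L,       # remaining whole minutes
          secs %% 60L)                   # remaining seconds
}

frac_to_hms(c(0.961469907, 0.913622685, 0.911423611))
```

The resulting strings can be passed to parse_date_time with the "%H:%M:%S" order, just like the chron output.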

Subsetting dates in R

I have the following data
dat <- structure(list(Datetime = structure(c(1261987200, 1261987500,
1261987800, 1261988100, 1261988400), class = c("POSIXct", "POSIXt"
), tzone = ""), Rain = c(0, -999, -999, -999, -999)), row.names = c(NA,
5L), class = "data.frame")
The first column contains the dates (year, month, day, hour). The second column is Rainfall.
The dates are not continuous. Some of the dates with missing Rainfall were already removed.
I would like to ask what is the best way of subsetting this data in terms of Year, Day, month or hour?
For example, I just want to get all data for July (month = 7). What I do is something like this:
dat$month<-substr(dat$Datetime,6,7)
july<-dat[which(dat$month == 7),]
or if it's a year, say 2010:
dat$year<-substr(dat$Datetime,1,4)
dat<-dat[which(dat$year == 2010),]
Then convert them into numeric types.
Is there an easier way to do this in R? The dates are already stored as POSIXct.
I'll appreciate any help on this.
Lyndz
If you want to convert the Datetime to year or month (numeric), you can try format like below
df1 <- transform(
dat,
year = as.numeric(format(Datetime,"%Y")),
month = as.numeric(format(Datetime,"%m"))
)
which gives
Datetime Rain year month
1 2009-12-28 09:00:00 0 2009 12
2 2009-12-28 09:05:00 -999 2009 12
3 2009-12-28 09:10:00 -999 2009 12
4 2009-12-28 09:15:00 -999 2009 12
5 2009-12-28 09:20:00 -999 2009 12
If you want to subset df1 further by year (for example, year == 2010), then
subset(
df1,
year == 2010
)
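If the helper columns are not needed, the same idea can be used to subset directly. A minimal sketch, rebuilding the question's data with tz = "UTC" (an assumption, so that the hours do not depend on the local time zone):

```r
# Question's data, rebuilt with an explicit time zone
dat <- data.frame(
  Datetime = as.POSIXct(c(1261987200, 1261987500, 1261987800,
                          1261988100, 1261988400),
                        origin = "1970-01-01", tz = "UTC"),
  Rain = c(0, -999, -999, -999, -999)
)

# Subset on a formatted component without adding columns
december <- dat[format(dat$Datetime, "%m") == "12", ]
y2010    <- dat[format(dat$Datetime, "%Y") == "2010", ]
hour8    <- dat[as.integer(format(dat$Datetime, "%H")) == 8, ]
```

Note that format returns character, so compare against "12" (or "07" for July), or convert with as.integer first as in the hour example.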
You can use the lubridate package and associated month and year functions.
library(tidyverse)
library(lubridate)
df <- structure(list(
Datetime = structure(
c(1261987200, 1261987500,
1261987800, 1261988100, 1261988400),
class = c("POSIXct", "POSIXt"),
tzone = ""
),
Rain = c(0,-999,-999,-999,-999)
),
row.names = c(NA,
5L),
class = "data.frame") %>%
as_tibble()
df %>%
mutate(month = lubridate::month(Datetime),
year = lubridate::year(Datetime))
Output:
# A tibble: 5 x 4
Datetime Rain month year
<dttm> <dbl> <dbl> <dbl>
1 2009-12-28 16:00:00 0 12 2009
2 2009-12-28 16:05:00 -999 12 2009
3 2009-12-28 16:10:00 -999 12 2009
4 2009-12-28 16:15:00 -999 12 2009
5 2009-12-28 16:20:00 -999 12 2009

aggregate data frame to typical year/week

so i have a large data frame with a date time column of class POSIXct and another column with price data of class numeric. the date time column has values of the form "1998-12-07 02:00:00 AEST" that are half hour observations across 20 years. a sample data set can be generated with the following code (vary the 100 to whatever number of observations are necessary):
data.frame(date.time = seq.POSIXt(as.POSIXct("1998-12-07 02:00:00 AEST"), as.POSIXct(Sys.Date()+1), by = "30 min")[1:100], price = rnorm(100))
i want to look at a typical year and typical week. so for the typical year i have the following code:
mean.year <- aggregate(df$price, by = list(format(df$date.time, "%m-%d %H:%M")), mean)
it seems to give me what i want:
Group.1 x
1 01-01 00:00 31.86200
2 01-01 00:30 34.20526
3 01-01 01:00 28.40105
4 01-01 01:30 26.01684
5 01-01 02:00 23.68895
6 01-01 02:30 23.70632
however the column "Group.1" is of class character and i would like it to be of class POSIXct. how can i do this?
for the typical week i have the following code
mean.week <- aggregate(df$price, by = list(format(df$date.time, "%wday %H:%M")), mean)
the output is as follows
Group.1 x
1 0day 00:00 33.05613
2 0day 00:30 30.92815
3 0day 01:00 29.26245
4 0day 01:30 29.47959
5 0day 02:00 29.18380
6 0day 02:30 25.99400
again, column "Group.1" is of class character and i would like POSIXct. also, i would like to have the day of the week as "Monday", "Tuesday", etc. instead of 0day. how would i do this?
Convert the datetime to a character string that can validly be converted back to POSIXct and then do so:
mean.year <- aggregate(df["price"],
by = list(time = as.POSIXct(format(df$date.time, "2000-%m-%d %H:%M"))), mean)
head(mean.year)
## time price
## 1 2000-12-07 02:00:00 -0.56047565
## 2 2000-12-07 02:30:00 -0.23017749
## 3 2000-12-07 03:00:00 1.55870831
## 4 2000-12-07 03:30:00 0.07050839
## 5 2000-12-07 04:00:00 0.12928774
## 6 2000-12-07 04:30:00 1.71506499
To get the day of the week use %a or %A -- see ?strptime for the list of percent codes.
mean.week <- aggregate(df["price"],
by = list(time = format(df$date.time, "%a %H:%M")), mean)
head(mean.week)
## time price
## 1 Mon 02:00 -0.56047565
## 2 Mon 02:30 -0.23017749
## 3 Mon 03:00 1.55870831
## 4 Mon 03:30 0.07050839
## 5 Mon 04:00 0.12928774
## 6 Mon 04:30 1.71506499
Note
The input df in reproducible form -- note that set.seed is needed to make it reproducible:
set.seed(123)
df <- data.frame(date.time = seq.POSIXt(as.POSIXct("1998-12-07 02:00:00 AEST"),
as.POSIXct(Sys.Date()+1), by = "30 min")[1:100], price = rnorm(100))
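The typical-week key above is still character. One possible way to get POSIXct, in the same spirit as the fixed year 2000 used for the typical year, is to map every observation onto a single reference week. A minimal sketch; the reference week starting Monday 2001-01-01, the column name week.time and tz = "UTC" are all assumptions made for illustration:

```r
set.seed(123)
df <- data.frame(
  date.time = seq(as.POSIXct("1998-12-07 02:00:00", tz = "UTC"),
                  by = "30 min", length.out = 100),
  price = rnorm(100)
)

# POSIXlt stores the weekday as 0 = Sunday ... 6 = Saturday;
# shift it so that Monday = 0 and add it to a reference Monday
lt <- as.POSIXlt(df$date.time)
ref_day <- as.Date("2001-01-01") + (lt$wday + 6) %% 7

# Rebuild a date-time inside the reference week and aggregate on it
df$week.time <- as.POSIXct(paste(ref_day, format(df$date.time, "%H:%M")),
                           tz = "UTC")
mean.week <- aggregate(price ~ week.time, data = df, FUN = mean)
head(mean.week)
```

Full day names can then be obtained with format(mean.week$week.time, "%A %H:%M"); the names depend on the locale.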

Adding Time Stamp to a date in R [duplicate]

Hi, I have 2 columns in a dataframe. Column 1 has dates like 2017-01-01 and column 2 has a time stamp like 1:00 PM.
I need to create another column that combines these two pieces of information and gives me 2017-01-01 13:00:00.
Use as.POSIXct to convert from character to date format.
df$date.time <- as.POSIXct(paste(df$date, df$time), format = "%Y-%m-%d %I:%M %p")
EDIT:
To provide some further context... You paste the date and the time column together to get the string 2017-01-01 1:00 PM.
You then describe the layout of that string to as.POSIXct using the format = argument. The relationship between the symbols and their meaning is documented in ?strptime.
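A minimal runnable sketch of the paste-then-parse step described above. The column names date and time follow the answer; the sample values and tz = "UTC" are assumptions, and %p relies on a locale that uses AM/PM:

```r
df <- data.frame(date = c("2017-01-01", "2016-01-01"),
                 time = c("1:00 PM", "2:00 AM"))

# %I = hour on the 12-hour clock, %p = AM/PM indicator
df$date.time <- as.POSIXct(paste(df$date, df$time),
                           format = "%Y-%m-%d %I:%M %p", tz = "UTC")
df
```

Using %H instead of %I here would ignore the AM/PM marker, so "1:00 PM" would be read as 01:00.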
Reproducible example
library(lubridate)
A <- data.frame(X1 = ymd("2017-01-01"),
X2 = "1:00 PM", stringsAsFactors=F)
# X1 X2
# 1 2017-01-01 1:00 PM
solution
library(dplyr)
library(lubridate)
temp <- A %>%
mutate(X3 = ymd_hm(paste(X1, X2)))
output
X1 X2 X3
<date> <chr> <dttm>
1 2017-01-01 1:00 PM 2017-01-01 13:00:00
multi-row input
B <- data.frame(X1 = ymd("2017-01-01", "2016-01-01"),
X2 = c("1:00 PM", "2:00 AM"), stringsAsFactors=F)
temp <- B %>%
mutate(X3 = ymd_hm(paste(X1, X2)))
# X1 X2 X3
# <date> <chr> <dttm>
# 1 2017-01-01 1:00 PM 2017-01-01 13:00:00
# 2 2016-01-01 2:00 AM 2016-01-01 02:00:00

R: extract hour from variable format timestamp

My dataframe has timestamp with and without seconds, and a random use of 0 in front of months and hours, i.e. 01 or 1
library(tidyverse)
df <- data_frame(cust=c('A','A','B','B'), timestamp=c('5/31/2016 1:03:12', '05/25/2016 01:06',
'6/16/2016 01:03', '12/30/2015 23:04:25'))
cust timestamp
A 5/31/2016 1:03:12
A 05/25/2016 01:06
B 6/16/2016 01:03
B 12/30/2015 23:04:25
How to extract hours into a separate column? The desired output:
cust timestamp hours
A 5/31/2016 1:03:12 1
A 05/25/2016 01:06 1
B 6/16/2016 9:03 9
B 12/30/2015 23:04:25 23
I prefer the answer with tidyverse and mutate, but my attempt fails to extract hours correctly:
df %>% mutate(hours=strptime(timestamp, '%H') %>% as.character() )
# A tibble: 4 × 3
cust timestamp hours
<chr> <chr> <chr>
1 A 5/31/2016 1:03:12 2016-10-31 05:00:00
2 A 05/25/2016 01:06 2016-10-31 05:00:00
3 B 6/16/2016 01:03 2016-10-31 06:00:00
4 B 12/30/2015 23:04:25 2016-10-31 12:00:00
Try this:
library(lubridate)
df <- data.frame(cust=c('A','A','B','B'), timestamp=c('5/31/2016 1:03:12', '05/25/2016 01:06',
'6/16/2016 09:03', '12/30/2015 23:04:25'))
df %>% mutate(hours=hour(strptime(timestamp, '%m/%d/%Y %H:%M')) %>% as.character() )
cust timestamp hours
1 A 5/31/2016 1:03:12 1
2 A 05/25/2016 01:06 1
3 B 6/16/2016 09:03 9
4 B 12/30/2015 23:04:25 23
Here is a solution that appends 00 for the seconds when they are missing, then converts to a date using lubridate and extracts the hours using format. Note, if you don't want the 00:00 at the end of the hours, you can just eliminate them from the output format in format:
df %>%
mutate(
cleanTime = ifelse(grepl(":[0-9][0-9]:", timestamp)
, timestamp
, paste0(timestamp, ":00")) %>% mdy_hms
, hour = format(cleanTime, "%H:00:00")
)
returns:
cust timestamp cleanTime hour
<chr> <chr> <dttm> <chr>
1 A 5/31/2016 1:03:12 2016-05-31 01:03:12 01:00:00
2 A 05/25/2016 01:06 2016-05-25 01:06:00 01:00:00
3 B 6/16/2016 01:03 2016-06-16 01:03:00 01:00:00
4 B 12/30/2015 23:04:25 2015-12-30 23:04:25 23:00:00
Your timestamp is a character string (<chr>); you need to convert it to a date-time (with as.POSIXct or strptime, for example) before you can extract components such as the hour. Note that as.Date would drop the time of day.
You may need some string manipulation so that every timestamp has the same layout before converting: append :00 to the timestamps with missing seconds, using sub() or other regex functions. Single-digit months and hours do not need a leading zero, since %m and %H accept them on input. Afterwards do as.POSIXct(df$timestamp, format = '%m/%d/%Y %H:%M:%S'), then you will be able to use format(x, '%H') to extract the hours.
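A minimal base R sketch of that approach: pad the missing seconds, parse, then extract the hour. The variable names are illustrative, the sample values are taken from the answers above, and tz = "UTC" is an assumption:

```r
timestamp <- c("5/31/2016 1:03:12", "05/25/2016 01:06",
               "6/16/2016 09:03", "12/30/2015 23:04:25")

# TRUE where the string already ends in :MM:SS
has_secs <- grepl(":[0-9]{2}:[0-9]{2}$", timestamp)
full <- ifelse(has_secs, timestamp, paste0(timestamp, ":00"))

# Leading zeros are optional on input for %m, %d and %H
parsed <- as.POSIXct(full, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
hours <- as.integer(format(parsed, "%H"))
hours
```

Inside a dplyr pipeline the same expressions can go into mutate, with timestamp referring to the column.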
