Converting dataframe to date format [duplicate] - r

I have a data frame containing 250 columns of dates and in-times in character format, except for column 1, which contains the employee ID. How can I convert all the columns except the first to date format?
1 1/5/2015 17:20 1/6/2015 17:19 1/7/2015 16:34 1/8/2015 17:08
2 1/2/2015 18:22 1/5/2015 17:48 NA 1/7/2015 17:09
3 1/2/2015 16:59 1/5/2015 17:06 1/6/2015 16:38 1/7/2015 16:33
4 1/2/2015 17:25 1/5/2015 17:14 1/6/2015 17:07 1/7/2015 16:32
5 1/2/2015 18:31 1/5/2015 17:49 1/6/2015 17:26 1/7/2015 17:37
6 1/2/2015 20:29 1/5/2015 20:57 1/6/2015 21:06 1/7/2015 20:36
The dates and in-times above are in character format.
I tried:
parse_date_time(df[,-1],"ymd_HMS") and parse_date_time(df[,2:250],"ymd_HMS")
but neither works. The syntax does work when I specify a single column, but writing out all 250 columns individually would be bad coding practice.

Apply the strptime function to turn your character values into date-time values.
df[,2:250] <- as.data.frame(lapply(df[,2:250], strptime, format="%m/%d/%Y %H:%M"))
df[,2:250] selects only the columns you are interested in.
format="%m/%d/%Y %H:%M" describes the layout of your character entries (assuming they are month/day/year with no seconds, which fits the sample rows). See ?strptime for what the letters mean.
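As a rough check of the idea, here is a minimal sketch on a made-up three-row miniature of the data (the column names id, day1 and day2 are hypothetical). It uses as.POSIXct rather than strptime so that each column is stored as POSIXct, and it assumes the entries are month/day/year; the time zone "UTC" is also an assumption.

```r
# Hypothetical miniature of the data: an ID column plus character date-time columns
df <- data.frame(
  id   = 1:3,
  day1 = c("1/5/2015 17:20", "1/2/2015 18:22", "1/2/2015 16:59"),
  day2 = c("1/6/2015 17:19", "1/5/2015 17:48", NA),
  stringsAsFactors = FALSE
)

# Convert every column except the first; NA entries stay NA
df[-1] <- lapply(df[-1], as.POSIXct, format = "%m/%d/%Y %H:%M", tz = "UTC")

str(df)
```

With the real data, df[-1] covers all 249 date columns without naming any of them.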

Related

Subsetting data based on time in R

I have a set of traffic data that has date and time columns, but I'm having trouble subsetting the data by specific times. Is there a way to properly subset data based on date and time ranges? Using filter or subset does not seem to work for me.
For example, I would like to extract data from 17/08/2019 to 19/08/2019 for the following time periods: 06:00 to 07:00, 08:30 to 10:00, 12:00 to 13:00, 17:30 to 19:00, 19:00 to 20:00 and 20:00 to 22:00. I'd appreciate any advice!
Vehicle.No. Date Time Payment.Amount
SXX0001A 17/08/2019 00:01 1.25
SXX0002A 17/08/2019 00:21 5
SXX0003A 17/08/2019 00:31 0
SXX0004A 17/08/2019 02:01 3
SXX0005A 17/08/2019 03:01 2
SXX0006A 17/08/2019 18:01 1.25
.
.
.
SXX0007A 18/08/2019 00:01 1.25
SXX0008A 18/08/2019 02:01 1.25
SXX0009A 18/08/2019 19:01 1.25
SXX0010A 18/08/2019 20:01 1.25
.
.
.
SXX0006A 20/08/2019 02:01 1.25
SXX0006A 20/08/2019 03:01 3.25
SXX0006A 20/08/2019 01:01 5.25
SXX0006A 20/08/2019 12:01 0
SXX0006A 20/08/2019 14:01 1.25
.
.
.
The first thing is to make sure that your Date and Time variables are in date and time formats respectively. It is impossible to tell, from what you are providing, whether this is the case or whether those variables are characters or factors.
Let's assume that they are characters:
df <- read.table(
text =
"Vehicle.No. Date Time Payment.Amount
SXX0001A 17/08/2019 00:01 1.25
SXX0002A 17/08/2019 00:21 5
SXX0003A 17/08/2019 00:31 0
SXX0004A 17/08/2019 02:01 3
SXX0005A 17/08/2019 03:01 2
SXX0006A 17/08/2019 18:01 1.25
SXX0007A 18/08/2019 00:01 1.25
SXX0008A 18/08/2019 02:01 1.25
SXX0009A 18/08/2019 19:01 1.25
SXX0010A 18/08/2019 20:01 1.25
SXX0006A 20/08/2019 02:01 1.25
SXX0006A 20/08/2019 03:01 3.25
SXX0006A 20/08/2019 01:01 5.25
SXX0006A 20/08/2019 12:01 0
SXX0006A 20/08/2019 14:01 1.25",
stringsAsFactors = F,
header = T
)
str(df$Date)
chr [1:15] "17/08/2019" "17/08/2019" "17/08/2019" "17/08/2019" ...
str(df$Time)
chr [1:15] "00:01" "00:21" "00:31" "02:01" "03:01" "18:01" "00:01" "02:01" ...
Let's create 2 new variables (date and datetime) in date and datetime formats. I am creating a datetime variable rather than a time one because this will come in handy later. The package readr has great functions to parse vectors.
library(dplyr)
library(readr)
df <-
df %>%
mutate(
date = parse_date(Date, "%d/%m/%Y"),
datetime = parse_datetime(paste(Date, Time), "%d/%m/%Y %H:%M")
)
str(df$date)
Date[1:15], format: "2019-08-17" "2019-08-17" "2019-08-17" ...
str(df$datetime)
POSIXct[1:15], format: "2019-08-17 00:01:00" "2019-08-17 00:21:00" ...
It is not clear to me how you want your output (do you want to filter the data that fit in any of the times you list? or do you want to filter for each date and time period separately?). Let's assume that you want all of the data that fit in any of the date and time periods you list.
Since we need to filter for the same time periods for several days, we will use purrr to avoid code repetition:
create a list of filtered data frames (each element corresponding to one of the days of interest)
create a function that will filter data for all the time periods of interest for a certain day. This function uses the package lubridate.
apply the function to each element of the list and output a data frame thanks to purrr::map_df(), then remove the variables date and datetime we created (though maybe you should keep them and get rid of your Date and Time variables instead).
library(purrr)
library(lubridate)
ls <- list(
filter(df, date == "2019-08-17"),
filter(df, date == "2019-08-18"),
filter(df, date == "2019-08-19")
)
select_times <- function(df) {
df %>%
filter(
datetime %within% interval(paste(unique(df$date), "06:00:00"),
paste(unique(df$date), "07:00:00")) |
datetime %within% interval(paste(unique(df$date), "08:30:00"),
paste(unique(df$date), "10:00:00")) |
datetime %within% interval(paste(unique(df$date), "12:00:00"),
paste(unique(df$date), "13:00:00")) |
datetime %within% interval(paste(unique(df$date), "17:30:00"),
paste(unique(df$date), "22:00:00"))
)
}
map_df(ls, select_times) %>%
select(- date, - datetime)
Output:
Vehicle.No. Date Time Payment.Amount
1 SXX0006A 17/08/2019 18:01 1.25
2 SXX0009A 18/08/2019 19:01 1.25
3 SXX0010A 18/08/2019 20:01 1.25
This is the subset of your data for the time periods of interest during the days of interest.
For alternative solutions, you might want to look at the package xts. This post could be useful.
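If you would rather avoid splitting the data by day, one base R sketch is to test the date range and the time of day separately. This is a different technique from the purrr approach above, shown on a cut-down copy of the sample data; the helper table windows and the choice of "UTC" as time zone are assumptions for illustration.

```r
# Cut-down copy of the sample data (Date and Time as characters)
df <- data.frame(
  Vehicle.No. = c("SXX0001A", "SXX0006A", "SXX0008A",
                  "SXX0009A", "SXX0010A", "SXX0006A"),
  Date = c("17/08/2019", "17/08/2019", "18/08/2019",
           "18/08/2019", "18/08/2019", "20/08/2019"),
  Time = c("00:01", "18:01", "02:01", "19:01", "20:01", "12:01"),
  stringsAsFactors = FALSE
)

# Parse once, then derive the calendar day and the minutes since midnight
dt   <- as.POSIXct(paste(df$Date, df$Time), format = "%d/%m/%Y %H:%M", tz = "UTC")
day  <- as.Date(dt, tz = "UTC")
mins <- as.integer(format(dt, "%H")) * 60L + as.integer(format(dt, "%M"))

# Time windows in minutes since midnight (17:30 to 22:00 merges the last three periods)
windows <- data.frame(
  start = c(6 * 60, 8 * 60 + 30, 12 * 60, 17 * 60 + 30),
  end   = c(7 * 60, 10 * 60,     13 * 60, 22 * 60)
)

in_days   <- day >= as.Date("2019-08-17") & day <= as.Date("2019-08-19")
in_window <- vapply(mins, function(m) any(m >= windows$start & m <= windows$end),
                    logical(1))

result <- df[in_days & in_window, ]
result
```

Because the time test does not depend on the date, the same windows apply to any number of days without repeating code.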

Converting date column from factor into posixct makes my dataframe have a length and size of zero

I am trying to figure out a problem I'm having with converting a date column from factor to posixct. I have a script which I have run several times with no issue. When I read my .csv file, it has a date column which comes as a factor. But I convert it into POSIXct using as.POSIXct() so I can create a plot and do some calculations.
Recently, I had to reinstall R. Now when I try to convert from factor to POSIXct, my data frame's length and size become zero. I have searched for a solution but could not find one. Why would as.POSIXct() erase my data frame?
date Julian Hour MINUTE date1 v1
2004-04-25 116 18 0 2004-04-25 18:00 0.0000001
2004-04-25 116 18 30 2004-04-25 18:30 0.0000001
2004-04-25 116 19 0 2004-04-25 19:00 0.0000002
2004-04-25 116 19 30 2004-04-25 19:30 0.0000003
2004-04-25 116 21 0 2004-04-25 21:00 0.0000001
You may try to include stringsAsFactors = F in the read.csv() so that read.csv will read your strings as character instead of factors.
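A minimal sketch of that suggestion, assuming the date-time lives in the date1 column as in the sample. The inline csv_text is a made-up stand-in for the real file, and the "UTC" time zone is an assumption.

```r
# Inline stand-in for the .csv file (hypothetical; first two sample rows)
csv_text <- "date,Julian,Hour,MINUTE,date1,v1
2004-04-25,116,18,0,2004-04-25 18:00,0.0000001
2004-04-25,116,18,30,2004-04-25 18:30,0.0000001"

# Read strings as character rather than factor
dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Convert with an explicit format; as.character() also covers the case
# where the column was read as a factor anyway
dat$date1 <- as.POSIXct(as.character(dat$date1),
                        format = "%Y-%m-%d %H:%M", tz = "UTC")

str(dat)
```

Assigning to dat$date1 replaces only that column, so the rest of the data frame is left untouched.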

boxplot time series by monthly average in r

I have a time series with hourly data on energy consumption in the form of a zoo object. And there are 16 indices (in the range [1:143206]) for which the Date is NA. Here is a sample of the data:
Date PJMW_MW
1 2002-04-01 01:00:00 4374
...
8709 2003-03-29 23:00:00 4827
8710 2003-03-30 00:00:00 4611
8711 2003-03-30 01:00:00 4421
8712 NA 4285
8713 2003-03-30 03:00:00 4212
8714 2003-03-30 04:00:00 4321
...
143206 2018-08-03 00:00:00 5489
The data above is a data.frame object called dat but I have it in a zoo object called hourly_ts:
1 4374
...
7709 6135
7710 6324
7711 6626
7712 6866
7713 6987
7714 7028
7715 7026
...
143206 5265
I would like to see the monthly averages, like, for which month is the consumption generally higher, and I saw that there is a simple formula for this: boxplot(hourly_ts ~ cycle(hourly_ts))
But the error Error in cycle.zoo(hourly_ts) : ‘x’ is not regular appears.
The weird thing is that hourly_ts has a specified frequency (24 hours per day) and start time (April 1st 2002 01:00:00), so from that there shouldn't be any missing values in the time.
Supposing the missing values are what's causing the irregularity, is there a way I can add the values myself?
I would also like to use the aggregate function but have no idea what the by parameter should be.

How to shift date and time column by 6 hours in R?

I have a date and time column in 24-hour format. I want to shift it by 6 hours, so that 6 am of the current day becomes 00:00 and the day ends at 6 am of the following day. In Excel, subtracting 0.25 from the date column shifts the dates by 6 hours directly, but the same approach doesn't seem to work in R. How does one achieve this in R?
You should provide more information on your question, like the data you're using.
To replicate that in R, you could use the lubridate package (here time is assumed to be a date-time vector):
library(lubridate)
new_time <- time - hms("06:00:00")
Hope this helps
A solution using base R. By default arithmetic operations add seconds, so:
now <- Sys.time() #gets the current time
now
"2016-04-22 09:52:21 CEST"
now + 6*3600
"2016-04-22 15:52:21 CEST"
With your data, you can try something based on strptime:
df <- read.table(text="
DateTime
2/10/2016 19:18
2/10/2016 19:15
2/10/2016 19:12
2/10/2016 19:09
2/10/2016 19:06
2/10/2016 19:03", sep=";", h=T)
df
DateTime
1 2/10/2016 19:18
2 2/10/2016 19:15
3 2/10/2016 19:12
4 2/10/2016 19:09
5 2/10/2016 19:06
6 2/10/2016 19:03
df$NewTime <- strptime(as.character(df$DateTime), format="%d/%m/%Y %H:%M") + 6*3600
df
DateTime NewTime
1 2/10/2016 19:18 2016-10-03 01:18:00
2 2/10/2016 19:15 2016-10-03 01:15:00
3 2/10/2016 19:12 2016-10-03 01:12:00
4 2/10/2016 19:09 2016-10-03 01:09:00
5 2/10/2016 19:06 2016-10-03 01:06:00
6 2/10/2016 19:03 2016-10-03 01:03:00
You could remove the as.character step with stringsAsFactors = FALSE in read.table.
Does it solve your problem?
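For the shift described in the question (6 am becomes 00:00), the same arithmetic works with subtraction. A small sketch on made-up timestamps; the vector x and the "UTC" time zone are assumptions for illustration:

```r
# Hypothetical timestamps around the 6 am boundary
x <- as.POSIXct(c("2016-10-02 05:59", "2016-10-02 06:00", "2016-10-03 05:30"),
                format = "%Y-%m-%d %H:%M", tz = "UTC")

# Subtract 6 hours (in seconds) so that 06:00 maps to 00:00
shifted <- x - 6 * 3600

# The calendar date of the shifted value is the "6 am to 6 am" day
shifted_day <- as.Date(shifted, tz = "UTC")

data.frame(original = x, shifted = shifted, shifted_day = shifted_day)
```

Here 05:59 on 2 October still belongs to the day of 1 October, while 06:00 starts the day of 2 October.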

Reading in dates from Excel into R

I have multiple csv files which I need to read into R. The first column of each file contains dates and times, which I convert to POSIXlt once the data frame is loaded. The dates and times are formatted the same way in Excel in every file, yet some files are read in differently.
For example,
My file looks like this once imported:
date value
1 2011/01/01 00:00:00 39
2 2011/01/01 00:15:00 35
3 2011/01/01 00:30:00 38
4 2011/01/01 00:45:00 39
5 2011/01/01 01:00:00 38
6 2011/01/01 01:15:00 38
Therefore, the code I use to amend the format is:
DATA$date <- as.POSIXlt(DATA$date,format="%Y/%m/%d %H:%M:%S")
However, some files are being read in as:
date value
1 01/01/2011 00:00 39
2 01/01/2011 00:15 35
3 01/01/2011 00:30 38
4 01/01/2011 00:45 39
5 01/01/2011 01:00 38
6 01/01/2011 01:15 38
This means the format string in my code does not match and gives an error. Is there any way to automatically detect which format the date column is in? Or is there a way of knowing how it will be read, since the format of the column in Excel is the same in both?
With the wrong format string for your date input, as.POSIXlt returns NA values. If that is the case, you can solve this problem in two steps. First, parse the dates from Excel assuming they all have hours, minutes, and seconds:
date.original <- DATA$date
DATA$date <- as.POSIXlt(DATA$date,format="%Y/%m/%d %H:%M:%S")
This should leave NA values in the date column for the dates that are missing seconds. Then you can fill only those rows from the saved originals:
DATA$date[is.na(DATA$date)] <- as.POSIXlt(date.original[is.na(DATA$date)], format="%Y/%m/%d %H:%M")
This should cover the remaining data.
Data
DATA <- data.frame(date=c('2011/01/01 00:00:00', '2011/01/01 00:15',
'2011/01/01 00:30:00', '2011/01/01 00:45'),
value=c(39, 35, 38, 39))
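The same fallback idea can be generalised to the two layouts shown in the question. The sketch below is only an illustration: parse_mixed is a hypothetical helper, not a base R function, and reading "01/01/2011" as day/month/year is an assumption (the sample is ambiguous). It returns POSIXct rather than POSIXlt.

```r
# Hypothetical helper: try each format in turn and keep the first that parses
parse_mixed <- function(x, formats, tz = "UTC") {
  x <- as.character(x)
  # Start with an all-NA POSIXct vector of the right length
  out <- as.POSIXct(rep(NA_real_, length(x)), origin = "1970-01-01", tz = tz)
  for (fmt in formats) {
    todo <- is.na(out)
    if (!any(todo)) break
    # Only the still-unparsed entries are retried with the next format
    out[todo] <- as.POSIXct(strptime(x[todo], format = fmt, tz = tz))
  }
  out
}

x <- c("2011/01/01 00:15:00", "01/01/2011 00:30")
parsed <- parse_mixed(x, c("%Y/%m/%d %H:%M:%S", "%d/%m/%Y %H:%M"))
parsed
```

Entries that match none of the formats stay NA, which makes unexpected layouts easy to spot with is.na().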
