Proper date formatting in R - r

I am currently working with this dataset.
'data.frame': 2938 obs. of 4 variables:
$ X : int 21562 21603 21618 21620 21659 21990 21996 22024 22592 22665 ...
$ uuid : Factor w/ 2938 levels "0005d695-6bc8-48ad-b323-803499630e43",..: 2396 2910 2372 2008 2582 1405 2114 1447 2348 2503 ...
$ date : Factor w/ 2927 levels "2015-06-06T06:33:14Z",..: 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 ...
$ type : Factor w/ 1 level "productCart": 1 1 1 1 1 1 1 1 1 1 ...
There is a variable date here, where date is in this format:
date --> 2015-06-06T06:33:14Z
I want to create a new variable and change date into a more workable format which should look like that:
NewDate --> 2015-06-06 06:33:14
Could you please give me any advice? I am trying few different approaches and none of them works so far.

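You can use as.POSIXct to convert to the 'POSIXct' class. The trailing Z marks the timestamp as UTC, so pass tz='UTC'; otherwise the value is interpreted in your local time zone.
date1 <- as.POSIXct(date, format='%Y-%m-%dT%H:%M:%SZ', tz='UTC')
date1
#[1] "2015-06-06 06:33:14 UTC"
Or using lubridate, whose ymd_hms() parses to UTC by default
library(lubridate)
ymd_hms(date)
#[1] "2015-06-06 06:33:14 UTC"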
If we want to extract the hour and minute part, format can be used
format(date1, '%H:%M')
#[1] "06:33"
data
date <- "2015-06-06T06:33:14Z"
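For the factor column from the question, a minimal base R sketch could look like the following. The data frame `df` and its second timestamp are made up for illustration, and the time zone is assumed to be UTC because of the trailing Z.

```r
# Toy data frame; the first value is from the question, the second is hypothetical
df <- data.frame(date = factor(c("2015-06-06T06:33:14Z", "2015-06-07T18:05:00Z")))

# Convert factor -> character -> POSIXct, reading the timestamps as UTC
df$NewDate <- as.POSIXct(as.character(df$date),
                         format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# If a plain string without the time zone label is wanted, format it explicitly
new_date_chr <- format(df$NewDate, "%Y-%m-%d %H:%M:%S")
print(new_date_chr)
```

Keeping NewDate as POSIXct (rather than the formatted string) is usually more workable, since it can still be sorted and subtracted.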

Related

How to create a date (column) from a date-time (column) in R

I have imported a CSV containing dates in the column "Activity_Date_Minute". A typical value is "04/12/2016 01:12:00". When I read the .csv into a data frame and extract only the date, the new column shows 4-12-20. How can I get the date in mm-dd-yyyy in a separate column?
I tried the code below and expected to see a column with dates such as 04/12/2016 (mm/dd/yyyy).
#Installing packages
install.packages("tidyverse")
library(tidyverse)
install.packages('ggplot2')
library(ggplot2)
install.packages("dplyr")
library(dplyr)
install.packages("lubridate")
library(lubridate)
##Reading minute-wise METs into "minutewiseMET_Records" and summarizing MET per day for all the IDs
minutewiseMET_Records <- read.csv("minuteMETsNarrow_merged.csv")
str(minutewiseMET_Records)
## converting column ID to character,Activity_Date_Minute to date
minutewiseMET_Records$Id <- as.character(minutewiseMET_Records$Id)
minutewiseMET_Records$Date <- as.Date(minutewiseMET_Records$Activity_Date_Minute)
str(minutewiseMET_Records)
The Console is as follows:
> minutewiseMET_Records <- read.csv("minuteMETsNarrow_merged.csv")
> str(minutewiseMET_Records)
'data.frame': 1048575 obs. of 3 variables:
$ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
$ Activity_Date_Minute: chr "04/12/2016 00:00" "04/12/2016 00:01" "04/12/2016 00:02" "04/12/2016 00:03" ...
$ METs : int 10 10 10 10 10 12 12 12 12 12 ...
> ## converting column ID to character,Activity_Date_Minute to date
> minutewiseMET_Records$Id <- as.character(minutewiseMET_Records$Id)
> minutewiseMET_Records$Date <- as.Date(minutewiseMET_Records$Activity_Date_Minute)
> str(minutewiseMET_Records)
'data.frame': 1048575 obs. of 4 variables:
$ Id : chr "1503960366" "1503960366" "1503960366" "1503960366" ...
$ Activity_Date_Minute: chr "04/12/2016 00:00" "04/12/2016 00:01" "04/12/2016 00:02" "04/12/2016 00:03" ...
$ METs : int 10 10 10 10 10 12 12 12 12 12 ...
$ Date : Date, format: "4-12-20" "4-12-20" ...
>
I think this will work for you
minutewiseMET_Records$Date <- format(as.Date(minutewiseMET_Records$Activity_Date_Minute, format = "%m/%d/%Y"),"%m/%d/%Y")
First of all, you have to tell R the format of your initial data (here month/day/year, so "%m/%d/%Y"). Then you tell it which format you want for the output. Note that format() returns a character vector, not a Date.
Activity_Date_Minute isn’t a datetime in your initial data, it’s a character. So you’ll have to first convert it to a datetime (e.g., using lubridate::mdy_hm()), then use as.Date().
library(dplyr)
library(lubridate)
minutewiseMET_Records %>%
  mutate(
    Activity_Date_Minute = mdy_hm(Activity_Date_Minute),
    Activity_Date = as.Date(Activity_Date_Minute)
  )
# A tibble: 4 × 2
Activity_Date_Minute Activity_Date
<dttm> <date>
1 2016-04-12 00:00:00 2016-04-12
2 2016-04-12 00:01:00 2016-04-12
3 2016-04-12 00:02:00 2016-04-12
4 2016-04-12 00:03:00 2016-04-12
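The odd "4-12-20" in the question most likely comes from calling as.Date() without a format: it then tries "%Y-%m-%d" and "%Y/%m/%d", so "04/12/2016" is read as year 4, month 12, day 20. A base R sketch of the fix, using two sample values copied from the question's str() output (the vector name `x` is only for illustration):

```r
# Sample values copied from the question's str() output
x <- c("04/12/2016 00:00", "04/12/2016 00:01")

# Parse as month/day/year; the trailing time part is ignored by as.Date()
d <- as.Date(x, format = "%m/%d/%Y")

# Character version in mm/dd/yyyy, for display only
d_chr <- format(d, "%m/%d/%Y")

print(d)
print(d_chr)
```

Storing the Date column and formatting only when printing or exporting keeps date arithmetic and grouping by day available.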

Time series object

I have one table with two columns DATE and Q.
DATE Q
--------------------
2013-01-04 932
2013-01-05 409
2013-01-08 511
2013-01-11 121
2013-01-12 252
2013-01-13 201
2013-01-14 40
2013-01-15 66
2013-01-17 NA
2013-01-18 123
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 10 obs. of 2 variables:
$ DATE: POSIXct, format: "2013-01-04" "2013-01-05" "2013-01-08" "2013-01-11" ...
$ Q: num 932 409 511 121 252 201 40 66 NA 123 ..
As you can see from the data, the frequency is irregular. The first column holds dates and the second column is numeric. My intention is to convert this table into a time series object for further projections with the forecast package.
Can anyone help me with some code to convert this table into a ts object?
# Build a complete daily sequence covering the period of interest
time <- seq(as.Date("2018-1-1"), as.Date("2019-1-1"), by = 1)
df <- data.frame(DATE = time)
output <- dplyr::left_join(df, YOUR_TABLE, by = "DATE")
Your table should have a date column named "DATE" (of class Date, so that it matches the sequence). You then get NA values wherever your data is missing, and you can transform the result into a time series. I don't know whether this will help; for me it sometimes does. You may also want to handle the NAs with some replacement method.
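The same idea can be sketched in base R with the ten rows from the question. The names `tbl`, `full`, `out` and `q_ts` are illustrative, and frequency = 7 (weekly seasonality) is purely an assumption that should be replaced by whatever fits the data.

```r
# The table from the question, with DATE stored as Date
tbl <- data.frame(
  DATE = as.Date(c("2013-01-04", "2013-01-05", "2013-01-08", "2013-01-11",
                   "2013-01-12", "2013-01-13", "2013-01-14", "2013-01-15",
                   "2013-01-17", "2013-01-18")),
  Q = c(932, 409, 511, 121, 252, 201, 40, 66, NA, 123)
)

# Regular daily grid spanning the observed period
full <- data.frame(DATE = seq(min(tbl$DATE), max(tbl$DATE), by = "day"))

# Left join: days without an observation get NA in Q
out <- merge(full, tbl, by = "DATE", all.x = TRUE)

# Regularly spaced series; frequency = 7 is an assumption
q_ts <- ts(out$Q, frequency = 7)
print(out)
```

The NAs still have to be dealt with before most forecasting functions will accept the series, for example by interpolation.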

How to convert monthly time-series in R

I am working on a monthly-based time-series data set:
> head(data, n=10)
# A tibble: 10 x 2
Month Inflation
<dttm> <dbl>
1 1979-01-01 00:00:00 0.0258
2 1979-02-01 00:00:00 0.0234
3 1979-03-01 00:00:00 0.0055
4 1979-04-01 00:00:00 0.0302
5 1979-05-01 00:00:00 0.0305
6 1979-06-01 00:00:00 0.0232
7 1979-07-01 00:00:00 0.025
8 1979-08-01 00:00:00 0.0234
9 1979-09-01 00:00:00 0.0074
10 1979-10-01 00:00:00 0.0089
However, the data is not yet recognized as a time series, as its structure shows:
> str(data)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 479 obs. of 2 variables:
$ Month : POSIXct, format: "1979-01-01" "1979-02-01" "1979-03-01" "1979-04-01" ...
$ Inflation: num 0.0258 0.0234 0.0055 0.0302 0.0305 0.0232 0.025 0.0234 0.0074 0.0089 ...
When I tried to convert it using xts function, it gave me this error:
> inflation <- xts(data[,-1], order.by=as.Date(data[,1], "%m/%d/%Y"))
Error in as.Date.default(data[, 1], "%m/%d/%Y") :
do not know how to convert 'data[, 1]' to class “Date”
Please help me with the most appropriate way of data conversion.
Thanks
# You have something like (simulated monthly data, 479 observations as in the question):
data <- data.frame(
  Month = seq(as.Date("1979-01-01"), by = "month", length.out = 479),
  Inflation = rnorm(479)) # same number of obs
Create the ts object.
Choose the start date appropriately; the end follows from the number of observations.
tseries <- ts(data$Inflation, start = c(1979,1), frequency = 12)
plot(tseries)
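The error in the question is probably caused by data[, 1] returning a one-column tibble rather than a vector; data$Month or data[["Month"]] returns the POSIXct vector that as.Date() can handle. A sketch with simulated values (the inflation numbers are random, and the commented xts line assumes the xts package is installed):

```r
set.seed(1)

# Simulated stand-in for the question's data: 479 monthly POSIXct timestamps
data <- data.frame(
  Month = seq(as.POSIXct("1979-01-01", tz = "UTC"), by = "month", length.out = 479),
  Inflation = rnorm(479)
)

# Extract the column as a vector, then convert POSIXct -> Date
month_dates <- as.Date(data[["Month"]])

# Monthly ts object; the end is derived from the number of observations
inflation_ts <- ts(data$Inflation, start = c(1979, 1), frequency = 12)
print(end(inflation_ts))

# With xts installed (assumption), the equivalent object would be:
# inflation_xts <- xts::xts(data$Inflation, order.by = month_dates)
```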

R - Find a value based on a criteria

I have a data frame DF with numerous columns; one holds the dates and another the hour.
I need to find the PRICE (in the same data frame) from 36 hours before. Not all of my days have 24 hours, so I can't simply shift my data set.
My idea was to look up the day before in my dataset and then 12 hours before that.
This is what I wrote, but it is not working:
for (i in 38:nrow(DF)){
RefDay=as.Date(DF$Date[i])
HourRef=DF$Hour[i]
DF$P24[i]=DF[which(DF$Date == (RefDay-1))& which(DF$Hour == (HourRef-36)),"PRICE"]
}
Here is my DF:
'data.frame': 20895 obs. of 45 variables:
$ Hour : Factor w/ 24 levels "0","1","2","3",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Date : POSIXct, format: "2016-07-01" "2016-07-01" "2016-07-01" "2016-07-01" ...
$ PRICE : num 29.4 24.7 23.4 21.9 20.2 ...
Here is a sample of my data:
DF.Hour DF.Date DF.PRICE
1 0 2016-07-01 29.36
2 1 2016-07-01 24.69
3 2 2016-07-01 23.42
4 3 2016-07-01 21.91
5 4 2016-07-01 20.19
6 5 2016-07-01 22.44
Try to fill the data.frame with full days. You can do it with complete() from tidyr, which fills the combinations that do not exist with NA.
Once every day has all 24 hours, the value from 36 hours before is simply the 36th element before, which you can get with, for example, lag(PRICE, 36).
library(dplyr); library(tidyr)
DF <- complete(DF, Date, Hour) %>% arrange(Date, Hour)
DF$P36 <- lag(DF$PRICE, 36)
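To see the idea end to end without extra packages, here is a base R sketch on toy data: three days of hourly prices with one hour deliberately removed. All names (`grid`, `DF_full`, `P36`) and the prices are hypothetical, and Hour is stored as an integer here rather than a factor.

```r
# Toy data: 3 days x 24 hours; PRICE is just the row number 1..72
grid <- expand.grid(Hour = 0:23, Date = as.Date("2016-07-01") + 0:2)
grid$PRICE <- seq_len(nrow(grid))
DF <- grid[-30, ]  # simulate a day with a missing hour

# Rebuild the full Date x Hour grid and left-join the observed prices
full <- expand.grid(Hour = 0:23, Date = unique(DF$Date))
DF_full <- merge(full, DF, by = c("Date", "Hour"), all.x = TRUE)
DF_full <- DF_full[order(DF_full$Date, DF_full$Hour), ]

# Shift PRICE down by 36 rows, which equals 36 hours on a complete grid
n <- nrow(DF_full)
DF_full$P36 <- c(rep(NA, 36), DF_full$PRICE[seq_len(n - 36)])

print(head(DF_full[37:40, ]))
```

Rows whose reference hour was missing in the original data end up with NA in P36, which makes the gaps explicit instead of silently pointing at the wrong hour.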

converting data frame (factors) into xts

I know this has been asked several times, but I could not find the right way to get around my problem. I have a very simple CSV file that I load, which looks like this:
27.07.2015,100
28.07.2015,100.1504
29.07.2015,100.1957
30.07.2015,100.5044
31.07.2015,100.7661
03.08.2015,100.9308
04.08.2015,100.8114
05.08.2015,100.6927
06.08.2015,100.7501
07.08.2015,100.7194
10.08.2015,100.8197
11.08.2015,100.8133
Now I need to convert my data.frame into xts so I can use the PerformanceAnalytics package. My data.frame has the structure:
> str(mpey)
'data.frame': 243 obs. of 2 variables:
$ V1: Factor w/ 243 levels "01.01.2016","01.02.2016",..: 210 218 228 234 241 21 30 38 45 52 ...
- attr(*, "names")= chr "5" "6" "7" "8" ...
$ V2: Factor w/ 242 levels "100","100.0062",..: 1 4 5 10 16 20 17 13 15 14 ...
- attr(*, "names")= chr "5" "6" "7" "8" ...
I tried different things with the as.xts function but could not make it work.
Could you please help me get past this?
Here's a solution using the tidyquant package, which contains as_xts() for coercing data frames to xts objects and as_tibble() for coercing time series objects such as xts to tibbles ("tidy" data frames).
Recreate your data
> data_df
# A tibble: 12 × 2
date value
<fctr> <fctr>
1 27.07.2015 100
2 28.07.2015 100.1504
3 29.07.2015 100.1957
4 30.07.2015 100.5044
5 31.07.2015 100.7661
6 03.08.2015 100.9308
7 04.08.2015 100.8114
8 05.08.2015 100.6927
9 06.08.2015 100.7501
10 07.08.2015 100.7194
11 10.08.2015 100.8197
12 11.08.2015 100.8133
First, we need to reformat your data frame. The dates and values are both stored as factors and they need to be in a date and double class, respectively. We'll load tidyquant and reformat the data frame. Note that tidyquant loads the tidyverse and financial packages so you don't need to load anything else. The date can be converted with lubridate::dmy which converts characters in a day-month-year format to date. The value needs to go from factor to character then from character to double, and this is done by nesting as.numeric and as.character.
> library(tidyquant)
> data_tib <- data_df %>%
mutate(date = dmy(date),
value = as.numeric(as.character(value)))
> data_tib
# A tibble: 12 × 2
date value
<date> <dbl>
1 2015-07-27 100.0000
2 2015-07-28 100.1504
3 2015-07-29 100.1957
4 2015-07-30 100.5044
5 2015-07-31 100.7661
6 2015-08-03 100.9308
7 2015-08-04 100.8114
8 2015-08-05 100.6927
9 2015-08-06 100.7501
10 2015-08-07 100.7194
11 2015-08-10 100.8197
12 2015-08-11 100.8133
Now, we can coerce to xts using the tidyquant::as_xts() function. Just specify date_col = date.
> data_xts <- data_tib %>%
as_xts(date_col = date)
> data_xts
value
2015-07-27 100.0000
2015-07-28 100.1504
2015-07-29 100.1957
2015-07-30 100.5044
2015-07-31 100.7661
2015-08-03 100.9308
2015-08-04 100.8114
2015-08-05 100.6927
2015-08-06 100.7501
2015-08-07 100.7194
2015-08-10 100.8197
2015-08-11 100.8133
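The two conversions are the core of this answer and can also be done in base R. The sketch below uses the first three rows from the question; the object names are illustrative, and the commented xts line assumes the xts package is installed.

```r
# First three rows of the question's data, stored as factors like in str(mpey)
mpey <- data.frame(
  V1 = factor(c("27.07.2015", "28.07.2015", "29.07.2015")),
  V2 = factor(c("100", "100.1504", "100.1957"))
)

# Pitfall: as.numeric() on a factor returns the internal level codes
codes <- as.numeric(mpey$V2)

# Go through character to get the real values and dates
dates  <- as.Date(as.character(mpey$V1), format = "%d.%m.%Y")
values <- as.numeric(as.character(mpey$V2))

print(data.frame(dates, values, codes))

# With xts installed (assumption), the series could then be built as:
# mpey_xts <- xts::xts(values, order.by = dates)
```

Reading the file with stringsAsFactors = FALSE (the default since R 4.0.0) avoids the factor detour in the first place.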
