R lubridate as_date does not convert datetime to date [duplicate] - r

This question already has an answer here:
Convert factor to date class for multiple columns
(1 answer)
Closed 2 years ago.
I read a table in from Excel using read_excel and get two datetime columns, but what I need is two columns of dates:
  User  DOB                 Answer_dt           Question Answer
  <chr> <dttm>              <dttm>                 <int>  <int>
1 User1 1900-01-01 00:00:00 2017-01-26 00:00:00        1      7
2 User2 1900-01-01 00:00:00 2017-01-26 00:00:00        2      8
I would like the datetime columns to be converted to dates (the times are irrelevant), and have tried using mutate and lubridate in various combinations, but have succeeded only in getting an error message that I don't understand:
> library(lubridate)
> dt <- eML_daily[1, "DOB"]
> dt
# A tibble: 1 x 1
DOB
<dttm>
1 1900-01-01 00:00:00
Warning message:
`...` is not empty.
These dots only exist to allow future extensions and should be empty.
Did you misspecify an argument?
> as_date(dt)
Error in as.Date.default(x, ...) :
do not know how to convert 'x' to class “Date”
> as_date(df[,"DOB"])
Error in as.Date.default(x, ...) :
do not know how to convert 'x' to class “Date”
I don't understand the warning messages, and can't quite see what I am doing wrong. Surely it should be a simple matter to convert from dttm to date and discard the time, which I don't need.
I'd be very appreciative of a pointer.
Sincerely and with many thanks in advance
Thomas Philips

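In as_date(dt) you are passing a tibble, not a datetime vector, so the conversion unsurprisingly fails. Unlike a base data frame, a tibble does not drop to a vector under eML_daily[1, "DOB"]; it returns a one-column tibble. Use eML_daily$DOB or eML_daily[["DOB"]] to get the column itself. In as_date(df[,"DOB"]), I can't say what you are trying to do, as you haven't given us df.
Working example: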
library(tidyverse)
library(lubridate)
dt <- tibble(x=as_datetime("2017-01-26 00:00:00"))
dt
# A tibble: 1 x 1
x
<dttm>
1 2017-01-26 00:00:00
dt %>% mutate(x=as_date(x))
# A tibble: 1 x 1
x
<date>
1 2017-01-26
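To make the distinction concrete without the tidyverse, here is a base R sketch. The data frame dt is a hypothetical stand-in for the question's one-row tibble. Because a base data frame (unlike a tibble) would drop to a vector under [1, "DOB"], the sketch passes the whole data frame to reproduce the error, then extracts the column with $.

```r
# Hypothetical one-row stand-in for eML_daily[1, "DOB"]
dt <- data.frame(DOB = as.POSIXct("1900-01-01 00:00:00", tz = "UTC"))

# Passing the whole data frame fails: there is no as.Date() method for data
# frames, so it falls through to as.Date.default(), which raises the error
msg <- tryCatch(as.Date(dt), error = function(e) conditionMessage(e))
print(msg)

# Extracting the column first gives a POSIXct vector, which converts cleanly;
# tz = "UTC" is passed explicitly so the calendar day does not shift
d <- as.Date(dt$DOB, tz = "UTC")
print(d)
```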

You can use as.Date to convert date-time columns to date.
If you want to change columns 2 and 3 to dates, you can do:
eML_daily[2:3] <- lapply(eML_daily[2:3], as.Date)
Or with dplyr :
library(dplyr)
eML_daily %>% mutate(across(2:3, as.Date))
#For dplyr < 1.0.0
#eML_daily %>% mutate_at(2:3, as.Date)
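If the datetime columns are not always in positions 2 and 3, a variation (not part of the answer above) is to select them by class. A minimal base R sketch, using a hypothetical toy copy of eML_daily built from the values printed in the question; the times are assumed to be in UTC.

```r
# Hypothetical toy version of eML_daily, built from the question's printout
eML_daily <- data.frame(
  User      = c("User1", "User2"),
  DOB       = as.POSIXct(c("1900-01-01", "1900-01-01"), tz = "UTC"),
  Answer_dt = as.POSIXct(c("2017-01-26", "2017-01-26"), tz = "UTC"),
  Question  = 1:2,
  Answer    = 7:8
)

# TRUE for every column that holds date-times, wherever it sits
is_dttm <- vapply(eML_daily, inherits, logical(1), what = "POSIXct")

# Convert only those columns; tz = "UTC" is an assumption about the source data
eML_daily[is_dttm] <- lapply(eML_daily[is_dttm], as.Date, tz = "UTC")
str(eML_daily)
```

Selecting by class means the code keeps working if columns are added or reordered in the spreadsheet.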

Have you tried to convert it to character first?
Here's a quick sample:
library(dplyr)
library(lubridate)
x <- tibble(dt = c(Sys.time(), Sys.time() - 345767)) %>%
  mutate(dt = as_date(as.character(dt)))  # round-trip through character, then parse as a date
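One thing to watch with the character round-trip: the text is written in the object's own time zone, whereas a direct conversion uses whatever tz it is given, so near midnight the two can land on different days. A base R sketch with a hypothetical timestamp in America/New_York (it assumes that zone exists in the system's time zone database):

```r
# Hypothetical instant late in the evening, New York time
t <- as.POSIXct("2017-01-26 23:30:00", tz = "America/New_York")

# Round-trip through character: format() writes the clock time in the object's zone
via_char <- as.Date(format(t))

# Direct conversion in UTC: 23:30 EST is 04:30 UTC on the next day
direct_utc <- as.Date(t, tz = "UTC")

# Direct conversion in the object's own zone agrees with the round-trip
direct_local <- as.Date(t, tz = "America/New_York")

print(c(via_char, direct_utc, direct_local))
```

Passing tz explicitly to as.Date() avoids the detour through text and makes the intended zone visible in the code.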

Related

Easiest way to convert a data.frame to a time series object in R

I need to read a data series stored in a .csv file into R and analyze it using the TSstudio package. The series consists of two columns: the first stores the date, the second a floating-point value measured daily. As straightforward as it could get.
So I first read the csv as a data.frame:
a_data_frame <- read.csv("some_data.csv", sep=";", dec = ",", col.names=c("date", "value"))
head(a_data_frame)
A data.frame: 6 × 2
date value
<chr> <dbl>
1 04/06/1986 0.065041
2 05/06/1986 0.067397
3 06/06/1986 0.066740
4 09/06/1986 0.068247
5 10/06/1986 0.067041
6 11/06/1986 0.066740
The values in the first column are of type chr, so I convert them to dates with the lubridate package:
library(lubridate)
a_data_frame$date <- dmy(a_data_frame$date)
head(a_data_frame)
A data.frame: 6 × 2
date value
<date> <dbl>
1 1986-06-04 0.065041
2 1986-06-05 0.067397
3 1986-06-06 0.066740
4 1986-06-09 0.068247
5 1986-06-10 0.067041
6 1986-06-11 0.066740
Here comes my headache. When I try to convert the data.frame to a time series, I get a matrix of type character instead:
a_time_series <- as.ts(a_data_frame)
head(a_time_series)
A matrix: 6 × 2 of type chr
date value
1986-06-04 0.065041
1986-06-05 0.067397
1986-06-06 0.066740
1986-06-09 0.068247
1986-06-10 0.067041
1986-06-11 0.066740
Is there any other way to convert a data.frame to a ts object?
Assuming some_data.csv is generated reproducibly as in the Note at the end, read it into a zoo series and then use as.ts. That gives a daily series with NAs for the missing days, with the time being the number of days since the Epoch. That may or may not be the ts object you want, but the question did not specify it further. Also see this answer.
library(zoo)
z <- read.csv.zoo("some_data.csv", format = "%d/%m/%Y")
tt <- as.ts(z); tt
## Time Series:
## Start = 5998
## End = 6005
## Frequency = 1
## [1] 0.065041 0.067397 0.066740       NA       NA 0.068247 0.067041 0.066740
Note
Lines <- "date,value
04/06/1986,0.065041
05/06/1986,0.067397
06/06/1986,0.066740
09/06/1986,0.068247
10/06/1986,0.067041
11/06/1986,0.066740"
cat(Lines, file = "some_data.csv")
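For comparison, a base R sketch of the same idea without zoo. A ts object holds only the values plus a start and frequency, so the date column is used to build a complete daily grid and is then left out. The six rows from the Note are typed in directly here; missing days (the weekend) become NA, and the time index is days since 1970-01-01, matching the zoo output.

```r
# The six observations from the Note, with dates already parsed
df <- data.frame(
  date  = as.Date(c("1986-06-04", "1986-06-05", "1986-06-06",
                    "1986-06-09", "1986-06-10", "1986-06-11")),
  value = c(0.065041, 0.067397, 0.066740, 0.068247, 0.067041, 0.066740)
)

# Complete daily grid, so that days without an observation become NA
grid <- data.frame(date = seq(min(df$date), max(df$date), by = "day"))
full <- merge(grid, df, all.x = TRUE)

# Only the numeric values go into ts(); start is the first date as days since the Epoch
tt <- ts(full$value, start = as.numeric(min(df$date)), frequency = 1)
print(tt)
```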

How to use group_by without ordering alphabetically?

I'm trying to visualize some bird data; however, after grouping by month, the rows come out in a different order from the original data. The original is ordered December, January, February, March, but after the manipulation the result is December, February, January, March.
Any ideas how I can fix this or sort the rows?
This is the code:
library(dplyr)
library(tidyr)
BirdDataTimeClean <- BirdDataTimes %>%
  group_by(Date) %>%
  summarise(Gulls = sum(Gulls), Terns = sum(Terns), Sandpipers = sum(Sandpipers),
            Plovers = sum(Plovers), Pelicans = sum(Pelicans), Oystercatchers = sum(Oystercatchers),
            Egrets = sum(Egrets), PeregrineFalcon = sum(Peregrine_Falcon), BlackPhoebe = sum(Black_Phoebe),
            Raven = sum(Common_Raven))
BirdDataTimeClean2 <- BirdDataTimeClean %>%
  pivot_longer(!Date, names_to = "Species", values_to = "Count")
You haven't shared any workable data, but I run into this often when reading from CSV, because all dates and data come in as character.
As suggested, convert the dates to the Date class using the lubridate package or base as.Date(); then arrange() in dplyr, or even group_by() on its own, will order them chronologically instead of alphabetically.
Example with toy data:
library(data.table)
library(dplyr)
library(lubridate)
birds <- data.table(dates = c("2020-Feb-20", "2020-Jan-20", "2020-Dec-20", "2020-Apr-20"),
                    species = c("Gulls", "Terns", "Gulls", "Sandpiper"),
                    Counts = c(20, 30, 40, 50))
str(birds) will show that dates is a character column (and I have deliberately not kept the order).
Convert the dates using lubridate: birds$dates %>% lubridate::ymd() changes them to the Date type.
birds$dates %>% ymd() %>% str()
Date[1:4], format: "2020-02-20" "2020-01-20" "2020-12-20" "2020-04-20"
Save it with birds$dates <- ymd(birds$dates), or do it in your pipeline as follows.
Now simply do the dplyr analysis:
birds %>%
  group_by(Months = ymd(dates)) %>%
  summarise(N = n(), Species_Count = sum(Counts)) %>%
  arrange(Months)
which gives:
# A tibble: 4 x 3
  Months         N Species_Count
  <date>     <int>         <dbl>
1 2020-01-20     1            30
2 2020-02-20     1            20
3 2020-04-20     1            50
4 2020-12-20     1            40
However, if you want Apr, Jan instead of numbers and apply format() to get them, the dates become character again. I would suggest keeping your data as dates and formatting only when presenting output to others: use format() there, or, if you use DT or another table package, check its output formatting options. That way your original data stays intact and users see what they want.
This will make it character, so the groups sort alphabetically:
birds %>%
  group_by(Months = as.character.Date(dates)) %>%
  summarise(N = n(), Species_Count = sum(Counts)) %>%
  arrange(Months)
# A tibble: 4 x 3
  Months          N Species_Count
  <chr>       <int>         <dbl>
1 2020-Apr-20     1            50
2 2020-Dec-20     1            40
3 2020-Feb-20     1            20
4 2020-Jan-20     1            30
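If you do want month names rather than full dates, one option (not taken from the answer above) is to make them a factor whose levels are in the order you want; grouping then follows the level order instead of the alphabet, and dplyr's group_by() likewise sorts factor groups by their levels. A base R sketch with hypothetical labels and counts, using tapply() in place of group_by() plus summarise():

```r
# Hypothetical month labels and counts, as they might come out of a CSV
months <- c("Dec", "Jan", "Feb", "Mar", "Jan", "Dec")
counts <- c(5, 3, 4, 2, 1, 6)

# Grouping on plain character sorts alphabetically: Dec, Feb, Jan, Mar
alpha_totals <- tapply(counts, months, sum)

# A factor with explicit levels keeps the survey's season order instead
season <- factor(months, levels = c("Dec", "Jan", "Feb", "Mar"))
season_totals <- tapply(counts, season, sum)

print(alpha_totals)
print(season_totals)
```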

How to convert a "char" column to datetime column in large datasets

I am working with large datasets in which one column is stored as character (chr) instead of as a datetime. I am trying to convert it, but without success.
Could you please suggest a solution? It would be very helpful.
Thanks in advance.
Code I am using right now:
c_data$dt_1 <- lubridate::parse_date_time(c_data$started_at,"ymd HMS")
Output I get:
2027-05-20 20:10:03
but the desired output is:
2020-05-20 10:03
Here is another way using lubridate:
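library(dplyr)
library(lubridate)
df <- tibble(start_at = c("27/05/2020 10:03", "25/05/2020 10:47"))
# The strings have hours and minutes but no seconds, so use dmy_hm();
# dmy_hms() would misread them as 20:10:03 and 20:10:47
df %>%
  mutate(start_at = dmy_hm(start_at))
# A tibble: 2 x 1
  start_at
  <dttm>
1 2020-05-27 10:03:00
2 2020-05-25 10:47:00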
In R, date-time objects have a single standard display format. You can change the format to whatever you need, but the result will then be of type character.
If you want to keep the data in the format year-month-day minute:second, you can use format() (use '%H:%M' instead of '%M:%S' if you mean hour:minute):
format(Sys.time(), '%Y-%m-%d %M:%S')
#[1] "2021-08-27 17:54"
For the entire column you can apply this as:
c_data$dt_2 <- format(c_data$dt_1, '%Y-%m-%d %M:%S')
Read ?strptime for different formatting options.
Using anytime:
library(dplyr)
library(anytime)
addFormats("%d/%m/%Y %H:%M")
df %>%
  mutate(start_at = anytime(start_at))
-output
# A tibble: 2 x 1
start_at
<dttm>
1 2020-05-27 10:03:00
2 2020-05-25 10:47:00
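A base R alternative, assuming the strings look like the sample used in these answers (day/month/year hour:minute). Supplying the format explicitly leaves no doubt about which fields are hours and which are minutes; the UTC time zone is an assumption.

```r
# Sample strings in day/month/year hour:minute form
start_at <- c("27/05/2020 10:03", "25/05/2020 10:47")

# Explicit format; tz = "UTC" is an assumption, substitute your data's time zone
parsed <- as.POSIXct(start_at, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Still a date-time, so sorting and arithmetic work
print(parsed)

# Reformatting for display produces character, not a date-time
shown <- format(parsed, "%Y-%m-%d %H:%M")
print(shown)
```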

Mutate and format multiple date columns [duplicate]

This question already has an answer here:
Convert multiple character columns to as.Date and time in R
(1 answer)
Closed 2 years ago.
I have a tibble containing some date columns formatted as strings:
library(tidyverse)
df <- tibble(dates1 = c("2020-08-03T00:00:00.000Z", "2020-08-03T00:00:00.000Z"),
             dates2 = c("2020-08-05T00:00:00.000Z", "2020-08-05T00:00:00.000Z"))
I want to convert the strings from YMD-HMS to DMY-HMS. Can someone explain to me why this doesn't work:
df %>%
  mutate_at(vars(starts_with("dates")), as.Date, format = "%d/%m/%Y %H:%M:%S")
Whereas this does?
df %>% mutate(dates1 = format(as.Date(dates1), "%d/%m/%Y %H:%M:%S")) %>%
  mutate(dates2 = format(as.Date(dates2), "%d/%m/%Y %H:%M:%S"))
Finally, is it possible to assign these columns as 'datetime' columns (e.g. dttm) rather than chr once the date formatting has taken place?
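The format argument you are passing goes to as.Date, where it describes how to parse the input; since "2020-08-03T00:00:00.000Z" does not match "%d/%m/%Y %H:%M:%S", the result is NA. What you really want is to pass it to the format() function, which controls the output. You can use an anonymous function for that, or the formula syntax.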
library(dplyr)
df %>%
mutate(across(starts_with("dates"), ~format(as.Date(.), "%d/%m/%Y %H:%M:%S")))
# A tibble: 2 x 2
# dates1 dates2
# <chr> <chr>
#1 03/08/2020 00:00:00 05/08/2020 00:00:00
#2 03/08/2020 00:00:00 05/08/2020 00:00:00
To store data as a date or datetime, R uses a standard representation, Y-M-D H:M:S. You can change how it is displayed using format(), but then the output is character, as above. To keep the columns as dttm, parse them instead:
df %>%
mutate(across(starts_with("dates"), lubridate::ymd_hms))
# dates1 dates2
# <dttm> <dttm>
#1 2020-08-03 00:00:00 2020-08-05 00:00:00
#2 2020-08-03 00:00:00 2020-08-05 00:00:00
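For reference, the same two steps in base R, as a sketch using the question's data rebuilt as a plain data.frame: parse the ISO 8601 strings into POSIXct (the trailing Z is assumed to mean UTC), then call format() only where the day/month/year text is needed.

```r
df <- data.frame(
  dates1 = c("2020-08-03T00:00:00.000Z", "2020-08-03T00:00:00.000Z"),
  dates2 = c("2020-08-05T00:00:00.000Z", "2020-08-05T00:00:00.000Z")
)
date_cols <- startsWith(names(df), "dates")

# Parse: the literal T is matched, %OS reads the fractional seconds, and
# strptime ignores the trailing "Z", so the zone is supplied as UTC
df[date_cols] <- lapply(df[date_cols], as.POSIXct,
                        format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")
str(df)

# Display as day/month/year; this result is character again
dmy_text <- lapply(df[date_cols], format, "%d/%m/%Y %H:%M:%S")
print(dmy_text)
```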

Why is dcast giving 1 and 0? [duplicate]

This question already has answers here:
dcast warning: ‘Aggregation function missing: defaulting to length’
(2 answers)
Closed 1 year ago.
I have the following data frame:
FileNumber<-c("510708396","510708396","510708396","510708485","510667325")
EventCode<-c("CASCRT","DISCSENT","DISCSENT","CASCRT","DISCSENT")
EventDate<-c("8/21/2018 12:00:00 AM","12/3/2018 2:41:18 PM","12/3/2018 3:50:16 PM","8/23/2018 12:00:00 AM","12/12/2018 9:11:28 AM")
df<-data.frame(FileNumber,EventCode,EventDate)
FileNumber EventCode EventDate
1 510708396 CASCRT 8/21/2018 12:00:00 AM
2 510708396 DISCSENT 12/3/2018 2:41:18 PM
3 510708396 DISCSENT 12/3/2018 3:50:16 PM
4 510708485 CASCRT 8/23/2018 12:00:00 AM
5 510667325 DISCSENT 12/12/2018 9:11:28 AM
I want to reshape this long-format data frame into wide format, using the EventCode values CASCRT and DISCSENT as the column names. I tried the following:
library(reshape2)
dcast(df,FileNumber~EventCode,value.var = "EventDate")
However, I receive the following, along with the message "Aggregation function missing: defaulting to length", whereas I was expecting the EventDate values.
FileNumber CASCRT DISCSENT
1 510667325 0 1
2 510708396 1 2
3 510708485 1 0
I'm guessing this has something to do with the non-unique values in FileNumber. How do I make sure that I get the EventDate values instead of 1s and 0s?
You get this message because there are multiple rows with the same FileNumber and EventCode. When casting the data into wide format, dcast does not know how to combine multiple values in one cell, so it falls back to length (i.e. it counts how many elements fall into that cell).
You need to decide how you want to proceed when there is more than one value per cell.
You could convert the EventDate column to a date-time, so that you can compute the mean, or keep only the max or min.
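A base R sketch of the "keep only the max" option. The format string is an assumption matching the sample's month/day/year 12-hour clock (%p expects English AM/PM markers). It keeps the latest event per FileNumber and EventCode and then reshapes to wide, where each cell now holds exactly one value.

```r
FileNumber <- c("510708396", "510708396", "510708396", "510708485", "510667325")
EventCode <- c("CASCRT", "DISCSENT", "DISCSENT", "CASCRT", "DISCSENT")
EventDate <- c("8/21/2018 12:00:00 AM", "12/3/2018 2:41:18 PM", "12/3/2018 3:50:16 PM",
               "8/23/2018 12:00:00 AM", "12/12/2018 9:11:28 AM")
df <- data.frame(FileNumber, EventCode, EventDate, stringsAsFactors = FALSE)

# Parse month/day/year with a 12-hour clock; %I is the hour, %p reads AM/PM
df$EventDate <- as.POSIXct(df$EventDate, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Sort newest first within each FileNumber/EventCode pair, then keep the first row of each pair
df <- df[order(df$FileNumber, df$EventCode, -as.numeric(df$EventDate)), ]
latest <- df[!duplicated(df[c("FileNumber", "EventCode")]), ]

# One value per cell now, so the wide reshape needs no aggregation function
wide <- reshape(latest, idvar = "FileNumber", timevar = "EventCode", direction = "wide")
print(wide)
```

Swapping the sort direction keeps the earliest event instead; computing a mean would need an aggregation step in place of duplicated().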
If you want to keep each date in a list, I'd highly suggest using tidyr's pivot_wider() function:
FileNumber<-c("510708396","510708396","510708396","510708485","510667325")
EventCode<-c("CASCRT","DISCSENT","DISCSENT","CASCRT","DISCSENT")
EventDate<-c("8/21/2018 12:00:00 AM","12/3/2018 2:41:18 PM","12/3/2018 3:50:16 PM","8/23/2018 12:00:00 AM","12/12/2018 9:11:28 AM")
df<-data.frame(FileNumber,EventCode,EventDate)
library(dplyr)
library(tidyr)
df2 <- df %>%
  pivot_wider(names_from = EventCode,
              values_from = EventDate)
This raises a warning, but puts the multiple elements in a list:
df2 is now:
# A tibble: 3 x 3
FileNumber CASCRT DISCSENT
<fct> <list<fct>> <list<fct>>
1 510708396 [1] [2]
2 510708485 [1] [0]
3 510667325 [0] [1]
And we can access the elements in the list:
df2$DISCSENT[1]
Returns:
<list_of<factor<b7763>>[1]>
[[1]]
[1] 12/3/2018 2:41:18 PM 12/3/2018 3:50:16 PM
5 Levels: 12/12/2018 9:11:28 AM ... 8/23/2018 12:00:00 AM
