I would like to transform the EncounterDate_G column into year-month-day format, e.g. 2021-1-24. I tried the mdy and ymd functions from lubridate, but I'm not having much luck.
library(dplyr)
library(tidyr)
library(lubridate)
gwlfullflattened2 <-
structure(list(V1_G = 1:20, mrn_G = 100:119, Full.name_G = c("Aintzane Eilert",
"Dervila Muriel", "Hermes Ingólfr", "Yordana Hadley", "Talaat Archembald",
"Erato Hozan", "Abram Eli", "Drahoslava Gottfrid", "Itxaro Csenge",
"Isokrates Linas", "Yejide Calixto", "Bohuslav Fedlimid", "Siva Jerneja",
"Mae Albie", "Rodolfo Slavomír", "Neptune Mahesh", "Madhavi Luka",
"Lexia Lành", "Marnie Urien", "Hovsep Tase"), date_of_birth_G = c("1/1/1990",
"1/1/1991", "1/1/1992", "1/1/1993", "1/1/1994", "1/1/1995", "1/1/1996",
"1/1/1997", "1/1/1998", "1/1/1999", "1/1/2000", "1/1/2001", "1/1/2002",
"1/1/2003", "1/1/2004", "1/1/2005", "1/1/2006", "1/1/2007", "1/1/2008",
"1/1/2009"), EncounterDate_G = c("1/5/2016", "1/4/2021", "1/21/2021",
"5/25/2021", "5/19/2021", "5/17/2021", "12/2/2021", "12/1/2021",
"1/5/2016", "1/4/2021", "1/21/2021", "5/25/2021", "5/19/2021",
"5/17/2021", "12/2/2021", "12/1/2021", "1/5/2016", "1/4/2021",
"1/21/2021", "5/25/2021")), row.names = c(NA, -20L), class = c("data.table",
"data.frame"))
gwlfullflattened22 <-
gwlfullflattened2 %>%
mutate(across(c(EncounterDate_G), mdy())) %>%
mutate(across(c(EncounterDate_G), as_datetime)) %>%
as_tibble()
The error I get is
Error: Problem with mutate() input ..1.
i ..1 = across(c(EncounterDate_G), mdy()).
x Problem with across() input .fns.
i .fns must be NULL, a function, a formula, or a list of functions/formulas.
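The error occurs because mdy() calls the function with no arguments, whereas across() expects the function itself (mdy, without parentheses). Alternatively, you can convert the column to a Date object with as.Date(vector, format).
A simple way to do it: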
gwlfullflattened2$EncounterDate_G <- as.Date(gwlfullflattened2$EncounterDate_G, format = "%m/%d/%Y")
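If you would rather stay within dplyr, here is a minimal sketch of the same conversion, assuming the fix is simply to pass mdy without parentheses. The small data frame df is a stand-in for gwlfullflattened2, not the original data.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for the question's data (hypothetical subset of the column)
df <- data.frame(EncounterDate_G = c("1/5/2016", "1/21/2021", "12/2/2021"))

# Pass the function itself (mdy), not the result of calling it (mdy())
out <- df %>%
  mutate(across(EncounterDate_G, mdy))

class(out$EncounterDate_G)
```

The resulting column is of class Date, which prints in year-month-day order.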
Related
I am using the excellent nanotime package to store my timestamps, but I am unable to make the package work when my tibble contains a missing value.
Consider this:
library(nanotime)
library(tibble)
library(dplyr)
tibble(time = c('2020-01-01 10:10:10.123456',
NA,
'2020-01-01 10:10:10.123456')) %>%
mutate(enhance = nanotime(time,
tz = 'GMT',
format = '%Y-%m-%d %H:%M:%E9S'))
Error in RcppCCTZ::parseDouble(x, fmt = format, tz = tz) :
Parse error on NA
What am I missing here? Using na.rm = TRUE does not work unfortunately.
Thanks!
The issue is that the parser fails on the NA entry (the error comes from RcppCCTZ::parseDouble, which cannot parse a missing value). A workaround is to fill the column with an integer64 NA using as.integer64 and then parse only the non-missing values.
library(nanotime)
tbl <- tibble::tibble(time = c('2020-01-01 10:10:10.123456',
NA,
'2020-01-01 10:10:10.123456'))
tbl$enhance <- as.integer64(NA)
tbl$enhance[!is.na(tbl$time)] <- nanotime(na.omit(tbl$time), tz = 'GMT',
format = '%Y-%m-%d %H:%M:%E9S')
nanotime(tbl$enhance)
I have the following dataframe/tibble sample:
structure(list(name = c("Contents.Key", "Contents.LastModified",
"Contents.ETag", "Contents.Size", "Contents.Owner", "Contents.StorageClass",
"Contents.Bucket", "Contents.Key", "Contents.LastModified", "Contents.ETag"
), value = c("2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0_0e94e664-4d5e-4646-b2b9-1937398cfaed_2019-01-01-07-54-46-064",
"2019-01-01T07:54:47.000Z", "\"378d04496cb27d93e1c37e1511a79ec7\"",
"24187", "e7c0d260939d15d18866126da3376642e2d4497f18ed762b608ed2307778bdf1",
"STANDARD", "vfevvv-edrfvevevev-streamed-data", "2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0_33a8ba28-245c-490b-99b2-254507431d47_2019-01-01-07-54-56-755",
"2019-01-01T07:54:57.000Z", "\"df8cc7082e0cc991aa24542e2576277b\""
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))
I want to spread the name column using the tidyr::spread() function, but I don't get the desired result:
df %>% tidyr::spread(key = name, value = value)
I get an error:
Error: Duplicate identifiers for rows:...
I also tried the melt function, with the same result.
I connected to S3 using the aws.s3::get_bucket() function and am trying to convert the result to a dataframe. I am aware there is an aws.s3::get_bucket_df() function that should do this, but it doesn't work (see my related question).
After I've got the bucket list, I've unlisted it and run enframe command.
Please advise.
You can introduce a row-number column first. This gives every row a unique identifier, so spread() no longer complains, but it introduces NAs that you will have to deal with.
df %>%
mutate(RN=row_number()) %>%
group_by(RN) %>%
spread(name,value)
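A possible alternative that avoids most of the NAs, sketched under the assumption that every record in the listing starts with a Contents.Key row (the question does not confirm this). The column name record and the toy data are hypothetical; pivot_wider() is tidyr's newer replacement for spread().

```r
library(dplyr)
library(tidyr)

# Toy data in the same long shape as the question (values are made up)
df <- tibble(
  name  = c("Contents.Key", "Contents.Size", "Contents.Key", "Contents.Size"),
  value = c("a.csv", "10", "b.csv", "20")
)

wide <- df %>%
  # Start a new record id each time a Contents.Key row appears
  mutate(record = cumsum(name == "Contents.Key")) %>%
  pivot_wider(names_from = name, values_from = value)

wide
```

This yields one row per record instead of one row per original line.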
I'm trying to use mutate_at() from dplyr to coerce date-like columns into columns of type Date using as.Date(), but I'm getting an error. Here's the code:
library(dplyr)
df = data.frame(date_1 = "7/5/2014", date_2 = "7/22/2011")
df %>%
mutate_at(.vars = c("date_1", "date_2"), .funs = as.Date("%m/%d/%Y"))
This gives me an error: Error in charToDate(x): character string is not in a standard unambiguous format
Not sure what's going on here, so I'd appreciate your help. I prefer dplyr solutions, but if there's a better way to do it, I'm open to that as well.
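The error occurs because as.Date("%m/%d/%Y") is evaluated immediately, so R tries to parse the format string itself as a date. I personally prefer the syntax below.
The . here refers to the column, which needs to be passed to the as.Date function.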
library(dplyr)
df = data.frame(date_1 = "7/5/2014", date_2 = "7/22/2011")
df %>%
mutate_at(vars(date_1, date_2), ~ as.Date(., "%m/%d/%Y")) # funs() is deprecated; a formula works the same way
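In recent dplyr versions (1.0.0 and later) the scoped verbs such as mutate_at() are superseded by across(). A minimal sketch of the same conversion in that style, assuming a current dplyr is installed:

```r
library(dplyr)

df <- data.frame(date_1 = "7/5/2014", date_2 = "7/22/2011")

# .x stands for each selected column in turn
out <- df %>%
  mutate(across(c(date_1, date_2), ~ as.Date(.x, format = "%m/%d/%Y")))

str(out)
```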
I have times recorded in mm:ss format, where the minutes value can be greater than 59. I parsed them as chr. Now I need to sort my observations in descending order, so I first created a time2 variable with the ms function and used arrange on the new variable. However, the arranging doesn't work and the values in the second column are totally mixed up.
library(tidyverse)
library(lubridate)
test <- structure(list(time = c("00:58", "07:30", "08:07", "15:45", "19:30",
"24:30", "30:05", "35:46", "42:23", "45:30", "48:08", "52:01",
"63:45", "67:42", "80:12", "86:36", "87:51", "04:27", "09:34",
"12:33", "18:03", "20:28", "21:39", "23:31", "24:02", "26:28",
"31:13", "43:03", "44:00", "45:38")), .Names = "time", row.names = c(NA,
-30L), class = c("tbl_df", "tbl", "data.frame"))
test %>% mutate(time2 = ms(time)) %>% arrange(time2) %>% View()
How can I arrange these times?
I think it would be easier to convert the times to a single unit (seconds) and then arrange(). Try this:
test %>% mutate(time_in_seconds = minute(ms(time) )*60 + second(ms(time))) %>%
arrange(desc(time_in_seconds)) %>%
View()
seconds_to_period(minute(ms(test$time))*60 + second(ms(test$time))) # to get right format (with hours)
This is a known limitation with arrange. dplyr does not support S4 objects: https://github.com/tidyverse/lubridate/issues/515
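The conversion to seconds can also be sketched in base R, without going through a Period object at all. This is only an illustration on a few of the question's values; the names parts, secs, and sorted are made up for the example.

```r
# A few values from the question, including minutes above 59
time <- c("00:58", "63:45", "07:30", "87:51")

# Split "mm:ss" into its two parts and convert to total seconds
parts <- strsplit(time, ":", fixed = TRUE)
secs <- vapply(parts,
               function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
               numeric(1))

# Sort the original strings by the numeric key, largest first
sorted <- time[order(secs, decreasing = TRUE)]
sorted
```

Because the sort key is a plain numeric vector, the same idea works inside arrange() as well.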
I am getting an odd error when I try to extract the year from a date object.
Here is a dput of my dates:
structure(list(date = structure(c(15706, 15707, 15708, 15709,
15710, 15711), class = "Date")), .Names = "date", row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
When I pipe to lubridate::year(date), I get the following error:
Error in year(., date) : unused argument (date)
This doesn't work in a pipe because %>% inserts the left-hand side as the first argument, so the call becomes year(., date):
dates %>%
lubridate::year(date)
Error in lubridate::year(., date) : unused argument (date)
Either we need to pull the column and then apply the function
dates %>%
pull(date) %>%
lubridate::year(.)
Or another way is to use the function within {}
dates %>%
{lubridate::year(.$date)}
#[1] 2013 2013 2013 2013 2013 2013
Or use the standard way of creating column by using mutate
dates %>%
mutate(year = lubridate::year(date))
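For comparison, a minimal base R sketch that extracts the year without a pipe or lubridate. It rebuilds the dates from the dput above; the variable name years is just for illustration.

```r
# Rebuild the question's dates from their underlying day counts
dates <- data.frame(
  date = structure(c(15706, 15707, 15708, 15709, 15710, 15711), class = "Date")
)

# format() returns the year as text; convert it to integer
years <- as.integer(format(dates$date, "%Y"))
years
```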
Assuming your object is named data, I'm guessing this is what you did:
data %>%
year(date)
That also didn't work for me. You can try this:
year(data$date)
Alternatively, you could use magrittr operator %$%:
library(magrittr)
data %$%
year(date)