Using `mutate_at()` with `as.Date()` - r

I'm trying to use mutate_at() from dplyr to coerce date-like columns into columns of type Date using as.Date(), but I'm getting an error. Here's the code:
library(dplyr)
df = data.frame(date_1 = "7/5/2014", date_2 = "7/22/2011")
df %>%
mutate_at(.vars = c("date_1", "date_2"), .funs = as.Date("%m/%d/%Y"))
This gives me an error: Error in charToDate(x): character string is not in a standard unambiguous format
Not sure what's going on here, so I'd appreciate your help. I prefer dplyr solutions, but if there's a better way to do it, I'm open to that as well.

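The call as.Date("%m/%d/%Y") is evaluated immediately, with the format string itself treated as the date to parse, which is why charToDate() fails. mutate_at() needs a function to apply, not the result of a call. I personally prefer the syntax below.
The . here refers to the column, which needs to be passed to the as.Date function.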
library(dplyr)
df = data.frame(date_1 = "7/5/2014", date_2 = "7/22/2011")
df %>%
mutate_at(vars(date_1, date_2), funs(as.Date(., "%m/%d/%Y")))
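Note that funs() has been deprecated since dplyr 0.8.0, and mutate_at() is superseded by across() from dplyr 1.0.0 onward. A minimal sketch of the same conversion in the newer style, assuming dplyr >= 1.0.0 is installed (the name `out` is just illustrative):

```r
library(dplyr)

df <- data.frame(date_1 = "7/5/2014", date_2 = "7/22/2011")

# across() applies the lambda to each selected column;
# .x stands for the column currently being processed
out <- df %>%
  mutate(across(c(date_1, date_2), ~ as.Date(.x, format = "%m/%d/%Y")))

str(out)
```

The lambda plays the same role as the . placeholder inside funs() above.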

Related

How to run left join in dplyr transforming the key columns ( using lubridate function) on the fly

I have two databases that I need to join on two date columns, with the condition that the DAY of those dates is the same. For example:
"2020/01/01 20:30" MUST MATCH "2020/01/01 17:50"
All dates are in POSIXct format.
While I could do some pre-processing with string parsing or the like, I wanted to handle it via lubridate/dplyr like:
DB_New <- left_join(DB_A,DB_B, by=c((date(Date1) = date(Date2)))
Notice I am using the date() function from lubridate to match on the day, as explained above. However, I am getting the error below:
DB_with_rain <- left_join(DB_FEB_2019_join,Chuvas_BH, by=c(date(Saida_Real)= date(DateTime)))
Error: unexpected '=' in "DB_with_rain <- left_join(DB_FEB_2019_join,Chuvas_BH, by=c(date(Saida_Real)="
The conversion cannot be done inside by; it expects the column names as strings. It should be done before the left_join:
library(dplyr)
DB_FEB_2019_join %>%
mutate(Saida_Real = as.Date(Saida_Real, format = "%Y/%m/%d %H:%M")) %>%
left_join(Chuvas_BH %>%
mutate(DateTime = as.Date(DateTime, format = "%Y/%m/%d %H:%M")),
by = c(Saida_Real = "DateTime"))
With lubridate, character input can instead be parsed with ymd_hm() and the result converted to Date class with as.Date().
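Since the question says the columns are already POSIXct, here is a minimal sketch of the "derive the day first, then join" idea using lubridate's as_date(). The toy tables, the helper column `day`, and the columns `trip` and `rain_mm` are made up for illustration and are not from the original data:

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-ins for DB_A and DB_B
db_a <- tibble(
  Date1 = ymd_hm(c("2020/01/01 20:30", "2020/01/02 08:15")),
  trip  = c("a", "b")
)
db_b <- tibble(
  Date2   = ymd_hm("2020/01/01 17:50"),
  rain_mm = 12.5
)

# Build a plain Date key on both sides (time of day dropped), then join on it
db_new <- db_a %>%
  mutate(day = as_date(Date1)) %>%
  left_join(db_b %>% mutate(day = as_date(Date2)), by = "day")

db_new
```

Keeping the original date-time columns and joining on a separate key column means no time information is lost in the result.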

How to use data.frame column values as function argument in dplyr mutate function

I am running into a problem using the mutate function of the dplyr package. I would like to use one column as an argument of the strptime function.
Example df:
rdf=data.frame(
d="20180514",
h=sample(1:25, 10)-1,
m=sample(1:60, 10)-1
)
df = data.frame(
stringtime = paste(rdf$d, rdf$h, rdf$m, sep=""),
timezone = sample(rep(c("GMT", "CET"), 5), 10)
)
df
stringtime timezone
1 201805141701 CET
2 201805140116 GMT
.
.
By intuition I wanted to run the command as follows:
df %>% mutate(timestamp = strptime(stringtime, tz=timezone, format="%Y%m%d%h%M")
Unluckily I get an error saying:
Error in [...]: invalid 'tz' value.
Does anybody have an idea what mistake I am making or if there would be an easy workaround?
Thanks in advance!
Update
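The tz argument must be a single string, not a vector, and the rows here use different values of 'timezone'. One option is to group_split by 'timezone' and then pass the first 'timezone' of each group: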
library(dplyr)
library(purrr)
df %>%
group_split(timezone) %>%
map_df(~ .x %>%
mutate(timestamp = as.POSIXct(stringtime,
format = "%Y%m%d%H%M", tz = as.character(first(timezone)))))
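For comparison, the same per-row idea can be sketched in base R without splitting the data frame: parse each string with its own time zone, keep the result as seconds since the epoch, and rebuild a single POSIXct vector at the end. This is only a sketch; the two sample rows are taken from the question's printed output, and it assumes the zone names "CET" and "GMT" are known to the system's time zone database:

```r
stringtime <- c("201805141701", "201805140116")
timezone   <- c("CET", "GMT")

# Parse each element with its own tz and return seconds since 1970-01-01 UTC
secs <- mapply(
  function(s, tz) as.numeric(as.POSIXct(s, format = "%Y%m%d%H%M", tz = tz)),
  stringtime, timezone,
  USE.NAMES = FALSE
)

# A POSIXct vector carries one time zone for the whole vector, so display in UTC
timestamp <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
timestamp
```

The instants are correct for each row, but because a POSIXct column can only carry one time zone attribute, they are all printed in the same zone.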
According to ?strptime:
strptime converts character vectors to class "POSIXlt": its input x is first converted by as.character.
The POSIXlt class is not supported in mutate here, because its underlying structure, when unclassed, is a list:
df %>%
mutate(timestamp = as.POSIXlt(stringtime, format="%Y%m%d%H%M"))
Error: Column timestamp is of unsupported class POSIXlt; please use
POSIXct instead
Instead, use as.POSIXct:
df %>%
mutate(timestamp = as.POSIXct(stringtime, format="%Y%m%d%H%M"))
# stringtime timezone timestamp
#1 201805141314 GMT 2018-05-14 13:14:00
#2 20180514115 GMT 2018-05-14 11:05:00
#3 201805141434 CET 2018-05-14 14:34:00
#...

R - mutate columns with different formats

I'm trying to do analysis from multiple csv files, and in order to create a key that can be used for left_join I think that I need to merge two columns. At present I'm trying to use the tidyverse packages (inc. mutate), but I'm running into an issue as the two columns to merge have different types: one is a double and the other is in date format. I'm using the following code:
qlik2 <- qlik %>%
separate('Admit DateTime', into = c('Admit Date', 'Admit Time'), sep = 10) %>%
mutate(key = MRN + `Admit Date`)
and getting this error:
Error in mutate_impl(.data, dots) :
Evaluation error: non-numeric argument to binary operator.
If there's another way around this (or if the error is actually related to something else), then I'd appreciate any thoughts on the matter. Equally, if people know of a way to left_join with multiple keys, then that would work as well.
Thanks,
Cal
It's hard to say without a reproducible example. But if I understand your question, you either want a numeric key or are trying to concatenate strings with +, which does not work for strings in R.
Numeric key
library(hablar)
qlik2 <- qlik %>%
separate('Admit DateTime',
into = c('Admit Date', 'Admit Time'),
sep = 10) %>%
convert(num(MRN, `Admit Date`)) %>%
mutate(key = MRN + `Admit Date`)
String key
qlik2 <- qlik %>%
separate('Admit DateTime',
into = c('Admit Date', 'Admit Time'),
sep = 10) %>%
mutate(key = paste(MRN, `Admit Date`))
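On the last part of the question: left_join accepts several key columns directly, so a combined key is not strictly needed. A minimal sketch with made-up tables (the names `admissions`, `labs`, `ward` and `result` are hypothetical, since the real data was not posted):

```r
library(dplyr)

# Hypothetical tables sharing the two key columns MRN and `Admit Date`
admissions <- tibble(
  MRN          = c(1001, 1002),
  `Admit Date` = as.Date(c("2019-03-01", "2019-03-02")),
  ward         = c("A", "B")
)
labs <- tibble(
  MRN          = c(1001, 1002),
  `Admit Date` = as.Date(c("2019-03-01", "2019-03-05")),
  result       = c(4.2, 5.1)
)

# Rows match only when both MRN and `Admit Date` agree
joined <- left_join(admissions, labs, by = c("MRN", "Admit Date"))
joined
```

This requires the key columns to have compatible types in both tables, so the date column still needs converting consistently on each side.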

select just date fields

I have a dataframe of various types (numeric, integer, Date, character).
I want to subset this to just the columns that have a format of 'Date'. How do I go about doing this?
mtcars$dates = '2015-05-05'
mtcars$dates = as.Date(mtcars$dates)
#filter just gives me: newdf = mtcars$dates
Another way using Filter:
#make a function that checks for the Date class
is.Date <- function(x) inherits(x, 'Date')
#use Filter to filter the data.frame
Filter(is.Date, mtcars)
We can use sapply to loop over the columns, get the class of the column, check whether it is 'Date' and use that logical vector to subset the columns.
mtcars[sapply(mtcars, class) == "Date"]
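One caveat with comparing class() to a string: a column can have more than one class (POSIXct columns have two), in which case sapply no longer returns a simple character vector. A sketch of a base R variant built on inherits(), using a small made-up data frame (`df`, `is_date_like` and `date_cols` are illustrative names):

```r
df <- data.frame(
  x = 1:2,
  d = as.Date(c("2015-05-05", "2015-05-06")),
  t = as.POSIXct(c("2015-05-05 10:00:00", "2015-05-06 11:00:00"), tz = "UTC")
)

# TRUE for columns that are Date or any date-time class (POSIXct/POSIXlt)
is_date_like <- vapply(df, inherits, logical(1), what = c("Date", "POSIXt"))

date_cols <- df[is_date_like]
names(date_cols)
```

Dropping "POSIXt" from `what` restricts the selection to Date columns only.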
Package purrr has a keep function for this:
library(purrr)
keep(mtcars, ~inherits(.x, "Date"))
The ~ and .x coding allows the use of inherits on each column without creating a separate function or using an anonymous function.
select_if lets you use a predicate on the columns of a data frame. Only those columns for which the predicate returns TRUE will be selected:
library(dplyr)
select_if(mtcars, function(x) inherits(x, 'Date'))
I had the same problem and found the above answers helpful, but I ultimately came up with a current tidyverse solution with a little help from the lubridate package to avoid creating my own anonymous function.
library(tidyverse)
library(lubridate)
my_mtcars <- mtcars %>%
as_tibble(rownames = "make_model") %>%
mutate(
start_date = as.Date("2022-01-01"),
end_date = as.Date("2022-01-31"),
POSIXct = as.POSIXct("2022-01-05")
)
my_mtcars %>%
select(where(is.Date))
Note, this only returns the start_date and end_date columns, but lubridate has the function is.POSIXt() for objects with other date-time classes.
my_mtcars %>%
select(where(~ is.Date(.x) | is.POSIXt(.x)))
This should work:
data(mtcars)
mtcars$dates = '2015-05-05'
mtcars$dates = as.Date(mtcars$dates)
head(mtcars)
v=sapply(mtcars,class) #get the class of each column
datecol=names(v)[v=='Date'] # select the columns having date class
mtcars[datecol] #subset those columns.

gather_ does not work. Shouldn't quoting and ~ing have the same effect in standard evaluation mode?

I have issues getting tidyr's gather to work in its standard evaluation version gather_ :
require(tidyr)
require(dplyr)
require(lazyeval)
df = data.frame(varName=c(1,2))
gather works:
df %>% gather(variable,value,varName)
but I'd like to be able to take the name varName from a variable in standard evaluation mode, and can't seem to get it right:
name='varName'
df %>% gather_("variable","value",interp(~v,v=name))
Error in match(x, y, 0L) : 'match' requires vector arguments
I'm also confused by the following.
This works as expected:
df %>% gather_("variable","value","varName")
The next line should be equivalent to last line (from my understanding of http://cran.r-project.org/web/packages/dplyr/vignettes/nse.html ), but doesn't work:
df %>% gather_(~variable,~value,~varName)
Error in match(x, y, 0L) : 'match' requires vector arguments
Looking at the source of tidyr:::gather_.data.frame, you can see that it is just a wrapper for reshape2::melt. As such, it only works for character or numeric arguments. Actually, the following (which I would consider a bug) works:
df %>% gather_("variable", "value", 1)
As far as I can tell the nse vignette only refers to dplyr and not to tidyr.
Although this question has been answered, the following code could be used for defining keys and values for gathering purposes more generally in a function, using a vector of inputs for key and value:
data <- data.frame(a = runif(10), b = runif(10), c = runif(10))
Key <- "ColId"
Value <- "ColValue"
data %>% gather(key = KeyTmp, value = ValTmp) %>%
rename_(.dots = setNames("KeyTmp", Key) ) %>%
rename_(.dots = setNames("ValTmp", Value) )
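In current tidyr, gather_() and rename_() are deprecated, and the same "names held in variables" problem can be handled with pivot_longer(), available since tidyr 1.0.0. A minimal sketch, assuming a recent tidyr; the variable names `name`, `key` and `value` are illustrative:

```r
library(tidyr)

df <- data.frame(varName = c(1, 2))

name  <- "varName"    # column(s) to gather, held as a character vector
key   <- "variable"   # name for the new key column
value <- "value"      # name for the new value column

# all_of() selects columns from a character vector;
# names_to and values_to accept plain strings, so no renaming step is needed
long <- pivot_longer(df, cols = all_of(name), names_to = key, values_to = value)
long
```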
