select just date fields - r

I have a dataframe of various types (numeric, integer, Date, character).
I want to subset this to just the columns that have a format of 'Date'. How do I go about doing this?
mtcars$dates = '2015-05-05'
mtcars$dates = as.Date(mtcars$dates)
# the filter should give me just the Date column(s), i.e. newdf = mtcars$dates

Another way using Filter:
#make a function that checks for the Date class
is.Date <- function(x) inherits(x, 'Date')
#use Filter to filter the data.frame
Filter(is.Date, mtcars)

We can use sapply to loop over the columns, get the class of the column, check whether it is 'Date' and use that logical vector to subset the columns.
mtcars[sapply(mtcars, class) == "Date"]
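One caveat with comparing class() directly: a column can carry more than one class (a POSIXct column has class c("POSIXct", "POSIXt")), in which case sapply returns a list rather than a character vector. Below is a minimal sketch of a variant that tests each column with inherits instead. The data frame df and its columns are made up for illustration.

```r
# Toy data (hypothetical): one integer, one Date and one POSIXct column
df <- data.frame(
  id = 1:3,
  day = as.Date(c("2015-05-05", "2015-05-06", "2015-05-07")),
  stamp = as.POSIXct(
    c("2015-05-05 10:00:00", "2015-05-06 11:00:00", "2015-05-07 12:00:00"),
    tz = "UTC"
  )
)

# vapply always returns a logical vector: one TRUE/FALSE per column
is_date <- vapply(df, inherits, logical(1), what = "Date")

# keep only the columns flagged as Date
date_only <- df[is_date]
names(date_only)
```

Because inherits looks at the whole class vector, this keeps working when other columns have several classes.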

Package purrr has a keep function for this:
library(purrr)
keep(mtcars, ~inherits(.x, "Date"))
The ~ and .x coding allows the use of inherits on each column without creating a separate function or using an anonymous function.

select_if lets you use a predicate on the columns of a data frame. Only those columns for which the predicate returns TRUE will be selected:
library(dplyr)
select_if(mtcars, function(x) inherits(x, 'Date'))

I had the same problem and found the above answers helpful, but I ultimately came up with a current tidyverse solution with a little help from the lubridate package to avoid creating my own anonymous function.
library(tidyverse)
library(lubridate)
my_mtcars <- mtcars %>%
  as_tibble(rownames = "make_model") %>%
  mutate(
    start_date = as.Date("2022-01-01"),
    end_date = as.Date("2022-01-31"),
    POSIXct = as.POSIXct("2022-01-05")
  )
my_mtcars %>%
  select(where(is.Date))
Note, this only returns the start_date and end_date columns, but lubridate has the function is.POSIXt() for objects with other date-time classes.
my_mtcars %>%
  select(where(~ is.Date(.x) | is.POSIXt(.x)))
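If you would rather not load lubridate just for the predicates, the same selection can probably be written with base R's inherits, which accepts several class names at once. A small sketch, with a made-up tibble events standing in for real data:

```r
library(dplyr)

# Hypothetical data: an id, a Date column and a POSIXct column
events <- tibble(
  id = 1:2,
  start_date = as.Date(c("2022-01-01", "2022-01-02")),
  logged_at = as.POSIXct(c("2022-01-05 08:00:00", "2022-01-06 09:30:00"), tz = "UTC")
)

# inherits() returns TRUE if the column has any of the listed classes
date_like <- events %>%
  select(where(~ inherits(.x, c("Date", "POSIXt"))))

names(date_like)
```

POSIXt is the parent class shared by POSIXct and POSIXlt, so naming it covers both.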

This should work:
data(mtcars)
mtcars$dates = '2015-05-05'
mtcars$dates = as.Date(mtcars$dates)
head(mtcars)
v=sapply(mtcars,class) #get the class of each column
datecol=names(v)[v=='Date'] # select the columns having date class
mtcars[datecol] #subset those columns.

Related

Can't figure out how to change "X5.13.1996" to date class?

I have dates listed as "X5.13.1996", representing May 13th, 1996. The class for the date column is currently a character.
When using mdy from lubridate, it keeps populating NA. Is there a code I can use to get rid of the "X" to successfully use the code? Is there anything else I can do?
You can use substring(date_variable, 2) to drop the first character from the string.
substring("X5.13.1996", 2)
[1] "5.13.1996"
To convert a variable (i.e., column) in your data frame:
library(dplyr)
library(lubridate)
dates <- data.frame(
dt = c("X5.13.1996", "X11.15.2021")
)
dates %>%
mutate(converted = mdy(substring(dt, 2)))
or, without dplyr:
dates$converted <- mdy(substring(dates$dt, 2))
Output:
dt converted
1 X5.13.1996 1996-05-13
2 X11.15.2021 2021-11-15
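For completeness, a base R sketch of the same conversion without lubridate. It assumes every value follows the X<month>.<day>.<year> pattern shown in the question; the vector name raw is just for illustration.

```r
# Hypothetical input, in the same shape as the question's data
raw <- c("X5.13.1996", "X11.15.2021")

# Strip the leading "X", then parse month.day.year explicitly
converted <- as.Date(sub("^X", "", raw), format = "%m.%d.%Y")

converted
```

Spelling out the format string means a value that does not fit the pattern comes back as NA rather than being silently misparsed.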

How to change variable to factor based on its name in some list by using across?

(I am new to R.)
I am trying to convert columns of the data frame members to factors when their names appear in a list, to_factors_list.
I have tried some code using mutate(across()), but it gives errors.
Data prep.:
library(tidyverse)
# tidytuesday himalayan data
members <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/members.csv")
# creating list of names
to_factors_list <- members %>%
map_df(~(data.frame(n_distinct = n_distinct(.x))),
.id = "var_name") %>%
filter(n_distinct < 15) %>%
select(var_name) %>% pull()
to_factors_list
############### output ###############
'season' 'sex' 'hired' 'success' 'solo' 'oxygen_used' 'died' 'death_cause' 'injured' 'injury_type'
Both of the attempts below give an error:
members %>%
mutate(across(~.x %in% to_factors_list, factor))
members %>%
mutate_if( ~.x %in% to_factors_list, factor)
I am not sure what is wrong. How can I make this work?
In base R, this can be done with lapply
members[to_factors_list] <- lapply(members[to_factors_list], factor)
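The correct syntax passes the column names to across(), wrapped in all_of() because to_factors_list is an external character vector:
members %>% mutate(across(all_of(to_factors_list), factor))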
Or if you prefer an older-version dplyr syntax:
members %>% mutate_at(vars(to_factors_list), factor)
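As for why the attempts in the question fail: the first argument of across (and the predicate of mutate_if) is applied to the columns themselves, not to their names, so ~.x %in% to_factors_list compares each column's values against the list of names. Here is a small sketch on made-up data (toy and its columns are hypothetical, not taken from the members file) showing selection by name:

```r
library(dplyr)

# Hypothetical stand-in for the members data
toy <- tibble(
  season = c("Spring", "Autumn", "Spring"),
  sex = c("M", "F", "M"),
  age = c(31, 45, 28)
)

# Names of the columns to convert, held in an ordinary character vector
to_factor <- c("season", "sex")

# all_of() tells across() to treat the vector as column names
by_name <- toy %>%
  mutate(across(all_of(to_factor), factor))

sapply(by_name, class)
```

Columns not named in to_factor (here age) are left untouched.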

dplyr mutate inside for loop - Issue

I am performing data analysis and cleaning in R using the tidyverse.
I have a data frame with 23 columns containing the values 'NO','STEADY','UP' and 'down'.
I want to change all the values in these 23 columns to 0 in the case of 'NO' or 'STEADY', and to 1 otherwise.
I created a list named keys holding all my column names, and then used a for loop with ifelse and mutate.
Please have a look at the code below.
# Column names are kept in the list by name keys
keys = c('metformin', 'repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride',
'glipizide', 'glyburide', 'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol',
'insulin', 'glyburide-metformin', 'tolazamide', 'metformin-pioglitazone',
'metformin-rosiglitazone', 'glimepiride-pioglitazone', 'glipizide-metformin',
'troglitazone', 'tolbutamide', 'acetohexamide')
After that, i used following code to get the desired result :
for (col in keys){
Dataset = Dataset %>%
mutate(col = ifelse(col %in% c('No','Steady'),0,1)) }
I was expecting that, it will do the changes that i require, but nothing happens after this. (NO ERROR MESSAGE AND NO DESIRED RESULT)
After that, i researched further and executed following code
for (col in keys){
print(col)}
It gives me the elements of the list as character strings, like "metformin".
So I thought this might be the issue. Hence, I used the code below to cast the keys to symbols:
keys_new = sym(keys)
After that i again ran the same code:
for (col in keys_new){
Dataset = Dataset %>%
mutate(col = ifelse(col %in% c('No','Steady'),0,1))}
It gives me following Error -
Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
After all this, I also tried to write a function to get the desired result, but that didn't work either:
change = function(name){
Dataset = Dataset %>%
mutate(name = ifelse(name %in% c('No','Steady'),0,1),
name = as.factor(name))
return(Dataset)}
for (col in keys){
change(col)}
This didn't perform any action. (NO ERROR MESSAGE AND NO DESIRED RESULT)
When keys_new is placed in this code:
for (col in keys_new){
change(col)}
I got the same Error :
Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
PLEASE GUIDE
There's no need to loop or keep track of column names. You can use mutate_all -
Dataset %>%
mutate_all(~ifelse(. %in% c('No','Steady'), 0, 1))
Another way, thanks to Rui Barradas -
Dataset %>%
mutate_all(~as.integer(!. %in% c('No','Steady')))
There's a simpler way using mutate_at and case_when.
Dataset %>% mutate_at(keys, ~case_when(. %in% c("NO", "STEADY") ~ 0, TRUE ~ 1))
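mutate_at will only mutate the columns specified in the keys variable. case_when then replaces values according to the conditions given. Note that %in% is case-sensitive, so the strings must match the values exactly as they appear in the data (the question's own code uses 'No' and 'Steady').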
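mutate_at and mutate_all still work, but current dplyr marks them as superseded by across. A rough equivalent, sketched on invented data (the tibble toy and its values are assumptions, and the strings use the 'No'/'Steady' spelling from the question's code):

```r
library(dplyr)

# Hypothetical stand-in for Dataset: two drug columns plus one unrelated column
toy <- tibble(
  metformin = c("No", "Steady", "Up", "Down"),
  insulin = c("Up", "No", "No", "Steady"),
  age = c(50, 61, 47, 72)
)

keys <- c("metformin", "insulin")

# Recode only the columns named in keys: 0 for No/Steady, 1 otherwise
recoded <- toy %>%
  mutate(across(all_of(keys), ~ as.integer(!.x %in% c("No", "Steady"))))

recoded
```

Restricting the recode to keys matters if the data frame also holds columns that should not be turned into 0/1.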
This answer shows how to use mutate inside a for loop.
I don't have your data, so I made my own: I turned the keys into a tibble using enframe, spread it into columns using the row number as the value of each column, and then checked whether each value is higher than 10.
To use the column name in mutate, you have to use !! and := in the mutate function.
library(tidyverse)
keys = c('metformin', 'repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride',
         'glipizide', 'glyburide', 'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol',
         'insulin', 'glyburide-metformin', 'tolazamide', 'metformin-pioglitazone',
         'metformin-rosiglitazone', 'glimepiride-pioglitazone', 'glipizide-metformin',
         'troglitazone', 'tolbutamide', 'acetohexamide')
# one row; each drug name becomes a column holding its position in keys
df <- enframe(keys) %>% spread(key = value, value = name)
for (col in keys){
  df = df %>%
    # .data[[col]] pulls the column as a vector; df[col] would return a one-column tibble
    mutate(!!col := ifelse(.data[[col]] > 10, 0, 100))
}
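Applied to data shaped like the question's (character columns holding 'No', 'Steady', 'Up', 'Down'), the same loop might look like the sketch below. The tibble toy is invented for illustration. In the original loop, mutate(col = ...) creates a column literally named col, and col %in% c('No','Steady') tests the column name string rather than the column's values, which is why the drug columns were left unchanged.

```r
library(dplyr)

# Hypothetical stand-in for Dataset, including a name with a hyphen
toy <- tibble(
  metformin = c("No", "Steady", "Up", "Down"),
  `glyburide-metformin` = c("Up", "No", "No", "Steady")
)

keys <- c("metformin", "glyburide-metformin")

for (col in keys) {
  toy <- toy %>%
    # !!col := sets the target name; .data[[col]] reads that column's values
    mutate(!!col := ifelse(.data[[col]] %in% c("No", "Steady"), 0, 1))
}

toy
```

Note that the result has to be assigned back to toy inside the loop; calling a function that modifies its own local copy, as in the question's change(), leaves the original data frame as it was.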

removing and replacing observations with string package

I have two datasets that I'm trying to join. The column I am joining by does not match exactly between them. In the first file the column looks like this: 00:01:54:2145, with a leading 00: on every single observation. I want to change all the observations in this column to this format: 01/54/2145.
I have tried several things with the stringr package, but can't get it to work.
df1 <- df %>%
str_replace_all("00:")
I'm getting this error, but don't think that's the only problem:
argument is not an atomic vector; coercing
Thank you
library(stringr)
library(dplyr)
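# str_replace and str_replace_all are already vectorized, so no Vectorize() wrapper is needed
my_conversion <- function(x) {
  x %>%
    str_replace("^00:", "") %>%   # drop the leading "00:"
    str_replace_all(":", "/")     # turn the remaining colons into slashes
}
df <- data.frame(
  a_column = 1:3,
  key_column = c("00:01:54:2145", "00:01:54:2145", "00:01:54:2145")
)
df %>% mutate(key_column = my_conversion(key_column))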
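The warning in the question ("argument is not an atomic vector; coercing") most likely appears because df %>% str_replace_all("00:") hands the whole data frame to stringr as the string argument, and no replacement is given. stringr functions expect a character vector, such as a single column. A minimal sketch that works on one column directly, with made-up values:

```r
library(stringr)

# Hypothetical data in the format described in the question
df <- data.frame(
  key_column = c("00:01:54:2145", "00:02:10:0031"),
  stringsAsFactors = FALSE
)

# Pass the column, not the data frame: remove the leading "00:",
# then replace the remaining colons with slashes
df$key_column <- str_replace_all(str_remove(df$key_column, "^00:"), ":", "/")

df$key_column
```

Anchoring the pattern with ^ keeps any later "00:" inside a value from being removed as well.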

dplyr::filter used with a function on string representation of factor

I have a dataframe with some 20 columns and some 10^7 rows. One of the columns is an id column that is a factor. I want to filter the rows by properties of the string representation of the levels of the factor. The code below achieves this, but seems to me to be really rather inelegant. In particular that I have to create a vector of the relevant ids seems to me should not be needed.
Any suggestions for streamlining this?
library(dplyr)
library(tidyr)
library(gdata)
dat <- data.frame(id=factor(c("xxx-nld", "xxx-jap", "yyy-aus", "zzz-ita")))
europ.id <- function(id) {
ctry.code <- substring(id, nchar(id)-2)
ctry.code %in% c("nld", "ita")
}
ids <- levels(dat$id)
europ.ids <- subset(ids, europ.id(ids))
datx <- dat %>% filter(id %in% europ.ids) %>% drop.levels
Docendo Discimus gave the right answer in comments. To explain it first see the error I kept getting in my different attempts
> dat %>% filter(europ.id(id))
Error in nchar(id) : 'nchar()' requires a character vector
Calls: %>% ... filter_impl -> .Call -> europ.id -> substring -> nchar
Then note that his solution works because grepl applies as.character to its argument if needed (from the documentation: "a character vector where matches are sought, or an object which can be coerced by as.character to a character vector"). This implicit application of as.character also happens if you use %in%. Since this solution also performs well, we can do the following
dat %>% filter(europ.id(as.character(id))) %>% droplevels
Or to make it read a bit nicer update the function to
europ.id <- function(id) {
  ids <- as.character(id)
  ctry.code <- substring(ids, nchar(ids)-2)
  ctry.code %in% c("nld", "ita")
}
and use
dat %>% filter(europ.id(id)) %>% droplevels
which reads exactly like what I was looking for.
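The grepl-based suggestion from the comments is not reproduced above, so the following is only a guess at what such a filter could look like. The regular expression is an assumption built from the two country codes in the example.

```r
library(dplyr)

dat <- data.frame(id = factor(c("xxx-nld", "xxx-jap", "yyy-aus", "zzz-ita")))

# grepl coerces the factor to character itself, so no as.character() is needed;
# the pattern (an assumption) matches ids ending in "-nld" or "-ita"
datx <- dat %>%
  filter(grepl("-(nld|ita)$", id)) %>%
  droplevels()

levels(datx$id)
```

droplevels() from base R removes the unused factor levels after filtering, so gdata is not required for that step.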

Resources