Convert a variable in multiple dataframes to character in R

I have 4 datasets: y25_age, y30_age, y25_mri and y30_mri. Each dataset has an ID variable. I want to convert ID from numeric to character in all of these datasets. I have tried the code below:
x<-list(y25_age,y30_age,y25_mri,y30_mri)
x$ID<-lapply(x,function(x){x<-x["ID"]<-as.character(x["ID"])})
However, this gives an output of all the IDs as characters, which is not what I want. Any suggestions are welcome. Thank you in advance.

Here, the left-hand side of <- should be x, and the function should return the modified data frame:
x <- lapply(x, function(u) {
  u$ID <- as.character(u$ID)
  u
})
NOTE: the anonymous function's argument was renamed from 'x' to 'u' to avoid confusion with the list x.
Another option is transform:
x <- lapply(x, transform, ID = as.character(ID))
If the intention is to change the original objects, x should be a named list
names(x) <- c('y25_age','y30_age','y25_mri','y30_mri')
and then use list2env
list2env(x, .GlobalEnv) # not recommended though
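As a minimal sketch of the lapply approach above, assuming two small made-up data frames in place of the real datasets (the values are hypothetical and only there so the example runs on its own):

```r
# Hypothetical stand-ins for the real datasets
y25_age <- data.frame(ID = 1:3, age = c(25, 26, 27))
y30_age <- data.frame(ID = 4:6, age = c(30, 31, 32))

# Build a named list so each element can be traced back to its source object
x <- list(y25_age = y25_age, y30_age = y30_age)

# Convert ID in every data frame and return the whole data frame each time
x <- lapply(x, function(u) {
  u$ID <- as.character(u$ID)
  u
})

# Check the class of ID in each element of the list
sapply(x, function(u) class(u$ID))
```

Note that the original objects y25_age and y30_age are untouched here; only the copies inside x are converted.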

This should also work:
library(tidyverse)
map(x, ~ .x %>% mutate(ID = as.character(ID)))

Related

Is there a way to convert some specific column to numeric?

I don't want to convert my entire dataset with
variable <- mutate_all(dataset, function(x) as.numeric(x))
as it gives a different dataset altogether. Can somebody suggest what to do?
We can do this with type.convert, which automatically checks the type of each column and converts it where appropriate (for example, character columns that hold numbers become numeric):
dataset <- type.convert(dataset, as.is = TRUE)
Or use mutate_at, where we specify the columns of interest wrapped in vars:
library(dplyr)
dataset <- dataset %>%
  mutate_at(vars(Time, Length), as.numeric)
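To see both ideas on a concrete case, here is a small base R sketch. The data frame and its Label column are made up for illustration; only the Time and Length column names come from the answer above.

```r
# Hypothetical data: numbers stored as text, plus a genuine text column
raw <- data.frame(
  Time   = c("1", "2", "3"),
  Length = c("4.5", "5.5", "6.5"),
  Label  = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Option 1: let type.convert decide column by column
auto <- type.convert(raw, as.is = TRUE)

# Option 2: convert only the named columns, leaving the rest alone
cols <- c("Time", "Length")
manual <- raw
manual[cols] <- lapply(manual[cols], as.numeric)

str(auto)
str(manual)
```

In both results Label stays character; the explicit version is the safer choice when some text columns merely look numeric and should not be converted.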

Why is the type of a character column a list?

Why is the type of the column created through map list? I would expect it to be a character column. How can I convert it to a character column?
t <- mtcars %>% mutate(new_col=map(mpg, function(x) as.character(x)))
typeof(t$new_col)
> [1] "list"
Thanks
You can use map_chr() instead of map().
And you can just write
mtcars %>% mutate(new_col = map_chr(mpg, as.character))
The result of map was a list. It's not generally wise to add lists to dataframes, but it can be done. The other common mistake is to add the result of POSIXlt to a dataframe. Again it can be done, but subsequent operations may fail. You could have just used the function directly, since as.character is already vectorised:
> t <- mtcars %>% mutate(new_col=as.character(mpg))
> typeof(t$new_col)
[1] "character"
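The difference between the three approaches can be checked outside of mutate. This is only a sketch, assuming purrr is installed; the variable names are made up:

```r
library(purrr)

# map() always returns a list, one element per input value
as_list <- map(mtcars$mpg, as.character)

# map_chr() returns a character vector and errors if any result is not a single string
as_chr <- map_chr(mtcars$mpg, as.character)

# as.character() is vectorised, so no mapping is needed at all
direct <- as.character(mtcars$mpg)

typeof(as_list)
typeof(as_chr)
identical(as_chr, direct)
```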

R function to compare columns

In R language I would like to create a function to view selected columns for comparison in the Viewer. Assuming my dataframe is df1:
compare_col <- function(x){
  select(df1, x) %>%
    View()
}
If I define the function with a single argument x, I can only pass in one column.
compare_col <- function(x)
compare_col("col_1")
Only if I define the function with two arguments, say x and y, can I pass in 2 columns.
compare_col <- function(x, y)
compare_col("col_1", "col_2")
How can I create a function that is flexible enough to accept any number of columns?
You can use the rlang package to achieve this.
This lets you pass a character vector of column names: syms() turns the strings into symbols, and the !!! operator splices them into the select() call.
library(dplyr)
#library(rlang)
compare_col <- function(x){
  df1 %>% select(!!! syms(x)) %>%
    View()
}
compare_col(c("col1", "col2"))
Just realised, all I actually needed to do was vectorise the inputs when calling the function.
compare_col(c("col1", "col2"))
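Another way to accept a character vector of any length is tidyselect's all_of(), which dplyr re-exports. The sketch below is an assumption-laden variant rather than the answer's own code: df1 is a made-up data frame, the data argument is added for illustration, and the function returns the selection instead of calling View() so it can run non-interactively.

```r
library(dplyr)

# Hypothetical data frame standing in for df1
df1 <- data.frame(col_1 = 1:3, col_2 = 4:6, col_3 = 7:9)

# cols is a character vector of any length; all_of() errors on unknown names
compare_col <- function(cols, data = df1) {
  select(data, all_of(cols))
}

compare_col("col_1")
compare_col(c("col_1", "col_3"))
```

To open the result in the viewer, the returned data frame can be piped into View() at the call site.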

Name the column of data frame and set as factor at the same time

I need your help to simplify the following code.
I need to name the columns of a matrix and format each of them as a factor.
How can I do that for 100 columns without doing it one by one?
z <- matrix(sample(seq(3),n*p,replace=TRUE),nrow=n)
train.data <- data.frame(x1=factor(z[,1]), x2=factor(z[,2]), ...., x100=factor(z[,52]))
Here's one option
setNames(data.frame(lapply(split(z, col(z)), factor)), paste0("x", 1:p))
or use magrittr piping syntax
library(magrittr)
split(z, col(z)) %>%
  lapply(factor) %>%
  data.frame %>%
  setNames(paste0("x", 1:p))
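The question leaves n and p undefined, so the following sketch picks small values for them (an assumption) and uses a slightly different route: convert the matrix to a data frame first, then turn every column into a factor in place.

```r
set.seed(1)   # only so the made-up data is reproducible
n <- 10       # assumed number of rows
p <- 5        # assumed number of columns

z <- matrix(sample(seq(3), n * p, replace = TRUE), nrow = n)

# One column per matrix column, named x1..xp
train.data <- as.data.frame(z)
names(train.data) <- paste0("x", seq_len(p))

# Assigning into train.data[] keeps the data frame structure and names
train.data[] <- lapply(train.data, factor)

str(train.data)
```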

Selecting only numeric columns from a data frame

Suppose, you have a data.frame like this:
x <- data.frame(v1=1:20,v2=1:20,v3=1:20,v4=letters[1:20])
How would you select only those columns in x that are numeric?
EDIT: updated to avoid use of ill-advised sapply.
Since a data frame is a list we can use the list-apply functions:
nums <- unlist(lapply(x, is.numeric), use.names = FALSE)
Then standard subsetting
x[ , nums]
## don't use sapply, even though it's less code
## nums <- sapply(x, is.numeric)
For a more idiomatic modern R I'd now recommend
x[ , purrr::map_lgl(x, is.numeric)]
The following is shorter, depends less on R's particular quirks, and is robust when used on database-backed tibbles:
dplyr::select_if(x, is.numeric)
Newer versions of dplyr also support the following syntax:
x %>% dplyr::select(where(is.numeric))
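One reason sapply is often discouraged for this kind of test is that its return type depends on the input. A base R alternative that declares the expected result is vapply; the sketch below uses the data frame from the question and adds drop = FALSE so the result stays a data frame even if only one column matches.

```r
x <- data.frame(v1 = 1:20, v2 = 1:20, v3 = 1:20, v4 = letters[1:20])

# vapply insists that every result is a single logical value
nums <- vapply(x, is.numeric, logical(1))

# drop = FALSE prevents a single matching column from collapsing to a vector
numeric_only <- x[, nums, drop = FALSE]
names(numeric_only)

# On an empty data frame sapply falls back to a list, vapply does not
empty <- data.frame()
is.list(sapply(empty, is.numeric))
is.logical(vapply(empty, is.numeric, logical(1)))
```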
The dplyr package's select_if() function is an elegant solution:
library("dplyr")
select_if(x, is.numeric)
Filter() from the base package is the perfect function for that use-case:
You simply have to code:
Filter(is.numeric, x)
It is also much faster than select_if():
library(microbenchmark)
microbenchmark(
dplyr::select_if(mtcars, is.numeric),
Filter(is.numeric, mtcars)
)
returns (on my computer) a median of 60 microseconds for Filter and 21 000 microseconds for select_if (Filter is about 350x faster).
If you are interested only in the column names, use this:
names(dplyr::select_if(train,is.numeric))
iris %>% dplyr::select(where(is.numeric)) #as per most recent updates
Another option with purrr is to use discard with a negated predicate:
iris %>% purrr::discard(~!is.numeric(.))
If you want the names of the numeric columns, you can add names or colnames:
iris %>% purrr::discard(~!is.numeric(.)) %>% names
This is an alternative to the other answers:
x[, sapply(x, class) == "numeric"]
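A caveat with comparing classes to "numeric": integer columns have class "integer", so they are not matched. With the data frame from the question, where v1 to v3 are created with 1:20, this quick check shows the class comparison selecting nothing while is.numeric selects three columns:

```r
x <- data.frame(v1 = 1:20, v2 = 1:20, v3 = 1:20, v4 = letters[1:20])

# 1:20 creates integer vectors, so no column has class "numeric"
sapply(x, class)

by_class <- sapply(x, class) == "numeric"   # exact class match
by_test  <- sapply(x, is.numeric)           # TRUE for integer and double

sum(by_class)
sum(by_test)
```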
With a data.table:
x[, lapply(x, is.numeric) == TRUE, with = FALSE]
library(purrr)
x <- x %>% keep(is.numeric)
The PCAmixdata package has a function splitmix that splits a given data frame "YourDataframe" into quantitative (numerical) and qualitative (categorical) data, as shown below:
install.packages("PCAmixdata")
library(PCAmixdata)
split <- splitmix(YourDataframe)
X1 <- split$X.quanti # numerical columns in the dataset
X2 <- split$X.quali  # categorical columns in the dataset
If you have many factor variables, you can use the select_if function.
Install the dplyr package. It has many functions that select data satisfying a condition, and you can set the condition yourself.
Use it like this:
categorical<-select_if(df,is.factor)
str(categorical)
Another way could be as follows:
# extracting numeric columns from the iris dataset
(iris[sapply(iris, is.numeric)])
Numerical_variables <- which(sapply(df, is.numeric))
# then extract column names
Names <- names(Numerical_variables)
This doesn't directly answer the question but can be very useful, especially if you want something like all the numeric columns except for your id column and dependent variable.
library(magrittr) # provides %>% and the assignment pipe %<>%
numeric_cols <- sapply(dataframe, is.numeric) %>% which %>%
  names %>% setdiff(., c("id_variable", "dep_var"))
dataframe %<>% dplyr::mutate_at(numeric_cols, function(x) your_function(x))
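The same pattern can be sketched in base R without pipes. Everything below is illustrative: the data frame is made up, and doubling the values stands in for your_function.

```r
# Hypothetical data with an id, a dependent variable, one predictor and a text column
dataframe <- data.frame(
  id_variable = 1:3,
  dep_var     = c(0.1, 0.2, 0.3),
  a           = c(1, 2, 3),
  b           = c("x", "y", "z")
)

# Names of numeric columns, minus the ones that must stay untouched
numeric_cols <- setdiff(
  names(Filter(is.numeric, dataframe)),
  c("id_variable", "dep_var")
)

# Apply the placeholder transformation to the remaining numeric columns only
dataframe[numeric_cols] <- lapply(dataframe[numeric_cols], function(v) v * 2)

numeric_cols
dataframe
```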
