Consider a data.frame with a mix of data types.
For a weird purpose, a user needs to convert all columns to characters.
How is it best done? A tidyverse attempt at solution is this:
library(purrr)   # map(), map_df() and %>%
map(mtcars, as.character) %>% map_df(as.list) %>% View()
c2 <- map(mtcars, as.character) %>% map_df(as.list)
When I call str(c2), it should report a tibble or data.frame in which every column is character.
The other option would be some parameter setting for write.csv() or write_csv() that achieves the same thing in the written file.
EDIT: 2021-03-01
Beginning with dplyr 1.0.0, the _all() function variants are superseded. The new way to accomplish this is the across() function.
library(dplyr)
mtcars %>%
mutate(across(everything(), as.character))
With across(), we choose the set of columns we want to modify using tidyselect helpers (here we use everything() to choose all columns), and then specify the function we want to apply to each of the selected columns. In this case, that is as.character().
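everything() is only one of the tidyselect helpers. As a minimal sketch, assuming dplyr 1.0.0 or later (where where() is available), the same call can be limited to numeric columns; iris is used here only because, unlike mtcars, it has a non-numeric column (Species) that shows the difference.

```r
library(dplyr)

# Convert every column to character
all_chr <- iris %>%
  mutate(across(everything(), as.character))

# Convert only the numeric columns; the Species factor is left alone
num_chr <- iris %>%
  mutate(across(where(is.numeric), as.character))

# Check the column classes of each result
vapply(all_chr, is.character, logical(1))
vapply(num_chr, class, character(1))
```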
Original answer:
You can also use dplyr::mutate_all.
library(dplyr)
mtcars %>%
mutate_all(as.character)
In base R:
x[] <- lapply(x, as.character)
This converts the columns to character class in place, retaining the data.frame's attributes. A call to data.frame() would cause them to be lost.
Attribute preservation using dplyr: Attributes seem to be preserved during dplyr::mutate(across(everything(), as.character)). Previously they were destroyed by dplyr::mutate_all.
Example
x <- mtcars
attr(x, "example") <- "1"
In the second case below, the example attribute is retained:
# Destroys attributes
data.frame(lapply(x, as.character)) %>%
attributes()
# Preserves attributes
x[] <- lapply(x, as.character)
attributes(x)
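The same comparison can be checked directly. This sketch also shows a side effect not mentioned above: rebuilding with data.frame() loses the mtcars row names (the car models), while assignment into x[] keeps them.

```r
# Copy mtcars and tag it with a custom attribute, as in the example above
x <- mtcars
attr(x, "example") <- "1"

# Rebuilding with data.frame() creates a new object: custom attribute and row names are gone
rebuilt <- data.frame(lapply(x, as.character), stringsAsFactors = FALSE)

# Assigning into in_place[] modifies the existing data.frame, so its attributes survive
in_place <- x
in_place[] <- lapply(in_place, as.character)

attr(rebuilt, "example")    # NULL
attr(in_place, "example")   # "1"
rownames(rebuilt)[1]        # "1"
rownames(in_place)[1]       # "Mazda RX4"
```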
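This might work, but I'm not sure it's the best approach.
# stringsAsFactors = FALSE: before R 4.0.0, data.frame() would otherwise turn the strings into factors
df <- data.frame(lapply(mtcars, as.character), stringsAsFactors = FALSE)
str(df)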
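An efficient way, using data.table, which converts the columns by reference instead of copying the whole table:
library(data.table)
# mtcars lives in the locked datasets package, so setDT() needs a local copy it can modify
dt <- copy(mtcars)
setDT(dt)
dt[, (colnames(dt)) := lapply(.SD, as.character), .SDcols = colnames(dt)]
Note: by passing a subset of column names to both the left-hand side of := and .SDcols, you can use the same pattern to convert only a few columns of a data.table to the desired type.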
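A minimal sketch of that partial conversion; the choice of cyl and gear is purely illustrative, and as.data.table() is used to get a modifiable copy of mtcars.

```r
library(data.table)

# Work on a data.table copy; mtcars itself is a locked binding in package:datasets
dt <- as.data.table(mtcars)

# Hypothetical choice of columns to convert
cols <- c("cyl", "gear")

# := updates only the listed columns, by reference; all other columns are untouched
dt[, (cols) := lapply(.SD, as.character), .SDcols = cols]

sapply(dt, class)[c("mpg", "cyl", "gear")]
```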
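If we want to convert all columns to character, we can also do something like this:
library(data.table)
to_col_type <- function(col_name, type, data) {
  # Look up the coercion function by name, e.g. "character" -> as.character()
  get(paste0("as.", type))(data[[col_name]])
}
dt <- as.data.table(mtcars)
dt <- rbindlist(list(Map(to_col_type, colnames(dt), "character", list(dt))))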
mutate_all in the accepted answer is superseded.
You can use the mutate() function with across():
library(dplyr)
mtcars %>%
mutate(across(everything(), as.character))
Related
I saw code like this:
df %>% rename(!!! setNames(map$VarName, map$StdName))
What does !!! mean here? Also, why do we use rename and setNames together? And if df has fewer variables than map$VarName, is there a way to make this code run? Right now it fails with the error message: Error: Can't rename columns that don't exist.
Any suggestions? Many thanks.
As the error suggests, some of the column names in the key/value dataset do not exist in 'df'. With the current code, one option is to subset/filter the 'map' dataset to the names that are common to 'df':
map1 <- subset(map, VarName %in% colnames(df))
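and then use the subset dataset with the splicing operator (!!!) on the named vector. setNames() puts the old names (VarName) in the values and the new names (StdName) in the names, which is the new_name = old_name form that rename() expects; !!! then splices each element in as a separate argument: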
library(dplyr)
df %>%
rename(!!! setNames(map1$VarName, map1$StdName))
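To see what the splicing does, here is a sketch on a small hypothetical data frame (df, a, b, alpha and beta are illustrative names only): splicing the named vector gives the same result as writing the new = old pairs out by hand.

```r
library(dplyr)

# Small hypothetical data frame
df <- data.frame(a = 1:2, b = 3:4)

# setNames(old_names, new_names) gives c(alpha = "a", beta = "b"), i.e. new = "old"
lookup <- setNames(c("a", "b"), c("alpha", "beta"))

# !!! splices each element in as its own argument ...
spliced <- df %>% rename(!!!lookup)

# ... so the call above is equivalent to writing the pairs out by hand
explicit <- df %>% rename(alpha = a, beta = b)

identical(spliced, explicit)  # TRUE
```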
Instead of !!!, we can use rename_with as well
df %>%
rename_with(~ map1$StdName, all_of(map1$VarName))
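As an alternative to pre-filtering 'map', recent dplyr versions let rename() take a named vector through any_of(), which renames only the columns that exist. A minimal sketch, assuming such a dplyr version; the data below are hypothetical.

```r
library(dplyr)

# Hypothetical data: df lacks column "c", which the lookup table mentions
df <- data.frame(a = 1:2, b = 3:4)
map <- data.frame(VarName = c("a", "b", "c"),
                  StdName = c("alpha", "beta", "gamma"),
                  stringsAsFactors = FALSE)

lookup <- setNames(map$VarName, map$StdName)

# any_of() renames the columns that exist and skips the rest;
# all_of() would raise an error for "c" instead
renamed <- df %>% rename(any_of(lookup))
names(renamed)  # "alpha" "beta"
```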
Why is the type of the column created through map() a list? I would expect it to be a character column. How can I convert it to a character column?
t <- mtcars %>% mutate(new_col=map(mpg, function(x) as.character(x)))
typeof(t$new_col)
> [1] "list"
Thanks
You can use map_chr() instead of map().
And you can just write
mtcars %>% mutate(new_col = map_chr(mpg, as.character))
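A sketch comparing the three variants side by side: map() produces a list-column, map_chr() produces a character column, and calling as.character() directly gives the same character values without any map at all.

```r
library(dplyr)
library(purrr)

# map() always returns a list, so new_col becomes a list-column
t_list <- mtcars %>% mutate(new_col = map(mpg, as.character))

# map_chr() simplifies the per-element results to a character vector
t_chr <- mtcars %>% mutate(new_col = map_chr(mpg, as.character))

# as.character() works on the whole vector at once
t_vec <- mtcars %>% mutate(new_col = as.character(mpg))

class(t_list$new_col)  # "list"
class(t_chr$new_col)   # "character"
identical(t_chr$new_col, t_vec$new_col)  # TRUE
```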
map() always returns a list, so new_col became a list-column. It's not generally wise to add lists to data frames, though it can be done. Another common mistake is adding a POSIXlt result to a data frame; again, it can be done, but subsequent operations may fail. Since as.character() is already vectorized, you could have just called it directly:
> t <- mtcars %>% mutate(new_col=as.character(mpg))
> typeof(t$new_col)
[1] "character"
I have a dplyr question: How do I use transmute over each column without writing each column out by hand? I.e. is there something like transmute_each()?
I want to do the following: using dplyr, get the z-score of each column in the MWE below:
tickers <- c(rep(1,10),rep(2,10))
df <- data.frame(cbind(tickers,rep(1:20),rep(2:21),rep(2:21),rep(4:23),rep(3:22)))
colnames(df) <- c("tickers","col1","col2","col3","col4","col5")
df %>% group_by(tickers)
Is there a simple way to then use transmute to achieve the following:
for(i in 2:ncol(df)){
df[,i] <- (df[,i] - mean(df[,i]))/sd(df[,i])
}
Many thanks
Now that there is a transmute_at() function (as of dplyr 0.7), you can do the following:
df %>%
group_by(tickers) %>%
transmute_at(.vars = vars(starts_with("col")),
.funs = funs(scale(.))) %>%
ungroup
Note that this uses the scale() function from base R, which by default converts a numeric vector into a z-score.
Also, the use of vars() in the .vars argument allows you to use all the helper functions that are available for dplyr's select(), such as one_of(), ends_with(), etc.
Finally, instead of writing funs(scale(.)) here, since you're using a simple function in the .funs argument, you can just write .funs = scale.
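In current dplyr, funs() is deprecated and the _at() variants are superseded by across(). A minimal sketch of the same grouped z-score transformation, assuming dplyr 1.0.0 or later; it writes the z-score formula out explicitly rather than calling scale(), which returns a one-column matrix.

```r
library(dplyr)

# The MWE data, with the columns named directly
tickers <- c(rep(1, 10), rep(2, 10))
df <- data.frame(tickers,
                 col1 = 1:20, col2 = 2:21, col3 = 2:21,
                 col4 = 4:23, col5 = 3:22)

# across() replaces transmute_at()/funs(); because the data are grouped first,
# mean() and sd() are computed within each ticker
z <- df %>%
  group_by(tickers) %>%
  transmute(across(starts_with("col"), ~ (.x - mean(.x)) / sd(.x))) %>%
  ungroup()

z
```

transmute() keeps the grouping column (tickers) alongside the transformed columns.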
I solved this using the following:
df %>%
group_by(tickers) %>%
mutate_at(.funs = funs((. - mean(.))/sd(.)),
.vars = vars(matches("col")))
I need your help to simplify the following code.
I need to name the columns of a matrix and convert each of them to a factor.
How can I do that for 100 columns without doing it one by one?
z <- matrix(sample(seq(3),n*p,replace=TRUE),nrow=n)
train.data <- data.frame(x1=factor(z[,1]),x2=factor(z[,2]),....,x100=factor(z[,100]))
Here's one option
setNames(data.frame(lapply(split(z, col(z)), factor)), paste0("x", 1:p))
or use magrittr piping syntax
library(magrittr)
split(z, col(z)) %>%
lapply(factor) %>%
data.frame %>%
setNames(paste0("x", 1:p))
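A runnable sketch of the first answer, with hypothetical values for n and p (the question leaves them unspecified), plus an alternative that reuses the x[] <- lapply(...) idiom on a data.frame copy of the matrix.

```r
# Hypothetical sizes and seed; the question does not give values for n and p
set.seed(42)
n <- 5
p <- 100
z <- matrix(sample(seq(3), n * p, replace = TRUE), nrow = n)

# Answer above: split the matrix by column and turn each piece into a factor
train.data <- setNames(data.frame(lapply(split(z, col(z)), factor)),
                       paste0("x", 1:p))

# Alternative: convert to a data.frame first, then factor every column in place
train.data2 <- as.data.frame(z)
train.data2[] <- lapply(train.data2, factor)
names(train.data2) <- paste0("x", seq_len(ncol(z)))

str(train.data[, 1:3])
```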
Is it possible to set all column names to upper or lower within a dplyr or magrittr chain?
In the example below I load the data and then, using a magrittr pipe, chain it through to my dplyr mutations. In the 4th line I use the tolower function, but this is for a different purpose: to create a new variable with lowercase observations.
mydata <- read.csv('myfile.csv') %>%
mutate(Year = mdy_hms(DATE),
Reference = (REFNUM),
Event = tolower(EVENT))
I'm obviously looking for something like colnames = tolower but know this doesn't work/exist.
I note the dplyr rename function but this isn't really helpful.
In magrittr the colname options are:
set_colnames instead of base R's colnames<-
set_names instead of base R's names<-
I've tried numerous permutations with these but no dice.
Obviously this is very simple in base R:
names(mydata) <- tolower(names(mydata))
However, it seems incongruous with the dplyr/magrittr philosophy that you'd have to do that as a clunky one-liner before moving on to an elegant chain of dplyr/magrittr code.
With {dplyr} we can do:
mydata %>% rename_all(tolower)
or, since rename_all() is superseded in dplyr 1.0.0 and later:
mydata %>% rename_with(tolower)
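rename_with() applies a function to column names and, as a minimal sketch on iris shows, also takes an optional .cols selection (defaulting to every column) when only some names should change.

```r
library(dplyr)

# Lower-case every column name
lower_iris <- iris %>% rename_with(tolower)
names(lower_iris)

# .cols limits the renaming, here to columns whose names start with "Sepal"
partly_upper <- iris %>% rename_with(toupper, starts_with("Sepal"))
names(partly_upper)
```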
iris %>% setNames(tolower(names(.))) %>% head
Or, equivalently, use the replacement function in non-replacement form:
iris %>% `names<-`(tolower(names(.))) %>% head
iris %>% `colnames<-`(tolower(names(.))) %>% head # if you really want to use `colnames<-`
If I understand your question correctly, magrittr's "compound assignment pipe-operator" %<>% might be an even more succinct option.
library("magrittr")
names(iris) %<>% tolower
?`%<>%` # for more
library(magrittr)   # set_colnames() and %>%
mtcars %>%
set_colnames(value = casefold(colnames(.), upper = FALSE)) %>%
head
casefold() is available in base R and converts in either direction: upper = TRUE gives all upper case and upper = FALSE gives all lower case, as needed.
Also, colnames() returns only the column headers, so only they are case-converted.
You could also define a function:
upcase <- function(df) {
names(df) <- toupper(names(df))
df
}
library(dplyr)
mtcars %>% upcase %>% select(MPG)