In R, once the following code is run:
temp <- split(mtcars, mtcars$cyl)
If I send only "temp" to someone else ...
What code can he use to put back slices of "temp" together? He does not need to use "cyl" as column name; he can use whatever he wants. Thanks!
We can use do.call with rbind, but the order of rows may be different
do.call(rbind, temp)
If the column info is known, then unsplit can be useful as it will keep the same order as before the split
unsplit(temp, mtcars$cyl)
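As a rough illustration of the difference between the two approaches, the sketch below rebuilds the data frame both ways. It assumes the recipient also has the original grouping vector (here mtcars$cyl), which unsplit needs; with only temp in hand, do.call(rbind, ...) is the fallback.

```r
temp <- split(mtcars, mtcars$cyl)

# Stacking the pieces: every row comes back, grouped by list element
stacked <- do.call(rbind, temp)
nrow(stacked) == nrow(mtcars)

# unsplit needs the grouping vector used for the split
# and puts the rows back in their original order
restored <- unsplit(temp, mtcars$cyl)
identical(rownames(restored), rownames(mtcars))
```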
You can use dplyr's bind_rows or data.table's rbindlist. To identify which rows come from which element of the list we can use .id/idcol parameter.
dplyr::bind_rows(temp, .id = 'id')
data.table::rbindlist(temp, idcol = 'id')
By default, the list names are used as the id column. If you want numbers instead, remove the names from the list with unname.
dplyr::bind_rows(unname(temp), .id = 'id')
data.table::rbindlist(unname(temp), idcol = 'id')
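For readers without either package, a base R sketch of the same idea follows. It only approximates what .id and idcol do: add_id is a hypothetical helper name, and row names and column types may differ from the package results.

```r
temp <- split(mtcars, mtcars$cyl)

# add_id is a hypothetical helper: prepend the list name as an 'id' column
add_id <- function(piece, name) cbind(id = name, piece)

# Tag each piece with its list name, then stack the pieces
combined <- do.call(rbind, Map(add_id, temp, names(temp)))
table(combined$id)
```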
I have a data.table in R. The table has column names.
I have a vector named colnames with a few of the column names in the table.
colnames<-c("cost1", "cost2", "cost3")
I want to select the columns whose names are in the vector colnames from the table.
The name of the table is dt.
I have tried doing the following:
selected_columns <- dt[,colnames]
But this does not work, and I get an error.
However, when I try the following, it works:
selected_columns <- dt[,c("cost1", "cost2", "cost3")]
I want to use the vector variable (colnames) to access the columns and not the c("..") method.
How can I do so?
You can try like this:
dt[, .SD, .SDcols = colnames]
Recent versions of data.table also offer the .. prefix as an alternative:
dt[, ..colnames]
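A minimal sketch comparing these forms on a made-up table is shown below. The table contents and the variable name cols are assumptions for illustration; with = FALSE is a third, older data.table idiom added here for comparison.

```r
library(data.table)

# Made-up table with the column names from the question
dt <- data.table(cost1 = 1:3, cost2 = 4:6, cost3 = 7:9, other = c("a", "b", "c"))
cols <- c("cost1", "cost3")   # named 'cols' to avoid masking base::colnames

by_sdcols <- dt[, .SD, .SDcols = cols]   # .SD restricted to the chosen columns
by_prefix <- dt[, ..cols]                # .. looks up 'cols' in the calling scope
by_with   <- dt[, cols, with = FALSE]    # treat j as a vector of column names

all.equal(by_sdcols, by_prefix)
all.equal(by_prefix, by_with)
```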
Another nice alternative is to use select from the tidyverse/dplyr universe. select gives you a lot of flexibility when selecting columns from a data frame, tibble, or data table.
library("data.table")
library("tidyverse")
df <- data.table(mtcars)
columns_to_select <- c("cyl", "mpg")
select(df, all_of(columns_to_select))  # all_of() marks the vector as external column names
You can also skip quoting column names if you wish
select(df, c(cyl, mpg))
or leverage the ellipsis and pass multiple quoted or unquoted names.
The comparison below checks that all three forms return the same result.
objs <- list(select(df, c(cyl, mpg)),
select(df, cyl, mpg),
select(df, "cyl", "mpg"))
outer(objs, objs, Vectorize(all.equal))
You may want to have a further look at dtplyr, which provides a bridge between data table and dplyr, if you want to go this route.
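When the column names live in a character vector, the tidyselect helpers all_of() and any_of() make the intent explicit. The sketch below uses a plain data frame, and "not_a_column" is a made-up name that shows how any_of() tolerates missing columns where all_of() would raise an error.

```r
library(dplyr)

columns_to_select <- c("cyl", "mpg")

# Strict: every name in the vector must exist in the data
picked <- select(mtcars, all_of(columns_to_select))

# Lenient: names that do not exist are silently skipped
lenient <- select(mtcars, any_of(c("cyl", "not_a_column")))

names(picked)
names(lenient)
```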
How do I set list names? Here is the code below.
Currently, split_data contains two sub-lists, [[1]] and [[2]]. How can I set a name for each of them?
I want to name [[1]] 'A' and [[2]] 'B', so that I can retrieve data with split_data['A'].
Can anyone help with this? Thanks.
for instance ma <- list(a=c('a1','a2'),b=c('b1','b2')) can use ma["a"] for sub list
library(tidyverse)
test_data <- data.frame(category=c('A','B','A','B','A','B','A','B'),
sales=c(1,2,4,5,8,1,4,6))
split_data <- test_data %>% group_split(category)
Others have shown you in the comments how to get what you want using split() instead of group_split(). That seems like the easiest solution.
However, if you're stuck with the existing code, here's an alternative that keeps your current code, and adds the names.
library(tidyverse)
test_data <- data.frame(category=c('A','B','A','B','A','B','A','B'),
sales=c(1,2,4,5,8,1,4,6))
split_data <- test_data %>% group_split(category)
names(split_data) <- test_data %>% group_by(category) %>% group_keys() %>% apply(1, paste, collapse = ".")
The idea is to use group_by to split in the same way group_split does, then extract the keys as a tibble. This will have one row per group, but will have the different variables in separate columns, so I put them together by pasting the columns with a dot as separator. The last expression in the pipe is equivalent to apply(keys, 1, f)
where f is function(row) paste(row, collapse = "."). It applies f to each row of the tibble, producing a single name.
This should work even if the split happens on multiple variables, and produces names similar to those produced by split().
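The split() route mentioned above can be sketched as follows, using the same test_data. split() names the pieces after the grouping values, so no separate naming step is needed.

```r
test_data <- data.frame(category = c('A','B','A','B','A','B','A','B'),
                        sales = c(1,2,4,5,8,1,4,6))

# Base R split: list elements are named after the values of 'category'
split_data <- split(test_data, test_data$category)
names(split_data)

# [[ ]] returns the data frame itself; [ ] returns a one-element list
split_data[["A"]]
```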
I have 4 datasets:(y25_age,y30_age,y25_mri,y30_mri). Each dataset has an ID variable. I want to convert the ID format from numeric to character in the above datasets. I have tried the below code
x<-list(y25_age,y30_age,y25_mri,y30_mri)
x$ID<-lapply(x,function(x){x<-x["ID"]<-as.character(x["ID"])})
However, this gives an output of all the IDs as characters, which is not what I want. Any suggestions are welcome? Thank you in advance.
Here, the left-hand side of <- should be x, and the function should return the modified data frame
x <- lapply(x, function(u) {
  u$ID <- as.character(u$ID)
  u  # return the modified data frame
})
NOTE: the anonymous function's argument was renamed from 'x' to 'u' to avoid any confusion
Or another option is transform
x <- lapply(x, transform, ID = as.character(ID))
If the intention is to change the original objects, the 'x' should be a named list
names(x) <- c('y25_age','y30_age','y25_mri','y30_mri')
and then use list2env
list2env(x, .GlobalEnv) # not recommended though
this should also work
library(tidyverse)
map(x, ~ .x %>% mutate(ID = as.character(ID)))
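The lapply approach can be checked on small stand-in data frames. The values below are invented for illustration, and only two of the four datasets are mocked up. The sketch also copies the results into a scratch environment rather than .GlobalEnv, which avoids overwriting the originals.

```r
# Stand-in data; the real datasets are not shown in the question
y25_age <- data.frame(ID = 1:3, age = c(25, 25, 25))
y30_age <- data.frame(ID = 1:3, age = c(30, 30, 30))

x <- list(y25_age = y25_age, y30_age = y30_age)

# Convert ID in every data frame and return the whole data frame
x <- lapply(x, function(u) {
  u$ID <- as.character(u$ID)
  u
})

# Class of the ID column in each element
vapply(x, function(u) class(u$ID), character(1))

# Optional: unpack the named list into a scratch environment
scratch <- list2env(x, envir = new.env())
```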
I have two separate datasets: one has the column headers and another has the data.
The first one looks like this:
where I want to make the 2nd column as the column headers of the next dataset:
How can I do this? Thank you.
In general you can use colnames, which returns the column names of your data frame or matrix as a character vector. You can then rename the columns with:
colnames(df) <- *listofnames*
Also it is possible just to rename one name by using the [] brackets.
This would rename the first column:
colnames(df2)[1] <- "name"
For your example, we are going to take the values of your column. Try this:
colnames(df2) <- as.character(df1[,2])
Make sure the number of names matches the number of columns.
Equivalent for rows is rownames()
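Since the original datasets are not shown, here is a small sketch with invented stand-ins: df1 holds the header names in its second column and df2 holds the data. The length check up front guards against the mismatch mentioned above.

```r
# Invented stand-ins for the two datasets
df1 <- data.frame(code = 1:3,
                  label = c("height", "weight", "age"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = c(170, 180), V2 = c(65, 80), V3 = c(30, 41))

# One name per column is required
stopifnot(nrow(df1) == ncol(df2))

# Use the second column of df1 as the header of df2
colnames(df2) <- as.character(df1[, 2])
colnames(df2)
```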
dplyr way w/ reproducible code:
library(dplyr)
df <- tibble(x = 1:5, y = 11:15)
df_n <- tibble(x = 1:2, y = c("col1", "col2"))
names(df) <- df_n %>% select(y) %>% pull()
I think the select() %>% pull() syntax is easier to remember than list indexing. Also I used names over colnames function. When working with a dataframe, colnames simply calls the names function, so better to cut out the middleman and be more explicit that we are working with a dataframe and not a matrix. Also shorter to type.
You can simply do this:
names(data)[3]<- 'Newlabel'
Where names(data)[3] is the column you want to rename.
Consider a data.frame with a mix of data types.
For a weird purpose, a user needs to convert all columns to characters.
How is it best done? A tidyverse attempt at solution is this:
map(mtcars,as.character) %>% map_df(as.list) %>% View()
c2<-map(mtcars,as.character) %>% map_df(as.list)
when I call str(c2) it should say a tibble or data.frame with all characters.
The other option would be some parameter settings for write.csv() or in write_csv() to achieve the same thing in the resulting file output.
EDIT: 2021-03-01
Beginning with dplyr 1.0.0, the _all() function variants are superseded. The new way to accomplish this is to use the across() function.
library(dplyr)
mtcars %>%
mutate(across(everything(), as.character))
With across(), we choose the set of columns we want to modify using tidyselect helpers (here we use everything() to choose all columns), and then specify the function we want to apply to each of the selected columns. In this case, that is as.character().
Original answer:
You can also use dplyr::mutate_all.
library(dplyr)
mtcars %>%
mutate_all(as.character)
In base R:
x[] <- lapply(x, as.character)
This converts the columns to character class in place, retaining the data.frame's attributes. A call to data.frame() would cause them to be lost.
Attribute preservation using dplyr: Attributes seem to be preserved during dplyr::mutate(across(everything(), as.character)). Previously they were destroyed by dplyr::mutate_all.
Example
x <- mtcars
attr(x, "example") <- "1"
In the second case below, the example attribute is retained:
# Destroys attributes
data.frame(lapply(x, as.character)) %>%
attributes()
# Preserves attributes
x[] <- lapply(x, as.character)
attributes(x)
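The same in-place idiom also works on a subset of columns. The sketch below first verifies the all-columns case, then converts only two columns; the choice of cyl and gear is arbitrary.

```r
x <- mtcars
attr(x, "example") <- "1"

# All columns: x[] keeps the data frame and its attributes
x[] <- lapply(x, as.character)
all(vapply(x, is.character, logical(1)))
attr(x, "example")

# Only some columns: index by name on both sides of the assignment
y <- mtcars
cols <- c("cyl", "gear")
y[cols] <- lapply(y[cols], as.character)
vapply(y, class, character(1))
```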
This might work, but not sure if it's the best.
df = data.frame(lapply(mtcars, as.character))
str(df)
Most efficient way using data.table:
dt <- data.table::as.data.table(mtcars)  # work on a copy; setDT(mtcars) may fail because the packaged dataset's binding is locked
dt[, (colnames(dt)) := lapply(.SD, as.character), .SDcols = colnames(dt)]
Note: You can use this to convert a few columns of a data table to your desired column type.
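A sketch of the few-columns case follows. It works on a copy of mtcars made with as.data.table, and the choice of cyl and gear is arbitrary.

```r
library(data.table)

dt <- as.data.table(mtcars)   # a copy, so the packaged dataset is untouched
cols <- c("cyl", "gear")

# (cols) on the left means "the columns named in cols";
# .SDcols restricts .SD to the same columns
dt[, (cols) := lapply(.SD, as.character), .SDcols = cols]

vapply(dt, class, character(1))
```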
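If we want to convert all columns to character, we can also do something like this:
to_col_type <- function(col_name, type, data) {
  # look up the converter by name, e.g. as.character, and apply it to one column
  get(paste0("as.", type))(data[[col_name]])
}
# the data is passed in explicitly instead of relying on a global 'dt'
mtcars_chr <- data.table::rbindlist(list(Map(to_col_type, colnames(mtcars), "character", MoreArgs = list(data = mtcars))))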
mutate_all in the accepted answer is superseded.
You can use mutate() function with across():
library(dplyr)
mtcars %>%
mutate(across(everything(), as.character))