Specific column selection from data.table in R

I have a data.table in R. The table has column names.
I have a vector named colnames with a few of the column names in the table.
colnames<-c("cost1", "cost2", "cost3")
I want to select the columns whose names are in the vector colnames from the table.
The name of the table is dt.
I have tried doing the following:
selected_columns <- dt[,colnames]
But this does not work; I get an error.
However, when I try the following, it works:
selected_columns <- dt[,c("cost1", "cost2", "cost3")]
I want to use the vector variable (colnames) to access the columns and not the c("..") method.
How can I do so?

You can try this:
dt[, .SD, .SDcols = colnames]
Recent versions of data.table also offer the .. prefix, which tells data.table to look up colnames in the calling environment instead of treating it as a column:
dt[, ..colnames]
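For contrast, here is the same selection on a plain data.frame. There, [ evaluates its index as an ordinary R value, so a character vector stored in a variable works directly. data.table instead evaluates j as an expression inside the table, which is why it needs .. or .SDcols for the same job. The data below is made up for illustration, and the vector is called cols rather than colnames to avoid confusion with the base colnames() function.

```r
# Hypothetical table with a few cost columns (illustrative data only)
df <- data.frame(id    = 1:3,
                 cost1 = c(10, 20, 30),
                 cost2 = c(1, 2, 3),
                 cost3 = c(5, 5, 5),
                 other = c("a", "b", "c"))
cols <- c("cost1", "cost2", "cost3")

# Base R evaluates the column index as an ordinary value,
# so a character vector held in a variable works directly
by_matrix_index <- df[, cols, drop = FALSE]  # drop = FALSE keeps a data.frame
by_list_index   <- df[cols]                  # list-style indexing also returns a data.frame
names(by_list_index)
```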

Another nice alternative is select() from the tidyverse/dplyr universe. It gives you a lot of flexibility when selecting columns from a data frame, tibble, or data.table.
library("data.table")
library("tidyverse")
df <- data.table(mtcars)
columns_to_select <- c("cyl", "mpg")
select(df, all_of(columns_to_select))  # all_of() wraps an external character vector of names
You can also leave the column names unquoted if you wish:
select(df, c(cyl, mpg))
or use the ellipsis (...) to pass several quoted or unquoted names.
Here the three forms are compared to check that they give the same result:
objs <- list(select(df, c(cyl, mpg)),
select(df, cyl, mpg),
select(df, "cyl", "mpg"))
outer(objs, objs, Vectorize(all.equal))
If you want to go this route, you may want to take a further look at dtplyr, which provides a bridge between data.table and dplyr.
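When the names live in a character vector, the tidyselect helpers all_of() and any_of() make the intent explicit: all_of() errors if any name is missing, while any_of() silently skips missing ones. A minimal sketch, assuming a recent dplyr (1.0 or later) is installed; "not_a_column" is a deliberately non-existent, hypothetical name.

```r
library(dplyr)

df <- as_tibble(mtcars)
# "not_a_column" is a hypothetical name that does not exist in mtcars
wanted <- c("cyl", "mpg", "not_a_column")

# any_of() keeps the columns that exist and silently skips the rest
kept <- select(df, any_of(wanted))
names(kept)

# all_of() is strict: a missing name raises an error, caught here
strict <- tryCatch(select(df, all_of(wanted)), error = function(e) NULL)
is.null(strict)
```

Use all_of() when a missing column should be treated as a bug, and any_of() when the vector may legitimately list optional columns.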

Related

How can I isolate (or filter) a part of a string in several columns at the same time?

I have a data frame of bacterial families with all their OTU taxonomy levels (phylum, order, family...).
The data frame is large, and I would like each column name to be only the last part of its string, the part that starts with "f___".
I tried some methods in R (like dplyr::filter or filter(str_detect)) and also splitting the columns in Excel, but could not get what I wanted. Doing it by hand is not practical because there are too many columns.
With df being your data frame, you could use rename_with from the dplyr package:
library(dplyr)
df %>%
rename_with(
## your renaming function (see ?gsub for help on
## replacing with search patterns (regular expressions):
~ gsub('.*;f___(.*)$', '\\1', .x),
## column selection (see ?dplyr::select for handy shortcuts)
cols = everything()
)
The .x in the formula ~ gsub(...) stands for the argument passed to the renaming function, in this case the old column name. You'll encounter this 'dot-something' pattern frequently in tidyverse packages.
I solved it like this:
library(readr)
microbiota <- read_csv("Tablas/nivel5-familia_clean.csv")
colnames(microbiota) <- gsub(colnames(microbiota), pattern = '.*f__', replacement = "")
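The pattern from the rename_with answer can be tried out in base R alone. The column names below are made up in the "...;f___Family" style described in the question; sub() is used because each name needs only one replacement.

```r
# Hypothetical taxonomy-style column names (illustrative only)
df <- data.frame(
  "p___Firmicutes;o___Lactobacillales;f___Streptococcaceae" = 1:2,
  "p___Bacteroidetes;o___Bacteroidales;f___Prevotellaceae"  = 3:4,
  check.names = FALSE  # keep the names exactly as written
)

# The greedy ".*;f___" consumes everything up to the last ";f___";
# the capture group (.*) keeps the family name that follows it
names(df) <- sub(".*;f___(.*)$", "\\1", names(df))
names(df)
```

If your real names use a different number of underscores, adjust the f___ part of the pattern to match.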

In R, how to reverse a split()

In R, after running the following code:
temp <- split(mtcars, mtcars$cyl)
If I send only "temp" to someone else, what code can he use to put the slices of "temp" back together? He does not need to use "cyl" as the column name; he can use whatever he wants. Thanks!
We can use do.call with rbind, but the order of the rows may be different (the rows come out grouped by list element):
do.call(rbind, temp)
If the grouping information is known, unsplit can be useful, as it restores the same row order as before the split:
unsplit(temp, mtcars$cyl)
You can use dplyr's bind_rows or data.table's rbindlist. To identify which rows come from which element of the list we can use .id/idcol parameter.
dplyr::bind_rows(temp, .id = 'id')
data.table::rbindlist(temp, idcol = 'id')
By default the id column holds the names of the list elements; if you want numbers instead, remove the names from the list with unname.
dplyr::bind_rows(unname(temp), .id = 'id')
data.table::rbindlist(unname(temp), idcol = 'id')
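A base-R sketch of both routes: unsplit() when the grouping vector is still available, and a plain do.call(rbind, ...) with a hand-made id column (filled from the list names) when it is not. The id column only mimics what .id/idcol do; it is built by hand here rather than taken from any package.

```r
temp <- split(mtcars, mtcars$cyl)

# If the grouping vector is still available, unsplit() restores the
# original row order and row names
restored <- unsplit(temp, mtcars$cyl)

# Without the grouping vector: stack the pieces and record which
# list element each row came from, using the list names as the id
with_id <- do.call(rbind, Map(function(d, nm) cbind(id = nm, d),
                              temp, names(temp)))
table(with_id$id)
```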

Add a column based on the dynamically named columns

A new column must be added to the existing data frame, so that it is the mean of some other columns that are selected dynamically.
I prefer using dplyr, so the solution might look something like this:
selected_columns <- c("am", "mpg")
dplyr::mutate_at(mtcars, vars(selected_columns), funs(new_col = rowMeans(.)))
Is there a way to modify this chunk or is another approach required?
Here we just need to subset the columns of the data (.) with the string vector and pass the result to rowMeans:
library(dplyr)
mtcars %>%
mutate(new_col = rowMeans(.[selected_columns]))
mutate doesn't have a funs parameter; funs belongs to mutate_if/mutate_at/mutate_all, and it has itself been deprecated in favour of list.
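The same subsetting idea also works in base R without the pipe. A minimal sketch, copying mtcars first so the built-in dataset is left untouched; the check on the first row uses its values am = 1 and mpg = 21.

```r
selected_columns <- c("am", "mpg")

# Work on a copy so the built-in mtcars is not modified
mt <- mtcars

# Subset by the names held in the vector, then take row-wise means
mt$new_col <- rowMeans(mt[selected_columns])

# Mazda RX4: (am + mpg) / 2 = (1 + 21) / 2 = 11
mt$new_col[1]
```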

How to Rename Column Headers in R

I have two separate datasets: one has the column headers and another has the data.
The first one has the header names in its 2nd column, and I want to use that column as the column headers of the second dataset.
How can I do this? Thank you.
In general you can use colnames, which returns a character vector of the column names of your data frame or matrix. You can rename all the columns of your data frame with:
colnames(df) <- new_names  # new_names: a character vector with one name per column
It is also possible to rename a single column by indexing with the [] brackets.
This would rename the first column:
colnames(df2)[1] <- "name"
For your example, take the values of the 2nd column of the first dataset. Try this:
colnames(df2) <- as.character(df1[,2])
Make sure the number of names equals the number of columns.
The equivalent for rows is rownames().
A dplyr way, with reproducible code:
library(dplyr)
df <- tibble(x = 1:5, y = 11:15)
df_n <- tibble(x = 1:2, y = c("col1", "col2"))
names(df) <- df_n %>% select(y) %>% pull()
I think the select() %>% pull() syntax is easier to remember than list indexing. I also used names instead of colnames: when working with a data frame, colnames simply calls names, so it is better to cut out the middleman and be explicit that we are working with a data frame and not a matrix. It is also shorter to type.
You can simply do this:
names(data)[3] <- 'Newlabel'
where names(data)[3] is the column you want to rename (here the 3rd one).
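A self-contained base-R sketch of the colnames(df2) <- as.character(df1[,2]) answer. The two data frames are hypothetical stand-ins for the datasets in the question, with the headers stored in the 2nd column of df1.

```r
# Hypothetical lookup table: the 2nd column holds the desired headers
df1 <- data.frame(position = 1:3,
                  header   = c("height", "weight", "age"))
# Hypothetical data with placeholder names X1, X2, X3
df2 <- data.frame(matrix(1:6, nrow = 2))

# The number of new names must equal the number of columns
stopifnot(nrow(df1) == ncol(df2))
colnames(df2) <- as.character(df1[, 2])
names(df2)
```

as.character() also guards against the header column having been read in as a factor.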

Why when I use mutate command the index column tend to disappear?

Using the mtcars data frame, when I apply the mutate command, the row names of mtcars (the car names) disappear.
mtcars
mutate(mtcars, displ_l = disp / 61.0237)
I want to see the whole data frame with the new modifications. Is that possible?
Thanks
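One way is to turn the row names into a regular column first, for example with data.table's setDT (df being your data frame, e.g. a copy of mtcars):
library(data.table)
setDT(df, keep.rownames = TRUE)[]  # row names become a new column called "rn"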
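A base-R alternative that does not need data.table: copy the row names into an ordinary column, drop the row names, and then add the new variable. Once the car names are regular data, they no longer depend on row names being preserved by later steps. The column name model is an arbitrary choice.

```r
# Copy the row names into a regular column; row.names = NULL
# gives the result plain integer row names
cars_df <- data.frame(model = rownames(mtcars), mtcars, row.names = NULL)

# Add the new column; the car names survive as ordinary data
cars_df$displ_l <- cars_df$disp / 61.0237
head(cars_df[, c("model", "disp", "displ_l")], 3)
```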
