Add a column based on the dynamically named columns

A new column must be added to an existing data frame so that it holds the mean of some other columns, which are selected dynamically.
I prefer using dplyr, so the solution might look something like this:
selected_columns <- c("am", "mpg")
dplyr::mutate_at(mt_cars, vars(selected_columns), funs(new_col = rowMeans(.)))
Is there a way to modify this chunk or is another approach required?

Here, we just need to subset the columns of the data (.) with the string vector and take the rowMeans:
library(dplyr)
mtcars %>%
mutate(new_col = rowMeans(.[selected_columns]))
mutate has no funs argument. funs() belongs to mutate_if/mutate_at/mutate_all, and it is itself deprecated in favor of list().
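The row mean can also be written without the `.` placeholder, which matters when the native `|>` pipe is used. A minimal sketch, assuming dplyr 1.0.0 or later and the column names from the question. Here across() is called without a function purely to return the selected columns, and dplyr 1.1.0 added pick() for the same purpose:

```r
library(dplyr)

# Column names chosen at run time (taken from the question)
selected_columns <- c("am", "mpg")

# across() with only a selection returns the selected columns as a tibble,
# which rowMeans() reduces to one value per row
result <- mtcars %>%
  mutate(new_col = rowMeans(across(all_of(selected_columns))))

head(result[, c("am", "mpg", "new_col")])
```

all_of() makes it explicit that selected_columns is an external character vector, and it raises an error if one of the names is missing from the data.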

Related

How can I isolate (or filter) a part of a string in several columns at the same time?

I have a data frame of bacterial families with all their OTUs (phylum, order, family...).
The data frame is large, and I would like the name of each column to be only the last part of each string, the part that starts with "f___".
I tried some methods in R (like dplyr::filter or filter(str_detect)) and also separating columns in Excel, but could not get what I wanted. Doing it manually is not an option because there are too many columns.

With df being your data frame, you could use rename_with from the dplyr package:
library(dplyr)
df %>%
  rename_with(
    ## your renaming function (see ?gsub for help on
    ## replacing with search patterns, i.e. regular expressions):
    ~ gsub('.*;f___(.*)$', '\\1', .x),
    ## column selection (see ?dplyr::select for handy shortcuts);
    ## note that the argument is named .cols, with a leading dot
    .cols = everything()
  )
The .x in the formula ~ gsub(...) stands for the argument passed to the renaming function, in this case the old column names. You'll encounter this 'dot-something' pattern frequently in tidyverse packages.

I solved it like this:
library(readr)
microbiota <- read_csv("Tablas/nivel5-familia_clean.csv")
colnames(microbiota) <- gsub(colnames(microbiota), pattern = '.*f__', replacement = "")
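To see what the renaming does, here is a small sketch on made-up data. The column names and values are hypothetical, and the family prefix is assumed to be "f__" with two underscores, as in the gsub call above:

```r
library(dplyr)

# Hypothetical taxonomy-style column names (not from the original data)
df <- data.frame(
  sample = c("s1", "s2"),
  `k__Bacteria;p__Firmicutes;f__Lachnospiraceae` = c(10, 20),
  `k__Bacteria;p__Bacteroidetes;f__Prevotellaceae` = c(5, 7),
  check.names = FALSE
)

# Strip everything up to and including "f__";
# names without that prefix (here: sample) are left unchanged
renamed <- df %>%
  rename_with(~ sub(".*f__", "", .x), .cols = everything())

names(renamed)
```

If the real names use three underscores, the pattern has to be adjusted accordingly.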

Specific column selection from data.table in R

I have a data.table in R. The table has column names.
I have a vector named colnames with a few of the column names in the table.
colnames<-c("cost1", "cost2", "cost3")
I want to select the columns whose names are in the vector colnames from the table.
The name of the table is dt.
I have tried doing the following:
selected_columns <- dt[,colnames]
But this does not work, and I get an error.
However, when I try the following, it works:
selected_columns <- dt[,c("cost1", "cost2", "cost3")]
I want to use the vector variable (colnames) to access the columns and not the c("..") method.
How can I do so?

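You can use .SD together with .SDcols:
dt[, .SD, .SDcols = colnames]
Recent versions of data.table also offer the .. prefix, which tells data.table to look up colnames as a variable in the calling scope rather than as a column:
dt[, ..colnames]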
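The following sketch compares these forms on a made-up table, together with the older with = FALSE spelling. The table contents and the id column are hypothetical; only the cost column names come from the question:

```r
library(data.table)

# Hypothetical table for illustration
dt <- data.table(id = 1:3,
                 cost1 = c(10, 20, 30),
                 cost2 = c(1, 2, 3),
                 cost3 = c(5, 5, 5))

# Name kept from the question; it shadows base::colnames() as a variable
colnames <- c("cost1", "cost2", "cost3")

by_sdcols <- dt[, .SD, .SDcols = colnames]   # subset of data via .SDcols
by_prefix <- dt[, ..colnames]                # .. looks colnames up outside dt
by_with   <- dt[, colnames, with = FALSE]    # treat j as a vector of names

names(by_prefix)
```

All three return a data.table with only the three cost columns, so the choice is mostly a matter of taste and of the data.table version in use.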

Another nice alternative is select from the tidyverse/dplyr universe. select gives you a lot of flexibility when selecting columns from a data frame, tibble, or data.table.
library("data.table")
library("tidyverse")
df <- data.table(mtcars)
columns_to_select <- c("cyl", "mpg")
# all_of() makes explicit that columns_to_select is an external character vector
select(df, all_of(columns_to_select))
You can also skip quoting column names if you wish:
select(df, c(cyl, mpg))
Or leverage the ellipsis and pass multiple quoted or unquoted names. The following compares the results of the different calls:
objs <- list(select(df, c(cyl, mpg)),
select(df, cyl, mpg),
select(df, "cyl", "mpg"))
outer(objs, objs, Vectorize(all.equal))
You may want to have a further look at dtplyr, which provides a bridge between data.table and dplyr, if you want to go this route.

Filtering multiple string columns based on 2 different criteria - questions about "grepl" and "starts_with"

I want to create a subset of my data by using the select and filter functions from dplyr. I have consulted a few similar questions about partial string matches and selecting with grepl, but found no solution to my problem.
The columns that I want to filter all start with the same letters, let's say "DGN." So I have DGN1, DGN2, DGN3, etc. all the way up until DGN25. The two criteria I want to filter on are contains "C18" and starts with "153".
Ideally, I would want to run a code chunk that looks like this:
dgn_subset <- df %>%
  select(ID, date, starts_with("DGN")) %>%
  filter(grepl("C18"|starts_with("153"), starts_with("DGN")))
There are 2 main issues here:
I don't think that grepl can take "starts_with" as an input for the pattern. Also, it can't take "starts_with" as the column argument (I think it may only be able to filter on one column at a time?).
To get the code to work, I could replace the starts_with("153") portion with "153" and the starts_with("DGN") portion with "DGN1," but that gives me many observations that I do not want and it only filters on the first DGN column.
Are there any alternative functions or packages I can use to solve my problem?
Any help is greatly appreciated!

We can use filter with rowwise and c_across. After grouping by row, c_across collects the columns matched by a select helper (starts_with), grepl returns a logical vector checking for either "C18" or (|) a value that starts with (^) 153, and any keeps the row if at least one column matches.
library(dplyr) # requires dplyr >= 1.0.0
df %>%
  # group row-wise so the condition is evaluated once per row
  rowwise() %>%
  # c_across() collects the 'DGN' columns of the current row,
  # grepl() tests each value against the pattern,
  # any() keeps the row if at least one value matches
  filter(any(grepl("C18|^153", c_across(starts_with("DGN")))))
Or with filter_at:
df %>%
  # any_vars() keeps a row if the grepl condition holds for any 'DGN' column
  filter_at(vars(starts_with("DGN")), any_vars(grepl("C18|^153", .)))
Data:
df <- data.frame(ID = 1:3, DGN1 = c("2_C18", "32", "1532"),
                 DGN2 = c("24", "C18_2", "23"))
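filter_at and any_vars are superseded in current dplyr. A minimal sketch of the same filter with if_any, assuming dplyr 1.0.4 or later. The data below are made up and include one row that matches neither pattern, so the effect of the filter is visible:

```r
library(dplyr)

# Hypothetical data: the row with ID 3 matches neither pattern
df <- data.frame(ID = 1:4,
                 DGN1 = c("2_C18", "32", "24", "1532"),
                 DGN2 = c("24", "C18_2", "23", "99"),
                 stringsAsFactors = FALSE)

# if_any() is TRUE for a row when the condition holds in at least one
# of the selected columns
kept <- df %>%
  filter(if_any(starts_with("DGN"), ~ grepl("C18|^153", .x)))

kept$ID
```

Unlike the rowwise approach, this stays vectorised over whole columns, which is usually faster on large tables.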

In base R, you can use startsWith to select the columns of interest and sapply to check for the pattern in each of them. rowSums then counts how many times the pattern occurs in each row, and rows with at least one occurrence are kept.
cols <- startsWith(names(df), 'DGN')
df[rowSums(sapply(df[cols], grepl, pattern = 'C18|^153')) > 0, ]
The same logic with lapply and Reduce:
df[Reduce(`|`, lapply(df[cols], grepl, pattern = 'C18|^153')), ]

Exclude columns by names in mutate_at in dplyr

I am trying to do something very simple, and yet can't figure out the right way to specify. I simply want to exclude some named columns from mutate_at. It works fine if I specify position, but I don't want to hard code positions.
For example, I want the same output as this:
mtcars %>% mutate_at(-c(1, 2), max)
But, by specifying mpg and cyl column names.
I tried many things, including:
mtcars %>% mutate_at(-c('mpg', 'cyl'), max)
Is there a way to work with names and exclusion in mutate_at?

You can use vars to specify the columns, which works the same way as select() and allows you to exclude columns using -:
mtcars %>% mutate_at(vars(-mpg, -cyl), max)

One option is to pass the strings inside one_of
mtcars %>%
mutate_at(vars(-one_of("mpg", "cyl")), max)
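In dplyr 1.0.0 and later, mutate_at is superseded by across, and one_of by all_of/any_of. A minimal sketch of the same exclusion by name, where the variable excluded is introduced only for illustration:

```r
library(dplyr)

# Names of the columns to leave untouched (hypothetical helper variable)
excluded <- c("mpg", "cyl")

# Replace every other column by its maximum
result <- mtcars %>%
  mutate(across(-all_of(excluded), max))

head(result, 3)
```

With unquoted names the selection can be written as across(-c(mpg, cyl), max).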

Using dplyr's select where variable names are quoted [duplicate]

Often I'll want to select a subset of variables where the subset is the result of a function. In this simple case, I first get all the variable names which pertain to width characteristics
library(dplyr)
library(magrittr)
data(iris)
width.vars <- iris %>%
names %>%
extract(grep(".Width", .))
Which returns:
> width.vars
[1] "Sepal.Width" "Petal.Width"
It would be useful to be able to use these results to select columns. I'm aware that contains() and its siblings exist, but there are plenty of more complicated subsets I would like to perform, and this example is deliberately trivial.
If I was to attempt to use this function as a way to select columns, the following happens:
iris %>%
select(Species,
width.vars)
Error: All select() inputs must resolve to integer column positions.
The following do not:
* width.vars
How can I use dplyr::select with a vector of variable names stored as strings?

Within dplyr, most verbs used to have an alternate version ending in '_' that accepted strings as input; in this case, select_. These were the usual choice for programmatic use, but they have been deprecated since dplyr 0.7.0, so prefer the selection helpers in current code.
iris %>% select_(.dots = c("Species", width.vars))

First of all, you can do the selection in dplyr with
iris %>% select(Species, contains(".Width"))
No need to create the vector of names separately. But if you did have a list of columns as string names, you could do
width.vars <- c("Sepal.Width", "Petal.Width")
iris %>% select(Species, one_of(width.vars))
See the ?select help page for all the available options.
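In current versions of dplyr and tidyselect, one_of is superseded by all_of and any_of. A short sketch on iris, assuming tidyselect 1.0.0 or later, where "No.Such.Column" is a made-up name used to show the difference between the two helpers:

```r
library(dplyr)

# Column names computed at run time, as in the question
width.vars <- grep("\\.Width$", names(iris), value = TRUE)

# all_of() is strict: it errors if any name is missing from the data
strict <- iris %>% select(Species, all_of(width.vars))

# any_of() is lenient: names that do not exist are silently ignored
lenient <- iris %>% select(Species, any_of(c(width.vars, "No.Such.Column")))

names(strict)
```

Both calls return Species, Sepal.Width and Petal.Width here.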
