Passing a vector through a select statement [duplicate] - r

I am looking for a way to pass a vector of strings into a select statement. I want to subset a data frame to only the variables whose names contain one of the strings in my vector. The match should not be exact, so I need something like contains, because the variable names in the data frame include some text that is not in my vector.
Here is an example of the vector I want to pass into my select statement:
c("clrs_name", "_clrs_sitedetails_value", "_clrs_targetlicence_value",
"clrs_licenceclass", "clrs_licenceownership", "clrs_type", "statuscode")
For example, I want to extract the variable "odate_value_clrs_name" from my data frame, and the string "clrs_name" in the vector should match it, but I am not sure how to combine contains and a vector in a select statement.

We can use matches in select after collapsing the pattern vector with |, using either paste from base R or str_c from stringr (note that str_c returns NA if any element is NA). This does not raise an error or warning if one of the patterns has no match among the column names.
library(dplyr)
library(stringr)
df1 %>%
select(matches(str_c(v1, collapse = "|")))
where
v1 <- c("clrs_name", "_clrs_sitedetails_value", "_clrs_targetlicence_value",
"clrs_licenceclass", "clrs_licenceownership", "clrs_type", "statuscode")
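For comparison, here is a minimal base R sketch of the same idea, without dplyr. The data frame df1 and its columns are made up for illustration, and the shortened v1 includes a hypothetical "no_such_column" entry to show that unmatched patterns are simply ignored. Note that grepl is case-sensitive by default, whereas matches ignores case by default.

```r
# Hypothetical data frame; only the column names matter here.
df1 <- data.frame(
  odate_value_clrs_name = 1:2,
  x_clrs_type = c("a", "b"),
  statuscode = c(1, 0),
  unrelated = c(TRUE, FALSE)
)
# Shortened pattern vector; "no_such_column" is an assumption added to
# show that a pattern without any match causes no error.
v1 <- c("clrs_name", "clrs_type", "statuscode", "no_such_column")

# Collapse the vector into a single alternation pattern.
pattern <- paste(v1, collapse = "|")

# TRUE for every column name that contains at least one of the substrings.
keep <- grepl(pattern, names(df1))

# drop = FALSE keeps the result a data frame even if one column matches.
result <- df1[, keep, drop = FALSE]
names(result)
```

Because the strings are treated as regular expressions, any special characters in them (such as "." or "(") would need escaping.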


How to point an R function to a particular column of a dataset? [duplicate]

I have a dataset called df, which has columns a and b with three integers each. I want to write a function for the mean (obviously this already exists; I want to write a larger function and this appears to be where problems are occurring). However, this function returns NA:
mean_function <- function(x) {
mean(df$x)
}
mean_function(a) returns NA, while mean(df$a) returns 2. Is there something I'm missing about how R functions handle datasets, or another problem?
We need [[ instead of $, because $ does not evaluate x: df$x looks for a column literally named x. With [[, pass the column name as a string:
mean_function <- function(x) {mean(df[[x]])}
mean_function("a")
If we need to pass an unquoted column name, capture it with substitute and convert it to a character string with deparse:
mean_function <- function(x) {
x <- deparse(substitute(x))
mean(df[[x]])
}
mean_function(a)
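The difference between $ and [[ can be checked directly. The sketch below assumes df has columns a = 1, 2, 3 and b = 4, 5, 6; the question only says that each column holds three integers and that mean(df$a) is 2, so the exact values are an assumption.

```r
# Assumed example data: the question only states three integers per column.
df <- data.frame(a = c(1L, 2L, 3L), b = c(4L, 5L, 6L))

# `$` takes its right-hand side literally, so this looks for a column
# named "x", finds none, and returns NULL.
is.null(df$x)

# `[[` evaluates its argument, so a name stored in a variable works.
col <- "a"
mean(df[[col]])

# Unquoted variant: substitute() captures the expression `a`,
# deparse() turns it into the string "a".
mean_function <- function(x) {
  col_name <- deparse(substitute(x))
  mean(df[[col_name]])
}
mean_function(a)
mean_function(b)
```

The quoted version is usually easier to reuse, for example inside a loop over column names; the unquoted version only works when the caller types the column name directly.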

How do I extract elements from a dataframe by pattern? [duplicate]

I have a dataframe dat that has many variables like
"x_tp1_y"
"g_tp1_z"
"f_tp2_h"
I would like to extract elements that include "tp1".
I already tried this:
grep("tp1", dat)
grepl("tp1", dat)
dat["tp1",]
I just want R to give me elements with this pattern so I do not have to type in all variable names that are in the dataframe dat.
Like this:
command that extracts elements with pattern "tp1"
R returns parts of the dataframe that have pattern "tp1":
x_tp1_y g_tp1_z
1 2
0 3
And then I would like to create a new dataframe.
I know that I just can use
newdat <- data.frame( dat[[1]], dat[ c(1:30)])
but I have so many elements in my dataframe that this would take ages.
Thank you for your help!
dat[,grep("tp1", colnames(dat))]
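grep returns the indices of the column names of the data frame (the vector colnames(dat)) that contain the pattern, and [ then subsets the data frame to those columns. If only one column might match, add drop = FALSE so that the result stays a data frame.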
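A small self-contained sketch of this approach, using the three column names from the question. The values for x_tp1_y and g_tp1_z follow the expected output in the question; the values for f_tp2_h are made up.

```r
# Example data; the f_tp2_h values are an assumption for illustration.
dat <- data.frame(
  x_tp1_y = c(1, 0),
  g_tp1_z = c(2, 3),
  f_tp2_h = c(5, 6)
)

# Positions of the column names that contain "tp1".
idx <- grep("tp1", colnames(dat))

# Subset by position; drop = FALSE guards against a single match
# collapsing the result to a plain vector.
newdat <- dat[, idx, drop = FALSE]

# value = TRUE returns the matching names instead of their positions.
matched_names <- grep("tp1", colnames(dat), value = TRUE)
```

The attempts in the question fail because grep("tp1", dat) searches the contents of the columns rather than their names, and dat["tp1", ] looks for a row named "tp1".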

Loop Through Column Names with Similar Structure [duplicate]

I have a very large dataset. Of those, a small subset have the same column name with an indexing value that is numeric (unlike the post "How to extract columns with same name but different identifiers in R" where the indexing value is a string). For example
Q_1_1, Q_1_2, Q_1_3, ...
I am looking for a way to either loop through just those columns using the indices or to subset them all at once.
I have tried to use paste() to write their column names but have had no luck. See sample code below
Define Dataframe
df = data.frame("Q_1_1" = rep(1,5),"Q_1_2" = rep(2,5),"Q_1_3" = rep(3,5))
Define the Column Name Using Paste
cn <- as.symbol(paste("Q_1_",1, sep=""))
cn
df$cn
df$Q_1_1
I want df$cn to return the same thing as df$Q_1_1, but df$cn returns NULL.
If you are just trying to subset your data frame by column name, you could use dplyr to subset all your indexed columns at once, with a regex that matches every column name following a certain pattern:
library(dplyr)
df = data.frame("Q_1_1" = rep(1,5),"Q_1_2" = rep(2,5),"Q_1_3" = rep(3,5), "A_1" = rep(4,5))
newdf <- df %>%
dplyr::select(matches("Q_[0-9]_[0-9]"))
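Each [0-9] in the regex matches a single digit after an underscore. Depending on which variables you are trying to match, you might have to change the regular expression, for example to [0-9]+ for indices with more than one digit.
The problem with your attempt is that $ does not evaluate cn: df$cn looks for a column literally named cn, which does not exist, so it returns NULL. To select a column whose name is stored in a variable, use df[[...]] with a character string rather than a symbol.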
I hope this helps!
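If looping over the indices is what is needed, the following base R sketch builds each column name with paste0 and looks it up with [[. It reuses the data frame from the answer above; the column sums are only a placeholder for whatever per-column work is required.

```r
df <- data.frame(Q_1_1 = rep(1, 5), Q_1_2 = rep(2, 5),
                 Q_1_3 = rep(3, 5), A_1 = rep(4, 5))

# A plain character string (not a symbol) works with `[[`.
cn <- paste0("Q_1_", 1)
same <- identical(df[[cn]], df$Q_1_1)

# Loop over the numeric index and build each column name in turn.
# Summing is just a stand-in for the real per-column computation.
col_sums <- numeric(0)
for (i in 1:3) {
  cn <- paste0("Q_1_", i)
  col_sums[cn] <- sum(df[[cn]])
}

# Base R alternative to select(matches(...)): subset all Q_*_* columns.
q_cols <- df[, grepl("^Q_[0-9]+_[0-9]+$", names(df)), drop = FALSE]
```

The anchors ^ and $ in the last pattern restrict the match to whole column names, which avoids picking up columns that merely contain a Q_1_1-like fragment.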

R loop over string and use it to refer to column names [duplicate]

I have a data frame with column names 1990.x, ..., 2000.x and 1990.y, ..., 2000.y. I want to replace NAs in the variables ending in ".x" with values computed from the ".y" variable of the corresponding year. It is an element-by-element computation using a formula such as 1990.x = 0.5+0.2*log(1990.y)
I wanted to do something like this:
for (v in colnames(df[ ,grepl(".x",names(df))])) {
print(v)
df$v <- ifelse(is.na(df$v), ols$coefficients[1]+ols$coefficients[2]*log(df$gsub(".x",".y",v)), df$v)
}
but this is not working. How can I make this loop work, or is there a better solution?
Thanks
The $ operator is a convenience for literal column names. It does not evaluate its right-hand side, so df$v looks for a column named v rather than the column whose name is stored in v, and df$gsub(...) is not valid at all. Inside the loop, use the [ operator (square brackets) instead, including for the .y column:
df[,v] <- ifelse(is.na(df[,v]), ols$coefficients[1]+ols$coefficients[2]*log(df[,sub(".x",".y",v,fixed=TRUE)]), df[,v])
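Put together, the whole loop might look like the sketch below. The two-year data frame is invented, and the fitted model ols is replaced by a hypothetical coefficient vector coefs holding the 0.5 and 0.2 from the formula in the question. The pattern "\\.x$" escapes the dot and anchors it to the end of the name, which is stricter than the ".x" used in the question.

```r
# Invented example data; check.names = FALSE keeps names like "1990.x".
df <- data.frame(
  `1990.x` = c(1, NA),
  `1990.y` = c(1, exp(1)),
  `1991.x` = c(NA, 3),
  `1991.y` = c(exp(2), 1),
  check.names = FALSE
)

# Hypothetical stand-in for ols$coefficients (intercept, slope).
coefs <- c(0.5, 0.2)

# Names of all columns ending in ".x".
x_cols <- grep("\\.x$", names(df), value = TRUE)

for (v in x_cols) {
  # Name of the matching ".y" column for the same year.
  y_col <- sub("\\.x$", ".y", v)
  # Fill NAs in the .x column from the formula applied to the .y column.
  df[[v]] <- ifelse(is.na(df[[v]]),
                    coefs[1] + coefs[2] * log(df[[y_col]]),
                    df[[v]])
}
```

Values that were already present in the .x columns are left unchanged; only the NA entries are replaced.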

matches patterns in vector with strings in data frame [duplicate]

I have a data frame that contains two columns, and a vector of names.
How can I select the rows of the data frame whose name matches one of the strings in the vector?
name = c("p4#HPS1", "p7#HPS2", "p4#HPS3", "p7#HPS4", "p7#HPS5", "p9#HPS6", "p11#HPS7", "p10#HPS8", "p15#HPS9")
expression = c(118.84, 90.04, 106.6, 104.99, 93.2, 66.84, 90.02, 108.03, 111.83)
dataset <- as.data.frame(cbind(name, expression))
nam <- c("HPS5", "HPS6", "HPS9", "HPS2")
The function should return a data frame containing only the specified rows.
I tried
dataset[mapply(grepl,nam,dataset$name)]
but it didn't work.
We can use paste with collapse on 'nam', use the result as the pattern argument in grep to get the matching row indices, and subset the 'dataset' with them:
dataset[grep(paste(nam, collapse="|"), dataset$name),]
If we are using the OP's code, wrap the 'name' column in a list. Otherwise mapply pairs up the individual elements of 'name' and 'nam', and because the two have different lengths, it throws a warning that the longer argument is not a multiple of the length of the shorter. With the list, mapply returns a logical matrix with one column per pattern; we take its rowSums and check whether each is greater than 0 to get a logical vector for subsetting the rows.
dataset[rowSums(mapply(grepl, nam, list(dataset$name)))>0,]
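The sketch below runs both variants on the data from the question, so the results can be compared. One deliberate change, flagged here as an assumption about intent: the data frame is built with data.frame(name, expression) rather than as.data.frame(cbind(...)), so that expression stays numeric. The second variant uses sapply with fixed = TRUE, which treats each entry of nam as a literal string instead of a regular expression.

```r
name <- c("p4#HPS1", "p7#HPS2", "p4#HPS3", "p7#HPS4", "p7#HPS5",
          "p9#HPS6", "p11#HPS7", "p10#HPS8", "p15#HPS9")
expression <- c(118.84, 90.04, 106.6, 104.99, 93.2, 66.84, 90.02, 108.03, 111.83)

# data.frame() keeps `expression` numeric; cbind() would coerce it to character.
dataset <- data.frame(name, expression)
nam <- c("HPS5", "HPS6", "HPS9", "HPS2")

# Variant 1: one alternation pattern, one logical value per row.
keep <- grepl(paste(nam, collapse = "|"), dataset$name)
hits <- dataset[keep, ]

# Variant 2: one grepl call per pattern gives a 9 x 4 logical matrix;
# a row is kept if any of its entries is TRUE.
m <- sapply(nam, grepl, x = dataset$name, fixed = TRUE)
keep_fixed <- rowSums(m) > 0

hits$name
```

Both variants are substring matches, so a pattern such as "HPS2" would also match a name like "p1#HPS20" if one were present; an anchored pattern would be needed to rule that out.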
