How do I extract elements from a dataframe by pattern? [duplicate]

This question already has answers here:
Subset data to contain only columns whose names match a condition
(10 answers)
Closed 3 years ago.
I have a dataframe dat that has many variables like
"x_tp1_y"
"g_tp1_z"
"f_tp2_h"
I would like to extract elements that include "tp1".
I already tried this:
grep("tp1", dat)
grepl("tp1", dat)
dat["tp1",]
I just want R to give me elements with this pattern so I do not have to type in all variable names that are in the dataframe dat.
Like this:
command that extracts elements with pattern "tp1"
R returns parts of the dataframe that have pattern "tp1":
x_tp1_y g_tp1_z
1 2
0 3
And then I would like to create a new dataframe.
I know that I just can use
newdat <- data.frame( dat[[1]], dat[ c(1:30)])
but I have so many elements in my dataframe that this would take ages.
Thank you for your help!

dat[,grep("tp1", colnames(dat))]
grep returns the index numbers of the column names of the data.frame (the vector colnames(dat)) that contain the pattern. "[" then subsets the data.frame to those columns.
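A minimal sketch of the approach above, using a small made-up data frame (the values are invented for illustration). The drop = FALSE argument is an addition that is not in the answer: it keeps the result a data frame even when only one column matches.

```r
# Hypothetical data mirroring the column names in the question
dat <- data.frame(x_tp1_y = c(1, 0), g_tp1_z = c(2, 3), f_tp2_h = c(5, 6))

# Indices of the column names that contain "tp1"
idx <- grep("tp1", colnames(dat))

# Subset by column; drop = FALSE keeps a data frame even for a single match
newdat <- dat[, idx, drop = FALSE]
newdat
```

The attempts in the question fail for a related reason: grep("tp1", dat) searches the contents of the data frame rather than its column names, and dat["tp1", ] looks for a row named "tp1".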

Related

Naming rows in a large dataset by vector with different letters and numbers [duplicate]

This question already has answers here:
Concatenate a vector of strings/character
(8 answers)
Closed last year.
I am trying to rename the rows in my dataset. I need to change them like this: the first row would be named "IL1", the second "IL2", ..., "ILn", where n is the number of rows in the dataset.
I know how to change them with, for example, rownames(df) <- c("IL1","IL2","IL3","IL4"). But typing the names one by one is feasible only for small datasets. I need to do this for a dataset with hundreds of rows.
Any ideas? Thank you.
You can use paste0:
rownames(df) <- paste0("IL", 1:nrow(df))
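As a small illustration of the line above, here is a sketch on a made-up four-row data frame. It uses seq_len(nrow(df)) instead of 1:nrow(df); this is a common alternative, not part of the original answer, and the two give the same result whenever the data frame has at least one row.

```r
# Hypothetical data frame with four rows
df <- data.frame(value = c(10, 20, 30, 40))

# paste0 is vectorised: "IL" is combined with each of 1, 2, ..., n
rownames(df) <- paste0("IL", seq_len(nrow(df)))
rownames(df)
```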

Passing a vector through a select statement [duplicate]

This question already has answers here:
grep using a character vector with multiple patterns
(11 answers)
Closed 3 years ago.
I am looking for a way to pass a vector of strings into a select statement. I want to subset a data frame to only the variables whose names contain one of the strings in my vector. I don't want exact matching, so I need a function like contains, because the variable names in the data frame include text that is not in my vector.
Here is an example of the vector I want to pass into my select statement:
c("clrs_name", "_clrs_sitedetails_value", "_clrs_targetlicence_value",
"clrs_licenceclass", "clrs_licenceownership", "clrs_type", "statuscode")
For example, I want to extract the variable "odate_value_clrs_name" from my data frame and the string "clrs_name" in vector should extract that, but I am not sure how to incorporate contains and a vector into a select statement.
We can use matches in select after collapsing the pattern vector with |, using either paste from base R or str_c from stringr (note that str_c returns NA if any element is NA). This does not produce an error or warning if one of the patterns has no match among the column names.
library(dplyr)
library(stringr)
df1 %>%
select(matches(str_c(v1, collapse = "|")))
where
v1 <- c("clrs_name", "_clrs_sitedetails_value", "_clrs_targetlicence_value",
"clrs_licenceclass", "clrs_licenceownership", "clrs_type", "statuscode")

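For comparison, the same idea can be sketched in base R without dplyr or stringr. The data frame and the shortened pattern vector below are invented for illustration. Because the collapsed string is treated as a regular expression, this assumes the patterns contain no regex metacharacters that should be matched literally.

```r
# Hypothetical column names; only the first two contain a pattern from v1
df1 <- data.frame(odate_value_clrs_name = 1:2,
                  x_statuscode = 3:4,
                  unrelated = 5:6)

v1 <- c("clrs_name", "_clrs_sitedetails_value", "statuscode")

# Collapse the patterns into one regular expression of the form "a|b|c"
pattern <- paste(v1, collapse = "|")

# Keep the columns whose names match any of the patterns
res <- df1[, grepl(pattern, names(df1)), drop = FALSE]
names(res)
```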
Loop Through Column Names with Similar Structure [duplicate]

This question already has answers here:
How to extract columns with same name but different identifiers in R
(3 answers)
Closed 3 years ago.
I have a very large dataset. Of those, a small subset have the same column name with an indexing value that is numeric (unlike the post "How to extract columns with same name but different identifiers in R" where the indexing value is a string). For example
Q_1_1, Q_1_2, Q_1_3, ...
I am looking for a way to either loop through just those columns using the indices or to subset them all at once.
I have tried to use paste() to write their column names but have had no luck. See sample code below
Define Dataframe
df = data.frame("Q_1_1" = rep(1,5),"Q_1_2" = rep(2,5),"Q_1_3" = rep(3,5))
Define the Column Name Using Paste
cn <- as.symbol(paste("Q_1_",1, sep=""))
cn
df$cn
df$Q_1_1
I want df$cn to return the same thing as df$Q_1_1, but df$cn returns NULL.
If you are just trying to subset your data frame by column name, you could use dplyr to subset all your indexed columns at once, with a regex that matches all column names following a certain pattern:
library(dplyr)
df = data.frame("Q_1_1" = rep(1,5),"Q_1_2" = rep(2,5),"Q_1_3" = rep(3,5), "A_1" = rep(4,5))
newdf <- df %>%
dplyr::select(matches("Q_[0-9]_[0-9]"))
Each [0-9] in the regex matches a single digit. Depending on which variables you are trying to match, you might have to change the regular expression.
The problem with your solution is that df$cn does not evaluate cn: $ looks for a column literally named "cn", which does not exist, so it returns NULL. To index by a name stored in a variable, keep the name as a character string and use df[[cn]].
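The question also asks about looping over the columns by index. A minimal base R sketch of that, assuming the names follow the Q_1_<number> scheme from the question (the summing step is only a placeholder for whatever is done with each column):

```r
df <- data.frame(Q_1_1 = rep(1, 5), Q_1_2 = rep(2, 5), Q_1_3 = rep(3, 5),
                 A_1 = rep(4, 5))

# Build the column name as a character string and index with [[ ]]
cn <- paste0("Q_1_", 1)
first <- df[[cn]]  # same values as df$Q_1_1

# Loop over the indexed columns by constructing each name in turn
col_sums <- numeric(0)
for (i in 1:3) {
  name <- paste0("Q_1_", i)
  col_sums[name] <- sum(df[[name]])
}

# Or select all matching columns at once without dplyr
q_cols <- df[, grep("^Q_[0-9]+_[0-9]+$", names(df)), drop = FALSE]
```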
I hope this helps!

Replace column names with single line of code [duplicate]

This question already has answers here:
Rename multiple columns by names
(20 answers)
Closed 4 years ago.
I have a data frame with 4 columns and want to replace only the 2nd and 3rd column names.
data frame=df
col.names =A,B,C,D
New col.names= Z,F
I have tried the code below:
colnames(df)[2]<-"Z"
colnames(df)[3]<-"F"
Is it possible to rename them with a single line of code?
The actual data frame contains 150+ column names, so I am looking for a better solution.
Because df is a data.frame, names works in place of colnames: the names of a data.frame are its column names. Subset the names with the index [2:3] (for a contiguous range; use [c(2, 3)] for arbitrary positions) and assign the new names as a vector built with c:
names(df)[2:3] <- c("Z", "F")
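With 150+ columns, counting positions can be error-prone, so it may be easier to rename by the current names instead. A sketch using match, assuming every name in old actually exists in the data frame (old and new are illustrative variable names):

```r
df <- data.frame(A = 1, B = 2, C = 3, D = 4)

old <- c("B", "C")  # current names to replace
new <- c("Z", "F")  # replacements, in the same order

# match() returns the positions of the old names within names(df)
names(df)[match(old, names(df))] <- new
names(df)
```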

how to convert a dataframe to a list [duplicate]

This question already has answers here:
data.frame rows to a list
(12 answers)
Closed 4 years ago.
I have a very small csv file that when I import to R, becomes a dataframe. I would like to make this dataframe a list, but "as.list" only reads the dataframe items to me in list form and does not actually make a change to the data. I need to make a properties csv a list in order to use it to create a community in R. Any suggestions would be appreciated!
Technically, a data frame is a list, with the restriction that each element of the list is of the same size. If you want to split your data frame into a list based on the row, you can use split
df_as_list <- split(df, 1:nrow(df))
This can be fancier too, it can be based on the levels of a factor or character vector:
df_as_list <- split(df, df$identifier)
Either of these will create a list of data frames, with some number of rows from the original data frame assigned to each element of the list.
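The options above can be compared on a small made-up data frame (the column names identifier and value are assumptions for illustration). Note that none of these calls modify df in place; the result has to be assigned to a name to be kept.

```r
df <- data.frame(identifier = c("a", "a", "b"), value = c(1, 2, 3))

by_row   <- split(df, seq_len(nrow(df)))  # one single-row data frame per row
by_group <- split(df, df$identifier)      # one data frame per identifier value
by_col   <- as.list(df)                   # one vector per column

length(by_row)
names(by_group)
names(by_col)
```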
