Rename columns of a data frame by searching column name - r

I am writing a wrapper to ggplot to produce multiple graphs based on various datasets. As I am passing the column names to the function, I need to rename the column names so that ggplot can understand the reference.
However, I am struggling with renaming of the columns of a data frame
here's a data frame:
df <- data.frame(col1=1:3,col2=3:5,col3=6:8)
here are my column names for search:
col1_search <- "col1"
col2_search <- "col2"
col3_search <- "col3"
and here are column names to replace:
col1_replace <- "new_col1"
col2_replace <- "new_col2"
col3_replace <- "new_col3"
when I search for column names, R sorts the column indexes and disregards the search location.
for example, when I run the following code, I expected the new headers to be new_col1, new_col2, and new_col3, instead the new column names are: new_col3, new_col2, and new_col1
colnames(df)[names(df) %in% c(col3_search,col2_search,col1_search)] <- c(col3_replace,col2_replace,col1_replace)
Does anyone have a solution where I can search for column names and replace them in that order?

require(plyr)
df <- data.frame(col2=1:3,col1=3:5,col3=6:8)
df <- rename(df, c("col1"="new_col1", "col2"="new_col2", "col3"="new_col3"))
df
And you can be creative in making that second argument to rename so that it is not so manual.

> names(df)[grep("^col", names(df))] <-
paste("new", names(df)[grep("^col", names(df))], sep="_")
> names(df)
[1] "new_col1" "new_col2" "new_col3"
If you want to replace an ordered set of column names with an arbitrary character vector, then this should work:
names(df)[sapply(oldNames, grep, names(df) )] <- newNames
The sapply()-ed grep will give you the proper locations for the 'newNames' vector. I suppose you might want to make sure there are a complete set of matches if you were building this into a function.

hmm, this might be way to complicated, but the first that come into my mind:
lookup <- data.frame(search = c(col3_search,col2_search,col1_search),
replace = c(col3_replace,col2_replace,col1_replace))
colnames(df) <- lookup$replace[match(lookup$search, colnames(df))]

I second #justin's aes_string suggestion. But for future renaming you can try.
require(stringr)
df <- data.frame(col1=1:3,col2=3:5,col3=6:8)
oldNames <- c("col1", "col2", "col3")
newNames <- c("new_col1", "new_col2", "new_col3")
names(df) <- str_replace(string=names(df), pattern=oldNames, replacement=newNames)

Related

Renaming Data Frame Column Names using Partial Matching

If I have the code at the bottom of the post, how can I replace the column names of df1 with the second column of df2 using partial matching of df2's first column? The output should look like df3. The entirety of my data frame is filled with many other names besides .length (i.e. CS1.1.width, CS2.12.height, etc), but the CS#.#. always remains in the name.
I then need to remove the ".length" from the colnames.
I have tried using pmatch below for the first part of the question, but the output is not correct.
names(df1) <- df2$new[pmatch(names(df1), df2$partial_atch)]
How would I go about this? Thanks.
old <- c("CS1.1.length", "CS1.7.length", "CS1.10.length", "CS1.12.length", "CS2.4.length", "CS2.6.length", "CS2.9.length", "CS2.11.length", "CS1.1.height")
df1 <- data.frame()
for (k in old) df1[[k]] <- as.character()
new <- c("Bob", "Alex", "Gary", "Taylor", "Tom", "John", "Pat", "Mary")
partial_match <- c("CS1.1", "CS1.7", "CS1.10", "CS1.12", "CS2.4", "CS2.6", "CS2.9", "CS2.11")
df2 <- data.frame(Partial_Match = partial_match, Name = new)
new1 <- c("Bob.length", "Alex.length", "Gary.length", "Taylor.length", "Tom.length", "John.length", "Pat.length", "Mary.length", "Bob.height")
df3 <- data.frame()
for (k in new) df3[[k]] <- as.character()
Edit: The number of columns in df1 is greater than the number of elements in partial_match, so added an additional column in df1 as example.
Here's an option with str_replace from the stringi package:
This works because you can use a vector of pattern = to replace with a matching replacement =.
We need to paste on the trailing . because this prevents CS1.1 replacing CS1.11 and CS1.10.
library(stringi)
stri_replace_all_regex(names(df1),
pattern = paste0(as.character(df2$Partial_Match),"\\."),
replacement = paste0(as.character(df2$Name),"\\."),
vectorize_all = FALSE)
#[1] "Bob.length" "Alex.length" "Gary.length" "Taylor.length" "Tom.length" "John.length" "Pat.length"
#[8] "Mary.length"

replace substring in subset of column names in R

I have a dataframe where I want to replace a subset of the columns with new names created by prepending an identifier to the old one. For example, to prepend columns 3:7 with the string, "TEST", I tried the following.
What am I missing here?
# Make a test df
df <- data.frame(replicate(10,sample(0:1,100,rep=TRUE)))
#Subsetting works fine
colnames(df[,3:7])
#sub works fine
sub("^", "TEST.", colnames(df[,3:7]))
#replacing the subset of column names with sub does not
colnames(df[,3:7]) <- sub("^", "TEST.", colnames(df[,3:7]))
colnames(df)
#Also doesn't work
colnames(df[,3:7]) <- paste("TEST.", colnames(df[,3:7]), sep ="")
colnames(df)
The column names should be a vector, with the indices outside of the parentheses:
colnames(df)[3:7] <- sub("^", "TEST.", colnames(df)[3:7])
You could also:
colnames(df)[3:7] <- paste0("TEST.", colnames(df)[3:7])

cbind dataframe in R with placeholders

Imagine I have three dataframes:
data.frame1 <- data.frame(x=c(1:10))
data.frame2 <- data.frame(x=c(11:20))
data.frame3 <- data.frame(x=c(21:30))
I could bind them together by explicitely naming each of them:
res.data.frame <- cbind(data.frame1, data.frame2, data.frame3)
However, I am looking for more dynamic ways to do so, e.g. with placeholders.
This saves somehow the three dataframes in a new dataframe, but not in a usable format:
res.data.frame1 <- as.data.frame(mapply(get, grep("^data.frame.$", ls(), value=T)))
This command would only save the three names:
res.data.frame2 <- grep(pattern = "^data.frame.$", ls(), value=T)
This one only gives an error message:
res.data.frame3 <- do.call(cbind, lapply(ls(pattern = "^data.frame.$")), get)
Does anyone know the right way to do this?
Something like this maybe?
Assuming ls()
# [1] "data.frame1" "data.frame2" "data.frame3"
as.data.frame(Reduce("cbind", sapply(ls(), function(i) get(i))))
Based on #akrun's comment, this can be simplified to
as.data.frame(Reduce("cbind", mget(ls())))

Replace column names with the string that partially match in R

I have a dataframe with column names mycolumns (have more than 2000 columns). I have this obect called myobject which contains sets of strings that partially matches with the column names(each matches with only one column name) in mycolumns. I want to replace the column names with the respective strings in my object.So the new column names of the dataframe will be "jackal","cat.11","Rat.Fox". Please note this has to be done by using pattern matching or regex as the order of the matched names could be different in myobject.
mycolumns <- c("jackal.fox11.FAD", "cat.11.miss.DAD", "Rat.Fox.11.33.DDG")
myobject <- c("jackal","Rat.Fox","cat.11")
How about a for loop with grep:
#your example
mycolumns <- c("jackal.fox11.FAD", "cat.11.miss.DAD", "Rat.Fox.11.33.DDG")
myobject <- c("jackal","Rat.Fox","cat.11")
#for loop solution
for(i in myobject){
mycolumns[grepl(i, mycolumns)] <- i
}
Data setup:
> mycols = qw("jackal.fox11.FAD cat.11.miss.DAD Rat.Fox.11.33.DDG")
> df = read.csv(textConnection("1,2,3"), header=F)
> names(df) = qw("jackal Rat.Fox cat.11")
The business:
> names(df) = sapply(names(df), function(n) mycols[grepl(n, mycols)])
The result:
> names(df)
[1] "jackal.fox11.FAD" "Rat.Fox.11.33.DDG" "cat.11.miss.DAD"
props to #luke-singham for basis of approach
qw defined in my .Rprofile as in https://stackoverflow.com/a/31932661/338303
If you can guarantee that the names are the same as here, this is quite simple. However, that situation is trivial, so there doesn't seem to be any value in the solution vs just names(df) <- myobject
names(df)[c(grep(myobject[1], mycolumns), grep(myobject[2], mycolumns), grep(myobject[3], mycolumns))] <- myobject

r filtering a dataframe by a column contating a keyword

I am trying to filter a column which contains a keyword (in this example dog) but I am having problems.
id <- c(1,2,3,4)
type <- c("dog1","dog2" ,"cat1","cat2")
df1 <- data.frame(id,type)
df1
dfdog <- subset(packagesall, type %in% c("dog"))
dfdog
I would be grateful for your help.
Try grep:
df1[grep("dog",df1$type),]

Resources