Replace substring in subset of column names in R

I have a dataframe where I want to replace a subset of the columns with new names created by prepending an identifier to the old one. For example, to prepend columns 3:7 with the string, "TEST", I tried the following.
What am I missing here?
# Make a test df
df <- data.frame(replicate(10,sample(0:1,100,rep=TRUE)))
#Subsetting works fine
colnames(df[,3:7])
#sub works fine
sub("^", "TEST.", colnames(df[,3:7]))
#replacing the subset of column names with sub does not
colnames(df[,3:7]) <- sub("^", "TEST.", colnames(df[,3:7]))
colnames(df)
#Also doesn't work
colnames(df[,3:7]) <- paste("TEST.", colnames(df[,3:7]), sep ="")
colnames(df)

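Index the vector of column names rather than the data frame, with the indices outside of the parentheses. Assigning to colnames(df[,3:7]) effectively renames a temporary copy of the subset, so df keeps its original names:
colnames(df)[3:7] <- sub("^", "TEST.", colnames(df)[3:7])
You could also use paste0: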
colnames(df)[3:7] <- paste0("TEST.", colnames(df)[3:7])
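To see the difference between the two forms side by side, here is a small sketch built on the test data frame from the question. The seed and the variable name `before` are only there for illustration.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
df <- data.frame(replicate(10, sample(0:1, 100, replace = TRUE)))
before <- colnames(df)

# Assigning through a subset of the data frame leaves df's names untouched
colnames(df[, 3:7]) <- paste0("TEST.", colnames(df[, 3:7]))
identical(colnames(df), before)

# Indexing the names vector itself changes columns 3 to 7 only
colnames(df)[3:7] <- paste0("TEST.", colnames(df)[3:7])
colnames(df)
```

The first assignment runs without an error, which is why the problem is easy to miss.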

Related

Renaming Data Frame Column Names using Partial Matching

If I have the code at the bottom of the post, how can I replace the column names of df1 with the second column of df2 using partial matching of df2's first column? The output should look like df3. The entirety of my data frame is filled with many other names besides .length (i.e. CS1.1.width, CS2.12.height, etc), but the CS#.#. always remains in the name.
I then need to remove the ".length" from the colnames.
I have tried using pmatch below for the first part of the question, but the output is not correct.
names(df1) <- df2$new[pmatch(names(df1), df2$partial_atch)]
How would I go about this? Thanks.
old <- c("CS1.1.length", "CS1.7.length", "CS1.10.length", "CS1.12.length", "CS2.4.length", "CS2.6.length", "CS2.9.length", "CS2.11.length", "CS1.1.height")
df1 <- data.frame()
for (k in old) df1[[k]] <- as.character()
new <- c("Bob", "Alex", "Gary", "Taylor", "Tom", "John", "Pat", "Mary")
partial_match <- c("CS1.1", "CS1.7", "CS1.10", "CS1.12", "CS2.4", "CS2.6", "CS2.9", "CS2.11")
df2 <- data.frame(Partial_Match = partial_match, Name = new)
new1 <- c("Bob.length", "Alex.length", "Gary.length", "Taylor.length", "Tom.length", "John.length", "Pat.length", "Mary.length", "Bob.height")
df3 <- data.frame()
for (k in new1) df3[[k]] <- as.character()
Edit: The number of columns in df1 is greater than the number of elements in partial_match, so added an additional column in df1 as example.
Here's an option with stri_replace_all_regex from the stringi package:
This works because, with vectorize_all = FALSE, each element of pattern = is replaced by the corresponding element of replacement = in every string.
We need to paste on the trailing . because this prevents CS1.1 from matching the start of longer prefixes such as CS1.10 or CS1.12.
library(stringi)
stri_replace_all_regex(names(df1),
                       pattern = paste0(as.character(df2$Partial_Match), "\\."),
                       replacement = paste0(as.character(df2$Name), "\\."),
                       vectorize_all = FALSE)
#[1] "Bob.length" "Alex.length" "Gary.length" "Taylor.length" "Tom.length" "John.length" "Pat.length"
#[8] "Mary.length"
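A base R alternative, as a rough sketch rather than a drop-in solution: split each name into its CS#.# prefix and the remainder, then look the prefix up in a named vector. The regular expression assumes every name starts with CS, digits, a dot, and digits, as in the question. The names `lookup`, `pattern`, `prefix`, and `rest` are illustrative, and only three column names are used to keep it short.

```r
old <- c("CS1.1.length", "CS1.10.length", "CS1.1.height")
lookup <- setNames(c("Bob", "Gary"), c("CS1.1", "CS1.10"))

# Group 1 captures the whole prefix, group 2 the rest of the name
pattern <- "^(CS[0-9]+\\.[0-9]+)(\\..*)$"
prefix <- sub(pattern, "\\1", old)  # "CS1.1", "CS1.10", "CS1.1"
rest   <- sub(pattern, "\\2", old)  # ".length", ".length", ".height"

# Exact lookup by prefix, so CS1.1 can never be confused with CS1.10
new_names <- paste0(unname(lookup[prefix]), rest)
new_names

# Dropping a trailing ".length", as the question also asks
sub("\\.length$", "", new_names)
```

Because the prefix is matched as a whole, this version does not depend on the order of the lookup entries.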

R Remove columns from a data.frame with other df

Hello I have 2 dataframes:
df1 looks like:
and the df2 looks like:
I have noticed that df1 has the point symbol (.), while df2 has "-". It is weird because both of them show "-" when I open them with a text editor or Excel.
What I need is to drop all the columns of df1 that match with a value of df2.
I have used this:
DataGenSample = df1[,!(names(df1) %in% df2)]
#DataGenSample <- df1[ , !(colnames(df1) %in% df2)]
but there is no change.
All the data can be found here, along with the code that I have used.
# Data (df1):
DataGen <- read.table("data_CNA.txt",sep="\t", header=TRUE, check.names = FALSE)
# Samples (df2):
DeleteSample <- read.table("MuestrasEliminar.txt",sep="\t", header=TRUE, check.names = FALSE)
#Delete columns:
#DataGenSample = DataGen[,!(names(DataGen) %in% DeleteSample)]
DataGenSample <- DataGen[ , !(colnames(DataGen) %in% DeleteSample)]
The issue is - vs. . in the names.
When you read in the data, your read command probably has an argument like check.names that changes the names to make them "standard" R names, which means no punctuation other than _ and . (so - becomes .). If you set check.names = FALSE the original names will be kept, and your code should work just fine.
OK, I found out that you need to extract the column of your df as a vector first:
vecDeleteSample <- DeleteSample$SAMPLE_ID
And then you can drop the columns that match the values in that vector:
DataGenSample <- DataGen[,!(names(DataGen) %in% vecDeleteSample)]
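The difference between matching against the data frame and matching against its column can be sketched with a toy example. The sample IDs below are made up, and only the column name SAMPLE_ID is taken from the answer above.

```r
# Hypothetical stand-ins for the two files read in the question
DataGen <- data.frame("S-1" = 1:2, "S-2" = 3:4, "S-3" = 5:6,
                      check.names = FALSE)
DeleteSample <- data.frame(SAMPLE_ID = c("S-1", "S-3"),
                           stringsAsFactors = FALSE)

# Comparing against the whole data frame: none of the names match
names(DataGen) %in% DeleteSample

# Comparing against the column vector: matching columns are dropped
keep <- !(names(DataGen) %in% DeleteSample$SAMPLE_ID)
DataGenSample <- DataGen[, keep, drop = FALSE]
names(DataGenSample)
```

Using drop = FALSE keeps the result a data frame even when only one column survives.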

gsub not working on colnames?

I have a dataframe called df with column names in the following format:
"A Agarwal" "A Agrawal" "A Balachandran"
"A.Brush" "A.Casavant" "A.Chakrabarti"
They are first initial and last name. However, some of them are separated with a space, while others are separated with a period. I need to replace the space with a period. (The first column is called author.ID, and I excluded it from the following code.)
I have tried the following codes but the resulting colnames still do not change.
colnames(df[, -1]) = gsub("\\s", "\\.", colnames(df[, -1]))
colnames(df[, -1]) = gsub(" ", ".", colnames(df[, -1]))
What am I doing wrong?
Thanks.
Note that df[, -1] gets you all rows and columns except the first column (see this reference). In order to modify the column names you should use colnames(df).
To replace the first literal space with a dot, use
colnames(df) <- sub(" ", ".", colnames(df), fixed=TRUE)
If there can be more than one whitespace, use a regex:
colnames(df) <- sub("\\s+", ".", colnames(df))
If you need to replace every whitespace sequence in the column names with a single dot, use gsub:
colnames(df) <- gsub("\\s+", ".", colnames(df))
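If the first column should be left out of the replacement, as in the question, the same idea can be applied to a slice of the names vector. This is a minimal sketch with a made-up data frame that uses two of the column names from the question.

```r
# Hypothetical data frame; check.names = FALSE keeps the space in the name
df <- data.frame(author.ID = 1:2, "A Agarwal" = c(0, 1), "A.Brush" = c(1, 0),
                 check.names = FALSE)

# Index the names vector, not the data frame, and skip the first name
colnames(df)[-1] <- gsub("\\s+", ".", colnames(df)[-1])
colnames(df)
```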

Replace column names with the string that partially match in R

I have a dataframe with column names mycolumns (it has more than 2000 columns). I have an object called myobject which contains a set of strings that partially match the column names in mycolumns (each string matches only one column name). I want to replace the column names with the respective strings in myobject. So the new column names of the dataframe will be "jackal","cat.11","Rat.Fox". Please note this has to be done by using pattern matching or regex, as the order of the matched names could be different in myobject.
mycolumns <- c("jackal.fox11.FAD", "cat.11.miss.DAD", "Rat.Fox.11.33.DDG")
myobject <- c("jackal","Rat.Fox","cat.11")
How about a for loop with grepl:
#your example
mycolumns <- c("jackal.fox11.FAD", "cat.11.miss.DAD", "Rat.Fox.11.33.DDG")
myobject <- c("jackal","Rat.Fox","cat.11")
#for loop solution
for (i in myobject) {
  mycolumns[grepl(i, mycolumns)] <- i
}
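A possible variant, sketched under the assumption stated in the question that each string matches exactly one column name: look up the position of every string with fixed (non-regex) matching, so that the dots in names like "cat.11" are treated literally. The variable `pos` is an illustrative name.

```r
mycolumns <- c("jackal.fox11.FAD", "cat.11.miss.DAD", "Rat.Fox.11.33.DDG")
myobject <- c("jackal", "Rat.Fox", "cat.11")

# Position of the single column matching each string; vapply() raises an
# error if a string matches no column or more than one column
pos <- vapply(myobject, function(p) grep(p, mycolumns, fixed = TRUE), integer(1))

mycolumns[pos] <- myobject
mycolumns
```

The error from vapply() acts as a cheap check that the one-match-per-string assumption holds for all 2000 or so columns.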
Data setup:
> mycols = qw("jackal.fox11.FAD cat.11.miss.DAD Rat.Fox.11.33.DDG")
> df = read.csv(textConnection("1,2,3"), header=F)
> names(df) = qw("jackal Rat.Fox cat.11")
The business:
> names(df) = sapply(names(df), function(n) mycols[grepl(n, mycols)])
The result:
> names(df)
[1] "jackal.fox11.FAD" "Rat.Fox.11.33.DDG" "cat.11.miss.DAD"
Props to @luke-singham for the basis of the approach.
qw defined in my .Rprofile as in https://stackoverflow.com/a/31932661/338303
If you can guarantee that the names are the same as here, this is quite simple. However, that situation is trivial, so there doesn't seem to be any value in the solution vs just names(df) <- myobject
names(df)[c(grep(myobject[1], mycolumns), grep(myobject[2], mycolumns), grep(myobject[3], mycolumns))] <- myobject

Rename columns of a data frame by searching column name

I am writing a wrapper to ggplot to produce multiple graphs based on various datasets. As I am passing the column names to the function, I need to rename the column names so that ggplot can understand the reference.
However, I am struggling with renaming of the columns of a data frame
here's a data frame:
df <- data.frame(col1=1:3,col2=3:5,col3=6:8)
here are my column names for search:
col1_search <- "col1"
col2_search <- "col2"
col3_search <- "col3"
and here are column names to replace:
col1_replace <- "new_col1"
col2_replace <- "new_col2"
col3_replace <- "new_col3"
when I search for column names, R sorts the column indexes and disregards the search location.
for example, when I run the following code, I expected the new headers to be new_col1, new_col2, and new_col3, instead the new column names are: new_col3, new_col2, and new_col1
colnames(df)[names(df) %in% c(col3_search,col2_search,col1_search)] <- c(col3_replace,col2_replace,col1_replace)
Does anyone have a solution where I can search for column names and replace them in that order?
require(plyr)
df <- data.frame(col2=1:3,col1=3:5,col3=6:8)
df <- rename(df, c("col1"="new_col1", "col2"="new_col2", "col3"="new_col3"))
df
And you can be creative in making that second argument to rename so that it is not so manual.
> names(df)[grep("^col", names(df))] <-
paste("new", names(df)[grep("^col", names(df))], sep="_")
> names(df)
[1] "new_col1" "new_col2" "new_col3"
If you want to replace an ordered set of column names with an arbitrary character vector, then this should work:
names(df)[sapply(oldNames, grep, names(df) )] <- newNames
The sapply()-ed grep will give you the proper locations for the 'newNames' vector. I suppose you might want to make sure there are a complete set of matches if you were building this into a function.
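One way such a check could look inside a function is sketched below. The function name `rename_columns` and its arguments are hypothetical, and it uses exact matching with match() rather than grep, which is an assumption that the old names are complete column names.

```r
# Hypothetical helper: rename columns by exact name, in any order
rename_columns <- function(df, old_names, new_names) {
  stopifnot(length(old_names) == length(new_names))
  pos <- match(old_names, names(df))  # position of each old name in df
  if (anyNA(pos)) {
    stop("Columns not found: ", paste(old_names[is.na(pos)], collapse = ", "))
  }
  names(df)[pos] <- new_names
  df
}

df <- data.frame(col1 = 1:3, col2 = 3:5, col3 = 6:8)
df <- rename_columns(df,
                     c("col3", "col2", "col1"),
                     c("new_col3", "new_col2", "new_col1"))
names(df)
```

Because each new name is written to the position of its own old name, the order in which the pairs are supplied does not matter.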
Hmm, this might be way too complicated, but it is the first thing that comes to mind:
lookup <- data.frame(search = c(col3_search, col2_search, col1_search),
                     replace = c(col3_replace, col2_replace, col1_replace),
                     stringsAsFactors = FALSE)
# look up each existing column name in the search column
colnames(df) <- lookup$replace[match(colnames(df), lookup$search)]
I second @justin's aes_string suggestion. But for future renaming you can try:
require(stringr)
df <- data.frame(col1=1:3,col2=3:5,col3=6:8)
oldNames <- c("col1", "col2", "col3")
newNames <- c("new_col1", "new_col2", "new_col3")
names(df) <- str_replace(string=names(df), pattern=oldNames, replacement=newNames)

Resources