Replace column names with the string that partially matches in R

I have a data frame whose column names are mycolumns (there are more than 2000 columns). I have an object called myobject that contains strings which partially match the column names in mycolumns (each string matches exactly one column name). I want to replace each column name with its matching string from myobject, so the new column names of the data frame will be "jackal", "cat.11", "Rat.Fox". Note that this has to be done with pattern matching or regex, because the matching strings may appear in a different order in myobject.
mycolumns <- c("jackal.fox11.FAD", "cat.11.miss.DAD", "Rat.Fox.11.33.DDG")
myobject <- c("jackal","Rat.Fox","cat.11")

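How about a for loop with grepl:
# your example
mycolumns <- c("jackal.fox11.FAD", "cat.11.miss.DAD", "Rat.Fox.11.33.DDG")
myobject <- c("jackal","Rat.Fox","cat.11")
# for loop solution: overwrite every element that contains i
# (fixed = TRUE matches "." literally instead of as "any character")
for (i in myobject) {
  mycolumns[grepl(i, mycolumns, fixed = TRUE)] <- i
}
# then apply the new names to the data frame: names(df) <- mycolumns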
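If you also want to be told when a column matches zero or several of the strings (easy to miss with 2000+ columns), a loop-free variant can look up the single match per column. This is a sketch, not the answer's method: it swaps in vapply plus grepl(fixed = TRUE), and the one-row df is hypothetical example data.

```r
mycolumns <- c("jackal.fox11.FAD", "cat.11.miss.DAD", "Rat.Fox.11.33.DDG")
myobject  <- c("jackal", "Rat.Fox", "cat.11")

# Hypothetical one-row data frame whose columns carry the long names
df <- as.data.frame(matrix(1:3, nrow = 1, dimnames = list(NULL, mycolumns)))

# For every column, find the index of the one short name it contains.
# fixed = TRUE makes "." match a literal dot rather than any character.
idx <- vapply(names(df), function(col) {
  hit <- which(vapply(myobject, function(p) grepl(p, col, fixed = TRUE), logical(1)))
  if (length(hit) != 1) stop("expected exactly one match for column ", col)
  unname(hit)
}, integer(1), USE.NAMES = FALSE)

names(df) <- myobject[idx]
names(df)
# [1] "jackal"  "cat.11"  "Rat.Fox"
```

Because the position comes from the column being inspected, the order of myobject does not matter.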

Data setup:
> mycols = qw("jackal.fox11.FAD cat.11.miss.DAD Rat.Fox.11.33.DDG")
> df = read.csv(textConnection("1,2,3"), header = FALSE)
> names(df) = qw("jackal Rat.Fox cat.11")
The business:
> names(df) = sapply(names(df), function(n) mycols[grepl(n, mycols)])
The result:
> names(df)
[1] "jackal.fox11.FAD" "Rat.Fox.11.33.DDG" "cat.11.miss.DAD"
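Props to @luke-singham for the basis of this approach.
qw is defined in my .Rprofile as in https://stackoverflow.com/a/31932661/338303 (if you don't have it, qw <- function(x) strsplit(x, " ")[[1]] is a minimal stand-in for these examples).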

If you can guarantee that the names are exactly as shown here, this is quite simple. However, that situation is trivial, so there seems to be little value in this solution over just names(df) <- myobject:
names(df)[c(grep(myobject[1], mycolumns), grep(myobject[2], mycolumns), grep(myobject[3], mycolumns))] <- myobject

Related

R function used to rename columns of a data frames

I have a data frame, say acs10. I need to relabel the columns. To do so, I created another data frame named labelName with two columns: the first column contains the old column names, and the second column contains the names I want to use, like the table below:
column_1    column_2
oldLabel1   newLabel1
oldLabel2   newLabel2
Then, I wrote a for loop to change the column names:
for (i in seq_len(nrow(labelName))){
names(acs10)[names(acs10) == labelName[i,1]] <- labelName[i,2]}
and it works.
However, when I tried to put the for loop into a function (because I need to rename the columns of other data frames as well), the function failed. The function I wrote looks like this:
renameDF <- function(dataF,varName){
for (i in seq_len(nrow(varName))){
names(dataF)[names(dataF) == varName[i,1]] <- varName[i,2]
print(varName[i,1])
print(varName[i,2])
print(names(dataF))
}
}
renameDF(acs10, labelName)
where dataF is the data frame whose names I need to change, and varName is another data frame in which old and new variable names are paired. I used print(names(dataF)) to debug, and the printout suggests that the function works. However, calling the function does not actually change the column names. I suspect it has something to do with scope, but I want to know how to make it work.
In your function you need to return the changed data frame, and then assign the result when you call it, e.g. acs10 <- renameDF(acs10, labelName). R functions modify a local copy of their arguments, not the caller's object.
renameDF <- function(dataF,varName){
for (i in seq_len(nrow(varName))){
names(dataF)[names(dataF) == varName[i,1]] <- varName[i,2]
}
return(dataF)
}
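You can also simplify this and avoid the for loop by using match (columns that are not listed in varName keep their old names):
renameDF <- function(dataF, varName){
  pos <- match(names(dataF), varName[[1]])  # row of varName for each column, NA if not listed
  names(dataF)[!is.na(pos)] <- as.character(varName[[2]])[pos[!is.na(pos)]]
  return(dataF)
}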
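To see why returning alone is not enough, here is a small sketch of R's copy-on-modify behaviour with the loop version of renameDF. The acs10 and labelName contents below are hypothetical stand-ins shaped like the question's table.

```r
# Hypothetical data matching the question's layout
labelName <- data.frame(column_1 = c("oldLabel1", "oldLabel2"),
                        column_2 = c("newLabel1", "newLabel2"),
                        stringsAsFactors = FALSE)
acs10 <- data.frame(oldLabel1 = 1:2, oldLabel2 = 3:4, other = 5:6)

renameDF <- function(dataF, varName) {
  for (i in seq_len(nrow(varName))) {
    names(dataF)[names(dataF) == varName[i, 1]] <- varName[i, 2]
  }
  dataF  # the renamed copy; the caller's object is untouched
}

renamed <- renameDF(acs10, labelName)
names(acs10)    # still the old names
names(renamed)  # "newLabel1" "newLabel2" "other"

acs10 <- renameDF(acs10, labelName)  # assign back to keep the change
```

stringsAsFactors = FALSE keeps the label columns as character, which matters on R versions before 4.0.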
This should do the whole thing in one line.
colnames(acs10)[colnames(acs10) %in% labelName$column_1] <- labelName$column_2[match(colnames(acs10)[colnames(acs10) %in% labelName$column_1], labelName$column_1)]
This version also works when a column name isn't in the data dictionary (that column keeps its name), but it's a bit more convoluted:
library(tibble)
df <- tribble(~column_1,~column_2,
"oldLabel1", "newLabel1",
"oldLabel2", "newLabel2")
d <- tibble(oldLabel1 = NA, oldLabel2 = NA, oldLabel3 = NA)
fun <- function(dat, dict) {
names(dat) <- sapply(names(dat), function(x) ifelse(x %in% dict$column_1, dict[dict$column_1 == x,]$column_2, x))
dat
}
fun(d, df)
You can create a function containing just one line of code (note that columns with no match in varName[[1]] get NA as their name):
renameDF <- function(df, varName){
setNames(df,varName[[2]][pmatch(names(df),varName[[1]])])
}

Removing a row by string-matching in R regardless of whether it exists or not

I am trying to remove a row in a dataframe based on string matching. I'm using:
data <- data[- grep("my_string", data$field1),]
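When there's an actual row with the value "my_string" in data$field1, this works as expected and drops that row. However, if there is no "my_string", it creates an empty data frame. How do I write this so that it allows for the string not existing, and still keeps my data frame intact?
It may be better to use grepl and negate it with !. When nothing matches, grep returns integer(0), and since -integer(0) is still integer(0), data[-integer(0), ] selects zero rows. grepl instead returns one TRUE or FALSE per row, so !grepl(...) keeps every row when there is no match: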
data[!grepl("my_string", data$field1),]
Another option is setdiff on grep:
data[setdiff(seq_len(nrow(data)), grep("my_string", data$field1)),]
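The pitfall can be reproduced in a few lines. This sketch uses a hypothetical three-row data frame and drop = FALSE so that the single-column result stays a data frame.

```r
# Hypothetical data with no "my_string" in it
data <- data.frame(field1 = c("a", "b", "c"), stringsAsFactors = FALSE)

# grep() finds nothing and returns integer(0); negating it changes nothing
no_hit <- grep("my_string", data$field1)
length(no_hit)                        # 0
nrow(data[-no_hit, , drop = FALSE])   # 0: every row is dropped

# grepl() returns one FALSE per row, so !grepl() keeps all of them
kept <- data[!grepl("my_string", data$field1), , drop = FALSE]
nrow(kept)                            # 3
```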
You can also use a plain if statement:
df <- data.frame(field = c("my_string", "my_string_not", "something", "something_else"),
numbers = 1:4)
result <- grep("gabriel", df$field)
if (length(result))
{
df <- df[- result, ]
}
df
result <- grep("my_string", df$field)
if (length(result))
{
df <- df[- result, ]
}
df

Converting list of Characters to Named num in R

I want to create a data frame with 3 columns.
#First column
name_list = c("ABC_D1", "ABC_D2", "ABC_D3",
"ABC_E1", "ABC_E2", "ABC_E3",
"ABC_F1", "ABC_F2", "ABC_F3")
df1 = data.frame(C1 = name_list)
The names in column 1 are the names of a set of cor.test results. The second column should contain the correlation coefficients, which I get with ABC_D1$estimate, ABC_D2$estimate, and so on.
My problem is that I don't want to add $estimate manually to every single name in the first column. I tried this:
df1$C2 = paste0(df1$C1, '$estimate')
But this doesn't work; it only gives me back the strings:
"ABC_D1$estimate", "ABC_D2$estimate", "ABC_D3$estimate",
"ABC_E1$estimate", "ABC_E2$estimate", "ABC_E3$estimate",
"ABC_F1$estimate", "ABC_F2$estimate", "ABC_F3$estimate"
class(df1$C2)
[1] "character"
How can I get the numeric result of ABC_D1$estimate into my data frame? How can I convert these characters into a named num? The 3rd column should consist of the $p.value results.
As pointed out by @DSGym, there are several problems, including that it is not very convenient to have a vector of character names; it would be better to have a list of objects instead.
Anyway, I think you can get where you want using:
estimates <- lapply(name_list, function(dat) {
dat_l <- get(dat)
dat_l[["estimate"]]
}
)
cbind(name_list, estimates)
This is not really advisable but given those premises...
OK, I think I now know what you need.
eval(parse(text = paste0("ABC_D1", '$estimate')))
You paste the two strings together and use parse and eval to get your result.
This is how to do it for your whole data.frame (map_dbl comes from the purrr package):
library(purrr)
name_list = c("ABC_D1", "ABC_D2", "ABC_D3",
"ABC_E1", "ABC_E2", "ABC_E3",
"ABC_F1", "ABC_F2", "ABC_F3")
df1 = data.frame(C1 = name_list)
df1$C2 <- map_dbl(paste0(df1$C1, '$estimate'), function(x) eval(parse(text = x)))
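A sketch of an alternative that avoids eval(parse()) and also fills the third column: base R's mget() fetches all the named objects at once, and vapply() extracts estimate and p.value from each. This is a swapped-in technique, not either answer's; the two cor.test objects below are built from hypothetical random data.

```r
# Hypothetical stand-ins for the question's cor.test results
set.seed(1)
x <- rnorm(20)
ABC_D1 <- cor.test(x, x + rnorm(20))
ABC_D2 <- cor.test(x, rnorm(20))

name_list <- c("ABC_D1", "ABC_D2")

# mget() looks up every object by name and returns a named list
tests <- mget(name_list)

df1 <- data.frame(
  C1 = name_list,
  C2 = vapply(tests, function(t) unname(t$estimate), numeric(1), USE.NAMES = FALSE),
  C3 = vapply(tests, function(t) t$p.value, numeric(1), USE.NAMES = FALSE)
)
df1
```

Keeping the cor.test objects in a list from the start (as @DSGym suggests) would make the mget() step unnecessary.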

Split the dataframe into subset dataframes and naming them on-the-fly (for loop)

I have 9880 records in a data frame. I am trying to split it into 9 groups of 1000 each plus a last group with the remaining 880 records, and to name them accordingly. I used a for loop for the first 9 groups and handled the last 880 records manually, but I am sure there are better ways to achieve this:
library(sqldf)
for (i in 0:8)
{
assign(paste("test",i,sep="_"),as.data.frame(final_9880[((1000*i)+1):(1000*(i+1)), (1:53)]))
}
test_9<- num_final_9880[9001:9880,1:53]
Also, I am unable to append all the parts in one for loop!
# append all parts
all_9880<-rbind(test_0,test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9)
Any help is appreciated, thanks!
A small variation on this solution
parts <- split(final_9880, rep(0:9, each = 1000, length.out = 9880)) # edited to Roman's suggestion; named parts so it doesn't mask ls()
for (i in 1:10) assign(paste("test", i, sep = "_"), parts[[i]])
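If the separate test_* variables are not strictly needed, the pieces can stay in the list that split() returns, and binding them back needs no hand-typed names. A sketch, assuming a hypothetical two-column final_9880:

```r
# Hypothetical stand-in for final_9880
final_9880 <- data.frame(id = 1:9880, value = runif(9880))

# Labels 0,0,...,9: ten groups of 1000, the last one gets the remaining 880
groups <- rep(0:9, each = 1000, length.out = nrow(final_9880))
parts <- split(final_9880, groups)
names(parts) <- paste("test", names(parts), sep = "_")  # test_0 ... test_9

vapply(parts, nrow, integer(1))

# Bind every piece back together in list order
all_9880 <- do.call(rbind, parts)
rownames(all_9880) <- NULL  # drop the list-name prefixes rbind adds
```

Individual pieces are then available as parts$test_3 or parts[["test_3"]].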
Your command for binding should then work (note that these pieces are named test_1 to test_10 rather than test_0 to test_9).
Edit
If you have many dataframes you can use a parse-eval combo. I use the package gsubfn for readability.
library(gsubfn)
nms <- paste("test", 1:10, sep="_", collapse=",")
eval(fn$parse(text='do.call(rbind, list($nms))'))
How does this work? First, I create a string containing the comma-separated list of the data frames:
> paste("test", 1:10, sep="_", collapse=",")
[1] "test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9,test_10"
Then I use this string to construct the list
list(test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9,test_10)
using parse and eval with string interpolation.
eval(fn$parse(text='list($nms)'))
String interpolation is implemented via the fn$ prefix of parse: it intercepts $nms and substitutes the string contained in the variable nms. Parsing and evaluating the string "list($nms)" then creates the list needed. In the solution, the rbind is included in the parse-eval combo as well.
EDIT 2
You can collect all variables with a certain pattern, put them in a list and bind them by rows.
do.call("rbind", sapply(ls(pattern = "test_"), get, simplify = FALSE))
ls finds all variables whose names match the pattern "test_"
sapply retrieves all those variables and stores them in a list
do.call passes the list elements to rbind, which binds them by rows.
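One caveat with the ls() route: ls() returns names sorted alphabetically, so test_10 typically comes right after test_1 and the rows end up out of order. A sketch of a workaround that builds the names in numeric order and fetches them with mget(); the one-row pieces are hypothetical.

```r
# Hypothetical pieces named test_1 ... test_10
for (i in 1:10) assign(paste("test", i, sep = "_"), data.frame(part = i))

ls(pattern = "^test_")  # alphabetical: "test_1" "test_10" "test_2" ...

# Names built in numeric order keep the intended row order
nms <- paste("test", 1:10, sep = "_")
combined <- do.call(rbind, mget(nms))
combined$part
```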
No for loop required: use split
data <- data.frame(a = 1:9880, b = sample(letters, 9880, replace = TRUE))
splitter <- (data$a-1) %/% 1000
.list <- split(data, splitter)
lapply(0:9, function(i){
assign(paste('test',i,sep='_'), .list[[(i+1)]], envir = .GlobalEnv)
return(invisible())
})
all_9880<-rbind(test_0,test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9)
identical(all_9880,data)
## [1] TRUE

Rename columns of a data frame by searching column name

I am writing a wrapper to ggplot to produce multiple graphs based on various datasets. As I am passing the column names to the function, I need to rename the column names so that ggplot can understand the reference.
However, I am struggling with renaming the columns of a data frame.
Here's a data frame:
df <- data.frame(col1=1:3,col2=3:5,col3=6:8)
Here are the column names to search for:
col1_search <- "col1"
col2_search <- "col2"
col3_search <- "col3"
And here are the column names to replace them with:
col1_replace <- "new_col1"
col2_replace <- "new_col2"
col3_replace <- "new_col3"
When I search for the column names, R sorts the column indexes and disregards the order of the search terms.
For example, when I run the following code, I expect the new headers to be new_col1, new_col2, and new_col3; instead, the new column names are new_col3, new_col2, and new_col1:
colnames(df)[names(df) %in% c(col3_search,col2_search,col1_search)] <- c(col3_replace,col2_replace,col1_replace)
Does anyone have a solution where I can search for column names and replace them in that order?
require(plyr)
df <- data.frame(col2=1:3,col1=3:5,col3=6:8)
df <- rename(df, c("col1"="new_col1", "col2"="new_col2", "col3"="new_col3"))
df
And you can be creative in making that second argument to rename so that it is not so manual.
> names(df)[grep("^col", names(df))] <-
paste("new", names(df)[grep("^col", names(df))], sep="_")
> names(df)
[1] "new_col1" "new_col2" "new_col3"
If you want to replace an ordered set of column names with an arbitrary character vector, then this should work:
names(df)[sapply(oldNames, grep, names(df) )] <- newNames
The sapply()-ed grep gives you the proper locations for the newNames vector. If you were building this into a function, you might want to check that there is a complete set of matches (note, for instance, that the pattern "col1" also matches "col10").
Hmm, this might be way too complicated, but it's the first thing that came to mind:
lookup <- data.frame(search = c(col3_search,col2_search,col1_search),
replace = c(col3_replace,col2_replace,col1_replace),
stringsAsFactors = FALSE)
# for each current column, find its row in lookup (NA if it isn't listed)
colnames(df) <- lookup$replace[match(colnames(df), lookup$search)]
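The direction of match() is what makes the lookup robust: match(names(df), lookup$search) returns, for each existing column, its row in the lookup table. A sketch with the columns deliberately shuffled, leaving any unlisted column untouched (the shuffled df is hypothetical):

```r
df <- data.frame(col2 = 1:3, col1 = 3:5, col3 = 6:8)  # columns out of order

lookup <- data.frame(search  = c("col3", "col2", "col1"),
                     replace = c("new_col3", "new_col2", "new_col1"),
                     stringsAsFactors = FALSE)

# Row of lookup for each current column; NA means "not in the table"
pos <- match(names(df), lookup$search)
names(df)[!is.na(pos)] <- lookup$replace[pos[!is.na(pos)]]
names(df)
# [1] "new_col2" "new_col1" "new_col3"
```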
I second @justin's aes_string suggestion. But for future renaming you can try:
require(stringr)
df <- data.frame(col1=1:3,col2=3:5,col3=6:8)
oldNames <- c("col1", "col2", "col3")
newNames <- c("new_col1", "new_col2", "new_col3")
names(df) <- str_replace(string=names(df), pattern=oldNames, replacement=newNames)
