rename columns containing pattern in r using plyr rename function - r

I would like to rename all columns in a data frame whose names contain a pattern in R. That is, I would like to rename every column whose name contains "variable", such as "htn.variable", to "variable". I thought I could use rename from plyr together with grepl. I have created an example:
exp<-data.frame(htn.variable = c(1,2,3), id = c(5,6,7), visit = c(1,3,4))
require(plyr)
rename ( exp, c(
names(exp)[grepl ( 'variable',names(exp))] = "variable" ))
But I get the following error:
Error: unexpected '=' in:
" c(
names(exp)[grepl ( 'variable',names(exp))] ="
I think this has to do with calling up a name within a function, and I would like to ask if anyone might have a suggestion how to make this work please? Thanks.

Why bother with rename at all?
colnames(exp)[grepl('variable',colnames(exp))] <- 'variable'

If you only want to replace the part of the column name that is equal to 'variable', use:
colnames(exp) <- gsub('variable', 'replace string', colnames(exp))
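The difference between the two base R approaches above can be seen side by side. This is a minimal sketch on a copy of the example data; the name df and the replacement string "value" are illustrative assumptions, not from the question.

```r
# Example data as in the question (named df here to avoid masking base::exp)
df <- data.frame(htn.variable = c(1, 2, 3), id = c(5, 6, 7), visit = c(1, 3, 4))

# Approach 1: replace the whole name of every column that contains the pattern
whole <- df
colnames(whole)[grepl("variable", colnames(whole), fixed = TRUE)] <- "variable"

# Approach 2: replace only the matching part of each name
partial <- df
colnames(partial) <- sub("variable", "value", colnames(partial), fixed = TRUE)

print(colnames(whole))
print(colnames(partial))
```

Note that the first approach gives every matching column the same name, so it is only safe when a single column matches.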

old <- names(exp)[grepl('variable', names(exp))]
rename(exp, setNames(rep("variable", length(old)), old))

I am not 100% sure this is what you need, but it may be a start. I stayed away from plyr:
for (i in 1:ncol(exp)){
if (substr(names(exp)[i],5,12) == "variable"){
names(exp)[i] <- "new.variable" #or any new var name
}
}
exp
You could also just remove the first four characters of the column name.

Related

Insert value in a column if value in another column contains a certain word/letter

I am trying to write some code that detects whether the values in a column contain "%". If so, set df$unit to "%"; if not, do nothing.
I tried the code below, but it sets "%" for all rows, even those whose values don't contain "%".
How should I fix it?
if(stringr::str_detect(df$variable, "%")) {
df$unit <- "%"
}
A tidyverse approach
library(dplyr)
library(stringr)
df %>%
mutate(unit = if_else(str_detect(variable,"%"),"%",unit))
Try the below:
library(stringr)
df[str_detect(df$variable, "%"), 'unit'] <- "%"
This doesn't need any extra libraries.
In base R, you can use replace with grepl.
df <- transform(df,unit = replace(unit, grepl('%', variable, fixed = TRUE), '%'))
Or
df$unit[grepl('%', df$variable, fixed = TRUE)] <- '%'
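The original attempt fails because if() expects a single TRUE or FALSE, whereas str_detect() and grepl() return one value per row. Older R versions used only the first element (with a warning), and recent versions raise an error instead. A minimal sketch of the vectorised alternative, using made-up sample data (the values in df are assumptions for illustration):

```r
# Hypothetical sample data: two rows contain "%", one does not
df <- data.frame(
  variable = c("10%", "5 kg", "3%"),
  unit = c(NA, "kg", NA),
  stringsAsFactors = FALSE
)

# One logical value per row, not a single condition
has_pct <- grepl("%", df$variable, fixed = TRUE)

# Assign only to the rows where the condition holds
df$unit[has_pct] <- "%"
print(df)
```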

Converting list of Characters to Named num in R

I want to create a dataframe with 3 columns.
#First column
name_list = c("ABC_D1", "ABC_D2", "ABC_D3",
"ABC_E1", "ABC_E2", "ABC_E3",
"ABC_F1", "ABC_F2", "ABC_F3")
df1 = data.frame(C1 = name_list)
The names in column 1 are the names of objects holding results of the cor.test function. The second column should consist of the correlation coefficients I get by writing ABC_D1$estimate, ABC_D2$estimate.
My problem is that I don't want to add the $estimate manually to every single name in the first column. I tried this:
df1$C2 = paste0(df1$C1, '$estimate')
But this doesn't work; it only gives me this back:
"ABC_D1$estimate", "ABC_D2$estimate", "ABC_D3$estimate",
"ABC_E1$estimate", "ABC_E2$estimate", "ABC_E3$estimate",
"ABC_F1$estimate", "ABC_F2$estimate", "ABC_F3$estimate")
class(df1$C2)
[1] "character"
How can I get the numeric result for ABC_D1$estimate in my dataframe? How can I convert these characters into Named num? The 3rd column should consist of the results of $p.value.
As pointed out by @DSGym, there are several problems, including that it is not very convenient to have a vector of object names as character strings; it would be better to have a list of the objects themselves.
Anyway, I think you can get where you want using:
estimates <- lapply(name_list, function(dat) {
dat_l <- get(dat)
dat_l[["estimate"]]
}
)
cbind(name_list, estimates)
This is not really advisable but given those premises...
OK, I think I now know what you need.
eval(parse(text = paste0("ABC_D1", '$estimate')))
You connect the two strings and use the functions parse and eval to get your results.
This is how to do it for your whole data.frame:
name_list = c("ABC_D1", "ABC_D2", "ABC_D3",
"ABC_E1", "ABC_E2", "ABC_E3",
"ABC_F1", "ABC_F2", "ABC_F3")
df1 = data.frame(C1 = name_list)
library(purrr) # provides map_dbl
df1$C2 <- map_dbl(paste0(df1$C1, '$estimate'), function(x) eval(parse(text = x)))
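As an alternative to eval(parse()), the objects can be looked up by name with mget() and both columns filled with vapply(). This is a sketch under stated assumptions: the two cor.test results below are generated from random data purely for illustration, and only two of the nine names are used.

```r
# Hypothetical stand-ins for the question's cor.test results
set.seed(1)
ABC_D1 <- cor.test(rnorm(20), rnorm(20))
ABC_D2 <- cor.test(rnorm(20), rnorm(20))
name_list <- c("ABC_D1", "ABC_D2")

# Fetch the objects by name from the current environment
tests <- mget(name_list, envir = environment())

df1 <- data.frame(
  C1 = name_list,
  # $estimate is a named numeric ("cor"); unname() drops the name
  C2 = vapply(tests, function(x) unname(x$estimate), numeric(1)),
  C3 = vapply(tests, function(x) x$p.value, numeric(1)),
  row.names = NULL
)
print(df1)
```

Storing the test results in a list from the start would make the lookup step unnecessary.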

How to define column specification for similarly named column with readr?

I have a data file with 250 columns and want to read only 50 of them, instead of loading all of them and then dropping columns with dplyr::select. I suppose I can do that using a column specification, but I don't want to type the column specification manually for all those columns.
The 50 columns I want to keep have a common prefix, say 'blop', so I managed to manually change the column specification object I got from readr::spec_csv. I then used it to read my data file:
short_colspec <- readr::spec_csv('myfile.csv')
short_colspec$cols <- lapply(names(short_colspec$cols), function(name){
if (substr(name, 1, 4) == 'blop'){
return(col_character())
} else {
return(col_skip())
}
})
short_data <- read_csv('myfile.csv', col_types = short_colspec)
Is there a way to specify such a column specification with readr (or any other package) functions in a more robust way than what I did?
Using data.table's fread, you can choose which columns to skip (drop) or keep (select):
library(data.table)
library(magrittr) # provides %>%
#read first line of file to select which columns to keep
#adjust the strsplit-character here ';' according to your csv-type
keep_col <- readLines( "myfile.csv", n = 1L ) %>% strsplit( ";" ) %>% unlist() %>% grep( "blop", . )
#read file, only the desired columns
fread( "myfile.csv", select = keep_col )
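If no extra package is wanted, a similar effect can be sketched in base R: read.csv skips any column whose colClasses entry is "NULL". The temporary file and its contents below are made up for illustration, and a comma separator is assumed.

```r
# Hypothetical small CSV standing in for 'myfile.csv'
csv_path <- tempfile(fileext = ".csv")
writeLines(c("blop_a,other,blop_b", "x,1,y", "z,2,w"), csv_path)

# Read only the first row to obtain the column names
header <- names(read.csv(csv_path, nrows = 1, check.names = FALSE))

# "NULL" tells read.csv to skip a column; keep 'blop' columns as character
col_classes <- ifelse(startsWith(header, "blop"), "character", "NULL")

short_data <- read.csv(csv_path, colClasses = col_classes)
print(short_data)
```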

Customizing make.names function in R?

I am automating an R script in which I have to use the make.names function. The default behavior of make.names is fine with me, but when my table name contains a "-", I want the result to be different.
For example, the current behavior:
> make.names("iris-ir")
[1] "iris.ir"
But I want it to modify only in the case when I have "-" present in table name:
> make.names("iris-ir")
[1] "iris_ir"
How can I achieve this? EDIT: using only builtin packages.
Use the following function:
library(dplyr)
make_names<-function(name)
{
name <- as.character(name)
# contains() returns the positions of matching names (integer(0) if none)
if(length(contains("-", vars = name)) > 0)
sub("-", "_", name)
else
name
}
This should do what you want.
Sorry, I forgot to mention that the contains function is in the dplyr package.
Without dplyr
make_names<-function(name)
{
name <- as.character(name)
if(grepl("-", name, fixed = TRUE))
sub("-", "_", name)
else
name
}
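The functions above only swap the first "-" and never call make.names, so other invalid characters would pass through unchanged. A minimal base R sketch that keeps the default make.names behaviour and only changes how "-" is handled (the function name make_names_underscore is a hypothetical choice):

```r
# Replace every "-" with "_" first, then let make.names handle the rest
make_names_underscore <- function(x) {
  make.names(gsub("-", "_", as.character(x), fixed = TRUE))
}

result <- make_names_underscore(c("iris-ir", "my table", "a-b-c"))
print(result)
```

Because "_" is already valid in R names, make.names leaves it alone, while characters such as spaces are still converted to "." as usual.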

Add a prefix to all rows in R

I am trying to add the prefix "end" to all rows in the column endsnp of a dataframe chrs:
Name endsnp
Bov001 Bov001
Bov002 Bov001
My expected output must be like that:
Name endsnp
Bov001 endBov001
Bov002 endBov001
I have tried chrs <- transform(chrs, endsnp = sprintf("end", endsnp)), but I get this output:
Name endsnp
Bov001 end
Bov002 end
Any ideas about my error? Thank you!
Just use paste0 to combine strings.
For example,
chrs$endsnp = paste0('end', chrs$endsnp)
or using paste and specifying the separator between the strings:
chrs$endsnp = paste('end', chrs$endsnp, sep='')
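As for the error in the original attempt: the format string "end" contains no placeholder, so sprintf has nowhere to put the column values. A short sketch showing that adding %s makes sprintf equivalent to paste0 here (the data frame is rebuilt from the question's example):

```r
chrs <- data.frame(
  Name = c("Bov001", "Bov002"),
  endsnp = c("Bov001", "Bov001"),
  stringsAsFactors = FALSE
)

# %s marks where each value of endsnp is inserted
with_sprintf <- sprintf("end%s", chrs$endsnp)
with_paste0 <- paste0("end", chrs$endsnp)

chrs$endsnp <- with_sprintf
print(chrs)
```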
