I am trying to add the prefix end to every value in the column endsnp of a data frame chrs:
Name endsnp
Bov001 Bov001
Bov002 Bov001
My expected output is:
Name endsnp
Bov001 endBov001
Bov002 endBov001
I have tried chrs <- transform(chrs, endsnp = sprintf("end", endsnp)), but I get this output:
Name endsnp
Bov001 end
Bov002 end
Any ideas about my error? Thank you!
Just use paste0 to combine strings.
For example,
chrs$endsnp = paste0('end', chrs$endsnp)
or use paste and specify the separator between the strings:
chrs$endsnp = paste('end', chrs$endsnp, sep='')
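As for the error itself: the format string "end" contains no conversion specification such as %s, so sprintf has nowhere to put the endsnp values and returns only the literal text. A minimal sketch, with the data frame rebuilt from the question as an assumption:

```r
# Toy data frame rebuilt from the question (assumed to hold character columns).
chrs <- data.frame(Name = c("Bov001", "Bov002"),
                   endsnp = c("Bov001", "Bov001"),
                   stringsAsFactors = FALSE)

# "%s" marks where each value of endsnp is substituted into the string.
with_sprintf <- sprintf("end%s", chrs$endsnp)

# paste0 gives the same result without a format string.
with_paste0 <- paste0("end", chrs$endsnp)

chrs$endsnp <- with_sprintf
print(chrs)
```

Either form can be used inside transform() as well, e.g. transform(chrs, endsnp = sprintf("end%s", endsnp)).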
Suppose I have the dataframe below:
>colnames(df)
>[1] "0939.HK.Open" "0939.HK.High" "0939.HK.Low" "0939.HK.Close" "0939.HK.Volume" "0939.HK.Adjusted"
Can I replace the prefix of each column name with a defined symbol?
>symbol<-'0123.KI'
And what I would like to obtain is:
>[1] "0123.KI.Open" "0123.KI.High" "0123.KI.Low" "0123.KI.Close" "0123.KI.Volume" "0123.KI.Adjusted"
Thank you so much guys.
Try:
colnames(df) <- paste0(symbol, ".", tools::file_ext(colnames(df)))
Since the column names look like filenames, I take the extension (the part after the last dot) with file_ext, then prepend the symbol using paste0.
Another way:
symbol <- '0123.KI'
colnames(df) <- gsub("0939.HK", symbol, colnames(df), fixed=TRUE)
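If the old prefix is not known in advance, a regex that replaces everything up to the last dot might also work. This is only a sketch on a plain character vector standing in for colnames(df); the names old_names and new_names are made up for illustration:

```r
# Column names from the question, held in a plain vector for illustration.
old_names <- c("0939.HK.Open", "0939.HK.High", "0939.HK.Low",
               "0939.HK.Close", "0939.HK.Volume", "0939.HK.Adjusted")
symbol <- "0123.KI"

# "^.*\\." is greedy, so it matches everything up to and including the LAST dot;
# that whole prefix is replaced by the new symbol plus a dot.
new_names <- sub("^.*\\.", paste0(symbol, "."), old_names)
print(new_names)
```

This assumes the field name after the last dot (Open, High, ...) never contains a dot itself.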
I have 4 values in the list
c("JSMITH_WWWFRecvd2001_asof_20220901.xlsx", "WSMITH_AMEXRecvd2002_asof_20220901.xlsx",
"PSMITH_WWWFRecvd2003_asof_20220901.xlsx", "QSMITH_AMEXRecvd2004_asof_20220901.xlsx")
I would like my outcome to be
"wwwf_01","amex_02","wwwf_03","amex_04"
You can use sub:
tolower(sub('.+_(.+)Recvd[0-9][0-9](..).+', '\\1_\\2', x))
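Something like this would work. You can extract the string you want with str_extract(), make it lower case with tolower(), and paste a zero-padded counter onto the end with a "_" separator. Note that the counter comes from the position in the vector, not from the digits in the filename; the two happen to agree for this input.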
paste(tolower(stringr::str_extract(x,"WWWF|AMEX" )), sprintf("%02d",seq_along(x)), sep = "_")
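The two answers differ in where the number comes from, so here is a base R sketch that takes both pieces from the filename itself. It assumes every name follows the NAME_CODERecvdYYNN_... layout shown in the question; the variable names are illustrative only:

```r
x <- c("JSMITH_WWWFRecvd2001_asof_20220901.xlsx",
       "WSMITH_AMEXRecvd2002_asof_20220901.xlsx",
       "PSMITH_WWWFRecvd2003_asof_20220901.xlsx",
       "QSMITH_AMEXRecvd2004_asof_20220901.xlsx")

# Upper-case code sitting between the first underscore and "Recvd".
code <- tolower(sub("^[^_]+_([A-Z]+)Recvd.*$", "\\1", x))

# Last two of the four digits that follow "Recvd".
num <- sub("^[^_]+_[A-Z]+Recvd[0-9]{2}([0-9]{2})_.*$", "\\1", x)

result <- paste(code, num, sep = "_")
print(result)
```

If the files can arrive in a different order, this version keeps the number tied to the filename rather than to the position in the vector.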
I have a list of filenames in the format "filename PID00-00-00" or just "PID00-00-00".
I want to extract part of the filename to create an ID column.
I am currently using this code for the string extraction
names(df) <- stringr::str_extract(names(df), "(?<=PID)\\d+")
binded1 <- rbindlist(df, idcol = "ID") %>%
  as.data.frame()
This gives the ID as the first set of digits after PID. e.g. filename PID1234-00-01 becomes ID 1234.
I want to also extract the first hyphen and following digits. So from filename PID1234-00-01 I want 1234-00.
What should my regex be?
Try this:
stringr::str_extract(names(df),"(?<=PID)\\d{4}-\\d{2}")
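The pattern above assumes exactly four digits before the hyphen and two after it. If the lengths can vary, something like the following base R sketch may be closer to what is needed; the sample names and the variable ids are assumptions for illustration:

```r
# Example names; the third has no PID and should give NA.
x <- c("filename PID1234-00-01", "PID987-12-03", "no id here")

# Lookbehind for "PID", then digits, one hyphen, and digits again.
pattern <- "(?<=PID)\\d+-\\d+"
m <- regexpr(pattern, x, perl = TRUE)

# regmatches() drops non-matches, so fill a vector of NAs to keep positions aligned.
ids <- rep(NA_character_, length(x))
ids[m > 0] <- regmatches(x, m)
print(ids)
```

The same pattern should also work in stringr::str_extract(), which returns NA for non-matching names on its own.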
I am using R to do some data pre-processing, and here is the problem I am facing: I read the data with read.csv(filename,header=TRUE), and the spaces in variable names become ".". For example, a variable named Full Code becomes Full.Code in the resulting data frame. After processing, I use write.xlsx(filename) to export the results, and the exported file carries the changed variable names. How can I keep the original names?
Also, in the output .xlsx file the first column contains row indices (i.e., 1 to N), which is not what I expect.
If you set check.names=FALSE in read.csv when you read the data in, the names will not be changed and you will not need to edit them before writing the data back out. This of course means that you need to quote the column names (with backticks in some cases) or refer to the columns by position rather than by name while editing.
To get spaces back in the names, do this (right before you export - R does let you have spaces in variable names, but it's a pain):
# A simple regular expression to replace dots with spaces
# This might have unintended consequences, so be sure to check the results
names(yourdata) <- gsub(x = names(yourdata),
                        pattern = "\\.",
                        replacement = " ")
To drop the first-column index, just add row.names = FALSE to your write.xlsx(). That's a common argument for functions that write out data in tabular format (write.csv() has it, too).
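To see check.names = FALSE and row.names = FALSE working together, here is a small sketch. It uses inline CSV text and write.csv() so that it runs without any extra package; the column names and the temp file are made up for illustration:

```r
# Inline CSV standing in for the real file; "Full Code" contains a space.
csv_text <- "Full Code,Value\nA1,10\nB2,20"

# Default behaviour: names are made syntactic, so the space becomes a dot.
default_names <- names(read.csv(text = csv_text, header = TRUE))

# check.names = FALSE keeps the header exactly as written.
raw <- read.csv(text = csv_text, header = TRUE, check.names = FALSE)
kept_names <- names(raw)

# Columns whose names contain spaces need backticks (or [[ ]]) when accessed.
raw$`Full Code` <- toupper(raw$`Full Code`)

# row.names = FALSE suppresses the 1..N index column on export.
out_file <- tempfile(fileext = ".csv")
write.csv(raw, out_file, row.names = FALSE)
header_line <- readLines(out_file)[1]
print(header_line)
```

The row.names = FALSE argument is passed the same way to write.xlsx(), as noted above.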
Here's a function (sorry, I know it could be refactored) that makes nice column names even if there are multiple consecutive dots and trailing dots:
makeColNamesUserFriendly <- function(ds) {
  # FIXME: Repetitive.
  # Convert any number of consecutive dots to a single space.
  names(ds) <- gsub(x = names(ds),
                    pattern = "(\\.)+",
                    replacement = " ")
  # Drop the trailing spaces.
  names(ds) <- gsub(x = names(ds),
                    pattern = "( )+$",
                    replacement = "")
  ds
}
Example usage:
ds <- makeColNamesUserFriendly(ds)
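A shorter variant of the same idea, as a sketch: collapse runs of dots with gsub() and let trimws() remove the leftover spaces. Unlike the function above, trimws() also strips leading spaces, which may or may not be wanted. The sample names are invented:

```r
# Names as read.csv might produce them from headers with spaces and punctuation.
raw_names <- c("Full.Code", "Unit..Price", "Notes.")

# One or more dots become a single space; trimws() then removes
# whitespace left at either end of each name.
clean_names <- trimws(gsub("\\.+", " ", raw_names))
print(clean_names)
```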
Just to add to the answers already provided, here is another way to replace the "." (or any other punctuation) in column names, using a regex with the stringr package:
library(stringr)
colnames(data) <- str_replace_all(colnames(data), "[:punct:]", " ")
For example try:
data <- data.frame(variable.x = 1:10, variable.y = 21:30, variable.z = "const")
colnames(data) <- str_replace_all(colnames(data), "[:punct:]", " ")
and
colnames(data)
will give you
[1] "variable x" "variable y" "variable z"
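For anyone who prefers not to load stringr, the base R counterpart should behave the same way on this example. Note that in base regex the POSIX class has to sit inside a bracket expression, i.e. "[[:punct:]]". A minimal sketch with the same toy data:

```r
data <- data.frame(variable.x = 1:10, variable.y = 21:30, variable.z = "const")

# "[[:punct:]]" matches any single punctuation character, including the dot.
colnames(data) <- gsub("[[:punct:]]", " ", colnames(data))
print(colnames(data))
```

As with the stringr version, this replaces every punctuation character, so underscores or hyphens in the names would turn into spaces too.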
I would like to rename all columns in a data frame whose names contain a pattern in R. That is, I would like every column whose name contains "variable", such as "htn.variable", to be renamed to "variable". I thought I could use rename from plyr together with grepl. I have created an example:
exp<-data.frame(htn.variable = c(1,2,3), id = c(5,6,7), visit = c(1,3,4))
require(plyr)
rename ( exp, c(
names(exp)[grepl ( 'variable',names(exp))] = "variable" ))
But I get the following error:
Error: unexpected '=' in:
" c(
names(exp)[grepl ( 'variable',names(exp))] ="
I think this has to do with calling up a name within a function, and I would like to ask if anyone might have a suggestion how to make this work please? Thanks.
Why bother with rename at all?
colnames(exp)[grepl('variable',colnames(exp))] <- 'variable'
If you only want to replace the part of the column name that is equal to 'variable', use:
colnames(exp) <- gsub('variable', 'replace string', colnames(exp))
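On the error in the question: inside c(), the left-hand side of = must be a plain name or a string, not an expression like names(exp)[...], which is why the parser stops at the =. A named vector with computed names can be built with setNames() instead. The sketch below shows that and the two base R renames on a copy of the example data (called exp_df here, an illustrative name, to avoid masking base::exp):

```r
exp_df <- data.frame(htn.variable = c(1, 2, 3), id = c(5, 6, 7), visit = c(1, 3, 4))

# Names of the columns that contain the literal text "variable".
old <- names(exp_df)[grepl("variable", names(exp_df), fixed = TRUE)]

# Named vector in old = new form, built without writing the name literally.
replace_map <- setNames(rep("variable", length(old)), old)

# Option 1: overwrite the whole name of each matching column.
renamed <- exp_df
names(renamed)[names(renamed) %in% old] <- "variable"

# Option 2: strip only the "htn." prefix and leave other names untouched.
stripped <- exp_df
names(stripped) <- sub("^htn\\.", "", names(stripped))

print(names(renamed))
print(names(stripped))
```

If more than one column matches, option 1 produces duplicate column names, so option 2 (or a pattern-specific replacement) is probably safer in that case.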
rename(exp, "variable" = names(exp)[grepl('variable', names(exp))])
I am not 100% sure if this is what you need, but it may be a start. I stayed away from plyr.
for (i in seq_len(ncol(exp))) {
  # Characters 5 to 12 of "htn.variable" spell "variable".
  if (substr(names(exp)[i], 5, 12) == "variable") {
    names(exp)[i] <- "new.variable" # or any new var name
  }
}
exp
You could also just remove the first four characters of the variable name.
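A sketch of that last idea, assuming the prefix to drop is always the four characters "htn." (the vector nm stands in for names(exp)):

```r
nm <- c("htn.variable", "id", "visit")

# Only touch names that actually start with the prefix.
has_prefix <- startsWith(nm, "htn.")

# substring(x, 5) keeps everything from the fifth character onward.
nm[has_prefix] <- substring(nm[has_prefix], 5)
print(nm)
```

Checking the prefix first avoids chopping four characters off columns such as visit that do not carry it.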