Substitute multiple periods in all column names in R [duplicate] - r

This question already has answers here:
R: How to replace . in a string?
(5 answers)
Closed 2 years ago.
I have the following data.frame.
df = data.frame(a.dfs.56=c(rep("a",8), rep("b",5), rep("c",7), rep("d",10)),
b.fqh.28=rnorm(30, 6, 2),
c.34.2.fgs=rnorm(30, 12, 3.5),
d.tre.19.frn=rnorm(30, 8, 3)
)
How can I substitute all periods "." in the column names to have them become dashes "-"?
I am aware of options like check.names=FALSE when using read.table or data.frame, but in this case, I cannot use this.
I have also tried variations of the following posts, but they did not work for me.
Specifying column names in a data.frame changes spaces to "."
How can I use gsub in multiple specific column in r
R gsub column names in all data frames within a list
Thank you.

You can use gsub to replace characters in the column names:
names(df) <- gsub(".", "-", names(df), fixed=TRUE)
Note that you need fixed=TRUE because normally gsub expects regular expressions and . is a special regular expression character.
But be aware that - is a non-standard character for variable names. If you try to use those columns with functions that use non-standard evaluation, you will need to surround the names in back-ticks to use them. For example
dplyr::filter(df, `a-dfs-56`=="a")
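The same back-tick rule applies in base R. The sketch below uses a small hypothetical data frame (not the one from the question) and is only meant to illustrate how dashed names can be accessed:

```r
# Hypothetical two-column frame; check.names = FALSE keeps the dashes as typed.
df <- data.frame(`a-dfs-56` = c("a", "a", "b"),
                 `b-fqh-28` = c(5.1, 6.3, 7.2),
                 check.names = FALSE,
                 stringsAsFactors = FALSE)

# A bare a-dfs-56 would be parsed as a subtraction, so the name must be quoted.
first_col <- df$`a-dfs-56`
same_col  <- df[["a-dfs-56"]]
only_a    <- subset(df, `a-dfs-56` == "a")

# make.names() shows the syntactic form R would otherwise fall back to.
syntactic <- make.names("a-dfs-56")
```

Indexing with `[[ ]]` and a character string avoids the back-ticks entirely, which can be handy when the names are built programmatically.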

gsub("\\.", "-", names(df)) is the regex (regular expressions) way. The . is a special symbol in regex that means "match any single character". That's why the fixed = TRUE argument is included in MrFlick's answer.
The \\ (escape) tells R that we want the literal period and not the special symbol that it represents.
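To make the difference concrete, here is a small sketch comparing the variants on two of the column names from the question. The vector name `cols` is made up for illustration:

```r
cols <- c("a.dfs.56", "c.34.2.fgs")

# Unescaped "." is a regex wildcard, so every character becomes a dash.
wildcard <- gsub(".", "-", cols)

# Three ways to match a literal period instead.
fixed_way   <- gsub(".", "-", cols, fixed = TRUE)  # no regex at all
escaped_way <- gsub("\\.", "-", cols)              # escaped metacharacter
class_way   <- gsub("[.]", "-", cols)              # "." is literal inside [ ]
```

The last three calls should give the same result; only the first one mangles the names.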

Related

R Combine two gsub() statements [duplicate]

This question already has answers here:
Replace multiple strings in one gsub() or chartr() statement in R?
(9 answers)
Closed 1 year ago.
I have these two lines of R codes:
df$symbol <- gsub("\\^", "-P", df$symbol) # find "^" and change it to "-P"
df$symbol <- gsub("/", "-", df$symbol) # find "/" and change it to "-"
How can I combine them into one line?
Thank you!
Given that you have two different replacement strings, there may not be a way to do this with just a single call to gsub. However, you could chain two calls to gsub here:
df$symbol <- gsub("/", "-", gsub("\\^", "-P", df$symbol))
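As a rough illustration, the nested call can be tried on a few made-up ticker symbols. The second half is one possible way to generalise the chaining with Reduce; the `rules` vector is an assumption for this sketch, not part of the original answer:

```r
symbol <- c("BRK/B", "BAC^A", "SPY")  # made-up sample values

# Nested calls: the inner gsub runs first, the outer one works on its result.
nested <- gsub("/", "-", gsub("\\^", "-P", symbol))

# The same idea with a named vector of pattern = replacement pairs,
# applied one after another in the order they are listed.
rules <- c("\\^" = "-P", "/" = "-")
chained <- Reduce(function(x, pattern) gsub(pattern, rules[[pattern]], x),
                  names(rules), symbol)
```

With only two substitutions the nested form is probably the simplest; the Reduce version starts to pay off when the list of rules grows.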

Split column names in R using a separator [duplicate]

This question already has answers here:
Extracting numbers from vectors of strings
(12 answers)
Closed 2 years ago.
I have a dataframe X with column names such as
1_abc,
2_fgy,
27_msl,
936_hhq,
3_hdv
I want to just keep the numbers as the column name (so instead of 1_abc, just 1). How do I go about removing it while keeping the rest of the data intact?
All column names have an underscore as the separator between the numeric and character parts. There are about 400 columns, so I want to be able to code this without using specific column names.
You may use sub here for a base R option:
names(df) <- sub("^(\\d+).*$", "\\1", names(df))
Another option might be:
names(df) <- sub("_.*", "", names(df))
This would just strip off everything from the first underscore until the end of the column name.
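A quick sketch of both patterns on the names from the question, plus one hypothetical name without a leading number (`abc_1`) to show where the two options differ:

```r
nm <- c("1_abc", "2_fgy", "27_msl", "936_hhq", "3_hdv")

# Option 1: capture the leading digits and drop everything after them.
digits_only <- sub("^(\\d+).*$", "\\1", nm)

# Option 2: delete from the first underscore to the end of the name.
before_underscore <- sub("_.*", "", nm)

# The two differ on a name that does not start with digits:
# option 1 leaves it untouched, option 2 still cuts at the underscore.
odd_1 <- sub("^(\\d+).*$", "\\1", "abc_1")
odd_2 <- sub("_.*", "", "abc_1")
```

Note that a data frame with names such as "1" or "27" has non-syntactic column names, so they need back-ticks or `[[ ]]` when accessed.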

Split column names in R [duplicate]

This question already has answers here:
Remove all characters before a period in a string
(5 answers)
Closed 4 years ago.
I have a data frame like below.
df:
X1.Name X1.ID X1.Prac X1.SCD
But, I need to split the column name by dot and display as,
output df:
Name ID Prac SCD
Using sub:
names(df) <- sub("^[^.]+\\.", "", names(df))
The regex pattern I used will match everything from the start of the string up to, and including, the first dot. Then, it replaces that, and only that, with empty string.
^ from the start of the string
[^.]+ match one or more characters which are NOT dots
\\. then match a literal dot
We then replace this entire pattern with empty string "", i.e. we remove it from the original string.
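The pattern can be checked on the column names from the question. The extra name `X1.a.b` is made up here to show that only the part up to the first dot is removed:

```r
nm <- c("X1.Name", "X1.ID", "X1.Prac", "X1.SCD")

# Remove everything from the start of the string through the first dot.
stripped <- sub("^[^.]+\\.", "", nm)

# Only the prefix before the first dot goes; later dots are kept.
multi_dot <- sub("^[^.]+\\.", "", "X1.a.b")

# A name without any dot does not match and is returned unchanged.
no_dot <- sub("^[^.]+\\.", "", "Name")
```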

Inserting character into variable names [duplicate]

This question already has answers here:
Insert a character at a specific location in a string
(8 answers)
Closed 5 years ago.
I have a dataset with variable names such as FamId00 and ISCO8899 and would like to write a command to insert an underscore before the last two digits, which represent years. What is the best way of doing it? I have tried with regex, but the furthest I got was:
gsub('.{2}$', '', varname)
which gives me:
FamId
How do I add '_' and the original last two digits back? Also, I have variables in the dataset that do not have the year in the last two digits (e.g. ID and sex). Is there a way to keep the regular expression from affecting those?
We don't need gsub; sub is enough, since there is only a single replacement per string. Capture the last two characters as a group ((...)) and, in the replacement, use _ followed by the backreference to that capture group:
sub("(.{2})$", "_\\1", varname)
#[1] "FamId_00" "ISCO88_99"
The . is a metacharacter implying any character. If this needs to be specific i.e. digits, use \\d{2} in place of .{2}
data
varname <- c("FamId00", "ISCO8899")
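To address the second part of the question, the sketch below adds the two year-less names mentioned there (ID and sex) to the sample vector and compares the two patterns. The vector name `vars` is made up for illustration:

```r
vars <- c("FamId00", "ISCO8899", "ID", "sex")

# .{2} matches ANY two trailing characters, so year-less names change too.
any_two <- sub("(.{2})$", "_\\1", vars)

# \\d{2} only matches two trailing digits, so ID and sex are left alone.
digits_two <- sub("(\\d{2})$", "_\\1", vars)
```

This assumes every name ending in two digits really does end in a year; a name such as a hypothetical `item12` would also get an underscore.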
An alternative solution, still using sub() or gsub(), but with a different pattern:
ids <- c("FamId00", "ISCO8899")
gsub("(^.*)([[:digit:]]{2}$)", "\\1_\\2", ids)
[1] "FamId_00" "ISCO88_99"

Convert commas to points in a column of a data set in R

I've imported a dataset from Excel. It has a column 'Height' in which I want to replace the ',' with '.'.
I tried this command, but it gives me an error.
apply(apply(DATASET$Height, 2, gsub, patt=",", replace="."), 2, as.numeric)
Thank you very much for your help
To recode column 'Height' in data frame 'DATASET':
DATASET$Height <- gsub(",",".",DATASET$Height,fixed=TRUE)
Any errors? If not, you can proceed to convert the column to numeric.
Get errors when converting to numeric? Perhaps you have still other characters besides "," that prevent R from reading the values as numbers. In that case you would need to apply gsub a second time to remove all non-numeric characters.
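A minimal sketch of those steps, using made-up height values in which one entry carries a stray unit. The variable names and the clean-up pattern are assumptions for illustration:

```r
height_raw <- c("1,75", "1,80", "1,62 m")  # made-up sample values

# Step 1: swap the decimal comma for a point (fixed = TRUE, no regex needed).
height_dot <- gsub(",", ".", height_raw, fixed = TRUE)

# Step 2: drop anything that is not a digit or a point, e.g. the " m".
height_clean <- gsub("[^0-9.]", "", height_dot)

# Step 3: convert to numeric.
height_num <- as.numeric(height_clean)
```

The clean-up pattern in step 2 would also strip minus signs and exponents, so it is only suitable for plain positive decimals like these.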
First, check that the column is of type character. Then you could split each string at the comma and paste the pieces back together with a dot.
Suppose a is what you get with DATASET[["Height"]]:
a <- c("234,23", "2314,54", "234,65")
Then, with sapply, you can split and collapse each character element:
b <- sapply(a,
            function(string) {
              paste0(unlist(strsplit(string, split = ",")), collapse = ".")
            })
Now, you can replace the DATASET[["Height"]] with b.
