Remove part of column name after the second "_" [duplicate] - r

This question already has answers here:
Exclude everything after the second occurrence of a certain string
(2 answers)
Closed 3 years ago.
I have a vector which has names of the columns
group <- c("amount_bin_group", "fico_bin_group", "cltv_bin_group", "p_region_bin")
I want to remove the part after the second "_" from each element, i.e., I want it to be
group <- c("amount_bin", "fico_bin", "cltv_bin", "p_region")
I could split this into two vectors and try gsub or substr, but it would be nice to do this in a single vectorized step. Any thoughts?
I checked other posts on the same question, but none of them covers this case.

> sub("(.*)_.*$", "\\1", group)
[1] "amount_bin" "fico_bin" "cltv_bin" "p_region"
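Note that the pattern above strips everything from the last underscore onward. That coincides with the second underscore here only because every name contains exactly two. A minimal sketch that anchors on the second underscore explicitly, assuming names may contain more than two (the extra element "a_b_c_d" is hypothetical, added only for illustration):

```r
group <- c("amount_bin_group", "fico_bin_group", "cltv_bin_group",
           "p_region_bin", "a_b_c_d")  # last element is a made-up extra case

# Capture "text_text" at the start, then drop the second "_" and the rest.
# Names with fewer than two underscores do not match and are left unchanged.
trimmed <- sub("^([^_]*_[^_]*)_.*$", "\\1", group)
trimmed
```

With the greedy "(.*)_.*$" pattern, the hypothetical "a_b_c_d" would instead become "a_b_c".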

Related

Restructuring column names in a df [duplicate]

This question already has answers here:
Getting and removing the first character of a string
(7 answers)
Remove prefix letter from column variables
(3 answers)
Closed 2 years ago.
I've got a data with column names that look like this:
X121.10.21 X131.90.23
I want to remove the X at the beginning of each string, remove the third number after the . and then reorder the first and second number. Like this:
10.121 90.131
How can I do this? I would especially appreciate a way to do this with dplyr, if possible.
We can use sub: capture the first and second numbers as groups, then swap them in the replacement using backreferences to the captured groups
names(df1) <- sub("X(\\d+)\\.(\\d+)\\..*", "\\2.\\1", names(df1))
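A small self-contained sketch of the same substitution, assuming a data frame whose columns carry the names from the question (the data frame df1 and its values are made up for illustration, and the pattern is anchored with ^ and $ as an extra precaution):

```r
# Hypothetical data frame with the column names shown in the question
df1 <- data.frame(X121.10.21 = 1:2, X131.90.23 = 3:4)

# Group 1 = first number, group 2 = second number; the third number is dropped
new_names <- sub("^X(\\d+)\\.(\\d+)\\..*$", "\\2.\\1", names(df1))
names(df1) <- new_names
names(df1)
```

For a dplyr pipeline, the same sub call could be wrapped in a function and passed to rename_with(); this sketch sticks to base R to avoid the dependency.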

How do I shorten row names in R? Please read below [duplicate]

This question already has answers here:
R - remove anything after comma from column
(5 answers)
Closed 2 years ago.
I have a table of values in R where the row names are very long. I want to shorten them. My row names look like this:
GSM1051550_7800246087_R02C01
I want to rename every row to only have the first part of the name, i.e., GSM1051550. How can I do this in R?
Building on jay.sf's comment (assuming your table is named ABC):
row.names(ABC) <- sub("\\_.*", "", row.names(ABC))
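A minimal runnable sketch of this, assuming a small table with row names shaped like the one in the question (the second row name and the value column are invented for illustration):

```r
# Hypothetical table; only the first row name comes from the question
ABC <- data.frame(value = c(0.1, 0.2),
                  row.names = c("GSM1051550_7800246087_R02C01",
                                "GSM1051551_7800246087_R03C01"))

# Drop the first underscore and everything after it
row.names(ABC) <- sub("_.*", "", row.names(ABC))
row.names(ABC)
```

Row names of a data frame must be unique, so the assignment fails if two shortened names collide.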

Only keep part before the 2nd pattern in R [duplicate]

This question already has answers here:
How to delete everything after nth delimiter in R?
(2 answers)
Remove text after second colon
(3 answers)
Remove all characters after the 2nd occurrence of "-" in each element of a vector
(1 answer)
Closed 3 years ago.
How could I remove everything before the second occurrence of a pattern in a data frame using R?
I used:
for (i in 1:length(df1)) {
  df1[, i] <- gsub(".*_", "", df1[, i])
}
But I guess there is a better way to apply this to the whole data frame?
Here is an example of values in the data frame:
name_000004_A_B_C
name_00003_C_D
and I would like to get
A_B_C
C_D
Thank you for your help.
x <- c("name_000004_A_B_C", "name_00003_C_D")
gsub("(name_[0-9]*_)(.*)", "\\2", x)
#[1] "A_B_C" "C_D"
More generalised:
gsub("([a-z0-9]*_[a-z0-9]*_)(.*)", "\\2", x)
#[1] "A_B_C" "C_D"
The substitution uses two capture groups: the first, (name_[0-9]*_), matches the prefix up to and including the second underscore, and the second, (.*), matches whatever comes after. The replacement "\\2" keeps only the second group. Hope this helps!
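To apply the same idea to every column without an explicit loop, one option is lapply over the data frame. This is a sketch, assuming all columns are character vectors; the data frame df1 below and its second column are hypothetical, and the pattern uses [^_]* rather than the character classes above so that any non-underscore text is accepted:

```r
# Hypothetical data frame of character columns shaped like the question's values
df1 <- data.frame(a = c("name_000004_A_B_C", "name_00003_C_D"),
                  b = c("id_1_X_Y", "id_22_Z"),
                  stringsAsFactors = FALSE)

# Drop everything up to and including the second underscore, column by column.
# Assigning into df1[] keeps the data frame structure and column names.
df1[] <- lapply(df1, function(col) sub("^[^_]*_[^_]*_", "", col))
df1
```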

Difference between [A-Z] and LETTERS in grep [duplicate]

This question already has answers here:
Matching multiple patterns
(6 answers)
Closed 5 years ago.
I am trying to keep only the rows whose id contains letters, and I find that the following two approaches give different results.
df[grep("[A-Z]",df$id),]
df[grep(LETTERS,df$id),]
It seems the second way will omit many rows that actually have letters.
Why?
If you want to match any of several patterns stored in a vector, try this:
to_match <- paste(LETTERS, collapse = "|")
to_match
[1] "A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z"
and then
df[grep(to_match, df$id), ]
Explanation:
You will match any of the characters in "to_match" since they are separated by the "or" operator "|".
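As for why the second call omits rows: grep expects a single pattern, and when it is given a vector of length 2 or more it uses only the first element, with a warning. So grep(LETTERS, df$id) only looks for "A". A minimal sketch of the three variants, using a made-up id vector:

```r
id <- c("A12", "b7Z", "345", "Q1")  # hypothetical ids

# Character class: any uppercase letter
by_class <- grep("[A-Z]", id)

# Vector pattern: only LETTERS[1], i.e. "A", is used (R also warns about this)
by_vector <- suppressWarnings(grep(LETTERS, id))

# Alternation built from the vector: "A|B|...|Z"
by_alternation <- grep(paste(LETTERS, collapse = "|"), id)

by_class
by_vector
by_alternation
```

Here the character class and the alternation select the same elements, while the vector pattern only finds the id that contains "A".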

Remove underscore from a string in R [duplicate]

This question already has answers here:
Replace specific characters within strings
(7 answers)
Closed 7 years ago.
In my data.frame, I have a column of type character, where all the values look like this: 123_456 (three digits, an underscore, three digits).
I need to convert these values to numeric, and as.numeric(my_dataframe$my_column) gives me NA. Therefore I need to remove the underscore first, so that as.numeric can parse the value.
How would I do that, please?
Thanks
We can use sub to remove the underscore before converting:
as.numeric(sub("_", "", my_dataframe$my_column))
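A small runnable sketch of the conversion, assuming a data frame with the column name from the question (the values are invented). It passes fixed = TRUE so that "_" is treated as a literal string rather than a regular expression, which is a choice made here and not part of the original answer:

```r
# Hypothetical data; the column name follows the question
my_dataframe <- data.frame(my_column = c("123_456", "789_012"),
                           stringsAsFactors = FALSE)

# sub removes the first underscore only, which is enough for "123_456"
converted <- as.numeric(sub("_", "", my_dataframe$my_column, fixed = TRUE))
converted

# gsub removes every underscore, in case a value contains more than one
all_removed <- gsub("_", "", "1_2_3", fixed = TRUE)
all_removed
```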
