Restructuring column names in a df [duplicate] - r

I've got a data frame with column names that look like this:
X121.10.21 X131.90.23
I want to remove the X at the beginning of each string, drop the third number (everything after the second .), and then swap the first and second numbers. Like this:
10.121 90.131
How can I do this? I would especially appreciate a way to do this with dplyr, if possible.

We can use sub: capture the first two numbers as groups and replace the whole name with the backreferences to those groups in swapped order.
names(df1) <- sub("X(\\d+)\\.(\\d+)\\..*", "\\2.\\1", names(df1))
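To see what the pattern does, here is a minimal sketch on the two column names from the question. The vector `old_names` and the toy `df1` are made up for illustration, and the leading `^` anchor is an addition that is not in the answer above. Since the question asks for dplyr, the second half applies the same `sub()` call through `dplyr::rename_with()`, assuming dplyr 1.0.0 or later is installed.

```r
# Hypothetical column names, copied from the question
old_names <- c("X121.10.21", "X131.90.23")

# Group 1 = first number, group 2 = second number.
# Everything after the second dot is matched by .* and dropped.
new_names <- sub("^X(\\d+)\\.(\\d+)\\..*", "\\2.\\1", old_names)
print(new_names)

# Same renaming with dplyr (skipped if dplyr is not installed)
if (requireNamespace("dplyr", quietly = TRUE)) {
  df1 <- data.frame(X121.10.21 = 1:2, X131.90.23 = 3:4)  # toy data
  df1 <- dplyr::rename_with(
    df1,
    function(nm) sub("^X(\\d+)\\.(\\d+)\\..*", "\\2.\\1", nm)
  )
  print(names(df1))
}
```

Note that the resulting names start with a digit, so they are non-syntactic in R and have to be wrapped in backticks when referenced with `$`.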

Related

I want to put in one row several rows of the same column depending on the characteristics of another column. Any suggestions on how to do that in R? [duplicate]

I have a data frame, df1, and I want to combine the rows of specific columns based on the values of another column.
One option is aggregate with a formula: the right-hand side names the grouping column ('Name'), the . on the left-hand side stands for all the other columns ('Likes', 'How many hrs spend liking'), and toString pastes the values within each group together.
aggregate(. ~ Name, df1, FUN = toString)
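The original data are not shown, so the sketch below uses a made-up `df1` with the hypothetical columns `Name`, `Likes` and `Hours` (a syntactic stand-in for 'How many hrs spend liking') to illustrate what the call returns.

```r
# Toy data: the values and the Hours column name are assumptions
df1 <- data.frame(
  Name  = c("Ann", "Ann", "Bob"),
  Likes = c("cats", "dogs", "birds"),
  Hours = c(2, 3, 1),
  stringsAsFactors = FALSE
)

# One row per Name; every other column is collapsed
# into a comma-separated string by toString()
res <- aggregate(. ~ Name, df1, FUN = toString)
print(res)
```

The collapsed columns come back as strings, including the numeric one, so this is best suited to building a display table rather than to further arithmetic.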

How do I shorten row names in R? Please read below [duplicate]

I have a table of values in R where the row names are very long. I want to shorten them. My row names look like this:
GSM1051550_7800246087_R02C01
I want to rename every row to only have the first part of the name, i.e., GSM1051550. How can I do this in R?
Building on jay.sf's comment (assuming your table is named ABC):
row.names(ABC) <- sub("_.*", "", row.names(ABC))
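As a quick check, here is a minimal sketch on a made-up table. The second row name and the `value` column are invented for illustration; only the first row name comes from the question.

```r
# Toy table with long row names (second name is hypothetical)
ABC <- data.frame(
  value = c(1.2, 3.4),
  row.names = c("GSM1051550_7800246087_R02C01",
                "GSM1051551_7800246087_R03C01")
)

# Drop the first underscore and everything after it
row.names(ABC) <- sub("_.*", "", row.names(ABC))
print(row.names(ABC))
```

Row names of a data frame must be unique, so the assignment fails if two rows share the same prefix before the first underscore.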

Remove part of column name post the second "_" [duplicate]

I have a vector which has names of the columns
group <- c("amount_bin_group", "fico_bin_group", "cltv_bin_group", "p_region_bin")
I want to remove the second "_" and everything after it from each element, i.e. I want it to be
group <- c("amount_bin", "fico_bin", "cltv_bin", "p_region")
I can split this into two vectors and try gsub or substr. However, it would be nice to do it in a single vectorized step. Any thoughts?
I checked other posts regarding the same question, but none of them has this framework
> sub("(.*)_.*$", "\\1", group)
[1] "amount_bin" "fico_bin" "cltv_bin" "p_region"
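Because `.*` is greedy, the pattern above cuts at the last underscore, which coincides with the second one in this vector. As a hedged alternative (not part of the original answer), the sketch below anchors the match on the second underscore explicitly; the extra element `"a_b_c_d"` is hypothetical and only there to show the difference.

```r
group <- c("amount_bin_group", "fico_bin_group",
           "cltv_bin_group", "p_region_bin")

# Keep "<part>_<part>" and drop the second underscore onwards
second_cut <- sub("^([^_]*_[^_]*)_.*$", "\\1", group)
print(second_cut)

# The two patterns differ once there are three or more underscores
last_cut_demo   <- sub("(.*)_.*$", "\\1", "a_b_c_d")            # cuts at the last "_"
second_cut_demo <- sub("^([^_]*_[^_]*)_.*$", "\\1", "a_b_c_d")  # cuts at the second "_"
print(c(last_cut_demo, second_cut_demo))
```

Elements with fewer than two underscores do not match the stricter pattern and are returned unchanged.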

Remove underscore from a string in R [duplicate]

In my data.frame, I have a column of type character, where all the values look like this : 123_456 (three digits, an underscore, three digits).
I need to convert these values to numeric, but as.numeric(my_dataframe$my_column) gives me NA. Therefore I need to remove the underscore first, so that as.numeric works.
How would I do that, please?
Thanks
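We can use sub, which replaces the first match in each string. That is enough here because every value contains exactly one underscore; gsub would be needed if there could be several.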
as.numeric(sub("_", "", my_dataframe$my_column))
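A minimal sketch with made-up values in the format described in the question. The vector `my_column` stands in for `my_dataframe$my_column`, and `fixed = TRUE` is an optional addition that treats `"_"` as a literal string rather than a regular expression.

```r
# Hypothetical values: three digits, an underscore, three digits
my_column <- c("123_456", "789_012")

# Strip the underscore, then convert the remaining digits
cleaned <- as.numeric(sub("_", "", my_column, fixed = TRUE))
print(cleaned)
```

Leading zeros in the second half survive the conversion because they sit inside the number once the underscore is gone.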

Remove duplicates from a column R [duplicate]

I have a long column (9500 rows in Excel) containing a lot of gene IDs. I want to remove the duplicates.
ID
BXDC2
BXDC5
BXDC5
BZRPL1
BZRPL1
C10orf11
C10orf116
C10orf119
C10orf120
C10orf125
C10orf125
And I want the result to be:
ID
BXDC2
BXDC5
BZRPL1
C10orf11
C10orf116
C10orf119
C10orf120
C10orf125
Can anybody help me with an R script :-)?
You can use duplicated or unique. Here, I am assuming that the column is named 'ID'.
df1[!duplicated(df1$ID), , drop = FALSE]
Or
library(data.table)  # v1.9.4+
unique(setDT(df1), by = 'ID')
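The base R variant can be tried on the IDs listed in the question. The sketch below builds `df1` by hand, which is an assumption about how the Excel column would look once imported.

```r
# IDs copied from the question
df1 <- data.frame(
  ID = c("BXDC2", "BXDC5", "BXDC5", "BZRPL1", "BZRPL1",
         "C10orf11", "C10orf116", "C10orf119", "C10orf120",
         "C10orf125", "C10orf125"),
  stringsAsFactors = FALSE
)

# Keep the first occurrence of each ID; drop = FALSE keeps the
# result a data frame even though there is only one column
deduped <- df1[!duplicated(df1$ID), , drop = FALSE]
print(deduped$ID)
print(nrow(deduped))
```

If only the vector of IDs is needed rather than the whole data frame, `unique(df1$ID)` gives the same values.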
