Rename column names by pattern [duplicate] - r

I want to rename my columns because the names are too long, for example:
chrX:99883666-99894988_TSPAN6_ENSG00000000003.10 to TSPAN6
chrX:99839798-99854882_TNMD_ENSG00000000005.5 to TNMD
chr20:49505584-49575092_DPM1_ENSG00000000419.8 to DPM1
How can I rename them, given that the parts I want to delete differ from column to column?

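Using strsplit, we can take the second piece of every name. strsplit returns a list with one element per name, so index into each element rather than only the first:
names(df) <- sapply(strsplit(names(df), "_"), `[`, 2)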
If you only want to target a certain subset of names, then simply filter names(df) using that logic.
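As a rough sketch of the strsplit approach applied to every name, and of restricting the rename to a subset, something like the following should work. The toy data frame and the choice of "names starting with chrX" as the filter are assumptions made only for illustration.

```r
# Toy data frame (hypothetical); the column names are taken from the question.
df <- data.frame(1:2, 3:4, 5:6)
names(df) <- c("chrX:99883666-99894988_TSPAN6_ENSG00000000003.10",
               "chrX:99839798-99854882_TNMD_ENSG00000000005.5",
               "chr20:49505584-49575092_DPM1_ENSG00000000419.8")

# strsplit() returns one character vector per name;
# vapply() picks the second piece from each of them.
pieces <- strsplit(names(df), "_", fixed = TRUE)
short_names <- vapply(pieces, function(p) p[2], character(1))

# Rename only the columns that satisfy a condition
# (assumed here: names that start with "chrX").
target <- grepl("^chrX", names(df))
names(df)[target] <- short_names[target]
names(df)
```

The third column keeps its long name because it does not match the filter.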

You can do that using regex. How about extracting the word between the two underscores?
x <- c("chrX:99883666-99894988_TSPAN6_ENSG00000000003.10",
"chrX:99839798-99854882_TNMD_ENSG00000000005.5",
"chr20:49505584-49575092_DPM1_ENSG00000000419.8")
sub('.*?_(\\w+)_.*', '\\1', x)
#[1] "TSPAN6" "TNMD" "DPM1"
To apply this to the column names, use names(df) in place of x.
names(df) <- sub('.*?_(\\w+)_.*', '\\1', names(df))
And if you prefer dplyr:
library(dplyr)
df <- df %>% rename_with(~sub('.*?_(\\w+)_.*', '\\1', .))
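To see the base R version at work on a whole data frame, here is a small sketch. The toy data frame and its values are made up; only the column names and the pattern come from the answer above. It also checks that renaming leaves the cell values untouched.

```r
# Toy data frame (hypothetical values) with the long names from the question.
df <- data.frame(c(1.5, 2.5), c(0, 1), c(10, 20))
names(df) <- c("chrX:99883666-99894988_TSPAN6_ENSG00000000003.10",
               "chrX:99839798-99854882_TNMD_ENSG00000000005.5",
               "chr20:49505584-49575092_DPM1_ENSG00000000419.8")

# Snapshot of the cell values, without any names attached.
values_before <- unname(as.matrix(df))

# Keep only the word between the two underscores.
names(df) <- sub(".*?_(\\w+)_.*", "\\1", names(df))

values_after <- unname(as.matrix(df))
names(df)
identical(values_before, values_after)  # only the names changed
```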

Related

How to set column names in R by repeating character? [duplicate]

Suppose I want to name the columns of a data frame L1, L2, ..., up to L200. How could I do this?
I tried colnames(df) <- c('L1':'L200'), but this does not work (returns error message NAs introduced by coercion), even though there are 200 columns.
Help on this appreciated!
We can use paste0:
colnames(df) <- paste0("L", 1:200)
Or, to avoid hard-coding the number of columns:
colnames(df) <- paste0("L", seq_along(df))
NOTE: The range operator (:) works on numbers, not on arbitrary character strings. 'L1' is a string that cannot be converted to a number, which is why you see the "NAs introduced by coercion" message, whereas 1:200 gives the integers from 1 to 200.
Here is another solution:
colnames(df) <- sprintf("L%d", 1:200)
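For a quick check of both answers, here is a minimal sketch on a small data frame. The 5-column toy data frame is an assumption; the same calls apply unchanged to 200 columns.

```r
# Toy data frame with 5 unnamed-looking columns (hypothetical stand-in for the 200-column df).
df <- as.data.frame(matrix(0, nrow = 2, ncol = 5))

# seq_along(df) yields 1, 2, ..., ncol(df), so nothing is hard-coded.
colnames(df) <- paste0("L", seq_along(df))
colnames(df)

# sprintf() builds the same names.
alt <- sprintf("L%d", seq_along(df))
identical(colnames(df), alt)

# ':' needs numeric endpoints; "L1" cannot be converted to a number,
# so the attempt from the question ends in an error.
attempt <- tryCatch(suppressWarnings('L1':'L5'),
                    error = function(e) "error")
attempt
```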

Create a character from column names (in R) [duplicate]

I have a matrix with thousands of columns whose names look like this:
Z41_5_tes_ACGTTCCATAGCCGTA
Z41_5_ACGTTCCAGAGCGGTA
Z53_5_ACGTTCCAGAGCCGTA
Z53_5_ACGTTCCAGATCTGTA
Z41_5_ACGTTGCATAGCGGTA
Z41_5_tes_ACGTTCGCTAGCCGTA
I would like to create a vector containing the beginning of each column name, as shown below:
Z41_5_tes
Z41_5
Z53_5
Z53_5
Z41_5
Z41_5_tes
I tried the following, but it does not capture Z41_5_tes:
names <- gsub("^([^_]*_[^_]*)_.*$", "\\1", colnames(x@data))
It only returns values such as:
Z41_5
Z53_5
Remove the last underscore and everything after it.
sub('_[^_]*$', '', x)
#[1] "Z41_5_tes" "Z41_5" "Z53_5" "Z53_5" "Z41_5" "Z41_5_tes"
Or extract everything before the last underscore.
sub('(.*)_.*', '\\1', x)
#[1] "Z41_5_tes" "Z41_5" "Z53_5" "Z53_5" "Z41_5" "Z41_5_tes"
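As a small self-contained sketch, the two patterns can be compared side by side. The vector x repeats the sample names from the question; the extra name "Z41" without an underscore is a made-up edge case.

```r
x <- c("Z41_5_tes_ACGTTCCATAGCCGTA", "Z41_5_ACGTTCCAGAGCGGTA",
       "Z53_5_ACGTTCCAGAGCCGTA", "Z53_5_ACGTTCCAGATCTGTA",
       "Z41_5_ACGTTGCATAGCGGTA", "Z41_5_tes_ACGTTCGCTAGCCGTA")

# Pattern 1: drop the last underscore and whatever follows it.
prefix_a <- sub("_[^_]*$", "", x)

# Pattern 2: greedy (.*) runs up to the last underscore; keep that part.
prefix_b <- sub("(.*)_.*", "\\1", x)

identical(prefix_a, prefix_b)  # both patterns agree on these names
unique(prefix_a)               # the distinct prefixes

# A name without any underscore does not match and is returned unchanged.
no_underscore <- sub("_[^_]*$", "", "Z41")
no_underscore
```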
data
x <- c("Z41_5_tes_ACGTTCCATAGCCGTA", "Z41_5_ACGTTCCAGAGCGGTA",
"Z53_5_ACGTTCCAGAGCCGTA", "Z53_5_ACGTTCCAGATCTGTA",
"Z41_5_ACGTTGCATAGCGGTA", "Z41_5_tes_ACGTTCGCTAGCCGTA")

Split column names in R using a separator [duplicate]

I have a dataframe X with column names such as
1_abc,
2_fgy,
27_msl,
936_hhq,
3_hdv
I want to keep just the number as the column name (so instead of 1_abc, just 1). How do I remove the rest of the name while keeping the data intact?
All column names use an underscore as the separator between the numeric and character parts. There are about 400 columns, so I want to code this without listing specific column names.
You may use sub here for a base R option:
names(df) <- sub("^(\\d+).*$", "\\1", names(df))
Another option might be:
names(df) <- sub("_.*", "", names(df))
This would just strip off everything from the first underscore until the end of the column name.
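Here is a minimal sketch comparing the two options on a toy data frame (the data frame and its values are assumptions). One thing to keep in mind is that purely numeric names are not syntactic R names, so the renamed columns are easiest to reach with [[ ]] or backticks.

```r
# Toy data frame (hypothetical) with names in the question's format.
df <- data.frame(1:2, 3:4, 5:6)
names(df) <- c("1_abc", "27_msl", "936_hhq")

# Option 1: keep the leading digits.
by_digits <- sub("^(\\d+).*$", "\\1", names(df))

# Option 2: drop everything from the first underscore onward.
by_underscore <- sub("_.*", "", names(df))

identical(by_digits, by_underscore)  # same result for names like these

names(df) <- by_underscore

# Numeric-looking names need [[ ]] or backticks when accessing a column.
df[["27"]]
df$`936`
```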

Extract column names from df as a vector - or copy / paste column names to new df? [duplicate]

Is there a way to extract the column names from a df and convert them into a vector? What I am actually trying to do is rbind two df's; the second has no column names, so rbind returns a name-matching error. Maybe there is an easier way to copy the df1 column names to df2 so that rbind will work?
In one line:
df3 <- rbind(df1, setNames(df2, names(df1)))
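A short sketch of both parts of the question follows: names(df1) is already a character vector, and setNames() copies it onto df2 before binding. The two toy data frames are assumptions, and the approach presumes that the columns of df2 are in the same order as those of df1.

```r
# Toy data frames (hypothetical). df2 carries placeholder names only.
df1 <- data.frame(id = 1:2, city = c("Oslo", "Lima"), stringsAsFactors = FALSE)
df2 <- data.frame(V1 = 3:4, V2 = c("Rome", "Cairo"), stringsAsFactors = FALSE)

# The column names as a plain character vector.
col_names <- names(df1)
col_names

# setNames() returns a renamed copy of df2, so rbind() can match the columns;
# df2 itself is left unchanged.
df3 <- rbind(df1, setNames(df2, col_names))
df3
```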

Deleting Rows by condition [duplicate]

Below is an image that shows my data frame. I want to delete every row whose city name contains "Range", as indicated in the snippet. I have tried various approaches but have not been successful so far.
There are two steps: detecting a pattern in a character vector, for which you can use stringr::str_detect(), and extracting a subset of rows, which is what dplyr::filter() is for.
library(dplyr)
library(stringr)
df <- df %>%
  filter(!str_detect(City, "Range"))
Use grep with invert option to select all lines without Range.
yourDataFrame <- yourDataFrame[grep("Range", yourDataFrame$City, invert = TRUE), ]
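Since the original image is not available, here is a base R sketch on made-up data. The City and Sales columns and their values are assumptions; only the "Range" pattern comes from the question. It shows the grep(invert = TRUE) form next to an equivalent logical filter with grepl().

```r
# Toy data frame (hypothetical) standing in for the one in the image.
df <- data.frame(City  = c("Denver", "Front Range", "Austin", "Range Valley"),
                 Sales = c(10, 20, 30, 40),
                 stringsAsFactors = FALSE)

# grep(invert = TRUE) returns the positions of the rows that do NOT match.
kept_by_grep <- df[grep("Range", df$City, invert = TRUE), ]

# grepl() gives a logical vector instead; negate it to keep non-matching rows.
kept_by_grepl <- df[!grepl("Range", df$City, fixed = TRUE), ]

kept_by_grep$City
kept_by_grepl$City
```

Both forms also behave sensibly when no city matches: all rows are kept.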
