Split column names in R using a separator [duplicate] - r

I have a dataframe X with column names such as
1_abc,
2_fgy,
27_msl,
936_hhq,
3_hdv
I want to keep only the number as the column name (so 1 instead of 1_abc). How do I remove the rest of the name while keeping the data intact?
All column names use an underscore as the separator between the numeric and character parts. There are about 400 columns, so I want to do this without referring to specific column names.

You may use sub here for a base R option:
names(df) <- sub("^(\\d+).*$", "\\1", names(df))
Another option might be:
names(df) <- sub("_.*", "", names(df))
This would just strip off everything from the first underscore until the end of the column name.
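To check that both patterns agree, they can be run on a small data frame built from the column names in the question. This is only a sketch: the data frame df and its values are made up for illustration.

```r
# Hypothetical data frame; only the column names come from the question.
df <- data.frame(1:2, 3:4, 5:6, 7:8, 9:10)
names(df) <- c("1_abc", "2_fgy", "27_msl", "936_hhq", "3_hdv")

# Option 1: capture the leading digits and drop everything after them.
by_digits <- sub("^(\\d+).*$", "\\1", names(df))

# Option 2: delete everything from the first underscore onward.
by_underscore <- sub("_.*", "", names(df))

# Both options give the same result for names of the form <digits>_<text>.
identical(by_digits, by_underscore)

names(df) <- by_underscore
names(df)
```

Names that start with a digit are not syntactically valid in R, so after renaming the columns have to be accessed with backticks or quotes, for example df$`27` or df[["27"]].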


Read column names with hyphen using read.table in R [duplicate]

My column names contain hyphens ("-"). When I read the file with read.table, the hyphens in the column names are replaced with dots ("."). I want to retain the column names as-is.
df = read.table("meth_clin_kipan_pathanalysis.txt", header=T, sep="\t", row.names=1)
Raw:
TCGA-2K-A9WE-01A    TCGA-2Z-A9J1-01A
First               row
Second              row
Current output:
TCGA.2K.A9WE.01A    TCGA.2Z.A9J1.01A
First               row
Second              row
Desired output (same as raw, i.e., as-is):
TCGA-2K-A9WE-01A    TCGA-2Z-A9J1-01A
First               row
Second              row
Simply use check.names=FALSE argument of read.table to preserve original column names that have special characters and spaces.
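The effect of check.names can be seen without the original file by reading a small tab-separated string through the text argument of read.table. The input below is a made-up stand-in for the real file; only the two TCGA column names are taken from the question.

```r
# Hypothetical tab-separated input standing in for the real file.
txt <- "id\tTCGA-2K-A9WE-01A\tTCGA-2Z-A9J1-01A\nr1\t1\t2\nr2\t3\t4"

# Default (check.names = TRUE): names go through make.names(),
# which turns "-" into ".".
default_df <- read.table(text = txt, header = TRUE, sep = "\t", row.names = 1)
names(default_df)

# With check.names = FALSE the header is kept exactly as written.
raw_df <- read.table(text = txt, header = TRUE, sep = "\t", row.names = 1,
                     check.names = FALSE)
names(raw_df)
```

Columns with hyphens in their names then need backticks or quotes when accessed, for example raw_df[["TCGA-2K-A9WE-01A"]].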
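Alternatively, if the data have already been read in, the dots can be converted back to hyphens with dplyr. Note that this replaces every dot, including any that were present in the original names:
library(dplyr)
df %>%
  rename_with(~ stringr::str_replace_all(., '\\.', '-'), everything())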

Rename column names by pattern [duplicate]

I want to rename my columns because the names are too long, for example:
chrX:99883666-99894988_TSPAN6_ENSG00000000003.10 to TSPAN6
chrX:99839798-99854882_TNMD_ENSG00000000005.5 to TNMD
chr20:49505584-49575092_DPM1_ENSG00000000419.8 to DPM1
How can I rename them, considering that the parts I want to delete differ for every column?
Using strsplit, we can take the second underscore-separated piece of every name:
names(df) <- sapply(strsplit(names(df), "_"), `[`, 2)
If you only want to target a certain subset of names, then simply filter names(df) using that logic.
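Because strsplit returns a list with one element per name, the second piece has to be pulled from every element of that list, not just the first. A minimal sketch using the three names from the question (the vector x and the variable names are only for illustration):

```r
x <- c("chrX:99883666-99894988_TSPAN6_ENSG00000000003.10",
       "chrX:99839798-99854882_TNMD_ENSG00000000005.5",
       "chr20:49505584-49575092_DPM1_ENSG00000000419.8")

# One list element per input string, each holding the split pieces.
pieces <- strsplit(x, "_", fixed = TRUE)

# Indexing with [[1]][2] returns the gene from the first name only.
first_only <- pieces[[1]][2]

# sapply() with `[` extracts the second piece from every element.
genes <- sapply(pieces, `[`, 2)
genes
```

This approach assumes the gene symbol is always the second underscore-separated field, which holds for the names shown in the question.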
You can do that using a regex. How about extracting the word between the first two underscores?
x <- c("chrX:99883666-99894988_TSPAN6_ENSG00000000003.10",
"chrX:99839798-99854882_TNMD_ENSG00000000005.5",
"chr20:49505584-49575092_DPM1_ENSG00000000419.8")
sub('.*?_(\\w+)_.*', '\\1', x)
#[1] "TSPAN6" "TNMD" "DPM1"
For names of the column you can use names(df) instead of x.
names(df) <- sub('.*?_(\\w+)_.*', '\\1', names(df))
and if you prefer dplyr -
library(dplyr)
df <- df %>% rename_with(~sub('.*?_(\\w+)_.*', '\\1', .))

Create a character from column names (in R) [duplicate]

I have a matrix with thousands of columns whose names are as shown below:
Z41_5_tes_ACGTTCCATAGCCGTA
Z41_5_ACGTTCCAGAGCGGTA
Z53_5_ACGTTCCAGAGCCGTA
Z53_5_ACGTTCCAGATCTGTA
Z41_5_ACGTTGCATAGCGGTA
Z41_5_tes_ACGTTCGCTAGCCGTA
I would like to create a vector containing the beginning of each column name, as shown below:
Z41_5_tes
Z41_5
Z53_5
Z53_5
Z41_5
Z41_5_tes
I tried the following, but it does not capture Z41_5_tes:
names <- gsub("^([^_]*_[^_]*)_.*$", "\\1", colnames(x@data))
Z41_5
Z53_5
Remove everything after the last underscore.
sub('_[^_]*$', '', x)
#[1] "Z41_5_tes" "Z41_5" "Z53_5" "Z53_5" "Z41_5" "Z41_5_tes"
Extract everything before last underscore.
sub('(.*)_.*', '\\1', x)
#[1] "Z41_5_tes" "Z41_5" "Z53_5" "Z53_5" "Z41_5" "Z41_5_tes"
data
x <- c("Z41_5_tes_ACGTTCCATAGCCGTA", "Z41_5_ACGTTCCAGAGCGGTA",
"Z53_5_ACGTTCCAGAGCCGTA", "Z53_5_ACGTTCCAGATCTGTA",
"Z41_5_ACGTTGCATAGCGGTA", "Z41_5_tes_ACGTTCGCTAGCCGTA")
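The same pattern can be applied directly to the column names of a matrix. In the sketch below the matrix m is hypothetical and filled with zeros; only its column names come from the question.

```r
x <- c("Z41_5_tes_ACGTTCCATAGCCGTA", "Z41_5_ACGTTCCAGAGCGGTA",
       "Z53_5_ACGTTCCAGAGCCGTA", "Z53_5_ACGTTCCAGATCTGTA",
       "Z41_5_ACGTTGCATAGCGGTA", "Z41_5_tes_ACGTTCGCTAGCCGTA")

# Hypothetical matrix whose columns carry the names above.
m <- matrix(0, nrow = 2, ncol = length(x), dimnames = list(NULL, x))

# Drop the last underscore and everything after it.
prefix <- sub("_[^_]*$", "", colnames(m))
prefix

# Count how many columns share each prefix.
table(prefix)
```

Because [^_]* cannot cross an underscore and the pattern is anchored with $, only the final field is removed, however many underscores come before it.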

Changing a full last name to just the first letter of the name in R [duplicate]

I'm working in R. I have a dataset with people's first and last names. There is a column called "First" and another column called "Last".
I want to change "Bodie" to just "B" and do the same for all the observations in the "Last" column.
I'm newer to programming so I don't even know where to start. I have looked at some of the string packages in R and can't quite figure out what to do. Thanks for the help.
We can use substr to extract the first letter of the 'Last' column
df1$Last <- substr(df1$Last, 1, 1)
Or sub to remove all the characters other than the first
df1$Last <- sub("^(.).*", "\\1", df1$Last)
Or another option is to split the characters, select the first element
df1$Last <- sapply(strsplit(df1$Last, ""), `[`, 1)
Just a variation on @akrun's answer, which uses sub without a capture group:
df1$Last <- sub("(?<=.).*$", "", df1$Last, perl=TRUE)
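The four approaches can be compared side by side on a small example. The data frame below is made up for illustration (only "Bodie" comes from the question), and it assumes Last is a character column; strsplit does not accept a factor, so a factor column would need as.character() first.

```r
# Hypothetical data; stringsAsFactors = FALSE keeps Last as character.
df1 <- data.frame(First = c("Ann", "Raj"), Last = c("Bodie", "Smith"),
                  stringsAsFactors = FALSE)

# 1. Take the substring from position 1 to position 1.
by_substr <- substr(df1$Last, 1, 1)

# 2. Capture the first character and replace the whole string with it.
by_capture <- sub("^(.).*", "\\1", df1$Last)

# 3. Split into single characters and keep the first one.
by_split <- sapply(strsplit(df1$Last, ""), `[`, 1)

# 4. Delete everything that is preceded by at least one character.
by_lookbehind <- sub("(?<=.).*$", "", df1$Last, perl = TRUE)

by_substr
```

For this input all four give the same result, so substr is probably the simplest choice when only the first letter is needed.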

Replace column names with single line of code [duplicate]

I have a data frame with 4 columns and want to replace only the 2nd and 3rd column names.
data frame = df
col.names = A, B, C, D
New col.names = Z, F
I have tried the code below:
colnames(df)[2]<-"Z"
colnames(df)[3]<-"F"
Is it possible to rename them with a single line of code?
The actual data frame contains 150+ columns, so I am looking for a better solution.
Because this is a data.frame, names can be used in place of colnames: the names of a data.frame are its column names. Subset the names by index with [2:3] for a contiguous range (or [c(2, 3)] for arbitrary positions) and assign the new names as a character vector built with c:
names(df)[2:3] <- c("Z", "F")
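With 150+ columns, positions can be hard to keep track of, so renaming by the old name may be safer than renaming by index. The following is a sketch under that assumption; the lookup vector and the data frames are hypothetical and not part of the original answer.

```r
# Hypothetical data frame with the column names from the question.
df <- data.frame(A = 1, B = 2, C = 3, D = 4)

# Rename by position, as in the answer.
names(df)[2:3] <- c("Z", "F")
names(df)

# Rename by old name instead: a named vector maps old names to new ones.
df2 <- data.frame(A = 1, B = 2, C = 3, D = 4)
lookup <- c(B = "Z", C = "F")
hit <- names(df2) %in% names(lookup)
names(df2)[hit] <- unname(lookup[names(df2)[hit]])
names(df2)
```

The by-name version keeps working if the columns are reordered, and it leaves any column that is not listed in lookup unchanged.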
