Read column names with hyphen using read.table in R [duplicate] - r

This question already has an answer here:
R read.csv - header with a specific symbol(>)
(1 answer)
Closed 6 months ago.
My column names contain hyphens ("-"). When I read the file with read.table, the hyphens in the column names are replaced with dots ("."). I want to retain the column names as-is.
df = read.table("meth_clin_kipan_pathanalysis.txt", header=T, sep="\t", row.names=1)
Raw:
TCGA-2K-A9WE-01A  TCGA-2Z-A9J1-01A
First             row
Second            row
Current output:
TCGA.2K.A9WE.01A  TCGA.2Z.A9J1.01A
First             row
Second            row
Desired output (same as raw, i.e., as-is):
TCGA-2K-A9WE-01A  TCGA-2Z-A9J1-01A
First             row
Second            row

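Pass check.names = FALSE to read.table to preserve the original column names, including special characters and spaces. By default (check.names = TRUE), read.table runs the header through make.names, which replaces characters that are not valid in R names, such as hyphens, with dots.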
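As a small illustration of the check.names argument, the sketch below reads the same tab-separated text twice, once with the default and once with check.names = FALSE. The inline text and its "id" column are made up for the example; the question's actual file is not reproduced here.

```r
# Hypothetical tab-separated input; the "id" column stands in for the row names.
txt <- "id\tTCGA-2K-A9WE-01A\tTCGA-2Z-A9J1-01A\nr1\t1\t2\nr2\t3\t4"

# Default behaviour: check.names = TRUE rewrites the header via make.names().
default_df <- read.table(text = txt, header = TRUE, sep = "\t", row.names = 1)

# check.names = FALSE keeps the header exactly as written in the input.
kept_df <- read.table(text = txt, header = TRUE, sep = "\t", row.names = 1,
                      check.names = FALSE)

names(default_df)  # hyphens replaced by dots
names(kept_df)     # hyphens preserved

# Non-syntactic names need backticks (or [[ ]]) when you access them.
kept_df$`TCGA-2K-A9WE-01A`
kept_df[["TCGA-2Z-A9J1-01A"]]
```

The trade-off is that names containing hyphens are no longer syntactic, so every later reference to those columns has to quote them.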

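If the data are already loaded, you could instead convert the dots back to hyphens with dplyr (this also needs the stringr package). Note that it replaces every dot, including any that were part of the original names:
library(dplyr)
df %>%
  rename_with(~stringr::str_replace_all(., '\\.', '-'), everything())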
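The same post-hoc replacement can be sketched in base R without any packages. The data frame below is a made-up stand-in for one that was already read with the default settings; fixed = TRUE makes gsub treat "." as a literal dot rather than a regex wildcard.

```r
# Hypothetical data frame whose names were already converted to dots on import.
df <- data.frame(TCGA.2K.A9WE.01A = 1:2, TCGA.2Z.A9J1.01A = 3:4)

# Replace every literal dot in the column names with a hyphen.
names(df) <- gsub(".", "-", names(df), fixed = TRUE)

names(df)
```

This cannot tell a dot that replaced a hyphen from a dot that was in the original header, so re-reading with check.names = FALSE is the safer route when the file is available.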

Related

Rename column names by pattern [duplicate]

This question already has answers here:
Splitting a column in a data frame by an nth instance of a character
(3 answers)
Accessing element of a split string in R
(4 answers)
First entry from string split
(7 answers)
Closed 1 year ago.
I want to rename my columns because the names are too long, for example:
chrX:99883666-99894988_TSPAN6_ENSG00000000003.10 to TSPAN6
chrX:99839798-99854882_TNMD_ENSG00000000005.5 to TNMD
chr20:49505584-49575092_DPM1_ENSG00000000419.8 to DPM1
How can I rename them, given that the part I want to delete differs for every column?
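Using strsplit, we can split every name on the underscore and keep the second piece of each:
names(df) <- sapply(strsplit(names(df), "_"), "[", 2)
Indexing the result with [[1]][2] alone would return only the first column's second piece, not one value per column.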
If you only want to target a certain subset of names, then simply filter names(df) using that logic.
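A minimal sketch of the subset idea, assuming the columns to rename are the ones whose names start with "chr". The name vector, including the untouched "sample_id" entry, is made up for illustration; with a data frame you would use names(df) in place of old.

```r
# Hypothetical column names: two to shorten and one to leave alone.
old <- c("chrX:99883666-99894988_TSPAN6_ENSG00000000003.10",
         "chrX:99839798-99854882_TNMD_ENSG00000000005.5",
         "sample_id")

# Logical filter selecting only the names that should be rewritten.
target <- grepl("^chr", old)

# Split the targeted names on "_" and keep the second piece of each.
new <- old
new[target] <- vapply(strsplit(old[target], "_", fixed = TRUE),
                      function(parts) parts[2], character(1))

new
```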
You can do that with a regular expression. How about extracting the word between the two underscores?
x <- c("chrX:99883666-99894988_TSPAN6_ENSG00000000003.10",
"chrX:99839798-99854882_TNMD_ENSG00000000005.5",
"chr20:49505584-49575092_DPM1_ENSG00000000419.8")
sub('.*?_(\\w+)_.*', '\\1', x)
#[1] "TSPAN6" "TNMD" "DPM1"
For names of the column you can use names(df) instead of x.
names(df) <- sub('.*?_(\\w+)_.*', '\\1', names(df))
and if you prefer dplyr -
library(dplyr)
df <- df %>% rename_with(~sub('.*?_(\\w+)_.*', '\\1', .))

Create a character from column names (in R) [duplicate]

This question already has answers here:
R regex find last occurrence of delimiter
(4 answers)
Closed 1 year ago.
I have a matrix with thousands of columns whose names look like this:
Z41_5_tes_ACGTTCCATAGCCGTA
Z41_5_ACGTTCCAGAGCGGTA
Z53_5_ACGTTCCAGAGCCGTA
Z53_5_ACGTTCCAGATCTGTA
Z41_5_ACGTTGCATAGCGGTA
Z41_5_tes_ACGTTCGCTAGCCGTA
I would like to create a vector containing the beginning of each column name, as shown below:
Z41_5_tes
Z41_5
Z53_5
Z53_5
Z41_5
Z41_5_tes
I tried the following, but it does not capture Z41_5_tes:
names <- gsub("^([^_]*_[^_]*)_.*$", "\\1", colnames(x@data))
It returns only values such as:
Z41_5
Z53_5
Remove the last underscore and everything after it.
sub('_[^_]*$', '', x)
#[1] "Z41_5_tes" "Z41_5" "Z53_5" "Z53_5" "Z41_5" "Z41_5_tes"
Extract everything before last underscore.
sub('(.*)_.*', '\\1', x)
#[1] "Z41_5_tes" "Z41_5" "Z53_5" "Z53_5" "Z41_5" "Z41_5_tes"
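Applied to a matrix, the first pattern can be sketched as follows. The small matrix m is a made-up stand-in for the real one; only its column names matter here.

```r
# Hypothetical matrix with column names shaped like those in the question.
m <- matrix(0, nrow = 2, ncol = 3,
            dimnames = list(NULL, c("Z41_5_tes_ACGTTCCATAGCCGTA",
                                    "Z41_5_ACGTTCCAGAGCGGTA",
                                    "Z53_5_ACGTTCCAGAGCCGTA")))

# Drop the last underscore and everything after it from each column name.
prefix <- sub("_[^_]*$", "", colnames(m))

prefix
```

The matrix itself is left unchanged; prefix is a separate character vector with one entry per column.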
data
x <- c("Z41_5_tes_ACGTTCCATAGCCGTA", "Z41_5_ACGTTCCAGAGCGGTA",
"Z53_5_ACGTTCCAGAGCCGTA", "Z53_5_ACGTTCCAGATCTGTA",
"Z41_5_ACGTTGCATAGCGGTA", "Z41_5_tes_ACGTTCGCTAGCCGTA")

Split column names in R using a separator [duplicate]

This question already has answers here:
Extracting numbers from vectors of strings
(12 answers)
Closed 2 years ago.
I have a dataframe X with column names such as
1_abc,
2_fgy,
27_msl,
936_hhq,
3_hdv
I want to keep just the number as the column name (so instead of 1_abc, just 1). How do I remove the rest of the name while keeping the data intact?
All column names use an underscore as the separator between the numeric and character parts. There are about 400 columns, so I want to be able to do this without typing specific column names.
You may use sub here for a base R option:
names(df) <- sub("^(\\d+).*$", "\\1", names(df))
Another option might be:
names(df) <- sub("_.*", "", names(df))
This would just strip off everything from the first underscore until the end of the column name.
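Both patterns can be compared on a small example. The data frame below is made up for illustration, and its names are set after construction so that data.frame() does not alter names that start with a digit.

```r
# Hypothetical data frame; names are assigned afterwards to keep the digits first.
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
names(df) <- c("1_abc", "27_msl", "936_hhq")

# Option 1: keep only the leading run of digits.
by_digits <- sub("^(\\d+).*$", "\\1", names(df))

# Option 2: drop everything from the first underscore onwards.
by_underscore <- sub("_.*", "", names(df))

names(df) <- by_digits
names(df)
```

The two options agree whenever each name is digits followed by an underscore, which is the layout the question describes.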

Loop Through Column Names with Similar Structure [duplicate]

This question already has answers here:
How to extract columns with same name but different identifiers in R
(3 answers)
Closed 3 years ago.
I have a very large dataset. Of those, a small subset have the same column name with an indexing value that is numeric (unlike the post "How to extract columns with same name but different identifiers in R" where the indexing value is a string). For example
Q_1_1, Q_1_2, Q_1_3, ...
I am looking for a way to either loop through just those columns using the indices or to subset them all at once.
I have tried to use paste() to write their column names but have had no luck. See sample code below
# Define the data frame
df = data.frame("Q_1_1" = rep(1,5),"Q_1_2" = rep(2,5),"Q_1_3" = rep(3,5))
# Define the column name using paste
cn <- as.symbol(paste("Q_1_",1, sep=""))
cn
df$cn
df$Q_1_1
I want df$cn to return the same thing as df$Q_1_1, but df$cn returns NULL.
If you are just trying to subset your data frame by column name, you could use dplyr to subset all of your indexed columns at once, with a regex that matches every column name following a certain pattern:
library(dplyr)
df = data.frame("Q_1_1" = rep(1,5),"Q_1_2" = rep(2,5),"Q_1_3" = rep(3,5), "A_1" = rep(4,5))
newdf <- df %>%
  dplyr::select(matches("Q_[0-9]_[0-9]"))
Each [0-9] in the regex matches a single digit after an underscore. Depending on which variables you are trying to match, you might have to change the regular expression.
The problem with your attempt is that $ does not evaluate its right-hand side: df$cn looks for a column literally named cn, finds none, and returns NULL. To select a column whose name is stored in a variable, keep the name as a character string and use df[[cn]].
I hope this helps!
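For the looping part of the question, here is a base R sketch that builds each column name with paste0 and looks it up with [[ ]]. The extra A_1 column and the choice of sum() as the per-column operation are assumptions made for the example.

```r
# Data frame from the question, plus one hypothetical unrelated column.
df <- data.frame(Q_1_1 = rep(1, 5), Q_1_2 = rep(2, 5),
                 Q_1_3 = rep(3, 5), A_1 = rep(4, 5))

# Keep the name as a character string and index with [[ ]].
cn <- paste0("Q_1_", 1)
first <- df[[cn]]

# Loop over the numeric index, building each column name on the fly.
totals <- vapply(1:3, function(i) sum(df[[paste0("Q_1_", i)]]), numeric(1))

# Or subset all matching columns at once with a pattern on the names.
q_cols <- df[, grepl("^Q_[0-9]+_[0-9]+$", names(df)), drop = FALSE]

totals
names(q_cols)
```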

Replace column names with single line of code [duplicate]

This question already has answers here:
Rename multiple columns by names
(20 answers)
Closed 4 years ago.
I have a data frame with 4 columns and want to replace only the 2nd and 3rd column names.
Data frame: df
Current column names: A, B, C, D
New column names (2nd and 3rd): Z, F
I have tried the code below:
colnames(df)[2]<-"Z"
colnames(df)[3]<-"F"
Is there a way to rename them with a single line of code?
The actual data frame contains 150+ column names, so I am looking for a better solution.
Because df is a data.frame, names() works in place of colnames(): the names of a data.frame are its column names. Index the names with [2:3] for a contiguous range (or [c(2, 3)] for arbitrary positions) and assign a character vector of new names built with c():
names(df)[2:3] <- c("Z", "F")
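With 150+ columns, positions can be awkward to track, so a name-based variant may help. The sketch below first repeats the positional rename and then renames by old name via match(); the lookup vector and the new names "first" and "last" are hypothetical.

```r
# Small stand-in for the real data frame.
df <- data.frame(A = 1, B = 2, C = 3, D = 4)

# Positional rename, as in the answer above.
names(df)[2:3] <- c("Z", "F")

# Name-based rename: a lookup from old name to new name (hypothetical values).
lookup <- c(A = "first", D = "last")
idx <- match(names(lookup), names(df))
names(df)[idx] <- unname(lookup)

names(df)
```

match() returns NA for an old name that is not present, so it is worth checking idx with anyNA() before assigning on real data.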
