This question already has answers here:
Remove all characters before a period in a string
(5 answers)
Closed 4 years ago.
I have a data frame like below.
df:
X1.Name X1.ID X1.Prac X1.SCD
But I need to split each column name at the dot and keep only the part after it, so that it displays as:
output df:
Name ID Prac SCD
Using sub:
names(df) <- sub("^[^.]+\\.", "", names(df))
The regex pattern I used matches everything from the start of the string up to and including the first dot. sub then replaces that, and only that, with an empty string.
^ from the start of the string
[^.]+ match one or more characters which are NOT dots
\\. then match a literal dot
We then replace this entire pattern with empty string "", i.e. we remove it from the original string.
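As a minimal sketch of the pattern in action, the snippet below builds a hypothetical data frame with the column names from the question (the cell values are made up) and also shows what happens to names with several dots or no dot at all.

```r
# Hypothetical data frame mirroring the column names in the question
df <- data.frame(X1.Name = "a", X1.ID = 1, X1.Prac = "p", X1.SCD = "s")

# sub() replaces only the first match, so just the leading "X1." is removed
names(df) <- sub("^[^.]+\\.", "", names(df))
names(df)
# [1] "Name" "ID"   "Prac" "SCD"

# A name with several dots keeps everything after the first dot
multi_dot <- sub("^[^.]+\\.", "", "X1.Name.Full")
multi_dot
# [1] "Name.Full"

# A name without a dot does not match and is returned unchanged
no_dot <- sub("^[^.]+\\.", "", "Name")
no_dot
# [1] "Name"
```

Because the pattern is anchored with ^ and uses sub rather than gsub, at most one prefix is stripped per name.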
This question already has answers here:
R regex find last occurrence of delimiter
(4 answers)
Closed 1 year ago.
I have a matrix with thousands of columns whose names look like this:
Z41_5_tes_ACGTTCCATAGCCGTA
Z41_5_ACGTTCCAGAGCGGTA
Z53_5_ACGTTCCAGAGCCGTA
Z53_5_ACGTTCCAGATCTGTA
Z41_5_ACGTTGCATAGCGGTA
Z41_5_tes_ACGTTCGCTAGCCGTA
I would like to create a vector that contains the beginning of each column name, as shown below:
Z41_5_tes
Z41_5
Z53_5
Z53_5
Z41_5
Z41_5_tes
I tried the following, but it does not capture Z41_5_tes; it only returns the first two fields:
names <- gsub("^([^_]*_[^_]*)_.*$", "\\1", colnames(x@data))
Z41_5
Z53_5
Remove everything after the last underscore.
sub('_[^_]*$', '', x)
#[1] "Z41_5_tes" "Z41_5" "Z53_5" "Z53_5" "Z41_5" "Z41_5_tes"
Extract everything before last underscore.
sub('(.*)_.*', '\\1', x)
#[1] "Z41_5_tes" "Z41_5" "Z53_5" "Z53_5" "Z41_5" "Z41_5_tes"
data
x <- c("Z41_5_tes_ACGTTCCATAGCCGTA", "Z41_5_ACGTTCCAGAGCGGTA",
"Z53_5_ACGTTCCAGAGCCGTA", "Z53_5_ACGTTCCAGATCTGTA",
"Z41_5_ACGTTGCATAGCGGTA", "Z41_5_tes_ACGTTCGCTAGCCGTA")
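To connect this back to the original question, here is a rough sketch that applies the first pattern to the column names of a matrix. The matrix m and its contents are hypothetical stand-ins for the asker's data; only the column names matter.

```r
# Hypothetical matrix whose column names follow the pattern in the question
cols <- c("Z41_5_tes_ACGTTCCATAGCCGTA", "Z41_5_ACGTTCCAGAGCGGTA",
          "Z53_5_ACGTTCCAGAGCCGTA")
m <- matrix(0, nrow = 2, ncol = length(cols), dimnames = list(NULL, cols))

# "_[^_]*$" matches the last underscore plus the trailing run of
# non-underscore characters, so only the final field is removed
prefixes <- sub("_[^_]*$", "", colnames(m))
prefixes
# [1] "Z41_5_tes" "Z41_5"     "Z53_5"

# For contrast: cutting at the FIRST underscore loses too much
first_field <- sub("_.*", "", colnames(m))
first_field
# [1] "Z41" "Z41" "Z53"
```

The second answer pattern, '(.*)_.*', reaches the same result because the leading .* is greedy and therefore extends up to the last underscore.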
This question already has answers here:
Extracting numbers from vectors of strings
(12 answers)
Closed 2 years ago.
I have a dataframe X with column names such as
1_abc,
2_fgy,
27_msl,
936_hhq,
3_hdv
I want to just keep the numbers as the column name (so instead of 1_abc, just 1). How do I go about removing it while keeping the rest of the data intact?
All column names use an underscore as the separator between the numeric and character parts. There are about 400 columns, so I want to be able to code this without referring to specific column names.
You may use sub here for a base R option:
names(df) <- sub("^(\\d+).*$", "\\1", names(df))
Another option might be:
names(df) <- sub("_.*", "", names(df))
This would just strip off everything from the first underscore until the end of the column name.
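The two options agree on names like those in the question but differ on names that do not start with digits. A small sketch, using a hypothetical data frame and one made-up extra name ("id_7") to show the difference:

```r
# Hypothetical data frame with names shaped like those in the question
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
names(df) <- c("1_abc", "27_msl", "936_hhq")

# Option 1: keep only the leading run of digits
by_digits <- sub("^(\\d+).*$", "\\1", names(df))
# Option 2: drop everything from the first underscore onward
by_underscore <- sub("_.*", "", names(df))

by_digits
# [1] "1"   "27"  "936"
by_underscore
# [1] "1"   "27"  "936"

# A name that does not start with a digit (made up for illustration):
# option 1 finds no match and leaves it alone, option 2 still truncates it
sub("^(\\d+).*$", "\\1", "id_7")
# [1] "id_7"
sub("_.*", "", "id_7")
# [1] "id"

names(df) <- by_digits
second_col <- df[["27"]]  # numeric names need [["27"]] or df$`27`
```

Only the names are changed; the data in the columns is untouched.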
This question already has answers here:
Getting and removing the first character of a string
(7 answers)
Extract the first (or last) n characters of a string
(5 answers)
Closed 2 years ago.
I'm working in R. I have a dataset with people's first and last names. There is a column called "First" and another column called "Last".
I want to change "Bodie" to just "B" and do the same for all the observations in the "Last" column.
I'm newer to programming so I don't even know where to start. I have looked at some of the string packages in R and can't quite figure out what to do. Thanks for the help.
We can use substr to extract the first letter of the 'Last' column
df1$Last <- substr(df1$Last, 1, 1)
Or sub to remove all the characters other than the first
df1$Last <- sub("^(.).*", "\\1", df1$Last)
Or another option is to split the characters, select the first element
df1$Last <- sapply(strsplit(df1$Last, ""), `[`, 1)
Just a variation on @akrun's answer, which uses sub without a capture group:
df1$Last <- sub("(?<=.).*$", "", df1$Last, perl=TRUE)
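For comparison, here is a small sketch that runs all four approaches on a hypothetical df1. Only "Bodie" comes from the question; the other values are invented.

```r
# Hypothetical data; only "Bodie" appears in the question
df1 <- data.frame(First = c("Ann", "Raj"),
                  Last  = c("Bodie", "Okafor"),
                  stringsAsFactors = FALSE)

# 1. substr: characters from position 1 to position 1
initial_substr <- substr(df1$Last, 1, 1)

# 2. sub with a capture group: keep the first character, drop the rest
initial_sub <- sub("^(.).*", "\\1", df1$Last)

# 3. split into single characters and take the first of each
initial_split <- sapply(strsplit(df1$Last, ""), `[`, 1)

# 4. lookbehind (needs perl = TRUE): remove everything that is
#    preceded by at least one character
initial_look <- sub("(?<=.).*$", "", df1$Last, perl = TRUE)

initial_substr
# [1] "B" "O"
```

All four produce the same result here; substr is probably the easiest to read for a fixed number of characters. Note that strsplit expects a character vector, so a factor column would need as.character() first.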
This question already has answers here:
Insert a character at a specific location in a string
(8 answers)
Closed 5 years ago.
I have a dataset with variable names such as FamId00 and ISCO8899 and would like to write a command that inserts an underscore before the last two digits, which represent years. What is the best way of doing it? I have tried with regex, but the furthest I got was:
gsub('.{2}$', '', varname)
which gives me:
FamId
How do I add '_' and the original last two digits back? Also, I have variables in the dataset that do not have the year in the last two digits (e.g. ID and sex). Is there a way to keep the regular expression from affecting those?
We don't need gsub here; sub is enough, because there is only a single replacement per string. Capture the last two characters as a group ((...)) and, in the replacement, use _ followed by the backreference to that capture group:
sub("(.{2})$", "_\\1", varname)
#[1] "FamId_00" "ISCO88_99"
The . is a metacharacter that matches any character. To restrict the match to digits, use \\d{2} in place of .{2}; names that do not end in two digits, such as ID and sex, are then left unchanged.
data
varname <- c("FamId00", "ISCO8899")
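The difference between .{2} and \\d{2} matters for the names without a year that the question mentions. A quick sketch, extending the data with ID and sex (named varname_all here to keep it apart from varname above):

```r
# The two names from the answer plus the two year-less names from the question
varname_all <- c("FamId00", "ISCO8899", "ID", "sex")

# .{2} matches ANY two trailing characters, so ID and sex are altered too
any_two <- sub("(.{2})$", "_\\1", varname_all)
any_two
# [1] "FamId_00"  "ISCO88_99" "_ID"       "s_ex"

# \\d{2} only matches two trailing digits, so ID and sex are untouched
digits_only <- sub("(\\d{2})$", "_\\1", varname_all)
digits_only
# [1] "FamId_00"  "ISCO88_99" "ID"        "sex"
```

This assumes every name that ends in two digits really does end in a year; a name such as a hypothetical "item12" would also get an underscore.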
An alternative solution, still using sub() or gsub() but with a different pattern:
ids <- c("FamId00", "ISCO8899")
gsub("(^.*)([[:digit:]]{2}$)", "\\1_\\2", ids)
[1] "FamId_00" "ISCO88_99"
This question already has an answer here:
Split delimited single value character vector
(1 answer)
Closed 5 years ago.
I have a string in R in the following form:
"AAAAA","BBBBB","CCCCC",..
I want to convert it to a standard R vector containing the same string elements ("AAAAA", "BBBBB", etc.):
vector<-c("AAAAA","BBBBB","CCCCC",..)
I've read that strsplit could do it, but haven't managed to achieve it.
strsplit returns a list of character vectors, so if you want the result as a single vector, wrap it in unlist as well:
unlist(strsplit(string, ","))
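One caveat worth sketching: if the string literally contains the double-quote characters shown in the question (an assumption; the question does not say how the string was created), strsplit keeps them in each element. The example below uses a hypothetical three-element string and shows two ways to end up with clean elements.

```r
# Assumption: the quotes are part of the string itself
string <- '"AAAAA","BBBBB","CCCCC"'

# Splitting on commas leaves the quote characters in place
parts <- unlist(strsplit(string, ","))
nchar(parts)
# [1] 7 7 7

# Option A: strip the quotes afterwards
clean <- gsub('"', "", parts, fixed = TRUE)
clean
# [1] "AAAAA" "BBBBB" "CCCCC"

# Option B: let scan() parse the quoted, comma-separated fields directly
scanned <- scan(text = string, what = "", sep = ",", quiet = TRUE)
scanned
# [1] "AAAAA" "BBBBB" "CCCCC"
```

If the string has no embedded quotes, for example "AAAAA,BBBBB,CCCCC", the plain unlist(strsplit(string, ",")) call is already sufficient.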