How do I pass different parameters to the function strsplit(split = " ")? [duplicate] - r

This question already has answers here:
Split column at delimiter in data frame [duplicate]
(6 answers)
R strsplit with multiple unordered split arguments?
(4 answers)
Closed 2 years ago.
As the title says, I want to know how to pass different parameters to the function
strsplit(x, split = " ")
If I apply it, I get every space-separated word in my string as a separate element. OK. But I also want to split words that are joined by a dot (so that banana.apple becomes "banana" and "apple").
I thought something like this (below) would work, but it doesn't:
strsplit(x, split = " ", "[.]")
Can anybody help me?

This should work if you want to split on both:
library(stringr)
x <- c("banana.apple turning.something")
str_split(x, "[\\.\\s]")
# [[1]]
# [1] "banana" "apple" "turning" "something"
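The same split can also be done in base R without stringr. This is a minimal sketch, reusing the sample string from the answer above: strsplit() interprets split as a regular expression, so a character class can match either delimiter.

```r
# Base R: split on either a literal dot or a space.
# Inside a character class, "." is a literal dot and needs no escaping.
x <- c("banana.apple turning.something")
parts <- strsplit(x, "[. ]")
print(parts[[1]])
```

Note that two adjacent delimiters (for example ". ") would produce an empty string between them; using "[. ]+" as the pattern treats a run of delimiters as a single split point.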

Related

R Combine two gsub() statements [duplicate]

This question already has answers here:
Replace multiple strings in one gsub() or chartr() statement in R?
(9 answers)
Closed 1 year ago.
I have these two lines of R code:
df$symbol <- gsub("\\^", "-P", df$symbol) # find "^" and change it to "-P"
df$symbol <- gsub("/", "-", df$symbol) # find "/" and change it to "-"
How can I combine them into one line?
Thank you!
Given that you have two different replacement strings, there may not be a way to do this with just a single call to gsub. However, you could chain two calls to gsub here:
df$symbol <- gsub("/", "-", gsub("\\^", "-P", df$symbol))
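As a quick check, the chained call can be run on a small data frame. The symbols below are hypothetical sample values, not taken from the question.

```r
# Hypothetical sample data; the real df comes from the asker's workspace.
df <- data.frame(symbol = c("BRK/B", "BAC^A", "SPY"), stringsAsFactors = FALSE)

# The inner gsub() replaces "^" with "-P"; the outer gsub() replaces "/" with "-".
df$symbol <- gsub("/", "-", gsub("\\^", "-P", df$symbol))
print(df$symbol)
```

The two patterns do not interfere here because neither replacement introduces a character that the other pattern matches.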

Create a character from column names (in R) [duplicate]

This question already has answers here:
R regex find last occurrence of delimiter
(4 answers)
Closed 1 year ago.
I have a matrix with thousands of columns whose names look like this:
Z41_5_tes_ACGTTCCATAGCCGTA
Z41_5_ACGTTCCAGAGCGGTA
Z53_5_ACGTTCCAGAGCCGTA
Z53_5_ACGTTCCAGATCTGTA
Z41_5_ACGTTGCATAGCGGTA
Z41_5_tes_ACGTTCGCTAGCCGTA
I would like to create a vector containing the beginning of each column name, as shown below:
Z41_5_tes
Z41_5
Z53_5
Z53_5
Z41_5
Z41_5_tes
I have tried the following, but it did not capture Z41_5_tes:
names <- gsub("^([^_]*_[^_]*).*$", "\\1", colnames(x@data))
Z41_5
Z53_5
Remove everything after the last underscore.
sub('_[^_]*$', '', x)
#[1] "Z41_5_tes" "Z41_5" "Z53_5" "Z53_5" "Z41_5" "Z41_5_tes"
Extract everything before last underscore.
sub('(.*)_.*', '\\1', x)
#[1] "Z41_5_tes" "Z41_5" "Z53_5" "Z53_5" "Z41_5" "Z41_5_tes"
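Since the question is about column names of a matrix, here is a minimal sketch applying the first pattern to colnames(). The matrix m and its values are hypothetical; only the column names come from the question.

```r
# Hypothetical 1-row matrix carrying three of the column names from the question.
m <- matrix(1:3, nrow = 1,
            dimnames = list(NULL, c("Z41_5_tes_ACGTTCCATAGCCGTA",
                                    "Z41_5_ACGTTCCAGAGCGGTA",
                                    "Z53_5_ACGTTCCAGAGCCGTA")))

# Drop the last underscore and everything after it.
prefix <- sub("_[^_]*$", "", colnames(m))
print(prefix)
```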
data
x <- c("Z41_5_tes_ACGTTCCATAGCCGTA", "Z41_5_ACGTTCCAGAGCGGTA",
"Z53_5_ACGTTCCAGAGCCGTA", "Z53_5_ACGTTCCAGATCTGTA",
"Z41_5_ACGTTGCATAGCGGTA", "Z41_5_tes_ACGTTCGCTAGCCGTA")

Changing a full last name to just the first letter of the name in R [duplicate]

This question already has answers here:
Getting and removing the first character of a string
(7 answers)
Extract the first (or last) n characters of a string
(5 answers)
Closed 2 years ago.
I'm working in R. I have a dataset with people's first and last names. There is a column called "First" and another column called "Last".
I want to change "Bodie" to just "B" and do the same for all the observations in the "Last" column.
I'm new to programming, so I don't even know where to start. I have looked at some of the string packages in R and can't quite figure out what to do. Thanks for the help.
We can use substr to extract the first letter of the 'Last' column
df1$Last <- substr(df1$Last, 1, 1)
Or sub to remove all the characters other than the first
df1$Last <- sub("^(.).*", "\\1", df1$Last)
Or another option is to split the characters, select the first element
df1$Last <- sapply(strsplit(df1$Last, ""), `[`, 1)
Just a variation on @akrun's answer which uses sub sans a capture group:
df1$Last <- sub("(?<=.).*$", "", df1$Last, perl=TRUE)
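The approaches above can be compared side by side on a small data frame. This is a sketch with hypothetical data: only the last name "Bodie" appears in the question, and the other values are made up for illustration.

```r
# Hypothetical data frame; only "Bodie" is taken from the question.
df1 <- data.frame(First = c("Ann", "Raj"), Last = c("Bodie", "Smith"),
                  stringsAsFactors = FALSE)

by_substr <- substr(df1$Last, 1, 1)                       # first character
by_sub    <- sub("^(.).*", "\\1", df1$Last)               # keep captured first char
by_split  <- sapply(strsplit(df1$Last, ""), `[`, 1)       # split, take element 1
by_perl   <- sub("(?<=.).*$", "", df1$Last, perl = TRUE)  # delete all after char 1

# All four approaches should agree on this input.
stopifnot(all(by_substr == by_sub),
          all(by_substr == by_split),
          all(by_substr == by_perl))

df1$Last <- by_substr
print(df1)
```

If the column is a factor rather than a character vector, strsplit() will raise an error, so converting with as.character() first may be needed.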

Split comma delimited string [duplicate]

This question already has an answer here:
Split delimited single value character vector
(1 answer)
Closed 5 years ago.
I have a string in R in the following form:
"AAAAA","BBBBB","CCCCC",..
I want to convert it to a standard R character vector containing the same string elements ("AAAAA", "BBBBB", etc.):
vector<-c("AAAAA","BBBBB","CCCCC",..)
I've read that strsplit could do it, but haven't managed to achieve it.
strsplit returns a list of character vectors, so if you want a single vector, use unlist as well.
So,
unlist(strsplit(string, ","))
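A minimal worked example follows. It assumes the input string contains literal double-quote characters around each element, as the question's notation suggests; if it does not, the gsub() step is unnecessary. The variable names pieces and vec are illustrative.

```r
# Assumption: the string holds literal double quotes around each element.
string <- '"AAAAA","BBBBB","CCCCC"'

pieces <- unlist(strsplit(string, ","))      # elements still carry the quotes
vec <- gsub('"', "", pieces, fixed = TRUE)   # strip the embedded quotes
print(vec)
```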

R-taking reverse of substring of strsplit sentence [duplicate]

This question already has answers here:
How to reverse a string in R
(14 answers)
Closed 5 years ago.
I have a sentence, ['this', 'is', 'my', 'house'].
After splitting it using "-" as a separator and reversing it to [house, my, is, this], how do I access the last part of the string? And how do I join my and is together with house to form another sentence?
sentence <- c("this","is","my","house")
strsplit(sentence[4], split="")[[1]][nchar(sentence[4]):1]
This code might be a bit dense for a beginner to interpret. The [[1]] is necessary because the value of strsplit is always a list, even when it's just one vector of individual characters; the indexing extracts that vector. The indexing after that, [nchar(sentence[4]):1], reorders the letters in that vector backwards, from the last to the first, in this case c(5,4,3,2,1). The split="" argument causes the strsplit function to split the string at every possible point, i.e. between each character.
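# Assumes sentence is a single hyphen-separated string such as "this-is-my-house"
out <- strsplit(sentence, "-")[[1]]     # [[1]] extracts the vector of words
last <- out[length(out)]                # the last word, e.g. "house"
flip <- rev(strsplit(last, "")[[1]])    # its characters in reverse order
word <- paste(flip, collapse='')        # rejoined into one string, e.g. "esuoh"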
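The word-level steps described in the question can be sketched as follows. This assumes the sentence starts out as one hyphen-separated string; the names words, reversed, last_part, and joined are illustrative.

```r
# Assumption: the sentence is a single hyphen-separated string.
sentence <- "this-is-my-house"

words <- strsplit(sentence, "-")[[1]]            # "this" "is" "my" "house"
reversed <- rev(words)                           # "house" "my" "is" "this"
last_part <- reversed[length(reversed)]          # last element after reversing
joined <- paste(reversed[1:3], collapse = " ")   # join "house", "my", "is"

print(last_part)
print(joined)
```

Here rev() reverses the order of the words, not the letters within them; reversing letters requires splitting a single word with split = "" first.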
