R: taking the reverse of a substring of a strsplit sentence [duplicate]

This question already has answers here:
How to reverse a string in R
(14 answers)
Closed 5 years ago.
I have a sentence, ['this', 'is', 'my', 'house'].
After splitting it using "-" as a separator and reversing it to ['house', 'my', 'is', 'this'], how do I access the last part of the string? And how do I join 'my' and 'is' together with 'house' to form another sentence?

sentence <- c("this","is","my","house")
strsplit(sentence[4], split="")[[1]][nchar(sentence[4]):1]
This code might be a bit dense for a beginner to interpret. The [[1]] is necessary because the value of strsplit is always a list, even when it's just one vector of individual characters; the indexing extracts that vector. The indexing after that, [nchar(sentence[4]):1], reorders the letters in that vector backwards, from the last to the first, in this case c(5,4,3,2,1). The split="" argument causes the strsplit function to split the string at every possible point, i.e. between each character.
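To make the one-liner above easier to follow, the same steps can be written out one at a time. This is only a sketch: the intermediate names (chars, backwards, reversed_word) are made up for illustration and are not part of the original answer.

```r
sentence <- c("this", "is", "my", "house")

# strsplit() always returns a list, so [[1]] pulls out the character vector
chars <- strsplit(sentence[4], split = "")[[1]]

# Index from the last position down to the first, here 5:1
backwards <- chars[nchar(sentence[4]):1]

# rev() produces the same reordering without building the index by hand
same_as_rev <- identical(backwards, rev(chars))

# Collapse the letters back into a single string
reversed_word <- paste(backwards, collapse = "")
reversed_word
```

Using rev() directly is usually easier to read than the nchar(...):1 index, and it gives the same result here.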

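out <- strsplit(sentence, "-")          # a list with one element per word
last <- out[[length(out)]]              # [[ ]] extracts the last element itself: "house"
flip <- rev(strsplit(last, "")[[1]])    # split into letters, then reverse them
word <- paste(flip, collapse='')        # "esuoh"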
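The question also asks about reversing the order of the words and joining some of them into a new sentence. A minimal sketch, assuming the sentence starts out as a single hyphen-separated string (the input and the variable names are illustrative, not taken from the answers):

```r
# Assumed input: the whole sentence as one hyphen-separated string
text <- "this-is-my-house"

# Split on the literal "-" and take the character vector out of the list
words <- strsplit(text, "-", fixed = TRUE)[[1]]

# Reverse the order of the words
reversed <- rev(words)

# Access the last element of the reversed vector
last_word <- reversed[length(reversed)]

# Join the first three reversed words into a new sentence
new_sentence <- paste(reversed[1:3], collapse = " ")
new_sentence
```

Here paste() with collapse joins the selected words with a space; any other separator can be passed instead.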

Related

Is there an R function to format a character string pattern? [duplicate]

This question already has an answer here:
Split delimited single value character vector
(1 answer)
Closed 5 years ago.
I have a string in R in the following form:
"AAAAA","BBBBB","CCCCC",..
I want to convert it to a standard R vector containing the same string elements ("AAAAA", "BBBBB", etc.):
vector<-c("AAAAA","BBBBB","CCCCC",..)
I've read that strsplit could do it, but haven't managed to achieve it.
strsplit gives you back a list of character vectors, so if you want a single vector, use unlist as well:
unlist(strsplit(string, ","))
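One thing to watch for: if the quotation marks are literally part of the string, they remain in each piece after splitting. A small sketch, assuming a hypothetical input whose contents include the quotes and using gsub() to strip them afterwards:

```r
# Hypothetical input: the double quotes are part of the string itself
string <- '"AAAAA","BBBBB","CCCCC"'

# Split on the literal comma and flatten the list into one vector
pieces <- unlist(strsplit(string, ",", fixed = TRUE))

# Each element of pieces still carries its literal quotes; remove them
clean <- gsub('"', "", pieces, fixed = TRUE)
clean
```

If the string has no embedded quotes, the gsub() step is unnecessary and unlist(strsplit(...)) alone is enough.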

Changing a full last name to just the first letter of the name in R [duplicate]

This question already has answers here:
Getting and removing the first character of a string
(7 answers)
Extract the first (or last) n characters of a string
(5 answers)
Closed 2 years ago.
I'm working in R. I have a dataset with people's first and last names. There is a column called "First" and another column called "Last".
I want to change "Bodie" to just "B" and do the same for all the observations in the "Last" column.
I'm new to programming, so I don't even know where to start. I have looked at some of the string packages in R and can't quite figure out what to do. Thanks for the help.
We can use substr to extract the first letter of the 'Last' column:
df1$Last <- substr(df1$Last, 1, 1)
Or use sub to remove all the characters other than the first:
df1$Last <- sub("^(.).*", "\\1", df1$Last)
Another option is to split the string into characters and select the first element:
df1$Last <- sapply(strsplit(df1$Last, ""), `[`, 1)
Just a variation on @akrun's answer that uses sub without a capture group:
df1$Last <- sub("(?<=.).*$", "", df1$Last, perl=TRUE)
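To compare the approaches side by side, here is a small sketch on made-up data. The data frame contents are hypothetical (only "Bodie" and the column names come from the question), and the by_* names are illustrative.

```r
# Hypothetical data; column names follow the question
df1 <- data.frame(First = c("Sam", "Ana"),
                  Last  = c("Bodie", "Lopez"),
                  stringsAsFactors = FALSE)

by_substr <- substr(df1$Last, 1, 1)                        # positions 1 to 1
by_sub    <- sub("^(.).*", "\\1", df1$Last)                # keep the captured first character
by_split  <- sapply(strsplit(df1$Last, ""), `[`, 1)        # first letter of each split
by_perl   <- sub("(?<=.).*$", "", df1$Last, perl = TRUE)   # drop everything after the first character

# Check that all four approaches agree on this data
all_agree <- identical(by_substr, by_sub) &&
  identical(by_substr, by_split) &&
  identical(by_substr, by_perl)

df1$Last <- by_substr
df1$Last
```

For this task substr() is probably the simplest choice, since no regular expression is involved.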

Inserting character into variable names [duplicate]

This question already has answers here:
Insert a character at a specific location in a string
(8 answers)
Closed 5 years ago.
I have a dataset with variable names such as FamId00 and ISCO8899 and would like to write a command to insert an underscore before the last two digits, which represent years. What is the best way of doing it? I have tried with regex, but the furthest I got was:
gsub('.{2}$', '', varname)
which gives me:
FamId
How do I add '_' and the original last two digits back? Also, I have variables in the dataset that do not have the year in the last two digits (e.g. ID and sex). Is there a way to keep the regular expression from affecting those?
We don't need gsub; sub is enough, since only a single replacement is made per string. Capture the last two characters as a group ((...)) and, in the replacement, use _ followed by the backreference to that capture group:
sub("(.{2})$", "_\\1", varname)
#[1] "FamId_00" "ISCO88_99"
The . is a metacharacter that matches any character. If the match needs to be more specific, i.e. digits only, use \\d{2} in place of .{2}.
data
varname <- c("FamId00", "ISCO8899")
An alternative solution, again using sub() or gsub() but with a different pattern:
ids <- c("FamId00", "ISCO8899")
gsub("(^.*)([[:digit:]]{2}$)", "\\1_\\2", ids)
#[1] "FamId_00" "ISCO88_99"
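The second part of the question, leaving names such as ID and sex untouched, follows from restricting the pattern to digits. A short sketch comparing the two patterns on a vector that includes those names (the result variable names are illustrative):

```r
# "ID" and "sex" are the no-year examples mentioned in the question
varname <- c("FamId00", "ISCO8899", "ID", "sex")

# .{2} matches any two trailing characters, so every name is changed
any_two <- sub("(.{2})$", "_\\1", varname)

# \\d{2} only matches two trailing digits, so "ID" and "sex" are left alone
digits_only <- sub("(\\d{2})$", "_\\1", varname)
digits_only
```

With the digit-only pattern, sub() returns names without a match unchanged, so no separate filtering step is needed.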

How to subset a sapply function output [duplicate]

This question already has answers here:
Extracting nth element from a nested list following strsplit - R
(4 answers)
Closed 5 years ago.
Given a dataframe, I would like to use strsplit on one of my columns, and return the first element of the vector. Here is the example:
testdf <- data.frame(col1 = c('string1.string2', 'string3.string4'),
                     col2 = c('somevalue', 'someothervalue'),
                     stringsAsFactors = FALSE)
I want to generate a new column such as
testdf$col3 <- c('string1', 'string3')
I tried the following:
testdf$col3 <- strsplit(testdf$col1, split = '\\.')[[1]][1]
which, of course, doesn't work. It returns just the first element of the output ('string1') and writes it for the whole column.
One solution would be to write a custom function:
customfx <- function(ind_cell) {
  my_out <- strsplit(ind_cell, split = '\\.')[[1]][1]
  return(my_out)
}
Then use it with sapply. I was wondering if there is an alternative to this. The talking stick is yours :)
You can use sub (which is vectorized) with regex for this:
testdf$col3 <- sub("^([^.]+).*", "\\1", testdf$col1)
testdf
#             col1           col2    col3
#1 string1.string2      somevalue string1
#2 string3.string4 someothervalue string3
Here, ^([^.]+).* matches the whole string and captures the substring from the beginning up to the first dot; the whole string is then replaced with the captured group via the backreference \\1.
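If you would rather keep the strsplit approach, the custom function is not strictly needed: the indexing operator `[` can be passed to an apply-style function directly. A minimal sketch using vapply() as a type-checked stand-in for sapply (the name parts is illustrative):

```r
testdf <- data.frame(col1 = c("string1.string2", "string3.string4"),
                     col2 = c("somevalue", "someothervalue"),
                     stringsAsFactors = FALSE)

# fixed = TRUE treats "." as a literal dot, so no escaping is required
parts <- strsplit(testdf$col1, ".", fixed = TRUE)

# `[` is the indexing function; the trailing 1 selects the first piece of each split
testdf$col3 <- vapply(parts, `[`, character(1), 1)
testdf$col3
```

vapply() checks that every result is a single string, which catches unexpected input earlier than sapply would.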
