Split a character to letters and numbers

I have a character string in which each letter is followed by a number, for instance: A1B10C5
I would like to split it into letter <- c("A", "B", "C") and number <- c(1, 10, 5) using R.

We can use regex lookarounds to split at the boundaries between letters and numbers:
str1 <- "A1B10C5"
v1 <- strsplit(str1, "(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])", perl = TRUE)[[1]]
v1[c(TRUE, FALSE)]
#[1] "A" "B" "C"
as.numeric(v1[c(FALSE, TRUE)])
#[1] 1 10 5
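The indexing with c(TRUE, FALSE) relies on R recycling a short logical index along the whole vector, so it picks every other element. A minimal sketch of that behaviour, assuming the tokens alternate letter, number, letter, number (the vector v1 is written out by hand here rather than produced by strsplit):

```r
# Tokens as they would come out of the split, written out by hand.
v1 <- c("A", "1", "B", "10", "C", "5")

# A length-2 logical index is recycled: TRUE, FALSE, TRUE, FALSE, ...
letters_part <- v1[c(TRUE, FALSE)]              # elements 1, 3, 5
numbers_part <- as.numeric(v1[c(FALSE, TRUE)])  # elements 2, 4, 6

# The same selection with explicit positions, for comparison.
same_letters <- v1[seq(1, length(v1), by = 2)]
```

This only works if the string really alternates; a string that starts with a digit would swap the two groups.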

str_extract_all is another way to do this:
library(stringr)
> str <- "A1B10C5"
> str
[1] "A1B10C5"
> str_extract_all(str, "[0-9]+")
[[1]]
[1] "1" "10" "5"
> str_extract_all(str, "[A-Za-z]+")
[[1]]
[1] "A" "B" "C"

To extract letters and numbers at the same time, you can use str_match_all to get letters and numbers in two separate columns:
library(stringr)
str_match_all("A1B10C5", "([a-zA-Z]+)([0-9]+)")[[1]][,-1]
# [,1] [,2]
#[1,] "A" "1"
#[2,] "B" "10"
#[3,] "C" "5"

You can also use the base R regmatches with gregexpr:
this <- "A1B10C5"
regmatches(this, gregexpr("[0-9]+", this))
[[1]]
[1] "1" "10" "5"
regmatches(this, gregexpr("[A-Z]+", this))
[[1]]
[1] "A" "B" "C"
Each call returns a list with a single element, a character vector. As akrun does, you can extract that element using [[1]] and can also convert the vector of digits to numeric like this:
as.numeric(regmatches(this, gregexpr("[0-9]+", this))[[1]])
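The two regmatches calls can be combined into one pass. The following is only a sketch under the assumption that the input contains nothing but ASCII letters and digits; split_letters_numbers is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: pull out every run of letters or digits, then sort
# the tokens into the two groups by testing which ones are all digits.
split_letters_numbers <- function(s) {
  tokens <- regmatches(s, gregexpr("[A-Za-z]+|[0-9]+", s))[[1]]
  is_num <- grepl("^[0-9]+$", tokens)
  list(letter = tokens[!is_num], number = as.numeric(tokens[is_num]))
}

res <- split_letters_numbers("A1B10C5")
res$letter
res$number
```

Returning a list keeps the letters as character and the numbers as numeric, which a single vector could not do.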

Related

R: strsplit on negative lookaround

Say I need to strsplit caabacb into individual letters except when a letter is followed by a b, thus resulting in "c" "a" "ab" "a" "cb". I tried using the following line, which looks OK on regex tester but does not work in R. What did I do wrong?
strsplit('caabacb','(?!b)',perl=TRUE)
[[1]]
[1] "c" "a" "a" "b" "a" "c" "b"

You could also add a prefix positive lookbehind that matches any character (?<=.). The positive lookbehind (?<=.) would split the string at every character (without removal of characters), but the negative lookahead (?!b) excludes splits where a character is followed by a b:
strsplit('caabacb', '(?<=.)(?!b)', perl = TRUE)
#> [[1]]
#> [1] "c" "a" "ab" "a" "cb"

strsplit() probably needs something to split. You could insert e.g. a ";" with gsub().
strsplit(gsub("(?!^.|b|\\b)", ";", "caabacb", perl=TRUE), ";", perl=TRUE)
# [[1]]
# [1] "c" "a" "ab" "a" "cb"
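Another option is to sidestep splitting altogether and extract the pieces instead. This is a sketch, not one of the original answers: it assumes every token is a single character optionally followed by one b, which holds for the example string.

```r
x <- "caabacb"

# ".b?" matches any one character plus a following "b" if there is one,
# so each match is exactly one of the wanted tokens.
tokens <- regmatches(x, gregexpr(".b?", x))[[1]]
tokens
```

Because the pattern consumes characters rather than matching an empty position, it does not depend on how strsplit treats zero-length matches.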

Trim text after character for every item in list - R

I am trying to remove the text before and including a character ("-") for every element in a list.
Example:
x = list(c("a-b","b-c","c-d"),c("a-b","e-f"))
desired output:
"b" "c" "d"
"b" "f"
I have tried using various combinations of lapply and gsub, such as
lapply(x,gsub,'.*-','',x)
but this just returns a list of empty strings:
[[1]]
[1] ""
[[2]]
[1] ""
And only using
gsub(".*-","",x)
returns
"d\")" "f\")"

You are close. lapply passes each list element to gsub as its first unnamed argument, and the first parameter of gsub is pattern, not x. If you name pattern and replacement explicitly, the list element falls through to x:
x <- list(c("a-b","b-c","c-d"),c("a-b","e-f"))
lapply(x, gsub, pattern = "^.*-", replacement = "")
[[1]]
[1] "b" "c" "d"
[[2]]
[1] "b" "f"
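An anonymous function makes the argument order explicit, which some readers find easier to follow. The sketch below also shows what the greedy .* does when an element contains more than one hyphen; the string "a-b-c" is made up for illustration and is not in the question's data.

```r
x <- list(c("a-b", "b-c", "c-d"), c("a-b", "e-f"))

# Each element v is a character vector; sub() is vectorised over it.
res <- lapply(x, function(v) sub("^.*-", "", v))

# Greedy ".*" removes everything up to the LAST hyphen ...
after_last <- sub("^.*-", "", "a-b-c")
# ... while "[^-]*" stops at the FIRST hyphen.
after_first <- sub("^[^-]*-", "", "a-b-c")
```

Here sub is enough because the anchored pattern can match at most once per string.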

This can also be done with a for loop:
val <- vector("list", length(x))
for (i in seq_along(x)) {
  val[[i]] <- gsub(".*-", "", x[[i]])
}
val
[[1]]
[1] "b" "c" "d"
[[2]]
[1] "b" "f"

How do I apply an index vector over a list of vectors?

I want to apply a long index vector (50+ non-sequential integers) to a long list of vectors (50+ character vectors containing 100+ names) in order to retrieve specific values (as a list, vector, or data frame).
A simplified example is below:
> my.list <- list(c("a","b","c"),c("d","e","f"))
> my.index <- 2:3
Desired Output
[[1]]
[1] "b"
[[2]]
[1] "f"
##or
[1] "b"
[1] "f"
##or
[1] "b" "f"
I know I can get the same value from each element using:
> lapply(my.list, function(x) x[2])
##or
> lapply(my.list,'[', 2)
I can pull the second and third values from each element by:
> lapply(my.list,'[', my.index)
[[1]]
[1] "b" "c"
[[2]]
[1] "e" "f"
##or
> for(j in my.index) for(i in seq_along(my.list)) print(my.list[[i]][[j]])
[1] "b"
[1] "e"
[1] "c"
[1] "f"
I don't know how to pull just the one value from each element.
I've been looking for a few days and haven't found any examples of this being done, but it seems fairly straight forward. Am I missing something obvious here?
Thank you,
Scott

Whenever you have a problem that is like lapply but involves multiple parallel lists/vectors, consider Map or mapply (Map simply being a wrapper around mapply with SIMPLIFY=FALSE hardcoded).
Try this:
Map("[",my.list,my.index)
#[[1]]
#[1] "b"
#
#[[2]]
#[1] "f"
..or:
mapply("[",my.list,my.index)
#[1] "b" "f"
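To make the pairing explicit: Map and mapply walk the list and the index vector in parallel, so element i of the list is subset with element i of the index. A sketch of the same idea written as an explicit loop over positions, assuming the list and the index have the same length and every list element is a character vector:

```r
my.list <- list(c("a", "b", "c"), c("d", "e", "f"))
my.index <- 2:3

# Guard against silent recycling when the lengths differ.
stopifnot(length(my.list) == length(my.index))

# vapply checks that each result is a single character string.
picked <- vapply(seq_along(my.list),
                 function(i) my.list[[i]][my.index[i]],
                 character(1))
picked
```

vapply is stricter than mapply here: it fails if any pick is not exactly one string, which can help catch an index that runs past the end of a vector.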

R - generate all combinations from 2 vectors given constraints

I would like to generate all combinations of two vectors, given two constraints: there can never be more than 3 characters from the first vector, and there must always be at least one character from the second vector. I would also like to vary the final number of characters in the combination.
For instance, here are two vectors:
vec1=c("A","B","C","D")
vec2=c("W","X","Y","Z")
Say I wanted 3 characters in the combination. Possible acceptable permutations would be: "A" "B" "X" or "A" "Y" "Z". An unacceptable permutation would be: "A" "B" "C" since there is not at least one character from vec2.
Now say I wanted 5 characters in the combination. Possible acceptable permutations would be: "A" "C" "Z" "Y" or "A" "Y" "Z" "X". An unacceptable permutation would be: "A" "C" "D" "B" "X" since there are >3 characters from vec1.
I suppose I could use expand.grid to generate all combinations and then somehow subset, but there must be an easier way. Thanks in advance!

I'm not sure whether this is easier, but you can leave out the permutations that do not satisfy your conditions with this strategy:
1. Generate all combinations from vec1 that are acceptable.
2. Generate all combinations from vec2 that are acceptable.
3. Generate all combinations taking one solution from 1. plus one solution from 2. Here I'd filter on the third condition (the total number of characters) afterwards.
4. (If you're looking for combinations, you're done; otherwise:) produce all permutations of letters within each result.
Now, let's have
vec1 <- LETTERS [1:4]
vec2 <- LETTERS [23:26]
## lists can eat up lots of memory, so use character vectors instead.
combine <- function (x, y)
combn (y, x, paste, collapse = "")
res1 <- unlist (lapply (0:3, combine, vec1))
res2 <- unlist (lapply (1:length (vec2), combine, vec2))
Now we have:
> res1
[1] "" "A" "B" "C" "D" "AB" "AC" "AD" "BC" "BD" "CD" "ABC"
[13] "ABD" "ACD" "BCD"
> res2
[1] "W" "X" "Y" "Z" "WX" "WY" "WZ" "XY" "XZ" "YZ"
[11] "WXY" "WXZ" "WYZ" "XYZ" "WXYZ"
res3 <- outer (res1, res2, paste0)
res3 <- res3 [nchar (res3) == 5]
So here you are:
> res3
[1] "ABCWX" "ABDWX" "ACDWX" "BCDWX" "ABCWY" "ABDWY" "ACDWY" "BCDWY" "ABCWZ"
[10] "ABDWZ" "ACDWZ" "BCDWZ" "ABCXY" "ABDXY" "ACDXY" "BCDXY" "ABCXZ" "ABDXZ"
[19] "ACDXZ" "BCDXZ" "ABCYZ" "ABDYZ" "ACDYZ" "BCDYZ" "ABWXY" "ACWXY" "ADWXY"
[28] "BCWXY" "BDWXY" "CDWXY" "ABWXZ" "ACWXZ" "ADWXZ" "BCWXZ" "BDWXZ" "CDWXZ"
[37] "ABWYZ" "ACWYZ" "ADWYZ" "BCWYZ" "BDWYZ" "CDWYZ" "ABXYZ" "ACXYZ" "ADXYZ"
[46] "BCXYZ" "BDXYZ" "CDXYZ" "AWXYZ" "BWXYZ" "CWXYZ" "DWXYZ"
If you prefer the results split into single letters:
res <- matrix (unlist (strsplit (res3, "")), nrow = length (res3), byrow = TRUE)
> res
[,1] [,2] [,3] [,4] [,5]
[1,] "A" "B" "C" "W" "X"
[2,] "A" "B" "D" "W" "X"
[3,] "A" "C" "D" "W" "X"
[4,] "B" "C" "D" "W" "X"
(snip)
[51,] "C" "W" "X" "Y" "Z"
[52,] "D" "W" "X" "Y" "Z"
Which are your combinations.
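The same strategy can be wrapped in a function so that the total length is a parameter. This is a sketch only: constrained_combos, max1 and min2 are hypothetical names, and instead of building every pairing and filtering by nchar afterwards it only pairs sizes that already add up to n.

```r
# Hypothetical helper: all combinations of n letters using at most max1
# letters from vec1 and at least min2 letters from vec2.
constrained_combos <- function(vec1, vec2, n, max1 = 3, min2 = 1) {
  stopifnot(n >= min2)
  # All size-k combinations of v, each pasted into one string.
  pick <- function(v, k) {
    if (k == 0) return("")
    combn(v, k, paste, collapse = "")
  }
  out <- character(0)
  for (k1 in 0:min(max1, length(vec1), n - min2)) {
    k2 <- n - k1                      # letters still needed from vec2
    if (k2 > length(vec2)) next       # vec2 cannot supply that many
    out <- c(out, as.vector(outer(pick(vec1, k1), pick(vec2, k2), paste0)))
  }
  out
}

vec1 <- LETTERS[1:4]
vec2 <- LETTERS[23:26]
res5 <- constrained_combos(vec1, vec2, 5)
length(res5)
```

For n = 5 this gives the same 52 strings as res3 above, though in a different order.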

Better way to apply this function to each row a data frame?

I'd like to apply a function to each row of a data frame, as below. I know how to use apply in the case where the data frame contains only numbers, but what if the rows contain, say, booleans / logicals, strings and integers? Example:
df <- data.frame(x=1:10,
y=c(TRUE, FALSE),
z=letters[1:10],
stringsAsFactors=FALSE)
RowFunction <- function(row) {
if (row$y) return(row$x)
return (row$z)
}
sapply(1:dim(df)[1], function(i) { RowFunction(df[i, ]) })
Is there a better way to do this? My first thought was to use apply(df, 1, RowFunction) after adding row <- as.list(row) to the beginning of RowFunction, but this doesn't work because apply coerces df into an array, which can't handle rows containing different data types.
Just for my R knowledge, I'd like to know if there is a cleaner way to do this than sapply(1:dim(df)[1], ... ). Any ideas?
Thanks in advance!

In this case, you can simply use ifelse:
sapply(1:dim(df)[1], function(i) { RowFunction(df[i, ]) })
[1] "1" "b" "3" "d" "5" "f" "7" "h" "9" "j"
with(df, ifelse(y, x, z))
[1] "1" "b" "3" "d" "5" "f" "7" "h" "9" "j"
For convenience and readability I also used with - this allows you to refer to a column just by name, without using the $ operator.

ifelse can also be applied element by element with Map, which returns a list whose elements keep their own modes:
Map(ifelse, df$y, df$x, df$z) # returns a list with varying modes
My earlier (more clunky) version:
res <- list()
for(i in seq_along(rownames(df) ) ) { res <- c(res, df[i,1+2*!df[i,"y"] ]) }
res
#--------
[[1]]
[1] 1
[[2]]
[1] "b"
[[3]]
[1] 3
[[4]]
[1] "d"
[[5]]
[1] 5
[[6]]
[1] "f"
[[7]]
[1] 7
[[8]]
[1] "h"
[[9]]
[1] 9
[[10]]
[1] "j"
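For row functions that are more involved than a single ifelse, one option is to write the function in terms of the column values and let Map walk the columns in parallel. A sketch under the assumption that the function needs only x, y and z; row_fun is a hypothetical name:

```r
df <- data.frame(x = 1:10,
                 y = c(TRUE, FALSE),
                 z = letters[1:10],
                 stringsAsFactors = FALSE)

# Hypothetical row function, written against scalar column values
# instead of a one-row data frame.
row_fun <- function(x, y, z) if (y) x else z

# Map calls row_fun once per row; the result is a list, so each element
# keeps its own type (integer or character).
res <- Map(row_fun, df$x, df$y, df$z)
res[[1]]
res[[2]]
```

Unlike apply, this never converts the data frame to a matrix, so the columns are not coerced to a common type.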
