String lengths after splitting - r

I have a string that I would like to split so that I can access the pieces as strings. Here is my code:
x <- "b0.5,0.5"
y <- strsplit(x,',')
str(y)
y[[1]][1]
str(y[[1]][1])
length(y[[1]][1])
z <- y[[1]][1]
length(z)
substr(z,1,1)
substr(z,1,2)
substr(z,1,3)
substr(z,1,4)
The length of z is 1, but I can access at least 4 substrings of length 1. Can someone explain this to me? Thanks!

Well strsplit() returns a list, so if you want a character vector of components as a result, then unlist the return value:
x <- "b0.5,0.5"
y <- unlist(strsplit(x, ","))
y
[1] "b0.5" "0.5"
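To address the original puzzle directly: length() counts the elements of a vector, whereas nchar() counts the characters inside each string. A minimal sketch using the same input as the question (the variable names simply follow the question):

```r
x <- "b0.5,0.5"
z <- strsplit(x, ",")[[1]][1]

length(z)        # number of elements in the character vector z
nchar(z)         # number of characters in the single string held by z
substr(z, 1, 4)  # substr() indexes characters, not vector elements
```

So z is a character vector with one element, and that one element happens to be four characters long, which is why substr() can reach positions 1 through 4.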

Depending on how many entries you have my suggestion would be to use sapply:
x <- "b0.5,0.5"
y <- strsplit(x,',')
sapply(y, \(z) z)
[,1]
[1,] "b0.5"
[2,] "0.5"

Related

Issue matching string in list

I am trying to see if a list contains a particular string but I am having an issue.
> k
[1] "Investment"
> t
[[1]]
[1] "Investment" "Non-Investment"
> class(k)
[1] "character"
> class(t)
[1] "list"
> k %in% t
[1] FALSE
Shouldn't the above code result in TRUE rather than FALSE?
You need to unlist the list:
X <- "investment"
Y <- list(c("non-investment", "investment"))
X %in% unlist(Y)
Note I've changed it to X and Y: t is a base function so it's best not to overwrite it because it might cause conflicts!
One thing to consider is a list holding several vectors: decide whether you want to search across all of the vectors or within one specific vector. Use unlist to check all vectors at once, and double square brackets to check a specific one. To illustrate, Y below has two sub-vectors and the X string is in the second one. X %in% unlist(Y) returns TRUE because X is somewhere in Y, while X %in% Y[[1]] returns FALSE because %in% only checks the first sub-vector:
X <- "alpha"
Y <- list(c("non-investment", "investment"), c("alpha", "beta"))
X %in% unlist(Y)
X %in% Y[[1]]
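If you need to know which sub-vector holds the string, rather than just whether it appears anywhere, one possible sketch is to apply %in% to each element of the list. The objects reuse X and Y from the example above; in_each is a name made up for illustration, and vapply is only one of several ways to do this:

```r
X <- "alpha"
Y <- list(c("non-investment", "investment"), c("alpha", "beta"))

# One logical per sub-vector: does this sub-vector contain X?
in_each <- vapply(Y, function(v) X %in% v, logical(1))
in_each

# Positions of the sub-vectors that contain X
which(in_each)
```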
Note that if you had specified Y as just a vector (which is essentially what it is in your example, because there are no other sub-vectors), you could simply use:
X <- "investment"
Y <- c("non-investment", "investment")
X %in% Y
The problem is that t is a length-one list containing a vector, so %in% compares k against the whole vector rather than its elements. Try k %in% t[[1]]. You may also want to use unlist().

R: Filter vectors by 'two-way' partial match

With two vectors
x <- c("abc", "12")
y <- c("bc", "123", "nomatch")
is there a way to filter both by 'two-way' partial matching (remove elements in one vector if they contain or are contained in any element of the other vector) so that the result is these two vectors:
x1 <- c()
y1 <- c("nomatch")
To explain - every element of x is either a substring or a superstring of one of the elements of y, hence x1 is empty. Update - it is not sufficient for a substring to match the initial chars - a substring might be found anywhere in the string it matches. Example above has been updated to reflect this.
I originally thought ?pmatch might be handy, but your edit clarifies you don't just want to match the start of items. Here's a function that should work:
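remover <- function(x,y) {
  # pmx[[i]]: indices of y that contain x[i]; pmy[[j]]: indices of x that contain y[j]
  pmx <- lapply(x, grep, x=y)
  pmy <- lapply(y, grep, x=x)
  # Keep x-indices and y-indices separate so they are never mixed up
  hitx <- union(which(lengths(pmx) > 0), unlist(pmy))
  hity <- union(which(lengths(pmy) > 0), unlist(pmx))
  list(
    x[!(seq_along(x) %in% hitx)],
    y[!(seq_along(y) %in% hity)]
  )
}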
remover(x,y)
#[[1]]
#character(0)
#
#[[2]]
#[1] "nomatch"
It correctly does nothing when no match is found (thanks @Frank for picking up the earlier error):
remover("yo","nomatch")
#[[1]]
#[1] "yo"
#
#[[2]]
#[1] "nomatch"
We can do the following:
# Return data.frame of matches of a in b
m <- function(a, b) {
  # One column per element of a, one row per element of b
  data.frame(sapply(a, function(w) grepl(w, b), simplify = FALSE))
}
# Match x and y and remove
x0 <- x[!apply(m(x, y), 2, any)]
y0 <- y[!apply(m(x, y), 1, any)]
# Match y and x and remove
x1 <- x0[!apply(m(y0, x0), 1, any)]
y1 <- y0[!apply(m(y0, x0), 2, any)]
x1
#character(0)
y1
#[1] "nomatch"
I build a matrix of all possible matches in both directions, combine the two with | (a match in either direction counts as a match), and then use the result to subset x and y:
x <- c("abc", "12")
y <- c("bc", "123", "nomatch")
bool_mat <- sapply(x,function(z) grepl(z,y)) | t(sapply(y,function(z) grepl(z,x)))
x1 <- x[!apply(bool_mat,2,any)] # character(0)
y1 <- y[!apply(bool_mat,1,any)] # [1] "nomatch"
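One caveat with the grep/grepl approaches above: by default the pattern is treated as a regular expression, so an element such as "a.c" would match "abc". If the elements are meant to be literal substrings, a possible variation is to pass fixed = TRUE. The sketch below is only an illustration under that assumption; the extra element "a.c" and the names x_in_y, y_in_x and hit are made up for the example:

```r
x <- c("abc", "12")
y <- c("bc", "123", "nomatch", "a.c")

# x_in_y[i, j] is TRUE when y[i] contains x[j] as a literal substring
x_in_y <- sapply(x, function(p) grepl(p, y, fixed = TRUE))
# y_in_x[i, j] is TRUE when x[j] contains y[i] as a literal substring
y_in_x <- t(sapply(y, function(p) grepl(p, x, fixed = TRUE)))

# A hit in either direction removes the element
hit <- x_in_y | y_in_x
x1 <- x[!apply(hit, 2, any)]
y1 <- y[!apply(hit, 1, any)]
x1
y1
```

With fixed = TRUE, "a.c" survives in y1 because the dot is no longer a wildcard. Note that this sketch assumes both vectors have at least two elements, since sapply returns a plain vector instead of a matrix otherwise.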

How to extract some characters based on certain patterns from a vector?

Here are the data
x <- c("a01|a44;b013|b021|c35;c014|c035|c078")
y <- c("a03|a41;b033|b021|72;c014|c031|c078")
z <- c("a01|a44;c014|c035|c078;b013|b021|d35|c33")
v <- c(x, y, z)
I want to extract the third element separated by "|" from a string starting with "b0". The expected result is c35,72,d35.
We can try
sapply(strsplit(v, ';'), function(x)
sapply(strsplit(x[grep('^b0', x)], '[|]'), `[`,3))
#[1] "c35" "72" "d35"
Or use sub
sub('.*;b0\\d{2}\\|[^|]+\\|([^;|]+).*', '\\1', v)
#[1] "c35" "72" "d35"
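If some strings might not contain a "b0" segment at all, the sub approach returns the whole input unchanged, which can be hard to spot. A more explicit sketch, assuming NA is an acceptable result in that case, is shown below. The helper name third_b0_field is hypothetical, and startsWith needs R 3.3 or later:

```r
x <- c("a01|a44;b013|b021|c35;c014|c035|c078")
y <- c("a03|a41;b033|b021|72;c014|c031|c078")
z <- c("a01|a44;c014|c035|c078;b013|b021|d35|c33")
v <- c(x, y, z)

# Hypothetical helper: third "|"-separated field of the first segment starting with "b0"
third_b0_field <- function(s) {
  segs <- strsplit(s, ";", fixed = TRUE)[[1]]
  seg <- segs[startsWith(segs, "b0")]
  if (length(seg) == 0) return(NA_character_)
  # Indexing past the end of the fields yields NA rather than an error
  strsplit(seg[1], "|", fixed = TRUE)[[1]][3]
}

res <- vapply(v, third_b0_field, character(1), USE.NAMES = FALSE)
res
```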

Turning a couple of vectors into a list of vectors

Suppose I have a collection of independent vectors, of the same length. For example,
x <- 1:10
y <- rep(NA, 10)
and I wish to turn them into a list whose length is that common length (10 in the given example), in which each element is a vector whose length is the number of independent vectors that were given. In my example, assuming output is the output object, I'd expect
> str(output)
List of 10
$ : num [1:2] 1 NA
...
> output
[[1]]
[1] 1 NA
...
What's the common method of doing that?
Use mapply and c:
mapply(c, x, y, SIMPLIFY=FALSE)
[[1]]
[1] 1 NA
[[2]]
[1] 2 NA
..<cropped>..
[[10]]
[1] 10 NA
Another option:
split(cbind(x, y), seq(length(x)))
or even:
split(c(x, y), seq(length(x)))
or even (assuming x has no duplicate values as in your example):
split(c(x, y), x)
Here is a solution that lets you zip an arbitrary number of equal-length vectors into a list, based on the position of each element:
merge_by_pos <- function(...){
  dotlist <- list(...)
  # Iterate over element positions, not over the number of vectors supplied
  lapply(seq_along(dotlist[[1]]), function(i){
    Reduce('c', lapply(dotlist, '[[', i))
  })
}
x <- 1:10
y <- rep(NA, 10)
z <- 21:30
merge_by_pos(x, y, z)
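For comparison, the same position-wise zipping can probably be had from Map, which is a thin wrapper around mapply with SIMPLIFY = FALSE. This is just a sketch; zip_vectors is a made-up name, not an existing function:

```r
# Hypothetical helper: combine the i-th elements of every input into one vector
zip_vectors <- function(...) Map(c, ...)

x <- 1:10
y <- rep(NA, 10)
z <- 21:30

output <- zip_vectors(x, y, z)
length(output)  # one list element per position
output[[1]]     # first elements of x, y and z
```

Map recycles shorter arguments, so it may be worth checking that all inputs have the same length before calling it.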

R programming: Creating a list of paired elements

I have a list of elements say:
l <- c("x","ya1","xb3","yb3","ab","xc3","y","xa1","yd4")
Out of this list I would like to make a list of the matching x,y pairs, i.e.
(("xa1" "ya1") ("xb3" "yb3") ("x" "y"))
In essence, I need to capture the X elements, the Y elements and then pair them up:
I know how to do the X,Y extraction part:
xelems <- grep("^x", l, perl=TRUE, value=TRUE)
yelems <- grep("^y", l, perl=TRUE, value=TRUE)
An X element pairs up with a Y element when
1. xElem == yElem # if xElem and yElem are one char long, i.e. 'x' and 'y'
2. substr(xElem,2,nchar(xElem)) == substr(yElem,2,nchar(yElem)) # otherwise, everything after the first char must match
There is no order, i.e. matching xElem and yElem can be positioned anywhere.
I am however not very sure about the next part. I am more familiar with the SKILL programming language (SKILL is a LISP derivative) and this is how I write it:
procedure( get_xy_pairs(inputList "l")
let(( yElem (xyPairs nil) xList yList)
xList=setof(i inputList rexMatchp("^x" i))
yList=setof(i inputList rexMatchp("^y" i))
when(xList && yList
unless(length(xList)==length(yList)
warn("xList and yList mismatch : %d vs %d\n" length(xList) length(yList))
)
foreach(xElem xList
if(xElem=="x"
then yElem="y"
else yElem=strcat("y" substring(xElem 2 strlen(xElem)))
)
if(member(yElem yList)
then xyPairs=cons(list(xElem yElem) xyPairs)
else warn("x element %s has no matching y element \n" xElem)
)
)
)
xyPairs
)
)
When run on l, this would return
get_xy_pairs(l)
*WARNING* x element xc3 has no matching y element
(("xa1" "ya1") ("xb3" "yb3") ("x" "y"))
As I am still new to R, I would appreciate any help. Also, I understand that R users tend to avoid for loops and prefer lapply?
Maybe something like this would work. (Only tested on your sample data.)
## Remove any item not starting with x or y
l2 <- l[grepl("^x|^y", l)]
## Split into a list of items starting with x
## and items starting with y
L <- split(l2, grepl("^x", l2))
## Give "names" to the "starting with y" group
names(L[[1]]) <- gsub("^y", "x", L[[1]])
## Use match to match the names in the y group with
## the values from the x group. This results in a
## nice named vector with the pairs you want
Matches <- L[[1]][match(L[[2]], names(L[[1]]), nomatch=0)]
Matches
# x xb3 xa1
# "y" "yb3" "ya1"
As a data.frame:
MatchesDF <- data.frame(x = names(Matches), y = unname(Matches))
MatchesDF
# x y
# 1 x y
# 2 xb3 yb3
# 3 xa1 ya1
I would store the tuples in a list, i.e.:
xypairs
[[1]]
[1] "x" "y"
[[2]]
[1] "xb3" "yb3"
Your procedure can be simplified with match and substring.
xends <- substring(xelems, 2)
yends <- substring(yelems, 2)
ypaired <- match(xends, yends) # Indices of yelems that match xelems
# Now we need to handle the no-matches:
ysorted <- yelems[ypaired]                     # NA where an x has no partner
yextra <- yelems[!(yelems %in% ysorted)]       # y elements that were never matched
xsorted <- c(xelems, rep(NA, length(yextra)))  # pad x with one NA per unmatched y
ysorted <- c(ysorted, yextra)
# Now we create the list of tuples:
xypairs <- lapply(seq_along(ysorted), function(i) {
  c(xsorted[i], ysorted[i])
})
Result:
xypairs
[[1]]
[1] "x" "y"
[[2]]
[1] "xb3" "yb3"
[[3]]
[1] "xc3" NA
[[4]]
[1] "xa1" "ya1"
[[5]]
[1] NA "yd4"
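If only the complete pairs are wanted, as in the expected output of the question, a shorter sketch is to build the y name each x element should pair with and keep the ones that exist. The names wanted, found and pairs are illustrative, and message() stands in for the warning in the SKILL version:

```r
l <- c("x","ya1","xb3","yb3","ab","xc3","y","xa1","yd4")
xelems <- grep("^x", l, value = TRUE)
yelems <- grep("^y", l, value = TRUE)

# The y element each x element should pair with: swap the leading "x" for "y"
wanted <- sub("^x", "y", xelems)
found <- wanted %in% yelems

# Report x elements without a partner
if (any(!found)) {
  message("x element(s) with no matching y element: ",
          paste(xelems[!found], collapse = ", "))
}

# List of c(x, y) tuples for the complete pairs only
pairs <- unname(Map(c, xelems[found], wanted[found]))
pairs
```

The pairs come out in the order the x elements appear in l, and unmatched y elements such as "yd4" are silently dropped.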

Resources