Issue matching string in list - r

I am trying to see if a list contains a particular string but I am having an issue.
> k
[1] "Investment"
> t
[[1]]
[1] "Investment" "Non-Investment"
> class(k)
[1] "character"
> class(t)
[1] "list"
> k %in% t
[1] FALSE
Shouldn't the above code return TRUE rather than FALSE?

You need to unlist the list:
X <- "investment"
Y <- list(c("non-investment", "investment"))
X %in% unlist(Y)
Note I've changed it to X and Y: t is a base function so it's best not to overwrite it because it might cause conflicts!
One thing to consider is a list with multiple vectors: decide whether you want to search across all the vectors or within one specific vector. Use unlist() to check all vectors at once, and double square brackets to check a specific one. To illustrate, Y below holds two vectors and the string X is in the second. X %in% unlist(Y) returns TRUE, while X %in% Y[[1]] returns FALSE because %in% only checks the first vector:
X <- "alpha"
Y <- list(c("non-investment", "investment"), c("alpha", "beta"))
X %in% unlist(Y)
X %in% Y[[1]]
Note that if you had specified Y as just a vector, which is essentially what it is in your example because there are no other sublists, then you could just use:
X <- "investment"
Y <- c("non-investment", "investment")
X %in% Y
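If it matters which vector of the list holds the string, a minimal sketch along these lines may help. It assumes every element of Y is a character vector; the name hits is only illustrative.

```r
X <- "alpha"
Y <- list(c("non-investment", "investment"), c("alpha", "beta"))

# One logical per list element: does that vector contain X?
hits <- vapply(Y, function(v) X %in% v, logical(1))

hits         # per-vector result
any(hits)    # TRUE if X appears anywhere in the list
which(hits)  # positions of the vectors that contain X
```

vapply() is used rather than sapply() so the result is guaranteed to be a logical vector, even for an empty list.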

The problem with t is that it is a length-one list of vectors, so %in% compares k against the list element as a whole. Try k %in% t[[1]]. You may also want to use unlist().
EDIT Sorry, list of vector, not lists.

Related

String lengths after splitting

I have a string I would like to split and access the elements as strings. Here is the code:
x <- "b0.5,0.5"
y <- strsplit(x,',')
str(y)
y[[1]][1]
str(y[[1]][1])
length(y[[1]][1])
z <- y[[1]][1]
length(z)
substr(z,1,1)
substr(z,1,2)
substr(z,1,3)
substr(z,1,4)
The length of z is 1, but I can access at least 4 substrings of length 1. Can someone explain this to me? Thanks!
Well strsplit() returns a list, so if you want a character vector of components as a result, then unlist the return value:
x <- "b0.5,0.5"
y <- unlist(strsplit(x, ","))
y
[1] "b0.5" "0.5"
Depending on how many entries you have, my suggestion would be to use sapply:
x <- "b0.5,0.5"
y <- strsplit(x,',')
sapply(y, \(z) z)
     [,1]
[1,] "b0.5"
[2,] "0.5"
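Neither answer above spells out the difference the question runs into, so here is a small sketch. In R, length() counts the elements of a vector, while nchar() counts the characters in each string; z is a character vector with one element, and that element is four characters long. The variable name chars is just for illustration.

```r
x <- "b0.5,0.5"
z <- strsplit(x, ",")[[1]][1]

length(z)  # number of elements in the vector z
nchar(z)   # number of characters in the string stored in z

# substring() is vectorised, so this pulls out each character in turn
chars <- substring(z, seq_len(nchar(z)), seq_len(nchar(z)))
chars
```

This is why substr(z, 1, 4) still returns something even though length(z) is 1: the four substrings come from inside the single string.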

R, whether all the elements of X are present in Y

In R, how do you test for elements of one vector NOT present in another vector?
X <- c('a','b','c','d')
Y <- c('b', 'e', 'a','d','c','f', 'c')
I want to know whether all the elements of X are present in Y (a TRUE or FALSE answer).
You can use all and %in% to test if all values of X are also in Y:
all(X %in% Y)
#[1] TRUE
You want setdiff:
> setdiff(X, Y) # all elements present in X but not Y
character(0)
> length(setdiff(X, Y)) == 0
[1] TRUE
A warning about setdiff: if your input vectors have repeated elements, setdiff will ignore the duplicates. This may or may not be what you want.
I wrote a package, vecsets, and here's the difference in what you'll get. Note that I modified X to demonstrate the behavior.
library(vecsets)
X <- c('a','b','c','d','d')
Y <- c('b', 'e', 'a','d','c','f', 'c')
setdiff(X,Y)
character(0)
vsetdiff(X,Y)
[1] "d"
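For readers who would rather not add a package, the duplicate-aware comparison can be approximated in base R by counting occurrences with table(). This is only a sketch of the idea, not how vecsets implements it; the names counts_x, counts_y and shortfall are made up for the example.

```r
X <- c('a','b','c','d','d')
Y <- c('b', 'e', 'a','d','c','f', 'c')

all(X %in% Y)  # membership only, duplicates are not counted

counts_x <- table(X)
# Count Y over the same values as X so the two tables line up
counts_y <- table(factor(Y, levels = names(counts_x)))

# Values that occur more often in X than in Y
shortfall <- names(counts_x)[as.vector(counts_x) > as.vector(counts_y)]
shortfall
```

An empty shortfall would mean every element of X, counting repeats, is covered by Y.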

Turning a couple of vectors into a list of vectors

Suppose I have a collection of independent vectors, of the same length. For example,
x <- 1:10
y <- rep(NA, 10)
and I wish to turn them into a list whose length is that common length (10 in the given example), in which each element is a vector whose length is the number of independent vectors that were given. In my example, assuming output is the output object, I'd expect
> str(output)
List of 10
$ : num [1:2] 1 NA
...
> output
[[1]]
[1] 1 NA
...
What's the common method of doing that?
Use mapply and c:
mapply(c, x, y, SIMPLIFY=FALSE)
[[1]]
[1] 1 NA
[[2]]
[1] 2 NA
..<cropped>..
[[10]]
[1] 10 NA
Another option:
split(cbind(x, y), seq(length(x)))
or even:
split(c(x, y), seq(length(x)))
or even (assuming x has no duplicate values as in your example):
split(c(x, y), x)
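Here is a solution that lets you zip an arbitrary number of equal-length vectors into a list, based on the position of each element:
merge_by_pos <- function(...){
  dotlist <- list(...)
  # iterate over element positions, not over the vectors themselves
  lapply(seq_along(dotlist[[1]]), function(i){
    # collect the i-th element of every vector into one vector
    Reduce('c', lapply(dotlist, '[[', i))
  })
}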
x <- 1:10
y <- rep(NA, 10)
z <- 21:30
merge_by_pos(x, y, z)
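A shorter route to the same kind of result is Map(), which is mapply() with SIMPLIFY = FALSE. The wrapper name zip_vectors below is hypothetical, and the sketch assumes all inputs have the same length.

```r
# Hypothetical helper: combine the i-th elements of each input vector
zip_vectors <- function(...) {
  unname(Map(c, ...))
}

x <- 1:10
y <- rep(NA, 10)
z <- 21:30

out <- zip_vectors(x, y, z)
length(out)  # one list element per position
out[[1]]     # first elements of x, y and z
```

unname() is there because Map() names the result when the first input is a character vector or has names.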

R programming: Creating a list of paired elements

I have a list of elements say:
l <- c("x","ya1","xb3","yb3","ab","xc3","y","xa1","yd4")
Out of this list I would like to make a list of the matching x,y pairs, i.e.
(("xa1" "ya1") ("xb3" "yb3") ("x" "y"))
In essence, I need to capture the X elements, the Y elements and then pair them up:
I know how to do the X,Y extraction part:
xelems <- grep("^x", l, perl=TRUE, value=TRUE)
yelems <- grep("^y", l, perl=TRUE, value=TRUE)
An X element pairs up with a Y element when
1. xElem == "x" and yElem == "y" # both are one character long
2. substr(xElem,2,nchar(xElem)) == substr(yElem,2,nchar(yElem)) # same text after the first character
There is no order, i.e. matching xElem and yElem can be positioned anywhere.
I am however not very sure about the next part. I am more familiar with the SKILL programming language (SKILL is a LISP derivative) and this is how I write it:
procedure( get_xy_pairs(inputList "l")
let(( yElem (xyPairs nil) xList yList)
xList=setof(i inputList rexMatchp("^x" i))
yList=setof(i inputList rexMatchp("^y" i))
when(xList && yList
unless(length(xList)==length(yList)
warn("xList and yList mismatch : %d vs %d\n" length(xList) length(yList))
)
foreach(xElem xList
if(xElem=="x"
then yElem="y"
else yElem=strcat("y" substring(xElem 2 strlen(xElem)))
)
if(member(yElem yList)
then xyPairs=cons(list(xElem yElem) xyPairs)
else warn("x element %s has no matching y element \n" xElem)
)
)
)
xyPairs
)
)
When run on l, this would return
get_xy_pairs(l)
*WARNING* x element xc3 has no matching y element
(("xa1" "ya1") ("xb3" "yb3") ("x" "y"))
As I am still new to R, I would appreciate any help. Also, I understand that R users tend to avoid for loops and prefer lapply?
Maybe something like this would work. (Only tested on your sample data.)
## Remove any item not starting with x or y
l2 <- l[grepl("^x|^y", l)]
## Split into a list of items starting with x
## and items starting with y
L <- split(l2, grepl("^x", l2))
## Give "names" to the "starting with y" group
names(L[[1]]) <- gsub("^y", "x", L[[1]])
## Use match to match the names in the y group with
## the values from the x group. This results in a
## nice named vector with the pairs you want
Matches <- L[[1]][match(L[[2]], names(L[[1]]), nomatch=0)]
Matches
#     x   xb3   xa1
#   "y" "yb3" "ya1"
As a data.frame:
MatchesDF <- data.frame(x = names(Matches), y = unname(Matches))
MatchesDF
#     x   y
# 1   x   y
# 2 xb3 yb3
# 3 xa1 ya1
I would store tuples in a list, i.e:
xypairs
[[1]]
[1] "x" "y"
[[2]]
[1] "xb3" "yb3"
Your procedure can be simplified with match and substring.
xends <- substring(xelems, 2)
yends <- substring(yelems, 2)
ypaired <- match(xends, yends) # Indices of yelems that match xelems
# Now we need to handle the no-matches:
xsorted <- c(xelems, rep(NA, sum(is.na(ypaired))))
ysorted <- yelems[ypaired]
ysorted <- c(ysorted, yelems[!(yelems %in% ysorted)])
# Now we create the list of tuples:
xypairs <- lapply(1:length(ysorted), function(i) {
c(xsorted[i], ysorted[i])
})
Result:
xypairs
[[1]]
[1] "x" "y"
[[2]]
[1] "xb3" "yb3"
[[3]]
[1] "xc3" NA
[[4]]
[1] "xa1" "ya1"
[[5]]
[1] NA "yd4"
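If only complete pairs are wanted, as in the SKILL version, one possible sketch builds the y name each x element would need and keeps the ones that exist. The names wanted_y and has_partner are invented for illustration, and it assumes a partner differs from its x element only in the first character.

```r
l <- c("x","ya1","xb3","yb3","ab","xc3","y","xa1","yd4")

xelems <- grep("^x", l, value = TRUE)
# The y element that would pair with each x element
wanted_y <- sub("^x", "y", xelems)
has_partner <- wanted_y %in% l

# Report x elements without a partner, like the warn() call in SKILL
for (xe in xelems[!has_partner]) {
  message("x element ", xe, " has no matching y element")
}

# One c(x, y) vector per complete pair
xypairs <- unname(Map(c, xelems[has_partner], wanted_y[has_partner]))
xypairs
```

Unlike the answer above, this drops unmatched elements such as "xc3" and "yd4" instead of padding them with NA.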

Consistently subset matrix to a vector and avoid colnames?

I would like to know if there is R syntax to extract a column from a matrix and always have no name attribute on the returned vector (I wish to rely on this behaviour).
My problem is the following inconsistency:
when a matrix has more than one row and I do myMatrix[, 1] I will get the first column of myMatrix with no name attribute. This is what I want.
when a matrix has exactly one row and I do myMatrix[, 1], I will get the first column of myMatrix but it has the first colname as its name.
I would like to be able to do myMatrix[, 1] and consistently get something with no name.
An example to demonstrate this:
# make a matrix with more than one row,
x <- matrix(1:2, nrow=2)
colnames(x) <- 'foo'
# foo
# [1,] 1
# [2,] 2
# extract first column. Note no 'foo' name is attached.
x[, 1]
# [1] 1 2
# now suppose x has just one row (and is a matrix)
x <- x[1, , drop=FALSE]
# extract first column
x[, 1]
# foo # <-- we keep the name!!
# 1
Now, the documentation for [ (?'[') mentions this behaviour, so it's not a bug or anything (although, why?! why this inconsistency?!):
A vector obtained by matrix indexing will be unnamed unless ‘x’ is one-dimensional when the row names (if any) will be indexed to provide names for the result.
My question is, is there a way to do x[, 1] such that the result is always unnamed, where x is a matrix?
Is my only hope unname(x[, 1]) or is there something analogous to ['s drop argument? Or is there an option I can set to say "always unname"? Some trick I can use (somehow override ['s behaviour when the extracted result is a vector?)
Update on why the code below works (as far as I can tell)
Subsetting with [ is handled using functions contained in the R source file subset.c in ~/src/main. When using matrix indexing to subset a matrix, the function VectorSubset is called. When there is more than one index used (i.e., one each for rows and columns as in x[,1]), then MatrixSubset is called.
The function VectorSubset only assigns names to 1-dimensional arrays being subsetted. Since a matrix is a 2-D array, no names are assigned to the result when using matrix indexing. The function MatrixSubset, however, does attempt to pass on dimnames under certain circumstances.
Therefore, the matrix indexing you refer to in the quote from the help page seems to be the key:
x <- matrix(1)
colnames(x) <- "foo"
x[, 1] ## 'Normal' indexing
# foo
# 1
x[matrix(c(1, 1), ncol = 2)] ## Matrix indexing
# [1] 1
And with a wider 1-row matrix:
xx <- matrix(1:10, nrow = 1)
colnames(xx) <- sprintf('foo%i', seq_len(ncol(xx)))
xx[, 6] ## 'Normal' indexing
# foo6
# 6
xx[matrix(c(1, 6), ncol = 2)] ## Matrix indexing
# [1] 6
With a matrix with both dimensions > 1:
yy <- matrix(1:10, nrow = 2, dimnames = list(NULL,
sprintf('foo%i', 1:5)))
yy[cbind(seq_len(nrow(yy)), 3)] ## Matrix indexing
# [1] 5 6
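Putting this together, a small wrapper can make the unnamed result the default. The function name col_unnamed is hypothetical, and the sketch assumes m is a matrix with at least one row.

```r
# Hypothetical helper: column j of matrix m, never carrying names
col_unnamed <- function(m, j) {
  # Matrix indexing with (row, column) pairs does not attach dimnames
  m[cbind(seq_len(nrow(m)), j)]
}

x <- matrix(1:2, nrow = 2)
colnames(x) <- "foo"
x1 <- x[1, , drop = FALSE]  # the one-row case from the question

names(col_unnamed(x, 1))   # NULL
names(col_unnamed(x1, 1))  # NULL
names(unname(x1[, 1]))     # NULL; unname() is the simpler alternative
```

Whether this is preferable to unname(x[, 1]) is a matter of taste; the wrapper mainly avoids having to remember the extra call.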

Resources