Vector-list comparison in R

I am trying to check whether a list (containing multiple vectors filled with values) has an element equal to a given vector. Unfortunately, the following functions did not work for me: match(), any(), %in%. An example of what I am trying to achieve is given below:
Let's say:
lists=list(c(1,2,3,4),c(5,6,7,8),c(9,7))
vector=c(1,2,3,4)
answer=match(lists,vector)
When I execute this, it returns FALSE values instead of a positive result. Comparing a vector with a vector works, but comparing a vector with a list does not seem to work properly.

I would use intersect, something like this :
lapply(lists,intersect,vector)
[[1]]
[1] 1 2 3 4
[[2]]
numeric(0)
[[3]]
numeric(0)

I'm not completely sure what you want the result to be (for example do you care about vector order?) but regardless you'll need to think about lapply. For example,
##Create some data
R> lists=list(c(1,2,3,4),c(5,6,7,8),c(9,7))
R> vector=c(1,2,3,4)
then we use lapply to go through each list element and apply a function. In this case, I've used the match function (since you mentioned that in your question):
R> lapply(lists, function(i) all(match(i, vector)))
[[1]]
[1] TRUE
[[2]]
[1] NA
[[3]]
[1] NA
It's probably worth converting to a vector, so
R> unlist(lapply(lists, function(i) all(match(i, vector))))
[1] TRUE NA NA
and to change NA to FALSE, something like:
m = unlist(lapply(lists, function(i) all(match(i, vector))))
m[is.na(m)] = FALSE
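
Depending on what "equal" should mean, the comparison can also be written without the NA clean-up step. The following is only a sketch under assumptions: the names target, is_same, is_same_set and is_subset are made up for illustration, and the vector is renamed to target to avoid reusing the name of the base function vector().

```r
lists  <- list(c(1, 2, 3, 4), c(5, 6, 7, 8), c(9, 7))
target <- c(1, 2, 3, 4)   # hypothetical name, stands in for `vector`

# Exact equality: same values in the same order
is_same <- vapply(lists, function(x) identical(x, target), logical(1))

# Order-insensitive equality: same set of values
is_same_set <- vapply(lists, function(x) setequal(x, target), logical(1))

# Containment: every value of the list element occurs in target
is_subset <- vapply(lists, function(x) all(x %in% target), logical(1))

# Index of the list element(s) identical to target
which(is_same)
```

vapply() returns a plain logical vector directly, so no unlist() is needed, and %in% yields FALSE rather than NA for values that are not found.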

Related

Why when I pass a dataframe of integers to apply function in R the variable gets transformed?

I'm quite new to R, so please be indulgent. This must be something really simple. I have a function that I would like to execute over every element of a dataframe. Minimal example:
agenericfunction = function(pos) {
print(pos)
}
When I call the function like this:
apply(as.data.frame(1:5), 1, agenericfunction)
This is what I get:
1:5
1
1:5
2
1:5
3
1:5
4
1:5
5
[1] 1 2 3 4 5
But if I modify the function like this:
agenericfunction = function(pos) {
print(paste0("_",pos))
}
I then get what one would normally expect:
[1] "_1"
[1] "_2"
[1] "_3"
[1] "_4"
[1] "_5"
[1] "_1" "_2" "_3" "_4" "_5"
I do not understand why my integer 'pos' variable gets converted in the first case into some weird thing that produces that output. If I use the "class" function on "pos", it always says that it is an integer (in both of the cases above). Could someone explain this behaviour?
Thanks in advance and best regards

What you are seeing in the first case is the column name printed above each integer. When as.data.frame is called on an expression such as 1:8, it creates a column whose name is that expression, "1:8".
If you do df <- as.data.frame(1:8) and then colnames(df) you will see this:
> colnames(df)
[1] "1:8"
As to why it works with paste0: paste0 ignores the column name and just returns the values that are inside. Modifying your function to end with return(paste0(pos)) will yield the result you want.
If you want to avoid that, best create a matrix instead of a data.frame:
apply(matrix(1:5), 1, agenericfunction)
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 1 2 3 4 5
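
To check where the label comes from, the names that reach the function can be captured directly. This is a rough sketch; col_name, seen_names and show_value are hypothetical names introduced only for illustration.

```r
df <- as.data.frame(1:5)

# The single column is named after the expression passed to as.data.frame()
col_name <- names(df)

# apply() coerces the data frame to a matrix, so each row reaches the
# function as a one-element vector that carries the column name
seen_names <- apply(df, 1, function(pos) names(pos))

# Dropping the names inside the function leaves only the value to print
show_value <- function(pos) print(unname(pos))
res <- apply(df, 1, show_value)
```

With unname() the data frame can be kept as is, which may be preferable when converting to a matrix is not convenient.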

Avoiding a loop on a strsplit list

I have a vector v where each entry is one or more strings (or possibly character(0)) separated by semicolons:
ABC
DEF;ABC;QWE
TRF
character(0)
ABC;GFD
I need to find the indices of the vector which contain "ABC" (1,2,5 or a logical vector T,T,F,F,T) after splitting on ";"
I am currently using a loop as follows:
toSelect=integer(0)
for(i in c(1:length(v))){
if(length(v[i])==0) next
words=strsplit(v[i],";")[[1]]
if(!is.na(match("ABC",words))) toSelect=c(toSelect,i)
}
Unfortunately, my vector has 450k entries, so this takes far too long. I would prefer to create a logical vector by doing something like
toSelect=(!is.na(match("ABC",strsplit(v,";"))))
But since strsplit returns a list, I can't find a way to properly format strsplit(v,";") as a vector (unlist won't do since it would ruin the indices). Does anybody have any ideas on how to speed up this code?
Thanks!

Use regular expressions:
v = list("ABC", "DEF;ABC;QWE", "TRF", character(0), "ABC;GFD")
grep("(^|;)ABC($|;)", v)
#[1] 1 2 5

The tricky part is dealing with character(0), which @BlueMagister fudges by replacing it with character(1) (this allows use of a vector, but doesn't allow representation of the original problem). Perhaps
v <- list("ABC", "DEF;ABC;QWE", "TRF", character(0), "ABC;GFD")
v[sapply(v, length) != 0] <- strsplit(unlist(v), ";", fixed=TRUE)
to do the string split. One might proceed in base R, but I'd recommend the IRanges package
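# biocLite() has been superseded by the BiocManager package
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("IRanges")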
to install, then
library(IRanges)
w = CharacterList(v)
which gives a list-like structure where all elements must be character vectors.
> w
CharacterList of length 5
[[1]] ABC
[[2]] DEF ABC QWE
[[3]] TRF
[[4]] character(0)
[[5]] ABC GFD
One can then do fun things like ask "are element members equal to ABC"
> w == "ABC"
LogicalList of length 5
[[1]] TRUE
[[2]] FALSE TRUE FALSE
[[3]] FALSE
[[4]] logical(0)
[[5]] TRUE FALSE
or "are any element members equal to ABC"
> any(w == "ABC")
[1] TRUE TRUE FALSE FALSE TRUE
This will scale very well. For operations not supported "out of the box", the strategy (computationally cheap) is to unlist then transform to an equal-length vector then relist using the original CharacterList as a skeleton, for instance to use reverse on each member:
> relist(reverse(unlist(w)), w)
CharacterList of length 5
[[1]] CBA
[[2]] FED CBA EWQ
[[3]] FRT
[[4]] character(0)
[[5]] CBA DFG
As @eddi points out, this is slower than grep. The motivation is (a) to avoid needing to formulate complicated regular expressions while (b) gaining flexibility for other operations one might like to do on data structured like this.

Using strsplit with sapply and %in%:
v <- c("ABC","DEF;ABC;QWE","TRF",character(1),"ABC;GFD")
sapply(strsplit(v,";"),function(x) "ABC" %in% x)
#[1] TRUE TRUE FALSE FALSE TRUE
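
A variant of the same idea, sketched under the assumption that the empty entry can be stored as "" in a character vector. The names parts, has_abc, idx and hits are hypothetical. The second half addresses the concern that unlist() loses the indices: each split piece is tagged with the position of the entry it came from.

```r
v <- c("ABC", "DEF;ABC;QWE", "TRF", "", "ABC;GFD")  # "" stands in for the empty entry

# fixed = TRUE treats ";" as a literal string rather than a regular expression
parts <- strsplit(v, ";", fixed = TRUE)

# Type-stable version of the sapply() call: always returns a logical vector
has_abc <- vapply(parts, function(x) "ABC" %in% x, logical(1))
which(has_abc)

# Fully vectorised alternative: flatten once, remember each piece's origin
idx  <- rep(seq_along(parts), lengths(parts))
hits <- unique(idx[unlist(parts) == "ABC"])
```

Both has_abc and hits identify entries 1, 2 and 5 for this data; the flattened form avoids calling an R function once per entry, which may matter for 450k entries.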

aaply: Why does aaply(data,2,class) return "data.frame" for all columns?

I have a question about aaply. I want to check which column is.numeric but the return values of aaply are kind of unexpected. Below is example code. Why do I get "data.frame" for all columns (which explains why is.numeric is FALSE even for columns with numeric vectors)?
Thanks!
data=data.frame(str=rep("str",3),num=c(1:3))
is.numeric(data[,1])
# FALSE
is.numeric(data[,2])
# TRUE
aaply(data,2,is.numeric)
# FALSE FALSE
aaply(data,2,class)
# "data.frame" "data.frame"
EDIT: In other situations this produces a warning message:
aaply(data,2,mean)
# 1: mean(<data.frame>) is deprecated.
# Use colMeans() or sapply(*, mean) instead.

It is the way aaply works, you could even use identity to see what is passed to each function call, a data.frame representing each column of data:
aaply(data, 2, identity)
# $num
# num
# 1 1
# 2 2
# 3 3
#
# $str
# str
# 1 str
# 2 str
# 3 str
So using aaply the way you want, you would have to use a function that extracts the first column of each data.frame, something like:
aaply(data, 2, function(df)is.numeric(df[[1]]))
# num str
# TRUE FALSE
but it seems much easier to just do:
sapply(data, is.numeric)
# str num
# FALSE TRUE
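
This works because a data frame is a list of columns, so sapply() hands each column to the function as a plain vector. A small sketch of putting the result to use follows; is_num and numeric_only are hypothetical names, and vapply() is swapped in for sapply() only to guarantee a logical result.

```r
data <- data.frame(str = rep("str", 3), num = 1:3)

# One TRUE/FALSE per column, named after the columns
is_num <- vapply(data, is.numeric, logical(1))

# Names of the numeric columns
names(data)[is_num]

# Data frame restricted to the numeric columns
numeric_only <- data[is_num]
```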

The basic reason is that you are providing aaply with an argument of a class it is not designed to work with. The first letter of a plyr function signifies the type of argument, in this case "a" for array. It does work as you expect if you offer an array:
> xx <- plyr::aaply(matrix(1:10, 2), 2, class)
> xx
1 2 3 4 5
"integer" "integer" "integer" "integer" "integer"
At least that was my understanding until I read the help page. It says that dataframe input should be accepted and that an array should be the output. So you have discovered either an error in the documentation or a bug in the function. Either way, the correct place to take this up is on the 'manipulatr' Google-newsgroup. There is a fair chance that @hadley will be along to clear things up, since he is a valued contributor here as well.

Passing more than one parameter to an anonymous function in R

Why doesn't this work? Or is this just the way R works?
Thanks
JJ
a <- c(1,2,3)
b <- 5
lapply(a, function(x) print(x)) # works
lapply(a, function(x,b) print(b)) # doesn't work.
I get --
Error in FUN(c(1, 2, 3)[[1L]], ...) :
argument "b" is missing, with no default

lapply only passes one argument on, because it's only designed to have one argument vary. If you just want to pass extra arguments along, put them as additional options to lapply:
lapply(a, function(x,y) print(y), y=b)
[1] 5
[1] 5
[1] 5
[[1]]
[1] 5
[[2]]
[1] 5
[[3]]
[1] 5
From the lapply help file:
... optional arguments to FUN.
If you want more than one varying argument to be passed to your function, look at mapply.
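
As a rough illustration of the difference, the sketch below contrasts a fixed extra argument with a second varying one. The vector ys and the result names are hypothetical and only serve the example.

```r
a  <- c(1, 2, 3)
b  <- 5
ys <- c(10, 20, 30)   # hypothetical second argument that varies with a

# One varying argument (x); y is fixed at b for every call
fixed_b <- lapply(a, function(x, y) x + y, y = b)

# Two varying arguments: the i-th call receives a[i] and ys[i]
varying <- mapply(function(x, y) x + y, a, ys)

# Map() does the same pairing but always returns a list
as_list <- Map(function(x, y) x + y, a, ys)
```

Here fixed_b holds 6, 7, 8 and varying holds 11, 22, 33.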

You could try putting a and b together in a list as follows:
lapply(list(a, b), function(x) print(b))
or specifying an argument to pass b to as in:
lapply(a, function(x, y=b) print(y))
But I'm not really sure what you're after.

How do I use `[` correctly with (l|s)apply to select a specific column from a list of matrices?

Consider the following situation where I have a list of n matrices (this is just dummy data in the example below) in the object myList
mat <- matrix(1:12, ncol = 3)
myList <- list(mat1 = mat, mat2 = mat, mat3 = mat, mat4 = mat)
I want to select a specific column from each of the matrices and do something with it. This will get me the first column of each matrix and return it as a matrix (lapply() would give me a list either is fine).
sapply(myList, function(x) x[, 1])
What I can't seem to do is use [ directly as a function in my sapply() or lapply() incantations. ?'[' tells me that I need to supply argument j as the column identifier. So what am I doing wrong that this doesn't work?
> lapply(myList, `[`, j = 1)
$mat1
[1] 1
$mat2
[1] 1
$mat3
[1] 1
$mat4
[1] 1
Where I would expect this:
$mat1
[1] 1 2 3 4
$mat2
[1] 1 2 3 4
$mat3
[1] 1 2 3 4
$mat4
[1] 1 2 3 4
I suspect I am getting the wrong [ method but I can't work out why? Thoughts?

I think you are getting the 1 argument form of [. If you do lapply(myList, `[`, i =, j = 1) it works.

After two pints of Britain's finest ale and a bit of cogitation, I realise that this version will work:
lapply(myList, `[`, , 1)
i.e. don't name anything and treat it like I had done mat[ ,1]. Still don't grep why naming j doesn't work...
...actually, having read ?'[' more closely, I notice the following section:
Argument matching:
Note that these operations do not match their index arguments in
the standard way: argument names are ignored and positional
matching only is used. So ‘m[j=2,i=1]’ is equivalent to ‘m[2,1]’
and *not* to ‘m[1,2]’.
And that explains my quandary above. Yay for actually reading the documentation.
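
The documented behaviour can be checked with a short sketch. The result names by_name, first_cols and first_cols_explicit are hypothetical; the data mirrors the question.

```r
mat <- matrix(1:12, ncol = 3)
myList <- list(mat1 = mat, mat2 = mat)

# Argument names are ignored by `[`, so this is mat[2, 1], not mat[1, 2]
by_name <- mat[j = 2, i = 1]

# Leaving the row index empty and passing the column positionally works
first_cols <- lapply(myList, `[`, , 1)

# An anonymous function is the more explicit equivalent
first_cols_explicit <- lapply(myList, function(x) x[, 1])
```

The anonymous-function form is longer but does not depend on the reader noticing the empty argument.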

It's because [ is a .Primitive function. It has no j argument. And there is no [.matrix method.
> `[`
.Primitive("[")
> args(`[`)
NULL
> methods(`[`)
[1] [.acf* [.AsIs [.bibentry* [.data.frame
[5] [.Date [.difftime [.factor [.formula*
[9] [.getAnywhere* [.hexmode [.listof [.noquote
[13] [.numeric_version [.octmode [.person* [.POSIXct
[17] [.POSIXlt [.raster* [.roman* [.SavedPlots*
[21] [.simple.list [.terms* [.ts* [.tskernel*
Though this really just begs the question of how [ is being dispatched on matrix objects...
