Match a number within a list of vectors of different lengths in R

I want to match a number within a list containing vectors of different lengths. However, my solution (below) doesn't match anything beyond the first item of each vector.
seq_ <- seq(1:10)
list_ <- list(seq_[1:3], seq_[4:7], seq_[8:10])
list_
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 4 5 6 7
#
# [[3]]
# [1] 8 9 10
but
for (i in seq_) {
  print(match(i, list_))
}
# [1] 1
# [1] NA
# [1] NA
# [1] 3
# [1] NA
# [1] NA
# [1] NA
# [1] NA
# [1] NA
# [1] NA

In the general case, you will probably be happier with which, as in the code below.
EDIT: rewrote to show the full looping over values.
seq_ <- seq(1:10)
list_ <- list(seq_[1:3], seq_[4:7], seq_[8:10])
matchlist <- vector("list", length(list_)) # preallocate one slot per vector
for (j in seq_along(list_)) {
  matchlist[[j]] <- unlist(sapply(seq_, function(k) which(list_[[j]] == k)))
}
That returns, for each vector in the list, the positions of all matches within that vector. It's probably clearer what's happening if you create an input with repeated values, like my.list <- list(sample(1:10,4,replace=TRUE), sample(1:10,7,replace=TRUE))
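If what you are after is the index of the list element that contains each number (rather than the position inside each vector), one option is to test membership with %in%. This is only a minimal sketch on the same seq_ and list_; the name containing is made up for illustration.

```r
seq_ <- 1:10
list_ <- list(seq_[1:3], seq_[4:7], seq_[8:10])

# For each value, test every vector in the list for membership and keep
# the index of the first vector that contains it (NA if none does).
containing <- vapply(seq_, function(i) {
  hits <- which(vapply(list_, function(v) i %in% v, logical(1)))
  if (length(hits) > 0) hits[1] else NA_integer_
}, integer(1))

containing
# [1] 1 1 1 2 2 2 2 3 3 3
```

If a value can occur in several vectors, dropping the hits[1] step and returning the whole hits vector (with lapply instead of vapply) would keep every match.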

Related

Is there a way I can recycle elements of the shorter list in purrr::map2 or purrr::walk2?

purrr does not seem to recycle the elements of a vector when one of the two inputs is shorter than the other (in purrr::map2 or purrr::walk2). This is unlike base R, where we just get a warning if the length of the longer vector is not a multiple of the length of the shorter one.
Consider this toy example:
This works:
map2(1:3,4:6,sum)
#
#[[1]]
#[1] 5
#[[2]]
#[1] 7
#[[3]]
#[1] 9
And this doesn't work:
map2(1:3,4:9,sum)
Error: .x (3) and .y (6) are different lengths
I understand very well why this is not allowed - as it can make catching bugs very difficult. But is there any way in purrr I can force this to happen? Perhaps using some base R trick with purrr?
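You can put both vectors in a data frame and let data.frame() recycle the shorter one (this only works when the longer length is a whole multiple of the shorter one; otherwise data.frame() throws an error):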
input <- data.frame(a = 1:3, b = 4:9)
purrr::map2(input$a, input$b, sum)
This is by design in purrr, but you can use base R's Map:
Map(sum,1:3,4:9)
# [[1]]
# [1] 5
#
# [[2]]
# [1] 7
#
# [[3]]
# [1] 9
#
# [[4]]
# [1] 8
#
# [[5]]
# [1] 10
#
# [[6]]
# [1] 12
And here's how I would recycle if I had to:
x <- 1:3
y <- 4:9
l <- max(length(y), length(x))
map2(rep(x, length.out = l), rep(y, length.out = l), sum)
# [[1]]
# [1] 5
#
# [[2]]
# [1] 7
#
# [[3]]
# [1] 9
#
# [[4]]
# [1] 8
#
# [[5]]
# [1] 10
#
# [[6]]
# [1] 12
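The recycling step can be wrapped in a small helper so that call sites stay short. The sketch below is not part of purrr; map2_recycled is a hypothetical name, and it uses base Map so that it runs without any packages. The two rep_len() results could be passed to purrr::map2 in exactly the same way.

```r
# Hypothetical helper (not a purrr function): recycle both inputs to the
# longer length, then map a two-argument function over the pairs.
map2_recycled <- function(x, y, f) {
  n <- max(length(x), length(y))
  Map(f, rep_len(x, n), rep_len(y, n))
}

res <- map2_recycled(1:3, 4:9, sum)
unlist(res)
# [1]  5  7  9  8 10 12
```

Unlike base recycling, this version stays silent when the longer length is not a multiple of the shorter one, so it is probably worth adding an explicit check if that case should be treated as a bug.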

Make elements NA depending on a predicate function

How can I easily change elements of a list or vector to NA depending on a predicate?
I need it to be done in a single call for smooth integration in dplyr::mutate calls, etc.
Expected output:
make_na(1:10,`>`,5)
# [1] 1 2 3 4 5 NA NA NA NA NA
my_list <- list(1,"a",NULL,character(0))
make_na(my_list, is.null)
# [[1]]
# [1] 1
#
# [[2]]
# [1] "a"
#
# [[3]]
# [1] NA
#
# [[4]]
# character(0)
Note:
I answered my own question since I have figured out one solution, but I'd be happy to get alternative solutions. Also, maybe this functionality already exists in base R or is packaged in a prominent library.
This was inspired by my frustration while writing my answer to this post.
We can build the following function:
make_na <- function(.x, .predicate, ...) {
  is.na(.x) <- sapply(.x, .predicate, ...)
  .x
}
Or, a bit better, to leverage purrr's magic:
make_na <- function(.x, .predicate, ...) {
  if (requireNamespace("purrr", quietly = TRUE)) {
    is.na(.x) <- purrr::map_lgl(.x, .predicate, ...)
  } else {
    if (inherits(.predicate, "formula"))
      stop("Formulas aren't supported unless package 'purrr' is installed")
    is.na(.x) <- sapply(.x, .predicate, ...)
  }
  .x
}
This way we use purrr::map_lgl if purrr is installed, and sapply otherwise.
If purrr is always available, the function reduces to:
make_na <- function(.x, .predicate, ...) {
  is.na(.x) <- purrr::map_lgl(.x, .predicate, ...)
  .x
}
Some use cases:
make_na(1:10,`>`,5)
# [1] 1 2 3 4 5 NA NA NA NA NA
my_list <- list(1,"a",NULL,character(0))
make_na(my_list, is.null)
# [[1]]
# [1] 1
#
# [[2]]
# [1] "a"
#
# [[3]]
# [1] NA
#
# [[4]]
# character(0)
make_na(my_list, function(x) length(x)==0)
# [[1]]
# [1] 1
#
# [[2]]
# [1] "a"
#
# [[3]]
# [1] NA
#
# [[4]]
# [1] NA
If purrr is installed we can use this short form:
make_na(my_list, ~length(.x)==0)
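For plain atomic vectors, where the predicate is already vectorised, base R's replace() may be enough on its own. The sketch below shows that narrower case only (it does not cover lists or the NULL example above), with transform() standing in for dplyr::mutate; the data frame df is made up for illustration.

```r
x <- 1:10

# replace() returns a copy of x with the selected positions overwritten
replace(x, x > 5, NA)
# [1]  1  2  3  4  5 NA NA NA NA NA

# The same single call fits inside transform() (or dplyr::mutate)
df <- data.frame(a = 1:4, b = c(10, 20, 30, 40))
df <- transform(df, b = replace(b, a > 2, NA))
df$b
# [1] 10 20 NA NA
```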

Updating lists in R

I have the following list in R and I want to replace all NULL in the list with zero. Is there a better way of doing this rather than iterating through the list?
$`2014-06-15`
NULL
$`2014-06-16`
[1] 7
$`2014-06-17`
[1] 17
$`2014-06-18`
[1] 24
$`2014-06-19`
[1] 8
$`2014-06-20`
[1] 11
$`2014-06-21`
NULL
$`2014-06-22`
[1] 1
$`2014-06-23`
[1] 20
$`2014-06-24`
[1] 21
In reference to your solution, logical indexing is simpler and faster than a for loop with an if statement. Here's a short example.
> ( temp <- list(A = NULL, B = 1:5) )
# $A
# NULL
#
# $B
# [1] 1 2 3 4 5
> temp[sapply(temp, is.null)] <- 0
> temp
# $A
# [1] 0
#
# $B
# [1] 1 2 3 4 5
Never mind, solved it.
Here temp is my list of dates above:
allDates <- names(temp)
for (i in allDates) {
  if (is.null(temp[[i]]))
    temp[[i]] <- 0
}
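Once the NULLs are gone, the list can also be flattened into a named vector, which is often easier to work with. A minimal sketch, assuming a three-date stand-in for the list in the question; vapply is used in place of sapply only because it guarantees a logical result even for an empty list.

```r
# Small stand-in for the dated list in the question
temp <- list(`2014-06-15` = NULL, `2014-06-16` = 7, `2014-06-17` = 17)

# vapply() always returns a logical vector, even for an empty list
is_missing <- vapply(temp, is.null, logical(1))
temp[is_missing] <- 0

# With no NULLs left, the list flattens without losing any dates
counts <- unlist(temp)
names(counts)
# [1] "2014-06-15" "2014-06-16" "2014-06-17"
unname(counts)
# [1]  0  7 17
```

Calling unlist() before replacing the NULLs would silently drop those dates, which is why the replacement comes first.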

function usage and output of lapply

I am experimenting with lapply:
lapply(1:3, function(i) print(i))
# [1] 1
# [1] 2
# [1] 3
# [[1]]
# [1] 1
# [[2]]
# [1] 2
# [[3]]
# [1] 3
I understand that lapply should call print(i) on each element i of 1:3, but why does the output look like this?
Also, when I use unlist, I get the following output:
unlist(lapply(1:3, function(i) print(i)))
# [1] 1
# [1] 2
# [1] 3
# [1] 1 2 3
The description of lapply function is the following:
"lapply returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X."
Your example:
lapply(1:3, function(x) print(x))
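This prints each x as a side effect and returns a list of length 3. Because print returns its argument invisibly, that list holds the values 1, 2 and 3, and the console auto-prints it after the three printed lines.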
str(lapply(1:3, function(x) print(x)))
# [1] 1
# [1] 2
# [1] 3
# List of 3
# $ : int 1
# $ : int 2
# $ : int 3
There are a few ways to avoid the duplicated output, as mentioned in the comments:
1) Using invisible
lapply(1:3, function(x) invisible(x))
# [[1]]
# [1] 1
# [[2]]
# [1] 2
# [[3]]
# [1] 3
unlist(lapply(1:3, function(x) invisible(x)))
# [1] 1 2 3
2) Without explicitly printing inside the function
unlist(lapply(1:3, function(x) x))
# [1] 1 2 3
3) Assigning the list to an object:
l1 <- lapply(1:3, function(x) print(x))
unlist(l1)
# [1] 1 2 3
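Put differently, it helps to decide whether the function is called for its side effect or for its value. The sketch below shows one way to handle each case; the name squares is made up for illustration.

```r
# Called only for its side effect: wrap the whole call in invisible() so
# the returned list is not auto-printed at the console.
invisible(lapply(1:3, function(i) print(i)))
# [1] 1
# [1] 2
# [1] 3

# Called for its value: return the result without printing, then inspect it.
squares <- vapply(1:3, function(i) i^2, numeric(1))
squares
# [1] 1 4 9
```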

Is there a dictionary functionality in R

Is there a way to create a "dictionary" in R, such that it has pairs?
Something to the effect of:
x=dictionary(c("Hi","Why","water") , c(1,5,4))
x["Why"]=5
I'm asking because I am actually looking for a function of two categorical variables.
So that if x=dictionary(c("a","b"),c(5,2))
  x val
1 a   5
2 b   2
I want to compute x1^2+x2 on all combinations of the keys of x
  x1 x2 val1 val2 x1^2+x2
1  a  a    5    5      30
2  b  a    2    5       9
3  a  b    5    2      27
4  b  b    2    2       6
And then I want to be able to retrieve the result using x1 and x2. Something to the effect of:
get_result["b","a"] = 9
What is the best, most efficient way to do this?
I know three R packages for dictionaries: hash, hashmap, and dict.
Update July 2018: a new one, container.
Update September 2018: a new one, collections
hash
Keys must be character strings. A value can be any R object.
library(hash)
## hash-2.2.6 provided by Decision Patterns
h <- hash()
# set values
h[["1"]] <- 42
h[["foo"]] <- "bar"
h[["4"]] <- list(a=1, b=2)
# get values
h[["1"]]
## [1] 42
h[["4"]]
## $a
## [1] 1
##
## $b
## [1] 2
h[c("1", "foo")]
## <hash> containing 2 key-value pair(s).
## 1 : 42
## foo : bar
h[["key not here"]]
## NULL
To get keys:
keys(h)
## [1] "1" "4" "foo"
To get values:
values(h)
## $`1`
## [1] 42
##
## $`4`
## $`4`$a
## [1] 1
##
## $`4`$b
## [1] 2
##
##
## $foo
## [1] "bar"
The print instance:
h
## <hash> containing 3 key-value pair(s).
## 1 : 42
## 4 : 1 2
## foo : bar
The values function accepts the arguments of sapply:
values(h, USE.NAMES=FALSE)
## [[1]]
## [1] 42
##
## [[2]]
## [[2]]$a
## [1] 1
##
## [[2]]$b
## [1] 2
##
##
## [[3]]
## [1] "bar"
values(h, keys="4")
## 4
## a 1
## b 2
values(h, keys="4", simplify=FALSE)
## $`4`
## $`4`$a
## [1] 1
##
## $`4`$b
## [1] 2
hashmap
See https://cran.r-project.org/web/packages/hashmap/README.html.
hashmap does not offer the flexibility to store arbitrary types of objects.
Keys and values are restricted to "scalar" objects (length-one character, numeric, etc.). The values must be of the same type.
library(hashmap)
H <- hashmap(c("a", "b"), rnorm(2))
H[["a"]]
## [1] 0.1549271
H[[c("a","b")]]
## [1] 0.1549271 -0.1222048
H[[1]] <- 9
Beautiful print instance:
H
## ## (character) => (numeric)
## ## [1] => [+9.000000]
## ## [b] => [-0.122205]
## ## [a] => [+0.154927]
Errors:
H[[2]] <- "Z"
## Error in x$`[[<-`(i, value): Not compatible with requested type: [type=character; target=double].
H[[2]] <- c(1,3)
## Warning in x$`[[<-`(i, value): length(keys) != length(values)!
dict
Currently available only on GitHub: https://github.com/mkuhn/dict
Strengths: arbitrary keys and values, and fast.
library(dict)
d <- dict()
d[[1]] <- 42
d[[c(2, 3)]] <- "Hello!" # c(2,3) is the key
d[["foo"]] <- "bar"
d[[4]] <- list(a=1, b=2)
d[[1]]
## [1] 42
d[[c(2, 3)]]
## [1] "Hello!"
d[[4]]
## $a
## [1] 1
##
## $b
## [1] 2
Accessing a non-existent key throws an error:
d[["not here"]]
## Error in d$get_or_stop(key): Key error: [1] "not here"
But there is a nice feature to deal with that:
d$get("not here", "default value for missing key")
## [1] "default value for missing key"
Get keys:
d$keys()
## [[1]]
## [1] 4
##
## [[2]]
## [1] 1
##
## [[3]]
## [1] 2 3
##
## [[4]]
## [1] "foo"
Get values:
d$values()
## [[1]]
## [1] 42
##
## [[2]]
## [1] "Hello!"
##
## [[3]]
## [1] "bar"
##
## [[4]]
## [[4]]$a
## [1] 1
##
## [[4]]$b
## [1] 2
Get items:
d$items()
## [[1]]
## [[1]]$key
## [1] 4
##
## [[1]]$value
## [[1]]$value$a
## [1] 1
##
## [[1]]$value$b
## [1] 2
##
##
##
## [[2]]
## [[2]]$key
## [1] 1
##
## [[2]]$value
## [1] 42
##
##
## [[3]]
## [[3]]$key
## [1] 2 3
##
## [[3]]$value
## [1] "Hello!"
##
##
## [[4]]
## [[4]]$key
## [1] "foo"
##
## [[4]]$value
## [1] "bar"
No print instance.
The package also provides the function numvecdict to deal with a dictionary in which numbers and strings (including vectors of each) can be used as keys, and that can only store vectors of numbers.
You can simply create a named vector of your key-value pairs.
animal_sounds <- c(
'cat' = 'meow',
'dog' = 'woof',
'cow' = 'moo'
)
print(animal_sounds['cat'])
#    cat
# "meow"
animal_sounds[['cat']] # double brackets drop the name
# [1] "meow"
Update: To answer the 2nd portion of the question, you can create a dataframe and compute the values like this:
val1 <- c(5,2,5,2) # Create val1 column
val2 <- c(5,5,2,2) # Create val2 column
df <- data.frame(val1, val2) # create dataframe variable
df['x1^2+x2'] <- val1^2 + val2 # create expression column
Output:
  val1 val2 x1^2+x2
1    5    5      30
2    2    5       9
3    5    2      27
4    2    2       6
You can use just data.frame and row.names to do this:
x=data.frame(row.names=c("Hi","Why","water") , val=c(1,5,4))
x["Why",]
[1] 5
Since named vectors, matrices, lists, etc. behave like "dictionaries" in R, you can do something like the following:
> (x <- structure(c(5,2),names=c("a","b"))) ## "dictionary"
a b 
5 2 
> (result <- outer(x,x,function(x1,x2) x1^2+x2))
   a  b
a 30 27
b  9  6
> result["b","a"]
[1] 9
If you wanted a table as you've shown in your example, just reshape your array...
> library(reshape)
> (dfr <- melt(result,varnames=c("x1","x2")))
  x1 x2 value
1  a  a    30
2  b  a     9
3  a  b    27
4  b  b     6
> transform(dfr,val1=x[x1],val2=x[x2])
  x1 x2 value val1 val2
1  a  a    30    5    5
2  b  a     9    2    5
3  a  b    27    5    2
4  b  b     6    2    2
See my answer to a very recent question. In essence, you use environments for this type of functionality.
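As a rough illustration of the environment approach (a sketch, not the code from the linked answer), an environment can be used as a mutable key-value store with string keys. The name dict is made up for illustration.

```r
# An environment acts as a mutable, hashed key-value store; keys are strings.
dict <- new.env(hash = TRUE)

# Set values, either with [[<- or with assign()
dict[["Hi"]] <- 1
dict[["Why"]] <- 5
assign("water", 4, envir = dict)

# Get a value
dict[["Why"]]
# [1] 5

# Test for a key without searching enclosing environments
exists("water", envir = dict, inherits = FALSE)
# [1] TRUE

# A missing key returns NULL rather than an error
is.null(dict[["not here"]])
# [1] TRUE

# Number of keys
length(ls(dict))
# [1] 3
```

Environments are modified in place, so passing dict to a function and assigning into it changes the original, unlike a list or a named vector.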
For the higher-dimensional case, you may be better off using an array (two-dimensional here) if you want easy syntax for retrieving the result, since you can name the rows and columns. As an alternative, you can paste the two keys together with a separator that doesn't occur in either of them, and then use that as a unique identifier.
To be specific, something like this:
tmp <- data.frame(x = c("a", "b"), val = c(5, 2))
# x1^2 + x2 for every pair of rows (lhs indexes x1, rhs indexes x2)
tmp2 <- outer(seq(nrow(tmp)), seq(nrow(tmp)), function(lhs, rhs) tmp$val[lhs]^2 + tmp$val[rhs])
dimnames(tmp2) <- list(tmp$x, tmp$x)
tmp2
tmp2["a", "b"] # 27
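The pasted-key alternative could look roughly like the sketch below. It assumes that "|" never occurs in a key; the names pairs and result are made up for illustration.

```r
x <- c(a = 5, b = 2)

# All ordered pairs of keys (x1 varies fastest)
pairs <- expand.grid(x1 = names(x), x2 = names(x), stringsAsFactors = FALSE)

# Composite key "x1|x2"; "|" is assumed not to occur in any key
result <- setNames(x[pairs$x1]^2 + x[pairs$x2],
                   paste(pairs$x1, pairs$x2, sep = "|"))

result[["b|a"]]
# [1] 9
result[["a|b"]]
# [1] 27
```

The same idea extends to three or more keys by pasting more columns together, at the cost of having to rebuild the key string for every lookup.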
Using tidyverse
Adding an answer using more recent tidyverse approaches.
There are probably cleaner ways of handling the crossing (which creates all combinations) and unnesting, but this is a quick and dirty approach.
library(tidyverse)
my_tbl <- tibble(x = c("A", "B"), val = c(5, 2)) %>%
  crossing(x1 = ., x2 = .) %>% # Create all combinations
  unnest_wider(everything(), names_sep = "_") %>% # Unpack into distinct columns
  mutate(result = x1_val^2 + x2_val) # Calculate result
# Access result by accessing the row in the data frame
my_tbl %>%
  filter(x1_x == "A", x2_x == "B") %>%
  pull(result)
#> [1] 27
# Convert tibble to a named vector that could be accessed more easily.
# However, this is limited to string names.
my_named_vector <- my_tbl %>%
  transmute(name = str_c(x1_x, "_", x2_x), value = result) %>%
  deframe()
my_named_vector[["A_B"]]
#> [1] 27
Created on 2022-04-06 by the reprex package (v2.0.1)
tibble version 3.1.6
dplyr version 1.0.8
tidyr version 1.2.0
stringr version 1.4.0

Resources