Get an element from a vector with its attribute in R

gregexpr returns a list containing a vector with some additional data:
[[1]]
[1] 21 136 409 512 587 693
attr(,"match.length")
[1] 3 4 5 5 4 9
How do I extract just one element with a corresponding attribute at once?
[[1]]
[1] 409
attr(,"match.length")
[1] 5
Update: The final object must be compatible with the regmatches function.

In general, there's no way for R to know that elements of the vector correspond 1-1 with elements of one of its attributes.
If you know this is true (as it is with gregexpr results), then the way to tell R about it is to set a class on the object, and write your own subsetting code. For example,
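`[.gregexpr_result` <- function(x, i) {
  # For each list element, keep all attributes but subset match.length by i
  attrs <- lapply(x, function(element) {
    allattrs <- attributes(element)
    allattrs[["match.length"]] <- allattrs[["match.length"]][i]
    allattrs
  })
  # Subset the match positions themselves, then restore the attributes
  x <- lapply(x, `[`, i)
  for (j in seq_along(x))
    attributes(x[[j]]) <- attrs[[j]]
  x
}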
x <- paste(letters[1:2], letters[1:2])
result <- gregexpr("b", x)
class(result) <- "gregexpr_result"
result
#> [[1]]
#> [1] -1
#> attr(,"match.length")
#> [1] -1
#> attr(,"index.type")
#> [1] "chars"
#> attr(,"useBytes")
#> [1] TRUE
#>
#> [[2]]
#> [1] 1 3
#> attr(,"match.length")
#> [1] 1 1
#> attr(,"index.type")
#> [1] "chars"
#> attr(,"useBytes")
#> [1] TRUE
#>
#> attr(,"class")
#> [1] "gregexpr_result"
result[2]
#> [[1]]
#> [1] NA
#> attr(,"match.length")
#> [1] NA
#> attr(,"index.type")
#> [1] "chars"
#> attr(,"useBytes")
#> [1] TRUE
#>
#> [[2]]
#> [1] 3
#> attr(,"match.length")
#> [1] 1
#> attr(,"index.type")
#> [1] "chars"
#> attr(,"useBytes")
#> [1] TRUE
Created on 2022-11-20 with reprex v2.0.2
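Since the question asks for a result that works with regmatches, here is a minimal sketch of the same idea as a plain helper function instead of an S3 method. The name pick_match and the sample text are assumptions made for illustration; the helper keeps every attribute of each list element and subsets only the positions and match.length.

```r
# Hypothetical helper: keep only the i-th match of every element of a
# gregexpr() result, preserving the attributes regmatches() relies on.
pick_match <- function(m, i) {
  lapply(m, function(el) {
    attrs <- attributes(el)
    out <- el[i]                                   # subsetting drops attributes
    attrs$match.length <- attrs$match.length[i]    # keep lengths in sync
    attributes(out) <- attrs                       # restore them
    out
  })
}

text <- "one two three four"
m <- gregexpr("[a-z]+", text)

third <- pick_match(m, 3)
regmatches(text, third)
```

The returned object has the same shape as a gregexpr result (a list with one integer vector per input string), so it can be passed straight to regmatches.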

We can subset the elements and the attribute separately:
out <- lapply(lst1, `[`, 3)
attr(out, "match.length") <- attr(lst1, "match.length")[3]
Output:
> out
[[1]]
[1] 409
attr(,"match.length")
[1] 5
Data:
lst1 <- structure(list(c(21, 136, 409, 512, 587, 693)),
                  match.length = c(3, 4, 5, 5, 4, 9))

Related

Subset with only consecutive numbers

I want to generate all subset of {1,2,3,4} with only consecutive numbers. (For example I want subset {1}, {1,2} or {2,3,4} but not {2,4}. )
This is what I have been trying:
library(ggm)
p2<-powerset(1:4, sort = TRUE, nonempty = TRUE)
m2<-p2
for (i in 1:length(p2)){
ifelse(length(p2[[i]]) <2, m2<-m2, ifelse(max(diff(as.numeric(p2[[i]])))>1, m2<-m2[-c(i)],m2<-m2))
}
My plan is to generate the power set of {1,2,3,4} first and then exclude the subsets with non-consecutive numbers. But when I run the
m2<-m2[- c(i)]
command in the second ifelse to exclude those subsets, I believe I shift the indices of the remaining elements, so I keep getting the wrong subsets.
Any suggestions on how to do it correctly?
Thanks!
You can get all unique ascending sequences between 1 and 4 in base R with the following one-liner:
apply(which(upper.tri(diag(4), TRUE), TRUE), 1, function(x) x[1]:x[2])
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 1 2
#>
#> [[3]]
#> [1] 2
#>
#> [[4]]
#> [1] 1 2 3
#>
#> [[5]]
#> [1] 2 3
#>
#> [[6]]
#> [1] 3
#>
#> [[7]]
#> [1] 1 2 3 4
#>
#> [[8]]
#> [1] 2 3 4
#>
#> [[9]]
#> [1] 3 4
#>
#> [[10]]
#> [1] 4
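If you would rather keep the "build the power set, then exclude" approach from the question, the index-shifting problem can be avoided by filtering instead of deleting elements inside a loop. The following is a minimal base R sketch; it builds the power set with combn rather than ggm::powerset, and the helper name is_consecutive is an assumption made for illustration.

```r
n <- 4

# All non-empty subsets of 1:n, grouped by size (base R stand-in for a power set).
power_set <- unlist(
  lapply(seq_len(n), function(k) combn(n, k, simplify = FALSE)),
  recursive = FALSE
)

# A subset is consecutive if it has one element or every step between
# neighbouring elements is exactly 1.
is_consecutive <- function(s) length(s) < 2 || all(diff(s) == 1)

# Filter() tests every element against the original list, so no indices shift.
consecutive <- Filter(is_consecutive, power_set)
length(consecutive)
```

For n = 4 this keeps 10 of the 15 non-empty subsets, the same sets as the one-liner above, though in a different order.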

Why does `lapply` return the result of an assignment automatically?

q <- lapply(1:3, function(x) x ** 2)
## prints nothing, because it is an assignment
# however, how do you explain this?:
> lapply(list(1:3, 4:6, 7:9, 10:11), function(v) q <- lapply(v, function(x) x ** 2))
[[1]]
[[1]][[1]]
[1] 1
[[1]][[2]]
[1] 4
[[1]][[3]]
[1] 9
[[2]]
[[2]][[1]]
[1] 16
[[2]][[2]]
[1] 25
[[2]][[3]]
[1] 36
[[3]]
[[3]][[1]]
[1] 49
[[3]][[2]]
[1] 64
[[3]][[3]]
[1] 81
[[4]]
[[4]][[1]]
[1] 100
[[4]][[2]]
[1] 121
# while this gives the same but is logical (q is stated as return value).
> lapply(list(1:3, 4:6, 7:9, 10:11), function(v) {q <- lapply(v, function(x) x ** 2);q})
[[1]]
[[1]][[1]]
[1] 1
[[1]][[2]]
[1] 4
[[1]][[3]]
[1] 9
[[2]]
[[2]][[1]]
[1] 16
[[2]][[2]]
[1] 25
[[2]][[3]]
[1] 36
[[3]]
[[3]][[1]]
[1] 49
[[3]][[2]]
[1] 64
[[3]][[3]]
[1] 81
[[4]]
[[4]][[1]]
[1] 100
[[4]][[2]]
[1] 121
Why, in the second expression, where the result of the inner lapply is only assigned to q and q is not called at the end of the function, is the value of the assignment
returned to the outer lapply and thus collected?
Does anybody have an explanation for this phenomenon?
It also works with =
lapply(list(1:3, 4:6, 7:9, 10:11), function(v) q = lapply(c(v), function(x) x ** 2))
[[1]]
[[1]][[1]]
[1] 1
[[1]][[2]]
[1] 4
[[1]][[3]]
[1] 9
[[2]]
[[2]][[1]]
[1] 16
[[2]][[2]]
[1] 25
[[2]][[3]]
[1] 36
[[3]]
[[3]][[1]]
[1] 49
[[3]][[2]]
[1] 64
[[3]][[3]]
[1] 81
[[4]]
[[4]][[1]]
[1] 100
[[4]][[2]]
[1] 121
The answer lies in the return value of an assignment operation. The assignment operator <- not only writes a value to a variable in the calling environment, it actually invisibly returns the assigned value itself to the caller.
Remember all operations in R are actually functions. When you do
x <- 3
You are actually doing
`<-`(x, 3)
Which not only creates the symbol "x" in the calling environment and assigns the value 3 to that symbol, but invisibly returns the value 3 to the caller. To see this, consider:
y <- 2
y
#> [1] 2
y <- `<-`(x, 3)
y
#> [1] 3
Or equivalently,
y <- (x <- 4)
y
#> [1] 4
And in fact, because `<-` is right-associative, so that y <- x <- 5 is evaluated as y <- (x <- 5), we can even do:
y <- x <- 5
y
#> [1] 5
Which is a neat way of setting multiple variables to the same value on the same line.
Now consider the lambda function you use inside your lapply:
function(v) q <- lapply(v, function(x) x ** 2)
Look what happens when we consider this function as a stand-alone:
func <- function(v) q <- lapply(v, function(x) x ** 2)
func(1:3)
As predicted, nothing is printed. But what happens when we do:
a <- func(1:3)
If func(1:3) doesn't return anything, then presumably a should be empty now.
But it isn't...
a
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 4
#>
#> [[3]]
#> [1] 9
Because the value of the assignment was returned to the caller invisibly, we were able to assign it to a variable in the calling scope. Therefore, doing
lapply(list(1:3, 4:6, 7:9, 10:11), function(v) q <- lapply(v, function(x) x ** 2))
collects the invisibly returned value of your inner function for each of the list elements into a new list. This list is not returned invisibly, but just returned as normal, which is why it is printed.
So this is expected behaviour.
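To check the visibility directly, base R's withVisible() reports both the value of an expression and whether it would be auto-printed. A small sketch, reusing the func defined above (written here with ^ instead of **, which is equivalent):

```r
func <- function(v) q <- lapply(v, function(x) x^2)

# The function's last expression is an assignment, so its value comes
# back flagged as invisible.
res <- withVisible(func(1:3))
res$visible
res$value

# lapply() itself returns visibly, whatever the visibility of each piece.
outer <- withVisible(lapply(list(1:3, 4:6), func))
outer$visible

# Wrapping a call in parentheses forces the value to print.
(func(1:3))
```

Here res$visible is FALSE while outer$visible is TRUE, which matches the behaviour described in the answer.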

Change every element of every list (nested lists) using purrr

I have hundreds of observations of census data - each feature is stored within a list with the name census. I am trying to perform an action
a) on all elements of all lists: I want to make all non character elements numeric.
b) a named element present within each list: I want to remove a prefix from a named column in every list
A toy example below.
Census is a nested list within a list
library(tidyverse)
library(purrr)
POA_CODE = c("POA101","POA102")
dogs = c(4,4)
cats = c(3,2)
children = c(0, 1)
salary = c(100, 120)
employed.prop = c(1,0.5)
pets <- list(POA_CODE, as.integer(dogs), as.integer(cats))
children <-list(POA_CODE, as.integer(children))
employment <-list(POA_CODE, salary, employed.prop)
census <- list(pets, children, employment)
Attempt to change all non-numeric elements in every list to numeric
#change all non-numeric elements to numeric
census_num <- census %>%
map(function(x){
ifelse(is.character == TRUE, x,
as.numeric(x))}
)
I get the following error message:
Error in is.character == TRUE :
comparison (1) is possible only for atomic and list types
Attempt to remove prefix from every postcode in census[[]]$'POA_CODE'
#Remove "POA" prefix from every postcode
census_code <- pmap(census, ~.x[["POA_CODE"]],function(x){
str_replace(POA_CODE,"POA","")
})
I get the error
Error: Element 2 of `.l` must have length 1 or 3, not 2
You have a nested list, so you need nested maps :
library(purrr)
map(census, function(x) map_if(x, is.character, ~as.numeric(sub('POA', '', .x))))
#[[1]]
#[[1]][[1]]
#[1] 101 102
#[[1]][[2]]
#[1] 4 4
#[[1]][[3]]
#[1] 3 2
#[[2]]
#[[2]][[1]]
#[1] 101 102
#[[2]][[2]]
#[1] 0 1
#[[3]]
#[[3]][[1]]
#[1] 101 102
#[[3]][[2]]
#[1] 100 120
#[[3]][[3]]
#[1] 1.0 0.5
In base R, we can solve it with nested lapply :
lapply(census, function(x) lapply(x, function(y)
if(is.character(y)) as.numeric(sub('POA', '', y)) else y))
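A self-contained check of the nested lapply approach, rebuilding the toy census from the question. The helper name strip_prefix is an assumption made for illustration; note that elements that are already numeric, including the integer ones, are left untouched.

```r
# Toy data from the question, written out directly.
census <- list(
  list(c("POA101", "POA102"), c(4L, 4L), c(3L, 2L)),      # pets
  list(c("POA101", "POA102"), c(0L, 1L)),                 # children
  list(c("POA101", "POA102"), c(100, 120), c(1, 0.5))     # employment
)

# Hypothetical helper: drop the "POA" prefix from character vectors and
# convert them to numeric; return anything else unchanged.
strip_prefix <- function(y) {
  if (is.character(y)) as.numeric(sub("POA", "", y)) else y
}

census_num <- lapply(census, function(x) lapply(x, strip_prefix))
str(census_num)
```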
You could use rapply() in base R:
rapply(
census,
function(x) if(is.character(x)) as.numeric(sub("^\\D+","", x)) else x,
how = "replace")
#> [[1]]
#> [[1]][[1]]
#> [1] 101 102
#>
#> [[1]][[2]]
#> [1] 4 4
#>
#> [[1]][[3]]
#> [1] 3 2
#>
#>
#> [[2]]
#> [[2]][[1]]
#> [1] 101 102
#>
#> [[2]][[2]]
#> [1] 0 1
#>
#>
#> [[3]]
#> [[3]][[1]]
#> [1] 101 102
#>
#> [[3]][[2]]
#> [1] 100 120
#>
#> [[3]][[3]]
#> [1] 1.0 0.5
or purrr::map_depth()
library(purrr)
map_depth(census, 2, ~if(is.character(.)) as.numeric(sub("^\\D+","", .)) else .)
#> [[1]]
#> [[1]][[1]]
#> [1] 101 102
#>
#> [[1]][[2]]
#> [1] 4 4
#>
#> [[1]][[3]]
#> [1] 3 2
#>
#>
#> [[2]]
#> [[2]][[1]]
#> [1] 101 102
#>
#> [[2]][[2]]
#> [1] 0 1
#>
#>
#> [[3]]
#> [[3]][[1]]
#> [1] 101 102
#>
#> [[3]][[2]]
#> [1] 100 120
#>
#> [[3]][[3]]
#> [1] 1.0 0.5
We can use rrapply with parse_number
library(rrapply)
library(readr)
rrapply(census, f = function(x) if(is.character(x)) readr::parse_number(x) else x)
#[[1]]
#[[1]][[1]]
#[1] 101 102
#[[1]][[2]]
#[1] 4 4
#[[1]][[3]]
#[1] 3 2
#[[2]]
#[[2]][[1]]
#[1] 101 102
#[[2]][[2]]
#[1] 0 1
#[[3]]
#[[3]][[1]]
#[1] 101 102
#[[3]][[2]]
#[1] 100 120
#[[3]][[3]]
#[1] 1.0 0.5

Tracing functions in R

I want to trace a function so that it prints all of its arguments when it is called,
and prints the return value together with the arguments when it returns the result.
The trace function lets you define actions to be performed on entering and on exiting a function call.
Is there a function that returns the list of arguments from within the function, and is there a way to get the result value without handling each of the multiple branches
where the function exits separately?
In the following example, tracing should print a list of both input parameters
(or the function call itself as text) at the call, and the return value when the function exits through any one of the branches.
myfun <- function(a,b){
if (a==1) return(b+1)
if (a==2) return(b*10)
return(b)
}
You're looking for the functions match.call() and returnValue():
myfun <- function(a,b){
if (a==1) return(b+1)
if (a==2) return(b*10)
return(b)
}
trace("myfun", tracer = substitute(print(as.list(match.call()))),
exit = substitute(print(returnValue())))
#> [1] "myfun"
myfun(1, 2)
#> Tracing myfun(1, 2) on entry
#> [[1]]
#> myfun
#>
#> $a
#> [1] 1
#>
#> $b
#> [1] 2
#>
#> Tracing myfun(1, 2) on exit
#> [1] 3
#> [1] 3
myfun(2, 2)
#> Tracing myfun(2, 2) on entry
#> [[1]]
#> myfun
#>
#> $a
#> [1] 2
#>
#> $b
#> [1] 2
#>
#> Tracing myfun(2, 2) on exit
#> [1] 20
#> [1] 20
myfun(3, 2)
#> Tracing myfun(3, 2) on entry
#> [[1]]
#> myfun
#>
#> $a
#> [1] 3
#>
#> $b
#> [1] 2
#>
#> Tracing myfun(3, 2) on exit
#> [1] 2
#> [1] 2
Created on 2018-10-07 by the reprex package (v0.2.1)
As Moody_Mudskipper mentions in the comments, you can also use quote() rather than substitute():
myfun <- function(a,b){
if (a==1) return(b+1)
if (a==2) return(b*10)
return(b)
}
trace("myfun", tracer = quote(print(as.list(match.call()))),
exit = quote(print(returnValue())))
#> [1] "myfun"
myfun(1, 2)
#> Tracing myfun(1, 2) on entry
#> [[1]]
#> myfun
#>
#> $a
#> [1] 1
#>
#> $b
#> [1] 2
#>
#> Tracing myfun(1, 2) on exit
#> [1] 3
#> [1] 3
myfun(2, 2)
#> Tracing myfun(2, 2) on entry
#> [[1]]
#> myfun
#>
#> $a
#> [1] 2
#>
#> $b
#> [1] 2
#>
#> Tracing myfun(2, 2) on exit
#> [1] 20
#> [1] 20
myfun(3, 2)
#> Tracing myfun(3, 2) on entry
#> [[1]]
#> myfun
#>
#> $a
#> [1] 3
#>
#> $b
#> [1] 2
#>
#> Tracing myfun(3, 2) on exit
#> [1] 2
#> [1] 2
Created on 2018-10-07 by the reprex package (v0.2.1)
For an illustration of the difference between the two, see this Stack Overflow question.
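Once you are done, the tracing code can be removed again with untrace(). The sketch below assumes myfun is defined at the top level (the global environment), as in the examples above; print = FALSE is an optional trace() argument that suppresses the "Tracing ... on entry/exit" banner lines.

```r
myfun <- function(a, b) {
  if (a == 1) return(b + 1)
  if (a == 2) return(b * 10)
  b
}

# Install the entry and exit actions, without the banner lines.
trace("myfun",
      tracer = quote(print(as.list(match.call()))),
      exit   = quote(print(returnValue())),
      print  = FALSE)

res <- myfun(1, 2)    # prints the call's arguments, then the return value

# Remove the tracing code and restore the original function.
untrace("myfun")
plain <- myfun(2, 2)  # no tracing output any more
```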
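Alternatively, just write a wrapper version with .trace in the name that returns the arguments together with the result?
myfun.trace <- function(a, b) {
  args <- list(a = a, b = b)
  if (a == 1) return(list(args = args, value = b + 1))
  if (a == 2) return(list(args = args, value = b * 10))
  list(args = args, value = b)
}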

R grep(): How to search for letter "l"?

I have a problem with R's grep() function apparently finding an "l" everywhere:
> l <- list(list(), list("a"), list("a","l"))
> grep("a",l)
[1] 2 3
> grep("l",l)
[1] 1 2 3
> grep("l",l,fixed=TRUE)
[1] 1 2 3
This problem seems to occur only with the letter "l". Does anyone have a hint on that?
Many thanks,
Cord
If you look at the documentation for the argument x in grep you'll see that it should be
a character vector where matches are sought, or an object which can be coerced by as.character to a character vector.
If you try that operation you'll see what goes wrong:
> as.character(l)
[1] "list()" "list(\"a\")" "list(\"a\", \"l\")"
so the same "problem" happens if you grep for i, s etc.
You could try the following instead
sapply(l, function(i) grep("l", i))
which produces
[[1]]
integer(0)
[[2]]
integer(0)
[[3]]
[1] 2
Interesting post, I never knew grep converts the x vector like this:
l <- list(list(), list("a"), list("a","l"))
l
#> [[1]]
#> list()
#>
#> [[2]]
#> [[2]][[1]]
#> [1] "a"
#>
#>
#> [[3]]
#> [[3]][[1]]
#> [1] "a"
#>
#> [[3]][[2]]
#> [1] "l"
Internally grep is converting l to a character vector
grep
#> function (pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
#> fixed = FALSE, useBytes = FALSE, invert = FALSE)
#> {
#> if (!is.character(x))
#> x <- structure(as.character(x), names = names(x))
#> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#> .Internal(grep(as.character(pattern), x, ignore.case, value,
#> perl, fixed, useBytes, invert))
#> }
#> <bytecode: 0x0000000012e18610>
#> <environment: namespace:base>
So now l is actually:
structure(as.character(l), names = names(l))
#> [1] "list()" "list(\"a\")" "list(\"a\", \"l\")"
Each of these strings contains an "l", because each one starts with the word "list".
You could unlist l first to get expected results:
ul <- unlist(l)
ul
#> [1] "a" "a" "l"
grep("a",l)
#> [1] 2 3
grep("a",ul)
#> [1] 1 2
grep("l",l)
#> [1] 1 2 3
grep("l",ul)
#> [1] 3
grep("l",l,fixed=TRUE)
#> [1] 1 2 3
grep("l",ul,fixed=TRUE)
#> [1] 3
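Note that unlist() loses the information about which sub-list each match came from. If you need the indices of the sub-lists that contain "l", a minimal sketch is to test each sub-list separately. The variable name has_l is an assumption made for illustration, and this version checks for an exact element "l" rather than a pattern match.

```r
l <- list(list(), list("a"), list("a", "l"))

# For each sub-list, is the string "l" one of its elements?
# unlist(list()) is NULL, and "l" %in% NULL is FALSE, so empty lists are safe.
has_l <- vapply(l, function(el) "l" %in% unlist(el), logical(1))
has_l

# Indices of the sub-lists that contain "l".
which(has_l)
```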

Resources