R switch(): how to compare cases to a vector?

I've got a little issue with the switch statement in R.
The following code works:
value = "B"
switch(value,
"A"={
print("value = A")
},
"B"={
print("value = B")
},
"C"={
print("value = C")
},
{
print("Other !!!")
}
)
It returns:
[1] "value = B"
... but I would like to compare value to the values of a vector, like this:
mychr = c("A","B", "C")
value = "B"
switch(value,
mychr[1]={
print(mychr[1])
},
mychr[2]={
print(mychr[2])
},
mychr[3]={
print(mychr[3])
},
{
print("Other !!!")
}
)
And that doesn't work.
I can do this with "if else", or with the first code but I would like to understand the mistake.

I'm sure there are easier ways, but you can use alist and do.call:
xlist = c("A","B", "C")
value = "B"
myList <- alist(print(xlist[1]), print(xlist[2]), print(xlist[3]), print("Other !!!"))
myList <- setNames(myList, c(xlist, ''))
do.call(switch, c(EXPR = value, myList))
Examples:
> value = "D"
> do.call(switch, c(EXPR = value, myList))
[1] "Other !!!"
> value = "A"
> do.call(switch, c(EXPR = value, myList))
[1] "A"
> value = "C"
> do.call(switch, c(EXPR = value, myList))
[1] "C"
> value = "B"
> do.call(switch, c(EXPR = value, myList))
[1] "B"

Your mistake is that the switch alternatives require literal names and can't be variables. You're naming the arguments of the function, and internally they become the names of items in a list. In fact, the quotes aren't necessary around A, B and C in your first example. Knowing that, the problem becomes more obvious. Try assigning b <- "B" and then using b as the label of the second switch item. That won't match "B" either, because switch doesn't resolve the variable name; it takes b as the literal label "b". In your case it fails with a syntax error, because mychr[1] isn't a valid alternative label without quotes (and with quotes, "mychr[1]" would only match that literal string).
Similar restrictions apply to switch statements in other languages as well. Think of the switch alternatives as labels, like the names of items in a list. Quoted, they can be almost any string; unquoted, they must be syntactically valid names. What they cannot be is variables.
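To see that a label is taken literally, you can put a variable's name in the label position. A minimal sketch; "matched b" and "no match" are just placeholder results:

```r
b <- "B"
# The label `b` is the literal name "b"; the variable b (holding "B") is never looked up
res_upper <- switch("B", b = "matched b", "no match")
res_lower <- switch("b", b = "matched b", "no match")
print(res_upper)  # [1] "no match"  ("B" is not the label "b")
print(res_lower)  # [1] "matched b"
```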
As you recognize in your question, there are alternative ways to do this. Here's a concise one that avoids switch entirely (so there's no need to build the switch call indirectly with a function like do.call):
i <- which(mychr == value)
if (length(i) == 1) {
  print(paste0('value = ', mychr[i]))
} else {
  print('Other !!!')
}
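A variation on the same idea, sketched with match() instead of which(): match() returns the position of the first match or NA, so the "not found" case doesn't need a length check. describe_value() is a hypothetical helper name, not something from the question:

```r
mychr <- c("A", "B", "C")

# Hypothetical helper: look the value up by position instead of by switch label
describe_value <- function(value, choices) {
  i <- match(value, choices)  # NA when value is not among the choices
  if (is.na(i)) "Other !!!" else paste0("value = ", choices[i])
}

print(describe_value("B", mychr))  # [1] "value = B"
print(describe_value("D", mychr))  # [1] "Other !!!"
```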

The == operator works on vectors, so what you need is just:
> a <- c("A", "B", "C")
> a
[1] "A" "B" "C"
> b <- "B"
> b == a
[1] FALSE TRUE FALSE
Alternatively, you can use which
> which(a == b)
[1] 2
Note that this can return several indices if a contains more than one instance of b.
You can then proceed using an if or ifelse or switch statement on the result.
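For the "switch on the result" route, note that when switch() gets a number rather than a string, it selects the alternative by position, so no labels are needed. A small sketch assuming exactly one match (the "got A"/"got B"/"got C" results are placeholders):

```r
a <- c("A", "B", "C")
b <- "B"
i <- which(a == b)

# With a numeric EXPR, switch() picks the alternative by position
result <- if (length(i) == 1) switch(i, "got A", "got B", "got C") else "Other !!!"
print(result)  # [1] "got B"
```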
PS: you should avoid using list as a variable name, as it is a standard function in R

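Try using vapply, i.e. in your case:
vapply(mychr, switch, "", ...SWITCH ARGS...)
I think what you would want is the following. Note that the labels passed on to switch must still be literal names; mychr[1]=mychr[1] would be a syntax error here, just as in your question:
vapply(mychr, switch, "",
       A = mychr[1], B = mychr[2], C = mychr[3], "Other !!!")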

Related

Add void / blank line between print()

I built a function that prints results in the R console.
The function works within a loop.
Several prints are made when the function is used.
I'd like to add a blank line between printed results.
thx!
Short example of data and a working loop:
test <- c("A","B","C")
for (i in 1:length(test)){
print(test[i])
print("")
}
Actual output:
[1] "A"
[1] ""
[1] "B"
[1] ""
[1] "C"
[1] ""
Desired output:
[1] "A"

[1] "B"

[1] "C"
The comments under your question already provide hacks how to solve your problem. Nevertheless, it is worthwhile to mention some problems of your solution.
First, you are using a for loop, which in R is usually more verbose (and often slower) than a builtin vectorized function. If possible, use the vectorized function instead, e.g. the following (to add a blank line, use sep="\n\n" instead of sep="\n"):
> cat(test, sep="\n")
A
B
C
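If you want to keep the print() style from the question and get the exact desired output, one option (a sketch, not the only way) is to follow each print() with cat("\n"); the vectorized cat() version with a blank-line separator is shown for comparison:

```r
test <- c("A", "B", "C")

# Keep print()'s "[1]" formatting and add an empty line after each element
for (x in test) {
  print(x)
  cat("\n")
}

# Vectorized alternative (no "[1]" prefixes): the separator contains the blank line
cat(test, sep = "\n\n")
cat("\n")  # finish the last line so the console prompt starts on a new line
```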
Moreover, it might generally be preferable, for legibility reasons, to use formatted output with sprintf and cat, e.g.
> for (i in 1:length(test)) {
+   cat(sprintf("test[%i] = %s\n", i, test[i]))
+ }
test[1] = A
test[2] = B
test[3] = C
And because sprintf is a vectorized function, too, the same can be achieved without a for loop:
> cat(sprintf("test[%i] = %s\n", 1:length(test), test), sep="")
test[1] = A
test[2] = B
test[3] = C
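writeLines() is another base R option worth knowing here: its sep argument is written after every element, so sep = "\n\n" leaves a blank line after each formatted line. A minimal sketch (msgs is just an illustrative variable name):

```r
test <- c("A", "B", "C")
msgs <- sprintf("test[%i] = %s", seq_along(test), test)

# writeLines() writes sep after each element, so "\n\n" adds one blank line per entry
writeLines(msgs, sep = "\n\n")
```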

strsplit(rquote, split = "")[[1]] in R

rquote <- "r's internals are irrefutably intriguing"
chars <- strsplit(rquote, split = "")[[1]]
This question has been asked before on this forum and has one answer on it but I couldn't understand anything from that answer, so here I am asking this question again.
In the above code, what is the meaning of [[1]]?
The program that I'm trying to run:
rquote <- "r's internals are irrefutably intriguing"
chars <- strsplit(rquote, split = "")[[1]]
rcount <- 0
for (char in chars) {
if (char == "r") {
rcount <- rcount + 1
}
if (char == "u") {
break
}
}
print(rcount)
When I don't use [[1]] I get the following warning message in for loop and I get a wrong output of 1 for rcount instead of 5:
Warning message: the condition has length > 1 and only the first element will be used
strsplit is vectorized. That means it splits each element of a vector into a vector of pieces. To handle this vector of vectors, it returns a list in which each slot (indexed by [[) corresponds to an element of the input vector.
If you use the function on a one element vector (single string as you do), you get a one-slot list. Using [[1]] right after strsplit() selects the first slot of the list - the anticipated vector.
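Unfortunately, your list chars also works in a for loop: you get one iteration, over the one slot. Inside if, you then compare the whole vector of letters against "r", which throws the warning (in R 4.2.0 and later this is an error instead). Since the first element of the comparison is TRUE (the string starts with "r"), the condition holds and rcount is raised by 1, which is your result. Because you are iterating over the one slot holding the whole phrase rather than over the individual letters, there is no second iteration and the loop stops there.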
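A small sketch that makes the single iteration visible: counting loop iterations over the list returned by strsplit() (without [[1]]) and looking at the length of the comparison inside it. chars_list, n_iter and comparison are illustrative names:

```r
rquote <- "r's internals are irrefutably intriguing"
chars_list <- strsplit(rquote, split = "")  # a list with one slot

n_iter <- 0
for (slot in chars_list) {
  n_iter <- n_iter + 1
  # `slot` is the whole 40-letter vector, so this comparison has 40 elements
  comparison <- slot == "r"
}
print(n_iter)              # [1] 1
print(length(comparison))  # [1] 40
print(comparison[1])       # [1] TRUE
```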
Maybe if you run something like strsplit(c("one", "two"), split=""), the outcome will be clearer.
> strsplit(c("one", "two"), split="")
[[1]]
[1] "o" "n" "e"
[[2]]
[1] "t" "w" "o"
> strsplit(c("one", "two"), split="")[[1]]
[1] "o" "n" "e"
> strsplit(c("one"), split="")[[1]][2]
[1] "n"
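Because strsplit() returns one list slot per input string, an apply function can process every slot. A sketch, assuming you want the number of "r"s in each word (the words are made up for illustration):

```r
words <- c("rare", "fun", "mirror")
# One list slot per input string; count the "r"s in each slot
r_counts <- vapply(strsplit(words, split = ""), function(ch) sum(ch == "r"), integer(1))
print(r_counts)  # [1] 2 0 3
```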
We'll start with the below as data, without [[1]]:
rquote <- "r's internals are irrefutably intriguing"
chars2 <- strsplit(rquote, split = "")
class(chars2)
[1] "list"
It is always good to have an estimate of your return value, your above '5'. We have both length and lengths.
length(chars2)
[1] 1 # our list
lengths(chars2)
[1] 40 # elements within our list
We'll use lengths() for the loop counter and, as you did, initialize the counter outside the loop:
rcount2 <- 0
for (i in 1:lengths(chars2)) {
if (chars2[[1]][i] == 'r') {
rcount2 <- rcount2 +1
}
if (chars2[[1]][i] == 'u') {
break
}
}
print(rcount2)
[1] 6
length(which(chars2[[1]] == 'r')) # as a check, and another way to estimate
[1] 6
Now supposing, rather than list, we have a character vector:
chars1 <- strsplit(rquote, split = '')[[1]]
length(chars1)
[1] 40
rcount1 <- 0
for(i in 1:length(chars1)) {
if(chars1[i] == 'r') {
rcount1 <- rcount1 +1
}
if (chars1[i] == 'u') {
break
}
}
print(rcount1)
[1] 5
length(which(chars1 == 'r'))
[1] 6
Hey, there's your '5'. What's going on here? Head scratch...
all.equal(chars1, unlist(chars2))
[1] TRUE
That break should give us only the 5 'r's before the first 'u' is encountered. So what's happening when it's a list (or does that matter?), and how does the final 'r' make it into rcount2?
It doesn't. After a coffee break and a rerun, the list version runs fine and also returns 5, so the 6 above was my own mistake; only the which() check counts all 6, because it doesn't stop at the 'u'. As a final note, if you really want to see what a loop is doing, put browser() inside your for loop and step through it:
Browse[1]> i
[1] 24
Browse[1]> n
debug at #7: break
Browse[1]> chars2[[1]][i] == 'u'
[1] TRUE
Browse[1]> n
> rcount2
[1] 5
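As an aside, the same "count the r's before the first u" logic can be written without a loop. This is a sketch using match() to find the first "u" (it falls back to the whole vector when there is no "u"):

```r
rquote <- "r's internals are irrefutably intriguing"
chars <- strsplit(rquote, split = "")[[1]]

first_u <- match("u", chars)  # position of the first "u", or NA if there is none
before_u <- if (is.na(first_u)) chars else chars[seq_len(first_u - 1)]
rcount <- sum(before_u == "r")
print(first_u)  # [1] 24
print(rcount)   # [1] 5
```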

Recode if string (with punctuation) contains certain text

How can I search through a character vector and, if the string at a given index contains a pattern, replace that index's value?
I tried this:
List <- c(1:8)
Types<-as.character(c(
"ABC, the (stuff).\n\n\n fun", "meaningful", "relevant", "rewarding",
"unpleasant", "enjoyable", "engaging", "disinteresting"))
for (i in List) {
if (grepl(Types[i], "fun", fixed = TRUE))
{Types[i]="1"
} else if (grepl(Types[i], "meaningful", fixed = TRUE))
{Types[i]="2"}}
The code works for "meaningful", but doesn't when there's punctuation or other things in the string, as with "fun".
The first argument to grepl is the pattern, not the string.
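A quick check of the argument order, using the first element of Types from the question:

```r
txt <- "ABC, the (stuff).\n\n\n fun"
# Pattern first, then the text to search in
print(grepl("fun", txt, fixed = TRUE))  # [1] TRUE
# Swapped: this searches for the whole sentence inside "fun"
print(grepl(txt, "fun", fixed = TRUE))  # [1] FALSE
```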
This would be a literal fix of your code:
for (i in seq_along(Types)) {
if (grepl("fun", Types[i], fixed = TRUE)) {
Types[i] = "1"
} else if (grepl("meaningful", Types[i], fixed = TRUE)) {
Types[i] = "2"
}
}
Types
# [1] "1" "2" "relevant" "rewarding" "unpleasant"
# [6] "enjoyable" "engaging" "disinteresting"
BTW, the use of List works, but it's a little extra: when you have separate variables like that, it is possible that one might go out of sync with the other. For instance, if you update Types and forget to update List, then it will break (or fail). For this, I used seq_along(Types) instead.
BTW: here's a slightly different version that leaves Types untouched, returns a new vector, and introduces you to the power of vectorization:
Types2 <- Types
Types2[grepl("fun", Types2, fixed = TRUE)] <- "1"
Types2[grepl("meaningful", Types2, fixed = TRUE)] <- "2"
Types2
# [1] "1" "2" "relevant" "rewarding" "unpleasant"
# [6] "enjoyable" "engaging" "disinteresting"
The next level (perhaps over-complicating?) would be to store the patterns and recoding replacements in a frame (always a 1-to-1, you'll never accidentally update one without the other, can be stored in CSV if needed) and Reduce on it:
ptns <- data.frame(ptn = c("fun", "meaningful"), repl = c("1", "2"), stringsAsFactors = FALSE)
Reduce(function(txt, i) {
txt[grepl(ptns$ptn[i], txt, fixed = TRUE)] <- ptns$repl[i]
txt
}, seq_len(nrow(ptns)), init = Types)
# [1] "1" "2" "relevant" "rewarding" "unpleasant"
# [6] "enjoyable" "engaging" "disinteresting"
You could use str_replace_all:
library(stringr)
pat <- c(fun = '1', meaningful = '2')
str_replace_all(Types, setNames(pat, sprintf('(?s).*%s.*', names(pat))))
[1] "1" "2" "relevant"
[4] "rewarding" "unpleasant" "enjoyable"
[7] "engaging" "disinteresting"
Try str_replace(string, pattern, replacement) from the stringr package.
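If you'd rather avoid the stringr dependency, the whole-string recode can be sketched in base R with sub() and a Perl-compatible regex instead; this swaps in sub() for the stringr functions above, and (?s) lets . match the newlines in the first element:

```r
Types <- c("ABC, the (stuff).\n\n\n fun", "meaningful", "relevant", "rewarding",
           "unpleasant", "enjoyable", "engaging", "disinteresting")

# Replace the *whole* string whenever it contains the pattern anywhere
recoded <- sub("(?s).*fun.*", "1", Types, perl = TRUE)
recoded <- sub("(?s).*meaningful.*", "2", recoded, perl = TRUE)
print(recoded)
```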

R - How to extract object names from expression

Given an rlang expression:
expr1 <- rlang::expr({
d <- a + b
})
How can I retrieve the names of the objects referred to within the expression?
> extractObjects(expr1)
[1] "d" "a" "b"
Better yet, how to retrieve the object names and categorise them by "required"(input) and "created"(output) ?
> extractObjects(expr1)
$created
[1] "d"
$required
[1] "a" "b"
The base function all.vars does this:
〉all.vars(expr1)
[1] "d" "a" "b"
Alternatively, you can use all.names to get all names in the expression rather than just those that aren’t used as calls or operators:
〉all.names(expr1)
[1] "{" "<-" "d" "+" "a" "b"
Don’t be misled: this result is correct! All of these appear in the expression, not just a, b and d.
But it may not be what you want.
In fact, I’m assuming what you want corresponds to the leaf tokens in the abstract syntax tree (AST) — in other words, everything except function calls (and operators, which are also function calls).
The syntax tree for your expression looks as follows:¹
    {
    |
    <-
   /  \
  d    +
      / \
     a   b
Getting this information means walking the AST:
leaf_nodes = function (expr) {
if(is.call(expr)) {
unlist(lapply(as.list(expr)[-1L], leaf_nodes))
} else {
as.character(expr)
}
}
〉leaf_nodes(expr1)
[1] "d" "a" "b"
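One difference from all.vars() worth noting: leaf_nodes() also returns constants, since as.character() turns a number into a string. A sketch repeating leaf_nodes() from above so the block runs on its own (the expression y <- x + 1 is just an example):

```r
leaf_nodes <- function(expr) {
  if (is.call(expr)) {
    # Skip the function being called (element 1) and recurse into its arguments
    unlist(lapply(as.list(expr)[-1L], leaf_nodes))
  } else {
    as.character(expr)
  }
}

e <- quote(y <- x + 1)
print(leaf_nodes(e))  # [1] "y" "x" "1"
print(all.vars(e))    # [1] "y" "x"
```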
Thanks to the AST representation we can also find inputs and outputs:
is_assignment = function (expr) {
is.call(expr) && as.character(expr[[1L]]) %in% c('=', '<-', '<<-', 'assign')
}
vars_in_assign = function (expr) {
if (is.call(expr) && identical(expr[[1L]], quote(`{`))) {
vars_in_assign(expr[[2L]])
} else if (is_assignment(expr)) {
list(created = all.vars(expr[[2L]]), required = all.vars(expr[[3L]]))
} else {
stop('Expression is not an assignment')
}
}
〉vars_in_assign(expr1)
$created
[1] "d"
$required
[1] "a" "b"
Note that this function does not handle complex assignments (e.g. stuff like d[x] <- a + b or f(d) <- a + b) very well.
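A possible extension, sketched here with a hypothetical vars_in_block() helper (not part of the answer above): walk every statement of a { block, and count a variable as required only if it is read before the block has created it. It uses base quote() instead of rlang::expr(), which produces the same kind of call here, and it shares the limitation on complex assignments (for d[x] <- ..., the x on the left would be counted as created):

```r
# Hypothetical helper: collect created and required variables over a whole { block
vars_in_block <- function(expr) {
  # A { block is a sequence of statements; anything else is a single statement
  stmts <- if (is.call(expr) && identical(expr[[1L]], quote(`{`))) {
    as.list(expr)[-1L]
  } else {
    list(expr)
  }
  created <- character(0)
  required <- character(0)
  for (s in stmts) {
    is_assign <- is.call(s) &&
      (identical(s[[1L]], quote(`<-`)) || identical(s[[1L]], quote(`=`)))
    lhs_vars <- if (is_assign) all.vars(s[[2L]]) else character(0)
    rhs_vars <- if (is_assign) all.vars(s[[3L]]) else all.vars(s)
    # A variable is an input only if it is read before the block creates it
    required <- union(required, setdiff(rhs_vars, created))
    created <- union(created, lhs_vars)
  }
  list(created = created, required = required)
}

res <- vars_in_block(quote({
  d <- a + b
  e <- d * c
}))
str(res)
```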
¹ lobstr::ast shows the syntax tree differently, namely as
█─`{`
└─█─`<-`
├─d
└─█─`+`
├─a
└─b
… but the above representation is more conventional outside R, and I find it more intuitive.
Another solution is to extract the abstract syntax tree explicitly:
getAST <- function(ee) purrr::map_if(as.list(ee), is.call, getAST)
str(getAST(expr1))
# List of 2
# $ : symbol {
# $ :List of 3
# ..$ : symbol <-
# ..$ : symbol d
# ..$ :List of 3
# .. ..$ : symbol +
# .. ..$ : symbol a
# .. ..$ : symbol b
Then traverse the AST to find the assignment(s):
extractObjects <- function(ast)
{
## Ensure that there is at least one node
if( length(ast) == 0 ) stop("Provide an AST")
## If we are working with the assignment
if( identical(ast[[1]], as.name("<-")) ) {
## Separate the LHS and RHS
list(created = as.character(ast[[2]]),
required = sapply(unlist(ast[[3]]), as.character))
} else {
## Otherwise recurse to find all assignments
rc <- purrr::map(ast[-1], extractObjects)
## If there was only one assignment, simplify reporting
if( length(rc) == 1 ) purrr::flatten(rc)
else rc
}
}
extractObjects( getAST(expr1) )
# $created
# [1] "d"
#
# $required
# [1] "+" "a" "b"
You may then filter math operators out, if needed.
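One way to do that filtering, as a heuristic sketch: keep only the entries that look like syntactic R names. The regex is an assumption and would also drop non-syntactic (backtick-quoted) variable names:

```r
required <- c("+", "a", "b")  # the $required result shown above
# Keep entries that look like syntactic variable names (a heuristic, not a parser)
is_name_like <- grepl("^[[:alpha:].][[:alnum:]._]*$", required)
print(required[is_name_like])  # [1] "a" "b"
```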
This is an interesting one. I think that conceptually, it might not be clear in ALL possible expressions what exactly is input and output. You can see how an expression is nested by looking at its so-called abstract syntax tree (AST), which you can visualize with lobstr::ast().
So in simple cases where you always have LHS <- operations on RHS variables, if you iterate over the AST, you will always get the LHS right after the <- operator. If you assign z <- rlang::expr(d <- a+b), then z behaves like a list and you can, for example, do the following:
z <- rlang::expr(d <- a+b)
for (i in 1:length(z)) {
if (is.symbol(z[[i]])) {
print(paste("Element", i, "of z:", z[[i]], "is of type", typeof(z[[i]])))
if (grepl("[[:alnum:]]", z[[i]])) {print(paste("Seems like", z[[i]], "is a variable"))}
} else {
for (j in 1:length(z[[i]])){
print(paste("Element", j, paste0("of z[[",i,"]]:"), z[[i]][[j]], "is of type", typeof(z[[i]][[j]])))
if (grepl("[[:alnum:]]", z[[i]][[j]])) {print(paste("Seems like", z[[i]][[j]], "is a variable"))}
}
}
}
#> [1] "Element 1 of z: <- is of type symbol"
#> [1] "Element 2 of z: d is of type symbol"
#> [1] "Seems like d is a variable"
#> [1] "Element 1 of z[[3]]: + is of type symbol"
#> [1] "Element 2 of z[[3]]: a is of type symbol"
#> [1] "Seems like a is a variable"
#> [1] "Element 3 of z[[3]]: b is of type symbol"
#> [1] "Seems like b is a variable"
Created on 2020-09-03 by the reprex package (v0.3.0)
As you can see, these trees can quickly get complicated and nested. So in a simple case like your example, assuming that variables use alphanumeric names, we can roughly identify what the "objects" (as you call them) are and what the operators are (which don't match the [[:alnum:]] regex). Note that the type cannot be used to distinguish between objects and operators, since it is always symbol (by the way, z above is a call, of type "language", as is z[[3]]; that is why we can condition on whether z[[i]] is a symbol and, if not, dig a level deeper). You could then (at your peril) try to classify the objects that appear immediately after a <- as "outputs" and the rest as "inputs", but I don't have much confidence in this, especially for more complex expressions.
In short, this is all very speculative.

names of leaves of nested list in R

I want to check if two nested lists have the same names at the last level.
If unlist gave an option not to concatenate names this would be trivial. However, it looks like I need some function leaf.names():
X <- list(list(a = pi, b = list(alpha.c = 1:5, 7.12)), d = "a test")
leaf.names(X)
[1] "a" "alpha.c" "NA" "d"
I want to avoid any inelegant grepping if possible. I feel like there should be some easy way to do this with rapply or unlist...
leaf.names <- function(X) names(rlang::squash(X))
or
leaf.names <- function(X){
while(any(sapply(X, is.list))) X <- purrr::flatten(X)
names(X)
}
gives
leaf.names(X)
# [1] "a" "alpha.c" "" "d"
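If you'd rather not depend on rlang or purrr, a recursive base R version is possible. This is a sketch with a hypothetical leaf_names() function: it descends into every list element and reports unnamed leaves as "":

```r
# Hypothetical base R helper: names at the leaves of a nested list
leaf_names <- function(x) {
  nms <- names(x)
  if (is.null(nms)) nms <- rep("", length(x))  # unnamed lists get "" names
  unlist(lapply(seq_along(x), function(i) {
    # Recurse into sub-lists, otherwise report this element's own name
    if (is.list(x[[i]])) leaf_names(x[[i]]) else nms[i]
  }))
}

X <- list(list(a = pi, b = list(alpha.c = 1:5, 7.12)), d = "a test")
leaf_names(X)  # returns c("a", "alpha.c", "", "d")
```

To compare two nested lists, you could then use identical(leaf_names(X), leaf_names(Y)). Keep in mind that data frames are lists too, so they would be descended into as well.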
