R: Collect intermediate output of recursive function

I have a recursive function that uses the output of the previous call as the input of the next call:
recurse_foo = function(input) {
if(identical(input, character(0))) return(NULL)
else {
uu = get_incremental_output(input) ## <-- interested in collecting this
return(recurse_foo(uu))
}
}
As is evident, the terminal output is not very interesting, and I am interested in collecting the intermediate output, but I cannot imagine that growing a global list or any other side effect would be elegant (which is the only thing I can think of).
Any other abstractions that might be useful here?
Thanks.
Specific example:
final_countdown = function(input) {
if (input/2 < 1) return(NULL)
else {
uu = input/2 # <-- interested in collecting this
print(uu)
return(final_countdown(uu))
}
}
final_countdown(100)
In this case, I am interested in collecting the sequence of uus that are printed.

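This is a solution if all intermediate outputs are of the same type; each call prepends its own uu to whatever the deeper calls return, and the base case returns NULL so nothing extra is appended:
final_countdown = function(input) {
  uu = input/2
  if (uu < 1) return(NULL)
  c(uu, final_countdown(uu))
}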
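If the intermediate outputs are not all of the same type, the same idea should work with a list instead of an atomic vector. Here is a minimal sketch; collect_steps and halve are made-up names, and halve only stands in for get_incremental_output from the question:

```r
# Hypothetical step function standing in for get_incremental_output();
# it returns NULL when there is nothing more to produce
halve <- function(x) if (x / 2 < 1) NULL else x / 2

# Recursively collect every intermediate output in a list,
# so the outputs do not need to share a type
collect_steps <- function(input, step) {
  out <- step(input)
  if (is.null(out)) return(list())
  c(list(out), collect_steps(out, step))
}

steps <- collect_steps(100, halve)
unlist(steps)
```

Because the results are wrapped in list(), nothing gets coerced; unlist() is only used at the end here because the toy outputs happen to be numeric.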

Related

Return() Not Working While Print() does after building a function in R

I'm working with panel data in R and am endeavoring to build a function that returns every user ID where PCA==1. I've largely gotten this to work, with one small problem: it only returns the values when I end the function with print() but does not do so when I end the function with return(). As I want the ids in a vector so I can later subset the data to only include those IDs, that's a problem. Code reflected below - can anyone advise on what I'm doing wrong?
The version that works (but doesn't do what I want):
retrievePCA<-function(data) {
for (i in 1:dim(data)[1]) {
if (data$PCA[i] == 1) {
id<-data$CPSIDP[i]
print(id)
}
}
}
retrievePCA(data)
The version that doesn't:
retrievePCA<-function(data) {
for (i in 1:dim(data)[1]) {
if (data$PCA[i] == 1) {
id<-data$CPSIDP[i]
return(id)
}
}
}
vector<-retrievePCA(data)
vector
Your problem is a simple misunderstanding of what returning from a function does.
Take the small example below
f <- function(x){
  x <- x * x
  return(x)  # the function exits here
  x <- x * x # never reached
  return(x)
}
f(2)
[1] 4
4 is returned, 16 is not. That is because return() exits the function immediately, returning the specified value. So your function hits the first row where PCA[i] == 1 and then exits. Instead, you should build up a vector, list, or similar container and return that.
retrievePCA <- function(data) {
  ids <- vector('list', nrow(data))
  for (i in seq_len(nrow(data))) {
    if (data$PCA[i] == 1) {
      ids[[i]] <- data$CPSIDP[i]
    }
  }
  # unlist() drops the NULL slots left by rows that did not match
  return(unlist(ids))
}
However you could just do this in one line
data$CPSIDP[data$PCA == 1]
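To see what the one-liner does, here is a small sketch on made-up data (the toy data frame and its values are invented for illustration). One thing to watch for: if PCA contains NA, plain logical indexing returns NA for those rows, and wrapping the condition in which() avoids that:

```r
# Toy panel data; the values are invented for illustration
toy <- data.frame(
  CPSIDP = c(101, 102, 103, 104),
  PCA    = c(1, 0, NA, 1)
)

# Logical indexing keeps an NA for the row where PCA is NA
with_na <- toy$CPSIDP[toy$PCA == 1]

# which() returns only the positions where the condition is TRUE
ids <- toy$CPSIDP[which(toy$PCA == 1)]

# The ids can then be used to subset the data
subset_rows <- toy[toy$CPSIDP %in% ids, ]
```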

How to return event$data in rstudio/websocket

I am trying to extend websocket::WebSocket with a method that sends some data and returns the message, so that I can assign it to an object. My question is pretty much identical to https://community.rstudio.com/t/capture-streaming-json-over-websocket/16986. Unfortunately, the user there never revealed how they solved it themselves. My idea was to have the onMessage method return the event$data, i.e. something like:
my_websocket <- R6::R6Class("My websocket",
inherit = websocket::WebSocket,
public = list(
foo = function(x) {
msg <- super$send(paste("x"))
return(msg)
} )
)
load_websocket <- function(){
ws <- my_websocket$new("ws://foo.local")
ws$onMessage(function(event) {
return(event$data)
})
return(ws)
}
my_ws <- load_websocket()
my_ws$foo("hello") # returns NULL
but after spending a good hour on the Websocket source code, I am still completely in the dark as to where exactly the callback happens, "R environment wise".
You need to use the super-assignment operator <<-. It is most useful in conjunction with closures to maintain state. Unlike the usual single-arrow assignment (<-), which always assigns in the current environment, the double-arrow operator modifies an existing variable in an enclosing environment (or creates one in the global environment if none is found).
my_websocket <- R6::R6Class("My websocket",
inherit = websocket::WebSocket,
public = list(
foo = function(x) {
msg <<- super$send(paste("x"))
return(msg)
} )
)
load_websocket <- function(){
ws <- my_websocket$new("ws://foo.local")
ws$onMessage(function(event) {
return(event$data)
})
return(ws)
}
my_ws <- load_websocket()
my_ws$foo("hello")
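To illustrate the closure-plus-<<- idea on its own, here is a minimal sketch that does not open a real connection. make_collector is a made-up helper, and the callback is invoked by hand with a list that mimics the event object; with a real WebSocket one would presumably pass collector$on_message to ws$onMessage instead:

```r
# Hypothetical helper: returns a callback plus a getter that share state
make_collector <- function() {
  messages <- character(0)
  list(
    on_message = function(event) {
      # <<- updates `messages` in the enclosing environment,
      # so the value survives after the callback returns
      messages <<- c(messages, event$data)
      invisible(NULL)
    },
    get = function() messages
  )
}

collector <- make_collector()

# Simulate two incoming messages (assumption: event is a list with $data)
collector$on_message(list(data = "hello"))
collector$on_message(list(data = "world"))

collector$get()
```

The return value of a callback is discarded by whoever calls it, which is why the data has to be stored somewhere the rest of the code can reach.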

Detecting first iteration (cycle) in R loop (without counter)

I'd like to detect the first iteration in a loop within a function from inside the body of the loop (i.e., without using some counter variable defined outside the loop), and in the most flexible possible manner.
Here would be one basic solution, just to demonstrate the idea:
isfirst = function(ele, vec) {
  if (ele == vec[1]) {
    print('this is the first cycle!')
  } else {
    print('this is NOT the first cycle!')
  }
}
vect = c('x', 'y', 'z')
for (elem in vect) {
  print(elem)
  isfirst(elem, vect)
}
The "problem" with this is that I want this function to be easily reusable in any loop: that means that it should not need loop-specific arguments such as elem and vect. That is: another loop might use e.g. for (my_item in my_list) etc., and so then the isfirst arguments would need to be modified correspondingly, e.g. isfirst(my_item, my_list). The ideal way would be to just have an isfirst() without any arguments needed.
I'm not sure whether this is even possible, but I welcome any ideas.
(About why I need this: I would simply want to provide users with a function that behaves differently based on whether or not the iteration is the first, and that they can flexibly use in any loop and don't need to make even this small adjustment of changing the arguments.)
Well, here is the closest I could get:
enum = function(vec) {
  # reset the flag each time a new loop starts
  assign("first_iteration", TRUE, envir = .GlobalEnv)
  vec = mapply(c, seq_along(vec), vec, SIMPLIFY = FALSE) # this is just a small extra, not related to the question
  return(vec)
}
isfirst = function() {
  if (isTRUE(first_iteration)) {
    print('this is the first cycle!')
    assign("first_iteration", FALSE, envir = .GlobalEnv)
  } else {
    print('this is NOT the first cycle!')
  }
}
vect = c('x', 'y', 'z')
for (elem in enum(vect)) {
  print(elem)
  isfirst()
}
But I'm still hoping for a better solution.
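One possible variation, sketched here under the same design (a wrapper around the loop vector plus an argument-free isfirst()), is to keep the flag in a dedicated environment rather than in .GlobalEnv, so that user variables cannot clash with it. The name loop_state is made up for this example:

```r
# Private environment holding the loop state (hypothetical name)
loop_state <- new.env()

# Wrap the loop vector; called once when the for loop starts
enum <- function(vec) {
  loop_state$first <- TRUE
  vec
}

# Returns TRUE on the first call after enum(), FALSE afterwards
isfirst <- function() {
  was_first <- isTRUE(loop_state$first)
  loop_state$first <- FALSE
  was_first
}

flags <- logical(0)
for (elem in enum(c('x', 'y', 'z'))) {
  flags <- c(flags, isfirst())
}
flags
```

This still requires the loop to go through enum(), and nested loops would share one flag, so it is not a fully general answer.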

What to set return value for function that normally returns matrix but could fail a conditional

I have a function that is supposed to return a matrix for further functions to use. I have it reading in a file and doing calculations, but I need the main function to skip to the next file if the current one does not meet the correct formatting. This is basically how I have it now:
for (file in list.files(directory)) {
for (i in 1:length(var)) {
matrix <- foo('someFile.txt',var[i]) # returns matrix under normal conditions
if (typeof(matrix) == "logical") { # check if foo returns FALSE
warning(paste0('File ',file, ' is not formatted correctly'))
break # skip to next file if so
}
...
}
}
foo <- function(input,seq) {
data <- readLines(input)
if (!data[1] %in% c("first","line","values")) {
return(FALSE)
}
...
return(data)
}
But testing the class matrix returns seems clunky and poor technique. Sorry I don't know how to phrase the question better.
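One common alternative to returning FALSE is to signal an error with stop() inside the reader and catch it with tryCatch() in the caller, or to return NULL and test with is.null(). The sketch below combines both; read_checked and safe_read are made-up names, the file contents are invented, and the real parsing step is left out:

```r
# Hypothetical reader: signals an error instead of returning FALSE
read_checked <- function(path) {
  data <- readLines(path)
  if (length(data) == 0 || !data[1] %in% c("first", "line", "values")) {
    stop("File ", path, " is not formatted correctly", call. = FALSE)
  }
  data
}

# Wrapper: turns the error into a warning and returns NULL
safe_read <- function(path) {
  tryCatch(
    read_checked(path),
    error = function(e) {
      warning(conditionMessage(e), call. = FALSE)
      NULL
    }
  )
}

# Two temporary files, one well-formed and one not
good <- tempfile(fileext = ".txt")
bad <- tempfile(fileext = ".txt")
writeLines(c("first", "1 2 3"), good)
writeLines(c("oops", "1 2 3"), bad)

results <- list()
for (path in c(good, bad)) {
  data <- safe_read(path)
  if (is.null(data)) next  # skip to the next file
  results[[path]] <- data
}
length(results)
```

is.null() states the intent more directly than checking typeof(), and the error message travels with the condition instead of being rebuilt in the loop.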

R -- screening Excel rows according to characteristics of multiple cells

I am trying to eliminate all rows in Excel that have the following features:
First column is an integer
Second column begins with an integer
Third column is empty
The code I have written appears to run indefinitely. CAS.MULT is the name of my dataframe.
for (i in 1:nrow(CAS.MULT)) {
testInteger <- function(x) {
test <- all.equal(x, as.integer(x), check.attributes = FALSE)
if (test == TRUE) {
return (TRUE)
}
else {
return (FALSE)
}
}
if (testInteger(as.integer(CAS.MULT[i,1])) == TRUE) {
if (testInteger(as.integer(substring(CAS.MULT[i,2],1,1))) == TRUE) {
if (CAS.MULT[i,3] == '') {
CAS.MULT <- data.frame(CAS.MULT[-i,])
}
}
}
}
You should be very wary of deleting rows within a for loop; it often leads to undesired behavior, because the row indices shift while the loop keeps counting up to the original number of rows. There are a number of ways you could handle this. For instance, you can flag the rows for deletion and then delete them afterwards.
Another thing I noticed is that you are converting your columns to integers before passing them to your function to test if they are integers, so you will be incorrectly returning true for all values passed to the function.
Maybe something like this would work (without a reproducible example it's hard to say if it will work or not):
# TRUE only if x can be read as a whole number
testInteger <- function(x) {
  num <- suppressWarnings(as.numeric(x))
  !is.na(num) && num %% 1 == 0
}
toDelete <- integer(0)
for (i in seq_len(nrow(CAS.MULT))) {
  if (testInteger(CAS.MULT[i,1]) &&
      testInteger(substring(CAS.MULT[i,2],1,1)) &&
      isTRUE(CAS.MULT[i,3] == '')) {
    toDelete <- c(toDelete, i)
  }
}
# Negative indexing with an empty vector would drop every row, so guard it
if (length(toDelete) > 0) CAS.MULT <- CAS.MULT[-toDelete,]
Hard to be sure without testing my code on your data, but this might work. Instead of a loop, the code below uses logical indexing based on the conditions you specified in your question. This is vectorized (meaning it operates on the entire data frame at once, rather than by row) and is much faster than looping row by row:
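# Drop a row only when all three conditions hold (hence & rather than |)
col1 = suppressWarnings(as.numeric(CAS.MULT[,1]))
drop = !is.na(col1) & col1 %% 1 == 0 &
  grepl("^[0-9]", CAS.MULT[,2]) &
  (is.na(CAS.MULT[,3]) | CAS.MULT[,3] == "")
CAS.MULT.screened = CAS.MULT[!drop, ]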
For more on checking whether a value is an integer, see this SO question.
One other thing: Just for future reference, for efficiency you should define your function outside the loop, rather than recreating the function every time through the loop.
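Since there is no reproducible example in the question, here is a small sketch of the logical-indexing approach on invented data. The column names and values are made up, and is_whole is a hypothetical helper defined once, outside any loop:

```r
# Toy data standing in for CAS.MULT; names and values are invented
toy <- data.frame(
  id   = c("1", "2.5", "3", "abc"),
  code = c("7-A", "9-B", "X-1", "4-C"),
  note = c("", "", "", "kept"),
  stringsAsFactors = FALSE
)

# Vectorized check: TRUE where the value can be read as a whole number
is_whole <- function(x) {
  num <- suppressWarnings(as.numeric(x))
  !is.na(num) & num %% 1 == 0
}

# A row is dropped only if all three conditions hold
drop <- is_whole(toy$id) & grepl("^[0-9]", toy$code) & toy$note == ""
screened <- toy[!drop, ]
screened$id
```

Only the first row meets all three conditions here, so it is the only one removed.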
