Global/Local Variable Scoping Difficulty in R

I've been trying to dust off my R skills recently and I've been having some difficulty with variable scoping in this particular code.
My function loop calls other functions in the program, which currently work without any problem using years and trials (both ints) and simMat (a numeric matrix). My main question is about the matrix simMat. I want to be able to call it from the command line and see the values, but whenever I do that I get a matrix of NAs and I don't know why. I am nearly positive that it has something to do with variable scoping, but I am not very familiar with that. Also, the suppressWarnings calls are there to get rid of messages about coercion (I don't know a lot about that either; any recommendation is appreciated).
I want to be able to call simMat from the command line and pass it to another function to do some arithmetic. I would greatly appreciate any help on how I can accomplish this!
#This looks the same for the func asking for the num of years and trials
numTrials <- function()
{
  trials <- readline(prompt="How many trials? ")
  trials <- as.integer(trials)
  if (is.na(trials)){
    trials <- readinteger()
  }
  return(trials)
}
#Do the simple cash flow simulation
loop <- function(trials, years)
{
  trials <- suppressWarnings(numTrials())
  years <- suppressWarnings(numYears())
  simMat <- matrix(nrow=trials, ncol=years)
  for (i in 1:trials){
    sim <- newCashFlow[1]
    for (j in 1:years){
      simMat[i,j] <- sim
      random <- randomRates(cholMat2)
      sim <- sim + sum(random*newCashFlow[j]*weights)
    }
  }
  simMat
  plotSimulation(simMat,years,i)
}

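If you intend to access something from the R console, which works in the global environment, then the variable has to exist in that environment. A variable created inside a function lives in that function's own environment and is gone once the function returns. So you need to create the variable OUTSIDE of the function, in the environment you will be working in (for example by having loop return the matrix and assigning the result at the console). That way it will persist when the loop function has completed its tasks.
To be able to use the matrix simMat outside of loop, create it there.
Also, before doing so, run the following at the console to see which of these variables exist in the global environment. This will help you understand what happens as you make changes. (Sys.getenv is not the right tool here: it reads operating system environment variables, not R objects.)
sapply(c("simMat", "trials", "years", "sim"), exists, envir = globalenv())
or simply list everything in the global environment with ls(globalenv())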
This website is a very good one to explain these environment issues.
Hadley Wickham...R Genius!
More Hadley Wickham Genius on Lexical Scoping & Functions
These two sites should get you past anything!
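A minimal sketch of the pattern described above, assuming a simplified stand-in for the simulation (run_sim, start and the 5% growth rate are hypothetical and not part of the original program). It shows that the matrix built inside the function is only visible at the console once the function returns it and the result is assigned there:

```r
# Hypothetical stand-in for the cash flow simulation: fills a
# trials x years matrix and returns it as the function's value.
run_sim <- function(trials, years, start = 100) {
  sim_mat <- matrix(NA_real_, nrow = trials, ncol = years)
  for (i in seq_len(trials)) {
    value <- start
    for (j in seq_len(years)) {
      sim_mat[i, j] <- value
      value <- value * 1.05  # assumed fixed growth, for illustration only
    }
  }
  sim_mat  # the last expression is the return value
}

# Assigning the result creates simMat in the calling (global) environment.
simMat <- run_sim(trials = 2, years = 3)

# The local name used inside the function does not exist out here.
local_visible <- exists("sim_mat")

print(simMat)
```

In the question's loop function, the bare simMat line is not the last expression, so the function returns whatever plotSimulation returns. Moving simMat to the end (or wrapping it in return()) and calling simMat <- loop() would follow the same pattern.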

Related

for Loop, replacement has length zero, Shiny

This is a question that has been answered in the context of R, so I should have a similar solution. The problem is, my code works in R but not in Shiny?
error source
for(i in 1:N)
{
rank_free_choice<- rank_free_choice_fn(signal_agent[i], M, gamma, omega, K,m)
website_choice<- website_choice_fn(rank_data_today,alpha,rank_free_choice)
t1<- ranking_algo_fn(rank_data_today, website_choice, kappa)
rank_data_today<- t1
df_website_choice[i,]<- website_choice
df_rank_data[i,]<- rank_data_today
}
Both matrices are initialized before the loop begins, and rank_data_today was also created before.
The function continues further, and multiple outputs are put together in a list before returning it outside the function.
Curiously I have another app that runs this code similarly, and that works fine!! In that one the initial rank data is passed to df_rank_data[i,] and the updated are passed to df_rank_data[i+1,]
Anybody with a solution? Or perhaps could explain this answer in my context?
I figured it out, and since the problem was so bizarre, I'm posting it here in case anyone else runs into a similar problem.
The reason the code wasn't working was that one of the inputs to the function was missing in Shiny!
So basically it was a plain and simple typo/carelessness, but the error message didn't really help.
The Shiny app is just a wrapper around a simulation I wrote in R that used functions, taking inputs from other functions. The error only showed up in the penultimate function [No real way to trace it]
It was working in R because I didn't have to separately input any values as I'd already saved the code.
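A small sketch of how a missing input can surface as this error, assuming the missing value arrives as NULL (choice_fn and alpha here are hypothetical stand-ins, not the functions from the question). Arithmetic on NULL yields a zero-length vector, and assigning that into a matrix row fails far away from the real cause:

```r
# Hypothetical stand-in for one of the simulation functions.
choice_fn <- function(alpha) {
  alpha * c(1, 2, 3)  # with alpha = NULL this is numeric(0)
}

df_choice <- matrix(NA_real_, nrow = 2, ncol = 3)

# Works when the input is present.
df_choice[1, ] <- choice_fn(0.5)

# Fails when the input is missing; capture the message instead of stopping.
msg <- tryCatch({
  df_choice[2, ] <- choice_fn(NULL)
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)
```

Checking inputs at the top of each function, for example with stopifnot(length(alpha) == 1), would report the problem where it starts rather than in a later assignment.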

Using For-Loop With Strings

I'm learning R and trying to use it for a statistical analysis at the same time.
Here, I am in the first part of the work: I am writing matrices and doing some simple things with them, in order to work later with these.
punti<-c(0,1,2,4)
t1<-matrix(c(-8,36,-8,-20,51,-17,-17,-17,57,-19,-19,-19,35,-8,-19,-8,0,0,0,0,-20,-20,-20,60,
-8,-8,-28,44,-8,-8,39,-23,-8,-19,35,-8,57,-8,-41,-8,-8,55,-8,-39,-8,-8,41,-25,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),ncol=4,byrow=T)
colnames(t1) <- c("20","1","28","19")
r1<-matrix(c(12,1,19,9,20,20,11,20,20,11,20,28,0,0,0,12,19,19,20,19,28,15,28,19,11,28,1,
33,20,28,31,1,19,17,28,19,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),ncol=3,byrow=T)
pt1<-rbind(sort(colSums(t1)),sort(punti))
colnames(r1)<-c("Valore","Vincitore","Perdente")
r1<-as.data.frame(r1)
But I have more matrices t_ and r_ so I would like to run a for-loop like:
for (i in 1:150)
{
pt[i]<-rbind(sort(colSums(t[i])),sort(punti))
colnames(r[i])<-c("Valore","Vincitore","Perdente")
r[i]<-as.data.frame(r[i])
}
This one just won't work because r_, t_ and pt_ are strings, but you get both the idea and that I would not like to copy-paste these three lines and manually edit the [i] 150 times. Is there a way to do it?
Personally I don't advise dynamically and automatically creating lots of variables in the global environment, and would advise you to think about how you can accomplish your goals without such an approach. With that said, if you feel you really need to dynamically create all these variables, you may benefit from the assign function, together with get to look up the existing matrices by name.
It could work like so:
for (i in 1:150)
{
  assign(paste0('pt',i),rbind(sort(colSums(get(paste0('t',i)))),sort(punti)))
}
The first argument of assign is the name of the variable to create (built here with paste0); the second argument is the value you wish to assign to it. Because t1, t2, ... are separate objects rather than elements of a vector t, each one is fetched by name with get.
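As a sketch of the alternative hinted at above, the matrices can be kept in a list and processed with lapply, so that no variable names have to be built from strings. The two small matrices and the names t_list, r_list and pt_list are made up for illustration; only punti and the column names come from the question:

```r
punti <- c(0, 1, 2, 4)

# Hypothetical small stand-ins for t1, t2, ... stored in one list.
t_list <- list(
  t1 = matrix(c(1, -1, 2, -2, 3, -3, 4, -4), ncol = 4, byrow = TRUE),
  t2 = matrix(c(5, 0, -5, 0, 1, 1, 1, 1), ncol = 4, byrow = TRUE)
)

# Hypothetical stand-ins for r1, r2, ...
r_list <- list(
  r1 = matrix(c(12, 1, 19, 9, 20, 20), ncol = 3, byrow = TRUE),
  r2 = matrix(c(11, 20, 28, 0, 0, 0), ncol = 3, byrow = TRUE)
)

# One call replaces the 150 copies of each line.
pt_list <- lapply(t_list, function(m) rbind(sort(colSums(m)), sort(punti)))

r_list <- lapply(r_list, function(m) {
  colnames(m) <- c("Valore", "Vincitore", "Perdente")
  as.data.frame(m)
})

print(pt_list$t1)
print(r_list$r2)
```

Individual results are then reached with pt_list[[i]] or pt_list$t1 instead of pt1, pt2, and so on.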

call columns from inside a for loop in R

I basically want to be able to call columns from inside a for loop (in reality two nested for loops), using paste() and the i (j, ...) value of the loop to access
my data frames column-wise in a flexible manner.
#for the showcase I use the standard cars example
r1 <- cars
r2 <- cars
# in case there are more data to consider, I would want to add or remove further data frames without changing the rest
# here I am entering the "dimension" of what I want to compare for the showcase its only one
num_r <- 2 #total number of reactors in the experiment
for( i in 1:num_r)
{
# should create a proxy variable to be processed further
assign(paste("proxi_r",i,sep="", colapse="") , do.call("matrix",
list(get(paste("r",i,"$speed",sep="", colapse="" )))))
# further operations of gluing and arranging data follow so they fit tests formatting requirements
}
which gives me:
Error in get(paste("r", i, "$speed", sep = "", colapse = "")) :
object 'r1$speed' not found
but when I type r1$speed it obviously exists??
So far I have searched for "R object dont exist inside loop", "using paste() to access variables inside loop", "for loops and objects", "do.call inside loops" ... and similar.
Is there any way around get(), so that I don't have to look into the topic of environments but can keep the flexibility of my loops? I don't want to re-edit my script every time I change the experimental configuration, which is really time consuming and lets a lot of errors sneak in.
The size of the data has crashed Excel several times, with its extensive use of Excel macros (which everyone in the lab here is using) :) , so there is no going back to the comfort zone.
I am now trying to dig into R programming with an R statistics book and a lot of googling and reading tutorials, so please forgive my naive approach and my lousy English.
I would be very thankful for any tips, as I feel sort of stuck right now.
This is a common confusion. You've created an object name "r1$speed" , i.e. a complete character string. This is not the same as the object r1 subsetted by $speed .
Try using get(paste('r',i,collapse='',sep=''))$speed
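A short sketch of both options, using the cars data from the question. The first loop applies the fix above: get() fetches the data frame by its name and $speed is applied to the result. The second part is a suggested alternative (the names reactors and proxies are hypothetical) that keeps the data frames in a list so neither get() nor assign() is needed:

```r
r1 <- cars
r2 <- cars
num_r <- 2  # total number of reactors in the experiment

# Option 1: look up the data frame by name, then take the column.
for (i in seq_len(num_r)) {
  speed <- get(paste0("r", i))$speed
  assign(paste0("proxi_r", i), matrix(speed))
}

# Option 2: keep the data frames in a list and index it directly.
reactors <- list(r1 = cars, r2 = cars)
proxies <- lapply(reactors, function(df) matrix(df[["speed"]]))

print(dim(proxi_r1))
print(dim(proxies$r2))
```

With the list, adding or removing a reactor only changes the list definition, and a column name held in a variable can be used via df[[column_name]].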

Scoping and functions in R 2.11.1 : What's going wrong?

This question comes from a range of other questions that all deal with essentially the same problem. For some strange reason, using a function within another function sometimes fails, in the sense that variables defined within the local environment of the first function are not found in the second function.
The classical pattern in pseudo-code :
ff <- function(x){
y <- some_value
some_function(y)
}
ff(x)
Error in eval(expr, envir, enclos) :
object 'y' not found
First I thought it had something to do with S4 methods and the scoping in there, but it also happens with other functions. I've had some interaction with the R development team, but all they did was direct me to the bug report site (which is not the most inviting one, I have to say). I never got any feedback.
As the problem keeps arising, I wonder if there is a logical explanation for it. Is it a common mistake made in all these cases, and if so, which one? Or is it really a bug?
Some of those questions :
Using functions and environments
R (statistical) scoping error using transformBy(), part of the doBy package.
How to use acast (reshape2) within a function in R?
Why can't I pass a dataset to a function?
Values not being copied to the next local environment
PS : I know the R-devel list, in case you wondered...
R has both lexical and dynamic scope. Lexical scope works automatically, but dynamic scope must be implemented manually, and requires careful book-keeping. Only functions used interactively for data analysis need dynamic scope, so most authors (like me!) don't learn how to do it correctly.
See also: the standard non-standard evaluation rules.
There are undoubtedly bugs in R, but a lot of the issues that people have been having are quite often errors in the implementation of some_function, not R itself. R has scoping rules (see http://cran.r-project.org/doc/manuals/R-intro.html#Scope) which, when combined with lazy evaluation of function arguments and the ability to eval arguments in other scopes, are extremely powerful but also often lead to subtle errors.
As Dirk mentioned in his answer, there isn't actually a problem with the code that you posted. In the links you posted in the question, there seems to be a common theme: some_function contains code that messes about with environments in some way. This messing is either explicit, using new.env and with, or implicit, using a data argument, in which case the function probably has a line like
y <- eval(substitute(y), data)
The moral of the story is twofold. Firstly, try to avoid explicitly manipulating environments, unless you are really sure that you know what you are doing. And secondly, if a function has a data argument then put all the variables that you need the function to use inside that data frame.
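To make the failure mode concrete, here is a minimal sketch with hypothetical functions (broken_fn, fixed_fn, ff_broken and ff_fixed are invented for illustration and are not taken from any of the linked questions). A function that evaluates an expression in a data frame finds the data frame's columns, but by default it falls back to its own frame rather than the caller's, so a variable local to the caller is not found:

```r
# Evaluates `expr` with the columns of `data` in scope. Anything not found
# in `data` is looked up starting from this function's own frame, so the
# caller's local variables are invisible.
broken_fn <- function(expr, data) {
  eval(substitute(expr), data)
}

# Same idea, but falls back to the caller's frame explicitly.
fixed_fn <- function(expr, data) {
  eval(substitute(expr), data, parent.frame())
}

ff_broken <- function() {
  y_local <- 2
  broken_fn(y_local * x, data.frame(x = 1:3))
}

ff_fixed <- function() {
  y_local <- 2
  fixed_fn(y_local * x, data.frame(x = 1:3))
}

# Capture the error message instead of stopping the script.
broken_msg <- tryCatch(ff_broken(), error = function(e) conditionMessage(e))
fixed_result <- ff_fixed()

print(broken_msg)
print(fixed_result)
```

From the user's side, the second piece of advice above corresponds to putting y_local into the data frame itself, which works with either version.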
Well there is no problem in what you posted:
/tmp$ cat joris.r
#!/usr/bin/r -t
some_function <- function(y) y^2
ff <- function(x){
y <- 4
some_function(y) # so we expect 16
}
print(ff(3)) # 3 is ignored
$ ./joris.r
[1] 16
/tmp$
Could you restate and post an actual bug or misfeature?

What is the best way to avoid passing a data frame around?

I have 12 data.frames to work with. They are similar and I have to do the same processing to each one, so I wrote a function that takes a data.frame, processes it, and then returns a data.frame. This works. But I am afraid that I am passing around a very big structure. I may be making temporary copies (am I?) This can't be efficient. What is the best way to avoid passing a data.frame around?
doSomething <- function(df) {
  # do something with the data frame, df
return(df)
}
You are, indeed, passing the object around and using some memory. But I don't think you can do an operation on an object in R without passing the object around. Even if you didn't create a function and did your operations outside of the function, R would behave basically the same.
The best way to see this is to set up an example. If you are in Windows open Windows Task Manager. If you are in Linux open a terminal window and run the top command. I'm going to assume Windows in this example. In R run the following:
col1<-rnorm(1000000,0,1)
col2<-rnorm(1000000,1,2)
myframe<-data.frame(col1,col2)
rm(col1)
rm(col2)
gc()
This creates a couple of vectors called col1 and col2, then combines them into a data frame called myframe. It then drops the vectors and forces garbage collection to run. Watch the memory usage of the Rgui.exe task in Windows Task Manager. When I start R it uses about 19 meg of memory. After I run the above commands my machine is using just under 35 meg for R.
Now try this:
myframe<-myframe+1
Your memory usage for R should jump to over 144 meg. If you force garbage collection using gc() you will see it drop back to around 35 meg. To try this using a function, you can do the following:
doSomething <- function(df) {
df<-df+1-1
return(df)
}
myframe<-doSomething(myframe)
When you run the code above, memory usage will jump up to 160 meg or so. Running gc() will drop it back to 35 meg.
So what to make of all this? Well, doing an operation outside of a function is not that much more efficient (in terms of memory) than doing it in a function. Garbage collection cleans things up nicely. Should you force gc() to run? Probably not, as it will run automatically as needed; I just ran it above to show how it impacts memory usage.
I hope that helps!
I'm no R expert, but most languages use a reference counting scheme for big objects. A copy of the object data will not be made until you modify the copy of the object. If your functions only read the data (i.e. for analysis) then no copy should be made.
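In R this behaviour is usually described as copy-on-modify. A minimal sketch, with hypothetical function names (read_only and modify): passing a data frame to a function that only reads it leaves the original untouched and needs no copy, while a function that changes it works on its own copy and the caller's object stays as it was. The tracemem part is optional and only runs if this R build supports memory profiling:

```r
df <- data.frame(x = 1:5)

# Only reads the data frame.
read_only <- function(d) sum(d$x)

# Modifies its argument; the change is local to the function.
modify <- function(d) {
  d$x <- d$x * 2L
  d
}

total <- read_only(df)
doubled <- modify(df)

print(total)
print(df$x)       # the caller's data frame is unchanged
print(doubled$x)

# Optional: tracemem reports when R actually duplicates the object.
if (capabilities("profmem")) {
  tracemem(df)
  invisible(read_only(df))  # reading should not trigger a copy message
  invisible(modify(df))     # modifying should
  untracemem(df)
}
```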
I came across this question looking for something else, and it's old - so I'll just provide a brief answer for now (leave a comment if you'd like more explanation).
You can pass around environments in R which contain anywhere from 1 to all of your variables. But probably you don't need to worry about it.
[You might also be able to do something similar with classes. I only currently understand how to use classes for polymorphic functions - and note there's more than 1 class system kicking around.]
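A brief sketch of the environment approach mentioned above, with hypothetical names (store and add_one). Environments are passed by reference, so a function can change the data frame held inside one without returning it. Note that the modification itself may still copy the data frame internally, so this changes how the data is handed around rather than guaranteeing lower memory use:

```r
# An environment holding the data frame.
store <- new.env()
store$df <- data.frame(x = 1:3)

# Modifies the data frame inside the environment in place,
# from the caller's point of view; nothing needs to be returned.
add_one <- function(env) {
  env$df$x <- env$df$x + 1L
  invisible(NULL)
}

add_one(store)
print(store$df$x)
```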

Resources