Using value of a function & nested function in R

I wrote a function in R called "filtre": it takes a data frame and, for each row, decides whether that row should go into bin 1 or bin 2. At the end we have two data frames that together add back up to the original input, corresponding respectively to the rows placed in bin 1 and bin 2. These two sets are referred to as filtre1 and filtre2. For convenience, filtre1 and filtre2 are computed but not returned, because they are an intermediate step in a bigger process (and they are quite big data frames). I have the following issue:
When I later want to use filtre1 (or filtre2), they simply don't show up, as if their values were stuck inside the function and not recognised anywhere else. That would force me to copy the whole function every time I want to use them, which is painful and heavy.
I suspect this is a rather simple thing, but I searched the web and did not really find the answer (I was not sure of the best keywords). Sorry for any inconvenience.
Thxs / g.

It's pretty hard to know the optimal way of achieving what you want, as you do not provide a proper example, but I'll give it a try. If your variables filtre1 and filtre2 are defined inside your function and you do not return them, of course they do not show up in your environment. But you could just return the classification and build filtre1 and filtre2 afterwards:
# example data
df <- data.frame(id = 1:20, x = sample(1:20, 20, replace = TRUE))
filtre <- function(df) {
  # example classification; this could of course be done directly with
  # bins <- as.numeric(df$x < 10)
  bins <- numeric(nrow(df))
  for (i in 1:nrow(df)) {
    if (df$x[i] < 10) {
      bins[i] <- 1
    }
  }
  return(bins)
}
bins <- filtre(df)
filtre1 <- df[bins == 1, ]
filtre2 <- df[bins == 0, ]
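If you would rather have the function hand back both pieces directly, another option is to return them together in a list. A minimal sketch, using the same example threshold (adapt the condition to your real classification rule):
filtre <- function(df) {
  keep <- df$x < 10                      # rows going to bin 1
  list(filtre1 = df[keep, ], filtre2 = df[!keep, ])
}
res <- filtre(df)
res$filtre1   # bin 1
res$filtre2   # bin 2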

Related

For loop setup with multiple parameters in R

I'm trying to figure out how to set up a for loop in R when I want it to run over two or more parameters at once. Below I have posted sample code where I am able to get the code to run and fill a matrix with two values. In the 2nd line of the for loop I have
R<-ARMA.var(length(x_global_sample),ar=c(tt[i],-.7))
What I would like to do is replace the -.7 with another tt[i] (example below), so that my for loop would run through the values starting at (-1,-1), then (-1,-.99), (-1,-.98), ..., (1,.98), (1,.99), (1,1), with the result matrix then being populated by the output of Q and sigma.
R<-ARMA.var(length(x_global_sample),ar=c(tt[i],tt[i]))
or something similar to
R<-ARMA.var(length(x_global_sample),ar=c(tt[i],ss[i]))
It may well be that this would be better handled by two for loops; however, I'm not 100% sure how to set that up so that the first parameter stays fixed while the code runs through the sequence of the second parameter, and once that finishes, the first parameter increases by one step and stays fixed there while the second parameter runs through again.
I've posted some sample code down below; the ARMA.var function comes from the ts.extend package. Any insight into this would be great.
Thank you
tt <- seq(-1, 1, 0.01)
Result <- matrix(NA, nrow = length(tt) * length(tt), ncol = 2)
for (i in seq_along(tt)) {
  R <- ARMA.var(length(x_global_sample), ar = c(tt[i], -.7))
  Q <- t((y - X %*% beta_est_d) ) %*% solve(R) %*% (y - X %*% beta_est_d) +
    lam * t(beta_est_d) %*% D %*% beta_est_d
  RSS <- sum((y - X %*% solve(t(X) %*% solve(R) %*% X + lam * D) %*% t(X) %*% solve(R) %*% y)^2)
  Denom <- n - sum(diag(X %*% solve(t(X) %*% solve(R) %*% X + lam * D) %*% t(X) %*% solve(R)))
  sigma <- RSS / Denom
  Result[i, 1] <- Q
  Result[i, 2] <- sigma
  rm(Q)
  rm(R)
  rm(sigma)
}
Edit: I realize that what I have posted above is quite unclear, so to simplify things, consider the following code:
x <- seq(1, 20, 1)
y <- seq(1, 20, 2)
Result <- matrix(NA, nrow = length(x) * length(y), ncol = 2)
for (i in seq_along(x)) {
  z1 <- x[i] + y[i]
  z2 <- z1 + y[i]
  Result[i, 1] <- z1
  Result[i, 2] <- z2
}
So the results table would be filled in row by row as follows:
Row1: 1+1=2, 2+1=3
Row2: 1+3=4, 4+3=7
Row3: 1+5=6, 6+5=11
Row4: 1+7=8, 8+7=15
And this pattern would continue with x staying fixed until the last value of y is reached; then x would move on to 2 and cycle through the calculations with y again, all the way to the point where my last row is
RowN: 20+19=39, 39+19=58.
So I just want to know whether there is a way to do it in one loop, or whether it is easier to run it as two loops.
I hope this makes my question clearer. I realize this is not the optimal way to do this; for now it is just for testing purposes, to see how long my initial process takes so that it can be streamlined down the road.
Thank you
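For reference, a minimal sketch (not from the original post) of the two-loop version of the simplified example, with a separate row counter so that every (x, y) pair gets its own row of Result:
x <- seq(1, 20, 1)
y <- seq(1, 20, 2)
Result <- matrix(NA, nrow = length(x) * length(y), ncol = 2)
row <- 1
for (i in seq_along(x)) {      # outer loop: x stays fixed ...
  for (j in seq_along(y)) {    # ... while the inner loop cycles through y
    z1 <- x[i] + y[j]
    z2 <- z1 + y[j]
    Result[row, 1] <- z1
    Result[row, 2] <- z2
    row <- row + 1
  }
}
The same pairing can be generated for a single loop with expand.grid(y = y, x = x) (the first argument varies fastest) and then iterating over the rows of that grid; which of the two reads better is mostly a matter of taste.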

summary of row of numbers in R

I just hope to learn how to make a simple statistical summary of the random numbers from rows 1 to 5 in R (as shown in the picture).
And then assign these rows to a single variable.
Hope you can help!
When you type something like 3 on a single line and ask R to "run" it, it doesn't store that anywhere -- it just evaluates it, meaning that it tries to make sense out of whatever you've typed (such as 3, or 2+1, or sqrt(9), all of which would return the same value) and then it more or less evaporates. You can think of your lines 1 through 5 as behaving like you've used a handheld scientific calculator; once you type something like 300 / 100 into such a calculator, it just shows you a 3, and then after you have executed another computation, that 3 is more or less permanently gone.
To do something with your data, you need to do one of two things: either store it in your environment somehow, or "pipe" your data directly into a useful function.
In your question, you used this script:
1
3
2
7
6
summary()
I don't think it's possible to repair this strategy in the way that you're hoping -- and if it is possible, it's not quite the "right" approach. By typing the numbers on individual lines, you've structured them so that they'll evaluate individually and then evaporate. In order to run the summary() function on those numbers, you will need to bind them together inside a single vector somehow, then feed that vector into summary(). The "store it" approach would be
my_vector <- c(1, 3, 7, 2, 6)
summary(my_vector)
The important part isn't actually the parentheses; it's the function c(), which stands for concatenate, and instructs R to treat those 5 numbers as a collective object (i.e. a vector). We then pass that single object into summary().
To use the "piping" approach and avoid having to store something in the environment, you can do this instead (requires R 4.1.0+):
c(1, 3, 7, 2, 6) |> summary()
Note again that the use of c() is required, because we need to bind the five numbers together first. If you have an older version of R, you can get a slightly different pipe operator from the magrittr library instead that will work the same way. The point is that this "binding" part of the process is an essential part that can't be skipped.
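For example, this small sketch does the same thing with magrittr's pipe (assuming the package is installed):
library(magrittr)
c(1, 3, 7, 2, 6) %>% summary()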
Now, the crux of your question: presumably, your data doesn't really look like the example you used. Most likely, it's in some separate .csv file or something like that; if not, hopefully it is easy to get it into that format. Assuming this is true, this means that R will actually be able to do the heavy lifting for you in terms of formatting your data.
As a very simple example, let's say I have a plain text file, my_example.txt, whose contents are
1
3
7
2
6
In this case, I can ask R to parse this file for me. Assuming you're using RStudio, the simplest way to do this is to use the File -> Import Dataset part of the GUI. There are various options dealing with things such as headers, separators, and so forth, but I can't say much meaningful about what you'd need to do there without seeing your actual dataset.
When I import that file, I notice that it does two things in my R console:
my_example <- read.table(...)
View(my_example)
The first line stores an object (called a "data frame" in this case) in my environment; the second shows a nice view of how it's rendered. To get the summary I wanted, I just need to extract the vector of numbers I want (which I can see from the view is called V1), and I can do that with summary(my_example$V1).
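For reference, a minimal version of what that import generates, assuming the file sits in the working directory and has no header row (the import wizard fills in the full path and options for you):
my_example <- read.table("my_example.txt", header = FALSE)
summary(my_example$V1)   # V1 is the default name of the first column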
This example is probably not helpful for your actual data set, because there are so many variations on the theme here, but the theme itself is important: point R at a file, ask it to render an object, then work with that object. That's the approach I'd recommend instead of typing data as lines within an R script, as it's much faster and less error-prone.
Hopefully this will get you pointed in the right direction in terms of getting your data into R and working with it.

index by name or by position in list / vector, which is faster?

I am currently trying to optimise the speed of a physical model computation. What is specific about this model is that it uses hundreds of input parameters, all stored in a big named vector:
initialize = c("temperature"=100, "airpressure"=150, "friction"=0.46)
The model, while iterating hundreds of times, needs to access the parameters, possibly updates them, etc.:
compute(initialize['temperature'], initialize['airpressure'])
initialize['friction'] <- updateP(initialize['friction'])
This is the logic. However, I wonder whether it is really efficient to work like this. What happens behind an indexation by name? Is it fast? Some ideas for changing this logic:
define each parameter as an independent variable in the environment?
(but then how would I pass a large number of them as arguments to a function?)
have a list of parameters instead of a named vector?
access each parameter by its index in the vector, like this:
compute(initialize[1], initialize[2])
If I go with this last solution, of course I will lose the readability of the code (which parameter is actually initialize[1]?). So a way to go could be to define their positions first:
temperature.pos <- 1
airpressure.pos <- 2
compute(initialize[temperature.pos], initialize[airpressure.pos])
Of course, why didn't I just try this and test the speed? Well, it would take me hours to convert every place in the script where a parameter is accessed, which is why I'm asking before doing it.
And maybe there is an even more clever solution?
Thanks
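This isn't part of the original question, but one way to get a quick answer without rewriting anything is to time the different access styles on a small example with the microbenchmark package; the absolute numbers will depend on your machine:
library(microbenchmark)
initialize <- c("temperature" = 100, "airpressure" = 150, "friction" = 0.46)
params <- as.list(initialize)   # the same parameters as a named list
microbenchmark(
  vector_by_name     = initialize["temperature"],
  vector_by_position = initialize[1],
  list_by_name       = params[["temperature"]],
  times = 10000
)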

Accessing index inside *apply

I have two containers, conty and contx. The values of both are tied to each other: conty[1] relates to contx[1], and so on. While using apply on contx, I want to access the index inside the apply structure, so that I can put values from the corresponding element of conty into contz depending on the index of x.
lapply(contx, function(x) {
  if (x == 1) append(contz, conty[xindex])
})
I could easily do this in a for loop, but everybody insists that using apply is better. I tried to look for examples, but the only thing I could find was mostly stuff for generating maps, where it wasn't entirely clear how I could adapt it to my problem.
There are a few issues here.
"everybody insists that using the apply is better". Sorry, but they're wrong; it's not necessarily better. See the old-school Burns Inferno ("If you are using R and you think you’re in hell, this is a map for you"), chapter 4 ("Overvectorization"):
A common reflex is to use a function in the apply family. This is not vectorization, it is loop-hiding. The apply function has a for loop in its definition. The lapply function buries the loop, but execution times tend to be roughly equal to an explicit for loop ... Base your decision of using an apply function on Uwe’s Maxim (page 20). The issue is of human time rather than silicon chip time. Human time can be wasted by taking longer to write the code, and (often much more importantly) by taking more time to understand subsequently what it does.
However, what you are doing that's bad is growing an object (also covered in the Inferno). Assuming that in your example contz started as an empty list, this should work (is my example reflective of your use case?)
x <- c(1,2,3,1)
conty <- list("a","b","c","d")
contz <- conty[which(x==1)]
Alternatively, if you want to use both the value and the index in your function, you can write a two-variable function f(val, index) and then use Map(f, my_list, seq_along(my_list)).
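A small sketch of that Map() approach, reusing the toy data from above and keeping only the selected elements (the selection rule is assumed from the question):
x <- c(1, 2, 3, 1)
conty <- list("a", "b", "c", "d")
f <- function(val, index) if (val == 1) conty[[index]] else NULL
contz <- Filter(Negate(is.null), Map(f, x, seq_along(x)))
# contz is now list("a", "d")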

Ordered Map / Hash Table in R

While working with lists I've noticed an issue that I didn't expect.
result5 <- vector("list",length(queryResults[[1]]))
for(i in 1:length(queryResults[[1]])){
id <- queryResults[[1]][i]
result5[[id]] <-getPrices(id)
}
The problem is that after this code runs, instead of the result staying the same size (whatever length(queryResults[[1]]) is), it grows up to the largest id used as an index, creating a bunch of NULL entries in the middle.
result5 currently stores a number of int/double lists, so it looks like:
result5[[index(int)]][[row]][col]
While on its own it's not too problematic, I would rather avoid it, simply for easier size calculations later on.
For clarification, id is an integer. And in this case the for loop offers the same performance as, but greater convenience than, the apply functions.
After some testing, it seems the easiest way of doing it is to convert the result using the hash package:
result6 <- hash(queryResults[[1]],lapply(queryResults[[1]],getPrices))
And when it needs to be accessed, calling
result6[[toString(id)]]
The difference in performance is marginal, although it's still fairly annoying having to include toString in your code.
It's not clear exactly what your question is, but judging by the structure of the loop, you probably want
result5[[i]] <- getPrices(id)
rather than result5[[id]] <- getPrices(id).
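If you do want to look results up by id later (rather than by loop position), another option is a plain named list built in one go. A sketch, assuming queryResults[[1]] holds the ids and getPrices() is as in the question; list names are character, so the conversion happens once up front instead of at every lookup:
ids <- queryResults[[1]]
result5 <- setNames(lapply(ids, getPrices), ids)
result5[[as.character(ids[3])]]   # look up by id; no NULL gaps in the middle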
