Need help in understanding this R script [closed] - r

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I am very new to R, so I would appreciate it if anybody could help me understand the following script so I can rewrite it in MATLAB.
I am only interested in the "R" method section of this script.
The following function is from the copula package in R and computes the empirical copula:
> F.n
function (x, X, offset = 0, method = c("C", "R"))
{
stopifnot(is.numeric(d <- ncol(X)), is.matrix(x), d == ncol(x))
n <- nrow(X)
if (d == 1)
vapply(x, function(x.) sum(X <= x.), NA_real_)/(n + offset)
else {
method <- match.arg(method)
switch(method, C = {
m <- nrow(x)
.C(Cn_C, as.double(X), as.integer(n), as.integer(d),
as.double(x), as.integer(m), ec = double(m),
as.double(offset))$ec
}, R = {
tX <- t(X)
vapply(1:nrow(x), function(k) sum(colSums(tX <= x[k,
]) == d), NA_real_)/(n + offset)
}, stop("wrong 'method': ", method))
}
}
<environment: namespace:copula>

Walk-Through
function (x, X, offset = 0, method = c("C", "R"))
{
Function definition that takes two mandatory and two optional named arguments.
stopifnot(is.numeric(d <- ncol(X)), is.matrix(x), d == ncol(x))
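This makes an assertion (stopping with an error if it fails) that the return value of ncol(X) is numeric (it is NULL when X is a plain vector, so this effectively checks that X is two-dimensional), that x is a matrix (a 2-d array), and that x and X have the same number of columns. Note that d <- ncol(X) inside the call also assigns d, which is used throughout the rest of the function.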
n <- nrow(X)
Stores the number of rows for future use. (nrow is not an "expensive" function, but it does no harm to do it this way.)
if (d == 1)
vapply(x, function(x.) sum(X <= x.), NA_real_)/(n + offset)
Branch on the condition that X has only one column.
The vapply function iterates over each element of x, passes that number to the inner function, and collects each result into a new vector. The function(x.) ... is an anonymous function, and in this case it is called once for each element of x; it returns sum(X <= x.), i.e. how many elements of X are less than or equal to the current element. The NA_real_ tells vapply what each result must look like (a single double); sapply would have worked just as clearly, but vapply checks the type and length of every result (and is often slightly faster).
The resulting vector is divided element-wise by n + offset and returned from the overall function.
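A small sketch of this one-column branch, using made-up data X and x (not from the question). It also shows that, for offset = 0, the result is simply the empirical CDF that R's ecdf() computes:

```r
X <- matrix(c(0.2, 0.5, 0.9, 0.4), ncol = 1)   # n = 4 observations, d = 1 (made-up data)
x <- matrix(c(0.3, 0.5, 1.0), ncol = 1)        # 3 evaluation points
n <- nrow(X)
offset <- 0

# one count per element of x; NA_real_ declares that each result is a single double
res <- vapply(x, function(x.) sum(X <= x.), NA_real_) / (n + offset)
res                    # 0.25 0.75 1.00

# with offset = 0 this matches the empirical CDF of X evaluated at x
ecdf(X[, 1])(x[, 1])   # 0.25 0.75 1.00
```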
else {
method <- match.arg(method)
This ensures that method is either "C" or "R", or the function stops with an error. If the caller leaves method at its default c("C", "R"), match.arg() returns the first choice, "C".
The following switch is similar to a select or case statement: it runs the first block of code if method is "C", the second if it is "R", and calls stop() otherwise.
switch(method, C = {
m <- nrow(x)
.C(Cn_C, as.double(X), as.integer(n), as.integer(d),
as.double(x), as.integer(m), ec = double(m),
as.double(offset))$ec
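Calls the compiled C routine Cn_C (part of the copula package) through .C() with the provided arguments, and extracts the ec component from the result (.C() returns its arguments as a list, and ec is the output buffer of length m that the C code fills in).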
}, R = {
tX <- t(X)
Takes the transpose of X.
vapply(1:nrow(x), function(k) sum(colSums(tX <= x[k,
]) == d), NA_real_)/(n + offset)
(See my discussion above for vapply. This time it is iterating over a sequence of incrementing numbers from 1 to the number of rows in x.)
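For the anonymous (inner) function, going inside-out, it starts by comparing the transposed X with the kth row of x. Because tX has d rows (one column per observation), the length-d vector x[k,] is recycled down each column, so column i of the resulting logical matrix compares X[i,] with x[k,] component by component. colSums then counts the TRUEs in each column, and a column sum equals d (the number of columns of X, stored earlier) exactly when every component of X[i,] is less than or equal to the corresponding component of x[k,]. The outer sum therefore counts the observations X[i,] that are component-wise less than or equal to x[k,].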
This vector is divided element-wise by (n + offset), resulting in another vector.
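To make the recycling concrete, here is a sketch on small made-up matrices (X with n = 4 rows and d = 2 columns, x with m = 2 rows). It runs the R branch as written and an equivalent version with explicit loops, which may be easier to translate line by line into MATLAB:

```r
X <- matrix(c(0.1, 0.6, 0.3, 0.8,     # column 1
              0.2, 0.7, 0.9, 0.4),    # column 2
            ncol = 2)                 # n = 4 observations, d = 2 (made-up data)
x <- matrix(c(0.5, 0.9,
              0.9, 0.8), ncol = 2, byrow = TRUE)  # m = 2 evaluation points
n <- nrow(X); d <- ncol(X); offset <- 0

# R branch of F.n, as in the function above
tX <- t(X)   # d x n: column i holds observation X[i, ]
Cn_R <- vapply(1:nrow(x), function(k) sum(colSums(tX <= x[k, ]) == d), NA_real_) / (n + offset)

# The same count with explicit loops
Cn_loop <- numeric(nrow(x))
for (k in seq_len(nrow(x))) {
  count <- 0
  for (i in seq_len(n)) {
    # observation i counts only if every component is <= the evaluation point
    if (all(X[i, ] <= x[k, ])) count <- count + 1
  }
  Cn_loop[k] <- count / (n + offset)
}
Cn_R      # 0.50 0.75
Cn_loop   # 0.50 0.75
```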
}, stop("wrong 'method': ", method))
Throws an error if anything else was supplied as method. Since match.arg() has already rejected other values, this branch is only a defensive fallback.
}
}
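In formula form, each of the m values returned by the R branch is the following (writing x_k for the kth row of x and X_i for the ith row of X; the symbol C_n is only notation, not a name from the package):

```latex
C_n(\mathbf{x}_k) = \frac{1}{n + \mathrm{offset}} \sum_{i=1}^{n}
  \mathbf{1}\{ X_{i1} \le x_{k1}, \ldots, X_{id} \le x_{kd} \},
  \qquad k = 1, \ldots, m
```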
Wrap-Up
You should study help(vapply) further if you are not comfortable with map-like functions over vectors. You might also benefit from reading help(match.arg) and help(switch). I think the other major functions (nrow, ncol, t) are clear enough.
You probably don't need to worry about the .C(...) call to compiled code, since this function is kind enough to provide an R-native implementation as well. (Well, that's what I'm inferring, at least.)

Related

Saving recursive function results to a global data frame in R

I'm trying to recreate the functionality of the memoise package in base R by saving the outputs of a recursive function in a data frame. I have a function P, and I wrote a wrapper metaP that runs P(n) and saves the result if metaP(n) hasn't been run before, and otherwise returns the previously saved output. My issue is that it only works at the first level. If I run metaP(5), it saves the output of metaP(5), but to get P(5) it also had to calculate P(4), and the result of P(4) isn't saved. I assume it's getting lost in the recursive environments, but when I tried using the assign function with the global environment, it still didn't work.
In the example below, I run metaP for 5 through 10, and df has 5 through 10 saved, but it doesn't have 1 through 4 saved, even though some of those must have been calculated to get the answers for 5 through 10.
df <- data.frame(n = 0, pn = 1)
metaP <- function(n) {
if (!n %in% df$n) df <<- rbind(df, data.frame(n = n, pn = P(n)))
df[df$n == n, "pn"]
}
P <- function(n) {
if (n < 0) return(0)
k <- rep(1:((sqrt(24 * n + 1) + 1) / 6), each = 2) * c(1, -1)
return(sum((-1) ^ (k + 1) * sapply(n - k * (3 * k - 1) / 2, metaP)) %% 1e6)
}
sapply(5:10, metaP)
df
The issue here is kind of subtle. The expression
df <<- rbind(df, data.frame(n = n, pn = P(n)))
is ambiguous, because the ?rbind documentation doesn't define the order in which the two arguments to rbind() are evaluated. It appears that R evaluates df first, then makes the recursive call, and then appends that result to the already-evaluated (old) value of df. Any changes made to the global variable during the recursive call are therefore lost.
To fix this, rewrite the conditional part as
if (!n %in% df$n) {
newval <- data.frame(n = n, pn = P(n))
df <<- rbind(df, newval)
}
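Putting the fix together, a sketch of the full corrected code (P is unchanged apart from dropping the explicit return(); the final line is only an illustrative check):

```r
df <- data.frame(n = 0, pn = 1)

metaP <- function(n) {
  if (!(n %in% df$n)) {
    newval <- data.frame(n = n, pn = P(n))  # the recursion (and its <<- writes) finishes first
    df <<- rbind(df, newval)                # so df is read only after the inner results are saved
  }
  df[df$n == n, "pn"]
}

P <- function(n) {
  if (n < 0) return(0)
  k <- rep(1:((sqrt(24 * n + 1) + 1) / 6), each = 2) * c(1, -1)
  sum((-1) ^ (k + 1) * sapply(n - k * (3 * k - 1) / 2, metaP)) %% 1e6
}

invisible(sapply(5:10, metaP))
sort(df$n)   # now includes 1 through 4 (and a few negative n, for which P returns 0)
```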
(I'd also suggest adding parentheses to the test and writing it as if (!(n %in% df$n)), because it's not immediately obvious that the two forms are the same. I was confused about this in an earlier answer to this question, but ?Syntax shows that %in% has higher precedence than !.)
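A quick way to see the precedence rule (the values are chosen only for illustration):

```r
!5 %in% 1:3     # parsed as !(5 %in% 1:3), i.e. TRUE
(!5) %in% 1:3   # FALSE %in% 1:3 (FALSE is matched as 0), i.e. FALSE
```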

R - Replacing a variable with its value in every iteration of a loop

Somewhat new to R (coming from SQL), trying to write a quick loop to generate a series of functions that perform a task a specified number of times (i.e. for function 2, do something 2 times, function 3 -> 3 times, etc.).
The issue arises because I'm using the loop variable ('k' in the code below) inside the generated function. As a result, when I evaluate ANY of the generated functions, it uses the last value of k (here, 4), no matter whether I call function 2, 3, or 4.
My question is therefore: how can I substitute the value of 'k' for the variable 'k' when I generate the functions? For example, on the first iteration, when k = 2, I want to substitute 2 for every occurrence of 'k' in the generated function, so that when function 2 is run later it sees the value 2 rather than the last value of 'k'.
Code below (note that the 'x' value the function will evaluate is a 1 column, variable row matrix):
x <- as.matrix(11:20)
for (k in 2:4) {
actvfun <- NULL
actvfun <- function(x) {
actv <- NULL
actvmtx <- NULL
actvmtx <- as.matrix(x)
for (j in 2:(k+1)) {
actv <- rep(NA, length(x))
for (i in j:length(x)) {
actv[i] <- x[i - (j - 1)]
}
actvmtx <- as.matrix(cbind(actvmtx, actv))
}
assign(paste("lag0", k, "av", sep=""), actvmtx)
return(apply(get(paste("lag0", k, "av", sep="")), 1, mean, na.rm=T))
assign(paste("LAGTEST0", k, "AV", sep=""), apply(actvmtx, 1, mean, na.rm=T))
}
assign(paste("v5LAGTEST0", k, ".av", sep=""), actvfun)
}
v5LAGTEST02.av(x)
v5LAGTEST03.av(x)
The last two items are the checks I was running - currently both return the result using k = 4, rather than their respective values of 2 and 3.
Any help is greatly appreciated - I know loops are somewhat frowned upon in R (as is 'assign', but I'm not sure how else to achieve the desired result of variable function names), so I'm certainly open to new suggestions!
Thanks,
Nate
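One common way to get this behavior (a sketch, not from the original thread) is a function factory: each call to the factory creates a new environment in which k is fixed, and force(k) evaluates it immediately so later changes to the loop variable don't leak in. The names make_lag_avg and lag_avg are hypothetical; the generated functions are stored in a named list instead of being created with assign(), and the lag logic is a compact rewrite of the loop in the question, assuming x is a single column.

```r
make_lag_avg <- function(k) {
  force(k)  # evaluate k now, not when the returned function is first called
  function(x) {
    x <- as.vector(x)
    n <- length(x)
    lagged <- matrix(NA_real_, nrow = n, ncol = k + 1)
    for (j in 0:k) {
      # column j + 1 holds x lagged by j positions (NA where no earlier value exists)
      if (j < n) lagged[(j + 1):n, j + 1] <- x[1:(n - j)]
    }
    rowMeans(lagged, na.rm = TRUE)
  }
}

lag_avg <- list()
for (k in 2:4) {
  lag_avg[[paste0("v5LAGTEST0", k, ".av")]] <- make_lag_avg(k)
}

x <- as.matrix(11:20)
lag_avg[["v5LAGTEST02.av"]](x)   # uses k = 2
lag_avg[["v5LAGTEST03.av"]](x)   # uses k = 3
```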

R - apply to replace for loop with functions [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
I have seen a dozen sites to learn apply, sapply, lapply, but they don't get much farther than teaching you how to take the sum or mean of a row or column. I have to use a lot of for loops, often nested. Please show me how to replace for with apply in the following.
for (i in 1:NTR) {
X = as.vector(as.matrix(XTR[i,]))
YZ = FN(W, V, U, X) # return Y and Z
Y = YZ$Y
Z = YZ$Z
for (j in 1:NOUT) {
F[i+NOUT*(j-1)] = Y[j]-YTR[i,j]
} # j
} # i
and
for (k in 1:NOUT) {
for (iwt in 1:NWT) {
m = NHID*(NNLIN+1)
if (iwt <= m) {
i = (iwt-1) %/% (NNLIN+1) + 1
j = (iwt-1) %% (NNLIN+1)
EVZ = V[k,i+1]*Z[i]*(1-Z[i])
if (j>0) EVZ = EVZ*X[j]
J[k+(n-1)*NOUT,iwt] = EVZ
}
} # iwt
} # k
Thanks very much for any replies.
The easiest way to transform a loop into an lapply is to take what's inside your for loop and plug it inside a function. Let's start with your inner loop.
foo_inner <- function(j) {
F[i+NOUT*(j-1)] = Y[j]-YTR[i,j]
}
There is one problem here though: your function should return a value instead of assigning it. In this case, you want to return Y[j]-YTR[i,j].
foo_inner <- function(j) {
return(Y[j]-YTR[i,j])
}
You can then simply apply this function to 1:NOUT. In this case, it seems (though I can't be sure given the lack of info in your post) that this return value is just a number so that you can create a vector directly instead of a list. In this case, it is better to use sapply rather than lapply (the 's' stands for simplify (to a vector) while the 'l' means list):
sapply(1:NOUT, foo_inner)
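To see the 'l' versus 's' difference concretely, here is a sketch with hypothetical toy values for NOUT, Y, YTR and i (the real ones come from your code). For a case this simple, plain vectorized subtraction gives the same result as the sapply:

```r
# Hypothetical toy data: NOUT = 3 outputs, 2 training rows
NOUT <- 3
Y   <- c(0.5, 1.5, 2.5)
YTR <- matrix(c(1, 2, 3,
                4, 5, 6), nrow = 2, byrow = TRUE)
i <- 1

foo_inner <- function(j) Y[j] - YTR[i, j]

lapply(1:NOUT, foo_inner)   # a list of three length-1 numeric vectors
sapply(1:NOUT, foo_inner)   # simplified to the numeric vector -0.5 -0.5 -0.5
Y - YTR[i, ]                # same values without any apply
```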
This new vector must now be assigned in your outer loop. From your code, it appears you want to assign it to F[i+NOUT*(1:NOUT-1)]: I simply replaced your j by 1:NOUT, the values it iterates over. (Because : binds more tightly than -, 1:NOUT-1 means (1:NOUT)-1, i.e. j-1 for each j.) Your code snippet can therefore be modified as follows:
for (i in 1:NTR) {
X = as.vector(as.matrix(XTR[i,]))
YZ = FN(W, V, U, X) # return Y and Z
Y = YZ$Y
Z = YZ$Z
F[i+NOUT*(1:NOUT-1)] <- sapply(1:NOUT, foo_inner)
} # i
Let's tackle the outer loop now. As before, we can plug what's inside in a function:
foo_outer <- function(i) {
X = as.vector(as.matrix(XTR[i,]))
YZ = FN(W, V, U, X) # return Y and Z
Y = YZ$Y
Z = YZ$Z
F[i+NOUT*(1:NOUT-1)] <- sapply(1:NOUT, foo_inner)
}
There are two problems to overcome here. First, your foo_inner function uses i (and Y) without taking them as parameters. Because foo_inner is defined outside foo_outer, R's lexical scoping makes it look up i in the environment where it was defined (here the global environment), not the argument of foo_outer. There are two solutions to this: either define foo_inner inside foo_outer:
foo_outer <- function(i) {
X = as.vector(as.matrix(XTR[i,]))
YZ = FN(W, V, U, X) # return Y and Z
Y = YZ$Y
Z = YZ$Z
foo_inner <- function(j) {
return(Y[j]-YTR[i,j])
}
F[i+NOUT*(1:NOUT-1)] <- sapply(1:NOUT, foo_inner)
}
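The scoping rule behind this first problem can be seen in isolation with a small sketch (the names get_i, outer_fn and nested_fn are hypothetical):

```r
i <- 100                         # a global i, e.g. left over from a for loop
get_i <- function() i            # defined at top level: its free i is looked up globally
outer_fn <- function(i) get_i()  # outer_fn's argument i is NOT visible to get_i
nested_fn <- function(i) {
  get_i <- function() i          # defined inside: sees nested_fn's argument
  get_i()
}
outer_fn(1)   # 100
nested_fn(1)  # 1
```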
Or modify foo_inner so that it takes i (and Y, which would otherwise also be looked up in the global environment) as parameters and returns a function of j. You then pass foo_inner(i, Y) to the inner sapply instead of foo_inner:
foo_inner <- function(i, Y) {
# i and Y are passed in explicitly, so the returned function uses foo_outer's values
function(j) {
return(Y[j]-YTR[i,j])
}
}
foo_outer <- function(i) {
X = as.vector(as.matrix(XTR[i,]))
YZ = FN(W, V, U, X) # return Y and Z
Y = YZ$Y
Z = YZ$Z
F[i+NOUT*(1:NOUT-1)] <- sapply(1:NOUT, foo_inner(i, Y))
}
The second problem is that assigning to F inside foo_outer only modifies a local copy of F, not the F in your workspace. The function should return the value instead:
foo_inner <- function(i, Y) {
function(j) {
return(Y[j]-YTR[i,j])
}
}
foo_outer <- function(i) {
X = as.vector(as.matrix(XTR[i,]))
YZ = FN(W, V, U, X) # return Y and Z
Y = YZ$Y
Z = YZ$Z
return(sapply(1:NOUT, foo_inner(i, Y)))
}
You can now apply this function:
lapply(1:NTR, foo_outer)
I used lapply here because each call returns a vector. sapply would simplify these equal-length vectors into an NOUT-by-NTR matrix (one column per i, when NOUT > 1), whereas lapply returns a list that can be collapsed afterwards.
You want one big vector, so collapse the list:
do.call(c, lapply(1:NTR, foo_outer))
I can then just assign this value to F directly:
F <- do.call(c, lapply(1:NTR, foo_outer))
Obviously, I can't ensure this is exactly what you wanted without knowing what F, YTR, Y and all your inputs are. But I hope it gets you started on the right path!
Edit: I think I produced the wrong order for your final vector. The code above puts all the "j" values for i = 1, then all the "j" values for i = 2, and so on. Looking back at your F, the order you want seems to be all the "i" values for j = 1, then all the "i" values for j = 2, and so on. To get that, you just have to rearrange the output of the lapply over i. This should work.
First, create a function that extracts the jth element of every vector in the list:
LL <- lapply(1:NTR, foo_outer)
get_j_values <- function(j) {
sapply(LL, function(x) x[[j]])
}
For any j, get_j_values returns a vector containing the jth element of every vector in LL. Applying this function to all possible j values returns a list whose first element holds all the "i" values for j = 1, whose second element holds all the "i" values for j = 2, and so on.
LL2 <- lapply(1:NOUT, get_j_values)
I can then collapse this list in one big vector and assign it.
F <- do.call(c, LL2)
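As a sketch, assuming a hypothetical foo_outer that returns a length-NOUT vector for each i, the two orderings can be compared directly. Since sapply() builds an NOUT-by-NTR matrix, as.vector() of that matrix and of its transpose give the same two orderings without the helper function:

```r
# Toy stand-ins (hypothetical): NTR = 3 rows, NOUT = 2 outputs
NTR <- 3
NOUT <- 2
foo_outer <- function(i) (1:NOUT) * 10 + i   # length-NOUT vector for row i

LL <- lapply(1:NTR, foo_outer)
M  <- sapply(1:NTR, foo_outer)               # NOUT x NTR matrix: column i is foo_outer(i)

by_i <- do.call(c, LL)                       # all j for i = 1, then all j for i = 2, ...
by_j <- do.call(c, lapply(1:NOUT, function(j) sapply(LL, function(x) x[[j]])))

by_i   # 11 21 12 22 13 23
by_j   # 11 12 13 21 22 23
identical(by_i, as.vector(M))      # TRUE
identical(by_j, as.vector(t(M)))   # TRUE
```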
Edit 2: Although it is possible to recreate your for loop with apply functions, this might be one of those times when a for loop is actually better: there is no accumulation of results, so the apply version shouldn't be faster, and I believe the for loop is clearer. This is usually the case when you loop over indices that are used across several different objects, so that you can't apply a function to any one object directly but have to apply it to a vector of indices. Just my two cents...

R: Compute on previous elements of an ordered vector [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 6 years ago.
Given an ordered vector vec <- c(1, 4, 6, 3, 2, 7), I want to compute, for each element i of vec, the weighted average of the previous elements, where closer elements get larger weights (as the examples below show, the weights are the positions 1, 2, ..., i - 1).
The function should proceed as follows.
For the first element 1, it should return NA (no previous element).
For the second element 4, it should return 1.
For the third element 6, it should return weighted.mean(x = c(1,4), w = c(1,2)).
For the fourth element 3, it should return weighted.mean(x = c(1,4,6), w = c(1,2,3)).
The resulting vector result should satisfy length(result) == length(vec) and equal c(NA, 1, 3, 4.5, 3.9, 3.266667).
UPDATE:
I mean without using a loop like this one:
result <- numeric()
for (i in 1:length(vec)) {
if (i == 1) {
result <-
c(result, NA)
} else {
previous_elements <- vec[1:(i-1)]
result <-
c(result,
weighted.mean(x = previous_elements, w = 1:length(previous_elements)))
}
}
Here's a naive implementation. Create a function that does what you describe; the only 'clever' thing is to use seq_len(i - 1) instead of 1:(i - 1) to generate the indices, because for i = 1 seq_len(0) is empty whereas 1:0 is c(1, 0).
fun = function(i, vec)
weighted.mean(head(vec, i - 1), w=seq_len(i - 1))
and then use it in sapply
sapply(seq_along(vec), fun, vec)
This is good enough -- NaN as the first element, rather than NA, but that's easily corrected after the fact (or conceptually accepted as the right answer). It's also better than your solution, but still 'using a loop' -- the management of the result vector is done by sapply(), rather than in your loop where you have to manage it yourself. And in particular your 'copy and append' approach is very bad performance-wise, making a copy of the existing result each time through the loop. It's better to pre-allocate a result vector of the appropriate length result = numeric(length(vec)) and then fill it result[[i]] = ..., and better still to just let sapply() do the right thing for you!
The problem is that the naive implementation scales quadratically -- you make a pass along vec to process each element, and then for each element you make a second pass to calculate the weighted mean, so there are n (n - 1) / 2 calculations. So...
Take a look at weighted.mean
> stats:::weighted.mean.default
function (x, w, ..., na.rm = FALSE)
{
## SNIP -- edited for brevity
w <- as.double(w)
if (na.rm) {
i <- !is.na(x)
w <- w[i]
x <- x[i]
}
sum((x * w)[w != 0])/sum(w)
}
and use cumsum() instead of sum() to get running totals rather than a single total, i.e., return a vector as long as x whose ith element is the weighted mean of x[1:i]:
cumweighted.mean <- function(x, w) {
## handle NA values?
w <- as.numeric(w) # to avoid integer overflow in cumsum()
xw <- x * w
xw[w == 0] <- 0 # drop zero-weight terms, as weighted.mean() does with [w != 0]
cumsum(xw) / cumsum(w)
}
You'd like something slightly different: an NA first, followed by the weighted mean of the elements before each position.
myweighted.mean <- function(x)
c(NA, cumweighted.mean(head(x, -1), head(seq_along(x), -1)))
This makes a single pass through the data, so scales linearly (at least in theory).
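A quick check, as a sketch, that the single-pass version agrees with the naive sapply() version on the vector from the question (the two differ only in NA versus NaN at the first position). The last test line exercises the zero-weight handling borrowed from weighted.mean.default:

```r
cumweighted.mean <- function(x, w) {
  w <- as.numeric(w)       # doubles, to avoid integer overflow in cumsum()
  xw <- x * w
  xw[w == 0] <- 0          # ignore zero-weight terms, as weighted.mean() does
  cumsum(xw) / cumsum(w)   # running weighted mean of x[1:i]
}
myweighted.mean <- function(x)
  c(NA, cumweighted.mean(head(x, -1), head(seq_along(x), -1)))

vec <- c(1, 4, 6, 3, 2, 7)
fast  <- myweighted.mean(vec)
naive <- sapply(seq_along(vec),
                function(i, vec) weighted.mean(head(vec, i - 1), w = seq_len(i - 1)),
                vec)
fast                             # c(NA, 1, 3, 4.5, 3.9, 3.266667), up to rounding
all.equal(fast[-1], naive[-1])   # TRUE
```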

dim() Error in outer

I have written the following function, and I would like to make a 3-dimensional plot of it using persp(). To get the value of the function for each combination of a and b I use outer(), but this produces an error.
My code is:
a<- seq(from=0, to=5,by=0.25)
b<- seq(from=0.1, to=2,by=0.1)
Rab <- function(a,b){
r <- matrix(ncol = 1, nrow = 4)
for (p in seq(from=0, to=4,by=1)){
g <- ifelse(a>=0 & a<1-1/p & p >b, a*p,
ifelse(a>=0 & a<1-1/b & p< b, -1+(a+1/b),
ifelse(a > 1-1/max(p,b),-1+p,NA)))
w <- p
r[w] <- g
}
return(r)
}
q <- outer(a,b,Rab)
And then I get the following Error and warning messages, which I don't understand.
Error in outer(a, b, Rab) :
dims [product 420] do not match the length of object [4]
In addition: Warning messages:
1: In r[w] <- g :
number of items to replace is not a multiple of replacement length
2: In r[w] <- g :
number of items to replace is not a multiple of replacement length
3: In r[w] <- g :
number of items to replace is not a multiple of replacement length
4: In r[w] <- g :
number of items to replace is not a multiple of replacement length
I have tried to read about it, and I think it is because I have constructed the function Rab wrong, but I don't know how to correct it.
Any help is appreciated.
You are right that your Rab function is wrong. The documentation of outer says
X and Y must be suitable arguments for FUN. Each will be extended by rep to length the products of the lengths of X and Y before FUN is called.
FUN is called with these two extended vectors as arguments (plus any arguments in ...). It must be a vectorized function (or the name of one) expecting at least two arguments and returning a value with the same length as the first (and the second).
So in your example a and b are both extended to length length(a) * length(b), which is 21 * 20 = 420 in your case. Your function Rab should then return a vector of the same length.
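A sketch that makes this visible: the hypothetical function show_args records how often outer() calls it and with which argument lengths.

```r
a <- seq(from = 0, to = 5, by = 0.25)   # 21 values
b <- seq(from = 0.1, to = 2, by = 0.1)  # 20 values

n_calls <- 0
arg_lengths <- NULL
show_args <- function(x, y) {
  n_calls <<- n_calls + 1                  # count how often outer() calls FUN
  arg_lengths <<- c(length(x), length(y))  # record the lengths it receives
  x * y                                    # vectorized: one value per (x, y) pair
}
m <- outer(a, b, show_args)
n_calls       # 1
arg_lengths   # 420 420
dim(m)        # 21 20
```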
In Rab you compute a vector g that has the correct length and would be suitable as a return value. Instead of returning this vector you try to assign it to an entry in the matrix r. Note that this matrix is defined as
r <- matrix(ncol = 1, nrow = 4)
and can't hold a vector of length 420 in either its rows or its columns. Since r[w] selects a single element, assigning g to it keeps only the first element of g and discards the rest (this is what the warning messages are about). You then go on to re-compute g with a slightly different set of parameters, which brings us to the next problem. These computations happen in a loop that is defined like this:
for (p in seq(from=0, to=4,by=1)){
## compute g ...
r[p] <- g
}
You seem to expect this loop to be executed four times, but it actually runs five times, for p equal to 0, 1, 2, 3 and 4. This means that the first g is assigned to r[0], which R silently ignores. In the end none of this really matters, because the r you return has length 4 (rather than 420), and outer() stops with the error you see when it tries to give that result the dimensions 21 x 20.
I'm not convinced that I really understand what you are trying to do but the following might be a step in the right direction.
Rab <- function(a, b, p){
ifelse(a>=0 & a<1-1/p & p >b, a*p,
ifelse(a>=0 & a<1-1/b & p< b, -1+(a+1/b),
ifelse(a > 1-1/pmax(p,b),-1+p,NA))) # pmax() is element-wise; max() would return one number for all of b
}
This will compute the g from your function once for a fixed value of p and return the result. You'd call this like so (for p=0):
q <- outer(a, b, Rab, 0)
If you want to call it for a number of different p you can do something like
q <- lapply(0:3, function(x,y,f, p) outer(x, y, f, p), x=a, y=b, f=Rab)
This would call Rab with p = 0, 1, 2 and 3 (I'm guessing that's what you want, adjust as required).
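A self-contained sketch of the whole approach, under the same guess that p = 0, 1, 2, 3 is wanted. It uses pmax() rather than max() so that each (a, b) pair gets its own maximum, and it checks that each result has the 21 x 20 shape that persp() expects:

```r
a <- seq(from = 0, to = 5, by = 0.25)
b <- seq(from = 0.1, to = 2, by = 0.1)

# Vectorized in a and b for a fixed scalar p
Rab <- function(a, b, p) {
  ifelse(a >= 0 & a < 1 - 1/p & p > b, a * p,
         ifelse(a >= 0 & a < 1 - 1/b & p < b, -1 + (a + 1/b),
                ifelse(a > 1 - 1/pmax(p, b), -1 + p, NA)))
}

q <- lapply(0:3, function(p) outer(a, b, Rab, p))  # one matrix per value of p
sapply(q, dim)   # every column is 21, 20
# persp(a, b, q[[3]])  # 3-D plot for p = 2 (not run here)

max(c(1, 5), c(3, 2))    # 5: a single number over all elements
pmax(c(1, 5), c(3, 2))   # 3 5: element-wise maximum
```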
