Given an ordered vector vec <- c(1, 4, 6, 3, 2, 7), I want to compute for each element i of vec the weighted average of the previous elements, where elements closer to element i get larger weights (weights 1, 2, ..., i - 1, as in the examples below).
The function should proceed as follows.
For the first element 1, it should return NA (no previous element).
For the second element 4, it should return 1.
For the third element 6, it should return weighted.mean(x = c(1,4), w = c(1,2)).
For the fourth element 3, it should return weighted.mean(x = c(1,4,6), w = c(1,2,3)).
The resulting vector result should be, with length(result) == length(vec), c(NA, 1, 3, 4.5, 3.9, 3.266667).
UPDATE:
I clearly mean without using a loop:
result <- numeric()
for (i in 1:length(vec)) {
  if (i == 1) {
    result <- c(result, NA)
  } else {
    previous_elements <- vec[1:(i-1)]
    result <- c(result,
                weighted.mean(x = previous_elements, w = 1:length(previous_elements)))
  }
}
Here's a naive implementation. Create a function that does what you say; the only 'clever' thing is to use the function seq_len() instead of 1:i to generate the indexes
fun = function(i, vec)
weighted.mean(head(vec, i - 1), w=seq_len(i - 1))
and then use it in sapply
sapply(seq_along(vec), fun, vec)
This is good enough -- NaN as the first element, rather than NA, but that's easily corrected after the fact (or conceptually accepted as the right answer). It's also better than your solution, but still 'using a loop' -- the management of the result vector is done by sapply(), rather than in your loop where you have to manage it yourself. And in particular your 'copy and append' approach is very bad performance-wise, making a copy of the existing result each time through the loop. It's better to pre-allocate a result vector of the appropriate length result = numeric(length(vec)) and then fill it result[[i]] = ..., and better still to just let sapply() do the right thing for you!
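As a minimal sketch of the pre-allocation idea mentioned above (assuming the vec from the question, and using NA_real_ as the fill value so the first element comes out as NA without any extra work):

```r
vec <- c(1, 4, 6, 3, 2, 7)

# Pre-allocate the full result once instead of growing it with c()
result <- rep(NA_real_, length(vec))

# Start at the second element; the first has no previous elements
for (i in seq_along(vec)[-1]) {
    idx <- seq_len(i - 1)
    result[[i]] <- weighted.mean(vec[idx], w = idx)
}

result
```

This is still quadratic in the length of vec, but it avoids copying the result vector on every iteration.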
The problem is that the naive implementation scales quadratically -- you make a pass along vec to process each element, and then for each element you make a second pass to calculate the weighted mean, so there are n (n - 1) / 2 calculations. So...
Take a look at weighted.mean
> stats:::weighted.mean.default
function (x, w, ..., na.rm = FALSE)
{
## SNIP -- edited for brevity
w <- as.double(w)
if (na.rm) {
i <- !is.na(x)
w <- w[i]
x <- x[i]
}
sum((x * w)[w != 0])/sum(w)
}
and use cumsum() instead of sum() to get running totals of both the weighted values and the weights, i.e., return a vector as long as x, where the ith element is the weighted mean up to that point
cumweighted.mean <- function(x, w) {
    ## handle NA values?
    w <- as.numeric(w) # to avoid integer overflow
    xw <- x * w
    xw[w == 0] <- 0 # zero-weight elements contribute nothing; keeps lengths aligned
    cumsum(xw) / cumsum(w)
}
You'd like something a little different
myweighted.mean <- function(x)
c(NA, cumweighted.mean(head(x, -1), head(seq_along(x), - 1)))
This makes a single pass through the data, so scales linearly (at least in theory).
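A quick way to convince yourself that the two approaches agree is to run both on the vec from the question. This is only a sanity check written inline (the names naive and linear are made up for the comparison), not a benchmark:

```r
vec <- c(1, 4, 6, 3, 2, 7)

# Quadratic reference: one weighted.mean() call per element
naive <- sapply(seq_along(vec), function(i)
    weighted.mean(head(vec, i - 1), w = seq_len(i - 1)))
naive[1] <- NA  # weighted.mean() of an empty vector gives NaN

# Linear version: running sums of weighted values and of weights
w <- seq_len(length(vec) - 1)
linear <- c(NA, cumsum(head(vec, -1) * w) / cumsum(w))

all.equal(naive, linear)
```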
Let's say we have three vectors:
vec1 = c(1, 2, 3, 4, 5)
vec2 = c(1, 4, 3, 1, 2)
vec3 = c(5, 5, 4, 2, 1)
And let's say I am interested in this specific order of these specific values:
specific_order_of_specific_values <- c(1, 2)
How do I find the vectors which contain that specific order of specific values?
In our example, vec1 and vec2 would return as TRUE and vec3 would return as FALSE.
My idea to solve this is to write a function that loops through each index of the vector and checks if that index matches the first index of the "matching vector." If it does, then check if the i + 1 index matches the second index of the "matching vector." And so on, so forth. I am genuinely curious if there is a more elegant solution to this using base functions, etc.
P.S. My actual problem is slightly more complicated since I am interested in which rows (of a matrix) have a very specific order of values. But I can simply convert the data frame to a list of vectors.
This is neither the prettiest nor most efficient way to go about it, but I think captures the logic:
has_subvec = function(x,s){
xL = length(x); sL = length(s)
if(xL < sL) return(FALSE)
any(sapply(1:(xL-sL+1),function(i){
isTRUE(all.equal(x[i:(i+sL-1)],s))
}))
}
where x is the vector to search within and s is the 'subvector' we're looking for.
To search each row of a matrix, you can use apply(M,1,has_subvec,s=c(1,2)).
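For illustration, here is the same function applied to the three vectors from the question and to a matrix built from them (the matrix M is just an assumed example, one vector per row):

```r
has_subvec <- function(x, s) {
    xL <- length(x); sL <- length(s)
    if (xL < sL) return(FALSE)
    # slide a window of length sL along x and compare it with s
    any(sapply(1:(xL - sL + 1), function(i) {
        isTRUE(all.equal(x[i:(i + sL - 1)], s))
    }))
}

vecs <- list(vec1 = c(1, 2, 3, 4, 5),
             vec2 = c(1, 4, 3, 1, 2),
             vec3 = c(5, 5, 4, 2, 1))
found <- sapply(vecs, has_subvec, s = c(1, 2))

# Row-wise search on a matrix with one vector per row
M <- do.call(rbind, vecs)
found_rows <- apply(M, 1, has_subvec, s = c(1, 2))
```

Here found and found_rows should both flag vec1 and vec2 but not vec3.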
One way would be to convert the vector and pattern to match to string and use grepl to know if the pattern exists in other vectors.
order_match <- toString(c(1, 2))
grepl(paste0("\\b", order_match,"\\b"), sapply(list(vec1, vec2, vec3), toString))
#[1] TRUE TRUE FALSE
I ended up writing a function that solves this problem for a list of vectors. This is a pretty ugly solution, so I encourage corrections to mine or more elegant alternative solutions.
matching_vector_test <- c(1033, 280)
test_list <- list(c(1033, 280, 112), c(1033, 112, 280))
match_vector <- function(list_of_vectors, matching_vector) {
list_of_matching_vectors <- list()
for (i in 1:length(list_of_vectors)) {
for (j in 1:length(list_of_vectors[[i]])) {
for (k in 1:length(matching_vector)) {
if ((k < length(matching_vector)) && (j < length(list_of_vectors[[i]])) && (list_of_vectors[[i]][j] == matching_vector[k]) && (list_of_vectors[[i]][j+1] == matching_vector[k + 1])) {
list_of_matching_vectors[[i]] <- list_of_vectors[[i]]
break
}
}
}
}
list_of_matching_vectors <- Filter(length, list_of_matching_vectors)
list_of_matching_vectors
}
match_vector(list_of_vectors = test_list, matching_vector = matching_vector_test )
[[1]]
[1] 1033 280 112
Somewhat new to R (coming from SQL), trying to write a quick loop to generate a series of functions that perform a task a specified number of times (i.e. for function 2, do something 2 times, function 3 -> 3 times, etc.).
My issue is arising in that I'm using the iterative variable (in the below code, 'k') as a part of the lower-level function - as such, when I go to evaluate ANY of the subsequently generated functions, it returns the value of the function as of the last value of k (here, 4), no matter if I call function 2, 3, or 4.
My question is therefore how can I substitute the value of 'k' for the variable 'k' when I'm generating the lower level functions? For example, on the first iteration, when k = 2, I want to substitute '2' for every occurrence of 'k' in the lower level function, such that when function 2 is run later, it sees the value of '2', rather than the last value of 'k'?
Code below (note that the 'x' value the function will evaluate is a 1 column, variable row matrix):
x <- as.matrix(11:20)
for (k in 2:4) {
actvfun <- NULL
actvfun <- function(x) {
actv <- NULL
actvmtx <- NULL
actvmtx <- as.matrix(x)
for (j in 2:(k+1)) {
actv <- rep(NA, length(x))
for (i in j:length(x)) {
actv[i] <- x[i - (j - 1)]
}
actvmtx <- as.matrix(cbind(actvmtx, actv))
}
assign(paste("lag0", k, "av", sep=""), actvmtx)
return(apply(get(paste("lag0", k, "av", sep="")), 1, mean, na.rm=T))
assign(paste("LAGTEST0", k, "AV", sep=""), apply(actvmtx, 1, mean, na.rm=T))
}
assign(paste("v5LAGTEST0", k, ".av", sep=""), actvfun)
}
v5LAGTEST02.av(x)
v5LAGTEST03.av(x)
The last two items are the checks I was running - currently both return the result using k = 4, rather than their respective values of 2 and 3.
Any help is greatly appreciated - I know loops are somewhat frowned upon in R (as is 'assign', but I'm not sure how else to achieve the desired result of variable function names), so I'm certainly open to new suggestions!
Thanks,
Nate
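One common way to get this behaviour is a function factory: each call to the factory creates its own environment holding its own copy of k, so the generated functions no longer look up the loop variable later. The sketch below is an assumption about what the lagged averages are meant to compute (the row-wise mean of x and its first k lags, ignoring NAs); make_lag_avg and funs are hypothetical names, and it assumes k is smaller than length(x).

```r
# Hypothetical factory: returns a function that averages x with its lags 1..k
make_lag_avg <- function(k) {
    force(k)  # evaluate k now, so later changes to a loop variable do not matter
    function(x) {
        x <- as.numeric(x)
        n <- length(x)
        # column 1 is x itself; column j + 1 is x shifted down by j places
        lagged <- sapply(0:k, function(j) c(rep(NA, j), head(x, n - j)))
        rowMeans(lagged, na.rm = TRUE)
    }
}

# Keep the generated functions in a named list instead of using assign()
funs <- lapply(2:4, make_lag_avg)
names(funs) <- paste0("lag", 2:4)

x <- as.matrix(11:20)
funs$lag2(x)
funs$lag3(x)
```

Storing the functions in a list also makes it easy to loop over them later, e.g. lapply(funs, function(f) f(x)).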
I am very new to R, so I would appreciate it if anybody could help me understand the following script so that I can rewrite it in Matlab.
I am only interested in "R" method section of this script:
The following function is from copula package in R and it is for empirical copula:
> F.n
function (x, X, offset = 0, method = c("C", "R"))
{
stopifnot(is.numeric(d <- ncol(X)), is.matrix(x), d == ncol(x))
n <- nrow(X)
if (d == 1)
vapply(x, function(x.) sum(X <= x.), NA_real_)/(n + offset)
else {
method <- match.arg(method)
switch(method, C = {
m <- nrow(x)
.C(Cn_C, as.double(X), as.integer(n), as.integer(d),
as.double(x), as.integer(m), ec = double(m),
as.double(offset))$ec
}, R = {
tX <- t(X)
vapply(1:nrow(x), function(k) sum(colSums(tX <= x[k,
]) == d), NA_real_)/(n + offset)
}, stop("wrong 'method': ", method))
}
}
<environment: namespace:copula>
Walk-Through
function (x, X, offset = 0, method = c("C", "R"))
{
Function definition that takes two mandatory and two optional named arguments.
stopifnot(is.numeric(d <- ncol(X)), is.matrix(x), d == ncol(x))
This makes an assertion (and exits if not true) that the return from ncol(X) is numeric (is it ever not?), that x is a matrix (2-dim array), and x and X have the same number of columns.
n <- nrow(X)
Stores the number of rows for future use. (nrow is not an "expensive" function, but it does no harm to do it this way.)
if (d == 1)
vapply(x, function(x.) sum(X <= x.), NA_real_)/(n + offset)
Branch on the condition that X has only one column.
The vapply function iterates over each element of x, passes the number to the inner function, and captures each response into another vector. The function(x.) ... is an anonymous or immediate function, and in this case is called as many times as there are elements in x. The NA_real_ is a way to tell vapply what type of value will be returned; sapply would have worked just as well (though not as quickly).
The resulting vector is divided element-wise by n + offset and is returned out of the overall function.
else {
method <- match.arg(method)
This ensures that method will be either "C" or "R" or the function will error-out.
The following, switch, is similar to a select or case statement, where it will do the first block of code if method is "C", the second if "R", and will stop otherwise.
switch(method, C = {
m <- nrow(x)
.C(Cn_C, as.double(X), as.integer(n), as.integer(d),
as.double(x), as.integer(m), ec = double(m),
as.double(offset))$ec
Makes a call to a C library function named Cn_C with the provided arguments, and extracts the ec component from the return.
}, R = {
tX <- t(X)
Takes the transpose of X.
vapply(1:nrow(x), function(k) sum(colSums(tX <= x[k,
]) == d), NA_real_)/(n + offset)
(See my discussion above for vapply. This time it is iterating over a sequence of incrementing numbers from 1 to the number of rows in x.)
For the anonymous (inner) function, going inside-out, it starts by comparing the transposed X with the kth row of x, resulting in a matrix of booleans. The colSums in this case is merely counting the number of TRUEs in each column that come from that matrix-wise comparison. The outer sum is counting the number of these column-wise sums that are equal to the number of columns in X (as stored in d, earlier).
This vector is divided element-wise by (n + offset), resulting in another vector.
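To make the "R" branch concrete, here is the same computation on a tiny made-up example (the matrices X and x below are assumptions for illustration, and offset is taken as 0). For each row of x it counts the rows of X that are less than or equal to it in every column:

```r
# Three observations in two dimensions (made-up data)
X <- matrix(c(0.1, 0.5, 0.9,
              0.2, 0.4, 0.8), ncol = 2)
# Two evaluation points, one per row
x <- matrix(c(0.5, 0.5,
              1.0, 1.0), ncol = 2, byrow = TRUE)

d <- ncol(X)
n <- nrow(X)
tX <- t(X)  # d x n: each column is now one observation

# x[k, ] is recycled down each column of tX, so column j of the comparison
# tells, per dimension, whether observation j is <= the k-th evaluation point
counts <- vapply(seq_len(nrow(x)),
                 function(k) sum(colSums(tX <= x[k, ]) == d),
                 NA_real_)
counts / n
```

The first point (0.5, 0.5) dominates two of the three observations and the second dominates all three, so the result is 2/3 and 1. The transpose is what makes the recycling line up, which is worth keeping in mind when porting this to Matlab.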
}, stop("wrong 'method': ", method))
Throw an exception if something else was provided to method in the function call.
}
}
Wrap-Up
You really should do some more study on help(vapply) if you are not very comfortable with Map-like functions on vectors. You might benefit from reading help(match.arg), help(switch). I think the other major functions (nrow, ncol, t) are clear enough.
You probably don't need to worry about the .C(...) call to a library function, since this function is kind enough to provide an R-native implementation as well. (Well, that's what I'm inferring, at least.)
I am trying to create a function that will take in a vector k and return to me a matrix with dimensions length(distMat[1,]) by length(k). distMat is a huge matrix and indSpam is a long vector. In particular to my situation, length(distMat[1,]) is 2412. When I enter in k as a vector of length one, I get a vector of length 2412. I want to be able to enter in k as a vector of length two and get a matrix of 2412x2. I am trying to use a while loop to let it go through the length of k, but it only returns to me a vector of length 2412. What am I doing wrong?
predNeighbor = function(k, distMat, indSpam){
counter = 1
while (counter<(length(k)+1))
{
preMatrix = apply(distMat, 1, order)
orderedMatrix = t(preMatrix)
truncate = orderedMatrix[,1:k[counter]]
checking = indSpam[truncate]
checking2 = matrix(checking, ncol = k[counter])
number = apply(checking2, 1, sum)
return(number[1:length(distMat[1,])] > (k[counter]/2))
counter = counter + 1
}
}
I am trying to create a function that will take in a vector k and return to me a matrix with dimensions length(distMat[1,]) by length(k)
Here's a function that does this.
foo <- function(k, distMat) {
return(matrix(0, nrow = length(distMat[1, ]), ncol = length(k)))
}
If you have other requirements, please describe them in words.
Based on your comment, I think I understand your goal better. You have a function that returns a vector of length k and you want to save its output as rows in a matrix. This is a pretty common task. Let's do a simple example where k starts out as 1:10, and say we want to add some noise to it with a function noise_and_rank() and see how the rank changes.
In the case where the input to the function is always the same, replicate() works very well. It will automatically put everything in a matrix
k <- 1:10
noise_and_rank <- function(k) {
rank(k + runif(length(k), min = -2, max = 2))
}
results <- replicate(n = 8, expr = {noise_and_rank(k)})
In the case where you want to iterate, i.e., the output from the one go is the input for the next, a for loop is good, and we just pre-allocate a matrix with 0's, to fill in one column/row at a time
k <- 1:10
n.sim <- 8
results <- matrix(0, nrow = length(k), ncol = n.sim)
results[, 1] <- k
for(i in 2:n.sim) {
results[, i] <- noise_and_rank(results[, i - 1])
}
What your original question seems to be about is how to do the pre-allocation. If the input is always the same, using replicate() means you don't have to worry about it. If the input is different each time, then pre-allocate using matrix(); you don't need to write any special function.
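Applied to the function in the question, the pre-allocation pattern might look like the sketch below. Note that in the original, return() sits inside the while loop, so the function exits on the first pass; here the result matrix is filled one column per value of k and returned only at the end. The name pred_neighbor and the small distMat and indSpam are made up for illustration, and distMat is assumed to be square.

```r
pred_neighbor <- function(k, distMat, indSpam) {
    # order each row once, outside the loop over k
    ordered <- t(apply(distMat, 1, order))
    # pre-allocate: one row per observation, one column per value of k
    result <- matrix(NA, nrow = nrow(distMat), ncol = length(k))
    for (j in seq_along(k)) {
        nearest <- ordered[, seq_len(k[j]), drop = FALSE]
        votes <- matrix(indSpam[nearest], ncol = k[j])
        result[, j] <- rowSums(votes) > k[j] / 2
    }
    result  # returned after all columns are filled
}

# Tiny made-up example: five points on a line, the first two flagged as spam
distMat <- as.matrix(dist(c(0, 1, 2, 10, 11)))
indSpam <- c(TRUE, TRUE, FALSE, FALSE, FALSE)
pred <- pred_neighbor(c(1, 3), distMat, indSpam)
dim(pred)
```

As in the original code, each point counts as its own nearest neighbour (distance 0), which may or may not be what you want.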
I want to skip an error (if there is any) in a loop and continue with the next iteration. I want to compute 100 inverse matrices of a 2 by 2 matrix with elements randomly sampled from {0, 1, 2}. It is possible to get a singular matrix, for example:
1 0
2 0
Here is my code
set.seed(1)
count <- 1
inverses <- vector(mode = "list", 100)
repeat {
x <- matrix(sample(0:2, 4, replace = T), 2, 2)
inverses[[count]] <- solve(x)
count <- count + 1
if (count > 100) break
}
At the third iteration, the matrix is singular and the code stops running with an error message. In practice, I would like to bypass this error and continue to the next loop. I know I need to use a try or tryCatch function but I don't know how to use them. Similar questions have been asked here, but they are all really complicated and the answers are far beyond my understanding. If someone can give me a complete code specifically for this question, I really appreciate it.
This would put NULLs into inverses for the singular matrices:
inverses[[count]] <- tryCatch(solve(x), error=function(e) NULL)
If the first expression in a call to tryCatch raises an error, it executes and returns the value of the function supplied to its error argument. The function supplied to the error arg has to take the error itself as an argument (here I call it e), but you don't have to do anything with it.
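You could then drop the NULL entries with inverses[!sapply(inverses, is.null)]. (Note that is.null(inverses) tests the list as a whole, so the check has to be applied element-wise.)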
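Put together as a complete script, this could look like the sketch below. It uses a for loop in place of the repeat/count bookkeeping, and it only stores a result when there is one, because assigning NULL with [[<- removes a list element rather than storing a NULL in it. The names inv, keep and inverses_ok are just illustrative.

```r
set.seed(1)
inverses <- vector(mode = "list", 100)

for (count in seq_len(100)) {
    x <- matrix(sample(0:2, 4, replace = TRUE), 2, 2)
    # solve() signals an error for a singular matrix; turn that into NULL
    inv <- tryCatch(solve(x), error = function(e) NULL)
    # leave the pre-allocated NULL in place when there is no inverse
    if (!is.null(inv)) inverses[[count]] <- inv
}

# Element-wise test for the entries that were skipped
keep <- !vapply(inverses, is.null, logical(1))
inverses_ok <- inverses[keep]
length(inverses_ok)
```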
Alternatively, you could use the lower level try. The choice is really a matter of taste.
count <- 0
repeat {
if (count == 100) break
count <- count + 1
x <- matrix(sample(0:2, 4, replace = T), 2, 2)
x.inv <- try(solve(x), silent=TRUE)
if (inherits(x.inv, 'try-error')) next
else inverses[[count]] <- x.inv
}
If your expression generates an error, try returns an object with class try-error. It will print the message to screen if silent=FALSE. In this case, if x.inv has class try-error, we call next to stop the execution of the current iteration and move to the next one, otherwise we add x.inv to inverses.
Edit:
You could avoid using the repeat loop with replicate and lapply.
matrices <- replicate(100, matrix(sample(0:2, 4, replace=T), 2, 2), simplify=FALSE)
inverses <- lapply(matrices, function(mat) if (det(mat) != 0) solve(mat))
It's interesting to note that the second argument to replicate is treated as an expression, meaning it gets executed afresh for each replicate. This means you can use replicate to make a list of any number of random objects that are generated from the same expression.
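A small sketch of that difference, using runif() only as a stand-in for any random expression:

```r
set.seed(1)
# replicate() re-evaluates the expression on every repetition
fresh <- replicate(5, runif(1))
# rep() evaluates it once and copies the value
copied <- rep(runif(1), 5)

length(unique(fresh))
length(unique(copied))
```

The values in fresh differ from one another (barring an astronomically unlikely tie), while copied holds the same number five times.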
Instead of using tryCatch you could simply calculate the determinant of the matrix with the function det. A matrix is singular if and only if the determinant is zero.
Hence, you could test whether the determinant is different from zero and calculate the inverse only if the test is positive:
set.seed(1)
count <- 1
inverses <- vector(mode = "list", 100)
repeat {
x <- matrix(sample(0:2, 4, replace = T), 2, 2)
# if (det(x)) inverses[[count]] <- solve(x)
# a more robust replacement for the above line (see comment):
if (is.finite(determinant(x)$modulus)) inverses[[count]] <- solve(x)
count <- count + 1
if (count > 100) break
}
Update:
It is, however, possible to avoid generating singular matrices. The determinant of a 2-by-2 matrix mat is defined as mat[1] * mat[4] - mat[3] * mat[2]. You could use this knowledge for sampling random numbers. Just do not sample numbers which will produce a singular matrix. This, of course, depends on the numbers sampled before.
set.seed(1)
count <- 1
inverses <- vector(mode = "list", 100)
set <- 0:2 # the set of numbers to sample from
repeat {
# sample the first value
x <- sample(set, 1)
# if the first value is zero, the second and third one are not allowed to be zero.
new_set <- if (x == 0) setdiff(set, 0) else set # ifelse() would return only the first element here
# sample the second and third value
x <- c(x, sample(new_set, 2, replace = T))
# calculate which 4th number would result in a singular matrix
not_allowed <- abs(-x[3] * x[2] / x[1])
# remove this number from the set
new_set <- setdiff(0:2, not_allowed)
# sample the fourth value and build the matrix
x <- matrix(c(x, sample(new_set, 1)), 2, 2)
inverses[[count]] <- solve(x)
count <- count + 1
if (count > 100) break
}
This procedure guarantees that all generated matrices will have an inverse.
try is just a way of telling R: "If you commit an error inside the following parentheses, then skip it and move on."
So if you're worried that x <- matrix(sample(0:2, 4, replace = T), 2, 2) might give you an error, then all you have to do is:
try(x <- matrix(sample(0:2, 4, replace = T), 2, 2))
However, keep in mind then that x will be undefined if you do this and it ends up not being able to compute the answer. That could cause a problem when you get to solve(x) - so you can either define x before try or just "try" the whole thing:
try(
{
x <- matrix(sample(0:2, 4, replace = T), 2, 2)
inverses[[count]] <- solve(x)
}
)
The documentation for try explains your problem pretty well. I suggest you go through it completely.
Edit: The documentation example looked pretty straightforward and very similar to the OP's question. Thanks for the suggestion though. Here is the answer, following the example in the documentation page:
# `idx` is used as a dummy variable here just to illustrate that
# all 100 entries are indeed calculated. You can remove it.
set.seed(1)
mat_inv <- function(idx) {
print(idx)
x <- matrix(sample(0:2, 4, replace = T), nrow = 2)
solve(x)
}
inverses <- lapply(1:100, function(idx) try(mat_inv(idx), TRUE))
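With this approach the failed iterations are still present in the list as try-error objects, so you will probably want to separate them out afterwards. A minimal sketch, assuming the same random matrices as above (results, failed and inverses_ok are illustrative names):

```r
set.seed(1)
results <- lapply(1:100, function(idx) {
    x <- matrix(sample(0:2, 4, replace = TRUE), nrow = 2)
    try(solve(x), silent = TRUE)
})

# TRUE for the iterations where solve() failed
failed <- vapply(results, inherits, logical(1), what = "try-error")

inverses_ok <- results[!failed]
sum(failed)  # how many matrices were singular
```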