R - Number of cases in 4-dimensional pairwise array - r

I'm making a "pairwise" array in R. Given the vector combo, I'm finding every permutation of 4 elements, which gives a 4-dimensional "pairwise" array. My current approach builds it as a simple list, using nested sapply functions, like so:
fourList <- sapply(X = combo, FUN = function(h) {
  hi <- which(combo == h) #get index of h
  sapply(X = combo[hi:n], FUN = function(i) {
    ii <- which(combo == i) #get index of i
    sapply(X = combo[ii:n], FUN = function(j) {
      ji <- which(combo == j) #get index of j
      sapply(X = combo[ji:n], FUN = function(k) {
        list(c(h,i,j,k))
      })
    })
  })
})
I'd like to make some sort of progress indicator, so I can report to the user what percentage of the array has been built. Ideally, I'd just take numberCasesCompleted and divide that by totalCases = length(combo)^4 to get the fraction that is done. However, I can't seem to figure out an algorithm that takes in hi, ji, and ii, and outputs the value numberCasesCompleted. How can I calculate this?
In the 2D (x by y) case (e.g., sapply(X, function(x) {sapply(X[xi:n], function(y) {list(c(x,y))})})), this could be calculated by sum(n - (x-2:x), y-(x-1)), but generalizing that to 4 dimensions sounds rather difficult.

I'm stupid. Just add the proportion complete of the first level to the proportion complete of the second level (scaled down to a single iteration at the first level), and so forth.
In my case: completion <- hi/(n+1) + (ii/(n+1))*(1/n) + (ji/n)*(1/n)*(1/n)
(The n+1 denominators are there because there's effectively another loop after hi is equal to n, as ii still has a full set of iterations to complete. Otherwise it would end at ~101%. But for a rough/quick estimation of progress, this is fine.)
However, it is worth noting that (according to @Gregor in the comments) there are much better ways of making combinations in R, so my original use case may be moot (just don't use nested sapply in the first place).
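As a hedged illustration of that last point, here is a minimal sketch that builds the same cases without nested sapply. It assumes combo has no repeated values (the original code relies on that through which(combo == h)); the names idx, fourMat and total are made up for this example. Because the loops only pair each element with itself and later elements, the number of cases is choose(n + 3, 4) rather than n^4, which gives an exact denominator for a progress indicator.

```r
combo <- c("a", "b", "c", "d", "e")   # example input (assumed unique values)
n <- length(combo)

# Number of index tuples with h <= i <= j <= k (combinations with repetition)
total <- choose(n + 3, 4)

# All n^4 index tuples, then keep only the non-decreasing ones
idx <- expand.grid(k = seq_len(n), j = seq_len(n), i = seq_len(n), h = seq_len(n))
idx <- idx[idx$h <= idx$i & idx$i <= idx$j & idx$j <= idx$k, c("h", "i", "j", "k")]

# One row per case, columns h, i, j, k
fourMat <- matrix(combo[as.matrix(idx)], ncol = 4)

# With a flat table of cases, progress after processing row r is simply r / total
stopifnot(nrow(fourMat) == total)
```

expand.grid materializes all n^4 rows before filtering, so this is only a sketch for small inputs.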

Related

How to use lapply with a condition in R to fit only one element each time

Suppose I have two vectors, and I would like a function that takes only one value from each vector at a time and returns the output. Then I would like another function to check the value from each run: if the output of the previous run is smaller than the new one, the function should stop and return all the previous values. My original function is very complicated (estimation models), so I will use a simplified example to explain my idea.
Suppose that I have these two vectors:
set.seed(123)
x <- rnorm(1:20)
y <- rnorm(1:20)
Then, I would like to write a function that takes only one value from each vector and multiplies them, then returns the output. Next, I would like the function to check whether the previous product is smaller than the new one. If so, it should stop and return all the previous products.
I tried the function below. However, it takes all the values at once and returns a list of the products. I was thinking about using lapply to process one element at a time, but I do not know how to handle the condition.
myfun <- function(x, y, n){
  multi <- list()
  for ( i in 1:n){
    multi[[i]] <- x[[i]]*y[[i]]
  }
  return(multi)
}
myfun(x,y,10)
Here is another try
x <- rnorm(1:20)
y <- rnorm(1:20)
myfun <- function(x, y){
multi <- x*y
return(multi)
}
This is the first function. I would like to run it element by element, so that each call returns only one product. Then another function (a wrapper function) checks the result: if the second output of the first function (the multiplication function) is larger than the first one, it stops; otherwise it keeps going.
I would like to write a function that takes only one value from each vector and multiplies them, then returns the output. Then I would like the function to check whether the previous product is smaller than the new one.
I would like the multiplication in a separate function, and then check its output, so I need a wrapper function.
You can apply a for loop with a stopping condition, similar to what you have already:
# example input
set.seed(123)
x <- rnorm(1:20)
y <- rnorm(1:20)
# example function
f = function(xi, yi) xi*yi
# wrapper
stopifnot(length(x) == length(y))
res = vector(length(x), mode="list")
for (i in seq_along(x)){
  res[[i]] = f(x[[i]], y[[i]])
  if (i > 1L && res[[i]] > res[[i-1L]]) break
}
res[seq_len(i)]
Comments:
It is better to predefine the max length res might need (here, length(x)), rather than expanding it in the loop.
For this function (multiplication), there is no good reason to proceed elementwise. R's multiplication function is vectorized and fast.
You don't need to use a list-class output for this function, since it is returning doubles; res = double(length(x)) should also work.
You don't need to use list-style accessors for x, y and res unless lists are involved; res[i] = f(x[i], y[i]) should work, etc.
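To illustrate the comment about vectorization, here is a minimal vectorized sketch of the same stopping rule. It is not the answerer's code: the names prod_xy, stop_at and res_vec are introduced here, and it assumes the inputs contain no NA values.

```r
set.seed(123)
x <- rnorm(20)   # rnorm(1:20) in the question also draws 20 values
y <- rnorm(20)

prod_xy <- x * y                                # all products at once

# Index of the first product that is larger than the one before it
stop_at <- which(diff(prod_xy) > 0)[1] + 1L
if (is.na(stop_at)) stop_at <- length(prod_xy)  # no increase found: keep everything

res_vec <- prod_xy[seq_len(stop_at)]            # same elements the for loop keeps
```

This computes every product up front, which is fine for cheap operations like multiplication but defeats the purpose if each evaluation is expensive, as with the estimation models mentioned in the question.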

Which loop to use, R language?

We have to create a function of K that returns a vector containing all items of the Fibonacci sequence that are smaller than or equal to K. We can assume K is a Fibonacci number. For example, if K is 3, the function should return the vector (1,1,2,3).
In general, a for loop is used when you know how many iterations you need to do, and a while loop is used when you want to keep going until a condition is met.
For this case, it sounds like you get an input K and you want to keep going until you find a Fibonacci term > K, so use a while loop.
ans <- function(n) {
  x <- c(1,1)
  while (length(x) <= n) {
    position <- length(x)
    new <- x[position] + x[position-1]
    x <- c(x,new)
  }
  return(x[x<=n])
}
I tried many different loops, and this is the closest I got. It works with every other number, but ans(3) gives 1,1,2 even though it should give 1,1,2,3. I couldn't see what is wrong with this.
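Following the advice to loop until a term exceeds K, a minimal sketch might look like this. Note that the loop in the question compares length(x), not the newest term, against its input. The function name fib_upto is made up for this example, and K is assumed to be at least 1.

```r
fib_upto <- function(K) {
  x <- c(1, 1)
  nxt <- 2                                   # next Fibonacci term to consider
  while (nxt <= K) {                         # stop once the next term exceeds K
    x <- c(x, nxt)
    nxt <- x[length(x)] + x[length(x) - 1]   # sum of the last two terms
  }
  x[x <= K]
}

fib_upto(3)
```

Growing x with c() inside the loop is acceptable here because the Fibonacci sequence grows so quickly that only a handful of iterations are needed.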

Easiest way in R to get vector of frequencies of elements in vector

Suppose I have a vector of values v. What is the easiest way to get a vector f of length equal to v, where the ith element of f is the frequency of the ith element of v in v?
The only way I know to do it seems unnecessarily complicated:
v = sample(1:10,100,replace=TRUE)
D = data.frame( idx=1:length(v), v=v )
E = merge( D, data.frame(table(v)) )
E = E[ with(E,order(idx)), ]
f = E$Freq
Surely there's a simpler way to do this, along the lines of "frequencies(v)"?
For a vector of small positive integers v, as in the question, the expression
tabulate(v)[v]
is particularly simple as well as speedy.
For more general numerical vectors v you can persuade ecdf to help you out, as in
w <- sapply(v, ecdf(v)) * length(v)
tabulate(w)[w]
It's probably better to do the coding of the underlying algorithm yourself, though--and it certainly avoids the floating point rounding error implicit in the preceding solution:
frequencies <- function(x) {
  i <- order(x)
  v <- x[i]
  w <- cumsum(c(TRUE, v[-1] != v[-length(x)]))
  f <- tabulate(w)[w]
  return(f[order(i)])
}
This algorithm sorts the data, assigns sequential identifiers 1, 2, 3, ... to the values as it encounters them (by summing a binary indicator of when the values change), uses the preceding tabulate()[] trick to obtain the frequencies efficiently, and then unsorts the results to make the output match the input, component by component.
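To make those steps concrete, here is a small trace of the same steps on a six-element vector. The input x is an arbitrary example chosen here, and freq is a name introduced for the result; the comments show each intermediate value.

```r
x <- c(3, 1, 3, 2, 1, 3)

i <- order(x)                                  # 2 5 4 1 3 6  (positions in sorted order)
v <- x[i]                                      # 1 1 2 3 3 3  (sorted values)
w <- cumsum(c(TRUE, v[-1] != v[-length(v)]))   # 1 1 2 3 3 3  (identifier per distinct value)
f <- tabulate(w)[w]                            # 2 2 1 3 3 3  (frequency of each sorted value)
freq <- f[order(i)]                            # 3 2 3 1 2 3  (back in the input order)
```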
I think the best solution here is:
ave(v,v,FUN=length)
It is simply ave()'s design to replicate and map the return value of FUN() back to every index of the input vector whose element was part of the group for which that particular invocation of FUN() was performed.
Something like this works for me:
sapply(v, function(elmt, vec) sum(vec == elmt), vec=v)
I would suggest you use table and as.vector. Note that this returns one count per distinct value, not a vector of the same length as the input:
as.vector(table(dataInVector))
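If you do want to go through table, its counts can be mapped back onto every element by indexing with the values as names. A minimal sketch, which also checks that the result agrees with the other approaches in this thread (the variable names f_tab, f_ave, f_sapply and f_table are made up here):

```r
set.seed(1)
v <- sample(1:10, 100, replace = TRUE)

f_tab    <- tabulate(v)[v]                                        # small positive integers only
f_ave    <- ave(v, v, FUN = length)                               # group sizes mapped back to each element
f_sapply <- sapply(v, function(elmt, vec) sum(vec == elmt), vec = v)
f_table  <- as.vector(table(v)[as.character(v)])                  # look up each value's count by name

# All four give one frequency per element of v
stopifnot(all(f_tab == f_ave), all(f_tab == f_sapply), all(f_tab == f_table))
```

The lookup by as.character(v) is reliable for integers and strings; for non-integer numeric values the name matching depends on how the numbers are formatted, so ave is the safer choice there.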

How to append a vector to a vector r - in a vectorized style

We all know that appending a vector to a vector within a for loop in R is a bad thing because it costs time. A solution would be to do it in a vectorized style. Here is a nice example by Joshua Ulrich. It is important to first create a vector with known length and then fill it up, instead of appending each new piece to an existing piece within the loop.
Still, his example demonstrates 'only' how to append one piece of data at a time. I am now struggling with how to fill a vector with vectors, not scalars.
Imagine I have a vector with a length of 100
vector <- numeric(length=100)
and a smaller vector that would fit 10 times into the first vector
vec <- seq(1,10,1)
How would I have to construct a loop that adds the smaller vector to the large vector without using c() or append ?
EDIT: This example is simplified - vec does not always consist of the same sequence but is generated within a for loop and should be added to vector.
You could just use normal vector indexing within the loop to accomplish this:
vector <- numeric(length=100)
for (i in 1:10) {
  vector[(10*i-9):(10*i)] <- 1:10
}
all.equal(vector, rep(1:10, 10))
# [1] TRUE
Of course if you were just trying to repeat a vector a certain number of times rep(vec, 10) would be the preferred solution.
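When every chunk has the same known length, another option is to let vapply collect the chunks as columns of a matrix and flatten it afterwards. This is only a sketch under that assumption; make_chunk is a hypothetical stand-in for whatever produces vec in each iteration, and vector_filled is a name introduced here.

```r
chunk_len <- 10
n_chunks  <- 10

# Hypothetical generator: stands in for the code that builds vec in iteration i
make_chunk <- function(i) seq(1, 10, 1) * i

# vapply checks that every chunk is numeric with length chunk_len
chunks <- vapply(seq_len(n_chunks), make_chunk, FUN.VALUE = numeric(chunk_len))

# Matrices are stored column by column, so this concatenates the chunks in order
vector_filled <- as.vector(chunks)
```

This avoids computing index ranges by hand, but it only applies when no chunk depends on the chunks generated before it.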
A similar approach, perhaps a little more clear if your new vectors are of variable length:
# Let's over-allocate so that we know the big vector is big enough
big_vec = numeric(1e4)
this.index = 1
for (i in 1:10) {
  # Generate a new vector of random length
  new_vec = runif(sample(1:20, size = 1))
  # Stick it in big_vec by index
  big_vec[this.index:(this.index + length(new_vec) - 1)] = new_vec
  # update the starting index
  this.index = this.index + length(new_vec)
}
# truncate to only include the added values
big_vec = big_vec[1:(this.index - 1)]
As @josilber suggested in comments, lists would be more R-ish. This is a much cleaner approach, unless the new vector generation depends on the previous vectors, in which case the for loop might be necessary.
vec_list = list()
for (i in 1:10) {
  # Generate a new vector of random length
  vec_list[[i]] = runif(sample(1:20, size = 1))
}
# Or, use lapply
vec_list = lapply(1:10, FUN = function(x) {runif(sample(1:20, size = 1))})
# Then combine with do.call
do.call(c, vec_list)
# or more simply, just unlist
unlist(vec_list)

How to print the name of current row when using apply in R?

For example, I have a matrix k
> k
  d e
a 1 3
b 2 4
I want to apply a function on k
> apply(k,MARGIN=1,function(p) {p+1})
  a b
d 2 3
e 4 5
However, I also want to print the row name of the row the function is being applied to, so that I know which row is being processed at that time.
It might look like this:
apply(k,MARGIN=1,function(p) {print(rowname(p)); p+1})
But I really don't know how to do that in R.
Does anyone have any idea?
Here's a neat solution to what I think you're asking. (I've called the input matrix mat rather than k for clarity - in this example, mat has 2 columns and 10 rows, and the rows are named abc1 through to abc10.)
In the code below, the result out1 is the thing you wanted to calculate (the outcome of the apply command). The result out2 comes out identically to out1 except that it prints out the rownames that it is working on (I put in a delay of 0.3 seconds per row so you can see it really does do this - take this out when you want the code to run full speed obviously!)
The trick I came up with was to cbind the row numbers (1 to n) onto the left of mat (to create a matrix with one additional column), and then use this to refer back to the rownames of mat. Note the line x = y[-1] which means that the actual calculation within the function (here, adding 1) ignores the first column of row numbers, which means it's the same as the calculation done for out1. Whatever sort of calculation you want to perform on the rows can be done this way - just pretend that y never existed, and formulate your desired calculation using x. Hope this helps.
set.seed(1234)
mat = as.matrix(data.frame(x = rpois(10,4), y = rpois(10,4)))
rownames(mat) = paste("abc", 1:nrow(mat), sep="")
out1 = apply(mat,1,function(x) {x+1})
out2 = apply(cbind(seq_len(nrow(mat)),mat),1,
  function(y) {
    x = y[-1]
    cat("Doing row:",rownames(mat)[y[1]],"\n")
    Sys.sleep(0.3)
    x+1
  }
)
identical(out1,out2)
You can use a variable outside of the apply call to keep track of the row index and pass the row names as an extra argument to your function:
idx <- 1
apply(k, 1, function(p, rn) {print(rn[idx]); idx <<- idx + 1; p + 1}, rownames(k))
This should work. The cat() function is what you want to use when printing results during evaluation of a function. paste(), conversely, just returns a character vector but doesn't send it to the command window.
The solution below uses a counter created as a closure, allowing it to "remember" how many times the function has been run before. Note the use of the superassignment operator <<-, which modifies i in the enclosing environment rather than creating a local variable. If you really want to understand what's going on here, I recommend reading through this wiki https://github.com/hadley/devtools/wiki/
Note there may be an easier way to do this; my solution assumes that there is no way to access the rownumber or rowname of a current row using typical means within an apply function. As previously mentioned, this would be no problem in a loop.
k <- matrix(c(1,2,3,4),ncol=2)
rownames(k) <- c("a","b")
colnames(k) <- c("d","e")
make.counter <- function(x){
  i <- 0
  function(){
    i <<- i+1
    i
  }
}
counter1 <- make.counter()
apply(k,MARGIN=1,function(p){
  current.row <- rownames(k)[counter1()]
  cat(current.row,"\n")
  return(p+1)
})
As far as I know you cannot do that with apply, but you could loop through the rownames of your data frame. Lame example:
lapply(rownames(mtcars), function(x) sprintf('The mpg of %s is %s.', x, mtcars[x, 1]))
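For the matrix k from the question, the same idea can be sketched with sapply over the row names, which makes the current name directly available inside the function. This assumes the row names are unique; the variable out is introduced here.

```r
k <- matrix(c(1, 2, 3, 4), ncol = 2,
            dimnames = list(c("a", "b"), c("d", "e")))

out <- sapply(rownames(k), function(rn) {
  cat("Doing row:", rn, "\n")   # report which row is being processed
  k[rn, ] + 1                   # the actual per-row calculation
})
```

out has the same layout as the result of apply(k, 1, ...): one column per row of k.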

Resources