Looping through an R vector to apply a formula - r

I am trying to loop through two vectors and apply a formula to the data but I can't figure out how to do what I want to do...
Basically here is some sample data...two vectors that I need to reference...
v1 <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17)
v2 <- c(0.01,0.06,0.11,0.16,0.21,0.26,0.31,0.36,0.41,0.46,0.51,0.56,0.61,0.66,0.71,0.76,0.81)
This is the formula I would like to apply...this would be the first instance...
(1+v2[2])^v1[2] / (1+(v2[1]^v1[1]))
The second iteration would be...
(1+v2[3])^v1[3] / (1+(v2[2]^v1[2]))
So I am trying to do this but I can't figure out how to iterate the vector index...I am trying along the lines of this but I can't increase the index via a counter, e.g., i...
for (i in seq_along(v1) {
i <- 0
res[i] <- ((1+v2[2+i])^v1[2+i] / (1+(v2[1+i]^v1[1+i]))
i <- i+1
}
After going through the loop I would get the following output...
> res
[1] 1.112475 1.217187 1.323924 1.432501 1.542753 1.654534 1.767715 1.882178 1.997821 2.114550 2.232280 2.350938 2.470454 2.590767 2.711821 2.833564
I have done lots of searching but I am coming up blank, any suggestions?

You can do this in one go with res <- (1+v2[-1])^v1[-1] / (1+(v2[-17]^v1[-17]))
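Basically this works on length-16 slices of v1 and v2: [-1] drops the first element (for the numerator) and [-17] drops the last (for the denominator), so element i of one slice lines up with element i+1 of the other. R's natural vector processing then computes all 16 results at once.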
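If it helps, here is a small sketch that writes the same idea without hard-coding 17 and checks it against an explicit loop. The names n, res_vec and res_loop are only illustrative and not part of the original answer.

```r
v1 <- 1:17
v2 <- c(0.01, 0.06, 0.11, 0.16, 0.21, 0.26, 0.31, 0.36, 0.41,
        0.46, 0.51, 0.56, 0.61, 0.66, 0.71, 0.76, 0.81)
n <- length(v1)

# Vectorized: numerator drops the first element, denominator drops the last
res_vec <- (1 + v2[-1])^v1[-1] / (1 + v2[-n]^v1[-n])

# Equivalent loop: the for statement advances i itself, no manual counter
res_loop <- numeric(n - 1)
for (i in seq_len(n - 1)) {
  res_loop[i] <- (1 + v2[i + 1])^v1[i + 1] / (1 + v2[i]^v1[i])
}

all.equal(res_vec, res_loop)
```

The loop shows what went wrong in the question: resetting i inside the body undoes the iteration, whereas for already steps i through the sequence.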

Related

Problem with checking logical within for loop

Inspired by the LeetCode challenge "Two Sum", I wanted to solve it in R. But while trying to solve it by brute force I ran into an issue with my for loop.
The basic idea is: given a vector of integers, find the two integers in the vector that sum to a given target integer.
First I create 10000 integers:
set.seed(1234)
n_numbers <- 10000
nums <- sample(-10^4:10^4, n_numbers, replace = FALSE)
Then I run a for loop within a for loop to check every single element against each other.
# ensure that it is actually solvable
target <- nums[11] + nums[111]
test <- 0
for (i in 1:(length(nums)-1)) {
for (j in 1:(length(nums)-1)) {
j <- j + 1
test <- nums[i] + nums[j]
if (test == target) {
print(i)
print(j)
break
}
}
}
My problem is that it starts wildly printing numbers before ever getting to the right condition of test == target. And I cannot seem to figure out why.
I think there are several issues with your code:
First, you don't have to increase your j manually, you can do this within the for-statement. So if you really want to increase your j by 1 in every step you can just write:
for (j in 2:(length(nums)))
Second, break only exits the inner loop; the outer loop keeps running. See the question "Breaking out of nested loops in R" for further information on that.
Third, several pairs of entries in nums sum to target. Therefore, your if condition works as written and prints every combination of i and j for which nums[i]+nums[j] equals target.
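One way to stop at the first match is to wrap the loops in a function and use return(), which leaves both loops at once. This is only a sketch; the function name two_sum_brute is made up for illustration, and starting j at i + 1 is my own choice to avoid pairing an element with itself.

```r
# Brute-force search: returns the indices of the first pair summing to target,
# or NULL if no such pair exists.
two_sum_brute <- function(nums, target) {
  n <- length(nums)
  if (n < 2) return(NULL)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {          # only look at elements after position i
      if (nums[i] + nums[j] == target) {
        return(c(i, j))             # return() exits both loops immediately
      }
    }
  }
  NULL
}

two_sum_brute(c(5, -2, 9, 4, 7), 2)
```

Here the call finds positions 2 and 4, since -2 + 4 equals 2.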

How to create a matrix or list of results using a loop?

I am performing a loop to compute the values of 4 expressions. My loop is:
for (i in c(1:14)){
VV1a <- round((db$Ya1[i]^Comb$Sigma)+ (1/(exp(log(1/p1a)^Comb$Alpha)))*
((db$Xa1[i]^Comb$Sigma)-(db$Ya1[i]^Comb$Sigma)),1)
VV1b <- round((db$Yb1[i]^Comb$Sigma)+ (1/(exp(log(1/p1b)^Comb$Alpha)))*
((db$Xb1[i]^Comb$Sigma)-(db$Yb1[i]^Comb$Sigma)),1)
VV2a <- round((db$Ya2[i]^Comb$Sigma)+ (1/(exp(log(1/p2a)^Comb$Alpha)))*
((db$Xa2[i]^Comb$Sigma)-(db$Ya2[i]^Comb$Sigma)),1)
VV2b <- round((db$Yb2[i]^Comb$Sigma)+ (1/(exp(log(1/p2b)^Comb$Alpha)))*
((db$Xb2[i]^Comb$Sigma)-(db$Yb2[i]^Comb$Sigma)),1)
}
Each of these expressions yields 2105401 values. However, with this loop R overwrites the objects on every iteration (of course), so in the end VV1a, ... contain only the result of the last iteration (i.e. i = 14).
How do I keep all the computation? To be more specific: ideally, for each, I would like to have a vector of the values computed.
Use a list().
Assuming that you're doing different calculations for VV1a, VV1b, etc., you could store the resulting vector of every iteration i as an element of a list.
results <- list()
for (i in c(1:14)){
results[["VV1a"]][[i]] <- list(your_calculations_which_result_in_a_vector)
....
}
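To make that concrete, here is a runnable sketch for one of the four expressions. The contents of db, Comb and p1a are tiny made-up stand-ins (the real objects are not shown in the question), and pre-allocating with vector("list", n) is my own variation on the idea.

```r
# Hypothetical stand-ins for the objects used in the question
db   <- data.frame(Xa1 = c(10, 20, 30), Ya1 = c(1, 2, 3))
Comb <- data.frame(Sigma = c(0.5, 1), Alpha = c(0.8, 1))
p1a  <- 0.5

n <- nrow(db)
VV1a <- vector("list", n)            # one slot per iteration

for (i in seq_len(n)) {
  w <- 1 / exp(log(1 / p1a)^Comb$Alpha)
  VV1a[[i]] <- round(db$Ya1[i]^Comb$Sigma +
                     w * (db$Xa1[i]^Comb$Sigma - db$Ya1[i]^Comb$Sigma), 1)
}

# Optionally stack the per-iteration vectors into a matrix (one row per i)
VV1a_mat <- do.call(rbind, VV1a)
dim(VV1a_mat)
```

VV1a[[i]] then holds the full vector for iteration i, and the matrix form gives one row per i and one column per row of Comb.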

Indexing variables in R

I am normally a Maple user currently working with R, and I have a problem with correctly indexing variables.
Say I want to define 2 vectors, v1 and v2, and I want to call the nth element in v1. In Maple this is easily done:
v[1]:=some vector,
and the nth element is then called by the command
v[1][n].
How can this be done in R? The actual problem is as follows:
I have a sequence M (say of length 10, indexed by k) of simulated negbin variables. For each of these simulated variables I want to construct a vector X of length M[k] with entries given by some formula. So I should end up with 10 different vectors, each of different length. My incorrect code looks like this
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
for(k in 1:sims){
x[k]<-rep(NA,M[k])
X[k]<-rep(NA,M[k])
for(i in 1:M[k]){x[k][i]<-runif(1,min=0,max=1)
if(x[k][i]>=0 & x[i]<=0.1056379){
X[k][i]<-rlnorm(1, 6.228244, 0.3565041)}
else{
X[k][i]<-rlnorm(1, 8.910837, 1.1890874)
}
}
}
The error appears to be that x[k] is not a valid name for a variable. Any way to make this work?
Thanks a lot :)
I've edited your R script slightly to get it working and make it reproducible. To do this I had to assume that eks_2016_kasko was an integer value of 10.
library(MASS)
sims <- 10
# R is 1-indexed and 1:0 counts down, so add one to keep every M[k] >= 1
M <- rnegbin(sims, 10*exp(-2.17173), 840.1746) + 1
# Create lists that hold one vector per simulation
x <- list()
X <- list()
for (k in 1:sims) {
  x[[k]] <- rep(NA, M[k])
  X[[k]] <- rep(NA, M[k])
  for (i in 1:M[k]) {
    x[[k]][i] <- runif(1, min=0, max=1)
    if (x[[k]][i] >= 0 && x[[k]][i] <= 0.1056379) {
      X[[k]][i] <- rlnorm(1, 6.228244, 0.3565041)
    } else {
      X[[k]][i] <- rlnorm(1, 8.910837, 1.1890874)
    }
  }
}
This will work and I think it is what you were trying to do, BUT it is not great R code. I strongly recommend using the lapply family instead of for loops, and learning data.table and parallelisation if you need things to scale. Additionally, if you want to read more about indexing and subsetting in R, Hadley Wickham has written a comprehensive breakdown.
Hope this helps!
Let me start with a few remarks and then show you how your problem can be solved using R.
In R, there is most of the time no need to use a for loop in order to assign several values to a vector. So, for example, to fill a vector of length 100 with uniformly distributed random variables, you do something like:
set.seed(1234)
x1 <- rep(NA, 100)
for (i in 1:100) {
x1[i] <- runif(1, 0, 1)
}
(set.seed() is used to set the random seed, such that you get the same result each time.) It is much simpler (and also much faster) to do this instead:
# reset the seed so both approaches draw the same random numbers
set.seed(1234)
x2 <- runif(100, 0, 1)
identical(x1, x2)
## [1] TRUE
As you see, results are identical.
The reason that x[k]<-rep(NA,M[k]) does not work is that x[k] is indeed not a variable name in R. [ is used for indexing, so x[k] refers to element k of a vector x. Assigning a vector of length greater than 1 to that single element cannot work: if x does not exist yet you get an error, and if it does, R keeps only the first value and warns. What you probably want to use is a list, as you will see in the example below.
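As a quick illustration of the difference between [ and [[ on a list, here is a minimal sketch; the list v and its values are made up for the example. It is roughly the R counterpart of Maple's v[1][n].

```r
# A list can hold vectors of different lengths
v <- list(c(10, 20, 30), c(1, 2))

v[[1]]        # the first vector itself
v[[1]][2]     # its second element, i.e. 20
v[1]          # still a list (of length 1), not the vector

# Elements can be replaced or added in the same way
v[[3]] <- rep(NA, 4)
lengths(v)    # 3 2 4
```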
So here comes the code that I would use instead of what you proposed in your post. Note that I am not sure that I correctly understood what you intend to do, so I will also describe below what the code does. Let me know if this fits your intentions.
# define M
library(MASS)
eks_2016_kasko <- 486689.1
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
# define the function that calculates X for a single value from M
calculate_X <- function(m) {
  x <- runif(m, min=0, max=1)
  # as in the question: x <= 0.1056379 selects the first log-normal
  ifelse(x <= 0.1056379, rlnorm(m, 6.228244, 0.3565041),
         rlnorm(m, 8.910837, 1.1890874))
}
# apply that function to each element of M
X <- lapply(M, calculate_X)
As you can see, there are no loops in that solution. I'll start to explain at the end:
lapply is used to apply a function (calculate_X) to each element of a list or vector (here it is the vector M). It returns a list. So, you can get, e.g. the third of the vectors with X[[3]] (note that [[ is used to extract elements from a list). And the contents of X[[3]] will be the result of calculate_X(M[3]).
The function calculate_X() does the following: It creates a vector of m uniformly distributed random values (remember that m runs over the elements of M) and stores that in x. Then it creates a vector X that contains log normally distributed random variables. The parameters of the distribution depend on the value x.
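To see what lapply hands back, here is a small self-contained sketch. The vector M is a made-up stand-in for the simulated counts (so MASS is not needed), and the cutoff direction follows the if condition in the question.

```r
set.seed(42)
M <- c(3, 0, 5)                      # hypothetical counts instead of rnegbin()

calculate_X <- function(m) {
  x <- runif(m)
  ifelse(x <= 0.1056379,
         rlnorm(m, 6.228244, 0.3565041),
         rlnorm(m, 8.910837, 1.1890874))
}

X <- lapply(M, calculate_X)

lengths(X)    # one vector per element of M, with matching lengths
X[[3]]        # the vector built from M[3]
```

Note that a count of 0 simply produces an empty vector here, which a 1:M[k] loop would not handle correctly.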

Logical comparison of two vectors with binary (0/1) result

For an assignment I had to create a random vector theta, a vector p containing for each element of theta the associated probability, and another random vector u. No problems thus far, but I'm stuck with the next instruction which I report below:
Generate a vector r1 that has a 1 in position i if p_i ≥ u_i and 0 if p_i < u_i. The vector r1 is a Rasch item given the latent variable theta.
theta=rnorm(1000,0,1)
p=(exp(theta-1))/(1+exp(theta-1))
u=runif(1000,0,1)
I tried the following code, but it doesn't work.
r1<-for(i in 1:1000){
if(p[i]<u[i]){
return("0")
} else {
return("1")}
}
You can use the ifelse function:
r1 <- ifelse(p >= u, 1, 0)
Or you can simply convert the logical comparison into a numeric vector, which turns TRUE into 1 and FALSE into 0:
r1 <- as.numeric(p >= u)
@DavidRobinson gave a nice working solution, but let's look at why your attempt didn't work:
r1<-for(i in 1:1000){
if(p[i]<u[i]){
return("0")
} else {
return("1")}
}
We've got a few problems, the biggest of which is that you're confusing for loops with general functions, both by assigning the loop to r1 and by using return(). return() is used when you are writing your own function, with name <- function(...) { ... }. Inside a for loop it isn't needed. A for loop just runs the code inside it a certain number of times; unlike a function, it can't return anything.
You do need a way to store your results. This is best done by pre-allocating a results vector, and then filling it inside the for loop.
r1 <- rep(NA, length(p)) # create a vector as long as p
for (i in 1:1000) {
if (p[i] < u[i]) { # compare the ith element of p and u
r1[i] <- 0 # put the answer in the ith element of r1
} else {
r1[i] <- 1
}
}
We could simplify this a bit. Rather than bothering with the if and the else, you could start r1 as all 0's, and then only change it to a 1 if p[i] >= u[i]. Just to be safe I think it's better to make the for statement something like for (i in 1:length(p)), or best yet for (i in seq_along(p)). But the beauty of R is how few for loops are necessary, and @DavidRobinson's vectorized suggestions are far cleaner.
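Putting the simplification together, a sketch could look like this. The seed and the names r1_loop and r1_vec are only there so the two versions can be compared.

```r
set.seed(1)
theta <- rnorm(1000, 0, 1)
p <- exp(theta - 1) / (1 + exp(theta - 1))
u <- runif(1000, 0, 1)

# Loop version: start with all zeros, flip to 1 where p[i] >= u[i]
r1_loop <- numeric(length(p))
for (i in seq_along(p)) {
  if (p[i] >= u[i]) r1_loop[i] <- 1
}

# Vectorized version from the other answer
r1_vec <- as.numeric(p >= u)

identical(r1_loop, r1_vec)
```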

How to print the name of current row when using apply in R?

For example, I have a matrix k
> k
d e
a 1 3
b 2 4
I want to apply a function on k
> apply(k,MARGIN=1,function(p) {p+1})
a b
d 2 3
e 4 5
However, I also want to print the row name of the row being processed, so that I know which row the function is being applied to at that time.
It might look like this:
apply(k,MARGIN=1,function(p) {print(rowname(p)); p+1})
But I really don't know how to do that in R.
Does anyone have any idea?
Here's a neat solution to what I think you're asking. (I've called the input matrix mat rather than k for clarity - in this example, mat has 2 columns and 10 rows, and the rows are named abc1 through to abc10.)
In the code below, the result out1 is the thing you wanted to calculate (the outcome of the apply command). The result out2 comes out identically to out1 except that it prints out the rownames that it is working on (I put in a delay of 0.3 seconds per row so you can see it really does do this - take this out when you want the code to run full speed obviously!)
The trick I came up with was to cbind the row numbers (1 to n) onto the left of mat (to create a matrix with one additional column), and then use this to refer back to the rownames of mat. Note the line x = y[-1] which means that the actual calculation within the function (here, adding 1) ignores the first column of row numbers, which means it's the same as the calculation done for out1. Whatever sort of calculation you want to perform on the rows can be done this way - just pretend that y never existed, and formulate your desired calculation using x. Hope this helps.
set.seed(1234)
mat = as.matrix(data.frame(x = rpois(10,4), y = rpois(10,4)))
rownames(mat) = paste("abc", 1:nrow(mat), sep="")
out1 = apply(mat,1,function(x) {x+1})
out2 = apply(cbind(seq_len(nrow(mat)),mat),1,
function(y) {
x = y[-1]
cat("Doing row:",rownames(mat)[y[1]],"\n")
Sys.sleep(0.3)
x+1
}
)
identical(out1,out2)
You can use a variable outside of the apply call to keep track of the row index and pass the row names as an extra argument to your function:
idx <- 1
apply(k, 1, function(p, rn) {print(rn[idx]); idx <<- idx + 1; p + 1}, rownames(k))
This should work. The cat() function is what you want to use when printing results during evaluation of a function. paste(), conversely, just returns a character vector but doesn't send it to the command window.
The solution below uses a counter created as a closure, allowing it to "remember" how many times the function has been run before. Note the use of the superassignment operator <<-, which modifies i in the enclosing function's environment rather than creating a local copy. If you really want to understand what's going on here, I recommend reading through this wiki https://github.com/hadley/devtools/wiki/
Note there may be an easier way to do this; my solution assumes that there is no way to access the rownumber or rowname of a current row using typical means within an apply function. As previously mentioned, this would be no problem in a loop.
k <- matrix(c(1,2,3,4),ncol=2)
rownames(k) <- c("a","b")
colnames(k) <- c("d","e")
make.counter <- function(x){
i <- 0
function(){
i <<- i+1
i
}
}
counter1 <- make.counter()
apply(k,MARGIN=1,function(p){
current.row <- rownames(k)[counter1()]
cat(current.row,"\n")
return(p+1)
})
As far as I know you cannot do that with apply, but you could loop through the rownames of your data frame. Lame example:
lapply(rownames(mtcars), function(x) sprintf('The mpg of %s is %s.', x, mtcars[x, 1]))
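Applied to the matrix k from the question, the same idea might look like the sketch below: iterate over the row names with sapply and index the matrix by name. This is just one possible variation, and the name out is illustrative.

```r
k <- matrix(c(1, 2, 3, 4), ncol = 2,
            dimnames = list(c("a", "b"), c("d", "e")))

# Loop over the row names instead of the rows, so the name is always at hand
out <- sapply(rownames(k), function(rn) {
  cat("Doing row:", rn, "\n")
  k[rn, ] + 1
})

out
```

The result has the same layout as apply(k, 1, function(p) p + 1), with one column per original row.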
