Stan: input data for arrays of vectors

I am fairly new to Stan and I am trying to read in some data for my model.
I defined an array of vectors as proposed in the corresponding Stan manual, but I don't know how to write down my input data.
The data block I need looks like this:
data {
int K; // number classes
int N; // number of all data points
vector[2] y[N];
}
For a normal vector vector[k] my input looks like this:
K <- 5
N <- 2
y <- c(8.90680694580078,5.51890277862549)
But I just don't know how to do this for the sort of vector I have.
Something like this doesn't work for N <- 4
y <- c(c(8.90680694580078,5.51890277862549), c(2.00219345092773,10.7796802520752))
Any suggestions?

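In R, you need to pass a matrix with N rows and 2 columns, so that row n holds the vector y[n]. I believe it would also work to pass a list with N elements, each of which is a vector of length 2.

For example, you can create a matrix of the right shape (here with N = 2) like this: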
y <- matrix(rnorm(4), ncol = 2)
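As a minimal sketch using the values from the question, the matrix can be filled row by row so that each row is one vector[2]. The name stan_data is only an illustrative choice, and the value of K is taken from the question's example; the list would then be handed to whatever interface you use to run the model.

```r
# Each row of y corresponds to one element y[n] of `vector[2] y[N]`.
N <- 2
y <- matrix(c(8.90680694580078, 5.51890277862549,
              2.00219345092773, 10.7796802520752),
            nrow = N, ncol = 2, byrow = TRUE)

# Hypothetical data list matching the data block in the question.
stan_data <- list(K = 5, N = N, y = y)

# The same matrix built from a list of length-2 vectors.
y_list <- list(c(8.90680694580078, 5.51890277862549),
               c(2.00219345092773, 10.7796802520752))
y_from_list <- do.call(rbind, y_list)
```

Note that byrow = TRUE matters here: without it, matrix() fills column by column and the pairs would be split across rows.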

Related

"Indexing" (in a mathematical sense) variables in R so that the correct variable is chosen in each iteration of a loop

I have 28 variables. Each one of them is a vector of numeric class. The names of these vectors are sub_1, sub_2, sub_3, and on and on all the way down to sub_28.
What I want to do with these vectors is to compute a system of 28 equations where, in each equation, only one of those vectors is involved.
The right hand-side of each equation is the calculation that I want to make on each vector and the left hand-side is where I want to store the output of each calculation.
So this is what I do. First, I declare a vector of length 28.
alpha = vector("numeric", 28)
Each component of this vector is going to store the corresponding outputs of the calculations.
For example, I want to set alpha[1] equal to
1 + length(sub_1)*(sum(log(sub_1/(1.5))))
And I want to set alpha[2] equal to
1 + length(sub_2)*(sum(log(sub_2/(1.5))))
And so on. You get the idea.
I thought about using a 'for' loop. This is what comes to my mind:
for (i in 1:28) {
alpha[i] = 1 + length(sub_i)*(sum(log(sub_i/(1.5))))}
I know exactly what is wrong with this code. R searches for a variable whose name is literally sub_i, and it won't find that variable because I haven't declared it. What I want is for R to read the _i as a subindex: in each iteration of the loop, it should look for the sub_i vector whose subindex i matches the number of the iteration. How can I achieve that?
Edit: by the way, the 28 vectors have varying lengths.
I would put the 28 numeric variable vectors in a list and apply a function on all list elements.
Simulated data:
set.seed(1) # for reproducibility
# pick some random vector lengths
veclengths <- sample(50:100, 28, replace = TRUE)
# generate random numeric values to generate the vectors
my.variables <- lapply(veclengths, function(x) rnorm(x,100,10))
# name the vectors as in your example (not required)
names(my.variables) <- paste0("sub_", seq_along(my.variables))
# extract these 28 separate vectors as individual variables for your use case
list2env(my.variables , envir = .GlobalEnv)
You could then load your vectors into a list, e.g.
vars <- ls(pattern="sub_.*") # pick variables by name pattern
# I sorted here numerically, for convenience
my.variables <- mget(vars[order(as.numeric(gsub("sub_", "", vars)))])
Then just apply the function you chose to all list elements separately
resfun <- function(x) {1 + length(x)*(sum(log(x/(1.5))))}
alpha <- unlist(lapply(my.variables, resfun))
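If you would rather keep the for loop from the question, a minimal sketch is to build each variable name as a string and look it up with get(). The three short vectors below are made-up stand-ins for the 28 real ones.

```r
# Made-up example vectors of varying lengths (stand-ins for sub_1 ... sub_28).
sub_1 <- c(2, 3, 4)
sub_2 <- c(5, 6)
sub_3 <- c(1.5, 3, 6, 12)

n_vars <- 3
alpha <- numeric(n_vars)
for (i in seq_len(n_vars)) {
  # paste0() builds the name "sub_i"; get() fetches the object with that name.
  x <- get(paste0("sub_", i))
  alpha[i] <- 1 + length(x) * sum(log(x / 1.5))
}
```

Keeping the vectors in a list, as above, is usually easier to maintain than looking variables up by name.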

writing a loop in R with a function

How can I write a simple loop to calculate the following equation?
v0 = v * exp(k*d)
where v is a data frame containing 17631 rows x 15 variables. Every row of v is multiplied by exp(k*d).
where k is a vector containing 15 rate constants, one for each variable.
where d is a vector containing 17631 values, one for each row.
Many thanks!
If you want for loops, you can do it like below
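# for loop by row: row i of v is scaled by exp(d[i] * k)
v0 <- NULL
for (i in seq_len(nrow(v))) {
v0 <- rbind(v0, v[i,]*exp(d[i]*k))
}
# for loop by column: column j of v is scaled by exp(d * k[j])
v0 <- NULL
for (j in seq_len(ncol(v))) {
v0 <- cbind(v0, v[,j]*exp(d*k[j]))
}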
However, the most efficient way is to work with matrices. outer(d, k, "*") builds a 17631 x 15 matrix whose (i, j) entry is d[i]*k[j], so no loop is needed. You can try the code below instead
# matrix approach
v0 <- as.matrix(v)*exp(outer(d,k,"*"))
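As a quick sanity check, here is a small sketch on made-up data (4 rows and 3 variables instead of 17631 and 15) that compares the matrix approach against an explicit row loop. The names v0_matrix and v0_loop are just for this illustration.

```r
set.seed(1)
# Made-up small inputs with the same shapes as in the question.
v <- as.data.frame(matrix(runif(12, 1, 10), nrow = 4, ncol = 3))
k <- c(0.1, 0.2, 0.3)   # one rate constant per column of v
d <- c(1, 2, 3, 4)      # one value per row of v

# Matrix approach: entry (i, j) is v[i, j] * exp(d[i] * k[j]).
v0_matrix <- as.matrix(v) * exp(outer(d, k))

# Explicit loop over rows into a preallocated matrix.
v0_loop <- matrix(NA_real_, nrow = nrow(v), ncol = ncol(v))
for (i in seq_len(nrow(v))) {
  v0_loop[i, ] <- unlist(v[i, ]) * exp(d[i] * k)
}

# Compare the two results, ignoring row and column names.
all.equal(unname(v0_matrix), v0_loop)
```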

for loop only showing result of one case in R

I intend to fill a matrix I created that has 1000 rows and 2 columns. Here B is 1000.
resampled_ests <- matrix(NA, nrow = B, ncol = 2)
colnames(resampled_ests) <- c("Intercept_Est", "Slope_Est")
I want to fill it using a for loop looping from 1 to 1000.
ds <- diamonds[resampled_values[b,],]
Here, each ds (there should be 1000 versions of it in the for loop) is a data frame with 2 columns and 2000 rows, and I would like to use the lm() function to get the beta coefficients from the two columns of data.
for (b in 1:B) {
# Write code that fills in the matrix resampled_ests with coefficient estimates.
ds <- diamonds[resampled_values[b,],]
lm2 <- lm(ds$price~ds$carat, data = ds)
rowx <- coefficients(lm2)
resampled_ests <- rbind(rowx)
}
However, after I run the loop, resampled_ests, which is supposed to be a matrix of 1000 rows, only shows 1 row, that is, 1 pair of coefficients. When I test the code outside of the loop by replacing b with numbers, I get different results, which are correct. But when I put them together in a for loop, I don't seem to be row binding all of these different pairs of coefficients. Can someone explain why the result matrix resampled_ests is only showing one result case (1 row) of data?
rbind(x) with a single argument just returns x as a one-row matrix, because you're not binding it to anything. Each iteration therefore overwrites resampled_ests with the latest row. If you want to build a matrix row by row, you need something like
resampled_ests <- rbind(resampled_ests, rowx)
This also means you need to initialize resampled_ests before the loop.
Since you're initializing it anyway, I would just make a 1000 x 2 matrix of zeros and fill in the rows in the loop. Something like...
# preallocate a B x 2 matrix, then fill one row per resample
resampled_ests <- matrix(0, nrow = B, ncol = 2)
for (b in 1:B) {
ds <- diamonds[resampled_values[b,],]
lm2 <- lm(price ~ carat, data = ds)
resampled_ests[b,] <- coefficients(lm2)
}
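For a version that runs without the diamonds data, here is a self-contained sketch of the same preallocate-and-fill pattern. It assumes the built-in mtcars data set and a regression of mpg on wt as stand-ins, and it builds its own resampled_values matrix of row indices, which may differ from how yours was created.

```r
set.seed(123)
B <- 100                     # number of resamples (1000 in the question)
n <- nrow(mtcars)

# Assumed layout: row b holds the row indices of the b-th resample.
resampled_values <- matrix(sample(n, B * n, replace = TRUE), nrow = B)

# Preallocate the result matrix with named columns.
resampled_ests <- matrix(NA_real_, nrow = B, ncol = 2,
                         dimnames = list(NULL, c("Intercept_Est", "Slope_Est")))

for (b in seq_len(B)) {
  ds <- mtcars[resampled_values[b, ], ]
  # Store the intercept and slope of this resample in row b.
  resampled_ests[b, ] <- coef(lm(mpg ~ wt, data = ds))
}
```

Preallocating and assigning by row index avoids copying the whole matrix on every iteration, which is what repeated rbind calls do.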

R: data frame containing columns of differing length corresponding to maximum possible of combn()/choose()

I am trying to generate a data frame that contains all of the results of possible combinations. I'm using the function
combn(x,m)
x <- 17
m <- some range of the numbers between 2 and 16
in a loop where each iteration corresponds to a new value of m. Each iteration of the loop returns a vector of length choose(n,k), where n is equivalent to x and k is equivalent to m. I want to append each resulting vector as a column in a data frame that contains all of the results, but this is not straightforward since the length of each vector varies. I have been able to accomplish this by first creating a data frame of NA values (data.frame) that is then filled incrementally with the values of new.vector using the loop below:
n <- max(length(data.frame), length(new.vector))
for(l in 0:n) {
data.frame[l,j-1] <- new.vector[l]
}
I have two questions:
Is there a better way to append a new column that differs in length from the previous columns in the data frame that uses the power of R and vector operations rather than doing this via a loop?
Since this method works, I can go with it, but I've struggled to find the way to set the maximum number of rows in the dataframe that I initialize. It should be the maximum of choose(n,k1), choose(n,k2), choose(n,k3) ... choose(n,kn). I'm currently using the below to initialize the dataframe, but it generates the absolute maximum for a given n, which may be more rows than necessary depending on the range of k values.
dataframe <- data.frame(matrix(NA, nrow = ncol(combn(n,length(n)/2)),
ncol = max.n-min.n+1))
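One possible approach to both questions, sketched under assumptions: collect the vectors in a list, compute the required number of rows directly as max(choose(x, ms)), and pad each vector with NA before combining. The small x, the range ms, and FUN = sum are all made-up stand-ins, since the actual per-combination calculation is not shown in the question.

```r
x  <- 6      # small stand-in for 17
ms <- 2:4    # hypothetical range of m values

# With FUN supplied, combn() returns one value per combination,
# i.e. a vector of length choose(x, m).
results <- lapply(ms, function(m) combn(x, m, FUN = sum))
names(results) <- paste0("m_", ms)

# Only as many rows as the longest vector actually needs.
n_rows <- max(choose(x, ms))

# Assigning a longer length pads a vector with NA.
padded <- lapply(results, function(v) {
  length(v) <- n_rows
  v
})
combo_df <- as.data.frame(padded)
```

If the padding is not needed for later steps, keeping results as a plain list avoids the NA values altogether.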

Method in [R] for arrays of data frames

I am looking for a best practice to store multiple vector results of an evaluation performed at several different values. Currently, my working code does this:
q <- 55
value <- c(0.95, 0.99, 0.995)
a <- rep(0,q) # Just initialize the vector
b <- rep(0,q) # Just initialize the vector
for(j in 1:length(value)){
for(i in 1:q){
a[i]<-rnorm(1, i, value[j]) # just as an example function
b[i]<-rnorm(1, i, value[j]) # just as an example function
}
df[j] <- data.frame(a,b)
}
I am trying to find the best way to store the individual a and b vectors for each level of value, so that I can iterate through the variable "value" later for graphing and have each value of "value" (and/or a description of it) available.
I'm not exactly sure what you're trying to do, so let me know if this is what you're looking for.
q = 55
value <- c(sd95=0.95, sd99=0.99, sd995=0.995)
a = sapply(value, function(v) {
rnorm(q, 1:q, v)
})
In the code above, we avoid the inner loop by vectorizing. For example, rnorm(55, 1:55, 0.95) gives you 55 random normal deviates, the first drawn from a distribution with mean=1, the second from a distribution with mean=2, and so on. Also, you don't need to initialize a.
sapply takes the place of the outer loop. It applies a function to each element of value and returns the three vectors of random draws as the columns of a matrix a (wrap the call in as.data.frame() if you need a data frame). I've added names to the elements of value, and sapply uses those as the column names of a. (It would be more standard to make value a list, rather than a vector with named elements. You can do that with value <- list(sd95=0.95, sd99=0.99, sd995=0.995) and the code will otherwise run the same.)
You can create multiple data frames and store them in a list as follows:
q <- list(a=10, b=20)
value <- list(sd95=0.95, sd99=0.99, sd995=0.995)
# lapply always returns a list, one element per entry of q
df.list <- lapply(q, function(i) {
# sapply returns an i x 3 matrix; convert it to a data frame
as.data.frame(sapply(value, function(v) {
rnorm(i, 1:i, v)
}))
})
This time we have two different values for q, and we wrap the sapply code from above inside a call to lapply. The inner sapply does the same thing as before, but now it gets the value of q from the outer lapply (using the dummy variable i). We're creating two data frames, one called a and the other called b. a has 10 rows and b has 20 (due to the values we set in q). Both data frames are stored in a list called df.list.
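If the goal is to keep the level of value and a description of it next to the draws for later graphing, one alternative sketch is a single long data frame with one block of rows per level. The column names level, sd and i, and the small q, are made up for this illustration.

```r
set.seed(42)
q <- 5
value <- c(sd95 = 0.95, sd99 = 0.99, sd995 = 0.995)

# One data frame per level, carrying both its label and its numeric value.
results <- lapply(names(value), function(nm) {
  data.frame(level = nm,
             sd    = value[[nm]],
             i     = seq_len(q),
             a     = rnorm(q, seq_len(q), value[[nm]]),
             b     = rnorm(q, seq_len(q), value[[nm]]))
})

# Stack the per-level data frames into one long data frame.
long <- do.call(rbind, results)
```

You can then iterate over the levels with split(long, long$level), or pass long directly to a plotting function that groups by level.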
