Vector-matrix multiplication in R

I want to multiply 1000 random variables by a matrix so as to get 1000 different resultant matrices.
I'm running the following code:
Threshold <- runif(1000, min = 0, max = 1) # Generating 1000 random variables so that we can see 1000 multiple results of Burstscore
Burstscore <- matrix(data = 0, nrow = nrow(Fm2), ncol = ncol(Fpre2))
# Calculating the final burst score
for (i in 1:nrow(Fm2)) {
  for (j in 1:ncol(Fpre)) { # Dimensions of all the matrices (Fpre, Fm, Growth, TD, Burstscore) are 432, 24
    {
      Burstscore[i,j] = ((as.numeric(Threshold))*(as.numeric(Growth[i,j]))) + ((1-(as.numeric(Threshold)))*(as.numeric(TD[i,j])))
    }
  }
}
I'm getting the following error -
'Error in Burstscore[i, j] = ((as.numeric(Threshold)) * (as.numeric(Growth[i, :
number of items to replace is not a multiple of replacement length'

You are trying to put 1000 values into a single cell of the Burstscore matrix, because you multiply each [i,j] element by the entire "Threshold" vector. Apart from this, your code contains unnecessary elements (extra brackets and as.numeric() calls). And, as said above, it is not fully reproducible, so I had to "invent" several matrices.
I guess that what you want to do is the following:
Threshold <- runif(1000, min = 0, max = 1)
Growth <- matrix(runif(432*24), ncol = 24)
TD <- matrix(runif(432*24), ncol = 24) # invented, like Growth, so the example runs
Burstscore <- vector("list", length(Threshold))
for (i in 1:length(Threshold)) {
  Burstscore[[i]] <- (Threshold[i] * Growth) + ((1 - Threshold[i]) * TD)
}
In R, it would be even more elegant to use lapply():
Burstscore <- lapply(Threshold, function(x) (x * Growth) + ((1 - x) * TD))
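A quick sanity check of the result (a sketch, using the invented Growth and TD matrices from above): Burstscore should be a list of 1000 matrices, each 432 x 24, and each one should obey the stated formula.
length(Burstscore)    # 1000
dim(Burstscore[[1]])  # 432 24
# spot-check the formula for the first threshold value
all.equal(Burstscore[[1]], Threshold[1] * Growth + (1 - Threshold[1]) * TD)  # TRUE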
Finally, I suggest you also give your question a more meaningful title, so it could potentially be helpful to others as well.

Related

Replacing the first n values of each R dataframe column according to function

I'm trying to compare a "regular" data set to a contaminated one; however, I'm having trouble creating the contaminated data set.
Each list contains 25 data frames, one for each sample size n; each data frame contains m = 850 samples of size n = {100, 200, ..., 2500} drawn from an exponential distribution.
I have tried replacing the first n/4 items of each sample for each data-frame.
The current way I am doing it adds extra entries to the contaminated data-frames, which I do not want - I merely wish to replace them.
However, if I switch c(j) with c(1:n/4), an error pops up saying replacement has 25 rows, data has 100.
What could I do better?
set.seed(915)
n_lst <- seq(from = 100, to = 2500, by = 100)
m_lst <- seq(from = 1, to = 850, by = 1)
l = list()
lCont = list()
i = 1
for (n in n_lst) {
  l[[i]] = lCont[[i]] = data.frame(replicate(850, rexp(n, 0.73)))
  for (j in m_lst) {
    lCont[[i]][c(j), c(1:n/4)] = rexp(n/4, 0.01)
  }
  i <- i + 1
}
Below are the original list and the contaminated list (sorry about the formatting issues; I was having trouble with the formatting verification).
Original List
Contaminated List
The main problem is that you are indexing using [columns, rows], which is backwards. R indexes data frames and matrices as [rows, columns]. Switching to lCont[[i]][1:(n / 4), j] will solve that.
Also note that : binds more tightly than / in R's order of operations, so 1:n / 4 means (1:n) / 4; you want 1:(n / 4).
And a last comment, c() is only needed if you're combining more than one thing, like c(1:5, 12). c(j) is a long way to write j.
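Putting the three fixes together, the corrected loop might look like this (a sketch, assuming the goal is to overwrite the first n/4 rows of every column):
set.seed(915)
n_lst <- seq(from = 100, to = 2500, by = 100)
l <- list()
lCont <- list()
i <- 1
for (n in n_lst) {
  l[[i]] <- lCont[[i]] <- data.frame(replicate(850, rexp(n, 0.73)))
  for (j in 1:850) {
    # rows first, then columns; 1:(n / 4), not 1:n / 4; plain j instead of c(j)
    lCont[[i]][1:(n / 4), j] <- rexp(n / 4, 0.01)
  }
  i <- i + 1
}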

For loop: number of items to replace is not a multiple of replacement length

I am working on an R Markdown document using a DICE (climate-economy) model.
I have a function YPC that depends on two variables tt and r. My goal is to obtain a matrix that contains all possible images of YPC (for all possible values of r and tt), with the cell YPC[i,j] containing the value of YPC for tt = i and r = j. For each row of my matrix (each value of tt), I want every cell after a certain value r = l[tt] to be filled with the value 0.
Example: if l[1] = 100, then all the cells in the first row after the 100th column should be filled with zeros.
I create my matrix and define its cells as follows
YPC <- matrix(nrow = NT, ncol = l[100])
for (i in 1:NT) {
  for (j in 1:l[100]) {
    if (j <= l[i]) {
      YPC[i,j] <- fYPC(Y, r = j, tt = i)
    } else {
      YPC[i,j] <- 0
    }
  }
}
When I print the generated cells, I see that it works well up to the l[i] value. At the l[i] value, it prints "Inf" together with a warning message: "number of items to replace is not a multiple of replacement length". At the end of the computation, I get a "no loop for break/next, jumping to top level" error.
I don't write all my code here as it is quite long and only few elements seem relevant.
Also, a reproducible example is hard to provide as I don't use a regular dataset here. I only have simple data for parameters of the model that my code builds on to create several equations that use those parameters.
EDIT
Thanks for your answers.
Here is how fYPC is defined :
fYPC <- function(Y, r, tt = NULL) {
  if (is.null(tt)) {
    output <- exp(Y/l + et2*sqrt(2)*erfinv(2*r/l - 1)) + exp(Y/l + et2*sqrt(2)*erfinv(2*(r-1)/l - 1))
  } else {
    output <- exp(Y[tt]/l[tt] + et2*sqrt(2)*erfinv(2*r/l[tt] - 1)) + exp(Y[tt]/l[tt] + et2*sqrt(2)*erfinv(2*(r-1)/l[tt] - 1))
  }
  return(output)
}
In the is.null part, Y is a single value, so the output is also of length 1.
In the else part, Y is a vector, but Y[tt] and l[tt] are of length 1, and so is the final output.
About NT, you can actually replace it by 100. And l[i] is defined as follows (with pop0, popasym, popadj equal to 100, 1000 and 0.134 respectively):
l = pop0
for(i in 2:NT) l[i] <- l[i-1] * (popasym / l[i-1])**popadj
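For what it's worth, the Inf at the boundary is consistent with what fYPC does there: when r reaches l[tt], the argument of erfinv becomes exactly 1, and the inverse error function diverges at 1. A small self-contained sketch (illustrative values, not the model's actual objects):
l_tt <- 100                     # illustrative stand-in for l[tt]
r <- l_tt                       # the boundary case r = l[tt]
arg <- 2 * r / l_tt - 1         # exactly 1
# erfinv(y) equals qnorm((y + 1) / 2) / sqrt(2), and qnorm(1) is Inf,
# so exp(Y[tt]/l[tt] + et2 * sqrt(2) * erfinv(arg)) evaluates to Inf at this point
qnorm((arg + 1) / 2) / sqrt(2)  # Inf
The replacement-length warning, on the other hand, appears whenever the value assigned to the single cell YPC[i,j] has length greater than 1, so it would be worth checking the lengths of Y, l and et2 at that point.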

"subscript out of bounds" in R for a large for loop

I have a for loop which works perfectly for a (relatively) small number of repetitions, times = 10 or 100. But for larger values of "times" I get an error while filling in the matrix: subscript out of bounds... (See the code below and the code explanation.)
Error:
Error in M_zp_var[j, (1:n)] : subscript out of bounds
To fill the matrix I use a second for loop inside the first one:
M_zp <- matrix(numeric(1), B, N)
for (j in 1:dim(M_zp)[1]) { M_zp[j,] <- Z_pi(x,y) }
and I even tried it with
M_zp<-t(replicate(B, Z_pi(x,y)))
instead, but it does not work either.
As I said, the code works if my outer for loop is small. I do not understand why it should behave differently if I choose a larger value for "times" (1000, 5000).
I know the meaning of "subscript out of bounds"
I hope you can help me; I would be so grateful! :(
Code:
T_p <- matrix(numeric(1), times, Bvar)
H0 <- numeric(times)
for (i in 1:times) {
  # Each loop starts with a new random vector
  x <- rnorm(n) # x <- rdist(n,...)
  y <- rnorm(m) # y <- rdist(m,...)
  # Order statistic with T_pi's: T(Z(pi))
  n <- length(x)
  m <- length(y)
  N <- n + m
  # Permute (x,y) B times
  M_zp <- matrix(numeric(1), B, N)                       # <-- the code stops working here
  for (j in 1:dim(M_zp)[1]) { M_zp[j,] <- Z_pi(x,y) }    #     (see below for the Z_pi() function)
  # M_zp <- t(replicate(B, Z_pi(x,y)))
  M_zp_var <- unique(M_zp)
  Bf <- dim(unique(M_zp))[1]
  # Test statistic computation for each one of the permutations in M_zp_var
  T_pvec <- numeric(Bvar)
  for (j in 1:Bvar) {
    xp <- M_zp_var[j, (1:n)]
    yp <- M_zp_var[j, ((n+1):(n+m))]
    m_xp <- mstern(xp)
    m_yp <- mstern(yp)
    T_pvec[j] <- (sd(xp) - sd(yp)) / sqrt(m_xp/n + m_yp/m)
  }
  T_p[i,] <- sort(T_pvec)
}
Z_pi <- function(x, y) {
  n <- length(x)
  m <- length(y)
  x <- sort(x)
  y <- sort(y)
  wicy <- function(s) { wi <- which(y == s); return(wi) }
  wicx <- function(s) { wi <- which(x == s); return(wi) }
  r <- sample(1:min(n,m), 1)  # integer between 1 and min(n, m) = number of entries to interchange between x and y
  zwy <- sample(y, r)         # vector of r entries of y (chosen randomly) = entries to interchange with x
  wy <- unlist(lapply(zwy, wicy))
  zwx <- sample(x, r)
  wx <- unlist(lapply(zwx, wicx))
  yp <- y
  xp <- x
  yp[wy] <- zwx
  xp[wx] <- zwy
  zp <- c(sort(xp), sort(yp))
  return(zp)
}
Explanation:
The loop simulates a permutation test "times" times (Monte Carlo). Each iteration generates 2 random samples and then permutes them randomly B = 16000 times (a permutation is made with the function Z_pi()); it then keeps only the distinct permutations (unique()) and calculates a test statistic from each one... The code stops working at the marked rows above, which involve the function Z_pi().
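One thing that stands out in the posted code (an observation, not a verified fix): M_zp_var <- unique(M_zp) keeps only Bf distinct rows, yet the inner loop indexes M_zp_var[j, (1:n)] for j up to Bvar, so whenever Bvar exceeds Bf the index runs past the last row. A minimal illustration of that failure mode:
M <- matrix(c(1, 2,
              1, 2), nrow = 2, byrow = TRUE)  # two identical rows
U <- unique(M)                                # only one distinct row survives
nrow(U)                                       # 1
# U[2, ]  # Error in U[2, ] : subscript out of bounds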

Indexing variables in R

I am normally a Maple user currently working with R, and I have a problem with correctly indexing variables.
Say I want to define 2 vectors, v1 and v2, and I want to call the nth element of v1. In Maple this is easily done:
v[1]:=some vector,
and the nth element is then called by the command
v[1][n].
How can this be done in R? The actual problem is as follows:
I have a sequence M (say of length 10, indexed by k) of simulated negbin variables. For each of these simulated variables I want to construct a vector X of length M[k], with entries given by some formula. So I should end up with 10 different vectors, each of a different length. My incorrect code looks like this:
sims <- 10
M <- rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
for (k in 1:sims) {
  x[k] <- rep(NA, M[k])
  X[k] <- rep(NA, M[k])
  for (i in 1:M[k]) {
    x[k][i] <- runif(1, min = 0, max = 1)
    if (x[k][i] >= 0 & x[i] <= 0.1056379) {
      X[k][i] <- rlnorm(1, 6.228244, 0.3565041)
    } else {
      X[k][i] <- rlnorm(1, 8.910837, 1.1890874)
    }
  }
}
The error appears to be that x[k] is not a valid name for a variable. Any way to make this work?
Thanks a lot :)
I've edited your R script slightly to get it working and make it reproducible. To do this I had to assume that eks_2016_kasko was an integer value of 10.
require(MASS)
sims <- 10
# Because R is not zero indexed, add one
M <- rnegbin(sims, 10*exp(-2.17173), 840.1746) + 1
# Create lists
x <- list()
X <- list()
for (k in 1:sims) {
  x[[k]] <- rep(NA, M[k])
  X[[k]] <- rep(NA, M[k])
  for (i in 1:M[k]) {
    x[[k]][i] <- runif(1, min = 0, max = 1)
    if (x[[k]][i] >= 0 & x[[k]][i] <= 0.1056379) {
      X[[k]][i] <- rlnorm(1, 6.228244, 0.3565041)
    } else {
      X[[k]][i] <- rlnorm(1, 8.910837, 1.1890874)
    }
  }
}
This will work and I think it is what you were trying to do, BUT it is not great R code. I strongly recommend using the lapply family instead of for loops, learning to use data.table, and parallelisation if you need things to scale. Additionally, if you want to read more about indexing and subsetting in R, Hadley Wickham has a comprehensive breakdown here.
Hope this helps!
Let me start with a few remarks and then show you, how your problem can be solved using R.
In R, there is most of the time no need to use a for loop in order to assign several values to a vector. So, for example, to fill a vector of length 100 with uniformly distributed random variables, you do something like:
set.seed(1234)
x1 <- rep(NA, 100)
for (i in 1:100) {
  x1[i] <- runif(1, 0, 1)
}
(set.seed() is used to set the random seed, such that you get the same result each time.) It is much simpler (and also much faster) to do this instead:
set.seed(1234)
x2 <- runif(100, 0, 1)
identical(x1, x2)
## [1] TRUE
As you see, results are identical.
The reason that x[k]<-rep(NA,M[k]) does not work is that indeed x[k] is not a valid variable name in R. [ is used for indexing, so x[k] extracts the element k from a vector x. Since you try to assign a vector of length larger than 1 to a single element, you get an error. What you probably want to use is a list, as you will see in the example below.
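A minimal contrast between the two kinds of assignment (a sketch for illustration; the object names are made up):
x_vec <- numeric(3)
x_vec[2] <- rep(NA, 5)    # one slot of an atomic vector holds one value: R warns about
                          # the replacement length and keeps only the first value
x_lst <- list()
x_lst[[2]] <- rep(NA, 5)  # an element of a list can hold a whole vector of any length
lengths(x_lst)            # 0 5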
So here comes the code that I would use instead of what you proposed in your post. Note that I am not sure that I correctly understood what you intend to do, so I will also describe below what the code does. Let me know if this fits your intentions.
# define M
library(MASS)
eks_2016_kasko <- 486689.1
sims <- 10
M <- rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
# define the function that calculates X for a single value from M
calculate_X <- function(m) {
  x <- runif(m, min = 0, max = 1)
  X <- ifelse(x <= 0.1056379, rlnorm(m, 6.228244, 0.3565041),
              rlnorm(m, 8.910837, 1.1890874))
  X
}
# apply that function to each element of M
X <- lapply(M, calculate_X)
As you can see, there are no loops in that solution. I'll start to explain at the end:
lapply is used to apply a function (calculate_X) to each element of a list or vector (here it is the vector M). It returns a list. So, you can get, e.g. the third of the vectors with X[[3]] (note that [[ is used to extract elements from a list). And the contents of X[[3]] will be the result of calculate_X(M[3]).
The function calculate_X() does the following: It creates a vector of m uniformly distributed random values (remember that m runs over the elements of M) and stores that in x. Then it creates a vector X that contains log normally distributed random variables. The parameters of the distribution depend on the value x.
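As a quick sanity check of that structure (a sketch based on the code above), every element of X should have the length given by the corresponding entry of M:
length(X)             # 10: one vector per simulated value in M
all(lengths(X) == M)  # TRUE: element k of X has exactly M[k] entries
summary(X[[3]])       # distribution of the third simulated vector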

Element-wise operation with two vectors of a data frame in R

My first question here: how can I apply an efficient routine that iterates over the values of two vectors (pairwise) of a given data frame?
To be more specific, consider the following example using the following data frame:
df0 <- data.frame(matrix(c(1,2,2,3,1,3,0.4,0.2,0.2,0.1,0.4,0.1),nrow=6,ncol=2))
colnames(df0) <- c("value","frequency")
The first column is a real value and the second column is a frequency (or weight). NOTICE: the weights have to be strictly positive, they might be repeated, and they do not necessarily add up to one (because of repetition).
I am performing the following LOOP to calculate my function P. This P is supposed to be a number between 0 and 1.
# Define two parameters
K = 1/2
alpha = 0
# LOOP
mattemp <- matrix(, nrow = length(df0$value), ncol = length(df0$value))
for (i in 1:length(df0$value)) {
  for (j in 1:length(df0$value)) {
    mattemp[i,j] <- df0$frequency[i]^(1+alpha) * df0$frequency[j] * abs(df0$value[i] - df0$value[j])
    P <- K * sum(mattemp)
  }
}
Basically, my function P is calculating:
P = K * (0.4^(1+alpha) * 0.2 * |1-2| + 0.4^(1+alpha) * 0.1 * |1-3| + ...
This code works perfectly well as long as the matrix is small.
However, I am trying to run this routine for a big matrix (5400 x 5400), and the LOOP does not seem to finish.
I already tried it with a foreach command (using %dopar%), but that does not work either.
Is there a smart and concise routine that R can handle? It does not need to follow the above structure, as long as it is efficient.
Thank you very much
Try:
df0$nval <- (df0$value - mean(df0$value)) / sd(df0$value)
ij <- combn(nrow(df0), 2)
foo <- sum(df0$frequency[ij[1, ]] ^ (1 + alpha) * df0$frequency[ij[2, ]] * abs(df0$nval[ij[1, ]] - df0$nval[ij[2, ]]))
P <- K * 2 * foo
Reasoning: basically you are evaluating every ordered pair of frequencies and normalized values. We use combn to create half of those pairs (each unordered pair once) and then vectorize the whole thing. Since combn only gives unique combinations, we need to multiply by 2. [Keep in mind that we don't need the values on the diagonal, as abs(df0$value[i] - df0$value[i]) is equal to 0, and the only terms we are missing are the (j, i) counterparts of each (i, j) pair, which is why we multiply by 2.] We then multiply by K and get P.
It's not clear how you want to normalize, so I just subtracted the mean and divided by the standard deviation. If you meant something else, you can change it accordingly.
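As a quick check under the question's own parameters (alpha = 0, K = 1/2), and using the raw value column rather than the normalized one, the vectorized sum can be compared against the plain double loop on the small df0 (a sketch, not part of the original answer):
K <- 1/2; alpha <- 0
ij <- combn(nrow(df0), 2)
P_vec <- K * 2 * sum(df0$frequency[ij[1, ]]^(1 + alpha) * df0$frequency[ij[2, ]] *
                       abs(df0$value[ij[1, ]] - df0$value[ij[2, ]]))
# reference: the original double loop over all (i, j) pairs
P_loop <- 0
for (i in seq_len(nrow(df0))) {
  for (j in seq_len(nrow(df0))) {
    P_loop <- P_loop + df0$frequency[i]^(1 + alpha) * df0$frequency[j] *
      abs(df0$value[i] - df0$value[j])
  }
}
P_loop <- K * P_loop
all.equal(P_vec, P_loop)  # TRUE when alpha = 0, since the summand is then symmetric in i and j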
Edit1: Big thanks to @alexis_laz for finding a mistake and suggesting improvements that almost double the speed!
Edit2: Adjusted script to fit changed requirements.
