basic summation using R - r

I know there is a standard deviation function in R, but I'm trying to figure out how to write code that computes the SD the long way using a for loop.
men<-c(150,175,213,241,190,132,110,208,187)
alex<-NULL
for(i in 1:length(men)
{
alex[i]<-(men[i]-178.44)^2
}
This is what I have so far. What I am trying to do is store the value of (men[i]-mean)^2 in the vector alex, so I can go on to sum the vector alex and find the standard deviation. However, I receive an error message when I try to run this code. Any input is appreciated.

Since many calculations in R can be applied over entire vectors, you could simply write the following and forget the for loop altogether.
> alex <- (men - mean(men))^2
> alex
# [1] 809.08642 11.86420 1194.08642 3913.19753 133.53086
# [6] 2157.08642 4684.64198 873.53086 73.19753
As per your comment, here is how I'd do this with a for loop. Notice that alex is initialized as a numeric vector of exactly the same length as the vector we're iterating over. Preallocating like this makes for loops run faster in R, because the vector does not have to grow on every iteration.
> alex <- numeric(length(men))
> for(i in 1:length(men)) alex[i] <- (men[i] - mean(men))^2
> alex
# [1] 809.08642 11.86420 1194.08642 3913.19753 133.53086
# [6] 2157.08642 4684.64198 873.53086 73.19753
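To finish the calculation the question describes, the squared deviations can be summed, divided by n - 1, and square-rooted. This is a minimal sketch, assuming the sample (n - 1) definition that sd() uses; the names sq_dev and sd_long are illustrative and not from the original. Note too that the loop in the question is missing a closing parenthesis after length(men), which by itself produces a syntax error.

```r
men <- c(150, 175, 213, 241, 190, 132, 110, 208, 187)

m <- mean(men)                      # compute the mean once, outside the loop
sq_dev <- numeric(length(men))      # preallocated vector of squared deviations

for (i in seq_along(men)) {
  sq_dev[i] <- (men[i] - m)^2
}

# Sample standard deviation: sum of squared deviations over (n - 1), then square root
sd_long <- sqrt(sum(sq_dev) / (length(men) - 1))

# Compare with the built-in function
all.equal(sd_long, sd(men))
```

Computing mean(men) once before the loop avoids recalculating it on every iteration.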

Related

How to counter the 'non-numeric matrix extent' error in R?

I'm trying to generate a data frame of simulated values from the Student's t distribution using the standard stochastic equation. The function I use is as follows:
matgen<-function(means,chi,covariancematrix)
{
  cols<-ncol(means);
  normals<-mvrnorm(n=500,mu=means,Sigma = covariancematrix);
  invgammas<-rigamma(n=500,alpha=chi/2,beta=chi/2);
  gen<-as.data.frame(matrix(data=NA,ncol=cols,nrow=500));
  i<-1;
  while(i<=500)
  {
    gen[i,]<-t(means)+normals[i,]*sqrt(invgammas[i]);
    i<=i+1;
  }
  return(gen);
}
If it's not clear, I'm trying to create an empty data frame, that takes in values in cols number of columns and 500 rows. The values are numeric, of course, and R tells me that in the 9th row:
gen<-as.data.frame(matrix(data=NA,ncol=cols,nrow=500));
There's an error: 'non-numeric matrix extent'.
I remember using as.data.frame() to convert matrices into data frames in the past, and it worked quite smoothly. Even with numbers. I have been out of touch for a while, though, and can't seem to recollect or find online a solution to this problem. I tried is.numeric(), as.numeric(), 0s instead of NA there, but nothing works.
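As Roland pointed out, one problem is that cols doesn't seem to be numeric. Please check whether means is a data frame or matrix, e.g. with str(means). If it is, your code should not produce the 'non-numeric matrix extent' error. If means is a plain vector, ncol(means) returns NULL, and passing NULL as ncol to matrix() raises exactly this error.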
You also have some other issues in your code. I created a simplified example and pointed out the bugs I found as comments in the code:
library(MASS)
library(LearnBayes)
means <- cbind(c(1,2,3),c(4,5,6))
chi <- 10
matgen<-function(means,chi,covariancematrix)
{
  cols <- ncol(means) # if means is a dataframe or matrix, this should work
  normals <- rnorm(n=20,mean=100,sd=10) # changed example for simplification
  # normals<-mvrnorm(n=20,mu=means,Sigma = covariancematrix)
  # input to mu of mvrnorm should be a vector, see ?mvrnorm; but for a vector ncol(means) is NULL, not a number
  invgammas<-rigamma(n=20,a=chi/2,b=chi/2) # changed alpha= to a and beta= to b
  gen<-as.data.frame(matrix(data=NA,ncol=cols,nrow=20))
  i<-1
  while(i<=20)
  {
    gen[i,]<-t(means)+normals[i]*sqrt(invgammas[i]) # changed normals[i,] to normals[i], because it is a vector
    i<-i+1 # changed <= to <-
  }
  return(gen)
}
matgen(means,chi,covariancematrix)
I hope this helps.
P.S. You don't need ";" at the end of every line in R
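The cause of the error can be reproduced in a few lines. This is a minimal sketch, assuming means was passed as a plain vector; the names means_vec, means_mat and msg are hypothetical and used only for illustration, and taking one column per mean via length() is likewise an assumption about the intended layout.

```r
means_vec <- c(1, 2, 3)                      # hypothetical: means given as a plain vector
means_mat <- cbind(c(1, 2, 3), c(4, 5, 6))   # a 3 x 2 matrix

ncol(means_vec)   # NULL, because a vector has no dim attribute
ncol(means_mat)   # 2

# Passing NULL as ncol triggers the error; capture the message instead of stopping
msg <- tryCatch(
  matrix(data = NA, ncol = ncol(means_vec), nrow = 5),
  error = function(e) conditionMessage(e)
)
msg

# Assumption: one column per mean, so use length() when means is a vector
cols <- length(means_vec)
gen <- as.data.frame(matrix(data = NA, ncol = cols, nrow = 5))
dim(gen)
```

Checking is.null(ncol(means)) at the top of the function would be one way to catch this case early.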

Creating a vector in a for loop

I am ashamed I need assistance on such a simple task. I want to create 20 normally distributed numbers, add them, and then do this again x times. Then plot a histogram of these sums. This is an exercise in Gelman and Hill's text "Data Analysis Using Regression and Multilevel/Hierarchical Models".
I thought this would be simple, but I am into it about 10 hours now. Web searches and looking in "The Art of R Programming" by Norman Matloff and "R for Everyone" by Jared Lander have not helped. I suspect the answer is so simple that no one would suspect a problem. The syntax in R is something that I am having difficulty with.
> # chapter 2 exercise 3
> n.sim <- 10 # number of simulations
>
> sumNumbers <- rep(NA, n.sim) # generate vector of NA's
> for (i in 1:n.sim) # begin for loop
+{
+ numbers <- rnorm(20,0,1)
+ sumNumbers(i) <- sum(numbers) # defined as a vector but R
+ # thinks it's a function
+ }
Error in sumNumbers(i) <- sum(numbers) :
could not find function "sumNumbers<-"
>
> hist(sumNumbers)
Error in hist.default(sumNumbers) : 'x' must be numeric
3 stop("'x' must be numeric")
2 hist.default(sumNumbers)
1 hist(sumNumbers)
>
A few things:
When you put parentheses after a variable name, the R interpreter assumes that it's a function. In your case, you want to reference an index of a variable, so it should be sumNumbers[i] <- sum(numbers), which uses square brackets instead. This will solve your problem.
You can initialize sumNumbers with sumNumbers = numeric(n.sim). It's a bit easier to read in a simple case like this.
By default, rnorm(n) is the same as rnorm(n,0,1). This can save you some time typing.
You can replicate an operation a specified number of times with the replicate function:
set.seed(144) # For consistent results
(simulations <- replicate(10, sum(rnorm(20))))
# [1] -9.3535884 1.4321598 -1.7812790 -1.1851263 -1.9325988 2.9652475 2.9559994
# [8] 0.7164233 -8.1364348 -7.3428464
After simulating the proper number of samples, you can plot with hist(simulations).
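For comparison with replicate, here is a minimal sketch of the original loop with the indexing fixed to square brackets. The seed value is arbitrary and only there to make the run repeatable; everything else follows the question's own variable names.

```r
set.seed(1)                       # arbitrary seed, for a repeatable run
n.sim <- 10                       # number of simulations

sumNumbers <- numeric(n.sim)      # preallocated numeric vector
for (i in seq_len(n.sim)) {
  numbers <- rnorm(20)            # 20 standard normal draws
  sumNumbers[i] <- sum(numbers)   # square brackets index the vector
}

hist(sumNumbers)                  # histogram of the simulated sums
```

Because sumNumbers is now numeric rather than a vector of NA values, hist() no longer complains that 'x' must be numeric.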

Optimize variance calculation, for loop too slow

This is the next step of the question answered in "Apply function too slow in r".
I have to calculate a specific formula per row for a lot of species. The formula corresponds to a variance calculation and so needs the result obtained in the linked question.
My current script uses a for loop, which is naturally very slow. I simplified the problem in the following script, using a simple data frame called az.
az=data.frame(c(1,2,10),c(2,4,20),c(3,6,30))
colnames(az)=c("a","b","c")
# Necessary number calculated in step 1 (see link above)
m <- as.matrix(az)
m[is.na(m)] <- 0 #remove NA from sums
step1 = as.vector(m %*% m[nrow(m),])
# Initial for loop
prov=0 # prov for provisional number
for (i in 1:nrow(az)){
  for (j in 1:ncol(az)){
    prov=prov+az[i,j]*az[nrow(az),j]
    prov=prov+az[i,j]*(az[nrow(az),j]-step1[i])^2
  }
  print(prov)
  prov=0
}
As I have to repeat the operation for a huge number of species, I was wondering if anyone has a more efficient solution, maybe using vectorized expressions.
Kind regards.
This code will return the same values that your code prints out, but more efficiently.
> n<-nrow(m)
> mm<-t(m)
> prov<-mm*mm[,n]
> prov<-prov+mm*(mm[,n]-step1[col(mm)])^2
> colSums(prov)
[1] 82140 791480 113717400
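One way to gain confidence in the vectorized form is to run it next to a loop on the same data and compare. The sketch below does that under the assumptions of the question's toy data frame; the names last, loop_res and vec_res are illustrative and not from the original.

```r
az <- data.frame(a = c(1, 2, 10), b = c(2, 4, 20), c = c(3, 6, 30))

m <- as.matrix(az)
m[is.na(m)] <- 0                       # treat NA as 0 in the sums
last <- m[nrow(m), ]                   # last row, used in every term
step1 <- as.vector(m %*% last)         # per-row value from step 1

# Loop version: one sum per row
loop_res <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  loop_res[i] <- sum(m[i, ] * last + m[i, ] * (last - step1[i])^2)
}

# Vectorized version: work on the transpose so recycling runs down each column
mm <- t(m)
vec_res <- colSums(mm * last + mm * (last - step1[col(mm)])^2)

isTRUE(all.equal(loop_res, unname(vec_res)))
```

Transposing first matters because R recycles vectors down columns, so each column of mm lines up element by element with the last row of m.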

Output of parApply different from my input

I am still quite new to R (I used to program in Matlab) and I am trying to use the parallel package to speed up some calculations. Below is an example in which I calculate the rolling standard deviation of a matrix (by column) using the zoo package, with and without parallelising the code. However, the shapes of the outputs turn out to be different.
# load library
library('zoo')
library('parallel')
library('snow')
# Data
z <- matrix(runif(1000000,0,1),100,1000)
#This is what I want to calculate with timing
system.time(zz <- rollapply(z,10,sd,by.column=T, fill=NA))
# Trying to achieve the same output with parallel computing
cl<-makeSOCKcluster(4)
clusterEvalQ(cl, library(zoo))
system.time(yy <-parCapply(cl,z,function(x) rollapplyr(x,10,sd,fill=NA)))
stopCluster(cl)
My first output zz has the same dimensions as input z, whereas output yy is a vector rather than a matrix. I understand that I can do something like matrix(yy,nrow(z),ncol(z)); however, I would like to know if I have done something wrong or if there is a better way of coding this. Thank you.
From the documentation:
parRapply and parCapply always return a vector. If FUN always returns
a scalar result this will be of length the number of rows or columns:
otherwise it will be the concatenation of the returned values.
And:
parRapply and parCapply are parallel row and column apply functions
for a matrix x; they may be slightly more efficient than parApply but
do less post-processing of the result.
So, I'd suggest you use parApply.
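A minimal sketch of the parApply suggestion follows. To keep it free of extra dependencies it uses a small base-R helper, roll_sd, which is a hypothetical stand-in for rollapplyr(x, 10, sd, fill = NA) and not part of zoo or parallel; the matrix size and the cluster of 2 workers are also arbitrary choices for illustration.

```r
library(parallel)

# Hypothetical helper: right-aligned rolling standard deviation with NA padding
roll_sd <- function(x, k) {
  n <- length(x)
  out <- rep(NA_real_, n)
  for (i in k:n) {
    out[i] <- sd(x[(i - k + 1):i])
  }
  out
}

z <- matrix(runif(100 * 20), 100, 20)   # small example matrix

cl <- makeCluster(2)                    # 2 workers, chosen arbitrarily
yy <- parApply(cl, z, 2, roll_sd, k = 10)   # MARGIN = 2 applies over columns
stopCluster(cl)

dim(yy)                                 # same shape as z
```

With MARGIN = 2 and a function that returns one value per row, parApply assembles the results into a matrix, as apply does, so no manual reshaping with matrix() is needed.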

how to use "for" loop in R for non-consecutive observations

I am still getting acquainted with R and I've run into some small technicalities that I would really appreciate help with.
I am trying to write a "for" loop over non-consecutive observations, so instead of looping over a sequence of days from 1:1000 I would like to run it for specific observations, let's say every 64 days.
I tried defining a vector X with the sequence I want, but R returns an error and only uses the first numerical entry of the vector.
X<-seq(from=1, to=1000, by=64)
for(i in 1:X){....
I hope someone can give me a hint on how to do this.
Thank you in advance.
What you need is
for( i in seq(from=1, to=1000, by=64) ) { print(i) }
1:X will try to create a vector from 1 to X, stepping 1 at a time. In this case X is itself a vector, so only its first element is used.
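If the vector X from the question is already defined, the loop can also iterate over it directly. This is a minimal sketch; the results vector and the squaring step are placeholders standing in for whatever the loop body really does.

```r
X <- seq(from = 1, to = 1000, by = 64)   # 1, 65, 129, ..., 961

# Iterate over the values themselves
for (i in X) {
  print(i)
}

# Iterate over positions when results need to be stored per observation
results <- numeric(length(X))
for (k in seq_along(X)) {
  results[k] <- X[k]^2                   # placeholder computation
}
```

Looping over seq_along(X) gives both the position k and the day X[k], which is convenient when the output has to line up with the selected observations.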

Resources