So I know there is a standard deviation function in R, but I'm trying to figure out how to write code to compute the SD the long way using a for loop.
men<-c(150,175,213,241,190,132,110,208,187)
alex<-NULL
for(i in 1:length(men)
{
alex[i]<-(men[i]-178.44)^2
}
This is what I have so far. What I am trying to do is store the value of (men[i]-mean)^2 in the vector alex so I can go on to sum the vector alex and find the standard deviation. However, I receive an error message when I try to run this code. Any input is appreciated.
Since many calculations in R can be applied over entire vectors, you could simply write the following and forget the for loop altogether.
> alex <- (men - mean(men))^2
> alex
# [1] 809.08642 11.86420 1194.08642 3913.19753 133.53086
# [6] 2157.08642 4684.64198 873.53086 73.19753
As per your comment, here is the way I'd do this with a for loop. Notice that alex is initialized as a numeric vector with exactly the same length as the vector we're calculating over. Preallocating like this makes for loops run faster in R.
> alex <- numeric(length(men))
> for(i in 1:length(men)) alex[i] <- (men[i] - mean(men))^2
> alex
# [1] 809.08642 11.86420 1194.08642 3913.19753 133.53086
# [6] 2157.08642 4684.64198 873.53086 73.19753
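To finish the calculation the long way, the squared deviations in alex can be summed, divided by n - 1, and square-rooted. This is a minimal sketch, assuming the sample standard deviation (the n - 1 denominator that sd() uses) is what is wanted; sd_long is a name introduced here for illustration.

```r
men <- c(150, 175, 213, 241, 190, 132, 110, 208, 187)

# Squared deviations from the mean, filled in by a for loop
alex <- numeric(length(men))
for (i in seq_along(men)) {
  alex[i] <- (men[i] - mean(men))^2
}

# Sample standard deviation: divide by n - 1, then take the square root
sd_long <- sqrt(sum(alex) / (length(men) - 1))

# Compare with the built-in function
all.equal(sd_long, sd(men))
```

Using seq_along(men) instead of 1:length(men) also behaves sensibly if the vector is ever empty.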
I'm trying to generate a data frame of simulated values from the student's t distribution using the standard stochastic equation. The function I use is as follows:
matgen<-function(means,chi,covariancematrix)
{
cols<-ncol(means);
normals<-mvrnorm(n=500,mu=means,Sigma = covariancematrix);
invgammas<-rigamma(n=500,alpha=chi/2,beta=chi/2);
gen<-as.data.frame(matrix(data=NA,ncol=cols,nrow=500));
i<-1;
while(i<=500)
{
gen[i,]<-t(means)+normals[i,]*sqrt(invgammas[i]);
i<=i+1;
}
return(gen);
}
If it's not clear, I'm trying to create an empty data frame, that takes in values in cols number of columns and 500 rows. The values are numeric, of course, and R tells me that in the 9th row:
gen<-as.data.frame(matrix(data=NA,ncol=cols,nrow=500));
There's an error: 'non-numeric matrix extent'.
I remember using as.data.frame() to convert matrices into data frames in the past, and it worked quite smoothly. Even with numbers. I have been out of touch for a while, though, and can't seem to recollect or find online a solution to this problem. I tried is.numeric(), as.numeric(), 0s instead of NA there, but nothing works.
As Roland pointed out, one problem is that cols doesn't seem to be numeric. Please check whether means is a data frame or matrix, e.g. with str(means). If it is, your code should not result in the error 'non-numeric matrix extent'.
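A likely way to end up with a non-numeric cols is passing means as a plain vector, since ncol() returns NULL for an object without dimensions. The sketch below reproduces that situation; means_vec and means_mat are hypothetical inputs, not the asker's actual data.

```r
means_vec <- c(1, 2, 3)                     # a plain vector (hypothetical input)
means_mat <- cbind(c(1, 2, 3), c(4, 5, 6))  # a 3 x 2 matrix

ncol(means_vec)  # NULL, because a plain vector has no dim attribute
ncol(means_mat)  # 2

# Passing NULL as ncol is what matrix() rejects
res <- tryCatch(
  matrix(data = NA, ncol = ncol(means_vec), nrow = 500),
  error = function(e) e
)
conditionMessage(res)
```

If means is meant to be a vector, length(means) gives the number of columns needed instead.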
You also have some other issues in your code. I created a simplified example and pointed out the bugs I found as comments in the code:
library(MASS)
library(LearnBayes)
means <- cbind(c(1,2,3),c(4,5,6))
chi <- 10
matgen<-function(means,chi,covariancematrix)
{
cols <- ncol(means) # if means is a dataframe or matrix, this should work
normals <- rnorm(n=20,mean=100,sd=10) # changed example for simplification
# normals<-mvrnorm(n=20,mu=means,Sigma = covariancematrix)
# input to mu of mvrnorm should be a vector, see ?mvrnorm; but this means that ncol(means) is always 1 !?
invgammas<-rigamma(n=20,a=chi/2,b=chi/2) # changed alpha= to a and beta= to b
gen<-as.data.frame(matrix(data=NA,ncol=cols,nrow=20))
i<-1
while(i<=20)
{
gen[i,]<-t(means)+normals[i]*sqrt(invgammas[i]) # changed normals[i,] to normals [i], because it is a vector
i<-i+1 # changed <= to <-
}
return(gen)
}
matgen(means,chi,covariancematrix)
I hope this helps.
P.S. You don't need ";" at the end of every line in R
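For comparison, the whole simulation could probably be written without a while loop. This is only a sketch under several assumptions that are not in the question: means is a numeric vector, the normals are drawn with mean zero so that means is added exactly once, and the inverse-gamma draw is taken as 1 / rgamma(...) from base R. The name matgen_sketch and the argument n are made up for illustration.

```r
library(MASS)  # for mvrnorm

matgen_sketch <- function(means, chi, covariancematrix, n = 500) {
  means <- as.numeric(means)
  # Zero-mean multivariate normals, one row per simulated observation
  normals <- mvrnorm(n = n, mu = rep(0, length(means)), Sigma = covariancematrix)
  # Inverse-gamma draws obtained as reciprocals of gamma draws (assumption)
  invgammas <- 1 / rgamma(n, shape = chi / 2, rate = chi / 2)
  # Scale row i by sqrt(invgammas[i]), then add means[j] to column j
  gen <- sweep(normals * sqrt(invgammas), 2, means, "+")
  as.data.frame(gen)
}

set.seed(1)
sim <- matgen_sketch(means = c(1, 2), chi = 10, covariancematrix = diag(2))
dim(sim)
```

Multiplying the matrix by a vector of length n recycles down the columns, which is why each row gets its own scaling factor.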
I am ashamed I need assistance on such a simple task. I want to create 20 normally distributed numbers, add them, and then do this again x times. Then plot a histogram of these sums. This is an exercise in Gelman and Hill's text "Data Analysis Using Regression and Multilevel/Hierarchical Models".
I thought this would be simple, but I am into it about 10 hours now. Web searches and looking in "The Art of R Programming" by Norman Matloff and "R for Everyone" by Jared Lander have not helped. I suspect the answer is so simple that no one would suspect a problem. The syntax in R is something that I am having difficulty with.
> # chapter 2 exercise 3
> n.sim <- 10 # number of simulations
>
> sumNumbers <- rep(NA, n.sim) # generate vector of NA's
> for (i in 1:n.sim) # begin for loop
+{
+ numbers <- rnorm(20,0,1)
+ sumNumbers(i) <- sum(numbers) # defined as a vector but R
+ # thinks it's a function
+ }
Error in sumNumbers(i) <- sum(numbers) :
could not find function "sumNumbers<-"
>
> hist(sumNumbers)
Error in hist.default(sumNumbers) : 'x' must be numeric
3 stop("'x' must be numeric")
2 hist.default(sumNumbers)
1 hist(sumNumbers)
>
A few things:
When you put parentheses after a variable name, the R interpreter assumes that it's a function. In your case, you want to reference an index of a variable, so it should be sumNumbers[i] <- sum(numbers), which uses square brackets instead. This will solve your problem.
You can initialize sumNumbers with sumNumbers = numeric(n.sim). It's a bit easier to read in a simple case like this.
By default, rnorm(n) is the same as rnorm(n,0,1). This can save you some time typing.
You can replicate an operation a specified number of times with the replicate function:
set.seed(144) # For consistent results
(simulations <- replicate(10, sum(rnorm(20))))
# [1] -9.3535884 1.4321598 -1.7812790 -1.1851263 -1.9325988 2.9652475 2.9559994
# [8] 0.7164233 -8.1364348 -7.3428464
After simulating the proper number of samples, you can plot with hist(simulations).
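Putting the first three points together, the original loop might look like the sketch below. The seed and the larger n.sim are choices made here for illustration, not part of the exercise.

```r
set.seed(144)                    # for reproducible draws
n.sim <- 1000                    # number of simulations (chosen here for a smoother histogram)

sumNumbers <- numeric(n.sim)     # preallocated numeric vector
for (i in seq_len(n.sim)) {
  numbers <- rnorm(20)           # same as rnorm(20, 0, 1)
  sumNumbers[i] <- sum(numbers)  # square brackets index the vector
}

hist(sumNumbers)
```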
This is the next step of the question answered at this link: "Apply function too slow in r".
I have to calculate a specific formula per row for a lot of species. The formula corresponds to a variance calculation and so needs the result obtained in the link above.
My current script uses a for loop, which is naturally very slow. I simplified the problem in the following script, using a simple data frame called az.
az=data.frame(c(1,2,10),c(2,4,20),c(3,6,30))
colnames(az)=c("a","b","c")
# Necessary number calculated in step 1 (see link above)
m <- as.matrix(az)
m[is.na(m)] <- 0 #remove NA from sums
step1 = as.vector(m %*% m[nrow(m),])
# Initial for loop
prov=0 # prov for provisional number
for (i in 1:nrow(az)){
for (j in 1:ncol(az)){
prov=prov+az[i,j]*az[nrow(az),j]
prov=prov+az[i,j]*(az[nrow(az),j]-step1[i])^2
}
print(prov)
prov=0
}
As I have to repeat the operation for a huge number of species, I was wondering if anyone has a more efficient solution, maybe using vectorized expressions.
Kind regards.
This code will return the same values that your code prints out, but more efficiently.
> n<-nrow(m)
> mm<-t(m)
> prov<-mm*mm[,n]
> prov<-prov+mm*(mm[,n]-step1[col(mm)])^2
> colSums(prov)
[1] 82140 791480 113717400
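One way to check a vectorized rewrite like this is to collect the loop results in a vector and compare. The sketch below does that with a row-wise variant built on outer() and rowSums(); loop_res, vec_res, last and dev are names introduced here, and this variant is an alternative formulation, not the code from the answer above.

```r
az <- data.frame(a = c(1, 2, 10), b = c(2, 4, 20), c = c(3, 6, 30))
m <- as.matrix(az)
m[is.na(m)] <- 0
last <- m[nrow(m), ]            # last row, used in every term
step1 <- as.vector(m %*% last)

# Loop version, collecting results instead of printing them
loop_res <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  for (j in seq_len(ncol(m))) {
    loop_res[i] <- loop_res[i] + m[i, j] * last[j] +
      m[i, j] * (last[j] - step1[i])^2
  }
}

# Vectorized version: dev[i, j] holds last[j] - step1[i]
dev <- outer(step1, last, function(s, l) l - s)
vec_res <- step1 + rowSums(m * dev^2)

all.equal(as.numeric(loop_res), as.numeric(vec_res))
```

The first term of the loop sums to step1[i] for each row, which is why it can be added directly.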
I am still quite new to R (I used to program in Matlab) and I am trying to use the parallel package to speed up some calculations. Below is an example in which I calculate the rolling standard deviation of a matrix (by column) with the zoo package, with and without parallelising the code. However, the outputs come out with different shapes.
# load library
library('zoo')
library('parallel')
library('snow')
# Data
z <- matrix(runif(100 * 1000, 0, 1), 100, 1000) # one value per cell of the 100 x 1000 matrix
#This is what I want to calculate with timing
system.time(zz <- rollapply(z,10,sd,by.column=T, fill=NA))
# Trying to achieve the same output with parallel computing
cl<-makeSOCKcluster(4)
clusterEvalQ(cl, library(zoo))
system.time(yy <-parCapply(cl,z,function(x) rollapplyr(x,10,sd,fill=NA)))
stopCluster(cl)
My first output zz has the same dimensions as input z, whereas output yy is a vector rather than a matrix. I understand that I can do something like matrix(yy,nrow(z),ncol(z)); however, I would like to know if I have done something wrong or if there is a better way of coding this. Thank you.
From the documentation:
parRapply and parCapply always return a vector. If FUN always returns
a scalar result this will be of length the number of rows or columns:
otherwise it will be the concatenation of the returned values.
And:
parRapply and parCapply are parallel row and column apply functions
for a matrix x; they may be slightly more efficient than parApply but
do less post-processing of the result.
So, I'd suggest you use parApply.
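A rough sketch of the parApply version follows. To keep it self-contained it uses a small base-R stand-in for rollapplyr(x, 10, sd, fill = NA); roll_sd is a hypothetical helper written for this example, and the matrix and the cluster of 2 workers are deliberately smaller than in the question.

```r
library(parallel)

# Base-R stand-in for rollapplyr(x, k, sd, fill = NA); hypothetical helper
roll_sd <- function(x, k) {
  windows <- embed(x, k)                    # one row per window of k consecutive values
  c(rep(NA, k - 1), apply(windows, 1, sd))  # right-aligned, padded with NA at the start
}

z <- matrix(runif(100 * 8), 100, 8)         # smaller than the question's data

cl <- makeCluster(2)
yy <- parApply(cl, z, 2, roll_sd, k = 10)   # MARGIN = 2 applies over columns
stopCluster(cl)

dim(yy)
```

With zoo loaded on the workers through clusterEvalQ, the anonymous function from the question could be passed to parApply in the same position as roll_sd.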
I am still getting acquainted with R and have run into some small technicalities that I would really appreciate help with.
I am trying to write a for loop over non-consecutive observations, so instead of looping over the sequence 1:1000 of days I would like to run it for specific observations, let's say every 64 days.
I tried defining a vector X with the sequence I want, but R returns an error and only uses the first element of the vector.
X<-seq(from=1, to=1000, by=64)
for(i in 1:X){....
I hope someone can give me a hint on how to do this.
Thank you in advance.
What you need is
for( i in seq(from=1, to=1000, by=64) ) { print(i) }
1:X will try to create a vector from 1 to X stepping 1 at a time, and in this case X is a vector, so only its first element is used.
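If the vector X is already defined, it can be looped over directly. The sketch below shows that, plus a variant with seq_along() for when the position in X is also needed; the variable k and the printed message are only illustrative.

```r
X <- seq(from = 1, to = 1000, by = 64)

# Loop over the values of X themselves
for (i in X) {
  print(i)
}

# Loop over positions when the index is also needed
for (k in seq_along(X)) {
  cat("step", k, "is day", X[k], "\n")
}
```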