Use apply functions to avoid a for loop in R

I want to optimize the following code by using one of the apply functions instead of a for loop, but without success:
# "gauche" and "droite" are French for "left" and "right" (column 1 and column 2)
A=matrix(c(1,0,1,1,1,0,0,1,0,1,0,0),nrow=6,ncol=2)
corr=c()
for(i in 1:nrow(A)) {
  for(j in 0:(nrow(A)-i)) {
    if(A[i,1]!=A[i+j,1] & A[i,2]!=A[i+j,2])
      corr=append(corr,"2 diff")
    else if(A[i,1]!=A[i+j,1] & A[i,2]==A[i+j,2])
      corr=append(corr,"diff gauche")
    else if(A[i,1]==A[i+j,1] & A[i,2]!=A[i+j,2])
      corr=append(corr,"diff droite")
    else
      corr=append(corr,1)
  }
}
B=matrix(nrow=NROW(A),ncol=NROW(A))
B[lower.tri(B, diag=TRUE)]=corr
B=t(B)
I've tried sapply but did not get the expected result; I may have misused the sapply function.
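No answer to this question survives here. The following is offered as an illustration, not as the asker's method. One way to drop both loops is to compare every pair of rows at once with outer() and then label each pair with nested ifelse(). The sketch assumes the same 6 x 2 matrix A. It aims to reproduce the loop's output, including NA below the diagonal and the "1" stored as a character string, because the loop's corr vector gets coerced to character.

```r
A <- matrix(c(1,0,1,1,1,0,0,1,0,1,0,0), nrow = 6, ncol = 2)

# d1[i, k] is TRUE when rows i and k differ in column 1; d2 likewise for column 2
d1 <- outer(A[, 1], A[, 1], "!=")
d2 <- outer(A[, 2], A[, 2], "!=")

# ifelse() keeps the matrix shape of its test argument
B_vec <- ifelse(d1 & d2, "2 diff",
         ifelse(d1, "diff gauche",
         ifelse(d2, "diff droite", "1")))

# The loop only fills pairs with k >= i, so blank out the lower triangle
B_vec[lower.tri(B_vec)] <- NA
B_vec
```

outer() builds full n x n comparison matrices, so memory grows with nrow(A)^2. For moderately sized matrices this is usually not a concern.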

Related

multiApply function for loop on a 3D array

I am trying to make my data processing more efficient for a spatial temperature data project. I have a for loop that does what I want, but it is much too slow for processing multiple years of data. The loop looks at each spatial cell and, based on the 365 temperature values in that year, computes the frequency, duration, number, and temperature of heat events, which go into separate 2D data frames.
for (b in 1:299) { #longitude
  for (c in 1:424) { #latitude
    data <- year[b,c,] #makes all temps for this cell (3rd dimension) into a vector
    for (d in 2:364) {
      if (data[d]>=Threshold & data[d+1]>=Threshold) {
        frequencydf[b,c]=frequencydf[b,c]+1
        tempsdf[b,c]=tempsdf[b,c]+data[d]
      }else if (data[d-1]>=Threshold & data[d]>=Threshold & data[d+1]<Threshold) {
        frequencydf[b,c]=frequencydf[b,c]+1
        numberdf[b,c]=numberdf[b,c]+1
        tempsdf[b,c]=tempsdf[b,c]+data[d]
      }else {
        frequencydf[b,c]=frequencydf[b,c]
        numberdf[b,c]=numberdf[b,c]
        tempsdf[b,c]=tempsdf[b,c]
      }
    }
    durationdf[b,c]=frequencydf[b,c]/numberdf[b,c]
    tempsdf[b,c]=tempsdf[b,c]/frequencydf[b,c]
  }
}
Therefore, I am trying to work with apply functions to speed up the process. I think I am running into issues when trying to analyze each spatial cell by the values in the 3rd (time) dimension of my array.
I am starting with the frequency parameter and trying to create the same data frame as above.
frequencylist <- Apply(year_array, fun = frequency.calc1, margins=c(1, 2))
frequencydf <- as.data.frame(frequencylist)
Using this function:
frequency.calc1 = function(cell) {
  data <- as.vector(cell)
  frequency <- 0
  for (d in 2:364) {
    if (data[d]>=Threshold & data[d+1]>=Threshold) {
      frequency=frequency+1
    }else if (data[d-1]>=Threshold & data[d]>=Threshold & data[d+1]<Threshold) {
      frequency=frequency+1
    }else {
      frequency=frequency
    }
  }
  # return only after the loop has visited every day
  return(frequency)
}
I am very new to creating functions and using the Apply function so any advice would be appreciated!
For-loops and *apply functions run at about the same speed. Your problem is all those "if"s.
First of all, you have two separate conditions both of which lead to incrementing frequency. Figure out how to combine them. Next, remember that the R language is vectorized, so you don't need a loop at all. With a little careful thought, you can write a line something like
frequency <- sum(data[1:(N-2)] >= threshold & data[2:(N-1)] >= threshold & data[3:N] < threshold)
I haven't checked all the ">" vs "<" but you get the idea.
As a side note, NEVER hard-code the range of a loop. You can start with "2" because your conditionals reference "d-1", but let the maximum value be defined as something like length(data) - 1.
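Here is a sketch of the advice above. Several names are assumptions for illustration: Threshold, the made-up year_array (3 x 4 cells by 365 days), and frequency_calc. The function replaces the per-day loop with shifted logical vectors covering days d = 2..N-1. The two branches of the original if/else cannot both be true, so their counts can be added. The code uses base R's apply() with MARGIN = c(1, 2) instead of multiApply's Apply(). That call passes each cell's time series (the third dimension) to the function as a plain vector.

```r
# Made-up threshold and data: 3 x 4 grid cells, 365 daily temperatures each
Threshold <- 30
set.seed(1)
year_array <- array(runif(3 * 4 * 365, 20, 40), dim = c(3, 4, 365))

# Vectorized replacement for the per-day loop over d in 2:(N-1)
frequency_calc <- function(data, threshold) {
  N <- length(data)
  prev <- data[1:(N - 2)] >= threshold   # day d-1
  cur  <- data[2:(N - 1)] >= threshold   # day d
  nxt  <- data[3:N] >= threshold         # day d+1
  # branch 1: d and d+1 above; branch 2: d-1 and d above, d+1 below
  sum(cur & nxt) + sum(prev & cur & !nxt)
}

# MARGIN = c(1, 2): one call per (longitude, latitude) cell
frequency_mat <- apply(year_array, c(1, 2), frequency_calc, threshold = Threshold)
dim(frequency_mat)
```

The function body has no if statements left, so apply() now only loops over cells rather than cells times days.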
The solution I used to simplify the process is shown below. Sums over logical conditions replaced the if statements. This made the process much faster, and it needed neither an apply function nor an additional helper function.
for (b in 1:299) { #longitude
  for (c in 1:424) { #latitude
    data <- year[b,c,]
    N <- length(data)
    # days d = 2..N-1; 'cont': d and d+1 at/above Threshold (event continues)
    cont <- data[2:(N-1)] >= Threshold & data[3:N] >= Threshold
    # 'ends': d-1 and d at/above Threshold, d+1 below (event ends on day d)
    ends <- data[1:(N-2)] >= Threshold & data[2:(N-1)] >= Threshold & data[3:N] < Threshold
    frequency[b,c] <- sum(cont) + sum(ends)
    number[b,c] <- sum(ends)
    duration[b,c] <- frequency[b,c]/number[b,c]
    temps[b,c] <- sum(data[2:(N-1)][cont | ends])/frequency[b,c]
  }
}
Thank you for your help, @Carl Witthoft
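The remaining double loop over cells can also be removed. This sketch assumes year is a longitude x latitude x day numeric array; the random data and Threshold = 30 are made up. Slicing the third dimension produces the day d-1, day d and day d+1 comparisons for every cell at once. rowSums(..., dims = 2) then sums over days and returns one matrix per statistic. The day range follows the original loop (d = 2..N-1).

```r
# Made-up data: 3 x 4 grid cells, 365 daily temperatures each
Threshold <- 30
set.seed(42)
year <- array(runif(3 * 4 * 365, 20, 40), dim = c(3, 4, 365))

N <- dim(year)[3]
# Shifted slices along the time axis; position k corresponds to day d = k + 1
prev <- year[, , 1:(N - 2), drop = FALSE] >= Threshold  # day d-1
cur  <- year[, , 2:(N - 1), drop = FALSE] >= Threshold  # day d
nxt  <- year[, , 3:N,       drop = FALSE] >= Threshold  # day d+1

cont <- cur & nxt           # first branch of the original if
ends <- prev & cur & !nxt   # second branch: event ends on day d

# rowSums(dims = 2) sums over the 3rd (day) dimension for every cell
frequency <- rowSums(cont, dims = 2) + rowSums(ends, dims = 2)
number    <- rowSums(ends, dims = 2)
duration  <- frequency / number
temps     <- rowSums(year[, , 2:(N - 1), drop = FALSE] * (cont | ends), dims = 2) / frequency
```

This approach trades time for memory. Each intermediate array is about as large as year itself, so peak memory is several times the size of the input.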

Nested for-loops into apply?

I need help figuring out how to improve a for loop. It doesn't necessarily need to be apply; I just thought that was the best approach after researching it on StackOverflow. I tried following the guides I found there, but I'm a newbie and don't think I have a good grasp of the apply function.
This is a reconstruction of the snippet I'm trying to change:
for (j in 1:NROW(teste2)) {
  for (z in 1:NROW(teste2)) {
    if(is.na(teste2$DATE[z]==Sys.Date()+j-1) | z==NROW(teste2)){
      teste2$SALDO[z] <- 0
    }else{
      if(teste2$SALDO[z]!=0 & teste2$DATE[z]==Sys.Date()+j-1){
        teste2$SALDO[z+1] <- teste2$PREVISAO_FINAL2[z] + teste2$SALDO[z+1]
      }else{
        teste2$SALDO[z] <- teste2$SALDO[z]
      }
    }
  }
}
I tried doing the following:
for (j in 1:NROW(teste2)) {
  rows = 1:NROW(teste2)
  saldo_fn <- function(z){
    return(if(is.na(teste2$DATE[z]==Sys.Date()+j-1) | z==NROW(teste2)){
      teste2$SALDO[z] <- 0
    }else{
      if(teste2$SALDO[z]!=0 & teste2$DATE[z]==Sys.Date()+j-1){
        teste2$SALDO[z+1] <- teste2$PREVISAO_FINAL2[z] + teste2$SALDO[z+1]
      }else{
        teste2$SALDO[z] <- teste2$SALDO[z]
      }
    })
  }
  teste2$SALDO <- sapply(rows, saldo_fn)
}
But when I run sum(teste2$SALDO) it gives a different value.
What am I doing wrong? How do I fix it?
You cannot use an apply-family function to optimize this algorithm. The reason is this line:
teste2$SALDO[z+1] <- teste2$PREVISAO_FINAL2[z] + teste2$SALDO[z+1]
You are changing the value of the next element based on the already updated value of the current one, so each iteration depends on the previous one.
It is possible to avoid a for-loop by using recursion. As a rule of thumb: if you see something like x[i+1] = x[i+1] + x[i], use either a for-loop or a recursive function (I prefer for-loops: they are much easier, and there is no risk of a call stack overflow). If you see something like z[i] = F(x[i], y[i]), where F is some function, you can use apply-family functions.
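Here is a small sketch of the two patterns, using a made-up vector x rather than teste2. The Reduce(accumulate = TRUE) line is an alternative the answer does not mention. It carries the running value forward like the loop does, but it still processes one element at a time, so it improves readability rather than speed. For this particular update rule the result is simply cumsum(x).

```r
x <- c(3, 1, 4, 1, 5)

# Pattern 1: x[i+1] depends on the already-updated x[i], so order matters
loop_x <- x
for (i in seq_len(length(loop_x) - 1)) {
  loop_x[i + 1] <- loop_x[i + 1] + loop_x[i]
}

# Reduce() with accumulate = TRUE keeps every intermediate running value
reduce_x <- Reduce(function(acc, xi) acc + xi, x, accumulate = TRUE)

# For this specific rule the sequential update is a cumulative sum
cumsum_x <- cumsum(x)

# Pattern 2: z[i] = F(x[i], y[i]) has no dependency between elements
y <- c(2, 7, 1, 8, 2)
z <- mapply(function(a, b) max(a, b), x, y)
```

The conditional update in teste2 (it only adds when SALDO is non-zero and the date matches) is not a plain cumulative sum, so it belongs to pattern 1.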

R; Avoiding for loops

Is there a way to avoid the for-loop and if-else in R, as used in the code below?
drive_type<-vector(mode="character",length=length(my_file$xx))
for(i in 1:length(my_file$xx)){
  serial_num<-my_file$xx[i]
  # search for the value of serial_num, not the literal string "serial_num"
  temp_1<-my_file[grep(serial_num,rownames(my_file)), ]
  if('6' %in% temp_1$yy){
    drive_type[i]<-'12header'
  } else {
    drive_type[i]<-'6header'
  }
}
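Here is one possible rewrite. It assumes my_file is a data frame whose row names contain the serial numbers listed in xx; the small my_file below is made up. vapply() replaces the for loop and ifelse() replaces the if/else. vapply() still loops internally, so the gain is mostly readability, and the per-element grep() remains the expensive step. fixed = TRUE is an added assumption that serial numbers should be matched literally, not as regular expressions.

```r
# Hypothetical stand-in for my_file
my_file <- data.frame(
  xx = c("A1", "B2", "A1", "C3"),
  yy = c("6", "4", "3", "6"),
  row.names = c("A1_x", "B2_x", "A1_y", "C3_x"),
  stringsAsFactors = FALSE
)

# For each serial number: does any matching row have yy == "6"?
has_six <- vapply(my_file$xx, function(serial_num) {
  rows <- grep(serial_num, rownames(my_file), fixed = TRUE)
  "6" %in% my_file$yy[rows]
}, logical(1), USE.NAMES = FALSE)

# Vectorized if/else over the whole logical vector
drive_type <- ifelse(has_six, "12header", "6header")
drive_type
```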

Multiple conditions in if statements in R

I am trying to cut down a list of gene names that I have been given. I'm trying to eliminate any repeated names that may be present, but I keep getting an error when running my code:
counter=0
i=0
j=0
geneNamesRevised=array(dim=length(geneNames))
for (i in 0:length(geneNamesRevised))
geneNamesRevised[i]=""
geneNamesRevised
for (i in 1:length(geneNames))
for (j in 1:length(geneNamesRevised))
if (geneNames[i]==geneNamesRevised[j])
{
break
}
else if ((j==length(geneNamesRevised)-1) &&
(geneNames[i]!=geneNamesRevised[j]))
{
geneNamesRevised[counter]=geneNames[i]
counter++
}
The following message is repeated over and over:
the condition has length > 1 and only the first element will be used
This message comes from the last "else if" statement, the one that has the '&&'.
Thank you!
Why not just
geneNamesRevised <- unique( geneNames )
... which returns a shortened vector. There is also duplicated(), which removes duplicates when negated: geneNames[!duplicated(geneNames)].
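The example below uses a made-up geneNames vector. Both approaches keep the first occurrence of each name. The question might instead mean dropping every name that occurs more than once. For that case, the last line combines duplicated() with its fromLast = TRUE variant.

```r
geneNames <- c("BRCA1", "TP53", "BRCA1", "EGFR", "TP53")  # made-up example

kept_unique <- unique(geneNames)                  # first occurrence of each name
kept_dup    <- geneNames[!duplicated(geneNames)]  # same result via duplicated()

# Names that appear exactly once (every copy of a repeated name is removed)
only_once <- geneNames[!(duplicated(geneNames) | duplicated(geneNames, fromLast = TRUE))]
```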
There are a few problems in your code.
1) The else is incorrectly specified - or not :) thanks @Mohsen_Fatemi
2) & is usually what you need rather than &&
3) counter++ isn't R
Copy the code below and see if it runs:
for (i in 1:length(geneNames)){
  for (j in 1:length(geneNamesRevised)){
    if (geneNames[i]==geneNamesRevised[j])
    {
      break
    } else {
      if ((j==length(geneNamesRevised)-1) & (geneNames[i]!=geneNamesRevised[j]))
      {
        geneNamesRevised[counter]=geneNames[i]
        counter <- counter + 1
      }
    }
  }
}
Edit
4) Also, you were missing braces around the bodies of your for loops.
Use & instead of &&:
else if ((j==length(geneNamesRevised)-1) & (geneNames[i]!=geneNamesRevised[j]))

R: data simulation

I am trying to run the following code:
library(MASS)  # lmsreg() comes from the MASS package
data_greene<-read.delim(file.choose(),header=T)
result_b2_HC0<-matrix(1:2000,ncol=4)
for (i in 1:500){
X1<-data_greene[[3]]*10^-4
X2<-X1^2
e<-rnorm(50,0,1)
sigma2<-exp(5.30+5.30*X1)
lambda<-max(sigma2)/min(sigma2)
Y<-1+1*X1+0*X2+sqrt(sigma2)*e
lms<-lmsreg(Y~X1+X2)
yhat<-lms$fitted
resid<-lms$residual
s<-abs(resid)
lms2<-lmsreg(s~yhat)
shat<-lms2$fitted
w1<-1/shat^2
scale<-lms$scale[1]
stdres<-resid/scale
e=abs(stdres)
w2<-NULL
for (i in 1:50){
if(e[i]<=1.345) w2[i]<-1 else w2[i]<-1.345/e[i]
}
w<-w1*w2
WLS<-lm(Y~X1+X2,weights=w)
res1<-WLS$residual
HCCMEHC0<-function(Y,X1,X2){
X<-cbind(1,X1,X2)
W<-diag(w)
inv<-solve(t(X)%*%W%*%X)
psi0<-diag(res1^2)
HC0<-inv%*%t(X)%*%W%*%psi0%*%W%*%X%*%inv
return(HC0)
}
result_b2_HC0[i,1]<-WLS$coef[3]
result_b2_HC0[i,2]<-sqrt(HCCMEHC0(Y,X1,X2)[3,3])
result_b2_HC0[i,3]<-result_b2_HC0[i,1]/result_b2_HC0[i,2]
result_b2_HC0[i,4]<-2*pt(-abs(result_b2_HC0[i,3]),df=47)
}
result_b2_HC0
I would expect the matrix to be complete, but results only appear in row 50 of the matrix. What am I doing wrong?
You are using the same variable i in two nested for loops. Change the inner for loop to use a different variable, such as j, and update the indices inside it (e[j], w2[j]) accordingly.
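Here is a minimal sketch of why only row 50 gets filled. R's for loop takes each value from the sequence it evaluated at the start, so the outer loop still runs every iteration. However, once the inner loop finishes, i holds the inner loop's last value. Every write that uses i after the inner loop therefore lands on the same row: 50 in the question, 2 in this toy example.

```r
# Outer and inner loops share the index i (the bug)
rows_written <- integer(0)
for (i in 1:3) {
  for (i in 1:2) {
    # inner work would go here
  }
  rows_written <- c(rows_written, i)  # i is now the inner loop's last value
}
rows_written  # 2 2 2

# Same structure with a separate inner index j
rows_fixed <- integer(0)
for (i in 1:3) {
  for (j in 1:2) {
    # inner work would go here
  }
  rows_fixed <- c(rows_fixed, i)
}
rows_fixed  # 1 2 3
```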
To avoid this error, make sure you always use indentation. Also, learn how to use vector mathematics. Your second loop can be rewritten from
e=abs(stdres)
w2<-NULL
for (i in 1:50){
if(e[i]<=1.345) w2[i]<-1 else w2[i]<-1.345/e[i]
}
to
e=abs(stdres)
w2<-ifelse( e <= 1.345, 1, 1.345/e )
This is cleaner, easier to read, and faster.
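Below is a quick check, on made-up standardized residuals, that the vectorized form matches the loop. pmin(1, 1.345 / e) is a further equivalent the answer does not mention. When e <= 1.345 the ratio is at least 1, so the minimum picks 1; otherwise it picks 1.345 / e. A residual of exactly zero gives Inf, which pmin() also turns into 1.

```r
set.seed(123)
stdres <- rnorm(50)   # made-up standardized residuals
e <- abs(stdres)

# Loop version, with its own index j so it cannot clobber an outer i
w2_loop <- numeric(length(e))
for (j in seq_along(e)) {
  if (e[j] <= 1.345) w2_loop[j] <- 1 else w2_loop[j] <- 1.345 / e[j]
}

w2_ifelse <- ifelse(e <= 1.345, 1, 1.345 / e)   # the answer's version
w2_pmin   <- pmin(1, 1.345 / e)                 # alternative: cap the ratio at 1
```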