R: Avoiding for loops

Is there a way to avoid the for loop and if-else in the R code below?
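# Label each serial number by whether any matching row has yy == '6'
drive_type <- vector(mode = "character", length = length(my_file$xx))
for (i in seq_along(my_file$xx)) {
  serial_num <- my_file$xx[i]
  # match on the value of serial_num, not the literal string "serial_num"
  temp_1 <- my_file[grep(serial_num, rownames(my_file)), ]
  if ('6' %in% temp_1$yy) {
    drive_type[i] <- '12header'
  } else {
    drive_type[i] <- '6header'
  }
}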
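A minimal loop-free sketch, assuming (as the code implies) that xx holds serial numbers that appear inside the row names and that yy is the column to test. The toy my_file below is hypothetical. vapply() does the per-serial lookup and a single ifelse() replaces the if-else; vapply() still iterates internally, so this is mainly cleaner rather than guaranteed faster.

```r
# Hypothetical toy data: row names embed the serial numbers listed in xx
my_file <- data.frame(
  xx = c("A1", "B2", "C3"),
  yy = c(6, 3, 6),
  row.names = c("A1_r1", "B2_r1", "C3_r1")
)

# TRUE when any row whose name contains the serial has yy equal to '6'
has_six <- vapply(my_file$xx, function(s) {
  rows <- grep(s, rownames(my_file), fixed = TRUE)
  '6' %in% my_file$yy[rows]
}, logical(1), USE.NAMES = FALSE)

# One vectorized if-else over all serials
drive_type <- ifelse(has_six, '12header', '6header')
```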

Related

multiApply function for loop on a 3D array

I am trying to make my data processing more efficient for a spatial temperature data project. I have a for loop that does what I want, but it is much too slow for processing multiple years of data. The loop looks at each spatial cell and, based on the 365 temperature values in that year, computes the frequency, duration, number, and temperature of heat events, which go into separate 2D data frames.
for (b in 1:299) { # longitude
  for (c in 1:424) { # latitude
    data <- year[b, c, ] # all daily temps for this cell as a vector
    for (d in 2:364) {
      if (data[d] >= Threshold & data[d+1] >= Threshold) {
        frequencydf[b,c] = frequencydf[b,c] + 1
        tempsdf[b,c] = tempsdf[b,c] + data[d]
      } else if (data[d-1] >= Threshold & data[d] >= Threshold & data[d+1] < Threshold) {
        frequencydf[b,c] = frequencydf[b,c] + 1
        numberdf[b,c] = numberdf[b,c] + 1
        tempsdf[b,c] = tempsdf[b,c] + data[d]
      }
      # otherwise nothing changes for day d
    }
    durationdf[b,c] = frequencydf[b,c] / numberdf[b,c]
    tempsdf[b,c] = tempsdf[b,c] / frequencydf[b,c]
  }
}
Therefore, I am trying to use apply functions to speed up the process. I think I am running into issues when trying to analyze each spatial cell by the values along the 3rd (time) dimension of my array.
I am starting with the frequency parameter and trying to create the same data frame as above.
frequencylist <- Apply(year_array, fun = frequency.calc1, margins=c(1, 2))
frequencydf <- as.data.frame(frequencylist)
Using this function:
frequency.calc1 = function(cell) {
  data <- as.vector(cell)
  frequency <- 0
  for (d in 2:364) {
    if (data[d] >= Threshold & data[d+1] >= Threshold) {
      frequency = frequency + 1
    } else if (data[d-1] >= Threshold & data[d] >= Threshold & data[d+1] < Threshold) {
      frequency = frequency + 1
    }
  }
  # return only after the loop has visited every day
  return(frequency)
}
I am very new to creating functions and using the Apply function so any advice would be appreciated!
For loops and *apply functions run at about the same speed. Your problem is all those "if" statements, which are evaluated one day at a time.
First of all, you have two separate conditions, both of which lead to incrementing frequency. Figure out how to combine them. Next, remember that the R language is vectorized, so you don't need a loop at all. With a little careful thought, you can write a line something like
N <- length(data)
frequency <- sum(data[1:(N-2)] >= Threshold & data[2:(N-1)] >= Threshold & data[3:N] < Threshold)
I haven't checked all the ">" vs "<" but you get the idea.
As a side note, NEVER hard-code the range of a loop. You can start with 2 since your conditionals reference d-1, but let the maximum value be defined by something like length(data) - 1.
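A minimal sketch of the fully vectorized idea for one cell. Taken together, the two branches of the original loop count day d when it is at or above Threshold and at least one neighbour is too, so the combined condition is cur & (nxt | prev); the else-if branch alone (prev & cur & !nxt) marks the last day of an event. heat_stats() is a hypothetical helper name and the toy data is made up. Base R's apply() with MARGIN = c(1, 2) hands year[b, c, ] to it for every cell.

```r
# Hypothetical helper: heat-event statistics for one cell's daily temperatures,
# using days d = 2..N-1 exactly as the original loop does
heat_stats <- function(data, Threshold) {
  N <- length(data)
  above <- data >= Threshold
  d <- 2:(N - 1)
  prev <- above[d - 1]
  cur <- above[d]
  nxt <- above[d + 1]
  counted <- cur & (nxt | prev)   # both if-branches combined
  ends <- prev & cur & !nxt       # the else-if branch: last day of an event
  frequency <- sum(counted)
  number <- sum(ends)
  c(frequency = frequency,
    number = number,
    duration = frequency / number,
    temp = sum(data[d][counted]) / frequency)
}

# One cell: two events (days 2-4 and 6-7) above a threshold of 30
one_cell <- c(10, 31, 32, 33, 20, 35, 36, 10)
stats <- heat_stats(one_cell, Threshold = 30)

# Whole (toy) array: apply() passes each cell's time series to heat_stats()
set.seed(1)
year <- array(runif(2 * 3 * 10, 20, 40), dim = c(2, 3, 10))
res <- apply(year, c(1, 2), heat_stats, Threshold = 30)
frequency_mat <- res[1, , ]   # first statistic (frequency) for every cell
```

The result res has one slice per statistic, so each 2D map can be pulled out by its first index.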
The solution I used to simplify the process is shown below. Sums of logical conditions replaced the if statements. This made the process much faster and needed neither an apply function nor an additional function.
for (b in seq_len(dim(year)[1])) {   # longitude
  for (c in seq_len(dim(year)[2])) { # latitude
    data <- year[b, c, ]
    N <- length(data)
    # aligned day d-1 / day d / day d+1 comparisons, all of length N-2
    prev <- data[1:(N-2)] >= Threshold
    cur  <- data[2:(N-1)] >= Threshold
    nxt  <- data[3:N] >= Threshold
    ends <- prev & cur & !nxt   # last day of a heat event
    runs <- cur & nxt           # hot day followed by another hot day
    frequency[b,c] <- sum(runs) + sum(ends)
    number[b,c] <- sum(ends)
    duration[b,c] <- frequency[b,c]/number[b,c]
    temps[b,c] <- (sum(data[2:(N-1)][runs]) + sum(data[2:(N-1)][ends])) / frequency[b,c]
  }
}
Thank you for your help, @Carl Witthoft.

Use apply functions to avoid for loop, R

I want to optimize the following code by using one of the apply functions instead of a for loop, but without success:
A=matrix(c(1,0,1,1,1,0,0,1,0,1,0,0),nrow=6,ncol=2)
corr=c()
for(i in 1:nrow(A))
{
for(j in 0:(nrow(A)-i))
{
if(A[i,1]!=A[i+j,1] & A[i,2]!=A[i+j,2])
corr=append(corr,"2 diff")
else if(A[i,1]!=A[i+j,1] & A[i,2]==A[i+j,2])
corr=append(corr,"diff gauche")
else if(A[i,1]==A[i+j,1] & A[i,2]!=A[i+j,2])
corr=append(corr,"diff droite")
else
corr=append(corr,1)
}
}
B=matrix(nrow=NROW(A),ncol=NROW(A))
B[lower.tri(B, diag=T)]=corr
B=t(B)
I've tried sapply but did not get the right result; I may have misused the sapply function.
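A sketch that needs neither loops nor sapply, swapping in base R's outer() (a different technique from the apply family): compare every pair of rows at once, then fill a label matrix by logical indexing. It is intended to reproduce the B built by the loops, with NA below the diagonal.

```r
A <- matrix(c(1,0,1,1,1,0,0,1,0,1,0,0), nrow = 6, ncol = 2)
n <- nrow(A)

# d1[i, k] is TRUE when rows i and k differ in column 1; d2 likewise for column 2
d1 <- outer(A[, 1], A[, 1], `!=`)
d2 <- outer(A[, 2], A[, 2], `!=`)

B <- matrix("1", n, n)            # "1" = rows identical, as in the loop version
B[d1 & d2]  <- "2 diff"
B[d1 & !d2] <- "diff gauche"
B[!d1 & d2] <- "diff droite"
B[lower.tri(B)] <- NA             # the loop version fills only the upper triangle
```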

Nested for-loops into apply?

I need help figuring out how to improve a for loop. It doesn't necessarily need to be apply; I just thought it was the best way after researching it on StackOverflow. I tried following the guides I found on StackOverflow, but I'm a newbie and I don't think I have a good grasp of the apply function.
This is a reconstruction of the snippet I'm trying to change:
for (j in 1:NROW(teste2)) {
  for (z in 1:NROW(teste2)) {
    if (is.na(teste2$DATE[z] == Sys.Date()+j-1) | z == NROW(teste2)) {
      teste2$SALDO[z] <- 0
    } else {
      if (teste2$SALDO[z] != 0 & teste2$DATE[z] == Sys.Date()+j-1) {
        teste2$SALDO[z+1] <- teste2$PREVISAO_FINAL2[z] + teste2$SALDO[z+1]
      } else {
        teste2$SALDO[z] <- teste2$SALDO[z]
      }
    }
  }
}
I tried doing the following:
for (j in 1:NROW(teste2)) {
  rows = 1:NROW(teste2)
  saldo_fn <- function(z) {
    return(if (is.na(teste2$DATE[z] == Sys.Date()+j-1) | z == NROW(teste2)) {
      teste2$SALDO[z] <- 0
    } else {
      if (teste2$SALDO[z] != 0 & teste2$DATE[z] == Sys.Date()+j-1) {
        teste2$SALDO[z+1] <- teste2$PREVISAO_FINAL2[z] + teste2$SALDO[z+1]
      } else {
        teste2$SALDO[z] <- teste2$SALDO[z]
      }
    })
  }
  teste2$SALDO <- sapply(rows, saldo_fn)
}
But when I run sum(teste2$SALDO), it gives a different value from the original loop.
What am I doing wrong? How do I fix it?
You cannot use an apply-family function to optimize this algorithm. The reason is this line:
teste2$SALDO[z+1] <- teste2$PREVISAO_FINAL2[z] + teste2$SALDO[z+1]
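You are recursively changing the value of the next element based on the value of the current one, so each iteration depends on the result of the previous one and the iterations cannot be computed independently, which is what sapply assumes.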
You could avoid the for loop by using recursion instead. As a rule of thumb: if you see something like x[i+1] = x[i+1] + x[i], use either a for loop or a recursive function (I prefer for loops: they are much easier to write, and there is no risk of call-stack overflow); if you see something like z[i] = F(x[i], y[i]), where F is some function, you can use apply-family functions.
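A small illustration of that rule of thumb with made-up vectors: an element-wise F works with mapply(), while the recurrence needs an ordered loop because each step reads a value the previous step just wrote. A naive sapply() version reads the original values instead, which is the kind of mismatch behind the different sum(teste2$SALDO). For this particular additive recurrence, base R's cumsum() happens to give the same result as the loop.

```r
x <- c(1, 2, 3)
y <- c(10, 20, 30)

# z[i] = F(x[i], y[i]): elements are independent, so apply-style code works
z <- mapply(function(a, b) a * b + 1, x, y)

# v[i+1] = v[i+1] + v[i]: each step uses the value updated by the previous step,
# so the iterations must run in order
v <- c(1, 2, 3, 4)
for (i in seq_len(length(v) - 1)) {
  v[i + 1] <- v[i + 1] + v[i]
}

# Naive sapply() version: every step reads the *original* v0, so the result differs
v0 <- c(1, 2, 3, 4)
v_wrong <- c(v0[1], sapply(2:4, function(i) v0[i] + v0[i - 1]))
```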

if/else statement evaluating only else statement

Disclaimer: This is a question regarding an assignment for a Coursera course.
I'm having trouble coming up with a way to create a new column that differentiates between weekdays and weekends in my data set. I'm using a nested if/else statement within a for loop. The problem is the output makes every row 'weekday'. Does anyone see something glaringly wrong with my code? My end goal is to create a new factor variable that is either "weekend" or "weekday."
df4 <- mutate(df4, day = weekdays(df4$date))
for (i in df4$day) {
if(i %in% c("Saturday",'Sunday')) {
df4$day_type <- 'weekend'
} else {
df4$day_type <- 'weekday'
}
}
I modified your code slightly (see below):
for (i in 1 : dim(df4)[1]) {
if(df4$day[i] %in% c('Saturday','Sunday')) {
df4$day_type[i] <- 'weekend'
} else {
df4$day_type[i] <- 'weekday'
}
}
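A loop-free alternative is one vectorized ifelse() over the whole column, for example df4$day_type <- ifelse(df4$day %in% c('Saturday', 'Sunday'), 'weekend', 'weekday'). Because weekdays() returns day names in the session's locale, the sketch below (with made-up dates) tests ISO weekday numbers from format(date, "%u") instead, and wraps the result in factor() to get the factor variable the question asks for.

```r
# Toy data: Friday, Saturday, Sunday, Monday
df4 <- data.frame(date = as.Date("2024-01-05") + 0:3)

# "%u" gives the ISO weekday as "1".."7" (Monday = 1), independent of locale
is_weekend <- format(df4$date, "%u") %in% c("6", "7")

# One vectorized assignment instead of a row-by-row loop
df4$day_type <- factor(ifelse(is_weekend, "weekend", "weekday"),
                       levels = c("weekday", "weekend"))
```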

R: data simulation

I am trying to run the following code:
library(MASS)  # provides lmsreg()
data_greene<-read.delim(file.choose(),header=T)
result_b2_HC0<-matrix(1:2000,ncol=4)
for (i in 1:500){
X1<-data_greene[[3]]*10^-4
X2<-X1^2
e<-rnorm(50,0,1)
sigma2<-exp(5.30+5.30*X1)
lambda<-max(sigma2)/min(sigma2)
Y<-1+1*X1+0*X2+sqrt(sigma2)*e
lms<-lmsreg(Y~X1+X2)
yhat<-lms$fitted
resid<-lms$residual
s<-abs(resid)
lms2<-lmsreg(s~yhat)
shat<-lms2$fitted
w1<-1/shat^2
scale<-lms$scale[1]
stdres<-resid/scale
e=abs(stdres)
w2<-NULL
for (i in 1:50){
if(e[i]<=1.345) w2[i]<-1 else w2[i]<-1.345/e[i]
}
w<-w1*w2
WLS<-lm(Y~X1+X2,weights=w)
res1<-WLS$residual
HCCMEHC0<-function(Y,X1,X2){
X<-cbind(1,X1,X2)
W<-diag(w)
inv<-solve(t(X)%*%W%*%X)
psi0<-diag(res1^2)
HC0<-inv%*%t(X)%*%W%*%psi0%*%W%*%X%*%inv
return(HC0)
}
result_b2_HC0[i,1]<-WLS$coef[3]
result_b2_HC0[i,2]<-sqrt(HCCMEHC0(Y,X1,X2)[3,3])
result_b2_HC0[i,3]<-result_b2_HC0[i,1]/result_b2_HC0[i,2]
result_b2_HC0[i,4]<-2*pt(-abs(result_b2_HC0[i,3]),df=47)
}
result_b2_HC0
I would expect the matrix to be complete, but the result only appears at row 50 in the matrix. What am I doing wrong?
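You are using the same variable i in two nested for loops, so after the inner loop finishes, i is always 50 and every outer iteration writes to row 50. Change the inner for loop to use a different variable, such as j (and index e[j] and w2[j] inside it).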
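A tiny, self-contained illustration of the bug (toy loops, not the simulation itself): R's for loop still steps through its own sequence, so the outer loop runs three times, but after each inner loop i holds the inner loop's last value.

```r
res <- integer(0)
for (i in 1:3) {
  for (i in 1:5) {
    # the inner loop reuses (and overwrites) the outer loop's i
  }
  res <- c(res, i)   # i is 5 here on every outer pass
}
```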
To catch this kind of error, always indent your code. Also, learn to use vectorized operations. Your second loop can be rewritten from
e=abs(stdres)
w2<-NULL
for (i in 1:50){
if(e[i]<=1.345) w2[i]<-1 else w2[i]<-1.345/e[i]
}
to
e=abs(stdres)
w2<-ifelse( e <= 1.345, 1, 1.345/e )
This is cleaner, easier to read, and faster.
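A quick check with made-up standardized residuals that the loop and the ifelse() version give the same Huber-type weights. Since e = abs(stdres) is never negative, base R's pmin(1, 1.345 / e) is an equivalent one-liner (an alternative, not from the original answer).

```r
stdres <- c(-2.69, -0.5, 0, 1.345, 3)   # hypothetical standardized residuals
e <- abs(stdres)

# Loop version, using j so it cannot clash with an outer i
w2_loop <- numeric(length(e))
for (j in seq_along(e)) {
  if (e[j] <= 1.345) w2_loop[j] <- 1 else w2_loop[j] <- 1.345 / e[j]
}

w2_ifelse <- ifelse(e <= 1.345, 1, 1.345 / e)
w2_pmin <- pmin(1, 1.345 / e)   # 1.345 / 0 is Inf, so pmin() still returns 1
```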
