How to vectorize from if to ifelse with multiple statements? - r

I just read that vectorization improves performance and significantly lowers computation time, and that for if() else the best choice is ifelse().
My problem is that I have some if statements inside a for loop, and each if statement contains multiple assignments, like the following:
x <- matrix(1:10,10,3)
criteria <- matrix(c(1,1,1,0,1,0,0,1,0,0,
1,1,1,1,1,0,0,1,1,0,
1,1,1,1,1,1,1,1,1,1),10,3) #criteria for the ifs
output1 <- rep(list(NA),10) #storage list for output
for (i in 1:10) {
  if (criteria[i,1]>=1) {
    output1[[i]] <- colMeans(x)
    output1[[i]] <- output1[[i]][1] #part of the somefunction output
  } else {
    if(criteria[i,2]>=1) {
      output1[[i]] <- colSums(x)
      output1[[i]] <- output1[[i]][1] #the same
    } else {
      output1[[i]] <- colSums(x+1)
      output1[[i]] <- output1[[i]][1] #the same
    }
  }
}
How can I translate this into ifelse?
Thanks in advance!

Note that you don't need a for loop as all operations used are vectorized:
output2 <- ifelse(criteria[, 1] >= 1,
                  colMeans(x)[1],
                  ifelse(criteria[, 2] >= 1,
                         colSums(x)[1],
                         colSums(x+1)[1]))
identical(output1, as.list(output2))
## [1] TRUE
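As a small variation on the above (a sketch only): the three candidate values are scalars that do not depend on the row, so they can be computed once and then selected per row with the nested ifelse(). The names mean1, sum1, sum1_shift and output3 are made up for illustration.

```r
# Same data as in the question
x <- matrix(1:10, 10, 3)
criteria <- matrix(c(1,1,1,0,1,0,0,1,0,0,
                     1,1,1,1,1,0,0,1,1,0,
                     1,1,1,1,1,1,1,1,1,1), 10, 3)

# Compute each candidate value once (illustrative names)
mean1      <- colMeans(x)[1]
sum1       <- colSums(x)[1]
sum1_shift <- colSums(x + 1)[1]

# Pick one value per row; ifelse() recycles the scalars
output3 <- ifelse(criteria[, 1] >= 1, mean1,
                  ifelse(criteria[, 2] >= 1, sum1, sum1_shift))
output3
```

The result is a plain numeric vector of length 10; wrap it in as.list() if a list is really needed.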

At least you can convert two assignments into one. So instead of
output[[i]] <- somefunction(arg1,arg2,...)
output[[i]] <- output[[i]]$thing #part of the somefunction output
you can refer directly to the only part you are interested in.
output[[i]] <- somefunction(arg1,arg2,...)$thing #part of the somefunction output
Hope that it helps!

It seems I found the answer trying to build the example:
output2 <- rep(list(NA),10) #storage list for output
for (i in 1:10) {
  output2[[i]] <- ifelse(criteria[i,1]>=1,
                         yes=colMeans(x)[1],
                         no=ifelse(criteria[i,2]>=1,
                                   yes=colSums(x)[1],
                                   no=colSums(x+1)[1]))
}

Related

How to use a for loop with multiple results

I have to automate this sequence of functions:
for (i in c(15,17,20,24,25,26,27,28,29,45,50,52,55,60,62)) {
WBES_sf_angola_i <- subset(WBES_sf_angola, isic == i)
WBES_angola_i <- as_Spatial(WBES_sf_angola_i)
FDI_angola_i <- FDI_angola[FDI_angola$isic==i,]
dist_ao_i <- distm(WBES_angola_i,FDI_angola_i, fun = distGeo)/1000
rm(WBES_sf_angola_i,WBES_angola_i,FDI_angola_i)
}
As a result, I want a "dist_ao" for each i. The indexed values are to be found in the isic columns of the WBES_sf_angola and the FDI_angola datasets.
How can I embed the index in the various items' names?
EDIT:
I tried with following modification:
for (i in c(15,17,20,24,25,26,27,28,29,45,50,52,55,60,62)) {
WBES_sf_angola_i <- subset(WBES_sf_angola, isic == i)
WBES_angola_i <- as_Spatial(WBES_sf_angola_i)
FDI_angola_i <- FDI_angola[FDI_angola$isic==i,]
result_list <- list()
result_list[[paste0("dist_ao_", i)]] <- distm(WBES_angola_i,FDI_angola_i, fun = distGeo)/1000
rm(WBES_sf_angola_i,WBES_angola_i,FDI_angola_i)
}
and the output is just a list of 1 that contains dist_ao_62. How do I avoid overwriting the earlier results?
Untested (due to missing MRE) but should work:
result_list <- list()
for (i in c(15,17,20,24,25,26,27,28,29,45,50,52,55,60,62)) {
result_list[[paste0("dist_ao_", i)]] <- distm(as_Spatial(subset(WBES_sf_angola, isic == i)) , FDI_angola[FDI_angola$isic==i,], fun = distGeo)/1000
}
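The key point is that the list has to be created once, before the loop, and filled by name. A self-contained sketch of the same pattern with lapply(), assuming toy data: wbes and fdi are hypothetical stand-ins for the two datasets, and outer() stands in for distm() so that no spatial packages are needed.

```r
# Toy stand-ins for the two datasets (hypothetical values)
wbes <- data.frame(isic = c(15, 15, 17), value = c(1, 2, 3))
fdi  <- data.frame(isic = c(15, 17, 17), value = c(10, 20, 30))

codes <- c(15, 17)

# One list element per code; outer() stands in for distm() here
result_list <- lapply(codes, function(i) {
  a <- wbes$value[wbes$isic == i]
  b <- fdi$value[fdi$isic == i]
  abs(outer(a, b, "-"))
})
names(result_list) <- paste0("dist_ao_", codes)

result_list[["dist_ao_15"]]  # matrix for code 15
```

Each element keeps its own dimensions, so codes with different numbers of rows do not interfere with each other.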
You could approach it this way. All resulting objects are stored in lists, and the last line of the code combines the subsets into a single data frame. NOTE: since the question is not reproducible, I have mostly taken the code inside the loop from your question.
WBES_sf_angola_result <- list() # new name, since WBES_sf_angola is already the name of your dataset
WBES_angola_result <- list()
FDI_angola_result <- list() # new name, so the FDI_angola dataset is not overwritten
dist_ao <- list()
for (i in c(15,17,20,24,25,26,27,28,29,45,50,52,55,60,62)) {
  key <- paste0("i_", i) # list name for this isic code
  WBES_sf_angola_result[[key]] <- subset(WBES_sf_angola, isic == i)
  WBES_angola_result[[key]] <- as_Spatial(WBES_sf_angola_result[[key]])
  FDI_angola_result[[key]] <- FDI_angola[FDI_angola$isic==i,]
  dist_ao[[key]] <- distm(WBES_angola_result[[key]], FDI_angola_result[[key]], fun = distGeo)/1000
}
WBES_sf_angola_combined <- do.call(rbind, WBES_sf_angola_result) # to get a dataframe
Your subset data can also be accessed by name through the list, e.g.
WBES_sf_angola_result[["i_15"]] # for the first item.

R - Saving the values from a For loop in a vector or list

I'm trying to save each iteration of this for loop in a vector.
for (i in 1:177) {
a <- geomean(er1$CW[1:i])
}
Basically, I have a list of 177 values and I'd like the script to find the cumulative geometric mean of the list going one by one. Right now it only gives me the final value; it doesn't save each loop iteration as a separate value in a list or vector.
The reason your code does not work is that the object a is overwritten in each iteration. The following code, for instance, does precisely what you want:
a <- c()
for(i in 1:177){
a[i] <- geomean(er1$CW[1:i])
}
Alternatively, this would work as well:
for(i in 1:177){
if(i != 1){
a <- rbind(a, geomean(er1$CW[1:i]))
}
if(i == 1){
a <- geomean(er1$CW[1:i])
}
}
I started down a similar path with rbind as @nate_edwinton did, but couldn't figure it out. I did, however, come up with something effective. Hmmmm, geo_mean. Cool. Coerce back to a list.
MyNums <- data.frame(x=(1:177))
a <- data.frame(x=integer())
for(i in 1:177){
a[i,1] <- geomean(MyNums$x[1:i])
}
a<-as.list(a)
You can first define a variable to hold the results, then append to it in each iteration:
b <- c()
for (i in 1:177) {
a <- geomean(er1$CW[1:i])
b <- c(b,a)
}
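For what it's worth, the cumulative geometric mean can also be computed without a loop, since the geometric mean of the first i values is the exponential of the running mean of their logs. This is only a sketch: cw is hypothetical positive data standing in for er1$CW, and geo_mean is an assumed definition, since the geomean function used above is not shown in the question.

```r
# Hypothetical positive data standing in for er1$CW
cw <- c(2, 8, 4, 1)

# Cumulative geometric mean: exp of the running mean of the logs
cum_geomean <- exp(cumsum(log(cw)) / seq_along(cw))

# Loop version for comparison, with the result vector preallocated
geo_mean <- function(v) exp(mean(log(v)))  # assumed definition
a <- numeric(length(cw))
for (i in seq_along(cw)) {
  a[i] <- geo_mean(cw[1:i])
}

all.equal(a, cum_geomean)
```

The log-based form only makes sense for strictly positive values.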

If else statement to delete repeated values

I'm a novice R user and have created a small script that is doing some trigonometry with movement data. I need to add a final column that deletes repeated values from the column before it.
I've tried adding an if else statement that seems to work when isolated, but I keep getting errors when it is put into the for loop. I'd appreciate any advice.
# trig loop
list.df <- vector("list", max(Sp_test$ID))
names1 <- c(1:max(Sp_test$ID))
for(i in 1:max(Sp_test$ID)) {
if(i %in% unique(Sp_test$ID)) {
idata <- subset(Sp_test, ID == i)
idata$originx <- idata[1,3]
idata$originy <- idata[1,4]
idata$deltax <- idata[,"UTME"]-idata[,"originx"]
idata$deltay <- idata[,"UTMN"]-idata[,"originy"]
idata$length <- sqrt((idata[,"deltax"])^2+(idata[,"deltay"]^2))
idata$arad <- atan2(idata[,"deltay"],idata[,"deltax"])
idata$xnorm <- idata[,"deltax"]/idata[,"length"]
idata$ynorm <- idata[,"deltay"]/idata[,"length"]
sumy <- sum(idata$ynorm, na.rm=TRUE)
sumx <- sum(idata$xnorm, na.rm=TRUE)
idata$vecsum <- atan2(sumy,sumx)
idata$width <- idata$length*sin(idata$arad-idata$vecsum)
# need if else statement excluding a repeat from the position just before it
list.df[[i]] <- idata
names1[i] <- i
} }
# this works alone, I think the problem is when it gets to the first of the dataset and there is not one before it
if (idata$width[j]==idata$width[j-1]) {
print("NA")
} else {
print(idata$width[j])
}
I think you want to use the function diff for this. diff(idata$width) gives the differences between successive values of idata$width, so
idata$width[c(FALSE, diff(idata$width) == 0)] <- NA
should do what you want. The initial FALSE is needed because diff returns one element fewer than its input: there is no difference for the first element (as you rightly noted, it has no element before it).
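A quick way to see what the diff approach does, on a made-up vector (width and is_repeat are illustrative names, not part of the original script):

```r
# Hypothetical width values with consecutive repeats
width <- c(3, 3, 5, 5, 5, 2, 3)

# TRUE where a value equals the one just before it;
# the leading FALSE covers the first element
is_repeat <- c(FALSE, diff(width) == 0)

width[is_repeat] <- NA
width
```

Only consecutive repeats are blanked: the final 3 is kept because the value before it is 2.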

Relooping a function over its own output

I have defined a function which I want to reapply to its own output multiple times. I tried
replicate(1000,myfunction)
but realised that this is just applying my function to my initial input 1000 times, rather than applying my function to the new output each time. In effect what I desire is:
function(function(...function(x_0)...))
1000 times over and being able to see the changes at each stage.
I have previously defined b as a certain vector of length 7.
b_0=b
C=matrix(0,7,1000)
for(k in 1:1000){
b_k=myfun(b_(k-1))
}
C=rbind(b_k)
C
Is this the right idea behind what I want?
You could use Reduce for this. For example
add_two <- function(a) a+2
ignore_current <- function(f) function(a,b) f(a)
Reduce(ignore_current(add_two), 1:10, init=4)
# 24
Normally Reduce expects to iterate over a set of new values, but in this case I use ignore_current to drop the sequence value (1:10) so that parameter is just used to control the number of times we repeat the process. This is the same as
add_two(add_two(add_two(add_two(add_two(add_two(add_two(add_two(add_two(add_two(4))))))))))
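Since the question also asks to see the changes at each stage, here is a sketch using Reduce with accumulate = TRUE, which keeps the starting value and every intermediate result. The myfun below is a hypothetical stand-in (it just halves the vector), and n is kept small for display.

```r
# Hypothetical stand-in for myfun: halves a length-7 vector
myfun <- function(v) v / 2
b <- rep(64, 7)
n <- 5

# accumulate = TRUE returns a list holding b and every intermediate result;
# the second argument k is ignored and only counts the iterations
steps <- Reduce(function(acc, k) myfun(acc), seq_len(n), init = b,
                accumulate = TRUE)

# One column per stage: column 1 is b, column n + 1 is the final output
C <- do.call(cbind, steps)
dim(C)
```

With n = 1000 this would give a 7 x 1001 matrix, close to the C matrix sketched in the question.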
A pure functional programming approach is to use Compose from the functional package:
library(functional)
f = Reduce(Compose, replicate(100, function(x) x+2))
#> f(2)
#[1] 202
But this solution does not work when n is too large! Very interesting.
A loop would work just fine here.
apply_fun_n_times <- function(input, fun, n){
for(i in 1:n){
input <- fun(input)
}
return(input)
}
addone <- function(x){x+1}
apply_fun_n_times(1, addone, 3)
which gives
> apply_fun_n_times(1, addone, 3)
[1] 4
You can try a recursive function:
rec_func <- function(input, i=1000) {
if (i == 0) {
return(input)
} else {
input <- myfunc(input)
i <- i - 1
rec_func(input, i)
}
}
Example:
myfunc <- function(item) {item + 1}
> rec_func(1, i=1000)
[1] 1001

Output function results to a vector

I have created a function to call a function on each row in a dataset. I would like to have the output as a vector. As you can see below the function outputs the results to the screen, but I cannot figure out how to redirect the output to a vector that I can use outside the function.
n_markers <- nrow(data)
p_values <-rep(0, n_markers)
test_markers <- function()
{
for (i in 1:n_markers)
{
hets <- data[i, 2]
hom_1 <- data[i, 3]
hom_2 <- data[i, 4]
p_values[i] <- SNPHWE(hets, hom_1, hom_2)
}
return(p_values)
}
test_markers()
Did you just take this code from here? I worry that you didn't even try to figure it out on your own first, but hopefully I am wrong.
You might be overthinking this. Simply store the results of your function in a vector like you do with other functions:
stored_vector <- test_markers()
But, as mentioned in the comments, your function could probably be reduced to:
stored_vector <- sapply(1:nrow(data), function(i) SNPHWE(data[i,2],data[i,3],data[i,4]) )
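A self-contained sketch of the same idea with vapply(), which additionally checks that every call returns a single number. SNPHWE is not defined in the question, so toy_test below is a hypothetical stand-in, and marker_data is made-up data with the counts in columns 2 to 4.

```r
# Hypothetical stand-in for SNPHWE(): any function of three counts
toy_test <- function(hets, hom_1, hom_2) hets / (hets + hom_1 + hom_2)

# Made-up data: column 1 is an ID, columns 2-4 hold the counts
marker_data <- data.frame(id    = c("m1", "m2", "m3"),
                          hets  = c(10, 20, 5),
                          hom_1 = c(40, 30, 5),
                          hom_2 = c(50, 50, 10))

# One result per row, collected into a numeric vector
p_values <- vapply(seq_len(nrow(marker_data)),
                   function(i) toy_test(marker_data[i, 2],
                                        marker_data[i, 3],
                                        marker_data[i, 4]),
                   numeric(1))
p_values
```

Because the result is assigned outside any function, p_values is available for later use, which is what the original test_markers() call was missing.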
