Optimizing for + if in R - r

I am a bit lost about how to optimize for loops in R.
I have a set such that element i belongs to the set iff contains[[i]] == 1. I want to check whether sets of indices are included in this set. Currently I have the following code. Can it be written more efficiently?
contains = c(1, 0, 0, 1, 0, 1, 1, 0)
indices = c(4, 5) # not ok
# indices = c(4, 6) # ok
ok <- TRUE
for (index in indices) {
  if (contains[[index]] == 0) {
    ok <- FALSE
    break
  }
}
if (ok) {
  print("ok")
} else {
  print("not ok")
}

I would suggest either of these:
ok = all(indices %in% which(contains == 1))
ok = all(contains[indices] == 1)
They will be faster than a for loop in almost all cases. (Exception: if the vectors involved are very long and there is an early discrepancy, your break will stop searching as soon as a first false is found and probably be faster.)
If you need really fast solutions on biggish data, please share some code to simulate data at scale so we can benchmark on a relevant use case.
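As a rough sketch of how such a comparison could be set up, the snippet below simulates a larger membership vector and checks that the loop and both vectorized forms agree. The helper names (check_loop, check_in, check_subset) and the data sizes are made up for illustration and are not from the question.

```r
set.seed(1)
contains <- rbinom(1e5, 1, 0.5)                 # simulated 0/1 membership vector
indices  <- sample(which(contains == 1), 100)   # indices that are all in the set

# Original approach: loop with an early exit on the first miss
check_loop <- function(contains, indices) {
  for (index in indices) {
    if (contains[[index]] == 0) return(FALSE)
  }
  TRUE
}

# The two vectorized suggestions from the answer
check_in     <- function(contains, indices) all(indices %in% which(contains == 1))
check_subset <- function(contains, indices) all(contains[indices] == 1)

# Case 1: every index is in the set
res_ok <- c(check_loop(contains, indices),
            check_in(contains, indices),
            check_subset(contains, indices))

# Case 2: append one index that is not in the set
bad_indices <- c(indices, which(contains == 0)[1])
res_bad <- c(check_loop(contains, bad_indices),
             check_in(contains, bad_indices),
             check_subset(contains, bad_indices))
```

Wrapping each call in system.time(), or using the microbenchmark package, would then give timings at whatever size matters for the real data.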

Related

Compare function ( ==, all.equal) is not working properly when it comes to two digit numbers in R?

I am trying to compare two numbers in my code. When I compare one-digit numbers, it works fine whether I use == or the all.equal function, but when I compare numbers with two or more digits, such as 17, they are not reported as equal. I have already gone through this thread, and all.equal does not work either. Besides, my numbers are all integers. Can anyone tell me what the problem is here?
I'll put the code here so the problem can be reproduced.
library(igraph)
node1<- c(1,1,1,2,2,2,3,3,4,4,5,5,7,8,9,9,10,12,14,14,17,17,19)
node2<-c(2,3,4,5,6,17,12,14,7,8,6,13,14,9,10,11,11,13,16,15,18,19,20)
AZADEH_GRAPH.data <- data.frame(node1,node2)
dataframe_AZADEH_GRAPH<-AZADEH_GRAPH.data
graph_AZADEH_GRAPH=graph.data.frame(dataframe_AZADEH_GRAPH,directed=FALSE)
Nodes1_AZADEH_GRAPH<- replicate(vcount(graph_AZADEH_GRAPH), 0)
SuperEgo_AZADEH_GRAPH<- list()
Com_AZADEH_GRAPH<- list()
community_member <-matrix()
neghbor_list<-list()
count_neighbors<-list()
community_1<-list()
SuperEgo_AZADEH_GRAPH[[2]]=make_ego_graph(graph_AZADEH_GRAPH,2,
V(graph_AZADEH_GRAPH)$name[2],
mode = "all",mindist = 0)
Com_AZADEH_GRAPH[[2]] <- cluster_infomap(SuperEgo_AZADEH_GRAPH[[2]][[1]])
community_member<-data.matrix(membership(Com_AZADEH_GRAPH[[2]]))
neghbor_list[2]=ego(graph_AZADEH_GRAPH, order = 1,
nodes = V(graph_AZADEH_GRAPH)$name[2], mode = "all",mindist = 1)
count_neighbors[2]=length(neghbor_list[[2]])
for (k in 1:nrow(community_member))
{
  RRR<-cbind(community_member,as.integer(rownames(community_member)[k]))
}
for (n in 1:nrow(RRR))
{
  RRR[n,2]<-as.integer(rownames(RRR)[n])
}
for (i in 1: length(neghbor_list[[2]]))
{
  for (j in 1:nrow(RRR))
  {
    if (neghbor_list[[2]][i]==RRR[[j,2]])
    {
      community_1[i]=RRR[[j,1]]
    }
  }
}
The problem is with the if statement. Specifically, when i = 3 and j = 6, neghbor_list[[2]][3] and RRR[[6,2]] both return 17, but the comparison still gives FALSE. It works fine when i = 1 and i = 2.
(Posted solution on behalf of the question author.)
The issue has been found: the comparison was referring to the vertex indexes. I should have used $name after neghbor_list[[2]][3].
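A minimal sketch of that difference, assuming the igraph package is installed. The three-edge graph and the variable names are made up for illustration. As far as I understand, a vertex sequence returned by ego() prints vertex names, but the numbers stored in it are vertex positions, so comparing it directly with a label can compare the wrong values.

```r
library(igraph)

# Small undirected graph whose vertex names differ from their positions
edges <- data.frame(from = c(1, 2, 2), to = c(2, 5, 17))
g <- graph_from_data_frame(edges, directed = FALSE)

# Neighbours of the vertex named "2" (mindist = 1 excludes the vertex itself)
nb <- ego(g, order = 1, nodes = "2", mode = "all", mindist = 1)[[1]]

# Internal vertex indexes (positions in V(g)), not the labels
neighbor_ids <- as.integer(nb)

# The labels taken from the data frame, converted back to integers
neighbor_names <- as.integer(nb$name)
```

Comparing neighbor_names (or nb$name) with the values stored in RRR matches labels against labels, which is what the loop in the question intends.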

if else statement concatenation - R

This is a very common question: 1, 2, 3, 4, 5, and still I cannot find an answer to my problem.
If a == 1, then do X.
If a == 0, then do Y.
If a == 0 and b == 1, then do Z.
Just to explain: the if else statement has to do Y if a == 0, no matter the value of b. But if b == 1 and a == 0, Z makes additional changes on top of those already made by Y.
My current code and its error:
if (a == 1){
  X
} else if(a == 0){
  Y
} else if (a == 0 & b == 1){
  Z
}
Error in !criterion : invalid argument type
An else only happens if a previous if hasn't happened.
When you say
But if b == 1 and a == 0, Z will do additional changes to those already done by Y
Then you have two options:
## Option 1: nest Z inside Y
if (a == 1){
  X
} else if(a == 0){
  Y
  if (b == 1){
    Z
  }
}
## Option 2: just use `if` again (not `else if`):
if (a == 1) {
  X
} else if(a == 0) {
  Y
}
if (a == 0 & b == 1) {
  Z
}
Really, you don't need any else here at all.
## This will work just as well
## (assuming that `X` can't change the value of a from 1 to 0)
if (a == 1) {
  X
}
if (a == 0) {
  Y
  if (b == 1){
    Z
  }
}
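To make the control flow concrete, here is a small runnable sketch of the nested version. The function run_actions and the strings "X", "Y" and "Z" are hypothetical stand-ins for the real actions; each one only records that it ran.

```r
# Hypothetical stand-ins for X, Y and Z: each just records that it ran
run_actions <- function(a, b) {
  done <- character(0)
  if (a == 1) {
    done <- c(done, "X")
  }
  if (a == 0) {
    done <- c(done, "Y")      # Y runs whenever a == 0, whatever b is
    if (b == 1) {
      done <- c(done, "Z")    # Z runs in addition to Y, never instead of it
    }
  }
  done
}

run_actions(1, 1)  # "X"
run_actions(0, 0)  # "Y"
run_actions(0, 1)  # "Y" "Z"
```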
Typically else is needed when you want to have a "final" action that is done only if none of the previous if options were used, for example:
# try to guess my number between 1 and 10
if (your_guess == 8) {
  print("Congratulations, you guessed my number!")
} else if (your_guess == 7 | your_guess == 9) {
  print("Close, but not quite")
} else {
  print("Wrong. Not even close!")
}
In the above, else is useful because I don't want to have to enumerate all the other possible guesses (or even bad inputs) that a user might enter. If they guess 8, they win. If they guess 7 or 9, I tell them they were close. Anything else, no matter what it is, I just say "wrong".
Note: this is true for programming languages in general. It is not unique to R.
However, since this is in the R tag, I should mention that R has if{}else{} and ifelse(), and they are different.
if{} (and optionally else{}) evaluates a single condition, and you can run code to do anything in {} depending on that condition.
ifelse() is a vectorized function; its arguments are test, yes, and no. The test evaluates to a logical vector of TRUE and FALSE values. The yes and no arguments should be vectors of the same length as test (shorter ones are recycled). The result is a vector of the same length as test, with the corresponding values of yes (where test is TRUE) and no (where test is FALSE).
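A short sketch of the difference, using a made-up vector x: if/else makes one decision from a single TRUE or FALSE, while ifelse() makes one decision per element.

```r
x <- c(3, 8, 1, 10)

# if/else: the condition is a single TRUE/FALSE, so one branch runs once
if (length(x) > 3) {
  size <- "long"
} else {
  size <- "short"
}

# ifelse(): the test is a logical vector, so the result has one entry per element
label <- ifelse(x > 5, "big", "small")
```

Here size ends up as a single string, whereas label has the same length as x.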
I believe you want to include Z in the second condition like this:
if (a == 1) {
  X
} else if (a == 0) {
  Y
  if (b == 1) {Z}
}

NSGA2 Genetic Algorithm in R

I am working with the NSGA2 implementation in R (library mco).
My NSGA2 code takes forever to run, so I am wondering:
1) Is there a way to limit the precision of the solution values (say, maybe up to 3 decimal places) instead of infinite?
2) How do I set an equality constraint (the examples online all seem to cover >= or <= rather than =)? I am not sure I am doing it right.
My entire relevant code for reference, for easy tracing: https://docs.google.com/document/d/1xj7OPng11EzLTTtWLdRWMm8zJ9f7q1wsx2nIHdh3RM4/edit?usp=sharing
Relevant sample part of code reproduced here:
VTR = get.hist.quote(instrument = 'VTR',
start="2010-01-01", end = "2015-12-31",
quote = c("AdjClose"),provider = "yahoo",
compress = "d")
ObjFun1 <- function (xh){
  f1 <- sum(HSVaR_P(merge(VTR, CMI, SPLS, KSS, DVN, MAT, LOE, KEL, COH, AXP), xh, 0.05, 2))
  tempt = merge(VTR, CMI, SPLS, KSS, DVN, MAT, LOE, KEL, COH, AXP)
  tempt2 = tempt[(nrow(tempt)-(2*N)):nrow(tempt),]
  for (i in 1:nrow(tempt2))
  {
    for (j in 1:ncol(tempt2))
    {
      if (is.na(tempt2[i,j]))
      {
        tempt2[i,j] = 0
      }
    }
  }
  f2 <- ((-1)*abs(sum((xh*t(tempt2)))))
  c(f1=f1,f2=f2)
}
Constr <- function(xh){
  totwt <- (1-sum(-xh))
  totwt2 <- (sum(xh)-1)
  c(totwt,totwt2)
}
Solution1 <- nsga2(ObjFun1, n.projects, 2,
lower.bounds=rep(0,n.projects), upper.bounds=rep(1,n.projects),
popsize=n.solutions, constraints = Constr, cdim=1,
generations=generations)
The function HSVaR_P returns matrix(x,2*500,1).
Even when I set generations = 1, the code does not seem to run. Clearly there should be some error in the code, somewhere, but I am not entirely sure about the mechanics of the NSGA2 algorithm.
Thanks.
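On question 2, one common way to express an equality constraint for an inequality-only solver is to write it as two inequalities with a small tolerance. The sketch below is only that idea in base R. It assumes that nsga2 treats a constraint as satisfied when its value is >= 0 and that cdim should match the length of the returned vector; both assumptions should be checked against the mco documentation. The name constr_sum_to_one and the tolerance eps are hypothetical.

```r
# Hypothetical tolerance for "equal to 1"; tighten or loosen as needed
eps <- 1e-3

# Equality sum(xh) == 1 written as two ">= 0" inequalities
constr_sum_to_one <- function(xh) {
  s <- sum(xh)
  c(s - (1 - eps),   # >= 0 when sum(xh) >= 1 - eps
    (1 + eps) - s)   # >= 0 when sum(xh) <= 1 + eps
}

feasible   <- constr_sum_to_one(c(0.25, 0.25, 0.5))  # weights sum to 1
infeasible <- constr_sum_to_one(c(0.5, 0.6, 0.1))    # weights sum to 1.2
```

Under those assumptions, such a function would be passed as constraints with cdim = 2, since it returns two values.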

Count Sequential Occurrences of Value in Vector

Given a generic vector x (which could be numeric, character, factor, etc.), I need to be able to count the number of sequential occurrences of value, including singletons.
x <- c(0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1)
In this case, the following would be returned if value == 0.
[1] 1 1 2 3
This code below works, but it is very slow when x gets large. Does anyone know of a way to speed this up? I imagine there must be a clever vectorized way of doing this.
getSequential <- function(x, value) {
  counts <- c()
  last <- FALSE
  for (i in 1:length(x)) {
    if (last & x[i] == value) {
      counts[length(counts)] <- counts[length(counts)] + 1
    } else if (x[i] == value) {
      counts <- c(counts, 1)
      last <- TRUE
    } else {
      last <- FALSE
    }
  }
  return(counts)
}
You can use rle:
rle(x)$lengths[which(rle(x)$values==0)]
# [1] 1 1 2 3
For speed, you can run rle only once:
x1 <- rle(x)
x1$lengths[which(x1$values==0)]
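Wrapped as a function with the same signature as getSequential, the rle approach could look like the sketch below. The name count_runs is made up, and the factor conversion is an added assumption, since rle() only accepts atomic vectors and the question mentions factors.

```r
# Hypothetical wrapper around the rle() approach
count_runs <- function(x, value) {
  if (is.factor(x)) x <- as.character(x)   # rle() does not accept factors
  runs <- rle(x)                            # run lengths and run values
  runs$lengths[which(runs$values == value)] # keep only runs of `value`
}

x <- c(0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1)
zero_runs <- count_runs(x, 0)

f <- factor(c("a", "a", "b", "a"))
a_runs <- count_runs(f, "a")
```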
Well, the code is quite good. I doubt using rle and with or which together would increase speed of the algorithm (much).
My proposition:
static int[] counting(int[] input, int value) {
    int[] len = new int[input.length]; // assume worst case scenario (Java zero-initializes the array)
    int j = 0;
    for (int i = 0; i < input.length; i++) { // 2 unit operations
        if (input[i] != value && len[j] == 0) // followed relatively often, 3-4 unit operations (1-2 unit operations on this step)
            continue; // 5 unit operations
        else if (input[i] == value) // also followed relatively often, 4 unit operations (with '&&', if the first operand is false, the second is not checked)
            len[j]++; // 5 unit operations
        else /* input[i] != value && len[j] != 0 */ { // 4 unit operations (the commented condition is not needed: it is the only remaining possible outcome)
            j++; // 5 unit operations
            len[j] = 0; // 7 unit operations (access and substitution)
            continue; // 8 unit operations
        }
    }
    return len; // entries after the last run are left at 0
}
As you can see, each iteration takes at most 8 unit operations and at best 4-5. In the worst case, n/2 iterations go through the 8-operation path, but I suppose most iterations would follow one of the 5-step paths.
Perhaps rle and those other functions are better optimized, but the question is whether they are optimized for your problem. I recommend checking.

R - Arrays with variable dimension

I have a weird question.
Essentially, I have a function which takes a data frame of dimension Nx(2k) and transforms it into an array of dimension Nx2xk. I then further use that array in various locations in the function.
My issue is this: when the data frame has only two columns (so nGroups is 1 in the code below), I'm left with a matrix of dimension Nx2, and even worse, if N = 1, I'm stuck with a matrix of dimension 1x2.
I would like to write myArray[thisRow,,] to select that slice of the array, but this fails for the 1x2 case. I tried myArray[thisRow,,,drop = FALSE], but that gives an 'incorrect number of dimensions' error. The same issue arises for the Nx2 case.
Is there a workaround for this issue, or do I need to break my code into cases?
Sample Code Shown Below:
library(abind)
thisFunction <- function(myDF)
{
  nGroups = NCOL(myDF)/2
  afMyArray = myDF
  if(nGroups > 1)
  {
    afMyArray = abind(lapply(1:nGroups, function(g){myDF[,2*(g-1) + 1:2]}),
                      along = 3)
  }
  sapply(1:NROW(myDF),
         function(r)
         {
           thisSlice = afMyArray[r,,]
           # some operation on thisSlice
         })
}
Thanks,
James
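One possible workaround, sketched here in base R rather than with abind, is to always build a three-dimensional array so that the same indexing works for every N and k. The helper name to_array is hypothetical, and the sketch assumes the columns are all numeric and ordered in pairs, as in the question.

```r
# Hypothetical helper: reshape an N x (2k) data frame into an N x 2 x k array.
# Because R stores matrices column by column, element [r, c, g] of the result
# is column 2*(g-1) + c of row r in the data frame.
to_array <- function(myDF) {
  nGroups <- ncol(myDF) / 2
  array(as.matrix(myDF), dim = c(nrow(myDF), 2, nGroups))
}

# Edge case from the question: one row, two columns
df1  <- data.frame(a = 1, b = 2)
arr1 <- to_array(df1)
slice_kept <- arr1[1, , , drop = FALSE]          # stays 1 x 2 x 1
slice_mat  <- matrix(arr1[1, , ], nrow = 2)      # 2 x k matrix, even when k is 1

# General case: three rows, two groups
df2  <- data.frame(matrix(1:12, nrow = 3))
arr2 <- to_array(df2)
```

Since the array always has three dimensions, arr[r, , , drop = FALSE] no longer raises the 'incorrect number of dimensions' error, and the matrix() call gives a slice with a fixed 2 x k shape.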
