Loop for minimum spanning tree does not work - r

we as 3 friends try to solve minimum spanning tree with coflicts problem using r. In solving this question, we read files in .txt format that contain for ex.
"1 2 5
2 4 6" etc. which indicates from node 1 to 2, there exists an edge with weight 5 and
"1 2 2 4" etc. which indicates there's a conflict relationship between the edges 1-2 and 2-4. To continue, we have to form an nxn conflict matrix in which we will store 0's if there exist no conflict relation between the edges or 1 if there exist a conflict relation. For this purpose, we developed a 3-for loop for(i in 1:dim(edges_read)[1]){
for(i in 1:dim(edges_read)[1]){
for(k in 1:dim(edges_read)[1]){
for(t in 1:dim(conflicts)[1]){
if(all(conflicts[t,] == c(edges_read[i,1], edges_read[i,2],
edges_read[k,1], edges_read[k,2]) )){
conflictmatrix[i,k] <- 1
}
}
}
}
However, R cannot get us a solution and this for loops take very long times. How can we solve this situation? Thanks for further assistance

As you have discovered, for() loops are not fast in R. There are faster approaches, but it's hard to provide examples without data. Please use something like dput(edges_read) and dput(conflicts) to provide a small example of the data.
As one example, you could implement the for loops in the Rcpp package for speed improvement. Based on the code in your question, you could re-implement the 3-loop code sort of like this:
Rcpp::cppFunction('NumericVector MSTC_nxn_Cpp(NumericMatrix edges_read, NumericMatrix conflicts){
int n = edges_read.nrow(); //output matrix size (adjust to what you need)
int m = conflicts.nrow(); //output matrix size (adjust to what you need)
NumericMatrix conflictmatrix( n , m ); //the output matrix
for(int i=0;i<n;i++){ //your i loop
for(int k=0;k<n;k++){ // your k loop
double te = edges_read( i, 0 ); //same as edges_read[i,1]
double tf = edges_read( i, 1 ); //same as edges_read[i,2]
double tg = edges_read( k, 0 ); //same as edges_read[k,1]
double th = edges_read( k, 1 ); //same as edges_read[k,2]
NumericVector w = NumericVector::create(te,tf,tg,th); //this could probably be more simple
for(int t=0;t<m;t++){ //your t loop
NumericVector v = conflicts( t , _ ); // same as conflicts[t,]
LogicalVector r; //vector for checking if conflicts and edges are the same
for(int p=0; p<4; p++){ //loop to check logic
r[p]=v[p]==w[p]; //True / False stored
};
int q = r.size();
for (int ii = 0; ii < q; ++ii) { //similar to all() This code could be simplified!
if (!r[ii]) {false;}
else{conflictmatrix[i,k] = 1;}}
}}}
return conflictmatrix; //your output
}')
#Then run the function
MSTC_nxn_Cpp(edges_read, conflicts )

Related

How to find a pair of numbers in a list given a specific range?

The problem is as such:
given an array of N numbers, find two numbers in the array such that they will have a range(max - min) value of K.
for example:
input:
5 3
25 9 1 6 8
output:
9 6
So far, what i've tried is first sorting the array and then finding two complementary numbers using a nested loop. However, because this is a sort of brute force method, I don't think it is as efficient as other possible ways.
import java.util.*;
public class Main {
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
int n = sc.nextInt(), k = sc.nextInt();
int[] arr = new int[n];
for(int i = 0; i < n; i++) {
arr[i] = sc.nextInt();
}
Arrays.sort(arr);
int count = 0;
int a, b;
for(int i = 0; i < n; i++) {
for(int j = i; j < n; j++) {
if(Math.max(arr[i], arr[j]) - Math.min(arr[i], arr[j]) == k) {
a = arr[i];
b = arr[j];
}
}
}
System.out.println(a + " " + b);
}
}
Much appreciated if the solution was in code (any language).
Here is code in Python 3 that solves your problem. This should be easy to understand, even if you do not know Python.
This routine uses your idea of sorting the array, but I use two variables left and right (which define two places in the array) where each makes just one pass through the array. So other than the sort, the time efficiency of my code is O(N). The sort makes the entire routine O(N log N). This is better than your code, which is O(N^2).
I never use the inputted value of N, since Python can easily handle the actual size of the array. I add a sentinel value to the end of the array to make the inner short loops simpler and quicker. This involves another pass through the array to calculate the sentinel value, but this adds little to the running time. It is possible to reduce the number of array accesses, at the cost of a few more lines of code--I'll leave that to you. I added input prompts to aid my testing--you can remove those to make my results closer to what you seem to want. My code prints the larger of the two numbers first, then the smaller, which matches your sample output. But you may have wanted the order of the two numbers to match the order in the original, un-sorted array--if that is the case, I'll let you handle that as well (I see multiple ways to do that).
# Get input
N, K = [int(s) for s in input('Input N and K: ').split()]
arr = [int(s) for s in input('Input the array: ').split()]
arr.sort()
sentinel = max(arr) + K + 2
arr.append(sentinel)
left = right = 0
while arr[right] < sentinel:
# Move the right index until the difference is too large
while arr[right] - arr[left] < K:
right += 1
# Move the left index until the difference is too small
while arr[right] - arr[left] > K:
left += 1
# Check if we are done
if arr[right] - arr[left] == K:
print(arr[right], arr[left])
break

Course Scheduling from Leetcode

This is my solution to Course Scheduling Problem from leetcode. I am looking for any suggestions to improve my code, even slightest ones.
Here is the question:
There are a total of n courses you have to take, labeled from 0 to n-1.
Some courses may have prerequisites, for example to take course 0 you have to first take course 1, which is expressed as a pair: [0,1]
Given the total number of courses and a list of prerequisite pairs, return the ordering of courses you should take to finish all courses.
There may be multiple correct orders, you just need to return one of them. If it is impossible to finish all courses, return an empty array.
Example 1:
Input: 2, [[1,0]]
Output: [0,1]
Explanation: There are a total of 2 courses to take. To take course 1 you should have finished course 0. So the correct course order is [0,1] .
Example 2:
Input: 4, [[1,0],[2,0],[3,1],[3,2]]
Output: [0,1,2,3] or [0,2,1,3]
Explanation: There are a total of 4 courses to take. To take course 3 you should have finished both courses 1 and 2. Both courses 1 and 2 should be taken after you finished course 0. So one correct course order is [0,1,2,3]. Another correct ordering is [0,2,1,3].
Here is my solution:
class Solution:
def findOrder(self, numCourses, prerequisites):
"""
:type numCourses: int
:type prerequisites: List[List[int]]
:rtype: bool
"""
#Convert prerequisites into an adjacency list
adj = []
for i in range(numCourses):
adj.append(set())
for pair in prerequisites:
adj[pair[0]].add(pair[1])
def DFSHelper(s):
visited.add(s)
stack.add(s)
for neighbor in adj[s]:
# if neighbor vertex has never been visted before, there is no way it could be a backedge.
# visit this unvisited vertex
if(neighbor not in visited):
if(not DFSHelper(neighbor)):
return False
Sorted.append(neighbor)
else:
if(neighbor in stack):
return False
stack.remove(s)
return True
visited = set()
stack = set()
Sorted = []
for j in range(len(adj)):
if(j not in visited):
if(not DFSHelper(j)):
print(j)
return []
Sorted.append(j)
return Sorted
I first converted given prerequisites list into an adjacency list representation of graph, then did topological sorting of the graph. I used DFS recursively to topologically sort the graph. The list Sorted stores the result of sorting. While doing DFS I also checked if the graph contains any cycle, if it does just return []. For purpose of checking cycle I maintained a set called stack that stores all the vertices that are currently in call stack.
This is a simple question first create a graph and then find topological sorting on nodes.
If topological order contains all nodes then we have our answer else not possible to finish all the courses.
class Solution {
public int[] findOrder(int n, int[][] prerequisites) {
List<Integer>[] g = new ArrayList[n];
for(int i = 0; i < n; i++)g[i] = new ArrayList();
int[] deg = new int[n];
for(int[] e: prerequisites) {
g[e[1]].add(e[0]);
deg[e[0]]++;
}
Queue<Integer> q = new LinkedList();
for(int i = 0; i < n; i++) {
if(deg[i] == 0)q.add(i);
}
int[] res = new int[n];
int idx = 0;
while(!q.isEmpty()) {
int u = q.poll();
res[idx++] = u;
for(int v: g[u]) {
deg[v]--;
if(deg[v] == 0) q.add(v);
}
}
return idx == n ? res: new int[0];
}}

Recursion confusion with local variables

I'm trying to improve my recursion skill(reading a written recursion function) by looking at examples. However, I can easily get the logic of recursions without local variables. In below example, I can't understand how the total variables work. How should I think a recursive function to read and write by using local variables? I'm thinking it like stack go-hit-back. By the way, I wrote the example without variables. I tried to write just countThrees(n / 10); instead of total = total + countThrees(n / 10); but it doesn't work.
with total variable:
int countThrees(int n) {
if (n == 0) { return 0; }
int lastDigit = n % 10;
int total = 0;
total = total + countThrees(n / 10);
if (lastDigit == 3) {
total = total + 1;
}
return total;
}
simplified version
int countThrees(int x)
{
if (x / 10 == 0) return 0;
if (x % 10 == 3)
return 1 + countThrees(x / 10);
return countThrees(x / 10);
}
In both case, you have to use a stack indeed, but when there are local variables, you need more space in the stack as you need to put every local variables inside. In all cases, the line number from where you jump in a new is also store.
So, in your second algorithme, if x = 13, the stack will store "line 4" in the first step, and "line 4; line 3" in the second one, in the third step you don't add anything to the stack because there is not new recursion call. At the end of this step, you read the stack (it's a First in, Last out stack) to know where you have to go and you remove "line 3" from the stack, and so.
In your first algorithme, the only difference is that you have to add the locale variable in the stack. So, at the end of the second step, it looks like "Total = 0, line 4; Total = 0, line 4".
I hope to be clear enough.
The first condition should read:
if (x == 0) return 0;
Otherwise the single 3 would yield 0.
And in functional style the entire code reduces to:
return x == 0 ? 0
: countThrees(x / 10) + (x % 10 == 3 ? 1 : 0);
On the local variables:
int countThrees(int n) {
if (n == 0) {
return 0;
}
// Let an alter ego do the other digits:
int total = countThrees(n / 10);
// Do this digit:
int lastDigit = n % 10;
if (lastDigit == 3) {
++total;
}
return total;
}
The original code was a bit undecided, when or what to do, like adding to total after having it initialized with 0.
By declaring the variable at the first usage, things become more clear.
For instance the absolute laziness: first letting the recursive instances calculate the total of the other digits, and only then doing the last digit oneself.
Using a variable lastDigit with only one usage is not wrong; it explains what is happening: you inspect the last digit.
Preincrement operator ++x; is x += 1; is x = x + 1;.
One could have done it (recursive call and own work) the other way around, so it probably says something about the writer's psychological preferences
The stack usage: yes total before the recursive call is an extra variable on the stack. Irrelevant for numbers. Also a smart compiler could see that total is a result.
On the usage of variables: they can be stateful, and hence are useful for turning recursion into iteration. For that tail recursion is easiest: the recursion happening last.
int countThrees(int n) {
int total = 0;
while (n != 0) {
int digit = n % 10;
if (digit == 3) {
++total;
}
n /= 10; // Divide by 10
}
return total;
}

Fastest way to drop rows with missing values?

I'm working with a large dataset x. I want to drop rows of x that are missing in one or more columns in a set of columns of x, that set being specified by a character vector varcols.
So far I've tried the following:
require(data.table)
x <- CJ(var1=c(1,0,NA),var2=c(1,0,NA))
x[, textcol := letters[1:nrow(x)]]
varcols <- c("var1","var2")
x[, missing := apply(sapply(.SD,is.na),1,any),.SDcols=varcols]
x <- x[!missing]
Is there a faster way of doing this?
Thanks.
This should be faster than using apply:
x[rowSums(is.na(x[, ..varcols])) == 0, ]
# var1 var2 textcol
# 1: 0 0 e
# 2: 0 1 f
# 3: 1 0 h
# 4: 1 1 i
Here is a revised version of a c++ solution with a number of modifications based on a long discussion with Matthew (see comments below). I am new to c so I am sure that someone might still be able to improve this.
After library("RcppArmadillo") you should be able to run the whole file including the benchmark using sourceCpp('cleanmat.cpp'). The c++-file includes two functions. cleanmat takes two arguments (X and the index of the columns) and returns the matrix without the columns with missing values. keep just takes one argument X and returns a logical vector.
Note about passing data.table objects: These functions do not accept a data.table as an argument. The functions have to be modified to take DataFrame as an argument (see here.
cleanmat.cpp
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
mat cleanmat(mat X, uvec idx) {
// remove colums
X = X.cols(idx - 1);
// get dimensions
int n = X.n_rows,k = X.n_cols;
// create keep vector
vec keep = ones<vec>(n);
for (int j = 0; j < k; j++)
for (int i = 0; i < n; i++)
if (keep[i] && !is_finite(X(i,j))) keep[i] = 0;
// alternative with view for each row (slightly slower)
/*vec keep = zeros<vec>(n);
for (int i = 0; i < n; i++) {
keep(i) = is_finite(X.row(i));
}*/
return (X.rows(find(keep==1)));
}
// [[Rcpp::export]]
LogicalVector keep(NumericMatrix X) {
int n = X.nrow(), k = X.ncol();
// create keep vector
LogicalVector keep(n, true);
for (int j = 0; j < k; j++)
for (int i = 0; i < n; i++)
if (keep[i] && NumericVector::is_na(X(i,j))) keep[i] = false;
return (keep);
}
/*** R
require("Rcpp")
require("RcppArmadillo")
require("data.table")
require("microbenchmark")
# create matrix
X = matrix(rnorm(1e+07),ncol=100)
X[sample(nrow(X),1000,replace = TRUE),sample(ncol(X),1000,replace = TRUE)]=NA
colnames(X)=paste("c",1:ncol(X),sep="")
idx=sample(ncol(X),90)
microbenchmark(
X[!apply(X[,idx],1,function(X) any(is.na(X))),idx],
X[rowSums(is.na(X[,idx])) == 0, idx],
cleanmat(X,idx),
X[keep(X[,idx]),idx],
times=3)
# output
# Unit: milliseconds
# expr min lq median uq max
# 1 cleanmat(X, idx) 253.2596 259.7738 266.2880 272.0900 277.8921
# 2 X[!apply(X[, idx], 1, function(X) any(is.na(X))), idx] 1729.5200 1805.3255 1881.1309 1913.7580 1946.3851
# 3 X[keep(X[, idx]), idx] 360.8254 361.5165 362.2077 371.2061 380.2045
# 4 X[rowSums(is.na(X[, idx])) == 0, idx] 358.4772 367.5698 376.6625 379.6093 382.5561
*/
For speed, with a large number of varcols, perhaps look to iterate by column. Something like this (untested) :
keep = rep(TRUE,nrow(x))
for (j in varcols) keep[is.na(x[[j]])] = FALSE
x[keep]
The issue with is.na is that it creates a new logical vector to hold its result, which then must be looped through by R to find the TRUEs so it knows which of the keep to set FALSE. However, in the above for loop, R can reuse the (identically sized) previous temporary memory for that result of is.na, since it is marked unused and available for reuse after each iteration completes. IIUC.
1. is.na(x[, ..varcols])
This is ok but creates a large copy to hold the logical matrix as large as length(varcols). And the ==0 on the result of rowSums will need a new vector, too.
2. !is.na(var1) & !is.na(var2)
Ok too, but ! will create a new vector again and so will &. Each of the results of is.na have to be held by R separately until the expression completes. Probably makes no difference until length(varcols) increases a lot, or ncol(x) is very large.
3. CJ(c(0,1),c(0,1))
Best so far but not sure how this would scale as length(varcols) increases. CJ needs to allocate new memory, and it loops through to populate that memory with all the combinations, before the join can start.
So, the very fastest (I guess), would be a C version like this (pseudo-code) :
keep = rep(TRUE,nrow(x))
for (j=0; j<varcols; j++)
for (i=0; i<nrow(x); i++)
if (keep[i] && ISNA(x[i,j])) keep[i] = FALSE;
x[keep]
That would need one single allocation for keep (in C or R) and then the C loop would loop through the columns updating keep whenever it saw an NA. The C could be done in Rcpp, in RStudio, inline package, or old school. It's important the two loops are that way round, for cache efficiency. The thinking is that the keep[i] && part helps speed when there are a lot of NA in some rows, to save even fetching the later column values at all after the first NA in each row.
Two more approaches
two vector scans
x[!is.na(var1) & !is.na(var2)]
join with unique combinations of non-NA values
If you know the possible unique values in advance, this will be the fastest
system.time(x[CJ(c(0,1),c(0,1)), nomatch=0])
Some timings
x <-data.table(var1 = sample(c(1,0,NA), 1e6, T, prob = c(0.45,0.45,0.1)),
var2= sample(c(1,0,NA), 1e6, T, prob = c(0.45,0.45,0.1)),
key = c('var1','var2'))
system.time(x[rowSums(is.na(x[, ..varcols])) == 0, ])
user system elapsed
0.09 0.02 0.11
system.time(x[!is.na(var1) & !is.na(var2)])
user system elapsed
0.06 0.02 0.07
system.time(x[CJ(c(0,1),c(0,1)), nomatch=0])
user system elapsed
0.03 0.00 0.04

iterative version of easy recursive algorithm

I have a quite simple question, I think.
I've got this problem, which can be solved very easily with a recursive function, but which I wasn't able to solve iteratively.
Suppose you have any boolean matrix, like:
M:
111011111110
110111111100
001111111101
100111111101
110011111001
111111110011
111111100111
111110001111
I know this is not an ordinary boolean matrix, but it is useful for my example.
You can note there is sort of zero-paths in there...
I want to make a function that receives this matrix and a point where a zero is stored and that transforms every zero in the same area into a 2 (suppose the matrix can store any integer even it is initially boolean)
(just like when you paint a zone in Paint or any image editor)
suppose I call the function with this matrix M and the coordinate of the upper right corner zero, the result would be:
111011111112
110111111122
001111111121
100111111121
110011111221
111111112211
111111122111
111112221111
well, my question is how to do this iteratively...
hope I didn't mess it up too much
Thanks in advance!
Manuel
ps: I'd appreciate if you could show the function in C, S, python, or pseudo-code, please :D
There is a standard technique for converting particular types of recursive algorithms into iterative ones. It is called tail-recursion.
The recursive version of this code would look like (pseudo code - without bounds checking):
paint(cells, i, j) {
if(cells[i][j] == 0) {
cells[i][j] = 2;
paint(cells, i+1, j);
paint(cells, i-1, j);
paint(cells, i, j+1);
paint(cells, i, j-1);
}
}
This is not simple tail recursive (more than one recursive call) so you have to add some sort of stack structure to handle the intermediate memory. One version would look like this (pseudo code, java-esque, again, no bounds checking):
paint(cells, i, j) {
Stack todo = new Stack();
todo.push((i,j))
while(!todo.isEmpty()) {
(r, c) = todo.pop();
if(cells[r][c] == 0) {
cells[r][c] = 2;
todo.push((r+1, c));
todo.push((r-1, c));
todo.push((r, c+1));
todo.push((r, c-1));
}
}
}
Pseudo-code:
Input: Startpoint (x,y), Array[w][h], Fillcolor f
Array[x][y] = f
bool hasChanged = false;
repeat
for every Array[x][y] with value f:
check if the surrounding pixels are 0, if so:
Change them from 0 to f
hasChanged = true
until (not hasChanged)
For this I would use a Stack ou Queue object. This is my pseudo-code (python-like):
stack.push(p0)
while stack.size() > 0:
p = stack.pop()
matrix[p] = 2
for each point in Arround(p):
if matrix[point]==0:
stack.push(point)
The easiest way to convert a recursive function into an iterative function is to utilize the stack data structure to store the data instead of storing it on the call stack by calling recursively.
Pseudo code:
var s = new Stack();
s.Push( /*upper right point*/ );
while not s.Empty:
var p = s.Pop()
m[ p.x ][ p.y ] = 2
s.Push ( /*all surrounding 0 pixels*/ )
Not all recursive algorithms can be translated to an iterative algorithm. Normally only linear algorithms with a single branch can. This means that tree algorithm which have two or more branches and 2d algorithms with more paths are extremely hard to transfer into recursive without using a stack (which is basically cheating).
Example:
Recursive:
listsum: N* -> N
listsum(n) ==
if n=[] then 0
else hd n + listsum(tl n)
Iteration:
listsum: N* -> N
listsum(n) ==
res = 0;
forall i in n do
res = res + i
return res
Recursion:
treesum: Tree -> N
treesum(t) ==
if t=nil then 0
else let (left, node, right) = t in
treesum(left) + node + treesum(right)
Partial iteration (try):
treesum: Tree -> N
treesum(t) ==
res = 0
while t<>nil
let (left, node, right) = t in
res = res + node + treesum(right)
t = left
return res
As you see, there are two paths (left and right). It is possible to turn one of these paths into iteration, but to translate the other into iteration you need to preserve the state which can be done using a stack:
Iteration (with stack):
treesum: Tree -> N
treesum(t) ==
res = 0
stack.push(t)
while not stack.isempty()
t = stack.pop()
while t<>nil
let (left, node, right) = t in
stack.pop(right)
res = res + node + treesum(right)
t = left
return res
This works, but a recursive algorithm is much easier to understand.
If doing it iteratively is more important than performance, I would use the following algorithm:
Set the initial 2
Scan the matrix for finding a 0 near a 2
If such a 0 is found, change it to 2 and restart the scan in step 2.
This is easy to understand and needs no stack, but is very time consuming.
A simple way to do this iteratively is using a queue.
insert starting point into queue
get first element from queue
set to 2
put all neighbors that are still 0 into queue
if queue is not empty jump to 2.

Resources