Node similarity in R

I am trying to compute the similarity between a pre-known node and all the other nodes of a graph in R. At each step, if the similarity exceeds a certain threshold, I put the node in a vector, so that I can store all the nodes in a data frame (for each node, I will list its similar nodes).
But this code gives only the last node and its last similar node.
v <- DC2$node[order(-DC2$'Centrality')]
Nei1 <- neighbors(g1, as.character(v[1]), 1)
vec <- numeric()
if (length(Nei1) > 0) {
  for (i in 1:length(V(g1))) {
    Nei2 <- neighbors(g, as.character(V(g1)[i]), 1)
    k1 <- as.numeric(degree(g1, as.character(v[1])))
    k2 <- as.numeric(degree(g1, as.character(V(g1)[i])))
    Simhpi <- (length(intersect(Nei1, Nei2)) / min(k1, k2))
    if (Simhpi >= 0.5) {
      for (j in 1:length(V(g1))) {
        vv <- V(g1)[j]
        vec[j] <- c(vec, vv$name)
      }
    }
  }
}
nn <- data.frame(node = as.character(v[1]), Nei = vec)
thanks for your help.
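A minimal sketch of one way to get the intended result, assuming g1 is the graph and the HPI-style similarity above: append the matching node itself instead of looping over every vertex again, and guard against a zero minimum degree.
library(igraph)
v    <- DC2$node[order(-DC2$'Centrality')]
Nei1 <- neighbors(g1, as.character(v[1]))
k1   <- degree(g1, as.character(v[1]))
vec <- character()
for (i in seq_along(V(g1))) {
  Nei2   <- neighbors(g1, V(g1)[i])
  k2     <- degree(g1, V(g1)[i])
  Simhpi <- length(intersect(Nei1, Nei2)) / min(k1, k2)
  if (!is.nan(Simhpi) && Simhpi >= 0.5) {
    vec <- c(vec, V(g1)[i]$name)   # store the similar node itself, not all vertices
  }
}
nn <- data.frame(node = as.character(v[1]), Nei = vec)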

Related

R Raster - Create layer with conditionals looping through multiple layers

I am working with a time-series raster brick. The brick has 365 layers representing a value for each day of the year.
I want to create a new layer in which each cell holds the number of days of the year on which a certain condition is met.
My current approach is the following (APHRO being the raster brick), but returns the error message below:
r <- raster(ncol=40, nrow=20)
r[] <- rnorm(n=ncell(r))
APHRO <- brick(x=c(r, r*2, r))
NewLayer <- calc(APHRO, fun=FindOnsetDate(APHRO))
Returning this error:
Error in .local(x, ...) : not a valid subset
And the function being parsed:
FindOnsetDate <- function(s) {
  x = 0
  repeat {
    x + 1
    if (s[[x]] >= 20 | s[[x]] + s[[x+1]] >= 20 & ChkFalseOnset() == FALSE)
      {break}
  }
  return(x)
}
With the function for the 3rd condition being:
ChkFalseOnset <- function(x) {
  for (i in 0:13) {
    if (sum(APHRO[[x+i:x+i+7]]) >= 5)
      {return(FALSE); break}
    return(TRUE)
  }
}
Thank you in advance! And please let me know if I should provide more information; I tried to keep it parsimonious.
The problem is that your function is broken: x + 1 computes a value but never updates x, so x stays at 0 and s[[0]] fails (calc also expects the function itself, fun = FindOnsetDate, not the result of calling it on the brick):
FindOnsetDate <- function(s) {
  x = 0
  repeat {
    x + 1
    if (s[[x]] >= 20 | s[[x]] + s[[x+1]] >= 20)
      {break}
  }
  return(x)
}
FindOnsetDate(1:100)
# Error in s[[x]] :
#   attempt to select less than one element in get1index <real>
Perhaps something like this:
FindOnsetDate <- function(s) {
  j <- s + c(s[-1], 0)
  sum(j > 20 | s > 20)
  # if all values are positive, just do sum(j > 20)
}
FindOnsetDate(1:20)
# 10
This works now (calc applies the function to each cell's vector of layer values):
r <- calc(APHRO, FindOnsetDate)
I would suggest a basic two-step process. With a 365-day example:
set.seed(123)
r <- raster(ncol=40, nrow=20)
r_list <- list()
for (i in 1:365) {
  r_list[[i]] <- setValues(r, rnorm(n=ncell(r), mean = 10, sd = 5))
}
APHRO <- brick(r_list)
Use a basic logical test for each iteration:
r_list2 <- list()
for (i in 1:365) {
  if (i != 365) {
    r_list2[[i]] <- APHRO[[i]] >= 20 | APHRO[[i]] + APHRO[[i+1]] >= 20
  } else {
    r_list2[[i]] <- APHRO[[i]] >= 20
  }
}
Then compute the sum over the year:
NewLayer <- calc(brick(r_list2), fun=sum)
plot(NewLayer)

Optimise an algorithm for building a graph based on node weights

I am trying to improve a function that builds a network based on a score calculated from some node attributes. The function tries to find the best subnetwork of a graph, maximizing the product of the nodes' attributes.
The function starts at a random node and searches its first neighbours; if some neighbours' scores satisfy a threshold, those neighbours are added to the first node, and the process continues until no more can be added (i.e. adding a neighbour no longer produces the desired increase in the score). If no first-degree neighbour yields an increase in the score, the function looks at the second-degree neighbours. In that situation there are very likely several paths connecting the new node (the 2nd-degree neighbour); the chosen path is the shortest one with the highest weight (one of the node attributes).
I could parallelise some of the code, although I don't know how to implement it for this type of function (one possibility is sketched after the node2treePath function below).
The function is the following:
build_network <- function(G, seed, d = 2) {
  net <- G
  d <- d
  score.fun <- function(g) {
    Za <- sum(V(g)$weight * V(g)$RWRNodeweight) / sqrt(sum(V(g)$RWRNodeweight^2))
    k <- vcount(g)
    tmp <- genesets.length.null.stat[[as.character(k)]] # genesets.length.null.stat is a list with the median of Za and sd of Za calculated for 1000 replicates of networks of size k
    Sa <- (Za - tmp[1]) / tmp[2]
  }
  best.fun <- function(in.nodes, out.nodes) {
    score <- (-Inf); best <- character()
    for (node in out.nodes) {
      subG.update <- induced.subgraph(net, c(in.nodes, node))
      if (score.fun(subG.update) > score) {
        score <- score.fun(subG.update)
        best <- node
      }
    }
    list("node" = best, "score" = score)
  }
  subG <- induced.subgraph(net, seed)
  if (!is.connected(subG)) { # the seed must be connected
    stop("Input seeds are disjoint")
  }
  while (TRUE) {
    in.nodes <- V(subG)$name
    node_num <- vcount(subG)
    subsum <- score.fun(subG)
    #subx <- V(subG)$name
    for (rad in 1:d) {
      tmp.neigh <- unlist(neighborhood(net, order = rad, nodes = V(subG)$name))
      pot.nodes <- V(net)[tmp.neigh]$name
      out.nodes <- setdiff(pot.nodes, in.nodes)
      if (length(out.nodes) == 0) break
      best_node <- best.fun(in.nodes, out.nodes)
      new_score <- best_node$score
      best_node <- best_node$node
      if (new_score > subsum + 0.01) {
        tmp <- unlist(lapply(best_node, function(x) node2treePath(net, V(subG)$name, x))) # node2treePath is a function to retrieve the shortest path with the highest node weights
        in.nodes <- c(tmp, V(subG)$name)
        subG <- induced.subgraph(net, in.nodes)
        break
      }
    }
    if (node_num == vcount(subG)) break
  }
  return(subG)
}
I am trying to apply this function to a graph of ~10,000 nodes. Here is an approximation of the code for running the function:
### generate some example data
library(igraph)
my_graph <- erdos.renyi.game(10000, 0.0003)
V(my_graph)$name <- 1:vcount(my_graph)
V(my_graph)$weight <- rnorm(10000)
V(my_graph)$RWRNodeweight <- runif(10000, min=0, max=0.05)
### Run the function
sublist <- list()
for (node in V(my_graph)$name) {
  subnet <- build_network(my_graph, node, d = 2)
  sublist[[node]] <- subnet
}
EDIT: here is the dput of head(genesets.length.null.stat)
structure(list(`1` = c(1.01397367504035, 1.18858228819048), `2` = c(1.61970348041337, 1.30189433386605), `3` = c(2.11767222957028, 1.36222065695878), `4` = c(2.47710421934929, 1.36968129959296), `5` = c(2.776011866622, 1.36318885187196), `6` = c(3.16885126246671, 1.42577861995897)), .Names = c("1", "2", "3", "4", "5", "6"))
Here is the node2treePath function:
node2treePath <- function(G, Tnodes, node) {
  tmp.path <- get.all.shortest.paths(G, node, Tnodes)$res
  tmp.l <- unlist(lapply(tmp.path, length))
  index <- which(tmp.l == min(tmp.l))
  tmp.path <- tmp.path[index]
  tmp.sum <- unlist(lapply(tmp.path, function(x) return(sum(V(G)[x]$weight))))
  index <- which(tmp.sum == max(tmp.sum))
  selected.path <- tmp.path[index]
  collect <- unlist(lapply(selected.path, function(x) return(V(G)[x]$name)))
  return(collect)
}
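On the parallelisation point above: each seed's subnetwork is computed independently, so the outer loop over nodes is the natural thing to parallelise. A sketch using the base parallel package (mclapply is fork-based, so Unix/macOS only; on Windows parLapply with a cluster would be needed), assuming my_graph and genesets.length.null.stat are defined as above:
library(parallel)
sublist <- mclapply(V(my_graph)$name,
                    function(node) build_network(my_graph, node, d = 2),
                    mc.cores = max(1, detectCores() - 1))
names(sublist) <- V(my_graph)$name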
For the logic you want (and I imagine you may wish to change it in ways incompatible with the answers above), the following code is about 30% faster. I used Rprof and profr and recoded some slow bits in trivial ways, e.g. returning an anonymous pair instead of a named list pair from one of your functions. The numerically named list of value pairs, genesets.length.null.stat, is very inefficient; I replaced it with two numeric vectors. You also call the V() function a lot, which was a big time consumer: as you can see, you can call it once and then query the result as needed.
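For instance (a small illustration of that replacement, not code from the benchmark below), the list lookup can be turned into two plain numeric vectors indexed by k:
genesets_jack_a <- vapply(genesets.length.null.stat, `[`, numeric(1), 1)
genesets_jack_b <- vapply(genesets.length.null.stat, `[`, numeric(1), 2)
# genesets.length.null.stat[[as.character(k)]][1] becomes genesets_jack_a[k]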
# node2treePath is a function to retrieve the shortest path with the highest node weights
node2treePath_jw <- function(G, Tnodes, node) {
  tmp.path <- get.all.shortest.paths(G, node, Tnodes)$res
  tmp.l <- vapply(tmp.path, length, integer(1))
  index <- which(tmp.l == min(tmp.l))
  tmp.path <- tmp.path[index]
  Vg <- V(G)
  tmp.sum <- vapply(tmp.path, function(x) sum(Vg[x]$weight), numeric(1))
  index <- which(tmp.sum == max(tmp.sum))
  selected.path <- tmp.path[index]
  sapply(selected.path, function(x) Vg[x]$name)
}
build_network_jw <- function(net, seed, d = 2) {
  score.fun <- function(Vg, k) {
    Za <- sum(Vg$weight * Vg$RWRNodeweight) / sqrt(sum(Vg$RWRNodeweight^2))
    (Za - genesets_jack_a[k]) / genesets_jack_b[k]
  }
  best.fun_jw <- function(in.nodes, out.nodes) {
    score <- (-Inf)
    best <- character()
    for (node in out.nodes) {
      subG.update <- induced.subgraph(net, c(in.nodes, node))
      Vsgu <- V(subG.update)
      Vsgu_count <- vcount(subG.update)
      sf <- score.fun(Vsgu, Vsgu_count)
      if (sf > score) {
        score <- sf
        best <- node
      }
    }
    list(best, score)
  }
  subG <- induced.subgraph(net, seed)
  if (!is.connected(subG)) { # the seed must be connected
    stop("Input seeds are disjoint")
  }
  while (TRUE) {
    VsubG <- V(subG)
    Vnet <- V(net)
    in.nodes <- VsubG$name
    node_num <- vcount(subG)
    subsum <- score.fun(VsubG, node_num)
    for (rad in 1:d) { # d = 2
      tmp.neigh <- unlist(neighborhood(net, order = rad, nodes = VsubG$name))
      pot.nodes <- Vnet[tmp.neigh]$name
      out.nodes <- setdiff(pot.nodes, in.nodes)
      if (length(out.nodes) == 0) break
      best_node <- best.fun_jw(in.nodes, out.nodes)
      new_score <- best_node[[2]]
      best_node <- best_node[[1]]
      if (new_score > subsum + 0.01) {
        tmp <- sapply(best_node, function(x) node2treePath_jw(net, VsubG$name, x))
        in.nodes <- c(tmp, VsubG$name)
        subG <- induced.subgraph(net, in.nodes)
        break
      }
    }
    if (node_num == vcount(subG)) break
  }
  subG
}
node2treePath <- function(G, Tnodes, node) {
  tmp.path <- get.all.shortest.paths(G, node, Tnodes)$res
  tmp.l <- unlist(lapply(tmp.path, length))
  index <- which(tmp.l == min(tmp.l))
  tmp.path <- tmp.path[index]
  tmp.sum <- unlist(lapply(tmp.path, function(x) return(sum(V(G)[x]$weight))))
  index <- which(tmp.sum == max(tmp.sum))
  selected.path <- tmp.path[index]
  collect <- unlist(lapply(selected.path, function(x) return(V(G)[x]$name)))
  return(collect)
}
build_network <- function(net, seed, d = 2) {
  #genesets.length.null.stat <- structure(list(`1` = c(1.01397367504035, 1.18858228819048), `2` = c(1.61970348041337, 1.30189433386605), `3` = c(2.11767222957028, 1.36222065695878), `4` = c(2.47710421934929, 1.36968129959296), `5` = c(2.776011866622, 1.36318885187196), `6` = c(3.16885126246671, 1.42577861995897)), .Names = c("1", "2", "3", "4", "5", "6"))
  genesets.length.null.stat <- lapply(1:500, function(x) c(runif(1) + x, runif(1) + x))
  names(genesets.length.null.stat) <- 1:500
  score.fun <- function(g) {
    Za <- sum(V(g)$weight * V(g)$RWRNodeweight) / sqrt(sum(V(g)$RWRNodeweight^2))
    k <- vcount(g)
    tmp <- genesets.length.null.stat[[as.character(k)]] # genesets.length.null.stat is a list with the median of Za and sd of Za calculated for 1000 replicates of networks of size k
    Sa <- (Za - tmp[1]) / tmp[2]
  }
  best.fun <- function(in.nodes, out.nodes) {
    score <- (-Inf); best <- character()
    for (node in out.nodes) {
      subG.update <- induced.subgraph(net, c(in.nodes, node))
      if (score.fun(subG.update) > score) {
        score <- score.fun(subG.update)
        best <- node
      }
    }
    list("node" = best, "score" = score)
  }
  subG <- induced.subgraph(net, seed)
  if (!is.connected(subG)) { # the seed must be connected
    stop("Input seeds are disjoint")
  }
  while (TRUE) {
    in.nodes <- V(subG)$name
    node_num <- vcount(subG)
    subsum <- score.fun(subG)
    #subx <- V(subG)$name
    for (rad in 1:d) {
      tmp.neigh <- unlist(neighborhood(net, order = rad, nodes = V(subG)$name))
      pot.nodes <- V(net)[tmp.neigh]$name
      out.nodes <- setdiff(pot.nodes, in.nodes)
      if (length(out.nodes) == 0) break
      #message("length in.nodes = ", length(in.nodes))
      #message("length out.nodes = ", length(out.nodes))
      best_node <- best.fun(in.nodes, out.nodes)
      new_score <- best_node$score
      best_node <- best_node$node
      if (new_score > subsum + 0.01) {
        tmp <- unlist(lapply(best_node, function(x) node2treePath(net, V(subG)$name, x))) # node2treePath is a function to retrieve the shortest path with the highest node weights
        in.nodes <- c(tmp, V(subG)$name)
        subG <- induced.subgraph(net, in.nodes)
        break
      }
    }
    if (node_num == vcount(subG)) break
  }
  subG
}
library(igraph)
library(profr)
#genesets.length.null.stat <- lapply(1:500, function(x) c(runif(1)+x, runif(1)+x))
#names(genesets.length.null.stat) <- 1:500
set.seed(1)
genesets_jack_a = runif(500) + 1:500
genesets_jack_b = runif(500) + 1:500
do_it_jw <- function(n = 1000) {
  my_graph <- erdos.renyi.game(n, 0.0003)
  V(my_graph)$name <- 1:vcount(my_graph)
  V(my_graph)$weight <- rnorm(n)
  V(my_graph)$RWRNodeweight <- runif(n, min = 0, max = 0.05)
  ### Run the function
  sublist <- list()
  Vmg <- V(my_graph)
  for (node in Vmg$name) {
    #message(node)
    subnet <- build_network_jw(my_graph, node, 2)
    sublist[[node]] <- subnet
  }
}
do_it <- function(n = 1000) {
  my_graph <- erdos.renyi.game(n, 0.0003)
  V(my_graph)$name <- 1:vcount(my_graph)
  V(my_graph)$weight <- rnorm(n)
  V(my_graph)$RWRNodeweight <- runif(n, min = 0, max = 0.05)
  ### Run the function
  sublist <- list()
  Vmg <- V(my_graph)
  for (node in Vmg$name) {
    #message(node)
    subnet <- build_network(my_graph, node, 2)
    sublist[[node]] <- subnet
  }
}
library(microbenchmark)
mb <- microbenchmark(do_it(1000), do_it_jw(1000), times = 5)
print(mb)
Since your score function depends only on node attributes and not on edge attributes, the solution is not unique; you might want to search for a best tree instead. If you restructure your problem so that your nodes become edges and vice versa, you can probably just use e.g. Dijkstra's algorithm to find the best one. That is already in the igraph package as shortest.paths().
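For reference, a minimal weighted call looks like this (an illustration only; newer igraph versions expose the same function as distances()):
library(igraph)
g <- erdos.renyi.game(100, 0.05)
E(g)$weight <- runif(ecount(g))
# weighted shortest-path distances from vertices 1:5 to vertices 6:10
shortest.paths(g, v = V(g)[1:5], to = V(g)[6:10], weights = E(g)$weight)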
I can't read the R code, but based on your description: If the score threshold is constant, then this is easy to do in O(|V|+|E|+|C|^2) time, where |C| is the number of "good" components (this will be further explained shortly).
In a first pass, delete all nodes with score below the threshold. Then find all connected components in this new graph (this can be done in O(|V|+|E|) time by starting a DFS at each as-yet-unvisited node), calculate their scores by multiplying together all vertex weights in the component, and label each vertex with its component ID. This already tells you the "good" components -- the ones that don't require any 2nd-degree connections.
Suppose this produces |C| components. Create an empty hashtable H which has component-ID pairs for keys, and (length, weight) pairs for values. Now go back through each vertex v you deleted in the first pass: for each one, look at all its neighbours and record the shortest edge to each distinct component (this can be done using a length-|C| array to store the shortest edge to each component seen so far). After examining all of v's neighbours, count the number k of distinct components they fall into: if k >= 2, then v potentially should be used to connect some of these k(k-1)/2 pairs of components. For every pair of distinct components i and j that could be connected by v, update H with the weight and distance of this 2-edge connection as necessary: that is, if i and j are not yet joined together, then record that v joins them; otherwise, if they are already joined by some vertex u, only update H if v can do better (i.e., if it uses less total length and greater weight than u would). This step can be thought of as building a minimum spanning tree in a "component graph" derived from the original, pruned graph. The scores for each new "combined" component can easily be calculated as you go just by multiplying together the scores of the two constituent components.
Finally, simply return the component whose product is maximum.
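A rough igraph sketch of the first pass described above (an illustration, not code from the answer; it assumes a vertex attribute score for the threshold test and weight for the component product):
library(igraph)
good_components <- function(g, threshold) {
  g2   <- induced.subgraph(g, V(g)[V(g)$score >= threshold])  # drop below-threshold nodes
  comp <- clusters(g2)                                        # connected components, O(|V|+|E|)
  comp_score <- tapply(V(g2)$weight, comp$membership, prod)   # product of weights per component
  list(graph = g2, membership = comp$membership, score = comp_score)
}
# which.max(good_components(g, threshold)$score) gives the best "good" component;
# the deleted vertices would then be scanned for cheap 2-edge bridges between components.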

Correlation Function

I wrote a function that takes a directory of data files and a threshold for complete cases, and calculates the correlation between two variables (sulfate and nitrate) in the second and third columns of each data set, for monitor locations where the number of completely observed cases is greater than the threshold.
My function returns a warning message:
In if (n > 0) { : the condition has length > 1 and only the first element will be used.
I have been trying to fix this to no avail. I am very new to programming and would appreciate any helpful solution.
corr <- function(directory, threshold = 0) {
  files <- list.files(directory, full.names = TRUE)
  p <- data.frame()
  for (i in 1:332) {
    p <- rbind(p, read.csv(files[i]))
    r <- c(p[,2])
    s <- c(p[,3])
    n <- p[!is.na(p[,])]
    if (n > threshold) {
      cor(r, s)
    } else {
      return(0)
    }
  }
}
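The warning happens because p[!is.na(p[,])] returns a vector of all the non-missing values, not a count, so n has length greater than 1 when it reaches if(). A sketch of one way to restructure the function (an illustration, assuming each file has sulfate and nitrate columns as described): count the complete cases per file and collect one correlation per monitor that passes the threshold.
corr <- function(directory, threshold = 0) {
  files <- list.files(directory, full.names = TRUE)
  out <- numeric()
  for (f in files) {
    d  <- read.csv(f)
    ok <- complete.cases(d$sulfate, d$nitrate)   # rows where both values are present
    if (sum(ok) > threshold) {
      out <- c(out, cor(d$sulfate[ok], d$nitrate[ok]))
    }
  }
  out
}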

capturing R matrix list

The piece of code below generates what I need, but I am not able to store the results so that I can use them further.
In the case below, I want to store each player's hand in a list of matrices p such that p[i] <- deck2[smpl,].
The second thing I want is to save and use the final matrix deck2 (i.e. with 10 players it will be a 29-row matrix). I can see NROW(deck2) as 29, but the assignment d <- deck2 is not happening. What am I missing here?
deck2 = matrix(c(rep(c(2:10,"J","Q","K","A"), 4), rep(c("C","D","H","S"), rep(13,4))), ncol=2, dimnames=list(NULL, c("rank","suit")))
player_hands = function(players) {
  if (players >= 2 && players <= 10) {
    for (i in 1:players) {
      smpl <- sample(1:NROW(deck2), 2, replace=F)
      r <- deck2[smpl,]
      p <- deck2[smpl,]
      deck2 <- deck2[-smpl,]
      print(r)
      if (i == players) {
        smpl <- sample(1:NROW(deck2), 3, replace=F)
        r <- deck2[smpl,]
        p <- deck2[smpl,]
        deck2 <- deck2[-smpl,]
        print("Dealer Hand")
        print(r)
      }
      else i = i + 1
    }
  }
  else print("Invalid No. of Players")
}
I believe this should do what you want. It will return a list containing two items.
The first of these two items is the list of hands p, of which the last one will be the dealer's hand.
The second of the two items it returns will be the new deck2.
player_hands = function(players) {
  if (players >= 2 && players <= 10) {
    p = list()
    for (i in 1:players) {
      smpl <- sample(1:NROW(deck2), 2, replace=F)
      r <- deck2[smpl,]
      p[[i]] = r
      deck2 <- deck2[-smpl,]
      if (i == players) {
        smpl <- sample(1:NROW(deck2), 3, replace=F)
        r <- deck2[smpl,]
        p[[players+1]] <- r
        deck2 <- deck2[-smpl,]
      }
      else i = i + 1
    }
    return(list(p, deck2))
  }
  else print("Invalid No. of Players")
}
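For example (assuming deck2 is defined as in the question), the two pieces can be captured like this:
res       <- player_hands(4)
hands     <- res[[1]]   # hands[[1]]..hands[[4]] are the players, hands[[5]] the dealer
deck2_new <- res[[2]]   # 52 - 4*2 - 3 = 41 cards remain
NROW(deck2_new)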

Perceptron (single layer, 2D) - Result with samples on a straight line

I tried to implement a simple 2D single-layer perceptron and ended up with this solution:
perceptron <- function(featureVec, classVec, wStart=matrix(c(0,0,0)), eta=1, limit=50) {
  plot(x=featureVec[,1], y=featureVec[,2])
  # Extending dimensions
  dimension <- dim(featureVec)[1]
  featureVec <- cbind(featureVec, rep(1, dimension))
  # Inverting 2nd class
  index <- classVec == -1
  featureVec[index,] <- apply(matrix(featureVec[index]), 1, prod, -1)
  wTemp <- wStart
  y <- featureVec %*% wTemp
  iteration = 0
  while (T) {
    y <- featureVec %*% wTemp
    delta <- as.matrix(featureVec[y <= 0,])
    for (i in 1:nrow(delta)) {
      wTemp <- wTemp + eta*delta[i,]
    }
    result <- featureVec %*% wTemp
    if (sum(result <= 0) == 0) {
      break
    }
    if (iteration >= limit) {
      stop("Maximum count of iterations reached!")
    }
    iteration = iteration + 1
  }
  if (wTemp[2] != 0) {
    abline(-wTemp[3]/wTemp[2], -wTemp[1]/wTemp[2])
  } else if (wTemp[2] == 0) {
    abline(v=wTemp[1])
  } else if (wTemp[1] == 0) {
    abline(h=wTemp[2])
  }
  return(wTemp)
}
The feature matrix has one sample per row; the class vector needs values of 1 and -1, one per sample.
For most of my tests it works correctly, but when I have samples like (0,0) and (0,1) with classes (1,-1) I get no result. That happens with some of my examples where two points lie on a straight line parallel to a coordinate axis. When I choose different start vectors it sometimes works, so the behaviour does not look deterministic to me. Is that expected, or is my implementation wrong?
Thanks for your help, Meiner.
EDIT: Some changes to the initial post.
Bad Dataset:
featureTest <- matrix(c(0,0,0,1),byrow=T,nrow=2)
classTest <- matrix(c(1,-1),nrow=2)
perceptron(featureTest,classTest)
featureTest <- matrix(c(0,1,0,2),byrow=T,nrow=2)
classTest <- matrix(c(1,-1),nrow=2)
perceptron(featureTest,classTest)
Good Dataset:
featureTest <- matrix(c(0,0,0,2),byrow=T,nrow=2)
classTest <- matrix(c(1,-1),nrow=2)
perceptron(featureTest,classTest)
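One thing worth checking in the implementation (my observation, not a confirmed diagnosis): when exactly one sample is misclassified, featureVec[y <= 0, ] drops to a plain vector, so as.matrix() turns it into a 3x1 column and the update loop then adds scalars to every weight component instead of adding the misclassified row. Keeping the row structure avoids that, i.e. inside the while loop use:
delta <- featureVec[y <= 0, , drop = FALSE]   # keep one misclassified sample per row
With that change, the first "bad" dataset above converges after a few updates, which suggests the data are separable and the issue is in the update step rather than the algorithm.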
