I have a square matrix that represents directed interactions, with values representing the magnitude of the "flow" from row i to column j.
mat <- structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.59734154600838,
0.962276996464401, 0.996554553573577, 0.988150008522967, 0.581536975261071,
0.280105566896129, 0.0520717823071291, 0.0443864046117343, 0.0162858335588474,
0, 0, 0, 0, 0, 0, 0, 0.111900863185923, 0.289483837277475, 0.338036619790556,
0.973201117894343, 0.876145758734938, 0.280105566896129, 0.245172586054694,
0.101440228047504, 0.0136022221272776, 0, 0, 0, 0, 0, 0, 0.073088274682518,
0.21588462733217, 0.258134862678946, 0.93528472971792, 0.921844796228768,
0.318790697187933, 0.280105566896129, 0.117928032625428, 0.016073037487081,
0, 0, 0, 0, 0, 0, 0, 0.0119602547215087, 0.0174757225504163,
0.443466799224191, 0.941024455005652, 0.632609306727839, 0.57418820480725,
0.280105566896129, 0.043827579210664, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0.0547471528159807, 0.884304818335752, 0.937495721370637,
0.925118019265575, 0.280105566896129, 0.055967839940851, 0.0122649398400715,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0679263578760456, 0.104884821422108,
0.569814755335506, 0.853130344409379, 0.280105566896129, 0.0728699300735904,
0.0339371561178606, 0.012188886551821, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0.0219303360220489, 0.843994038605239, 0.759918325154657,
0.280105566896129, 0.143508732965731, 0.0556400089034765, 0.0296286033644999,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0.421151438381493, 0.977746695038157,
0.499880491267235, 0.280105566896129, 0.116686808742586, 0.0639605586005988,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0495967410949283, 0.841406989124245,
0.85505217514437, 0.578265483357174, 0.280105566896129, 0.163154497800251,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.499941945587477, 0.993657104473566,
0.807475685951474, 0.45318772928331, 0.280105566896129), .Dim = c(15L,
15L))
I am interested in calculating the weighted linkage similarity (both in- and out-flows) of all vertices in the network, i.e. taking magnitude into account.
Using igraph, I can calculate the Jaccard similarity, but without considering weights:
library(igraph)
bin <- mat
bin[bin > 0] <- 1
similarity(graph_from_adjacency_matrix(bin),
           mode = "all",
           method = "jaccard")
# this gives the same result as the one above
similarity(graph_from_adjacency_matrix(mat, weighted = TRUE),
           mode = "all",
           method = "jaccard")
Using the code from this blogpost, I was able to calculate the Jaccard similarity of outflows and inflows and combine them.
# outflow similarity
sim.jac.out <- matrix(0, nrow = nrow(mat), ncol = nrow(mat))
pairs <- t(combn(1:nrow(mat), 2))
for (i in 1:nrow(pairs)) {
  num <- sum(sapply(1:ncol(mat), function(x) min(mat[pairs[i, 1], x], mat[pairs[i, 2], x])))
  den <- sum(sapply(1:ncol(mat), function(x) max(mat[pairs[i, 1], x], mat[pairs[i, 2], x])))
  sim.jac.out[pairs[i, 1], pairs[i, 2]] <- num/den
  sim.jac.out[pairs[i, 2], pairs[i, 1]] <- num/den
}
sim.jac.out[which(is.na(sim.jac.out))] <- 0
diag(sim.jac.out) <- 1
# inflow similarity (same computation on the transposed matrix)
tmat <- t(mat)
sim.jac.in <- matrix(0, nrow = nrow(tmat), ncol = nrow(tmat))
pairs <- t(combn(1:nrow(tmat), 2))
for (i in 1:nrow(pairs)) {
  num <- sum(sapply(1:ncol(tmat), function(x) min(tmat[pairs[i, 1], x], tmat[pairs[i, 2], x])))
  den <- sum(sapply(1:ncol(tmat), function(x) max(tmat[pairs[i, 1], x], tmat[pairs[i, 2], x])))
  sim.jac.in[pairs[i, 1], pairs[i, 2]] <- num/den
  sim.jac.in[pairs[i, 2], pairs[i, 1]] <- num/den
}
sim.jac.in[which(is.na(sim.jac.in))] <- 0
diag(sim.jac.in) <- 1
# total similarity
sim.jac.all <- (sim.jac.in + sim.jac.out)/2
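The pairwise loops above can also be written with pmin() and pmax() over whole rows, which avoids the element-wise sapply calls; a sketch of the same computation:
# Weighted Jaccard (a.k.a. Ruzicka) similarity between all row pairs:
# sum of element-wise minima over sum of element-wise maxima
weighted_jaccard <- function(m) {
  n <- nrow(m)
  sim <- diag(1, n)
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      den <- sum(pmax(m[i, ], m[j, ]))
      sim[i, j] <- sim[j, i] <- if (den > 0) sum(pmin(m[i, ], m[j, ]))/den else 0
    }
  }
  sim
}

sim.jac.out <- weighted_jaccard(mat)    # outflow similarity
sim.jac.in  <- weighted_jaccard(t(mat)) # inflow similarity
sim.jac.all <- (sim.jac.in + sim.jac.out)/2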
So the general question is: does this make sense?
But more specifically, I would be interested to know if there is a way to incorporate link weights in the calculation of similarity with igraph.
In my real dataset, I need to do this several times iteratively (swapping individuals) for a large number of networks, so my method would take forever. I believe igraph uses C under the hood.
I am trying to calculate robustness, a graph theory measure, using R (the brainGraph package).
Robustness = robustness(my_networkgraph, type = c("vertex"), measure = ("btwn.cent"))
I get the following error, when I use the above robustness function:
Error in order(vertex_attr(g, measure), decreasing = TRUE) : argument 1 is not a vector
Any idea, what I am doing wrong here?
My network, which is a matrix, was converted to an igraph object, and robustness was calculated.
My network as a matrix:
mynetwork <- matrix(c(0, 1, 0, 1, 0, 0, 0, 0,
1, 0, 1, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 1, 1, 0, 1, 1,
0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0), nrow = 8)
This matrix was converted as igraph using the following code:
my_networkgraph <- graph_from_adjacency_matrix(mynetwork, mode = "undirected",
                                               weighted = NULL, diag = TRUE,
                                               add.colnames = NULL, add.rownames = NA)
Please help me understand the above error.
Thanks,
Priya
There was a bug in the above function. To run the robustness code, you will need to supply a vertex attribute to your network:
V(network)$degree <- degree(network)
V(network)$btwn.cent <- centr_betw(network)$res
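Putting it together for the network above, a minimal sketch (it assumes the robustness() call from the question is otherwise correct):
library(igraph)
library(brainGraph)

# Attach the vertex attribute that robustness() tries to order by
V(my_networkgraph)$btwn.cent <- centr_betw(my_networkgraph)$res

Robustness <- robustness(my_networkgraph, type = "vertex", measure = "btwn.cent")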
I have a vector:
a <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
and would like to implement the following function:
w <- function(a){
  if (a > 0){
    a/sum(a)
  }
  else 1
}
This function is meant to check whether there is any value in a larger than 0 and, if yes, divide each element by the total sum; otherwise it should just record 1.
I get the following warning message:
Warning message:
In if (a > 0) { :
the condition has length > 1 and only the first element will be used
How can I correct the function?
Maybe you want ifelse:
a <- c(1,1,1,1,0,0,0,0,2,2)
ifelse(a>0,a/sum(a),1)
[1] 0.125 0.125 0.125 0.125 1.000 1.000 1.000 1.000
[9] 0.250 0.250
The if statement is not vectorized. For vectorized conditions you should use ifelse. In your case it is sufficient to write
w <- function(a){
  if (any(a > 0)){
    a/sum(a)
  }
  else 1
}
or a short vectorised version
ifelse(a > 0, a/sum(a), 1)
Which one you want depends on the desired output: the first function returns a vector of length 1 in the else branch, while ifelse always returns a vector of the same length as a.
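For example, reusing a from the answer above (note both the different treatment of zeros and the different lengths in the all-zero case):
a <- c(1, 1, 1, 1, 0, 0, 0, 0, 2, 2)
w(a)                       # zeros are divided too, so they stay 0
ifelse(a > 0, a/sum(a), 1) # zeros are replaced by 1 instead

b <- c(0, 0, 0)
w(b)                       # length 1: just 1
ifelse(b > 0, b/sum(b), 1) # length 3: 1 1 1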
Just adding a point to the whole discussion as to why this warning comes up (it wasn't clear to me before). The reason, as mentioned above, is that a is a vector, so the inequality a > 0 produces another vector of TRUE and FALSE values (one per element of a), while if expects a single TRUE or FALSE.
If you would like to instead test whether any value of a is greater than 0, you can use the functions any or all.
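For example:
a <- c(0, 1, 0)
a > 0      # FALSE TRUE FALSE: one element per entry of a
any(a > 0) # TRUE: a single value, safe to use inside if()
all(a > 0) # FALSE: TRUE only if every element is positive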
Best
Here's an easy way without ifelse, using the fact that x^TRUE is x while x^FALSE is x^0 = 1:
(a/sum(a))^(a>0)
An example:
a <- c(0, 1, 0, 0, 1, 1, 0, 1)
(a/sum(a))^(a>0)
[1] 1.00 0.25 1.00 1.00 0.25 0.25 1.00 0.25
The way I came across this question was when I tried doing something similar: I was defining a function and it was being called with the whole vector, as others pointed out.
You could apply the function element-wise like this, although for this scenario it is less elegant than Sven's method. Note that the sum must then be taken over the whole vector, not over the single element passed in (otherwise every positive element would just give x/x = 1):
afunc <- function(x, total) {
  if (x > 0) {
    x/total
  }
  else 1
}
sapply(a, afunc, total = sum(a))
Use the lapply function after creating your function normally:
lapply(X = your_input, FUN = your_function)
lapply returns a list, so use unlist to turn the result back into a vector (as with sapply above, make sure the division uses the sum of the whole vector, not of a single element):
unlist(lapply(a, w))
I would say the most efficient way is the answer by user1317221_G. However, if you want to go back to basics, looping over the length of your vector with for is useful, since if does not work over the whole vector:
w <- c() # creates an empty vector named 'w'
for (i in 1:length(a)){
  if (a[i] > 0){
    w[i] <- a[i]/sum(a)
  }
  else
    w[i] <- 1
}
I have a data frame df in the following form:
v2 v3
2.3 c(1,5,8,2)
1.2 c(2,4,3,2)
The typeof(df$v3[1]) is list, and I want to convert it to a vector. So I write an sapply call and run it:
df$v3 <- sapply(
  df$v3,
  function(x) unlist(x)
)
But it just keeps running and does not generate any result. I also tried lapply, but it can't give me the expected result; it again generates a list.
And my dput(droplevels(head(df))) result is (truncated):
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0))), .Names = c("V2", "V3"), row.names = c(NA,
-6L), .internal.selfref = <pointer: 0x102010578>, class = c("data.table", "data.frame"))
Could you please tell me how to solve it?
EDIT:
I ran df$V3 <- unlist(df$V3) for a long time, but the type is still list and a warning is generated:
Warning messages:
1: In `[<-.data.table`(x, j = name, value = value) :
Supplied 496450000 items to be assigned to 99290 items of column 'V3' (496350710 unused)
2: In `[<-.data.table`(x, j = name, value = value) :
Coerced 'double' RHS to 'list' to match the column's type; may have truncated precision. Either change the target column to 'double' first (by creating a new 'double' vector length 99290 (nrows of entire table) and assign that; i.e. 'replace' column), or coerce RHS to 'list' (e.g. 1L, NA_[real|integer]_, as.*, etc) to make your intent clear and for speed. Or, set the column type correctly up front when you create the table and stick to it, please.
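For what it's worth, the numbers in the first warning line up: 496450000 / 99290 = 5000, so each row's entry apparently holds about 5000 values, and flattening the whole column into one long vector cannot fit back into 99290 rows. A minimal sketch of what may be intended instead, assuming each entry is a nested list that should become an atomic vector (the column itself must stay a list column, since each row holds more than one value):
library(data.table)

# Flatten each row's entry individually; V3 remains a list column,
# but each element becomes an atomic numeric vector instead of a list
df[, V3 := lapply(V3, unlist)]

typeof(df$V3[[1]]) # "double" -- note [[ ]]: df$V3[1] is still a list of length 1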
I would like to calculate the minimum number of consecutive elements in a vector that, when added (consecutively), would sum to at least a given value.
For example in the following vector
ev<-c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 2.7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3.27, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 370.33, 1375.4,
1394.03, 1423.8, 1360, 1269.77, 1378.8, 1350.37, 1425.97, 1423.6,
1363.4, 1369.87, 1365.5, 1294.97, 1362.27, 1117.67, 1026.97,
1077.4, 1356.83, 565.23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 356.83,
973.5, 0, 240.43, 1232.07, 1440, 1329.67, 1096.87, 1331.37, 1305.03,
1328.03, 1246.03, 1182.3, 1054.53, 723.03, 1171.53, 1263.17,
1200.37, 1054.8, 971.4, 936.4, 968.57, 897.93, 1099.87, 876.43,
1095.47, 1132, 774.4, 1075.13, 982.57, 947.33, 1096.97, 929.83,
1246.9, 1398.2, 1063.83, 1223.73, 1174.37, 1248.5, 1171.63, 1280.57,
1183.33, 1016.23, 1082.1, 795.37, 900.83, 1159.2, 992.5, 967.3,
1440, 804.13, 418.17, 559.57, 563.87, 562.97, 1113.1, 954.87,
883.8, 1207.1, 1046.83, 995.77, 803.93, 1036.63, 946.9, 887.33,
727.97, 733.93, 979.2, 1176.8, 1241.3, 1435.6)
What is the minimum number of elements that, when added consecutively (in their order within the vector), would sum up to, let's say, 20000?
To be more clear, I need the following:
Start with ev[1] and add consecutively up to 20000; record the number of elements you had to add to get to 20000 as r[1]. Then start with ev[2] and add until 20000, and so on; record the number of elements as r[2]. Do this for the entire length of ev, then return min(r).
For example
j <- c(1, 2, 3, 5, 7, 9, 2)
I want the minimum number of elements that, when added consecutively, would give, let's say, > 20. This should be 3 (5 + 7 + 9).
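A direct, if slow, implementation of exactly this description, for reference:
# For each start position k, count how many consecutive elements are
# needed before the running sum reaches the target; then take the minimum
min_run <- function(ev, target) {
  r <- sapply(seq_along(ev), function(k) {
    cs <- cumsum(ev[k:length(ev)])
    idx <- which(cs >= target)
    if (length(idx) > 0) idx[1] else NA
  })
  min(r, na.rm = TRUE)
}
min_run(c(1, 2, 3, 5, 7, 9, 2), 20) # 3
min_run(ev, 20000)                  # 15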
Thanks a lot
Well, I'll give it a shot: this one will find the length of the minimum sequence of numbers that add up to or above max. It makes no claims to be fast, but it runs in O(n) time, since each element is added to and removed from the running sum at most once :-)
I made it return both the start index and the length.
f <- function(x, max = 10) {
  s <- 0
  len <- Inf
  start <- 1
  j <- 1
  for (i in seq_along(x)) {
    s <- s + x[i]
    # shrink the window from the left for as long as the sum still meets max
    while (s >= max) {
      if (i - j + 1 < len) {
        len <- i - j + 1
        start <- j
      }
      s <- s - x[j]
      j <- j + 1
    }
  }
  list(start = start, length = len)
  # uncomment the line below if you don't need the start index...
  #len
}
r <- f(ev, 20000) # list(start=245, length=15)
sum(ev[seq(r$start, len=r$length)]) # 20275.42
# Test speed:
x <- sin(1:1e6)
system.time( r <- f(x, 1.9) ) # 1.54 secs
# Compiling the function makes it ~9x faster...
g <- compiler::cmpfun(f)
system.time( r <- g(x, 1.9) ) # 0.17 secs
library(zoo) # needed for rollapply

N <- 20000 # the desired sum we want to achieve
j <- 0
for (i in 1:length(ev)) {
  k <- rollapply(ev, i, sum)
  j[i] <- max(k)
  if (j[i] >= N) {
    break
  }
}
i    # contains how many consecutive elements you need to sum (15)
j[i] # contains the corresponding sum (20275.42)
Currently this doesn't tell you where the specific subset occurs in the vector, but another use of rollapply could get you that information.
There are other ways to do it, but if you have a really long vector this will break out of the loop, so you don't calculate more than you need. The basic idea is to use rollapply to create a vector of the consecutive sums of length k and then find the maximum of that. If this is less than what we desire, do the same thing for sums of length k+1. Repeat until we find a sum that is at least the desired threshold.
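A quick check of the basic idea on the small example from the question (threshold 20; expected answer 3, from 5 + 7 + 9):
library(zoo)

j <- c(1, 2, 3, 5, 7, 9, 2)
width <- 0
repeat {
  width <- width + 1
  if (max(rollapply(j, width, sum)) >= 20) break
}
width # 3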
Edit:
This appears to be about 100x faster. I haven't compared it to Tommy's answer (which is probably faster than this), but it provides a significant speedup compared to my original method.
Edit 2: Moving the [-n] and removing the suppressWarnings speeds this up quite a bit.
myfun <- function(ev, N){
  i <- 1
  n <- length(ev)
  j <- ev
  repeat{
    j <- j[-n] + ev[-c(1:i)] # rolling sums one window-length longer than before
    i <- i + 1
    n <- n - 1
    if (max(j) >= N | i > length(ev)){
      break
    }
  }
  return(i)
}
myfun(ev, 20000)
# And stealing the idea from Tommy gives a nice speedup as well
myfuncomp <- compiler::cmpfun(myfun)
myfuncomp(ev, 20000)
myfunc3 <- compiler::cmpfun(myfun, options = list(optimize = 3))
myfunc3(ev, 20000)
library(rbenchmark) # For testing
# If you have Tommy's functions loaded as f and g you can compare
benchmark(f(ev, 20000), g(ev, 20000), myfun(ev, 20000), myfuncomp(ev, 20000), myfunc3(ev, 20000))
You mean something like this?
> sum(ifelse(cumsum(ev)<=200000, 1, 0))
[1] 364
I think this may be a Traveling Salesman Problem in disguise unless you put in some more constraints. You cannot necessarily start at the maximum of ev and go out in either direction, since it may be a local, non-dense maximum.
x <- 1:length(ev)
plot(x, ev)
lxy <- loess(ev ~ x)
lines(predict(lxy))
title(main = "loess() fit of ev")
But in the region of the densest values, the values are fairly flat.
y <- c(356.83,
       973.5, 0, 240.43, 1232.07, 1440, 1329.67, 1096.87, 1331.37, 1305.03,
       1328.03, 1246.03, 1182.3, 1054.53, 723.03, 1171.53, 1263.17,
       1200.37, 1054.8, 971.4, 936.4, 968.57, 897.93, 1099.87, 876.43,
       1095.47, 1132, 774.4, 1075.13, 982.57, 947.33, 1096.97, 929.83,
       1246.9, 1398.2, 1063.83, 1223.73, 1174.37, 1248.5, 1171.63, 1280.57,
       1183.33, 1016.23, 1082.1, 795.37, 900.83, 1159.2, 992.5, 967.3,
       1440, 804.13, 418.17, 559.57, 563.87, 562.97, 1113.1, 954.87,
       883.8, 1207.1, 1046.83, 995.77, 803.93, 1036.63, 946.9, 887.33,
       727.97, 733.93, 979.2, 1176.8, 1241.3, 1435.6)
x <- 1:length(y)
lxyhi <- loess(y ~ x)
plot(x, y)
lines(predict(lxyhi))