Convert equal interval of vector to rows of matrix - r

I've imported a table that contains the travel times for an origin-destination (OD) cost matrix of size n x n. As a result, travel times equal zero when an origin and destination are the same.
For example, an OD cost matrix of 25 origins and 25 destinations (625 elements) would have zero values running down the diagonal. In the flattened vector, the value 0 occurs at the 1st element, the 27th element, the 53rd element, and so on (every 26th element).
I've read the travel times in as a vector and I'd like to reshape the vector into a matrix where every element on the diagonal has the value zero. Does anyone have any ideas on how this could be done?
Code:
### READ and PREPARE DATA ###
# Read the OD cost matrix from a comma-separated text file
od_table <- read.table('DMatrix.txt', sep=',', header=TRUE, na.strings="NA", stringsAsFactors=FALSE)
v <- t(od_table$Total_TravelTime)  # 1 x 625 matrix of travel times
n <- sqrt(length(v))               # number of origins/destinations (25)
D <- matrix(v, nrow=25)            # reshape into a 25 x 25 matrix
The resulting matrix has zero values along the first row only.
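One way to approach this (a sketch, assuming the file really holds the n*n travel times listed one origin at a time): matrix() fills column-wise by default, so pass byrow=TRUE for row-major data, and verify the result with diag():
v <- od_table$Total_TravelTime      # keep as a plain vector; the t() is not needed
n <- sqrt(length(v))
stopifnot(n == round(n))            # sanity check: the vector must hold n*n times
D <- matrix(v, nrow=n, byrow=TRUE)  # fill row by row, one origin per row
all(diag(D) == 0)                   # TRUE if the zeros landed on the diagonal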

Related

Find column with values closest to vector

I have a vector containing times in milliseconds, looking like this:
vector <- c(667753, 671396, 675356, 679286, 683413, 687890, 691742,
695651, 700100, 704552, 708832, 713117, 717082, 720872, 725002, 729490,
733824, 738233, 742239, 746092, 750003, 754236, 867342, 870889, 873704,
876617, 879626, 882595, 885690, 888602, 891789, 894717, 897547, 900797,
903615, 906646, 909624, 912613, 915645, 918566, 921792, 924625, 927538,
930721, 933542)
Now I want to look through a large data frame with a lot of time columns and find the single column whose time values are closest (row-wise) to my vector's time values.
The data frame has the same number of rows as my vector has elements. So, let's say my vector has 240 elements; then every column in the larger data frame consists of 240 rows.
Any idea how to do this?
You can calculate the Euclidean distance between your vector and each column of the data frame, and then check which column has the smallest distance:
which.min(sapply(1:ncol(dataFrame), function(i) sqrt(sum((t(v)-dataFrame[,i])^2))))
The above returns the index of the column with the lowest distance, where dataFrame is the data frame containing the columns of different times (so we compare each column to the vector v) and v is the vector.
The inner expression is just the square root of the sum of squared differences (the Euclidean distance):
sqrt(sum((t(v)-dataFrame[,i])^2))
You can also use the sum of absolute differences (the Manhattan distance) as the distance measure:
sum(abs(t(v)-dataFrame[,i]))
EDIT
As Evan Friedland pointed out, you can actually just use:
which.min(colSums(abs(v-dataFrame)))
or
which.min(sqrt(colSums((t(v)-dataFrame)^2)))
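For instance, a quick check with simulated data (the column names and noise levels here are made up for illustration; column b is built to be the closest):
set.seed(1)
v <- c(667753, 671396, 675356, 679286, 683413)  # first few values of the vector above
dataFrame <- data.frame(a = v + rnorm(5, sd = 5000),
                        b = v + rnorm(5, sd = 50),
                        c = v + rnorm(5, sd = 500))
which.min(colSums(abs(v - dataFrame)))  # returns b (column 2), the closest column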

R - Create table: how much of a vector (in %) is contained in another vector

I have a number of vectors containing sets of numbers,
e.g.:
v1 <- c(15,12,50,2007,1828)
v2 <- c(50,2007,11,8)
In the next step I want to see how much of vector 2, in percent, is contained in vector 1:
sim <- length(which(v2 %in% v1)) / length(v2)
I created a for loop for that, checking v1 versus v2, v3, v4, ... and then v2 versus v1, v3, v4, ...
If the sim value was bigger than 10%, I wanted to enter it in a table.
Because of the number of vectors (~1000), the for loop is taking way too long.
Are there any alternatives?
You should use the set operation intersect.
First, calculate the intersection of the two vectors:
shared <- intersect(v1,v2)
Next, calculate the share of v2's elements that appear in v1:
sim <- length(shared)/length(v2)
If you type ?intersect at your R command line, you will see there are also other helpful set operations like union and setdiff.
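To avoid hand-written loop bookkeeping, one option (a sketch; vecs is a hypothetical name for a list holding all ~1000 vectors) is to build the whole similarity table at once with outer and then filter it:
vecs <- list(v1 = c(15,12,50,2007,1828),
             v2 = c(50,2007,11,8))
n <- length(vecs)
# sim[i,j] = share of vecs[[j]] contained in vecs[[i]]
sim <- outer(seq_len(n), seq_len(n), Vectorize(function(i, j)
  length(intersect(vecs[[j]], vecs[[i]])) / length(vecs[[j]])))
dimnames(sim) <- list(names(vecs), names(vecs))
# pairs (excluding a vector with itself) sharing more than 10%
which(sim > 0.1 & row(sim) != col(sim), arr.ind = TRUE)
This still performs all pairwise comparisons, but it avoids growing a table inside a loop.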

create an incidence matrix with restrictions in r (igraph)

I would like to create an (N*M) incidence matrix for a bipartite graph (N = M = 200).
However, the following restrictions have to be considered:
Each column i (1, ..., 200) has a column sum of g = 10
Each row has a row sum of h = 10
No multi-edges (the values in the incidence matrix only take on the values 0 or 1)
So far I have
M <- 200  # number of rows
N <- 200  # number of columns
g <- 10
I <- matrix(sample(0:1, M*N, replace=TRUE, prob=c(1-g/N, g/N)), M, N)
Does anybody have a solution?
Here's one way to do what you want. First the algorithm idea, then its implementation in R.
Two-step algorithm idea
You want a matrix of 0's and 1's, with each row adding up to 10 and each column adding up to 10.
Step 1: First, create a trivial solution as follows:
The first 10 rows have 1's for the first 10 elements, then 190 zeros.
The second set of ten rows has 1's from the 11th to the 20th element, and so on.
In other words, a feasible solution is a 200x200 matrix of all 0's, with dense 10x10 blocks of 1's embedded along the diagonal, 20 times.
Step 2: Shuffle entire rows and entire columns.
Such a shuffle preserves the row sums and column sums.
Implementation in R
I use a smaller 16x16 matrix to demonstrate. In this case, let's say we want each row and each column to add up to 4. (This column sum has to divide the dimension of the larger square matrix evenly.)
n <- 4            # size of the smaller square blocks
i <- c(1,1,1,1)   # a block of four 1's
z <- c(0,0,0,0)   # a block of four 0's
# create a feasible solution to start with: 4x4 blocks of 1's down the diagonal
m <- matrix(c(rep(c(i,z,z,z),n),
              rep(c(z,i,z,z),n),
              rep(c(z,z,i,z),n),
              rep(c(z,z,z,i),n)), 16, 16)
# shuffle (run the two following lines as many times as you like)
m <- m[sample(16), ]  # shuffle rows
m <- m[ ,sample(16)]  # shuffle columns
# verify that the sum conditions are not violated
colSums(m); rowSums(m)
# solution
print(m)
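As an aside, the same feasible starting matrix can be written more compactly with a Kronecker product (just an equivalent of the matrix(...) call above):
m <- diag(4) %x% matrix(1, 4, 4)  # 16x16: four 4x4 blocks of 1's on the diagonal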
Hope that helps you move forward with your bipartite igraph.

Convert a one column matrix to n x c matrix

I have an (n*c + n + c) by 1 matrix, and I want to drop the last n + c rows and convert the rest into an n x c matrix. Below is what I've tried, but it returns a matrix in which every element of a given row is the same. I'm not sure why this is. Could someone help me out, please?
tmp=x[1:n*c,]
Membership <- matrix(tmp, nrow=n, ncol=c)
You have a vector x of length n*c + n + c; when you do the extraction, you put a comma in your index, and more importantly the range 1:n*c is not what you think it is.
You should do tmp = x[1:(n*c)].
Notice the importance of the parentheses: if you do tmp = x[1:n*c], R takes the range from 1 to n, multiplies it by c (giving a new range of n values), and then extracts based on that new range.
For example, you want to avoid:
(1:100)[1:5*5]
[1] 5 10 15 20 25
You can also skip the index arithmetic altogether:
matrix(head(x, n*c), ncol=c)
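A quick worked example with hypothetical sizes n = 3 and c = 2, to see the reshaping in action:
n <- 3; c <- 2
x <- matrix(seq_len(n*c + n + c), ncol=1)  # an (n*c + n + c) x 1 matrix holding 1..11
Membership <- matrix(head(x, n*c), nrow=n, ncol=c)
Membership
#      [,1] [,2]
# [1,]    1    4
# [2,]    2    5
# [3,]    3    6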

how to select a matrix column based on column name

I have a table with shortest paths obtained with:
g<-barabasi.game(200)
geodesic.distr <- table(shortest.paths(g))
geodesic.distr
# 0 1 2 3 4 5 6 7
# 117 298 3002 2478 3342 3624 800 28
I then build a matrix with 100 rows and the same number of columns as length(geodesic.distr):
geo<-matrix(0, nrow=100, ncol=length(unlist(labels(geodesic.distr))))
colnames(geo) <- unlist(labels(geodesic.distr))
Now I run 100 experiments where I create preferential-attachment-based networks with
for(i in 1:100){
  bar <- barabasi.game(vcount(g))
  geodesic.distr <- table(shortest.paths(bar))
  distance <- unlist(labels(geodesic.distr))
  for(ii in distance){
    geo[i,ii] <- WHAT HERE?
  }
}
and for each experiment, I'd like to store in the matrix how many paths I have found.
My question is: how do I select the right column based on the column name? In my case, some path lengths produced by the simulated network may not be present in the original one, so I need not only to find the right column by its name, but also the closest one. (Suppose my max value is 7; I may end up with a path of length 9, which is not present in the geo matrix, so I want to add its count to the column named 7.)
There is actually a problem with your approach. The length of the geodesic.distr table is stochastic, and you are allocating a matrix to store 100 realizations based on a single run. What if one of the 100 runs gives you a longer geodesic.distr vector? I assume you would want to enlarge the allocated matrix in this case. Or, even better, you want to run the 100 realizations first and allocate the matrix after you know its size.
Another potential problem is that if you do table(shortest.paths(bar)), then you are (by default) considering undirected distances, so you will end up with a symmetric matrix and count every distance (except for self-distances) twice. This may or may not be what you want.
Anyway, here is a simple way, with the matrix allocated after the 100 runs:
dists <- lapply(1:100, function(x) {
  bar <- barabasi.game(vcount(g))
  table(shortest.paths(bar))
})
maxlen <- max(sapply(dists, length))
# pad each run's distance table with zeros up to the longest one, one run per row
geo <- t(sapply(dists, function(d) c(d, rep(0, maxlen-length(d)))))
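If you also want the columns labeled, and want to be safe in case a run skips a path length entirely (the zero-padding above assumes the distances always run 0, 1, 2, ... without gaps), here is a sketch that aligns by the table names instead:
alldist <- sort(as.numeric(unique(unlist(lapply(dists, names)))))
geo <- t(sapply(dists, function(d) {
  row <- setNames(numeric(length(alldist)), alldist)
  row[names(d)] <- d   # fill in the counts this run actually produced
  row
}))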
