How to create a chain from pairs in R

edit: added current solution
I am dabbling with the Travelling Salesman Problem and am using a solver to calculate the optimal tour. The output of my linear solver is a table of the arcs in the route; to plot the tour, however, I need a vector with all the locations chained in the right order. Is there an elegant way to chain these arcs into a single tour?
One solution would be a series of (nested) joins/matches, but that does not seem elegant to me.
# output of solver (where i = 'from' and j = 'to')
solution = data.frame(i = c(6, 4, 10, 7, 1, 9, 3, 2, 8, 5),
j = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
# transformation
??
# required output
tour = c(6, 1, 5, 10, 3, 7, 4, 2, 8, 9)
So the output I am looking for is a single chain of connected arcs (from i to j) in the tour.
My current solution uses a for loop and match, and looks as follows:
# number of cities to visit (rows of the solution, not columns)
nCities = nrow(solution)
# empty matrix
tour = matrix(0, nCities, 2)
# first location to visit picked manually
tour[1, ] = unlist(solution[1, ])
# for loop to find index of next arc in tour
for(k in 2:nCities){
  ind = match(tour[k - 1, 2], solution[, 1])
  tour[k, ] = unlist(solution[ind, ])
}
# output 'tour' is the solution, but sorted in tour order.
# I then take only the first column, which is the tour
tour = tour[, 1]
However, it looks clunky, and since I try to avoid for loops as much as possible I am not too happy with it. I also suspect that there are more elegant solutions out there, preferably using base R functions.
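One possible base R approach, offered as a sketch rather than a definitive answer: build a successor lookup so that nxt[city] gives the city visited right after it, then walk the cycle with Reduce(). The names nxt and start are made up for this illustration. It assumes the cities are labelled 1..n and that the solver returned a single closed tour with no subtours.

```r
solution <- data.frame(i = c(6, 4, 10, 7, 1, 9, 3, 2, 8, 5),
                       j = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
n <- nrow(solution)

# successor lookup: nxt[city] is the city visited right after `city`
# (assumes cities are labelled 1..n)
nxt <- numeric(n)
nxt[solution$i] <- solution$j

# start from the first 'from' city and follow the successors n - 1 times;
# accumulate = TRUE keeps every intermediate city, i.e. the whole tour
start <- solution$i[1]
tour <- Reduce(function(city, step) nxt[city],
               x = seq_len(n - 1), init = start, accumulate = TRUE)
tour
```

For the example data this visits the cities in the order given as the required output in the question. Reduce() still iterates internally, so this is mainly a tidier way to write the loop, not a faster one.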

Related

Processing a data_frame: Defining when the value of a column change one unit

I have the following data structure:
iid<-c(rep("I1",5),rep("I2",5),rep("I3",5),rep("I4",5))
days<-rep(c(0,2,5,7,14),4)
estatus<-c(4,4,4,3,3,
5,4,4,4,3,
4,4,4,4,4,
5,4,4,3,2)
data<-data.frame(iid,days,estatus)
I'm interested in obtaining different outcomes, all related to changes in the variable "estatus".
First, I want to know how many individuals (iid) have changed their status by 1 unit by day 5. I don't want to treat days as a factor; this is a simple example, but in the real dataset the days can differ between individuals, so I don't always have the same days.
The first outcome would look like this:
iid<-c("I1","I2","I3","I4")
res_5<-c(0,1,0,1)
results_1<-data.frame(iid,res_5)
I1 and I3 did not experience a change of 1 unit in their status by day 5.
The second outcome I'm interested in is the day on which each individual's status first changes by 1 unit. The outcome would look like this:
iid<-c("I1","I2","I3","I4")
res_d<-c(7,2,NA,2)
results_2<-data.frame(iid,res_d)
I think that I got the first part of the problem, as I know how to aggregate by iid with tidyverse or dplyr. However, I don't know how to check if a certain row is 1, 2 or n units above or below the previous row.
Use by() to apply a function to each id:
(i) find the index where days == 5 and compare the status there with the first element
(ii) use diff() to compute the differences between consecutive elements of the vector, then look for a difference of 1 or -1
iid <- c(rep("I1", 5), rep("I2", 5), rep("I3", 5), rep("I4", 5))
days <- rep(c(0, 2, 5, 7, 14), 4)
estatus <- c(
4, 4, 4, 3, 3,
5, 4, 4, 4, 3,
4, 4, 4, 4, 4,
5, 4, 4, 3, 2
)
data <- data.frame(iid = iid, days = days, estatus = estatus)
my_func1 <- function(x) {
ind5 <- which(x$days == 5)
d <- x$estatus[ind5] - x$estatus[1]
return((d == 1) | (d == -1))
}
by(data, data$iid, my_func1)
my_func2 <- function(x) {
d <- diff(x$estatus)
hasChangeOf1 <- (d == 1) | (d == -1)
return(x$days[which(hasChangeOf1)[1] + 1])
}
by(data, data$iid, my_func2)
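The by() calls print one result per id. If you'd rather get a data frame shaped like the one in the question, a possible variant (a sketch, not the only way) uses split() and sapply(). It assumes rows are already sorted by day within each id and that every id has at least one observation on or before day 5. Instead of requiring an exact days == 5 row, it compares the last status observed on or before day 5 with the first one. The names by_id, up_to_5 and results are made up for this illustration.

```r
iid <- rep(c("I1", "I2", "I3", "I4"), each = 5)
days <- rep(c(0, 2, 5, 7, 14), 4)
estatus <- c(4, 4, 4, 3, 3,
             5, 4, 4, 4, 3,
             4, 4, 4, 4, 4,
             5, 4, 4, 3, 2)
data <- data.frame(iid = iid, days = days, estatus = estatus)

by_id <- split(data, data$iid)

# 1 if the status on or before day 5 differs from the first status by exactly 1
res_5 <- sapply(by_id, function(x) {
  up_to_5 <- x$estatus[x$days <= 5]
  as.integer(abs(tail(up_to_5, 1) - up_to_5[1]) == 1)
})

# day of the first one-unit change between consecutive rows (NA if none)
res_d <- sapply(by_id, function(x) {
  step <- abs(diff(x$estatus))
  x$days[which(step == 1)[1] + 1]
})

results <- data.frame(iid = names(by_id), res_5 = res_5, res_d = res_d,
                      row.names = NULL)
results
```

With the example data, res_5 and res_d match the two expected outcomes given in the question.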

Apply an index command on a matrix of lists

I have a matrix that contains lists containing shortest path sequences of an igraph object.
I want to turn this matrix into an igraph.es(edge sequence).
sample:
library(igraph)
data <- data.frame(from =c(1, 2, 3, 4, 5, 1),
to =c(4, 3, 4, 5, 6, 5),
weight=c(0.2,0.1,0.5,0.7,0.8,0.2))
g <- graph.data.frame(data, directed=FALSE)
sp <- sapply(data, function(x){shortest_paths(g, from = x, to = V(g)[x],output = "epath")})
sp is now a matrix. We can subset it with indexing:
x<-sp[[2]][[2]]
This makes x an igraph.es (edge sequence).
I'm looking for an apply command to turn all path_sequences of sp into edge_sequences. Thank you in advance.
EDIT:
I managed to unlist the first layer of the list.
sp<-flatten(sp)
So we just need a simple index.
Can I just use a for loop now?
Something like:
for(i in sp){ result[i]<- sum(E(g)$weight[sp[[i]]])}
Unfortunately, this doesn't give me the desired output.

Appropriate method for transition function in gdistance

The following code is an example for the transition function from the pdf manual for gdistance:
library(raster)
library(gdistance)
r <- raster(nrows=6, ncols=7, xmn=0, xmx=7, ymn=0, ymx=6, crs="+proj=utm +units=m")
r[] <- c(2, 2, 1, 1, 5, 5, 5,
2, 2, 8, 8, 5, 2, 1,
7, 1, 1, 8, 2, 2, 2,
8, 7, 8, 8, 8, 8, 5,
8, 8, 1, 1, 5, 3, 9,
8, 1, 1, 2, 5, 3, 9)
T <- transition(r, function(x) 1/mean(x), 8)
# 1/mean: reciprocal to get permeability
T <- geoCorrection(T)
c1 <- c(5.5,1.5)
c2 <- c(1.5,5.5)
#make a SpatialLines object for visualization
sPath1 <- shortestPath(T, c1, c2, output="SpatialLines")
plot(r)
lines(sPath1)
#make a TransitionLayer for further calculations
sPath2 <- shortestPath(T, c1, c2)
plot(raster(sPath2))
My specific interest is in this line:
T <- transition(r, function(x) 1/mean(x), 8)
Because I've come across numerous examples of people doing the following:
T <- transition(1/r, mean, 8)
As far as I can tell, this is the difference between 1/mean(x) and mean(1/x), which are not equivalent.
To verify this, I ran both versions of the transition function using the above code from the gdistance manual, and got these two very different plots:
And using costDistance(T, c1, c2) I got a distance of 21.1 for the first, and 13.6 for the second.
Clearly, these are very different results. So, my question is, what is the correct method for creating a TransitionLayer object from a cost matrix/layer/raster?
This is indeed an important difference. Take a look at the Wikipedia article on the harmonic mean for more info.
In the example, the values in the input raster are costs. So the correct way is to take the arithmetic mean of the cost first and then take the reciprocal of that to get the conductance. The traveller experiences half of the cost of the origin cell and half of the cost of the destination cell, (cost1 + cost2)/2.
So 1/mean(x) is correct for this case.
If the input raster has conductance values, the other function is correct: mean(1/x).
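To make the difference concrete, here is a small base R check with two made-up cost values (2 and 8, chosen only for illustration, not taken from the raster above):

```r
# two hypothetical neighbouring cell costs (illustrative values only)
cost <- c(2, 8)

# reciprocal of the arithmetic mean cost: what function(x) 1/mean(x) computes
cond_from_mean_cost <- 1 / mean(cost)

# arithmetic mean of the reciprocals: what transition(1/r, mean, 8) computes
mean_of_conductances <- mean(1 / cost)

cond_from_mean_cost   # 0.2
mean_of_conductances  # 0.3125
```

For positive values, mean(1/x) is never smaller than 1/mean(x), so the second form gives higher conductances. That fits the shorter cost distance reported for it in the question.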

Re-ordering bars in R's barplot()

What I want to achieve is exactly the same that was already asked here (and specifically using R's base graphics, not packages like ggplot or lattice): Ordering bars in barplot()
However, the solutions proposed there do not seem to work for me. What I need is the following. Suppose I have this:
num <- c(1, 8, 4, 3, 6, 7, 5, 2, 11, 3)
cat <- c(letters[1:length(num)])
data <- data.frame(num, cat)
If I generate a barplot using barplot(data$num), here is what I get:
Now, I want to reorder the bars according to data$cat. Following the link I mentioned above, I tried the accepted answer but got an error:
num2 <- factor(num, labels = as.character(cat))
Error in factor(num, labels = as.character(cat)) : invalid 'labels'; length 10 should be 1 or 9
Then I also tried the other answer there:
num <- as.factor(num)
barplot(table(num))
But here is what I got:
So, in this particular case of mine, which is slightly different from that question, how should I order the barplot so the bars are defined by data$num but ordered according to data$cat?
You can use ggplot2 to do this:
library("ggplot2")
num <- c(1, 8, 4, 3, 6, 7, 5, 2, 11, 3)
cat <- c(letters[1:10])
data <- data.frame(num, cat)
ggplot(data,aes(x= reorder(cat,-num),num))+geom_bar(stat ="identity")
The result is as shown below
Using base functions
df <- data[order(data$num,decreasing = TRUE),]
barplot(df$num,names.arg = df$cat)
I get the following,
num <- c(1, 8, 4, 3, 6, 7, 5, 2, 11, 3)
cat <- c(letters[1:10])
data <- data.frame(num, cat)
barplot(data[order(data[,1],decreasing=TRUE),][,1],names.arg=data[order(data[,1],decreasing=TRUE),][,2])
The above code uses the order() function twice (see comments, below). To avoid doing this the results of the ordered data.frame can be stored in a new data.frame and this can be used to generate the barplot.
num <- c(1, 8, 4, 3, 6, 7, 5, 2, 11, 3)
cat <- c(letters[1:10])
data <- data.frame(num, cat)
data2 <- data[order(data[,1],decreasing=TRUE),]
barplot(data2[,1],names.arg=data2[,2])
Alternatively, if you don't want to put your data in a new data frame, you can use the following, which is a little simpler:
barplot(sort(data$num, decreasing = TRUE))
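The one-liner above passes an unnamed vector, so the bars carry no category labels. A possible variant (a sketch using base R's setNames(); the name sorted is made up here) sorts a named vector instead. barplot() takes bar labels from the names of a vector when names.arg is omitted:

```r
num <- c(1, 8, 4, 3, 6, 7, 5, 2, 11, 3)
cat <- letters[1:10]

# attach the labels to the values, then sort in decreasing order;
# the names travel with the values
sorted <- sort(setNames(num, cat), decreasing = TRUE)

barplot(sorted)
```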

Extract an increasing subsequence

I wish to extract an increasing subsequence of a vector, starting from the first element. For example, from this vector:
a = c(2, 5, 4, 0, 1, 6, 8, 7)
...I'd like to return:
res = c(2, 5, 6, 8)
I thought I could use a loop, but I want to avoid it. Another attempt with sort:
a = c(2, 5, 4, 0, 1, 6, 8, 7)
ind = sort(a, index.return = TRUE)$ix
mat = (t(matrix(ind))[rep(1, length(ind)), ] - matrix(ind)[ , rep(1, length(ind))])
mat = ((mat*upper.tri(mat)) > 0) %*% rep(1, length(ind)) == (c(length(ind):1) - 1)
a[ind][mat]
Basically I sort the input vector and check if the indices verify the condition "no indices at the right hand side are lower" which means that there were no greater values beforehand.
But it seems a bit complicated and I wonder if there are easier/quicker solutions, or a pre-built function in R.
Thanks
One possibility would be to find the cumulative maxima of the vector, and then extract unique elements:
unique(cummax(a))
# [1] 2 5 6 8
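To see why this works, it may help to look at the intermediate cummax() result. The which(!duplicated(...)) line is an extra variant, not part of the answer above, that returns the positions of the kept elements in case the indices are needed too:

```r
a <- c(2, 5, 4, 0, 1, 6, 8, 7)

# running maximum: changes value only where a new record is reached
running_max <- cummax(a)     # 2 5 5 5 5 6 8 8

# each distinct running maximum is one element of the subsequence
res <- unique(running_max)   # 2 5 6 8

# positions in `a` where the running maximum first takes each value
idx <- which(!duplicated(running_max))  # 1 2 6 7
a[idx]
```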
The other answer is better, but I made this iterative function, which works as well. It repeatedly drops elements that do not exceed their predecessor until all consecutive differences are > 0.
increasing <- function (input_vec) {
while(!all(diff(input_vec) > 0)){
input_vec <- input_vec[c(1,diff(input_vec))>0]
}
input_vec
}
