Maintain attribute names when merging Igraphs - r

Problem
I have two separate networks with no overlapping nodes or edges; both have the same attributes. I want to combine them into a single network made up of two distinct components.
However, when I try to merge them using the union command, the attributes are renamed from "attribute" to "attribute_1" and "attribute_2". The help file for the command says this will happen, but I cannot find an obvious way to merge the two networks while keeping the original attribute names.
The situation is shown in the code block below.
library(igraph)
#create a 4 node network of two components
adjmat <- rep(0, 16)
adjmat[c(2,5,12,15)] <- 1
g <- graph.adjacency(matrix(adjmat, nrow = 4) , mode = "undirected")
#give attributes naming the nodes and the edges
g <- set_vertex_attr(g, "name", value = paste0("Node_", 1:4))
g <- set_edge_attr(g, "name", value = paste0("Edge_",1:2))
#I am interested in the type attribute
g <- set_edge_attr(g, "type", value = c("foo", "bar"))
plot(g)
#Decompose into separate networks
gList <- decompose(g)
g2 <- union(gList[[1]], gList[[2]])
#vertices are fine but edges have been renamed as stated in the helpfile for union.
get.edge.attribute(g2)
get.vertex.attribute(g2)
Work around
Currently the two separate networks originate from the same original network, so I have been able to make a hack. However, this isn't always the case, and I would like a more igraph-like way of merging the two.
The hack is below
#To solve this problem I do the following
#Create two dataframes from the edge characteristics of the network and combine into a single dataframe
P <- rbind(as_data_frame(gList[[1]]),
as_data_frame(gList[[2]]))
#look up each edge of g by name in P so the types line up with g's edge order
g3 <- set.edge.attribute(g, "type", value = P$type[match(get.edge.attribute(g, "name"), P$name)])
#Edges are now correct
get.edge.attribute(g3)
get.vertex.attribute(g3)
Is there a function in igraph that would merge the two separate networks into a single network whilst maintaining the attributes as is?

I have made the below version of union, which accepts two graphs with an arbitrary number of overlapping attributes and merges them into a single graph where the attributes do not have the "_x" suffix. The graphs can be entirely independent or have overlapping nodes.
In the case of overlapping nodes the attributes of graph 1 take precedence
library(dplyr)
library(igraph)
union2<-function(g1, g2){
#Internal function that cleans the names of a given attribute
CleanNames <- function(g, target){
#get target names
gNames <- parse(text = (paste0(target,"_attr_names(g)"))) %>% eval
#find names that have a "_1" or "_2" at the end
AttrNeedsCleaning <- grepl("(_\\d)$", gNames )
#remove the _x ending
StemName <- gsub("(_\\d)$", "", gNames)
NewnNames <- unique(StemName[AttrNeedsCleaning])
#replace attribute name for all attributes
for( i in NewnNames){
attr1 <- parse(text = (paste0(target,"_attr(g,'", paste0(i, "_1"),"')"))) %>% eval
attr2 <- parse(text = (paste0(target,"_attr(g,'", paste0(i, "_2"),"')"))) %>% eval
g <- parse(text = (paste0("set_",target,"_attr(g, i, value = ifelse(is.na(attr1), attr2, attr1))"))) %>%
eval
g <- parse(text = (paste0("delete_",target,"_attr(g,'", paste0(i, "_1"),"')"))) %>% eval
g <- parse(text = (paste0("delete_",target,"_attr(g,'", paste0(i, "_2"),"')"))) %>% eval
}
return(g)
}
g <- igraph::union(g1, g2)
#loop through each attribute type in the graph and clean
for(i in c("graph", "edge", "vertex")){
g <- CleanNames(g, i)
}
return(g)
}
Using the previous example
g4 <-union2(gList[[1]], gList[[2]])
#As we would like
get.edge.attribute(g4)
get.vertex.attribute(g4)
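If the two graphs are known to share no vertices, igraph's disjoint_union() may be a simpler route, since its documentation describes vertex and edge attributes as being concatenated rather than suffixed. The sketch below rebuilds a graph like the one in the question; the data frame and object names (edge_df, parts, merged) are made up for illustration, and it assumes a reasonably recent igraph version.

```r
library(igraph)

# Hypothetical stand-in for the question's graph: two components,
# each edge carrying a "name" and a "type" attribute
edge_df <- data.frame(
  from = c("Node_1", "Node_3"),
  to   = c("Node_2", "Node_4"),
  name = c("Edge_1", "Edge_2"),
  type = c("foo", "bar"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(edge_df, directed = FALSE)
parts <- decompose(g)

# disjoint_union() assumes the vertex sets do not overlap; attributes that
# exist in both graphs are concatenated under their original names
merged <- disjoint_union(parts[[1]], parts[[2]])

edge_attr_names(merged)
E(merged)$type
V(merged)$name
```

This only covers the fully independent case; for graphs with overlapping nodes, something like union2 above is still needed.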

Related

How to remove vertices by condition in igraph object, color edges by group, and size arrows based on attribute?

I am trying to do something similar to this and this post. I have an igraph object and want to remove vertices (arrows) based on values in a column of the edges dataframe, color the edges (circles) by group, and change the line/arrow size based on the same column in the edges dataframe. Here is some reproducible code that looks exactly like my data:
# Data
edges <- data.frame(
"agency.from" = c(rep("a",4),rep("b",4),rep("c",4),rep("d",4)),
"agency.to" = c(rep(c("a","b","c","d"),4)),
"comm.freq" = sample(0:5,16, replace=TRUE))
nodes <- data.frame(
"agency" = c("a","b","c","d"),
"group" = c("x", "y", "x", "y"),
"state" = c("i", "j", "j", "i"))
# make igraph object
net <- graph_from_data_frame(d=edges, vertices=nodes, directed=T)
plot(net)
# remove loops
net2 <- simplify(net, remove.multiple = T, remove.loops = T)
plot(net2)
Which gives me a plot of the network with the loops removed.
# remove vertices where communication frequency is 1 and 0
net3 <- delete.vertices(net2, which(E(net2)$comm.freq == 1))
net4 <- delete.vertices(net3, which(E(net2)$comm.freq == 0))
plot(net4)
Which does not change the plot at all
Then I try to change the colors and sizes:
# color edges by group
colrs <- c("gray50", "tomato")
V(net4)$color <- colrs[V(net4)$group]
plot(net4)
# make size of arrow based on communication frequency
plot(net4, edge.width = E(net4)$comm.freq * 5, edge.arrow.size = E(net4)$comm.freq)
And still nothing changes
I followed the code provided in the other posts and I'm just really confused why nothing will work.
Any help is much appreciated!
The simplify() function removed your edge attributes. You need to specify how you want those values to be preserved when simplifying your graph. If you just want to keep the first possible value, you can do
net2 <- simplify(net, remove.multiple = T, remove.loops = T, edge.attr.comb=list("first"))
And then you use delete.vertices but you are passing indexes for edges, not vertices. If you want to drop both vertices that are adjacent to an edge with that given property, it should look more like
net3 <- delete_vertices(net2, V(net2)[.inc(E(net2)[comm.freq==1])])
net4 <- delete_vertices(net3, V(net3)[.inc(E(net3)[comm.freq==0])])
And then for the colors you have values like "x" and "y" for group, but you are indexing into the colrs vector which has no idea what "x" and "y" correspond to. It would be better to use a named vector. For example
colrs <- c(x="gray50", y="tomato")
V(net4)$color <- colrs[V(net4)$group]
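A small sketch of the first two points, using a made-up three-edge graph rather than the data from the question. It assumes igraph's default options, under which simplify() typically drops attributes such as comm.freq unless edge.attr.comb says otherwise, and it shows delete_edges() as the alternative when only the low-frequency connections (not the agencies themselves) should disappear.

```r
library(igraph)

# Hypothetical toy data: one self-loop and two ordinary edges
toy_edges <- data.frame(
  from = c("a", "a", "b"),
  to   = c("a", "b", "c"),
  comm.freq = c(3, 1, 5)
)
net <- graph_from_data_frame(toy_edges, directed = TRUE)

# Default simplify(): inspect which edge attributes survive
edge_attr_names(simplify(net))

# Keep the first value of every edge attribute when simplifying
kept <- simplify(net, edge.attr.comb = "first")
edge_attr_names(kept)

# Remove the edges (not the vertices) with a frequency of 0 or 1
pruned <- delete_edges(kept, E(kept)[comm.freq <= 1])
E(pruned)$comm.freq
```

All three vertices remain in pruned; only the a-to-b connection is gone.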

Create list with named objects in R and retrieve parts of the objects from this list

I have several dataframes (full data and reduced data) and now I want to run a lot of analyses with kmeans and hclust. I want to work in a loop and store the results in a list from which I can retrieve (parts of) the stored objects by name. The reason is that in R Markdown there is no good way to create new objects (and no, assign is NOT a good option to do so).
So the idea is to create several kmeans objects in a for loop over several dataframes and put them in a list. But I can't seem to store them in a way that lets me name these objects; everything in my list gets cluttered. See my example.
I also have trouble addressing (parts of) an object once it is in the desired list (see the last part).
set.seed(4711)
df <- data.frame(matrix(sample(0:6, 120, replace = TRUE), ncol = 15, nrow = 8))
list_of_kmeans_objects <- list()
for (i in 2:4){
list_of_kmeans_objects <- c(list_of_kmeans_objects, kmeans(df, centers = i))
}
Now I have a cluttered list of 27 items (the 9 components of each kmeans object, flattened). But what I want is a list of three named kmeans objects. My desired list would be:
C2_kmeans_df <- kmeans(df, centers = 2)
C3_kmeans_df <- kmeans(df, centers = 3)
C4_kmeans_df <- kmeans(df, centers = 4)
desired_list_of_kmeans <- list(C2_kmeans_df, C3_kmeans_df, C4_kmeans_df)
names(desired_list_of_kmeans)[1] <- "C2_kmeans_df"
names(desired_list_of_kmeans)[2] <- "C3_kmeans_df"
names(desired_list_of_kmeans)[3] <- "C4_kmeans_df"
Once I have this list, my last problem is how to extract, for example,
C3_kmeans_df$cluster #or
C4_kmeans_df$tot.withinss
from this list, using the names of the objects in the desired list?
Here is an option using lapply and setNames.
idx <- 2:4
out <- setNames(object = lapply(idx, function(i) kmeans(df, centers = i)),
nm = paste0("C", idx, "_kmeans_df"))
Check the names
names(out)
# [1] "C2_kmeans_df" "C3_kmeans_df" "C4_kmeans_df"
Access cluster
out$C2_kmeans_df$cluster
# [1] 2 1 2 1 2 1 2 1
In your present for loop, c() splices the components of each kmeans object into list_of_kmeans_objects instead of appending the object as a single element.
The following code should do what you want:
list_of_kmeans_objects <- list()
aaa <- 0
for (i in 2:4) {
aaa <- aaa+1
list_of_kmeans_objects[[aaa]] <- kmeans(df, centers=i)
names(list_of_kmeans_objects)[aaa] <- paste0("C", i, "_kmeans_df")
}
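As a further sketch of the same idea, the loop can assign by name directly with [[ ]], which avoids the separate counter. The object names results and wanted are only illustrative; the data frame is the one from the question.

```r
set.seed(4711)
df <- data.frame(matrix(sample(0:6, 120, replace = TRUE), ncol = 15, nrow = 8))

results <- list()
for (i in 2:4) {
  # [[<- with a character key stores the whole kmeans object as one element
  results[[paste0("C", i, "_kmeans_df")]] <- kmeans(df, centers = i)
}

names(results)

# Retrieve parts of a stored object by name, also from a variable
wanted <- "C3_kmeans_df"
results[[wanted]]$cluster
results$C4_kmeans_df$tot.withinss

# Pull the same component out of every stored object at once
sapply(results, function(km) km$tot.withinss)
```

Using [[wanted]] rather than $ is what makes it possible to build the name programmatically, for example inside another loop.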

Individual coloring of single communities

I have this bit of code
library(igraph)
library(igraphdata)
data("karate")
g <- karate
# for reproducibility
set.seed(23548723)
network_layout <- layout_with_fr(g)
trimmed_network <- delete.edges(g, which(E(g)$weight < 4))
communities <- cluster_louvain(trimmed_network)
plot(communities, trimmed_network, layout=network_layout)
and it generates a plot of the network with its communities marked.
I want to disable the coloring (color="white" and mark.groups=NULL) of vertices in single-vertex communities (length 1). I know that you can manipulate the color of "normal" graphs by using $color, but I did not find any hint in the igraph documentation on how to handle it per community.
There is also the option not to use the community plotting with
plot(trimmed_network, ...)
thus using the color of the graph, but then I would lose the group markings.
How can I change color and group marks per community based on length(communities[1]) == 1?
Identify the vertices in each group > 1 and pass those as a list to mark.groups. It is a bit fiddly, but it works.
r <- rle(sort(communities$membership))
x <- r$values[which(r$lengths>1)]
y <- r$values[which(r$lengths==1)]
cols <- communities$membership
cols[which(cols %in% y)] <- "white"
grps <- lapply(x, function(x) communities[[x]])
grps <- lapply(1:length(grps), function(x) which(V(g)$name %in% grps[[x]]))
plot(communities, trimmed_network, layout=network_layout,
col = cols, mark.groups = grps)
We need to find the numeric identifier of communities with only one member and set the color of the members of those singleton communities to "white".
# Get community membership
memb = membership(communities)
# Find number of members in each community
tab = table(memb)
# Set colors for each member. (Adjust these as desired)
col = colors()[2:(length(memb)+1)]
# But for members of communities of one, set the color to white
singles = which(memb %in% as.numeric(names(tab)[tab==1]))
col[singles] = "white"
plot(communities, trimmed_network, layout=network_layout, col=col, mark.groups=NULL)
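The two approaches above can be combined into one sketch that whitens singleton communities and still marks the larger groups. It uses a tiny made-up graph and a community object built from its connected components with make_clusters(), purely so the result is predictable; with the karate data, comm would be the output of cluster_louvain() instead. The names comm, vertex_cols and grp_ids are illustrative.

```r
library(igraph)

# Hypothetical toy graph: a triangle, a pair, and one isolated vertex
g <- graph_from_literal(A-B-C-A, D-E, F)
comm <- make_clusters(g, membership = components(g)$membership)

memb <- as.integer(membership(comm))
tab <- table(memb)
# Size of the community each vertex belongs to
comm_size <- as.integer(tab[as.character(memb)])

# One color per community, but white for members of singleton communities
vertex_cols <- ifelse(comm_size == 1, "white", rainbow(max(memb))[memb])

# Vertex ids per community, keeping only communities with more than one member
grp_ids <- split(seq_along(memb), memb)
grp_ids <- unname(grp_ids[lengths(grp_ids) > 1])

plot(comm, g, col = vertex_cols, mark.groups = grp_ids)
```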

How to collapse branches in a phylogenetic tree by the label in their nodes or leaves?

I have built a phylogenetic tree for a protein family that can be split into different groups, classifying each one by its type of receptor or type of response. The nodes in the tree are labeled as the type of receptor.
In the phylogenetic tree I can see that proteins that belong to the same groups or type of receptor have clustered together in the same branches. So I would like to collapse these branches that have labels in common, grouping them by a given list of keywords.
The command would be something like this:
./collapse_tree_by_label -f phylogenetic_tree.newick -l list_of_labels_to_collapse.txt -o collapsed_tree.eps(or pdf)
My list_of_labels_to_collapse.txt would be like this:
A
B
C
D
My newick tree would be like this:
(A_1:0.05,A_2:0.03,A_3:0.2,A_4:0.1):0.9,(((B_1:0.05,B_2:0.02,B_3:0.04):0.6,(C_1:0.6,C_2:0.08):0.7):0.5,(D_1:0.3,D_2:0.4,D_3:0.5,D_4:0.7,D_5:0.4):1.2)
The output image without collapsing is like this:
http://i.stack.imgur.com/pHkoQ.png
The output image collapsing should be like this (collapsed_tree.eps):
http://i.stack.imgur.com/TLXd0.png
The width of the triangles should represent the branch length, and the height of the triangles must represent the number of nodes in the branch.
I have been playing with the "ape" package in R. I was able to plot a phylogenetic tree, but I still can't figure out how to collapse the branches by keywords in the labels:
require("ape")
This will load the tree:
cat("((A_1:0.05,A_2:0.03,A_3:0.2,A_4:0.1):0.9,(((B_1:0.05,B_2:0.02,B_3:0.04):0.6,(C_1:0.6,C_2:0.08):0.7):0.5,(D_1:0.3,D_2:0.4,D_3:0.5,D_4:0.7,D_5:0.4):1.2):0.5);", file = "ex.tre", sep = "\n")
tree.test <- read.tree("ex.tre")
Here should be the code to collapse
This will plot the tree:
plot(tree.test)
Your tree as it is stored in R already has the tips stored as polytomies. It's just a matter of plotting the tree with triangles representing the polytomies.
There is no function in ape to do this, that I am aware of, but if you mess with the plotting function a little bit you can pull it off
# Step 1: make edges for descendent nodes invisible in plot:
groups <- c("A", "B", "C", "D")
group_edges <- numeric(0)
for(group in groups){
group_edges <- c(group_edges,getMRCA(tree.test,tree.test$tip.label[grepl(group, tree.test$tip.label)]))
}
edge.width <- rep(1, nrow(tree.test$edge))
edge.width[tree.test$edge[,1] %in% group_edges ] <- 0
# Step 2: plot the tree with the hidden edges
plot(tree.test, show.tip.label = F, edge.width = edge.width)
# Step 3: add triangles
add_polytomy_triangle <- function(phy, group){
root <- length(phy$tip.label)+1
group_node_labels <- phy$tip.label[grepl(group, phy$tip.label)]
group_nodes <- which(phy$tip.label %in% group_node_labels)
group_mrca <- getMRCA(phy,group_nodes)
tip_coord1 <- c(dist.nodes(phy)[root, group_nodes[1]], group_nodes[1])
tip_coord2 <- c(dist.nodes(phy)[root, group_nodes[1]], group_nodes[length(group_nodes)])
node_coord <- c(dist.nodes(phy)[root, group_mrca], mean(c(tip_coord1[2], tip_coord2[2])))
xcoords <- c(tip_coord1[1], tip_coord2[1], node_coord[1])
ycoords <- c(tip_coord1[2], tip_coord2[2], node_coord[2])
polygon(xcoords, ycoords)
}
Then you just have to loop through the groups to add the triangles
for(group in groups){
add_polytomy_triangle(tree.test, group)
}
I've also been searching for this kind of tool for ages, not so much for collapsing categorical groups, but for collapsing internal nodes based on a numerical support value.
The di2multi function in the ape package can collapse nodes to polytomies, but it currently can only do this by branch length threshold.
Here is a rough adaptation that allows collapsing by a node support value threshold instead (default threshold = 0.5).
Use at your own risk, but it works for me on my rooted Bayesian tree.
di2multi4node <- function (phy, tol = 0.5)
# Adapted di2multi function from the ape package to plot polytomies
# based on numeric node support values
# (di2multi does this based on edge lengths)
# Needs adjustment for unrooted trees as currently skips the first edge
{
if (is.null(phy$edge.length))
stop("the tree has no branch length")
# check that node labels exist before trying to convert them
if (is.null(phy$node.label))
stop("the tree has no node labels")
if (is.na(as.numeric(phy$node.label[2])))
stop("node labels can't be converted to numeric values")
ind <- which(phy$edge[, 2] > length(phy$tip.label))[as.numeric(phy$node.label[2:length(phy$node.label)]) < tol]
n <- length(ind)
if (!n)
return(phy)
foo <- function(ancestor, des2del) {
wh <- which(phy$edge[, 1] == des2del)
for (k in wh) {
if (phy$edge[k, 2] %in% node2del)
foo(ancestor, phy$edge[k, 2])
else phy$edge[k, 1] <<- ancestor
}
}
node2del <- phy$edge[ind, 2]
anc <- phy$edge[ind, 1]
for (i in 1:n) {
if (anc[i] %in% node2del)
next
foo(anc[i], node2del[i])
}
phy$edge <- phy$edge[-ind, ]
phy$edge.length <- phy$edge.length[-ind]
phy$Nnode <- phy$Nnode - n
sel <- phy$edge > min(node2del)
for (i in which(sel)) phy$edge[i] <- phy$edge[i] - sum(node2del <
phy$edge[i])
if (!is.null(phy$node.label))
phy$node.label <- phy$node.label[-(node2del - length(phy$tip.label))]
phy
}
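For comparison, this is a minimal sketch of what the stock ape::di2multi does with a branch length threshold. The Newick string is made up for illustration: the internal branch leading to the A clade is very short (0.001), so it falls below the chosen tol and that node is collapsed into its parent.

```r
library(ape)

# Hypothetical tree with one very short internal branch (0.001)
tr <- read.tree(text = "((A_1:0.05,A_2:0.03):0.001,(B_1:0.2,B_2:0.1):0.5);")

# Collapse internal branches shorter than tol into polytomies
collapsed <- di2multi(tr, tol = 0.01)

tr$Nnode         # internal nodes before collapsing
collapsed$Nnode  # internal nodes after collapsing
```

The adapted function above keeps the same mechanics but selects the edges to remove by node support instead of by edge length.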
This is my answer, based on the phytools::phylo.toBackbone function;
see http://blog.phytools.org/2013/09/even-more-on-plotting-subtrees-as.html and http://blog.phytools.org/2013/10/finding-edge-lengths-of-all-terminal.html. First, load the makebackbone function defined at the end of the code.
library(ape)
library(phytools) #phylo.toBackbone
library(phangorn)
cat("((A_1:0.05,E_2:0.03,A_3:0.2,A_4:0.1,A_5:0.1,A_6:0.1,A_7:0.35,A_8:0.4,A_9:01,A_10:0.2):0.9,((((B_1:0.05,B_2:0.05):0.5,B_3:0.02,B_4:0.04):0.6,(C_1:0.6,C_2:0.08):0.7):0.5,(D_1:0.3,D_2:0.4,D_3:0.5,D_4:0.7,D_5:0.4):1.2):0.5);"
, file = "ex.tre", sep = "\n")
phy <- read.tree("ex.tre")
groups <- c("A", "B|C", "D")
backboneoftree<-makebackbone(groups,phy)
# tip.label clade.label N depth
# 1 A_1 A 10 0.2481818
# 2 B_1 B|C 6 0.9400000
# 3 D_1 D 5 0.4600000
{
tryCatch(dev.off(),error=function(e){""})
par(fig=c(0,0.5,0,1), mar = c(0, 0, 2, 0))
plot(phy, main="Original" )
par(fig=c(0.5,1,0,1), oma = c(0, 0, 1.2, 0), xpd=NA, new=T)
plot(backboneoftree)
title(main="Clades")
}
makebackbone <- function(groupings,phy){
listofspecies <- phy$tip.label
listtopreserve <- character()
newedgelengths <- meandistnode<- lengthofclades<- numeric()
for (i in 1:length(groupings)){
bestmrca<-getMRCA(phy,grep(groupings[i], phy$tip.label) )
mrcatips<-phy$tip.label[unlist(phangorn::Descendants(phy,bestmrca, type="tips") )]
listtopreserve[i] <- mrcatips[1]
meandistnode[i] <- mean(dist.nodes(phy)[unlist(lapply(mrcatips,
function(x) grep(x, phy$tip.label) ) ),bestmrca] )
lengthofclades[i] <- length(mrcatips)
provtree <- drop.tip(phy,mrcatips, trim.internal=F, subtree = T)
n3 <- length(provtree$tip.label)
newedgelengths[i] <- setNames(provtree$edge.length[sapply(1:n3,function(x,y)
which(y==x),
y=provtree$edge[,2])],
provtree$tip.label)[provtree$tip.label[grep("tips",provtree$tip.label)] ]
}
newtree <- drop.tip(phy,setdiff(listofspecies,listtopreserve),
trim.internal = T)
n <- length(newtree$tip.label)
newtree$edge.length[sapply(1:n,function(x,y)
which(y==x),
y=newtree$edge[,2])] <- newedgelengths + meandistnode
trans <- data.frame(tip.label=newtree$tip.label,clade.label=groupings,
N=lengthofclades, depth=meandistnode )
rownames(trans) <- NULL
print(trans)
backboneoftree <- phytools::phylo.toBackbone(newtree,trans)
return(backboneoftree)
}
EDIT: I haven't tried this, but it might be another answer: "Script and function to transform the tip branches of a tree , i.e the thickness or to triangles, with the width of both correlating with certain parameters (e.g., species number of the clade) (tip.branches.R)"
https://www.en.sysbot.bio.lmu.de/people/employees/cusimano/use_r/index.html
I think the script is finally doing what I wanted.
Building on the answer that @CactusWoman provided, I changed the code a little so that the script tries to find the MRCA of the largest branch that matches my search pattern. This solved the problem of not merging non-polytomic branches, or collapsing the whole tree because one matching node was mistakenly outside the correct branch.
In addition, I included a parameter that represents the limit for the pattern abundance ratio in a given branch, so we can select and collapse/group branches that have, for example, at least 90% of their tips matching the search pattern.
library(geiger)
library(phylobase)
library(ape)
#functions
find_best_mrca <- function(phy, group, threshold){
group_matches <- phy$tip.label[grepl(group, phy$tip.label, ignore.case=TRUE)]
group_mrca <- getMRCA(phy,phy$tip.label[grepl(group, phy$tip.label, ignore.case=TRUE)])
group_leaves <- tips(phy, group_mrca)
match_ratio <- length(group_matches)/length(group_leaves)
if( match_ratio < threshold){
#start searching for children nodes whose share of descendants matching the search pattern reaches the threshold
mrca_children <- descendants(as(phy,"phylo4"), group_mrca, type="all")
i <- 1
new_ratios <- NULL
nleaves <- NULL
names(mrca_children) <- NULL
for(new_mrca in mrca_children){
child_leaves <- tips(phy, new_mrca)
child_matches <- grep(group, child_leaves, ignore.case=TRUE)
new_ratios[i] <- length(child_matches)/length(child_leaves)
nleaves[i] <- length(tips(phy, new_mrca))
i <- i+1
}
match_result <- data.frame(mrca_children, new_ratios, nleaves)
match_result_sorted <- match_result[order(-match_result$nleaves,match_result$new_ratios),]
print(match_result_sorted)
for(line in seq_len(nrow(match_result_sorted))){
if(match_result_sorted$new_ratios[line] >= threshold){
return(match_result_sorted$mrca_children[line])
}
}
#no child node reaches the threshold
return(0)
}else{return(group_mrca)}
}
add_triangle <- function(phy, group,phylo_plot){
group_node_labels <- phy$tip.label[grepl(group, phy$tip.label)]
group_mrca <- getMRCA(phy,group_node_labels)
group_nodes <- descendants(as(phy,"phylo4"), group_mrca, type="tips")
names(group_nodes) <- NULL
x<-phylo_plot$xx
y<-phylo_plot$yy
x1 <- max(x[group_nodes])
x2 <-max(x[group_nodes])
x3 <- x[group_mrca]
y1 <- min(y[group_nodes])
y2 <- max(y[group_nodes])
y3 <- y[group_mrca]
xcoords <- c(x1,x2,x3)
ycoords <- c(y1,y2,y3)
polygon(xcoords, ycoords)
return(c(x2,y3))
}
#main
cat("((A_1:0.05,E_2:0.03,A_3:0.2,A_4:0.1,A_5:0.1,A_6:0.1,A_7:0.35,A_8:0.4,A_9:01,A_10:0.2):0.9,((((B_1:0.05,B_2:0.05):0.5,B_3:0.02,B_4:0.04):0.6,(C_1:0.6,C_2:0.08):0.7):0.5,(D_1:0.3,D_2:0.4,D_3:0.5,D_4:0.7,D_5:0.4):1.2):0.5);", file = "ex.tre", sep = "\n")
tree.test <- read.tree("ex.tre")
# Step 1: Find the best MRCA that matches the keywords or search pattern
groups <- c("A", "B|C", "D")
group_labels <- groups
group_edges <- numeric(0)
edge.width <- rep(1, nrow(tree.test$edge))
count <- 1
for(group in groups){
best_mrca <- find_best_mrca(tree.test, group, 0.90)
group_leaves <- tips(tree.test, best_mrca)
groups[count] <- paste(group_leaves, collapse="|")
group_edges <- c(group_edges,best_mrca)
#Step2: Remove the edges of the branches that will be collapsed, so they become invisible
edge.width[tree.test$edge[,1] %in% c(group_edges[count],descendants(as(tree.test,"phylo4"), group_edges[count], type="all")) ] <- 0
count = count +1
}
#Step 3: plot the tree hiding the branches that will be collapsed/grouped
last_plot.phylo <- plot(tree.test, show.tip.label = F, edge.width = edge.width)
#And save a copy of the plot so we can extract the xy coordinates of the nodes
#To get the x & y coordinates of a plotted tree created using plot.phylo
#or plotTree, we can steal from inside tiplabels:
last_phylo_plot<-get("last_plot.phylo",envir=.PlotPhyloEnv)
#Step 4: Add triangles and labels to the collapsed nodes
for(i in 1:length(groups)){
text_coords <- add_triangle(tree.test, groups[i],last_phylo_plot)
text(text_coords[1],text_coords[2],labels=group_labels[i], pos=4)
}
This doesn't address depicting the clades as triangles, but it does help with collapsing low-support nodes. The library ggtree has a function as.polytomy which can be used to collapse nodes based on support values.
For example, to collapse bootstraps less than 50%, you'd use:
polytree = as.polytomy(raxtree, feature='node.label', fun=function(x) as.numeric(x) < 50)

Retain start and end IDs when creating spatial lines from X Y points

The following code will create SpatialLines that connect all x/y points in a dataset. However, each of those x/y points has a unique ID. I need to retain both the startpoint and the endpoint IDs of the x/y as "attributes" (the ArcGIS term) of each of the SpatialLines that connect them.
Help would be appreciated.
I have a dataset of XY values that looks like this
x<-c(2,4,6,3,7,9,1)
y<-c(6,4,8,2,9,6,1)
id<-c("a","b","c","d","e","f","g")
dataset<-data.frame(cbind(x,y,id))
dataset$x<-as.numeric(as.character(dataset$x)) #converting from factor to numeric
dataset$y<-as.numeric(as.character(dataset$y))
plot(dataset$x,dataset$y)
Replicate the dataframe to cover for all possible combinations
dataset<-do.call(rbind, replicate(7, dataset, simplify=FALSE))
Now, create a matrix with all the same destination points, mixed:
nm=matrix(ncol=3)
for (i in 1:7){
nm<-rbind(nm,do.call(rbind,replicate(7,as.matrix(dataset[i,]),simplify=FALSE)))
}
nm<-nm[-1,]
Rename the columns of matrix, so they make sense, and bind the existing data frame with the new matrix
colnames(nm)<-c("x2","y2","id.dest")
newds<-cbind(dataset,as.data.frame(nm))
Remove duplicated trajectories:
newds1<-newds[-which(newds$id==newds$id.dest),]
converting destination x & y to numeric from factor
newds1$x2<-as.numeric(as.character(newds1$x2)) #converting from factor to numeric
newds1$y2<-as.numeric(as.character(newds1$y2))
plotting the destination points . . .same as the origin points
plot(newds1$x, newds1$y)
plot(newds1$x2, newds1$y2, col="red")
####*
converting the begin and end points to spatial lines
raw list to store Lines objects
l <- vector("list", nrow(newds1)) #
this l is now an empty vector w/ number of rows defined by length (nrow) of newds1
splitting origin and destination coordinates so I can run this script
origins<-data.frame(cbind(newds1$x, newds1$y))
destinations<-data.frame(cbind(newds1$x2, newds1$y2))
library(sp)
for (i in seq_along(l)) {
l[[i]] <- Lines(list(Line(rbind(origins[i, ], destinations[i,]))), as.character(i))
}
l.spatial<-SpatialLines(l)
plot(l.spatial, add=T)
The object newds1 contains both the startpoint and endpoint. However, the final SpatialLines that are created from those start and endpoints (l.spatial) do not contain a reference to the start and endpoints. I would like those SpatialLines to contain two "attribute" columns that refer to the IDs of the start and endpoints. I think this is a matter of binding newds1 (a dataframe) onto l.spatial (spatial lines), but the code I'm running doesn't seem to do it.
Attempting to spatially bind the start and end IDs to l.spatial
row.names(newds1)<-1:length(newds1$id) #renaming rows in the dataframe so they match the spatial object
id<-newds1$id
newds2<-spCbind(l.spatial, id)
I get
"Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘spCbind’ for signature ‘"SpatialLines", "factor"’"
In short:
newds2 <- SpatialLinesDataFrame(l.spatial, newds1, match.ID = FALSE)
## (or you can use the rownames of newds1 in the lines loop)
There are 42 distinct lines, made by matching every coordinate to each other coordinate once.
FWIW, you don't need to convert from factor to numeric:
x <- c(2,4,6,3,7,9,1)
y <- c(6,4,8,2,9,6,1)
id <- c("a","b","c","d","e","f","g")
## don't coerce to character in the first place:
## cbind(x, y, id) *must* be character, and then
## data.frame converts characters to factors
dataset <- data.frame(x = x, y = y, id = id)
There are other ways to simplify your task, but here's a reasonably straightforward way (I think this is what you are after):
x <- c(2,4,6,3,7,9,1)
y <- c(6,4,8,2,9,6,1)
id<-c("a","b","c","d","e","f","g")
## don't coerce to character in the first place: cbind(x, y, id) must be character,
## and then data.frame converts characters to factors by default
dataset<-data.frame(x = x, y = y, id = id)
l <- vector("list", nrow(dataset) * (nrow(dataset) - 1))
origID <- destID <- character(length(l))
##xy <- as.matrix(dataset[, c("x", "y")])
cnt <- 0
for (i in seq(nrow(dataset))) {
pt0 <- as.matrix(dataset[i, c("x", "y") ])
pts <- dataset[-i, ]
for (j in seq(nrow(pts))) {
cnt <- cnt + 1
l[[cnt]] <- Lines(list(Line(rbind(pt0, as.matrix(pts[j, c("x", "y")])))), as.character(cnt))
## as.character() keeps the labels even if id was read in as a factor
destID[cnt] <- as.character(pts$id[j])
origID[cnt] <- as.character(dataset$id[i])
}
}
x <- SpatialLinesDataFrame(SpatialLines(l), data.frame(dest = destID, orig = origID, row.names = as.character(1:cnt)))
Pick out one line and investigate:
itest <- 10
## so for example
as.data.frame(x[itest, ])
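## look up the row positions of the two endpoint ids in dataset
index <- match(c(as.character(x$orig[itest]), as.character(x$dest[itest])), dataset$id)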
plot(x)
plot(x[itest, ], lwd = 4, add = TRUE)
lines(dataset[index, c("x", "y")], col = "firebrick", lwd = 2)
text(dataset[index, c("x", "y")], label = dataset$id[index], col = "dodgerblue", cex = 4)
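The same construction can be sketched without nested loops by first building a table of origin/destination row pairs with expand.grid and then creating one line per row. This is only an illustration on three of the points; the names od_pairs, line_list, attrs and lines_sldf are made up, and the row names of the attribute table are set to the Lines IDs so that SpatialLinesDataFrame can match them.

```r
library(sp)

dataset <- data.frame(x = c(2, 4, 6), y = c(6, 4, 8),
                      id = c("a", "b", "c"), stringsAsFactors = FALSE)

# All ordered origin/destination pairs of row numbers, excluding self-pairs
od_pairs <- expand.grid(orig = seq_len(nrow(dataset)),
                        dest = seq_len(nrow(dataset)))
od_pairs <- od_pairs[od_pairs$orig != od_pairs$dest, ]

# One Lines object per pair, with the pair's position as its ID
line_list <- lapply(seq_len(nrow(od_pairs)), function(k) {
  xy <- as.matrix(dataset[c(od_pairs$orig[k], od_pairs$dest[k]), c("x", "y")])
  Lines(list(Line(xy)), ID = as.character(k))
})

# Attribute table holding the start and end ids, keyed by the same IDs
attrs <- data.frame(orig = dataset$id[od_pairs$orig],
                    dest = dataset$id[od_pairs$dest],
                    row.names = as.character(seq_len(nrow(od_pairs))),
                    stringsAsFactors = FALSE)

lines_sldf <- SpatialLinesDataFrame(SpatialLines(line_list), attrs)
as.data.frame(lines_sldf)
```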

Resources