How to assign different images to different vertices in an igraph? - r

I've looked at this question which seems similar but I am having difficulty getting it to work with my data.
Let's say my edgelist consists of the following:
P1 P2 weight
a b 1
a c 3
a d 2
b c 8
I use read.csv to collect the data, and then I convert it to a matrix. Then I graph it using the following:
g=graph.edgelist(x[,1:2],directed=F)
E(g)$weight=as.numeric(x[,3])
tkplot(g,layout=layout.fruchterman.reingold,edge.width=E(g)$weight)
And this returns a network with vertices and edges. I would like to replace vertex a with one image, vertex b with another and so on. I know how to apply the same image to all, but I want to apply a different image to every vertex. How do I go about doing this?
Edit: Adding reproducible code below as requested by user20650
# loading libraries
library(igraph)
library(rgdal)
# reading data from edgelist
x <- read.csv('edgelist', colClasses = c("character","character","numeric"), header=T)
# however, to replicate the data, use this line instead (Above line included just to show how I get the data)
x <- data.frame(P1 = c("a","a","a","b"), P2 = c("b","c","d","c"), weight = c(1,3,2,8))
# converting x to a matrix
x = as.matrix(x)
# preparing graph (getting rid of arrows, edge colors)
g = graph.edgelist(x[,1:2], directed=F)
E(g)$weight=as.numeric(x[,3])
E(g)[weight<=1]$color='dodgerblue'
E(g)[weight>=2&weight<=3]$color='dodgerblue1'
E(g)[weight>=4&weight<=7]$color='dodgerblue2'
E(g)[weight>=8&weight<=9]$color='dodgerblue3'
E(g)[weight==10]$color='dodgerblue4'
# plot the graph
# beginning of stuff I do not do anymore - the tkplot and adj lines below here I do not do anymore as they have been replaced with suggestions by user20650
tkplot(g, canvas.width=640, canvas.height=640, layout=layout.fruchterman.reingold, edge.width=E(g)$weight)
# just to make sure everything is correct, I was also verifying with this
adj <- get.adjacency(g, attr='weight')
# end of stuff I do not do anymore and I replaced it with what follows
# this is where I started placing user20650's lines (survcont1.png through survcont13.png are local files - 1 is the image for a, 2 for b, and so on)
url <- paste0("survcont", 1:13, ".png")
# my mapply which I guess I don't need anymore (I'm using rgdal because it is a library I already have that can read the images, am willing to use a better method if one exists)
mapply(readGDAL, url)
img <- lapply(url, png::readPNG)
set.seed(1)
adj <- matrix(sample(0:1,3^2,T,prob=c(0.8,0.8)),13,13)
g <- graph.adjacency(adj)
set.seed(1)
l <- layout.fruchterman.reingold(g)
l[,1]=(l[,1]-min(l[,1]))/(max(l[,1])-min(l[,1]))*2-1
l[,2]=(l[,2]-min(l[,2]))/(max(l[,2])-min(l[,2]))*2-1
# I added in the label so I can verify if the right vertices are showing up in the right places, I will remove in final version, also added in the edge weights
plot(g, layout=l, vertex.size=10, vertex.shape="square", vertex.color="#00000000", vertex.frame.color="#00000000", vertex.label="", edge.width=E(g)$weight)
# and finally plotting of the images
for(i in 1:nrow(l)) {
rasterImage(img[[i]], l[i, 1]-0.2, l[i, 2]-0.2, l[i, 1]+0.2, l[i, 2]+0.2)
}
I am guessing something is going wrong with the adj line and I'm somehow not linking my data to the images. I also don't get why I need to set.seed.
The images plot, which is great, but my original edge widths and colors do not and I am not sure image 1 is linking to a, 2 to b, and so on.

You can use Sacha's answer in the question you link to, to do this. If your images are stored in a list, just iterate through it to render the png files. I had to tweak the manual adjustment (from 0.1 to 0.2) to resize the image.
EDIT Using OP's data and adding edge weight and colour (deleted original post as this largely repeats it)
First need some images for the vertices.
# As I don't have access to your images I will download and use the
# images as before. We need four images as there are four vertices.
# You don't need to do this bit exactly; all you need to do is read
# your images into your R session, in a list called img
url <- paste0("http://pngimg.com/upload/cat_PNG", 1632:1635, ".png")
mapply(download.file, url, basename(url))
img <- lapply( basename(url), png::readPNG)
library(igraph)
# data
x <- data.frame(P1 = c("a","a","a","b"),
P2 = c("b","c","d","c"),
weight = c(1,3,2,8))
# this reads in the third column which you can then assign to be weights
g <- graph.data.frame(x, directed=FALSE)
# check
E(g)$weight
# edge colour - you might need to tweak this depending on your
# data, with the right argument etc
E(g)$colour <- as.character(cut(as.numeric(E(g)$weight),
breaks = c(0, 1, 3, 7, 9, 10),
labels=paste0("dodgerblue", c("", 1:4))))
# you need to set the seed as the layout function is an
# iterative process and not deterministic
set.seed(1)
l <- layout.norm(layout.fruchterman.reingold(g),
xmin=-1, xmax=1, ymin=-1, ymax=1)
par(mar=rep(0,4))
plot(g, layout=l, vertex.size=20, vertex.shape="square",
vertex.color="#00000000", vertex.frame.color="#00000000",
vertex.label="", edge.width=E(g)$weight, edge.color=E(g)$colour)
# and finally plotting of the images
for(i in 1:nrow(l)) {
rasterImage(img[[i]], l[i, 1]-0.2, l[i, 2]-0.2, l[i, 1]+0.2, l[i, 2]+0.2)
}
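On the worry about whether image 1 really lands on a, image 2 on b, and so on: the loop above pairs img[[i]] with row i of the layout, and the layout rows follow the vertex order in V(g). A minimal sketch of making that pairing explicit is below. It assumes a recent igraph (graph_from_data_frame, layout_with_fr, norm_coords), and the tiny solid-colour arrays and the helper solid() are hypothetical stand-ins for what png::readPNG() would return.

```r
library(igraph)

x <- data.frame(P1 = c("a", "a", "a", "b"),
                P2 = c("b", "c", "d", "c"),
                weight = c(1, 3, 2, 8))
g <- graph_from_data_frame(x, directed = FALSE)

# Hypothetical stand-in images: 2x2 RGB arrays, one solid colour each.
solid <- function(red, green, blue) {
  array(rep(c(red, green, blue), each = 4), dim = c(2, 2, 3))
}

# Name each image by the vertex it belongs to (deliberately out of order here).
img <- list(d = solid(1, 1, 0), a = solid(1, 0, 0),
            c = solid(0, 0, 1), b = solid(0, 1, 0))

# Reorder the list so that element i belongs to vertex i of the graph.
img <- img[V(g)$name]

set.seed(1)
l <- norm_coords(layout_with_fr(g), xmin = -1, xmax = 1, ymin = -1, ymax = 1)

plot(g, layout = l, vertex.size = 20, vertex.shape = "square",
     vertex.color = "#00000000", vertex.frame.color = "#00000000",
     vertex.label = V(g)$name, edge.width = E(g)$weight)

# Row i of the layout and img[[i]] now refer to the same vertex.
for (i in seq_len(vcount(g))) {
  rasterImage(img[[i]], l[i, 1] - 0.2, l[i, 2] - 0.2, l[i, 1] + 0.2, l[i, 2] + 0.2)
}
```

Keeping the vertex labels switched on while testing makes it easy to confirm by eye that each image sits on the intended vertex.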

Related

How to visualize a distance matrix on the map by the thickness or color of the line connect the adjacent localities in R?

Suppose I have two datasets: (1) a data frame: coordinates of localities, each with ID; and (2) a linguistic distance matrix which reflects the linguistic distance between these localities.
# My data are similar to this structure
# dataframe
id <- c("A","B","C","D","E")
x_coor <- c(0.5,1,1,1.5,2)
y_coor <- c(5.5,3,7,6.5,5)
my.data <- data.frame(id = id, x_coor = x_coor, y_coor = y_coor)
# linguistic distance matrix
A B C D
B 308.298557
C 592.555483 284.256926
D 141.421356 449.719913 733.976839
E 591.141269 282.842712 1.414214 732.562625
Now, I want to visualize the linguistic distance between every two sites onto a map by the thickness or color of the line connect the adjacent localities in R.
Just like this:
My idea is to generate the delaunay triangulation by deldir or tripack package in R.
# generate delaunay triangulation
library(deldir)
de=deldir(my.data$x_coor,my.data$y_coor)
plot.deldir(de,wlines="triang",col='blue',wpoints = "real",cex = 0.1)
text(my.data$x_coor,my.data$y_coor,my.data$id)
this is the plot:
My question is how to reflect the linguistic distance by the thickness or color of the edges of triangles? Is there any other better method?
Thank you very much!
What you want to do in respect of the line widths can be done "fairly
easily" by the deldir package. You simply call plot.deldir() with the
appropriate value of "lw" (line width).
At the bottom of this answer is a demonstration script "demo.txt" which shows how to do this in the case of your example. In particular this script shows
how to obtain the appropriate value of lw from the "linguistic distance
matrix". I had to make some adjustments in the way this matrix was
presented. I.e. I had to convert it into a proper matrix.
I have rescaled the distances to lie between 0 and 10 to obtain the
corresponding values of the line widths. You might wish to rescale in a different manner.
In respect of colours, there are two issues:
(1) It is not at all clear how you would like to map the "linguistic
distances" to colours.
(2) Unfortunately the code for plot.deldir() is written in a very
kludgy way, whence the "col" argument to segments() cannot be
appropriately passed on in the same manner that the "lw" argument can.
(I wrote the plot.deldir() code a long while ago, when I knew far less about
R programming than I know now! :-))
I will adjust this code and submit a new version of deldir to CRAN
fairly soon.
#
# Demo script
#
# Present the linguistic distances in a useable way.
vldm <- c(308.298557,592.555483,284.256926,141.421356,449.719913,
733.976839,591.141269,282.842712,1.414214,732.562625)
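ldm <- matrix(0,nrow=5,ncol=5)
# vldm lists the lower triangle row by row, which is the same order
# as the upper triangle column by column (the order R fills in).
ldm[row(ldm) < col(ldm)] <- vldm
ldm <- ldm + t(ldm)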
rownames(ldm) <- LETTERS[1:5]
colnames(ldm) <- LETTERS[1:5]
# Set up the example data. It makes life much simpler if
# you denote the "x" and "y" coordinates by "x" and "y"!!!
id <- c("A","B","C","D","E")
x_coor <- c(0.5,1,1,1.5,2)
y_coor <- c(5.5,3,7,6.5,5)
# Eschew nomenclature like "my.data". Such nomenclature
# is Micro$oft-ese and is an abomination!!!
demoDat <- data.frame(id = id, x = x_coor, y = y_coor)
# Form the triangulation/tessellation.
library(deldir)
dxy <- deldir(demoDat)
# Plot the triangulation with line widths proportional
# to "linguistic distances". Note that plot.deldir() is
# a *method* for plot, so you do not have to (and shouldn't)
# type the ".deldir" in the plotting command.
plot(dxy,col=0) # This, and plotting with "add=TRUE" below, is
# a kludge to dodge around spurious warnings.
ind <- as.matrix(dxy$delsgs[,c("ind1","ind2")])
lwv <- ldm[ind]
lwv <- 10*lwv/max(lwv)
plot(dxy,wlines="triang",col='grey',wpoints="none",
lw=lwv,add=TRUE) # lwv was already rescaled to 0-10 above
with(demoDat,text(x,y,id,col="red",cex=1.5))
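Until colours can be passed through plot.deldir(), one workaround is to draw the triangulation edges yourself with base graphics. The sketch below is only an illustration under stated assumptions: it takes the segment table dxy$delsgs (columns x1, y1, x2, y2, ind1, ind2), looks up the distance for each edge, and maps it to both line width and a colour ramp. The gold-to-red palette and the ten equal-width bins are arbitrary choices, not anything prescribed by deldir.

```r
library(deldir)

id <- LETTERS[1:5]
x  <- c(0.5, 1, 1, 1.5, 2)
y  <- c(5.5, 3, 7, 6.5, 5)

# Distances as listed in the question (lower triangle, row by row).
vldm <- c(308.298557, 592.555483, 284.256926, 141.421356, 449.719913,
          733.976839, 591.141269, 282.842712, 1.414214, 732.562625)
ldm <- matrix(0, 5, 5, dimnames = list(id, id))
ldm[upper.tri(ldm)] <- vldm   # same order as the upper triangle by column
ldm <- ldm + t(ldm)           # make it symmetric

dxy <- deldir(x, y)
sg  <- dxy$delsgs                     # one row per Delaunay edge
d   <- ldm[cbind(sg$ind1, sg$ind2)]   # distance for each edge

# Assumed colour mapping: ten equal-width bins from 0 to the largest distance.
pal <- colorRampPalette(c("gold", "red"))(10)
bin <- cut(d, breaks = seq(0, max(ldm), length.out = 11),
           include.lowest = TRUE, labels = FALSE)

plot(x, y, type = "n", asp = 1)
segments(sg$x1, sg$y1, sg$x2, sg$y2, col = pal[bin], lwd = 1 + 9 * d / max(ldm))
text(x, y, id, col = "blue", cex = 1.5)
```

Using segments() directly means the colour and the width are each a plain vector with one entry per edge, so any other mapping (quantile bins, a log scale) can be swapped in without touching the triangulation.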

How can I get the same piece (duplicate code) of an image from many different photos every time?

From 5000 photos of license plates I want to determine which duplicate code these license plates have.
Here are 2 examples of a duplicate code on a license plate.
In the first example the duplicate code is 2 and in the second example the duplicate code is 1.
With the packages magick and tesseract (see code below), I was able to crop the part of the photo from the first example that contains the duplicate code and read it. However, in the second example and in other photos the framing is different, so the same fixed crop does not work.
So I am looking for something that can recognize where the duplicate code is and that will read the duplicate code. Note: The duplicate code is always above the 1st indent mark.
Does someone have an idea how to read the duplicate code automatically from 5000 different photos?
library(magick)
library(tesseract)
#Load foto:
foto <- image_read("C:/Users/camie/OneDrive/Documenten/kenteken3.jpg")
#Get piece of photo where duplicate code is retrieved:
foto2 <- image_crop(foto,"10X24-620-170")
#read duplicate code:
cat(ocr(foto2))
Here is an approach based on the package EBImage. ImageMagick is great for image manipulation but I think EBImage may provide more quantitative tools that are useful here. As for all image processing, the quality of the input image matters a great deal. The approach suggested here would likely benefit from noise and artifact removal, scaling and possibly cropping.
Also, some licenses seem to have additional symbols in the position of interest that are not numbers. Clearly more pre-processing and filtering are needed for such cases.
Sample image
# Starting from EBImage
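if (!require(EBImage)) {
# biocLite() has been retired; Bioconductor packages now install via BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("EBImage")
library(EBImage)
}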
# Test images
# setwd(<image directory>)
f1 <- "license1.jpg"
f2 <- "license2.jpg"
# Read image and convert to normalized greyscale
img0 <- readImage(f1)
img <- channel(img0, "grey")
img <- normalize(img)
# plot(img) # insert plot or display commands as desired
# Rudimentary image process for ~300 pixel wide JPEG
xmf <- medianFilter(img, 1)
xgb <- gblur(xmf, 1)
xth <- xgb < otsu(xgb) # Otsu's algorithm to determine best threshold
xto <- opening(xth, makeBrush(3, shape = "diamond"))
A binary (thresholded) image has been produced and cleaned up to identify objects as shown here.
# Create object mask with unique integer for each object
xm <- bwlabel(xto)
# plot(colorLabels(xm)) # optional code to visualize the objects
In addition to the rudimentary image process, some "object processing" can be applied as shown here. Objects along the edge are not going to be of interest so they are removed. Similarly, artifacts that give rise to horizontal (wide) streaks can be removed as well.
# Drop objects touching the edge
nx <- dim(xm)[1]
ny <- dim(xm)[2]
sel <- unique(c(xm[1,], xm[nx,], xm[,1], xm[,ny]))
sel <- sel[sel != 0]
xm <- rmObjects(xm, sel, reenumerate = TRUE)
# Drop exceptionally wide objects (33% of image width)
major <- computeFeatures.moment(xm)[,"m.majoraxis"]
sel <- which(major > nx/3)
xm <- rmObjects(xm, sel, reenumerate = TRUE)
The following logic identifies the center of mass for each object with the computeFeatures.moment function of EBImage. It seems that the main symbols will be along a horizontal line while the candidate object will be above that line (lower y-value in EBImage Image object). An alternative approach would be to find objects stacked on one another, i.e., objects with similar x-values.
For the examples I explored, one standard deviation away from the median y-value for the center of mass appears to be sufficient to identify candidate object. This is used to determine the limits shown below. Of course, this logic should be adjusted as dictated by the actual data.
# Determine center of mass for remaining objects
M <- computeFeatures.moment(xm)
x <- M[,1]
y <- M[,2]
# Show suggested limit on image (y coordinates are inverted)
plot(img)
limit <- median(y) - sd(y)
abline(h = limit, col = "red")
# Show centers of mass on original image
ok <- y < limit
points(x[!ok], y[!ok], pch = 16, col = "blue")
points(x[ok], y[ok], pch = 16, col = "red")
The image shows the segmented objects after having discarded objects along the edge. Red shows the candidate, blue shows the non-candidates.
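To make the threshold rule concrete, here is a toy illustration in base R. The centre-of-mass y-values are invented for the example (five symbols on the main line and one object sitting above it); they are not measurements from the sample images.

```r
# Hypothetical y-coordinates of object centres (smaller y = higher in the image).
y <- c(120, 118, 122, 119, 60, 121)

# Same rule as above: one standard deviation above the median line.
limit <- median(y) - sd(y)

# Objects whose centre lies above the limit are candidates.
ok <- y < limit
which(ok)
```

With these made-up values only the fifth object falls above the limit. On real plates the spread of y-values will differ, so the multiplier on sd(y) is the natural knob to tune.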
Because some licenses have two symbols above the dash, the following code selects the leftmost of possible candidates, expands the object mask and returns a rectangular crop of the image that can be passed to ocr().
# Accept leftmost (first) of candidate objects
left <- min(x[which(ok)])
sel <- which(x == left)
# Enlarge object mask and extract the candidate image
xm <- dilate(xm, makeBrush(7, "disc"))
ix <- range(apply(xm, 2, function(v) which(v == sel)))
iy <- range(apply(xm, 1, function(v) which(v == sel)))
xx <- ix[1]:ix[2]
yy <- iy[1]:iy[2]
# "Return" selected portion of image
ans <- img[xx, yy] # this is what can be passed to tesseract
plot(ans, interpolate = FALSE)
Here is the unscaled and extracted candidate image from example 1:
Another sample image
The same code applied to this example gives the following:
With a few more checks for errors and for illogical conditions, the code could be assembled into a single function and applied to the list of 5000 files! But of course that assumes they are properly formatted, etc. etc.
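A rough sketch of what that batch step could look like is below. Both function names are hypothetical: extract_candidate() is only a placeholder standing in for the EBImage steps above wrapped into one function, and process_all() is an illustrative wrapper that keeps going when a single file fails.

```r
# Placeholder for the real extraction; assumed to take a file path and
# return the cropped candidate image (here it just returns the path).
extract_candidate <- function(path) {
  if (!file.exists(path)) stop("file not found: ", path)
  path
}

# Apply an extractor to every file, recording NULL for files that fail
# instead of aborting the whole run.
process_all <- function(files, extractor) {
  results <- lapply(files, function(f) {
    tryCatch(extractor(f), error = function(e) {
      message("skipping ", f, ": ", conditionMessage(e))
      NULL
    })
  })
  names(results) <- files
  results
}

# Usage with one existing (temporary) file and one missing file.
tmp <- tempfile(fileext = ".jpg")
file.create(tmp)
res <- process_all(c(tmp, "no_such_file.jpg"), extract_candidate)
failed <- names(res)[vapply(res, is.null, logical(1))]
```

Collecting the failures separately makes it easy to inspect the problem photos by hand afterwards.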
What with the existence of multiple layouts for Dutch license plates, I'm not sure you can just hardcode a method to extract the duplicate code. You also don't mention whether every image has the same quality, orientation, scale, skew, etc.
You could in theory apply a Convolutional Neural Network that classifies license plates into several categories (0 for n/a, 1 for 1, 2 for 2, etc.). However, I am not familiar with the related packages in R, so I can't point you to any.

cluster walktrap returns three communities, but when plotting they are all on top of each other, with no visible clustering

I've been following documentation tutorials and even lecture tutorials step by step. But for some reason the output of my plot is like this:
The output doesn't make any sense to me. There clearly is no structure, or communities in this current plot, as you can see that the bigger circles are all overlapping. Shouldn't this, in this case, return only a single community? Additionally the modularity of my network is ~0.02 which would again, suggest there is no community structure. But why does it return 3 communities?
this is my code: (exactly same as in documentation, with different dataset)
m <- data.matrix(df)
g <- graph_from_adjacency_matrix(m, mode = "undirected")
#el <- get.edgelist(g)
wc <- cluster_walktrap(g)
modularity(wc)
membership(wc)
plot(wc,g)
My data set is a 500x500 adjacency matrix in the form of a csv, with column and index names 1-500, each corresponding to a person.
I tried understanding the community class and using different types of variables for the plot, e.g. membership(wc)[2] etc. My thought is that the coloring is simply wrong, but nothing I've tried so far seems to fix the issue.
You can have inter-community connections. You're working with a graph of 500 nodes and they can have multiple connections. There will be a large number of connections between nodes of different communities, but if you conduct a random walk you're most likely to traverse connections between nodes of the same community.
If you separate the communities in the plot (using @G5W's code from "(igraph) Grouped layout based on attribute") you can see the different groups.
set.seed(4321)
g <- sample_gnp(500, .25)
plot(g, vertex.label = '', vertex.size = 5)
wc <- cluster_walktrap(g)
V(g)$community <- membership(wc)
E(g)$weight = 1
g_grouped = g
for(i in unique(V(g)$community)){
groupV = which(V(g)$community == i)
g_grouped = add_edges(g_grouped, combn(groupV, 2), attr=list(weight = 2))
}
l <- layout_nicely(g_grouped)
plot( wc,g, layout = l, vertex.label = '', vertex.size = 5, edge.width = .1)
Red edges are intercommunity connections and black edges are intracommunity edges
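To put numbers on this, a minimal sketch using the same kind of random graph: crossing() reports for each edge whether its endpoints are in different communities, so the share of inter-community edges can be read off directly. The random graph is only a stand-in for the real 500x500 adjacency matrix.

```r
library(igraph)

set.seed(4321)
g  <- sample_gnp(500, 0.25)
wc <- cluster_walktrap(g)

# TRUE for edges whose endpoints are in different communities.
is_cross <- crossing(wc, g)

table(is_cross)   # counts of intra- vs inter-community edges
mean(is_cross)    # share of edges that cross communities
sizes(wc)         # number of vertices per community
modularity(wc)    # close to 0 when there is little community structure
```

A large share of crossing edges together with a modularity near zero is consistent with the observation in the question: the algorithm still returns a partition, but the partition is weak.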

Extracting the outer-boundary of a set of grid points with some missing grids (holes)

This question is a generalization of this question. The answer to that question works well for a point set with no holes. In the present question I want to get the perimeter (outer boundary) of a subset of a near-regular grid of points where some of the grid points within the polygon are missing (i.e., a polygon with holes).
The sample data set on grids is available here.
I used the R-code as suggested as an answer in the above mentioned question (with no holes).
The following is the output of using those codes:
Now I want it to ignore the holes inside the point set and treat the outer boundary of the point set as the required polygon.
Any suggestion!! Thanks.
This slight variant on my previous code works if there are holes: it finds all the loops and takes only the one with the largest X coordinate, which must be the outside loop. Unless the loops touch... Strictly, perhaps it should take the loop with the largest area... Note also the need to use X and Y in one of the functions because of a bug (I've reported) in the igraph package.
perimeterGrid <- function(pts, maxdist=6000, mindist=1){
g = edgeP(makegrid(pts, maxdist=maxdist, mindist=mindist))
## there might be holes. Find the loop with the largest X coordinate.
parts = components(g)
outer = which.max(tapply(V(g)$x, parts$membership, function(x){max(x)}))
g = induced_subgraph(g, which(parts$membership==outer))
loop = graph.dfs(minimum.spanning.tree(g),1)$order
cbind(V(g)$x, V(g)$y)[loop,]
}
# haversine distance matrix
dmat <- function(pts){
n=nrow(pts)
do.call(rbind,lapply(1:n,function(i){distHaversine(pts[i,],pts)}))
}
# make the grid cells given a maxdist (and a mindist to stop self-self edges)
makegrid <- function(pts, maxdist=6000, mindist=1){
dists = dmat(pts)
g = graph.adjacency(dists<maxdist & dists>mindist,
mode="undirected")
## use X and Y here not x and y due to igraph bug
## these get copied to the output later...
V(g)$X=pts[,1]
V(g)$Y=pts[,2]
g
}
# clever function that does the grid edge count etc
edgeP <- function(g){
# find all the simple squares
square=graph.ring(4)
subs = graph.get.subisomorphisms.vf2(g,square)
# expand all the edges
subs = do.call(rbind, lapply(subs, function(s){
rbind(s[1:2], s[2:3], s[3:4], s[c(4,1)])
}))
# make new graph of the edges of all the squares
e = graph.edgelist(subs,directed=FALSE)
# add the weight as the edge count
E(e)$weight=count.multiple(e)
# copy the coords from the source back
V(e)$x=V(g)$X
V(e)$y=V(g)$Y
# remove multiple edges
e=simplify(e)
# internal edges now have weight 256.
e = e - edges(which(E(e)$weight==256))
# internal nodes now have degree 0
e = e - vertices(degree(e)==0)
return(e)
}
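On the "largest area" alternative mentioned above: a minimal sketch of ranking loops by area with the shoelace formula. The helper name loop_area() and the two toy loops are hypothetical, and the formula treats the coordinates as planar, so on longitude/latitude data it is only good for comparing loops, not for a true area.

```r
# Area of a closed loop given as a two-column matrix of (x, y) coordinates
# listed in traversal order (shoelace formula).
loop_area <- function(xy) {
  x <- xy[, 1]
  y <- xy[, 2]
  abs(sum(x * c(y[-1], y[1]) - c(x[-1], x[1]) * y)) / 2
}

# Toy example: an outer 4x4 boundary and a 1x1 hole inside it.
loops <- list(
  outer = rbind(c(0, 0), c(4, 0), c(4, 4), c(0, 4)),
  hole  = rbind(c(1, 1), c(2, 1), c(2, 2), c(1, 2))
)

areas <- sapply(loops, loop_area)
names(which.max(areas))   # the loop to keep as the outer boundary
```

In perimeterGrid() the same idea would mean ordering each component's vertices into a loop first and then comparing the areas, rather than comparing maximum X values.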

Plot tree with R

from a data.frame (or any other R object type), with 3 Columns: "Node, Parent and text", I'd like to plot a tree with rows from "Node" to "Parent" and "text" as label.
Can anyone suggest a good library to use and example code, if possible.
I've been looking at the igraph library, but all examples I could find plot trees with sequential numbers or letters as nodes and it's not simple to set the tree layout.
Any help would be greatly appreciated
Thanks
EDIT:
Thanks guys for all your help, I really appreciate it.
Some extra comments, if you can help further
@md1630, I tried your suggestion but that's not what I'm looking for. The first code plots the tree with the root on top and the arrows from root to leaf, and the second corrects the arrows but inverts the tree. What I'd like is root on top and arrows from leaves to root (I understand that may not be a tree per se, but that's the requirement).
@user20650 your solution looks correct but the image starts to get crowded as the number of nodes increases. Any idea on how to add more space between them?
@math Am I using the function you provided correctly? I called plot(layout.binary(g)) and got the result on the left. The one on the right is the output of plot(g)
upgrade comment
library(igraph)
# some example data
dat <- data.frame(parent=rep(letters[1:3], each=2),
node=letters[2:7],
text=paste0("lab", 1:6))
# create graph
g <- graph.data.frame(dat)
# plot
# layout.reingold.tilford gives a tree structure
# edge and vertex labels can be defined in the plot command or alternatively
# you can add them to the graph via V(g)$name and E(g)$label assignments
plot(g, layout = layout.reingold.tilford,
edge.label=E(g)$text, vertex.label=paste0("v_lab",1:7))
EDIT re comment
If you want the direction to go from the leaves towards the root; you can first, get the tree layout coordinates from the more standard tree structure, and then reverse the edges.
# get tree layout coords
g <- graph.data.frame(dat)
lay = layout.reingold.tilford(g)
# redraw graph with edges reversed
g2 <- graph.data.frame(dat[2:1], vertices = get.data.frame(g, what="vertices"))
par(mar=rep(0,4), mfrow=c(1,2))
plot(g, layout=lay)
plot(g2, layout=lay)
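An alternative sketch, assuming a recent igraph where layout_as_tree() is available: build the graph with the edges already pointing from node to parent, then ask for the tree layout with mode = "all" so the edge direction is ignored when placing the vertices. The choice of "a" as the root comes from the example data above.

```r
library(igraph)

dat <- data.frame(parent = rep(letters[1:3], each = 2),
                  node   = letters[2:7],
                  text   = paste0("lab", 1:6))

# The first two columns define the edges, so this draws node -> parent.
g <- graph_from_data_frame(dat[, c("node", "parent", "text")])

# Index of the root vertex ("a" has no parent in the example data).
root <- which(V(g)$name == "a")

# mode = "all" ignores edge direction when building the tree layout;
# the default flip.y = TRUE puts the root at the top.
lay <- layout_as_tree(g, root = root, mode = "all")

plot(g, layout = lay, edge.label = E(g)$text)
```

This gives the root on top with arrows running from the leaves towards it in a single step, without building a second graph.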
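You can use Rgraphviz. Here's the code to plot the tree from a dataframe df with columns "Node, Parent and text". I didn't run this on my computer so there may be bugs. But roughly this is the idea:
# biocLite() has been retired; Bioconductor packages now install via BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("Rgraphviz")
library("Rgraphviz")
#first set up the graph with just the nodes (a character vector of every node name)
nodes <- unique(c(as.character(df$Node), as.character(df$Parent)))
gR <- new("graphNEL", nodes = nodes, edgemode = "directed")
#add edges for each row in df (from Parent to Node)
for (j in (1:nrow(df))) {
gR <- addEdge(as.character(df[j,2]), as.character(df[j,1]), gR, 1)
}
#add text labels (a character vector named by node)
nAttrs <- list()
nAttrs$label <- setNames(as.character(df$text), as.character(df$Node))
#plot
plot(gR, nodeAttrs = nAttrs) #you can specify more attributes here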
You can use igraph to get a network with your data (supposing your dataframe is dd):
g = graph(t(dd[,2:1]))
V(g)$label = as.character(dd$text)
plot(g, layout=layout.binary)
I supposed your root (with no parents) is not in the dataframe, otherwise use dd[-1,2:1] instead.
If you want to have a tree, you can easily produce a layout: it is simply a function that takes a graph and returns a matrix. For a binary tree:
layout.binary = function(graph) {
layout = c()
r_vertex = length(V(graph))
depth = ceiling(log2(r_vertex+1))
for (ii in 0:(depth-1)) {
for (jj in 1:min(2^ii, r_vertex)) {
layout = rbind(layout, c(ii, (2*(jj-1)+1)/(2^(ii+1))))
}
r_vertex = r_vertex - 2^ii
}
return(layout)
}
It will plot a horizontal tree; use c((2*(jj-1)+1)/(2^(ii+1)), ii) if you want it to be vertical.
