Self organising map visualisation result interpretation - r

Using the R Kohonen package, I have obtained a "codes" plot which shows the codebook vectors.
Shouldn't the codebook vectors of neighbouring nodes be similar? Why are the top two nodes on the left so different?
Is there a way to organise the map more meaningfully, as in the image below, where the countries with high poverty are clustered at the bottom?
library("kohonen")
data("wines")
wines.sc <- scale(wines)
set.seed(7)
# data passed positionally: the first argument is named `data` in kohonen 2.x and `X` in 3.x
wine.som <- som(wines.sc, grid = somgrid(5, 4, "hexagonal"))
# types of plots
plot(wine.som, type="codes", main = "Wine data")

Map 1 shows the average vector (the codebook vector) of each node. The top two nodes that you highlighted are in fact very similar.
Map 2 is a kind of similarity index between the nodes.
To obtain that kind of map from the map 1 result, you would probably have to write your own plotting function along these lines:
Pick the most relevant nodes, or the most different ones (manually or automatically), and assign a color to each of them.
Color the neighbouring nodes according to the average distance between the center of each node and the selected nodes: a shorter distance gives a color close to the selected node's, a greater distance a more faded one.
To sum up, that is a lot of work for nearly nothing. Map 1 is better and contains a lot of information. Map 2 is nice looking...
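A quick way to check numerically whether neighbouring nodes are similar is to compare each node's codebook vector with those of its grid neighbours. The sketch below is not the custom colouring procedure described above; it is a simpler substitute that assumes kohonen 3.0 or later (for getCodes) and uses illustrative names such as mean.nb.dist. The package's own type = "dist.neighbours" plot gives a comparable picture.

```r
library(kohonen)

data("wines")
wines.sc <- scale(wines)
set.seed(7)
wine.som <- som(wines.sc, grid = somgrid(5, 4, "hexagonal"))

# Codebook vectors: one row per node (assumes kohonen >= 3.0)
codes <- getCodes(wine.som, 1)

# Distances between codebook vectors, and between node positions on the grid
code.dist <- as.matrix(dist(codes))
grid.dist <- unit.distances(wine.som$grid)

# On a hexagonal grid the immediate neighbours sit at distance 1
is.neighbour <- grid.dist > 0 & grid.dist < 1.05

# Mean codebook distance from each node to its immediate neighbours
mean.nb.dist <- sapply(seq_len(nrow(codes)), function(i) {
  mean(code.dist[i, is.neighbour[i, ]])
})

# Low values mark nodes that resemble their neighbours
plot(wine.som, type = "property", property = mean.nb.dist,
     main = "Mean distance to neighbouring codebook vectors")

# Built-in plot of summed distances to neighbours
plot(wine.som, type = "dist.neighbours", main = "Neighbour distances")
```

Nodes with a high value are the ones that differ most from their surroundings, which is where visual differences between adjacent fans in the codes plot should show up.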

Related

Change edge size in igraph

I want to plot a simple star graph in which the size of the edges depends on a score representing a difference in perception between the central node (e.g., a leader) and the other nodes (e.g., the leader's employees).
I managed to modify the colors, the size of the nodes and the width of the edges, but not the size of the edges.
How would you do this?
library(igraph)
nodes <- read.csv("exemple_nodes.csv", header=TRUE, as.is=TRUE)
links <- read.csv("exemple_edges.csv", header=TRUE, as.is=TRUE)
st <- graph_from_data_frame(d=links, vertices=nodes, directed=TRUE)
plot(st, vertex.color=V(st)$perception.type)
With the ggraph package you can use one of the geom_edge_* functions (e.g., geom_edge_arc, geom_edge_diagonal) and map the edge_width aesthetic to a numeric value associated with each edge in the edge list (hereafter "value"). For example:
ggraph::ggraph(st) +
ggraph::geom_edge_diagonal(ggplot2::aes(edge_width = as.numeric(value)))
In addition, ggraph allows you to map other edge aesthetics inside the geom_edge_* functions, for example edge_alpha = as.numeric(value).
I think that what you want is to position the vertices so that you can control the length of the edges. If that is not what you want, then please explain what you mean by the "size" of the edges.
You do not provide your data, so we cannot use your exact graph. I will use a generic star graph as an example. To control the placement of the vertices, you need to use the layout parameter. The basic function layout_as_star places the first vertex at the center and the other vertices equally spaced around it at the same distance. Because this layout function places the center vertex at (0,0) and the remaining nodes on a unit circle around the center, it is easy to adjust it so that the distance of each outer vertex is controlled by a parameter: multiply the coordinates by the parameter and the distance changes proportionally. I just made something up for the distances, but you can use your own parameter.
## Make up perception parameter
set.seed(271828)
Perception = sample(4, 9, replace=TRUE)
Perception
[1] 2 3 4 4 1 4 2 2 1
Now there is one weight for every outer vertex, but we need a weight for the central vertex. We don't want it to move so we use a weight of 1.
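## S10 is not defined in the original post; a 10-vertex star is assumed here
S10 = make_star(10, mode = "undirected")
Weight = c(1, Perception)
LO = layout_as_star(S10)
LO = LO*Weight  # scales the coordinates of vertex i by Weight[i]
plot(S10, layout=LO)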
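To confirm that the scaled layout really encodes the scores, the edge lengths can be read back from the coordinates. This is a small sketch under the same assumptions as above: a 10-vertex star built with make_star, and the Perception values printed earlier hard-coded so the check does not depend on the random seed. The name edge.length is illustrative.

```r
library(igraph)

# Assumed stand-in for the asker's graph: vertex 1 is the center
S10 <- make_star(10, mode = "undirected")

# Scores copied from the output shown above
Perception <- c(2, 3, 4, 4, 1, 4, 2, 2, 1)

# Center keeps weight 1 (it sits at the origin, so it does not move)
LO <- layout_as_star(S10) * c(1, Perception)

# Euclidean distance from the center (row 1) to every outer vertex
centre <- matrix(LO[1, ], nrow = nrow(LO) - 1, ncol = 2, byrow = TRUE)
edge.length <- sqrt(rowSums((LO[-1, , drop = FALSE] - centre)^2))

print(cbind(Perception, edge.length))

plot(S10, layout = LO)
```

plot() rescales the layout to fit the plotting region by default, so the drawn lengths are proportional to the scores rather than equal to them.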

R generate points with condition using runifpoint function

I am trying to generate randomly distributed points in a rectangle.
To create 50 random points in a rectangle, I used
library(spatstat)
i <- 50
pp <- runifpoint(i, win = owin(c(0, 19.5), c(0, 3.12)))
Suppose I want to add conditions on the coordinates before generating the random points,
e.g. 0.24 < x < 19.26 and 0.24 < y < 2.64.
What code can I use to generate the random points then?
The ultimate goal is to generate points in the rectangle except for the grey shaded area in the image below.
This is a question about the R package spatstat.
The argument win specifies the spatial region in which the points will be generated. In your example you have specified this region to be a rectangle. You just need to replace this rectangle by the region in which you want the points to be generated.
You can construct spatial regions (objects of class owin) in many ways. See help(owin), or help(spatstat) for an overview.
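For the coordinate bounds mentioned in the question, replacing the window could look like the sketch below. The bounds are the ones the asker quoted; the object name inner and the seed are illustrative.

```r
library(spatstat)

# Smaller rectangle: 0.24 < x < 19.26, 0.24 < y < 2.64
inner <- owin(c(0.24, 19.26), c(0.24, 2.64))

set.seed(1)
pp <- runifpoint(50, win = inner)

# All generated points respect the bounds
print(range(pp$x))
print(range(pp$y))

# Draw the original rectangle, then the points inside the restricted window
plot(owin(c(0, 19.5), c(0, 3.12)), main = "Points restricted to the inner window")
plot(pp, add = TRUE)
```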
In your example, you could build up the shape by forming the union of several rectangles. For example to make a simple cross shape, I could just write
require(spatstat)
A <- owin(c(-1,1), c(-4, 4))
B <- owin(c(-4,4), c(-1,1))
U <- union.owin(A, B)
plot(U)
Another way would be to specify the corners of the polygon shape and use W <- owin(poly=p) where p = list(x, y) contains the coordinates of the corners, listed in anticlockwise order without repetition. See help(owin).
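A minimal sketch of the polygon route follows. The L-shaped corner coordinates are made up for illustration (the grey area in the question's image is not known here); they are listed anticlockwise and the first corner is not repeated.

```r
library(spatstat)

# Hypothetical L-shaped region, corners in anticlockwise order
p <- list(x = c(0, 4, 4, 1, 1, 0),
          y = c(0, 0, 1, 1, 3, 3))
W <- owin(poly = p)

# Area of the polygon: a 4 x 1 strip plus a 1 x 2 strip
print(area.owin(W))

set.seed(1)
pp <- runifpoint(50, win = W)

# Every generated point lies inside the polygon
print(all(inside.owin(pp$x, pp$y, W)))

plot(W, main = "Random points in a polygonal window")
plot(pp, add = TRUE)
```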
This is also covered in Section 3.5 of the spatstat book. You can download Chapter 3 for free.

(igraph) Grouped layout based on attribute

I'm using the igraph package in R to lay out a network graph, and I would like to group the vertex coordinates based on attribute values.
This is similar to the answered question How to make grouped layout in igraph?, but my question differs in that the nodes needn't be grouped by a community membership derived from a community detection algorithm.
Rather, I want a layout with groups based on attribute values that are known in advance for each vertex.
For example, if each vertex has an attribute "Master.Org", and there are ~10 to ~20 distinct values for Master.Org, how can I lay out the graph such that all vertices with the same Master.Org are grouped?
Thanks!
Additional Detail
In fact, two separate attributes provide nested levels of grouping.
My goal is to layout a graph object such that the "Master.Org" and "Org.Of" values are grouped together in their XY coordinates on the graph.
For example, each node will belong to an "Org.Of". And there can be multiple "Org.Of" values within the "Master.Org".
Thoughts ?
Thanks!
While this question is rather old, it is a reasonable question and deserves an answer.
No data was provided so I will generate an arbitrary example.
library(igraph)
set.seed(1234)
G = erdos.renyi.game(20, 0.25)
V(G)$Group1 = sample(3,20, replace=TRUE)
plot(G, vertex.color=rainbow(3, alpha=0.4)[V(G)$Group1])
Without doing anything, the Group is ignored.
Now, we need to create a layout that will plot nodes
in the same group close together. We can do this by creating
a graph with the same nodes, but with additional links between
nodes in the same group. The within-group links will be given
a high weight and the original links will be given a small weight.
This will cluster nodes in the same group. We then apply the
layout to plotting the original graph, without the extra links.
They were just to get a good layout.
G_Grouped = G
E(G_Grouped)$weight = 1
## Add edges with high weight between all nodes in the same group
for(i in unique(V(G)$Group1)) {
  GroupV = which(V(G)$Group1 == i)
  ## combn() treats a single number n as 1:n, so skip groups with one node
  if(length(GroupV) > 1)
    G_Grouped = add_edges(G_Grouped, combn(GroupV, 2), attr=list(weight=5))
}
## Now create a layout based on G_Grouped
set.seed(567)
LO = layout_with_fr(G_Grouped)
## Use the layout to plot the original graph
plot(G, vertex.color=rainbow(3, alpha=0.4)[V(G)$Group1], layout=LO)
If you want to go beyond this to have multiple levels of grouping, just add additional links with appropriate weights to connect the subgroups too.
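A sketch of the two-level case, under stated assumptions: Group2 is a made-up second attribute standing in for something like "Org.Of" nested within "Master.Org", the helper add_group_edges is hypothetical, and the weights 5 and 10 are arbitrary choices that merely make subgroup links stronger than group links. sample_gnp is used here as the current name for the random graph generator.

```r
library(igraph)

set.seed(1234)
G <- sample_gnp(20, 0.25)
V(G)$Group1 <- sample(3, 20, replace = TRUE)  # top-level group
V(G)$Group2 <- sample(2, 20, replace = TRUE)  # hypothetical subgroup within Group1

# Hypothetical helper: link all pairs of vertices that share a membership value
add_group_edges <- function(g, membership, weight) {
  membership <- as.character(membership)
  for (m in unique(membership)) {
    members <- which(membership == m)
    # combn() needs at least two members to form pairs
    if (length(members) > 1) {
      g <- add_edges(g, combn(members, 2), attr = list(weight = weight))
    }
  }
  g
}

G_Grouped <- G
E(G_Grouped)$weight <- 1

# Weaker links within the top-level group, stronger links within the subgroup
G_Grouped <- add_group_edges(G_Grouped, V(G)$Group1, weight = 5)
G_Grouped <- add_group_edges(G_Grouped, paste(V(G)$Group1, V(G)$Group2), weight = 10)

set.seed(567)
LO <- layout_with_fr(G_Grouped)

# Colour = top-level group, shape = subgroup; only the original edges are drawn
plot(G, layout = LO,
     vertex.color = rainbow(3, alpha = 0.4)[V(G)$Group1],
     vertex.shape = c("circle", "square")[V(G)$Group2])
```

How tightly the groups separate depends on the weights, so they may need tuning for a real network.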

How to get the color coded plotted areas in images using R?

Hi R expert of the world,
Assume I have a point pattern that generates an intensity map, and that this map is color coded into 3 regions in a pixel image... how can I get the area of each color-coded region?
Here is an example using spatstat:
library(spatstat)
japanesepines
Z <- density(japanesepines); plot(Z) # ---> I create a density map
b <- quantile(Z, probs = (0:3)/3) # ---> I "reduce it" to 3 color-coded zones
Zcut <- cut(Z, breaks = b, labels = 1:3); plot(Zcut)
class(Zcut) # ---> and Zcut is my resultant image ("im")
Thank you in advance
Sacc
In your specific example it is very easy to calculate the area because you used quantile to cut the image: This effectively divides the image into areas of equal size, so there should be three areas of size 1/3 since the window is a unit square. In general to calculate areas from a factor valued image you could use as.tess and tile.areas (continuing your example):
Ztess <- as.tess(Zcut)
tile.areas(Ztess)
In this case the areas are 0.333313, which must be due to discretization.
I'm not exactly sure what you're after, but you can count up the number of pixels in each color using the table() function.
table(Zcut[[1]])
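Pixel counts can be turned into areas by multiplying by the area of one pixel, which an im object stores as xstep and ystep. The sketch below repeats the example so that it runs on its own; the names pixel.area, areas.from.counts and areas.from.tess are illustrative.

```r
library(spatstat)

Z <- density(japanesepines)
b <- quantile(Z, probs = (0:3)/3)
Zcut <- cut(Z, breaks = b, labels = 1:3)

# Area covered by a single pixel
pixel.area <- Z$xstep * Z$ystep

# Number of pixels per level, converted to area
areas.from.counts <- table(Zcut$v) * pixel.area
print(areas.from.counts)

# The same quantity via the tessellation route, for comparison
areas.from.tess <- tile.areas(as.tess(Zcut))
print(areas.from.tess)
```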

How to generate medoid plots

Hi, I am using the partitioning around medoids algorithm for clustering, via the pam function in the cluster package. My dataset has 4 attributes, and clustering on them gives around 6 clusters. I want to generate a plot of these clusters across those 4 attributes, like this one:
[1]: http://www.flickr.com/photos/52099123@N06/7036003411/in/photostream/lightbox/ "Centroid plot"
But the only ways I can draw the clustering result are either a dendrogram or the
plot(data, col = result$clustering) command, which generates a plot similar to this one:
[2]: http://www.flickr.com/photos/52099123@N06/7036003777/in/photostream "pam results"
Although the first image is a centroid plot, I am wondering whether there are any tools in R to do the same with a medoid plot. Note that it also prints the size of each cluster in the plot. It would be great to know whether any R packages or solutions make this possible or, if not, what a good starting point would be for producing plots similar to image 1.
Thanks
Hi all, I tried to work through the problem the way joran suggested, but I think I did not understand it correctly and have not done it the way it is supposed to be done. Anyway, this is what I have done so far. The file that I tried to cluster looks like this:
geneID RPKM-base RPKM-1cm RPKM+4cm RPKMtip
GRMZM2G181227 3.412444267 3.16437442 1.287909035 0.037320722
GRMZM2G146885 14.17287135 11.3577013 2.778514642 2.226818648
GRMZM2G139463 6.866752401 5.373925806 1.388843962 1.062745344
GRMZM2G015295 1349.446347 447.4635291 29.43627879 29.2643755
GRMZM2G111909 47.95903081 27.5256729 1.656555758 0.949824883
GRMZM2G078097 4.433627458 0.928492841 0.063329249 0.034255945
GRMZM2G450498 36.15941083 9.45235616 0.700105077 0.194759794
GRMZM2G413652 25.06985426 15.91342458 5.372151214 3.618914949
GRMZM2G090087 21.00891969 18.02318412 17.49531186 10.74302155
The following is the pam clustering output:
GRMZM2G181227     1
GRMZM2G146885     2
GRMZM2G139463     2
GRMZM2G015295     2
GRMZM2G111909     2
GRMZM2G078097     3
GRMZM2G450498     3
GRMZM2G413652     2
GRMZM2G090087     2
AC217811.3_FG003  2
Using the above two files, I generated a third file that looks like this and holds the cluster assignment as a cluster type (K1, K2, etc.):
geneID RPKM-base RPKM-1cm RPKM+4cm RPKMtip Cluster_type
GRMZM2G181227 3.412444267 3.16437442 1.287909035 0.037320722 K1
GRMZM2G146885 14.17287135 11.3577013 2.778514642 2.226818648 K2
GRMZM2G139463 6.866752401 5.373925806 1.388843962 1.062745344 K2
GRMZM2G015295 1349.446347 447.4635291 29.43627879 29.2643755 K2
GRMZM2G111909 47.95903081 27.5256729 1.656555758 0.949824883 K2
GRMZM2G078097 4.433627458 0.928492841 0.063329249 0.034255945 K3
GRMZM2G450498 36.15941083 9.45235616 0.700105077 0.194759794 K3
GRMZM2G413652 25.06985426 15.91342458 5.372151214 3.618914949 K2
GRMZM2G090087 21.00891969 18.02318412 17.49531186 10.74302155 K2
I don't think this is the file that joran wanted me to create, but I could not think of anything else, so I ran lattice on the above file using the following code.
library(lattice)
clusres <- read.table("clusinput.txt", header=TRUE, sep="\t")
jpeg(filename = "clusplot.jpeg", width = 800, height = 1078,
pointsize = 12, quality = 100, bg = "white", res = 100)
## parallel() was renamed parallelplot() in later lattice versions
parallelplot(~clusres[2:5] | Cluster_type, clusres, horizontal.axis=FALSE)
dev.off()
and I get a picture like this
This output is wrong, since I want one single line representing the whole cluster at the four different points. I also tried playing with lattice, but I cannot figure out how to make it accept the RPKM values as the X coordinate. It always seems to plot many lines against a maximum or minimum value on the Y axis, and I don't understand what that value is.
It would be great if anybody could help me out. Sorry if my question still seems absurd to you.
I do not know of any pre-built functions that generate the plot you indicate, which looks to me like a sort of parallel coordinates plot.
But generating such a plot would be a fairly trivial exercise.
Add a column of cluster labels (K1,K2, etc.) to your original data set, based on your clustering algorithm's output.
Use one of the many, many tools in R for aggregating data (plyr, aggregate, etc.) to calculate the relevant summary statistics by cluster on each of the four variables. (You haven't said what the first graph is actually plotting. Mean and sd? Median and MAD?)
Since you want the plots split into six separate panels, or facets, you will probably want to plot the data using either ggplot or lattice, both of which provide excellent support for creating the same plot, split across a single grouping vector (i.e. the clusters in your case).
But that's about as specific as anyone can get, given that you've provided so little information (i.e. no minimal runnable example).
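The steps above can be sketched with the nine labelled rows from the question. Several things here are assumptions rather than the asker's actual setup: the RPKM values are rounded to three decimals, the columns are renamed to valid R names (base, m1cm, p4cm, tip), the median is picked as the summary statistic because the question does not say what the first graph plots, and a log scale is used only because one gene dwarfs the others.

```r
library(lattice)

# Rows copied from the question (rounded), with the cluster labels already added
clusres <- data.frame(
  base = c(3.412, 14.173, 6.867, 1349.446, 47.959, 4.434, 36.159, 25.070, 21.009),
  m1cm = c(3.164, 11.358, 5.374, 447.464, 27.526, 0.928, 9.452, 15.913, 18.023),
  p4cm = c(1.288, 2.779, 1.389, 29.436, 1.657, 0.063, 0.700, 5.372, 17.495),
  tip  = c(0.037, 2.227, 1.063, 29.264, 0.950, 0.034, 0.195, 3.619, 10.743),
  Cluster_type = c("K1", "K2", "K2", "K2", "K2", "K3", "K3", "K2", "K2")
)
stages <- c("base", "m1cm", "p4cm", "tip")

# One summary row per cluster (median chosen as an assumption)
centres <- aggregate(clusres[stages],
                     by = list(cluster = clusres$Cluster_type),
                     FUN = median)
sizes <- table(clusres$Cluster_type)

# Long format: one row per cluster and stage
long <- data.frame(
  cluster = rep(as.character(centres$cluster), times = length(stages)),
  stage   = factor(rep(stages, each = nrow(centres)), levels = stages),
  value   = unlist(centres[stages], use.names = FALSE)
)

# Panel label carries the cluster size, as in the target plot
long$panel <- factor(paste0(long$cluster, " (n = ", sizes[long$cluster], ")"))

# One panel per cluster, one line per panel
p <- xyplot(value ~ stage | panel, data = long, type = "b",
            scales = list(y = list(log = 10)),
            xlab = "Sample", ylab = "Median RPKM")
print(p)
```

Swapping FUN = median for mean, or adding a second aggregate() call for a spread measure, follows the same pattern.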
How about using clusplot from package cluster with partitioning around medoids? Here is a simple example (from the example section):
require(cluster)
#generate 25 objects, divided into 2 clusters.
x <- rbind(cbind(rnorm(10,0,0.5), rnorm(10,0,0.5)),
cbind(rnorm(15,5,0.5), rnorm(15,5,0.5)))
clusplot(pam(x, 2)) # `pam` does your partitioning
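If a profile of the medoids themselves is wanted rather than clusplot's projection, the pam result already holds them, along with the cluster sizes. The following is a rough sketch on the same simulated data; the seed, the axis labels and the use of matplot are illustrative choices, not part of the cluster package's example.

```r
library(cluster)

set.seed(42)
# 25 objects in 2 well-separated clusters, as in the example above
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))

fit <- pam(x, 2)

medoids <- fit$medoids            # one row per cluster, one column per variable
sizes <- fit$clusinfo[, "size"]   # number of observations in each cluster

# One line per medoid across the variables
matplot(t(medoids), type = "b", pch = 19, lty = 1, xaxt = "n",
        xlab = "Variable", ylab = "Medoid value")
axis(1, at = seq_len(ncol(medoids)),
     labels = paste0("V", seq_len(ncol(medoids))))
legend("topleft",
       legend = paste0("cluster ", seq_along(sizes), " (n = ", sizes, ")"),
       col = seq_along(sizes), lty = 1, pch = 19)
```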

Resources