I'm constructing large directed graphs (using igraph, from R) and have discovered a strange issue in which vertices are apparently duplicated for certain vertex names. This issue doesn't happen in small graphs, and it only appears to arise once the vertex names reach 1e+05. There is a clear regularity to the vertices that get duplicated. To jump ahead, the vertex duplication looks like this (generated in section 2 of the code below):
name_vertex id_cell id_vertex
1: 100000 100000 97355
2: 1e+05 100000 1435205
3: 200000 200000 197106
4: 2e+05 200000 1435206
5: 400000 400000 396605
6: 4e+05 400000 1435207
7: 500000 500000 496356
8: 5e+05 500000 1435208
9: 700000 700000 695855
10: 7e+05 700000 1435209
11: 800000 800000 795606
12: 8e+05 800000 1435210
13: 1000000 1000000 995105
14: 1e+06 1000000 1435211
The duplication starts at 1e+05: duplicates are generated for that vertex and for every subsequent vertex of the form xe+0n, where x is in 1:9 and n >= 5. (In this graph there are no 3e+05, 6e+05 or 9e+05 vertices, because by construction those cells lie on the matrix margin, which is why they are absent from the table.)
All x0.. versions of the vertices hold the outgoing edges, while the xe+0.. versions hold the incoming edges.
Reproducible example:
(note: the way in which I generate the adjacency dataframe owes more to the pipeline I've been using to generate graphs for my use case. The issue could presumably be generated more directly).
The below code generates a matrix, identifies each cell's adjacencies and then constructs a graph from these. Cells at the matrix margin are assigned 0 values to remove them from the adjacency table (to prevent wrapping round the edges).
There are three sections:
(1) running for matrix dimension 100x100: correct behaviour
(2) running for matrix dimension 1200x1200: duplication
(3) unpacking the duplication issue
NOTE: it takes 30 seconds or so and 3-4GB RAM to produce the graph in (2)
# packages
library(data.table); library(igraph)
# function to get adjacent cells in a matrix
get_adjacent <- function(cells, n_row, n_col) {
adjacencies_i <- c(cells-n_row - 1,
cells-n_row,
cells-n_row+1,
cells-1,
cells+1,
cells+n_row-1,
cells+n_row,
cells+n_row+1)
return(adjacencies_i)
}
# function to get the margins of a matrix (i.e. 1-deep outer margin of cells)
get_margins <- function(matrix) {
dims <- dim(matrix)
bottom_right <- prod(dims)
top_right <- (bottom_right - dims[1])
c(1:dims[1], # first column
top_right:bottom_right, # last column
seq(1, top_right, dims[1]), # top row
seq(dims[1], bottom_right, dims[1])) # bottom row
}
# (1) Before creating the failure case, produce a much smaller graph that
# has the correct behaviour
# generate a matrix of 1-valued cells
test_mat <- matrix(1, ncol=100, nrow=100)
# remove edge cells to prevent the adjacencies wrapping around the edges
test_mat[get_margins(test_mat)] <- 0
# plot: all black cells are those that should be represented in the graph, and
# each of these cells should each be linked to their immediately adjacent neighbours
# (including diagonals - see get_adjacent function)
image(test_mat, asp=1, col=c("red", "black"))
# calculate the adjacency dataframe to calculate a graph from
permitted_cells <- which(test_mat[] == 1)
n_row <- dim(test_mat)[1]
n_col <- dim(test_mat)[2]
# full set of adjacencies
adj <- data.table(from = rep(permitted_cells, (1*2 + 1)^2 - 1),
to = get_adjacent(permitted_cells, n_row, n_col))
# remove those that are 0-valued
adj_permitted <- adj[to %in% permitted_cells,]
# calculate graph
g <- graph_from_data_frame(adj_permitted[,list(from, to)], directed = T)
# get vertex names
vertex_names <- names(V(g))
graph_vertices <- data.table(name_vertex = vertex_names,
id_cell = as.integer(vertex_names),
id_vertex = 1:length(vertex_names))
setorder(graph_vertices, id_cell)
# looks good: same number of vertices in graph as there are 1-valued cells in the
# original matrix
print(paste0("n_vertices: ", nrow(graph_vertices)))
print(paste0("n_cells: ", sum(test_mat)))
## (2) failure case. Code is identical to the above, save for the dimensions of
## the matrix being much larger (1200 rather than 100), and the image() function
## is commented out.
# generate a matrix of 1-valued cells
test_mat <- matrix(1, ncol=1200, nrow=1200)
# remove edge cells to prevent the adjacencies wrapping around the edges
test_mat[get_margins(test_mat)] <- 0
# plot: all black cells are those that should be represented in the graph, and
# each of these cells should each be linked to their immediately adjacent neighbours
# (including diagonals - see get_adjacent function)
# image(test_mat, asp=1, col=c("red", "black"))
# calculate the adjacency dataframe to calculate a graph from
permitted_cells <- which(test_mat[] == 1)
n_row <- dim(test_mat)[1]
n_col <- dim(test_mat)[2]
# full set of adjacencies
adj <- data.table(from = rep(permitted_cells, (1*2 + 1)^2 - 1),
to = get_adjacent(permitted_cells, n_row, n_col))
# remove those that are 0-valued
adj_permitted <- adj[to %in% permitted_cells,]
# calculate graph
g <- graph_from_data_frame(adj_permitted[,list(from, to)], directed = T)
# get vertex names
vertex_names <- names(V(g))
graph_vertices <- data.table(name_vertex = vertex_names,
id_cell = as.integer(vertex_names),
id_vertex = 1:length(vertex_names))
setorder(graph_vertices, id_cell)
# there are 7 more vertices than there are 1-valued cells
print(paste0("n_vertices: ", nrow(graph_vertices)))
print(paste0("n_cells: ", sum(test_mat)))
print(paste0("n_extra_vertices: ", nrow(graph_vertices) - sum(test_mat)))
# (3) What are these extra vertices?
# get duplicated vertices
duplicated_vertices <-
graph_vertices[id_cell %in% graph_vertices[duplicated(id_cell),id_cell]]
setorder(duplicated_vertices, id_cell, id_vertex)
# the 7 additional vertices arise through duplication
nrow(duplicated_vertices)
print(duplicated_vertices)
# xe+.. version has the incoming edges
incoming <- adjacent_vertices(g, duplicated_vertices$id_vertex, mode="in")
incoming[unlist(lapply(incoming, function(x) length(x) != 0))]
# x0.. version has outgoing edges
outgoing <- adjacent_vertices(g, duplicated_vertices$id_vertex, mode="out")
outgoing[unlist(lapply(outgoing, function(x) length(x) != 0))]
To (finally) get to the question: what is going on here? Is there something I can do to prevent this behaviour? My current workaround is to take the incoming edges received by the xe+0.. version, add the same edges to the x0.. version, and then delete the xe+0.. version of the vertex.
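The problem is that the two forms 100000 and 1e+05 end up as two different vertex names: some of the numeric IDs are formatted in e notation when they are converted to character, and igraph treats "100000" and "1e+05" as distinct strings. I managed to resolve it by adding the statement options(scipen=99) at the start of the script, which stops R from using e notation.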
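A minimal base-R sketch of what appears to be happening (no igraph needed; the variable names are illustrative only). In the question's code, which() returns integers, so the from column is integer, while get_adjacent() adds and subtracts double literals such as 1, so the to column is double. When both are turned into character vertex names (which graph_from_data_frame() presumably does column by column), the integer 100000 becomes "100000" but the double 100000 becomes "1e+05". That would also explain why the x0.. versions hold only outgoing edges and the xe+0.. versions only incoming ones.

```r
# Illustrative values only: the same cell number stored two ways
from_id <- 100000L       # integer, like the output of which()
to_id   <- 99999L + 1    # integer + double literal gives a double

from_id == to_id         # TRUE: numerically the same cell
as.character(from_id)    # "100000"
as.character(to_id)      # "1e+05" with the default scipen = 0

# One possible fix that does not depend on options(scipen):
# keep both ID columns integer before building the graph
to_fixed <- as.integer(to_id)
as.character(to_fixed)   # "100000"
```

In the question's code that would mean something like adj[, to := as.integer(to)] before calling graph_from_data_frame(), or writing the offsets in get_adjacent() with integer literals (1L); both are untested suggestions rather than a confirmed igraph recommendation.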
Related
I would like to remove the pixels that form a large cluster and keep only the small clusters for analysis (i.e. get the number of pixels and their locations). First I set all pixels with a value lower than 0.66 to 0. Then I use the function clump() in R. The model works, but I cannot remove only the large cluster. I do not understand how the clump function works.
Initial and result images (not reproduced here): plot_r is the image after the pixels with value < 0.66 are changed to 0, and plot_rc is the result of the clump() function. As you can see, I cannot remove only the large cluster of pixels (at the top of plot_r). Changing the threshold (700 in the code) does not help. How can I do this?
Here is the code:
library(magick)
library(pixmap)
library(raster)
library(igraph)
f <- "https://i.stack.imgur.com/2CjCh.jpg"
x <- image_read(f)
x <- image_convert(x, format = "pgm", depth = 8)
# Save the PGM file
f <- tempfile(fileext = ".pgm")
image_write(x, path = f, format = "pgm")
# Read in the PGM file
picture <- read.pnm(file = f, cellres = 1)
str(picture)
picture@size
mat <- picture@grey
mat[mat<0.66] <- 0; x
##############################################################
##Remove clumps of pixels in R using package Raster and igraph
#Detect clumps (patches) of connected cells
r <-raster(mat)
rc <- clump(r)
#extract IDs of clumps according to some criteria
clump9 = data.frame(freq(rc))
#remove clump observations with frequency smaller/larger than N
clump9 = clump9[ ! clump9$count > 700, ]
# record IDs from clumps which met the criteria in previous step
clump9 = as.vector(clump9$value)
#replace cells with IDs which do not belong to the group of interest
rc[rc != clump9[1] & rc != clump9[2]] = NA
# converting rasterlayer to matrix
n <- as.matrix(r)
m <- as.matrix(rc)
Perhaps something like this
library(raster)
library(igraph)
Short-cutting your approach a bit
f <- "https://i.stack.imgur.com/2CjCh.jpg"
b <- brick(f)
x <- sum(b)
r <- x > 450
rc <- clump(r)
f <- freq(rc, useNA="no")
Replace each clump with the number of cells it consists of, then set the large ones (here, those with more than 100 cells) to NA, and use the result to mask the original raster:
rs <- subs(rc, data.frame(f))
rsc <- reclassify(rs, cbind(100,Inf,NA))
m <- mask(b, rsc)
plotRGB(m)
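The same size-based filtering can be sketched in base R on a small, made-up clump-ID matrix (the matrix and max_cells are hypothetical). It also shows one way around the question's rc[rc != clump9[1] & rc != clump9[2]] = NA line, which only keeps the first two small clumps: an %in% test keeps every clump ID that passes the size criterion.

```r
# Made-up clump-ID matrix, as clump() might return it (NA = background)
rc <- matrix(c(1,  1,  NA, 2,
               1,  1,  NA, NA,
               1,  NA, 3,  3), nrow = 3, byrow = TRUE)
max_cells <- 2                         # hypothetical size threshold

counts <- table(rc)                    # cells per clump ID; NA is not counted
counts                                 # ID 1: 5 cells, ID 2: 1 cell, ID 3: 2 cells
small_ids <- as.numeric(names(counts)[counts <= max_cells])

small <- rc
small[!(rc %in% small_ids)] <- NA      # drop background and every large clump
small
```

Whether the same %in% test can be applied directly to the values of a RasterLayer (e.g. through rc[]) is an untested assumption; the subs()/reclassify() route above avoids the question.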
I have a set of 6 colours (x), a set of N individuals that each need to be labelled with a unique colour code, and four locations on each animal, each of which can carry a different colour.
So, the codes for two individuals might be:
1. red, blue, blue, white
2. white, yellow, pink, yellow
However, as the colour at each position can fall off, I would like to generate a redundant labelling scheme that would still allow an individual to be distinguished from the others, even after it loses the colour at one (or even two?) locations.
Even though 6 colours and 4 positions give 6^4 = 1296 combinations, I am finding it difficult to select the N most dissimilar combinations:
Reproducible example:
library(gtools)
x <- c("white", "red", "green", "blue", "pink", "yellow")
Perms <- permutations(n=6,r=4,v=x,repeats.allowed=T)
print(nrow(Perms))
head(Perms)
Note that the first 6 combinations differ in the colour at only one position: if more than one of these individuals lost the mark at that position, they could no longer be distinguished!
So, for values of N between 50 and 150, how can I select the N most dissimilar combinations?
Thanks !
I cannot conclusively answer your question, but I have an idea that might help you.
Build string codes with the first letter of each color:
library(gtools)
x <- c("w", "r", "g", "b", "p", "y")
Perms <- permutations(n=6,r=4,v=x,repeats.allowed=T)
m <- apply(Perms, 1, paste, collapse = "")
> head(m)
[1] "bbbb" "bbbg" "bbbp" "bbbr" "bbbw" "bbby"
Sample n codes:
set.seed(1)
n <- 50
y <- sample(m, n)
Create an n x n matrix of Levenshtein distances:
library(vwr)
lvmat <- sapply(y, function(x) levenshtein.distance(x, y))
> lvmat[1:5, 1:5]
grrp pgpg rprr yprw gggp
grrp 0 4 3 3 2
pgpg 4 0 4 4 3
rprr 3 4 0 2 4
yprw 3 4 2 0 4
gggp 2 3 4 4 0
Now you could maximize sum(lvmat), maybe via bootstrapping or whatever floats your boat, to get the sample of most dissimilar combinations.
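Because every code has the same length (4 positions), the number of positions at which two codes differ (the Hamming distance) may match the lost-mark problem more directly than the Levenshtein distance, which also allows insertions and deletions and can therefore be smaller. A base-R sketch; hamming_matrix() is a hypothetical helper, and utils::adist() (base R's generalized Levenshtein distance) is used here in place of vwr:

```r
# Hypothetical helper: pairwise Hamming distances between equal-length codes
hamming_matrix <- function(codes) {
  chars <- do.call(rbind, strsplit(codes, ""))   # one row per code, one column per position
  d <- Reduce(`+`, lapply(seq_len(ncol(chars)),
                          function(k) outer(chars[, k], chars[, k], "!=")))
  dimnames(d) <- list(codes, codes)
  d
}

codes <- c("bgbg", "gbgb", "bgbb")
h <- hamming_matrix(codes)
h                                 # "bgbg" vs "gbgb": 4, "bgbg" vs "bgbb": 1
utils::adist("bgbg", "gbgb")      # Levenshtein distance: 2 (one deletion + one insertion)
```

For a candidate set of codes, requiring min(h[upper.tri(h)]) >= 2 means every code stays unique after any single mark is lost; >= 3 covers the loss of two marks.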
Reproducible example of LAP's suggestion above.
Note that, due to the reliance on random sampling, this still does not guarantee that there will be no code pairs that differ at only one position. Still, it's a good start. Thanks, LAP!
# install.packages("gtools")
library(gtools)
library(vwr)
## Available colours
x <- c("W", "R", "G", "B", "P", "Y")
## Generate all possible colour combinations, for 6 colours & 4 positions
body <- data.frame(permutations(n=6,r=4,v=x,repeats.allowed=T), stringsAsFactors = F) ; colnames(body) <- c("Head","Thorax","L_gaster","R_gaster")
## concatenate each colour-code to a sequence without spaces, etc
m <- paste( body$Head, body$Thorax, body$L_gaster, body$R_gaster, sep="")
##
set.seed(1)
COLONY_SIZE <- 50 ## How many adult workers in the colony excluding the queen
N_Attempts <- 1000 ## How many alternative solutions to generate - the more the better, but it takes longer
## prepare data-containers
Summary <- NULL
LvList <- list()
for (TRY in 1:N_Attempts)
{print(paste(TRY,"of",N_Attempts))
y <- sample(m, COLONY_SIZE) ## randomly sample COLONY_SIZE codes
## measure pairwise Levenshtein distances for all pair combinations
Matrix <- sapply(y, function(x) levenshtein.distance(x, y))
diag(Matrix) <- NA ## eliminate self-self measure (distance = 0)
Matrix[lower.tri(Matrix)] <- NA ## dist i-j = dist j-i
## store solution
LvList[[TRY]] <- Matrix
## summarize each solution using three metrics:
## (i) the average pair distance (higher is better)
## (ii) the number of 'close' code pairs (those with the minimum distance of 1 - lower is better)
## (iii) the maximum number of 'close' pairs involving any single code (lower is better)
Summary <- rbind(Summary, data.frame(Mean_Distance = mean(Matrix, na.rm=T),
N_close_pairs = sum(Matrix[!is.na(Matrix)]==1),
N_close_pairs_per_ant = max(rowSums( Matrix==1, na.rm=T)) ))
}
## ***Find the solution with the fewest pairs with the lowest distance***
Summary$Mean_Distance_Rank <- rank(Summary$Mean_Distance)
Summary$N_close_pairs_Rank <- rank(-Summary$N_close_pairs)
Summary$N_close_pairs_per_ant_Rank <- rank(-Summary$N_close_pairs_per_ant)
Summary$Rank_Total <- Summary$Mean_Distance_Rank + Summary$N_close_pairs_Rank + Summary$N_close_pairs_per_ant_Rank
solution <- rownames( LvList[[which.max(Summary$Rank_Total)]] )
## Highlight candidate solutions
Colour <- rep(rgb(0,0,0,0.1), nrow(Summary)) ## translucent black for all solutions
Colour [which.max(Summary$Rank_Total) ] <- "red"
pairs(Summary[,c("Mean_Distance","N_close_pairs","N_close_pairs_per_ant")], col=Colour, bg=Colour, pch=21, cex=1.4)
## format into a table
SOLUTION <- data.frame(Code=1:COLONY_SIZE, t(as.data.frame(sapply(solution, strsplit, ""))))
colnames(SOLUTION)[2:5] <- c("Head","Thorax","L_gaster","R_gaster")
Here's a better approach that does not rely on blind sampling: it represents the similarity between each pair of codes as an edge in a network, and then uses the igraph function largest_ivs to search for the largest sets of mutually dissimilar codes:
rm(list=ls())
library(gtools)
library(igraph)
##
outputfolder <- "XXXXXXXXXX"
dir.create(outputfolder,showWarnings = F)
setwd(outputfolder)
## Available colours
x <- c("W", "R", "G", "B", "P", "Y")
## Generate all possible colour combinations, for 6 colours & 4 positions
body <- data.frame(permutations(n=6,r=4,v=x,repeats.allowed=T), stringsAsFactors = F) ; colnames(body) <- c("Head","Thorax","L_gaster","R_gaster")
write.table(body,file="Paint_marks_full_list.txt",col.names=T,row.names=F,quote=F,append=F)
## Generate edge list
edge_list <- data.frame(comb_1=character(),comb_2=character(),similarity=character())
if (!file.exists("Edge_list.txt")){
write.table(edge_list,file="Edge_list.txt",col.names=T,row.names=F,quote=F,append=F)
}else{
edge_list <- read.table("Edge_list.txt",header=T,stringsAsFactors = F)
}
## default starting point for a fresh run (no edges written yet)
last_i <- 1; last_j <- 1
if (nrow(edge_list)>0){
last_i <- edge_list[nrow(edge_list),"comb_1"]
last_j <- edge_list[nrow(edge_list),"comb_2"]
}
if (!(last_i==(nrow(body)-1)&last_j==nrow(body))){
for (i in last_i:(nrow(body)-1)){
print(paste("Combination",i))
for (j in (i+1):nrow(body)){
if (i>last_i|j>last_j){
simil <- length(which(body[i,]==body[j,]))
if (simil>0){
write.table(data.frame(comb_1=i,comb_2=j,similarity=simil),file="Edge_list.txt",col.names=F,row.names=F,quote=F,append=T)
}
}
}
}
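}
## reload the complete edge list (it was only read before the loop)
edge_list <- read.table("Edge_list.txt",header=T,stringsAsFactors = F)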
######let's make 3 graphs with edges representing overlap between combinations ###
##First graph, in which ANY overlap between two combinations is seen as an edge. Will be used to produce list of paint combination with no overlap
net1 <- graph.data.frame(edge_list[c("comb_1","comb_2")],directed=F)
##Second graph, in which only overlaps of 2 or more spots is seen as an edge. Will be used to produce list of paint combinations with no more than 1 spot in common
net2 <- graph.data.frame(edge_list[which(edge_list$similarity>=2),c("comb_1","comb_2")],directed=F)
##Third graph, in which only overlaps of 3 or more spots is seen as an edge. Will be used to produce list of paint combinations with no more than 2 spots in common
net3 <- graph.data.frame(edge_list[which(edge_list$similarity>=3),c("comb_1","comb_2")],directed=F)
#######Now let's use the ivs function to get independent vertex sets, i.e., set of vertices with no connections between any of them
no_overlap_list <- largest_ivs(net1)
max_one_spot_overlap_list <- largest_ivs(net2)
max_two_spots_overlap_list <- largest_ivs(net3)
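If a guaranteed criterion is preferred over sampling or an exhaustive search, a standard coding-theory construction (a single parity-check code, which is not part of the answers here) gives codes that share at most 2 of the 4 positions, i.e. exactly the property net3 is meant to capture: choose the first three colours freely and set the fourth to (c1 + c2 + c3) mod 6. This yields 216 codes, and 216 is also the maximum for this criterion, because deleting any one position must leave all codes distinct and there are only 6^3 = 216 three-colour strings. A base-R sketch; the colour order and the use of R_gaster as the check position are arbitrary assumptions:

```r
cols <- c("W", "R", "G", "B", "P", "Y")          # colour index 0..5 in this order
grid <- expand.grid(c1 = 0:5, c2 = 0:5, c3 = 0:5)
grid$c4 <- (grid$c1 + grid$c2 + grid$c3) %% 6    # check colour (assumed at R_gaster)
idx <- as.matrix(grid)                            # 216 x 4 colour indices

codes <- matrix(cols[as.vector(idx) + 1], ncol = 4)
colnames(codes) <- c("Head", "Thorax", "L_gaster", "R_gaster")

# Number of positions shared by every pair of codes
shared <- Reduce(`+`, lapply(1:4, function(k) outer(idx[, k], idx[, k], "==")))
diag(shared) <- NA
max(shared, na.rm = TRUE)    # 2: no two codes share 3 or more positions
```

Any N of the 216 rows (for N between 50 and 150) then keeps every individual identifiable after the loss of one mark. Tolerating the loss of two marks needs codes sharing at most 1 position, which this construction does not provide.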
I have a problem that sounds easy, but I really cannot find the mistake. I have 3377 data points (measurements of body temperature). The sampling interval is 5 min and I would like to put the data into a matrix. However, R starts recycling once it has put all 3377 data points into the matrix. To prevent R from doing this, I wrote a loop that should stop when the end of the vector is reached.
Ankle.r <- 1:3377 # Example data
a = 288 # sampling rate = 5min -> 288 measurement points per day
c = 11 # 11 full days of sampling (plus a few more points, which is why the matrix has 12 rows)
Ankle.r2 <- matrix(NA, ncol = a, nrow = c+1) # matrix with NAs for 12 days with 288 cols each (=3456 cells)
x <- length (Ankle.r) # total number of data points, is 3377
for (f in 1:(c+1)){ # for each row
for (p in 1:a){ # for each column (i.e. cell)
st_op <- (((f-1)*p)+p) # STOP criterion, gives the number of cells that have already been filled
if (st_op<x){ # only perform operation if the number of cells filled is < the number of data points (i.e. 3377)
Ankle.r2[f,p] <- Ankle.r[(((f-1)*p)+p)]
} else {stop
}
}
}
However, the loop does not stop...it loops till the last cell in my matrix. According to my calculations, the last 79 cells should remain free (i.e. NA, because 3456 cells - 3377 = 79), but that is only true for the last 8 or so...
Any hints where the mistake is?
Thanks!
I think this does what you would like to do:
Ankle.r <- 1:3377 # Example data
a = 288 # sampling rate = 5min -> 288 measurement points per day
c = 11
length(Ankle.r) <- a * (c + 1) #pad input vector with NA values
m <- matrix(Ankle.r, ncol = a, byrow = TRUE)
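A quick check of the index arithmetic in base R. In the question's loop, ((f-1)*p)+p simplifies to f*p, so data points are repeated and skipped, and only the cells where 12*p >= 3377 (day 12, slots 282 to 288, i.e. 7 cells) stay NA, which matches the "last 8 or so" observation. The cell in row f and column p should instead hold data point (f-1)*a + p. The names n_days and idx() below are introduced for illustration.

```r
Ankle.r <- 1:3377   # example data, as in the question
a <- 288            # 5-min measurements per day
n_days <- 12        # hypothetical name: 11 full days plus a partial 12th day

# Hypothetical helper: data point that belongs in row f (day), column p (slot)
idx <- function(f, p) (f - 1) * a + p
idx(2, 1)           # 289: first measurement of day 2

# Loop-free equivalent of the answer above: pad with NA, then fill row by row
padded <- Ankle.r
length(padded) <- a * n_days
m <- matrix(padded, ncol = a, byrow = TRUE)
sum(is.na(m))       # 79 = 3456 - 3377
m[12, 209]          # 3377, the last data point
```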
OK, trying an example showed me where my mistake was (sigh): the offset for row f must be computed from the number of columns a, not from p. The loop must be:
Ankle.r2 <- matrix(NA, ncol = a, nrow = c+1) # matrix with NAs for 12 days with 288 cols each (=3456 cells)
x <- length (Ankle.r) # total number of data points, is 3377
for (f in 1:(c+1)){ # for each row (day)
  for (p in 1:a){ # for each column (5-min slot)
    st_op <- ((f-1)*a)+p # index of the data point belonging to cell [f, p]
    if (st_op<=x){ # only fill the cell while data points remain (x = 3377)
      Ankle.r2[f,p] <- Ankle.r[st_op]
    } else {
      break # a bare `stop` (without parentheses) does nothing; break leaves the inner loop
    }
  }
}
Thanks anyway!
Best,
Christine
I've got the following code:
df <- read.table(text='verkoop V621
verkoopcode V62123
verkoopcodenaam V6212355
verkoopdatum V621335
verkoopdatumchar V62133526
verkooppr V6216
verkoopprijs V62162
verkoopsafdeling V621213452
verkoopsartikel V62126324')
# use igraph package
require(igraph)
# create adjacency matrix
adj <- nchar(sapply(df$V1, gsub, x=df$V1, replacement=''))
adj[!sapply(df$V1, grepl, x=df$V1)] <- 0
# name adjecency matrix
colnames(adj) <- df$V2
# original graph
gr <- graph.adjacency(adj, mode='directed', weighted=TRUE)
layout(matrix(1:2, ncol=2))
plot(gr)
# minimum spanning tree
mst <- minimum.spanning.tree(gr)
shortest.paths(mst, to="V621", weights=rep(1, ecount(mst)))
This gives me the depth of every node in the tree. Now I want to determine which node comes directly before a specific node (its parent). For example, for 'verkoopdatumchar' I want to find 'verkoopdatum'.
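As a hedged alternative that bypasses the spanning tree: because the names form a prefix hierarchy, the node that comes directly before a given node is plausibly the longest other name that is a prefix of it. A base-R sketch under that assumption (find_parent() is a hypothetical helper, not an igraph function):

```r
nm <- c("verkoop", "verkoopcode", "verkoopcodenaam", "verkoopdatum",
        "verkoopdatumchar", "verkooppr", "verkoopprijs",
        "verkoopsafdeling", "verkoopsartikel")
codes <- c("V621", "V62123", "V6212355", "V621335", "V62133526",
           "V6216", "V62162", "V621213452", "V62126324")

# Hypothetical helper: parent = longest other name that is a proper prefix
# of `node` (NA for the root)
find_parent <- function(node, pool) {
  cand <- pool[pool != node & startsWith(node, pool)]
  if (length(cand) == 0) return(NA_character_)
  cand[which.max(nchar(cand))]
}

find_parent("verkoopdatumchar", nm)       # "verkoopdatum"
parents <- vapply(nm, find_parent, character(1), pool = nm)
codes[match(parents, nm)]                 # parent codes, NA for "verkoop"
```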
After struggling with this problem for a while, I am hoping to get some advice here. I am wondering if anyone is aware of an automated method for determining pairwise grouping labels based on significance. The question is independent of the significance test (e.g. Tukey for parametric or Mann-Whitney for non-parametric) - given these pairwise comparisons, some boxplot-type figures often represent these groupings with a sub-script:
I have done this example by hand, which can be quite tedious. I think that the order of labelling in the algorithm should be based on the number of levels in each group: groups containing a single level that is significantly different from all other levels should be labelled first, then groups containing 2 levels, then 3, and so on, all the while checking that each new grouping is actually needed and does not violate any differences.
In the example below, the tricky part is getting the algorithm to recognize that level 1 should be grouped with 3 and 5, but 3 and 5 should not be grouped (i.e. share a label).
Example code:
set.seed(1)
n <- 7
n2 <- 100
mu <- cumsum(runif(n, min=-3, max=3))
sigma <- runif(n, min=1, max=3)
dat <- vector(mode="list", n)
for(i in seq(dat)){
dat[[i]] <- rnorm(n2, mean=mu[i], sd=sigma[i])
}
df <- data.frame(group=as.factor(rep(seq(n), each=n2)), y=unlist(dat))
bp <- boxplot(y ~ group, df, notch=TRUE)
kr <- kruskal.test(y ~ group, df)
kr
mw <- pairwise.wilcox.test(df$y, df$group)
mw
mw$p.value > 0.05 # TRUE means that the levels are not significantly different at the p=0.05 level
# 1 2 3 4 5 6
#2 FALSE NA NA NA NA NA
#3 TRUE FALSE NA NA NA NA
#4 FALSE FALSE FALSE NA NA NA
#5 TRUE FALSE FALSE FALSE NA NA
#6 FALSE FALSE FALSE TRUE FALSE NA
#7 FALSE FALSE FALSE FALSE FALSE FALSE
text(x=1:n, y=bp$stats[4,], labels=c("AB", "C", "A", "D", "B", "D", "E"), col=1, cex=1.5, pos=3, font=2)
First let me restate the problem in the language of graph theory. Define a graph as follows. Each sample gives rise to a vertex that represents it. Between two vertices, there is an edge if and only if some test indicates that the samples represented by those vertices could not be distinguished statistically. In graph theory, a clique is a set of vertices such that, between every two vertices in the set, there is an edge. We're looking for a collection of cliques such that every edge in the graph belongs to (at least? exactly?) one of the cliques. We'd like to use as few cliques as possible. (This problem is called clique edge cover, not clique cover.) We then assign each clique its own letter and label its members with that letter. Each sample distinguishable from all others gets its own letter as well.
For example, the graph corresponding to your sample input could be drawn like this.
3---1---5 4--6
My proposed algorithm is the following. Construct the graph and use the Bron--Kerbosch algorithm to find all maximal cliques. For the graph above, these are {1, 3}, {1, 5}, and {4, 6}. The set {1}, for example, is a clique, but it is not maximal because it is a subset of the clique {1, 3}. The set {1, 3, 5} is not a clique because there is no edge between 3 and 5. In the graph
1
/ \
3---5 4--6,
the maximal cliques would be {1, 3, 5} and {4, 6}.
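The maximal-clique step can be sketched in base R with a basic (non-pivoting) Bron-Kerbosch recursion; bron_kerbosch() is a hypothetical helper, and the edge list is read off the TRUE entries of mw$p.value > 0.05 in the question. Giving every maximal clique its own letter is a valid labelling but not necessarily the smallest one (the search described next can prune it); for this graph it reproduces the hand-made labels.

```r
# Edges join groups that are NOT significantly different (from the question)
n <- 7
edges <- rbind(c(1, 3), c(1, 5), c(4, 6))
adj <- matrix(FALSE, n, n)
adj[edges] <- TRUE
adj[edges[, 2:1]] <- TRUE

# Hypothetical helper: basic Bron-Kerbosch (no pivoting), all maximal cliques
bron_kerbosch <- function(adj) {
  cliques <- list()
  bk <- function(R, P, X) {
    if (length(P) == 0 && length(X) == 0) {
      cliques[[length(cliques) + 1]] <<- sort(R)   # R is a maximal clique
      return(invisible(NULL))
    }
    for (v in P) {                  # iterates over a copy of the original P
      nb <- which(adj[v, ])
      bk(c(R, v), intersect(P, nb), intersect(X, nb))
      P <- setdiff(P, v)            # v has been fully explored
      X <- union(X, v)
    }
  }
  bk(integer(0), seq_len(nrow(adj)), integer(0))
  cliques
}

cl <- bron_kerbosch(adj)            # {1,3}, {1,5}, {2}, {4,6}, {7}

# One letter per clique; a group collects the letters of all its cliques
labs <- character(n)
for (k in seq_along(cl)) {
  labs[cl[[k]]] <- paste0(labs[cl[[k]]], letters[k])
}
labs                                # "ab" "c" "a" "d" "b" "d" "e"
```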
Now search recursively for a small clique edge cover. The input to our recursive function is a set of edges remaining to be covered and the list of maximal cliques. Find the least edge in the remaining set, where, e.g., edge (1,2) < (1,5) < (2,3) < (2,5) < (3,4). For each maximal clique that contains this edge, construct a candidate solution consisting of that clique and the output of a recursive call in which the clique's edges are removed from the set of remaining edges. Output the best candidate.
Unless there are very few edges, this may be too slow. The first performance improvement is memoization: maintain a map from inputs to outputs of the recursive function so that we can avoid doing the work twice. If that doesn't work, then R should have an interface to an integer program solver, and we can use integer programming to determine the best collection of cliques. (I'll explain this more if the other approach is insufficient.)
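The recursive search can be sketched in base R as follows (without the memoization). min_cover() is a hypothetical helper; edges is a two-column matrix of uncovered edges with i < j, kept sorted so that the first row is the least edge, and cliques is the list of maximal cliques. In the made-up graph below, the triangle {1, 2, 3} is a maximal clique whose edges are all covered by the three other cliques, so the smallest cover needs only three letters. Vertices with no edges are not handled here and would each still get their own letter.

```r
# Hypothetical helper: smallest list of cliques covering all `edges`
min_cover <- function(edges, cliques) {
  if (nrow(edges) == 0) return(list())       # nothing left to cover
  e <- edges[1, ]                            # least remaining edge
  best <- NULL
  for (cl in cliques) {
    if (all(e %in% cl)) {                    # branch on each clique containing e
      covered <- edges[, 1] %in% cl & edges[, 2] %in% cl
      cand <- c(list(cl), min_cover(edges[!covered, , drop = FALSE], cliques))
      if (is.null(best) || length(cand) < length(best)) best <- cand
    }
  }
  best
}

# Made-up graph: triangle 1-2-3, plus 4 ~ {1,2}, 5 ~ {2,3}, 6 ~ {1,3}
cliques <- list(c(1, 2, 3), c(1, 2, 4), c(2, 3, 5), c(1, 3, 6))
edges <- rbind(c(1, 2), c(1, 3), c(1, 4), c(1, 6), c(2, 3),
               c(2, 4), c(2, 5), c(3, 5), c(3, 6))   # already sorted
cover <- min_cover(edges, cliques)
length(cover)                                # 3: {1,2,4}, {1,3,6}, {2,3,5}
```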
I thought I would post the solution that I was able to derive with additional help from the following question:
set.seed(1)
n <- 7
n2 <- 100
mu <- cumsum(runif(n, min=-3, max=3))
sigma <- runif(n, min=1, max=3)
dat <- vector(mode="list", n)
for(i in seq(dat)){
dat[[i]] <- rnorm(n2, mean=mu[i], sd=sigma[i])
}
df <- data.frame(group=as.factor(rep(seq(n), each=n2)), y=unlist(dat))
bp <- boxplot(y ~ group, df, notch=TRUE)
#significance test
kr <- kruskal.test(y ~ group, df)
mw <- pairwise.wilcox.test(df$y, df$group)
#matrix showing connections between levels
g <- as.matrix(mw$p.value > 0.05)
g <- cbind(rbind(NA, g), NA)
g <- replace(g, is.na(g), FALSE)
g <- g + t(g)
diag(g) <- 1
rownames(g) <- 1:n
colnames(g) <- 1:n
g
#install.packages("igraph")
library(igraph)
# Load data
same <- which(g==1)
topology <- data.frame(N1=((same-1) %% n) + 1, N2=((same-1) %/% n) + 1)
topology <- topology[order(topology[[1]]),] # Get rid of loops and ensure right naming of vertices
g3 <- simplify(graph.data.frame(topology,directed = FALSE))
get.data.frame(g3)
# Plot graph
plot(g3)
# Calcuate the maximal cliques
res <- maximal.cliques(g3)
# Reorder given the smallest level
# lapply rather than sapply, so the result stays a list even when all cliques have the same size
res <- lapply(res, sort)
res <- res[order(sapply(res,function(x)paste0(sort(x),collapse=".")))]
ml<-max(sapply(res, length))
reord<-do.call(order, data.frame(
do.call(rbind,
lapply(res, function(x) c(sort(x), rep.int(0, ml-length(x))))
)
))
res <- res[reord]
lab.txt <- vector(mode="list", n)
lab <- letters[seq(res)]
for(i in seq(res)){
for(j in res[[i]]){
lab.txt[[j]] <- paste0(lab.txt[[j]], lab[i])
}
}
bp <- boxplot(y ~ group, df, notch=TRUE, outline=FALSE, ylim=range(df$y)+c(0,1))
text(x=1:n, y=bp$stats[5,], labels=lab.txt, col=1, cex=1, pos=3, font=2)
Cool code.
You can also pass the function name to do.call() as a string; it accepts either a function or a character string naming it, so this version is equivalent:
reord<-do.call("order", data.frame(
do.call(rbind,
lapply(res, function(x) c(sort(x), rep.int(0, ml-length(x))))
)
))