I have two 3x2 matrices, A and B:
A:
n2 n3
Part1 1 5
Part2 2 6
Part3 3 7
B:
n2 n3
Part1 5 1
Part2 6 2
Part3 7 3
and I want to create a stacked bar plot, which I do via:
d1 <- read.csv("A.csv", header=T, dec=".",sep = " ")
d1 <- subset(d1, select = c(n2, n3))
d2 <- read.csv("B.csv", header=T, dec=".",sep = " ")
d2 <- subset(d2, select = c(n2, n3))
d <- cbind(d1[,1],d2[,1],d1[,2],d2[,2])
barplot(d, col=c("lawngreen","firebrick","deepskyblue"),
space=c(0,0,0.2,0), xaxt = "n", yaxt="n",ylim = c(0, 25))
x_axis_range <- c(2,3)
x_axis_labels <- c("2", "3")
axis(1,at = x_axis_range, labels = x_axis_range)
y_axis_range <- c(0,2,4,6,25)
y_axis_labels <- c("0","2","4","6","25")
axis(2,at = y_axis_range, labels = y_axis_labels, las=2)
and it produces:
However, I want 2 and 3 to appear centered under each group of bars. How can I do that?
Change the code that produces the x-axis to
x_axis_range <- c(1, 3.2)
x_axis_labels <- c("2", "3")
axis(1,at = x_axis_range, labels = x_axis_labels)
The bars are created with a width of one. In addition, you set the space between the second and third bars to 0.2. Thus, the left borders of the bars are at 0, 1, 2.2 and 3.2, and each group spans two bar widths. If you want the labels at the centre of each group, you therefore need to put them at 1 and 3.2, which is what the definition of x_axis_range above does.
I still left the labels to be 2 and 3, but of course you can change them to whatever you want.
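If the spacing changes later, the hard-coded 1 and 3.2 have to be recomputed by hand. Here is a minimal sketch that avoids this, assuming the same four-column layout as in the question (the values of d are typed in directly instead of being read from A.csv and B.csv, and the names mids and group_centres are mine). barplot() returns the bar midpoints, so the group centres can be derived from them:

```r
# Same values as matrices A and B in the question, typed in directly.
d <- cbind(c(1, 2, 3), c(5, 6, 7), c(5, 6, 7), c(1, 2, 3))

# barplot() returns the x position of each bar's midpoint.
mids <- barplot(d, col = c("lawngreen", "firebrick", "deepskyblue"),
                space = c(0, 0, 0.2, 0), xaxt = "n", yaxt = "n",
                ylim = c(0, 25))

# Two bars per group: average each consecutive pair of midpoints.
group_centres <- colMeans(matrix(mids, nrow = 2))

axis(1, at = group_centres, labels = c("2", "3"))
```

With the spacing above, group_centres comes out as 1 and 3.2, the same values as the hand calculation.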
I have two data frames that look like this:
y1 <- c(1, 0, 0)
y2 <- c(0, 1, 0)
y3 <- c(0, 0, 1)
df1 <- data.frame(y1, y2, y3, row.names = c("x1", "x2", "x3"))
y1 <- c(1, 0, 0)
y2 <- c(1, 0, 0)
y3 <- c(1, 0, 0)
df2 <- data.frame(y1, y2, y3, row.names = c("z1", "z2", "z3"))
I want to graph the relationships in these dataframes so that the x, y, and z values appear in columns with lines connecting them. Here's a rough example of what I'm looking to achieve:
I considered using a ggplot2 scatterplot with a categorical variable on the X axis to generate the columns, but I couldn't figure out how to generate the connecting lines between points from that. I also looked into ggnet2 for a network plot, but couldn't find any examples where nodes were fixed in columns.
EDIT:
My real use case has approximately 20 x points, 120 y points and 200 z points, so ideally the solution would scale fairly easily.
I attempted the following solution using a sankeynetwork plot from the networkD3 package
library(networkD3)
Nodes <- data.frame(name = c("x1", "x2", "x3", "y1", "y2", "y3", "z1",
"z2", "z3"), group = c("1", "1", "1", "2", "2", "2", "3", "3",
"3"))
Links <- data.frame(source = c(0, 1, 2, 3, 4, 5), target = c(3, 4, 5, 6,
6, 6), value = c(1, 1, 1, 1, 1, 1))
sankeyNetwork(Links = Links, Nodes = Nodes, Source = "source",
Target = "target", Value = "value", NodeGroup = "group", NodeID
= "name", sinksRight = FALSE)
The result is sort of correct... however it is probably not ideal. Additionally there doesn't seem to be a clear way of forcing z2 and z3 to appear with z1 on the right side of the page without going into the underlying javascript, which I have no idea how to do (see d3 sankey charts - manually position node along x axis)
Is there a better solution, or a way to improve this one?
Thanks!
Here is one possible solution using geom_segment to draw the connecting edges. I don't know how adaptable it will be to datasets larger or more complex than your example. I suspect there is a more graceful and scalable way to handle this with igraph or ggraph.
# Start with two data.frames: one for node positions,
# and one for edges you want to draw between nodes.
pos_dat = data.frame(node_id=paste(rep(c("x", "y", "z"), each=3),
rep(c("1", "2", "3"), times=3),
sep=""),
type=rep(c("x", "y", "z"), each=3),
xpos=rep(c(1, 2, 3), each=3),
ypos=rep(c(1, 2, 3), times=3))
# node_id type xpos ypos
# 1 x1 x 1 1
# 2 x2 x 1 2
# 3 x3 x 1 3
# 4 y1 y 2 1
# 5 y2 y 2 2
# 6 y3 y 2 3
# 7 z1 z 3 1
# 8 z2 z 3 2
# 9 z3 z 3 3
edge_dat = data.frame(start=c("x1", "x2", "x3", "y1", "y2", "y3"),
end=c("y1", "y2", "y3", "z1", "z1", "z1"))
# start end
# 1 x1 y1
# 2 x2 y2
# 3 x3 y3
# 4 y1 z1
# 5 y2 z1
# 6 y3 z1
# Use two successive merges to join node x,y positions
# for each edge you want to draw.
tmp_dat = merge(edge_dat, pos_dat, by.x="start", by.y="node_id")
seg_dat = merge(tmp_dat, pos_dat, by.x="end", by.y="node_id")
# Remove unneeded columns and change column names for convenience.
seg_dat$type.x = NULL
seg_dat$type.y = NULL
names(seg_dat) = c("end", "start", "x", "y", "xend", "yend")
seg_dat
# end start x y xend yend
# 1 y1 x1 1 1 2 1
# 2 y2 x2 1 2 2 2
# 3 y3 x3 1 3 2 3
# 4 z1 y1 2 1 3 1
# 5 z1 y2 2 2 3 1
# 6 z1 y3 2 3 3 1
# Finally, draw the plot.
library(ggplot2)
p = ggplot() +
geom_segment(data=seg_dat, aes(x=x, y=y, xend=xend, yend=yend),
colour="grey50") +
geom_point(data=pos_dat, aes(x=xpos, y=ypos, colour=type), size=10) +
geom_text(data=pos_dat, aes(x=xpos, y=ypos, label=node_id)) +
scale_colour_manual(values=c(x="steelblue", y="darkorange", z="olivedrab3"))
ggsave("plot.png", plot=p, height=4, width=6, dpi=150)
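For the larger case mentioned in the question, typing edge_dat by hand will not scale. As a rough sketch, the edge list can be derived from the two 0/1 data frames in the question instead. The helper adj_to_edges is a name I made up, and I am assuming that a 1 in a cell means "row node is connected to column node":

```r
# The two data frames from the question.
df1 <- data.frame(y1 = c(1, 0, 0), y2 = c(0, 1, 0), y3 = c(0, 0, 1),
                  row.names = c("x1", "x2", "x3"))
df2 <- data.frame(y1 = c(1, 0, 0), y2 = c(1, 0, 0), y3 = c(1, 0, 0),
                  row.names = c("z1", "z2", "z3"))

# Hypothetical helper: one edge per cell equal to 1 (row name -> column name).
adj_to_edges <- function(adj) {
  m <- as.matrix(adj)
  idx <- which(m == 1, arr.ind = TRUE)
  data.frame(start = rownames(m)[idx[, "row"]],
             end   = colnames(m)[idx[, "col"]],
             stringsAsFactors = FALSE)
}

edges_xy <- adj_to_edges(df1)   # x (rows) -> y (columns)
edges_zy <- adj_to_edges(df2)   # z (rows) -> y (columns)

# Flip the z-y edges so that every edge runs left to right (y -> z).
edges_yz <- data.frame(start = edges_zy$end, end = edges_zy$start,
                       stringsAsFactors = FALSE)

edge_dat <- rbind(edges_xy, edges_yz)
```

For the example data this reproduces the six edges listed in edge_dat above, so the two merges and the plotting code can be used unchanged.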
Initially I was trying to add the horizontal color side bar to the dendrogram plot (NOT to the whole heat map) using colored_bars from dendextend.
The code below (THANK YOU for your help Tal!) works pretty well. The only issue remaining is how to control the distance of the bar from the leaves labels and the bar width?
Here is an example, data and code
Data (4 variables, 5 cases)
df <- read.table(header=T, text="group class v1 v2
1 A 1 3.98 23.2
2 A 2 5.37 18.5
3 C 1 4.73 22.1
4 B 1 4.17 22.3
5 C 2 4.47 22.4
")
car_type <- factor(df[,c(1)]) # groups codes (A,B,C)
cols_4 <- heat.colors(3)
col_car_type <- cols_4[car_type]
matrix<-data.matrix(df[,c(3,4)])
rnames<-df[,2]
row.names(matrix)<-rnames
matrix<-data.matrix(df[,c(3,4)])
row.names(matrix)<-rnames
dend<-hclust(dist(matrix))
labels_colors(dend) <- col_car_type[order.dendrogram(dend)] # Error in order.dendrogram(dend) : 'order.dendrogram' requires a dendrogram
# But I dont think the line above is doing anything so it can be removed...
plot(dend)
colored_bars(col_car_type, dend)
This is possible to do using dendextend.
First to install the latest dendextend version you can use:
install.packages("dendextend")
Here is an example using mtcars:
## mtcars example
library(dendextend)
# Create the dend:
dend <- as.dendrogram(hclust(dist(mtcars)))
# Create a vector giving a color for each car to which company it belongs to
car_type <- rep("Other", length(rownames(mtcars)))
is_x <- grepl("Merc", rownames(mtcars))
car_type[is_x] <- "Mercedes"
is_x <- grepl("Mazda", rownames(mtcars))
car_type[is_x] <- "Mazda"
is_x <- grepl("Toyota", rownames(mtcars))
car_type[is_x] <- "Toyota"
car_type <- factor(car_type)
n_car_types <- length(unique(car_type))
cols_4 <- colorspace::rainbow_hcl(n_car_types, c = 70, l = 50)
col_car_type <- cols_4[car_type]
# extra: showing the various clusters cuts
k234 <- cutree(dend, k = 2:4)
# color labels by car company:
labels_colors(dend) <- col_car_type[order.dendrogram(dend)]
# color branches based on cutting the tree into 4 clusters:
dend <- color_branches(dend, k = 4)
### plots
par(mar = c(12,4,1,1))
plot(dend)
colored_bars(cbind(k234[,3:1], col_car_type), dend, rowLabels = c(paste0("k = ", 4:2), "Car Type"))
# horiz version:
par(mar = c(4,1,1,12))
plot(dend, horiz = TRUE)
colored_bars(cbind(k234[,3:1], col_car_type), dend, rowLabels = c(paste0("k = ", 4:2), "Car Type"), horiz = TRUE)
legend("topleft", legend = levels(car_type), fill = cols_4)
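On the remaining part of the question (distance of the bar from the leaf labels, and bar width): colored_bars has y_shift and y_scale arguments that, as far as I understand the documentation, control where the bar starts on the y axis and how tall it is. A minimal sketch with a single bar follows. The numeric values are guesses that need tuning by eye for each plot, and bar_cols is just an illustrative name:

```r
library(dendextend)

dend <- as.dendrogram(hclust(dist(mtcars)))

# One colour per car, in the original row order of mtcars.
bar_cols <- ifelse(grepl("Merc", rownames(mtcars)), "firebrick", "grey70")

par(mar = c(12, 4, 1, 1))
plot(dend)

# y_shift: where the bar starts on the y axis (more negative = further below the tree).
# y_scale: how tall the bar is, in y-axis units.
# Both values are assumptions to be adjusted by eye.
colored_bars(bar_cols, dend, rowLabels = "Mercedes",
             y_shift = -150, y_scale = 30)
```

If the bar overlaps the labels or falls off the device, change y_shift first and enlarge the bottom margin in par(mar = ...) as needed.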
Let's say I have a variable mm = 5 which I would like to change and have the number I change it to automatically update in the legend of the graph. So where it says "no change" and "change in m" I would like it to say m = 3 and m = 5. Can this be done?
m <- 3
a <- .5
b <- 1
c <- .5
g <- seq(.02,.2,by=.02)
n <- 7
r <- .25
alpha <- 2
dt <- 1
X <- .1
#Changed parameters
mm <- 5
A.2 = function(m,a,b,c,g,n,r,alpha,dt,X) {
1 - exp(-dt*(1/(alpha*dt)*log(1+(alpha*b*dt*m*a^2*c*X*exp(-g*n))/(a*c*X+g))))
}
all.data.g <- expand.grid(m = m,g = g,X = X)
all.data.g$a.4 <- A.2(m,a,b,c,all.data.g$g,n,r,alpha,dt,X)
all.data.g$a.5 <- A.2(mm,a,b,c,all.data.g$g,n,r,alpha,dt,X)
plot(all.data.g$g, all.data.g$a.4, xlab = 'g', ylab = 'attack rate', ylim = c(0,1), type = 'l')
lines(all.data.g$g, all.data.g$a.5, lty=2)
legend('topright', c("No change","Change in m"),lty=c(1,2))
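One way to do this, sketched here under the assumption that the legend should simply show the current values of m and mm, is to build the legend text with paste() instead of typing fixed strings. The two curves below are placeholders so that the snippet runs on its own; they are not the attack-rate function from the question:

```r
m  <- 3
mm <- 5

# Legend text is built from the variables, so it follows any change to m or mm.
legend_labels <- paste("m =", c(m, mm))

# Placeholder curves (hypothetical), standing in for a.4 and a.5.
g <- seq(0.02, 0.2, by = 0.02)
plot(g, exp(-m * g), type = "l", ylim = c(0, 1),
     xlab = "g", ylab = "attack rate")
lines(g, exp(-mm * g), lty = 2)

legend("topright", legend = legend_labels, lty = c(1, 2))
```

In the original code only the legend() call needs to change: replace c("No change","Change in m") with paste("m =", c(m, mm)).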
How do I plot a network of type bipartite in R? Similar to this:
I have similar data but with weights for both genes and diseases and SARS. This network is an example. I have different kind of attributes. I followed a link here. But due to my little knowledge in this topic, I could not get much out of it. Thanks in advance for any help.
From the ?bipartite_graph help:
Bipartite graphs have a type vertex attribute in igraph, this is boolean and FALSE for the vertices of the first kind and TRUE for vertices of the second kind.
So you could do something like this (igraph 1.0.1):
library(igraph)
set.seed(123)
# generate random bipartite graph.
g <- sample_bipartite(10, 5, p=.4)
# check the type attribute:
V(g)$type
# define color and shape mappings.
col <- c("steelblue", "orange")
shape <- c("circle", "square")
plot(g,
vertex.color = col[as.numeric(V(g)$type)+1],
vertex.shape = shape[as.numeric(V(g)$type)+1]
)
Check also ?bipartite.
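If your data is an edge list with weights rather than a random graph, the same idea can be sketched as follows. The gene and disease names and the weights are made up for illustration, and I am assuming the weight belongs to the edge:

```r
library(igraph)

# Hypothetical gene-disease associations with an edge weight.
edges <- data.frame(
  gene    = c("g1", "g1", "g2", "g3"),
  disease = c("d1", "d2", "d2", "d3"),
  weight  = c(0.9, 0.4, 0.7, 0.2),
  stringsAsFactors = FALSE
)

g <- graph_from_data_frame(edges, directed = FALSE)

# Mark the second kind of vertex (diseases) as TRUE, genes as FALSE.
V(g)$type <- V(g)$name %in% edges$disease

plot(g,
     layout       = layout_as_bipartite(g),
     vertex.color = c("steelblue", "orange")[V(g)$type + 1],
     vertex.shape = c("circle", "square")[V(g)$type + 1],
     edge.width   = E(g)$weight * 5)
```

layout_as_bipartite() places the two vertex kinds in two rows, which is usually easier to read than the default layout for this kind of data.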
Using the example provided by the OP in the comments: since the graph is multipartite and given the provided data format, I would first create a bipartite graph, then add the additional edges. Note that although the resulting graph returns TRUE for is_bipartite(), the type attribute is set as numeric instead of logical and may not work properly with other bipartite functions.
set.seed(123)
V1 <- sample(LETTERS[1:10], size = 10, replace = TRUE)
V2 <- sample(1:10, size = 10, replace = TRUE)
d <- data.frame(V1 = V1, V2 = V2, weights = runif(10))
d
> d
V1 V2 weights
1 C 10 0.8895393
2 H 5 0.6928034
3 E 7 0.6405068
4 I 6 0.9942698
5 J 2 0.6557058
6 A 9 0.7085305
7 F 3 0.5440660
8 I 1 0.5941420
9 F 4 0.2891597
10 E 10 0.1471136
g <- graph_from_data_frame(d, directed = FALSE)
V(g)$label <- V(g)$name # set labels.
# create a graph connecting central node FOO to each V2.
e <- expand.grid(V2 = unique(d$V2), V2 = "FOO")
> e
V2 V2
1 10 FOO
2 5 FOO
3 7 FOO
4 6 FOO
5 2 FOO
6 9 FOO
7 3 FOO
8 1 FOO
9 4 FOO
g2 <- graph_from_data_frame(e, directed = FALSE)
# join the two graphs.
g <- g + g2
# set type.
V(g)$type <- 1
V(g)[name %in% 1:10]$type <- 2
V(g)[name %in% "FOO"]$type <- 3
V(g)$type
> V(g)$type
[1] 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3
col <- c("steelblue", "orange", "green")
shape <- c("circle", "square", "circle")
library(rTRM) # Bioconductor package containing layout.concentric()
# the first element in the list for concentric is the central node.
l <- layout.concentric(g, concentric = list("FOO", 1:10, LETTERS[1:10]))
plot(g,
layout = l,
vertex.color = col[V(g)$type],
vertex.shape = shape[V(g)$type],
edge.width = E(g)$weights * 5 # optional, plot edges width proportional to weights.
)
The function layout.concentric() is in (my) package rTRM, available from Bioconductor. It is really a simple implementation I wrote to do exactly what you want. I am not completely sure whether the latest igraph version offers the same functionality (it may).
For the example you provided, I would recommend using the x and y attributes for visualizing a bipartite graph. E.g.:
V(g)$x <- c(1, 1, 1, 2, 2, 2, 2)
V(g)$y <- c(3, 2, 1, 3.5, 2.5, 1.5, 0.5)
V(g)$shape <- shape[as.numeric(V(g)$type) + 1]
V(g)$color <- c('red', 'blue', 'green', 'steelblue', 'steelblue', 'steelblue', 'steelblue')
E(g)$color <- 'gray'
E(g)$color[E(g)['A' %--% V(g)]] <- 'red'
E(g)$color[E(g)['B' %--% V(g)]] <- 'blue'
E(g)$color[E(g)['C' %--% V(g)]] <- 'green'
plot(g)
EDIT: added code to give the vertices and edges different colors for clarity.
Or you can use the multigraph package.
library(multigraph)
swomen <- read.dl(file = "http://moreno.ss.uci.edu/davis.dat")
bmgraph(swomen, layout = "force", seed = 1, cex = 3, tcex = .8, pch = c(19, 15), lwd = 2,
vcol = 2:3, ecol = 8, rot = 65)
that can produce the binomial projection of the two-mode data set
This is the code I'm currently running:
n <- 7
N <- 52
r <- 13
reps <- 1000000
deck <- rep(c('h','d','c','s'), each = r)
diamonds <- rep(NA, length.out = reps)
pos <- sample(x = 1:52, size = 7, replace = FALSE)
for(i in 1:reps) {
hand <- sample(x = deck, replace = FALSE)[pos]
diamonds[i] <- sum(ifelse(hand == 'd', 1, 0))
}
barplot(table(diamonds), col = 'red', xlab = '# of diamonds',
ylab = paste('frequency out of',reps,'trials'),
main = paste('Positions:',pos[1],pos[2],pos[3],pos[4],
pos[5],pos[6],pos[7]))
What I'd really like is to be able to give a title to the barplot with something like the following
barplot(..., main = paste('Positions:',pos))
and have the title say "Positions: p1 p2 p3 p4 p5 p6 p7", where p1,p2,...,p7 are the elements of pos.
For anyone that's interested, this code randomly chooses 7 positions from 52 and then counts the number of diamonds ('d') within those 7 positions after each shuffle of the deck for 1000000 shuffles. Then the empirical distribution of the number of diamonds within those 7 cards is plotted.
Use collapse in paste to collapse the multiple elements of a vector containing the base text and pos:
paste(c('Positions:', pos), collapse=" ")
Otherwise, when you paste "Positions:" to pos you get the former recycled to the length of pos.
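To make the difference concrete, here is a small sketch with made-up positions (the values in pos are hypothetical, and title_text is just an illustrative name):

```r
pos <- c(12, 3, 45, 27, 8, 51, 19)   # hypothetical positions

# Without collapse: "Positions:" is recycled, giving one string per element.
recycled <- paste("Positions:", pos)
length(recycled)                      # 7 separate strings

# With collapse: a single string that can be passed to main =.
title_text <- paste(c("Positions:", pos), collapse = " ")
title_text
```

The single string in title_text can then be passed directly as barplot(..., main = title_text).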