My question is: given a list of sets, how can I visualize the overlap between any two of them using a network plot, as shown below?
Please feel free to generate any sets for demonstration, or use the following simple ones.
Thanks for your help in advance!
set.seed(123456)
A <- sample(1:100, 60)
B <- sample(1:100, 50)
C <- sample(1:100, 75)
In ggraph we must use scale_size() for nodes and scale_edge_width() for edges to harmonize proportions. Point sizes in ggplot2 are already scaled by their radius (see "Does size for ggplot2::geom_point() refer to radius, diameter, area, or something else?"), so no transformation is necessary unless you want the point size to be proportional to the edge width by area.
Build a tbl_graph with your samples
library(tidygraph)

# edges are weighted by the size of the pairwise intersections
edges <- data.frame(from = c('A', 'B', 'C'), to = c('B', 'C', 'A'),
                    weight = c(length(intersect(A, B)),
                               length(intersect(B, C)),
                               length(intersect(C, A))))
# nodes are weighted by the size of each set
nodes <- data.frame(name = c('A', 'B', 'C'),
                    size = c(length(A), length(B), length(C)))
tbl_graph <- tbl_graph(nodes = nodes, edges = edges)
Now, if you build the network directly with these sizes, the distances between nodes are decided automatically, and most ggraph layouts place nodes between 0 and 1, resulting in a crowded graph with oversized edges and nodes. If the distance between nodes is not important, we can simply apply a scaling factor to shrink the node sizes and edge widths until they fit the graph.
To harmonize widths and sizes, we set the range of scale_edge_width() to the minimum and maximum of the scaled edge weights, and the range of scale_size() to the minimum and maximum of the scaled node sizes multiplied by 2, since points are sized by diameter. This way, node sizes and edge widths reflect their actual values rather than being decided by the layout. I also include some additional annotation to show the sizes of the nodes and edges; geom_node_point() with shape = 21 draws an empty circle. Good luck!
library(ggraph)
scale_factor <- 0.1
ggraph(tbl_graph) +
  geom_edge_link(aes(width = weight * scale_factor, label = weight),
                 label_dodge = unit(-4, 'mm'), angle_calc = 'along') +
  scale_edge_width(range = c(min(edges$weight) * scale_factor, max(edges$weight) * scale_factor)) +
  geom_node_point(aes(size = size * scale_factor), shape = 21) +
  scale_size(range = c(min(nodes$size) * scale_factor * 2, max(nodes$size) * scale_factor * 2)) +
  theme_linedraw() + geom_node_text(aes(label = paste(name, ':', size)), nudge_x = -0.1)
resulting ggraph
I am plotting a food web ecological network with igraph and I am trying to avoid having my nodes to overlap.
The y-axis depends on an attribute of the nodes (the species' trophic levels, which vary between 1 and 3.52; by convention there are no nodes between 1 and 2).
To do so, I calculate how many nodes there are within certain intervals of the y-axis range (1 to 4 with increments of 0.5).
If all nodes have the same size (defined by the user), I calculate the length of the x-axis at each level of y as the sum of the node sizes (as if the nodes were placed next to each other along a single axis):
xl <- (nnodes + 1 + 2) * size
I add 1 here since the nodes are plotted from their centers, so there is half a node to the left of the minimum and half a node to the right of the maximum of the x-axis. (The +2 is just to give a little space, and yes, I know 1 + 2 = 3 :) it just makes the explanation easier.)
I then calculate the break points on the x-axis as a sequence spanning the length calculated above, centered around 0:
seq(-xl/2, xl/2, length.out = nnodes)
Doing so, each center of the nodes should be at a breakpoint on the x-axis with some distance between the nodes.
If the nodes have different sizes (defined by the user), I use the maximum size in the formula above so that I have enough space between each center of the nodes.
I combine the x-axis and y-axis coordinates as a matrix to use as my layout.
coordsp <- cbind(xcoord, TLsp)
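A minimal sketch of the construction described above; TLsp (trophic level per species) and size here are made-up placeholder inputs:
TLsp <- c(1, 1, 2.3, 2.8, 3.1, 3.52)     # placeholder trophic levels
size <- 4                                 # placeholder node size
xcoord <- numeric(length(TLsp))
bins <- cut(TLsp, breaks = seq(1, 4, by = 0.5), include.lowest = TRUE)  # 0.5-wide y intervals
for (b in levels(bins)) {
  idx <- which(bins == b)
  nnodes <- length(idx)
  if (nnodes == 0) next
  xl <- (nnodes + 1 + 2) * size                            # x-length needed at this level
  xcoord[idx] <- seq(-xl/2, xl/2, length.out = nnodes)     # node centers, centered on 0
}
coordsp <- cbind(xcoord, TLsp)                             # layout matrix: x and y coordinates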
I use rescale within the parameters of the plotting function.
plot(network, layout = coordsp,
     rescale = TRUE, ...)
I still get some overlap.
I tried using rescale = FALSE and specifying the x-axis and y-axis limits (the range of the coordinates in coordsp), but the graph ends up looking weird or the plotting window turns completely white.
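Roughly, that attempt looked like this (asp and vertex.size here are placeholder values, not the exact ones used):
plot(network, layout = coordsp, rescale = FALSE,
     xlim = range(coordsp[, 1]), ylim = range(coordsp[, 2]),
     asp = 0, vertex.size = 5)   # asp and vertex.size are placeholders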
I think the issue when rescale = TRUE is that it rescales the coordinates but not the size of the nodes, so that if they are too big for a window [-1,1]x[-1,1], the nodes end up overlapping.
Any guess on how to avoid overlap?
Thanks in advance
I want to visualize proportions using points inside a circle. For example, let's say that I have 100 points that I wish to scatter (somewhat randomly jittered) in a circle.
Next, I want to use this diagram to represent the proportions of people who voted Biden/Harris in 2020 US presidential elections, in each state.
Example #1 -- Michigan
Biden got 50.62% of Michigan's votes. I'm going to draw a horizontal diameter that splits the circle into two halves, and then color the points under the diameter in blue (the Democrats' color).
Example #2 -- Wyoming
Unlike Michigan, in Wyoming Biden got only 26.55% of the votes, which is approximately a quarter of the vote. In this case I'd draw a horizontal chord that divides the circle such that the disk's area under the chord is 25% of the entire disk area. Then I'll color the respective points in that area in blue. Since I have 100 points in total, 25 points represent the 25% who voted Biden in Wyoming.
My question: How can I do this with ggplot? I researched this issue, and there's a lot of geometry going on here. First, the kind of area I'm talking about is called a "circular segment". Second, there are many formulas to calculate its area, if we know some other parameters about the shape (such as the radius length, etc.). See this nice demo.
However, my goal isn't to solve geometry problems, but just to represent proportions in a very specific way:
draw a circle
sprinkle X number of points inside
draw a (real or invisible) horizontal line that divides the circle/disk area according to a given proportion
ensure that the points are arranged respective to the split. That is, if we want to represent a 30%-70% split, then have 30% of the points under the line that divides the disk.
color the points under the line.
I understand that this is somewhat an exotic visualization, but I'll be thankful for any help with this.
EDIT
I've found a reference to a JavaScript package that does something very similar to what I'm asking.
I took a crack at this for fun. There's a lot more that could be done. I agree that this is not a great way to visualize proportions, but if it's engaging your audience ...
Formulas for determining appropriate heights are taken from Wikipedia. In particular we need the formulas
a/A = (theta - sin(theta))/(2*pi)
h = 1 - cos(theta/2)
where a is the area of the segment; A is the whole area of the circle; theta is the central angle of the arc that defines the segment (see Wikipedia for pictures); and h is the height of the segment (this form of h assumes a unit circle, r = 1, which is what the code below uses).
Machinery for finding heights.
## fraction of the circle's area in a segment with central angle x
afun <- function(x) (x - sin(x)) / (2 * pi)
## curve(afun, from = 0, to = 2*pi)

## invert afun: find the central angle theta that gives area fraction a
find_a <- function(a) {
  uniroot(function(x) afun(x) - a,
          interval = c(0, 2 * pi))$root
}
## height of the segment (unit circle) with area fraction a
find_h <- function(a) {
  1 - cos(find_a(a) / 2)
}
vfind_h <- Vectorize(find_h)
## find_a(0.5); find_h(0.5)
## curve(vfind_h(x), from = 0, to = 1)
set up a circle
dd <- data.frame(x=0,y=0,r=1)
library(ggforce)
library(ggplot2); theme_set(theme_void())
gg0 <- ggplot(dd) + geom_circle(aes(x0=x,y0=y,r=r)) + coord_fixed()
finish
props <- c(0.2, 0.5, 0.3)  ## proportions
n <- 100                   ## number of points to scatter
cprop <- cumsum(props)[-length(props)]
h <- vfind_h(cprop)        ## segment heights for the cumulative proportions

set.seed(101)
r <- runif(n)
th <- runif(n, 0, 2 * pi)
## sqrt(r) makes the points uniform over the disk's area
dd2 <- data.frame(x = sqrt(r) * cos(th),
                  y = sqrt(r) * sin(th))
## group the points by the chords (heights measured down from the top of the circle)
dd2$g <- cut(dd2$y, c(1, 1 - h, -1))
gg0 + geom_point(data = dd2, aes(x, y, colour = g), size = 3)
There are a bunch of tweaks that would make this better (meaningful names for the categories; reversing the axis order to match the plot; maybe adding segments delimiting the sections, or (more work) polygons so you can shade the sections).
You should definitely check this for mistakes — e.g. there are places where I may have used a set of values where I should have used their first differences, or vice versa (values vs cumulative sum). But this should get you started.
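As a sketch of the "add segments delimiting the sections" idea above (reusing the h values computed earlier; the dashed linetype is just a choice):
## chord endpoints: each chord sits at y = 1 - h, and the circle gives x = ±sqrt(1 - y^2)
chords <- data.frame(y = 1 - h)
chords$x <- sqrt(1 - chords$y^2)
gg0 +
  geom_point(data = dd2, aes(x, y, colour = g), size = 3) +
  geom_segment(data = chords,
               aes(x = -x, xend = x, y = y, yend = y),
               linetype = 2)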
Hi R experts of the world,
Assume I have a point pattern that generates an intensity map, and that this map is color-coded into 3 regions in a pixel image... how could I get the area of each color-coded region?
Here is an example using spatstat:
library(spatstat)
japanesepines
Z <- density(japanesepines); plot(Z)   # ---> I create a density map
b <- quantile(Z, probs = (0:3)/3)      # ---> I "reduce" it to 3 color-coded zones
Zcut <- cut(Z, breaks = b, labels = 1:3); plot(Zcut)
class(Zcut)                            # ---> and Zcut is my resulting image ("im")
Thank you in advance
Sacc
In your specific example it is very easy to calculate the area, because you used quantile() to cut the image: this effectively divides the image into areas of equal size, so there should be three areas of size 1/3 each, since the window is a unit square. In general, to calculate areas from a factor-valued image you can use as.tess() and tile.areas() (continuing your example):
Ztess <- as.tess(Zcut)
tile.areas(Ztess)
In this case the areas come out as 0.333313 rather than exactly 1/3, which must be due to discretization.
I'm not exactly sure what you're after, but you can count up the number of pixels in each color using the table() function.
table(Zcut[[1]])
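If you want areas rather than pixel counts, one option (a sketch, assuming Zcut is a spatstat "im" object, which stores its pixel dimensions in xstep and ystep) is to multiply the counts by the area of one pixel:
px <- table(Zcut[[1]])          # pixel counts per color-coded level
px * Zcut$xstep * Zcut$ystep    # approximate area of each zone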
I have created a 3D plot (a surface) using the wireframe function. I wonder if there is any function with which I can calculate the volume under the surface in a 3D plot.
Here is a sample of my data plus the wireframe syntax I used to create my 3D (surface) plot:
x1<-c(13,27,41,55,69,83,97,111,125,139)
x2<-c(27,55,83,111,139,166,194,222,250,278)
x3<-c(41,83,125,166,208,250,292,333,375,417)
x4<-c(55,111,166,222,278,333,389,445,500,556)
x5<-c(69,139,208,278,347,417,487,556,626,695)
x6<-c(83,166,250,333,417,500,584,667,751,834)
x7<-c(97,194,292,389,487,584,681,779,876,974)
x8<-c(111,222,333,445,556,667,779,890,1001,1113)
x9<-c(125,250,375,500,626,751,876,1001,1127,1252)
x10<-c(139,278,417,556,695,834,974,1113,1252,1391)
df<-data.frame(x1,x2,x3,x4,x5,x6,x7,x8,x9,x10)
df.matrix<-as.matrix(df)
library(lattice)
wireframe(df.matrix,
          aspect = c(61/87, 0.4),
          scales = list(arrows = FALSE, cex = .5, tick.number = 10, z = list(arrows = TRUE)),
          ylim = c(1:10), xlab = expression(phi1), ylab = "Percentile", zlab = "Loss",
          main = "Random Classifier", light.source = c(10, 10, 10), drape = TRUE,
          col.regions = rainbow(100, s = 1, v = 1, start = 0,
                                end = max(1, 100 - 1)/100, alpha = 1),
          screen = list(z = -60, x = -60))
Note: my real data is a 100X100 matrix
Thanks
The data you are feeding to wireframe is a grid of values. Hence one estimate of the volume of whatever underlying surface this is approximating is the sum of the grid values multiplied by the grid cell areas. This is just like adding up the heights of histogram bars to get the number of values in your histogram.
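For example, a minimal sketch of that estimate, assuming unit grid spacing (swap in the real cell dimensions for dx and dy):
dx <- 1; dy <- 1                   # assumed grid spacing along the phi and percentile axes
vol <- sum(df.matrix) * dx * dy    # block-sum estimate of the volume under the surface
vol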
The problem I see with doing this on your data is that the cell areas are going to be in odd units: percentiles on one axis and phi (whose units are unknown) on the other, so your volume will have units of loss times percentile times phi.
This isn't a problem if you want to compare volumes of similar things on exactly the same grid, but if you have surfaces on different grids (different values of phi, or different percentiles) then you need to be careful.
Now, noting that wireframe doesn't draw like a 3D histogram would (square tower blocks), this gives us another way to estimate the volume. Your 10x10 matrix is plotted as 9x9 squares. Divide each of those squares into two triangles and then compute the volume of the 162 resulting prisms (right triangular prisms with vertical sides and a planar sloping top). The volume of each one is its base area times the mean of its three corner heights, which is the same as the height of the sloping top above the triangle's centroid.
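A sketch of that idea, again assuming unit grid spacing; each cell is split along a diagonal into two triangles, and each prism contributes its base area times the mean of its corner heights:
m  <- df.matrix
nr <- nrow(m); nc <- ncol(m)
z11 <- m[-nr, -nc]; z21 <- m[-1, -nc]   # the four corner heights of every grid cell
z12 <- m[-nr, -1];  z22 <- m[-1, -1]
dx <- 1; dy <- 1                        # assumed grid spacing
tri_area <- dx * dy / 2
vol <- sum(tri_area * (z11 + z12 + z21) / 3 +   # prism over the first triangle of each cell
           tri_area * (z12 + z21 + z22) / 3)    # prism over the second triangle
vol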
I thought maybe this would be in the raster package, but it isn't. There's code for computing the surface area but not the volume! I'm sure the raster maintainer would be happy to have some code for this!
If the points are arbitrary (i.e., they don't follow a smooth function), it seems like you're looking for the volume of the convex hull (minimum surface) surrounding these points. One package to help you calculate this is alphashape3d.
You'll need a 3-column matrix of the coordinates to form the right type of object for the calculation, but it seems rather straightforward.
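A rough sketch with alphashape3d; the point matrix here is invented for illustration, and alpha controls how tightly the surface wraps the points:
library(alphashape3d)
set.seed(1)
pts <- matrix(rnorm(300), ncol = 3)   # hypothetical 3-column matrix of x, y, z coordinates
ash <- ashape3d(pts, alpha = 2)       # build the alpha-shape around the points
volume_ashape3d(ash)                  # volume enclosed by the alpha-shape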