Thanks for your help in advance!
My question is: given a list of sets, how can I visualize the overlap between any two of the sets using a network plot like the one shown below?
Feel free to generate any sets for demonstration, or use the following simple sets:
set.seed(123456)
A <- sample(1:100, 60)
B <- sample(1:100, 50)
C <- sample(1:100, 75)
In ggraph we must use scale_size() for nodes and scale_edge_width() for edges to harmonize proportions. Point sizes in ggplot are already scaled by their radius (see the question "Does size for ggplot2::geom_point() refer to radius, diameter, area, or something else?"), so no transformation is necessary unless you want the point size to be proportional to the edge width by area.
Build a tbl_graph from your samples:
library(ggraph)     # ggraph() and the geom_edge_*/geom_node_* layers
library(tidygraph)  # tbl_graph()
# edges are weighted by the size of each pairwise intersection
edges <- data.frame(from = c('A','B','C'), to = c('B','C','A'),
                    weight = c(length(intersect(A,B)), length(intersect(B,C)), length(intersect(C,A))))
# nodes are sized by the number of elements in each sample
nodes <- data.frame(name = c('A','B','C'), size = c(length(A), length(B), length(C)))
tbl_graph <- tbl_graph(nodes = nodes, edges = edges)
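The edge table above hard-codes the three pairs. For a longer list of sets, a sketch like the following enumerates every pair with base R's combn() and builds the same kind of node and edge tables. The helper name overlap_graph_data() is hypothetical; it is not part of ggraph or tidygraph.

```r
# Hypothetical helper: node and edge tables for the pairwise overlaps of a named list of sets.
overlap_graph_data <- function(sets) {
  stopifnot(length(sets) >= 2, !is.null(names(sets)), all(names(sets) != ""))
  # One node per set, sized by the number of elements in the set
  nodes <- data.frame(name = names(sets), size = unname(lengths(sets)),
                      stringsAsFactors = FALSE)
  # All unordered pairs of set names, one pair per column
  pairs <- combn(names(sets), 2)
  # Edge weight = number of elements shared by the two sets
  weight <- apply(pairs, 2, function(p) length(intersect(sets[[p[1]]], sets[[p[2]]])))
  edges <- data.frame(from = pairs[1, ], to = pairs[2, ], weight = weight,
                      stringsAsFactors = FALSE)
  list(nodes = nodes, edges = edges)
}
```

For the example sets, gd <- overlap_graph_data(list(A = A, B = B, C = C)) followed by tbl_graph(nodes = gd$nodes, edges = gd$edges) should give the same graph, except that the C-A edge is stored as A-C.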
Now, if you build the network directly with these sizes, the layout decides the node positions automatically, and most ggraph layouts place nodes only about 0 to 1 units apart, resulting in a crowded graph with oversized edges and nodes. If the distances between nodes are not important, we can simply use a scaling factor to shrink the node sizes and edge widths so they fit the graph.
To harmonize widths and sizes, we set the range of scale_edge_width() to the minimum and maximum edge weights, and the range of scale_size() to the minimum and maximum node sizes multiplied by 2, because the nodes are scaled by diameter (both ranges are also multiplied by scale_factor). This way, node sizes and edge widths reflect their actual values rather than being decided by the layout. I have also included additional annotations showing the sizes of the nodes and edges. In geom_node_point(), shape = 21 is a circle with a separate fill, which is drawn as an empty circle here because no fill is set. Good luck!
scale_factor <- 0.1
ggraph(tbl_graph) +
  geom_edge_link(aes(width = weight * scale_factor, label = weight),
                 label_dodge = unit(-4, 'mm'), angle_calc = 'along') +
  scale_edge_width(range = c(min(edges$weight), max(edges$weight)) * scale_factor) +
  geom_node_point(aes(size = size * scale_factor), shape = 21) +
  scale_size(range = c(min(nodes$size), max(nodes$size)) * scale_factor * 2) +
  theme_linedraw() +
  geom_node_text(aes(label = paste(name, ':', size)), nudge_x = -0.1)
(Figure: the resulting ggraph plot.)
This has been killing me; it should be so simple. I would like to make a scatterplot in which both axes are count data, so each [x,y] location (a pair of integer values) has a substantial number of overlapping points. For each of these x,y points I would like to show (and specify) 2 treatments by colour, 8 trials by alpha (or shape) and, crucially, the count of points at that location (separated by treatment/trial) by size. geom_jitter and geom_count each seem to do part of this, but I can't combine them.
The data is zero-heavy, so I'm expecting 16 large overlapping circles at 0,0, which it would be nice to jitter as in a bubble plot. I'm also happy to use other approaches outside ggplot2 if these things can't be combined within it. Many thanks in advance!
Hi R experts of the world,
Assume I have a point pattern that generates an intensity map, and that this map is color-coded into 3 regions in a pixel image. How can I get the area of each color-coded region?
Here is an example using spatstat:
library(spatstat)
japanesepines
Z <- density(japanesepines); plot(Z)                  # ---> I create a density map
b <- quantile(Z, probs = (0:3)/3)                     # ---> I "reduce it" to 3 color-coded zones
Zcut <- cut(Z, breaks = b, labels = 1:3); plot(Zcut)
class(Zcut)                                           # ---> and Zcut is my resulting image ("im")
Thank you in advance
Sacc
In your specific example it is very easy to calculate the areas because you used quantile() to cut the image: this effectively divides the image into areas of equal size, so there should be three areas of size 1/3, since the window is a unit square. In general, to calculate areas from a factor-valued image you can use as.tess() and tile.areas() (continuing your example):
Ztess <- as.tess(Zcut)
tile.areas(Ztess)
In this case each area is 0.333313 rather than exactly 1/3, which must be due to discretization.
I'm not exactly sure what you're after, but you can count up the number of pixels in each color using the table() function.
table(Zcut[[1]])
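To turn the pixel counts from table() into areas, multiply each count by the area of one pixel. The sketch below uses a hypothetical helper, pixel_areas(); it assumes the pixel width and height are known, which for a spatstat im object should be available as its xstep and ystep components.

```r
# Hypothetical helper: area covered by each level of a factor-valued pixel image.
pixel_areas <- function(v, xstep, ystep) {
  # Pixels per level; NA pixels (e.g. outside the window) are not counted
  counts <- table(v)
  # Area per level = number of pixels x area of one pixel
  counts * xstep * ystep
}
```

Continuing the example above, pixel_areas(Zcut$v, Zcut$xstep, Zcut$ystep) could be compared with the tile.areas(as.tess(Zcut)) result.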
I have created a 3D plot (a surface) using the wireframe function. Is there a function that can calculate the volume under the surface in a 3D plot?
Here is a sample of my data, plus the wireframe syntax I used to create my 3D (surface) plot:
x1<-c(13,27,41,55,69,83,97,111,125,139)
x2<-c(27,55,83,111,139,166,194,222,250,278)
x3<-c(41,83,125,166,208,250,292,333,375,417)
x4<-c(55,111,166,222,278,333,389,445,500,556)
x5<-c(69,139,208,278,347,417,487,556,626,695)
x6<-c(83,166,250,333,417,500,584,667,751,834)
x7<-c(97,194,292,389,487,584,681,779,876,974)
x8<-c(111,222,333,445,556,667,779,890,1001,1113)
x9<-c(125,250,375,500,626,751,876,1001,1127,1252)
x10<-c(139,278,417,556,695,834,974,1113,1252,1391)
df<-data.frame(x1,x2,x3,x4,x5,x6,x7,x8,x9,x10)
df.matrix<-as.matrix(df)
library(lattice)  # wireframe()
wireframe(df.matrix,
          aspect = c(61/87, 0.4),
          scales = list(arrows = FALSE, cex = .5, tick.number = 10, z = list(arrows = TRUE)),
          ylim = c(1:10), xlab = expression(phi1), ylab = "Percentile", zlab = " Loss", main = "Random Classifier",
          light.source = c(10, 10, 10), drape = TRUE,
          col.regions = rainbow(100, s = 1, v = 1, start = 0, end = max(1, 100 - 1)/100, alpha = 1),
          screen = list(z = -60, x = -60))
Note: my real data is a 100X100 matrix
Thanks
The data you are feeding to wireframe is a grid of values. Hence one estimate of the volume of whatever underlying surface this is approximating is the sum of the grid values multiplied by the grid cell areas. This is just like adding up the heights of unit-width histogram bars to get the number of values in your histogram.
The problem I see with you doing this on your data is that the cell areas are going to be in odd units - percentiles on one axis, phi on the other has unknown units, so your volume is going to have units of loss times units of percentile times units of phi.
This isn't a problem if you want to compare volumes of similar things on exactly the same grid, but if you have surfaces on different grids (different values of phi, or different percentiles) then you need to be careful.
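As a minimal sketch of the cell-sum estimate, assuming uniform grid spacing dx along the rows and dy along the columns (the function name volume_cell_sum() is hypothetical), the estimate is just the sum of the grid values times the area of one cell:

```r
# Hypothetical helper: "tower block" (Riemann-sum) volume estimate for a grid of heights.
# Assumes every cell has the same size dx x dy, in the units of your two axes.
volume_cell_sum <- function(z, dx = 1, dy = 1) {
  stopifnot(is.matrix(z), is.numeric(z))
  # Each grid value is treated as the height of a flat-topped cell of area dx * dy
  sum(z) * dx * dy
}
```

With the defaults dx = dy = 1, volume_cell_sum(df.matrix) treats each row and column step as one unit; passing the real spacings of phi and percentile gives a volume in loss x percentile x phi units, with the caveats above.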
Now, noting that wireframe doesn't draw the surface like a 3D histogram would (as square tower blocks), this gives us another way to estimate the volume. Your 10x10 matrix is plotted as 9x9 squares. Divide each of those squares into two triangles and then compute the volumes of the 162 truncated triangular prisms (each has a right-triangle base and one sloping end). The volume of each prism is its base area times the mean of its three corner heights, which is the same as the base area times the height of the sloping face above the triangle's centroid.
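The triangulation can be sketched in base R as follows; volume_triangulated() is a hypothetical name, and uniform grid spacing is assumed. Each grid square is cut along one diagonal into two right triangles, and each truncated prism contributes its base area times the mean of its three corner heights.

```r
# Hypothetical helper: volume under the piecewise-linear surface through the grid points.
# Assumes uniform spacing dx between rows and dy between columns.
volume_triangulated <- function(z, dx = 1, dy = 1) {
  stopifnot(is.matrix(z), nrow(z) >= 2, ncol(z) >= 2)
  nr <- nrow(z)
  nc <- ncol(z)
  # Corner heights of every grid square, each an (nr - 1) x (nc - 1) matrix
  z00 <- z[-nr, -nc, drop = FALSE]  # corner (i,     j)
  z10 <- z[-1,  -nc, drop = FALSE]  # corner (i + 1, j)
  z01 <- z[-nr, -1,  drop = FALSE]  # corner (i,     j + 1)
  z11 <- z[-1,  -1,  drop = FALSE]  # corner (i + 1, j + 1)
  # Cutting each square along the (i, j)-(i + 1, j + 1) diagonal gives two right triangles
  tri_area <- dx * dy / 2
  # Truncated prism volume = base area x mean of its three corner heights
  lower <- tri_area * (z00 + z10 + z11) / 3
  upper <- tri_area * (z00 + z01 + z11) / 3
  sum(lower + upper)
}
```

Because the top of each prism is planar, the result is exact whenever the surface itself is a plane; for a 10x10 matrix it sums 2 x 9 x 9 = 162 prisms.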
I thought maybe this would be in the raster package, but it isn't. There's code for computing the surface area but not the volume! I'm sure the raster maintainer would be happy to have some code for this!
If the points are arbitrary (i.e., they don't follow a smooth function), it seems like you're looking for the volume of the convex hull (the smallest convex shape) surrounding these points. One package that can help you calculate this is alphashape3d.
You'll need a 3-column matrix of the coordinates to create the right type of object for the calculation, but it seems rather straightforward.
In a JFreeChart XYSeries I want to draw the lines using a very dense set of points in order to show curves precisely; however, I want to draw the point markers less densely. For example, I have 100 data points, each 1 unit apart on the x axis, but I only want to draw a point marker every 5 units. I do, however, want the line to connect every point (every 1 unit) so that the curve is shown at high density.
Is this possible?
You can subclass XYLineAndShapeRenderer and override getItemShapeVisible(int series, int item).