I'm trying to figure out a visualisation using pie charts to represent the frequencies of a total population vs. a subpopulation.
I'd like to represent either:
1) angles defined by the subpopulation frequencies (colors), with the area (or radius) of the total population (gray) adjusted; or
2) angles defined by the total population frequencies (gray), with the area (or radius) of the subpopulation (colors) adjusted.
Here are my questions:
Is there an R package that can do that for me?
Which visualization is better, 1 or 2?
Should I adjust the radius or the area to compare the total population and the subpopulations?
Thanks.
I have a dataset of length values for plants sampled in a number of specific square areas
(e.g., not actual data, just a representation).
I want to create a graph in R that has the number of plants per plot (per area squared) on the x axis and the average length on the y axis - effectively exploring the effect of density per plot on length as pictured below.
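A minimal base R sketch of the summary described above. The data frame, its column names (plot, length) and all values are made up for illustration, and tapply() is just one of several ways to get per-plot counts and means:

```r
# Hypothetical data: one row per plant, with its plot ID and length.
plants <- data.frame(
  plot   = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  length = c(10, 12, 11, 15, 17, 8, 9, 7, 8)
)

# Number of plants and mean length per plot.
density_per_plot <- tapply(plants$length, plants$plot, length)
mean_length      <- tapply(plants$length, plants$plot, mean)
summary_df <- data.frame(
  plot        = names(density_per_plot),
  density     = as.vector(density_per_plot),
  mean_length = as.vector(mean_length)
)

# Density per plot on the x axis, mean length on the y axis.
plot(mean_length ~ density, data = summary_df,
     xlab = "Plants per plot", ylab = "Mean length")
```

If the plots differ in area, the counts would need to be divided by plot area before plotting.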
Just wondering what the best function/package would be to do this in
Thanks
I am plotting a food web ecological network with igraph and I am trying to keep my nodes from overlapping.
The y-axis depends on the attributes of the nodes (i.e. the species trophic levels, varying between 1-3.52, no nodes between [1,2] by convention).
To do so, I calculate how many nodes there are within certain intervals of the y-axis range (1 to 4 with increments of 0.5).
If all nodes have the same size (defined by the user), I calculate the length of the x-axis at each level of y as the sum of the sizes of the nodes (as if the nodes were next to each other on a single axis):
xl <- (nnodes + 1 + 2) * size
I add 1 here because the nodes are plotted from their centers, so there is half a node to the left of the minimum and to the right of the maximum of the x-axis. (The +2 is just to give a little space, and yes I know 1+2 = 3 :) it just makes the explanation easier.)
I then calculate the break points on the x-axis calculating a sequence from half of the length I calculated above, centered around 0.
seq(-xl/2, xl/2,length.out = nnodes)
Doing so, each center of the nodes should be at a breakpoint on the x-axis with some distance between the nodes.
If the nodes have different sizes (defined by the user), I use the maximum size in the formula above so that I have enough space between each center of the nodes.
I combine the x-axis and y-axis coordinates as a matrix to use as my layout.
coordsp <- cbind(xcoord, TLsp)
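Put together, the steps above can be sketched as follows. The trophic levels and the node size are made-up values, and placing a lone node at x = 0 is an assumption of this sketch (seq() with length.out = 1 returns only its starting value, which would put the node at the left end):

```r
# Hypothetical inputs: trophic levels for 8 species and a common node size.
TLsp <- c(1, 1, 1, 2.1, 2.3, 2.8, 3.2, 3.52)
size <- 0.1

# Bin the trophic levels into intervals of width 0.5 between 1 and 4.
breaks <- seq(1, 4, by = 0.5)
bin <- cut(TLsp, breaks = breaks, include.lowest = TRUE, right = FALSE)

xcoord <- numeric(length(TLsp))
for (b in levels(bin)) {
  idx <- which(bin == b)
  nnodes <- length(idx)
  if (nnodes == 0) next
  # Axis length for this level, then evenly spaced node centers around 0.
  xl <- (nnodes + 1 + 2) * size
  xcoord[idx] <- if (nnodes == 1) 0 else seq(-xl / 2, xl / 2, length.out = nnodes)
}

# Two-column layout matrix: x from the spacing rule, y from the trophic level.
coordsp <- cbind(xcoord, TLsp)
```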
I use rescale within the parameters of the plotting function.
plot(network, layout = coordsp,
rescale=TRUE,...)
I still get some overlap.
I tried using rescale = FALSE and specifying the x-axis and y-axis limits (the range of the coordinates in coordsp), but the graph ends up looking weird or the plotting window turns completely white.
I think the issue when rescale = TRUE is that it rescales the coordinates but not the size of the nodes, so that if they are too big for a window [-1,1]x[-1,1], the nodes end up overlapping.
Any guess on how to avoid overlap?
Thanks in advance
I'm trying to classify bivariate point patterns into groups using spatstat. The patterns are derived from whole-slide images of lymph nodes with cancer. I've trained a neural network to recognize cells of three types (cancer “LP”, immune cells “bcell” and all other cells). I do not wish to analyse all the other cells, but use them to construct a polygonal window in the shape of the lymph node. Thus, the patterns to be analysed are immune cells and cancer cells in polygonal windows. Each pattern can have several tens of thousands of cancer cells and up to 2 million immune cells. The patterns are of the type “Small World Model”, as there is no possibility of points lying outside the window.
My classification should be based on the position of the cancer cells in relation to the immune cells. E.g. most cancer cells lie on the “islands” of immune cells, but in some cases the cancer cells are (seemingly) uniformly dispersed and there are only a few immune cells. In addition, the patterns are not always uniform across the node. As I'm rather new to spatial statistics, I developed a simple and crude method to classify the patterns. Here it is in short:
I calculated a kernel density of the immune cells with sigma=80 because this looked “nice” to me. Den <- density(split(cells)$"bcell", sigma=80, window=cells$window) (Should I have used e.g. sigma=bw.scott instead?)
Then I created a tessellation image by dividing density range in 3 parts (here again, I experimented with the breaks to get some “good looking results”).
rangesDenMax<-2*range(Den)[2]/3
rangesDenMin<-range(Den)[2]/3
map.breaks<-c(-Inf,rangesDenMin,rangesDenMax,Inf)
map.cuts <- cut(Den, breaks = map.breaks, labels = c("Low B-cell density","Medium B-cell density", "High B-cell density"))
map.quartile <- tess(image = map.cuts,window=cells$window)
tessImage<-map.quartile
Here are some examples of the plots of the tessellations with the cancer cell overlay (white dots). The lymph node on the left has a typical uniformly distributed “islands” of immune cells while the node on the right has only a few dense spots of immune cells and cancer cells not restricted to those spots:
heat map: immune cell kernel density, white dots: cancer cells
Then I measured a silly number of variables, which should give me a clue of how the cancer cells are distributed across the tessellation tiles (the calculation code is trivial so I post only the description of my variables):
LPlwB<-c() # proportion of cancer cells in low-b-cell-area
LPmdB<-c() # proportion of cancer cells in medium-b-cell-area
LPhiB<-c() # proportion of cancer cells in high-b-cell-area
AlwB<-c() # proportion of the low-b-cell area
AmdB<-c() # proportion of the medium-b-cell area
AhiB<-c() # proportion of the high-b-cell area
LPm1<-c() # mean distance to the 1st neighbour
LPm2<-c() # mean distance to the 2nd neighbour
LPm3<-c() # mean distance to the 3rd neighbour
LPsd1<-c() # standard deviation of the distance to the 1st neighbour
LPsd2<-c() # standard deviation of the distance to the 2nd neighbour
LPsd3<-c() # standard deviation of the distance to the 3rd neighbour
meanQ<-c() # mean quadratcount (I visually chose the quadrat size to be not too large and not too small)
sdevQ<-c() # standard deviation of the mean quadratcount
hiSAT<-c() # realised cancer cells saturation in high b-cell-area (number of cells observed divided by a number of cells, which could be fitted into the area considering the observed min distance between the cells)
mdSAT<-c() # realised cancer cells saturation in medium b-cell-area
lwSAT<-c() # realised cancer cells saturation in low b-cell-area
ll<-c() # Proportion LP neighbours of LP (contingency table count divided by total points)
lb<-c() # Proportion b-cell neighbours of LP
bl<-c() # Proportion LP neighbours of b-cells
bb<-c() # Proportion b-cell neighbours of b-cells
I z-scaled the variables, inspected them on a PCA plot (the vectors pointed in different directions like the needles of a sea urchin) and performed a hierarchical cluster analysis. I chose k by calculating fviz_nbclust(scaled_variables, hcut, method = "silhouette"). After dividing the dendrogram into k clusters and checking the cluster stability, I ended up with my groups, which seemed to make sense as cases with “islands” were separated from the "more dispersed" ones.
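In outline, the scaling and clustering step looks like the sketch below. It runs on random stand-in data, uses base R's hclust() with Ward linkage in place of hcut(), and fixes k = 3 as a placeholder; all three are assumptions made for illustration only:

```r
set.seed(1)
# Stand-in data: 20 slides x 5 variables (the real table has one row per slide).
vars <- matrix(rnorm(100), nrow = 20,
               dimnames = list(NULL, paste0("v", 1:5)))

scaled_variables <- scale(vars)    # z-scale each column
pca <- prcomp(scaled_variables)    # PCA, for visual inspection of the variables

# Hierarchical clustering on Euclidean distances between slides.
hc <- hclust(dist(scaled_variables), method = "ward.D2")

k <- 3                             # placeholder; chosen by silhouette in practice
groups <- cutree(hc, k = k)        # cluster label for each slide
table(groups)
```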
However, given the possibilities of the spatstat package I strongly feel like hitting nails into the wall with a smartphone.
It seems you are trying to quantify the way in which the cancer cells are positioned relative to the immune cells. You could do this by something like
Cancer <- split(cells)[["LP"]]
Immune <- split(cells)[["bcell"]]
Dimmune <- density(Immune, sigma=80)
f <- rhohat(Cancer, Dimmune)
plot(f)
Then f is a function that indicates the intensity (number per unit area) of cancer cells as a function of the density of immune cells. The plot shows the density of cancer cells on the vertical axis, against the density of immune cells on the horizontal axis.
If the graph of this function is flat, it means that the cancer cells are not paying attention to the density of immune cells. If the graph is steeply declining it means that cancer cells tend to avoid immune cells.
I suggest you first look at the plot of f for some example datasets to decide whether f has any ability to discriminate between spatial arrangements that you think should be classified as different. If so then you can use as.data.frame to extract the values of f and then use classical discriminant analysis (etc) to classify the slide images into groups.
Instead of density(Immune) you could use any other summary of the immune cells.
For example D <- distfun(Immune) would give you the distance to the nearest immune cell, and then f would compute the density of cancer cells as a function of the distance to nearest immune cell. And so on.
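A minimal sketch of the distance-based variant, including the as.data.frame step. The two point patterns are simulated in the unit square purely as stand-ins for the real cell data, so the shape of the resulting curve carries no meaning here:

```r
library(spatstat)

set.seed(42)
# Simulated stand-ins for the two cell types (assumption: unit-square window).
Immune <- runifpoint(300)
Cancer <- runifpoint(100)

# Covariate: distance to the nearest immune cell.
D <- distfun(Immune)

# Intensity of cancer cells as a function of that distance.
f <- rhohat(Cancer, D)

# Tabulated values of f, usable as features for a classifier.
fdf <- as.data.frame(f)
head(fdf)
```

To compare slides, the estimates would presumably need to be evaluated on a common grid of covariate values, so that each slide contributes a feature vector of the same length.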
I cannot find the following information in the reference literature [1]:
1) How does adaptive.density() (package spatstat) handle duplicated spatial points? I have duplicated points at exactly the same position because I am combining measurements from different years. I expect the estimated density to be higher in those areas, but I am not sure about it.
2) Is the default value of f in adaptive.density() f=0 or f=1?
My guess is that it is f=0, so it is doing an adaptive estimate by calculating the intensity estimate at every location equal to the average intensity (number of points divided by window area)
Thank you for your time and input!
The default value of f is 0.1 as you can see from the "Usage" section in the help file.
The function subsamples the point pattern with this selection probability and uses the resulting pattern to generate a Dirichlet tessellation (if there are duplicated points here they are ignored). The other fraction of points (1-f) is used to estimate the intensity by the number of points in each tile of the tessellation divided by the corresponding area (here duplicated points count equally to the total count in the tile).
In R's excellent ICEinfer package which graphically plots incremental cost-effectiveness ratios from a bootstrap replication to denote uncertainty, the output by default is expressed in terms of cost (or effectiveness) units on both x and y axes.
The code is
dpunc <- ICEuncrt(
  your.data,
  treatment group, effectiveness, cost,
  lambda = ?,  # shadow price of health
  R = ?        # number of bootstrap replications
)
I want to plot the scatter plot in natural units, i.e. costs on the y axis (which will always be large numbers) and effectiveness on the x axis (which will be small numbers, such as QALYs of 0.10, 0.20, etc.). For publication (say, in journal papers) I think it is wise for a scatter plot to show each variable (change in cost on y and change in effect on x) in its natural units. I don't know how to change the graph or set it up so that it plots in natural units along the x and y axes. Any suggestions?