Two questions about R circlepack plots

I have two questions about the circlepack plot produced by the code below:
The circles are never truly circular; they are always drawn as ellipses unless I adjust the plot window size in RStudio. Is there any way to handle this?
The positions of the circles and sub-circles change every time I rerun the code. How can I fix them?
library(ggraph)
library(igraph)
library(dplyr)
library(tidyverse)
md <- data.frame(category = c('FDM','FDM','FDM','LCD','LCD','LCD'),
item =c('A1','B1','C1','A','B','C'),
amount = c(1,2,3,4,3,1))
md_sum <- md %>% group_by(category) %>% summarise(amount =sum(amount)) %>% rename('item'='category')
md_v <- rbind(md[,c(2:3)],md_sum)
pt <- igraph::graph_from_data_frame(md,vertices = md_v)
ggraph(pt,layout = 'circlepack', weight =amount)+
geom_node_circle(aes(fill=depth))+
geom_node_label(aes(label = paste0(name,'\n',amount )))+theme_void()
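One possible approach, offered as a sketch rather than a confirmed fix: coord_fixed() forces a 1:1 aspect ratio, so the circles stay circular whatever the window size. Calling set.seed() before building the plot should make the layout repeatable, assuming the variation between runs comes from R's random number generator (the seed value 42 is arbitrary).

```r
library(ggraph)
library(igraph)
library(dplyr)

md <- data.frame(category = c('FDM','FDM','FDM','LCD','LCD','LCD'),
                 item = c('A1','B1','C1','A','B','C'),
                 amount = c(1,2,3,4,3,1))
md_sum <- md %>%
  group_by(category) %>%
  summarise(amount = sum(amount)) %>%
  rename(item = category)
md_v <- rbind(md[, c('item', 'amount')], as.data.frame(md_sum))
pt <- igraph::graph_from_data_frame(md, vertices = md_v)

# Assumption: the layout's randomness uses R's RNG, so a fixed seed
# gives the same circle positions on every run.
set.seed(42)
p <- ggraph(pt, layout = 'circlepack', weight = amount) +
  geom_node_circle(aes(fill = depth)) +
  geom_node_label(aes(label = paste0(name, '\n', amount))) +
  coord_fixed() +  # 1:1 aspect ratio keeps circles circular
  theme_void()
p
```

If the positions still move between runs, the randomness is coming from somewhere other than R's RNG and the seed will not help.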

Related

Plotting large sf dataframe

I need to work with a 10-minute (1/12th-degree) global grid of all land areas. The grid is generated in R using the sf package and is limited to the land areas of the world. A grid ID is needed for downstream analysis. The code that generates the grid is below:
library(tidyverse)
library(sf)
sf_use_s2(FALSE)
birds <- st_read('BOTW_breeding_valid_union.gpkg') ## This is just an example file I use to set a bbox.
bbox <- st_bbox(birds)
bbox[1] <- -180
bbox[2] <- -90
bbox[3] <- 180
bbox[4] <- 90
bbox <- bbox %>% st_as_sfc
grid <- st_make_grid(bbox, cellsize = 1/12) %>% st_as_sf() %>% mutate(grid_ID = row_number())
land <- st_read('ne_10m_land.shp')
land_grids <- st_intersects(grid, land) %>% as.data.frame() %>% rename(grid_ID = row.id)
grid <- grid %>% left_join(land_grids, by = "grid_ID") %>% filter(col.id == "1") %>% select(grid_ID) %>%
st_write('global_10m_grid.gpkg')
Now I need to plot it, both to inspect it and for further data mapping (the grid cells will have values). I use the tmap package:
library(tmap)
grid <- st_read('global_10m_grid.gpkg')
bitmap('test_grid.png')
tm_shape(grid) + tm_fill(col = 'red')
dev.off()
However, I am struggling with the size of the data. On a personal machine the plot takes an incredibly long time to render (it has not finished yet), and on a cluster with an interactive shell dev.off() produced an empty file.
Is there a way to plot this more efficiently?
Yes, rasterising was indeed the solution. Keeping the resolution of the vector grid in the raster results in files just under 11 MB, which open without trouble in a normal RStudio session on my desktop.
For future reference, the code looks like this:
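library(sf)
library(dplyr)
library(stars)
g <- st_read('yourfile.gpkg')
# df holds the values to map and is joined to the grid by the column grp
g %>% left_join(df, by = 'grp') %>% select(value) %>% st_rasterize(n = 2773927) %>% write_stars('filename.tif')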

Extracting and summarizing data from interactive histogram selection in R

I want to create an interactive histogram using plotly (or other package if better suited) in R from data similar to this example set:
test<-data.frame(sex=c("m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m"),weight=runif(80,5,9))
I want to show two overlaid histograms of the weight distribution per sex, with summary statistics such as standard deviation, mean and number of samples, all per sex as well as globally.
Also I want to be able to make a selection preferably using a range slider or selection box while updating these summary statistics to the selection. Then I want to be able to add a variable to the original dataset to indicate if a sample is part of the selection.
Thanks for any help! Even if it's just pointing to a relevant online resource, I'm struggling to find one that tackles a similar problem.
@DataZhukov, this is a revised answer based on your larger data sample. As discussed, I removed the side-by-side plot (think age pyramid) and show how to use {plotly} for histograms.
While {plotly} supports interactivity, it is based on the concept of a "static" HTML web page. This means that no "active" calculation is done on the client side, i.e. in the browser of the user viewing the page.
For simple stats/summaries you can look into {crosstalk} and SummaryWidget to enable (some) "dynamic" updates (i.e. client-side calculations).
For full-fledged dynamic select/filter/recalculate interactivity, {shiny} is the way to go. (But that is another ballgame.)
{plotly} allows you to place text annotations "freely" by adding an add_text() layer.
I construct the annotation labels from your data. You can also define them by hand as vectors.
If you use data frames as your input data structure, note that {plotly} uses the tilde notation (~) to refer to variables.
test<-data.frame(sex=c("m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m","m","m","f","f","m","m","f","m","f","m"),weight=runif(80,5,9))
library(dplyr)
library(plotly)
# calculate mean, sd, etc based on given data
# note you can also define this with simple vectors
total_stats <- test %>%
  summarise(SAMPLE = n(), MEAN_WEIGHT = mean(weight), SD = sd(weight)) %>%
  mutate(sex = "m+f")
group_stats <- test %>% group_by(sex) %>%
  summarise(SAMPLE = n(), MEAN_WEIGHT = mean(weight), SD = sd(weight))
my_stats <- bind_rows(total_stats, group_stats) %>%
mutate(LABEL = paste0(sex, " sample size: ", SAMPLE
, " with mean ", round(MEAN_WEIGHT, 2)
, " and SD ", round(SD, 2)
)
)
# format your text, e.g. font face and size ---- format to your liking
tf <- list(
family = "sans serif",
size = 11
)
The {plotly} call below draws the histogram with the bars of the two groups side by side (rather than overlapping) and adds a text layer to it.
test %>%
plot_ly() %>%
# ------------ plot histogram ----------------------
add_histogram( x = ~weight, color = ~sex
,nbinsx = 20 # set the number of bins you want/need
) %>%
# ------------ add annotation layer ---------------
## I provide x, y positions as vector, you could add and place
## each label as its own layer, i.e. add_text() call
add_text(data = my_stats
,x = c(5.2, 6,6.3), y = c(6, 5, 4.5)
,text = ~LABEL
,name = "" # left empty as we do not need to name the layer
,textfont = tf
,textposition = "right"
, showlegend = FALSE
) %>%
layout(yaxis = list(title =""))
You can freely define the x, y positions of your text annotations.
The default behaviour puts the count bars side by side. If you want them overlaid instead, plot two histograms and force these two graphical layers to overlay by setting barmode in the layout() layer. I also set an alpha transparency, as you may have overlapping counts in your data sample. Text placement etc. follows the principles shown above.
# split test data frame in a male and female df
males <- test %>% filter(sex == "m")
fems <- test %>% filter(sex == "f")
plot_ly(
alpha = 0.5 # set alpha to ensure visibility on overlapping counts
, nbinsx = 20 # set number of bins
) %>%
#------------ add a histogram layer per group -------------------
add_histogram(data = males, x = ~weight, name = "male") %>%
add_histogram(data = fems, x = ~weight, name = "female") %>%
#------------ tweak layout --------------------------------------
layout(
barmode = "overlay" # to change side-by-side default to overlay
)
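The question also asks for the summary statistics to follow a selection and for a flag column marking the selected samples. A minimal sketch of that data step is below. The helpers flag_selection() and summarise_selection() are hypothetical names, and the range is passed as plain arguments; in a {shiny} app the bounds would presumably come from a slider input instead.

```r
library(dplyr)

set.seed(1)
test <- data.frame(
  sex = sample(c("m", "f"), 80, replace = TRUE),
  weight = runif(80, 5, 9)
)

# Hypothetical helper: flag rows whose weight lies inside [lower, upper].
flag_selection <- function(df, lower, upper) {
  df %>% mutate(selected = weight >= lower & weight <= upper)
}

# Hypothetical helper: per-sex and overall summaries of the selected rows.
summarise_selection <- function(df) {
  sel <- filter(df, selected)
  bind_rows(
    sel %>%
      group_by(sex) %>%
      summarise(SAMPLE = n(), MEAN_WEIGHT = mean(weight), SD = sd(weight)),
    sel %>%
      summarise(SAMPLE = n(), MEAN_WEIGHT = mean(weight), SD = sd(weight)) %>%
      mutate(sex = "m+f")
  )
}

flagged <- flag_selection(test, lower = 6, upper = 8)
sel_stats <- summarise_selection(flagged)
sel_stats
```

The flag stays on the original data frame, so the same column can drive both the recalculated statistics and any later filtering.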

Replicating Hexagon heatmap in ggmaps [R]

I need to replicate the heatmap shown at https://rpubs.com/chrisbrunsdon/gwdplyr
I tried the same approach, but I cannot reproduce the final coloured hexagon heatmap.
My main issue is that I cannot even create the uncoloured map: st_buffer() and st_make_grid() take a very long time, and the result cannot be stored because the file is too large.
Below is an example with the locations I need to plot the heatmap on. The values are discrete and should be used as the fill colour in the heatmap.
library(sf)
library(dplyr)
library(ggplot2)
library(ggspatial)
df <-
data.frame(
x = runif(1000, min = 14.22, max = 14.71),
y = runif(1000, min = 49.94, max = 50.18),
value = sample(c(1:8))
)
hp <-
df%>%
st_as_sf(coords=c("x","y"),crs=27700)
gl_hexes <- st_buffer(hp,1) %>%
st_make_grid(cellsize=c(0.0005, 0.0005),square=FALSE) %>%
st_sf() %>% mutate(hex_ID=sprintf('Hex%04d',row_number()))
I will be thankful for any advice...
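A possible explanation, offered as an assumption: the x/y values look like longitude/latitude, but crs = 27700 declares them as British National Grid metres. With a cell size of 0.0005 over the buffered extent, st_make_grid() then has to build an enormous number of tiny hexagons. The sketch below declares the points as EPSG:4326, projects them to a metric CRS (UTM zone 33N, EPSG:32633, is a guess for this area) and uses an arbitrary 2000 m cell size. The mean is used only for illustration; a mode function may suit discrete values better.

```r
library(sf)
library(ggplot2)

set.seed(1)
df <- data.frame(
  x = runif(1000, min = 14.22, max = 14.71),
  y = runif(1000, min = 49.94, max = 50.18),
  value = sample(1:8, 1000, replace = TRUE)
)

# Assumption: x/y are longitude/latitude, so declare EPSG:4326 first,
# then project to a CRS with metre units.
pts <- st_as_sf(df, coords = c("x", "y"), crs = 4326)
pts <- st_transform(pts, 32633)

# Cell size is now in metres; 2000 m is an illustrative choice.
hexes <- st_sf(geometry = st_make_grid(pts, cellsize = 2000, square = FALSE))
hexes$hex_ID <- sprintf("Hex%04d", seq_len(nrow(hexes)))

# Mean value of the points falling in each hexagon (NA where empty).
hex_vals <- aggregate(pts["value"], hexes, FUN = mean)

p <- ggplot(hex_vals) +
  geom_sf(aes(fill = value)) +
  scale_fill_viridis_c(na.value = "grey90")
p
```

With these settings the grid should have a few hundred cells, which plots and saves quickly.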

Joining list of nest/ggplot2 generated images to two columns with consistent proportions

In summary, I would like to split a list of plots created using nest and ggplot2 into two columns. The problem with my approaches so far is that elements in different subplots end up with inconsistent dimensions, because the subplots have different heights (different numbers of elements in each group, possible exclusion of x-axis labels, etc.).
The following example code uses the main tidyverse packages. I first generate some dummy data using mtcars; cars are split into random groups and each car is assigned an in-group position.
dummy <- mtcars %>%
mutate(group = sample(1:10, n(), replace = TRUE)) %>%
filter(group < 6) %>%
group_by(group) %>%
mutate(position = 1:n())
The actual code produces a list of subplots (plots) and the number of elements in each group (heights).
## install patchwork via:
## install.packages("patchwork")
plots <- dummy %>%
  ungroup() %>%
  nest(data = -group) %>%  # current tidyr syntax; .key is deprecated
  mutate(plots = map(data, ~ggplot(data = .x, aes(x = position, y = hp)) +
      geom_bar(stat="identity") +
      coord_flip()),
    heights = map_int(data, nrow))
g.plot <- patchwork::wrap_plots(plots$plots, ncol = 1, heights = plots$heights)
The function wrap_plots produces a nice one-panel image using the information about heights (panel A of the included image). When there is a large number of subplots, the one-column plot is not practical. I would therefore kindly ask for help on how to turn the output of the above code (panel A) into the hoped-for output (panel B). Panel C exaggerates the problematic effect I have had with different image concatenation approaches.
Figure: current output and hoped-for output (panels A to C).
You could try setting the panel size to fixed dimensions and then arranging the gtables together:
library(egg)
library(gridExtra)
lg <- purrr::map2(plots$plots, plots$heights,
function(p,h) gtable_frame(ggplotGrob(p),
height =unit(h/10,'npc'), #tweak
width =unit(0.7,'npc'))) #tweak
grid.arrange(gtable_rbind(lg[[1]],lg[[2]], egg::.dummy_gtable),
gtable_rbind(lg[[3]],lg[[4]], egg::.dummy_gtable), ncol=2)
(tested with set.seed(12); I don't know what sample() OP had)
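An alternative that stays within patchwork can be sketched as follows. It is an assumption-laden illustration, not the approach tested above: each column is stacked with wrap_plots(), and the shorter column is padded with plot_spacer() so that both columns represent the same total number of rows. stack_column() is a hypothetical helper, and the toy plots stand in for plots$plots and plots$heights.

```r
library(ggplot2)
library(patchwork)

set.seed(12)
heights <- c(3, 5, 4, 2)  # stand-in for plots$heights
plot_list <- lapply(heights, function(h) {
  d <- data.frame(position = seq_len(h), hp = runif(h, 50, 300))
  ggplot(d, aes(x = position, y = hp)) + geom_col() + coord_flip()
})

# Hypothetical helper: stack plots in one column and pad with an empty
# spacer so the column's heights sum to `total`.
stack_column <- function(plots, heights, total) {
  pad <- total - sum(heights)
  if (pad > 0) {
    plots <- c(plots, list(plot_spacer()))
    heights <- c(heights, pad)
  }
  wrap_plots(plots, ncol = 1, heights = heights)
}

left <- 1:2
right <- 3:4
total <- max(sum(heights[left]), sum(heights[right]))

two_col <- stack_column(plot_list[left], heights[left], total) |
  stack_column(plot_list[right], heights[right], total)
two_col
```

The heights apply to the panel areas only, so axis text and titles still take their own space and the bar widths will match only approximately across columns.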

Fill geom_tile by mode of a factor variable or other ways to create a heat map in R

I am trying to create a heat map in R using three factors. I would like to be able to fill the colour using the modal category of one of the factors but I have not been able to find out how to do this.
When I try ggplot with geom_tile, it does produce the heatmap, however, I am not sure how it chooses the value of the fill variable. It certainly isn't the mode because I've checked this.
For instance, using the inbuilt dataset ChickWeight, I would like the fill to be based on the modal (most frequent) category of a variable "weight_group" I created.
library(dplyr)
library(ggplot2)
data(ChickWeight)
glimpse(ChickWeight)
ChickWeight$Time <- ifelse(ChickWeight$Time >= 10,1,0)
ChickWeight <- ChickWeight %>% mutate(weight_group = ntile(weight, 3))
ChickWeight$Diet <- as.factor(ChickWeight$Diet)
ChickWeight$Time <- as.factor(ChickWeight$Time)
ChickWeight$weight_group <- as.factor(ChickWeight$weight_group)
table(ChickWeight$Diet, ChickWeight$Time, ChickWeight$weight_group)
ggplot(data = ChickWeight, aes(x=Time, y=Diet, fill=weight_group)) +
geom_tile()
Based on the three-way table, the bottom right block should be pink (corresponding to weight_group==1) rather than green as the modal category of weight_group when Diet==1 & Time==1 is weight_group==1 (11 counts).
Any help on this would be greatly appreciated.
Thank you!
You can define a function getMode that calculates the mode of a vector using plyr's count function to create a data frame of the counts for each class. Then sort the data frame and get the top value.
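library(dplyr)
# plyr is called via :: rather than attached, so it does not mask dplyr verbs
getMode <- function(vec){
  # count occurrences of each class, most frequent first
  df <- plyr::count(vec) %>%
    dplyr::arrange(desc(freq))
  return(df[1, "x"])
}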
From here group by time and diet so you can find the mode for each combination of these groups and then use this as the fill for ggplot.
ChickWeight %>%
  group_by(Time, Diet) %>%
  # dplyr::summarise respects the grouping even if plyr has been attached
  dplyr::summarise(modeWeightGroup = getMode(weight_group)) %>%
  ggplot(aes(x=Time, y=Diet, fill= modeWeightGroup)) +
  geom_tile()
I also don't think that the bottom-right square should be weight_group 1. The three-way table is printed one weight_group slice at a time, so that cell says that, among chicks in weight_group 1, the modal (Time, Diet) combination is (1, 1).
Using dplyr to count the most frequent category of weight_group for each combination of Time and Diet:
ChickWeight %>%
group_by(Time, Diet) %>%
count(weight_group) %>%
filter(n == max(n)) %>%
ggplot(
aes(x = Time,
y = Diet,
fill = weight_group)
) +
geom_tile()
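One caveat with filter(n == max(n)): if two categories tie within a (Time, Diet) cell, both rows are kept and geom_tile() draws one tile on top of the other. A hedged variant is sketched below. It uses slice_max() with with_ties = FALSE to keep exactly one row per cell; which of the tied categories survives is then arbitrary. The names cw and modal are illustrative only.

```r
library(dplyr)

# Same preprocessing as in the question, on a plain data frame copy.
cw <- as.data.frame(datasets::ChickWeight) %>%
  mutate(
    Time = factor(ifelse(Time >= 10, 1, 0)),
    Diet = factor(Diet),
    weight_group = factor(ntile(weight, 3))
  )

# One row per (Time, Diet) cell: the most frequent weight_group.
modal <- cw %>%
  count(Time, Diet, weight_group) %>%
  group_by(Time, Diet) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%
  ungroup()

modal
```

Passing modal to ggplot() with fill = weight_group and geom_tile() then gives exactly one tile per cell.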
By the way, since you already use dplyr::mutate, note that you can do all of this pre-processing inside a single mutate.
That is, instead of:
ChickWeight$Time <- ifelse(ChickWeight$Time >= 10,1,0)
ChickWeight <- ChickWeight %>% mutate(weight_group = ntile(weight, 3))
ChickWeight$Diet <- as.factor(ChickWeight$Diet)
ChickWeight$Time <- as.factor(ChickWeight$Time)
ChickWeight$weight_group <- as.factor(ChickWeight$weight_group)
you can simply write:
ChickWeight <-
ChickWeight %>%
mutate(
Time = as.factor(ifelse(Time>=10, 1 ,0)),
Diet = as.factor(Diet),
weight_group = as.factor(ntile(weight, 3))
)
