How can I exclude small boxes in a treemap in R?

I have treemap code in R:
treemap(df,
        index = c("Account.Name"),
        vSize = "X2017",
        type = "index",
        palette = "Reds",
        title = "Test tree",
        fontsize.title = 14
)
This code generates a treemap, but it contains many very small boxes whose sum (the default fun.aggregate) of "X2017" per "Account.Name" is tiny. Is there a way to exclude these small boxes, e.g. by setting some minimum limit?

You can do the aggregation prior to creating the treemap. For example:
library(dplyr)
library(treemap)
df_sum <- df %>%
  group_by(Account.Name) %>%
  summarise(X2017 = sum(X2017)) %>%
  filter(X2017 > 10)

treemap(df_sum,
        index = c("Account.Name"),
        vSize = "X2017",
        type = "index",
        palette = "Reds",
        title = "Test tree",
        fontsize.title = 14
)
The above will first aggregate (sum) the X2017 field by Account Name and then keep only cases where X2017 > 10 (change this to your desired value). The rest is the same as your code but with the aggregated data frame as the input.
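If a fixed cutoff is awkward to choose, a relative threshold is a possible variation: drop accounts that contribute less than some share of the total. A minimal sketch (the 1% share below is an assumed value for illustration, not from the question):
library(dplyr)

# keep only accounts contributing at least 1% of the total X2017 sum
df_sum <- df %>%
  group_by(Account.Name) %>%
  summarise(X2017 = sum(X2017)) %>%
  filter(X2017 / sum(X2017) >= 0.01)   # 0.01 is an assumed threshold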

Related

Display value other than 'size' with sunburstR

The following code generates a simple, interactive sunburst using sunburstR (example taken from https://timelyportfolio.github.io/sunburstR/articles/sunburst-2-0-0.html). When you hover over any section it displays "size", and the wedges are also colored according to "size". I would like to be able to manually specify a value other than 'size' that appears on hover and that is used to color the wedges. Is this possible? In other words, I would like all wedges to be the same size but be able to specify a different value for each wedge.
library(sunburstR)
library(htmltools)
library(d3r)

dat <- data.frame(
  level1 = rep(c("a", "b"), each = 3),
  level2 = paste0(rep(c("a", "b"), each = 3), 1:3),
  size = c(10, 5, 2, 3, 8, 6),
  stringsAsFactors = FALSE
)
knitr::kable(dat)

tree <- d3_nest(dat, value_cols = "size")
tree

sb1 <- sunburst(tree, width = "100%", height = 400)
sb1
Just now seeing this, and sorry for the delay. We can specify a field other than size with the valueField argument. See https://bl.ocks.org/timelyportfolio/616fc81b3bacee0d34a2975d53e9203a for an example.
library(treemap)
library(sunburstR)
library(d3r)

# use example from ?treemap::treemap
data(GNI2014)
tm <- treemap(GNI2014,
              index = c("continent", "iso3"),
              vSize = "population",
              vColor = "continent",
              type = "index")

tm_nest <- d3_nest(
  tm$tm[, c("continent", "iso3", "vSize", "color")],
  value_cols = c("vSize", "color")
)

sunburst(
  jsondata = tm_nest,
  valueField = "vSize",
  count = TRUE,
  colors = htmlwidgets::JS("function(d){return d3.select(this).datum().color;}")
)
The example above also shows how to change the color based on a column in the data.frame using a JavaScript function.
Here is another example of controlling color: https://github.com/timelyportfolio/sunburstR/issues/17#issuecomment-228448029.
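A simpler variant that avoids custom JavaScript, if a fixed palette is enough, is to hand colors a plain vector of hex values and let the widget cycle through them. A minimal sketch reusing tm_nest from above (the particular palette is just an assumption for illustration):
library(sunburstR)

# same nested data as above, fixed palette instead of a JS color accessor
sunburst(
  jsondata = tm_nest,
  valueField = "vSize",
  colors = c("#003f5c", "#58508d", "#bc5090", "#ff6361", "#ffa600")
)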

Shiny and Leaflet integration is really slow - how can I speed it up?

Right now I'm almost certain that my current use of Shiny and Leaflet is sub-optimal.
At a high level my current approach looks like this:
Generate a leaflet.
Create a reactive dataframe on user input.
Create a reactive dataframe of lat lon coordinates on user selection of their area of interest.
Merge a spatial dataframe (containing postcode polygon boundaries) with the reactive dataframe from step 2, then draw the map with the joined dataframe. This keeps all the data necessary for drawing polygons, adding colorBins and fillColor and labels inside the same final dataframe.
In more detail, the steps are executed as follows:
Generate a map like this:
output$leaflet_map <- renderLeaflet({
  leaflet() %>%
    addTiles()
})
Produce a reactive dataframe of marketing data to be joined onto an sf spatial dataframe containing postcode polygons via sp::merge() (the join happens a little later; I'll get to that):
reactive_map_data1 <- reactive({
  df %>%
    filter(BrandAccount_Brand %in% input$selectBrandRecruitment1) %>%
    group_by(POA_CODE, ordertype) %>%
    summarise("Number of Orders type and postcode" = n(),
              "AOV" = round(mean(TotalDiscount), 2)) %>%
    left_join(seifa, by = "POA_CODE") %>%
    left_join(over25bypostcode, by = "POA_CODE") %>%
    mutate(`Proportion of Population Over 25` = round(n() / `25_and_over` * 100, 4))
})
Create a reactive dataframe containing the lat and lon coordinates of the State selected by the user to be fed into the call to render the map:
reactive_state_recruitment1 <- reactive({
  australian_states %>%
    filter(States == input$selectState_recruitment1)
})
Render the final map (profvis determines that this is in fact the slow part):
observeEvent(
  input$gobutton_recruitment1, {
    ## First I load the spatial data with each call to render the
    ## map - this is almost certainly sub-optimal, however I can't
    ## think of another way to do this, as each time the data are
    ## joined I have no other way of re-setting the gdal.postcodes2
    ## spatial dataframe to its original state, which is why I reload
    ## it from .rds each time:
    gdal.postcodes_recruitment1 <- readRDS("gdal.postcodes2.rds")

    ## I then merge the marketing `reactive_map_data1()` dataframe
    ## created in Step 2 with the freshly loaded `gdal.postcodes2`
    ## spatial dataframe - `profvis` says this is pretty slow, but
    ## not as slow as the rendering of the map
    gdal.postcodes_recruitment1@data <- sp::merge(gdal.postcodes_recruitment1@data,
                                                  reactive_map_data1(),
                                                  by.x = "POA_CODE", all.x = TRUE)

    ## Next I generate the domain of `colorBin` with the `Number of
    ## Orders type and postcode` variable that only exists after the
    ## merge and is subject to change from user input - it resides
    ## within the `reactive_map_data1()` dataframe that gets merged
    ## onto the `gdal.postcodes2()` spatial dataframe.
    pal <- colorBin("YlOrRd",
                    domain = gdal.postcodes_recruitment1$`Number of Orders type and postcode`,
                    bins = bins_counts)

    ## Lastly I update the leaflet with `leafletProxy()` to draw the
    ## map with polygons and fill colour based on the
    ## `reactive_map_data1()` values
    leafletProxy("leaflet_map_recruitment1", data = gdal.postcodes_recruitment1) %>%
      addPolygons(data = gdal.postcodes_recruitment1,
                  fillColor = ~pal(gdal.postcodes_recruitment1$`Number of Orders type and postcode`),
                  weight = 1,
                  opacity = 1,
                  color = "white",
                  dashArray = "2",
                  fillOpacity = .32,
                  highlight = highlightOptions(
                    weight = 3.5,
                    color = "white",
                    dashArray = "4",
                    fillOpacity = 0.35,
                    bringToFront = TRUE),
                  layerId = gdal.postcodes_recruitment1@data$POA_CODE,
                  label = sprintf(
                    "<strong>%s<br/>%s</strong><br/>%s<br/>%s<br/>%s<br/>%s",
                    paste("Postcode: ", gdal.postcodes_recruitment1$POA_CODE, sep = ""),
                    paste("% of Population Over 25: ", gdal.postcodes_recruitment1$`Proportion of Population Over 25`, "%"),
                    paste("Number of Orders: ", gdal.postcodes_recruitment1$`Number of Orders type and postcode`, sep = ""),
                    paste("Ave Order Value: $", gdal.postcodes_recruitment1$`AOV`, sep = ""),
                    paste("Advantage & Disadvantage: ", gdal.postcodes_recruitment1$`Relative Socio-Economic Advantage and Disadvantage Decile`, sep = ""),
                    paste("Education and Occupation: ", gdal.postcodes_recruitment1$`Education and Occupation Decile`, sep = "")
                  ) %>%
                    lapply(htmltools::HTML),
                  labelOptions = labelOptions(
                    style = list("font-weight" = "normal", padding = "3px 8px"),
                    textsize = "15px",
                    direction = "auto")) %>%
      addLegend("bottomright", pal = pal, values = ~bins_counts,
                title = "# of Recruits (All Time)",
                labFormat = labelFormat(suffix = ""),
                opacity = 1) %>%
      setView(lng = reactive_state_recruitment1()$Lon,
              lat = reactive_state_recruitment1()$Lat,
              zoom = reactive_state_recruitment1()$States_Zoom)
  })
All up, the map takes between 7 and 20 seconds to render, as the data are quite large.
Some things to note:
The polygons have already been simplified to death; they currently display only 10% of the detail originally provided by the Australian Bureau of Statistics to define postcode boundaries. Simplifying the polygons further is not an option.
sp::merge() is not the fastest join function I have come across, but it is necessary in order to merge a spatial dataframe with a non-spatial dataframe (other joins, such as those offered by dplyr, will not accomplish this; the sp::merge() documentation suggests this has something to do with S3 vs S4 data types). In any case, this part is not the slow part according to profvis.
According to profvis, the actual rendering of the map in step 4 (drawing the polygons) is the slow part. Ideally, a solution would draw the polygons on the original leaflet once and only update the fillColor and labels applied to each polygon when the 'Go' actionButton is pressed. I have not figured out a way to do this.
Can anyone think of a way to restructure this whole procedure to optimise efficiency?
Any input is greatly appreciated.
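One low-risk change suggested by the comments in the code above (a sketch, not a tested fix): the .rds reload on every button press is unnecessary. sp::merge() returns a new object, and under R's copy-on-modify semantics assigning the loaded polygons to another name and then replacing that copy's @data slot leaves the original untouched, so the polygons can be loaded once at app start:
## load once at app start, outside the server function
gdal.postcodes2 <- readRDS("gdal.postcodes2.rds")

observeEvent(input$gobutton_recruitment1, {
  ## plain assignment yields an independent copy on first modification,
  ## so gdal.postcodes2 stays in its original state between clicks
  gdal.postcodes_recruitment1 <- gdal.postcodes2
  gdal.postcodes_recruitment1@data <- sp::merge(gdal.postcodes_recruitment1@data,
                                                reactive_map_data1(),
                                                by.x = "POA_CODE", all.x = TRUE)
  ## ...rest of the palette and leafletProxy code unchanged
})
This removes the per-click disk read; the polygon drawing itself would still dominate according to profvis.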

How to get the grouping right in R with Plotly

I have a problem grouping my data in Plotly under R. To start with, I was using local data from a csv file, read with:
geogrid_data <- read.delim('geogrid.csv', row.names = NULL, stringsAsFactors = TRUE)
and the plotting went well, using the following:
library(plotly)
library(RColorBrewer)

x <- list(
  title = 'Date'
)

p <- plotly::plot_ly(geogrid_data,
                     type = 'scatter',
                     x = ~ts_now,
                     y = ~absolute_v_sum,
                     text = paste('Table: ', geogrid_data$table_name,
                                  '<br>Absolute_v_Sum: ', geogrid_data$absolute_v_sum),
                     hoverinfo = 'text',
                     mode = 'lines',
                     color = list(
                       color = colorRampPalette(RColorBrewer::brewer.pal(11, 'Spectral'))(
                         length(unique(geogrid_data$table_name))
                       )
                     ),
                     transforms = list(
                       list(
                         type = 'groupby',
                         groups = ~table_name
                       )
                     )
) %>% layout(showlegend = TRUE, xaxis = x)
The output plotted as expected, with one correctly grouped line per table_name (screenshot omitted).
Then I changed the data source to an Oracle database table, reading the data as follows using the ROracle package:
# retrieve data into resultSet object
rs <- dbSendQuery(con, "SELECT * FROM GEOGRID_STATS")
# fetch records from the resultSet into a data.frame
geogrid_data <- fetch(rs)
# free resources occupied by resultSet
dbClearResult(rs)
dbUnloadDriver(drv)
# remove duplicates from dataframe (based on TABLE_NAME, TS_BEFORE, TS_NOW, NOW_SUM)
geogrid_data <- geogrid_data %>% distinct(TABLE_NAME, TS_BEFORE, TS_NOW, NOW_SUM, .keep_all = TRUE)
# alter date columns in place
geogrid_data$TS_BEFORE <- as.Date(geogrid_data$TS_BEFORE, format='%d-%m-%Y')
geogrid_data$TS_NOW <- as.Date(geogrid_data$TS_NOW, format='%d-%m-%Y')
and adjusting the plotting to:
p <- plotly::plot_ly(
  type = 'scatter',
  x = geogrid_data$TS_NOW,
  y = geogrid_data$ABSOLUTE_V_SUM,
  text = paste('Table: ', geogrid_data$TABLE_NAME,
               '<br>Absolute_v_Sum: ', geogrid_data$ABSOLUTE_V_SUM,
               '<br>Date: ', geogrid_data$TS_NOW),
  hoverinfo = 'text',
  mode = 'lines',
  color = list(
    color = colorRampPalette(RColorBrewer::brewer.pal(11, 'Spectral'))(
      length(unique(geogrid_data$TABLE_NAME))
    )
  ),
  transforms = list(
    list(
      type = 'groupby',
      groups = geogrid_data$TABLE_NAME
    )
  )
) %>% layout(showlegend = TRUE, xaxis = x)
Unfortunately, this leads to a problem with the grouping:
As you can see from the label text when hovering over a data point (screenshot omitted), the point represents data from NY_SKOV_PLANTEB_MW_POLY while the legend says NY_BYGN_MW_POLY. Looking at other data points in this graph, I found a wild mix of points of all sorts, some of them representing data from NY_BYGN_MW_POLY, most of them not.
Also, the plotting along the timeline no longer works; e.g., data are plotted starting on Dec 11, then Dec 10, Dec 10, Dec 12, Dec 20, Dec 17, Dec 16, Dec 15.
Where do I go wrong in handling the data, and what do I have to do to get it right?
Of course, one should look at the data... thanks Marco; after your question I did look at my data.
There were some points where I had simply assumed things.
The reason all the data plotted fine from the csv file is simple: the information manually compiled into the csv came from emails that were already ordered by date. Hence the csv data were ordered by date, and Plotly had no problem grouping them by table_name.
After looking at my data, I tidied up, keeping only the columns I need for the plot, and used dplyr to sort the data by time:
geogrid_data <- dplyr::arrange(geogrid_data, TS_NOW)
It is sorted only by time, not by time and table name, because the grouping by table name is handled anyway by Plotly's groupby transform.
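If points ever still appear out of order within a single trace, a slightly stricter variant (a defensive sketch; the single-key sort above already works here) is to sort by table first and then by time:
library(dplyr)

# sort within each table first, then chronologically
geogrid_data <- dplyr::arrange(geogrid_data, TABLE_NAME, TS_NOW)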

How to change the chart height in the hchart() function in R (highcharter package) without using the pipe operator?

I built a Shiny app where I create some plots from hist() and density() objects, both saved in a list in an .RDS file by another script. So in Shiny I only read the .RDS and make the plot.
Everything is working, except that I cannot find how to change the height of the highchart plot produced by the hchart() function. The way my code is built, I cannot work with pipes ("%>%") because I am calling hchart inside a purrr::map() function.
To explain better, I created a small example, which follows.
# Example of how the objects are structured
list <-
  list(df1 = list(Sepal.Length = hist(iris$Sepal.Length, plot = FALSE)),
       df2 = list(Sepal.Length = density(iris$Sepal.Length)))

# Example of a plot built with the hchart function
list[['df2']]['Sepal.Length'] %>%
  purrr::map(hchart, showInLegend = FALSE)

# Example of what does not work
list[['df2']]['Sepal.Length'] %>%
  purrr::map(hchart, showInLegend = FALSE, height = 200)
Actually, I would also like to change more chart options, colors for example, but I have not found a way to do so with this approach.
Thanks in advance.
Wlademir.
I can see 2 main ways to do what you need (not sure why you can't use the pipe):
Option 1
Create a function that processes each dataset and adds the options inside that function:
get_hc <- function(d) {
  hchart(d, showInLegend = FALSE) %>%
    hc_size(height = 200) %>%
    hc_title(text = "Purrr rocks")
}
Then:
list_of_charts <- list[['df2']]['Sepal.Length'] %>%
  purrr::map(get_hc)
Option 2
You can apply purrr::map successively:
list_of_charts <- list[['df2']]['Sepal.Length'] %>%
  purrr::map(hchart, showInLegend = FALSE)
# change height
list_of_charts <- purrr::map(list_of_charts, hc_size, height = 200)
# change title
list_of_charts <- purrr::map(list_of_charts, hc_title, text = "Purrr rocks")
Or you can chain purrr::map and %>% successively:
list_of_charts <- list[['df2']]['Sepal.Length'] %>%
  purrr::map(hchart, showInLegend = FALSE) %>%
  purrr::map(hc_size, height = 200) %>%
  purrr::map(hc_title, text = "Purrr rocks")
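A third variant (a sketch equivalent to Option 1, just inlined) is a formula-style anonymous function inside purrr::map, which keeps the per-chart pipe without defining a named helper:
library(highcharter)
library(purrr)

# ~ ... creates an anonymous function; .x is the current list element
list_of_charts <- list[['df2']]['Sepal.Length'] %>%
  purrr::map(~ hchart(.x, showInLegend = FALSE) %>%
               hc_size(height = 200) %>%
               hc_title(text = "Purrr rocks"))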

R Highcharter map from customized shapefile

I am having trouble importing a geojson map and joining it to some data using the highcharter library. I am trying to use a slimmed-down version of an sf dataset that I obtained with the tidycensus package and then uploaded to https://mapshaper.org/ to reduce the file size by thinning out the polygons. After thinning, I exported it as geojson and imported it into R.
Here is an example. First I download the data using tidycensus and create two datasets, one for geometry and one for the attribute of interest, here median family income. Then I export the geometry data so that I can feed it into mapshaper for reduction.
library(tidycensus)
library(dplyr)
library(sf)

# start with an example for one state
## pull geometry data for one state
md_data <- get_acs(geography = "tract",
                   state = "MD",
                   variables = "B19113_001",
                   geometry = T,
                   key = Sys.getenv("CENSUS_API_KEY"))

# data set of just GEOID and median family income for use in mapping
md_mfi <- as.data.frame(md_data) %>%
  mutate(median_family_income = case_when(is.na(estimate) ~ 0,
                                          TRUE ~ estimate)) %>%
  select(GEOID, median_family_income)

# slim down to just the geoid and the geometry data
md_tracts <- md_data %>%
  select(GEOID, geometry)

st_write(md_tracts, "U:/M1JPW00/GeoSpatial/census_tracts/acs_carto_2016/md_carto_tracts.shp")
After reformatting in mapshaper, I import it back into R:
md_map_json <- jsonlite::fromJSON(txt = "FILEPATH/md_carto_tracts.json",simplifyVector = FALSE)
md_map_json <- geojsonio::as.json(md_map_json)
and then try to build a map based on an example from the highcharter docs:
> class(md_map_json)
[1] "json" "geo_json"
> head(md_mfi)
GEOID median_family_income
1 24001000100 54375
2 24001000200 57174
3 24001000300 48362
4 24001000400 52038
5 24001000500 46174
6 24001000600 49784
highchart(type = "map") %>%
  hc_add_series(mapData = md_map_json,
                data = list_parse(md_mfi),
                joinBy = "GEOID",
                value = "median_family_income",
                name = "Median Family Income")
The map actually renders, and the census tracts are colored solid blue, but the series data doesn't seem to join successfully, whether or not I use list_parse.
I had the same problem, asked here: Make a choropleth from a non-highmap-collection map. Nobody responded (I know!), so I finally got to a solution that I think should work for you too:
# Work with the map you get up to this step:
md_map_json <- jsonlite::fromJSON(txt = "FILEPATH/md_carto_tracts.json", simplifyVector = FALSE)

# This part is unnecessary:
# md_map_json <- geojsonio::as.json(md_map_json)

# Then, write your map like this:
highchart() %>%
  hc_add_series_map(md_map_json, md_mfi,
                    value = "median_family_income", joinBy = "GEOID")
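One optional refinement that makes a successful join easy to verify (a sketch; the gradient endpoints are assumed values) is to add an explicit color axis, so the tract fill varies with the joined value instead of defaulting to a single color:
library(highcharter)

highchart() %>%
  hc_add_series_map(md_map_json, md_mfi,
                    value = "median_family_income", joinBy = "GEOID") %>%
  # an explicit gradient makes a failed join (solid single color) obvious
  hc_colorAxis(minColor = "#FFFFFF", maxColor = "#C40003")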
