I am trying to build charts with multiselect options in a Streamlit app. So far I have only managed to build a chart with a single-choice selectbox (no multiple choices).
This is the code that works with a single selection:
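import pandas as pd
import plotly.express as px
import streamlit as st
df = pd.DataFrame(px.data.gapminder())
def plot():
    # unique country names to offer in the dropdown
    clist = df['country'].unique()
    country = st.selectbox('Select country', clist)
    st.header(f'You selected: {country}')
    # one line for the single selected country
    fig = px.line(df[df['country'] == country], x="year", y="gdpPercap", title=country)
    st.plotly_chart(fig)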
But when I replace st.selectbox with st.multiselect, the plot no longer works.
You can do it like this:
import pandas as pd
import streamlit as st
import plotly.express as px
import plotly.graph_objects as go
def plot():
    df = pd.DataFrame(px.data.gapminder())
    clist = df["country"].unique().tolist()
    # st.multiselect returns a list of the chosen countries
    countries = st.multiselect("Select country", clist)
    st.header("You selected: {}".format(", ".join(countries)))
    # map each selected country to its own sub-dataframe
    dfs = {country: df[df["country"] == country] for country in countries}
    fig = go.Figure()
    # add one trace per selected country
    for country, country_df in dfs.items():
        fig.add_trace(go.Scatter(x=country_df["year"], y=country_df["gdpPercap"], name=country))
    st.plotly_chart(fig)
plot()
This uses st.multiselect and then add_trace to add one trace per selected country. The dict dfs maps each selected country to its sub-dataframe for easy access.
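As a side note on why the single-selection code most likely breaks: st.multiselect returns a list, so `df['country'] == country` ends up comparing a column against a list rather than against one name. A minimal sketch of a list-aware filter is below. The helper name `filter_countries` and the tiny `demo` frame are made up for illustration and stand in for the gapminder data; this is one possible variant, not the method used in the answer above.

```python
import pandas as pd


def filter_countries(df: pd.DataFrame, countries: list[str]) -> pd.DataFrame:
    """Keep only the rows whose 'country' value is in the selected list."""
    return df[df["country"].isin(countries)]


# Tiny stand-in for px.data.gapminder() (made-up numbers, illustration only).
demo = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "year": [2002, 2007, 2002, 2007, 2002, 2007],
    "gdpPercap": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

selected = ["A", "C"]  # the kind of value st.multiselect hands back: a list
subset = filter_countries(demo, selected)
print(subset["country"].unique().tolist())
```

In the app itself, passing the filtered frame to `px.line(subset, x="year", y="gdpPercap", color="country")` should then draw one line per selected country without building the traces by hand.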
Good afternoon,
With the code below I am able to produce a Sankey chart, but it does not show the underlying values.
I tried to tweak the code in this thread but had no luck; I have never used JavaScript.
What I need is a chart that also shows the values and the percentage under each node, like the following picture.
Thanks
library(dplyr)
library(networkD3)
library(tidyverse)
library(readxl)
library(RColorBrewer)
df = data.frame(Source = c("ABC","CDE","MNB","PCI","UCD"),
Destination = c("Me","You","Him","Her","Her"),
Value = c(200,350,456,450,100))
## Reshape dataframe to long
df2 = pivot_longer(df, c(Destination, Source))
## make unique list for destination and source
dest = unique(as.character(df$Destination))
sources = unique(as.character(df$Source))
## Assign nodes number to each element of the chart
nodes2 = data.frame(node = append(dest,sources), nodeid = c(0:8))
res = merge(df,nodes2, by.x="Source", by.y = "node")
res = merge(res,nodes2, by.x="Destination", by.y = "node")
## Make links
links2 = res[, c("nodeid.x","nodeid.y","Value")]
colnames(links2) <- c("source", "target", "value")
## Add a 'group' column to each connection:
links2$group = as.factor(c("type_a","type_b","type_c","type_d","type_e"))
## defining nodes
nodes2["groups"] = nodes2$node
nodes2$groups = as.factor(nodes2$groups)
# Give a color for each group:
my_color <- 'd3.scaleOrdinal() .domain(["type_a","type_b","type_c","type_d","type_e","Me","You","Him","Her","Her"]) .range(["rgb(165,0,38,0.4)", "rgb(215,48,39, 0.4)", "rgb(244,109,67,0.4)", "rgb(253,174,97,0.4)", "rgb(254,224,139,0.4)",
"rgb(255,255,191,0.4)", "rgb(217,239,139,0.4)", "rgb(166,217,106,0.4)",
"rgb(102,189,99,0.4)","rgb(26,152,80,0.4)"])'
# plot graph
networkD3::sankeyNetwork(Links = links2, Nodes = nodes2,
Source = 'source',
Target = 'target',
Value = 'value',
NodeID = 'node',
units = 'Amount',
colourScale=my_color,
LinkGroup="group",
NodeGroup="groups",
fontFamily = "arial",
fontSize = 8,
nodeWidth = 8)
See the update below the original content; it is a fully developed solution to your original request.
I'm still working on rendering the label on multiple lines (instead of on one line), but that is proving quite difficult with SVG text. In the meantime, here is a method that gets all of the desired information onto your diagram, even if it isn't styled exactly as you wished.
First I created the data to add to the plot. This has to be added to the widget after it's created. (It will just get stripped if you try to add it beforehand.)
This creates the before and after percentages and the aggregated sums (where needed).
# for this tiny data frame some of this grouping is redundant---
# however, this method could be used on a much larger scale
df3 <- df %>%
group_by(Source) %>%
mutate(sPerc = paste0(round(sum(Value) / sum(df$Value) * 100, 2), "%")) %>%
group_by(Destination) %>%
mutate(dPerc = paste0(round(sum(Value) / sum(df$Value) * 100, 2), "%")) %>%
pivot_longer(c(Destination, Source)) %>%
mutate(Perc = ifelse(name == "Destination",
dPerc, sPerc)) %>% # determine which % to retain
select(Value, value, Perc) %>% # only fields to add to widget
group_by(value, Perc) %>%
summarise(Value = sum(Value)) # get the sum for 'Her'
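For readers who want to check the numbers outside R, here is a rough pandas sketch of the same aggregation, assuming the five-row data frame from the question. It is not part of the R solution; the names `long`, `nodes`, and `node` are illustrative only, and the percentages are formatted with two fixed decimals, which can differ slightly from R's round() output (for example in trailing zeros).

```python
import pandas as pd

df = pd.DataFrame({
    "Source": ["ABC", "CDE", "MNB", "PCI", "UCD"],
    "Destination": ["Me", "You", "Him", "Her", "Her"],
    "Value": [200, 350, 456, 450, 100],
})
total = df["Value"].sum()

# Stack Source and Destination into one 'node' column, keeping each link's Value.
long = df.melt(id_vars="Value", value_vars=["Destination", "Source"], value_name="node")

# One row per node: its summed Value and that sum's share of the overall total.
nodes = long.groupby("node", as_index=False)["Value"].sum()
nodes["Perc"] = nodes["Value"].map(lambda v: f"{v / total * 100:.2f}%")
print(nodes)
```

The only node that actually needs summing is 'Her', which receives two links (450 and 100).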
I saved the Sankey diagram with the object name plt. This next part adds the new data to the widget plt.
plt$x$nodes <- right_join(plt$x$nodes, df3, by = c("name" = "value"))
This final element adds the value and the percentages to the source and destination node labels.
htmlwidgets::onRender(plt, '
function(el, x) {
d3.select(el).selectAll(".node text")
.text(d => d.name + " " + d.Perc + " " + d.Value)
}')
Update: Multi-line labels
I guess I just needed to sleep on it. This update will get you multi-line text.
You also asked for resources on how you would go about doing this yourself. There are a few things at play here: JavaScript, SVG text, D3, and the package htmlwidgets. When you use onRender, it's important to know the script file that connects the package's R code to htmlwidgets. I would suggest starting by learning about htmlwidgets, for example how to create your own.
Alright, back to answering the original question. Use all of the code I originally provided, but replace the call to onRender with the one below, which appends the new values as separate lines.
htmlwidgets::onRender(plt, '
function(el, x) {
d3.select(el).selectAll(".node text").each(function(d){
var arr, val, anc, i;
arr = " " + d.Perc + " " + d.Value;
arr = arr.split(" ");
val = d3.select(this).attr("x");
anc = d3.select(this).attr("text-anchor");
for(i = 0; i < arr.length; i++) {
d3.select(this).append("tspan")
.text(arr[i])
.attr("dy", i ? "1.2em" : 0)
.attr("x", val)
.attr("text-anchor", anc)
.attr("class", "tspan" + i);
}
})
}')
I am currently building an app and I want tooltips with additional information in a highcharter stock chart.
#Data
df <- data.frame(time = seq(as.Date("2021-03-10"), length = 10, by = "days"),
values = 1:10,
additionalInfo1 = LETTERS[1:10],
additionalInfo2 = letters[1:10])
#Packages
library(highcharter) #plots
library(xts) #conversion for stock-highchart
library(dplyr) #piping-operator
I know how to add additional information to a tooltip when using type="line". This can be done via
highchart_line <- hchart(df, "line", hcaes(x = time, y = values),
tooltip = list(headerFormat = "<b> Some Tooltipheader </b> <br/>",
pointFormat = paste0("index: {point.index} <br/>",
"time: {point.time} <br/>",
"additional1: {point.additionalInfo1} <br/>",
"additional2: {point.additionalInfo2}")))
and it displays everything correctly. The nice thing is that one can supply the whole data frame and hence has access to the column names.
When creating a type="stock", I need to convert the data to an xts while only using the times and the values:
stockdata_xts <- xts(x = df$values, order.by = df$time)
highchart_stock <- highchart(type="stock") %>%
hc_add_series(stockdata_xts, name = "someData",
tooltip = list(pointFormat = paste0(
"point.x: {point.x} <br/>",
"point.y: {point.y} <br/>",
"point.index: {point.index} <br/>"
))) %>%
hc_rangeSelector(enabled = FALSE)
Is there a way to pass additional data for the tooltip to the stock highchart so that the tooltip looks like the one in the line chart above? Maybe the xts object has some functionality for carrying extra attributes that I don't know about.
The reason why I want to use the stock-highchart is the navigator-bar. If there is a way to include the navigator-bar in the line-highchart, I would also be thankful. There is also a function highcharter::hc_navigator, but the documentation says that it is only applicable to highstocks. (see https://cloud.r-project.org/web/packages/highcharter/highcharter.pdf , page 50)
Yes, the navigator works in stockChart. For such tooltip customization, I think it will be best to use the dedicated formatter API function: https://api.highcharts.com/highstock/tooltip.formatter
Here you can find an article that can help you use JS code in R:
https://www.highcharts.com/blog/tutorials/working-with-highcharts-javascript-syntax-in-r/?fbclid=IwAR1Em2yNUsIJunTRS4IEbUwGksb5PC7LfZATLcyvb7uLS7ZvV7v4-e0L0
I am working on a research assignment on COVID and using the datalake API to fetch the different kinds of datasets available to us.
I am wondering if it's possible to fetch all outbreak countries.
ids = list("Australia") works for an individual country, but it doesn't seem to accept a wildcard or "all".
Can anyone give me any insight on this, please?
# Total number of confirmed cases in Australia and proportion of getting infected.
today <- Sys.Date()
casecounts <- evalmetrics(
"outbreaklocation",
list(
spec = list(
ids = list("Australia"),  # this is the line in question
expressions = list("JHU_ConfirmedCasesInterpolated","JHU_ConfirmedDeathsInterpolated"),
start = "2019-12-20",
end = today-1,
interval = "DAY"
)
)
)
casecounts
The easiest way to access a list of countries is in the Excel file linked at https://c3.ai/covid-19-api-documentation/#tag/OutbreakLocation. It has a list of countries in the first sheet, and shows which of those have data from JHU.
You could also fetch an approximate list of country-level locations with:
locations <- fetch(
"outbreaklocation",
list(
spec = list(
filter = "not(contains(id, '_'))"
)
)
)
That should contain all of the countries, but could have some non-countries like World Bank regions.
Then, you'd use this code to get the time series data for all of those locations:
location_ids <-
locations %>%
dplyr::select(-location) %>%
unnest_wider(fips, names_sep = ".") %>%
sample_n(15) %>% # include this to test on a smaller set
pull(id)
today <- Sys.Date()
casecounts <- evalmetrics(
"outbreaklocation",
list(
spec = list(
ids = location_ids,
expressions = list("JHU_ConfirmedCasesInterpolated","JHU_ConfirmedDeathsInterpolated"),
start = "2019-12-20",
end = today-1,
interval = "DAY"
)
),
get_all = TRUE
)
casecounts
I have a problem grouping my data in Plotly under R. To start with, I was using local data from a csv file, reading it with:
geogrid_data <- read.delim('geogrid.csv', row.names = NULL, stringsAsFactors = TRUE)
and the plotting went well, using the following:
library(plotly)
library(RColorBrewer)
x <- list(
title = 'Date'
)
p <- plotly::plot_ly(geogrid_data,
type = 'scatter',
x = ~ts_now,
y = ~absolute_v_sum,
text = paste('Table: ', geogrid_data$table_name,
'<br>Absolute_v_Sum: ', geogrid_data$absolute_v_sum),
hoverinfo = 'text',
mode = 'lines',
color = list(
color = colorRampPalette(RColorBrewer::brewer.pal(11,'Spectral'))(
length(unique(geogrid_data$table_name))
)
),
transforms = list(
list(
type = 'groupby',
groups = ~table_name
)
)
) %>% layout(showlegend = TRUE, xaxis = x)
Here the output:
Then I was going to alter the data source to an Oracle database table, reading the data as follows, using the ROracle package:
# retrieve data into resultSet object
rs <- dbSendQuery(con, "SELECT * FROM GEOGRID_STATS")
# fetch records from the resultSet into a data.frame
geogrid_data <- fetch(rs)
# free resources occupied by resultSet
dbClearResult(rs)
dbUnloadDriver(drv)
# remove duplicates from dataframe (based on TABLE_NAME, TS_BEFORE, TS_NOW, NOW_SUM)
geogrid_data <- geogrid_data %>% distinct(TABLE_NAME, TS_BEFORE, TS_NOW, NOW_SUM, .keep_all = TRUE)
# alter date columns in place
geogrid_data$TS_BEFORE <- as.Date(geogrid_data$TS_BEFORE, format='%d-%m-%Y')
geogrid_data$TS_NOW <- as.Date(geogrid_data$TS_NOW, format='%d-%m-%Y')
and adjusting the plotting to:
p <- plotly::plot_ly(
type = 'scatter',
x = geogrid_data$TS_NOW,
y = geogrid_data$ABSOLUTE_V_SUM,
text = paste('Table: ', geogrid_data$TABLE_NAME,
'<br>Absolute_v_Sum: ', geogrid_data$ABSOLUTE_V_SUM,
'<br>Date: ', geogrid_data$TS_NOW),
hoverinfo = 'text',
mode = 'lines',
color = list(
color = colorRampPalette(RColorBrewer::brewer.pal(11,'Spectral'))(
length(unique(geogrid_data$TABLE_NAME))
)
),
transforms = list(
list(
type = 'groupby',
groups = geogrid_data$TABLE_NAME
)
)
) %>% layout(showlegend = TRUE, xaxis = x)
Unfortunately, this seems to cause a problem with the grouping:
As you can see from the label text when hovering over the data point, the point represents data from NY_SKOV_PLANTEB_MW_POLY while the legend is set to show data from NY_BYGN_MW_POLY. Looking at other data points in this graph I found a wild mix of points of all sorts in this graph, some of them representing data of NY_BYGN_MW_POLY, most of them not.
Also the plotting with respect to the time line does not work any more, e.g. data are plotted with start on Dec. 11 - Dec. 10 - Dec. 10 - Dec. 12 - Dec. 20 - Dec. 17 - Dec. 16 - Dec. 15.
Where do I go wrong in handling the data, and what do I have to do to get it right?
Of course, one should look at the data... thanks Marco, after your question I did look at my data.
There are some points where I simply assumed things.
The reason why everything plotted fine with the data from the csv file is simple. All the information I manually compiled in the csv file came from emails that were ordered by date. Hence, the data in the csv file was already ordered by date, and Plotly had no problem grouping it by table_name.
After looking at my data I tidied it up, keeping only the data I need to show in the plot, and used dplyr to sort the data by time.
geogrid_data <- dplyr::arrange(geogrid_data, TS_NOW)
The data is sorted only by time, not by time and table name, because the grouping by table name is done anyway by Plotly through the groupby transform.
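The idea carries over to other tools: a line trace connects points in row order, so rows that arrive out of date order produce a zigzag. Below is a small pandas sketch of the same fix, using made-up rows and placeholder table names ("T1", "T2") rather than the real GEOGRID_STATS data; it only illustrates the sorting step, not the Plotly call.

```python
import pandas as pd

# Made-up rows in arrival order, not date order (illustration only).
geogrid = pd.DataFrame({
    "TABLE_NAME": ["T1", "T2", "T1", "T2", "T1"],
    "TS_NOW": pd.to_datetime(
        ["2019-12-11", "2019-12-10", "2019-12-10", "2019-12-12", "2019-12-20"]
    ),
    "ABSOLUTE_V_SUM": [5, 7, 3, 9, 4],
})

# Sort by time only; every per-table subset is then in date order as well.
ordered = geogrid.sort_values("TS_NOW").reset_index(drop=True)

for name, group in ordered.groupby("TABLE_NAME"):
    print(name, group["TS_NOW"].dt.strftime("%Y-%m-%d").tolist())
```

Sorting on the time column alone is enough here because grouping afterwards preserves the row order inside each group.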
I am having trouble importing a geojson map and joining it to some data using the highcharter library. I am trying to use a slimmed-down version of an sf dataset that I got using the tidycensus package, which I then uploaded to https://mapshaper.org/ to reduce the size of the file by thinning out the polygons. After thinning, I exported it as geojson and imported it into R.
Here is an example. First I download the data using tidycensus and create two data sets, one for the geometry and one for the attribute of interest, here median family income. Then I export the geometry data so that I can feed it into mapshaper for reduction.
#start with an example for one state
##pull geometry data for one state
md_data <- get_acs(geography = "tract",
state = "MD",
variables = "B19113_001",
geometry = T,
key = Sys.getenv("CENSUS_API_KEY"))
#data set of just GEOID and median family income for use in mapping
md_mfi <- as.data.frame(md_data) %>%
mutate(median_family_income = case_when(is.na(estimate) ~ 0,
TRUE ~ estimate)) %>%
select(GEOID,median_family_income)
#slim down to just the geoid and the geometry data
md_tracts <- md_data %>%
select(GEOID,geometry)
st_write(md_tracts, "U:/M1JPW00/GeoSpatial/census_tracts/acs_carto_2016/md_carto_tracts.shp")
After reformatting in mapshaper I import back into R
md_map_json <- jsonlite::fromJSON(txt = "FILEPATH/md_carto_tracts.json",simplifyVector = FALSE)
md_map_json <- geojsonio::as.json(md_map_json)
And then try and build a map based on an example from the highcharter docs here
> class(md_map_json)
[1] "json" "geo_json"
> head(md_mfi)
GEOID median_family_income
1 24001000100 54375
2 24001000200 57174
3 24001000300 48362
4 24001000400 52038
5 24001000500 46174
6 24001000600 49784
highchart(type = "map") %>%
hc_add_series(mapData = md_map_json,
data = list_parse(md_mfi),
joinBy = "GEOID",
value = "median_family_income",
name = "Median Family Income")
The map actually renders and the census tracts are colored solid blue, but the series data doesn't seem to join successfully, whether or not I use list_parse.
I had the same problem and asked about it here:
Make a choropleth from a non-highmap-collection map. Nobody responded (I know!), so I eventually worked out a solution that I think should work for you too:
#Work with the map you get until this step:
md_map_json <- jsonlite::fromJSON(txt = "FILEPATH/md_carto_tracts.json",simplifyVector = FALSE)
#This part is unnecessary:
#md_map_json <- geojsonio::as.json(md_map_json)
#Then, write your map like this:
highchart() %>%
hc_add_series_map(md_map_json, md_mfi, value = "median_family_income", joinBy = "GEOID")