How to do Top N per time bucket in KQL? - azure-data-explorer

I want to group by time bucket and one other column and then only select the top N aggregated rows.
It's best explained with this example:
let T = datatable(d:datetime , continent:string, country:string, val:int)
[
datetime(2022-10-05T01:40:00.00), "Asia", "China", 10,
datetime(2022-10-05T02:50:00.00), "Asia", "India", 25,
datetime(2022-10-05T03:55:00.00), "Asia", "Japan", 15,
datetime(2022-10-05T01:40:00.00), "Europe", "Czech Republic", 1,
datetime(2022-10-05T02:50:00.00), "Europe", "France", 8,
datetime(2022-10-05T07:55:00.00), "Europe", "Germany", 9,
datetime(2022-10-05T04:55:00.00), "North America", "USA", 25,
datetime(2022-10-05T05:55:00.00), "North America", "Haiti", 5,
datetime(2022-10-05T09:55:00.00), "North America", "Jamaica", 3,
datetime(2022-10-06T01:40:00.00), "Asia", "China", 7,
datetime(2022-10-06T02:50:00.00), "Asia", "India", 8,
datetime(2022-10-06T03:55:00.00), "Asia", "Japan", 15,
datetime(2022-10-06T01:40:00.00), "Europe", "Czech Republic", 29,
datetime(2022-10-06T02:50:00.00), "Europe", "France", 14,
datetime(2022-10-06T07:55:00.00), "Europe", "Germany", 13,
datetime(2022-10-06T04:55:00.00), "North America", "USA", 12,
datetime(2022-10-06T05:55:00.00), "North America", "Haiti", 7,
datetime(2022-10-06T09:55:00.00), "North America", "Jamaica", 4,
];
T
| summarize sumval = sum(val) by bin(d,1d), continent
| sort by d asc, sumval desc
This is the result, but I only want the top 2 results per day (highlighted).
In SQL I would use either row_number or cross apply, but I've been struggling in KQL. I want to understand the solution, because it doesn't click yet.

top-nested operator
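Note that in your case you don't really need the first sum(val); it is there only because the syntax requires an aggregation at every level.
We could have used count(), 0, int(null) or other options for that matter.
The first level gives no N, so every day bucket is kept; the second level then keeps the top 2 continents by sum(val) within each day.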
let T = datatable(d:datetime , continent:string, country:string, val:int)
[
datetime(2022-10-05T01:40:00.00), "Asia", "China", 10,
datetime(2022-10-05T02:50:00.00), "Asia", "India", 25,
datetime(2022-10-05T03:55:00.00), "Asia", "Japan", 15,
datetime(2022-10-05T01:40:00.00), "Europe", "Czech Republic", 1,
datetime(2022-10-05T02:50:00.00), "Europe", "France", 8,
datetime(2022-10-05T07:55:00.00), "Europe", "Germany", 9,
datetime(2022-10-05T04:55:00.00), "North America", "USA", 25,
datetime(2022-10-05T05:55:00.00), "North America", "Haiti", 5,
datetime(2022-10-05T09:55:00.00), "North America", "Jamaica", 3,
datetime(2022-10-06T01:40:00.00), "Asia", "China", 7,
datetime(2022-10-06T02:50:00.00), "Asia", "India", 8,
datetime(2022-10-06T03:55:00.00), "Asia", "Japan", 15,
datetime(2022-10-06T01:40:00.00), "Europe", "Czech Republic", 29,
datetime(2022-10-06T02:50:00.00), "Europe", "France", 14,
datetime(2022-10-06T07:55:00.00), "Europe", "Germany", 13,
datetime(2022-10-06T04:55:00.00), "North America", "USA", 12,
datetime(2022-10-06T05:55:00.00), "North America", "Haiti", 7,
datetime(2022-10-06T09:55:00.00), "North America", "Jamaica", 4,
];
T
| top-nested of bin(d, 1d) by sum(val), top-nested 2 of continent by sum(val)
d                     aggregated_d  continent      aggregated_continent
2022-10-05T00:00:00Z  101           Asia           50
2022-10-05T00:00:00Z  101           North America  33
2022-10-06T00:00:00Z  109           Europe         56
2022-10-06T00:00:00Z  109           Asia           30

Related

Hierarchical plot with bubbles in r

On this page, I found an interesting plot:
Is it possible to make something similar, or even identical? (A combination of the treemap and ggraph libraries.)
You can get a similar appearance with the voronoiTreemap package:
library(voronoiTreemap)
vor <- data.frame(h1 = 'World',
h2 = c('Europe', 'Europe', "Europe",
'America', 'America', 'America', 'America',
'Asia', 'Asia', 'Asia', 'Asia', 'Asia', 'Asia',
'Africa', 'Africa', 'Africa'),
h3 = c("UK", "France", "Germany",
"USA", "Mexico", "Canada", "Brazil",
"China", "India", "S Korea", "Japan", "Thailand",
"Malaysia", "Egypt", "South Africa", "Nigeria"),
color = rep(c("pink", "steelblue", "#96f8A0", "yellow"),
times = c(3, 4, 6, 3)),
weight = c(12, 10, 15, 40, 5, 7, 9, 45, 30, 20, 20, 6, 9,
8, 10, 5),
codes = c("UK", "France", "Germany",
"USA", "Mexico", "Canada", "Brazil",
"China", "India", "S Korea", "Japan", "Thailand",
"Malaysia", "Egypt", "South Africa", "Nigeria"))
vt <- vt_input_from_df(vor)
vt_d3(vt_export_json(vt))

How to bring a chord diagram to a similar form?

I have a database.
I want to build a chord diagram similar to this one:
https://i.stack.imgur.com/59JcJ.png
My code:
library(igraph) # graph_from_data_frame()
library(ggraph) # ggraph(), geom_conn_bundle(), get_con()
vertices <- data.frame(name = unique(c(as.character(imports$Partner), as.character(imports$Reporter))) )
mygraph <- graph_from_data_frame( imports, vertices=vertices )
from <- match( imports$Reporter, vertices$name)
to <- match( imports$Partner, vertices$name)
# the layers must be chained with `+`, otherwise only the last expression is evaluated
ggraph(mygraph, layout = 'dendrogram', circular = TRUE) +
  geom_conn_bundle(data = get_con(from = from, to = to), alpha=0.2, colour="skyblue", tension = 0) +
  geom_node_point(aes(filter = leaf, x = x*1.05, y=y*1.05)) +
  theme_void()
Result:
https://i.stack.imgur.com/uY2Yq.png
I searched through all kinds of settings for the chord diagram, but I couldn't find how to make the links the same size and use the number of lines to indicate the value. Does anyone know how to create such a diagram?
data=structure(list(Reporter = c("USA", "USA", "USA", "USA", "India",
"Japan", "Japan", "USA", "Rep. of Korea", "USA", "Japan", "Japan",
"Japan", "Rep. of Korea", "USA", "USA", "USA", "China", "USA",
"USA", "Rep. of Korea", "USA", "Japan", "Japan", "Rep. of Korea",
"China", "China", "Rep. of Korea", "India", "China", "Rep. of Korea",
"USA", "Rep. of Korea", "Japan", "China", "Rep. of Korea", "India",
"China", "China", "India", "China", "China"), Partner = c("Saudi Arabia", "Canada", "Venezuela", "Mexico",
"Areas, nes", "Saudi Arabia", "United Arab Emirates", "Nigeria",
"Saudi Arabia", "Iraq", "Iran", "Qatar", "Kuwait", "United Arab Emirates",
"Angola", "Norway", "Colombia", "Oman", "United Kingdom", "Kuwait",
"Iran", "Gabon", "Indonesia", "Oman", "Kuwait", "Angola", "Iran",
"Oman", "Saudi Arabia", "Saudi Arabia", "Qatar", "Ecuador", "Indonesia",
"China", "Indonesia", "Australia", "Nigeria", "Yemen", "Fmr Sudan",
"Kuwait", "Iraq", "Viet Nam", "Iraq", "Australia", "Angola",
"United Arab Emirates", "Argentina", "Iran", "Trinidad and Tobago",
"Congo", "Yemen", "Iraq", "Viet Nam", "Australia", "Malaysia",
"Mexico", "Indonesia", "China", "Congo", "Ecuador", "Malaysia",
"Qatar", "Brunei Darussalam", "Norway"), Qty = c(69785202126, 68349221243, 68326932683,
64923669168, 57159000064, 53691639675, 52396394737, 46817696134,
38307387772, 31471382247, 25554794183, 19184268129, 18481591406,
16695296617, 16497467586, 16029110463, 15953011573, 15660839936,
14459452736, 13796910873, 11134838478, 10393629031, 10258716565,
9751327665, 9417368771, 8636634112, 7000465408, 6586187350, 5769723904,
5730211328, 5702528697, 5553458497, 5290777764, 5113191253, 4575188480,
4361612670, 3888963072, 3612423424, 3313590784, 3223781888, 3183182080,
3158472192, 3151280715, 3081015515, 3067260000, 2921931008, 2850134892,
2607684096, 2587749446, 2547349198, 2485083122, 2443798762, 2365431992,
2342513214, 2308853961, 2130664704, 1942125162, 1828376381, 1814260579,
1785874000, 1609282280, 1598901888, 1534923974, 1477843712, 1476737920,
1454356736, 1355873401, 1293729024, 1285355978, 1278701346, 1259876360,
1252518912, 1248772992, 1223383808, 1163368000, 1144188000, 1108399232,
1062041363, 1041526592, 977722731, 897418483, 877541040, 845556546,
801940467, 744316800, 739848000, 724177472, 694896000, 685405539,
672387008, 554965585, 540327751, 508204324, 4.87e+08, 457252032,
433428000, 430473920, 426744352, 408635880, 392727578, 390598528,
390189912, 389451923, 384376548, 350920922, 327039700, 285413702,
285143680, 275486240, 274015471, 264478000, 260122000, 238997756,
227806048, 204376795, 192144011, 150791409, 140634221, 135842986,
130777039, 129973032, 125115000, 124681401, 123443000, 120061792,
110795499, 106762492, 105548008, 84693986, 70275359, 57248174,
47944463, 40236018, 30783728, 18364000, 13419253, 12551365, 9631763,
5994199, 374000, 350000, 339115, 86420, 24000, 180)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -145L))

How to hide, or at least customize colorbar in plotly scattergeo in R

I want to create a scattergeo plot with markers for capitals. These markers are sized and colored according to values in my data.
If I use the standard colors, everything works: the map is shown properly with the right sizes, different colors, and no legend (I put the country:value information in the hover text of the markers).
However, if I use a custom palette via the colors argument of my scattergeo plot, the colorbar is always displayed. showlegend=F and showscale=F don't help. As soon as I remove colors, the colorbar disappears.
Moreover, if I try to customize it (e.g. change the title or the format of the tick labels), it doesn't work.
In other words, no option I have tried on this colorbar works!
This is my data:
TOUR <- structure(list(ISO3 = c("ARG", "AUS", "AUT", "BEL", "BRA", "CAN",
"CHE", "CHL", "CHN", "COK", "DEU", "DNK", "ESP", "FIN", "FJI",
"FRA", "GBR", "HKG", "IDN", "IND", "ITA", "JPN", "KOR", "LUX",
"MEX", "MYS", "NCL", "NLD", "NOR", "NZL", "PHL", "PRT", "RUS",
"SGP", "SWE", "THA", "TON", "USA", "WSM"), Total = c(1073L, 8204L,
818L, 1502L, 1871L, 7958L, 3524L, 2456L, 3345L, 456L, 5010L,
569L, 2775L, 184L, 75L, 60382L, 4424L, 415L, 146L, 405L, 8369L,
8176L, 1034L, 235L, 961L, 137L, 6522L, 667L, 309L, 7960L, 238L,
316L, 486L, 404L, 480L, 200L, 41L, 85225L, 46L), Size = c(16,
30, 14, 18, 19, 30, 24, 21, 23, 12, 26, 13, 22, 8, 5, 50, 25,
11, 7, 11, 30, 30, 16, 9, 15, 7, 28, 13, 10, 30, 9, 10, 12, 11,
12, 8, 3, 54, 4), Color = c(3, 4, 3, 3, 3, 4, 4, 3, 4, 3, 4,
3, 3, 2, 2, 5, 4, 3, 2, 3, 4, 4, 3, 2, 3, 2, 4, 3, 2, 4, 2, 2,
3, 3, 3, 2, 2, 5, 2), ISO2 = c("AR", "AU", "AT", "BE", "BR",
"CA", "CH", "CL", "CN", "CK", "DE", "DK", "ES", "FI", "FJ", "FR",
"GB", "HK", "ID", "IN", "IT", "JP", "KR", "LU", "MX", "MY", "NC",
"NL", "NO", "NZ", "PH", "PT", "RU", "SG", "SE", "TH", "TO", "US",
"WS"), LABELFR = c("Argentine", "Australie", "Autriche", "Belgique",
"Brésil", "Canada", "Suisse", "Chili", "Chine", "Iles Cook",
"Allemagne", "Danemark", "Espagne", "Finlande", "Fidji", "France",
"Royaume-Uni", "Hong-kong, Chine", "Indonésie", "Inde", "Italie",
"Japon", "Corée, République de", "Luxembourg", "Mexique", "Malaisie",
"Nouvelle-Calédonie", "Pays-Bas", "Norvège", "Nouvelle-Zélande",
"Philippines", "Portugal", "Russie, Fédération de", "Singapour",
"Suède", "Thaïlande", "Tonga", "Etats-Unis", "Samoa"), LABELEN = c("Argentina",
"Australia", "Austria", "Belgium", "Brazil", "Canada", "Switzerland",
"Chile", "China", "Cook Islands", "Germany", "Denmark", "Spain",
"Finland", "Fiji", "France", "United Kingdom", "Hong Kong", "Indonesia",
"India", "Italy", "Japan", "South Korea", "Luxembourg", "Mexico",
"Malaysia", "New Caledonia", "Netherlands", "Norway", "New Zealand",
"Philippines", "Portugal", "Russia", "Singapore", "Sweden", "Thailand",
"Tonga", "United States", "Samoa"), CAPITAL = c("Buenos Aires",
"Canberra", "Vienna", "Brussels", "Brasilia", "Ottawa", "Bern",
"Santiago", "Beijing", "Avarua", "Berlin", "Copenhagen", "Madrid",
"Helsinki", "Suva", "Paris", "London", "N/A", "Jakarta", "New Delhi",
"Rome", "Tokyo", "Seoul", "Luxembourg", "Mexico City", "Kuala Lumpur",
"Noumea", "Amsterdam", "Oslo", "Wellington", "Manila", "Lisbon",
"Moscow", "Singapore", "Stockholm", "Bangkok", "Nuku'alofa",
"Washington", "Apia"), LATITUDE = c("-34.583333333333336", "-35.266666666666666",
"48.2", "50.833333333333336", "-15.783333333333333", "45.416666666666664",
"46.916666666666664", "-33.45", "39.916666666666664", "-21.2",
"52.516666666666666", "55.666666666666664", "40.4", "60.166666666666664",
"-18.133333333333333", "48.86666666666667", "51.5", "0", "-6.166666666666667",
"28.6", "41.9", "35.68333333333333", "37.55", "49.6", "19.433333333333334",
"3.1666666666666665", "-22.266666666666666", "52.35", "59.916666666666664",
"-41.3", "14.6", "38.71666666666667", "55.75", "1.2833333333333332",
"59.333333333333336", "13.75", "-21.133333333333333", "38.883333",
"-13.816666666666666"), LONGITUDE = c("-58.666667", "149.133333",
"16.366667", "4.333333", "-47.916667", "-75.700000", "7.466667",
"-70.666667", "116.383333", "-159.766667", "13.400000", "12.583333",
"-3.683333", "24.933333", "178.416667", "2.333333", "-0.083333",
"0.000000", "106.816667", "77.200000", "12.483333", "139.750000",
"126.983333", "6.116667", "-99.133333", "101.700000", "166.450000",
"4.916667", "10.750000", "174.783333", "120.966667", "-9.133333",
"37.600000", "103.850000", "18.050000", "100.516667", "-175.200000",
"-77.000000", "-171.766667"), CONTINENT = c("South America",
"Australia", "Europe", "Europe", "South America", "Central America",
"Europe", "South America", "Asia", "Australia", "Europe", "Europe",
"Europe", "Europe", "Australia", "Europe", "Europe", "Asia",
"Asia", "Asia", "Europe", "Asia", "Asia", "Europe", "Central America",
"Asia", "Australia", "Europe", "Europe", "Australia", "Asia",
"Europe", "Europe", "Asia", "Europe", "Asia", "Australia", "Central America",
"Australia")), class = c("data.table", "data.frame"), row.names = c(NA,
-39L), sorted = "ISO3")
This is my code:
fig <- plot_ly(
type = 'scattergeo',
showlegend=F,
mode='markers',
data=TOUR,
y=~LATITUDE,
x=~LONGITUDE,
text=sprintf("%s : %s",TOUR$LABELFR,TOUR$Total),
hovertemplate = "%{text}<extra></extra>",
colors=c(ispfPalette[c(9,4,2)]),
color=~Color,
marker=list(
showscale=F,
size=~Size,
reversescale=F
)
)
And this is the output I get:
The best solution would be to hide the colorbar completely, but I would also be curious how to customize it by changing the title and formatting the values (for example for percentages, or if I want to change the decimal separator).
Thanks for your help!
Update on request:
You could modify the colorbar in the colors argument, for example like this:
fig <- plot_ly(
type = 'scattergeo',
showlegend=F,
mode='markers',
data=TOUR,
y=~LATITUDE,
x=~LONGITUDE,
text=sprintf("%s : %s",TOUR$LABELFR,TOUR$Total),
hovertemplate = "%{text}<extra></extra>",
colors="YlOrRd",
#colors = c("#1B98E0","black"),
color=~Color,
marker=list(
showscale=F,
size=~Size,
reversescale=F
)
)
fig
colors="YlOrRd"
colors = c("#1B98E0","black")
ORIGINAL ANSWER:
Just add %>% hide_colorbar() at the end of your code:
fig <- plot_ly(
type = 'scattergeo',
showlegend=F,
mode='markers',
data=TOUR,
y=~LATITUDE,
x=~LONGITUDE,
text=sprintf("%s : %s",TOUR$LABELFR,TOUR$Total),
hovertemplate = "%{text}<extra></extra>",
colors=c(ispfPalette[c(9,4,2)]),
color=~Color,
marker=list(
showscale=F,
size=~Size,
reversescale=F
)
) %>% hide_colorbar()
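For the second part of the question (changing the title and tick format instead of hiding the colorbar), here is a minimal sketch using plotly's colorbar() helper. It is shown on a plain scatter trace with a hypothetical toy data frame (toy, share); whether it behaves identically for the scattergeo trace above is an assumption that would need checking against the real TOUR data.

```r
library(plotly)

# Hypothetical toy data; `share` stands in for the numeric colour variable
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 1), share = c(10, 25, 40, 55, 70))

fig <- plot_ly(toy, x = ~x, y = ~y, color = ~share,
               type = "scatter", mode = "markers",
               colors = c("#1B98E0", "black")) %>%
  # colorbar() forwards these attributes to the trace's colorbar
  colorbar(title = "Share", ticksuffix = "%")

# The same figure with the colorbar removed again
fig_hidden <- fig %>% hide_colorbar()

fig
```

The idea is that colorbar() and hide_colorbar() both act on the already-built figure, so either can be appended to the pipeline without touching the plot_ly() call.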

Time series visualization with ggplot2 Rstudio (country-level)

I'm just learning R fundamentals, and I would like to ask for your help with data visualization, specifically time series. I'm studying how the vote shares of a specific category of political parties (right-wing populists) vary over time in each country from 2009 to 2019.
Here's my dataset:
dput(votesharespop)
structure(list(country = c("Austria", "Belgium", "Bulgaria",
"Czech Republic", "Denmark", "Estonia", "Finland", "France",
"Germany", "Great Britain", "Greece", "Hungary", "Italy", "Lithuania",
"Luxembourg", "Netherlands", "Poland", "Romania", "Portugal",
"Slovakia", "Slovenia", "Spain", "Sweden", "Austria", "Belgium",
"Bulgaria", "Czech Republic", "Denmark", "Estonia", "Finland",
"France", "Germany", "Great Britain", "Greece", "Hungary", "Italy",
"Lithuania", "Luxembourg", "Netherlands", "Poland", "Romania",
"Portugal", "Slovakia", "Slovenia", "Spain", "Sweden", "Austria",
"Belgium", "Bulgaria", "Czech Republic", "Denmark", "Estonia",
"Finland", "France", "Germany", "Great Britain", "Greece", "Hungary",
"Italy", "Lithuania", "Luxembourg", "Netherlands", "Poland",
"Romania", "Portugal", "Slovakia", "Slovenia", "Spain", "Sweden"
), year = c(2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009,
2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009,
2009, 2009, 2009, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014,
2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014,
2014, 2014, 2014, 2014, 2019, 2019, 2019, 2019, 2019, 2019, 2019,
2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019,
2019, 2019, 2019, 2019, 2019), vote_share = c(17.3, 15.7, 16.7,
4.3, 15.3, 0, 9.8, 8.1, 1.7, 22.7, 7.2, 71.2, 45.5, 12.2, 7.4,
17, 27.4, 8.7, 0, 5.6, 35.2, 0, 3.3, 20.2, 7.6, 16.8, 4.8, 26.6,
5.3, 12.9, 28.7, 0.4, 28.6, 6.2, 66.2, 26.7, 14.3, 7.5, 13.3,
31.8, 2.7, 0, 3.6, 28.8, 1.6, 9.7, 17.2, 13.8, 14.6, 10, 10.8,
12.7, 13.8, 26.8, 11, 34.9, 6.2, 62.2, 49.5, 2.7, 10, 14.5, 49.1,
0, 1.5, 7.3, 30.3, 6.2, 15.3), continent = c("Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -69L))
My aim was to get something like this (not interactive):
Or something like facets, but for each country.
Thank you very much for your attention.
Data
votesharespop <- structure(list(country = c("Austria", "Belgium", "Bulgaria",
"Czech Republic", "Denmark", "Estonia", "Finland", "France",
"Germany", "Great Britain", "Greece", "Hungary", "Italy", "Lithuania",
"Luxembourg", "Netherlands", "Poland", "Romania", "Portugal",
"Slovakia", "Slovenia", "Spain", "Sweden", "Austria", "Belgium",
"Bulgaria", "Czech Republic", "Denmark", "Estonia", "Finland",
"France", "Germany", "Great Britain", "Greece", "Hungary", "Italy",
"Lithuania", "Luxembourg", "Netherlands", "Poland", "Romania",
"Portugal", "Slovakia", "Slovenia", "Spain", "Sweden", "Austria",
"Belgium", "Bulgaria", "Czech Republic", "Denmark", "Estonia",
"Finland", "France", "Germany", "Great Britain", "Greece", "Hungary",
"Italy", "Lithuania", "Luxembourg", "Netherlands", "Poland",
"Romania", "Portugal", "Slovakia", "Slovenia", "Spain", "Sweden"
), year = c(2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009,
2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009,
2009, 2009, 2009, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014,
2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014,
2014, 2014, 2014, 2014, 2019, 2019, 2019, 2019, 2019, 2019, 2019,
2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019,
2019, 2019, 2019, 2019, 2019), vote_share = c(17.3, 15.7, 16.7,
4.3, 15.3, 0, 9.8, 8.1, 1.7, 22.7, 7.2, 71.2, 45.5, 12.2, 7.4,
17, 27.4, 8.7, 0, 5.6, 35.2, 0, 3.3, 20.2, 7.6, 16.8, 4.8, 26.6,
5.3, 12.9, 28.7, 0.4, 28.6, 6.2, 66.2, 26.7, 14.3, 7.5, 13.3,
31.8, 2.7, 0, 3.6, 28.8, 1.6, 9.7, 17.2, 13.8, 14.6, 10, 10.8,
12.7, 13.8, 26.8, 11, 34.9, 6.2, 62.2, 49.5, 2.7, 10, 14.5, 49.1,
0, 1.5, 7.3, 30.3, 6.2, 15.3), continent = c("Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -69L))
Code
library(ggplot2)
library(ggthemes) # to access theme_hc()
ggplot(data = votesharespop, mapping = aes(x = year, y = vote_share, color = country)) + # specify data, x-axis, y-axis and grouping variable
geom_line() + # a line per group
geom_point() + # points per group
theme_hc() + # a ggtheme, similar to your example
labs(title = "Variation of vote shares of right wing populists, 2009 to 2019", # plot title
subtitle = "Add a subtitle of your choice", # plot subtitle
caption = "Add a caption of your choice") + # plot caption
theme(legend.position = "right", # move legend to the right hand side of the plot
axis.title.x = element_blank(), # remove x axis title
axis.title.y = element_blank(), # remove y axis title
legend.title = element_blank(), # remove legend title
plot.title = element_text(size = 20, color = "gray40"), # change size and color of plot title
plot.subtitle = element_text(color = "gray40"), # change color of subtitle
plot.caption = element_text(color = "gray40", hjust = 0)) + # change color of caption and left-align
scale_y_continuous(breaks = seq(0, 80, by = 20)) + # specify min, max and break distance for y axis
scale_x_continuous(breaks = seq(2009, 2019, by = 5)) + # specify min, max and break distance for x axis
expand_limits(y = c(0, 80))
Output
Note, however, that with this many groups the colors become hard to tell apart. It may be preferable to use facet_wrap:
Code
ggplot(data = votesharespop, mapping = aes(x = year, y = vote_share, color = country)) + # specify data, x-axis, y-axis and grouping variable
geom_line() + # a line per group
geom_point() + # points per group
theme_hc() + # a ggtheme, similar to your example
labs(title = "Variation of vote shares of right wing populists, 2009 to 2019", # plot title
subtitle = "Add a subtitle of your choice", # plot subtitle
caption = "Add a caption of your choice") + # plot caption
theme(legend.position = "right", # move legend to the right hand side of the plot
axis.title.x = element_blank(), # remove x axis title
axis.title.y = element_blank(), # remove y axis title
legend.title = element_blank(), # remove legend title
plot.title = element_text(size = 20, color = "gray40"), # change size and color of plot title
plot.subtitle = element_text(color = "gray40"), # change color of subtitle
plot.caption = element_text(color = "gray40", hjust = 0)) + # change color of caption and left-align
scale_y_continuous(breaks = seq(0, 75, by = 25)) + # specify min, max and break distance for y axis
scale_x_continuous(breaks = seq(2009, 2019, by = 5)) + # specify min, max and break distance for x axis
expand_limits(y = c(0, 75)) + # adjust y axis limits
facet_wrap(~ country) + # facet wrap
theme(legend.position = "none") + # remove legend, since not needed anymore in facet_wrap
theme(panel.spacing.x = unit(4, "mm")) # avoid overlapping of x axis text
Output

Why does ggplot2 duplicate the fill legend?

This is my data frame:
df1<-structure(list(Country_name = c("Austria", "Austria", "Austria",
"Austria", "Belgium", "Belgium", "Belgium", "Belgium", "Cyprus",
"Cyprus", "Cyprus", "Cyprus", "Denmark", "Denmark", "Denmark",
"Denmark", "Finland", "Finland", "Finland", "Finland", "France",
"France", "France", "France", "Germany", "Germany", "Germany",
"Germany", "Greece", "Greece", "Greece", "Greece", "Iceland",
"Iceland", "Iceland", "Iceland", "Ireland", "Ireland", "Ireland",
"Ireland", "Italy", "Italy", "Italy", "Italy", "Luxembourg",
"Luxembourg", "Luxembourg", "Luxembourg", "Malta", "Malta", "Malta",
"Malta", "Netherlands", "Netherlands", "Netherlands", "Netherlands",
"North Cyprus", "North Cyprus", "North Cyprus", "North Cyprus",
"Norway", "Norway", "Norway", "Norway", "Portugal", "Portugal",
"Portugal", "Portugal", "Spain", "Spain", "Spain", "Spain", "Sweden",
"Sweden", "Sweden", "Sweden", "Switzerland", "Switzerland", "Switzerland",
"Switzerland", "United Kingdom", "United Kingdom", "United Kingdom",
"United Kingdom"), Regional_indicator = c("Western Europe", "Western Europe",
"Western Europe", "Western Europe", "Western Europe", "Western Europe",
"Western Europe", "Western Europe", "Western Europe", "Western Europe",
"Western Europe", "Western Europe", "Western Europe", "Western Europe",
"Western Europe", "Western Europe", "Western Europe", "Western Europe",
"Western Europe", "Western Europe", "Western Europe", "Western Europe",
"Western Europe", "Western Europe", "Western Europe", "Western Europe",
"Western Europe", "Western Europe", "Western Europe", "Western Europe",
"Western Europe", "Western Europe", "Western Europe", "Western Europe",
"Western Europe", "Western Europe", "Western Europe", "Western Europe",
"Western Europe", "Western Europe", "Western Europe", "Western Europe",
"Western Europe", "Western Europe", "Western Europe", "Western Europe",
"Western Europe", "Western Europe", "Western Europe", "Western Europe",
"Western Europe", "Western Europe", "Western Europe", "Western Europe",
"Western Europe", "Western Europe", "Western Europe", "Western Europe",
"Western Europe", "Western Europe", "Western Europe", "Western Europe",
"Western Europe", "Western Europe", "Western Europe", "Western Europe",
"Western Europe", "Western Europe", "Western Europe", "Western Europe",
"Western Europe", "Western Europe", "Western Europe", "Western Europe",
"Western Europe", "Western Europe", "Western Europe", "Western Europe",
"Western Europe", "Western Europe", "Western Europe", "Western Europe",
"Western Europe", "Western Europe"), continent = c("Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe",
"Europe", "Europe", "Europe", "Europe", "Europe", "Europe"),
variables_names = c("lowerwhisker", "Ladder_score", "upperwhisker",
"Logged_GDP_per_capita", "lowerwhisker", "Ladder_score",
"upperwhisker", "Logged_GDP_per_capita", "lowerwhisker",
"Ladder_score", "upperwhisker", "Logged_GDP_per_capita",
"lowerwhisker", "Ladder_score", "upperwhisker", "Logged_GDP_per_capita",
"lowerwhisker", "Ladder_score", "upperwhisker", "Logged_GDP_per_capita",
"lowerwhisker", "Ladder_score", "upperwhisker", "Logged_GDP_per_capita",
"lowerwhisker", "Ladder_score", "upperwhisker", "Logged_GDP_per_capita",
"lowerwhisker", "Ladder_score", "upperwhisker", "Logged_GDP_per_capita",
"lowerwhisker", "Ladder_score", "upperwhisker", "Logged_GDP_per_capita",
"lowerwhisker", "Ladder_score", "upperwhisker", "Logged_GDP_per_capita",
"lowerwhisker", "Ladder_score", "upperwhisker", "Logged_GDP_per_capita",
"lowerwhisker", "Ladder_score", "upperwhisker", "Logged_GDP_per_capita",
"lowerwhisker", "Ladder_score", "upperwhisker", "Logged_GDP_per_capita",
"lowerwhisker", "Ladder_score", "upperwhisker", "Logged_GDP_per_capita",
"lowerwhisker", "Ladder_score", "upperwhisker", "Logged_GDP_per_capita",
"lowerwhisker", "Ladder_score", "upperwhisker", "Logged_GDP_per_capita",
"lowerwhisker", "Ladder_score", "upperwhisker", "Logged_GDP_per_capita",
"lowerwhisker", "Ladder_score", "upperwhisker", "Logged_GDP_per_capita",
"lowerwhisker", "Ladder_score", "upperwhisker", "Logged_GDP_per_capita",
"lowerwhisker", "Ladder_score", "upperwhisker", "Logged_GDP_per_capita",
"lowerwhisker", "Ladder_score", "upperwhisker", "Logged_GDP_per_capita"
), variables_values = c(7.22880411148071, 7.29419994354248,
7.35959577560425, 10.742823600769, 6.79613399505615, 6.86350011825562,
6.93086624145508, 10.6736392974854, 6.05981206893921, 6.15899991989136,
6.25818777084351, 10.4057025909424, 7.57995510101318, 7.64559984207153,
7.71124458312988, 10.774001121521, 7.747633934021, 7.80870008468628,
7.86976623535156, 10.6392669677734, 6.59014940261841, 6.66379976272583,
6.73745012283325, 10.5842227935791, 7.00600481033325, 7.0757999420166,
7.14559507369995, 10.7328186035156, 5.42301368713379, 5.5149998664856,
5.6069860458374, 10.1323261260986, 7.38765287399292, 7.50449991226196,
7.62134695053101, 10.7725591659546, 7.01556777954102, 7.09369993209839,
7.17183208465576, 11.1609783172607, 6.30255556106567, 6.38740015029907,
6.47224473953247, 10.4818363189697, 7.17703056335449, 7.23750019073486,
7.29796981811523, 11.4506807327271, 6.68860197067261, 6.77279996871948,
6.85699796676636, 10.5338382720947, 7.39442825317383, 7.44890022277832,
7.50337219238281, 10.8127117156982, 5.43546724319458, 5.53550004959106,
5.63553285598755, 10.4057025909424, 7.41971874237061, 7.48799991607666,
7.55628108978271, 11.0878038406372, 5.8067102432251, 5.9109001159668,
6.0150899887085, 10.2637424468994, 6.31799125671387, 6.40089988708496,
6.48380851745605, 10.462926864624, 7.28248071670532, 7.35349988937378,
7.42451906204224, 10.7587938308716, 7.49127197265625, 7.55989980697632,
7.62852764129639, 10.9799327850342, 7.09166288375854, 7.16450023651123,
7.23733758926392, 10.6001348495483)), row.names = c(NA, -84L
), class = c("tbl_df", "tbl", "data.frame"))
When I change the colors of the stacked bars, the colors change as expected. The problem is that when I set new names for my labels, the legend is duplicated.
Why is this happening?
This is my code:
ggplot(df1) +
geom_bar(aes(x = variables_values, y = Country_name,
group = Country_name, linetype = variables_names,
fill = variables_names),
stat = 'identity',
lwd = 0.71,
width = 0.5,
position = 'stack') +
scale_fill_manual(values = c('red','blue','green','red'), labels = letters[1:4])
Each aesthetic other than x and y gets its own legend. With aes(..., linetype = var, fill = var), ggplot2 merges the two legends only while their titles and labels match; relabelling just the fill scale breaks that match, so the linetype legend is drawn separately with the original labels.
Therefore, as also mentioned in the comments, keep only a single aesthetic unless you really need both:
ggplot(df1) +
geom_bar(
aes(x = variables_values, y = Country_name,
fill = variables_names),
stat = 'identity',
lwd = 0.71,
width = 0.5,
position = 'stack'
) +
scale_fill_manual(values = c('red', 'blue', 'green', 'red'),
labels = letters[1:4])
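If both aesthetics are genuinely needed, an alternative is to give the linetype scale the same labels as the fill scale, since ggplot2 merges legends that share a title and labels. The sketch below uses a small hypothetical data frame (toy, part) rather than df1, and the label and line-type choices are purely illustrative.

```r
library(ggplot2)

# Hypothetical toy data standing in for df1
toy <- data.frame(
  country = rep(c("A", "B"), each = 3),
  part    = rep(c("high", "low", "mid"), times = 2),
  value   = c(1, 2, 3, 2, 1, 4)
)

# One shared label vector, in the (alphabetical) order of the levels of `part`
new_labels <- c("High", "Low", "Mid")

p <- ggplot(toy) +
  geom_col(aes(x = value, y = country, fill = part, linetype = part),
           colour = "black", width = 0.5) +
  # Same labels on both scales, so the two legends can be merged into one
  scale_fill_manual(values = c("red", "blue", "green"), labels = new_labels) +
  scale_linetype_manual(values = c("solid", "dashed", "dotted"), labels = new_labels)

p
```

Dropping the labels argument from either scale should bring the second legend back, which mirrors the behaviour described in the question.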
