Replicating a Data Visualization with R/ggplot - r

Replicating a visualization I saw in print media using ggplot2
Context:
I am always looking to make data visualizations more appealing for non-data people, who are the majority of the people I work with (stakeholders such as marketers and management). I've noticed that when a visualization looks academic-publication quality (standard ggplot2 aesthetics), they tend to assume they can't understand it and don't bother trying, which defeats the purpose of a visualization in the first place. However, when it looks more graphic (like something you might see on a website or in marketing material), they focus and try to understand it, usually successfully. These visualizations often lead to the most interesting discussions, so that is my ultimate goal.
The Visualization:
Here is something I saw in a marketing brochure showing the device share of web traffic by geo. Although it is actually a bit busy and unclear, it resonated better than a similar stacked bar chart I created in standard ggplot2. I have not the slightest idea how I might replicate something like this within ggplot2; any attempts would be much appreciated! Here is some sample tidy data to use in a data.table:
structure(list(country = c("Argentina", "Argentina", "Argentina",
"Brazil", "Brazil", "Brazil", "Canada",
"Canada", "Canada", "China", "China",
"China", "Japan", "Japan", "Japan", "Spain",
"Spain", "Spain", "UK", "UK", "UK", "USA",
"USA", "USA"),
device_type = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L),
class = "factor",
.Label = c("desktop",
"mobile",
"multi")),
proportion = c(0.37, 0.22, 0.41, 0.3, 0.31, 0.39,
0.35, 0.06, 0.59, 0.19, 0.2, 0.61,
0.4, 0.18, 0.42, 0.16, 0.28, 0.56,
0.27, 0.06, 0.67, 0.37, 0.08, 0.55)),
.Names = c("country", "device_type", "proportion"),
row.names = c(NA, -24L),
class = c("data.table", "data.frame"))

You could also consider googleVis:
library(googleVis)
dat <- structure(list(country = c("Argentina", "Argentina", "Argentina",
"Brazil", "Brazil", "Brazil", "Canada",
"Canada", "Canada", "China", "China",
"China", "Japan", "Japan", "Japan", "Spain",
"Spain", "Spain", "UK", "UK", "UK", "USA",
"USA", "USA"),
device_type = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L),
class = "factor",
.Label = c("desktop",
"mobile",
"multi")),
proportion = c(0.37, 0.22, 0.41, 0.3, 0.31, 0.39,
0.35, 0.06, 0.59, 0.19, 0.2, 0.61,
0.4, 0.18, 0.42, 0.16, 0.28, 0.56,
0.27, 0.06, 0.67, 0.37, 0.08, 0.55)),
.Names = c("country", "device_type", "proportion"),
row.names = c(NA, -24L),
class = c("data.table", "data.frame"))
link_order <- unique(dat$country)
node_order <- unique(as.vector(rbind(dat$country, as.character(dat$device_type))))
link_cols <- data.frame(color = c('#ffd1ab', '#ff8d14', '#ff717e', '#dd2c40', '#d6b0ea',
'#8c4fab','#00addb','#297cbe'),
country = c("UK", "Canada", "USA", "China", "Spain", "Japan", "Argentina", "Brazil"),
stringsAsFactors = FALSE)
node_cols <- data.frame(color = c("#ffc796", "#ff7100", "#ff485b", "#d20000",
"#cc98e6", "#6f2296", "#009bd2", "#005daf",
"grey", "grey", "grey"),
type = c("UK", "Canada", "USA", "China", "Spain", "Japan",
"Argentina", "Brazil", "multi", "desktop", "mobile"))
link_cols2 <- sapply(link_order, function(x) link_cols[x == link_cols$country, "color"])
node_cols2 <- sapply(node_order, function(x) node_cols[x == node_cols$type, "color"])
actual_link_cols <- paste0("[", paste0("'", link_cols2,"'", collapse = ','), "]")
actual_node_cols <- paste0("[", paste0("'", node_cols2,"'", collapse = ','), "]")
opts <- paste0("{
link: { colorMode: 'source',
colors: ", actual_link_cols ," },
node: {colors: ", actual_node_cols ,"}}")
Sankey <- gvisSankey(dat,
from = "country",
to = "device_type",
weight = "proportion",
options = list(height = 500, width = 1000, sankey = opts))
plot(Sankey)

You can also try the "ggalluvial" package and its geoms.
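As a rough sketch of the ggalluvial route, assuming the package is installed: each row of the tidy data becomes one flow from a country to a device type, with proportion as the flow width. The data frame is rebuilt compactly from the question's values, and the axis labels "Country" and "Device" are just names chosen here.

```r
library(ggplot2)
library(ggalluvial)

# Same values as the question's sample data, rebuilt compactly
dat <- data.frame(
  country = rep(c("Argentina", "Brazil", "Canada", "China",
                  "Japan", "Spain", "UK", "USA"), each = 3),
  device_type = rep(c("desktop", "mobile", "multi"), times = 8),
  proportion = c(0.37, 0.22, 0.41, 0.30, 0.31, 0.39,
                 0.35, 0.06, 0.59, 0.19, 0.20, 0.61,
                 0.40, 0.18, 0.42, 0.16, 0.28, 0.56,
                 0.27, 0.06, 0.67, 0.37, 0.08, 0.55)
)

p <- ggplot(dat, aes(axis1 = country, axis2 = device_type, y = proportion)) +
  geom_alluvium(aes(fill = country)) +           # one ribbon per row, colored by source
  geom_stratum() +                               # the boxes on each axis
  geom_text(stat = "stratum",
            aes(label = after_stat(stratum))) +  # label each box with its category
  scale_x_discrete(limits = c("Country", "Device"), expand = c(0.05, 0.05)) +
  theme_void() +
  theme(legend.position = "none")
```

Call print(p) to draw it. Fill colors can be set with scale_fill_manual() if the brochure palette matters.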

Related

How can I make several lines in ggplot with several group layers?

I have split my plot into two facets, Sportsbook and Casino. How can I also split each line into several lines (in different colors) to show the different markets? I tried adding fill = market at the end of the ggplot call, but it did not help.
library(ggplot2)
data<-structure(list(wday = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L,
4L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L,
7L, 7L, 7L), .Label = c("Monday", "Tuesday", "Wednesday", "Thursday",
"Friday", "Saturday", "Sunday"), class = "factor"), market = c("France",
"France", "Germany", "Germany", "Poland", "Poland", "France",
"France", "Germany", "Germany", "Poland", "Poland", "France",
"France", "Germany", "Germany", "Poland", "Poland", "France",
"France", "Germany", "Germany", "Poland", "Poland", "France",
"France", "Germany", "Germany", "Poland", "Poland", "France",
"France", "Germany", "Germany", "Poland", "Poland", "France",
"France", "Germany", "Germany", "Poland", "Poland"), product_preference = c("Casino",
"Sportsbook", "Casino", "Sportsbook", "Casino", "Sportsbook",
"Casino", "Sportsbook", "Casino", "Sportsbook", "Casino", "Sportsbook",
"Casino", "Sportsbook", "Casino", "Sportsbook", "Casino", "Sportsbook",
"Casino", "Sportsbook", "Casino", "Sportsbook", "Casino", "Sportsbook",
"Casino", "Sportsbook", "Casino", "Sportsbook", "Casino", "Sportsbook",
"Casino", "Sportsbook", "Casino", "Sportsbook", "Casino", "Sportsbook",
"Casino", "Sportsbook", "Casino", "Sportsbook", "Casino", "Sportsbook"
), ggr = c(3349.80897892753, 161.917715712988, 17700.4568364611,
-123.342131455399, 17208.7731385281, 3128.51277864992, 2877.17330617787,
28.5162781278127, 13453.7092912371, -82.8980672268908, 13611.1197727273,
9910.32070866143, 3939.20578803854, 126.311590466926, 19097.2664228723,
-94.5491666666667, 16706.9427008929, 2636.63687707641, 3393.43150322119,
176.953280238925, 23414.9515950069, -72.4428986866791, 16140.8680085653,
5618.00758333333, 3007.18322084806, 69.4383454281568, 18018.1755748663,
-77.87698, 19889.0339183673, 5561.69038585209, 4205.12735472371,
-16.0552268431002, 17166.1121932115, -117.149356025759, 18527.8546597938,
6806.36808346213, 3446.70375835385, 56.6674850849013, 18026.2400535475,
-67.3431629701062, 13641.4965135699, 11470.3083969466)), row.names = c(NA,
-42L), groups = structure(list(wday = structure(c(1L, 1L, 1L,
2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L,
7L, 7L), .Label = c("Monday", "Tuesday", "Wednesday", "Thursday",
"Friday", "Saturday", "Sunday"), class = "factor"), market = c("France",
"Germany", "Poland", "France", "Germany", "Poland", "France",
"Germany", "Poland", "France", "Germany", "Poland", "France",
"Germany", "Poland", "France", "Germany", "Poland", "France",
"Germany", "Poland"), .rows = structure(list(1:2, 3:4, 5:6, 7:8,
9:10, 11:12, 13:14, 15:16, 17:18, 19:20, 21:22, 23:24, 25:26,
27:28, 29:30, 31:32, 33:34, 35:36, 37:38, 39:40, 41:42), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -21L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), na.action = structure(43:46, .Names = c("43",
"44", "45", "46"), class = "omit"), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"))
ggplot() +
geom_line(data = data,aes(x = wday, y = ggr,group = product_preference))+
facet_grid(.~product_preference,scales="free")
You can define the color of your lines by adding color = <grouping variable>.
Also, you already do a facet grid on product_preference, so there seems to be no need to define group = product_preference.
Try this:
ggplot() +
geom_line(data = data,aes(x = wday, y = ggr,color = market, group = market)) +
facet_grid(.~product_preference,scales="free")
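A note on why group = market matters here: wday is a factor, and with a discrete x axis ggplot2 by default groups on the interaction of all discrete variables, so each point would otherwise end up in its own group and no lines would be drawn. The sketch below illustrates this on a three-day subset of the question's data (values rounded); the name toy is made up for this example.

```r
library(ggplot2)

# Monday to Wednesday from the question's data, rounded to one decimal
days <- c("Monday", "Tuesday", "Wednesday")
toy <- data.frame(
  wday = factor(rep(days, each = 6), levels = days),
  market = rep(rep(c("France", "Germany", "Poland"), each = 2), times = 3),
  product_preference = rep(c("Casino", "Sportsbook"), times = 9),
  ggr = c(3349.8, 161.9, 17700.5, -123.3, 17208.8, 3128.5,
          2877.2, 28.5, 13453.7, -82.9, 13611.1, 9910.3,
          3939.2, 126.3, 19097.3, -94.5, 16706.9, 2636.6)
)

# color picks the line color; group tells geom_line which points to connect
p <- ggplot(toy, aes(x = wday, y = ggr, color = market, group = market)) +
  geom_line() +
  facet_grid(. ~ product_preference, scales = "free")
```

Dropping group = market from aes() leaves one observation per group, and ggplot2 reports that each group consists of only one observation.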
I am not sure this is what you wanted, but I would replace geom_line with 'geom_col'. 'geom_col' also takes the 'fill' attribute, while 'geom_line' would require 'color'.
So my suggestion would be the following:
ggplot() +
geom_col(data = data,aes(x = wday, y = ggr, group = product_preference, fill = market))+
facet_grid(.~product_preference,scales="free")

How can I plot a world map with some circles according to the count (n) size in few countries with leaflet in R?

I apologize in advance, but I am having a hard time making a simple map in R with the leaflet library. This may be a very basic question, but I want a world map with a circle on each country sized according to n. This is my input data:
country_groups <- structure(list(country_name = c("Australia", "Brazil", "Canada",
"China", "Germany", "India", "Japan", "Korea, Republic of", "Romania",
"Russian Federation", "Spain", "Taiwan, Province of China", "United Kingdom of Great Britain and Northern Ireland",
"United States of America"), latitude = c(47.516231, -14.235004,
56.130366, 35.86166, 51.165691, 20.593684, 36.204824, 35.907757,
45.943161, 61.52401, 40.463667, 23.69781, 55.378051, 37.09024
), longitude = c(14.550072, -51.92528, -106.346771, 104.195397,
10.451526, 78.96288, 138.252924, 127.766922, 24.96676, 105.318756,
-3.74922, 120.960515, -3.435973, -95.712891), n = c(2L, 1L, 1L,
541L, 1L, 6L, 3L, 6L, 1L, 3L, 1L, 3L, 1L, 56L)), row.names = c(NA,
-14L), class = "data.frame")
This is what I've tried, but I get the error "non-numeric argument to binary operator".
country_groups$n <- as.integer(country_groups$n)
country_groups$longitude <- as.numeric(country_groups$longitude)
country_groups$latitude <- as.numeric(country_groups$latitude)
leaflet(country_groups)%>%
addCircles(lng = ~longitude, lat = ~latitude, weight = 1,
radius = n * 30, popup = ~country_name
)
When I try to use as.numeric(n) inside the plotting call, I get this instead:
Error in as.numeric(n) :
cannot coerce type 'closure' to vector of type 'double'
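A likely cause, judging from the two error messages: radius = n * 30 is missing the ~ that the other arguments have, so n is not looked up in country_groups. R instead finds a function called n (dplyr exports one, for example), which would explain both "non-numeric argument" and "cannot coerce type 'closure'". A minimal sketch of the fix on three rows of the data follows; the sqrt() scaling and the 5e4 multiplier are arbitrary choices made here, since addCircles() takes its radius in meters and n * 30 would be nearly invisible at world zoom.

```r
library(leaflet)

# Three rows taken from the question's data
country_groups <- data.frame(
  country_name = c("China", "India", "United States of America"),
  latitude     = c(35.86166, 20.593684, 37.09024),
  longitude    = c(104.195397, 78.96288, -95.712891),
  n            = c(541L, 6L, 56L)
)

m <- leaflet(country_groups) %>%
  addTiles() %>%
  addCircles(lng = ~longitude, lat = ~latitude, weight = 1,
             radius = ~sqrt(n) * 5e4,   # note the ~; the scaling is an assumption
             popup = ~country_name)
```

Printing m renders the map. addCircleMarkers() is an alternative when the radius should be in pixels rather than meters.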

r: Assigning labels to leafs and rectangles to dendrogram using dendextend possible?

I want to plot a dendrogram for a data frame with 7 columns (2 factor, 5 numeric). The first column contains the names of 7 different countries, and the following columns hold data for different parameters (population, GDP, etc.) characterizing each country. In the last column, a factor variable indicates which continent each country belongs to.
Here is the data:
structure(list(Country = structure(c(5L, 4L, 7L, 2L, 1L, 6L,
3L), .Label = c("Brazil", "Chile", "China", "France", "Germany",
"India", "Netherlands"), class = "factor"), GDP = c(0.46, 0.57,
0.75, 0.56, 0.28, 0.88, 1), Population = c(0.18, 0.09, 0.54,
0.01, 0.02, 0.17, 0.84), Birth.rate = c(87.21, 18.34, 63.91,
14.21, 5.38, 51.19, 209.26), Income = c(43.89, 18.23, 63.91,
12.3, 0.1, 14.61, 160.82), Savings = c(43.32, 0.11, 0, 1.91,
5.29, 36.58, 50.38), Continent = structure(c(2L, 2L, 2L, 3L,
3L, 1L, 1L), .Label = c("Asia", "Europe", "South America"), class = "factor")), .Names = c("Country",
"GDP", "Population", "Birth.rate", "Income", "Savings", "Continent"
), class = "data.frame", row.names = c(NA, -7L))
Now the dendrogram which I want to obtain should have the following characteristics:
1. the leaf labels should be colored according to their continent membership
2. the leaves should be labeled according to the respective country (NOT numbers)
3. there should be rectangles around the clusters
I have tried the dendextend package, which can be found here: https://cran.r-project.org/web/packages/dendextend/vignettes/introduction.html#setting-a-dendrograms-branches but characteristics 2 and 3 do not seem to work together. My code looks like this (after normalizing the data into norm):
#color codes for continents
regionCodes = c(rep("Europe",3), rep("South America", 2), rep("Asia",2))
rownames(data) = make.unique(regionCodes)
colorCodes = c(Europe="blue", "South America"="yellow", Asia="red")
#dendrogram generation and plot
dc = as.dendrogram(hclust(dist(norm), method="complete"))
labels_colors(dc) = colorCodes[regionCodes][order.dendrogram(dc)]
labels(dc) = data$Country
labels_cex(dc) = .7
dc %>% plot
dc %>% rect.dendrogram(k=4, border = 8, lty = 5, lwd = 2)
But it produces the following error
Error in data$Country : object of type 'closure' is not subsettable
Can you help me?
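The error message suggests that no object called data exists in the workspace, so R falls back to the built-in function data(), which cannot be subset with $. Beyond that, labels and colors need to follow the leaf order of the tree. A minimal sketch under those assumptions follows: the data frame is called df here, scale() stands in for the unspecified normalization, and the row names carry the country names so the leaves are labeled automatically.

```r
library(dendextend)

# Same values as the question's data, with plain character columns
df <- data.frame(
  Country = c("Germany", "France", "Netherlands", "Chile",
              "Brazil", "India", "China"),
  GDP = c(0.46, 0.57, 0.75, 0.56, 0.28, 0.88, 1),
  Population = c(0.18, 0.09, 0.54, 0.01, 0.02, 0.17, 0.84),
  Birth.rate = c(87.21, 18.34, 63.91, 14.21, 5.38, 51.19, 209.26),
  Income = c(43.89, 18.23, 63.91, 12.3, 0.1, 14.61, 160.82),
  Savings = c(43.32, 0.11, 0, 1.91, 5.29, 36.58, 50.38),
  Continent = c("Europe", "Europe", "Europe", "South America",
                "South America", "Asia", "Asia"),
  stringsAsFactors = FALSE
)

norm <- scale(df[, 2:6])        # assumption: z-scores as the normalization
rownames(norm) <- df$Country    # leaves take their labels from the row names

dc <- as.dendrogram(hclust(dist(norm), method = "complete"))

colorCodes <- c(Europe = "blue", "South America" = "yellow", Asia = "red")
# order.dendrogram() maps leaf positions back to rows of df
labels_colors(dc) <- unname(colorCodes[df$Continent][order.dendrogram(dc)])
labels_cex(dc) <- 0.7

plot(dc)
rect.dendrogram(dc, k = 4, border = 8, lty = 5, lwd = 2)
```

Because the labels come from the row names, there is no separate labels(dc) assignment that could get out of step with the colors.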

Plot each column against each column

I have a data frame ("data") with 7 columns (2 factor, 5 numeric). The first column contains the names of 7 different countries, and the following columns hold data for different parameters (population, GDP, etc.) characterizing each country. In the last column, a factor variable indicates which continent each country belongs to.
The data looks like this:
structure(list(Country = structure(c(5L, 4L, 7L, 2L, 1L, 6L,
3L), .Label = c("Brazil", "Chile", "China", "France", "Germany",
"India", "Netherlands"), class = "factor"), GDP = c(0.46, 0.57,
0.75, 0.56, 0.28, 0.88, 1), Population = c(0.18, 0.09, 0.54,
0.01, 0.02, 0.17, 0.84), Birth.rate = c(87.21, 18.34, 63.91,
14.21, 5.38, 51.19, 209.26), Income = c(43.89, 18.23, 63.91,
12.3, 0.1, 14.61, 160.82), Savings = c(43.32, 0.11, 0, 1.91,
5.29, 36.58, 50.38), Continent = structure(c(2L, 2L, 2L, 3L,
3L, 1L, 1L), .Label = c("Asia", "Europe", "South America"), class = "factor")), .Names = c("Country",
"GDP", "Population", "Birth.rate", "Income", "Savings", "Continent"
), class = "data.frame", row.names = c(NA, -7L))
I need some sort of loop function which plots (e.g. scatter plot) every single column against each other so that in the end every column (except the first and the last, i.e. the two factor variables) has been plotted against all other columns but each in a single plot chart (not all plots in one). Preferably all these plots are being saved to some folder on my local machine.
Also it would be great if the x and y axis are already labeled according to the respective two columns that are plotted against each other. Moreover it would be convenient to have a label next to each point in the plot displaying the respective country name. Lastly it would be nice to have three different colors for the points of the countries according to the three different continents.
So far I only have a piece of code that goes like
for (i in seq(1,length(data),1)) {
plot(data[,i], ylab=names(data[i]), xlab="Country",
text(i, labels=Country, pos=4, cex =.5))
}
As you can see it only plots each column against the first column ("Country") which is not what I want in the end.
Do you have any idea how I could achieve this?
You can use pairs() from base R. Note that dt represents your dataset, defined below.
pairs(dt)
dt <- structure(list(Country = structure(c(5L, 4L, 7L, 2L, 1L, 6L,
3L), .Label = c("Brazil", "Chile", "China", "France", "Germany",
"India", "Netherlands"), class = "factor"), GDP = c(0.46, 0.57,
0.75, 0.56, 0.28, 0.88, 1), Population = c(0.18, 0.09, 0.54,
0.01, 0.02, 0.17, 0.84), Birth.rate = c(87.21, 18.34, 63.91,
14.21, 5.38, 51.19, 209.26), Income = c(43.89, 18.23, 63.91,
12.3, 0.1, 14.61, 160.82), Savings = c(43.32, 0.11, 0, 1.91,
5.29, 36.58, 50.38), Continent = structure(c(2L, 2L, 2L, 3L,
3L, 1L, 1L), .Label = c("Asia", "Europe", "South America"), class = "factor")), .Names = c("Country",
"GDP", "Population", "Birth.rate", "Income", "Savings", "Continent"
), class = "data.frame", row.names = c(NA, -7L))
I've always thought that the splom function in the 'lattice' package was quite useful for this sort of exploratory analysis. This is obviously not a great example, since it obscures the group memberships, but it shows the combinations of points and a non-parametric regression line in the "pairs" format:
library(lattice)
png()
print( splom(~iris[1:4], groups = Species, data = iris,
panel = function(x, y, i, j, ...) {
panel.points(x,y, ...)
panel.loess(x,y, ...)
})); dev.off()
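Neither pairs() nor splom() produces one file per pair, which is what the question asks for. A base R loop over combn() can do that; the sketch below is one way, assuming the data frame is called dt as above. The output folder, the file naming scheme and the continent colors are made up for the example, and pdf() is used only because it is available in every R build (png() works the same way).

```r
# Same values as the question's data
dt <- data.frame(
  Country = c("Germany", "France", "Netherlands", "Chile",
              "Brazil", "India", "China"),
  GDP = c(0.46, 0.57, 0.75, 0.56, 0.28, 0.88, 1),
  Population = c(0.18, 0.09, 0.54, 0.01, 0.02, 0.17, 0.84),
  Birth.rate = c(87.21, 18.34, 63.91, 14.21, 5.38, 51.19, 209.26),
  Income = c(43.89, 18.23, 63.91, 12.3, 0.1, 14.61, 160.82),
  Savings = c(43.32, 0.11, 0, 1.91, 5.29, 36.58, 50.38),
  Continent = c("Europe", "Europe", "Europe", "South America",
                "South America", "Asia", "Asia"),
  stringsAsFactors = FALSE
)

num_cols  <- names(dt)[sapply(dt, is.numeric)]     # the five numeric columns
pair_list <- combn(num_cols, 2, simplify = FALSE)  # every unordered pair
cont_cols <- c(Asia = "red", Europe = "blue", "South America" = "darkgreen")

out_dir <- file.path(tempdir(), "pair_plots")      # hypothetical output folder
dir.create(out_dir, showWarnings = FALSE)

for (p in pair_list) {
  pdf(file.path(out_dir, paste0(p[1], "_vs_", p[2], ".pdf")))
  plot(dt[[p[1]]], dt[[p[2]]], xlab = p[1], ylab = p[2], pch = 19,
       col = unname(cont_cols[dt$Continent]))      # color by continent
  text(dt[[p[1]]], dt[[p[2]]], labels = dt$Country, pos = 4, cex = 0.7)
  dev.off()
}
```

combn() yields each pair once (GDP vs Population but not Population vs GDP); loop over both orders if the mirrored plots are wanted too.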

Add borders around individual bar graph plots based on a column: ggplot2

I am trying to create a graph using the following dataframe:
df1 <- structure(list(country.name = structure(c(4L, 11L, 10L, 2L, 1L,
3L, 8L, 5L, 7L, 9L, 6L), .Label = c("Austria", "Belgium", "Czech Republic",
"Denmark", "France", "Germany", "Netherlands", "Norway", "Poland",
"Sweden", "Switzerland"), class = "factor"), level = c(2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L), no.regions = c(5L, 7L, 8L,
11L, 9L, 8L, 7L, 16L, 12L, 15L, 14L), min_result = c(42.59, 33.57,
43.1, 38.46, 41.76, 44.05, 41.67, 36.32, 36.18, 42.79, 39.91),
max_result = c(50.24, 46.56, 58.24, 57.41, 61.07, 64.56,
63.25, 58.19, 59.14, 69.19, 67.11), diff = c(7.65, 12.99,
15.14, 18.95, 19.31, 20.51, 21.58, 21.87, 22.96, 26.4, 27.2
), RD = c(-0.07, 0.131, -0.091, -0.153, -0.172, 0.203, -0.166,
0.145, -0.228, -0.266, -0.261), RDCI_lower = c(-0.21, -0.028, -0.194, -0.328, -0.376, 0.076, -0.315, 0.075, -0.407, -0.348,
-0.347), RDCI_upper = c(0.07, 0.29, 0.012, 0.021, 0.031,
0.331, -0.017, 0.216, -0.049, -0.184, -0.175), RDpvalue = c(0.3237,
0.1113, 0.08, 0.0829, 0.1017, 0.0023, 0.0299, 0, 0.0149,
0, 0), diff_order = structure(1:11, .Label = c("Denmark",
"Switzerland", "Sweden", "Belgium", "Austria", "Czech Republic",
"Norway", "France", "Netherlands", "Poland", "Germany"), class = "factor", scores = structure(c(19.31,
18.95, 20.51, 7.65, 21.87, 27.2, 22.96, 21.58, 26.4, 15.14,
12.99), .Dim = 11L, .Dimnames = list(c("Austria", "Belgium",
"Czech Republic", "Denmark", "France", "Germany", "Netherlands",
"Norway", "Poland", "Sweden", "Switzerland"))))), .Names = c("country.name",
"level", "no.regions", "min_result", "max_result", "diff", "RD",
"RDCI_lower", "RDCI_upper", "RDpvalue", "diff_order"), row.names = c(NA,
-11L), class = "data.frame")
My initial graph is:
a <- ggplot(df1, aes((x=diff_order), y=diff,fill=as.factor(level))) +
geom_bar(stat="identity") +
xlab("Country") +
theme_classic() +
coord_flip() +
scale_fill_manual(values=c("#2B2B2B", "#555555"),name="NUTS level") +
ggtitle("All") +
theme(plot.title = element_text(hjust = 0)) +
theme(axis.title.x = element_blank()) +
theme(axis.text=element_text(size=9),
axis.title=element_text(size=10,face="bold"))
a
Now, I want to add borders around individual bars depending on whether the RDpvalue is < 0.05 (to visually show significance).
Is it possible to do this in R?
Just use the color aesthetic:
df1$color_grp <- ifelse(df1$RDpvalue < 0.05,'Yes','No')
a <- ggplot(df1,
aes(x=diff_order,
y=diff,
fill=as.factor(level),
color = color_grp)) +
geom_bar(stat="identity") +
xlab("Country") +
theme_classic() +
coord_flip() +
scale_fill_manual(values=c("#2B2B2B", "#555555"),name="NUTS level") +
ggtitle("All") +
theme(plot.title = element_text(hjust = 0)) +
theme(axis.title.x = element_blank()) +
theme(axis.text=element_text(size=9),
axis.title=element_text(size=10,face="bold"))
a
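By default ggplot2 picks two visible outline colors for the Yes and No groups, so every bar gets a border. A possible refinement, sketched below on a trimmed copy of the question's data, is to map the non-significant group to a transparent outline with scale_color_manual(). The names bars and signif and the red outline are choices made for this illustration.

```r
library(ggplot2)

# Only the columns needed for the plot, values from the question's df1
bars <- data.frame(
  country = c("Denmark", "Switzerland", "Sweden", "Belgium", "Austria",
              "Czech Republic", "Norway", "France", "Netherlands",
              "Poland", "Germany"),
  level = c(rep(2L, 10), 1L),
  diff = c(7.65, 12.99, 15.14, 18.95, 19.31, 20.51,
           21.58, 21.87, 22.96, 26.4, 27.2),
  RDpvalue = c(0.3237, 0.1113, 0.08, 0.0829, 0.1017, 0.0023,
               0.0299, 0, 0.0149, 0, 0)
)
bars$country <- factor(bars$country, levels = bars$country)  # keep the diff order
bars$signif  <- ifelse(bars$RDpvalue < 0.05, "Yes", "No")

p <- ggplot(bars, aes(x = country, y = diff,
                      fill = factor(level), color = signif)) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(values = c("1" = "#2B2B2B", "2" = "#555555"),
                    name = "NUTS level") +
  # outline only the significant bars
  scale_color_manual(values = c(No = "transparent", Yes = "red"),
                     name = "p < 0.05") +
  theme_classic()
```

Call print(p) to draw it; six of the eleven bars have RDpvalue below 0.05 and receive the red outline.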
