Highchart: Can I use a different variable as the data labels? - r

I'm trying to build a column chart with highcharter in RStudio. I've converted the values to percentages because I want the y-axis to show %, but I want the data labels to show the raw values. Is there a way of doing this?
My data set has a column with the values for London and the percentages for London, I want the Y axis of the graph to show the % while the data labels show the value.
This is my current code:
hc <- highchart() %>%
hc_title(text= "Gender - London")%>%
hc_colors('#71599b') %>%
hc_yAxis(max = 0.7) %>%
hc_xAxis(categories = Sex$Gender) %>%
hc_add_series(name = "London", type = "column",
data = Sex$LON_PERC, dataLabels = list(enabled=TRUE, format={Sex$London}) )
So, I've put Sex$LON_PERC (% in London) as the data to plot while Sex$London is the data labels.
But this code puts all the values of London in each data label.
Edit:
This is the data I'm trying to plot, LON_PERC on the Y Axis, Gender on the X axis and London as the Data Labels
Gender London LON_PERC
Declined 5 0.000351247
Female 8230 0.578152441
Male 4640 0.325957148
No Data 1360 0.095539164

I am rather uncomfortable working with the highcharter package, as the underlying Highcharts library requires a commercial license for commercial use, which I do not have.
The result you want can be reached with the following, rather straightforward, code using base R or ggplot2, both of which are free and open source. I will show this with two code fragments below.
### your data
Sex <- read.table(header = TRUE, text =
"Gender London LON_PERC
Declined 5 0.000351247
Female 8230 0.578152441
Male 4640 0.325957148
'No Data' 1360 0.095539164
")
A solution using base R
The barplot function returns a vector (when beside is FALSE) with the coordinates of the midpoints of all bars drawn (when beside is TRUE, it is a matrix). This gives us the x-coordinates for placing text above the bars; the bar heights we already have in the data we plot.
# Draw the barplot and store result in `mp`
mp <- barplot(Sex$LON_PERC, # height of the bar
names.arg = Sex$Gender, # x-axis labels
ylim = c(0, 0.7), # limits of y-axis
col = '#71599b', # your color
main = "Gender - London") # main title
# add text to the barplot using the stored values
text(x = mp, # middle of the bars
y = Sex$LON_PERC, # height of the bars
labels = Sex$London, # text to display
adj = c(.5, -1.5)) # adjust horizontally and vertically
This yields the following plot:
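The y-axis in that plot shows fractions (0.0 to 0.7). If percentage tick labels are wanted, as the question asks, one option is to suppress the default axis and draw it by hand. This is a minimal sketch on the same Sex data; the names ticks and tick_labels are illustrative only.

```r
Sex <- read.table(header = TRUE, stringsAsFactors = FALSE, text =
"Gender London LON_PERC
Declined 5 0.000351247
Female 8230 0.578152441
Male 4640 0.325957148
'No Data' 1360 0.095539164
")

# tick positions on the fraction scale, and their percentage labels
ticks <- seq(0, 0.7, by = 0.1)
tick_labels <- sprintf("%.0f%%", 100 * ticks)

# draw the bars without the default y-axis (yaxt = "n")
mp <- barplot(Sex$LON_PERC,
              names.arg = Sex$Gender,
              ylim = c(0, 0.7),
              col = "#71599b",
              main = "Gender - London",
              yaxt = "n")

# add the y-axis with percentage labels, then the raw counts above the bars
axis(side = 2, at = ticks, labels = tick_labels, las = 1)
text(x = mp, y = Sex$LON_PERC, labels = Sex$London, pos = 3)
```

Here pos = 3 places each label just above its bar, which avoids tuning adj by hand.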
A solution based on ggplot
library(ggplot2)
ggplot(aes(x = Gender, y = LON_PERC), data = Sex) +
geom_bar(stat = "identity", width = .60, fill = "#71599b" ) +
geom_text(aes(label = London),
position = position_dodge(width = .9),
vjust = -.3, size = 3, hjust = "center") +
theme_minimal() +
scale_y_continuous(limits = c(0, 0.7),
breaks = seq(0.0, 0.7, by = 0.1),
minor_breaks = NULL) +
labs(title = "Gender - London") +
theme(axis.title.y = element_blank(), axis.title.x = element_blank())
yielding the following plot:
In both cases, many characteristics can be adapted to your needs.
I hope you benefit from these examples, even though they are not made with highcharter.

I've found a workaround.
I can add a tooltip that appears when I hover over the column/bar.
First, a helper function is needed:
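myhc_add_series_labels_values <- function(hc, labels, values, text, colors = NULL, ...) {
  # validate inputs: a highchart object and equal-length label/value vectors
  assertthat::assert_that(highcharter::is.highchart(hc), is.numeric(values),
                          length(labels) == length(values))
  # one row per point; `text` carries the extra value shown in the tooltip
  # (tibble() replaces the deprecated dplyr::data_frame())
  df <- tibble::tibble(name = labels, y = values, text = text)
  if (!is.null(colors)) {
    assertthat::assert_that(length(labels) == length(colors))
    df <- dplyr::mutate(df, color = colors)
  }
  # convert the data frame to a list of points and add it as a series
  ds <- highcharter::list_parse(df)
  highcharter::hc_add_series(hc, data = ds, ...)
}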
and then when creating the highchart this function needs to be called.
The data looks as follows:
Sex <- read.table(header = TRUE, text =
"Gender London LON_PERC
Declined 5 0.000351247
Female 8230 0.578152441
Male 4640 0.325957148
'No Data' 1360 0.095539164
")
Then the code to generate the highchart is:
library(highcharter)
Gender <- highchart() %>%
hc_xAxis(categories = Sex$Gender, labels=list(rotation=0))%>%
myhc_add_series_labels_values(labels = Sex$Gender,values=Sex$LON_PERC, text=Sex$London, type="column")%>%
hc_tooltip(crosshairs=TRUE, borderWidth=5, sort=TRUE, shared=TRUE, table=TRUE,pointFormat=paste('<br>%: {point.y}%<br>#: {point.text}'))%>%
hc_legend()
This gives the below output:
Then, when I hover over each column/bar, it gives me the % information and the count, as can be seen here:
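The same extra field can probably also drive the data labels, which is what the question originally asked for. Highcharts format strings can reference point properties such as {point.text}, so a sketch along these lines may work. The name points is illustrative, and the chart is only built if highcharter is installed; I have not verified the rendering against every highcharter version.

```r
Sex <- read.table(header = TRUE, stringsAsFactors = FALSE, text =
"Gender London LON_PERC
Declined 5 0.000351247
Female 8230 0.578152441
Male 4640 0.325957148
'No Data' 1360 0.095539164
")

# one list per point: y is plotted, text is carried along for the label
points <- lapply(seq_len(nrow(Sex)), function(i) {
  list(name = Sex$Gender[i], y = Sex$LON_PERC[i], text = Sex$London[i])
})

# build the chart only when highcharter is available
if (requireNamespace("highcharter", quietly = TRUE)) {
  hc <- highcharter::highchart()
  hc <- highcharter::hc_title(hc, text = "Gender - London")
  hc <- highcharter::hc_xAxis(hc, categories = Sex$Gender)
  hc <- highcharter::hc_yAxis(hc, max = 0.7)
  hc <- highcharter::hc_add_series(
    hc, data = points, type = "column", name = "London",
    # the label shows the point's `text` field instead of its y value
    dataLabels = list(enabled = TRUE, format = "{point.text}")
  )
}
```

The original attempt failed because format = {Sex$London} passes the whole vector as the format of every label, rather than one value per point.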

Related

How to Increase the radius of a circle using Coord_polar()

I have a doughnut-like plot which I draw using ggplot2. The code was shared by @Jonspring.
data.frame(
stringsAsFactors = FALSE,
Tenure.Type = c("Tenure_A","Tenure_B",
"Tenure_C","Tenure_D","Tenure_E"),
In.Poverty = c(45786L, 98453L, 34954L, 29586L, 74854L),
Not.in.Poverty = c(784733L, 359584L, 385884L, 948434L, 385869L)
) -> Poverty
library(tidyverse)
Poverty %>%
pivot_longer(-Tenure.Type) %>%
uncount(round(value/1000)) %>%
ggplot(aes(1, name, color = Tenure.Type)) +
geom_jitter() +
coord_polar()
This is what I got:
[Plot]
I was wondering if there is any way to increase the size/surface area of the outer ring while keeping the inner ring as it is. Thanks.
I tried using the arguments of coord_polar() but I can't get it to work.
Note: in the plot, each dot represents 1,000 observations. Is there a way to make each dot in the outer ring represent 10,000 observations while each dot in the inner ring represents 1,000? Thanks.
You can split those 2 categories between separate jitter layers and play around with jitter heights to achieve something like this:
library(tidyverse)
Poverty %>%
pivot_longer(-Tenure.Type) %>%
uncount(round(value/1000)) %>%
ggplot(aes(1, name, color = Tenure.Type, shape = name)) +
geom_jitter(data = ~filter(.x, name == "Not.in.Poverty" ), height = .6) +
geom_jitter(data = ~filter(.x, name == "In.Poverty" ), height = .35) +
scale_shape_manual(
name = "Scale",
values = c("Not.in.Poverty" = 15, "In.Poverty" = 19),
labels = c("Not.in.Poverty" = "n=10000", "In.Poverty" = "n=1000")
)+
coord_polar()
Input data:
Poverty <- data.frame(
stringsAsFactors = FALSE,
Tenure.Type = c("Tenure_A","Tenure_B",
"Tenure_C","Tenure_D","Tenure_E"),
In.Poverty = c(45786L, 98453L, 34954L, 29586L, 74854L),
Not.in.Poverty = c(784733L, 359584L, 385884L, 948434L, 385869L)
)
Created on 2023-01-31 with reprex v2.0.2
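Note that in the code above both rings are still expanded with value/1000, so only the legend text differs. If the outer ring's dots should really stand for 10,000 observations each, the divisor has to differ per category. The following is a minimal sketch of that idea in base R; the column names per_dot and n_dots are illustrative, and the plot is only built if ggplot2 is installed.

```r
Poverty <- data.frame(
  stringsAsFactors = FALSE,
  Tenure.Type = c("Tenure_A", "Tenure_B", "Tenure_C", "Tenure_D", "Tenure_E"),
  In.Poverty = c(45786L, 98453L, 34954L, 29586L, 74854L),
  Not.in.Poverty = c(784733L, 359584L, 385884L, 948434L, 385869L)
)

# long format: one row per tenure type and category
long <- data.frame(
  Tenure.Type = rep(Poverty$Tenure.Type, times = 2),
  name = rep(c("In.Poverty", "Not.in.Poverty"), each = nrow(Poverty)),
  value = c(Poverty$In.Poverty, Poverty$Not.in.Poverty),
  stringsAsFactors = FALSE
)

# observations represented by one dot: 10,000 in the outer ring, 1,000 inside
long$per_dot <- ifelse(long$name == "Not.in.Poverty", 10000, 1000)
long$n_dots <- round(long$value / long$per_dot)

# repeat each row n_dots times (the base R equivalent of tidyr::uncount)
dots <- long[rep(seq_len(nrow(long)), times = long$n_dots),
             c("Tenure.Type", "name")]

if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(dots, ggplot2::aes(1, name, colour = Tenure.Type,
                                          shape = name)) +
    ggplot2::geom_jitter() +
    ggplot2::coord_polar()
}
```

With fewer dots in the outer ring, the two rings end up with a similar number of points, so the jitter heights from the answer above can still be used to control how wide each ring looks.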

R colour code plot by rownames for principal component analysis

I am attempting to complete a principal component analysis on a set of data containing columns of numeric data.
Assume a dataset like this (in reality I have a preconfigured data frame; this one is for reproducibility):
v1 <- c(1,2,3,4,5,6,7)
v2 <- c(3,6,2,5,2,4,9)
v3 <- c(6,1,4,2,3,7,5)
dataset <-data.frame(v1,v2,v3)
row.names(dataset) <-c('New York', 'Seattle', 'Washington DC', 'Dallas', 'Chicago','Los Angeles','Minneapolis')
I have run my principal component analysis and successfully plotted it:
pca=prcomp(dataset,scale=TRUE)
plot(pca$x[,1], pca$x[,2],
xlab="First PC",ylab="Second PC")
text(pca$x[,1], pca$x[,2],cex=0.7,pos=3,col="darkgrey")
What I want to do however is colour code my data points based on the city, which is the row names of my dataset. I also want to use these cities (i.e. rownames) as labels.
I've tried the following, but neither attempt has worked:
## attempt 1 - I get row labels, but no chart
plot(pca$x[,1], pca$x[,2],col=rownames(dataset),pch=rownames(dataset),
xlab="First PC",ylab="Second PC")
text(pca$x[,1], pca$x[,2],labels=rownames(dataset),cex=0.7,pos=3,col="darkgrey")
## attempt 2
datasetwithcity = rownames_to_column(dataset, var = "city")
head(datasetwithcity)
OnlyCities=datasetwithcity[,1]
OnlyCities
# this didn't work:
City_Labels=as.numeric(OnlyCities)
head(City_Labels)
# gets city labels, but loses points and no colour
plot(pca$x[,1], pca$x[,2],col=City_Labels,pch=City_Labels,
xlab="First PC",ylab="Second PC")
text(pca$x[,1], pca$x[,2],labels=rownames(dataset),
cex=0.7,pos=3,col="darkgrey")
There are many different ways to do this.
In base R, you could do:
plot(pca$x[,1], pca$x[,2],
xlab="First PC",ylab="Second PC", col = seq(nrow(pca$x)),
xlim = c(-2.5, 2.5), ylim = c(-2, 2))
text(pca$x[,1], pca$x[,2],cex=0.7,pos=3,col="darkgrey")
text(x = pca$x[,1], y = pca$x[,2], labels = rownames(pca$x), pos = 1)
Personally, I think the resulting aesthetics are nicer (and easier to change to suit your needs) with ggplot. The code is also a bit easier to read once you get used to the syntax.
library(ggplot2)
df <- as.data.frame(pca$x)
df$city <- rownames(df)
ggplot(df, aes(PC1, PC2, color = city)) +
geom_point(size = 3) +
geom_text(aes(label = city) , vjust = 2) +
lims(x = c(-2.5, 2.5), y = c(-2, 2)) +
theme_bw() +
theme(legend.position = "none")
Created on 2021-10-28 by the reprex package (v2.0.0)
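As for why the second attempt in the question lost its points: as.numeric() on city names returns NA, because the strings are not numbers. Converting through a factor gives one integer code per distinct city, which can then index a colour vector. A small base R sketch on the question's data, where city_codes and city_cols are illustrative names:

```r
v1 <- c(1, 2, 3, 4, 5, 6, 7)
v2 <- c(3, 6, 2, 5, 2, 4, 9)
v3 <- c(6, 1, 4, 2, 3, 7, 5)
dataset <- data.frame(v1, v2, v3)
row.names(dataset) <- c("New York", "Seattle", "Washington DC", "Dallas",
                        "Chicago", "Los Angeles", "Minneapolis")
pca <- prcomp(dataset, scale. = TRUE)

cities <- rownames(dataset)

# character -> numeric yields NA for every city (with a warning)
not_numbers <- suppressWarnings(as.numeric(cities))

# character -> factor -> integer yields one code per distinct city
city_codes <- as.integer(factor(cities))

# one colour per distinct city, looked up by code
city_cols <- rainbow(length(unique(cities)))[city_codes]

plot(pca$x[, 1], pca$x[, 2], col = city_cols, pch = 19,
     xlab = "First PC", ylab = "Second PC")
text(pca$x[, 1], pca$x[, 2], labels = cities, col = city_cols,
     cex = 0.7, pos = 3)
```

The same lookup works when several rows share a city, since rows with the same name get the same code and therefore the same colour.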

Trying to map petition data in R

I am trying to map UK government petition data in R. I used the boundary data from the ONS geography portal. The code works, and the first map I created also works.
#Install packages
install.packages("tidyverse")
install.packages("jsonlite")
install.packages("geojsonio")
install.packages("sp")
install.packages("parlitools")
install.packages("rvest")
install.packages("xml2")
install.packages("magrittr")
#Load packages
library(tidyverse)
library(jsonlite)
library(geojsonio)
library(sp)
library(parlitools)
library(rvest)
library(xml2)
library(magrittr)
#GETTING PETITION DATA
#Importing petition for UK-wide lockdown from JSON format
petition <- fromJSON("https://petition.parliament.uk/petitions/301397.json", flatten = TRUE)
signatures <- petition$data$attributes$signatures_by_constituency %>%
  rename(constituency = name)
#MAPPING BOUNDARIES
#Save url for boundary data UK
url <- "https://opendata.arcgis.com/datasets/b64677a2afc3466f80d3d683b71c3468_0.geojson"
#Load and save the boundary data as uk_map
uk_map <- geojson_read(url, what = "sp")
#pcon18cd is code name for constituency (as we can see when we view uk_map). Use fortify to get this data.
fort_uk_map <- fortify(uk_map, region = "pcon18cd")
#MAPPING PETITION DATA
#Join map data to signatures data from constituency using left_join
full_uk_map <- left_join(fort_uk_map, signatures, by = c("id" = "ons_code"))
#Plot-1a: Map of signatures in the whole of UK
ggplot() +
  geom_polygon(data = full_uk_map, aes(x = long, y = lat, group = group, fill = signature_count)) +
  geom_path(color = "black", size = 0.1) +
  #theme_void() must come before theme(), otherwise it resets the legend position
  theme_void() +
  theme(legend.position = "bottom") +
  labs(x = NULL,
       y = NULL,
       title = "Signatories of the UK Coronavirus Lockdown Petition",
       subtitle = "Let's investigate where the signatures come from",
       caption = "Geometries: ONS Open Geography Portal; Data: UK Parliament and Government",
       fill = "Signature Count")
But, as you can see from the image, the higher signatures have a lighter color. I would like to change it so that the higher number of signatures have a darker color.
So, I tried this code just below the above code and that's where I am facing issues.
#Change color of legend so that higher signature count equals darker color. Use quantile () [Doesn't work]
no_of_classes <- 9
quantiles <- quantile(full_uk_map$signature_count, probs = seq(0, 1, length.out = no_of_classes + 1))
labels <- c()
for(band in 1:length(quantiles)){
labels <- c(labels, paste0(round(quantiles[band])," - ", round(quantiles[band + 1])))
}
full_uk_map$quantiles <- cut(full_uk_map$signature_count, breaks = quantiles, labels = labels,
include.lowest = T)
labels <- labels[1:length(labels)-1]
#Plot-1b: Map of signatures in the whole of UK [Doesn't work]
sig_map_by_quantile <- ggplot() +
geom_polygon(data = full_uk_map, aes(x = long, y = lat, group = group, fill = quantiles)) +
geom_path(color = "black", size = 0.1) +
scale_fill_brewer(type = 'qual', palette = "Blues", guide = "legend", name = "Signature Count", labels = labels) +
theme_void +
theme(legend.position = "bottom") +
labs(x = NULL,
y = NULL,
title = "Signatories of the UK Coronavirus Lockdown Petition",
caption = "Geometries: ONS Open Geography Portal; Data: UK Parliament and Government")
When I run the full_uk_map$quantiles assignment, this is the error message I see:
> full_uk_map$quantiles <- cut(full_uk_map$signature_count, breaks = quantiles, labels = labels,
+ include.lowest = T)
Error in cut.default(full_uk_map$signature_count, breaks = quantiles, :
lengths of 'breaks' and 'labels' differ
Would anyone be able to help? Much appreciated!
Why you made us go through all that package installation, downloading files from the Internet, fortification, merging, and then waiting for the plot to appear is beyond me.
All you had to ask was why the cut function was returning an error. Your title is totally irrelevant to the problem.
Anyway, when breaks is given as a vector of cut points, the cut function requires the length of labels to be one less than the length of breaks. I did not find this stated explicitly in the documentation after a good long look (apologies to all if it is in fact there); it may be hidden between the lines of the descriptions of the breaks and labels arguments. Note that the breaks argument can be given either as a single number (the number of intervals) or, as in your case, as a vector of cut points.
For example, if breaks = c(1,2,3), then that implies you have two intervals, so you need 2 labels.
In your code, you supply the quantiles vector as the breaks and the labels vector as the labels. Both have the same length, which triggers the error: you have one label too many, because the loop runs over every element of quantiles and the last label pairs the final quantile with NA. Your line labels <- labels[1:length(labels)-1] does trim the vector, but it runs after the call to cut, so it comes too late. Solution: make the length of labels one less than the length of breaks before calling cut.
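A minimal sketch of that fix, using a small made-up vector x instead of the petition data (x, n_classes and binned are illustrative names). Pairing each break with the next one via head() and tail() produces exactly one label per interval, so no trimming is needed afterwards:

```r
# made-up counts standing in for signature_count
x <- c(3, 12, 25, 40, 58, 71, 90, 100)

n_classes <- 4
# n_classes intervals need n_classes + 1 break points;
# na.rm = TRUE matters if a join has left NA values in the data
breaks <- quantile(x, probs = seq(0, 1, length.out = n_classes + 1),
                   na.rm = TRUE)

# one label per interval: lower bound from all-but-last break,
# upper bound from all-but-first break
labels <- paste0(round(head(breaks, -1)), " - ", round(tail(breaks, -1)))

binned <- cut(x, breaks = breaks, labels = labels, include.lowest = TRUE)
table(binned)
```

This assumes the quantiles are distinct; with many tied values, quantile() can return duplicated break points, which cut() rejects.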

R: PCA ggplot Error "arguments imply differing number of rows"

I have a dataset:
https://docs.google.com/spreadsheets/d/1ZgyRQ2uTw-MjjkJgWCIiZ1vpnxKmF3o15a5awndttgo/edit?usp=sharing
that I'm trying to run a PCA on, to produce a graph like the one in this post:
https://stats.stackexchange.com/questions/61215/how-to-interpret-this-pca-biplot-coming-from-a-survey-of-what-areas-people-are-i
However, an error doesn't seem to go away:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names =
TRUE, :
arguments imply differing number of rows: 0, 1006
Below is my code; I have trouble finding the source of the error and would appreciate help locating it. Any hints?
The goal is to produce a PCA graph grouped by levels of Happiness.in.life. I modified the original code to fit my dataset. Originally, the group was determined by gender, which has 2 levels. What I'm attempting to do is build a graph based on the 5 levels of Happiness.in.life. However, it doesn't seem I can use the old code...
Thanks!
library(magrittr)
library(dplyr)
library(tidyr)
df <- happiness_reduced %>% dplyr::select(Happiness.in.life:Internet.usage, Happiness.in.life)
head(df)
vars_on_hap <- df %>% dplyr::select(-Happiness.in.life)
head(vars_on_hap)
group<-df$Happiness.in.life
fit <- prcomp(vars_on_hap)
pcData <- data.frame(fit$x)
vPCs <- fit$rotation[, c("PC1", "PC2")] %>% as.data.frame()
multiple <- min(
(max(pcData[,"PC1"]) - min(pcData[,"PC1"]))/(max(vPCs[,"PC1"])-
min(vPCs[,"PC1"])),
(max(pcData[,"PC2"]) - min(pcData[,"PC2"]))/(max(vPCs[,"PC2"])-
min(vPCs[,"PC2"]))
)
ggplot(pcData, aes(x=PC1, y=PC2)) +
geom_point(aes(colour=groups)) +
coord_equal() +
geom_text(data=vPCs,
aes(x = fit$rotation[, "PC1"]*multiple*0.82,
y = fit$rotation[,"PC2"]*multiple*0.82,
label=rownames(fit$rotation)),
size = 2, vjust=1, color="black") +
geom_segment(data=vPCs,
aes(x = 0,
y = 0,
xend = fit$rotation[,"PC1"]*multiple*0.8,
yend = fit$rotation[,"PC2"]*multiple*0.8),
arrow = arrow(length = unit(.2, 'cm')),
color = "grey30")
Here is an approach on how to plot the result of PCA in ggplot2:
library(tidyverse)
library(ggrepel)
A good idea (though not always, for instance when all variables are already in the same units) is to scale the variables prior to PCA:
hapiness %>% #this is the data from google drive. In the future try not to post such links on SO because they tend to be unusable after some time has passed
select(-Happiness.in.life) %>%
prcomp(center = TRUE, scale. = TRUE) -> fit
Now we can proceed to plotting the fit:
fit$x %>% #coordinates of the points are in x element
as.data.frame()%>% #convert matrix to data frame
select(PC1, PC2) %>% #select the first two PC
bind_cols(hapiness = as.factor(hapiness$Happiness.in.life)) %>% #add the coloring variable
ggplot() +
geom_point(aes(x = PC1, y = PC2, colour = hapiness)) + #plot points and color
geom_segment(data = fit$rotation %>% #data we want plotted by geom_segment is in rotation element
as.data.frame()%>%
select(PC1, PC2) %>%
rownames_to_column(), #get to row names so you can label after
aes(x = 0, y = 0, xend = PC1 * 7, yend = PC2* 7, group = rowname), #I scaled the rotation by 7 so it fits in the plot nicely
arrow = arrow(angle = 20, type = "closed", ends = "last",length = unit(0.2,"cm")),
color = "grey30") +
geom_text_repel(data = fit$rotation %>%
as.data.frame()%>%
select(PC1, PC2) %>%
rownames_to_column(),
aes(x = PC1*7,
y = PC2*7,
label = rowname)) +
coord_equal(ratio = fit$sdev[2]^2 / fit$sdev[1]^2) + #I like setting the ratio to the ratio of eigen values
xlab(paste("PC1", round(fit$sdev[1]^2/ sum(fit$sdev^2) *100, 2), "%")) +
ylab(paste("PC2", round(fit$sdev[2]^2/ sum(fit$sdev^2) *100, 2), "%")) +
theme_bw()
Look at all those happy people on the left. They are hard to notice because of the colors used; I suggest the jco palette from the ggpubr library, obtained with get_palette('jco', 5), i.e. scale_color_manual(values = get_palette('jco', 5)).
Quite a similar plot can be achieved with the ggord library:
library(ggord)
ggord(fit, grp_in = as.factor(hapiness$Happiness.in.life),
size = 1, ellipse = F, ext = 1.2, vec_ext = 5)
The major difference is that ggord uses equal scaling for the axes. Also, I scaled the rotation by 5 instead of 7 as in the first plot.
As you can see I do not like many intermediate data frames.
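The axis labels in the ggplot2 code above are built from the share of variance each component explains, which is the squared standard deviation of that component divided by the sum of all squared standard deviations. A small sketch of that arithmetic, using the built-in USArrests data as a stand-in because the happiness data is not reproduced here:

```r
# stand-in data set shipped with R; not the data from the question
fit <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# eigenvalues of the correlation matrix are the squared sdev values
eigenvalues <- fit$sdev^2

# proportion of total variance explained by each component
var_explained <- eigenvalues / sum(eigenvalues)

# axis labels in the same style as the plot above
axis_labels <- paste0("PC", 1:2, " ",
                      round(100 * var_explained[1:2], 2), " %")
axis_labels
```

The same proportions appear in the "Proportion of Variance" row printed by summary(fit).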

coloring specific bars of grouped bar chart in r using lattice

I would like to create a bar chart in R using lattice. It's a simple chart with six double columns, currently the color pattern is the same for all of them. That means the first double column is in red and black as well as the other five double columns. Would it be possible to change the color pattern, so that each column has its own color? If it's not possible using lattice, might it be possible with ggplot2? To clarify what I have in mind, see the following two pictures:
Current result:
Desired color pattern:
My current code:
library(lattice)
mitte_table <- read.table(text = "Partei; Jahr; Ergebnis
CDU;2017;18.6
CDU;2013;22.6
SPD;2017;17.9
SPD;2013;26.1
Linke;2017;21.5
Linke;2013;18.7
Gruene;2017;17.2
Gruene;2013;16.7
AfD;2017;8.2
AfD;2013;3.9
FDP;2017;8.7
FDP;2013;3.7
Sonstige;2017;7.9
Sonstige;2013;8.3",
header = TRUE,
sep = ";",
)
colors_Jahr = c("black", "red")
my.settings <- list(
superpose.polygon = list(col = colors_Jahr)
)
barchart(
data = mitte_table,
Ergebnis ~ Partei,
groups = Jahr,
horizontal = FALSE,
main = "Zweitstimme im Wahlkreis Mitte",
xlab = "Parteien",
ylab = "Stimmenverteilung in %",
auto.key = TRUE,
par.settings = my.settings
)
Something like this? (Using ggplot2.)
Two caveats:
I didn't attempt to match the colors to the parties. But that should be easy to fix, just use a different color vector.
It's hard in ggplot2 to get the bars to overlap, as in the example. If you insist on that then some more work is required.
Here is the code:
library(dplyr)
library(ggplot2)
mitte_table <- read.table(text = "Partei; Jahr; Ergebnis
CDU;2017;18.6
CDU;2013;22.6
SPD;2017;17.9
SPD;2013;26.1
Linke;2017;21.5
Linke;2013;18.7
Gruene;2017;17.2
Gruene;2013;16.7
AfD;2017;8.2
AfD;2013;3.9
FDP;2017;8.7
FDP;2013;3.7
Sonstige;2017;7.9
Sonstige;2013;8.3",
header = TRUE,
sep = ";"
)
# make combined dark and light color palette
cols_dark <- c("#CD8E04", "#5E9FC8", "#028D67", "#D6CB35", "#0266A0", "#BB571E", "#AE7192", "#898989")
cols_light <- c("#FFB746", "#7DCCFF", "#4BC095", "#F9ED51", "#579BDB", "#FF834A", "#EC99C6", "#B7B7B7")
colors <- c(rbind(cols_dark, cols_light))
# add a column that has a smaller number for the larger year and vice versa
mitte_table <- mutate(mitte_table, order = 10000 - Jahr)
# reorder levels so they are in the order in which they are in the table
mitte_table$Partei <- factor(mitte_table$Partei, levels = unique(mitte_table$Partei))
# the trick is to fill by a factor that combines the party name and the order (10000 - year)
ggplot(mitte_table, aes(x=Partei, group=order, y=Ergebnis, fill=factor(paste0(Partei,order)))) +
geom_col(position="dodge") +
scale_fill_manual(values = colors,
breaks = paste0(mitte_table$Partei[1], 10000-c(2017, 2013)),
labels = c("Wahl 2017", "Wahl 2013"),
name = "") +
scale_y_continuous(limits = c(0, 29), expand = c(0, 0)) +
# override the fill aes to get gray colors in the legend
guides(fill = guide_legend(override.aes = list(fill = c("#9A9A9A", "#C5C5C5")))) +
theme_minimal(14) +
theme(legend.position = "bottom")
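Two details of the code above may be worth spelling out: c(rbind(a, b)) interleaves two vectors (a1, b1, a2, b2, ...), and unnamed values passed to scale_fill_manual are assigned to the fill levels in their sorted order, not in table order. A minimal base R sketch with three parties only; parties and fill_key are illustrative names:

```r
cols_dark  <- c("#CD8E04", "#5E9FC8", "#028D67")
cols_light <- c("#FFB746", "#7DCCFF", "#4BC095")

# rbind() stacks the vectors as two rows; c() then reads the matrix
# column by column, giving dark1, light1, dark2, light2, ...
colors <- c(rbind(cols_dark, cols_light))

# the fill key combines party and (10000 - year), so within each party
# the later year (smaller number) sorts first and gets the dark colour
parties <- c("CDU", "SPD", "Linke")
fill_key <- factor(paste0(rep(parties, each = 2), 10000 - c(2017, 2013)))

# levels are sorted alphabetically, so "Linke" comes before "SPD" here
levels(fill_key)
```

To tie specific colours to specific parties, the colour vector can be given names that match these levels before it is passed to scale_fill_manual.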
