Trouble rendering dynamic geom_hlines in Power BI with R - r

I'm attempting to create a ggplot2 object in Power BI that will render any number of horizontal lines, depending on which measures are dropped in and out of the "Values" bin. I thought I'd do this with a for loop that adds an additional geom_hline to the object for each extra column in the dataframe. I also want each line to have a different color, and the value of the hline to render with the label in the legend.
The dataframe I am using has 3 static columns titled - Year, Escalation, and Type. Any additional columns beyond the first 3 would be considered data to be used for the horizontal lines.
This is what I have so far...
library(ggplot2)
library(RColorBrewer)
# create a unique color set
n <- 60
qual_col_pals = brewer.pal.info[brewer.pal.info$category == 'qual',]
col_vector = unlist(mapply(brewer.pal, qual_col_pals$maxcolors, rownames(qual_col_pals)))
# create a dynamic x axis scale depending on the number of years to be plotted
scale <- if(length(dataset$Year) < 30) {
scale_x_continuous(breaks = seq(min(dataset$Year), max(dataset$Year)))
} else if (length(dataset$Year) >= 30 & length(dataset$Year) <= 60) {
scale_x_continuous(breaks = seq(min(dataset$Year), max(dataset$Year), by = 2))
} else {
scale_x_continuous(breaks = seq(min(dataset$Year), max(dataset$Year), by = 5))
}
#ggplot object
plot <- ggplot(dataset, aes(x = Year, y = Escalation)) +
geom_point(aes(color = "#094780"), size = 3) +
geom_hline(aes(yintercept = mean(dataset$Escalation), color = col_vector[1]), linetype = "dashed") +
geom_hline(aes(yintercept = median(dataset$Escalation), color = col_vector[2]), linetype = "dashed") +
theme(axis.text.x = element_text(colour = "#942832")) +
theme(axis.text.y = element_text(colour = "#942832")) +
scale +
scale_y_continuous(breaks = round(seq(min(dataset$Escalation), max(dataset$Escalation), by = 0.02),2))
# add horizontal comparison lines
addline <- function(data){
c <- list(unlist(unique(dataset[3])), paste("Mean ", round(mean(dataset$Escalation), 3)), paste("Median", round(median(dataset$Escalation),3)))
t <- list("#094780", col_vector[1], col_vector[2])
for (i in 1:data){
line = geom_hline(aes(yintercept = dataset[1,3+i], color = col_vector[2+i]))
plot = plot + line
c[3+i] = paste(unlist(names(dataset)[3+i]), " Escalation :", round(dataset[1 ,3+i], 3))
t[3+i] = col_vector[2+i]
}
m = scale_color_manual(name = "", values = t, labels = c)
plot = plot + m
return(plot)
}
addline(NCOL(dataset)-3)
It's rendering, but it's not giving me what I expect when I add data for more than 1 horizontal line (it's shifting the line with the data but not properly naming it or coloring it). For reference, if there were 2 horizontal lines and they were hardcoded, I would want the code to look like this (this renders correctly in Power BI).
library(ggplot2)
library(RColorBrewer)
n <- 60
qual_col_pals = brewer.pal.info[brewer.pal.info$category == 'qual',]
col_vector = unlist(mapply(brewer.pal, qual_col_pals$maxcolors, rownames(qual_col_pals)))
scale <- if(length(dataset$Year) < 30) {
scale_x_continuous(breaks = seq(min(dataset$Year), max(dataset$Year)))
} else if (length(dataset$Year) >= 30 & length(dataset$Year) <= 60) {
scale_x_continuous(breaks = seq(min(dataset$Year), max(dataset$Year), by = 2))
} else {
scale_x_continuous(breaks = seq(min(dataset$Year), max(dataset$Year), by = 5))
}
plot <- ggplot(dataset, aes(x = Year, y = Escalation)) +
geom_point(aes(color = "#094780"), size = 3) +
geom_hline(aes(yintercept = mean(dataset$Escalation), color = col_vector[1]), linetype = "dashed") +
geom_hline(aes(yintercept = median(dataset$Escalation), color = col_vector[2]), linetype = "dashed") +
theme(axis.text.x = element_text(colour = "#942832")) +
theme(axis.text.y = element_text(colour = "#942832")) +
scale +
scale_y_continuous(breaks = round(seq(min(dataset$Escalation), max(dataset$Escalation), by = 0.02),2)) +
geom_hline(aes(yintercept = dataset[1,4], color = col_vector[3]),) +
geom_hline(aes(yintercept = dataset[1,5], color = col_vector[4]),) +
scale_color_manual(
name = "",
values = list("#094780", col_vector[1], col_vector[2], col_vector[3], col_vector[4]),
labels = list(unlist(unique(dataset[3])),
paste("Mean ", round(mean(dataset$Escalation), 3)),
paste("Median", round(median(dataset$Escalation),3)),
paste(unlist(names(dataset)[4]), " Escalation :", round(dataset[1 ,4], 3)),
paste(unlist(names(dataset)[5]), " Escalation :", round(dataset[1 ,5], 3))
)
)
plot
I'm still a novice when it comes to coding, so I'm pretty sure I'm just not understanding something basic about how the loop works.
I know this is a bit to sift through, but I'm having a hard time really debugging because I'd have to export the data set from Power BI into RStudio. Any help is appreciated!
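One possible cause, offered as a sketch rather than a confirmed diagnosis: aes() does not evaluate its arguments until the plot is drawn, so every geom_hline added inside the loop may end up reading the final value of i. A common way around this is to collect the line values and labels in a small data frame and add a single geom_hline layer. The dataset below is a made-up stand-in for the one Power BI supplies, and extra_cols, extra_vals, and lines_df are illustrative names, not part of the original code.

```r
library(ggplot2)

# Hypothetical stand-in for the data frame Power BI passes in as `dataset`
dataset <- data.frame(
  Year = 2000:2009,
  Escalation = c(0.02, 0.03, 0.025, 0.04, 0.035, 0.03, 0.045, 0.05, 0.04, 0.03),
  Type = "Actual",
  PlanA = 0.032,
  PlanB = 0.041
)

# Every column after the first three is treated as a comparison line
extra_cols <- names(dataset)[-(1:3)]
extra_vals <- unlist(dataset[1, extra_cols], use.names = FALSE)

# One row per horizontal line: legend label and y value
labels <- c(
  paste("Mean", round(mean(dataset$Escalation), 3)),
  paste("Median", round(median(dataset$Escalation), 3)),
  sprintf("%s Escalation: %.3f", extra_cols, extra_vals)
)
lines_df <- data.frame(
  # A factor with explicit levels keeps the legend in row order
  label = factor(labels, levels = labels),
  value = c(mean(dataset$Escalation), median(dataset$Escalation), extra_vals)
)

# A single layer draws all lines; colour and legend text come from the data
p <- ggplot(dataset, aes(x = Year, y = Escalation)) +
  geom_point(color = "#094780", size = 3) +
  geom_hline(data = lines_df, aes(yintercept = value, color = label)) +
  scale_color_discrete(name = "")
p
```

Because the labels are mapped to colour, the legend entries follow the rows of lines_df, so no per-iteration bookkeeping of labels and colours is needed. In this sketch the points use a fixed colour and therefore get no legend entry.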

Related

Why do two legends appear when manually editing in ggplot2?

I want to plot two lines, one solid and another one dotted, both with different colors. I'm having trouble dealing with the legends for this plot. Take this example:
library(ggplot2)
library(reshape2)
df = data.frame(time = 0:127,
mean_clustered = rnorm(128),
mean_true = rnorm(128)
)
test_data_long <- melt(df, id="time") # convert to long format
p = ggplot(data=test_data_long,
aes(x=time, y=value, colour=variable)) +
geom_line(aes(linetype=variable)) +
labs(title = "", x = "Muestras", y = "Amplitud", color = "Spike promedio\n") +
scale_color_manual(labels = c("Hallado", "Real"), values = c("blue", "red")) +
xlim(0, 127)
print(p)
Two legends appear, and on top of it, none of them is correct (the one with the right colors has wrong line styles, and the one with the right line styles has all other things wrong).
Why is this happening and how can I get the right legend to appear?
The colour and linetype scales need the same name and the same labels; only then does ggplot2 merge them into a single legend:
library(ggplot2)
library(reshape2)
data.frame(
time = 0:127,
mean_clustered = rnorm(128),
mean_true = rnorm(128)
) -> xdf
test_data_long <- melt(xdf, id = "time")
ggplot(
data = test_data_long,
aes(x = time, y = value, colour = variable)
) +
geom_line(aes(linetype = variable)) +
scale_color_manual(
name = "Spike promedio\n", labels = c("Hallado", "Real"), values = c("blue", "red")
) +
scale_linetype(
name = "Spike promedio\n", labels = c("Hallado", "Real")
) +
labs(
x = "Muestras", y = "Amplitud", title = ""
) +
xlim(0, 127)
Might I suggest also using theme parameters to adjust the legend title:
ggplot(data = test_data_long, aes(x = time, y = value, colour = variable)) +
geom_line(aes(linetype = variable)) +
scale_x_continuous(name = "Muestras", limits = c(0, 127)) +
scale_y_continuous(name = "Amplitud") +
scale_color_manual(name = "Spike promedio", labels = c("Hallado", "Real"), values = c("blue", "red")) +
scale_linetype(name = "Spike promedio", labels = c("Hallado", "Real")) +
labs(title = "") +
theme(legend.title = element_text(margin = margin(b=15)))

using y-axis values to create secondary x-axis in ggplot2

I would like to create a dot plot with percentiles, which looks something like this-
Here is the ggplot2 code I used to create the dot plot. There are two things I'd like to change:
(1) I can plot the percentile values on the y-axis, but I want these values on the x-axis (as shown in the graph above). Note that the coordinates are flipped.
(2) The axes don't display a label for the minimum value (for example, the percentile axis labels start at 25 when they should start at 0 instead).
# loading needed libraries
library(tidyverse)
library(ggstatsplot)
# creating dataframe with mean mileage per manufacturer
cty_mpg <- ggplot2::mpg %>%
dplyr::group_by(.data = ., manufacturer) %>%
dplyr::summarise(.data = ., mileage = mean(cty, na.rm = TRUE)) %>%
dplyr::rename(.data = ., make = manufacturer) %>%
dplyr::arrange(.data = ., mileage) %>%
dplyr::mutate(.data = ., make = factor(x = make, levels = .$make)) %>%
dplyr::mutate(
.data = .,
percent_rank = (trunc(rank(mileage)) / length(mileage)) * 100
) %>%
tibble::as_data_frame(x = .)
# plot
ggplot2::ggplot(data = cty_mpg, mapping = ggplot2::aes(x = make, y = mileage)) +
ggplot2::geom_point(col = "tomato2", size = 3) + # Draw points
ggplot2::geom_segment(
mapping = ggplot2::aes(
x = make,
xend = make,
y = min(mileage),
yend = max(mileage)
),
linetype = "dashed",
size = 0.1
) + # Draw dashed lines
ggplot2::scale_y_continuous(sec.axis = ggplot2::sec_axis(trans = ~(trunc(rank(.)) / length(.)) * 100, name = "percentile")) +
ggplot2::coord_flip() +
ggplot2::labs(
title = "City mileage by car manufacturer",
subtitle = "Dot plot",
caption = "source: mpg dataset in ggplot2"
) +
ggstatsplot::theme_ggstatsplot()
Created on 2018-08-17 by the reprex package (v0.2.0.9000).
I am not 100% sure I have understood what you want, but below is my attempt to reproduce the first picture with the mpg data:
require(ggplot2)
data <- aggregate(cty~manufacturer, mpg, FUN = mean)
data <- data.frame(data[order(data$cty), ], rank=1:nrow(data))
g <- ggplot(data, aes(y = rank, x = cty))
g <- g + geom_point(size = 2)
g <- g + scale_y_continuous(name = "Manufacturer", labels = data$manufacturer, breaks = data$rank,
sec.axis = dup_axis(name = element_blank(),
breaks = seq(1, nrow(data), (nrow(data)-1)/4),
labels = 25 * 0:4))
g <- g + scale_x_continuous(name = "Mileage", limits = c(10, 25),
sec.axis = dup_axis(name = element_blank()))
g <- g + theme_classic()
g <- g + theme(panel.grid.major.y = element_line(color = "black", linetype = "dotted"))
print(g)
That produces:
data <- aggregate(cty~manufacturer, mpg, FUN = mean)
data <- data.frame(data[order(data$cty), ], rank=1:nrow(data))
These two lines generate the data for the graph. Basically we need the manufacturers, the mileage (average of cty by manufacturer) and the rank.
g <- g + scale_y_continuous(name = "Manufacturer", labels = data$manufacturer, breaks = data$rank,
sec.axis = dup_axis(name = element_blank(),
breaks = seq(1, nrow(data), (nrow(data)-1)/4),
labels = 25 * 0:4))
Note that here the scale uses rank and not the manufacturer column. To display the manufacturer names, you must use the labels argument and force a break at every value (see the breaks argument).
The second y-axis is generated using the sec.axis argument. This is straightforward with dup_axis, which duplicates the axis. By replacing the labels and the breaks, you can display the percentage values.
g <- g + theme(panel.grid.major.y = element_line(color = "black", linetype = "dotted"))
The horizontal lines are just the major grid. This is much easier to manipulate than geom_segments in my opinion.
Regarding your question 1, you can flip the coordinates easily using coord_flip, with minor adjustments. Replace the following line:
g <- g + theme(panel.grid.major.y = element_line(color = "black", linetype = "dotted"))
By the following two lines:
g <- g + coord_flip()
g <- g + theme(panel.grid.major.x = element_line(color = "black", linetype = "dotted"),
axis.text.x = element_text(angle = 90, hjust = 1))
Which produces:
Regarding your question 2, the problem is that the value 0% is outside the limits. You can solve this by changing the way you calculate the percentage (starting from zero rather than from one), or you can extend the limits of your plot to include zero, but then no point will be associated with 0%.
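To illustrate the first option, here is a minimal sketch comparing the two ways of computing the percentage. The mileage values are made up, and the names pct_from_one and pct_from_zero are only for illustration.

```r
# Hypothetical mileage values, already sorted for readability
mileage <- c(11.3, 12.1, 13.0, 16.5, 24.4)
n <- length(mileage)

# Rank counted from one: the smallest value lands at 100 / n, never at 0
pct_from_one <- rank(mileage) / n * 100

# Rank counted from zero: the smallest value lands at 0, the largest at 100
pct_from_zero <- (rank(mileage) - 1) / (n - 1) * 100

print(pct_from_one)   # 20 40 60 80 100
print(pct_from_zero)  # 0 25 50 75 100
```

With the second formula the lowest-ranked manufacturer sits exactly at 0%, so the axis has a point to label there.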

How to separately label and scale double y-axis in ggplot2?

I have a test dataset like this:
df_test <- data.frame(
proj_manager = c('Emma','Emma','Emma','Emma','Emma','Alice','Alice'),
proj_ID = c(1, 2, 3, 4, 5, 6, 7),
stage = c('B','B','B','A','C','A','C'),
value = c(15,15,20,20,20,70,5)
)
Preparation for viz:
library(dplyr)
library(ggplot2)
input <- select(df_test, proj_manager, proj_ID, stage, value) %>%
filter(proj_manager=='Emma') %>%
do({
proj_value_by_manager = sum(distinct(., proj_ID, value)$value);
mutate(., proj_value_by_manager = proj_value_by_manager)
}) %>%
group_by(stage) %>%
do({
sum_value_byStage = sum(distinct(.,proj_ID,value)$value);
mutate(.,sum_value_byStage= sum_value_byStage)
}) %>%
mutate(count_proj = length(unique(proj_ID)))
commapos <- function(x, ...) {
format(abs(x), big.mark = ",", trim = TRUE,
scientific = FALSE, ...) }
Visualization:
ggplot (input, aes(x=stage, y = count_proj)) +
geom_bar(stat = 'identity')+
geom_bar(aes(y=-proj_value_by_manager),
stat = "identity", fill = "Blue") +
scale_y_continuous(labels = commapos)+
coord_flip() +
ylab('') +
geom_text(aes(label= sum_value_byStage), hjust = 5) +
geom_text(aes(label= count_proj), hjust = -1) +
labs(title = "Emma: 4 projects| $90M Values \n \n Commitment|Projects") +
theme(plot.title = element_text(hjust = 0.5)) +
geom_hline(yintercept = 0, linetype =1)
My questions are:
(1) Why are the y-values not showing up right? For example, C is labeled 20, but its bar nearly reaches 100 on the scale.
(2) How can I adjust the position of the labels so that each one sits on top of its bar?
(3) How can I rescale the y-axis so that both the very short 'count of project' bars and the long 'Project value' bars are displayed well?
Thank you all for the help!
I think your issues are coming from the fact that:
(1) Your dataset has duplicated values. This causes geom_bar to add all of them together. For example there are 3 obs for B where proj_value_by_manager = 90 which is why the blue bar extends to 270 for that group (they all get added).
(2) in your second geom_bar you use y = -proj_value_by_manager but in the geom_text to label this you use sum_value_byStage. That's why the blue bar for A is extending to 90 (since proj_value_by_manager is 90) but the label reads 20.
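The stacking described in (1) can be checked without drawing anything. The sketch below uses input_demo, a hand-made reduction of Emma's five rows (an assumption for illustration, not the asker's actual object), and sums the rows per stage, which corresponds to the bar height geom_bar(stat = "identity") produces when rows share an x value.

```r
# Hand-made reduction of the prepared data: one row per project for Emma
input_demo <- data.frame(
  stage = c("B", "B", "B", "A", "C"),
  proj_value_by_manager = 90
)

# Rows that share a stage are stacked, so the bar height is the per-stage sum
stacked <- tapply(input_demo$proj_value_by_manager, input_demo$stage, sum)
print(stacked)  # A = 90, B = 270, C = 90
```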
To get the chart I believe you want, you could do:
#Q1: Remove duplicate rows so geom_bar doesn't erroneously add them together
input2 <- input[!duplicated(input[,-c(2,4)]),]
ggplot (input2, aes(x=stage, y = count_proj)) +
geom_bar(stat = 'identity')+
geom_bar(aes(y=-sum_value_byStage), #Q1: changed so this y-value matches your label
stat = "identity", fill = "Blue") +
scale_y_continuous(labels = commapos)+
coord_flip() +
ylab('') +
geom_text(aes(label= sum_value_byStage, y = -sum_value_byStage), hjust = 1) + #Q2: Added in y-value for label and hjust so it will be on top
geom_text(aes(label= count_proj), hjust = -1) +
labs(title = "Emma: 4 projects| $90M Values \n \n Commitment|Projects") +
theme(plot.title = element_text(hjust = 0.5)) +
geom_hline(yintercept = 0, linetype =1)
For your last question, there is no good way to display both of these. One option would be to rescale the small data and still label it with a 1 or 3. However, I didn't do this because once you scale down the blue bars the other bars look OK to me.
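For completeness, the rescaling option could look roughly like the sketch below. It is not what the code above does. The data frame input_demo is a hand-made per-stage summary of Emma's rows, and scale_factor is a hypothetical choice that stretches the longest count bar to the length of the longest value bar.

```r
library(ggplot2)

# Hand-made per-stage summary for Emma (assumed to match the prepared data)
input_demo <- data.frame(
  stage = c("A", "B", "C"),
  count_proj = c(1, 3, 1),
  sum_value_byStage = c(20, 50, 20)
)

# Stretch the counts so the longest count bar matches the longest value bar
scale_factor <- max(input_demo$sum_value_byStage) / max(input_demo$count_proj)

p <- ggplot(input_demo, aes(x = stage)) +
  geom_col(aes(y = count_proj * scale_factor)) +
  geom_col(aes(y = -sum_value_byStage), fill = "blue") +
  # Label each bar with its original, unscaled number
  geom_text(aes(y = count_proj * scale_factor, label = count_proj), hjust = -0.5) +
  geom_text(aes(y = -sum_value_byStage, label = sum_value_byStage), hjust = 1.5) +
  geom_hline(yintercept = 0) +
  coord_flip() +
  ylab("") +
  # The axis numbers no longer describe the count side, so hide them
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
p
```

The trade-off is that the numeric axis stops being meaningful for the count bars, which is why the sketch hides it and relies on the text labels instead.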

Add text to geom_line in ggplot

I am trying to create a line plot for 2 stocks AAPL and FB. Instead of adding a separate legend, I would like to print the stock symbols along with the lines. How can I add geom_text to the following code? I appreciate any help you could provide.
library (ggplot2)
library(quantmod)
getSymbols('AAPL')
getSymbols('FB')
AAPL = data.frame(AAPL)
FB = data.frame(FB)
p1 = ggplot(AAPL)+geom_line(data=AAPL,aes(as.Date(rownames(AAPL)),AAPL.Adjusted,color="AAPL"))
p2 = p1+geom_line(data=FB,aes(as.Date(rownames(FB)),FB.Adjusted,color="FB"))
p2 + xlab("Year")+ylab("Price")+theme_bw()+theme(legend.position="none")
This is the sort of plot that is perfect for the directlabels package. And it is easier to plot if the data is available in one dataframe.
# Data
library(quantmod)
getSymbols('AAPL')
getSymbols('FB')
AAPL = data.frame(AAPL)
FB = data.frame(FB)
# rbind into one dataframe
AAPL$label = "AAPL"
FB$label = "FB"
names = gsub("^FB\\.(.*$)", "\\1", names(FB))
names(AAPL) = names
names(FB) = names
df = rbind(AAPL, FB)
# Packages
library(ggplot2)
library(directlabels)
# The plot - labels at the beginning and the ends of the lines.
ggplot(df, aes(as.Date(rownames(df)), Adjusted, group = label, colour = label)) +
geom_line() +
scale_colour_discrete(guide = 'none') +
geom_dl(aes(label = label), method = list(dl.combine("first.points", "last.points")))
A better plot: Increase the space between the end points of the lines and the labels. See here for other options.
ggplot(df, aes(as.Date(rownames(df)), Adjusted, group = label, colour = label)) +
geom_line() +
scale_colour_discrete(guide = 'none') +
scale_x_date(expand=c(0.1, 0)) +
geom_dl(aes(label = label), method = list(dl.trans(x = x + .2), "last.points")) +
geom_dl(aes(label = label), method = list(dl.trans(x = x - .2), "first.points"))
Question is possibly a duplicate of this one.
You simply have to add geom_text, as you said:
Define the x, y positions, the label you want to appear (and the color):
library(quantmod)
getSymbols('AAPL')
getSymbols('FB')
AAPL = data.frame(AAPL)
FB = data.frame(FB)
p1 = ggplot(AAPL)+geom_line(data=AAPL,aes(as.Date(rownames(AAPL)),AAPL.Adjusted,color="AAPL"))
p2 = p1+geom_line(data=FB,aes(as.Date(rownames(FB)),FB.Adjusted,color="FB"))
p2 + xlab("Year") + ylab("Price")+theme_bw()+theme(legend.position="none") +
geom_text(aes(x = as.Date("2011-06-07"), y = 60, label = "AAPL", color = "AAPL")) +
geom_text(aes(x = as.Date("2014-10-01"), y = 45, label = "FB", color = "FB"))
EDIT
If you want to find the x and y positions for geom_text automatically, you will face new problems with overlapping labels as the number of variables grows.
Here is the beginning of a solution; you might adapt the method used to define x and y:
AAPL$date = rownames(AAPL)
AAPL$var1 = "AAPL"
names(AAPL)[grep("AAPL", names(AAPL))] = gsub("AAPL.", "", names(AAPL)[grep("AAPL", names(AAPL))])
FB$date = rownames(FB)
FB$var1 = "FB"
names(FB)[grep("FB", names(FB))] = gsub("FB.", "", names(FB)[grep("FB", names(FB))])
# bind the 2 data frames
df = rbind(AAPL, FB)
# where do you want the legend to appear
legend = data.frame(matrix(ncol = 3, nrow = length(unique(df$var1))))
colnames(legend) = c("x_pos" , "y_pos" , "label")
legend$label = unique(df$var1)
legend$x_pos = as.POSIXct(legend$x_pos)
df$date = as.POSIXct(df$date)
for (i in legend$label)
{
legend$x_pos[legend$label == i] <- as.POSIXct(min(df$date[df$var1 == i]) +
as.numeric(difftime(max(df$date[df$var1 == i]), min(df$date[df$var1 == i]), units = "sec"))/2)
legend$y_pos[legend$label == i] <- df$Adjusted[df$date > legend$x_pos[legend$label == i] & df$var1 == i][1]
}
# Plot
ggplot(df, aes(x = as.POSIXct(date), y = Adjusted, color = var1)) +
geom_line() + xlab("Year") + ylab("Price") +
geom_text(data = legend, aes(x = x_pos, y = y_pos, label = label, color = label, hjust = -1, vjust = 1)) +
guides(color = "none")

guide_legend and ggplot2, format nrow

I am trying to format an over-long legend on a ggplot so that there is a maximum no. of rows. I've read all the documentation that I could find, especially this: http://docs.ggplot2.org/0.9.3.1/guide_legend.html but for some reason, the legend will not format.
I've given a reproducible sample below using the quakes dataset, and converted the column stations to character so that they plot individually (otherwise, they seem to plot as groups).
plotquakes <- function(magreq) {
library(ggplot2)
magdata <- subset(quakes, mag > magreq)
magdata$stations <- as.character(magdata$stations)
g <- ggplot(magdata, aes (x = lat, y = long))
g + geom_point(aes(alpha = stations), fill = "black", pch=21, size = 6) +
labs(x = "Latitude", y = "Longitude") +
geom_vline(xintercept = 0, col = "red") +
geom_hline(yintercept = 0, col = "red") +
guides(col = guide_legend(nrow = 16))
}
plotquakes(5)
And what I get is this:
whereas I would like to have a maximum of 16 data fields per column in the legend.
You are changing the wrong guide.
plotquakes <- function(magreq) {
library(ggplot2)
magdata <- subset(quakes, mag > magreq)
magdata$stations <- as.character(magdata$stations)
g <- ggplot(magdata, aes (x = lat, y = long))
g + geom_point(aes(alpha = stations), fill = "black", pch=21, size = 6) +
labs(x = "Latitude", y = "Longitude") +
geom_vline(xintercept = 0, col = "red") +
geom_hline(yintercept = 0, col = "red") +
guides(alpha = guide_legend(nrow = 16)) #note it's alpha not col
}
plotquakes(5)
