R barplot cumulative - x date, y freq - r

I'm trying to recreate this kind of plot in R but I'm not having much success.
X is the date and Y is the frequency of a discrete variable, stacked cumulatively on one bar.
I'm also trying to put it in a function so it would be easier to use this kind of plot for different variables.
[link to the plot image]
I'd appreciate any help!
Data example:
[Excel plot example]
Purchase_date Phone
2014-10-23 Sony
2014-10-23 Apple
2014-10-23 Nokia
2014-10-23 Nokia
2014-10-24 NA
2014-10-24 Nokia
2014-10-24 Sony
2014-10-24 Other
2014-10-24 Apple
2014-10-25 Sony
2014-10-25 NA
2014-10-25 Apple
2014-10-25 Sony
2014-10-25 Nokia
Also, I have something like this, but it's definitely far from a universal method for different variables:
# needs dplyr, lubridate, zoo and xts loaded
base_table %>%
  filter(year(as.Date(BUY_DATE)) >= 2014, year(as.Date(BUY_DATE)) <= 2017) %>%
  mutate(BUY_DATE = as.yearmon(as.Date(BUY_DATE))) %>%
  group_by(PHONETYPE, BUY_DATE) %>%
  summarise(n = n()) -> applPerTypeAndMonth
applPerTypeAndMonth %>% pull(PHONETYPE) %>% table()
# one xts series per phone type -- this is the part that doesn't generalise
filter(applPerTypeAndMonth, PHONETYPE == '') -> x
xts(x$n, order.by = x$BUY_DATE) -> type1
filter(applPerTypeAndMonth, PHONETYPE == 'NOKIA') -> x
xts(x$n, order.by = x$BUY_DATE) -> type2
filter(applPerTypeAndMonth, PHONETYPE == 'APPLE') -> x
xts(x$n, order.by = x$BUY_DATE) -> type3
filter(applPerTypeAndMonth, PHONETYPE == 'SONY') -> x
xts(x$n, order.by = x$BUY_DATE) -> type4
filter(applPerTypeAndMonth, PHONETYPE == 'HUAWEI') -> x
xts(x$n, order.by = x$BUY_DATE) -> type5
filter(applPerTypeAndMonth, PHONETYPE == 'LG') -> x
xts(x$n, order.by = x$BUY_DATE) -> type6
filter(applPerTypeAndMonth, PHONETYPE == 'OTHER') -> x
xts(x$n, order.by = x$BUY_DATE) -> type7
# combine, zero-fill and plot
merge(type1, type2, type3, type4, type5, type6, type7) -> types
na.fill(types, fill = 0.0) -> types
barplot(types, col = rainbow(7))
types %>% apply(1, function(x) x / sum(x)) %>% barplot(col = rainbow(7))
# legend("topright", legend = names(types), fill = rainbow(7))

Using data.table, first create a summary table giving the number of purchases of each phone on each day.
summary = purchases[, list(Purchases = .N), by = list(Purchase_date, Phone)]
Then split this out by phone type, and in each sub-dataset order by date and add a cumulative purchases variable.
splitted = split(summary, summary$Phone)
splitted = lapply(splitted, function(x){
  x = x[order(Purchase_date)]
  x$CumulativePurchases = cumsum(x$Purchases)
  return(x)})
Then rbindlist back together into a single data.table, and you can use ggplot2 easily.
summary = rbindlist(splitted)
plotted = ggplot(summary, aes(x = Purchase_date, y = CumulativePurchases, fill = Phone)) + geom_bar(stat = "identity")
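As a side note, here is a minimal sketch of the same idea without the split/lapply step, assuming purchases is already a data.table with Purchase_date and Phone columns (cum_summary is just an illustrative name):
library(data.table)
library(ggplot2)
cum_summary = purchases[, .(Purchases = .N), by = .(Purchase_date, Phone)]
# order within each phone type by date, then take the running total per group
setorder(cum_summary, Phone, Purchase_date)
cum_summary[, CumulativePurchases := cumsum(Purchases), by = Phone]
ggplot(cum_summary, aes(x = Purchase_date, y = CumulativePurchases, fill = Phone)) +
  geom_col()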

Something along the lines of this:
dta <- structure(list(Purchase_date = structure(c(1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("2014-10-23",
"2014-10-24", "2014-10-25"), class = "factor"), Phone = structure(c(4L,
1L, 2L, 2L, NA, 2L, 4L, 3L, 1L, 4L, NA, 1L, 4L, 2L), .Label = c("Apple",
"Nokia", "Other", "Sony"), class = "factor")), .Names = c("Purchase_date",
"Phone"), class = "data.frame", row.names = c(NA, -14L))
# install.packages(c("ggplot2"), dependencies = TRUE)
library(ggplot2)
g <- ggplot(dta, aes(Purchase_date))
g + geom_bar(aes(fill = Phone))
Updated: here's the plot wrapped in a function,
function.name <- function(df)
{
  require(ggplot2)
  p <- ggplot(df, aes(x = Purchase_date))
  p + geom_bar(aes(fill = Phone))
}
function.name(dta)
I'd obviously recommend you take a look at this site to learn how to label, color, reorder, etc.
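For example, a rough sketch of that kind of polishing, building on the function above (the labels, palette and level order here are just placeholders):
library(ggplot2)
# reorder the stacking by re-leveling the factor (example order)
dta$Phone <- factor(dta$Phone, levels = c("Sony", "Apple", "Nokia", "Other"))
function.name(dta) +
  labs(title = "Purchases per day", x = "Purchase date", y = "Count", fill = "Phone") +
  scale_fill_brewer(palette = "Set2", na.value = "grey70") +
  theme_minimal()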

# load packages
library(tidyverse)
library(lubridate)
# create a dataframe from your data
df <- frame_data(
~Purchase_date, ~Phone
, "2014-10-23", "Sony"
, "2014-10-23", "Apple"
, "2014-10-23", "Nokia"
, "2014-10-23", "Nokia"
, "2014-10-24", "NA"
, "2014-10-24", "Nokia"
, "2014-10-24", "Sony"
, "2014-10-24", "Other"
, "2014-10-24", "Apple"
, "2014-10-25", "Sony"
, "2014-10-25", NA
, "2014-10-25", "Apple"
, "2014-10-25", "Sony"
, "2014-10-25", "Nokia"
)
# make dates dates, if you want to
df <- df %>%
  mutate(Purchase_date = as_date(Purchase_date))
# and plot it
df %>%
  ggplot(aes(Purchase_date, fill = Phone)) +
  geom_bar()
ggplot() and geom_bar() are functions, and they do what you want (and actually a whole lot more if desired). How to plot can be read up on, e.g., in the R Graphics Cookbook, which really helps whenever you need it.
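If you want to wrap this in a reusable function for different variables (as asked in the question), one possible sketch uses tidy evaluation; the function name and arguments here are just illustrative, and it assumes a reasonably recent ggplot2/rlang:
library(ggplot2)
# date_col and fill_col are passed unquoted and spliced into aes() with {{ }}
plot_stacked_counts <- function(data, date_col, fill_col) {
  ggplot(data, aes(x = {{ date_col }}, fill = {{ fill_col }})) +
    geom_bar()
}
plot_stacked_counts(df, Purchase_date, Phone)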

Related

How to change the legend title of my ggplots based on outcomes?

I wonder if there is an efficient way to change the legend title of my ggplots based on outcomes.
Example
I have a function which helps me to filter a database by type and by county.
df without filter
county   date        value  type
-------  ----------  -----  --------
Alameda  2020-01-01  6      positive
Alameda  2020-01-02  2      negative
Alameda  2020-01-03  1      positive
LA       2020-01-04  4      positive
LA       2020-01-03  1      positive
Function
function_forggplot <- function(data = df,
                               select_county = "Alameda",
                               type_order = unique(df$type)) {
  # Filter the database
  df_outcome <- df[df$county %in% select_county, ]
  df_outcome <- df_outcome[df_outcome$type %in% type_order, ]
  gg_outcome <- ggplot(
    data = df_outcome,
    aes(x = date,
        y = value,
        color = type)) +
    geom_line(size = .5)
  gg_outcome
}
I want to change the title of my ggplot based on the outcome. For example, if the user selects county = LA and type = positive, I want a title in my ggplot like "Results positives for LA".
It works with if/else conditionals, but I have more than 100 cases, so I don't think that is a good option.
Expected outcome
function_forggplot(county = "Alameda", type = "negatives")
A ggplot object with this title: "Results negatives for Alameda"
function_forggplot(county = "Fresno", type = "positives")
A ggplot object with this title: "Results positives for Fresno"
Thanks
You can adapt the code you showed into the function directly, like this. You can also use paste0() for the title (no need for other packages):
library(ggplot2)
#Function
funplot <- function(df, select_county, type_order) {
  # Filter the data
  df_outcome <- df[df$county == select_county, ]
  df_outcome <- df_outcome[df_outcome$type == type_order, ]
  # Plot
  ggplot(df_outcome, aes(x = date, y = value)) +
    geom_line() +
    ggtitle(paste0("Results ", type_order, " for ", select_county))
}
#Apply
funplot(mydf, 'Alameda', 'positive')
Output:
Some data used:
#Data
mydf <- structure(list(county = c("Alameda", "Alameda", "Alameda", "LA",
"LA"), date = structure(c(18262, 18263, 18264, 18265, 18264), class = "Date"),
value = c(6L, 2L, 1L, 4L, 1L), type = c("positive", "negative",
"positive", "positive", "positive")), row.names = c(NA, -5L
), class = "data.frame")
We can create the function with glue, as it is very flexible for interpolating objects specified within {}. Of course, we can use paste or sprintf as well. In any case, ggplot2 is an external package, so using another package from the tidyverse keeps this consistent.
library(ggplot2)
library(dplyr)
f1 <- function(dat, county_nm, type_nm) {
dat %>%
filter(county == county_nm, type == type_nm) %>%
ggplot(aes(x = date, y = value)) +
geom_line() +
ggtitle(glue::glue("Results {type_nm} for {county_nm}"))
}
Then we call it as
f1(df, 'LA', 'positive')
Output:
Or without any packages (other than ggplot2, dplyr)
f2 <- function(dat, county_nm, type_nm) {
dat %>%
filter(county == county_nm, type == type_nm) %>%
ggplot(aes(x = date, y = value)) +
geom_line() +
ggtitle(sprintf("Results %s for %s", type_nm, county_nm))
}
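It is called the same way, for example:
f2(df, 'LA', 'positive')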
data
df <- structure(list(county = c("Alameda", "Alameda", "Alameda", "LA",
"LA"), date = structure(c(18262, 18263, 18264, 18265, 18264), class = "Date"),
value = c(6L, 2L, 1L, 4L, 1L), type = c("positive", "negative",
"positive", "positive", "positive")), row.names = c(NA, -5L
), class = "data.frame")

ggplot2::coord_cartesian on facets

coord_cartesian doesn't allow one to set per-facet coordinates, and other range-limiting approaches tend to produce a straight line at the clipped extreme. Since we have widely varying y-ranges, we can't set the limits on all facets identically; limiting the data before plotting is not as friendly with geom_line/geom_path (https://stackoverflow.com/a/27319786/3358272), since it takes a lot more effort to interpolate the data out to the edge and then insert NAs in order to break up the line. (Ultimately, the only way to get the desired result is to do exactly this, which can be a bit onerous with other data.)
One workaround is suggested in https://gist.github.com/burchill/d780d3e8663ad15bcbda7869394a348a, which starts with
test_data %>%
ggplot(aes(x=Nsubjects, y = Odds, color=EffectSize)) +
facet_wrap(DataType ~ ExpType, labeller = label_both, scales="free") +
geom_line(size=2) +
geom_ribbon(aes(ymax=Upper, ymin=Lower, fill=EffectSize, color=NULL), alpha=0.2)
and in previous versions of ggplot2, that gist defined coord_panel_ranges and was able to control coordinates per facet. The two right facets should narrow down to roughly a 1-6 y-axis so that the exploding confidence interval goes off-screen and the facet can focus primarily on the "normal range" of the data. (Note: the test_data and this visualization are not mine; they are taken from the gist. While my needs are somewhat similar, I thought it better to stay within the confines of the gist's data and code.)
Unfortunately, this now fails for me with ggplot2-3.3.0. The initial errors related to the recent loss of ggplot2::scale_range, which I tried to mitigate with this adaptation of burchill's code (which uses other ggplot2::: internal functions):
UniquePanelCoords <- ggplot2::ggproto(
"UniquePanelCoords", ggplot2::CoordCartesian,
num_of_panels = 1,
panel_counter = 1,
panel_ranges = NULL,
setup_layout = function(self, layout, params) {
self$num_of_panels <- length(unique(layout$PANEL))
self$panel_counter <- 1
layout
},
setup_panel_params = function(self, scale_x, scale_y, params = list()) {
if (!is.null(self$panel_ranges) & length(self$panel_ranges) != self$num_of_panels)
stop("Number of panel ranges does not equal the number supplied")
train_cartesian <- function(scale, limits, name, given_range = NULL) {
if (is.null(given_range)) {
expansion <- ggplot2:::default_expansion(scale, expand = self$expand)
range <- ggplot2:::expand_limits_scale(scale, expansion,
coord_limits = self$limits[[name]])
} else {
range <- given_range
}
out <- scale$break_info(range)
out$arrange <- scale$axis_order()
names(out) <- paste(name, names(out), sep = ".")
out
}
cur_panel_ranges <- self$panel_ranges[[self$panel_counter]]
if (self$panel_counter < self$num_of_panels)
self$panel_counter <- self$panel_counter + 1
else
self$panel_counter <- 1
c(train_cartesian(scale_x, self$limits$x, "x", cur_panel_ranges$x),
train_cartesian(scale_y, self$limits$y, "y", cur_panel_ranges$y))
}
)
coord_panel_ranges <- function(panel_ranges, expand = TRUE, default = FALSE, clip = "on") {
ggplot2::ggproto(NULL, UniquePanelCoords, panel_ranges = panel_ranges,
expand = expand, default = default, clip = clip)
}
but this is still failing with
test_data %>%
ggplot(aes(x=Nsubjects, y = Odds, color=EffectSize)) +
facet_wrap(DataType ~ ExpType, labeller = label_both, scales="free") +
geom_line(size=2) +
geom_ribbon(aes(ymax=Upper, ymin=Lower, fill=EffectSize, color=NULL), alpha=0.2) +
coord_panel_ranges(panel_ranges = list(
list(x=c(8,64), y=c(1,4)), # Panel 1
list(x=c(8,64), y=c(1,6)), # Panel 2
list(NULL), # Panel 3, an empty list falls back on the default values
list(x=c(8,64), y=c(1,7)) # Panel 4
))
# Error in panel_params$x$break_positions_minor() :
# attempt to apply non-function
I'm not very familiar with extending ggplot2, and I suspect there is something I'm missing from the ggproto. Here's what the return value from the proto looks like:
str(c(train_cartesian(scale_x, self$limits$x, "x", cur_panel_ranges$x),
train_cartesian(scale_y, self$limits$y, "y", cur_panel_ranges$y)))
# List of 14
# $ x.range : num [1:2] 8 64
# $ x.labels : chr [1:3] "20" "40" "60"
# $ x.major : num [1:3] 0.214 0.571 0.929
# $ x.minor : num [1:6] 0.0357 0.2143 0.3929 0.5714 0.75 ...
# $ x.major_source: num [1:3] 20 40 60
# $ x.minor_source: num [1:6] 10 20 30 40 50 60
# $ x.arrange : chr [1:2] "secondary" "primary"
# $ y.range : num [1:2] 1 4
# $ y.labels : chr [1:4] "1" "2" "3" "4"
# $ y.major : num [1:4] 0 0.333 0.667 1
# $ y.minor : num [1:7] 0 0.167 0.333 0.5 0.667 ...
# $ y.major_source: num [1:4] 1 2 3 4
# $ y.minor_source: num [1:7] 1 1.5 2 2.5 3 3.5 4
# $ y.arrange : chr [1:2] "primary" "secondary"
Do I need to have an x element that's a list with at least a break_positions_minor function, or is there something else that needs to be inherited in order to ensure panel_params$x$break_positions_minor exists or a reasonable default is used?
Data:
test_data <- structure(list(DataType = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
ExpType = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("X", "Y"), class = "factor"),
EffectSize = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("15", "35"
), class = "factor"), Nsubjects = c(8, 16, 32, 64, 8, 16,
32, 64, 8, 16, 32, 64, 8, 16, 32, 64, 8, 16, 32, 64, 8, 16,
32, 64, 8, 16, 32, 64, 8, 16, 32, 64), Odds = c(1.06248116259846,
1.09482076720863, 1.23086993413208, 1.76749340505612, 1.06641831731573,
1.12616954196688, 1.48351814320987, 3.50755080416964, 1.11601399761081,
1.18352602009495, 1.45705466646283, 2.53384744810515, 1.13847061762186,
1.24983742407086, 1.97075900741022, 6.01497152563726, 1.02798821372378,
1.06297006279249, 1.19432835697453, 1.7320754674107, 1.02813271730924,
1.09355953747203, 1.44830680332583, 3.4732692664923, 1.06295915758305,
1.12008443626365, 1.3887632112682, 2.46321037334, 1.06722652223114,
1.1874936754725, 1.89870184372054, 5.943747409114), Upper = c(1.72895843644471,
2.09878774769559, 2.59771794965346, 5.08513435549015, 1.72999898901071,
1.8702196882561, 3.85385388850167, 5.92564404180303, 1.99113042576373,
2.61074135841984, 3.45852331828636, 4.83900142207583, 1.57897154221764,
1.8957409107653, 10, 75, 2.3763918424135, 2.50181951057562,
3.45037180395673, 3.99515276392065, 2.04584535265976, 2.39317394040066,
2.832526733659, 5.38414183471915, 1.40569501856836, 2.6778044191832,
2.98023068052396, 4.75934650422069, 1.54116883311054, 2.50647989271592,
3.48517589981551, 100), Lower = c(0.396003888752214, 0.0908537867216577,
-0.135978081389309, -1.55014754537791, 0.40283764562075,
0.382119395677663, -0.88681760208193, 1.08945756653624, 0.240897569457892,
-0.243689318229938, -0.544413985360706, 0.228693474134466,
0.69796969302609, 0.603933937376415, 0.183548809738402, 3.57236968943798,
-0.320415414965949, -0.375879384990643, -1.06171509000767,
-0.531001829099242, 0.010420081958713, -0.206054865456611,
0.0640868729926525, 1.56239669826544, 0.720223296597732,
-0.437635546655903, -0.202704257987574, 0.167074242459314,
0.593284211351745, -0.131492541770921, 0.312227787625573,
3.76692741957876)), .Names = c("DataType", "ExpType", "EffectSize",
"Nsubjects", "Odds", "Upper", "Lower"), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -32L))
I modified the function train_cartesian to match the output format of view_scales_from_scale (defined here), which seems to work:
train_cartesian <- function(scale, limits, name, given_range = NULL) {
if (is.null(given_range)) {
expansion <- ggplot2:::default_expansion(scale, expand = self$expand)
range <- ggplot2:::expand_limits_scale(scale, expansion,
coord_limits = self$limits[[name]])
} else {
range <- given_range
}
out <- list(
ggplot2:::view_scale_primary(scale, limits, range),
sec = ggplot2:::view_scale_secondary(scale, limits, range),
arrange = scale$axis_order(),
range = range
)
names(out) <- c(name, paste0(name, ".", names(out)[-1]))
out
}
p <- test_data %>%
ggplot(aes(x=Nsubjects, y = Odds, color=EffectSize)) +
facet_wrap(DataType ~ ExpType, labeller = label_both, scales="free") +
geom_line(size=2) +
geom_ribbon(aes(ymax=Upper, ymin=Lower, fill=EffectSize, color=NULL), alpha=0.2)
p +
coord_panel_ranges(panel_ranges = list(
list(x=c(8,64), y=c(1,4)), # Panel 1
list(x=c(8,64), y=c(1,6)), # Panel 2
list(NULL), # Panel 3, an empty list falls back on the default values
list(x=c(8,64), y=c(1,7)) # Panel 4
))
Original answer
I've cheated my way out of a similar problem before.
# alternate version of plot with data truncated to desired range for each facet
p.alt <- p %+% {test_data %>%
mutate(facet = as.integer(interaction(DataType, ExpType, lex.order = TRUE))) %>%
left_join(data.frame(facet = 1:4,
ymin = c(1, 1, -Inf, 1), # change values here to enforce
ymax = c(4, 6, Inf, 7)), # different axis limits
by = "facet") %>%
mutate_at(vars(Odds, Upper, Lower), list(~ ifelse(. < ymin, ymin, .))) %>%
mutate_at(vars(Odds, Upper, Lower), list(~ ifelse(. > ymax, ymax, .))) }
# copy alternate version's panel parameters to original plot & plot the result
p1 <- ggplot_build(p)
p1.alt <- ggplot_build(p.alt)
p1$layout$panel_params <- p1.alt$layout$panel_params
p2 <- ggplot_gtable(p1)
grid::grid.draw(p2)
Many thanks go to Z.Lin for starting the fix to my question; that answer certainly helped me get past the errors and learn a more appropriate way of working with ggproto objects.
This answer is posted as a more flexible method of fixing the underlying problem of per-panel limits within a faceted plot. The major issue I had with my first batch of code was that it relied on the ordering of the facets, which in some of my other (private) use-cases is not always known (well, not controlled) a priori. Because of this, I wanted an unambiguous determination of per-panel limits.
I've changed the function name (and the args) to reflect two points: (1) this appears to mimic/replace coord_cartesian, and (2) I don't know that it will translate to other coord_* functions without adjustment. Comments/patches welcome at my gist.
Up front, a perfect duplication of Z.Lin's results can be had with:
p <- test_data %>%
ggplot(aes(x = Nsubjects, y = Odds, color=EffectSize)) +
facet_wrap(DataType ~ ExpType, labeller = label_both, scales = "free") +
geom_line(size = 2) +
geom_ribbon(aes(ymax = Upper, ymin = Lower, fill = EffectSize, color = NULL), alpha = 0.2)
p + coord_cartesian_panels(
panel_limits = tibble::tribble(
~DataType, ~ExpType, ~ymin, ~ymax
, "A" , "X" , 1, 4
, "A" , "Y" , 1, 6
, "B" , "Y" , 1, 7
)
)
and gone is the ambiguity (that the original code introduced) of which panel is which argument in the list. Since it uses a data.frame to match (usually merge) with the layout of the plot, the order of rows does not matter.
Notes:
the panel_limits fields referenced are: xmin, xmax, ymin, and ymax, on top of whichever faceting variables are desired;
an NA in a particular field (or a missing field) means to use the previously-defined limit;
when all faceting-variables match (between panel_limits and the layout defined by facet_*), the limits are set on individual panels; this one-to-one mapping is the going-in assumption about this function;
when some (but not all) variables match, the limits are set on a subset of panels (e.g., on one axis of the panels, depending on the faceting method);
when no variables match and panel_limits is a single row, then set the limits for all panels indiscriminately; and
faceting rows in panel_limits that match nothing in layout are silently ignored.
Errors:
any faceting variables in panel_limits that do not exist in the layout (i.e., not specified within facet_*); or
more than one row in panel_limits matches a particular panel.
As an extension, this also handles a subset of the faceting variables, so if we want to limit all facets by ExpType only, then
# set the limits on panels based on one faceting variable only
p + coord_cartesian_panels(
panel_limits = tibble::tribble(
~ExpType, ~ymin, ~ymax
, "X" , NA, 4
, "Y" , 1, 5
)
) + labs(title = "panel_limits, one variable")
# set the limits on all panels
p + coord_cartesian_panels(
panel_limits = tibble::tribble(
~ymin, ~ymax
, NA, 5
)
) + labs(title = "panel_limits, no variables")
(The last example seems silly, but if the facets/plots are being built programmatically and it is not guaranteed a priori that there are individual facets, then this will result in a reasonable default behavior, assuming that everything is otherwise unambiguous.)
A further extension might allow for an NA in a facet variable to match all, such as
# does not work
p + coord_cartesian_panels(
panel_limits = tibble::tribble(
~DataType, ~ExpType, ~ymin, ~ymax
, "A" , NA , 1, 4
, NA , "Y" , 1, 6
)
)
This would require that merge understand that NA means "all/any", not a literal NA. I'm not going to extend merge at the moment to handle that, so I'm not going to complicate this function to attempt to do that. If there is a reasonable merge replacement that does this kind of calculus, let me know :-)
Many Thanks to ...
burchill for the original effort and gist; and
Z.Lin, for helping to bring the function up to ggplot2-3.3.0.
UniquePanelCoords <- ggplot2::ggproto(
"UniquePanelCoords", ggplot2::CoordCartesian,
num_of_panels = 1,
panel_counter = 1,
layout = NULL,
setup_layout = function(self, layout, params) {
self$num_of_panels <- length(unique(layout$PANEL))
self$panel_counter <- 1
self$layout <- layout # store for later
layout
},
setup_panel_params = function(self, scale_x, scale_y, params = list()) {
train_cartesian <- function(scale, limits, name, given_range = c(NA, NA)) {
if (anyNA(given_range)) {
expansion <- ggplot2:::default_expansion(scale, expand = self$expand)
range <- ggplot2:::expand_limits_scale(scale, expansion, coord_limits = limits)
isna <- is.na(given_range)
given_range[isna] <- range[isna]
}
out <- list(
ggplot2:::view_scale_primary(scale, limits, given_range),
sec = ggplot2:::view_scale_secondary(scale, limits, given_range),
arrange = scale$axis_order(),
range = given_range
)
names(out) <- c(name, paste0(name, ".", names(out)[-1]))
out
}
this_layout <- self$layout[ self$panel_counter,, drop = FALSE ]
self$panel_counter <-
if (self$panel_counter < self$num_of_panels) {
self$panel_counter + 1
} else 1
# determine merge column names by removing all "standard" names
layout_names <- setdiff(names(this_layout),
c("PANEL", "ROW", "COL", "SCALE_X", "SCALE_Y"))
limits_names <- setdiff(names(self$panel_limits),
c("xmin", "xmax", "ymin", "ymax"))
limit_extras <- setdiff(limits_names, layout_names)
if (length(limit_extras) > 0) {
stop("facet names in 'panel_limits' not found in 'layout': ",
paste(sQuote(limit_extras), collapse = ","))
} else if (length(limits_names) == 0 && NROW(self$panel_limits) == 1) {
# no panels in 'panel_limits'
this_panel_limits <- cbind(this_layout, self$panel_limits)
} else {
this_panel_limits <- merge(this_layout, self$panel_limits, all.x = TRUE, by = limits_names)
}
if (isTRUE(NROW(this_panel_limits) > 1)) {
stop("multiple matches for current panel in 'panel_limits'")
}
# add missing min/max columns, default to "no override" (NA)
this_panel_limits[, setdiff(c("xmin", "xmax", "ymin", "ymax"),
names(this_panel_limits)) ] <- NA
c(train_cartesian(scale_x, self$limits$x, "x",
unlist(this_panel_limits[, c("xmin", "xmax"), drop = TRUE])),
train_cartesian(scale_y, self$limits$y, "y",
unlist(this_panel_limits[, c("ymin", "ymax"), drop = TRUE])))
}
)
coord_cartesian_panels <- function(panel_limits, expand = TRUE, default = FALSE, clip = "on") {
ggplot2::ggproto(NULL, UniquePanelCoords,
panel_limits = panel_limits,
expand = expand, default = default, clip = clip)
}
At some point I had a similar problem to this. The result was a slightly more verbose but also more flexible option that can customize many aspects of position scales on a per-facet basis. Due to some technicality it uses the equivalent of scales::oob_keep() as oob arguments on the scales, thereby acting as if the coordinates determined the limits.
library(ggh4x)
library(tidyverse)
p <- test_data %>%
ggplot(aes(x=Nsubjects, y = Odds, color=EffectSize)) +
facet_wrap(DataType ~ ExpType, labeller = label_both, scales="free") +
geom_line(size=2) +
geom_ribbon(aes(ymax=Upper, ymin=Lower, fill=EffectSize, color=NULL), alpha=0.2) +
facetted_pos_scales(
x = list(
scale_x_continuous(limits = c(8, 64)),
scale_x_continuous(limits = c(64, 8), trans = "reverse"),
NULL,
scale_x_continuous(limits = c(8, 64), labels = scales::dollar_format())
),
y = list(
scale_y_continuous(limits = c(1, 4), guide = "none"),
scale_y_continuous(limits = c(1, 6), breaks = 1:3),
NULL,
scale_y_continuous(limits = c(1, 7), position = "right")
)
)

Adjust area from geom_area to a line from geom_line

I'm trying to make an hourly dispatch curve from generation and energy consumption data, which have the characteristic that when we do a power balance (generation minus consumption) we get values very close to zero.
The generation data also include net interchange values, which are negative when the power system is exporting energy and positive when the system is importing energy to complete the consumption.
Thus, for the plot created with geom_area and geom_line to be right, the black line (consumption) needs to line up with the generation area, so that there's no gap between the area and the black line. But in my attempts I couldn't do it. As you can see, even though the energy balance is zero, there is a gap between 19 and 20 hours. I don't know what is wrong. Does anyone have an idea how to do that?
Thanks in advance.
Data to the plot:
generation <-
data.frame('dayHour' = c('18/11/2018 18:00','18/11/2018 19:00','18/11/2018 20:00','18/11/2018 21:00','18/11/2018 18:00','18/11/2018 19:00','18/11/2018 20:00','18/11/2018 21:00','18/11/2018 18:00','18/11/2018 19:00','18/11/2018 20:00','18/11/2018 21:00','18/11/2018 18:00','18/11/2018 19:00','18/11/2018 20:00','18/11/2018 21:00'),
'power' = c(-1364.290, -433.110, 1132.39, 749.48, 463.75, 467.8, 469.35, 436.51, 2025.5, 2133.07, 2306.85, 2304.91, 211.52, 213.16, 214.33, 214.59),
'label' = c('net interchange', 'net interchange', 'net interchange', 'net interchange', 'gas', 'gas', 'gas', 'gas', 'hydro', 'hydro', 'hydro', 'hydro', 'biomass', 'biomass', 'biomass', 'biomass'))
generation$label <- factor(generation$label, levels = c('net interchange', 'gas', 'hydro', 'biomass'))
net.load <-
data.frame('dayHour' = c('18/11/2018 18:00', '18/11/2018 19:00', '18/11/2018 20:00', '18/11/2018 21:00'), 'power' = c(1336.48, 2380.91, 4122.91, 3705.49), 'label' = c('net load', 'net load', 'net load', 'net load'))
generation$dayHour <-
as.POSIXct(strptime(generation$dayHour,format='%d/%m/%Y %H:%M'))
net.load$dayHour <-
as.POSIXct(strptime(net.load$dayHour,format='%d/%m/%Y %H:%M'))
Power balance
pb <-
filter(generation, label == "biomass")$power +
filter(generation, label == "hydro")$power +
filter(generation, label == "gas")$power +
filter(generation, label == "net interchange")$power -
net.load$power
summary(pb)
Dispatch curve
ggplot() +
geom_area(data = generation,
aes(y = power,
x = dayHour,
fill = label)) +
geom_line(data = net.load,
aes(y = power,
x = dayHour,
colour = label),
size = 1.2,
colour = "black") +
labs(fill = "generation",
colour = 'net load')
It looks like position_stack is getting confused when the interpolation crosses the x-axis.
To fix it, you can interpolate manually before plotting (e.g. with approx):
library(tidyverse)
generation <- data.frame(
dayHour = structure(c(1542585600, 1542589200, 1542592800, 1542596400, 1542585600, 1542589200, 1542592800, 1542596400, 1542585600, 1542589200, 1542592800, 1542596400, 1542585600, 1542589200, 1542592800, 1542596400), class = c("POSIXct", "POSIXt"), tzone = ""),
power = c(-1364.29, -433.11, 1132.39, 749.48, 463.75, 467.8, 469.35, 436.51, 2025.5, 2133.07, 2306.85, 2304.91, 211.52, 213.16, 214.33, 214.59),
label = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("net interchange", "gas", "hydro", "biomass"), class = "factor")
)
generation_interpolated <- generation %>%
group_by(label) %>%
summarise(data = list(as_tibble(approx(dayHour, power, n = 501)))) %>%
unnest() %>%
mutate(x = as.POSIXct(x, origin = '1970-01-01', tz = 'UTC'))
net_power_interpolated <- generation_interpolated %>%
group_by(x) %>%
summarise(y = sum(y))
ggplot(generation_interpolated, aes(x, y)) +
geom_area(aes(fill = label)) +
geom_line(data = net_power_interpolated)
To see how approx works, a simpler, ungrouped example:
df <- data.frame(x = c(0, 5, 10), y = c(0, 20, 10))
interpolated <- approx(df$x, df$y, n = 11)
str(interpolated)
#> List of 2
#> $ x: int [1:11] 0 1 2 3 4 5 6 7 8 9 ...
#> $ y: num [1:11] 0 4 8 12 16 20 18 16 14 12 ...
ggplot(as.data.frame(interpolated), aes(x, y)) +
geom_line() +
geom_point() +
geom_point(data = df, color = 'dodgerblue', size = 4)

Getting an error when trying to create a Shiny app

I have two map plots of 'Total Population' and 'Population Density' created using a shape file. Now I'm trying to build a Shiny app so that I can switch from Total Population to Population Density and have the plot change accordingly. When I ran the code, I got the following error:
Warning: Error in : ggplot2 doesn't know how to deal with data of class matrix
Here's the code that I've been trying to use:
library(shiny)
library(ggplot2) #Loading necessary libraries
ui <- fluidPage(
selectInput("mr",
label="Type of Plot",
choices=c("Total Population", "Density"),
selected="Total Population"),
plotOutput("curv") #Giving an input name and listing out types to choose in the Shiny app
)
server <- function(input, output){
output$curv <- renderPlot({
ggplot() +
geom_polygon(data = final.plot==input$mr,
aes(x = long, y = lat, group = group, fill = Population),
color = "black", size = 0.20) +
coord_map()+
scale_fill_distiller(name="Population", palette = "YlGn")+
labs(title="Population in Australia")
}) # Output with the data file and input string to change when input changes.
}
shinyApp(ui = ui, server = server)
Any help is greatly appreciated.
UPDATE:
My dataset looks like this:
id long lat order hole piece
1 Ashmore and Cartier Islands 123.1169 -12.25333 1 FALSE 1
2 Ashmore and Cartier Islands 123.1206 -12.25611 2 FALSE 1
3 Ashmore and Cartier Islands 123.1222 -12.25861 3 FALSE 1
4 Ashmore and Cartier Islands 123.1239 -12.25528 4 FALSE 1
5 Ashmore and Cartier Islands 123.1258 -12.25333 5 FALSE 1
6 Ashmore and Cartier Islands 123.1275 -12.25619 6 FALSE 1
group Population Density
1 Ashmore and Cartier Islands.1 NA NA
2 Ashmore and Cartier Islands.1 NA NA
3 Ashmore and Cartier Islands.1 NA NA
4 Ashmore and Cartier Islands.1 NA NA
5 Ashmore and Cartier Islands.1 NA NA
6 Ashmore and Cartier Islands.1 NA NA
This is stored in the data frame called "final.plot". There are values of Population and Density for the other states. I was able to create a static visualisation of Population.
There's a similar one for Density, and I'm trying to create a Shiny app where I can switch between these two so that the plot changes accordingly. Right now I've tried the following code:
library(shiny)
library(ggplot2) #Loading necessary libraries
ui <- fluidPage(
selectInput("pop",
label="Type of Plot",
choices=c("Population", "Density"),
selected="Total Population"),
plotOutput("curv") #Giving an input name and listing out types to choose in the Shiny app
)
server <- function(input, output){
output$curv <- renderPlot({
ggplot() +
geom_polygon(data = final.plot,
aes(x = long, y = lat, group = group, fill = input$pop),
color = "black", size = 0.25) +
coord_map()+
scale_fill_distiller(name="Density", palette = "Spectral")+
labs(title="Population in Australia")
})
}
shinyApp(ui = ui, server = server)
But I'm getting an error saying "Discrete value supplied to continuous scale".
UPDATE 2:
Here's the link for the dataset I'm using:
Dataset
I've had a quick look at your code and have a couple of suggestions.
1) When providing your data set you can use the function dput() - this writes a text representation of your data.frame which people answering your question can simply paste into R. For example:
dput(final.plot)
This will produce text output that I can assign to a dataframe by prefixing final.plot <- to the output. I have recreated your dataframe and used dput() to output it as text below. Now other users can quickly cut & paste your data:
Note this dataset is faulty - see below
final.plot <- structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "Ashmore and Cartier Islands", class = "factor"),
long = c(123.1169, 123.1206, 123.1222, 123.1239, 123.1258, 123.1275),
lat = c(-12.25333, -12.25611, -12.25861, -12.25528, -12.25333, -12.25619),
order = 1:6, hole = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
piece = c(1L, 1L, 1L, 1L, 1L, 1L),
group = structure(c(1L, 1L, 1L, 1L, 1L, 1L),
.Label = "Ashmore and Cartier Islands.1", class = "factor"),
Population = c(NA, NA, NA, NA, NA, NA),
Density = c(NA, NA, NA, NA, NA, NA)),
.Names = c("id", "long", "lat", "order", "hole", "piece", "group", "Population", "Density"),
class = "data.frame",
row.names = c(NA, -6L))
The error "Discrete value supplied to continuous scale" is caused by two issues.
i) You are passing NA in both your Population and Density columns. The dataframe below adds some (unrealistic) numbers to these columns and the error is removed when I run the plotting code in isolation.
Corrected Toy Dataset
final.plot <- structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "Ashmore and Cartier Islands", class = "factor"),
long = c(123.1169, 123.1206, 123.1222, 123.1239, 123.1258, 123.1275),
lat = c(-12.25333, -12.25611, -12.25861, -12.25528, -12.25333, -12.25619),
order = 1:6, hole = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
piece = c(1L, 1L, 1L, 1L, 1L, 1L),
group = structure(c(1L, 1L, 1L, 1L, 1L, 1L),
.Label = "Ashmore and Cartier Islands.1", class = "factor"),
Population = c(1, 2, 3, 4, 5, 6),
Density = c(7, 3, 9, 1, 3, 6)),
.Names = c("id", "long", "lat", "order", "hole", "piece", "group", "Population", "Density"),
class = "data.frame",
row.names = c(NA, -6L))
ii) When run interactively the error is caused because you are not passing appropriate data to fill in fill = input$pop. You should be passing the values from final.plot$Population or final.plot$Density depending on what was selected. You are instead passing the output of the dropdown box - "Population" or "Density". This can be corrected using a switch statement within renderPlot:
# User input assigns appropriate data to selectedData which can be passed to other functions:
selectedData <- switch(input$pop,
"Population" = final.plot$Population,
"Density" = final.plot$Density)
2) It would be helpful if you could provide the code which produced the static map you show in your Update above. When debugging Shiny code I find it easiest to get the function working non-interactively first and then to incorporate it into Shiny. I tried to extract your plotting code below but it is not producing the expected results:
library(ggplot2) #Loading necessary libraries
library(mapproj)
library(maps)
ggplot() +
  geom_polygon(data = final.plot,
               aes(x = long, y = lat, group = group, fill = Population),
               color = "black", size = 0.25) +
  coord_map() +
  scale_fill_distiller(name = "Density", palette = "Spectral") +
  labs(title = "Population in Australia")
3) I am not familiar with plotting data onto maps in R but I believe your app will need to load in library(mapproj) and library(maps) to get the results you need. Here is all the above put together:
library(shiny)
library(ggplot2) #Loading necessary libraries
#I added the two lines below:
library(mapproj)
library(maps)
ui <- fluidPage(
selectInput("pop",
label="Type of Plot",
choices=list("Population", "Density"),
selected="Population"), #NOTE: Total Population changed to Population so that it selects correct default value
plotOutput("curv") #Giving an input name and listing out types to choose in the Shiny app
)
server <- function(input, output){
#You will probably want to simply import your dataframe final.plot using read.table etc:
final.plot <- structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "Ashmore and Cartier Islands", class = "factor"),
long = c(123.1169, 123.1206, 123.1222, 123.1239, 123.1258, 123.1275),
lat = c(-12.25333, -12.25611, -12.25861, -12.25528, -12.25333, -12.25619),
order = 1:6, hole = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
piece = c(1L, 1L, 1L, 1L, 1L, 1L),
group = structure(c(1L, 1L, 1L, 1L, 1L, 1L),
.Label = "Ashmore and Cartier Islands.1", class = "factor"),
Population = c(1, 2, 3, 4, 5, 6),
Density = c(7, 3, 9, 1, 3, 6)),
.Names = c("id", "long", "lat", "order", "hole", "piece", "group", "Population", "Density"),
class = "data.frame",
row.names = c(NA, -6L))
output$curv <- renderPlot({
#Assign value of selectedData based upon user input:
selectedData <- switch(input$pop,
"Population" = final.plot$Population,
"Density" = final.plot$Density)
ggplot() +
geom_polygon(data = final.plot,
aes(x = long, y = lat, group = group, fill = selectedData),
color = "black", size = 0.25) +
coord_map()+
scale_fill_distiller(name="Density", palette = "Spectral")+
labs(title="Population in Australia")
})
}
shinyApp(ui = ui, server = server)
Now all you need to do is substitute your code which produced the static map shown in your update for the faulty code in renderPlot in your shiny app.

Color data points based on sample classification

A pairwise scatterplot showing the relationship between genes (columns in the data frame) across multiple samples (rows in the data frame) is created. The samples belong to two distinct groups, "A" and "B". Since one dot in the plot represents one sample, I need to color the data points (dots) according to group with two different colors, say group A in green and group B in red. Is it possible to do that?
Any kind of help will be appreciated.
plot(DF[1:6], pch = 21) #command used for plotting, DF is data frame
Sample Data Frame Example:
CBX3 PSPH ATP2C1 SNX10 MMD ATP13A3
B 10.589844 6.842970 8.084550 8.475023 9.202490 10.403811
A 10.174385 5.517944 7.736994 9.094834 9.253766 10.133408
B 10.202084 5.669137 7.392141 7.522270 7.830969 9.123178
B 10.893231 6.630709 7.601690 7.894177 8.979142 9.791841
B 10.071038 5.091222 7.032585 8.305581 7.903737 8.994821
A 10.005002 4.708631 7.927246 7.292527 8.257853 10.054630
B 10.028055 5.080944 6.421961 7.616856 8.287496 9.642294
A 10.144115 6.626483 7.686203 7.970934 7.919615 9.475175
A 10.675386 6.874047 7.900560 7.605519 8.585158 8.858613
A 9.855063 5.164399 6.847923 8.072608 8.221344 9.077744
A 10.994228 6.545318 8.606128 8.426329 8.787876 9.857079
A 10.501266 6.677360 7.787168 8.444976 8.928174 9.542558
GGally has a good function for this as well.
library(GGally)
ggpairs(dd, color = 'CLASS', columns = 2:ncol(dd))
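Note that in more recent GGally versions the grouping is usually passed as an aes() mapping rather than a color argument, so the equivalent call would be something like:
library(GGally)
ggpairs(dd, mapping = ggplot2::aes(colour = CLASS), columns = 2:ncol(dd))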
It might not be that easy to do with base graphics. You could easily do this with lattice. With this sample data.frame
dd<-structure(list(CLASS = structure(c(2L, 1L, 2L, 2L, 2L, 1L, 2L,
1L, 1L, 1L, 1L, 1L), .Label = c("A", "B"), class = "factor"),
CBX3 = c(10.589844, 10.174385, 10.202084, 10.893231, 10.071038,
10.005002, 10.028055, 10.144115, 10.675386, 9.855063, 10.994228,
10.501266), PSPH = c(6.84297, 5.517944, 5.669137, 6.630709,
5.091222, 4.708631, 5.080944, 6.626483, 6.874047, 5.164399,
6.545318, 6.67736), ATP2C1 = c(8.08455, 7.736994, 7.392141,
7.60169, 7.032585, 7.927246, 6.421961, 7.686203, 7.90056,
6.847923, 8.606128, 7.787168), SNX10 = c(8.475023, 9.094834,
7.52227, 7.894177, 8.305581, 7.292527, 7.616856, 7.970934,
7.605519, 8.072608, 8.426329, 8.444976), MMD = c(9.20249,
9.253766, 7.830969, 8.979142, 7.903737, 8.257853, 8.287496,
7.919615, 8.585158, 8.221344, 8.787876, 8.928174), ATP13A3 = c(10.403811,
10.133408, 9.123178, 9.791841, 8.994821, 10.05463, 9.642294,
9.475175, 8.858613, 9.077744, 9.857079, 9.542558)), .Names = c("CLASS",
"CBX3", "PSPH", "ATP2C1", "SNX10", "MMD", "ATP13A3"), class = "data.frame", row.names = c(NA, -12L))
you can do
library(lattice)
splom(~dd[,-1], groups=dd$CLASS)
to get
You can add color to the points by specifying the col argument to plot.
DF <- read.delim(textConnection(
"category CBX3 PSPH ATP2C1 SNX10 MMD ATP13A3
B 10.589844 6.842970 8.084550 8.475023 9.202490 10.403811
A 10.174385 5.517944 7.736994 9.094834 9.253766 10.133408
B 10.202084 5.669137 7.392141 7.522270 7.830969 9.123178
B 10.893231 6.630709 7.601690 7.894177 8.979142 9.791841
B 10.071038 5.091222 7.032585 8.305581 7.903737 8.994821
A 10.005002 4.708631 7.927246 7.292527 8.257853 10.054630
B 10.028055 5.080944 6.421961 7.616856 8.287496 9.642294
A 10.144115 6.626483 7.686203 7.970934 7.919615 9.475175
A 10.675386 6.874047 7.900560 7.605519 8.585158 8.858613
A 9.855063 5.164399 6.847923 8.072608 8.221344 9.077744
A 10.994228 6.545318 8.606128 8.426329 8.787876 9.857079
A 10.501266 6.677360 7.787168 8.444976 8.928174 9.542558"))
plot(DF[2:7], col = ifelse(DF$category == 'A', 'red', 'green'))
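If you also want different plotting symbols per group, and colors matching the grouping described in the question (A green, B red), a small variation might be:
# assumes DF is the data frame read in above, with the group label in DF$category
grp_col <- ifelse(DF$category == 'A', 'green', 'red')
grp_pch <- ifelse(DF$category == 'A', 17, 19)  # triangles for A, filled circles for B
plot(DF[2:7], col = grp_col, pch = grp_pch)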
A list of valid color values can be obtained by calling colors(). Vectors with a gradient of colors can be created via rainbow(), and just for fun, I use this little function for choosing pretty colors when making a figure.
(Edited per suggestions from @MrFlick)
#! #param n The number of colors to be selected
colorchoose <- function (n = 1, alpha, term = F)
{
cols <- colors()
mod <- ceiling(sqrt(length(cols)))
plot(xlab = "", ylab = "", main = "click for color name",
c(0, mod), c(0, mod), type = "n", axes = F)
s<-seq_along(cols)
dev.hold()
points(s%%mod, s%/%mod, col = cols, pch = 15, cex = 2.4)
dev.flush()
p <- locator(n)
return(cols[round(p$y) * mod + round(p$x)])
}
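For example (interactive, since it uses locator()), you might pick two colors by clicking on the grid and then reuse them in the pairs plot above:
mycols <- colorchoose(2)  # click twice on the color grid that pops up
plot(DF[2:7], col = ifelse(DF$category == 'A', mycols[1], mycols[2]))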
