ggplot2 histogram binwidth [duplicate] - r

This question already has answers here:
Different breaks per facet in ggplot2 histogram
(4 answers)
Closed 8 years ago.
I would like to create multiple histograms within one plot (using facet_wrap).
This could be an example code:
df <- data.frame(p1 = rnorm(100,5,2), p2 = rnorm(100,80,20), group = rep(LETTERS[1:4],25))
library(ggplot2)
library(reshape)
plotData <- melt(df, id.vars = "group", measure.vars = c("p1","p2") )
m <- ggplot(plotData, aes(x = value, color = group, fill = group, group = group))
m <- m + geom_histogram(position = position_dodge()) # geom_histogram() bins the continuous values; geom_bar() now counts unique x values
m <- m + facet_wrap( ~ variable,scales = "free_x")
print(m)
Now I would like to modify the plot so that each parameter ("p1", "p2") gets its own set of, say, 10 bins.
So far I could not find a way to do this, because the binwidth/breaks calculation has to depend on a subset of the data (one facet at a time).
Is this possible at all?
I want to share my solution (taken from the answered question linked above) extended by the possibility to overlay the histograms with density curves scaled to histogram counts:
df <- data.frame(p1 = rnorm(1000,5,2), p2 = rnorm(1000,80,20), group = rep(LETTERS[1:4],25))
library(ggplot2)
library(reshape)
library(plyr)
plotData <- melt(df, id.vars = "group", measure.vars = c("p1","p2") )
nBins <- 10
groupedData <- dlply(plotData, .(variable))
groupedBinWidth <- llply(groupedData, .fun = function(data, nBins) {
r <- range(data$value, na.rm = TRUE, finite = TRUE)
widthOfBins = (r[2] - r[1])/nBins
if (is.na(widthOfBins) || is.infinite(widthOfBins) || (widthOfBins <= 0)) widthOfBins <- NULL
widthOfBins
}, nBins = nBins)
densData <- dlply(plotData, .(variable, group), .fun = function(subData){
param <- as.character(subData$variable[1]) # character key, so [[param]] below looks up by name, not by factor code
group <- subData$group[1]
d <- density(subData$value)
bw <- groupedBinWidth[[param]]
data.frame(x = d$x, y = d$y * nrow(subData) * bw , group = group, variable = param)
})
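# geom_histogram() accepts binwidth; geom_bar() no longer bins continuous data in current ggplot2
hls <- Map(function(x, b) geom_histogram(aes(x = value), position = position_dodge(), data = x, binwidth = b),
groupedData, groupedBinWidth)
# one density layer per variable/group combination, kept as a plain list of layers
dLay <- lapply(densData, function(data) geom_density(data = data, aes(x = x, y = y), stat = "identity", fill = NA, size = 1))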
m <- ggplot(plotData, aes(x = value, color = group, fill = group, group = group))
m <- m + hls
m <- m + dLay
m <- m + facet_wrap( ~ variable,scales = "free")
print(m)

Try this - really ugly code, but works if I understand you correctly. You might want to play with geom_density and maybe remove fill to make it more readable.
nbin <- 5
m <- ggplot(plotData, aes(x = value, color = group, fill = group, group = group))
m <- m + geom_histogram(data = subset(plotData, variable == "p1"), binwidth=diff(range(subset(plotData, variable == "p1")$value))/nbin)
m <- m + geom_histogram(data = subset(plotData, variable == "p2"), binwidth=diff(range(subset(plotData, variable == "p2")$value))/nbin)
m <- m + facet_wrap( ~ variable,scales = "free_x")
print(m)
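The two hard-coded geom_histogram() calls above can be generalised to any number of facet variables. The following is only a sketch under assumptions: plot_data rebuilds the question's melted data without reshape, and n_bins and hist_layers are names made up for this illustration. Because of how ggplot2 aligns bin edges, each facet ends up with roughly n_bins bins, not necessarily exactly that many.

```r
library(ggplot2)

set.seed(1)
# Same shape as plotData in the question (assumed column names: group, variable, value)
plot_data <- data.frame(
  group    = rep(LETTERS[1:4], times = 50),
  variable = rep(c("p1", "p2"), each = 100),
  value    = c(rnorm(100, 5, 2), rnorm(100, 80, 20))
)

n_bins <- 10  # hypothetical target number of bins per facet

# One histogram layer per facet variable, each with its own bin width
hist_layers <- lapply(split(plot_data, plot_data$variable), function(d) {
  geom_histogram(data = d, binwidth = diff(range(d$value)) / n_bins)
})

# A list of layers can be added to a plot with `+`
m <- ggplot(plot_data, aes(x = value, fill = group)) +
  hist_layers +
  facet_wrap(~ variable, scales = "free_x")
print(m)
```

Each layer only contains rows for one variable, so it is drawn only in that variable's facet.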

Related

How to draw a multi-colored dashed line (alternating colors for visual effect) [duplicate]

This question already has answers here:
Alternating color of individual dashes in a geom_line
(4 answers)
Closed 8 months ago.
I was wondering if it is possible to create a multicolored dashed line in ggplot.
Basically I have a plot displaying savings based on two packages.
An orange line with savings based on package A
A green line with savings based on package B
I also have a third line, and I would like that one to be dashed, alternating between orange and green. Has anybody managed to do something like that?
Here is an example:
library(tidyverse)
S <- seq(0, 5, by = 0.05)
a <- S ^ 2
b <- S
a_b = a + b #This data should have the dashed multicolor line, since it is the sum of the other two lines.
S <- data.frame(S)
temp <- cbind(S, a, b, a_b)
temp <- gather(temp, variable, value, -S)
desiredOrder <- c("a", "b", "a_b")
temp$variable <- factor(temp$variable, levels = desiredOrder)
temp <- temp[order(temp$variable),]
p <- ggplot(temp, aes(x = S, y = value, colour = variable)) +
theme_minimal() +
geom_line(size = 1) +
scale_color_manual(name = "Legend", values = c("orange", "green", "#0085bd"),
breaks = c("a", "b", "a_b"))
p
I basically want to have a multicolored (dashed or dotted) line for the third series ("a_b").
This is, to my best knowledge, currently only possible via creation of new segments for each alternate color. This is fiddly.
Below I've tried a largely programmatic approach in which you can define the size of the repeating segment (in units of your x variable). The positioning of the y values is slightly convoluted, and it will result in slightly irregular segment lengths when the slope changes. I haven't tested it on much data, either. But I guess it's a good start :)
For the legend, I'm taking the same approach, by creating a fake legend and stitching it onto the other plot. The challenges here include:
positioning of legend elements relative to the plot
relative distance between the legend elements
update
For a much neater way to create those segments and a Stat implementation see this thread
library(tidyverse)
library(patchwork)
S <- seq(0, 5, by = 0.05)
a <- S^2
b <- S
a_b <- a + b
df <- data.frame(x = S, a, b, a_b) %>%
pivot_longer(-x, names_to = "variable", values_to = "value")
## a function to create modifiable cuts in order to get segments.
## this looks convoluted - and it is! there are a few if/else statements.
## Why? The assignment of new y to x values depends on how many original values
## you have.
## There might be more direct ways to get there
alt_colors <- function(df, x, y, seg_length, my_cols) {
x <- df[[x]]
y <- df[[y]]
## create new x for each tiny segment
length_seg <- seg_length / length(my_cols)
new_x <- seq(min(x, na.rm = TRUE), x[length(x)], length_seg)
## now we need to interpolate y values for each new x
## This is different depending on how many x and new x you have
if (length(new_x) < length(x)) {
ind_int <- findInterval(new_x, x)
new_y <- sapply(seq_along(ind_int), function(i) {
if (y[ind_int[i]] == y[ind_int[length(ind_int)]]) {
y[ind_int[i]]
} else {
seq_y <- seq(y[ind_int[i]], y[ind_int[i] + 1], length.out = length(my_cols))
head(seq_y, -1)
}
})
} else {
ind_int <- findInterval(new_x, x)
rle_int <- rle(ind_int)
new_y <- sapply(rle_int$values, function(i) {
if (y[i] == y[max(rle_int$values)]) {
y[i]
} else {
seq_y <- seq(y[i], y[i + 1], length.out = rle_int$lengths[i] + 1)
head(seq_y, -1)
}
})
}
## This is also a bit painful and might cause other bugs that I haven't
## discovered yet.
if (length(unlist(new_y)) < length(new_x)) {
newdat <- data.frame(
x = new_x,
y = rep_len(unlist(new_y), length.out = length(new_x))
)
} else {
newdat <- data.frame(x = new_x, y = unlist(new_y))
}
newdat <- newdat %>%
mutate(xend = lead(x), yend = lead(y)) %>%
drop_na(xend)
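newdat$color <- rep_len(my_cols, nrow(newdat)) # recycle colours even if the row count is not a multiple of length(my_cols)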
newdat
}
## the below is just a demonstration of how the function would work
## using different segment widths
df_alt1 <-
df %>%
filter(variable == "a_b") %>%
alt_colors("x", "value", 1, c("orange", "green"))
df_alt.5 <-
df %>%
filter(variable == "a_b") %>%
alt_colors("x", "value", .5, c("orange", "green"))
df_ab <-
df %>%
filter(variable != "a_b") %>%
# for the identity mapping
mutate(color = ifelse(variable == "a", "green", "orange"))
## create data frame for the legend, also using the alt_colors function as per above
## the amount of x is a bit of trial and error, this is just a quick hack
## this is a trick to center the legend more or less relative to the main plot
y_leg <- ceiling(mean(range(df$value, na.rm = TRUE)))
dist_y <- 2
df_legend <-
data.frame(
variable = rep(unique(df$variable), each = 2),
x = 1:2,
y = rep(seq(y_leg - dist_y, y_leg + dist_y, by = dist_y), each = 2)
)
df_leg_onecol <-
df_legend %>%
filter(variable != "a_b") %>%
mutate(color = ifelse(variable == "a", "green", "orange"))
df_leg_alt <-
df_legend %>%
filter(variable == "a_b") %>%
alt_colors("x", "y", .5, c("orange", "green"))
## I am mapping the colors globally using identity mapping (see scale_identity).
p1 <-
ggplot(mapping = aes(x, value, colour = color)) +
theme_minimal() +
geom_line(data = df_ab, size = 1) +
geom_segment(data = df_alt1, aes(y = y, xend = xend, yend = yend), size = 1) +
scale_color_identity() +
ggtitle("alternating every 1 unit")
p.5 <-
ggplot(mapping = aes(x, value, colour = color)) +
theme_minimal() +
geom_line(data = df_ab, size = 1) +
geom_segment(data = df_alt.5, aes(y = y, xend = xend, yend = yend), size = 1) +
scale_color_identity() +
ggtitle("alternating every .5 unit")
p_leg <-
ggplot(mapping = aes(x, y, colour = color)) +
theme_void() +
geom_line(data = df_leg_onecol, size = 1) +
geom_segment(data = df_leg_alt, aes(xend = xend, yend = yend), size = 1) +
scale_color_identity() +
annotate(
geom = "text", y = unique(df_legend$y), label = unique(df_legend$variable),
x = max(df_legend$x + 1), hjust = 0
)
## set y limits to the range of the main plot
## in order to make the labels visible you need to adjust the plot margin and
## turn clipping off
p1 + p.5 +
(p_leg + coord_cartesian(ylim = range(df$value), clip = "off") +
theme(plot.margin = margin(r = 20, unit = "pt"))) +
plot_layout(widths = c(1, 1, .2))
Created on 2022-01-18 by the reprex package (v2.0.1)
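As a more compact variant of the segment idea, the interpolation can be left to base R's approx(). This is only a sketch under assumptions, not the Stat implementation mentioned in the update: seg_len, new_x and segs are hypothetical names, and linear interpolation draws straight chords, so a curved line needs a small seg_len to look smooth.

```r
library(ggplot2)

S   <- seq(0, 5, by = 0.05)
a_b <- S^2 + S

seg_len <- 0.25  # hypothetical length of one coloured dash, in x units

# Interpolate y at evenly spaced x positions along the line
new_x <- seq(min(S), max(S), by = seg_len)
new_y <- approx(S, a_b, xout = new_x, rule = 2)$y

# Turn consecutive points into segments and alternate the colours
n    <- length(new_x)
segs <- data.frame(x = new_x[-n], y = new_y[-n],
                   xend = new_x[-1], yend = new_y[-1])
segs$color <- rep_len(c("orange", "green"), nrow(segs))

p <- ggplot(segs) +
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend, colour = color)) +
  scale_colour_identity() +
  theme_minimal()
print(p)
```

Segment lengths are equal along x here, so steep parts of the curve get visually longer dashes, just as in the approach above.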
(Copied this over from Alternating color of individual dashes in a geom_line)
Here's a ggplot hack that is simple, but works for two colors only. It results in two lines being overlaid: one a solid line, the other a dashed line.
library(dplyr)
library(ggplot2)
library(reshape2)
# Create df
x_value <- 1:10
group1 <- c(0,1,2,3,4,5,6,7,8,9)
group2 <- c(0,2,4,6,8,10,12,14,16,18)
dat <- data.frame(x_value, group1, group2) %>%
mutate(group2_2 = group2) %>% # Duplicate the column that you want to be alternating colors
melt(id.vars = "x_value", variable.name = "group", value.name ="y_value") # Long format
# Put in your selected order
dat$group <- factor(dat$group, levels=c("group1", "group2", "group2_2"))
# Plot
ggplot(dat, aes(x=x_value, y=y_value)) +
geom_line(aes(color=group, linetype=group), size=1) +
scale_color_manual(values=c("black", "red", "black")) +
scale_linetype_manual(values=c("solid", "solid", "dashed"))
Unfortunately, the legend still needs to be edited by hand.

Create a three panel plot with one panel spanning 2 columns using ggplot2

I have following data:
df <- data.frame("Stat" = c("Var1","Var1","Var1","Var1","Var1","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var2","Var3","Var3","Var3","Var3","Var3","Var3","Var3","Var3","Var3","Var3"),
"Value" = c(0,1,2,3,4,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,1,2,3,4,5,6,7,8,9,10),
"n" = c(33,120,223,63,20,17,28,33,22, 35,41,53,44,55,59,39,33, 46,30,29,23,21,14,6,18,7,29,50,80,86,91,83,35,34, 20))
What I want is to plot the above data as bar plots on one canvas in three panels, each containing the plot for only one variable (Stat), e.g. Var1 in the first panel, Var2 in the second and Var3 in the third, using the following layout code (two panels in the top row and one panel spanning both columns in the bottom row):
library(multipanelfigure)
fig1 <- multi_panel_figure(columns = 2, rows = 2, panel_label_type = "none")
# fit the plots on the panels
fig1 %<>%
fill_panel(Var1Plot, column = 1, row = 1) %<>%
fill_panel(Var2Plot, column = 2, row = 1) %<>%
fill_panel(Var3Plot, column = 1:2, row = 2)
fig1
The issue is how to create Var1Plot, Var2Plot and Var3Plot so that they can be placed in the respective panels above. I used the code below, but I am not able to get its result into those panels:
library(tidyverse)
df %>% ggplot(aes(x = Value, y = n)) +
geom_bar(stat='identity') + facet_wrap(~ Stat)
Expected plot should look something like this :
Here's an approach with cowplot.
library(cowplot)
library(ggplot2)
library(purrr) # provides map()
figure.list <- map(unique(df$Stat), ~
ggplot(data = subset(df, df$Stat == .x), aes(x = Value, y = n)) +
geom_bar(stat='identity') +
ggtitle(.x))
top <- plot_grid(figure.list[[1]], figure.list[[2]], ncol = 2)
bottom <- plot_grid(figure.list[[3]], ncol = 1)
plot_grid(top, bottom,
ncol=1, rel_heights=c(1,1))
If you really want some to be coord_flip-ed, you could make the list manually:
figure.list <- list()
figure.list[[1]] <- ggplot(data = subset(df, df$Stat == "Var1"), aes(x = Value, y = n)) +
geom_bar(stat='identity') + coord_flip()
figure.list[[2]] <- ggplot(data = subset(df, df$Stat == "Var2"), aes(x = Value, y = n)) +
geom_bar(stat='identity') + coord_flip()
figure.list[[3]] <- ggplot(data = subset(df, df$Stat == "Var3"), aes(x = Value, y = n)) +
geom_bar(stat='identity')
top <- plot_grid(figure.list[[1]], figure.list[[2]], ncol = 2)
bottom <- plot_grid(figure.list[[3]], ncol = 1)
plot_grid(top, bottom,
ncol=1, rel_heights=c(1,1))
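The same layout can also be sketched with patchwork instead of cowplot. This swaps in a different package from the one used in the answer above and assumes patchwork is installed; plots and combined are names chosen for this illustration.

```r
library(ggplot2)
library(patchwork)

# Data from the question, written compactly
df <- data.frame(
  Stat  = rep(c("Var1", "Var2", "Var3"), times = c(5, 20, 10)),
  Value = c(0:4, 1:20, 1:10),
  n     = c(33, 120, 223, 63, 20,
            17, 28, 33, 22, 35, 41, 53, 44, 55, 59,
            39, 33, 46, 30, 29, 23, 21, 14, 6, 18,
            7, 29, 50, 80, 86, 91, 83, 35, 34, 20)
)

# One bar plot per level of Stat
plots <- lapply(split(df, df$Stat), function(d) {
  ggplot(d, aes(x = Value, y = n)) +
    geom_col() +
    ggtitle(d$Stat[1])
})

# `|` places plots side by side, `/` stacks rows; Var3 spans the full width
combined <- (plots$Var1 | plots$Var2) / plots$Var3
print(combined)
```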

How to create multiple (6) plots with ggplot and save them to a pdf file?

I have a matrix (pred_matrix, dim = 1e6, 250) whose rows are "pixel stacks" of 250 NDVI values from a Landsat scene. On it I ran a "fuzzy c-means" classification with 6 centers (classes), stored in the list results. I now want to plot a random subset of the 1e6 rows for each class. This is my quick and dirty code so far:
random_index <- floor(runif(10000, 1, 1e6+1))
random_cluster <- results[[6]]$cluster[random_index]
random_pred_matrix <- pred_matrix[random_index, ]
dates_subse_after_pred <- rdn_num[rm_na_pred_df]
random_res <- cbind(random_pred_matrix, random_cluster)
random_res <- t(random_res)
random_res <- cbind(c(dates_subse_after_pred, 1), random_res)
df_1 <- data.frame(random_res[1:250,c(TRUE, random_cluster==1)])
df_2 <- data.frame(random_res[1:250,c(TRUE, random_cluster==2)])
df_3 <- data.frame(random_res[1:250,c(TRUE, random_cluster==3)])
df_4 <- data.frame(random_res[1:250,c(TRUE, random_cluster==4)])
df_5 <- data.frame(random_res[1:250,c(TRUE, random_cluster==5)])
df_6 <- data.frame(random_res[1:250,c(TRUE, random_cluster==6)])
df_1.long <- melt(df_1, id.vars = 1)
df_1.long$X1 <- as.Date(df_1.long$X1)
df_2.long <- melt(df_2, id.vars = 1)
df_2.long$X1 <- as.Date(df_2.long$X1)
df_3.long <- melt(df_3, id.vars = 1)
df_3.long$X1 <- as.Date(df_3.long$X1)
df_4.long <- melt(df_4, id.vars = 1)
df_4.long$X1 <- as.Date(df_4.long$X1)
df_5.long <- melt(df_5, id.vars = 1)
df_5.long$X1 <- as.Date(df_5.long$X1)
df_6.long <- melt(df_6, id.vars = 1)
df_6.long$X1 <- as.Date(df_6.long$X1)
ggplot(df_1.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "lightblue")
ggplot(df_2.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "blue")
ggplot(df_3.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "lightgreen")
ggplot(df_4.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "green")
ggplot(df_5.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "pink")
ggplot(df_6.long) +
geom_line( aes(x = X1, y= value, group = variable), color = "red")
After this I just hit the export button in RStudio 6 times and pasted everything into a Word document...
Is there a way to do this in a loop? Or even to produce a final PDF containing the 6 plots?
Separate file
I think what you are after is having the following six times in your code.
ggsave("filename.png", # or pdf if you like
plot = last_plot(), # or give ggplot object name as in myPlot,
width = 5, height = 5,
units = "in", # other options c("in", "cm", "mm"),
dpi = 300)
For example,
library(ggplot2)
p1 <- ggplot(df_1.long) +
geom_line( aes(x = X1, y= value, group = variable),
color = "lightblue")
ggsave("df1.png", plot = p1, dpi = 300)
All in one
If you want all six plots in one PDF, then first do
pdf("file_name.pdf")
# do your ggplots here
p1
p2
p6
dev.off()
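To answer the loop part of the question, the plots can be kept in a list and printed one by one into the open PDF device. This is a minimal sketch under assumptions: long_dfs is simulated stand-in data for df_1.long to df_6.long, and out_file is a hypothetical path in the temporary directory. Inside a for loop or a function, ggplot objects are not auto-printed, so the explicit print() is what actually draws each page.

```r
library(ggplot2)

# Simulated stand-in for df_1.long ... df_6.long (assumed columns: X1, variable, value)
set.seed(1)
dates <- seq(as.Date("2020-01-01"), by = "month", length.out = 12)
long_dfs <- lapply(1:6, function(k) {
  data.frame(
    X1       = rep(dates, times = 3),
    variable = rep(paste0("px", 1:3), each = length(dates)),
    value    = rnorm(3 * length(dates), mean = k)
  )
})
line_cols <- c("lightblue", "blue", "lightgreen", "green", "pink", "red")

# Build one plot per class, pairing each data frame with its colour
plots <- Map(function(d, col) {
  ggplot(d) +
    geom_line(aes(x = X1, y = value, group = variable), color = col)
}, long_dfs, line_cols)

# Write all plots to a single PDF, one page per plot
out_file <- file.path(tempdir(), "cluster_plots.pdf")
pdf(out_file, width = 7, height = 5)
for (p in plots) print(p)  # explicit print() is required inside a loop
dev.off()
```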
If you are using RStudio, I would recommend writing your code in an R Markdown file and then exporting to PDF directly.

How to control legend with many groups

I have a plot like this:
Which was created with this code:
# Make data:
set.seed(42)
n <- 1000
df <- data.frame(values = sample(0:5, size = n, replace = TRUE, prob = c(9/10, rep(0.0167,5))),
group = rep(1:100, each = 10),
fill2 = rep(rnorm(10), each = 100),
year = rep(2001:2010, times = 100)
)
df$values <- ifelse(df$year %in% 2001:2007, 0, df$values)
# Plot
library(ggplot2)
p <- ggplot(data = df, aes(x = year, y = values, colour = as.factor(group))) + geom_line()
p
Since there are so many groups, the legend is really not helpful.
Ideally I would like just two elements in the legend: one for group = 1 and one for all the other groups (which should all have the same color). Is there a way to force this?
You can define a new variable that has only two values, but still plot the lines according to their original group:
ggplot(data = df, aes(x = year, y = values, group = group,
colour = ifelse(group == 1, "1", "!1"))) +
geom_line() +
scale_colour_brewer("groups", palette="Set1")
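If the two legend entries should also get readable labels and specific colours, the helper variable can be made an explicit factor and combined with scale_colour_manual(). This is a sketch under assumptions: the highlight column, its labels and the colours are invented for this illustration, and the data are a simplified version of the question's df.

```r
library(ggplot2)

set.seed(42)
df <- data.frame(
  values = sample(0:5, size = 1000, replace = TRUE),
  group  = rep(1:100, each = 10),
  year   = rep(2001:2010, times = 100)
)

# Two-level factor used only for colour; `group` still defines the individual lines
df$highlight <- factor(ifelse(df$group == 1, "group 1", "other groups"),
                       levels = c("other groups", "group 1"))

p <- ggplot(df, aes(x = year, y = values, group = group, colour = highlight)) +
  geom_line() +
  scale_colour_manual("groups",
                      values = c("other groups" = "grey70", "group 1" = "red"))
print(p)
```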

insert labels in proportional bar chart with ggplot2 and geom_text

I am trying to insert labels into a proportional bar chart: one label per segment, showing the percentage of that segment. With the help of thothal I managed to do this:
var1 <- factor(as.character(c(1,1,2,3,1,4,3,2,3,2,1,4,2,3,2,1,4,3,1,2)))
var2 <- factor(as.character(c(1,4,2,3,4,2,1,2,3,4,2,1,1,3,2,1,2,4,3,2)))
data <- data.frame(var1, var2)
dat <- ddply(data, .(var1), function(.) {
res <- cumsum(prop.table(table(factor(.$var2))))
data.frame(lab = names(res), y = c(res))
})
ggplot(data, aes(x = var1)) + geom_bar(aes(fill = var2), position = 'fill') +
geom_text(aes(label = lab, x = var1, y = y), data = dat)
I would like the labels to show the percentage of each level rather than the level name.
Any help appreciated!
You are telling geom_text to use var2 as your y variable. That is in fact as.numeric(data$var2), which translates to a range of 1-4. However, your barplot uses the cumulative percentages.
Hence you have to calculate these positions before:
library(ggplot2)
library(plyr) # just for convenience
var1 <- factor(as.character(c(1,1,2,3,1,4,3,2,3,2,1,4,2,3,2,1,4,3,1,2)))
var2 <- factor(as.character(c(1,4,2,3,4,2,1,2,3,4,2,1,1,3,2,1,2,4,3,2)))
data <- data.frame(var1, var2)
dat <- ddply(data, .(var1), function(.) {
res <- cumsum(prop.table(table(factor(.$var2)))) # re-factor to use only used levels
res2 <- prop.table(table(factor(.$var2))) # re-factor to use only used levels
data.frame(lab = names(res), y = c(res), lab2 = c(res2))
})
ggplot(data, aes(x = var1)) + geom_bar(aes(fill = var2), position = 'fill') +
geom_text(aes(label = round(lab2, 2), x = var1, y = y), data = dat)
This places the labels at the end of each bar segment. If you want them slightly offset, you can play around with the creation of dat.
Another way to get non-cumulative percentage plus centering the labels, for future reference:
dat <- ddply(data, .(var1), function(.) {
good <- prop.table(table(factor(.$var2)))
res <- cumsum(prop.table(table(factor(.$var2))))
data.frame(lab = names(res), y = c(res), good = good, pos = cumsum(good) - 0.5*good)
})
ggplot(data, aes(x = var1)) + geom_bar(aes(fill = var2), position = 'fill') +
geom_text(aes(label = round(good.Freq, 2), x = var1, y = pos.Freq), data = dat)
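The label positions can also be left to ggplot2 by precomputing the proportions and letting position_fill(vjust = 0.5) centre each label in its segment. This is a sketch under assumptions rather than the method used in the answers above: props is a name chosen here, and the proportions are computed with base R instead of plyr. Because bars and labels share the same position adjustment, they are stacked in the same order.

```r
library(ggplot2)

var1 <- factor(c(1,1,2,3,1,4,3,2,3,2,1,4,2,3,2,1,4,3,1,2))
var2 <- factor(c(1,4,2,3,4,2,1,2,3,4,2,1,1,3,2,1,2,4,3,2))

# Share of each var2 level within each var1 bar (each bar sums to 1)
props <- as.data.frame(prop.table(table(var1, var2), margin = 1),
                       responseName = "prop")
props <- props[props$prop > 0, ]  # drop empty segments so they get no label

p <- ggplot(props, aes(x = var1, y = prop, fill = var2)) +
  geom_col(position = "fill") +
  geom_text(aes(label = paste0(round(100 * prop), "%")),
            position = position_fill(vjust = 0.5))
print(p)
```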
I used the following code and it works well for me; give it a try.
geom_text(aes(label = paste(round(value, 0), "%"),
vjust = ifelse(value >= 0, -0.05, 1.15)
),
size = 4, position = position_stack(vjust=0.5)
)
Basically, you need label = paste(y value, "%"). In my code, value is the column holding the y value shown in the figure, taken from the data frame passed to the plot (dat2 in my case); referring to it as value rather than dat2$value inside aes() keeps the mapping tied to the layer's data. Here I rounded the number to 0 decimal places. Good luck.
