I think I'm missing something very easy here, but I just can't figure it out at the moment:
I would like to consistently assign colors to certain values from a column across multiple plots.
So I have this tibble (sl):
# A tibble: 15 x 3
class hex x
<chr> <chr> <int>
1 translational slide #c23b22 1
2 rotational slide #AFC6CE 2
3 fast flow-type #b7bf5e 3
4 complex #A6CEE3 4
5 area subject to rockfall/topple #1F78B4 5
6 fall-type #B2DF8A 6
7 n.d. #33A02C 7
8 NA #FB9A99 8
9 area subject to shallow-slides #E31A1C 9
10 slow flow-type #FDBF6F 10
11 topple #FF7F00 11
12 deep-seated movement #CAB2D6 12
13 subsidence #6A3D9A 13
14 areas subject to subsidence #FFFF99 14
15 area of expansion #B15928 15
This should recreate it:
structure(list(class = c("translational slide", "rotational slide",
"fast flow-type", "complex", "area subject to rockfall/topple",
"fall-type", "n.d.", NA, "area subject to shallow-slides", "slow flow-type",
"topple", "deep-seated movement", "subsidence", "areas subject to subsidence",
"area of expansion"), hex = c("#c23b22", "#AFC6CE", "#b7bf5e",
"#A6CEE3", "#1F78B4", "#B2DF8A", "#33A02C", "#FB9A99", "#E31A1C",
"#FDBF6F", "#FF7F00", "#CAB2D6", "#6A3D9A", "#FFFF99", "#B15928"
), x = 1:15), row.names = c(NA, -15L), class = c("tbl_df", "tbl",
"data.frame"))
Now I would like to plot each class as a bar in the color of its hex code (for now just for visualization purposes). So I did:
ggplot(sl) +
geom_col(aes(x = x,
y = 1,
fill = class)) +
scale_fill_manual(values = sl$hex) +
geom_text(aes(x = x,
y = 0.5,
label = class),
angle = 90)
But these are not the colors as they are in the tibble.
So I tried to follow this guide: How to assign colors to categorical variables in ggplot2 that have stable mapping? and created this:
# create the color palette
mycols = sl$hex
names(mycols) = sl$class
and then plotted it with
ggplot(sl) +
geom_col(aes(x = x,
y = 1,
fill = class)) +
scale_fill_manual(values = mycols) +
geom_text(aes(x = x,
y = 0.5,
label = class),
angle = 90)
But the result is the same.
For example, the translational slide has the hex code "#c23b22" and should be a pastel, darkish red.
Does anyone have an idea what I'm missing here?
Consider this:
sl <- structure(list(class = c("translational slide", "rotational slide",
"fast flow-type", "complex", "area subject to rockfall/topple",
"fall-type", "n.d.", NA, "area subject to shallow-slides", "slow flow-type",
"topple", "deep-seated movement", "subsidence", "areas subject to subsidence",
"area of expansion"), hex = c("#c23b22", "#AFC6CE", "#b7bf5e",
"#A6CEE3", "#1F78B4", "#B2DF8A", "#33A02C", "#FB9A99", "#E31A1C",
"#FDBF6F", "#FF7F00", "#CAB2D6", "#6A3D9A", "#FFFF99", "#B15928"
), x = 1:15), row.names = c(NA, -15L), class = c("tbl_df", "tbl",
"data.frame"))
sl$class <- factor( sl$class, levels=unique(sl$class) )
cl <- sl$hex
names(cl) <- paste( sl$class )
ggplot(sl) +
geom_col(aes(x = x,
y = 1,
fill = class)) +
scale_fill_manual( values = cl, na.value = cl["NA"] ) +
geom_text(aes(x = x,
y = 0.5,
label = class),
angle = 90)
By converting class to a factor with explicit levels, using a named vector for the values in scale_fill_manual, and setting na.value properly, you might get something that looks more as expected.
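As a rough illustration of why the named vector and the paste() call matter, here is a base-R sketch on made-up toy data (the class names and the object pal are assumptions for the example, not the asker's tibble). It only shows the name lookup and draws nothing.

```r
# Toy data (hypothetical values, not the asker's tibble)
class <- c("slide", "fall", NA)
hex   <- c("#c23b22", "#1F78B4", "#FB9A99")

# Named palette: paste() turns the missing class into the string "NA",
# so the colour meant for NA can be looked up as pal["NA"]
pal <- hex
names(pal) <- paste(class)

# Default factor levels are sorted and drop NA: "fall", "slide"
lv <- levels(factor(class))

# Lookup by name still returns the intended colour for each level
pal[lv]
pal[["NA"]]
```

Because the lookup is by name, the row order of the table and the order of the factor levels no longer need to agree.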
You need to supply the colors in the same order as the values in your column; since there is already a column called 'x', I have used it as the fill variable. I also replaced NA with the character string 'NA'. I have checked a few of them; please let me know if this is not the desired output. Thanks
#Assuming df is your dataframe:
df[is.na(df$class), 'class'] <- 'NA'
ggplot(df) +
geom_col(aes(x = x,
y = 1,
fill = factor(x))) +
scale_fill_manual(values = df$hex, labels=df$class) +
geom_text(aes(x = x,
y = 0.5,
label = class),
angle = 90)
I think the problem is that scale_fill_manual expects the order of its values and labels arguments to match. This isn't the case with your dataset.
Does
sl %>% ggplot() +
geom_col(aes(x = x,
y = 1,
fill = hex)) +
geom_text(aes(x = x,
y = 0.5,
label = class),
angle = 90) +
scale_fill_manual(values=sl$hex, labels=sl$class)
Give you what you want?
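The ordering issue mentioned above can be seen without plotting. For a character column, ggplot2 orders the discrete values by sorting them, and unnamed values passed to scale_fill_manual are assigned in that order. A small base-R sketch on toy data (the names and colours are hypothetical):

```r
# Toy table: each class with the colour it is supposed to get
toy <- data.frame(class = c("slide", "complex", "topple"),
                  hex   = c("red", "green", "blue"),
                  stringsAsFactors = FALSE)

# Order in which a discrete scale sees the values of a character column
lv <- sort(unique(toy$class))

# Unnamed colours are paired with the sorted values by position
by_position <- setNames(toy$hex, lv)

# Colours paired by name, as the table intends
by_name <- setNames(toy$hex, toy$class)

by_position[["slide"]]   # "green", not what the table says
by_name[["slide"]]       # "red"
```

The same sorting applies whichever column is mapped to fill, so it is worth checking a couple of bars against the table.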
Next time, please dput() your test data: it took me as long to create the test dataset as to answer your question. Also, using hex codes for colours makes it difficult to check that the colours are as expected. For an MWE, blue/green/black etc. would have been more helpful.
I had an idea for a visualization that involves generating a plot for each row in my dataset (58 rows), showing the relative position of a selected value on a scale (e.g., 58 cities and the position of one city's population size relative to the others).
Here's a code sample showing my data structure (nregs is the name of the regions I'm studying). I want to create a 'rank plot' as described for each row: one plot ranking by total_pop and another ranking by urban_pop.
structure(list(nregs = c("1.1 Javari e Interbacias Javari - Juruá",
"1.2 Transf. da Margem Esquerda do Solimões", "1.3 Juruá e Interbacias Juruá - Jutaí",
"1.4 Purus e Interbacias Purus - Juruá", "1.5 Negro", "1.6 Madeira e Interbacias Madeira - Purus",
"1.7 Estaduais Margem Esquerda do Amazonas", "1.8 Tapajós e Interbacias Tapajós - Madeira",
"1.9 Estaduais PA", "1.10 Xingu e Interbacias Xingu - Tapajós"
), urban_pop = c(63777, 83237, 265725, 717181, 2122424, 1693933,
837519, 1169865, 171045, 515124), total_pop = c(111120, 141473,
405955, 910484, 2357696, 2320307, 933181, 1639624, 304181, 831595
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))
As English is not my native language, I'm finding it difficult to even search for a solution online. I usually do my dataviz with R and tidyverse. Can anybody at least point me in a direction? Thanks in advance.
It sounds like you're looking for something like this:
library(ggplot2)
library(dplyr)
df %>%
mutate(urban_pop = rank(urban_pop),
total_pop = rank(total_pop)) %>%
tidyr::pivot_longer(-1) %>%
ggplot(aes(value, nregs)) +
geom_segment(aes(x = 1, y = nregs, xend = 10, yend = nregs)) +
geom_segment(data = expand.grid(x = seq(nrow(df)), y = seq(nrow(df)) - 0.1),
aes(x = x, y = y, xend = x, yend = y + 0.2)) +
scale_x_continuous(breaks = seq(nrow(df)), labels = rev(seq(nrow(df))),
name = "Rank") +
geom_point(aes(color = name), position = position_dodge(width = 0.5),
size = 4) +
scale_color_manual(values = c("red", "forestgreen")) +
theme_void() +
theme(axis.text.y = element_text(hjust = 1),
axis.text.x = element_text(),
axis.title.x = element_text(size = 16))
Note that the ranks of urban and total population appear to be the same for each region in your sample.
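The observation about identical ranks can be checked directly with base R. The sketch below copies the two population columns from the question's sample; the object names same_ranks and rank_desc are only illustrative.

```r
# Population columns copied from the sample data in the question
urban_pop <- c(63777, 83237, 265725, 717181, 2122424, 1693933,
               837519, 1169865, 171045, 515124)
total_pop <- c(111120, 141473, 405955, 910484, 2357696, 2320307,
               933181, 1639624, 304181, 831595)

# rank() gives 1 to the smallest value
rank(urban_pop)

# Do the two rankings coincide for this sample?
same_ranks <- identical(rank(urban_pop), rank(total_pop))

# Negating the input gives rank 1 to the largest population instead
rank_desc <- rank(-urban_pop)
```

The plotting code above keeps the ascending ranks and reverses the axis labels; ranking the negated values is an alternative way to put the largest region at rank 1.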
I am trying to plot three variables (SA, SA1, SA2), with two variables (SA and SA2) on the left y-axis and one variable (SA1) on the right, secondary y-axis. I tried to fix the left y-axis limits using limits = c(1e15, 5e15) and to restrict the secondary axis to limits = c(3e17, 4.2e17), but I am unable to plot the secondary axis with my customized limits. DATA Link
library(ggplot2)
library(xlsx)  # provides read.xlsx2()
test <- read.xlsx2("filepath/test.xlsx", 1, header=TRUE)
View(test)
# convert factor columns back to their numeric values
test$SA=as.numeric(levels(test$SA))[test$SA]
test$SA1=as.numeric(levels(test$SA1))[test$SA1]
test$SA2=as.numeric(levels(test$SA2))[test$SA2]
# a trailing + continues the expression; a leading + on a new line does not
g <- ggplot(test, aes(x = year, y = SA, group = 1)) +
  geom_line() +
  geom_line(aes(y = SA2), color = "red") +
  geom_line(aes(y = SA1), size = 1, color = "blue")
g + scale_y_continuous(name = "primary axis title",
                       sec.axis = sec_axis(~./5, name = "secondary axis title (SA1)"))
The final solution by @dc37 gives me the following result:
ggplot(subset(DF, Var != "SA1"), aes(x = year, y = val, color = Var))+
geom_line()+
scale_y_continuous(name = "Primary axis", sec.axis = sec_axis(~.*100, name = "Secondary"))
Thanks
The sec.axis argument only creates a new axis; it does not change your data and can't be used to plot data.
To be able to plot data from two groups with very different ranges, you need to scale down SA1 first.
Here, I scaled it down by dividing it by 100 (because the ratio between the max of SA1 and the max of SA and SA2 is close to 100), and I also reshaped your dataframe into a longer format that is more suitable for ggplot2:
library(lubridate)
df$year = parse_date_time(df$year, orders = "%Y") # To set year in a date format
library(dplyr)
library(tidyr)
DF <- df %>% mutate(SA1_100 = SA1/100) %>% pivot_longer(.,-year, names_to = "Var",values_to = "val")
# A tibble: 44 x 3
year Var val
<int> <chr> <dbl>
1 2008 SA 1.41e15
2 2008 SA1 3.63e17
3 2008 SA2 4.07e15
4 2008 SA1_100 3.63e15
5 2009 SA 1.53e15
6 2009 SA1 3.77e17
7 2009 SA2 4.05e15
8 2009 SA1_100 3.77e15
9 2010 SA 1.52e15
10 2010 SA1 3.56e17
# … with 34 more rows
Then, you can plot it by using (I subset the dataframe to remove "SA1" and keep the transformed column "SA1_100"):
library(ggplot2)
ggplot(subset(DF, Var != "SA1"), aes(x = year, y = val, color = Var))+
geom_line()+
scale_y_continuous(name = "Primary axis", sec.axis = sec_axis(~.*100, name = "Secondary"))
BTW, inside aes() in ggplot2 you don't need to refer to columns using $; simply write the column name.
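The arithmetic behind the scaling can be sketched in base R. The first part reproduces the "close to 100" ratio from the first three years of the data; the second part is an assumption on my side, not part of the answer above: if the secondary axis should span exactly 3e17 to 4.2e17 while the primary axis spans 1e15 to 5e15, a linear map with an offset is needed instead of a plain division. The helper names to_primary and to_secondary are hypothetical.

```r
# First three years of the data in this thread
SA  <- c(1.40916e15, 1.5336e15, 1.52473e15)
SA1 <- c(3.63e17, 3.77e17, 3.56e17)
SA2 <- c(4.07e15, 4.05e15, 3.94e15)

# Ratio of the maxima: about 93, so 100 is a convenient round factor
ratio <- max(SA1) / max(c(SA, SA2))

# Plain scaling: draw SA1 / 100, label the secondary axis with x * 100
back <- (SA1 / 100) * 100

# Linear map so that the requested limits line up exactly
prim <- c(1e15, 5e15)      # requested primary-axis limits
sec  <- c(3e17, 4.2e17)    # requested secondary-axis limits
a <- diff(prim) / diff(sec)   # slope, 1/30
b <- prim[1] - a * sec[1]     # offset, -9e15

to_primary   <- function(y) a * y + b      # apply to SA1 before plotting
to_secondary <- function(x) (x - b) / a    # inverse, for the axis labels
```

Under these assumptions SA1 would be plotted as to_primary(SA1), and the inverse would go into sec_axis, for example sec_axis(~ (. - b) / a), together with limits = prim on the primary scale.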
Data
structure(list(year = 2008:2018, SA = c(1.40916e+15, 1.5336e+15,
1.52473e+15, 1.58394e+15, 1.59702e+15, 1.54936e+15, 1.6077e+15,
1.59211e+15, 1.73533e+15, 1.7616e+15, 1.67771e+15), SA1 = c(3.63e+17,
3.77e+17, 3.56e+17, 3.68e+17, 3.68e+17, 3.6e+17, 3.6e+17, 3.68e+17,
3.55e+17, 3.58e+17, 3.43e+17), SA2 = c(4.07e+15, 4.05e+15, 3.94e+15,
3.95e+15, 3.59e+15, 3.53e+15, 3.43e+15, 3.2e+15, 3.95e+15, 3.03e+15,
3.16e+15)), row.names = c(NA, -11L), class = c("data.table",
"data.frame"))
I'm new to ggplot and have a problem with plotting errorbars in a barplot.
A minimal working example looks like this:
abun_all <- data.frame("Tree.genus" = c(rep("Acer", 5), rep("Betula", 5), rep("Larix", 5), rep("Picea", 5), rep("Pinus", 5), rep("Quercus", 5)),
"P.sampled" = c(sample(c(seq(from = 0.001, to = 0.06, by = 0.0005)), 30)),
"Insects.sampled" = c(sample(c(seq(from = 1.667, to = 533, by = 1.335)), 30)),
"Category" = as.factor(c(sample(c(seq(from = 1, to = 3, by = 1)), 30, replace = T))),
"P.sampled_mean" = c(sample(c(seq(from = 0.006, to = 0.178, by = 0.0005)), 30)),
"P.sampled_sd" = c(sample(c(seq(from = 0.004, to = 0.2137, by = 0.0005)), 30)))
ggplot(data = abun_all, aes(x = as.factor(Tree.genus), y = P.sampled , fill = Category)) +
geom_bar(stat = "identity", position = position_dodge(1)) +
geom_errorbar(aes(ymin = P.sampled - (P.sampled_mean+P.sampled_sd), ymax = P.sampled + (P.sampled_mean+P.sampled_sd)), width = 0.1, position = position_dodge(1)) + scale_fill_discrete(name = "Category",
breaks = c(1, 2, 3),
labels = c("NrAm in SSM", "NrAm in FR", "Eurp in FR")) +
xlab("Genus") + ylab("No. of Focus sp. per total insect abundance")
NOTE : The values are just random and do not represent the actual data but should suffice to demonstrate the problem !
The problem seems to be that error bars are plotted for every entry of each Tree.genus per Category. How can I get this to work?
Edit: I created another Df by hand with just the max values of each P.sampled combination and now the plot looks the way I want it (except for the two missing errorbars).
abun_plot <- data.frame("Tree.genus" = rep(genera, each = 3),
"P.sampled" = c(0.400000000, 0.100000000, 0.500000000, 0.200000000, 0.100000000, 0.042857143, 0.016666667, 0.0285714286, 0.0222222222, 0.020000000, 0, 0.010000000, 0.060000000, 0.025000000, 0.040000000, 0.250000000, 0.150000000, 0.600000000),
"Category" = as.factor(rep(c(1,2,3), 3)),
"P.sampled_SD" = as.numeric(c(0.08493057, 0.02804758, 0.19476489, 0.04533747, 0.02447665, 0.01308939, 0.004200168, "NA", 0.015356359, 0.005724859, "NA", "NA", 0.01633612, 0.01013794, 0.02045931, 0.07584737, 0.05760980, 0.21374053)),
"P.sampled_Mean" = as.numeric(c(0.07837134, 0.05133333, 0.14089286, 0.04537983, 0.02686200, 0.01680721, 0.005833333, 0.028571429, 0.011363636, 0.01101331, "NA", 0.01000000, 0.02162986, 0.01333333, 0.01668582, 0.08705221, 0.04733333, 0.17870370)))
ggplot(data = abun_plot, aes(x = as.factor(Tree.genus), y = P.sampled , fill = Category)) +
geom_bar(stat = "identity", position = position_dodge(1)) +
geom_errorbar(aes(ymin = P.sampled - P.sampled_SD, ymax = P.sampled + P.sampled_SD), width = 0.1, position = position_dodge(1)) +
scale_fill_discrete(name = "Category",
breaks = c(1, 2, 3),
labels = c("NrAm in SSM", "NrAm in FR", "Eurp in FR")) +
xlab("Genus") + ylab("No. of Focus sp. per total insect abundance")
Since doing this by hand takes a lot of time and several other plots have the same problem, I would prefer working with the original df (abun_all). Can I just subset my df in the ggplot() function to get the desired output ?
Since you want to just show the maximum value for each combination of genus and category, you can use a couple of dplyr functions (in the tidyverse alongside ggplot2) to group by both genus and category, then take the top value for each. That way, you aren't building abun_plot by hand the way you did in the second block.
library(dplyr)
library(ggplot2)
abun_plot <- abun_all %>%
group_by(Tree.genus, Category) %>%
top_n(1, P.sampled_mean)
head(abun_plot)
#> # A tibble: 6 x 6
#> # Groups: Tree.genus, Category [6]
#> Tree.genus P.sampled Insects.sampled Category P.sampled_mean P.sampled_sd
#> <fct> <dbl> <dbl> <fct> <dbl> <dbl>
#> 1 Acer 0.041 295. 3 0.0125 0.044
#> 2 Acer 0.044 81.8 1 0.166 0.037
#> 3 Acer 0.0085 379. 2 0.155 0.134
#> 4 Betula 0.0505 183. 2 0.170 0.0805
#> 5 Betula 0.0325 61.7 3 0.0405 0.0995
#> 6 Betula 0.0465 326. 1 0.0985 0.188
After that, the plotting works as you initially expected:
ggplot(data = abun_plot, aes(x = as.factor(Tree.genus), y = P.sampled , fill = Category)) +
geom_col(position = position_dodge(1)) +
geom_errorbar(aes(ymin = P.sampled - P.sampled_sd, ymax = P.sampled + P.sampled_sd), width = 0.1, position = position_dodge(1)) +
scale_fill_discrete(name = "Category",
breaks = c(1, 2, 3),
labels = c("NrAm in SSM", "NrAm in FR", "Eurp in FR")) +
xlab("Genus") + ylab("No. of Focus sp. per total insect abundance")
It's also worth noting that as of a few releases back of ggplot2, you can use geom_col() in place of geom_bar(stat = "identity").
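For readers without dplyr at hand, the "keep the top row per group" step can be approximated in base R with ave(). This is a sketch on invented toy values (the data frame toy and its column names are assumptions), not a drop-in replacement for the pipeline above.

```r
# Toy data: several measurements per genus/category combination
toy <- data.frame(genus    = c("Acer", "Acer", "Acer", "Betula", "Betula"),
                  category = c(1, 1, 2, 1, 1),
                  value    = c(0.2, 0.5, 0.1, 0.3, 0.4))

# ave() returns the group maximum repeated for every row of its group
group_max <- ave(toy$value, toy$genus, toy$category, FUN = max)

# Keep only the rows that hold their group's maximum
top <- toy[toy$value == group_max, ]
top
```

Ties are kept, so a group in which two rows share the maximum contributes both rows.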
Created on 2018-10-03 by the reprex package (v0.2.1)
I am creating plots similar to the first example image below, and need plots like the second example below.
library(ggplot2)
library(scales)
# some data
data.2015 = data.frame(score = c(-50,20,15,-40,-10,60),
area = c("first","second","third","first","second","third"),
group = c("Findings","Findings","Findings","Benchmark","Benchmark","Benchmark"))
data.2014 = data.frame(score = c(-30,40,-15),
area = c("first","second","third"),
group = c("Findings","Findings","Findings"))
# breaks and limits
breaks.major = c(-60,-40,-22.5,-10, 0,10, 22.5, 40, 60)
breaks.minor = c(-50,-30,-15,-5,0, 5, 15,30,50)
limits =c(-70,70)
# plot 2015 data
ggplot(data.2015, aes(x = area, y = score, fill = group)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
coord_flip() +
scale_y_continuous(limit = limits, oob = squish, minor_breaks = breaks.minor, breaks = breaks.major)
The data.2014 has only values for the "Findings" group. I would like to show those 2014 Findings values on the plot, on the appropriate/corresponding data.2015$area, where there is 2014 data available.
To show last year's data just on the "Finding" (red bars) data, I'd like to use a one-sided errorbar/whisker that emanates from the value of the relevant data.2015 bar, and terminates at the data.2014 value, for example:
I thought to do this by using layers and plotting error bars so that the 2015 data could overlap, however this doesn't work when the 2014 result is abs() smaller than the 2015 result and is thus occluded.
Considerations:
I'd like the errorbar/whisker to be the same width as the bars, perhaps even dashed line with a solid cap.
Bonus points for a red line when the value has decreased, and green when the value has increased
I generate lots of these plots in a loop, sometimes with many groups, with a different amount of areas in each plot. The 2014 data is (at this stage) always displayed only for a single group, and every area has some data (except for just one NA case, but need to provision for that scenario)
EDIT
So I've added to the solution below. I used that exact code but with geom_linerange so that it adds lines without the caps, and I also used geom_errorbar with ymin and ymax set to the same value, so that the result is a one-sided error bar on a ggplot geom_bar. Thanks for the help.
I believe you can get most of what you want with a little data manipulation. Doing an outer join of the two datasets will let you add the error bars with the appropriate dodging.
alldat = merge(data.2015, data.2014, all = TRUE, by = c("area", "group"),
suffixes = c(".2015", ".2014"))
To make the error bar one-sided, you'll want ymin to be either the same as y or NA depending on the group. It seemed easiest to make a new variable, which I called plotscore, to achieve this.
alldat$plotscore = with(alldat, ifelse(is.na(score.2014), NA, score.2015))
The last thing I did is to make a variable direction for when the 2015 score decreased vs increased compared to 2014. I included a third category for the Benchmark group as filler because I ran into some issues with the dodging without it.
alldat$direction = with(alldat, ifelse(score.2015 < score.2014, "dec", "inc"))
alldat$direction[is.na(alldat$score.2014)] = "absent"
The dataset used for plotting would look like this:
area group score.2015 score.2014 plotscore direction
1 first Benchmark -40 NA NA absent
2 first Findings -50 -30 -50 dec
3 second Benchmark -10 NA NA absent
4 second Findings 20 40 20 dec
5 third Benchmark 60 NA NA absent
6 third Findings 15 -15 15 inc
The final code I used looked like this:
ggplot(alldat, aes(x = area, y = score.2015, fill = group)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
geom_errorbar(aes(ymin = plotscore, ymax = score.2014, color = direction),
position = position_dodge(width = .9), lwd = 1.5, show.legend = FALSE) +
coord_flip() +
scale_y_continuous(limit = limits, oob = squish, minor_breaks = breaks.minor, breaks = breaks.major) +
scale_color_manual(values = c(NA, "red", "green"))
I'm using the development version of ggplot2, ggplot2_1.0.1.9002, and show_guide is now deprecated in favor of show.legend, which I used in geom_errorbar.
I obviously didn't change the line type of the error bars to dashed with a solid cap, nor did I remove the bottom whisker as I don't know an easy way to do either of these things.
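The direction variable and the colour order used above can be traced in a few lines of base R. The sketch takes three rows from the printed table; the object cols is only illustrative. It shows why the second assignment is needed (a comparison with NA yields NA) and why c(NA, "red", "green") lines up with the sorted category names.

```r
# Three rows taken from the printed table
score.2015 <- c(-40, -50, 15)
score.2014 <- c(NA, -30, -15)

# Comparing with NA gives NA, so the first element is still missing here
direction <- ifelse(score.2015 < score.2014, "dec", "inc")

# Fill the gap with an explicit third category
direction[is.na(score.2014)] <- "absent"

# Sorted categories are "absent", "dec", "inc", which is the order
# in which unnamed manual colours are assigned
cols <- setNames(c(NA, "red", "green"), sort(unique(direction)))
cols
```

Naming the colour vector explicitly in this way is one option for keeping the mapping readable if more categories are added later.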
In response to a comment suggesting I add the full solution as an answer:
library(ggplot2)
library(scales)
# some data
data.2015 = data.frame(score = c(-50,20,15,-40,-10,60),
area = c("first","second","third","first","second","third"),
group = c("Findings","Findings","Findings","Benchmark","Benchmark","Benchmark"))
data.2014 = data.frame(score = c(-30,40,-15),
area = c("first","second","third"),
group = c("Findings","Findings","Findings"))
# breaks and limits
breaks.major = c(-60,-40,-22.5,-10, 0,10, 22.5, 40, 60)
breaks.minor = c(-50,-30,-15,-5,0, 5, 15,30,50)
limits =c(-70,70)
# reconfigure data to create values for the additional errorbar/linerange
alldat = merge(data.2015, data.2014, all = TRUE, by = c("area", "group"),
suffixes = c(".2015", ".2014"))
alldat$plotscore = with(alldat, ifelse(is.na(score.2014), NA, score.2015))
alldat$direction = with(alldat, ifelse(score.2015 < score.2014, "dec", "inc"))
alldat$direction[is.na(alldat$score.2014)] = "absent"
ggplot(alldat, aes(x = area, y = score.2015, fill = group)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
# set the data min and max as the same to have a single 'cap' with no line
geom_errorbar(aes(ymin = score.2014, ymax = score.2014, color = direction),
position = position_dodge(width = .9), lwd = 1.5, show.legend = FALSE) +
#then add the line
geom_linerange(aes(ymin = score.2015, ymax = score.2014, color = direction),
position = position_dodge(width = .9), lwd = 1.5, show.legend = FALSE) +
coord_flip() +
scale_y_continuous(limit = limits, oob = squish, minor_breaks = breaks.minor, breaks = breaks.major) +
scale_color_manual(values = c(NA, "red", "green"))