Values in gganimate col chart differ from original data values - r

I'm getting started with animated charts using the gganimate package. I've found that when I generate a col chart animated over time, the plotted values differ from the original data. Let me show you an example:
Data <- as.data.frame(cbind(c(1,1,1,2,2,2,3,3,3),
c("A","B","C","A","B","C","A","B","C"),
c(20,10,15,20,20,20,30,25,35)))
colnames(Data) <- c("Time","Object","Value")
Data$Time <- as.integer(Data$Time)
Data$Value <- as.numeric(Data$Value)
Data$Object <- as.character(Data$Object)
p <- ggplot(Data,aes(Object,Value)) +
stat_identity() +
geom_col() +
coord_cartesian(ylim = c(0,40)) +
transition_time(Time)
p
The resulting chart looks like this:
The values on the y-axis range from 1 to 6. It seems that the original value 10 appears as 1 on the y-axis, 15 as 2, 20 as 3, and so on.
Is there a way to keep the original values in the chart?
Thanks in advance

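Your data changed when you coerced a factor variable to numeric. cbind() turned all three vectors into a single character matrix, as.data.frame() then stored each column as a factor (the default before R 4.0.0), and as.numeric() on a factor returns its internal level codes rather than the original values. (See the Data section for how to define a data.frame directly.)
You were also missing position = "identity", which keeps your bars in place instead of stacking them. I added fill = Time for illustration.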
Data
Data <- data.frame(Time = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                   Object = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
                   Value = c(20, 10, 15, 20, 20, 20, 30, 25, 35))
Code
library(ggplot2)
library(gganimate)
# position = "identity" draws each bar from zero instead of stacking them
p <- ggplot(Data, aes(Object, Value, fill = Time)) +
  geom_col(position = "identity") +
  coord_cartesian(ylim = c(0, 40)) +
  transition_time(Time)
p
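The coercion problem can be reproduced in base R without ggplot2. This is a minimal sketch, assuming a factor column like the one as.data.frame(cbind(...)) produced under the pre-4.0 default; the factor is created explicitly here so the result does not depend on the R version:

```r
# Values as they ended up after cbind(): character strings stored as a factor
value_factor <- factor(c("20", "10", "15", "20", "20", "20", "30", "25", "35"))

# as.numeric() on a factor returns the internal level codes (1, 2, ...).
# The levels are sorted: "10" "15" "20" "25" "30" "35"
codes <- as.numeric(value_factor)
print(codes)

# Converting to character first recovers the original numbers
values <- as.numeric(as.character(value_factor))
print(values)
```

The codes come out as 3 1 2 3 3 3 5 4 6, which is exactly the 10 -> 1, 15 -> 2, 20 -> 3 pattern from the question. Defining the columns directly with data.frame(), as in the Data section, avoids the detour through a character matrix altogether.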

Related

logarithmic y-axis issue in R/ ggplot2

I plotted a histogram from a frequency distribution table using ggplot2. Here is some sample data
dput(test_data)
structure(list(inst = c(5, 5, 5, 10, 10, 10, 15, 15, 15), equip = c("a",
"b", "c", "a", "b", "c", "a", "b", "c"), value = c(0.520670542493463,
0.7556017707102, 0.931902746669948, 0.206132101127878, 0.0114199279341847,
0.603053622646257, 0.315444506937638, 0.375196750741452, 0.983124621212482
)), class = "data.frame", row.names = c(NA, -9L))
When I use ggplot2 to plot the data, I get the following output:
library(ggplot2)
test_hist1 <- ggplot(test_data, aes(x = inst, y = value, fill = equip)) +
  geom_bar(width = 3, alpha = 1, stat = "identity", position = "stack") +
  theme_bw() + xlab(expression(Value)) + ylab("value") +
  ggtitle(expression(test~data)) +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_manual(values = c("#00FF00", "#FFD700", "#DC143C"))
But when I transform the y_axis to be a log_axis, the plot direction changes and so does the intensity of the bars.
test_hist2 <- ggplot(test_data, aes(x = inst, y = value, fill = equip)) +
  geom_bar(width = 3, alpha = 1, stat = "identity", position = "stack") +
  theme_bw() + xlab(expression(Value)) + ylab("log_yaxis") +
  ggtitle(expression(test~data)) +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_manual(values = c("#00FF00", "#FFD700", "#DC143C")) +
  scale_y_log10()
My second plot is wrong: the code just converts each y value to log10(value) instead of drawing a logarithmic axis like the one in the answer to the following question (the plot in that answer shows the axis I am looking for). Can someone point me in the right direction? Thanks for the help.
R: Difference between log axis scale vs. manual log transformation?
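A quick base-R check illustrates why the bars flip. All of the values lie between 0 and 1, so their base-10 logarithms are negative, and because ggplot2 applies scale transformations such as scale_y_log10() before position adjustments, stacking adds up the logs rather than the original values. This is a sketch of that arithmetic only, not a fix:

```r
# Values for inst == 5 from test_data
value <- c(0.520670542493463, 0.7556017707102, 0.931902746669948)

# Every value is below 1, so every log10 is negative:
# in log space the bars grow downward from log10(1) = 0
logs <- log10(value)
print(logs)

# Stacking happens on the transformed values, so the top of the stack
# sits at the sum of the logs, i.e. log10 of the product of the values
stack_top <- sum(logs)
print(stack_top)
print(log10(prod(value)))
```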

ggplot2 bar chart with two bars for each x value of data and two y-axis

I'm struggling to create a bar chart with two different y-axes and two bars for each x value (category).
I have several categories of data (see below), and for each one I have two values that I want to plot side by side (price and number). However, the two values within each category are far apart, which makes the number bars almost invisible. So I want to add a second y-axis (one for price, one for number) to allow a comparison between the two types.
Example data:
Cat Type Value
1 A price 12745
2 A number 5
3 B price 34874368
4 B number 143
5 C price 84526
6 C number 11
I use the following R code (ggplot2) to create the plot:
library(ggplot2)
library(ggpubr) # provides labs_pubr()
plot <- ggplot(df, aes(x = Cat, fill = Type, y = Value)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_bw() +
  labs_pubr() +
  scale_fill_grey() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
plot
I tried scale_y_continuous and sec.axis, but I did not manage to assign each y-axis to one type of data:
scale_y_continuous(
"price",
sec.axis = sec_axis(~., name = "number")
) +
I am happy for every hint :)
Is that what you mean?
library(tibble)
library(dplyr)
library(ggplot2)
df <- tribble(
  ~Id, ~Cat, ~Type, ~Value,
  1, "A", "price", 13,
  2, "A", "number", 5,
  3, "B", "price", 19,
  4, "B", "number", 12,
  5, "C", "price", 8,
  6, "C", "number", 11)
df %>% ggplot(aes(x=Type, fill=Type, y=Value))+
geom_col()+
facet_grid(~Cat)
P.S.
I changed your values a bit, because with differences on the order of 10^7 you could hardly see anything.
With your original numbers, a logarithmic scale is better suited:
df=tribble(
~Id, ~Cat, ~Type, ~Value,
1, "A", "price", 12745,
2, "A", "number", 5,
3, "B", "price", 34874368,
4, "B", "number", 143,
5, "C", "price", 84526,
6, "C", "number", 11)
df %>% ggplot(aes(x=Type, fill=Type, y=Value))+
geom_col()+
scale_y_continuous(trans='log10')+
facet_grid(~Cat)
The idea, as I understand it, is to split the graphs by Type, which you can do with ggplot's helpful facet_wrap() function. Then use the scales package to format the y-axis labels as full numbers rather than in scientific notation.
library(scales)
library(ggplot2)
library(dplyr)
tbl <- tibble(Cat = c("A", "A", "B", "B", "C", "C"),
              Type = c("price", "number", "price", "number", "price", "number"),
              Value = c(12745, 5, 34874368, 143, 84526, 11))
tbl %>%
  ggplot(aes(Cat, Value, fill = Cat)) +
  geom_col(position = "dodge") +
  facet_wrap(~Type, scales = "free") +
  scale_y_continuous(labels = scales::number_format())
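Regarding the sec.axis attempt in the question: in ggplot2, sec_axis() only draws a one-to-one transformation of the primary axis, so it cannot be attached to one Type on its own. A common workaround, sketched here in base R as an assumption rather than something either answer proposes, is to rescale the number values into the price range by a factor k and divide by the same k when labelling the secondary axis. The names k and Plotted are hypothetical:

```r
df <- data.frame(
  Cat   = rep(c("A", "B", "C"), each = 2),
  Type  = rep(c("price", "number"), times = 3),
  Value = c(12745, 5, 34874368, 143, 84526, 11)
)

is_number <- df$Type == "number"

# Hypothetical scale factor: map the largest "number" onto the largest "price"
k <- max(df$Value[!is_number]) / max(df$Value[is_number])

# Column to plot on the primary (price) axis; numbers are multiplied by k
df$Plotted <- ifelse(is_number, df$Value * k, df$Value)

# The secondary axis undoes the scaling, so dividing by k recovers the counts
recovered <- df$Plotted[is_number] / k
print(recovered)
```

With ggplot2, the rescaled column would then be drawn with aes(x = Cat, y = Plotted, fill = Type) and geom_col(position = "dodge"), adding scale_y_continuous("price", sec.axis = sec_axis(~ . / k, name = "number")). This does not help the price bars for A and C, which stay tiny next to B, so the faceted or log-scale answers above may still read better.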

GGPLOT2 : geom_area with ordered character variable as x axis

I have a dataset like the following :
dat <- data.frame(sp = c("a", "a", "b", "b", "b", "c", "c"),
nb = c(5, 44, 32, 56, 10, 1, 43),
gp = c("ds1", "ds2", "ds1", "ds2", "ds3", "ds1", "ds3"))
With sp = species ; nb = nb occurrences ; gp = sampling group
I want to make a geom_area graph where the occurrence counts (nb) are displayed on the y axis, with species (sp) on the x axis, ordered in descending order of their total sum.
So far I have only managed this:
ggplot(dat, aes(x=as.numeric(factor(sp)), y=nb, fill=gp, colour = gp)) +
geom_area()
Which gives this output (please don't laugh ;))
Could you help me sort the x axis in descending order of the sum of the stacked values? And fill the empty areas?
For example, I'm trying to get something like this (here in ascending order, but the direction doesn't matter):
Try this. The gaps in your plot could be filled by filling the df with the missing combinations of gp and sp using tidyr::complete. To reorder the levels of sp I make use of forcats::fct_reorder:
library(ggplot2)
library(dplyr)
library(tidyr)
library(forcats)
dat <- data.frame(sp = c("a", "a", "b", "b", "b", "c", "c"),
nb = c(5, 44, 32, 56, 10, 1, 43),
gp = c("ds1", "ds2", "ds1", "ds2", "ds3", "ds1", "ds3"))
dat1 <- dat %>%
# Fill with missing combinations of gp and sp
tidyr::complete(gp, sp, fill = list(nb = 0)) %>%
# Reorder according to sum of nb
mutate(sp = forcats::fct_reorder(sp, nb, sum, .desc = TRUE),
sp_num = as.numeric(sp))
ggplot(dat1, aes(x=sp_num, y=nb, fill=gp, colour = gp)) +
geom_area()
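For comparison, the same two preparation steps can be done in base R without tidyr or forcats: expand.grid() plus merge() plays the role of complete(), and reorder() replaces fct_reorder(). A sketch, assuming a descending order by total nb is wanted (hence the negated values passed to reorder()):

```r
dat <- data.frame(sp = c("a", "a", "b", "b", "b", "c", "c"),
                  nb = c(5, 44, 32, 56, 10, 1, 43),
                  gp = c("ds1", "ds2", "ds1", "ds2", "ds3", "ds1", "ds3"))

# Every gp/sp combination, like tidyr::complete(gp, sp)
full <- expand.grid(gp = unique(dat$gp), sp = unique(dat$sp),
                    stringsAsFactors = FALSE)

# Left join so missing combinations get NA, then set them to 0
dat1 <- merge(full, dat, by = c("gp", "sp"), all.x = TRUE)
dat1$nb[is.na(dat1$nb)] <- 0

# reorder() sorts levels by the ascending sum of -nb,
# which is the descending sum of nb
dat1$sp <- reorder(factor(dat1$sp), -dat1$nb, FUN = sum)
dat1$sp_num <- as.numeric(dat1$sp)

print(levels(dat1$sp))
```

The totals are b = 98, a = 49 and c = 44, so b gets sp_num 1, and dat1 can be passed to the same ggplot() call as in the answer.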

Can ggplot's faceting be used here?

Welcome to Tidyville.
Below is a small df showing the populations of cities in Tidyville. Some cities belong to the A state and some the B state.
I wish to highlight the cities that decreased in population in red. Mission accomplished so far.
But there are many states in Tidyville. Is there a way to use ggplot's faceting to show a plot for each state? I'm uncertain because I'm new, and I do a little calculation outside the ggplot call to identify the cities that decreased in population.
library(ggplot2)
library(tibble)
t1 <- tibble (
y2001 = c(3, 4, 5, 6, 7, 8, 9, 10),
y2016 = c(6, 3, 9, 2, 8, 2, 11, 15),
type = c("A", "A", "B", "B", "A", "A", "B", "B")
)
years <- 15
y2001 <- t1$y2001
y2016 <- t1$y2016
# Places where 2016 pop'n < 2001 pop'n
yd <- y2016 < y2001
decrease <- tibble (
y2001 = t1$y2001[yd],
y2016 = t1$y2016[yd]
)
# Places where 2016 pop'n >= 2001 pop'n
yi <- !yd
increase <- tibble (
y2001 = t1$y2001[yi],
y2016 = t1$y2016[yi]
)
ggplot() +
# Decreasing
geom_segment(data = decrease, aes(x = 0, xend = years, y = y2001, yend = y2016),
color = "red") +
# Increasing or equal
geom_segment(data = increase, aes(x = 0, xend = years, y = y2001, yend = y2016),
color = "black")
I think this would be much easier if you just put your data in a tidy format like ggplot2 expects. Here's a possible solution using tidyverse functions
library(tidyverse)
t1 %>%
rowid_to_column("city") %>%
mutate(change=if_else(y2016 < y2001, "decrease", "increase")) %>%
gather(year, pop, y2001:y2016) %>%
ggplot() +
geom_line(aes(year, pop, color=change, group=city)) +
facet_wrap(~type) +
scale_color_manual(values=c("red","black"))
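The key step in this answer is gather(), which turns the two year columns into one year column and one pop column, so each city becomes two rows. A rough base-R equivalent of that reshaping, shown only to make the long format concrete (the column names follow the answer above):

```r
t1 <- data.frame(
  y2001 = c(3, 4, 5, 6, 7, 8, 9, 10),
  y2016 = c(6, 3, 9, 2, 8, 2, 11, 15),
  type  = c("A", "A", "B", "B", "A", "A", "B", "B")
)
t1$city   <- seq_len(nrow(t1))
t1$change <- ifelse(t1$y2016 < t1$y2001, "decrease", "increase")

# One row per city and year, as gather(year, pop, y2001:y2016) produces
long <- rbind(
  data.frame(city = t1$city, type = t1$type, change = t1$change,
             year = "y2001", pop = t1$y2001),
  data.frame(city = t1$city, type = t1$type, change = t1$change,
             year = "y2016", pop = t1$y2016)
)

print(table(t1$change))
print(head(long[order(long$city), ]))
```

With the data in this shape, facet_wrap(~type) has a column to split on and color = change has a column to map, which is why no separate increase/decrease subsets are needed.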
Your intermediary steps are unnecessary and lose some of your data. We'll keep what you created first:
t1 <- tibble (
y2001 = c(3, 4, 5, 6, 7, 8, 9, 10),
y2016 = c(6, 3, 9, 2, 8, 2, 11, 15),
type = c("A", "A", "B", "B", "A", "A", "B", "B")
)
years <- 15
But instead of doing all the separating and subsetting, we'll just create a dummy variable for whether or not y2016 >= y2001.
t1$incr <- as.factor(ifelse(t1$y2016 >= t1$y2001, 1, 0))
Then we can move the data argument into the ggplot() call to keep things compact. We'll use a single geom_segment() layer and map the color aesthetic to the dummy variable we created before. We then pass a vector of colors to the values argument of scale_color_manual(). Finally, add facet_grid(). If you're only faceting on one variable, you put a period on the other side of the tilde: period first means the panels are placed side by side, period last means they are stacked on top of each other.
ggplot(data = t1) +
geom_segment(aes(x = 0, xend = years, y = y2001, yend = y2016, color=incr)) +
scale_color_manual(values = c("red", "black")) + # level "0" (decrease) is red
facet_grid(type~.)
I believe you don't need to create two new datasets; you can just add a column to t1.
t2 <- t1
t2$decr <- factor(yd + 0L, labels = c("increase", "decrease"))
I have left the original t1 intact and altered a copy, t2.
Now in order to apply ggplot facets, maybe this is what you are looking for.
ggplot() +
geom_segment(data = t2, aes(x = 0, xend = years, y = y2001, yend = y2016), color = "red") +
facet_wrap(~ decr)
If you want to change the colors, map the new column decr to color. Note that the argument moves inside aes(): it is now aes(..., color = decr).
ggplot() +
geom_segment(data = t2, aes(x = 0, xend = years, y = y2001, yend = y2016, color = decr)) +
facet_wrap(~ decr)
library(dplyr)
t1 <- mutate(t1, decrease = y2016 < y2001)
ggplot(t1) +
  facet_wrap(~type) +
  geom_segment(aes(x = 0, xend = years, y = y2001, yend = y2016, colour = decrease))

How can I force ggplot to show more levels on the legend?

I'm trying to create a complex ggplot plot but some things don't work as expected.
I have extracted the problematic part, the creation of points and its associated legend.
library(data.table)
library(ggplot2)
lev <- c("A", "B", "C", "D") # define levels.
bb <- c(40, 30,20,10,5)/100 # define breaks.
ll <- c("40%","30%","20%","10%","5%") # labels.
# Create data
nodos <- data.table(event = c("A", "B", "D", "C", "D"), ord = c(1, 2, 3, 3, 4),
NP = c(0.375, 0.25, 0.125, 0.125, 0.125))
ggplot() + geom_point(data=nodos,aes(x=ord,
y=event, size=NP), color="black", shape=16) +
ylim(lev) + scale_size_continuous(name="Prop.",
breaks=bb, labels=ll, range=c(0,6))+
scale_x_continuous(limits=c(0.5, 4.5),
breaks=seq(1,4,1))
As you can see, no matter which breaks and labels I use, I'm not able to force ggplot to draw a legend containing 0% or 10%.
scale_size_continuous keeps creating just two elements.
And the smaller points are very badly scaled.
I have also tried scale_size_area, but it doesn't work either.
I'm using R 3.4.2 and ggplot2 2.2.1 (also tried the latest github version).
How can I get it?
If you set the limits to encompass the breaks, you'll be able to alter the legend. Currently most of the breaks are outside the default limits of the scale.
ggplot() +
geom_point(data = nodos,
aes(x = ord, y = event, size = NP), color="black", shape = 16) +
scale_size_continuous(name = "Prop.",
breaks = bb,
limits = c(.05, .4),
labels = ll,
range = c(0, 6) )
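The reason only two sizes appear can be checked with plain arithmetic. By default the limits of a continuous scale are the range of the data, and ggplot2 drops breaks that fall outside those limits. A base-R sketch of that filtering (the helper kept_breaks() is hypothetical, not a ggplot2 function):

```r
NP <- c(0.375, 0.25, 0.125, 0.125, 0.125)
bb <- c(40, 30, 20, 10, 5) / 100

# Hypothetical helper: keep only the breaks inside the given limits
kept_breaks <- function(breaks, limits) {
  breaks[breaks >= limits[1] & breaks <= limits[2]]
}

# Default limits = range of the data (0.125 to 0.375): only 30% and 20% survive
print(kept_breaks(bb, range(NP)))

# Explicit limits from the answer: all five breaks survive
print(kept_breaks(bb, c(.05, .4)))
```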

Resources