Time Series in R with ggplot2 - r

I'm a ggplot2 newbie and have a rather simple question regarding time-series plots.
I have a data set in which the data is structured as follows.
Area 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007
MIDWEST 10 6 13 14 12 8 10 10 6 9
How do I generate a time series when the data is structured in this format?
With the reshape package, I could just alter the data to look like:
totmidc <- melt(totmidb, id="Area")
totmidc
Area variable value
1 MIDWEST 1998 10
2 MIDWEST 1999 6
3 MIDWEST 2000 13
4 MIDWEST 2001 14
5 MIDWEST 2002 12
6 MIDWEST 2003 8
7 MIDWEST 2004 10
8 MIDWEST 2005 10
9 MIDWEST 2006 6
10 MIDWEST 2007 9
Then run the following code to get the desired plot.
ggplot(totmidc, aes(variable, value, group = 1)) + geom_line() + xlab("") + ylab("")
However, is it possible to generate a time series plot from the first
object, in which the columns represent the years?

What is the error that ggplot2 gives you? The following seems to work on my machine:
Area <- as.numeric(unlist(strsplit("1998 1999 2000 2001 2002 2003 2004 2005 2006 2007", "\\s+")))
MIDWEST <- as.numeric(unlist(strsplit("10 6 13 14 12 8 10 10 6 9", "\\s+")))
qplot(Area, MIDWEST, geom = "line") + xlab("") + ylab("")
#Or in a dataframe
df <- data.frame(Area, MIDWEST)
qplot(Area, MIDWEST, data = df, geom = "line") + xlab("") + ylab("")
You may also want to check out the ggplot2 website for details on scale_x_date and the other date scales.
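If the data is still in the wide layout from the question (one column per year), one way to plot it without the reshape package is to build the long form by hand. The following is a minimal base R sketch; the names `long` and `year` are illustrative and do not come from the original post.

```r
library(ggplot2)

# Recreate the wide layout from the question: one row, one column per year
years  <- 1998:2007
counts <- c(10, 6, 13, 14, 12, 8, 10, 10, 6, 9)
totmidb <- data.frame(Area = "MIDWEST", t(counts))
names(totmidb)[-1] <- as.character(years)

# Build the long form: the column names become a numeric year variable
long <- data.frame(
  Area  = totmidb$Area,
  year  = as.numeric(names(totmidb)[-1]),
  value = unlist(totmidb[1, -1], use.names = FALSE)
)

# A numeric x axis lets geom_line() connect the points without extra grouping
p <- ggplot(long, aes(x = year, y = value)) +
  geom_line() + xlab("") + ylab("")
p
```

Because `year` is numeric here rather than a factor, no `group` aesthetic is needed.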

I am guessing that by "time series plot" you mean a bar chart rather than a line chart?
In that case, you only have to modify your code slightly to pass the correct parameters to geom_bar(). The default stat for geom_bar() is stat_count (stat_bin in ggplot2 versions before 2.0), which counts the rows in each category on the x-axis. With your data you want to override this behaviour and use stat_identity.
library(ggplot2)
# Recreate data
totmidc <- data.frame(
  Area = rep("MIDWEST", 10),
  variable = 1998:2007,
  value = round(runif(10)*10+1)
)
# Line plot
ggplot(totmidc, aes(variable, value)) + geom_line() + xlab("") + ylab("")
# Bar plot
# Note that the parameter stat="identity" passed to geom_bar()
ggplot(totmidc, aes(x=variable, y=value)) + geom_bar(stat="identity") + xlab("") + ylab("")
This produces the following bar plot:

Related

How do I get a trendline to appear on my scatter plot in ggplot?

I'm having trouble getting a trend line to appear on a scatter plot of data structured the following way:
cohort count
<chr> <int>
1 1989 5
2 1990 7
3 1991 4
4 1992 4
5 1993 8
6 1994 7
This is the code I used to produce the plot:
ggplot(bipoc, aes(x = cohort, y = count)) +
geom_point() +
geom_smooth(method = 'lm') +
theme_classic()
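A likely cause, judging from the `<chr>` type shown for cohort, is that a character x variable is treated as discrete. geom_smooth() then fits each cohort as its own group, and a group with a single point cannot produce a line. A sketch of one possible fix, assuming cohort holds plain year strings, is to convert it to numeric before plotting:

```r
library(ggplot2)

# Sample rows from the question; cohort is stored as character
bipoc <- data.frame(
  cohort = as.character(1989:1994),
  count  = c(5L, 7L, 4L, 4L, 8L, 7L),
  stringsAsFactors = FALSE
)

# Convert the year strings to numbers so x becomes a continuous scale
bipoc$cohort <- as.numeric(bipoc$cohort)

p <- ggplot(bipoc, aes(x = cohort, y = count)) +
  geom_point() +
  geom_smooth(method = "lm") +
  theme_classic()
p
```

If cohort has to stay categorical, adding group = 1 to the geom_smooth() aesthetics is another option worth trying.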

transition_time is working but transition_reveal is not working (showing error)

I am trying to animate some plots in R using ggplot2 and gganimate.
I read some tutorials and successfully animated the gapminder data.
However, I have problems when I try to animate my own dataset. First I created the plot and tried to animate it like this:
animate_data_1 <- ggplot(data_1,
aes(x = Month, y = rain, colour = factor(Year))) +
geom_line(stat = "identity") +
labs(x = "Month", y = "Rain") +
geom_point(show.legend = FALSE, alpha = 0.7) +
scale_color_viridis_d()
animate_data_1
animate_data_1 + transition_time(Year) + labs(title = "Year: {frame_time}")
The above code creates the animated plot, but it shows one year at a time and then switches to the next. Instead, I want the values for all years to build up gradually from start to end, like the example in which the data gradually appear.
So, I made this change:
animate_data_1 +
geom_point(aes(group = seq_along(Month))) +
transition_reveal(Year) +
labs(title = "Year: {frame_time}")
Now it shows this error:
Error: Provided file does not exist
In addition: There were 50 or more warnings (use warnings() to see the first 50)
What is the problem? How can I solve this?
My data:
tem Month Year rain
1 16.9760 1 1901 18.53560
2 19.9026 2 1901 16.25480
3 24.3158 3 1901 70.79810
4 28.1834 4 1901 66.16160
5 27.8892 5 1901 267.21500
6 28.8925 6 1901 341.04200
7 28.3327 7 1901 540.90700
8 27.9243 8 1901 493.21000
9 27.6057 9 1901 291.54900
10 27.0887 10 1901 199.17100
11 22.1671 11 1901 126.28500
12 18.5574 12 1901 1.69035
13 18.5455 1 1902 1.29152
14 20.1252 2 1902 0.14722
15 25.5508 3 1902 62.76860
16 26.5562 4 1902 229.58900
17 27.3165 5 1902 302.19700
18 28.2660 6 1902 528.77500
19 27.6247 7 1902 415.25700
20 28.1001 8 1902 435.16600
21 27.7271 9 1902 282.87200
22 26.0153 10 1902 76.65180
Try this, although the time frame does not show up. Note that this code does not give the gradually increasing plot; it shows all the years one after another.
animate_data_1 +
geom_point(aes(group = seq_along(Year)))+
transition_reveal(Year)
To get a movement for each month, you need to define a new variable that combines year and month, as shown below, and then animate over it with transition_time. You can change the fps value to increase or decrease the speed.
# Add the column before building animate_data_1 (or rebuild the plot afterwards),
# because a ggplot object keeps the data it was created with.
data_1$Yearm <- data_1$Year + data_1$Month*0.08
p <- animate_data_1 + transition_time(Yearm) +
  labs(title = "Year: {frame_time}")
animate(p, fps=5)
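The combined variable only needs to increase steadily with calendar time, including across the December to January boundary. The sketch below checks that on the month and year pattern of the sample data, next to an evenly spaced alternative. `Yearm12` is an illustrative name and is not part of the answer above.

```r
# Month/year pattern of the sample data: 12 months of 1901, 10 months of 1902
data_1 <- data.frame(
  Month = c(1:12, 1:10),
  Year  = rep(c(1901, 1902), times = c(12, 10))
)

# Variable from the answer: each month advances the value by 0.08
data_1$Yearm <- data_1$Year + data_1$Month * 0.08

# Alternative (an assumption, not from the answer): spread the months
# evenly over the year so January sits exactly on the year value
data_1$Yearm12 <- data_1$Year + (data_1$Month - 1) / 12

# Both variables should be strictly increasing in row order
increasing_yearm   <- all(diff(data_1$Yearm) > 0)
increasing_yearm12 <- all(diff(data_1$Yearm12) > 0)
print(c(increasing_yearm, increasing_yearm12))
```

Either variable can be passed to transition_time(); the evenly spaced version just makes the frame spacing uniform.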

Plotting in ggplot2 with geom_line() with label

I'm trying to plot this dataset with ggplot2, labelling each geom_line() line with the name of its country, with Year on the x axis and the relevant data for each country on the y axis.
The DataSet to Edit
This is what I have so far. I wanted to include the name of the country in each line. The problem is that each country has its data in a separate column.
If you want to use ggplot you should bring your data into a "longer" format. Using package tidyr (the %<>% assignment pipe comes from magrittr):
library(tidyr)
library(magrittr)
df %<>%
  pivot_longer(cols = -Year,
               names_to = "Country",
               values_to = "Value")
gives you
# A tibble: 108 x 3
Year Country Value
<dbl> <chr> <dbl>
1 1995 Argentina 4122262
2 1995 Bolivia 3409890
3 1995 Brazil 36276255
4 1995 Chile 2222563
5 1995 Colombia 10279222
6 1995 Costa_Rica 1611055
7 1997 Argentina 4100563
8 1997 Bolivia 3391943
9 1997 Brazil 35718095
10 1997 Chile 2208382
Based on this it is easy to plot a line for each country using ggplot2:
ggplot(df, aes(x=Year, y=Value, color=Country)) +
geom_line()
You kind of answered your own question. You need the reshape2 package to bring all countries into a single column.
Year<-c(1991,1992,1993,1994,1995,1996)
Argentina<-c(235,531,3251,3153,13851,16513)
Mexico<-c(16503,16035,3516,3155,30351,16513)
Japan<-c(1651,868416,68165,35135,03,136816)
df<-data.frame(Year,Argentina,Mexico,Japan)
library(reshape2)
df2 <- melt(data = df, id.vars = "Year", measure.vars = c("Argentina","Mexico","Japan"))
library(ggplot2)
ggplot(df2, aes(x=Year, y=value, group=variable, color=variable))+
geom_line()
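The answers above colour the lines by country but do not place the country name on each line, which is what the question asks for. One possible sketch, reusing the sample numbers from the answer above, adds a text label at the last year of every series. `ends` is an illustrative helper name.

```r
library(ggplot2)

# Sample data from the answer above, already in long form
Year      <- c(1991, 1992, 1993, 1994, 1995, 1996)
Argentina <- c(235, 531, 3251, 3153, 13851, 16513)
Mexico    <- c(16503, 16035, 3516, 3155, 30351, 16513)
Japan     <- c(1651, 868416, 68165, 35135, 3, 136816)
df2 <- data.frame(
  Year     = rep(Year, times = 3),
  variable = rep(c("Argentina", "Mexico", "Japan"), each = length(Year)),
  value    = c(Argentina, Mexico, Japan)
)

# One row per country: the point where each line ends
ends <- df2[df2$Year == max(df2$Year), ]

p <- ggplot(df2, aes(x = Year, y = value, group = variable, color = variable)) +
  geom_line() +
  # Write the country name just to the right of each line's last point
  geom_text(data = ends, aes(label = variable), hjust = 0, nudge_x = 0.1) +
  # Leave room on the right so the labels are not clipped
  expand_limits(x = max(df2$Year) + 1) +
  theme(legend.position = "none")
p
```

Labels can overlap when two series end at similar values, as Argentina and Mexico do here, so some manual nudging may still be needed.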

Line chart issues - plot looks "funny" (ggplot2)

I have a large dataframe (CO2_df) with many years for many countries, and tried to plot a graph with ggplot2. This graph will have 6 curves + an aggregate curve. However, my graph looks pretty "funny" and I have no idea why.
The data looks like this (excerpt):
x y x1 x2 x4 x6
1553 1993 0.00000 CO2 Austria 6 6 - Other Sector
1554 2006 0.00000 CO2 Austria 6 6 - Other Sector
1555 2015 0.00000 CO2 Austria 6 6 - Other Sector
2243 1998 12.07760 CO2 Austria 5 5 - Waste management
2400 1992 11.12720 CO2 Austria 5 5 - Waste management
2401 1995 11.11040 CO2 Austria 5 5 - Waste management
2402 2006 10.26000 CO2 Austria 5 5 - Waste management
2489 1998 0.00000 CO2 Austria 6 6 - Other Sector
I have used this code:
ggplot(data=CO2_df, aes(x=x, y=y, group=x6, colour=x6)) +
  geom_line() +
  geom_point() +
  ggtitle("Austria") +
  xlab("Year") +
  ylab("CO2 Emissions") +
  labs(colour = "Sectors") +
  scale_color_brewer(palette="Dark2")
CO2_df %>%
  group_by(x) %>%
  mutate(sum.y = sum(y)) %>%
  ggplot(aes(x=x, y=y, group=x6, colour=x6)) +
  geom_line() +
  geom_point() +
  ggtitle("Austria") +
  xlab("Year") +
  ylab("CO2 Emissions") +
  labs(colour = "Sectors")+
  scale_color_brewer(palette="Dark2")+
  geom_line(aes(y = sum.y), color = "black")
My questions
1) Why does it look like this and how can I solve it?
2) I have no idea why the values on the y axis are close to zero. They are not...
3) How can I add an entry to the legend for the aggregate line?
Thank you for any sort of help!
Nordsee
What about something like this:
library(dplyr)
library(ggplot2)
CO2_df %>%                      # data
  group_by(x, x6) %>%           # group by year and sector
  summarise(y = sum(y)) %>%     # add the sum per group
  ggplot(aes(x = x, y = y)) +   # plot
  geom_line(aes(group = x6, color = x6)) +
  # here you can put a summary line, like sum, or mean, and so on
  # (fun replaces the deprecated fun.y in ggplot2 >= 3.3.0)
  stat_summary(fun = sum, na.rm = TRUE, color = 'black', geom = 'line') +
  geom_point() +
  ggtitle("Austria") +
  xlab("Year") +
  ylab("CO2 Emissions") +
  labs(colour = "Sectors") +
  scale_color_brewer(palette = "Dark2")
To make the behaviour easy to see, I modified the data so that the sectors share the same years and have clearly different values:
CO2_df <- read.table(text ="
x y x1 x2 x4 x6
1553 1993 20 CO2 'Austria' 6 '6 - Other Sector'
1554 1994 23 CO2 'Austria' 6 '6 - Other Sector'
1555 1995 43 CO2 'Austria' 6 '6 - Other Sector'
2243 1993 12.07760 CO2 'Austria' 5 '5 - Waste management'
2400 1994 11.12720 CO2 'Austria' 5 '5 - Waste management'
2401 1995 11.11040 CO2 'Austria' 5 '5 - Waste management'
2402 1996 10.26000 CO2 'Austria' 5 '5 - Waste management'
2489 1996 50 CO2 'Austria' 6 '6 - Other Sector'", header = T)
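For question 3 (a legend entry for the aggregate line), one option is to compute the per-year total as its own set of rows, label it "Total", and bind it to the sector rows so that it is drawn and listed in the legend like any other series. This is a sketch on the modified data above, not the only way to do it; `total_df` and `plot_df` are illustrative names.

```r
library(ggplot2)

# Year (x), value (y) and sector (x6) columns of the modified data above
CO2_df <- data.frame(
  x  = c(1993, 1994, 1995, 1993, 1994, 1995, 1996, 1996),
  y  = c(20, 23, 43, 12.0776, 11.1272, 11.1104, 10.26, 50),
  x6 = rep(c("6 - Other Sector", "5 - Waste management", "6 - Other Sector"),
           times = c(3, 4, 1))
)

# Per-year totals, labelled so they share the colour scale with the sectors
total_df <- aggregate(y ~ x, data = CO2_df, FUN = sum)
total_df$x6 <- "Total"

# Stack sector rows and total rows into one long data frame
plot_df <- rbind(CO2_df[, c("x", "y", "x6")], total_df[, c("x", "y", "x6")])

p <- ggplot(plot_df, aes(x = x, y = y, group = x6, colour = x6)) +
  geom_line() +
  geom_point() +
  ggtitle("Austria") +
  xlab("Year") +
  ylab("CO2 Emissions") +
  labs(colour = "Sectors")
p
```

Because "Total" is just another level of the colour variable, it gets its own legend key automatically; a manual colour scale could be added to draw it in black.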

How to drop unused factors in faceted R ggplot boxplot?

Below is some example code I use to make some boxplots:
stest <- read.table(text=" site year conc
south 2001 5.3
south 2001 4.67
south 2001 4.98
south 2002 5.76
south 2002 5.93
north 2001 4.64
north 2001 6.32
north 2003 11.5
north 2003 6.3
north 2004 9.6
north 2004 56.11
north 2004 63.55
north 2004 61.35
north 2005 67.11
north 2006 39.17
north 2006 43.51
north 2006 76.21
north 2006 158.89
north 2006 122.27
", header=TRUE)
require(ggplot2)
ggplot(stest, aes(x=year, y=conc)) +
geom_boxplot(horizontal=TRUE) +
facet_wrap(~site, ncol=1) +
coord_flip() +
scale_y_log10()
Which results in this:
I tried everything I could think of but cannot make a plot where the south facet only contains years where data is displayed (2001 and 2002). Is what I am trying to do possible?
Here is a link (DEAD) to the screenshot showing what I want to achieve:
Use the scales = "free_x" argument to facet_wrap. But I suspect you'll need to do more than that to get the plot you're looking for.
Specifically, use aes(x=factor(year), y=conc) in your initial ggplot call.
A simple way to work around the problem (with a fairly good result) is to generate the two boxplots separately and then join them using the grid.arrange function of the gridExtra package.
library(ggplot2)
library(gridExtra)
p1 <- ggplot(subset(stest, site == "north"), aes(x = factor(year), y = conc)) +
  geom_boxplot() + coord_flip() + scale_y_log10(name = "")
p2 <- ggplot(subset(stest, site == "south"), aes(x = factor(year), y = conc)) +
  geom_boxplot() + coord_flip() +
  scale_y_log10(name = "X Title", breaks = seq(4, 6, by = .5))
grid.arrange(p1, p2, ncol = 1)
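A single faceted plot may also be possible. The sketch below assumes ggplot2 3.3.0 or later, where geom_boxplot() draws horizontal boxes when the discrete variable is mapped to y, so coord_flip() is not needed. With a discrete y and scales = "free_y", each facet should list only the years that occur in it.

```r
library(ggplot2)

# Data from the question
stest <- data.frame(
  site = rep(c("south", "north"), times = c(5, 14)),
  year = c(2001, 2001, 2001, 2002, 2002,
           2001, 2001, 2003, 2003, 2004, 2004, 2004, 2004,
           2005, 2006, 2006, 2006, 2006, 2006),
  conc = c(5.3, 4.67, 4.98, 5.76, 5.93,
           4.64, 6.32, 11.5, 6.3, 9.6, 56.11, 63.55, 61.35,
           67.11, 39.17, 43.51, 76.21, 158.89, 122.27)
)

# Discrete year on y gives horizontal boxes; free_y drops unused years per facet
p <- ggplot(stest, aes(x = conc, y = factor(year))) +
  geom_boxplot() +
  facet_wrap(~ site, ncol = 1, scales = "free_y") +
  scale_x_log10() +
  ylab("year")
p
```

The concentration axis stays shared between the two panels, which keeps the sites comparable.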
