Line chart issues - plot looks "funny" (ggplot2) - r

I have a large dataframe (CO2_df) with many years for many countries, and tried to plot a graph with ggplot2. This graph will have 6 curves + an aggregate curve. However, my graph looks pretty "funny" and I have no idea why.
The data looks like this (excerpt):
x y x1 x2 x4 x6
1553 1993 0.00000 CO2 Austria 6 6 - Other Sector
1554 2006 0.00000 CO2 Austria 6 6 - Other Sector
1555 2015 0.00000 CO2 Austria 6 6 - Other Sector
2243 1998 12.07760 CO2 Austria 5 5 - Waste management
2400 1992 11.12720 CO2 Austria 5 5 - Waste management
2401 1995 11.11040 CO2 Austria 5 5 - Waste management
2402 2006 10.26000 CO2 Austria 5 5 - Waste management
2489 1998 0.00000 CO2 Austria 6 6 - Other Sector
I have used this code:
ggplot(data=CO2_df, aes(x=x, y=y, group=x6, colour=x6)) +
geom_line() +
geom_point() +
ggtitle("Austria") +
xlab("Year") +
ylab("CO2 Emissions") +
labs(colour = "Sectors") +
scale_color_brewer(palette="Dark2")
CO2_df %>%
group_by(x) %>%
mutate(sum.y = sum(y)) %>%
ggplot(aes(x=x, y=y, group=x6, colour=x6)) +
geom_line() +
geom_point() +
ggtitle("Austria") +
xlab("Year") +
ylab("CO2 Emissions") +
labs(colour = "Sectors")+
scale_color_brewer(palette="Dark2")+
geom_line(aes(y = sum.y), color = "black")
My questions
1) Why does it look like this and how can I solve it?
2) I have no idea why the values on the y-axis are close to zero; the actual values are not.
3) How can I add an entry to the legend for the aggregate line?
Thank you for any sort of help!
Nordsee

What about something like this:
library(dplyr)
library(ggplot2)

CO2_df %>%                            # data
  group_by(x, x6) %>%                 # one group per year and sector
  summarise(y = sum(y)) %>%           # total per year and sector
  ggplot(aes(x = x, y = y)) +         # plot
  geom_line(aes(group = x6, color = x6)) +
  # here you can put a summary line, like sum, or mean, and so on;
  # `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
  stat_summary(fun = sum, na.rm = TRUE, color = "black", geom = "line") +
  geom_point() +
  ggtitle("Austria") +
  xlab("Year") +
  ylab("CO2 Emissions") +
  labs(colour = "Sectors") +
  scale_color_brewer(palette = "Dark2")
With modified data: to make the behaviour visible, I've used overlapping years and very different values:
CO2_df <- read.table(text ="
x y x1 x2 x4 x6
1553 1993 20 CO2 'Austria' 6 '6 - Other Sector'
1554 1994 23 CO2 'Austria' 6 '6 - Other Sector'
1555 1995 43 CO2 'Austria' 6 '6 - Other Sector'
2243 1993 12.07760 CO2 'Austria' 5 '5 - Waste management'
2400 1994 11.12720 CO2 'Austria' 5 '5 - Waste management'
2401 1995 11.11040 CO2 'Austria' 5 '5 - Waste management'
2402 1996 10.26000 CO2 'Austria' 5 '5 - Waste management'
2489 1996 50 CO2 'Austria' 6 '6 - Other Sector'", header = T)
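For question 3 (a legend entry for the aggregate line), one possible approach is to add the total as an extra series, so that it gets its own key in the colour legend. This is only a minimal sketch, assuming the same columns x (year), y (emissions) and x6 (sector). The label "Total" and the helper names sector_totals, all_totals and plot_df are made up for illustration, and `.groups` needs dplyr >= 1.0.0:

```r
library(dplyr)
library(ggplot2)

# Same values as the modified data above, rebuilt so the sketch runs on its own.
CO2_df <- data.frame(
  x  = c(1993, 1994, 1995, 1993, 1994, 1995, 1996, 1996),
  y  = c(20, 23, 43, 12.0776, 11.1272, 11.1104, 10.26, 50),
  x6 = c(rep("6 - Other Sector", 3), rep("5 - Waste management", 4),
         "6 - Other Sector")
)

# One value per year and sector.
sector_totals <- CO2_df %>%
  mutate(x6 = as.character(x6)) %>%
  group_by(x, x6) %>%
  summarise(y = sum(y), .groups = "drop")

# One value per year across all sectors, labelled "Total" (hypothetical label).
all_totals <- sector_totals %>%
  group_by(x) %>%
  summarise(y = sum(y), .groups = "drop") %>%
  mutate(x6 = "Total")

# Stack both so the aggregate is just another level of the colour variable.
plot_df <- bind_rows(sector_totals, all_totals)

p <- ggplot(plot_df, aes(x = x, y = y, colour = x6, group = x6)) +
  geom_line() +
  geom_point() +
  labs(title = "Austria", x = "Year", y = "CO2 Emissions", colour = "Sectors")
p
```

Because the total is part of the data, the legend entry comes for free. If the total should stay black, scale_colour_manual() with one colour per level would be one way to do it.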

Related

How do I get a trendline to appear on my scatter plot in ggplot?

I'm having trouble getting a trend line to appear on a scatter plot of data structured the following way:
cohort count
<chr> <int>
1 1989 5
2 1990 7
3 1991 4
4 1992 4
5 1993 8
6 1994 7
This is the code I used to produce the plot:
ggplot(bipoc, aes(x = cohort, y = count)) +
geom_point() +
geom_smooth(method = 'lm') +
theme_classic()
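A likely cause, judging from the <chr> column type shown above, is that cohort is a character vector. ggplot2 then treats the x-axis as discrete and puts every year in its own group, so geom_smooth() has only one point per group to fit. The sketch below shows two possible fixes on the six rows from the question; the column name cohort_num is an assumption added for illustration:

```r
library(ggplot2)

# Rows copied from the question; cohort is stored as character.
bipoc <- data.frame(
  cohort = c("1989", "1990", "1991", "1992", "1993", "1994"),
  count  = c(5L, 7L, 4L, 4L, 8L, 7L),
  stringsAsFactors = FALSE
)

# Option 1: convert the year to a number so the x-axis is continuous.
bipoc$cohort_num <- as.numeric(bipoc$cohort)
p1 <- ggplot(bipoc, aes(x = cohort_num, y = count)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  theme_classic()

# Option 2: keep the discrete axis but put all points into a single group.
p2 <- ggplot(bipoc, aes(x = cohort, y = count, group = 1)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  theme_classic()

p1
```

Option 1 is usually the better fit when the years are real numbers with meaningful spacing; option 2 spaces the categories evenly regardless of their values.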

CREATE A TIME SERIES PLOT in r with ggplot

I have problems with coding of BIG DATA.
view(data)
Year Month Deaths
1998 1 200
1998 2 40
1998 3 185
1998 4 402
1998 5 20
1998 6 48
1998 7 290
1998 8 15
1998 9 252
1998 10 409
1998 11 233
1998 12 122
My data runs until 2014. I would like to create a time series plot. The x-axis should show only some of the years, in 5-year steps. The y-axis should show the deaths for every month over the whole period. How can I code that?
I am not sure this is right because I don't have your data. I took this from a programming book:
data$date = as.Date(paste(data$Year, data$Month,1), format = "%Y %m %d")
ggplot(data, aes(x = date, y = Deaths)) +
geom_line() +
ggtitle("Time series") +
xlab("Year") +
ylab("Deaths")
Update: if you want yearly breaks with monthly minor breaks, you can use
scale_x_date(date_breaks = "year", date_labels = "%Y", date_minor_breaks = "month")
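For the 5-year steps asked about in the question, the same scale accepts a wider break width. The following is only a sketch: the Deaths values are random stand-ins (not the asker's data), generated so that the example runs on its own.

```r
library(ggplot2)

# Hypothetical stand-in data: one row per month for 1998-2014, random counts.
set.seed(1)
data <- expand.grid(Month = 1:12, Year = 1998:2014)
data$Deaths <- sample(10:400, nrow(data), replace = TRUE)

# Build a Date from year and month, using the first day of each month.
data$date <- as.Date(paste(data$Year, data$Month, 1), format = "%Y %m %d")

p <- ggplot(data, aes(x = date, y = Deaths)) +
  geom_line() +
  # one labelled break every five years
  scale_x_date(date_breaks = "5 years", date_labels = "%Y") +
  labs(title = "Time series", x = "Year", y = "Deaths")
p
```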

r - Calculate % within a Sub Group using Dplyr

I want to chart the relative number of fatalities by year for each of various event types.
I can do this with facets in ggplot, but I am struggling to calculate the % by Event based on Event, Year and number of fatalities.
Event Type  Year  Fatalities  % by Event (calculated)
----------  ----  ----------  -----------------------
Storm       1980  5           12.5%
Storm       1981  9           22.5%
Storm       1982  15          37.5%
Storm       1983  11          27.5%
Ice         1980  7           70%
Ice         1981  3           30%
I have the following code to calculate it, but it is not working: the percentage is computed with a much larger denominator than expected.
fatalitiesByYearType <- stormDF %>%
group_by(eventType) %>%
mutate(totalEventFatalities = sum(FATALITIES)) %>%
group_by(year, add = TRUE) %>%
mutate(fatalitiesPct = sum(FATALITIES) / totalEventFatalities)
What am I doing wrong?
My charting code is below. I include it because I'm also interested in whether there is a way to show the data proportionally within ggplot itself.
p <- ggplot(data = fatalitiesByYearType,
aes(x=factor(year),y=fatalitiesPct))
p + geom_bar(stat="identity") +
facet_wrap(.~eventType, nrow = 5) +
labs(x = "Year",
y = "Fatalities",
title = "Fatalities by Type")
Maybe I do not get your problem, but we can start from here:
library(dplyr)
library(ggplot2)
# here the dplyr part
dats <- fatalitiesByYearType %>%
  group_by(eventType) %>%
  mutate(totalEventFatalities = sum(FATALITIES)) %>%
  # `.add = TRUE` replaces the deprecated `add = TRUE` (dplyr >= 1.0.0)
  group_by(year, .add = TRUE) %>%
  # here we add the summarise; first() keeps a single total per group
  summarise(fatalitiesPct = sum(FATALITIES) / first(totalEventFatalities))
dats
# A tibble: 6 x 3
# Groups: eventType [?]
eventType year fatalitiesPct
<fct> <int> <dbl>
1 Ice 1980 0.7
2 Ice 1981 0.3
3 Storm 1980 0.125
4 Storm 1981 0.225
5 Storm 1982 0.375
6 Storm 1983 0.275
You can of course merge everything into a single dplyr chain:
# here the ggplot2 part
p <- ggplot(dats,aes(x=factor(year),y=fatalitiesPct)) +
geom_bar(stat="identity") +
facet_wrap(.~eventType, nrow = 5) +
labs(x = "Year", y = "Fatalities", title = "Fatalities by Type") +
# here we add the % in the plot
scale_y_continuous(labels = scales::percent)
With data:
fatalitiesByYearType <- read.table(text = "eventType year FATALITIES
Storm 1980 5
Storm 1981 9
Storm 1982 15
Storm 1983 11
Ice 1980 7
Ice 1981 3 ",header = T)
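An alternative that avoids the intermediate total column is to summarise per event and year first, and then divide by the per-event sum. This is a sketch only, using the six rows above under the name stormDF from the question; the names pct_by_event and fatalities are made up here, and `.groups` needs dplyr >= 1.0.0:

```r
library(dplyr)

# Same six rows as above.
stormDF <- data.frame(
  eventType  = c("Storm", "Storm", "Storm", "Storm", "Ice", "Ice"),
  year       = c(1980, 1981, 1982, 1983, 1980, 1981),
  FATALITIES = c(5, 9, 15, 11, 7, 3)
)

pct_by_event <- stormDF %>%
  group_by(eventType, year) %>%
  # one row per event and year; keep the grouping by eventType afterwards
  summarise(fatalities = sum(FATALITIES), .groups = "drop_last") %>%
  # sum(fatalities) is now taken within each eventType
  mutate(fatalitiesPct = fatalities / sum(fatalities)) %>%
  ungroup()

pct_by_event
```

The key point is that the denominator is computed while the data is still grouped by eventType only, so each event's percentages add up to 100%.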

ggplot sub-plots with categorical and numeric in R

I have the following table and need to plot it with Week on the x-axis and percent on the y-axis. My code below plots nothing and only gives me a message. Can someone help me fix this?
Any help is appreciated.
dfx1:
Year State Cty Week ac_sum percent
1998 KS Coffey 10-1 79 6.4
1998 KS Coffey 10-3 764 62
1998 KS Coffey 10-4 951 77.2
1998 KS Coffey 10-5 1015 82.4
1998 KS Coffey 11-2 1231 100
1998 KS Crawford 10-3 79 6.1
1998 KS Crawford 10-4 764 15.8
1998 KS Crawford 10-5 951 84.1
1998 KS Crawford 11-2 1015 100
.
.
.
.
gg <- ggplot(dfx1, aes(Week,percent, col=Year))
gg <- gg + geom_line()
gg <- gg + facet_wrap(~Cty, 2, scales = "fixed")
gg <- gg + xlim(c(min(dfx1$Week), max(dfx1$Week)))
plot(gg)
geom_path: Each group consists of only one observation. Do you need to
adjust the group aesthetic?
Is this what you want?
dfx1 <- read.table(text="Year State Cty Week ac_sum percent
1998 KS Coffey 10-1 79 6.4
1998 KS Coffey 10-3 764 62
1998 KS Coffey 10-4 951 77.2
1998 KS Coffey 10-5 1015 82.4
1998 KS Coffey 11-2 1231 100
1998 KS Crawford 10-3 79 6.1
1998 KS Crawford 10-4 764 15.8
1998 KS Crawford 10-5 951 84.1
1998 KS Crawford 11-2 1015 100", header=T)
library(ggplot2)
ggplot(dfx1, aes(Week,percent, col=Year)) +
geom_point() +
facet_wrap(~Cty, 2, scales = "fixed")
ggplot(dfx1, aes(Week,percent, col=Year, group=1)) +
geom_point() + geom_line() +
facet_wrap(~Cty, 2, scales = "fixed")
You can look at other answers like this one to see that you're missing group = Year in your plot. Adding it in will give you what you are looking for:
library(ggplot2)
dfx1$Week <- factor(dfx1$Week, ordered = T)
ggplot(dfx1, aes(Week, percent, col = Year, group = Year)) +
geom_line() +
facet_wrap(~Cty, 2, scales = 'fixed')
With your last line it looks like you're wanting to only show the Weeks that actually have data. You can do that with scales = 'free', like so:
ggplot(dfx1, aes(Week, percent, col = Year, group = Year)) +
geom_line() +
facet_wrap(~Cty, 2, scales = 'free')

Time Series in R with ggplot2

I'm a ggplot2 newbie and have a rather simple question regarding time-series plots.
I have a data set in which the data is structured as follows.
Area 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007
MIDWEST 10 6 13 14 12 8 10 10 6 9
How do I generate a time series when the data is structured in this format?
With the reshape package, I could just alter the data to look like:
totmidc <- melt(totmidb, id="Area")
totmidc
Area variable value
1 MIDWEST 1998 10
2 MIDWEST 1999 6
3 MIDWEST 2000 13
4 MIDWEST 2001 14
5 MIDWEST 2002 12
6 MIDWEST 2003 8
7 MIDWEST 2004 10
8 MIDWEST 2005 10
9 MIDWEST 2006 6
10 MIDWEST 2007 9
Then run the following code to get the desired plot.
ggplot(totmidc, aes(variable, value)) + geom_line() + xlab("") + ylab("")
However, is it possible to generate a time series plot from the first
object, in which the columns represent the years?
What is the error that ggplot2 gives you? The following seems to work on my machine:
Area <- as.numeric(unlist(strsplit("1998 1999 2000 2001 2002 2003 2004 2005 2006 2007", "\\s+")))
MIDWEST <-as.numeric(unlist(strsplit("10 6 13 14 12 8 10 10 6 9", "\\s+")))
qplot(Area, MIDWEST, geom = "line") + xlab("") + ylab("")
#Or in a dataframe
df <- data.frame(Area, MIDWEST)
qplot(Area, MIDWEST, data = df, geom = "line") + xlab("") + ylab("")
You may also want to check out the ggplot2 website for details on scale_date et al.
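If the wide table is the starting point, reshaping it to long form is still the usual route with ggplot2. Here is a sketch using tidyr::pivot_longer() (tidyr >= 1.0.0) instead of reshape::melt(); that swap is my choice, not something from the question. The object name totmidb follows the question, and the column names year and value are assumptions:

```r
library(tidyr)
library(ggplot2)

# Wide table as in the question: one column per year.
totmidb <- data.frame(
  Area = "MIDWEST",
  `1998` = 10, `1999` = 6,  `2000` = 13, `2001` = 14, `2002` = 12,
  `2003` = 8,  `2004` = 10, `2005` = 10, `2006` = 6,  `2007` = 9,
  check.names = FALSE
)

# One row per Area and year; the year column names become values.
totmidc <- pivot_longer(totmidb, cols = -Area,
                        names_to = "year", values_to = "value")

# Column names are text, so convert them to numbers for a continuous axis.
totmidc$year <- as.integer(totmidc$year)

p <- ggplot(totmidc, aes(x = year, y = value)) +
  geom_line() +
  xlab("") +
  ylab("")
p
```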
I am guessing that with "time series plot" you mean you want to get a bar chart instead of a line chart?
In that case, you have to modify your code only slightly to pass the correct parameters to geom_bar(). The default stat for geom_bar() is stat_count (stat_bin in older ggplot2 versions), which calculates a frequency count of your categories on the x-scale. With your data you want to override this behaviour and use stat_identity.
library(ggplot2)
# Recreate data
totmidc <- data.frame(
Area = rep("MIDWEST", 10),
variable = 1998:2007,
value = round(runif(10)*10+1)
)
# Line plot
ggplot(totmidc, aes(variable, value)) + geom_line() + xlab("") + ylab("")
# Bar plot
# Note that the parameter stat="identity" passed to geom_bar()
ggplot(totmidc, aes(x=variable, y=value)) + geom_bar(stat="identity") + xlab("") + ylab("")
