How to get a smooth line for monthly rainfall using ggplot?

I am trying to plot the monthly rainfall data from 1986 to 2016 using ggplot. My dataframe looks like this:
head(df)
Year Month Station Rainfall Remarks
1 1986 Jan stn1 0.0 Observed
2 1986 Feb stn1 10.4 Observed
3 1986 Mar stn1 16.5 Estimated
4 1986 Apr stn1 34.0 Observed
5 1986 May stn1 27.0 Observed
6 1986 Jun stn1 159.4 Observed
str(df)
'data.frame': 1488 obs. of 5 variables:
$ Year : chr "1986" "1986" "1986" "1986" ...
$ Month : Ord.factor w/ 12 levels "Jan"<"Feb"<"Mar"<..: 1 2 3 4 5 6 7 8 9 10 ...
$ Station : Factor w/ 4 levels "stn1","stn2",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Rainfall: num 0 10.4 16.5 34 27 ...
$ Remarks : Factor w/ 2 levels "Estimated","Observed": 2 2 1 2 2 2 2 2 2 2 ...
I tried the following code:
library(ggplot2)
ggplot(df, aes(x=Year, y=Rainfall, col=Station)) + geom_line()
However, the above code produces a plot of vertical lines, whereas I want smoothly varying lines.
I want to plot all four stations (stn1 to stn4) such that the color of each line is based on df$Remarks.
Also, is it possible to have a unique color for each station?
Your help would be appreciated.

Here is one approach if you create a month-year variable:
library(ggplot2)
library(zoo)
df$Mo_Yr <- as.yearmon(paste0(df$Year, '-', df$Month), "%Y-%b")
ggplot(df, aes(x=Mo_Yr, y=Rainfall, col=Station)) +
geom_line() +
scale_x_yearmon()
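The vertical lines in the original plot appear because all twelve monthly values of a year share the same x position when x is Year alone. As a rough sketch of the same month-year idea without zoo, a plain Date column can be built in base R. The sample data below is made up, and the Date column name is just an illustrative choice:

```r
library(ggplot2)

# Hypothetical sample in the same shape as the question's df (values are random)
df <- data.frame(
  Year     = rep(c("1986", "1987"), each = 12),
  Month    = factor(rep(month.abb, 2), levels = month.abb, ordered = TRUE),
  Station  = "stn1",
  Rainfall = round(runif(24, 0, 160), 1)
)

# Build a real Date (first day of each month); match() against month.abb
# avoids locale-dependent parsing of month names
df$Date <- as.Date(sprintf("%s-%02d-01", df$Year,
                           match(as.character(df$Month), month.abb)))

# One x position per month, so the line no longer stacks vertically
p <- ggplot(df, aes(x = Date, y = Rainfall, col = Station)) +
  geom_line() +
  scale_x_date(date_labels = "%Y")
```

Printing p draws the plot; with several stations in the data, col = Station gives each station its own line and color.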
If you want to use different color points for Remarks (Observed and Estimated), for a single Station, you could try the following:
ggplot(df, aes(x=Mo_Yr, y=Rainfall)) +
geom_point(aes(col = Remarks)) +
geom_line() +
scale_x_yearmon()
If you want to plot 2 lines for Observed and Estimated, you could add col argument to geom_line as below. Note I added some example data to illustrate. Depending on what data you have available this may (or may not) be what you need.
ggplot(df, aes(x=Mo_Yr, y=Rainfall)) +
geom_line(aes(col=Remarks)) +
scale_x_yearmon()
Data (for last example)
df <- read.table(text =
"Year Month Station Rainfall Remarks
1986 Jan stn1 0.0 Observed
1986 Feb stn1 10.4 Observed
1986 Mar stn1 16.5 Estimated
1986 Apr stn1 34.0 Observed
1986 May stn1 27.0 Observed
1986 Jun stn1 159.4 Observed
1986 Jul stn1 83.1 Estimated
1986 Aug stn1 55.7 Observed
1986 Sep stn1 12.3 Estimated", header = TRUE, stringsAsFactors = TRUE)

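You might want to try adding a stat_smooth layer. Year is stored as character in your data, so convert it to numeric first; otherwise the polynomial cannot be fitted:
ggplot(df) +
geom_line(aes(y = Rainfall, x = as.numeric(Year), color = Station)) +
stat_smooth(aes(y = Rainfall, x = as.numeric(Year)), method = "lm", formula = y ~ poly(x, 10), se = FALSE)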

Related

CREATE A TIME SERIES PLOT in r with ggplot

I have problems with coding of BIG DATA.
view(data)
Year Month Deaths
1998     1    200
1998     2     40
1998     3    185
1998     4    402
1998     5     20
1998     6     48
1998     7    290
1998     8     15
1998     9    252
1998    10    409
1998    11    233
1998    12    122
My data runs until 2014. I would like to create a time series plot. The x-axis should show only some of the years, in 5-year steps. The y-axis should show the deaths for every month over the whole period. How can I code that?
I am not sure whether this is right because I didn't have any data to test it on. I took this from a programming book:
data$date = as.Date(paste(data$Year, data$Month,1), format = "%Y %m %d")
ggplot(data,
aes(
x = date,
y = Deaths,
)) +
geom_line() +
ggtitle("Time series") +
xlab("Year") +
ylab("Deaths")
Update: if you want yearly breaks with monthly minor breaks, you can use
scale_x_date(date_breaks = "year", date_labels = "%Y", date_minor_breaks = "month")
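The question asks for x-axis labels in 5-year steps. A minimal sketch of that, assuming monthly data from 1998 to 2014 (the Deaths values below are random placeholders, not real data):

```r
library(ggplot2)

# Hypothetical monthly data for 1998-2014 (Deaths values are random)
set.seed(1)
data <- expand.grid(Month = 1:12, Year = 1998:2014)
data$Deaths <- sample(10:450, nrow(data), replace = TRUE)

# First day of each month as a Date
data$date <- as.Date(sprintf("%d-%02d-01", data$Year, data$Month))

p <- ggplot(data, aes(x = date, y = Deaths)) +
  geom_line() +
  # one labelled break every 5 years, shown as a four-digit year
  scale_x_date(date_breaks = "5 years", date_labels = "%Y") +
  labs(title = "Time series", x = "Year", y = "Deaths")
```

Which years the breaks fall on is decided by the scale; if specific years are needed, the breaks argument of scale_x_date() accepts an explicit vector of dates instead.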

How to plot monthly data having in the x-axis months and Years R studio

I have a dataframe where column 1 are Months, column 2 are Years and column 3 are precipitation values.
I want to plot the precipitation values for EACH month and EACH year.
My data runs from January 1961 to February 2019.
How can I plot that?
Here is my data:
If I use this:
plot(YearAn,PPMensual,type="l",col="red",xlab="años", ylab="PP media anual")
I get this:
This is wrong because it puts all the monthly values on every single year! What I'm looking for is an x-axis that looks like "JAN-1961, FEB-1961, ... until FEB-2019".
It can be done easily using the ggplot2/tidyverse packages.
First, let's load the packages (ggplot2 is part of the tidyverse) and create some sample data:
library(tidyverse)
set.seed(123)
df <- data.frame(month = rep(c(1:12), 2),
year = rep(c("1961", "1962"),
each = 12),
ppmensual = rnorm(24, 5, 2))
Now we can plot the data (df):
df %>%
ggplot(aes(month, ppmensual,
group = year,
color = year)) +
geom_line()
Using lubridate and ggplot2 but with no grouping:
Setup
library(lubridate) # for make_date()
library(ggplot2) # for the graphic
library(dplyr) # for tibble() and %>%
df <- tibble(month = rep(month.name, 40),
year = rep(c(1961:2000), each = 12),
PP = runif(12*40) * runif(12*40) * 10) # PP data is random here
print(df, n = 20)
month year PP
<chr> <int> <dbl>
1 January 1961 5.42
2 February 1961 0.855
3 March 1961 5.89
4 April 1961 1.37
5 May 1961 0.0894
6 June 1961 2.63
7 July 1961 1.89
8 August 1961 0.148
9 September 1961 0.142
10 October 1961 3.49
11 November 1961 1.92
12 December 1961 1.51
13 January 1962 5.60
14 February 1962 1.69
15 March 1962 1.14
16 April 1962 1.81
17 May 1962 8.11
18 June 1962 0.879
19 July 1962 4.85
20 August 1962 6.96
# … with 460 more rows
Graph
df %>%
ggplot(aes(x = make_date(year, match(month, month.name)), y = PP)) + # match() maps month names to 1-12; factor(month) would order them alphabetically
geom_line() +
xlab("años")
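The question asks for axis labels of the form "JAN-1961". As a hedged sketch, once the x variable is a Date, the label format can be set with date_labels in scale_x_date(). The data below is random, and the 5-year spacing is an assumption made only to keep the labels readable:

```r
library(ggplot2)

# Hypothetical monthly series (random values), January 1961 to February 2019
set.seed(42)
dates <- seq(as.Date("1961-01-01"), as.Date("2019-02-01"), by = "month")
pp <- data.frame(date = dates, PP = runif(length(dates)) * 10)

p <- ggplot(pp, aes(x = date, y = PP)) +
  geom_line() +
  # "%b-%Y" prints the abbreviated month name and the year; labelling every
  # month over 58 years would be unreadable, so breaks are spaced 5 years apart
  scale_x_date(date_breaks = "5 years", date_labels = "%b-%Y") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
```

The month abbreviation produced by "%b" depends on the session locale, so the exact label text may differ between machines.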

Time Series Heat Map

I wish to create a single heatmap using ggplot including following factors:
year-month, mkt_name, mp_price
str(df)
'data.frame': 2655 obs. of 5 variables:
$ year : int 2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
$ yearmonthf: Factor w/ 48 levels "Jan 2012","Feb 2012",..: 1 2 3 4 5 6 7 8 9 10 ...
$ month : int 1 2 3 4 5 6 7 8 9 10 ...
$ mkt_name : Factor w/ 63 levels "Base","Birambo",..: 2 2 2 2 2 2 2 2 2 2 ...
$ mp_price : num 145 136 160 163 181 ...
df <- read.csv('/Users/shashankshekhar/Desktop/R Food Price/Foodprice/datasets/Correlation/Heatmap/Potatoesheat.csv')
df$date <- as.Date(df$date) # format date
df <- df[df$year >= 2012, ] # filter reqd years
# Create year-month (as.yearmon() comes from the zoo package)
df$yearmonth <- as.yearmon(df$date)
df$yearmonthf <- factor(df$yearmonth)
df <- df[, c("year", "yearmonthf", "month", "mkt_name", "mp_price")]
head(df)
year yearmonthf month mkt_name mp_price
15 2012 Jan 2012 1 Birambo 145.00
16 2012 Feb 2012 2 Birambo 136.25
17 2012 Mar 2012 3 Birambo 160.00
18 2012 Apr 2012 4 Birambo 162.75
19 2012 May 2012 5 Birambo 181.00
20 2012 Jun 2012 6 Birambo 170.00
Heatmap ggplot
ggplot(df, aes(mkt_name, year, fill = mp_price)) +
geom_tile(colour = "white") +
facet_grid(year~mkt_name) +
scale_fill_gradient(low="red", high="green") +
labs(x="Week of Month",
y="",
title = "Time-Series Calendar Heatmap",
subtitle="Potato Price",
fill="Price")
I think you mis-attributed your variables. Instead try:
ggplot(df, aes(x = yearmonth, y = mkt_name, fill = mp_price))+
geom_tile()+
scale_x_date(date_labels = "%b %Y")+
scale_fill_gradient(low="red", high="green") +
labs(x="Month",
y="",
title = "Time-Series Calendar Heatmap",
subtitle="Potato Price",
fill="Price")
Does this answer your question?
Reproducible example:
df <- data.frame(mkt_name = rep("Birambo",6),
year = 2012,
yearmonth = seq(as.Date("2012-01-01", format = "%Y-%m-%d"), as.Date("2012-06-01", format = "%Y-%m-%d"), by = "month"),
month = 1:6,
mp_price = c(145,136.25,160,162.75,181,170))

How to calculate the average year

I have a 20-year monthly XTS time series
Jan 1990 12.3
Feb 1990 45.6
Mar 1990 78.9
..
Jan 1991 34.5
..
Dec 2009 89.0
I would like to get the average (12-month) year, or
Jan xx
Feb yy
...
Dec kk
where xx is the average of every January, yy of every February, and so on.
I have tried apply.yearly and lapply but these return 1 value, which is the 20-year total average
Would you have any suggestions? I appreciate it.
The lubridate package could be useful for you. I would use the functions year() and month() in conjunction with aggregate():
library(xts)
library(lubridate)
#set up some sample data
dates = seq(as.Date('2000/01/01'), as.Date('2005/01/01'), by="month")
df = data.frame(rand1 = runif(length(dates)), rand2 = runif(length(dates)))
my_xts = xts(df, dates)
#get the mean by year
aggregate(my_xts$rand1, by=year(index(my_xts)), FUN=mean)
This outputs something like:
2000 0.5947939
2001 0.4968154
2002 0.4941752
2003 0.5291211
2004 0.6631564
To find the mean for each month you can do:
#get the mean by month
aggregate(my_xts$rand1, by=month(index(my_xts)), FUN=mean)
which will output something like
1 0.5560279
2 0.6352220
3 0.3308571
4 0.6709439
5 0.6698147
6 0.7483192
7 0.5147294
8 0.3724472
9 0.3266859
10 0.5331233
11 0.5490693
12 0.4642588
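For comparison, the same per-month averaging can be sketched in base R without xts or lubridate. This assumes the dates and values are available as plain vectors; the series below is random and only mimics the 1990 to 2009 range from the question:

```r
# Hypothetical 20-year monthly series (random values), Jan 1990 to Dec 2009
set.seed(7)
dates  <- seq(as.Date("1990-01-01"), as.Date("2009-12-01"), by = "month")
values <- runif(length(dates), 0, 100)

# Group by month number (1-12) and average across all years
month_num <- as.integer(format(dates, "%m"))
avg_year  <- tapply(values, month_num, mean)

# Label the result Jan..Dec; month.abb is a base R constant
names(avg_year) <- month.abb[as.integer(names(avg_year))]
avg_year
```

The result has twelve entries, one per calendar month, each being the mean of that month over all twenty years.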

Height of tile with discrete values in ggplot2

I am trying to make a heat map for one year (2014) for about 180 countries, where the fill is GHG emissions. The y-axis is supposed to show the countries and the x-axis the year. This is the code I am using:
g<-ggplot(data, aes(x = Year, y = Country, fill = `GHG_Emissions_Per_Capita_w_LULUCF_(tCO2e)`)) +
geom_tile() + scale_fill_gradient2(low = colors[1],
mid = colors[paletteSize/2],
high = colors[paletteSize],
midpoint = (max(data$`GHG_Emissions_Per_Capita_w_LULUCF_(tCO2e)`)+min(data$`GHG_Emissions_Per_Capita_w_LULUCF_(tCO2e)`)) / 2,
name = "Total GHG w LULUCF")
I am facing 2 problems,
1. The names on the y axis overlap and I cannot seem to make the y axis longer or the tiles higher to accommodate the name
2. The x axis is supposed to be one discrete element "2014" which gets split into a continuous variable.
Here is the output of str(data)
'data.frame': 191 obs. of 4 variables:
$ Country : chr "Afghanistan" "Albania" "Algeria" "Andorra" ...
$ Year : int 2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
$ GHG_Emissions_Per_Capita_wo_LULUCF_(tCO2e): num 1.02 3.13 5.17 6.54 5.86 ...
$ GHG_Emissions_Per_Capita_w_LULUCF_(tCO2e) : num 1.02 3 5.16 6.27 9.36 ...
Here is some sample data.
Country Year GHG_Emissions_Per_Capita_wo_LULUCF_(tCO2e) GHG_Emissions_Per_Capita_w_LULUCF_(tCO2e)
1 Afghanistan 2014 1.0185643 1.0185643
2 Albania 2014 3.1277710 3.0039601
3 Algeria 2014 5.1667095 5.1564317
4 Andorra 2014 6.5401349 6.2655871
5 Angola 2014 5.8623365 9.3643598
6 Antigua & Barbuda 2014 11.4753630 11.5420209
7 Argentina 2014 8.1115625 10.3127052
8 Armenia 2014 2.9682122 2.9177457
9 Australia 2014 25.1371295 22.3016710
10 Austria 2014 8.7978737 8.2251670
11 Azerbaijan 2014 7.6137557 6.7257221
12 Bahamas, The 2014 7.1029869 8.1139307
This is a screenshot of the plot I am trying to make
Thanks in advance for your help.
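One possible direction, offered as a sketch rather than a confirmed fix: treating Year as a factor makes the x-axis discrete, and a smaller label size plus a taller output image gives the country names more room. The five rows below are rounded from the sample data above; the short column name GHG, the file name, and the sizes are placeholders:

```r
library(ggplot2)

# Subset in the shape of the question's data; the long emissions column is
# renamed to GHG here only to keep the sketch short
data <- data.frame(
  Country = c("Afghanistan", "Albania", "Algeria", "Andorra", "Angola"),
  Year    = 2014L,
  GHG     = c(1.02, 3.00, 5.16, 6.27, 9.36)
)

# factor(Year) turns the single year into one discrete x position
p <- ggplot(data, aes(x = factor(Year), y = Country, fill = GHG)) +
  geom_tile() +
  labs(x = "Year", fill = "Total GHG w LULUCF") +
  theme(axis.text.y = element_text(size = 6))

# Tile height follows the size of the output device, so a taller image gives
# each country more vertical space (file name and sizes are placeholders):
# ggsave("ghg_heatmap.png", p, width = 4, height = 20)
```

With roughly 190 countries the height passed to ggsave() would need tuning by eye.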
