plot data with different dates - r

I have some trouble plotting my dataset.
This is an extract of my dataset.
Date Month Year Value 1
30/05/96 May 1996 1835
06/12/96 December 1996 1770
18/03/97 March 1997 1640
27/06/97 June 1997 1379
30/09/97 September 1997 1195
24/11/97 November 1997 1335
13/03/98 March 1998 1790
07/05/98 May 1998 349
14/07/98 July 1998 1179
27/10/98 October 1998 665
What I would like is a plot of Value 1 (y) against the month (x) for every year. In other words, a plot with 3 lines that show how Value 1 varies from month to month in the different years.
I do the following:
plot(x[Year==1996,4], xaxt="n")
par(new=T)
plot(x[Year==1997,4], xaxt="n")
axis(1, at=1:length(x$Month), labels=x$Month)
The problem is that the first value of 1996 refers to May, and the first value of 1997 refers to March. Because of that, the plotted values are mixed up and no longer correspond to their months.
Is there a way to plot all these values in the same graph keeping the original correspondence of the data?

df <- read.table(text="Date Month Year Value1
30/05/96 May 1996 1835
06/12/96 December 1996 1770
18/03/97 March 1997 1640
27/06/97 June 1997 1379
30/09/97 September 1997 1195
24/11/97 November 1997 1335
13/03/98 March 1998 1790
07/05/98 May 1998 349
14/07/98 July 1998 1179
27/10/98 October 1998 665", header=T, as.is=T)
df$Month <- factor(df$Month, levels=month.name, ordered=T)
library(ggplot2)
ggplot(df) + geom_line(aes(Month, Value1, group=Year)) +
facet_grid(Year~.)

And a lattice alternative using @Michele's df. I show both alternatives here (with and without faceting).
library(lattice)
library(gridExtra)
p1 <- xyplot(Value1~Month,groups=Year,data=df,
type=c('p','l'),auto.key=list(columns=3,lines=TRUE))
p2 <- xyplot(Value1~Month|Year,groups=Year,data=df,layout= c(1,3),
type=c('p','l'),auto.key=list(columns=3,lines=TRUE))
grid.arrange(p1,p2)

Create a numeric value for your months:
x$MonthNum <- sapply(x$Month, function(x) which(x==month.name))
Then plot using those numeric values, but label the axes with words.
plot(NA, xaxt="n", xlab="Month", ylab="Value1", xlim=c(0,13),
ylim=c(.96*min(x$Value1),1.04*max(x$Value1)), type="l")
z <- sapply(1996:1998, function(y) with(x[x$Year==y,], lines(MonthNum, Value1)))
axis(1, at=1:12, labels=month.name)
And some labels, if you want to identify years:
xlabpos <- tapply(x$MonthNum, x$Year, max)
ylabpos <- mapply(function(mon, year) x$Value1[x$MonthNum==mon & x$Year==year],
xlabpos, dimnames(xlabpos)[[1]])
text(x=xlabpos+.5, y=ylabpos, labels=dimnames(xlabpos)[[1]])
One could also obtain something similar to the ggplot example using layout:
par(mar=c(2,4,1,1))
layout(matrix(1:3))
z <- sapply(1996:1998, function(y) {
with(x[x$Year==y,], plot(Value1 ~ MonthNum, xaxt="n", xlab="Month", ylab=y,
xlim=c(0,13), ylim=c(.96*min(x$Value1),1.04*max(x$Value1)), type="l"))
axis(1, at=1:12, labels=month.name)
})

Related

How to plot monthly data having in the x-axis months and Years R studio

I have a dataframe where column 1 are Months, column 2 are Years and column 3 are precipitation values.
I want to plot the precipitation values for EACH month and EACH year.
My data goes from January 1961 to February 2019.
How can I plot that?
Here is my data:
If I use this:
plot(YearAn,PPMensual,type="l",col="red",xlab="años", ylab="PP media anual")
I get this:
Which is wrong because it puts all the monthly values in every single year! What I'm looking for is an x axis that looks like "JAN-1961, FEB-1961, ... until FEB-2019".
It can be done easily using the ggplot2/tidyverse packages.
First, let's load the packages (ggplot2 is part of tidyverse) and create some sample data:
library(tidyverse)
set.seed(123)
df <- data.frame(month = rep(c(1:12), 2),
year = rep(c("1961", "1962"),
each = 12),
ppmensual = rnorm(24, 5, 2))
Now we can plot the data (df):
df %>%
ggplot(aes(month, ppmensual,
group = year,
color = year)) +
geom_line()
Using lubridate and ggplot2 but with no grouping:
Setup
library(lubridate) # for make_date()
library(ggplot2) # for the graph
library(dplyr) # for tibble() and %>%
df <- tibble(month = rep(month.name, 40),
year = rep(c(1961:2000), each = 12),
PP = runif(12*40) * runif(12*40) * 10) # PP data is random here
print(df, n = 20)
month year PP
<chr> <int> <dbl>
1 January 1961 5.42
2 February 1961 0.855
3 March 1961 5.89
4 April 1961 1.37
5 May 1961 0.0894
6 June 1961 2.63
7 July 1961 1.89
8 August 1961 0.148
9 September 1961 0.142
10 October 1961 3.49
11 November 1961 1.92
12 December 1961 1.51
13 January 1962 5.60
14 February 1962 1.69
15 March 1962 1.14
16 April 1962 1.81
17 May 1962 8.11
18 June 1962 0.879
19 July 1962 4.85
20 August 1962 6.96
# … with 460 more rows
Graph
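# match() turns each month name into its number (1-12);
# factor(month) would number the months alphabetically instead
df %>%
ggplot(aes(x = make_date(year, match(month, month.name)), y = PP)) +
geom_line() +
xlab("años")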
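The asker wanted tick labels of the form "JAN-1961". Here is a minimal base R sketch of one way to get them, assuming a small simulated data frame (the column names month, year and PP mirror the example above, and the values are random placeholders). It builds one Date per row, plots against that, and formats the ticks with format(). Note that "%b" gives month abbreviations in the current locale.

```r
# Simulated stand-in for the monthly precipitation data (values are random)
set.seed(123)
df <- data.frame(month = rep(month.name, 3),
                 year  = rep(1961:1963, each = 12),
                 PP    = runif(36) * 10)

# One Date per row: the first day of each month
df$date <- as.Date(sprintf("%d-%02d-01", df$year, match(df$month, month.name)))

# Draw the series without the default x axis, then label every 6th month
plot(df$date, df$PP, type = "l", xaxt = "n", xlab = "", ylab = "PP")
ticks <- seq(min(df$date), max(df$date), by = "6 months")
axis(1, at = ticks, labels = toupper(format(ticks, "%b-%Y")),
     las = 2, cex.axis = 0.7)
```

For the full 1961-2019 range, a wider tick spacing (for example by = "5 years") would probably keep the labels readable.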

Only out-of-sample forecast plot using auto.arima and xreg

This is my first post, so sorry if this is clunky or not formatted well.
period texas u3 national u3
1976 5.758333333 7.716666667
1977 5.333333333 7.066666667
1978 4.825 6.066666667
1979 4.308333333 5.833333333
1980 5.141666667 7.141666667
1981 5.291666667 7.6
1982 6.875 9.708333333
1983 7.916666667 9.616666667
1984 6.125 7.525
1985 7.033333333 7.191666667
1986 8.75 6.991666667
1987 8.441666667 6.191666667
1988 7.358333333 5.491666667
1989 6.658333333 5.266666667
1990 6.333333333 5.616666667
1991 6.908333333 6.816666667
1992 7.633333333 7.508333333
1993 7.158333333 6.9
1994 6.491666667 6.083333333
1995 6.066666667 5.608333333
1996 5.708333333 5.416666667
1997 5.308333333 4.95
1998 4.883333333 4.508333333
1999 4.666666667 4.216666667
2000 4.291666667 3.991666667
2001 4.941666667 4.733333333
2002 6.341666667 5.775
2003 6.683333333 5.991666667
2004 5.941666667 5.533333333
2005 5.408333333 5.066666667
2006 4.891666667 4.616666667
2007 4.291666667 4.616666667
2008 4.808333333 5.775
2009 7.558333333 9.266666667
2010 8.15 9.616666667
2011 7.758333333 8.95
2012 6.725 8.066666667
2013 6.283333333 7.375
2014 5.1 6.166666667
2015 4.45 5.291666667
2016 4.633333333 4.866666667
2017 4.258333333 4.35
2018 3.858333333 3.9
2019 ____ 3.5114
2020 ____ 3.477
2021 ____ 3.7921
2022 ____ 4.0433
2023 ____ 4.1339
2024 ____ 4.2269
2025 ____ 4.2738
How can one use auto.arima in R with an external regressor to make a forecast but only plot the out-of-sample values? I believe the forecast values are correct, but the years do not match up correctly. I have annual data from 1976-2018 and I want to forecast the dependent variable (column 2) through 2025, but it plots the "forecast" for the time period 2019-2068. Weirdly enough, the figures match up well with the sample data: the "forecast" for 2019 seems to be the model prediction for 1980 and so on, all the way through 2068 matching 2025.
I would like to be able to eliminate that and have it so "2062-2068" results are instead 2019-2025. I'll try and include a picture of the plot so it might be easier to visualize my plight.
Below is the R script:
#Download the CSV file: years in the first column, the dependent variable in the second, xreg in the third. All columns have headers.
library(forecast)
library(DataCombine)
library(tseries)
library(MASS)
library(TSA)
ts(TXB102[,2], frequency = 1, start = c(1976, 1),end = c(2018, 1)) -> TXB102ts
ts(TXB102[,3], frequency = 1, start = c(1976, 1), end = c(2018,1)) -> TXB102xregtest
ts(TXB102[,3], frequency = 1, start = c(1976, 1), end = c(2025,1)) -> TXB102xreg
as.vector(t(TXB102ts)) -> y
as.vector(t(TXB102xregtest)) -> xregtest
as.vector(t(TXB102xreg)) -> xreg
y <- ts(y,frequency = 1, start = c(1976,1),end = c(2018,1))
xregtest <- ts(xregtest, frequency = 1, start = c(1976,1), end=c(2018,1))
xreg <- ts(xreg, frequency = 1, start = c(1976,1), end=c(2025,1))
summary(y)
plot(y)
ndiffs(y)
ARIMA <- auto.arima(y, trace = TRUE, stepwise = FALSE, approximation = FALSE, xreg=xregtest)
ARIMA
forecast(ARIMA,xreg=xreg)
plot(forecast(ARIMA,xreg=xreg))
The following is a plot of what I get after running the script.
TL;DR: How do I get the real out-of-sample forecast to plot for 2019-2025, as opposed to the in-sample model fit it is passing along as 2019-2068?
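One likely cause, offered as a hedged sketch rather than a confirmed diagnosis: the regressor passed at forecast time covers all 50 years (1976-2025), so 50 forecasts are produced and labelled 2019-2068. The sketch below uses base R's arima() and predict() with a fixed order as a stand-in for auto.arima(), on simulated data (all object names and values here are made up for illustration). The point it shows is that only the future regressor values (2019-2025) should be supplied when predicting.

```r
# Simulated stand-in data: 50 years of a regressor, 43 years of the response
set.seed(42)
xreg_all <- ts(5 + cumsum(rnorm(50, sd = 0.4)), start = 1976)   # 1976-2025
y <- ts(1 + 0.8 * window(xreg_all, end = 2018) + rnorm(43, sd = 0.3),
        start = 1976)                                            # 1976-2018

xreg_fit    <- window(xreg_all, end = 2018)    # regressor over the fitting period
xreg_future <- window(xreg_all, start = 2019)  # regressor for 2019-2025 only

# Base R ARIMA with a fixed order (a stand-in for auto.arima's model search)
fit <- arima(y, order = c(1, 0, 0), xreg = xreg_fit, method = "ML")

# Supply only the 7 future regressor values; the predictions start in 2019
pred <- predict(fit, n.ahead = length(xreg_future), newxreg = xreg_future)
pred$pred
```

With the forecast package, forecast(ARIMA, xreg = ...) likewise takes its horizon from the number of rows in xreg, so passing only the 2019-2025 values of the regressor there should give seven forecasts for 2019-2025.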

precipitation histogram and line graph

I'm trying to make a monthly precipitation histogram in R using ggplot2. This is the data. Ideally, column 1 would be the x axis, column 2 would be the histogram, and column 3 would be a line graph.
Month MonthlyPrecipitation 30YearNormalPrecipitation
January 49.75 67.1
February 8.75 53.6
March 27 64.2
April 55.5 77.7
May 62.25 89.2
June 171.75 84.7
July 50.75 83.6
August 37.25 77.6
September 75.75 92.6
October 99.25 86.3
November 37.25 90.7
December 43.25 78.9
I've created fake data to recreate your example. Months a-l represent January through December. Here you first create a bar plot (I'm assuming that's what you meant by histogram; a true histogram of your data would be flat, given that each item in your first variable is unique), then add the line graph. You have to include group = 1 at the end: the x axis is discrete, so without it each month forms its own group and no line is drawn. And in case it wasn't clear, MP is my recreation of MonthlyPrecipitation, and NP is 30YearNormalPrecipitation.
library(ggplot2)
set.seed(100)
Month <- c(letters[1:12])
MP <- rnorm(12, 50, 5)
NP <- rnorm(12, 80, 5)
df <- data.frame(Month, MP, NP)
ggplot(df, aes(x=Month, y = MP)) + geom_bar(stat = 'identity', alpha = 0.75) +
geom_line(aes(y = NP), colour="blue", group = 1)
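The same bars-plus-line idea can be sketched in base R with the actual numbers from the question. This is only an alternative illustration, not the answer's method; the data frame name precip and the short column names MP and Normal are chosen here for convenience.

```r
# Data copied from the question
precip <- data.frame(
  Month  = month.name,
  MP     = c(49.75, 8.75, 27, 55.5, 62.25, 171.75,
             50.75, 37.25, 75.75, 99.25, 37.25, 43.25),
  Normal = c(67.1, 53.6, 64.2, 77.7, 89.2, 84.7,
             83.6, 77.6, 92.6, 86.3, 90.7, 78.9)
)

# barplot() returns the bar midpoints, which serve as x positions for the line
mids <- barplot(precip$MP, names.arg = substr(precip$Month, 1, 3),
                ylim = c(0, max(precip$MP, precip$Normal) * 1.05),
                ylab = "Precipitation")
lines(mids, precip$Normal, type = "b", col = "blue")
```

In the ggplot2 version, converting the real month names with factor(Month, levels = month.name) keeps the bars in calendar order rather than alphabetical order.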

R image function - error with color/value scaling

I have a question about the R function image in the biwavelet package. This function is used to plot correlation values as the z in the image function. I'm plotting perfectly correlated data sets (R-squared=1 as my z-value). When the image is plotted there is some variation in the colored boxes. I'm confused by this because all the values in my z matrix are 1. Any ideas why this could be happening, and how to fix it?
image(x$t, yvals, t(zvals), zlim = zlim, xlim = xlim, ylim = rev(ylim),
xlab = "Time", ylab = "Period", yaxt = "n", xaxt = "n", col = fill.colors)
zvals is a matrix of all 1s
UPDATE: based on request below
x$t
[1] 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000
[29] 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012
yvals
[1] 1.046901 1.130235 1.213568 1.296901 1.380235 1.463568 1.546901 1.630235 1.713568 1.796901 1.880235 1.963568 2.046901 2.130235 2.213568
[16] 2.296901 2.380235 2.463568 2.546901 2.630235 2.713568 2.796901 2.880235 2.963568 3.046901 3.130235 3.213568 3.296901 3.380235 3.463568
[31] 3.546901 3.630235 3.713568 3.796901
str(zvals)
num [1:34, 1:40] 1 1 1 1 1 1 1 1 1 1 ...
Above are the actual data. Below is an example that plots as expected, whereas my data do not:
a <- 1980:2010
y <- sort( rnorm(30,2,.5) )
z <- matrix(nrow=30,ncol=31,1)
image(a, y, t(z), zlim = range(z), xlim = range(a), ylim = rev(range(y)), col = heat.colors(n=20))
It turns out the biwavelet defaults for zlim were causing the issue: the color scale spanned a very narrow range (e.g. 0.99-1), so tiny differences between values were mapped to different colors. I assume the z values were displayed with less precision than R stores, which would account for the tiny discrepancies between them. The solution was to impose a wider zlim (e.g. 0-1), which forced the whole matrix into a single color.
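The effect can be reproduced with a small sketch. The matrix below is made up for illustration: its entries differ from 1 by less than 1e-10, so they all print as 1. The helper colour_index is hypothetical and only approximates how an evenly spaced color scale assigns colors; it is not the code image() uses internally.

```r
# Values that all print as 1 but are not exactly equal
set.seed(1)
z <- matrix(1 - runif(12, 0, 1e-10), nrow = 3)
print(z)

cols <- heat.colors(20)

# Hypothetical helper: bin each value into one of n equal-width colour classes
colour_index <- function(z, zlim, n) {
  findInterval(z, seq(zlim[1], zlim[2], length.out = n + 1), all.inside = TRUE)
}

narrow <- colour_index(z, range(z), length(cols))  # zlim hugging the data
wide   <- colour_index(z, c(0, 1), length(cols))   # zlim set by hand

length(unique(narrow))  # several colour classes
length(unique(wide))    # a single colour class

# With the wide zlim the image is drawn in one colour
image(z, zlim = c(0, 1), col = cols)
```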

Ordering a Data Frame By 2 Parameters, Then Plotting

I have a data frame with GDP values for 12 South American countries over ~40 years. A snippet of the frame is as follows:
168 Chile 1244.1799 1972
169 Chile 4076.3207 1994
170 Chile 3474.7172 1992
171 Chile 2928.1562 1991
172 Chile 6143.7276 2004
173 Colombia 882.5687 1976
174 Colombia 1094.8795 1977
175 Colombia 5403.4557 2008
176 Colombia 2376.8022 2002
177 Colombia 2047.9784 1993
1) I want to order the data frame by country. The first ~40 values should pertain to Argentina, then next ~40 to Bolivia, etc.
2) Within each country grouping, I want to order by year. The first 3 rows should pertain to Argentina 2012, Argentina 2011, Argentina 2010, etc.
I can grab the data for each country individually using subset(), and then order it with order(). Surely I don't have to do this for every country and then use rbind()? How do I do it in one fell swoop?
3) Once I have the final product, I'd like to create 12 small, individual line graphs stacked vertically, each pertaining to a different country, showing the trend of that country's GDP over the ~40 years. How do I create such a plot?
I'm sure I could find info on the 3rd question myself, but, well, I don't even know what such a graph is called in the first place.
Here is a solution with ggplot2. Assuming your data is in df:
library(ggplot2)
df$year.as.date <- as.Date(paste0(df$year, "-01-01")) # convert year to date
ggplot(df, aes(x=year.as.date, y=gdp)) +
geom_line() + facet_grid(country ~ .)
You don't actually need to sort by year and country, ggplot will handle that for you. Here is the data (clearly, only using 5 countries and 12 years, but this will work for your data). Also, I show you how to sort by two columns on the third line:
countries <- c("ARG", "BRA", "CHI", "PER", "URU")
df <- data.frame(country=rep(countries, 12), year=rep(2001:2012, each=5), gdp=runif(60))
df <- df[order(df$country, df$year),] # <- we sort here
df$gdp <- df$gdp + 1:12 / 2
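The question asked for the years in descending order within each country (2012, 2011, 2010, ...), whereas order(df$country, df$year) sorts them ascending. A minimal sketch of the descending variant, on a small simulated data frame (gdp_df and its random values are placeholders):

```r
# Small simulated data frame (random GDP values, three countries, four years)
set.seed(1)
gdp_df <- data.frame(country = rep(c("Chile", "Argentina", "Bolivia"), each = 4),
                     year    = rep(c(2010, 2012, 2009, 2011), times = 3),
                     gdp     = runif(12, 1000, 6000))

# Country ascending, then year descending (negate the numeric year)
sorted <- gdp_df[order(gdp_df$country, -gdp_df$year), ]
head(sorted, 4)
```

Negating works because year is numeric. For a non-numeric key, order() also accepts a per-column decreasing vector when method = "radix" is used.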

Resources