Boxplot not plotting all data - r

I'm trying to plot a boxplot for a time series (e.g. http://www.r-graph-gallery.com/146-boxplot-for-time-series/) and can get every other example to work, bar my last one. I have averages per month for six years (2011 to 2016) and have data for 2014 and 2015 (albeit in small quantities), but for some reason, boxes aren't being shown for the 2014 and 2015 data.
My input data has three columns: year, month and residency index (a value between 0 and 1). There are multiple individuals (in this example, 37) each with an average residency index per month per year (including 2014 and 2015).
For example:
year month RI
2015 1 NA
2015 2 NA
2015 3 NA
2015 4 NA
2015 5 NA
2015 6 NA
2015 7 0.387096774
2015 8 0.580645161
2015 9 0.3
2015 10 0.225806452
2015 11 0.3
2015 12 0.161290323
2016 1 0.096774194
2016 2 0.103448276
2016 3 0.161290323
2016 4 0.366666667
2016 5 0.258064516
2016 6 0.266666667
2016 7 0.387096774
2016 8 0.129032258
2016 9 0.133333333
2016 10 0.032258065
2016 11 0.133333333
2016 12 0.129032258
which is repeated for each individual fish.
My code:
#make boxplot
boxplot(RI$RI~RI$month+RI$year,
xaxt="n",xlab="",col=my_colours,pch=20,cex=0.3,ylab="Residency Index (RI)", ylim=c(0,1))
abline(v=seq(0,12*6,12)+0.5,col="grey")
axis(1,labels=unique(RI$year),at=seq(6,12*6,12))
The average trend line works as per the other examples.
a=aggregate(RI$RI,by=list(RI$month,RI$year),mean, na.rm=TRUE)
lines(a[,3],type="l",col="red",lwd=2)
Any help on this matter would be greatly appreciated.

Your problem seems to be the missing values (NA) in your data; the other values are plotted correctly. I've simplified your code a bit:
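# One box per month/year combination
boxplot(RI$RI ~ RI$month + RI$year,
        ylab="Residency Index (RI)")
# Monthly means; the formula interface drops rows with NA
a <- aggregate(RI ~ month + year, data = RI, FUN = mean, na.rm = TRUE)
# Pad with NA for the 6 leading months without data so the line lines up with the boxes
lines(c(rep(NA, 6), a[,3]), type="l", col="red", lwd=2)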
Also, if you really only have one value per year/month, a boxplot is probably not the best way to depict your data, since a box needs several values to summarize. A simple scatter plot may work better.
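If the goal is to keep an empty slot for every month without data (so the grey year separators and year labels stay aligned) instead of padding the mean line by hand, one option is to give month and year explicit factor levels. The sketch below is a minimal illustration under assumptions: it uses simulated data for three hypothetical fish over 2015-2016 (not your data), with January-June 2015 set to NA to mimic your example, and the column names month_f and year_f are made up for the illustration. The formula method of boxplot() keeps empty factor combinations by default (drop = FALSE), and tapply() returns NaN for months that are all NA, so the red line matches the box positions without c(rep(NA, 6), ...).

```r
set.seed(1)

# Hypothetical example data: 3 fish x 12 months x 2 years, RI in [0, 1]
RI <- expand.grid(fish = 1:3, month = 1:12, year = 2015:2016)
RI$RI <- runif(nrow(RI))
# Mimic the question: no data for January-June 2015
RI$RI[RI$year == 2015 & RI$month <= 6] <- NA

# Explicit levels keep one slot per month/year, even when every value is NA
RI$month_f <- factor(RI$month, levels = 1:12)
RI$year_f  <- factor(RI$year, levels = 2015:2016)
n_years <- nlevels(RI$year_f)

b <- boxplot(RI ~ month_f + year_f, data = RI,
             xaxt = "n", xlab = "", ylab = "Residency Index (RI)", ylim = c(0, 1))
abline(v = seq(0, 12 * n_years, 12) + 0.5, col = "grey")
axis(1, labels = levels(RI$year_f), at = seq(6.5, 12 * n_years, 12))

# Monthly means as a 12 x n_years matrix (NaN where a month is all NA);
# column-major order matches the box order (month varies fastest)
means <- tapply(RI$RI, list(RI$month_f, RI$year_f), mean, na.rm = TRUE)
lines(seq_along(means), as.vector(means), col = "red", lwd = 2)
```

NaN values simply leave a gap in the line, so the same code should work when the missing months are in the middle of the series.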

Related

Aggregating based on previous year and this year

I have this data set:
month Year Rain
10 2010 376.8
11 2010 282.78
12 2010 324.58
1 2011 73.51
2 2011 225.89
3 2011 22.96
I used:
df2prnext <- aggregate(Rain ~ Year, data = subdataprnext, mean)
but I need the mean value of 217.53, and I am not getting the expected result. Thank you for your help.
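One way to get a single mean that spans October of one year through the following months is to aggregate on a season variable instead of the calendar Year. The sketch below assumes a season runs from October to September and is labelled by the year in which it ends (Season is a hypothetical column name); adjust the cut-off month to your own definition. For the six rows shown above it yields one group, Season 2011, with a mean of about 217.75.

```r
# The six rows from the question
subdataprnext <- data.frame(
  month = c(10, 11, 12, 1, 2, 3),
  Year  = c(2010, 2010, 2010, 2011, 2011, 2011),
  Rain  = c(376.8, 282.78, 324.58, 73.51, 225.89, 22.96)
)

# Assumption: October-December belong to the season ending the following year,
# so Oct-Dec 2010 and Jan-Sep 2011 are both "Season 2011"
subdataprnext$Season <- ifelse(subdataprnext$month >= 10,
                               subdataprnext$Year + 1,
                               subdataprnext$Year)

# Mean rainfall per season instead of per calendar year
df2prnext <- aggregate(Rain ~ Season, data = subdataprnext, FUN = mean)
df2prnext
```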

Modeling a repeated measures logistic growth curve

I have cumulative population totals for the end of each month for two years (2016 and 2017). I would like to combine these two years, treat each month's cumulative total as a repeated measure (one for each year), and fit a non-linear growth model to these data. The goal is to determine whether our current 2018 cumulative monthly totals are on track to meet our higher 2018 year-end population goal, by increasing the model's asymptote to our 2018 year-end goal. Ideally, I would like to integrate a confidence interval into the model that reflects the variability between the two years at each month.
My columns in my data.frame are as follows:
- Year is year
- Month is month
- Time is the month's number (1-12)
- Total is the month-end cumulative population total
- Norm is the proportion of year-end total for that month
- log is the natural log of Total
Year Month Total Time Norm log
1 2016 January 3919 1 0.2601567 8.273592
2 2016 February 5887 2 0.3907993 8.680502
3 2016 March 7663 3 0.5086962 8.944159
4 2016 April 8964 4 0.5950611 9.100972
5 2016 May 10014 5 0.6647637 9.211739
6 2016 June 10983 6 0.7290892 9.304104
7 2016 July 11775 7 0.7816649 9.373734
8 2016 August 12639 8 0.8390202 9.444543
9 2016 September 13327 9 0.8846920 9.497547
10 2016 October 13981 10 0.9281067 9.545455
11 2016 November 14533 11 0.9647504 9.584177
12 2016 December 15064 12 1.0000000 9.620063
13 2017 January 3203 1 0.2163458 8.071843
14 2017 February 5192 2 0.3506923 8.554874
15 2017 March 6866 3 0.4637622 8.834337
16 2017 April 8059 4 0.5443431 8.994545
17 2017 May 9186 5 0.6204661 9.125436
18 2017 June 10164 6 0.6865248 9.226607
19 2017 July 10970 7 0.7409659 9.302920
20 2017 August 11901 8 0.8038501 9.384378
21 2017 September 12578 9 0.8495778 9.439705
22 2017 October 13422 10 0.9065856 9.504650
23 2017 November 14178 11 0.9576494 9.559447
24 2017 December 14805 12 1.0000000 9.602720
Here is my data plotted as a scatter plot:
Should I treat the two years as separate models or can I combine all the data into one?
I've been able to calculate the intercept and the growth parameter for just 2016 using the following code:
coef(lm(logit(df_tot$Norm[1:12]) ~ df_tot$Time[1:12]))
and got a non-linear least squares regression for 2016 with this code:
fit <- nls(Total ~ phi1/(1+exp(-(phi2+phi3*Time))), start = list(phi1=15064, phi2 = -1.253, phi3 = 0.371), data = df_tot[c(1:12),], trace = TRUE)
Any help is more than appreciated! Time series non-linear modeling is not my strong suit and googling hasn't got me very far at this point.
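One way to approach the "separate or combined" question is to fit both and compare them. The sketch below rebuilds only the Year, Time and Total columns of df_tot from the table above and fits (a) one logistic curve to all 24 points, treating the two years as replicates, and (b) a curve with year-specific parameters using nls()'s indexed-parameter syntax (phi1[Yr]); anova() then gives an extra-sum-of-squares F test between the two. It reuses your 2016 start values, which is an assumption and may need adjusting. Note that ordinary nls() treats all 24 points as independent; a nonlinear mixed-effects model (for example nlme() from the nlme package) is the more formal way to model year-to-year variation, although with only two years the between-year variance will be poorly estimated.

```r
# Month-end totals from the table above
df_tot <- data.frame(
  Year  = rep(c(2016, 2017), each = 12),
  Time  = rep(1:12, times = 2),
  Total = c(3919, 5887, 7663, 8964, 10014, 10983, 11775, 12639, 13327, 13981, 14533, 15064,
            3203, 5192, 6866, 8059, 9186, 10164, 10970, 11901, 12578, 13422, 14178, 14805)
)
df_tot$Yr <- factor(df_tot$Year)

# Start values taken from the 2016-only fit in the question
start_vals <- list(phi1 = 15064, phi2 = -1.253, phi3 = 0.371)

# (a) One curve for both years: each month contributes two replicate points
fit_pooled <- nls(Total ~ phi1 / (1 + exp(-(phi2 + phi3 * Time))),
                  data = df_tot, start = start_vals)

# (b) Year-specific parameters: phi1[Yr] picks the value for that row's year
fit_by_year <- nls(Total ~ phi1[Yr] / (1 + exp(-(phi2[Yr] + phi3[Yr] * Time))),
                   data = df_tot,
                   start = lapply(start_vals, rep, times = 2))

summary(fit_pooled)
coef(fit_by_year)
# Do separate curves per year reduce the residual sum of squares significantly?
anova(fit_pooled, fit_by_year)
```

A non-significant F test would suggest the pooled curve is adequate; a significant one suggests the years differ enough to model them separately (or with a mixed model).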

Combining unequal data frames and applying a calculation

I've been doing some data cleaning and regressions, and now I would like to apply the output; however, I'm stuck on the following problem.
One data frame is called "Historical" and looks like this:
Year Value
2014 5
2015 7.5
2016 11
The other data frame is called "forecast" and looks like this (new years in the future):
Year Growth
2017 0.05
2018 0.11
etc
So I would like one data frame that shows the historical values followed by the forecasted values, starting in 2017 (i.e. 2017 = 11*1.05).
How can I go about this?
Much appreciated
Given
a <- read.table(header=T, text="Year Value
2014 5
2015 7.5
2016 11")
b <- read.table(header=T, text="
Year Growth
2017 0.05
2018 0.11")
You could, for example, do:
rbind(a, cbind(
Year=b$Year,
Value=cumprod(c(tail(a$Value, 1), 1+b$Growth))[-1])
)
# Year Value
# 1 2014 5.0000
# 2 2015 7.5000
# 3 2016 11.0000
# 4 2017 11.5500
# 5 2018 12.8205

R: How to plot multiple series when the series is included as a variable?

I want to plot five different time series as separate lines on one graph. The problem is that my data frame is arranged like so:
Series Time Price ...
1 Dec 2003 5
2 Dec 2003 10
3 Dec 2003 2
1 Jan 2004 10
2 Jan 2004 10
3 Jan 2004 5
This is a simplified version, and there are many other variables for each observation. I'd like to plot Time vs. Price, using the first variable (Series) to identify each series.
The time period is 77 months long, so I'm not sure if there's an easy way to reshape the data to look like:
Series Dec.2003.Price Jan.2004.Price ...
1 5 10
2 10 10
3 2 5
or a way to graph them as described without reshaping.
You can try xyplot from the lattice package:
library(lattice)
xyplot(Price ~ Time, groups=Series, data=df, type="l")
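One caveat: if Time is stored as text such as "Dec 2003", it will not be ordered chronologically (as text or as a factor it sorts alphabetically, so "Apr 2004" would come before "Dec 2003"). The sketch below, using only the six sample rows from the question, converts Time to a Date first and also shows the wide layout with base R's reshape(). It assumes the month abbreviations match the current locale (English), which the %b format relies on; Date and Month are new column names introduced for the illustration.

```r
library(lattice)  # recommended package shipped with R

# The six sample rows from the question
df <- data.frame(
  Series = c(1, 2, 3, 1, 2, 3),
  Time   = c("Dec 2003", "Dec 2003", "Dec 2003", "Jan 2004", "Jan 2004", "Jan 2004"),
  Price  = c(5, 10, 2, 10, 10, 5)
)

# Parse "Mon YYYY" as the first day of that month (%b assumes English month names)
df$Date <- as.Date(paste("01", df$Time), format = "%d %b %Y")
df <- df[order(df$Series, df$Date), ]

# Long format: one line per Series, no reshaping needed
print(xyplot(Price ~ Date, groups = Series, data = df, type = "l", auto.key = TRUE))

# Optional wide format: one row per Series, one Price column per month
df$Month <- format(df$Date, "%Y-%m")
wide <- reshape(df[, c("Series", "Month", "Price")],
                idvar = "Series", timevar = "Month", direction = "wide")
wide
```

For 77 months the same code simply produces 77 Price columns in the wide version, but for plotting the long format is usually easier to work with.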

Changing X-axis values in Time Series plot with R

I'm a newer R user and I need help with a time series plot. I created a time series plot, and cannot figure out how to change my x-axis values to correspond to my sample dates. My data is as follows:
Year Month Level
2009 8 350
2009 9 210
2009 10 173
2009 11 166
2009 12 153
2010 1 141
2010 2 129
2010 3 124
2010 4 103
2010 5 69
2010 6 51
2010 7 49
2010 8 51
2010 9 51
Let's say this data is saved as "data.csv":
data = read.table("data.csv", sep = ",", header = T)
data.ts = ts(data, frequency = 1)
plot(data.ts[, 3], ylab = "level", main = "main", axes = T)
I've also tried passing start = c(2009, 8) to the ts function, but I still get the wrong values.
When I plot this, my x-axis does not correspond to August 2009 through September 2010; it either increases by year or in decimals. I've looked at many examples online and in R's ? help, but cannot find a way to relabel my axis values. Any help would be appreciated.
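Part of the problem is how ts() is set up: with frequency = 1 each step is a whole year, so start = c(2009, 8) is not interpreted as August 2009. A minimal base-R sketch, assuming the Year, Month and Level columns shown above (typed in directly here instead of reading data.csv): build a monthly series with frequency = 12, draw it without axes, and label every month yourself. The name level.ts is just for illustration.

```r
# The question's data (typed in rather than read from data.csv)
data <- data.frame(
  Year  = c(rep(2009, 5), rep(2010, 9)),
  Month = c(8:12, 1:9),
  Level = c(350, 210, 173, 166, 153, 141, 129, 124, 103, 69, 51, 49, 51, 51)
)

# Monthly series: with frequency = 12, start = c(2009, 8) means August 2009
level.ts <- ts(data$Level, start = c(data$Year[1], data$Month[1]), frequency = 12)

# Draw without axes, then add one x-axis label per month
plot(level.ts, axes = FALSE, xlab = "", ylab = "level", main = "main")
axis(1, at = as.numeric(time(level.ts)),
     labels = paste(month.abb[data$Month], data$Year), las = 2, cex.axis = 0.8)
axis(2)
box()
```

Even without the custom axis, frequency = 12 places the points at the right positions; the default axis just shows them as decimal years (2009.583 and so on).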
Using base R, you can accomplish this in a few steps. As described in this SO answer, you can turn your "Month" and "Year" data into a date by combining the as.Date and paste functions and supplying a day (e.g., the first day of the month, "1"). For the purposes of this answer, I will simply refer to the data you provided as df:
df$date<-with(df,as.Date(paste(Year,Month,'1',sep='-'),format='%Y-%m-%d'))
df
Year Month Level date
1 2009 8 350 2009-08-01
2 2009 9 210 2009-09-01
3 2009 10 173 2009-10-01
4 2009 11 166 2009-11-01
5 2009 12 153 2009-12-01
6 2010 1 141 2010-01-01
7 2010 2 129 2010-02-01
8 2010 3 124 2010-03-01
9 2010 4 103 2010-04-01
10 2010 5 69 2010-05-01
11 2010 6 51 2010-06-01
12 2010 7 49 2010-07-01
13 2010 8 51 2010-08-01
14 2010 9 51 2010-09-01
Then you can use your basic plot, axis, and mtext functions to control how you want to visualize the data and your axes. For instance:
xmin<-min(df$date,na.rm=T);xmax<-max(df$date,na.rm=T) #ESTABLISH X-VALUES (MIN & MAX)
ymin<-min(df$Level,na.rm=T);ymax<-max(df$Level,na.rm=T) #ESTABLISH Y-VALUES (MIN & MAX)
xseq<-seq.Date(xmin,xmax,by='1 month') #CREATE DATE SEQUENCE THAT INCREASES BY MONTH FROM DATE MINIMUM TO MAXIMUM
yseq<-round(seq(0,ymax,by=50),0) # CREATE SEQUENCE FROM 0-350 BY 50
par(mar=c(1,1,0,0),oma=c(6,5,3,2)) #CONTROLS YOUR IMAGE MARGINS
plot(Level~date,data=df,type='b',ylim=c(0,ymax),axes=F,xlab='',ylab='');box() #PLOT LEVEL AS A FUNCTION OF DATE, REMOVE AXES FOR FUTURE CUSTOMIZATION
axis.Date(side=1,at=xseq,format='%Y-%m',labels=T,las=3) #ADD X-AXIS LABELS WITH "YEAR-MONTH" FORMAT
axis(side=2,at=yseq,las=2) #ADD Y-AXIS LABELS
mtext('Date (Year-Month)',side=1,line=5) #X-AXIS LABEL
mtext('Level',side=2,line=4) #Y-AXIS LABEL
An alternative using data.table and ggplot2:
library(data.table)
library(ggplot2)
library(scales)
data<-data.table(datetime=seq(as.POSIXct("2009/08/01",format="%Y/%m/%d"),
as.POSIXct("2010/09/01",format="%Y/%m/%d"),by="1 month"),
Level=c(350,210,173,166,153,141,129,124,103,69,51,49,51,51))
ggplot(data)+
geom_point(aes(x=datetime,y=Level),col="brown1",size=1)+
scale_x_datetime(labels = date_format("%Y/%m"),breaks = "1 month")+
theme(axis.text.x = element_text(angle = 90, hjust = 1,vjust=0.3))
Example using the xts package:
library(xts)
ts1 <- xts(data$Level, as.POSIXct(sprintf("%d-%d-01", data$Year, data$Month)))
# or ts1 <- xts(data$Level, as.yearmon(data$Year + (data$Month-1)/12))
plot(ts1)
If you are using ggplot2:
library(ggplot2)
autoplot(ts1)
