How do I add specific data from the original monthly average dataset to a new dataset?

I have an input data set with average monthly water flow for a river. This file has monthly data from 1912 to 2021 and I have read it into the program as "input1". I am trying to create a new dataset called "AugAvgs" that only includes the average August water flow for the years 1980 through 2021. I am new to programming in R and am not sure how to go about this.
Here is my current failed attempt:
AugAvgs$year <- as.numeric(c(1980:2021)) #creates august table and fills year column
AuAvgs$avg <- input1$mean_va(year>1980, month=8)
The line of code that creates AugAvgs and fills in the year column works. The next line, which tries to add the specific data, gives me the error "Error: attempt to apply non-function". I believe this is because "input1$mean_va" is in the function position, but I don't know how to fix it. I also tried using a series of if statements to filter the data, but that did not work either, because I was passing whole vectors to if statements that expect a single value. How should I go about doing this? Thank you for the help!

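I'm assuming mean_va is a column in input1. The error comes from input1$mean_va(...), which tries to call a column as if it were a function; use [ ] indexing instead. Subset to the years 1980 and later and to month == 8 to keep only the August means from 1980 onward: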
input1[input1$year >= 1980 & input1$month == 8, ]$mean_va
It would really help to include a sample of input1 as text (for example the output of dput(head(input1))) rather than a screenshot. Based on the column names in the screenshot, you might try this instead. It assumes the columns are numeric (integers).
input1[input1$year_nu >= 1980 & input1$month_nu == 8, ]$mean_va
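A minimal sketch of building AugAvgs itself, assuming the column names from the screenshot (year_nu, month_nu, mean_va). The input1 below is a made-up stand-in so the snippet runs on its own; replace its first two lines with your real data.

```r
# Hypothetical stand-in for input1 (made-up numbers): one row per month
# from 1912 to 2021, with the column names assumed in the answer above.
input1 <- expand.grid(month_nu = 1:12, year_nu = 1912:2021)
input1$mean_va <- seq_len(nrow(input1))  # placeholder "flows", not real data

# Keep only the August rows for 1980 through 2021 (both ends included)
aug <- input1[input1$year_nu >= 1980 & input1$year_nu <= 2021 &
                input1$month_nu == 8, ]

# Build AugAvgs in one step from the filtered rows
AugAvgs <- data.frame(year = aug$year_nu, avg = aug$mean_va)
head(AugAvgs)

# Same rows with subset(); here the columns keep the names year_nu and mean_va
AugAvgs2 <- subset(input1, year_nu >= 1980 & month_nu == 8,
                   select = c(year_nu, mean_va))
```

Building the data frame from the filtered rows means the year and avg columns always line up row by row, instead of being filled in separately.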

Related

How do I stop the number of observations coming up when trying to tabulate a variable?

I'm very new to R and have run into a problem while working on the code for a stats project. I have attached the .csv file below for reference. Essentially, I would like to plot the years 2018, 2019 and 2020 against the sum of international arrivals ("Int_Pax_In" in the Excel file) over the first 6 months of each year for "All Australian Airports". So my plot will have 3 bars, one each for 2018, 2019 and 2020, with the y-axis labelled "All Australian Arrivals". The problem is that I wanted to start with a simple line of code that tabulates the "Year" variable, without yet trying to produce the final result, so I simply entered:
info=read.csv("mon_pax_web.csv")
table(info$Year)
This doesn't give me what I need: it simply gives the number of observations for each year and nothing else. Below is a screenshot of what I get:
Screenshot 1
info=read.csv("mon_pax_web.csv")
str(info)
table(info$Year)
I also tried converting every variable apart from "Year" to character with as.character (and Month to a factor), but that had no effect, as shown below:
Screenshot 2
info=read.csv("mon_pax_web.csv")
info$AIRPORT=as.character(info$AIRPORT)
info$Month=as.factor(info$Month)
info$Dom_Pax_In=as.character(info$Dom_Pax_In)
info$Dom_Pax_Out=as.character(info$Dom_Pax_Out)
info$Dom_Pax_Total=as.character(info$Dom_Pax_Total)
info$Int_Pax_Out=as.character(info$Int_Pax_Out)
info$Int_Pax_Total=as.character(info$Int_Pax_Total)
info$Pax_In=as.character(info$Pax_In)
info$Pax_Out=as.character(info$Pax_Out)
info$Pax_Total=as.character(info$Pax_Total)
info$Int_Pax_In=as.character(info$Int_Pax_In)
str(info)
table(info$Year)
I'm only allowed to use base R for this project, so I would really appreciate any help that uses base R so I can follow along. I just need some pointers to get started.
CSV File for reference
Thank you.
The column info$Year is just a vector of years, so table(info$Year) only shows the number of entries for each year, because that's what you asked for. If I gave you the years 2011, 2011, 2012 and 2013 and asked you to tabulate them without any other information, all you could do is count how many times each year appears. Presumably, this is not what you meant.
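A toy version of that example, with made-up arrival counts attached to each of the four years, shows the difference between counting rows with table() and summing a value per year with tapply():

```r
# The four years from the example, plus hypothetical arrival counts per row
years    <- c(2011, 2011, 2012, 2013)
arrivals <- c(10, 20, 5, 7)

counts <- table(years)                  # rows per year: 2, 1, 1
totals <- tapply(arrivals, years, sum)  # arrivals per year: 30, 5, 7
print(counts)
print(totals)
```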
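I'm guessing what you're trying to do is get the sum of Int_Pax_In per year. First, filter the data so that you only include the years of interest, the months of interest, and the rows that represent all Australian airports. You can do this using subset (this assumes info was read straight from read.csv, so Month and Int_Pax_In are still numeric rather than the factor/character versions from your second attempt):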
df <- subset(info, Year %in% 2018:2020 & Month < 7 & AIRPORT == "All Australian Airports")
Now we can use tapply to find the sum for each year:
plot_table <- tapply(df$Int_Pax_In, df$Year, sum)
Finally, we use barplot to create the bar graph you wanted:
barplot(plot_table, main = "Arrivals at all Australian airports January - June", ylab = "All Australian Arrivals")
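As a base R alternative to tapply(), aggregate() with a formula returns a data frame with one row per year. The sketch below runs on a tiny made-up stand-in for info (only the AIRPORT, Year, Month and Int_Pax_In columns, a hypothetical second airport "SYDNEY", and every row set to 100 arrivals) so the sums are easy to check; with the real CSV you would skip the first two statements.

```r
# Hypothetical miniature version of mon_pax_web.csv (made-up numbers)
info <- expand.grid(Month = 1:12, Year = 2017:2021,
                    AIRPORT = c("All Australian Airports", "SYDNEY"),
                    stringsAsFactors = FALSE)
info$Int_Pax_In <- 100  # every row gets 100 arrivals

# Filter to 2018-2020, January-June, all-airports rows
df <- subset(info, Year %in% 2018:2020 & Month <= 6 &
               AIRPORT == "All Australian Airports")

# One row per year with the summed arrivals
totals <- aggregate(Int_Pax_In ~ Year, data = df, FUN = sum)
print(totals)

barplot(totals$Int_Pax_In, names.arg = totals$Year,
        ylab = "All Australian Arrivals",
        main = "Arrivals at all Australian airports January - June")
```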

Meteorological data re-organizing

I have an Excel file with daily temperature data from 1903 to 2018, and I would like to reorganize it. The data begins in 1903, and as you can see in the first attached file, two columns are shown: the first has the month with its days (1 to 31), and the second has the temperature (TAM). I want to make 12 columns, one per month, for each year (second attached file). It's easy to do by hand, but the data runs from 1903 to 2018...
So doing it by hand would be a lot of work. Can someone help me code this in R?
attached1 attached2
I don't know the code you would use, but I can think of an algorithm:
1. Go down column B until you see the keyword "TAM".
2. Set the variable col to 1.
3. After seeing "TAM", increment col by 1.
4. Write all the numbers listed under "TAM" into column number col, until you reach a row without a proper date in column A.
5. Continue down to the next "TAM" and go back to step 3, until you run out of data.
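A different route in R, not the keyword scan above: if the Excel data is first exported to a long table with one row per day (a date and a TAM value), the reshape can be done per year with matrix indexing. This is a sketch under that assumption; the dates and temperatures below are made up, and the layout (rows = day of month, columns = month) is an assumed reading of the second attachment.

```r
# Hypothetical long-format input: one row per day, made-up temperatures
dates <- seq(as.Date("1903-01-01"), as.Date("1904-12-31"), by = "day")
temps <- data.frame(date = dates, TAM = seq_along(dates) / 10)

temps$year  <- as.integer(format(temps$date, "%Y"))
temps$month <- as.integer(format(temps$date, "%m"))
temps$day   <- as.integer(format(temps$date, "%d"))

# For each year, a 31 x 12 matrix: rows = day of month, columns = month.
# Days that do not exist (e.g. 31 April) stay NA.
by_year <- lapply(split(temps, temps$year), function(d) {
  m <- matrix(NA_real_, nrow = 31, ncol = 12,
              dimnames = list(1:31, month.abb))
  m[cbind(d$day, d$month)] <- d$TAM  # place each value at (day, month)
  m
})

by_year[["1903"]][1:3, 1:3]
```

Each element of by_year can then be written out separately, for example with write.csv().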

How do I change a data frame into a time series?

I have daily rainfall data for 36 years. I want to analyze it as a time series, but the data is still in a data frame. How do I convert the data frame into a time series? The rainfall is a single variable; how do I combine the year number with the day and month so that the data is in only one column?
You could use a time series package for that, such as fpp (install.packages('fpp')), although the ts() function itself comes with base R. Since you haven't given any example code, I can't help you precisely, but it's quite easy:
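ts(your_data, start = , frequency = )
For start =, put the time of the first observation (for daily data, for example c(year, day of year)). For frequency =, put the number of observations per unit of time, not the number of years: for daily data with a yearly cycle that is 365 (or 365.25), not 36.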
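A sketch of both steps, assuming hypothetical columns named year, month, day and rainfall, with made-up values for only two years (1985-1986): paste the three date columns into one Date column, then build a daily ts with frequency = 365.

```r
# Hypothetical data frame with separate year/month/day columns (made-up rainfall)
days <- seq(as.Date("1985-01-01"), as.Date("1986-12-31"), by = "day")
rain <- data.frame(year  = as.integer(format(days, "%Y")),
                   month = as.integer(format(days, "%m")),
                   day   = as.integer(format(days, "%d")),
                   rainfall = rep(c(0, 2.5, 0, 1), length.out = length(days)))

# Combine the three columns into a single Date column
rain$date <- as.Date(sprintf("%04d-%02d-%02d", rain$year, rain$month, rain$day))

# Daily series with a yearly cycle: 365 observations per year,
# starting at (year, day of year) of the first observation
first <- rain$date[1]
rain_ts <- ts(rain$rainfall,
              start = c(as.integer(format(first, "%Y")),
                        as.integer(format(first, "%j"))),
              frequency = 365)
```

With frequency = 365, each leap year shifts the calendar alignment by one day. That is a common simplification; if exact dates matter, a date-indexed object such as a zoo series may suit better.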
You might want to check out https://robjhyndman.com/. He has an online (free) book available that walks you through the use of his package as well as providing useful information with respect to time series analysis.
Hope this helps.

Simple time series analysis with R: aggregating and subsetting

I want to convert monthly data into quarterly averages. These are my 2 datasets:
gas <- UKgas
dd <- UKDriverDeaths
I think I managed to do this for the dd data as follows:
library(zoo)
dd.zoo <- zoo(dd)
ddq <- aggregate(dd.zoo, as.yearqtr, mean)
However I cannot figure out how to do this with the gas data...any help?
Follow-up
When I try to subset the data based on date (1969-1984) the resulting data does not include 1969 Q1 and instead includes 1985 Q1...any suggestions on how to fix this? I was just trying to subset as gas[1969:1984].
Originally I did not plan to post an answer, as it looks like you did not check your UKgas dataset first: it is already a quarterly time series.
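To check this, frequency() reports the number of observations per year. As a base R alternative to the zoo approach for the monthly data, aggregate() on a ts (stats::aggregate.ts) can average every 3 months into one quarter:

```r
# Both series ship with base R's datasets package
frequency(UKgas)           # 4: already quarterly
frequency(UKDriverDeaths)  # 12: monthly

# Average each block of 3 months into one quarterly value
ddq_base <- aggregate(UKDriverDeaths, nfrequency = 4, FUN = mean)
start(ddq_base)            # 1969 1
length(ddq_base)           # 64 quarters (1969 Q1 to 1984 Q4)
```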
But the follow-up question is worth answering. A "ts" object comes with many handy generic functions. We can use window to easily subset a time series. To extract the section between the first quarter of 1969 and the final quarter of 1984, we can use
window(UKgas, start = c(1969,1), end = c(1984,4))
The result will still be a quarterly time series.
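Note that "[" selects by position, not by time, so gas[1969:1984] asks for elements number 1969 to 1984 rather than the years 1969 to 1984. And if we use "[" for subsetting, we lose the object class: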
class(UKgas[1:12])
#[1] "numeric"
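A short check of the difference, using UKgas (which starts in 1960 Q1): window() keeps the quarterly time attributes, while with "[" you can only reach 1969 Q1 through its position, (1969 - 1960) * 4 + 1 = 37.

```r
# window() keeps the "ts" attributes (start, end, frequency)
w <- window(UKgas, start = c(1969, 1), end = c(1984, 4))
start(w)      # 1969 1
end(w)        # 1984 4
frequency(w)  # 4

# "[" is positional: 1969 Q1 is element 37 of UKgas
pos <- (1969 - 1960) * 4 + 1
UKgas[pos] == w[1]  # TRUE
```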

Getting Date to Add Correctly

I have a time series database stored as a roughly 3000 x 1000 matrix, going back 14 years and updated every three months. I forecast 9 months ahead from this data, and the result is still a matrix of roughly 3200 x 1100 (mind you, these are rough numbers).
During the forecasting process I need the variables Year and Month to be calculated correctly. I am trying to automate the process so I don't have to touch the code any more; I can just run it every three months and upload the projections into our database.
Below is the code I am using right now. As I said above, I do not want to have to look at the data or the code, just run it every three months. Everything else is working as planned, but I still have to make sure the dates are set correctly. The foo variable names have been changed for privacy because of the nature of the real names.
projection <- rbind(projection, data.frame(foo=forbar, bar=barfoo,
                                           Year=2012, Month=1:9,
                                           Foo=as.vector(fc$mean)))
I'm not sure exactly where the year/months are coming from, but if you want to refer to the current date for those numbers, here is an option (using the wonderful package, lubridate):
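library(lubridate)
today <- Sys.Date()
# the 9 months after today; %m+% rolls back to the last valid day (e.g. Jan 31 + 1 month -> end of Feb)
future <- today %m+% months(1:9)
# take Year and Month from each future date so the year rolls over correctly,
# and keep the column names identical to projection's so rbind() can match them
projection <- rbind(projection, data.frame(foo = foobar, bar = barfoo,
                                           Year = year(future),
                                           Month = month(future),
                                           Foo = as.vector(fc$mean)))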
I hope this is what you're looking for.
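If lubridate is not available, the same Year/Month columns can be sketched in base R with seq() by month. next_months() is a hypothetical helper name, and a fixed date is used so the output is reproducible; in the scheduled job you would pass Sys.Date() instead.

```r
# Year and Month for the n months after a given date (base R only)
next_months <- function(from, n = 9) {
  first_of_month <- as.Date(format(from, "%Y-%m-01"))
  # n + 1 month starts beginning with the current month; drop the current one
  future <- seq(first_of_month, by = "month", length.out = n + 1)[-1]
  data.frame(Year  = as.integer(format(future, "%Y")),
             Month = as.integer(format(future, "%m")))
}

upcoming <- next_months(as.Date("2012-11-15"))
upcoming  # Year 2012 Month 12, then Year 2013 Months 1 to 8
```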

Resources