I'm very new to R and have run into a problem while working on the code for a stats project. I have attached the .csv file below for reference. Essentially, I would like to plot the years 2018, 2019 and 2020 against the sum of international arrivals ("Int_Pax_In" in the .csv file) over the first 6 months of each year, for "All Australian Airports" only. So the plot will have 3 bars, one each for 2018, 2019 and 2020, with the y-axis labelled "All Australian Arrivals". The problem is that I wanted to start with a simple line of code to tabulate the "Year" variable, before even attempting the final result, and simply running:
info=read.csv("mon_pax_web.csv")
table(info$Year)
doesn't give me any useful information. It simply gives me the number of observations for each year and nothing else. Below is a screenshot of what I get:
Screenshot 1
info=read.csv("mon_pax_web.csv")
str(info)
table(info$Year)
I also tried converting the variables other than "Year" to character with as.character, and "Month" to a factor, but that had no effect, as shown below:
Screenshot 2
info=read.csv("mon_pax_web.csv")
info$AIRPORT=as.character(info$AIRPORT)
info$Month=as.factor(info$Month)
info$Dom_Pax_In=as.character(info$Dom_Pax_In)
info$Dom_Pax_Out=as.character(info$Dom_Pax_Out)
info$Dom_Pax_Total=as.character(info$Dom_Pax_Total)
info$Int_Pax_Out=as.character(info$Int_Pax_Out)
info$Int_Pax_Total=as.character(info$Int_Pax_Total)
info$Pax_In=as.character(info$Pax_In)
info$Pax_Out=as.character(info$Pax_Out)
info$Pax_Total=as.character(info$Pax_Total)
info$Int_Pax_In=as.character(info$Int_Pax_In)
str(info)
table(info$Year)
I'm only allowed to use base R for this project, so I would appreciate it a lot if people could help me out and, if you do, provide code using base R so I can follow along. I just need some pointers to get started.
CSV File for reference
Thank you.
The column info$Year is just a vector of years, so table(info$Year) only shows the number of entries for each year, because that is what you asked for. If I gave you the years 2011, 2011, 2012 and 2013 and asked you to tabulate them, without giving you any other information, all you could do is count the number of instances of each year. Presumably, this is not what you meant.
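To see this on the four example years, here is a tiny sketch (the variable names years and counts are just for illustration):

```r
# The four example years from the paragraph above
years <- c(2011, 2011, 2012, 2013)

# table() has nothing but the years to work with, so it counts them:
# 2011 appears twice, 2012 and 2013 once each
counts <- table(years)
print(counts)
```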
I'm guessing that what you're trying to do is get the sum of Int_Pax_In per year. First you should filter the data so that you only include the years of interest, the months of interest, and the rows that represent all Australian airports. You can do this using subset:
df <- subset(info, Year > 2017 & Month < 7 & AIRPORT == "All Australian Airports")
Now we can use tapply to find the sum for each year:
plot_table <- tapply(df$Int_Pax_In, df$Year, sum)
Finally, we use barplot to create the bar graph you wanted:
barplot(plot_table, main = "Arrivals at all Australian airports January - June")
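Since I can't run this against your file here, below is a self-contained sketch of the same three steps on made-up data. The passenger numbers are placeholders, "SYDNEY" is an invented second airport, and I'm assuming Month is stored as the numbers 1 to 12; if your file stores month names instead, the Month filter would need adjusting. It also sets the y-axis label you asked for.

```r
# Made-up stand-in for mon_pax_web.csv: two airports, 2017-2020, 12 months each
info <- expand.grid(
  AIRPORT = c("All Australian Airports", "SYDNEY"),  # "SYDNEY" is invented
  Year = 2017:2020,
  Month = 1:12,                                      # assumption: numeric months
  stringsAsFactors = FALSE
)
info$Int_Pax_In <- 100  # placeholder value, not real passenger data

# Keep 2018-2020, January to June, and only the all-airports rows
df <- subset(info, Year %in% 2018:2020 & Month <= 6 &
                   AIRPORT == "All Australian Airports")

# Sum international arrivals within each year
plot_table <- tapply(df$Int_Pax_In, df$Year, sum)

# One bar per year, with the requested y-axis label
barplot(plot_table,
        xlab = "Year",
        ylab = "All Australian Arrivals",
        main = "Arrivals at all Australian airports January - June")
```

With six months of 100 placeholder arrivals per year, each of the three bars has height 600. Using Year %in% 2018:2020 rather than Year > 2017 keeps the plot at three bars even if the file later gains rows for 2021 onwards.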
I'm an absolute R beginner here working on a Master's project.
I have a data.frame that contains information on trotting horses (their wins, earnings, time records and such). The data is organised so that every row contains information for a specific year in which the horse competed, plus a first row for each horse labelled "Total", which summarises every variable over its whole competing life. It looks like this:
I created a new variable with their age using the age_calc function in the eeptools package:
travdata$Age<-age_calc(as.Date(travdata$Birth.date), enddate=as.Date("2016-12-31"),
units="years")
This works with no problems. What I'm trying to figure out is whether there is any way to calculate the age of the horses for each specific year I have info on them. That is, the "Total" row would have their age up until 2016-12-31, the row for 2015 would have their age at that time, and so on. I've been trying to include if statements in age_calc but it won't work, and I'm really at a loss as to how best to do this.
Any literature or help you could point me to would be much, much appreciated.
MWE
travdata <- data.frame(
"Id.Number"=c(rep("1938-98",3),rep("1803-97",7),rep("1221-03",4)),
"Name"=c(rep("Muuttuva",3),rep("Pelson Poika",7),rep("Muusan Muisto",4)),
"Sex"=c(rep("Mare",3),rep("Gelding",7),rep("Gelding",4)),
"Birth.year"=c(rep(1998,3),rep(1997,7),rep(2003,4)),
"Birth.date"=c(rep("1998-07-01",3),rep("1997-07-14",7),rep("2003-05-07",4)),
"Competition.year" = c("Total",2005,2004,"Total",2003,2004,2006,2005,2002,2001,2008,2010,"Total",2009),
"starts"=c(20,11,9,44,21,6,7,5,3,2,1,1,4,2),
"X1st.placements"=c(0,0,0,3,3,0,0,0,0,0,0,0,0,0),
"X2nd.placements"=c(2,2,0,1,0,1,0,0,0,0,0,0,0,0),
"X3rd.placements"=c(2,2,0,1,1,0,0,0,0,0,0,0,0,0),
"Earnings.euro"=c(1525,1425,100,2078,1498,580,0,0,0,0,0,0,10,10)
)
The trick is to filter out the "Total" rows and specify a format for the as.Date() function:
library(eeptools)
travdata <- data.frame(
"Id.Number"=c(rep("1938-98",3),rep("1803-97",7),rep("1221-03",4)),
"Name"=c(rep("Muuttuva",3),rep("Pelson Poika",7),rep("Muusan Muisto",4)),
"Sex"=c(rep("Mare",3),rep("Gelding",7),rep("Gelding",4)),
"Birth.year"=c(rep(1998,3),rep(1997,7),rep(2003,4)),
"Birth.date"=c(rep("1998-07-01",3),rep("1997-07-14",7),rep("2003-05-07",4)),
"Competition.year" = c("Total",2005,2004,"Total",2003,2004,2006,2005,2002,2001,2008,2010,"Total",2009),
"starts"=c(20,11,9,44,21,6,7,5,3,2,1,1,4,2),
"X1st.placements"=c(0,0,0,3,3,0,0,0,0,0,0,0,0,0),
"X2nd.placements"=c(2,2,0,1,0,1,0,0,0,0,0,0,0,0),
"X3rd.placements"=c(2,2,0,1,1,0,0,0,0,0,0,0,0,0),
"Earnings.euro"=c(1525,1425,100,2078,1498,580,0,0,0,0,0,0,10,10)
)
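# Age at the end of 2016, as in the question
travdata$Age<-age_calc(as.Date(travdata$Birth.date),
enddate=as.Date("2016-12-31"), units="years")
# Drop the "Total" rows, which have no year that can be turned into a date
competitions <- travdata[travdata$Competition.year!="Total",]
# With format="%Y" only the year is parsed; as.Date() fills in the current month and day
competitions$Competition.age<-age_calc(
as.Date(competitions$Birth.date),
enddate=as.Date(competitions$Competition.year, format="%Y"),
units="years", precise=FALSE)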
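One thing to be aware of: when as.Date() is given only a year with format="%Y", it fills in the current month and day, so the ages can shift depending on when the script is run. If you would rather pin the reference date, here is a minimal base R sketch (no eeptools) that measures age in completed years on 31 December of each competition year. The 31 December cut-off is my assumption, chosen to match the 2016-12-31 you used for the "Total" rows, and the three-row comp data frame is just a toy subset of your MWE.

```r
# Toy subset of the MWE: one competition row per horse
comp <- data.frame(
  Birth.date = c("1998-07-01", "1997-07-14", "2003-05-07"),
  Competition.year = c("2005", "2003", "2009"),
  stringsAsFactors = FALSE
)

birth <- as.Date(comp$Birth.date)

# Assumption: measure age on 31 December of the competition year
ref <- as.Date(paste0(comp$Competition.year, "-12-31"))

# Has the horse already had its birthday by the reference date that year?
had_birthday <- format(ref, "%m%d") >= format(birth, "%m%d")

# Completed years: difference in calendar years, minus one if the
# birthday has not yet occurred in the reference year
comp$Competition.age <- as.integer(format(ref, "%Y")) -
  as.integer(format(birth, "%Y")) -
  ifelse(had_birthday, 0L, 1L)

print(comp)
```

For these three rows the ages come out as 7, 6 and 6. The same ref vector could also be passed as enddate to age_calc if you prefer to stay with eeptools.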