How can I convert characters into dates in RStudio? - r

Still new to R. I wanted to create a simple bar chart of the number of burglaries per month in my city. I found that the column 'Occurence_Date' is a character; I wanted it to be "time", or something simpler, so that I could create a visualization. I wanted the x-axis to show the months of January to June 2019 and the y-axis to show the number of burglaries per month. Can anyone help me get started on this please? Thanks!
This is my data frame

The lubridate package is very helpful for working with dates and times in R.
# install.packages("lubridate") ## only run once
library(lubridate)
df$Occurence_Date <- ymd(df$Occurence_Date) # parses text in year-month-day order into Date; use ymd_hms() if the strings also carry a time
Generally it's better to put example data in your question so people can work with it and show an example.
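Since the question's data was not posted, here is a minimal sketch on made-up data of the whole path from character column to bar chart. The sample dates are assumptions, and base R's as.Date() stands in for lubridate::ymd() so the example runs without extra packages.

```r
# Hypothetical sample data; the real data frame and its date format were not shown.
df <- data.frame(
  Occurence_Date = c("2019-01-05", "2019-01-20", "2019-02-11",
                     "2019-03-02", "2019-03-15", "2019-03-30"),
  stringsAsFactors = FALSE
)

# Convert the character column to Date (base R counterpart of lubridate::ymd()).
df$Occurence_Date <- as.Date(df$Occurence_Date, format = "%Y-%m-%d")

# Label each row with its year-month, then count the rows in each month.
df$month <- format(df$Occurence_Date, "%Y-%m")
monthly_counts <- table(df$month)

# One bar per month, bar height = number of burglaries.
barplot(monthly_counts, xlab = "Month", ylab = "Number of burglaries")
```

If the real strings are in a different order (for example day/month/year), the format argument would need to change to match.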

Related

How to simplify date in graph axis to month and year?

Apologies for a question on something which is probably very straightforward. I am very new to R.
I have a dataframe which contains dates in a year,month,day format e.g. "2020-05-28". I wanted to stratify the data at month level, so I used the "floor_date" function. However, now the dates read as "2020-05-01" etc. This is absolutely fine for the data set itself, but I am creating epidemiological curves and want to change the dates to "2020-05" etc. on the legend. Could anyone provide some guidance on how to do this? I can't simply replace the pattern "01" with a blank as I need to keep 01 on month level (January) visible.
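One possible approach, sketched here on made-up dates, is to keep the Date column for the calculations and build a separate text label with format(). The variable names are assumptions for illustration.

```r
# Hypothetical dates in the same year-month-day format as the question.
dates <- as.Date(c("2020-05-28", "2020-05-03", "2020-06-14"))

# "%Y-%m" keeps the four-digit year and two-digit month and drops the day,
# so January still shows as "01" in the month position.
month_labels <- format(dates, "%Y-%m")
```

If the curve is drawn with ggplot2 on a date axis, scale_x_date(date_labels = "%Y-%m") applies the same format string to the axis labels without changing the underlying data.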

How do I stop the number of observations coming up when trying to tabulate a variable?

Very new to using R but encountering a problem when trying to work on the code for a stats project. I have attached the .csv file below for reference, but essentially I would like to plot the years 2018, 2019 and 2020 against the sum of international arrivals ("Int_Pax_In" in the file) from the first 6 months of each year for the "All Australian Airports" entry. So I will have 3 bars in my plot, one each for 2018, 2019 and 2020, with the y-axis labelled "All Australian Arrivals". The problem is, I just wanted to start off with a simple line of code to tabulate the "Year" variable without even trying to achieve the final result, and simply putting in:
info=read.csv("mon_pax_web.csv")
table(info$Year)
doesn't give me any information. It simply gives me the number of observations for each year instead of anything else. Below is a screenshot of what I get:
Screenshot 1
info=read.csv("mon_pax_web.csv")
str(info)
table(info$Year)
I also tried changing my variables apart from "Year" into as.character and Month into factor but that had no effect as shown below:
Screenshot 2
info=read.csv("mon_pax_web.csv")
info$AIRPORT=as.character(info$AIRPORT)
info$Month=as.factor(info$Month)
info$Dom_Pax_In=as.character(info$Dom_Pax_In)
info$Dom_Pax_Out=as.character(info$Dom_Pax_Out)
info$Dom_Pax_Total=as.character(info$Dom_Pax_Total)
info$Int_Pax_Out=as.character(info$Int_Pax_Out)
info$Int_Pax_Total=as.character(info$Int_Pax_Total)
info$Pax_In=as.character(info$Pax_In)
info$Pax_Out=as.character(info$Pax_Out)
info$Pax_Total=as.character(info$Pax_Total)
info$Int_Pax_In=as.character(info$Int_Pax_In)
str(info)
table(info$Year)
I'm only allowed to use Base R for this project so would appreciate it a lot if people could help me out and if you do, provide coding using Base R so I could follow along. Just require some pointers so I could get started.
CSV File for reference
Thank you.
The column info$Year is just a vector of years, so when you do table(info$Year) it only shows the number of entries for each year, because that's what you have asked for. If I gave you the following years: 2011, 2011, 2012 and 2013, and asked you to tabulate the years without giving you any other information, all you could do is count the number of instances of each year. Presumably, this is not what you meant.
I'm guessing what you're trying to do is to get the sum of Int_Pax_In per year. First you should filter so that you only include the years of interest, the months of interest, and the rows that represent all Australian airports. You can do this using subset:
df <- subset(info, Year > 2017 & Month < 7 & AIRPORT == "All Australian Airports")
Now we can use tapply to find the sum for each year:
plot_table <- tapply(df$Int_Pax_In, df$Year, sum)
Finally, we use barplot to create the bar graph you wanted:
barplot(plot_table, main = "Arrivals at all Australian airports January - June")
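To see the three steps work end to end without the CSV, here is a small sketch on a made-up stand-in for the file. The rows and passenger numbers are invented; only the column names come from the question, and Month is assumed to be numeric.

```r
# Hypothetical stand-in for mon_pax_web.csv; the real file was not shown.
info <- data.frame(
  AIRPORT = c(rep("All Australian Airports", 6), "SYDNEY"),
  Year = c(2017, 2018, 2018, 2019, 2019, 2020, 2018),
  Month = c(3, 1, 8, 2, 5, 4, 1),
  Int_Pax_In = c(100, 200, 300, 400, 500, 600, 700)
)

# Keep 2018 onwards, January to June, and only the all-airports rows.
df <- subset(info, Year > 2017 & Month < 7 & AIRPORT == "All Australian Airports")

# Sum international arrivals within each remaining year.
plot_table <- tapply(df$Int_Pax_In, df$Year, sum)

# One bar per year.
barplot(plot_table, ylab = "All Australian Arrivals")
```

Note that sum() needs a numeric column, so Int_Pax_In should be left as numeric rather than converted with as.character() as in the question.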

How to get monthly time series cross sectional into zoo using R

I want to get a panel data set into zoo so that it catches both month and year. My data set looks like this.
and the data can be downloaded from HERE.
The best way I could do is,
dat<-read.csv("dat_lag.csv")
zdat <- read.zoo(dat, format="%d/%m/%Y")
However, I could only do this by including column 1 (Date) and column 4 (Day) in my data set. Is there any clever way to get both month and year into zoo using R without including the Date and Day columns? Thanks in advance for any help.

how to extract all values from a data frame by month for years

I have data in a zoo data structure. I want to pull all August daily values over 10 years and compute monthly statistics for a period of record. Any thoughts on an easy way to do this?
An example of the specific date format would be great; however, try format()
for example:
x <- as.POSIXct("2009-08-03 12:01:59.23")
format(x,"%b")
For simplicity, just create a new column with format(), then subset on the month you're looking for.
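A minimal sketch of that idea follows, using a plain data frame rather than a zoo object because the original data was not shown. The column names and values are made up. It uses "%m" instead of "%b" because month abbreviations depend on the locale, while the month number does not.

```r
# Hypothetical daily series covering two years.
dates <- seq(as.Date("2009-01-01"), as.Date("2010-12-31"), by = "day")
daily <- data.frame(date = dates, value = seq_along(dates))

# Add a month column with format(), then keep only the August rows.
daily$month <- format(daily$date, "%m")
august <- subset(daily, month == "08")

# Example statistic: the mean of the August values within each year.
august_means <- tapply(august$value, format(august$date, "%Y"), mean)
```

For a zoo object, the same format() call could presumably be applied to the result of index() on the series, but that is not shown here.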

Creating a single timestamp from separate DAY OF YEAR, Year and Time columns in R

I have a time series dataset for several meteorological variables. The time data is logged in three separate columns:
Year (e.g. 2012)
Day of year (e.g. 261 representing 17-September in a Leap Year)
Hrs:Mins (e.g. 1610)
Is there a way I can merge the three columns to create a single timestamp in R? I'm not very familiar with how R deals with the Day of Year variable.
Thanks for any help with this!
It looks like the timeDate package can handle Gregorian time frames. I haven't used it personally but it looks straightforward. There is a shift argument in some methods that allows you to set the offset from your data.
http://cran.r-project.org/web/packages/timeDate/timeDate.pdf
Because you mentioned it, I thought I'd show the actual code to merge together separate columns. When you have the values you need in separate columns you can use paste to bring them together and lubridate::mdy to parse them.
library(lubridate)
col.month <- "Jan"
col.year <- "2012"
col.day <- "23"
date <- mdy(paste(col.month, col.day, col.year, sep = "-"))
Lubridate is a great package, here's the official page: https://github.com/hadley/lubridate
And here is a nice set of examples: http://www.r-statistics.com/2012/03/do-more-with-dates-and-times-in-r-with-lubridate-1-1-0/
You should get quite far using ISOdatetime. This function takes vectors of year, month, day, hour, minute, and second as input and outputs a POSIXct object which represents time. You would have to split the third column into separate hour and minute columns, and convert the day of year into a month and day of month, before you can use the function.
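Another option in base R is to paste the three columns into one string and parse it with the "%j" conversion, which stands for day of year. The sketch below uses made-up column names and values, and assumes the times are in UTC.

```r
# Hypothetical column names and values; the original data was not shown.
met <- data.frame(
  year = c(2012, 2012),
  doy = c(261, 262),
  hrs_mins = c(1610, 905)
)

# Zero-pad so that day 5 becomes "005" and 905 becomes "0905".
doy_text <- sprintf("%03d", as.integer(met$doy))
time_text <- sprintf("%04d", as.integer(met$hrs_mins))

# %Y = year, %j = day of year, %H%M = hours and minutes without a separator.
stamp_text <- paste(met$year, doy_text, time_text)
met$timestamp <- as.POSIXct(stamp_text, format = "%Y %j %H%M", tz = "UTC")
```

The tz argument should be changed to match the time zone the logger actually used.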
