Issue with reading csv file in R software with year variable - r

I am reading some csv files, one per year, and every table has separate year (two-digit), day and month columns; I need a single column with the date instead. My R code worked fine until one of the tables turned out to have a four-digit year (e.g. 2000). In that case my code converts the year to 2020.
Any thoughts?
dt_00$date=as.Date(with(dt_00,paste(MONTH,DAY,YEAR,sep='-')),'%m-%d-%y')

Because lubridate accepts quite a few date formats, this might work:
library(lubridate)
dt_00$date <- mdy(with(dt_00, paste(MONTH, DAY, YEAR, sep = "-")))
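For a base R route, it helps to spell out the cause: %y reads only two digits, so with a year like 2000 it consumes "20" and interprets it as 2020, while %Y reads the full four-digit year. A minimal sketch, assuming every row of a given table uses the same year width (the sample data frame is made up for illustration):

```r
# Hypothetical stand-in for one yearly table with four-digit years
dt_00 <- data.frame(MONTH = c(1, 12), DAY = c(15, 31), YEAR = c(2000, 2000))

# Use %Y when the years have four digits, %y when they have two
four_digit <- all(nchar(as.character(dt_00$YEAR)) == 4)
year_fmt <- if (four_digit) "%m-%d-%Y" else "%m-%d-%y"

dt_00$date <- as.Date(with(dt_00, paste(MONTH, DAY, YEAR, sep = "-")),
                      format = year_fmt)
```

The same check can be run once per table inside whatever loop reads the yearly files.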

Related

Formatting year month variable as date

In Stata I have a variable yearmonth which is formatted as 201201, 201202 etc. for the years 2012 - 2019, monthly with no gaps. When I format the variable as
format yearmonth %tm
The results look like: 2.0e+05 for all periods, with the exact same number each time. A Dickey-Fuller test tells me I have gaps in my data (I don't) and a tsfill command generates dozens of empty observations between each period.
How do I properly format my yearmonth variable so I can set it as a monthly date?
You do have gaps — between 201212 and 201301, for example. Consider a statement like
gen wanted = ym(floor(yearmonth/100), mod(yearmonth, 100))
which parses your integers like 201201 into year and month components. So floor(201201/100) is floor(2012.01) and so 2012 while mod(201201, 100) is 1. The two components are then the arguments of ym() which expects a year and a month argument.
Then and only then will your format statement do what you want. The format command by itself won't create date variables; it only changes how existing values are displayed.
See help datetime in Stata for more information, and "Problem with displaying reformatted string into a four-digit year in Stata 17" for an explanation of the difference between a date value and a date display format.
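The arithmetic behind that statement can be checked outside Stata. Here is a small sketch in R (not Stata code), assuming Stata's convention that monthly dates count months elapsed since January 1960; the variable names are illustrative only:

```r
yearmonth <- c(201211, 201212, 201301)

year <- floor(yearmonth / 100)   # 2012, 2012, 2013
month <- yearmonth %% 100        # 11, 12, 1

# Months elapsed since January 1960, which is what Stata's ym() returns
monthly_date <- (year - 1960) * 12 + (month - 1)

diff(yearmonth)     # 1 89: the raw integers jump at the year boundary
diff(monthly_date)  # 1 1: the converted values are consecutive months
```

The jump of 89 between 201212 and 201301 is the "gap" that the Dickey-Fuller test and tsfill were reacting to.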

Converting monthly numerics to readable dates in R

How can I set R to count months instead of days when converting integers to dates?
After reading several threads on how to convert dates in R, it seems like nobody has asked how to convert numeric dates when the numbers come from a monthly time series. E.g. 552 represents January 2006.
I have tried several things, such as using as.Date(dates,origin="1899-12-01"), but I recognize that R counts days instead of months. Thus, the code on year-month number 552 above yields "1901-06-06" instead of the correct 2006-01-01.
Sidenote: I also want the format to be YEARmonth, but does R allow displaying dates without days?
I think your starting date should be '1960-01-01'.
Either way, you can solve this problem with the lubridate package: start from a date and add months.
library(lubridate)
as.Date('1960-01-01') %m+% months(552)
It gives you
[1] "2006-01-01"
You can display only the year and month of a date, but in that case R coerces the date into a character string.
format(as.Date('2006-01-01'), "%Y-%m")
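The same conversion also works in base R without lubridate. A sketch, assuming the numbers count months elapsed since January 1960 (as the 552 = January 2006 example implies); the sample values are made up:

```r
month_index <- c(552, 553, 563)

# Split the month count into whole years and remaining months
year <- 1960 + month_index %/% 12
month <- month_index %% 12 + 1

# Build first-of-month dates, then display them as YEARmonth
dates <- as.Date(sprintf("%d-%02d-01", year, month))
year_month <- format(dates, "%Y%m")
```

Here year_month is a character vector; the Date values in dates are the ones to keep for any further date arithmetic.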

Using Lubridate in R Studio to create year, month, day columns gives unexpected results

I am importing a csv file with dates in and using lubridate to create additional columns to show the year, month and day. The date column in the csv file is called "Date". My code is as below:
library(lubridate)
Dates <- read.csv("DateSpreadsheet.csv")
Dates$Year <- year(Dates$Date)
Dates$Month <- month(Dates$Date)
Dates$Day <- day(Dates$Date)
View(Dates)
The problem is that in the table produced the year column shows the day and the day column shows the first 2 digits of the year.
Table showing columns for date, year, month and day
I would be grateful for any advice.
This is happening because the date is not stored in year-month-day order, which is what year(), month() and day() assume when they are given plain text. You can use the following code to convert the Date column first, and the extracted columns will then be correct.
Dates$Date <- dmy(Dates$Date)
Let me know if this works.
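For comparison, here is a base R sketch of the same fix, assuming the csv stores dates as day/month/year text such as 25/12/2019 (the sample data frame is a made-up stand-in for the file):

```r
# Hypothetical stand-in for the Date column of DateSpreadsheet.csv
Dates <- data.frame(Date = c("25/12/2019", "01/02/2020"),
                    stringsAsFactors = FALSE)

# Parse the text with an explicit day/month/year format
Dates$Date <- as.Date(Dates$Date, format = "%d/%m/%Y")

# Extract the components from the parsed Date values
Dates$Year <- as.integer(format(Dates$Date, "%Y"))
Dates$Month <- as.integer(format(Dates$Date, "%m"))
Dates$Day <- as.integer(format(Dates$Date, "%d"))
```

If the file uses a different separator or order, only the format string needs to change.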

Time series (xts) strptime; ONLY month and day

I've been trying to do a time series on my dataframe, and I need to strip times from my csv. This is what I've got:
campbell <-read.csv("campbell.csv")
campbell$date = strptime(campbell$date, "%m/%d")
campbell.ts <- xts(campbell[,-1],order.by=campbell[,1])
First, what I'm trying to do is just get xts to strip the dates as "xx/xx", meaning just the month and day. I have no year for my data. When I try that second line of code and call upon the date column, it converts it to "2013-xx-xx". These months and days have no year associated with them, and I can't figure out how to get rid of the 2013. (The csv file I'm calling on has the dates in the format "9/30", "10/1", etc.)
Secondly, once I try and make a time series (the third line), I am unsure what the "order.by" command is calling on. What am I indexing?
Any help??
Thanks!
For strptime, you need to provide the full date, i.e. day, month and year. If any of these is missing, the current value is taken from the system's time and filled into the incomplete date. So, if you want to keep the dates in the form you read them, first store a copy in a temporary variable and then use strptime on campbell$date to convert it into a date format R can work with. Since the year is not a concern to you, you need not worry about it even though strptime adds it automatically.
campbell <-read.csv("campbell.csv")
date <- campbell$date
campbell$date <- strptime(campbell$date, "%m/%d")
Secondly, the third line, xts(campbell[,-1],order.by=campbell[,1]), tells xts to order all the data in campbell except the first column (campbell[,-1]) according to the index provided by the time data in the first column (campbell[,1]). So it only works if the date is in the first column.
After ordering the data as a time series, you can replace the campbell$date column with date to get back the date format you wanted (although you first have to order date as well, as shown below):
date <- xts(date, order.by=campbell[,1]) # assuming campbell$date is campbell[,1]
campbell.ts <- xts(campbell[,-1], order.by=campbell[,1])
campbell.ts <- cbind(date, campbell.ts)
format(as.Date(campbell$date, "%m/%d/%Y"), "%m/%d")
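A base R sketch of the idea, assuming all observations fall within a single year; the year 2013 is an arbitrary placeholder appended only so that the dates parse and sort correctly, and the sample values are made up:

```r
# Month/day strings as they might appear in the csv, deliberately unsorted
month_day <- c("10/1", "9/30")

# Append a placeholder year so the text parses as a complete date
parsed <- as.Date(paste0(month_day, "/2013"), format = "%m/%d/%Y")

# Sort chronologically, then display month and day only
ordered <- sort(parsed)
labels <- format(ordered, "%m/%d")
```

The Date values in ordered are what an index such as order.by needs; labels is character text meant for display only.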

Creating a single timestamp from separate DAY OF YEAR, Year and Time columns in R

I have a time series dataset for several meteorological variables. The time data is logged in three separate columns:
Year (e.g. 2012)
Day of year (e.g. 261 representing 17-September in a Leap Year)
Hrs:Mins (e.g. 1610)
Is there a way I can merge the three columns to create a single timestamp in R? I'm not very familiar with how R deals with the Day of Year variable.
Thanks for any help with this!
It looks like the timeDate package can handle Gregorian time frames. I haven't used it personally, but it looks straightforward. There is a shift argument in some methods that allows you to set the offset from your data.
http://cran.r-project.org/web/packages/timeDate/timeDate.pdf
Because you mentioned it, I thought I'd show the actual code to merge together separate columns. When you have the values you need in separate columns you can use paste to bring them together and lubridate::mdy to parse them.
library(lubridate)
col.month <- "Jan"
col.year <- "2012"
col.day <- "23"
date <- mdy(paste(col.month, col.day, col.year, sep = "-"))
Lubridate is a great package, here's the official page: https://github.com/hadley/lubridate
And here is a nice set of examples: http://www.r-statistics.com/2012/03/do-more-with-dates-and-times-in-r-with-lubridate-1-1-0/
You should get quite far using ISOdatetime. This function takes vectors of year, month, day, hour, minute, and second as input and outputs a POSIXct object which represents time. You just have to split the third column into separate hour and minute columns (and turn the day of year into a month and day) before you can use the function.
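Another option in base R is to let strptime read the day of year directly through its %j conversion. A minimal sketch, assuming the times are stored as numbers like 1610 and that UTC is an acceptable time zone; the vectors below are made-up stand-ins for the three columns:

```r
year <- c(2012, 2012)
day_of_year <- c(261, 1)
hhmm <- c(1610, 5)   # 16:10 and 00:05

# Zero-pad so every row has the fixed layout "YYYY DDD HHMM"
stamp_text <- sprintf("%d %03d %04d", year, day_of_year, hhmm)

# %j is the day of year (001-366); %H%M reads the four-digit time
timestamp <- as.POSIXct(strptime(stamp_text, format = "%Y %j %H%M", tz = "UTC"))
stamp_label <- format(timestamp, "%Y-%m-%d %H:%M")
```

Zero-padding matters here: without it a time such as 5 (five minutes past midnight) would not match the %H%M layout.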
