Inconsistent date values when using read.xlsx in R

I am using the read.xlsx function in R to read Excel sheets. All the values of date column 'A' are of the form dd/mm/yyyy. However, when using read.xlsx, the parsed values range from integers, e.g. 42283, to strings, e.g. 20/08/2015. The problem persists even when I use read.xlsx2.
I guess the inconsistency in the format for different rows makes it hard to change the column to a single standard format. Also, it is hard to specify the column classes in the read.xlsx since I have more than 100 variables.
Are there ways around this problem, and is it an Excel-specific problem?
Thank you!

This problem with date formats is pervasive and it seems like every R package out there deals with it differently. My experience with read.xlsx has been that it sometimes saves the date as a character string of numbers, e.g. "42438" as character data that I then have to convert to numeric and then to POSIXct. Then other times, it seems to save it as numeric and sometimes as character and once in a while, actually as POSIXct! If you're consistently getting character data in the form "20/08/2015", try the lubridate package:
library(lubridate)
dmy("20/08/2015")
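When a single column comes back with both Excel serial numbers and dd/mm/yyyy strings, one option is to parse the two kinds of value separately. The following is a minimal base R sketch, assuming the serials come from a workbook that uses Excel's default 1900 date system (origin "1899-12-30"); parse_excel_dates is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: parse a column that mixes Excel serials and dd/mm/yyyy text.
parse_excel_dates <- function(x) {
  x <- trimws(as.character(x))
  # Start with an all-NA Date vector of the right length.
  out <- structure(rep(NA_real_, length(x)), class = "Date")
  # Values made only of digits are treated as Excel serial day counts.
  is_serial <- grepl("^[0-9]+$", x)
  out[is_serial] <- as.Date(as.numeric(x[is_serial]), origin = "1899-12-30")
  # Everything else is assumed to be dd/mm/yyyy text; unparseable values stay NA.
  out[!is_serial] <- as.Date(x[!is_serial], format = "%d/%m/%Y")
  out
}

mixed <- c("42283", "20/08/2015", NA)
parsed <- parse_excel_dates(mixed)
format(parsed)
# "2015-10-06" "2015-08-20" NA
```

Working on the character representation means the helper does not care whether read.xlsx returned the column as numeric, character, or factor.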

Related

Trying to correctly format all the dates in RStudio imported from Excel

I imported some data from Excel to RStudio (csv file). The data contains date information. The date format I want is month-day-year (e.g. 2-10-16 means February 10th 2016). The problem is that Excel auto-fills 2-10-16 to 2002-10-16, and the problem continues to exist when I import the data to R. So, my date column contains both correctly formatted dates (e.g. 2-10-16) and incorrectly formatted dates (e.g. 2002-10-16). Because I have a lot of dates, it is impossible to manually change everything. I have tried to use this code
as.Date(data[,1], format="%m-%d-%y") but it gives me NA for those incorrectly formatted dates (e.g. 2002-10-16). Does anybody know how to make all the dates correctly formatted?
Thank you very much in advance!
Would you consider making the date format consistent in Excel before importing the data into R?
The best approach is likely to change how the data is captured in Excel even if it means storing the dates as strings. What you're looking for is string manipulation to then convert into a date which could potentially create incorrect data.
This will remove the first two digits and then allow conversion to a date.
as.Date(sub('^\\d{2}', '', '2002-10-16'), '%m-%d-%y')
[1] "2016-02-10"
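Applied to a whole column, the pattern above would also strip the month from values such as 12-25-16. A slightly more defensive sketch, assuming the only damage is a two-digit month expanded to a four-digit number (the sample values are made up for illustration):

```r
# Toy input: two correctly formatted values and one expanded by Excel.
dates <- c("2-10-16", "2002-10-16", "12-25-16")

# Drop the first two digits only when the value starts with four digits and a dash.
fixed <- sub("^\\d{2}(\\d{2}-)", "\\1", dates)

result <- as.Date(fixed, format = "%m-%d-%y")
format(result)
# "2016-02-10" "2016-02-10" "2016-12-25"
```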

R problem Date column stored as Factor R can't convert it

I have downloaded the SP500 data from Yahoo Finance ticker GSPC and am trying to filter it by year, however the Date column is stored as Factor so R can't filter it. Can anyone help me convert it? I tried multiple solutions, but nothing worked.
So far I've loaded the lubridate package and used the following code, but all the values just got replaced with NAs.
as.Date(SP500$Date, format = "%m-%d-%Y")
Then I used the: SP500$Date <- ymd(SP500$Date, format = "%Y-%m-%d") code and again nothing happened. (SP500 is the name of the data frame that I stored the data in)
Also, tried using just SP500$Date <- as.Date(SP500$Date) but R says do not know how to convert it to Date.
Any help would be much appreciated! Thank you!
Classes only exist in the environment of a programming language. What likely happened was that your data (perhaps a .csv file?) got interpreted as factor by R during reading.
Everything you're trying to do here can be accomplished using the base library in R (meaning you don't need to import anything).
If you're dealing with dates:
df$date <- as.Date(df$date, format = "%Y-%m-%d")
If you're dealing with datetimes:
df$date <- as.POSIXct(df$date, format = "%Y-%m-%d %H:%M:%S")
(obviously the specific format may vary; see the conversion codes listed in ?strptime)
Occasionally, coercion in R may act finicky. The format parameter is somewhat unforgiving of errors. I personally frequently mistake - for /, or conflate "%Y-%m-%d" with "%d-%m-%Y", which makes the conversion return NA instead of a date. Obviously, if the format isn't consistent in your data, instances that can't be described by the specific format you supplied will result in NAs.
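Sometimes your dates are actually integers. If they are day counts (e.g. 17846 for 2018-11-11), you may need to supply '1970-01-01' to the origin parameter of as.Date(); if they are digit strings such as 20181111, convert them to character and parse them with format = "%Y%m%d" instead. Day counts turn up, for example, when you iterate through a vector of Dates using a for loop: R won't honour the class of the passed Dates and will hand you plain numbers.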
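The two integer cases behave differently, so here is a small base R sketch of each, followed by the for loop behaviour. The values are chosen only for illustration.

```r
# Case 1: an integer that spells out yyyymmdd. Parse it as text with a format.
as_int <- 20181111L
d1 <- as.Date(as.character(as_int), format = "%Y%m%d")

# Case 2: a count of days since the Unix epoch. Here origin is what is needed.
day_count <- 17846
d2 <- as.Date(day_count, origin = "1970-01-01")

# Both give 2018-11-11.

# Iterating directly over a Date vector drops the class...
dates <- as.Date(c("2018-11-11", "2018-11-12"))
seen <- character(0)
for (d in dates) seen <- c(seen, class(d))
# seen is "numeric" "numeric"

# ...while indexing by position keeps it.
kept <- character(0)
for (i in seq_along(dates)) kept <- c(kept, class(dates[i]))
# kept is "Date" "Date"
```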
It may sound like a bandaid solution, but class coercions from common types like character are usually written well; I often pre-emptively coerce the object to character when I'm clueless about why my attempt to coerce a class failed.
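As a rough illustration of that advice for the question above, the sketch below builds a toy stand-in for the SP500 data frame (the values are invented, and the "null" entry is an assumption about what a downloaded file might contain), coerces the factor to character first, and then filters by year.

```r
# Toy stand-in for the downloaded data; all values are hypothetical.
SP500 <- data.frame(
  Date  = factor(c("2017-12-29", "2018-01-02", "null")),
  Close = c(100, 101, NA)
)

# Coerce factor -> character -> Date; entries that do not match the format become NA.
SP500$Date <- as.Date(as.character(SP500$Date), format = "%Y-%m-%d")

# Keep only the rows from 2018, ignoring unparseable dates.
in_2018 <- SP500[!is.na(SP500$Date) & format(SP500$Date, "%Y") == "2018", ]
```

If every value turns into NA, the format string most likely does not match the text in the file, so printing head(as.character(SP500$Date)) before converting is a quick check.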

Mixed Timed Data

I have a vector that contains time data, but there's a problem: some of the entries are listed as dates (e.g., 10/11/2017), while other entries are listed as dates with time (e.g., 12/15/2016 09:07:17). This is problematic for me, since as.Date() can't recognize the time portion and enters dates in an odd format (0012-01-20), while seemingly adding dates with time entries as NAs. Furthermore, using as.POSIXct() doesn't work, since not all entries are a combination of date with time.
I suspect that, since these entries are entered in a consistent format, I could hypothetically use an if function to change the entries in the vector to a consistent format, such as using an if statement to remove time entirely, but I don't know enough about it to get it to work.
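Use the lubridate package. Assume the data frame is called x and the column that holds the dates is called Date. The entries are in month/day/year order, so the parser has to come from the mdy family rather than ymd or ydm. mdy_hms() with truncated = 3 also accepts entries that have no time part:
library(lubridate)
x$newdate <- mdy_hms(x$Date, truncated = 3)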
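The same result can be sketched in base R without any packages. The idea, assuming every entry is month/day/year and the time, when present, is HH:MM:SS, is to pad date-only entries with midnight so that one format string covers the whole vector. The UTC time zone is an assumption made to keep the example reproducible.

```r
x <- c("10/11/2017", "12/15/2016 09:07:17")

# Entries without a colon have no time part; give them midnight.
has_time <- grepl(":", x, fixed = TRUE)
padded <- ifelse(has_time, x, paste(x, "00:00:00"))

# One format now fits every entry.
stamps <- as.POSIXct(padded, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# If only the calendar date matters, drop the time afterwards.
days <- as.Date(stamps)
format(days)
# "2017-10-11" "2016-12-15"
```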

R - Converting and joining integer and numeric variables

I have a problem with a join in R. I've tried to create a reproducible example, but every one I've created works as intended, so I don't know what I would need to recreate. The dput is too large to provide as a whole; is there a way I can attach a file?
It is a problem with joining on different data types, integer and numeric. Most of the join happens as expected, but some rows do not join. This was eventually solved by exporting the data to Excel, changing the offending numeric variable to the "Number" format with no decimal places, saving and importing back into R, where it is now an integer.
Is there an R equivalent of this step? as.integer() or as.numeric() did not provide the same result as opening in Excel and converting.
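One possible explanation, offered only as a guess since the data is not shown, is that some numeric keys carry tiny floating-point errors. Excel's "Number" format with no decimal places rounds, whereas as.integer() truncates toward zero, so the two can disagree. A small sketch with made-up values:

```r
# Hypothetical keys: they print as whole numbers but are not exactly whole.
keys <- c(1, 2.9999999999, 4.0000000001)

truncated <- as.integer(keys)         # truncates toward zero: 1 2 4
rounded   <- as.integer(round(keys))  # rounds first:          1 3 4

# Rows whose key is not exactly a whole number are the ones that fail to join.
not_whole <- keys != round(keys)
```

If not_whole flags rows in the real data, rounding before as.integer() would be the closest base R counterpart to the Excel step.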

How to convert date and time into a numeric value

As a new and self-taught R user I am struggling with converting date and time values stored as characters into numbers, so that I can group unique combinations of data. I'm hoping someone has come across this before and knows how I might go about it.
I'd like to convert a field of DateTime data (30/11/2012 14:35) to a numeric version of the date and time (seconds from 1970 maybe??) so that I can back reference the date and time if needed.
I have searched the R help and online help and only seem to be able to find POSIXct and strptime, which seem to convert the other way in the examples I've seen.
I will need to apply the conversion to a large dataset so I need to set the formatting for a field not an individual value.
I have tried to modify some python code but to no avail...
Any help with this, including pointers to tools I should read about would be much appreciated.
You can do this with base R just fine, but there are some shortcuts for common date formats in the lubridate package:
library(lubridate)
d <- dmy_hm("30/11/2012 14:35")
> as.numeric(d)
[1] 1354286100
From ?POSIXct:
Class "POSIXct" represents the (signed) number of seconds since the
beginning of 1970 (in the UTC timezone) as a numeric vector.
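For a whole column rather than a single value, a base R sketch might look like the following. The data frame df and its DateTime column are hypothetical names, and UTC is assumed so that the numbers do not depend on the local time zone.

```r
# Hypothetical data frame with day/month/year hour:minute strings.
df <- data.frame(
  DateTime = c("30/11/2012 14:35", "01/12/2012 09:00"),
  stringsAsFactors = FALSE
)

# Parse the whole column at once, then take the seconds since 1970-01-01 UTC.
df$stamp <- as.POSIXct(df$DateTime, format = "%d/%m/%Y %H:%M", tz = "UTC")
df$secs  <- as.numeric(df$stamp)

# Going back from the number to the original text form.
back <- as.POSIXct(df$secs, origin = "1970-01-01", tz = "UTC")
round_trip <- format(back, "%d/%m/%Y %H:%M")
```

Because the numeric value is just the underlying representation of the POSIXct column, grouping on df$secs and grouping on df$stamp give the same groups.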

Resources