Getting mysterious NA's when trying to parse date - r

I have not had experience with using dates in R. I have read all of the docs but I still can't figure out why I am getting this error. I am trying to take a vector of strings and convert it into a vector of dates, using a specified format. I have tried both using a for loop to convert each date individually and using vector functions like sapply, but neither works. Here is the code using a for loop:
dates = rawData[, ind]  # get vector of date strings
print("single date example")
print(as.Date(dates[1]))
dDates = rep(1, length(dates))  # initialize vector of dates
class(dDates) = "Date"
for (i in 1:length(dates)) {
  dDates[i] = as.Date(dates[i])
}
print(dDates[1:10])
EDIT: info on the "dates" variable
[1] "dates"
V16 V17 V18 V19 V36
[1,] "2014-01-16" "2014-01-30" "2014-01-16" "2014-01-17" "1999-03-16 12:00"
[2,] "2014-01-04" "2014-01-18" "2014-01-04" "2014-01-08" "1998-09-04 12:00"
[3,] "2014-03-05" "2014-03-19" "2014-03-05" "2014-03-07" "1996-09-30 05:00"
[4,] "2014-01-21" "2014-02-04" "2014-01-22" "2014-01-24" "1995-08-21 12:00"
[5,] "2014-01-07" "2014-01-21" "2014-01-07" "2014-01-09" "1994-04-07 12:00"
[1] "class(dates)"
[1] "matrix"
[1] "class(dates[1,1])"
[1] "character"
[1] "dim(dates)"
[1] 56557 8
The result I am getting is as follows:
[1] "single date example"
[1] "2014-01-16"
Error in charToDate(x) :
character string is not in a standard unambiguous format
So basically, when I try to parse a single element of the date string into a date, it works fine. But when I try to parse the dates in a loop, it breaks. How could this be?
The reason I am using a loop instead of sapply is that sapply was returning an even stranger result. When I try to run:
dDates = sapply(dDates, function(x) as.Date(x, format = "%Y-%m-%d"))
I am getting the following output:
2014-01-16 2014-01-04 2014-03-05 2014-01-21 2014-01-07 2014-01-02 2014-01-08
NA NA NA NA NA NA NA
2014-02-22 2014-01-09 2014-02-22
NA NA NA
Which is very strange. As you can see, since my format was correct, it was able to parse out the dates. But for some reason, it is also giving a time value of NA (or at least that is what I think the NA means). Maybe this is happening because some of my date strings have times, while others don't. But the thing is I left the time out of the format because I don't care about time.
Does anyone know why this is happening or how to fix it? I can't find anywhere online where you can "set" the time value of a date object easily -- I just can't seem to get rid of that NA. And somehow even a for loop doesn't work! Either way, the output is strange and I am not getting the expected results, even though my format is correct. Very frustrating that a simple thing like parsing a vector of dates is so much more difficult than in Matlab or Java.
Any help please?
EDIT: when I try simply
dDates = as.Date(dates,format="%m/%d/%Y")
I get the output
"dDates[1:10]"
[1] NA NA NA NA NA NA NA NA NA NA
still those mysterious NA's. I am also getting an error
Error in as.Date.default(value) :
do not know how to convert 'value' to class “Date”

Using a subset of your data,
v <- c("2014-01-16", "2014-01-30", "2014-01-16", "2014-01-17", "1999-03-16 12:00")
these statements are equivalent, since your format is the default one:
as.Date(v)
[1] "2014-01-16" "2014-01-30" "2014-01-16" "2014-01-17" "1999-03-16"
as.Date(v, format = "%Y-%m-%d")
[1] "2014-01-16" "2014-01-30" "2014-01-16" "2014-01-17" "1999-03-16"
If you would like to format the output of your date, use format:
format(as.Date(v), format = "%m/%d/%Y")
[1] "01/16/2014" "01/30/2014" "01/16/2014" "01/17/2014" "03/16/1999"
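Since as.Date is already vectorized, the whole matrix from the question can be converted column by column without any explicit loop. A sketch with a small stand-in matrix (the real one is 56557 x 8); note that lapply is used rather than sapply, because sapply simplifies the result to a plain numeric vector and silently drops the Date class:

```r
# Small stand-in for the question's character matrix of date strings
dates <- matrix(c("2014-01-16", "2014-01-04",
                  "1999-03-16 12:00", "1998-09-04 12:00"),
                ncol = 2)

# One vectorized as.Date call per column; lapply preserves the Date class,
# whereas sapply would simplify the result and strip it
dDates <- lapply(seq_len(ncol(dates)), function(j) as.Date(dates[, j]))
dDates[[2]]
## [1] "1999-03-16" "1998-09-04"
```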

Related

as.POSIXct(%Y-%m-%dT%H:%M:S results in NAs

I have data that has a FixDateTime column (head below) where it is a character
head(df$FixDateTime)
[1] "2017-03-15 15:00:04" "2017-03-16 14:00:48" "2017-03-17 13:00:22"
[4] "2017-03-18 12:00:47" "2017-03-19 11:01:00" "2017-03-20 10:00:47"
class(df$FixDateTime)
[1] "character"
Using the code below I try to convert it with as.POSIXct, and the resulting column is full of NAs. I know that there are no NAs in my dataset.
df$DateTime<-as.POSIXct(df$FixDateTime, format="%Y-%m%-dT%H:%M:%S", tz="MST")
head(df$DateTime)
[1] NA NA NA NA NA NA
I have also run the code in the same way omitting the "T" (with a space instead) and it results in the same thing.
I have played with the timezone, and this does not seem to be the issue. I just need a column in the POSIXct format containing date and time.
You can use a tidyverse approach:
lubridate::ymd_hms("2017-03-17 13:00:22",tz = "MST")
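For completeness, the base as.POSIXct call also works here once two things in the format string are fixed: the stray % before the hyphen (`%Y-%m%-d` should be `%Y-%m-%d`), and the literal T, since the data actually separates date and time with a space. A sketch on the strings from the question:

```r
x <- c("2017-03-15 15:00:04", "2017-03-16 14:00:48")

# Corrected format: %m-%d (not %m%-d) and a space instead of T
dt <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "MST")
dt
```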

as.numeric returns NA for no apparent reason for some of the values in a column

While trying to convert a column of characters (strings of numbers, e.g "0.1234") into numeric, using as.numeric, some of the values are returned NA with the warning 'NAs introduced by coercion'. The characters that are returned as NAs don't seem to be any different from the ones that are returned as numeric correctly. Does anyone know what can be the problem?
Already tried looking for any non-numeric characters (such as ',') that could be hiding inside some of the values. I did find strings containing '-' (e.g. "-0.123") that really did turn into NAs, but these are only some of the strings turned into NAs. I also tried looking for spaces inside the strings; that doesn't seem to be the problem either.
data$y
[1] "0.833250539" "0.820323535" "0.462284612" "0.792943985" "0.860587952" "0.729665177" "0.461503956" "0.625871118"
[9] "0.740999346" "0.962727964" "0.971089266" "0.869004848" "0.828651766" "0.900648732" "0.970326033" "0.898123286"
[17] "0.911640765" "0.902442126" "0.843392097" "0.763421844" "0.892426243" "0.380433624" "0.925017633" "0.725470821"
[25] "0.699924767" "0.689061225" "0.907462936" "0.888064239" "0.913547115" "-‬0.625103904‭" "0.897385961" "0.889727462"
[33] "0.90127339" "0.947012474" "0.948883588" "0.845845512" "0.97866966" "0.796247738" "0.864627056" "0.266656189‭"
[41] "0.894915463" "0.969690678" "0.771365656‭" "0.88304436" "0.954039006" "0.836952199" "0.731558669‭" "0.907224294"
[49] "0.622059127" "0.887742343" "0.917550343" "0.97240334‭" "0.902841957" "0.617403052" "0.82926708" "0.674903846"
[57] "0.947132958" "0.929213613‭" "-‬0.297844476" "0.871767367"
y = as.numeric(data$y)
Warning message:
NAs introduced by coercion
y
[1] 0.8332505 0.8203235 0.4622846 0.7929440 0.8605880 0.7296652 0.4615040 0.6258711 0.7409993 0.9627280 0.9710893 0.8690048 0.8286518
[14] 0.9006487 0.9703260 0.8981233 0.9116408 0.9024421 0.8433921 0.7634218 0.8924262 0.3804336 0.9250176 0.7254708 0.6999248 0.6890612
[27] 0.9074629 0.8880642 0.9135471 NA 0.8973860 0.8897275 0.9012734 0.9470125 0.9488836 0.8458455 0.9786697 0.7962477 0.8646271
[40] NA 0.8949155 0.9696907 NA 0.8830444 0.9540390 0.8369522 NA 0.9072243 0.6220591 0.8877423 0.9175503 NA
[53] 0.9028420 0.6174031 0.8292671 0.6749038 0.9471330 NA NA 0.8717674
Your strings contain some non-ASCII characters. If you are certain that it is safe to remove them, use
as.numeric(iconv(data$y, 'utf-8', 'ascii', sub=''))
Copying and pasting your string gives me (for the example of the last NA) "-,0.297844476". There is something wrong with the encoding. You can work around it by using
as.numeric(gsub(",", "", data$y))
edit: This answer does not work on all your NAs... I don't really know what is going on with your data; please provide a dput if possible.
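One blunt but effective workaround (a sketch; it assumes the strings should contain nothing but digits, signs, exponents, and a decimal point) is to strip every character that cannot appear in a plain number before coercing. The invisible characters shown here, U+202C and U+202D, are Unicode directional marks of the kind that often ride along with copy-pasted data:

```r
# The second value carries invisible directional marks (U+202C / U+202D)
y <- c("0.833250539", "\u202c0.297844476\u202d")
as.numeric(y)   # the second element coerces to NA with a warning

# Keep only characters that can appear in a plain decimal number
clean <- as.numeric(gsub("[^0-9eE.+-]", "", y))
clean
```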

Performance Analytics error Error in na.omit.xts(x) : unsupported type

I have a data frame "test" that looks like this. There are no NA or Inf values; all days are populated with data.
head(test)
businessdate strategy 1 Strategy 2
1 2014-01-01 0.000000000 0.0000000
2 2014-01-02 0.010058520 -0.3565398
3 2014-01-03 0.000707818 0.2622737
4 2014-01-06 -0.019879142 -0.2891257
5 2014-01-07 -0.019929352 -0.2271491
6 2014-01-08 0.027108810 -0.7827856
When I look at the class of the those columns I see:
> class(test[,1])
[1] "POSIXct" "POSIXt"
> class(test[,2])
[1] "numeric"
> class(test[,3])
[1] "numeric"
So I think I can turn this into an xts object and use PerformanceAnalytics. Here I turn it into an xts object:
test_xts<- xts(test, order.by= test[,1])
Now I try to use the PerformanceAnalytics package and get an error:
charts.PerformanceSummary(test_xts,geometric= TRUE,cex.axis=1.5)
The error I get is:
Error in na.omit.xts(x) : unsupported type
Any idea what is happening and how to fix it?
xts/zoo objects are a matrix with an index attribute, and you can't mix types in a matrix. There's no need to have businessdate both as the index and in the coredata, so do not include it in the coredata.
test_xts <- xts(test[,-1], order.by= test[,1])
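A quick way to see the problem without even loading xts (a sketch with made-up numbers shaped like the question's data): xts stores its coredata as a matrix, and as.matrix on a data frame that mixes a POSIXct column with numeric columns falls back to character, which na.omit.xts() then rejects:

```r
test <- data.frame(businessdate = as.POSIXct(c("2014-01-01", "2014-01-02")),
                   Strategy1 = c(0.000000000, 0.010058520),
                   Strategy2 = c(0.0000000, -0.3565398))

# Mixing POSIXct and numeric forces the whole matrix to character...
storage.mode(as.matrix(test))        # "character"

# ...while dropping the date column keeps it numeric, as xts needs
storage.mode(as.matrix(test[, -1]))  # "double"
```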

Character to date with as.Date

I have a vector (length=1704) of character like this:
[1] "1871_01" "1871_02" "1871_03" "1871_04" "1871_05" "1871_06" "1871_07" "1871_08" "1871_09" "1871_10" "1871_11" "1871_12"
[13] "1872_01" "1872_02" "1872_03" ...
.
.
.
[1681] "2011_01" "2011_02" "2011_03" "2011_04" "2011_05" "2011_06" "2011_07" "2011_08" "2011_09" "2011_10" "2011_11" "2011_12"
[1693] "2012_01" "2012_02" "2012_03" "2012_04" "2012_05" "2012_06" "2012_07" "2012_08" "2012_09" "2012_10" "2012_11" "2012_12"
I want to convert this vector into a vector of dates.
For that I use:
as.Date(vector, format="%Y_%m")
But it returns "NA"
I tried for one value:
b <- "1871_01"
as.Date(b, format="%Y_%m")
[1] NA
strptime(b, "%Y_%m")
[1] NA
I don't understand why it doesn't work...
Does anyone have a clue?
If you do regular work in year+month format, the zoo package can come in handy since it treats yearmon as a first class citizen (and is compatible with Date objects/functions):
library(zoo)
my.ym <- as.yearmon("1871_01", format="%Y_%m")
print(my.ym)
## [1] "Jan 1871"
str(my.ym)
## Class 'yearmon' num 1871
my.date <- as.Date(my.ym)
print(my.date)
## [1] "1871-01-01"
str(my.date)
## Date[1:1], format: "1871-01-01"
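If you'd rather stay in base R: the NA comes from the fact that a Date needs a day component, and "%Y_%m" doesn't supply one, so strptime gives up. A sketch that simply appends a day before parsing:

```r
v <- c("1871_01", "1871_02", "2012_12")

# Append a day-of-month so the string fully specifies a date
as.Date(paste0(v, "_01"), format = "%Y_%m_%d")
## [1] "1871-01-01" "1871-02-01" "2012-12-01"
```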

R's strptime of a time datetime at midnight (00:00:00) gives NA

R's base strptime function is giving me output I do not expect.
This works as expected:
strptime(20130203235959, "%Y%m%d%H%M%S")
# yields "2013-02-03 23:59:59"
This too:
strptime(20130202240000, "%Y%m%d%H%M%S")
# yields "2013-02-03"
...but this does not. Why?
strptime(20130203000000, "%Y%m%d%H%M%S")
# yields NA
UPDATE
The value 20130204000000 showed up in a log I generated on a Mac 10.7.5 system using the command:
echo `date +"%Y%m%d%H%M%S"`
20130204000000
UPDATE 2
I even tried lubridate, which seems to be the recommendation:
> parse_date_time(c(20130205000001), c("%Y%m%d%H%M%S"))
1 parsed with %Y%m%d%H%M%S
[1] "2013-02-05 00:00:01 UTC"
> parse_date_time(c(20130205000000), c("%Y%m%d%H%M%S"))
1 failed to parse.
[1] NA
...and then funnily enough, it printed out "00:00:00" when I added enough seconds to now() to reach midnight:
> now() + new_duration(13000)
[1] "2013-02-10 00:00:00 GMT"
I should use character and not numeric when I parse my dates:
> strptime(20130203000000, "%Y%m%d%H%M%S") # No!
[1] NA
> strptime("20130203000000", "%Y%m%d%H%M%S") # Yes!
[1] "2013-02-03"
The reason for this seems to be that my numeric value gets cast to character, and with one too many digits it switches to scientific notation:
> as.character(201302030000)
[1] "201302030000"
> as.character(2013020300000)
[1] "2013020300000"
> as.character(20130203000000)
[1] "2.0130203e+13" # This causes the error: it doesn't fit "%Y%m%d%H%M%S"
> as.character(20130203000001)
[1] "20130203000001" # And this is why anything other than 000000 worked.
A quick lesson in figuring out the type you need from the docs: in R, execute help(strptime). The usage section shows the main argument to the function but does not specify its type (which is why I just tried numeric). The type is given in the document's title, "Date-time Conversion Functions to and from Character".
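If the timestamp arrives as a numeric anyway, one workaround (a sketch) is to render it in fixed notation yourself before parsing, e.g. with sprintf, so the scientific-notation cast never happens:

```r
x <- 20130203000000             # numeric with too many digits for as.character

as.character(x)                 # "2.0130203e+13" -- strptime can't parse this
s <- sprintf("%.0f", x)         # "20130203000000" -- full fixed notation
strptime(s, "%Y%m%d%H%M%S")
## [1] "2013-02-03"
```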
You are essentially asking for the "zeroth" second, which obviously doesn't exist :)
# last second of February 3rd
strptime(20130203235959, "%Y%m%d%H%M%S")
# first second of February 4th -- notice this 'rounds up' to Feb 4th
# even though it says February 3rd
strptime(20130203240000, "%Y%m%d%H%M%S")
# no such second
strptime(20130204000000, "%Y%m%d%H%M%S")
# second second of February 4th
strptime(20130204000001, "%Y%m%d%H%M%S")
