as.POSIXct with format %Y-%m-%dT%H:%M:%S results in NAs

I have data with a FixDateTime column (head below) that is stored as character:
head(df$FixDateTime)
[1] "2017-03-15 15:00:04" "2017-03-16 14:00:48" "2017-03-17 13:00:22"
[4] "2017-03-18 12:00:47" "2017-03-19 11:01:00" "2017-03-20 10:00:47"
class(df$FixDateTime)
[1] "character"
Using the code below I try to convert it with as.POSIXct, but the resulting column is full of NAs. I know that there are no NAs in my dataset.
df$DateTime<-as.POSIXct(df$FixDateTime, format="%Y-%m%-dT%H:%M:%S", tz="MST")
head(df$DateTime)
[1] NA NA NA NA NA NA
I have also run the code in the same way omitting the "T" (with a space instead), and it gives the same result.
I have played with the timezone, and this does not seem to be the issue. I just need a column in the POSIXct format containing date and time.

You can use a tidyverse approach
lubridate::ymd_hms("2017-03-17 13:00:22",tz = "MST")
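In base R, the format string has to mirror the text exactly. The sample values use a space between date and time rather than a "T", and the posted format contains "%m%-d" where "%m-%d" was presumably intended. A minimal sketch, assuming the strings look like the head() output in the question (the vector name fix is made up for illustration):

```r
# Hypothetical sample copied from the head() output in the question
fix <- c("2017-03-15 15:00:04", "2017-03-16 14:00:48")

# The format mirrors the strings: dashes in the date, a space, colons in the time
dt <- as.POSIXct(fix, format = "%Y-%m-%d %H:%M:%S", tz = "MST")

class(dt)   # "POSIXct" "POSIXt"
anyNA(dt)   # FALSE when every string matches the format
```

If any element still comes back NA, that element most likely deviates from the pattern and is worth printing on its own.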

Related

as.numeric returns NA for no apparent reason for some of the values in a column

While trying to convert a column of characters (strings of numbers, e.g. "0.1234") to numeric using as.numeric, some of the values are returned as NA with the warning 'NAs introduced by coercion'. The strings that are returned as NA don't seem to be any different from the ones that are converted correctly. Does anyone know what the problem could be?
I already tried looking for non-numeric characters (such as ',') that might be hiding inside some of the values. I did find strings containing '-' (e.g. "-0.123") that really turned into NAs, but these are only some of the strings that turned into NAs. I also looked for spaces inside the strings; that doesn't seem to be the problem either.
data$y
[1] "0.833250539" "0.820323535" "0.462284612" "0.792943985" "0.860587952" "0.729665177" "0.461503956" "0.625871118"
[9] "0.740999346" "0.962727964" "0.971089266" "0.869004848" "0.828651766" "0.900648732" "0.970326033" "0.898123286"
[17] "0.911640765" "0.902442126" "0.843392097" "0.763421844" "0.892426243" "0.380433624" "0.925017633" "0.725470821"
[25] "0.699924767" "0.689061225" "0.907462936" "0.888064239" "0.913547115" "-‬0.625103904‭" "0.897385961" "0.889727462"
[33] "0.90127339" "0.947012474" "0.948883588" "0.845845512" "0.97866966" "0.796247738" "0.864627056" "0.266656189‭"
[41] "0.894915463" "0.969690678" "0.771365656‭" "0.88304436" "0.954039006" "0.836952199" "0.731558669‭" "0.907224294"
[49] "0.622059127" "0.887742343" "0.917550343" "0.97240334‭" "0.902841957" "0.617403052" "0.82926708" "0.674903846"
[57] "0.947132958" "0.929213613‭" "-‬0.297844476" "0.871767367"
y = as.numeric(data$y)
Warning message:
NAs introduced by coercion
y
[1] 0.8332505 0.8203235 0.4622846 0.7929440 0.8605880 0.7296652 0.4615040 0.6258711 0.7409993 0.9627280 0.9710893 0.8690048 0.8286518
[14] 0.9006487 0.9703260 0.8981233 0.9116408 0.9024421 0.8433921 0.7634218 0.8924262 0.3804336 0.9250176 0.7254708 0.6999248 0.6890612
[27] 0.9074629 0.8880642 0.9135471 NA 0.8973860 0.8897275 0.9012734 0.9470125 0.9488836 0.8458455 0.9786697 0.7962477 0.8646271
[40] NA 0.8949155 0.9696907 NA 0.8830444 0.9540390 0.8369522 NA 0.9072243 0.6220591 0.8877423 0.9175503 NA
[53] 0.9028420 0.6174031 0.8292671 0.6749038 0.9471330 NA NA 0.8717674
Your strings contain some non-ASCII characters (invisible ones, so the strings look normal when printed). If you are certain that it is safe to remove them, use
as.numeric(iconv(data$y, 'utf-8', 'ascii', sub=''))
Copying and pasting your string gives me (for the example of the last NA) "-,0.297844476". There is something wrong with the encoding. You can work around it by using
as.numeric(gsub(",","",data$y))
Edit: This answer does not work on all your NAs... I don't really know what is going on with your data; please provide a dput if possible.
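One way to locate and strip the hidden characters without relying on iconv is a whitelist regex. The sketch below is an alternative to the answers above, not a restatement of them; the vector x is a hypothetical stand-in for data$y, and the assumption that the culprits are the invisible marks U+202C and U+202D is a guess based on the pasted output:

```r
# Hypothetical values imitating the question; \u202c and \u202d print as nothing
x <- c("0.833250539", "-\u202c0.297844476", "0.266656189\u202d")

# Flag entries containing anything other than digits, sign, dot or exponent
bad <- grepl("[^0-9eE.+-]", x)

# Drop every character outside that whitelist, then convert
y <- as.numeric(gsub("[^0-9eE.+-]", "", x))
```

Inspecting x[bad] before cleaning shows which entries are affected, which helps confirm that removing the characters is safe.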

Weird conversion from list to dataframe in R

I have a list that I created from a for loop and it looks like this:
I tried to convert it to a dataframe using the code:
dflist<- as.data.frame(mylist)
But my dataframe looks like this now:
I know I probably created my list wrong but I am thinking this is still salvageable if I just need to convert the numbers to a dataframe correctly.
My end goal is to plot the numbers against their index (1-30) and I thought creating a dataframe first to clean it up and then plot would be helpful.
Any help would be really appreciated. Thank you.
The data shown is a list. Based on the image in the OP's post, each list element has a length of 1. With unlist we convert the list to a vector, which we then wrap in data.frame.
data.frame(ind= seq_along(lst), Col1= as.numeric(unlist(lst)))
Or another option would be stack after naming the list elements
df1 <- transform(stack(setNames(lst, seq_along(lst))),
values = as.numeric(values))
It gives a two column dataset. From this we can do the plotting
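A small end-to-end sketch of the unlist route, assuming a list of 30 length-one elements like the one described in the question (lst here is a made-up stand-in, since the original data was only shown as an image):

```r
# Hypothetical stand-in for the OP's list: 30 elements, each of length 1
lst <- as.list((1:30) / 10)

# Flatten the list and pair each value with its position
df <- data.frame(ind = seq_along(lst), Col1 = as.numeric(unlist(lst)))

dim(df)   # 30 2

# Plot value against index (only when a graphics device is available interactively)
if (interactive()) plot(df$ind, df$Col1, type = "b", xlab = "index", ylab = "value")
```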
As for the OP's approach of calling as.data.frame directly on the list: it behaves differently because it dispatches to as.data.frame.list. For comparison, if we call as.data.frame on a vector, it uses as.data.frame.vector
as.data.frame(1:5)
# 1:5
#1 1
#2 2
#3 3
#4 4
#5 5
But, if we call as.data.frame.list
as.data.frame.list(1:5)
# X1L X2L X3L X4L X5L
#1 1 2 3 4 5
we get a data.frame with 'n' columns (based on the length of the vector).
Suppose, we do the same on a list
as.data.frame(as.list(1:5))
# X1L X2L X3L X4L X5L
#1 1 2 3 4 5
It uses the as.data.frame.list. To get the complete list of methods of as.data.frame,
methods('as.data.frame')
#[1] as.data.frame.aovproj* as.data.frame.array
# [3] as.data.frame.AsIs as.data.frame.character
# [5] as.data.frame.chron* as.data.frame.complex
# [7] as.data.frame.data.frame as.data.frame.data.table*
# [9] as.data.frame.Date as.data.frame.dates*
#[11] as.data.frame.default as.data.frame.difftime
#[13] as.data.frame.factor as.data.frame.ftable*
#[15] as.data.frame.function* as.data.frame.grouped_df*
#[17] as.data.frame.idf* as.data.frame.integer
#[19] as.data.frame.ITime* as.data.frame.list <-------
#[21] as.data.frame.logical as.data.frame.logLik*
#[23] as.data.frame.matrix as.data.frame.model.matrix
#[25] as.data.frame.noquote as.data.frame.numeric
#[27] as.data.frame.numeric_version as.data.frame.ordered
#[29] as.data.frame.POSIXct as.data.frame.POSIXlt
#[31] as.data.frame.raw as.data.frame.rowwise_df*
#[33] as.data.frame.table as.data.frame.tbl_cube*
#[35] as.data.frame.tbl_df* as.data.frame.tbl_dt*
#[37] as.data.frame.tbl_sql* as.data.frame.times*
#[39] as.data.frame.ts as.data.frame.vector

How to insert a new element to a vector?

I have a vector like this (1 x 2406):
head(lnreturn)
[1] NA 0.004002188 0.003262646 -0.009454616 0.001460387
[6] 0.004005103
I would like to insert an NA as a first element so that I could reach a vector like this:
[1] NA NA 0.004002188 0.003262646 -0.009454616
[6] 0.001460387
That is, I would end up with a vector of dimension 1 x 2407.
Just use c()
x<-rnorm(10)
x<-c(NA,x)
x
[1] NA -0.004620768 0.760242168 0.038990913 0.735072142 -0.146472627
[7] -0.057887335 0.482369466 0.992943637 -1.246395498 -0.033487525
It's easy (as etienne posted).
If you want the result to have the same length as the original vector (as in your example), you can truncate it using length().
x<-rnorm(10)
x<-c(NA,x)[1:length(x)]
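Base R also has append(), which takes an after argument and so covers insertion at any position, not only the front. A brief sketch with made-up values:

```r
# Made-up values standing in for the returns vector
x <- c(0.004, 0.003, -0.009)

y <- append(x, NA, after = 0)   # NA goes before the first element
z <- append(x, NA, after = 2)   # NA goes after the second element

length(y)   # 4
```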

Getting mysterious NA's when trying to parse date

I have not had experience with using dates in R. I have read all of the docs but I still can't figure out why I am getting this error. I am trying to take a vector of strings and convert it into a vector of dates, using some specified format. I have tried both using for loops to convert each date individually and using vector functions like sapply, but neither is working. Here is the code using for loops:
dates = rawData[,ind] # get vector of date strings
print("single date example")
print(as.Date(dates[1]))
dDates = rep(1,length(dates)) # initialize vector of dates
class(dDates)="Date"
for (i in 1:length(dates)){
dDates[i]=as.Date(dates[i])
}
print(dDates[1:10])
EDIT: info on "dates" variables
[1] "dates"
V16 V17 V18 V19 V36
[1,] "2014-01-16" "2014-01-30" "2014-01-16" "2014-01-17" "1999-03-16 12:00"
[2,] "2014-01-04" "2014-01-18" "2014-01-04" "2014-01-08" "1998-09-04 12:00"
[3,] "2014-03-05" "2014-03-19" "2014-03-05" "2014-03-07" "1996-09-30 05:00"
[4,] "2014-01-21" "2014-02-04" "2014-01-22" "2014-01-24" "1995-08-21 12:00"
[5,] "2014-01-07" "2014-01-21" "2014-01-07" "2014-01-09" "1994-04-07 12:00"
[1] "class(dates)"
[1] "matrix"
[1] "class(dates[1,1])"
[1] "character"
[1] "dim(dates)"
[1] 56557 8
The result I am getting is as follows:
[1] "single date example"
[1] "2014-01-16"
Error in charToDate(x) :
character string is not in a standard unambiguous format
So basically, when I try to parse a single element of the date string into a date, it works fine. But when I try to parse the dates in a loop, it breaks. How could this be so?
The reason why I am using a loop instead of sapply is because that was returning an even stranger result. When I try to run:
dDates = sapply(dDates, function(x) as.Date(x, format = "%Y-%m-%d"))
I am getting the following output:
2014-01-16 2014-01-04 2014-03-05 2014-01-21 2014-01-07 2014-01-02 2014-01-08
NA NA NA NA NA NA NA
2014-02-22 2014-01-09 2014-02-22
NA NA NA
Which is very strange. As you can see, since my format was correct, it was able to parse out the dates. But for some reason, it is also giving a time value of NA (or at least that is what I think the NA means). Maybe this is happening because some of my date strings have times, while others don't. But the thing is I left the time out of the format because I don't care about time.
Does anyone know why this is happening or how to fix it? I can't find anywhere online where you can "set" the time value of a date object easily -- I just can't seem to get rid of that NA. And somehow even a for loop doesn't work! Either way, the output is strange and I am not getting the expected results, even though my format is correct. Very frustrating that a simple thing like parsing a vector of dates is so much more difficult than in MATLAB or Java.
Any help please?
EDIT: when I try simply
dDates = as.Date(dates,format="%m/%d/%Y")
I get the output
"dDates[1:10]"
[1] NA NA NA NA NA NA NA NA NA NA
still those mysterious NA's. I am also getting an error
Error in as.Date.default(value) :
do not know how to convert 'value' to class “Date”
Using a subset of your data,
v <- c("2014-01-16", "2014-01-30", "2014-01-16", "2014-01-17", "1999-03-16 12:00")
these statements are equivalent, since your format is the default one:
as.Date(v)
[1] "2014-01-16" "2014-01-30" "2014-01-16" "2014-01-17" "1999-03-16"
as.Date(v, format = "%Y-%m-%d")
[1] "2014-01-16" "2014-01-30" "2014-01-16" "2014-01-17" "1999-03-16"
If you would like to format the output of your date, use format:
format(as.Date(v), format = "%m/%d/%Y")
[1] "01/16/2014" "01/30/2014" "01/16/2014" "01/17/2014" "03/16/1999"
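The NAs in the question's last attempt are consistent with a format mismatch: format= describes how the input strings look, not how the output should be printed. A short sketch, using made-up values shaped like the question's data (the matrix m is hypothetical):

```r
v <- c("2014-01-16", "1999-03-16 12:00")

# The format describes the INPUT; these strings are not month/day/year, so all NA
bad <- as.Date(v, format = "%m/%d/%Y")

# A matching format parses; the trailing " 12:00" is ignored
good <- as.Date(v, format = "%Y-%m-%d")

# Hypothetical character matrix: convert one column at a time, no loop needed
m <- matrix(v, nrow = 1)
col_dates <- as.Date(m[, 2], format = "%Y-%m-%d")
```

Because as.Date is vectorised, neither the for loop nor sapply is required; sapply in particular drops the Date class from its result.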

Character to date with as.Date

I have a vector (length=1704) of character like this:
[1] "1871_01" "1871_02" "1871_03" "1871_04" "1871_05" "1871_06" "1871_07" "1871_08" "1871_09" "1871_10" "1871_11" "1871_12"
[13] "1872_01" "1872_02" "1872_03" ...
.
.
.
[1681] "2011_01" "2011_02" "2011_03" "2011_04" "2011_05" "2011_06" "2011_07" "2011_08" "2011_09" "2011_10" "2011_11" "2011_12"
[1693] "2012_01" "2012_02" "2012_03" "2012_04" "2012_05" "2012_06" "2012_07" "2012_08" "2012_09" "2012_10" "2012_11" "2012_12"
I want to convert this vector into a vector of dates.
For that I use:
as.Date(vector, format="%Y_%m")
But it returns "NA"
I tried for one value:
b <- "1871_01"
as.Date(b, format="%Y_%m")
[1] NA
strptime(b, "%Y_%m")
[1] NA
I don't understand why it doesn't work...
Does anyone have a clue?
If you do regular work in year+month format, the zoo package can come in handy since it treats yearmon as a first class citizen (and is compatible with Date objects/functions):
library(zoo)
my.ym <- as.yearmon("1871_01", format="%Y_%m")
print(my.ym)
## [1] "Jan 1871"
str(my.ym)
## Class 'yearmon' num 1871
my.date <- as.Date(my.ym)
print(my.date)
## [1] "1871-01-01"
str(my.date)
## Date[1:1], format: "1871-01-01"
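As for why the original call returns NA: a Date needs a day, and "%Y_%m" does not supply one. Without zoo, a common workaround is to paste a day onto each string before parsing. A minimal base R sketch, assuming the first of the month is an acceptable placeholder:

```r
b <- c("1871_01", "2012_12")

# Append a dummy day so the string carries a complete date
d <- as.Date(paste0(b, "_01"), format = "%Y_%m_%d")

format(d, "%Y_%m")   # back to the original year_month labels
```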
