nanotime: how to deal with NAs in a tibble? - r

I am using the excellent nanotime package to store my timestamps, but I am unable to make the package work when my tibble contains a missing value.
Consider this:
library(nanotime)
library(tibble)
library(dplyr)
tibble(time = c('2020-01-01 10:10:10.123456',
NA,
'2020-01-01 10:10:10.123456')) %>%
mutate(enhance = nanotime(time,
tz = 'GMT',
format = '%Y-%m-%d %H:%M:%E9S'))
Error in RcppCCTZ::parseDouble(x, fmt = format, tz = tz) :
Parse error on NA
What am I missing here? Using na.rm = TRUE does not work unfortunately.
Thanks!

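The NA is not the wrong type here: inside c() with strings it is already a character NA. The error comes from the parser that nanotime calls (RcppCCTZ::parseDouble, as the message shows), which stops on missing input. One workaround is to pre-fill the new column with integer64 NAs via as.integer64, parse only the non-missing strings into those positions, and convert the result to nanotime at the end.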
library(nanotime)
tbl <- tibble::tibble(time = c('2020-01-01 10:10:10.123456',
NA,
'2020-01-01 10:10:10.123456'))
tbl$enhance <- bit64::as.integer64(NA) # integer64 NA placeholder; bit64 is a dependency of nanotime
tbl$enhance[!is.na(tbl$time)] <- nanotime(na.omit(tbl$time), tz = 'GMT',
format = '%Y-%m-%d %H:%M:%E9S')
nanotime(tbl$enhance)

Related

read.zoo is not returning needed date format

My initial data is in %y-%m-%d format.
Using the code
returnsgamma <- read.zoo(returns, header = TRUE, sep = ",", FUN = as.chron)
the resulting zoo object shows the dates in the order %m/%d/%y.
Is there any way to use read.zoo and keep the dates in the order %y/%m/%d or %d/%m/%y?
Assuming the input shown in the Note at the end, we can either use the default Date class, which renders as yyyy-mm-dd, or use chron with chron(..., out.format = "y-m-d"), which renders as yy-mm-dd.
library(zoo)
read.csv.zoo(text = Lines, format = "%y-%m-%d")
## 2022-12-01
## 34
library(chron)
toChron <- function(x) as.chron(x, out.format = "y-m-d")
read.csv.zoo(text = Lines, FUN = toChron)
## 22-12-01
## 34
Note
Lines <- "date,value
22-12-01,34"
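For the %y/%m/%d and %d/%m/%y orders mentioned in the question, a minimal base R sketch: the stored value stays a Date, and only its printed form changes through format(). The variable names are illustrative, and the input string is the one from the Note.

```r
# Parse a two-digit-year string into a Date, then render it in other orders.
d <- as.Date("22-12-01", format = "%y-%m-%d")

ymd <- format(d, "%y/%m/%d")  # year/month/day
dmy <- format(d, "%d/%m/%y")  # day/month/year

print(c(ymd, dmy))
```

Note that format() returns character strings, so this is best applied only when displaying or exporting the index, not when storing it.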

How to use the as.POSIXlt() and solve the error of character string is not in a standard unambiguous format

I was working on an assignment,
library(tidyverse)
library(quantmod)
library(lubridate)
macro <- c("GDPC1", "CPIAUCSL","DTB3", "DGS10", "DAAA", "DBAA", "UNRATE", "INDPRO", "DCOILWTICO")
rm(macro_factors)
for (i in 1:length(macro)){
getSymbols(macro[i], src = "FRED")
data <- as.data.frame(get(macro[i]))
data$date <- as.POSIXlt.character(rownames(data))
rownames(data) <- NULL
colnames(data)[1] <- "macro_value"
data$quarter <- as.yearqtr(data$date)
data$macro_ticker <- rep(macro[i], dim(data)[1])
data <- data%>%
mutate(date = ymd(date))%>%
group_by(quarter)%>%
top_n(1,date) %>%
filter(date >= "1980-01-01", date <= "2019-12-31") %>%
if(i == 1){macro_factors <- data} else {macro_factors <- rbind(macro_factors, data)}
}
but I got this error:
Error in as.POSIXlt.character(rownames(data)) :
character string is not in a standard unambiguous format
I tried following an online tutorial that uses as.POSIXct() after first converting the data from character to numeric, but that did not work in my case. I checked the data: the values look like "year-month-day" and are of class character, so shouldn't as.POSIXlt() work?
There are several problems:
The POSIXlt class should not be used in data frames. Also, do not use POSIXct for plain dates, since that invites needless time zone problems.
To convert an xts object, such as the one produced by getSymbols, to a data frame, use fortify.zoo.
Depending on what you want to do, you might not need to convert from xts to a data frame in the first place. Reading about xts and zoo in the documentation of those packages is recommended.
This gives a list of data frames L and then a long data frame DF containing them all.
library(dplyr, exclude = c("filter", "lag"))
library(quantmod) # also brings in xts and zoo
macro <- c("GDPC1", "CPIAUCSL")
getData <- function(symb) symb %>%
getSymbols(src = "FRED", auto.assign = FALSE) %>%
aggregate(as.yearqtr, tail, 1) %>%
window(start = "1980q1", end = "2019q4") %>%
fortify.zoo
L <- Map(getData, macro)
DF <- bind_rows(L, .id = "id")
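The time zone problem mentioned above can be illustrated with a small base R sketch. The timestamp and the "Asia/Tokyo" zone are arbitrary choices for illustration and are not part of the original data.

```r
# Midnight on 1 January 2020 in Tokyo, stored as a date-time.
x <- as.POSIXct("2020-01-01 00:00:00", tz = "Asia/Tokyo")

# as.Date() on a POSIXct value uses UTC by default, so the calendar date shifts back.
d_utc <- as.Date(x)

# Passing the time zone explicitly keeps the intended calendar date.
d_local <- as.Date(x, tz = "Asia/Tokyo")

print(d_utc)
print(d_local)
```

Storing plain dates as Date (or quarters as yearqtr, as in the code above) avoids this class of surprise because no time zone is involved.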

Strptime fails when working with a dataframe

Strptime seems to be missing something in this scenario:
aDateInPOSIXct <- strptime("2018-12-31", format = "%Y-%m-%d")
someText <- "asdf"
df <- data.frame(aDateInPOSIXct, someText, stringsAsFactors = FALSE)
bDateInPOSIXct <- strptime("2019-01-01", format = "%Y-%m-%d")
df[1,1] <- bDateInPOSIXct
Assignment of bDate to the dataframe fails with:
Error in as.POSIXct.numeric(value) : 'origin' must be supplied
And a warning:
provided 11 variables to replace 1 variables
I want to use both POSIXct dates and POSIXct date-times to compare this and that. It's way less work than manipulating character strings -- and POSIX takes care of the time zone issues. Unfortunately, I'm missing something.
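strptime returns a POSIXlt object, which is a list of date-time components (hence the warning about 11 variables), so you only need to convert its result to POSIXct explicitly: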
aDateInPOSIXct <- as.POSIXct(strptime("2018-12-31", format = "%Y-%m-%d"))
someText <- "asdf"
df <- data.frame(aDateInPOSIXct, someText, stringsAsFactors = FALSE)
bDateInPOSIXct <- as.POSIXct(strptime("2019-01-01", format = "%Y-%m-%d"))
df[1,1] <- bDateInPOSIXct
Check the R documentation which says:
Character input is first converted to class "POSIXlt" by strptime: numeric input is first converted to "POSIXct".
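A short base R sketch of the difference between the two classes, under the assumption that UTC is an acceptable time zone for the example; the column name when is illustrative.

```r
# strptime() returns POSIXlt: a list with one element per date-time component.
lt <- strptime("2019-01-01", format = "%Y-%m-%d", tz = "UTC")
print(class(lt))
print(is.list(unclass(lt)))

# as.POSIXct() turns it into a single number (seconds since the epoch).
ct <- as.POSIXct(lt)
print(class(ct))
print(is.numeric(unclass(ct)))

# A POSIXct value can be assigned into a single data frame cell.
df <- data.frame(when = as.POSIXct("2018-12-31", tz = "UTC"),
                 txt = "asdf",
                 stringsAsFactors = FALSE)
df[1, 1] <- ct
print(df)
```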

R POSIXlt timestamp conversion do not know how to convert 'df1$timestamp' to class “POSIXlt”

Hello all, I am facing an issue while converting a timestamp to POSIXlt. Later I need to extract the year, month, day of month, hour, minute and second from this timestamp.
My timestamp looks like this: 2015-12-01 00:04:39
Here is my attempt:
getwd()
rm(list=ls())
library(ggplot2)
library(plyr)
library(reshape)
library(scales)
library(gridExtra)
library(SparkR)
Sys.setenv(SPARK_HOME="/usr/local/spark")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
sc <- sparkR.init(master="local","RwordCount")
args <- commandArgs(trailing = TRUE)
sqlContext <- sparkRSQL.init(sc)
df1 <- read.df(sqlContext, "hdfs://master:9000/test.csv", header='true', source = "com.databricks.spark.csv", inferSchema='true', stringsAsFactors = F)
if("timestamp" %in% colnames(df1)){
df1$pTime <- as.POSIXlt(df1$timestamp, format= "%Y-%m-%d %H:%M:%S")
}else {
df1$pTime <- as.POSIXlt(df1$Timestamp, format= "%Y-%m-%d %H:%M:%S")
}
but I get the error: do not know how to convert 'df1$timestamp' to class “POSIXlt”
Later I need to extract the year, month, day of month, hours, minutes and seconds, for which I have this snippet:
df1$Year <- df1$pTime$year-100 #Year
df1$Month <- df1$pTime$mon+1 #Month 1-12
df1$Day <- df1$pTime$mday #day of month
df1$Hour <- df1$pTime$hour # 0-23: hours
df1$Min <- df1$pTime$min
df1$Sec <- df1$pTime$sec
df1$WeekOfYear <- strftime(df1$pTime, format="%W")
I am running the above script with the following command:
bin/spark-submit --packages com.databricks:spark-csv_2.11:1.3.0 /home/script/analysis.R
Error in as.POSIXlt.default(df1$timestamp, format = "%Y-%m-%d %H:%M:%S") :
do not know how to convert 'df1$timestamp' to class “POSIXlt”
Calls: as.POSIXlt -> as.POSIXlt.default
Execution halted
How can I get rid of the error? Any help will be appreciated.
Thanks
You can convert your timestamp by using as.POSIXct
x <- as.POSIXct("2015-12-01 00:04:39")
and then, using the lubridate package, you can extract all the information:
library(lubridate)
year(x)
#[1] 2015
month(x)
#[1] 12
day(x)
#[1] 1
hour(x)
#[1] 0
minute(x)
#[1] 4
second(x)
#[1] 39
You can extract parts of date-time values with format, e.g.
x <- Sys.time()
format(x, format="%Y")
See
?strptime
for all options.
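For a local (non-Spark) R value, the components asked about can also be read directly from a POSIXlt object. This is a minimal base R sketch using the timestamp from the question; the names parts and week are illustrative, and UTC is an assumed time zone.

```r
# POSIXlt stores year as years since 1900 and month as 0-11, so adjust both.
x <- as.POSIXlt("2015-12-01 00:04:39", tz = "UTC")

parts <- c(
  year   = x$year + 1900,
  month  = x$mon + 1,
  day    = x$mday,
  hour   = x$hour,
  minute = x$min,
  second = x$sec
)
print(parts)

# Week of the year (Monday as first day of the week), returned as a string.
week <- format(x, "%W")
print(week)
```

This only applies to ordinary R vectors; a column of a SparkR DataFrame is not one, which is why as.POSIXlt has no method for it.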
I cannot reproduce the first part of your question. What is the error message you get?
First, you can select the column from your data.frame without the if/else branch:
df1[colnames(df1) %in% "timestamp"]
To convert your entire column of values in the format 2015-12-01 00:04:39:
as.POSIXlt(strptime(as.character(df1[colnames(df1) %in% "timestamp"]),
format = "%Y-%m-%d %H:%M:%S"),
format = "%Y-%m-%d %H:%M:%S")
I had a 'trans_dtime' column of type string in my DataFrame and converted it to a timestamp type using SparkR:
printSchema(df)
root
|-- col1: string (nullable = true)
|-- trans_dtime: string (nullable = true)
df$trans_dtime <- from_utc_timestamp(date_format(df$trans_dtime, "yyyy-MM-dd HH:mm:ss"), "GMT") # "yyyy" is the calendar year; "YYYY" means week-based year
printSchema(df)
root
|-- col1: string (nullable = true)
|-- trans_dtime: timestamp (nullable = true)
Hope it will help you. :)

Time Series date format issue in R

I am using the [dowjones][1] dataset, but I think my date format may be incorrect, because when I call zoo to turn the data into a time series I get the warning:
some methods for “zoo” objects do not work if the index entries in
‘order.by’ are not unique
My code:
dow = read.table('dow_jones_index.data', header=T, sep=',')
dowts = zoo(dow$close, as.Date(as.character(dow$date), format = "%m/%d/%Y"))
The dates look like this: 5/6/2011
Does my error have to do with using an incorrect date format? Or something else?
Thank you.
EDIT:
hist(dowts, xlab='close change rate', prob=TRUE, main='Histogram',ylim=c(0,.07))
Error in hist.default(dowts, xlab = "close change rate", prob = TRUE, : character(0)
In addition: Warning messages:
1: In zoo(rval[i], index(x)[i]) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
2: In pretty.default(range(x), n = breaks, min.n = 1) : NAs introduced by coercion
[1]: https://archive.ics.uci.edu/ml/datasets/Dow+Jones+Index
The problem, as the warning message indicates, is that your date values are not unique. This is because your data is in long format, with multiple stocks per date. A time series has to be in a matrix-like structure, with each column representing a stock and each row a point in time. With dcast from the reshape2 package this is straightforward:
library(zoo)
library(reshape2)
dow <- read.table('dow_jones_index.data', header=T, sep=',', stringsAsFactors = FALSE)
# delete $ symbol and coerce to numeric
dow$close <- as.numeric(sub("\\$", "",dow$close))
tmp <- dcast(dow, date~stock, value.var = "close")
dowts <- as.zoo(x = tmp[,-1], order.by = as.Date(tmp$date, format = "%m/%d/%Y"))
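The long-versus-wide point can be checked on a toy example in base R. The tickers and prices below are made up for illustration and are not taken from the dataset; tapply stands in for dcast here only to keep the sketch free of extra packages.

```r
# Toy long-format data: two stocks observed on the same two dates (made-up values).
long <- data.frame(
  date  = c("5/6/2011", "5/6/2011", "5/13/2011", "5/13/2011"),
  stock = c("AA", "AXP", "AA", "AXP"),
  close = c(17.15, 50.20, 17.10, 49.80),
  stringsAsFactors = FALSE
)
long$date <- as.Date(long$date, format = "%m/%d/%Y")

# In long format each date appears once per stock, so the index is not unique.
has_dups <- anyDuplicated(long$date) > 0
print(has_dups)

# Reshape to wide: one row per date, one column per stock.
wide <- tapply(long$close,
               list(as.character(long$date), long$stock),
               function(v) v[1])
print(wide)
```

Once the data is wide, the row names are unique dates and can serve as the order.by index of a zoo object.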
