I have a column of dates exported from Excel as CSV and read into a data frame through RStudio's default "Import Dataset" > "From CSV..." menu, i.e. d <- read_csv("data.csv").
From this data frame I would like to create a zoo and/or xts object.
The data is:
30/04/2016
31/05/2016
30/06/2016
I get the following errors:
dates <- c('30/04/2016','31/05/2016','30/06/2016')
d <- dates
z <- read.zoo(d)
Error in read.zoo(d) : index has bad entry at data row 1
z <- read.zoo(d, FUN = as.Date())
Error in as.Date() : argument "x" is missing, with no default
z <- read.zoo(d, FUN = as.Date(format="%d/%m/%Y"))
Error in as.Date(format = "%d/%m/%Y") : argument "x" is missing,
with no default
Alternatively, if I read directly into zoo with a format argument, I get the same "bad entry" error:
ts.z <- read.zoo(d,index=1,tz='',format="%d/%m/%Y")
Error in read.zoo(d, index = 1, tz = "", format = "%d/%m/%Y") :
index has bad entry at data row 1
What causes the "index has bad entry at data row 1" errors? What is the correct way to specify FUN =? Which input classes does read.zoo accept, and how does it treat each of them?
From ?read.zoo, on the file argument:
character string or strings giving the name of the file(s) which the
data are to be read from/written to. See read.table and
write.table for more information. Alternatively, in read.zoo, file
can be a connection or a data.frame (e.g., resulting from a previous
read.table call) that is subsequently processed to a "zoo" series.
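What goes wrong in your example is that d is a plain character vector of dates. read.zoo treats character strings as file names, and these are not files; d is also neither a connection nor a data.frame. You will have to wrap it in data.frame(). Note also that FUN expects a function, not a call: FUN = as.Date() evaluates as.Date with no arguments on the spot, which is why you get "argument "x" is missing". Pass FUN = as.Date and supply the format through read.zoo's own format argument.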
A working example:
z <- read.zoo(data.frame(dates), FUN = as.Date, format='%d/%m/%Y')
which gives:
> z
2016-04-30
2016-05-31
2016-06-30
> class(z)
[1] "zoo"
Used input data:
dates <- c('30/04/2016','31/05/2016','30/06/2016')
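A minimal sketch of an alternative route, assuming the zoo package is installed: parse the strings to Date yourself and pass them as order.by to zoo(). The values 1:3 are hypothetical placeholders, since the example data contains only dates.

```r
library(zoo)

dates <- c("30/04/2016", "31/05/2016", "30/06/2016")

# Parse day/month/year strings explicitly; the default
# "%Y-%m-%d" format would not match them.
idx <- as.Date(dates, format = "%d/%m/%Y")

# Hypothetical placeholder values 1:3, indexed by the parsed dates
z <- zoo(seq_along(idx), order.by = idx)
print(z)
```

If you need xts rather than zoo, xts::as.xts(z) converts the result, provided the xts package is installed.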
Strptime seems to be missing something in this scenario:
aDateInPOSIXct <- strptime("2018-12-31", format = "%Y-%m-%d")
someText <- "asdf"
df <- data.frame(aDateInPOSIXct, someText, stringsAsFactors = FALSE)
bDateInPOSIXct <- strptime("2019-01-01", format = "%Y-%m-%d")
df[1,1] <- bDateInPOSIXct
Assigning bDateInPOSIXct to the data frame fails with:
Error in as.POSIXct.numeric(value) : 'origin' must be supplied
And a warning:
provided 11 variables to replace 1 variables
I want to use both POSIXct dates and POSIXct date-times to compare this and that. It's way less work than manipulating character strings -- and POSIX takes care of the time zone issues. Unfortunately, I'm missing something.
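strptime() returns a POSIXlt object, not POSIXct. POSIXlt is a list-like class with one component per date-time field, which is what the "11 variables" in the warning refers to. You only need to convert the result of strptime() to POSIXct explicitly: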
aDateInPOSIXct <- as.POSIXct(strptime("2018-12-31", format = "%Y-%m-%d"))
someText <- "asdf"
df <- data.frame(aDateInPOSIXct, someText, stringsAsFactors = FALSE)
bDateInPOSIXct <- as.POSIXct(strptime("2019-01-01", format = "%Y-%m-%d"))
df[1,1] <- bDateInPOSIXct
Check the R documentation, which says:
Character input is first converted to class "POSIXlt" by strptime: numeric input is first converted to "POSIXct".
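A sketch of the same fix with one call fewer: as.POSIXct() accepts a format argument for character input, so the intermediate strptime() is not needed. The tz = "UTC" setting is an assumption added here so the result does not depend on the session's time zone.

```r
# strptime() gives POSIXlt; a data frame column wants the compact POSIXct class
lt <- strptime("2019-01-01", format = "%Y-%m-%d", tz = "UTC")
print(class(lt))

# Parse straight to POSIXct instead
aDateInPOSIXct <- as.POSIXct("2018-12-31", format = "%Y-%m-%d", tz = "UTC")
bDateInPOSIXct <- as.POSIXct("2019-01-01", format = "%Y-%m-%d", tz = "UTC")

df <- data.frame(aDateInPOSIXct, someText = "asdf", stringsAsFactors = FALSE)
df[1, 1] <- bDateInPOSIXct   # POSIXct into a POSIXct column: no 'origin' error
print(df)
```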
I am using RStudio for my R sessions and I have the following R code:
library(dplyr)      # for %>% and mutate()
library(lubridate)  # for ymd()
d1 <- read.csv("mydata.csv", stringsAsFactors = FALSE, header = TRUE)
d2 <- d1 %>%
  mutate(PickUpDate = ymd(PickUpDate))
str(d2$PickUpDate)
The output of the last line is:
Date[1:14258], format: "2016-10-21" "2016-07-15" "2016-07-01" "2016-07-01" "2016-07-01" "2016-07-01" ...
I need an additional column (let's call it MthDD) in the data frame d2 containing the month and day of the "PickUpDate" column. Column MthDD needs to be in the format mm-dd but, most importantly, it should still be of the Date type.
How can I achieve this?
UPDATE:
I have tried the following but it outputs the new column as a character type. I need the column to be of the date type so that I can use it as the x-axis component of a plot.
d2$MthDD <- format(as.Date(d2$PickUpDate), "%m-%d")
Date objects do not display as mm-dd. You can create a character string with that representation but it will no longer be of Date class -- it will be of character class.
If you want an object that displays as mm-dd and still acts like a Date object what you can do is create a new S3 subclass of Date that displays in the way you want and use that. Here we create a subclass of Date called mmdd with an as.mmdd generic, an as.mmdd.Date method, an as.Date.mmdd method and a format.mmdd method. The last one will be used when displaying it. mmdd will inherit methods from Date class but you may still need to define additional methods depending on what else you want to do -- you may need to experiment a bit.
as.mmdd <- function(x, ...) UseMethod("as.mmdd")
as.mmdd.Date <- function(x, ...) structure(x, class = c("mmdd", "Date"))
as.Date.mmdd <- function(x, ...) structure(x, class = "Date")
format.mmdd <- function(x, format = "%m-%d", ...) format(as.Date(x), format = format, ...)
DF <- data.frame(x = as.Date("2018-03-26") + 0:2) # test data
DF2 <- transform(DF, y = as.mmdd(x))
giving:
> DF2
x y
1 2018-03-26 03-26
2 2018-03-27 03-27
3 2018-03-28 03-28
> class(DF2$y)
[1] "mmdd" "Date"
> as.Date(DF2$y)
[1] "2018-03-26" "2018-03-27" "2018-03-28"
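A quick check, assuming you mainly need the mmdd column for display, comparison and ordering, that the subclass still falls back on Date behaviour. Other operations (plotting, arithmetic) may need their own methods, as the answer warns. The class definitions are repeated from the answer so the snippet runs on its own.

```r
# Same mmdd subclass as in the answer
as.mmdd <- function(x, ...) UseMethod("as.mmdd")
as.mmdd.Date <- function(x, ...) structure(x, class = c("mmdd", "Date"))
as.Date.mmdd <- function(x, ...) structure(x, class = "Date")
format.mmdd <- function(x, format = "%m-%d", ...) format(as.Date(x), format = format, ...)

y <- as.mmdd(as.Date("2018-03-26") + 0:2)

print(format(y))      # month-day strings via format.mmdd
print(y[2] > y[1])    # comparison dispatches to Ops.Date
print(as.Date(y))     # back to a plain Date
```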
Try using this:
PickUpDate2 <- format(PickUpDate,"%m-%d")
PickUpDate2 <- as.Date(PickUpDate2, "%m-%d")
This should work. You can then bind_cols() the result, or add it to the data frame right away, as in the code you provided. The code then becomes:
d2$PickUpDate2 <- format(d2$PickUpDate,"%m-%d")
d2$PickUpDate2 <- as.Date(d2$PickUpDate2, "%m-%d")
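One caveat with this approach: when the year is missing, as.Date() fills in the current one, so in a non-leap year any 29 February becomes NA. A sketch of a variant that pins every date to a fixed leap year (2000 is an arbitrary choice here) so month-day values stay valid Dates; the sample dates are hypothetical.

```r
pick_up <- as.Date(c("2016-10-21", "2016-07-15", "2016-02-29"))  # hypothetical sample

# Re-anchor each month-day in the leap year 2000 so 02-29 survives;
# the year itself carries no meaning, only month and day do.
mth_dd <- as.Date(paste0("2000-", format(pick_up, "%m-%d")))

print(class(mth_dd))
print(format(mth_dd, "%m-%d"))
```

For a ggplot2 axis, scale_x_date(date_labels = "%m-%d") then shows only month and day.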
I am using the Dow Jones Index dataset (https://archive.ics.uci.edu/ml/datasets/Dow+Jones+Index), but I suspect my date format is incorrect, because when I call zoo() to turn the data into a time series I get the warning:
some methods for “zoo” objects do not work if the index entries in
‘order.by’ are not unique
My code:
dow = read.table('dow_jones_index.data', header=T, sep=',')
dowts = zoo(dow$close, as.Date(as.character(dow$date), format = "%m/%d/%Y"))
The dates look like this: 5/6/2011
Is the warning caused by an incorrect date format, or by something else?
Thank you.
EDIT:
hist(dowts, xlab='close change rate', prob=TRUE, main='Histogram',ylim=c(0,.07))
Error in hist.default(dowts, xlab = "close change rate", prob = TRUE,  :
  character(0)
In addition: Warning messages:
1: In zoo(rval[i], index(x)[i]) :
  some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
2: In pretty.default(range(x), n = breaks, min.n = 1) :
  NAs introduced by coercion
The problem, as the warning message indicates, is that your date values are not unique. This is because your data is in long format, with one row per stock per date. A multivariate time series has to be in a matrix-like (wide) structure, with each column representing a stock and each row a point in time. With dcast from the reshape2 package this is straightforward:
library(zoo)
library(reshape2)
dow <- read.table('dow_jones_index.data', header = TRUE, sep = ',', stringsAsFactors = FALSE)
# delete the $ symbol and coerce to numeric
dow$close <- as.numeric(sub("\\$", "", dow$close))
# long -> wide: one row per date, one column per stock
tmp <- dcast(dow, date ~ stock, value.var = "close")
dowts <- zoo(as.matrix(tmp[, -1]), order.by = as.Date(tmp$date, format = "%m/%d/%Y"))
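To see where the duplicate index entries come from, here is a sketch on a tiny hypothetical long-format table (two stocks, two dates, made-up prices), using base R's stats::reshape() as an alternative to dcast.

```r
# Hypothetical long-format data: one row per (date, stock)
long <- data.frame(
  date  = c("1/7/2011", "1/7/2011", "1/14/2011", "1/14/2011"),
  stock = c("AA", "AXP", "AA", "AXP"),
  close = c(16.42, 44.36, 15.97, 46.25),   # made-up values
  stringsAsFactors = FALSE
)

# Each date appears once per stock: the source of the zoo warning
print(any(duplicated(long$date)))

# Wide format: one row per date, one close.<stock> column per stock
wide <- reshape(long, idvar = "date", timevar = "stock", direction = "wide")
wide$date <- as.Date(wide$date, format = "%m/%d/%Y")
print(any(duplicated(wide$date)))
print(names(wide))
```

From wide, zoo(as.matrix(wide[-1]), order.by = wide$date) builds the series with a unique index.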
I have a csv file containing financial data (i.e. dates with corresponding prices). My goal is to load these data in R and convert the dates from character data to dates. I tried the following:
data<-read.csv("data.csv",sep=";")
attach(data)
as.Date(Date,format="%Y-%b-%d") #'Date' is the column containing the dates
Unfortunately, this only leads to NAs in Date. Things that were proposed in other threads on this issue but did not help me:
reading in the csv file with 'stringsAsFactors=FALSE'
formatting the dates in Excel as dates
Here is a sample of my csv file:
Date;Open;High;Low;Close;Volume;Adj Close
30.10.2015;10842.51953;10850.58008;10748.7002;10850.13965;89270000;10850.13965
29.10.2015;10867.19043;10886.98047;10741.13965;10800.83984;122513100;10800.83984
28.10.2015;10728.16016;10848.41016;10691.62988;10831.95996;0;10831.95996
27.10.2015;10761.37012;10807.41016;10692.19043;10692.19043;0;10692.19043
26.10.2015;10791.17969;10863.08984;10756.83008;10801.33984;73091500;10801.33984
23.10.2015;10610.33008;10847.46973;10586.95996;10794.54004;0;10794.54004
22.10.2015;10213.00977;10508.25;10194.74023;10491.96973;107511600;10491.96973
21.10.2015;10185.41992;10277.58984;10107.91992;10238.09961;70021400;10238.09961
20.10.2015;10174.79981;10194.53027;10080.19043;10147.67969;67235200;10147.67969
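Your format argument does not match the data, which is the usual cause of NAs when coercing strings to Date objects: your dates look like 30.10.2015 (day.month.four-digit year), whereas "%Y-%b-%d" expects something like 2015-Oct-30. You can use this instead: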
R> as.Date(Df$Date, format = "%d.%m.%Y")
#[1] "2015-10-30" "2015-10-29" "2015-10-28" "2015-10-27" "2015-10-26"
#[6] "2015-10-23" "2015-10-22" "2015-10-21" "2015-10-20"
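A small base-R check of this point, using the first two dates from the sample: every %-code and separator in the format string must mirror the text exactly (see ?strptime), otherwise the result is NA.

```r
x <- c("30.10.2015", "29.10.2015")

# Expects text like "2015-Oct-30", so every entry is NA
bad <- as.Date(x, format = "%Y-%b-%d")

# Day.month.four-digit-year matches the data
good <- as.Date(x, format = "%d.%m.%Y")

print(bad)
print(good)
```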
Instead of attach, you can use alternatives such as within to avoid qualifying your column names. For example,
Df <- within(Df, {
Date <- as.Date(Date, format = "%d.%m.%Y")
})
##
R> class(Df$Date)
#[1] "Date"
Data:
Df <- read.table(
text = "Date;Open;High;Low;Close;Volume;Adj Close
30.10.2015;10842.51953;10850.58008;10748.7002;10850.13965;89270000;10850.13965
29.10.2015;10867.19043;10886.98047;10741.13965;10800.83984;122513100;10800.83984
28.10.2015;10728.16016;10848.41016;10691.62988;10831.95996;0;10831.95996
27.10.2015;10761.37012;10807.41016;10692.19043;10692.19043;0;10692.19043
26.10.2015;10791.17969;10863.08984;10756.83008;10801.33984;73091500;10801.33984
23.10.2015;10610.33008;10847.46973;10586.95996;10794.54004;0;10794.54004
22.10.2015;10213.00977;10508.25;10194.74023;10491.96973;107511600;10491.96973
21.10.2015;10185.41992;10277.58984;10107.91992;10238.09961;70021400;10238.09961
20.10.2015;10174.79981;10194.53027;10080.19043;10147.67969;67235200;10147.67969",
header = TRUE, stringsAsFactors = FALSE, sep = ";")
This seems like a simple enough function to write, but I think I'm misunderstanding the requirements for formal arguments / how R parses and evaluates a function.
I'm trying to write a function that converts any character vector of the form "%m/%d/%Y" (and belonging to data.frame df) to a date vector, and formats it as "%m/%d/%Y", as follows:
dateformat <- function(x) {
df$x <- (format(as.Date(df$x, format = "%m/%d/%Y"), "%m/%d/%Y"))
}
I was thinking that...
dateformat(a)
... would just take the "a" as the actual argument for x and plug it into the function, thus resolving as:
df$a <- (format(as.Date(df$a, format = "%m/%d/%Y"), "%m/%d/%Y"))
However, I get the following error when running dateformat(a):
Error in as.Date.default(df$x, format = "%m/%d/%Y") :
do not know how to convert 'df$x' to class “Date”
Can someone please explain why my understanding of formal/actual arguments and/or R function parsing/evaluation is incorrect? Thank you.
Update
Of course, for all the variables I want to convert to dates (e.g., df$a, df$b, df$c), I could just write
df$a <- (format(as.Date(df$a, format = "%m/%d/%Y"), "%m/%d/%Y"))
df$b <- (format(as.Date(df$b, format = "%m/%d/%Y"), "%m/%d/%Y"))
df$c <- (format(as.Date(df$c, format = "%m/%d/%Y"), "%m/%d/%Y"))
But I'm looking to improve my coding skills by making a more general function to which I could feed a vector of variables. For instance, what if I had df$a to df$z, all character variables that I wanted to convert to date variables? After I write a proper function, I'd like to then perhaps run it like so:
for (n in letters) {
dateformat(n)
}
First, the format(...) function returns a character vector, not a date, so if x is a string,
format(as.Date(x, format = "%m/%d/%Y"), "%m/%d/%Y")
converts x to date and then back to character, as in:
result <- format(as.Date("01/03/2014", format = "%m/%d/%Y"), "%m/%d/%Y")
result
# [1] "01/03/2014"
class(result)
# [1] "character"
Second, assigning to an object such as df inside a function (i.e. putting it on the left-hand side of <-) creates a new local object in the function's environment; the df in the global environment is left unchanged.
a <- 2
f <- function(x) a <- x
f(3)
a
# [1] 2
Here, we set a variable a to 2. Inside the function, a new local variable a is created, set to x (3), and discarded when the function returns, so in the global environment a is still 2.
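A sketch of the usual way around this scoping rule: pass the data frame in, modify the local copy, return it, and reassign the result in the caller. The function name to_date and its col argument are hypothetical.

```r
to_date <- function(df, col, format = "%m/%d/%Y") {
  # df is a local copy here; changing it does not touch the caller's df
  df[[col]] <- as.Date(df[[col]], format = format)
  df  # return the modified copy
}

df <- data.frame(a = c("01/03/2014", "02/15/2014"), stringsAsFactors = FALSE)
df <- to_date(df, "a")   # reassign to keep the change
print(class(df$a))
```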
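Finally, df$x looks for a column literally named "x": $ does not evaluate its right-hand side, so df$x is NULL and as.Date() fails with the error you saw. Use df[[x]] with the column name as a string instead. If you insist on using a dateformat(...) function, this should work: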
df <- data.frame(a=paste("01",1:10,"2014",sep="/"),
b=paste("02",11:20,"2014",sep="/"),
c=paste("03",21:30,"2014",sep="/"))
dateformat <- function(x) as.Date(df[[x]], format = "%m/%d/%Y")
for (n in letters[1:3]) df[[n]] <- dateformat(n)
sapply(df,class)
# a b c
# "Date" "Date" "Date"
This will be more efficient though:
df <- as.data.frame(lapply(df,as.Date,format="%m/%d/%Y"))
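If only some columns hold dates, a sketch of the same lapply idea restricted to those columns; the column names here are assumptions for illustration. Assigning into df[cols] keeps the other columns and the data frame structure unchanged.

```r
df <- data.frame(a = c("01/03/2014", "02/15/2014"),
                 b = c("03/21/2014", "03/22/2014"),
                 note = c("x", "y"),
                 stringsAsFactors = FALSE)

cols <- c("a", "b")   # hypothetical: the columns holding m/d/Y dates
df[cols] <- lapply(df[cols], as.Date, format = "%m/%d/%Y")

print(sapply(df, class))
```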