I have a sample data frame q below that contains three dates (day, month, two-digit year) in q$test:
test
1 210376
2 141292
3 280280
I want to create a new covariate q$new that calculates the date difference from q$test to today.
I tried
q$new <- as.numeric(difftime(as.Date(q$test,format='%d/%m/%y'), as.Date(Sys.Date()), unit="weeks"))
But I receive an error message
Error in q$new <- as.numeric(difftime(as.Date(q$test, format =
"%d/%m/%y"), : object of type 'closure' is not subsettable
Do you have any idea what's wrong? Or do you have another solution?
q <- structure(list(test = c(210376L, 141292L, 280280L)), class = "data.frame", row.names = c(NA,
-3L))
You could do
as.numeric(difftime(Sys.Date(), as.Date(as.character(q$test), "%d%m%y"), units = "weeks"))
#[1] 2257.286 1384.143 2051.714
A few pointers:
1) Sys.Date() already returns an object of class "Date", so there is no need to wrap it in as.Date.
2) as.Date expects a character string as input, hence q$test is wrapped in as.character.
3) The format argument of as.Date describes the format of the input, not the output you want. You used "%d/%m/%y", but your data has no separators, so the matching format is "%d%m%y".
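Putting these pointers together, here is a minimal sketch that stores the result in q$new. It assumes q is the data frame from the dput output in the question. The "object of type 'closure' is not subsettable" message generally means that the name being subset refers to a function (base R has a function called q), so the data frame probably did not exist in the session when the code was run. The sprintf padding is an extra precaution that is not part of the answer above: an integer column drops leading zeros, so a date such as 010376 would be stored as 10376 and could be misparsed.

```r
# Recreate the data frame from the question
q <- data.frame(test = c(210376L, 141292L, 280280L))

# Pad to six digits so a date stored without its leading zero still parses
dates <- as.Date(sprintf("%06d", q$test), format = "%d%m%y")

# Weeks elapsed from each date to today, stored as a plain numeric column
q$new <- as.numeric(difftime(Sys.Date(), dates, units = "weeks"))
q
```

Swapping the order of the two arguments to difftime flips the sign, which is why the call in the question would have returned negative values even after fixing the format.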
structure(list(date = c(20140717L, 20140611L, 20140611L, 20140704L,
20140411L, 20140906L, 20140512L, 20140717L, 20140819L, 20140415L,
20140812L, 20140403L, 20140424L, 20140818L, 20140922L, 20140625L,
20141006L, 20140918L, 20140811L, 20140819L, 20140602L, 20140626L,
20140729L, 20140624L, 20140909L, 20140705L, 20140920L, 20140515L,
20140531L, 20140628L, 20140822L, 20140508L, 20140809L, 20140627L,
20140727L, 20140711L, 20140714L, 20140710L, 20140403L, 20140525L,
20140428L, 20140501L, 20140915L, 20140510L, 20140601L, 20140921L,
20140815L, 20140610L, 20140418L, 20140812L, 20140614L, 20140814L,
20140626L, 20140412L, 20140912L, 20140514L, 20140919L, 20140706L,
20140411L, 20140711L, 20140624L, 20140430L, 20140521L, 20140418L,
20140713L, 20140424L, 20140601L, 20140923L, 20140406L, 20140905L,
20140613L, 20140412L, 20140407L, 20140402L, 20140813L, 20140903L,
20140827L, 20140521L, 20140524L, 20140404L, 20140419L, 20140412L,
20140902L, 20140623L, 20140925L, 20140528L, 20140731L, 20140513L,
20140821L, 20140703L, 20140724L, 20140818L, 20140801L, 20140628L,
20140801L, 20140521L, 20140906L, 20140725L, 20140522L, 20140927L,
20140615L, 20140920L, 20140813L, 20140815L, 20140924L, 20140614L,
20140912L, 20140710L, 20140807L, 20140501L, 20140420L, 20140630L,
20140704L, 20140401L, 20140605L, 20140928L, 20140806L, 20140614L,
20140907L, 20140704L, 20140403L, 20140804L, 20140603L, 20140728L,
20140919L, 20140731L, 20140426L, 20140930L, 20140502L, 20140827L,
20140815L, 20140628L, 20140902L, 20140616L, 20140613L, 20140726L,
20140721L, 20140425L, 20140715L, 20140607L, 20140913L, 20140621L,
20140708L, 20140427L, 20140506L, 20140425L, 20140411L, 20140615L,
20140713L, 20140424L, 20140406L, 20140711L, 20140415L, 20140909L,
20141004L, 20140725L, 20140602L, 20140405L, 20140525L, 20140605L,
20140521L, 20140506L, 20140414L, 20140916L, 20140512L, 20140830L,
20140722L, 20140711L, 20140628L, 20140613L, 20140618L, 20140719L,
20140416L, 20140727L, 20140521L, 20140718L, 20140814L, 20140515L,
20140501L, 20140725L, 20140507L, 20140619L, 20140525L, 20140609L,
20140614L, 20140402L, 20140914L, 20140517L, 20140826L, 20140602L,
20140920L, 20140718L, 20140915L, 20140715L, 20140708L, 20140419L,
20140819L, 20140501L, 20140807L, 20140404L)), .Names = "date", row.names = c(NA,
-200L), class = "data.frame")
This data frame stores its date values as integers. It is just one column of my data set; the original data set also has another variable called "total sales". I want to make a plot with the dates on the X axis and total sales on the Y axis.
However, because the dates are treated as integers, the plot looks bad. So I want R to treat the date column as dates so that I get a better plot.
How can I do this? Any help would be appreciated. Thank you very much.
You might have better luck with as.Date. If df is the data, then you can do
df$date <- as.Date(as.character(df$date), format = "%Y%m%d")
with(df, plot(date))
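As a rough sketch of the plot the question asks for, the converted column can go on the x axis directly. The total_sales column and its values below are made up purely for illustration, since the question does not show them; only the three dates are taken from the posted data.

```r
# Hypothetical data: 'total_sales' is an assumed column with made-up values
df <- data.frame(date = c(20140717L, 20140611L, 20140411L),
                 total_sales = c(10, 20, 15))

# Convert the integer yyyymmdd values to Date
df$date <- as.Date(as.character(df$date), format = "%Y%m%d")

# Sort by date so the connecting line runs left to right
df <- df[order(df$date), ]

# With a Date on the x axis, base graphics labels the axis with dates
plot(df$date, df$total_sales, type = "b",
     xlab = "Date", ylab = "Total sales")
```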
If d is the date column, you can try:
strptime(t(d),format='%Y%m%d')
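Note that strptime returns a POSIXlt date-time rather than a Date. A small sketch of the difference, assuming d is a plain integer vector like the column in the question:

```r
# Three of the values from the question's date column
d <- c(20140717L, 20140611L, 20140411L)

# strptime parses character input and returns a POSIXlt object
lt <- strptime(as.character(d), format = "%Y%m%d")
class(lt)

# Convert to Date when only the calendar day is needed
dates <- as.Date(lt)
class(dates)
```

POSIXlt is a list-based class, so for a plain date column the Date result is usually the more convenient one to store and plot.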
I'm trying to understand my difficulties in the past with inputting zoo objects. The following two uses of read.zoo give different results despite the default argument for tz supposedly being "" and that is the only difference between the two read.zoo calls:
Lines <- "2013-11-25 12:41:21 2
2013-11-25 12:41:22.25 2
2013-11-25 12:41:22.75 75
2013-11-25 12:41:24.22 3
2013-11-25 12:41:25.22 1
2013-11-25 12:41:26.22 1"
library(zoo)
z <- read.zoo(text = Lines, index = 1:2)
> dput(z)
structure(c(2L, 2L, 75L, 3L, 1L, 1L), index = structure(c(16034,
16034, 16034, 16034, 16034, 16034), class = "Date"), class = "zoo")
z <- read.zoo(text = Lines, index = 1:2, tz="")
> dput(z)
structure(c(2L, 2L, 75L, 3L, 1L, 1L), index = structure(c(1385412081,
1385412082.25, 1385412082.75, 1385412084.22, 1385412085.22, 1385412086.22
), class = c("POSIXct", "POSIXt"), tzone = ""), class = "zoo")
>
The answer (of course) is in the sources for read.zoo(), wherein there is:
....
ix <- if (missing(format) || is.null(format)) {
    if (missing(tz) || is.null(tz))
        processFUN(ix)
    else processFUN(ix, tz = tz)
}
else {
    if (missing(tz) || is.null(tz))
        processFUN(ix, format = format)
    else processFUN(ix, format = format, tz = tz)
}
....
Even though the default for tz is "", in your first case tz is considered missing (by missing()) and hence processFUN(ix) is used. When you set tz = "", it is no longer missing and hence you get processFUN(ix, tz = tz).
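The behaviour of missing() with a default argument can be seen in isolation with a toy function. The name show_tz is hypothetical and has nothing to do with zoo; it only mirrors the test used in the excerpt above.

```r
# Toy function: 'tz' has the same default as in read.zoo, but missing()
# still distinguishes "not supplied" from "supplied with the default value"
show_tz <- function(tz = "") {
  if (missing(tz)) "tz is missing" else paste0("tz supplied: '", tz, "'")
}

show_tz()          # argument omitted, so missing(tz) is TRUE
show_tz(tz = "")   # same value passed explicitly, so missing(tz) is FALSE
```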
Without looking at the details of read.zoo() this could possibly be handled better by having tz = NULL or tz (no default) in the arguments and then in the code, if tz needs to be set to "" for some reason, do:
if (missing(tz) || is.null(tz)) {
    tz <- ""
}
Or perhaps this is not even needed, if all that is required is to avoid the confusion about the two different calls?
Effectively, the default index class is "Date" unless tz is used, in which case the default is "POSIXct". Thus the first example in the question gives "Date" class, since that is the default, and the second gives "POSIXct", since tz was specified.
If you want to specify the class explicitly rather than rely on these defaults, use the FUN argument:
read.zoo(...whatever..., FUN = as.Date)
read.zoo(...whatever..., FUN = as.POSIXct) # might need FUN=paste,FUN2=as.POSIXct
read.zoo(...whatever..., FUN = as.yearmon)
# etc.
The FUN argument can also take a custom function as shown in the examples in the package.
Note that it always assumes standard formats (e.g. "%Y-%m-%d" in the case of "Date" class) if no format is specified and never tries to automatically determine the format.
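As an illustration of that last point, input that is not in a standard format needs the format spelled out. The sketch below assumes the zoo package is installed; the day/month/year layout is invented for the example and does not come from the question.

```r
library(zoo)

# Dates written as day/month/year, which is not the standard "%Y-%m-%d"
Lines <- "25/11/2013 2
25/12/2013 3
26/12/2013 8"

# format is given and tz is not, so the index is parsed as Date
z <- read.zoo(text = Lines, format = "%d/%m/%Y")
class(index(z))
```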
The way it works is explained in detail in ?read.zoo, and there are many examples in ?read.zoo (there are 78 lines of code in the examples section) as well as in an entire vignette (one of six vignettes) dedicated just to read.zoo: "Reading Data in zoo".
Added: I have expanded the above. Also, in the development version of zoo the heuristic has been improved, and with that improvement the first example in the question does recognize the date/times and chooses POSIXct. Some clarification of the simple heuristic has also been added to the read.zoo help file, so that the many examples provided do not have to be relied upon as much.
Here are some examples. Note that the heuristic referred to is a heuristic to determine the class of the time index only. It can only identify "numeric", "Date" and "POSIXct" classes. The heuristic cannot identify other classes (although you can specify them yourself using FUN=). Also the heuristic does not identify formats. If the format is not provided using format= or implicitly through FUN= then standard format is assumed, e.g. "%Y-%m-%d" in the case of "Date".
Lines <- "2013-11-25 12:41:21 2
2013-12-25 12:41:22.25 3
2013-12-26 12:41:22.75 8"
# explicit. Uses POSIXct.
z <- read.zoo(text = Lines, index = 1:2, FUN = paste, FUN2 = as.POSIXct)
# tz implies POSIXct
z <- read.zoo(text = Lines, index = 1:2, tz = "")
# heuristic: Date now; devel ver uses POSIXct
z <- read.zoo(text = Lines, index = 1:2)
Lines <- "2013-11-25 2
2013-12-25 3
2013-12-26 8"
z <- read.zoo(text = Lines, FUN = as.Date) # explicit. Uses Date.
z <- read.zoo(text = Lines, format = "%Y-%m-%d") # format & no tz implies Date
z <- read.zoo(text = Lines) # heuristic: Date
Note:
(1) In general, it's safer to be explicit by using FUN, or by using tz and/or format, as opposed to relying on the heuristic. If you are explicit by using FUN or semi-explicit by using tz and/or format, then there is no change between the current and the development versions of read.zoo.
(2) It's safer to rely on the documentation rather than the internals, as the internals can change without warning and in fact have changed in the development version. If you really want to look at the code despite this, then the key statement that selects the class of the index when FUN is not explicitly defined is the if (is.null(FUN)) ... statement in the read.zoo source.
(3) I recommend using read.zoo as being easier, more direct and more compact than workarounds such as read.table followed by zoo. I have been using read.zoo for years, as have many others, and it seems pretty solid to me, but if anyone finds specific problems with read.zoo or with the documentation (always possible since there is quite a bit of it) they can always be reported. Even though the package has been around for years, improvements are still being made.
I suspect your use of read.zoo tripped you up. Here is what I did:
library(zoo)
tt <- read.table(text=Lines)
z <- zoo(as.integer(tt[,3]), order.by=as.POSIXct(paste(tt[,1], tt[,2])))
Now z is a proper zoo object:
R> z
2013-11-25 12:41:21.00 2013-11-25 12:41:22.25 2013-11-25 12:41:22.75
2 2 75
2013-11-25 12:41:24.22 2013-11-25 12:41:25.22 2013-11-25 12:41:26.22
3 1 1
R> class(z)
[1] "zoo"
R> class(index(z))
[1] "POSIXct" "POSIXt"
R>
And by making sure I used a POSIXct object for the index, I am in fact getting a POSIXct object back.