R Issue adding NA values for missing rows using data frames

Thanks in advance for any help that is provided.
Long story short: I am working with hourly time series data from a measurement device (exported from SQL, then imported into R in order to properly format the date time). The time series contains missing data, sometimes in groups, and I need to locate these missing rows/indices and insert a new row holding an NA value for each one.
Related Questions that did not solve my problem:
how to insert missing observations on a data frame
Adding row to a data frame with missing values
Problem Data
The dataset that I am working with in this case is fairly large and varies depending on the measurement device I select. As a test case, I have one time series that contains 17469 hourly observations. I located a small section of the dataset that may be used for testing purposes. Here it is:
> snip
date Reading
408 2015-12-15 00:00:00 4.40
409 2015-12-14 23:00:00 4.62
410 2015-12-14 22:00:00 4.61
411 2015-12-14 21:00:00 6.15
412 2015-12-14 20:00:00 6.06
413 2015-12-14 19:00:00 7.04
414 2015-12-14 18:00:00 8.57
415 2015-12-14 11:00:00 4.12
416 2015-12-14 10:00:00 3.73
We can see that observations are missing for 2015-12-14 12:00:00 to 2015-12-14 17:00:00. I would like to first locate then populate the time series with these date times and input NA for the Reading column in these positions. I would also like to return the indices that are missing in an additional vector.
How can this be done?
So far I have tried the following code (as suggested here, how to add a missing dates and remove repeated dates in hourly time series), but all I end up with is NA values when I perform the merge function and still need to identify where the missing indices are located.
Here is the result:
> grid = data.frame(date=seq.POSIXt(min(snip[,1]), to=max(snip[,1]), by="1 hours"));
> dat = merge(grid, snip, by="date", all.x=TRUE)
> dat
date Reading
1 2015-12-14 10:00:00 NA
2 2015-12-14 11:00:00 NA
3 2015-12-14 12:00:00 NA
4 2015-12-14 13:00:00 NA
5 2015-12-14 14:00:00 NA
6 2015-12-14 15:00:00 NA
7 2015-12-14 16:00:00 NA
8 2015-12-14 17:00:00 NA
9 2015-12-14 18:00:00 NA
10 2015-12-14 19:00:00 NA
11 2015-12-14 20:00:00 NA
12 2015-12-14 21:00:00 NA
13 2015-12-14 22:00:00 NA
14 2015-12-14 23:00:00 NA
15 2015-12-15 00:00:00 NA
What am I missing here? Is it because grid and snip$date are in reverse order? For additional information here is what the date time format looks like (in case this is from where my issue stems):
> snip[2,1]
[1] "2015-12-14 23:00:00 GMT"
The result of the dput(snip) command is as follows (thanks for the suggestion #42):
> dput(snip)
structure(list(date = structure(list(sec = c(0, 0, 0, 0, 0, 0,
0, 0, 0), min = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), hour = c(0L,
23L, 22L, 21L, 20L, 19L, 18L, 11L, 10L), mday = c(15L, 14L, 14L,
14L, 14L, 14L, 14L, 14L, 14L), mon = c(11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L), year = c(115L, 115L, 115L, 115L, 115L, 115L,
115L, 115L, 115L), wday = c(2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), yday = c(348L, 347L, 347L, 347L, 347L, 347L, 347L, 347L, 347L
), isdst = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("sec",
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
), class = c("POSIXlt", "POSIXt"), tzone = "GMT"), Reading = c(4.4,
4.62, 4.61, 6.15, 6.06, 7.04, 8.57, 4.12, 3.73)), .Names = c("date",
"Reading"), row.names = 408:416, class = "data.frame")

Here's how I was able to do it with some help from na.locf documentation. Does it help?
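library(xts)
dat <- dget("yoursample")  # file holding the dput() output from the question
# build an xts series indexed by the timestamps
datxts <- as.xts(dat[, -1], order.by = dat$date, frequency = 24)
tzn <- tzone(datxts)
# complete hourly grid from the first to the last timestamp
g <- seq(start(datxts), end(datxts), by = "hour")
gxts <- xts(rep(NA, length(g)), order.by = as.POSIXct(g), tzone = tzn)
# outer join: hours absent from the data show up as NA in datxts
merge(datxts, gxts, all = TRUE)$datxts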
Edit: Also, your method works if you add a column of NAs to the generated data frame:
dates <- seq.POSIXt(min(snip[,1]), to = max(snip[,1]), by = "1 hours")
grid <- data.frame(date = dates, dummydata = rep(NA, length(dates)))
dat <- merge(grid, snip, by = "date", all = TRUE)
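A likely cause of the all-NA merge (an assumption, not confirmed in the thread) is that snip$date is POSIXlt, as the dput shows, while the grid column is POSIXct, so the keys never match. The base-R sketch below rebuilds the sample with a POSIXct date column, merges it against the hourly grid, and returns the positions of the inserted rows. The name missing_idx is made up for illustration.

```r
# Sample from the question, with the date column stored as POSIXct.
snip <- data.frame(
  date = as.POSIXct(c("2015-12-15 00:00:00", "2015-12-14 23:00:00",
                      "2015-12-14 22:00:00", "2015-12-14 21:00:00",
                      "2015-12-14 20:00:00", "2015-12-14 19:00:00",
                      "2015-12-14 18:00:00", "2015-12-14 11:00:00",
                      "2015-12-14 10:00:00"), tz = "GMT"),
  Reading = c(4.40, 4.62, 4.61, 6.15, 6.06, 7.04, 8.57, 4.12, 3.73)
)

# Complete hourly grid between the first and last observation.
grid <- data.frame(date = seq(min(snip$date), max(snip$date), by = "1 hour"))

# Left join: hours without an observation get Reading = NA.
dat <- merge(grid, snip, by = "date", all.x = TRUE)

# Row positions (in the completed series) that were filled in.
# Note: this would also flag readings that were already NA in the source.
missing_idx <- which(is.na(dat$Reading))
missing_idx
```

For data that already has a POSIXlt column, snip$date <- as.POSIXct(snip$date) before the merge should have the same effect.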

Related

Set the start point for time intervals in R

I have different sets of data with the following format
Time Value1 Value2 ....
11/04/2015 15:12:22 1 2 ....
11/04/2015 15:13:46 1 2 ....
And I want to group them in intervals of 15 minutes. I can do this with the following code
data$time = cut(data$time, breaks = "15 min")
data.grouped <- aggregate(data[, -1], by = list(time = data$time), median)
The problem is that the time field in the output has the following values
12/04/2015 16:12
12/04/2015 16:27
12/04/2015 16:42
12/04/2015 16:57
And I want the times to be :00 :15 :30 or :45. Is there any way of forcing the intervals to be like this or a different approach to merge the data that allows it?
A sample data from dput:
structure(list(time = structure(list(sec = c(49, 5, 21, 37, 54,
10, 38), min = c(12L, 13L, 13L, 13L, 13L, 14L, 22L), hour = c(15L,
15L, 15L, 15L, 15L, 15L, 16L), mday = c(11L, 11L, 11L, 11L, 11L,
11L, 12L), mon = c(3L, 3L, 3L, 3L, 3L, 3L, 3L), year = c(116L,
116L, 116L, 116L, 116L, 116L, 116L), wday = c(1L, 1L, 1L, 1L,
1L, 1L, 2L), yday = c(101L, 101L, 101L, 101L, 101L, 101L, 102L
), isdst = c(1L, 1L, 1L, 1L, 1L, 1L, 1L), zone = c("CEST", "CEST",
"CEST", "CEST", "CEST", "CEST", "CEST"), gmtoff = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_)), .Names = c("sec", "min", "hour", "mday", "mon",
"year", "wday", "yday", "isdst", "zone", "gmtoff"), class = c("POSIXlt",
"POSIXt")), value1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("time",
"value1"), row.names = c(NA, -7L), class = "data.frame")
Starting with your dput, calling it df, first we'll convert the time column (POSIXlt in the dput) to POSIXct, then floor it to the nearest 15-minute mark below (use lubridate::round_date instead of floor_date if you want the nearest 15-minute mark in either direction):
df$time = as.POSIXct(df$time)
df$time15 = lubridate::floor_date(df$time, unit = "15 min")
df
# time value1 time15
# 1 2016-04-11 15:12:49 0 2016-04-11 15:00:00
# 2 2016-04-11 15:13:05 0 2016-04-11 15:00:00
# 3 2016-04-11 15:13:21 0 2016-04-11 15:00:00
# 4 2016-04-11 15:13:37 0 2016-04-11 15:00:00
# 5 2016-04-11 15:13:54 0 2016-04-11 15:00:00
# 6 2016-04-11 15:14:10 0 2016-04-11 15:00:00
# 7 2016-04-12 16:22:38 0 2016-04-12 16:15:00
You can then aggregate using the time15 column as the grouper.
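The grouping step can be sketched as follows. To keep the example free of package dependencies it swaps in a base-R floor (integer division of the epoch seconds by 900) for lubridate::floor_date, and it uses made-up value1 numbers so the medians are visible. This assumes a time zone whose UTC offset is a multiple of 15 minutes.

```r
# Toy data: timestamps taken from the question, value1 invented for illustration.
df <- data.frame(
  time = as.POSIXct(c("2016-04-11 15:12:49", "2016-04-11 15:13:05",
                      "2016-04-11 15:14:10", "2016-04-12 16:22:38"), tz = "UTC"),
  value1 = c(1, 3, 5, 7)
)

# Floor each timestamp to the start of its 15-minute (900-second) slot.
df$time15 <- as.POSIXct(floor(as.numeric(df$time) / 900) * 900,
                        origin = "1970-01-01", tz = "UTC")

# One row per slot, with the median of value1 within that slot.
grouped <- aggregate(value1 ~ time15, data = df, FUN = median)
grouped
```

The slot labels now fall on :00, :15, :30 or :45, because they are computed from the clock rather than from the first observation, which is what cut() with breaks = "15 min" does.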
I provide an example you can replicate with your data frame. First, I create a dummy time series (ts) as.POSIXct by 5 min intervals and then group them by 15 min intervals using dplyr.
ts <- seq.POSIXt(as.POSIXct("2017-01-01", tz = "UTC"),
                 as.POSIXct("2017-02-01", tz = "UTC"),
                 by = "5 min")
ts <- as.data.frame(ts)
library(dplyr)
ts %>%
  group_by(interval = cut(ts, breaks = "15 min")) %>%
  summarise(count = n())
Output
# A tibble: 2,977 x 2
interval count
<fct> <int>
1 2017-01-01 00:00:00 3
2 2017-01-01 00:15:00 3
3 2017-01-01 00:30:00 3
4 2017-01-01 00:45:00 3
5 2017-01-01 01:00:00 3
6 2017-01-01 01:15:00 3
7 2017-01-01 01:30:00 3
8 2017-01-01 01:45:00 3
9 2017-01-01 02:00:00 3
10 2017-01-01 02:15:00 3
# ... with 2,967 more rows

Comparing dates inside group using same reference

I have a data table for different patients ("Spell") and several temperature ("Temp") measures for each patient ("Episode"). I also have the date and time in which each temperature was taken.
Spell Episode Date Temp
1 3 2-1-17 21:00 40
1 2 2-1-17 20:00 36
1 1 1-1-17 10:00 37
2 3 2-1-17 15:00 36
2 2 2-1-17 10:00 37
2 1 1-1-17 8:00 36
3 1 3-1-17 10:00 40
4 3 4-1-17 15:00 36
4 2 3-1-17 12:00 40
4 1 3-1-17 10:00 39
5 7 3-1-17 17:30 36
5 6 2-1-17 17:00 36
5 5 2-1-17 16:00 37
5 1 1-1-17 9:00 36
5 4 1-1-17 14:00 39
5 3 1-1-17 13:00 40
5 2 1-1-17 11:00 39
I am interested in keeping all the measurements done 24h prior to the last one, I have grouped the observations by the spell and reverse date, but I am unsure on how to do the in-group comparison using the same reference (in this case, the first row for each group). The result should be:
Spell Episode Date Temp
1 3 2-1-17 21:00 40
1 2 2-1-17 20:00 36
2 3 2-1-17 15:00 36
2 2 2-1-17 10:00 37
3 1 3-1-17 10:00 40
4 3 4-1-17 15:00 36
5 7 3-1-17 17:30 36
Would appreciate any ideas that point me to the right direction.
Edit: Date is in d-m-yy H:M format. Here's dput from data:
structure(list(Spell = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 4L, 4L,
4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), Episode = c(3L, 2L, 1L, 3L,
2L, 1L, 1L, 3L, 2L, 1L, 7L, 6L, 5L, 1L, 4L, 3L, 2L), Date = c("2-1-17 21:00",
"2-1-17 20:00", "1-1-17 10:00", "2-1-17 15:00", "2-1-17 10:00",
"1-1-17 8:00", "3-1-17 10:00", "4-1-17 15:00", "3-1-17 12:00",
"3-1-17 10:00", "3-1-17 17:30", "2-1-17 17:00", "2-1-17 16:00",
"1-1-17 9:00", "1-1-17 14:00", "1-1-17 13:00", "1-1-17 11:00"
), Temp = c(40L, 36L, 37L, 36L, 37L, 36L, 40L, 36L, 40L, 39L,
36L, 36L, 37L, 36L, 39L, 40L, 39L)), .Names = c("Spell", "Episode",
"Date", "Temp"), class = c("data.table", "data.frame"), row.names = c(NA,
-17L), .internal.selfref = <pointer: 0x00000000001f0788>)
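library(dplyr)
df %>%
  # the year is two-digit, so parse with %y rather than %Y
  mutate(Date2 = as.numeric(as.POSIXct(Date, format = "%d-%m-%y %H:%M", tz = "GMT"))) %>%
  group_by(Spell) %>%
  # keep rows no more than 24 hours (in seconds) before the latest one in the group
  filter(Date2 >= (max(Date2) - 60*60*24)) %>%
  select(-Date2)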
Solution using only data.table :
# convert Date column to POSIXct
DT[,Date:=as.POSIXct(Date,format='%d-%m-%y %H:%M',tz='GMT')]
# filter the data.table
filteredDT <- DT[, .SD[as.numeric(difftime(max(Date),Date,units='hours')) <= 24], by = Spell]
> filteredDT
Spell Episode Date Temp
1: 1 3 2017-01-02 21:00:00 40
2: 1 2 2017-01-02 20:00:00 36
3: 2 3 2017-01-02 15:00:00 36
4: 2 2 2017-01-02 10:00:00 37
5: 3 1 2017-01-03 10:00:00 40
6: 4 3 2017-01-04 15:00:00 36
7: 5 7 2017-01-03 17:30:00 36
mydata$Date <- as.POSIXct(mydata$Date, format = '%d-%m-%y %H:%M', tz='GMT')
mydata <- mydata[with(mydata, order(Spell, -as.numeric(Date))),]
index <- with(mydata, tapply(Date, Spell, function(x){x >= max(x) - as.difftime(1, units="days")}))
mydata[unlist(index),]
Spell Episode Date Temp
1: 1 3 2017-01-02 21:00:00 40
2: 1 2 2017-01-02 20:00:00 36
4: 2 3 2017-01-02 15:00:00 36
5: 2 2 2017-01-02 10:00:00 37
7: 3 1 2017-01-03 10:00:00 40
8: 4 3 2017-01-04 15:00:00 36
11: 5 7 2017-01-03 17:30:00 36
The solution below uses two functions from the lubridate package. This package is very handy when dealing with dates and times, so I wonder why it hasn't been used in any of the other answers.
Furthermore, data.table is used because the OP has provided sample data of data.table class.
library(data.table) # if not already loaded
# coerce Date to POSIXct
DT[, Date := lubridate::dmy_hm(Date)][
# for each Spell, pick measurements within last 24 hours
, .SD[Date > max(Date) - lubridate::dhours(24L)], by = Spell][
# order, just for convenience
order(Spell, -Date)]
Spell Episode Date Temp
1: 1 3 2017-01-02 21:00:00 40
2: 1 2 2017-01-02 20:00:00 36
3: 2 3 2017-01-02 15:00:00 36
4: 2 2 2017-01-02 10:00:00 37
5: 3 1 2017-01-03 10:00:00 40
6: 4 3 2017-01-04 15:00:00 36
7: 5 7 2017-01-03 17:30:00 36
Please note that the expected result given by the OP shows an additional row (Spell 5, Episode 6) which is outside of the 24 hrs window.
Data
As provided by the OP
DT <- structure(list(Spell = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 4L, 4L,
4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), Episode = c(3L, 2L, 1L, 3L,
2L, 1L, 1L, 3L, 2L, 1L, 7L, 6L, 5L, 1L, 4L, 3L, 2L), Date = c("2-1-17 21:00",
"2-1-17 20:00", "1-1-17 10:00", "2-1-17 15:00", "2-1-17 10:00",
"1-1-17 8:00", "3-1-17 10:00", "4-1-17 15:00", "3-1-17 12:00",
"3-1-17 10:00", "3-1-17 17:30", "2-1-17 17:00", "2-1-17 16:00",
"1-1-17 9:00", "1-1-17 14:00", "1-1-17 13:00", "1-1-17 11:00"
), Temp = c(40L, 36L, 37L, 36L, 37L, 36L, 40L, 36L, 40L, 39L,
36L, 36L, 37L, 36L, 39L, 40L, 39L)), .Names = c("Spell", "Episode",
"Date", "Temp"), class = c("data.table", "data.frame"), row.names = c(NA, -17L))

Record variable value when condition true with dynamic name

I have a 9x2 data frame DATS with prices and POSIXct date-time stamps sampled every 15 minutes, and a list of dates FOMCDATES with the dates of recent FOMC events. I then split the POSIXct date-time stamps into separate Date and Time columns. I then add a column FOMCBinary to DATS containing a 1 whenever the date in DATS is contained in FOMCDATES AND the time is 14:30 (EDIT: FOMC is 14:00, used 14:30 by mistake - example still valid).
I would like to record the Close before the event takes place in a separate variable. The name of the variable should be based on the date of the event. In the case at hand, the result should be: PreEvent-2016-01-27 = 1122.7. Please take into account this would actually be run in a large sample with dozens of dates and the time can be other than 14:30 (e.g. if looking at NFP rather than FOMC).
DATS <- structure(list(DateTime = structure(list(sec = c(0, 0, 0, 0,0, 0, 0, 0, 0), min = c(30L, 15L, 0L, 45L, 30L, 15L, 0L, 45L,30L), hour = c(15L, 15L, 15L, 14L, 14L, 14L, 14L, 13L, 13L),mday = c(27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L), mon = c(0L,0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), year = c(116L, 116L, 116L,116L, 116L, 116L, 116L, 116L, 116L), wday = c(3L, 3L, 3L,3L, 3L, 3L, 3L, 3L, 3L), yday = c(26L, 26L, 26L, 26L, 26L,26L, 26L, 26L, 26L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,0L, 0L), zone = c("EST", "EST", "EST", "EST", "EST", "EST","EST", "EST", "EST"), gmtoff = c(NA_integer_, NA_integer_,NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,NA_integer_, NA_integer_)), .Names = c("sec", "min", "hour","mday", "mon", "year", "wday", "yday", "isdst", "zone", "gmtoff"), class = c("POSIXlt", "POSIXt")), Close = c(1127.2, 1127.5,1126.9, 1128.3, 1125.4, 1122.7, 1122.8, 1117.3, 1116)), .Names = c("DateTime","Close"), row.names = 2131:2139, class = "data.frame")
FOMCDATES <- structure(c(16785, 16827, 16876), class = "Date")
DATS$Time <- strftime(DATS$DateTime, format="%H:%M:%S")
DATS$Date <- as.Date(DATS$DateTime)
DATS$FOMCBinary <- ifelse( DATS$Time == "14:30:00" & DATS$Date %in% FOMCDATES, 1, 0)
#Output for FOMCDATES:
[1] 2015-12-16 2016-01-27 2016-03-16
#Output for DATS after calculations performed:
DateTime Close Time Date FOMCBinary
2131 2016-01-27 15:30:00 1127.2 15:30:00 2016-01-27 0
2132 2016-01-27 15:15:00 1127.5 15:15:00 2016-01-27 0
2133 2016-01-27 15:00:00 1126.9 15:00:00 2016-01-27 0
2134 2016-01-27 14:45:00 1128.3 14:45:00 2016-01-27 0
2135 2016-01-27 14:30:00 1125.4 14:30:00 2016-01-27 1
2136 2016-01-27 14:15:00 1122.7 14:15:00 2016-01-27 0
2137 2016-01-27 14:00:00 1122.8 14:00:00 2016-01-27 0
2138 2016-01-27 13:45:00 1117.3 13:45:00 2016-01-27 0
2139 2016-01-27 13:30:00 1116.0 13:30:00 2016-01-27 0
My attempt results in a vector rather than a single value, and the variable name is not dynamic.
#My failed attempt
#Define rowShift function
rowShift <- function(x, shiftLen = 1L) {
r <- (1L + shiftLen):(length(x) + shiftLen)
r[r<1] <- NA
return(x[r]) }
PreEventLevel <- ifelse(DATS$FOMCBinary > 0, rowShift(DATS$Close, +1), 0)
How could this be achieved?
Thank you very much!
Creating variables in the global environment with dynamic names is not a good practice... I would rather use a list as container for your values e.g. :
# get the indexes where FOMCBinary > 0
oneIdxs <- which(DATS$FOMCBinary > 0)
# get the close values using indexes on the shifted vector and put the values in a list
PreEventLevel <- as.list(rowShift(DATS$Close,1)[oneIdxs])
# set the dates as names of the element in the list
names(PreEventLevel) <- DATS$Date[oneIdxs]
> PreEventLevel
$`2016-01-27`
[1] 1122.7
# now you can access the values using:
# PreEventLevel[["2016-01-27"]]
# or
# PreEventLevel$`2016-01-27`
Note that you can also simply create a vector with names instead of a list (just remove as.list), and PreEventLevel will be:
> PreEventLevel
2016-01-27
1122.7
# you can access the values using PreEventLevel["2016-01-27"]
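Since the question mentions dozens of dates and event times other than 14:30, the same idea can be wrapped in a small helper. This is only a sketch: pre_event_close is a hypothetical name, the toy data copies four rows from DATS, and the function assumes rows are sorted newest first (as in DATS), so that the bar before the event is the next row down.

```r
# Toy data shaped like DATS (newest first); values copied from the question.
DATS <- data.frame(
  DateTime = as.POSIXct(c("2016-01-27 14:45:00", "2016-01-27 14:30:00",
                          "2016-01-27 14:15:00", "2016-01-27 14:00:00"),
                        tz = "UTC"),
  Close = c(1128.3, 1125.4, 1122.7, 1122.8)
)
FOMCDATES <- as.Date(c("2015-12-16", "2016-01-27", "2016-03-16"))

# Hypothetical helper: named vector of the Close one row before each event.
pre_event_close <- function(dats, event_dates, event_time = "14:30:00") {
  day <- format(dats$DateTime, "%Y-%m-%d")
  is_event <- format(dats$DateTime, "%H:%M:%S") == event_time &
    day %in% format(event_dates)
  idx <- which(is_event)
  idx <- idx[idx < nrow(dats)]          # skip events with no earlier row
  setNames(dats$Close[idx + 1L], day[idx])
}

PreEventLevel <- pre_event_close(DATS, FOMCDATES)
PreEventLevel
```

Passing a different event_time (for example "08:30:00") reuses the same lookup for other releases without creating dynamically named variables.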

Counting POSIXlt times by day

I have a chunk of POSIXlt times in a data frame, and I'm trying to see how many occurrences of these observances (in this case, bike rides) I have per day. What's the best way to do that?
The dates look like this:
> rides$start.fmtd[1:25]
[1] "2014-01-01 00:06:00" "2014-01-01 00:11:00" "2014-01-01 00:12:00"
[4] "2014-01-01 00:14:00" "2014-01-01 00:15:00" "2014-01-01 00:16:00"
[7] "2014-01-01 00:16:00" "2014-01-01 00:19:00" "2014-01-01 00:20:00"
[10] "2014-01-01 00:20:00"
dput(head()) gives me this:
> dput(head(rides$start.fmtd))
structure(list(sec = c(0, 0, 0, 0, 0, 0), min = c(6L, 11L, 12L,
14L, 15L, 16L), hour = c(0L, 0L, 0L, 0L, 0L, 0L), mday = c(1L,
1L, 1L, 1L, 1L, 1L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(114L,
114L, 114L, 114L, 114L, 114L), wday = c(3L, 3L, 3L, 3L, 3L, 3L
), yday = c(0L, 0L, 0L, 0L, 0L, 0L), isdst = c(0L, 0L, 0L, 0L,
0L, 0L)), .Names = c("sec", "min", "hour", "mday", "mon", "year",
"wday", "yday", "isdst"), class = c("POSIXlt", "POSIXt"))
This specific frame has about 300,000 observations (it's the Capital Bikeshare dataset, which contains every bike ride taken in the system, packaged quarterly).
dates <- as.POSIXlt(runif(10, 0, 60 * 60 * 24 * 7), origin = Sys.Date())
dates
## [1] "2014-06-16 03:36:13 PDT" "2014-06-15 22:39:41 PDT"
## [3] "2014-06-19 12:25:11 PDT" "2014-06-17 09:31:45 PDT"
## [5] "2014-06-20 02:20:00 PDT" "2014-06-18 04:36:48 PDT"
## [7] "2014-06-19 17:33:35 PDT" "2014-06-21 15:38:24 PDT"
## [9] "2014-06-17 08:50:45 PDT" "2014-06-20 03:36:38 PDT"
class(dates)
## [1] "POSIXlt" "POSIXt"
table(as.Date(dates))
## 2014-06-15 2014-06-16 2014-06-17 2014-06-18 2014-06-19 2014-06-20 2014-06-21
## 1 1 2 1 2 2 1
POSIXlt has a yday attribute, and you can use this to do a count, using aggregate or by or table or such.
For example, suppose you have a data frame d with a POSIXlt column date and a column count holding the number of observations for each row. If your data does not span more than one year, you can use yday alone:
aggregate(count ~ date$yday, data=d, FUN=sum)
If it spans more than one year (or just to be safe) you can also include the year (with any multiplier greater than 366):
aggregate(count ~ I(1000*date$year + date$yday), data=d, FUN=sum)
If you have values with dates and times, you can format them to just have the date and use table() on those values to get counts.
#sample data
set.seed(15)
randomdates <- structure(runif(30, 1357016400, 1359608400),
class=c("POSIXct", "POSIXt"), tzone="")
Now count values per date
table(strftime(randomdates, "%Y-%m-%d"))
The only downside to this is that table() turns the dates to character vectors. You can convert them back with
tbl<-table(strftime(randomdates, "%Y-%m-%d"))
as.POSIXct(names(tbl))
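If the per-day counts are needed as a data frame rather than a table, the names can be converted back to Date in the same step. A minimal sketch, using two timestamps from the question plus one invented one; the column names day and rides are illustrative:

```r
# Two timestamps from the question plus one made-up ride on the next day.
times <- as.POSIXct(c("2014-01-01 00:06:00", "2014-01-01 00:11:00",
                      "2014-01-02 08:30:00"), tz = "UTC")

# Count per calendar day (in the timestamps' own time zone).
tbl <- table(format(times, "%Y-%m-%d"))

# Convert the table into a data frame with a proper Date column.
rides_per_day <- data.frame(
  day   = as.Date(names(tbl)),
  rides = as.vector(tbl)
)
rides_per_day
```

format() also accepts a POSIXlt vector, so a column such as rides$start.fmtd could be passed in directly.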

create new variable from date data

Now my data frame is like below
dput(head(t.zoo))
structure(c(85.92, 85.85, 85.83, 85.83, 85.85, 85.87, 1300, 1300,
1299.75, 1299.75, 1299.75, 1300), .Dim = c(6L, 2L), .Dimnames = list(
NULL, c("cl", "es")), index = structure(list(sec = c(0.400000095367432,
0.900000095367432, 1.40000009536743, 1.90000009536743, 2.40000009536743,
2.90000009536743), min = c(30L, 30L, 30L, 30L, 30L, 30L), hour = c(10L,
10L, 10L, 10L, 10L, 10L), mday = c(6L, 6L, 6L, 6L, 6L, 6L), mon = c(5L,
5L, 5L, 5L, 5L, 5L), year = c(112L, 112L, 112L, 112L, 112L, 112L
), wday = c(3L, 3L, 3L, 3L, 3L, 3L), yday = c(157L, 157L, 157L,
157L, 157L, 157L), isdst = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("sec",
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
), class = c("POSIXlt", "POSIXt"), tzone = c("", "EST", "EDT"
)), class = "zoo")
I have two questions. First, I would like to add a variable name for the first column. Second, I want to create a categorical variable that indicates 2012-06-06 (since there are 3 separate days).
What I should do for the date data?
I'm not familiar with the zoo class, so the following code is not nice, but it seems to work.
yourdata<-as.matrix(yourdata)
justdate <- substr(rownames(yourdata), 1, 10)
justtime <- substr(rownames(yourdata), 11, 19)
row.names(yourdata) <- NULL
yourdata<-as.data.frame(yourdata)
yourdata[,"justdate"]<-justdate
yourdata[,"justtime"]<-justtime
yourdata[yourdata$justdate=="2012-06-06","newvariable"]<-1
> yourdata
cl es justdate justtime newvariable
1 85.92 1300.00 2012-06-06 10:30:00 1
2 85.85 1300.00 2012-06-06 10:30:00 1
3 85.83 1299.75 2012-06-06 10:30:01 1
4 85.83 1299.75 2012-06-06 10:30:01 1
5 85.85 1299.75 2012-06-06 10:30:02 1
6 85.87 1300.00 2012-06-06 10:30:02 1
zoo objects work a little differently from data.frames.
The "first column" (as you referred to it) is actually not a column, but the index of your object. Try index(t.zoo) and see what it returns. This index really should have unique values; in your case, there are duplicated values, which might affect your calculations.
Conversion to a data.frame can be done like the following. I've added separate "Date" and "Time" variables based on the index from t.zoo.
require(zoo) # Load the `zoo` package if you haven't already done so
t.df = data.frame(Date = format(index(t.zoo), "%Y-%m-%d"),
Time = format(index(t.zoo), "%H:%M:%S"),
data.frame(t.zoo))
t.df
# Date Time cl es
# 1 2012-06-06 10:30:00 85.92 1300.00
# 2 2012-06-06 10:30:00 85.85 1300.00
# 3 2012-06-06 10:30:01 85.83 1299.75
# 4 2012-06-06 10:30:01 85.83 1299.75
# 5 2012-06-06 10:30:02 85.85 1299.75
# 6 2012-06-06 10:30:02 85.87 1300.00
Converting back to a zoo object (keeping the new "Date" and "Time" columns, or any other columns that you have added) can be done like:
zoo(t.df, order.by=index(t.zoo))
Note, however, that this will give you a warning because you don't have unique "order.by" values.
