Create a data frame of dates - R

I have a vector of dates:
dates <- seq(as.Date('2017-01-01'), as.Date('2017-12-31'), by = 'days')
I want to create a data frame where this vector is repeated for n rows. Can anyone tell me how I might be able to accomplish this? Any help is greatly appreciated.
Thanks for the suggestions so far. Unfortunately, I think my intention was unclear in my original question. I would like each of n rows in the data frame to contain the vector of dates so that the final data frame would look something like this:
1 2017-01-01 2017-01-02.....2017-12-31
2 2017-01-01 2017-01-02.....2017-12-31
3 2017-01-01 2017-01-02.....2017-12-31
.
.
.
n 2017-01-01 2017-01-02.....2017-12-31

You can use rep to repeat the vector and then coerce the result to a data frame. For example, to repeat it 10 times:
num_repeat <- 10
dates <- data.frame(rep(
seq(as.Date('2017-01-01'), as.Date('2017-12-31'), by = 'days'),
times = num_repeat))

As the question asker is hoping to fill n rows, wouldn't it make more sense to specify length.out rather than times?
set.seed(1)
dtf <- data.frame(A=letters[sample(1:27, 1000, TRUE)])
dtf$B <- rep(dates, length.out=nrow(dtf))
tail(dtf)
# A B
# 995 d 2017-09-22
# 996 u 2017-09-23
# 997 r 2017-09-24
# 998 h 2017-09-25
# 999 f 2017-09-26
# 1000 h 2017-09-27

We can use replicate to do this:
n <- 5
out <- do.call(rbind, replicate(n, as.data.frame(as.list(dates)),
simplify = FALSE))
names(out) <- paste0('V', seq_along(out))
dim(out)
#[1] 5 365
out[1:3, 1:3]
# V1 V2 V3
#1 2017-01-01 2017-01-02 2017-01-03
#2 2017-01-01 2017-01-02 2017-01-03
#3 2017-01-01 2017-01-02 2017-01-03
out[1:3, 362:365]
# V362 V363 V364 V365
#1 2017-12-28 2017-12-29 2017-12-30 2017-12-31
#2 2017-12-28 2017-12-29 2017-12-30 2017-12-31
#3 2017-12-28 2017-12-29 2017-12-30 2017-12-31
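A possible variation on the replicate approach, offered only as a sketch: build a single one-row data frame and repeat that row by indexing, so that n separate data frames do not have to be bound together. The name one_row and the V1 to V365 column labels are illustrative choices, not something from the question.

```r
dates <- seq(as.Date("2017-01-01"), as.Date("2017-12-31"), by = "days")
n <- 5

# One row with one Date column per day; naming the list gives tidy column names
one_row <- as.data.frame(setNames(as.list(dates), paste0("V", seq_along(dates))))

# Repeat row 1 n times by indexing, then reset the generated row names
out <- one_row[rep(1, n), ]
rownames(out) <- NULL

dim(out)
class(out$V1)
```

Each column stays of class Date, so the wide layout does not cost you the date type.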

Related

Date sequence replication

I want to create a Date sequence as follows:
firstyear <- seq(as.Date('2000-01-01'),by='8 day',length=46)
I then want to append the same kind of sequence for each following year, up to the year 2017.
The final sequence should contain 46*18 elements and look like this:
2000-01-01
2000-01-09
...
2000-12-26
2001-01-01
...
2001-12-26
...
2017-12-26
How can I generate this Date sequence compactly?
Using sapply
a=c(2000:2017)
yourlist=as.Date(sapply(a,function(x) seq(as.Date(paste0(as.character(x),'-01-01')),by='8 day',length=46)),origin='1970-01-01')
You can create a function which will vary your date generation for you. Notice that I've transformed the output to a data.frame to preserve dates in "native" form.
yearSequence <- function(x) {
data.frame(variable = seq(as.Date(sprintf('%s-01-01', x)), by = '8 day', length = 46))
}
You can apply the function to the years you want.
out <- sapply(2000:2017, FUN = yearSequence, simplify = FALSE)
Combine result as a data.frame.
result <- do.call(rbind, out)
> head(result)
variable
1 2000-01-01
2 2000-01-09
3 2000-01-17
4 2000-01-25
5 2000-02-02
6 2000-02-10
> tail(result)
variable
823 2017-11-17
824 2017-11-25
825 2017-12-03
826 2017-12-11
827 2017-12-19
828 2017-12-27
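If a plain Date vector is wanted rather than a data.frame, one option (a sketch, not the only way) is to build the per-year sequences with lapply and combine them with c(), which keeps the Date class. The names years, per_year and all_dates are made up for this example.

```r
years <- 2000:2017

# One Date vector of 46 elements per year, each starting on 1 January
per_year <- lapply(years, function(y) {
  seq(as.Date(sprintf("%d-01-01", y)), by = "8 days", length.out = 46)
})

# c() on Date vectors keeps the Date class, so no origin is needed
all_dates <- do.call(c, per_year)

length(all_dates)
range(all_dates)
```

Note that the last element of a year is 26 December in leap years and 27 December otherwise, which matches the tail() output above.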

R and Data.table - applying rollapply over multiple columns

I would really appreciate help applying rollapply to each column of this data.table:
time AUD NZD EUR GBP USD AUD
1 2013-01-01 20:00 0.213 -0.30467 -0.127515
2 2013-01-01 20:05 0.21191 -0.30467 -0.127975
3 2013-01-01 20:10 0.212185 -0.304965 -0.127935
4 2013-01-01 20:15 0.212055 -0.30511 -0.1288
5 2013-01-01 20:20 0.211225 -0.30536 -0.12938
6 2013-01-01 20:25 0.211185 -0.30527 -0.129195
7 2013-01-01 20:30 0.21159 -0.3059 -0.13043
8 2013-01-01 20:35 0.21142 -0.304955 -0.13155
9 2013-01-01 20:40 0.21093 -0.30419 -0.132715
10 2013-01-01 20:45 0.2078 -0.30339 -0.13544
11 2013-01-01 20:50 0.208445 -0.30304 -0.135645
12 2013-01-01 20:55 0.208735 -0.30185 -0.1357
13 2013-01-01 21:00 0.20891 -0.303265 -0.13722
14 2013-01-01 21:05 0.20903 -0.30428 -0.137495
15 2013-01-01 21:10 0.209615 -0.305495 -0.13734
16 2013-01-01 21:15 0.20981 -0.30588 -0.13772
17 2013-01-01 21:20 0.209855 -0.306935 -0.13801
18 2013-01-01 21:25 0.209585 -0.30604 -0.138045
19 2013-01-01 21:30 0.210105 -0.3061 -0.137765
20 2013-01-01 21:35 0.210335 -0.30734 -0.138525
Code that works:
library("zoo")
library("data.table")
calculateAverage <- function (x,N) {
# use the argument x here; referencing out[,1] would ignore the column passed in
rollapply(x, N, mean)
}
col1 <- out[,2]
col2 <- out[,3]
col3 <- out[,4]
average1 <- calculateAverage(col1, 2)
average2 <- calculateAverage(col2, 2)
average3 <- calculateAverage(col3, 2)
combine <- cbind(average1, average2, average3)
tempMatrix <- matrix(, nrow = nrow(out), ncol = ncol(out))
tempMatrix[2:nrow(out), 1:3] <- combine
Suggestion from SO:
test <- lapply(out[,with=F], function(x) rollapply(x,width=2, FUN=mean))
Challenges:
1. The code I created works, but it feels inefficient and not generic. It needs to be modified whenever the number of columns changes.
2. The suggestion from SO returns a list, which is not useful to me.
If you can suggest an alternative method, I would really appreciate it!
Thanks in advance
Edit:
Data table added
data <- cbind(mtcars,as.Date(c("2007-06-22", "2004-02-13")))
merge(rollapply(Filter(is.numeric, data), 2, mean),
Filter(Negate(is.numeric), data))
The first line creates data, so that there are not only numeric values in it. This is only to mimic your data, which is not available right now.
The second line filters only numeric columns and applies mean function to each of filtered columns.
Suggestion from David Arenburg worked perfectly!
MaPrice <- function(x, N) {
Mavg <- rollapply(x, N, mean)
Mavg
}
SpreadMA <- out[, lapply(.SD, MaPrice, N = 20)]
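For anyone who wants the result to keep one row per input row (instead of padding a matrix by hand), here is a rough sketch along the same lines. The table below is a hypothetical stand-in for out, using the first few values from the question with assumed column labels, and roll_mean is a made-up helper name. It relies on rollapply's fill and align arguments.

```r
library(zoo)
library(data.table)

# Hypothetical stand-in for the asker's table `out`; column labels are assumed
out <- data.table(
  time = as.POSIXct("2013-01-01 20:00", tz = "UTC") + 300 * (0:5),
  AUD  = c(0.213, 0.21191, 0.212185, 0.212055, 0.211225, 0.211185),
  NZD  = c(-0.30467, -0.30467, -0.304965, -0.30511, -0.30536, -0.30527)
)

# Trailing rolling mean; fill = NA pads the first N - 1 positions
roll_mean <- function(x, N) {
  rollapply(x, width = N, FUN = mean, fill = NA, align = "right")
}

# Every column except the timestamp gets a rolling mean
value_cols <- setdiff(names(out), "time")
ma <- out[, lapply(.SD, roll_mean, N = 2), .SDcols = value_cols]

# Put the timestamps back next to the averages
ma <- data.table(time = out$time, ma)
ma
```

The wrapper function is needed because passing FUN = mean straight through lapply would clash with lapply's own FUN argument. With .SDcols the code does not change when columns are added or removed.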

Count consecutive events

I have daily data for 1 year consisting of 0 and 1 values. I want to count, for each month, the events where the value 1 occurs on 3 or more consecutive days. How can I do this in R?
set.seed(123)
abts1 <- sample(0:1, 366, replace=TRUE)
library(xts)
d16 <- seq(as.Date("2016-01-01"), as.Date("2016-12-31"), 1)
ax16 <- as.Date(d16,"%y-%m-%d")
abts12 <- xts(abts1, ax16)
# but it gives events for complete period, not as monthly.
apply.monthly(abts12, function(x) sum(with(rle(c(x!=0)), lengths*values)>=3))
The last line of your code throws an error for me when I use xts_0.9-7.
R> apply.monthly(abts12, function(x) sum(with(rle(x!=0), lengths*values)>=3))
Error in rle(x != 0) : 'x' must be a vector of an atomic type
That's easy to fix though. You just need to wrap x != 0 in as.logical.
R> apply.monthly(abts12, function(x) sum(with(rle(as.logical(x!=0)), lengths*values)>=3))
[,1]
2016-01-31 2
2016-02-29 1
2016-03-31 3
2016-04-30 2
2016-05-31 1
2016-06-30 2
2016-07-31 3
2016-08-31 3
2016-09-30 2
2016-10-31 3
2016-11-30 0
2016-12-31 2
That seems like the output you expect: for each month, the number of times there are 3 or more consecutive days with a value of 1.
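To see what the rle expression is counting, here is a small base R sketch on made-up data. The helper name count_runs and the sample vector are assumptions for illustration only; tapply stands in for apply.monthly so that no packages are needed.

```r
# Count runs of non-zero values that last at least min_len observations
count_runs <- function(x, min_len = 3) {
  r <- rle(as.logical(x != 0))
  sum(r$values & r$lengths >= min_len)
}

# Made-up example covering 27 Jan to 6 Feb 2016
days <- seq(as.Date("2016-01-27"), by = "day", length.out = 11)
x    <- c(1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1)

# Runs over the whole series
count_runs(x)

# Runs counted separately within each month
monthly <- tapply(x, format(days, "%Y-%m"), count_runs)
monthly
```

In this sketch a run that crosses a month boundary is split at the boundary, so each part is judged on its own length within its month.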

Fastest way for filling-in missing dates for data.table

I am loading a data.table from CSV file that has date, orders, amount etc. fields.
The input file occasionally does not have data for all dates. For example, as shown below:
> NADayWiseOrders
date orders amount guests
1: 2013-01-01 50 2272.55 149
2: 2013-01-02 3 64.04 4
3: 2013-01-04 1 18.81 0
4: 2013-01-05 2 77.62 0
5: 2013-01-07 2 35.82 2
In the above, 03-Jan and 06-Jan do not have any entries.
I would like to fill the missing entries with default values (say, zero for orders, amount etc.), or carry the last value forward (e.g., 03-Jan will reuse the 02-Jan values and 06-Jan will reuse the 05-Jan values).
What is the best/optimal way to fill-in such gaps of missing dates data with such default values?
The answer here suggests using allow.cartesian = TRUE, and expand.grid for missing weekdays - it may work for weekdays (since they are just 7 weekdays) - but not sure if that would be the right way to go about dates as well, especially if we are dealing with multi-year data.
The idiomatic data.table way (using rolling joins) is this:
setkey(NADayWiseOrders, date)
all_dates <- seq(from = as.Date("2013-01-01"),
to = as.Date("2013-01-07"),
by = "days")
NADayWiseOrders[J(all_dates), roll=Inf]
date orders amount guests
1: 2013-01-01 50 2272.55 149
2: 2013-01-02 3 64.04 4
3: 2013-01-03 3 64.04 4
4: 2013-01-04 1 18.81 0
5: 2013-01-05 2 77.62 0
6: 2013-01-06 2 77.62 0
7: 2013-01-07 2 35.82 2
Here is how to fill in the gaps within each subgroup:
# a toy dataset with gaps in the time series
dt <- as.data.table(read.csv(textConnection('"group","date","x"
"a","2017-01-01",1
"a","2017-02-01",2
"a","2017-05-01",3
"b","2017-02-01",4
"b","2017-04-01",5')))
dt[,date := as.Date(date)]
# the desired dates by group
indx <- dt[,.(date=seq(min(date),max(date),"months")),group]
# key the tables and join them using a rolling join
setkey(dt,group,date)
setkey(indx,group,date)
dt[indx,roll=TRUE]
#> group date x
#> 1: a 2017-01-01 1
#> 2: a 2017-02-01 2
#> 3: a 2017-03-01 2
#> 4: a 2017-04-01 2
#> 5: a 2017-05-01 3
#> 6: b 2017-02-01 4
#> 7: b 2017-03-01 4
#> 8: b 2017-04-01 5
Not sure if it's the fastest, but it'll work if there are no NAs in the data:
# just in case these aren't Dates.
NADayWiseOrders$date <- as.Date(NADayWiseOrders$date)
# all desired dates.
alldates <- data.table(date=seq.Date(min(NADayWiseOrders$date), max(NADayWiseOrders$date), by="day"))
# merge
dt <- merge(NADayWiseOrders, alldates, by="date", all=TRUE)
# now carry forward last observation (alternatively, set NA's to 0)
require(xts)
na.locf(dt)
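For the other option in the question (default values instead of carrying the last observation forward), a minimal base R sketch could look like the following. It rebuilds the sample data from the question as a plain data.frame; the names orders, all_dates, filled and value_cols are made up for this example.

```r
# Sample data from the question, as a plain data.frame
orders <- data.frame(
  date   = as.Date(c("2013-01-01", "2013-01-02", "2013-01-04",
                     "2013-01-05", "2013-01-07")),
  orders = c(50, 3, 1, 2, 2),
  amount = c(2272.55, 64.04, 18.81, 77.62, 35.82),
  guests = c(149, 4, 0, 0, 2)
)

# Every calendar day between the first and last observed date
all_dates <- data.frame(date = seq(min(orders$date), max(orders$date), by = "day"))

# Left join: days without data come back as NA
filled <- merge(all_dates, orders, by = "date", all.x = TRUE)

# Replace the NAs in the value columns with the default of zero
value_cols <- setdiff(names(filled), "date")
filled[value_cols] <- lapply(filled[value_cols], function(col) {
  col[is.na(col)] <- 0
  col
})
filled
```

This assumes the original data has no NAs of its own, since those would be overwritten with zero as well.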

How to get latest value based on date in R?

I am trying to determine (guess) the current status based on the latest known status.
Assume that we have the following data frame (an abbreviated version of the real data):
examineData
ID Date Status_Value
A 2012-01-01 100
A 2012-01-10 200
A 2012-02-20 500
B 2012-01-01 1100
B 2012-01-10 1200
B 2012-02-20 1500
C 2012-01-01 2100
C 2012-01-10 2200
C 2012-02-20 2500
In the above, A, B and C are objects that each have a Status_Value. The Status_Values were examined on the given Date.
asked
ID Date
A 2012-01-09
A 2012-02-28
B 2012-02-19
C 2012-01-10
However, someone asks for the status of A, B and C (possibly fewer) on specific dates.
As you can see, some values of asked$Date do not match any examineData$Date.
In that case, we decided to use the latest earlier entry from examineData:
ID Date Status_Value
A 2012-01-09 100
A 2012-02-28 500
B 2012-02-19 1200
C 2012-01-10 2200
Could you give me some sample code? (Speed is important: examineData has 1,600,000 rows and asked has 110,000 rows.)
In addition, there are over 60,000 distinct IDs, and there are no duplicate dates within the same ID in examineData.
This seems to work:
examineData$Date <- as.Date(examineData$Date, format = "%Y-%m-%d")
asked$Date <- as.Date(asked$Date, format = "%Y-%m-%d")
#res <- unlist(lapply(split(examineData, examineData$ID),
# function(x) { merged <- c(x$Date, asked$Date[asked$ID == unique(x$ID)]) ;
# x$Status_Value[which(order(merged) %in% length(merged)) - 1] }))
I guess, though, a data.table solution might be more efficient than this.
EDIT: Modified solution, now that it is known there might be duplicate IDs in asked:
#dates should, still, be turned into actual dates if they aren't
#function to (m)apply over asked
fun <- function(id, date)
{
subsetted_examineData <- examineData[examineData$ID == id,]
merged <- c(subsetted_examineData$Date, date)
res <- subsetted_examineData$Status_Value[which(order(merged) %in% length(merged)) -1]
return(res)
}
res <- mapply(fun, asked$ID, asked$Date)
res
# A A B C
# 100 500 1200 2200
cbind(asked, Status_Value = unname(res))
# ID Date Status_Value
#1 A 2012-01-09 100
#2 A 2012-02-28 500
#3 B 2012-02-19 1200
#4 C 2012-01-10 2200
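As a follow-up to the remark that data.table might be more efficient, here is a sketch of how a rolling join could express the same lookup. It rebuilds the sample data from the question and has not been timed on data of the size mentioned, so treat the efficiency claim as an assumption.

```r
library(data.table)

# Sample data from the question
examineData <- data.table(
  ID   = rep(c("A", "B", "C"), each = 3),
  Date = rep(as.Date(c("2012-01-01", "2012-01-10", "2012-02-20")), times = 3),
  Status_Value = c(100, 200, 500, 1100, 1200, 1500, 2100, 2200, 2500)
)
asked <- data.table(
  ID   = c("A", "A", "B", "C"),
  Date = as.Date(c("2012-01-09", "2012-02-28", "2012-02-19", "2012-01-10"))
)

# For each row of asked, take the examineData row with the same ID and the
# most recent Date on or before the asked Date (roll = TRUE carries it forward)
res <- examineData[asked, on = .(ID, Date), roll = TRUE]
res
```

The result has one row per row of asked, in the same order, with Date taken from asked. An asked date earlier than every examined date for that ID would give NA.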
sel <- vector()
for(i in 1:length(unique(examineData$ID))){
id <- unique(examineData$ID)[i]
set <- subset(examineData,ID==id)
dif <- asked[asked$ID==id,"Date"] - set$Date
dif[dif<0] <- NA
sel[i] <- row.names(set)[which.min(dif)]
}
examineData[sel,]
To get this
ID Date Status_Value
1 A 2012-01-01 100
5 B 2012-01-10 1200
8 C 2012-01-10 2200
You can build in some "corrections" for missing values, but as you have not specified any, this is the clean way.
