Binning data across interval boundaries in R

Say I have these data:
       start       end  duration
1    2.67026  2.903822  0.233562
2    4.40529  5.606470  1.201180
3    9.24340 10.010818  0.767418
4   11.87930 13.414140  1.534840
5   14.78210 15.182492  0.400392
6   16.51720 16.817494  0.300294
7   22.08930 25.125610  3.036310
8   32.13240 33.667240  1.534840
9   45.47880 45.912558  0.433758
10  52.85270 54.454270  1.601570
11  55.62210 56.389518  0.767418
They represent 11 events that occurred within a minute. Each has a start and end time (in seconds) and the duration of that event (in seconds).
What I want to calculate is how many seconds were spent doing these events in each 10 second bin/epoch.
A standard way of binning data in data.table would be to do something like:
as.data.table(df)[, .(total = sum(duration)), by = .(INTERVAL = cut(end, seq(0,60,10)))]
INTERVAL total
1: (0,10] 1.434742
2: (10,20] 3.002944
3: (20,30] 3.036310
4: (30,40] 1.534840
5: (40,50] 0.433758
6: (50,60] 2.368988
However, note that event 3 starts at 9.24340 seconds and ends at 10.010818 seconds. This method has only summed the durations of the first two events in the interval (0,10]. I want that first interval to also include the 10 - 9.24340 = 0.7566 seconds of event 3 that fall before the cut point, i.e. it should total 2.191342 seconds. That amount should correspondingly be removed from the second interval, which should then be 2.246344 seconds.
In this example, 0-10 / 10-20 seconds are the only intervals where an event spans a cut point; however, I obviously need a solution that generalizes to any number of potential cut points.
I think a solution might be to convert the times to datetime format (including milliseconds?) and use that to cut the data; however, I wasn't able to make that work.
EDIT following @Arun's answer:
@Arun's answer works well for the above problem. But what if we want to include all intervals, even those where the summed duration is 0?
Example:
set.seed(1)
df<-
data.frame(
start=c(2.3, 3.5,6.7,9.4,10.4,13.5,16.3,18.1),
duration=runif(8,0,1)
)
df$end<-df$start+df$duration
dt<-data.table(df)
dt
start duration end
1: 2.3 0.2655087 2.565509
2: 3.5 0.3721239 3.872124
3: 6.7 0.5728534 7.272853
4: 9.4 0.9082078 10.308208
5: 10.4 0.2016819 10.601682
6: 13.5 0.8983897 14.398390
7: 16.3 0.9446753 17.244675
8: 18.1 0.6607978 18.760798
Following Arun's solution:
lookup = data.table(start = seq(0, 18, by = 2), end = seq(2, 20, by = 2))
ans = foverlaps(dt, setkey(lookup, start, end))
ans[, sum(pmin(i.end, end) - pmax(i.start, start)), by=.(start,end)]
Result:
   start end        V1
1:     2   4 0.6376326
2:     6   8 0.5728534
3:     8  10 0.6000000
4:    10  12 0.5098897
5:    12  14 0.5000000
6:    14  16 0.3983897
7:    16  18 0.9446753
8:    18  20 0.6607978
Notice the intervals 0-2 and 4-6 are not included in the result. Obviously, we could bind these back in - but I wonder if this can be done simply by adjusting the data.table code?

Here's a way I could think of with foverlaps().
require(data.table) # v1.9.5+ (due to bug fixes in foverlaps for double)
lookup = data.table(start = seq(0, 50, by = 10), end = seq(10, 60, by = 10))
# start end
# 1: 0 10
# 2: 10 20
# 3: 20 30
# 4: 30 40
# 5: 40 50
# 6: 50 60
ans = foverlaps(dt, setkey(lookup, start, end))
ans[, sum(pmin(i.end, end) - pmax(i.start, start)), by=.(start,end)]
# start end V1
# 1: 0 10 2.191342
# 2: 10 20 2.246344
# 3: 20 30 3.036310
# 4: 30 40 1.534840
# 5: 40 50 0.433758
# 6: 50 60 2.368988
I feel like there may be better options out there though..
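Regarding the EDIT: one way that should keep the empty bins is to join the aggregated result back onto lookup and fill the missing totals with zero. A minimal sketch (assuming a recent data.table with on= joins); with the EDIT's dt and lookup, the 0-2 and 4-6 bins should then appear with a total of 0:
res = foverlaps(dt, setkey(lookup, start, end))[
  , .(total = sum(pmin(i.end, end) - pmax(i.start, start))), by = .(start, end)]
res[lookup, on = c("start", "end")][is.na(total), total := 0][]  # all lookup intervals, empty ones as 0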

Network Trip Assignment with igraph

My problem:
I have a street network (df.net) and a list containing the Origins and Destinations of trips (df.trips).
I need to find the flow on all links.
library(dplyr)
df.net = tribble(~from, ~to, ~weight,1,2,1,2,1,1,1,9,3,9,1,2,2,10,1,10,2,2,9,10,8,10,9,15,9,8,1,8,9,2,7,8,2,12,7,3,9,12,10,12,9,9,12,6,2,6,12,5,11,12,3,12,11,3,5,6,1,11,5,4,5,11,3,11,4,3,4,3,5,3,10,4,10,11,10)
df.trips = tribble(~from, ~to, ~N,1,2,45,1,4,24,1,5,66,1,9,12,1,11,54,2,3,63,2,4,22,2,7,88,2,12,44,3,2,6,3,8,43,3,10,20,3,11,4,4,1,9,4,5,7,4,6,35,4,9,1,5,7,55,5,8,21,5,1,23,5,7,12,5,2,18,6,2,31,6,3,6,6,5,15,6,8,19,7,1,78,7,2,48,7,3,92,7,6,6,8,2,77,8,4,5,8,5,35,8,6,63,8,7,22)
This is my solution:
library(igraph)
# I construct a directed igraph network:
graph = igraph::graph_from_data_frame(d=df.net, directed=T)
plot(graph)
# I make a vector of edge_ids:
edges = paste0(df.net$from,":",df.net$to)
# and an empty vector of same length to fill with the flow afterwards:
N = integer(length(edges))
# I loop through all Origin-Destination-pairs:
for(i in 1:nrow(df.trips)){
  # provides one shortest path between one Origin & one Destination:
  path = shortest_paths(graph = graph,
                        from = as.character(df.trips$from[i]),
                        to = as.character(df.trips$to[i]),
                        mode = "out",
                        weights = NULL)
  # Extract the names of vertices on the path:
  a = names(path$vpath[[1]])
  # Make a vector of the edge_ids:
  a2 = a[2:length(a)]
  a = a[1:(length(a)-1)]
  a = paste0(a,":",a2)
  # and fill the vector with the trips
  v = integer(length(edges))
  v[edges %in% a] = pull(df.trips[i,3])
  # adding the trips of this iteration to the sum
  N = N + v
}
# attach vector to network-dataframe:
df.net = data.frame(df.net, N)
Theoretically it works. It just takes approx. 8h for my real network to finish (about 500 000 Origin-Destination-pairs on a network with a bit less than 50 000 links).
I am pretty sure my for-loop is the culprit.
So my questions concerning optimization are:
1) Is there an igraph function which simply does what I want to do? I could not find it...
2) Maybe there is another package better suited to my needs which I haven't stumbled upon?
3) If not, should I go for loop-performance improvement by rewriting it with the Rcpp-package?
Anyways, I am grateful for any help you can provide me.
Thanks in advance!
I have what I hope is a faster solution, although I get slightly different results from yours.
This approach multithreads with data.table, calls igraph::shortest_paths only once per from vertex, and avoids using the graph's name attribute until the trivial last step.
library(igraph)
library(tibble)
library(data.table)
library(zoo)
library(purrr)
df.net = tribble(~from, ~to, ~weight,1,2,1,2,1,1,1,9,3,9,1,2,2,10,1,10,2,2,9,10,8,10,9,15,9,8,1,8,9,2,7,8,2,12,7,3,9,12,10,12,9,9,12,6,2,6,12,5,11,12,3,12,11,3,5,6,1,11,5,4,5,11,3,11,4,3,4,3,5,3,10,4,10,11,10)
graph = igraph::graph_from_data_frame(d=df.net, directed=T)
df.trips = tribble(~from, ~to, ~N,1,2,45,1,4,24,1,5,66,1,9,12,1,11,54,2,3,63,2,4,22,2,7,88,2,12,44,3,2,6,3,8,43,3,10,20,3,11,4,4,1,9,4,5,7,4,6,35,4,9,1,5,7,55,5,8,21,5,1,23,5,7,12,5,2,18,6,2,31,6,3,6,6,5,15,6,8,19,7,1,78,7,2,48,7,3,92,7,6,6,8,2,77,8,4,5,8,5,35,8,6,63,8,7,22)
l.trips <- split(df.trips,1:nrow(df.trips))
setDT(df.trips)
Result <- df.trips[, setnames(
  lapply(shortest_paths(graph = graph, from = from, to = to,
                        weights = NULL, mode = "out")$vpath,
         function(x){ zoo::rollapply(x, width = 2, c) }) %>%
    map2(., N, ~ {.x %x% rep(1, .y)} %>% as.data.frame) %>%
    rbindlist %>%
    .[, .N, by = c("V1", "V2")],
  c("new.from", "new.to", "N")), by = from][, sum(N), by = c("new.from", "new.to")]
Result[, `:=`(new.from = V(graph)$name[Result$new.from],
              new.to   = V(graph)$name[Result$new.to])]
# new.from new.to V1
# 1: 1 2 320
# 2: 2 10 161
# 3: 1 9 224
# 4: 9 8 73
# 5: 10 11 146
# 6: 11 4 102
# 7: 2 1 167
# 8: 9 12 262
# 9: 4 3 44
#10: 9 1 286
#11: 12 6 83
#12: 12 11 24
#13: 11 5 20
#14: 10 2 16
#15: 11 12 35
#16: 12 7 439
#17: 8 9 485
#18: 7 8 406
#19: 6 12 202
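For comparison, a rough sketch (untested on the large network) that accumulates flows directly on edge ids via output = "epath", calling shortest_paths() once per origin with all of that origin's destinations passed at once; it assumes the edge order of graph matches the row order of df.net, which graph_from_data_frame() preserves:
flow <- numeric(ecount(graph))
for (o in unique(df.trips$from)) {
  dest <- df.trips[df.trips$from == o, ]
  sp <- shortest_paths(graph, from = as.character(o), to = as.character(dest$to),
                       mode = "out", weights = NULL, output = "epath")
  for (j in seq_along(sp$epath)) {
    eids <- as.integer(sp$epath[[j]])     # edge ids along the j-th path
    flow[eids] <- flow[eids] + dest$N[j]  # add this OD pair's trips to its edges
  }
}
df.net$N <- flow                          # edge i corresponds to row i of df.net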

Use previous calculated row value in r Continued 2

I have a data.table that looks like this:
library(data.table)
DT <- data.table(A=1:20, B=1:20*10, C=1:20*100)
DT
A B C
1: 1 10 100
2: 2 20 200
3: 3 30 300
4: 4 40 400
5: 5 50 500
...
20: 20 200 2000
I want to calculate a new column "R" whose first value is
DT$R[1] <- tanh(DT$B[1]/400000)
and then use each row's value of R to help calculate the next row's value of R:
DT$R[2] <- 0.5*tanh(DT$B[2]/400000) + DT$R[1]*0.6
DT$R[3] <- 0.5*tanh(DT$B[3]/400000) + DT$R[2]*0.6
DT$R[4] <- 0.5*tanh(DT$B[4]/400000) + DT$R[3]*0.6
This will then look a bit like this
A B C R
1: 1 10 100 2.5e-05
2: 2 20 200 4e-05
3: 3 30 300 6.15e-05
4: 4 40 400 8.69e-05
5: 5 50 500 0.00011464
...
20: 20 200 2000 0.0005781274
Any ideas on how this could be done?
Is this what you are looking for ?
DT <- data.table(A=1:20, B=1:20*10, C=1:20*100)
DT$R = 0
DT$R[1]<-tanh(DT$B[1]/400000)
for(i in 2:nrow(DT)) {
  DT$R[i] <- 0.5*tanh(DT$B[i]/400000) + DT$R[i-1]*0.6
}
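A vectorised sketch of the same recurrence: since R[i] = 0.5*tanh(B[i]/400000) + 0.6*R[i-1] is a first-order linear recursion, stats::filter() with method = "recursive" should reproduce the loop (an assumption on my part, not part of the answer above):
x <- 0.5 * tanh(DT$B / 400000)
x[1] <- tanh(DT$B[1] / 400000)   # first row uses the full tanh term, per the question
DT[, R := as.numeric(stats::filter(x, filter = 0.6, method = "recursive"))]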

R Compute Statistics on Lagged Partitions

I have a data.frame with one column containing categorical data, one column containing dates, and one column containing numeric values. For simplicity, see the sample below:
A B C
1 L 2015-12-01 5.7
2 M 2015-11-30 2.1
3 K 2015-11-01 3.2
4 L 2015-10-05 5.7
5 M 2015-12-05 1.2
6 L 2015-11-15 2.3
7 L 2015-12-03 4.4
I would like to, for each category in A, compute a lagging average (e.g. average of the previous 30 days' values in column C).
I cannot for the life of me figure this one out. I have tried using sapply with a custom function that subsets the data.frame on category and date (or a deep copy of it) and returns the statistic (think mean or sd); that works fine for single values, but it returns all NAs from inside sapply.
Any help you can give is appreciated.
This could be done more compactly, but here I have drawn it out to make it easiest to understand. The core is the split, lapply/apply, and then putting it back together. It uses a date window rather than a solution based on sorting, so it is very general. I also put the object back to its original order to enable direct comparison.
# set up the data
set.seed(100)
# create a data.frame with about a two-month period for each category of A
df <- data.frame(A = rep(c("K", "L", "M"), each = 60),
B = rep(seq(as.Date("2015-01-01"), as.Date("2015-03-01"), by="days"), 3),
C = round(runif(180)*6, 1))
head(df)
## A B C
## 1 K 2015-01-01 1.8
## 2 K 2015-01-02 1.5
## 3 K 2015-01-03 3.3
## 4 K 2015-01-04 0.3
## 5 K 2015-01-05 2.8
## 6 K 2015-01-06 2.9
tail(df)
## A B C
## 175 M 2015-02-24 4.8
## 176 M 2015-02-25 2.0
## 177 M 2015-02-26 5.7
## 178 M 2015-02-27 3.9
## 179 M 2015-02-28 2.8
## 180 M 2015-03-01 3.6
# preserve original order
df$originalOrder <- 1:nrow(df)
# randomly shuffle the order
randomizedOrder <- order(runif(nrow(df)))
df <- df[randomizedOrder, ]
# split on A - your own data might need coercion of A to a factor
df.split <- split(df, df$A)
# set the window size
window <- 30
# compute the moving average
listD <- lapply(df.split, function(tmp) {
  apply(tmp, 1, function(x) mean(tmp$C[tmp$B <= as.Date(x["B"]) & tmp$B > (as.Date(x["B"]) - window)]))
})
# combine the result with the original data
result <- cbind(do.call(rbind, df.split), rollingMean = unlist(listD))
# and tidy up:
# return to original order
result <- result[order(result$originalOrder), ]
result$originalOrder <- NULL
# remove the row names
row.names(result) <- NULL
result[c(1:5, 59:65), ]
## A B C rollingMean
## 1 K 2015-01-01 1.8 1.800000
## 2 K 2015-01-02 1.5 1.650000
## 3 K 2015-01-03 3.3 2.200000
## 4 K 2015-01-04 0.3 1.725000
## 5 K 2015-01-05 2.8 1.940000
## 59 K 2015-02-28 3.6 3.080000
## 60 K 2015-03-01 1.3 3.066667
## 61 L 2015-01-01 2.8 2.800000
## 62 L 2015-01-02 3.9 3.350000
## 63 L 2015-01-03 5.8 4.166667
## 64 L 2015-01-04 4.1 4.150000
## 65 L 2015-01-05 2.7 3.860000
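For reference, a hedged data.table sketch of the same trailing mean via a non-equi self-join (my own addition, assuming data.table 1.9.8+ and the df and window objects defined above); each row's mean is taken over rows of the same category whose date falls inside that row's window:
library(data.table)
dt  <- as.data.table(df)[, .(A, B, C)]
win <- dt[, .(A, upper = B, lower = B - window)]   # each row's window bounds
dt[, rollingMean := dt[win, on = c("A", "B<=upper", "B>lower"),
                       mean(C), by = .EACHI]$V1]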

Calculate mean of a proportion of the data.frame

I'm working with data that looks similar to this:
cat value n
1 100 18
2 0 19
3 -100 15
4 100 13
5 0 17
6 -100 18
In the real data, there are many cats and value can be any number between -100 and 100 (no NA).
What I want to do is to calculate the mean of value based on terciles defined by n
So, for example, since sum(n)=100 what I want to do is to get n's as close as possible to 33 and calculate the mean of value. So for the first tercile, 18 isn't quite 33, so I need to take 15 values from cat=2. So the mean for the first tercile should be (100*18+0*15)/(18+15). The second tercile would be the remaining ns from cat=2, then as many as are needed to get to 33: (0*4+-100*15+100*13+0*1)/(4+15+13+1). Similar for the last tercile.
I got started writing this, but ended up with lots of nasty for loops and if statements. I'm hoping that you see an easier way to deal with this than I do. Thanks in advance!
A solution with data.table:
setDT(df)[rep(1:.N,n)
][,indx:=c(rep("a",33),rep("b",33),rep("c",34))
][,.(mean_val_indx=mean(value)),by=indx]
this gives:
indx mean_val_indx
1: a 54.545455
2: b -6.060606
3: c -52.941176
Which are the means of value for the three parts of the data.
Broken down in the intermediate steps:
1: replicate the rows according to n
setDT(df)[rep(1:.N,n)]
this gives (shortened):
cat value n
1: 1 100 18
2: 1 100 18
....
17: 1 100 18
18: 1 100 18
19: 2 0 19
20: 2 0 19
....
36: 2 0 19
37: 2 0 19
38: 3 -100 15
....
99: 6 -100 18
100: 6 -100 18
2: create an index with [,indx:=c(rep("a",33),rep("b",33),rep("c",34))]
setDT(df)[rep(1:.N,n)
][,indx:=c(rep("a",33),rep("b",33),rep("c",34))]
this gives:
> dt
cat value n indx
1: 1 100 18 a
2: 1 100 18 a
....
17: 1 100 18 a
18: 1 100 18 a
19: 2 0 19 a
20: 2 0 19 a
....
32: 2 0 19 a
33: 2 0 19 a
34: 2 0 19 b
35: 2 0 19 b
....
99: 6 -100 18 c
100: 6 -100 18 c
3: summarise value by indx with [,.(mean_val_indx=mean(value)),by=indx]
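As a quick sanity check, the weighted sums described in the question give the same numbers (first tercile 33 values, second 33, third 34):
(100*18 + 0*15) / 33                  # first tercile:   54.54545
(0*4 + -100*15 + 100*13 + 0*1) / 33   # second tercile:  -6.060606
(0*16 + -100*18) / 34                 # third tercile:  -52.94118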
You could try something like this, data being your example dataframe:
longData<-unlist(apply(data[,c("value","n")],1,function(x){
rep(x["value"],x["n"])
}))
aggregate(longData,list(cut(seq_along(longData),breaks=3,right=FALSE)),mean)
longData will be a vector of length 100 with, using your example, 18 repetitions of -100, 19 repetitions of 0 etc.
The cut in the aggregate will divide longData into three groups, and the mean of each group will be calculated.
If the data is already very long, replicating the rows by "n" is perhaps unwanted.
The following solution doesn't do this. Moreover, 1/3 of the sum of the
"n" values is not rounded to the nearest integer.
"i" is the vector of row numbers where terciles end. Since it is possible
that several terciles end at the same row, those row numbers are replicated;
the result is the vector "k".
For each index "j", the cumulative sum of "data$value" * "data$n" up to row "k[j]"
covers "ms[k[j]]" terciles, so the excess "ms[k[j]] - j" terciles has to be subtracted
to get the cumulative sum up to the "j"th tercile.
m <- 3
sn <- sum(data$n)
ms <- m * cumsum(data$n) / sn
d <- diff(c(0,floor(ms)))
i <- which(d>0)
k <- rep(i,d[i])
vn <- data$value * data$n
sums <- cumsum(vn)[k] - (ms[k]-(1:m))*data$value[k]*sn/m
means <- m*diff(c(0,sums))/sn
The means of the terciles are:
> means
[1] 54 -6 -54
In this example "i" is equal to "k". But if terciles are replaced by deciles,
i.e. "m" is not 3 but 10, they are distinct:
> m
[1] 10
> i
[1] 1 2 3 4 5 6
> k
[1] 1 2 2 3 3 4 5 5 6 6
> means
[1] 100 80 0 -30 -100 60 50 0 -80 -100
I compared the speed of the 4 answers, using our small example with 8 rows:
> ##### "longData"-Answer #####
>
> system.time( for ( i in 1:1000 ) { A1 <- f1(data) } )
   user  system elapsed
   3.48    0.00    3.49
> ##### "sapply"-Answer #####
>
> system.time( for ( i in 1:1000 ) { A2 <- f2(data) } )
   user  system elapsed
   1.00    0.00    0.99
> ##### "data.table"-Answer #####
>
> system.time( for ( i in 1:1000 ) { A3 <- f3(data) } )
   user  system elapsed
   4.73    0.00    4.79
> ##### this Answer #####
>
> system.time( for ( i in 1:1000 ) { A4 <- f4(data) } )
   user  system elapsed
   0.43    0.00    0.44
The "sapply"-Answer is even false:
> A1
Group.1 x
1 [0.901,34) 54.545455
2 [34,67) -6.060606
3 [67,100) -52.941176
> A2
(0,33] (33,67] (67,100]
-100.00000 0.00000 93.93939
> A3
indx mean_val_indx
1: a 54.545455
2: b -6.060606
3: c -52.941176
> A4
[1] 54 -6 -54
>
This is basically the same as NicE's answer, although perhaps useful as a different way of assembling the rep and cutting operations:
sapply(split( sort(unlist( mapply(rep, res$value, res$n) )),
cut(seq(sum(res$n)), breaks=c(0,33,67,100) )),
mean)
(0,33] (33,67] (67,100]
-100.00000 0.00000 93.93939

split apply recombine, plyr, data.table in R

I am doing the classic split-apply-recombine thing in R. My data set is a bunch of firms over time. The applying I am doing is running a regression for each firm and returning the residuals; therefore, I am not aggregating by firm. plyr is great for this, but it takes a very, very long time to run when the number of firms is large. Is there a way to do this with data.table?
Sample Data:
dte, id, val1, val2
2001-10-02, 1, 10, 25
2001-10-03, 1, 11, 24
2001-10-04, 1, 12, 23
2001-10-02, 2, 13, 22
2001-10-03, 2, 14, 21
I need to split by each id (namely 1 and 2). Run a regression, return the residuals and append it as a column to my data. Is there a way to do this using data.table?
DWin's answer is correct for v1.8.0 (as currently on CRAN). But in v1.8.1 (on R-Forge repository), := now works by group. It works for non-contiguous groups too so there is no need to setkey first for it to line up.
dtb <- as.data.table(dat)
dtb
dte id val1 val2
1: 2001-10-02 1 10 25
2: 2001-10-03 1 11 24
3: 2001-10-04 1 12 23
4: 2001-10-02 2 13 22
5: 2001-10-03 2 14 21
dtb[, resid:=residuals(lm(val1 ~ val2)), by=id]
dte id val1 val2 resid
1: 2001-10-02 1 10 25 1.631688e-15
2: 2001-10-03 1 11 24 -3.263376e-15
3: 2001-10-04 1 12 23 1.631688e-15
4: 2001-10-02 2 13 22 0.000000e+00
5: 2001-10-03 2 14 21 0.000000e+00
To upgrade to v1.8.1, just install from the R-Forge repo (R 2.15.0+ is needed when installing any binary package from R-Forge):
install.packages("data.table", repos="http://R-Forge.R-project.org")
or install from source if you can't upgrade to latest R. data.table itself only needs R 2.12.0+.
Extending to the 1MM case :
DT = data.table(dte=Sys.Date()+1:1000000,
id=sample(1:2, 1000000, repl=TRUE),
val1=runif(1000000), val2=runif(1000000) )
setkey(DT, id)
system.time(ans1 <- cbind(DT, DT[, residuals(lm(val1 ~ val2)), by="id"]) )
user system elapsed
12.272 0.872 13.182
ans1
dte id val1 val2 id V1
1: 2012-07-02 1 0.8369147 0.57553383 1 0.336647598
2: 2012-07-05 1 0.0109102 0.02532214 1 -0.488633325
3: 2012-07-06 1 0.4977762 0.16607786 1 -0.001952414
---
999998: 4750-05-27 2 0.1296722 0.62645838 2 -0.370627034
999999: 4750-05-28 2 0.2686352 0.04890710 2 -0.231952238
1000000: 4750-05-29 2 0.9981029 0.91626787 2 0.497948275
system.time(DT[, resid:=residuals(lm(val1 ~ val2)), by=id])
user system elapsed
7.436 0.648 8.107
DT
dte id val1 val2 resid
1: 2012-07-02 1 0.8369147 0.57553383 0.336647598
2: 2012-07-05 1 0.0109102 0.02532214 -0.488633325
3: 2012-07-06 1 0.4977762 0.16607786 -0.001952414
---
999998: 4750-05-27 2 0.1296722 0.62645838 -0.370627034
999999: 4750-05-28 2 0.2686352 0.04890710 -0.231952238
1000000: 4750-05-29 2 0.9981029 0.91626787 0.497948275
The example above only has 2 groups, is quite small at under 40MB, and Rprof shows 96% of the time is spent in lm. So in these cases := by group is not for a speed advantage really, but more for the convenience; i.e., less code needed to write and no superfluous columns added to the output. As size grows, the avoidance of copies comes into it and speed advantages start to show. Especially, transform in j will slow down terribly as the number of groups increases.
I'm guessing this needs to be sorted by "id" to line up properly. Luckily that happens automatically when you set the key:
dat <-read.table(text="dte, id, val1, val2
2001-10-02, 1, 10, 25
2001-10-03, 1, 11, 24
2001-10-04, 1, 12, 23
2001-10-02, 2, 13, 22
2001-10-03, 2, 14, 21
", header=TRUE, sep=",")
dtb <- data.table(dat)
setkey(dtb, "id")
dtb[, residuals(lm(val1 ~ val2)), by="id"]
#---------------
cbind(dtb, dtb[, residuals(lm(val1 ~ val2)), by="id"])
#---------------
dte id val1 val2 id.1 V1
[1,] 2001-10-02 1 10 25 1 1.631688e-15
[2,] 2001-10-03 1 11 24 1 -3.263376e-15
[3,] 2001-10-04 1 12 23 1 1.631688e-15
[4,] 2001-10-02 2 13 22 2 0.000000e+00
[5,] 2001-10-03 2 14 21 2 0.000000e+00
> dat <- data.frame(dte=Sys.Date()+1:1000000,
id=sample(1:2, 1000000, repl=TRUE),
val1=runif(1000000), val2=runif(1000000) )
> dtb <- data.table(dat)
> setkey(dtb, "id")
> system.time( cbind(dtb, dtb[, residuals(lm(val1 ~ val2)), by="id"]) )
user system elapsed
1.696 0.798 2.466
> system.time( dtb[,transform(.SD,r = residuals(lm(val1~val2))),by = "id"] )
user system elapsed
1.757 0.908 2.690
EDIT from Matthew :
This is all correct for v1.8.0 on CRAN. With the small addition that transform in j is the subject of data.table wiki point 2: "For speed don't transform() by group, cbind() afterwards". But, := now works by group in v1.8.1 and is both simple and fast. See my answer for illustration (but no need to vote for it).
Well, I voted for it. Here is the console command to install v1.8.1 on a Mac (if you have the proper Xcode tools available, since it is only there in source):
install.packages("data.table", repos= "http://R-Forge.R-project.org", type="source",
lib="/Library/Frameworks/R.framework/Versions/2.14/Resources/lib")
(For some reason I could not get the Mac GUI Package Installer to read r-forge as a repository.)
