Fill NA while keeping continuous scale [duplicate] - r

This question already has answers here:
How to replace NA values in a data.table with na.spline
(2 answers)
How to replace NA (missing values) in a data frame with neighbouring values
(3 answers)
Closed 2 years ago.
I'd like to know if there's a way to fill NA values while keeping a continuous scale for a numeric vector.
Suppose I have a vector like this:
set.seed(55)
as.list(missForest::prodNA(data.frame(a=c(1:100)),noNA=0.3))
$a
[1] 1 NA 3 NA 5 NA 7 8 9 10 11 12 13 14 15 16 17 18 19 NA
[21] 21 22 23 24 NA 26 27 28 29 30 31 32 33 NA 35 NA 37 38 39 40
[41] 41 42 43 NA 45 46 47 48 NA 50 51 52 53 54 55 56 57 NA NA 60
[61] 61 62 NA NA 65 66 NA NA NA NA NA NA NA 74 75 NA 77 NA 79 NA
[81] 81 82 NA 84 85 86 NA 88 89 90 91 92 NA 94 95 NA NA NA NA 100
How can I get
> as.list(data.frame(a=c(1:100)))
$a
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
[21] 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
[41] 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
[61] 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
[81] 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
by filling NA?

You can use zoo's na.spline
x <- missForest::prodNA(data.frame(a=c(1:100)),noNA=0.3)$a
zoo::na.spline(x)
#[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
#[16] 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
#[31] 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
#[46] 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
#[61] 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
#[76] 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
#[91] 91 92 93 94 95 96 97 98 99 100
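If adding a zoo dependency is not an option, a similar fill can be sketched in base R with stats::approx, which interpolates linearly rather than with a spline. This is a swapped-in technique, not what na.spline does internally; the short vector x below is made up for illustration, and the sketch assumes the first and last values are not NA.

```r
# Hypothetical short vector with gaps (first and last values are known)
x <- c(1, NA, 3, NA, 5, NA, 7, 8, NA, 10)

idx   <- seq_along(x)   # positions 1..length(x)
known <- !is.na(x)      # TRUE where a value is present

# Linear interpolation at every position, using only the known points
filled <- approx(idx[known], x[known], xout = idx)$y
filled
```

For evenly spaced data like this, linear and spline interpolation should agree; on curved data the two can differ noticeably.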


How to access very first object in differently deep nested lists?

I need to access the first element of a list. The problem is that the lists differ in how deeply they are nested. Here is an example:
list1 <- list(ts(1:100),
list(1:19,
factor(letters)))
list2 <- list(list(list(ts(1:100), data.frame(a= rnorm(100))),
matrix(rnorm(10))),
NA)
My expected output is the time series ts(1:100) for both lists, i.e. list1[[1]] and list2[[1]][[1]][[1]]. I've tried different things, among others lapply(list2, `[[`, 1), which does not work here.
Another base R solution - you could do this with a recursive function:
list1 <- list(ts(1:100),
list(1:19,
factor(letters)))
list2 <- list(list(list(ts(1:100), data.frame(a= rnorm(100))),
matrix(rnorm(10))),
NA)
recursive_fun <- function(my_list) {
if (inherits(my_list, 'list')) {
Recall(my_list[[1]])
} else {
my_list
}
}
Output:
> recursive_fun(list1)
Time Series:
Start = 1
End = 100
Frequency = 1
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
[31] 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
[61] 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100
> recursive_fun(list2)
Time Series:
Start = 1
End = 100
Frequency = 1
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
[31] 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
[61] 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100
You can use rrapply::rrapply:
library(rrapply)
firstList1 <- rrapply(list1, how = "flatten")[[1]]
firstList2 <- rrapply(list2, how = "flatten")[[1]]
all.equal(firstList1, firstList2)
# [1] TRUE
output
> rrapply(list1, how = "flatten")[[1]]
Time Series:
Start = 1
End = 100
Frequency = 1
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
[27] 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
[53] 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78
[79] 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
Using a while loop:
x <- list1
while (inherits(x <- x[[1]], "list")) {}
x
#> Time Series:
#> Start = 1
#> End = 100
#> Frequency = 1
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
#> [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
#> [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
#> [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
#> [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
#> [91] 91 92 93 94 95 96 97 98 99 100
x <- list2
while (inherits(x <- x[[1]], "list")) {}
x
#> Time Series:
#> Start = 1
#> End = 100
#> Frequency = 1
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
#> [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
#> [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
#> [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
#> [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
#> [91] 91 92 93 94 95 96 97 98 99 100
We can combine a while loop with purrr::pluck. This avoids an actual recursive function, which could be a problem with deeply nested lists.
library(purrr)
get_list <- function(x){
while(is.list(x)){
x <- pluck(x, 1)
}
x
}
We can also set the function to be called 'recursively' until it finds a "ts" class object:
get_list <- function(x){
while(!is(x, 'ts')){
x <- pluck(x, 1)
}
x
}
output
get_list(list2)
Time Series:
Start = 1
End = 100
Frequency = 1
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
[46] 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100
Another possible solution, using purrr::pluck and purrr::vec_depth:
library(tidyverse)
pluck(list1, !!!(rep(1, vec_depth(list1)-2) %>% as.list()))
#> Time Series:
#> Start = 1
#> End = 100
#> Frequency = 1
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
#> [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
#> [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
#> [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
#> [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
#> [91] 91 92 93 94 95 96 97 98 99 100
pluck(list2, !!!(rep(1, vec_depth(list2)-2) %>% as.list()))
#> Time Series:
#> Start = 1
#> End = 100
#> Frequency = 1
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
#> [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
#> [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
#> [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
#> [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
#> [91] 91 92 93 94 95 96 97 98 99 100
Base R solution: I've just had an idea for a pretty simple function. It is a while loop that runs until the element is no longer a list.
myfun <- function(mylist){
  dig_deeper <- TRUE
  while(dig_deeper){
    # step into the first element of the current level
    mylist <- mylist[[1]]
    # keep going only while that element is itself a list
    dig_deeper <- is.list(mylist)
  }
  return(mylist)
}
It works as expected
> myfun(list1)
Time Series:
Start = 1
End = 100
Frequency = 1
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
[25] 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
[49] 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96
[97] 97 98 99 100
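The answers above stop on two different conditions, inherits(x, "list") and is.list(x), and the two can disagree when the first leaf is a data frame: a data frame is a list by type, but its class is "data.frame". The sketch below uses hypothetical helper names and a made-up nested list to show the difference; it assumes no level is an empty list.

```r
# Stops as soon as the object no longer has class "list"
first_leaf_by_class <- function(x) {
  while (inherits(x, "list")) x <- x[[1]]
  x
}

# Stops only when the object is not list-like at all
first_leaf_by_type <- function(x) {
  while (is.list(x)) x <- x[[1]]
  x
}

# Hypothetical input whose first leaf is a data frame
nested <- list(list(data.frame(a = 1:3), "other"), NA)

by_class <- first_leaf_by_class(nested)  # returns the data frame itself
by_type  <- first_leaf_by_type(nested)   # digs into the data frame's first column
```

For the ts example in the question both variants should return the same object, since a ts is not a list.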

How to cut the values in a regular interval and define them into the separate group? [duplicate]

This question already has answers here:
Split a vector into chunks
(22 answers)
Closed 3 years ago.
How to cut the values (1 to 100) in a regular interval (25) and place them into 4 groups as below:
sdr <- c(1:100)
Group1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Group2: 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
Group3: 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
Group4: 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
Any suggestion, please.
You could use split
sdr <- 1:100
split(sdr, rep(1:4, each = 25))
#$`1`
# [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#
#$`2`
# [1] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
#
#$`3`
# [1] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
#
#$`4`
# [1] 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94
#[20] 95 96 97 98 99 100
This returns a list with 4 vector elements.
Also note that the c() around 1:100 is not necessary.
Or we can define the number of groups
ngroup <- 4
split(sdr, rep(1:ngroup, each = length(sdr) %/% ngroup))
giving the same result.
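Both calls rely on the vector length being an exact multiple of the group size. When that is not guaranteed, one common variant derives the group index from the positions instead; the sketch below uses a made-up vector of length 10 and a hypothetical chunk_size of 4.

```r
x <- 1:10          # length is not a multiple of the chunk size
chunk_size <- 4    # hypothetical chunk size

# Positions 1-4 map to group 1, 5-8 to group 2, 9-10 to group 3
groups <- ceiling(seq_along(x) / chunk_size)
chunks <- split(x, groups)
chunks
```

The last group simply ends up shorter, rather than split() recycling the grouping vector.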
You can make a data frame for your groups and then transpose it using t (note that the result of t is a matrix, not a data frame):
df <- t(data.frame(Group1 = c(1:25), Group2 = c(26:50), Group3 = c(51:75), Group4 = c(76:100)))

loop index in R not increasing by 1

This is a rather simple question: why does this R code not print the numbers from 1 to 100, but instead jump with the value of i? Is there a way to prevent this?
t <-5
for (i in 1:t){
print(20*(i-1)+1:20*i)
}
Posting this to get the question closed: the : operator binds more tightly than *, so 1:20*i is evaluated as (1:20)*i. Drop the *i:
t <-5
for (i in 1:t){
print(20*(i-1)+1:20)
}
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
#> [1] 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
#> [1] 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
#> [1] 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
#> [1] 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
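To see the precedence at work for a single value of i (here i = 2, chosen only for illustration), the three expressions below can be compared side by side. The variable names are hypothetical.

```r
i <- 2

scaled  <- 1:20 * i              # parsed as (1:20) * i: 2, 4, ..., 40
longer  <- 1:(20 * i)            # parentheses force the product first: 1, 2, ..., 40
shifted <- 20 * (i - 1) + 1:20   # parsed as 20 * (i - 1) + (1:20): 21, ..., 40

length(scaled)
length(longer)
range(shifted)
```

When in doubt, wrapping the intended range in parentheses makes the reading explicit.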

reprex setting output width

How do I set the width of a reprex output?
Say I have a code like this:
(x <- 1:100)
I get this with reprex::reprex(venue = "so")
(x <- 1:100)
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
#> [18] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
#> [35] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
#> [52] 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
#> [69] 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
#> [86] 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
How can I increase the width of the output to output something like this
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
[51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
Possible Solutions
One option that I have found, but find rather "un-tidy", is to include options(width = ...) at the top of the code. However, I don't want it to show up in the output; I'd prefer setting the width in the reprex call.
options(width = 205)
(x <- 1:100)
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
#> [51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
reprex() accepts knitr chunk options via opts_chunk, but I can't get it working with reprex::reprex(venue = "so", opts_chunk = list(out.width = 205)) (which might be related to #421, as pointed out here (Long lines of text output)).
Any better solutions?
reprex has a syntax for setting these options but not including them in the output markdown (see here for examples). In this case:
reprex({
#+ setup, include = FALSE
options(width=205)
#+ actual-reprex-code
(x <- 1:100)
}, venue = 'so')
outputs your desired format:
(x <- 1:100)
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
#> [51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
Created on 2018-09-21 by the reprex package (v0.2.1)
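Outside of reprex, the effect of the width option on printed vectors can be checked directly in a plain R session. The sketch below captures the printed lines at two widths and restores the previous setting afterwards; the variable names and the narrow width of 80 are illustrative choices.

```r
x <- 1:100

old <- options(width = 80)            # remember the previous setting
narrow <- capture.output(print(x))    # printed lines at width 80

options(width = 205)
wide <- capture.output(print(x))      # printed lines at width 205

options(old)                          # restore the original width

length(narrow)
length(wide)
```

The wider setting should need fewer lines for the same vector, which is the behaviour the hidden setup chunk relies on.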

How to do efficient vectorized update on multiple columns using data.tables?

I have the following code using data.frames, and I'm wondering how to write this using data.tables, using the most efficient, most vectorized code?
data.frame code:
set.seed(1)
to <- cbind(data.frame(time=seq(1:5),bananas=sample(100,5),apples=sample(100,5)),setNames(data.frame(matrix(sample(100,90,replace=T),nrow=5)),paste0(1:18)))
from <- cbind(data.frame(time=seq(1:5),blah=sample(100,5),foo=sample(100,5)),setNames(data.frame(matrix(sample(100,90,replace=T),nrow=5)),paste0(1:18)))
from
to
rownames(to) <- to$time
to[as.character(from$time),paste0(1:18)] <- from[,paste0(1:18)]
to
Running this:
> set.seed(1)
> to <- cbind(data.frame(time=seq(1:5),bananas=sample(100,5),apples=sample(100,5)),setNames(data.frame(matrix(sample(100,90,replace=T),nrow=5)),paste0(1:18)))
> from <- cbind(data.frame(time=seq(1:5),blah=sample(100,5),foo=sample(100,5)),setNames(data.frame(matrix(sample(100,90,replace=T),nrow=5)),paste0(1:18)))
> from
time blah foo 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 1 66 22 98 2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2 2 35 13 74 72 50 52 8 57 61 18 56 53 90 7 85 65 20 76 39 12
3 3 27 47 36 11 49 21 4 53 24 75 33 8 45 34 86 75 89 73 11 85
4 4 97 90 44 45 18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5 5 61 58 15 65 76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79
> to
time bananas apples 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 1 27 90 21 50 94 39 49 67 83 79 48 10 92 26 34 90 44 21 24 80
2 2 37 94 18 72 22 2 60 80 65 3 87 32 30 48 84 87 72 72 6 46
3 3 57 65 69 100 66 39 50 11 79 48 44 52 46 77 35 39 40 13 65 42
4 4 89 62 39 39 13 87 19 73 56 74 25 67 34 9 34 78 33 25 88 82
5 5 20 6 77 78 27 35 83 42 53 70 8 41 66 88 48 97 76 15 78 61
>
> rownames(to) <- to$time
> to[as.character(from$time),paste0(1:18)] <- from[,paste0(1:18)]
> to
time bananas apples 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 1 27 90 98 2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2 2 37 94 74 72 50 52 8 57 61 18 56 53 90 7 85 65 20 76 39 12
3 3 57 65 36 11 49 21 4 53 24 75 33 8 45 34 86 75 89 73 11 85
4 4 89 62 44 45 18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5 5 20 6 15 65 76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79
Basically, we update columns paste0(1:18) of to from columns paste0(1:18) of from, matching up the times.
data.tables apparently have some advantages, such as not needing head when printing them at the console, so I'm thinking about using them.
However I'd like not to have to write the := expressions by hand, ie try to avoid:
to[from,`1`:=i.`1`,`2`:=i.`2`, ..]
I'd also prefer to use vectorized syntax if possible, rather than some kind of for loop, ie try to avoid something like:
for( i in 1:18 ) {
to[from, sprintf("%d",i) := i.sprintf("%d",i)]
}
I read through the faq vignette, and the datatable-intro vignette, though I admit I probably haven't understood everything 100%.
I looked at Loop through columns in a data.table and transform those columns, but I can't say I understand it 100%, and it seems to say that I need to use a for loop?
There does seem to be some kind of a hint at the bottom of 8374816 that it might be possible to just use data frame syntax, adding with=FALSE? But since the data.frame procedure is hacking on the row names, I'm not sure how well / if that will work, and I wonder to what extent that makes use of the efficiencies of data.table?
Good question. The base construct you've shown :
to[as.character(from$time),paste0(1:18)] <- from[,paste0(1:18)]
works assuming row names can't be duplicated, or if they are then only the first is matched to. Here, the LHS of <- has the same number of rows as the RHS of <-.
data.table is different since routinely, multiple rows in to may match; the default for mult is "all". data.table also prefers long format to wide. So this question is kind of putting data.table through its paces for something it wasn't really designed for. If you have any NA in those 18 columns (i.e. sparse), then a long format may be more appropriate. If all 18 columns are the same type, then a matrix may be more appropriate.
That said, here are three data.table options for completeness.
1. Using := but without a for loop (multiple LHS and multiple RHS in LHS:=RHS)
from = as.data.table(from)
to = as.data.table(to)
from
time blah foo 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1: 1 66 22 98 2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2: 2 35 13 74 72 50 52 8 57 61 18 56 53 90 7 85 65 20 76 39 12
3: 3 27 47 36 11 49 21 4 53 24 75 33 8 45 34 86 75 89 73 11 85
4: 4 97 90 44 45 18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5: 5 61 58 15 65 76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79
to
time bananas apples 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1: 1 27 90 21 50 94 39 49 67 83 79 48 10 92 26 34 90 44 21 24 80
2: 2 37 94 18 72 22 2 60 80 65 3 87 32 30 48 84 87 72 72 6 46
3: 3 57 65 69 100 66 39 50 11 79 48 44 52 46 77 35 39 40 13 65 42
4: 4 89 62 39 39 13 87 19 73 56 74 25 67 34 9 34 78 33 25 88 82
5: 5 20 6 77 78 27 35 83 42 53 70 8 41 66 88 48 97 76 15 78 61
setkey(to,time)
setkey(from,time)
to[from,paste0(1:18):=from[.GRP,paste0(1:18),with=FALSE]]
time bananas apples 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1: 1 27 90 98 2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2: 2 37 94 74 72 50 52 8 57 61 18 56 53 90 7 85 65 20 76 39 12
3: 3 57 65 36 11 49 21 4 53 24 75 33 8 45 34 86 75 89 73 11 85
4: 4 89 62 44 45 18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5: 5 20 6 15 65 76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79
or
to[from,paste0(1:18):=from[,paste0(1:18),with=FALSE],mult="first"]
time bananas apples 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1: 1 27 90 98 2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2: 2 37 94 74 72 50 52 8 57 61 18 56 53 90 7 85 65 20 76 39 12
3: 3 57 65 36 11 49 21 4 53 24 75 33 8 45 34 86 75 89 73 11 85
4: 4 89 62 44 45 18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5: 5 20 6 15 65 76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79
Note I'm using latest v1.8.3, which is needed for option 1 to work (.GRP has just been added, and the outer with=FALSE is no longer needed).
2. Use one list column to store the length 18 vectors, rather than 18 columns
to = data.table( time=seq(1:5),
bananas=sample(100,5),
apples=sample(100,5),
v18=replicate(5,sample(100,18),simplify=FALSE))
from = data.table( time=seq(1:5),
blah=sample(100,5),
foo=sample(100,5),
v18=replicate(5,sample(100,18),simplify=FALSE))
setkey(to,time)
setkey(from,time)
from
time blah foo v18
1: 1 56 97 88,47,1,71,69,18,
2: 2 69 40 96,99,60,3,33,27,
3: 3 65 84 100,38,56,72,84,55,
4: 4 98 74 91,69,24,63,27,100,
5: 5 46 52 65,4,59,41,8,51,
to
time bananas apples v18
1: 1 66 73 100,36,74,77,68,46,
2: 2 19 37 84,88,92,8,37,52,
3: 3 94 77 37,94,13,7,93,43,
4: 4 88 2 27,93,71,16,46,66,
5: 5 91 91 85,94,58,49,19,1,
to[from,v18:=i.v18]
to
time bananas apples v18
1: 1 66 73 88,47,1,71,69,18,
2: 2 19 37 96,99,60,3,33,27,
3: 3 94 77 100,38,56,72,84,55,
4: 4 88 2 91,69,24,63,27,100,
5: 5 91 91 65,4,59,41,8,51,
If you are not used to list column printing, the trailing comma signifies that more items are in that vector. Just the first 6 are printed.
3. Use data.frame syntax on the data.table
to = as.data.table(to)
from = as.data.table(from)
setkey(to,time)
setkey(from,time)
from
time blah foo 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1: 1 66 22 98 2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2: 2 35 13 74 72 50 52 8 57 61 18 56 53 90 7 85 65 20 76 39 12
3: 3 27 47 36 11 49 21 4 53 24 75 33 8 45 34 86 75 89 73 11 85
4: 4 97 90 44 45 18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5: 5 61 58 15 65 76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79
to
time bananas apples 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1: 1 27 90 21 50 94 39 49 67 83 79 48 10 92 26 34 90 44 21 24 80
2: 2 37 94 18 72 22 2 60 80 65 3 87 32 30 48 84 87 72 72 6 46
3: 3 57 65 69 100 66 39 50 11 79 48 44 52 46 77 35 39 40 13 65 42
4: 4 89 62 39 39 13 87 19 73 56 74 25 67 34 9 34 78 33 25 88 82
5: 5 20 6 77 78 27 35 83 42 53 70 8 41 66 88 48 97 76 15 78 61
to[from, paste0(1:18)] <- from[,paste0(1:18),with=FALSE]
to
time bananas apples 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1: 1 27 90 98 2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2: 2 37 94 74 72 50 52 8 57 61 18 56 53 90 7 85 65 20 76 39 12
3: 3 57 65 36 11 49 21 4 53 24 75 33 8 45 34 86 75 89 73 11 85
4: 4 89 62 44 45 18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5: 5 20 6 15 65 76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79
So the LHS of <- can use data.table keyed join syntax; i.e. to[from]. It's just that this method (currently in R) will copy the entire to dataset. That's what := was introduced to avoid, by providing update by reference. Also, if each row in from matches multiple rows in to, then the RHS of <- would need to be expanded to line up (by you, the user); otherwise the RHS would be recycled to fill up the LHS. That's one reason why, in data.table, we like := being inside j, all inside [...].
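With a more recent data.table than the v1.8.3 used above, an update join is often written with on= and the i. prefix instead of keys and .GRP. The sketch below is not part of the original answer: it assumes a data.table version that supports on=, uses mget() to collect the i.-prefixed columns, and works on made-up tables with 3 value columns instead of 18.

```r
library(data.table)

cols <- paste0(1:3)   # hypothetical: 3 value columns instead of 18

# Made-up target and source tables; `from` covers only times 3 and 1
to <- data.table(time = 1:3, bananas = c(10, 20, 30),
                 `1` = 0L, `2` = 0L, `3` = 0L)
from <- data.table(time = c(3L, 1L), blah = c(7, 8),
                   `1` = c(31L, 11L), `2` = c(32L, 12L), `3` = c(33L, 13L))

# Update join by reference: for each row of `from`, overwrite the matching
# row of `to`; i.<name> refers to the column <name> taken from `from`
to[from, on = "time", (cols) := mget(paste0("i.", cols))]
to
```

Rows of to with no match in from (time 2 here) should be left unchanged, and to is modified in place rather than copied.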
