Using barplot in R does not match the data

I want to use barplot (or any better option) to plot the following data:
action_number times
1 1 13408
2 2 5550
3 3 2757
4 4 1782
5 5 1114
6 6 847
7 7 582
8 8 410
9 9 306
10 10 278
11 11 212
12 12 165
13 13 139
14 14 112
15 15 106
16 16 82
17 17 64
18 18 61
19 19 69
20 20 47
21 21 31
22 22 40
23 23 34
24 24 31
25 25 28
26 26 26
27 27 21
28 28 16
29 29 14
30 30 16
31 31 11
32 32 10
33 33 11
34 34 10
35 35 4
36 36 6
37 37 5
38 38 8
39 39 6
40 40 3
41 41 6
42 42 8
43 43 3
44 44 3
45 45 7
46 46 8
47 47 4
48 48 4
49 49 1
50 50 4
51 51 2
52 52 4
53 53 3
54 54 1
55 55 2
56 56 1
57 58 2
58 59 4
59 60 1
60 62 2
61 63 1
62 66 1
63 67 4
64 68 2
65 69 1
66 70 1
67 71 1
68 73 1
69 74 1
70 77 1
71 79 1
72 80 1
73 82 1
74 92 2
75 97 1
76 98 1
77 103 1
78 106 1
79 114 1
80 118 1
81 128 1
82 142 1
83 148 1
84 153 1
85 155 1
86 166 1
87 183 1
88 218 1
89 224 1
90 298 1
91 536 1
I am using the following, but it does not match the data correctly:
mp <- barplot(data$times,axes=FALSE,ylim=c(0,13408))
axis(1,at=data$action_number,labels=data$action_number)
#??? Should I use at=data$action_number to at=data$times
axis(2,seq(0,91,3),c(0:30))
Problems:
- the x-axis does not have 536, it only goes to 224
- the Y axis only shows one number
Can you please advise how to fix this, and whether I should use a package?

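The question is still unclear, but maybe something like this:
barplot(data$times, names.arg=data$action_number)
or, drawing the axes by hand:
mp <- barplot(data$times,axes=FALSE,ylim=c(0,13408))
# mp holds the bar midpoints; use them (not 1..91) to place the x labels
idx <- seq(1,91,10)
axis(1,at=mp[idx],labels=data$action_number[idx])
axis(2,seq(0,13408,500),seq(0,13408,500))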
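A likely reason the original axes looked wrong: barplot() places the bars at its own x positions (returned as the midpoints), not at the values of action_number, and axis(2, seq(0,91,3), ...) puts every tick between 0 and 91 on a scale that runs to 13408. A minimal sketch, using a small hypothetical stand-in for the question's data rather than the full table:

```r
# Hypothetical stand-in for the question's data (a few rows only)
data <- data.frame(action_number = c(1, 2, 3, 298, 536),
                   times = c(13408, 5550, 2757, 1, 1))

# barplot() returns the x positions of the bar midpoints
mp <- barplot(data$times, axes = FALSE, ylim = c(0, 13408))
print(mp)  # these are plot coordinates, not the action_number values

# Place one label per bar at its midpoint
axis(1, at = mp, labels = data$action_number)
# Put the y ticks in the units of `times`
axis(2, at = pretty(c(0, 13408)))
```

With the full 91-row table, labelling every bar will overlap, so subsetting mp and the labels with the same index vector is one way to thin them out.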

Related

When using table on a vector, the numbers in the names are out of order

I have a data frame with a column Session. There are 215 unique values for Session, and I am trying to treat it as a categorical variable.
However, when I run table(df$Session), the sessions are not appearing in order and some appear to be missing:
table(df$Session)
1 10 100 101 102 103 104 105 106 107 108 109 11 110 111 113 114 115 116 117 118
6 11 20 14 17 8 14 11 8 14 15 17 12 16 15 17 19 26 24 31 28
12 120 121 122 123 124 125 126 127 128 13 130 131 132 133 134 135 136 137 138 139
13 36 27 20 23 18 12 12 40 52 19 91 78 88 78 8 7 74 5 8 6
14 140 141 142 143 144 145 146 147 148 149 15 150 151 152 153 154 155 156 157 158
14 7 6 7 5 3 75 3 70 75 68 16 68 67 67 68 58 69 70 68 26
159 16 160 161 162 163 164 165 166 167 168 169 17 170 171 172 173 174 175 176 177
75 17 65 70 63 76 57 43 45 32 31 18 18 20 17 22 13 15 12 7 7
178 179 18 180 181 182 183 184 185 186 187 188 189 19 190 191 192 193 194 195 196
6 7 17 9 9 13 12 18 19 22 15 3 10 3 21 32 43 54 66 77 84
197 198 199 2 20 200 201 202 203 204 205 206 207 208 209 21 210 211 212 213 215
77 85 79 6 17 89 87 93 85 85 98 80 78 68 54 17 34 24 50 50 65
22 23 24 25 26 27 28 29 3 30 31 32 33 34 35 36 37 38 39 4 40
11 12 12 10 11 7 7 10 4 7 8 7 6 9 11 10 23 27 14 3 21
41 42 43 44 45 46 47 48 49 5 50 51 52 53 54 55 56 57 58 59 6
27 16 16 18 10 12 19 7 6 4 5 13 21 17 25 31 32 30 15 10 3
60 61 62 63 64 65 66 67 68 69 7 70 71 73 74 75 76 77 78 79 8
18 17 11 14 14 15 18 11 13 9 7 13 12 7 8 8 9 12 8 9 6
80 81 82 83 84 85 86 87 88 89 9 90 91 92 93 94 95 97 98 99
1 11 8 17 20 13 14 18 19 19 9 14 16 12 15 17 19 13 7 16
If we only look at a couple of columns:
table(df$Session)
# 1 10 100 101 ... 197 198 199 2 20 200 201 202 ...
# 6 11 20 14 ... 77 85 79 6 17 89 87 93 ...
Why are they not ordered by number (1, 2, 3 instead of 1, 10, 100)? And how can I correct this?
Answer
The variable will be sorted correctly if you make it numeric first:
table(as.numeric(df$Session))
table(as.factor(as.numeric(df$Session)))
Explanation
Your variable is or was of class character. Its values are therefore ordered alphabetically, which is what happens when you sort a character vector. Try: sort(c("1", "11", "2")). When you apply factor or as.factor to a character vector, the levels are ordered the same way (see ?factor):
levels: an optional vector of the unique values (as character strings) that x might have taken. The default is the unique set of values taken by as.character(x), sorted into increasing order of x. Note that this set can be specified as smaller than sort(unique(x)).
Keep in mind that R reads in numbers as numeric by default. If you expected the column to be numeric from the start but R made it character, then you likely have values in there that are not strictly numbers. It is important to find out why the vector was character.
Reproducible example
vec <- c(22, 11, 3, 2, 1)
table(vec) # correct: numeric
# 1 2 3 11 22
# 1 1 1 1 1
table(as.character(vec)) # incorrect: character
# 1 11 2 22 3
# 1 1 1 1 1
table(as.factor(as.character(vec))) # incorrect: character -> factor
# 1 11 2 22 3
# 1 1 1 1 1
table(as.factor(vec)) # correct: numeric -> factor
# 1 2 3 11 22
# 1 1 1 1 1
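
To find out why a column was read as character, one option is to list the entries that as.numeric cannot parse. This is only a sketch; the vector session below is a made-up stand-in for df$Session, which is not shown in full in the question:

```r
# Hypothetical example column: numbers stored as text, with two bad entries
session <- c("1", "10", "2", "n/a", "x7")

# as.numeric() yields NA (with a warning) for anything it cannot parse
parsed <- suppressWarnings(as.numeric(session))

# Entries that were not NA to begin with but failed to convert
bad <- session[is.na(parsed) & !is.na(session)]
print(bad)
```

Once the offending values are fixed or removed, table(as.numeric(...)) sorts the sessions numerically.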

Split a data frame by column using a list of vectors

I want to split this data frame df by column using a list of vectors ind as the column indices.
> df
1 2 3 4 5 6 7 8 9 10
1 1 11 21 31 41 51 61 71 81 91
2 2 12 22 32 42 52 62 72 82 92
3 3 13 23 33 43 53 63 73 83 93
4 4 14 24 34 44 54 64 74 84 94
5 5 15 25 35 45 55 65 75 85 95
6 6 16 26 36 46 56 66 76 86 96
7 7 17 27 37 47 57 67 77 87 97
8 8 18 28 38 48 58 68 78 88 98
9 9 19 29 39 49 59 69 79 89 99
10 10 20 30 40 50 60 70 80 90 100
The combined length of the vectors is equal to the number of columns in the data frame.
> ind
[[1]]
[1] 1 4 9
[[2]]
[1] 2 5 10 7 3
[[3]]
[1] 8 6
The desired result should look like this:
$`1`
1 4 9
1 1 31 81
2 2 32 82
3 3 33 83
4 4 34 84
5 5 35 85
6 6 36 86
7 7 37 87
8 8 38 88
9 9 39 89
10 10 40 90
$`2`
2 5 10 7 3
1 11 41 91 61 21
2 12 42 92 62 22
3 13 43 93 63 23
4 14 44 94 64 24
5 15 45 95 65 25
6 16 46 96 66 26
7 17 47 97 67 27
8 18 48 98 68 28
9 19 49 99 69 29
10 20 50 100 70 30
$`3`
8 6
1 71 51
2 72 52
3 73 53
4 74 54
5 75 55
6 76 56
7 77 57
8 78 58
9 79 59
10 80 60
Effectively, the code should generate sub-matrices as data frames from the data frame df, based on the vectors in the list ind.
I have tried using split.default without achieving the desired result.
split.default(V, rep(seq_along(ind), lengths(ind)))
One purrr option could be:
map(.x = ind, ~ df[, .x])
[[1]]
X1 X4 X9
1 1 31 81
2 2 32 82
3 3 33 83
[[2]]
X2 X5 X10 X7 X3
1 11 41 91 61 21
2 12 42 92 62 22
3 13 43 93 63 23
[[3]]
X8 X6
1 71 51
2 72 52
3 73 53
With ind defined as:
ind <- list(c(1, 4, 9),
c(2, 5, 10, 7, 3),
c(8, 6))
An option for a list of dfs:
map(ind, ~ map(df_list, `[`, .))
You can just do,
lapply(your_list, function(i) your_df[i])
You can try the following base R solution using subset + Map
r <- Map(function(k) subset(df,select = k),ind)
such that
> r
[[1]]
X1 X4 X9
1 1 31 81
2 2 32 82
3 3 33 83
4 4 34 84
5 5 35 85
6 6 36 86
7 7 37 87
8 8 38 88
9 9 39 89
10 10 40 90
[[2]]
X2 X5 X10 X7 X3
1 11 41 91 61 21
2 12 42 92 62 22
3 13 43 93 63 23
4 14 44 94 64 24
5 15 45 95 65 25
6 16 46 96 66 26
7 17 47 97 67 27
8 18 48 98 68 28
9 19 49 99 69 29
10 20 50 100 70 30
[[3]]
X8 X6
1 71 51
2 72 52
3 73 53
4 74 54
5 75 55
6 76 56
7 77 57
8 78 58
9 79 59
10 80 60
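
The list-indexing answers above can be tried end to end on a frame shaped like the one in the question. A minimal base R sketch; note that as.data.frame names the columns V1 to V10 here, which is an assumption that differs from the question's 1 to 10 and the answers' X1 to X10:

```r
# Rebuild a 10 x 10 frame with the same values as df in the question
df <- as.data.frame(matrix(1:100, nrow = 10))

# Column indices, one vector per desired piece
ind <- list(c(1, 4, 9), c(2, 5, 10, 7, 3), c(8, 6))

# df[i] selects columns by position and keeps the result a data frame
res <- lapply(ind, function(i) df[i])

# Name the pieces 1, 2, 3 as in the desired output
names(res) <- seq_along(ind)
str(res, max.level = 1)
```

Using df[i] rather than df[, i] keeps each piece a data frame even when a vector in ind has length one.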

Filter using paste and name in dplyr

Sample data
df <- data.frame(loc.id = rep(1:5, each = 6), day = sample(1:365,30),
ref.day1 = rep(c(20,30,50,80,90), each = 6),
ref.day2 = rep(c(10,28,33,49,67), each = 6),
ref.day3 = rep(c(31,49,65,55,42), each = 6))
For each loc.id, if I want to keep days that are >= ref.day1, I do this:
df %>% group_by(loc.id) %>% dplyr::filter(day >= ref.day1)
I want to make 3 data frames, whose rows are filtered by ref.day1, ref.day2 and ref.day3 respectively.
I tried this:
col.names <- c("ref.day1","ref.day2","ref.day3")
temp.list <- list()
for(cl in seq_along(col.names)){
col.sub <- col.names[cl]
columns <- c("loc.id","day",col.sub)
df.sub <- df[,columns]
temp.dat <- df.sub %>% group_by(loc.id) %>% dplyr::filter(day >= paste0(col.sub)) # this line does not work
temp.list[[cl]] <- temp.dat
}
final.dat <- rbindlist(temp.list)
How can I refer to a column by a name held in a string (for example one built with paste) inside dplyr::filter?
The reason your original code doesn't work is that your col.names are strings, but dplyr functions use non-standard evaluation, which doesn't treat a string as a column. So you need to convert the string into a symbol. rlang::sym() can do that.
Also, you can use the map function from the purrr package, which is much more compact:
library(dplyr)
library(purrr)
col_names <- c("ref.day1","ref.day2","ref.day3")
# !! unquotes the symbol built from the column name
map(col_names, ~ df %>% dplyr::filter(day >= !!rlang::sym(.x)))
# returns a list of data frames
By the way, I removed group_by() because it doesn't seem to be needed: the comparison is done row by row.
Returned result:
[[1]]
loc.id day ref.day1 ref.day2 ref.day3
1 1 362 20 10 31
2 1 69 20 10 31
3 1 65 20 10 31
4 1 88 20 10 31
5 1 142 20 10 31
6 2 355 30 28 49
7 2 255 30 28 49
8 2 136 30 28 49
9 2 156 30 28 49
10 2 194 30 28 49
11 2 204 30 28 49
12 3 129 50 33 65
13 3 254 50 33 65
14 3 279 50 33 65
15 3 201 50 33 65
16 3 282 50 33 65
17 4 351 80 49 55
18 4 114 80 49 55
19 4 338 80 49 55
20 4 283 80 49 55
21 5 199 90 67 42
22 5 141 90 67 42
23 5 241 90 67 42
24 5 187 90 67 42
[[2]]
loc.id day ref.day1 ref.day2 ref.day3
1 1 16 20 10 31
2 1 362 20 10 31
3 1 69 20 10 31
4 1 65 20 10 31
5 1 88 20 10 31
6 1 142 20 10 31
7 2 355 30 28 49
8 2 255 30 28 49
9 2 136 30 28 49
10 2 156 30 28 49
11 2 194 30 28 49
12 2 204 30 28 49
13 3 129 50 33 65
14 3 254 50 33 65
15 3 279 50 33 65
16 3 201 50 33 65
17 3 282 50 33 65
18 4 351 80 49 55
19 4 114 80 49 55
20 4 338 80 49 55
21 4 283 80 49 55
22 4 79 80 49 55
23 5 199 90 67 42
24 5 67 90 67 42
25 5 141 90 67 42
26 5 241 90 67 42
27 5 187 90 67 42
[[3]]
loc.id day ref.day1 ref.day2 ref.day3
1 1 362 20 10 31
2 1 69 20 10 31
3 1 65 20 10 31
4 1 88 20 10 31
5 1 142 20 10 31
6 2 355 30 28 49
7 2 255 30 28 49
8 2 136 30 28 49
9 2 156 30 28 49
10 2 194 30 28 49
11 2 204 30 28 49
12 3 129 50 33 65
13 3 254 50 33 65
14 3 279 50 33 65
15 3 201 50 33 65
16 3 282 50 33 65
17 4 351 80 49 55
18 4 114 80 49 55
19 4 338 80 49 55
20 4 283 80 49 55
21 4 79 80 49 55
22 5 199 90 67 42
23 5 67 90 67 42
24 5 141 90 67 42
25 5 241 90 67 42
26 5 187 90 67 42
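For comparison, the same filtering can be sketched in base R, where [[ looks a column up by its string name directly, so no tidy evaluation is involved. The seed is an arbitrary addition to make the sampled days repeatable; it is not in the question:

```r
set.seed(42)  # arbitrary seed, only so that sample() is repeatable
df <- data.frame(loc.id = rep(1:5, each = 6), day = sample(1:365, 30),
                 ref.day1 = rep(c(20, 30, 50, 80, 90), each = 6),
                 ref.day2 = rep(c(10, 28, 33, 49, 67), each = 6),
                 ref.day3 = rep(c(31, 49, 65, 55, 42), each = 6))

col_names <- c("ref.day1", "ref.day2", "ref.day3")

# df[[cl]] fetches the column whose name is stored in the string cl
res <- lapply(col_names, function(cl) df[df$day >= df[[cl]], ])
names(res) <- col_names

# Number of rows kept for each reference column
print(sapply(res, nrow))
```

Inside dplyr, the .data pronoun plays a similar role: filter(day >= .data[[cl]]).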
You may also want to check these:
https://dplyr.tidyverse.org/articles/programming.html
Use variable names in functions of dplyr

loop index in R not increasing by 1

This is a rather simple question: why is this code in R not printing numbers from 1 to 100, but jumps with the value of i? Is there a way to prevent this?
t <-5
for (i in 1:t){
print(20*(i-1)+1:20*i)
}
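Posting this as an answer so the question can be closed. The : operator has higher precedence than *, so 1:20*i is evaluated as (1:20)*i, which is why the step grows with i. Dropping the *i gives the intended result: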
t <-5
for (i in 1:t){
print(20*(i-1)+1:20)
}
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
#> [1] 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
#> [1] 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
#> [1] 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
#> [1] 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
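
The precedence of : over * can be checked directly. A small sketch with an arbitrary value of i:

```r
i <- 3

a <- 1:20 * i      # parsed as (1:20) * i: 20 values, spaced i apart
b <- 1:(20 * i)    # parentheses force the product first: 1, 2, ..., 20 * i

print(identical(a, (1:20) * i))
print(length(a))
print(length(b))
```

When in doubt, wrapping the range bounds in parentheses makes the intent explicit.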

How to do efficient vectorized update on multiple columns using data.tables?

I have the following code using data.frames, and I'm wondering how to write this using data.tables, using the most efficient, most vectorized code?
data.frame code:
set.seed(1)
to <- cbind(data.frame(time=seq(1:5),bananas=sample(100,5),apples=sample(100,5)),setNames(data.frame(matrix(sample(100,90,replace=T),nrow=5)),paste0(1:18)))
from <- cbind(data.frame(time=seq(1:5),blah=sample(100,5),foo=sample(100,5)),setNames(data.frame(matrix(sample(100,90,replace=T),nrow=5)),paste0(1:18)))
from
to
rownames(to) <- to$time
to[as.character(from$time),paste0(1:18)] <- from[,paste0(1:18)]
to
Running this:
> set.seed(1)
> to <- cbind(data.frame(time=seq(1:5),bananas=sample(100,5),apples=sample(100,5)),setNames(data.frame(matrix(sample(100,90,replace=T),nrow=5)),paste0(1:18)))
> from <- cbind(data.frame(time=seq(1:5),blah=sample(100,5),foo=sample(100,5)),setNames(data.frame(matrix(sample(100,90,replace=T),nrow=5)),paste0(1:18)))
> from
time blah foo 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 1 66 22 98 2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2 2 35 13 74 72 50 52 8 57 61 18 56 53 90 7 85 65 20 76 39 12
3 3 27 47 36 11 49 21 4 53 24 75 33 8 45 34 86 75 89 73 11 85
4 4 97 90 44 45 18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5 5 61 58 15 65 76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79
> to
time bananas apples 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 1 27 90 21 50 94 39 49 67 83 79 48 10 92 26 34 90 44 21 24 80
2 2 37 94 18 72 22 2 60 80 65 3 87 32 30 48 84 87 72 72 6 46
3 3 57 65 69 100 66 39 50 11 79 48 44 52 46 77 35 39 40 13 65 42
4 4 89 62 39 39 13 87 19 73 56 74 25 67 34 9 34 78 33 25 88 82
5 5 20 6 77 78 27 35 83 42 53 70 8 41 66 88 48 97 76 15 78 61
>
> rownames(to) <- to$time
> to[as.character(from$time),paste0(1:18)] <- from[,paste0(1:18)]
> to
time bananas apples 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 1 27 90 98 2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2 2 37 94 74 72 50 52 8 57 61 18 56 53 90 7 85 65 20 76 39 12
3 3 57 65 36 11 49 21 4 53 24 75 33 8 45 34 86 75 89 73 11 85
4 4 89 62 44 45 18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5 5 20 6 15 65 76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79
Basically, we update columns paste0(1:18) of to from columns paste0(1:18) of from, matching up the times.
data.tables apparently have some advantages, such as not needing head when printing them at the console, so I'm thinking about using them.
However, I'd like not to have to write the := expressions by hand, i.e. I want to avoid:
to[from,`1`:=i.`1`,`2`:=i.`2`, ..]
I'd also prefer to use vectorized syntax if possible, rather than some kind of for loop, i.e. I want to avoid something like:
for( i in 1:18 ) {
to[from, sprintf("%d",i) := i.sprintf("%d",i)]
}
I read through the faq vignette, and the datatable-intro vignette, though I admit I probably haven't understood everything 100%.
I looked at "Loop through columns in a data.table and transform those columns", but I can't say I understand it 100%, and it seems to say that I need to use a for loop?
There does seem to be some kind of a hint at the bottom of 8374816 that it might be possible to just use data frame syntax, adding with=FALSE? But since the data.frame procedure is hacking on the row names, I'm not sure how well / if that will work, and I wonder to what extent that makes use of the efficiencies of data.table?
Good question. The base construct you've shown :
to[as.character(from$time),paste0(1:18)] <- from[,paste0(1:18)]
works assuming row names can't be duplicated, or if they are then only the first is matched to. Here, the LHS of <- has the same number of rows as the RHS of <-.
data.table is different since routinely, multiple rows in to may match; the default for mult is "all". data.table also prefers long format to wide. So this question is kind of putting data.table through its paces for something it wasn't really designed for. If you have any NA in those 18 columns (i.e. sparse), then a long format may be more appropriate. If all 18 columns are the same type, then a matrix may be more appropriate.
That said, here are three data.table options for completeness.
1. Using := but without a for loop (multiple LHS and multiple RHS in LHS:=RHS)
from = as.data.table(from)
to = as.data.table(to)
from
time blah foo 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1: 1 66 22 98 2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2: 2 35 13 74 72 50 52 8 57 61 18 56 53 90 7 85 65 20 76 39 12
3: 3 27 47 36 11 49 21 4 53 24 75 33 8 45 34 86 75 89 73 11 85
4: 4 97 90 44 45 18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5: 5 61 58 15 65 76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79
to
time bananas apples 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1: 1 27 90 21 50 94 39 49 67 83 79 48 10 92 26 34 90 44 21 24 80
2: 2 37 94 18 72 22 2 60 80 65 3 87 32 30 48 84 87 72 72 6 46
3: 3 57 65 69 100 66 39 50 11 79 48 44 52 46 77 35 39 40 13 65 42
4: 4 89 62 39 39 13 87 19 73 56 74 25 67 34 9 34 78 33 25 88 82
5: 5 20 6 77 78 27 35 83 42 53 70 8 41 66 88 48 97 76 15 78 61
setkey(to,time)
setkey(from,time)
to[from,paste0(1:18):=from[.GRP,paste0(1:18),with=FALSE]]
time bananas apples 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1: 1 27 90 98 2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2: 2 37 94 74 72 50 52 8 57 61 18 56 53 90 7 85 65 20 76 39 12
3: 3 57 65 36 11 49 21 4 53 24 75 33 8 45 34 86 75 89 73 11 85
4: 4 89 62 44 45 18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5: 5 20 6 15 65 76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79
or
to[from,paste0(1:18):=from[,paste0(1:18),with=FALSE],mult="first"]
time bananas apples 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1: 1 27 90 98 2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2: 2 37 94 74 72 50 52 8 57 61 18 56 53 90 7 85 65 20 76 39 12
3: 3 57 65 36 11 49 21 4 53 24 75 33 8 45 34 86 75 89 73 11 85
4: 4 89 62 44 45 18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5: 5 20 6 15 65 76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79
Note I'm using latest v1.8.3, which is needed for option 1 to work (.GRP has just been added, and the outer with=FALSE is no longer needed).
2. Use one list column to store the length 18 vectors, rather than 18 columns
to = data.table( time=seq(1:5),
bananas=sample(100,5),
apples=sample(100,5),
v18=replicate(5,sample(100,18),simplify=FALSE))
from = data.table( time=seq(1:5),
blah=sample(100,5),
foo=sample(100,5),
v18=replicate(5,sample(100,18),simplify=FALSE))
setkey(to,time)
setkey(from,time)
from
time blah foo v18
1: 1 56 97 88,47,1,71,69,18,
2: 2 69 40 96,99,60,3,33,27,
3: 3 65 84 100,38,56,72,84,55,
4: 4 98 74 91,69,24,63,27,100,
5: 5 46 52 65,4,59,41,8,51,
to
time bananas apples v18
1: 1 66 73 100,36,74,77,68,46,
2: 2 19 37 84,88,92,8,37,52,
3: 3 94 77 37,94,13,7,93,43,
4: 4 88 2 27,93,71,16,46,66,
5: 5 91 91 85,94,58,49,19,1,
to[from,v18:=i.v18]
to
time bananas apples v18
1: 1 66 73 88,47,1,71,69,18,
2: 2 19 37 96,99,60,3,33,27,
3: 3 94 77 100,38,56,72,84,55,
4: 4 88 2 91,69,24,63,27,100,
5: 5 91 91 65,4,59,41,8,51,
If you are not used to list column printing, the trailing comma signifies that more items are in that vector. Just the first 6 are printed.
3. Use data.frame syntax on the data.table
to = as.data.table(to)
from = as.data.table(from)
setkey(to,time)
setkey(from,time)
from
time blah foo 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1: 1 66 22 98 2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2: 2 35 13 74 72 50 52 8 57 61 18 56 53 90 7 85 65 20 76 39 12
3: 3 27 47 36 11 49 21 4 53 24 75 33 8 45 34 86 75 89 73 11 85
4: 4 97 90 44 45 18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5: 5 61 58 15 65 76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79
to
time bananas apples 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1: 1 27 90 21 50 94 39 49 67 83 79 48 10 92 26 34 90 44 21 24 80
2: 2 37 94 18 72 22 2 60 80 65 3 87 32 30 48 84 87 72 72 6 46
3: 3 57 65 69 100 66 39 50 11 79 48 44 52 46 77 35 39 40 13 65 42
4: 4 89 62 39 39 13 87 19 73 56 74 25 67 34 9 34 78 33 25 88 82
5: 5 20 6 77 78 27 35 83 42 53 70 8 41 66 88 48 97 76 15 78 61
to[from, paste0(1:18)] <- from[,paste0(1:18),with=FALSE]
to
time bananas apples 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1: 1 27 90 98 2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2: 2 37 94 74 72 50 52 8 57 61 18 56 53 90 7 85 65 20 76 39 12
3: 3 57 65 36 11 49 21 4 53 24 75 33 8 45 34 86 75 89 73 11 85
4: 4 89 62 44 45 18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5: 5 20 6 15 65 76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79
So the LHS of <- can use data.table keyed join syntax; i.e. to[from]. It's just that this method (currently in R) will copy the entire to dataset. That's what := was introduced to avoid, by providing update by reference. Also, if each row in from matches multiple rows in to, then the RHS of <- would need to be expanded to line up (by you, the user); otherwise the RHS would be recycled to fill up the LHS. That's one reason why, in data.table, we like := being inside j, all inside [...].
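
In more recent data.table versions, the update in option 1 is often written as an update join with on= and mget(), without setting keys. The following is a sketch under that assumption (it will not run on the v1.8.3 mentioned above); make_tbl is a hypothetical helper used only to build tables shaped like the question's:

```r
library(data.table)

set.seed(1)
cols <- paste0(1:18)

# Hypothetical helper: 5 rows, two named extra columns, 18 value columns
make_tbl <- function(a, b) {
  x <- data.frame(time = 1:5, sample(100, 5), sample(100, 5),
                  matrix(sample(100, 90, replace = TRUE), nrow = 5))
  names(x) <- c("time", a, b, cols)
  as.data.table(x)
}

to <- make_tbl("bananas", "apples")
from <- make_tbl("blah", "foo")

# Update join: for rows matched on time, overwrite the 18 columns of `to`
# by reference with the same-named columns of `from` (prefixed i. inside j)
to[from, on = "time", (cols) := mget(paste0("i.", cols))]

print(to)
```

Because := updates by reference, to is modified in place and no copy of the whole table is made.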

Resources