I was wondering if you could help me with this problem. I have a dataset of US counties on which I am trying to do a k-nearest-neighbor analysis for spatial weighting, following the method proposed here (section 4.5), but the results aren't making sense, or possibly I'm not understanding them.
library(spdep)
library(tigris)
library(sf)
counties <- counties("Georgia", cb = TRUE)
coords <- st_centroid(st_geometry(counties), of_largest_polygon=TRUE)
col.knn <- knearneigh(coords)
gck4.nb <- knn2nb(knearneigh(coords, k=4, longlat=TRUE))
summary(gck4.nb, coords, longlat=TRUE, scale=0.5)
However, the link distances in the output seem rather small, on the order of less than 1 km:
Neighbour list object:
Number of regions: 159
Number of nonzero links: 636
Percentage nonzero weights: 2.515723
Average number of links: 4
Non-symmetric neighbours list
Link number distribution:
4
159
159 least connected regions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 with 4 links
159 most connected regions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 with 4 links
Summary of link distances:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1355 0.2650 0.3085 0.3112 0.3482 0.6224
The decimal point is 1 digit(s) to the left of the |
1 | 44
1 | 7799999999999999
2 | 00000000000011111111112222222222222233333333333333333333333333444444
2 | 55555555555555555555555555556666666666666666666666666666666666667777+92
3 | 00000000000000000000000000000001111111111111111111111111111111111111+121
3 | 55555555555555555555555555555556666666666667777777777777777777777777+19
4 | 00000000000111111111112222222222223333333444
4 | 555667777999
5 | 0000014
5 | 7888
6 | 2
I am using nested cross-validation in a benchmark experiment, and I would like to retrieve the indices of the instances used in each outer loop. I am aware that the function getResamplingIndices() is suited to this task, but it won't accept a 'BenchmarkResult' object. Is there a way to get around this? Here is an example:
res = benchmark(lrns, task, outer, measures = list(auc, ppv, npv), show.info = FALSE, models = T)
getResamplingIndices(res)
Error in getResamplingIndices(res) :
Assertion on 'object' failed: Must inherit from class 'ResampleResult', but has class 'BenchmarkResult'.
Resampling indices are the same across learners, so you can just apply getResamplingIndices() to a ResampleResult object nested within your BenchmarkResult object.
If you want to have them for every task in your BenchmarkResult object, do
inds_by_task = lapply(bmr$results, function(x) getResamplingIndices(x[[1]]))
See below for a full reprex.
library(mlr)
#> Loading required package: ParamHelpers
lrns = list(makeLearner("classif.lda"), makeLearner("classif.rpart"))
tasks = list(iris.task, sonar.task)
rdesc = makeResampleDesc("CV", iters = 2L)
meas = list(acc, ber)
bmr = benchmark(lrns, tasks, rdesc, measures = meas, show.info = FALSE)
getResamplingIndices(bmr$results$`iris-example`$classif.lda)
#> $train.inds
#> $train.inds[[1]]
#> [1] 25 136 62 77 22 114 101 145 87 34 93 120 133 126 76 4 105 97 7
#> [20] 128 49 19 106 9 54 5 72 102 73 51 109 115 23 86 89 112 130 69
#> [39] 57 122 53 12 60 40 36 70 83 90 108 81 38 50 129 75 71 59 47
#> [58] 95 31 147 37 30 127 43 148 103 27 66 137 29 124 35 143 132 45
#>
#> $train.inds[[2]]
#> [1] 79 123 28 17 82 55 110 8 107 26 125 121 32 33 48 119 98 58 116
#> [20] 144 139 67 13 142 111 16 65 21 74 42 113 149 117 2 99 68 140 104
#> [39] 96 150 1 94 46 131 135 11 10 146 85 15 52 78 63 100 3 118 134
#> [58] 6 88 138 41 39 92 56 84 61 20 24 91 80 18 64 14 141 44
#>
#>
#> $test.inds
#> $test.inds[[1]]
#> [1] 1 2 3 6 8 10 11 13 14 15 16 17 18 20 21 24 26 28 32
#> [20] 33 39 41 42 44 46 48 52 55 56 58 61 63 64 65 67 68 74 78
#> [39] 79 80 82 84 85 88 91 92 94 96 98 99 100 104 107 110 111 113 116
#> [58] 117 118 119 121 123 125 131 134 135 138 139 140 141 142 144 146 149 150
#>
#> $test.inds[[2]]
#> [1] 4 5 7 9 12 19 22 23 25 27 29 30 31 34 35 36 37 38 40
#> [20] 43 45 47 49 50 51 53 54 57 59 60 62 66 69 70 71 72 73 75
#> [39] 76 77 81 83 86 87 89 90 93 95 97 101 102 103 105 106 108 109 112
#> [58] 114 115 120 122 124 126 127 128 129 130 132 133 136 137 143 145 147 148
Created on 2020-01-04 by the reprex package (v0.3.0)
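As a quick sanity check (a sketch, assuming the reprex above): the indices for two learners on the same task should be identical, which is why taking x[[1]] is safe.
# Both learners on the iris task share one resampling instance,
# so this should return TRUE:
identical(getResamplingIndices(bmr$results$`iris-example`$classif.lda),
          getResamplingIndices(bmr$results$`iris-example`$classif.rpart))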
The internal datasets of R come with a brief explanation of their structure, background and content. For example:
?mtcars
gives me all the background I need to make sense of the data. Is it possible to do the same with your own data frames when you save them (e.g. as an .rds file)? It would be really useful when I am collaborating with others.
You could add a custom attribute to the object that describes it. It will be saved with the object and can be accessed using attr().
data("iris")
i1 <- iris
attributes(i1)
# $names
# [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
#
# $class
# [1] "data.frame"
#
# $row.names
# [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
# [25] 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
# [49] 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
# [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96
# [97] 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
# [121] 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
# [145] 145 146 147 148 149 150
attr(i1,'desc') <- 'Sample data on flowers'
attributes(i1)
# $names
# [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
#
# $class
# [1] "data.frame"
#
# $row.names
# [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
# [25] 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
# [49] 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
# [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96
# [97] 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
# [121] 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
# [145] 145 146 147 148 149 150
#
# $desc
# [1] "Sample data on flowers"
attr(i1,'desc')
# [1] "Sample data on flowers"
Suppose I have a data set and I want to do 4-fold cross-validation using logistic regression, so there will be 4 different models. In R, I did the following:
ctrl <- trainControl(method = "repeatedcv", number = 4, savePredictions = TRUE)
mod_fit <- train(outcome ~., data=data1, method = "glm", family="binomial", trControl = ctrl)
I would assume that mod_fit should contain 4 separate sets of coefficients? When I look at mod_fit$finalModel I just get a single set of coefficients.
I've created a reproducible example based on your code snippet. The first thing to notice about your code is that it specifies repeatedcv as the method but doesn't give any repeats, so the number = 4 parameter is just telling it to run a single repeat of 4-fold CV (this is not an answer to your question, but important to understand).
mod_fit$finalModel gives you only one set of coefficients because the 4 folds are used only to estimate performance; once resampling is done, caret refits a single final model on the full data set.
You can see the fold-level performance in the resample object:
library(caret)
library(mlbench)
data(iris)
iris$binary <- ifelse(iris$Species=="setosa",1,0)
iris$Species <- NULL
ctrl <- trainControl(method = "repeatedcv",
number = 4,
savePredictions = TRUE,
verboseIter = T,
returnResamp = "all")
mod_fit <- train(binary ~.,
data=iris,
method = "glm",
family="binomial",
trControl = ctrl)
# Fold-level Performance
mod_fit$resample
RMSE Rsquared parameter Resample
1 2.630866e-03 0.9999658 none Fold1.Rep1
2 3.863821e-08 1.0000000 none Fold2.Rep1
3 8.162472e-12 1.0000000 none Fold3.Rep1
4 2.559189e-13 1.0000000 none Fold4.Rep1
To your earlier point, the package is not going to save and display information on the coefficients of each fold. In addition to the performance information above, however, it does save index (the list of in-sample rows), indexOut (the held-out rows), and the random seeds for each fold, so if you were so inclined it would be easy to reconstruct the intermediate models, as sketched after the output below.
mod_fit$control$seeds
[[1]]
[1] 169815
[[2]]
[1] 445763
[[3]]
[1] 871613
[[4]]
[1] 706905
[[5]]
[1] 89408
mod_fit$control$index
$Fold1
[1] 1 2 3 4 5 6 7 8 9 10 11 12 15 18 19 21 22 24 28 30 31 32 33 34 35 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 59 60 61 63
[45] 64 65 66 68 69 70 71 72 73 75 76 77 79 80 81 82 84 85 86 87 89 90 91 92 93 94 95 96 98 99 100 103 104 106 107 108 110 111 113 114 116 118 119 120
[89] 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 140 141 142 143 145 147 149 150
$Fold2
[1] 1 6 7 8 12 13 14 15 16 17 18 19 20 21 22 23 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 42 44 46 48 50 51 53 54 55 56 57 58
[45] 59 61 62 64 66 67 69 70 71 72 73 74 75 76 78 79 80 81 82 83 84 85 87 88 89 90 91 92 95 96 97 98 99 101 102 104 105 106 108 109 111 112 113 115
[89] 116 117 119 120 121 122 123 127 130 131 132 134 135 137 138 139 140 141 142 143 144 145 146 147 148
$Fold3
[1] 2 3 4 5 6 7 8 9 10 11 13 14 16 17 20 23 24 25 26 27 28 29 30 33 35 36 37 38 39 40 41 43 45 46 47 49 50 51 52 54 55 56 57 58
[45] 60 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 82 83 84 85 86 88 89 93 94 97 98 99 100 101 102 103 105 106 107 108 109 110 111 112 114 115
[89] 117 118 119 121 124 125 126 128 129 131 132 133 134 135 136 137 138 139 144 145 146 147 148 149 150
$Fold4
[1] 1 2 3 4 5 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 29 31 32 34 36 37 38 39 41 42 43 44 45 47 48 49 52 53 55 56
[45] 57 58 59 60 61 62 63 65 67 68 74 77 78 79 80 81 83 86 87 88 90 91 92 93 94 95 96 97 100 101 102 103 104 105 107 109 110 112 113 114 115 116 117 118
[89] 120 122 123 124 125 126 127 128 129 130 133 136 137 138 139 140 141 142 143 144 146 148 149 150
mod_fit$control$indexOut
$Resample1
[1] 13 14 16 17 20 23 25 26 27 29 36 37 38 39 55 56 57 58 62 67 74 78 83 88 97 101 102 105 109 112 115 117 137 138 139 144 146 148
$Resample2
[1] 2 3 4 5 9 10 11 24 41 43 45 47 49 52 60 63 65 68 77 86 93 94 100 103 107 110 114 118 124 125 126 128 129 133 136 149 150
$Resample3
[1] 1 12 15 18 19 21 22 31 32 34 42 44 48 53 59 61 79 80 81 87 90 91 92 95 96 104 113 116 120 122 123 127 130 140 141 142 143
$Resample4
[1] 6 7 8 28 30 33 35 40 46 50 51 54 64 66 69 70 71 72 73 75 76 82 84 85 89 98 99 106 108 111 119 121 131 132 134 135 145 147
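To illustrate that last point, here is a minimal sketch (not caret functionality, just base glm applied to the stored fold indices) that recovers one coefficient set per fold; with this perfectly separable toy outcome, expect glm convergence warnings:
# Refit on each fold's in-sample rows to recover the per-fold coefficients
fold_coefs <- lapply(mod_fit$control$index, function(idx)
  coef(glm(binary ~ ., data = iris[idx, ], family = "binomial")))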
@Damien, your mod_fit will not contain 4 separate sets of coefficients. You are asking for cross-validation with 4 folds; this does not mean you will have 4 different models. According to the documentation here, the train function works as follows:
At the end of the resampling loop (in your case, 4 iterations for the 4 folds), you will have one set of averaged accuracy measures (e.g. RMSE, R-squared) for each candidate set of model parameters.
Since you did not use the tuneGrid or tuneLength arguments of the train function, by default it will tune over three values of each tunable parameter.
This means you will have at most three candidate models (not the 4 you were expecting) and therefore at most three sets of average model performance measures.
The optimal model is the one with the lowest RMSE, in the case of regression. Its coefficients are available in mod_fit$finalModel.
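For example, assuming the mod_fit from the reprex above, the final model's coefficients can be pulled out directly:
coef(mod_fit$finalModel)   # the single coefficient set from the final refit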
I need a vector that repeats numbers in a sequence at varying intervals. I basically need this
c(rep(1:42, each=6), rep(43:64, each = 7),
rep(65:106, each=6), rep(107:128, each = 7),
.... but I need this to keep going, until almost 2 million.
So I want a vector that looks like
[1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 ...
.....
[252] 43 43 43 43 43 43 43 44 44 44 44 44 44 44
....
[400] 64 64 64 64 64 64 65 65 65 65 65 65...
and so on: not just alternating between 6 and 7 repetitions, but mostly 6s with fewer 7s, until the whole vector is something like 1.7 million rows. So, is there a loop I can write, or an apply or replicate call? I need the 400th entry in the vector to be 64, the 800th entry to be 128, and so on, in somewhat evenly spaced integers.
UPDATE
Thank you all for the quick, clever tricks; they worked, at least well enough for the deadline I was dealing with. I realize that repeating values six and seven times at a stretch is a clumsy way to try to solve this, but it was quick at least. Now that I have some time, I would like to get everyone's opinions and ideas on my real underlying issue here.
I have two datasets to merge. They are both sensor datasets, both with stopwatch time as the primary key, but one records every 1/400 of a second and the other every 1/256 of a second. I have trimmed the top of each so that they start at the exact same moment. But now what? I have 400 records for each second in one set and 256 records per second in the other. Is there a way to merge these without losing data? Interpolating or simply repeating observations is fine (necessary, I think), but I'd rather not throw any data out.
I read this post here, which dealt with using xts and zoo for a very similar problem to mine, but they had nice epoch date/times for each record. I just have these awful fractions of seconds!
sample data (A):
time dist a_lat
1 139.4300 22 0
2 139.4325 22 0
3 139.4350 22 0
4 139.4375 22 0
5 139.4400 22 0
6 139.4425 22 0
7 139.4450 22 0
8 139.4475 22 0
9 139.4500 22 0
10 139.4525 22 0
sample data (B):
timestamp hex_acc_x hex_acc_y hex_acc_z
1 367065215501 -0.5546875 -0.7539062 0.1406250
2 367065215505 -0.5468750 -0.7070312 0.2109375
3 367065215509 -0.4218750 -0.6835938 0.1796875
4 367065215513 -0.5937500 -0.7421875 0.1562500
5 367065215517 -0.6757812 -0.7773438 0.2031250
6 367065215521 -0.5937500 -0.8554688 0.2460938
7 367065215525 -0.6132812 -0.8476562 0.2109375
8 367065215529 -0.3945312 -0.8906250 0.2031250
9 367065215533 -0.3203125 -0.8906250 0.2226562
10 367065215537 -0.3867188 -0.9531250 0.2578125
(Oh, and by the way, the B dataset timestamps are epoch format × 256, because life is hard. I haven't converted them for this because dataset A has nothing like that, just 0.0025-second intervals. Also, the B sensor was left on for hours after the A sensor turned off, which doesn't help.)
Or, if you like, you can try this using apply:
# using this sample data
df <- data.frame(from=c(1,4,7,11), to = c(3,6,10,13),rep=c(6,7,6,7));
> df
# from to rep
#1 1 3 6
#2 4 6 7
#3 7 10 6
#4 11 13 7
unlist(apply(df, 1, function(x) rep(x['from']:x['to'], each=x['rep'])))
# [1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4
#[26] 5 5 5 5 5 5 5 6 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8
#[51] 8 9 9 9 9 9 9 10 10 10 10 10 10 11 11 11 11 11 11 11 12 12 12 12 12
#[76] 12 12 13 13 13 13 13 13 13
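An equivalent that avoids apply coercing the data frame to a matrix is base Map over the three columns (a sketch on the same sample data):
# rep() each from:to range the requested number of times, then flatten
unlist(Map(function(f, t, r) rep(f:t, each = r), df$from, df$to, df$rep))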
Now that you put it that way ... I have absolutely no idea how you are planning on using all of those 6s and 7s. :-)
Regardless, I recommend standardizing the time, adding a "sample" column, and merging on them; having the "sample" column may facilitate your processing later on.
Your data:
df400 <- structure(list(time = c(139.43, 139.4325, 139.435, 139.4375, 139.44, 139.4425,
139.445, 139.4475, 139.45, 139.4525),
dist = c(22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L),
a_lat = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)),
.Names = c("time", "dist", "a_lat"),
class = "data.frame", row.names = c(NA, -10L))
df256 <- structure(list(timestamp = c(367065215501, 367065215505, 367065215509, 367065215513,
367065215517, 367065215521, 367065215525, 367065215529,
367065215533, 367065215537),
hex_acc_x = c(-0.5546875, -0.546875, -0.421875, -0.59375, -0.6757812,
-0.59375, -0.6132812, -0.3945312, -0.3203125, -0.3867188),
hex_acc_y = c(-0.7539062, -0.7070312, -0.6835938, -0.7421875,
-0.7773438, -0.8554688, -0.8476562, -0.890625,
-0.890625, -0.953125),
hex_acc_z = c(0.140625, 0.2109375, 0.1796875, 0.15625, 0.203125,
0.2460938, 0.2109375, 0.203125, 0.2226562, 0.2578125)),
.Names = c("timestamp", "hex_acc_x", "hex_acc_y", "hex_acc_z"),
class = "data.frame", row.names = c(NA, -10L))
Standardize your time frames:
colnames(df256)[1] <- 'time'
df400$time <- df400$time - df400$time[1]
df256$time <- (df256$time - df256$time[1]) / 256
Assign a label for easy reference (not that the NAs won't be clear enough):
df400 <- cbind(sample='A', df400, stringsAsFactors=FALSE)
df256 <- cbind(sample='B', df256, stringsAsFactors=FALSE)
And now for the merge and sorting:
dat <- merge(df400, df256, by=c('sample', 'time'), all.x=TRUE, all.y=TRUE)
dat <- dat[order(dat$time),]
dat
## sample time dist a_lat hex_acc_x hex_acc_y hex_acc_z
## 1 A 0.000000 22 0 NA NA NA
## 11 B 0.000000 NA NA -0.5546875 -0.7539062 0.1406250
## 2 A 0.002500 22 0 NA NA NA
## 3 A 0.005000 22 0 NA NA NA
## 4 A 0.007500 22 0 NA NA NA
## 5 A 0.010000 22 0 NA NA NA
## 6 A 0.012500 22 0 NA NA NA
## 7 A 0.015000 22 0 NA NA NA
## 12 B 0.015625 NA NA -0.5468750 -0.7070312 0.2109375
## 8 A 0.017500 22 0 NA NA NA
## 9 A 0.020000 22 0 NA NA NA
## 10 A 0.022500 22 0 NA NA NA
## 13 B 0.031250 NA NA -0.4218750 -0.6835938 0.1796875
## 14 B 0.046875 NA NA -0.5937500 -0.7421875 0.1562500
## 15 B 0.062500 NA NA -0.6757812 -0.7773438 0.2031250
## 16 B 0.078125 NA NA -0.5937500 -0.8554688 0.2460938
## 17 B 0.093750 NA NA -0.6132812 -0.8476562 0.2109375
## 18 B 0.109375 NA NA -0.3945312 -0.8906250 0.2031250
## 19 B 0.125000 NA NA -0.3203125 -0.8906250 0.2226562
## 20 B 0.140625 NA NA -0.3867188 -0.9531250 0.2578125
I'm guessing your data was just a small sample. If I've guessed the scales poorly (that the integer part of A's times is seconds and B's integer ticks are 1/256ths of a second), then just scale differently. Either way, once the first value is reset to zero, the two sets merge and sort cleanly.
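If you would rather interpolate B's readings onto A's time points than keep the NAs, one option is zoo's na.approx (a sketch, assuming the merged dat from above):
library(zoo)
acc_cols <- c("hex_acc_x", "hex_acc_y", "hex_acc_z")
# linear interpolation along the time axis; rule = 2 extends the boundary values
dat[acc_cols] <- lapply(dat[acc_cols], function(col)
  na.approx(col, x = dat$time, na.rm = FALSE, rule = 2))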
alt <- data.frame(len=c(42,22),rep=c(6,7));
alt;
## len rep
## 1 42 6
## 2 22 7
altrep <- function(alt,cyc,len) {
    ## number of output elements produced by one full pass through the table
    cyclen <- sum(alt$len*alt$rep);
    ## fill in whichever of cyc (number of cycles) and len (output length) is missing
    if (missing(cyc)) {
        if (missing(len)) {
            cyc <- 1;
            len <- cyc*cyclen;
        } else {
            cyc <- ceiling(len/cyclen);
        };
    } else if (missing(len)) {
        len <- cyc*cyclen;
    };
    if (isTRUE(all.equal(len,0))) return(integer());
    ## repeat 1:(cyc*sum(alt$len)), each value as many times as the table dictates
    result <- rep(1:(cyc*sum(alt$len)),rep(rep(alt$rep,alt$len),cyc));
    ## truncate (or NA-pad) to exactly len elements
    length(result) <- len;
    result;
};
altrep(alt,2);
## [1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9
## [52] 9 9 9 10 10 10 10 10 10 11 11 11 11 11 11 12 12 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 14 15 15 15 15 15 15 16 16 16 16 16 16 17 17 17 17 17 17
## [103] 18 18 18 18 18 18 19 19 19 19 19 19 20 20 20 20 20 20 21 21 21 21 21 21 22 22 22 22 22 22 23 23 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25 25 26 26 26
## [154] 26 26 26 27 27 27 27 27 27 28 28 28 28 28 28 29 29 29 29 29 29 30 30 30 30 30 30 31 31 31 31 31 31 32 32 32 32 32 32 33 33 33 33 33 33 34 34 34 34 34 34
## [205] 35 35 35 35 35 35 36 36 36 36 36 36 37 37 37 37 37 37 38 38 38 38 38 38 39 39 39 39 39 39 40 40 40 40 40 40 41 41 41 41 41 41 42 42 42 42 42 42 43 43 43
## [256] 43 43 43 43 44 44 44 44 44 44 44 45 45 45 45 45 45 45 46 46 46 46 46 46 46 47 47 47 47 47 47 47 48 48 48 48 48 48 48 49 49 49 49 49 49 49 50 50 50 50 50
## [307] 50 50 51 51 51 51 51 51 51 52 52 52 52 52 52 52 53 53 53 53 53 53 53 54 54 54 54 54 54 54 55 55 55 55 55 55 55 56 56 56 56 56 56 56 57 57 57 57 57 57 57
## [358] 58 58 58 58 58 58 58 59 59 59 59 59 59 59 60 60 60 60 60 60 60 61 61 61 61 61 61 61 62 62 62 62 62 62 62 63 63 63 63 63 63 63 64 64 64 64 64 64 64 65 65
## [409] 65 65 65 65 66 66 66 66 66 66 67 67 67 67 67 67 68 68 68 68 68 68 69 69 69 69 69 69 70 70 70 70 70 70 71 71 71 71 71 71 72 72 72 72 72 72 73 73 73 73 73
## [460] 73 74 74 74 74 74 74 75 75 75 75 75 75 76 76 76 76 76 76 77 77 77 77 77 77 78 78 78 78 78 78 79 79 79 79 79 79 80 80 80 80 80 80 81 81 81 81 81 81 82 82
## [511] 82 82 82 82 83 83 83 83 83 83 84 84 84 84 84 84 85 85 85 85 85 85 86 86 86 86 86 86 87 87 87 87 87 87 88 88 88 88 88 88 89 89 89 89 89 89 90 90 90 90 90
## [562] 90 91 91 91 91 91 91 92 92 92 92 92 92 93 93 93 93 93 93 94 94 94 94 94 94 95 95 95 95 95 95 96 96 96 96 96 96 97 97 97 97 97 97 98 98 98 98 98 98 99 99
## [613] 99 99 99 99 100 100 100 100 100 100 101 101 101 101 101 101 102 102 102 102 102 102 103 103 103 103 103 103 104 104 104 104 104 104 105 105 105 105 105 105 106 106 106 106 106 106 107 107 107 107 107
## [664] 107 107 108 108 108 108 108 108 108 109 109 109 109 109 109 109 110 110 110 110 110 110 110 111 111 111 111 111 111 111 112 112 112 112 112 112 112 113 113 113 113 113 113 113 114 114 114 114 114 114 114
## [715] 115 115 115 115 115 115 115 116 116 116 116 116 116 116 117 117 117 117 117 117 117 118 118 118 118 118 118 118 119 119 119 119 119 119 119 120 120 120 120 120 120 120 121 121 121 121 121 121 121 122 122
## [766] 122 122 122 122 122 123 123 123 123 123 123 123 124 124 124 124 124 124 124 125 125 125 125 125 125 125 126 126 126 126 126 126 126 127 127 127 127 127 127 127 128 128 128 128 128 128 128
altrep(alt,len=1000);
## [1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9
## [52] 9 9 9 10 10 10 10 10 10 11 11 11 11 11 11 12 12 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 14 15 15 15 15 15 15 16 16 16 16 16 16 17 17 17 17 17 17
## [103] 18 18 18 18 18 18 19 19 19 19 19 19 20 20 20 20 20 20 21 21 21 21 21 21 22 22 22 22 22 22 23 23 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25 25 26 26 26
## [154] 26 26 26 27 27 27 27 27 27 28 28 28 28 28 28 29 29 29 29 29 29 30 30 30 30 30 30 31 31 31 31 31 31 32 32 32 32 32 32 33 33 33 33 33 33 34 34 34 34 34 34
## [205] 35 35 35 35 35 35 36 36 36 36 36 36 37 37 37 37 37 37 38 38 38 38 38 38 39 39 39 39 39 39 40 40 40 40 40 40 41 41 41 41 41 41 42 42 42 42 42 42 43 43 43
## [256] 43 43 43 43 44 44 44 44 44 44 44 45 45 45 45 45 45 45 46 46 46 46 46 46 46 47 47 47 47 47 47 47 48 48 48 48 48 48 48 49 49 49 49 49 49 49 50 50 50 50 50
## [307] 50 50 51 51 51 51 51 51 51 52 52 52 52 52 52 52 53 53 53 53 53 53 53 54 54 54 54 54 54 54 55 55 55 55 55 55 55 56 56 56 56 56 56 56 57 57 57 57 57 57 57
## [358] 58 58 58 58 58 58 58 59 59 59 59 59 59 59 60 60 60 60 60 60 60 61 61 61 61 61 61 61 62 62 62 62 62 62 62 63 63 63 63 63 63 63 64 64 64 64 64 64 64 65 65
## [409] 65 65 65 65 66 66 66 66 66 66 67 67 67 67 67 67 68 68 68 68 68 68 69 69 69 69 69 69 70 70 70 70 70 70 71 71 71 71 71 71 72 72 72 72 72 72 73 73 73 73 73
## [460] 73 74 74 74 74 74 74 75 75 75 75 75 75 76 76 76 76 76 76 77 77 77 77 77 77 78 78 78 78 78 78 79 79 79 79 79 79 80 80 80 80 80 80 81 81 81 81 81 81 82 82
## [511] 82 82 82 82 83 83 83 83 83 83 84 84 84 84 84 84 85 85 85 85 85 85 86 86 86 86 86 86 87 87 87 87 87 87 88 88 88 88 88 88 89 89 89 89 89 89 90 90 90 90 90
## [562] 90 91 91 91 91 91 91 92 92 92 92 92 92 93 93 93 93 93 93 94 94 94 94 94 94 95 95 95 95 95 95 96 96 96 96 96 96 97 97 97 97 97 97 98 98 98 98 98 98 99 99
## [613] 99 99 99 99 100 100 100 100 100 100 101 101 101 101 101 101 102 102 102 102 102 102 103 103 103 103 103 103 104 104 104 104 104 104 105 105 105 105 105 105 106 106 106 106 106 106 107 107 107 107 107
## [664] 107 107 108 108 108 108 108 108 108 109 109 109 109 109 109 109 110 110 110 110 110 110 110 111 111 111 111 111 111 111 112 112 112 112 112 112 112 113 113 113 113 113 113 113 114 114 114 114 114 114 114
## [715] 115 115 115 115 115 115 115 116 116 116 116 116 116 116 117 117 117 117 117 117 117 118 118 118 118 118 118 118 119 119 119 119 119 119 119 120 120 120 120 120 120 120 121 121 121 121 121 121 121 122 122
## [766] 122 122 122 122 122 123 123 123 123 123 123 123 124 124 124 124 124 124 124 125 125 125 125 125 125 125 126 126 126 126 126 126 126 127 127 127 127 127 127 127 128 128 128 128 128 128 128 129 129 129 129
## [817] 129 129 130 130 130 130 130 130 131 131 131 131 131 131 132 132 132 132 132 132 133 133 133 133 133 133 134 134 134 134 134 134 135 135 135 135 135 135 136 136 136 136 136 136 137 137 137 137 137 137 138
## [868] 138 138 138 138 138 139 139 139 139 139 139 140 140 140 140 140 140 141 141 141 141 141 141 142 142 142 142 142 142 143 143 143 143 143 143 144 144 144 144 144 144 145 145 145 145 145 145 146 146 146 146
## [919] 146 146 147 147 147 147 147 147 148 148 148 148 148 148 149 149 149 149 149 149 150 150 150 150 150 150 151 151 151 151 151 151 152 152 152 152 152 152 153 153 153 153 153 153 154 154 154 154 154 154 155
## [970] 155 155 155 155 155 156 156 156 156 156 156 157 157 157 157 157 157 158 158 158 158 158 158 159 159 159 159 159 159 160 160
You can specify len=1.7e6 (and omit the cyc argument) to get exactly 1.7 million elements, or you can get a whole number of cycles using cyc.
How about
len <- 2e6
step <- 400
x <- rep(64 * seq(0, ceiling(len / step) - 1), each = step) +
sort(rep(1:64, length.out = step))
x <- x[seq(len)] # to get rid of extra elements
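A quick check of the anchor points from the question (assuming the code above has been run):
x[c(400, 800)]
# [1]  64 128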