I was wondering if you could help me with this problem. I have a dataset of US counties for which I am trying to do a k-nearest-neighbour analysis for spatial weighting, following the method proposed here (section 4.5), but the results aren't making sense, or perhaps I'm not understanding them.
library(spdep)
library(tigris)
library(sf)
counties <- counties("Georgia", cb = TRUE)
coords <- st_centroid(st_geometry(counties), of_largest_polygon=TRUE)
col.knn <- knearneigh(coords)
gck4.nb <- knn2nb(knearneigh(coords, k=4, longlat=TRUE))
summary(gck4.nb, coords, longlat=TRUE, scale=0.5)
However, the link distances in the output seem far too small, on the order of less than 1 km:
Neighbour list object:
Number of regions: 159
Number of nonzero links: 636
Percentage nonzero weights: 2.515723
Average number of links: 4
Non-symmetric neighbours list
Link number distribution:
4
159
159 least connected regions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 with 4 links
159 most connected regions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 with 4 links
Summary of link distances:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1355 0.2650 0.3085 0.3112 0.3482 0.6224
The decimal point is 1 digit(s) to the left of the |
1 | 44
1 | 7799999999999999
2 | 00000000000011111111112222222222222233333333333333333333333333444444
2 | 55555555555555555555555555556666666666666666666666666666666666667777+92
3 | 00000000000000000000000000000001111111111111111111111111111111111111+121
3 | 55555555555555555555555555555556666666666667777777777777777777777777+19
4 | 00000000000111111111112222222222223333333444
4 | 555667777999
5 | 0000014
5 | 7888
6 | 2
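For what it's worth, one way to check whether those link distances are kilometres or decimal degrees is to compute them directly with spdep's nbdists, once with and once without great-circle distances. This is a sketch that assumes the gck4.nb and coords objects from the code above; extracting a plain coordinate matrix first avoids any automatic metric detection on the sfc object:

```r
library(spdep)
library(sf)

xy <- st_coordinates(coords)   # plain lon/lat matrix from the sfc centroids

# great-circle link distances in km
summary(unlist(nbdists(gck4.nb, xy, longlat = TRUE)))

# the same links measured naively in decimal degrees
summary(unlist(nbdists(gck4.nb, xy, longlat = FALSE)))
```

If the second summary reproduces the 0.13 to 0.62 range shown above, the reported distances are degrees, not kilometres (0.3 degrees is roughly 30 km at Georgia's latitude, which is a plausible county-centroid spacing).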
I have a CSV (value, carbon, latitude, longitude) that I am trying to create a raster from.
CSV file sample:
Carbon Latitude Longitude coords.x1 coords.x2
1 385 36 74 36 74
2 463 36 74 36 74
3 35 36 74 36 74
4 38 36 74 36 74
5 34 36 74 36 74
6 11 36 74 36 74
7 46 36 74 36 74
8 18 36 74 36 74
9 213 36 74 36 74
10 619 36 74 36 74
11 140 36 74 36 74
12 40 36 74 36 74
13 42 36 74 36 74
14 18 36 74 36 74
15 277 36 74 36 74
16 641 36 74 36 74
17 416 36 74 36 74
18 459 36 74 36 74
19 1073 36 74 36 74
20 628 36 74 36 74
21 425 36 74 36 74
22 550 36 74 36 74
23 163 36 74 36 74
24 366 36 74 36 74
25 379 36 74 36 74
26 279 36 74 36 74
27 284 36 74 36 74
28 454 36 74 36 74
29 813 36 74 36 74
30 1296 36 74 36 74
31 1539 36 74 36 74
32 997 36 74 36 74
33 498 36 74 36 74
34 857 36 74 36 74
35 413 36 74 36 74
36 76 36 74 36 74
37 189 36 74 36 74
38 130 36 74 36 74
39 22 36 74 36 74
40 18 36 74 36 74
41 137 36 74 36 74
42 521 36 74 36 74
43 28 36 74 36 74
44 188 36 74 36 74
45 101 36 74 36 74
46 19 36 74 36 74
47 935 36 74 36 74
48 22 36 74 36 74
49 22 36 74 36 74
50 165 36 74 36 74
51 274 36 74 36 74
52 316 36 74 36 74
53 270 36 74 36 74
54 125 36 74 36 74
55 116 36 74 36 74
56 109 36 74 36 74
57 70 36 74 36 74
58 194 36 74 36 74
59 36 36 74 36 74
60 24 36 74 36 74
61 93 36 74 36 74
62 32 36 74 36 74
63 144 36 74 36 74
64 47 36 74 36 74
65 304 36 74 36 74
66 338 36 74 36 74
67 214 36 74 36 74
68 150 36 74 36 74
69 1799 36 74 36 74
70 394 36 74 36 74
71 24 36 74 36 74
72 117 36 74 36 74
73 140 36 74 36 74
74 47 36 74 36 74
75 3 36 74 36 74
76 221 36 74 36 74
77 41 36 74 36 74
78 319 36 74 36 74
79 119 36 74 36 74
80 39 36 74 36 74
81 3 36 74 36 74
82 2 36 74 36 74
83 15 36 74 36 74
84 69 36 74 36 74
85 40 36 74 36 74
86 233 36 74 36 74
87 15 36 74 36 74
88 147 36 74 36 74
89 50 36 74 36 74
90 348 36 74 36 74
91 549 36 74 36 74
92 5 36 74 36 74
93 191 36 74 36 74
94 409 36 75 36 75
95 93 36 75 36 75
96 1641 36 75 36 75
97 154 36 75 36 75
98 852 36 75 36 75
99 1571 36 75 36 75
100 1173 36 75 36 75
101 19 36 75 36 75
102 9 36 75 36 75
103 15 36 75 36 75
104 67 36 75 36 75
105 666 36 75 36 75
106 3 36 75 36 75
107 227 36 75 36 75
108 130 36 75 36 75
109 423 36 75 36 75
110 31 36 75 36 75
111 559 36 75 36 75
112 143 36 75 36 75
113 63 36 75 36 75
114 1211 36 75 36 75
115 280 36 75 36 75
116 1027 36 75 36 75
117 636 36 75 36 75
118 207 36 75 36 75
119 233 36 75 36 75
120 332 36 75 36 75
121 266 36 75 36 75
122 266 36 75 36 75
123 284 36 75 36 75
124 240 36 75 36 75
125 613 36 75 36 75
126 28 36 75 36 75
127 762 36 75 36 75
128 58 36 75 36 75
129 310 36 75 36 75
130 12 36 75 36 75
131 15 36 75 36 75
132 343 36 75 36 75
133 128 36 75 36 75
134 177 36 75 36 75
135 320 36 75 36 75
136 205 36 75 36 75
137 108 36 75 36 75
138 1445 36 75 36 75
139 109 36 75 36 75
140 251 36 75 36 75
141 262 36 75 36 75
142 282 36 75 36 75
143 188 36 75 36 75
144 207 36 75 36 75
145 63 36 75 36 75
146 63 36 75 36 75
147 194 36 75 36 75
148 170 36 75 36 75
149 196 36 75 36 75
150 85 36 75 36 75
151 93 36 75 36 75
152 79 36 75 36 75
153 656 36 75 36 75
154 56 36 75 36 75
155 93 36 75 36 75
156 28 36 75 36 75
157 4 35 75 35 75
158 3 35 75 35 75
159 82 35 75 35 75
160 48 35 75 35 75
161 64 35 75 35 75
162 72 35 75 35 75
163 86 35 75 35 75
164 12 35 75 35 75
165 73 35 75 35 75
166 77 35 75 35 75
167 2162 35 75 35 75
168 854 35 75 35 75
169 51 35 75 35 75
170 61 35 75 35 75
171 11 35 75 35 75
172 8 35 75 35 75
173 16 35 75 35 75
174 58 35 75 35 75
175 50 35 75 35 75
176 53 35 75 35 75
177 8 35 75 35 75
178 48 35 75 35 75
179 235 35 75 35 75
180 38 35 75 35 75
181 75 35 75 35 75
182 25 35 75 35 75
183 12 35 75 35 75
184 18 35 75 35 75
185 51 35 75 35 75
186 19 35 75 35 75
187 22 35 75 35 75
188 1595 35 75 35 75
189 77 35 75 35 75
190 1673 35 75 35 75
191 42 35 75 35 75
192 120 35 75 35 75
193 66 35 75 35 75
194 53 35 75 35 75
195 66 35 75 35 75
196 6 35 75 35 75
197 5 35 75 35 75
198 36 35 75 35 75
199 54 35 75 35 75
200 46 35 75 35 75
class : SpatialPointsDataFrame
features : 13135
extent : 35, 37, 73, 76 (xmin, xmax, ymin, ymax)
crs : NA
variables : 3
names : Carbon, Latitude, Longitude
min values : 1, 35, 73
max values : 5829, 37, 76
R Script:
library(sp) # vector data
library(raster) # raster data
library(rgdal) # input/output, projections
library(rgeos) # geometry ops
library(spdep) # spatial dependence
foresta<-carbonstock
head(carbonstock)
data <- data.frame(carbonstock$Longitude, carbonstock$Latitude, carbonstock$Carbon) # lon, lat, value
# points from scratch
coords = cbind(carbonstock$Longitude, carbonstock$Latitude) # x = longitude, y = latitude
sp = SpatialPoints(coords)
# make spatial data frame
spdf = SpatialPointsDataFrame(coords, data)
spdf = SpatialPointsDataFrame(sp, data)
# promote data frame to spatial
coordinates(data) = cbind(carbonstock$Longitude, carbonstock$Latitude)
# or equivalently, by column name:
# coordinates(data) = ~Longitude + Latitude
# back to data
as.data.frame(data)
plot(data)
library(raster)
dfr <- rasterFromXYZ(as.data.frame(data)) # first two columns as lon-lat, third as value
plot(dfr)
dfr
library(raster)
# create spatial points data frame
spg <- data.frame(x = carbonstock$Longitude, y = carbonstock$Latitude,
                  Carbon = carbonstock$Carbon)
coordinates(spg) <- ~ x + y
# coerce to SpatialPixelsDataFrame
gridded(spg) <- TRUE
# coerce to raster
rasterDF <- raster(spg)
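A likely culprit in the script above is the coordinate order: in sp/raster the x coordinate is longitude and y is latitude, yet latitude was being passed first (the printed extent, with x running 35 to 37, suggests exactly that). A minimal end-to-end sketch, assuming the carbonstock data frame from the question; note that rasterFromXYZ additionally requires the points to sit on a regular grid, so repeated coordinates are aggregated first:

```r
library(raster)

# x = longitude, y = latitude, z = value to rasterize
xyz <- data.frame(x = carbonstock$Longitude,
                  y = carbonstock$Latitude,
                  z = carbonstock$Carbon)

# many points share a coordinate here, so reduce to one value per cell first
xyz <- aggregate(z ~ x + y, data = xyz, FUN = mean)

r <- rasterFromXYZ(xyz, crs = "+proj=longlat +datum=WGS84")
plot(r)
```

Whether mean is the right aggregation for the Carbon values is an assumption; sum or median may be more appropriate depending on what the measurements represent.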
I need to get the order of one vector in order to sort another vector. The point is that I don't want the sort to be stable; in fact, I'd like equal values to end up in a random order. Any idea how to do it in R in finite time? :D
Thanks for any help.
You can do this in base R using order. order will take multiple variables to sort on; if you make the second one a random variable, it will randomize the ties. Here is an example using the built-in iris data. The variable Sepal.Length has several ties for second-lowest value. Here are some:
iris$Sepal.Length[c(9,39,43)]
[1] 4.4 4.4 4.4
Now let's sort just that variable (stable sort) and then sort with a random secondary sort.
order(iris$Sepal.Length)
[1] 14 9 39 43 42 4 7 23 48 3 30 12 13 25 31 46 2 10 35
[20] 38 58 107 5 8 26 27 36 41 44 50 61 94 1 18 20 22 24 40
[39] 45 47 99 28 29 33 60 49 6 11 17 21 32 85 34 37 54 81 82
[58] 90 91 65 67 70 89 95 122 16 19 56 80 96 97 100 114 15 68 83
[77] 93 102 115 143 62 71 150 63 79 84 86 120 139 64 72 74 92 128 135
[96] 69 98 127 149 57 73 88 101 104 124 134 137 147 52 75 112 116 129 133
[115] 138 55 105 111 117 148 59 76 66 78 87 109 125 141 145 146 77 113 144
[134] 53 121 140 142 51 103 110 126 130 108 131 106 118 119 123 136 132
order(iris$Sepal.Length, sample(150,150))
[1] 14 43 39 9 42 48 7 4 23 3 30 25 31 46 13 12 35 38 107
[20] 10 58 2 8 41 27 61 94 5 36 44 50 26 18 22 99 40 20 47
[39] 24 45 1 33 60 29 28 49 85 11 6 32 21 17 90 81 91 54 34
[58] 37 82 67 122 95 65 70 89 100 96 56 114 80 16 19 97 93 15 68
[77] 143 102 83 115 150 62 71 120 79 84 63 139 86 72 135 74 64 92 128
[96] 149 69 98 127 88 134 101 57 137 73 104 147 124 138 112 129 116 75 52
[115] 133 148 55 111 105 117 59 76 87 66 78 146 141 109 125 145 144 113 77
[134] 140 53 121 142 51 103 126 130 110 108 131 106 136 119 118 123 132
Without the random secondary sort, positions 2, 3, and 4 are in order (stable). With the random secondary sort, they are jumbled.
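An equivalent base-R spelling, in case it is useful, is rank with ties.method = "random", which assigns tied elements distinct ranks uniformly at random; ordering by those ranks gives the same effect. A small sketch on a toy vector rather than iris:

```r
set.seed(1)
x <- c(4.4, 5.0, 4.4, 4.4, 4.9)

# rank() with random tie-breaking gives each tied element a distinct rank;
# ordering by those ranks sorts x with ties resolved in random order
o <- order(rank(x, ties.method = "random"))
x[o]   # 4.4 4.4 4.4 4.9 5.0, with the three 4.4s shuffled among positions 1..3
```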
Try fct_reorder in the forcats package to order one factor by another. If you want to introduce randomness as well, try fct_reorder2 with .y = runif(length(your_vector)).
(I'm apparently thinking in strange directions today: fct_reorder will reorder the levels of a factor. If that's what you are after, this may help; otherwise, order is the better approach.)
Suppose I have a data set and I want to do a 4-fold cross validation using logistic regression. So there will be 4 different models. In R, I did the following:
ctrl <- trainControl(method = "repeatedcv", number = 4, savePredictions = TRUE)
mod_fit <- train(outcome ~., data=data1, method = "glm", family="binomial", trControl = ctrl)
I would assume that mod_fit should contain 4 separate sets of coefficients? When I look at mod_fit$finalModel I just get a single set of coefficients.
I've created a reproducible example based on your code snippet. The first thing to notice about your code is that it specifies repeatedcv as the method but doesn't give any repeats, so the number = 4 parameter is just telling it to resample 4 times (this is not an answer to your question, but it is important to understand).
mod_fit$finalModel gives you only one set of coefficients because it is the single final model that caret fits on the full data set once the non-repeated 4-fold CV results have been used to assess performance.
You can see the fold-level performance in the resample object:
library(caret)
library(mlbench)
data(iris)
iris$binary <- ifelse(iris$Species=="setosa",1,0)
iris$Species <- NULL
ctrl <- trainControl(method = "repeatedcv",
number = 4,
savePredictions = TRUE,
verboseIter = T,
returnResamp = "all")
mod_fit <- train(binary ~.,
data=iris,
method = "glm",
family="binomial",
trControl = ctrl)
# Fold-level Performance
mod_fit$resample
RMSE Rsquared parameter Resample
1 2.630866e-03 0.9999658 none Fold1.Rep1
2 3.863821e-08 1.0000000 none Fold2.Rep1
3 8.162472e-12 1.0000000 none Fold3.Rep1
4 2.559189e-13 1.0000000 none Fold4.Rep1
To your earlier point, the package is not going to save and display information on the coefficients of each fold. In addition to the performance information above, however, it does save index (the list of in-sample rows), indexOut (the held-out rows), and the random seeds for each fold, so if you were so inclined it would be easy to reconstruct the intermediate models.
mod_fit$control$seeds
[[1]]
[1] 169815
[[2]]
[1] 445763
[[3]]
[1] 871613
[[4]]
[1] 706905
[[5]]
[1] 89408
mod_fit$control$index
$Fold1
[1] 1 2 3 4 5 6 7 8 9 10 11 12 15 18 19 21 22 24 28 30 31 32 33 34 35 40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 59 60 61 63
[45] 64 65 66 68 69 70 71 72 73 75 76 77 79 80 81 82 84 85 86 87 89 90 91 92 93 94 95 96 98 99 100 103 104
106 107 108 110 111 113 114 116 118 119 120
[89] 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 140 141 142 143 145 147 149 150
$Fold2
[1] 1 6 7 8 12 13 14 15 16 17 18 19 20 21 22 23 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 42
44 46 48 50 51 53 54 55 56 57 58
[45] 59 61 62 64 66 67 69 70 71 72 73 74 75 76 78 79 80 81 82 83 84 85 87 88 89 90 91 92 95 96 97 98 99
101 102 104 105 106 108 109 111 112 113 115
[89] 116 117 119 120 121 122 123 127 130 131 132 134 135 137 138 139 140 141 142 143 144 145 146 147 148
$Fold3
[1] 2 3 4 5 6 7 8 9 10 11 13 14 16 17 20 23 24 25 26 27 28 29 30 33 35 36 37 38 39 40 41 43 45
46 47 49 50 51 52 54 55 56 57 58
[45] 60 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 82 83 84 85 86 88 89 93 94 97 98 99 100 101 102
103 105 106 107 108 109 110 111 112 114 115
[89] 117 118 119 121 124 125 126 128 129 131 132 133 134 135 136 137 138 139 144 145 146 147 148 149 150
$Fold4
[1] 1 2 3 4 5 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 29 31 32 34 36 37 38 39 41
42 43 44 45 47 48 49 52 53 55 56
[45] 57 58 59 60 61 62 63 65 67 68 74 77 78 79 80 81 83 86 87 88 90 91 92 93 94 95 96 97 100 101 102 103 104
105 107 109 110 112 113 114 115 116 117 118
[89] 120 122 123 124 125 126 127 128 129 130 133 136 137 138 139 140 141 142 143 144 146 148 149 150
mod_fit$control$indexOut
$Resample1
[1] 13 14 16 17 20 23 25 26 27 29 36 37 38 39 55 56 57 58 62 67 74 78 83 88 97 101 102 105 109 112 115 117 137
138 139 144 146 148
$Resample2
[1] 2 3 4 5 9 10 11 24 41 43 45 47 49 52 60 63 65 68 77 86 93 94 100 103 107 110 114 118 124 125 126 128 129
133 136 149 150
$Resample3
[1] 1 12 15 18 19 21 22 31 32 34 42 44 48 53 59 61 79 80 81 87 90 91 92 95 96 104 113 116 120 122 123 127 130
140 141 142 143
$Resample4
[1] 6 7 8 28 30 33 35 40 46 50 51 54 64 66 69 70 71 72 73 75 76 82 84 85 89 98 99 106 108 111 119 121 131
132 134 135 145 147
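To make that last point concrete, here is one way to refit the fold-level models from the stored indices. This is a sketch that assumes the mod_fit object and modified iris data from the example above, and uses plain glm rather than caret internals:

```r
# refit each fold's model with glm, using the stored in-sample rows
fold_models <- lapply(mod_fit$control$index, function(rows) {
  glm(binary ~ ., data = iris[rows, ], family = binomial)
})

# four sets of coefficients, one per fold
lapply(fold_models, coef)
```

Because caret also stores the per-fold seeds, this reproduces the fold composition exactly; the coefficients themselves come straight from glm's deterministic fit on each training subset.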
@Damien, your mod_fit will not contain 4 separate sets of coefficients. You are asking for cross-validation with 4 folds; this does not mean you will have 4 different models. According to the documentation here, the train function works as follows:
At the end of the resampling loop (in your case, 4 iterations for 4 folds) you will have one set of average forecast-accuracy measures (e.g., RMSE, R-squared) for a given set of model parameters.
Since you did not use the tuneGrid or tuneLength argument in the train function, by default train will tune over three values of each tunable parameter.
This means you will have at most three models (not 4 models, as you were expecting) and therefore three sets of average model performance measures.
The optimum model is the one with the lowest RMSE in the case of regression. That model's coefficients are available in mod_fit$finalModel.
I need a vector that repeats numbers in a sequence at varying intervals. I basically need this
c(rep(1:42, each=6), rep(43:64, each = 7),
rep(65:106, each=6), rep(107:128, each = 7),
.... but I need this to keep going, until almost 2 million.
So I want a vector that looks like
[1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 ...
.....
[252] 43 43 43 43 43 43 43 44 44 44 44 44 44 44
....
[400] 64 64 64 64 64 64 65 65 65 65 65 65...
and so on. Not just alternating between 6 and 7 repetitions; rather, mostly 6s and fewer 7s, until the whole vector is something like 1.7 million rows. So, is there a loop I can do? Or apply, replicate? I need the 400th entry in the vector to be 64, the 800th entry to be 128, and so on, in somewhat evenly spaced integers.
UPDATE
Thank you all for the quick, clever tricks there. They worked, at least well enough for the deadline I was dealing with. I realize repeating 6 xs and 7 xs is a really dumb way to try to solve this, but it was quick at least. Now that I have some time, I would like to get everyone's opinions/ideas on my real underlying issue here.
I have two datasets to merge. They are both sensor datasets, both with stopwatch time as primary keys. But one records every 1/400 of a second, and the other records every 1/256 of a second. I have trimmed the top of each so that they start at the exact same moment. But now what? I have 400 records per second in one set and 256 records per second in the other. Is there a way to merge these without losing data? Interpolating or just repeating observations is a-ok, necessary I think, but I'd rather not throw any data out.
I read this post here, that had to do with using xts and zoo for a very similar problem to mine. But they have nice epoch date/times for each. I just have these awful fractions of seconds!
sample data (A):
time dist a_lat
1 139.4300 22 0
2 139.4325 22 0
3 139.4350 22 0
4 139.4375 22 0
5 139.4400 22 0
6 139.4425 22 0
7 139.4450 22 0
8 139.4475 22 0
9 139.4500 22 0
10 139.4525 22 0
sample data (B):
timestamp hex_acc_x hex_acc_y hex_acc_z
1 367065215501 -0.5546875 -0.7539062 0.1406250
2 367065215505 -0.5468750 -0.7070312 0.2109375
3 367065215509 -0.4218750 -0.6835938 0.1796875
4 367065215513 -0.5937500 -0.7421875 0.1562500
5 367065215517 -0.6757812 -0.7773438 0.2031250
6 367065215521 -0.5937500 -0.8554688 0.2460938
7 367065215525 -0.6132812 -0.8476562 0.2109375
8 367065215529 -0.3945312 -0.8906250 0.2031250
9 367065215533 -0.3203125 -0.8906250 0.2226562
10 367065215537 -0.3867188 -0.9531250 0.2578125
(oh yeah, and btw, the B dataset timestamps are epoch format * 256, because life is hard. I haven't converted them for this because dataset A has nothing like that, just 0.0025-second intervals. Also, the B sensor was left on for hours after the A sensor turned off, so that doesn't help.)
Or if you like, you can try this using apply
# using this sample data
df <- data.frame(from=c(1,4,7,11), to = c(3,6,10,13),rep=c(6,7,6,7));
> df
# from to rep
#1 1 3 6
#2 4 6 7
#3 7 10 6
#4 11 13 7
unlist(apply(df, 1, function(x) rep(x['from']:x['to'], each=x['rep'])))
# [1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4
#[26] 5 5 5 5 5 5 5 6 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8
#[51] 8 9 9 9 9 9 9 10 10 10 10 10 10 11 11 11 11 11 11 11 12 12 12 12 12
#[76] 12 12 13 13 13 13 13 13 13
Now that you put it that way ... I have absolutely no idea how you are planning on using all of the 6s and 7s. :-)
Regardless, I recommend standardizing the time, adding a "sample" column, and merging on them. Having the "sample" column may facilitate your processing later on.
Your data:
df400 <- structure(list(time = c(139.43, 139.4325, 139.435, 139.4375, 139.44, 139.4425,
139.445, 139.4475, 139.45, 139.4525),
dist = c(22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L),
a_lat = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)),
.Names = c("time", "dist", "a_lat"),
class = "data.frame", row.names = c(NA, -10L))
df256 <- structure(list(timestamp = c(367065215501, 367065215505, 367065215509, 367065215513,
367065215517, 367065215521, 367065215525, 367065215529,
367065215533, 367065215537),
hex_acc_x = c(-0.5546875, -0.546875, -0.421875, -0.59375, -0.6757812,
-0.59375, -0.6132812, -0.3945312, -0.3203125, -0.3867188),
hex_acc_y = c(-0.7539062, -0.7070312, -0.6835938, -0.7421875,
-0.7773438, -0.8554688, -0.8476562, -0.890625,
-0.890625, -0.953125),
hex_acc_z = c(0.140625, 0.2109375, 0.1796875, 0.15625, 0.203125,
0.2460938, 0.2109375, 0.203125, 0.2226562, 0.2578125)),
.Names = c("timestamp", "hex_acc_x", "hex_acc_y", "hex_acc_z"),
class = "data.frame", row.names = c(NA, -10L))
Standardize your time frames:
colnames(df256)[1] <- 'time'
df400$time <- df400$time - df400$time[1]
df256$time <- (df256$time - df256$time[1]) / 256
Assign a label for easy reference (not that the NAs won't be clear enough):
df400 <- cbind(sample='A', df400, stringsAsFactors=FALSE)
df256 <- cbind(sample='B', df256, stringsAsFactors=FALSE)
And now for the merge and sorting:
dat <- merge(df400, df256, by=c('sample', 'time'), all.x=TRUE, all.y=TRUE)
dat <- dat[order(dat$time),]
dat
## sample time dist a_lat hex_acc_x hex_acc_y hex_acc_z
## 1 A 0.000000 22 0 NA NA NA
## 11 B 0.000000 NA NA -0.5546875 -0.7539062 0.1406250
## 2 A 0.002500 22 0 NA NA NA
## 3 A 0.005000 22 0 NA NA NA
## 4 A 0.007500 22 0 NA NA NA
## 5 A 0.010000 22 0 NA NA NA
## 6 A 0.012500 22 0 NA NA NA
## 7 A 0.015000 22 0 NA NA NA
## 12 B 0.015625 NA NA -0.5468750 -0.7070312 0.2109375
## 8 A 0.017500 22 0 NA NA NA
## 9 A 0.020000 22 0 NA NA NA
## 10 A 0.022500 22 0 NA NA NA
## 13 B 0.031250 NA NA -0.4218750 -0.6835938 0.1796875
## 14 B 0.046875 NA NA -0.5937500 -0.7421875 0.1562500
## 15 B 0.062500 NA NA -0.6757812 -0.7773438 0.2031250
## 16 B 0.078125 NA NA -0.5937500 -0.8554688 0.2460938
## 17 B 0.093750 NA NA -0.6132812 -0.8476562 0.2109375
## 18 B 0.109375 NA NA -0.3945312 -0.8906250 0.2031250
## 19 B 0.125000 NA NA -0.3203125 -0.8906250 0.2226562
## 20 B 0.140625 NA NA -0.3867188 -0.9531250 0.2578125
I'm guessing your data was just a small representation. If I've guessed poorly (that A's times are already in seconds and B's ticks are 1/256ths of a second), then just scale differently. Either way, by resetting the first value to zero and then merging/sorting, they are easy to merge and sort.
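Since you said interpolation is a-ok, another option after the same standardization is to resample the B channels onto A's time base with base R's approx, which avoids the interleaved NAs entirely. A sketch assuming the df400/df256 objects above (with df256's column already renamed to time and both time columns rescaled); linear interpolation is an assumption about what is appropriate for your accelerometer data:

```r
# linearly interpolate each B channel at A's sample times (rule = 2 holds the
# end values constant instead of returning NA outside B's observed range)
b_cols <- c("hex_acc_x", "hex_acc_y", "hex_acc_z")
b_on_a <- lapply(df256[b_cols],
                 function(col) approx(df256$time, col, xout = df400$time, rule = 2)$y)

merged <- cbind(df400, as.data.frame(b_on_a))  # one row per A sample, no NAs
```

This keeps every A record and loses no B information beyond what linear interpolation smooths over; you could run it the other way (A onto B's time base) just as easily.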
alt <- data.frame(len=c(42,22),rep=c(6,7));
alt;
## len rep
## 1 42 6
## 2 22 7
altrep <- function(alt,cyc,len) {
cyclen <- sum(alt$len*alt$rep);
if (missing(cyc)) {
if (missing(len)) {
cyc <- 1;
len <- cyc*cyclen;
} else {
cyc <- ceiling(len/cyclen);
};
} else if (missing(len)) {
len <- cyc*cyclen;
};
if (isTRUE(all.equal(len,0))) return(integer());
result <- rep(1:(cyc*sum(alt$len)),rep(rep(alt$rep,alt$len),cyc));
length(result) <- len;
result;
};
altrep(alt,2);
## [1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9
## [52] 9 9 9 10 10 10 10 10 10 11 11 11 11 11 11 12 12 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 14 15 15 15 15 15 15 16 16 16 16 16 16 17 17 17 17 17 17
## [103] 18 18 18 18 18 18 19 19 19 19 19 19 20 20 20 20 20 20 21 21 21 21 21 21 22 22 22 22 22 22 23 23 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25 25 26 26 26
## [154] 26 26 26 27 27 27 27 27 27 28 28 28 28 28 28 29 29 29 29 29 29 30 30 30 30 30 30 31 31 31 31 31 31 32 32 32 32 32 32 33 33 33 33 33 33 34 34 34 34 34 34
## [205] 35 35 35 35 35 35 36 36 36 36 36 36 37 37 37 37 37 37 38 38 38 38 38 38 39 39 39 39 39 39 40 40 40 40 40 40 41 41 41 41 41 41 42 42 42 42 42 42 43 43 43
## [256] 43 43 43 43 44 44 44 44 44 44 44 45 45 45 45 45 45 45 46 46 46 46 46 46 46 47 47 47 47 47 47 47 48 48 48 48 48 48 48 49 49 49 49 49 49 49 50 50 50 50 50
## [307] 50 50 51 51 51 51 51 51 51 52 52 52 52 52 52 52 53 53 53 53 53 53 53 54 54 54 54 54 54 54 55 55 55 55 55 55 55 56 56 56 56 56 56 56 57 57 57 57 57 57 57
## [358] 58 58 58 58 58 58 58 59 59 59 59 59 59 59 60 60 60 60 60 60 60 61 61 61 61 61 61 61 62 62 62 62 62 62 62 63 63 63 63 63 63 63 64 64 64 64 64 64 64 65 65
## [409] 65 65 65 65 66 66 66 66 66 66 67 67 67 67 67 67 68 68 68 68 68 68 69 69 69 69 69 69 70 70 70 70 70 70 71 71 71 71 71 71 72 72 72 72 72 72 73 73 73 73 73
## [460] 73 74 74 74 74 74 74 75 75 75 75 75 75 76 76 76 76 76 76 77 77 77 77 77 77 78 78 78 78 78 78 79 79 79 79 79 79 80 80 80 80 80 80 81 81 81 81 81 81 82 82
## [511] 82 82 82 82 83 83 83 83 83 83 84 84 84 84 84 84 85 85 85 85 85 85 86 86 86 86 86 86 87 87 87 87 87 87 88 88 88 88 88 88 89 89 89 89 89 89 90 90 90 90 90
## [562] 90 91 91 91 91 91 91 92 92 92 92 92 92 93 93 93 93 93 93 94 94 94 94 94 94 95 95 95 95 95 95 96 96 96 96 96 96 97 97 97 97 97 97 98 98 98 98 98 98 99 99
## [613] 99 99 99 99 100 100 100 100 100 100 101 101 101 101 101 101 102 102 102 102 102 102 103 103 103 103 103 103 104 104 104 104 104 104 105 105 105 105 105 105 106 106 106 106 106 106 107 107 107 107 107
## [664] 107 107 108 108 108 108 108 108 108 109 109 109 109 109 109 109 110 110 110 110 110 110 110 111 111 111 111 111 111 111 112 112 112 112 112 112 112 113 113 113 113 113 113 113 114 114 114 114 114 114 114
## [715] 115 115 115 115 115 115 115 116 116 116 116 116 116 116 117 117 117 117 117 117 117 118 118 118 118 118 118 118 119 119 119 119 119 119 119 120 120 120 120 120 120 120 121 121 121 121 121 121 121 122 122
## [766] 122 122 122 122 122 123 123 123 123 123 123 123 124 124 124 124 124 124 124 125 125 125 125 125 125 125 126 126 126 126 126 126 126 127 127 127 127 127 127 127 128 128 128 128 128 128 128
altrep(alt,len=1000);
## [1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9
## [52] 9 9 9 10 10 10 10 10 10 11 11 11 11 11 11 12 12 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 14 15 15 15 15 15 15 16 16 16 16 16 16 17 17 17 17 17 17
## [103] 18 18 18 18 18 18 19 19 19 19 19 19 20 20 20 20 20 20 21 21 21 21 21 21 22 22 22 22 22 22 23 23 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25 25 26 26 26
## [154] 26 26 26 27 27 27 27 27 27 28 28 28 28 28 28 29 29 29 29 29 29 30 30 30 30 30 30 31 31 31 31 31 31 32 32 32 32 32 32 33 33 33 33 33 33 34 34 34 34 34 34
## [205] 35 35 35 35 35 35 36 36 36 36 36 36 37 37 37 37 37 37 38 38 38 38 38 38 39 39 39 39 39 39 40 40 40 40 40 40 41 41 41 41 41 41 42 42 42 42 42 42 43 43 43
## [256] 43 43 43 43 44 44 44 44 44 44 44 45 45 45 45 45 45 45 46 46 46 46 46 46 46 47 47 47 47 47 47 47 48 48 48 48 48 48 48 49 49 49 49 49 49 49 50 50 50 50 50
## [307] 50 50 51 51 51 51 51 51 51 52 52 52 52 52 52 52 53 53 53 53 53 53 53 54 54 54 54 54 54 54 55 55 55 55 55 55 55 56 56 56 56 56 56 56 57 57 57 57 57 57 57
## [358] 58 58 58 58 58 58 58 59 59 59 59 59 59 59 60 60 60 60 60 60 60 61 61 61 61 61 61 61 62 62 62 62 62 62 62 63 63 63 63 63 63 63 64 64 64 64 64 64 64 65 65
## [409] 65 65 65 65 66 66 66 66 66 66 67 67 67 67 67 67 68 68 68 68 68 68 69 69 69 69 69 69 70 70 70 70 70 70 71 71 71 71 71 71 72 72 72 72 72 72 73 73 73 73 73
## [460] 73 74 74 74 74 74 74 75 75 75 75 75 75 76 76 76 76 76 76 77 77 77 77 77 77 78 78 78 78 78 78 79 79 79 79 79 79 80 80 80 80 80 80 81 81 81 81 81 81 82 82
## [511] 82 82 82 82 83 83 83 83 83 83 84 84 84 84 84 84 85 85 85 85 85 85 86 86 86 86 86 86 87 87 87 87 87 87 88 88 88 88 88 88 89 89 89 89 89 89 90 90 90 90 90
## [562] 90 91 91 91 91 91 91 92 92 92 92 92 92 93 93 93 93 93 93 94 94 94 94 94 94 95 95 95 95 95 95 96 96 96 96 96 96 97 97 97 97 97 97 98 98 98 98 98 98 99 99
## [613] 99 99 99 99 100 100 100 100 100 100 101 101 101 101 101 101 102 102 102 102 102 102 103 103 103 103 103 103 104 104 104 104 104 104 105 105 105 105 105 105 106 106 106 106 106 106 107 107 107 107 107
## [664] 107 107 108 108 108 108 108 108 108 109 109 109 109 109 109 109 110 110 110 110 110 110 110 111 111 111 111 111 111 111 112 112 112 112 112 112 112 113 113 113 113 113 113 113 114 114 114 114 114 114 114
## [715] 115 115 115 115 115 115 115 116 116 116 116 116 116 116 117 117 117 117 117 117 117 118 118 118 118 118 118 118 119 119 119 119 119 119 119 120 120 120 120 120 120 120 121 121 121 121 121 121 121 122 122
## [766] 122 122 122 122 122 123 123 123 123 123 123 123 124 124 124 124 124 124 124 125 125 125 125 125 125 125 126 126 126 126 126 126 126 127 127 127 127 127 127 127 128 128 128 128 128 128 128 129 129 129 129
## [817] 129 129 130 130 130 130 130 130 131 131 131 131 131 131 132 132 132 132 132 132 133 133 133 133 133 133 134 134 134 134 134 134 135 135 135 135 135 135 136 136 136 136 136 136 137 137 137 137 137 137 138
## [868] 138 138 138 138 138 139 139 139 139 139 139 140 140 140 140 140 140 141 141 141 141 141 141 142 142 142 142 142 142 143 143 143 143 143 143 144 144 144 144 144 144 145 145 145 145 145 145 146 146 146 146
## [919] 146 146 147 147 147 147 147 147 148 148 148 148 148 148 149 149 149 149 149 149 150 150 150 150 150 150 151 151 151 151 151 151 152 152 152 152 152 152 153 153 153 153 153 153 154 154 154 154 154 154 155
## [970] 155 155 155 155 155 156 156 156 156 156 156 157 157 157 157 157 157 158 158 158 158 158 158 159 159 159 159 159 159 160 160
You can specify len=1.7e6 (and omit the cyc argument) to get exactly 1.7 million elements, or you can get a whole number of cycles using cyc.
How about
len <- 2e6
step <- 400
x <- rep(64 * seq(0, ceiling(len / step) - 1), each = step) +
sort(rep(1:64, length.out = step))
x <- x[seq(len)] # to get rid of extra elements
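A quick sanity check of this construction on a smaller len confirms it hits the anchor points the question asked for (the 400th entry is 64, the 800th is 128); note that within each block of 400 the low values 1 through 16 absorb the extra repetitions, appearing 7 times each while 17 through 64 appear 6 times:

```r
len <- 2000
step <- 400

# each block of 400 entries covers 64 consecutive values, offset block by block
x <- rep(64 * seq(0, ceiling(len / step) - 1), each = step) +
  sort(rep(1:64, length.out = step))
x <- x[seq(len)]  # trim any extra elements

x[400]             # 64
x[800]             # 128
sum(x[1:400] == 1) # 7: value 1 gets one of the extra repetitions
```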