Converting a weird list into a data frame in R - r

I scraped a table from a web page using this code:
library(XML)
url2 <- "http://www.baseball-reference.com/leagues/MLB/"
data2 <- readHTMLTable(url2, stringAsFactor = FALSE)
It gave me a list which looks something like this,
$teams_team_wins3000
Year G ARI ATL BLA BAL BOS CHC CHW CIN CLE COL DET HOU KCR ANA LAD FLA
1 2016 149 62 57 81 84 94 72 62 86 71 78 78 75 64 84 73
2 2015 162 79 67 81 78 97 76 64 81 68 74 86 95 85 92 71
3 2014 162 64 79 96 71 73 73 76 85 66 90 70 89 98 94 77
4 2013 163 81 96 85 97 66 63 90 92 74 93 51 86 78 92 62
5 2012 162 81 94 93 69 61 85 97 68 64 88 55 72 89 86 69
6 2011 162 94 89 69 90 71 79 79 80 73 95 56 71 86 82 72
7 2010 162 65 91 66 89 75 88 91 69 83 81 76 67 80 80 80
8 2009 163 70 86 64 95 83 79 78 65 92 86 74 65 97 95 87
9 2008 163 82 72 68 95 97 89 74 81 74 74 86 75 100 84 84
If you'd like you can simply copy the code on top to get the same table. The problem is that R is reading this like a list, and I want it to be a data frame.
Normally, I would use this code to convert it into a data frame, but it's not working this time.
do.call(rbind, data2) %>% as.data.frame
I'm still fairly new to R, and what I would like to do is convert this list into a data frame so that I can then structure the data to look something like this,
Year Team Wins Games
2016 ARI 62 149
2016 ATL 57 149
All help is appreciated.

A couple of problems. First, a spelling error: the argument is stringsAsFactors, not stringAsFactor. Second, there is a data frame in there, but because readHTMLTable is prepared to return multiple tables, it comes back as an item of a list. You can extract it with "[[" just as you would for any list:
str(data2[[1]])
'data.frame': 120 obs. of 33 variables:
$ Year: Factor w/ 117 levels "1901","1902",..: 116 115 114 113 112 111 110 109 108 107 ...
$ G : Factor w/ 15 levels "111","117","129",..: 6 12 12 13 12 12 12 13 13 13 ...
$ ARI : Factor w/ 19 levels "","100","51",..: 4 10 5 11 11 17 6 7 12 15 ...
$ ATL : Factor w/ 55 levels "101","103","104",..: 16 26 37 53 51 46 48 44 31 42 ...
$ BLA : Factor w/ 4 levels "","50","68","BLA": 1 1 1 1 1 1 1 1 1 1 ...
$ BAL : Factor w/ 53 levels "100","101","102",..: 37 37 50 40 47 26 23 21 25 26 ...
$ BOS : Factor w/ 51 levels "101","104","105",..: 35 29 22 48 21 41 40 46 46 47 ...
$ CHC : Factor w/ 47 levels "100","104","107",..: 42 44 21 14 10 19 23 31 44 33 ...
$ CHW : Factor w/ 46 levels "100","49","51",..: 20 24 21 11 32 27 35 27 36 20 ...
$ CIN : Factor w/ 45 levels "100","102","108",..: 10 11 22 36 42 25 37 24 20 18 ...
$ CLE : Factor w/ 44 levels "100","111","51",..: 31 26 30 37 13 25 14 10 26 40 ...
snipped rest of the 33 columns
Try:
data2 <- readHTMLTable(url2, stringsAsFactors = FALSE)
str(data2[[1]])
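Once the columns come back as character rather than factor, the long layout asked for in the question (Year, Team, Wins, Games) can be built in base R. What follows is only a sketch: `wide` is a hand-made stand-in for data2[[1]] holding a few values from the printout in the question, and `team_cols` and `long` are names chosen for illustration. The str() output above suggests the real table also contains blank cells and repeated header rows, which as.integer() would turn into NA with a warning, so those would need to be filtered out first.

```r
# Hand-made stand-in for data2[[1]]; values copied from the question's printout.
# Columns are character, as they would be with stringsAsFactors = FALSE.
wide <- data.frame(
  Year = c("2016", "2015"),
  G    = c("149", "162"),
  ARI  = c("62", "79"),
  ATL  = c("57", "67"),
  stringsAsFactors = FALSE
)

# Every column other than Year and G is assumed to be a team.
team_cols <- setdiff(names(wide), c("Year", "G"))

# Stack the team columns on top of each other, repeating Year and G for each.
long <- data.frame(
  Year  = as.integer(rep(wide$Year, times = length(team_cols))),
  Team  = rep(team_cols, each = nrow(wide)),
  Wins  = as.integer(unlist(wide[team_cols], use.names = FALSE)),
  Games = as.integer(rep(wide$G, times = length(team_cols))),
  stringsAsFactors = FALSE
)

# Newest season first, teams alphabetical within a season.
long <- long[order(-long$Year, long$Team), ]
rownames(long) <- NULL
print(long)
```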

Related

reprex setting output width

How do I set the width of a reprex output?
Say I have a code like this:
(x <- 1:100)
I get this with reprex::reprex(venue = "so")
(x <- 1:100)
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
#> [18] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
#> [35] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
#> [52] 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
#> [69] 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
#> [86] 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
How can I increase the width of the output to output something like this
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
[51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
Possible Solutions
One option that I have found, but find rather "un-tidy", is to include options(width = ...) at the top of the code. But I don't want it to show up in the output; I'd prefer setting the width in the reprex call.
options(width = 205)
(x <- 1:100)
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
#> [51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
reprex() allows for knitr's opts-chunk, but I can't get it working with reprex::reprex(venue = "so", opts_chunk = list(out.width = 205)) (which might be related to #421 as pointed out here (Long lines of text output))
Any better solutions?
reprex has a syntax for setting these options but not including them in the output markdown (see here for examples). In this case:
reprex({
#+ setup, include = FALSE
options(width=205)
#+ actual-reprex-code
(x <- 1:100)
}, venue = 'so')
outputs your desired format:
(x <- 1:100)
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
#> [51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
Created on 2018-09-21 by the reprex package (v0.2.1)
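Independently of reprex, the effect of the width option on printed output can be checked in plain R. This is a minimal sketch with no reprex involved; `narrow_out` and `wide_out` are names made up for the example, and the two widths are simply an ordinary console width and the 205 used in the question.

```r
# Remember the current setting so it can be restored afterwards.
old <- options(width = 80)
narrow_out <- capture.output(print(1:100))

options(width = 205)
wide_out <- capture.output(print(1:100))

options(old)  # put the original width back

# Number of printed lines under each setting.
length(narrow_out)
length(wide_out)
```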

Iterative loop to display outliers using Boxplot (car package) in R

For each column, I want to display all outliers together with the first column (Catkey) and the respective column name.
I am using Boxplot from the "car" package; if there is a more efficient solution with boxplot (lower case), please let me know as well.
AFD2[Boxplot(AFD2$GOL), c("Catkey", "GOL")]
Catkey GOL
58 A2SC043 152
216 KU-1265 153
510 TU-49 199
I wish to write a loop which will display all outliers like above.
Catkey GOL
58 A2SC043 152
216 KU-1265 153
510 TU-49 199
Catkey NOL
25 GF-5466 50
517 yU-1869 452
378 KU-11 765
likewise.....
I have 48 columns in total; the first column is "Catkey" and the rest of the columns are readings: GOL, ABC, EFG, PIL, GHF, etc.
Please help.
Here is what my data frame looks like:
> head(AFD2)
Record Catkey Sex GOL NOL BNL BBH XCB XFB WFB ZYB AUB ASB BPL NPH
2 2 019-CRA M 161 160 95 135 143 116 90 135 128 109 89 72
3 3 021-CRA M 174 169 109 142 139 112 87 141 131 101 95 66
4 4 023-CRA M 171 168 100 140 136 112 89 135 126 110 99 72
5 5 024-CRA F 166 167 94 130 133 100 85 124 121 104 94 63
6 6 025-CRA M 166 168 100 140 148 120 92 139 130 109 93 73
7 7 026-CRA M 165 165 98 135 146 118 89 136 129 108 93 68
NLH JUB NLB MAB MAL MDH OBH OBB DKB NDS WNB SIS ZMB SSS FMB NAS
2 52 117 29 62 48 28 36 40 20 10 10.1 4.7 99 23 95 15
3 54 121 29 61 46 30 38 43 16 7 6.2 3.5 96 19 97 13
4 54 118 26 68 52 28 34 40 18 9 6.9 4.1 100 24 95 12
5 46 108 23 60 51 25 31 37 23 8 9.0 2.5 92 23 91 14
6 53 119 26 69 54 26 36 40 22 12 9.7 4.0 97 20 98 14
7 52 120 30 68 51 30 35 38 22 7 8.8 2.3 98 23 92 14
EKB DKS IML XML MLS WMH GLS STB FRC FRS FRF PAC PAS PAF OCC OCS
2 98 10 37 55 12 25 2 116 108 24 54 98 24 56 92 28
3 98 13 40 59 14 20 4 101 106 25 53 112 27 55 88 21
4 94 8 29 51 13 25 4 113 111 26 56 114 25 62 94 23
5 93 9 33 51 11 20 2 93 107 25 49 106 23 60 97 21
6 100 6 39 56 14 25 5 112 117 20 58 101 25 48 95 20
7 96 9 32 49 9 23 2 111 113 26 55 97 23 48 94 26
OCF
2 58
3 42
4 58
5 39
6 46
7 64
> str(AFD2)
'data.frame': 526 obs. of 48 variables:
$ Record: int 2 3 4 5 6 7 8 9 10 11 ...
$ Catkey: Factor w/ 589 levels "016-CRA","019-CRA",..: 2 3 4 5 6 7 8 9 10 11 ...
$ Sex : Factor w/ 6 levels "","F","M","MALE?",..: 3 3 3 2 3 3 3 5 2 3 ...
$ GOL : int 161 174 171 166 166 165 171 157 166 183 ...
$ NOL : int 160 169 168 167 168 165 169 158 164 179 ...
$ BNL : int 95 109 100 94 100 98 99 85 94 99 ...
$ BBH : int 135 142 140 130 140 135 138 123 125 139 ...
$ XCB : int 143 139 136 133 148 146 134 127 132 141 ...
$ XFB : int 116 112 112 100 120 118 109 105 107 118 ...
$ WFB : int 90 87 89 85 92 89 93 81 85 95 ...
$ ZYB : int 135 141 135 124 139 136 131 104 120 137 ...
$ AUB : int 128 131 126 121 130 129 120 103 116 127 ...
$ ASB : int 109 101 110 104 109 108 105 96 101 107 ...
$ BPL : int 89 95 99 94 93 93 93 75 94 98 ...
$ NPH : int 72 66 72 63 73 68 62 54 64 68 ...
$ NLH : int 52 54 54 46 53 52 51 42 48 49 ...
$ JUB : int 117 121 118 108 119 120 116 91 104 123 ...
$ NLB : int 29 29 26 23 26 30 28 21 24 28 ...
$ MAB : int 62 61 68 60 69 68 66 48 60 69 ...
$ MAL : int 48 46 52 51 54 51 49 37 48 53 ...
$ MDH : int 28 30 28 25 26 30 32 15 25 31 ...
$ OBH : int 36 38 34 31 36 35 35 32 33 32 ...
$ OBB : int 40 43 40 37 40 38 38 34 36 37 ...
$ DKB : int 20 16 18 23 22 22 25 15 19 23 ...
$ NDS : int 10 7 9 8 12 7 9 6 7 10 ...
$ WNB : num 10.1 6.2 6.9 9 9.7 8.8 9.6 5.8 6.9 6.8 ...
$ SIS : num 4.7 3.5 4.1 2.5 4 2.3 3 1.7 1.7 1.9 ...
$ ZMB : int 99 96 100 92 97 98 97 71 92 98 ...
$ SSS : int 23 19 24 23 20 23 23 19 21 23 ...
$ FMB : int 95 97 95 91 98 92 95 79 90 99 ...
$ NAS : int 15 13 12 14 14 14 17 13 11 14 ...
$ EKB : int 98 98 94 93 100 96 98 80 91 98 ...
$ DKS : int 10 13 8 9 6 9 10 11 6 7 ...
$ IML : int 37 40 29 33 39 32 37 30 31 36 ...
$ XML : int 55 59 51 51 56 49 55 48 51 56 ...
$ MLS : int 12 14 13 11 14 9 13 9 14 14 ...
$ WMH : int 25 20 25 20 25 23 22 19 22 27 ...
$ GLS : int 2 4 4 2 5 2 2 1 2 4 ...
$ STB : int 116 101 113 93 112 111 108 105 107 111 ...
$ FRC : int 108 106 111 107 117 113 109 99 100 116 ...
$ FRS : int 24 25 26 25 20 26 28 22 25 28 ...
$ FRF : int 54 53 56 49 58 55 47 46 51 49 ...
$ PAC : int 98 112 114 106 101 97 115 101 105 104 ...
$ PAS : int 24 27 25 23 25 23 26 24 23 21 ...
$ PAF : int 56 55 62 60 48 48 57 52 52 47 ...
$ OCC : int 92 88 94 97 95 94 90 92 96 112 ...
$ OCS : int 28 21 23 21 20 26 27 20 28 40 ...
$ OCF : int 58 42 58 39 46 64 50 49 51 71 ...
- attr(*, "na.action")=Class 'omit' Named int [1:63] 1 12 18 20 23 24 29 33 35 39 ...
.. ..- attr(*, "names")= chr [1:63] "1" "12" "18" "20" ...
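One possible way to loop over the reading columns, sketched with base R's boxplot.stats() rather than car::Boxplot (so the flagged points follow the standard 1.5 x IQR rule on the hinges and may not match car's labelling exactly). The miniature `AFD2` below is an assumption for illustration: its first six rows are taken from head(AFD2) above, and the seventh row ("X-000") is invented. `reading_cols` and `outliers` are illustrative names.

```r
# First six rows copied from head(AFD2) in the question; the seventh row
# ("X-000") is invented so that GOL has an obvious outlier.
AFD2 <- data.frame(
  Catkey = c("019-CRA", "021-CRA", "023-CRA", "024-CRA", "025-CRA", "026-CRA", "X-000"),
  GOL = c(161, 174, 171, 166, 166, 165, 100),
  NOL = c(160, 169, 168, 167, 168, 165, 166),
  stringsAsFactors = FALSE
)

# Every numeric column except the row counter is treated as a reading.
reading_cols <- setdiff(names(AFD2)[sapply(AFD2, is.numeric)], "Record")

# For each reading, keep the rows whose value boxplot.stats() flags as an outlier.
outliers <- lapply(reading_cols, function(col) {
  x <- AFD2[[col]]
  AFD2[x %in% boxplot.stats(x)$out, c("Catkey", col)]
})
names(outliers) <- reading_cols

# Drop readings without outliers, then print one small table per reading.
outliers <- Filter(function(d) nrow(d) > 0, outliers)
print(outliers)
```

On the full data frame the same loop would run over all reading columns; the factor columns Catkey and Sex are skipped by the is.numeric() test.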

Take sample from diminishing population

I would like to take a random sample of rows from a data.frame, apply a function to the subset, then take a sample from the remaining rows, apply the function to the new subset (with different parameters), and so on.
A simple example: if 5% of a population dies each month, then in month 2 I need the population minus those who died in month 1.
I have put together a very verbose method of doing this, in which I save the IDs from the sampled rows, then subset them out of the data for the second period, and so on.
library(data.table)
dt <- data.table(Number=1:100, ID=paste0("A", 1:100))
first<-dt[sample(nrow(dt), nrow(dt)*.05)]$ID
mean(dt[ID %in% first]$Number)
second<-dt[!(ID %in% first)][sample(nrow(dt[!(ID %in% first)]),
nrow(dt[!(ID %in% first)])*.05)]$ID
mean(dt[ID %in% c(first,second)]$Number)
dt[!(ID %in% first)][!(ID %in% second)] #...
Obviously, this is not sustainable past a couple periods. What is the better way to do this? I imagine this is a standard method but couldn't think what to look for specifically. Thanks for any and all input.
This shows how to "grow" a vector of items that have been sampled at a 5% per interval time course:
removed <- numeric(0)
for (i in 1:10) {
  removed <- c(removed,
               sample((1:100)[!(1:100) %in% removed],   # items not yet removed
                      (100 - length(removed)) * .05))   # 5% of remainder (size is truncated)
  cat(c(removed, "\n"))                                 # print to console with each iteration
}
54 1 76 96 93
54 1 76 96 93 81 16 13 79
54 1 76 96 93 81 16 13 79 80 74 30 29
54 1 76 96 93 81 16 13 79 80 74 30 29 52 33 86 19
54 1 76 96 93 81 16 13 79 80 74 30 29 52 33 86 19 34 32 41 62
54 1 76 96 93 81 16 13 79 80 74 30 29 52 33 86 19 34 32 41 62 5 70 8
54 1 76 96 93 81 16 13 79 80 74 30 29 52 33 86 19 34 32 41 62 5 70 8 66 82 50
54 1 76 96 93 81 16 13 79 80 74 30 29 52 33 86 19 34 32 41 62 5 70 8 66 82 50 6 91 99
54 1 76 96 93 81 16 13 79 80 74 30 29 52 33 86 19 34 32 41 62 5 70 8 66 82 50 6 91 99 46 27 51
54 1 76 96 93 81 16 13 79 80 74 30 29 52 33 86 19 34 32 41 62 5 70 8 66 82 50 6 91 99 46 27 51 22 23 20
Notice that the actual number of items added to the list of "removals" will be decreasing.
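The same idea can be applied to the rows of a data frame, applying a function to each period's sample before removing it. This is a sketch under assumptions: it uses a plain data.frame instead of data.table, mean() stands in for whatever per-period function is needed, it reports the mean of each period's sample rather than the cumulative mean from the question, and `pop`, `remaining` and `period_means` are illustrative names.

```r
set.seed(1)  # arbitrary seed, only to make the sketch repeatable

pop <- data.frame(Number = 1:100, ID = paste0("A", 1:100),
                  stringsAsFactors = FALSE)

remaining    <- pop
period_means <- numeric(0)

for (period in 1:10) {
  n_take <- floor(nrow(remaining) * 0.05)          # 5% of whoever is left
  taken  <- remaining[sample(nrow(remaining), n_take), ]
  period_means[period] <- mean(taken$Number)       # the per-period function
  # Drop by ID rather than by negative index: remaining[-integer(0), ]
  # would return zero rows if n_take ever reached 0.
  remaining <- remaining[!(remaining$ID %in% taken$ID), ]
}

nrow(remaining)   # rows still in the population after 10 periods
period_means
```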

Generate sequence with alternating increments in R? [duplicate]

I want to use R to create the sequence of numbers 1:8, 11:18, 21:28, etc. through 1000 (or the closest it can get, i.e. 998). Obviously typing that all out would be tedious, but since the sequence increases by one 7 times and then jumps by 3 I'm not sure what function I could use to achieve this.
I tried seq(1, 998, c(1,1,1,1,1,1,1,3)) but it does not give me the results I am looking for so I must be doing something wrong.
This is a perfect case for vectorisation (and recycling) in R; both are worth reading about.
(1:100)[rep(c(TRUE,FALSE), c(8,2))]
# [1] 1 2 3 4 5 6 7 8 11 12 13 14 15 16 17 18 21 22 23 24 25 26 27 28 31 32
#[27] 33 34 35 36 37 38 41 42 43 44 45 46 47 48 51 52 53 54 55 56 57 58 61 62 63 64
#[53] 65 66 67 68 71 72 73 74 75 76 77 78 81 82 83 84 85 86 87 88 91 92 93 94 95 96
#[79] 97 98
rep(seq(0,990,by=10), each=8) + seq(1,8)
You want to exclude numbers that are 0 or 9 (mod 10). So you can try this too:
n <- 1000 # upper bound
x <- 1:n
x <- x[! (x %% 10) %in% c(0,9)] # filter out (0, 9) mod (10)
head(x,80)
# [1] 1 2 3 4 5 6 7 8 11 12 13 14 15 16 17 18 21 22 23 24 25 26 27
# 28 31 32 33 34 35 36 37 38 41 42 43 44 45 46 47 48 51 52 53 54 55 56 57
# 58 61 62 63 64 65 66 67 68 71 72 73 74 75 76 77 78 81 82 83 84 85
# 86 87 88 91 92 93 94 95 96 97 98
Or in a single line using Filter:
Filter(function(x) !((x %% 10) %in% c(0,9)), 1:100)
# [1] 1 2 3 4 5 6 7 8 11 12 13 14 15 16 17 18 21 22 23 24 25 26 27 28 31 32 33 34 35 36 37 38 41 42 43 44 45 46 47 48 51 52 53 54 55 56 57
# [48] 58 61 62 63 64 65 66 67 68 71 72 73 74 75 76 77 78 81 82 83 84 85 86 87 88 91 92 93 94 95 96 97 98
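With a loop:
vector <- integer(0) # must be initialised first; otherwise `vector` refers to the base R function
for (value in seq(1, 991, 10)) vector <- c(vector, seq(value, value + 7))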
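As a quick sanity check, the vectorised approaches above can be run over the full range and compared with each other. This is only a sketch; `by_mask`, `by_offset` and `by_filter` are names invented here for the three variants.

```r
# Logical mask of length 10 (8 TRUE, 2 FALSE), recycled over 1:1000.
by_mask <- (1:1000)[rep(c(TRUE, FALSE), c(8, 2))]

# Block starts 0, 10, ..., 990, each repeated 8 times, plus the recycled offsets 1:8.
by_offset <- rep(seq(0, 990, by = 10), each = 8) + seq(1, 8)

# Drop everything that is 0 or 9 modulo 10.
x <- 1:1000
by_filter <- x[!(x %% 10) %in% c(0, 9)]

length(by_mask)              # number of elements in the sequence
range(by_mask)               # first and last value
all(by_mask == by_offset)    # compared by value: by_offset is double, by_mask integer
identical(by_mask, by_filter)
```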

Sequence with different intervals in R: matching sensor data

I need a vector that repeats numbers in a sequence at varying intervals. I basically need this
c(rep(1:42, each=6), rep(43:64, each = 7),
rep(65:106, each=6), rep(107:128, each = 7),
... but I need this to keep going, until almost 2 million.
So I want a vector that looks like
[1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 ...
.....
[252] 43 43 43 43 43 43 43 44 44 44 44 44 44 44
....
[400] 64 64 64 64 64 64 65 65 65 65 65 65...
and so on. Not just alternating between 6 and 7 repetitions, rather mostly 6s and fewer 7s until the whole vector is something like 1.7 million rows. So, is there a loop I can do? Or apply, replicate? I need the 400th entry in the vector to be 64, the 800th entry to be 128, and so on, in somewhat evenly spaced integers.
UPDATE
Thank you all for the quick clever tricks there. It worked, at least well enough for the deadline I was dealing with. I realize repeating 6 xs and 7 xs are a really dumb way to try to solve this, but it was quick at least. But now that I have some time, I would like to get everyone's opinions /ideas on my real underlying issue here.
I have two datasets to merge. They are both sensor datasets, both with stopwatch time as primary keys. But one records every 1/400 of a second, and the other records every 1/256 of a second. I have trimmed the top of each so that they are starting the exact same moment. But.. now what? I have 400 records for each second in one set, and 256 records for 1 second in the other. Is there a way to merge these without losing data? Interpolating or just repeating obs is a-ok, necessary, I think, but I'd rather not throw any data out.
I read this post here, that had to do with using xts and zoo for a very similar problem to mine. But they have nice epoch date/times for each. I just have these awful fractions of seconds!
sample data (A):
time dist a_lat
1 139.4300 22 0
2 139.4325 22 0
3 139.4350 22 0
4 139.4375 22 0
5 139.4400 22 0
6 139.4425 22 0
7 139.4450 22 0
8 139.4475 22 0
9 139.4500 22 0
10 139.4525 22 0
sample data (B):
timestamp hex_acc_x hex_acc_y hex_acc_z
1 367065215501 -0.5546875 -0.7539062 0.1406250
2 367065215505 -0.5468750 -0.7070312 0.2109375
3 367065215509 -0.4218750 -0.6835938 0.1796875
4 367065215513 -0.5937500 -0.7421875 0.1562500
5 367065215517 -0.6757812 -0.7773438 0.2031250
6 367065215521 -0.5937500 -0.8554688 0.2460938
7 367065215525 -0.6132812 -0.8476562 0.2109375
8 367065215529 -0.3945312 -0.8906250 0.2031250
9 367065215533 -0.3203125 -0.8906250 0.2226562
10 367065215537 -0.3867188 -0.9531250 0.2578125
(Oh yeah, and by the way, the B dataset timestamps are in epoch format * 256, because life is hard. I haven't converted them for this because dataset A has nothing like that, only 0.0025 intervals. Also, the B sensor was left on for hours after the A sensor was turned off, so that doesn't help.)
Or if you like, you can try this using apply
# using this sample data
df <- data.frame(from=c(1,4,7,11), to = c(3,6,10,13),rep=c(6,7,6,7));
> df
# from to rep
#1 1 3 6
#2 4 6 7
#3 7 10 6
#4 11 13 7
unlist(apply(df, 1, function(x) rep(x['from']:x['to'], each=x['rep'])))
# [1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4
#[26] 5 5 5 5 5 5 5 6 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8
#[51] 8 9 9 9 9 9 9 10 10 10 10 10 10 11 11 11 11 11 11 11 12 12 12 12 12
#[76] 12 12 13 13 13 13 13 13 13
Now that you put it that way ... I have absolutely no idea how you are planning on using all of the 6s and 7s. :-)
Regardless, I recommend standardizing the time, adding a "sample" column, and merging on them. Having the "sample" column may facilitate your processing later on, perhaps.
Your data:
df400 <- structure(list(time = c(139.43, 139.4325, 139.435, 139.4375, 139.44, 139.4425,
139.445, 139.4475, 139.45, 139.4525),
dist = c(22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L),
a_lat = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)),
.Names = c("time", "dist", "a_lat"),
class = "data.frame", row.names = c(NA, -10L))
df256 <- structure(list(timestamp = c(367065215501, 367065215505, 367065215509, 367065215513,
367065215517, 367065215521, 367065215525, 367065215529,
367065215533, 367065215537),
hex_acc_x = c(-0.5546875, -0.546875, -0.421875, -0.59375, -0.6757812,
-0.59375, -0.6132812, -0.3945312, -0.3203125, -0.3867188),
hex_acc_y = c(-0.7539062, -0.7070312, -0.6835938, -0.7421875,
-0.7773438, -0.8554688, -0.8476562, -0.890625,
-0.890625, -0.953125),
hex_acc_z = c(0.140625, 0.2109375, 0.1796875, 0.15625, 0.203125,
0.2460938, 0.2109375, 0.203125, 0.2226562, 0.2578125)),
.Names = c("timestamp", "hex_acc_x", "hex_acc_y", "hex_acc_z"),
class = "data.frame", row.names = c(NA, -10L))
Standardize your time frames:
colnames(df256)[1] <- 'time'
df400$time <- df400$time - df400$time[1]
df256$time <- (df256$time - df256$time[1]) / 256
Assign a label for easy reference (not that the NAs won't be clear enough):
df400 <- cbind(sample='A', df400, stringsAsFactors=FALSE)
df256 <- cbind(sample='B', df256, stringsAsFactors=FALSE)
And now for the merge and sorting:
dat <- merge(df400, df256, by=c('sample', 'time'), all.x=TRUE, all.y=TRUE)
dat <- dat[order(dat$time),]
dat
## sample time dist a_lat hex_acc_x hex_acc_y hex_acc_z
## 1 A 0.000000 22 0 NA NA NA
## 11 B 0.000000 NA NA -0.5546875 -0.7539062 0.1406250
## 2 A 0.002500 22 0 NA NA NA
## 3 A 0.005000 22 0 NA NA NA
## 4 A 0.007500 22 0 NA NA NA
## 5 A 0.010000 22 0 NA NA NA
## 6 A 0.012500 22 0 NA NA NA
## 7 A 0.015000 22 0 NA NA NA
## 12 B 0.015625 NA NA -0.5468750 -0.7070312 0.2109375
## 8 A 0.017500 22 0 NA NA NA
## 9 A 0.020000 22 0 NA NA NA
## 10 A 0.022500 22 0 NA NA NA
## 13 B 0.031250 NA NA -0.4218750 -0.6835938 0.1796875
## 14 B 0.046875 NA NA -0.5937500 -0.7421875 0.1562500
## 15 B 0.062500 NA NA -0.6757812 -0.7773438 0.2031250
## 16 B 0.078125 NA NA -0.5937500 -0.8554688 0.2460938
## 17 B 0.093750 NA NA -0.6132812 -0.8476562 0.2109375
## 18 B 0.109375 NA NA -0.3945312 -0.8906250 0.2031250
## 19 B 0.125000 NA NA -0.3203125 -0.8906250 0.2226562
## 20 B 0.140625 NA NA -0.3867188 -0.9531250 0.2578125
I'm guessing your data was just a small representation. If I've guessed poorly (that A's times are in seconds and B's integers are 1/256ths of a second), then just scale differently. Either way, once the first value of each is reset to zero, the two sets are easy to merge and sort.
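Since the question says interpolating is acceptable, one possible follow-up to the merge is to interpolate the B readings onto the A time grid with stats::approx(), so that every A row gets a value instead of NA. This is a hedged sketch, not part of the answer above: it reuses the sample values and the zero-based time scaling from that answer, only handles hex_acc_x, and `t_a`, `t_b` and `acc_x_on_a` are illustrative names.

```r
# Zero-based times, matching the standardised time columns above.
t_a <- (0:9) * 0.0025   # sample A: one reading every 0.0025 s
t_b <- (0:9) * 4 / 256  # sample B: timestamp steps of 4, divided by 256

# hex_acc_x values from sample B.
acc_x <- c(-0.5546875, -0.546875, -0.421875, -0.59375, -0.6757812,
           -0.59375, -0.6132812, -0.3945312, -0.3203125, -0.3867188)

# Linear interpolation of B's signal at A's time points.
acc_x_on_a <- approx(x = t_b, y = acc_x, xout = t_a)$y

data.frame(time = t_a, hex_acc_x = acc_x_on_a)
```

approx() returns NA for time points outside the range of `t_b` unless its `rule` argument is changed, which matters on the real data because the two sensors did not stop at the same time.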
alt <- data.frame(len=c(42,22),rep=c(6,7));
alt;
## len rep
## 1 42 6
## 2 22 7
altrep <- function(alt,cyc,len) {
cyclen <- sum(alt$len*alt$rep);
if (missing(cyc)) {
if (missing(len)) {
cyc <- 1;
len <- cyc*cyclen;
} else {
cyc <- ceiling(len/cyclen);
};
} else if (missing(len)) {
len <- cyc*cyclen;
};
if (isTRUE(all.equal(len,0))) return(integer());
result <- rep(1:(cyc*sum(alt$len)),rep(rep(alt$rep,alt$len),cyc));
length(result) <- len;
result;
};
altrep(alt,2);
## [1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9
## [52] 9 9 9 10 10 10 10 10 10 11 11 11 11 11 11 12 12 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 14 15 15 15 15 15 15 16 16 16 16 16 16 17 17 17 17 17 17
## [103] 18 18 18 18 18 18 19 19 19 19 19 19 20 20 20 20 20 20 21 21 21 21 21 21 22 22 22 22 22 22 23 23 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25 25 26 26 26
## [154] 26 26 26 27 27 27 27 27 27 28 28 28 28 28 28 29 29 29 29 29 29 30 30 30 30 30 30 31 31 31 31 31 31 32 32 32 32 32 32 33 33 33 33 33 33 34 34 34 34 34 34
## [205] 35 35 35 35 35 35 36 36 36 36 36 36 37 37 37 37 37 37 38 38 38 38 38 38 39 39 39 39 39 39 40 40 40 40 40 40 41 41 41 41 41 41 42 42 42 42 42 42 43 43 43
## [256] 43 43 43 43 44 44 44 44 44 44 44 45 45 45 45 45 45 45 46 46 46 46 46 46 46 47 47 47 47 47 47 47 48 48 48 48 48 48 48 49 49 49 49 49 49 49 50 50 50 50 50
## [307] 50 50 51 51 51 51 51 51 51 52 52 52 52 52 52 52 53 53 53 53 53 53 53 54 54 54 54 54 54 54 55 55 55 55 55 55 55 56 56 56 56 56 56 56 57 57 57 57 57 57 57
## [358] 58 58 58 58 58 58 58 59 59 59 59 59 59 59 60 60 60 60 60 60 60 61 61 61 61 61 61 61 62 62 62 62 62 62 62 63 63 63 63 63 63 63 64 64 64 64 64 64 64 65 65
## [409] 65 65 65 65 66 66 66 66 66 66 67 67 67 67 67 67 68 68 68 68 68 68 69 69 69 69 69 69 70 70 70 70 70 70 71 71 71 71 71 71 72 72 72 72 72 72 73 73 73 73 73
## [460] 73 74 74 74 74 74 74 75 75 75 75 75 75 76 76 76 76 76 76 77 77 77 77 77 77 78 78 78 78 78 78 79 79 79 79 79 79 80 80 80 80 80 80 81 81 81 81 81 81 82 82
## [511] 82 82 82 82 83 83 83 83 83 83 84 84 84 84 84 84 85 85 85 85 85 85 86 86 86 86 86 86 87 87 87 87 87 87 88 88 88 88 88 88 89 89 89 89 89 89 90 90 90 90 90
## [562] 90 91 91 91 91 91 91 92 92 92 92 92 92 93 93 93 93 93 93 94 94 94 94 94 94 95 95 95 95 95 95 96 96 96 96 96 96 97 97 97 97 97 97 98 98 98 98 98 98 99 99
## [613] 99 99 99 99 100 100 100 100 100 100 101 101 101 101 101 101 102 102 102 102 102 102 103 103 103 103 103 103 104 104 104 104 104 104 105 105 105 105 105 105 106 106 106 106 106 106 107 107 107 107 107
## [664] 107 107 108 108 108 108 108 108 108 109 109 109 109 109 109 109 110 110 110 110 110 110 110 111 111 111 111 111 111 111 112 112 112 112 112 112 112 113 113 113 113 113 113 113 114 114 114 114 114 114 114
## [715] 115 115 115 115 115 115 115 116 116 116 116 116 116 116 117 117 117 117 117 117 117 118 118 118 118 118 118 118 119 119 119 119 119 119 119 120 120 120 120 120 120 120 121 121 121 121 121 121 121 122 122
## [766] 122 122 122 122 122 123 123 123 123 123 123 123 124 124 124 124 124 124 124 125 125 125 125 125 125 125 126 126 126 126 126 126 126 127 127 127 127 127 127 127 128 128 128 128 128 128 128
altrep(alt,len=1000);
## [1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9
## [52] 9 9 9 10 10 10 10 10 10 11 11 11 11 11 11 12 12 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 14 15 15 15 15 15 15 16 16 16 16 16 16 17 17 17 17 17 17
## [103] 18 18 18 18 18 18 19 19 19 19 19 19 20 20 20 20 20 20 21 21 21 21 21 21 22 22 22 22 22 22 23 23 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25 25 26 26 26
## [154] 26 26 26 27 27 27 27 27 27 28 28 28 28 28 28 29 29 29 29 29 29 30 30 30 30 30 30 31 31 31 31 31 31 32 32 32 32 32 32 33 33 33 33 33 33 34 34 34 34 34 34
## [205] 35 35 35 35 35 35 36 36 36 36 36 36 37 37 37 37 37 37 38 38 38 38 38 38 39 39 39 39 39 39 40 40 40 40 40 40 41 41 41 41 41 41 42 42 42 42 42 42 43 43 43
## [256] 43 43 43 43 44 44 44 44 44 44 44 45 45 45 45 45 45 45 46 46 46 46 46 46 46 47 47 47 47 47 47 47 48 48 48 48 48 48 48 49 49 49 49 49 49 49 50 50 50 50 50
## [307] 50 50 51 51 51 51 51 51 51 52 52 52 52 52 52 52 53 53 53 53 53 53 53 54 54 54 54 54 54 54 55 55 55 55 55 55 55 56 56 56 56 56 56 56 57 57 57 57 57 57 57
## [358] 58 58 58 58 58 58 58 59 59 59 59 59 59 59 60 60 60 60 60 60 60 61 61 61 61 61 61 61 62 62 62 62 62 62 62 63 63 63 63 63 63 63 64 64 64 64 64 64 64 65 65
## [409] 65 65 65 65 66 66 66 66 66 66 67 67 67 67 67 67 68 68 68 68 68 68 69 69 69 69 69 69 70 70 70 70 70 70 71 71 71 71 71 71 72 72 72 72 72 72 73 73 73 73 73
## [460] 73 74 74 74 74 74 74 75 75 75 75 75 75 76 76 76 76 76 76 77 77 77 77 77 77 78 78 78 78 78 78 79 79 79 79 79 79 80 80 80 80 80 80 81 81 81 81 81 81 82 82
## [511] 82 82 82 82 83 83 83 83 83 83 84 84 84 84 84 84 85 85 85 85 85 85 86 86 86 86 86 86 87 87 87 87 87 87 88 88 88 88 88 88 89 89 89 89 89 89 90 90 90 90 90
## [562] 90 91 91 91 91 91 91 92 92 92 92 92 92 93 93 93 93 93 93 94 94 94 94 94 94 95 95 95 95 95 95 96 96 96 96 96 96 97 97 97 97 97 97 98 98 98 98 98 98 99 99
## [613] 99 99 99 99 100 100 100 100 100 100 101 101 101 101 101 101 102 102 102 102 102 102 103 103 103 103 103 103 104 104 104 104 104 104 105 105 105 105 105 105 106 106 106 106 106 106 107 107 107 107 107
## [664] 107 107 108 108 108 108 108 108 108 109 109 109 109 109 109 109 110 110 110 110 110 110 110 111 111 111 111 111 111 111 112 112 112 112 112 112 112 113 113 113 113 113 113 113 114 114 114 114 114 114 114
## [715] 115 115 115 115 115 115 115 116 116 116 116 116 116 116 117 117 117 117 117 117 117 118 118 118 118 118 118 118 119 119 119 119 119 119 119 120 120 120 120 120 120 120 121 121 121 121 121 121 121 122 122
## [766] 122 122 122 122 122 123 123 123 123 123 123 123 124 124 124 124 124 124 124 125 125 125 125 125 125 125 126 126 126 126 126 126 126 127 127 127 127 127 127 127 128 128 128 128 128 128 128 129 129 129 129
## [817] 129 129 130 130 130 130 130 130 131 131 131 131 131 131 132 132 132 132 132 132 133 133 133 133 133 133 134 134 134 134 134 134 135 135 135 135 135 135 136 136 136 136 136 136 137 137 137 137 137 137 138
## [868] 138 138 138 138 138 139 139 139 139 139 139 140 140 140 140 140 140 141 141 141 141 141 141 142 142 142 142 142 142 143 143 143 143 143 143 144 144 144 144 144 144 145 145 145 145 145 145 146 146 146 146
## [919] 146 146 147 147 147 147 147 147 148 148 148 148 148 148 149 149 149 149 149 149 150 150 150 150 150 150 151 151 151 151 151 151 152 152 152 152 152 152 153 153 153 153 153 153 154 154 154 154 154 154 155
## [970] 155 155 155 155 155 156 156 156 156 156 156 157 157 157 157 157 157 158 158 158 158 158 158 159 159 159 159 159 159 160 160
You can specify len=1.7e6 (and omit the cyc argument) to get exactly 1.7 million elements, or you can get a whole number of cycles using cyc.
How about
len <- 2e6
step <- 400
x <- rep(64 * seq(0, ceiling(len / step) - 1), each = step) +
sort(rep(1:64, length.out = step))
x <- x[seq(len)] # to get rid of extra elements
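A different technique from the answers above, offered only as a sketch: because the requirement is that entry 400 is 64, entry 800 is 128, and so on, each position can be mapped directly onto the slower counter by scaling and rounding up. The 64/400 ratio comes from the question; the choice of ceiling() and the name `x` are assumptions of this example. The individual values may be repeated 6 or 7 times in a different order than in the 42/22 pattern from the question.

```r
len <- 2e6

# Position i in the fast series maps to ceiling(i * 64 / 400) in the slow one.
x <- ceiling(seq_len(len) * 64 / 400)

x[c(400, 800)]       # the two checkpoints named in the question
range(tabulate(x))   # fewest and most repetitions of any value
```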
