I'm working with some data on repeated measures of subjects over time. The data is in this format:
Subject <- as.factor(c(rep("A", 20), rep("B", 35), rep("C", 13)))
variable.A <- rnorm(n = length(Subject), mean = 300, sd = 50)
dat <- data.frame(Subject, variable.A)
dat
Subject variable.A
1 A 334.6567
2 A 353.0988
3 A 244.0863
4 A 284.8918
5 A 302.6442
6 A 298.3162
7 A 271.4864
8 A 268.6848
9 A 262.3761
10 A 341.4224
11 A 190.4823
12 A 297.1981
13 A 319.8346
14 A 343.9855
15 A 332.5318
16 A 221.9502
17 A 412.9172
18 A 283.4206
19 A 310.9847
20 A 276.5423
21 B 181.5418
22 B 340.5812
23 B 348.5162
24 B 364.6962
25 B 312.2508
26 B 278.9855
27 B 242.8810
28 B 272.9585
29 B 239.2776
30 B 254.9140
31 B 253.8940
32 B 330.1918
33 B 300.7302
34 B 237.6511
35 B 314.4919
36 B 239.6195
37 B 282.7955
38 B 260.0943
39 B 396.5310
40 B 325.5422
41 B 374.8063
42 B 363.1897
43 B 258.0310
44 B 358.8605
45 B 251.8775
46 B 299.6995
47 B 303.4766
48 B 359.8955
49 B 299.7089
50 B 289.3128
51 B 401.7680
52 B 276.8078
53 B 441.4852
54 B 232.6222
55 B 305.1977
56 C 298.4580
57 C 210.5164
58 C 272.0228
59 C 282.0540
60 C 207.8797
61 C 263.3859
62 C 324.4417
63 C 273.5904
64 C 348.4389
65 C 174.2979
66 C 363.4353
67 C 260.8548
68 C 306.1833
I've used the seq_along() function and the dplyr package to create an index of each observation for every subject:
dat <- as.data.frame(dat %>%
  group_by(Subject) %>%
  mutate(index = seq_along(Subject)))
Subject variable.A index
1 A 334.6567 1
2 A 353.0988 2
3 A 244.0863 3
4 A 284.8918 4
5 A 302.6442 5
6 A 298.3162 6
7 A 271.4864 7
8 A 268.6848 8
9 A 262.3761 9
10 A 341.4224 10
11 A 190.4823 11
12 A 297.1981 12
13 A 319.8346 13
14 A 343.9855 14
15 A 332.5318 15
16 A 221.9502 16
17 A 412.9172 17
18 A 283.4206 18
19 A 310.9847 19
20 A 276.5423 20
21 B 181.5418 1
22 B 340.5812 2
23 B 348.5162 3
24 B 364.6962 4
25 B 312.2508 5
26 B 278.9855 6
27 B 242.8810 7
28 B 272.9585 8
29 B 239.2776 9
30 B 254.9140 10
31 B 253.8940 11
32 B 330.1918 12
33 B 300.7302 13
34 B 237.6511 14
35 B 314.4919 15
36 B 239.6195 16
37 B 282.7955 17
38 B 260.0943 18
39 B 396.5310 19
40 B 325.5422 20
41 B 374.8063 21
42 B 363.1897 22
43 B 258.0310 23
44 B 358.8605 24
45 B 251.8775 25
46 B 299.6995 26
47 B 303.4766 27
48 B 359.8955 28
49 B 299.7089 29
50 B 289.3128 30
51 B 401.7680 31
52 B 276.8078 32
53 B 441.4852 33
54 B 232.6222 34
55 B 305.1977 35
56 C 298.4580 1
57 C 210.5164 2
58 C 272.0228 3
59 C 282.0540 4
60 C 207.8797 5
61 C 263.3859 6
62 C 324.4417 7
63 C 273.5904 8
64 C 348.4389 9
65 C 174.2979 10
66 C 363.4353 11
67 C 260.8548 12
68 C 306.1833 13
What I'm now looking to do is set up an analysis that looks at every 10 observations, so I'd like to create another column that numbers each block of 10 observations within a subject. For example, Subject A would have a sequence of ten 1's followed by a sequence of ten 2's (i.e., two groups of 10). I've tried to use the rep() function, but the issue I'm running into is that the other subjects don't have a number of observations that is divisible by 10.
Is there a way for rep() to simply assign the next number to the last group, even if it doesn't have 10 observations in total? For example, Subject B (35 observations) would have ten 1's, ten 2's, ten 3's and then five 4's (the five representing the last, incomplete group of observations)?
You can use integer division %/% to generate the ids:
dat %>%
  group_by(Subject) %>%
  mutate(chunk_id = (seq_along(Subject) - 1) %/% 10 + 1) -> dat1
table(dat1$Subject, dat1$chunk_id)
# 1 2 3 4
# A 10 10 0 0
# B 10 10 10 5
# C 10 3 0 0
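Equivalently, since the per-subject index starts at 1, ceiling() gives the same chunk ids without the subtract-one-and-add-one arithmetic:
dat %>%
  group_by(Subject) %>%
  mutate(chunk_id = ceiling(seq_along(Subject) / 10)) -> dat1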
For a plain vanilla base R solution, you could also try this:
dat$newcol <- 1
dat$index <- ave(dat$newcol, dat$Subject, FUN = cumsum)
dat$chunk_id <- (dat$index - 1) %/% 10 + 1
which, when you run the table command as above, gives you
table(dat$Subject, dat$chunk_id)
1 2 3 4
A 10 10 0 0
B 10 10 10 5
C 10 3 0 0
If you don't want the extra 'newcol' column, just use 'NULL' to get rid of it:
dat$newcol <- NULL
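And to answer the rep() part of the question directly: rep()'s length.out argument truncates the final group for you, so applied per subject (here via ave(), a base R sketch) it produces the same ids:
dat$chunk_id <- ave(seq_along(dat$Subject), dat$Subject,
                    FUN = function(i) rep(seq_len(ceiling(length(i) / 10)),
                                          each = 10, length.out = length(i)))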
Say I have a data frame with 3 columns of data (a,b,c) and 1 column of categories with multiple instances of each category (class).
set.seed(273)
a <- floor(runif(20,0,100))
b <- floor(runif(20,0,100))
c <- floor(runif(20,0,100))
class <- floor(runif(20,0,6))
df1 <- data.frame(a,b,c,class)
print(df1)
a b c class
1 31 73 28 3
2 44 33 57 3
3 19 35 53 0
4 68 70 39 4
5 92 7 57 2
6 13 67 23 3
7 73 50 14 2
8 59 14 91 5
9 37 3 72 5
10 27 3 13 4
11 63 28 0 5
12 51 7 35 4
13 11 36 76 3
14 72 25 8 5
15 23 24 6 3
16 15 1 16 5
17 55 24 5 5
18 2 54 39 1
19 54 95 20 3
20 60 39 65 1
And I have another data frame with the same 3 columns of data and category column, however this only has one instance per category (class).
a <- floor(runif(6,0,20))
b <- floor(runif(6,0,20))
c <- floor(runif(6,0,20))
class <- seq(0,5)
df2 <- data.frame(a,b,c,class)
print(df2)
a b c class
1 8 15 13 0
2 0 3 6 1
3 14 4 0 2
4 7 10 6 3
5 18 18 16 4
6 17 17 11 5
How do I subset the first data frame so that it keeps only the rows where a, b, and c are all greater than the corresponding values in the second data frame for that class? For example, I only want rows where class == 0 if a > 8 & b > 15 & c > 13.
Note that I don't want to join the data frames, as the second data frame simply holds the lowest acceptable values for the first data frame.
As commented by Frank, this can be done with a non-equi join.
# coerce to data.table
tmp <- setDT(df1)[
  # non-equi join to find which rows of df1 fulfill the conditions in df2
  setDT(df2), on = .(class, a > a, b > b, c > c), nomatch = 0L, which = TRUE]
# return subset in original order of df1
df1[sort(tmp)]
a b c class
1: 31 73 28 3
2: 44 33 57 3
3: 19 35 53 0
4: 68 70 39 4
5: 92 7 57 2
6: 13 67 23 3
7: 73 50 14 2
8: 11 36 76 3
9: 2 54 39 1
10: 54 95 20 3
11: 60 39 65 1
The parameter which = TRUE returns a vector of the matching row numbers instead of the joined data set. This saves us from creating a row id column before the join. (Credit to @Frank for reminding me of the which parameter!)
Note that there is no row in df1 which fulfills the condition for class == 5 in df2. Therefore, the parameter nomatch = 0L is used to exclude non-matching rows from the result.
This can be put together in a "one-liner":
setDT(df1)[sort(df1[setDT(df2), on = .(class, a > a, b > b, c > c), nomatch = 0L, which = TRUE])]
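If you would rather stay in base R, a simple (if slower, because it loops over the rows of df1) sketch of the same idea is to look up each row's thresholds by class and compare; it works on the plain data frames from the question:
# keep rows of df1 whose a, b and c all exceed the thresholds for their class in df2
keep <- vapply(seq_len(nrow(df1)), function(i) {
  thr <- df2[df2$class == df1$class[i], ]
  df1$a[i] > thr$a && df1$b[i] > thr$b && df1$c[i] > thr$c
}, logical(1))
df1[keep, ]
For large data the non-equi join above will be considerably faster.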
I am trying to set up a linear programming solution using lpSolveAPI and R to solve a scheduling problem. Below is a small sample of the data; the minutes required for each session id, and their 'preferred' order/weight.
id <- 1:100
min <- sample(0:500, 100)
weight <- (1:100)/sum(1:100)
data <- data.frame(id, min, weight)
What I want to do is arrange/schedule these session IDs so that there is a maximum number of sessions in a day, preferably ordered by their weight, with each day capped at a total of 400 minutes.
This is how I have set it up currently in R:
require(lpSolveAPI)
#Set up matrix to hold results; each row represents day
r <- 5
c <- 10
row <- 1
results <- matrix(0, nrow = r, ncol = c)
rownames(results) <- format(seq(Sys.Date(), by = "days", length.out = r), "%Y-%m-%d")
for (i in 1:r){
  for(j in 1:c){
    lp <- make.lp(0, nrow(data))
    set.type(lp, 1:nrow(data), "binary")
    set.objfn(lp, rep(1, nrow(data)))
    lp.control(lp, sense = "max")
    add.constraint(lp, data$min, "<=", 400)
    set.branch.weights(lp, data$weight)
    solve(lp)
    a <- get.variables(lp)*data$id
    b <- a[a!=0]
    tryCatch(results[row, 1:length(b)] <- b, error = function(x) 0)
    if(dim(data[!data$id == a,])[1] > 0) {
      data <- data[!data$id== a,]
      row <- row + 1
    }
    break
  }
}
sum(results > 0)
barplot(results) #View of scheduled IDs
A quick look at the results matrix tells me that while the setup works to maximise the number of sessions so that the total minutes in a day are as close to 400 as possible, it doesn't follow the weights given. I expect my results matrix to be filled with increasing session IDs.
I have tried assigning different weights, weights in reverse order etc. but for some reason my setup doesn't seem to enforce "set.branch.weights".
I have read the documentation for "set.branch.weights" from lpSolveAPI but I think I am doing something wrong here.
Example - Data:
id min weight
1 67 1
2 72 2
3 36 3
4 91 4
5 80 5
6 44 6
7 76 7
8 58 8
9 84 9
10 96 10
11 21 11
12 1 12
13 41 13
14 66 14
15 89 15
16 62 16
17 11 17
18 42 18
19 68 19
20 25 20
21 44 21
22 90 22
23 4 23
24 33 24
25 31 25
The schedule should be something like:
Day 1 67 72 36 91 80 44 76
Day 2 58 84 96 21 1 41 66 89
Day 3 62 11 42 68 25 44 90 4 33 31
Each day has a cumulative sum of <= 480 minutes.
My simple-minded approach:
df = read.table(header=T,text="
id min weight
1 67 1
2 72 2
3 36 3
4 91 4
5 80 5
6 44 6
7 76 7
8 58 8
9 84 9
10 96 10
11 21 11
12 1 12
13 41 13
14 66 14
15 89 15
16 62 16
17 11 17
18 42 18
19 68 19
20 25 20
21 44 21
22 90 22
23 4 23
24 33 24
25 31 25")
# assume rows are already sorted by weight
daynr  <- 1      # current day
daymax <- 480    # minutes available per day
dayusd <- 0      # minutes already used on the current day
df$day <- NA_integer_
for (i in 1:nrow(df))
{
  v <- df$min[i]
  dayusd <- dayusd + v
  if (dayusd > daymax)
  {
    # this session no longer fits: start a new day with it
    daynr  <- daynr + 1
    dayusd <- v
  }
  df$day[i] <- daynr
}
This will give:
> df
id min weight day
1 1 67 1 1
2 2 72 2 1
3 3 36 3 1
4 4 91 4 1
5 5 80 5 1
6 6 44 6 1
7 7 76 7 1
8 8 58 8 2
9 9 84 9 2
10 10 96 10 2
11 11 21 11 2
12 12 1 12 2
13 13 41 13 2
14 14 66 14 2
15 15 89 15 2
16 16 62 16 3
17 17 11 17 3
18 18 42 18 3
19 19 68 19 3
20 20 25 20 3
21 21 44 21 3
22 22 90 22 3
23 23 4 23 3
24 24 33 24 3
25 25 31 25 3
>
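A quick sanity check that no day exceeds the cap, using the df produced above:
tapply(df$min, df$day, sum)
#   1   2   3
# 466 456 410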
I will concentrate on the first solve. We basically solve a knapsack problem (objective + one constraint): maximise sum(x[i]) subject to sum(min[i] * x[i]) <= 400 with x[i] binary, which is what the set.objfn()/add.constraint() calls above set up.
When I run this model as is I get:
> solve(lp)
[1] 0
> x <- get.variables(lp)
> weightx <- data$weight * x
> sum(x)
[1] 14
> sum(weightx)
[1] 0.5952381
Now when I change the objective (one formulation that reproduces this behaviour is sketched after the output below), I get:
> solve(lp)
[1] 0
> x <- get.variables(lp)
> weightx <- data$weight * x
> sum(x)
[1] 14
> sum(weightx)
[1] 0.7428571
I.e. the count stayed at 14, but the weight improved.
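The modified objective is not reproduced here, but one formulation consistent with these numbers (an assumption on my part, not necessarily the original model) adds the weights to the unit coefficients; because the weights sum to 1, the session count can never be sacrificed for weight:
# sketch only: maximise (number of sessions) + (total weight of the selected sessions),
# assuming `data` (id, min, weight) as defined in the question
library(lpSolveAPI)
lp <- make.lp(0, nrow(data))
set.type(lp, 1:nrow(data), "binary")
set.objfn(lp, 1 + data$weight)            # hypothetical combined objective
lp.control(lp, sense = "max")
add.constraint(lp, data$min, "<=", 400)   # one day's capacity in minutes
solve(lp)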
I have some gamete data in the following format:
Ind Letter Place Position
1 A 19 23
2 B 19 23
3 B 19 23
4 B 19 23
1 B 19 34
2 A 19 34
3 B 19 34
4 B 19 34
1 C 19 52
2 T 19 52
3 C 19 52
4 T 19 52
1 T 33 15
2 T 33 15
3 T 33 15
4 C 33 15
1 C 33 26
2 T 33 26
3 T 33 26
4 C 33 26
dput of data:
structure(list(Ind = c(1L,2L,3L,4L,1L,2L,3L,4L,1L,2L,3L,4L,1L,2L,3L,4L,1L,2L,3L,4L),
Letter = structure(c(1L,2L,2L,2L,2L,1L,2L,2L,3L,4L,3L,4L,4L,4L,4L,3L,3L,4L,4L,3L),
.Label = c("A","B","C","T"), class="factor"),
Place = c(19L,19L,19L,19L,19L,19L,19L,19L,19L,19L,19L,19L,33L,33L,33L,33L,33L,33L,33L,33L),
Position = c(23L,23L,23L,23L,34L,34L,34L,34L,52L,52L,52L,52L,15L,15L,15L,15L,26L,26L,26L,26L)),
.Names = c("Ind","Letter","Place","Position"),
class="data.frame", row.names = c(NA,-20L))
I need to pair and combine them so that I get all possible unique combinations with reference to Position within a pair. I have another data file that contains information on the pairs, and they are paired with reference to Place. So in this file I may see that Place 19 + Place 33 is a pair, and I want the following result:
Ind Letter Place Position Ind Letter Place Position
1 A 19 23 1 T 33 15
2 B 19 23 2 T 33 15
3 B 19 23 3 T 33 15
4 B 19 23 4 C 33 15
1 A 19 23 1 C 33 26
2 B 19 23 2 T 33 26
3 B 19 23 3 T 33 26
4 B 19 23 4 C 33 26
1 B 19 34 1 T 33 15
2 A 19 34 2 T 33 15
3 B 19 34 3 T 33 15
4 B 19 34 4 C 33 15
1 B 19 34 1 C 33 26
2 A 19 34 2 T 33 26
3 B 19 34 3 T 33 26
4 B 19 34 4 C 33 26
1 C 19 52 1 T 33 15
2 T 19 52 2 T 33 15
3 C 19 52 3 T 33 15
4 T 19 52 4 C 33 15
1 C 19 52 1 C 33 26
2 T 19 52 2 T 33 26
3 C 19 52 3 T 33 26
4 T 19 52 4 C 33 26
In this case unique means that A1:A2 is equal to A2:A1.
The reason I want to do this is that I want to run a Four-Gamete Test (FGT) on the pairs, to see whether all possible combinations of Letter are present. E.g. for the last combined pair above we have the letter pairs CC, TT, CT and TC, so this combined pair passes the FGT.
I have tried to do the combining with expand.grid, as it seems quite close to what I want. However, when I ask for all combinations of data$Position, I lose the information for Ind, Letter, and Place, and the output also includes non-unique pairs.
Can anyone point me to a tool that is closer to what I want, or give me some guidelines on how to adapt the expand.grid approach to get what I need?
Should you be aware of a tool that actually does the Four-Gamete Test, or something similar, that would of course also be interesting for me to look at.
You can use expand.grid but not directly on the Position column. The idea is to find all combinations of the "quartets" (unique Positions):
pair <- c(19, 33)
df1 <- df1[df1$Place %in% pair, ]
split1 <- split( df1, df1$Position)
vec1 <- unique(df1$Position[df1$Place == pair[1]])
vec2 <- unique(df1$Position[df1$Place == pair[2]])
combin_num <- expand.grid(vec2, vec1)[,2:1]
do.call(
  rbind,
  lapply(seq_len(nrow(combin_num)), function(i) {
    cbind(split1[[as.character(combin_num[i, 1])]],
          split1[[as.character(combin_num[i, 2])]])
  })
)[, ]   # the trailing [ , ] re-subsets the data frame, which makes the duplicated column names unique (the .1 suffixes below)
Result:
# Ind Letter Place Position Ind.1 Letter.1 Place.1 Position.1
# 1 1 A 19 23 1 T 33 15
# 2 2 B 19 23 2 T 33 15
# 3 3 B 19 23 3 T 33 15
# 4 4 B 19 23 4 C 33 15
# 5 1 A 19 23 1 C 33 26
# 6 2 B 19 23 2 T 33 26
# 7 3 B 19 23 3 T 33 26
# 8 4 B 19 23 4 C 33 26
# 51 1 B 19 34 1 T 33 15
# 61 2 A 19 34 2 T 33 15
# 71 3 B 19 34 3 T 33 15
# 81 4 B 19 34 4 C 33 15
# 52 1 B 19 34 1 C 33 26
# 62 2 A 19 34 2 T 33 26
# 72 3 B 19 34 3 T 33 26
# 82 4 B 19 34 4 C 33 26
# 9 1 C 19 52 1 T 33 15
# 10 2 T 19 52 2 T 33 15
# 11 3 C 19 52 3 T 33 15
# 12 4 T 19 52 4 C 33 15
# 91 1 C 19 52 1 C 33 26
# 101 2 T 19 52 2 T 33 26
# 111 3 C 19 52 3 T 33 26
# 121 4 T 19 52 4 C 33 26
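From there, the Four-Gamete Test itself reduces to counting the distinct letter pairs per combination of positions. A minimal sketch, assuming the combined result above is stored in a data frame res with the column names shown (Letter/Letter.1, Position/Position.1):
# one gamete (letter pair) per row, then count distinct gametes per Position pair;
# a pair passes the FGT when all four combinations occur
res$gamete <- paste0(res$Letter, res$Letter.1)
fgt <- aggregate(gamete ~ Position + Position.1, data = res,
                 FUN = function(g) length(unique(g)))
fgt$passes <- fgt$gamete == 4
fgt
For the Position 52 / Position 26 pair this counts the four gametes CC, TT, CT and TC, matching the example in the question.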
My question is about matrices in R. I have two matrices, r and m:
m <- as.matrix(read.table(text="
15 56 44 1 4 7
61 31 63 7 1 3
10 36 99 5 9 6
65 79 88 54 1 1"))
colnames(m) <- c("Z","Q","A","F","D","H")
r <- as.matrix(read.table(text="
15 56 64
10 36 61 "))
colnames(r) <- c("Z","L","O")
I want to extract the rows that match on the common column (in this case the Z column), so the result would be
A
15 56 44 1 4 7
10 36 99 5 9 6
A is the new matrix.
Any ideas how to do this?
Just do:
> merge(x=m, y=r, by='Z')
Z Q A F D H L O
1 10 36 99 5 9 6 36 61
2 15 56 44 1 4 7 56 64
To only keep the columns in m:
> merge(x=r, y=m, by='Z', sort=FALSE)[colnames(m)]
Z Q A F D H
1 15 56 44 1 4 7
2 10 36 99 5 9 6
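Note that merge() returns a data frame rather than a matrix; if you really need the result as a matrix A, as in the question, you can wrap the call:
A <- as.matrix(merge(x=r, y=m, by='Z', sort=FALSE)[colnames(m)])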
Also:
indx <- intersect(colnames(m), colnames(r))
m[m[,indx] %in% r[,indx],]
# Z Q A F D H
#[1,] 15 56 44 1 4 7
#[2,] 10 36 99 5 9 6
I have a data frame with 150000 rows in long format, with multiple occurrences of the same id variable. I'm using reshape() (from the stats package, rather than the reshape/reshape2 packages) to convert this to wide format. I am generating a variable that counts each occurrence of a given level of the id, to use as an index.
I've got this working with a small data frame using plyr, but it is far too slow for my full df. Can I program this more efficiently?
I've struggled to do this with the reshape package as I have around 30 other variables. It may be best to reshape only what I'm looking at (rather than the whole df) for each individual analysis.
> # u=id variable with three value variables
> u<-c(rep("a",4), rep("b", 3),rep("c", 6), rep("d", 5))
> u<-factor(u)
> v<-1:18
> w<-20:37
> x<-40:57
> df<-data.frame(u,v,w,x)
> df
u v w x
1 a 1 20 40
2 a 2 21 41
3 a 3 22 42
4 a 4 23 43
5 b 5 24 44
6 b 6 25 45
7 b 7 26 46
8 c 8 27 47
9 c 9 28 48
10 c 10 29 49
11 c 11 30 50
12 c 12 31 51
13 c 13 32 52
14 d 14 33 53
15 d 15 34 54
16 d 16 35 55
17 d 17 36 56
18 d 18 37 57
>
> library(plyr)
> df2<-ddply(df, .(u), transform, count=rank(u, ties.method="first"))
> df2
u v w x count
1 a 1 20 40 1
2 a 2 21 41 2
3 a 3 22 42 3
4 a 4 23 43 4
5 b 5 24 44 1
6 b 6 25 45 2
7 b 7 26 46 3
8 c 8 27 47 1
9 c 9 28 48 2
10 c 10 29 49 3
11 c 11 30 50 4
12 c 12 31 51 5
13 c 13 32 52 6
14 d 14 33 53 1
15 d 15 34 54 2
16 d 16 35 55 3
17 d 17 36 56 4
18 d 18 37 57 5
> reshape(df2, idvar="u", timevar="count", direction="wide")
u v.1 w.1 x.1 v.2 w.2 x.2 v.3 w.3 x.3 v.4 w.4 x.4 v.5 w.5 x.5 v.6 w.6 x.6
1 a 1 20 40 2 21 41 3 22 42 4 23 43 NA NA NA NA NA NA
5 b 5 24 44 6 25 45 7 26 46 NA NA NA NA NA NA NA NA NA
8 c 8 27 47 9 28 48 10 29 49 11 30 50 12 31 51 13 32 52
14 d 14 33 53 15 34 54 16 35 55 17 36 56 18 37 57 NA NA NA
I still can't quite figure out why you would ultimately want to convert your dataset from long to wide, because to me, that seems like it would be an extremely unwieldy dataset to work with.
If you're looking to speed up the enumeration of your factor levels, you can use ave() in base R, or .N from the "data.table" package. Given that you are working with a lot of rows, you might want to consider the latter.
First, let's make up some data:
set.seed(1)
df <- data.frame(u = sample(letters[1:6], 150000, replace = TRUE),
                 v = runif(150000, 0, 10),
                 w = runif(150000, 0, 100),
                 x = runif(150000, 0, 1000))
list(head(df), tail(df))
# [[1]]
# u v w x
# 1 b 6.368412 10.52822 223.6556
# 2 c 6.579344 75.28534 450.7643
# 3 d 6.573822 36.87630 283.3083
# 4 f 9.711164 66.99525 681.0157
# 5 b 5.337487 54.30291 137.0383
# 6 f 9.587560 44.81581 831.4087
#
# [[2]]
# u v w x
# 149995 b 4.614894 52.77121 509.0054
# 149996 f 5.104273 87.43799 391.6819
# 149997 f 2.425936 60.06982 160.2324
# 149998 a 1.592130 66.76113 118.4327
# 149999 b 5.157081 36.90400 511.6446
# 150000 a 3.565323 92.33530 252.4982
table(df$u)
#
# a b c d e f
# 25332 24691 24993 24975 25114 24895
Load our required packages:
library(plyr)
library(data.table)
Create a "data.table" version of our dataset
DT <- data.table(df, key = "u")
DT # Notice that the data are now automatically sorted
# u v w x
# 1: a 6.2378578 96.098294 643.2433
# 2: a 5.0322400 46.806132 544.6883
# 3: a 9.6289786 87.915303 334.6726
# 4: a 4.3393403 1.994383 753.0628
# 5: a 6.2300123 72.810359 579.7548
# ---
# 149996: f 0.6268414 15.608049 669.3838
# 149997: f 2.3588955 40.380824 658.8667
# 149998: f 1.6383619 77.210309 250.7117
# 149999: f 5.1042725 87.437989 391.6819
# 150000: f 2.4259363 60.069820 160.2324
DT[, .N, by = key(DT)] # Like "table"
# u N
# 1: a 25332
# 2: b 24691
# 3: c 24993
# 4: d 24975
# 5: e 25114
# 6: f 24895
Now let's run a few basic tests. The results from ave() aren't sorted, but they are in "data.table" and "plyr", so we should also test the timing for sorting when using ave().
system.time(AVE <- within(df, {
  count <- ave(as.numeric(u), u, FUN = seq_along)
}))
# user system elapsed
# 0.024 0.000 0.027
# Now time the sorting
system.time(AVE2 <- AVE[order(AVE$u, AVE$count), ])
# user system elapsed
# 0.264 0.000 0.262
system.time(DDPLY <- ddply(df, .(u), transform,
                           count = rank(u, ties.method = "first")))
# user system elapsed
# 0.944 0.000 0.984
system.time(DT[, count := 1:.N, by = key(DT)])
# user system elapsed
# 0.008 0.000 0.004
all(DDPLY == AVE2)
# [1] TRUE
all(data.frame(DT) == AVE2)
# [1] TRUE
That syntax for "data.table" sure is compact, and its speed is blazing!
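With the count column in place, the wide reshape from the question works the same way on any of these results, for example on the data.table version (just a sketch; note that with roughly 25,000 occurrences per id the wide result is extremely wide, which is the caveat mentioned at the top):
wide <- reshape(as.data.frame(DT), idvar = "u", timevar = "count",
                direction = "wide")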
Using base R to create an empty matrix and then fill it in appropriately can often be significantly faster. In the code below I suspect the slow part would be converting the data frame to a matrix and transposing, as in the first two lines; if so, that could perhaps be avoided if it could be stored differently to start with.
g <- df$u
x <- t(as.matrix(df[,-1]))
k <- split(seq_along(g), g)
n <- max(sapply(k, length))
out <- matrix(ncol=n*nrow(x), nrow=length(k))
for(idx in seq_along(k)) {
  out[idx, seq_len(length(k[[idx]]) * nrow(x))] <- x[, k[[idx]]]
}
rownames(out) <- names(k)
colnames(out) <- paste(rep(rownames(x), n), rep(seq_len(n), each=nrow(x)), sep=".")
out
# v.1 w.1 x.1 v.2 w.2 x.2 v.3 w.3 x.3 v.4 w.4 x.4 v.5 w.5 x.5 v.6 w.6 x.6
# a 1 20 40 2 21 41 3 22 42 4 23 43 NA NA NA NA NA NA
# b 5 24 44 6 25 45 7 26 46 NA NA NA NA NA NA NA NA NA
# c 8 27 47 9 28 48 10 29 49 11 30 50 12 31 51 13 32 52
# d 14 33 53 15 34 54 16 35 55 17 36 56 18 37 57 NA NA NA
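If a data frame in the same shape as the reshape() output is needed, the matrix can be converted back, for instance:
wide_df <- data.frame(u = rownames(out), out, row.names = NULL, check.names = FALSE)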