split data based on cumulative value of column in r


I have the following type of data:
myd <- data.frame (group = c(rep(1, 15), rep(2, 15)),
distance = c(0, 4, 8,9,11, 14,18,19,23, 24, 29,30,35,40, 43,
0, 8,9,9,12, 13,14,15,16, 18, 23,24,28, 29, 30),
var1 = c(1:15, 11:25), var2 = 1:30, var3 = 1:30)
myd
group distance var1 var2 var3
1 1 0 1 1 1
2 1 4 2 2 2
3 1 8 3 3 3
4 1 9 4 4 4
5 1 11 5 5 5
6 1 14 6 6 6
7 1 18 7 7 7
8 1 19 8 8 8
9 1 23 9 9 9
10 1 24 10 10 10
11 1 29 11 11 11
12 1 30 12 12 12
13 1 35 13 13 13
14 1 40 14 14 14
15 1 43 15 15 15
16 2 0 11 16 16
17 2 8 12 17 17
18 2 9 13 18 18
19 2 9 14 19 19
20 2 12 15 20 20
21 2 13 16 21 21
22 2 14 17 22 22
23 2 15 18 23 23
24 2 16 19 24 24
25 2 18 20 25 25
26 2 23 21 26 26
27 2 24 22 27 27
28 2 28 23 28 28
29 2 29 24 29 29
30 2 30 25 30 30
I have many more group levels than the 2 shown above. Within each group, distance (think of mile posts along a highway) starts at 0 and increases cumulatively to the end of the group. I want to split the data (make bins) so that each subset covers a distance of approximately 10. The resulting split data would look like:
data group1subset1
group distance var1 var2 var3
1 1 0 1 1 1
2 1 4 2 2 2
3 1 8 3 3 3
4 1 9 4 4 4
data group1subset2
5 1 11 5 5 5
6 1 14 6 6 6
7 1 18 7 7 7
8 1 19 8 8 8
data group1subset3
9 1 23 9 9 9
10 1 24 10 10 10
11 1 29 11 11 11
12 1 30 12 12 12
data group1subset4
13 1 35 13 13 13
14 1 40 14 14 14
data group1subset5
15 1 43 15 15 15
=====
data group2subset1
16 2 0 11 16 16
17 2 8 12 17 17
18 2 9 13 18 18
19 2 9 14 19 19
data group2subset2
20 2 12 15 20 20
21 2 13 16 21 21
22 2 14 17 22 22
23 2 15 18 23 23
24 2 16 19 24 24
25 2 18 20 25 25
data group2subset3
26 2 23 21 26 26
27 2 24 22 27 27
28 2 28 23 28 28
29 2 29 24 29 29
30 2 30 25 30 30
I need to automate the process because the real data is very large. How can I do this?

I'd actually use cut to accomplish this:
maxd <- (max(myd$distance) %/% 10 * 10) + 10
transform(myd,cutdist = cut(distance, breaks = seq(0,maxd, by = 10),
include.lowest = TRUE))
group distance var1 var2 var3 cutdist
1 1 0 1 1 1 [0,10]
2 1 4 2 2 2 [0,10]
3 1 8 3 3 3 [0,10]
4 1 9 4 4 4 [0,10]
5 1 11 5 5 5 (10,20]
6 1 14 6 6 6 (10,20]
7 1 18 7 7 7 (10,20]
8 1 19 8 8 8 (10,20]
9 1 23 9 9 9 (20,30]
10 1 24 10 10 10 (20,30]
11 1 29 11 11 11 (20,30]
12 1 30 12 12 12 (20,30]
13 1 35 13 13 13 (30,40]
14 1 40 14 14 14 (30,40]
15 1 43 15 15 15 (40,50]
16 2 0 11 16 16 [0,10]
17 2 8 12 17 17 [0,10]
18 2 9 13 18 18 [0,10]
19 2 9 14 19 19 [0,10]
20 2 12 15 20 20 (10,20]
21 2 13 16 21 21 (10,20]
22 2 14 17 22 22 (10,20]
23 2 15 18 23 23 (10,20]
24 2 16 19 24 24 (10,20]
25 2 18 20 25 25 (10,20]
26 2 23 21 26 26 (20,30]
27 2 24 22 27 27 (20,30]
28 2 28 23 28 28 (20,30]
29 2 29 24 29 29 (20,30]
30 2 30 25 30 30 (20,30]
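There's actually no need to calculate a cumulative distance: since you want bins at multiples of 10, you can cut the distance column directly. To get separate subsets per group, split on the combination of group and cutdist.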
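The cutdist column on its own does not separate the groups, so to get one data frame per group and bin you can split on both columns. A minimal sketch, assuming the myd data from the question; combining the two factors with interaction() is my choice here, not part of the original answer:

```r
# Sample data from the question
myd <- data.frame(group = c(rep(1, 15), rep(2, 15)),
                  distance = c(0, 4, 8, 9, 11, 14, 18, 19, 23, 24, 29, 30, 35, 40, 43,
                               0, 8, 9, 9, 12, 13, 14, 15, 16, 18, 23, 24, 28, 29, 30),
                  var1 = c(1:15, 11:25), var2 = 1:30, var3 = 1:30)

# Upper break: next multiple of 10 above the largest distance
maxd <- (max(myd$distance) %/% 10 * 10) + 10
myd$cutdist <- cut(myd$distance, breaks = seq(0, maxd, by = 10),
                   include.lowest = TRUE)

# One factor level per (group, bin) pair; lex.order = TRUE orders by group first,
# drop = TRUE discards combinations that have no rows
f <- interaction(myd$group, myd$cutdist, lex.order = TRUE, drop = TRUE)
pieces <- split(myd, f)

names(pieces)  # "1.[0,10]" "1.(10,20]" ... "2.(20,30]"
```

Because the breaks are right-closed with include.lowest = TRUE, a distance of exactly 10, 20 or 30 stays in the lower bin, which matches the subsets in the question.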

What about
split(myd, interaction(myd$distance %/% 10, myd$group, drop = FALSE), drop = TRUE)
Here we bin the distance by integer division by 10 (the number of whole tens it contains) and take the interaction with group, so each group/bin combination becomes a distinct category. We use that as the factor for split(). Because %/% puts a distance of exactly 10, 20, 30, ... at the start of the next bin rather than the end of the previous one, some breaks differ slightly from your example (for instance, distance 30 lands in the same subset as 35), but the result is comparable.
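If you need the breaks to match your example exactly (10, 20, 30 closing a bin rather than opening the next) and want names like group1subset1, one option is ceiling() instead of %/%. A sketch under those assumptions; it also assumes rows are already sorted by distance within each group, and the naming scheme simply mirrors the labels in the question:

```r
myd <- data.frame(group = c(rep(1, 15), rep(2, 15)),
                  distance = c(0, 4, 8, 9, 11, 14, 18, 19, 23, 24, 29, 30, 35, 40, 43,
                               0, 8, 9, 9, 12, 13, 14, 15, 16, 18, 23, 24, 28, 29, 30),
                  var1 = c(1:15, 11:25), var2 = 1:30, var3 = 1:30)

# Right-closed bins: [0,10] -> 1, (10,20] -> 2, (20,30] -> 3, ...
# pmax() keeps distance 0 in the first bin instead of a bin 0
bin <- pmax(ceiling(myd$distance / 10), 1)

# Renumber the non-empty bins 1, 2, 3, ... within each group
subset_id <- ave(bin, myd$group, FUN = function(b) match(b, unique(b)))

# Labels such as "group1subset1"; levels in order of appearance keep the list ordered
label <- paste0("group", myd$group, "subset", subset_id)
pieces <- split(myd, factor(label, levels = unique(label)))

names(pieces)
```

The renumbering step means an empty bin (a gap of more than 10 between mile posts) does not leave a hole in the subset numbers.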

Related

Generating a vector with n repetitions of x, then y, then z, with a fixed upper bound

I am trying to create a vector where I have 3 repetitions of the number 1, then 3 repetitions of the number 2, and so on up to, for instance, 3 repetitions of the number 36.
c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5...)
I have tried the following use of rep() but got the following error:
Error in rep(3, seq(1:36)) : invalid 'times' argument
What formulation do I need to use to properly generate the vector I want?

sort(rep(1:36, 3))
Or, even better, as @Wimpel mentioned in the comments, use the each argument of rep:
rep(1:36, each = 3)
output
# [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9 9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 17 17 17 18 18 18 19 19 19 20 20 20 21 21 21 22
# [65] 22 22 23 23 23 24 24 24 25 25 25 26 26 26 27 27 27 28 28 28 29 29 29 30 30 30 31 31 31 32 32 32 33 33 33 34 34 34 35 35 35 36 36 36
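The error comes from the times argument: rep() expects times to be either a single number or a vector with one count per element of x, and rep(3, seq(1:36)) passes 36 counts for a single element. A small sketch of equivalent ways to get the desired vector:

```r
# each = 3 repeats every element three times in place
a <- rep(1:36, each = 3)

# times can also be a vector with one count per element of x
b <- rep(1:36, times = rep(3, 36))

# times = 3 repeats the whole vector, so it needs sort() to interleave
d <- sort(rep(1:36, times = 3))

identical(a, b)  # TRUE
identical(a, d)  # TRUE
head(a, 9)       # 1 1 1 2 2 2 3 3 3
```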

This should work, although it is probably not the most elegant approach.
reps = c()
n = 36
for(i in 1:n){
reps = append(reps, rep(i, 3))
}
reps
Alternatively, use the rep function properly (see the documentation, ?rep, for the each argument):
rep(1:36,each = 3)

The rep approach is preferable (see the existing answers), but here are some other options:
> kronecker(1:36, rep(1, 3))
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9
[26] 9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 17 17
[51] 17 18 18 18 19 19 19 20 20 20 21 21 21 22 22 22 23 23 23 24 24 24 25 25 25
[76] 26 26 26 27 27 27 28 28 28 29 29 29 30 30 30 31 31 31 32 32 32 33 33 33 34
[101] 34 34 35 35 35 36 36 36
> c(outer(rep(1, 3), 1:36))
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9
[26] 9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 17 17
[51] 17 18 18 18 19 19 19 20 20 20 21 21 21 22 22 22 23 23 23 24 24 24 25 25 25
[76] 26 26 26 27 27 27 28 28 28 29 29 29 30 30 30 31 31 31 32 32 32 33 33 33 34
[101] 34 34 35 35 35 36 36 36
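Why these work: outer(rep(1, 3), 1:36) builds a 3 x 36 matrix whose j-th column is three copies of j, and c() flattens a matrix column by column; kronecker() of a vector with rep(1, 3) replaces each element by a block of three copies. A quick check against rep(), noting that both alternatives return doubles rather than integers:

```r
m <- outer(rep(1, 3), 1:36)
dim(m)   # 3 36
m[, 2]   # 2 2 2

v_outer <- c(m)
v_kron  <- kronecker(1:36, rep(1, 3))
v_rep   <- rep(1:36, each = 3)

# Same values; compare with == because the types differ (double vs integer)
all(v_outer == v_rep)  # TRUE
all(v_kron == v_rep)   # TRUE
typeof(v_kron)         # "double"
```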

Order Levels (format: numbers) in a factor from low to high in RStudio

I have a problem in R. I have created a factor (called reference), but its levels are not in the right order; I want them ordered from low to high (1, 2, 3, ..., 30). I tried reference <- relevel(reference, 1), but the order is still not right. Is there a way to change the order as I want?
reference
[1] 5 5 1 5 5 5 1 1 1 1 1 11 1 1 1 5 1 5 1 2 1 1 1 1 2 1 1 1 3 1 2 1 2 15 2 2 2 15
[39] 16 3 2 2 4 2 16 23 2 14 2 4 2 3 2 14 4 24 2 2 2 2 3 4 3 3 3 3 25 3 2 3 3 3 3 3 25 3
[77] 3 3 3 1 3 15 3 3 3 3 3 1 1 3 8 4 4 4 4 8 4 4 4 4 4 4 4 4 4 4 8 4 4 4 4 8 4 4
[115] 4 4 15 8 4 16 8 16 14 14 5 5 5 5 7 5 16 5 14 16 14 14 5 5 5 5 14 5 3 5 7 8 4 7 5 5 6 4
[153] 4 15 15 15 6 4 6 14 4 14 15 6 4 11 6 28 16 6 16 15 9 14 6 14 15 6 16 14 7 14 16 16 16 16 7 7 14 16
[191] 16 7 15 7 4 15 7 15 14 15 15 9 14 7 16 15 15 15 16 14 8 8 9 4 8 8 10 8 4 7 8 4 8 4 8 8 8 8
[229] 8 8 9 8 8 4 8 8 14 8 8 8 29 14 29 9 29 14 9 16 29 10 29 14 16 9 9 29 29 29 9 29 16 4 4 9 15 29
[267] 9 23 29 9 10 4 10 10 10 10 10 10 10 10 10 10 14 5 10 10 15 11 10 11 10 10 11 11 11 15 4 10 15 10 10 11 11 10
[305] 10 10 11 11 10 11 11 10 11 11 11 10 11 10 8 11 10 10 11 11 10 10 11 11 10 11 14 16 7 15 12 14 14 15 15 14 14 14
[343] 12 17 11 2 15 16 7 16 15 15 15 14 17 28 5 7 17 16 11 13 13 11 13 13 13 13 16 13 13 13 11 13 15 13 13 13 11 13
[381] 13 13 10 13 13 13 13 13 13 10 11 14 15 14 4 14 14 15 8 14 4 4 14 14 14 15 15 14 4 4 14 4 4 4 14 7 8 14
[419] 11 11 15 15 16 3 5 11 15 14 15 15 4 3 15 4 15 15 15 14 14 15 16 15 14 15 11 15 15 15 15 16 7 4 16 16 16 14
[457] 15 14 16 16 16 16 15 12 4 4 4 14 16 16 15 15 14 26 16 26 14 16 4 13 17 21 17 21 17 17 17 17 17 20 17 21 17 17
[495] 18 17 18 17 18 21 17 21 20 18 21 18 18 20 17 17 17 20 17 17 18 18 18 20 18 18 21 18 21 17 17 17 18 18 21 18 17 18
[533] 17 21 17 20 18 22 18 20 19 18 18 19 4 19 18 19 19 15 1 19 17 7 3 20 17 19 19 19 20 18 19 19 19 19 20 18 15 14
[571] 21 20 20 20 20 22 20 20 19 20 20 20 20 20 22 20 21 18 20 20 21 21 17 18 21 20 20 18 20 20 21 17 21 21 22 21 20 20
[609] 21 21 21 21 17 21 18 21 17 18 17 20 20 18 20 20 18 21 21 21 20 17 21 22 22 22 22 22 22 22 21 22 22 22 21 22 22 21
[647] 22 22 21 29 21 22 22 22 22 22 22 20 18 22 8 15 4 4 15 4 4 15 15 15 4 4 4 15 4 15 15 23 4 23 4 2 8 23
[685] 4 23 10 2 4 7 4 7 18 24 15 15 26 11 15 15 4 24 7 15 15 5 24 15 4 1 4 8 24 23 23 6 4 3 23 4 5 1
[723] 1 1 1 2 1 1 1 1 1 1 1 1 3 2 1 5 1 2 2 1 1 1 20 2 3 25 2 1 3 15 15 15 14 14 14 5 14 15
[761] 4 18 14 8 26 4 15 20 10 16 8 4 15 15 16 4 23 18 15 4 27 27 27 27 27 27 27 27 27 27 27 27 27 27 27 27 27 27
[799] 27 27 27 27 27 27 27 27 27 27 27 27 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28
[837] 28 28 28 28 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 30 30 30 30
[875] 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30
Levels: 26 28 27 29 3 30 4 5 6 7 8 9 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 23 24 25

You can convert the factor to numbers and then back to a factor:
reference <- factor(as.numeric(as.character(reference)))
Or, if you already know the range of values:
reference <- factor(reference, levels = 1:30)
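When a factor is created from character data, its levels are sorted as strings, so "11" can come before "2". A minimal sketch with a small made-up factor (not the data above) showing both fixes; note that as.numeric() applied directly to a factor returns the internal level codes rather than the labels, which is why as.character() comes first:

```r
# Small made-up example; levels are sorted as strings, e.g. "1" "11" "2" "5"
x <- factor(c("5", "1", "11", "2", "5"))

# Convert labels (not codes) to numbers, then rebuild the factor
fixed <- factor(as.numeric(as.character(x)))
levels(fixed)   # "1" "2" "5" "11"

# Alternative when the full range is known in advance; unused levels are kept
fixed2 <- factor(x, levels = 1:12)
levels(fixed2)  # "1" "2" ... "12"
```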

Display vector in R with a defined viewport

I want to display a vector consistently across different R environments.
For example, I want a vector like
c(1:30)
to display 24 values per row:
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
[25] 25 26 27 28 29 30
and not
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

The closest thing to what you are looking for is to set the console output width (in characters) with options():
options(width = 75)
c(1:30)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
[24] 24 25 26 27 28 29 30
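Note that width counts characters, not values. Assuming R's default print gap of one space and "[25]"-style index labels, each row of 1:30 takes 4 characters of label plus 3 characters per value, so a width of 76 to 78 gives 24 values per row (75 gives 23, 79 gives 25). A sketch of a hypothetical helper, width_for(), that does this arithmetic for simple vectors:

```r
# Hypothetical helper: console width needed to print `n` values per row,
# assuming the default print gap of 1 and values formatted to a common width
width_for <- function(x, n) {
  value_width <- max(nchar(format(x)))  # every value is padded to this width
  label_width <- nchar(length(x)) + 2   # widest index label, e.g. "[25]"
  label_width + n * (value_width + 1)   # one space before each value
}

w <- width_for(1:30, 24)
w  # 76

old <- options(width = w)
print(1:30)
options(old)  # restore the previous setting
```

This is only a convenience for vectors like the one in the question; wider or mixed-width values (negative numbers, decimals) change value_width and therefore the result.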

Extracting columns from data according to column names

I used list.files to read many files in a folder named "matrices":
library(magrittr)  # provides the %>% pipe
data <- list.files(path="/home/rania/Downloads/matrices/", full.names=TRUE, pattern="\\.csv") %>% lapply(read.csv, header=TRUE, sep=",")
When I print data, the result is:
> data
[[1]]
X25 X14 X14.1
1 4 145 58
2 3 4 5
3 10 11 12
4 14 12 17
5 12 2 8
6 8 2 8
7 7 47 55
8 15 12 18
9 17 12 18
10 14 22 47
11 17 12 75
12 23 77 14
13 12 23 17
[[2]]
X58 X14 X87
1 69 14 20
2 6 8 95
3 13 14 15
4 10 11 14
5 4 1 1
6 7 6 7
7 66 75 8
8 19 13 17
9 89 55 23
10 45 58 58
11 12 32 74
12 12 74 25
13 23 12 28
[[3]]
X63 X21 X87
1 87 23 88
2 20 12 47
3 16 17 18
4 15 10 17
5 8 4 8
6 5 2 6
7 74 85 66
8 19 15 13
9 47 22 54
10 47 77 12
11 15 17 85
12 12 33 45
13 23 47 58
[[4]]
X10 X23 X15
1 11 23 23
2 12 87 17
3 19 12 11
4 19 66 12
5 2 12 77
6 7 88 12
7 45 95 32
8 17 21 78
9 19 12 11
10 12 13 77
11 21 47 13
12 74 12 25
13 1 52 7
I need to extract a column from each of those data frames, selected by column name, in a single pass. How can I do that? Thanks!

We can use Map to pair each data frame with the name of the column to extract from it:
do.call(cbind, Map(`[`, data, c("X25", "X14", "X21", "X23")))
# X25 X14 X21 X23
#1 4 14 23 23
#2 3 8 12 87
#3 10 14 17 12
#4 14 11 10 66
#5 12 1 4 12
#6 8 6 2 88
#7 7 75 85 95
#8 15 13 15 21
#9 17 55 22 12
#10 14 58 77 13
#11 17 32 17 47
#12 23 74 33 12
#13 12 12 47 52
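How the Map call works, as a minimal sketch with two small made-up data frames standing in for the CSV files (the data and the cols vector are illustrative, not from the question): Map() calls `[`(data[[i]], cols[i]) for each position i, and single-bracket indexing by name returns a one-column data frame, so the column names survive into the cbind() result.

```r
# Two small made-up data frames standing in for the CSV files
data <- list(data.frame(X25 = c(4, 3), X14 = c(145, 4)),
             data.frame(X58 = c(69, 6), X14 = c(14, 8)))

# One column name per data frame, in the same order as `data`
cols <- c("X25", "X14")

# Equivalent to list(data[[1]]["X25"], data[[2]]["X14"])
picked <- Map(`[`, data, cols)

# cbind() on data frames glues the columns side by side
result <- do.call(cbind, picked)
result
```

This assumes every file has the same number of rows, as in the question; with unequal row counts cbind() would either recycle or fail.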

Thanks @RoyalTS, it works.

split a list and increment for loop by 10

How can I split a list in R?
I want to split a list incrementally.
For example:
x <- 1:50
n <- 5
spt <- split(x,cut(x,quantile(x,(0:n)/n), include.lowest=TRUE, labels=FALSE))
This gives:
$`1`
[1] 1 2 3 4 5 6 7 8 9 10
$`2`
[1] 11 12 13 14 15 16 17 18 19 20
$`3`
[1] 21 22 23 24 25 26 27 28 29 30
$`4`
[1] 31 32 33 34 35 36 37 38 39 40
$`5`
[1] 41 42 43 44 45 46 47 48 49 50
I don't want this output. I want output like this:
$`1`
[1] 1 2 3 4 5 6 7 8 9 10
$`2`
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
$`3`
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
$`4`
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
$`5`
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
Any ideas?
I would also like to know how to increment a for loop by 10 in R.
Thanks.

We can use seq:
lapply(seq(10, 50, by = 10), function(i) x[1:i])
Or, as @RichardScriven mentioned in the comments, seq(10, 50, by = 10) can be replaced by 1:5 * 10L.
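The second part of the question (incrementing a for loop by 10) works the same way: a for loop iterates over whatever vector you give it, so loop over seq(10, 50, by = 10) instead of 1:n. A sketch that builds the same growing prefixes with a loop, assuming x <- 1:50 as in the question:

```r
x <- 1:50

# Preallocate a list with one slot per step of 10
prefixes <- vector("list", length(x) / 10)

# i takes the values 10, 20, 30, 40, 50
for (i in seq(10, length(x), by = 10)) {
  prefixes[[i / 10]] <- x[1:i]
}
names(prefixes) <- seq_along(prefixes)  # names "1" ... "5", as in the question

# Same result as the lapply() answer
via_lapply <- setNames(lapply(1:5 * 10L, function(i) x[1:i]), 1:5)
identical(prefixes, via_lapply)  # TRUE
```

The lapply() version is usually preferred in R; the loop is mainly useful when each step depends on the previous one.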
