R - Index position with condition - r

I have data like this:
w<-c(0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0)
I would like an index position that starts at 1 on each value 1 and counts up until the next 1.
Expected output: NA,NA,NA,NA,NA,1,2,3,4,5,6,7,1,2,3,4,5,1,2,3,4,5,6,7,8,9
Ideally the solution should be applicable to a data frame.
Thanks
Edit: w is a data frame. What I want is roughly what this code does:
m<-as.data.frame(w)
m[m!=1] <- row(m)[m!=1]
m
w
1 1
2 2
3 3
4 4
5 5
6 1
7 7
8 8
9 9
10 10
11 11
12 12
13 1
14 14
15 15
16 16
17 17
18 1
19 19
20 20
21 21
22 22
23 23
24 24
25 25
26 26
but with the count restarting at 1 each time the value 1 is matched:
> m
w wanted
1 1 NA
2 2 NA
3 3 NA
4 4 NA
5 5 NA
6 1 1
7 7 2
8 8 3
9 9 4
10 10 5
11 11 6
12 12 7
13 1 1
14 14 2
15 15 3
16 16 4
17 17 5
18 1 1
19 19 2
20 20 3
21 21 4
22 22 5
23 23 6
24 24 7
25 25 8
26 26 9
Thanks

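This assumes that the data is ordered as shown in the example, i.e. m$w holds increasing row numbers that drop back to 1 at each restart. Note that it numbers the leading rows 1 to 5 instead of setting them to NA.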
m$wanted <- with(m, ave(w, cumsum(c(TRUE,diff(w) <0)), FUN=seq_along))
m$wanted
#[1] 1 2 3 4 5 1 2 3 4 5 6 7 1 2 3 4 5 1 2 3 4 5 6 7 8 9

For the given data including repeated 1's and non-sequential input, the following works:
m[9,1] <- 100
m[3,1] <- 55
m[14,1] <- 60
m[25,1] <- 1
m[19,1] <- 1
m$result <- 1:nrow(m) - which(m$w == 1)[cumsum(m$w == 1)] + 1
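But if the data does not start with 1, cumsum(m$w == 1) is 0 for the leading rows, the subscript drops those elements, and the lengths no longer match. For example: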
m[1,1] <- 2
Then this works:
firstone <- which(m$w == 1)[1]
subindex <- m[firstone:nrow(m),'w'] == 1
m$result <- c(rep(NA,firstone-1),1:length(subindex) - which(subindex)[cumsum(subindex)] + 1)
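The same result can also be computed directly from the original 0/1 vector in the question. This is only a minimal sketch: it assumes w contains a 1 exactly where the count should restart, and the names grp and wanted are made up for illustration.

```r
# Original 0/1 vector from the question
w <- c(0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0)

# Run id: 0 before the first 1, then incremented at every 1
grp <- cumsum(w == 1)

# Position within each run
wanted <- ave(seq_along(w), grp, FUN = seq_along)

# Rows before the first 1 have no starting point, so mark them NA
wanted[grp == 0] <- NA

wanted
```

For a data frame the same lines can be applied to the column, e.g. with m$w in place of w, provided that column still holds the 0/1 values.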

Related

Repeat the first two rows for each id two times

I would like to repeat the first two rows for each id two times. I don't know how to do that. Does anyone have a suggestion?
id <- rep(1:4,each=6)
scored <- c(12,13,NA,NA,NA,NA,14,20,NA,NA,NA,NA,23,56,NA,NA,NA,NA, 45,78,NA,NA,NA,NA)
df <- data.frame(id,scored)
df
id scored
1 1 12
2 1 13
3 1 NA
4 1 NA
5 1 NA
6 1 NA
7 2 14
8 2 20
9 2 NA
10 2 NA
11 2 NA
12 2 NA
13 3 23
14 3 56
15 3 NA
16 3 NA
17 3 NA
18 3 NA
19 4 45
20 4 78
21 4 NA
22 4 NA
23 4 NA
24 4 NA
I want it to look like:
df
id score
1 1 12
2 1 13
3 1 12
4 1 13
5 1 12
6 1 13
7 2 14
8 2 20
9 2 14
10 2 20
11 2 14
12 2 20
13 3 23
14 3 56
15 3 23
16 3 56
17 3 23
18 3 56
19 4 45
20 4 78
21 4 45
22 4 78
23 4 45
24 4 78
We can group by 'id' and use rep() to recycle the non-NA elements of 'scored' to the length of each group:
library(dplyr)
df %>%
group_by(id) %>%
mutate(scored = rep(scored[!is.na(scored)], length.out = n()))
# A tibble: 24 x 2
# Groups: id [4]
# id scored
# <int> <dbl>
# 1 1 12
# 2 1 13
# 3 1 12
# 4 1 13
# 5 1 12
# 6 1 13
# 7 2 14
# 8 2 20
# 9 2 14
#10 2 20
# … with 14 more rows
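For comparison, the same recycling idea can be sketched in base R with ave(). This assumes, as in the example data, that every id has at least one non-NA score; it is not the answerer's code, just an equivalent without dplyr.

```r
# Example data from the question
id <- rep(1:4, each = 6)
scored <- c(12,13,NA,NA,NA,NA,14,20,NA,NA,NA,NA,23,56,NA,NA,NA,NA,
            45,78,NA,NA,NA,NA)
df <- data.frame(id, scored)

# Within each id, recycle the non-NA scores to the length of the group
df$scored <- ave(df$scored, df$id, FUN = function(x) {
  rep(x[!is.na(x)], length.out = length(x))
})

df
```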

Is there any method to sort the matrix by both column and row in R?

Could you help me?
I have a matrix in which the first column and the first row are the IDs.
I need to sort it by both the column IDs and the row IDs.
Thanks!
Two thoughts:
mat <- matrix(1:25, nr=5, dimnames=list(c('4',3,5,2,1), c('4',3,5,2,1)))
mat
# 4 3 5 2 1
# 4 1 6 11 16 21
# 3 2 7 12 17 22
# 5 3 8 13 18 23
# 2 4 9 14 19 24
# 1 5 10 15 20 25
If you want a strictly alphabetic ordering, then this will work:
mat[order(rownames(mat)),order(colnames(mat))]
# 1 2 3 4 5
# 1 25 20 10 5 15
# 2 24 19 9 4 14
# 3 22 17 7 2 12
# 4 21 16 6 1 11
# 5 23 18 8 3 13
This will not work well if the names are intended to be ordered numerically:
mat <- matrix(1:30, nr=3, dimnames=list(c('2',1,3), c('4',3,5,2,1,6,7,8,9,10)))
mat
# 4 3 5 2 1 6 7 8 9 10
# 2 1 4 7 10 13 16 19 22 25 28
# 1 2 5 8 11 14 17 20 23 26 29
# 3 3 6 9 12 15 18 21 24 27 30
mat[order(rownames(mat)),order(colnames(mat))]
# 1 10 2 3 4 5 6 7 8 9
# 1 14 29 11 5 2 8 17 20 23 26
# 2 13 28 10 4 1 7 16 19 22 25
# 3 15 30 12 6 3 9 18 21 24 27
Note the column order (1, 10, 2, ...): the names are compared as strings, so "10" sorts before "2". To order them numerically, convert the names first:
mat[order(as.numeric(rownames(mat))),order(as.numeric(colnames(mat)))]
# 1 2 3 4 5 6 7 8 9 10
# 1 14 11 5 2 8 17 20 23 26 29
# 2 13 10 4 1 7 16 19 22 25 28
# 3 15 12 6 3 9 18 21 24 27 30
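If this is needed in several places, the numeric version can be wrapped in a small helper. The function name sort_by_dimnames is made up for this sketch, and it assumes that all row and column names can be converted with as.numeric().

```r
# Hypothetical helper: order rows and columns by the numeric value of their names
sort_by_dimnames <- function(mat) {
  row_order <- order(as.numeric(rownames(mat)))
  col_order <- order(as.numeric(colnames(mat)))
  # drop = FALSE keeps a matrix even when there is a single row or column
  mat[row_order, col_order, drop = FALSE]
}

mat <- matrix(1:30, nrow = 3,
              dimnames = list(c("2", "1", "3"),
                              c("4", "3", "5", "2", "1", "6", "7", "8", "9", "10")))
sorted <- sort_by_dimnames(mat)
sorted
```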

Split into groups based on (multiple) conditions?

I have a set of marbles of different colors and weights, and I want to split them into groups based on their weight and color.
The conditions are:
A group cannot weigh more than 100 units
A group cannot have more than 5 different-colored marbles.
A reproducible example:
marbles <- data.frame(color=sample(1:20, 20), weight=sample(1:40, 20, replace=T))
color weight
1 1 22
2 15 33
3 13 35
4 11 13
5 6 26
6 8 15
7 10 3
8 16 22
9 14 21
10 3 16
11 4 26
12 20 30
13 9 31
14 2 16
15 7 12
16 17 13
17 19 19
18 5 17
19 12 12
20 18 40
And what I want is this group column:
color weight group
1 1 22 1
2 15 33 1
3 13 35 1
4 11 13 2
5 6 26 2
6 8 15 2
7 10 3 2
8 16 22 2
9 14 21 3
10 3 16 3
11 4 26 3
12 20 30 3
13 9 31 4
14 2 16 4
15 7 12 4
16 17 13 4
17 19 19 4
18 5 17 5
19 12 12 5
20 18 40 5
TIA.
The code below does not produce an optimal assignment to the groups; it just works sequentially through the data frame. It uses rowwise and is not vectorized, so it might not be the most efficient way.
library(dplyr)
marbles <- data.frame(color=sample(1:20, 20), weight=sample(1:40, 20, replace=T))
Below I create a rowwise function which we can apply using dplyr
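assign_group <- function(color, weight) {
  # State of the current group if this marble were added to it
  clists = append(color_list, color)
  sum_val = group_sum + weight
  # Count the colors including the new marble (clists, not color_list),
  # otherwise a sixth color could slip into a group
  num_colors = length(unique(clists))
  assign_condition = (sum_val <= 100 & num_colors <= 5)
  # Either extend the current group or start a new one with this marble
  cval <- if(assign_condition) clists else c(color)
  sval <- ifelse(assign_condition, sum_val, weight)
  gval <- ifelse(assign_condition, group_number, group_number + 1)
  # Store the updated state in the global tracking variables
  assign("color_list", cval, envir = .GlobalEnv)
  assign("group_sum", sval, envir = .GlobalEnv)
  assign("group_number", gval, envir = .GlobalEnv)
  # group_number now holds the updated global value
  res = group_number
  return(res)
}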
I then set up a few global variables to track the allocation of the marbles to each group. These must be (re)initialised before every run.
# globals
color_list <<- c()
group_sum <<- 0
group_number <<- 1
Finally run this function using mutate
test <- marbles %>% rowwise() %>% mutate(group = assign_group(color,weight)) %>% data.frame()
Which results in the below
color weight group
1 6 27 1
2 12 16 1
3 15 32 1
4 20 25 1
5 19 5 2
6 2 21 2
7 16 39 2
8 17 4 2
9 11 16 2
10 7 7 3
11 10 5 3
12 1 30 3
13 13 7 3
14 9 39 3
15 14 7 4
16 8 17 4
17 18 9 4
18 4 36 4
19 3 1 4
20 5 3 5
And seems to meet the constraints
test %>% group_by(group) %>% summarise(tot_w = sum(weight), n_c = length(unique(color)) )
group tot_w n_c
<dbl> <int> <int>
1 1 100 4
2 2 85 5
3 3 88 5
4 4 70 5
5 5 3 1
In base R you could write a recursive function as shown below:
create_group = function(df,a){
  if(missing(a)) a = cumsum(df$weight)%/%100
  b = !ave(df$color,a,FUN=seq_along)%%6
  d = ave(df$weight,a+b,FUN=cumsum)>100
  a = a+b+d
  if (any(b|d)) create_group(df,a) else cbind(df,group = a+1)
}
create_group(marbles)  # marbles: the data frame shown in the question
color weight group
1 1 22 1
2 15 33 1
3 13 35 1
4 11 13 2
5 6 26 2
6 8 15 2
7 10 3 2
8 16 22 2
9 14 21 3
10 3 16 3
11 4 26 3
12 20 30 3
13 9 31 4
14 2 16 4
15 7 12 4
16 17 13 4
17 19 19 4
18 5 17 5
19 12 12 5
20 18 40 5
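The sequential rule used by both answers (start a new group as soon as adding the next marble would exceed 100 units or 5 colors) can also be sketched as a plain loop without global variables. The function name assign_groups and its parameters are made up for this illustration, and like the answers above it is greedy, not optimal. It also assumes no single marble is heavier than the limit.

```r
# Hypothetical helper: greedy, sequential group assignment
assign_groups <- function(color, weight, max_weight = 100, max_colors = 5) {
  group <- integer(length(weight))
  current <- 1L            # current group number
  colors_seen <- color[0]  # colors already in the current group (empty)
  total <- 0               # weight already in the current group
  for (i in seq_along(weight)) {
    new_colors <- union(colors_seen, color[i])
    # Start a new group if this marble would break either constraint
    if (total + weight[i] > max_weight || length(new_colors) > max_colors) {
      current <- current + 1L
      new_colors <- color[i]
      total <- 0
    }
    colors_seen <- new_colors
    total <- total + weight[i]
    group[i] <- current
  }
  group
}

# Data copied from the table in the question
marbles <- data.frame(
  color  = c(1,15,13,11,6,8,10,16,14,3,4,20,9,2,7,17,19,5,12,18),
  weight = c(22,33,35,13,26,15,3,22,21,16,26,30,31,16,12,13,19,17,12,40)
)
marbles$group <- assign_groups(marbles$color, marbles$weight)
marbles
```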

How to merge two data frames by ranges in R?

Suppose I have two data frames such as these:
set.seed(123)
df0<-data.frame(pos=3:12,
count0=rbinom(10, 50, 0.5),
count2=rbinom(10, 20, 0.5))
df0
pos count0 count2
1 3 23 14
2 4 28 10
3 5 24 11
4 6 29 10
5 7 30 7
6 8 19 13
7 9 25 8
8 10 29 6
9 11 25 9
10 12 25 14
df1<-data.frame(start=c(4, 7, 11, 14),
end=c(6, 9, 12, 15),
cnv=c(1, 2, 3, 4))
df1
start end cnv
1 4 6 1
2 7 9 2
3 11 12 3
4 14 15 4
What I want is to merge df0 and df1 by matching df0$pos against the ranges given by df1$start and df1$end. If pos falls into a range start:end, cnv should be filled in from df1; otherwise cnv should be set to zero. The output for the above example would be:
pos count0 count2 cnv
1 3 23 14 0
2 4 28 10 1
3 5 24 11 1
4 6 29 10 1
5 7 30 7 2
6 8 19 13 2
7 9 25 8 2
8 10 29 6 0
9 11 25 9 3
10 12 25 14 3
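We can use sapply to check, for each pos, whether it falls inside any range of df1; if it does we take the matching cnv, otherwise we return 0. This assumes the ranges do not overlap, so that at most one range matches each pos.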
df0$cnv <- sapply(df0$pos, function(x) {
inds <- x >= df1$start & x <= df1$end
if (any(inds))
df1$cnv[inds]
else 0
})
df0
# pos count0 count2 cnv
#1 3 23 14 0
#2 4 28 10 1
#3 5 24 11 1
#4 6 29 10 1
#5 7 30 7 2
#6 8 19 13 2
#7 9 25 8 2
#8 10 29 6 0
#9 11 25 9 3
#10 12 25 14 3
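For larger inputs, a vectorised variant can be sketched with findInterval(). This is an alternative, not the approach above, and it relies on two assumptions: df1 is sorted by start and its ranges do not overlap. The helper names idx, end_of_match and cnv_of_match are made up for the example.

```r
# Only the columns needed for the lookup are recreated here
df0 <- data.frame(pos = 3:12)
df1 <- data.frame(start = c(4, 7, 11, 14),
                  end   = c(6, 9, 12, 15),
                  cnv   = c(1, 2, 3, 4))

# Index of the last range whose start is <= pos (0 if pos is before all starts)
idx <- findInterval(df0$pos, df1$start)

# Look up end and cnv of that candidate range; the extra first element
# (-Inf and 0) handles idx == 0
end_of_match <- c(-Inf, df1$end)[idx + 1]
cnv_of_match <- c(0, df1$cnv)[idx + 1]

# Keep cnv only when pos is also within the candidate range
df0$cnv <- ifelse(df0$pos <= end_of_match, cnv_of_match, 0)
df0
```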

zoo - Round coredata to integer

I've got a list of 69 zoo objects. I used na.approx to fill a few gaps, but since my time series contain counts, I need the imputed values to be integers.
This code
list_int <- lapply(list_dec, round(coredata(list_dec), digits=0))
gives me the following error message
Error in round(coredata(list_dec), digits=0) :
non-numeric argument to mathematical function
I thought it was a problem with applying the function to a list instead of a vector, but the function
coredata(list_dec)
correctly shows all 69 time series (without need for lapply).
So why can't round apply to coredata?
EDITED
As suggested here's a minimal data set
A1 <- runif(20, min=-5, max=13)
A2 <- runif(20, min=-1, max=5)
A3 <- runif(20, min=-3, max=10)
A4 <- runif(20, min=0, max=2)
ls <- list(A1, A2, A3, A4)
list_dec <- lapply(ls, as.zoo)
As discussed in the comments, you can accomplish what you want by the following:
> library(zoo)
> A1 <- runif(20, min=-5, max=13)
> A2 <- runif(20, min=-1, max=5)
> A3 <- runif(20, min=-3, max=10)
> A4 <- runif(20, min=0, max=2)
> ls <- list(A1, A2, A3, A4)
> list_dec <- lapply(ls, as.zoo)
Now list_dec looks as follows:
> list_dec
[[1]]
1 2 3 4 5 6 7 8 9 10 11 12 13
9.20889929 8.03050882 1.52621137 9.91528049 12.71637959 11.93573340 3.34967427 9.75224030 7.90654714 0.08199464 -2.84403691 11.57990103 4.74868873
14 15 16 17 18 19 20
2.94023319 10.71812525 -2.05394366 -1.07669056 7.17503613 4.84871327 4.58929978
[[2]]
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1.0756646 0.5615212 0.5697795 0.9629726 2.5962021 3.1932062 0.6894849 1.9844943 1.3351256 4.0043998 0.4756172 0.4573920 0.6009208 4.4963877 4.4149804
16 17 18 19 20
3.7762369 2.9670795 -0.8241576 2.1796402 2.5504061
[[3]]
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1.5765136 1.7310402 0.7273943 4.0838831 -0.9946958 -2.0222258 7.5756159 3.9105252 3.9006369 -0.9939739 4.7603811 8.5079521 3.3653795 0.8546201 3.8143874
16 17 18 19 20
5.0847501 -2.6324485 2.0860695 5.7202315 9.5304238
[[4]]
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1.36751418 1.44009472 1.41155170 1.55018689 1.31378442 1.09746739 0.09224919 0.66425731 0.61047787 1.63552109 1.56096710 1.59775494 1.69658733 1.08939868 1.96183397
16 17 18 19 20
1.20476936 0.94640977 0.73820689 0.65899943 1.54647028
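Now you can call lapply directly, passing the function round itself rather than the result of calling it (in the original attempt, round(coredata(list_dec), digits=0) is evaluated on the whole list before lapply runs, which is what raises the error):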
lapply(list_dec,round)
which gives you the desired output:
[[1]]
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
9 8 2 10 13 12 3 10 8 0 -3 12 5 3 11 -2 -1 7 5 5
[[2]]
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 1 1 1 3 3 1 2 1 4 0 0 1 4 4 4 3 -1 2 3
[[3]]
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
2 2 1 4 -1 -2 8 4 4 -1 5 9 3 1 4 5 -3 2 6 10
[[4]]
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 1 1 2 1 1 0 1 1 2 2 2 2 1 2 1 1 1 1 2
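The difference between passing a function and passing a call can be seen without zoo. The sketch below uses plain numeric vectors as a stand-in for the zoo objects (an assumption made only so that it runs in base R); extra arguments such as digits are simply forwarded by lapply.

```r
set.seed(1)
# Stand-in for the list of zoo series: a list of plain numeric vectors
list_dec <- list(runif(5, min = -5, max = 13), runif(5, min = 0, max = 2))

# Failing pattern: round() is called on the list itself before lapply runs
res <- tryCatch(
  lapply(list_dec, round(list_dec, digits = 0)),
  error = function(e) conditionMessage(e)
)
res  # the error message, captured as a string

# Working pattern 1: pass the function, lapply forwards digits = 0
list_int <- lapply(list_dec, round, digits = 0)

# Working pattern 2: an anonymous function, useful when more steps are needed
list_int2 <- lapply(list_dec, function(x) round(x, digits = 0))
```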
