Identify and replace duplicate elements in a vector - r

I have a vector as follows:
a<- c(1,1,1,2,3,2,2,2,2,1,0,0,0,0,2,3,4,4,1,1)
Here we can see that there are a lot of duplicate elements, i.e. values that are repeated.
I want code that replaces every element that is a consecutive duplicate with 0, keeping only the first element of each run. The result I require is
a<- c(1,0,0,2,3,2,0,0,0,1,0,0,0,0,2,3,4,0,1,0)
I've tried
unique(a)
#which gives
[1] 1 2 3 0 4

You can create a lagged series and compare
> a
[1] 1 1 1 2 3 2 2 2 2 1 0 0 0 0 2 3 4 4 1 1
> ifelse(a == c(a[1]-1, a[-length(a)]), 0, a)
[1] 1 0 0 2 3 2 0 0 0 1 0 0 0 0 2 3 4 0 1 0

replace(a, duplicated(c(0, cumsum(abs(diff(a))))), 0)
# [1] 1 0 0 2 3 2 0 0 0 1 0 0 0 0 2 3 4 0 1 0
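Both answers above flag an element when it equals its predecessor. A minimal sketch of that comparison written out explicitly, assuming the vector has at least one element (the names keep and result are only illustrative):

```r
a <- c(1,1,1,2,3,2,2,2,2,1,0,0,0,0,2,3,4,4,1,1)

# TRUE where an element starts a new run (the first element always does)
keep <- c(TRUE, a[-1] != a[-length(a)])

# keep run starts, replace the consecutive repeats with 0
result <- ifelse(keep, a, 0)
result
# [1] 1 0 0 2 3 2 0 0 0 1 0 0 0 0 2 3 4 0 1 0
```

Unlike unique(a), this only looks at neighbouring elements, so a value that reappears later in the vector is kept.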

Related

How to convert a binary data frame to a vector?

Suppose I have a data frame such like
dat<-data.frame('0'=c(1,1,0,0,0,0,0,0),
'1'=c(0,0,1,0,1,0,0,0),
'2'=c(0,0,0,1,0,0,1,1),
'3'=c(0,0,0,0,0,1,0,0))
dat
X0 X1 X2 X3
1 1 0 0 0
2 1 0 0 0
3 0 1 0 0
4 0 0 1 0
5 0 1 0 0
6 0 0 0 1
7 0 0 1 0
8 0 0 1 0
I want to convert it to a vector like 1,1,2,3,2,4,3,3, where each number is the index of the column that contains the 1 in that row. For example, the 4 means that in row 6 the 1 is in the 4th column.
Use
max.col(dat)
# [1] 1 1 2 3 2 4 3 3
Alternatively, we can use apply
apply(dat == 1, 1, which)
#[1] 1 1 2 3 2 4 3 3
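One caveat worth sketching: max.col() breaks ties at random by default, which does not matter here because each row has exactly one 1, but would matter for rows with several. A hedged variant that makes the tie rule explicit (the names m and idx are only illustrative):

```r
dat <- data.frame('0'=c(1,1,0,0,0,0,0,0),
                  '1'=c(0,0,1,0,1,0,0,0),
                  '2'=c(0,0,0,1,0,0,1,1),
                  '3'=c(0,0,0,0,0,1,0,0))

m <- as.matrix(dat)

# index of the first column holding the row maximum
idx <- max.col(m, ties.method = "first")
idx
# [1] 1 1 2 3 2 4 3 3

# sanity check for this data: every row contains exactly one 1
all(rowSums(m == 1) == 1)
# [1] TRUE
```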

summing all possible left to right diagonals along specified columns in a data frame by group?

Suppose I have something like this:
df<-data.frame(group=c(1, 1,2, 2, 2, 4,4,4,4,6,6,6),
binary1=c(1,0,1,0,0,0,0,0,0,0,0,0),
binary2=c(0,1,0,1,0,1,0,0,0,0,1,1),
binary3=c(0,0,0,0,1,0,1,0,0,0,0,0),
binary4=c(0,0,0,0,0,0,0,1,0,0,0,0))
I want to sum along all possible left-to-right diagonals within groups (i.e. groups 1, 2, 4 and 6) and return the max sum. The data is in a data frame, so I would like to restrict the sums to the columns binary1-binary4. Does anyone know if this is possible?
Here's my desired output:
group binary1 binary2 binary3 binary4 want
1 1 1 0 0 0 2
2 1 0 1 0 0 2
3 2 1 0 0 0 3
4 2 0 1 0 0 3
5 2 0 0 1 0 3
6 4 0 1 0 0 3
7 4 0 0 1 0 3
8 4 0 0 0 1 3
9 4 0 0 0 0 3
10 6 0 0 0 0 1
11 6 0 1 0 0 1
12 6 0 1 0 0 1
I have circled the "diagonals" I would like summed for group 4 in this image as an example:
Here is another solution where we use row and col indices to get all possible combinations of diagonals. Use by to split by group and merge it with original dataframe.
max_diag <- function(x) max(sapply(split(as.matrix(x), row(x) - col(x)), sum))
merge(df, stack(by(df[-1], df$group, max_diag)), by.x = "group", by.y = "ind")
# group binary1 binary2 binary3 binary4 values
#1 1 1 0 0 0 2
#2 1 0 1 0 0 2
#3 2 1 0 0 0 3
#4 2 0 1 0 0 3
#5 2 0 0 1 0 3
#6 4 0 1 0 0 3
#7 4 0 0 1 0 3
#8 4 0 0 0 1 3
#9 4 0 0 0 0 3
#10 6 0 0 0 0 1
#11 6 0 1 0 0 1
#12 6 0 1 0 0 1
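To see why row(x) - col(x) works as a diagonal index, here is a small sketch on the group 4 rows alone. Cells on the same left-to-right diagonal share the same row-minus-column difference, so splitting by that difference collects each diagonal (g4, diag_id and diag_sums are illustrative names):

```r
# binary1-binary4 for group 4, one row per observation
g4 <- matrix(c(0,1,0,0,
               0,0,1,0,
               0,0,0,1,
               0,0,0,0), nrow = 4, byrow = TRUE)

# every cell on one left-to-right diagonal has the same row - col value
diag_id <- row(g4) - col(g4)

# sum each diagonal; the names are the row - col differences
diag_sums <- sapply(split(g4, diag_id), sum)
diag_sums
# -3 -2 -1  0  1  2  3
#  0  0  3  0  0  0  0

max(diag_sums)
# [1] 3
```

The diagonal labelled -1 runs through (1,2), (2,3) and (3,4), which are the three 1s in group 4.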
You can split the data.frame and sum the diagonal using diag(). Once you have the diagonal sum per group, put it back into the data.frame by indexing with the group.
Group 4 should be zero? Or am I missing something:
DIAG = by(df[,-1],df$group,function(i)sum(diag(as.matrix(i))))
df$want = DIAG[as.character(df$group)]
If I understand your definition correctly, we define a function to calculate the sums of the diagonals starting at each column:
main_diag = function(m){
sapply(1:(ncol(m)-1),function(i)sum(diag(m[,i:ncol(m)])))
}
Thanks to @IceCreamToucan for correcting this. Then we consider the max of all main diagonals, and their transpose:
DIAG = by(df[,-1],df$group,function(i){
i = as.matrix(i)
max(main_diag(i),main_diag(t(i)))
})
df$want = DIAG[as.character(df$group)]
group binary1 binary2 binary3 binary4 want
1 1 1 0 0 0 2
2 1 0 1 0 0 2
3 2 1 0 0 0 3
4 2 0 1 0 0 3
5 2 0 0 1 0 3
6 4 0 1 0 0 3
7 4 0 0 1 0 3
8 4 0 0 0 1 3
9 4 0 0 0 0 3
10 6 0 0 0 0 1
11 6 0 1 0 0 1
12 6 0 1 0 0 1

How to reset cumsum at end of consecutive string [duplicate]

If I have the following vector:
x = c(1,1,1,0,0,0,0,1,1,0,0,1,1,1,0,0,1,1,1,1,0,0,0,0,1,1,1)
how can I calculate the cumulative sum for all of the consecutive 1's, resetting each time I hit a 0?
So, the desired output would look like this:
> y
[1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3
This works:
unlist(lapply(rle(x)$lengths, FUN = function(z) 1:z)) * x
# [1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3
It relies pretty heavily on your special case of only having 1s and 0s, but for that case it works great! Even better, with @nicola's suggested improvements:
sequence(rle(x)$lengths) * x
# [1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3
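If the vector can hold values other than 0 and 1, a possible variant is to start a new group at every 0 and take the cumulative sum inside each group. This is only a sketch, assuming that 0 is the reset marker; x2 is a made-up example:

```r
x <- c(1,1,1,0,0,0,0,1,1,0,0,1,1,1,0,0,1,1,1,1,0,0,0,0,1,1,1)

# cumsum(x == 0) increases at every 0, so each 0 opens a new group
y <- ave(x, cumsum(x == 0), FUN = cumsum)
y
# [1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3

# the same call on non-binary input (hypothetical data)
x2 <- c(2, 3, 0, 5, 1)
ave(x2, cumsum(x2 == 0), FUN = cumsum)
# [1] 2 5 0 5 6
```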
I read this post about how to split a vector, and used splitAt2 by @Calimo.
So it's like this:
splitAt2 <- function(x, pos) {
out <- list()
pos2 <- c(1, pos, length(x)+1)
for (i in seq_along(pos2[-1])) {
out[[i]] <- x[pos2[i]:(pos2[i+1]-1)]
}
return(out)
}
x = c(1,1,1,0,0,0,0,1,1,0,0,1,1,1,0,0,1,1,1,1,0,0,0,0,1,1,1)
where_split = which(x == 0)
x_split = splitAt2(x, where_split)
unlist(sapply(x_split, cumsum))
# [1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3
Here is another option
library(data.table)
ave(x, rleid(x), FUN=seq_along)*x
#[1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3
Or without any packages
ave(x, cumsum(c(TRUE, x[-1]!= x[-length(x)])), FUN=seq_along)*x
#[1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3

Consecutive group number in R

This problem is very similar to Consecutive value after column value change in R
So for
SOG <- c(4,4,0,0,0,3,4,5,0,0,1,2,0,0,0)
the difference is that now I'd like to count how many groups of SOG there are. For example:
SOG Trips
--- -----
4 1
4 1
0 0
0 0
0 0
3 2
4 2
5 2
0 0
0 0
1 3
2 3
0 0
0 0
0 0
Anyone?
Assuming you mean a "group of SOG" is a set of consecutive non-zero SOG values, i.e. starts with a non-zero SOG value and ends with a non-zero SOG value (not necessarily the same value):
Trips <- ifelse(SOG>0, cumsum(c(SOG[1]>0, diff(SOG>0)) == 1), 0)
# [1] 1 1 0 0 0 2 2 2 0 0 3 3 0 0 0
This is one option:
replace(cumsum(c(SOG[1], abs(diff(SOG))) == SOG & SOG != 0), SOG == 0, 0)
# [1] 1 1 0 0 0 2 2 2 0 0 3 3 0 0 0
You can try my TrueSeq function from my GitHub-only "SOfun" package.
Usage would be:
library(SOfun)
TrueSeq(as.logical(SOG))
# [1] 1 1 0 0 0 2 2 2 0 0 3 3 0 0 0
To get the inverse, just negate the as.logical step:
TrueSeq(!as.logical(SOG))
# [1] 0 0 1 1 1 0 0 0 2 2 0 0 3 3 3
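If installing a GitHub-only package is not an option, the same numbering can be sketched in base R with rle() and inverse.rle(), assuming (as above) that a trip is a run of SOG values greater than 0:

```r
SOG <- c(4,4,0,0,0,3,4,5,0,0,1,2,0,0,0)

# run-length encode the moving / not-moving indicator
r <- rle(SOG > 0)

# number the TRUE runs 1, 2, 3, ... and label the FALSE runs 0
r$values <- ifelse(r$values, cumsum(r$values), 0)

# expand the runs back to the original length
Trips <- inverse.rle(r)
Trips
# [1] 1 1 0 0 0 2 2 2 0 0 3 3 0 0 0
```

Replacing SOG > 0 with SOG == 0 numbers the zero runs instead.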

list all permutations of k numbers, taken from 0:k, that sums to k

This question is closely related to another question, R:sample(). I want to find a way in R to list all the permutations of k numbers that sum to k, where each number is chosen from 0:k. If k=7, I can choose 7 numbers from 0,1,...,7. A feasible solution is then 0,1,2,3,1,0,0; another is 1,1,1,1,1,1,1. I don't want to generate all permutations first, since the number explodes as soon as k is even slightly larger than 7.
Of course in the k=7 example I could use the following:
perms7<-matrix(numeric(7*1716),ncol=7)
count=0
for(i in 0:7)
for(j in 0:(7-i))
for(k in 0:(7-i-j))
for(l in 0:(7-i-j-k))
for(n in 0:(7-i-j-k-l))
for(m in 0:(7-i-j-k-l-n)){
res<-7-i-j-k-l-n-m
count<-count+1
perms7[count,]<-c(i,j,k,l,n,m,res)
}
head(perms7,10)
But how can I generalize this approach to account for any k without having to write (k-1) loops?
I tried to come up with a recursive scheme:
perms7<-matrix(numeric(7*1716),ncol=7) #store solutions (adjustable size later)
k<-7 #size of interest
d<-0 #depth
count=0 #count of permutations
rec<-function(j,d,a){
a<-a-j #max loop
d<-d+1 #depth (posistion)
for(i in 0:a ) {
if(d<(k-1)) rec(i,d,a)
count<<-count+1
perms7[count,d]<<-i
perms7[count,k]<<-k-sum(perms7[count,-k])
}
}
rec(0,0,k)
But I got stuck, and I'm not quite sure this is the right way to go. I wonder if there is any "magic" R function that neatly handles this (admittedly very specific) problem, or at least part of it.
In the k=7 case, all 2,097,152 permutations, and the 1,716 of them that sum to k=7, can be found by:
library(gtools)
k=7
perms <- permutations(k+1, k, 0:k, repeats.allowed=TRUE) #all permutations
perms.k <- perms[rowSums(perms) == k,] #permutations which sum to k
For k=8 there are 43,046,721 permutations, but I only want to list the 6,435 that sum to 8.
Any help is greatly appreciated!
There's a package for that...
require( partitions )
parts(7)
#[1,] 7 6 5 5 4 4 4 3 3 3 3 2 2 2 1
#[2,] 0 1 2 1 3 2 1 3 2 2 1 2 2 1 1
#[3,] 0 0 0 1 0 1 1 1 2 1 1 2 1 1 1
#[4,] 0 0 0 0 0 0 1 0 0 1 1 1 1 1 1
#[5,] 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1
#[6,] 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
#[7,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
You appear to be looking for compositions(). e.g. for k=4:
parts(4)
#[1,] 4 3 2 2 1
#[2,] 0 1 2 1 1
#[3,] 0 0 0 1 1
#[4,] 0 0 0 0 1
compositions(4,4)
#[1,] 4 3 2 1 0 3 2 1 0 2 1 0 1 0 0 3 2 1 0 2 1 0 1 0 0 2 1 0 1 0 0 1 0 0 0
#[2,] 0 1 2 3 4 0 1 2 3 0 1 2 0 1 0 0 1 2 3 0 1 2 0 1 0 0 1 2 0 1 0 0 1 0 0
#[3,] 0 0 0 0 0 1 1 1 1 2 2 2 3 3 4 0 0 0 0 1 1 1 2 2 3 0 0 0 1 1 2 0 0 1 0
#[4,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 4
And just to check your math... :-)
ncol(compositions(8,8))
#[1] 6435
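For completeness, the nested loops from the question can also be generalized without any package. The following is only a sketch of a recursive approach (compositions_rec is a hypothetical helper, not part of partitions): fix the first number, then recurse on the remaining total with one slot fewer. It returns one solution per row, whereas compositions() returns one per column.

```r
# all ways to write `total` as an ordered sum of `parts` non-negative integers
compositions_rec <- function(total, parts) {
  # base case: a single slot must take whatever is left
  if (parts == 1) return(matrix(total, nrow = 1))
  rows <- lapply(0:total, function(first) {
    # prepend `first` to every composition of the remainder
    cbind(first, compositions_rec(total - first, parts - 1))
  })
  unname(do.call(rbind, rows))
}

res <- compositions_rec(7, 7)
nrow(res)
# [1] 1716
all(rowSums(res) == 7)
# [1] TRUE

# the count matches the stars-and-bars formula choose(2k - 1, k - 1)
choose(13, 6)
# [1] 1716
```

This avoids building the full set of permutations first, but it will be slower than compositions() for larger k, since it binds many small matrices together.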
