Merging Overlapping Intervals in R [duplicate]

This question already has answers here:
Overlap join with start and end positions
(5 answers)
Merge overlapping ranges into unique groups, in dataframe
(2 answers)
Collapse rows with overlapping ranges
(5 answers)
Closed 4 years ago.
I receive information on ranges of occupied cells. A single test can have several start/end entries, and those ranges can overlap. Not every "test" has entries.
I have a data frame in R and want to merge the overlapping ranges for each "test" into unique ranges.
x<-data.frame(test=c(2,3,3,2,3,4),start=c(1,1,1,2,3,4),end=c(1,2,3,3,4,4))
> x
  test start end
1    2     1   1
2    3     1   2
3    3     1   3
4    2     2   3
5    3     3   4
6    4     4   4
I would like to transform this data frame into:
  test start end
1    2     1   1
2    2     2   3
3    3     1   4
4    4     4   4
In the end I just want to know how many cells are occupied by the ranges for each "test" (row): test 2 has (1,1) and (2,3), which means 3 cells; test 3 has (1,4), so 4 cells; test 4 has (4,4), so 1 cell. Tests with no entries (test 1, or 5 to n) occupy 0 cells. With the merged data frame stored as y:
# y is the merged data frame shown above
y<-data.frame(test=c(2,2,3,4),start=c(1,2,1,4),end=c(1,3,4,4))
u<-unique(y[,1])
a<-rep(0,length(u))
for(i in seq_along(u)){
  rows<-which(y[,1]==u[i])
  # each range covers end - start + 1 cells
  a[i]<-sum(y[rows,3]-y[rows,2])+length(rows)
}
> a
[1] 3 4 1
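The merge step itself is not shown above. A minimal base R sketch of one way to do it follows, assuming that ranges are merged only when they share at least one cell (so (1,1) and (2,3) stay separate, as in the desired output). The function name merge_intervals is hypothetical, not from the question.

```r
x <- data.frame(test  = c(2, 3, 3, 2, 3, 4),
                start = c(1, 1, 1, 2, 3, 4),
                end   = c(1, 2, 3, 3, 4, 4))

# Hypothetical helper: merge overlapping ranges within each test.
merge_intervals <- function(df) {
  df <- df[order(df$test, df$start, df$end), ]
  pieces <- lapply(split(df, df$test), function(g) {
    # largest end seen in the rows before the current one
    prev_max_end <- c(-Inf, head(cummax(g$end), -1))
    # a new merged range begins when start lies beyond every earlier end
    grp <- cumsum(g$start > prev_max_end)
    data.frame(test  = g$test[1],
               start = as.vector(tapply(g$start, grp, min)),
               end   = as.vector(tapply(g$end, grp, max)))
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

merged <- merge_intervals(x)
# occupied cells per test: each merged range covers end - start + 1 cells
cells <- tapply(merged$end - merged$start + 1, merged$test, sum)
```

Here merged should reproduce the desired data frame, and cells holds the per-test counts without an explicit loop. To merge ranges that merely touch, the comparison could be changed to g$start > prev_max_end + 1.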

Related

increasing value by one with each occurrence of non-repeated number [duplicate]

This question already has answers here:
Increment by 1 for every change in column
(6 answers)
Closed 2 years ago.
v <- c(1,1,2,3,3,3,1,1,3,4,4)
I'm trying to create a vector in which the value increases by one every time the number differs from the previous element, and stays the same while the number repeats.
This is the desired output:
1,1,2,3,3,3,4,4,5,6,6
What would be an efficient way of doing this?
A base R option with rle
> with(rle(v),rep(seq_along(values),lengths))
[1] 1 1 2 3 3 3 4 4 5 6 6
or data.table::rleid
> data.table::rleid(v)
[1] 1 1 2 3 3 3 4 4 5 6 6
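To see why the rle approach works, the sketch below spells out the intermediate values. The variable names r, run_id and run_id2 are only illustrative.

```r
v <- c(1, 1, 2, 3, 3, 3, 1, 1, 3, 4, 4)

r <- rle(v)
r$values   # the value of each run:  1 2 3 1 3 4
r$lengths  # the length of each run: 2 1 3 2 1 2

# number the runs 1, 2, 3, ... and repeat each number for the length of its run
run_id <- rep(seq_along(r$values), r$lengths)

# an equivalent without rle: count the positions where the value changes
run_id2 <- cumsum(c(TRUE, v[-1] != v[-length(v)]))
```

Both run_id and run_id2 give 1 1 2 3 3 3 4 4 5 6 6 for this input.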

Go through a column and collect a running total in new column [duplicate]

This question already has answers here:
Creation of a specific vector without loop or recursion in R
(2 answers)
Split data.frame by value
(2 answers)
Closed 4 years ago.
I have a dataframe whose rows represent people. For a given family, the first row has the value 1 in column A, and all following rows contain members of the same family until another row has the value 1 in column A. Then, a new family starts.
I would like to assign IDs to all families in my dataset. In other words, I would like to take:
A
1
2
3
1
3
3
1
4
And turn it into:
A family_id
1 1
2 1
3 1
1 2
3 2
3 2
1 3
4 3
I'm playing with a dataframe of 3 million rows, so a simple for-loop solution I came up with falls short of necessary efficiency. Also, the family_id need not be sequential.
I'll take a dplyr solution.
data:
df <- data.frame(A = c(1:3,1,3,3,1,4))
code:
df$family_id <- cumsum(c(-1,diff(df$A)) < 0)
result:
#   A family_id
# 1 1         1
# 2 2         1
# 3 3         1
# 4 1         2
# 5 3         2
# 6 3         2
# 7 1         3
# 8 4         3
Please note:
This solution starts a new group whenever a number is smaller than the previous one.
If it is 100% certain that a new group always begins with a 1, then ronak's solution is perfect.
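The other answer mentioned above is not reproduced here. As an assumption about what such a solution could look like when every family starts with a 1, a minimal sketch is to count the 1s seen so far:

```r
df <- data.frame(A = c(1:3, 1, 3, 3, 1, 4))

# assumption: every family begins with a row where A == 1,
# so the running count of such rows identifies the family
df$family_id <- cumsum(df$A == 1)
```

Unlike the diff-based version, this does not split a family whose values decrease inside the group (for example 1, 3, 2), but it relies entirely on the leading 1 being present.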

Tallying values in single column and separating into Rows in R [duplicate]

This question already has answers here:
Counting the number of elements with the values of x in a vector
(20 answers)
Closed 6 years ago.
I have a single column of numbers. I'm wondering how I can separate it out so that the output has one column per distinct value, holding the tally of that value. I've tried playing around with "separate" but I can't figure out how to make it work.
Here's my data frame:
2
2
2
2
2
4
4
4
I'd like it to be
2 4
5 3
You can use the table() function.
> df
  V1
1  2
2  2
3  2
4  2
5  2
6  4
7  4
8  4
> table(df$V1)

2 4
5 3
We can also use tabulate, which may be faster:
tabulate(factor(df$V1))
#[1] 5 3
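A small self-contained sketch of the table() approach, assuming the column is called V1 as in the answer above. The names counts and long are only illustrative.

```r
df <- data.frame(V1 = c(2, 2, 2, 2, 2, 4, 4, 4))

counts <- table(df$V1)
names(counts)      # the distinct values, as strings: "2" "4"
as.vector(counts)  # how often each value occurs: 5 3

# long form: one row per distinct value, with columns Var1 and Freq
long <- as.data.frame(counts)
```

Note that tabulate() returns only the counts, so the distinct values have to be recovered separately, for example with levels(factor(df$V1)).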

Count specific data in a data frame & display [duplicate]

This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 6 years ago.
I have an exam data frame with student.id and marks columns:
student.id marks
2 2
2 2
2 -1
3 -1
3 -1
3 2
4 2
4 -1
5 2
5 2
5 2
How could I sum the total marks for each student.id, as in the table below?
student.id total-marks
2
3
4
5
How can I obtain the above table with the respective total marks? Thanks.
If your data.frame is called df, this should work:
aggregate(marks~student.id,df,FUN="sum")
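For a runnable version, the sketch below rebuilds the data from the question and applies the same aggregate() call. The column name total.marks is an assumption, chosen because total-marks is not a syntactic R name.

```r
df <- data.frame(
  student.id = c(2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5),
  marks      = c(2, 2, -1, -1, -1, 2, 2, -1, 2, 2, 2)
)

# one row per student.id, with the marks summed within each group
totals <- aggregate(marks ~ student.id, data = df, FUN = sum)
names(totals)[2] <- "total.marks"
totals
#   student.id total.marks
# 1          2           3
# 2          3           0
# 3          4           1
# 4          5           6
```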

Subtract from the previous row R [duplicate]

This question already has answers here:
How to find the difference in value in every two consecutive rows in R?
(4 answers)
Closed 7 years ago.
I have a dataframe like so:
df <- data.frame(start=c(5,4,2),end=c(2,6,3))
start end
5 2
4 6
2 3
And I want the following result:
start end diff
    5   2
    4   6    1
    2   3   -1
Essentially it is:
end[2] (second row) - start[1] = 6-5=1
and end[3] - start[2] = 3-4 = -1
What is a good way of doing this in R?
Just a simple vector subtraction should work
df$diff <- c(NA,df[2:nrow(df), 2] - df[1:(nrow(df)-1), 1])
  start end diff
1     5   2   NA
2     4   6    1
3     2   3   -1
Alternatively, with data.table:
library(data.table)
setDT(df)[,value:=end-shift(start,1,type="lag")]
   start end value
1:     5   2    NA
2:     4   6     1
3:     2   3    -1
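The same subtraction can also be written with column names rather than column positions. This is a minimal base R sketch, not taken from the answers above:

```r
df <- data.frame(start = c(5, 4, 2), end = c(2, 6, 3))

# end of rows 2..n minus start of rows 1..(n-1); the first row has no predecessor
df$diff <- c(NA, tail(df$end, -1) - head(df$start, -1))
```

For this data df$diff is NA, 1, -1, matching the results shown above.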
