How can I check the occurrences of the values for each individual in R?

Let's say I have a data.frame that looks like this:
ID B
1 1
1 2
1 1
1 3
2 2
2 2
2 2
2 2
3 2
3 10
3 2
Now I want to count the occurrences of B under each ID. For example, for ID 1, the value 1 occurs twice and the values 2 and 3 occur once each; for ID 2, only the value 2 occurs, 4 times. How should I accomplish this? I tried to use table in ddply, but somehow it did not work. Thanks.

It seems like you may just want a table
> table(dat)
##    B
## ID  1 2 3 10
##   1 2 1 1  0
##   2 0 4 0  0
##   3 0 2 0  1
Then the following shows that for ID equal to 1, there are two 1s, one 2, and one 3.
> table(dat)[1, ]
##  1  2  3 10
##  2  1  1  0

And here's an aggregate solution:
> with(data, aggregate(B, list(ID=ID, B=B), length))
ID B x
1 1 1 2
2 1 2 1
3 2 2 4
4 3 2 2
5 1 3 1
6 3 10 1

Here's an approach using "dplyr" (if I understood your question correctly):
library(dplyr)
# %>% replaces the old %.% operator, which has been removed from dplyr
mydf %>% group_by(ID, B) %>% summarise(count = n())
# Source: local data frame [6 x 3]
# Groups: ID
#
# ID B count
# 1 1 1 2
# 2 1 2 1
# 3 1 3 1
# 4 2 2 4
# 5 3 2 2
# 6 3 10 1
In "plyr", I guess it would be something like:
library(plyr)
ddply(mydf, .(ID, B), summarise, count = length(B))
In base R, you could do something like the following and just remove the rows with 0:
data.frame(table(mydf))
# ID B Freq
# 1 1 1 2
# 2 2 1 0
# 3 3 1 0
# 4 1 2 1
# 5 2 2 4
# 6 3 2 2
# 7 1 3 1
# 8 2 3 0
# 9 3 3 0
# 10 1 10 0
# 11 2 10 0
# 12 3 10 1
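Dropping the zero-count rows mentioned above could look like the following sketch. It rebuilds the example data under the name mydf so that it runs on its own; the name counts is only illustrative. Note that table() turns ID and B into factors.

```r
# Example data from the question, rebuilt so the snippet is self-contained
mydf <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  B  = c(1, 2, 1, 3, 2, 2, 2, 2, 2, 10, 2)
)

# Cross-tabulate, convert to long format, then keep only observed combinations
counts <- as.data.frame(table(mydf))
counts <- counts[counts$Freq > 0, ]

# Optional: sort by ID, then B (both are factors at this point)
counts <- counts[order(counts$ID, counts$B), ]
counts
```

If numeric columns are needed afterwards, something like as.numeric(as.character(counts$B)) converts the factor back.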

And the data.table solution, because there must be one:
data[, .N, by=c('ID','B')]
The above won't work if you try to apply it to a data.frame. It must be converted to a data.table first. With more recent versions of "data.table", this is most easily done with setDT (as recommended by David in the comments):
library(data.table)
setDT(data)[, .N, by=c('ID', 'B')]

Related

How to keep only first value in every sequence of duplicated values in R [duplicate]

I am trying to create a subset where I keep the first value in each sequence of numbers in a column. I tried to use:
df %>% group_by(x) %>% slice_head(n = 1)
But this keeps only the first occurrence of each distinct value of x, not the first row of every run.
An example data where x column contains the repeated sequence can be seen below:
x = c(2,2,2,3,3,3,1,1,1,5,5,5,2,2,2,1,1,1,3,3,3)
y = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
df= data.frame(x,y)
> df
x y
1 2 1
2 2 1
3 2 1
4 3 1
5 3 1
6 3 1
7 1 1
8 1 1
9 1 1
10 5 1
11 5 1
12 5 1
13 2 1
14 2 1
15 2 1
16 1 1
17 1 1
18 1 1
19 3 1
20 3 1
21 3 1
So the end result that I would like to achieve is:
x = c(2,3,1,5,2,1,3)
y = c(1,1,1,1,1,1,1)
df= data.frame(x,y)
> df
x y
1 2 1
2 3 1
3 1 1
4 5 1
5 2 1
6 1 1
7 3 1
Could you please help or point me to any useful existing topics as I haven't managed to find it?
Thanks
You can try rleid from package data.table
> library(data.table)
> setDT(df)[!duplicated(rleid(x))]
x y
1: 2 1
2: 3 1
3: 1 1
4: 5 1
5: 2 1
6: 1 1
7: 3 1
Base R.
df[c(1, diff(df$x)) != 0, ]
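The diff() one-liner above relies on x being numeric. A minimal sketch of the same idea that compares each value with its predecessor directly, and so should also work for character columns, is shown below. It assumes x contains no missing values; run_start and first_rows are illustrative names.

```r
# Example data from the question (y is constant, as in the original)
x  <- c(2, 2, 2, 3, 3, 3, 1, 1, 1, 5, 5, 5, 2, 2, 2, 1, 1, 1, 3, 3, 3)
df <- data.frame(x, y = 1)

# TRUE wherever a value differs from the one before it;
# the first row always starts a run
run_start <- c(TRUE, df$x[-1] != df$x[-nrow(df)])

# Keep only the first row of every run
first_rows <- df[run_start, ]
first_rows
```

The original row names (1, 4, 7, ...) are kept, which makes it easy to see where each run began.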
Or also with helper functions from data.table.
library(data.table)
df[rowid(rleid(df$x)) == 1L, ]
# x y
# 1 2 1
# 4 3 1
# 7 1 1
# 10 5 1
# 13 2 1
# 16 1 1
# 19 3 1
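Using rle and match. Note that match returns the position of the first occurrence of each value, so later runs are mapped back to earlier rows (hence the row names 1.1, 7.1 and 4.1 below). The result matches the desired output here only because y is identical in every row.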
df[match(with(rle(df$x), values), df$x), ]
# x y
# 1 2 1
# 4 3 1
# 7 1 1
# 10 5 1
# 1.1 2 1
# 7.1 1 1
# 4.1 3 1
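If the other columns differ between runs, the start position of each run can be computed from the run lengths instead of using match. The following is a sketch under that assumption; y is changed to the row index here purely to make the difference visible, and starts and res are illustrative names.

```r
# Same x as in the question, but y now records the row index
x  <- c(2, 2, 2, 3, 3, 3, 1, 1, 1, 5, 5, 5, 2, 2, 2, 1, 1, 1, 3, 3, 3)
df <- data.frame(x, y = seq_along(x))

r <- rle(df$x)

# A run starts at 1, and each later run starts right after the previous one ends
starts <- cumsum(c(1, head(r$lengths, -1)))

# First row of every run, taken from its true position
res <- df[starts, ]
res
```

With this data the fifth row of res has y equal to 13, whereas the match-based version would return row 1 again.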

R assign order references by other column value [duplicate]

I am trying to obtain a sequence within category.
My data are:
A B
1 1
1 2
1 2
1 3
1 3
1 3
1 4
1 4
and I want to get a variable "C" so that my data look like:
A B C
1 1 1
1 2 1
1 2 2
1 3 1
1 3 2
1 3 3
1 4 1
1 4 2
Use ave with seq_along:
> mydf$C <- with(mydf, ave(A, A, B, FUN = seq_along))
> mydf
A B C
1 1 1 1
2 1 2 1
3 1 2 2
4 1 3 1
5 1 3 2
6 1 3 3
7 1 4 1
8 1 4 2
If your data are already ordered (as they are in this case), you can also use sequence with rle (mydf$C <- sequence(rle(do.call(paste, mydf))$lengths)), but you don't have that limitation with ave.
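To illustrate the ordering caveat, here is a small sketch with hypothetical unordered data (not from the question): ave keeps counting within each A/B combination wherever its rows appear, while the rle-based version restarts at every break.

```r
# Hypothetical data in which the B == 2 rows are not adjacent
mydf <- data.frame(A = 1, B = c(2, 1, 2, 3, 2))

# ave: running counter within each A/B combination, regardless of row order
mydf$C <- with(mydf, ave(A, A, B, FUN = seq_along))

# rle: counts only within contiguous runs, so every row here gets 1
rle_c <- sequence(rle(do.call(paste, mydf[c("A", "B")]))$lengths)

mydf$C   # 1 1 2 1 3
rle_c    # 1 1 1 1 1
```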
If you're a data.table fan, you can make use of .N as follows:
library(data.table)
DT <- data.table(mydf)
DT[, C := sequence(.N), by = c("A", "B")]
DT
# A B C
# 1: 1 1 1
# 2: 1 2 1
# 3: 1 2 2
# 4: 1 3 1
# 5: 1 3 2
# 6: 1 3 3
# 7: 1 4 1
# 8: 1 4 2

How to split a data frame by a factor so that non-contiguous rows end up in different groups

For example:
> a <- 1:10
> c <- c(1,1,1,0,0,0,1,1,1,0)
> dt <- data.frame(a,c)
> dt
a c
1 1 1
2 2 1
3 3 1
4 4 0
5 5 0
6 6 0
7 7 1
8 8 1
9 9 1
10 10 0
I want the data to be separated into 4 groups by c:
The first group:
a c
1 1 1
2 2 1
3 3 1
The second one:
a c
1 4 0
2 5 0
3 6 0
The third one:
a c
1 7 1
2 8 1
3 9 1
The fourth one:
a c
1 10 0
We can use rleid from data.table to create a grouping variable and use that to split the 'dt' into a list of data.frames.
library(data.table)
split(dt, rleid(dt$c))
Or, as @ZheyuanLi mentioned, rle from base R can be used to create the grouping variable:
split(dt, with(rle(dt$c), rep(seq_along(values), lengths)))
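As a self-contained sketch of the base R variant, the snippet below builds the run identifier step by step and then resets the row names so that each piece is numbered from 1, as in the desired output. The names run_id and groups are illustrative.

```r
# Example data from the question
dt <- data.frame(a = 1:10, c = c(1, 1, 1, 0, 0, 0, 1, 1, 1, 0))

# One identifier per contiguous run of c: 1 1 1 2 2 2 3 3 3 4
run_id <- with(rle(dt$c), rep(seq_along(values), lengths))

# Split into a list of data frames, one per run
groups <- split(dt, run_id)

# split() keeps the original row names; reset them within each piece
groups <- lapply(groups, function(g) {
  rownames(g) <- NULL
  g
})

groups
```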

Carry Forward First Observation for a Variable For Each Patient

My dataset has 3 variables:
Patient ID Outcome Duration
1 1 3
1 0 4
1 0 5
2 0 2
3 1 1
3 1 2
What I want is for the first observation of "Duration" for each patient ID to be carried forward.
That is, for patient #1 I want Duration to read 3, 3, 3, and for patient #3 I want Duration to read 1, 1.
Here is one way with data.table. You take the first number in Duration and ask R to repeat it for each PatientID.
mydf <- read.table(text = "PatientID Outcome Duration
1 1 3
1 0 4
1 0 5
2 0 2
3 1 1
3 1 2", header = TRUE)
library(data.table)
setDT(mydf)[, Duration := Duration[1L], by = PatientID]
print(mydf)
# PatientID Outcome Duration
#1: 1 1 3
#2: 1 0 3
#3: 1 0 3
#4: 2 0 2
#5: 3 1 1
#6: 3 1 1
This is a good job for dplyr (a much-improved successor to plyr for data frames, with what I find a far friendlier syntax than data.table):
library(dplyr)
dat %>%
group_by(`Patient ID`) %>%
mutate(Duration=first(Duration))
## Source: local data frame [6 x 3]
## Groups: Patient ID
##
## Patient ID Outcome Duration
## 1 1 1 3
## 2 1 0 3
## 3 1 0 3
## 4 2 0 2
## 5 3 1 1
## 6 3 1 1
Another alternative uses plyr. If you will be doing lots of operations on your data frame, though, and particularly if it's big, I recommend data.table: it has a steeper learning curve but is well worth it.
library(plyr)
ddply(mydf, .(PatientID), transform, Duration=Duration[1])
# PatientID Outcome Duration
# 1 1 1 3
# 2 1 0 3
# 3 1 0 3
# 4 2 0 2
# 5 3 1 1
# 6 3 1 1
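For completeness, the same carry-forward can be sketched in base R with ave, which applies a function within each group and returns a vector of the original length. This is only one possible approach; the data frame is rebuilt here so that the snippet runs on its own.

```r
# Example data from the question
mydf <- data.frame(
  PatientID = c(1, 1, 1, 2, 3, 3),
  Outcome   = c(1, 0, 0, 0, 1, 1),
  Duration  = c(3, 4, 5, 2, 1, 2)
)

# Within each PatientID, replace every Duration with the group's first value
mydf$Duration <- ave(mydf$Duration, mydf$PatientID, FUN = function(d) d[1])

mydf
```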
