R assign order references by other column value [duplicate] - r

This question already has answers here:
Create counter with multiple variables [duplicate]
(6 answers)
Closed 9 years ago.
I am trying to obtain a sequence within category.
My data are:
A B
1 1
1 2
1 2
1 3
1 3
1 3
1 4
1 4
and I want to get variable "c" such as my data look like:
A B C
1 1 1
1 2 1
1 2 2
1 3 1
1 3 2
1 3 3
1 4 1
1 4 2

Use ave with seq_along:
> mydf$C <- with(mydf, ave(A, A, B, FUN = seq_along))
> mydf
A B C
1 1 1 1
2 1 2 1
3 1 2 2
4 1 3 1
5 1 3 2
6 1 3 3
7 1 4 1
8 1 4 2
If your data are already ordered (as they are in this case), you can also use sequence with rle (mydf$C <- sequence(rle(do.call(paste, mydf))$lengths)), but you don't have that limitation with ave.
If you're a data.table fan, you can make use of .N as follows:
library(data.table)
DT <- data.table(mydf)
DT[, C := sequence(.N), by = c("A", "B")]
DT
# A B C
# 1: 1 1 1
# 2: 1 2 1
# 3: 1 2 2
# 4: 1 3 1
# 5: 1 3 2
# 6: 1 3 3
# 7: 1 4 1
# 8: 1 4 2

Related

How to keep only first value in every sequence of duplicated values in R [duplicate]

This question already has answers here:
Select first row in each contiguous run by group
(4 answers)
Closed 5 months ago.
I am trying to create a subset where I keep the first value in each sequence of numbers in a column. I tried to use:
df %>% group_by(x) %>% slice_head(n = 1)
But it only works for the first instance of each sequence.
An example data where x column contains the repeated sequence can be seen below:
x = c(2,2,2,3,3,3,1,1,1,5,5,5,2,2,2,1,1,1,3,3,3)
y = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
df= data.frame(x,y)
> df
x y
1 2 1
2 2 1
3 2 1
4 3 1
5 3 1
6 3 1
7 1 1
8 1 1
9 1 1
10 5 1
11 5 1
12 5 1
13 2 1
14 2 1
15 2 1
16 1 1
17 1 1
18 1 1
19 3 1
20 3 1
21 3 1
So the end result that I would like to achive is:
x = c(2,3,1,5,2,1,3)
y = c(1,1,1,1,1,1,1)
df= data.frame(x,y)
> df
x y
1 2 1
2 3 1
3 1 1
4 5 1
5 2 1
6 1 1
7 3 1
Could you please help or point me to any useful existing topics as I haven't managed to find it?
Thanks
You can try rleid from package data.table
> library(data.table)
> setDT(df)[!duplicated(rleid(x))]
x y
1: 2 1
2: 3 1
3: 1 1
4: 5 1
5: 2 1
6: 1 1
7: 3 1
Base R.
df[c(1, diff(df$x)) != 0, ]
Or also with helper functions from data.table.
library(data.table)
df[rowid(rleid(df$x)) == 1L, ]
# x y
# 1 2 1
# 4 3 1
# 7 1 1
# 10 5 1
# 13 2 1
# 16 1 1
# 19 3 1
Using rle and match.
df[match(with(rle(df$x), values), df$x), ]
# x y
# 1 2 1
# 4 3 1
# 7 1 1
# 10 5 1
# 1.1 2 1
# 7.1 1 1
# 4.1 3 1

R: Subsetting on increasing value to max excluding the decreasing

I have a number of trials where one variable increases to a max of interest then decreases back to a starting point. How would I go about just retaining the observations with the increasing values to max. Thanks.
For example
Trial A B C
1 2 4 1
1 4 3 2
1 3 7 3
1 3 3 2
1 4 1 1
2 4 1 1
2 6 2 2
2 3 1 3
2 1 1 2
2 7 3 1
...
So we would check max on C and retain as follows,
Trial A B C
1 2 4 1
1 4 3 2
1 3 7 3
2 4 1 1
2 6 2 2
2 3 1 3
...
Ultimately I'll have a low cut off value as well as varying perhaps what I mean by max but essentially the above is the aim.
Probably not the most efficient solution, but here is an attempt using data.table
library(data.table)
setDT(df)[, .SD[1:which.max(C)], by = Trial]
# Trial A B C
# 1: 1 2 4 1
# 2: 1 4 3 2
# 3: 1 3 7 3
# 4: 2 4 1 1
# 5: 2 6 2 2
# 6: 2 3 1 3
Or for some efficiency gain
indx <- setDT(df)[, .I[1:which.max(C)], by = Trial]
df[indx$V1]
library(dplyr)
df%>%group_by(Trial)%>%slice(1:max(C))

How Can I check the occurences of the values in each individual in R?

Let's say I have a data.frame that looks like this:
ID B
1 1
1 2
1 1
1 3
2 2
2 2
2 2
2 2
3 2
3 10
3 2
Now I want to check the occurrences of B under each ID, such as that for no. 1, 1 happens twice, 2 and 3 happens 1 time each. And in no. 2, only 2 happens 4 times. How should I accomplish this? I tried to use table in ddply but somehow it did not work. Thanks.
It seems like you may just want a table
> table(dat)
## B
## ID 1 2 3 10
## 1 2 1 1 0
## 2 0 4 0 0
## 3 0 2 0 1
Then the following shows that for ID equal to 1, there are two 1s, one 2, and one 3.
> table(dat)[1, ]
## 1 2 3 10
## 2 1 1 0
And here's an aggregate solution:
> with(data, aggregate(B, list(ID=ID, B=B), length))
ID B x
1 1 1 2
2 1 2 1
3 2 2 4
4 3 2 2
5 1 3 1
6 3 10 1
Here's an approach using "dplyr" (if I understood your question correctly):
library(dplyr)
mydf %.% group_by(ID, B) %.% summarise(count = n())
# Source: local data frame [6 x 3]
# Groups: ID
#
# ID B count
# 1 1 1 2
# 2 1 2 1
# 3 1 3 1
# 4 2 2 4
# 5 3 2 2
# 6 3 10 1
In "plyr", I guess it would be something like:
library(plyr)
ddply(mydf, .(ID, B), summarise, count = length(B))
In base R, you could do something like the following and just remove the rows with 0:
data.frame(table(mydf))
# ID B Freq
# 1 1 1 2
# 2 2 1 0
# 3 3 1 0
# 4 1 2 1
# 5 2 2 4
# 6 3 2 2
# 7 1 3 1
# 8 2 3 0
# 9 3 3 0
# 10 1 10 0
# 11 2 10 0
# 12 3 10 1
And the data.table solution because there must be:
data[, .N, by=c('ID','B')]
The above won't work if you try to apply it to a data.frame. It must be converted to a data.table first. With more recent versions of "data.table", this is most easily done with setDT (as recommended by David in the comments):
library(data.table)
setDT(data)[, .N, by=c('ID', 'B')]

generate sequence within group in R [duplicate]

This question already has answers here:
Create counter with multiple variables [duplicate]
(6 answers)
Closed 9 years ago.
I am trying to obtain a sequence within category.
My data are:
A B
1 1
1 2
1 2
1 3
1 3
1 3
1 4
1 4
and I want to get variable "c" such as my data look like:
A B C
1 1 1
1 2 1
1 2 2
1 3 1
1 3 2
1 3 3
1 4 1
1 4 2
Use ave with seq_along:
> mydf$C <- with(mydf, ave(A, A, B, FUN = seq_along))
> mydf
A B C
1 1 1 1
2 1 2 1
3 1 2 2
4 1 3 1
5 1 3 2
6 1 3 3
7 1 4 1
8 1 4 2
If your data are already ordered (as they are in this case), you can also use sequence with rle (mydf$C <- sequence(rle(do.call(paste, mydf))$lengths)), but you don't have that limitation with ave.
If you're a data.table fan, you can make use of .N as follows:
library(data.table)
DT <- data.table(mydf)
DT[, C := sequence(.N), by = c("A", "B")]
DT
# A B C
# 1: 1 1 1
# 2: 1 2 1
# 3: 1 2 2
# 4: 1 3 1
# 5: 1 3 2
# 6: 1 3 3
# 7: 1 4 1
# 8: 1 4 2

Inserting a count field for each row by a grouping variable

I have a data set with observations that are both grouped and ordered (by rank). I'd like to add a third variable that is a count of the number of observations for each grouping variable. I'm aware of ways to group and count variables but I can't find a way to re-insert these counts back into the original data set, which has more rows. I'd like to get the variable C in the example table below.
A B C
1 1 3
1 2 3
1 3 3
2 1 4
2 2 4
2 3 4
2 4 4
Here's one way using ave:
DF <- within(DF, {C <- ave(A, A, FUN=length)})
# A B C
# 1 1 1 3
# 2 1 2 3
# 3 1 3 3
# 4 2 1 4
# 5 2 2 4
# 6 2 3 4
# 7 2 4 4
Here is one approach using data.table that makes use of .N, which is described in the help file to "data.table" as .N is an integer, length 1, containing the number of rows in the group.
> library(data.table)
> DT <- data.table(A = rep(c(1, 2), times = c(3, 4)), B = c(1:3, 1:4))
> DT
A B
1: 1 1
2: 1 2
3: 1 3
4: 2 1
5: 2 2
6: 2 3
7: 2 4
> DT[, C := .N, by = "A"]
> DT
A B C
1: 1 1 3
2: 1 2 3
3: 1 3 3
4: 2 1 4
5: 2 2 4
6: 2 3 4
7: 2 4 4

Resources