I have a sequence ID and times recorded within each sequence. I am trying to find the maximum time recorded for each sequence. Example below:
Seq seconds
1 1 1
2 1 2
3 1 3
4 1 4
5 2 1
6 2 2
7 3 1
8 3 2
9 3 3
10 3 4
11 3 5
I would like a result that tells me the max time that was recorded in each sequence.
Seq Time
1 4
2 2
3 5
A solution using dplyr: sort by Seq and seconds, group by Seq, then keep the last row of each group.
library(dplyr)
dt2 <- dt %>%
arrange(Seq, seconds) %>%
group_by(Seq) %>%
slice(n())
dt2
# A tibble: 3 x 2
# Groups: Seq [3]
Seq seconds
<int> <int>
1 1 4
2 2 2
3 3 5
DATA
dt <- read.table(text = " Seq seconds
1 1 1
2 1 2
3 1 3
4 1 4
5 2 1
6 2 2
7 3 1
8 3 2
9 3 3
10 3 4
11 3 5",
header = TRUE)
An option using data.table
library(data.table)
setDT(dt)[, .(Time = max(seconds)), by = Seq]
# Seq Time
#1: 1 4
#2: 2 2
#3: 3 5
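For readers without dplyr or data.table, the same per-group maximum can be sketched in base R with aggregate(). This is only an illustration: the data frame below retypes the example data by hand, and the rename to Time is there to match the desired output.

```r
# Example data retyped from the question (assumed to match the original dt)
dt <- data.frame(
  Seq     = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
  seconds = c(1, 2, 3, 4, 1, 2, 1, 2, 3, 4, 5)
)

# Apply max() to seconds within each value of Seq
res <- aggregate(seconds ~ Seq, data = dt, FUN = max)

# Rename the aggregated column to match the desired output
names(res)[names(res) == "seconds"] <- "Time"
res
```

Unlike the arrange() plus slice(n()) approach, this does not depend on row order, but it returns only the grouping column and the aggregated value, not the other columns of the matching row.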
I have a data.frame ordered by ID with a column of numeric values that I would like to bin into groups, increasing the group number only when a certain target value/trigger is surpassed. I haven't had success with seq(), seq_along(), or data.table cumsum(), but I'm sure there must be a way.
Example data.frame with desired group column below. In this example, the sequence generating the group column should increase only when a number >= 300 appears in the value column.
dat = data.frame(ID=1:10, value=c(0,2,1,12,68,300,41,0,72959,51), group=c(1,1,1,1,1,2,2,2,3,3))
> dat
ID value group
1 1 0 1
2 2 2 1
3 3 1 1
4 4 12 1
5 5 68 1
6 6 300 2
7 7 41 2
8 8 0 2
9 9 72959 3
10 10 51 3
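We can use cumsum() on a logical vector to create the group: each TRUE (a value >= 300) increases the running total by one, and adding 1 makes the numbering start at 1.
library(dplyr)
dat %>%
mutate(group2 = cumsum(value >= 300) + 1)
Output: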
ID value group group2
1 1 0 1 1
2 2 2 1 1
3 3 1 1 1
4 4 12 1 1
5 5 68 1 1
6 6 300 2 2
7 7 41 2 2
8 8 0 2 2
9 9 72959 3 3
10 10 51 3 3
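To make the cumsum() step easier to follow, here is a small base R sketch that splits it into its parts. The variable names hit and group are introduced here only for illustration.

```r
# The value column from the question
value <- c(0, 2, 1, 12, 68, 300, 41, 0, 72959, 51)

# Step 1: TRUE wherever the trigger condition is met
hit <- value >= 300

# Step 2: cumsum() treats TRUE as 1 and FALSE as 0, so the running
# total goes up by one at each trigger row and stays flat otherwise
running <- cumsum(hit)

# Step 3: shift by 1 so the first group is numbered 1 instead of 0
group <- running + 1
group
```

The same pattern works for any condition that yields a logical vector, for example a change in sign or a gap in timestamps.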
I've got a dataset like this:
dat1 <- read.table(header=TRUE, text="
ID A_1 B_1 C_1 A_2 B_2 C_2
1 1 2 1 5 5 5
2 1 3 3 4 4 1
3 1 3 1 3 2 2
4 2 5 5 3 2 2
5 1 4 1 2 1 3
")
I would like to convert this to a long format, with one column for the ID, one for the system (1 or 2), one for the variable (A, B, or C) and one with the value.
I can't figure out how to do that, and I would be very grateful if somebody could help me out.
I already tried the pivot_longer command, but it only gives me three columns: one for the ID, one for the variables, and one for the value.
Thanks!!
You can use pivot_longer in the following way:
tidyr::pivot_longer(dat1, cols = -ID,
names_to = c('variable', 'system'), names_sep = '_')
# ID variable system value
# <int> <chr> <chr> <int>
# 1 1 A 1 1
# 2 1 B 1 2
# 3 1 C 1 1
# 4 1 A 2 5
# 5 1 B 2 5
# 6 1 C 2 5
# 7 2 A 1 1
# 8 2 B 1 3
# 9 2 C 1 3
#10 2 A 2 4
# … with 20 more rows
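To show what names_sep = '_' is doing, the following base R sketch performs the same reshaping by hand: it stacks the value columns, then splits each original column name at the underscore. It is an illustration of the idea, not how tidyr implements it, and the intermediate names (value_cols, long, parts) are made up for this example.

```r
dat1 <- read.table(header = TRUE, text = "
ID A_1 B_1 C_1 A_2 B_2 C_2
1 1 2 1 5 5 5
2 1 3 3 4 4 1
3 1 3 1 3 2 2
4 2 5 5 3 2 2
5 1 4 1 2 1 3
")

value_cols <- setdiff(names(dat1), "ID")

# Stack all value columns on top of each other, remembering which
# original column each value came from
long <- data.frame(
  ID    = rep(dat1$ID, times = length(value_cols)),
  name  = rep(value_cols, each = nrow(dat1)),
  value = unlist(dat1[value_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Split names such as "A_1" into the variable ("A") and the system ("1")
parts <- do.call(rbind, strsplit(long$name, "_", fixed = TRUE))
long$variable <- parts[, 1]
long$system   <- parts[, 2]

# Order by ID and keep the four requested columns
long <- long[order(long$ID), c("ID", "variable", "system", "value")]
rownames(long) <- NULL
head(long)
```

As in the pivot_longer output, system ends up as a character column; wrap it in as.integer() if a numeric column is needed.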
I have a data frame of ids with a number column:
df <- read.table(text="
id nr
1 1
2 1
1 2
3 1
1 3
", header=TRUE)
I'd like to create a new data frame from it in which each id is paired with every unique nr value found in df. As you may notice, id 3 has only nr 1, but not 2 or 3. So the result should be:
result <- read.table(text="
id nr
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
", header=TRUE)
You can use expand.grid as:
library(dplyr)
result <- expand.grid(id = unique(df$id), nr = unique(df$nr)) %>%
arrange(id)
result
id nr
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3
We can do:
tidyr::expand(df,id,nr)
# A tibble: 9 x 2
id nr
<int> <int>
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3
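If it is also useful to know which combinations were missing from the original data, a base R sketch along these lines could flag them. The column name was_present is hypothetical and only used for illustration.

```r
df <- read.table(text = "
id nr
1 1
2 1
1 2
3 1
1 3
", header = TRUE)

# All id/nr combinations; expand.grid varies the first column fastest,
# so sort afterwards to get the rows grouped by id
full <- expand.grid(id = sort(unique(df$id)), nr = sort(unique(df$nr)))
full <- full[order(full$id, full$nr), ]
rownames(full) <- NULL

# Flag the combinations that already existed in df
full$was_present <- paste(full$id, full$nr) %in% paste(df$id, df$nr)
full
```

Rows where was_present is FALSE are the ones that were added to complete the grid.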
I have the following sample data frame:
> test = data.frame(UserId = sample(1:5, 10, replace = T)) %>% arrange(UserId)
> test
UserId
1 1
2 1
3 1
4 1
5 1
6 3
7 4
8 4
9 4
10 5
I now want another column called loginCount that counts up from 1 within each UserId, restarting for every user. Using mutate as below assigns one id per group, but how do I get incremental ids within each group, independent of the other groups?
> test %>% mutate(loginCount = group_indices_(test, .dots = "UserId"))
UserId loginCount
1 1 1
2 1 1
3 1 1
4 1 1
5 1 1
6 3 2
7 4 3
8 4 3
9 4 3
10 5 4
I want something like shown below:
UserId loginCount
1 1
1 2
1 3
1 4
1 5
3 1
4 1
4 2
4 3
5 1
You could group and use row_number:
test %>%
arrange(UserId) %>%
group_by(UserId) %>%
mutate(loginCount = row_number()) %>%
ungroup()
# A tibble: 10 x 2
UserId loginCount
<int> <int>
1 1 1
2 1 2
3 1 3
4 1 4
5 1 5
6 3 1
7 4 1
8 4 2
9 4 3
10 5 1
One solution using base R tapply() (this relies on test already being sorted by UserId, as it is here):
test$loginCount <- unlist(tapply(rep(1, nrow(test)), test$UserId, cumsum))
> test
UserId loginCount
1 1 1
2 1 2
3 1 3
4 1 4
5 1 5
6 3 1
7 4 1
8 4 2
9 4 3
10 5 1
Hi, I'm using R and I have data like this:
1 2 3 4 5
1 2 1 2 2
3 4 1 2 3
1 2 3 4 5
3 4 1 2 3
I want to give identical rows the same number. For the example above:
1 2 3 4 5 --> 1
1 2 1 2 2 --> 2
3 4 1 2 3 --> 3
1 2 3 4 5 --> 1
3 4 1 2 3 --> 3
Does anyone know how to do this in R (for both the numeric case and the character case)?
Your help is really appreciated!
This is your data:
df <- data.frame(a=c(1,1,3,1,3),
b=c(2,2,4,2,4),
c=c(3,1,1,3,1),
d=c(4,2,2,4,2),
e=c(5,2,3,5,3))
Approach 1:
This approach relies on interaction(), which is part of base R. The code below also loads data.table because it wraps df in a data.table before passing it in:
library(data.table)
i <- interaction(data.table(df), drop=TRUE)
df.out <- cbind(df, id=factor(i,labels=length(unique(i)):1))
This would give you the following:
# a b c d e id
#1 1 2 3 4 5 1
#2 1 2 1 2 2 3
#3 3 4 1 2 3 2
#4 1 2 3 4 5 1
#5 3 4 1 2 3 2
Approach 2:
Another approach is by using the plyr package, as follows:
library(plyr)
.id <- 0
df.out <- ddply(df, colnames(df), transform, id=(.id<<-.id+1))
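This will give you the following output (note that ddply returns the rows sorted by the grouping columns, so the row order differs from the input):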
# a b c d e id
#1 1 2 1 2 2 1
#2 1 2 3 4 5 2
#3 1 2 3 4 5 2
#4 3 4 1 2 3 3
#5 3 4 1 2 3 3
Hope it helps.
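Neither approach above numbers the groups in order of first appearance (1, 2, 3, 1, 3), which is what the question asked for. A base R sketch that does so is shown below. It builds one string key per row and looks up each key's position among the unique keys. The "|" separator is an assumption: for character data, pick a separator that cannot occur in the values, otherwise two different rows could produce the same key.

```r
df <- data.frame(a = c(1, 1, 3, 1, 3),
                 b = c(2, 2, 4, 2, 4),
                 c = c(3, 1, 1, 3, 1),
                 d = c(4, 2, 2, 4, 2),
                 e = c(5, 2, 3, 5, 3))

# One key per row, e.g. "1|2|3|4|5"
key <- do.call(paste, c(df, sep = "|"))

# unique() keeps keys in order of first appearance, so match() returns
# 1 for the first distinct row, 2 for the second, and so on
df$id <- match(key, unique(key))
df
```

The same code works unchanged when the columns are character vectors, subject to the separator caveat above.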