I would like to create a new df that keeps only the subjects whose value in the second or third condition is greater than their value in the first condition.
Example df:
df1 <- data.frame(subject = rep(1:5, 3),
condition = rep(c("first", "second", "third"), each = 5),
values = c(.4, .4, .4, .4, .4, .6, .6, .6, .6, .4, .6, .6, .6, .4, .4))
> df1
subject condition values
1 1 first 0.4
2 2 first 0.4
3 3 first 0.4
4 4 first 0.4
5 5 first 0.4
6 1 second 0.6
7 2 second 0.6
8 3 second 0.6
9 4 second 0.6
10 5 second 0.4
11 1 third 0.6
12 2 third 0.6
13 3 third 0.6
14 4 third 0.4
15 5 third 0.4
The resulting df would be this:
> df2
subject condition values
1 1 first 0.4
2 2 first 0.4
3 3 first 0.4
4 4 first 0.4
6 1 second 0.6
7 2 second 0.6
8 3 second 0.6
9 4 second 0.6
11 1 third 0.6
12 2 third 0.6
13 3 third 0.6
14 4 third 0.4
Here, subject #5 does not meet the criterion: it is the only subject whose values in the second and third conditions are both not greater than its value in the first condition.
Thanks.
We can group by 'subject' and keep a group if any of its second or third 'values' is greater than the first one.
library(dplyr)
df1 %>%
group_by(subject) %>%
filter(any(values[2:3] > first(values))) %>%
ungroup
-output
# A tibble: 12 × 3
subject condition values
<int> <chr> <dbl>
1 1 first 0.4
2 2 first 0.4
3 3 first 0.4
4 4 first 0.4
5 1 second 0.6
6 2 second 0.6
7 3 second 0.6
8 4 second 0.6
9 1 third 0.6
10 2 third 0.6
11 3 third 0.6
12 4 third 0.4
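The filter above relies on each subject's rows appearing in the order first, second, third. A minimal sketch that matches on the condition labels instead, assuming every subject has exactly one "first" row (the name df2 is only used here for illustration):

```r
library(dplyr)

df1 <- data.frame(subject = rep(1:5, 3),
                  condition = rep(c("first", "second", "third"), each = 5),
                  values = c(.4, .4, .4, .4, .4, .6, .6, .6, .6, .4, .6, .6, .6, .4, .4))

df2 <- df1 %>%
  group_by(subject) %>%
  # compare every non-"first" value with the single "first" value of the subject
  filter(any(values[condition != "first"] > values[condition == "first"])) %>%
  ungroup()

df2
```

Because the comparison no longer uses positions 2:3, it should give the same rows even if df1 is shuffled first.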
Using ave.
df1[with(df1, ave(values, subject, FUN=\(x) any(x[2:3] > x[1])) == 1), ]
# subject condition values
# 1 1 first 0.4
# 2 2 first 0.4
# 3 3 first 0.4
# 4 4 first 0.4
# 6 1 second 0.6
# 7 2 second 0.6
# 8 3 second 0.6
# 9 4 second 0.6
# 11 1 third 0.6
# 12 2 third 0.6
# 13 3 third 0.6
# 14 4 third 0.4
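As an aside on why the result is compared with 1: ave() writes the function's result back into a copy of the numeric values vector, so the logical any() result comes back as 1 or 0. A small sketch using the question's df1 (keep is a name chosen here for illustration, and function(x) is used instead of \(x) so that it also runs on R versions before 4.1):

```r
df1 <- data.frame(subject = rep(1:5, 3),
                  condition = rep(c("first", "second", "third"), each = 5),
                  values = c(.4, .4, .4, .4, .4, .6, .6, .6, .6, .4, .6, .6, .6, .4, .4))

# one flag per row: 1 if the row's subject passes the test, 0 otherwise
keep <- with(df1, ave(values, subject, FUN = function(x) any(x[2:3] > x[1])))
keep

# the flags are numeric, hence the explicit comparison
df2 <- df1[keep == 1, ]
```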
I have the following data frame:
DF <- data.frame(A=c(0.1,0.1,0.1,0.1,0.2,0.2,0.2,0.3,0.4,0.4 ), B=c(1,2,1,5,10,2,3,1,6,2), B=c(1000,50,400,6,300,2000,20,30,40,50))
and I want to filter DF so that, for each group of equal values in A, the row with the maximum in B is selected.
For example for 0.1 in A the maximum in B is 5.
Ending with the new data frame:
A B C
0.1 5 6
0.2 10 300
0.3 1 30
0.4 6 40
I am not sure whether this is a problem to solve with base R or with a library. I am thinking of using dplyr and grouping by A. Am I correct?
There are a couple of base R options:
Using subset + ave
> subset(DF,as.logical(ave(B,A,FUN = function(x) x == max(x))))
A B B.1
4 0.1 5 6
5 0.2 10 300
8 0.3 1 30
9 0.4 6 40
Using merge + aggregate
> merge(aggregate(B~A,DF,max),DF)
A B B.1
1 0.1 5 6
2 0.2 10 300
3 0.3 1 30
4 0.4 6 40
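Both options keep every row that ties for the group maximum. If exactly one row per group is wanted, a sketch with split() and which.max() (which returns the position of the first maximum) could look like this. The third column is named C here, as in the question's expected output; that name is an assumption, since the posted DF has two columns called B.

```r
DF <- data.frame(A = c(0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.3, 0.4, 0.4),
                 B = c(1, 2, 1, 5, 10, 2, 3, 1, 6, 2),
                 C = c(1000, 50, 400, 6, 300, 2000, 20, 30, 40, 50))

# split into one data frame per value of A, keep the first max-B row of each,
# then stack the pieces back together
res <- do.call(rbind, lapply(split(DF, DF$A), function(d) d[which.max(d$B), ]))
res
```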
An option with data.table: group by 'A', get the position where 'B' is at its maximum with which.max, and wrap it in .I to return the row index. If we don't name the result, it is returned as a column called 'V1' by default, which we extract as a vector to subset the rows of the dataset.
library(data.table)
setDT(DF)[DF[, .I[which.max(B)], A]$V1]
-output
# A B B.1
#1: 0.1 5 6
#2: 0.2 10 300
#3: 0.3 1 30
#4: 0.4 6 40
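For comparison, a shorter variant subsets .SD within each group instead of collecting row indices first. This is only a sketch: it can be slower when there are many groups, and the column name C is an assumption taken from the question's expected output.

```r
library(data.table)

DT <- data.table(A = c(0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.3, 0.4, 0.4),
                 B = c(1, 2, 1, 5, 10, 2, 3, 1, 6, 2),
                 C = c(1000, 50, 400, 6, 300, 2000, 20, 30, 40, 50))

# .SD is the subset of rows for the current group; keep its first max-B row
res <- DT[, .SD[which.max(B)], by = A]
res
```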
You're right: after grouping by A with dplyr, you can use slice_max() (also from dplyr) to select the row with the max value in B for each group.
library(dplyr)
DF %>%
group_by(A) %>%
slice_max(B)
Output:
# A tibble: 4 x 3
# Groups: A [4]
A B C
<dbl> <dbl> <dbl>
1 0.1 5 6
2 0.2 10 300
3 0.3 1 30
4 0.4 6 40
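One thing to be aware of: by default slice_max() keeps all rows that tie for the maximum. A small sketch with made-up data (DF_ties and the result names are hypothetical, not from the question) shows the difference when with_ties = FALSE is set:

```r
library(dplyr)

# hypothetical data in which group A == 1 has two rows tied at B == 3
DF_ties <- data.frame(A = c(1, 1, 2), B = c(3, 3, 1), C = c(10, 20, 30))

with_ties <- DF_ties %>%
  group_by(A) %>%
  slice_max(B) %>%                             # keeps both tied rows
  ungroup()

one_each <- DF_ties %>%
  group_by(A) %>%
  slice_max(B, n = 1, with_ties = FALSE) %>%   # at most one row per group
  ungroup()

nrow(with_ties)
nrow(one_each)
```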
I have the following data frame,
Input
For all observations where Month > tenor, the last value of the rate column should be retained for each account for the remaining months. E.g., customer 1 has tenor = 5, so for all months greater than 5, the last rate value is retained.
I am using the following code
df$rate <- ifelse(df$Month > df$tenor,tail(df$rate, n=1),df$rate)
But here, the last value of the whole column is NA, so it does not work.
Expected output is
Output
This will work, but please provide a reproducible example. Others want to help you, not do your homework.
df <- data.frame(
customer = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2),
Month = c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10),
tenor = c(5,5,5,5,5,5,5,5,5,5,3,3,3,3,3,3,3,3,3,3),
rate = c(0.2,0.3,0.4,0.5,0.6,NA,NA,NA,NA,NA,0.1,0.2,0.3,NA,NA,NA,NA,NA,NA,NA)
)
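# For a row past the tenor with a missing rate, look up the customer's rate at Month == tenor
fn <- function(cus, mon, ten, rat){
  if (mon > ten && is.na(rat)){
    return(dplyr::filter(df, customer == cus, Month == ten, tenor == ten)$rate)
  }
  return(rat)
}
# Vectorize() lets the scalar function fn run once per row
df2 <- dplyr::mutate(df,
  newrate = Vectorize(fn)(customer, Month, tenor, rate)
)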
One option is:
library(dplyr)
library(tidyr)
df %>%
group_by(customer) %>%
fill(rate, .direction = "down") %>%
ungroup()
# A tibble: 20 x 4
customer Month tenor rate
<dbl> <dbl> <dbl> <dbl>
1 1 1 5 0.2
2 1 2 5 0.3
3 1 3 5 0.4
4 1 4 5 0.5
5 1 5 5 0.6
6 1 6 5 0.6
7 1 7 5 0.6
8 1 8 5 0.6
9 1 9 5 0.6
10 1 10 5 0.6
11 2 1 3 0.1
12 2 2 3 0.2
13 2 3 3 0.3
14 2 4 3 0.3
15 2 5 3 0.3
16 2 6 3 0.3
17 2 7 3 0.3
18 2 8 3 0.3
19 2 9 3 0.3
20 2 10 3 0.3
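Note that fill() replaces every missing rate, including any gaps that might occur before the tenor. If only the rows with Month > tenor should change, one possible sketch is shown below. It assumes each customer has a row where Month == tenor, and the name out is only for illustration.

```r
library(dplyr)

df <- data.frame(
  customer = rep(1:2, each = 10),
  Month = rep(1:10, 2),
  tenor = rep(c(5, 3), each = 10),
  rate = c(0.2, 0.3, 0.4, 0.5, 0.6, rep(NA, 5), 0.1, 0.2, 0.3, rep(NA, 7))
)

out <- df %>%
  group_by(customer) %>%
  # past the tenor, carry the rate observed at Month == tenor; otherwise keep the rate
  mutate(rate = if_else(Month > tenor, rate[Month == tenor][1], rate)) %>%
  ungroup()

out
```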
I can't replicate your data frame so this is a guess right now.
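I think dplyr (together with replace_na() from tidyr) should be the solution:
library(dplyr)
library(tidyr)
df %>%
  group_by(customer) %>%
  mutate(rate = replace_na(rate, last(rate[!is.na(rate)]))) %>%
  ungroup()
This should work, assuming the rows are ordered by Month within each customer.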
I know how to do basic stuff in R, but I am still a newbie. I am also probably asking a pretty redundant question (but I don't know how to enter it into google so that I find the right hits).
I have been getting hits like the below:
Assign value to group based on condition in column
R - Group by variable and then assign a unique ID
I want to assign subgroups into groups, and create a new column out of them.
I have data like the following:
dataframe:
ID SubID Values
1 15 0.5
1 15 0.2
2 13 0.1
2 13 0
1 14 0.3
1 14 0.3
2 10 0.2
2 10 1.6
6 31 0.7
6 31 1.0
new dataframe:
ID SubID Values groups
1 15 0.5 2
1 15 0.2 2
2 13 0.1 2
2 13 0 2
1 14 0.3 1
1 14 0.3 1
2 10 0.2 1
2 10 1.6 1
6 31 0.7 1
6 31 1.0 1
I have tried the following in R, but I am not getting the desired results:
newdataframe$groups <- dataframe %>% group_indices(,dataframe$ID, dataframe$SubID)
newdataframe<- dataframe %>% group_by(ID, SubID) %>% mutate(groups=group_indices(,dataframe$ID, dataframe$SubID))
I am not sure how to frame the question in R. I want to group by ID and SubID, then number the SubID subgroups within each ID, resetting the count for each ID.
Any help would be really appreciated.
Here is an alternative approach which uses the rleid() function from the data.table package. rleid() generates a run-length type id column.
According to the expected result, the OP expects SubID to be numbered by order of value and not by order of appearance. Therefore, we need to call arrange().
library(dplyr)
df %>%
group_by(ID) %>%
arrange(SubID) %>%
mutate(groups = data.table::rleid(SubID))
ID SubID Values groups
<int> <int> <dbl> <int>
1 2 10 0.2 1
2 2 10 1.6 1
3 2 13 0.1 2
4 2 13 0 2
5 1 14 0.3 1
6 1 14 0.3 1
7 1 15 0.5 2
8 1 15 0.2 2
9 6 31 0.7 1
10 6 31 1 1
Note that the row order has changed.
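To illustrate why the sort matters, here is a small sketch on the SubID values of ID 1 alone (sub_id, unsorted and sorted are names chosen here). rleid() numbers runs in order of appearance, so without sorting, 15 would get id 1 rather than the expected 2:

```r
library(data.table)

sub_id <- c(15, 15, 14, 14)   # SubID values of ID 1, in their original order

# runs are numbered as they appear: 15 comes first, so it gets id 1
unsorted <- data.frame(SubID = sub_id, id = rleid(sub_id))

# after sorting, 14 comes first and gets id 1, 15 gets id 2
sorted <- data.frame(SubID = sort(sub_id), id = rleid(sort(sub_id)))

unsorted
sorted
```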
BTW: With data.table, the code is less verbose and the original row order is maintained:
library(data.table)
setDT(df)[order(ID, SubID), groups := rleid(SubID), by = ID][]
ID SubID Values groups
1: 1 15 0.5 2
2: 1 15 0.2 2
3: 2 13 0.1 2
4: 2 13 0.0 2
5: 1 14 0.3 1
6: 1 14 0.3 1
7: 2 10 0.2 1
8: 2 10 1.6 1
9: 6 31 0.7 1
10: 6 31 1.0 1
There are multiple ways to do this. One way would be to group_by ID and create a unique number for each SubID by converting it to a factor and then to an integer.
library(dplyr)
df %>%
group_by(ID) %>%
mutate(groups = as.integer(factor(SubID)))
# ID SubID Values groups
# <int> <int> <dbl> <int>
# 1 1 15 0.5 2
# 2 1 15 0.2 2
# 3 2 13 0.1 2
# 4 2 13 0 2
# 5 1 14 0.3 1
# 6 1 14 0.3 1
# 7 2 10 0.2 1
# 8 2 10 1.6 1
# 9 6 31 0.7 1
#10 6 31 1 1
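A closely related sketch uses dplyr's dense_rank(), which ranks the distinct values within each group without gaps and so should give the same numbering here. The df below is rebuilt from the question's table, and res is a name chosen for illustration.

```r
library(dplyr)

df <- data.frame(ID = c(1, 1, 2, 2, 1, 1, 2, 2, 6, 6),
                 SubID = c(15, 15, 13, 13, 14, 14, 10, 10, 31, 31),
                 Values = c(0.5, 0.2, 0.1, 0, 0.3, 0.3, 0.2, 1.6, 0.7, 1.0))

res <- df %>%
  group_by(ID) %>%
  mutate(groups = dense_rank(SubID)) %>%   # smallest SubID within each ID gets 1
  ungroup()

res
```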
In base R, we can use ave with similar logic
df$groups <- with(df, ave(SubID, ID, FUN = factor))
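If you would rather not depend on how ave() handles a factor result, a more explicit base R sketch matches each SubID against the sorted unique values of its ID. The df is rebuilt from the question's table; this is one possible formulation, not the only one.

```r
df <- data.frame(ID = c(1, 1, 2, 2, 1, 1, 2, 2, 6, 6),
                 SubID = c(15, 15, 13, 13, 14, 14, 10, 10, 31, 31),
                 Values = c(0.5, 0.2, 0.1, 0, 0.3, 0.3, 0.2, 1.6, 0.7, 1.0))

# within each ID, the position of a SubID among the sorted distinct SubIDs
df$groups <- with(df, ave(SubID, ID, FUN = function(x) match(x, sort(unique(x)))))
df
```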
I have a data frame, true_set, that I would like to reorder according to the values in each row of the matrix orders.
true_set <- data.frame(dose1=c(rep(1,5),rep(2,5),rep(3,5)), dose2=c(rep(1:5,3)),toxicity=c(0.05,0.1,0.15,0.3,0.45,0.1,0.15,0.3,0.45,0.55,0.15,0.3,0.45,0.55,0.6),efficacy=c(0.2,0.3,0.4,0.5,0.6,0.4,0.5,0.6,0.7,0.8,0.5,0.6,0.7,0.8,0.9),d=c(1:15))
orders<-matrix(nrow=3,ncol=15)
orders[1,]<-c(1,2,6,3,7,11,4,8,12,5,9,13,10,14,15)
orders[2,]<-c(1,6,2,3,7,11,12,8,4,5,9,13,14,10,15)
orders[3,]<-c(1,6,2,11,7,3,12,8,4,13,9,5,14,10,15)
The expected result would be:
First orders[1,] :
dose1 dose2 toxicity efficacy d
1 1 1 0.05 0.2 1
2 1 2 0.10 0.3 2
3 2 1 0.10 0.4 6
4 1 3 0.15 0.4 3
5 2 2 0.15 0.5 7
6 3 1 0.15 0.5 11
7 1 4 0.30 0.5 4
8 2 3 0.30 0.6 8
9 3 2 0.30 0.6 12
10 1 5 0.45 0.6 5
11 2 4 0.45 0.7 9
12 3 3 0.45 0.7 13
13 2 5 0.55 0.8 10
14 3 4 0.55 0.8 14
15 3 5 0.60 0.9 15
Then the same for orders[2,] and orders[3,].
true_set <- data.frame(dose1=c(rep(1,5),rep(2,5),rep(3,5)), dose2=c(rep(1:5,3)),toxicity=c(0.05,0.1,0.15,0.3,0.45,0.1,0.15,0.3,0.45,0.55,0.15,0.3,0.45,0.55,0.6),efficacy=c(0.2,0.3,0.4,0.5,0.6,0.4,0.5,0.6,0.7,0.8,0.5,0.6,0.7,0.8,0.9),d=c(1:15))
orders<-matrix(nrow=3,ncol=15)
orders[1,]<-c(1,2,6,3,7,11,4,8,12,5,9,13,10,14,15)
orders[2,]<-c(1,6,2,3,7,11,12,8,4,5,9,13,14,10,15)
orders[3,]<-c(1,6,2,11,7,3,12,8,4,13,9,5,14,10,15)
# Specify your order set in the row dimension
First_order <- true_set[orders[1,],]
Second_order <- true_set[orders[2,],]
Third_order <- true_set[orders[3,],]
# If you want to store all orders in a list, you can try the command below:
First_orders <- list(First_Order=true_set[orders[1,],],Second_Order=true_set[orders[2,],],Third_Order=true_set[orders[3,],])
First_orders[1] # OR First_orders$First_Order
First_orders[2] # OR First_orders$Second_Order
First_orders[3] # OR First_orders$Third_Order
# If you want to combine the orders column wise, try the command below:
First_orders <- cbind(First_Order=true_set[orders[1,],],Second_Order=true_set[orders[2,],],Third_Order=true_set[orders[3,],])
# If you want to combine the orders row wise, try the command below:
First_orders <- rbind(First_Order=true_set[orders[1,],],Second_Order=true_set[orders[2,],],Third_Order=true_set[orders[3,],])
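If the number of orderings may grow, the list can also be built with a loop over the rows of orders instead of being written out by hand. A minimal sketch, where reordered and the order_1, order_2, ... names are chosen here for illustration:

```r
true_set <- data.frame(dose1 = rep(1:3, each = 5),
                       dose2 = rep(1:5, 3),
                       toxicity = c(0.05, 0.1, 0.15, 0.3, 0.45, 0.1, 0.15, 0.3, 0.45, 0.55,
                                    0.15, 0.3, 0.45, 0.55, 0.6),
                       efficacy = c(0.2, 0.3, 0.4, 0.5, 0.6, 0.4, 0.5, 0.6, 0.7, 0.8,
                                    0.5, 0.6, 0.7, 0.8, 0.9),
                       d = 1:15)

orders <- rbind(c(1, 2, 6, 3, 7, 11, 4, 8, 12, 5, 9, 13, 10, 14, 15),
                c(1, 6, 2, 3, 7, 11, 12, 8, 4, 5, 9, 13, 14, 10, 15),
                c(1, 6, 2, 11, 7, 3, 12, 8, 4, 13, 9, 5, 14, 10, 15))

# one reordered copy of true_set per row of orders
reordered <- lapply(seq_len(nrow(orders)), function(i) true_set[orders[i, ], ])
names(reordered) <- paste0("order_", seq_len(nrow(orders)))

reordered$order_1
```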