Retrieve a value by another column's criteria in R

I need some help.
I have this data frame:
df <- data.frame(month = c(1,1,1,1,1,2,2,2,2,2),
day = c(1,2,3,4,5,1,2,3,4,5),
flow = c(2,5,7,8,5,4,6,7,9,2))
month day flow
1 1 1 2
2 1 2 5
3 1 3 7
4 1 4 8
5 1 5 5
6 2 1 4
7 2 2 6
8 2 3 7
9 2 4 9
10 2 5 2
But I want to know the day of the minimum flow per month:
month day flow dayminflowofthemonth
1 1 1 2 1
2 1 2 5 1
3 1 3 7 1
4 1 4 8 1
5 1 5 5 1
6 2 1 4 5
7 2 2 6 5
8 2 3 7 5
9 2 4 9 5
10 2 5 2 5
The repetition is not a problem; I will use a pivot function afterwards.
Thanks, everyone!

We can use which.min to return the index of the minimum 'flow' per group, use that index to get the corresponding 'day', and create the new column with mutate:
library(dplyr)
df <- df %>%
group_by(month) %>%
mutate(dayminflowofthemonth = day[which.min(flow)]) %>%
ungroup()
-output
df
# A tibble: 10 x 4
# month day flow dayminflowofthemonth
# <dbl> <dbl> <dbl> <dbl>
# 1 1 1 2 1
# 2 1 2 5 1
# 3 1 3 7 1
# 4 1 4 8 1
# 5 1 5 5 1
# 6 2 1 4 5
# 7 2 2 6 5
# 8 2 3 7 5
# 9 2 4 9 5
#10 2 5 2 5
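
One detail worth keeping in mind: which.min returns only the first position of the minimum, so a month with a tied minimum flow reports the earliest tied day. A minimal base R sketch on a made-up month (the values below are hypothetical, not from the question):

```r
# Hypothetical month whose minimum flow (3) occurs on days 2 and 4
day  <- c(1, 2, 3, 4)
flow <- c(6, 3, 8, 3)

# which.min() stops at the first minimum
first_min_day <- day[which.min(flow)]

# A logical comparison keeps every tied day
all_min_days <- day[flow == min(flow)]

print(first_min_day)
print(all_min_days)
```

If all tied days matter, the comparison form is the one to build on; if a single day per month is enough, which.min is the shorter choice.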

Another option using indexing inside dplyr pipeline:
library(dplyr)
#Code
newdf <- df %>% group_by(month) %>% mutate(Val=day[flow==min(flow)][1])
Output:
# A tibble: 10 x 4
# Groups: month [2]
month day flow Val
<dbl> <dbl> <dbl> <dbl>
1 1 1 2 1
2 1 2 5 1
3 1 3 7 1
4 1 4 8 1
5 1 5 5 1
6 2 1 4 5
7 2 2 6 5
8 2 3 7 5
9 2 4 9 5
10 2 5 2 5

Here is a base R option using ave
transform(
df,
dayminflowofthemonth = ave(day*(ave(flow,month,FUN = min)==flow),month,FUN = max)
)
which gives
month day flow dayminflowofthemonth
1 1 1 2 1
2 1 2 5 1
3 1 3 7 1
4 1 4 8 1
5 1 5 5 1
6 2 1 4 5
7 2 2 6 5
8 2 3 7 5
9 2 4 9 5
10 2 5 2 5
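
A small caveat on the ave version, sketched on hypothetical data: because the outer call uses FUN = max, a tied minimum resolves to the latest tied day, whereas which.min resolves to the earliest. The trick also relies on day being positive, since non-minimum rows are multiplied to 0.

```r
# Hypothetical month with the minimum flow (2) on days 1 and 3
tied <- data.frame(month = c(1, 1, 1),
                   day   = c(1, 2, 3),
                   flow  = c(2, 5, 2))

# TRUE on rows that hold their month's minimum flow
is_min <- ave(tied$flow, tied$month, FUN = min) == tied$flow

# ave/max trick: keeps the largest day among the tied rows
via_ave <- ave(tied$day * is_min, tied$month, FUN = max)

# which.min per month: keeps the first tied day
via_which_min <- ave(seq_along(tied$flow), tied$month, FUN = function(idx) {
  tied$day[idx][which.min(tied$flow[idx])]
})

print(via_ave)
print(via_which_min)
```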

One more base R approach:
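# Compute the day of minimum flow for each month
min_day <- by(
  df,
  df$month,
  function(x) x$day[which.min(x$flow)]
)
# Look the result up by month name rather than by position, so it stays
# correct when the month values are not exactly 1, 2, ..., k
df$dayminflowofthemonth <- as.vector(min_day[as.character(df$month)])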
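
When mapping one value per group back onto the rows, it is safest to index the per-group result by group name. The sketch below uses split and sapply on hypothetical data whose months are 3 and 7, where a positional lookup would fail (the data frame df2 is an assumption made up for illustration):

```r
# Hypothetical data: month labels are 3 and 7, not 1 and 2
df2 <- data.frame(month = c(3, 3, 7, 7),
                  day   = c(1, 2, 1, 2),
                  flow  = c(5, 2, 1, 4))

# One value per month, named by the month label ("3", "7")
min_day <- sapply(split(df2, df2$month),
                  function(x) x$day[which.min(x$flow)])

# Index by name so each row picks up its own month's value
df2$dayminflowofthemonth <- unname(min_day[as.character(df2$month)])

print(df2)
```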

Related

Creating an indexed column in R, grouped by user_id, and not increase when NA

I want to create a column (in R) that indexes the presence of a number in another column grouped by a user_id column. And when the other column is NA, the new desired column should not increase.
The example should bring clarity.
I have this df:
data <- data.frame(user_id = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
one=c(1,NA,3,2,NA,0,NA,4,3,4,NA))
user_id tobeindexed
1 1 1
2 1 NA
3 1 3
4 2 2
5 2 NA
6 2 0
7 2 NA
8 3 4
9 3 3
10 3 4
11 3 NA
I want to make a new column looking like "desired" in the following df:
> cbind(data,data.frame(desired = c(1,1,2,1,1,2,2,1,2,3,3)))
user_id tobeindexed desired
1 1 1 1
2 1 NA 1
3 1 3 2
4 2 2 1
5 2 NA 1
6 2 0 2
7 2 NA 2
8 3 4 1
9 3 3 2
10 3 4 3
11 3 NA 3
How can I solve this?
Using cumsum and group_by gets me close, but the count does not start over from 1 when the user_id changes:
> data %>% group_by(user_id) %>% mutate(desired = cumsum(!is.na(tobeindexed)))
user_id tobeindexed desired
<dbl> <dbl> <int>
1 1 1 1
2 1 NA 1
3 1 3 2
4 2 2 3
5 2 NA 3
6 2 0 4
7 2 NA 4
8 3 4 5
9 3 3 6
10 3 4 7
11 3 NA 7
Given the sample data you provided (with the column named one), your approach works unchanged. The code is retained below for demonstration.
base R
data$out <- ave(data$one, data$user_id, FUN = function(z) cumsum(!is.na(z)))
data
# user_id one out
# 1 1 1 1
# 2 1 NA 1
# 3 1 3 2
# 4 2 2 1
# 5 2 NA 1
# 6 2 0 2
# 7 2 NA 2
# 8 3 4 1
# 9 3 3 2
# 10 3 4 3
# 11 3 NA 3
dplyr
library(dplyr)
data %>%
group_by(user_id) %>%
mutate(out = cumsum(!is.na(one))) %>%
ungroup()
# # A tibble: 11 × 3
# user_id one out
# <dbl> <dbl> <int>
# 1 1 1 1
# 2 1 NA 1
# 3 1 3 2
# 4 2 2 1
# 5 2 NA 1
# 6 2 0 2
# 7 2 NA 2
# 8 3 4 1
# 9 3 3 2
# 10 3 4 3
# 11 3 NA 3
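
To see why this works, it may help to trace a single group by hand. The sketch below takes the values of user_id 2 from the sample data and shows each step: is.na flags the gaps, negation turns present values into TRUE, and cumsum counts the TRUEs seen so far, so the counter only moves on non-missing rows.

```r
# Values of `one` for user_id 2 in the sample data
z <- c(2, NA, 0, NA)

present <- !is.na(z)      # TRUE where a value exists (0 counts as present)
running <- cumsum(present) # TRUE adds 1, FALSE adds 0

print(present)
print(running)
```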

Apply same function to several data replicates in R

Consider the following data simulation mechanism:
set.seed(1)
simulW <- function(G)
{
# Let G be the number of groups
n<-2*G #Assume 2 individuals per group
i<-rep(1:G, rep(2,G)) # Group index
j<-rep (1:n)
Y<-rbinom(n, 1, 0.5) # binary
data.frame(id=1:n, i,Y)
}
r<-5 #5 replicates
dat1 <- replicate(r, simulW(G = 10 ), simplify=FALSE)
#For example the first data replicate will be
> dat1[[1]]
id i Y
1 1 1 0
2 2 1 1
3 3 2 0
4 4 2 0
5 5 3 0
6 6 3 0
7 7 4 0
8 8 4 1
9 9 5 1
10 10 5 0
The code below performs the group-wise sum of Y (i is the group), but it only handles a single replicate, i.e. dat1[[1]].
Di<-aggregate( Y, by=list ( i ),FUN=sum) #Sum per group for the first dataset
e<-colSums(Di [ 2 ] ) #Total sum of Y for all groups for dataset 1
> e
x
8
di<-Di [ 2 ] # Groupwise sum for replicate 1
> di
x
1 2
2 2
3 2
4 0
5 2
How can I use the same function to perform the group-wise sum for the other replicates?
Maybe something like:
for (m in 1:r )
{
Di[m]<-
e[m]<-
di[m]<-
}
You may use aggregate in lapply -
result <- lapply(dat1, function(x) aggregate(Y~i, x, sum))
result
#[[1]]
# i Y
#1 1 1
#2 2 1
#3 3 0
#4 4 0
#5 5 1
#6 6 1
#7 7 0
#8 8 2
#9 9 1
#10 10 1
#[[2]]
# i Y
#1 1 2
#2 2 2
#3 3 2
#4 4 0
#5 5 2
#6 6 1
#7 7 0
#8 8 0
#9 9 1
#10 10 1
#...
#...
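
Building on the lapply idea, here is a sketch of how the three objects from the question (Di, e and di) could each be produced for every replicate. It reuses the question's simulator in condensed form; keeping Di and di as lists and e as a plain vector is a design choice made here, not something the question specifies.

```r
set.seed(1)

# Condensed version of the question's simulator: 2 individuals per group
simulW <- function(G) {
  n <- 2 * G
  data.frame(id = 1:n, i = rep(1:G, each = 2), Y = rbinom(n, 1, 0.5))
}

r <- 5
dat1 <- replicate(r, simulW(G = 10), simplify = FALSE)

# Di[[m]]: group-wise sums for replicate m
Di <- lapply(dat1, function(x) aggregate(Y ~ i, data = x, FUN = sum))

# di[[m]]: just the vector of group sums for replicate m
di <- lapply(Di, function(d) d$Y)

# e[m]: total of Y over all groups in replicate m
e <- sapply(di, sum)

print(e)
```

Each replicate is then available by position, for example Di[[3]] or e[3].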
We may use tidyverse
library(purrr)
library(dplyr)
map(dat1, ~ .x %>%
group_by(i) %>%
summarise(Y = sum(Y)))
-output
[[1]]
# A tibble: 10 × 2
i Y
<int> <int>
1 1 0
2 2 2
3 3 1
4 4 2
5 5 1
6 6 0
7 7 1
8 8 1
9 9 2
10 10 1
[[2]]
# A tibble: 10 × 2
i Y
<int> <int>
1 1 1
2 2 1
3 3 0
4 4 0
5 5 1
6 6 1
7 7 0
8 8 2
9 9 1
10 10 1
[[3]]
# A tibble: 10 × 2
i Y
<int> <int>
1 1 2
2 2 2
3 3 2
4 4 0
5 5 2
6 6 1
7 7 0
8 8 0
9 9 1
10 10 1
[[4]]
# A tibble: 10 × 2
i Y
<int> <int>
1 1 1
2 2 0
3 3 1
4 4 1
5 5 1
6 6 1
7 7 0
8 8 1
9 9 1
10 10 2
[[5]]
# A tibble: 10 × 2
i Y
<int> <int>
1 1 1
2 2 0
3 3 1
4 4 1
5 5 0
6 6 0
7 7 2
8 8 2
9 9 0
10 10 2
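
If a single table is easier to work with than a list of tables, the replicates can also be stacked with a replicate index and summed in one call. A base R sketch on a tiny hand-made list (the list reps and its values are hypothetical stand-ins for dat1):

```r
# Two hypothetical replicates with two groups each
reps <- list(
  data.frame(i = c(1, 1, 2, 2), Y = c(0, 1, 1, 1)),
  data.frame(i = c(1, 1, 2, 2), Y = c(1, 1, 0, 0))
)

# Stack the replicates, tagging each row with its replicate number
stacked <- do.call(rbind, lapply(seq_along(reps), function(m) {
  cbind(replicate = m, reps[[m]])
}))

# Sum Y within each (replicate, group) pair
sums <- aggregate(Y ~ replicate + i, data = stacked, FUN = sum)
sums <- sums[order(sums$replicate, sums$i), ]

print(sums)
```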

New variable based on first appearance of another variable in each group in R

I have a long data frame like this one:
set.seed(17)
players<-rep(1:2, c(5,5))
decs<-sample(1:3,10,replace=TRUE)
world<-sample(1:2,10,replace=TRUE)
gamematrix<-cbind(players,decs,world)
gamematrix<-data.frame(gamematrix)
gamematrix
players decs world
1 1 1 1
2 1 3 1
3 1 2 2
4 1 3 2
5 1 2 2
6 2 2 2
7 2 1 2
8 2 1 1
9 2 3 2
10 2 1 2
For each player, I want to create a new variable based on the first appearance of decs == 3 and the state of the world at that point.
That is, if the state of the world was 1 at the first appearance of decs == 3, the new variable should get the value 6; otherwise it should get 7, as follows:
players decs world player_type
1 1 1 1 6
2 1 3 1 6
3 1 2 2 6
4 1 3 2 6
5 1 2 2 6
6 2 2 2 7
7 2 1 2 7
8 2 1 1 7
9 2 3 2 7
10 2 1 2 7
Any ideas how to do it?
This tidyverse approach might be a little cumbersome but it should give you what you want.
library(tidyverse)
left_join(
gamematrix,
gamematrix %>%
filter(decs == 3) %>%
group_by(players) %>%
slice(1) %>%
mutate(player_type = ifelse(world == 1, 6, 7)) %>%
select(players, player_type),
by = 'players'
)
# players decs world player_type
#1 1 1 1 6
#2 1 3 1 6
#3 1 2 2 6
#4 1 3 2 6
#5 1 2 2 6
#6 2 2 2 7
#7 2 1 2 7
#8 2 1 1 7
#9 2 3 2 7
#10 2 1 2 7
The idea is to filter your data for observations where decs == 3, extract the first such row per 'players', add player_type depending on the state of the 'world', and finally merge the result with your original data.
An option is to use cumsum(decs==3) == 1 to find the first occurrence of decs == 3 for a player. Then dplyr::case_when can be used to assign the player type.
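
The same filter, take-first, merge sequence can be sketched in base R. The data frame below is typed in from the table printed in the question rather than regenerated from the seed, and the intermediate names hits and first_hits are made up for this illustration.

```r
# Data copied from the table shown in the question
gamematrix <- data.frame(
  players = rep(1:2, each = 5),
  decs    = c(1, 3, 2, 3, 2, 2, 1, 1, 3, 1),
  world   = c(1, 1, 2, 2, 2, 2, 2, 1, 2, 2)
)

# Step 1: keep only the rows where decs == 3
hits <- gamematrix[gamematrix$decs == 3, ]

# Step 2: keep the first such row per player
first_hits <- hits[!duplicated(hits$players), ]

# Step 3: derive the type from the state of the world on that row
first_hits$player_type <- ifelse(first_hits$world == 1, 6, 7)

# Step 4: merge back; all.x = TRUE keeps players that never chose 3 (as NA)
out <- merge(gamematrix, first_hits[, c("players", "player_type")],
             by = "players", all.x = TRUE)

print(out)
```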
library(dplyr)
gamematrix %>% group_by(players) %>%
mutate(player_type = case_when(
world[first(which(cumsum(decs==3)==1))] == 1 ~ 6L,
world[first(which(cumsum(decs==3)==1))] == 2 ~ 7L,
TRUE ~ NA_integer_))
# # A tibble: 10 x 4
# # Groups: players [2]
# players decs world player_type
# <int> <int> <int> <int>
# 1 1 1 1 6
# 2 1 3 1 6
# 3 1 2 2 6
# 4 1 3 2 6
# 5 1 2 2 6
# 6 2 2 2 7
# 7 2 1 2 7
# 8 2 1 1 7
# 9 2 3 2 7
# 10 2 1 2 7
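We could use data.table, taking the state of the world at the first row where decs == 3 (testing any(decs == 3 & world == 1) instead would also match later appearances)
library(data.table)
setDT(gamematrix)[, player_type := c(7, 6)[(world[which(decs == 3)[1]] == 1) + 1],
by = players]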
gamematrix
# players decs world player_type
# 1: 1 1 1 6
# 2: 1 3 1 6
# 3: 1 2 2 6
# 4: 1 3 2 6
# 5: 1 2 2 6
# 6: 2 2 2 7
# 7: 2 1 2 7
# 8: 2 1 1 7
# 9: 2 3 2 7
#10: 2 1 2 7
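
Note that "first appearance" and "any appearance" are different conditions, even though they agree on the sample data. A sketch with two hypothetical helper functions (type_from_first and type_from_any are names invented here) applied to a made-up player who picks 3 twice, first in world 2 and then in world 1:

```r
# Type based on the world at the FIRST row where decs == 3
type_from_first <- function(decs, world) {
  first_hit <- which(decs == 3)[1]   # NA if decs never equals 3
  if (is.na(first_hit)) return(NA_real_)
  if (world[first_hit] == 1) 6 else 7
}

# Type based on whether ANY row has decs == 3 together with world == 1
type_from_any <- function(decs, world) {
  if (any(decs == 3 & world == 1)) 6 else 7
}

# Hypothetical player: first 3 happens in world 2, second 3 in world 1
decs  <- c(1, 3, 3)
world <- c(1, 2, 1)

by_first <- type_from_first(decs, world)
by_any   <- type_from_any(decs, world)

print(by_first)
print(by_any)
```

For the requirement as stated in the question, the first-appearance version is the one that matches.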

count positive negative values in column by group

I want to create two variables giving me the total number of positive and negative values by id, hopefully using dplyr.
Example data:
library(dplyr)
set.seed(42)
df <- data.frame (id=rep(1:10,each=10),
ff=rnorm(100, 0,14 ))
> head(df,20)
id ff
1 1 19.1934183
2 1 -7.9057744
3 1 5.0837978
4 1 8.8600765
5 1 5.6597565
6 1 -1.4857432
7 1 21.1613080
8 1 -1.3252265
9 1 28.2579320
10 1 -0.8779974
11 2 18.2681752
12 2 32.0130355
13 2 -19.4440498
14 2 -3.9030427
15 2 -1.8664987
16 2 8.9033056
17 2 -3.9795409
18 2 -37.1903759
19 2 -34.1665370
20 2 18.4815868
the resulting dataset should look like:
> head(df,20)
id ff pos neg
1 1 19.1934183 6 4
2 1 -7.9057744 6 4
3 1 5.0837978 6 4
4 1 8.8600765 6 4
5 1 5.6597565 6 4
6 1 -1.4857432 6 4
7 1 21.1613080 6 4
8 1 -1.3252265 6 4
9 1 28.2579320 6 4
10 1 -0.8779974 6 4
11 2 18.2681752 4 6
12 2 32.0130355 4 6
13 2 -19.4440498 4 6
14 2 -3.9030427 4 6
15 2 -1.8664987 4 6
16 2 8.9033056 4 6
17 2 -3.9795409 4 6
18 2 -37.1903759 4 6
19 2 -34.1665370 4 6
20 2 18.4815868 4 6
I thought something like this would work:
df<-df%>% group_by(id) %>% mutate(pos= nrow(ff>0)) %>% ungroup()
Any help would be great, thanks.
You need sum():
df %>% group_by(id) %>%
mutate(pos = sum(ff>0),
neg = sum(ff<0))
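
The reason sum() works here is that a comparison yields a logical vector, and TRUE counts as 1 when summed; nrow() on a plain vector returns NULL, which is why the original attempt fails. A base R sketch on hypothetical values, using ave to repeat each count across its group (the small data frame is made up for illustration):

```r
# Hypothetical data; note the exact 0, which is neither positive nor negative
df <- data.frame(id = c(1, 1, 1, 2, 2),
                 ff = c(19.2, -7.9, 5.1, 0, -1.5))

# nrow() is only defined for objects with dimensions
no_rows <- is.null(nrow(df$ff > 0))

# Count TRUEs per id and repeat the count on every row of that id
df$pos <- ave(df$ff, df$id, FUN = function(z) sum(z > 0))
df$neg <- ave(df$ff, df$id, FUN = function(z) sum(z < 0))

print(df)
```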
For a fun (and a fast) solution data.table can also be used:
library(data.table)
setDT(df)
df[, `:=`(pos = sum(ff > 0), neg = sum(ff < 0)), by = id]
Here's an answer that adds the ifelse part of your question:
df <- df %>% group_by(id) %>%
mutate(pos = sum(ff>0), neg = sum(ff<0),
any_neg = ifelse(any(ff < 0), 1, 0))
Output:
> head(df, 20)
Source: local data frame [20 x 5]
Groups: id [2]
id ff pos neg any_neg
<int> <dbl> <int> <int> <dbl>
1 1 19.1934183 6 4 1
2 1 -7.9057744 6 4 1
3 1 5.0837978 6 4 1
4 1 8.8600765 6 4 1
5 1 5.6597565 6 4 1
6 1 -1.4857432 6 4 1
7 1 21.1613080 6 4 1
8 1 -1.3252265 6 4 1
9 1 28.2579320 6 4 1
10 1 -0.8779974 6 4 1
11 2 18.2681752 4 6 1
12 2 32.0130355 4 6 1
13 2 -19.4440498 4 6 1
14 2 -3.9030427 4 6 1
15 2 -1.8664987 4 6 1
16 2 8.9033056 4 6 1
17 2 -3.9795409 4 6 1
18 2 -37.1903759 4 6 1
19 2 -34.1665370 4 6 1
20 2 18.4815868 4 6 1

Using mutate to create a new column with the first value of each group in R

I'm currently working on a Sabermetric research project and I've been stuck all day trying to create a new column in a data frame that displays the starting pitcher for a given game. Essentially, if I use the sample below, I have data for 'a' and 'b', but I can't figure out how to create 'c' to be the first value of 'b' for each unique value of 'a'. This should be easy, but I just started learning R.
a b c
1 1 1 1
2 1 2 1
3 1 3 1
4 1 4 1
5 1 5 1
6 1 6 1
7 2 7 7
8 2 8 7
9 2 1 7
10 2 2 7
11 2 3 7
12 2 4 7
13 3 5 5
14 3 6 5
15 3 7 5
So far I've used mutate and group_by to come up with
sample <- sample %>% group_by(a) %>% mutate(c = first(b))
But this just makes every value of 'c' the first value of the first 'b'. So in the sample above, my current code makes every value of 'c' equal to 1.
I'm missing something, any suggestions?
We can use base R
df1$c <- with(df1, ave(b, a, FUN= function(x) head(x,1)))
Or with data.table
library(data.table)
setDT(df1)[, c:= head(b, 1), by = a]
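
For a self-contained check of the base R version, the sketch below rebuilds columns a and b from the table in the question (the name df1 follows the answer above) and confirms that each row receives the first b of its own group.

```r
# Columns a and b as shown in the question's sample table
df1 <- data.frame(a = rep(1:3, times = c(6, 6, 3)),
                  b = c(1:6, 7, 8, 1:4, 5:7))

# ave() applies the function within each value of a and recycles the
# single returned value across all rows of that group
df1$c <- ave(df1$b, df1$a, FUN = function(x) x[1])

print(df1)
```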
Using library dplyr, you can do something like this:
library(dplyr)
df %>% group_by(a) %>% mutate(c = b[1])
Output is as follows:
Source: local data frame [15 x 3]
Groups: a [3]
a b c
(int) (int) (int)
1 1 1 1
2 1 2 1
3 1 3 1
4 1 4 1
5 1 5 1
6 1 6 1
7 2 7 7
8 2 8 7
9 2 1 7
10 2 2 7
11 2 3 7
12 2 4 7
13 3 5 5
14 3 6 5
15 3 7 5
Changing columns to the types mentioned below in comments and running code produces desired output:
df$b <- as.factor(df$b)
df$a <- as.character(df$a)
str(df)
'data.frame': 15 obs. of 3 variables:
$ a: chr "1" "1" "1" "1" ...
$ b: Factor w/ 8 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 1 2 ...
$ c: int 1 1 1 1 1 1 7 7 7 7 ...
df %>% group_by(a) %>% mutate(c = b[1])
Source: local data frame [15 x 3]
Groups: a [3]
a b c
(chr) (fctr) (fctr)
1 1 1 1
2 1 2 1
3 1 3 1
4 1 4 1
5 1 5 1
6 1 6 1
7 2 7 7
8 2 8 7
9 2 1 7
10 2 2 7
11 2 3 7
12 2 4 7
13 3 5 5
14 3 6 5
15 3 7 5
Not so elegant but it works, I hope it works for you too:
df1 %>% group_by(a) %>% mutate(c = rep(first(b), length(a)))
Source: local data frame [15 x 3]
Groups: a [3]
a b c
(int) (int) (int)
1 1 1 1
2 1 2 1
3 1 3 1
4 1 4 1
5 1 5 1
6 1 6 1
7 2 7 7
8 2 8 7
9 2 1 7
10 2 2 7
11 2 3 7
12 2 4 7
13 3 5 5
14 3 6 5
15 3 7 5
