I'm trying to lag a variable by group. I found a post that deals with essentially the same problem I'm facing, but its solution does not work for me, and I have no idea why.
This is my problem:
library(dplyr)
df <- data.frame(monthvec = c(rep(1:2, 2), rep(3:5, 3)))
df <- df %>%
arrange(monthvec) %>%
mutate(growth=ifelse(monthvec==1, 0.3,
ifelse(monthvec==2, 0.5,
ifelse(monthvec==3, 0.7,
ifelse(monthvec==4, 0.1,
ifelse(monthvec==5, 0.6,NA))))))
df%>%
group_by(monthvec) %>%
mutate(lag.growth = lag(growth, order_by=monthvec))
Source: local data frame [13 x 3]
Groups: monthvec [5]
monthvec growth lag.growth
<int> <dbl> <dbl>
1 1 0.3 NA
2 1 0.3 0.3
3 2 0.5 NA
4 2 0.5 0.5
5 3 0.7 NA
6 3 0.7 0.7
7 3 0.7 0.7
8 4 0.1 NA
9 4 0.1 0.1
10 4 0.1 0.1
11 5 0.6 NA
12 5 0.6 0.6
13 5 0.6 0.6
This is what I'd like it to be in the end:
df$lag.growth <- c(NA, NA, 0.3, 0.3, 0.5, 0.5, 0.5, 0.7,0.7,0.7, 0.1,0.1,0.1)
monthvec growth lag.growth
1 1 0.3 NA
2 1 0.3 NA
3 2 0.5 0.3
4 2 0.5 0.3
5 3 0.7 0.5
6 3 0.7 0.5
7 3 0.7 0.5
8 4 0.1 0.7
9 4 0.1 0.7
10 4 0.1 0.7
11 5 0.6 0.1
12 5 0.6 0.1
13 5 0.6 0.1
I believe that one problem is that my groups are not of equal length...
Thanks for helping out.
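Here is an idea. We group by monthvec to get the number of rows (cnt) in each group. We then ungroup and use the first value of cnt as the size of the lag. Finally, we regroup on monthvec and replace the values in each group with the first lagged value of that group. This relies on the rows being sorted by monthvec and on no group before the last being smaller than the first one; otherwise the lag can reach past the previous group.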
library(dplyr)
df %>%
group_by(monthvec) %>%
mutate(cnt = n()) %>%
ungroup() %>%
mutate(lag.growth = lag(growth, first(cnt))) %>%
group_by(monthvec) %>%
mutate(lag.growth = first(lag.growth)) %>%
select(-cnt)
which gives,
# A tibble: 13 x 3
# Groups: monthvec [5]
monthvec growth lag.growth
<int> <dbl> <dbl>
1 1 0.3 NA
2 1 0.3 NA
3 2 0.5 0.3
4 2 0.5 0.3
5 3 0.7 0.5
6 3 0.7 0.5
7 3 0.7 0.5
8 4 0.1 0.7
9 4 0.1 0.7
10 4 0.1 0.7
11 5 0.6 0.1
12 5 0.6 0.1
13 5 0.6 0.1
You may join your original data with a dataframe with a shifted "monthvec".
left_join(df, df %>% mutate(monthvec = monthvec + 1) %>% unique(), by = "monthvec")
# monthvec growth.x growth.y
# 1 1 0.3 NA
# 2 1 0.3 NA
# 3 2 0.5 0.3
# 4 2 0.5 0.3
# 5 3 0.7 0.5
# 6 3 0.7 0.5
# 7 3 0.7 0.5
# 8 4 0.1 0.7
# 9 4 0.1 0.7
# 10 4 0.1 0.7
# 11 5 0.6 0.1
# 12 5 0.6 0.1
# 13 5 0.6 0.1
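Both answers above lean on the shape of the example data: the count-based lag depends on the group sizes, and the shifted join looks up monthvec - 1, so it needs consecutive month numbers. A minimal base-R sketch that instead lags over the distinct months present in the data could look like this. It assumes growth is constant within each month; the names months, growth_by_month and lagged are only illustrative.

```r
# Example data from the question
df <- data.frame(monthvec = c(rep(1:2, 2), rep(3:5, 3)))
df <- df[order(df$monthvec), , drop = FALSE]
df$growth <- c(0.3, 0.5, 0.7, 0.1, 0.6)[df$monthvec]

# One growth value per distinct month (assumes growth is constant within a month)
months <- sort(unique(df$monthvec))
growth_by_month <- df$growth[match(months, df$monthvec)]

# Shift by one month, then map the shifted values back onto every row
lagged <- c(NA, head(growth_by_month, -1))
df$lag.growth <- lagged[match(df$monthvec, months)]
df
```

Because the lag is taken over one value per month, the number of rows in each group no longer matters.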
I would like to create a new df that keeps only those subjects whose value in the second or third condition is greater than their value in the first condition.
Example df:
df1 <- data.frame(subject = rep(1:5, 3),
condition = rep(c("first", "second", "third"), each = 5),
values = c(.4, .4, .4, .4, .4, .6, .6, .6, .6, .4, .6, .6, .6, .4, .4))
> df1
subject condition values
1 1 first 0.4
2 2 first 0.4
3 3 first 0.4
4 4 first 0.4
5 5 first 0.4
6 1 second 0.6
7 2 second 0.6
8 3 second 0.6
9 4 second 0.6
10 5 second 0.4
11 1 third 0.6
12 2 third 0.6
13 3 third 0.6
14 4 third 0.4
15 5 third 0.4
The resulting df would be this:
> df2
subject condition values
1 1 first 0.4
2 2 first 0.4
3 3 first 0.4
4 4 first 0.4
6 1 second 0.6
7 2 second 0.6
8 3 second 0.6
9 4 second 0.6
11 1 third 0.6
12 2 third 0.6
13 3 third 0.6
14 4 third 0.4
Here, subject #5 is dropped because it does not meet the criterion: its value is not greater than its first-condition value in either the second or the third condition.
Thanks.
We may group by 'subject' and keep the groups in which any of the second or third 'values' is greater than the first:
library(dplyr)
df1 %>%
group_by(subject) %>%
filter(any(values[2:3] > first(values))) %>%
ungroup()
-output
# A tibble: 12 × 3
subject condition values
<int> <chr> <dbl>
1 1 first 0.4
2 2 first 0.4
3 3 first 0.4
4 4 first 0.4
5 1 second 0.6
6 2 second 0.6
7 3 second 0.6
8 4 second 0.6
9 1 third 0.6
10 2 third 0.6
11 3 third 0.6
12 4 third 0.4
Using ave.
df1[with(df1, ave(values, subject, FUN=\(x) any(x[2:3] > x[1])) == 1), ]
# subject condition values
# 1 1 first 0.4
# 2 2 first 0.4
# 3 3 first 0.4
# 4 4 first 0.4
# 6 1 second 0.6
# 7 2 second 0.6
# 8 3 second 0.6
# 9 4 second 0.6
# 11 1 third 0.6
# 12 2 third 0.6
# 13 3 third 0.6
# 14 4 third 0.4
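Both answers above pick the values by position (the first element and elements 2:3), which works only if each subject's rows appear in the order first, second, third. A base-R sketch that looks the baseline up by its condition label instead is given below. It assumes every subject has exactly one "first" row; is_first, baseline and keep_subjects are illustrative names.

```r
df1 <- data.frame(subject = rep(1:5, 3),
                  condition = rep(c("first", "second", "third"), each = 5),
                  values = c(.4, .4, .4, .4, .4, .6, .6, .6, .6, .4, .6, .6, .6, .4, .4))

# Baseline ("first") value of each subject, looked up by label rather than position
is_first <- df1$condition == "first"
baseline <- df1$values[is_first][match(df1$subject, df1$subject[is_first])]

# Subjects with at least one later condition above their own baseline
keep_subjects <- unique(df1$subject[!is_first & df1$values > baseline])
df2 <- df1[df1$subject %in% keep_subjects, ]
df2
```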
I have a data frame which is grouped by 'subject'. I want to filter for those subjects where all 'values' are above a certain threshold, here values > 0.5.
Example df:
df1 <- data.frame(subject = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
values = c(.4, .6, .6, .6, .6, .6, .6, .6, .6, .4))
df1
subject values
1 1 0.4
2 2 0.6
3 3 0.6
4 4 0.6
5 5 0.6
6 1 0.6
7 2 0.6
8 3 0.6
9 4 0.6
10 5 0.4
Desired output:
> df1
subject values
1 2 0.6
2 3 0.6
3 4 0.6
4 2 0.6
5 3 0.6
6 4 0.6
You can use all() inside a grouped filter using dplyr:
library(dplyr)
df1 %>%
group_by(subject) %>%
filter(all(values > .5)) %>%
ungroup()
Output:
# A tibble: 6 x 2
subject values
<dbl> <dbl>
1 2 0.6
2 3 0.6
3 4 0.6
4 2 0.6
5 3 0.6
6 4 0.6
Using min in ave.
df1[with(df1, ave(values, subject, FUN=min)) > .5, ]
# subject values
# 2 2 0.6
# 3 3 0.6
# 4 4 0.6
# 7 2 0.6
# 8 3 0.6
# 9 4 0.6
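One caveat with the min-based filter: if values contains NA, min returns NA for that subject, and a logical index containing NA yields rows filled with NA. A small sketch of one way around this follows, under the assumption (not stated in the question) that a subject with a missing value should be dropped. The data frame df_na and the helper all_above are hypothetical.

```r
# Hypothetical data with one missing value for subject 2
df_na <- data.frame(subject = c(1, 2, 3, 1, 2, 3),
                    values  = c(0.6, 0.6, 0.4, 0.6, NA, 0.6))

# 1 if every value is above the threshold, 0 otherwise (NA counts as failing)
all_above <- function(x, threshold) as.numeric(isTRUE(all(x > threshold)))

keep <- with(df_na, ave(values, subject, FUN = function(x) all_above(x, 0.5))) == 1
res <- df_na[keep, ]
res
```

isTRUE turns the NA produced by all() into FALSE, so the index passed to `[` never contains NA.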
A data.table approach:
library(data.table)
# keep the groups in which every value exceeds the threshold
setDT(df1)[, .SD[all(values > 0.5)], by = .(subject)][]
# subject values
# 1: 2 0.6
# 2: 2 0.6
# 3: 3 0.6
# 4: 3 0.6
# 5: 4 0.6
# 6: 4 0.6
I have data that looks like this:
set.seed(13)
dt <- data.frame(group = c(rep("a", 3), rep("b", 4), rep("c", 3)), var = c(rep(0.1,3), rep(0.3, 4), rep(1.1,3)))
dt
group var
1 a 0.1
2 a 0.1
3 a 0.1
4 b 0.3
5 b 0.3
6 b 0.3
7 b 0.3
8 c 1.1
9 c 1.1
10 c 1.1
I'd like to lag the variable var across the groups defined by group, so that every row in a group gets the value of the previous group. One difficulty is that the groups are of different sizes; otherwise this would be no problem, since I could specify n as the common group size. My data should look like the table below. How do I get this using dplyr, for example?
group var lag1.var lag2.var
1 a 0.1 NA NA
2 a 0.1 NA NA
3 a 0.1 NA NA
4 b 0.3 0.1 NA
5 b 0.3 0.1 NA
6 b 0.3 0.1 NA
7 b 0.3 0.1 NA
8 c 1.1 0.3 0.1
9 c 1.1 0.3 0.1
10 c 1.1 0.3 0.1
You can create a tibble with the lag variables for each group and then merge it with dt. Try this:
library(dplyr)
left_join(dt, dt %>%
group_by(group) %>%
mutate(var = first(var)) %>%
distinct() %>%
ungroup() %>%
mutate(lag1.var = lag(var, order_by = group),
lag2.var = lag(lag1.var, order_by = group)) %>%
select(-var),
by = "group")
# output
group var lag1.var lag2.var
1 a 0.1 NA NA
2 a 0.1 NA NA
3 a 0.1 NA NA
4 b 0.3 0.1 NA
5 b 0.3 0.1 NA
6 b 0.3 0.1 NA
7 b 0.3 0.1 NA
8 c 1.1 0.3 0.1
9 c 1.1 0.3 0.1
10 c 1.1 0.3 0.1
This assumes that var is constant within each group.
Here is another option. First we nest by group, then we map out the lagged values and then unnest.
library(tidyverse)
dt %>%
nest(-group) %>%
mutate(lag1.var = map_dbl(data, ~.x$var[[1]]) %>% lag(.), lag2.var = lag(lag1.var)) %>%
unnest
#> group lag1.var lag2.var var
#> 1 a NA NA 0.1
#> 2 a NA NA 0.1
#> 3 a NA NA 0.1
#> 4 b 0.1 NA 0.3
#> 5 b 0.1 NA 0.3
#> 6 b 0.1 NA 0.3
#> 7 b 0.1 NA 0.3
#> 8 c 0.3 0.1 1.1
#> 9 c 0.3 0.1 1.1
#> 10 c 0.3 0.1 1.1
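For an arbitrary lag order, the same idea can be wrapped in a small base-R function. This is only a sketch: lag_by_group is a hypothetical helper, it takes the first value of var in each group as that group's value, and it orders groups by first appearance.

```r
dt <- data.frame(group = c(rep("a", 3), rep("b", 4), rep("c", 3)),
                 var = c(rep(0.1, 3), rep(0.3, 4), rep(1.1, 3)))

# Hypothetical helper: value of `var` from n groups earlier
lag_by_group <- function(var, group, n = 1) {
  groups <- unique(group)                    # groups in order of appearance
  per_group <- var[match(groups, group)]     # first value of each group
  # shift the per-group values by n, padding with NA and keeping the length
  shifted <- c(rep(NA, n), head(per_group, -n))[seq_along(groups)]
  shifted[match(group, groups)]              # expand back to one value per row
}

dt$lag1.var <- lag_by_group(dt$var, dt$group, n = 1)
dt$lag2.var <- lag_by_group(dt$var, dt$group, n = 2)
dt
```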
I have two tables, df and cf. I want to multiply each value of A in df by a coefficient from cf, chosen by the values of B and C in the same row of df.
For example,
in the row of df with A = 20, B = 4 and C = 2, the correct coefficient is 0.3,
so the result is 20 * 0.3 = 6.
Is there a simple way to do that in R?
Thanks in advance!!
df
A B C
20 4 2
30 4 5
35 2 2
24 3 3
43 2 1
cf
C
B/C 1 2 3 4 5
1 0.2 0.3 0.5 0.6 0.7
2 0.1 0.5 0.3 0.3 0.4
3 0.9 0.1 0.6 0.6 0.8
4 0.7 0.3 0.7 0.4 0.6
One solution with apply:
#iterate over df's rows
apply(df, 1, function(x) {
x[1] * cf[x[2], x[3]]
})
#[1] 6.0 18.0 17.5 14.4 4.3
Try this vectorized:
df[,1] * cf[as.matrix(df[,2:3])]
#[1] 6.0 18.0 17.5 14.4 4.3
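The vectorised one-liner relies on matrix indexing: when a matrix is indexed by a two-column numeric matrix, each row of the index is read as one (row, column) pair. A self-contained sketch of that mechanism is shown below, assuming cf is stored as a plain numeric matrix whose rows correspond to B and whose columns correspond to C; idx and coef are illustrative names.

```r
df <- data.frame(A = c(20, 30, 35, 24, 43),
                 B = c(4, 4, 2, 3, 2),
                 C = c(2, 5, 2, 3, 1))
cf <- matrix(c(0.2, 0.3, 0.5, 0.6, 0.7,
               0.1, 0.5, 0.3, 0.3, 0.4,
               0.9, 0.1, 0.6, 0.6, 0.8,
               0.7, 0.3, 0.7, 0.4, 0.6),
             nrow = 4, byrow = TRUE)

# Each row of idx is one (row, column) pair into cf
idx <- cbind(df$B, df$C)
coef <- cf[idx]          # one coefficient per row of df
result <- df$A * coef
result
```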
A solution using dplyr and a vectorised function:
df = read.table(text = "
A B C
20 4 2
30 4 5
35 2 2
24 3 3
43 2 1
", header=TRUE, stringsAsFactors=FALSE)
cf = read.table(text = "
0.2 0.3 0.5 0.6 0.7
0.1 0.5 0.3 0.3 0.4
0.9 0.1 0.6 0.6 0.8
0.7 0.3 0.7 0.4 0.6
")
library(dplyr)
# function to get the correct element of cf
# vectorised version
f = function(x,y) cf[x,y]
f = Vectorize(f)
df %>%
mutate(val = f(B,C),
result = val * A)
# A B C val result
# 1 20 4 2 0.3 6.0
# 2 30 4 5 0.6 18.0
# 3 35 2 2 0.5 17.5
# 4 24 3 3 0.6 14.4
# 5 43 2 1 0.1 4.3
The final dataset has both result and val in order to check which value from cf was used each time.
I have an association matrix file that looks like this (4 rows and 3 columns).
test=read.table("test.csv", sep=",", header=T)
head(test)
LosAngeles SanDiego Seattle
1 2 3
A 1 0.1 0.2 0.2
B 2 0.2 0.4 0.2
C 3 0.3 0.5 0.3
D 4 0.2 0.5 0.1
What I want is to reshape this matrix into a data frame. The result should look something like this (12 (= 4 * 3) rows and 3 columns):
RowNum ColumnNum Value
1 1 0.1
2 1 0.2
3 1 0.3
4 1 0.2
1 2 0.2
2 2 0.4
3 2 0.5
4 2 0.5
1 3 0.2
2 3 0.2
3 3 0.3
4 3 0.1
That is, if my matrix has 100 rows and 90 columns, I want to make a new data frame that contains 9000 (= 100 * 90) rows and 3 columns. I've tried to use the reshape package, but I do not seem to be able to get it right. Any suggestions on how to solve this problem?
Use as.data.frame.table. It's the boss:
m <- matrix(data = c(0.1, 0.2, 0.2,
0.2, 0.4, 0.2,
0.3, 0.5, 0.3,
0.2, 0.5, 0.1),
nrow = 4, byrow = TRUE,
dimnames = list(row = 1:4, col = 1:3))
m
# col
# row 1 2 3
# 1 0.1 0.2 0.2
# 2 0.2 0.4 0.2
# 3 0.3 0.5 0.3
# 4 0.2 0.5 0.1
as.data.frame.table(m)
# row col Freq
# 1 1 1 0.1
# 2 2 1 0.2
# 3 3 1 0.3
# 4 4 1 0.2
# 5 1 2 0.2
# 6 2 2 0.4
# 7 3 2 0.5
# 8 4 2 0.5
# 9 1 3 0.2
# 10 2 3 0.2
# 11 3 3 0.3
# 12 4 3 0.1
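Two details may matter for the layout asked for in the question: as.data.frame.table returns the index columns as factors, and the value column is called Freq by default. A short sketch that names the value column through the responseName argument and converts the indices to integers is given here, assuming the dimnames are numeric labels as in the example; long is an illustrative name.

```r
m <- matrix(data = c(0.1, 0.2, 0.2,
                     0.2, 0.4, 0.2,
                     0.3, 0.5, 0.3,
                     0.2, 0.5, 0.1),
            nrow = 4, byrow = TRUE,
            dimnames = list(row = 1:4, col = 1:3))

# Name the value column directly instead of the default "Freq"
long <- as.data.frame.table(m, responseName = "Value")

# The index columns come back as factors; convert via character to integer
long$row <- as.integer(as.character(long$row))
long$col <- as.integer(as.character(long$col))
str(long)
```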
This should do the trick:
test <- as.matrix(read.table(text="
1 2 3
1 0.1 0.2 0.2
2 0.2 0.4 0.2
3 0.3 0.5 0.3
4 0.2 0.5 0.1", header=TRUE))
data.frame(which(test==test, arr.ind=TRUE),
Value=test[which(test==test)],
row.names=NULL)
# row col Value
#1 1 1 0.1
#2 2 1 0.2
#3 3 1 0.3
#4 4 1 0.2
#5 1 2 0.2
#6 2 2 0.4
#7 3 2 0.5
#8 4 2 0.5
#9 1 3 0.2
#10 2 3 0.2
#11 3 3 0.3
#12 4 3 0.1
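Note that test==test evaluates to NA for any NA cell, and which() drops those, so cells with missing values would silently disappear from the result. A sketch that avoids the comparison by using base R's row() and col(), which return the row and column index of every cell, could look like this; the column names follow the question and long is an illustrative name.

```r
m <- matrix(c(0.1, 0.2, 0.2,
              0.2, 0.4, 0.2,
              0.3, 0.5, 0.3,
              0.2, 0.5, 0.1),
            nrow = 4, byrow = TRUE)

# c() flattens each matrix column by column, so the three vectors line up
long <- data.frame(RowNum = c(row(m)),
                   ColumnNum = c(col(m)),
                   Value = c(m))
long
```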