Moving average and moving slope in R

I am looking to separately calculate a 7-day moving average and a 7-day moving slope of 'outcome'.
My sincere apologies that I didn't add the details below in my original post. These are repeated observations for each id, ranging from a minimum of 3 to a maximum of 100 observations per id. The start day can differ between IDs and, to make things more complicated, the days are not equally spaced, so some IDs have missing days.
Here is the data structure. Please note that 'average' is the variable I am trying to create as the 7-day moving average for each ID:
id day outcome average
1 1 15 100 NA
2 1 16 110 NA
3 1 17 190 NA
4 1 18 130 NA
5 1 19 140 NA
6 1 20 150 NA
7 1 21 160 140
8 1 22 100 140
9 1 23 180 150
10 1 24 120 140
11 2 16 90 NA
12 2 17 110 NA
13 2 18 120 NA
14 2 20 130 NA
15 3 16 110 NA
16 3 18 200 NA
17 3 19 180 NA
18 3 21 170 NA
19 3 22 180 168
20 3 24 210 188
21 3 25 160 180
22 3 27 200 184
Also, I would appreciate advice on how to calculate a 7-day moving slope on the same data.
Thank you, and again many apologies for being unclear the first time around.

The real challenge is to create a data.frame that fills in the missing days. One solution uses the zoo library: its rollapply function provides a way to compute the rolling statistic and assign NA to the initial rows.
Using the data from the OP as is, the solution could be:
library(zoo)
library(dplyr)
# Data from OP
df <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
day = c(15L,16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 16L, 17L, 18L, 20L,
16L, 18L, 19L, 21L, 22L, 24L, 25L, 27L),
outcome = c(100L, 110L,190L, 130L, 140L, 150L, 160L, 100L, 180L, 120L, 90L, 110L, 120L,
130L, 110L, 200L, 180L, 170L, 180L, 210L, 160L, 200L)),
.Names = c("id", "day", "outcome"), row.names = c(NA, -22L), class = "data.frame")
# Expand to all days (for every id) so that missing days become NA rows
df_complete <- merge(
  expand.grid(id = unique(df$id), day = min(df$day):max(df$day)),
  df, all = TRUE)
# Valid range of day for each id group
df_id_wise_range <- df %>%
  group_by(id) %>%
  summarise(min_day = min(day), max_day = max(day)) %>%
  as.data.frame()
#   id min_day max_day
# 1  1      15      24
# 2  2      16      20
# 3  3      16      27
# Join df_complete back to df, then use df_id_wise_range to keep only
# the valid range of day for each group
df_final <- df_complete %>%
  left_join(df, by = c("id", "day")) %>%
  select(-outcome.y) %>%
  inner_join(df_id_wise_range, by = "id") %>%
  filter(day >= min_day & day <= max_day) %>%
  mutate(outcome = outcome.x) %>%
  select(id, day, outcome) %>%
  as.data.frame()
# Now apply mean over a right-aligned 7-day window to get the average
df_average <- df_final %>%
  group_by(id) %>%
  mutate(average = rollapply(outcome, 7, mean, na.rm = TRUE, by = 1,
                             fill = NA, align = "right", partial = 7)) %>%
  as.data.frame()
df_average
# The result
# id day outcome average
#1 1 15 100 NA
#2 1 16 110 NA
#3 1 17 190 NA
#4 1 18 130 NA
#5 1 19 140 NA
#6 1 20 150 NA
#7 1 21 160 140.0
#8 1 22 100 140.0
#9 1 23 180 150.0
#10 1 24 120 140.0
#11 2 16 90 NA
#12 2 17 110 NA
#13 2 18 120 NA
#....
#....
#19 3 19 180 NA
#20 3 20 NA NA
#21 3 21 170 NA
#22 3 22 180 168.0
#23 3 23 NA 182.5
#24 3 24 210 188.0
#25 3 25 160 180.0
#26 3 26 NA 180.0
#27 3 27 200 184.0
The steps to calculate the moving slope are:
First create a function that returns the slope.
Then use that function as part of rollapplyr.
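The data frame z2 used below is not defined in the original answer; a plausible construction (an assumption on my part) that reproduces the printed output, including the mean_a column, is:
library(zoo)
z2 <- data.frame(a = 1:20, b = 21:40)
# right-aligned 7-point rolling mean of a; the first six rows are NA
z2$mean_a <- rollapplyr(z2$a, 7, mean, fill = NA)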
#Function to calculate slope
slop_e <- function(z) coef(lm(b ~ a, as.data.frame(z)))[[2]]
#Apply function
z2$slope <- rollapplyr(zoo(z2), 7, slop_e , by.column = FALSE, fill = NA, align = "right")
z2
a b mean_a slope
1 1 21 NA NA
2 2 22 NA NA
3 3 23 NA NA
4 4 24 NA NA
5 5 25 NA NA
6 6 26 NA NA
7 7 27 4 1
8 8 28 5 1
9 9 29 6 1
10 10 30 7 1
11 11 31 8 1
12 12 32 9 1
13 13 33 10 1
14 14 34 11 1
15 15 35 12 1
16 16 36 13 1
17 17 37 14 1
18 18 38 15 1
19 19 39 16 1
20 20 40 17 1
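The z2 example does not touch the OP's data. A hedged sketch of how the same idea could be applied per id to df_final from above, reusing the rollapply arguments of the moving-average step (df_slope and the inline window function are illustrative, not from the original answer):
# 7-day moving slope per id: regress outcome on day within each right-aligned
# 7-day window; windows with fewer than two non-missing points return NA
df_slope <- df_final %>%
  group_by(id) %>%
  mutate(slope = rollapply(
    seq_len(n()), 7,
    function(ix) {
      ok <- !is.na(outcome[ix])
      if (sum(ok) < 2) return(NA_real_)
      coef(lm(outcome[ix] ~ day[ix]))[[2]]
    },
    by = 1, fill = NA, align = "right", partial = 7)) %>%
  as.data.frame()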

Related

Identify top n observations and aggregate by year all variables in dataframe R

In the following dataframe I would like to identify, for every year, the two observations by "id" with the highest value of the variable "op", and then aggregate those two highest observations for the variables op, tr and cp. I would appreciate an answer with dplyr. My original dataframe has hundreds of thousands of observations, so I need something I can adjust if I want to select, say, the 1000 highest "op" observations by year.
Data:
year id op tr cp
1 1984 1 10 10 10
2 1985 1 20 20 20
3 1986 1 30 30 30
4 1987 1 40 40 40
5 1988 1 50 50 50
6 1985 2 15 15 15
7 1986 2 17 17 17
8 1987 2 18 18 18
9 1988 2 19 19 19
10 1985 3 20 20 20
11 1986 3 22 22 22
12 1986 4 10 10 10
13 1987 4 20 20 20
14 1988 4 40 40 40
Expected output:
year2 op2 tr2 cp2
1 1984 10 10 10
2 1985 40 40 40
3 1986 52 52 52
4 1987 60 60 60
5 1988 90 90 90
So in 1984 the highest op is aggregated from id = 1, in 1985 from id = 1 and 3, in 1986 from id = 1 and 3, in 1987 from id = 1 and 4, and in 1988 from id = 1 and 4.
I would like to avoid writing a custom function, but I am not sure whether that is possible; a well-working function would also be fine.
You could group_by and then reframe/summarise across the columns, sorting the values in descending order and summing the two highest, like this. (Note: na.rm = TRUE is needed because the 1984 group has only one value, so selecting two values produces an NA.)
library(dplyr)
df %>%
  select(-id) %>%
  group_by(year) %>%
  reframe(across(op:cp, ~ sum(sort(.x, decreasing = TRUE)[1:2], na.rm = TRUE)))
#> # A tibble: 5 × 4
#> year op tr cp
#> <int> <int> <int> <int>
#> 1 1984 10 10 10
#> 2 1985 40 40 40
#> 3 1986 52 52 52
#> 4 1987 60 60 60
#> 5 1988 90 90 90
Created on 2023-01-14 with reprex v2.0.2
One difference between reframe and summarise is that reframe always returns an ungrouped data frame and is allowed to return any number of rows per group.
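As a small illustration of that last point (a sketch, not part of the original answer): reframe happily returns several rows per group, for example keeping the two largest op values as separate rows, which summarise() warns about in recent dplyr versions:
library(dplyr)
# Two rows per year: the two largest op values kept as separate rows
df %>%
  group_by(year) %>%
  reframe(op = sort(op, decreasing = TRUE)[1:2])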
You could also use summarise like this:
library(dplyr)
df %>%
  group_by(year) %>%
  summarise(across(op:cp, ~ sum(sort(.x, decreasing = TRUE)[1:2], na.rm = TRUE)))
Data
data <-
structure(list(year = c(1984L, 1985L, 1986L, 1987L, 1988L, 1985L,
1986L, 1987L, 1988L, 1985L, 1986L, 1986L, 1987L, 1988L),
id = c(1L,1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L),
op = c(10L,20L, 30L, 40L, 50L, 15L, 17L, 18L, 19L, 20L, 22L, 10L, 20L, 40L),
tr = c(10L, 20L, 30L, 40L, 50L, 15L, 17L, 18L, 19L, 20L, 22L,10L, 20L, 40L),
cp = c(10L, 20L, 30L, 40L, 50L, 15L, 17L, 18L,19L, 20L, 22L, 10L, 20L, 40L)),
class = "data.frame",row.names = c(NA,-14L))
Code
library(dplyr)
data %>%
  select(-id) %>%
  group_by(year) %>%
  slice_max(n = 2, order_by = op) %>%
  summarise(across(.fns = ~ sum(., na.rm = TRUE)))
Output
# A tibble: 5 x 4
year op tr cp
<int> <int> <int> <int>
1 1984 10 10 10
2 1985 40 40 40
3 1986 52 52 52
4 1987 60 60 60
5 1988 90 90 90
Using data.table
library(data.table)
setDT(df1)[, lapply(.SD, \(x) sum(head(x[order(-x)],2), na.rm = TRUE)),
year, .SDcols = op:cp]
Output
year op tr cp
1: 1984 10 10 10
2: 1985 40 40 40
3: 1986 52 52 52
4: 1987 60 60 60
5: 1988 90 90 90
Please try the code below (here data_a is the OP's data, defined above as data):
library(dplyr)
data2 <- data_a %>%
  dplyr::arrange(year, desc(op), id) %>%
  group_by(year) %>%
  slice_head(n = 2) %>%
  mutate(across(c('op', 'tr', 'cp'), ~ sum(.x), .names = '{col}2')) %>%
  slice_head(n = 1) %>%
  select(-id, -op, -tr, -cp)

Change the values for multiple columns of a data frame to NA that are located within the range of two vectors

Using R: I want to set the values in each column of my df to NA wherever the row falls within the range given by the corresponding elements of two other vectors.
It works when I have only one column each:
df
days X1
1 20
2 30
3 50
4 10
5 10
6 20
7 10
8 70
9 90
10 20
start_vec<-c(4)
end_vec<-c(8)
by using:
df[df$days %in% start_vec:end_vec, ]<-NA
where I get:
desired_df
days X1
1 20
2 30
3 50
NA NA
NA NA
NA NA
NA NA
NA NA
9 90
10 20
But actually I have a dataframe with more than three columns, and more than three start and end values, as you can see here:
df
days X1 X2 X3 Xn
1 20 10 20 ...
2 30 50 40 ...
3 50 40 40 ...
4 10 70 20 ...
5 10 10 30 ...
6 20 80 50 ...
7 10 30 70 ...
8 70 10 10 ...
9 90 10 70 ...
10 20 50 10 ...
start_vec<-c(4,5,2,n)
end_vec<-c(8,9,7,n)
my desired dataframe is this:
desired_df
days X1 X2 X3 Xn
1 20 10 20 ...
2 30 50 NA ...
3 50 40 NA ...
4 NA 70 NA ...
5 NA NA NA ...
6 NA NA NA ...
7 NA NA NA ...
8 NA NA 10 ...
9 90 NA 70 ...
10 20 50 10 ...
The NAs for X1 of df are defined by the range between the 1st elements of start_vec and end_vec,
X2 of df by the range between the 2nd elements, and so on...
I need an approach that works for n columns, since my original data frame has more than 100 columns; for each column I have the corresponding start and end values.
I tried apply in order to apply the condition to every column:
desired_df <- apply(df, 2, function(x) x[df$days %in% start_vec:end_vec] <- NA)
What I get is a df with NAs only.
Do you have any idea how I can change the values for each column of my df to NA where days or the index is within the range of the corresponding vectors?
Thanks for any help!
df <- data.frame(
days = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L),
X1 = c(20L, 30L, 50L, 10L, 10L, 20L, 10L, 70L, 90L, 20L),
X2 = c(10L, 50L, 40L, 70L, 10L, 80L, 30L, 10L, 10L, 50L),
X3 = c(20L, 40L, 40L, 20L, 30L, 50L, 70L, 10L, 70L, 10L)
)
start_df <- c(4, 5, 2)
end_df <- c(8, 9, 7)
# (row, col) positions to blank out; col counts columns after dropping 'days'
mat <- data.frame(row = c(4:8, 5:9, 2:7),
                  col = c(rep(1, 5), rep(2, 5), rep(3, 6)))
df[-1][as.matrix(mat[, c("row", "col")])] <- NA
df
#> days X1 X2 X3
#> 1 1 20 10 20
#> 2 2 30 50 NA
#> 3 3 50 40 NA
#> 4 4 NA 70 NA
#> 5 5 NA NA NA
#> 6 6 NA NA NA
#> 7 7 NA NA NA
#> 8 8 NA NA 10
#> 9 9 90 NA 70
#> 10 10 20 50 10
Created on 2022-08-19 with reprex v2.0.2
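The mat above is hard-coded for three columns; to scale to n columns (the OP's actual requirement), the index matrix could be built programmatically from the start/end vectors. A sketch (not part of the original answer), assuming start_df and end_df hold one entry per data column, in column order:
# Build the (row, col) index matrix from the start/end vectors, one block per column
mat <- do.call(rbind, Map(
  function(s, e, j) cbind(row = which(df$days >= s & df$days <= e), col = j),
  start_df, end_df, seq_along(start_df)
))
df[-1][mat] <- NA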

lapply alternative to for loop to append to data frame

I have a data frame:
df<-structure(list(chrom = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"),
pos = c(10L, 200L, 134L, 400L, 600L, 1000L, 20L, 33L, 40L,
45L, 50L, 55L, 100L, 123L)), .Names = c("chrom", "pos"), row.names = c(NA, -14L), class = "data.frame")
> head(df)
chrom pos
1 1 10
2 1 200
3 1 134
4 1 400
5 1 600
6 1 1000
And I want to calculate pos[i+1] - pos[i] within the same chromosome (chrom).
By using a for loop over each chrom level, and another over each row, I get the expected results:
library(dplyr)  # filter() and arrange() come from dplyr
for (c in levels(df$chrom)) {
  df_chrom <- filter(df, chrom == c)
  df_chrom <- arrange(df_chrom, pos)
  for (i in 1:nrow(df_chrom)) {
    dist <- df_chrom$pos[i + 1] - df_chrom$pos[i]
    logdist <- log10(dist)
    cat(c, i, df_chrom$pos[i], dist, logdist, "\n")
  }
}
However, I want to save this to a data frame, and I think lapply or apply is the right way to go about this. I can't work out how to express the pos[i+1] - pos[i] calculation, though (seeing as lapply works on each row/column).
Any pointers would be appreciated.
Here's the output from my solution:
chrom index pos dist log10dist
1 1 10 124 2.093422
1 2 134 66 1.819544
1 3 200 200 2.30103
1 4 400 200 2.30103
1 5 600 400 2.60206
1 6 1000 NA NA
2 1 20 13 1.113943
2 2 33 NA NA
3 1 40 5 0.69897
3 2 45 NA NA
4 1 50 5 0.69897
4 2 55 45 1.653213
4 3 100 23 1.361728
4 4 123 NA NA
We could do this using a grouped difference. Convert the 'data.frame' to a 'data.table' (setDT(df)) and then, grouped by 'chrom', order by 'pos', take the difference of 'pos' (diff), and also the log10 of that difference:
library(data.table)
setDT(df)[order(pos), {
  v1 <- diff(pos)
  .(index = seq_len(.N), pos = pos,
    dist = c(v1, NA), logdiff = c(log10(v1), NA))
}, by = chrom]
# chrom index pos dist logdiff
# 1: 1 1 10 124 2.093422
# 2: 1 2 134 66 1.819544
# 3: 1 3 200 200 2.301030
# 4: 1 4 400 200 2.301030
# 5: 1 5 600 400 2.602060
# 6: 1 6 1000 NA NA
# 7: 2 1 20 13 1.113943
# 8: 2 2 33 NA NA
# 9: 3 1 40 5 0.698970
#10: 3 2 45 NA NA
#11: 4 1 50 5 0.698970
#12: 4 2 55 45 1.653213
#13: 4 3 100 23 1.361728
#14: 4 4 123 NA NA
Upon running the OP's code, the printed output is:
#1 1 10 124 2.093422
#1 2 134 66 1.819544
#1 3 200 200 2.30103
#1 4 400 200 2.30103
#1 5 600 400 2.60206
#1 6 1000 NA NA
#2 1 20 13 1.113943
#2 2 33 NA NA
#3 1 40 5 0.69897
#3 2 45 NA NA
#4 1 50 5 0.69897
#4 2 55 45 1.653213
#4 3 100 23 1.361728
#4 4 123 NA NA
We split df by df$chrom (note that we reorder both df and df$chrom before splitting). Then we go through each subgroup (called a in this example) using lapply. On the pos column of each subgroup we calculate the difference (diff) of consecutive elements and take log10. Since diff shortens the result by one element, we append an NA at the end. Finally, we rbind all the subgroups together using do.call.
do.call(rbind, lapply(
  split(df[order(df$chrom, df$pos), ], df$chrom[order(df$chrom, df$pos)]),
  function(a) data.frame(a, dist = c(log10(diff(a$pos)), NA))
))
# chrom pos dist
#1.1 1 10 2.093422
#1.3 1 134 1.819544
#1.2 1 200 2.301030
#1.4 1 400 2.301030
#1.5 1 600 2.602060
#1.6 1 1000 NA
#2.7 2 20 1.113943
#2.8 2 33 NA
#3.9 3 40 0.698970
#3.10 3 45 NA
#4.11 4 50 0.698970
#4.12 4 55 1.653213
#4.13 4 100 1.361728
#4.14 4 123 NA
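For completeness, the same grouped difference can also be written with dplyr's lead(); this is a sketch, not part of the original answers, producing the columns of the OP's desired output:
library(dplyr)
df %>%
  arrange(chrom, pos) %>%
  group_by(chrom) %>%
  mutate(index = row_number(),
         dist = lead(pos) - pos,        # pos[i + 1] - pos[i] within each chrom
         log10dist = log10(dist)) %>%
  ungroup()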

Combine two data frames considering the levels of a factor in one data frame and the column names of another data frame using R

I need to create a new column in an existing data frame based on the levels of a factor. I have 2 data frames, called dat_group and dat_price, which look like below.
dat_group
Group
1 A
2 A
3 A
4 A
5 A
6 A
7 A
8 A
9 A
10 A
11 C
12 C
13 C
14 C
15 C
16 C
17 C
18 C
19 C
20 C
21 B
22 B
23 B
24 B
25 B
26 B
27 B
28 B
29 B
30 B
dat_price
A B C
1 21 45 24
2 21 45 24
3 21 45 24
4 21 45 24
5 15 11 10
6 15 11 10
7 15 11 10
8 20 13 55
9 20 13 55
10 20 13 55
I need to fill in the values from the A, B and C columns according to the level in dat_group, keeping the same row order within each group. If I create a new column in dat_group called "Price":
dat_group$Price <- NA
Then the data frame should look like this:
Group Price
1 A 21
2 A 21
3 A 21
4 A 21
5 A 15
6 A 15
7 A 15
8 A 20
9 A 20
10 A 20
11 C 24
12 C 24
13 C 24
14 C 24
15 C 10
16 C 10
17 C 10
18 C 55
19 C 55
20 C 55
21 B 45
22 B 45
23 B 45
24 B 45
25 B 11
26 B 11
27 B 11
28 B 13
29 B 13
30 B 13
I tried to adapt some available examples, but they did not work.
Could anybody please help me? The two example data frames can be created with the code below. My actual data set has several thousand rows.
dat_group<- structure(list(Group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B", "C"), class = "factor")), .Names = "Group", class = "data.frame", row.names = c(NA,
-30L))
dat_price<-structure(list(A = c(21L, 21L, 21L, 21L, 15L, 15L, 15L, 20L,
20L, 20L), B = c(45L, 45L, 45L, 45L, 11L, 11L, 11L, 13L, 13L,
13L), C = c(24L, 24L, 24L, 24L, 10L, 10L, 10L, 55L, 55L, 55L)), .Names = c("A",
"B", "C"), class = "data.frame", row.names = c(NA, -10L))
library(data.table)
dat_price <- as.data.table(dat_price)
# Reorder the price columns to A, C, B so they match the order in which the
# groups appear in dat_group, then melt them into a single long column
dat_price_new <- cbind(dat_price[, c(1, 3), with = FALSE],
                       dat_price[, 2, with = FALSE])
# The molten 'value' column is the Price column in the required order
# (this assumes each group occupies a block of exactly nrow(dat_price) rows)
dat_group$Price <- melt(dat_price_new)$value
A more defensive solution to your problem at hand. Hopefully this will work even if all of your factor's levels are not in identical multiples.
library(dplyr); library(purrr); library(magrittr)
dat_group$original_order <- seq_len(nrow(dat_group))
dat_group %<>%
  split(.$Group) %>%
  map(~ mutate(., Price = rep(na.omit(dat_price[, unique(Group)]),
                              n() / length(na.omit(dat_price[, unique(Group)]))))) %>%
  bind_rows() %>%
  arrange(original_order) %>%
  select(-original_order)
dat_group
Group Price
1 A 21
2 A 21
3 A 21
4 A 21
5 A 15
6 A 15
7 A 15
8 A 20
9 A 20
10 A 20
11 C 24
12 C 24
13 C 24
14 C 24
15 C 10
16 C 10
17 C 10
18 C 55
19 C 55
20 C 55
21 B 45
22 B 45
23 B 45
24 B 45
25 B 11
26 B 11
27 B 11
28 B 13
29 B 13
30 B 13
Original (lazy) solution, which assumes the groups appear in dat_group in blocks that follow the same order and size as the columns of dat_price:
dat_group$Price <- rep(unlist(dat_price), length.out = nrow(dat_group))
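A shorter dplyr-only sketch (not from the original answers) that avoids the split/map machinery; it assumes the rows within each group of dat_group line up one-to-one, in order, with the rows of dat_price:
library(dplyr)
dat_group %>%
  group_by(Group) %>%
  # the i-th row of each group takes the i-th price from that group's column
  mutate(Price = dat_price[[as.character(first(Group))]][row_number()]) %>%
  ungroup()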

R: How to create a new data frame from two other data frames

I have two data frames containing related data. It is related to the NFL. One df has player names and receiving targets by week (player df):
Player Tm Position 1 2 3 4 5 6
1 A.J. Green CIN WR 13 8 11 12 8 10
2 Aaron Burbridge SFO WR 0 1 0 2 0 0
3 Aaron Ripkowski GNB RB 0 0 0 0 0 1
4 Adam Humphries TAM WR 5 8 12 4 2 0
5 Adam Thielen MIN WR 5 5 4 3 8 0
6 Adrian Peterson MIN RB 2 3 0 0 0 0
The other data frame has receiving targets summed by team for each week (team df):
Tm `1` `2` `3` `4` `5` `6`
<fctr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ARI 37 35 50 45 26 35
2 ATL 38 34 30 37 28 41
3 BAL 32 45 40 51 47 48
4 BUF 22 30 20 33 20 26
5 CAR 31 39 36 47 28 46
6 CHI 28 29 45 36 41 49
7 CIN 30 54 28 31 39 31
8 CLE 26 33 38 38 35 42
9 DAL 43 30 24 32 24 27
10 DEN 26 32 35 31 34 47
# ... with 22 more rows
What I am trying to do is create another data frame containing the target percentage by player, by week. So I need to match the team from the "Tm" column in the player df and the week column header (1-6).
I have figured out how to do this by merging them and then creating new columns, but as I add more data (weeks) I need to write more code:
a <- merge(playertgt, teamtgt, by="Tm") #merges the two
a$Wk1 <- a$`1.x` / a$`1.y`
a$Wk2 <- a$`2.x` / a$`2.y`
a$Wk3 <- a$`3.x` / a$`3.y`
So what I am looking for is a good way to do this that updates automatically as I add new weeks to my source data, without making me create a data frame with a bunch of columns I don't need.
If this is answered somewhere else I apologize, but I have been looking for a good way to do this for a day now and I can't find it. Thanks in advance for your help!
You can do this with dplyr:
library(dplyr)
## Do a left outer join to match each player with total team targets
a <- left_join(playertgt,teamtgt, by="Tm")
## Compute percentage over all weeks selecting player columns ending with ".x"
## and dividing by corresponding team columns ending with ".y"
tgt.pct <- select(a,ends_with(".x")) / select(a,ends_with(".y"))
## set the column names to week + number
colnames(tgt.pct) <- paste0("week",seq_len(ncol(teamtgt)-1))
## construct the output data frame adding back the player and team columns
tgt.pct <- data.frame(Player=playertgt$Player,Tm=playertgt$Tm,tgt.pct)
Clearly, I am only using dplyr for the convenience of ends_with in selecting the columns after the join. A base-R approach using grepl to do this selection is:
a <- merge(playertgt, teamtgt, by="Tm", all.x=TRUE)
tgt.pct <- subset(a, select=grepl("\\.x$", colnames(a))) / subset(a, select=grepl("\\.y$", colnames(a)))
colnames(tgt.pct) <- paste0("week",seq_len(ncol(teamtgt)-1))
tgt.pct <- data.frame(Player=playertgt$Player,Tm=playertgt$Tm,tgt.pct)
Data: with your limited posted data, only AJ Green will have his target percentage computed:
playertgt <- structure(list(Player = structure(1:6, .Label = c("A.J. Green",
"Aaron Burbridge", "Aaron Ripkowski", "Adam Humphries", "Adam Thielen",
"Adrian Peterson"), class = "factor"), Tm = structure(c(1L, 4L,
2L, 5L, 3L, 3L), .Label = c("CIN", "GNB", "MIN", "SFO", "TAM"
), class = "factor"), Position = structure(c(2L, 2L, 1L, 2L,
2L, 1L), .Label = c("RB", "WR"), class = "factor"), X1 = c(13L,
0L, 0L, 5L, 5L, 2L), X2 = c(8L, 1L, 0L, 8L, 5L, 3L), X3 = c(11L,
0L, 0L, 12L, 4L, 0L), X4 = c(12L, 2L, 0L, 4L, 3L, 0L), X5 = c(8L,
0L, 0L, 2L, 8L, 0L), X6 = c(10L, 0L, 1L, 0L, 0L, 0L)), .Names = c("Player",
"Tm", "Position", "X1", "X2", "X3", "X4", "X5", "X6"), class = "data.frame", row.names = c(NA,
-6L))
## Player Tm Position X1 X2 X3 X4 X5 X6
##1 A.J. Green CIN WR 13 8 11 12 8 10
##2 Aaron Burbridge SFO WR 0 1 0 2 0 0
##3 Aaron Ripkowski GNB RB 0 0 0 0 0 1
##4 Adam Humphries TAM WR 5 8 12 4 2 0
##5 Adam Thielen MIN WR 5 5 4 3 8 0
##6 Adrian Peterson MIN RB 2 3 0 0 0 0
teamtgt <- structure(list(Tm = structure(1:10, .Label = c("ARI", "ATL",
"BAL", "BUF", "CAR", "CHI", "CIN", "CLE", "DAL", "DEN"), class = "factor"),
X1 = c(37L, 38L, 32L, 22L, 31L, 28L, 30L, 26L, 43L, 26L),
X2 = c(35L, 34L, 45L, 30L, 39L, 29L, 54L, 33L, 30L, 32L),
X3 = c(50L, 30L, 40L, 20L, 36L, 45L, 28L, 38L, 24L, 35L),
X4 = c(45L, 37L, 51L, 33L, 47L, 36L, 31L, 38L, 32L, 31L),
X5 = c(26L, 28L, 47L, 20L, 28L, 41L, 39L, 35L, 24L, 34L),
X6 = c(35L, 41L, 48L, 26L, 46L, 49L, 31L, 42L, 27L, 47L)), .Names = c("Tm",
"X1", "X2", "X3", "X4", "X5", "X6"), class = "data.frame", row.names = c(NA,
-10L))
## Tm X1 X2 X3 X4 X5 X6
##1 ARI 37 35 50 45 26 35
##2 ATL 38 34 30 37 28 41
##3 BAL 32 45 40 51 47 48
##4 BUF 22 30 20 33 20 26
##5 CAR 31 39 36 47 28 46
##6 CHI 28 29 45 36 41 49
##7 CIN 30 54 28 31 39 31
##8 CLE 26 33 38 38 35 42
##9 DAL 43 30 24 32 24 27
##10 DEN 26 32 35 31 34 47
The result is:
## Player Tm week1 week2 week3 week4 week5 week6
##1 A.J. Green CIN 0.4333333 0.1481481 0.3928571 0.3870968 0.2051282 0.3225806
##2 Aaron Burbridge SFO NA NA NA NA NA NA
##3 Aaron Ripkowski GNB NA NA NA NA NA NA
##4 Adam Humphries TAM NA NA NA NA NA NA
##5 Adam Thielen MIN NA NA NA NA NA NA
##6 Adrian Peterson MIN NA NA NA NA NA NA
It would be nice if you provided a bit of data next time; that makes life a lot easier.
I think the main point is your data structure. I think you have to put your data into a long format (keyword is tidy-data I guess). I made up some data and hope I understood your problem correctly.
library(tidyr)
library(dplyr)
player_df = data.frame(team = c('ARI', 'BAL', 'BAL', 'CLE', 'CLE'),
player =c('A', 'B', 'C', 'D', 'F'),
'1' = floor(runif(5, min=1, max=2)*10),
'2' = floor(runif(5, min=1, max=2)*10))
> player_df
team player X1 X2
1 ARI A 15 10
2 BAL B 16 15
3 BAL C 13 11
4 CLE D 14 19
5 CLE F 12 14
team_df = data.frame(team = c('ARI', 'BAL', 'CLE'),
'1' = floor(runif(3, min=10, max=20)*20),
'2' = floor(runif(3, min=10, max=20)*20))
> team_df
team X1 X2
1 ARI 281 205
2 BAL 362 309
3 CLE 323 238
Now, put both dataframes into a long format:
player_df = gather(player_df, week, player_value, -team, -player)
team_df = gather(team_df, week, team_value, -team)
> player_df
team player week player_value
1 ARI A X1 15
2 BAL B X1 16
3 BAL C X1 13
4 CLE D X1 14
5 CLE F X1 12
6 ARI A X2 10
7 BAL B X2 15
8 BAL C X2 11
9 CLE D X2 19
10 CLE F X2 14
> team_df
team week team_value
1 ARI X1 281
2 BAL X1 362
3 CLE X1 323
4 ARI X2 205
5 BAL X2 309
6 CLE X2 238
Now, join (or merge) them together. inner_join will by default join on common column names.
join_db = inner_join(player_df, team_df)
> join_db
team player week player_value team_value
1 ARI A X1 15 281
2 BAL B X1 16 362
3 BAL C X1 13 362
4 CLE D X1 14 323
5 CLE F X1 12 323
6 ARI A X2 10 205
7 BAL B X2 15 309
8 BAL C X2 11 309
9 CLE D X2 19 238
10 CLE F X2 14 238
I think in that format you can do a lot more.
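For example (not part of the original answer), the per-week target share would then be a single mutate; the column name target_pct is purely illustrative:
library(dplyr)
join_db %>%
  mutate(target_pct = player_value / team_value)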
HTH
Stefan
