I have found a lot of material about counting values in a table, but my goal is a little different and I haven't found any source that covers it.
This is my data:
ID_1 ID_2 Date RESULT
1 12 3 2011-12-21 0
2 12 13 2011-12-22 0
3 3 12 2011-12-22 1
4 15 13 2011-12-23 0
5 12 13 2011-12-23 1
5 13 15 2011-12-23 1
6 3 12 2011-12-23 0
7 12 13 2011-12-23 0
TARGET
ID_1 ID_2 Date RESULT H2H_ID1 H2H_ID2
1 12 3 2011-12-21 0 0 0
2 12 13 2011-12-22 0 0 0
3 3 12 2011-12-22 1 1 0
4 15 13 2011-12-23 0 0 0
5 12 13 2011-12-23 1 0 1
5 13 15 2011-12-23 1 1 0
6 3 12 2011-12-23 0 2 0
7 12 13 2011-12-23 0 1 1
...
and so on
In RESULT, 0 means the match was won by ID_2 and 1 means it was won by ID_1.
I need two columns (H2H_ID1, H2H_ID2) that count the previous matches between the same two players (ID_1 and ID_2) won by each of them, i.e. the traditional head-to-head.
Here is an example, using row 3.
ID_1 = 3 and ID_2 = 12.
The earlier row where IDs 3 and 12 met is row 1, and the winner of that match was ID_2 (RESULT = 0), i.e. player 3.
So in row 3 I want to read 1 in H2H_ID1.
In row 6, with the same two players, player 3 has 2 earlier wins against player 12, so H2H_ID1 is 2.
In another post, for a similar task (a column with the previous result),
I was given this code, which finds only the single most recent previous match (and does not sum anything), but maybe it could help.
# emulate the original dataframe
ID_1 <- c(12,12,3,15,16,3)
ID_2<-c(3,13,12,13,17,15)
ids <- cbind(ID_1, ID_2) # IDs columns
x1 <- c(15, 50, 20, 30, 51, 60)
y2 <- c(10, 40, 30, 20, 53, 62)
vars <- cbind(x1, y2) # x&y columns
FindPreviousIDsPair <- function(id_matrix, i_of_row) {
  shorten_matrix <- id_matrix[1:(i_of_row - 1), , drop = FALSE]
  string_to_search_for <- id_matrix[i_of_row, ]
  string_to_search_for_sorted <-
    string_to_search_for[order(string_to_search_for)]
  found_rows_boolean <- sapply(FUN = function(i) all(shorten_matrix[i,
    order(shorten_matrix[i, ])] ==
    string_to_search_for_sorted), X = 1:(i_of_row - 1))
  found_row_n <- ifelse(any(found_rows_boolean),
    max(which(found_rows_boolean)), NA_real_)
  found_col_of_DI1 <- ifelse(any(found_rows_boolean),
    match(string_to_search_for[1], shorten_matrix[found_row_n, ]), NA_real_)
  found_col_of_DI2 <- ifelse(any(found_rows_boolean),
    match(string_to_search_for[2], shorten_matrix[found_row_n, ]), NA_real_)
  return(c(found_row_n, found_col_of_DI1, found_col_of_DI2))
}
Thanks for your help.
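One possible way to get the two head-to-head columns is a plain loop over the rows that looks back at all earlier matches between the same two players. This is only a minimal base R sketch: the data frame name matches and the helper vector winner are my own names, and it assumes the rows are already in chronological order.

```r
# Data from the question (row labels dropped)
matches <- data.frame(
  ID_1   = c(12, 12, 3, 15, 12, 13, 3, 12),
  ID_2   = c(3, 13, 12, 13, 13, 15, 12, 13),
  RESULT = c(0, 0, 1, 0, 1, 1, 0, 0)
)

# Winner of each match: ID_1 when RESULT is 1, otherwise ID_2
winner <- ifelse(matches$RESULT == 1, matches$ID_1, matches$ID_2)

matches$H2H_ID1 <- 0
matches$H2H_ID2 <- 0

for (i in seq_len(nrow(matches))[-1]) {
  prev <- seq_len(i - 1)
  # earlier rows with the same two players, in either order
  same_pair <-
    (matches$ID_1[prev] == matches$ID_1[i] & matches$ID_2[prev] == matches$ID_2[i]) |
    (matches$ID_1[prev] == matches$ID_2[i] & matches$ID_2[prev] == matches$ID_1[i])
  # earlier wins of the current ID_1 and ID_2 within that pairing
  matches$H2H_ID1[i] <- sum(same_pair & winner[prev] == matches$ID_1[i])
  matches$H2H_ID2[i] <- sum(same_pair & winner[prev] == matches$ID_2[i])
}

matches
```

The loop is quadratic in the number of rows, so for a large table a grouped cumulative sum per pair of players would probably be faster; the sketch is only meant to make the counting rule explicit.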
I'd like to go through the following data frame in order of the sum of the first 2 column values for each row, and then assign the third column a value based on that order.
Initial Table:
Col 1  Col 2  Col 3
20     0
5      0
20     0
0      10
20     0
10     0
20     40
15     0
The sums of columns 1 and 2 give:
20+0=20
5+0=5
20+0=20
0+10=10
20+0=20
10+0=10
20+40=60
15+0=15
Col 1  Col 2  Col 3
20     0      10
5      0      20
20     0      10
0      10     20
20     0      10
10     0      20
20     40     5
15     0      20
The 3 lowest sums get Col 3 value 20, the next 4 lowest get value 10, and the highest value gets 5.
This can be done using a single assignment rather than a loop, for example:
#Example data
df <- data.frame(col1 = c(20, 5, 20, 0, 21, 10, 20, 15), col2=c(0,0,0,10,0,0,40,0))
#Add dummy values
df$col3 <- NA
#Assign required values
df$col3[order(df$col1+df$col2)] <- rep(c(20,10,5), c(3,4,1))
df
# col1 col2 col3
#1 20 0 10
#2 5 0 20
#3 20 0 10
#4 0 10 20
#5 21 0 10
#6 10 0 20
#7 20 40 5
#8 15 0 10
Let's take the example you gave:
df <- data.frame(Col1 = c(20,5,20,0,20,10,20,15),
Col2 = c(0,0,0,10,0,0,40,0))
colnames(df) <- c("Col 1", "Col 2")
We then can do this:
library(dplyr)
df <- df %>%
mutate(`Col 3` = `Col 1` + `Col 2`)
col3_values <- sort(df$`Col 3`)
df <- df %>%
mutate(`Col 3` = case_when(`Col 3` <= col3_values[[3]] ~ 20,
`Col 3` > col3_values[[3]] & `Col 3` <= col3_values[[7]] ~ 10,
TRUE ~ 5))
Output:
Col 1 Col 2 Col 3
1 20 0 10
2 5 0 20
3 20 0 10
4 0 10 20
5 20 0 10
6 10 0 20
7 20 40 5
8 15 0 10
Note that the last row isn't what you expected because its sum (15) isn't one of the 3 smallest (there are a 5 and two 10s before it).
But as Limey commented, this won't work if you have more than 8 rows. You will have to change the bounds that determine which value is assigned.
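To avoid hard-coding positions 3 and 7, the bounds could be derived from the group sizes instead. The following is only a sketch in base R: assign_by_rank is a hypothetical helper name, and it assumes ties should be broken by row order, which is not quite the same as the threshold comparison above.

```r
# Hypothetical helper: the smallest sizes[1] totals get values[1],
# the next sizes[2] totals get values[2], and so on.
assign_by_rank <- function(total, values, sizes) {
  stopifnot(length(values) == length(sizes), sum(sizes) == length(total))
  # first rank belonging to each group, e.g. sizes c(3, 4, 1) gives 1, 4, 8
  starts <- cumsum(c(1, head(sizes, -1)))
  values[findInterval(rank(total, ties.method = "first"), starts)]
}

df <- data.frame(Col1 = c(20, 5, 20, 0, 20, 10, 20, 15),
                 Col2 = c(0, 0, 0, 10, 0, 0, 40, 0))
df$Col3 <- assign_by_rank(df$Col1 + df$Col2,
                          values = c(20, 10, 5),
                          sizes  = c(3, 4, 1))
df
```

For a data frame with a different number of rows only the sizes argument has to change, as long as the sizes still add up to the number of rows.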
I have the following data frame:
# A tibble: 47,898 x 4
# Groups: countyfips [1,774]
countyfips day_month_year new_case_rate_07da case_rate
<dbl> <date> <dbl> <dbl>
1 1001 2020-01-12 0 0
2 1001 2020-01-19 0 0
3 1001 2020-01-26 0 0
4 1001 2020-02-02 0 0
5 1001 2020-02-09 0 0
6 1001 2020-02-16 0 0
7 1001 2020-02-23 0 0
8 1001 2020-03-01 0 0
9 1001 2020-03-08 0 0
10 1001 2020-03-15 0 0
# … with 47,888 more rows
I would like to calculate the change between the case rates for each county each week.
I have tried the following code:
Affinity_County_Weekly.csv %>% group_by(countyfips) %>% mutate(change_vs_last_week_case_rate = (case_rate-lag(case_rate))/lag(case_rate))
But the output for that has been zero for all rows (even though the values in rows after 10 are not zero).
How would you try to fix this problem? Thank you very much.
Have a look at diff, which computes lagged differences of a vector or matrix x: each element minus the element lag positions before it (lag defaults to 1).
Take the following example:
d <- data.frame(cases = c(4, 2, 3, 10, 0, 100))
diff(d$cases)
# [1] -2 1 7 -10 100
Effectively it creates a new vector holding the differences between elements 1 and 2, 2 and 3, 3 and 4, etc. This is also why the result only has 5 values, as there is no element before the first one to subtract from it. (Note that with a larger lag, there will be fewer values.)
So depending on what you want the value in the first row to be, you can append a column as follows:
d$newCases <- c(0, diff(d$cases))
# cases newCases
# 1 4 0
# 2 2 -2
# 3 3 1
# 4 10 7
# 5 0 -10
# 6 100 100
In your specific problem, I imagine this should work:
Affinity_County_Weekly.csv %>% group_by(countyfips) %>% mutate(change_vs_last_week_case_rate = c(0, diff(case_rate)))
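The same per-county difference can also be sketched in base R with ave, which applies a function within each group. The small data frame weekly below is made up for illustration, and the rows are assumed to be sorted by date within each county. Since the question asked for a relative change, the sketch also divides by the previous week's value and leaves the result as NA where that value is missing or zero.

```r
# Made-up example data: two counties, three weeks each
weekly <- data.frame(
  countyfips = c(1001, 1001, 1001, 1003, 1003, 1003),
  case_rate  = c(0, 2, 5, 1, 1, 4)
)

# Absolute change versus the previous week, 0 for the first week of each county
weekly$change <- ave(weekly$case_rate, weekly$countyfips,
                     FUN = function(x) c(0, diff(x)))

# Previous week's value within each county (NA for the first week)
prev <- ave(weekly$case_rate, weekly$countyfips,
            FUN = function(x) c(NA, head(x, -1)))

# Relative change; NA when there is no previous week or it was zero
weekly$rel_change <- ifelse(prev > 0, weekly$change / prev, NA)

weekly
```

Guarding against a zero previous value matters here because the first weeks in the question's data are all 0, and 0/0 is NaN in R.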
I would like to create a data frame with all the dates from a certain period in the first column and the number of events that occurred on each date in the second, including dates on which no events occurred. I would also like to count the events separately for specific factor levels.
The first data frame holds the events with their dates:
Row Sex Age Date
1 2 36 2004-01-05
2 1 47 2004-01-06
3 1 26 2004-01-10
4 2 23 2004-01-20
5 1 50 2004-01-27
6 2 35 2004-01-28
7 1 35 2004-01-30
8 1 38 2004-02-06
9 2 29 2004-02-11
Where in the column "Sex" 1 means female and 2 male.
Second data frame in which I have dates from the examined period:
Row Date
1 2004-01-05
2 2004-01-06
3 2004-01-07
4 2004-01-08
5 2004-01-09
6 2004-01-10
7 2004-01-11
8 2004-01-12
9 2004-01-13
10 2004-01-14
I want to get a data frame that looks like this:
Row Date Events (All) Events (Female) Events (Male)
1 2004-01-05 1 0 1
2 2004-01-06 1 1 0
3 2004-01-07 0 0 0
4 2004-01-08 0 0 0
5 2004-01-09 0 0 0
6 2004-01-10 1 1 0
7 2004-01-11 0 0 0
8 2004-01-12 0 0 0
9 2004-01-13 0 0 0
10 2004-01-14 0 0 0
Can anyone help?
Here's one method:
library(data.table)
library(magrittr) # just for %>%
out <- dat1 %>%
dcast(Date ~ Sex, data = ., fun.aggregate = length) %>%
setnames(., c("1", "2"), c("Female", "Male")) %>%
.[ dat2[ , .(Date)], on = "Date" ] %>%
.[, lapply(.SD, function(a) replace(a, is.na(a), 0)), ] %>%
.[, All := Female + Male ]
out
# Date Female Male All
# 1: 2004-01-05 0 1 1
# 2: 2004-01-06 1 0 1
# 3: 2004-01-07 0 0 0
# 4: 2004-01-08 0 0 0
# 5: 2004-01-09 0 0 0
# 6: 2004-01-10 1 0 1
# 7: 2004-01-11 0 0 0
# 8: 2004-01-12 0 0 0
# 9: 2004-01-13 0 0 0
# 10: 2004-01-14 0 0 0
Note that the use of lapply might not be the overall fastest method to replace NA with 0, but it gets the point across. Also, I use magrittr::%>% merely to break out the steps; this can easily be done without %>%.
Data:
dat1 <- fread(text = "
Row Sex Age Date
1 2 36 2004-01-05
2 1 47 2004-01-06
3 1 26 2004-01-10
4 2 23 2004-01-20
5 1 50 2004-01-27
6 2 35 2004-01-28
7 1 35 2004-01-30
8 1 38 2004-02-06
9 2 29 2004-02-11")
dat2 <- fread(text = "
Row Date
1 2004-01-05
2 2004-01-06
3 2004-01-07
4 2004-01-08
5 2004-01-09
6 2004-01-10
7 2004-01-11
8 2004-01-12
9 2004-01-13
10 2004-01-14")
A tidyversion:
dat1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Row Sex Age Date
1 2 36 2004-01-05
2 1 47 2004-01-06
3 1 26 2004-01-10
4 2 23 2004-01-20
5 1 50 2004-01-27
6 2 35 2004-01-28
7 1 35 2004-01-30
8 1 38 2004-02-06
9 2 29 2004-02-11")
dat2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Row Date
1 2004-01-05
2 2004-01-06
3 2004-01-07
4 2004-01-08
5 2004-01-09
6 2004-01-10
7 2004-01-11
8 2004-01-12
9 2004-01-13
10 2004-01-14")
library(dplyr)
library(tidyr)
as_tibble(dat1) %>%
group_by(Date, Sex) %>%
tally() %>%
ungroup() %>%
pivot_wider(id_cols = "Date", names_from = "Sex", values_from = "n",
values_fill = list(n = 0)) %>%
rename(Female = "1", Male = "2") %>%
left_join(select(dat2, Date), ., by = "Date") %>%
mutate_at(vars(Female, Male), ~ replace(., is.na(.), 0)) %>%
mutate(All = Female + Male)
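For comparison, the same counts can be sketched in base R without any packages by tabulating the events against the full list of dates. This is only an illustration: all_dates and counts are my own names, and the date range is the one from the second data frame in the question.

```r
# Events from the question (only the columns that are needed)
dat1 <- data.frame(
  Sex  = c(2, 1, 1, 2, 1, 2, 1, 1, 2),
  Date = as.Date(c("2004-01-05", "2004-01-06", "2004-01-10", "2004-01-20",
                   "2004-01-27", "2004-01-28", "2004-01-30", "2004-02-06",
                   "2004-02-11"))
)

# Every date of the examined period
all_dates <- seq(as.Date("2004-01-05"), as.Date("2004-01-14"), by = "day")

# Cross-tabulate date by sex; using factors with fixed levels keeps
# dates without events (count 0) and drops events outside the period
counts <- table(factor(as.character(dat1$Date), levels = as.character(all_dates)),
                factor(dat1$Sex, levels = c(1, 2)))

out <- data.frame(Date   = all_dates,
                  Female = as.vector(counts[, "1"]),
                  Male   = as.vector(counts[, "2"]))
out$All <- out$Female + out$Male
out
```

Because the zero counts come straight from the factor levels, no join and no NA replacement is needed in this variant.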
I have a dataset similar to the following format:
Account_ID Date Delinquency age count
1 01/01/2016 0 1 0
1 02/01/2016 1 2 0
1 03/01/2016 2 3 1
1 04/01/2016 0 4 2
1 05/01/2016 1 5 2
1 06/01/2016 2 6 2
2 01/01/2016 0 1 0
2 02/01/2016 0 2 0
2 03/01/2016 1 3 0
2 04/01/2016 0 4 1
2 05/01/2016 1 5 1
3 01/01/2016 1 1 0
3 02/01/2016 2 2 1
3 03/01/2016 3 3 2
3 04/01/2016 4 4 3
3 05/01/2016 5 5 4
3 06/01/2016 6 6 5
I want to count the number of non-zeros in the previous 3 months by account for each row, i.e. I want to create the count variable using the first 4 variables (Account_ID, Date, Delinquency, Age). I would like to know how to do this for n past months. I'm hoping I can extend this exercise to other tasks such as finding the max delinquency in the past 3 months.
Welcome to SE!
If you would like to count non-zero delinquency events over the 3 previous months by account for each row, you can use the aggregate function together with the zlag function of the TSA package, as in the code below. Because the values in the count column you provided are difficult to interpret and to reconcile with the stated condition, the data in this example are simulated.
library(lubridate)
set.seed(123)
# data simulation
df <- data.frame( id = factor(rep(0:9, 100)),
date = sample(seq(ymd("2010-12-01"), by = 1, length.out = 1000), 1000, replace = TRUE),
deliquency = sample(c(rep(0, 30), 1:5), 1000, replace = TRUE),
age = sample(1:10, 1000, replace = TRUE))
head(df)
# id date deliquency age
# 1 0 2011-08-06 0 10
# 2 1 2013-08-16 0 6
# 3 2 2012-11-17 0 1
# 4 3 2012-09-12 0 9
# 5 4 2011-07-29 0 1
# 6 5 2011-02-25 0 9
# aggregation of non-zero deliquency by month
df$year_month <- df$date
day(df$year_month) <- 1
df_m <- aggregate(deliquency ~ id + year_month, data = df, sum)
df_m <- df_m[order(df_m$id, df_m$year_month), ]
# note: despite its name, is_zero is TRUE when the month has non-zero deliquency
df_m$is_zero <- df_m$deliquency > 0
head(df_m)
# id year_month deliquency is_zero
# 1 0 2010-12-01 1 TRUE
# 10 0 2011-01-01 0 FALSE
# 19 0 2011-02-01 0 FALSE
# 29 0 2011-03-01 0 FALSE
# 39 0 2011-04-01 0 FALSE
# 65 0 2011-07-01 1 TRUE
# count the rows with non-zero deliquency among the three previous monthly rows of each id
library(TSA)
dfx <- df_m
df_m_l <- by(df_m, df_m$id, function(dfx) {
dfx$zero_del <- zlag(dfx$is_zero, 1) + zlag(dfx$is_zero, 2) + zlag(dfx$is_zero, 3)
dfx})
df_m_res <- do.call(rbind, df_m_l)
head(df_m_res)
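The output is a data.frame whose zero_del column gives, for each id and month, how many of the 3 preceding monthly rows had non-zero deliquency. It is NA for the first three rows of each id, where fewer than three earlier rows exist. E.g. the output here is: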
id year_month deliquency is_zero zero_del
0.1 0 2010-12-01 1 TRUE NA
0.10 0 2011-01-01 0 FALSE NA
0.19 0 2011-02-01 0 FALSE NA
0.29 0 2011-03-01 0 FALSE 1
0.39 0 2011-04-01 0 FALSE 0
0.65 0 2011-07-01 1 TRUE 0
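If an extra package is not wanted, the look-back can also be written as a small base R helper and applied per account with ave. This is only a sketch: count_prev_nonzero is a hypothetical name, it assumes exactly one row per account and month, sorted by date, and it is run on the first two accounts from the question.

```r
# Hypothetical helper: for each position, count the non-zero values
# among the (up to) n values immediately before it
count_prev_nonzero <- function(x, n = 3) {
  sapply(seq_along(x), function(i) {
    prev <- x[seq_len(i - 1)]   # everything before position i
    sum(tail(prev, n) != 0)     # non-zeros among the last n of those
  })
}

# Accounts 1 and 2 from the question
acc <- data.frame(
  Account_ID  = c(rep(1, 6), rep(2, 5)),
  Delinquency = c(0, 1, 2, 0, 1, 2, 0, 0, 1, 0, 1)
)

# Apply the helper within each account
acc$count <- ave(acc$Delinquency, acc$Account_ID, FUN = count_prev_nonzero)
acc
```

For a different window, wrap the call, e.g. FUN = function(x) count_prev_nonzero(x, n = 6). For the maximum delinquency over the window, the sum(...) line would be replaced by a max over tail(prev, n), with a guard for the first row, where there are no earlier values.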
My data frame looks like this
personID t1 t2 t3
1 0 11 0
1 0 11 0
2 0 11 13
2 0 11 13
3 0 0 0
3 0 0 0
I need to make sure that each person has one test score above 10. If they do not, they have to be removed from the data frame. I also want to keep track of the lowest score above 10, and add it to a new column.
Thus, the result would look like this:
personID t1 t2 t3 new
1 0 11 0 11
1 0 11 0 11
2 0 11 13 11
2 0 11 13 11
If I were to go the data.table route, I think you could do it with a melt and join:
library(data.table)
setDT(dat)
dat[
melt(dat, id.vars="personID")[value > 10, .(new=min(value)), by=personID],
on="personID"
]
# personID t1 t2 t3 new
#1: 1 0 11 0 11
#2: 1 0 11 0 11
#3: 2 0 11 13 11
#4: 2 0 11 13 11
Using data.table:
library(data.table)
#convert your data (named DF here) to use data.table syntax
setDT(DF)
DF[ , {
  # all of this person's scores that are above 10
  v = unlist(.SD)
  v = v[v > 10]
  # keep the person only if at least one such score exists,
  # appending the lowest score above 10 as a new column
  if (length(v)) c(.SD, list(new = min(v)))
}, by = personID]