How do I find the last date on which a value increased in another column? - r

I have a data frame in R that looks something like this:
person date level
Alex 2007-06-01 3
Alex 2008-12-01 4
Alex 2009-12-01 3
Beth 2008-03-01 6
Beth 2010-10-01 6
Beth 2010-12-01 6
Mary 2009-11-04 9
Mary 2012-04-25 9
Mary 2013-09-10 10
I have sorted it first by "person" and second by "date".
I am trying to find out when the last increase in "level" occurred for each person. Ideally, the output would look something like:
person date
Alex 2008-12-01
Beth NA
Mary 2013-09-10

Using dplyr
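library(dplyr)
dat %>%
  group_by(person) %>%
  # TRUE where level rose relative to the previous row for that person
  mutate(inc = c(FALSE, diff(level) > 0)) %>%
  # date of the last such row, or NA if level never rose
  summarize(date = last(date[inc], default = NA))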
Yielding:
Source: local data frame [3 x 2]
person date
1 Alex 2008-12-01
2 Beth <NA>
3 Mary 2013-09-10

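Try this data.table version:
library(data.table)
setDT(dat)[order(person), diff := c(NA, diff(level)), by = person][  # change vs. previous row
  diff > 0, tail(.SD, 1), by = person][                                # last row with an increase
  , -c(3, 4), with = FALSE]                                            # drop level and diff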
person date
1: Alex 2008-12-01
2: Mary 2013-09-10
If people with no increase (NA) also need to be included:
dd <- setDT(dat)[order(person), diff := c(NA, diff(level)), by = person][
  diff > 0, tail(.SD, 1), by = person][, -c(3, 4), with = FALSE]
# people who never had an increase, with their date set to NA
dd2 <- data.frame(person = unique(dat[!(person %in% dd$person)]$person), date = NA)
rbind(dd, dd2)
person date
1: Alex 2008-12-01
2: Mary 2013-09-10
3: Beth NA

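A base-R version, assuming the data frame is called df and uses the column names from the question (person, date, level):
sapply(as.character(unique(df$person)), function(p) {
  s <- df[df$person == p, ]
  # index of the last row where level rose: rev() + match() finds the last TRUE
  i <- 1 + nrow(s) - match(TRUE, rev(diff(s$level) > 0))
  ifelse(is.na(i), NA, as.character(s$date[i]))
})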
produces the named vector
Alex Beth Mary
"2008-12-01" NA "2013-09-10"
It is easy to wrap this to produce any output format you need:
last.level.up <- function(df) {
  data.frame(Date = sapply(as.character(unique(df$person)), function(p) {
    s <- df[df$person == p, ]
    i <- 1 + nrow(s) - match(TRUE, rev(diff(s$level) > 0))
    ifelse(is.na(i), NA, as.character(s$date[i]))
  }))
}
last.level.up(df)
Date
Alex 2008-12-01
Beth <NA>
Mary 2013-09-10
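A variant of the same base-R idea that returns the two-column person/date layout shown in the question. This is a sketch that assumes the rows are already sorted by person and then by date, as the question states; last_increase is a hypothetical helper name, not part of any package.

```r
# Sample data from the question (dates stored as Date objects)
dat <- data.frame(
  person = rep(c("Alex", "Beth", "Mary"), each = 3),
  date = as.Date(c("2007-06-01", "2008-12-01", "2009-12-01",
                   "2008-03-01", "2010-10-01", "2010-12-01",
                   "2009-11-04", "2012-04-25", "2013-09-10")),
  level = c(3, 4, 3, 6, 6, 6, 9, 9, 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper; assumes d is sorted by date.
last_increase <- function(d) {
  up <- which(diff(d$level) > 0) + 1   # rows where level rose vs. the previous row
  if (length(up) == 0) NA_character_ else as.character(d$date[max(up)])
}

by_person <- split(dat, dat$person)    # one data frame per person
result <- data.frame(
  person = names(by_person),
  date = vapply(by_person, last_increase, character(1)),
  row.names = NULL
)
result
```

Because vapply() is told to expect one character value per person, someone with no increase still gets a row, with NA as the date.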

Related

Reshaping a dataset of patients with different numbers of diagnosis from long to wide [duplicate]

This question already has answers here: How to reshape data from long to wide format
I am a beginner confronted with a big task, and none of the typical long-to-wide reshaping tools I found via search did the job for me. I would be glad if someone could help me.
I am trying to achieve the following:
I have patient data in which every patient has a unique patient number, but multiple hospital stays have led to multiple cases per person. I want to work with these cases. The problem is that I have all the diagnoses per case, but not everybody has the same number of diagnoses, and I don't know how to tell R to create a new diagnosis (and date of diagnosis) variable each time there is already a diagnosis.
So, I have a huge dataset that looks roughly like this:
Patient Case Diagnosis DateOfDiagnosis
1 John Doe 1 A 2010-10-10
2 John Doe 1 B 2010-10-10
3 John Doe 1 C 2010-10-10
4 Peter Griffin 2 D 2010-10-11
5 Peter Griffin 2 E 2010-10-11
6 Homer Simpson 3 F 2010-10-12
7 Homer Simpson 4 G 2010-10-13
I need one row per case, with all the diagnoses and their dates in separate variables. This would be no problem, but there is no pattern in the cases or diagnoses: some patients have only one case and others five, and some cases have one diagnosis and others five, each with its respective date.
So what I need looks like this:
Patient Case Diag1 DateOfDiag1 Diag2 DateOfDiag2 Diag3 DateOfDiag3 ....
1 John Doe 1 A 2010-10-10 B 2010-10-10 C 2010-10-10
2 Peter Grif 2 D 2010-10-11 E 2010-10-11 NA NA
3 Homer Simp 3 F 2010-10-12 NA NA NA NA
4 Homer Simp 4 G 2010-10-13 NA NA NA NA
The code for my example is:
Patient <- c('John Doe','John Doe','John Doe', 'Peter Griffin','Peter Griffin', 'Homer Simpson', 'Homer Simpson')
Case <- c(1,1,1,2,2,3,4)
Diagnosis <- c('A','B','C','D','E','F','G')
DateOfDiagnosis <- as.Date(c('2010-10-10','2010-10-10','2010-10-10','2010-10-11','2010-10-11','2010-10-12','2010-10-13'))
df<-data.frame(Patient, Case, Diagnosis, DateOfDiagnosis)
Any help is highly appreciated!
Kind regards,
Jan
You could use pivot_wider after creating a unique column: a row counter within each Patient/Case group, used as names_from.
library(dplyr)
library(tidyr)
df %>%
group_by(Patient, Case) %>%
mutate(row = row_number()) %>%
pivot_wider(values_from = c(Diagnosis, DateOfDiagnosis), names_from = row)
# Patient Case Diagnosis_1 Diagnosis_2 Diagnosis_3 DateOfDiagnosis_1 DateOfDiagnosis_2 DateOfDiagnosis_3
# <fct> <dbl> <fct> <fct> <fct> <date> <date> <date>
#1 John Doe 1 A B C 2010-10-10 2010-10-10 2010-10-10
#2 Peter Griffin 2 D E NA 2010-10-11 2010-10-11 NA
#3 Homer Simpson 3 F NA NA 2010-10-12 NA NA
#4 Homer Simpson 4 G NA NA 2010-10-13 NA NA
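For comparison, a base-R sketch of the same idea: ave() builds the within-case counter that row_number() provides above, and stats::reshape() with direction = "wide" spreads Diagnosis and DateOfDiagnosis into numbered columns. It assumes the question's df; the helper column name row is arbitrary. By default reshape() joins names with "." (Diagnosis.1, DateOfDiagnosis.1, Diagnosis.2, ...).

```r
df <- data.frame(
  Patient = c("John Doe", "John Doe", "John Doe", "Peter Griffin", "Peter Griffin",
              "Homer Simpson", "Homer Simpson"),
  Case = c(1, 1, 1, 2, 2, 3, 4),
  Diagnosis = c("A", "B", "C", "D", "E", "F", "G"),
  DateOfDiagnosis = as.Date(c("2010-10-10", "2010-10-10", "2010-10-10", "2010-10-11",
                              "2010-10-11", "2010-10-12", "2010-10-13")),
  stringsAsFactors = FALSE
)

# Within-case counter 1, 2, 3, ... (plays the role of row_number() above)
df$row <- ave(seq_len(nrow(df)), df$Patient, df$Case, FUN = seq_along)

# One row per Patient/Case; one Diagnosis/DateOfDiagnosis pair per counter value
wide <- reshape(df,
                idvar = c("Patient", "Case"),
                timevar = "row",
                v.names = c("Diagnosis", "DateOfDiagnosis"),
                direction = "wide")
wide
```

Cases with fewer diagnoses than the maximum get NA in the extra columns, matching the layout asked for in the question.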

Getting Data in a single row into multiple rows

I have data showing which people work in which groups. When I ask the leader of each group, in a survey, to list the people who work for them, I get a single row with all of the team members. I need to reshape the data into multiple rows, one per member, each with its group information.
I don't know where to start.
This is what my data frame looks like:
LeaderName <- c('John','Jane','Louis','Carl')
Group <- c('3','1','4','2')
Member1 <- c('Lucy','Stephanie','Chris','Leslie')
Member1ID <- c('1','2','3','4')
Member2 <- c('Earl','Carlos','Devon','Francis')
Member2ID <- c('5','6','7','8')
Member3 <- c('Luther','Peter','','Severus')
Member3ID <- c('9','10','','11')
GroupInfo <- data.frame(LeaderName, Group, Member1, Member1ID, Member2 ,Member2ID, Member3, Member3ID)
This is the output I would like to get:
LeaderName_ <- c('John','Jane','Louis','Carl','John','Jane','Louis','Carl','John','Jane','','Carl')
Group_ <- c('3','1','4','2','3','1','4','2','3','1','','2')
Member <- c('Lucy','Stephanie','Chris','Leslie','Earl','Carlos','Devon','Francis','Luther','Peter','','Severus')
MemberID <- c('1','2','3','4','5','6','7','8','9','10','','11')
ActualGroupInfor <- data.frame(LeaderName_,Group_,Member,MemberID)
One option is melt from data.table, specifying the column-name patterns in the measure argument:
library(data.table)
melt(setDT(GroupInfo), measure = patterns("^Member\\d+$",
"^Member\\d+ID$"), value.name = c("Member", "MemberID"))[, variable := NULL][]
# LeaderName Group Member MemberID
# 1: John 3 Lucy 1
# 2: Jane 1 Stephanie 2
# 3: Louis 4 Chris 3
# 4: Carl 2 Leslie 4
# 5: John 3 Earl 5
# 6: Jane 1 Carlos 6
# 7: Louis 4 Devon 7
# 8: Carl 2 Francis 8
# 9: John 3 Luther 9
#10: Jane 1 Peter 10
#11: Louis 4
#12: Carl 2 Severus 11
Here is a solution in base R:
reshape(
  data = GroupInfo,
  idvar = c("LeaderName", "Group"),
  # one vector of column positions per output variable
  varying = list(Member = grep("^Member[0-9]+$", names(GroupInfo)),
                 MemberID = grep("^Member[0-9]+ID$", names(GroupInfo))),
  direction = "long",
  v.names = c("Member", "MemberID"))[, -3]  # drop the "time" column
#> LeaderName Group Member MemberID
#> John.3.1 John 3 Lucy 1
#> Jane.1.1 Jane 1 Stephanie 2
#> Louis.4.1 Louis 4 Chris 3
#> Carl.2.1 Carl 2 Leslie 4
#> John.3.2 John 3 Earl 5
#> Jane.1.2 Jane 1 Carlos 6
#> Louis.4.2 Louis 4 Devon 7
#> Carl.2.2 Carl 2 Francis 8
#> John.3.3 John 3 Luther 9
#> Jane.1.3 Jane 1 Peter 10
#> Louis.4.3 Louis 4
#> Carl.2.3 Carl 2 Severus 11
Created on 2019-05-23 by the reprex package (v0.2.1)
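Both answers stack the MemberN / MemberNID column pairs on top of each other. A package-free sketch that spells out that stacking step, assuming the member slots are numbered 1, 2, 3, ... without gaps (n_slots and long are illustrative names):

```r
GroupInfo <- data.frame(
  LeaderName = c("John", "Jane", "Louis", "Carl"),
  Group = c("3", "1", "4", "2"),
  Member1 = c("Lucy", "Stephanie", "Chris", "Leslie"),
  Member1ID = c("1", "2", "3", "4"),
  Member2 = c("Earl", "Carlos", "Devon", "Francis"),
  Member2ID = c("5", "6", "7", "8"),
  Member3 = c("Luther", "Peter", "", "Severus"),
  Member3ID = c("9", "10", "", "11"),
  stringsAsFactors = FALSE
)

# Number of member slots, inferred from the "MemberN" column names
n_slots <- length(grep("^Member[0-9]+$", names(GroupInfo)))

# Build one block of rows per slot (assumes slots 1..n_slots all exist), then stack them
long <- do.call(rbind, lapply(seq_len(n_slots), function(k) {
  data.frame(
    LeaderName = GroupInfo$LeaderName,
    Group = GroupInfo$Group,
    Member = GroupInfo[[paste0("Member", k)]],
    MemberID = GroupInfo[[paste0("Member", k, "ID")]],
    stringsAsFactors = FALSE
  )
}))
long
```

Louis's empty third slot is kept as a row with blank Member and MemberID, as in the answers above; long[long$Member != "", ] drops such rows if they are not wanted.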

How to FILL DOWN (autofill) values, e.g. replace NA with the first value in the group, using data.table in R?

Very simple and common task:
I need to FILL DOWN in a data.table (similar to the autofill function in MS Excel), so that
library(data.table)
DT <- fread(
"Paul 32
NA 45
NA 56
John 1
NA 5
George 88
NA 112")
becomes
Paul 32
Paul 45
Paul 56
John 1
John 5
George 88
George 112
Thank you!
A straightforward way is Rui Barradas's idea of using the zoo package: its na.locf function does this in one line of code.
library(zoo)
DT[, V1:=na.locf(V1)]
Replace the V1 with whatever you name your column after reading in the data with fread. Good luck!
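If you would rather avoid the zoo dependency, here is a minimal base-R sketch of the same last-observation-carried-forward idea. fill_down is a hypothetical helper name, not part of zoo or data.table; with the question's table you would call it the same way, e.g. DT[, V1 := fill_down(V1)].

```r
# Hypothetical helper: replace each NA with the most recent non-NA value above it.
# Leading NAs (before any value has been seen) stay NA.
fill_down <- function(x) {
  idx <- cumsum(!is.na(x))   # position (among non-NA values) of the latest value seen
  idx[idx == 0] <- NA        # nothing seen yet -> keep NA
  x[!is.na(x)][idx]
}

V1 <- c("Paul", NA, NA, "John", NA, "George", NA)
fill_down(V1)
```

The trick is that cumsum(!is.na(x)) stays constant across a run of NAs, so it can be used directly as an index into the non-NA values.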
For the second example (data below), where the missing V2 values should be extrapolated within each V1 group, you can consider using stats::spline as follows:
DT2[is.na(V2), V2 :=
as.integer(DT2[, spline(.I[!is.na(V2)], V2[!is.na(V2)], xout=.I[is.na(V2)]), by=.(V1)]$y)]
output:
V1 V2
1: Paul 1
2: Paul 2
3: Paul 3
4: Paul 4
5: John 100
6: John 110
7: John 120
8: John 130
data:
DT2 <- fread(
"Paul, 1
Paul, 2
Paul, NA
Paul, NA
John, 100
John, 110
John, NA
John, NA")
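The same per-group spline idea can be written in base R with ave(), which may make the mechanics easier to see: within each V1 group, fit stats::spline through the known (row position, value) pairs and evaluate it at the positions of the missing values. This sketch assumes each group has at least two known values; it keeps V2 numeric instead of converting to integer.

```r
DT2 <- data.frame(
  V1 = rep(c("Paul", "John"), each = 4),
  V2 = c(1, 2, NA, NA, 100, 110, NA, NA),
  stringsAsFactors = FALSE
)

# Within each V1 group: fit a spline through the known (position, value) pairs
# and evaluate it at the positions of the missing values.
DT2$V2 <- ave(DT2$V2, DT2$V1, FUN = function(v) {
  known <- !is.na(v)
  if (all(known)) return(v)   # nothing to fill; spline() needs at least one xout
  v[!known] <- spline(which(known), v[known], xout = which(!known))$y
  v
})
DT2
```

The filled values match the output shown above (3, 4 for Paul and 120, 130 for John).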

Inserting rows into a dataframe based on a vector that contains dates

This is what my dataframe looks like:
df <- read.table(text='
Name ActivityType ActivityDate
John Email 2014-01-01
John Webinar 2014-01-05
John Webinar 2014-01-20
John Email 2014-04-20
Tom Email 2014-01-01
Tom Webinar 2014-01-05
Tom Webinar 2014-01-20
Tom Email 2014-04-20
', header=T, row.names = NULL)
I have this vector x, which contains different dates:
x <- c("2014-01-03","2014-01-25","2015-05-27")
I want to insert rows into my original data frame so that every name also gets a row for each of these dates. This is what the output should look like:
Name ActivityType ActivityDate
John Email 2014-01-01
John NA 2014-01-03
John Webinar 2014-01-05
John Webinar 2014-01-20
John NA 2014-01-25
John Email 2014-04-20
John NA 2015-05-27
Tom Email 2014-01-01
Tom NA 2014-01-03
Tom Webinar 2014-01-05
Tom Webinar 2014-01-20
Tom NA 2014-01-25
Tom Email 2014-04-20
Tom NA 2015-05-27
Sincerely appreciate your help!
It looks like you've added each of the 'new' dates against each of the people, correct?
In that case you can turn your x into a data.frame and merge/join it on:
## original dataframe (as in the question)
df <- data.frame(Name = c(rep("John", 4), rep("Tom", 4)),
                 ActivityType = c("Email","Webinar","Webinar","Email","Email","Webinar","Webinar","Email"),
                 ActivityDate = c("2014-01-01","2014-01-05","2014-01-20","2014-04-20","2014-01-01","2014-01-05","2014-01-20","2014-04-20"),
                 stringsAsFactors = FALSE)
## Turning x into a dataframe: each name paired with each new date
x <- data.frame(ActivityDate = rep(c("2014-01-03","2014-01-25","2015-05-27"), 2),
                Name = rep(c("John","Tom"), each = 3),
                stringsAsFactors = FALSE)
merge(df, x, by=c("Name", "ActivityDate"), all=TRUE)
# Name ActivityDate ActivityType
# 1 John 2014-01-01 Email
# 2 John 2014-01-03 <NA>
# 3 John 2014-01-05 Webinar
# 4 John 2014-01-20 Webinar
# 5 John 2014-01-25 <NA>
# 6 John 2014-04-20 Email
# 7 John 2015-05-27 <NA>
# 8 Tom 2014-01-01 Email
# 9 Tom 2014-01-03 <NA>
# 10 Tom 2014-01-05 Webinar
# 11 Tom 2014-01-20 Webinar
# 12 Tom 2014-01-25 <NA>
# 13 Tom 2014-04-20 Email
# 14 Tom 2015-05-27 <NA>
Update
As you are having memory issues, you can use data.table thusly
library(data.table)
dt <- as.data.table(df)
x_dt <- as.data.table(x)
merge(dt, x_dt, by=c("Name","ActivityDate"), all=T)
or, if you're not looking to merge you can rbind them, using data.table's rbindlist
rbindlist(list(dt, x_dt), fill=TRUE) ## fill sets the 'ActivityType' to NA in X
Update 2
To generate your x with 16000 unique names (I've used numbers here, but the principle is the same) and 31 dates (every day of January 2014):
ActivityDates <- seq(as.Date("2014-01-01"), as.Date("2014-01-31"), by=1)
Names <- seq(1,16000)
# every combination of name and date, with column names matching df
# (convert df$ActivityDate with as.Date() first so that the key columns match)
x <- expand.grid(Name = Names, ActivityDate = ActivityDates)
1) expand.grid. Using expand.grid, create a data frame adds holding the rows to be added, then use rbind to combine df and adds, converting the ActivityDate column to "Date" class. Then sort. No packages are used.
adds <- expand.grid(Name = unique(df$Name), ActivityType = NA, ActivityDate = x,
                    stringsAsFactors = FALSE)
both <- transform(rbind(df, adds), ActivityDate = as.Date(ActivityDate))
o <- with(both, order(Name, ActivityDate))
both[o, ]
giving:
Name ActivityType ActivityDate
1 John Email 2014-01-01
9 John <NA> 2014-01-03
2 John Webinar 2014-01-05
3 John Webinar 2014-01-20
11 John <NA> 2014-01-25
4 John Email 2014-04-20
13 John <NA> 2015-05-27
5 Tom Email 2014-01-01
10 Tom <NA> 2014-01-03
6 Tom Webinar 2014-01-05
7 Tom Webinar 2014-01-20
12 Tom <NA> 2014-01-25
8 Tom Email 2014-04-20
14 Tom <NA> 2015-05-27
2) sqldf. This uploads adds and df to an SQLite database, which it creates on the fly, then performs the SQL query and downloads the result. The computation occurs outside of R, so it might work with your large data.
adds <- data.frame(Name = NA, ActivityDate = x)
library(sqldf)
sqldf("select *
from (select *
from df
union
select a.Name, NULL ActivityType, ActivityDate
from (select distinct Name from df) a
cross join adds b
) order by 1, 3"
)
giving:
Name ActivityType ActivityDate
1 John Email 2014-01-01
2 John <NA> 2014-01-03
3 John Webinar 2014-01-05
4 John Webinar 2014-01-20
5 John <NA> 2014-01-25
6 John Email 2014-04-20
7 John <NA> 2015-05-27
8 Tom Email 2014-01-01
9 Tom <NA> 2014-01-03
10 Tom Webinar 2014-01-05
11 Tom Webinar 2014-01-20
12 Tom <NA> 2014-01-25
13 Tom Email 2014-04-20
14 Tom <NA> 2015-05-27
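The cross join from 1) can also be combined with merge(..., all = TRUE), as in the first answer, so that the insert and the sort happen in one call. A base-R sketch using the question's df and x (adds and out are illustrative names):

```r
df <- read.table(text = "
Name ActivityType ActivityDate
John Email 2014-01-01
John Webinar 2014-01-05
John Webinar 2014-01-20
John Email 2014-04-20
Tom Email 2014-01-01
Tom Webinar 2014-01-05
Tom Webinar 2014-01-20
Tom Email 2014-04-20
", header = TRUE, stringsAsFactors = FALSE)

x <- c("2014-01-03", "2014-01-25", "2015-05-27")

# Cross join: every name paired with every new date
adds <- expand.grid(Name = unique(df$Name), ActivityDate = x,
                    stringsAsFactors = FALSE)

# Full outer join on the shared columns (Name, ActivityDate);
# the new rows get ActivityType = NA, and merge() sorts by the key columns.
out <- merge(df, adds, all = TRUE)
out
```

Because the dates are ISO-formatted strings, sorting them as text is also chronological order, so no conversion to "Date" is needed for the ordering.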

In R: add rows based on a date and another condition

I have a data frame df:
df <- data.frame(names=c("john","mary","tom"),dates=c(as.Date("2010-06-01"),as.Date("2010-07-09"),as.Date("2010-06-01")),tours_missed=c(2,12,6))
names dates tours_missed
john 2010-06-01 2
mary 2010-07-09 12
tom 2010-06-01 6
I want to add a row for each date the person missed. There are 2 tours every day a person works, and each person works every 4 days.
The result should be (though the order doesn't matter):
names dates tours_missed
john 2010-06-01 2
mary 2010-07-09 12
mary 2010-07-13 12
mary 2010-07-17 12
mary 2010-07-21 12
mary 2010-07-25 12
mary 2010-07-29 12
tom 2010-06-01 6
tom 2010-06-05 6
tom 2010-06-09 6
I have already tried looking at these topics but was unable to produce the above result: Add rows to a data frame based on date in previous row; In R: Add rows with data of previous row to data frame; add new row to dataframe. Thanks for your help!
library(data.table)
dt = as.data.table(df) # or convert in-place using setDT
# all of the relevant dates
dates.all = dt[, seq(dates, length.out = tours_missed/2, by = "4 days"), by = names]
# set the key and merge filling in the blanks with previous observation
setkey(dt, names, dates)
dt[dates.all, roll = TRUE]
# names dates tours_missed
# 1: john 2010-06-01 2
# 2: mary 2010-07-09 12
# 3: mary 2010-07-13 12
# 4: mary 2010-07-17 12
# 5: mary 2010-07-21 12
# 6: mary 2010-07-25 12
# 7: mary 2010-07-29 12
# 8: tom 2010-06-01 6
# 9: tom 2010-06-05 6
#10: tom 2010-06-09 6
Or if merging is unnecessary (not quite clear from OP), just construct the answer:
dt[, list(dates = seq(dates, length.out = tours_missed/2, by = "4 days"), tours_missed)
, by = names]
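A base-R sketch of the "just construct the answer" variant, assuming (as the data.table answer does) that tours_missed is always even, i.e. 2 tours per working day and one working day every 4 days: repeat each row tours_missed / 2 times and shift the date by 0, 4, 8, ... days.

```r
df <- data.frame(names = c("john", "mary", "tom"),
                 dates = as.Date(c("2010-06-01", "2010-07-09", "2010-06-01")),
                 tours_missed = c(2, 12, 6),
                 stringsAsFactors = FALSE)

# Assumption: 2 tours per working day, so tours_missed / 2 working days (whole number)
n_days <- as.integer(df$tours_missed / 2)

out <- df[rep(seq_len(nrow(df)), n_days), ]            # repeat each row n_days times
out$dates <- out$dates + 4 * (sequence(n_days) - 1)   # shift by 0, 4, 8, ... days
rownames(out) <- NULL
out
```

sequence(n_days) produces 1..n for each person in turn, which is what turns the repeated rows into a per-person day offset.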
