I've seen lots of questions like this but can't figure this simple problem out. I don't want to collapse the dataset. Say I have this dataset:
library(tidyverse)
library(lubridate)
df <- data.frame(group = c("a", "a", "a", "a", "a", "b", "b", "b"),
starts = c("2011-09-18", NA, "2014-08-08", "2016-09-18", NA, "2013-08-08", "2015-08-08", NA),
ends = c(NA, "2013-03-06", "2015-08-08", NA, "2017-03-06", "2014-08-08", NA, "2016-08-08"))
df$starts <- parse_date_time(df$starts, "ymd")
df$ends <- parse_date_time(df$ends, "ymd")
df
group starts ends
1 a 2011-09-18 <NA>
2 a <NA> 2013-03-06
3 a 2014-08-08 2015-08-08
4 a 2016-09-18 <NA>
5 a <NA> 2017-03-06
6 b 2013-08-08 2014-08-08
7 b 2015-08-08 <NA>
8 b <NA> 2016-08-08
Desired output is:
group starts ends epi
1 a 2011-09-18 <NA> 1
2 a <NA> 2013-03-06 1
3 a 2014-08-08 2015-08-08 2
4 a 2016-09-18 <NA> 3
5 a <NA> 2017-03-06 3
6 b 2013-08-08 2014-08-08 1
7 b 2015-08-08 <NA> 2
8 b <NA> 2016-08-08 2
I was thinking of something like this, but it obviously doesn't account for episodes where there is no NA:
df <- df %>%
group_by(group) %>%
mutate(epi = cumsum(is.na(ends)))
df
I'm not sure how to combine cumsum(is.na(...)) with an if_else condition. Maybe I'm going about it the wrong way?
Any suggestions would be great.
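A solution using dplyr, assuming your data frame is well structured so that each start always has an associated end record. Each non-missing start opens a new episode, so the cumulative count of non-missing starts within a group is the episode number.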
df2 <- df %>%
group_by(group) %>%
mutate(epi = cumsum(!is.na(starts))) %>%
ungroup()
df2
# # A tibble: 8 x 4
# group starts ends epi
# <fct> <dttm> <dttm> <int>
# 1 a 2011-09-18 00:00:00 NA 1
# 2 a NA 2013-03-06 00:00:00 1
# 3 a 2014-08-08 00:00:00 2015-08-08 00:00:00 2
# 4 a 2016-09-18 00:00:00 NA 3
# 5 a NA 2017-03-06 00:00:00 3
# 6 b 2013-08-08 00:00:00 2014-08-08 00:00:00 1
# 7 b 2015-08-08 00:00:00 NA 2
# 8 b NA 2016-08-08 00:00:00 2
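For readers without dplyr, the same counting idea can be sketched in base R with ave(). This is only an illustration under the same assumption as above (every episode begins with a non-missing start); the dates are kept as plain character strings because only their missingness matters here.

```r
# Toy data mirroring the question (character dates are enough for this sketch).
df <- data.frame(
  group  = c("a", "a", "a", "a", "a", "b", "b", "b"),
  starts = c("2011-09-18", NA, "2014-08-08", "2016-09-18", NA,
             "2013-08-08", "2015-08-08", NA),
  ends   = c(NA, "2013-03-06", "2015-08-08", NA, "2017-03-06",
             "2014-08-08", NA, "2016-08-08")
)

# Within each group, count how many non-missing starts have been seen so far.
# ave() applies cumsum() per group and returns the result in the original row order.
df$epi <- ave(as.integer(!is.na(df$starts)), df$group, FUN = cumsum)
df$epi
```

The row order within each group matters: an end-only row is attached to whichever start precedes it.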
Another option is to count the NA elements per row across the 'starts' and 'ends' columns with rowSums, and then, grouped by 'group', take the rleid of that count as 'epi':
library(dplyr)
library(data.table)
df %>%
mutate(epi = rowSums(is.na(.[c("starts", "ends")]))) %>%
group_by(group) %>%
mutate(epi = rleid(epi))
# A tibble: 8 x 4
# Groups: group [2]
# group starts ends epi
# <fct> <dttm> <dttm> <int>
#1 a 2011-09-18 00:00:00 NA 1
#2 a NA 2013-03-06 00:00:00 1
#3 a 2014-08-08 00:00:00 2015-08-08 00:00:00 2
#4 a 2016-09-18 00:00:00 NA 3
#5 a NA 2017-03-06 00:00:00 3
#6 b 2013-08-08 00:00:00 2014-08-08 00:00:00 1
#7 b 2015-08-08 00:00:00 NA 2
#8 b NA 2016-08-08 00:00:00 2
If there are only two columns to consider, the NA count can be computed directly:
df %>%
group_by(group) %>%
mutate(epi = rleid(is.na(starts) + is.na(ends)))
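One thing worth checking before relying on the run-length approach: a run id only changes when the NA count changes, so two adjacent episodes with the same count would share an id. The sketch below uses a hypothetical group (not the question's data) and a base R stand-in for data.table::rleid() to show this, next to the count of non-missing starts for comparison.

```r
# Number of missing dates per row: 0 = complete row, 1 = half of a start/end pair.
# Hypothetical group: two complete episodes back to back, then one split episode.
n_missing <- c(0, 0, 1, 1)

# Base R stand-in for data.table::rleid(): a new id whenever the value changes.
run_id <- cumsum(c(TRUE, diff(n_missing) != 0))
run_id

# Counting non-missing starts keeps the two complete rows apart.
starts_present <- c(TRUE, TRUE, TRUE, FALSE)
start_id <- cumsum(starts_present)
start_id
```

Here run_id puts the first two rows in the same episode, while start_id separates them. On the question's example data both approaches agree, because no two neighbouring episodes have the same NA count.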
I create a simple dataframe:
library(dplyr)
df <- tibble(
UserId = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
Answer_Date = as.Date(c("2010-12-31", "2011-12-29", "2012-12-25", "2013-12-10", "2014-12-31", "2010-10-31", "2011-10-28", "2013-10-31", "2015-10-31")),
Q1 = c(3, 1, 1, 0, 1, 4, 2, 5, 4),
Q2 = c(2, 0, 1, 2, 1, 8, 2, 6, 5),
) %>%
group_by(UserId) %>%
mutate(First_Date = min(Answer_Date)) %>%
mutate(Last_Date = max(Answer_Date)) %>%
ungroup()
which gives me
> df
# A tibble: 9 x 6
UserId Answer_Date Q1 Q2 First_Date Last_Date
<chr> <date> <dbl> <dbl> <date> <date>
1 A 2010-12-31 3 2 2010-12-31 2014-12-31
2 A 2011-12-29 1 0 2010-12-31 2014-12-31
3 A 2012-12-25 1 1 2010-12-31 2014-12-31
4 A 2013-12-10 0 2 2010-12-31 2014-12-31
5 A 2014-12-31 1 1 2010-12-31 2014-12-31
6 B 2010-10-31 4 8 2010-10-31 2015-10-31
7 B 2011-10-28 2 2 2010-10-31 2015-10-31
8 B 2013-10-31 5 6 2010-10-31 2015-10-31
9 B 2015-10-31 4 5 2010-10-31 2015-10-31
I now wish to compute the change in each subject's answers between the first and last date on which they answer the questionnaire. I start by writing
df_tmp <- df %>%
filter(Answer_Date == First_Date) %>%
select(c("UserId", "Q1", "Q2"))
colnames(df_tmp) <- c("UserId", paste0("First_Response_", c("Q1", "Q2")))
df <- merge(df, df_tmp, by = "UserId")
df_tmp <- df %>%
filter(Answer_Date == Last_Date) %>%
select(c("UserId", "Q1", "Q2"))
colnames(df_tmp) <- c("UserId", paste0("Last_Response_", c("Q1", "Q2")))
df <- merge(df, df_tmp, by = "UserId")
giving me
> df
UserId Answer_Date Q1 Q2 First_Date Last_Date First_Q1 First_Q2 Last_Q1 Last_Q2
1 A 2010-12-31 3 2 2010-12-31 2014-12-31 3 2 1 1
2 A 2011-12-29 1 0 2010-12-31 2014-12-31 3 2 1 1
3 A 2012-12-25 1 1 2010-12-31 2014-12-31 3 2 1 1
4 A 2013-12-10 0 2 2010-12-31 2014-12-31 3 2 1 1
5 A 2014-12-31 1 1 2010-12-31 2014-12-31 3 2 1 1
6 B 2010-10-31 4 8 2010-10-31 2015-10-31 4 8 4 5
7 B 2011-10-28 2 2 2010-10-31 2015-10-31 4 8 4 5
8 B 2013-10-31 5 6 2010-10-31 2015-10-31 4 8 4 5
9 B 2015-10-31 4 5 2010-10-31 2015-10-31 4 8 4 5
I now wish to create two new columns, Delta_Q1 = Last_Q1 - First_Q1 and Delta_Q2 = Last_Q2 - First_Q2, but (possibly) using mutate, paste0("First_", c("Q1", "Q2")), paste0("Last_", c("Q1", "Q2")) and paste0("Delta_", c("Q1", "Q2")).
What is the correct syntax for computing the differences (or in general, some function of two variables) between pairs of columns sequentially? The reason I don't want to write the differences down manually is simple - the real dataframe has lots of pairs of columns.
Many thanks in advance for your help.
Sincerely
Thomas Philips
You can create two vectors of column indices and subtract the corresponding columns directly to create the new columns.
first_r_col <- grep('First_Response', colnames(df))
last_r_col <- grep('Last_Response', colnames(df))
df[paste0('delta', seq_along(first_r_col))] <- df[last_r_col] - df[first_r_col]
Using dplyr's select with starts_with might be an easier way to pick the columns.
library(dplyr)
df[paste0('delta', seq_along(first_r_col))] <-
df %>% select(starts_with('Last_Response')) -
df %>% select(starts_with('First_Response'))
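The question also asks about "some function of two variables" in general, not only subtraction. A minimal base R sketch of that, assuming the First_Response_/Last_Response_ naming from the question: build both name vectors from the same list of questions so the pairs are guaranteed to line up, then apply a function pairwise with Map(). The name pair_fun is a hypothetical placeholder for whatever function is needed.

```r
# Small stand-in for the merged data frame (one row per user is enough here).
df <- data.frame(
  UserId = c("A", "B"),
  First_Response_Q1 = c(3, 4), First_Response_Q2 = c(2, 8),
  Last_Response_Q1  = c(1, 4), Last_Response_Q2  = c(1, 5)
)

questions  <- c("Q1", "Q2")
first_cols <- paste0("First_Response_", questions)
last_cols  <- paste0("Last_Response_", questions)
delta_cols <- paste0("Delta_", questions)

# pair_fun is hypothetical: any vectorised function of (last, first) would do.
pair_fun <- function(last, first) last - first

# Map() walks the two column lists in parallel, one pair per question.
df[delta_cols] <- Map(pair_fun, df[last_cols], df[first_cols])
df[delta_cols]
```

Deriving the names from questions avoids depending on the order in which grep() happens to find the columns.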
Here's one approach that does not require you to create the First_Date and Last_Date columns:
library(dplyr)
df %>%
group_by(UserId) %>%
arrange(UserId, Answer_Date) %>%
filter(row_number() == 1 | row_number() == n()) %>%
summarize(Delta_Q1 = diff(Q1),
Delta_Q2 = diff(Q2))
I don't think much of that code is needed. Below is a dplyr solution:
df %>%
group_by(UserId) %>%
arrange(Answer_Date) %>%
summarize(First_Q1 = first(Q1),
First_Q2 = first(Q2),
Last_Q1 = last(Q1),
Last_Q2 = last(Q2)) %>%
mutate(Delta_Q1 = Last_Q1 - First_Q1,
Delta_Q2 = Last_Q2 - First_Q2)
Gives the output of:
# A tibble: 2 x 7
UserId First_Q1 First_Q2 Last_Q1 Last_Q2 Delta_Q1 Delta_Q2
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 3 2 1 1 -2 -1
2 B 4 8 4 5 0 -3
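Since the real data has many question columns, the per-column lines above could be collapsed with across(). This is a sketch, assuming dplyr 1.0 or later and a shortened version of the question's data; the Delta_ prefix comes from the .names argument.

```r
library(dplyr)

# Shortened version of the question's data: first, middle and last answers only.
df <- tibble(
  UserId = c("A", "A", "A", "B", "B"),
  Answer_Date = as.Date(c("2010-12-31", "2012-12-25", "2014-12-31",
                          "2010-10-31", "2015-10-31")),
  Q1 = c(3, 1, 1, 4, 4),
  Q2 = c(2, 1, 1, 8, 5)
)

deltas <- df %>%
  arrange(UserId, Answer_Date) %>%   # make first()/last() follow date order
  group_by(UserId) %>%
  summarize(across(c(Q1, Q2), ~ last(.x) - first(.x), .names = "Delta_{.col}"))
deltas
```

With many questions, c(Q1, Q2) could be swapped for a selection helper such as starts_with("Q"), provided no other column shares that prefix.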
With the benefit of hindsight, I missed the blindingly obvious answer, and made it harder than it should have been:
QUESTIONS <- c("Q1", "Q2")
FIRST_RESPONSE_PREFIX <- "First_"
LAST_RESPONSE_PREFIX <- "Last_"
DELTA_RESPONSE_PREFIX <- "Delta_"
first_response_cols <- paste0(FIRST_RESPONSE_PREFIX, QUESTIONS)
last_response_cols <- paste0(LAST_RESPONSE_PREFIX, QUESTIONS)
delta_response_cols <- paste0(DELTA_RESPONSE_PREFIX, QUESTIONS)
df_tmp1 <- df %>%
filter(Answer_Date == First_Answer_Date) %>%
select(c("UserId", QUESTIONS))
colnames(df_tmp1) <- c("UserId", first_response_cols)
df <- merge(df, df_tmp1, by = "UserId")
df_tmp2 <- df %>%
filter(Answer_Date == Last_Answer_Date) %>%
select(c("UserId", QUESTIONS))
colnames(df_tmp2) <- c("UserId", last_response_cols)
df <- merge(df, df_tmp2, by = "UserId")
df[delta_response_cols] <- df[last_response_cols] - df[first_response_cols]
When I run the code, I get exactly what I want:
> df
UserId Answer_Date Q1 Q2 First_Answer_Date Last_Answer_Date First_Q1 First_Q2 Last_Q1 Last_Q2 Delta_Q1 Delta_Q2
1 A 2010-12-31 3 2 2010-12-31 2014-12-31 3 2 1 1 -2 -1
2 A 2011-12-29 1 0 2010-12-31 2014-12-31 3 2 1 1 -2 -1
3 A 2012-12-25 1 1 2010-12-31 2014-12-31 3 2 1 1 -2 -1
4 A 2013-12-10 0 2 2010-12-31 2014-12-31 3 2 1 1 -2 -1
5 A 2014-12-31 1 1 2010-12-31 2014-12-31 3 2 1 1 -2 -1
6 B 2010-10-31 4 8 2010-10-31 2015-10-31 4 8 4 5 0 -3
7 B 2011-10-28 2 2 2010-10-31 2015-10-31 4 8 4 5 0 -3
8 B 2013-10-31 5 6 2010-10-31 2015-10-31 4 8 4 5 0 -3
9 B 2015-10-31 4 5 2010-10-31 2015-10-31 4 8 4 5 0 -3
That said, thanks for the help - I learned something by looking at the suggested answers.
I have a large dataset of processes (their IDs), start-dates and corresponding end dates.
I want two things. First, how many processes are running on each day. Second, the mean number of days that the processes running on each day have been running since they commenced.
Sample data set is like
> dput(df)
structure(list(Process = c("P001", "P002", "P003", "P004", "P005"
), Start = c("01-01-2020", "02-01-2020", "03-01-2020", "08-01-2020",
"13-01-2020"), End = c("10-01-2020", "09-01-2020", "04-01-2020",
"17-01-2020", "19-01-2020")), class = "data.frame", row.names = c(NA,
-5L))
> df
Process Start End
1 P001 01-01-2020 10-01-2020
2 P002 02-01-2020 09-01-2020
3 P003 03-01-2020 04-01-2020
4 P004 08-01-2020 17-01-2020
5 P005 13-01-2020 19-01-2020
For the first part, I have proceeded like this:
library(tidyverse)
df %>% pivot_longer(cols = c(Start, End), names_to = 'event', values_to = 'dates') %>%
mutate(dates = as.Date(dates, format = "%d-%m-%Y")) %>%
mutate(dates = if_else(event == 'End', dates+1, dates)) %>%
arrange(dates, event) %>%
mutate(processes = ifelse(event == 'Start', 1, -1),
processes = cumsum(processes)) %>%
select(-Process, -event) %>%
complete(dates = seq.Date(min(dates), max(dates), by = '1 day')) %>%
fill(processes)
# A tibble: 20 x 2
dates processes
<date> <dbl>
1 2020-01-01 1
2 2020-01-02 2
3 2020-01-03 3
4 2020-01-04 3
5 2020-01-05 2
6 2020-01-06 2
7 2020-01-07 2
8 2020-01-08 3
9 2020-01-09 3
10 2020-01-10 2
11 2020-01-11 1
12 2020-01-12 1
13 2020-01-13 2
14 2020-01-14 2
15 2020-01-15 2
16 2020-01-16 2
17 2020-01-17 2
18 2020-01-18 1
19 2020-01-19 1
20 2020-01-20 0
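For the second part, the desired output is a "mean days" column: for each date, the average number of days that the processes running on that date have been running so far. (The original post illustrated this with a screenshot.)
A tidyverse approach is preferred, please.
Here is one approach: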
library(tidyverse)
df %>%
#Convert to date
mutate(across(c(Start, End), lubridate::dmy),
#Create a sequence of dates from start to end
Dates = map2(Start, End, seq, by = 'day')) %>%
#Get data in long format
unnest(Dates) %>%
#Remove columns
select(-Start, -End) %>%
#For each process
group_by(Process) %>%
#Count number of days spent on it
mutate(days_spent = row_number() - 1) %>%
#For each date
group_by(Dates) %>%
#Count number of process running and average days
summarise(process = n(),
mean_days = mean(days_spent))
This returns:
# Dates process mean_days
# <date> <int> <dbl>
# 1 2020-01-01 1 0
# 2 2020-01-02 2 0.5
# 3 2020-01-03 3 1
# 4 2020-01-04 3 2
# 5 2020-01-05 2 3.5
# 6 2020-01-06 2 4.5
# 7 2020-01-07 2 5.5
# 8 2020-01-08 3 4.33
# 9 2020-01-09 3 5.33
#10 2020-01-10 2 5.5
#11 2020-01-11 1 3
#12 2020-01-12 1 4
#13 2020-01-13 2 2.5
#14 2020-01-14 2 3.5
#15 2020-01-15 2 4.5
#16 2020-01-16 2 5.5
#17 2020-01-17 2 6.5
#18 2020-01-18 1 5
#19 2020-01-19 1 6
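To sanity-check a single row of that table by hand, the same definition can be written directly in base R. This is only a cross-check sketch, not a replacement for the pipeline; mean_days_on() is a hypothetical helper, and it treats the End date as a day on which the process is still running, as the answer does.

```r
df <- data.frame(
  Process = c("P001", "P002", "P003", "P004", "P005"),
  Start = as.Date(c("01-01-2020", "02-01-2020", "03-01-2020",
                    "08-01-2020", "13-01-2020"), format = "%d-%m-%Y"),
  End   = as.Date(c("10-01-2020", "09-01-2020", "04-01-2020",
                    "17-01-2020", "19-01-2020"), format = "%d-%m-%Y")
)

# Hypothetical helper: running count and mean elapsed days for one date.
mean_days_on <- function(day) {
  running <- df$Start <= day & df$End >= day          # End day counts as running
  c(process   = sum(running),
    mean_days = mean(as.numeric(day - df$Start[running])))
}

check <- mean_days_on(as.Date("2020-01-08"))
check
```

On 2020-01-08 the running processes are P001, P002 and P004, with 7, 6 and 0 elapsed days, which gives the 4.33 shown in row 8 above.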
Here is my toy dataset:
df <- tibble::tribble(
~date, ~value,
"2007-01-31", 25,
"2007-05-31", 31,
"2007-12-31", 26
)
I am creating month-end date series using the following code.
df %>%
mutate(date = as.Date(date)) %>%
complete(date = seq(as.Date("2007-01-31"), as.Date("2019-12-31"), by="month"))
However, I am not getting the correct month-end dates.
date value
<date> <dbl>
1 2007-01-31 25
2 2007-03-03 NA
3 2007-03-31 NA
4 2007-05-01 NA
5 2007-05-31 31
6 2007-07-01 NA
7 2007-07-31 NA
8 2007-08-31 NA
9 2007-10-01 NA
10 2007-10-31 NA
11 2007-12-01 NA
12 2007-12-31 26
What am I missing here? I am okay using other functions from any other package.
There is no need for the complete function to build the dates; you can do this in base R.
Since last day of the month is different for different months, we can create a sequence of monthly start dates and subtract 1 day from it.
seq(as.Date("2007-02-01"), as.Date("2008-01-01"), by="month") - 1
#[1] "2007-01-31" "2007-02-28" "2007-03-31" "2007-04-30" "2007-05-31" "2007-06-30"
# "2007-07-31" "2007-08-31" "2007-09-30" "2007-10-31" "2007-11-30" "2007-12-31"
Using the same logic on the updated data frame, we can do:
library(dplyr)
df %>%
mutate(date = as.Date(date)) %>%
tidyr::complete(date = seq(min(date) + 1, max(date) + 1, by="month") - 1)
# date value
# <date> <dbl>
# 1 2007-01-31 25
# 2 2007-02-28 NA
# 3 2007-03-31 NA
# 4 2007-04-30 NA
# 5 2007-05-31 31
# 6 2007-06-30 NA
# 7 2007-07-31 NA
# 8 2007-08-31 NA
# 9 2007-09-30 NA
#10 2007-10-31 NA
#11 2007-11-30 NA
#12 2007-12-31 26
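The min(date) + 1 trick relies on the first and last dates already being month ends. If that cannot be assumed, one possible generalisation is sketched below in base R; month_ends() is a hypothetical helper, not something from the answer. The first line reproduces the overflow from the question for comparison.

```r
# Stepping by month from the 31st overflows in shorter months.
overflow <- seq(as.Date("2007-01-31"), by = "month", length.out = 3)
overflow

# Hypothetical helper: month-end dates for every month touched by [from, to].
month_ends <- function(from, to) {
  first_of_from  <- as.Date(format(from, "%Y-%m-01"))
  # First day of the month after `to`.
  first_after_to <- seq(as.Date(format(to, "%Y-%m-01")),
                        by = "month", length.out = 2)[2]
  # Month starts are safe to step through; drop the first, then go back one day.
  seq(first_of_from, first_after_to, by = "month")[-1] - 1
}

ends <- month_ends(as.Date("2007-01-15"), as.Date("2007-04-02"))
ends
```

The result could then be passed to tidyr::complete() in place of the seq(...) - 1 expression.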
I'm having some trouble with the logic I need to produce df$val_most_recent. If there's a value for both a_val and b_val, val_most_recent should be the value with the most recent time (a_val corresponds to a_dtm, b_val corresponds to b_dtm). If the times are identical, I'd like a_val to be val_most_recent. If just one value is reported for the two (with the other being NA), it should simply be that one.
library(tidyverse)
library(lubridate)
location <- c("a", "b", "c", "d")
a_dtm <- ymd_hm(c(NA, "2019-06-05 10:30", "2019-06-05 10:45", "2019-06-05 10:50"))
b_dtm <- ymd_hm(c("2019-06-05 10:30", NA, "2019-06-05 10:48", "2019-06-05 10:50"))
a_val <- c(NA, 6, 4, 2)
b_val <- c(5, NA, 3, 2)
df <- data.frame(location, a_dtm, b_dtm, a_val, b_val)
as_tibble(df)
# A tibble: 4 x 5
#location a_dtm b_dtm a_val b_val
#<fct> <dttm> <dttm> <dbl> <dbl>
#1 a NA 2019-06-05 10:30:00 NA 5
#2 b 2019-06-05 10:30:00 NA 6 NA
#3 c 2019-06-05 10:45:00 2019-06-05 10:48:00 4 3
#4 d 2019-06-05 10:50:00 2019-06-05 10:50:00 2 2
val_most_recent <- c(5,6,3,2)
desired_df <- cbind(df, val_most_recent)
as_tibble(desired_df)
#location a_dtm b_dtm a_val b_val val_most_recent
#<fct> <dttm> <dttm> <dbl> <dbl> <dbl>
#1 a NA 2019-06-05 10:30:00 NA 5 5
#2 b 2019-06-05 10:30:00 NA 6 NA 6
#3 c 2019-06-05 10:45:00 2019-06-05 10:48:00 4 3 3
#4 d 2019-06-05 10:50:00 2019-06-05 10:50:00 2 2 2
Here is the logic from your text coded into a case_when statement:
df %>%
mutate(
val_most_recent = case_when(
is.na(a_val) | is.na(b_val) ~ coalesce(a_val, b_val),
a_dtm >= b_dtm ~ a_val,
TRUE ~ b_val
)
)
# location a_dtm b_dtm a_val b_val val_most_recent
# 1 a <NA> 2019-06-05 10:30:00 NA 5 5
# 2 b 2019-06-05 10:30:00 <NA> 6 NA 6
# 3 c 2019-06-05 10:45:00 2019-06-05 10:48:00 4 3 3
# 4 d 2019-06-05 10:50:00 2019-06-05 10:50:00 2 2 2
Here is one option in base R: convert the date-times to numeric, replace the NAs with 0, find the column index of the maximum in each row (ties go to the first column, i.e. a_val), cbind it with the row index, and use the resulting matrix to extract the corresponding values from the a_val/b_val columns.
m1 <- sapply(df[2:3], as.numeric)
df$val_most_recent <- df[4:5][cbind(seq_len(nrow(m1)),
max.col(replace(m1, is.na(m1), 0), "first"))]
df$val_most_recent
#[1] 5 6 3 2
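The one-liner above packs several steps together. Below is a step-by-step sketch of the same idea with named intermediates, rebuilt in base R only (as.POSIXct instead of lubridate) so it runs on its own. The intermediate names times, pick and vals are illustrative, and the explicit as.matrix() call is an addition to make the matrix indexing visible.

```r
fmt <- "%Y-%m-%d %H:%M"
df <- data.frame(
  location = c("a", "b", "c", "d"),
  a_dtm = as.POSIXct(c(NA, "2019-06-05 10:30", "2019-06-05 10:45",
                       "2019-06-05 10:50"), format = fmt, tz = "UTC"),
  b_dtm = as.POSIXct(c("2019-06-05 10:30", NA, "2019-06-05 10:48",
                       "2019-06-05 10:50"), format = fmt, tz = "UTC"),
  a_val = c(NA, 6, 4, 2),
  b_val = c(5, NA, 3, 2)
)

# 1. Times as a numeric matrix (seconds since the epoch), missing times as 0.
times <- sapply(df[c("a_dtm", "b_dtm")], as.numeric)
times[is.na(times)] <- 0

# 2. Per row, the column holding the later time; ties resolve to column 1 (a).
pick <- max.col(times, ties.method = "first")

# 3. Index the value matrix with (row, column) pairs.
vals <- as.matrix(df[c("a_val", "b_val")])
df$val_most_recent <- vals[cbind(seq_len(nrow(df)), pick)]
df$val_most_recent
```

Replacing missing times with 0 works because any real timestamp here is far larger than 0, so a missing time never wins the comparison.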
I am trying to use the na.locf function in a mutate and I am getting a strange answer. The data is ordered desc by date and then if a column is NA gets the result from na.locf and otherwise uses the value in the column. For most of the data, the answer is being returned as expected, but one row is coming back not as the previous non-NA but as the next non-NA. If we order the data by date ascending and use na.rm = F and fromLast = T it works as expected, but I want to understand why the result is not working if date is ordered descending.
The example is as follows:
library(dplyr)  # mutate, arrange, %>%
library(tidyr)  # replace_na
library(zoo)    # na.locf
example = data.frame(Date = factor(c("1/14/15", "1/29/15", "2/3/15",
"2/11/15", "2/15/15", "3/4/15","3/7/15", "3/7/15", "3/11/15",
"3/18/15", "3/21/15", "4/22/15", "4/22/15", "4/23/15", "5/6/15",
"5/13/15", "5/18/15", "5/24/15", "5/26/15", "5/28/15", "5/29/15",
"5/29/15", "6/25/15", "6/25/15","8/6/15", "8/15/15", "8/20/15",
"8/22/15", "8/22/15", "8/29/15")),
Scan = c(1, rep(NA, 21),2,rep(NA,7)),
Hours = c(rep(NA,3), rep(3,3), NA, 2, rep(3,3), NA, 2, 3, 2,
rep(3,5), NA, 2, rep(c(NA, 3),2), 3, NA, 2, 3)
)
example %>%
mutate(
date = as.Date(Date, "%m/%d/%y"),
Hours = replace_na(Hours,0),
scan_date = as.Date(ifelse(is.na(Scan),
NA,
date),
origin="1970-01-01")) %>%
arrange(desc(date)) %>%
mutate(
scan_new = ifelse(is.na(Scan),
na.locf(Scan),
Scan))
The problem is in row 24 of the result, where scan_new comes back as 1 rather than 2:
Date Scan Hours date scan_date scan_new
23 3/7/15 NA 0 2015-03-07 <NA> 2
24 3/7/15 NA 2 2015-03-07 <NA> 1
25 3/4/15 NA 3 2015-03-04 <NA> 2
Interestingly, other rows that share a date are handled correctly, for example rows 18-19:
Date Scan Hours date scan_date scan_new
18 4/22/15 NA 0 2015-04-22 <NA> 2
19 4/22/15 NA 2 2015-04-22 <NA> 2
For reference as noted above, the following provides the expected answer:
example %>%
mutate(
date = as.Date(Date, "%m/%d/%y"),
Hours = replace_na(Hours,0),
scan_date = as.Date(ifelse(is.na(Scan),
NA,
date),
origin="1970-01-01")) %>%
arrange(desc(date)) %>%
mutate(
scan_new = ifelse(is.na(Scan),
na.locf(Scan, na.rm = F, fromLast = T),
Scan))
Date Scan Hours date scan_date scan_new
6 3/4/15 NA 3 2015-03-04 <NA> 2
7 3/7/15 NA 0 2015-03-07 <NA> 2
8 3/7/15 NA 2 2015-03-07 <NA> 2
Can someone tell me why this is behaving this way?
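In your first attempt, na.locf(Scan) uses the default na.rm = TRUE, so the leading NAs are removed and the result is shorter than Scan. ifelse then recycles that shorter vector to the full length, which shifts the values out of alignment. For reference, here are the results with na.rm = FALSE (or na.locf0, see comments):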
example %>%
mutate(
date = as.Date(Date, "%m/%d/%y"),
Hours = replace_na(Hours,0),
scan_date = as.Date(ifelse(is.na(Scan),
NA,
date),
origin="1970-01-01")) %>%
arrange(desc(date)) %>%
mutate(
scan_new = ifelse(is.na(Scan),
na.locf(Scan, na.rm = FALSE),
Scan))
# Date Scan Hours date scan_date scan_new
# 1 8/29/15 NA 3 2015-08-29 <NA> NA
# 2 8/22/15 NA 0 2015-08-22 <NA> NA
# 3 8/22/15 NA 2 2015-08-22 <NA> NA
# 4 8/20/15 NA 3 2015-08-20 <NA> NA
# 5 8/15/15 NA 3 2015-08-15 <NA> NA
# 6 8/6/15 NA 0 2015-08-06 <NA> NA
# 7 6/25/15 2 0 2015-06-25 2015-06-25 2
# 8 6/25/15 NA 3 2015-06-25 <NA> 2
# 9 5/29/15 NA 0 2015-05-29 <NA> 2
# 10 5/29/15 NA 2 2015-05-29 <NA> 2
# 11 5/28/15 NA 3 2015-05-28 <NA> 2
# 12 5/26/15 NA 3 2015-05-26 <NA> 2
# 13 5/24/15 NA 3 2015-05-24 <NA> 2
# 14 5/18/15 NA 3 2015-05-18 <NA> 2
# 15 5/13/15 NA 3 2015-05-13 <NA> 2
# 16 5/6/15 NA 2 2015-05-06 <NA> 2
# 17 4/23/15 NA 3 2015-04-23 <NA> 2
# 18 4/22/15 NA 0 2015-04-22 <NA> 2
# 19 4/22/15 NA 2 2015-04-22 <NA> 2
# 20 3/21/15 NA 3 2015-03-21 <NA> 2
# 21 3/18/15 NA 3 2015-03-18 <NA> 2
# 22 3/11/15 NA 3 2015-03-11 <NA> 2
# 23 3/7/15 NA 0 2015-03-07 <NA> 2
# 24 3/7/15 NA 2 2015-03-07 <NA> 2
# 25 3/4/15 NA 3 2015-03-04 <NA> 2
# 26 2/15/15 NA 3 2015-02-15 <NA> 2
# 27 2/11/15 NA 3 2015-02-11 <NA> 2
# 28 2/3/15 NA 0 2015-02-03 <NA> 2
# 29 1/29/15 NA 0 2015-01-29 <NA> 2
# 30 1/14/15 1 0 2015-01-14 2015-01-14 1
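The recycling can be seen in isolation by rebuilding just the Scan column in its descending order (six leading NAs, a 2, 22 NAs, then a 1) and comparing lengths. This is a sketch using zoo only; the vector name scan and the result names are illustrative.

```r
library(zoo)

# Scan after arrange(desc(date)): six leading NAs, a 2, 22 NAs, then a 1.
scan <- c(rep(NA, 6), 2, rep(NA, 22), 1)

# Default na.rm = TRUE drops the six leading NAs, leaving 24 values.
filled <- na.locf(scan)
length(filled)

# ifelse() recycles the 24 values to length 30, so position 24 receives the
# final 1 that belongs to position 30.
res <- ifelse(is.na(scan), filled, scan)
res[23:25]

# Keeping the leading NAs preserves the alignment.
res_fixed <- ifelse(is.na(scan), na.locf(scan, na.rm = FALSE), scan)
res_fixed[23:25]
```

With na.rm = FALSE the filled vector already contains the original non-missing values, so the ifelse wrapper is not strictly needed; na.locf(Scan, na.rm = FALSE) alone gives the same column.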