Create new variable based on a condition across multiple columns - r

I have a binary variable ("Penalty") and 30 factors with the same levels: "Discharge", "Suspended", "Fine", "Community order", and "Imprisonment".
A small example:
ID  Possession    Importation  Production       Penalty
1   Fine          NA           Fine             Yes
2   NA            NA           Community order  No
3   Discharge     Discharge    NA               No
4   NA            NA           Suspended        Yes
5   Imprisonment  NA           NA               No
6   Fine          NA           Imprisonment     No
I would like to create a new factor, 'Sentence', from these columns plus the binary variable. Where a row contains differing levels, the new variable should keep the level with the highest priority: Imprisonment > Community order, Suspended > Fine > Discharge. For example, Discharge should appear in the new column only where no other level appears in that row.
Desired output:
ID  Possession    Importation  Production       Penalty  Sentence
1   Fine          NA           Fine             Yes      Fine
2   NA            NA           Community order  No       Community order
3   Discharge     Discharge    NA               No       Discharge
4   NA            NA           Suspended        Yes      Suspended
5   Imprisonment  NA           NA               No       Imprisonment
6   Fine          NA           Imprisonment     No       Imprisonment
This is what I have attempted: (where "vec" is a vector of the factor column indices)
data <- data %>%
mutate(
crim_sanct = case_when(
(if_any(vec) == "Discharge") ~ "Discharge",
(if_any(vec) == "Fine") | (data$Penalty == "Yes") ~ "Fine",
(if_any(vec) == "Suspended") ~ "Suspended",
(if_any(vec) == "Community order") ~ "Community order",
(if_any(vec) == "Imprisonment") ~ "imprisonment"))

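You are on the right track, but the if_any syntax is off: if_any takes the columns and a predicate function, rather than being compared with == afterwards.
Also, case_when returns the value of the first condition that matches, so the conditions must be listed in priority order. Since Imprisonment > Community order, the Imprisonment condition must come before the Community order condition.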
library(dplyr)
data <- data %>%
  mutate(
    crim_sanct =
      case_when(
        # conditions are listed from highest to lowest priority
        if_any(Possession:Production, ~ . == "Imprisonment") ~ "Imprisonment",
        if_any(Possession:Production, ~ . == "Community order") ~ "Community order",
        if_any(Possession:Production, ~ . == "Suspended") ~ "Suspended",
        if_any(Possession:Production, ~ . == "Fine") | (Penalty == "Yes") ~ "Fine",
        if_any(Possession:Production, ~ . == "Discharge") ~ "Discharge")
  )
data
#  ID   Possession Importation      Production Penalty      crim_sanct
#1  1         Fine        <NA>            Fine     Yes            Fine
#2  2         <NA>        <NA> Community order      No Community order
#3  3    Discharge   Discharge            <NA>      No       Discharge
#4  4         <NA>        <NA>       Suspended     Yes       Suspended
#5  5 Imprisonment        <NA>            <NA>      No    Imprisonment
#6  6         Fine        <NA>    Imprisonment      No    Imprisonment

Since I don't know how to handle the Penalty column, we ignore it for now. Creating a column Sentence based on the columns Possession, Importation and Production could be done with
library(dplyr)
data %>%
  mutate(across(
    Possession:Production,
    ~ factor(.x,
             c("Imprisonment", "Community order", "Suspended", "Fine", "Discharge"),
             ordered = TRUE))) %>%
  rowwise() %>%
  mutate(Sentence = min(c_across(Possession:Production), na.rm = TRUE)) %>%
  ungroup()
which returns
# A tibble: 6 x 6
     ID Possession   Importation Production      Penalty Sentence
  <dbl> <ord>        <ord>       <ord>           <chr>   <ord>
1     1 Fine         NA          Fine            Yes     Fine
2     2 NA           NA          Community order No      Community order
3     3 Discharge    Discharge   NA              No      Discharge
4     4 NA           NA          Suspended       Yes     Suspended
5     5 Imprisonment NA          NA              No      Imprisonment
6     6 Fine         NA          Imprisonment    No      Imprisonment
The main idea here is to create ordered factors whose first level has the highest priority, and then use a rowwise min to pick the sentence with the highest priority.
Data
data <- structure(list(ID = c(1, 2, 3, 4, 5, 6), Possession = c("Fine",
NA, "Discharge", NA, "Imprisonment", "Fine"), Importation = c(NA,
NA, "Discharge", NA, NA, NA), Production = c("Fine", "Community order",
NA, "Suspended", NA, "Imprisonment"), Penalty = c("Yes", "No",
"No", "Yes", "No", "No")), problems = structure(list(row = 6L,
col = "Penalty", expected = "", actual = "embedded null",
file = "literal data"), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame")), class = "data.frame", row.names = c(NA,
-6L), spec = structure(list(cols = list(ID = structure(list(), class = c("collector_double",
"collector")), Possession = structure(list(), class = c("collector_character",
"collector")), Importation = structure(list(), class = c("collector_character",
"collector")), Production = structure(list(), class = c("collector_character",
"collector")), Penalty = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
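A vectorised variant of the same idea, as a rough sketch in base R: each cell is turned into its priority rank and pmin picks the best rank per row, which avoids rowwise(). Treating Penalty == "Yes" as a candidate "Fine" is an assumption taken from the asker's own attempt, not something the question states outright; the names lvls, offence_cols, ranks and best are made up for this example.

```r
# Sample data from the question (character columns, NA where nothing applies)
data <- data.frame(
  ID = 1:6,
  Possession  = c("Fine", NA, "Discharge", NA, "Imprisonment", "Fine"),
  Importation = c(NA, NA, "Discharge", NA, NA, NA),
  Production  = c("Fine", "Community order", NA, "Suspended", NA, "Imprisonment"),
  Penalty     = c("Yes", "No", "No", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# Levels listed from highest to lowest priority
lvls <- c("Imprisonment", "Community order", "Suspended", "Fine", "Discharge")
offence_cols <- c("Possession", "Importation", "Production")

# Rank of each cell (1 = highest priority); NA cells stay NA
ranks <- lapply(data[offence_cols], match, table = lvls)

# ASSUMPTION: Penalty == "Yes" counts as a candidate "Fine"
ranks$Penalty <- ifelse(data$Penalty == "Yes", match("Fine", lvls), NA_integer_)

# Best (smallest) rank per row, ignoring NAs, in one vectorised call
best <- do.call(pmin, c(unname(ranks), list(na.rm = TRUE)))

# Map the rank back to its label; rows with no information at all stay NA
data$Sentence <- factor(lvls[best], levels = lvls, ordered = TRUE)
data
```

With 30 offence columns only offence_cols needs to change, for example to a vector of column names or indices.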

Related

R:Sorting rows with time within multiple time interval

I want to pick up rows of which time data is between multiple intervals.
The data frame is like this:
dputs
structure(list(ID = c("A", "A", "A", "A", "A", "B", "B", "B",
"B", "B"), score_time = c("2022/09/01 9:00:00", "2022/09/02 18:00:00",
"2022/09/03 12:00:00", NA, NA, "2022/09/15 18:00:00", "2022/09/18 20:00:00",
NA, NA, NA), score = c(243, 232, 319, NA, NA, 436, 310, NA, NA,
NA), treatment_start = c(NA, NA, NA, "2022/09/02 8:00:00", "2022/09/03 11:00:00",
NA, NA, "2022/09/15 8:00:00", "2022/09/16 14:00:00", "2022/09/16 23:00:00"
), treatment_end = c(NA, NA, NA, "2022/09/02 22:00:00", "2022/09/09 12:00:00",
NA, NA, "2022/09/16 2:00:00", "2022/09/16 22:00:00", "2022/09/17 0:00:00"
)), row.names = c(NA, -10L), spec = structure(list(cols = list(
ID = structure(list(), class = c("collector_character", "collector"
)), score_time = structure(list(), class = c("collector_character",
"collector")), score = structure(list(), class = c("collector_double",
"collector")), treatment_start = structure(list(), class = c("collector_character",
"collector")), treatment_end = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), problems = <pointer: 0x6000000190b0>, class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))
ID score_time score treatment_start treatment_end
<chr> <chr> <dbl> <chr> <chr>
1 A 2022/09/01 9:00:00 243 NA NA
2 A 2022/09/02 18:00:00 232 NA NA
3 A 2022/09/03 12:00:00 319 NA NA
4 A NA NA 2022/09/02 8:00:00 2022/09/02 22:00:00
5 A NA NA 2022/09/03 11:00:00 2022/09/09 12:00:00
6 B 2022/09/15 18:00:00 436 NA NA
7 B 2022/09/18 20:00:00 310 NA NA
8 B NA NA 2022/09/15 8:00:00 2022/09/16 2:00:00
9 B NA NA 2022/09/16 14:00:00 2022/09/16 22:00:00
10 B NA NA 2022/09/16 23:00:00 2022/09/17 0:00:00
Each ID has several score values, each with its measurement time.
Each ID also has more than one treatment period, given by a start and an end time.
I want the score values that were measured during a treatment period.
I tried to build intervals with lubridate and the tidyverse but could not apply an "%in%"-style test to them.
Here is my attempt, which gets as far as putting the intervals in the same rows as the score values.
data %>%
mutate(trt_interval = interval(start = treatment_start, end = treatment_end)) %>%
group_by(ID) %>%
mutate(num = row_number()) %>%
pivot_wider(names_from = num, names_prefix = "intvl", values_from = trt_interval) %>%
fill(c(intvl1:last_col()), .direction = "up")
Desired output is like this.
(The first score of A and the last score of B are dropped because their score_time falls outside every interval.)
ID score
<chr> <dbl>
1 A 232
2 A 319
3 B 436
I would like to know a smarter way to put the data in one row and how to apply "%in%" to multiple intervals.
Sorry that the question is not well formed and covers multiple steps, but any advice would be a great help.
Hi, I would first create two separate data frames: one for the scores and one for the intervals. Then I would join them and keep the scores that fall within a treatment interval.
library(dplyr)
library(lubridate)
# scores only, with score_time parsed as a date-time
data_score <- data %>%
  filter(!is.na(score_time)) %>%
  select(-starts_with("treat")) %>%
  mutate(score_time = ymd_hms(score_time))
data_score
# treatment periods only, stored as lubridate intervals
data_interval <- data %>%
  filter(is.na(score_time)) %>%
  select(ID, starts_with("treat")) %>%
  mutate(trt_interval = interval(start = ymd_hms(treatment_start),
                                 end = ymd_hms(treatment_end)))
# pair every score with every interval of the same ID, keep the matches
data_score %>%
  inner_join(data_interval, by = "ID") %>%
  filter(score_time %within% trt_interval)
Hope this helps!!
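As a possible alternative, the join and the filter can be folded into one step with a non-equi join. This is only a sketch and assumes dplyr 1.1.0 or later, where join_by() accepts a between() helper; the data is rebuilt by hand in ISO format, and the names scores, treatments and in_treatment are made up for the example.

```r
library(dplyr)
library(lubridate)

# Scores and treatment periods from the question, already split into two tables
scores <- tibble(
  ID = c("A", "A", "A", "B", "B"),
  score_time = ymd_hms(c("2022-09-01 09:00:00", "2022-09-02 18:00:00",
                         "2022-09-03 12:00:00", "2022-09-15 18:00:00",
                         "2022-09-18 20:00:00")),
  score = c(243, 232, 319, 436, 310)
)
treatments <- tibble(
  ID = c("A", "A", "B", "B", "B"),
  treatment_start = ymd_hms(c("2022-09-02 08:00:00", "2022-09-03 11:00:00",
                              "2022-09-15 08:00:00", "2022-09-16 14:00:00",
                              "2022-09-16 23:00:00")),
  treatment_end = ymd_hms(c("2022-09-02 22:00:00", "2022-09-09 12:00:00",
                            "2022-09-16 02:00:00", "2022-09-16 22:00:00",
                            "2022-09-17 00:00:00"))
)

# Keep a score only if its time lies inside a treatment period of the same ID
in_treatment <- scores %>%
  inner_join(
    treatments,
    by = join_by(ID, between(score_time, treatment_start, treatment_end))
  ) %>%
  # a score inside overlapping periods would otherwise appear more than once
  distinct(ID, score_time, score) %>%
  select(ID, score)

in_treatment
```

The bounds of between() are inclusive, which matches the behaviour of %within%.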

Moving elements from column to column in r

I have a dataframe that looks like this (but for every US county)
county          state  n_state_1  n_state_2  n_state_3  n_state_4
Autauga County  AL     NA         FL         NA         NA
Baldwin County  AL     GA         NA         TN         NA
Catron County   AL     FL         GA         NA         CA
I want to move the non-missing values (FL,GA,TN etc.) to the first columns starting from n_state_1 and then delete the columns containing only missing values to get:
county          state  n_state_1  n_state_2  n_state_3
Autauga County  AL     FL         NA         NA
Baldwin County  AL     GA         TN         NA
Catron County   AL     FL         GA         CA
I am struggling with the first step. I thought about using the function distinct but it doesn't work because there are non-empty elements in each column.
You could use dplyr and tidyr:
library(dplyr)
library(tidyr)
df %>%
  pivot_longer(starts_with("n_state")) %>%
  drop_na() %>%
  group_by(county, state) %>%
  mutate(name = row_number()) %>%
  pivot_wider(names_prefix = "n_state_")
which returns
county state n_state_1 n_state_2 n_state_3
<chr> <chr> <chr> <chr> <chr>
1 Autauga_County AL FL NA NA
2 Baldwin_County AL GA TN NA
3 Catron_County AL FL GA CA
What happened here?
pivot_longer takes the n_state_{n}-columns and collapses them into two columns: the name-column contains the original column name (n_state_1, n_state_2 etc), the value-column contains the states (FL, GA or <NA> in many cases).
Next we remove every <NA> entry. (Note: I write <NA> to make clear it is an NA value.)
After grouping by county and state we add a row number. These numbers are used later to create the new column names.
pivot_wider now takes these row numbers and prefixes them with n_state_ to get the new columns. The values are taken from the value-column created in the second line of code. pivot_wider fills the missing values with <NA>-values (default behaviour).
Data
structure(list(county = c("Autauga_County", "Baldwin_County",
"Catron_County"), state = c("AL", "AL", "AL"), n_state_1 = c(NA,
"GA", "FL"), n_state_2 = c("FL", NA, "GA"), n_state_3 = c(NA,
"TN", NA), n_state_4 = c(NA, NA, "CA")), problems = structure(list(
row = 3L, col = "n_state_4", expected = "", actual = "embedded null",
file = "literal data"), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame")), class = c("spec_tbl_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -3L), spec = structure(list(
cols = list(county = structure(list(), class = c("collector_character",
"collector")), state = structure(list(), class = c("collector_character",
"collector")), n_state_1 = structure(list(), class = c("collector_character",
"collector")), n_state_2 = structure(list(), class = c("collector_character",
"collector")), n_state_3 = structure(list(), class = c("collector_character",
"collector")), n_state_4 = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
Another option uses dapply from collapse to move the non-NA elements to the front of each row, and then selects only the columns that contain any non-NA elements:
library(collapse)
library(dplyr)
dapply(df1, MARGIN = 1, FUN = function(x) c(x[!is.na(x)], x[is.na(x)])) %>%
select(where(~ any(complete.cases(.))))
# A tibble: 3 x 5
county state n_state_1 n_state_2 n_state_3
<chr> <chr> <chr> <chr> <chr>
1 Autauga_County AL FL <NA> <NA>
2 Baldwin_County AL GA TN <NA>
3 Catron_County AL FL GA CA
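The same row-wise shift can also be sketched in base R without extra packages. The data frame is rebuilt by hand from the example, and cols and shifted are names made up for this sketch.

```r
df <- data.frame(
  county = c("Autauga_County", "Baldwin_County", "Catron_County"),
  state = c("AL", "AL", "AL"),
  n_state_1 = c(NA, "GA", "FL"),
  n_state_2 = c("FL", NA, "GA"),
  n_state_3 = c(NA, "TN", NA),
  n_state_4 = c(NA, NA, "CA"),
  stringsAsFactors = FALSE
)

cols <- grep("^n_state_", names(df), value = TRUE)

# For every row: non-missing values first, then the NAs.
# apply() returns one column per row, hence the t().
shifted <- t(apply(df[cols], 1, function(x) {
  x <- unname(x)
  c(x[!is.na(x)], x[is.na(x)])
}))
df[cols] <- as.data.frame(shifted, stringsAsFactors = FALSE)

# Drop the columns that now contain only NA
df <- df[, colSums(!is.na(df)) > 0, drop = FALSE]
df
```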

Iterate through columns' suffixes in a for loop. R

I am trying to modify my dataset with a for loop. I want to modify certain cells of some columns depending on the value of its "paired" column. My dataset could be:
data1989 <- data.frame("date" = c("1987-01-01", "1987-01-03", "1987-01-19"),
"NDVI_1" = c(NA, 0.589, 0.120),
"NDVI_3" = c(NA, 0.447, NA),
"NDVI_4" = c(NA, NA, NA),
"pixelQA_1" = c(NA, 66.897,90.599),
"pixelQA_3" = c(NA, 66.097,NA),
"pixelQA_4" = c(NA, NA, NA),
stringsAsFactors = FALSE)
> data1989
date NDVI_1 NDVI_3 NDVI_4 pixelQA_1 pixelQA_3 pixelQA_4
1 1987-01-01 NA NA NA NA NA NA
2 1987-01-03 0.589 0.447 NA 66.897 66.097 NA
3 1987-01-19 0.120 NA NA 90.599 NA NA
Columns are "paired" by their suffix, so NDVI_1 is paired with pixelQA_1, and so on. I want to modify the values in the NDVI columns depending on the "paired" values in the pixelQA columns, as follows:
if PixelQa is NA -> then NDVI should be also NA.
if Pixel Qa is 66±0.5 OR 130±0.5 -> then NDVI remains the same value.
if Pixel Qa is different to 66±0.5 OR 130±0.5 -> then NDVI value is set to NA (this is bad quality data which needs to be ignored).
Applying these very simple rules my data should look like:
data1989clean <- data.frame("date" = c("1987-01-01", "1987-01-03", "1987-01-19"),
"NDVI_1" = c(NA, NA, NA),
"NDVI_3" = c(NA, 0.447, NA),
"NDVI_4" = c(NA, NA, NA),
"pixelQA_1" = c(NA, 66.897,90.599),
"pixelQA_3" = c(NA, 66.097,NA),
"pixelQA_4" = c(NA, NA, NA),
stringsAsFactors = FALSE)
> data1989clean
date NDVI_1 NDVI_3 NDVI_4 pixelQA_1 pixelQA_3 pixelQA_4
1 1987-01-01 NA NA NA NA NA NA
2 1987-01-03 NA 0.447 NA 66.897 66.097 NA
3 1987-01-19 NA NA NA 90.599 NA NA
To reach my goal I am trying the following for loop:
for(i in 1:4){
data1989$NDVI_[i] <- ifelse(data1989$pixelQA_[i] < 66.5 & data1989$pixelQA_[i] > 65.5 |
data1989$pixelQA_[i] < 130.5 & data1989$pixelQA_[i] > 129.5,
data1989$NDVI_[i], NA)
}
But so far it is not working, as the dataset output looks exactly the same as the original one. Any suggestion will be welcomed.
As suggested by @George Savva, you can achieve this by pivoting longer, correcting the data, and pivoting back wider. So, using the tidyverse, that gives:
library(tidyverse)
newdd1 <-
  # start from the original data
  data1989 %>%
  # one row per date and suffix, with NDVI and pixelQA side by side
  pivot_longer(cols = -date,
               names_to = c(".value", "set"),
               names_sep = "_") %>%
  # keep NDVI only where pixelQA is within 66 +/- 0.5 or 130 +/- 0.5
  mutate(NDVI = case_when(is.na(pixelQA) ~ NA_real_,
                          between(pixelQA, 65.5, 66.5) ~ NDVI,
                          between(pixelQA, 129.5, 130.5) ~ NDVI,
                          TRUE ~ NA_real_)) %>%
  # back to the original wide layout (NDVI_1, ..., pixelQA_4)
  pivot_wider(names_from = set,
              values_from = c(NDVI, pixelQA))
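If you would rather keep a for loop, here is a minimal base R sketch. The loop in the question does not build column names from i: data1989$NDVI_[i] looks for a column literally named NDVI_ and then indexes its i-th element. Building the name with paste0() and using [[ ]] avoids that. The suffix vector c(1, 3, 4) is taken from the example data; the names ndvi_col, qa and keep are made up here.

```r
data1989 <- data.frame("date" = c("1987-01-01", "1987-01-03", "1987-01-19"),
                       "NDVI_1" = c(NA, 0.589, 0.120),
                       "NDVI_3" = c(NA, 0.447, NA),
                       "NDVI_4" = c(NA, NA, NA),
                       "pixelQA_1" = c(NA, 66.897, 90.599),
                       "pixelQA_3" = c(NA, 66.097, NA),
                       "pixelQA_4" = c(NA, NA, NA),
                       stringsAsFactors = FALSE)

for (i in c(1, 3, 4)) {
  # build the paired column names from the suffix
  ndvi_col <- paste0("NDVI_", i)
  qa <- data1989[[paste0("pixelQA_", i)]]

  # good quality: pixelQA present and within 66 +/- 0.5 or 130 +/- 0.5
  keep <- !is.na(qa) & (abs(qa - 66) <= 0.5 | abs(qa - 130) <= 0.5)

  # everything else is set to NA
  data1989[[ndvi_col]][!keep] <- NA
}
data1989
```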

Looping through select rows based on input of one or more columns

I have a nested list of items such that I have 3 separate lists grouped into one. I would like to make changes to a specific column that is present in all the lists. I have more details below
X
$`Manufacturing`
Stage Days.Added Start.Date End.Date
Planning 2 1968-12-01 NA
Building 14 NA NA
Testing 3 NA NA
Implementation 15 NA NA
$`Project Analysis`
Stage Days.Added Start.Date End.Date
Initial Review 3 1968-12-01 NA
Building 14 NA NA
User Testing 20 NA NA
Implementation 15 NA NA
User Review 7 NA NA
Final Analysis 4 NA NA
lapply(X, '[', 'End.Date') gives me:
$`Manufacturing`
End.Date
NA
NA
NA
NA
$`Project Analysis`
End.Date
NA
NA
NA
NA
NA
NA
I want to create a loop whereby the 'End.Date' column is the addition of the 'Start.Date' and the 'Days.Added' column for the first row. The resulting value would be the 'Start.Date' entry for the second row which would have the 'Days.Added' column added to produce the new 'End.Date' for the second row and so forth. So basically something like this:
$`Manufacturing`
Stage Days.Added Start.Date End.Date
Planning 2 1968-12-01 1968-12-03
Building 14 1968-12-03 1968-12-17
Testing 3 1968-12-17 1968-12-20
Implementation 15 1968-12-20 1969-01-04
$`Project Analysis`
Stage Days.Added Start.Date End.Date
Initial Review 3 1968-12-01 1968-12-04
Building 15 1968-12-04 1968-12-19
User Testing 20 1968-12-19 1969-01-08
Implementation 15 1969-01-08 1969-01-23
User Review 7 1969-01-23 1969-01-30
Final Analysis 4 1969-01-30 1969-02-03
How do I achieve this?
Assuming Start.Date is of class Date,
lapply(X, transform,
       Start.Date = Start.Date[1] + c(0, cumsum(Days.Added[-length(Days.Added)])),
       End.Date = Start.Date[1] + cumsum(Days.Added))
data
X <- list(Manufacturing = structure(list(Stage = c("Planning", "Building",
"Testing", "Implementation"), Days.Added = c(2L, 14L, 3L, 15L
), Start.Date = structure(c(-396, NA, NA, NA), class = "Date"),
End.Date = c(NA, NA, NA, NA)), row.names = c(NA, -4L), class = "data.frame"),
`Project Analysis` = structure(list(Stage = c("Initial Review",
"Building", "User Testing", "Implementation", "User Review",
"Final Analysis"), Days.Added = c(3L, 14L, 20L, 15L, 7L,
4L), Start.Date = structure(c(-396, NA, NA, NA, NA, NA), class = "Date"),
End.Date = c(NA, NA, NA, NA, NA, NA)), row.names = c(NA,
-6L), class = "data.frame"))
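To see why the cumulative sum gives the chained dates, here is the same calculation spelled out step by step on the Manufacturing table alone. It is only an illustration; plan, end_offset and start_offset are names made up for this sketch.

```r
plan <- data.frame(
  Stage = c("Planning", "Building", "Testing", "Implementation"),
  Days.Added = c(2L, 14L, 3L, 15L),
  Start.Date = as.Date(c("1968-12-01", NA, NA, NA)),
  End.Date = as.Date(NA),
  stringsAsFactors = FALSE
)

first_day <- plan$Start.Date[1]

# Days from the very first start date to the end of each stage: 2 16 19 34
end_offset <- cumsum(plan$Days.Added)

# Each stage starts where the previous one ended: 0 2 16 19
start_offset <- c(0, head(end_offset, -1))

plan$Start.Date <- first_day + start_offset
plan$End.Date <- first_day + end_offset
plan
```

Every End.Date equals the Start.Date of the next row, which is the chaining the question asks for.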

Transpose dplyr::tbl object

I am using src_postgres to connect and dplyr::tbl function to fetch data from redshift database. I have applied some filters and top function to it using the dplyr itself. Now my data looks as below:
riid day hour
<dbl> <chr> <chr>
1 5542. "THURSDAY " 12
2 5862. "FRIDAY " 15
3 5982. "TUESDAY " 15
4 6022. WEDNESDAY 16
My final output should be as below:
riid  MON  TUES  WED  THUR  FRI  SAT  SUN
5542                  12
5862                        15
5982       15
6022             16
I have tried spread. It throws the below error because of the class type:
Error in UseMethod("spread_") : no applicable method for 'spread_'
applied to an object of class "c('tbl_dbi', 'tbl_sql', 'tbl_lazy',
'tbl')"
Since this is a really big table, I do not want to use dataframe as it takes a longer time.
I was able to use as below:
df_mon <- df2 %>% filter(day == 'MONDAY') %>% mutate(MONDAY = hour) %>% select(riid,MONDAY)
df_tue <- df2 %>% filter(day == 'TUESDAY') %>% mutate(TUESDAY = hour) %>% select(riid,TUESDAY)
df_wed <- df2 %>% filter(day == 'WEDNESDAY') %>% mutate(WEDNESDAY = hour) %>% select(riid,WEDNESDAY)
df_thu <- df2 %>% filter(day == 'THURSDAY') %>% mutate(THURSDAY = hour) %>% select(riid,THURSDAY)
df_fri <- df2 %>% filter(day == 'FRIDAY') %>% mutate(FRIDAY = hour) %>% select(riid,FRIDAY)
Is it possible to write all above in one statement?
Any help to transpose this in a faster manner is really appreciated.
EDIT
Adding the dput of the tbl object:
structure(list(src = structure(list(con = <S4 object of class structure("PostgreSQLConnection", package = "RPostgreSQL")>,
disco = <environment>), .Names = c("con", "disco"), class = c("src_dbi",
"src_sql", "src")), ops = structure(list(name = "select", x = structure(list(
name = "filter", x = structure(list(name = "filter", x = structure(list(
name = "group_by", x = structure(list(x = structure("SELECT riid,day,hour,sum(weightage) AS score FROM\n (SELECT riid,day,hour,\n POWER(2,(cast(datediff (seconds,convert_timezone('UTC','PKT',SYSDATE),TO_DATE(TO_CHAR(event_captured_dt,'mm/dd/yyyy hh24:mi:ss'),'mm/dd/yyyy hh24:mi:ss')) as decimal) / cast(7862400 as decimal))) AS weightage\n FROM (\n SELECT riid,convert_timezone('GMT','PKT',event_captured_dt) AS EVENT_CAPTURED_DT,\n TO_CHAR(convert_timezone('GMT','PKT',event_captured_dt),'DAY') AS day,\n TO_CHAR(convert_timezone('GMT','PKT',event_captured_dt),'HH24') AS hour\n FROM Zameen_STO_DATA WHERE EVENT_CAPTURED_DT >= TO_DATE((sysdate -30),'yyyy-mm-dd') and LIST_ID = 4282\n )) group by riid,day,hour", class = c("sql",
"character")), vars = c("riid", "day", "hour", "score"
)), .Names = c("x", "vars"), class = c("op_base_remote",
"op_base", "op")), dots = structure(list(riid = riid,
day = day), .Names = c("riid", "day")), args = structure(list(
add = FALSE), .Names = "add")), .Names = c("name",
"x", "dots", "args"), class = c("op_group_by", "op_single",
"op")), dots = structure(list(~min_rank(desc(~score)) <=
1), .Names = ""), args = list()), .Names = c("name",
"x", "dots", "args"), class = c("op_filter", "op_single",
"op")), dots = structure(list(~row_number() == 1), .Names = ""),
args = list()), .Names = c("name", "x", "dots", "args"), class = c("op_filter",
"op_single", "op")), dots = structure(list(~riid, ~day, ~hour), class = "quosures", .Names = c("",
"", "")), args = list()), .Names = c("name", "x", "dots", "args"
), class = c("op_select", "op_single", "op"))), .Names = c("src",
"ops"), class = c("tbl_dbi", "tbl_sql", "tbl_lazy", "tbl"))
I think what you're looking for is the ability to run the tidyr::spread() function against a remote source, or database. I have a PR for dbplyr that attempts to implement that here: https://github.com/tidyverse/dbplyr/pull/72, you can try it out by using: devtools::install_github("tidyverse/dbplyr", ref = devtools::github_pull(72)).
Use dcast from reshape2 package
> data
# A tibble: 4 x 3
riid day hour
<dbl> <chr> <dbl>
1 1.00 TH 12.0
2 2.00 FR 15.0
3 3.00 TU 15.0
4 4.00 WE 16.0
> dcast(data, riid~day, value.var = "hour")
riid FR TH TU WE
1 1 NA 12 NA NA
2 2 15 NA NA NA
3 3 NA NA 15 NA
4 4 NA NA NA 16
Further if you want to remove NA, then
> z <- dcast(data, riid~day, value.var = "hour")
> z[is.na(z)] <- ""
> z
  riid FR TH TU WE
1    1    12
2    2 15
3    3       15
4    4          16
I tried to combine your multiple line attempts into one. Can you try this and let us know the outcome?
library(dplyr)
df %>%
rowwise() %>%
mutate(Mon = ifelse(day=='MONDAY', hour[day=='MONDAY'], NA),
Tue = ifelse(day=='TUESDAY', hour[day=='TUESDAY'], NA),
Wed = ifelse(day=='WEDNESDAY', hour[day=='WEDNESDAY'], NA),
Thu = ifelse(day=='THURSDAY', hour[day=='THURSDAY'], NA),
Fri = ifelse(day=='FRIDAY', hour[day=='FRIDAY'], NA),
Sat = ifelse(day=='SATURDAY', hour[day=='SATURDAY'], NA),
Sun = ifelse(day=='SUNDAY', hour[day=='SUNDAY'], NA)) %>%
select(-day, -hour)
Output is:
riid Mon Tue Wed Thu Fri Sat Sun
1 5542 NA NA NA 12 NA NA NA
2 5862 NA NA NA NA 15 NA NA
3 5982 NA 15 NA NA NA NA NA
4 6022 NA NA 16 NA NA NA NA
Sample data:
# A tibble: 4 x 3
riid day hour
* <dbl> <chr> <int>
1 5542 THURSDAY 12
2 5862 FRIDAY 15
3 5982 TUESDAY 15
4 6022 WEDNESDAY 16
Update:
Can you try the approach below, using data.table?
library(data.table)
dt <- setDT(df)[, c("Mon","Tue","Wed","Thu","Fri","Sat","Sun") :=
  list(ifelse(day=='MONDAY', hour, NA),
       ifelse(day=='TUESDAY', hour, NA),
       ifelse(day=='WEDNESDAY', hour, NA),
       ifelse(day=='THURSDAY', hour, NA),
       ifelse(day=='FRIDAY', hour, NA),
       ifelse(day=='SATURDAY', hour, NA),
       ifelse(day=='SUNDAY', hour, NA))][, !c("day","hour"), with = FALSE]
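For completeness, a small local sketch of the same reshaping with tidyr::pivot_wider(), the successor of spread(). It runs on an in-memory tibble rebuilt from the question, not on the remote table. Note that the day values in the question look padded with trailing spaces ("THURSDAY "), so an exact comparison such as day == 'THURSDAY' may not match until they are trimmed. Recent dbplyr versions also provide a pivot_wider() method for lazy tables, so a similar call may work before collecting; that has not been checked against Redshift here.

```r
library(dplyr)
library(tidyr)

# Local stand-in for the remote table; day names padded as in the question
df <- tibble(
  riid = c(5542, 5862, 5982, 6022),
  day  = c("THURSDAY ", "FRIDAY   ", "TUESDAY  ", "WEDNESDAY"),
  hour = c("12", "15", "15", "16")
)

wide <- df %>%
  mutate(day = trimws(day)) %>%                        # remove the padding
  pivot_wider(names_from = day, values_from = hour)    # one column per day

wide
```

Only the days present in the data become columns, and cells without a value are filled with NA.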
