map and mutate over a list of tbl_df in R

I am trying to map over a list of data frames in R but can't get it right. This is what I am trying:
lst %>%
map(~mutate(., NewColumn1 = .x$value*2,))
This fails with the error:
Error: Column NewColumn1 must be length 2 (the number of rows) or one, not 0
In addition: Warning message:
Unknown or uninitialised column: 'value'.
The data looks like:
[[9]]
# A tibble: 2 x 4
time ID Value out
<date> <chr> <dbl> <dbl>
1 2016-12-23 CAT1 790. 0
2 2016-12-27 CAT1 792. 1
[[10]]
# A tibble: 2 x 4
time ID Value out
<date> <chr> <dbl> <dbl>
1 2016-12-28 CAT1 785. 0
2 2016-12-29 CAT1 783. 0
DATA:
Data <- list(structure(list(time = structure(c(17136, 17137), class = "Date"),
ID = c("CAT1", "CAT1"), Value = c(747.919983, 750.5), out = c(0,
1)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
)), structure(list(time = structure(c(17140, 17141), class = "Date"),
ID = c("CAT1", "CAT1"), Value = c(762.52002, 759.109985),
out = c(1, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17142,
17143), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(771.190002,
776.419983), out = c(1, 1)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17144,
17147), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(789.289978,
789.27002), out = c(1, 1)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17148,
17149), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(796.099976,
797.070007), out = c(1, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17150,
17151), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(797.849976,
790.799988), out = c(1, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17154,
17155), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(794.200012,
796.419983), out = c(1, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17156,
17157), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(794.559998,
791.26001), out = c(0, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17158,
17162), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(789.909973,
791.549988), out = c(0, 1)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17163,
17164), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(785.049988,
782.789978), out = c(0, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")))

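Take a look at the warning Unknown or uninitialised column: 'value'., then at your code map(Data, ~mutate(., NewColumn1 = .x$value*2,)). The column name is Value, not value (case matters in R!). Because that column does not exist, .x$value returns NULL, and NULL*2 is a length-0 vector; that is why the error says NewColumn1 has length 0 instead of 2.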
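A minimal sketch of where the "not 0" comes from, using a hypothetical two-row tibble with the same capitalised Value column as Data. It only illustrates the tibble behaviour described above: `$` with a misspelled name warns and returns NULL, and arithmetic on NULL gives a length-0 vector.

```r
library(tibble)

# Hypothetical two-row tibble with the same (capitalised) column name as Data
df <- tibble(Value = c(789.909973, 791.549988))

# Lower-case `value` does not exist: tibble warns and `$` returns NULL
missing_col <- suppressWarnings(df$value)

# NULL * 2 is numeric(0), a length-0 vector, hence "not 0" in the mutate() error
doubled <- missing_col * 2

# The correctly spelled column works as expected
correct <- df$Value * 2
```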
Your syntax can also be cleaned up a bit. Try map(Data, ~mutate(., NewColumn1 = Value*2)). Inside a purrr lambda, . and .x both refer to the current element, but it's better to pick one and use it consistently. Inside mutate() you also don't need to subset the data frame: mutate(df, new_col = old_col) is enough; you don't need mutate(df, new_col = .$old_col).
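A small sketch of the corrected call, assuming a hypothetical two-element list shaped like Data (only the Value column kept). It uses .x consistently and refers to Value directly inside mutate(), as suggested above.

```r
library(dplyr)
library(purrr)

# Hypothetical two-element list shaped like Data (only the Value column kept)
Data_small <- list(
  tibble(Value = c(747.919983, 750.5)),
  tibble(Value = c(762.52002, 759.109985))
)

# .x is the current tibble; inside mutate(), Value is looked up as a column of .x
res <- map(Data_small, ~ mutate(.x, NewColumn1 = Value * 2))
```

If you later want a single data frame instead of a list, bind_rows(res) stacks the elements.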

Related

group_by and lag not working for date in long format

I have long-format historical data like the table below (unbalanced: not every name appears on every date). Because the data are released with a lag (on the next business day), I would like to record each row under the date on which it actually happened. I tried to use dplyr as follows:
dataframe<-dataframe%>%group_by(date)%>%mutate(cob=lag(date,n=1))
However, it just produces the same result as:
lag(date,1)
date      name  value
2023/1/2  a     X
2023/1/2  b     X
2023/1/2  c     X
2023/1/3  a     X
2023/1/3  b     X
2023/1/4  a     X
2023/1/4  b     X
2023/1/5  a     X
2023/1/5  b     X
2023/1/5  c     X
I thought about:
dataframe <- dataframe %>% group_by(name) %>% mutate(cob = lag(date, n = 1))
but it produces NA whenever a name has no earlier observation.
mutate(cob = date - 1)
does not take business days into account.
I just want to shift every date in dataframe$date back by one business day.
Below is part of the actual data (historical prices of Japanese treasury bills).
structure(list(date = c("2002-08-06", "2002-08-06", "2002-08-07",
"2002-08-07", "2002-08-09", "2002-08-09"), code = c(2870075L,
3000075L, 2870075L, 3000075L, 2870075L, 3000075L), due_date = c("2002-08-20",
"2002-09-10", "2002-08-20", "2002-09-10", "2002-08-20", "2002-09-10"
), ave_price = c(99.99, 99.99, 99.99, 99.99, 99.99, 99.99)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), groups = structure(list(
date = c("2002-08-06", "2002-08-07", "2002-08-09"), .rows = structure(list(
1:2, 3:4, 5:6), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE))
The expected outcome is as follows:
structure(list(date = c("2002-08-06", "2002-08-06", "2002-08-07",
"2002-08-07", "2002-08-09", "2002-08-09"), code = c(2870075L,
3000075L, 2870075L, 3000075L, 2870075L, 3000075L), due_date = c("2002-08-20",
"2002-09-10", "2002-08-20", "2002-09-10", "2002-08-20", "2002-09-10"
), ave_price = c(99.99, 99.99, 99.99, 99.99, 99.99, 99.99), cob = c(NA,
NA, "2002-08-06", "2002-08-06", "2002-08-07", "2002-08-07")), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), groups = structure(list(
date = c("2002-08-06", "2002-08-07", "2002-08-09"), .rows = structure(list(
1:2, 3:4, 5:6), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -3L), .drop = TRUE))
Thank you very much in advance.
If I understand correctly, you want the previous date recorded in your date column as cob. So, your Aug 9 rows would have the previously recorded date of Aug 7 in your cob column.
If so, you could try the following. Your example data is grouped, so I start with ungroup(). Then take the distinct dates and compute the lag (previous date) for each. Here, the dates Aug 6, 7, and 9 get cob values of NA, Aug 6, and Aug 7 respectively.
Then join back to the original data with right_join(). The final select() keeps the columns in the desired order.
I left date as is (it is currently a character vector, not a Date).
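library(tidyverse)

df %>%
  ungroup() %>%
  distinct(date) %>%                # one row per recorded date (assumes df is sorted by date)
  mutate(cob = lag(date)) %>%       # previous recorded date; NA for the first one
  right_join(df, by = "date") %>%   # attach cob to every original row
  select(date, code, due_date, ave_price, cob)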
Output
date code due_date ave_price cob
<chr> <int> <chr> <dbl> <chr>
1 2002-08-06 2870075 2002-08-20 100. NA
2 2002-08-06 3000075 2002-09-10 100. NA
3 2002-08-07 2870075 2002-08-20 100. 2002-08-06
4 2002-08-07 3000075 2002-09-10 100. 2002-08-06
5 2002-08-09 2870075 2002-08-20 100. 2002-08-07
6 2002-08-09 3000075 2002-09-10 100. 2002-08-07
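A self-contained sketch of the same idea, using the six rows from the question (due_date dropped for brevity). It adds arrange(date) so the result does not depend on the rows already being in date order, and joins from the original data with left_join() so its row order is kept. Note that cob here is the previous date present in the data, which matches the expected output; it is not computed from a holiday calendar.

```r
library(dplyr)

# Six rows from the question (due_date dropped); date kept as character
df <- tibble(
  date = c("2002-08-06", "2002-08-06", "2002-08-07",
           "2002-08-07", "2002-08-09", "2002-08-09"),
  code = c(2870075L, 3000075L, 2870075L, 3000075L, 2870075L, 3000075L),
  ave_price = rep(99.99, 6)
)

# Lookup table: each distinct recorded date paired with the previous recorded date.
# ISO "YYYY-MM-DD" strings sort chronologically, so arrange() works on characters.
prev_dates <- df %>%
  distinct(date) %>%
  arrange(date) %>%
  mutate(cob = lag(date))

# left_join() keeps every row of df in its original order
result <- df %>% left_join(prev_dates, by = "date")
```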

How to write a list of lists to Excel using imap and map

I have a list of lists and I would like to write it to multiple Excel files, where each file is named after a sublist and each sheet in a file is named after a data frame in that sublist. I tried to use imap and map to do this, but I am new to these functions and still confused about how to set them up correctly.
The sample list can be built using:
lst1<-list(`101-01-101` = list(Demographics = structure(list(SubjectID = c("Subject ID",
"101-01-101"), BRTHDTC = c("Birthday", "1953-07-07"), SEX = c("Gender",
"Female")), row.names = c(NA, -2L), class = c("tbl_df", "tbl",
"data.frame")), DiseaseStatus = structure(list(SubjectID = c("Subject ID",
"101-01-101"), DSDT = c("DS Date", "2016-03-14"), DSDT_P = c("DS Date Prob",
NA)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
))), `101-02-102` = list(Demographics = structure(list(SubjectID = c("Subject ID",
"101-02-102"), BRTHDTC = c("Birthday", "1963-07-02"), SEX = c("Gender",
"Female")), row.names = c(NA, -2L), class = c("tbl_df", "tbl",
"data.frame")), DiseaseStatus = structure(list(SubjectID = c("Subject ID",
"101-02-102"), DSDT = c("DS Date", "2017-04-04"), DSDT_P = c("DS Date Prob",
NA)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
))), `101-03-103` = list(Demographics = structure(list(SubjectID = c("Subject ID",
"101-03-103"), BRTHDTC = c("Birthday", "1940-09-11"), SEX = c("Gender",
"Male")), row.names = c(NA, -2L), class = c("tbl_df", "tbl",
"data.frame")), DiseaseStatus = structure(list(SubjectID = c("Subject ID",
"101-03-103"), DSDT = c("DS Date", NA), DSDT_P = c("DS Date Prob",
"UN-UNK-2015")), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"))), `101-04-104` = list(Demographics = structure(list(
SubjectID = c("Subject ID", "101-04-104"), BRTHDTC = c("Birthday",
"1955-12-31"), SEX = c("Gender", "Male")), row.names = c(NA,
-2L), class = c("tbl_df", "tbl", "data.frame")), DiseaseStatus = structure(list(
SubjectID = c("Subject ID", "101-04-104"), DSDT = c("DS Date",
"2016-05-02"), DSDT_P = c("DS Date Prob", NA)), row.names = c(NA,
-2L), class = c("tbl_df", "tbl", "data.frame"))), `104-05-201` = list(
Demographics = structure(list(SubjectID = c("Subject ID",
"104-05-201"), BRTHDTC = c("Birthday", "1950-12-04"), SEX = c("Gender",
"Female")), row.names = c(NA, -2L), class = c("tbl_df", "tbl",
"data.frame")), DiseaseStatus = structure(list(SubjectID = c("Subject ID",
"104-05-201"), DSDT = c("DS Date", "2018-07-06"), DSDT_P = c("DS Date Prob",
NA)), row.names = c(NA, -2L), class = c("tbl_df", "tbl",
"data.frame"))))
The code I have so far is:
lst1 %>% imap ( ~ wb = createWorkbook()
Map(function(data, nameofsheet){
addWorksheet(wb, nameofsheet)
writeData(wb, nameofsheet, data)
}, .x, names(.x))
saveWorkbook(wb, file.path("C:/Users/SQ/Documents/",
sprintf("subject_%s.xlsx", .y)))
)
My code doesn't work, but hopefully it gives you an idea of what I am trying to do. I need to do it this way because I want to add some formatting to the sheets. Could anyone give me some guidance on how to do this?
I probably did not use .x and .y correctly; I am still confused about what they refer to when working with a list.
Many thanks.
We just need a small tweak to the code:
When the lambda contains multiple expressions, wrap them in braces {} (and use <- for the assignment inside).
In imap, .x is the current sublist and .y is its name, so the data going into Map should be .x, not .y.
library(openxlsx)
library(purrr)
library(dplyr)

lst1 %>%
  imap(~ {
    # .x = one subject's sublist of data frames, .y = that subject's name
    wb <- createWorkbook()
    # one sheet per data frame, named after the data frame
    Map(function(data, nameofsheet) {
      addWorksheet(wb, nameofsheet)
      writeData(wb, nameofsheet, data)
    }, .x, names(.x))
    saveWorkbook(wb, file.path("C:/Users/SQ/Documents/",
                               sprintf("subject_%s.xlsx", .y)))
  })
-file content
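Because only the side effect (writing files) matters here, purrr's iwalk() can replace both imap() and Map(): like imap(), it passes each element as the first argument and its name as the second, but it returns its input invisibly instead of collecting results. The sketch below assumes a hypothetical two-subject list shaped like lst1 and writes to tempdir() rather than the original C:/Users/... folder. It relies on openxlsx workbooks being modified in place, which is also why the Map() version above works.

```r
library(openxlsx)
library(purrr)

# Hypothetical two-subject list with the same shape as lst1
lst_demo <- list(
  `101-01-101` = list(
    Demographics  = data.frame(SubjectID = "101-01-101", SEX = "Female"),
    DiseaseStatus = data.frame(SubjectID = "101-01-101", DSDT = "2016-03-14")
  ),
  `101-02-102` = list(
    Demographics  = data.frame(SubjectID = "101-02-102", SEX = "Female"),
    DiseaseStatus = data.frame(SubjectID = "101-02-102", DSDT = "2017-04-04")
  )
)

out_dir <- tempdir()  # assumption: any writable folder

# Outer iwalk(): sheets = one subject's sublist, subject = its name
iwalk(lst_demo, function(sheets, subject) {
  wb <- createWorkbook()
  # Inner iwalk(): df = one data frame, sheet = its name within the sublist
  iwalk(sheets, function(df, sheet) {
    addWorksheet(wb, sheet)
    writeData(wb, sheet, df)
  })
  saveWorkbook(wb, file.path(out_dir, sprintf("subject_%s.xlsx", subject)),
               overwrite = TRUE)
})

paths <- file.path(out_dir, sprintf("subject_%s.xlsx", names(lst_demo)))
```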

How to export a list of list to a set of excel files [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
I have a list of lists, lst2 (the code to build a sample of it is below).
I would like to export this list to Excel:
Each sublist goes into its own Excel file;
Each data frame in the sublist becomes a sheet in that workbook;
A filter is added on the 2nd row of each sheet.
For example, there will be a file called 101-01-101.xlsx containing two sheets, Demographics and DiseaseStatus, each with a filter on its 2nd row. This gives 5 Excel files in total.
How can I generate these files automatically instead of one by one? It seems that write.xlsx cannot add formatting to the output, so we probably have to use library(openxlsx). Does anyone have an idea how to handle this type of task?
The sample list can be built using:
lst2<-list(`101-01-101` = list(Demographics = structure(list(SubjectID = c("Subject ID",
"101-01-101"), BRTHDTC = c("Birthday", "1953-07-07"), SEX = c("Gender",
"Female")), row.names = c(NA, -2L), class = c("tbl_df", "tbl",
"data.frame")), DiseaseStatus = structure(list(SubjectID = c("Subject ID",
"101-01-101"), DSDT = c("DS Date", "2016-03-14"), DSDT_P = c("DS Date Prob",
NA)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
))), `101-02-102` = list(Demographics = structure(list(SubjectID = c("Subject ID",
"101-02-102"), BRTHDTC = c("Birthday", "1963-07-02"), SEX = c("Gender",
"Female")), row.names = c(NA, -2L), class = c("tbl_df", "tbl",
"data.frame")), DiseaseStatus = structure(list(SubjectID = c("Subject ID",
"101-02-102"), DSDT = c("DS Date", "2017-04-04"), DSDT_P = c("DS Date Prob",
NA)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
))), `101-03-103` = list(Demographics = structure(list(SubjectID = c("Subject ID",
"101-03-103"), BRTHDTC = c("Birthday", "1940-09-11"), SEX = c("Gender",
"Male")), row.names = c(NA, -2L), class = c("tbl_df", "tbl",
"data.frame")), DiseaseStatus = structure(list(SubjectID = c("Subject ID",
"101-03-103"), DSDT = c("DS Date", NA), DSDT_P = c("DS Date Prob",
"UN-UNK-2015")), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"))), `101-04-104` = list(Demographics = structure(list(
SubjectID = c("Subject ID", "101-04-104"), BRTHDTC = c("Birthday",
"1955-12-31"), SEX = c("Gender", "Male")), row.names = c(NA,
-2L), class = c("tbl_df", "tbl", "data.frame")), DiseaseStatus = structure(list(
SubjectID = c("Subject ID", "101-04-104"), DSDT = c("DS Date",
"2016-05-02"), DSDT_P = c("DS Date Prob", NA)), row.names = c(NA,
-2L), class = c("tbl_df", "tbl", "data.frame"))), `104-05-201` = list(
Demographics = structure(list(SubjectID = c("Subject ID",
"104-05-201"), BRTHDTC = c("Birthday", "1950-12-04"), SEX = c("Gender",
"Female")), row.names = c(NA, -2L), class = c("tbl_df", "tbl",
"data.frame")), DiseaseStatus = structure(list(SubjectID = c("Subject ID",
"104-05-201"), DSDT = c("DS Date", "2018-07-06"), DSDT_P = c("DS Date Prob",
NA)), row.names = c(NA, -2L), class = c("tbl_df", "tbl",
"data.frame"))))
I'm not sure what you mean by adding a filter to the second row, but I hope the following code helps. It uses the xlsx package (which needs Java via rJava).
saveFolder <- "someDirectory"
if (!dir.exists(saveFolder)) dir.create(saveFolder)

# one workbook per sublist, named after the sublist
xlsxNames <- paste0(names(lst2), ".xlsx")
xlsxPaths <- file.path(saveFolder, xlsxNames)

for (k in seq_along(lst2)) {
  print(paste0("Processing ", k, " of ", length(lst2)))
  currentSavePath <- xlsxPaths[k]
  tablesToSave <- lst2[[k]]
  sheetNames <- names(tablesToSave)
  for (i in seq_along(tablesToSave)) {
    if (!file.exists(currentSavePath)) {
      # File does not exist yet: create it by writing the first sheet
      xlsx::write.xlsx(x = tablesToSave[[i]], file = currentSavePath, sheetName = sheetNames[i])
    } else {
      # File already exists, so a sheet has been written: append the next one
      xlsx::write.xlsx(x = tablesToSave[[i]], file = currentSavePath, sheetName = sheetNames[i],
                       append = TRUE)
    }
  }
}
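For the filter requirement, openxlsx (which the question mentions) has addFilter(wb, sheet, rows, cols). A sketch under one assumption about what "filter on the 2nd row" means: writeData() puts the column names in row 1 and the first data row (the human-readable labels such as "Subject ID") in row 2, so the AutoFilter is placed on row 2. The one-subject list and the output folder are hypothetical.

```r
library(openxlsx)

# Hypothetical one-subject list shaped like lst2
lst_small <- list(
  `101-01-101` = list(
    Demographics  = data.frame(SubjectID = c("Subject ID", "101-01-101"),
                               SEX = c("Gender", "Female")),
    DiseaseStatus = data.frame(SubjectID = c("Subject ID", "101-01-101"),
                               DSDT = c("DS Date", "2016-03-14"))
  )
)

save_folder <- file.path(tempdir(), "subjects")  # assumption: any writable folder
dir.create(save_folder, showWarnings = FALSE)

for (subject in names(lst_small)) {
  wb <- createWorkbook()
  for (sheet in names(lst_small[[subject]])) {
    df <- lst_small[[subject]][[sheet]]
    addWorksheet(wb, sheet)
    writeData(wb, sheet, df)  # row 1: column names, row 2: label row
    # Assumed interpretation: AutoFilter whose header cells are in row 2
    addFilter(wb, sheet, rows = 2, cols = seq_len(ncol(df)))
  }
  # One file per sublist, e.g. 101-01-101.xlsx
  saveWorkbook(wb, file.path(save_folder, paste0(subject, ".xlsx")), overwrite = TRUE)
}
```

Unlike the xlsx loop above, each workbook is built in memory and saved once, so there is no need to check whether the file already exists.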

Why do I get Error in Error: Problem with `mutate()` input `medication_name`. x Result 1 must be a single string, not a character vector of length 2

I have a data set containing a list-column (table) of nested data frames.
age_pharma <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8), age_band = c("5_9",
"10_14", "15-19", "20-24", "5_9", "10_14", "15-19", "20-24"),
table = list(structure(list(med_name_one = c("Co-amoxiclav",
"doxycycline"), med_name_two = c(NA, "Gentamicin"), mg_one = c("411 mg",
"120 mg"), mg_two = c(NA, "11280 mg"), datetime = c("2020-01-03 10:08",
"2020-01-01 11:08"), date_time = structure(c(1578046080,
1577876880), tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-2L)), structure(list(med_name_one = c("Gentamicin", "Co-trimoxazole"
), med_name_two = c("Co-trimoxazole", NA), mg_one = c("11280 mg",
"8 mg"), mg_two = c("8 mg", NA), datetime = c("2020-01-02 19:08",
"2020-01-08 20:08"), date_time = structure(c(1577992080,
1578514080), tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-2L)), structure(list(med_name_one = "Gentamicin", med_name_two = NA_character_,
mg_one = "11280 mg", mg_two = NA_character_, datetime = "2020-01-02 19:08",
date_time = structure(1577992080, tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L)), structure(list(med_name_one = "Co-trimoxazole", med_name_two = NA_character_,
mg_one = "8 mg", mg_two = NA_character_, datetime = "2020-01-08 20:08",
date_time = structure(1578514080, tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L)), structure(list(med_name_one = "Sodium Chloride", med_name_two = NA_character_,
mg_one = "411 mg", mg_two = NA_character_, datetime = "2020-01-10 08:08",
date_time = structure(1578643680, tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L)), structure(list(med_name_one = "Piperacillin", med_name_two = NA_character_,
mg_one = "120 mg", mg_two = NA_character_, datetime = "2020-01-03 09:08",
date_time = structure(1578042480, tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L)), structure(list(med_name_one = character(0), med_name_two = character(0),
mg_one = character(0), mg_two = character(0), datetime = character(0),
date_time = structure(numeric(0), tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = integer(0)),
structure(list(med_name_one = character(0), med_name_two = character(0),
mg_one = character(0), mg_two = character(0), datetime = character(0),
date_time = structure(numeric(0), tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = integer(0)))), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame"))
I am trying to extract a variable from the nested data frames in the table column. The variable is called med_name_one.
get_medication_name <- function(medication_name_df) {
medication_name <- medication_name_df %>%
dplyr::group_by(id) %>%
dplyr::arrange(datetime) %>%
pull(med_name_one)
}
Here I am applying the function so that I get the med_name_one as a variable.
age_pharma <- mutate(medication_name = purrr::map(age_pharma, get_medication_name))
Yet I do not understand why I get this error:
Error: Problem with `mutate()` input `medication_name`.
x Result 1 must be a single string, not a character vector of length 2
ℹ Input `medication_name` is `purrr::map_chr(table, get_medication_name)`.
Run `rlang::last_error()` to see where the error occurred.
Can someone help me understand the error? Also how can I retrieve med_name_one?
Here's one option:
library(dplyr)

get_medication_name <- function(medication_name_df) {
  medication_name_df %>%
    dplyr::arrange(datetime) %>%
    # summarize() on an ungrouped table always returns exactly one row,
    # so this yields one value (NA for an empty table)
    dplyr::summarize(medname = dplyr::first(med_name_one)) %>%
    dplyr::pull(medname)
}

age_pharma %>% mutate(medication_name = purrr::map_chr(table, get_medication_name))
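map_chr() requires every call to return exactly one string. Your function returned all med_name_one values, so the first table (two rows) gave a character vector of length 2, which is what the error reports, and the empty tables would give length 0. The revised function returns only the earliest medication by datetime, and empty tables give NA instead of nothing. The group_by(id) step is also dropped, because the nested tables have no id column.
Then we apply map_chr() specifically to the table column, inside mutate().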
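If you need every med_name_one value rather than only the first, two options are sketched below on a hypothetical miniature of age_pharma (tables with 2, 1 and 0 rows). map() returns a list-column, so each row may hold a vector of any length; alternatively, collapsing the names with paste(collapse = ) gives exactly one string per row, which satisfies map_chr().

```r
library(dplyr)
library(purrr)

# Hypothetical miniature of age_pharma: list-column of tibbles with 2, 1 and 0 rows
age_small <- tibble(
  id = 1:3,
  table = list(
    tibble(med_name_one = c("Co-amoxiclav", "doxycycline"),
           datetime = c("2020-01-03 10:08", "2020-01-01 11:08")),
    tibble(med_name_one = "Gentamicin", datetime = "2020-01-02 19:08"),
    tibble(med_name_one = character(0), datetime = character(0))
  )
)

# Option 1: keep all names (sorted by datetime) in a list-column
with_all <- age_small %>%
  mutate(medication_names = map(table, ~ arrange(.x, datetime)$med_name_one))

# Option 2: one string per row, so map_chr() gets a length-1 result every time
with_collapsed <- age_small %>%
  mutate(medication_name = map_chr(
    table, ~ paste(arrange(.x, datetime)$med_name_one, collapse = "; ")
  ))
```

Note that with option 2 an empty table becomes "" (paste() of a length-0 vector) rather than NA.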

Build a dataframe of nested tibbles in R?

I have a couple of tibbles:
1:
structure(list(contacts = c(151, 2243, 4122, 6833, 76, 123)), .Names = "contacts", row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
2:
structure(list(image_names = c("/storage/emulated/0/Pictures/1.png",
"/storage/emulated/0/Pictures/10.png", "/storage/emulated/0/Pictures/2.png",
"/storage/emulated/0/Pictures/3.png", "/storage/emulated/0/Pictures/4.png",
"/storage/emulated/0/Pictures/5.png")), .Names = "image_names", row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
3:
structure(list(phone_number = c(22881, 74049, 74049, 22881, 22881,
22881), isInContact = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
callDuration = c(1, 0, 0, 71, 13, 54), Date = structure(c(17689,
17689, 17689, 17690, 17690, 17690), class = "Date"), Time = structure(c(76180,
77415, 84620, 27900, 28132, 29396), class = c("hms", "difftime"
), units = "secs")), .Names = c("phone_number", "isInContact",
"callDuration", "Date", "Time"), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
Consider also that for each set of these data frames I have an identifier, say a UUID.
I want to build one large data frame in which the identifier column holds the user's UUID and all other columns are nested tibbles:
UUID contacts images call_logs
123 <tibble> <tibble> <tibble>
456 <tibble> <tibble> <tibble>
Please advise how I can build such an object. I tried map_dfr without success.
We could wrap each tibble in list() to create a single-row tibble with list-columns:
tblN <- tibble(contacts = list(tbl1), images = list(tbl2),
call_logs = list(tbl3))
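It is not clear whether the same data should be repeated for different UUIDs. If so, you can stack copies of the row with bind_rows(), whose .id argument turns the list names into a UUID column: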
list(`123` = tblN, `456` = tblN) %>%
bind_rows(.id = 'UUID')
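When each user has different data, one way to get the map_dfr() approach from the question working is sketched below. It assumes the per-user tibbles are stored in a hypothetical named list whose names are the UUIDs; each call returns a one-row tibble whose cells are wrapped in list(), and .id = "UUID" turns the list names into the UUID column. map_dfr() is superseded in purrr 1.0.0 (in favour of map() followed by list_rbind()) but still available.

```r
library(dplyr)
library(purrr)

# Hypothetical per-user data: names are UUIDs, each element holds three tibbles
users <- list(
  `123` = list(contacts  = tibble(contacts = c(151, 2243)),
               images    = tibble(image_names = "/storage/emulated/0/Pictures/1.png"),
               call_logs = tibble(phone_number = 22881, callDuration = 1)),
  `456` = list(contacts  = tibble(contacts = 76),
               images    = tibble(image_names = character(0)),
               call_logs = tibble(phone_number = 74049, callDuration = 0))
)

# One row per user; list() makes each tibble a single cell of a list-column
nested <- map_dfr(users, ~ tibble(contacts  = list(.x$contacts),
                                  images    = list(.x$images),
                                  call_logs = list(.x$call_logs)),
                  .id = "UUID")
```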
