Build a dataframe of nested tibbles in R? - r

I have a couple of tibbles:
1:
structure(list(contacts = c(151, 2243, 4122, 6833, 76, 123)), .Names = "contacts", row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
2:
structure(list(image_names = c("/storage/emulated/0/Pictures/1.png",
"/storage/emulated/0/Pictures/10.png", "/storage/emulated/0/Pictures/2.png",
"/storage/emulated/0/Pictures/3.png", "/storage/emulated/0/Pictures/4.png",
"/storage/emulated/0/Pictures/5.png")), .Names = "image_names", row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
3:
structure(list(phone_number = c(22881, 74049, 74049, 22881, 22881,
22881), isInContact = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
callDuration = c(1, 0, 0, 71, 13, 54), Date = structure(c(17689,
17689, 17689, 17690, 17690, 17690), class = "Date"), Time = structure(c(76180,
77415, 84620, 27900, 28132, 29396), class = c("hms", "difftime"
), units = "secs")), .Names = c("phone_number", "isInContact",
"callDuration", "Date", "Time"), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
For each set of these data frames I can get an identifier, say a UUID.
I want to build one large data frame in which the identifier column holds the user's UUID and every other column holds nested tibbles:
UUID contacts images call_logs
123 <tibble> <tibble> <tibble>
456 <tibble> <tibble> <tibble>
Please advise how I can build this. I have been trying to use map_dfr without luck.

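We could place the tibbles in a list to create a single row:
library(dplyr)
# tbl1, tbl2 and tbl3 are the three tibbles from the question;
# wrapping each in list() stores it as one cell of a list-column
tblN <- tibble(contacts = list(tbl1), images = list(tbl2),
               call_logs = list(tbl3))
It is not clear whether the same dataset should be replicated for different 'UUID's. If so, name each copy with its UUID and bind the rows:
list(`123` = tblN, `456` = tblN) %>%
  bind_rows(.id = 'UUID')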
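If each UUID has its own set of tibbles, the same idea can be applied per user. The following is only a sketch: `user_data` and its contents are made-up placeholders, standing in for however the per-user tibbles are actually collected. It uses `map_dfr`, which the question mentions, and which needs dplyr installed for the row binding.

```r
library(tibble)
library(purrr)

# Hypothetical input: a list named by UUID, each element holding that user's tibbles
user_data <- list(
  `123` = list(
    contacts  = tibble(contacts = c(151, 2243)),
    images    = tibble(image_names = "/storage/emulated/0/Pictures/1.png"),
    call_logs = tibble(phone_number = c(22881, 74049), callDuration = c(1, 0))
  ),
  `456` = list(
    contacts  = tibble(contacts = c(76, 123, 4122)),
    images    = tibble(image_names = character()),
    call_logs = tibble(phone_number = 22881, callDuration = 54)
  )
)

# Build a one-row tibble per user (each tibble wrapped in list() so it fills
# a single cell), then stack the rows; .id turns the list names into UUID
nested <- map_dfr(
  user_data,
  ~ tibble(
    contacts  = list(.x$contacts),
    images    = list(.x$images),
    call_logs = list(.x$call_logs)
  ),
  .id = "UUID"
)
```

Each nested tibble can then be retrieved with, for example, `nested$contacts[[2]]`.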

Related

Randomize one column values based on multiple other columns

I have the following df:
structure(list(Donorcode = c("406A001", "406A002", "406A003",
"406A004"), Doos = c(1, 1, 2, 2), `Leeftijd T0` = c(70, 73, 79,
75), Instituut = c("Spaarne ziekenhuis", "Spaarne ziekenhuis",
"Spaarne ziekenhuis", "Spaarne ziekenhuis"), Datum = structure(c(1567468800,
1567468800, 1567468800, 1567468800), class = c("POSIXct", "POSIXt"
), tzone = "UTC")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-4L))
I need to randomize the column 'Donorcode' based on the other 4 columns. No column 'weighs' more than another, so the order in which the columns are used to randomize the Donorcode column does not matter.
Is there a way to do this in R?
Many thanks!

Why do I get Error in Error: Problem with `mutate()` input `medication_name`. x Result 1 must be a single string, not a character vector of length 2

I have a data set that contains a list-column (table) of nested data frames.
age_pharma <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8), age_band = c("5_9",
"10_14", "15-19", "20-24", "5_9", "10_14", "15-19", "20-24"),
table = list(structure(list(med_name_one = c("Co-amoxiclav",
"doxycycline"), med_name_two = c(NA, "Gentamicin"), mg_one = c("411 mg",
"120 mg"), mg_two = c(NA, "11280 mg"), datetime = c("2020-01-03 10:08",
"2020-01-01 11:08"), date_time = structure(c(1578046080,
1577876880), tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-2L)), structure(list(med_name_one = c("Gentamicin", "Co-trimoxazole"
), med_name_two = c("Co-trimoxazole", NA), mg_one = c("11280 mg",
"8 mg"), mg_two = c("8 mg", NA), datetime = c("2020-01-02 19:08",
"2020-01-08 20:08"), date_time = structure(c(1577992080,
1578514080), tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-2L)), structure(list(med_name_one = "Gentamicin", med_name_two = NA_character_,
mg_one = "11280 mg", mg_two = NA_character_, datetime = "2020-01-02 19:08",
date_time = structure(1577992080, tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L)), structure(list(med_name_one = "Co-trimoxazole", med_name_two = NA_character_,
mg_one = "8 mg", mg_two = NA_character_, datetime = "2020-01-08 20:08",
date_time = structure(1578514080, tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L)), structure(list(med_name_one = "Sodium Chloride", med_name_two = NA_character_,
mg_one = "411 mg", mg_two = NA_character_, datetime = "2020-01-10 08:08",
date_time = structure(1578643680, tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L)), structure(list(med_name_one = "Piperacillin", med_name_two = NA_character_,
mg_one = "120 mg", mg_two = NA_character_, datetime = "2020-01-03 09:08",
date_time = structure(1578042480, tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L)), structure(list(med_name_one = character(0), med_name_two = character(0),
mg_one = character(0), mg_two = character(0), datetime = character(0),
date_time = structure(numeric(0), tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = integer(0)),
structure(list(med_name_one = character(0), med_name_two = character(0),
mg_one = character(0), mg_two = character(0), datetime = character(0),
date_time = structure(numeric(0), tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = integer(0)))), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame"))
I am trying to map a variable from the list (table). The variable is called med_name_one.
get_medication_name <- function(medication_name_df) {
medication_name <- medication_name_df %>%
dplyr::group_by(id) %>%
dplyr::arrange(datetime) %>%
pull(med_name_one)
}
Here I am applying the function so that I get the med_name_one as a variable.
age_pharma <- mutate(medication_name = purrr::map(age_pharma, get_medication_name))
Yet I get this error and do not know why:
Error: Problem with `mutate()` input `medication_name`.
x Result 1 must be a single string, not a character vector of length 2
ℹ Input `medication_name` is `purrr::map_chr(table, get_medication_name)`.
Run `rlang::last_error()` to see where the error occurred.
Can someone help me understand the error? Also how can I retrieve med_name_one?
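Here's one option:
get_medication_name <- function(medication_name_df) {
  medication_name_df %>%
    dplyr::arrange(datetime) %>%
    # first() yields one value per table (NA when the table has no rows)
    dplyr::summarize(medname = dplyr::first(med_name_one)) %>%
    dplyr::pull(medname)
}
age_pharma %>% dplyr::mutate(medication_name = purrr::map_chr(table, get_medication_name))
First, we had to change get_medication_name so that it returns exactly one string per nested table. map_chr() requires a length-one result, but pull(med_name_one) returns two names for the first table, which is what the error reports. Summarizing with first() also handles tables with no rows, which occur in your example. group_by(id) is dropped because id is not a column of the nested tables.
Second, we need to apply the map specifically to the table column rather than to the whole data frame.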
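To see the difference between the two mapping functions, here is a small sketch on made-up data (`toy` and `names_by_time` are illustrative names, not part of the question). `map()` returns a list, so a cell can hold any number of medication names, while `map_chr()` needs exactly one string per row.

```r
library(dplyr)
library(purrr)

# Toy data with the same shape as the question: two, one and zero nested rows
toy <- tibble(
  id = 1:3,
  table = list(
    tibble(med_name_one = c("Co-amoxiclav", "doxycycline"),
           datetime = c("2020-01-03 10:08", "2020-01-01 11:08")),
    tibble(med_name_one = "Gentamicin", datetime = "2020-01-02 19:08"),
    tibble(med_name_one = character(0), datetime = character(0))
  )
)

# Helper: all medication names of one nested table, ordered by datetime
names_by_time <- function(df) {
  df %>% arrange(datetime) %>% pull(med_name_one)
}

# map() keeps every name: the result is a list-column of character vectors
all_names <- toy %>%
  mutate(medication_name = map(table, names_by_time))

# map_chr() keeps one name per row: reduce each vector to its first element
first_name <- toy %>%
  mutate(medication_name = map_chr(table, ~ first(names_by_time(.x))))
```

If all names are needed rather than only the earliest one, the list-column from `map()` is probably the better fit.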

Filter a nested list of dataframes based on logical vector in R

I have a nested list of coordinates:
coords <- list(`41` = structure(list(lon = c(11.9170235974052, 11.9890348226944,11.9266305690725),
lat = c(48.0539406017157, 48.0618200883643,48.0734094557987)),
class = "data.frame", row.names = c(NA, -3L )),
`51` = structure(list(lon = c(11.9700157009047, 11.9661664366154,11.9111812165745),
lat = c(48.0524843177559, 48.0645786453912, 48.0623193233537)),
class = "data.frame", row.names = c(NA, -3L)),
`61` = structure(list(lon = c(11.9464237941416, 11.9536554768081,11.9112311461624),
lat = c(48.040970408282, 48.0408864989903, 48.0284615642167)),
class = "data.frame", row.names = c(NA, -3L )),
`71` = structure(list(lon = c(11.9274864543974, 11.8733675039864,11.933264512569),
lat = c(48.0135478382282, 48.0216452485664, 48.0289752363299)),
class = "data.frame", row.names = c(NA, -3L)),
`81` = structure(list(lon = c(11.8837173493491, 11.9072450330566,11.8943898749275),
lat = c(48.0266639859759, 48.0132853717376, 48.0327326995006)),
class = "data.frame", row.names = c(NA, -3L )),
`91` = structure(list(lon = c(11.882538477087, 11.8377742591454,11.8817027393128),
lat = c(48.0284081468982, 48.022864811514, 48.0229810559649)),
class = "data.frame", row.names = c(NA, -3L )))
I would like to filter this list based on a nested list of logical values:
index <- list(`41` = c(TRUE, TRUE, FALSE), `51` = c(FALSE, FALSE, TRUE
), `61` = c(FALSE, FALSE, FALSE), `71` = c(FALSE, FALSE, FALSE
), `81` = c(FALSE, FALSE, FALSE), `91` = c(FALSE, FALSE, FALSE))
What is the best approach to do so?
I tried to unlist the nested lists or to create a data.frame, but it did not work out.
Thank you!
You can use Map like this :
Map(function(x, y) x[y, ], coords, index)
#$`41`
# lon lat
#1 11.91702 48.05394
#2 11.98903 48.06182
#$`51`
# lon lat
#3 11.91118 48.06232
#$`61`
#[1] lon lat
#<0 rows> (or 0-length row.names)
#...
#...
In tidyverse :
library(purrr)
library(dplyr)
map2(coords, index, ~.x %>% filter(.y))
This answer works well to turn the lists into data frames. If the ordering is consistent, then I think this is what you need:
library(purrr)
# use solution to convert lists to dataframes, storing the names in id column
coords_df <- map_df(coords, ~as.data.frame(.x), .id="id")
index_df <- map_df(index, ~as.data.frame(.x), .id="id")
# filter coordinates on the values in index
coords_df[index_df$.x,]

map and mutate over a list of tbl_df

I am trying to map over a list of data frames in R but not getting it right. What I am trying is:
lst %>%
map(~mutate(., NewColumn1 = .x$value*2,))
With error:
Error: Column NewColumn1 must be length 2 (the number of rows) or
one, not 0 In addition: Warning message: Unknown or uninitialised
column: 'value'.
The data looks like:
[[9]]
# A tibble: 2 x 4
time ID Value out
<date> <chr> <dbl> <dbl>
1 2016-12-23 CAT1 790. 0
2 2016-12-27 CAT1 792. 1
[[10]]
# A tibble: 2 x 4
time ID Value out
<date> <chr> <dbl> <dbl>
1 2016-12-28 CAT1 785. 0
2 2016-12-29 CAT1 783. 0
DATA:
Data <- list(structure(list(time = structure(c(17136, 17137), class = "Date"),
ID = c("CAT1", "CAT1"), Value = c(747.919983, 750.5), out = c(0,
1)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
)), structure(list(time = structure(c(17140, 17141), class = "Date"),
ID = c("CAT1", "CAT1"), Value = c(762.52002, 759.109985),
out = c(1, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17142,
17143), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(771.190002,
776.419983), out = c(1, 1)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17144,
17147), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(789.289978,
789.27002), out = c(1, 1)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17148,
17149), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(796.099976,
797.070007), out = c(1, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17150,
17151), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(797.849976,
790.799988), out = c(1, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17154,
17155), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(794.200012,
796.419983), out = c(1, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17156,
17157), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(794.559998,
791.26001), out = c(0, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17158,
17162), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(789.909973,
791.549988), out = c(0, 1)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17163,
17164), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(785.049988,
782.789978), out = c(0, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")))
Take a look at the error message Unknown or uninitialised column: 'value'., then look at your code map(Data, ~mutate(., NewColumn1 = .x$value*2,)). The column name is Value and not value (case is important!).
Your syntax can also be cleaned up a bit. Try map(Data, ~mutate(., NewColumn1 = Value*2)). Technically, I think . and .x refer to the same thing, but it's better to be consistent. In mutate you also don't need to subset the data frame, i.e. mutate(df, new_col = old_col) is enough, you don't need mutate(df, new_col = .$old_col).
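A minimal runnable sketch of the corrected call, using a shortened stand-in for the question's list (`Data_small` is made up here to keep the example short):

```r
library(dplyr)
library(purrr)

# Two small tibbles standing in for the elements of Data
Data_small <- list(
  tibble(ID = c("CAT1", "CAT1"), Value = c(790, 792)),
  tibble(ID = c("CAT1", "CAT1"), Value = c(785, 783))
)

# Refer to the column by its exact name (Value, not value);
# inside mutate() no .x$ prefix is needed
doubled <- map(Data_small, ~ mutate(.x, NewColumn1 = Value * 2))
```

Each element of `doubled` keeps its original columns and gains `NewColumn1`.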

rbind fails to bind datetime column

I am binding a number of data frames and have noticed that I get weird values in one of the bindings. The datetime in the second df is changed after binding: it is one hour less than in the original df.
kk <- structure(list(date = structure(c(1499133600, 1499137200, 1499140800,
1499144400), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
temp = c(14.7, 14.6, 14.3, 14.2)), .Names = c("date", "temp"
), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
ff <- structure(list(date = structure(c(1499144400, 1499148000, 1499151600,
1499155200), class = c("POSIXct", "POSIXt"), tzone = ""), temp = 14:17), .Names = c("date",
"temp"), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
Calling functions from different packages gives me the same result:
dplyr::bind_rows(kk, ff)
data.table::rbindlist(list(kk, ff))
rbind(kk,ff)
I do not get what is going on. Could it have something to do with date format?
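One likely explanation, offered as a sketch rather than a confirmed diagnosis: the two date columns carry different tzone attributes ("UTC" in kk, "" meaning the local time zone in ff). A POSIXct column has a single tzone attribute, so after binding all values are printed in one zone, and rows that used to print in local time can appear shifted even though the stored instants are unchanged. The check below uses shortened copies of the question's data (`kk_small`, `ff_small`) and assumes dplyr is installed.

```r
library(dplyr)

# Shortened copies of kk and ff: same instants, different tzone attributes
kk_small <- tibble(
  date = structure(c(1499133600, 1499137200),
                   class = c("POSIXct", "POSIXt"), tzone = "UTC"),
  temp = c(14.7, 14.6)
)
ff_small <- tibble(
  date = structure(c(1499144400, 1499148000),
                   class = c("POSIXct", "POSIXt"), tzone = ""),
  temp = c(14, 15)
)

bound <- bind_rows(kk_small, ff_small)

# The underlying seconds since the epoch are the same before and after binding
same_instants <- isTRUE(all.equal(
  as.numeric(bound$date),
  c(as.numeric(kk_small$date), as.numeric(ff_small$date))
))

# The bound column has one tzone attribute, which controls how it is printed
bound_tz <- attr(bound$date, "tzone")

# To make the printed values comparable, give both inputs the same zone first
attr(ff_small$date, "tzone") <- "UTC"
```

If `same_instants` is TRUE, nothing was lost in the bind and only the display time zone differs.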
