rbind fails to bind datetime column

I am binding a number of data frames and have noticed that I get weird values in one of the bindings. The datetime column of the second data frame is changed after binding: it is one hour earlier than in the original data frame.
kk <- structure(list(date = structure(c(1499133600, 1499137200, 1499140800,
1499144400), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
temp = c(14.7, 14.6, 14.3, 14.2)), .Names = c("date", "temp"
), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
ff <- structure(list(date = structure(c(1499144400, 1499148000, 1499151600,
1499155200), class = c("POSIXct", "POSIXt"), tzone = ""), temp = 14:17), .Names = c("date",
"temp"), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
Calling functions from different packages gives me the same result:
dplyr::bind_rows(kk, ff)
data.table::rbindlist(list(kk, ff))
rbind(kk,ff)
I do not get what is going on. Could it have something to do with date format?
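One thing worth checking, offered as a sketch rather than a confirmed diagnosis: kk$date carries tzone = "UTC" while ff$date carries tzone = "" (the session's local time zone), and with rbind the combined column keeps the time zone attribute of the first data frame. If that is the cause, the underlying instants are unchanged and only the printed clock time shifts. The example below uses plain base R data frames and assumes, purely for illustration, that the local zone is "Europe/London".

```r
# Two shortened versions of kk and ff; "Europe/London" stands in for the
# unknown local zone of the original session (an assumption).
kk <- data.frame(
  date = structure(c(1499133600, 1499137200),
                   class = c("POSIXct", "POSIXt"), tzone = "UTC"),
  temp = c(14.7, 14.6)
)
ff <- data.frame(
  date = structure(c(1499144400, 1499148000),
                   class = c("POSIXct", "POSIXt"), tzone = "Europe/London"),
  temp = c(14, 15)
)

combined <- rbind(kk, ff)

# The combined column is labelled with the first data frame's zone ...
attr(combined$date, "tzone")

# ... but the stored seconds since the epoch are the same as before.
as.numeric(combined$date)

# To view the same instants in another zone, change only the attribute.
local_view <- combined$date
attr(local_view, "tzone") <- "Europe/London"
local_view
```

If the numeric values match the originals, nothing was lost in the binding and the difference is only in how the times are displayed.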

Related

Why do I get Error: Problem with `mutate()` input `medication_name`. x Result 1 must be a single string, not a character vector of length 2

I have a data set with a list column (table) that holds nested data frames.
age_pharma <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8), age_band = c("5_9",
"10_14", "15-19", "20-24", "5_9", "10_14", "15-19", "20-24"),
table = list(structure(list(med_name_one = c("Co-amoxiclav",
"doxycycline"), med_name_two = c(NA, "Gentamicin"), mg_one = c("411 mg",
"120 mg"), mg_two = c(NA, "11280 mg"), datetime = c("2020-01-03 10:08",
"2020-01-01 11:08"), date_time = structure(c(1578046080,
1577876880), tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-2L)), structure(list(med_name_one = c("Gentamicin", "Co-trimoxazole"
), med_name_two = c("Co-trimoxazole", NA), mg_one = c("11280 mg",
"8 mg"), mg_two = c("8 mg", NA), datetime = c("2020-01-02 19:08",
"2020-01-08 20:08"), date_time = structure(c(1577992080,
1578514080), tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-2L)), structure(list(med_name_one = "Gentamicin", med_name_two = NA_character_,
mg_one = "11280 mg", mg_two = NA_character_, datetime = "2020-01-02 19:08",
date_time = structure(1577992080, tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L)), structure(list(med_name_one = "Co-trimoxazole", med_name_two = NA_character_,
mg_one = "8 mg", mg_two = NA_character_, datetime = "2020-01-08 20:08",
date_time = structure(1578514080, tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L)), structure(list(med_name_one = "Sodium Chloride", med_name_two = NA_character_,
mg_one = "411 mg", mg_two = NA_character_, datetime = "2020-01-10 08:08",
date_time = structure(1578643680, tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L)), structure(list(med_name_one = "Piperacillin", med_name_two = NA_character_,
mg_one = "120 mg", mg_two = NA_character_, datetime = "2020-01-03 09:08",
date_time = structure(1578042480, tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L)), structure(list(med_name_one = character(0), med_name_two = character(0),
mg_one = character(0), mg_two = character(0), datetime = character(0),
date_time = structure(numeric(0), tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = integer(0)),
structure(list(med_name_one = character(0), med_name_two = character(0),
mg_one = character(0), mg_two = character(0), datetime = character(0),
date_time = structure(numeric(0), tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = integer(0)))), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame"))
I am trying to map a variable from the list (table). The variable is called med_name_one.
get_medication_name <- function(medication_name_df) {
medication_name <- medication_name_df %>%
dplyr::group_by(id) %>%
dplyr::arrange(datetime) %>%
pull(med_name_one)
}
Here I am applying the function so that I get the med_name_one as a variable.
age_pharma <- mutate(medication_name = purrr::map(age_pharma, get_medication_name))
Yet I do not know why I get this error?
Error: Problem with `mutate()` input `medication_name`.
x Result 1 must be a single string, not a character vector of length 2
ℹ Input `medication_name` is `purrr::map_chr(table, get_medication_name)`.
Run `rlang::last_error()` to see where the error occurred.
Can someone help me understand the error? Also how can I retrieve med_name_one?
Here's one option
get_medication_name <- function(medication_name_df) {
medication_name <- medication_name_df %>%
dplyr::arrange(datetime) %>%
dplyr::summarize(medname = first(med_name_one)) %>%
dplyr::pull(medname)
}
age_pharma %>% mutate(medication_name = purrr::map_chr(table, get_medication_name))
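First we had to change the get_medication_name function so that it returns exactly one string per nested table, which is what map_chr requires. The original returned every value of med_name_one (two for the first row), and it also has to handle the case where there are no rows in the table column, which is the case in your example.
Then we need to apply the map specifically to the table column rather than to the whole data frame.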
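The length-one requirement can be seen in isolation with a small sketch. The tables list below is made-up illustration data, not the age_pharma data, and the sketch takes the first value in row order without sorting by datetime as the answer does.

```r
library(purrr)

# Three hypothetical nested tables: two rows, one row, and no rows.
tables <- list(
  data.frame(med_name_one = c("Co-amoxiclav", "doxycycline"),
             stringsAsFactors = FALSE),
  data.frame(med_name_one = "Gentamicin", stringsAsFactors = FALSE),
  data.frame(med_name_one = character(0), stringsAsFactors = FALSE)
)

# map() returns a list, so each element may have any length.
all_names <- map(tables, function(df) df$med_name_one)
lengths(all_names)

# map_chr() needs exactly one string per element, so pick the first
# value and fall back to NA when the table is empty.
first_name <- map_chr(tables, function(df) {
  if (nrow(df) == 0) NA_character_ else df$med_name_one[1]
})
first_name
```

Keeping all the names per row is also possible by staying with map(), which leaves medication_name as a list column.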

map and mutate over a list of tbl_df

I am trying to map over a list of data frames in R but not getting it right. What I am trying is:
lst %>%
map(~mutate(., NewColumn1 = .x$value*2,))
With error:
Error: Column NewColumn1 must be length 2 (the number of rows) or
one, not 0 In addition: Warning message: Unknown or uninitialised
column: 'value'.
The data looks like:
[[9]]
# A tibble: 2 x 4
time ID Value out
<date> <chr> <dbl> <dbl>
1 2016-12-23 CAT1 790. 0
2 2016-12-27 CAT1 792. 1
[[10]]
# A tibble: 2 x 4
time ID Value out
<date> <chr> <dbl> <dbl>
1 2016-12-28 CAT1 785. 0
2 2016-12-29 CAT1 783. 0
DATA:
Data <- list(structure(list(time = structure(c(17136, 17137), class = "Date"),
ID = c("CAT1", "CAT1"), Value = c(747.919983, 750.5), out = c(0,
1)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
)), structure(list(time = structure(c(17140, 17141), class = "Date"),
ID = c("CAT1", "CAT1"), Value = c(762.52002, 759.109985),
out = c(1, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17142,
17143), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(771.190002,
776.419983), out = c(1, 1)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17144,
17147), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(789.289978,
789.27002), out = c(1, 1)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17148,
17149), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(796.099976,
797.070007), out = c(1, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17150,
17151), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(797.849976,
790.799988), out = c(1, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17154,
17155), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(794.200012,
796.419983), out = c(1, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17156,
17157), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(794.559998,
791.26001), out = c(0, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17158,
17162), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(789.909973,
791.549988), out = c(0, 1)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(time = structure(c(17163,
17164), class = "Date"), ID = c("CAT1", "CAT1"), Value = c(785.049988,
782.789978), out = c(0, 0)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame")))
Take a look at the error message Unknown or uninitialised column: 'value'., then look at your code map(Data, ~mutate(., NewColumn1 = .x$value*2,)). The column name is Value and not value (case is important!).
Your syntax can also be cleaned up a bit. Try map(Data, ~mutate(., NewColumn1 = Value*2)). Technically, I think . and .x refer to the same thing, but it's better to be consistent. In mutate you also don't need to subset the data frame, i.e. mutate(df, new_col = old_col) is enough, you don't need mutate(df, new_col = .$old_col).
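A minimal sketch of the cleaned-up call, using a shortened two-element list with rounded values in place of the full Data object (lst here is illustration data only):

```r
library(dplyr)
library(purrr)

# Shortened stand-in for the list of tibbles in the question.
lst <- list(
  tibble(ID = "CAT1", Value = c(790, 792)),
  tibble(ID = "CAT1", Value = c(785, 783))
)

# Refer to the column by its exact name (capital V) inside mutate();
# no need to subset with .x$ because mutate() looks up columns itself.
result <- map(lst, ~ mutate(.x, NewColumn1 = Value * 2))

result[[1]]
```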

Build a dataframe of nested tibbles in R?

I have a couple of tibbles:
1:
structure(list(contacts = c(151, 2243, 4122, 6833, 76, 123)), .Names = "contacts", row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
2:
structure(list(image_names = c("/storage/emulated/0/Pictures/1.png",
"/storage/emulated/0/Pictures/10.png", "/storage/emulated/0/Pictures/2.png",
"/storage/emulated/0/Pictures/3.png", "/storage/emulated/0/Pictures/4.png",
"/storage/emulated/0/Pictures/5.png")), .Names = "image_names", row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
3:
structure(list(phone_number = c(22881, 74049, 74049, 22881, 22881,
22881), isInContact = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
callDuration = c(1, 0, 0, 71, 13, 54), Date = structure(c(17689,
17689, 17689, 17690, 17690, 17690), class = "Date"), Time = structure(c(76180,
77415, 84620, 27900, 28132, 29396), class = c("hms", "difftime"
), units = "secs")), .Names = c("phone_number", "isInContact",
"callDuration", "Date", "Time"), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
And consider that for each set of these dataframes I can get an identifier, say UUID.
I want to build a large dataframe object where the identifier will be user's uuid and all other columns will be nested tibbles:
UUID contacts images call_logs
123 <tibble> <tibble> <tibble>
456 <tibble> <tibble> <tibble>
Please advise how can I build such thing, I am trying to use map_dfr without luck.
We could place the tibbles in a list to create a single row
tblN <- tibble(contacts = list(tbl1), images = list(tbl2),
call_logs = list(tbl3))
It is not clear whether the same dataset should be replicated or not for different 'UUID's.
list(`123` = tblN, `456` = tblN) %>%
bind_rows(.id = 'UUID')
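Put together as a self-contained sketch, with small made-up tibbles standing in for tbl1, tbl2 and tbl3 (the values and file names are placeholders, and the same row is reused for both UUIDs only because the question does not say how the per-user data differ):

```r
library(dplyr)

# Hypothetical stand-ins for the three tibbles in the question.
tbl1 <- tibble(contacts = c(151, 2243))
tbl2 <- tibble(image_names = c("a.png", "b.png"))
tbl3 <- tibble(phone_number = c(22881, 74049), callDuration = c(1, 0))

# Wrapping each tibble in list() makes it a single cell of a list column.
one_user <- tibble(
  contacts  = list(tbl1),
  images    = list(tbl2),
  call_logs = list(tbl3)
)

# The names of the outer list become the UUID column.
nested <- bind_rows(list(`123` = one_user, `456` = one_user), .id = "UUID")

nested
nested$call_logs[[1]]
```

With real data, each element of the outer list would be built from that user's own three tibbles.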

aaply for data.table to find length of intersection of interval

I have data like this:
View(dose_merged)
SUBJECT_Blinded PACKID SACDPDAT SACRTDAT treatment_interval SD_SDAT SD_EDAT
1 1501301 10094 2012-05-26 2012-07-23 58 2012-01-03 2013-01-02
2 1601301 10555 2012-01-03 2012-01-31 28 2012-01-03 2013-01-0
With columns types in data table:
> mapply(class, dose_merged)
$SUBJECT_Blinded
[1] "numeric"
$PACKID
[1] "numeric"
$SACDPDAT
[1] "POSIXct" "POSIXt"
$SACRTDAT
[1] "POSIXct" "POSIXt"
$treatment_interval
[1] "Interval"
attr(,"package")
[1] "lubridate"
$SD_SDAT
[1] "POSIXct" "POSIXt"
$SD_EDAT
[1] "POSIXct" "POSIXt"
I want to determine the length of intersection of intervals: interval(SACDPDAT, SACRTDAT) and interval(SD_SDAT, SD_EDAT).
I am trying this:
dose_merged[,intersect1 := aaply(dose_merged, 1, function(x){intersect(interval(x[3],x[4]), interval(x[8],x[9]))})]
But then I get error message:
Error: error while computing 'x' when choosing method for 'intersect': Error in as.POSIXct.default(start) :
do not know how to convert 'start' to class “POSIXct”
The line
intersect(interval(x[3],x[4]), interval(x[8],x[9]))
works for specified row x.
Any ideas what I am doing wrong ?
The first two rows of dput(dose_merged):
structure(list(SUBJECT_Blinded = c(1101001, 1101001), PACKID = c(10096,
10595), SACDPDAT = structure(c(1335304800, 1325545200), class = c("POSIXct",
"POSIXt"), tzone = ""), SACRTDAT = structure(c(1340316000, 1327964400
), class = c("POSIXct", "POSIXt"), tzone = ""), treatment_interval = structure(c(58,
28), class = structure("Interval", package = "lubridate")), TS_SDAT = structure(c(NA_real_,
NA_real_), class = c("POSIXct", "POSIXt"), tzone = ""), TS_EDAT = structure(c(NA_real_,
NA_real_), class = c("POSIXct", "POSIXt"), tzone = ""), SD_SDAT = structure(c(1325545200,
1325545200), class = c("POSIXct", "POSIXt"), tzone = ""), SD_EDAT = structure(c(1357081200,
1357081200), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("SUBJECT_Blinded",
"PACKID", "SACDPDAT", "SACRTDAT", "treatment_interval", "TS_SDAT",
"TS_EDAT", "SD_SDAT", "SD_EDAT"), sorted = "SUBJECT_Blinded", class = c("data.table",
"data.frame"), row.names = c(NA, -2L), .internal.selfref = <pointer: 0x0000000002f30788>)

how to convert data frame into json format in R

I have a data frame like this:
dput(head(y,20))
structure(list(DATETIME = structure(c(1369540800, 1369541700,
1369542600, 1369543500, 1369544400, 1369545300, 1369546200, 1369547100,
1369548000, 1369548900, 1369549800, 1369550700, 1369551600, 1369552500,
1369553400, 1369554300, 1369555200, 1369556100, 1369557000, 1369557900
), class = c("POSIXct", "POSIXt"), tzone = ""), CPU = c(14.84,
13.6333333333333, 14.7666666666667, 13.5333333333333, 17.8666666666667,
15.9333333333333, 14.2333333333333, 13.3, 10.8333333333333, 9.76666666666667,
8.93333333333333, 9.43333333333333, 10.2, 6.63333333333333, 13,
14.3, 15.3666666666667, 16.6666666666667, 17.8666666666667, 14.7
)), .Names = c("DATETIME", "CPU"), row.names = c(NA, 20L), class = "data.frame")
I would like to convert this data frame to json format as below:
library(RJSONIO)
data<-toJSON(y)
cat(data, file="data.json")
When I look at the data.json file, I only see the DATETIME header, not the CPU. What am I doing wrong here?
[{"DATETIME":[1369540800,1369541700,1369542600,1369543500,1369544400
