How to print milliseconds in dttm object - r

ex <- structure(list(rowid = 1:12, timestamp = structure(c(1505577931.8,
1505577931.8, 1505577931.8, 1509206767.39, 1509206767.39, 1511019574.47,
1511019574.47, 1511988378.544, 1511988378.544, 1511986281.239,
1511986281.239, 1512909143.7), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA,
-12L), class = c("tbl_df", "tbl", "data.frame"), .Names = c("rowid",
"timestamp"))
With the data above, how can I print milliseconds in timestamp column?

We can use format to print the milliseconds
format(ex$timestamp, "%Y-%m-%d %H:%M:%OS3")
#[1] "2017-09-16 16:05:31.799" "2017-09-16 16:05:31.799" "2017-09-16 16:05:31.799"
#[4] "2017-10-28 16:06:07.390" "2017-10-28 16:06:07.390" "2017-11-18 15:39:34.470"
#[7] "2017-11-18 15:39:34.470" "2017-11-29 20:46:18.543" "2017-11-29 20:46:18.543"
#[10] "2017-11-29 20:11:21.239" "2017-11-29 20:11:21.239" "2017-12-10 12:32:23.700"

Related

reshape data into multiple columns using pivot_longer

I am using pivot_longer to reshape my data from wide to long format into multiple value columns. I know there are related questions (Pivot_longer 6 columns to 3 columns or Tidy dataset with pivot_longer: Multiple columns into two columns), but I could not find a solution so far, probably because my two columns will be of different class, the first one being POSIXct and the second one is numeric.
Here is a minimal working example:
structure(list(compid = c("AT9130162999", "AT9090003478", "AT9070005375",
"AT9130048156"), iso2c = c("AT", "AT", "AT", "AT"), nace4 = c("7010",
"4211", "2452", "7010"), lastyear = c("2018", "2019", "2019",
"2019"), `Closing date
Last avail. yr` = structure(c(1546214400,
1577750400, 1585612800, 1577750400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Closing date
Year - 1` = structure(c(1514678400,
1546214400, 1553990400, 1546214400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Closing date
Year - 2` = structure(c(NA,
1514678400, 1522454400, 1514678400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Closing date
Year - 3` = structure(c(NA,
1483142400, 1490918400, 1483142400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Closing date
Year - 4` = structure(c(NA,
1451520000, 1459382400, 1451520000), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Closing date
Year - 5` = structure(c(NA,
1419984000, 1427760000, 1419984000), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Closing date
Year - 6` = structure(c(NA,
1388448000, 1396224000, 1388448000), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Closing date
Year - 7` = structure(c(NA,
1356912000, 1364688000, 1356912000), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Closing date
Year - 8` = structure(c(NA,
1325289600, 1333152000, 1325289600), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Closing date
Year - 9` = structure(c(NA,
1293753600, 1301529600, 1293753600), tzone = "UTC", class = c("POSIXct",
"POSIXt")), operatinginc_last = c(NA, 482813, -94300, NA), operatinginc_year1 = c(NA,
423482, 780400, NA), operatinginc_year2 = c(NA, 404694, 1210300,
NA), ebit_last = c(1060000, 482813, -94300, 351292), ebit_year1 = c(1501000,
423482, 780400, 331415), ebit_year2 = c(NA, 404694, 1210300,
305492), operatingrev_last = c(28463000, 15842418, 13009700,
11742884), operatingrev_year1 = c(NA, 13734462, 13146300, 10682889
), operatingrev_year2 = c(NA, 13734462, 13146300, 10682889)), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))
So far, I have tried this:
df_l <- df %>%
pivot_longer(., cols = -(starts_with(c("compid","iso2c","nace4","lastyear","Closing"))),
values_to = "value", values_drop_na=T, names_sep = "_", names_to = c("variable","year"))
But now I would also like to reshape all the columns that start with Closing. How do I do (preferably in one step with pivot_longer)?
The expected output should then include a variable, year and value column, but also a closingdate and date column:
compid iso2c nace4 lastyear `closingdate ~ `date ~`variable ~`year ~ `value
<chr> <chr> <chr> <chr> <dttm> <dttm> <dttm> <dttm>
1 AT913~ AT 7010 2018 `Closing date Last avail. yr` 2018-12-31 ebit last 28463000
2 AT913~ AT 7010 2018 `Closing date Year - 1` 2017-12-31 ebit year1 15362687
2 AT913~ AT 7010 2018 `Closing date Year - 1` 2016-12-31 ebit year2 404694
I have no clue how you would do that in one call to pivot_longer, because you have different variables with different schemes. And you ALSO want to pivot to longer the closing date variable. So here it is in two calls with some cleaning of the closing variable.
library(tidyverse)
df_l <- pivot_longer(df, cols = starts_with("Closing"),
values_to = "date", values_drop_na=T, names_to = c("closing")) %>%
pivot_longer(., cols = contains("_"),
values_to = "value", values_drop_na=T, names_sep = '_', names_to = c("variable",'year')) %>%
mutate(closing = str_remove_all(closing,'Closing date') %>%
str_remove_all(.,'[:cntrl:]') %>%
str_squish() %>%
str_trim())

Why do I get Error in Error: Problem with `mutate()` input `medication_name`. x Result 1 must be a single string, not a character vector of length 2

I have a data set with another with a list of a nested data.
age_pharma <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8), age_band = c("5_9",
"10_14", "15-19", "20-24", "5_9", "10_14", "15-19", "20-24"),
table = list(structure(list(med_name_one = c("Co-amoxiclav",
"doxycycline"), med_name_two = c(NA, "Gentamicin"), mg_one = c("411 mg",
"120 mg"), mg_two = c(NA, "11280 mg"), datetime = c("2020-01-03 10:08",
"2020-01-01 11:08"), date_time = structure(c(1578046080,
1577876880), tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-2L)), structure(list(med_name_one = c("Gentamicin", "Co-trimoxazole"
), med_name_two = c("Co-trimoxazole", NA), mg_one = c("11280 mg",
"8 mg"), mg_two = c("8 mg", NA), datetime = c("2020-01-02 19:08",
"2020-01-08 20:08"), date_time = structure(c(1577992080,
1578514080), tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-2L)), structure(list(med_name_one = "Gentamicin", med_name_two = NA_character_,
mg_one = "11280 mg", mg_two = NA_character_, datetime = "2020-01-02 19:08",
date_time = structure(1577992080, tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L)), structure(list(med_name_one = "Co-trimoxazole", med_name_two = NA_character_,
mg_one = "8 mg", mg_two = NA_character_, datetime = "2020-01-08 20:08",
date_time = structure(1578514080, tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L)), structure(list(med_name_one = "Sodium Chloride", med_name_two = NA_character_,
mg_one = "411 mg", mg_two = NA_character_, datetime = "2020-01-10 08:08",
date_time = structure(1578643680, tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L)), structure(list(med_name_one = "Piperacillin", med_name_two = NA_character_,
mg_one = "120 mg", mg_two = NA_character_, datetime = "2020-01-03 09:08",
date_time = structure(1578042480, tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L)), structure(list(med_name_one = character(0), med_name_two = character(0),
mg_one = character(0), mg_two = character(0), datetime = character(0),
date_time = structure(numeric(0), tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = integer(0)),
structure(list(med_name_one = character(0), med_name_two = character(0),
mg_one = character(0), mg_two = character(0), datetime = character(0),
date_time = structure(numeric(0), tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = integer(0)))), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame"))
I am trying to map a variable from the list (table). The variable is called med_name_one.
get_medication_name <- function(medication_name_df) {
medication_name <- medication_name_df %>%
dplyr::group_by(id) %>%
dplyr::arrange(datetime) %>%
pull(med_name_one)
}
Here I am applying the function so that I get the med_name_one as a variable.
age_pharma <- mutate(medication_name = purrr::map(age_pharma, get_medication_name))
Yet I do not know why I get this error?
Error: Problem with `mutate()` input `medication_name`.
x Result 1 must be a single string, not a character vector of length 2
ℹ Input `medication_name` is `purrr::map_chr(table, get_medication_name)`.
Run `rlang::last_error()` to see where the error occurred.
Can someone help me understand the error? Also how can I retrieve med_name_one?
Here's one option
get_medication_name <- function(medication_name_df) {
medication_name <- medication_name_df %>%
dplyr::arrange(datetime) %>%
dplyr::summarize(medname = first(med_name_one)) %>%
dplyr::pull(medname)
}
age_pharma %>% mutate(medication_name = purrr::map_chr(table, get_medication_name))
First we had to change the get_medication_name function to handle the case where there are no rows in the table column which is the case in your example.
Then we need to apply the map specifically to the table column.

How to extract time interval data from minute data in r

I am trying to extract rows at 5 minute intervals from 1 minute data. My data looks like this:
structure(list(Date = structure(c(1509408000, 1509408000, 1509408000,
1509408000, 1509408000, 1509408000), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Time = structure(c(-2209021500, -2209021560,
-2209021620, -2209021680, -2209021740, -2209021800), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), O = c(3674, 3675, 3674, 3675, 3675,
3675), H = c(3674, 3675, 3675, 3676, 3676, 3675), L = c(3673,
3674, 3674, 3674, 3675, 3675), C = c(3673, 3674, 3674, 3675,
3675, 3675)), row.names = c(NA, -6L), class = c("tbl_df", "tbl",
"data.frame"))
structure(list(Date = structure(c(1506902400, 1506902400, 1506902400,
1506902400, 1506902400, 1506902400), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Time = structure(c(-2209071300, -2209071360,
-2209071420, -2209071480, -2209071540, -2209071600), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), O = c(3450, 3451, 3451, 3452, 3450,
3449), H = c(3451, 3451, 3451, 3452, 3452, 3451), L = c(3448,
3449, 3449, 3450, 3450, 3449), C = c(3448, 3451, 3450, 3451,
3452, 3450)), row.names = c(NA, -6L), class = c("tbl_df", "tbl",
"data.frame"))
I have looked at:
Create a time interval of 15 minutes from minutely data in R?
How to subset and extract time series by time interval in row
but none do exactly what I want. Maybe I could use this:
substr(t,15,16)=="00".
but I'm not sure how to combine it with filter.
Desired Output: find rows at 30 minute intervals:
You can extract rows with a minute-mark ending in 0 or 5 with
df[substr(format(df$Time, '%M'), 2, 2) %in% c(0, 5),]
# or
df[as.numeric(format(df$Time, '%M')) %% 5 == 0,]
# or
df[grep('[0|5]$', format(df$Time, '%M')),]
With filter:
library(dplyr)
df %>%
filter(substr(format(df$Time, '%M'), 2, 2) %in% c(0, 5))
# or
df %>%
filter(as.numeric(format(df$Time, '%M')) %% 5 == 0)

rbind fails to bind datetime column

I am binding a number of data frames data frames and have noticed that I get weird values in one of the bindings. Datetime in second df is disturbed after binding, it is one hour less than in original df.
kk <- structure(list(date = structure(c(1499133600, 1499137200, 1499140800,
1499144400), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
temp = c(14.7, 14.6, 14.3, 14.2)), .Names = c("date", "temp"
), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
ff <- structure(list(date = structure(c(1499144400, 1499148000, 1499151600,
1499155200), class = c("POSIXct", "POSIXt"), tzone = ""), temp = 14:17), .Names = c("date",
"temp"), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
Calling functions from different packages give me same result:
dplyr:: bind_rows(kk, ff)
data.table::rbindlist(list(kk, ff))
rbind(kk,ff)
I do not get what is going on. Could it have something to do with date format?

aaply for data.table to find length of intersection of interval

I have data like this:
View(dose_merged)
SUBJECT_Blinded PACKID SACDPDAT SACRTDAT treatment_interval SD_SDAT SD_EDAT
1 1501301 10094 2012-05-26 2012-07-23 58 2012-01-03 2013-01-02
2 1601301 10555 2012-01-03 2012-01-31 28 2012-01-03 2013-01-0
With columns types in data table:
> mapply(class, dose_merged)
$SUBJECT_Blinded
[1] "numeric"
$PACKID
[1] "numeric"
$SACDPDAT
[1] "POSIXct" "POSIXt"
$SACRTDAT
[1] "POSIXct" "POSIXt"
$treatment_interval
[1] "Interval"
attr(,"package")
[1] "lubridate"
$SD_SDAT
[1] "POSIXct" "POSIXt"
$SD_EDAT
[1] "POSIXct" "POSIXt"
I want to determine the length of intersection of intervals: interval(SACDPDAT, SACRTDAT) and interval(SD_SDAT, SD_EDAT).
I am trying this:
dose_merged[,intersect1 := aaply(dose_merged, 1, function(x){intersect(interval(x[3],x[4]), interval(x[8],x[9]))})]
But then I get error message:
Error: error while computing 'x' when choosing method for 'intersect': Error in as.POSIXct.default(start) :
do not know how to convert 'start' to class “POSIXct”
The line
intersect(interval(x[3],x[4]), interval(x[8],x[9]))})
works for specified row x.
Any ideas what I am doing wrong ?
The first two rows of dput(dose_merge):
structure(list(SUBJECT_Blinded = c(1101001, 1101001), PACKID = c(10096,
10595), SACDPDAT = structure(c(1335304800, 1325545200), class = c("POSIXct",
"POSIXt"), tzone = ""), SACRTDAT = structure(c(1340316000, 1327964400
), class = c("POSIXct", "POSIXt"), tzone = ""), treatment_interval = structure(c(58,
28), class = structure("Interval", package = "lubridate")), TS_SDAT = structure(c(NA_real_,
NA_real_), class = c("POSIXct", "POSIXt"), tzone = ""), TS_EDAT = structure(c(NA_real_,
NA_real_), class = c("POSIXct", "POSIXt"), tzone = ""), SD_SDAT = structure(c(1325545200,
1325545200), class = c("POSIXct", "POSIXt"), tzone = ""), SD_EDAT = structure(c(1357081200,
1357081200), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("SUBJECT_Blinded",
"PACKID", "SACDPDAT", "SACRTDAT", "treatment_interval", "TS_SDAT",
"TS_EDAT", "SD_SDAT", "SD_EDAT"), sorted = "SUBJECT_Blinded", class = c("data.table",
"data.frame"), row.names = c(NA, -2L), .internal.selfref = <pointer: 0x0000000002f30788>)

Resources