How to extract time interval data from minute data in R

I am trying to extract rows at 5 minute intervals from 1 minute data. My data looks like this:
structure(list(Date = structure(c(1509408000, 1509408000, 1509408000,
1509408000, 1509408000, 1509408000), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Time = structure(c(-2209021500, -2209021560,
-2209021620, -2209021680, -2209021740, -2209021800), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), O = c(3674, 3675, 3674, 3675, 3675,
3675), H = c(3674, 3675, 3675, 3676, 3676, 3675), L = c(3673,
3674, 3674, 3674, 3675, 3675), C = c(3673, 3674, 3674, 3675,
3675, 3675)), row.names = c(NA, -6L), class = c("tbl_df", "tbl",
"data.frame"))
structure(list(Date = structure(c(1506902400, 1506902400, 1506902400,
1506902400, 1506902400, 1506902400), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Time = structure(c(-2209071300, -2209071360,
-2209071420, -2209071480, -2209071540, -2209071600), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), O = c(3450, 3451, 3451, 3452, 3450,
3449), H = c(3451, 3451, 3451, 3452, 3452, 3451), L = c(3448,
3449, 3449, 3450, 3450, 3449), C = c(3448, 3451, 3450, 3451,
3452, 3450)), row.names = c(NA, -6L), class = c("tbl_df", "tbl",
"data.frame"))
I have looked at:
Create a time interval of 15 minutes from minutely data in R?
How to subset and extract time series by time interval in row
but none do exactly what I want. Maybe I could use this:
substr(t,15,16)=="00".
but I'm not sure how to combine it with filter.
Desired Output: find rows at 30 minute intervals:

You can extract rows with a minute-mark ending in 0 or 5 with
df[substr(format(df$Time, '%M'), 2, 2) %in% c(0, 5),]
# or
df[as.numeric(format(df$Time, '%M')) %% 5 == 0,]
# or
df[grep('[05]$', format(df$Time, '%M')),]
With filter:
library(dplyr)
df %>%
  filter(substr(format(Time, '%M'), 2, 2) %in% c(0, 5))
# or
df %>%
  filter(as.numeric(format(Time, '%M')) %% 5 == 0)
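The modulo approach above extends to other interval lengths, such as the 30-minute spacing mentioned in the question. The following is a minimal base R sketch, not part of the original answer: `keep_every` is a hypothetical helper name and `demo` is made-up one-minute data.

```r
# Hypothetical helper: keep rows whose minute-of-hour is a multiple of `by`.
keep_every <- function(df, by = 5, time_col = "Time") {
  minutes <- as.integer(format(df[[time_col]], "%M"))
  df[minutes %% by == 0, , drop = FALSE]
}

# Made-up one-minute data covering 09:00 to 10:00 UTC (61 rows)
demo <- data.frame(
  Time = seq(as.POSIXct("2017-10-31 09:00:00", tz = "UTC"),
             by = "1 min", length.out = 61),
  C = seq_len(61)
)

five   <- keep_every(demo, by = 5)   # rows at :00, :05, :10, ...
thirty <- keep_every(demo, by = 30)  # rows at :00 and :30
format(thirty$Time, "%H:%M")
```

Because the test uses only the minute-of-hour, it works for intervals that divide 60 evenly (5, 10, 15, 30); other spacings would need a comparison on the full timestamp.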

Related

Randomize one column values based on multiple other columns

I have the following df:
structure(list(Donorcode = c("406A001", "406A002", "406A003",
"406A004"), Doos = c(1, 1, 2, 2), `Leeftijd T0` = c(70, 73, 79,
75), Instituut = c("Spaarne ziekenhuis", "Spaarne ziekenhuis",
"Spaarne ziekenhuis", "Spaarne ziekenhuis"), Datum = structure(c(1567468800,
1567468800, 1567468800, 1567468800), class = c("POSIXct", "POSIXt"
), tzone = "UTC")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-4L))
I need to randomize the column 'Donorcode' based on the other four columns. No column 'weighs' more than another, so the order in which the columns are used to randomize Donorcode does not matter.
Is there a way to do this in R?
Many thanks!

reshape data into multiple columns using pivot_longer

I am using pivot_longer to reshape my data from wide to long format into multiple value columns. I know there are related questions (Pivot_longer 6 columns to 3 columns or Tidy dataset with pivot_longer: Multiple columns into two columns), but I could not find a solution so far, probably because my two columns will be of different class, the first one being POSIXct and the second one is numeric.
Here is a minimal working example:
structure(list(compid = c("AT9130162999", "AT9090003478", "AT9070005375",
"AT9130048156"), iso2c = c("AT", "AT", "AT", "AT"), nace4 = c("7010",
"4211", "2452", "7010"), lastyear = c("2018", "2019", "2019",
"2019"), `Closing date
Last avail. yr` = structure(c(1546214400,
1577750400, 1585612800, 1577750400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Closing date
Year - 1` = structure(c(1514678400,
1546214400, 1553990400, 1546214400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Closing date
Year - 2` = structure(c(NA,
1514678400, 1522454400, 1514678400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Closing date
Year - 3` = structure(c(NA,
1483142400, 1490918400, 1483142400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Closing date
Year - 4` = structure(c(NA,
1451520000, 1459382400, 1451520000), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Closing date
Year - 5` = structure(c(NA,
1419984000, 1427760000, 1419984000), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Closing date
Year - 6` = structure(c(NA,
1388448000, 1396224000, 1388448000), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Closing date
Year - 7` = structure(c(NA,
1356912000, 1364688000, 1356912000), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Closing date
Year - 8` = structure(c(NA,
1325289600, 1333152000, 1325289600), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Closing date
Year - 9` = structure(c(NA,
1293753600, 1301529600, 1293753600), tzone = "UTC", class = c("POSIXct",
"POSIXt")), operatinginc_last = c(NA, 482813, -94300, NA), operatinginc_year1 = c(NA,
423482, 780400, NA), operatinginc_year2 = c(NA, 404694, 1210300,
NA), ebit_last = c(1060000, 482813, -94300, 351292), ebit_year1 = c(1501000,
423482, 780400, 331415), ebit_year2 = c(NA, 404694, 1210300,
305492), operatingrev_last = c(28463000, 15842418, 13009700,
11742884), operatingrev_year1 = c(NA, 13734462, 13146300, 10682889
), operatingrev_year2 = c(NA, 13734462, 13146300, 10682889)), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))
So far, I have tried this:
df_l <- df %>%
pivot_longer(., cols = -(starts_with(c("compid","iso2c","nace4","lastyear","Closing"))),
values_to = "value", values_drop_na=T, names_sep = "_", names_to = c("variable","year"))
But now I would also like to reshape all the columns that start with Closing. How do I do that (preferably in one step with pivot_longer)?
The expected output should then include a variable, year and value column, but also a closingdate and date column:
  compid iso2c nace4 lastyear closingdate                   date       variable year  value
  <chr>  <chr> <chr> <chr>    <chr>                         <dttm>     <chr>    <chr> <dbl>
1 AT913~ AT    7010  2018     `Closing date Last avail. yr` 2018-12-31 ebit     last  28463000
2 AT913~ AT    7010  2018     `Closing date Year - 1`       2017-12-31 ebit     year1 15362687
2 AT913~ AT    7010  2018     `Closing date Year - 1`       2016-12-31 ebit     year2 404694
I don't see how you would do that in one call to pivot_longer, because the two sets of columns follow different naming schemes, and you also want to pivot the closing date columns to long format. So here it is in two calls, with some cleaning of the closing variable.
library(tidyverse)
df_l <- pivot_longer(df, cols = starts_with("Closing"),
                     values_to = "date", values_drop_na = TRUE, names_to = c("closing")) %>%
  pivot_longer(., cols = contains("_"),
               values_to = "value", values_drop_na = TRUE, names_sep = '_', names_to = c("variable", 'year')) %>%
  mutate(closing = str_remove_all(closing, 'Closing date') %>%
           str_remove_all(., '[:cntrl:]') %>%
           str_squish() %>%
           str_trim())
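To make the role of names_sep and names_to easier to see, here is a minimal sketch on made-up toy data (the `toy` data frame and its values are assumptions for illustration, not the asker's data). Column names of the form variable_year are split at the underscore into two key columns.

```r
library(tidyr)

# Made-up toy data following the same "<variable>_<year>" naming scheme
toy <- data.frame(
  compid = c("A", "B"),
  ebit_last = c(1060000, 482813),
  ebit_year1 = c(NA, 423482)
)

long <- pivot_longer(
  toy,
  cols = -compid,                       # pivot everything except the id
  names_to = c("variable", "year"),     # one output column per name piece
  names_sep = "_",                      # split column names at the underscore
  values_to = "value",
  values_drop_na = TRUE                 # drop rows whose value is NA
)
long
```

Each non-missing cell becomes one row, so company A contributes one row and company B two.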

Why do I get Error in Error: Problem with `mutate()` input `medication_name`. x Result 1 must be a single string, not a character vector of length 2

I have a data set with a list column that contains nested data frames.
age_pharma <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8), age_band = c("5_9",
"10_14", "15-19", "20-24", "5_9", "10_14", "15-19", "20-24"),
table = list(structure(list(med_name_one = c("Co-amoxiclav",
"doxycycline"), med_name_two = c(NA, "Gentamicin"), mg_one = c("411 mg",
"120 mg"), mg_two = c(NA, "11280 mg"), datetime = c("2020-01-03 10:08",
"2020-01-01 11:08"), date_time = structure(c(1578046080,
1577876880), tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-2L)), structure(list(med_name_one = c("Gentamicin", "Co-trimoxazole"
), med_name_two = c("Co-trimoxazole", NA), mg_one = c("11280 mg",
"8 mg"), mg_two = c("8 mg", NA), datetime = c("2020-01-02 19:08",
"2020-01-08 20:08"), date_time = structure(c(1577992080,
1578514080), tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-2L)), structure(list(med_name_one = "Gentamicin", med_name_two = NA_character_,
mg_one = "11280 mg", mg_two = NA_character_, datetime = "2020-01-02 19:08",
date_time = structure(1577992080, tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L)), structure(list(med_name_one = "Co-trimoxazole", med_name_two = NA_character_,
mg_one = "8 mg", mg_two = NA_character_, datetime = "2020-01-08 20:08",
date_time = structure(1578514080, tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L)), structure(list(med_name_one = "Sodium Chloride", med_name_two = NA_character_,
mg_one = "411 mg", mg_two = NA_character_, datetime = "2020-01-10 08:08",
date_time = structure(1578643680, tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L)), structure(list(med_name_one = "Piperacillin", med_name_two = NA_character_,
mg_one = "120 mg", mg_two = NA_character_, datetime = "2020-01-03 09:08",
date_time = structure(1578042480, tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L)), structure(list(med_name_one = character(0), med_name_two = character(0),
mg_one = character(0), mg_two = character(0), datetime = character(0),
date_time = structure(numeric(0), tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"), row.names = integer(0)),
structure(list(med_name_one = character(0), med_name_two = character(0),
mg_one = character(0), mg_two = character(0), datetime = character(0),
date_time = structure(numeric(0), tzone = "Europe/London", class = c("POSIXct",
"POSIXt"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = integer(0)))), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame"))
I am trying to map a variable from the list (table). The variable is called med_name_one.
get_medication_name <- function(medication_name_df) {
medication_name <- medication_name_df %>%
dplyr::group_by(id) %>%
dplyr::arrange(datetime) %>%
pull(med_name_one)
}
Here I am applying the function so that I get the med_name_one as a variable.
age_pharma <- mutate(medication_name = purrr::map(age_pharma, get_medication_name))
Yet I do not know why I get this error?
Error: Problem with `mutate()` input `medication_name`.
x Result 1 must be a single string, not a character vector of length 2
ℹ Input `medication_name` is `purrr::map_chr(table, get_medication_name)`.
Run `rlang::last_error()` to see where the error occurred.
Can someone help me understand the error? Also how can I retrieve med_name_one?
Here's one option
get_medication_name <- function(medication_name_df) {
  medication_name_df %>%
    dplyr::arrange(datetime) %>%
    # summarize() returns a single row, even when the table has no rows
    dplyr::summarize(medname = dplyr::first(med_name_one)) %>%
    dplyr::pull(medname)
}
age_pharma %>% dplyr::mutate(medication_name = purrr::map_chr(table, get_medication_name))
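The error occurs because map_chr() requires every call to return exactly one string, but pull(med_name_one) returns one value per row, so the first table (two rows) produces a character vector of length 2. The revised get_medication_name returns only the first medication name after sorting by datetime, and it also handles the case where a table has no rows, which occurs in your example.
The map also has to be applied specifically to the table column rather than to the whole data frame.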
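If all medication names are wanted rather than only the first, one possible variation is sketched below on made-up data. `nested` and `all_names` are hypothetical names, not from the question. The idea is that map() can hold results of any length in a list column, while map_chr() works once each result is collapsed to a single string.

```r
library(dplyr)
library(purrr)

# Made-up nested data: tables with two rows, one row, and no rows
nested <- tibble(
  id = 1:3,
  table = list(
    tibble(med_name_one = c("doxycycline", "Co-amoxiclav"),
           datetime = c("2020-01-01 11:08", "2020-01-03 10:08")),
    tibble(med_name_one = "Gentamicin", datetime = "2020-01-02 19:08"),
    tibble(med_name_one = character(0), datetime = character(0))
  )
)

# Hypothetical helper: every name in a table, in datetime order
all_names <- function(tbl) {
  tbl %>% arrange(datetime) %>% pull(med_name_one)
}

result <- nested %>%
  mutate(
    # map() returns a list, so each element may have any length
    med_names = map(table, all_names),
    # paste(collapse = ) always yields one string, which map_chr() accepts
    med_names_chr = map_chr(table, ~ paste(all_names(.x), collapse = "; "))
  )
```

The list column keeps the names as separate values; the collapsed string is easier to print but has to be split again before further processing.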

rbind fails to bind datetime column

I am binding a number of data frames and have noticed that I get weird values in one of the bindings. The datetime in the second data frame is altered after binding: it is one hour less than in the original data frame.
kk <- structure(list(date = structure(c(1499133600, 1499137200, 1499140800,
1499144400), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
temp = c(14.7, 14.6, 14.3, 14.2)), .Names = c("date", "temp"
), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
ff <- structure(list(date = structure(c(1499144400, 1499148000, 1499151600,
1499155200), class = c("POSIXct", "POSIXt"), tzone = ""), temp = 14:17), .Names = c("date",
"temp"), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
Calling functions from different packages gives me the same result:
dplyr::bind_rows(kk, ff)
data.table::rbindlist(list(kk, ff))
rbind(kk,ff)
I do not get what is going on. Could it have something to do with date format?
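One thing worth checking is the tzone attribute: kk$date carries tzone = "UTC", while ff$date carries tzone = "", which means the session's local time zone. A POSIXct value stores seconds since the epoch, and the time zone only affects how it is printed, so a bound column that is displayed in a single zone can look shifted even though the stored instants are unchanged. The sketch below illustrates this under the assumption that the local zone is one hour ahead of UTC; "Europe/London" stands in for that local zone and is not taken from the question.

```r
# The same instant stored twice, with different display time zones
a <- structure(1499144400, class = c("POSIXct", "POSIXt"), tzone = "UTC")
b <- structure(1499144400, class = c("POSIXct", "POSIXt"), tzone = "Europe/London")

# The underlying numbers are identical
same_instant <- as.numeric(a) == as.numeric(b)

# Only the printed clock time differs (London is on summer time in July)
shown_a <- format(a, "%H:%M")
shown_b <- format(b, "%H:%M")

# Giving both columns the same tzone before binding makes the display consistent
attr(b, "tzone") <- "UTC"
shown_b_utc <- format(b, "%H:%M")
```

Comparing as.numeric() of the date columns before and after binding would confirm whether the values themselves changed or only their display.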

Side by side plots for PerformanceAnalytics in R

The following script plots 2 charts side by side:
require(xts)
par(mfrow=c(1,2))
XTS1 <- structure(c(12, 7, 7, 22, 24, 30, 26, 23, 27, 30), .indexCLASS = c("POSIXct", "POSIXt"), .indexTZ = "", tclass = c("POSIXct", "POSIXt"), tzone = "", class = c("xts", "zoo"), .CLASS = structure("double", class = "CLASS"), formattable = structure(list(formatter = "formatC", format = structure(list(format = "f", digits = 2), .Names = c("format", "digits")), preproc = "percent_preproc", postproc = "percent_postproc"), .Names = c("formatter", "format", "preproc", "postproc")), index = structure(c(1413981900, 1413982800, 1413983700, 1413984600, 1413985500, 1413986400, 1413987300, 1413988200, 1413989100, 1413990000), tzone = "", tclass = c("POSIXct", "POSIXt")), .Dim = c(10L, 1L))
XTS2 <- XTS1 ^ 0.2
plot(XTS1)
plot(XTS2)
The following script fails to plot 2 charts side by side:
require(PerformanceAnalytics)
require(xts)
par(mfrow=c(1,2))
XTS1 <- structure(c(12, 7, 7, 22, 24, 30, 26, 23, 27, 30), .indexCLASS = c("POSIXct", "POSIXt"), .indexTZ = "", tclass = c("POSIXct", "POSIXt"), tzone = "", class = c("xts", "zoo"), .CLASS = structure("double", class = "CLASS"), formattable = structure(list(formatter = "formatC", format = structure(list(format = "f", digits = 2), .Names = c("format", "digits")), preproc = "percent_preproc", postproc = "percent_postproc"), .Names = c("formatter", "format", "preproc", "postproc")), index = structure(c(1413981900, 1413982800, 1413983700, 1413984600, 1413985500, 1413986400, 1413987300, 1413988200, 1413989100, 1413990000), tzone = "", tclass = c("POSIXct", "POSIXt")), .Dim = c(10L, 1L))
XTS2 <- XTS1 ^ 0.2
charts.PerformanceSummary(XTS1)
charts.PerformanceSummary(XTS2)
Would anyone know how to get the latter script to plot 2 charts side by side?
I would like to avoid using another package if possible. Thanks.
charts.PerformanceSummary is really just a wrapper around several charts.
You could draw those charts yourself, and extend this horizontally to any number of symbols (more than 2 if you wanted):
par(mfrow = c(3, 2))
# first row
chart.CumReturns(XTS1, ylab = "Cumulative Return", main = "give me a title")
chart.CumReturns(XTS2, ylab = "Cumulative Return", main = "give me a title2")
# second row
chart.BarVaR(XTS1)
chart.BarVaR(XTS2)
# third row
chart.Drawdown(XTS1, main = "DD title", ylab = "Drawdown")
chart.Drawdown(XTS2, main = "", ylab = "Drawdown")
You need to add the appropriate parameters to each plot for things like colour and titles (leaving that to you), but you have the flexibility of adding any charts from the wonderful xts, quantmod, PerformanceAnalytics packages (and others).
