How to NOT write_csv if data frame is empty - r

I have a data frame that is gathered every day via a SQL query. Sometimes it has rows in it, sometimes it doesn't. I then write_csv it to a OneDrive location, which triggers an automated email.
The data frame and code look like this, if relevant:
df<-structure(list(PROTOCOL_ID = numeric(0), PROTOCOL_NO = character(0),
STATUS = character(0), STATUS_DATE = structure(numeric(0), tzone = "", class = c("POSIXct",
"POSIXt")), PROCESSED_FLAG = character(0), INITIATOR_CODE = numeric(0),
CHANGE_REASON_CODE = numeric(0), PR_STATUS_ID = numeric(0),
COMMENTS = character(0), CREATED_DATE = structure(numeric(0), tzone = "", class = c("POSIXct",
"POSIXt")), CREATED_USER = character(0), MODIFIED_DATE = structure(numeric(0), tzone = "", class = c("POSIXct",
"POSIXt")), MODIFIED_USER = character(0), OUTCOME_ID = numeric(0),
IRB_NO = character(0), NCT_NUMBER = character(0), PI_NAMES = character(0)), row.names = integer(0), class = "data.frame")
write_csv(df, "df.csv")
If the dataframe has zero rows that day, I'd rather it DIDN'T write the csv. I'm sure I could figure out a step that deletes the data frame if empty and then the write_csv line would error, but I'd rather not do that. Is there an easy way to 'turn off' the write?

You can add a condition so that the CSV is only written when the number of rows is greater than 0:
if(nrow(df) > 0) readr::write_csv(df, "df.csv")
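If the same check is needed in several scripts, it can be wrapped in a small function. This is only a sketch: the name write_if_nonempty is hypothetical, and base R's write.csv stands in for readr::write_csv so that the example runs without extra packages.

```r
# Hypothetical helper: write the file only when the data frame has rows.
# Returns TRUE (invisibly) if a file was written, FALSE otherwise.
write_if_nonempty <- function(df, path) {
  if (nrow(df) == 0) {
    return(invisible(FALSE))
  }
  utils::write.csv(df, path, row.names = FALSE)
  invisible(TRUE)
}

# Toy data frames (illustrative only)
empty_df <- data.frame(PROTOCOL_ID = numeric(0))
full_df <- data.frame(PROTOCOL_ID = c(1, 2))

empty_path <- tempfile(fileext = ".csv")
full_path <- tempfile(fileext = ".csv")

wrote_empty <- write_if_nonempty(empty_df, empty_path) # no file is created
wrote_full <- write_if_nonempty(full_df, full_path)    # file is created
```

Because nothing is written for an empty data frame, the OneDrive location is left untouched and the email trigger should not fire.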


How can I transform data from tidy to a unique format?

I have a dataset in which there are dates describing a time period of interest, as well as events ("Tests" in my toy example) that can fall inside or outside the period of interest. The events also have a time and some dichotomous characteristics.
My collaborator has asked me to transform the data from this format:
structure(list(ID = c(1, 1, 2, 3), StartDate = structure(c(315878400,
315878400, 357696000, 323481600), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), EndDate = structure(c(316137600, 316310400,
357955200, 323654400), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
TestDateTime = structure(c(316135500, 315797700, 357923700,
323422560), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
TestName = c("Test1", "Test2", "Test1", "Test3"), Characteristic = c("Fast",
"Slow", "Fast", "Slow")), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"))
[image: current state]
to this format:
[image: desired state]
I am unsure how to accomplish this transformation or set of transformations using R, but I believe it is possible.
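Try the following:
library(dplyr)
data %>%
  # Assumption: the posted data has no TestDate/TestTime columns, so derive them from TestDateTime
  mutate(TestDate = as.Date(TestDateTime), TestTime = format(TestDateTime, "%H:%M")) %>%
  select(-c(StartDate, EndDate, TestDateTime)) %>% # Remove extra columns
  tidyr::spread(TestDate, TestTime) %>% # Spread to wide form: one column per test date
  select(-Characteristic, everything()) %>% # Move Characteristic to the end of the df
  group_by(ID) %>% # Group by ID and
  group_split() # split it into one tibble per ID
Note that the date columns of the final result do not exactly match the desired state.
Hope this helps.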

R loop doesn't work while single command works

I am trying to convert many time series xts objects to tibbles, and the for loop I wrote does not work properly. I don't know why.
This does not only happen with this particular task, but with other tasks I perform as well. I have a vector called "code", which contains the names of all the xts objects I want to convert.
code <- c('ABT','BA','CL','ROK')
for (i in code)
{
i <- tk_tbl(i, preserve_index = TRUE, rename_index = "index",
timetk_idx = FALSE, silent = FALSE)
}
What is strange is that if I run a single call without the loop, it works beautifully and converts the xts "ABT" to a tibble "ABT":
ABT <- tk_tbl(ABT, preserve_index = TRUE, rename_index = "index",
timetk_idx = FALSE, silent = FALSE)
The message I get from the loop is:
Warning: No index to preserve. Object otherwise converted to tibble
successfully.
38: In tk_tbl.data.frame(as.data.frame(data), preserve_index, ... :
Edit:
tk_tbl is a function from the package timetk; its documentation says it will "Coerce time-series objects to tibble."
And code is a character vector containing the object names.
library(timetk)
code <- c('ABT','BA','CL','ROK')
> dput(head(ROK))
structure(c(8.14062, 8.15625, 8.03125, 7.78125, 7.6875, 7.71875,
8.25, 8.15625, 8.125, 7.90625, 7.71875, 7.75, 8.03125, 8.125,
7.90625, 7.65625, 7.625, 7.65625, 8.1875, 8.125, 7.90625, 7.71875,
7.65625, 7.6875, 109600, 80800, 138400, 151600, 96800, 258800,
0.684505, 0.67928, 0.660992, 0.645316, 0.640091, 0.642704),
class=c("xts", "zoo"), .indexCLASS = "Date", tclass = "Date",
.indexTZ = "UTC", tzone = "UTC", src = "yahoo",
updated = structure(1558826745.23035, class = c("POSIXct","POSIXt")),
index = structure(c(378604800, 378950400, 379036800,
379123200, 379209600, 379296000), tzone = "UTC", tclass = "Date"),
.Dim = c(6L, 6L), .Dimnames = list(NULL, c("ROK.Open", "ROK.High",
"ROK.Low", "ROK.Close", "ROK.Volume", "ROK.Adjusted")))
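It looks to me like you expect <- to do what assign() does. Inside the loop, i is only a character string such as "ABT", so i <- ... rebinds i rather than ABT, and tk_tbl(i) receives the name instead of the xts object (hence the "No index to preserve" warning).
I think you get your expected result when you change your loop to the following, using get() to fetch the object by name and assign() to store the result under that name:
for (i in code) {
  assign(i, tk_tbl(get(i), preserve_index = TRUE, rename_index = "index", timetk_idx = FALSE, silent = FALSE))
}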
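The name-versus-object distinction can be seen without timetk. In this sketch the numeric vectors and the function convert_one are hypothetical stand-ins for the xts objects and for tk_tbl. It also shows an alternative that collects the results in a named list via mget() instead of overwriting global variables.

```r
# Hypothetical stand-ins for the xts objects
ABT <- c(1.5, 2.5)
BA <- c(3, 4)
code <- c("ABT", "BA")

# Placeholder for the real conversion step (e.g. tk_tbl)
convert_one <- function(x) data.frame(value = x)

# Alternative: fetch all objects by name and keep the results in a named list
converted <- lapply(mget(code), convert_one)

# assign()/get() version: overwrite each object under its own name
for (nm in code) {
  assign(nm, convert_one(get(nm)))
}
```

After this runs, converted$ABT and ABT hold the same data frame. The list version is often easier to work with afterwards because the results can be iterated over directly.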

Change data to numeric type to determine which distribution fits better

I am trying to figure out which distribution best fits logarithmic stock returns. Here is my code:
library(TTR)
sign="^GSPC"
start=19900101
end=20160101
x <- getYahooData(sign, start = start, end = end, freq = "daily")
x$logret <- log(x$Close) - lag(log(x$Close))
x=x[,6]
I want to use the function descdist(x, discrete = FALSE), which I got from this amazing post: https://stats.stackexchange.com/questions/132652/how-to-determine-which-distribution-fits-my-data-best
Nonetheless, R gives me this error:
Error in descdist(x, discrete = FALSE) : data must be a numeric vector
How do I transform my data into a numeric vector?
The output from dput(head(x)) is:
structure(c(NA, -0.00258888580664607, -0.00865029791190164, -0.00980414107803274,
0.00450431207515223, -0.011856706127011), class = c("xts", "zoo"
), .indexCLASS = "Date", .indexTZ = "UTC", tclass = "Date", tzone = "UTC", index = structure(c(631238400,
631324800, 631411200, 631497600, 631756800, 631843200), tzone = "UTC", tclass = "Date"), .Dim = c(6L,
1L), .Dimnames = list(NULL, "logret"))
Pre-process x using as.numeric(na.omit(x)), or simply run
descdist(as.numeric(na.omit(x)), discrete = FALSE)
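To see why this helps: an xts object is a matrix with extra attributes, not a plain vector. The sketch below uses an ordinary one-column matrix as a stand-in for x (an assumption, so that it runs without xts), with values rounded from the dput output above.

```r
# Stand-in for the one-column xts object: a matrix with a leading NA
x <- matrix(c(NA, -0.0026, -0.0087, -0.0098, 0.0045, -0.0119),
            ncol = 1, dimnames = list(NULL, "logret"))

is_vec_before <- is.vector(x, mode = "numeric") # FALSE: x has a dim attribute

# na.omit() drops the NA row; as.numeric() strips dim and other attributes
v <- as.numeric(na.omit(x))

is_vec_after <- is.vector(v, mode = "numeric")  # TRUE: now a plain numeric vector
```

The resulting v has five elements and no missing values, which is the form the error message asks for.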

date format change with DT and shiny

My problem is that formatDate gives different output when I use datatable on my computer and on the server.
I know I'm using method = 'toLocaleDateString'; maybe it's not the right method.
On my computer it gives me the format I want:
1 février 2000
21 mars 2000
On Shiny it gives me:
01/02/2000
21/03/2000
Both the local computer and the server report the same Sys.timezone():
[1] "Europe/Paris"
I'm trying to do it like this:
a <-structure(list(timestamp = structure(c(949363200, 953596800,
961286400, 962582400, 965347200, 969667200),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
anoms = c(1, 1, 1, 1, 1, 2), syndrome = c("Acrosyndrome",
"Acrosyndrome", "Acrosyndrome", "Acrosyndrome", "Acrosyndrome",
"Acrosyndrome")), .Names = c("timestamp", "anoms", "syndrome"
), row.names = c(NA, 6L), class = "data.frame")
datatable(a) %>% formatDate( 1, method = 'toLocaleDateString')
a
Thank you
With the development version of DT (>= 0.2.2) on GitHub, you can pass additional parameters to the date conversion method, for example a locale:
datatable(a) %>%
formatDate(1, method = 'toLocaleDateString', params = list('fr-FR'))
Or more parameters:
datatable(a) %>% formatDate(
1, method = 'toLocaleDateString',
params = list('fr-FR', list(year = 'numeric', month = 'long', day = 'numeric'))
)
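A different approach, not part of the answer above, is to build the display strings in R before handing the data to datatable(), so the result does not depend on the locale of whatever renders the table. This is a sketch under assumptions: format_fr and fr_months are hypothetical names, the month names are hard-coded to avoid relying on the R session's locale, and the timestamps are taken from the question's data.

```r
# Hard-coded French month names, so the output does not depend on Sys.getlocale()
fr_months <- c("janvier", "février", "mars", "avril", "mai", "juin",
               "juillet", "août", "septembre", "octobre", "novembre", "décembre")

# Hypothetical helper: format a POSIXct vector as "day month year" in French
format_fr <- function(x) {
  lt <- as.POSIXlt(x, tz = "UTC")
  paste(lt$mday, fr_months[lt$mon + 1], lt$year + 1900)
}

# First two timestamps from the question's data frame
stamps <- as.POSIXct(c(949363200, 953596800), origin = "1970-01-01", tz = "UTC")
shown <- format_fr(stamps)
```

The trade-off is that the column becomes character, so sorting in the table would be alphabetical rather than chronological unless the original timestamp is kept alongside it.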

R crashes while using data.table

sac[,treatment_days := as.character(seq(from = SACDPDAT, to = SACRTDAT, by = "1 day")), by = PACKID]
I have data named sac, with dput(sac[1:2,]) as follows:
structure(list(SUBJECT_Blinded = c(1201001, 1101001), LINE = c(8,
4), MODULE = c("SAC", "SAC"), CENTRE_Blinded = c(1201, 1201),
STUDYPER = c(7, 4), PACKID = c(10096, 10595), SACDPDAT = structure(c(1335304800,
1325545200), class = c("POSIXct", "POSIXt"), tzone = ""),
SACDP1 = c(35, 35), C_SACDP = c(NA_character_, NA_character_
), SACRTDAT = structure(c(1340316000, 1327964400), class = c("POSIXct",
"POSIXt"), tzone = ""), SACRT1 = c(0, 9), C_SACRT = c(NA_character_,
NA_character_)), .Names = c("SUBJECT_Blinded", "LINE", "MODULE",
"CENTRE_Blinded", "STUDYPER", "PACKID", "SACDPDAT", "SACDP1",
"C_SACDP", "SACRTDAT", "SACRT1", "C_SACRT"), sorted = c("SUBJECT_Blinded",
"PACKID"), class = c("data.table", "data.frame"), row.names = c(NA,
-2L))
When I run the code:
sac[,treatment_days := list(format(seq(from = SACDPDAT, to = SACRTDAT, by = "1 day"),"%Y-%m-%d")), by = PACKID]
RStudio crashes and returns this info:
Problem signature:
Problem Event Name: APPCRASH
Application Name: rsession.exe
Application Version: 0.98.501.0
Application Timestamp: 52e8371d
Fault Module Name: R.dll
Fault Module Version: 3.3.65126.0
Fault Module Timestamp: 53185fd3
Exception Code: c0000005
Exception Offset: 0000000000028c36
OS Version: 6.1.7601.2.1.0.256.48
Locale ID: 1045
Additional Information 1: 4fc0
Additional Information 2: 4fc0e6e5b53a870c89fb6e37a38d7e6b
Additional Information 3: 9d6e
Additional Information 4: 9d6e8f79167930945e5a5d06afac680e
It's the same with pure R. Any ideas how to do it another way?
There are a couple of things to watch in your new code:
The step: a bare number passed to by is interpreted as days only when seq() runs on Date objects. On POSIXct columns such as SACDPDAT and SACRTDAT it means seconds, so keep a character step for daily values:
seq(from = SACDPDAT, to = SACRTDAT, by = "day")
You also cannot create a new column from this sequence, because there can only be one value for each row. Instead, you can generate the sequence of days by PACKID, and then join this onto the old data.table.
So try:
setkey(sac, PACKID)
sac <- sac[sac[, seq(from = SACDPDAT, to = SACRTDAT, by = "day"), by = PACKID]]
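The same expand-then-join idea can be sketched in base R, which may help when checking the result independently of data.table. The data frame ranges and its dates are illustrative toy values, not the real sac data.

```r
# Toy date ranges per PACKID (illustrative values only)
ranges <- data.frame(
  PACKID = c(10096, 10595),
  start = as.Date(c("2012-04-25", "2012-01-03")),
  end = as.Date(c("2012-04-27", "2012-01-04"))
)

# One daily sequence per row; each list element is a Date vector
days <- Map(function(s, e) seq(s, e, by = "day"), ranges$start, ranges$end)

# Long table: one row per PACKID per treatment day
expanded <- data.frame(
  PACKID = rep(ranges$PACKID, lengths(days)),
  treatment_day = do.call(c, days)
)

# Join the daily rows back onto the original columns
joined <- merge(ranges, expanded, by = "PACKID")
```

Here the first PACKID spans three days and the second spans two, so expanded and joined each have five rows.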
