Null datatable Shiny - r

I have a function that creates a datatable in Shiny from department numbers and the number of times an event happened in each department during a time period. My issue is that if the date range is short enough, no department will have had the event occur. In those cases I get the error "Error in rowSums(x) : 'x' must be an array of at least two dimensions". At first the error only appeared inside the Shiny app and could be ignored. Now the app crashes and I have to go back to R to look at it.
I understand why the error is occurring but I don't know if there's a way around it for my situation because I don't know if the events occur until the data is subset. The function is called a number of times in my code, so I don't want to write an if statement outside the function each time it is used.
I tried adding if(length(b$Department <= 1)){tab<-renderDataTable({datatable(NULL)})} right after defining b and then had an else statement around the remainder of the function, but I get the message Warning: Error in [.data.frame: undefined columns selected
I have also tried other if statements such as creating a dataframe full of NAs but this returned the original error message.
dept.table<-function(df, date1, date2){
a<-df[which(DATE >= as.Date(date1) & DATE <= as.Date(date2)),]
b<-as.data.frame(table((a[,c("Event", "Department")])))
d<-reshape(b, direction="wide", idvar="Event", timevar="Department")
names(d)<-sub('^Freq\\.', '', names(d))
d$Total<-round(rowSums(d[,-1]), 0)
levels(d$Event)<-c(levels(d$Event), "Total")
d<-rbind(d, c("Total", colSums(d[,-1])))
tab<-DT::renderDataTable({
datatable(d, extensions="FixedColumns", options=list(dom='t', scrollX=T, fixedColumns=list(leftColumns=1, rightColumns=1)), rownames=FALSE)
})
}
Sample data
df<-data.frame(Department=rep(100:109, 3), Event=rep(c("A", "B", "C"),10),
Date=sample(seq(as.Date('2018/01/01'), as.Date('2018/09/01'), by="day"), 30))

It's not pretty, but I figured out a solution. There were two different issues: one when there was no data and another when there were only two departments, so I needed two if statements.
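dept.table <- function(df, date1, date2) {
  # keep only rows inside the date range (the column is named Date in the sample data)
  a <- df[df$Date >= as.Date(date1) & df$Date <= as.Date(date2), ]
  b <- as.data.frame(table(a[, c("Event", "Department")]))
  if (nrow(b) == 0) {
    # case 1: no events in the range, so render an empty table
    tab <- DT::renderDataTable(NULL)
  } else {
    d <- reshape(b, direction = "wide", idvar = "Event", timevar = "Department")
    names(d) <- sub('^Freq\\.', '', names(d))
    if (ncol(d) > 3) {
      # enough department columns to add row and column totals
      d$Total <- round(rowSums(d[, -1]), 0)
      levels(d$Event) <- c(levels(d$Event), "Total")
      d <- rbind(d, c("Total", colSums(d[, -1])))
      tab <- DT::renderDataTable({
        DT::datatable(d, extensions = "FixedColumns", options = list(dom = 't', scrollX = TRUE, fixedColumns = list(leftColumns = 1, rightColumns = 1)), rownames = FALSE)
      })
    } else {
      # case 2: too few departments for totals, so show the counts as they are
      tab <- DT::renderDataTable(DT::datatable(d))
    }
  }
  tab
}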
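A possible reason for the original rowSums error, sketched under assumptions: when the wide table has only one count column left, d[,-1] is simplified to a plain vector, and rowSums refuses anything with fewer than two dimensions. Adding drop = FALSE keeps it a data frame. The helper name add_totals and the toy table below are made up for illustration and are not part of the function above.

```r
# add_totals() is a hypothetical helper, not part of the original function
add_totals <- function(d) {
  # drop = FALSE keeps a data frame even when only one count column is left
  counts <- d[, -1, drop = FALSE]
  d$Total <- rowSums(counts)
  d
}

# toy wide table with a single department column, the case that fails without drop = FALSE
d <- data.frame(Event = c("A", "B"), `100` = c(1, 2), check.names = FALSE)
res <- add_totals(d)
res$Total
```

With this guard the totals step may not need its own if branch, though the empty-data case still has to be handled separately.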

Related

Function Number of paid invoices prior to the creation date of a new invoice of a customer

I'm trying to write a function, though I'm not sure this is the right approach. I'm new to RStudio and I'm trying to get, for each customer, a count of the number of invoices paid prior to the creation date of a new invoice, plus another column with the number of invoices that were paid late prior to the creation date of a new invoice.
My data:
set.seed(123)
names<- rep(LETTERS[1:2], each = 16)
id<- seq(1,32)
daysp<- runif(1:32,1,32)
startdate <-c("20-02-2018","01-03-2018","13-03-2018","20-03-2018","28-03-2018","05-04-2018","10-04-2018","13-04-2018",
"16-04-2018","19-04-2018","04-05-2018","14-05-2018","23-05-2018","04-06-2018","12-06-2018","19-06-2018",
"26-04-2018","02-05-2018","07-05-2018","07-05-2018","07-05-2018","14-05-2018","29-05-2018","12-06-2018",
"12-06-2018","18-06-2018","11-07-2018","11-07-2018","17-07-2018","30-07-2018","03-08-2018","07-08-2018")
startdate<-as.Date(startdate,"%d-%m-%Y" )
paydate<- startdate + daysp
class <- c("Payed", "Payed","Payed", "Delayed","Payed", "Delayed","Delayed", "Delayed","Payed", "Delayed",
"Payed", "Delayed","Payed", "Delayed","Payed", "Delayed","Payed", "Delayed","Payed", "Delayed",
"Payed", "Delayed","Payed", "Delayed","Payed", "Delayed","Delayed", "Delayed","Payed", "Delayed",
"Payed", "Delayed")
df<-data.frame(names,id,daysp,startdate,paydate,class)
My expected result looks like this:
nopip<-c(0,0,1,1,3,3,4,4,4,5,7,10,10,12,12,14,0,0,2,2,2,2,3,6,6,6,9,9,10,12,13,14)
nopip_delayed<-c(0,0,0,0,0,0,1,1,1,2,3,5,5,6,6,6,0,0,1,1,1,1,1,3,3,3,4,4,5,6,7,8)
like this Dataframe
df<-cbind(df,nopip,nopip_delayed)
Thanks in advance
There are several ways to accomplish this, but here is one using base R, which is a good foundation to build on.
This uses lapply to step through the data.frame and check if the names match that row along with the pay date being prior to the start date.
df$nopip2 <- lapply(seq_len(nrow(df)), function(x) sum(df$names == df$names[x] & df$paydate < df$startdate[x]))
This does the same sequence as the previous function, but adds an additional check if the class was delayed.
df$nopip_delayed2 <- lapply(seq_len(nrow(df)), function(x) sum(df$names == df$names[x] & df$paydate < df$startdate[x] & df$class == 'Delayed'))
Confirming calculated results are same as desired output
> setequal(df$nopip, df$nopip2)
[1] TRUE
> setequal(df$nopip_delayed, df$nopip_delayed2)
[1] TRUE
Added example to sum the daysp with respective nopip
df$nopip_daysp <- lapply(seq_len(nrow(df)), function(x) sum((df$names == df$names[x] & df$paydate < df$startdate[x]) * df$daysp))
As a side note, iterating through a data.frame row by row is expensive if the number of rows is large. However, the steps above should make for an easy transition to a faster approach if that time arises.
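One thing to be aware of with the lapply calls above is that they store a list in each new column. A minimal sketch of the same counting logic with vapply, which returns a plain integer vector, might look like this. The data frame inv and the helper names are invented for the example so that it runs on its own.

```r
# small made-up data set with the same column names as the question
inv <- data.frame(
  names     = c("A", "A", "A", "B", "B"),
  startdate = as.Date(c("2018-01-01", "2018-02-01", "2018-03-01", "2018-01-15", "2018-02-15")),
  paydate   = as.Date(c("2018-01-20", "2018-02-10", "2018-03-05", "2018-03-01", "2018-03-10")),
  class     = c("Payed", "Delayed", "Payed", "Delayed", "Payed"),
  stringsAsFactors = FALSE
)

# invoices of the same customer that were paid before invoice i was created
count_prior <- function(i) {
  sum(inv$names == inv$names[i] & inv$paydate < inv$startdate[i])
}

# same as above, restricted to invoices that were paid late
count_prior_delayed <- function(i) {
  sum(inv$names == inv$names[i] & inv$paydate < inv$startdate[i] & inv$class == "Delayed")
}

rows <- seq_len(nrow(inv))
inv$nopip         <- vapply(rows, count_prior, integer(1))
inv$nopip_delayed <- vapply(rows, count_prior_delayed, integer(1))
inv
```

vapply also checks that every result is a single integer, so a mistake in the helper shows up as an error instead of a malformed column.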

Filtering datetime by vector

It's probably really simple.
In the first case, using presidential data, I can filter by either years or years 2. And I get the same result.
However, when I use posixct data and try to filter in a similar way I run into problems.
When I write
school_hours2<-as.character(c(07:18))
I can see the values in school_hours2 are
"7", "8","9" etc
whereas in
school_hours they are
"07" "08" "09" etc
EDIT: I think this explains that difference then?
EDIT: I can see the problem comparing integer:character, and even when I write the vector as.character the values in the vector do not match what I want.
What I'd like is to be able to filter by school_hours2. As that would mean I could think "i'd like to filter between these two times" and put the upper and lower bounds in. Rather than having to write all the interval points in between. How do I get this?
Why is filtering by "Y" easier than filtering by "H"?
library (tidyverse)
#some data - filtering works
data(presidential)
head(presidential)
str(presidential)
presidential%>%filter(format(as.Date(start),"%Y")<=2005)
years<-c('1979', '1980', '1981', '1982',
'1983', '1984', '1985', '1986',
'1987', '1988', '1989', '1990'
)
years2<-c(1950:1990)
presidential%>%filter(format(as.Date(start),"%Y")%in% years2)
presidential%>%filter(format(as.Date(start),"%Y")%in% years)
#some date time data - filtering.
test_data<-sample(seq(as.POSIXct('2013/01/01'), as.POSIXct('2017/05/01'), by="day"), 1000)
td<-as.data.frame(test_data)%>%mutate(id = row_number())
school_hours<-c('07', '08', '09', '10',
'11', '12', '13', '14',
'15', '16', '17', '18'
)
school_hours2<-c(07:18)
school_years<-c(2015,2016,2017)
school_years2<-c(2015:2017)
str(td)
test1<-td%>%
filter(id >=79)
schools<-td%>%
filter(format(test_data,'%H') %in% school_hours)
schools2<-td%>%
filter(format(test_data,'%H') %in% school_hours2)
schools3<-td%>%
filter(format(test_data,'%Y')==2017)
schools4<-td%>%
filter(format(test_data,'%Y') %in% school_years)
schools5<-td%>%
filter(format(test_data,'%Y') %in% school_years2)
Here's my question:
In the code above, when I try to filter td (which contains posixct data) using school_hours or school_hours2 I get zero data returned.
Why?
What I'd like to be able to do is instead of writing
school_hours<-c('07', '08', '09', '10',
'11', '12', '13', '14',
'15', '16', '17', '18'
)
I'd write
school_hours2<-c(07:18)
Just like I have for school_years and the filtering would work.
This doesn't work
schools2<-td%>%
filter(format(test_data,'%H') %in% school_hours2)
This does work
schools5<-td%>%
filter(format(test_data,'%Y') %in% school_years2)
WHY?
I ask because:
I've used something similar to filter my real data, which I can't share, and I get a discrepancy.
When I use school_hours (which is a character) I generate 993 records and the first time is 07:00.
When I use school_hours2 (which is an integer) I generate 895 records and the first time is 10:00.
I know - "without the data we can't make any evaluation" but what I can't work out is why the two different vector filters work differently. Is it because school_hours contains characters and school_hours2 integers?
EDIT:
I changed the test_data line to:
#some date time data - filtering.
test_data<-as.POSIXct(sample(seq(1127056501, 1127056501), 1000),origin = "1899-12-31",tz="UTC")
it's still problematic:
schools<-td%>%
filter(format(test_data,'%H') %in% school_hours)
generates 510 rows
schools2<-td%>%
filter(format(test_data,'%H') %in% school_hours2)
generates 379 rows
All of the data I'm really interested looks like this
1899-12-31 23:59:00
(where the last 6 digits represent a 24 hr clock time)
All I'm really trying to do is convert the time from this
1899-12-31 07:59:00
to
the hour (7)
and then
use
school_hours2<-c(07:18)
as a filter.
But will the hour generated by the conversion of
1899-12-31 07:59:00
be
07
or
7
Because if it's 07, then
school_hours2<-c(07:18)
generates
7
and
school_hours2<-as.character(c(07:18))
generates
'7'
How do I get around this?
EDIT:
LIKE THIS:
R: how to filter a timestamp by hour and minute?
td1<-td%>%mutate(timestamp_utc = ymd_hms(test_data,tz="UTC"))%>%
mutate(hour = hour(timestamp_utc))%>%
filter(hour(timestamp_utc) %in% school_hours)
td2<-td%>%mutate(timestamp_utc = ymd_hms(test_data,tz="UTC"))%>%
mutate(hour = hour(timestamp_utc))%>%
filter(hour(timestamp_utc) %in% school_hours2)
td3<-td%>%
mutate(hour = hour(test_data))%>%
filter(hour(test_data) %in% school_hours2)
After a lot of mucking around and talking to myself in my question
I found this thread:
filtering a dataset by time stamp
and it helped me to realise how to isolate the hour in the time stamp and then use that to filter the data properly.
The final answer is to isolate the hour with hour() (from lubridate, like ymd_hms) and filter on that:
filter(hour(timestamp_utc) %in% school_hours2)
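For anyone wondering why the format() version misbehaved: format(x, "%H") returns zero-padded character strings such as "07", while 7:18 holds the numbers 7 to 18, and %in% compares the two as character, so "07" is tested against "7". Four-digit years have no leading zero, which is why the "%Y" filter worked. Here is a base R sketch of the difference, using made-up timestamps:

```r
# Hypothetical timestamps, for illustration only
x <- as.POSIXct(c("2017-01-01 07:59:00",
                  "2017-01-01 12:00:00",
                  "2017-01-01 22:15:00"), tz = "UTC")
school_hours2 <- 7:18   # same values as c(07:18); the leading zero is not stored

hour_chr <- format(x, "%H")   # character, zero-padded: "07" "12" "22"

# compared as character, "07" never matches "7"
chr_match <- hour_chr %in% school_hours2

# Option 1: turn the hour into a number before comparing
int_match <- as.integer(hour_chr) %in% school_hours2

# Option 2: keep character hours and zero-pad the range instead
pad_match <- hour_chr %in% sprintf("%02d", school_hours2)
```

Here chr_match misses the 07:59 row while int_match and pad_match keep it, which fits the symptom in the question where the integer filter only started returning rows at 10:00.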

R - Date/Time Calculations

My Question is divided into 2 parts:
1st part:
I have a function, get_data(), which I use to pull information for a date range.
get_data <- function (fac_num, start_date, end_date) {
if (!(is.null(fac_num) | is.null(start_date) | is.null(end_date))) {
if(end_date - start_date > 7) {
start_date <- end_date - 7
#start_date <- as.Date('2017-07-05')
#end_date <- as.Date('2017-07-06')
#fac_num <- "005"
}
new_start_date <- paste0(start_date,' 05:00:00')
new_end_date <- paste0(end_date + 1,' 05:00:00')
qry <- paste0("SELECT FAC_NUM, USER_ID, APPL_ID, FUNC_ID, ST_ID, NXT_ST_ID, RESP_PRMT_DATA,
ST_DT_TM, END_DT_TM, RESP_PRMT_TY_CDE,
REQ_INP_DATA FROM OPSDBA.STG_RFS_INTERACTION WHERE TRANS_ST_DT_TM >= DATE'",
start_date,"' AND TRANS_ST_DT_TM BETWEEN TO_TIMESTAMP('",new_start_date,"', 'YYYY-MM-DD HH:MI:SS') AND TO_TIMESTAMP('",new_end_date,"', 'YYYY-MM-DD HH:MI:SS')
AND APPL_ID='CTS' AND FAC_NUM='",fac_num,"'")
and then I perform calculations on it.
Further on in my program, I use this get_data() function to pull data for a new set of analyses.
rf_log_perform <- get_data(display_facility_decode(input$facNum2),
input$dateRange2, input$dateRange2 + 1)
Here, since I am using just a single date instead of a range, I have added one day to the date so that the get_data() function works.
I then wanted to modify the date range in such a way that, it does not show anything past 11:59 for the selected date.
rf_log_perform$date <- ifelse(strftime(rf_log_perform$st_dt_tm, format="%H:%M:%S")<'05:00:00',
format(as.POSIXct(strptime(rf_log_perform$st_dt_tm - 1*86400 , '%Y-%m-%d %H:%M:%S')),format = '%Y-%m-%d'),
format(as.POSIXct(strptime(rf_log_perform$st_dt_tm , '%Y-%m-%d %H:%M:%S')),format = '%Y-%m-%d'))
By using the get_data() function, I am able to pull data for the range 08/29/2017 05:00:00 to 08/30/2017 05:00:00, which is considered one day in my example.
But for my calculations, I want to discard everything which is beyond 08/29/2017, 11:59:59 PM, for more accurate results.
For this purpose I have added an ifelse statement to sort that out, but it isn't behaving as I expect and I am confused about why not.
Unfortunately I still can not comment on the main question.
I encourage you to make two adjustments to your question to improve the chances on getting an answer to your question:
1) Please make your example reproducible e.g. provide date ranges, wrap your code in a well defined function etc.
2) Explain what you are trying to achieve. What is your intention and expected result.
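Without a reproducible example this can only be a guess, but if the goal is simply to drop every row after 23:59:59 on the selected date, one option is to compare the timestamps against midnight of the following day instead of reformatting them as strings. The sketch below uses made-up values, and it assumes st_dt_tm is a POSIXct column and that UTC is an acceptable time zone for the example.

```r
# made-up timestamps spanning the 05:00 to 05:00 "day" from the question
st_dt_tm <- as.POSIXct(c("2017-08-29 05:00:00",
                         "2017-08-29 23:59:59",
                         "2017-08-30 00:00:00",
                         "2017-08-30 04:30:00"), tz = "UTC")

selected <- as.Date("2017-08-29")   # stands in for the selected date

# midnight at the start of the next calendar day
day_end <- as.POSIXct(paste(selected + 1, "00:00:00"), tz = "UTC")

# TRUE for rows that fall on or before 23:59:59 of the selected date
keep <- st_dt_tm < day_end
st_dt_tm[keep]
```

On a data frame the same logical vector could be used for row subsetting, for example rf_log_perform[keep, ], provided the column really is POSIXct and the time zones match.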

replacing a value in column X based on columns Y with R

i've gone through several answers and tried the following, but each attempt either yields an error or an unwanted result:
here's the data:
Network             Campaign
Moburst_Chartboost  Test Campaign
Moburst_Chartboost  Test Campaign
Moburst_Appnext     unknown
Moburst_Appnext     1065
i'd like to replace "Test Campaign" with "1055" whenever "Network" == "Moburst_Chartboost". i realize this should be very simple but trying out these:
dataset = read.csv('C:/Users/User/Downloads/example.csv')
for( i in 1:nrow(dataset)){
if(dataset$Network == 'Moburst_Chartboost') dataset$Campaign <- '1055'
}
this yields an error: Warning messages:
1: In if (dataset$Network == "Moburst_Chartboost") dataset$Campaign <- "1055" :
the condition has length > 1 and only the first element will be used
2: In if (dataset$Network == "Moburst_Chartboost") dataset$Campaign <- "1055" :
the condition has length > 1 and only the first element will be used
etc.
then i tried:
within(dataset, {
dataset$Campaign <- ifelse(dataset$Network == 'Moburst_Chartboost', '1055', dataset$Campaign)
})
this turned ALL 4 values in column "Campaign" into "1055", overwriting what was there even when the condition isn't met
also this:
dataset$Campaign[which(dataset$Network == 'Moburst_Chartboost')] <- 1055
yields this error, and replaced the values in the two first rows of "Campaign" with NA:
Warning message:
In `[<-.factor`(`*tmp*`, which(dataset$Network == "Moburst_Chartboost"), :
invalid factor level, NA generated
scratching my head here. new to R but this shouldn't be so hard :(
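In your first attempt, the loop never uses i: the condition tests the whole Network column at once (so only its first element is used, hence the warning), and the assignment overwrites the whole Campaign column rather than a single row.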
In your second, you're trying to assign the value "1055" to all of the 2nd column.
The way to think about it is as an if else, where if the condition in col 1 is met, col 2 is changed, otherwise it remains the same.
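dataset <- data.frame(Network = c("Moburst_Chartboost", "Moburst_Chartboost",
"Moburst_Appnext", "Moburst_Appnext"),
Campaign = c("Test Campaign", "Test Campaign",
"unknown", "1065"),
# keep Campaign as character; as a factor, ifelse() would return its integer codes
stringsAsFactors = FALSE)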
dataset$Campaign <- ifelse(dataset$Network == "Moburst_Chartboost",
"1055",
dataset$Campaign)
head(dataset)
             Network Campaign
1 Moburst_Chartboost     1055
2 Moburst_Chartboost     1055
3    Moburst_Appnext  unknown
4    Moburst_Appnext     1065
You may also try dataset$Campaign[dataset$Campaign=="Test Campaign"]<-1055 to avoid the use of loops and ifelse statements.
where dataset is
dataset <- data.frame(Network = c("Moburst_Chartboost", "Moburst_Chartboost",
"Moburst_Appnext", "Moburst_Appnext"),
Campaign = c("Test Campaign", "Test Campaign",
"unknown", 1065))
Try the following
dataset = read.csv('C:/Users/User/Downloads/example.csv', stringsAsFactors = F)
for( i in 1:nrow(dataset)){
if(dataset$Network[i] == 'Moburst_Chartboost') dataset$Campaign[i] <- '1055'
}
It seems you forgot the index variable. Without [i] you work on the whole column of the data frame, resulting in the warning you mentioned.
Note that I added stringsAsFactors = F to the read.csv() function to make sure the strings are indeed interpreted as strings and not factors. Using factors this would result in an error like this
In `[<-.factor`(`*tmp*`, i, value = c(NA, 2L, 3L, 1L)) :
invalid factor level, NA generated
Alternatively you can do the following without using a for loop:
idx <- which(dataset$Network == 'Moburst_Chartboost')
dataset$Campaign[idx] <- '1055'
Here, idx is a vector containing the positions where Network has the value 'Moburst_Chartboost'
thank you for the help! not elegant, but since this lingered with me when going to sleep last night i decided to bludgeon it with some ugly code, and that worked too, just as a workaround: split into two data frames, replaced all values in one, then bound them back together...
# subsetting only chartboost
chartboost <- subset(dataset, dataset$Network=='Moburst_Chartboost')
# replace all values in Campaign
chartboost$Campaign <-sub("^.*", "1055",chartboost$Campaign)
#subsetting only "not chartboost"
notChartboost <-subset(dataset, dataset$Network!='Moburst_Chartboost')
# binding back to single dataframe
newSet <- rbind(chartboost, notChartboost)
Ugly as a duckling but worked :)

linking crsp and compustat in R via WRDS

I am using R to connect to WRDS. Now, I would like to link compustat and crsp tables. In SAS, this would be achieved using macros and the CCM link table. What would be the best way to approach this topic in R?
PROGRESS UPDATE:
I downloaded crsp, compustat and ccm_link tables from wrds.
sql <- "select * from CRSP.CCMXPF_LINKTABLE"
res <- dbSendQuery(wrds, sql)
ccmxpf_linktable <- fetch(res, n = -1)
ccm.dt <- data.table(ccmxpf_linktable)
rm(ccmxpf_linktable)
I am then converting the suggested matching routine from the wrds event study sas file into R:
ccm.dt[,typeflag:=linktype %in% c("LU","LC","LD","LN","LS","LX") & USEDFLAG=="1"]
setkey(ccm.dt, gvkey, typeflag)
for (i in 1:nrow(compu.dt)) {
gvkey.comp = compu.dt[i, gvkey]
endfyr.comp = compu.dt[i,endfyr]
PERMNO.val <- ccm.dt[.(gvkey.comp, TRUE),][linkdt<=endfyr.comp & endfyr.comp<=linkenddt,lpermno]
if (length(PERMNO.val)==0) PERMNO.val <- NA
suppressWarnings(compu.dt[i, "PERMNO"] <- PERMNO.val)
}
However, this code is fantastically inefficient. I started out with data.table, but do not really understand how to apply the logic in the for-loop. I am hoping that someone could point me to a way to improve the for-loop.
Matching fields in stages works better. maybe someone finds this useful. Any suggestions for further improvement are of course very welcome!!!
# filter on ccm.dt
ccm.dt <- ccm.dt[linktype %in% c("LU","LC","LD","LN","LS","LX") & USEDFLAG=="1"]
setkey(ccm.dt, gvkey)
setkey(compu.dt, gvkey)
compu.merged <- merge(compu.dt, ccm.dt, all.x = TRUE, allow.cartesian = TRUE)
# deal with NAs in linkenddt - set NAs to todays date, assuming they still exist.
today <- as.character(Sys.Date())
compu.merged[is.na(linkenddt), "linkenddt":=today]
# filter out date mismatches
compu <- compu.merged[linkdt <= endfyr & endfyr<=linkenddt]

Resources