Working with imperfectly matched data in crossfilter - crossfilter

Crossfilter's API reference (https://github.com/square/crossfilter/wiki/API-Reference) gives the following specifications for dimensions:
1.) The function must return naturally-ordered values
2.) .....incomparable values such as NaN and undefined are not supported
How would one go about charting a crossfilter (using dc.js) with two dimensions: one with daily data (7 days a week), and another with business-day data (5 days a week)? The data structure implies that the business-day data will have gaps on the weekend, which should violate the specifications above.
For example, if I want to compare a company's store sales (7 days a week) with its stock price (5 days a week, with gaps on Saturday and Sunday), how would I go about it? The goal is to have two dc.js charts filtering each other even though the data isn't perfectly matched up, i.e. the first chart will show sales data from Jan 1 to Jan 31 (7 days a week), while the second chart will show stock price data from the first to the last business day in January (excluding weekends).

Your stock data would likely include no rows for Saturday and Sunday. This is different from having a data row with the stock price as NaN.
For example, if you plotted the stock data on a row chart with the days of the week as the categories, there would be no bars for Saturdays and Sundays.
Here is a crude example: DC.JS example of days of week chart
I made sure that no rows were added for Saturdays and Sundays:
if ((stockDate.getDay() != 6) && (stockDate.getDay() != 0))
The resulting row chart has no row for Saturday or Sunday.
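As a minimal sketch of the check above, the following plain JavaScript drops weekend rows before the data is loaded and then counts the remaining rows per weekday. The helper name `isBusinessDay`, the sample prices, and the field names `date` and `price` are assumptions made for illustration; they are not taken from the linked example.

```javascript
// Hypothetical helper: true for Monday to Friday (local time).
function isBusinessDay(date) {
  const day = date.getDay(); // 0 = Sunday, 6 = Saturday
  return day !== 0 && day !== 6;
}

// Made-up stock rows, one per calendar day for 1-7 January 2015.
const rawRows = [];
for (let d = 1; d <= 7; d++) {
  rawRows.push({ date: new Date(2015, 0, d), price: 100 + d });
}

// Keep only business days, so no weekend rows exist at all
// (as opposed to weekend rows carrying a NaN price).
const stockRows = rawRows.filter(function (row) {
  return isBusinessDay(row.date);
});

// Count the remaining rows per weekday name.
const dayNames = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
const countsByDay = {};
stockRows.forEach(function (row) {
  const name = dayNames[row.date.getDay()];
  countsByDay[name] = (countsByDay[name] || 0) + 1;
});
```

Because the weekend rows are never added, `countsByDay` has no `Sat` or `Sun` entry, which is why a chart built from such data would show no bars for those days.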

You could explore filtering your data, as I did, so you preselect what you want to show. Remember to include the additional code which preserves the bins.
Hide Specified Row in dc.js rowchart
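One common way to hide specific rows in a dc.js chart is to wrap the group in an object that filters the result of `all()`, a pattern often called a "fake group". The sketch below assumes this is the kind of additional code meant above; the function name `hideKeys` and the mock group standing in for a real crossfilter group are hypothetical.

```javascript
// Wrap a group-like object (anything with an all() method returning
// [{key, value}, ...]) so that the listed keys are left out.
function hideKeys(sourceGroup, keysToHide) {
  return {
    all: function () {
      return sourceGroup.all().filter(function (d) {
        return keysToHide.indexOf(d.key) === -1;
      });
    }
  };
}

// Mock object standing in for a crossfilter group (made-up values).
const mockGroup = {
  all: function () {
    return [
      { key: "Sun", value: 0 },
      { key: "Mon", value: 4 },
      { key: "Tue", value: 5 },
      { key: "Wed", value: 4 },
      { key: "Thu", value: 5 },
      { key: "Fri", value: 4 },
      { key: "Sat", value: 0 }
    ];
  }
};

const weekdayGroup = hideKeys(mockGroup, ["Sat", "Sun"]);
const visibleKeys = weekdayGroup.all().map(function (d) {
  return d.key;
});
```

With dc.js, the wrapped object would be passed to the chart's `.group()` in place of the original group, so the chart only ever sees the remaining keys.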

Related

Creating new datasets from unique dates in R

I have a dataset of 2015 with every day of the year. In this dataset, there are actions that happen on any given day. Some days have more actions than others, therefore some days have many more entries than others.
I am trying to create a function that will create an individual dataset per day of the year without having to code 365 of these:
df <- subset(dataset, date== "2015-01-01")
I have looked at dplyr's group_by(); however, I do not want a summary per day. It is important that I get to see every observation on any given day for graphing purposes.

Time Series Changing Values

I wanted to forecast month-over-month increases for four columns to the end of the year; however, when I created my dataset with ts, it removed the values of my imported dataset. Is there a reason for this that I can avoid? Or should it have come out this way?
Month - 2022-03-01, 2022-04-01,2022-05-01,2022-06-01,2022-07-01
Visits- 71893, 40683,32455,34898,49834
Revenue- 87036,23846,34575,39732,45632
Orders- 3488,6578,4345,5644,6543
Conversion Rate- .35%,.33%,.43%,.39%
However, it is returning the following below: does this have an actual meaning? Or is the month column causing this?
Month - 2022-03-01, 2022-04-01,2022-05-01,2022-06-01,2022-07-01
Visits- 5,1,2,3,4
Revenue- 5,3,4,1,2
Orders- 1,2,3,4,5
Conversion Rate- 1,2,3,4,5

How to remove rows in data frames with dates close to each other in R?

I have longitudinal data in a data frame in long format in R, such that a person can be present on several rows, where each row has a specific date, but never the same date. Data is sorted first by personal ID and then by date, such that early dates for an individual come first.
Following is what I would like to accomplish:
The first date for each individual should be kept. For the rest of the dates I want to remove all dates occurring within 30 days of a previous date for that person. But, if a row is removed, no other following dates should be compared to that date. The dates should be removed in order, from top to bottom. I.e. if a person has dates 14 May 2020, 20 May 2020, 22 May 2020 and 17 June 2020 I would like to remove the rows in the data frame with the two middle dates, as they are close to the first date: 14 May 2020. I have been able to do this with for loops, but it is not at all time efficient for big data. Does anybody know how I could solve this in a better way?

Filtering multi-year dates in Shiny Datatable

I am running into an issue I can't quite resolve when filtering my Datatable in shiny.
I am displaying two dataframes. The first is all the rows of the dataframe filtered for the time by a dateRangeInput filter. The second uses the same data set and dateRangeInput, but sums the totals during the period. The user therefore gets the raw and summarized data on the page.
update.df = eventReactive(input$Submit, {
  data() %>% filter(between(as.Date(Date), input$dateRange[1], input$dateRange[2]))
})
This is my code for the first dataframe - I've also used this code prior to attempting the above:
update.df = eventReactive(input$Submit, {
  data() %>% filter(as.Date(Date) >= as.Date(input$dateRange[1]) & as.Date(Date) <= as.Date(input$dateRange[2]))
})
Whenever the dates fall within a calendar year, or same month, I find that the filters tend to work and the right subset of data is pulled. However, when the data spans two calendar years, I am missing months of data.
For example:
Date 1: Sep/01/2018
Date 2: Apr/30/2019
When I run these two dates, I am missing the dates from Jan 1st, 2019 to Apr 30th, 2019.
I believe the issue initially was that as I filtered using this:
as.Date(Date) <= as.Date(input$dateRange[2])
for the second part of my filter logic, the months that are 'less' than Apr 30th occur after that in the calendar year. But really, with the data I am using, the months between January - April occur earlier. That's why I eventually shifted to using 'between' hoping it would identify all dates in between the two spans. However, I am still not achieving the correct results.
How can I properly structure the filter logic so that when I filter the dataframe between Sep/01/2018 and Apr/30/2019, all the months that occur linearly show up in the dataframe as part of the subset?

R - Filter Dates by Time Window without including weekends

Is there a way to window filter dates by a number of days excluding weekends?
I know you can use the between function for filtering between two specific dates, but I only know one of the two dates. The other date should be 4 business days earlier (not counting weekends).
A pseudo-example of what I am looking for: given this Wednesday, I want to filter everything up to 4 business days beforehand:
window(z, start = as.POSIXct("2017-09-13"), end = as.POSIXct("2017-09-20"))
Another example would be if I am given this Friday's date, the start date would be Monday.
Ideally, I want to be able to play with the window value.
