I'm working with a dataset of buildings and their electrical power use over time.
There are two aggregations on these buildings that are simple sums across the entire timespan, and I have those written. They end up looking like:
var reducer = reductio();
// How much energy is used in the whole system
reducer.value("energy").sum(function (d) {
  return +d.Energy;
});
These work great.
The third aggregation, however, is giving me some trouble. I need to find the point at which the sum across all the buildings is at its greatest: the max of the sum and the time at which it happened.
I wrote:
reducer.value("power").sum(function (d) {
  return +d.Power;
}).max(function (d) {
  return +d.Power;
}).aliasProp({
  time: function (d, v) {
    return v.Timestamp;
  }
});
But this is not necessarily the biggest total power use. I'm pretty sure this returns the overall sum along with the time at which some individual building used the most power.
So if the power values at one moment were 1, 1, 1, 15, that moment would be picked and I would end up with a total of 18, when there might be a different moment when the values were 5, 5, 5, 5 for a total of 20. The 20 is what I need.
I am at a loss for how to get the maximum of a sum. Any advice?
Just to restate: You are grouping on time, so your group keys are time periods of some sort. What you want is to find the time period (group) for which power use is greatest.
If I'm right that this is what you want, then you would not do this in your reducer, but rather by sorting the groups. You can order groups by using the group.order method: https://github.com/crossfilter/crossfilter/wiki/API-Reference#group_order
// During group setup
group.order(function(p) { return p.power.sum; })
// Later, when you want to grab the top power group
group.top(1)
Reductio's max aggregation should just give you the maximum value that occurs within the group. So given a group with values 1,1,1,15, you would get back the value 15. It sounds like that's not what you want.
Hopefully I understood properly. If not, please comment. If you can put together an example with toy data that is public and where you can tell me what you would like to see vs what you are getting, I should be able to help out.
Update based on example:
So, what you want (based on the description in the example) is to find the time with the maximum total power usage within the selected time period. So you would do the following:
var timeDim = buildings.dimension(function(d) { return d.Timestamp })
var timeGrp = timeDim.group().reduceSum(function(d) { return d.Power })
var maxResults = timeGrp.top(1)
Whenever you want to find the max power usage time for your current filter, just call timeGrp.top(1) and the key of that group will be the time with the maximum power.
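For example (a minimal sketch using the timeGrp group defined above; the variable name topGroup is just for illustration):
// top(1) returns an array with one {key, value} entry
var topGroup = timeGrp.top(1)[0];
console.log(topGroup.key);   // the Timestamp with the greatest total power
console.log(topGroup.value); // that total power (the reduceSum result)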
Note: Don't filter on timeDim as the filters on a dimension are not applied to groups defined on that dimension.
Here's an updated JSFiddle that writes out the maximum group to the console: https://jsfiddle.net/esjewett/1o3robm3/1/
This might look simple, but I don't know how to do it.
This is the information:
So, I got the Cumulative Total using this formula:
CumulativeTotal = CALCULATE(
    SUM(vnxcritical[Used Space GB]),
    FILTER(ALL(Datesonly[Date]),
        Datesonly[Date] <= MAX(Datesonly[Date])))
But what I need is to get the differences between the dates; between the first date and the second, the difference would be 210. I need to get another column with that information. Does anyone know the formula to do that?
OK, so I used this:
IncrmentalValueTEST =
VAR CurrDate = MAX(vnxcritical[Date])
VAR PrevDate = CALCULATE(LASTDATE(vnxcritical[Date]), vnxcritical[Date] < CurrDate)
RETURN SUM(vnxcritical[Used Space GB]) -
CALCULATE(SUM(vnxcritical[Used Space GB]), vnxcritical[Date] = PrevDate)
And this is the result:
OK, so this is my data table:
You can see all the dates that I have for now. This is a capacity report for different EMC Storage Arrays, for different Pools. The idea is to be able to review the incremental space used over a given period of time.
I already tried another idea to get this, but the result was the same. I used this:
Diferencia =
VAR Day = MAX(Datesonly[Month])
VAR Month = MAX(Datesonly[Year])
RETURN
    SUM('Used Space'[used_mb])
        - CALCULATE(
            SUM('Used Space'[used_mb]),
            FILTER(ALL(Datesonly[Date]), Datesonly[Date] <= MAX(Datesonly[Date])))
But the result is the same: "47753152401".
I'm using graphical filters and other things to get a minimal view, because there are only 5 weekly reports but the SQL database has more than 150,000 rows.
And this is the relationship that I made with a table containing only dates, in order to use the formula in a better way, but the result is the same.
Try something along these lines:
IncrmentalValue =
VAR CurrDate = MAX(Datesonly[Date])
VAR PrevDate = CALCULATE(LASTDATE(Datesonly[Date]), Datesonly[Date] < CurrDate)
RETURN SUM(vnxcritical[Used Space GB]) -
CALCULATE(SUM(vnxcritical[Used Space GB]), Datesonly[Date] = PrevDate)
First, calculate the current date and then find the previous date by taking the last date that occurred before it. Then take the difference between the current value and the previous value.
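On the very first date there is no earlier date, so PrevDate comes back blank and the difference collapses to the full sum for that row. If you would rather show a blank there instead, one variation (just a sketch; the measure name and the blank-on-first-date behaviour are my own choices, not part of the answer above) is:
IncrmentalValueBlankFirst =
VAR CurrDate = MAX(Datesonly[Date])
VAR PrevDate = CALCULATE(LASTDATE(Datesonly[Date]), Datesonly[Date] < CurrDate)
RETURN
    IF(
        ISBLANK(PrevDate),
        BLANK(),  -- no earlier date exists, so no difference to report
        SUM(vnxcritical[Used Space GB])
            - CALCULATE(SUM(vnxcritical[Used Space GB]), Datesonly[Date] = PrevDate)
    )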
I am attempting to combine a series of loops/functions into one all-encompassing function so that I can see the result for different input values. While the steps work properly when standalone (and when given just one input), I am having trouble getting the overall function to work. The answer I am getting back is a vector of 1s, which is incorrect.
The goal is to count the number of occurrences of consecutive zeroes in the randomly generated results, and then to see how the probability of consecutive zeroes occurring changes as I change the initial percentage input provided.
Does anyone have a tip for what I'm doing wrong? I have stared at this at several separate points now but cannot figure out where I'm going wrong. Thanks for your help.
Example
pctgs_seq = seq(0.8, 1, .01)
occurs = 20
iterations = 10

iterate_pctgs = function(x) {
  probs = rep(0, length(pctgs_seq))
  for (i in 1:length(pctgs_seq)) {
    all_sims = lapply(1:iterations, function(x) ifelse(runif(occurs) <= i, 1, 0))
    totals = sapply(all_sims, sum)
    consec_zeroes = function(x) {
      g = 0
      for (i in 1:(length(x) - 1)) {
        g = g + ifelse(x[i] + x[i + 1] == 0, 1, 0)
      }
      return(g)
    }
    consec_zeroes_sim = sapply(all_sims, consec_zeroes)
    no_consec_prob = sum(consec_zeroes_sim == 0) / length(consec_zeroes_sim)
    probs[i] = no_consec_prob
  }
  return(probs)
}

answer = iterate_pctgs(pctgs_seq)
I'm trying to show the total number of people in each geography when the user hovers over it, using crossfilter, but my current code is only showing the total across all geographies. So what is the crossfilter equivalent of the SQL query: SELECT COUNT(*) GROUP BY dma
This is my code so far:
//geography that is being hovered over, getting dma name and removing everything that is after the comma
sel_geog = layer.feature.properties.dma_1;
sel_geog = sel_geog.split(",")[0];
console.log(sel_geog);
//crossfilter to get total number of people of each geography
var dmaDim = voter_data.dimension(function(d) {return d.dma == sel_geog}),
dma_grp = dmaDim.groupAll().reduceCount().value();
console.log(dma_grp);
Crossfilter isn't meant to be used in a way where you are building new dimensions and groups for each user interaction. It's meant to build dimensions and groups before interactions take place and then update them quickly when filtering based on user interactions.
It's not really clear from this question what your data looks like or what you are trying to do, but you probably want to create a dimension and group for your dma property and then build your map based on that:
var voter_data = crossfilter(my_data);
var dmaDim = voter_data.dimension(function(d) { return d.dma; });
var dmaGroup = dmaDim.group();
At this point dmaGroup.all() will be an array of objects that looks like { key: 'dmaKey', value: 10 } where 10 is the count of all records where d.dma === 'dmaKey'. There are lots of ways you can aggregate differently with Crossfilter, but that may get you started.
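If you then need the count for one particular geography on hover (a minimal sketch reusing the sel_geog variable from your code; the lookup itself is just for illustration):
// Find the group entry whose key matches the hovered DMA
var match = dmaGroup.all().filter(function(g) { return g.key === sel_geog; })[0];
var dmaCount = match ? match.value : 0;
console.log(dmaCount);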
I've been searching for similar solutions out there but am coming up short so far. Here is what I want to accomplish:
I need to come up with a basic solution to sync inventory quantities at the end of each day. We take physical counts of inventory sold throughout the day but need something to log these changes and share them between users. I would like to use two buttons (click one to subtract the number of items sold at the end of the day, and click the other to add newly received inventory).
This is how my sheet is set up:
Col A: Product Tag
Col B: Product sku
Col C: Amount Sold Today
Col D: Total Inventory Quantity
Col E: Add New Inventory
Column D will be pre-populated with initial inventory counts. At the end of each day, I would like to go down my product list and fill in the amount of each item sold that day in Column C. Once Column C is fully populated, I would like to click the "subtract" button and have Column C subtracted from Column D.
On the other side, once we receive new stock of an item, I would like to enter these counts into Column E. Once this column is fully populated, I would like to click the "Add" button and have Column E added to Column D. Ideally, once the add or subtract function has completed, Column C or E will be cleared and ready for the next day's entry.
I already have designed my buttons, I just need help coming up with the scripts to accomplish this.
You can use Google Apps Script for this.
If you are unfamiliar, in your particular spreadsheet, go to Tools → Script Editor and then select the Blank Project option.
Then you can write functions like this to achieve what you want!
function subtractSold() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var c1 = sheet.getRange("C2");
  var c2 = sheet.getRange("D2");
  while (!c1.isBlank() && !c2.isBlank()) {
    c2.setValue(c2.getValue() - c1.getValue());
    c1.clear();
    c1 = c1.offset(1, 0);
    c2 = c2.offset(1, 0);
  }
}
Basically what the function does is:
Get a reference to the active spreadsheet
Get references to the cells C2 and D2, for the first row of data.
Use a while loop to repeatedly go through the rows. Terminate when either cell is empty.
In the loop, we get the appropriate values, subtract, and set the result back into the cell in column D. Then we clear the cell in column C. We then move both cell references down by one row (the offset method returns a new range offset from the original by the given number of rows and columns).
Then assign the script to the button image by entering the name of the function (subtractSold in this case) in the "Assign script" option for the button.
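The "Add" button can use a function along the same lines (a sketch based on the column layout you described, adding Column E into Column D and then clearing Column E; the name addNewInventory is just my suggestion):
function addNewInventory() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var newStock = sheet.getRange("E2"); // Add New Inventory column
  var total = sheet.getRange("D2");    // Total Inventory Quantity column
  while (!newStock.isBlank() && !total.isBlank()) {
    total.setValue(total.getValue() + newStock.getValue());
    newStock.clear();
    newStock = newStock.offset(1, 0);
    total = total.offset(1, 0);
  }
}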
I have made an example sheet here (go to File → Make a Copy to try the scripts and see the code): https://docs.google.com/spreadsheets/d/1qIJdTvG0d7ttWAUEov23HY5aLhq5wgv9Tdzk531yhfU/edit?usp=sharing
A bit faster
If you try the sheet above you can see it processes one row at a time, which might get pretty slow when you have a lot of rows. It is probably faster to process the entire column in bulk, but it may be a bit more complicated to understand:
function subtractSoldBulk() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var maxRows = sheet.getMaxRows();
  var soldRange = sheet.getRange(2, 3, maxRows); // row, column, number of rows
  var totalRange = sheet.getRange(2, 4, maxRows);
  var soldValues = soldRange.getValues();
  var totalValues = totalRange.getValues();
  for (var row in soldValues) {
    var soldCellData = soldValues[row][0];
    var totalCellData = totalValues[row][0];
    if (soldCellData != "" && totalCellData != "") {
      totalValues[row][0] = totalCellData - soldCellData;
      soldValues[row][0] = "";
    }
  }
  soldRange.setValues(soldValues);
  totalRange.setValues(totalValues);
}
The difference here is that instead of getting one cell, we get one range of cells. The getValues() method then gives us a 2D array of the data in that range. We do the calculations on the two arrays, update the data in the arrays, and then set the values of the ranges based on the array data.
You can find documentation for the methods used above from Google's documentation: https://developers.google.com/apps-script/reference/spreadsheet/sheet
We're displaying time series data (utilisation of a compute resource, sampled hourly over months) on a stacked area chart using D3.js:
d3.json("/growth/instance_count_1month.json", function( data ) {
  data.forEach(function(d) {
    d.datapoints = d.datapoints.map(
      function(da) {
        // NOTE i'm not sure why this needs to be multiplied by 1000
        return {date: new Date(da[1] * 1000),
                count: da[0]};
      });
  });

  x.domain(d3.extent(data[0].datapoints, function(d) { return d.date; }));
  y.domain([0,
    Math.ceil(d3.max(data.map(function (d) {
      return d3.max(d.datapoints, function (d) { return d.count; });
    })) / 100) * 100
  ]);
The result is rather spiky for my tastes:
Is there an easy way to simplify the data, either using D3 or another readily available library? I want to reduce the spikiness, but also reduce the volume of data to be graphed, as it will get out of hand.
I have a preference for doing this at the UI level, rather than touching the logging routines (even though redundant JSON data will have to be transferred.)
You have a number of options; you need to decide what is the best way forward for the type of data you have and how it will be used. Without knowing more about your data, the best I can suggest is re-sampling: simply report the data at longer intervals ('rolling up' the data). Alternatively, you could use a rolling average or look at various line-smoothing algorithms.
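For instance, a simple rolling (moving) average over the {date, count} points built in your question could look like this (just a sketch; the movingAverage name and windowSize parameter are illustrative, and d3.mean does the averaging):
// Replace each point's count with the mean of the previous windowSize counts (inclusive)
function movingAverage(datapoints, windowSize) {
  return datapoints.map(function(d, i, arr) {
    var start = Math.max(0, i - windowSize + 1);
    var win = arr.slice(start, i + 1);
    return { date: d.date, count: d3.mean(win, function(p) { return p.count; }) };
  });
}
// e.g. inside the forEach above, smooth hourly samples over roughly one day
d.datapoints = movingAverage(d.datapoints, 24);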