Moving averages with crossfilter.js and dc.js

I would like to use crossfilter.js with dc.js in order to represent moving averages.
Is it possible to calculate moving averages on a set of data, or does the data passed in have to have the averages already calculated?
If it's possible, could you please provide a short example?
Thank you,

I would love to be proven wrong but I can't really think of a way to do this in crossfilter.
In the case of a time series, the reduce() function performs aggregations across the data grouped by date, i.e. it combines all the rows that share the same date.
With a moving average you'd instead need an operation that loops down the rows, combining each date with the dates before it.
Server-side, I wouldn't choose SQL to calculate a moving average on database data; likewise I wouldn't use crossfilter client-side.
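A different route, sketched here under assumptions rather than taken from either answer: let crossfilter do the per-date reduction, then loop down the already-grouped rows afterwards. crossfilter's group.all() returns the groups as {key, value} objects in ascending key order, so a trailing average can be computed over that array. The names trailingMovingAverage and movingAverageGroup are hypothetical, and the idea that a dc.js chart will accept any object exposing an all() method (often called a "fake group") is an assumption to check for your chart type.

```javascript
// Trailing n-period moving average over an array of {key, value} objects.
// Assumes the array is already sorted by key, as crossfilter's group.all()
// returns it. `pick` (optional) extracts the number to average from each
// value; by default the value itself is used.
function trailingMovingAverage(groups, n, pick) {
  const get = pick || function (value) { return value; };
  return groups.map(function (g, i) {
    // The window is this bucket plus up to n - 1 buckets before it.
    const start = Math.max(0, i - n + 1);
    let sum = 0;
    for (let j = start; j <= i; j++) {
      sum += get(groups[j].value);
    }
    return { key: g.key, value: sum / (i - start + 1) };
  });
}

// Hypothetical "fake group": an object exposing all(), wrapping a real
// crossfilter group. It recomputes on every call, so it reflects the
// current filters whenever the chart asks for data.
function movingAverageGroup(sourceGroup, n, pick) {
  return {
    all: function () {
      return trailingMovingAverage(sourceGroup.all(), n, pick);
    }
  };
}
```

With a plain sum per date it might be wired up as chart.group(movingAverageGroup(yourDimension.group().reduceSum(function (d) { return d.apples; }), 3)). Note that the window counts buckets, not calendar time, so dates with no data are simply skipped, and the first n - 1 points average over fewer buckets.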

I don't know if you're still after any help with this, but having just overcome this problem myself, I thought I'd share my solution in case it helps anyone else.
First I declare some fairly standard reduce functions that return a simple average, with the addition of two variables to store the averages for t-1 and t-2. I created a 3-period moving average, but if you want a longer one you could replace these variables with an array that you push to and splice from.
function movingAveInit() {
  return {
    total: 0,
    count: 0,
    average: 0,
    // These are the two vars for storing previous periods.
    // 150000 is the starting value used in this answer; choose one that suits your data.
    avet1: 150000,
    avet2: 150000
  };
}
function movingAveAdd(p, v) {
  // 'apples' is the property that you're interested in finding the moving average for
  p.total = p.total + v.apples;
  p.count = p.count + 1;
  // Shift the stored averages back one slot before recomputing
  p.avet2 = p.avet1;
  p.avet1 = p.average;
  p.average = p.total / p.count;
  return p;
}
function movingAveRemove(p, v) {
  p.total = p.total - v.apples;
  p.count = p.count - 1;
  p.avet2 = p.avet1;
  p.avet1 = p.average;
  // Avoid NaN (0 / 0) when the last record leaves the group
  p.average = p.count > 0 ? p.total / p.count : 0;
  return p;
}
Next, pass these to your group's reduce() (the dimension should be whichever time period you are using):
var movingAverageGrp = yourDimension.group().reduce(movingAveAdd, movingAveRemove, movingAveInit);
Now, when you build a chart with this group, set its valueAccessor() as follows:
yourChart
  .valueAccessor(function (d) {
    var mAv = (d.value.avet2 + d.value.avet1 + d.value.average) / 3;
    return mAv;
  });
If you stored previous periods in an array, you can calculate the moving average with this accessor instead:
  .valueAccessor(function (d) {
    var arr = d.value.yourArray;
    // Start the sum at 0 so an empty array doesn't make reduce() throw
    var sum = arr.reduce(function (a, b) { return a + b; }, 0);
    var count = arr.length;
    return count > 0 ? sum / count : 0;
  })
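The answer mentions replacing avet1/avet2 with an array that you push to and splice from. Below is one possible sketch of that variant for an arbitrary window length n; makeMovingAveReducers, its field parameter and the averages array (which plays the role of yourArray in the accessor) are hypothetical names. As with avet1 and avet2, the stored values are successive running averages of the same group, recorded each time a record is added to or removed from it, not the averages of neighbouring time periods. Unlike the fixed 150000 seed, the array starts empty, so early values average over fewer entries.

```javascript
// Array-based variant of the reducers (a sketch): keep the most recent
// `n` running averages of this group instead of two fixed slots.
// `field` names the record property to average (e.g. "apples").
function makeMovingAveReducers(n, field) {
  function update(p) {
    // Recompute the running average; guard against an empty bucket.
    p.average = p.count > 0 ? p.total / p.count : 0;
    // Push the new average and drop the oldest entries beyond n.
    p.averages.push(p.average);
    if (p.averages.length > n) {
      p.averages.splice(0, p.averages.length - n);
    }
    return p;
  }
  return {
    add: function (p, v) {
      p.total += v[field];
      p.count += 1;
      return update(p);
    },
    remove: function (p, v) {
      p.total -= v[field];
      p.count -= 1;
      return update(p);
    },
    init: function () {
      return { total: 0, count: 0, average: 0, averages: [] };
    }
  };
}
```

It would be wired up like the original reducers, e.g. var r = makeMovingAveReducers(5, "apples"); yourDimension.group().reduce(r.add, r.remove, r.init);, with d.value.averages used in place of d.value.yourArray in the accessor.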
It's not a particularly beautiful solution, but it seems to work for me.

Related

Looping through keys in Map() object

I'm working with Map() and need an efficient method to loop through all the keys.
Specifically the keys are non matrices, and the image is a t_List of real vectors.
My current method is to turn the Map into a matrix and loop through like below
M = Map();
...\\ fill up the map with stuff
matM = Mat(M);
for(i = 1, matsize(matM)[1],
  L = matM[i, 2];
  \\ proceed to do stuff with L
);
However my understanding is that matM will create a copy of the data inside M, which I'd like to avoid if possible. My only other thought is to create a supplementary list of the ideals as the Map is filled, and then to iterate through that.
Is there a better way to handle this?
You can loop over the map with foreach():
{
foreach(M, item,
my(key = item[1][1]);
my(value = item[1][2]);
print(Str(key, ": ", value));
);
}
It looks a little weird because the variable item contains a vector whose first entry is another vector holding the key and the value.
If you're going to use it often you could define a function like this:
foreachMap(M, expr) = foreach(M, item, expr(item[1][1], item[1][2]));
lambda = (key, value) -> print(Str(key, ": ", value));
foreachMap(M, lambda);

How to call a function for every position within a string in R?

As my question suggests, I have been tasked with writing a function which calls another function at every position of a vector. The following is the original function that I currently have:
find.TATA = function(k,s) {
v = string.to.vec(s)
i = v[k:(k+5)]
TATA = "TATAAA"
TATA.v = string.to.vec(TATA)
return(all(i==TATA.v))
}
As you can see, the function takes both a string (in this case a DNA sequence) and a position (k) within the sequence, and returns either TRUE or FALSE depending on whether "TATAAA" occurs at position k.
I was wondering how it would be possible to write a second function which calls the first function at every position in the input string (1:995). The result should return either TRUE or FALSE for every position. I will then modify the function using a dummy variable to count the number of times the result comes up as TRUE. Thanks in advance!
(P.S. Could any solutions please try to avoid using content from packages as we have been told to solve this using base R functionality)
This is a fairly primitive way of doing things:
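count.TATA <- function(string) {
  count <- 0
  # The last valid start is nchar(string) - 5, so substr() always gets 6 characters.
  # (Writing 1:nchar(string)-5 would parse as (1:nchar(string)) - 5.)
  for (i in seq_len(max(0, nchar(string) - 5))) {
    if (substr(string, i, i + 5) == "TATAAA") {
      count <- count + 1
    }
  }
  return(count)
}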

for loop function in Seurat analysis in R

I am trying to get multiple outputs from a function I wrote:
ratio_marker_out_2 = function(marker_gene, cluster_id){
  marker_gene = list(row.names(FindMarkers(glioblastoma, ident.1 = cluster_id)))
  for (gene in marker_gene){
    all_cells_all_markers = glioblastoma@assays$RNA@counts[gene,]
    selected_cells_all_marker = all_cells_all_markers[cluster_id!=Idents(glioblastoma)]
    gene_count_out_cluster = glioblastoma@assays$RNA@counts[,cluster_id!=Idents(glioblastoma)]
    ratio_out = sum(selected_cells_all_marker)/sum(gene_count_out_cluster)
  }
  return(ratio_out)
}
Here, marker_gene has a few hundred entries; let's say 100. I want to get ratio_out for each gene in marker_gene. However, when I run this function, I only get one output instead of a list of 100 ratio_out values. Could anyone please help me fix it?
The output I got for
ratio_marker_out_2(marker_gene, 0)
is [1] 0.5354895.
The problem may be the built-in sum function, which returns a single number. So when you do:
ratio_out = sum(selected_cells_all_marker)/sum(gene_count_out_cluster)
you're dividing one number by another, which gives a single value.
If you want to return a vector of ratios instead, then, depending on your calculations, drop the sum() from the numerator:
ratio_out = (selected_cells_all_marker)/sum(gene_count_out_cluster)
I have solved this issue using:
all_cells_all_markers[marker_gene, cluster_id!=Idents(glioblastoma)]
ratio_out = (selected_cells_all_marker)/sum(gene_count_out_cluster)

Run R Script on Power BI Column

I have a dataset in Power BI with many columns, which contain information on incident tickets (e.g. how long it took to solve the issue).
Unfortunately, the data I'm getting is not in the correct time format. I wrote a simple R function which recalculates the time and returns the correct value:
calculateHours <- function(hours) {
  x <- trunc(hours / 24)
  rest <- hours %% 24  # base R has no mod(); %% is the modulo operator
  y <- trunc(rest / 10)
  z <- rest %% 10
  result <- ((x + y) * 10) + z
  return(result)
}
For example, 204 hours becomes 92 hours when run through the function.
Now I need to have a new column with the calculated values in it.
E.g. 'Business Elapsed Time = 204' -> 'Business Elapsed Time calculated (new Column) = 92'
How can I use this function in Power BI to add a new column which uses the values from another column of this table and then calculates the correct time values?
I'm still new to Power BI and R, so any help would be appreciated! Thanks in advance!
In Power BI Query Editor you can add an R Script (Transform -> Run R script) to your query. Here's a simple example that assumes you have a column Number:
# 'dataset' holds the input data for this script
myfunction <- function(x)
{
return (x + 1)
}
dataset$NewNumber <- myfunction(dataset$Number) ## apply function and add result as new column
output <- dataset ## PowerBI uses "output" as result from this query step
Here's a more detailed intro: https://www.red-gate.com/simple-talk/sql/bi/power-bi-introduction-working-with-r-scripts-in-power-bi-desktop-part-3/
Power Query can handle most of this calculation itself using an M formula. That is simpler than invoking an R script, better integrated, and probably faster.
In Power Query Editor, go to Add Column > Custom Column, then enter an M formula like the one below:
let
x = Number.IntegerDivide([Hours], 24),
rest = Number.Mod([Hours], 24),
y = Number.IntegerDivide(rest, 10),
z = Number.Mod(rest, 10),
result = (x + y) * 10 + z
in
result

Using a for loop to count negative values and add to new column

I'm fairly new to R and want to experiment with different things in it.
I have created a data frame and I want to use a for loop to count the number of negative values, then add this number to a new column in my data frame. I know there are easier ways to do this, but I really want to get the hang of loops in R.
Does anyone have any advice for me? Here is my data frame:
newframe <- data.frame(V1=runif(500, min=-2, max=2),
V2=runif(500, min=-2, max=2))
Thanks in advance!
Yes, there are clearly easier ways to do this, but I will give you some advice for a loop.
First, let's create an additional column:
nf=data.frame(newframe,neg=rep(NA)) #I called the column "neg"
Second, define the range over which the loop will run:
for (i in 1:length(newframe[,1])) {
}
Note: I wrote length(newframe[,1]) instead of 500 directly. It is better to derive the bound from the object you are looping over: if you add more rows to your data.frame, the loop will still cover all of them.
Inside the loop, if you only want the negative numbers, use an if condition to pick them out:
(we are working here on the first column of your data frame but you could also do a "for" loop to work on every column)
if(newframe[i,1]<0){
nf$neg[i]=newframe[i,1]
}
Then you have the solution for the first column! I'll let you do the second column on your own :)
First get the number of rows and columns of your dataset:
nr = NROW(newframe); # you can use the lowercase variant as well or the dim command
nc = NCOL(newframe);
and then cycle through rows and columns:
response = 0;
for(ir in 1:nr) { # cycle through rows
  for(ic in 1:nc) { # cycle through columns
    response = response + ifelse(newframe[ir, ic]<0, 1, 0);
  }
}
The response variable contains the number of negative values of your dataset.
The loop works like a for-each statement. Take the outer loop: ir takes each value of the sequence generated by 1:nr (that is 1, 2, ..., nr).
Another example is
x = c("a", "b", "c");
for(v in x) {
cat(v, "\n");
}
you'll get the output:
a
b
c
This means you can achieve the same result with a single for-loop:
response = 0;
for(v in unlist(newframe)) { # you need to un-list the dataframe otherwise you would cycle through the columns...
response = response + ifelse(v<0, 1, 0);
}
And you can always switch from a for-loop to a while loop if you prefer:
response = 0;
ir = 1;
while(ir<=nr) { # cycle through rows
  ic = 1;
  while(ic<=nc) { # cycle through columns
    response = response + ifelse(newframe[ir, ic]<0, 1, 0);
    ic = ic + 1;
  }
  ir = ir + 1;
}
I hope I have answered your question :)
