multiply(num) aggregate function in postgresql - aggregate-functions

This could be incredibly simple, but the documentation is quiet on it. Is there a way to aggregate a column via the multiplication operator in PostgreSQL? I know I can do count(column) or sum(column), but is there a multiply(column) or product(column) function that I can use? If not, any ideas how to achieve it?
I'm using postgres 9.1
regards,
Hassan

Sure, just define an aggregate over the base multiplication function. E.g. for bigint:
CREATE AGGREGATE mul(bigint) ( SFUNC = int8mul, STYPE=bigint );
Example:
regress=> SELECT mul(x) FROM generate_series(1,5) x;
mul
-----
120
(1 row)
See CREATE AGGREGATE

Here is a version that works for all numerical data types:
CREATE FUNCTION mul_sfunc(anyelement, anyelement) RETURNS anyelement
  LANGUAGE sql AS 'SELECT $1 * coalesce($2, 1)';
CREATE AGGREGATE mul(anyelement) (
  STYPE       = anyelement,
  INITCOND    = 1,
  SFUNC       = mul_sfunc,
  COMBINEFUNC = mul_sfunc,
  PARALLEL    = SAFE
);
Note that COMBINEFUNC and PARALLEL = SAFE require PostgreSQL 9.6 or later; on older versions (including 9.1 from the question), simply omit those two options.

Related

Use weighted.mean in summary_rows GT package

I've been searching for a solution to using weighted.mean with summary_rows in the GT package.
The summary_rows function only accepts functions of the form foo(x), so functions with more arguments, such as weighted.mean(x, w), are not accepted.
When using summary_rows with groups, such as:
summary_rows(groups = T, columns = c, fns = list("average" = ~mean(.)),...)
It takes the vector of values for each group and runs it through mean(), or through each function in the chosen list.
My solution to this is quite cumbersome. I wrote a custom function that takes the vector of values provided by summary_rows and compares it to expected vectors using if statements. This only works for a single column at a time, so it requires quite a lot of code, both in the custom functions and in the code for the GT table.
weighted_mean_age <- function(x) {
  if (all(x == some.data$age.column[some.data$group.column == "group name"])) {
    weighted.mean(x, some.data$no.occurences[some.data$group.column == "group name"])
  } else if (...) {  # compare against another group's vector, and so on for every group
  }
}
Did anyone deal with the same problem, but came up with less cumbersome solution? Did I miss something in the GT package?
Thank you for your time and ideas.
First I need to clarify the assumption that I used for this answer:
You want to pass something like weighted.mean(., w) to this summary_rows() function, but this isn't possible due to the limitations of the gt library that you outlined in your question. If that is the case, then I believe I have a solution:
I've done similar 'hacks' when creating some very specific Python scripts; the idea was to map the functions I wanted to use into a specific container. So I searched whether something like this is also possible in R, and apparently it is, using factory functions stored in a container. Here is a step-by-step guide:
You first need to create a factory function for your weighted.mean as such:
my_mean <- function(w) { function(x) { weighted.mean(x,w) } }
Then you need to populate some kind of container with your new functions (I am using a list):
func_list <- list()
func_list[[some_weight]] <- my_mean(some_weight)
func_list[[different_w]] <- my_mean(different_w)
#etc...
Once you've done that, you should be able to pass these functions to summary_rows, i.e.:
summary_rows(
  groups = T,
  columns = c,
  fns = list("w_mean" = ~func_list[[w]](.)),
  ...)
Note the double brackets: func_list[[w]] extracts the function itself, whereas func_list[w] would return a one-element list that cannot be called.
Bear in mind that you have to fill in the w values yourself using some form of mapping function or a loop.
Hope it is what you are looking for and I hope it helps!
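To see the factory in action, here is a tiny self-contained check (illustrative numbers only, not tied to any particular dataset):

```r
# Factory: fixing the weights w yields a one-argument function of x
my_mean <- function(w) function(x) weighted.mean(x, w)

f <- my_mean(c(1, 2, 3))   # weights baked in once
f(c(10, 20, 30))           # (10*1 + 20*2 + 30*3) / 6 = 23.333...
```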

full_join by date plus one or minus one

I want to use full_join to join two tables. Below is my pseudo code:
join <- full_join(a, b, by = c("a_ID" = "b_ID" , "a_DATE_MONTH" = "b_DATE_MONTH" +1 | "a_DATE_MONTH" = "b_DATE_MONTH" -1 | "a_DATE_MONTH" = "b_DATE_MONTH"))
a_DATE_MONTH and b_DATE_MONTH are in date format "%Y-%m".
I want to do full join based on condition that a_DATE_MONTH can be one month prior to b_DATE_MONTH, OR one month after b_DATE_MONTH, OR exactly equal to b_DATE_MONTH. Thank you!
While SQL allows (almost) arbitrary conditions in a join statement (such as a_month = b_month + 1 OR a_month + 1 = b_month), I have not found that dplyr allows the same flexibility.
The only way I have found to join in dplyr on anything other than a_column = b_column is to do a more general join and filter afterwards. Hence I recommend you try something like the following:
join <- full_join(a, b, by = c("a_ID" = "b_ID")) %>%
  filter(abs(a_DATE_MONTH - b_DATE_MONTH) <= 1)
Note that the filter assumes the two columns can be subtracted to give a difference in months; if they are stored as "%Y-%m" strings or as dates, convert them to a month count first so that a difference of 1 really means one month.
This approach still produces the same records in your final results.
It may perform worse/slower if R does the complete full join before doing any filtering. However, when dplyr runs against a database backend it uses lazy evaluation, which means that (unless you do something unusual) both commands should be evaluated together (as they would be in a more complex SQL join).
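The join-then-filter idea can be sketched self-contained in base R (months encoded as integer counts for brevity, merge(..., all = TRUE) standing in for full_join, and all column names made up):

```r
a <- data.frame(ID = c(1, 2, 3), a_month = c(3, 7, 12))
b <- data.frame(ID = c(1, 2, 3), b_month = c(4, 10, 12))

joined <- merge(a, b, by = "ID", all = TRUE)         # full join on ID only
keep   <- abs(joined$a_month - joined$b_month) <= 1  # within one month either way
joined[keep, ]                                       # IDs 1 and 3 survive
```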

Using apply function for an existing function

I am using the "edgarWebR" package to get data from the US SEC EDGAR website. There is a function in the package called "company_filings", which has several arguments; I would like to use four of them, like this -
company_filings(comp, type = c('10-K','10-Q'), before = 20181231, count = 40)
where comp is a vector defined as follows -
comp <- c("AAPL", "GOOG", "INTC")
but the company_filings function accepts only one element of comp at a time - for example -
company_filings("AAPL", type = c('10-K','10-Q'), before = 20181231, count = 40)
Actually, I use the following code to get the results for all elements in comp vector -
filing <- Reduce(rbind, lapply(comp, company_filings))
but it does not work. Can anybody help me in this respect?
I appreciate your help.
To use functions in the apply family, the function in question should take a single argument. You can create an anonymous function of one variable from a function of several variables and do something like:
sapply(comp, function(x) { company_filings(x, type = c('10-K','10-Q'), before = 20181231, count = 40) })
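Because company_filings needs network access, here is the same pattern with a toy stand-in (my_filings is a hypothetical placeholder, not part of edgarWebR). It also shows do.call(rbind, lapply(...)), which is often handier than sapply when each call returns a data frame, as in the original Reduce(rbind, ...) attempt:

```r
# Hypothetical stand-in for a multi-argument API call such as company_filings
my_filings <- function(x, type, count) {
  data.frame(company = x, type = type[1], count = count)
}

comp <- c("AAPL", "GOOG", "INTC")

# Wrap the fixed arguments in an anonymous function, then row-bind the results
filing <- do.call(rbind, lapply(comp, function(x) {
  my_filings(x, type = c("10-K", "10-Q"), count = 40)
}))
filing
```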

How to add new keys and values to existing hash table in R?

Using the hash package in R I created a hash table with keys and values. I want to add new keys and values to the existing hash table. Is there any way?
Suppose
ht <- hash(keys = letters, values = 1:26)
And I need to add new keys and values to ht.
Is there any way other than, for example:
ht$zzz <- 45
The documentation for the hash package provides a number of syntax varieties for adding new elements to a hash:
h <- hash()
.set( h, keys=letters, values=1:26 )
.set( h, a="foo", b="bar", c="baz" )
.set( h, c( aa="foo", ab="bar", ac="baz" ) )
The first .set option would seem to be the best for bulk inserts of key-value pairs. You only need a pair of vectors, ordered so that the key-value pairing is set up the way you want.
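If you would rather avoid the extra dependency, a base-R environment supports the same single and bulk inserts (a sketch of the idea, not the hash-package API):

```r
# Base-R environment used as a hash table (sketch, not the 'hash' package)
ht <- new.env()
ht$zzz <- 45                       # single insert, as in the question

# Bulk insert: paired key and value vectors, matched by position
keys   <- c("aa", "ab", "ac")
values <- c("foo", "bar", "baz")
for (i in seq_along(keys)) assign(keys[i], values[i], envir = ht)

get("ab", envir = ht)              # "bar"
```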

Neo4j v2. Are aggregate functions usable as assignment (set) values?

I can return an aggregate such as:
match (u:User)-[:LIKED]->(o:Offer) return count(u) as numLikes
...but I can't assign from it and keep it pre-counted for speed:
match (u:User)-[:LIKED]->(o:Offer) set o.numLikes = count(u)
Is this possible without using two separate statements?
You need to use WITH:
MATCH (u:User)-[:LIKED]->(o:Offer)
WITH o, count(u) AS c
SET o.numLikes = c
You have to complete the aggregation before you can use the aggregated value; you can do this with WITH, something like:
MATCH (u:User)-[:LIKED]->(o:Offer)
WITH o, COUNT(u) as numLikes
SET o.numLikes = numLikes
