I am trying to write a single LINQ query to compute aggregates like sum/average/min/max and count. In my app, the user selects the aggregate, which is sent from the UI and stored in a variable. How can I apply the selected aggregate dynamically in the LINQ query?
Sample query:
var selectedAggregate = "Count";
var xaxisparam2 = (from b in FiltersList
                   where (--column name--)
                   group b by (--column name--) into c
                   select new
                   {
                       XaxisVal = c.Key,
                       AggreMeasure = c.Average(--column name--),
                   }).ToList();
AggreMeasure = c.Average(--column name--),
On this line, the user-selected aggregate has to be applied dynamically in place of Average.
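One hedged approach for LINQ to Objects is to map the selected name to a delegate and call that delegate inside the query. This is only a sketch: FiltersList, SomeColumn, and SomeValue are hypothetical stand-ins for your actual collection and columns, and the grouped values are assumed to be double.

using System;
using System.Collections.Generic;
using System.Linq;

// Map the user's selection to an aggregate over a group of values.
Func<IEnumerable<double>, double> aggregate;
switch (selectedAggregate)
{
    case "Sum":     aggregate = g => g.Sum();     break;
    case "Average": aggregate = g => g.Average(); break;
    case "Min":     aggregate = g => g.Min();     break;
    case "Max":     aggregate = g => g.Max();     break;
    default:        aggregate = g => g.Count();   break;  // "Count"
}

var xaxisparam2 = (from b in FiltersList
                   group b by b.SomeColumn into c
                   select new
                   {
                       XaxisVal = c.Key,
                       AggreMeasure = aggregate(c.Select(x => x.SomeValue))
                   }).ToList();

Note that a plain delegate like this cannot be translated to SQL, so for an IQueryable provider (e.g. Entity Framework) you would need expression trees or a library such as System.Linq.Dynamic instead.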
I need to search through a few million rows of data for a year that is sent as a parameter to a method. The year comes as a varchar.
This is the query I'm working with
SELECT X,Y
FROM A
WHERE mch_code = 'KN'
AND contract = '15KTN'
AND to_char(cre_date, 'YYYY') = year_;
cre_date is of type DATE and year_ is of type VARCHAR.
When executing this query, it takes around 25 minutes to complete.
Does anyone know of a different approach that would execute faster?
This didn't work out:
SELECT X,Y
FROM A
WHERE mch_code = 'KN'
AND contract = '15KTN'
AND cre_date LIKE '%2013';
The reason might be that cre_date and '%2013' are of different types: Oracle implicitly converts the DATE to a string using the session's NLS_DATE_FORMAT, which does not necessarily end in a four-digit year.
If you have an index on (mch_code, contract, cre_date) columns, you can improve performance by doing something like:
select x, y
from a
where mch_code = 'KN'
and contract = '15KTN'
and cre_date >= to_date('01/01/'||year_, 'dd/mm/yyyy')
and cre_date < add_months(to_date('01/01/'||year_, 'dd/mm/yyyy'), 12);
Even better would be to declare the start of the year as a DATE variable prior to running the SQL, e.g.:
v_year_dt := to_date('01/01/'||year_, 'dd/mm/yyyy');
which would make the query:
select x, y
from a
where mch_code = 'KN'
and contract = '15KTN'
and cre_date >= v_year_dt
and cre_date < add_months(v_year_dt, 12);
If you don't have an index on those three columns, you could create a function based index on (mch_code, contract, to_char(cre_date, 'yyyy')) that should help speed up your query, depending on the percentage of rows you're expecting to select. It may help even more if you added the x and y columns into the index, so that no table access was required at all.
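For example, such a function-based index might look like this (the index name is illustrative; the expression should match the query's predicate exactly so the optimizer can use it):

create index a_mch_con_year_fbi
    on a (mch_code, contract, to_char(cre_date, 'YYYY'));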
Alternatively, you could think about partitioning the table on cre_date, monthly or yearly.
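A sketch of yearly range partitioning, with illustrative column types and partition names:

create table a (
    mch_code  varchar2(10),
    contract  varchar2(10),
    cre_date  date,
    x         number,
    y         number
)
partition by range (cre_date) (
    partition p2013 values less than (to_date('01/01/2014', 'dd/mm/yyyy')),
    partition p2014 values less than (to_date('01/01/2015', 'dd/mm/yyyy')),
    partition pmax  values less than (maxvalue)
);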
The reason your query is slow is that you're applying a function to a column on every row in your table. Let's try it another way:
SELECT X, Y
FROM A
WHERE mch_code = 'KN' AND
      contract = '15KTN' AND
      CRE_DATE >= TO_DATE('01/01/' || year_, 'DD/MM/YYYY') AND
      CRE_DATE < TO_DATE('01/01/' || year_, 'DD/MM/YYYY') + INTERVAL '1' YEAR;
This eliminates the need to apply a function to every row in the table and should allow any index on CRE_DATE to be used. (A half-open range is used rather than BETWEEN, which is inclusive at both ends and would also match rows stamped exactly at midnight on 1 January of the following year.)
Best of luck.
You can try the EXTRACT function, although note that, like TO_CHAR, it applies a function to the column, so a plain index on cre_date will still not be used (a matching function-based index would be needed):
SELECT X,Y
FROM A
WHERE mch_code = 'KN'
AND contract = '15KTN'
AND EXTRACT(YEAR FROM cre_date) = year_;
I have a dictionary of names, each with a number (a score) assigned to them. The file is laid out like so:
Person A,7
Person B,6
If a name is repeated in the file (e.g. Person B occurs on 3 lines with 3 different scores), I want to calculate the mean of those scores and then append the result to a dictionary in the form of a list. However, I keep encountering an error when I try to sort the dictionary. Code below.
else:
    for key in results:
        keyValue = results[key]
        if len(keyValue) > 1:
            # Line below this needs modification
            keyValue = list(sum(keyValue)/len(keyValue))
            newResults[key] = keyValue
            # Error in above code...
        else:
            newResults[key] = keyValue
    print(newResults)
    print(sorted(zip(newResults.values(), newResults.keys()), reverse=True))
Results is a dictionary of the people (the keys) and their scores (the values) where the values are lists so that:
results = {'Bob':[7],'Jane':[8,9]}
If you're using Python 3.4+ you can use the statistics module, which contains a mean function. Now, assuming that your dict looks like results = {'Bob': [7], 'Jane': [8, 9]}, you can create a newResults dict like this:
from statistics import mean
newResults = {key: mean(results[key]) for key in results}
This is called a dict comprehension and, as you can see, it's quite intuitive. The opening { says that a dict is going to be created. The key: value part defines its structure. Finally, the for loop iterates over the collection used to build the dict. You can achieve the same with:
newResults = {}
for key in results:
newResults[key] = mean(results[key])
You want to sort the dict at the end. Unfortunately, that's not possible with a plain dict (before Python 3.7, dicts don't even preserve insertion order). You can either create an OrderedDict, which remembers the order in which items were inserted, or a list containing the sorted keys of your dict. The latter looks like:
sortedKeys = sorted(newResults, key=lambda x: newResults[x])
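For the OrderedDict route, a minimal sketch building on the newResults dict above:

from collections import OrderedDict

# Insert items in ascending order of mean score; OrderedDict remembers
# that insertion order when you later iterate over it.
orderedResults = OrderedDict(
    sorted(newResults.items(), key=lambda item: item[1])
)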
This is a newbie R question. I am beginning to explore the use of R for website analytics. I have a set of page view events which have common properties along with an arbitrary set of properties that depend on the page. For instance, all events will have a userId, createdAt, and pageId, but the "signup" page might have a special property origin whose value could be "adwords" or "organic", etc.
In JSON, the data might look like this:
[
  {
    "userId": null,
    "pageId": "home",
    "sessionId": "abcd",
    "createdAt": 1381013741,
    "parameters": {}
  },
  {
    "userId": 123,
    "pageId": "signup",
    "sessionId": "abcd",
    "createdAt": 1381013787,
    "parameters": {
      "origin": "adwords",
      "campaignId": 4
    }
  }
]
I have been struggling to represent this data in R data structures effectively. In particular I need to be able to subset the event list by conditions based on the arbitrary key/value pairs, for instance, select all events whose pageId=="signup" and origin=="adwords".
There is enough diversity in the keys used for the arbitrary parameters that it seems unreasonable to create sparsely-populated columns for every possible key.
What I'm currently doing is pre-processing the data into two CSV files, core_properties.csv and parameters.csv, in the form:
# core_properties.csv (one record per pageview)
userId,pageId,sessionId,createdAt
,home,abcd,1381013741
123,signup,abcd,1381013787
...
# parameters.csv (one record per k/v pair)
row,key,value # <- "row" here denotes the record index in core_properties.csv
1,origin,adwords
1,campaignId,4
...
I then read.table each file into a data frame, and I am now attempting to store the k/v pairs as a list (with names = keys) inside cells of the core events data frame. This has been a lot of awkward trial and error, and the best approach I've found so far is the following:
events <- read.csv('core_properties.csv', header=TRUE)
parameters <- read.csv('parameters.csv', header=TRUE,
                       colClasses=c("character", "character", "character"))

paramLists <- sapply(1:nrow(events), function(x) { list() })
apply(parameters, 1, function(x) {
  paramLists[[ as.numeric(x[["row"]]) ]][[ x[["key"]] ]] <<- x[["value"]]
})
events$parameters <- paramLists
I can now access the origin property of the first event by the syntax: events[1,][["parameters"]][[1]][["origin"]] - note it requires for some reason an extra [[1]] subscript in there. Data frames do not seem to appreciate being given lists as individual values for cells:
> events[1,][["parameters"]] <- list()
Error in `[[<-.data.frame`(`*tmp*`, "parameters", value = list()) :
replacement has 0 rows, data has 1
Is there a best practice for handling this sort of data? I have not found it discussed in the manuals and tutorials.
Thank you!
You can use nested lists in R, which map nicely to JSON. Below is a simple example that filters based on the origin parameter.
dat <- list(
  list(userId = NULL, pageId = "home", createdAt = 1381013741,
       parameters = list()),
  list(userId = NULL, pageId = "signup", createdAt = 1381013787,
       parameters = list(origin = 'adwords', campaignId = 4))
)

# Guard on the length of parameters (not of the whole record) so that
# events with no parameters are skipped instead of raising an error.
Filter(function(l) {
  length(l$parameters) > 0 && identical(l$parameters$origin, 'adwords')
}, dat)
I am looking over some code that another programmer made where he calls a stored procedure. Before calling it, he creates an Array with the parameters needed for the stored procedure to query the table. He creates the array like this:
param = Array("#Name", 3, 8, "Tom", _
              "#Age", 3, 8, 28, _
              "#City", 100, 200, "Toronto")
The stored procedure uses #Name, #Age and #City to query the table.
My question is, what are the numbers in between for?
It looks like each parameter is described by four entries (name, type, length, value):
#Name = parameter name
3 = adInteger (type)
8 = length
"Tom" = value
#Age = parameter name
3 = adInteger (type)
8 = length
28 = value
#City = parameter name
100 = length
200 = adVarChar (type)
"Toronto" = value
Note that for #City the length/type pair appears in the opposite order from the other two parameters (adVarChar is 200 in ADO).
Here is a list of the other ADO data types:
http://www.w3schools.com/ado/ado_datatypes.asp
My guess is that he is using an array of params, much like this: https://stackoverflow.com/a/10142254/2385, where I use an array of params to pass to a function that adds them to the ADO command.
Without comments, or without stepping through the code, it's impossible to know for sure.
Otherwise, if this is ASP.NET, the best you can do is look at the SqlParameter class and see the properties it has available:
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlparameter.aspx
I think you have two strong candidates for ParameterName and Value, but the two numeric values could be a few different things. 3 happens to be the numeric value of SqlDbType.Char, while 100 has no corresponding SqlDbType value at all (the default SqlDbType for a SqlParameter is NVarChar).
The next number could be precision. Take a look at the Database table and see if you can match those values to the fields. For example, is City VarChar(200)?
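For reference, a hypothetical helper that consumes such an array, assuming it holds (name, type, length, value) quadruples and that cmd is a prepared ADODB.Command:

' Hypothetical: assumes (name, type, length, value) quadruples.
Dim i As Long
For i = LBound(param) To UBound(param) Step 4
    cmd.Parameters.Append cmd.CreateParameter( _
        param(i), param(i + 1), adParamInput, param(i + 2), param(i + 3))
Next i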
Suppose there is a column in the database called INCOME_PER_DAY, and I display this column's data in a GridView.
My question is: how do I find the total sum of the INCOME_PER_DAY column using C#?
Do this on the server side (in the database).
Return 2 recordsets: one with the details and a second one (a single row) with SUM(INCOME_PER_DAY).
or use this query:
SELECT ROW_TYPE = 1, FIELD1, FIELD2, FIELD3, INCOME_PER_DAY FROM MYSALES
UNION ALL
SELECT ROW_TYPE = 2, NULL, NULL, NULL, INCOME_PER_DAY = SUM(INCOME_PER_DAY) FROM MYSALES
ROW_TYPE = 1 - detail row
ROW_TYPE = 2 - summary row
On the page, use, for example, a DataGrid; in its ItemDataBound event handler, check ROW_TYPE to apply the appropriate CSS style (detail or summary).
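A hypothetical handler along those lines, assuming the grid is bound to the UNION query above and that "detail" and "summary" are CSS classes you have defined:

protected void MySales_ItemDataBound(object sender, DataGridItemEventArgs e)
{
    if (e.Item.ItemType == ListItemType.Item ||
        e.Item.ItemType == ListItemType.AlternatingItem)
    {
        // ROW_TYPE comes through in the bound data row.
        var row = (System.Data.DataRowView)e.Item.DataItem;
        e.Item.CssClass = (int)row["ROW_TYPE"] == 2 ? "summary" : "detail";
    }
}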
Unfortunately, you have to loop through the column and add up rows line-by-line.
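A minimal sketch of that loop, assuming a GridView named myGridView with INCOME_PER_DAY rendered in cell index 0 (both names are hypothetical):

decimal total = 0;
foreach (GridViewRow row in myGridView.Rows)
{
    decimal value;
    // Skip cells that don't parse as numbers (e.g. empty rows).
    if (decimal.TryParse(row.Cells[0].Text, out value))
        total += value;
}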