Get average from LINQ query - ASP.NET

I need to get an average from the LINQ query below, but because the data field is varchar, I am getting an error when doing so.
var _districtResult = (from ls in SessionHandler.CurrentContext.LennoxSurveyResponses
join ml in SessionHandler.CurrentContext.MailingListEntries on ls.SurveyCode equals ml.SurveyCode
join m in SessionHandler.CurrentContext.MailingLists on ml.MailingListId equals m.MailingListId
join ch in SessionHandler.CurrentContext.Channels on m.ChannelId equals ch.ChannelId
join chg in SessionHandler.CurrentContext.ChannelGroups on ch.ChannelGroupId equals chg.ChannelGroupId
join tchg in SessionHandler.CurrentContext.ChannelGroups on chg.ParentChannelGroupId equals tchg.ChannelGroupId
where tchg.ParentChannelGroupId == model.LoggedChannelGroupId
select ls).ToList();
ls contains Score1, Score2, Score3, Score4. All of these are varchar(50) in the database, and they allow null values too.
If I need to get an average for Score1 and pass that to my model data, how would I do that?
I tried model.AvgScore1 = _districtResult.Select(m => m.Score1).Average().Value, but I get an error when doing so.

You have to convert (in your example) m.Score1 to a numeric type (here is the MSDN documentation for the Average method); you cannot run a mathematical function on string data. You should also probably check for null.
Something along the lines of this:
_districtResult.Select(m => string.IsNullOrEmpty(m.Score1)
    ? 0                         // null or empty: use 0
    : Double.Parse(m.Score1))   // parse the varchar to a double
    .Average()


Count(case when) Redshift SQL - receiving GROUP BY error

I'm trying to do a count(case when) in Amazon Redshift.
Using this reference, I wrote:
select
    sfdc_account_key,
    record_type_name,
    vplus_stage,
    vplus_stage_entered_date,
    site_delivered_date,
    case when vplus_stage = 'Lost' then -1 else 0 end as stage_lost_yn,
    case when vplus_stage = 'Lost' then 2000 else 0 end as stage_lost_revenue,
    case when vplus_stage = 'Lost' then datediff(month, vplus_stage_entered_date, CURRENT_DATE) else 0 end as stage_lost_months_since,
    count(case when vplus_stage = 'Lost' then 1 else 0 end) as stage_lost_count
from shared.vplus_enrollment_dim
where record_type_name = 'APM Website';
But I'm getting this error:
[42803][500310] [Amazon](500310) Invalid operation: column "vplus_enrollment_dim.sfdc_account_key" must appear in the GROUP BY clause or be used in an aggregate function; java.lang.RuntimeException: com.amazon.support.exceptions.ErrorException: [Amazon](500310) Invalid operation: column "vplus_enrollment_dim.sfdc_account_key" must appear in the GROUP BY clause or be used in an aggregate function;
Query was running fine before I added the count. I'm not sure what I'm doing wrong here -- thanks!
You cannot have an aggregate function (SUM, COUNT, etc.) alongside non-aggregated columns without a GROUP BY.
The syntax is like this:
select a, count(*)
from table
group by a (or group by 1 in Redshift)
In your query you need to add
group by 1,2,3,4,5,6,7,8
because you have 8 columns other than the count.
Since I don't know your data and use case, I cannot tell you it will give you the right result, but the SQL will be syntactically correct.
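Applying that, the corrected query would look like the sketch below. One caveat worth knowing: COUNT counts every non-NULL value, and 0 is non-NULL, so count(case when ... then 1 else 0 end) counts all rows in the group; if the intent is to count only the 'Lost' rows, use sum(case when ... then 1 else 0 end) instead.
select
    sfdc_account_key,
    record_type_name,
    vplus_stage,
    vplus_stage_entered_date,
    site_delivered_date,
    case when vplus_stage = 'Lost' then -1 else 0 end as stage_lost_yn,
    case when vplus_stage = 'Lost' then 2000 else 0 end as stage_lost_revenue,
    case when vplus_stage = 'Lost' then datediff(month, vplus_stage_entered_date, CURRENT_DATE) else 0 end as stage_lost_months_since,
    count(case when vplus_stage = 'Lost' then 1 else 0 end) as stage_lost_count -- or sum(...), see note above
from shared.vplus_enrollment_dim
where record_type_name = 'APM Website'
group by 1,2,3,4,5,6,7,8;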
The basic rule is:
If you are using an aggregate function (eg COUNT(...)), then you must supply a GROUP BY clause to define the grouping
Exception: If all columns are aggregates (eg SELECT COUNT(*), AVG(sales) FROM table)
Any columns that are not aggregate functions must appear in the GROUP BY (eg SELECT year, month, AVG(sales) FROM table GROUP BY year, month)
Your query has a COUNT() aggregate function mixed in with non-aggregate values, which is what gives rise to the error.
Looking at your query, you probably don't want to group on all of the columns (eg stage_lost_revenue and stage_lost_months_since don't look like likely grouping columns). You might want to mock up a query result to figure out what you actually want from such a query.
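For example, if the goal is one summary row per account, a plausible shape would be the sketch below; the choice of grouping columns and aggregate functions here is a guess at the intent, not a drop-in fix:
select
    sfdc_account_key,
    record_type_name,
    max(case when vplus_stage = 'Lost' then 1 else 0 end) as stage_lost_yn,
    sum(case when vplus_stage = 'Lost' then 2000 else 0 end) as stage_lost_revenue,
    sum(case when vplus_stage = 'Lost' then 1 else 0 end) as stage_lost_count
from shared.vplus_enrollment_dim
where record_type_name = 'APM Website'
group by 1, 2;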

NetSuite Saved Search criteria formula {activity.assigned} = specific employee

We have a saved search using a Formula (Numeric) as criteria, checking for a 0/1 result. Below is an example of a functional piece:
CASE WHEN {activity.date} >= {today}-90 AND {activity.assigned} = {salesrep} THEN 1 ELSE 0 END
We also require looking up a specific sales rep here where we know the employee record. We have tried the below:
CASE WHEN {activity.date} >= {today}-90 AND {activity.assigned} = EMPLOYEE_ID THEN 1 ELSE 0 END
Where EMPLOYEE_ID = the internalId of the Employee. This yields an error: "Your formula has an error in it. It could resolve to the wrong datatype, use an unknown function, or have a syntax error. Please go back, correct the formula, and re-submit." This happens with "" surrounding it as well. I believe it's looking for a Sales Rep object, though I have no idea how to supply this via the formula input field.
Use parts of the formula in the results to see what is being returned. For an Activity, {assigned} returns not a numeric ID but the text of the name (entityid / Employee ID), e.g. E55 Doe, Jane.
case when {assigned} = 'E44 Doe, Jane' then 1 else 0 end
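Folding that back into the original criteria formula gives something like the sketch below; the string literal must match exactly what {assigned} returns for your employee record (the name here is just the example above):
CASE WHEN {activity.date} >= {today}-90 AND {activity.assigned} = 'E44 Doe, Jane' THEN 1 ELSE 0 END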

SQL Query assistance - Looping through data query

I have two tables. Config and Data. Config table has info to define what I call "Predefined Points". The columns are configId, machineId, iotype, ioid, subfield and predeftype. I have a second table that contains all the data for all the items in the config table linked by configId. Data table contains configId, timestamp, value.
I am trying to return each row from the config table with 2 new columns in the result: the min timestamp and the max timestamp of that particular predefined point.
Pseudocode would be
select a.*, min(b.timestamp), max(b.timestamp) from TrendConfig a join TrendData b on a.configId = b.configId where configId = (select configId from TrendConfig)
Where the subquery would return multiple values.
Any idea how to formulate this?
Try an inner join:
select a.*, min(b.timestamp), max(b.timestamp)
from config a
inner join data b
on a.configId = b.configID
I was able to find an answer using: Why can't you mix Aggregate values and Non-Aggregate values in a single SELECT?
The solution was indeed GROUP BY as CL mentioned above.
select a.*, min(b.timestamp), max(b.timestamp)
from TrendConfig a
join TrendData b on a.configId = b.configId
group by a.configId
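Note that grouping by configId alone while selecting a.* relies on the engine treating the other config columns as functionally dependent on the key (MySQL and SQLite accept this; many engines do not). A portable version, with the column names taken from the question, lists them all explicitly:
select a.configId, a.machineId, a.iotype, a.ioid, a.subfield, a.predeftype,
       min(b.timestamp) as first_timestamp,
       max(b.timestamp) as last_timestamp
from TrendConfig a
join TrendData b on a.configId = b.configId
group by a.configId, a.machineId, a.iotype, a.ioid, a.subfield, a.predeftype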

How to UPDATE multiple columns using a correlated subquery in SQLite?

I want to update multiple columns in a table using a correlated subquery. Updating a single column is straightforward:
UPDATE route
SET temperature = (SELECT amb_temp.temperature
FROM amb_temp.temperature
WHERE amb_temp.location = route.location)
However, I'd like to update several columns of the route table. As the subquery is much more complex in reality (JOIN with a nested subquery using SpatiaLite functions), I want to avoid repeating it like this:
UPDATE route
SET
temperature = (SELECT amb_temp.temperature
FROM amb_temp.temperature
WHERE amb_temp.location = route.location),
error = (SELECT amb_temp.error
FROM amb_temp.temperature
WHERE amb_temp.location = route.location),
Ideally, SQLite would let me do something like this:
UPDATE route
SET (temperature, error) = (SELECT amb_temp.temperature, amb_temp.error
FROM amb_temp.temperature
WHERE amb_temp.location = route.location)
Alas, that is not possible. Can this be solved in another way?
Here's what I've been considering so far:
use INSERT OR REPLACE as proposed in this answer. It seems it's not possible to refer to the route table in the subquery.
prepend the UPDATE query with a WITH clause, but I don't think that is useful in this case.
For completeness' sake, here's the actual SQL query I'm working on:
UPDATE route SET (temperature, time_distance) = ( -- (C)
SELECT -- (B)
temperature.Temp,
MIN(ABS(julianday(temperature.Date_HrMn)
- julianday(route.date_time))) AS datetime_dist
FROM temperature
JOIN (
SELECT -- (A)
*, Distance(stations.geometry,route.geometry) AS distance
FROM stations
WHERE EXISTS (
SELECT 1
FROM temperature
WHERE stations.USAF = temperature.USAF
AND stations.WBAN_ID = temperature.NCDC
LIMIT 1
)
GROUP BY stations.geometry
ORDER BY distance
LIMIT 1
) tmp
ON tmp.USAF = temperature.USAF
AND tmp.WBAN_ID = temperature.NCDC
)
High-level description of this query:
using geometry (= longitude & latitude) and date_time from the route table,
(A) find the weather station (stations table, uniquely identified by the USAF and NCDC/WBAN_ID columns)
closest to the given longitude/latitude (geometry)
for which temperatures are present in the temperature table
(B) find the temperature table row
for the weather station found above
closest in time to the given timestamp
(C) store the temperature and "time_distance" in the route table
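Note: newer SQLite releases lift this limitation. Row values arrived in SQLite 3.15.0, which makes the "ideal" multi-column form above legal, and SQLite 3.33.0 added UPDATE ... FROM, so the lookup only has to be written once. A minimal sketch of both, against the simplified route/amb_temp schema from the question:
-- SQLite >= 3.15.0: row values in UPDATE ... SET
UPDATE route
SET (temperature, error) = (SELECT t.temperature, t.error
                            FROM amb_temp.temperature AS t
                            WHERE t.location = route.location);
-- SQLite >= 3.33.0: UPDATE ... FROM, writing the correlated lookup once
UPDATE route
SET temperature = t.temperature,
    error = t.error
FROM amb_temp.temperature AS t
WHERE t.location = route.location;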

Optimization of Oracle Query

I have the following query:
SELECT DISTINCT A1, SUM(total) AS sum_total
FROM
(
    SELECT A1, A2, A3, A4, A5, A6, COUNT(A7) AS total, A8
    FROM (
        SELECT a.*
        FROM table1 a
        LEFT JOIN (SELECT * FROM table_reject WHERE name = 'smith') b
            ON a.A3 = b.B3 AND a.A9 = b.B2
        WHERE b.ID IS NULL
    ) t1
    WHERE A8 >= NEXT_DAY(trunc(to_date('17/09/2013 12:00:00', 'dd/mm/yyyy hh24:mi:ss')), 'SUN')
    GROUP BY CUBE(A1, A2, A3, A4, A5, A6, A8)
) INN
WHERE INN.A1 IS NOT NULL
  AND INN.A2 IS NULL
  AND INN.A3 IS NULL
  AND INN.A4 IS NULL
  AND INN.A5 IS NULL
  AND INN.A6 IS NULL
  AND INN.A8 IS NOT NULL
GROUP BY A1
ORDER BY sum_total DESC;
The total number of records in table1 is around 8 million.
My problem is that I need to optimize the above query as best as possible. I tried creating an index on column A8 of table1; it lowered the cost of the query, but the execution time is more or less the same as when there was no index on table1.
Any help would be appreciated.
Thanks
A CUBE operation on a large data set is really expensive, so you need to check whether you really need all that data in the inner query. I see you are doing a COUNT in the inner query and then a SUM of those counts in the outer query; in other words: get the row count of A7 for every combination of A1-A8 (minus A7), then SUM only the combinations selected by the WHERE clause. We can surely optimize this by limiting the CUBE to certain columns, but the obvious things I have noticed so far are as follows.
If you use the query below and have the right indexes on Table1 and Table_reject, both queries can utilize the indexes and reduce the data set that needs to be joined and processed further.
I am not 100% sure, but partial CUBE processing is possible and worth checking.
Clustered index: Table1 needs one on A8, and Table_Reject needs a clustered index on NAME.
Non-clustered index: Table1 needs one on (A3, A9), and Table_reject needs one on (B3, B2).
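Since this is an Oracle query (Oracle has no clustered indexes as such; an index-organized table is the closest analogue), the plain B-tree equivalents of those suggestions would be something like the following sketch. The index names are made up:
CREATE INDEX ix_table1_a8 ON table1 (A8);
CREATE INDEX ix_table1_a3_a9 ON table1 (A3, A9);
CREATE INDEX ix_reject_name ON table_reject (NAME);
CREATE INDEX ix_reject_b3_b2 ON table_reject (B3, B2);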
SELECT qry1.*
FROM
(
    SELECT A1, A2, A3, A4, A5, A6, A7, A8, A9 -- A9 added: it is needed for the join below
    FROM table1
    WHERE A8 >= NEXT_DAY(trunc(to_date('17/09/2013 12:00:00', 'dd/mm/yyyy hh24:mi:ss')), 'SUN')
) qry1
LEFT JOIN
(
    SELECT B3, B2, ID
    FROM table_reject
    WHERE name = 'smith'
) qry2
    ON qry1.A3 = qry2.B3 AND qry1.A9 = qry2.B2
WHERE qry2.ID IS NULL
EDIT1:
I tried to find out what difference it makes in the CUBE result if you run it on all columns versus only the columns you need in the result set. What I found is that, the way the CUBE function works, you do not need to perform CUBE on all columns, because in the end you only care about the combinations generated by CUBE where A1 and A8 are NOT NULL.
Try this link and see the output.
Query1 and Query2 are just the innermost queries, to compare the CUBE result sets.
Query3 and Query4 are the same query that you are trying, and you can see the results are the same in both cases.
DECLARE @NEXT_DAY DATE = NEXT_DAY(trunc(to_date('17/09/2013 12:00:00', 'dd/mm/yyyy hh24:mi:ss')), 'SUN')
SELECT distinct A1 ,sum(total) as sum_total FROM
(
SELECT A1,COUNT(A7) AS total,A8
FROM (
select a.a1,a.a7,a.a8
from table1 a
left join (select * from table_reject where name = 'smith') b
on A.A3 = B.B3 and A.A9 =B.B2
where B.ID is null
) t1
WHERE A8 >= @NEXT_DAY
GROUP BY
CUBE(A1,A8)
)INN
WHERE INN.A1 IS NOT NULL AND
INN.A8 IS NOT NULL
GROUP BY A1
ORDER BY sum_total DESC ;
EDIT3:
As I mentioned in the comment, this is a Round3 update; I cannot change the comment, but I meant Edit3 instead of Round3.
The new change in your query is adding the A8 >= @NEXT_DAY condition to the innermost left-join select (WHERE A8 >= @NEXT_DAY AND B.ID IS NULL), and that has improved the selection very much.
In your last comment you mentioned that the query is taking 30-35 seconds, and that as you change the value of A8 it keeps increasing. But along with the execution time you didn't mention how much data is in the result set, and that matters: if a query returns 5M rows as its final result set, it will spend 90% of its time just dropping that data onto the UI or into an output file, whatever output method you are using. Actual performance should be measured by how soon the query starts returning the first few rows, because by that point the optimizer has already decided on the execution plan and the DB is executing it. That said, I agree that if a query returns 100 rows and takes 10 seconds, something may be wrong with the execution plan.
To demonstrate, I created dummy data and ran your query against it.
I have a table Test_CubeData with 9M rows, with the same number of columns and data types you described for your Table1. I have a second table, Table_Reject, with 80K rows, whose columns and data types I inferred from the query. To test the extreme side of this table, the name column has only one value, "smith", and ID is null for all 80K rows, so the only column values that can affect the inner left-join result are B2 and B3.
In these tests I do not have any index on either table; both are heaps. You can see the results still come back in a few seconds for an acceptable amount of data in the result set; as my result data set increases, the completion time increases. If I create the indexes explained above, I get an Index Seek operation for all of these tested cases, but at a certain point that index too will be exhausted and become an Index Scan.
One sure example: if my filter value for the A8 column is the smallest date that exists in that column, the optimizer will see that all 9M rows need to participate in the inner select and the CUBE, and a lot of data will be processed in memory, which is expected. On the other hand, consider the opposite kind of example: I have 32,873 unique values in the A8 column, distributed almost evenly among the 9M rows, so there are 260 to 300 rows per A8 value. If I execute the query for any single value (smallest, largest, or anything in between), the execution time should not change.
In each image below, notice the highlighted text: it indicates which A8 filter value was chosen, that only the important columns are in the select list instead of *, the A8 filter added to the inner left-join query, the execution plan showing a TableScan operation on both tables, the query execution time in seconds, and the total number of rows returned by the query.
I hope that this will clear some doubts on performance of your query and will help you to set right expectation.
(Screenshots: Table Row Counts; TableScan_InnerLeftJoin; TableScan_FullQuery_248Rows; TableScan_FullQuery_5K; TableScan_FullQuery_56K; TableScan_FullQuery_480k)
You're calculating a potentially very large cube result on seven columns, and then discarding all the results except those that are logically just a GROUP BY on column A1.
I suggest that you rewrite the query to just group by A1.
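A sketch of that rewrite, keeping the anti-join and the date filter but replacing the CUBE with a plain GROUP BY. This assumes A2-A6 are never NULL in the data; if they can be, the original outer filter could not distinguish real NULLs from CUBE roll-up rows either, and GROUPING() would be needed to tell them apart:
SELECT a.A1, COUNT(a.A7) AS sum_total
FROM table1 a
LEFT JOIN (SELECT B3, B2, ID FROM table_reject WHERE name = 'smith') b
    ON a.A3 = b.B3 AND a.A9 = b.B2
WHERE b.ID IS NULL
  AND a.A8 >= NEXT_DAY(trunc(to_date('17/09/2013 12:00:00', 'dd/mm/yyyy hh24:mi:ss')), 'SUN')
  AND a.A1 IS NOT NULL
GROUP BY a.A1
ORDER BY sum_total DESC;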
