Selecting max value across columns opposed to across rows

Selecting max value across columns opposed to across rows - sqlite

I am attempting to select the max value within separate columns per a dimension listed in a row as so;
Input Dataset
Person|Date#1 |Date#2 |Date#3 |Date#4
------+--------+--------+--------+---------
Matt |12/01/18|01/15/19|02/15/19|04/15/18
Dave |01/15/18|01/02/19|03/15/19|11/01/19
Desired result
Input Dataset
Person|Max Date|
------+--------+
Matt |02/15/19|
Dave |11/01/19|

Once you fix up your tables to a proper format like YYYY-mm-dd so the table looks like so:
Person Date#1 Date#2 Date#3 Date#4
---------- ---------- ---------- ---------- ----------
Matt 2018-12-01 2019-01-15 2019-02-15 2018-04-15
Dave 2018-01-15 2019-01-12 2019-03-15 2019-11-01
it becomes a trivial
SELECT Person, max("Date#1", "Date#2", "Date#3", "Date#4") AS "Max Date" FROM mytable;
Person Max Date
---------- ----------
Matt 2019-02-15
Dave 2019-11-01
Remember, sqlite does not have any date or time types. It uses strings or numbers to hold those values. When storing dates as strings, they have to be formatted in a way that can be compared meaningfully. '04/15/18' is greater than '01/15/19' because the character 4 is greater than the character 1. None of the standard time string formats have that problem.

Related

Count rows until you get to the current owning team value... Kusto, countof()

I have this Kusto code that I have been trying to develop and any help would be greatly appreciated.
The objective is to count to the first occurrence of the CurrentOwningTeamId in the OwningTeamId column.
I packed the Owning Team number and parsed the value into a column of its own. I need to count the owning teams until I get to the current owning team.
Columns are (example):
Objective: Count to the first occurrence of the CurrentOwningTeam value in the OwningTeamId column using Kusto (Application Insights code):
[CODE]
OwningTeamId, CurrenOwningTeam, CreateDate, RequestType
155523 **888888** 2017-07-02 PRIMARY
256924 **888888** 2017-08-02 TRANSFER
**888888** **888888** 2017-09-02 TRANSFER
954005 **888888** 2017-10-02 TRANSFER
**888888** **888888** 2017-11-02 TRANSFER
155523 **888888** 2017-12-02 TRANSFER
954005 **888888** 2017-13-02 TRANSFER
**888888** **888888** 2017-14-02 TRANSFER
[/CODE]
I think you can match the current owning team with the countof() function, but I don't know how to go about it using regex. Note: values are different with each owning team on every incident, is why I capture the owning team on the incident first and try to count the very first instance of the CurrentOwningTeam number in the OwningTeamId column. In other words I want to count the number of times it takes to get to the very first owning team. In this case, it would be three.
Note: OwningTeamId's and CurrentOwningTeam can change on every incident, I first capture the CurrentOwningTeam then try to match in the OwningTeamId column.
Note: This is just one incident, but I am trying to do multiple Incidents.
Below is how I got the Current Owning Team Value.
[/CODE]
| extend CurrentOwningTeam=pack_array(OwningTeamId)
| parse CurrentOwningTeam with * "[" CurrentOwningTeam:int "]" *
| serialize CurrentOwningTeam
[/CODE]
I tried using row_number() but it will not work for multiple incidents, only per incident, so I have to use count or countof functions or another way of doing it.

Thanks for clarification. Here is a suggestion for a query that counts ordered by-time rows until certain condition is reached (count is contextual using IncidentId key).
datatable(IncidentId:string, OwningTeamId:string, CurrentOwningTeam:string, CreateDate:datetime, RequestType:string)
[
'Id1','155523','888888',datetime(2017-02-07),'PRIMARY',
'Id1','256924','888888',datetime(2017-02-08),'TRANSFER',
'Id1','888888','888888',datetime(2017-02-09),'TRANSFER',
'Id1','954005','888888',datetime(2017-02-10),'TRANSFER',
'Id1','888888','888888',datetime(2017-02-11),'TRANSFER',
'Id1','155523','888888',datetime(2017-02-12),'TRANSFER',
'Id1','954005','888888',datetime(2017-02-13),'TRANSFER',
'Id1','888888','888888',datetime(2017-02-14),'TRANSFER',
// Id2
'Id2','155523','888888',datetime(2017-02-07),'PRIMARY',
'Id2','256924','888888',datetime(2017-02-08),'TRANSFER',
'Id2','999999','888888',datetime(2017-02-09),'TRANSFER',
'Id2','954005','888888',datetime(2017-02-10),'TRANSFER',
'Id2','888888','888888',datetime(2017-02-11),'TRANSFER',
'Id2','155523','888888',datetime(2017-02-12),'TRANSFER',
'Id2','954005','888888',datetime(2017-02-13),'TRANSFER',
'Id2','888888','888888',datetime(2017-02-14),'TRANSFER',
]
| order by IncidentId, CreateDate asc
| extend c= row_cumsum(1, IncidentId!=prev(IncidentId))
| where OwningTeamId == CurrentOwningTeam
| summarize arg_min(CreateDate, c) by IncidentId
Result:
IncidentId CreateDate c
Id1 2017-02-09 00:00:00.0000000 3
Id2 2017-02-11 00:00:00.0000000 5
Here are the links to the docs that point how to find earliest record using arg_min() aggregation, and link to the row_cumsum() (cumulative sum) function.
https://learn.microsoft.com/en-us/azure/kusto/query/arg-min-aggfunction
https://learn.microsoft.com/en-us/azure/kusto/query/rowcumsumfunction

I figured it out by using the RowNumber directly into grouping inside the table, then finally summing to get my total count.
[CODE]
| serialize Id
| extend RowNumber=row_number(1, (Id) ==Id)
| summarize TotalOwningTeamChanges=sum(RowNumber) by Id
[/CODE]
Then after that I got the Minimum Date to extract the entire data set to the first instance of the current OwningTeamName.
[CODE]
//Outside the scope of the table.
| extend ExtractFirstOwningTeamCreateDate=CreateDate2
| extend VeryFirstOwningTeamCreateDate=MinimumCreateDate
| where FirstOwningTeamRow == true or MinimumCreateDate <=
ExtractFirstOwningTeamCreateDate
| serialize VeryFirstOwningTeamCreateDate
[/CODE]

Identifying, reviewing, and deduplicating records in R

I'm looking to identify duplicate records in my data set based on multiple columns, review the records, and keep the ones with the most complete data in R. I would like to keep the row(s) associated with each name that have the maximum number of data points populated. In the case of date columns, I would also like to treat invalid dates as missing. My data looks like this:
df<-data.frame(Record=c(1,2,3,4,5),
First=c("Ed","Sue","Ed","Sue","Ed"),
Last=c("Bee","Cord","Bee","Cord","Bee"),
Address=c(123,NA,NA,456,789),
DOB=c("12/6/1995","0056/12/5",NA,"12/5/1956","10/4/1980"))
Record First Last Address DOB
1 Ed Bee 123 12/6/1995
2 Sue Cord 0056/12/5
3 Ed Bee
4 Sue Cord 456 12/5/1956
5 Ed Bee 789 10/4/1980
So in this case I would keep records 1, 4, and 5. There are approximately 85000 records and 130 variables, so if there is a way to do this systematically, I'd appreciate the help. Also, I'm a total R newbie (as if you couldn't tell), so any explanation is also appreciated. Thanks!

#Add a new column to the dataframe containing the number of NA values in each row.
df$nMissing <- apply(df,MARGIN=1,FUN=function(x) {return(length(x[which(is.na(x))]))})
#Using ave, find the indices of the rows for each name with min nMissing
#value and use them to filter your data
deduped_df <-
df[which(df$nMissing==ave(df$nMissing,paste(df$First,df$Last),FUN=min)),]
#If you like, remove the nMissinig column
df$nMissing<-deduped_df$nMissing<-NULL
deduped_df
Record First Last Address DOB
1 1 Ed Bee 123 12/6/1995
4 4 Sue Cord 456 12/5/1956
5 5 Ed Bee 789 10/4/1980
Edit: Per your comment, if you also want to filter on invalid DOBs, you can start by converting the column to date format, which will automatically treat invalid dates as NA (missing data).
df$DOB<-as.Date(df$DOB,format="%m/%d/%Y")

SQLite order by date and group by

I've got a SQLite table which looks like this:
channel message time
123 hello 2014-03-25 21:33:52
123 there 2014-03-25 22:31:00
222 hi also 2014-01-22 10:19:00
222 bye 2014-01-22 11:29:00
Now I want to get the latest message for each channel and order all the results by DESC.
What I got so far is:
SELECT * from history WHERE GROUP BY channel order by date(time) DESC;
This returns the latest message for each channel but the results are not in order.
Im getting:
222 bye 2014-01-22 11:29:00
123 there 2014-03-25 22:31:00
Channel 123 should be on the top since its the newest one.
Any ideas what I'm doing wrong?

I just tried to order my data wich uses date and time, and it works, but only for DATE or TIME(names of my fields)
In your case, the name of the column is "time" so why do you use date(time) instead of simply time? try to order it simply with:
SELECT * from history WHERE GROUP BY channel order by time DESC;

In Sqlite database the date type is not of NSDate type it acts as a string so it will not order your records in according to the date. You have to do it with some other way.

How to warn about a duplicate values combination

I'm trying to create the following warning in Google Spreadsheet: when I add, in columns Name and Date, a combination of values which are already present then in the column Result I should receive the message Duplicate date.
Here is an example:
Name | Date | Result
Alex | 27/11/2013
John | 28/11/2013
Alan | 29/11/2013
Val | 30/11/2013
Jack | 2/12/2013
Alex | 27/11/2013 |Duplicate date
I know how to raise a "warning" if a duplicated Date exists, by changing the ColumnC cell text "Date" into that message, but I don't know how to pair the Name and Date values.
I use this =IF:
=IF(COUNTA(B2:B)>COUNTA(UNIQUE(B2:B));"Duplicate date";"Date")

Please try:
=If(ArrayFormula(SUMPRODUCT((A:A=A3)*(B:B=B3))>1);"Duplicate date";"")
in C3 and copied down (assuming John is in A4).
Note however this detects duplicates (ie both your first and last rows above, rather than merely the repetition that the last is of the first row).

mySql sum a column and return only entries with and entry in last 10 minutes

heres a table, the time when the query runs i.e now is 2010-07-30 22:41:14
number | person | timestamp
45 mike 2008-02-15 15:31:14
56 mike 2008-02-15 15:30:56
67 mike 2008-02-17 13:31:14
34 mike 2010-07-30 22:31:14
56 bob 2009-07-30 22:37:14
67 bob 2009-07-30 22:37:14
22 tom 2010-07-30 22:37:14
78 fred 2010-07-30 22:37:14
Id like a query that can add up the number for each person. Then only display the name totals which have a recent entry say last 60 minutes. The difficult seems to be, that although its possible to use AND timestamp > now( ) - INTERVAL 600, this has the affect of stopping the full sum of the number.
the results I would from above are
Mike 202
tom 22
fred 78
bob is not included his latest entry is not recent enough its a year old! mike although he has several old entries is valid because he has one entry recently - but key, it still adds up his full 'number' and not just those with the time period.
go on get your head round that one in a single query ! and thanks
andy.

You want a HAVING clause:
select name, sum(number), max(timestamp_column)
from table
group by name
HAVING max( timestamp_column) > now( ) - INTERVAL 600;

andrew - in the spirit of education, i'm not going to show the query (actually, i'm being lazy but don't tell anyone) :).
basically tho', you'd have to do a subselect within your main criteria select. in psuedo code it would be:
select person, total as (select sum(number) from table1 t2 where t2.person=t1.person)
from table1 t1 where timestamp > now( ) - INTERVAL 600
that will blow up, but you get the gist...
jim

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex