sql equivalent to "grep x | head -n1" - sqlite

I'm beginning to learn SQL and have created an SQLite database from what was previously a text file of my daily IP address assignments from Comcast. With the text file, if I wanted to find the first date that an IP address was assigned, I could use
cat, awk, sort, a for/do loop, grep and head -n1
to get a list of the first dates any particular IP address was assigned. How can I do that with SQL?
select distinct ip from history;
does not display the date column, and
select distinct ip, date from history;
returns all the db entries. What am I not doing? Thanks.

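Your question is essentially a duplicate of this one: SQL query to select distinct row with minimum value. You are trying to select the minimum value in the date column (the start date) for every unique IP, which is a GROUP BY with MIN: select ip, min(date) from history group by ip; This assumes the dates are stored in a sortable form such as YYYY-MM-DD. If you also need other columns from the matching row, do an inner join between history and that grouped result on both ip and date.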
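Here is a minimal sketch of both queries using Python's built-in sqlite3 module. It assumes a table history(ip, date) with dates stored as ISO 8601 text (YYYY-MM-DD), so MIN() on the text returns the earliest date. The extra note column and the sample rows are hypothetical and only there for illustration.

```python
import sqlite3

def first_dates(conn):
    """Return (ip, first_date) pairs: the earliest date each IP appears."""
    # GROUP BY collapses the rows for each ip; MIN(date) picks the earliest.
    # This relies on ISO 8601 date text, which sorts chronologically.
    return conn.execute(
        "SELECT ip, MIN(date) AS first_date FROM history GROUP BY ip ORDER BY ip"
    ).fetchall()

def first_rows(conn):
    """Return the full history rows that fall on each IP's first date."""
    # Join back to the grouped result to pick up the row's other columns too.
    return conn.execute(
        """
        SELECT h.*
        FROM history h
        INNER JOIN (SELECT ip, MIN(date) AS first_date
                    FROM history GROUP BY ip) f
          ON h.ip = f.ip AND h.date = f.first_date
        ORDER BY h.ip
        """
    ).fetchall()

conn = sqlite3.connect(":memory:")
# 'note' is a hypothetical extra column standing in for "other columns".
conn.execute("CREATE TABLE history (ip TEXT, date TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?)",
    [
        ("10.0.0.1", "2011-01-03", "a"),
        ("10.0.0.2", "2011-01-01", "b"),
        ("10.0.0.1", "2011-01-02", "c"),
        ("10.0.0.2", "2011-01-05", "d"),
    ],
)
print(first_dates(conn))
print(first_rows(conn))
```

If the dates were stored in another text format (for example MM/DD/YYYY), MIN() would compare them as strings and could return the wrong date. Converting them to YYYY-MM-DD first avoids that.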

Related

Case sensitivity in Cloudera Impala table column names

I have installed the Cloudera VM and tried to fetch data from an Impala database using the query editor. If I give an upper-case column name in the query, I always get the column name back in lower case. Is there a limitation here, i.e. must column names always be lower case?
Sample Query:
select orderid as COLUMN1 from default.orders
Result:
column1
10248
10249
10278
From the Impala documentation:
Impala identifiers are always case-insensitive. That is, tables named
t1 and T1 always refer to the same table, regardless of quote
characters. Internally, Impala always folds all specified table and
column names to lowercase. This is why the column headers in query
output are always displayed in lowercase.
If the table is Avro-backed, try these table properties when creating it. Make sure to substitute your own column names and types.
tblproperties (
'avro.schema.literal'='
{
"type":"record",
"name":"SchemaName",
"fields":[
{"name":"COLUMN1","type":["null","long"]},
{"name":"COLUMN2","type":["null","string"]}
]
}'
)
Inspired by https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#AvroSerDe-Useschema.literalandembedtheschemainthecreatestatement

Counting Columns in ColdFusion's QoQ

I have:
<cfspreadsheet action="read" src="#Trim(PathToExcelFile)#" query="Data">
How do I count the total number of columns in my "Data" query using ColdFusion Query of Queries? I need to check whether my users have used the correct Excel file format before inserting the data into my DB.
I'm using Oracle 11g, and I cannot do:
Select * From Data Where rownum < 2
If I could do that, I could create an array and count the columns, but running that script results in an error saying that there is no column named Rownum. Oracle does not allow me to use select top 1 either.
I don't want to loop over 5000+ records just to count the columns of one row. I appreciate any help, thank you.
ColdFusion adds a few additional variables to its query results. One of them is named `columnList` and contains a comma-separated list of the query columns that were returned (see the ColdFusion documentation on query variables).
From that you should be able to count the number of columns easily, for example with #listLen(Data.columnList)#.

Get values from 5 columns meeting specific conditions from 2 other columns - SQL query in Access

I am new here.
I am using MS-Access and I have a database with several columns. Here is what I have and what I am looking for.
A column has a list of names. There are multiple entries for each name.
Another column has a list of dates. I should be able to select the most recent date for each of the names.
I know the SQL query for doing this in Access.
My challenge is this: I have 5 other columns with status info, and each one contains P, F or NA.
For each name and its most recent date, I need to pick out the names of the status columns that equal F (status = fail).
How do I write a SQL query in Access to do that?
So, I think I got the first part.
SELECT O.* FROM data O
INNER JOIN
    (SELECT I.[Name], MAX(CreatedDate) AS RecentDate FROM data I
     GROUP BY I.[Name]) I
ON I.[Name] = O.[Name] AND I.RecentDate = O.CreatedDate
Now that I think about it, the second part seems very hard to me. The user should be able to select Name and then see the most recent date and the corresponding status column names if the status shows up as "F".
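One way to approach the second part is to "unpivot" the five status columns. You write one SELECT per status column that outputs that column's name as a literal when its value is 'F', then stack those SELECTs with UNION ALL. The sketch below runs in SQLite through Python's sqlite3 module. A view named LatestRows stands in for saving the first query in Access. The column names Status1 through Status5, the view name, and the sample rows are all hypothetical. The same UNION ALL pattern should carry over to Access, but this sketch has only been written against SQLite.

```python
import sqlite3

# Hypothetical names for the five status columns (not given in the question).
STATUS_COLUMNS = ["Status1", "Status2", "Status3", "Status4", "Status5"]

def failed_columns(conn, name):
    """List the status columns equal to 'F' on the most recent row for `name`."""
    # One SELECT per status column, each emitting the column's name as a
    # literal when that column is 'F'; UNION ALL stacks them into one result.
    parts = [
        f"SELECT '{col}' AS FailedColumn FROM LatestRows "
        f"WHERE Name = ? AND {col} = 'F'"
        for col in STATUS_COLUMNS
    ]
    sql = " UNION ALL ".join(parts) + " ORDER BY FailedColumn"
    rows = conn.execute(sql, [name] * len(STATUS_COLUMNS)).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
cols_ddl = ", ".join(f"{c} TEXT" for c in STATUS_COLUMNS)
conn.execute(f"CREATE TABLE data (Name TEXT, CreatedDate TEXT, {cols_ddl})")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("Alice", "2011-01-01", "F", "F", "F", "F", "F"),
        ("Alice", "2011-02-01", "P", "F", "NA", "F", "P"),
        ("Bob", "2011-01-15", "P", "P", "P", "P", "F"),
    ],
)
# The view plays the role of the saved "most recent row per name" query.
conn.execute("""
    CREATE VIEW LatestRows AS
    SELECT O.* FROM data O
    INNER JOIN (SELECT Name, MAX(CreatedDate) AS RecentDate
                FROM data GROUP BY Name) I
      ON I.Name = O.Name AND I.RecentDate = O.CreatedDate
""")
print(failed_columns(conn, "Alice"))
print(failed_columns(conn, "Bob"))
```

Only Alice's 2011-02-01 row is checked, because that is her most recent date. Her 2011-01-01 row, where every status is F, does not appear in the result.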

Is it mandatory to put a "distinct" field as the first field in a query?

Just out of curiosity: it looks like a distinct field must be placed ahead of any other fields. Am I wrong?
See this example in SQLite,
sqlite> select ip, distinct code from parser; # syntax error?
Error: near "distinct": syntax error
sqlite> select distinct code, ip from parser; # works
Why is that? Do I really have a syntax error?
There is no such thing as a "distinct field".
distinct applies to all fields in the query and therefore must appear immediately after select.
In other words, select distinct code, ip is really
select distinct
code,
ip
rather than
select
distinct code,
ip
It selects all distinct pairs of (code, ip). Thus the result set could include repeated values of code (each with a different value of ip).
It is not possible to apply distinct to a single field in the way you're trying to (group by might be a useful alternative, but we need to understand what it is exactly that you're trying to achieve).
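The following sketch shows that DISTINCT works on whole rows. It uses Python's sqlite3 module with made-up rows in the parser table from the question. The GROUP BY query is just one possible alternative, and which one fits depends on what you actually need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parser (code INTEGER, ip TEXT)")
conn.executemany(
    "INSERT INTO parser VALUES (?, ?)",
    [(200, "1.1.1.1"), (200, "1.1.1.1"), (200, "2.2.2.2"), (404, "1.1.1.1")],
)

# DISTINCT removes duplicate (code, ip) pairs, so code 200 still appears twice.
pairs = conn.execute(
    "SELECT DISTINCT code, ip FROM parser ORDER BY code, ip"
).fetchall()

# GROUP BY code gives one row per code; the other selected columns should be
# wrapped in an aggregate, e.g. COUNT(DISTINCT ip) per code.
per_code = conn.execute(
    "SELECT code, COUNT(DISTINCT ip) FROM parser GROUP BY code ORDER BY code"
).fetchall()

print(pairs)
print(per_code)
```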

SQL Query Help - Duplicate Removal

I wasn't sure whether to post this in Software or here, so I figured I'd start here. I know this will be a straightforward answer for you SQL geniuses...
I have a table containing contacts that I import on a daily basis, and I will have an ASP.NET front end for user interaction. From this table, my intention is to send mailers, but only one to each address. So my end result is that a user enters a date (which corresponds to the date imported) and is given a grid of all the unique addresses associated with that date. I only want to send a mailer to each address once; my original imported list often contains multiple businesses at the same address.
Table: ContactTable
Fields:
ID, CompanyName, Address, City, State, Zip, Phone
I can use the SELECT DISTINCT clause, but I need all the data associated with each address (company name, etc.).
I have over 262,000 records in this table.
If I select a sample date of 1/10/2011, I get 2401 records. SELECT DISTINCT Address from the same date gives me 2092 records. This is workable, I would send those 2092 people a mailer.
Secondly, I'd have to be able to historically check if a mailer was already sent to that address as well. I would not want to send another mailer to the same business tomorrow either.
What's my best way?
I would start by creating a table to look up sent mailers:
ID | DateSent
-------------
Every time you send a mailer, insert the ID and the DateTime into this table. That way, when you go to pull the mailers, you can check against it to see whether a mailer has been sent within your specified mailing time frame. If you have multiple types of mailers, you can extend the table to include the mailer type.
Plain Old SQL
SELECT a.ID, a.CompanyName, a.Address, a.City, a.State, a.Zip, a.Phone
FROM ContactTable a
INNER JOIN (SELECT MIN(ID) AS ID
            FROM ContactTable
            GROUP BY Address, City, State, Zip) b
ON a.ID = b.ID
This subquery is like creating a temp table that holds one ID (the lowest) for each distinct address. Joining it back to ContactTable brings in the rest of the info for that contact.
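Here is a runnable sketch of the one-row-per-address idea using Python's sqlite3 module, with SQLite standing in for SQL Server. It keeps the lowest ID for each distinct Address/City/State/Zip and joins back on ID to recover the other columns. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE ContactTable (
    ID INTEGER PRIMARY KEY, CompanyName TEXT, Address TEXT,
    City TEXT, State TEXT, Zip TEXT, Phone TEXT)""")
conn.executemany(
    "INSERT INTO ContactTable VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Acme", "1 Main St", "Springfield", "IL", "62701", "555-0001"),
        (2, "Bolt", "1 Main St", "Springfield", "IL", "62701", "555-0002"),
        (3, "Cog", "9 Elm St", "Springfield", "IL", "62701", "555-0003"),
    ],
)

def one_contact_per_address(conn):
    # The subquery keeps the lowest ID for each distinct address;
    # joining on ID brings back that contact's other columns.
    return conn.execute("""
        SELECT a.ID, a.CompanyName, a.Address, a.City, a.State, a.Zip, a.Phone
        FROM ContactTable a
        INNER JOIN (SELECT MIN(ID) AS ID FROM ContactTable
                    GROUP BY Address, City, State, Zip) b
          ON a.ID = b.ID
        ORDER BY a.ID
    """).fetchall()

print(one_contact_per_address(conn))
```

Contacts 1 and 2 share an address, so only contact 1 (the lower ID) is returned for it.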
To add the lookup against your new table (called SentMailer here), exclude any contact that has a SentMailer row within the time frame:
SELECT a.ID, a.CompanyName, a.Address, a.City, a.State, a.Zip, a.Phone
FROM ContactTable a
INNER JOIN (SELECT MIN(ID) AS ID
            FROM ContactTable
            GROUP BY Address, City, State, Zip) b
ON a.ID = b.ID
WHERE NOT EXISTS (SELECT 1
                  FROM SentMailer c
                  WHERE c.ID = a.ID
                    AND c.DateSent > DATEADD(month, -12, GETDATE()))
--gives you everything that hasn't been sent a mailer within the last year, including contacts never mailed
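Here is a runnable sketch of the sent-mailer lookup in SQLite via Python's sqlite3 module. It assumes SentMailer(ID, DateSent) with ISO 8601 dates. NOT EXISTS keeps contacts that have never been mailed, and SQLite's date(?, '-12 months') stands in for SQL Server's date functions. A fixed "today" is passed in so the output is reproducible. Table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ContactTable (ID INTEGER PRIMARY KEY, Address TEXT);
    CREATE TABLE SentMailer (ID INTEGER, DateSent TEXT);
    INSERT INTO ContactTable VALUES (1, '1 Main St'), (2, '9 Elm St'), (3, '5 Oak Ave');
    INSERT INTO SentMailer VALUES (1, '2010-06-01'), (2, '2009-03-15');
""")

def due_for_mailer(conn, today):
    # Keep contacts with no SentMailer row dated within the 12 months before
    # `today`; contacts never mailed have no rows at all and are kept too.
    return [row[0] for row in conn.execute("""
        SELECT a.ID FROM ContactTable a
        WHERE NOT EXISTS (
            SELECT 1 FROM SentMailer c
            WHERE c.ID = a.ID
              AND c.DateSent > date(?, '-12 months'))
        ORDER BY a.ID
    """, (today,))]

print(due_for_mailer(conn, "2011-01-10"))
```

With a "today" of 2011-01-10, contact 1 is excluded because it was mailed on 2010-06-01. Contact 2's last mailer is older than a year, and contact 3 was never mailed, so both are returned.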
Edit
Without the data being standardized, it's hard to get quality results. In my experience, the more creative I have to get with my queries, the more likely it is that the table structure or data collection is the real problem. I still think you should create a lookup table for ID/DateSent to manage the time frames for sending.
Edit
Yes, I'm basically looking for the unique address, city, state, zip. I would only require one instance for each address so we would be able to send a mailer to that address. At this point, Company name would not be required.
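If that's the case, you can simply do the following:
SELECT DISTINCT Address, City, State, Zip
FROM ContactTable
Keep in mind this won't scrub variants like Main Street vs. Main St. Also, any extra column you add to the list (Phone, for example) brings duplicates back wherever that column differs between rows at the same address.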
RogueSpear, I work in the address verification (and thus de-duplication) field for SmartyStreets, where we deal with this scenario a lot.
If you're getting daily lists from a company and have hundreds of thousands of records, then removing duplicate addresses with stored procedures or plain queries won't be enough to catch the many variations of each address. There are services that do this, and I'd point you to CASS-Certified vendors that provide it.
You can flag duplicates in a table using something like CASS-Certified Scrubbing, or you can prevent duplicates at point-of-entry with an API like LiveAddress. Anyway, I'd be happy to personally help you with any other address questions.
I would select, then remove, the duplicates like this:
SELECT a.ID, a.PurgedID, a.CAMPAIGNTYPE, a.COMPANY, a.DBANAME, a.COADDRESS, a.COCITY, a.COSTATE, a.COZIP, a.FIRSTNAME1, a.DIALERPHONENUM, a.Purged FROM PurgeReportDetail a
WHERE EXISTS (
    SELECT * FROM PurgeReportDetail b WHERE
        b.COADDRESS = a.COADDRESS
    AND b.COCITY = a.COCITY
    AND b.COSTATE = a.COSTATE
    AND b.COZIP = a.COZIP
    AND b.ID <> a.ID
) -- This clause will only include rows whose address columns are duplicated
AND a.ID NOT IN (
    SELECT TOP 1 c.ID FROM PurgeReportDetail c
    WHERE c.COADDRESS = a.COADDRESS
    AND c.COCITY = a.COCITY
    AND c.COSTATE = a.COSTATE
    AND c.COZIP = a.COZIP
    ORDER BY c.ID -- If you want the *newest* entry to be saved, add "DESC" here
) -- This clause excludes the top 1 ID of each matching set, i.e. the row to keep
or something like this.
This selects the redundant rows for each address while leaving out the first ID. When you're ready, replace the SELECT column list with DELETE a (i.e. DELETE a FROM PurgeReportDetail a WHERE ...) to remove them.
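The same select-then-delete idea can be tried in SQLite, sketched here with Python's sqlite3 module. For portability, this sketch swaps the correlated TOP 1 subquery for an equivalent uncorrelated GROUP BY with MIN(ID), since SQLite has no TOP. Every row whose ID is not its address group's lowest ID is treated as redundant. Only the address columns are modeled, and the data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PurgeReportDetail (
        ID INTEGER PRIMARY KEY, COADDRESS TEXT, COCITY TEXT, COSTATE TEXT, COZIP TEXT);
    INSERT INTO PurgeReportDetail VALUES
        (1, '1 Main St', 'Springfield', 'IL', '62701'),
        (2, '1 Main St', 'Springfield', 'IL', '62701'),
        (3, '9 Elm St',  'Springfield', 'IL', '62701'),
        (4, '1 Main St', 'Springfield', 'IL', '62701');
""")

# Rows whose ID is not the lowest ID of their address group (the redundant ones).
REDUNDANT = """
    ID NOT IN (SELECT MIN(ID) FROM PurgeReportDetail
               GROUP BY COADDRESS, COCITY, COSTATE, COZIP)
"""

# First look at what would be removed...
to_delete = [r[0] for r in conn.execute(
    f"SELECT ID FROM PurgeReportDetail WHERE {REDUNDANT} ORDER BY ID")]
print(to_delete)

# ...then delete with exactly the same condition.
conn.execute(f"DELETE FROM PurgeReportDetail WHERE {REDUNDANT}")
remaining = [r[0] for r in conn.execute(
    "SELECT ID FROM PurgeReportDetail ORDER BY ID")]
print(remaining)
```

Reusing one condition string for both the SELECT and the DELETE guarantees that the rows you reviewed are the rows that get removed.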
EDIT: Of course this will only work on exact matches.
EDIT2: If you wanted to only check where you hadn't sent mailers, you should join both to a table of sent mailers from a specified date range
