I have a very simple sqlite table:
CREATE TABLE config (
    id INTEGER PRIMARY KEY,
    token VARCHAR(255),
    value TEXT,
    date DATETIME
)
Sometimes when the webpage is fetching the data, it spins and spins, but not all the time. Other pages, which are static, load quickly on this server.
I have been looking into INDEXing the data. Is this the best way to go about speeding up the query or should I be doing something different? Could this be a server issue? If so, how do I figure it out?
EDIT:
I am getting the data like so:
SELECT value
FROM config
WHERE token='%s'
ORDER BY id DESC
LIMIT 1
The page sometimes loads quickly, sometimes slowly. Sometimes only half of the table fills in, and then the page just spins until a refresh.
Thanks!
How large is the table? How many requests a second are you getting to it?
You'll probably want to create an index so that look-ups by token are quick. Right now, it has to scan the entire table to find the first row whose token matches your parameter.
CREATE INDEX IX_config_token ON config (token)
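If you want to verify that the index is actually being used, EXPLAIN QUERY PLAN will tell you. A minimal sketch using Python's sqlite3 module (the token value is a placeholder; note the ? parameter instead of interpolating '%s' into the SQL string, which also protects against injection):

import sqlite3

# In-memory copy just to demonstrate the plan; point this at your
# real database file to check your actual table.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE config (
                    id INTEGER PRIMARY KEY,
                    token VARCHAR(255),
                    value TEXT,
                    date DATETIME)""")
conn.execute("CREATE INDEX IX_config_token ON config (token)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT value FROM config WHERE token=? ORDER BY id DESC LIMIT 1",
    ("some_token",)).fetchall()
print(plan)
# Expect something like: SEARCH config USING INDEX IX_config_token (token=?)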
In general, every different way you query a table might benefit from an index just for that query, and so in practice you often end up with multiple indexes per table.
As an example, if I have a People table with City and FirstName columns, amongst others, and I want to satisfy these three query cases:
Select all people that live in City X
Select all people that have FirstName Y
Select all people that live in City X AND have FirstName Y
Then I need three separate indexes:
CREATE INDEX IX_People_City ON People (City)
CREATE INDEX IX_People_FirstName ON People (FirstName)
CREATE INDEX IX_People_City_FirstName ON People (City, FirstName)
If I didn't have the third index, then query case #3 would use the first index to find all people that live in city X and then have to manually scan through those rows to find the ones that also have FirstName Y; that's still better than scanning the entire table, but it's not ideal. (Note that SQLite can use the leftmost column of a composite index on its own, so the (City, FirstName) index also covers case #1; strictly speaking, the City-only index is then redundant.)
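You can see this directly by comparing query plans before and after adding the composite index. A sketch, assuming a minimal version of the hypothetical People table above:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE People (
                    PersonID INTEGER PRIMARY KEY,
                    FirstName TEXT,
                    City TEXT)""")
conn.execute("CREATE INDEX IX_People_City ON People (City)")
conn.execute("CREATE INDEX IX_People_FirstName ON People (FirstName)")

query = "SELECT * FROM People WHERE City=? AND FirstName=?"

# Case #3 with only the single-column indexes available:
print(conn.execute("EXPLAIN QUERY PLAN " + query, ("X", "Y")).fetchall())
# roughly: SEARCH People USING INDEX IX_People_City (City=?)
# (one single-column index, then a manual filter on the other column)

conn.execute("CREATE INDEX IX_People_City_FirstName ON People (City, FirstName)")

# Case #3 again, now satisfied entirely from the composite index:
print(conn.execute("EXPLAIN QUERY PLAN " + query, ("X", "Y")).fetchall())
# roughly: SEARCH People USING INDEX IX_People_City_FirstName (City=? AND FirstName=?)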
My problem is that my queries are too slow.
I have a fairly large sqlite database. The table is:
CREATE TABLE results (
    timestamp TEXT,
    name TEXT,
    result FLOAT
)
(I know that storing timestamps as TEXT is not optimal, but please ignore that for the purposes of this question; I'll have to fix it when I have the time.)
"name" is a category. This calculation holds the results of a calculation that has to be done at each timestamp for all "name"s. So the inserts are done at equal-timestamps, but the querys will be done at equal-names (i.e. I want given a name, get its time series), like:
SELECT timestamp,result WHERE name='some_name';
Now, the way I'm doing things now is to have no indexes, calculate all results, then create an index on name CREATE INDEX index_name ON results (name). The reasoning is that I don't need the index when I'm inserting, but having the index will make querys on the index really fast.
But it's not. The database is fairly large. It has about half a million timestamps, and for each timestamp I have about 1000 names.
I suspect, although I'm not sure, that the reason it's slow is that even though I've indexed the names, the rows are still scattered all around the physical disk. Something like:
timestamp1,name1,result
timestamp1,name2,result
timestamp1,name3,result
...
timestamp1,name999,result
timestamp1,name1000,result
timestamp2,name1,result
timestamp2,name2,result
etc...
I'm sure this is slower to query with NAME='some_name' than if the rows were physically ordered as:
timestamp1,name1,result
timestamp2,name1,result
timestamp3,name1,result
...
timestamp499997,name1000,result
timestamp499998,name1000,result
timestamp499999,name1000,result
timestamp500000,name1000,result
etc...
So, how do I tell SQLite that the order in which I'd like the rows in disk isn't the one they were written in?
UPDATE: I'm further convinced that the slowness of a select with such an index comes exclusively from non-contiguous disk access. Doing SELECT * FROM results WHERE name=<something_that_doesnt_exist> returns zero results immediately. This suggests that it's not finding the names that's slow; it's actually reading them from the disk.
Normal sqlite tables have, as a primary key, a 64-bit integer (known as rowid and a few other aliases). That determines the order in which rows are stored in a B*-tree (which puts all actual data in leaf node pages). You can change this with a WITHOUT ROWID table, but that requires an explicit primary key, which is used to place rows in a B-tree. So if every row's (name, timestamp) columns make a unique value, that's a possibility that will leave all rows with the same name on a smaller set of pages instead of scattered all over.
You'd want the composite PK to be in that order if you're searching for a particular name most of the time, so something like:
CREATE TABLE results (
timestamp TEXT
, name TEXT
, result REAL
, PRIMARY KEY (name, timestamp)
) WITHOUT ROWID
(And of course not bothering with a second index on name.) The tradeoff is that inserts are likely to be slower as the chances of needing to split a page in the B-tree go up.
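A sketch of migrating existing data into that layout (results_clustered and the file name are hypothetical; this assumes (name, timestamp) is unique, as above):

import sqlite3

conn = sqlite3.connect("results.db")  # hypothetical file name
conn.executescript("""
    CREATE TABLE results_clustered (
        timestamp TEXT
        , name TEXT
        , result REAL
        , PRIMARY KEY (name, timestamp)
    ) WITHOUT ROWID;

    -- Rows are stored ordered by (name, timestamp), so each name's
    -- time series lands on a contiguous run of pages.
    INSERT INTO results_clustered
        SELECT timestamp, name, result FROM results;
""")
conn.commit()

# The time-series query now reads a single contiguous key range:
rows = conn.execute(
    "SELECT timestamp, result FROM results_clustered "
    "WHERE name=? ORDER BY timestamp", ("some_name",)).fetchall()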
Some pragmas worth looking into to tune things:
cache_size
mmap_size
optimize (After creating your index; also consider building sqlite with SQLITE_ENABLE_STAT4.)
Since you don't have an INTEGER PRIMARY KEY, consider VACUUM after deleting a lot of rows if you ever do that.
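For example, a sketch of applying those pragmas (the sizes are placeholders to tune for your machine and database):

import sqlite3

conn = sqlite3.connect("results.db")  # hypothetical file name
conn.execute("PRAGMA cache_size = -200000")    # negative = KiB, so ~200 MB of page cache
conn.execute("PRAGMA mmap_size = 1073741824")  # allow up to 1 GiB of memory-mapped I/O

# ... run your queries ...

conn.execute("PRAGMA optimize")  # run periodically, e.g. just before closing
conn.close()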
Coming from a SQL background, I understand the high-level concepts of NoSQL but am still having trouble translating some basic usage scenarios. I am hoping someone can help.
My application simply records a location, a timestamp, and a temperature for every second of the day. So we end up having 3 basic columns:
1) location
2) timestamp
3) temperature
(All fields are numbers, and I'm storing the timestamp as an epoch for easy range querying.)
I set up DynamoDB with the location as the primary (hash) key, the timestamp as the sort key, and temp as an attribute. This results in a composite key on location and timestamp, which allows each location to have its own set of timestamps but does not allow any individual location to have more than one record with the same timestamp.
Now comes the real-world queries:
Query each site for a time range (Works fine)
Query any particular time range and return the temps for all locations (won't work)
So how would you account for the 2nd scenario? This is where I get hung up... Is this where we get into secondary indexes and things like that? For those of you smarter than me, how would you deal with this?
Thanks in advance for your help!
-D
You can't query for a range of values across the whole table in DynamoDB; you can only query for a range of values (range keys) that belong to a particular value (hash key).
It doesn't matter whether that key is the table key, a local secondary index key, or a global secondary index key (secondary indexes just give you additional query options).
Back to your scenario:
If the timestamp is in seconds and you want to get all records between two timestamps, you can add another field, 'min_timestamp', truncated to the minute.
This field can be your global secondary hash key, with timestamp as your global secondary range key.
Now you can get all records logged in a given minute.
If you want a range of minutes, you need to perform X queries (where X is the number of minutes in the range).
You could also add another field, 'hour_timestamp' (a hash key that groups all records in a given hour), and so on. But this approach is dangerous: you will be updating many records with the same hash key at the same point in time, and you can get many throughput errors...
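A sketch of what the minute-bucket query might look like with boto3 (the table name, GSI name, and attribute names are all assumptions; pagination via LastEvaluatedKey is omitted for brevity):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Temperatures")  # hypothetical table

def temps_between(start_ts, end_ts):
    """Collect all records between two epoch timestamps by querying
    one minute bucket at a time on the assumed GSI."""
    items = []
    minute = start_ts - start_ts % 60
    while minute <= end_ts:
        resp = table.query(
            IndexName="min_timestamp-timestamp-index",  # assumed GSI name
            KeyConditionExpression=Key("min_timestamp").eq(minute)
                                   & Key("timestamp").between(start_ts, end_ts),
        )
        items.extend(resp["Items"])
        minute += 60
    return items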
I'm new to DynamoDB - I already have an application where the data gets inserted, but I'm getting stuck on extracting the data.
Requirement:
There must be a unique table per customer
Insert documents into the table (each doc has a unique ID and a timestamp)
Get X number of documents based on timestamp (ordered ascending)
Delete individual documents based on unique ID
So far I have created a table with a composite key (S:id, N:timestamp). However, when I come to query it, I realise that since my id is unique, and I can't do a wildcard search on it, I won't be able to extract a range of items...
So, how should I design my table to satisfy this scenario?
Edit: Here's what I'm thinking:
Primary index will be composite: (s:customer_id, n:timestamp), where the customer ID will be the same within a table. This will enable me to extract data based on a time range.
Secondary index will be a hash (s:unique_doc_id), which will let me delete items using this index.
Does this sound like the correct solution? Thank you in advance.
You can satisfy the requirements like this:
Your primary key will be h:customer_id and r:unique_id. This makes sure all the elements in the table have distinct keys.
You will also have an attribute for the timestamp, with a Local Secondary Index on it.
You will use the LSI for requirement 3 and the BatchWriteItem API call to batch-delete for requirement 4.
This solution doesn't require (1); all the customers can stay in the same table. (Heads up: there is a limit-before-contact-us of 256 tables per account.)
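A sketch of requirements 3 and 4 with boto3 under that design (table, index, and attribute names are assumptions):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Documents")  # hypothetical table

# Requirement 3: the X oldest documents for a customer, ascending by
# timestamp, via the assumed LSI on the timestamp attribute.
resp = table.query(
    IndexName="timestamp-index",  # assumed LSI name
    KeyConditionExpression=Key("customer_id").eq("cust-123"),
    ScanIndexForward=True,        # ascending
    Limit=10,
)

# Requirement 4: delete individual documents by unique id (the full
# primary key is needed, so customer_id comes along).
with table.batch_writer() as batch:
    for item in resp["Items"]:
        batch.delete_item(Key={"customer_id": item["customer_id"],
                               "unique_id": item["unique_id"]})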
Just some background, sorry it's so long-winded.
I'm using the System.Data.SQLite ADO.NET adapter to create a local sqlite database, and this will be the only process hitting the database, so I don't need to worry about concurrency.
I'm building the database from various sources and don't want to build it all in memory using DataSets or DataAdapters or anything like that. I want to do this using SQL (DbCommands). I'm not very good with SQL and a complete noob in sqlite. I'm basically using sqlite as a local database / save-file structure.
The database has a lot of related tables and the data has nothing to do with People or Regions or Districts, but to use a simple analogy, imagine:
Region table with auto increment RegionID, RegionName column and various optional columns.
District table with auto increment DistrictID, DistrictName, RegionId, and various optional columns
Person table with auto increment PersonID, PersonName, DistrictID, and various optional columns
So I get some data representing RegionName, DistrictName, PersonName, and other Person-related data. The Region, District and/or Person may or may not have been created yet at this point.
Once again, not being the greatest with this, my thoughts would be something like:
Check to see if Region exists and if so get the RegionID
else create it and get RegionID
Check to see if District exists and if so get the DistrictID
else create it adding in RegionID from above and get DistrictID
Check to see if Person exists and if so get the PersonID
else create it adding in DistrictID from above and get PersonID
Update Person with rest of data.
In MS SQL Server I would create a stored procedure to handle all this.
The only way I can see to do this with sqlite is with a lot of separate commands, so I'm sure I'm not getting it. I've spent hours looking around on various sites but just don't feel like I'm going down the right road. Any suggestions would be greatly appreciated.
Use last_insert_rowid() in conjunction with INSERT OR REPLACE. Something like:
INSERT OR REPLACE INTO Region (RegionName)
VALUES (:Region );
INSERT OR REPLACE INTO District(DistrictName, RegionID )
VALUES (:District , last_insert_rowid());
INSERT OR REPLACE INTO Person(PersonName, DistrictID )
VALUES (:Person , last_insert_rowid());
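One caveat worth flagging: this only behaves as a get-or-create if RegionName, DistrictName, and PersonName each have a UNIQUE constraint, and even then OR REPLACE deletes the conflicting row and re-inserts it under a new autoincrement id, which can orphan existing child rows. A safer pattern is INSERT OR IGNORE followed by a SELECT of the id. A minimal sketch in Python (file and value names are placeholders):

import sqlite3

def get_or_create_region(conn, region_name):
    # Relies on a UNIQUE constraint on RegionName; the INSERT is a
    # no-op when the region already exists.
    conn.execute("INSERT OR IGNORE INTO Region (RegionName) VALUES (?)",
                 (region_name,))
    return conn.execute("SELECT RegionID FROM Region WHERE RegionName = ?",
                        (region_name,)).fetchone()[0]

conn = sqlite3.connect("local.db")  # hypothetical file name
with conn:  # one transaction for the whole chain
    region_id = get_or_create_region(conn, "North")
    # Repeat the same pattern for District (passing region_id) and for
    # Person (passing district_id), then UPDATE Person with the rest.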
I am trying to correct the sort order of my ASP.NET drop down list.
The problem I have is that I need to select a distinct Serial number and have these numbers organised by DateTime Desc.
However I cannot ORDER BY DateTime if using DISTINCT without selecting the DateTime field in my query.
However, if I select DateTime, the query returns every data value associated with a single Serial number, which results in duplicates.
The purpose of my page is to display data for ALL Serials, or data associated with one Serial. When a new cycle begins (because it is a new production run), the Serial reverts to 1, so I cannot simply order by serial number either.
When I use the following SQL statement, the list box is in the order I require, but after a period of time (usually a few hours) the order changes and appears to have no organised structure.
(screenshot of the SQL statement: http://img7.imageshack.us/i/captureky.jpg/)
I'm fairly new to ASP.NET / SQL; does anyone know of a solution to my problem?
If you have multiple date times for each serial number, then which do you want to use for ordering? If the most recent, try this:
SELECT SerialNumber,
MAX(DateTimeField)
FROM Table
GROUP BY SerialNumber
ORDER BY 2 DESC
I don't know if everybody agrees with this, but when I see a DISTINCT in a query, the first thought that goes through my mind is "this is wrong". Generally, DISTINCT is not necessary, and it tends to be used when the person writing the query doesn't know very well what they are doing; this might be the case here, since you said you are new to SQL.
Without complete knowledge of your model it's difficult to assist you a hundred percent, but I would say that you should use a GROUP BY clause instead of DISTINCT; then you can order the results correctly.