How can I search for user_name and password in a table containing millions of records as fast as possible? - bigdata

Hi there
There is a question I haven't been able to find a solution to: what technique does Facebook use to match a username and password against millions of records in a few seconds or less? I need a simple example for this case. In my work there are fewer than 300,000 user records, so using LINEAR search is not a big problem, but what do I do if the users become millions?
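The short answer at this scale is an index, not a faster scan (large sites add hashing, caching, and sharding on top, but the index is the core idea). Below is a minimal sketch using Python's sqlite3; all table, column, and function names here are my own illustration, and a production system would use bcrypt/scrypt/argon2 and a server-grade database rather than this exact code:

```python
import hashlib
import hmac
import os
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (user_name TEXT, salt BLOB, pw_hash BLOB)")
# A UNIQUE index turns the username lookup into a B-tree search (O(log n))
# instead of a linear scan over every row.
con.execute("CREATE UNIQUE INDEX idx_users_name ON users (user_name)")

def hash_pw(password: str, salt: bytes) -> bytes:
    # Deliberately slow, salted hash; never store plaintext passwords.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def register(name: str, password: str) -> None:
    salt = os.urandom(16)
    con.execute("INSERT INTO users VALUES (?, ?, ?)",
                (name, salt, hash_pw(password, salt)))

def login(name: str, password: str) -> bool:
    # The index makes this WHERE clause fast even with millions of rows.
    row = con.execute("SELECT salt, pw_hash FROM users WHERE user_name = ?",
                      (name,)).fetchone()
    return row is not None and hmac.compare_digest(row[1], hash_pw(password, row[0]))
```

Note that only the username is looked up via the index; the password is verified by recomputing the hash, so the password column itself never needs an index.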

Related

Trying to understand database denormalization: is this database denormalized?

I've been struggling for a couple of days trying to figure out the best way to design a database for a large data set on Firebase; I even wrote a question on the database administration site.
I came up with a design, but I don't know whether it's what's called denormalized data or not. I want to minimize query time while also not making inserts/updates too hard.
Here's my design:
Is that the right database design for this kind of data?
(Please check my question at the database administration site for more details about the nature of the data.)
But also here's a short description of the data nature:
So I have an affiliator_category, which may be banks, clubs, or organisations. Each category contains a number of affiliators, each affiliator contains a number of stores divided into store_category groups, and each store has a number of offers.
And for the user side (the one who does the shopping): a user has a number of memberships in several affiliators, and a number of spendings he/she makes.

Best way to store data on iOS for a list with 100s of items (possibly 1000s)

I am developing an app which presents a feed of posts and allows users to vote on these posts.
I want to prevent users from voting multiple times on a single post. To do that, I want to store a list of the IDs of the posts already voted on, so that I can check it each time the user tries to vote.
What's the most efficient way of storing these post IDs if there's a chance of the user voting on up to thousands of posts within a year?
SQLite, Core Data, a plist, or NSUserDefaults?
Since you would also like to know how many people voted (I think), I would save it on a server (using SQLite to store it).
Saving this on the user's device seems redundant.
If you do want to store it locally, I would advise Core Data.
It is too much information for NSUserDefaults or plists… I don't know why, but it just doesn't seem like a good idea, and Core Data is just a better version of SQLite (for Swift usage).
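Whichever store you choose, the underlying model is just a one-column table of voted post IDs with a primary key, so duplicates are rejected at the storage layer. A sketch of the idea using Python's sqlite3 (the table and function names are my own; the same uniqueness constraint maps onto a Core Data entity with a unique attribute):

```python
import sqlite3

# On iOS this would be a file in the app sandbox (or live server-side).
con = sqlite3.connect(":memory:")
# PRIMARY KEY makes a duplicate vote impossible to record, and its
# implicit index makes the "already voted?" check fast even for 1000s of IDs.
con.execute("CREATE TABLE voted_posts (post_id INTEGER PRIMARY KEY)")

def try_vote(post_id: int) -> bool:
    """Record a vote; return False if this post was already voted on."""
    try:
        con.execute("INSERT INTO voted_posts VALUES (?)", (post_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

A few thousand integer rows is tiny for SQLite or Core Data; the constraint also means the app never needs to load the whole list into memory to check one ID.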

Split large sqlite table by sessionid field

I am relatively new to sql(ite), and I'm learning as I go while working on a new project.
We have millions of transaction rows in one "data" table, one field being a "sessionid" field.
Since I want to concentrate on in-session activity for now, I primarily need to look only at transactions from the same sessions.
My intuition is that it would be a lot faster to separate the database by session into many single-session tables than to always query for a single sessionid and then proceed. My question: is that correct? Will it make a difference?
Even if not: could you help me out and tell me how I could split the one "data" table's rows into many session-specific tables, with the rows staying the same? Plus one table that relates sessionids to their tables?
Thanks!
A friend just told me the splitting-into-tables approach would be extremely inflexible, and that I should instead add an index on the sessionid column so I can access single sessions faster. Any thoughts on that, and on how best to do it?
First of all, are you having any specific performance bottleneck with it till now? If yes, please describe it.
Having one table per session will probably speed things up: lookups, and index maintenance for INSERTs.
SQLite doesn't impose a limit on the number of tables, so you should be okay.
One other solution that allows easier maintenance is to create one table per day/week.
Depending on how long your sessions last, this could be feasible or not.
Related: https://stackoverflow.com/a/811862/89771
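The friend's index suggestion is the conventional fix: a single index on sessionid serves single-session queries without splitting anything. A minimal sketch in Python's sqlite3 (the table, column, and index names are my own illustration); the EXPLAIN QUERY PLAN line is there only to show that SQLite uses the index rather than scanning the whole table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (sessionid INTEGER, event TEXT)")
# 10,000 rows spread across 100 sessions.
con.executemany("INSERT INTO data VALUES (?, ?)",
                [(i % 100, "click") for i in range(10_000)])

# One index on sessionid replaces hundreds of per-session tables.
con.execute("CREATE INDEX idx_data_session ON data (sessionid)")

rows = con.execute("SELECT * FROM data WHERE sessionid = ?", (42,)).fetchall()
# The query plan confirms SQLite searches the index, not the full table.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM data WHERE sessionid = 42").fetchone()
```

Unlike per-session tables, this stays flexible: cross-session queries, aggregates, and schema changes all keep working on the single table.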

many-to-many query runs slow in windows phone 7 emulator

My application uses SQLite for its database, and the database has a many-to-many relationship. When I use the SQLite addon/tool for Firefox, the SQL query joining the tables in the many-to-many relationship runs pretty fast. However, when I run the same query in the emulator, it takes a very long time (5 minutes or more), so I haven't even tried it on a real device.
Can someone tell me what is going on?
For example, I have 3 tables:
1. create table person (id integer, name text);
2. create table course (id integer, name text);
3. create table registration(personId integer, courseId integer);
The SQL statements I have tried are as follows:
select *
from person, course, registration
where registration.personId = person.id and registration.courseId = course.id
And also as follows:
select *
from person inner join registration on person.id=registration.personId
inner join course on course.id=registration.courseId
I am using the SQLite client from http://wp7sqlite.codeplex.com. I have 4,800 records in the registration table, 4,000 records in the person table, and 1,000 records in the course table.
Is it my queries? Is it just the SQLite client? Is it the record count? If this problem cannot be fixed in the app, I'm afraid I'll have to host the database remotely (which means my app will have to use the internet).
Yep, it's your queries. You're not going to get away with doing what you are trying to do on a mobile device. You have to remember you aren't running on a PC, so you have to think differently about how you approach things (both code and UI). You have low memory, slow disk access, a slow-ish processor, no virtual memory, etc. You're going to have to make compromises.
I'm sure whatever you are doing is perfectly possible on the phone without needing an offsite server, but you need to be smart about it. For example, is it really necessary to load all 4,800+ records into memory at once? Almost certainly not; a user can't possibly look at all 4,800 at the same time. Forgetting database speed, just showing that number of items in a ListBox is going to kill your app performance-wise.
And even if performance were perfect, is displaying 4,800 items really a good user experience? Surely allowing the user to enter a search term would be better and would let you filter the list to a more manageable size. Could you implement paging, so you only display the first 10 records and have the user click Next for the next 10?
You might also want to consider de-normalizing your database so that you have just one table rather than 3. It will improve performance considerably. Yes, it goes against everything you were taught about databases in school, but like I said: phone = compromises. And remember, this isn't a big mission-critical OLTP database; it's a phone app - no one cares whether your database is in 3rd normal form or not. Also remember that the more work you give the phone (chugging through data, building up joins), the more battery power your app will consume.
Finally, if you absolutely think you must give the user a list of 4,800 records to scroll through, you should look at some kind of data-virtualization technique, which gives the user the illusion of scrolling through a long list even though only a few items are actually loaded at any given time.
But the short answer is: yes, queries like that will be problematic; you need to consider changing them.
By the time you start doing those joins, that's an awfully large number of records you could end up with. What is memory usage like during this operation?
Assuming you have tuned your indexes appropriately, rather than doing this with joins, I'd try three separate queries.
Either that or consider restructuring your data so it only contains what you need in the app.
You should also look at returning only the fields you need.
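To make that advice concrete, here is a sketch using the question's own schema in Python's sqlite3; the index names, paging query, and row counts are my own illustration of "index the join columns, select only needed fields, and page the results":

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE registration (personId INTEGER, courseId INTEGER);
    -- Indexes on the join columns; without them every join is a full scan.
    CREATE INDEX idx_reg_person ON registration (personId);
    CREATE INDEX idx_reg_course ON registration (courseId);
""")
# Roughly the record counts from the question.
con.executemany("INSERT INTO person VALUES (?, ?)",
                [(i, "p%d" % i) for i in range(4000)])
con.executemany("INSERT INTO course VALUES (?, ?)",
                [(i, "c%d" % i) for i in range(1000)])
con.executemany("INSERT INTO registration VALUES (?, ?)",
                [(i % 4000, i % 1000) for i in range(4800)])

# Name only the columns you need, and page with LIMIT instead of SELECT *.
page = con.execute("""
    SELECT person.name, course.name
    FROM registration
    JOIN person ON person.id = registration.personId
    JOIN course ON course.id = registration.courseId
    LIMIT 10 OFFSET 0
""").fetchall()
```

Whether the wp7sqlite client uses these indexes as well as desktop SQLite does is a separate question, but indexed joins plus paging keep both the query work and the in-memory result small.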

Storing user profile data in the users table or separate profile table?

I'm developing a quick side project that needs a users table, and I want them to be able to store profile data. I was already reaching for the ASP.NET profile provider when I realized that users will only ever have one profile.
I realize that frequently changing data will impact the performance of things like indexes, but how frequent is too frequent?
If I have one profile change per month per user happening for say 1000 users, is that a lot?
Or are we talking more like users changing profile data on an hourly basis?
I realize this isn't an exact science, but I'm trying to gauge at what point the threshold starts to bite. Since my users' profile data will probably rarely change, should I bother with the extra work, or just wait a few decades for it to become a problem?
One thing to consider is how adding a large text column to a table will affect the layout of the rows. Some databases store large columns inline with the other fixed-size columns; this makes the rows variable-sized, which means more work for the database when it needs to pull a row off the disk. Other databases (such as PostgreSQL) store large text columns away from the fixed-size columns; this gives fixed-size rows with quick access during table scans and the like, but an extra bit of work is needed to pull out the text columns.
1000 users isn't that much in database terms so there's probably nothing to worry about one way or the other. OTOH, little one-off side projects have a nasty habit of turning into real mission critical projects when you're not looking so doing it right from the beginning is a good idea.
I think Justin Cave has covered the index issue well enough.
As long as you structure your data access properly (i.e. all access to your user table goes through one isolated pile of code) then changing your data schema for users won't be much work anyway.
Does the profile information actually need to be indexed? Or are you just going to retrieve it based on the USER_ID of the table or some other indexed USER column? If the profile data isn't indexed, which seems likely to me, then there are no performance impacts on the table's other indexes.
The only reason I can think of to be concerned about putting profile information in the table is if there is a lot of data compared to the information needed to define a user, and if the USERS table needs to be full-scanned for some reason. In that case, increasing the size of the table would hurt the performance of a table scan. Assuming you don't have a use case where it regularly makes sense to do a full scan of the USERS table, and given that the table will only have 1000 rows, that's probably not a big deal.
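If you do decide to split the profile out, the usual shape is a one-to-one table keyed by the user's id, so the wide, rarely-read columns stay out of the users table. A sketch in Python's sqlite3 (all table and column names here are my own, not anything from the question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        user_name TEXT UNIQUE NOT NULL
    );
    -- One-to-one: the profile row reuses the user's id as its key,
    -- keeping the users table narrow for scans and index maintenance.
    CREATE TABLE user_profiles (
        user_id INTEGER PRIMARY KEY REFERENCES users(id),
        bio TEXT,
        avatar_url TEXT
    );
""")
con.execute("INSERT INTO users VALUES (1, 'alice')")
con.execute("INSERT INTO user_profiles VALUES (1, 'hello there', NULL)")

# When the profile is needed, one keyed join fetches it.
row = con.execute("""
    SELECT users.user_name, user_profiles.bio
    FROM users
    LEFT JOIN user_profiles ON user_profiles.user_id = users.id
    WHERE users.id = 1
""").fetchone()
```

The LEFT JOIN means users without a profile row still come back; whether that join is worth the extra table at 1000 users is exactly the judgment call discussed above.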
