In my Symfony 2.0 application, I must import and create users from a CSV file.
I have no problem doing that, but I also need to generate a unique email address for each user (my application includes a webmail feature, so I need to create an internal mail address).
I would like to know if there is a best practice for auto-incrementing a duplicate value in a unique text field (john.doe#mydomain.com, john.doe_1#mydomain.com ...).
My first idea would be to do the check in a prePersist event, but maybe there is a better solution.
Simply auto-incrementing a suffix may have concurrency issues: if a transaction pauses or is delayed, two imports can check for the same address before either has committed and both pick the same value.
Adding a nanosecond timestamp will effectively guarantee uniqueness, but the email addresses will be unwieldy without an aliaser or autocomplete.
Alternatively, you can start a social campaign to eliminate duplicate names from the entire human population. That might take a few [thousand] months.
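One way to sidestep the check-then-insert race is to let the database enforce uniqueness and simply retry with the next suffix when the insert is rejected. Here is a minimal sketch of that idea in Python with sqlite3 rather than Symfony/Doctrine, just to show the shape. The `users` table, the function name and the `max_tries` limit are made up for illustration, and the addresses use `@` where the question writes `#`.

```python
import sqlite3


def create_user_with_unique_email(conn, local_part, domain="mydomain.com", max_tries=1000):
    """Insert a user, trying local_part, local_part_1, local_part_2, ... in turn.

    Uniqueness is enforced by the UNIQUE constraint, not by a prior SELECT,
    so a concurrent import cannot sneak in between a check and the insert.
    """
    for n in range(max_tries):
        suffix = "" if n == 0 else f"_{n}"
        email = f"{local_part}{suffix}@{domain}"
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
            return email
        except sqlite3.IntegrityError:
            continue  # address already taken; try the next suffix
    raise RuntimeError(f"no free address found for {local_part!r}")


# Hypothetical schema, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)")
print(create_user_with_unique_email(conn, "john.doe"))
print(create_user_with_unique_email(conn, "john.doe"))
```

The cost is one failed insert per existing duplicate, which is usually acceptable for a CSV import where collisions are rare.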
I have a collection called Vouchers. A user can, if they know the unique number ID of a Voucher, "claim" that voucher, which will give it a user_id attribute, tying it to them.
I'm at a point where I need to check a user's ID query against the existing database, but I'm wondering if I can do so on the client instead of the server (the client would be much more convenient because I'm using utility functions to tie the query form to the database operation.... it's a long story). If I do so on the client, I'll have to publish the entire Vouchers collection with correct user_id fields, and although I won't be showing those ids through any templates, they would be available through the console.
Is there an inherent risk in publishing all of the IDs like this? Can they be used maliciously even if I don't leave any specific holes for them to be used in?
First, in general it sounds like a bad idea to publish all user_ids to the client. What would happen if you have 1 million users? That would be a lot of data.
Second, in specific, we cannot know if there is inherent risk in publishing your user_ids, because we do not know what could be done with it in your system. If you use a typical design of user_ids chosen by the user themselves (for instance email), then you MUST design your system to be safe even if an attacker has guessed the user_id.
Short version: not such a good idea.
I have a similar setup: a user can sign up if she knows the voucher code. Publish only those vouchers whose user_id is identical to that of the logged-in user. All other checks, such as "does the user input correspond to a valid voucher?", must be handled on the server.
Remember: client code is not trusted.
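To make the split concrete, here is a rough, framework-neutral sketch in Python with sqlite3 (not the asker's actual stack). The `vouchers` table and both function names are assumptions for illustration. The claim is a single conditional update run on the server, and the only rows ever sent to a client are the ones that client already owns.

```python
import sqlite3


def claim_voucher(conn, voucher_code, user_id):
    """Server-side claim: succeeds only if the code exists and is still unclaimed."""
    with conn:
        cur = conn.execute(
            "UPDATE vouchers SET user_id = ? WHERE code = ? AND user_id IS NULL",
            (user_id, voucher_code),
        )
    # rowcount is 0 for an unknown code and for an already claimed voucher,
    # so the caller learns nothing about other users' vouchers.
    return cur.rowcount == 1


def vouchers_for_user(conn, user_id):
    """The only voucher rows that should be published to this user's client."""
    rows = conn.execute(
        "SELECT code FROM vouchers WHERE user_id = ? ORDER BY code", (user_id,)
    )
    return [row[0] for row in rows]


# Hypothetical data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vouchers (code TEXT PRIMARY KEY, user_id INTEGER)")
conn.executemany("INSERT INTO vouchers (code) VALUES (?)", [("A1",), ("B2",)])
print(claim_voucher(conn, "A1", 7))
print(vouchers_for_user(conn, 7))
```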
Using ASP.NET, I'm building an admin tool that requires a function to import a list of email addresses. Upon uploading the file, I want to check for existing records for any of the email addresses supplied. For non-existing email addresses, I would create them using my DAO.
Basically I want to:
Receive list of emails
Retrieve data for existing emails
Create data for new emails in db
Return full data for all emails in list.
Since I want to know which of the emails exist up front, my first thought was to query the table for all records WHERE Email IN ('Email001FromFile', 'Email002FromFile', 'etc...') but the list could potentially contain thousands of email addresses, and I'm not certain supplying that many email addresses to the IN operator would be a good idea.
I also thought about looping through the list and checking for a record for each email, but that would potentially generate far too many queries.
My next thought was to generate a temp table to hold the list and modify the IN clause to use the temp table, rather than an explicit list of items, but that would require I execute SQL or a stored procedure directly, which I'm not inclined to do since I'm using NHibernate to access my DB.
Though I am using ASP.NET (C#) and NHibernate, and any answers specific to that would be helpful, I'm really just looking for general ideas on how to handle this scenario.
If loading the existing e-mails into memory is not an option, I would go for some kind of batch approach. Use the IN query you mention, but run it for only n emails at a time. You could either hardcode n to a certain value or let it be a function of the total number of new e-mails.
I'm not sure whether this approach really is faster than performing one single IN query (someone with more db skills than me would have to answer that), but it would allow you to show some kind of loading status to the user.
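For what it's worth, the batching itself is only a few lines. Below is a sketch in Python with sqlite3 instead of C#/NHibernate. The `contacts` table, the function name and the default batch size of 500 are assumptions, not anything from the question.

```python
import sqlite3


def find_existing_emails(conn, emails, batch_size=500):
    """Return the subset of emails already in the table, querying batch_size at a time."""
    emails = list(emails)
    existing = set()
    for start in range(0, len(emails), batch_size):
        batch = emails[start:start + batch_size]
        # One "?" placeholder per address keeps the query parameterised.
        placeholders = ",".join("?" * len(batch))
        rows = conn.execute(
            f"SELECT email FROM contacts WHERE email IN ({placeholders})", batch
        )
        existing.update(row[0] for row in rows)
        # A progress indicator could be updated here, once per batch.
    return existing


# Hypothetical data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (email TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO contacts VALUES (?)", [("a@x.org",), ("b@x.org",)])
uploaded = ["a@x.org", "c@x.org", "b@x.org", "d@x.org"]
found = find_existing_emails(conn, uploaded, batch_size=3)
to_create = [e for e in uploaded if e not in found]
print(sorted(found), to_create)
```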
Are you doing anything with the emails that are duplicates?
You could put a UNIQUE constraint on your table to only allow an email address to be entered once - then catch the exception SQL will throw when you attempt to insert a duplicate.
This is just a general question irrespective of database architecture.
I am maintaining an ASP.NET web application. The structure is such that,
Say on 'Add a new employee' webform
The primary key (or the record id to be saved with) is initially loaded on form load event & displayed as a label
So when the form loads, the record id to save with is shown to the user
Positives:
End user already knows what the id/serial of the form is (even before he saves the form)
So on form save when he is directed to gridview screen (with all entries) he can search records easily (although the most recent one is at the top anyway)
Negatives:
If he does not save the form, say he just cancels after loading the data entry form, the id/key initially fetched is wasted (in my case it is a sequence field fetched on form load from database)
What do you guys do in these scenarios? Which approach would you recommend for web applications? And how can the user be served with a different approach? Is our current approach recommended? (To me, it wastes the ids/sequence values from the database.)
I'd always recommend not presenting the identity field value for the record being created until the record has been created. The "create a temporary placeholder record first to obtain the identity field value ahead of time" approach can, as you mention, result in wasted IDs, unless you have a process in place to reclaim them.
You can always pop-up a message box when the user presses save that tells them the identity field value of the newly created record.
In this situation you could use a GUID created by the application itself. The database would then only need the PK set to be a unique identifier (GUID) that must not be null. You are not wasting any unique keys this way, as each call to get a new GUID should by definition produce a (statistically) unique identifier. It is worth noting that if you use this method, it is best to make sure your PK is not set up to be clustered. Otherwise, because random GUIDs arrive in no particular order, the resulting index reorganisation upon insert could quickly lead to an application that suffers performance hits.
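As a rough illustration of the GUID approach (Python with sqlite3 standing in for ASP.NET and the real database; the `employees` table and both function names are invented for the example), the id is generated when the form opens and only touches the database if the user actually saves:

```python
import sqlite3
import uuid


def open_new_employee_form():
    """Generate the record id up front; nothing is reserved in the database."""
    return str(uuid.uuid4())


def save_employee(conn, employee_id, name):
    """Persist the record under the id that was already shown to the user."""
    with conn:
        conn.execute(
            "INSERT INTO employees (id, name) VALUES (?, ?)", (employee_id, name)
        )


# Hypothetical schema, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id TEXT PRIMARY KEY NOT NULL, name TEXT NOT NULL)")

shown_id = open_new_employee_form()      # user sees this on the form
abandoned_id = open_new_employee_form()  # another user cancels: nothing to clean up
save_employee(conn, shown_id, "Alice")
print(conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0])
```

The obvious downside is that a GUID is not something a user will want to read back or type into a search box.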
For one: I wouldn't care so much about wasted id values. When you are in danger of running out of int32 values (and when did that last happen to you?), use int64. The user experience is much more important than wasting a few id values.
Having said that, I would not want the primary key to be anything the user would want to type in. If you have a primary key that users need to type in, chances are it is (or will be requested to be) more than just an int32/64 value and carries (or will carry) meaning in its composition and/or formatting. Primary keys should not have that. (There are tons of reasons; search for "meaningless primary keys" or similar terms.)
If you need a meaningful key, make it a secondary index that is in no way related to the primary key. If part of that key is still a sequential number taken from some counter value in your database, decide whether functionally it is a problem for gaps to appear in the sequence. (The tax people generally don't want gaps in invoice numbers.) If functionally it is no problem, then certainly don't start worrying about it technically. If functionally it is a problem, then yes, you have no option but to wait for the save in order to show it to the user. But, please, when you do, don't do it in a popup. They are horribly intrusive as they have to be dismissed. Just put up an informative message on the screen where the user is sent after (s)he saves the new employee, much like Gmail tells you about actions you have performed just above the list of messages.
Is there an API in the ASP.NET membership implementation to get all user profiles at once? If not, is there another good way to get the names (first + last) of all the users? I'm trying to avoid the many SQL requests generated by getting the user profiles one at a time.
ProfileProvider.GetAllProfiles().
I'd still recommend just adding first and last names to the MembershipUser though. You'll need to cast your provider to the concrete type, which is brittle if you ever want to change it.
Update:
A challenge with the way profile data is stored is that the property names and values are packed and stored in two columns in the Profile database. If you run the aspnet_Profile_GetProperties sproc you will see that.
There is no out-of-the-box sproc that gets profile data for all users. A quick modification to the aspnet_Profile_GetProperties would do that for you though.
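If you do pull the packed columns for all users in one query, you then have to unpack them yourself. The sketch below (Python, purely illustrative) assumes the layout usually described for the default SQL profile provider: the names column holds entries of the form `Name:S:Start:Length:` and string values are concatenated in the values column. I have not verified this against every version, and treating a negative length as a missing value is also an assumption, so check it against a real row first.

```python
def unpack_profile(property_names, property_values_string):
    """Decode string-typed entries, assuming the Name:S:Start:Length: layout."""
    parts = property_names.split(":")
    result = {}
    # Entries come in groups of four: name, type flag, start offset, length.
    for i in range(0, len(parts) - 3, 4):
        name, kind = parts[i], parts[i + 1]
        start, length = int(parts[i + 2]), int(parts[i + 3])
        if kind != "S":
            continue  # non-string entries are skipped in this sketch
        if length < 0:
            result[name] = None  # assumption: negative length means no value
        else:
            result[name] = property_values_string[start:start + length]
    return result


# Made-up sample row, for demonstration only.
print(unpack_profile("FirstName:S:0:4:LastName:S:4:5:", "JohnSmith"))
```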
One requirement is that when persisting my C# objects to the database I must decide the database ID (surrogate primary key) in code.
Second requirement is that the database type for the key must be int or char(x)... so no uniqueidentifier or binary(16) or the like.
These are unchangeable requirements.
What would be the best way to go about handling this?
One idea is base64-encoded GUIDs, which look like "XSiZtdXcKU68QWe7N96Dig". These are easily created in code and are, to me, acceptable in URLs if necessary. But will having all primary and foreign keys be char(22) be too expensive in terms of performance (indexing, size)? Offhand, I really like this idea.
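For reference, the 22-character form comes from base64-encoding the 16 raw GUID bytes (24 characters) and dropping the two trailing `=` padding characters. A quick sketch in Python rather than C#, with made-up function names; it uses the URL-safe alphabet (`-` and `_` instead of `+` and `/`), which is a choice of mine and not something stated above.

```python
import base64
import uuid


def new_short_id():
    """Return a random GUID as a 22-character URL-safe base64 string."""
    raw = uuid.uuid4().bytes  # 16 bytes
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def short_id_to_uuid(short_id):
    """Reverse the encoding by restoring the two stripped padding characters."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(short_id + "=="))


key = new_short_id()
print(key, len(key))
print(short_id_to_uuid(key))
```

Keep in mind that base64 is case-sensitive, so a char(22) key column would need a case-sensitive collation for the values to stay unique.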
Another idea would be to create a code version of a database sequence that produces incremented integers for me. But I don't know if this is plausible, and I would need some guidance to make it reliable. The sequencer must know how far it has come, and what about threads that I don't control, etc.?
I imagine that no table involved will ever exceed 1.000.000 rows... will probably be far less.
You could have a table called "sequences". For each table there would be a row with a counter. Then, when you need another number, fetch it from the counter table and increment it. Put it in a transaction and you will have uniqueness.
However this will suffer in terms of performance, of course.
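A minimal sketch of such a counter table, in Python with sqlite3 as a stand-in for the real database. The `sequences` table layout and the `next_id` name are assumptions. The increment and the read happen inside one transaction, so two callers cannot be handed the same number.

```python
import sqlite3


def next_id(conn, table_name):
    """Increment the counter for table_name and return the new value."""
    with conn:  # both statements commit together or not at all
        cur = conn.execute(
            "UPDATE sequences SET value = value + 1 WHERE name = ?", (table_name,)
        )
        if cur.rowcount != 1:
            raise KeyError(f"no sequence registered for {table_name!r}")
        # The UPDATE above holds the write lock until commit, so this read
        # sees our own increment and nobody else's.
        (value,) = conn.execute(
            "SELECT value FROM sequences WHERE name = ?", (table_name,)
        ).fetchone()
    return value


# Hypothetical setup, for demonstration only: one counter row per table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequences (name TEXT PRIMARY KEY, value INTEGER NOT NULL)")
conn.execute("INSERT INTO sequences VALUES ('orders', 0)")
print(next_id(conn, "orders"), next_id(conn, "orders"))
```

Every insert now costs an extra round trip and serialises on the counter row, which is where the performance hit comes from.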
A simple incrementing int would be the easiest way to ensure uniqueness. This is what the database will do if you let it: if you set the key column to auto_increment, the database will generate the values for you automatically.
There are no security issues with this, but since you will be handling it yourself instead of letting the database engine take care of it, you will need to ensure that you don't generate the same id twice. This should be simple if you are on a single threaded system, but if your program is distributed you will need to put some effort into ensuring the uniqueness.
Seeing that you have an ASP.NET app, you could do the following (hoping and assuming all users must authenticate themselves before using your app!):
Assign each user a unique "UserID" in your database (can be INT, or CHAR)
Assign each user a "HighestSequentialID" (INT) in your database
When the user logs on, read those values from the database and store them in e.g. a custom principal, or in a cookie, or something else
Whenever the user is about to insert a row, create a segmented ID: (UserID).(User's sequential number) and store it as "VARCHAR(20)" - e.g. if your UserID is 15, this user's entries would have unique IDs of "15.00001", "15.00002" and so on.
When the user logs off (or at any other time), update their highest used sequential ID in the database so that next time around, you'll know what this user has used last
Again - you'll have to do a lot more housekeeping work yourself, and it's always prone to a mishap (assigning a duplicate user ID, or misinterpreting the highest sequential number for that user).
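The ID format from the steps above could be sketched like this (Python for brevity; the class and function names are hypothetical, and loading and saving the highest used number from the database is left out):

```python
def segmented_id(user_id, sequence_number, width=5):
    """Build "<UserID>.<zero-padded sequence>", e.g. 15 and 1 give "15.00001"."""
    key = f"{user_id}.{sequence_number:0{width}d}"
    if len(key) > 20:
        raise ValueError(f"{key!r} does not fit in VARCHAR(20)")
    return key


class UserIdAllocator:
    """Hands out IDs for one logged-in user, starting after the last one used."""

    def __init__(self, user_id, highest_used):
        self.user_id = user_id
        self.highest_used = highest_used  # read from the database at logon

    def next_id(self):
        self.highest_used += 1
        return segmented_id(self.user_id, self.highest_used)


allocator = UserIdAllocator(user_id=15, highest_used=0)
print(allocator.next_id(), allocator.next_id())
# allocator.highest_used must be written back at logoff (not shown here).
```

If the same user is logged on in two sessions at once, both would start from the same stored number, which is exactly the kind of mishap this scheme leaves you to guard against.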
I would strongly recommend trying to get these requirements changed - with these in place, all solutions will be sub-optimal at best, while using the database to handle this would be totally painless.
Marc
For a table below 1.000.000 rows, I would not be too terribly concerned about a char(22) Primary key. Of course the ideal solution for a situation like this would be for each object to have something unique about it that you could leverage for the key, even if it is a multi-part key. The next ideal solution would be to have the requirements changed :)