Building an import process that checks for duplicates


Using ASP.NET, I'm building an admin tool that requires a function to import a list of email addresses. Upon uploading the file, I want to check for existing records for any of the email addresses supplied. For non-existing email addresses, I would create them using my DAO.
Basically I want to:
Receive list of emails
Retrieve data for existing emails
Create data for new emails in db
Return full data for all emails in list.
Since I want to know which of the emails exist up front, my first thought was to query the table for all records WHERE Email IN ('Email001FromFile', 'Email002FromFile', 'etc...') but the list could potentially contain thousands of email addresses, and I'm not certain supplying that many email addresses to the IN operator would be a good idea.
I also thought about looping through the list and checking for a record for each email, but that would potentially generate far too many queries.
My next thought was to generate a temp table to hold the list and modify the IN clause to use the temp table, rather than an explicit list of items, but that would require I execute SQL or a stored procedure directly, which I'm not inclined to do since I'm using NHibernate to access my DB.
Though I am using ASP.NET (C#) and NHibernate, and any answers specific to that would be helpful, I'm really just looking for general ideas on how to handle this scenario.

If loading the existing emails into memory is not an option, I would go for some kind of batch approach. Use the IN query you mention, but run it for only n emails at a time. You could either hardcode n or make it a function of the total number of new emails.
I'm not sure whether this approach is really faster than performing a single IN query (someone with more DB skills than me would have to answer that), but it would allow you to show some kind of loading status to the user.
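The batch idea can be sketched roughly as follows. This is a minimal illustration in Python with the standard sqlite3 module, not NHibernate code; the `users` table, its `email` column, and the helper names are assumptions made for the example.

```python
import sqlite3

def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def find_existing(conn, emails, batch_size=500):
    """Return the subset of `emails` already stored, one IN query per batch."""
    existing = set()
    for batch in chunked(emails, batch_size):
        placeholders = ",".join("?" * len(batch))
        rows = conn.execute(
            f"SELECT email FROM users WHERE email IN ({placeholders})", batch
        )
        existing.update(row[0] for row in rows)
    return existing

def import_emails(conn, emails, batch_size=500):
    """Insert the missing emails and return (existing, created)."""
    unique = list(dict.fromkeys(emails))  # drop repeats within the file, keep order
    existing = find_existing(conn, unique, batch_size)
    created = [e for e in unique if e not in existing]
    conn.executemany("INSERT INTO users (email) VALUES (?)", [(e,) for e in created])
    conn.commit()
    return existing, created

# Hypothetical demo data: one address is already in the table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
existing, created = import_emails(
    conn, ["a@example.com", "b@example.com", "c@example.com"], batch_size=2
)
```

With a batch size of n, a file of m addresses costs about m/n SELECT statements, and the loop is a natural place to report progress to the user.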

Are you doing anything with the emails that are duplicates?
You could put a UNIQUE constraint on the email column so that an address can only be entered once, then catch the exception the database throws when you attempt to insert a duplicate.
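A minimal sketch of that approach, again in Python with sqlite3 rather than NHibernate. The `users` table and the function name are hypothetical, and the exception type to catch depends on the database driver in use.

```python
import sqlite3

def insert_skipping_duplicates(conn, emails):
    """Try to insert every email; return the ones rejected as duplicates."""
    duplicates = []
    for email in emails:
        try:
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        except sqlite3.IntegrityError:
            # The UNIQUE constraint rejected the row: the address already exists.
            duplicates.append(email)
    conn.commit()
    return duplicates

# Hypothetical demo data: 'a@example.com' is already stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
rejected = insert_skipping_duplicates(
    conn, ["a@example.com", "b@example.com", "b@example.com"]
)
```

Note that this still issues one INSERT per address, so it trades the up-front lookup for per-row error handling.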

Related

Firebase architecture for my app

Here is what I want to do:
Users log in and then save data (such as their email, their work, their address and so on). I save this data at "/userProfile/exampleUID". This works as I want it to.
Then every user should create his or her own story. These stories mostly consist of strings. A friend of mine told me that it would be better to normalize my data, so I thought of saving the stories to "/storyData". He also told me that every story needs a unique identifier as well, which I create with .push(). Under this identifier I want to store the user's unique ID (auth().currentUser.uid) to assign the story to the user who created it. The strings for the story should also be stored under the unique ID created by .push() ("/storyData/exampleStoryID/exampleUID").
The problem now is that I can't find a way to access these strings or "/exampleUID". I would need to skip the "/exampleStoryID" child when creating a query, because unless I save it somewhere I do not know its name. Am I right, or did I overlook a method for this?
I can think of two solutions:
I save the ".key" of "/exampleStoryID" under "/userProfile/exampleUID". With this key I would not need to skip a child while querying, because I can use the key to access the data in "/storyData".
I denormalize my data. For me, this would mean creating a new child, "/userProfile/exampleUser/storyData", where I could save all the strings.
There may be more data later, such as "/storyAnalysis" and "/storyComments". With that in mind, which solution should I prefer?
Or do you have other suggestions?
Thanks in advance.
Kind regards

Is there an inherent risk in publishing other users' ids?

I have a collection called Vouchers. A user can, if they know the unique number ID of a Voucher, "claim" that voucher, which will give it a user_id attribute, tying it to them.
I'm at a point where I need to check a user's ID query against the existing database, but I'm wondering if I can do so on the client instead of the server (the client would be much more convenient because I'm using utility functions to tie the query form to the database operation.... it's a long story). If I do so on the client, I'll have to publish the entire Vouchers collection with correct user_id fields, and although I won't be showing those ids through any templates, they would be available through the console.
Is there an inherent risk in publishing all of the IDs like this? Can they be used maliciously even if I don't leave any specific holes for them to be used in?

First, in general it sounds like a bad idea to publish all user_ids to the client. What would happen if you have 1 million users? That would be a lot of data.
Second, in specific, we cannot know if there is inherent risk in publishing your user_ids, because we do not know what could be done with it in your system. If you use a typical design of user_ids chosen by the user themselves (for instance email), then you MUST design your system to be safe even if an attacker has guessed the user_id.

Short version: not such a good idea.
I have a similar setup: a user can sign up if she knows the voucher code. Publish only those vouchers whose user_id matches the logged-in user. All other checks, like "does the user input correspond to a valid voucher?", must be handled on the server.
Remember: client code is not trusted.
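The split between what is published and what is checked on the server might look like the following sketch. It is framework-neutral Python, not the asker's actual stack, and `Voucher`, `VoucherStore`, `publish_for` and `claim` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Voucher:
    code: str
    user_id: Optional[str] = None  # None until someone claims the voucher

class VoucherStore:
    """Server-side store; clients never query it directly."""

    def __init__(self, vouchers):
        self._by_code = {v.code: v for v in vouchers}

    def publish_for(self, user_id):
        """Return only the vouchers owned by the logged-in user."""
        return [v for v in self._by_code.values() if v.user_id == user_id]

    def claim(self, user_id, code):
        """Validate the code on the server and claim it if it is still free."""
        voucher = self._by_code.get(code)
        if voucher is None or voucher.user_id is not None:
            return False  # unknown code, or already claimed by someone
        voucher.user_id = user_id
        return True

# Hypothetical demo data: one free voucher, one already claimed by "bob".
store = VoucherStore([Voucher("A1"), Voucher("B2", "bob")])
claimed = store.claim("alice", "A1")
visible_to_alice = [v.code for v in store.publish_for("alice")]
```

The client only ever learns whether its own claim succeeded and sees its own vouchers; other users' IDs never leave the server.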

Symfony 2.0/Doctrine2 autoincrement unique text field

In my Symfony 2.0 application, I must import and create users from a CSV file.
I have no problem doing that, but I also need to generate a unique email address for each user (I have a webmail in my application, so I need to create an internal mail address).
I would like to know if there is a best practice for auto-incrementing a duplicate value of a unique text field (john.doe@mydomain.com, john.doe_1@mydomain.com, ...).
My first idea would be to do the check in a prePersist event, but maybe there is a better solution.

Simply auto-incrementing may cause concurrency issues if the transaction pauses or is delayed.
Adding a nanosecond timestamp will effectively guarantee uniqueness, but the email addresses will be unwieldy without an aliaser or autocomplete.
Alternatively, you can start a social campaign to eliminate duplicate names from the entire human population. That might take a few [thousand] months.
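One way to keep the john.doe, john.doe_1 naming scheme from the question while sidestepping the concurrency issue is to let a unique constraint arbitrate and retry with the next suffix when an insert is rejected. The sketch below uses Python and sqlite3 rather than Doctrine; the `mailboxes` table and both function names are assumptions.

```python
import sqlite3

def candidate_addresses(local_part, domain):
    """Yield local@domain, local_1@domain, local_2@domain, ..."""
    yield f"{local_part}@{domain}"
    suffix = 1
    while True:
        yield f"{local_part}_{suffix}@{domain}"
        suffix += 1

def create_mailbox(conn, local_part, domain, max_attempts=1000):
    """Insert the first free address and return it."""
    for attempt, address in enumerate(candidate_addresses(local_part, domain)):
        if attempt >= max_attempts:
            raise RuntimeError("no free address found")
        try:
            conn.execute("INSERT INTO mailboxes (address) VALUES (?)", (address,))
            conn.commit()
            return address
        except sqlite3.IntegrityError:
            continue  # taken, possibly by a concurrent import; try the next suffix

# Hypothetical demo: three users who all share the same name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mailboxes (address TEXT PRIMARY KEY)")
addresses = [create_mailbox(conn, "john.doe", "mydomain.com") for _ in range(3)]
```

Because the insert itself is the availability check, two imports racing for the same address cannot both succeed; the loser simply moves on to the next candidate.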

semaphore for a datarow

I am writing a web application that allows the user basic CRUD operations against a database. The tables being updated have fewer than 200 records. Because multiple users may use the application at once, I need some sort of locking mechanism to keep two users from overwriting each other's changes.
I have looked into semaphores but that seems to only limit the number of users executing the same code. In my data layer I have a class file for each table so I can certainly employ this on a specific table's class file but can I somehow limit the locking to the key fields?

Assuming that you are using a proper SQL implementation along with ASP.NET, why don't you use transactions to achieve this?
Additionally, you can read up on optimistic concurrency to see if that is what you need. Basically, before saving a value, you check whether the value in a particular field is still the same as it was when the user first read it. If it is, it is assumed that no one else has overwritten it, and the new value is saved to the DB; if the values differ, a warning message is returned instead.
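The optimistic concurrency check described above can be folded into the UPDATE statement itself. This is a minimal sketch in Python with sqlite3, not ASP.NET code; the `items` table, its columns, and the function name are assumptions.

```python
import sqlite3

def update_if_unchanged(conn, row_id, original_value, new_value):
    """Save new_value only if the row still holds original_value."""
    cursor = conn.execute(
        "UPDATE items SET value = ? WHERE id = ? AND value = ?",
        (new_value, row_id, original_value),
    )
    conn.commit()
    return cursor.rowcount == 1  # False means someone else changed the row first

# Hypothetical demo: two users both read 'draft', then try to save.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO items (id, value) VALUES (1, 'draft')")
first_saved = update_if_unchanged(conn, 1, "draft", "from user A")
second_saved = update_if_unchanged(conn, 1, "draft", "from user B")
```

When the function returns False, the application can show the warning and let the second user reload the current value. Many schemas compare a version number or timestamp column instead of the field itself, which works the same way.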

is there a way to get all profiles in ASP.NET membership

Is there an API in the ASP.NET membership implementation to get all user profiles at once? If not, is there another good way to get the names (first + last) of all the users? I'm trying to avoid the many SQL requests generated by getting the user profiles one at a time.

ProfileProvider.GetAllProfiles().
I'd still recommend just adding first and last names to the MembershipUser though. You'll need to cast your provider to the concrete type, which is brittle if you ever want to change it.

Update:
A challenge with the way profile data is stored is that the property names and values are packed and stored in two columns in the Profile database. If you run the aspnet_Profile_GetProperties sproc you will see that.
There is no out-of-the-box sproc that gets profile data for all users. A quick modification to the aspnet_Profile_GetProperties would do that for you though.
