Social Graph Assignment

I have a Social Graph assignment, and I have a pretty good idea of what I want to do; I just want to know if I'm on the right track, and I'd welcome any hints you guys can provide.
Anyway, it's a fairly simple implementation (I found a very complex one here - How to model social graph in Java, but I think it's far more than what I actually need). Essentially my idea is to make a "User" object and a HashMap to keep everything in. A User object will have four fields: name (string), student (boolean), school (string), and friends (integer array).
Each user will be added to the HashMap, and thus given a unique key. When a friendship is to be made, say between A and B, I go to user A in the HashMap and add the key for user B into A's friends array, and vice versa. That way I can keep track of everyone and who they're friends with.
Does this make sense? It works out in my head, but I feel like I'm missing something in the implementation that will make this not work as well as I think it should.

The answer to this will depend on the requirements and what you want to do with your social graph (especially on whether you want to persist the data or not).
If you are using a HashMap as your user store, then I assume you have a separate class that generates your ids (or a UserStore class that wraps the HashMap and generates them)? If you are not deleting users, then an ArrayList would suffice as your store, with the index being the user key.
When it comes to the users themselves, you could hold their friends in a List, but that may complicate your delete user code slightly (assuming you have that functionality).
UPDATE:
If you want to do analysis, then you may get some benefit from storing a User's friends as a Set<UserKey> instead of as an array (though that depends on how you plan to do your analysis). You would still need a counter class (or a master UserStore class that assigns the ids).
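A minimal sketch of the design discussed above, assuming Java: a hypothetical UserStore wraps the HashMap, hands out integer ids from its own counter, and keeps each user's friends as a Set<Integer>, so a friendship is recorded on both sides, duplicates are ignored, and deleting a user can clean up its friends' sets. The class and method names are illustrative, not from the assignment.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical user record: the fields from the question, friends as a Set of user ids.
class User {
    final String name;
    final boolean student;
    final String school;
    final Set<Integer> friends = new HashSet<>();

    User(String name, boolean student, String school) {
        this.name = name;
        this.student = student;
        this.school = school;
    }
}

// Hypothetical store that owns the map and generates the keys itself.
class UserStore {
    private final Map<Integer, User> users = new HashMap<>();
    private int nextId = 0; // counter that hands out unique keys

    int addUser(String name, boolean student, String school) {
        int id = nextId++;
        users.put(id, new User(name, student, school));
        return id;
    }

    // Friendship is undirected, so it is recorded on both users.
    void addFriendship(int a, int b) {
        if (a == b || !users.containsKey(a) || !users.containsKey(b)) {
            throw new IllegalArgumentException("unknown user or self-friendship");
        }
        users.get(a).friends.add(b);
        users.get(b).friends.add(a);
    }

    boolean areFriends(int a, int b) {
        User ua = users.get(a);
        return ua != null && ua.friends.contains(b);
    }

    // Removing a user also removes it from every friend's set.
    void removeUser(int id) {
        User removed = users.remove(id);
        if (removed == null) return;
        for (int friendId : removed.friends) {
            users.get(friendId).friends.remove(id);
        }
    }

    Set<Integer> friendsOf(int id) {
        return Collections.unmodifiableSet(users.get(id).friends);
    }
}
```

With a Set, "are A and B friends?" is a constant-time lookup instead of a scan through an array.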

I would add some form of "primary key" to the User object, a number that might be synthetic (taking the next number from a global integer counter). This way you avoid generating a hashCode() value from the rest of the User's data, and two users with identical data cannot collide as the same key inside the Map.
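A minimal sketch of that suggestion, assuming a single JVM: a static AtomicInteger serves as the global counter, and equals()/hashCode() depend only on the synthetic id, so two people with the same name remain distinct keys. The Person class name is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical user class whose identity is a synthetic id, not its data.
final class Person {
    // Global counter; incrementAndGet() is thread-safe within one JVM.
    private static final AtomicInteger NEXT_ID = new AtomicInteger();

    final int id;
    final String name;

    Person(String name) {
        this.id = NEXT_ID.incrementAndGet();
        this.name = name;
    }

    // Identity is the id only: equal names do not make equal keys.
    @Override
    public boolean equals(Object o) {
        return o instanceof Person && ((Person) o).id == id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }
}
```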

Well, it can work.
The only thing you're missing for sure is that adding a User to a HashMap does not "give" it a key. The key has to be created by you somehow. You can choose the user's first name, last name, or generate an incremental id. You add the User to the HashMap by passing that key along with the User as the value. You'll have to use that key each time you want to retrieve that User from the HashMap.
In your case, if first and last name are unique, use firstName + " " + lastName as the key.
There are many other recommendations that widely depend on the expected usages of the model. So, I don't see a reason to get into all of that.

Related

Humanly readable keys for documents or collection

I have searched throughout stackoverflow looking for a way to generate numerical keys or any type of keys that are readable for the end user.
I have found multiple answers saying you shouldn't. I get it, but what's the alternative?
Imagine a customer having an issue with an order, for instance, and having to spell the uid 1UXBay2TTnZRnbZrCdXh to your call center.
It's usually a good idea to disassociate keys from the data they contain. The data can change: usernames, passwords, locations, etc. That kind of data is very dynamic. However, links and references are more static in nature.
Suppose you have a list of followers and you're using their username as a key. If a user changes their username, not only will their entire node have to be deleted and rewritten, but every other occurrence of that key in the database would have to be changed as well. Whereas if the key is static, the only item that changes is the username child.
So to answer the question, here's one option:

orders
  firebase_generated_key_0
    order_number: "1111"
    ordered_by: "uid_0"
    order_amount: "$99.95"
  firebase_generated_key_1
    order_number: "2222"
    ordered_by: "uid_1"
    order_amount: "$12.95"

With this structure you have the order number, a link to the user who placed the order, and the total amount of the order. If the customer changes what's on the order, a simple change to order_amount is made and the order stays in place.
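To illustrate the structure outside Firebase, here is a minimal in-memory Java sketch: each order lives under an opaque generated key (a random UUID stands in for Firebase's generated key), while the short order number is an ordinary field the call center can search by. OrderStore and Order are hypothetical names, and this does not use the Firebase SDK.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical order record mirroring the fields in the tree above.
final class Order {
    final String orderNumber; // human-readable, safe to read out on the phone
    final String orderedBy;   // uid of the user: a link, not a copy of user data
    String orderAmount;       // may change without moving the order

    Order(String orderNumber, String orderedBy, String orderAmount) {
        this.orderNumber = orderNumber;
        this.orderedBy = orderedBy;
        this.orderAmount = orderAmount;
    }
}

class OrderStore {
    private final Map<String, Order> orders = new HashMap<>();

    // Stores the order under an opaque key and returns that key.
    String add(Order order) {
        String key = UUID.randomUUID().toString(); // stand-in for a generated key
        orders.put(key, order);
        return key;
    }

    Order get(String key) {
        return orders.get(key);
    }

    // Lookup by the readable number; a linear scan here, where a real
    // database would use an index or a query on order_number.
    Optional<Order> findByOrderNumber(String orderNumber) {
        return orders.values().stream()
                .filter(o -> o.orderNumber.equals(orderNumber))
                .findFirst();
    }
}
```

The customer only ever sees "2222"; the generated key never leaves the system.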
Edit:
A comment/question asked about race conditions when writing data with Firebase. There are a number of solutions, but a good starting point is Firebase Transactions, which essentially 'lock' data to prevent concurrent modifications.
See Save data as transactions for further reading.
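Firebase transactions are optimistic: the update function you supply may run more than once if the value changed underneath it. The same read, compute, compare-and-commit, retry cycle can be shown locally with Java's AtomicLong.compareAndSet. This is only an analogy for how a transaction prevents lost updates when handing out order numbers, not the Firebase API; OrderNumberCounter is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counter that hands out order numbers without lost updates.
class OrderNumberCounter {
    private final AtomicLong current;

    OrderNumberCounter(long start) {
        current = new AtomicLong(start);
    }

    long next() {
        while (true) {
            long seen = current.get();  // read the current value
            long proposed = seen + 1;   // compute the update
            if (current.compareAndSet(seen, proposed)) {
                return proposed;        // nobody changed it in between: commit
            }
            // another writer won the race; loop and retry with the fresh value
        }
    }
}
```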

Firebase architecture for my app

Here is what I want to do:
Users log in and then save data (such as their e-mail, their work, their address and so on). I saved this data at "/userProfile/exampleUID". This works as I want it to.
Then every user should create his or her own story. These stories will mostly contain strings. A friend of mine told me that it would be better to normalize my data, so I thought of saving the stories to "/storyData". He also told me that every story has to have a unique identifier as well, which I create with .push(). Under these identifiers I want to store the user's unique id (auth().currentUser.uid) to assign the story to the user who created it. The strings for the stories should also be stored under the unique ID created by .push() ("/storyData/exampleStoryID/exampleUID").
The problem now is that I can't find a method to access these strings or the "/exampleUID". In this case I would need to skip the "/exampleStoryID" child when creating a query, because without saving it I would not know its name. Am I right, or did I overlook a method for this?
I see two possible solutions:
1. I save the .key of "/exampleStoryID" to "/userProfile/exampleUID". With this key I would not need to skip a child while querying, because I can use the key to access the data in "/storyData".
2. I denormalize my data. For me, this would mean creating a new child: "/userProfile/exampleUser/storyData". Here I could save all the strings.
There may be more data later, such as "/storyAnalysis" and "/storyComments". With that in mind, which solution should I prefer?
Or do you have other suggestions?
Thanks in advance.
Best regards
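The first option (saving each story's key under the user's profile) can be modelled in memory to show the lookup it enables. This Java sketch uses nested maps as a stand-in for the database tree; pushKey() is a hypothetical replacement for .push().key, and the "stories" list under the profile, the "author" field and the class name are illustrative assumptions. Because the story ids are listed under the user, reading a user's stories needs no query that skips an unknown child.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// In-memory stand-in for the two top-level nodes of the database.
class StoryDb {
    // /storyData/{storyId} -> story fields (author uid and text)
    final Map<String, Map<String, String>> storyData = new HashMap<>();
    // /userProfile/{uid}/stories -> ids of the stories this user wrote
    final Map<String, Set<String>> userStories = new HashMap<>();

    // Hypothetical replacement for Firebase's .push().key
    private static String pushKey() {
        return UUID.randomUUID().toString();
    }

    String createStory(String uid, String text) {
        String storyId = pushKey();
        Map<String, String> story = new HashMap<>();
        story.put("author", uid); // link back to the author
        story.put("text", text);
        storyData.put(storyId, story);
        // Also record the key under the user, so reading needs no search of /storyData.
        userStories.computeIfAbsent(uid, k -> new LinkedHashSet<>()).add(storyId);
        return storyId;
    }

    List<String> storiesOf(String uid) {
        List<String> texts = new ArrayList<>();
        for (String storyId : userStories.getOrDefault(uid, Collections.emptySet())) {
            texts.add(storyData.get(storyId).get("text"));
        }
        return texts;
    }
}
```

The same pattern would extend to "/storyAnalysis" and "/storyComments" keyed by the same story id.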

Primary keys on webforms (load initially or on save)?

This is just a general question irrespective of database architecture.
I am maintaining an ASP.NET web application. The structure is such that, say on the 'Add a new employee' webform, the primary key (the record id the record will be saved with) is loaded in the form load event and displayed as a label. So when the form loads, the record id it will be saved with is shown to the user.
Positives:
The end user already knows the id/serial of the form, even before saving it. So when he is directed to the gridview screen (with all entries) after saving, he can search records easily (although the most recent one is at the top anyway).
Negatives:
If he does not save the form, say he just cancels after loading the data entry form, the id/key fetched initially is wasted (in my case it is a sequence value fetched from the database on form load).
What do you guys do in these scenarios? Which approach would you recommend for web applications, and how can we accommodate the user with a different approach? Is our current approach recommended? (To me, it wastes ids from the database sequence.)
I'd always recommend not presenting the identity field value for the record being created until the record has been created. The "create a temporary placeholder record first to obtain the identity field value ahead of time" approach can, as you mention, result in wasted IDs, unless you have a process in place to reclaim them.
You can always pop-up a message box when the user presses save that tells them the identity field value of the newly created record.
In this situation you could use a GUID created by the application itself. The database would then only need the PK column to be a uniqueidentifier (GUID) that must not be null. This way you are not wasting any unique keys, as each call to get a new GUID should by design produce a practically unique identifier. It is worth noting that if you use this method, it is best to make sure your PK is not set up as clustered: new GUIDs arrive in random order, and the resulting index reorganisation upon insert could quickly give your application performance problems.
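In Java, an application-generated GUID is a java.util.UUID. A minimal sketch, assuming the key is created when the form loads (NewEmployeeForm is a hypothetical name). UUID.randomUUID() produces a version 4 UUID with 122 random bits, so a collision is not strictly impossible but is vanishingly unlikely.

```java
import java.util.UUID;

// Hypothetical form model: the key exists as soon as the form loads,
// and nothing is reserved in the database, so cancelling wastes nothing.
class NewEmployeeForm {
    final UUID recordId = UUID.randomUUID(); // random, version 4
    String name;

    // Text that could be shown in the label on the form.
    String recordLabel() {
        return "Record id: " + recordId;
    }
}
```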
For one, I wouldn't care so much about wasted id values. When you are in danger of running out of int32 values (and when did that last happen to you?), use int64. The user experience is much more important than a few wasted id values.
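A quick back-of-the-envelope check of that point. The rate of 10,000 consumed ids per day (wasted ones included) is an assumption chosen only for illustration.

```java
import java.util.Locale;

// How long until a signed key type runs out at a given consumption rate?
class IdExhaustion {
    static double yearsUntilExhausted(long maxValue, long idsPerDay) {
        return (double) maxValue / idsPerDay / 365.0;
    }

    public static void main(String[] args) {
        long idsPerDay = 10_000; // assumed rate
        System.out.printf(Locale.ROOT, "int32: %.0f years%n",
                yearsUntilExhausted(Integer.MAX_VALUE, idsPerDay)); // int32: 588 years
        System.out.printf(Locale.ROOT, "int64: %.2e years%n",
                yearsUntilExhausted(Long.MAX_VALUE, idsPerDay));    // int64: 2.53e+12 years
    }
}
```

Even at that rate an int32 key lasts centuries, and switching to int64 removes the concern entirely.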
Having said that, I would not want the primary key to be anything the user would want to type in. If users need to type in a primary key, chances are it is (or will be requested to be) more than just an int32/64 value and carries (or will carry) meaning in its composition and/or formatting. Primary keys should not have that. (There are tons of reasons; search for "meaningless primary keys" or similar terms.)
If you need a meaningful key, make it a secondary index that is in no way related to the primary key. If part of that key is still a sequential number taken from some counter in your database, decide whether it is functionally a problem for gaps to appear in the sequence (the tax people generally don't want gaps in invoice numbers). If functionally it is no problem, then certainly don't start worrying about it technically. If functionally it is a problem, then yes, you have no option but to wait for the save before showing it to the user. But please, when you do, don't do it in a popup. Popups are horribly intrusive because they have to be dismissed. Just put up an informative message on the screen the user is sent to after (s)he saves the new employee, much like Gmail tells you about actions you have performed just above the list of messages.
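A minimal sketch of that separation, assuming a single application instance: the meaningless primary key and the user-facing employee number come from separate counters, and both are assigned only inside save(), so a cancelled form consumes nothing. The class names and the "EMP-" format are hypothetical; a real system would typically have the database assign both during the insert.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical saved employee: a meaningless surrogate key plus a
// separate, human-friendly number that users can read out or search for.
final class SavedEmployee {
    final long id;               // primary key: never shown, never typed
    final String employeeNumber; // secondary, meaningful key
    final String name;

    SavedEmployee(long id, String employeeNumber, String name) {
        this.id = id;
        this.employeeNumber = employeeNumber;
        this.name = name;
    }
}

class EmployeeRepository {
    private final AtomicLong primaryKeys = new AtomicLong();
    private long lastEmployeeNumber = 0;

    // Both values are assigned only on save; synchronized keeps the
    // employee numbers gap-free within this one process.
    synchronized SavedEmployee save(String name) {
        long id = primaryKeys.incrementAndGet();
        lastEmployeeNumber++;
        String number = String.format(Locale.ROOT, "EMP-%06d", lastEmployeeNumber);
        return new SavedEmployee(id, number, name);
    }
}
```

After save() returns, the next screen can show a non-modal line such as "Employee EMP-000001 was created" instead of a popup.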

Store in DB or not to store?

There are a few string lists in my web application, and I don't know whether to store them in the DB or just in a class.
For example, there are 7 major browsers that users visit the site with. I want to save these stats, so I need to create a browser column in the UserLogin table. I don't want to waste space and resources by saving the full browser name in each login row. So I either need to save a BrowserID field and link it to a Browsers table that stores the names, following DB normalization rules, or have some sort of abstract Dataholder class with a list of browsers, from which I can retrieve a browser name by its ID...
The question is: what should I do? Each of these lists contains no more than 200 items, so I think it makes sense to keep them in an abstract class, but then again I don't know whether MS SQL will handle multiple joins well. Think of the case where a user has country, IP, language, browser and a few more stats...
Thanks.
I have been on both sides of the fence about this.
My rule of thumb is:
If one of these lists changes, will I have to do changes to the code, too?
(e.g.: in your case, if someone writes "yet another browser" tomorrow, will I need to write code that caters for it?)
If the answer is "most probably yes" or "definitely" you can leave it inside code.
In all other cases (even just "maybe, 50-50") you'd better put it in the DB, or at the very least in a property file.
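For the "at the very least a property file" option, java.util.Properties can keep the id-to-name list outside the compiled code. A minimal sketch; the browser.N key naming and the sample entries are assumptions, and the file contents are inlined as a string only to keep the example self-contained (in practice you would load a file or classpath resource).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class BrowserNames {
    // Stand-in for the contents of a hypothetical browsers.properties file.
    static final String FILE_CONTENTS =
            "browser.1=Firefox\n" +
            "browser.2=Chrome\n" +
            "browser.3=Safari\n";

    private final Properties props = new Properties();

    BrowserNames(String contents) throws IOException {
        props.load(new StringReader(contents));
    }

    // Display name for a stored id, or "Unknown" if the file lacks the entry.
    String nameOf(int browserId) {
        return props.getProperty("browser." + browserId, "Unknown");
    }
}
```

Adding "yet another browser" then means editing the file, not recompiling.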
And please consider this, too: if you expect to have to provide statistics based on this data (e.g.: "how many users use Explorer"), you'd better put it in the DB anyway: it becomes part of your domain data and therefore it must be there.
About the "domain data" part.
The information stored in your DB is the "domain data" of your application. It is, in a sense, a (hopefully consistent) representation of what your application is about - it represents the "known universe" for your application.
If you agree to this definition, then you must also accept that it does not make sense to have 99.9% of your "reality" in the DB and 0.1% outside of it. If nothing else, it makes some operations cumbersome (if you only store the smallint, you can't create meaningful reports without either post-processing them using the class to decode "1" into "Firefox" or providing some other key for the end user).
It also makes it impossible for you to leverage some inherent DB techniques like foreign keys (if you just use a smallint without relating it to any other table, what guarantees that "10" is an acceptable value in your domain?)
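To make the trade-off concrete, this is roughly what the in-code "Dataholder" option from the question looks like in Java, written as a hypothetical Browser enum. Every report needs the decode step, and the only guard against an unknown id such as 10 is the exception in fromId(), which is the check a foreign key to a Browsers table would give you inside the database.

```java
// Hypothetical in-code lookup list: the id stored in a UserLogin row
// has to be decoded here before it means anything to a reader.
enum Browser {
    FIREFOX(1, "Firefox"),
    CHROME(2, "Chrome"),
    SAFARI(3, "Safari");

    final int id;
    final String displayName;

    Browser(int id, String displayName) {
        this.id = id;
        this.displayName = displayName;
    }

    // Code-side substitute for a foreign key check.
    static Browser fromId(int id) {
        for (Browser b : values()) {
            if (b.id == id) return b;
        }
        throw new IllegalArgumentException("No browser with id " + id);
    }
}
```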
MS SQL handles multiple joins really well; it's up to you where you want to store the data. You can also consider XML as another option. I would consider the database or XML; it is easier to change the values there than if they are in code (which has to be recompiled and redeployed to change in production).
HTH.

Generating unique database IDs in code

One requirement is that when persisting my C# objects to the database I must decide the database ID (surrogate primary key) in code.
Second requirement is that the database type for the key must be int or char(x)... so no uniqueidentifier or binary(16) or the like.
These are unchangeable requirements.
What would be the best way to go about handling this?
One idea is the base64 encoded GUIDs looking like "XSiZtdXcKU68QWe7N96Dig". These are easily created in code and are to me acceptable in URLs if necessary. But will it be too expensive regarding performance (indexing, size) having all primary and foreign keys be char(22)? Off hand I really like this idea.
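A minimal Java sketch of the base64-encoded GUID idea: the 16 bytes of a random UUID, encoded with the URL-safe Base64 alphabet and no padding, give exactly 22 characters for a char(22) column. Choosing the URL-safe alphabet ('-' and '_' instead of '+' and '/') is an assumption made here so the ids can appear in URLs; ShortGuid is a hypothetical name. The ids are case-sensitive, so the column needs a case-sensitive collation (SQL Server's default collations are typically case-insensitive).

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

final class ShortGuid {
    private ShortGuid() {}

    // Encodes a UUID's 128 bits as 22 URL-safe Base64 characters.
    static String encode(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    // Reverses encode(), e.g. when an id comes back from a URL.
    static UUID decode(String shortGuid) {
        ByteBuffer buf = ByteBuffer.wrap(Base64.getUrlDecoder().decode(shortGuid));
        return new UUID(buf.getLong(), buf.getLong());
    }

    static String newId() {
        return encode(UUID.randomUUID());
    }
}
```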
Another idea would be to create a code version of a database sequence that produces incremented integers for me. But I don't know if this is plausible, and I would need some guidance to ensure its reliability. The sequencer must know how far it has come, and what about threads that I don't control, etc.?
I imagine that no table involved will ever exceed 1.000.000 rows... will probably be far less.
You could have a table called "sequences". For each table there would be a row with a counter. Then, when you need another number, fetch it from the counter table and increment it. Put it in a transaction and you will have uniqueness.
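The "sequences" table can be sketched in memory to show the logic: a map from table name to counter plays the role of the table, and a synchronized method plays the role of the transaction that reads and increments one row as a single step. In the real design both statements would run in one database transaction that locks the row, and that serialization is where the performance cost comes from. SequenceTable is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for a "sequences" table: one counter row per table.
class SequenceTable {
    private final Map<String, Long> rows = new HashMap<>();

    // Read and increment as one atomic step, like doing both inside
    // a single transaction that holds a lock on the row.
    synchronized long nextValue(String tableName) {
        return rows.merge(tableName, 1L, Long::sum);
    }
}
```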
However this will suffer in terms of performance, of course.
A simple incrementing int would be the easiest way to ensure uniqueness. This is what the database will do if you let it: if you set the column to auto_increment, the database will do this for you automatically.
There are no security issues with this, but since you will be handling it yourself instead of letting the database engine take care of it, you will need to ensure that you don't generate the same id twice. This should be simple if you are on a single threaded system, but if your program is distributed you will need to put some effort into ensuring the uniqueness.
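For the single-process case, a minimal sketch: seed an AtomicLong with the highest id already stored (the query that fetches it is assumed, not shown) and let incrementAndGet() hand out ids safely across threads, including threads you don't control. This only holds if exactly one process writes to the table; two servers each running their own generator would hand out the same values.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-process replacement for a database sequence.
class IdGenerator {
    private final AtomicLong last;

    // maxExistingId would come from something like SELECT MAX(Id) at startup.
    IdGenerator(long maxExistingId) {
        last = new AtomicLong(maxExistingId);
    }

    long nextId() {
        return last.incrementAndGet(); // atomic across threads in this JVM
    }
}
```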
Seeing that you have an ASP.NET app, you could do the following (hoping and assuming all users must authenticate themselves before using your app!):
Assign each user a unique "UserID" in your database (can be INT, or CHAR)
Assign each user a "HighestSequentialID" (INT) in your database
When the user logs on, read those values from the database and store them in e.g. a custom principal, or in a cookie, or something else
whenever the user is about to insert a row, create a segmented ID: (UserID).(User's sequential number) and store it as "VARCHAR(20)" - e.g. your UserID is 15 and thus this user's entries would have unique IDs of "15.00001", "15.00002" and so on.
when the user logs off (or at any other time), update the user's new highest used sequential ID in the database so that next time around, you'll know what this user used last
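The segmented id from the steps above can be sketched as follows, assuming the user's id and last-used sequence number were loaded at login. The class name is hypothetical, and the five-digit zero padding only mirrors the "15.00001" example (numbers past 99999 simply grow wider); the result still has to fit the VARCHAR(20) column.

```java
import java.util.Locale;

// Hypothetical per-user id source producing "UserID.sequence" strings.
class SegmentedIdSource {
    private final int userId;
    private int highestSequentialId; // loaded at login, written back at logoff

    SegmentedIdSource(int userId, int highestSequentialId) {
        this.userId = userId;
        this.highestSequentialId = highestSequentialId;
    }

    // synchronized because one user may send several requests at once
    synchronized String nextId() {
        highestSequentialId++;
        return String.format(Locale.ROOT, "%d.%05d", userId, highestSequentialId);
    }

    // Value to store back in the database when the user logs off.
    synchronized int highestUsed() {
        return highestSequentialId;
    }
}
```

If the same user is logged in twice, each session holds its own counter and both can produce the same id, which is exactly the kind of mishap the answer warns about.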
Again - you'll have to do a lot more housekeeping work yourself, and it's always prone to a mishap (assigning a duplicate user ID, or misinterpreting the highest sequential number for that user).
I would strongly recommend trying to get these requirements changed - with these in place, all solutions will be sub-optimal at best, while using the database to handle this would be totally painless.
Marc
For a table below 1.000.000 rows, I would not be too terribly concerned about a char(22) Primary key. Of course the ideal solution for a situation like this would be for each object to have something unique about it that you could leverage for the key, even if it is a multi-part key. The next ideal solution would be to have the requirements changed :)
