I'm setting up a table of people in DynamoDB, and I'd like to tag those people.
For demonstration purposes, let's say those tags are just strings: "tall", "short", "likes baseball" and so on.
How can I set up the data so I can quickly query all the people with a specific tag, like all the "tall" people?
Can I avoid scanning the table? Can I avoid creating multiple tables? Is this actually a much better use case for relational data structures? What if I come up with new tags on the fly? Relational doesn't work in that case.
Update:
People   Tags          Mappings
======   ====          ========
John     firefighter   John > firefighter
Sally    young         John > young
Joe      owner         Sally > owner
Anne     staff         Anne > owner
Chris    zebra-lover   Chris > zebra-lover
Ben      42            Ben > zebra-lover
In general, to avoid scanning when you want to query an attribute that is not the primary key, you can use global secondary indexes. For your case that probably doesn't work well, because you might want to tag a person with multiple tags at once, and an index key has to be a single scalar attribute rather than a set of tags.
Therefore I'd instead go for a separate table which just contains mappings of tags to people. In that table one item should be the mapping of one tag to one person. If a person has multiple tags, just add multiple items in there.
That way you'd query the tag-table for a given tag to get the primary keys of all the people you're searching for, and then do another query against the people-table to get their details.
That works for new tags as well, as they'd mean just additional items in the tag-table.
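The two-step lookup described above can be sketched in plain Python. This is only an in-memory model of the idea, not the DynamoDB API: the names people, tag_items, tag_person and people_with_tag are made up for illustration, and the tag-table is assumed to be keyed by tag with the person's key as the second part of the key.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the two tables (not the DynamoDB API).
# People-table: primary key -> person details.
people = {
    "John": {"name": "John"},
    "Sally": {"name": "Sally"},
    "Anne": {"name": "Anne"},
}

# Tag-table: one (tag, person) pair is one item.
# Modelled here as tag -> set of person keys.
tag_items = defaultdict(set)


def tag_person(tag, person_id):
    """Add one mapping item; a brand-new tag needs no schema change."""
    tag_items[tag].add(person_id)


def people_with_tag(tag):
    """Look up a tag without scanning the people-table."""
    # Step 1: query the tag-table by its key.
    person_ids = sorted(tag_items.get(tag, ()))
    # Step 2: fetch each person's details by primary key.
    return [people[pid] for pid in person_ids]


tag_person("firefighter", "John")
tag_person("young", "John")
tag_person("owner", "Sally")
tag_person("owner", "Anne")
tag_person("brand-new-tag", "Anne")  # a tag invented on the fly

owners = [p["name"] for p in people_with_tag("owner")]
```

In a real table the first step would be a key-based query on the tag and the second a batch of key lookups, so neither step has to scan.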
A total Neo4j noob here.
I'd like to create a graph to store a set of users. A typical user is as follows:
CREATE
(node_1 {FullName:"Peter Parker",FirstName:"peter",FamilyName:"parker"}),
(node_2 {Address:"Newyork",CountryCode:"US"}),
(node_3 {Location:"Hidden"}),
(node_4 {phoneNumber:11111}),
(node_5 {InternetEmailAddress:"peter#peterland.com"})
Now the problem is:
Every time I execute this, I add 5 more nodes.
I know I need to use a unique key, but all the examples I saw use a unique key for one specific node. So how can I make sure a user doesn't get added if it already exists? (I can use the email address as the unique key.)
How do I update the nodes if some changes occur? For example, after a week I want to update the graph to contain the following instead of the previous version (no duplicates):
CREATE
(node_1 {FullName:"Peter Parker",FirstName:"peter",FamilyName:"parker"}),
(node_2 {Address:"Newyork",CountryCode:"US"}),
(node_3 {Location:"public"}),
(node_4 {phoneNumber:11111}),
(node_5 {InternetEmailAddress:"peter#peterland.com"}),
(node_6 {status:"Jailed"})
(NOTE: the new update changed the location to "public" and added a new node for Peter.)
Seeing as you had a load of nodes anyway.
Some of the data you have modelled as nodes should probably be properties, as the other answer suggests; some is possibly correctly modelled as nodes; and one item could probably form the relationship, or part of it.
Location public/hidden can be modelled in one of three ways: as a property on the Person, as a property on the relationship between the Person and the Location, or as the relationship type. To understand that, you first need to have a relationship.
Your address at the moment is another Node, I think this is correct, but possibly you would want two nodes, related something like this:
(s:State)-[:IN_COUNTRY]->(c:Country)
YMMV, and clearly that is a US-centric model, but you can extend it easily enough.
Now you could create Peter with a LIVES_IN relationship:
CREATE (p:Person{fullName:"Peter Parker"}), (s:State{name:"New York"}), (c:Country{code:"US"}),
(p)-[:LIVES_IN]->(s), (s)-[:IN_COUNTRY]->(c)
For speed, you are better off modelling this as two relationship types, such as LIVES_IN_PUBLIC and LIVES_IN_HIDDEN. To perform the update you describe above, you then have to delete one relationship and create the other. However, if speed is not of the essence, it is also common to use a property on the relationship:
CREATE (p:Person{fullName:"Peter Parker"}), (s:State{name:"New York"}), (c:Country{code:"US"}),
(p)-[:LIVES_IN{public:false}]->(s), (s)-[:IN_COUNTRY]->(c)
So your complete Q&A:
CREATE (p:Person {fullName:"Peter Parker", firstName:"peter", familyName:"parker", phoneNumber:11111, internetEmailAddress:"peter#peterland.com"}),
(s:State {name:"New York"}), (c:Country {code:"US"}),
(p)-[:LIVES_IN {public:false}]->(s), (s)-[:IN_COUNTRY]->(c)
MATCH (p:Person {internetEmailAddress:"peter#peterland.com"})-[li:LIVES_IN]->()
SET li.public = true, p.status = "jailed"
When adding other People you probably do not want to recreate States and Countries, rather you want to match them, and possibly Merge them, but we'll stick to Create.
MATCH (s:State{name:"New York"})
CREATE (p:Person{fullName:"John Smith", internetEmailAddress:"john#google.com"})-[:LIVES_IN{public:false}]->(s)
John Smith now implicitly lives in the US too as you can follow the relationship through the State Node.
Treatise complete.
I think you're modeling your data incorrectly here - you're setting up each property of the person as a separate node, which is not a good idea. You don't have any linkages between those nodes, so with this data pattern, later on you won't be able to tell what Peter Parker's address is. You're also not using node labels, which I think could really help here.
The quick answer to your question about updating nodes is that you have to MATCH them, then use SET to modify a property. So if you had a person, you might do this:
MATCH (p:Person { FullName: "Peter Parker" })
SET p.Address = "123 Fake Street"
RETURN p;
But notice I'm making assumptions about the way your data is structured. Taking the same data you provided, this might be a better way of creating it:
CREATE (node_1:Person {FullName:"Peter Parker",
FirstName:"peter",
FamilyName:"parker",
Address:"Newyork",CountryCode:"US",
Location:"Hidden",
phoneNumber:11111,
InternetEmailAddress:"peter#peterland.com"});
The difference with this suggestion is that I'm putting all the properties into a single node (instead of one property per node) and I'm applying the Person label to the node.
If you structured the data like this, then the update query I provided would work. With the data structured the way you have it, it's not possible to update Peter Parker's address, because there's no relationship between your node_1 and node_2.
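To illustrate the "create it only if it does not exist, otherwise update it" behaviour the question asks for, here is a small Python model of that logic, keyed on the email address as the question suggests. It is not Neo4j code: people_by_email and upsert_person are hypothetical names, and the dictionary merely stands in for a single Person node per email. In Cypher the same effect is usually reached with MERGE on the key property followed by SET, but check the Neo4j documentation for the exact form before relying on that.

```python
# Hypothetical in-memory store keyed by the unique property (the email address).
people_by_email = {}


def upsert_person(email, **properties):
    """Create the person if the email is new, otherwise update it in place."""
    # setdefault returns the existing record when the key is already present,
    # so running this twice never produces a second person.
    person = people_by_email.setdefault(email, {"InternetEmailAddress": email})
    person.update(properties)
    return person


# First run: the person is created.
upsert_person(
    "peter#peterland.com",
    FullName="Peter Parker",
    Location="Hidden",
    phoneNumber=11111,
)

# A week later: the same key updates the existing record instead of adding one.
upsert_person("peter#peterland.com", Location="public", status="Jailed")

peter = people_by_email["peter#peterland.com"]
```

The point of the sketch is that the lookup key and the properties being changed are kept separate: the key decides whether a record exists, and everything else is just set on whatever record was found or created.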
I am creating an ASP.NET website (through Expression Web 4 and Visual Web Developer 2010 Express), using a database created via SQL Server 2008. However, I am stuck on the database schema that feeds into the website.
The website being created is to allow legal cases to be created, with the type of case being recorded and then have Lawyer(s) and/or Advocates and/or Support Staff recorded as working on the case.
So a case called 'Kramer v Kramer' will be a divorce case with 2 lawyers, 1 advocate and 4 support staff all working on the case. However another case 'The People v Larry Flynt' may be a prosecution case with 4 lawyers, no advocate and 10 support staff.
What I have done so far is that I have a table called 'Case Table' which contains the following columns:
Case_ID (this being the Primary Key for the table); Case_Type_ID; Lawyer_ID; Advocate_ID, and Support_Staff_ID. The last 4 columns in the Case Table are Foreign Keys.
The Case_Type Table simply has two columns: 'Case_Type_ID' (PK) and 'Area_of_law'. Case_Type_ID is simply numbers (001,002...) and 'Area_of_law' is char(30) (e.g. litigation, divorce, property, criminal). So far, so good.
The problem is (and this applies to the Advocates and Support Staff), if I create the Lawyer Table to simply have three columns:
Columns = Lawyer_ID (as the primary key) | First Name | Last Name
record 1 = 001 | Tom | Hanks
record 2 = 002 | Tom | Cruise
record 3 = 003 | Daniel | Craig
record 4 = 004 | Nicole | Kidman
how do I then assign multiple Lawyers to the case?
Within the Case Table, in order to comply with the normalisation rules, presumably I would need to add columns to the Case Table as described above so it records Lawyer_ID_1 (FK)| Lawyer_ID_2 (FK)| Lawyer_ID_3 (FK) | Lawyer_ID_4 (FK).
So it could be 004 | 002 | Null | Null to show Nicole Kidman and Tom Cruise were working on the case. If I only had one Lawyer_ID column, in order to show Nicole Kidman and Tom Cruise working on the case, it would have to be '004,002', which is a big no-no from what I have read on this website.
By recording all the lawyers in a separate table, I am able to create a databound CheckBoxList (bound to the Lawyer Table) in Visual Web Developer 2010 Express, because it populates the list with the names of the lawyers. So I am pleased with this approach.
But if I then had 50 lawyers within the Lawyer Table, I would need to have 50 columns in the Case Table assigned to just lawyers (if a Case ended up with 50 lawyers working on it). This approach is problematic, as the Case Table may get skewed if lawyers are removed or added, because the number of Lawyer columns is affected. Clearly this is a bad approach, so is anyone able to help?
Is there a database schema that would allow one 'Case' record to be related to multiple 'Lawyer' records, without numerous columns? (hopefully the schema can also be applied to the Support Staff Table and the Advocate Table).
Your help will be much appreciated. I am new to web development and sql database work and I've tried to look up the solution, but I probably don't know the keywords for my problem.
This is probably a many-to-many relationship. Allow me to explain the 3 relationships a little in your case:
If each case has only one lawyer, and each lawyer can only ever work on one case, then you have a one to one relationship.
If each case has only one lawyer, but each lawyer can work on multiple cases, then you have a one to many relationship. Likewise, if each case can have multiple lawyers, but each lawyer can only work on one case, then you still have a one to many relationship.
If each case can have multiple lawyers, and each lawyer can work on multiple cases, then you have a many to many relationship.
The typical solution to a many to many relationship is to have a third table to describe the relationship:
create table casesALawyerWorksOn (case_id int not null, lawyer_id int not null, primary key (case_id, lawyer_id))
If @albert, @brad, and @christine work on a case @peopleVflynt, then:
insert into casesALawyerWorksOn (case_id, lawyer_id) values
(@peopleVflynt, @albert),
(@peopleVflynt, @brad),
(@peopleVflynt, @christine)
Thus, all the cases that a lawyer @fred works on are:
select case_id from casesALawyerWorksOn where lawyer_id = @fred
and all the lawyers that work on a case @kramerVkramer are:
select lawyer_id from casesALawyerWorksOn where case_id = @kramerVkramer
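As a runnable illustration of the third-table approach, the sketch below uses Python's built-in sqlite3 module instead of SQL Server, so the syntax differs slightly from what you would write in T-SQL. The table names lawyer, legal_case and case_lawyer are my own choices, and the assignment of lawyers to cases is made up apart from the Kidman/Cruise pairing mentioned in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lawyer (
        lawyer_id  INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name  TEXT
    );
    CREATE TABLE legal_case (
        case_id   INTEGER PRIMARY KEY,
        case_name TEXT
    );
    -- One row per (case, lawyer) pair; the composite key prevents duplicates.
    CREATE TABLE case_lawyer (
        case_id   INTEGER NOT NULL REFERENCES legal_case(case_id),
        lawyer_id INTEGER NOT NULL REFERENCES lawyer(lawyer_id),
        PRIMARY KEY (case_id, lawyer_id)
    );
""")
conn.executemany(
    "INSERT INTO lawyer VALUES (?, ?, ?)",
    [(1, "Tom", "Hanks"), (2, "Tom", "Cruise"),
     (3, "Daniel", "Craig"), (4, "Nicole", "Kidman")],
)
conn.executemany(
    "INSERT INTO legal_case VALUES (?, ?)",
    [(1, "Kramer v Kramer"), (2, "The People v Larry Flynt")],
)
# Illustrative assignments: any number of lawyers per case, no extra columns.
conn.executemany(
    "INSERT INTO case_lawyer VALUES (?, ?)",
    [(1, 4), (1, 2), (2, 2), (2, 3)],
)


def lawyers_on(case_name):
    """Names of all lawyers working on the given case."""
    rows = conn.execute("""
        SELECT l.first_name || ' ' || l.last_name
        FROM case_lawyer cl
        JOIN lawyer l     ON l.lawyer_id = cl.lawyer_id
        JOIN legal_case c ON c.case_id = cl.case_id
        WHERE c.case_name = ?
        ORDER BY l.lawyer_id
    """, (case_name,))
    return [row[0] for row in rows]


def cases_for(lawyer_id):
    """Names of all cases the given lawyer works on."""
    rows = conn.execute("""
        SELECT c.case_name
        FROM case_lawyer cl
        JOIN legal_case c ON c.case_id = cl.case_id
        WHERE cl.lawyer_id = ?
        ORDER BY c.case_id
    """, (lawyer_id,))
    return [row[0] for row in rows]
```

The same pattern should carry over to advocates and support staff: either one linking table per role, or a single staff table with a role column plus one linking table.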
I created extended user profiles and I'd like to query them.
Suppose a user lives in 2 states (STATE1: New York, STATE2: California). I can easily have 2 query boxes (with Views) for STATE1 and STATE2, but I'd like to have ONE query box that searches EITHER field.
With views, I know how to do :
If STATE1="myValue-1" OR STATE2="myValue-2"
but what I'd like to do is
If STATE1="myValue" OR STATE2="myValue"
Is this possible ?
Many thanks.
Sam
Similar question was answered by me here
As I wrote, while it seems like an overhead, I couldn't find a better solution yet.
Hope this helps,
Shushu
We have a database-driven ASP.NET / SQL Server website and would like to investigate how we can allow users to create a new database category and fields. Is this crazy? Are there any examples of such organic websites out there? The fact that I haven't seen any suggests that maybe I am.
Interested in the best approach which would allow some level of control by Admin.
I've implemented things along these lines with a dictionary table, rather than a more traditional table.
The dictionary table might look something like this:
create table tblDictionary
(id uniqueidentifier, --Surrogate Key (PK)
itemid uniqueidentifier, --Think PK in a traditional database
colmn uniqueidentifier, --Think "column name" in a traditional database
value nvarchar(max), --Can hold either string or number
sortby integer) --Sorting columns may or may not be needed.
So, then, what would have been one row in a traditional table would become multiple rows:
Traditional Way (of course I'm not making up GUIDs):
ID  Type  Make  Model    Year  Color
1   Car   Ford  Festiva  2010  Lime
...would become multiple rows in the dictionary:
ID  ITEMID  COLUMN    VALUE
0   1       Type      Car
1   1       CarMake   Ford
2   1       CarModel  Festiva
3   1       CarYear   2010
4   1       CarColor  Lime
Your GUI can search for all records where itemid=1 and get all of the columns it needs.
Or it can search for all records where itemid in (select itemid from tblDictionary where colmn='Type' and value='Car') to get all columns for all cars.
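Here is a small runnable sketch of that second lookup, using Python's built-in sqlite3 module rather than SQL Server. To keep it short it uses integers and readable column names instead of GUIDs and drops the sortby column; the boat item and the function name items_of_type are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tblDictionary (
        id     INTEGER PRIMARY KEY,  -- surrogate key
        itemid INTEGER,              -- the "row" this value belongs to
        colmn  TEXT,                 -- the "column name"
        value  TEXT                  -- everything is stored as text
    )
""")
conn.executemany(
    "INSERT INTO tblDictionary VALUES (?, ?, ?, ?)",
    [
        (0, 1, "Type", "Car"),
        (1, 1, "CarMake", "Ford"),
        (2, 1, "CarModel", "Festiva"),
        (3, 1, "CarYear", "2010"),
        (4, 1, "CarColor", "Lime"),
        # A second, made-up item of another type, to show the filter working.
        (5, 2, "Type", "Boat"),
        (6, 2, "BoatName", "Minnow"),
    ],
)


def items_of_type(type_name):
    """Rebuild one dict per item from the dictionary rows of a given type."""
    cursor = conn.execute("""
        SELECT itemid, colmn, value
        FROM tblDictionary
        WHERE itemid IN (SELECT itemid
                         FROM tblDictionary
                         WHERE colmn = 'Type' AND value = ?)
        ORDER BY id
    """, (type_name,))
    items = {}
    for itemid, colmn, value in cursor:
        items.setdefault(itemid, {})[colmn] = value
    return items


cars = items_of_type("Car")
```

Note that the year comes back as the string "2010": because every value shares one text column, any typing has to be reapplied by the application.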
In theory, you can put the user-defined types into the same table (Type='Type') as well as the user-defined columns that each Type has (Type='Column', Column='ColumnName'). This is where the sortby column comes into it: it helps build the GUI in the correct order, if you don't want to rely on something else.
A number of times, though, I have felt that storing the user-defined dictionary elements in the dictionary was a bit too much drinking-the-kool-aid. Those can be separate tables because you already know what structure they need at design time. :)
This method will never have the speed or quality of reporting that a traditional table would have. Those generally require the developer to have pre-knowledge of the structures. But if the requirement is flexibility, this can do the job.
Often enough, what starts out as a user-defined area of my sites has had a later project to normalize the data for reporting, etc. But this allows users to get started in a limited way and work out their requirements before engaging the developers.
After all that, I just want to mention a few more options which may or may not work for you:
If you have SharePoint, users already have the ability to create their own lists in this way.
Excel documents in a shared folder that are saved in such a way to allow multiple simultaneous edits would also serve the purpose.
Excel documents, stored on the webserver and accessed via ODBC would also serve as single-table databases like this.
I am trying to write a Firefox 3 add-on which will enable me to easily re-tag bookmarks. For example, I have some bookmarks tagged "development" and some tagged "Development", and I would like a way to easily update all the "development" tags to "Development". Unfortunately I cannot find an add-on that does this, so I thought I would create my own.
Having not developed an add-on before, I've managed to grasp the basics and discovered that Firefox stores all bookmarks in an SQLite database called Places.sqlite. Within that database there is a table called moz_bookmarks which contains all the bookmarks, tags and folders within the bookmarks directory. The structure of the bookmark folders and their child bookmarks is represented using a foreign key id which points to the parent folder's id in the same table, which in turn recurses upwards through each parent folder's id until it hits the bookmarks root.
However, where I get stuck is how the tags you apply in Firefox are related to the bookmarks. Each tag has type = 2 and parent ID = 4. However, I can see no correlation between this and the actual bookmarks that use the tag. If I add a bookmark in Firefox to no particular folder but give it 2 or 3 tags, then its parent folder ID is 5, which corresponds to "unfiled", but I can see no further correlation to the tags associated with it.
I have found this Wiki page on the structure but it does not really help.
It's driving me nuts :( Please help...
You probably already found out yourself, but tags are applied as follows:
The central place for all URLs in the database is moz_places. The table moz_bookmarks refers to it by the foreign key column fk.
If you tag a bookmark, there are multiple entries in moz_bookmarks, all having the same reference fk. The first is the bookmark itself (with the title in the title column). For each tag, there's an additional entry in moz_bookmarks with the same foreign key fk that refers to the tag in the parent column (which points to the moz_bookmarks row for the tag).
If you have a bookmark 'http://stackoverflow.com' titled 'Stackoverflow' with tags 'programming' and 'info', you will get:
moz_places
----------
id    url                       (some more)
3636  http://stackoverflow.com

moz_bookmarks
-------------
id   type  fk      parent  title          (other columns omitted...)
332  1     3636    5       Stackoverflow  (parent=5 -> unfiled folder)
333  2     (NULL)  4       programming    (programming tag, parent=4 -> tags folder)
334  1     3636    333     (NULL)         (link to 'programming' tag)
335  2     (NULL)  4       info           (info tag, parent=4 see above)
336  1     3636    335     (NULL)         (link to 'info' tag)
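To make the layout above concrete, here is a small sketch using Python's built-in sqlite3 module. It loads only the example rows shown above into an in-memory database with a cut-down schema (the real tables have many more columns), so it is not run against an actual Places database. The function names urls_tagged and rename_tag are my own, and the tags folder id of 4 is taken from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Cut-down versions of the two tables, holding only the example rows.
    CREATE TABLE moz_places (id INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE moz_bookmarks (
        id INTEGER PRIMARY KEY, type INTEGER, fk INTEGER,
        parent INTEGER, title TEXT
    );
    INSERT INTO moz_places VALUES (3636, 'http://stackoverflow.com');
    INSERT INTO moz_bookmarks VALUES
        (332, 1, 3636, 5,   'Stackoverflow'),
        (333, 2, NULL, 4,   'programming'),
        (334, 1, 3636, 333, NULL),
        (335, 2, NULL, 4,   'info'),
        (336, 1, 3636, 335, NULL);
""")

TAGS_FOLDER_ID = 4  # parent id of every tag row in the example above


def urls_tagged(tag):
    """URLs of all bookmarks that carry the given tag."""
    rows = conn.execute("""
        SELECT p.url
        FROM moz_bookmarks t                       -- the tag row itself
        JOIN moz_bookmarks l ON l.parent = t.id    -- link rows under the tag
        JOIN moz_places p    ON p.id = l.fk        -- the tagged URL
        WHERE t.parent = ? AND t.title = ?
        ORDER BY p.url
    """, (TAGS_FOLDER_ID, tag))
    return [row[0] for row in rows]


def rename_tag(old, new):
    """Rename a tag; link rows reference the tag by id, so they are untouched."""
    conn.execute(
        "UPDATE moz_bookmarks SET title = ? WHERE parent = ? AND title = ?",
        (new, TAGS_FOLDER_ID, old),
    )


tagged_before = urls_tagged("programming")
```

Calling rename_tag("programming", "Programming") then changes a single row. If a tag with the new name already exists, a plain rename would leave two tag rows with the same title; merging them would instead mean pointing the link rows at the surviving tag's id.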
Hope this helps...
As MartinStettner suggested, tag structures are based on the foreign key for the tag id, so you first have to determine the moz_bookmarks.id for the target tag.
This Mozilla PDF explains the relationship in SQLite:
Tags result in two new entries in moz_bookmarks. The first one is the tag, with parent=4 (tags), and fk=NULL. The second entry follows the first one and has the previous tag as its parent, and fk points to the proper entry in moz_places.
Using that as a guide: once you know the id for the tag, you can join moz_places.id ON moz_bookmarks.fk:
SELECT moz_places.id, moz_places.url, moz_places.title, moz_bookmarks.parent
FROM moz_places
INNER JOIN moz_bookmarks
ON moz_places.id = moz_bookmarks.fk
WHERE moz_bookmarks.parent = N -- N is the moz_bookmarks.id of the tag
I can't quite help you with the how-to, however, perhaps the extension 'SQLite Manager' will help you at least for the part where you're trying to figure out what to do. The plugin is a general manager, but it contains the default databases used by Firefox as standard option as well.
Using that extension it should be relatively straightforward to rename the keywords you like. If you're just looking for a way to fix it, this could work. If you still prefer to write your own tool, perhaps this one can at least help with the queries ;).
I can't help much either -- found your question looking for the same kind of answers...
What I have managed to find is the relevant Mozilla documentation. The bookmark system is called Places, and the database tables are described at https://developer.mozilla.org/en-US/docs/The_Places_database.
Because I had to dig deeper to find this information, here is the query to fetch the URL, title and creation date of all your bookmarks:
SELECT h.url, b.title, b.dateAdded
FROM moz_places h
JOIN moz_bookmarks b
ON h.id = b.fk;
I hope it helps people looking for this answer.