I'm new to the NoSQL world and I'm wondering whether there are any established patterns for maintaining a manually sorted list.
More specifically, imagine the typical ToDo list application. At the top you have the List and under that you have each of the Items. I want the end user to be able to drag and drop the items. Therefore, I need to preserve this order for the next time I read these Items.
In a relational DB I had a few ways to solve this, though none of them was pretty. Maybe there is something better in the world of NoSQL? Not that it should matter, but I'm using Firebase.
Just create an integer field for each item that indicates its position in the list, and update the position values when the user makes a change.
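For illustration, a minimal sketch in TypeScript of that integer-position approach (all names here are hypothetical): after a drag-and-drop, renumber the items and persist only the ones whose position actually changed.

interface TodoItem {
  id: string;
  title: string;
  position: number; // integer slot in the list
}

// Move an item and return the items whose stored position is now stale.
function reorder(items: TodoItem[], fromIndex: number, toIndex: number): TodoItem[] {
  const next = items.slice();
  const [moved] = next.splice(fromIndex, 1);
  next.splice(toIndex, 0, moved);
  return next.filter((item, i) => {
    const changed = item.position !== i;
    item.position = i; // renumber in place
    return changed;
  });
}

// The caller would then write each changed item back, e.g. with the
// Firebase Realtime Database API:
//   changed.forEach(item => itemsRef.child(item.id).update({ position: item.position }));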
I currently have a need to add a local secondary index to a DynamoDB table but I see that they can't be added after the table is created. It's fine for me to re-create the table now while my project is in development, but it would be painful to do that later if I need another index when the project is publicly deployed.
That's made me wonder whether it would be sensible to re-create the table with the maximum number of secondary indexes allowed even though I don't need them now. The indexes would have generically-named attributes that I am not currently using as their sort keys. That way, if I ever need another local secondary index on this table in the future, I could just bring one of the unused ones into service.
I don't think it would be a waste of storage or a performance problem, because I understand that the indexes will only be written to when an item is written that includes the attribute they index on.
I Googled to see if this idea was a common practice, but haven't found anyone talking about it. Is there some reason why this wouldn't be a good idea?
Don't do that. If a table has any LSIs it follows different rules: it cannot grow an item collection beyond 10 GB, and it cannot isolate hot items within an item collection. Why incur these downsides if you don't need to? Plus, you can always create a GSI later instead of an LSI.
My table is (device, type, value, timestamp), where (device, type, timestamp) makes a unique combination (a candidate for a composite key in a non-DynamoDB DBMS).
My queries can range over any of these three attributes, for example:
GET (value)s from (device) with (type) having (timestamp) greater than <some-timestamp>
I'm using dynamoosejs/dynamoose. From most of my searching, I believe I'm supposed to combine the three fields into a single field (device-type-timestamp) and use it as the id. However, the set function of the Schema doesn't let me use the other object properties (such as this.device), and for various reasons I cannot do the concatenation externally.
The closest I got was (id:uuidv4:hashKey, device:string:GlobalSecIndex, type:string:LocalSecIndex, timestamp:Date:LocalSecIndex)
and
(id:uuidv4:rangeKey, device:string:hashKey, type:string:LocalSecIndex, timestamp:Date:LocalSecIndex)
and so on...
However, when running a Query it becomes difficult to fetch results for a particular device and type, because the id (the hash key or range key) is never part of the query.
So, the question: how would you model such a table?
Also worth noting: this table is meant to gather data from IoT devices, each of which generates a record every 5 minutes on average.
I'm curious why you are choosing DynamoDB for this task. Queries like this seem much better suited to a SQL-based database than to a NoSQL one; in my experience, this kind of task is a lot easier in SQL databases. So I would encourage you to think about whether DynamoDB is truly the right system for what you are trying to do here.
If you determine it is, you might have to restructure your data a little bit. You could add a property, say device-type, that holds the device and type values combined. Then set that as an index, query on it with the timestamp as the range key, and restrict the results to timestamps greater than the value you want.
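As a rough sketch of that idea (in TypeScript with the AWS SDK v2 DocumentClient rather than Dynamoose; the table and attribute names here are invented):

import { DynamoDB } from "aws-sdk";

const db = new DynamoDB.DocumentClient();

// Sketch: `deviceType` holds "device#type" as the partition key and
// `timestamp` is the sort key, so a single Query covers the access pattern.
async function readingsSince(device: string, type: string, since: number) {
  return db
    .query({
      TableName: "Readings", // hypothetical table name
      KeyConditionExpression: "deviceType = :dt AND #ts > :since",
      // `timestamp` is a DynamoDB reserved word, so it has to be aliased.
      ExpressionAttributeNames: { "#ts": "timestamp" },
      ExpressionAttributeValues: { ":dt": `${device}#${type}`, ":since": since },
    })
    .promise();
}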
You are correct that currently, Dynamoose does not pass in the entire object into the set function. This is something that personally I'm open to exploring. I'm a member on the GitHub project, and if you would like to submit a PR adding that feature I would be more than happy to help explore that option with you and get that into the codebase.
The other thing you might want to explore is a DynamoDB stream that sets that device-type property whenever an item gets added to your table. That would abstract that logic out of your application. I'm not sure if it's necessary for what you are doing to decouple it to that level, but it might be something you want to explore.
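If you went the stream route, the handler could be a small Lambda along these lines (a sketch only; the table and attribute names are again invented, and it assumes the stream is configured to include the new image):

import { DynamoDBStreamEvent } from "aws-lambda";
import { DynamoDB } from "aws-sdk";

const db = new DynamoDB.DocumentClient();

// Sketch: on every INSERT, write the combined device-type attribute back
// onto the new item so later queries can key on it. Only INSERTs are
// handled, so the MODIFY event this update causes is ignored.
export const handler = async (event: DynamoDBStreamEvent) => {
  for (const record of event.Records) {
    if (record.eventName !== "INSERT" || !record.dynamodb?.NewImage) continue;
    // The stream and SDK attribute types differ slightly, hence the cast.
    const img = DynamoDB.Converter.unmarshall(record.dynamodb.NewImage as any);
    await db
      .update({
        TableName: "Readings", // hypothetical
        Key: { id: img.id },
        UpdateExpression: "SET deviceType = :dt",
        ExpressionAttributeValues: { ":dt": `${img.device}#${img.type}` },
      })
      .promise();
  }
};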
Finally, depending on your setup, you could figure out which property will be more unique, device or type, and set up an index on that property. Then just query based on that, and filter out the results for the other property that you don't want. That will of course work, but I'm not sure how many items you will have in your table, and at a certain scale questions about scalability arise. One way to address some of those might be to set a TTL on your items, if you know that the timestamp you are querying for is constant or predictable ahead of time.
Overall there are a lot of ways to achieve what you are looking to do. Without more detail about how many items you'll have, what exactly those properties will be doing, the amount of scalability you require, which of those properties will be most unique, etc., it's hard to give a good solution. I would highly encourage you to think about whether NoSQL is truly the best way to go; the query you describe looks a LOT more like a SQL query. Not saying it's impossible in DynamoDB, but it will require some thought about how you want to structure your data model.
Considering the opinion of @charlie-fish, I decided to jump into Dynamoose and modify the code to pass the model into the attribute's set function. However, I discovered that the model is already being passed to the attribute's default function. So I changed my Schema to the following:
id: { type: String, hashKey: true, default: function (model) { return model.device + "" + model.type; } }
timestamp: { type: Date, rangeKey: true }
For anyone landing here on this answer, please note that the default and set functions can access the attribute options and the schema instance using this. However, both of those functions should be regular functions rather than arrow functions (so that this is bound correctly).
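Put together, a fuller sketch of such a schema (Dynamoose v1-style API, untested; the model name and the extra attribute types are illustrative):

const dynamoose = require("dynamoose");

// Sketch of the schema described above: `id` is derived from device + type
// by the `default` function (which receives the model instance), and
// `timestamp` is the range key.
const readingSchema = new dynamoose.Schema({
  id: {
    type: String,
    hashKey: true,
    // A regular function, not an arrow function, so `this` can be bound.
    default: function (model) {
      return model.device + "" + model.type;
    },
  },
  device: String,
  type: String,
  value: Number,
  timestamp: {
    type: Date,
    rangeKey: true,
  },
});

const Reading = dynamoose.model("Reading", readingSchema); // hypothetical name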
Keeping this here as an answer, but I won't accept it as the answer to my question for some time, as I want to wait for someone else to suggest a better approach.
I also want to make sure that if a value is passed in for the id field, it is not used. For this I can use set to ignore the actual incoming value, though I don't yet know how.
We want to use Riak's Links to create a doubly linked list.
The algorithm for it is quite simple, I believe (a code sketch follows the steps):
Let 'N0' be the new element to insert
Get the head of the list, including its 'next' link (N1)
Set the 'previous' of N1 to be N0.
Set the 'next' of N0 to be N1
Set the 'next' of the head of the list to be N0.
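For concreteness, here is the same sequence of steps as a TypeScript sketch over in-memory nodes; against Riak, each mutation below would be a separate read-modify-write of a keyed object, which is where the race described next comes from.

// Doubly linked list node, with `prev`/`next` holding neighbour keys.
interface ListNode {
  key: string;
  prev?: string;
  next?: string;
}

function insertAfterHead(store: Map<string, ListNode>, head: ListNode, n0: ListNode) {
  const n1 = head.next ? store.get(head.next) : undefined; // read head's 'next' (N1)
  if (n1) n1.prev = n0.key;  // set N1.prev = N0
  n0.next = n1?.key;         // set N0.next = N1
  n0.prev = head.key;
  head.next = n0.key;        // set head.next = N0
  store.set(n0.key, n0);     // persist the new node
}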
The problem that we have is that there is an obvious race condition here, because if 2 concurrent clients get the head of the list, one of the items will likely be 'lost'. Any way to avoid that?
In terms of the CAP theorem, Riak is an eventually consistent (AP) system.
Provided you set the bucket property allow_mult=true, if two concurrent clients get the head of the list and then write, you will have sibling records. On your next read you'll receive multiple values (siblings) and will then have to resolve the conflict and write the result. Given that there is no atomicity, this can lead to further conflicts under heavy write concurrency as you attempt to update the linked objects. Not impossible to resolve, but definitely tricky.
You're probably better off simply serializing the entire list into a single object. This makes your conflict resolution much simpler.
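A sketch of what that could look like: the whole ordering lives in one Riak value as an array of item keys, and when allow_mult hands you siblings you merge them with a deterministic rule. The union merge below is just one possible policy; it never drops an insert, which is exactly the failure mode the question worries about.

// Sketch: each sibling is one version of the full list of item keys.
// Merge them by union, keeping the order in which keys are first seen.
function mergeSiblings(siblings: string[][]): string[] {
  const merged: string[] = [];
  const seen = new Set<string>();
  for (const version of siblings) {
    for (const key of version) {
      if (!seen.has(key)) {
        seen.add(key);
        merged.push(key);
      }
    }
  }
  return merged; // write this back as the resolved single value
}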
Use Case:
End-User searches for something and an ArrayCollection is returned with Result objects. This is displayed in a data grid.
End-User selects a few of the search results and "moves" them over to another datagrid for use later.
End-User does another search.
PROBLEM:
Some of the search results might contain something the user already selected and moved over to the second datagrid. I want to remove these from the new search results.
How can I do this quickly, and efficiently in Flex code?
disableAutoUpdate() on both array collections
loop through the first one and, for each item of the second, remove it if it's present in the first one (or adapt the algorithm based on what you really want; I'm unsure of the exact requirement)
enableAutoUpdate() at the end.
Looping through an array collection can be quick if no events are dispatched.
As a second option, you could loop through a cheap copy made from an array, arrayCollection.source.concat(), or even a Vector if all your items are of the same type. That gives maximum speed, but you might lose in the long run since you need to convert back to an ArrayCollection at the end.
So I would stick to the first option, sketched below.
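In code, the first option looks roughly like this (a TypeScript sketch; the interface is a minimal stand-in for Flex's mx.collections.ArrayCollection, whose real disableAutoUpdate/enableAutoUpdate/contains/getItemAt/removeItemAt methods it mirrors):

interface Collection<T> {
  length: number;
  disableAutoUpdate(): void;
  enableAutoUpdate(): void;
  contains(item: T): boolean;
  getItemAt(index: number): T;
  removeItemAt(index: number): T;
}

// Remove from the new search results anything already in the second grid.
function removeAlreadySelected<T>(results: Collection<T>, selected: Collection<T>): void {
  results.disableAutoUpdate();
  selected.disableAutoUpdate();
  // Iterate backwards so removals don't shift the indexes still to visit.
  for (let i = results.length - 1; i >= 0; i--) {
    if (selected.contains(results.getItemAt(i))) {
      results.removeItemAt(i);
    }
  }
  selected.enableAutoUpdate();
  results.enableAutoUpdate();
}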
For the time being, I've implemented a hash collection (extends ArrayCollection). A hash only allows unique values, so in the end it serves my purpose, even though the UI might be confusing to the user. Will probably implement the above method at a later date. :)
I am facing this question in a new little project:
The system to be built will allow the user to add new columns to a table in the system and then maintain that data. I think there are two ways to implement this:
1) Create a few tables, including a "columns" table with "columnName", "columnValue", "datatype", etc. to store the column definitions, another table "XXColumn" to store the values of the columns (entered by the user), and use a stored procedure to query/update the column data.
2) Create the column in the table schema when the user adds a new column; maintaining the table data then works as normal.
Which way would you recommend? Or do you have another suggestion?
Some additional info: the data volume is small, and I need to create reports.
Any good recommendations would require a much better understanding of your requirements, but here are some comments on the options you mentioned, as well as some additional thoughts.
1) Entity-Attribute-Value (EAV) Design: This is the option you describe where you have a table that has columns for ColumnName, Type and Value. This option has the advantage of being able to accommodate unlimited new columns easily, but I have found it to be painful when the time comes to retrieve meaningful data back. For example, say you have rows in this EAV table for {Color, varchar}{Red, Green, Blue}, and {Size, varchar}{Small, Medium, Large}. If you want to find all the small green items, you need something like this (untested SQL of course):
SELECT *
FROM ITEMS
WHERE ITEMID IN (SELECT VLS.ITEMID
                 FROM ITEM_ATTRIBUTES ATT INNER JOIN ITEM_VALUES VLS
                      ON ATT.AttributeID = VLS.AttributeID
                 WHERE ATT.ColumnName = 'Color' AND VLS.Value = 'Green')
  AND ITEMID IN (SELECT VLS.ITEMID
                 FROM ITEM_ATTRIBUTES ATT INNER JOIN ITEM_VALUES VLS
                      ON ATT.AttributeID = VLS.AttributeID
                 WHERE ATT.ColumnName = 'Size' AND VLS.Value = 'Small')
Contrast this with having actual columns on the items table for color and size:
SELECT *
FROM ITEMS
WHERE COLOR = 'Green' AND SIZE = 'Small'
In addition, you will have a difficult time maintaining data integrity, if that is important for this app (and it is almost always important, even when you are told otherwise). In the example above, you will need to implement extra logic if "Color" should be limited to Blue, Green, and Red. You will need even more logic if certain colors only come in certain sizes (for example, blue items are only available in small and medium).
2) User-Defined Columns: Just giving the user the ability to add additional columns to the table has the advantage of making data retrieval simpler, but all the data integrity issues remain. Also, your app usually requires extra logic to deal with the unknown columns.
3) Pre-Existing Custom Columns: I have worked with a few apps, such as CRMs, that provide a dozen or more columns already in place for user definition. Basically, the designers put in columns like "Text1","Text2","Text3","Number1","Number2", etc. The users then provide header and description information for these columns, and that is what the app uses for display purposes. This model has the advantage of easy data retrieval, as well as a pre-defined DB schema which should simplify app logic. Data integrity issues remain, however. The other obvious downside is that you will run out of pre-defined columns, which is what you are usually trying to avoid with this type of solution.
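For illustration, the display side of option 3 usually amounts to a small metadata mapping like this sketch (all names here are invented): the physical columns are fixed, and the user's labels are applied at render time.

// Sketch: fixed physical columns, user-supplied labels kept as metadata.
interface CustomColumnDef {
  physical: "Text1" | "Text2" | "Text3" | "Number1" | "Number2"; // pre-created DB columns
  header: string;        // user-provided display name, e.g. "Warranty (months)"
  description?: string;  // user-provided help text
}

// Render-time mapping: show only the columns the user has claimed.
function visibleColumns(defs: CustomColumnDef[], row: Record<string, unknown>) {
  return defs.map(def => ({ header: def.header, value: row[def.physical] }));
}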
As with most design issues, there are tradeoffs to each solution. My experience has been that while many users/clients say they want solutions like these, in reality they are simply trying to ensure they don't get trapped with an app that can't grow with their needs. I have found that there are actually very few places where a design like this is needed. I can almost always create a design that addresses the expansion desires of the client without putting them into the role of database designer.
"The system to be built will allow user to add new columns to a table in the system..."
Really - that's the user story? Sounds like you've already made up your mind on the solution, to me.
Whether it's a good idea or not to allow a user to extend schemas is pretty context dependent. I'd have little problem in an admin-like, limited use way. But it'd be a horribly bad idea in a MySpace type way. I suspect your situation lies somewhere between those 2 extremes.
Extending the schema would lead to much more efficient queries, as you could add indexes and such, but it does impose some relational rules on your users. Also, the extension would (probably) lock the entire table, and concurrent edits would need to be dealt with.
If this is centrally hosted by you, I would suggest NOT allowing user-input data to change the schema of the database (i.e. drive the creation of new tables).
Rather, you may want to look into using XML fields in SQL to store variable field names and data, or a more generic table structure... this technique works pretty well as long as we're not talking about crazy amounts of data...
Is it possible you're looking at your solution sideways? It sounds like you need a mapping table (sort of like your #1). You have one table, say "objects"; a table called "properties" that holds what you're calling columns; and then a table that holds the values, so it just has object_id, property_id, and value.
To put it in a smarter way than I said it, take a look at the Entity-Attribute-Value model.