Handling asynchronous edits of documents on the web - asynchronous

I am writing a Web application that has a user interface for editing data. The idea is something similar to a wiki where there are edits to chunks of text. What is the best way to handle asynchronous edits from multiple users? The situation I am considering is this:
There is a document that is version 0. User A is editing it when it is version 0. A few minutes later but before user A saves his changes, user B opens up the same document and starts editing. How should the server treat the two different edits to version 0 of the document? Also what is this problem called and where can I get more information about similar problems?

Wikipedia addresses this problem in the following way:
Assume that person A and person B are both editing the same document. Also assume person A submits their edits slightly before person B.
First the media wiki software runs a traditional diffing algorithm over both edits.
Next, the results of the diffing algorithm are used to merge the text.
If the diffing algorithm finds that there are merge conflicts (i.e. person A and B edited the same piece of text), then person B is asked to resolve the conflicts since they submitted their edits last.
Wikipedia handles merge conflicts much like conflicts in a code repository.
If you want to allow multiple people to edit a document simultaneously and in real-time (like with google wave or etherpad), then I'd recommend looking into operational transforms (aka OT). Though the OT algorithm is no harder or simpler than a traditional diffing algorithm, there is less information on it and fewer ready-made implementations.

One typical pattern is to send each user a chunk of text and also a version number indicating which version they received. The rule is that the host will only accept the first revision to the currently activer version.
That way only one person can revise each version; everyone else would be told that their version was obsolete and you can do what you want for them at that point - usually send them the current version to retry.
This only works if it's unlikely to have multiple people working on the same version. If that's likely, then you probably need to research how subversion for instance handles multiple revisions to source code.
There are also schemes for multiple people working simultaneously on the same text and being feed each others' updates - see Google Wave for one example.

Related

Static data storage on server-side

Why some data on server-side are still stored in DBC files, not in SQL-DB? In particular - spells (spells.dbc). What for?
We have a lot of bugs in spells and it's very hard to understand what's wrong with spell, but it's harder to find it spell...
Spells, Talents, achievements, etc... Are mostly found in DBC files because that is the way Blizzard did it back in the day. It's true that in 2019 this is a pretty outdated way to work indeed. Databases are getting stronger and more versatile and having hard-coded data is proving to be hard to work with. Hell, DBCs aren't really that heavy anyways and the reason why we haven't made this change yet is that... We have no other reason other than it being a task that takes a bit of time and It is monotonous to do.
We are aware that Trinity core has already made this change but they have far more contributors than we do if that serves as an excuse!
Nonetheless, this is already in our to-do list if you check the issue tracker at the main repository.
While It's true that we can't really edit DBC files because we would lose all the progress when re-extracted or lost the files, however, we can modify spells in a C++ file called SpellMgr.
There we have a function called SpellMgr::LoadDbcDataCorrections().
The main problem while doing this change is that we have to modify the core to support this change, and the function above contains a lot of corrections. Would need intense testing to make sure nothing is screwed up in the process.
In here by altering bits you can remove or add certain properties to the desired spells instead of touching the hard coded dbc files.
If you want an example, in this link, I have changed an Archimonde spell to have no cast time.
NOTE:
In this line, the commentary about damage can be miss leading but that's because I made a mistake and I haven't finished this pull request yet as of 18/04/2019.
The work has been started, notably by Kaev. I think at least 3 DBCs are now useless server side (but probably still needed client side, they are called DataBaseClient for a reason) like item.dbc.
Also, the original philosophy (for ALL cores, not just AC) was that we would not touch DBC because we don't do custom modifications, so there was no interest in having them server side.
But we wanted to change this and started to make them available directly in the DB, if you wish to help with that, it would be nice!
Why?
Because when emulation started, dbc fields were 90% unknown. So, developers created a parser for them that just required few code changes to support new fields as soon as their functionality was discovered.
Now that we've discovered 90% of required dbc fields and we've also created some great conversion tools for DBC<->SQL, it's just a matter of "effort".
SQL conversion is useful to avoid using of client data on server (you can totally overwrite them if you don't want to go against EULA) or just extends/customize them.
Here you are the issue about DBC->SQL conversion: https://github.com/azerothcore/azerothcore-wotlk/issues/584

Valuable? Fields support in REST API

I've read some topics about GraphQL, and one of the great features I like is you can specify fields you want (Client End).
I'm thinking maybe I can also add it into REST API. I look around and find there has already such specification: fetching-sparse-fieldsets
So I'm trying to add such feature in Symfony. (Especially, in FOSRestBundle+JSMSerializer).
But I'm not quite sure whether it is valuable or not. Can someone give you advice?
That is a question you should ask to your API users and your customer. It might be useful for the API user, but it has a lot of downsides:
By building it yourself, you spend a lot of time by developing, testing and maintaining this 'optional' feature. Is the customer willing to pay? (YAGNI)
As mentioned above, you must maintain the code for this feature, you cannot remove it unless you decide to release a new API version. As external packages change, your code might require an update aswell.
It can become difficult when troubleshooting. API users might be trying to retrieve a field that isn't specified in the API URL (ofcourse these issues arise after project transfers to other developers). Questions can come up why some data isn't available, but is present in API documentation.
Especially the first one in the list is incredibly important. Don't build features that the customer does not need. Personally I always return all accessible data available, even null values. The API users decide what to do with that data. Bandwidth isn't such a problem these days I guess.

Performance of large collection in Meteor 1.0.X

There has been a LOT of development in the Meteor world, and as such it's getting hard to find answers that work for current versions due to the plethora of answers you find for old, out-dated versions.
I have an app that has a LOT of data in a particular collection. By lots I mean somewhere between 10k-100k, and very potentially a lot more. Essentially it's log data, and I need to display the results in a table with no pagination (like a tail). In researching ways to optimize large collections I keep running into things like this that seem to be for older versions of Meteor.
So, as I see it my options are:
Use fast-render plugin to display the page prior to the subscription (at least this is my understand on how it works).
Use some sort of progressive publish function, where it loads limited more relevant bits of data first, then progressively loads the remaining data by expanding the window/limit (not sure if this would cause heavier load on the server, though). There seems to have been a "progressive publish" plugin, but it doesn't seem to be under active development any longer.
Optimize the lookups via indexing (How do you specify that when creating the collection???)
Profiling and optimizing the template further (not sure how).
Some other method I haven't thought of yet...
Some combination of all-the-above.
What is the proper approach by which to publish and render lots of data in this way?
I'm going to assume that "optimize" means reduced query time.
Always start with the biggest bang for your buck.
Unless you're publishing the entire collection, or query on the _id, then you want to create an index using _ensureIndex. Get more info on this on the mongodb website or by searching other questions. http://docs.mongodb.org/manual/reference/method/db.collection.ensureIndex/
Second, limit the fields to just the info you need. eg {fields: {a:1, b:1}}. http://docs.meteor.com/#/full/fieldspecifiers
Third, don't sort.
If this still isn't good enough, make another question with schema & query details & the desired UI so we can better understand the reactivity and why you can't use some form of pagination.

Lotus Smartsuite to "something newer"

I shall try and keep my scenario as brief as possible and to the point.
The office I’m currently working for uses Lotus Smartsuite on Windows 98 / XP, using lots of Lotus Script to tie together Lotus 123 and Lotus Word Pro documents. They also make heavy use of the Lotus Object Linking functions. I shall describe its behaviour below:
You can fill rows and columns in a 123 Spreadsheet with data galore, style it and format it any way you like and define it as a range (nothing unique here). However, you can then copy that range and paste it as a link in a Lotus Word Pro document. This link is then categorised by its range name, so expanding the range back in the 123 file causes the table in the Word Pro Document to expand. This link also carries with it all the formatting and styling of the cells in the 123 Spreadsheet. As I imagine you are now aware, this link is completely live, you can double click anywhere in the object and it opens up the 123 file for editing, and all changes go backward and forward between the two documents. Most of the data retrieved from testing equipment is stored in these 123 spreadsheets and then parts of that are linked into a final Lotus Word Pro report document sent to the customer.
Note: Just to be clear, this is NOT the same as a DDE link in Open Office, which seems to allow for copying of a non-defined range of cells to be imported into a document where all formatting is lost and editing back and forth is not straight forward. It also behaves differently to an OLE object, which seems to only import the entire Spreadsheet rather than a small subsection of it.
However, in recent years, support this older software (Lotus) is becoming more difficult, especially with regards to sending customers documents (Lotus word Pro files are generally unsupported by more modern Office Tools) and technical support for Lotus Smartsuite seems to be practically non-existent these days. Also, with the fear of on going development in a scripting language no-longer being practised by mainstream IT technicians, on-going development and support seems futile. Once the guys who wrote it move on to other things, we will be left with spaghetti script in a language nobody can help us with.
So, we have this goal of "modernising" our IT system by the end of the year. Linux is becoming a very viable option too (No doubt Debian or a derivative), but Open Office doesn't seem to have the linking capability mentioned above. The reason this linking is so important is because the veterans of the office are so used to working this way - storing data in the spreadsheet, linking back to it later in their Word Pro documents, etc. I think they are more than keen to keep this practice going and we have found no equivalent of it in modern office tools (as was requested of me). I can see, as a software engineer (fluent in many languages), how this practice is not the safest or best way of using and storing data (databases spring to mind), but I was wondering if someone could give me a few other good reasons as to why this is bad practice in the work place (I was always in the belief that you should keep your data away from your reporting and formatting, the two should never be entwined - this looks like spreadsheet hell to me) ... or why this is a good thing to keep doing!?
So, for those of you still with me, I guess what I am asking is:
Is this practice of storing data, formatting it in spreadsheets and importing that directly back and forth between word documents good or bad, and what can be done about it? I guess I'll need to prove my point in case either way for this.
Are there ANY modern alternatives to this linking method (regardless of weather it is good or bad practice or not) out there for Linux or Windows? This link MUST carry formatting as well as dynamic range sizes (DDE links don't seem to be the answer).
What would your solution be if you had to start from scratch? Store everything in databases and use SQL to simply ask for the data you need in your word documents? How would you do this? What software would you use?
Any help with this scenario would be more than helpful, or if you know anywhere I should go to ask for advice, that would be appreciated too.
Thank-you for reading!
My suggestion is to first take a step back. What is the benefit to the way things are done now? Is it just a habit that is tough to break? Is there any reason the documents and spreadsheets need to be maintained and linked the way they are, or is it just a requirement because 'that's how it was done before'?
If you can remove that requirement, you have a lot more options and you're building a system that's easier to understand and maintain.
Regarding question 1, I believe there's nothing wrong with storing data in spreadsheets, especially if the end-users need to create and maintain them and development staff is limited. Some questions are whether that data needs to be secured, is related between spreadsheets, is duplicated across the company, or should be shared in a better way across the company. If any of those are true then a centralized database would make more sense. Personally I'd want any valuable data safely stored in a database where it can be managed, access to it can be controlled, it can be easily backed-up, etc.
Regarding question 2, you can do the same thing in Microsoft Office. You can either link the documents, so that the data stays in the source excel doc but appears in the word doc, or you can embed the excel spreadsheet within the word doc.
You might want to look at Microsoft Access for storing the data and generating reports. Or you could build an application using a relational database back-end and reporting front-end. The possibilities are wide-open. It really depends on where the expertise lies within the company.
If it were me I'd probably use a SQL Express back-end (it's free) and a custom ASP.NET MVC application for generating the reports, but that's just where my expertise lies.

Best Practices for Passing Data Between Pages

The Problem
In the stack that we re-use between projects, we are putting a little bit too much data in the session for passing data between pages. This was good in theory because it prevents tampering, replay attacks, and so on, but it creates as many problems as it solves.
Session loss itself is an issue, although it's mostly handled by implementing Session State Server (or by using SQL Server). More importantly, it's tricky to make the back button work correctly, and it's also extra work to create a situation where a user can, say, open the same screen in three tabs to work on different records.
And that's just the tip of the iceberg.
There are workarounds for most of these issues, but as I grind away, all this friction gives me the feeling that passing data between pages using session is the wrong direction.
What I really want to do here is come up with a best practice that my shop can use all the time for passing data between pages, and then, for new apps, replace key parts of our stack that currently rely on Session.
It would also be nice if the final solution did not result in mountains of boilerplate plumbing code.
Proposed Solutions
Session
As mentioned above, leaning heavily on Session seems like a good idea, but it breaks the back button and causes some other problems.
There may be ways to get around all the problems, but it seems like a lot of extra work.
One thing that's very nice about using session is the fact that tampering is just not an issue. Compared to passing everything via the unencrypted QueryString, you end up writing much less guard code.
Cross-Page Posting
In truth I've barely considered this option. I have a problem with how tightly coupled it makes the pages -- if I start doing PreviousPage.FindControl("SomeTextBox"), that seems like a maintenance problem if I ever want to get to this page from another page that maybe does not have a control called SomeTextBox.
It seems limited in other ways as well. Maybe I want to get to the page via a link, for instance.
QueryString
I'm currently leaning towards this strategy, like in the olden days. But I probably want my QueryString to be encrypted to make it harder to tamper with, and I would like to handle the problem of replay attacks as well.
On 4 guys from Rolla, there's an article about this.
However, it should be possible to create an HttpModule that takes care of all this and removes all the encryption sausage-making from the page. Sure enough, Mads Kristensen has an article where he released one. However, the comments make it sound like it has problems with extremely common scenarios.
Other Options
Of course this is not an exaustive look at the options, but rather the main options I'm considering. This link contains a more complete list. The ones I didn't mention such as Cookies and the Cache not appropriate for the purpose of passing data between pages.
In Closing...
So, how are you handling the problem of passing data between pages? What hidden gotchas did you have to work around, and are there any pre-existing tools around this that solve them all flawlessly? Do you feel like you've got a solution that you're completely happy with?
Thanks in advance!
Update: Just in case I'm not being clear enough, by 'passing data between pages' I'm talking about, for instance, passing a CustomerID key from a CustomerSearch.aspx page to Customers.aspx, where the Customer will be opened and editing can occur.
First, the problems with which you are dealing relate to handling state in a state-less environment. The struggles you are having are not new and it is probably one of the things that makes web development harder than windows development or the development of an executable.
With respect to web development, you have five choices, as far as I'm aware, for handling user-specific state which can all be used in combination with each other. You will find that no one solution works for everything. Instead, you need to determine when to use each solution:
Query string - Query strings are good for passing pointers to data (e.g. primary key values) or state values. Query strings by themselves should not be assumed to be secure even if encrypted because of replay. In addition, some browsers have a limit on the length of the url. However, query strings have some advantages such as that they can be bookmarked and emailed to people and are inherently stateless if not used with anything else.
Cookies - Cookies are good for storing very tiny amounts of information for a particular user. The problem is that cookies also have a size limitation after which it will simply truncate the data so you have to be careful with putting custom data in a cookie. In addition, users can kill cookies or stop their use (although that would prevent use of standard Session as well). Similar to query strings, cookies are better, IMO, for pointers to data than for the data itself unless the data is tiny.
Form data - Form data can take quite a bit of information however at the cost of post times and in some cases reload times. ASP.NET's ViewState uses hidden form variables to maintain information. Passing data between pages using something like ViewState has the advantage of working nicer with the back button but can easily create ginormous pages which slow down the experience for the user. In general, ASP.NET model does not work on cross page posting (although it is possible) but instead works on posts back to the same page and from there navigating to the next page.
Session - Session is good for information that relates to a process with which the user is progressing or for general settings. You can store quite a bit of information into session at the cost of server memory or load times from the databases. Conceptually, Session works by loading the entire wad of data for the user all at once either from memory or from a state server. That means that if you have a very large set of data you probably do not want to put it into session. Session can create some back button problems which must be weighed against what the user is actually trying to accomplish. In general you will find that the back button can be the bane of the web developer.
Database - The last solution (which again can be used in combination with others) is that you store the information in the database in its appropriate schema with a column that indicates the state of the item. For example, if you were handling the creation of an order, you could store the order in the Order table with a "state" column that determines whether it was a real order or not. You would store the order identifier in the query string or session. The web site would continue to write data into the table to update the various parts and child items until eventually the user is able to declare that they are done and the order's state is marked as being a real order. This can complicate reports and queries in that they all need to differentiate "real" items from ones that are in process.
One of the items mentioned in your later link was Application Cache. I wouldn't consider this to be user-specific since it is application wide. (It can obviously be shoe-horned into being user-specific but I wouldn't recommend that either). I've never played with storing data in the HttpContext outside of passing it to a handler or module but I'd be skeptical that it was any different than the above mentioned solutions.
In general, there is no one solution to rule them all. The best approach is to assume on each page that the user could have navigated to that page from anywhere (as opposed to assuming they got there by using a link on another page). If you do that, back button issues become easier to handle (although still a pain). In my development, I use the first four extensively and on occasion resort to the last solution when the need calls for it.
Alright, so I want to preface my answer with this; Thomas clearly has the most accurate and comprehensive answer so far for people starting fresh. This answer isn't in the same vein at all. My answer is coming from a "business developer's" standpoint. As we all know too well; sometimes it's just not feasible to spend money re-writing something that already exists and "works"... at least not all in one shot. Sometimes it's best to implement a solution which will let you migrate to a better alternative over time.
The only thing I'd say Thomas is missing is; client-side javascript state. Where I work we've found customers are coming to expect "Web 2.0"-type applications more and more. We've also found these sorts of applications typically result in much higher user satisfaction. With a little practice, and the help of some really great javascript libraries like jQuery (we've even started using GWT and found it to be AWESOME) communicating with JSON-based REST services implemented in WCF can be trivial. This approach also provides a very nice way to start moving towards a SOA-based architecture, and clean separation of UI and business logic.
But I digress.
It sounds to me as though you already have an application, and you've already stretched the limits of ASP.NET's built-in session state management. So... here's my suggestion (assuming you've already tried ASP.NET's out-of-process session management, which scales signifigantly better than the in-process/on-box session management, and it sounds like you have because you mentioned it); NCache.
NCache provides you with a "drop-in" replacement for ASP.NET's session management options. It's super easy to implement, and could "band-aid" your application more than well enough to get you through - without any significant investment in refactoring your existing codebase immediately.
You can use the extra time and money to start reducing your technical debt by focusing new development on things with immediate business-value - using a new approach (such as any of the alternatives offered in the other answers, or mine).
Just my thoughts.
Several months later, I thought I would update this question with the technique I ended up going with, since it has worked out so well.
After playing with more involved session state handling (which resulted in a lot of broken back buttons and so on) I ended up rolling my own code to handle encrypted QueryStrings. It's been a huge win -- all of my problem scenarios (back button, multiple tabs open at the same time, lost session state, etc) are solved and the complexity is minimal since the usage is very familiar.
This is still not a magic bullet for everything but I think it's good for about 90% of the scenarios you run into.
Details
I built a class called CorePage that inherits from Page. It has methods called SecureRequest and SecureRedirect.
So you might call:
SecureRedirect(String.Format("Orders.aspx?ClientID={0}&OrderID={1}, ClientID, OrderID)
CorePage parses out the QueryString and encrypts it into a QueryString variable called CoreSecure. So the actual request looks like this:
Orders.aspx?CoreSecure=1IHXaPzUCYrdmWPkkkuThEes%2fIs4l6grKaznFGAeDDI%3d
If available, the currently logged in UserID is added to the encryption key, so replay attacks are not as much of a problem.
From there, you can call:
X = SecureRequest("ClientID")
Conclusion
Everything works seamlessly, using familiar syntax.
Over the last several months I've also adapted this code to work with edge cases, such as hyperlinks that trigger a download - sometimes you need to generate a hyperlink on the client that has a secure QueryString. That works really well.
Let me know if you would like to see this code and I will put it up somewhere.
One last thought: it's weird to accept my own answer over some of the very thoughtful posts other people put on here, but this really does seem to be the ultimate answer to my problem. Thanks to everyone who helped get me there.
After going through all the above scenarios and answers and this link Data pasing methods My final advice would be :
COOKIES for:
ENCRYPT[userId's]
ENCRYPT[productId]
ENCRYPT[xyzIds..]
ENCRYPT[etc..]
DATABASE for:
datasets BY COOKIE ID
datatables BY COOKIE ID
all other large chunks BY COOKIE ID
My advise also depends on the below statistics and this link details Data pasing methods :
I would never do this. I have never had any issues storing all session data in the database, loading it based on the users cookie. It's a session as far as anything is concerned, but I maintain control over it. Don't give up control of your session data to your web server...
With a little work, you can support sub sessions, and allow multi-tasking in different tabs/windows.
As a starting point, I find using the critical data elements, such as a Customer ID, best put into the query string for processing. You can easily track/filter bad data coming off of these elements, and it also allows for some integration with e-mail or other related sites/applications.
In a previous application, the only way to view an employee or a request record involving them was to log into the application, do a search for the employee or do a search for recent records to find the record in question. This became problematic and a big time sink when somebody from a related department needed to do a simple view on records for auditing purposes.
In the rewrite, I made both the employee Id, and request Ids available through a basic URL of "ViewEmployee.aspx?Id=XXX" and "ViewRequest.aspx?Id=XXX". The application was setup to A) filter out bad Ids and B) authenticate and authorize the user before allowing them to these pages. What this allowed the primarily application users to do was to send simple e-mails to the auditors with a URL in the e-mail. When they were in a big hurry, they were in their bulk processing time, they were able to simply click down a list of URLs and do the appropriate processing.
Other session related data, such as modification dates and maintaining the "state" of the user's interaction with the application gets a little more complex, but hopefully this provides a starting poing for you.

Resources