Storm real-time processing: What if it goes down? [closed]

Storm is a free and open source distributed realtime computation system. It receives streams of data and processes them. What if Storm goes down and part of the data never passes through it, leaving the calculations out of sync?
How can Storm solve this problem? If it can't, how could one solve this problem?
A similar question would be: How can I read old data that existed before Storm was added?

How can I read old data that existed before Storm was added?
The data must be stored somewhere (say, HDFS). You write a Spout which accepts data from some transport (say, JMS). Then, you would need to write replay code to read the appropriate data from HDFS, put it on a JMS channel, and Storm would deal with it. The trick is knowing how far back you need to go in the data, which is probably the responsibility of an external system, like the replay code. This replay code may consult a database, or the results of Storm's processing, whatever they may be.
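As a rough illustration of the consuming side, here is a minimal JMS spout sketch, assuming the old backtype.storm package names (org.apache.storm in later releases) and an ActiveMQ broker; the broker URL and queue name are placeholders, and the HDFS-reading replay publisher is not shown:

```java
// Hedged sketch: a spout that pulls replayed events off a JMS queue.
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import javax.jms.TextMessage;
import java.util.Map;

public class JmsSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private Connection connection;
    private MessageConsumer consumer;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        try {
            ConnectionFactory factory =
                    new org.apache.activemq.ActiveMQConnectionFactory("tcp://broker:61616");
            connection = factory.createConnection();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            consumer = session.createConsumer(session.createQueue("events"));
            connection.start();
        } catch (JMSException e) {
            throw new RuntimeException("Could not connect to JMS broker", e);
        }
    }

    @Override
    public void nextTuple() {
        try {
            Message msg = consumer.receiveNoWait(); // non-blocking poll
            if (msg instanceof TextMessage) {
                // Anchor the tuple to the JMS message ID so Storm's
                // ack/fail tracking can be tied back to the message.
                collector.emit(new Values(((TextMessage) msg).getText()),
                               msg.getJMSMessageID());
            }
        } catch (JMSException e) {
            collector.reportError(e);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("event"));
    }
}
```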
Overall, the 'what if it goes down' question depends on what type of calculations you are doing, and on whether your system deals with back pressure. In short, much of the durability of your streams is dependent on the messaging/transport mechanism that delivers to Storm.
Example: If you simply need to transform (e.g., via XSLT) individual events, then there is no real-time failure and no state issue if Storm goes down. You simply start back up and resume processing.
The system that provides your feed may need to handle the back pressure. Messaging transports like Kafka provide durable messaging and allow Storm to resume where it left off.
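For illustration, a sketch of wiring up a Kafka-backed spout with the storm-kafka module of that era (package and class names changed in later Storm releases); the ZooKeeper address, topic, and ids are placeholders:

```java
// Hedged sketch: a Kafka spout that resumes from its last recorded offset.
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class ResumableTopology {
    public static TopologyBuilder build() {
        BrokerHosts hosts = new ZkHosts("zookeeper:2181");
        // The consumer offset is checkpointed in ZooKeeper under
        // /storm-kafka/events-reader, so a restarted topology resumes
        // where the previous run left off instead of losing data.
        SpoutConfig cfg = new SpoutConfig(hosts, "events", "/storm-kafka", "events-reader");
        cfg.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(cfg));
        return builder;
    }
}
```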
The specific use case that results in "calculations would not be in sync" would need to be expounded upon to provide a better, more specific answer.

Multiplayer billiards game physics simulation [closed]

I’m building an online multiplayer billiards game and I’m struggling to settle on the best approach to multiplayer physics simulation. I have thought of three possible scenarios, each with its own advantages and disadvantages, and I would like to hear opinions from those who have either implemented something similar or have experience with multiplayer online games.
1st Scenario: Physics simulation on the clients: The player taking the shot sends the shot angle and power to the server, and the server updates all clients with these values so each can simulate the shot independently.
Advantages:
Low server overhead
Disadvantages:
Problems with synchronization. Clients must produce exactly the same simulation regardless of their frame rate. (Possibly solvable with a clever algorithm like the one described here.)
Cheating. Players can cheat by tweaking the physics engine. (Possible to detect by comparing ball positions across players at the end of the shot, but if only two players are at the table, i.e. no spectators, how do you tell who the cheater is?)
2nd Scenario:
Physics simulation on one (i.e. “master”) client (e.g. whoever takes the shot), which then broadcasts each physics step to everyone else.
Advantages:
No problems with synchronization.
Disadvantages:
1. Server overload. At each time step the “master” client has to send the coordinates of all balls to the server, and the server has to broadcast them to everyone else in the room.
2. Cheating by the “master” player is still possible.
3rd Scenario: The physics will be simulated on the server.
Advantages:
No possibility to cheat, as the simulation runs independently of the clients.
No synchronization issues; one simulation means everyone will see the same result (even if not at the same time, because of network lag).
Disadvantages:
Huge server overload. Not only will the server have to calculate physics 30/60 times every second for every table (there might be 100 tables running at the same time), it will also have to broadcast all the coordinates to everyone in the rooms.
EDIT
Some games similar to the one I’m making, in case someone is familiar with how they have overcome these issues:
http://apps.facebook.com/flash-pool/
http://www.thesnookerclub.com/download.php
http://gamezer.com/billiards/
I think that the 3rd one is the best.
But you can make it even better if you compute all the collisions and movements on the server before sending them to the clients (every collision, movement, etc.); the clients then just have to "execute" them.
If you do that, you send the information only once per shot, which greatly reduces the network load.
And as JimR wrote, you should use closed-form velocity/movement equations instead of small-step incremental simulation (like the Runge-Kutta method).
The information that the server send to the client would look like this:
Blackball hit->move from x1,y1 to x2,y2 until time t1
Collision between blackball and ball 6 at time t1
Ball 6 -> move from x3,y3 to x4,y4 until time t3
Blackball -> move from x5,y5 to x6,y6 until time t4
Collision between Ball 6 and Ball 4 at time t3
and so on, until nothing moves anymore
Also, you are likely to need a bunch of classes representing the different physics equations, and a way to serialize them to send them to the clients (Java or C# can serialize objects easily).
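As a sketch of what serializable shot-event classes along those lines might look like in Java (names and fields are illustrative, not a fixed protocol):

```java
// Hedged sketch: events the server computes once per shot; the client
// replays the list to animate the same outcome deterministically.
import java.io.Serializable;

public abstract class ShotEvent implements Serializable {
    public final double time; // simulation time at which the event applies
    protected ShotEvent(double time) { this.time = time; }
}

class MoveEvent extends ShotEvent {
    final int ballId;
    final double fromX, fromY, toX, toY; // straight-line segment of the ball's path
    MoveEvent(int ballId, double fromX, double fromY,
              double toX, double toY, double untilTime) {
        super(untilTime);
        this.ballId = ballId;
        this.fromX = fromX; this.fromY = fromY;
        this.toX = toX;     this.toY = toY;
    }
}

class CollisionEvent extends ShotEvent {
    final int ballA, ballB; // the two balls that collide at 'time'
    CollisionEvent(int ballA, int ballB, double atTime) {
        super(atTime);
        this.ballA = ballA;
        this.ballB = ballB;
    }
}
```

The server would then serialize a `List<ShotEvent>` once per shot, and each client simply interpolates ball positions between the listed events.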

Getting Started with UDK [closed]

I've been trying for a couple of days now to learn UDK, but I seem to be stuck at making that leap to understanding how everything works together. I understand the syntax, that's all well and good, and I pretty much get how classes and .ini files interact. As for the API, I have the entire reference as pretty decent Doxygen-style HTML output.
What I'm looking for is a sort of intermediate tutorial on game creation from scratch (as opposed to modding UT3 itself), more advanced than just learning language syntax, but not yet to the level of going through the API step by step. I'm looking for some guide to the structure of the internals - how GameInfo and PlayerController interact, where Pawn comes in, etc. - a way to visualize the big picture.
Does anyone have a particular favorite intermediate-level tutorial (or set of tutorials) that they used when first learning UDK?
Check out these tutorials; they were (maybe still are?) the best when I first started. I have since stopped using UDK due to lack of time, but they are really good.
http://forecourse.com/unreal-tutorials/
Strangely, I never found tutorials on that topic.
In the way things come together, there is no big difference between modding UT3 and creating a new game -- it's just easier to play around on top of UT3 code because there's content to work with.
Development/Src contains uncompiled source code. Each of the folders in there gets compiled into a .u script package for use by the editor and the game. They end up in UDKGame\Script
UDKGame has all the packages, including assets, maps, and compiled scripts.
GameInfo (or your class derived from it) is used for things central to your game. A standalone game would derive from this. The derived class does not have to be big; it's probably not a good idea to put a lot of logic here. You can and should use this class to store central properties for your game -- like what HUD class your game uses, what player controller class, etc. For example, a racing game could track the time of the race here, notify players when the race started or ended, and would also have a property like HUDType=class'Racer.RacerHUD'.
Controllers, such as PlayerController and AIController (which UTBot is derived from), are used to send instructions to Pawns. Pawns don't do anything on their own, they're more like empty shells that the controller can manipulate.
Things handled in the controllers are AI and input. Things handled in pawns are all kinds of animations for movement, taking damage, etc, anything visual.
Sorry, I don't have time for a longer answer, but I hope this helps a little bit.
PS -- What helped me A LOT was getting the game Whizzle and reading each class in that code. It does not derive from UT3 code, and it's very very small.

What kind of data should never go into session? [closed]

What kinds of data should never be kept in a session?
I really wish it was clearer what kind of session you mean. Depending on the answer, I can come up with a couple:
Passwords of any sort
Large amounts of data, especially 4 GB+ on a 32-bit OS (guaranteed out of memory if it has to be loaded into RAM)
Executable code
Raw SQL
Swear words
Things likely to get government agencies angry ("Free Tibet" in China, threats to the president in the US)
Your bank account PIN or credit card number
A rabid badger. Actually, ANY kind of badger.
If possible, store nothing in the Session. It is an unreliable way to maintain state, especially if you need to move to a web farm. Also, I believe it encourages poor design. HTTP is stateless, and web sites should be designed in a way where you assume that for any request, you could be starting over from scratch.
COM or complex objects.
This link can also be useful: ASP.NET 2.0 Performance Inspection Questions - Session State
This answer is for PHP Sessions.
If you mean $_SESSION, well, it is stored on the server's hard drive, so it is not exposed to the client the way cookies are.
However, on a shared host, it can sometimes be trivial to access session files from other websites.
I would not store anything in the session you wouldn't want anyone else on your shared host to see.
This can be a pretty subjective question. Anything that's serializable can be stored in session, technically. But there are definitely scenarios where you don't want to add things to session. Complex objects, objects that have large collections as properties, etc. All these things are serialized into byte arrays and kept in memory (for InProc Session State) and then deserialized when needed in code again. The more complex the object, the more resource intensive it can get to go back and forth.
Depending on how many users you have, you may wish to limit the number of items that go into session and perhaps use ViewState or other means of persistence. If it's truly something meant for multiple pages, then it's probably a good candidate for session. If it's only used in a page or two, then ViewState, QueryString, etc. may be better.
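To make the "small and serializable" rule concrete, here is a minimal sketch assuming the Java Servlet API (an assumption; the thread mixes PHP and ASP.NET, but the principle carries over):

```java
// Hedged sketch: keep session entries small and serializable;
// rebuild heavy objects per request instead of caching them in session.
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpSession;

public class SessionUsage {
    public void rememberUser(HttpServletRequest request, long userId) {
        HttpSession session = request.getSession();
        // Good: a small, serializable identifier. Cheap to (de)serialize
        // and safe to replicate across a web farm.
        session.setAttribute("userId", userId);

        // Bad (deliberately not done here): stuffing a whole object graph
        // into session, e.g. session.setAttribute("user", userWithOrders);
        // large collections make every request pay serialization costs.
    }

    public Long currentUserId(HttpServletRequest request) {
        return (Long) request.getSession().getAttribute("userId");
    }
}
```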
I would not put the session inside the session also!
You can store anything in Session as long as you keep the SessionMode="InProc" in the web.config. This stores any session data in the web server's memory in a user specific context.
However, if you want to scale up one day and run your web app in a farm, you will have to use another SessionMode. Then you can no longer store objects of complex types that are not serializable (unfortunately, dictionaries are a common candidate), and you will have to change your design.
DataSets: Serialising a dataset to store in session can take up an order of magnitude more memory than the dataset itself (e.g. a 1MB dataset can need 20MB to serialise/deserialise, and it does that on every request).
Controls: Storing controls (and their collections) in session means that ASP.NET can't clean them up properly at the end of the page request, leading to memory leaks.
See Tess Ferrandez's blog for other examples of things you should never put in session, along with reasons why.
Stock tips, pirated CDs, full-length movies (except "Clerks", that movie was awesome), analog information, ...
This question seems kind of vague -- I can think of countless kinds of information that shouldn't be stored in the session!

Are there specific "technical debts" that are not worth incurring? [closed]

There are (at least) two ways that technical debts make their way into projects. The first is by conscious decision. Some problems just are not worth tackling up front, so they are consciously allowed to accumulate as technical debt. The second is by ignorance. The people working on the project don't know or don't realize that they are incurring a technical debt. This question deals with the second. Are there technical debts that you let into your project that would have been trivial to keep out ("If I had only known...") but once they were embedded in the project, they became dramatically more costly?
Ignoring security problems entirely.
Cross-site scripting is one such example. It's considered harmless until you get alert('hello there!') popping up in the admin interface (if you're lucky; the script might just as easily silently copy all the data admins have access to, or serve malware to your customers).
And then you need 500 templates fixed yesterday. Hasty fixing will cause data to be double-escaped, and won't plug all the vulnerabilities.
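Escaping once, at output time rather than at storage time, is what avoids that double-escaping trap. A minimal hand-rolled helper in Java might look like the sketch below (in practice you would use your template engine's escaping; the class name is illustrative):

```java
// Hedged sketch: HTML-escape at render time only, never when storing data.
public final class Html {
    private Html() {}

    public static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```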
Storing dates in a database in local timezone. At some point, your application will be migrated to another timezone and you'll be in trouble. If you ever end up with mixed dates, you'll never be able to untangle them. Just store them in UTC.
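A small sketch of keeping the persistence boundary in UTC, using Java's java.time API (an assumption; the advice itself is language-agnostic):

```java
// Hedged sketch: normalize to UTC when storing, localize only for display.
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class Timestamps {
    // Persist this value (epoch millis or an ISO instant),
    // never a local wall-clock time.
    public static Instant toStorable(ZonedDateTime localEventTime) {
        return localEventTime.toInstant(); // an Instant is always UTC-based
    }

    // Convert back to the viewer's zone only at display time.
    public static ZonedDateTime forDisplay(Instant stored, ZoneId viewerZone) {
        return stored.atZone(viewerZone);
    }
}
```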
One example of this is running a database in a mode that does not support Unicode. It works right up until the time that you are forced to support Unicode strings in your database. The migration path is non-trivial, depending on your database.
For example, SQL Server has a fixed maximum row length in bytes, so when you convert your columns to Unicode strings (NCHAR, NVARCHAR, etc.) there may not be enough room in the table to hold the data that you already have. Now, your migration code must make a decision about truncation or you must change your table layout entirely. Either way, it's much more work than just starting with all Unicode strings.
Unit Testing -- I think that failing to write tests as you go incurs a HUGE debt that is hard to make up. Although I am a fan of TDD, I don't really care if you write your tests before or after you implement the code... just as long as you keep your tests synced with your code.
Not starting a web project off using a JavaScript framework, and hand-implementing stuff that was already available. Maintaining the hand-written JavaScript became enough of a pain that I ended up ripping it all out and redoing it with the framework.
I really struggle with this one, trying to balance YAGNI versus "I've been burned on this once too often"
My list of things I review on every application:
Localization:
Is Time Zone ever going to be important? If yes, persist date/times in UTC.
Are messages/text going to be localized? If yes, externalize messages.
Platform Independence? Pick an easily ported implementation.
Other areas where technical debt can be incurred include:
Black-Hole Data collection: Everything goes in, nothing ever goes out. (No long-term plan for archiving/deleting old data)
Failure to keep MVC or tiers cleanly separated over the application lifetime - for example, allowing too much logic to creep into the View, making adding an interface for mobile devices or web services much more costly.
I'm sure there will be others...
Scalability - in particular for data-driven business applications. I've seen it more than once: everything seems to run fine, but when the UAT environment finally gets stood up with database table sizes that approach production's, things start falling down left and right. It's easy for an online screen or batch program to run when the db is basically holding all rows in memory.
At a previous company they used and forced COM for stuff it wasn't needed for.
Another company with a C++ codebase didn't allow STL. (WTF?!)
Another project I was on made use of MFC just for the collections - No UI was involved. That was bad.
The ramifications of those decisions were, of course, not great. In two cases we had dependencies on pitiful MS technologies for no reason, and the other forced people to use worse implementations of generics and collections.
I classify these as "debt" since we had to make decisions and trade-offs later on in the projects due to the idiotic decisions up front. Most of the time we had to work around the shortcomings.
While not everyone may agree, I think that the largest contributor to technical debt is starting from the interface of any type of application and working down in the stack. I have come to learn that there is less chance of deviation from project goals by implementing a combination of TDD and DDD, because you can still develop and test core functionality with the interface becoming the icing.
Granted, it isn't a technical debt in itself, but I have found that top-down development is more of an open doorway inviting decisions that are not well thought out, all for the sake of doing something that "looks cool". Also, I understand that not everyone will agree or feel the same way about it, so your mileage may vary. Team dynamics and skills are a part of this equation as well.
The cliche is that premature optimization is the root of all evil, and this certainly is true for micro-optimization. However, completely ignoring performance at a design level in an area where it clearly will matter can be a bad idea.
Not having a cohesive design up front tends to lead to it. You can overcome it to a degree if you take the time to refactor frequently, but most people keep bashing away at an overall design that does not match their changing requirements. This may be a more general answer than what you're looking for, but it does tend to be one of the more popular causes of technical debt.

SQLite Optimization: Read-only scenario [closed]

I use SQLite for a number of applications on the desktop and PDA. Most operations are read-only, as SQLite functions as a data store for reference material in my applications.
Basically, I am looking for suggestions on improving performance in a scenario where you know the access to the data is only read-only.
Maybe via various pragma settings? etc...
SQLite performance is excellent; however, on the PDA, when you have multiple databases, I can see a small performance hit. I don't think this is a problem with SQLite, just the reality of the speed of a PDA. Having said that, maybe there are ways to improve it.
Good advice and well put. I am hoping for something more specific in telling the engine about what I am doing. For example, telling the engine there will be no multiple writes to the DB, or modifying the cache handling in some way.
However, I am glad you called attention to the "design" aspect of the database as a leading issue.
The standard database performance tips still apply:
Make sure your queries use indexes rather than full table scans
Be as selective as you can in your queries so you aren't pulling unneeded rows from the db
Select only the columns you want
sqlite3_open_v2() with the flag SQLITE_OPEN_READONLY alters the way SQLite handles opportunistic locks, but brings no real performance advantage. You could use PRAGMA cache_size if you are doing lots of reads, or, depending on the size of the db, make an in-memory copy of it using the :memory: open option.
You can call sqlite3_open_v2() with the flag SQLITE_OPEN_READONLY. I have no idea if sqlite3 actually uses that to optimize its behavior, or just as a way to set the appropriate permissions on the open call it makes to the OS.
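As a sketch of what that looks like from code, here is a read-only open plus a couple of pragmas, assuming the xerial sqlite-jdbc driver (an assumption; the thread itself discusses the C API, but the pragmas are standard SQLite):

```java
// Hedged sketch: open a SQLite database read-only and tune it for reads.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import org.sqlite.SQLiteConfig;

public class ReadOnlyDb {
    public static Connection open(String path) throws Exception {
        SQLiteConfig config = new SQLiteConfig();
        config.setReadOnly(true);               // maps to SQLITE_OPEN_READONLY
        Connection conn = DriverManager.getConnection(
                "jdbc:sqlite:" + path, config.toProperties());
        try (Statement st = conn.createStatement()) {
            st.execute("PRAGMA cache_size = -8192");  // ~8 MB page cache
            st.execute("PRAGMA temp_store = MEMORY"); // temp structures in RAM
        }
        return conn;
    }
}
```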
