Riak TS use case vs other TSDBs

This is a proof of concept, and I am curious about others' experiences with Riak TS as I evaluate it.
I am working on a mobile app, part of whose purpose is to display graphs/charts of various data. The data relates to commercial printers, the jobs that pass through them, and pre-processing information, and includes snapshots of various metrics. It is currently only available in real time, so I am looking at a TSDB implementation for analyzing historical data.
I would use Riak TS to collect time series data at roughly 30-60 second intervals and use the data to display:
Number of jobs printed by hour/shift/day/week/etc.
Ink usage by hour/shift/day/etc.
Various other sums/averages/series snapshots of the data over a specific time span.
What are some things I should consider when deciding whether to use Riak TS for this, and what potential drawbacks should I think about?
What level of Erlang is required to use Riak for a basic proof-of-concept setup of this case? I am pretty comfortable with Python and JavaScript, and it looks like Riak can be used from those languages, but I probably don't have time to learn Erlang for the setup of this project.
Is there a noticeable difference among the Python, Node.js, and HTTP interfaces: is one easier to use, faster, or more complete than the others? I have worked with some cloud services where certain interfaces had missing/buggy/slow features, and I would like to plan on using the best one. If that is Java, C#, or Go, I would be interested in that information too.
What other open source implementations besides Riak TS should I explore?

At first blush this sounds like a good potential use case for Riak TS. Are there drawbacks to using TS vs. something else? Maybe. The one thing I would note is that you didn't say how much data you will be dealing with. Riak TS is designed to be clustered from the beginning, and the recommendation is to start with a 5-node cluster for high-availability reasons. You can start with a single node and scale out as needed, but you lose some of the advantages of the TS platform by doing that.
I will also point out that TS was open sourced only recently and may not yet have all of the features of its competitors, but the team is working on frequent releases to add new features (full disclosure: I work for Basho).
On to Erlang: you need to know zero Erlang to use TS. For what you need to do, there is no reason to learn it.
The Python client for Riak TS is excellent. I have used it and the Java client extensively, and I would guess that the other clients are also quite good, because they are written and maintained by the same group of engineers and client software is their specialty.
I would recommend using a client (whether Python, Node, Java, etc.) over the HTTP API: it will likely be easier for you, and performance will be better because the clients use protocol buffers and/or TTB rather than HTTP.
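To give a feel for what that looks like, here is a minimal sketch using the Basho Python client over protocol buffers. The table name, schema, and values are invented for the printer scenario described above, and the exact TS calls should be checked against the current client documentation:

    from riak import RiakClient

    # Connect over protocol buffers (the default PB port is 8087).
    client = RiakClient(host='127.0.0.1', pb_port=8087)

    # Assumes a table was created beforehand with something like:
    #   CREATE TABLE printer_metrics (
    #       printer_id   VARCHAR   NOT NULL,
    #       time         TIMESTAMP NOT NULL,
    #       jobs_printed SINT64,
    #       ink_used     DOUBLE,
    #       PRIMARY KEY ((printer_id, QUANTUM(time, 1, 'h')), printer_id, time))

    # Write one 30-60 second snapshot as a row (time in ms since the epoch).
    table = client.table('printer_metrics')
    table.new([['printer-42', 1467200000000, 12, 3.5]]).store()

    # Query a time range; the WHERE clause must pin down the partition key.
    result = client.ts_query('printer_metrics', """
        SELECT * FROM printer_metrics
        WHERE printer_id = 'printer-42'
          AND time > 1467190000000 AND time < 1467210000000""")
    for row in result.rows:
        print(row)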
Other databases you should try? You mention TSDB in the title of this question; my experience is that TSDB is much harder to get up and running with. InfluxDB is probably the most popular time-series-specific database out there right now. I don't have personal experience with it, but judging by its popularity I would guess it is pretty good.
Your use case sounds pretty interesting (I used to work in the printing industry), so if you have any other questions I can help with, please let me know.

Related

Modelling many-to-many relationships in Meteor

Hi, I am building a small app to get used to Meteor (and Mongo). Something that is bothering me is the data modelling aspect, specifically the best way to model a many-to-many relationship. I have read in the Mongo docs that a doc should not be embedded in another doc if you expect it to grow while the original doc remains fairly static.
In my test app, students can register for courses. From the Mongo perspective it makes sense to include the students as an embedded doc in the course, as each course will have a limited number of students; the other way round, over time, a student could theoretically join an unlimited number of courses.
Then there is the Meteor aspect. I have read that a lot of Meteor's features are aimed at separate collections: DDP works at the document level, so any change in the student array would cause the entire course doc to be resent to every browser, and things like the each Spacebars helper work with Mongo cursors but not with arrays, and so on.
Has anyone dealt with a similar situation? If so, could you explain what approach you took and any drawbacks you had to deal with? Thanks.
See this article: https://www.discovermeteor.com/blog/reactive-joins-in-meteor/
And test how well your candidate solutions perform with https://kadira.io/
Better yet, use the guide:
http://guide.meteor.com/data-loading.html#publishing-relations
The Meteor team tames (or hides!) the JavaScript monster to an amazing extent. By using their conventions, you get a ton of much-used functionality "for free", out of the box: things that are usually re-invented over and over again, such as accounts, OAuth, live data across clients, and a standard live-data protocol.
But very soon... you need features not in the box. Wow... look at all the choices. Wait a minute, this is the same monster you were fighting before Meteor!
So use the official Meteor Guide. It recommends the wisest ways to extend the functionality of your app when you make these choices.
Since they know how they have "hidden the monster", they know how to keep avoiding the monster when you extend.

What cache strategy do I need in this case?

I have what I consider to be a fairly simple application. A service returns some data based on another piece of data. A simple example: given a state name, the service returns the capital city.
All the data resides in a SQL Server 2008 database. The majority of this "static" data will rarely change. It will occasionally need to be updated, and when it does, I have no problem restarting the application to refresh the cache, if one is implemented.
Some data, which is more "dynamic", will be kept in the same database. This data includes contacts, statistics, etc. and will change more frequently (anywhere from hourly to daily to weekly). This data will be linked to the static data above via foreign keys (just like a SQL JOIN).
My question is: what exactly am I trying to implement here, and how do I get started? I know the static data should be cached, but I don't know where to start. I tried searching, but came up with so much material that I'm not sure where to begin. Recommendations for tutorials would also be appreciated.
You don't need to cache anything until you have a performance problem. Only once you have a noticeable problem, and have measured your application tiers to determine that the database is in fact the bottleneck (which it rarely is), should you start looking into caching data. Caching is always a tradeoff: memory vs. CPU vs. real-time data availability. There is no reason to make your application more complicated than it needs to be.
An extremely simple "win" (I assume you're using WCF) would be to use the declarative attribute-based caching mechanism built into the framework. It's easy to set up and manage, but you need to analyze your usage scenarios to make sure it's applied in the right places to really benefit from it. This article is a good starting point.
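Independent of the WCF specifics, the pattern being described is essentially a read-through cache with an expiry on the "static" lookups. Here is a minimal, language-agnostic sketch of the idea in Python (query_capital is a hypothetical stand-in for your real data-access call):

    import time

    class ReadThroughCache:
        """Serve lookups from memory for `ttl` seconds, then re-read the DB."""

        def __init__(self, load_from_db, ttl=3600):
            self.load_from_db = load_from_db   # e.g. a function running the SQL query
            self.ttl = ttl
            self._store = {}                   # key -> (value, time it was cached)

        def get(self, key):
            hit = self._store.get(key)
            if hit is not None and time.time() - hit[1] < self.ttl:
                return hit[0]                  # fresh enough: no DB round trip
            value = self.load_from_db(key)     # miss or stale: hit the database
            self._store[key] = (value, time.time())
            return value

    # Usage (query_capital is hypothetical):
    # capitals = ReadThroughCache(query_capital, ttl=24 * 3600)
    # capitals.get('Ohio')   # first call reads the DB; repeat calls do not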
Beyond that, I'd recommend looking into one of the many WCF books that deal with higher-level concepts like caching and try to figure out if their implementation patterns are applicable to your design.

How much unity across different teams?

Our company builds several (Java) applications that loosely communicate with each other via web services, remote EJB, and occasionally via shared data in a DB.
Each of those applications is built and maintained by its own team: 1 or 2 people for the smaller apps, and almost 10 for the largest one. The total number of developers is approximately 25 FTE.
One problem we're facing is that there are some big egos among the teams. Historically, the team behind the largest app has set up a code convention and general guidelines. For instance: our IDE is NetBeans, we use Hg for SCM, we build with Ant, and the emphasis is on first using as much of Java EE as possible, then an external library if that doesn't suffice, and writing something yourself only as a last resort. Writing yet another logging framework, ORM, CMS, or web framework is pretty much not allowed under these guidelines.
Now some of the smaller teams go against this: they use Eclipse, Git, and Maven, and take the approach of writing as much as possible themselves, only looking at existing solutions if time is short or they "just don't feel like writing it themselves". Where the main team uses log4j, one of the smaller teams has just started writing its own logging framework.
There have been talks about getting all teams to adhere to the same standards, but these have been "troublesome" at best.
Now the big question I'd like to ask: does it actually matter that different teams do things differently? As long as each separate app implements its requirements and provides the agreed-upon interfaces, should we really force everyone to use Hg, Ant, the same code conventions, and so on?
There is not much harm in letting each team use the technologies that work best for them. In fact, if you restrict teams to the "standard" way of doing things, you'll stifle innovation and hurt morale.
But you don't want things to diverge too much, and there are a few things you can do to keep libraries and tools from getting out of hand. The first is to rotate members through the teams regularly to cross-pollinate ideas; this way the best ideas spread through the teams.
You can also enforce a "rule of 3", which simply says it is OK to introduce a second library, tool, logging approach, or whatever, but as soon as you want to introduce a third one, you have to remove one of the first two. In other words, it is OK to have two competing logging frameworks, but if there are three, choose one to kill.
A third idea is to have developers give regular presentations to the entire developer group demonstrating the pros and cons of each idea or approach. Encourage lots of discussion and constructive criticism; the purpose is to try many things and let everyone find the best way as a group.
Finally, Management 3.0 discusses how teams make decisions in much more depth. Well worth the read.

MUD Programming questions

I used to play a MUD based on the Smaug codebase. It was highly customized, but the same at the core. I have the source code for this MUD and am interested in writing my own (just for a fun project). I've got some questions, though, mostly about design aspects. Maybe someone can give me a hand?
What language should I use? Interpreted or compiled? Does it make a difference? SMAUG is written in C. I am comfortable with a lot of languages and have no problem learning more.
Is there a particular approach I should follow to avoid hindering performance? Object-oriented, functional, etc.?
What medium should I use for storing data? Flat files (this is what SMAUG uses) or something like SQLite? What are the performance pros/cons of each?
Are there any guides on how to get started on a project like this?
I want it to scale to 50 players online at a time with no decrease in performance. If I used Ruby 1.8 (very slow), would it make a difference compared to Python 3.1 (faster) or compiled C/C++?
If anyone can lend a hand and give some info or advice, I'd be eternally grateful.
I'll give this a shot:
In 2009, for a 50-player game, it doesn't matter. You may want to pick a language whose profiling tools you are familiar with if you want to grow it further, but since RAM is so cheap nowadays, the constraints that drove the early LPMud (which I have experience with) and DikuMUD (from which your Smaug is derived) no longer apply. (LPMud could handle ~10-15 players on a machine with 8 MB of RAM.)
The programming style doesn't necessarily lead to performance difficulties: large sites like Amazon's "obidos" web server are written in C, but just-as-large sites like the original Yahoo! Stores were written in Lisp, Stack Overflow is written in ASP.NET, etc. I'd personally use C, but many people would call me a sadist.
Flat files are kind of pointless in this day and age for most data storage; there are specific exceptions (large mail servers sometimes use "maildir", which is structured flat files, for example). At the size of your game you likely won't run into serious slowness from data-retrieval delays, but data integrity in case of a crash will probably make the most convincing argument.
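To make the crash-integrity point concrete, here is a hedged sketch of what something like SQLite buys you over a hand-rolled flat file: every save runs inside a transaction, so a crash mid-write cannot leave a half-saved player record (the schema is invented for illustration):

    import sqlite3

    conn = sqlite3.connect('mud.db')
    conn.execute("""CREATE TABLE IF NOT EXISTS players (
                        name TEXT PRIMARY KEY,
                        hp   INTEGER,
                        room INTEGER)""")

    def save_player(name, hp, room):
        # `with conn` commits on success and rolls back on any error, so the
        # database never ends up holding a partially written record.
        with conn:
            conn.execute("INSERT OR REPLACE INTO players VALUES (?, ?, ?)",
                         (name, hp, room))

    save_player('grog', 42, 3001)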
I don't know of any guide, but what I'd do is get the game started as a dumb chat server first: make sure users can log in and do something (take their input and dump it to all other users). Then build that up to allow specific logins, so you start facing the challenges of username/password handling and of setting, storing, and retrieving user options. Then start adding the game-driver elements (get tic-tac-toe working in-game), then go a little more complex (get a five-room setup working, with objects you can pick up, drop, and bash each other with), then add some non-player characters, and THEN worry about slurping in the Diku-derived Smaug castles etc. and working with them. :)
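For the "dumb chat server" stage, something like the following is enough to start with; this is only a sketch using the Python standard library, with everything beyond take-input-and-dump-it-to-everyone left out:

    import select
    import socket

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(('', 4000))
    server.listen(5)
    clients = []

    while True:
        # Block until the listening socket or a client has something for us.
        readable, _, _ = select.select([server] + clients, [], [])
        for sock in readable:
            if sock is server:
                conn, _addr = server.accept()   # new player connected
                clients.append(conn)
            else:
                data = sock.recv(1024)
                if not data:                    # client disconnected
                    clients.remove(sock)
                    sock.close()
                    continue
                for other in clients:           # dump input to all other users
                    if other is not sock:
                        other.sendall(data)

Connect with telnet to try it; from here you can bolt on login handling, commands, and rooms.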
This is a bit off the cuff, I'm sure there are dissenting opinions. :) Good luck!
This is a text-based game, right? In that case, with current hardware, it seems all you would have to worry about is not accidentally creating an O(n**2) algorithm. Even that probably wouldn't be too bad with 50 users.

How to Convince Programming Team to Let Go of Old Ways?

This is more of a business-oriented programming question that I can't seem to figure out how to resolve. I work with a team of programmers who have been working with BASIC for over 20 years. I was brought in to help write the same software in .NET, only with updates and modern practices. The problem is that I can't seem to get any of the other 3 team members (all BASIC programmers, though one does .NET now as well) to understand how to correctly design a relational database. Here's the thing they won't understand:
We basically have a transaction that keeps track of a customer's tag information, and we need to be able to track both current and past transactions. The old system used a flat-file database with one table containing records for each customer's current transaction, and another table containing all of the customer's previous transactions along with important money information. To prevent redundancy, they would overwrite the current transaction with the history transactions (the history file was updated first, then the current one). This is totally unnecessary, since you only need one transaction table, but neither my supervisor nor my other two co-workers can seem to understand this. How exactly can I convince them to see the light, so that we won't have to do ridiculous amounts of work and end up hitting the database too many times? Thanks for the input!
Firstly, I must admit it's not absolutely clear to me from your description what the data structures and logic flows in the existing system actually are. This suggests that perhaps you are not making yourself clear to your co-workers either, so one of your priorities must be to be able to explain, verbally or (preferably) in writing and diagrams, both the current situation and the proposed replacement. Please take this as an observation rather than criticism of your question.
Secondly, I do find it quite remarkable that programmers with 20 years of experience do not understand relational databases and transactions. Flat-file coding went out of the mainstream a very long time ago: I first handled relational databases in a commercial setting back in 1988, and they were pretty commonplace by the mid-90s. What sector and product type are you working on? It sounds possible that you are dealing with some sort of embedded or otherwise "unusual" system, in which case you need to make sure you don't have a communication issue and aren't overlooking a large elephant that hasn't been pointed out to you; you wouldn't be the first "consultant" brought into a team who has been set up in some manner by not being fed the appropriate information. That said, such archaic shops do still exist: one of my current clients' systems interfaces to a flat-file-based system coded in COBOL, and yes, it is hell to manage ;-)
Finally, if you are completely sure of your ground and you are faced with a team that won't take your recommendations on board (and demonstration code is a good idea if you can spare the time), then you'll probably have to accept the decision gracefully and move on. In this position I would attempt to abstract the issue away: can the database updates be moved into stored procedures, for example, so that the code updating both tables lives in the SP and can be modified at a later date to move to your schema without a corresponding application change? Make sure your arguments are well documented and recorded so you can revisit them later should the opportunity arise.
You will not be the first coder who has had to implement a sub-optimal solution because of office politics; use it as a learning experience for your own personal development in handling such situations, and console yourself with the thought that you'll get paid for the additional work. Often the deciding factor in such arguments is not the logic but the "weight of reputation" you bring to the table. Having just been brought in, it sounds like you don't have much of that sort of leverage with this team, so you may have to earn it by excelling at implementing what they do agree to before you have sufficient standing in subsequent cases; you need to be modded up first!
Sometimes you can't.
If you read some XP books, they often say that one of your biggest hurdles will be convincing your team to abandon what they have always done.
Generally they recommend letting people who can't adapt move to other projects (or just letting them go).
Code reviews might help in your case. Mandatory code reviews of every line of code are not unheard of.
Sometimes the best argument is an example. I'd write a prototype (or a replacement, if it's not too much work); with an example to examine, it will be easier to see the pros and cons of a relational database.
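For instance, a prototype of the single-table design can be tiny. This sketch uses sqlite3 purely for illustration, and the column names are guesses at the domain:

    import sqlite3

    conn = sqlite3.connect(':memory:')
    conn.executescript("""
        -- One table holds every transaction; there is no copy-to-history step.
        CREATE TABLE transactions (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL,
            tag_info    TEXT,
            amount      REAL,
            created_at  TEXT NOT NULL
        );
        CREATE INDEX idx_tx_customer ON transactions (customer_id, created_at);
    """)

    # The "current" transaction per customer is a query, not a second table:
    row = conn.execute("""
        SELECT * FROM transactions
        WHERE customer_id = ?
        ORDER BY created_at DESC
        LIMIT 1""", (42,)).fetchone()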
As an aside, flat-file databases have their places since they are so much easier to "administer" than a true relational database. Keep an open mind. ;-)
I think you may have to lead by example - when people see that the "new" way is less work they will adopt it (as long as you don't rub their noses in it).
I would also ask yourself whether the old design is actually causing a problem or whether it is just aesthetically annoying. It's important to pick your battles - if the old design isn't causing a performance problem or making the system hard to maintain you may want to leave the old design alone.
Finally, if you do leave the old design in place, try and abstract the interface between your new code and the old database so if you do persuade your co-workers to improve the design later you can drop the new schema in without having to change anything else.
It is difficult to extract a whole lot except general frustration from the original question.
Yes, there are a lot of techniques and habits long-timers pick up over time that can be useless and even costly in light of technology changes. Some things that made sense when processing power, memory, and even disk space were expensive can be foolish attempts at optimization now. It is also very much the case that people accumulate bad habits and bad programming patterns over time.
You have to be careful though.
Sometimes there are good reasons for the things those old timers do. Sadly, they may not even be able to verbalize the "why" - if they even know why anymore.
I see a lot of this sort of frustration when newbies come into an enterprise software development shop, even when the environment is all fairly modern technology and tools. If most of your experience is in writing small-community desktop and web applications, a lot of what you "know" may be wrong.
Often there are requirements for transaction journaling at a level above what your DBMS provides. Quite often it is necessary to go beyond DB transaction semantics to ensure time-sequence correctness, once-and-only-once updating, resiliency, and non-repudiation.
And this doesn't even begin to address the issues involved in enterprise or inter-enterprise scalability. When you begin to approach half a million complex transactions a day, you will find that RDBMS technology fails you: because relational databases are not designed to handle high transaction volumes, you must often break with standard paradigms for normalization and updating. Conventional RDBMS locking techniques can destroy scalability no matter how much hardware you throw at the problem.
It is easy to dismiss all of it as stodginess or general wrong-headedness - even incompetence. But be careful because this isn't always the case.
And by the way: there are other models besides the RDBMS, and the alternative to an RDBMS is not necessarily "flat files", contrary to the experience of most coders today. There are transactional hierarchical DBMSs that can handle much higher throughput than an RDBMS; IMS is still very much alive in large IBM shops, for example. Other vendors offer similar software for different platforms.
Of course in a 4-man shop maybe none of this applies.
Sign them up for some decent training, and then it's up to you to convince them that with new technologies a lot more is possible (or at least easier!).
But I think the most important thing here is that professional, certified trainers teach them the basics first. They will be more impressed by that than by just one of their colleagues telling them: "hey, why not use this?"
Related post here.
The following may not apply in your situation, but you make very little mention of technical details, so I thought I'd mention it...
Sometimes the access patterns for current data are very different from those for historical data. I'm making this example up, but say current data is accessed thousands of times per second, touches only a small subset of columns, and all of it fits in less than 1 GB, whereas historical data runs to thousands of GB, is accessed only hundreds of times per day, and is read across all columns. In that case, what your co-workers are doing makes perfect sense as a performance optimization: by separating out the current data (albeit redundantly), you can optimize the indices and data structures in that table for the higher-frequency access patterns in a way you could not in the historical table.
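To illustrate the kind of split being described (the numbers above are made up, and so is this schema): the hot table stays narrow and cheap to hit at high frequency, while the wide history table is tuned for the occasional full read:

    import sqlite3

    conn = sqlite3.connect(':memory:')
    conn.executescript("""
        -- Narrow, hot table: only the columns the frequent reads need,
        -- small enough to stay resident in memory.
        CREATE TABLE current_tx (
            customer_id INTEGER PRIMARY KEY,
            tag_info    TEXT,
            amount      REAL
        );

        -- Wide, cold table: every column, read only occasionally.
        CREATE TABLE historical_tx (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL,
            tag_info    TEXT,
            amount      REAL,
            created_at  TEXT NOT NULL
            -- plus the rest of the audit and money columns
        );
    """)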
Not everything that is "academically" or "technically" correct from a purely relational perspective makes sense when applied in an actual practical situation.
