Predicting the HOME and WORK location with Vespa - bigdata

It is one of the use cases of my application. I am feeding location details (latitude, longitude, time) to Vespa every 5 minutes. How can Vespa predict the HOME and WORK address on the basis of the data I feed into it? Is it possible in Vespa? If yes, how should I write the application for it, or do I need something else?

I guess you'd need to train an ML model for this.
Vespa cannot help you train it, but it can execute such models at query time if you need that (which seems unlikely to me, since these predictions only change when the addresses (of the users, I assume) change).
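For illustration only, here is a minimal, Vespa-independent sketch of the kind of heuristic such a model could start from: bin the pings into a coarse lat/lon grid and take the most frequent night-time cell as HOME and the most frequent office-hours cell as WORK. The sample pings, the grid precision and the hour ranges are all assumptions, not anything Vespa provides.

from collections import Counter
from datetime import datetime

# Hypothetical input: (lat, lon, timestamp) tuples collected every 5 minutes.
pings = [
    (52.5205, 13.4049, "2024-01-15T02:10:00"),
    (52.5204, 13.4051, "2024-01-15T03:15:00"),
    (52.5300, 13.3850, "2024-01-15T10:30:00"),
    (52.5301, 13.3849, "2024-01-15T11:05:00"),
]

def dominant_location(pings, hour_filter, precision=3):
    # Most frequent grid cell among pings whose hour passes hour_filter.
    cells = Counter(
        (round(lat, precision), round(lon, precision))
        for lat, lon, ts in pings
        if hour_filter(datetime.fromisoformat(ts).hour)
    )
    return cells.most_common(1)[0][0] if cells else None

home = dominant_location(pings, lambda h: h >= 22 or h < 6)  # night hours -> HOME
work = dominant_location(pings, lambda h: 9 <= h < 17)       # office hours -> WORK
print("home ~", home, "work ~", work)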

Related

Unity + Firebase: is it possible to append data to a key's value, or do I have to retrieve the key's data every time?

I'm a bit worried that I will reach the free data limits of Firebase in a student project.
Basically my question is:
Is it possible to append to the end of the string instead of retrieving the key's value, appending to it and uploading it again?
What I want to achieve:
I have to create statistics of users' right/wrong answers for particular questions.
I want to have a kvp:
answers: 1r/5w/3r
where the number is the number of the user's guesses and r/w means right/wrong. Whenever a guessing session ends, I want to append /numberOfGuesses+RightOrWrongAnswer at the end.
I'm using Unity 2018.
Thank you in advance for all the help!
I don't know how your game is architected or how many people are playing, but I'd be surprised if you hit your free limit on a student project (you can store 1GB and download 10GB). That string is 8 bytes; let's assume the worst case scenario: as a UTF-32 string, that would be 32 bytes of data - you'd have to pull that down 312 million times to hit a cap (there'll be some overhead, but I can't imagine it being hugely impactful). If you're afraid of being charged, you can opt to not have a credit card on file to be doubly sure you stay on a student budget.
If you want to reduce the amount of reading/writing though, I might suggest that instead of:
key: <value_string> (so, instead of session_id: "1r/5w/3r")
you structure more like:
key:
- wrong: 5
- right: 3
So have two more values nested under your key. One for all the wrong answers, just an incrementing integer. Then one for all the right answers: just an incrementing integer.
The mechanism to "append" would be a transaction, and you should use these whether you're mutating a string or counter. Firebase tries to be smart with data usage and offline caching, but you don't get much more control other than that.
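To make the transactional counter idea concrete, here is a rough sketch using the Firebase Admin SDK for Python (the Unity client SDK exposes the same concept through a run-transaction call on a database reference). The database path, key names and credentials file are assumptions for illustration, not part of your app.

import firebase_admin
from firebase_admin import credentials, db

# Assumed service account file and database URL.
cred = credentials.Certificate("service-account.json")
firebase_admin.initialize_app(cred, {"databaseURL": "https://your-project.firebaseio.com"})

def record_answer(question_id, was_right):
    field = "right" if was_right else "wrong"
    ref = db.reference(f"answers/{question_id}/{field}")

    def increment(current):
        # current is None the first time the key is written
        return (current or 0) + 1

    # The transaction re-runs the update function if the value changed underneath us.
    ref.transaction(increment)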
If order really matters, you might want to get cleverer. You'll generally want to work with the abstractions Realtime Database gives you though to maximize any inherent optimizations (it likes to think in terms of JSON documents, so think about your data layout similarly). This may not be as data optimal, but you may want to consider instead using a ledger of some kind (perhaps using ServerValue.Timestamp to record a single right or wrong answer, and having a cloud function listening to sum up the results in the background after a game - this would be especially useful if you plan on having a lot of users trying to write the same key at the same time).

Can One Time Passwords be used as identifiers?

If I have a bunch of OTPs mixed together, and I know all of their generation seeds (the OTP URIs), can I group them by source URI?
I have a use case where I need the system to be 100% blind to the data relationships that it's passing around.
For example: users enter OTPs from their smartphones instead of their logins, so it should be very difficult to identify the entries made by one user. Once the data is exported out of the system, is it possible for someone who has the OTP seeds to re-establish an entry's ownership?
That's possible, but with considerable effort. You would need to generate codes for all the seeds you have and then check whether there is any match.
Also, there is a chance of getting the same code from different seeds at some moment. To avoid this problem you can ask the user for several consecutive codes, which significantly decreases the possibility of codes matching just by chance.
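As a rough sketch of what that matching looks like in practice, here are a few lines of Python using the pyotp library; the seeds and the observed codes below are made up, and a real attempt would also have to try the time windows in which each code was actually entered.

import pyotp

# Base32 secrets extracted from the otpauth:// URIs (made-up examples).
seeds = {
    "alice": "JBSWY3DPEHPK3PXP",
    "bob":   "KRSXG5CTMVRXEZLU",
}

observed_codes = ["492039", "118223"]  # codes entered into the "blind" system

for code in observed_codes:
    candidates = [
        user for user, secret in seeds.items()
        # verify against the current interval, plus one step either side for drift
        if pyotp.TOTP(secret).verify(code, valid_window=1)
    ]
    print(code, "could belong to:", candidates or "nobody")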

How to cache complex calculated temporary Data

I have an Application that allows people to bet on the result of soccer games.
The score of each single bet (=entity) is calculated by comparing the scores predicted in the bet with the actual result of the game (=entity). Bets are placed within betrounds. Betrounds are organisations where groups bet on gamegroups (groups of games, e.g. single matchdays). A single usergroup can have several betrounds.
To summarize the relational model:
UserGroup 1:N BetRounds 1:N Bets N:1 Game
Within each betround I create a resulttable where I show every user with their result points and position.
In order to calculate the position of one user I need to calculate the points of every user within a betround.
These points from the single betrounds are aggregated into groups and within the group there is again a resulttable.
Example
A Usergroup with: 20 users
One Season has 34 matchdays
One matchday has 9 games
In order to calculate the points for this usergroup I would need to calculate the points from 20*34*9 = 6120 bets.
Since this is a lot to calculate, I don't want to do it every time I show the resulttable.
I currently see two options in order to save some calculation time:
Cache
Save interim results (e.g. on the bet entity) in the database
Maybe a mix of both.
1. Cache
If caching is the correct way, I am not sure at which level to cache and how to invalidate.
There are several options what to cache:
- pointresult of single bets
- pointresults of single users within a betround
- whole result table of a betround (points & position)
- pointresult of single user within usergroup
- whole resulttable of usergroup
I am unsure in which form to cache that data:
- just the integer values for positions and points
- whole entities (e.g. bets)
- temporary, not persistent, entities (e.g. to represent the resulttables)
- the html output of the table
Then dependent on format how to cache it:
- html views could be cached via reverse proxies
- values / entities probably via redis / memcache etc.
In the future we might change to a single page app where data is only served via a REST API; then caching of HTML output is not an option.
Depending on the caching strategy, the question arises how to invalidate the cache and optionally warm it, so that the result is never calculated on the fly within the application, but only recalculated when the cache is invalidated and immediately replaced by the new result.
I have read very often that cache invalidation is evil. I am not sure if this applies to my use case since all points/results/tables etc. only change when my interface updates the result of the games. This is the only time when points change.
2. Save interim results (e.g. on the bet entity) in the database
I am not sure if this scenario is applicable on all levels. I first thought about saving the calculated result on a bet instead of always comparing the bet scores with the actual scores. This would make my data model a little bit redundant, and it adds complexity: if a wrong result is fetched by my interface and the correct one comes in later, my points would not be recalculated.
On all other levels I would need to create new interim entities to store table results persistently.
3. Mix of both
I am not sure how mixing both would look like and if it makes sense at all, but I thought it might be an option.
Any advice, Input or experience would be highly appreciated.
I only mildly understand betting, so hopefully this helps.
It sounds like you are asking two questions:
When do I calculate results?
How much caching should I use?
To me it sounds like there are very clear events that happen, after which you can successfully calculate your results. Your design should take advantage of this and be evented in nature. You should have background processes that can detect when a game is complete. The results of the game should be written, and additional background jobs should be triggered to calculate the results of any bets that depend on that game.
This would also be the point at which any caches that involve that game, results from that game, or results from any bets on that game, should be invalidated and/or refreshed.
How much you should cache should be based on how much you need to cache. Caching should be considered separately from computing results: storing computed results in the database is not caching, it is computing results and persisting them. You should definitely not be calculating results during a page view request; they should be computed ahead of time, when the corresponding event (a game ending) triggers the calculation.
Your database should pretty much always represent the latest information you have on everything. You should avoid doing any calculations on-the-fly if possible.
I would get all the events and background stuff working first, then see what kind of performance you get. At that point your app should be doing little more than taking the results and sticking them into a view for each page view. If that part is going too slow, then you should start looking at caching your views/templates/html. As mentioned before, these caches could be invalidated by your background workers when they encounter new results.
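To make that evented flow concrete, here is a minimal, self-contained sketch: plain dicts stand in for the database and the cache, and the scoring rule is simplified. When a game result arrives, the affected bets are rescored and stored, and the result tables of the touched betrounds are rebuilt and written straight back into the cache. All names and point values are assumptions.

cache = {}                                               # stand-in for Redis/Memcached

games = {1: {"home": 2, "away": 1}}                      # game_id -> final score
bets = [
    {"user": "anna", "game": 1, "betround": 10, "home": 2, "away": 1, "points": None},
    {"user": "ben",  "game": 1, "betround": 10, "home": 0, "away": 0, "points": None},
]

def score_bet(bet, result):
    if (bet["home"], bet["away"]) == (result["home"], result["away"]):
        return 3                                         # exact score
    if (bet["home"] > bet["away"]) == (result["home"] > result["away"]):
        return 1                                         # correct tendency (simplified)
    return 0

def build_result_table(betround_id):
    totals = {}
    for bet in bets:
        if bet["betround"] == betround_id and bet["points"] is not None:
            totals[bet["user"]] = totals.get(bet["user"], 0) + bet["points"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

def on_game_result_saved(game_id):
    # Run from a background worker when a game finishes.
    result = games[game_id]
    touched_rounds = set()
    for bet in bets:
        if bet["game"] == game_id:
            bet["points"] = score_bet(bet, result)       # persist the interim result
            touched_rounds.add(bet["betround"])
    for betround_id in touched_rounds:                   # invalidate + warm in one step
        cache[f"resulttable:{betround_id}"] = build_result_table(betround_id)

on_game_result_saved(1)
print(cache["resulttable:10"])                           # [('anna', 3), ('ben', 0)]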

Should I use Wordpress Transient API in this case?

I'm writing a simple Wordpress plugin for work and am wondering if using the Transients API is practical in this case, or if I should seek out another way.
The plugin's purpose is simple. I'm making a call to USZip Web Service (http://www.webservicex.net/uszip.asmx?op=GetInfoByZIP) to retrieve data. Our sales team is using a Lead Intake sheet that the plugin will run on.
I wanted to reduce the number of API calls, so I thought of setting a transient for each zip code as the key and store the incoming data (city and zip). If the corresponding data for a given zip code already exists, then no need to make an API call.
Here are my concerns:
1. After a quick search, I realized that transient data is stored in the wp_options table, and storing the data would balloon that table in no time. Would this cause a significant performance issue if the db becomes huge?
2. Is it horrible practice to create this many transient keys? It could easily become thousands in a few months' time.
If using Transient is not the best way, could you please help point me in the right direction? Thanks!
P.S. I opted for the Transients API vs the Options API. I know zip codes don't change often, but they sometimes do. I set an expiration time of 3 months.
A less-inflated solution would be:
- Store a single option called uszip with a serialized array inside the option
- Grab the entire array each time and simply check if the zip code exists
- If it doesn't exist, grab the data and save the whole transient again
You should make sure you don't hit the upper bounds of a serialized array in this table (9,000 elements) considering 43,000 zip codes exist in the US. However, you will most likely have a very localized subset of zip codes.
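This isn't WordPress code, but the read-check-fetch-save flow described above looks roughly like the Python sketch below; in the plugin the load/save calls would be get_transient()/set_transient() (or get_option()/update_option()) on the single uszip key, and the file path and stubbed API call here are placeholders.

import json
import os

CACHE_FILE = "uszip_cache.json"                # stand-in for the single uszip transient

def load_cache():
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE) as fh:
            return json.load(fh)
    return {}

def save_cache(cache):
    with open(CACHE_FILE, "w") as fh:
        json.dump(cache, fh)

def lookup_zip(zip_code, fetch_from_api):
    cache = load_cache()                       # grab the entire array each time
    if zip_code in cache:                      # check if the zip code exists
        return cache[zip_code]
    data = fetch_from_api(zip_code)            # only call the web service on a miss
    cache[zip_code] = data
    save_cache(cache)                          # save the whole thing again
    return data

# Example with a stubbed API call:
print(lookup_zip("90210", lambda z: {"city": "Beverly Hills", "state": "CA"}))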

How to build large/busy RSS feed

I've been playing with RSS feeds this week, and for my next trick I want to build one for our internal application log. We have a centralized database table that our myriad batch and intranet apps use for posting log messages. I want to create an RSS feed off of this table, but I'm not sure how to handle the volume- there could be hundreds of entries per day even on a normal day. An exceptional make-you-want-to-quit kind of day might see a few thousand. Any thoughts?
I would make the feed a static file (you can easily serve thousands of these), regenerated periodically. Then you have a much broader choice, because the generation doesn't have to run in under a second; it can even take minutes. And users still get perfect download speed and a reasonable update speed.
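A sketch of that approach, using only the Python standard library: a small script (run from cron or after each batch window) queries the log table and rewrites a static RSS file that the web server serves directly. The feed title, URL, file name and the stubbed log_entries() query are assumptions.

import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

def log_entries():
    # Placeholder: would normally be a SELECT against the centralized log table.
    yield {"id": 1, "message": "Batch job X finished", "ts": datetime.now(timezone.utc)}

rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "Application log"
ET.SubElement(channel, "link").text = "http://feedserver/rss/app-log/"
ET.SubElement(channel, "description").text = "Latest application log messages"

for entry in log_entries():
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = entry["message"]
    ET.SubElement(item, "guid").text = str(entry["id"])
    ET.SubElement(item, "pubDate").text = format_datetime(entry["ts"])

# Write the static file that the web server hands out on every request.
ET.ElementTree(rss).write("app-log.rss", encoding="utf-8", xml_declaration=True)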
If you are building a system with notifications that must not be missed, then a pub-sub mechanism (using XMPP, one of the other protocols supported by Apache ActiveMQ, or something similar) will be more suitable than a syndication mechanism. You need some measure of coupling between the system that is generating the notifications and the ones that are consuming them, to ensure that consumers don't miss notifications.
(You can do this using RSS or Atom as a transport format, but it's probably not a common use case; you'd need to vary the notifications shown based on the consumer and which notifications it has previously seen.)
I'd split up the feeds as much as possible and let users recombine them as desired. If I were doing it I'd probably think about using Django and the syndication framework.
Django's models could probably handle representing the data structure of the tables you care about.
You could have a URL that catches everything, like: r'/rss/(?(\w*?)/)+' (I think that might work, but I can't test it now so it might not be perfect).
That way you could use URLs like:
http://feedserver/rss/batch-file-output/
http://feedserver/rss/support-tickets/
http://feedserver/rss/batch-file-output/support-tickets/ (both of the first two combined into one)
Then in the view:
def get_batch_file_messages():
    # Grab all the recent batch file messages here.
    # Maybe cache the result and only regenerate every so often.
    ...

# Other feed functions here.

feed_mapping = {'batch-file-output': get_batch_file_messages}

def rss(request, *args):
    items_to_display = []
    for feed in args:
        items_to_display += feed_mapping[feed]()
    # Processing/returning the feed.
Having individual, chainable feeds means that users can subscribe to one feed at a time, or merge the ones they care about into one larger feed. Whatever's easier for them to read, they can do.
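To round the sketch off, one way the "Processing/returning the feed" step could look is with Django's feedgenerator utilities, continuing the feed_mapping example above; the item attributes (subject, message, get_absolute_url) are assumed model fields, not anything from your schema.

from django.http import HttpResponse
from django.utils import feedgenerator

def rss(request, *args):
    items_to_display = []
    for feed in args:
        items_to_display += feed_mapping[feed]()

    rss_feed = feedgenerator.Rss201rev2Feed(
        title="Application log: " + ", ".join(args),
        link="http://feedserver/rss/",
        description="Combined application log feed",
    )
    for item in items_to_display:
        rss_feed.add_item(
            title=item.subject,                # assumed model fields
            link=item.get_absolute_url(),
            description=item.message,
        )
    return HttpResponse(rss_feed.writeString("utf-8"), content_type="application/rss+xml")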
Without knowing your application, I can't offer specific advice.
That said, it's common in these sorts of systems to have a level of severity. You could have a query string parameter that you tack onto the end of the URL to specify the severity. If set to "DEBUG" you would see every event, no matter how trivial. If you set it to "FATAL" you'd only see the events that were "System Failure" in magnitude.
If there are still too many events, you may want to sub-divide your events into some sort of category system. Again, I would have this as a query string parameter.
You can then have multiple RSS feeds for the various categories and severities. This should allow you to tune the volume of alerts you get to an acceptable level.
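As a small illustration of the severity/category filtering, independent of any feed framework; the level names and the sample events are made up:

SEVERITY_ORDER = ["DEBUG", "INFO", "WARN", "ERROR", "FATAL"]

def filter_events(events, min_severity="DEBUG", category=None):
    # Keep events at or above min_severity, optionally restricted to one category.
    threshold = SEVERITY_ORDER.index(min_severity)
    return [
        e for e in events
        if SEVERITY_ORDER.index(e["severity"]) >= threshold
        and (category is None or e["category"] == category)
    ]

# e.g. a request to /rss/app-log/?severity=ERROR&category=batch would map to:
events = [
    {"severity": "INFO",  "category": "batch", "message": "Job started"},
    {"severity": "FATAL", "category": "batch", "message": "Job crashed"},
]
print(filter_events(events, min_severity="ERROR", category="batch"))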
In this case, it's more of a manager's dashboard: how much work was put into support today, is there anything pressing in the log right now, and for when we first arrive in the morning as a measure of what went wrong with batch jobs overnight.
Okay, I decided how I'm gonna handle this. I'm using the timestamp field on each row and grouping by day. It takes a little bit of SQL-fu to make it happen, since of course there's a full timestamp there and I need to be semi-intelligent about how I pick the log message to show from within the group, but it's not too bad. Further, I'm building it to let you select which application to monitor, and then showing every message (max 50) from a specific day.
That gets me down to something reasonable.
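For what it's worth, the per-day pick could look roughly like the query below (here: the latest message of each day for the selected application), using an in-memory SQLite table as a stand-in for the real log table; the table and column names are assumptions.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_log (app TEXT, severity TEXT, message TEXT, logged_at TEXT)")
conn.executemany(
    "INSERT INTO app_log VALUES (?, ?, ?, ?)",
    [
        ("billing", "ERROR", "Batch import failed",  "2024-01-15 02:10:00"),
        ("billing", "INFO",  "Batch import retried", "2024-01-15 02:20:00"),
        ("billing", "INFO",  "Nightly run ok",       "2024-01-16 01:05:00"),
    ],
)

# One representative message per day group: the most recent entry of that day.
rows = conn.execute(
    """
    SELECT date(logged_at) AS day, message
    FROM app_log AS outer_log
    WHERE app = ?
      AND logged_at = (
          SELECT MAX(inner_log.logged_at)
          FROM app_log AS inner_log
          WHERE inner_log.app = outer_log.app
            AND date(inner_log.logged_at) = date(outer_log.logged_at)
      )
    ORDER BY day DESC
    LIMIT 50
    """,
    ("billing",),
).fetchall()
print(rows)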
I'm still hoping for a good answer to the more generic question: "How do you syndicate many important messages, where missing a message could be a problem?"
