Is plotting data from various sources possible? - bokeh

Is there a way to create a plot that is defined in terms of more than one source?
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

source1 = ColumnDataSource({'x': [1, 2, 3]})
source2 = ColumnDataSource({'y': [5, 5, 7]})
p = figure(plot_width=400, plot_height=400)
# 'sources' is not a real argument -- this is the kind of API I am after:
p.circle(x='x', y='y', size=20, sources=[source1, source2])
Alternatively, is there a way to merge, combine or link sources on the client side? Possibly using some custom javascript?
My motivation is to be able to update the sources independently to minimize traffic. source1 might be updated a lot while source2 is mostly static.

For a stand-alone BokehJS app you could take advantage of the AjaxDataSource (see the Bokeh documentation). It lets you set a polling_interval that defines how often the data should be updated, and you can add a JS callback (the adapter) that is executed at each update, where you can simply concatenate all your data sources into one. In the example here, Flask is used for serving the data. I guess in your application you would want to split this one app into two: a server app for serving the data and a stand-alone BokehJS app consuming the data.
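A minimal sketch of that idea (untested, keeping the older Bokeh argument names from the question, and assuming a hypothetical endpoint at http://localhost:5000/data that returns JSON columns named x and y):

from bokeh.models import AjaxDataSource, CustomJS
from bokeh.plotting import figure, show

# The adapter runs in the browser on every poll; cb_data.response holds the
# parsed JSON from the endpoint, and whatever dict it returns becomes the
# source's data. This is where several payloads could be merged into one.
adapter = CustomJS(code="""
    const result = {x: [], y: []}
    result.x = cb_data.response.x
    result.y = cb_data.response.y
    return result
""")

source = AjaxDataSource(data_url='http://localhost:5000/data',
                        polling_interval=1000,  # poll once per second
                        adapter=adapter, mode='replace')

p = figure(plot_width=400, plot_height=400)
p.circle(x='x', y='y', size=20, source=source)
show(p)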

There's no officially endorsed way to drive a single glyph from multiple data sources. But you could accomplish this by passing the additional data sources as args to a CustomJSTransform and pulling values from those extra sources inside the transform, rather than returning a transformation of the standard xs values.
This is 1000% not what was intended as standard usage for CustomJSTransform, so I will leave a full demonstration as an exercise to the reader.
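That said, a rough sketch of the idea (untested, and using the same older Bokeh argument names as the question) might look something like this:

from bokeh.models import ColumnDataSource, CustomJSTransform
from bokeh.plotting import figure, show
from bokeh.transform import transform

source1 = ColumnDataSource({'x': [1, 2, 3]})
source2 = ColumnDataSource({'y': [5, 5, 7]})

# Ignore the incoming xs values entirely and hand back the 'y' column of
# the second source instead (the column lengths must match).
pull_y = CustomJSTransform(args=dict(source2=source2),
                           v_func="return source2.data['y']")

p = figure(plot_width=400, plot_height=400)
p.circle(x='x', y=transform('x', pull_y), source=source1, size=20)
show(p)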

Related

Rails: how to have models hit a different database dynamically

Looking to see if it's possible to have a Rails app hit multiple dbs dynamically. To be more precise:
I have an app that can operate in different regions.
Each request that comes in will identify the region.
In mysql, one region corresponds to exactly one db.
The dbs are identical in terms of schema, so the AR models are all the same; it's just that, depending on the request, I want the model objects to be retrieved/updated from one of the per-region dbs.
All of the data is isolated to that particular db. There is never any crossover, nor any need to query multiple dbs at the same time.
One way to avoid multiple dbs is to add a "region" column to all the models/tables (I don't really like that).
Another way would simply be to fire up different instances for different regions. Again, I don't really want to do that given all the config overhead (cloud servers, nginx, etc.).
Any ideas?
I found that Rails 6.1 introduced the notion of horizontal sharding. That was what I needed. And I found this article useful:
https://www.freshworks.com/horizontal-sharding-in-a-multi-tenant-app-with-rails-61-blog/
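For reference, the Rails 6.1 shard-switching API looks roughly like this (a hypothetical sketch; region_a/region_b stand in for whatever databases are defined in config/database.yml, and Widget is a placeholder model):

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  # Each shard points at one of the per-region databases from database.yml.
  connects_to shards: {
    default:  { writing: :region_a },
    region_b: { writing: :region_b }
  }
end

# Around each request (e.g. in a controller around_action), switch to the
# shard that the request identifies; all AR queries inside the block hit it.
ActiveRecord::Base.connected_to(role: :writing, shard: :region_b) do
  Widget.find(42)  # Widget is a placeholder model
end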

How to scrape data from a web site that is actively changing its contents?

I would like to grab satellite positions from the page(s) below, but I'm not sure if scraping is appropriate because the page appears to be updating itself every second using some internal code (it keeps updating after I disconnect from the internet). Background information can be found in my question at Space Stackexchange: A nicer way to download the positions of the Orbcomm-2 satellites.
I need a "snapshot" of four items simultaneously:
UTC time
latitude
longitude
altitude
Right now I use screen shots and manual typing. Since these values are being updated by the page - is conventional web-scraping going to work here? I found a "screen-scraping" tag, should I try to learn about that instead?
I'm looking for the simplest solution to get those four values, I wonder if I can just use urllib or urllib2 and avoid installing something new?
example page: http://www.satview.org/?sat_id=41186U I need to do 41179U through 41189U (the eleven Orbcomm-2 satellites that SpaceX just put in orbit)
Those values are calculated with a little math using JavaScript. The calculations are detailed here: https://www.satview.org/track.js
So one option is to port that script (plus any dependencies) to the language of your choice and use it to return your desired values.
There is one major function, track(), which takes an argument $modo that can be one of two values: tic or plot.
There may also be other source files (dependencies) referenced by the page.
The easier way would probably be to use something that allows the JavaScript to run, e.g. automating a browser, and extract the calculated values as they are generated.
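For example, a minimal Selenium sketch (untested; the element ids are placeholders, so you would need to inspect the page to find the real elements holding the four values):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("http://www.satview.org/?sat_id=41186U")

def snapshot():
    # Hypothetical ids -- replace with the actual elements that hold
    # UTC time, latitude, longitude and altitude on the page.
    return {name: driver.find_element(By.ID, name).text
            for name in ("utc", "latitude", "longitude", "altitude")}

print(snapshot())
driver.quit()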

SSRS dynamic report generation, PDF and subscriptions?

If this question is deemed inappropriate because it does not have a specific code question and is more "am I barking up the right tree," please advise me on a better venue.
If not, I'm a full stack .NET web developer with no SSRS experience, and my only knowledge comes from the last 3 sleepless nights. The app my team is working on requires end users to be able to create as many custom dashboards as they would like by creating instances of a dozen or so predefined widget types. Some widgets are as simple as a chart or table, and the user configures the widget to display a subset of possible fields selected from a larger set. We have a few widgets that are composites. The web client is all Angular and consumes a RESTful web API.
There are two more requirements: a reasonable facsimile of each widget must be downloadable as a PDF report on request or at scheduled times. There are several ways to meet this requirement, so I am not looking for alternate solutions. If SSRS would work, it would save us from having to build a scheduler and either find a way to leverage the existing Angular templates or create views based on them, populate them, and convert that to a PDF. What I am looking for is help in understanding report generation best practices and how they interact with .NET assemblies.
My specific task is to investigate whether SSRS can create a report based on a composite widget and either download it as a PDF or schedule it as one, and if so, create a POC based on a composite widget that contains 2 line graphs and a table. The PDF versions do not need to be displayed the same way as the UI, where the graphs are on the same row and the table is below. I can show each graph on its own as long as the display order is reading order (left to right, then down to the next line).
An example case could be that the first graph shows the sales of x-boxes over the course of last year. The line graph next to it shows the number of new releases for the X-Box over the course of last year. The report in the table below shows the number of X-box accessories sold last year grouped by accessory type (controller, headset, etc,) and by month, ordered by the total sales amount per month.
The example above would take 3 queries. The queries are unique to that user's specific instance of that widget on that specific dashboard. The user can group, choose sort columns, and anything else that is applicable.
How these queries are created is not my task (at least not yet), so there is an assumption that a magic query engine creates and stores these SQL queries correctly in the database.
My target database is SQL Server 2012 and its reporting service. I'm disappointed it only supports the 2.0 CLR.
I have a rough outline of a plan, but given my lack of experience any help with this would be appreciated.
It appears I can use the SOAP service for scheduling and management. That's straightforward.
The rest of my plan sounds pretty crazy. Any corrections, guidance and better suggestions would be welcome, or maybe a different methodology. The report server is a big security hole, and if I can accomplish the requirements by only referencing the reporting namespaces, please point me in the right direction. If not, this is the process I have cobbled together after 3 days of research and a few MSDN simple tutorials. Here goes:
To successfully create the report definition, I will need to reference every possible field in the entire superset available. It isn't clear yet if the superset for a table is the same as the superset for a graph, but for this POC I will assume they are. This way, I will only need a single stored procedure with an input parameter that identifies the correct query, which I will select and execute. The result set will be a small subset of the possible fields, but the stored procedure will return every field, with nulls for each row of the omitted fields, so that the report knows about every field. Terrible. I will probably be returning 5 columns with data and 500 full of nulls. There has to be a better way. Thinking about the performance hit is making me queasy, but that part was pretty easy. Now I have a deployable report. I have no idea how I would handle summaries. Would they be additional queries that I would just append to the result set? Maybe the magic query engine knows.
Now for some additional ugliness. I have to request the report URL with a query string that identifies the correct query. I am guessing I can also set the scheduler up with the correct parameter. But man, do I have issues. I could call the URL using HttpWebRequest for my download, but how exactly does the scheduler work? I would imagine it would create the report in a similar fashion, and I should be able to tell it in what format to render. But for the download I would be streaming HTML. How would I tell the report server to convert it to a PDF and then stream it as such? Can that be set in the report's definition before deploying it? It has no problem with the conversion when I play around on the report server. But at least I've found a way to secure the report server by accessing it through the web API.
Then there is the issue of cleaning up the null columns. There are extension points, such as data processing extensions. I think these are almost analogous to a step in the web page life cycle, but I'm not sure exactly, or else they would be called events. I would need to find the right one so that I can remove the null data columns, or labels on a pie chart at null percent, if that doesn't break the report. And I need to do it while it is still RDL. And just maybe, if I still haven't found a way, transform the RDL to a PDF and change the content type. It appears I can add .NET assemblies at the extension points. But is any of this correct? I am thinking like a developer, not like a seasoned SSRS pro. I'm trying, but any help pushing me in the right direction would be greatly appreciated.
I had tried revising that question a dozen times before asking, and it still seems unintelligible. Maybe my own answer will make my own question clear, and hopefully save someone else having to go through what I did, or at least be a quick dive into SSRS from a developer standpoint.
Creating a typical SSRS report involves (quick 40,000-foot overview):
1. Creating your data connection
2. Creating a SQL query or queries, which can be parameterized
3. Creating datasets that the query results will fill
4. Mapping dataset columns to report items: charts, tables, etc.
Then you build the report and deploy it to your report server, where the report can be requested by URL, with any SQL parameter values added as a query string:
http://reportserver/reportfolder/myreport?param1=data
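URL access also accepts rendering parameters such as rs:Format=PDF, so a download or scheduled job can fetch the PDF directly rather than streaming HTML. A rough sketch in Python (the server, folder, report and parameter names are placeholders mirroring the URL above, and authentication will depend on your environment, often Windows/NTLM):

import requests

# Placeholder server/folder/report/parameter names.
url = ("http://reportserver/ReportServer"
       "?/reportfolder/myreport"
       "&rs:Format=PDF"
       "&param1=data")

# Supply auth appropriate to your environment (often Windows/NTLM).
resp = requests.get(url)
resp.raise_for_status()

with open("myreport.pdf", "wb") as f:
    f.write(resp.content)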
How this works is that an RDL (Report Definition Language) file, which is just an XML document with a specific schema, is generated. The RDL has two elements that were relevant to me, DataSets and ReportItems. As the names imply, the first contains the queries and the latter contains the graphs, charts, tables, etc. in the report, along with the mappings to the columns in the dataset.
When the report is requested, it goes through a processing pipeline on the report server. By implementing interfaces in the Reporting Services namespaces, one can create .NET assemblies that transform the RDL at various stages in the pipeline.
Reporting Services also has two reporting APIs: one for managing reports and another for rendering. There is also the ReportViewer control, a .NET WebForms control that is pretty rich in functionality and can be used to create and render reports without even needing a report server instance. The report files the control works with in that mode are RDLC files, with the C standing for Client.
Armed with all of this knowledge, I found several solution paths, but none of them were optimal for my purposes, and I have moved on to a solution that did not involve Reporting Services or RDL at all. But these may be of use to someone else.
I could transform the RDL file as it went through the pipeline. Not very performant, as this involved writing to the actual physical file, and then removing the modifications after rendering. I was also using SQL Server 2012, which only supported the 2.0/3.5 framework.
Then there were the services. Using either service, I could retrieve an RDL template as a byte array from my application; I wasn't limited by the CLR version here. With the management service, I could modify the RDL and deploy it to the report server. I would only need to modify the RDL once, but given the number of files I would need and having to manage them on the remote server, creating file structures by client/user/Dashboard/ReportWidget looked pretty ugly.
Alternatively, instead of deploying the RDL templates, why not just store them in the database in byte-array format? When I needed a specific instance, I could fetch the RDL template, add my queries and mappings to it, and then pass it to the execution service to render. I could then save the resulting RDL in the database, which would be much easier for me to manage. But now the report server would be useless: I would need my own services for management, and to create subscriptions and mail them I would need a queue service and an SMTP mailer. That removes all the extras I would get from the report server, means writing a ton of custom code, and still leaves me bound by RDL. So I would be creating RDLM, an RDL mess.
It was the wrong tool for the job, but it was an interesting exercise, I learned more about Reporting Services from every angle, and was paid for most of that time. Maybe a blog post would be a better venue, but then I would need to go into much greater detail.

Making Graphite UI data cumulative by default

I'm setting up Graphite, and hit a problem with how data is represented on the screen when there's not enough pixels.
I found this post whose first answer is very close to what I'm looking for:
No, what is probably happening is that you're looking at a graph with more datapoints than pixels, which forces Graphite to aggregate the datapoints. The default aggregation method is averaging, but you can change it to summing by applying the cumulative() function to your metrics.
Is there any way to get this cumulative() behavior by default?
I've modified my storage-aggregation.conf to use 'aggregationMethod = sum', but I believe this is for historical data and not for data that's displayed in the UI.
When I apply cumulative() everything is perfect, I'm just wondering if there's a way to get this behavior by default.
I'm guessing that even though you've modified your storage-aggregation.conf to use 'aggregationMethod = sum', the metrics you've already created have not changed their aggregationMethod. The rules in storage-aggregation.conf only affect newly created metrics.
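For reference, a storage-aggregation.conf rule along these lines (the pattern here is illustrative; match it to your own metric names) is what makes newly created counter metrics sum instead of average:

[sum_counts]
pattern = ^stats_counts\.
xFilesFactor = 0
aggregationMethod = sum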
To change your existing metrics to be summed instead of averaged, you'll need to use whisper-resize.py. Or you can delete your existing metrics and they'll be recreated with sum.
Here's an example of what you might need to run:
whisper-resize.py --xFilesFactor=0.0 --aggregationMethod=sum /opt/graphite/storage/whisper/stats_counts/path/to/your/metric.wsp 10s:28d 1m:84d 10m:1y 1h:3y
Make sure to run that as the same user who owns the file, or at least make sure the files have the same ownership when you're done, otherwise they won't be writeable for new data.
Another possibility if you're using statsd is that you're just using metrics under stats instead of stats_counts. From the statsd README:
In the legacy setting rates were recorded under stats.counter_name directly, whereas the absolute count could be found under stats_count.counter_name. With disabling the legacy namespacing those values can be found (with default prefixing) under stats.counters.counter_name.rate and stats.counters.counter_name.count now.
Basically, metrics are aggregated differently under the different namespaces when using statsd, and you want stuff under stats_count or stats.counters for things that should be summed.

Working with the Google Maps API

I am trying to build a map-based query interface for my website and I am having difficulty finding a starting point besides http://developer.google.com. I assume this is a rather simple task, but I feel as though I am on a wild goose chase. Anyway, the problem is that the existing site places people into a category based on their address (primarily the zip code). This is not working out because of odd shapes and user density, so I would like to solve the problem by creating custom zones.
I am not looking for a proprietary solution because I would really like to accomplish this on my own, I just need some better places to start or better suggestions for searches.
I understand that I will need to create a map with my predetermined polygons.
I understand how to create a map with polygons via js.
I do not understand how the data will determine which zone it is within and how it will be returned as a hash I can store, e.g. user=>####, zone=>####, section=>#####
http://blog.appdelegateinc.com./point-in-polygon-checking-with-google-maps.html has some JS you can add to test whether a point is within a polygon (sample: http://blog.appdelegateinc.com./static/samples/point_in_polygon.html), using this approach: http://en.wikipedia.org/wiki/Point_in_polygon#Ray_casting_algorithm
I think as you place the markers, you'll hold them in an array of objects, then loop through, doing some sort of reduction of which polygons to test; test those that remain, and if inPoly is true, set marker.zone and marker.section to whatever suits your needs.
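The same ray-casting test can also run server-side when you store the hash. A small sketch (the zone ids and coordinates here are made up):

def point_in_polygon(lng, lat, polygon):
    """Ray-casting test: polygon is a list of (lng, lat) vertices."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Toggle on each polygon edge crossed by a horizontal ray from the point.
        if ((yi > lat) != (yj > lat)) and \
           (lng < (xj - xi) * (lat - yi) / (yj - yi) + xi):
            inside = not inside
        j = i
    return inside

# Hypothetical zones keyed by id; find the one containing a user's point.
zones = {1001: [(-3.0, 40.0), (-3.0, 41.0), (-2.0, 41.0), (-2.0, 40.0)]}
record = {'user': 1234,
          'zone': next((zid for zid, poly in zones.items()
                        if point_in_polygon(-2.5, 40.5, poly)), None)}
print(record)  # {'user': 1234, 'zone': 1001}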
