Could a TrainedModel be re-used in Google Prediction?

I am using Google Prediction v1.5 (Java client) with the "PredictionSample.java" sample program.
In "PredictionSample.java", I specify "MODEL_ID" as "mymodel", "APPLICATION_NAME" as "MyApplication" and train the model using "language_id.txt" stored in my Google Cloud Storage bucket. The sample program runs OK and performs several predictions using some input features.
However, I wonder where the "mymodel" TrainedModel is stored. Is it stored under my "Google APIs console" project? (It seems that I could not find "mymodel" in my "Google APIs console" project.)
The FAQ of the Google Prediction API says "You cannot currently download your model." So the TrainedModel ("mymodel") appears to be stored somewhere on the Google Prediction servers. I wonder where exactly it is stored, and how I could re-use this TrainedModel to perform predictions with the Google Prediction v1.5 Java client (i.e. without re-training the model for future predictions).
Does anyone have ideas on this? Thanks for any suggestions.

The models are stored in Google's Cloud and can't be downloaded.
Models trained in Prediction API 1.5 and earlier are associated with the user that created them, and cannot be shared with other users.
If you use Prediction API 1.6 (released earlier this month), models are instead associated with a project (similar to Google Cloud Storage, BigQuery, Google Compute Engine and other Google Cloud Platform products), which means the same model can now be shared among all members of the project team.

I finally found out that I do not need to re-train the TrainedModel for future predictions. I just deleted the
train(prediction);
statement in "PredictionSample.java" and could still perform predictions from the "mymodel" TrainedModel.
The TrainedModel is in fact stored somewhere on the Google Prediction servers. I could "list" it through the "APIs Explorer for Prediction API", and I could perform predictions from it once it was trained/inserted. But I could not download it for my own use.
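For illustration only, here is a minimal sketch of that flow using the Python discovery client rather than the Java sample (the key file name and the input row are placeholder assumptions): list the models stored server-side, then call predict() against the already-trained model without any training call.

import httplib2
from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

# Hypothetical service-account key file; any OAuth2 flow with the prediction scope works.
credentials = ServiceAccountCredentials.from_json_keyfile_name(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/prediction"],
)
service = build("prediction", "v1.5", http=credentials.authorize(httplib2.Http()))

# "List" the models stored server-side (the equivalent of the APIs Explorer call).
print(service.trainedmodels().list().execute())

# Predict against the already-trained "mymodel"; no insert()/train() call is needed.
body = {"input": {"csvInstance": ["determinar el idioma de este texto"]}}  # placeholder feature row
result = service.trainedmodels().predict(id="mymodel", body=body).execute()
print(result["outputLabel"])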

Related

Azure Synapse replicated to Cosmos DB?

We have an Azure data warehouse database (Azure Synapse) that will need to be consumed by read-only users around the world, and we would like to replicate the needed objects from the data warehouse, potentially to a Cosmos DB. Is this possible, and if so, what are the available options (transactional, merge, etc.)?
Synapse is mainly about getting your data in for analysis. I don't think it has a direct export option of the kind you have described above.
However, what you can do is use Azure Stream Analytics; with it you should be able to integrate/stream whatever you want to any destination you need, like an app, a database, and so on.
more details here - https://learn.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-integrate-azure-stream-analytics
I think you can also pull the data into Power BI, and perhaps set up some kind of automatic export from there.
more details here - https://learn.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-get-started-visualize-with-power-bi

Read/Write metrics from Datastore

Is there a way to get detailed Datastore metrics? I am interested in reads and writes as a sort of historical data. I would like to see visually how Datastore is utilized whenever changes in the stack happen.
You can see the list of metrics provided for Datastore in the official documentation:
api/request_count
entity/read_sizes
entity/write_sizes
index/write_count
A straightforward way to observe them is to use Stackdriver Monitoring > Resources > Metrics Explorer > Find resource type and metric.
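If you also want to pull these numbers programmatically rather than through the console, a rough sketch with the current google-cloud-monitoring client might look like this (the project ID is a placeholder, and the Stackdriver-era client differed slightly):

import time
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = "projects/my-project"  # placeholder project ID

# Last hour of Datastore API request counts.
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)

series = client.list_time_series(
    request={
        "name": project_name,
        "filter": 'metric.type = "datastore.googleapis.com/api/request_count"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for ts in series:
    for point in ts.points:
        print(dict(ts.metric.labels), point.value.int64_value)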

How could Bosun fit for my usecase?

I need an alerting system where I could have my own metrics and thresholds to report anomalies (basically alerting on the basis of logs and data in a DB). I explored Bosun but am not sure how to make it work. I have the following issues:
There are pre-defined items, which are all system level, but I couldn't find a way to add new items, i.e. custom items.
How will Bosun ingest data other than through scollector? As I understand it, could I use Logstash as a data source and skip OpenTSDB entirely (I really don't like the HBase dependency)?
By items I think you mean metrics. Bosun learns about metrics and their tag relationships when you do one of the following:
Relay OpenTSDB data through Bosun (http://bosun.org/api#sending-data)
Send copies of metrics to the api/index route (http://bosun.org/api#apiindex)
There are also metadata routes, which tell Bosun about the metric, such as counter/gauge, unit, and description.
The Logstash datasource will be deprecated in favor of an Elastic datasource in the coming 0.5.0 release. The Elastic replacement is better (but requires ES 2+). To use those expressions, see the raw documentation (the bosun.org docs will be updated next release): https://raw.githubusercontent.com/bosun-monitor/bosun/master/docs/expressions.md. To add it, you would have something like the following in the config:
elasticHosts=http://ny-lselastic01.ds.stackexchange.com:9200,http://ny-lselastic02.ds.stackexchange.com:9200,http://ny-lselastic03.ds.stackexchange.com:9200
The functions to query various backends are only loaded into the expression library when the backend is configured.
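As a hedged illustration of the api/index route mentioned above, pushing a custom metric to Bosun as an OpenTSDB-style datapoint could look roughly like this (host, port, metric name and tags are all placeholders):

import json
import time
import requests

# One OpenTSDB-style datapoint for a custom metric.
datapoint = [{
    "metric": "myapp.orders.failed",        # your own custom "item"
    "timestamp": int(time.time()),
    "value": 3,
    "tags": {"host": "web01", "env": "prod"},
}]

resp = requests.post(
    "http://bosun.example.com:8070/api/index",   # placeholder Bosun host; 8070 is the default port
    data=json.dumps(datapoint),
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()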

Tools for running analysis on data held in BigQuery?

I have about 100GB data in BigQuery, and I'm fairly new to using data analysis tools. I want to grab about 3000 extracts for different queries, using a programmatic series of SQL queries, and then run some statistical analysis to compare kurtosis across those extracts.
Right now my workflow is as follows:
running on my local machine, use BigQuery Python client APIs to grab the data extracts and save them locally
running on my local machine, run kurtosis analysis over the extracts using scipy
The second one of these works fine, but it's pretty slow and painful to save all 3000 data extracts locally (network timeouts, etc).
Is there a better way of doing this? Basically I'm wondering if there's some kind of cloud tool where I could quickly run the calls to get the 3000 extracts, then run the Python to do the kurtosis analysis.
I had a look at https://cloud.google.com/bigquery/third-party-tools but I'm not sure if any of those do what I need.
So far Cloud Datalab is your best option
https://cloud.google.com/datalab/
It is in beta, so some surprises are possible.
Datalab is built on top of the Jupyter/IPython option below and runs entirely in the cloud.
Another option is Jupyter/IPython Notebook
http://jupyter-notebook-beginner-guide.readthedocs.org/en/latest/
Our data science team started with the second option long ago with great success and is now moving toward Datalab.
For the rest of the business (prod, BI, ops, sales, marketing, etc.), though, we had to build our own workflow/orchestration tool, as nothing available was found good or relevant enough.
Two easy ways:
1: If your issue is the network, like you say, use a Google Compute Engine machine to do the analysis, in the same zone as your BigQuery tables (US, EU, etc.). It will not have network issues getting data from BigQuery and will be super fast.
The machine will only cost you for the minutes you use it. Save a snapshot of your machine to reuse the machine setup anytime (a snapshot also has a monthly cost, but much lower than keeping the machine up).
2: Use Google Cloud Datalab (beta as of Dec. 2015), which supports BigQuery sources and gives you all the tools you need to do the analysis and later share it with others:
https://cloud.google.com/datalab/
from their docs: "Cloud Datalab is built on Jupyter (formerly IPython), which boasts a thriving ecosystem of modules and a robust knowledge base. Cloud Datalab enables analysis of your data on Google BigQuery, Google Compute Engine, and Google Cloud Storage using Python, SQL, and JavaScript (for BigQuery user-defined functions)."
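As a rough sketch of what that notebook workflow could look like (dataset, table and column names are made up, and this uses the current google-cloud-bigquery client rather than whatever Datalab bundled at the time), both the extract queries and the kurtosis step can run in the cloud without saving files locally:

from google.cloud import bigquery
from scipy.stats import kurtosis

client = bigquery.Client(project="my-project")  # placeholder project ID

results = {}
for segment_id in range(3000):
    sql = f"""
        SELECT metric_value
        FROM `my_dataset.my_table`              -- placeholder dataset/table
        WHERE segment_id = {segment_id}
    """
    df = client.query(sql).to_dataframe()
    results[segment_id] = kurtosis(df["metric_value"])

# The ten extracts with the lowest kurtosis.
print(sorted(results.items(), key=lambda kv: kv[1])[:10])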
You can check out Cooladata
It allows you to query BQ tables as external data sources.
What you can do is either schedule your queries and export the results to Google Cloud Storage, where you can pick them up, or use the built-in, powerful reporting tool to answer your 3000 queries.
It will also provide you all the BI tools you will need for your business.

Get specific data from Multi-Channel Funnels Reporting API?

In my application, I get traffic and conversion data from my account using the Google Analytics Core Reporting API and the Multi-Channel Funnels Reporting API.
To get traffic data, I use the GET command from the Core Reporting API.
In this command I specify, among other things, the parameters to display (dimensions and metrics) and the filtering options for my request: a filter and a segment (I use dynamic segments).
Here's an example of one of the queries:
GET https://www.googleapis.com/analytics/v3/data/ga?
ids=ga:XXXXXXXXXX &
start-date=2013-03-05 &
end-date=2013-04-04 &
metrics=ga:visits,ga:pageviewsPerVisit,ga:avgTimeOnSite,ga:percentNewVisits,ga:entranceBounceRate &
dimensions=ga:source,ga:keyword &
filters=ga:visits>0 &
segment=dynamic::ga:medium==CPA,ga:medium==CPC,ga:medium==cpm;ga:campaign!#img_ &
sort=-ga:visits &
key={YOUR_API_KEY}
This query returns the traffic data matching the filter and segment conditions.
But when I wanted to get the conversion data for the same request using the MCF Reporting API, I encountered a problem.
The GET command of the MCF Reporting API does not have a "segment" parameter, and its filter does not allow OR conditions.
The Google Analytics web interface, however, can apply segments to conversion data. I've read that Channel Groupings can be applied to query results in the web interface, but they are tied to the account; and because I'm using a service account for authentication and working through the API, they are not available to me. I do not know how to apply them in the API.
How do I filter the conversion data in the request so that it satisfies the conditions written above?
Is there a way to solve my problem?
Thanks, sorry for my English.
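Not a full answer to the segment question, but for reference, a hedged sketch of the equivalent MCF Reporting API call with the Python client (the key file, view ID and dates are placeholders mirroring the Core Reporting example above): only the filters= expression is available, since MCF has no segment= parameter.

import httplib2
from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

credentials = ServiceAccountCredentials.from_json_keyfile_name(
    "service-account.json",  # hypothetical key file
    scopes=["https://www.googleapis.com/auth/analytics.readonly"],
)
service = build("analytics", "v3", http=credentials.authorize(httplib2.Http()))

response = service.data().mcf().get(
    ids="ga:XXXXXXXXXX",
    start_date="2013-03-05",
    end_date="2013-04-04",
    metrics="mcf:totalConversions,mcf:totalConversionValue",
    dimensions="mcf:source,mcf:medium",
    filters="mcf:medium==cpc",   # no dynamic segments here, unlike the ga: query above
).execute()

for row in response.get("rows", []):
    print([cell.get("primitiveValue") for cell in row])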
