Metrics for Cosmos read regions - azure-cosmosdb

We've set up a Cosmos DB account in North Europe with geo-replication to West Europe. Consistency is set to "Session" (the default). The intent is to use North Europe as the single write region and both regions for reads, because the requirement is to have no performance degradation during batch ingestion of data into the database. We are using ADF to do the batch ingestion.
The question I have is: how do I monitor the metrics for the read-only region? When I look at Metrics in Cosmos DB, I still only see North Europe in the region dropdown.

Thanks. It turns out this is not a problem.
I found that when you create a write region and one or more read regions, the other regions' metrics will not be visible until there are some metrics to report. The replication of data itself does not contribute to the metrics/throughput usage.
To test this, I wrote some Python code to fetch some data with the secondary read region set as the preferred location. Just two minutes after executing the code, the read region appeared in the Metrics region dropdown.
The Python code I used to define the client is below:
client = CosmosClient(ENDPOINT, {'masterKey': MASTER_KEY}, preferred_locations = ['Central US'])
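For anyone who wants to reproduce the test with the current v4 azure-cosmos SDK, a rough equivalent sketch is below; the endpoint, key, database, container and region names are placeholders you would swap for your own:
# Rough v4-SDK sketch of the same test; all names below are placeholders.
from azure.cosmos import CosmosClient

ENDPOINT = "https://<your-account>.documents.azure.com:443/"
MASTER_KEY = "<your-account-key>"

# Listing the secondary read region first routes reads there.
client = CosmosClient(ENDPOINT, credential=MASTER_KEY,
                      preferred_locations=['West Europe', 'North Europe'])

container = client.get_database_client('<database>').get_container_client('<container>')

# Any read traffic like this should make the read region start reporting metrics.
for item in container.query_items(query='SELECT TOP 10 * FROM c',
                                  enable_cross_partition_query=True):
    print(item['id'])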
I am closing this question.

I saw the same behavior on my side.
I have a test database with East Asia set as the write region and the other regions as read. When I opened the Metrics page, only East Asia appeared in the region filter dropdown. I suspect it reflects the location of the operations (all my operations came from that region, so it was the only choice offered). After I removed the East Asia region under "Replicate data globally" and ran some queries, I could see another region in Metrics.
I also tested with another of my databases, which does not have global distribution enabled and which I had not used for a long time. When I opened the Metrics page, it offered no region choices at all. But after executing a query and waiting for a while, the region showed up in the dropdown.

Related

R Shiny: Why do I need (Azure) SQL?

I have 30GB of COVID risk-from-viral-exposure data in a single flat CSV file. I have made an R Shiny app with filters that allow me to select a subset of these data and plot them. Eventually I would like a handful of people to use the R Shiny app in a secure, password-protected environment outside of my organisation. Obviously I can't read the 30GB into memory, so I have tried:
Disk.frame
sqlite
Downsampling
But these are still super slow. I'm told I need Azure SQL and my data bottleneck problem will go away, but that's about $10K per year. Is that my best option?
Please ask if I need to give more info.
EDIT: Some info about the data:
Each row is a simulated passenger on public transport. Each column is some attribute of their journey: where they got on/off, the time they got on/off, how close they were to an infectious person, whether they themselves were infectious, and the dose they received through breathing it in or touching a surface. Then there are columns about the passenger loading, the prevalence of the virus in the community, and others about the train itself. So there is a lot of repetition and no indexing. I think that might be where the bottleneck is coming from.
An example query would be:
Select passengers wearing masks &
Passengers that got on at FIN &
All passengers who wore masks &
Passengers where the train was 50% full &
Passengers that were on a train with bad ventilation.
Plot the dose they received.
Azure SQL will meet the need for a database that will handle that data volume, and it will provide a secure environment in which to do that.
As for pricing, it doesn't need to cost $10K per year, unless you have very specific performance requirements. I just quoted an S2 database (50 DTUs, 250GB storage) for $89/month. If you want super-scalability, you can go serverless, and the same size database can support 2 vCores scaling on demand to 16 vCores for $113/month.
Now, does that mean you have to use Azure's SQL offering? No, but it could be a viable solution for you.
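To make that concrete, here is a minimal sketch of how the example filter-and-plot query might be run against such a database once the CSV has been loaded into a table; the table name, column names and connection string are hypothetical, and pyodbc is just one possible client (from R Shiny you would do the same thing through DBI/odbc):
# Hypothetical sketch: running the example filters against Azure SQL via pyodbc.
# Table and column names are made up to mirror the pseudo-query above.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:<your-server>.database.windows.net,1433;"
    "Database=<your-db>;Uid=<user>;Pwd=<password>;Encrypt=yes;"
)

sql = """
SELECT dose_received
FROM passengers
WHERE wore_mask = 1
  AND boarding_station = 'FIN'
  AND train_load_pct = 50
  AND ventilation_quality = 'bad';
"""

doses = [row.dose_received for row in conn.execute(sql)]
# 'doses' is what you would hand to the plot. An index on the filtered
# columns is what makes this fast compared to scanning a 30GB flat file.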

Reverse geocode latitude/longitude coordinates to retrieve landuse data (eg. residential area, highway, etc.)

I would like to analyse the locations of electric vehicle charging stations for Germany, Italy and France. Those three countries, because they differ quite a lot in regard to their respective incentive programmes for public charging station infrastructure.
What I have so far are .csv exports from both OpenChargeMap and OpenStreetMap containing the location data (latitude and longitude) of all charging stations in those three countries, along with some other information that I can process in R.
What I would like to do now is some sort of reverse geocoding on those latitude and longitude coordinates to retrieve additional information about the surroundings: in particular, whether the respective charging station is located, for example, in a residential area of a city or at a rest stop on the highway. By knowing at what kind of locations the charging stations are placed in those three countries, I am hoping to be able to draw conclusions regarding the incentive programmes. I'm not looking for specific addresses in this case, but rather an API or another way to process thousands of coordinates and retrieve information regarding, for example, population density or any other piece of data from which I could draw conclusions.
I have tried to get OpenStreetMap exports to work, but unfortunately I cannot seem to be able to query for the 'landuse' attribute through the Overpass Turbo API. This is the basic query that I'm using in this specific API, but as soon as I query for ["landuse" = "residential"] instead of ["landuse" = ""] I get back empty results.
I found an API from Google which would offer lookup for various address components/types. Unfortunately, registering an API key at Google is not quite realistic for the scope of my work. Does somebody know of a (preferably FOSS) API that is able to do something like this? Or even how to make a 'landuse' query work in the Overpass Turbo API linked above?
Thank you in advance for your time.
Your Overpass API query is looking for elements that are tagged with both amenity=charging_station and landuse. That combination is rather uncommon, since charging stations and landuse areas are mapped as distinct objects. Instead, you need to look around charging stations for landuse elements.
So instead of
area["ISO3166-1"="DE"]->.a;
nwr(area.a)["amenity"="charging_station"]["landuse"=""];
you will need a query like
area["ISO3166-1"="DE"]->.a;
nwr(area.a)["amenity"="charging_station"];
way(around:200)["landuse"];
out;
This searches for ways with a landuse tag located within 200 meters of charging stations.
Note that this is a rather heavy query. You should probably use your own Overpass API server for it.
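Since you want to process thousands of coordinates programmatically, here is a rough sketch of running such a query from Python against the public Overpass endpoint; the 200 m radius and the country code are just the values from the example above, and the timeout values are arbitrary:
# Sketch: fetch landuse ways within 200 m of charging stations in Germany
# from the public Overpass API endpoint. Expect this to be a heavy query.
import requests

query = """
[out:json][timeout:900];
area["ISO3166-1"="DE"]->.a;
nwr(area.a)["amenity"="charging_station"];
way(around:200)["landuse"];
out center;
"""

response = requests.post("https://overpass-api.de/api/interpreter",
                         data={"data": query}, timeout=1200)
response.raise_for_status()

for element in response.json()["elements"]:
    print(element.get("center"), element["tags"].get("landuse"))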

Ingesting Google Analytics data into S3 or Redshift

I am looking for options to ingest Google Analytics data (historical data as well) into Redshift. Any suggestions regarding tools or APIs are welcome. I searched online and found Stitch as one of the ETL tools; help me understand this option better, and any other options you may have.
Google Analytics has an API (the Core Reporting API). This is good for getting the occasional KPIs, but due to API limits it's not great for exporting large amounts of historical data.
For big data dumps it's better to use the Link to BigQuery ("Link" because I want to avoid the word "integration" which implies a larger level of control than you actually have).
Setting up the link to BigQuery is fairly easy - you create a project in the Google Cloud Console, enable billing (BigQuery comes with a fee, it's not part of the GA360 contract), add your email address as BigQuery Owner in the "IAM&Admin" section, go to your GA account and enter the BigQuery Project ID in the GA Admin section, "Property Settings/Product Linking/All Products/BigQuery Link". The process is described here: https://support.google.com/analytics/answer/3416092
You can select between standard updates and streaming updates - the latter comes with an extra fee, but gives you near-realtime data. The former updates the data in BigQuery three times a day, every eight hours.
The exported data is not raw data; it is already sessionized (i.e. while you will get one row per hit, things like the traffic attribution for that hit will be session-based).
You will pay three different kinds of fees - one for the export to BigQuery, one for storage, and one for the actual querying. Pricing is documented here: https://cloud.google.com/bigquery/pricing.
Pricing depends on region, among other things. The region where the data is stored might also be important when it comes to legal matters - e.g. if you have to comply with the GDPR, your data should be stored in the EU. Make sure you get the region right, because moving data between regions is cumbersome (you need to export the tables to Google Cloud Storage and re-import them in the proper region) and kind of expensive.
You cannot just delete data and do a new export - on your first export BigQuery will backfill the data for the last 13 months, but it will do this only once per view. So if you need historical data, better get this right, because if you delete data in BQ you won't get it back.
I don't actually know much about Redshift, but as per your comment you want to display data in Tableau, and Tableau directly connects to BigQuery.
We use custom SQL queries to get the data into Tableau (Google Analytics data is stored in daily tables, and custom SQL seems the easiest way to query data over many tables). BigQuery has a user-based cache that lasts 24 hours as long as the query does not change, so you won't pay for the query every time the report is opened. It still is a good idea to keep an eye on the cost - cost is not based on the result size, but on the amount of data that has to be searched to produce the wanted result, so if you query over a long timeframe and maybe do a few joins a single query can run into the dozens of euros (multiplied by the number of users who use the query).
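As an illustration of the custom-SQL-over-daily-tables approach, here is a minimal sketch using the google-cloud-bigquery client; the project ID, the dataset (the GA export dataset is named after your view ID) and the date range are placeholders:
# Sketch: query the GA export's daily tables (ga_sessions_YYYYMMDD) with a
# wildcard table. Project ID, dataset and date range below are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

sql = """
SELECT date,
       SUM(totals.visits) AS sessions,
       SUM(totals.transactions) AS transactions
FROM `my-project.<view_id>.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20230101' AND '20230131'
GROUP BY date
ORDER BY date
"""

for row in client.query(sql).result():
    print(row.date, row.sessions, row.transactions)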
scitylana.com has a service that can deliver Google Analytics Free data to S3.
You can get 3 years or more.
The extraction is done through the API. The schema is hit level and has 100+ dimensions/metrics.
Depending on the amount of data in your view, I think this could be done with GA360 too.
Another option is to use Stitch's own specification, singer.io, and related open source packages:
https://github.com/singer-io/tap-google-analytics
https://github.com/transferwise/pipelinewise-target-redshift
The way you'd use them is by piping data from one into the other:
tap-google-analytics -c ga.json | target-redshift -c redshift.json
I like the Skyvia tool: https://skyvia.com/data-integration/integrate-google-analytics-redshift. It doesn't require coding. With Skyvia, I can create a copy of Google Analytics report data in Amazon Redshift and keep it up to date with little to no configuration effort. I don't even need to prepare the schema — Skyvia can automatically create a table for the report data. You can load 10,000 records per month for free — this is enough for me.

Tracking a Search that leads to a sale in GA

This seems really basic, but I am struggling with it.
We have a client who runs a travel website.
They have a few different search bars, e.g. Flights, Hotels, Car hire.
I am trying to track the performance of each: "What % of people who ran a Flight search completed a sale?" The same for Hotels and for Car hire.
Any ideas for the best way to get this info in GA?
Many thanks
There are a few ways to get this information, each with their pros and cons. The options that I see immediately available are segments and goals.
Segments are great because they are retrospective and generally more flexible, with the ability to be changed if you find your criteria aren't quite right. You create a segment and specify sessions that go through the search results pages etc.
Then you can create another segment for the booking confirmation page, and any other intermediary steps that you'd like to report on. The main con of segments is that you can only pull in four at a time, but if you have more you can pull them four at a time and copy and paste the data into an Excel sheet or Google Sheet. Segments can also be pulled via the Core Reporting API and Data Studio, which makes them great for automating into dashboards.
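For example, pulling a segment programmatically through the Reporting API v4 might look roughly like this; the view ID and segment ID are placeholders, and the sketch assumes a service-account credential:
# Sketch: pull sessions/transactions for one saved segment via the
# Analytics Reporting API v4. The view ID and segment ID are placeholders.
from google.oauth2 import service_account
from googleapiclient.discovery import build

creds = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/analytics.readonly"])
analytics = build("analyticsreporting", "v4", credentials=creds)

response = analytics.reports().batchGet(body={
    "reportRequests": [{
        "viewId": "123456789",
        "dateRanges": [{"startDate": "30daysAgo", "endDate": "today"}],
        "metrics": [{"expression": "ga:sessions"},
                    {"expression": "ga:transactions"}],
        # The ga:segment dimension is required whenever segments are used.
        "dimensions": [{"name": "ga:segment"}],
        "segments": [{"segmentId": "gaid::-1"}],  # replace with your segment's ID
    }]
}).execute()

for report in response["reports"]:
    for row in report["data"]["rows"]:
        print(row["dimensions"], row["metrics"][0]["values"])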
Goals are cool because they pull into the default reports and basically track sessions through a particular page, event or sequence. The main con I see, and the reason I don't use them, is that they only start tracking from the time you create them, and if you change the configuration it does not affect historical data, so your data can get messed up quickly if you don't have sandbox GA views or sandbox goals for testing before putting a goal into a dedicated slot. You can also only have 10 or 20 goals depending on your plan, and once data is tracked against a goal you can't remove or clear it.

Any free mapping service to display and filter 250000+ datapoints?

I have participated in a Hackathon in my city, and the traffic department made public a dataset with more than 250 thousand traffic accident datapoints, each one containing Latitude, Longitude, type of accident, vehicles involved, etc.
I made a test to display the data using Google Maps API and Google Fusion Tables, but the usage limits were quickly reached with the first two years of a total of 13 years of records.
The data for two years can be displayed and filtered here.
So my question is:
Which free online services could I use in order to interactively display and filter 250 thousand such datapoints as map layers?
It is important that the service be free, because we are volunteering our time for the non-profit public good. Currently our City Hall is implementing an API, but it is not ready yet, and it would be useful to present them with some well-accepted use cases to apply some political pressure for further API development on THEIR server (especially remotely querying a database instead of crawling a bunch of .csv files, as it is now...)
An alternative would be to put everything on GitHub and load the whole dataset client-side to be manipulated with D3.js, for example, but that seems very inefficient both for the client/user and for the server.
Thanks for reading, and feel free to re-tag if needed.
You need Google Maps API for Business to achieve what you want, but it costs a lot of money.
However, in some cases you can get this Business licence if you work for a non-profit organization. I can't find the exact rules for being eligible for this free licence. I tried googling them but couldn't find anything. I only found this link; take a look to see if it answers your problem.
You should be able to do that with Google Fusion Tables. The limit is 100,000 points per table, but you can overlay 5 layers onto a single map so in effect you can reach 500,000 points. I implemented the website below and have run it with over 200,000 points.
http://www.skyscan.co.uk/mapsearch.html
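If you go that route, splitting the full dataset into chunks that fit the 100,000-points-per-table limit is straightforward; a rough sketch with pandas (the file names are placeholders):
# Sketch: split a ~250k-row accidents CSV into <=100,000-row files,
# one per Fusion Tables layer. File names are placeholders.
import pandas as pd

chunks = pd.read_csv("accidents.csv", chunksize=100_000)
for i, chunk in enumerate(chunks, start=1):
    chunk.to_csv(f"accidents_layer_{i}.csv", index=False)
    print(f"layer {i}: {len(chunk)} rows")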
