I am trying to consolidate two custom dimensions provided by different third parties into a third one, and after much reading and trying I am still not clear on the best approach. As an example, imagine one CD is Provider1_Company_Size and the other is Provider2_Company_Size. The issue is that each provider only detects a small subset of sessions, and we have two CDs for the same segmentation, so I would like a CD that merges them. It doesn't matter if they both provide an answer; I'd just take one provider.
Is there a way to do this, or what could be the best way to do it? I want to use this segmentation in reports.
The challenge is having multiple providers with sparse data, each using its own CD, which prevents using them in an integrated way.
Looking to see if it's possible to have a Rails app hit multiple dbs dynamically. To be more precise:
I have an app that can operate in different regions.
Each request that comes in will identify the region.
In MySQL, one region corresponds to exactly one db.
The dbs are identical in terms of schema, which means the AR models are all the same; it's just that, depending on the request, I want the model objects to be retrieved/updated from one of the per-region dbs.
All of the data is isolated to that particular db. There is never any crossover, nor any need to query multiple dbs at the same time.
One way to avoid multiple dbs is to add a "region" column to all the models/tables (I don't really like that).
Another way would simply be to fire up different instances for different regions. Again, I don't really want to do that given all the config overhead (cloud servers, nginx, etc.).
Any ideas?
I found that Rails 6.1 introduced the notion of horizontal sharding, which was exactly what I needed. I found this article useful:
https://www.freshworks.com/horizontal-sharding-in-a-multi-tenant-app-with-rails-61-blog/
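In case it helps anyone landing here, this is roughly what the Rails 6.1 setup looks like. It's a sketch only: it assumes config/database.yml defines one database entry per region under the current environment, and the names region_us/region_eu and the X-Region header are made up.

```ruby
# app/models/application_record.rb
# Map shard names to the per-region entries assumed to exist in database.yml.
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to shards: {
    default:   { writing: :region_us }, # used outside an explicit connected_to block
    region_us: { writing: :region_us },
    region_eu: { writing: :region_eu }
  }
end

# app/controllers/application_controller.rb
# Wrap every request in the shard identified by the request. Reading the
# region from a header is just a placeholder for however your requests
# actually carry it.
class ApplicationController < ActionController::Base
  around_action :switch_region_shard

  private

  def switch_region_shard(&block)
    shard = request.headers["X-Region"]&.to_sym || :region_us
    ActiveRecord::Base.connected_to(role: :writing, shard: shard, &block)
  end
end
```

Reading roles/replicas can be added to the same `connects_to` hash if you have them; otherwise the per-region databases stay completely isolated, which matches the "no crossover" requirement above.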
I have an application that creates persistent caches on a fixed region (MYAPP_REGION) with fixed cache names (MyApp.Data.Class1, MyApp.Data.Class2, ...etc.)
I am deploying 2 instances of this application for 2 different customers, but they use the same Ignite cluster.
What is the correct way to separate the data between the instances: do I change the cache names to be per customer, or is a region per customer enough?
In an RDBMS scenario, we would create 2 different databases, so I am wondering how to achieve the same thing when using Ignite as the storage solution.
Well, as you have mentioned, there are a variety of options. If it's only a logical division and you are OK with resource sharing, then, just like with a regular RDBMS, use multiple caches/tables or different SQL schemas. Keep in mind the desired data distribution and the number of caches/tables per customer. For example, if you have 3 nodes and 3 customers with about the same amount of data, most likely you'd want a custom affinity function to keep each customer's data collocated on a single node, but that's a slightly different question.
If you want more physical division, for example if one of the customers needs more resources or special features like native persistence, then it's better to follow the separate data regions approach, which might end up requiring separate clusters, though.
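For what it's worth, if you go the logical-division route with per-customer cache names, the application side can be as simple as prefixing the fixed names. A tiny sketch (the CUSTOMER_ID variable and the prefix scheme are made up, not anything Ignite-specific):

```ruby
# Each deployment is configured with its customer identifier; the fixed cache
# names from the app are prefixed with it, so the two instances never touch
# each other's caches even though they share the cluster.
CUSTOMER_ID = ENV.fetch("CUSTOMER_ID") # e.g. "customer_a" or "customer_b"

def cache_name(base)
  "#{CUSTOMER_ID}.#{base}"
end

cache_name("MyApp.Data.Class1") # => "customer_a.MyApp.Data.Class1"
```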
I would like some guidance on setting up BigQuery data storage from Google Analytics.
We have 6 different websites; 4 of them belong to one project and 2 to another. We would like to analyse the data separately for each site, per project across its sites, and across all sites together.
Hence, what is the best structure to set up in BigQuery?
Two projects with 4 and 2 datasets, or one main project with 2 datasets containing 4 and 2 tables? Is that even possible?
Or is it so easy to extract the data that it doesn't matter, and we can just put every site in its own project and extract the data as we want?
Please give me some guidance on this issue.
Kind regards
The short answer:
Or is it so easy to extract the data that it doesn't matter, and we can just put every site in its own project and extract the data as we want?
Yes!
The longer answer:
You can extract data from only one view per property (Set up a BigQuery Export), so start by identifying which one you'll link and ensure the settings are the same across all of the views you are going to import, assuming this is important to you.
Each profile/site will go into its own dataset and will be partitioned by day (one ga_sessions_YYYYMMDD table per day), making it easy to query them individually, or together, as required.
It is possible to query across projects, so if you store data across two, you'll still be able to join them.
In my opinion it would make things easier for analysts if the data were all in one project, as you'd be able to save queries in a single location and track query costs centrally; but if you need to keep 2 projects, your data can still be connected.
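To make the "query them together" part concrete, here is a rough sketch of combining two sites' daily export tables from Ruby with the google-cloud-bigquery gem. The project IDs, the numeric view-ID dataset names, and the date range are placeholders, and it assumes the Universal Analytics ga_sessions_ export schema:

```ruby
require "google/cloud/bigquery"

# Uses application default credentials; project/dataset names below are made up.
bigquery = Google::Cloud::Bigquery.new project_id: "my-project-a"

sql = <<~SQL
  SELECT 'site_1' AS site, SUM(totals.visits) AS sessions
  FROM `my-project-a.111111111.ga_sessions_*`
  WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
  UNION ALL
  SELECT 'site_5' AS site, SUM(totals.visits) AS sessions
  FROM `my-project-b.222222222.ga_sessions_*`
  WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
SQL

bigquery.query(sql).each { |row| puts "#{row[:site]}: #{row[:sessions]}" }
```

The same fully qualified `project.dataset.ga_sessions_*` references work whether the datasets live in one project or two, as long as the querying credentials can read both.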
I am working on building a database of timing and address information for restaurants, extracted from multiple websites. Since information for the same restaurant may be present on multiple websites, the database will contain some nearly duplicate copies.
The number of restaurants is large, say 100,000, so for each new entry I would have to do on the order of 100,000 comparisons to check whether a restaurant with a nearly similar name is already present, which is on the order of 100,000^2 comparisons across the whole dataset. So I am asking whether a more efficient approach is possible. Thank you.
Basically, you're looking for a record linkage tool. These tools can index records, then for each record quickly locate a small set of potential candidates, then do more detailed comparison on those. That avoids the O(n^2) problem. They also have support for cleaning your data before comparison, and more sophisticated comparators like Levenshtein and q-grams.
The record linkage page on Wikipedia used to have a list of tools on it, but it was deleted. It's still there in the version history if you want to go look for it.
I wrote my own tool for this, called Duke, which uses Lucene for the indexing, and has the detailed comparators built in. I've successfully used it to deduplicate 220,000 hotels. I can run that deduplication in a few minutes using four threads on my laptop.
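If you end up rolling your own instead, the detailed-comparison step can be as simple as a normalized edit-distance score over a couple of fields. A rough sketch, where the :name/:address keys, the field weights, and the 0.85 threshold are made-up values you would tune against your data:

```ruby
# Classic dynamic-programming Levenshtein edit distance between two strings.
def levenshtein(a, b)
  m = Array.new(a.length + 1) { |i| Array.new(b.length + 1) { |j| i.zero? ? j : (j.zero? ? i : 0) } }
  (1..a.length).each do |i|
    (1..b.length).each do |j|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      m[i][j] = [m[i - 1][j] + 1, m[i][j - 1] + 1, m[i - 1][j - 1] + cost].min
    end
  end
  m[a.length][b.length]
end

# Normalize the distance into a 0.0..1.0 similarity score.
def similarity(a, b)
  a = a.downcase.strip
  b = b.downcase.strip
  return 1.0 if a == b
  1.0 - levenshtein(a, b).to_f / [a.length, b.length].max
end

# Run only against the small candidate set produced by the index,
# never against all 100,000 records.
def same_restaurant?(rec1, rec2)
  score = 0.6 * similarity(rec1[:name], rec2[:name]) +
          0.4 * similarity(rec1[:address], rec2[:address])
  score >= 0.85
end
```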
One approach is to structure your similarity function such that you can look up a small set of existing restaurants to compare your new restaurant against. This lookup would use an index in your database and should be quick.
How to define the similarity function is the tricky part :) Usually you can translate each record to a series of tokens, each of which is looked up in the database to find the potentially similar records.
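As a sketch of that token lookup, assuming a restaurant_tokens(restaurant_id, token) table with an index on token; the table name, column names, and the SQLite handle are just placeholders:

```ruby
require "set"

# Break a restaurant name into lowercase word tokens.
def tokens(name)
  name.downcase.gsub(/[^a-z0-9\s]/, "").split
end

# Candidates are any existing restaurants sharing at least one name token
# with the new record; only these go through the detailed comparison.
# `db` is assumed to be e.g. a SQLite3::Database from the sqlite3 gem.
def candidate_ids(db, new_name)
  ids = Set.new
  tokens(new_name).each do |tok|
    db.execute("SELECT restaurant_id FROM restaurant_tokens WHERE token = ?", [tok]).each do |row|
      ids << row[0]
    end
  end
  ids
end
```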
Please see this blog post, which I wrote to describe a system I built to find near duplicates in crawled data. It sounds very similar to what you want to do and since your use case is smaller, I think your implementation should be simpler.
I am trying to build a map-based query interface for my website and I am having difficulty finding a starting point besides http://developer.google.com. I assume this is a rather simple task, but I feel as though I am on a wild goose chase. Anyway, the problem is that the existing site places people into a category based on their address (primarily the zip code); this is not working out because of odd shapes and user density, so I would like to solve the problem by creating custom zones.
I am not looking for a proprietary solution because I would really like to accomplish this on my own, I just need some better places to start or better suggestions for searches.
I understand that I will need to create a map with my predetermined polygons.
I understand how to create a map with polygons via js.
I do not understand how to determine which zone a given point is within, and how to return it as a hash I can store, e.g. user=>####, zone=>####, section=>#####
http://blog.appdelegateinc.com./point-in-polygon-checking-with-google-maps.html
has some JS you can add to give the ability to test whether a point is within a polygon (sample: http://blog.appdelegateinc.com./static/samples/point_in_polygon.html ) using this approach: http://en.wikipedia.org/wiki/Point_in_polygon#Ray_casting_algorithm
I think as you place the markers, you'll hold them in an array (of objects), then loop through, doing some sort of reduction of which polygons to test; for those that remain, if the point is in the polygon, set marker.zone and marker.section to whatever suits your needs.
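If you end up doing the zone assignment server-side (the hash in the question looks like Ruby), here is a small ray-casting sketch of the same Wikipedia algorithm. The :zone/:section keys and the zones structure are made up, and each polygon is assumed to be an array of [lat, lng] float pairs:

```ruby
# Ray casting: count how many polygon edges a horizontal ray from the point
# crosses; an odd count means the point is inside.
def point_in_polygon?(lat, lng, polygon)
  inside = false
  j = polygon.size - 1
  polygon.each_with_index do |(p_lat, p_lng), i|
    q_lat, q_lng = polygon[j]
    if (p_lng > lng) != (q_lng > lng) &&
       lat < (q_lat - p_lat) * (lng - p_lng) / (q_lng - p_lng) + p_lat
      inside = !inside
    end
    j = i
  end
  inside
end

# zones is assumed to be an array like:
#   [{ zone: 1, section: 10, polygon: [[40.1, -74.2], [40.3, -74.2], ...] }, ...]
def zone_for(user_id, lat, lng, zones)
  zone = zones.find { |z| point_in_polygon?(lat, lng, z[:polygon]) }
  return nil unless zone
  { user: user_id, zone: zone[:zone], section: zone[:section] }
end
```

zone_for returns nil when the point falls outside every polygon, so you can decide separately how to handle users who don't land in any custom zone.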