Using Face API To Identify Users - microsoft-cognitive

I am trying to build the ability to take a picture of a person and locate their user account from it. I have 1MM users, and each will have a photo showing only them, associated with their user account by creating a PersonFace that is added to a Person in a PersonGroup. With 10K Persons per PersonGroup, I need about 100 PersonGroups for my 1MM users.
Once all this is set up, I am not clear on how to identify a face in a supplied photo. I know I pass the photo via the 'Detect' call, but what is returned is a Face[] array, each element of which includes a FaceId. To get from a FaceId to a Person I must call 'Identify', but that call requires a PersonGroup ID, and I have 100 of them.
Given this, is the only solution to call Identify in a loop over all 100 group IDs?

Currently, yes, that is the only way to do it, which obviously makes scanning 1 million persons a less than ideal scenario given the limit of 10 transactions per second.
There are a couple upcoming features which will improve this scenario, but right now I don't have an ETA for them:
Significantly higher limits of Persons per PersonGroup
Additional tiers of the Face API which allow significantly higher transaction per second rate limits.
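Until then, the loop over all groups could look roughly like the sketch below. It is only an illustration: `identify` is a hypothetical callable standing in for whatever wrapper you write around the 'Identify' call (it is not a real SDK function), and the return shape of (person ID, confidence) pairs is an assumption. The throttle simply spaces calls out to stay under the 10 transactions per second mentioned above.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical wrapper type: identify(face_id, person_group_id) returns a
# list of (person_id, confidence) candidates. This is an assumed shape,
# not the signature of any real Face API client.
IdentifyFn = Callable[[str, str], List[Tuple[str, float]]]


def identify_across_groups(
    face_id: str,
    group_ids: Iterable[str],
    identify: IdentifyFn,
    max_calls_per_second: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Tuple[str, str, float]]:
    """Call identify once per group and keep the highest-confidence match.

    Returns (group_id, person_id, confidence), or None if no group matched.
    """
    min_interval = 1.0 / max_calls_per_second
    best: Optional[Tuple[str, str, float]] = None
    for group_id in group_ids:
        started = time.monotonic()
        for person_id, confidence in identify(face_id, group_id):
            if best is None or confidence > best[2]:
                best = (group_id, person_id, confidence)
        # Wait out the rest of this call's time slot to respect the rate limit.
        elapsed = time.monotonic() - started
        if elapsed < min_interval:
            sleep(min_interval - elapsed)
    return best
```

With 100 groups and 10 calls per second, a single lookup takes at least about 10 seconds, which is why the higher per-group limits would help so much.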

Related

modeling scenario with mostly semi-additive facts

I'm learning dimensional modeling and I'm trying to create a model. I was thinking about a social media platform which rates hotels. The platform has the following data:
hotel information: name and address
a user can rate hotels (1-5 points)
a user can write comments
platform stores the date of the comments
hotel can answer via comment and it stores the date of it
the platform stores the total number of each rating level (i.e.: all rates with 1 point, all rates with 2 point etc.)
platform stores information of the user: sex, name, total number of votes he/she made and address
First, I tried to define which information belongs to a dimension or fact table
(here I also checked which one is additive/semi additive/non-additive)
I realized my example is kind of difficult, because it’s hard to decide if it belongs to a fact table or dimension.
I would like to hear some advice. Would someone agree with my model?
This is how I would model it:
Hotel information -> hotel dimension
User rating -> additive fact – because I can aggregate them with all dimensions
User comment -> semi-additive? – because I can aggregate them with the date dimension (I don't know if my argument is correct, but I know I would have new comments every day, which for me is a reason to store them in a fact table)
Answer as comment -> same handling like with the user comments
Date of comment-> dimension
Total Number of all votes (1/2/3/4/5) -> semi-additive facts – it makes no sense to aggregate them, since each is already a total, but I could get the average
User information sex and name, address -> user-dimension
User Information: total number of votes -> could be dimension or fact. It depends how often it changes. If it changes often, I store it in a fact. If it's not that often, then in a dimension
I still have questions and hope someone can help me:
1st Question: should I create two date dimensions, or can I store both dates in one date dimension?
2nd Question: each user and hotel has just one address. Are there arguments for separating the address into its own dimension? Can I create a 1:1 relationship between a user dimension and an address dimension?
Your model looks well considered, but here are some thoughts:
User comment (and answers to comments): they are an event to be captured (with new ones each day, as you mention) so are factual, with dimensionality of the commenter, type of comment, date, and the measure is at least a 'count' which is additive. But you don't want to store big text in a fact so you would put that in a dimension by itself which is 1:1 with the fact, for situations where you need to query on the comment itself.
Total Number of all votes (1/2/3/4/5) are, as you say, already aggregates, mostly for performance. Totals should be easy from the raw data itself so maybe not worthwhile to store them at all. You might also consider updating the hotel dimension with columns (hotel A has 5 '1' votes and 4 '2' votes) that you'd update as you go on, for easy filtering and categorisation.
User Information: total number of votes: it is factual information about a user (dimension) and it depends on whether you always just want to 'find it out' about a person or whether you are likely to use it to filter other information (i.e. show me all reviews for users who have made 10-20 votes). In that case you might store the total in the user dimension (and/or a banding, like 'number of reviews range' with 10-20, 20-30). You can update dimensions often if you need to, but you're right, it could still just live as a fact only.
As for date dimensions, if the 'grain' is 'day' then you only need one dimension, that you refer to from multiple facts.
As for addresses, you're right that there are arguments on both sides! Many people separate addresses into their own dimension, referred to from the other dimensions that use them. Kimball suggests you can do that behind the scenes if necessary, but prefers each dimension to have its own set of address columns (modelled as consistently as possible).
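To make the suggestions above concrete, here is a minimal sketch of such a model using Python's built-in sqlite3 module. All table names, column names, and sample rows are made up for illustration; it shows a single date dimension shared by both fact tables, the comment text held in its own 1:1 dimension, address columns kept inside each dimension, and the per-level vote totals derived from the raw rating facts rather than stored.

```python
import sqlite3


def build_model() -> sqlite3.Connection:
    """Create an in-memory star schema with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT);
        CREATE TABLE dim_hotel (hotel_key INTEGER PRIMARY KEY, name TEXT,
                                street TEXT, city TEXT);
        CREATE TABLE dim_user (user_key INTEGER PRIMARY KEY, name TEXT, sex TEXT,
                               street TEXT, city TEXT);
        -- Big text lives in its own dimension, 1:1 with fact_comment.
        CREATE TABLE dim_comment_text (comment_key INTEGER PRIMARY KEY, body TEXT);
        CREATE TABLE fact_rating (
            date_key INTEGER REFERENCES dim_date(date_key),
            hotel_key INTEGER REFERENCES dim_hotel(hotel_key),
            user_key INTEGER REFERENCES dim_user(user_key),
            points INTEGER CHECK (points BETWEEN 1 AND 5));
        CREATE TABLE fact_comment (
            date_key INTEGER REFERENCES dim_date(date_key),
            hotel_key INTEGER REFERENCES dim_hotel(hotel_key),
            user_key INTEGER REFERENCES dim_user(user_key),
            comment_key INTEGER REFERENCES dim_comment_text(comment_key),
            comment_type TEXT,               -- 'user' or 'hotel_answer'
            comment_count INTEGER DEFAULT 1  -- additive measure
        );
    """)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                     [(20240101, "2024-01-01"), (20240102, "2024-01-02")])
    conn.execute("INSERT INTO dim_hotel VALUES (1, 'Hotel A', 'Main St 1', 'Berlin')")
    conn.executemany("INSERT INTO dim_user VALUES (?, ?, ?, ?, ?)",
                     [(1, "Ann", "f", "Elm St 2", "Berlin"),
                      (2, "Bob", "m", "Oak St 3", "Hamburg")])
    conn.executemany("INSERT INTO fact_rating VALUES (?, ?, ?, ?)",
                     [(20240101, 1, 1, 5), (20240102, 1, 2, 1)])
    conn.executemany("INSERT INTO dim_comment_text VALUES (?, ?)",
                     [(1, "Great stay"), (2, "Thank you")])
    conn.executemany("INSERT INTO fact_comment VALUES (?, ?, ?, ?, ?, ?)",
                     [(20240101, 1, 1, 1, "user", 1),
                      (20240102, 1, None, 2, "hotel_answer", 1)])
    return conn


def votes_per_level(conn: sqlite3.Connection, hotel_key: int) -> dict:
    """Derive the per-level vote totals from the raw rating facts."""
    rows = conn.execute(
        "SELECT points, COUNT(*) FROM fact_rating WHERE hotel_key = ? "
        "GROUP BY points ORDER BY points", (hotel_key,))
    return dict(rows.fetchall())


def comments_per_day(conn: sqlite3.Connection) -> list:
    """Sum the additive comment_count measure over the shared date dimension."""
    rows = conn.execute(
        "SELECT d.full_date, SUM(f.comment_count) FROM fact_comment f "
        "JOIN dim_date d ON d.date_key = f.date_key "
        "GROUP BY d.full_date ORDER BY d.full_date")
    return rows.fetchall()
```

If the per-level totals turn out to be needed for filtering, the same query could feed extra columns on the hotel dimension instead.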

Unique Users in Google Analytics

I'm trying to get all unique visitors for a selected time period, but I want to filter them by date on the server. However, the sum of unique visitors for each day isn't the number of unique visitors for the time period.
For example:
Monday: 2 unique visitors
Tuesday: 3 unique visitors
The unique visitors for the two days period isn't necessarily 5.
Is there a way to get the results I want using the Google Analytics API (v3)?
You're right that Users aren't additive, so you can't simply add them day by day. There are several ways around this.
The first and most obvious is that if you've implemented the User-ID feature, you should be able to pull the data directly and interrogate which users saw your site on which days.
Another way I've implemented before is to dynamically pull the number of Users from the Google Analytics API whenever you need it. Obviously this only works if you're populating a live web dashboard or similar, but since it's just the one figure you're asking for, it wouldn't slow down the load time by much. E.g. if you're using a dashboarding tool such as Klipfolio, you may be able to define a dynamic data source and query Google whenever you need the figure (https://support.klipfolio.com/hc/en-us/articles/216183237-BETA-Working-with-dynamic-data-sources)
You could also limit the number of ways that the data can be interrogated, and calculate all of them. For example, if you only allow users to look at data month-by-month or day-by-day, then you only need those figures.
Finally, you can estimate the figure with reasonable accuracy by splitting it into two parts. New Users are equal to New Sessions (you're only new on your first Session), which is additive, so that figure can be separated out and combined as required.
Then, you could take a rough ratio of new to returning Users (% New Users) from, say, 1 year of data, and use that with the New Users figure to generate an average on any level.
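As a rough illustration of both points, the sketch below first shows why daily uniques can't be summed (using made-up visitor IDs), and then applies the estimate described above by dividing the additive New Users total by a long-run % New Users ratio. The function name and the sample figures are assumptions for the example, and the result is only as good as the ratio you feed in.

```python
from typing import Iterable, Set


def unique_over_period(daily_visitors: Iterable[Set[str]]) -> int:
    """True unique count for a period: the size of the union of daily sets."""
    seen: Set[str] = set()
    for visitors in daily_visitors:
        seen |= visitors
    return len(seen)


def estimate_users(new_users_per_day: Iterable[int], pct_new_users: float) -> float:
    """Estimate Users for a period from additive New Users.

    pct_new_users is the long-run share of Users who are new (for example
    taken from a year of data), expressed as a fraction between 0 and 1.
    """
    if not 0 < pct_new_users <= 1:
        raise ValueError("pct_new_users must be in (0, 1]")
    return sum(new_users_per_day) / pct_new_users


# Made-up visitor IDs: "b" visits on both days.
monday = {"a", "b"}
tuesday = {"b", "c", "d"}
summed = len(monday) + len(tuesday)              # 5, counts "b" twice
actual = unique_over_period([monday, tuesday])   # 4
```

For example, 40 and 60 New Users on two days with a 50% New Users ratio gives `estimate_users([40, 60], 0.5)`, an estimate of 200 Users for the period.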

How to check similar faces in cognitive face api using persistedFaceIds

I am using the Face API to compare the confidence level of 2 images.
I was using a face list to which I added some images, then comparing a new image's faceId against this list to get its confidence level.
But as mentioned in https://eastus2.dev.cognitive.microsoft.com/docs/services/563879b61984550e40cbbe8d/operations/563879b61984550f30395237, a faceId is only valid for 24 hours and expires after that, whereas persistedFaceIds never expire. So can you please suggest how I can use these persistedFaceIds to compare against a newly added faceId and get its confidence level? We can only create 64 face lists (per subscription), and each list can contain 1000 records. This is another limit of this API.
Following is my requirement:
I am storing persons' images on a server, but every image should be unique. If I receive an image which is already stored on the server, I need to ignore that image.
Please suggest how I can achieve this. Thanks
Instead of using FaceList, I would suggest you use PersonGroup [1][2].
A PersonGroup supports 10,000 persons, which should fit most common scenarios. You can then use the Identify [3] API to check whether the image belongs to a specific person.
[1] https://westus.dev.cognitive.microsoft.com/docs/services/563879b61984550e40cbbe8d/operations/563879b61984550f30395244
[2] https://westus.dev.cognitive.microsoft.com/docs/services/563879b61984550e40cbbe8d/operations/563879b61984550f3039523c
[3] https://westus.dev.cognitive.microsoft.com/docs/services/563879b61984550e40cbbe8d/operations/563879b61984550f30395239

Track count of events unique by user rather than session

We have a way to fetch the number of sessions unique per device and the number of uses of The New Feature. This can be done with a public API and requires two events to be sent by the mobile applications to the Google Analytics server. It gives us statistics on the sessions in which The New Feature was used, although it doesn't directly reflect individual users' activity.
Ex: the app was opened 1000 times among all unique users (devices), The New Feature has been opened 200 times, and the resulting value is 200/1000 or 20%. The drawback is that in this case we have no way to tell that it wasn't one user who opened The New Feature 199 times and another who opened it just once, in which case the real retention rate is low to none.
The second statistic we want to be able to calculate is the percentage of unique users who have used The New Feature at least N times during a given period. This statistic should be a closer representation of the real retention of The New Feature, as it shows both the share of users who used the feature and how frequently they used it. We are not clear which events need to be set up for that.
Ex: the app was opened 1000 times: user A used The New Feature 10 times, user B 5 times, user C 4 times, most of the other users who used The New Feature opened it 2 times - The New Feature was opened 200 times in total. The resulting percentage of users: 10% have opened The New Feature at least once, 8% used it at least 2 times, ..., 1% used it at least 10 times.
The numbers from the second example give us more useful information about how often the new feature is being used, but it isn't clear how we can set this up. We would need some kind of event that shows the number of uses of The New Feature unique by user (not just by session). I think the event values might be used to distinguish the users. Would it be possible to get the number of unique users who have triggered the event at least N times this way? Any other suggestion is welcome.
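To pin down the calculation being asked for, here is a minimal sketch of the metric itself, independent of how the data is collected. It assumes each feature-use hit can be exported together with some stable user or device identifier; whether and how Google Analytics exposes that is exactly the open question, so the input format here is hypothetical.

```python
from collections import Counter
from typing import Iterable


def share_with_at_least(feature_hits: Iterable[str], total_users: int, n: int) -> float:
    """Share of all users who used the feature at least n times.

    feature_hits holds one user identifier per use of the feature
    (an assumed export format); total_users is the number of unique
    users in the period, including those who never used the feature.
    """
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    uses_per_user = Counter(feature_hits)
    qualifying = sum(1 for count in uses_per_user.values() if count >= n)
    return qualifying / total_users


# The skewed case from the first example: 200 uses, but only two users.
skewed_hits = ["heavy_user"] * 199 + ["casual_user"]
```

Here `share_with_at_least(skewed_hits, 1000, 1)` comes out at 0.2% of users, rather than the 20% suggested by the per-session ratio.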

How does collection sampling affect the "live" stats for Google Analytics?

We've noticed lately that as our site is growing, our data in Google Analytics is getting less reliable.
One of the places we've noticed this most strongly is on the "Realtime Dashboard".
When we were getting 30k users per day, it would show about 500-600 people on line at a time. Now that we are hitting 50k users per day, it's showing 200-300 people on line at a time.
(Other custom metrics from within our product show that the user behavior hasn't changed much; if anything, users are currently spending longer on the site than ever!)
The daily totals in analytics are still rising, so it's not like it's just missing the hits or something... Does anyone have any thoughts?
The only thing I can think of is that there is probably a difference in interpretation of what constitutes a user being on line.
How do you determine if the user is on line?
Unless there is an explicit login/logout tracking, is it possible that it assumes that a user has gone if there is no user generated event or a request from the browser within an interval of X seconds?
If that is the case, then it may be worthwhile adding a hidden iframe with some JavaScript code that keeps sending a request every t seconds.
You can't compare instant measures of unique, concurrent users to different time-slices of unique users.
For example, you could have a small number of concurrent unique users (say 10) and a much higher number of daily unique users (say 1000), because 1000 different people were there over the course of the day, but only 10 at any given time. The number of concurrent users isn't directly tied to the total daily uniques: the distribution over the course of the day may be uneven, so the comparison is almost apples and oranges.
This is the same way that monthly unique and daily uniques can't be combined, but average daily uniques are a lower bound for monthly uniques.
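A small sketch of the difference, using made-up session data: daily uniques count everyone who showed up at any point, while peak concurrency only counts sessions that overlap in time. The (user, start, end) tuple format is an assumption for the example, and it treats each user as having at most one open session at a time.

```python
from typing import Iterable, Tuple

Session = Tuple[str, int, int]  # (user_id, start_minute, end_minute), illustrative


def daily_uniques(sessions: Iterable[Session]) -> int:
    """Number of distinct users seen at any time during the day."""
    return len({user for user, _, _ in sessions})


def peak_concurrent(sessions: Iterable[Session]) -> int:
    """Largest number of sessions open at the same moment (sweep line)."""
    events = []
    for _, start, end in sessions:
        events.append((start, 1))
        events.append((end, -1))
    # At equal timestamps, process ends (-1) before starts (+1) so that
    # back-to-back sessions are not counted as overlapping.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


sessions = [("a", 0, 10), ("b", 5, 15), ("c", 20, 30), ("d", 25, 35), ("e", 40, 50)]
```

In this data five different users visit during the day, but no more than two are ever on the site at once, so shorter or more spread-out visits can lower the realtime figure even while daily totals grow.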

Resources