I have noticed that when the EnhancedAirBook (EAB) method is called twice, separately for 2 flight segments, the last/second EAB response's TravelItinerary and AirPrice details give me the combined result of both segments. So I was wondering why it doesn't just consider the last result and proceed with those details.
A single call to EAB should be enough to book and get the total price of multiple segments at once.
If you make multiple calls to EAB within the same session, and with the IgnoreAfter=false flag in PostProcessing, the segments you ask the service to book on each call are concatenated into the same reservation; thus, you get the total price after the second call.
I'm logging some custom metrics in Application insights using the TelemetryClient.TrackMetric method in .NET, and I've noticed that occasionally some of the events are duplicated when I view them in the Azure portal.
I've drilled into the data, and the duplicate events have the same itemId and timestamp, but if I show the ingestion time by adding | extend ingestionTime = ingestion_time() to the query then I can see that the ingestion times are different.
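For illustration, a query along the following lines (here customMetrics is assumed as the source table) surfaces the duplicated items and their differing ingestion times:
customMetrics
| extend ingestionTime = ingestion_time()
| summarize occurrences = count(), ingestionTimes = make_set(ingestionTime) by itemId
| where occurrences > 1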
This GitHub issue indicates that this behavior is expected, as AI uses at-least-once delivery.
I plot these metrics in charts in the Azure portal using a Sum aggregation; however, these duplicates are undermining trust in the charts, as each duplicate is simply treated as a separate event.
Is there a way to de-dupe the events based on itemId before plotting the data in the Azure portal?
Update
A more specific example:
I'm running an algorithm, triggered by an event, which results in a reward. The algorithm may be triggered several dozen times a day, and the reward is a positive or negative floating point value. It logs the reward each time to Application Insights as a custom metric (called say custom-reward), along with some additional properties for data splitting.
In the Azure portal I'm creating a simple chart by going to Application Insights -> Metrics and customising the chart. I select my custom-reward metric in the Metric dropdown, and select Sum as the aggregation. I may or may not apply splitting. I save the chart to my dashboard.
This simple chart gives me a nice way of monitoring the system to make sure nothing unexpected is happening, and the Sum value in the bottom left of the chart allows me to quickly see whether the sum of the rewards is positive or negative over the chart's range, and by how much.
However, on occasion I've been surprised by the result (say over the last 12 hours the sum of the rewards was surprisingly negative), and on closer inspection I discovered that a few large negative results have been duplicated. Further investigation shows this has been happening with other events, but with smaller results I tend not to notice.
I'm not that familiar with the advanced querying side of Application Insights; I actually just used it for the first time today to dig into the events. But it does sound like there might be something I can do there to create a query, with the results deduped, that I can then plot?
Update 2
I've managed to make progress with this thanks to the tips by @JohnGardner, so I'll mark that as the answer. I've deduped and plotted the results by adding the following line to the query:
| summarize timestamp=any(timestamp), value=any(value), name=any(name), customDimensions=any(customDimensions) by itemId
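For context, the full query is roughly the following (with customMetrics assumed as the source table and custom-reward as the metric name, as in the example above):
customMetrics
| where name == "custom-reward"
| summarize timestamp=any(timestamp), value=any(value), name=any(name), customDimensions=any(customDimensions) by itemId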
Update 3
Adding the following line to the query allowed me to split on custom data (in this case splitting by algorithm ID):
| extend algorithmId = tostring(customDimensions.["algorithm-id"])
With that line added, when you select "Chart" in the query results, algorithmId now shows up as an option in the split dropdown. After that you can click "Pin to dashboard". You lose the handy "sum over the time period" indicator in the bottom left of the chart, which you get with the simple "Metrics" chart; however, I'm sure I'll be able to recreate that in other ways.
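For reference, a query that combines the dedupe with the split and bins the rewards over time might look roughly like this (the 1-hour bin and the render step are just one way to plot it; selecting "Chart" on the results works too):
customMetrics
| where name == "custom-reward"
| summarize timestamp=any(timestamp), value=any(value), customDimensions=any(customDimensions) by itemId
| extend algorithmId = tostring(customDimensions.["algorithm-id"])
| summarize sum(value) by algorithmId, bin(timestamp, 1h)
| render timechart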
If you are doing your own queries, you would generally be using something like summarize or make-series to do this deduping for a chart. You wouldn't generally plot individual items unless you are looking at a very small time range.
So instead of something like
summarize count() ...
you could do
summarize dcount(itemId) ...
Or you might add a "fake" summarize to a query that didn't need one before, with by itemId to coalesce multiple rows into just one, using any(x) to grab an individual row's value for each column for each itemId.
But it really depends on what you are doing in your specific query. If you were using something like sum(itemCount) to also deal with sampling, you now have other odd cases, where the at-least-once delivery might have duplicated sampled items (see the sketch below). Updating your question to add a specific query and a hypothetical result would possibly lead to a more specific answer.
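For example, a hypothetical query over customEvents could handle both duplicates and sampling by coalescing rows per itemId first and then summing the item counts (just a sketch of the idea, not a drop-in for any particular query):
customEvents
| summarize itemCount = any(itemCount), timestamp = any(timestamp) by itemId
| summarize sum(itemCount) by bin(timestamp, 1h)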
I'm learning dimensional modeling and I'm trying to create a model. I was thinking about a social media platform where users rate hotels. The platform has the following data:
hotel information: name and address
a user can rate hotels (1-5 points)
a user can write comments
platform stores the date of the comments
a hotel can answer via a comment, and the platform stores the date of it
the platform stores the total number of each rating level (e.g. all ratings with 1 point, all ratings with 2 points, etc.)
the platform stores information about the user: sex, name, total number of votes he/she made, and address
First, I tried to define which information belongs to a dimension or fact table
(here I also checked which ones are additive/semi-additive/non-additive)
I realized my example is kind of difficult, because it's hard to decide whether some of the information belongs in a fact table or a dimension.
I would like to hear some advice. Would someone agree with my model?
This is how I would model it:
Hotel information -> hotel dimension
User rating -> additive fact – because I can aggregate it across all dimensions
User comment -> semi-additive? – because I can aggregate them along the date dimension (I don't know if my argument is correct, but I know I would have new comments every day, which is for me a reason to store it in a fact table)
Answer as comment -> same handling as the user comments
Date of comment-> dimension
Total number of all votes (1/2/3/4/5) -> semi-additive facts – it makes no sense to sum them, since they are already totals, but I could take the average
User information (sex, name, address) -> user dimension
User information: total number of votes -> could be a dimension or a fact. It depends on how often it changes. If it changes often, I store it in a fact; if not that often, then in a dimension.
I still have some questions; I hope someone can help me:
My question: should I create two date dimensions, or can I store both dates (comment date and answer date) in one date dimension?
2nd question: each user and hotel has just one address. Are there arguments for separating the address into its own dimension? Can I create a 1:1 relationship between the user dimension and an address dimension?
Your model looks well considered, but here are some thoughts:
User comment (and answers to comments): they are an event to be captured (with new ones each day, as you mention) so are factual, with dimensionality of the commenter, type of comment, date, and the measure is at least a 'count' which is additive. But you don't want to store big text in a fact so you would put that in a dimension by itself which is 1:1 with the fact, for situations where you need to query on the comment itself.
Total Number of all votes (1/2/3/4/5) are, as you say, already aggregates, mostly for performance. Totals should be easy from the raw data itself so maybe not worthwhile to store them at all. You might also consider updating the hotel dimension with columns (hotel A has 5 '1' votes and 4 '2' votes) that you'd update as you go on, for easy filtering and categorisation.
User information: total number of votes: it is factual information about a user (dimension), and it depends on whether you always just want to 'find it out' about a person or whether you are likely to use it to filter other information (e.g. show me all reviews for users who have made 10-20 votes). In that case you might store the total in the user dimension (and/or a banding, like a 'number of reviews range' with 10-20, 20-30). You can update dimensions often if you need to, but you're right, it could still just live as a fact only.
As for date dimensions, if the 'grain' is 'day' then you only need one dimension, that you refer to from multiple facts.
As for addresses, you're right that there are arguments on both sides! Many people separate addresses into their own dimension, referred to from the other dimensions that use them. Kimball suggests you can do that behind the scenes if necessary, but prefers for each dimension to have its own set of address columns (but modelled as consistently as possible).
I'm very confused by the documentation, because in order to call the /group endpoint you need faceIds.
Thus, if you want to have faceIds, you need to go through face /detect.
But a faceId is available for 24h only:
The faceId will expire 24 hours after detection call.
Thus, does it mean that each time I need to analyse people in a batch I have to make two calls, first to detect the faces and then to group them?
I assume we can't have 'both in one'.
The id from /detect is ephemeral; you need to add the faces to a Person in a PersonGroup and train the group. You can then use the service to identify people by their faces.
I am trying to create the ability to take a picture of a person and locate their user account based on that picture. I have 1MM users, and each will have a photo which is only of them and will be associated to their user account via the creation of a PersonFace, which is added to a Person, which is in a PersonGroup. So with 10K persons per PersonGroup I need about 100 PersonGroups for my 1MM users.
So once all this is set up, I am not clear on how I identify a person given a supplied photo. That is, I know I pass the photo via the 'Detect' call; however, what is returned is an array of Face[], each of which includes a FaceId, and to get from a FaceId to a Person I must call 'Identify'. However, that call requires that I pass a PersonGroup Id, and I have 100 of them.
So given this, is the only solution to call Identify in a loop over all 100 group Ids?
Currently, yes, that is the only way to do it, which obviously makes scanning 1 million persons a less-than-ideal scenario given the 10-transactions-per-second limit.
There are a couple of upcoming features which will improve this scenario, but right now I don't have an ETA for them:
Significantly higher limits of Persons per PersonGroup
Additional tiers of the Face API which allow significantly higher transaction per second rate limits.
Is it possible to create a custom segment which is based on 2 other segments? I have a custom segment ("Segment A") with visitors that performed a certain action. I would like to see stats for all visitors that DIDN'T perform this action. So ideally, my custom segment would be:
All Visitors - Segment A = New Segment
Is this possible?
In general no, you cannot combine segments (there is no deeper reason; it's simply a function that is not implemented, possibly due to the large processing overhead nested segments would incur).
For your particular case the answer is still no, but for another reason: "All Visitors" comprises the whole of the data. If you create a segment, you remove a part of the data. So "subtracting a group from All Visitors" actually describes the process of creating a new segment in general.
As per my comment, you could exclude the sessions or visitors that performed the action by creating a segment with a condition of "does not contain" (or "does not match regex") for that action. This would effectively remove those visitors from the "All Visitors" data.