Can you resolve identity queries whilst Face API is undergoing training? - microsoft-cognitive

I am investigating solutions for identifying people utilising facial recognition and I am interested in using Microsoft's Face API.
I have noted that when adding new people the model needs to be trained again before those people will be recognised.
For our application it is crucial that whilst training is happening that the model continues to resolve identify requests so that the service runs uninterrupted.
It seems to make sense that the old model would continue to respond to identify requests whilst the new model is being trained up but I am not sure if this assumption is correct.
I would be grateful if someone with knowledge of the API could advise if this is the case or if not if there is another way round to ensure continuous resolution of identify requests. I have thought about creating a whole new person group with all the new images but this involves copying a lot of data and seems an inefficient way to go.

From the same documentation link in the previous answer:
During the training period, it is still possible to perform Identification and FindSimilar if a successful training is done before. However, the drawback is that the new added persons/faces will not appear in the result until a new post migration to large-scale training is completed.
My understanding is that this would work with LargePersonGroups (hence "post migration to large-scale training") but is unclear if would work for the legacy PersonGroups.

I tried a few manipulations on my projects using face API but the collection of faces is too small and the training is too fast to check. I think it's not blocking the previous version but cannot guarantee it.
Anyway, you will be interested in the following part of the documentation addressing the problems of training latency: https://learn.microsoft.com/en-us/azure/cognitive-services/face/face-api-how-to-topics/how-to-use-large-scale#buffer
It shows how you could avoid the problem you depict by using a "buffer" group

Related

Can't get my A3C with LSTM layer using Tensorflow to work

I have recently tried to implement my own version of the Asynchronous Advantage Actor-Critic (A3C) method for deep reinforcement learning since I couldn't get other A3C implementations found in the web to work properly. The problem is that my version isn't converging either....So, I would really appreciate any help to identify the problem. The code is located in here: https://github.com/MatheusMRFM/A3C-LSTM-with-Tensorflow. I am training the method using the Pong game from the Open AI gym environment. Here's what I did:
My implementation is heavily based in the following A3C implementations: Arthur Juliani's version, Open AI's A3C and andreimuntean's version. I've chosen these implementations due to their clarity and because everything seemed correct according the the original A3C paper;
I'm using a network as follows: a set of convolutional layers, a fully connected layer, an LSTM layer, and again, two fully connected layers (one for the policy and the other for the value function). I already tested several other architectures (changing the concolutional layers, removing the first hidden layer, change the output of the hidden and LSTM layers, etc. None of these configurations worked....
I tried 3 different optimizers: RMSPropOptimizer, AdadeltaOptimizer, and AdamOptimizer. I also tried different learning rates for each one. No luck;
I already tried several parameters based on the implementations that I looked.
My code always ends up converging to a policy where the paddle always moves up or always moves down (not both). There must be some stupid detail that I missed, but I can't find it. Any help is welcome!
I have actually found what the problem was. And just as I thought, it was only a simple detail that messed everything up: in my code and in all the other codes that I mentioned in my post, the policy loss function uses the softmax log of the policy outputted by the network. In order to avoid Nan results (in case the policy has a 0 associated to at least one action), I added a small value to the policy (in my code, this is done in line 180 from Network.py). It seems that this small value (1e-8) wasn't so small after all, and it was messing with the policy loss function. I ended up using 1e-13 and it then worked. I tested in the VizDoom environment (same map used in Arthur Juliani's version) and it converged within about 6k episodes.
Hope it helps anyone with a similar problem. I will soon update my code in my GitHub account.

Scaling Firefeed Followers

I am in the research phase of a realtime application that I want to write and I think Firebase is the right choice but I'm currently stuck trying to figure out my data schema. My application is similar to the Firefeed example application in that it is a social inbox. My issue is the following code where the data is looped and copied to every user that "follows" the current user. Theoretically if this was Twitter and someone like Kim Kardashian posted a new Spark it'd have to loop through and save 50,000,000+ records.
Doing this on the client side, or doing this at all, seems extremely slow and error prone to me. Is this a valid concern? I realize my app has zero users right now but I'd like to plan my scaling ahead of time.
// Add spark ID to the feed of everyone following this user.
currentUser.child("followers").once("value", function(list) {
list.forEach(function(follower) {
var childRef = firebase.child("users").child(follower.name());
childRef.child("feed").child(sparkRefId).set(true);
});
});
I'd really appreciate any help and insight here!
Thanks.
tl;dr: I'd wait until you get further before coding up solutions. Avoid premature optimization.
Distant future scaling issues are very hard to optimize for because it's very difficult to predict how people will end up using your software.
But, to answer your specific question, there are ways to handle the Kim Kardashians of the social media world. It all comes down to partitioning behavior. You're going to have to treat them differently than the rest of your users. You'll have to do this regardless of the tech stack you use.
The extent to which you partition behavior depends a lot on the distribution of your users. Remember Tom from MySpace? That's an extreme example. I bet there were references to isTom peppered all over the codebase to deal with it, but we probably don't need to go that far.
In the case of the snippet of code in your question, it's already got a lot going for it in terms of scale. It distributes the data across all of the followers, and in doing so does not create any hot spots in the data. It will, however, take some time to run for 50,000,000 users.
My first attempt at optimization would be to take the same code and put it on a node worker. I'd then switch the client to register a task for that node worker for my really popular users.
If that still wasn't fast enough, I'd start looking into ways to partition the data for my super-users.

How to approach writing developer tests (unit tests, integration tests, etc) for a system?

I have a WCF service which runs and interacts with database, file system and few external web services, then creates the result and Xml Serialize it and returns it finally.
I'd like to write tests for this solution and I'm thinking how (it's all using dependency injection and design by contract).
There are 3 main approaches I can take.
1) I can pick smallest units of codes/methods and write tests for it. Pick one class and isolate it from its dependencies (other classes, etc). Although it guarantees quality but it takes lots of time writing them and that's slow.
2) Only make the interaction with external systems mockable and write some tests that cover the main scenarios from when the request is made until the response is serialized and returned. This will test all the interactions between my classes but mocks all external resource accesses.
3) I can setup a test environment where the interaction with external web services do happen, file access happens, database access happens, etc. Then writing the tests from end to end. this requires environmental setup and dependency on all other systems to be up and running.
About #1, I see no point in investing the time/money/energy on writing the tests for every single method or codes that I have. I mean it's a waste of time.
About #3, since it has dependency on external resources/systems, it's hard to set it up and running.
#2, sounds to be the best option to me. Since it will test what it should be testing. Only my system and all its classes and mocking all other external systems.
So basically, my conclusion after some years experience with unit tests is that writing unit tests is a waste to be avoided and instead isolated system tests are best return on investment.
Even if I was going to write the tests first (TDD) then the production code, still #2 I think would be best.
What's your view on this? would you write small unit tests for your application? would you consider it a good practice and best use of time/budget/energy?
If you want to talk about quality, you should have all 3:
Unit tests to ensure your code does what you think it does, expose any edge cases and help with regression. You (developer) should write such tests.
Integration tests to verify correctness of entire process, whether components talk to each other correctly and so on. And again, you as a developer write such tests.
System-wide tests in production-like environment (with some limitations naturally - you might not have access to client database, but you should have its exact copy on your local machines). Those tests are usually written by dedicated testers (often in programming languages different from application code), but of course can be written by you.
Second and third type of tests (integration and system) will be way too much effort to test edge cases of smaller components. This is what you usually want unit tests for. You need integration because something might fail on hooking-up of tested, verified and correct modules. And of course system tests is what you do daily, during development, or have assigned people (manual testers) do it.
Going for selected type of tests from the list might work to some point, but is far from complete solution or quality software.
All 3 are important and targeted at different test types that is a matrix of unit/integration/system categories with positive and negative testing in each category.
For code coverage Unit testing will yield the highest percentage, followed by Integration then System.
You also need to consider whether or not the purpose of the test is Validation (will meet the final user\customer requirements, i.e Value) or Verification (written to specification, i.e. Correct).
In summary the answer is 'it depends', and I would recommend following the SEI CMMi model for Verification and Validation (i.e. testing) which begins with the goals (value) of each activity then subjecting that activity to measures that will ultimately allow the whole process to be subjected to continuous improvement. In this way you have isolated the What and Why from the How and you will be able to answer time and value type questions for your given environment (which could be a Life support System or a Tweet of the day, to your favorite Aunt, App).
Summary: #2 (integration testing) seems most logical, but you shouldn't hesitate to use a variety of tests to achieve the best coverage for pieces of your codebase that need it most. Shooting for having tests for "everything" is not a worthy goal.
Long version
There is a school of thought out there where devs are convinced that adopting unit\integration\system tests means striving for every single chuck of code being tested. It's either no test coverage at all, or committing to testing "everything". This binary thinking always makes adopting any kind of testing strategy seem very expensive.
The truth is, forcing every single line of code\function\module to be tested is about as sound as writing all your code to be as fast as possible. It takes too much time and effort, and most of it nets very little return. Another truth is that you can never achieve true 100% coverage in a non-trivial project.
Testing is not a goal unto itself. It's a means to achieve other things: final product quality, maintainability, interoperability, and so on, all while expending the least amount of effort possible.
With that in mind, step back and evaluate your particular circumstances. Why do you want to "write tests for this solution"? Are you unhappy with the overall quality of the project today? Have you experienced high regression rates? Are you perhaps unsure about how some module works (and more importantly, what bugs it might have)? Regardless of what your exact goal is, you should be able to select pieces that pose particular challenges and focus your attention on them. Depending on what those pieces are, an appropriate testing approach can be selected.
If you have a particularly tricky function or a class, consider unit testing them. If you're faced with a complicated architecture with multiple, hard to understand interactions, consider writing integration tests to establish a clean baseline for your trickiest scenarios and to better understand where the problems are coming from (you'll probably flush out some bugs along the way). System testing can help if your concerns are not addressed in more localized tests.
Based on the information you provided for your particular scenario, external-facing unit testing\integration testing (#2) looks most promising. It seems like you have a lot of external dependencies, so I'd guess this is where most of the complexity hides. Comprehensive unit testing (#1) is a superset of #2, with all the extra internal stuff carrying questionable value. #3 (full system testing) will probably not allow you to test external edge cases\error conditions as well as you would like.

Regarding Automation Approach

I am supposed to automate an application which is developed in PowerBuilder. In order to test this Application we are using Rational Robot as a functional testing tool. We expect at least 40 -50% of change control in the Application for each release. Release trends are scheduled at least 3 times in a year.
The product has different setup for each client. Accordingly scenario has been derived. Although if is there any change occurs, it would be in functional feature and also in interface. Pointing to that, need to proceed with automation. Identified few areas which are stabilized (i.e., where no major changes occurs) to automate. Will that be feasible for proceeding with Automation?
Could you please suggest me how to go about on this?
I've seen the answer to your last question involve a consultant spending two days interviewing the development team, then three days developing a report. And, in some cases, I'd say the report was preliminary, introductory, and rushed. However, let me throw out a few ideas that may help manage expectations on your team.
Testing automation is great for checking for regressions in functionality that allegedly isn't being touched. Things like framework changes or database changes can cause untouched code to crash. For risk-averse environments (e.g. banking, pharmaceutical prescriptions), the investment in automation is well worth the effort.
However, what I've seen often is an underestimation of the effort. To really test all the functionality in a unit (let's say a window, for example), you need to review the specifications, design tests that test each functional point, plan your data (what is the data going into the window, how will you make sure this data is as expected when you start the test each time, what is the data when you've finished your tests, how will you ensure the data, including non-visible data, is correct), then script and debug all your tests. I'm not sure what professional testers say (I'm a developer by trade, but have taken a course in an automated testing tool), but if you aren't planning for the same amount of effort to be expended on developing the automation tests as you're spending on application development for the same functionality, I think you'll quickly become frustrated. Add to that the changing functionality means changing testing scripts, and automated testing can become a significant cost. (So, tell your manager that automated testing doesn't mean you push a button and things get tested. < grin > )
That's not to say that you can't expend less effort on testing and get some results, but you get what you pay for. Having a script that opens and closes all the windows in the app provides some value, but it won't tell you that a new behaviour implemented in the framework is being overridden on window X, or that a database change has screwed up the sequencing of items in a drop down DataWindow, or a report completion time goes from five seconds to five hours. However, again, don't underestimate the effort. This is a new tool with a new language and idiosyncrasies that need to be figured out and mastered.
Automated testing can be a great investment. If the cost of failure is significant, like a badly prescribed drug causing a death, then the investment is worth it. However, for cases where there is a high turn over of functionality (like I think you're describing) and the consequences of a failure is less critical, you might want to consider comparing the cost/benefit of testing automation with additional manual testing resources.
Good luck,
Terry
Adding on to what #Terry said: It sounds like you are both new to automation in general and to Rational Robot in particular.
One thing to keep in mind is that test automation is software development and needs to be treated as such. That means you need personnel dedicated to the automation effort who are solid programmers and have expertise in the tool being used (Robot in this case).
If your team does not have the general programming/automation skills and the specific Robot skills, you are going to need to hire that personnel or get existing staff trained in those skill sets.

How do you deal with design changes?

I just finished working on a project for the last couple of months. It's online and ready to go. The client is now back with what is more or less a complete rewrite of most parts of the application. A new contract has been drafted and payment made for the additional work involved.
I'm wondering what would be the best way to start reworking this whole thing. What are the first few things you would do? How would you rework the design in a way that you stay confident that the stuff you're changing does not break other stuff?
In short, how would you tackle drastic application design changes efficiently (both DB and code)?
Presuming that you have unit tests in place, this is just refactoring.
If you don't have unit tests in place, then
Write unit tests for the parts you're likely to keep.
Write unit tests for the parts you're going to change.
Run the tests. The "keep" should pass. The "change" should fail.
Start refactoring until the tests pass.
This is NOT-A-NEW thing in software and people have done this and written a lot about this.
Try reading
Working Effectively with Legacy
Code
Refactoring Databases:
Evolutionary Database Design
The techniques explained here are invaluable to sustain any kind of long running IT projects.
Database design is different from application design in this regard.
Very often, client rethinking changes the application completely, but changes little, if anything, in the fundamental underlying data model of the enterprise. The reason for this is that clients tend to think in terms of business processes, but not in terms of fundamental data. Business processing and data processing are tightly coupled. Data storage is less tightly coupled.
In the days of classical database design, designers learned how to exploit this pattern, by dividing their database design into (at least) two layers: logical design and physical design. There are any number of times that a change of business process requires a complete rewrite of the application, and a major rework of the database physical design, but requires few, if any, changes to the logical design.
If your database design didn't separate out the layers like this, it's hard to tell what gets affected and what doesn't. Start with your tables and columns. Ask yourself if any of the changes require removing any column from the table it's in, or require inventing new columns. If the answer is no, you're in luck. Next, look at the constraints placed on the database (things like PRIMARY KEY, FOREIGN KEY, UNIQUE and NOT NULL). These constraints might be tightened or loosened by the client's changes. If not, you're in luck. If you didn't declare any constraints in the database, and chose to do all your integrity protection in application code, you're probably out of luck.
You still have a fair amount of work to do in terms of changing the indexes on the tables, and the way the application works with the data. But you've salvaged part of the investment in the old system.
The application itself is much more vulnerable to client changes in process than the database. If your database design was completely driven by your application design, you may be out of luck.
If it's THAT drastic of a change it might be best to just start over. I've worked on a number of projects that have gone through some drastic changes.
Starting over gives you a chance to use experience learned since the last project and provide a more efficent product.
I would recommend against trying to re-work the old site into the new site, you'll probably spend more time fiddling around changing things than you would have if you had just re-written it.
Best of luck to you !
How would you rework the design in a way that you stay confident that the stuff you're changing does not break other stuff? In short, how would you tackle drastic application design changes efficiently (both DB and code)?
Tests, code complexity/coverage metrics, and a continuous integration system. Run them early and often, so you know which parts are the riskiest and where to start writing.
These will become your safety nets when you have to make potentially problematic changes. If something does break, your CI system will tell you, and you won't have spent weeks down some rabbit hole before you realize there's a problem.
Sometimes you do things better the second time around so just try and stay positive. Plus you will have more domain knowledge this time around.

Resources