Wondering if there are good examples or suggestions for how to handle steps that require manual review in a database-based scientific data pipeline (DataJoint). For example, we'd like to handle the pre-processing and denoising/demixing of our neuronal calcium imaging data through the automated pipeline, but each video and each cell then requires manual review before being entered into the database for further analysis. What is the best practice for handling such steps? Add a manual table so that only data that pass review reach downstream pipeline stages? Keep the steps before manual review separate from the rest of the pipeline (in their own schema?)? Thanks!
Automated curation
First, you could invoke an interactive GUI as part of the make method of the particular table that requires manual intervention. It would present the computed results for human review, curation, and correction.
A separate manual curation table
Second, you can define a manual table to support manual review/curation. For example, the Curation table in the Calcium Imaging element follows this pattern: https://github.com/datajoint/element-calcium-imaging
This is a question about programming style and patterns when it comes to writing tests for complex systems built with Firebase/Firestore. I'm writing a web app using Firebase, TypeScript, Angular, Firestore, etc.
Objective:
I have written basic security rules tests for my users collection. I'll also be testing the function that writes the document, and e2e testing that the document is written when a new user signs up. So far so good.
The tests are clean so far: I manually define a few user objects, write them to the database before the tests with beforeEach(), and run the tests. The tests depend on me knowing what data I wrote in the first place (what the document IDs were, what the field values were, etc.), and then I check that certain operations pass and certain operations fail, depending on the custom claims provided. Similarly, with the functions tests and e2e tests, I'll be checking whether the correct data was written, generated, etc.
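(For concreteness, the seeding looks roughly like the sketch below; the testEnv handle is assumed to come from initializeTestEnvironment() in @firebase/rules-unit-testing, and the document path and payload are placeholders for my manually defined objects.)

// Seed a known document before each test, bypassing the rules under test.
// fakeUserPayload is one of the manually defined objects described above.
beforeEach(async () => {
  await testEnv.withSecurityRulesDisabled(async (context) => {
    await context.firestore().doc('users/alice').set(fakeUserPayload);
  });
});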
The next step would be to test the chat functionality. Here's where I run into code duplication issues.
Issue:
So let's say I want to test the chat functionality or the transaction functionality. For these modules to be tested, the database needs to contain fake user data, transaction data, test data, etc., and, furthermore, within the tests I need access to the fields, document IDs, etc. that were written to the database, so I can reach the right documents in the tests.
For example, whether or not a chat message can be written to Firestore depends on whether certain fields exist on the user document.
This would require me to manually define all the objects I plan to write to the database in the same file as the tests themselves, so I have access to what I wrote. As I test more complex and dependent modules of the system, and since the tests for each module live in a separate file, I either have to write out each object by hand or require it from another test file and write it. Each document has a payload and an ID that I need to keep track of, and so do the fake user token objects I have to pass to Firestore (or the actual auth user records I have to import for online testing). This means a whole lot of boilerplate and duplication, simply writing objects like this in all of my files:
const fakeUserPayload: User = {
  handle: 'username',
  email: faker.internet.email(),
  // ...etc.
};
// And so on and on for all test users, chat docs, transaction docs, etc.
Potential Solution: there are a few potential solutions I came up with, but none of them seems to solve the problem.
For example, I thought I would write a module that simply populates the Firestore database at the start of each test. The module would have a userPayloadFactory and other loops (using the faker module) to automatically populate the DB with fake data.
Problem: if I did this, I wouldn't have access to the document IDs and field data in my tests, since they're automatically generated. I thought maybe I could populate Firestore with fake data, then use an administrative DB connection to read the fake user documents and their IDs back into the tests, and use this data to conduct the tests. For example, I would find the user ID, then generate a chat document and test for correct data / success. But it seems incredibly messy to write data in one module and read it back in another, especially since most tests require a specific document to be written in order to test certain cases/scenarios. That makes auto-populated mock data useless, so we're back to square one, where I have to manually define and write out a large number of fake objects in order to test rules, functions and functionality.
Potential Solution: I could keep auth and Firestore data in a JSON backup file (so they remain static) and import them into the database with a shell command before each test suite. However, this has its own issues: it's not easy to dynamically generate new test cases or edge cases, and it's also difficult to continually re-export and update the JSON backup files as the project grows.
What is a better way to structure and write my tests so that I can automatically generate the documents and payloads I need, while keeping control of, and access to, what gets written?
I'm hoping there's some kind of factory or pattern that can make this easier, more scalable, more consistent, and more robust.
You're asking a really big question; writing tests for large environments is a complex task, and even more so when they're coupled to database state. I'll try to answer to the best of my knowledge, so take my words with a grain of salt.
I believe you're dealing with two similar yet different concerns: the automatic creation of mock data, and edge cases that require a very specific document setup. The two are tightly coupled within the tests themselves, since you need both kinds of data to run them, but their requirements differ, and their creation should take that into account. Let's look at your potential solutions from that perspective.
The JSON backup provides a static, consistent dataset that lets you repeat the tests over time, certain that the environment hasn't changed, and it's a good candidate for the edge-case problem. Its downside is that it's hardly maintainable: any modification of an object to accommodate changes in TestA may break the expectations of TestB, which also relies on it, and it's almost guaranteed you will lose track of these nuances at some point. You can add new objects to accommodate code and test changes, but that can lead to a combinatorial explosion of objects to take care of as your project grows. Finally, JSON files are not the way to go if you are going to need dynamically generated data.
The factory method is a great option for creating arbitrary mock data, since fewer restrictions are placed on it, so writing a generator seems a good idea. You disliked this on the grounds that you need to know your data while running the tests, but I think that's solvable: your test can load the factory module, create the data, and keep it in memory (or on disk) in addition to committing the changes to Firestore. There's no need to read the data back from the database.
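For instance, here's a minimal sketch of that idea in TypeScript. To be clear, the User shape, the verified field, the makeUserPayload/seedUser names, and the @faker-js/faker and firebase-admin packages are all assumptions for illustration, not your actual code:

import { faker } from '@faker-js/faker';
import type { Firestore } from 'firebase-admin/firestore';

interface User {
  handle: string;
  email: string;
  verified: boolean; // hypothetical field that gates chat writes
}

// Produce a payload with fake defaults; `overrides` lets each test pin
// exactly the fields it cares about.
function makeUserPayload(overrides: Partial<User> = {}): User {
  return {
    handle: faker.internet.userName(),
    email: faker.internet.email(),
    verified: true,
    ...overrides,
  };
}

// Write the payload and hand back both the generated id and the payload,
// so the test keeps an in-memory record of everything it committed and
// never needs to read it back from Firestore.
async function seedUser(
  db: Firestore,
  overrides: Partial<User> = {}
): Promise<{ id: string; payload: User }> {
  const payload = makeUserPayload(overrides);
  const ref = await db.collection('users').add(payload);
  return { id: ref.id, payload };
}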
Your other concern was the corner-case documents, which is trickier because you need very specifically shaped data. You could handcraft the documents yourself, but then you have a poorly scalable solution. The alternative is to look for constraints/invariants in the shape of the edge-case documents that you can abstract into factory methods. The worst scenario is that some edge cases share no similarity with the rest, and you need to write a dedicated method for each of them; I wouldn't consider that a downside, as it improves the modularity and maintainability of the factory.
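To make that concrete with the hypothetical seedUser sketch above, an edge case like "chat writes should fail because a field is unset on the user document" becomes a one-line override rather than a hand-written document:

// Hypothetical edge case: seed a user whose document should not permit
// chat writes, then assert the security rules reject the message.
// `db` is the Firestore handle from your test harness.
const { id: blockedUserId } = await seedUser(db, { verified: false });
// ...attempt to create a chat doc for blockedUserId and expect failure.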
Overall, I would stick with the factory pattern: it offers established techniques for following the DRY principle by isolating the creation of distinct objects, it decouples data creation from test execution, and it helps avoid disruptive breakage as the tests evolve with the project.
With that being said, a little research led me to this page about the Builder pattern, which you may find interesting. This thread about code duplication in tests might also be of interest. And finally, note that Firebase has some testing functionality, which can be found here.
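For what it's worth, a builder over the same hypothetical factory could look like the sketch below; chainable setters pin the fields a test cares about, and build() fills in the rest with fakes (this reuses the makeUserPayload helper assumed above):

// Minimal builder sketch over the same hypothetical User shape.
class UserBuilder {
  private overrides: Partial<User> = {};

  withHandle(handle: string): this {
    this.overrides.handle = handle;
    return this;
  }

  withVerified(verified: boolean): this {
    this.overrides.verified = verified;
    return this;
  }

  build(): User {
    return makeUserPayload(this.overrides);
  }
}

// e.g. const payload = new UserBuilder().withVerified(false).build();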
Hope this helps.
I have integer values as test cases (the IDs of different users), and I don't want to hardcode them; I have a method that gets the users from an API. The specs say that dynamic test cases are not implemented yet. Is it possible to load the test cases before the test is executed?
We have used the term "dynamic test cases" to mean that the tests are not created before the run but during it. Specifically, the test cases can change while the test is running.
It doesn't sound like that's what you need. If I understand correctly, you want to get the user IDs programmatically at the time the tests are created. You can easily do this by using TestCaseSourceAttribute on a method that uses your API to get the user IDs.
I need to write end-to-end tests for a web API that cover various scenarios. However, the model logic is pretty heavy: if the tests for a case/model take around 100 lines of code, the setup behind them most likely takes 300-500 more, with lots of initialization, passing of automatically generated values, etc.
In general I tend to generate the data I test against via the models, but this time I'm more inclined to use database scripts to populate the database directly according to my needs, or simply to restore a 'predefined' state. What would be the disadvantages of this approach? The software being tested here is somewhat mission-critical.
I have a desktop application built in Flex using PureMVC multi-core, with SQLite as the back end. Now I want to write integration tests. The proxy layer makes database calls using the asynchronous methods of SQLConnection, and the result handler sends a notification. I want to test that the expected values were modified in the tables. Any ideas how this can be done?
If you're asking for ways to do unit testing in Flex, I suggest checking out FlexUnit as it is the most commonly used unit testing framework for Flex.
From a conceptual standpoint, you basically need to write a method to retrieve data from the database, either as part of your unit tests or by calling your actual encapsulated application classes. Many people use Data Access Objects and Data Gateways for this purpose.
I suggest checking before and after each test: retrieve data from the database and check its value, run your test, then retrieve the data again to check the updated values. What the database call entails depends on the type of test; you may want to check the number of records in a table or the values of a specific record.
Flex's asynchronous nature makes this a little tricky.
I never thought testing database values from a UI was an area where unit testing shines, but I understand why it is necessary.