Optimize complex scenario in Cucumber - automated-tests

I have been working on an automation project where I have to write Cucumber tests for a search filter. The filter works dynamically and its parameters are nested: each dropdown is populated based on the previous selection. For example, on selecting "Subscribers", the next dropdown offers "Name", "City", "Network". Likewise, on selecting "Service Desk", the subsequent dropdown offers "Status", "Ticket no.", "Assignee". I am using a Scenario Outline as below:
Scenario Outline: As a user, I can search records
  Given I am on search page
  When I search on "<category>" and "<nestedfilter>"
  Then I see records having "<category>" category
  Examples:
    | category     | nestedfilter |
    | Subscribers  | Name         |
    | Subscribers  | City         |
    | Subscribers  | Network      |
    | Service Desk | Status       |
    | Service Desk | Ticket no.   |
    | Service Desk | Assignee     |
The filter could get more complex, as there could be further nested filters based on the previous nested filters.
What I need to know is whether there is a more efficient way to handle this problem, for example passing a data table to the step definition, which I am not too sure about.
Thanks

If you really need the order of your items to be preserved, use a data table instead of a scenario outline.
A scenario outline is a shorthand notation for multiple scenarios. The execution order of those scenarios is not guaranteed, or at least it would be a mistake to assume a specific order. The order of the items in a data table will not change if you use a List as the argument, which makes it a lot safer in your case.

A common mistake with Cucumber is to use Scenario Outline and example tables to do some sort of semi-exhaustive testing. This tends to hide lots of interesting things about the functionality being developed.
I would start writing single features for the searches you are working with and explore what those searches are and why they are important. So if we start with your first one we get ...
Note: all of the following assumes a background step Given I am searching
When I search on subscribers and name
Then I should see records for subscribers
and with the second one
When I search on subscribers and city
Then I should see records for subscribers
Now it becomes clear that there is a serious flaw in these scenarios, as both scenarios are looking for the same result.
So what you are actually testing is that
The subscribers search has name and city filters
A subscriber search should return subscriber results
Now you can refactor and get
When I do a subscriber search
Then I should see city, name, network filters
When I do a subscriber search
Then I should only see subscriber results
note: This is already much more efficient as you have reduced the number of scenarios from 3 to 2, and reduced the number of searches you have to do from 3 to 1.
Now I have no idea if this is what you want to do, but this is what your current scenario is doing. However, because you are using an Outline and Example tables, you can't see this.

The fact that you have a drop-down and nested filters is an implementation detail, which describes how the user is trying to achieve what they want to achieve.
If you think of what you're trying to do as examples of how the system behaves, rather than tests, it might be easier. You're not looking for something exhaustive. You also want your scenarios to be specific, so that you're illustrating them with realistic data and concrete examples. If you would commonly have some typical data available, that's a perfect thing to set up using Background.
So for instance, I might have scenarios like:
Background:
Given I have subscribers
| Name | City | Network | Status | etc.
| Bob | Rome | ABC | Alive | ...
| Sam | Berlin | ABC | Dead | ...
| Sue | Berlin | DEF | Dead | ...
| Ann | Berlin | DEF | Alive | ...
| Jon | London | DEF | Dead | ...
Scenario: First level search
Given I'm on the search page
When I search for Subscribers who are in Rome
Then I should see Bob
But not Sue or Jon.
Scenario: Second level search
Given I'm on the search page
When I search for Subscribers in Berlin on the ABC network
Then I should see Sam
But not Sue or Ann
etc.
The full-system scenarios should be just enough to understand what's going on. Don't use BDD for regression. It can help with that, but scenarios will rapidly become slow and unmaintainable if you try to cover every case. Delegate to integration and unit tests where appropriate (see "the testing pyramid").
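To illustrate the last point, the exhaustive filter combinations could live in plain unit tests rather than in Cucumber. This is only a sketch: `search` is a hypothetical stand-in for whatever code actually performs the filtering, and the records are copied from the Background table above.

```python
# Records copied from the Background table in the scenarios above.
SUBSCRIBERS = [
    {"Name": "Bob", "City": "Rome", "Network": "ABC", "Status": "Alive"},
    {"Name": "Sam", "City": "Berlin", "Network": "ABC", "Status": "Dead"},
    {"Name": "Sue", "City": "Berlin", "Network": "DEF", "Status": "Dead"},
    {"Name": "Ann", "City": "Berlin", "Network": "DEF", "Status": "Alive"},
    {"Name": "Jon", "City": "London", "Network": "DEF", "Status": "Dead"},
]


def search(records, **criteria):
    """Hypothetical filter: keep records matching every given field."""
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in criteria.items())
    ]


def names(records):
    """Helper for readable assertions."""
    return [record["Name"] for record in records]


# First-level and second-level searches, mirroring the two scenarios.
first_level = names(search(SUBSCRIBERS, City="Rome"))
second_level = names(search(SUBSCRIBERS, City="Berlin", Network="ABC"))
```

The Cucumber scenarios then only need one or two examples per level of nesting, while cheap tests like these cover the combinations.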

Preventing cycles in a neo4j graph where relationships are dated

Note: I apologize for not being able to include the images directly in the post, I don't have enough reputation in stackoverflow yet.
I'm brand new to graph databases, but I'm trying to understand if a graph database is suited for my problem. Here we go!
I have a set of users who can relate to each other via a "Parent" relationship (i.e., they can be built into a tree / hierarchy). The "Parent" relationship from one user to another is said to "begin" as of a certain date, and the relationship only "ends" if/when another "Parent" relationship exists between the same users and with a later date.
Every user can have <= 1 parent as of any particular date, and can have >=0 children. I.e., at most one parent at a time, and no limit to the number of children.
I've read blog posts dealing with dated relationships, but they don't seem to address this level of complexity:
https://maxdemarzi.com/2015/08/26/modeling-airline-flights-in-neo4j/
https://maxdemarzi.com/2017/05/24/flight-search-with-neo4j/
My challenge:
For a given set of users with existing "Parent" relationships, determine whether a new "Parent" relationship can be added "as of" a certain date WITHOUT creating a cycle anywhere in the timeline.
To help visualize an example, imagine we have four users Alice, Bob, Carlos, and David.
-----------------------------------------
| User | Date | Parent |
|-----------|---------------|-----------|
| Alice | 09/13/2012 | Bob |
| Alice | 04/01/2021 | David |
| Bob | 01/31/2020 | Carlos |
| Carlos | 02/14/2008 | David |
-----------------------------------------
Here is a (highly abstract) picture representing the current state of the data (time flows to the right):
[Initial state of the data as timeline]
https://i.stack.imgur.com/qdcbL.png
So in this example Alice has Bob as a parent from 9/13/2012 until 4/1/2021, at which point she begins to have David as a parent. Bob has no parent until 1/31/2020, at which point he gets Carlos as a parent. Etc.
I need to be able to determine whether an update/insert will create a cycle in the "parent" hierarchy at any point in time. So, for example, I'd like to be able to determine that it would be INVALID to set Carlos's parent to be Alice as of 10/22/2020 because then there would be a cycle in the hierarchy for the period between 10/22/2020 and 4/1/2021 (i.e., Alice-->Bob-->Carlos-->Alice). To help visualize it:
[Invalid addition creates a cycle in the timeline]
https://i.stack.imgur.com/xA2vv.png
But I also need to be able to determine that it would be VALID to set Carlos's parent to Alice as of 10/22/2021, as drawn here:
[Valid addition with no cycles in timeline]
https://i.stack.imgur.com/9u0P4.png
In terms of modeling the data, I started by thinking of two different models.
First:
I tried having my only nodes be "Users," and having my "Parent" relationships include a date in the relationship type. Since the range of dates is huge and the dates themselves are not known in advance, I'm not sure this is a good idea but decided to give it a shot anyway.
Diagram:
[graph diagram with dated relationships]
https://i.stack.imgur.com/ZuPDR.png
Cypher:
CREATE (n0:User {name: "Alice"})-[:P_2012_09_13]->(:User {name: "Bob"})-[:P_2020_01_31]->(:User {name: "Carlos"})-[:P_2008_02_14]->(:User {name: "David"})<-[:P_2021_04_01]-(n0)
Second:
I tried creating "UserDay" nodes to capture the date element, thereby reducing the range of relationship types to only two (i.e., a 1:many "HAS" relationship from User to UserDay, then a 1:1 "P" relationship from UserDay to User).
Diagram:
[graph diagram with user-days]
https://i.stack.imgur.com/W60bp.png
Cypher:
CREATE (n8:UserDay {date: "2021-04-01"})<-[:HAS]-(:User {name: "Alice"})-[:HAS]->(:UserDay {date: "2012-09-13"})-[:P]->(:User {name: "Bob"})-[:HAS]->(:UserDay {date: "2020-01-31"})-[:P]->(:User {name: "Carlos"})-[:HAS]->(:UserDay {date: "2008-02-14"})-[:P]->(:User {name: "David"})<-[:P]-(n8)
Given a source User, destination User, and start date, I need to be able to determine if a cycle would be created in the hierarchy for any time in the timeline.
Carlos, Alice, 10/22/2020 ----> should be invalid
Carlos, Alice, 10/22/2021 ----> should be valid
I've been spinning my wheels reading through neo4j docs and googling, and finally decided to ask my very first question on stackoverflow! Please let me know if you have any questions or if anything I've said is unclear.
Thanks in advance!
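For what it's worth, the check itself can be sketched independently of either graph model. The following Python is an illustration under assumptions, not a Neo4j solution. It assumes a user's parent on a given day comes from that user's most recent record on or before that day, and that the hierarchy only changes on record dates, so it is enough to test the new date plus every existing record date that falls while the new relationship is in force. The function names are made up for this example.

```python
from datetime import date

# (child, start_date, parent) rows from the table in the question.
RECORDS = [
    ("Alice", date(2012, 9, 13), "Bob"),
    ("Alice", date(2021, 4, 1), "David"),
    ("Bob", date(2020, 1, 31), "Carlos"),
    ("Carlos", date(2008, 2, 14), "David"),
]


def parent_at(records, user, when):
    """Parent of `user` on `when`: latest record starting on or before it."""
    best = None
    for child, start, parent in records:
        if child == user and start <= when:
            if best is None or start > best[0]:
                best = (start, parent)
    return best[1] if best else None


def creates_cycle(records, child, new_parent, start):
    """True if child -> new_parent from `start` would ever close a loop."""
    # The new relationship lasts until the child's next existing record.
    later = [s for c, s, _ in records if c == child and s > start]
    end = min(later) if later else date.max
    # The hierarchy is constant between record dates, so check each one.
    checkpoints = {start} | {s for _, s, _ in records if start < s < end}
    for when in sorted(checkpoints):
        seen = set()
        node = new_parent
        # Walk up from the proposed parent; reaching the child is a cycle.
        while node is not None and node not in seen:
            if node == child:
                return True
            seen.add(node)
            node = parent_at(records, node, when)
    return False
```

With the table above, `creates_cycle(RECORDS, "Carlos", "Alice", date(2020, 10, 22))` returns True and the same call with `date(2021, 10, 22)` returns False, matching the two expected outcomes. Whether this walk is better expressed as a Cypher path query over one of the two models is a separate question.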

BigQuery to Data Studio : Show reliable COUNT DISTINCT regardless of the selected period

In my BigQuery project I store event data integrated from Firebase. The granularity and number of dimensions are such that trying to present raw data in Data Studio quickly makes the report VERY slow (1-2 min per page/interaction).
I then started to think how I could create pre-aggregated tables in BigQuery to speed everything up, but quickly realised COUNT DISTINCT metrics would be a problem with this approach.
Let me explain:
SELECT user, date
FROM UNNEST([
STRUCT("Adam" AS user, "20190923" AS date),
("Bob", "20190923"),
("Carl", "20190923"),
("Adam", "20190924"),
("Bob", "20190924"),
("Adam", "20190925"),
("Carl", "20190925"),
("Bob", "20190926")
]) AS website_visits;
+------+----------+
| User | Date |
+------+----------+
| Adam | 20190923 |
| Bob | 20190923 |
| Carl | 20190923 |
| Adam | 20190924 |
| Bob | 20190924 |
| Adam | 20190925 |
| Carl | 20190925 |
| Bob | 20190926 |
+------+----------+
The above is a table of website visits.
Clearly, creating a pre-aggregated table like
SELECT date, COUNT(DISTINCT user) FROM website_visits GROUP BY date
has the limitation that the count cannot be aggregated further (let alone dynamically) to get a total: a SUM over the days would return 8 unique users, which is not correct, as there are only 3 unique users.
In BigQuery, this is fixed by using HLL_COUNT, which despite the approximation works ok for me.
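The difference can be reproduced in a few lines of Python using the same rows. Exact sets stand in here for the HLL sketches: per-day sets can be merged for any date range, whereas per-day counts cannot. This only illustrates why a mergeable intermediate (which is what HLL_COUNT.INIT and HLL_COUNT.MERGE provide in BigQuery) helps. It says nothing about what Data Studio supports, and `unique_users` is a name made up for the example.

```python
from collections import defaultdict

# Same rows as the website_visits table above.
VISITS = [
    ("Adam", "20190923"), ("Bob", "20190923"), ("Carl", "20190923"),
    ("Adam", "20190924"), ("Bob", "20190924"),
    ("Adam", "20190925"), ("Carl", "20190925"),
    ("Bob", "20190926"),
]

# Pre-aggregate per day, but keep the set of users instead of a count.
users_by_day = defaultdict(set)
for user, day in VISITS:
    users_by_day[day].add(user)

# Summing daily distinct counts double-counts returning users.
sum_of_daily_counts = sum(len(users) for users in users_by_day.values())


def unique_users(first_day, last_day):
    """Merge the per-day sets for an arbitrary date range, then count."""
    merged = set()
    for day, users in users_by_day.items():
        # YYYYMMDD strings compare correctly as plain strings.
        if first_day <= day <= last_day:
            merged |= users
    return len(merged)


print(sum_of_daily_counts)                  # 8
print(unique_users("20190923", "20190926"))  # 3
```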
Now to the big question:
How can I do the same so that the result is displayable in Data Studio?
HLL_COUNT.EXTRACT is not available as a function there, and in the reporting I always have to keep in mind that the date range is set by the user however (s)he likes, so it's not possible to store a pre-aggregated result for ALL cases...
EDIT 1: APPROX_COUNT_DISTINCT
As per answer from Bobbylank, I tried to use APPROX_COUNT_DISTINCT.
However I found that this just seems to move the issue down the line. My fault for not explaining what's over there.
Although performance is acceptable, it does not seem possible to blend a data source with this calculated metric.
Example: After displaying the amount of unique users in the selected period (which now works), I'm also trying to display Average Revenue Per User (ARPU) in Data Studio like Firebase does.
To do this, I have to SUM(REVENUE) / APPROX_COUNT_DISTINCT(USER)
Clearly, REVENUE works ok with pre-aggregation and is available in the raw data. I tried then to blend the raw data with a table containing just user visits. However APPROX_COUNT_DISTINCT can't be used in the blended data definition as calculated metrics are not allowed.
Even trying to use the USER field as a metric with Count Distinct aggregation, despite returning the correct figures when showing revenue and user count separately, when I try to divide them the problem becomes aggregation (apply SUM or AVG to the field and basically the result will be AVG(REVENUE/USERS) for each day).
I also then tried to store REVENUE directly in the visits table, but was reminded by Data Studio that I can't mix dimensions and metrics in a calculated field.
APPROX_COUNT_DISTINCT might be more performance friendly for you?
https://support.google.com/datastudio/answer/9189108?hl=en
Otherwise the only way I can think of would be to pre-calculate several metrics (e.g. unique users on that day, 7-day cumulative, 14-day, etc.), as your customers require, for each single day.
Or you could provide a 2 page report with both of these methods with the caveat that the first can be used over a time period but will be much slower?
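The pre-calculation idea could look roughly like this. It is a sketch in Python rather than SQL, using the sample rows from the question, and `rolling_unique_users` is a hypothetical name. For each day it counts distinct users over a trailing window, so the result is one row per day that a report can display without further aggregation.

```python
from datetime import datetime, timedelta

# Sample rows from the question (user, YYYYMMDD).
VISITS = [
    ("Adam", "20190923"), ("Bob", "20190923"), ("Carl", "20190923"),
    ("Adam", "20190924"), ("Bob", "20190924"),
    ("Adam", "20190925"), ("Carl", "20190925"),
    ("Bob", "20190926"),
]


def rolling_unique_users(visits, window_days):
    """Distinct users in the `window_days` days ending on each visit day."""
    parsed = [
        (user, datetime.strptime(day, "%Y%m%d").date()) for user, day in visits
    ]
    result = {}
    for day in sorted({visit_day for _, visit_day in parsed}):
        window_start = day - timedelta(days=window_days - 1)
        users = {u for u, d in parsed if window_start <= d <= day}
        result[day.strftime("%Y%m%d")] = len(users)
    return result


daily = rolling_unique_users(VISITS, 1)    # plain per-day uniques
two_day = rolling_unique_users(VISITS, 2)  # trailing 2-day uniques
```

The trade-off is the one already mentioned: each window length is a separate metric, so only the windows chosen in advance are available, not arbitrary date ranges.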

Naming Cucumber's Data Table

I am creating test cases for forms that can contain over 50 parameters, some of which only show up when a certain set of questions is answered in a specific way.
The data tables were getting very long so I broke them into multiple data tables, each for a specific section of form.
I don't want to add every heading in the step so I want to use the data table's name instead.
Instead of:
Scenario:
.
.
.
When I fill in <title> <first name> <surname> ...
|title|first name|surname|...|
.
.
.
I want:
When I fill in <personal details>
And "personal details":
|title|first name|surname|...|
.
.
.
Is it possible to add and use the data table's name as the placeholder?
Note: I am working with Behave and Python.
It's definitely not possible using the <> syntax.
If you don't have many rows and your main concern is the readability of very wide tables then one option might be "transposing" the table like this:
When I fill in the personal details
| Field | Value |
| Title | Prof. |
| Surname | Einstein |
| ... | |
Another option could be to define the recurring set of properties in the Background like this:
Background:
Given the personal details for 'minimal personal details'
| Surname | First name |
| Doe | John |
And the personal details for 'insufficient personal details'
| First name |
| Jack |
And the personal details for 'all personal details'
...
...
When I fill in personal details using 'insufficient personal details'
The bindings of the background register their data in the context and the 'when' binding uses the data from the context.
In either case, you'll need a binding that will tolerate missing properties and catch unknown ones.
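As a rough sketch of such a binding in Python (the question mentions Behave), the helpers below are plain functions so they can be tried without a test runner. The field names, `KNOWN_FIELDS`, and all three function names are assumptions made up for this example. In a Behave step the rows would come from `context.table`, and the registry could be kept on `context` so that Background steps and the 'when' step share it.

```python
# Hypothetical list of fields the form understands.
KNOWN_FIELDS = {"Title", "First name", "Surname"}


def details_from_rows(rows, known_fields=KNOWN_FIELDS):
    """Build a dict from (field, value) rows of a transposed table.

    Missing fields are tolerated (they are simply absent from the result);
    unknown fields are rejected so typos in the feature file surface early.
    """
    details = {}
    for field, value in rows:
        if field not in known_fields:
            raise ValueError(f"Unknown field in table: {field!r}")
        details[field] = value
    return details


def register_details(registry, name, rows):
    """Background binding: store a named set of details for later steps."""
    registry[name] = details_from_rows(rows)


def details_named(registry, name):
    """'When' binding: look up a set of details registered earlier."""
    if name not in registry:
        raise KeyError(f"No personal details registered as {name!r}")
    return registry[name]


registry = {}
register_details(registry, "minimal personal details",
                 [("Surname", "Doe"), ("First name", "John")])
register_details(registry, "insufficient personal details",
                 [("First name", "Jack")])
```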
I am not sure what you are asking, but if you are using the same details in different scenarios then it is better to use Cucumber's Background option, so that those steps run before every scenario.
Tables in Gherkin are a view on the real data (meaning a subset of columns and the rows that are of interest). For readability (and so that somebody else understands what you are doing), you should have at most 7 (plus or minus 2) columns. Perhaps the remaining data can be injected from configuration files or a config-profile database? You basically use the provided table columns as keys to select the configuration row and retrieve the remaining data from your configuration profiles.

Concatenate Google Analytics results to ignore country code in URL

Our website automatically detects a user's region. Though the site structure remains the same across all regions, the content on the page can vary.
As such, URLs are formatted like so: http://website.com/XX/pagename with XX=country code (e.g. GB, US, IT, etc.)
On Google Analytics, I want to see all of the different country versions of a single page contained as a single result.
For example, if I look at our top pages for January, I see:
| URL | page views |
|-------------------------|------------|
| website.com/US/page1 | 100 |
| website.com/GB/homepage | 60 |
| website.com/US/homepage | 40 |
| website.com/GB/page1 | 20 |
But what I want to see is:
| URL | page views |
|----------------------|------------|
| website.com/page1 | 120 |
| website.com/homepage | 100 |
Wherein the same URL (ignoring country code) is concatenated into one figure.
Is such a thing possible?
My end game here is a desire to see what our most popular pages are across the site in total, regardless of which country the user is browsing from.
Thanks!
One option is to use an advanced filter in GA so that you take something like website.com/US/page1 and replace it with website.com/page1. This only works on data moving forward from when the filter is applied, and does not change historical data, and cannot be undone once applied. This is another reason why it's always a good idea to have a Raw view which is unfiltered.
For the Advanced Filter, you need a rule that looks for the pattern /{any two letters}/{anything else} and outputs just the /{anything else} part.
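Since the filter cannot be undone, it may be worth trying the pattern out first. The Python below is a sketch: the regular expression is an assumption about what "any two letters, then anything else" would look like, not the exact filter setting, and `strip_country` is a made-up name. It rewrites the paths from the question's table and sums the page views.

```python
import re
from collections import Counter

# Assumed pattern: a two-letter first segment followed by the rest of the path.
COUNTRY_PREFIX = re.compile(r"^/[A-Za-z]{2}(/.*)$")


def strip_country(path):
    """Drop a leading two-letter segment; leave other paths untouched."""
    return COUNTRY_PREFIX.sub(r"\1", path)


# Page views from the table in the question (host omitted).
views = {
    "/US/page1": 100,
    "/GB/homepage": 60,
    "/US/homepage": 40,
    "/GB/page1": 20,
}

totals = Counter()
for path, count in views.items():
    totals[strip_country(path)] += count

print(totals.most_common())  # [('/page1', 120), ('/homepage', 100)]
```

One caveat with a pattern this loose: any page whose first path segment happens to be two letters would be rewritten as well, so listing the actual country codes may be safer.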

How do you know if your data is displayed semantically correct as a table or something else?

I have information from a database. It is one row. The page is old, and tables were used to display the data. It works but is ugly and hard to maintain, so I'm "fixing" it. However, I don't know whether the data is considered tabular or not. It comes from database tables, but I don't know what it means to be semantically correct in using a table for display.
I have several sections like this on a web page. They are all calls to the same database, returning different sets of data, sectioned off on the page.
For example, one set is a general set of information, lastname, firstname, middle, other stats...
Next section might be address, etc.
A while back when I did asp.net forms there was a list view, I think that was similar to what I need to create (I'm using just straight html and a scripting language, no controls).
How should I be displaying the information to be semantically correct?
edit: It is one person that does not repeat.
edit: A single record, but displayed on the same page, just various SELECT statements to get that data all on the same page.
If it's tabular data (i.e. multiple records displayed underneath each other) then a <table> would probably be the best choice. If it's a view of a single record, maybe even aggregating data from multiple tables then the <table> shouldn't be your first choice.
Tabular data:
--------------------------------
| ID | Name | Description | Date |
--------------------------------
| 01 | ... | ... | ... |
| 02 | ... | ... | ... |
Not tabular data:
ID: ...
Name: ...
Description: .................
.................
.....
Date: ....
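For a single record like the second layout, one commonly suggested alternative is a description list (`<dl>`), which pairs each label with its value. The answer above does not name a specific element, so treat this as one option among several. A minimal sketch in Python, standing in for whichever scripting language generates the page; `render_record` and the sample field names are hypothetical:

```python
from html import escape


def render_record(record):
    """Render one record as an HTML description list (label/value pairs)."""
    lines = ["<dl>"]
    for label, value in record.items():
        # Escape both parts so database content cannot break the markup.
        lines.append(f"  <dt>{escape(label)}</dt>")
        lines.append(f"  <dd>{escape(str(value))}</dd>")
    lines.append("</dl>")
    return "\n".join(lines)


# Hypothetical single-person record, one section of the page.
person = {"Last name": "Doe", "First name": "John", "Middle": "Q"}
markup = render_record(person)
```

Each section of the page (general information, address, and so on) would get its own list, while anything with genuinely repeating rows stays in a `<table>`.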
