How are requests and tests organized in Postman? - automated-tests

I am investigating Postman but I can't find an optimal way to organize the requests. Let me give an example:
Let's imagine I'm working on testing two microservices and my goal is to automate the tests and create a report that informs us of the outcome. Each microservice has 10 possible requests to perform (GET, POST, etc.).
The ideal solution would be to have a collection containing two folders (one per microservice), each folder with its 10 requests, and to be able to run that collection from Newman with a CSV/JSON data file, generating a final report (with tools like htmlextra).
The problem comes with data and iterations. Regarding iterations, some requests, like certain GETs, need only one execution, but others, like certain POSTs, need several executions to check the status (whether it returns a 200, a 400, etc.). Regarding data, with 10 requests in the same collection, and some of them occupying quite a few columns of the CSV, the data file becomes unreadable and hard to maintain.
So what I get when I run the collection is an unreadable data file and many iterations on requests that don't need them. As an alternative, I could create one collection per request, with a data file for each collection, but then I could not get a report as a whole when executing from Newman, since running several collections means running several Newman commands, each generating its own report file. I would end up with 20 HTML files as reports, one per request (also not readable).
Sorry if I am pointing in the wrong direction as I have no experience with the tool.
On the web I only see examples of reports and basic collections, but I'm left wanting to see something more 'real'.
Thanks a lot!!!
Summary of my notes:
Only one data file can be assigned to the collection runner.
Many requests carry a lot of data, about 10 columns each, so a single data file shared by several requests becomes unreadable. Ideally there would be one file per request.
This forces us to create a collection for each request, since only one file per collection is feasible.
Newman can only run one collection per command; to launch more, you have to repeat the command, which forces one report per collection. With one collection per request, we would get one report for each request instead of a single report with the status of all of them, so the point is lost.
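One way to live with those constraints is to keep one data file per collection (or per request group), but drive Newman from a small wrapper script: each run gets its own CSV and its own htmlextra report, and the wrapper collects the exit codes into a single overview. A rough sketch in Python, assuming newman and newman-reporter-htmlextra are installed; the collection names, CSV paths, and report directory are made up for illustration:

import json
import subprocess
from pathlib import Path

# Hypothetical layout: one collection + one CSV per microservice (or per request group).
RUNS = [
    ("collections/service-a.postman_collection.json", "data/service-a.csv"),
    ("collections/service-b.postman_collection.json", "data/service-b.csv"),
]
REPORT_DIR = Path("reports")
REPORT_DIR.mkdir(exist_ok=True)

summary = {}
for collection, data_file in RUNS:
    name = Path(collection).stem
    cmd = [
        "newman", "run", collection,
        "-d", data_file,                    # iteration data for this run only
        "-r", "htmlextra",                  # detailed per-run report
        "--reporter-htmlextra-export", str(REPORT_DIR / f"{name}.html"),
    ]
    result = subprocess.run(cmd)  # newman exits non-zero when a run has failures
    summary[name] = "passed" if result.returncode == 0 else "failed"

# One overall pass/fail overview next to the detailed htmlextra reports.
(REPORT_DIR / "summary.json").write_text(json.dumps(summary, indent=2))
print(json.dumps(summary, indent=2))

You still get one detailed htmlextra report per run, but the summary gives a single place to see which service failed. The same wrapper also works with a single collection split into folders, since Newman's --folder option can restrict a run to one folder (and therefore to that folder's own data file).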

Related

PAW duplicated documents retain query output between documents, is this supposed to happen?

I have a PAW document that I duplicated into two different documents, app A and app B. The files use environments and variables. When I have both open, if I do a REST call in app A and then switch to app B, the results from the call in A are displayed in the results output area in B. If I refresh the REST call in app B, I then see the results in the output of app A. My guess is that there is a setting pointing to a file location that is updated when I do a REST call, and since the documents are duplicated they share this location. If this is the case, can I update this file location?
This happens because responses (HTTP exchanges) are stored in an independent location, in order to keep your actual files clean of the request/response history. For that, we reference responses by a unique identifier of the request that generated them. Duplicating files manually tricks this system...
As a quick workaround, I suggest that in one of the documents, you group all requests together, duplicate the group, and then delete the original group. This will assign new identifiers to requests, and fix your issue. (It will also break links between requests, used in Request/Response dynamic values...).
A quick screencast to explain this: http://cl.ly/2S3c46122k3s

Exporting all Marketo Leads in a CSV?

I am trying to export all of my leads from Marketo (we have over 20M) into a CSV file, but there is a 10k row limit per CSV export.
Is there any other way that I can export a CSV file with more than 10k rows? I tried searching for various data loader tools on Marketo Launchpoint but couldn't find one that would work.
Have you considered using the API? It may not be practical unless you have a developer on your team (I'm a programmer).
Marketo Lead API
If your leads are in Salesforce and Marketo/Salesforce are in parity, then instead of exporting all your leads, do a sync from Salesforce to the new MA tool (if you are switching). It's a cleaner, easier sync.
For important campaigns etc, you can create smart lists and export those.
There is no 10k row limit for exporting Leads from a list. However, there is a practical limit, especially if you choose to export all columns (instead of only the visible columns). I would generally advise exporting a maximum of 200,000-300,000 leads per list, so you'd need to create multiple Lists.
As Michael mentioned, the API is also a good option. I would still advise to create multiple Lists, so you can run multiple processes in parallel, which will speed things up. You will need to look at your daily API quota: the default is either 10,000 or 50,000. 10,000 API calls allow you to download 3 million Leads (batch size 300).
I am trying out Data Loader for Marketo on Marketo Launchpoint to export my lead and activity data to my local database. Although it cannot transfer Marketo data to a CSV file directly, you can download leads to your local database and then export them to get a CSV file. For reference, we have 100K leads and 1 billion activity records.
You might have to run it multiple times for 20M leads, but the tool is quite easy and convenient to use, so maybe it's worth a try.
There are 4 steps to get bulk leads from Marketo (a rough sketch follows the documentation link below):
1. Creating a Job
2. Enqueuing the Export Lead Job
3. Polling the Job Status
4. Retrieving Your Data
http://developers.marketo.com/rest-api/bulk-extract/bulk-lead-extract/
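For illustration, here is a rough sketch of those four steps in Python against the Bulk Lead Extract REST endpoints described at the link above. The Munchkin host, access token, field list, and date filter are placeholders, and the response shapes are assumed to match that documentation:

import time
import requests

BASE = "https://<munchkin-id>.mktorest.com"            # placeholder REST host
HEADERS = {"Authorization": "Bearer <access-token>"}   # token from the identity endpoint

# 1. Create the export job (fields and filter are illustrative).
create = requests.post(
    f"{BASE}/bulk/v1/leads/export/create.json",
    headers=HEADERS,
    json={
        "fields": ["firstName", "lastName", "email"],
        "filter": {"createdAt": {"startAt": "2023-01-01T00:00:00Z",
                                 "endAt": "2023-01-31T00:00:00Z"}},
    },
).json()
export_id = create["result"][0]["exportId"]

# 2. Enqueue the job.
requests.post(f"{BASE}/bulk/v1/leads/export/{export_id}/enqueue.json", headers=HEADERS)

# 3. Poll until the job has completed.
while True:
    status = requests.get(
        f"{BASE}/bulk/v1/leads/export/{export_id}/status.json", headers=HEADERS
    ).json()["result"][0]["status"]
    if status == "Completed":
        break
    time.sleep(60)

# 4. Retrieve the data as CSV.
csv_bytes = requests.get(
    f"{BASE}/bulk/v1/leads/export/{export_id}/file.json", headers=HEADERS
).content
with open("leads.csv", "wb") as f:
    f.write(csv_bytes)

For 20M+ leads you would typically loop this over several date windows or smart lists, which also helps you stay inside the daily API quota mentioned above.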

Loading Bulk data in Firebase

I am trying to use the set API to set an object in Firebase. The object is fairly large; the serialized JSON is 2.6 MB in size. The root node has around 90 children, and in all there are around 10,000 nodes in the JSON tree.
The set api seems to hang and does not call the callback.
It also seems to cause problems with the firebase instance.
Any ideas on how to work around this?
Since this is a commonly requested feature, I'll go ahead and merge Robert and Puf's comments into an answer for others.
There are some tools available to help with big data imports, like firebase-streaming-import. What they do internally can also be engineered fairly easily for the do-it-yourselfer:
1) Get a list of keys without downloading all the data, using a GET request and shallow=true. Possibly do this recursively depending on the data structure and dynamics of the app.
2) In some sort of throttled fashion, upload the "chunks" to Firebase using PUT requests or the API's set() method.
The critical things to keep in mind here are that the number of bytes in a request and the frequency of requests will have an impact on performance for others using the application, and will also count against your bandwidth.
A good rule of thumb is that you don't want to do more than ~100 writes per second during your import, preferably lower than 20 to maximize your realtime speeds for other users, and that you should keep the data chunks in low MBs--certainly not GBs per chunk. Keep in mind that all of this has to go over the internets.
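A rough sketch of that do-it-yourself approach against the Firebase REST API, in Python. The database URL, auth credential, node path, and local file name are placeholders, and the pacing aims to stay well under the write-rate guidance above:

import json
import time
import requests

DB = "https://<your-db>.firebaseio.com"        # placeholder database URL
AUTH = {"auth": "<database-secret-or-token>"}  # placeholder credential

# The large object you want to import, e.g. the 2.6 MB JSON loaded from disk.
with open("big_object.json") as f:
    data = json.load(f)

# Upload one top-level child at a time instead of one giant set(),
# throttled so other clients stay responsive.
for key, child in data.items():
    requests.put(f"{DB}/bigNode/{key}.json", params=AUTH, json=child)
    time.sleep(0.1)  # roughly 10 writes per second

# Optionally check which children are present, without downloading the data itself.
uploaded = requests.get(f"{DB}/bigNode.json", params={**AUTH, "shallow": "true"}).json()
print(f"{len(uploaded or {})} of {len(data)} children present")

With ~90 children averaging ~30 KB each, this finishes in well under a minute while keeping each individual write small.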

BigQuery streaming best practice

I have been using Google BigQuery for some time now, uploading files.
As I get some delays with this method, I am now trying to convert my code to streaming.
I am looking for the best solution here; what is the more correct way of working with BQ:
1. Using multiple (up to 40) different streaming machines, or directing traffic to a single endpoint (or a few) to upload data?
2. Uploading one row at a time, or batching 100-500 events into a list and uploading it?
3. Is streaming the way to go, or should I stick with file uploads, in terms of high volumes?
Some more data:
- we are uploading ~ 1500-2500 rows per second.
- using .net API.
- Need data to be available within ~ 5 minutes
I didn't find such a reference elsewhere.
The big difference between streaming data and uploading files is that streaming is intended for live data being produced in real time, whereas with file uploads you upload data that was stored previously.
In your case, I think streaming makes more sense. If something goes wrong, you would only need to re-send the failed rows instead of the whole file, and it adapts better to the continuously growing data you seem to have.
The best practices in any case are:
Trying to reduce the number of sources that send the data.
Sending bigger chunks of data in each request instead of multiple tiny chunks.
Using exponential back-off to retry requests that fail due to server errors (these are common and should be expected).
There are certain limits that apply to Load Jobs as well as to Streaming inserts.
For example, when using streaming you should insert less than 500 rows per request and up to 10,000 rows per second per table.
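To make the batching and back-off advice concrete, here is a rough sketch in Python (the question uses the .NET API, but the pattern carries over). The project/dataset/table name, row shape, and retry limits are placeholders:

import random
import time
from google.cloud import bigquery

client = bigquery.Client()
TABLE = "my-project.my_dataset.events"  # placeholder table
BATCH_SIZE = 500                        # stay at or below the per-request guidance above

def stream_rows(rows):
    # Insert rows in batches, retrying transient failures with exponential back-off.
    for start in range(0, len(rows), BATCH_SIZE):
        batch = rows[start:start + BATCH_SIZE]
        for attempt in range(5):
            try:
                errors = client.insert_rows_json(TABLE, batch)
                if errors:
                    # Row-level errors: only the listed rows need re-sending.
                    print("row errors:", errors)
                break
            except Exception:
                # Server-side/transient error: back off and retry the whole batch.
                time.sleep((2 ** attempt) + random.random())

# Example: a buffer of ~2000 rows flushed once per second meets the 5-minute freshness goal.
stream_rows([{"event_id": i, "ts": time.time()} for i in range(2000)])

Buffering rows on each producer machine and flushing every second or so keeps you at a few requests per second per machine instead of 1,500-2,500 single-row inserts.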

EMR - create user log from log

EMR Newbie Alert:
We have large logs containing the usage data of our web site. Customers are authenticated and identified by their customer id. Whenever we try to troubleshoot a customer issue we grep through all the logs (using the customer_id as search criteria) and pipe the results into a file. Then we use the results file to troubleshoot the issue. We were thinking about using EMR to create per-customer log files so we don't have to create a per-customer log file on demand. EMR would do it for us every hour for every customer.
We looked at EMR streaming and produced a little Ruby script for the map step. Now we have a large list of key/value pairs (userid, logdata).
We're stuck with the reduce step however. Ideally I'd want to generate a file with all the logdata of a particular customer and put it into an S3 bucket. Can anybody point us to how we'd do this? Is EMR even the technology we want to use?
Thanks,
Benno
One possibility would be to use the identity reducer, stipulating the number of reduce tasks via a property beforehand. You would arrive at a fixed number of files, in which all the records for a set of users would live. To find the file for a particular user, hash the user id to determine which file it landed in and search therein.
If you really want one file per user, your reducer should generate a new file every time it is called. I'm pretty sure there are plenty of s3 client libraries available for ruby.
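If you take the one-file-per-user route, the reducer only has to notice when the (sorted) key changes and flush whatever it has buffered. A rough Hadoop Streaming reducer sketch in Python with boto3 (the question's mapper is Ruby; the bucket name and key layout are made up for illustration):

#!/usr/bin/env python
import sys
import boto3

s3 = boto3.client("s3")
BUCKET = "my-per-customer-logs"  # placeholder bucket

def flush(user_id, lines):
    if user_id is not None and lines:
        s3.put_object(
            Bucket=BUCKET,
            Key=f"customer={user_id}/part.log",
            Body="\n".join(lines).encode("utf-8"),
        )

current_user, buffered = None, []
# Hadoop Streaming hands the reducer lines sorted by key: "<user_id>\t<logdata>"
for line in sys.stdin:
    user_id, _, logdata = line.rstrip("\n").partition("\t")
    if user_id != current_user:
        flush(current_user, buffered)  # key changed: write out the previous user's file
        current_user, buffered = user_id, []
    buffered.append(logdata)
flush(current_user, buffered)          # don't forget the last user

An hourly run could include the hour in the key prefix so successive runs don't overwrite each other.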
Without looking at your code, yes, this is typically pretty easy to do in MapReduce; the best case scenario here is if you have many, many users (who doesn't want that?), and a somewhat limited number of interactions per user.
Abstractly, your input data will probably look something like this:
File 1:
1, 200, "/resource", "{metadata: [1,2,3]}"
File 2:
2, 200, "/resource", "{metadata: [4,5,6]}"
1, 200, "/resource", "{metadata: [7,8,9]}"
Where this is just a log of user, HTTP status, path/resource, and some metadata. Your best bet here is to really only focus your mapper on cleaning the data, transforming it into a format you can consume, and emitting the user id and everything else (quite possibly including the user id again) as a key/value pair.
I'm not extremely familiar with Hadoop Streaming, but according to the documentation, "by default, the prefix of a line up to the first tab character is the key", so this might look something like:
1\t1, 200, "/resource", "{metadata: [7,8,9]}"
Note that the 1 is repeated, as you may want to use it in the output, and not just as part of the shuffle. That's where the processing shifts from single mappers handling File 1 and File 2 to something more like:
1:
1, 200, "/resource", "{metadata: [1,2,3]}"
1, 200, "/resource", "{metadata: [7,8,9]}"
2:
2, 200, "/resource", "{metadata: [4,5,6]}"
As you can see, we've already basically done our per-user grep! It's just a matter of doing our final transformations, which may include a sort (since this is essentially time-series data). That's why I said earlier that this is going to work out much better for you if you've got many users and limited user interaction. Sorting (or sending across the network!) tons of MBs per user is not going to be especially fast (though potentially still faster than alternatives).
To sum up, it depends on both the scale and the use case, but typically, yes, this is a problem well suited to map/reduce in general.
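For completeness, the mapper half of that is tiny: parse each raw log line, pull out the customer id, and emit it as the tab-separated key so the shuffle groups records per user. A rough Python sketch, assuming the illustrative comma-separated format above (the real mapper in the question is Ruby):

#!/usr/bin/env python
import sys

# Input lines look like: user_id, status, path, metadata (see the example above).
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    user_id = line.split(",", 1)[0].strip()
    # Emit "<key>\t<value>"; text before the first tab is treated as the key.
    print(f"{user_id}\t{line}")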
Take a look at Splunk. This is an enterprise-grade tool designed for discovering patterns and relationships in large quantities of text data. We use it for monitoring the web and application logs for a large web site. Just let Splunk index everything and use the search engine to drill into the data -- no pre-processing is necessary.
Just ran across this: Getting Started with Splunk as an Engineer

Resources