Get same html body in different requests - http

I'm making HTTP requests to the same URL multiple times, but every time I get a different body. What's the reason behind that, and what should I do?
import 'package:http/http.dart' as http;

Future<void> main() async {
  for (var i = 0; i < 10; i++) {
    print(await getBody('https://google.com'));
  }
}

// Returns the length of the response body as a string.
Future<String> getBody(String lyricsUrl) async {
  final response = await http.get(Uri.parse(lyricsUrl));
  return response.body.length.toString();
}
Here is the output

What you are printing is the length of the complete text returned by the request. This includes the HTML of the page, the other text on it, JS functions, and so on. Most of that is the same for every request. But the response also contains per-request values such as the kEI and kEXPI strings, which are part of the script that sets window.google.
So if you print the complete response text instead of just its length, you will see something like the snippet below, whose values differ from request to request. This is why the lengths differ.
window.google={kEI:\'dzPQYPiJ4fs6rTzssOTwAk\',
kEXPI:\'0,772215,1,52320,56873,954,5105,206,4804,2316,383,246,5,
23,5250,16232,10,1106274,1197730,522,31,328984,51224,16114,28684,
17572,4859,1361,9290,3027,3891,13691,4020,978,13228,3847,10622,
1141,7509,2,276,4516,2778,919,5081,885,710,1277,2212,530,149,1943,
6297,108,3406,606,2023,1777,520,12370,603,2624,1989,856,7,5599,
6755,5096,7539,338,4928,108,3407,908,2,231,2614,2397,1027,6441,
3277,3,576,1014,1,54423,149,5990,5333,991,1661,4,1528,2304,1236,
5803,74,1983,2626,2015,1300,2767,7434,3824,3050,2658,872,3291,80,56,
462,2595,31,13628,2305,639,7079,10535,665,2521,3261,2575,2047,2047,
17,3121,6,908,3,3541,1,14710,1814,283,38,351,2,1,3,517,5992,6754,
432,552,4788,2,1394,1498,35,1273,1715,2,3037,20,483,1534,3244,297,
10,147,1067,3,33,3,1343,1461,3271,626,158,254,494,1454,923,93,545,
2,1784,115,1159,1076,262,2834,406,2091,1606,1784,1975,287,1678,2,
445,595,1160,2292,775,1628,2,2003,176,37,1228,756,1526,2,230,206,
4420,519,2,338,204,3,75,50,895,1368,138,2205,116,246,54,269,54,16,
1448,320,2005,61,531,619,223,168,74,262,256,207,87,1,233,341,552,
156,700,2,2,5,79,339,1220,683,590,188,657,2,1,533,461,18,6,304,113,
332,192,270,5629329,194,32,63,155,2,59,5996289,520,47,2800650,2382,
444,1,2,80,1,1796,1,9,2,2551,1,889,795,2,561,1,4265,1,1,2,1331,3299,
843,2609,155,17,13,72,139,4,2,20,2,169,13,19,46,5,39,96,548,29,2,
2,1,2,1,2,2,7,4,1,2,2,2,2,2,2,353,422,91,5,114,43,14,25,43,2,1,4,
1,10,8,1,23952292,4010273,268,1835,26467,2,2374,3,120,3,6,338,3,
2340,74,540\',
kBL:\'bsXf\'};
In short, the response text contains dynamic values alongside the static content of the page, so its length varies from request to request.
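To check that only the dynamic values differ, one option is to mask them before comparing the bodies. Here is a minimal Python sketch, assuming the kEI token is the only varying part. The two bodies and the pattern are made up for illustration, and a real page has more dynamic fields (kEXPI, for example):

```python
import re

# Hypothetical pattern: matches the per-request kEI token shown in the snippet.
KEI_PATTERN = re.compile(r"kEI:'[^']*'")


def normalize(body: str) -> str:
    """Replace the per-request kEI token with a fixed placeholder."""
    return KEI_PATTERN.sub("kEI:'<dynamic>'", body)


# Two made-up response bodies that differ only in the dynamic token.
body_a = "<html><script>window.google={kEI:'abc123',kBL:'bsXf'};</script></html>"
body_b = "<html><script>window.google={kEI:'zz9',kBL:'bsXf'};</script></html>"

print(len(body_a) == len(body_b))              # False: the raw lengths differ
print(normalize(body_a) == normalize(body_b))  # True: same once the token is masked
```

If the normalized bodies still differ, another dynamic field is left to mask.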
Feel free to add a comment if you still have any doubts, or upvote the answer if you found it useful.

Related

Cloudflare worker: How to modify a response body like a string?

I'm trying to detect some string in the response body just to log a message to the console.
In another part of the code I also try to replace a piece of text in the response body.
These two attempts are throwing errors in the worker:
if (response.body.includes("X")) {
  console.log(response.body);
}
And:
responseCopy = new Response(response.body, response)
responseCopy.body = responseCopy.body.replace("x", "y")
How can I:
Check for the existence of a piece of text in the response body and act accordingly?
Manipulate the response body like a string, e.g. replace a string, or overwrite it completely?
P.S. I'm really not that proficient with JS, and I can't understand why it's not working.
Thank you
After researching more, I found out that I have to make the function (in which the code is running) async and then await the response text. response.body is a ReadableStream rather than a string, which is why string methods like includes() and replace() fail on it; the text() method returns a promise that resolves to the body text:
var html = await response.text()
// Simple replacement regex
html = html.replace(/x/g , 'y')
// return modified response
return new Response(html, response)
Also, a regex with the g flag has to be used for the replacement; with a plain string as the search pattern, replace() only replaces the first occurrence (why js, why?!!).

AWS Textract - GetDocumentAnalysisRequest only returns correct results for first page of document

I have written code to extract tables and name-value pairs from PDFs using Amazon Textract. I followed this example:
https://docs.aws.amazon.com/textract/latest/dg/async-analyzing-with-sqs.html
which was written for the SDK for Java version 1.1.
I have refactored it for version 2.
This is an async process that only applies to multi-page documents. When I get the results back, they are pretty accurate for the first page, but the subsequent pages are mostly empty rows. The documents I parse are scanned, so the quality is not great. However, if I take a JPG of each individual page and use the single-page operation, i.e. AnalyzeDocumentRequest, each page comes out well. The Amazon Textract try-it service also renders the pages correctly.
So the error must be in my code, but I can't see where.
As you can see, it all happens here:
GetDocumentAnalysisRequest documentAnalysisRequest = GetDocumentAnalysisRequest.builder().jobId(jobId)
.maxResults(maxResults).nextToken(paginationToken).build();
response = textractClient.getDocumentAnalysis(documentAnalysisRequest);
and I can't really intervene there.
The most likely place I could make a mistake would be in the util file that gathers the page and table blocks i.e. here:
PageModel pageModel = tableUtil.getTableResults(blocks);
But that works perfectly for the first page, and I can also see in the response object above that the number of blocks returned is much lower.
Here is the full code:
private DocumentModel getDocumentAnalysisResults(String jobId) throws Exception {
    int maxResults = 1000;
    String paginationToken = null;
    GetDocumentAnalysisResponse response = null;
    Boolean finished = false;
    int pageCount = 0;
    DocumentModel documentModel = new DocumentModel();
    // loops until pagination token is null
    while (finished == false) {
        GetDocumentAnalysisRequest documentAnalysisRequest = GetDocumentAnalysisRequest.builder().jobId(jobId)
                .maxResults(maxResults).nextToken(paginationToken).build();
        response = textractClient.getDocumentAnalysis(documentAnalysisRequest);
        // Show blocks, confidence and detection times
        List<Block> blocks = response.blocks();
        PageModel pageModel = tableUtil.getTableResults(blocks);
        pageModel.setPageNumber(pageCount++);
        Map<String,String> keyValues = formUtil.getFormResults(blocks);
        pageModel.setKeyValues(keyValues);
        documentModel.getPages().add(pageModel);
        paginationToken = response.nextToken();
        if (paginationToken == null)
            finished = true;
    }
    return documentModel;
}
Has anyone else encountered this issue?
Many thanks
If the response has a NextToken, you need to call Textract again and pass in the NextToken to get the next batch of Blocks.
I am not sure how to do this in Java, but here is a Python example from the AWS samples repo:
https://github.com/aws-samples/amazon-textract-serverless-large-scale-document-processing/blob/master/src/jobresultsproc.py
For my solution, I did a simple check: if response['NextToken'] is present, call the method again and concatenate the response['Blocks'] onto my current list.
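The concatenation approach can be sketched in Python roughly as follows. The parameter and key names (JobId, MaxResults, NextToken, Blocks, Page) follow the boto3 Textract client; FakeTextract and its canned data are hypothetical stand-ins so the sketch runs without AWS credentials. Each response is a batch of blocks rather than a document page, so the batches are merged first and only then split by each block's Page field:

```python
from collections import defaultdict


def collect_all_blocks(client, job_id, max_results=1000):
    """Follow NextToken until it runs out and return every block in one list."""
    blocks = []
    kwargs = {"JobId": job_id, "MaxResults": max_results}
    while True:
        response = client.get_document_analysis(**kwargs)
        blocks.extend(response["Blocks"])
        token = response.get("NextToken")
        if not token:
            return blocks
        kwargs["NextToken"] = token


def group_by_page(blocks):
    """Split the merged block list by the page each block was detected on."""
    pages = defaultdict(list)
    for block in blocks:
        pages[block.get("Page", 1)].append(block)
    return dict(pages)


class FakeTextract:
    """Hypothetical stand-in for the real client: serves two canned batches."""

    def __init__(self):
        self._batches = {
            None: {"Blocks": [{"Id": "a", "Page": 1}, {"Id": "b", "Page": 2}],
                   "NextToken": "t1"},
            "t1": {"Blocks": [{"Id": "c", "Page": 2}]},
        }

    def get_document_analysis(self, **kwargs):
        return self._batches[kwargs.get("NextToken")]


blocks = collect_all_blocks(FakeTextract(), "job-123")
pages = group_by_page(blocks)
print({page: [b["Id"] for b in items] for page, items in pages.items()})
```

Note that page 2 ends up with blocks from both batches, which is why per-batch processing can leave later pages incomplete.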

How do I check whether a URL is responsive or not

I have image URLs in my database and I want to check whether each URL is responsive or not in the browser.
Please help me.
For example:
http://images.jactravel.co.uk/6008_1_1.jpg
or
http://images.jactravel.co.uk/6049_2_4.jpg
How can I automatically check whether these URLs are responsive or not?
I assume that by responsive you mean whether you can get a response when you call a specific URL or not.
To do that without actually downloading the content, you can use HttpClient.GetAsync(string, HttpCompletionOption) with an HttpCompletionOption of ResponseHeadersRead. This makes GetAsync return as soon as the response headers have been read, with a status code (e.g. 200, 404 or 500), without waiting to download the entire content, e.g.:
using (var client = new HttpClient())
{
    using (var response = await client.GetAsync("http://mysite/myimage.jpg",
                                                HttpCompletionOption.ResponseHeadersRead))
    {
        if (response.IsSuccessStatusCode)
        {
            //The URL is good
        }
    }
}
To actually read the content, you need to call one of the read methods on the response's Content property. For example, you can use CopyToAsync to copy the content to a file stream, or ReadAsByteArrayAsync to read the content as a byte array, e.g.:
var buffer=await response.Content.ReadAsByteArrayAsync();
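For comparison, the same headers-only check can be sketched in Python with the standard library. urlopen-style calls return once the status line and headers are parsed, and the body is only pulled when read() is called, which is roughly what ResponseHeadersRead does. The is_responsive helper, the DemoHandler class and the local demo server are assumptions made up for this sketch so that it runs without touching a real site:

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def is_responsive(url, timeout=5.0, opener=None):
    """Return True if the URL answers with a 2xx status; the body is never read."""
    opener = opener or urllib.request.build_opener()
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Covers HTTP errors such as 404, refused connections, timeouts, bad URLs.
        return False


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: /ok.jpg exists, everything else is a 404."""

    def do_GET(self):
        if self.path == "/ok.jpg":
            body = b"fake image bytes"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

# An empty ProxyHandler keeps any system proxy out of the local demo.
direct = urllib.request.build_opener(urllib.request.ProxyHandler({}))
ok = is_responsive(base + "/ok.jpg", opener=direct)
missing = is_responsive(base + "/missing.jpg", opener=direct)
print(ok, missing)

server.shutdown()
server.server_close()
```

For the image URLs in the question, calling is_responsive(url) on each database row would be the equivalent of the IsSuccessStatusCode check.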

nginx module: capture whole response body

Although Nginx is a really interesting piece of software, the lack of documentation is driving me crazy.
Goal: capture the whole response body, which would be logged on the server.
Problem: I always get a single buffer whose size is ZERO.
Approach
I would expect to be able to accomplish this with a body filter, which would "wait" for last_buf before iterating over the full buffer chain.
/**
 * @param ngx_http_request_t *r HTTP request
 * @param ngx_chain_t *in Buffer chain
 */
static ngx_int_t
create_response_snapshot(ngx_http_request_t *r, ngx_chain_t *in)
{
    ngx_chain_t *chain = NULL;
    int chain_contains_last_buffer = 0;
    size_t buffer_size = 0;
    // check if body is complete
    chain = in;
    for ( ; ; )
    {
        if (chain->buf->last_buf)
        {
            chain_contains_last_buffer = 1;
        }
        if (NULL == chain->next)
            break;
        chain = chain->next;
    }
    if (0 == chain_contains_last_buffer)
    {
        // response is not complete
        return ngx_http_next_body_filter(r, in);
    }
    // Response Content-Length
    ngx_log_error(NGX_LOG_ALERT,r->connection->log,0,"Content-Length: %d",
        r->headers_out.content_length_n);
    // lets iterate buffers chain
    for (chain = in; NULL != chain; chain = chain->next)
    {
        buffer_size = ngx_buf_size(chain->buf);
        ngx_log_error(NGX_LOG_ALERT,r->connection->log,0,"buffer_size#%d",buffer_size);
    }
    return ngx_http_next_body_filter(r, in);
}
My comment got too big to be a comment, but I don't feel like it's a proper answer - oh well.
To reiterate, the problem with the code you've posted is that your module's body filter function won't be called on the whole chain at once. It gets called on the first piece, then the second piece, and so on until the nth piece. Finally it gets called on a completely empty chain: for whatever reason, the buf with last_buf = 1 is always by itself and empty.
So I think what you want to do is "dam" the flow of buffers by accumulating them in your module without releasing any to the next filter until you have all of them at once.
Check out the substitution filter module: http://lxr.nginx.org/source//src/http/modules/ngx_http_sub_filter_module.c
It uses a "busy" chain which is what I was referring to. From what I've been able to tell it uses it to keep track of which buffers have actually been sent (when this happens the size gets set to zero) and adds those to the module context's free list for re-use. See ngx_http_sub_output on line 438 for this behavior.
My suggestion was to do something like what that module does, except without calling the next filter until you have the entire page. You can't call next_filter if you want to process the entire page as a whole, since doing that will result in data getting sent to the client. Again this runs counter to Nginx's design, so I think you should find an alternative that doesn't require the whole response body at once if you can.
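The "dam" idea can be modelled outside Nginx to make the control flow concrete. The following is a toy Python simulation, not Nginx code: Buf and DamFilter are made-up names, and a plain list stands in for the buffer chain. It only illustrates holding every piece back until the one flagged last_buf arrives:

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Buf:
    """Toy stand-in for ngx_buf_t: payload bytes plus the last_buf flag."""
    data: bytes = b""
    last_buf: bool = False


@dataclass
class DamFilter:
    """Holds every buffer back until the one flagged last_buf arrives."""
    next_filter: Callable[[List[Buf]], None]
    held: List[Buf] = field(default_factory=list)

    def body_filter(self, chain: List[Buf]) -> None:
        self.held.extend(chain)
        if not any(buf.last_buf for buf in chain):
            return  # response incomplete: pass nothing downstream yet
        body = b"".join(buf.data for buf in self.held)
        print(f"captured {len(body)} bytes")
        released, self.held = self.held, []
        self.next_filter(released)  # release the whole response at once


# The filter is called once per piece, and the last call carries only the
# empty last_buf marker, as described above.
sent = []
dam = DamFilter(next_filter=sent.append)
for chain in ([Buf(b"<html>")], [Buf(b"</html>")], [Buf(last_buf=True)]):
    dam.body_filter(chain)
```

In this model the downstream filter is invoked exactly once, with all three buffers. In a real module the held data would also have to be copied or kept alive, since Nginx reuses buffers once they are consumed.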

Passing flash variables to asp.net

I don't know much about Flash but we are working on a site that has a flash form and when the users pick an option, like selecting a value from a drop down list, we need the value to be passed to asp.net server-side code. What's the easiest way to do this?
Flash can invoke server-side services, so use GET or POST to pass the data.
You could explore these options:
1) Communicate between the SWF and the containing page through JavaScript
2) Communicate via asp.net webservices from the SWF directly to the webservice.
3) Not sure but could probably do a POST to a processing aspx page?
HTH
I think a good option is to use the XML class so consider this:
var xmlRequest = new XML();
// the handler name must match the function defined below
xmlRequest.onLoad = parseXMLResponse;
xmlRequest.load("http://yourpathtoyourserver/file.aspx?listselectedvalue=something");
function parseXMLResponse(loaded)
{
    trace("hi");
}
You can also have the page give you data back this way so it's not just one way communication.
Assuming you are using ActionScript 2.
Read the important notes below each code block; they pertain to sending data from Flash to the .NET page and retrieving data back. The code itself is explained in its comments.
Flash Part (Action Script 2)
//function to send the collected form data to the asp.net page
//use another control/button to call this function
//important: in order for the 'onLoad' event to work correctly, this function has to be 'Void'
function sendForm():Void
{
    //create LoadVars objects
    var lv_in:LoadVars = new LoadVars();
    var lv_out:LoadVars = new LoadVars();
    //set onLoad event
    lv_in.onLoad = function(success:Boolean)
    {
        //if success, meaning data has been received from the .net page, run this code
        if (success)
        {
            //lv_in.status is used to get the data posted by the .Net page
            statusMsg.text = "Thank you for filling up the form!" + lv_in.status;
        }
        //if it fails, run this code
        else
        {
            statusMsg.text = "The form you are trying to fill up has an error!";
        }
    }
    //this is the collected data from the form
    lv_out.userName = txtUserName.text;
    lv_out.userAddress = txtUserAddress.text;
    lv_out.userBirthday = txtUserBirthday.text;
    //begin invoking the .net page
    lv_out.sendAndLoad("ProcessDataForm.aspx", lv_in, "POST");
}
Important note:
The function that contains the onLoad event, in this case sendForm, has to be a Void function, meaning it does not return a value. If the function returns a value, it will be executed all the way through without waiting for the data returned from the .NET page, so the onLoad event will not be set properly.
.Net Part
public void ProcessData()
{
    //process the data here
    Response.Write("status=processed&");
}
Important note:
To send data or a message back to Flash, you can use Response.Write. However, if you want ActionScript to parse the data sent by the .NET page, keep in mind that you have to include the & symbol at the end of the message. When parsing, ActionScript stops at the & symbol, so it leaves the rest of the output alone and only picks up the value of the variable that was sent.
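The reply format is ordinary URL-encoded name=value pairs, which is what LoadVars expects. As an illustration only, here is a small Python sketch of building such a reply and of what a parser gets out of it. The build_reply helper is a made-up name, and Python's parse_qs is used as a stand-in for the parsing that LoadVars performs:

```python
from urllib.parse import parse_qs, urlencode


def build_reply(**fields: str) -> str:
    """URL-encode the fields and add the trailing & recommended above."""
    return urlencode(fields) + "&"


reply = build_reply(status="processed")
print(reply)            # status=processed&
print(parse_qs(reply))  # {'status': ['processed']}
```

The trailing & acts as a terminator, so anything the page writes after it does not end up inside the status value.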
