I'm writing a tool that runs periodically and reads and processes batches of documents from a Cosmos DB collection. I'd like to read all documents that have not been processed yet, including changed and added documents.
List (ReadFeed) Documents states that:
ReadFeed can be used to retrieve (...) the incremental changes to documents within the collection.
and that this can be done by setting the If-None-Match request header to the etag received in the response header:
The logical sequence number (LSN) of last document returned in the response.
incremental ReadDocumentFeed can be resumed by resubmitting this value in If-None-Match.
but there is no such header in the response.
I used both the REST API and the .NET Cosmos SDK v3 and tried setting the If-None-Match header to:
_etag of the last processed document
etag of the container
But I'm getting the same, full set of documents each time.
A sample request:
GET /dbs/myData/colls/myItems/docs HTTP/2
Host: cosmos-local.documents.azure.com
authorization: type%3dmaster%26ver%3d1.0%26sig%3d...%2f...%2f...%3d
x-ms-date: Fri, 19 Feb 2021 08:44:23 GMT
x-ms-version: 2018-12-31
if-none-match: "44000253-0000-0d00-0000-601428c90000"
accept: */*
I tried these etag formats in If-None-Match:
"\"44000253-0000-0d00-0000-601428c90000\""
"44000253-0000-0d00-0000-601428c90000"
44000253-0000-0d00-0000-601428c90000
A sample response:
HTTP/2 200
cache-control: no-store, no-cache
pragma: no-cache
content-type: application/json
content-location: https://cosmos-local.documents.azure.com/dbs/myData/colls/myItems/docs
server: Microsoft-HTTPAPI/2.0
strict-transport-security: max-age=31536000
x-ms-activity-id: 765ea7f5-40c6-48fb-bf90-7b3b506b0b82
x-ms-last-state-change-utc: Thu, 11 Feb 2021 08:35:47.977 GMT
x-ms-resource-quota: documentSize=51200;documentsSize=52428800;documentsCount=-1;collectionSize=52428800;
x-ms-resource-usage: documentSize=0;documentsSize=238;documentsCount=19;collectionSize=243;
x-ms-schemaversion: 1.11
lsn: 174
x-ms-item-count: 19
x-ms-request-charge: 2.53
x-ms-alt-content-path: dbs/myData/colls/myItems
x-ms-content-path: eiBDAJIWdUc=
x-ms-documentdb-partitionkeyrangeid: 0
x-ms-xp-role: 1
x-ms-global-committed-lsn: 173
x-ms-number-of-read-regions: 0
x-ms-transport-request-id: 1
x-ms-cosmos-llsn: 174
x-ms-session-token: 0:-1#174
x-ms-request-duration-ms: 1.002
x-ms-serviceversion: version=2.11.0.0
x-ms-gatewayversion: version=2.11.0
date: Fri, 19 Feb 2021 10:45:10 GMT
Other attempts I made:
Set A-IM header to Incremental feed - got an error in the response
Set value from lsn header in the If-None-Match - got full set
Use ChangeFeedProcessor. It looks like what I need, but it keeps waiting for new data, and I'd like it to shut down / time out if there are no new changes.
Check if setting x-ms-max-item-count makes any difference - it doesn't seem to
So the questions are:
How do I run an incremental ReadFeed in Cosmos DB?
Is there a better approach to reading all documents in a collection incrementally, in separate runs?
The best approach is to use the Change Feed. You can shut it down if you want: just call StopAsync() on the processor object. If you're running this on a schedule, you can host it in Azure Functions and run it from a timer trigger. When it starts up, it will connect to the lease collection and resume processing from the last LSN. When it's done processing, call StopAsync() and it will shut down.
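Here is a minimal sketch of that pattern with the .NET SDK v3 (the lease container name, the item type, and the 30-second drain window are assumptions, not part of the answer above; a production version could instead stop once the handler has been idle for a while):

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

class ScheduledChangeFeedRun
{
    // Reads whatever changed or was added since the last run, then shuts down.
    public static async Task RunOnceAsync(CosmosClient client)
    {
        Container monitored = client.GetContainer("myData", "myItems");
        Container leases = client.GetContainer("myData", "leases"); // assumed lease container

        ChangeFeedProcessor processor = monitored
            .GetChangeFeedProcessorBuilder<MyItem>("myItemsProcessor", HandleChangesAsync)
            .WithInstanceName("scheduled-run")
            .WithLeaseContainer(leases)
            .Build();

        await processor.StartAsync();               // resumes from the checkpoint in the lease container
        await Task.Delay(TimeSpan.FromSeconds(30)); // give it time to drain the backlog
        await processor.StopAsync();                // checkpoint and shut down
    }

    private static Task HandleChangesAsync(IReadOnlyCollection<MyItem> changes, CancellationToken cancellationToken)
    {
        foreach (MyItem item in changes)
        {
            // process each changed or added document here
        }
        return Task.CompletedTask;
    }

    private class MyItem { public string id { get; set; } }
}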
Related
I have a URL and make requests to it, but the response is invalid. I checked the requests in Chrome dev tools and Chrome didn't find anything wrong. When I make the request in Postman, I receive "Parse Error: There seems to be an invalid character in response header key or value". I also made requests with Axios in Node.js and received an error again.
Finally, I checked the request in Chrome dev tools once more, and saw this:
Accept-Ranges: bytes
Connection: close
Date: Wed, 30 Jun 2021 12:05:28 GMT
Server: Boa/0.94.14rc21
Those are the parsed headers from the response. When I clicked "View source", I saw this:
HTTP/1.1 200 OK
Date: Wed, 30 Jun 2021 12:05:28 GMT
Server: Boa/0.94.14rc21
Accept-Ranges: bytes
Connection: close
<Content-Type:text/html>
Is this normal, or should I receive Content-Type without the angle brackets? Is this documented somewhere as a standard?
UPD: I made the requests to the dashboard of a Yeastar TG200.
UPD 2: I also made POST requests and received a valid Content-Type response header, without angle brackets.
Angle brackets in a field name are definitely invalid (see https://greenbytes.de/tech/webdav/rfc7230.html#rule.token.separators).
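To make the cited rule concrete, here is a small illustrative C# check of a field name against the RFC 7230 token characters (the names tested are taken from your captures):

using System;
using System.Text.RegularExpressions;

class HeaderNameCheck
{
    // tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
    //         "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA   (RFC 7230, section 3.2.6)
    static readonly Regex Token = new Regex(@"^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$");

    static void Main()
    {
        Console.WriteLine(Token.IsMatch("Content-Type"));   // True
        Console.WriteLine(Token.IsMatch("<Content-Type"));  // False: '<' and '>' are separators, never part of a name
    }
}

So the Boa server's <Content-Type:text/html> line is malformed, and strict parsers like Postman's and Node's are right to reject it.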
Context
My GitHub Pages site is not refreshing. After diagnosing, my conclusion is that it's a server-side caching effect.
What I did + diagnostic results
The site is working OK.
I made a change to index.html in my local repo, then committed and pushed.
I completely cleared my browser cache (also using cache-clearing plugins, and set Chrome dev tools not to use the cache).
Reloaded the page with Ctrl+F5 and Ctrl+R (the change is not applied).
Checked index.html on github.com: the change is there, committed.
Monitored the traffic with Fiddler. The request for index.html is sent and a full response is received, but the content is the old, NOT changed one.
Examined the response headers with Fiddler (see the header exhibit below).
Reverse diagnostic
I issued a request with the usual trick, typing index.html?v001orAnythingYouWant, and I got the new version of the page.
Problem
One could say the problem is solved, but that is not true: when I change images, CSS, or JS, this effect will still prevent me from seeing the new result.
Question
How can I configure or overcome this server-side caching, of course only for development/testing time?
Response header exhibit
HTTP/1.1 200 OK
Server: GitHub.com
Content-Type: text/html; charset=utf-8
Last-Modified: Fri, 06 May 2016 12:24:29 GMT
Access-Control-Allow-Origin: *
Expires: Fri, 06 May 2016 12:45:44 GMT
Cache-Control: max-age=600
X-GitHub-Request-Id: B91F111E:5AA6:47804:572C8F9F
Content-Length: 43752
Accept-Ranges: bytes
Date: Fri, 06 May 2016 12:35:57 GMT
Via: 1.1 varnish
Age: 13
Connection: keep-alive
X-Served-By: cache-fra1238-FRA
X-Cache: HIT
X-Cache-Hits: 1
Vary: Accept-Encoding
X-Fastly-Request-ID: 1758f53052edbfb40a0044407d53d5654ad1e983
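Given the Cache-Control: max-age=600 and X-Cache: HIT shown above, one practical dev-time workaround is to extend the same query-string trick from the reverse diagnostic to every cached asset, not just the page itself. A minimal illustrative snippet (file names and the version value are placeholders; bump the value whenever the asset changes):

<link rel="stylesheet" href="style.css?v=20160506">
<script src="app.js?v=20160506"></script>
<img src="logo.png?v=20160506">

Alternatively, simply waiting out the 600-second max-age also gets you the fresh copy.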
I have a quick question. I've read RFC 2616, section 14.22, about the Host header, but I still don't understand what should be changed in httpd.conf or the web server's configuration file. Please correct me if I'm wrong.
Look at the following two HTTP GETs I sent to an Apache server. The first one is an HTTP/1.0 GET, the other one an HTTP/1.1 GET. See the output:
HTTP/1.0 200 OK
Date: Thu, 24 Oct 2013 03:46:22 GMT
Server: Apache/1.3.41 (Unix) mod_gzip/1.3.26.1a PHP/5.2.9 mod_throttle/3.1.2 mod_psoft_traffic/0.2 mod_ssl/2.8.31 OpenSSL/0.9.8b
Vary: *
Last-Modified: Fri, 10 Aug 2012 20:22:30 GMT
ETag: "17c815b-3b-50256d86"
Accept-Ranges: bytes
Content-Length: 59
Connection: close
Content-Type: text/html
<html>
<body>
<center>webli7</center>
</body>
</html>
HTTP/1.1 400 Bad Request
Date: Thu, 24 Oct 2013 04:04:40 GMT
Server: Apache/1.3.41 (Unix) mod_gzip/1.3.26.1a PHP/5.2.9 mod_throttle/3.1.2 mod_psoft_traffic/0.2 mod_ssl/2.8.31 OpenSSL/0.9.8b
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
16e
The HTTP protocol version is decided dynamically, not through configuration files. The client sends a request specifying the highest protocol version that it supports. The server must then respond with either the version requested by the client, or any earlier version that it prefers.
Since Apache does support HTTP/1.1, it should therefore match exactly the version provided by the client.
There is a flag that you may set in Apache's config to force Apache to use HTTP/1.0 in certain situations, even though the browser requested HTTP/1.1. It exists to work around bugs in the HTTP/1.1 handling of some very old browsers. Today, you should not need to play with this flag.
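For completeness, this is the classic httpd.conf idiom for that downgrade (the Mozilla/2 pattern is the traditional example from Apache's own sample configs; you almost certainly don't need it today):

# Work around broken HTTP/1.1 handling in a legacy client:
# disable keep-alive and force HTTP/1.0 responses for matching User-Agents
BrowserMatch "Mozilla/2" nokeepalive downgrade-1.0 force-response-1.0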
As for your error, I would suggest that you make sure your GET does provide the Host: header. This header is required in HTTP/1.1, yet optional in HTTP/1.0, and a missing Host header would certainly result in a 400 error.
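In other words, the smallest valid HTTP/1.1 request looks like this (hostname is a placeholder):

GET / HTTP/1.1
Host: www.example.com

Sending the same request without the Host line is what produces a 400 Bad Request, exactly as in your second capture above.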
I'm working on a Delphi API for Google Docs and having a hard time getting the upload to work. I'm following Google's development guide here, and from what I understand the process should go like this:
Make a POST request to this URL: https://docs.google.com/feeds/upload/create-session/default/private/full/?access_token=my_access_token&v=3&convert=false with these headers: X-Upload-Content-Type and X-Upload-Content-Length
Get a 200 OK response with the next upload location stored in the Location header
Make a PUT request to the Location header with the header Content-Type set to whatever I had X-Upload-Content-Type set to in step 1 and the header Content-Range set to something like this: bytes 0-524287/2097152 and the first 512kb of data in the body
Get a 308 Resume Incomplete Response that has the next upload location in the Location header
Go back to step 3 until all bytes are uploaded, at which point I will receive a 201 Created response with the XML data describing the file I uploaded
Everything up to and including step 3 works fine. It is at step 4 that things start to go wrong.
The thing that confuses me most is that the response in step 4 doesn't contain a Location header. I figured that meant I should just send the next request to the same URL, but that causes a 504 error. I tried the entire process with Fiddler just to see whether it was the Delphi code, a lack of understanding on my part, or something Google is doing.
Here are the requests and responses I sent and received using Fiddler:
POST https://docs.google.com/feeds/upload/create-session/default/private/full/?access_token=my_access_token&v=3&convert=false HTTP/1.1
Content-Type: application/x-www-form-urlencoded
X-Upload-Content-Type: application/octet-stream
X-Upload-Content-Length: 2097152
Content-Length: 0
Host: docs.google.com
HTTP/1.1 200 OK
Server: HTTP Upload Server Built on May 16 2012 12:03:24 (1337195004)
Location: https://docs.google.com/feeds/upload/create-session/default/private/full/?access_token=my_access_token&v=3&convert=false&upload_id=AEnB2Ur9-9VxMSI6kaFzbybY2qiyzK6kVoKzcZ6Yo02H8Ni4FlQFl_N06DdjZXzp3vSjOPH3CEb_4vDlKZp7VlC0hxpkypzlKg
Date: Tue, 22 May 2012 16:53:27 GMT
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Cache-Control: no-cache, no-store, must-revalidate
Content-Length: 0
Content-Type: text/html
PUT https://docs.google.com/feeds/upload/create-session/default/private/full/?access_token=my_access_token&v=3&convert=false&upload_id=AEnB2Ur9-9VxMSI6kaFzbybY2qiyzK6kVoKzcZ6Yo02H8Ni4FlQFl_N06DdjZXzp3vSjOPH3CEb_4vDlKZp7VlC0hxpkypzlKg HTTP/1.1
Content-Type: application/octet-stream
Content-Length: 524288
Content-Range: bytes 0-524287/2097152
Host: docs.google.com
[first 512kb of data here]
HTTP/1.1 308 Resume Incomplete
Server: HTTP Upload Server Built on May 16 2012 12:03:24 (1337195004)
Range: bytes=0-524287
X-Range-MD5: bd9d4ee7afa24b7da0e685f05b5f1f44
Date: Tue, 22 May 2012 16:54:29 GMT
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Cache-Control: no-cache, no-store, must-revalidate
Content-Length: 0
Content-Type: text/html
PUT https://docs.google.com/feeds/upload/create-session/default/private/full/?access_token=my_access_token&v=3&convert=false&upload_id=AEnB2Ur9-9VxMSI6kaFzbybY2qiyzK6kVoKzcZ6Yo02H8Ni4FlQFl_N06DdjZXzp3vSjOPH3CEb_4vDlKZp7VlC0hxpkypzlKg HTTP/1.1
Content-Type: application/octet-stream
Content-Length: 524288
Content-Range: bytes 524288-1048575/2097152
Host: docs.google.com
[next 512kb of data]
HTTP/1.1 504 Fiddler - Send Failure
Content-Type: text/html; charset=UTF-8
Connection: close
Timestamp: 10:54:14.056
The only thing I was able to establish for a fact is that it is not just the Delphi code that is wrong, and since I don't think it's Google, I have to assume I'm misunderstanding something about what should be happening. What am I missing?
Edit
I was able to get the upload working. I'm not entirely sure what I did differently, but the documentation is a little misleading, at least to me. When you send a PUT request, you don't get a new location; you just continue to upload to the same one. Also, when you finish the upload, the 201 response doesn't contain the actual XML data; instead, it has a Location header that points to where you can grab the XML data from. Not a huge deal, but a little confusing.
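For anyone following along, here is a rough sketch of that corrected loop. It is written in C# with HttpClient rather than Delphi, purely to illustrate the protocol; sessionUrl is the Location from the initial POST, and the 512 KiB chunk size matches the captures above:

using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

class ResumableUpload
{
    const int ChunkSize = 512 * 1024; // 512 KiB per PUT, as in the Fiddler captures

    public static async Task UploadAsync(HttpClient client, string sessionUrl, byte[] data)
    {
        long total = data.Length;
        long offset = 0;
        while (offset < total)
        {
            int count = (int)Math.Min(ChunkSize, total - offset);
            var chunk = new ByteArrayContent(data, (int)offset, count);
            chunk.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
            // e.g. "bytes 0-524287/2097152"
            chunk.Headers.ContentRange = new ContentRangeHeaderValue(offset, offset + count - 1, total);

            // Keep PUTting to the *same* session URL; no new Location arrives between chunks.
            var response = await client.SendAsync(
                new HttpRequestMessage(HttpMethod.Put, sessionUrl) { Content = chunk });

            if ((int)response.StatusCode == 308)                    // Resume Incomplete: send the next chunk
            {
                offset += count;
            }
            else if (response.StatusCode == HttpStatusCode.Created) // 201: done; the XML lives AT the Location header
            {
                Console.WriteLine("Done; fetch the metadata from " + response.Headers.Location);
                return;
            }
            else
            {
                throw new HttpRequestException("Unexpected status: " + (int)response.StatusCode);
            }
        }
    }
}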
It seems the 504 error is returned by Fiddler itself; these two links should help:
https://urda.com/blog/2010/09/28/iis-services-504s-and-fiddler/
https://urda.com/blog/2010/09/30/follow-up-iis-services-504s-and-fiddler/
I have a script on GAE that requests an XML feed from a partner; it's typically 40 MB, but only 5 MB gzipped. GAE is automatically unzipping this content and throwing an error that the response is too big:
HTTP response was too large: 46677241. The limit is: 33554432.
The script is set up to uncompress the response itself. How do I prevent GAE from getting in the way and breaking?
Here's the response header from my partner:
HTTP/1.0 200 OK
Expires: Wed, 27 Jun 2012 05:42:07 GMT
Cache-Control: max-age=10368000
Content-Type: application/x-gzip
Accept-Ranges: bytes
Last-Modified: Wed, 22 Feb 2012 11:06:09 GMT
Content-Length: 5263323
Date: Tue, 28 Feb 2012 05:42:07 GMT
Server: lighttpd
X-Cache: MISS from static01
X-Cache-Lookup: MISS from static01:80
Via: 1.0 static01:80 (squid)
Most likely your partner's server responds with plain XML because it thinks the HTTP client sending the requests (i.e. the GAE URL Fetch service) does not support gzipping. Hence the "response was too large" error.
To announce that you actually want to receive gzipped content, you need to set the Accept-Encoding: gzip header when using the URL Fetch service.
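So the outgoing fetch should carry that header; on the wire, the request should look roughly like this (host and path are placeholders):

GET /feed.xml HTTP/1.1
Host: partner.example.com
Accept-Encoding: gzip

With that header in place, the partner can return the ~5 MB compressed body from your exhibit instead of the ~46 MB uncompressed XML, and your script can gunzip it itself as intended.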