nginx module: capture whole response body - nginx

Although Nginx is a really interesting piece of software, the lack of documentation is driving me crazy.
Goal: capture the whole response body so that it can be logged on the server.
Problem: I always get a single buffer whose size is zero.
Approach
I would expect to be able to accomplish this with a body filter that "waits" for last_buf before iterating over the full buffer chain.
/**
 * @param ngx_http_request_t *r HTTP request
 * @param ngx_chain_t *in Buffer chain
 */
static ngx_int_t
create_response_snapshot(ngx_http_request_t *r, ngx_chain_t *in)
{
    ngx_chain_t *chain = NULL;
    int chain_contains_last_buffer = 0;
    size_t buffer_size = 0;
    // check if body is complete
    chain = in;
    for ( ; ; )
    {
        if (chain->buf->last_buf)
        {
            chain_contains_last_buffer = 1;
        }
        if (NULL == chain->next)
            break;
        chain = chain->next;
    }
    if (0 == chain_contains_last_buffer)
    {
        // response is not complete
        return ngx_http_next_body_filter(r, in);
    }
    // Response Content-Length
    ngx_log_error(NGX_LOG_ALERT,r->connection->log,0,"Content-Length: %d",
        r->headers_out.content_length_n);
    // lets iterate buffers chain
    for (chain = in; NULL != chain; chain = chain->next)
    {
        buffer_size = ngx_buf_size(chain->buf);
        ngx_log_error(NGX_LOG_ALERT,r->connection->log,0,"buffer_size#%d",buffer_size);
    }
    return ngx_http_next_body_filter(r, in);
}

My comment got too big to be a comment, but I don't feel like it's a proper answer - oh well.
To reiterate, the problem with the code you've posted is that your module's body filter function won't be called on the whole chain at once. It gets called on the first piece, then the second piece, and so on up to the nth piece. Finally it gets called on a completely empty chain: for whatever reason, the buf with last_buf = 1 is always by itself and empty.
So I think what you want to do is "dam" the flow of buffers by accumulating them in your module without releasing any to the next filter until you have all of them at once.
Check out the substitution filter module: http://lxr.nginx.org/source//src/http/modules/ngx_http_sub_filter_module.c
It uses a "busy" chain which is what I was referring to. From what I've been able to tell it uses it to keep track of which buffers have actually been sent (when this happens the size gets set to zero) and adds those to the module context's free list for re-use. See ngx_http_sub_output on line 438 for this behavior.
My suggestion was to do something like what that module does, except without calling the next filter until you have the entire page. You can't call next_filter if you want to process the entire page as a whole, since doing that will result in data getting sent to the client. Again this runs counter to Nginx's design, so I think you should find an alternative that doesn't require the whole response body at once if you can.
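The "dam" idea can be modeled outside nginx. The sketch below is a plain Python simulation, not nginx module code: Buf, BufferingFilter, and next_filter are hypothetical stand-ins for ngx_buf_t, the module context, and ngx_http_next_body_filter. It only illustrates the control flow, which is to hold every chunk until the one flagged last_buf arrives and then forward the whole body in a single call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Buf:
    """Hypothetical stand-in for ngx_buf_t: payload plus the last_buf flag."""
    data: bytes = b""
    last_buf: bool = False


@dataclass
class BufferingFilter:
    """Accumulates chains across calls and releases them only at the end."""
    next_filter: Callable[[List[Buf]], None]
    held: List[Buf] = field(default_factory=list)

    def body_filter(self, chain: List[Buf]) -> None:
        # Copy the bytes so the held data is independent of the caller's buffers.
        self.held.extend(Buf(bytes(b.data), b.last_buf) for b in chain)
        if not any(b.last_buf for b in chain):
            return  # body incomplete: do not call the next filter yet
        body = b"".join(b.data for b in self.held)
        print(f"captured {len(body)} bytes")
        self.next_filter(self.held)  # release everything at once
        self.held = []


sent: List[Buf] = []
f = BufferingFilter(next_filter=sent.extend)
f.body_filter([Buf(b"<html>")])
f.body_filter([Buf(b"hello"), Buf(b"</html>")])
f.body_filter([Buf(b"", last_buf=True)])  # final call: empty buffer, last_buf set
```

Holding the entire body costs memory proportional to the response size and delays the first byte sent to the client, which is the trade-off mentioned above.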

Related

Confluent Batch Consumer. Consumer not working if Time out is specified

I am trying to consume a maximum of 1000 messages from Kafka at a time (I need to batch insert them into MSSQL). I was under the impression that Kafka keeps an internal queue that fetches messages from the brokers, and that consumer.Consume() just checks whether there are any messages in the internal queue and returns if it finds one; otherwise it blocks until the internal queue is updated or until the timeout expires.
I tried to use the solution suggested here: https://github.com/confluentinc/confluent-kafka-dotnet/issues/1164#issuecomment-610308425
However, when I specify TimeSpan.Zero (or any other timespan up to 1000 ms), the consumer never consumes any messages. If I remove the timeout it does consume messages, but then I am unable to exit the loop when there are no more messages left to read.
I also saw another question on Stack Overflow which suggested reading the offset of the last message sent to Kafka, reading messages until I reach that offset, and then breaking out of the loop. Currently I have only one consumer and 6 partitions for the topic. I haven't tried it yet, but I think managing offsets for each partition might make the code messy.
Can someone please tell me what to do?
static List<RealTime> getBatch()
{
    var config = new ConsumerConfig
    {
        BootstrapServers = ConfigurationManager.AppSettings["BootstrapServers"],
        GroupId = ConfigurationManager.AppSettings["ConsumerGroupID"],
        AutoOffsetReset = AutoOffsetReset.Earliest,
    };
    List<RealTime> results = new List<RealTime>();
    List<string> malformedJson = new List<string>();
    using (var consumer = new ConsumerBuilder<Ignore, string>(config).Build())
    {
        consumer.Subscribe("RealTimeTopic");
        int count = 0;
        while (count < batchSize)
        {
            var consumerResult = consumer.Consume(1000);
            if (consumerResult?.Message is null)
            {
                break;
            }
            Console.WriteLine("read");
            try
            {
                RealTime item = JsonSerializer.Deserialize<RealTime>(consumerResult.Message.Value);
                results.Add(item);
                count += 1;
            }
            catch(Exception e)
            {
                Console.WriteLine("malformed");
                malformedJson.Add(consumerResult.Message.Value);
            }
        }
        consumer.Close();
    };
    Console.WriteLine(malformedJson.Count);
    return results;
}
I found a workaround.
For some reason the consumer first needs to be called without a timeout, which means it will wait until it gets at least one message. After that, calling Consume with a timeout of zero fetches the rest of the messages one by one from the internal queue. This seems to work best.
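The shape of that workaround can be sketched as follows. This is a Python model that uses the standard library queue.Queue as a stand-in for the client's internal message queue; it is not the Confluent.Kafka API, and get_batch with its parameters is a hypothetical name. It blocks for the first message, then drains whatever is already queued without waiting, up to the batch size.

```python
import queue
from typing import List, Optional


def get_batch(q: "queue.Queue[str]", batch_size: int,
              first_timeout: Optional[float] = None) -> List[str]:
    """Block for the first message, then drain the rest without waiting."""
    batch: List[str] = []
    try:
        # First call: wait (indefinitely when first_timeout is None).
        batch.append(q.get(timeout=first_timeout))
    except queue.Empty:
        return batch  # nothing arrived within first_timeout
    while len(batch) < batch_size:
        try:
            batch.append(q.get_nowait())  # the "timeout zero" consume
        except queue.Empty:
            break  # local queue drained: return a partial batch
    return batch


q: "queue.Queue[str]" = queue.Queue()
for i in range(5):
    q.put(f"msg-{i}")

print(get_batch(q, batch_size=3))
print(get_batch(q, batch_size=3))
```

A likely reason the first call has to wait is that the consumer must join the group and receive its partition assignment before any message can arrive, so a very short first timeout expires too early. Passing a finite but generous first_timeout instead of waiting forever would also let the loop exit when the topic is empty.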
I had a similar problem; updating the Confluent.Kafka and librdkafka libraries from version 1.8.2 to 2.0.2 helped.

Microsoft's MPEG-2 demuxer filter - can I change an elementary stream pin's PID while the graph is running?

I'm working with multi-program UDP MPEG-2 TS streams that, unfortunately, dynamically re-map their elementary stream PIDs at random intervals. The stream is being demuxed using Microsoft's MPEG-2 demultiplexer filter.
I'm using the PSI-Parser filter (an example filter included in the DirectShow base classes) in order to react to the PAT/PMT changes.
The code reacts properly to the change, yet I am experiencing some odd crashes (heap memory corruption) right after I remap the demuxer pins to their new IDs. (The re-mapping is performed inside the thread that processes graph events, while the EC_PROGRAMCHANGED message is being handled.)
The crash could be due to faulty code on my part, yet I have not found any reference that tells me whether changing the pin PID mapping is safe while the graph is running.
Can anyone tell me whether this operation is safe and, if it is not, what I could do to minimize capture disruption?
I managed to find the source code for a Windows CE version of the demuxer filter. Inspecting it, indeed, it seems that it is safe to remap a pin while the filter is running.
I also managed to find the source of my problems with the PSI-Parser filter.
When a new transport stream is detected, or the PAT version changes, the PAT is flushed (all programs are removed, and the table is re-parsed and repopulated).
There is a subtle bug within the CPATProcessor::flush() method.
//
// flush
//
// flush an array of struct: m_mpeg2_program[];
// and unmap all PMT_PIDs pids, except one: PAT
BOOL CPATProcessor::flush()
{
    BOOL bResult = TRUE;
    bResult = m_pPrograms->free_programs(); // CPrograms::free_programs() call
    if(bResult == FALSE)
        return bResult;
    bResult = UnmapPmtPid();
    return bResult;
}// flush
Here's the CPrograms::free_programs() implementation.
_inline BOOL free_programs()
{
    for(int i= 0; i<m_ProgramCount; i++){
        if(!HeapFree(GetProcessHeap(), 0, (LPVOID) m_programs[i] ))
            return FALSE;
    }
    return TRUE;
}
The problem here is that the m_ProgramCount member is never cleared. So, apart from reporting the wrong number of programs in the table after a flush (since it is incremented for each program found in the table), the next time the table is flushed it will try to release memory that was already released.
Here's my updated version that fixes the heap corruption errors:
_inline BOOL free_programs()
{
    for(int i= 0; i<m_ProgramCount; i++){
        if(!HeapFree(GetProcessHeap(), 0, (LPVOID) m_programs[i] ))
            return FALSE;
    }
    m_ProgramCount = 0; // This was missing; without it, the next call frees the same memory twice
    return TRUE;
}

Libcurl and HTTP Pipelining

Libcurl offers the CURLOPT_HEADERFUNCTION and CURLOPT_WRITEFUNCTION callbacks. That's great until you use pipelining and multistack. How do you correlate the header with the body? Let's say tons of requests and a bunch of easy handles cause libcurl to establish multiple connections to the server. Let's assume the first response header arrives, and there is a delay in receiving the body. In the meantime, the second header shows up along with its body. Does libcurl ensure that the second header is not delivered to the application until the first response is complete?
This is important because the header needs to be associated with the body. I am in the same predicament even when I don't use HEADERFUNCTION: even if I use just the WRITEFUNCTION, it could receive the replies out of order and interleaved. So the question is: does libcurl ensure that each response is delivered as a whole? If there is a single connection, we can be sure that the response order will follow the request order. But I see libcurl making multiple connections when I use pipelining and multistack. Let's say 5 connections are made to the same server. The response header for Conn1 arrives, and before we get the body from Conn1, we get the response header from Conn2. Does libcurl ensure that the Conn2 response header is not delivered to the application before the body from Conn1? Otherwise the following code will break.
class CEasyHandle
{
    CURL* m_pCurl;
    bool m_bInUse;
};
class CMultiStack
{
public:
    CURLM* m_pCurlMulti;
    deque<CEasyHandle*>& m_listEasyHandles;
    static CEasyHandle* gpCurrentlyReceivingEasyHandle;
    CEasyHandle* GetAvailableEasyHandle()
    {
        // Iterate through m_listEasyHandles and find one that is currently not added to multistack (m_bInUse)
        // if none free, return NULL
    }
    bool MakeRequest(const char* pUrl)
    {
        CEasyHandle* pEasyHandle = GetAvailableEasyHandle();
        if(!pEasyHandle) pEasyHandle = CreateNewEasyHandleAndAddToList();
        curl_easy_setopt(pEasyHandle->m_pCurl, CURLOPT_HEADERFUNCTION, header_callback);
        curl_easy_setopt(pEasyHandle->m_pCurl, CURLOPT_HEADERDATA, pEasyHandle); // header gets the EasyHandle
        curl_easy_setopt(pEasyHandle->m_pCurl, CURLOPT_WRITEFUNCTION, write_callback);
        curl_easy_setopt(pEasyHandle->m_pCurl, CURLOPT_WRITEDATA, this); // body gets MultiStack
        // set options, add to multistack, pEasyHandle->m_bInUse = true;
    }
    static size_t header_callback(char *buffer, size_t size, size_t nmemb, void *userdata)
    {
        gpCurrentlyReceivingEasyHandle = (CEasyHandle*)userdata;
        // if no data expected, of course set gpCurrentlyReceivingEasyHandle->m_bInUse = false;
    }
    static size_t write_callback(char *ptr, size_t size, size_t nmemb, void *userdata)
    {
        CMultiStack* pThisObj = (CMultiStack*)userdata;
        pThisObj->PerformSomeWork();
        // once complete, gpCurrentlyReceivingEasyHandle->m_bInUse = false;
    }
};
Why not use the CEasyHandle as the user data for the WRITEFUNCTION as well, and store a back-pointer to the CMultiStack in each CEasyHandle when you create it? Then you can always find any of the pieces you need in the write_callback, and you don't need to worry about the order.
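To illustrate the suggestion, here is a small Python model rather than libcurl code: EasyHandle, MultiStack, and transfer_done are hypothetical names, and the callbacks are called by hand to imitate interleaved delivery. In libcurl terms it corresponds to passing the same per-transfer pointer to both CURLOPT_HEADERDATA and CURLOPT_WRITEDATA, so that no global "currently receiving" handle is needed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MultiStack:
    """Hypothetical owner of all transfers; collects finished responses."""
    finished: Dict[str, "EasyHandle"] = field(default_factory=dict)


@dataclass
class EasyHandle:
    """Per-transfer state, passed as user data to BOTH callbacks."""
    url: str
    owner: MultiStack  # back-pointer to the multi stack
    headers: List[str] = field(default_factory=list)
    body: bytearray = field(default_factory=bytearray)


def header_callback(line: str, handle: EasyHandle) -> None:
    # The handle identifies the transfer, so arrival order does not matter.
    handle.headers.append(line)


def write_callback(chunk: bytes, handle: EasyHandle) -> None:
    handle.body.extend(chunk)


def transfer_done(handle: EasyHandle) -> None:
    # Reach the owner through the back-pointer instead of a global.
    handle.owner.finished[handle.url] = handle


stack = MultiStack()
a = EasyHandle("/a", stack)
b = EasyHandle("/b", stack)

# Deliveries from two connections, interleaved in arbitrary order.
header_callback("Content-Type: text/plain", a)
header_callback("Content-Type: application/json", b)
write_callback(b'{"ok": true}', b)
write_callback(b"hello", a)
transfer_done(b)
transfer_done(a)

for url, h in stack.finished.items():
    print(url, h.headers, bytes(h.body))
```

Because each callback receives the state of its own transfer, headers and body stay paired however the two responses interleave.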

Setting the Content-Type of an empty response in ASP.NET MVC

In order to support a legacy application that's in the field, I need my ASP.NET MVC app to return an empty response that also has a Content-Type. One of IIS, ASP.NET, or ASP.NET MVC is removing my Content-Type when I send back a null response. Is there any way around this?
(While not requiring an empty response with a set Content-Type would obviously be the ideal solution, the clients are already out there, and many of them cannot be upgraded.)
EDIT: Since there was a request for code: I'm proxying the request from the new web application to the one that older clients rely on. To do this, I have a subclass of ActionResult, called LegacyResult, that you can simply return for those methods that need to be handled by the old software. This is the relevant part of its code:
public override void ExecuteResult(ControllerContext context)
{
    using (var legacyResponse = GetLegacyResponse(context))
    {
        var clientResponse = context.HttpContext.Response;
        clientResponse.Buffer = false;
        clientResponse.ContentType = legacyResponse.ContentType; /* Yes, I checked that legacyResponse.ContentType is never string.IsNullOrEmpty */
        if (legacyResponse.ContentLength >= 0) clientResponse.AddHeader("Content-Length", legacyResponse.ContentLength.ToString());
        var legacyInput = legacyResponse.GetResponseStream();
        using (var clientOutput = clientResponse.OutputStream)
        {
            var rgb = new byte[32768];
            int cb;
            while ((cb = legacyInput.Read(rgb, 0, rgb.Length)) > 0)
            {
                clientOutput.Write(rgb, 0, cb);
            }
            clientOutput.Flush();
        }
    }
}
If legacyInput has data, then Content-Type is set appropriately. Otherwise, it's not. I can actually kluge the old backend to send an empty v. non-empty response for exactly the same request, and observe the difference in Fiddler.
EDIT 2: Poking around with Reflector reveals that, if headers have not been written at the time that HttpResponse.Flush is called, then Flush writes out the headers itself. The problem is that it only writes out a tiny subset of the headers. One of the missing ones is Content-Type. So it seems that, if I can force headers out to the stream, I can avoid this problem.
You have to trick the response into writing the headers, by falsely telling it there's content, then suppressing it:
/// [inside the writing block]
var didWrite = false;
while ((cb = legacyInput.Read(rgb, 0, rgb.Length)) > 0)
{
    didWrite = true;
    clientOutput.Write(rgb, 0, cb);
}
if (!didWrite)
{
    // The stream needs a non-zero content length to write the correct headers, but...
    clientResponse.AddHeader("Content-Length", "1");
    // ...this actually writes a "Content-Length: 0" header with the other headers.
    clientResponse.SuppressContent = true;
}

Process Lock Code Illustration Needed

I recently started this question in another thread (to which Reed Copsey
graciously responded) but I don't feel I framed the question well.
At the core of my question, I would like an illustration of how to gain
access to data AS it is being get/set.
I have Page.aspx.cs and, in the codebehind, I have a loop:
List<ServerVariable> files = new List<ServerVariable>();
for (i = 0; i <= Request.Files.Count - 1; i++)
{
    m_objFile = Request.Files[i];
    m_strFileName = m_objFile.FileName;
    m_strFileName = Path.GetFileName(m_strFileName);
    files.Add(new ServerVariable(i.ToString(),
        this.m_strFileName, "0"));
}
//CODE TO COPY A FILE FOR UPLOAD TO THE
//WEB SERVER
//WHEN THE UPLOAD IS DONE, SET THE ITEM TO
//COMPLETED
int index = files.FindIndex(p => p.Completed == "0");
files[index] = new ServerVariable(i.ToString(),
    this.m_strFileName, "1");
The "ServerVariable" type gets and sets ID, File, and Completed.
Now, I need to show the user the file upload "progress" (in effect,
the time between when the loop adds the ServerVariable item to the
list and when the Completed status changes from 0 to 1).
Now, I have a web service method "GetStatus()" that I would like to
use to return the files list (created above) as a JSON string (via
JQuery). Files with a completed status of 0 are still in progress,
files with a 1 are done.
MY QUESTION IS - what does the code inside GetStatus() look like? How
do I query the list as it is being populated and
return the results in real time? I have been advised that I need to lock
the working process (setting the ServerVariable data) while I query
the values returned in GetStatus() and then unlock that same process?
If I have explained myself well, I'd appreciate a code illustration of
the logic in GetStatus().
Thanks for reading.
Have a look at this link about multi threading locks.
You need to lock the object for both reads and writes.
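As a rough illustration of locking on both paths, here is a minimal Python sketch; it is not ASP.NET code, and UploadStatus, add, mark_completed, and upload_worker are hypothetical names. The C# analogue would be a lock statement around every access to the shared list. The writer takes the lock whenever it changes the list, and get_status takes the same lock just long enough to copy a snapshot, which it then serializes to JSON outside the lock.

```python
import json
import threading
from typing import Dict, List


class UploadStatus:
    """Hypothetical shared status list, guarded by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._files: List[Dict[str, str]] = []

    def add(self, file_id: str, name: str) -> None:
        with self._lock:  # write path
            self._files.append({"id": file_id, "file": name, "completed": "0"})

    def mark_completed(self, file_id: str) -> None:
        with self._lock:  # write path
            for entry in self._files:
                if entry["id"] == file_id:
                    entry["completed"] = "1"

    def get_status(self) -> str:
        with self._lock:  # read path: copy under the lock...
            snapshot = [dict(entry) for entry in self._files]
        return json.dumps(snapshot)  # ...and serialize outside it


status = UploadStatus()


def upload_worker() -> None:
    for i, name in enumerate(["a.txt", "b.txt"]):
        status.add(str(i), name)
        # (the file copy would happen here)
        status.mark_completed(str(i))


worker = threading.Thread(target=upload_worker)
worker.start()
print(status.get_status())  # may show any intermediate state
worker.join()
print(status.get_status())
```

Copying under the lock keeps the critical section short, so status polling does not hold up the upload loop for long. The sketch assumes the list lives somewhere both the upload request and the status request can reach, since fields on a page instance do not survive across requests.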

Resources