Progress while deserializing JSON - json.net

I'm deserializing a huge JSON file (1.4 GB) via a stream, because I don't want to load the whole content into memory in advance just for parsing. That works fine, but it takes ~80 seconds, so I want to display a progress indicator.
public JObject DeserializeViaStream(string filename)
{
    object obj;
    var serializer = new JsonSerializer();
    using (var sr = new StreamReader(new FileStream(filename, FileMode.Open)))
    {
        using (var jsonTextReader = new JsonTextReader(sr))
        {
            obj = serializer.Deserialize(jsonTextReader);
        }
    }
    return (JObject) obj;
}
I haven't tried anything yet; my only idea so far is to implement my own stream reader that keeps track of the bytes being read and compares that to the file length.
Is there a built-in option or easier way to do this?

I ended up using the idea I had. Luckily, there's already a ProgressStream by Mel Green available via archive.org; the original URL is no longer reachable.
Please note:
this approach may not work in all situations or with all libraries, because the Seek() operation provides random access: a consumer could jump around or read parts of the file multiple times, which would throw off a progress calculation based on bytes read.
I can't post the source code here, because it was released under an unclear license.
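Since I can't post that code, here's a minimal sketch of the same idea, written from scratch (this is my own illustration, not Mel Green's implementation): a read-only Stream wrapper that forwards every call to an inner stream and raises an event each time bytes are consumed.

using System;
using System.IO;

// Sketch only: wraps an inner stream and reports position/length
// after each read so a caller can display progress.
public class ProgressStream : Stream
{
    private readonly Stream _inner;

    public ProgressStream(Stream inner) { _inner = inner; }

    // (bytesConsumed, totalBytes) - drive a progress display from this.
    public event Action<long, long> ProgressChanged;

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = _inner.Read(buffer, offset, count);
        ProgressChanged?.Invoke(_inner.Position, _inner.Length);
        return read;
    }

    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => _inner.CanSeek;
    public override bool CanWrite => false;
    public override long Length => _inner.Length;
    public override long Position
    {
        get => _inner.Position;
        set => _inner.Position = value;
    }

    public override void Flush() => _inner.Flush();
    public override long Seek(long offset, SeekOrigin origin) => _inner.Seek(offset, origin);
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}

Wiring it into the method above is just a matter of wrapping the FileStream, e.g. new StreamReader(new ProgressStream(new FileStream(filename, FileMode.Open))), and subscribing to ProgressChanged. The caveat above still applies: if the consumer seeks backwards and re-reads, Position stops being a monotonic measure of progress.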

Related

Downsides of streaming large JSON or HTML content to a browser in ASP.NET MVC

I am working with a legacy ASP.NET Framework MVC application that is experiencing memory problems, such as occasional bursts of OutOfMemoryExceptions across a variety of operations. The application frequently works with large lists of objects (sometimes tens to hundreds of megabytes) and then serializes them to JSON to return to the client. We are not totally sure what the source of the OutOfMemoryExceptions is, but a likely candidate is memory fragmentation caused by too many large objects going onto the Large Object Heap.
We are thinking a quick win is to refactor some of the controller endpoints to serialize their JSON content using a stream writer (as outlined in the JSON.NET documentation) and to stream the content back to the browser. This won't eliminate the memory load of the data lists prior to serialization, but in theory it should cut down the overall amount of data going onto the LOH.
The code is written to send the results in chunks of less than 85 KB, keeping the buffers below the Large Object Heap threshold:
public async Task<ActionResult> MyControllerMethod()
{
    var data = GetData();
    Response.BufferOutput = false;
    Response.ContentType = "application/json";
    var serializer = JsonSerializer.Create();
    using (var sw = new StreamWriter(Response.OutputStream, Encoding.UTF8, 84999))
    {
        sw.AutoFlush = false;
        serializer.Serialize(sw, data);
        await sw.FlushAsync();
    }
    return new EmptyResult();
}
I am aware of a few downsides with this approach, but don't consider them showstoppers:
Unit testing is more complex because of the EmptyResult returned by the controller.
I have read there is a small overhead due to a PInvoke call whenever data is flushed (in practice I haven't noticed this).
The content can't be post-processed using, e.g., an HttpHandler.
A Content-Length header can't be set, which may be useful for the client in some cases.
What other downsides or potential problems exist with this approach?

How to view contents of HTMLTextWriter?

I have an HtmlTextWriter object (theWriter) being passed into a method. This method is in the middle tier. I'd like to read the contents of theWriter in debug mode. The method is:
protected override void Render (HtmlTextWriter theWriter) {...}
which is an override in a class that inherits from Panel (a WebControl).
I've tried
theWriter.Flush();
theWriter.InnerWriter.ToString();
but that only outputs the object type: "System.Web.HttpWriter". I've seen some examples use methods on the Response object. But I don't have access to Response in this layer. Any ideas?
The InnerWriter is a TextWriter-derived class, which writes to a stream. You will have to open that stream and read data from it. Whether you can open and read from that stream is an open question, and depends very much on what type of stream it is.
So to use your example, theWriter.InnerWriter is an object derived from TextWriter. But you don't know what kind, and TextWriter itself doesn't expose the underlying stream.
Now, if InnerWriter is a StreamWriter, then you might be able to write:
var sWriter = theWriter.InnerWriter as StreamWriter;
if (sWriter != null) // the 'as' cast returns null if it isn't a StreamWriter
{
    var stream = sWriter.BaseStream;
    var savePosition = stream.Position;
    stream.Position = 0;
    // now, you can read the stream
    // when you're done reading the stream, be sure to reset its position
    stream.Position = savePosition;
}
You have to be very careful, though. If you get the base stream and then open it with a StreamReader, closing the StreamReader will close the underlying stream. Then your HtmlTextWriter will throw an exception the next time you try to write to it.
It's also possible that you won't be able to read the stream. If the base stream is a NetworkStream, for example, you can't read it. Or it could be a FileStream that was open for write only. There's no good general way to do this, as it entirely depends not only on the specific TextWriter-derived class, but also on the stream that the TextWriter is writing to.
For example, the HtmlTextWriter could be writing to a StreamWriter, which is connected to a BufferedStream connected to a GZipStream, which finally writes to a MemoryStream.
So, in general, I'd recommend that you look for some other solution to your problem. Unless you know for sure what the underlying stream is, and that you can read it ... and that things won't change on you unexpectedly.
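For what it's worth, if the goal is only to inspect the markup the control produces, one such alternative (my own sketch, not part of the original answer) is to let the control render into an HtmlTextWriter you create over a StringWriter, inspect the buffered string, and then forward it to the real writer:

protected override void Render(HtmlTextWriter theWriter)
{
    var sb = new System.Text.StringBuilder();
    using (var stringWriter = new System.IO.StringWriter(sb))
    using (var bufferWriter = new HtmlTextWriter(stringWriter))
    {
        base.Render(bufferWriter); // render into our own buffer
    }

    var html = sb.ToString(); // inspect this in the debugger
    theWriter.Write(html);    // pass the markup along unchanged
}

This avoids touching the framework's underlying stream entirely, so none of the caveats above apply.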

Read Request Body in ASP.NET

How does one read the request body in ASP.NET? I'm using the REST Client add-on for Firefox to form a GET request for a resource on a site I'm hosting locally, and in the Request Body I'm just putting the string "test" to try to read it on the server.
In the server code (which is a very simple MVC action) I have this:
var reader = new StreamReader(Request.InputStream);
var inputString = reader.ReadToEnd();
But when I debug into it, inputString is always empty. I'm not sure how else (such as in FireBug) to confirm that the request body is indeed being sent properly, I guess I'm just assuming that the add-on is doing that correctly. Maybe I'm reading the value incorrectly?
Maybe I'm misremembering my schooling, but I think GET requests don't actually have a body. This page states:
The HTML specifications technically define the difference between "GET" and "POST" so that the former means that form data is to be encoded (by a browser) into a URL while the latter means that the form data is to appear within a message body.
So maybe you're doing things correctly, but you have to POST data in order to have a message body?
Update
In response to your comment, the most "correct" RESTful way would be to send each of the values as its own parameter:
site.com/MyController/MyAction?id=1&id=2&id=3...
Then your action will auto-bind these if you give it an array parameter by the same name:
public ActionResult MyAction(int[] id) {...}
Or if you're a masochist you can maybe try pulling the values out of Request.QueryString one at a time.
I was recently reminded of this old question, and wanted to add another answer for completeness based on more recent implementations in my own work.
For reference, I've blogged on the subject recently.
Essentially, the heart of this question was, "How can I pass larger and more complex search criteria to a resource to GET a filtered list of objects?" And it ended up boiling down to two choices:
A bunch of GET query string parameters
A POST with a DTO in the request body
The first option isn't ideal, because implementation is ugly and the URL will likely exceed a maximum length at some point. The second option, while functional, just didn't sit right with me in a "RESTful" sense. After all, I'm GETting data, right?
However, keep in mind that I'm not just GETting data. I'm creating a list of objects. Each object already exists, but the list itself doesn't. It's a brand new thing, created by issuing search/filter criteria to the complete repository of objects on the server. (After all, remember that a collection of objects is still, itself, an object.)
It's a purely semantic difference, but a decidedly important one. Because, at its simplest, it means I can comfortably use POST to issue these search criteria to the server. The response is data which I receive, so I'm "getting" data. But I'm not "GETting" data in the sense that I'm actually performing an act of creation, creating a new instance of a list of objects which happens to be composed of pre-existing elements.
I'll fully admit that the limitation was never technical, it was just semantic. It just never "sat right" with me. A non-technical problem demands a non-technical solution, in this case a semantic one. Looking at the problem from a slightly different semantic viewpoint resulted in a much cleaner solution, which happened to be the solution I ended up using in the first place.
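To make the second option concrete, here's a hypothetical sketch of the POST-a-DTO shape (the type name, its properties, and the FindProducts helper are invented for illustration):

// DTO carrying the search/filter criteria in the request body.
public class ProductSearchCriteria
{
    public string[] Categories { get; set; }
    public decimal? MaxPrice { get; set; }
    public int Page { get; set; }
}

[HttpPost]
public ActionResult Search(ProductSearchCriteria criteria)
{
    // Semantically, this creates a new list resource composed of
    // pre-existing objects, which is why POST fits.
    var results = FindProducts(criteria); // FindProducts is assumed
    return Json(results);
}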
Aside from the GET/POST issue, I did discover that you need to set the Request.InputStream position back to the start; I found that out thanks to this answer.
Specifically, the comment:
Request.InputStream // make sure to reset the Position after reading or later reads may fail
Which I translated into
Request.InputStream.Seek(0, SeekOrigin.Begin);
I would try using HttpClient (available via NuGet) for doing this type of thing. It's so much easier than the System.Net objects.
Reading directly from Request.InputStream is dangerous, because a re-read can return nothing even though the data exists. This has been verified in practice.
Reliable reading is performed as follows:
/* Returns a string representing the content of the body
   of the HTTP request. */
public static string GetFromBodyString(this HttpRequestBase request)
{
    string result = string.Empty;

    if (request == null || request.InputStream == null)
        return result;

    request.InputStream.Position = 0;

    /* Copy into a MemoryStream so the original input stream is left
       intact; the body of the current HTTP request may need to be
       read more than once. */
    using (MemoryStream memoryStream = new MemoryStream())
    {
        request.InputStream.CopyToMemoryStream(memoryStream);
        memoryStream.Position = 0; // rewind before reading back
        using (StreamReader streamReader = new StreamReader(memoryStream))
        {
            result = streamReader.ReadToEnd();
        }
    }
    return result;
}
/* Copies bytes from the given source stream into the
   destination MemoryStream. */
public static void CopyToMemoryStream(this Stream source, MemoryStream destination)
{
    if (source.CanSeek)
    {
        int pos = (int)destination.Position;
        int length = (int)(source.Length - source.Position) + pos;
        destination.SetLength(length);
        // read directly into the MemoryStream's backing buffer
        while (pos < length)
            pos += source.Read(destination.GetBuffer(), pos, length - pos);
    }
    else
    {
        source.CopyTo(destination);
    }
}

Dispose a stream in a BizTalk pipeline component?

I'm fairly new to BizTalk and creating a custom pipeline component. I have seen code in examples that are similar to the following:
public void Disassemble(IPipelineContext pContext, IBaseMessage pInMsg)
{
    Stream originalDataStream = pInMsg.BodyPart.GetOriginalDataStream();
    StreamReader strReader = new StreamReader(originalDataStream);
    string strOriginalData = strReader.ReadToEnd();
    byte[] bufferOriginalMessage = new byte[strOriginalData.Length];
    bufferOriginalMessage = ASCIIEncoding.Default.GetBytes(strOriginalData);
    Stream ms = new MemoryStream();
    ms.Write(bufferOriginalMessage, 0, strOriginalData.Length);
    //other stuff here
    ms.Seek(0, SeekOrigin.Begin);
    pInMsg.BodyPart.Data = ms;
}
But nowhere in the method is the StreamReader being closed or disposed. The method simply exits.
Normally when using StreamReader and other classes, it is best practice to use a using statement so that the stream is automatically disposed.
Is there a particular reason (perhaps in BizTalk) why you wouldn't dispose this StreamReader?
I have not found any information on this point. Can anyone help?
In general, yes, it's good practice to close readers and streams you don't need anymore. That said, it might not be 100% required every time. For example, closing the reader would normally close the underlying stream, but chances are something else is already aware of that stream and will close it at the right time on its own.
What is good practice, however, is to add any streams you use in a pipeline component with a lifetime matching that of the message to the resource tracker, so that BizTalk can dispose them automatically when the pipeline execution finishes and the message has been processed.
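As a sketch, applying that to the Disassemble method above would look something like this (AddResource is the actual IResourceTracker method; exactly which objects to register is my assumption):

// Let BizTalk dispose these when pipeline execution finishes and the
// message has been fully processed, rather than closing them here
// while downstream components may still need the data.
pContext.ResourceTracker.AddResource(strReader);
pContext.ResourceTracker.AddResource(ms);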

ASP.NET creating thumbnails server side

It looks deceptively easy to use System.Drawing to create thumbnails in your ASP.NET application. But MSDN tells you:
Classes within the System.Drawing namespace are not supported for use within a Windows or ASP.NET service. Attempting to use these classes from within one of these application types may produce unexpected problems, such as diminished service performance and run-time exceptions.
I'm seeing intermittent 'out of memory' errors within this type of GDI+ code. I'm beginning to suspect this is the cause.
How ARE people doing server side image manipulation? Can anyone recommend any alternative that WON'T blow up my server?
The relevant code is below. The exception intermittently happens in System.Drawing.Graphics.DrawImage. I've just inherited this project, so I'd need to check the logs to see how often this is being hit / how often we get an exception...
public byte[] Resize(int newWidth, int newHeight, Image originalImage)
{
    Bitmap bitmap = new Bitmap(newWidth, newHeight);
    Graphics g = Graphics.FromImage(bitmap);
    g.InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.HighQualityBicubic;
    Rectangle r = new Rectangle(0, 0, newWidth, newHeight);
    g.DrawImage(originalImage, r, r.X, r.Y, originalImage.Width, originalImage.Height, GraphicsUnit.Pixel);

    MemoryStream stream = new MemoryStream();
    bitmap.Save(stream, ImageFormat.Jpeg);

    // clean up memory leaks
    if (bitmap != null)
    {
        bitmap.Dispose();
        bitmap = null;
    }
    if (g != null)
    {
        g.Dispose();
        g = null;
    }

    return stream.ToArray();
}
UPDATE: I've searched through the whole project for anywhere we are using GDI+ and put using() { } around everything that's IDisposable. I haven't seen one 'out of memory' exception since I did this.
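For reference, the resize method with using blocks, roughly as the update describes (my reconstruction, not the poster's exact code):

public byte[] Resize(int newWidth, int newHeight, Image originalImage)
{
    using (var bitmap = new Bitmap(newWidth, newHeight))
    using (var g = Graphics.FromImage(bitmap))
    using (var stream = new MemoryStream())
    {
        g.InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.HighQualityBicubic;
        var r = new Rectangle(0, 0, newWidth, newHeight);
        g.DrawImage(originalImage, r, 0, 0, originalImage.Width, originalImage.Height, GraphicsUnit.Pixel);
        bitmap.Save(stream, ImageFormat.Jpeg);
        // the GDI+ objects are disposed even if DrawImage throws
        return stream.ToArray();
    }
}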
Assuming you will be doing "stuff" per request, the issues might be:
Processor-intensive operations: manipulating images can take time.
If you are saving files, there will be disk I/O to contend with.
You can consider using HTTP handlers.
Disposing System.Drawing objects should be a priority (a using() {} statement).
Asynchronous pages can be explored here.
Why don't you set up a separate worker server that is exposed through a web service?
I would recommend you put some exception handling code around these operations so that you're guaranteed to dispose of your GDI+ objects. It's good practice to close your streams too, although to my knowledge the MemoryStream object is managed, so it should close itself when GC'd.