Stream on the fly zipped files to client via rest endpoint - .net-core

I am trying to stream files, zipping them on the fly, but memory consumption is high. For example, zipping a total file size of 2.8 GB takes nearly 5 GB of process memory.
[Route("zip")]
public class ZipController : ControllerBase
{
private readonly HttpClient _httpClient;
public ZipController()
{
_httpClient = new HttpClient();
}
[HttpPost]
public async Task Zip([FromBody] JsonToZipInput input)
{
Response.ContentType = "application/octet-stream";
Response.Headers.Add($"Content-Disposition", $"attachment; filename=\"{input.FileName}\"");
using var zipArchive =
new ZipArchive(Response.BodyWriter.AsStream(), ZipArchiveMode.Create);
foreach (var (key, value) in input.FilePathsToUrls)
{
var zipEntry = zipArchive.CreateEntry(key, CompressionLevel.Optimal);
await using var zipStream = zipEntry.Open();
await using var stream = await _httpClient.GetStreamAsync(value);
await stream.CopyToAsync(zipStream);
}
}
}

I believe you should be able to call Response.StartAsync:
[HttpPost]
public async Task Zip([FromBody] JsonToZipInput input)
{
    Response.ContentType = "application/octet-stream";
    Response.Headers.Add("Content-Disposition", $"attachment; filename=\"{input.FileName}\"");
    await Response.StartAsync();

    using var zipArchive = new ZipArchive(Response.BodyWriter.AsStream(), ZipArchiveMode.Create);

    foreach (var (key, value) in input.FilePathsToUrls)
    {
        var zipEntry = zipArchive.CreateEntry(key, CompressionLevel.Optimal);
        await using var zipStream = zipEntry.Open();
        await using var stream = await _httpClient.GetStreamAsync(value);
        await stream.CopyToAsync(zipStream);
    }
}
StartAsync starts sending the response. Note that neither the response headers nor the status code can be modified once StartAsync has been called.
In particular, this means that your exception handling will be different. Previously, an exception (e.g., from a bad URL in the request) would produce an error status code (i.e., 500). With a streaming response, exceptions thrown after StartAsync cannot change the status code; it has already been sent. Instead, the client will see the connection terminated without a clean close. Complicating this further, web servers not uncommonly terminate connections this way even in the successful case, so clients may not complain; they would just end up with truncated (invalid) zip files. (In zip files, the "file table" comes at the end rather than the beginning, so truncation isn't detectable until the archive is parsed.)
So, this should work, but I also recommend:
Ensure your exception logging works for exceptions after StartAsync. There is no way to return error details to the client, so you must rely on logging.
If you control the client, test out this new error situation, and see if you can detect it. If it's not detectable using that client, then ensure your code validates the zip.
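For the first point, a minimal sketch of what that logging could look like, assuming an injected ILogger&lt;ZipController&gt; (the _logger field and the message text are illustrative, not from the original code):

try
{
    using var zipArchive = new ZipArchive(Response.BodyWriter.AsStream(), ZipArchiveMode.Create);
    foreach (var (key, value) in input.FilePathsToUrls)
    {
        var zipEntry = zipArchive.CreateEntry(key, CompressionLevel.Optimal);
        await using var zipStream = zipEntry.Open();
        await using var stream = await _httpClient.GetStreamAsync(value);
        await stream.CopyToAsync(zipStream);
    }
}
catch (Exception ex)
{
    // The status code has already been sent, so a log entry is the only record of the failure.
    _logger.LogError(ex, "Zip streaming failed after the response started");
    throw;
}

For the second point, a client can detect truncation by attempting to open the downloaded archive; a truncated zip is missing its central directory and fails to parse. A sketch (the IsValidZip helper is hypothetical):

static bool IsValidZip(string path)
{
    try
    {
        // Opening the archive forces the central directory at the end of the file to be read.
        using var archive = System.IO.Compression.ZipFile.OpenRead(path);
        return archive.Entries.Count >= 0;
    }
    catch (InvalidDataException)
    {
        return false;
    }
}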

Nothing about the zip file format should require a large amount of memory for this use case. It's essentially all the files in order, with a table at the end describing the zip structure and file offsets. This makes it possible to stream very efficiently without using much memory at all.
You may not need to write this yourself: ZipStreamer is a microservice you host that does exactly this (disclosure: I'm the author). It's designed to solve the exact problems you are hitting by streaming the bytes out as soon as they come in, with a fixed buffer size to prevent blowing up memory. It can stream hundreds of zip files in parallel using only a few MB of memory.
If you need this to be part of your application, here are some suggestions.
Disabling compression will save CPU and a bit of memory. Depending on your files, compression might not be a major benefit (JPEGs can actually get bigger after zip compression). If you're zipping just to combine many files into one, this will really help; the one-line change is sketched after these suggestions. But it doesn't explain using GB of memory.
Ensure you're not holding the stream content any longer than you need to, and it looks like you are. Start streaming back ASAP, as @Stephen suggested with StartAsync.
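For the compression suggestion, the change is one line in the loop from the question: pass CompressionLevel.NoCompression so entries are stored rather than deflated, and the zip acts as a plain container:

foreach (var (key, value) in input.FilePathsToUrls)
{
    // Store the entry as-is; skips the deflate work entirely.
    var zipEntry = zipArchive.CreateEntry(key, CompressionLevel.NoCompression);
    await using var zipStream = zipEntry.Open();
    await using var stream = await _httpClient.GetStreamAsync(value);
    await stream.CopyToAsync(zipStream);
}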

Related

c# Ftp file Upload Async | Sync

I have an ASP.NET Web API that uploads certain documents from end users to an FTP location on the server.
The website is public and will have many concurrent users, so many uploads can be invoked at once.
I want to tell the user whether the upload succeeded or failed.
I'm using the following code to upload:
FtpWebRequest clsRequest = (FtpWebRequest)System.Net.WebRequest.Create(fileName);
clsRequest.Credentials = new NetworkCredential(ftpUsername, ftpPassword);
clsRequest.Method = WebRequestMethods.Ftp.UploadFile;
using (System.IO.Stream clsStream = clsRequest.GetRequestStream())
{
    clsStream.Write(bytes, 0, bytes.Length);
    clsStream.Close();
    clsStream.Dispose();
}
The code runs, but I'm worried about performance and stability with concurrent users.
What's the best approach for performance, stability, and server health with concurrent users, while still returning success/failure to each user?
Do I need to use async calls instead, and how should the code be changed?
Thanks
Here is the async version of your code:
FtpWebRequest clsRequest = (FtpWebRequest)System.Net.WebRequest.Create(fileName);
clsRequest.Credentials = new NetworkCredential(ftpUsername, ftpPassword);
clsRequest.Method = WebRequestMethods.Ftp.UploadFile;
using (System.IO.Stream clsStream = await clsRequest.GetRequestStreamAsync())
{
    await clsStream.WriteAsync(bytes, 0, bytes.Length);
    // These calls are not required, since the stream is disposed by the using statement:
    //clsStream.Close();
    //clsStream.Dispose();
}
Remember that your method must be marked with the async keyword and return a Task if it would otherwise be a void method. Also, I have to mention that this is client-side code (your Web API is the FTP client here); making it async doesn't affect the FTP server's performance, though it does free up threads on your own web server while uploads are in flight.
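Putting that together, here is a sketch of a Web API action that reports success or failure to each caller. The route, the UploadModel type, and the ftpUsername/ftpPassword fields are illustrative assumptions, not from the original post:

[HttpPost]
[Route("api/documents/upload")]
public async Task<IHttpActionResult> Upload(UploadModel model)
{
    try
    {
        byte[] bytes = model.Content;
        var clsRequest = (FtpWebRequest)System.Net.WebRequest.Create(model.TargetFileName);
        clsRequest.Credentials = new NetworkCredential(ftpUsername, ftpPassword);
        clsRequest.Method = WebRequestMethods.Ftp.UploadFile;

        using (System.IO.Stream clsStream = await clsRequest.GetRequestStreamAsync())
        {
            await clsStream.WriteAsync(bytes, 0, bytes.Length);
        }

        return Ok();                      // tell the user the upload succeeded
    }
    catch (WebException ex)
    {
        return InternalServerError(ex);   // tell the user the upload failed
    }
}

Because the action awaits the FTP I/O instead of blocking, request threads stay free to serve other concurrent users, which addresses the stability concern.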

Telegram "API development tools" limits

I'm trying to use my application (with TLSharp), but suddenly, when calling the TelegramClient.SendCodeRequestAsync function, I get this exception:
"Flood prevention. Telegram now requires your program to do requests
again only after 84894 seconds have passed (TimeToWait property). If
you think the culprit of this problem may lie in TLSharp's
implementation, open a Github issue "
After waiting for 84894 seconds, it shows this message again.
(I waited and tried several times, but the message doesn't change. :( )
Someone told me that it's a Telegram limit. Is that right?
Do you have a better idea for sending a message/file to a Telegram account?
It might be a late answer, but it can be used as a reference. The first problem is that the Telegram API doesn't let a phone number send a code request more than 5 times a day. The second problem is the shared session file that TelegramClient uses by default. So you should create a custom session manager to keep each phone number's session in a separate .dat file:
public class CustomSessionStore : ISessionStore
{
    public void Save(Session session)
    {
        var dir = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Sessions");
        if (!Directory.Exists(dir))
        {
            Directory.CreateDirectory(dir);
        }

        var file = Path.Combine(dir, "{0}.dat");
        using (FileStream fileStream = new FileStream(string.Format(file, session.SessionUserId), FileMode.OpenOrCreate))
        {
            byte[] bytes = session.ToBytes();
            fileStream.Write(bytes, 0, bytes.Length);
        }
    }

    public Session Load(string sessionUserId)
    {
        var dir = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Sessions");
        if (!Directory.Exists(dir))
        {
            Directory.CreateDirectory(dir);
        }

        var file = Path.Combine(dir, "{0}.dat");
        string path = string.Format(file, sessionUserId);
        if (!File.Exists(path))
            return null;

        var buffer = File.ReadAllBytes(path);
        return Session.FromBytes(buffer, this, sessionUserId);
    }
}
then create your TelegramClient like this:
var client = new TelegramClient(apiId, apiHash, new CustomSessionStore(), phoneNumber);
I guess you are closing and restarting your application many times, or repeating this method. After 10 attempts, the Telegram API makes you wait about 24 hours to prevent flooding.
It's a Telegram limit. My advice: wait 2-3 minutes between calls to SendCodeRequestAsync().

IIS Worker process using 6gb RAM on web server with ASP.NET MVC web site

I have a web site running in its own Application Pool (IIS 8). Settings for the pool are default i.e. recycle every 29 hours.
Our web server only has 8gb RAM and I have noticed that the worker process for this web site regularly climbs to 6gb RAM and slows the server to a crawl. This is the only site currently on the web server.
I also have SQL Server Express 2016 installed. The site is using EF version 6.1.3.
The MVC site is very straightforward. It has a GETPDF controller which finds a row in a table, gets the PDF data stored in a field, then serves it back to the browser as follows:
using (eBillingEntities db = new eBillingEntities())
{
    try
    {
        string id = model.id;
        string emailaddress = Server.HtmlEncode(model.EmailAddress).ToLower().Trim();
        eBillData ebill = db.eBillDatas.ToList<eBillData>().Where(e => e.PURL == id && e.EmailAddress.ToLower().Trim() == emailaddress).FirstOrDefault<eBillData>();
        if (ebill != null)
        {
            // update the 'Lastdownloaded' field.
            ebill.LastDownloaded = DateTime.Now;
            db.eBillDatas.Attach(ebill);
            var entry = db.Entry(ebill);
            entry.Property(en => en.LastDownloaded).IsModified = true;
            db.SaveChanges();

            // Find out from the config record whether the bill is stored in the table or in the local pdf folder.
            Config cfg = db.Configs.ToList<Config>().Where(c => c.Account == ebill.Account).FirstOrDefault<Config>();
            bool storePDFDataInEBillTable = true;
            if (cfg != null)
            {
                storePDFDataInEBillTable = cfg.StorePDFDataInEBillDataTable;
            }
            // End of Modification

            byte[] file;
            if (storePDFDataInEBillTable)
            {
                file = ebill.PDFData;
            }
            else
            {
                string pathToFile = "";
                if (string.IsNullOrEmpty(cfg.LocalPDFDataFolder))
                    pathToFile = cfg.LocalBackupFolder;
                else
                    pathToFile = cfg.LocalPDFDataFolder;
                if (!pathToFile.EndsWith(@"\"))
                    pathToFile += @"\";
                pathToFile += ebill.PDFFileName;
                file = System.IO.File.ReadAllBytes(pathToFile);
            }

            MemoryStream output = new MemoryStream();
            output.Write(file, 0, file.Length);
            output.Position = 0;
            HttpContext.Response.AddHeader("content-disposition", "attachment; filename=ebill.pdf");
            return new FileStreamResult(output, "application/pdf");
        }
        else
            return View("PDFNotFound");
    }
    catch
    {
        return View("PDFNotFound");
    }
}
Are there any memory leaks here?
Will the file byte array and the memory stream get freed up?
Also, is there anything else I need to do concerning clearing up the entity framework references?
If the code looks OK, where would be a good place to start looking?
Regards
Are there any memory leaks here?
No.
Will the file byte array and the memory stream get freed up?
Eventually, yes. But that may be the cause of your excessive memory use.
Also, is there anything else I need to do concerning clearing up the entity framework references?
No.
If the code looks OK, where would be a good place to start looking?
If this code is the cause of your high memory use, it's because you are loading entire files into memory. And you're holding two copies of each file in memory: once in a byte[], and once copied into a MemoryStream.
There's no need to do that.
To eliminate the second copy of the file, use the MemoryStream(byte[]) constructor instead of copying the bytes from the byte[] into an empty MemoryStream.
To eliminate the first copy in memory, you can stream the data into a temporary file that will be the target of your FileStreamResult, or initialize the FileStreamResult with an ADO.NET stream.
See https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sqlclient-streaming-support
If you go with ADO.NET streaming, your DbContext will need to be scoped to your Controller instead of being a local variable, which is a good practice in any case.
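As a sketch of the first fix against the question's code (the surrounding action is unchanged; this only replaces the MemoryStream copy at the end):

// Wrap the existing byte[] directly; no second in-memory copy is made.
HttpContext.Response.AddHeader("content-disposition", "attachment; filename=ebill.pdf");
return new FileStreamResult(new MemoryStream(file), "application/pdf");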
In addition to David's advice, I noticed that I was doing the following:
db.eBillDatas.ToList<eBillData>()
So I was loading the whole table from the database into memory and then filtering it in memory with the Where clause.
I didn't notice the problem until the database started to fill up.
I removed that part, and now the IIS worker process sits at about 100 MB.
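For reference, a sketch of the corrected lookup; dropping ToList lets EF translate the filter to SQL, so only the matching row is materialized (names are taken from the question's code):

// Where/FirstOrDefault now execute in SQL Server instead of in memory.
eBillData ebill = db.eBillDatas
    .Where(e => e.PURL == id && e.EmailAddress.ToLower().Trim() == emailaddress)
    .FirstOrDefault();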

Get length of data from inputstream opened by content-resolver

I am uploading videos (and also photos) to a server using HttpURLConnection.
I have the Uri of a video. I open an InputStream this way:
InputStream inputStream = context.getContentResolver().openInputStream(uri);
As the video file is pretty big, I can't buffer the data while writing it into the output stream, so I need to use the setFixedLengthStreamingMode(contentLength) method of HttpURLConnection. But it requires the content length.
The question is: how do I get the length of the video?
Please don't suggest getting a file path. On some devices that works, but it often fails (especially on Android 6); as they say, a Uri doesn't necessarily represent a file.
I have also stumbled into situations where, after opening the device gallery (with an Intent), I receive the Uri of a picture but fail to get a file path from it. So I believe resolving a Uri to a file path is not a reliable approach.
Try something like this:
void uploadVideo() {
    InputStream inputStream = context.getContentResolver().openInputStream(uri);

    // Your connection.
    HttpURLConnection connection;

    // Do connection setup, setDoOutput etc.
    // Be sure that the server is able to handle
    // chunked transfer encoding.
    connection.setChunkedStreamingMode(0);

    OutputStream connectionOs = connection.getOutputStream();

    // Read and write a 4 KiB chunk at a time.
    byte[] buffer = new byte[4096];
    int bytesRead;
    while ((bytesRead = inputStream.read(buffer)) != -1) {
        connectionOs.write(buffer, 0, bytesRead);
    }

    // Close streams, do connection etc.
}
UPDATE: added setChunkedStreamingMode

How often should I open/close my Booksleeve connection?

I'm using the Booksleeve library in a C#/ASP.NET 4 application. Currently the RedisConnection object is a static object across my MonoLink class. Should I be keeping this connection open, or should I be open/closing it after each query/transaction (as I'm doing now)? Just slightly confused. Here's how I'm using it, as of now:
public static MonoLink CreateMonolink(string URL)
{
    redis.Open();
    var transaction = redis.CreateTransaction();
    string Key = null;

    try
    {
        var IncrementTask = transaction.Strings.Increment(0, "nextmonolink");
        if (!IncrementTask.Wait(5000))
        {
            transaction.Discard();
            throw new System.TimeoutException("Monolink index increment timed out.");
        }

        // Increment complete
        Key = string.Format("monolink:{0}", IncrementTask.Result);
        var AddLinkTask = transaction.Strings.Set(0, Key, URL);
        if (!AddLinkTask.Wait(5000))
        {
            transaction.Discard();
            throw new System.TimeoutException("Add monolink creation timed out.");
        }

        // Run the transaction
        var ExecTransaction = transaction.Execute();
        if (!ExecTransaction.Wait(5000))
        {
            throw new System.TimeoutException("Add monolink transaction timed out.");
        }
    }
    catch
    {
        transaction.Discard();
        throw; // rethrow without resetting the stack trace
    }
    finally
    {
        redis.Close(false);
    }

    // Link has been added to redis
    MonoLink ml = new MonoLink();
    ml.Key = Key;
    ml.URL = URL;
    return ml;
}
Thanks in advance for any responses/insight. Also, is there any sort of official documentation for this library? Thank you, S.O. ^_^
According to the author of Booksleeve,
The connection is thread safe and intended to be massively shared;
don't do a connection per operation.
Should I be keeping this connection open, or should I be open/closing
it after each query/transaction (as I'm doing now)?
There is probably a little overhead if you open a new connection each time you want to make a query/transaction, and although redis is designed for a high number of concurrently connected clients, there might be performance problems if that number reaches the tens of thousands. As far as I know, connection pooling has to be done by the client library (redis itself doesn't have this functionality), so you should check whether BookSleeve supports it. Otherwise, you should open the connection when your application starts and keep it open for the application's lifetime (in case you don't need parallel clients connected to redis for some reason).
Also, is there any sort of official documentation for this library?
The only documentation I was able to find on how to use it was the tests folder in its source code.
For reference (continuing @bzlm's answer), I created a Singleton that always provides the same Redis connection using BookSleeve (if the connection is closed, it's created; otherwise, the existing connection is served).
Look at this: https://stackoverflow.com/a/8777999/290343
You consume it like that:
RedisConnection connection = Redis.RedisConnectionGateway.Current.GetConnection();
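For completeness, here is a minimal sketch of what such a gateway can look like. This is a simplification of the linked answer; the class name, the "localhost" host, and the exact ConnectionState checks are assumptions you'd adapt to your BookSleeve version:

public static class Redis
{
    private static readonly object SyncLock = new object();
    private static RedisConnection _connection;

    public static RedisConnection GetConnection()
    {
        lock (SyncLock)
        {
            // Re-create the shared connection only if it is missing or no longer usable.
            if (_connection == null
                || _connection.State == RedisConnectionBase.ConnectionState.Closing
                || _connection.State == RedisConnectionBase.ConnectionState.Closed)
            {
                _connection = new RedisConnection("localhost");
                _connection.Wait(_connection.Open()); // block until the handshake completes
            }

            return _connection;
        }
    }
}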
