Uploading multiple HttpPostedFileBase using Parallel.ForEach breaking files - asp.net

I have a form that uploads multiple files. My model has a List<HttpPostedFileBase> called SchemaFileBases, which is bound correctly. I need to upload these files to S3 and would like to do it in parallel. I'm unable to use async and await because this code is run from both ASP.NET and a queue-based application that currently doesn't have async/await support (working on it).
If I change the foreach below to Parallel.ForEach(this.SchemaFileBases, schemaFileBase => {... then the files get mashed together: after the upload, each file contains some of the other file's content. AwsDocument is used in parallel elsewhere, so I don't think it is the cause. Each AwsDocument has its own AmazonS3Client.
public override void UploadToS3(IMetadataParser parser)
{
string hash;
string key;
foreach (var schemaFileBase in this.SchemaFileBases)
{
AwsDocument aws = new AwsDocument(AwsBucket.Received);
hash = schemaFileBase.InputStream.Md5Hash().ToByteArray().ToHex();
key = String.Format("{0}/{1}", this.S3Prefix, schemaFileBase.FileName);
Stream inputStream = schemaFileBase.InputStream;
aws.UploadToS3(key, inputStream, hash);
}
}
My coworker suspects it has something to do with how the InputStream on HttpPostedFileBase is implemented. Perhaps it is not thread safe, and the streams are both reading from the original request at the same time? I can't imagine MS would do that though.
Multi-threaded version:
public override void UploadToS3(IMetadataParser parser)
{
Parallel.ForEach(this.SchemaFileBases, f =>
{
AwsDocument aws = new AwsDocument(AwsBucket.Received);
string hash = f.InputStream.Md5Hash().ToByteArray().ToHex();
string key = String.Format("{0}/{1}", this.S3Prefix, f.FileName);
Stream inputStream = f.InputStream;
aws.UploadToS3(key, inputStream, hash);
});
}
The version above is my attempt at multi-threading it. It does not work: the uploaded files end up with their contents mixed together.
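One way to test the coworker's theory is to take the request streams out of the parallel part entirely. The following is a minimal sketch, assuming (not confirmed here) that the posted-file streams are not safe to read concurrently. It copies each stream into its own byte array on a single thread, then runs only the uploads in parallel. BufferedParallelUpload and the upload delegate are hypothetical names; the delegate stands in for a call such as aws.UploadToS3(key, stream, hash).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class BufferedParallelUpload
{
    // Copies every source stream into its own byte array, one at a time,
    // on the calling thread. No two source streams are read concurrently.
    public static List<KeyValuePair<string, byte[]>> BufferSequentially(
        IEnumerable<KeyValuePair<string, Stream>> files)
    {
        var buffered = new List<KeyValuePair<string, byte[]>>();
        foreach (var file in files)
        {
            using (var copy = new MemoryStream())
            {
                file.Value.CopyTo(copy);
                buffered.Add(new KeyValuePair<string, byte[]>(file.Key, copy.ToArray()));
            }
        }
        return buffered;
    }

    // Runs the (hypothetical) upload delegate in parallel. Each invocation
    // gets a private, read-only MemoryStream over its own buffer.
    public static void UploadInParallel(
        IEnumerable<KeyValuePair<string, byte[]>> buffered,
        Action<string, Stream> upload)
    {
        Parallel.ForEach(buffered, file =>
        {
            using (var stream = new MemoryStream(file.Value, writable: false))
            {
                upload(file.Key, stream);
            }
        });
    }
}
```

The trade-off is memory: every file is held in full before its upload starts, so this only suits uploads that fit comfortably in RAM.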

Related

How to process Excel file in memory?

I am trying to create an API that will accept the representation of an Excel file from the client. I wish to return a List<List<string>> as JSON array after processing the first sheet. However, I cannot write the file to disk, and all processing must happen in-memory. What are the ways in which this can be achieved?
I've tried referring to various solutions on the internet but all of them involve writing the file to disk and then using that file for further processing. I'm open to solutions that involve
Accepting base-64 representation of the file from the POST request body
Accepting file as part of multipart/form-data request
Any other standard request formats that accept files
The only condition is that the API should return a JSON array representation of the spreadsheet.
Here I send the file as part of a multipart/form-data request to an API written in .NET Core. It supports the .xlsx, .xls and .csv formats.
Use the ExcelDataReader and ExcelDataReader.DataSet NuGet packages to read the file and convert it to a DataSet.
One problem I ran into on .NET Core, and its solution:
By default, ExcelDataReader throws a NotSupportedException "No data is available for encoding 1252." on .NET Core.
To fix this, add a dependency on the System.Text.Encoding.CodePages package and register the code page provider when the API starts:
System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
This is required to parse strings in binary BIFF2-5 Excel documents encoded with DOS-era code pages. These encodings are registered by default in the full .NET Framework, but not on .NET Core.
public ActionResult ExcelOrCsvToArray()
{
if (Request.Form.Files.Count > 0)
{
IFormFile file = Request.Form.Files[0];
string fileName = file.FileName;
string fileContentType = file.ContentType;
System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
Stream stream = file.OpenReadStream();
try
{
if (fileName.EndsWith(".csv"))
{
using (var reader = ExcelReaderFactory.CreateCsvReader(stream))
{
var result = SetAsDataSet(reader);
DataTable table = result.Tables[0];
return new OkObjectResult(table);
}
}
else
{
using (var reader = ExcelReaderFactory.CreateReader(stream))
{
var result = SetAsDataSet(reader);
DataTable table = result.Tables[0];
return new OkObjectResult(table);
}
}
}
catch (Exception e)
{
return new BadRequestObjectResult(e);
}
}
else
{
return new BadRequestResult();
}
}
private DataSet SetAsDataSet(IExcelDataReader reader)
{
var result = reader.AsDataSet(new ExcelDataSetConfiguration()
{
ConfigureDataTable = (_) => new ExcelDataTableConfiguration()
{
UseHeaderRow = true,
}
});
return result;
}
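The question asks for a List<List<string>>, while the action above returns the DataTable itself. A minimal sketch of the conversion follows; DataTableRows and ToRowLists are hypothetical names, not part of ExcelDataReader. Because UseHeaderRow is true in the configuration above, the header row becomes column names and does not appear among the rows.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DataTableRows
{
    // Converts every data row into a list of strings.
    // Null and DBNull cells become empty strings.
    public static List<List<string>> ToRowLists(DataTable table)
    {
        var rows = new List<List<string>>();
        foreach (DataRow row in table.Rows)
        {
            rows.Add(row.ItemArray
                .Select(cell => (cell is null || cell is DBNull)
                    ? string.Empty
                    : (Convert.ToString(cell) ?? string.Empty))
                .ToList());
        }
        return rows;
    }
}
```

In the action, this would mean returning new OkObjectResult(DataTableRows.ToRowLists(table)) instead of the table, which serializes as a JSON array of arrays.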

How to use "Azure storage blobs" for POST method in controller

I am creating an app where a user can upload a text file and find out its most used word.
I have tried to follow this doc to get used to the idea of using Azure Storage blobs - https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-dotnet
But I am a complete beginner and am having a hard time figuring out how to adapt those blob methods for my POST method.
This is my pseudocode: what I think I need in my controller and what needs to happen when the POST method is triggered.
a. No need for DELETE or PUT, since this app neither replaces nor deletes data
b. Maybe a GET method is needed, but as soon as the POST method is triggered, it should pass the text content to the FE component
POST method
connect with the Azure storage account
if it is the first POST, create a container to store the text file
a. How can I connect to the existing container if the container has already been created? I found this, but it is for the old CloudBlobContainer, not the new SDK version 12.
.GetContainerReference($"{containerName}");
upload the text file to the container
get the chosen file's text content and return
And here is my controller.
public class HomeController : Controller
{
private IConfiguration _configuration;
public HomeController(IConfiguration Configuration)
{
_configuration = Configuration;
}
public IActionResult Index()
{
return View();
}
[HttpPost("UploadText")]
public async Task<IActionResult> Post(List<IFormFile> files)
{
if (files != null)
{
try
{
string connectionString = Environment.GetEnvironmentVariable("AZURE_STORAGE_CONNECTION_STRING");
BlobServiceClient blobServiceClient = new BlobServiceClient(connectionString);
string containerName = "textdata" + Guid.NewGuid().ToString();
BlobContainerClient containerClient = await blobServiceClient.CreateBlobContainerAsync(containerName);
//Q. How to write a if condition here so if the POST method has already triggered and container already created, just upload the data. Do not create a new container?
string fileName = //Q. how to get the chosen file name and replace it with a newly assigned name?
string localFilePath = //Q. how to get the local file path so I can pass on to the FileStream?
BlobClient blobClient = containerClient.GetBlobClient(fileName);
using FileStream uploadFileStream = System.IO.File.OpenRead(localFilePath);
await blobClient.UploadAsync(uploadFileStream, true);
uploadFileStream.Close();
string data = System.IO.File.ReadAllText(localFilePath, Encoding.UTF8);
//Q. If I use fetch('Home').then... from FE component, will it receive this data? in which form will it receive? JSON?
return Content(data);
}
catch
{
//Q. how to use StorageException for the error messages
}
finally
{
//Q. what is suitable to execute in finally? return the Content(data) here?
if (files != null)
{
//files.Close();
}
}
}
//Q. what to pass on inside of the Ok() in this scenario?
return Ok();
}
}
Q1. How can I check if the POST method has been already triggered, and created the Container? If so how can I get the container name and connect to it?
Q2. Should I give a new assigned name to the chosen file? How can I do so?
Q3. How can I get the chosen file's name so I can pass in order to process Q2?
Q4. How to get the local file path so I can pass on to the FileStream?
Q5. How to return the Content data and pass to the FE? by using fetch('Home').then... like this?
Q6. How can I use StorageException for the error messages?
Q7. What is suitable to execute in finally? return the Content(data) here?
Q8. What to pass on inside of the Ok() in this scenario?
Any help is welcomed! I know I asked a lot of Qs here. Thanks a lot!
Update: here is some sample code; modify it as needed.
[HttpPost]
public async Task<IActionResult> SaveFile(List<IFormFile> files)
{
if (files == null || files.Count == 0) return Content("file not selected");
string connectionString = "xxxxxxxx";
BlobServiceClient blobServiceClient = new BlobServiceClient(connectionString);
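// Note: the GUID suffix yields a new container on every request; use a fixed name to reuse one container.
string containerName = "textdata" + Guid.NewGuid().ToString();
BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient(containerName);
await containerClient.CreateIfNotExistsAsync();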
foreach (var file in files)
{
//use this line of code to get file name
string fileName = Path.GetFileName(file.FileName);
BlobClient blobClient = containerClient.GetBlobClient(fileName);
//directly read file content
using (var stream = file.OpenReadStream())
{
await blobClient.UploadAsync(stream);
}
}
//other code
return View();
}
Original answer:
When using List<IFormFile>, you should use foreach code block to iterate each file in the list.
Q2. Should I give a new assigned name to the chosen file? How can I do so?
If you want to keep the file's original name, read it inside the foreach statement like below:
foreach (var file in myfiles)
{
string fileName = Path.GetFileName(file.FileName);
//other code
}
And if you want to assign a new file name when uploaded to blob storage, you should define the new name in this line of code: BlobClient blobClient = containerClient.GetBlobClient("the new file name").
Q3. How can I get the chosen file's name so I can pass in order to process Q2?
See Q2.
Q4. How to get the local file path so I can pass on to the FileStream?
An uploaded IFormFile has no local file path on the server; file.FileName is only the name sent by the client. You don't need a path anyway: read the upload directly with this line of code: Stream uploadFileStream = file.OpenReadStream().
Q5. How to return the Content data and pass to the FE? by using fetch('Home').then... like this?
It's not clear what you mean. Can you provide more details?
Q6. How can I use StorageException for the error messages
StorageException does not exist in the latest (v12) SDK; it belongs to the older SDK, so you would have to install the older package to use it. In v12, failed service calls throw Azure.RequestFailedException instead.
You can refer to this link for more details.
@Ivan's answer is what the documentation seems to recommend; however, I was having a strange issue where my stream was always prematurely closed before the upload had time to complete. To anyone else who might run into this problem, going the BinaryData route helped me. Here's what that looks like:
await using var ms = new MemoryStream();
await file.CopyToAsync(ms);
var data = new BinaryData(ms.ToArray());
await blobClient.UploadAsync(data);

Serve file from Google Storage files via asp.net core webapi

I have a website (Web API + C# + ASP.NET Core) that serves files to clients. The application exposes a method to download a file from the server.
The actual file is stored on Google Storage, so the file is first downloaded to the server (stored in a memory stream) and then returned to the caller.
This is my code:
[Route("download/{id}")]
[HttpGet]
public async Task<ActionResult> DownloadAsync(string id)
{
// Authentication...
Uri remoteFile = item.GetEnpointResponse(); // Path to file in bucket
using (StorageClient gcpStorage = await StorageClient.CreateAsync(remoteFile.GetCredentials().Credentials).ConfigureAwait(false))
{
MemoryStream ms = new MemoryStream();
await gcpStorage.DownloadObjectAsync("bucketName", "path/to/blob", ms, new DownloadObjectOptions
{
EncryptionKey = "XXX"
}).ConfigureAwait(false);
ms.Seek(0, SeekOrigin.Begin);
return File(ms, "application/json");
}
}
Two problems I found:
It stores the whole file in memory. For a large file, that is expensive.
It wastes time: the file is downloaded twice (bucket to server, then server to client) before it reaches the client.
Since this code runs many times, I wonder whether I can improve its performance. Is there something I can improve here?
Most elegant way I could find is the following, using the library AspNetCore.Proxy:
namespace Storage
{
[Route("api/storage")]
public class StorageController : Controller
{
private readonly string _bucketName;
private readonly GoogleCredential _credential;
private readonly UrlSigner _urlSigner;
public StorageController(GoogleCredential credential, string bucketName)
{
_credential = credential;
_bucketName = bucketName;
_urlSigner = UrlSigner.FromServiceAccountCredential(_credential.UnderlyingCredential as ServiceAccountCredential);
}
[HttpGet("download/{file}")]
public async Task DownloadFileAsync(string file)
{
using var storageClient = await StorageClient.CreateAsync(_credential);
var signUrl = await _urlSigner.SignAsync(
_bucketName,
file,
TimeSpan.FromHours(3),
HttpMethod.Get
);
await this.HttpProxyAsync(signUrl);
}
}
}

How do I read and update HttpResponse body using PipeWriter?

This is actually a 2-part question related directly to .net core 3.0 and specifically with PipeWriter: 1) How should I read in the HttpResponse body? 2) How can I update the HttpResponse? I'm asking both questions because I feel like the solution will likely involve the same understanding and code.
Below is how I got this working in .net core 2.2 - note that this is using streams instead of PipeWriter and other "ugly" things associated with streams - eg. MemoryStream, Seek, StreamReader, etc.
public class MyMiddleware
{
private RequestDelegate Next { get; }
public MyMiddleware(RequestDelegate next) => Next = next;
public async Task Invoke(HttpContext context)
{
var httpResponse = context.Response;
var originalBody = httpResponse.Body;
var newBody = new MemoryStream();
httpResponse.Body = newBody;
try
{
await Next(context);
}
catch (Exception)
{
// In this scenario, I would log out the actual error and am returning this "nice" error
httpResponse.StatusCode = StatusCodes.Status500InternalServerError;
httpResponse.ContentType = "application/json"; // I'm setting this because I might have a serialized object instead of a plain string
httpResponse.Body = originalBody;
await httpResponse.WriteAsync("We're sorry, but something went wrong with your request.");
return;
}
// If everything worked
newBody.Seek(0, SeekOrigin.Begin);
var response = new StreamReader(newBody).ReadToEnd(); // This is the only way to read the existing response body
httpResponse.Body = originalBody;
await context.Response.WriteAsync(response);
}
}
How would this work using PipeWriter? Eg. it seems that working with pipes instead of the underlying stream is preferable, but I can not yet find any examples on how to use this to replace my above code?
Is there a scenario where I need to wait for the stream/pipe to finish writing before I can read it back out and/or replace it with a new string? I've never personally done this, but the PipeReader examples seem to indicate that you read things in chunks and check IsCompleted.
To update the HttpResponse body:
private async Task WriteDataToResponseBodyAsync(PipeWriter writer, string jsonValue)
{
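// ask for a buffer big enough for the worst-case UTF-8 size
Memory<byte> workspace = writer.GetMemory(Encoding.UTF8.GetMaxByteCount(jsonValue.Length));
// write the data to the workspace (UTF-8, so non-ASCII characters in the JSON survive)
int bytes = Encoding.UTF8.GetBytes(
jsonValue, workspace.Span);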
// tell the pipe how much of the workspace
// we actually want to commit
writer.Advance(bytes);
// this is **not** the same as Stream.Flush!
await writer.FlushAsync();
}
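The reading half of the question (read in chunks, check IsCompleted) can be sketched with a standalone Pipe from System.IO.Pipelines. This is not ASP.NET middleware: PipeRoundTrip and its methods are hypothetical names, and the in-memory Pipe stands in for the response body pipe. In ASP.NET Core 3.0 the writer side would be HttpResponse.BodyWriter.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.IO.Pipelines;
using System.Text;
using System.Threading.Tasks;

public static class PipeRoundTrip
{
    // Encodes the text as UTF-8 straight into the pipe's buffer and flushes it.
    public static async Task WriteAsync(PipeWriter writer, string text)
    {
        Memory<byte> workspace = writer.GetMemory(Encoding.UTF8.GetMaxByteCount(text.Length));
        int bytes = Encoding.UTF8.GetBytes(text, workspace.Span);
        writer.Advance(bytes);
        await writer.FlushAsync();
    }

    // Reads chunk by chunk until the writer side has completed,
    // then decodes all collected bytes at once.
    public static async Task<string> ReadAllAsync(PipeReader reader)
    {
        using (var collected = new MemoryStream())
        {
            while (true)
            {
                ReadResult result = await reader.ReadAsync();
                ReadOnlySequence<byte> buffer = result.Buffer;
                foreach (ReadOnlyMemory<byte> segment in buffer)
                {
                    collected.Write(segment.Span);
                }
                // Mark everything in this chunk as consumed.
                reader.AdvanceTo(buffer.End);
                if (result.IsCompleted)
                {
                    break;
                }
            }
            await reader.CompleteAsync();
            return Encoding.UTF8.GetString(collected.ToArray());
        }
    }
}
```

A round trip would look like: create a Pipe, call WriteAsync(pipe.Writer, text), call pipe.Writer.CompleteAsync(), then ReadAllAsync(pipe.Reader). The bytes are decoded only at the end so that a multi-byte character split across two chunks is not corrupted.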

Returning a filestream - how to know when it's done

I have a controller which has a function that will return a file. The file is generated on the server as a temp file and then streamed via a HttpResponseMessage. What I'd like to do, is delete the file after I've finished sending it (maybe in the future we might keep them for a little while in case the exact same request is made again). I have something like this:
[HttpGet]
public HttpResponseMessage GetReport()
{
string fileName = //function that creates the file and returns the filename...
HttpResponseMessage response = new HttpResponseMessage();
response.Content = new StreamContent(new FileStream(fileName, FileMode.Open, FileAccess.Read));
response.Content.Headers.ContentDisposition = new System.Net.Http.Headers.ContentDispositionHeaderValue("attachment");
response.Content.Headers.ContentDisposition.FileName = "test.docx";
//File.Delete(fileName);
return response;
}
I can't delete the file at the commented out point above because the file is in use at that point. So is there an event or something that will be fired once the stream has finished being sent so I can handle deleting?
I could, of course, just start a task to wait some (hopefully sufficiently long) period of time and then delete, but that seems a little hit-or-miss.
Because you mentioned (potentially) keeping the files around for a while, you will need some kind of expiration architecture. Create a database table that tracks these temporary file system objects along with an expiration timestamp. Then, create a scheduled task using Windows Task Scheduler or a library like Quartz.NET to periodically query for expired objects and delete them.
I do this in my own projects for cleaning up files that were uploaded by the user but aren't necessarily used because the user canceled the encompassing process.
The tricky part is defining what constitutes a successful response. Is the response successful because the client received all the data and acted upon it? If so, then only the client has all the information necessary to determine if the data was received successfully. In this case, the client could perhaps tell the server that it (the client) received and acted upon the data. Then, the server could either delete the file immediately or mark it for expiration in the architecture I mentioned previously.
HttpResponseMessage is disposable, so my suggestion is to define your own class derived from HttpResponseMessage and override the Dispose(bool disposing) method to clean up your file.
class FileResponseMessage : HttpResponseMessage
{
    private readonly string _fileName;

    // A constructor has no return type
    public FileResponseMessage(string fileName)
    {
        _fileName = fileName;
        this.Content = new StreamContent(new FileStream(fileName, FileMode.Open, FileAccess.Read));
        this.Content.Headers.ContentDisposition = new System.Net.Http.Headers.ContentDispositionHeaderValue("attachment");
        this.Content.Headers.ContentDisposition.FileName = "test.docx";
    }

    protected override void Dispose(bool disposing)
    {
        // Dispose the content first so the FileStream releases the file
        base.Dispose(disposing);
        if (disposing)
        {
            //your cleanup, e.g. File.Delete(_fileName);
        }
    }
}
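A different technique from the answers above, offered only as a hedged alternative, is to let the stream delete the file itself. The sketch below opens the file with FileOptions.DeleteOnClose, so the file is removed when the stream is closed. It assumes the framework disposes the response content, and with it the stream, once sending has finished. TempFileStreams and OpenSelfDeleting are hypothetical names.

```csharp
using System.IO;

public static class TempFileStreams
{
    // Opens an existing file for reading; the file is deleted
    // automatically when the returned stream is closed or disposed.
    public static FileStream OpenSelfDeleting(string path)
    {
        return new FileStream(
            path,
            FileMode.Open,
            FileAccess.Read,
            FileShare.None,
            bufferSize: 4096,
            options: FileOptions.DeleteOnClose);
    }
}
```

In the controller this would replace the new FileStream(...) passed to StreamContent. It does not fit the "keep the file for a while" variant, where the expiration approach described above is the better match.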
