Azure Managing Interrupted/partial blob uploads - asp.net

I'm uploading blobs to my Azure cloud storage in the following way. The problem I'm facing is that if a user exits the web application or the upload gets interrupted, the partially uploaded blob remains in storage. What is the recommended way of handling interrupted blob uploads in Azure?
Code:
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(cloudString);
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference(containerName);
container.CreateIfNotExists();
container.SetPermissions(
    new BlobContainerPermissions
    {
        PublicAccess = BlobContainerPublicAccessType.Blob
    });
CloudBlockBlob blockBlob = container.GetBlockBlobReference(uniqueBlobName);
blockBlob.UploadFromByteArray(f, 0, f.Length);

When it comes to uploading files as block blobs, there are two possible scenarios:
File is uploaded without being split into chunks - Say a user uploads a file without splitting it into chunks and closes the browser in the middle of the upload process. In this case nothing needs to be done, because the blob has not been saved in blob storage yet.
File is uploaded in chunks - This is the case with large files, where the upload happens in chunks. Assume a scenario where some chunks are uploaded and then the user terminates the upload process. In this case, there are two possible solutions:
1) You do nothing - If you don't do anything, chunks that are uploaded but not committed get deleted by the storage service automatically after 7 (or 14) days. The downside of this approach is that you pay for those bytes during that period.
2) You programmatically delete uncommitted blobs - You can get a list of uncommitted blobs in a container and delete them. One thing I would suggest is that you only target uncommitted blobs that have not been modified for a certain time, so that you're not deleting blobs which are still being uploaded (see the sketch after the update below).
UPDATE
I had a chance to play with uncommitted blobs. When you list blobs with BlobListingDetails.UncommittedBlobs, it returns both committed and uncommitted blobs. One way to identify an uncommitted blob is by checking its ETag property. In my little experiment, I found that the ETag property is null and the blob length is 0 bytes for an uncommitted blob.
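If you want to clean these up proactively (option 2 above), here is a minimal sketch, assuming container is the same CloudBlobContainer as in the question's code and the same legacy Microsoft.WindowsAzure.Storage SDK (plus System.Linq for OfType). A blob that has only uncommitted blocks typically can't be deleted directly, so a common workaround is to commit an empty block list first, which discards the uncommitted blocks, and then delete the resulting zero-length blob:
// List both committed and uncommitted blobs in the container (flat listing)
var blobs = container.ListBlobs(null, true, BlobListingDetails.UncommittedBlobs);
foreach (CloudBlockBlob blob in blobs.OfType<CloudBlockBlob>())
{
    // Per the observation above, uncommitted blobs show a null ETag and zero length
    if (blob.Properties.ETag != null)
        continue;
    // Commit an empty block list (discards uncommitted blocks), then delete the blob
    blob.PutBlockList(new List<string>());
    blob.DeleteIfExists();
}
As suggested above, you would also want some signal of how long the upload has been sitting there (for example a timestamp encoded in the blob name) so that in-progress uploads are left alone.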

Related

How can I set limit on the amount of storage that each user can upload to Firebase Storage?

I am struggling to find out how to set the limit of the storage that each user can upload to my app's storage.
I found a Storage.storageLimitInBytes method online, but I don't see it mentioned in the Firebase docs, let alone instructions on how to set it.
In general, how do startups monitor how many images users upload? Would they have a field in the user's document such as amountOfImagesUploaded: and increment that count every time the user uploads an image, so they can see who abuses the storage that way?
Or would I have a similar document that tracks each user's uploads per day, and take action on a user when the count reaches 100 or something?
I would really appreciate your help regarding this issue that I am facing.
Limits in Cloud Storage for Firebase security rules apply to each file/object separately, they don't apply to an entire operation.
You can limit what a user can upload through Firebase Storage's security rules. For example, this (from the linked docs) is a way to limit the size of uploaded files:
service firebase.storage {
  match /b/<your-firebase-storage-bucket>/o {
    // Only allow uploads of any image file that's less than 5MB
    match /images/{imageId} {
      allow write: if request.resource.size < 5 * 1024 * 1024
                   && request.resource.contentType.matches('image/.*');
    }
  }
}
But there is currently no way in these rules to limit the number of files a user can upload.
Some options to consider:
If you hardcode the names of the files that the user uploads (which also implies you'll limit the number of files they can upload), and create a folder for each specific user's files, you can determine the sum of all files in a user's folder, and thus limit on that sum.
For example, if you fix the file names and limit the allowed names to be numbered 1..5, the user can only ever have five files in storage:
match /public/{userId}/{imageId} {
  allow write: if imageId.matches("[1-5]\.txt");
}
Alternatively, you can ZIP all the files together on the client and then upload the resulting archive. In that case, the security rules can enforce the maximum size of that file.
And of course you can include client-side JavaScript code to check the maximum size of the combined files in both of these cases. A malicious user can bypass this JavaScript easily, but most users aren't malicious and will thank you for saving their bandwidth by preventing an upload that would be rejected anyway.
You can also use an HTTPS Cloud Function as your upload target, and then only pass the files on to Cloud Storage if they meet your requirements. Alternatively, you can use a Cloud Function that triggers on the upload from the user and validates the files for that user after the change.
For example, you could upload the files through a Cloud Function/server and keep track of the total size each user has uploaded. For that:
1) Upload the image to your server.
2) Check its size and add it to the total size stored in a database.
3) If the user has exceeded 150 GB, return a quota-exceeded error; otherwise upload to Firebase Storage (user -> server -> Firebase Storage).
An easier alternative would be to use Cloud Storage Triggers, which will trigger a Cloud Function every time a new file is uploaded. You can check the object size using the metadata and keep adding it in the database. In this case, you can store the total storage used by a user in custom claims, in bytes.
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();
exports.updateTotalUsage = functions.storage.object().onFinalize(async (object) => {
    // Assumes the upload path starts with the owner's uid, e.g. "{uid}/photo.jpg"
    const uid = object.name.split('/')[0];
    const user = await admin.auth().getUser(uid);
    const used = (user.customClaims && user.customClaims.size) || 0;
    // Update the custom claim "size" (total storage used, in bytes)
    await admin.auth().setCustomUserClaims(uid, { size: used + Number(object.size) });
});
Then you can write a security rule that checks that the sum of the size of the new object and the total storage already used does not exceed 150 GB:
allow write: if request.resource.size + request.auth.token.size < 150 * 1024 * 1024 * 1024;
You can also have a look at this thread if you need per-user storage validation. The solution is a little bit tricky, but it can be done:
https://medium.com/@felipepastoree/per-user-storage-limit-validation-with-firebase-19ab3341492d
Google Cloud (and the Firebase environment) doesn't know your users. It knows your application, and your application does.
If you want per-user statistics, you have to log that data somewhere and perform sums/aggregations to get your metrics.
A usual way is to use Firestore to store that information and to increment the number of files or the total space used.
A less usual solution is to log each action to Cloud Logging and set up a sink from Cloud Logging to BigQuery, then compute your metrics with aggregations directly in BigQuery (the latency is higher; it all depends on what you want to achieve, i.e. a sync or async check of those metrics).

Corda Attachment Flow

Please find my questions below:
1) In my local setup I don't have a network map, so do maxTransactionSize and maxMessageSize need to be made part of the extraConfig in deployNodes for each node?
2) Let's say I have a 100MB Excel file which I zip and then upload to the node using rpc.uploadAttachment; the SecureHash received will then be added to a transaction. After successful completion of the transaction, will both parties have the attachment, or will the receiver get the file only when they open the attachment?
3) If it's when the receiver opens the attachment: it's requested from the sender, the file travels over the network, reaches the receiver and is stored in the H2 DB for future use. If the attachment is required later, can the blob be served directly from the DB?
4) Now where and how does attachmentContentCacheSizeMegaBytes come into the picture? Since we are already storing the attachment in the H2 DB, where is it used? As a blob limit on the node_attachment table?
5) Also, is the file ever stored on the file system at the time of upload to the node, or does it get stored directly in the H2 DB?
1) maxTransactionSize and maxMessageSize are set by the network operator, and individual nodes cannot modify them. This is for compatibility reasons: all the nodes on the network need to be able to handle the largest possible transaction to ensure they can resolve any transactions they receive.
2) The receiver node downloads the attachment immediately, not when it first opens the attachment.
3) N/A
4) The attachmentContentCacheSizeMegaBytes node configuration option is optional and specifies how much memory should be used to cache attachment contents in memory. It defaults to 10MB.
5) The attachment is stored in the node's database as a blob when it is first uploaded.

Avro "Invalid sync!" exception in nextRawBlock()

I am streaming an Avro-encoded file over the network from an S3-compatible object store and trying to read it and put it into some data structure.
Issue: The issue I sometimes face (once or twice every day or two on a test node when running continuously) is that halfway through the file it hits an "Invalid sync!" exception in the nextRawBlock() method of the DataFileStream class.
I would like to find the root cause of this and fix it. I have been trying to reproduce it in a test app but have been unable to do so. I am looking for ideas on:
what might potentially cause this?
any better ways of reproducing this?
More details:
a) The Avro file is not downloaded to disk; I get a handle to the file stream using S3ObjectInputStream, feed it to the DataFileStream constructor and then read from the stream directly.
b) The app tries to read records from the Avro-encoded file in batches of 500 records at a time.
c) The file contains a header section containing a long count and a KV map of String to Integer. After that it contains an array of records, where each record contains a String and a long array. The schema uses Avro's union construct to enable this.
d) The number of records in the file is around 5M on average.
e) This entire download happens in a separate thread, not as part of any user request.
f) The file is uploaded to the store by a separate process.
Other observations:
a) Upon failure the app closes the stream and tries to download and read the stream again. What I observe is that this pushes the node into a high old-gen state, slowing down user requests.

Firebase Push keys as Firebase Storage file names?

I noticed that to use Firebase Storage (Google Cloud Storage) I need to come up with a unique file name to upload a file.
I then plan to keep a copy of that Storage file location (the https URL or gs URL) in the Firebase Realtime Database, where clients will be able to read it and download the file separately.
However, I am unable to come up with unique filenames for the files located in Firebase Storage. Using a UUID generator might cause collisions in my case, since several clients are uploading images to a single Firebase root.
Here's my plan; I'd like to know if it will work.
Let's call my Firebase root "Chatrooms", which consists of the keys chatroom_1, chatroom_2 ... chatroom_n.
Under chatroom_k I have a node called "Content", which stores push keys that are uniquely generated by Firebase to store content. Each push key represents a piece of content, but the actual content is stored in Firebase Storage, and a key called URL references the URL of the actual content. Can the filename for this content in Firebase Storage have the same randomized push key, as long as the bucket hierarchy represents chatroom_k?
I am not sure Storage provides a push() function, but a suggestion would be the following:
Request a push() to a random location in your Firebase database and use this key as the name.
In any case you will probably need to store this name in the database too.
In my application I have a node called "photos" where I store the information about the images I upload. I first do a push() to get a new key, and I use this key to rename the uploaded image.
Is this what you need, or did I misunderstand something?
So I had the same problem and I reached this solution:
I named the files with the time and date plus the user's uid, so it is almost impossible to have two files with the same name, and they will be different every single time.
DateFormat dtForm = new SimpleDateFormat("yyyy.MM.dd.HH.mm.ss.");
String date = dtForm.format(Calendar.getInstance().getTime());
String fileName = date + FirebaseAuth.getInstance().getCurrentUser().getUid();
FirebaseStorage
.getInstance()
.getReference("Folder/SubFolder/" + fileName)
.putFile(yourData)
With this, the file names are going to look like "2022.09.12.11.50.59.WEFfwds2234SA11", for example.

storing files as byte array in db, security risk?

We have an asp.net application that allows users to upload files; the files are saved to a temporary disk location and later attached to a record and saved in the DB.
My question pertains to security and/or virus issues. Are there any security holes in this approach? Can a virus cause harm if it is never executed (the file is saved, then opened using a FileStream, converted to a byte array and saved to the DB)?
Later, when the file is needed we stream the file back to user.
The files are saved to a folder on the web server like this:
context.Request.Files[0].SaveAs(savePath); // savePath points to a folder under app_data/files
Later, when the same user creates a record, we grab the file from disk and store it in the DB like this:
using (FileStream fileStream = File.OpenRead(currentFilePath))
{
    byte[] ba = new byte[fileStream.Length];
    int len = fileStream.Read(ba, 0, ba.Length); // note: a single Read call may return fewer bytes than requested
    // ba saved to DB here as varbinary(max)
}
We limit the files that can be uploaded to this list:
List<string> supportedExtensions = new List<string>(10) {".txt", ".xls", ".xlsx", ".doc", ".docx", ".eps", ".jpg", ".jpeg", ".gif", ".png", ".bmp", ".rar", ".zip", ".rtf", ".csv", ".psd", ".pdf" };
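For illustration, here is a minimal sketch of how that whitelist might be enforced in the upload handler before the file is saved (the handler context and savePath are assumed from the snippets above; an extension check only narrows what is accepted, it does not make the content safe):
HttpPostedFile postedFile = context.Request.Files[0];
string extension = Path.GetExtension(postedFile.FileName).ToLowerInvariant();
if (!supportedExtensions.Contains(extension))
{
    context.Response.StatusCode = 400; // reject anything not on the whitelist
    return;
}
postedFile.SaveAs(savePath); // temporary location under app_data/files, as above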
The file is streamed back to the user's web browser like this:
//emA = entity object loaded from DB
context.Response.AppendHeader("Content-Disposition", "inline; filename=\"" + emA.FileName + "\"");
context.Response.AddHeader("Content-Type", emA.ContentType);
context.Response.BinaryWrite(emA.FileContent);
There's always a security risk when accepting files from unknown users. Anyone could potentially embed a virus written in VBA (Visual Basic for Applications) in the Office documents.
Your approach is no more or less of a security risk than saving them directly on the file system or directly in the database except for one concern...
If the files are saved to the disk, they can be scanned by traditional virus scanners. As far as I know most virus scanners don't scan files that are stored in a DB as a byte array.
If it were my server, I would be storing them on the file system for performance reasons, not security reasons, and you can bet I would have them scanned by a virus scanner if I were allowing potentially dangerous files, such as office documents, executables, etc.
Have your users create logins before allowing them to upload files; unchecked anonymous access of this kind is asking for trouble. Not that this is a solution in and of itself, but like all good security systems it can form an extra layer :-)
I can't see there being any more security risk than saving the files to disk. The risks here often have nothing to do with where you store the data, since, as you've already pointed out, the stored file doesn't get executed.
The risk is usually in how the data is transferred. Worms exploit circumstances that allow what was just data on its way through the system to be treated as if it were code and start being executed. Such exploits do not require that any notion of a "file" being transferred be present; in the past a specially formatted URL could suffice.
That said, I've never understood the desire to store large binary data in a SQL database. Why not just save the files on disk and store the file path in the DB? You can then use features such as WriteFile or URL rewriting to get IIS to do what it's good at.
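For example, if only the path (plus the original name and content type) were stored in the DB, the download handler above could be reduced to a sketch like this (filePath, fileName and contentType are assumed to come from the saved record):
context.Response.ContentType = contentType;
context.Response.AppendHeader("Content-Disposition", "inline; filename=\"" + fileName + "\"");
// TransmitFile streams the file from disk without buffering it in managed memory
context.Response.TransmitFile(filePath);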
