Get image from URL and upload to Amazon S3 - asp.net

I'd like to load an image directly from a URL but without saving it on the server, I want to upload it directly from memory to Amazon S3 server.
This is my code:
Dim wc As New WebClient
Dim fileStream As IO.Stream = wc.OpenRead("http://www.domain.com/image.jpg")
Dim request As New PutObjectRequest()
request.BucketName = "mybucket"
request.Key = "file.jpg"
request.InputStream = fileStream
client.PutObject(request)
The Amazon API gives me the error "Could not determine content length". The stream fileStream ends up as "System.Net.ConnectStream" which I'm not sure if it's correct.
The exact same code works with files from the HttpPostedFile but I need to use it in this way now.
Any ideas how I can convert the stream to become what Amazon API is expecting (with the length intact)?

I had the same problem when I'm using the GetObjectResponse() method and its propertie ResponseStream to copy a file from a folder to another in same bucket. I noted that the AWS SDK (2.3.45) have some faults like a another method called WriteResponseStreamToFile in GetObjectResponse() that simply doesn't work. These lacks of functions needs some workarounds.
I solved the problem openning the file in array of bytes and putting it in a MemoryStream object.
Try this (C# code)
WebClient wc = new WebClient();
Stream fileStream = wc.OpenRead("http://www.domain.com/image.jpg");
byte[] fileBytes = fileStream.ToArrayBytes();
PutObjectRequest request = new PutObjectRequest();
request.BucketName = "mybucket";
request.Key = "file.jpg";
request.InputStream = new MemoryStream(fileBytes);
client.PutObject(request);
The extesion method
public static byte[] ToArrayBytes(this Stream input)
{
byte[] buffer = new byte[16 * 1024];
using (MemoryStream ms = new MemoryStream())
{
int read;
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
{
ms.Write(buffer, 0, read);
}
return ms.ToArray();
}
}
You can also create a MemoryStream without an array of bytes. But after the first PutObject in S3, the MemoryStream will be discarted. If you need to put others objects, I recommend the first option
WebClient wc = new WebClient();
Stream fileStream = wc.OpenRead("http://www.domain.com/image.jpg");
MemoryStream fileMemoryStream = fileStream.ToMemoryStream();
PutObjectRequest request = new PutObjectRequest();
request.BucketName = "mybucket";
request.Key = "file.jpg";
request.InputStream = fileMemoryStream ;
client.PutObject(request);
The extesion method
public static MemoryStream ToMemoryStream(this Stream input)
{
byte[] buffer = new byte[16 * 1024];
int read;
MemoryStream ms = new MemoryStream();
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
{
ms.Write(buffer, 0, read);
}
return ms;
}

I had the same problem in a similar scenario.
The reason for the error is that to upload an object the SDK needs to know the whole content length that is going to be uploaded. To be able to obtain stream length it must be seekable, but the stream returned from WebClient is not. To indicate the expected length set Headers.ContentLength in PutObjectRequest. The SDK will use this value if it cannot determine length from the stream object.
To make your code work, obtain content length from the response headers returned by the call made by WebClient. Then set PutObjectRequest.Headers.ContentLength. Of course this relies on the server returned content length value.
Dim wc As New WebClient
Dim fileStream As IO.Stream = wc.OpenRead("http://www.example.com/image.jpg")
Dim contentLength As Long = Long.Parse(client.ResponseHeaders("Content-Length"))
Dim request As New PutObjectRequest()
request.BucketName = "mybucket"
request.Key = "file.jpg"
request.InputStream = fileStream
request.Headers.ContentLength = contentLength
client.PutObject(request)

I came up with a solution that uses UploadPart when the length is not available by any other means, plus this does not load the entire file into memory.
if (args.DocumentContents.CanSeek)
{
PutObjectRequest r = new PutObjectRequest();
r.InputStream = args.DocumentContents;
r.BucketName = s3Id.BucketName;
r.Key = s3Id.ObjectKey;
foreach (var item in args.CustomData)
{
r.Metadata[item.Key] = item.Value;
}
await S3Client.PutObjectAsync(r);
}
else
{
// if stream does not allow seeking, S3 client will throw error:
// Amazon.S3.AmazonS3Exception : Could not determine content length
// as a work around, if cannot use length property, will chunk
// file into sections and use UploadPart, so do not have to load
// entire file into memory as a single MemoryStream.
var r = new InitiateMultipartUploadRequest();
r.BucketName = s3Id.BucketName;
r.Key = s3Id.ObjectKey;
foreach (var item in args.CustomData)
{
r.Metadata[item.Key] = item.Value;
}
var multipartResponse = await S3Client.InitiateMultipartUploadAsync(r);
try
{
var completeRequest = new CompleteMultipartUploadRequest
{
UploadId = multipartResponse.UploadId,
BucketName = s3Id.BucketName,
Key = s3Id.ObjectKey,
};
// just using this size, because it is the max for Azure File Share, but it could be any size
// for S3, even a configured value
const int blockSize = 4194304;
// BinaryReader gives us access to ReadBytes
using (var reader = new BinaryReader(args.DocumentContents))
{
var partCounter = 1;
while (true)
{
byte[] buffer = reader.ReadBytes(blockSize);
if (buffer.Length == 0)
break;
using (MemoryStream uploadChunk = new MemoryStream(buffer))
{
uploadChunk.Position = 0;
var uploadRequest = new UploadPartRequest
{
BucketName = s3Id.BucketName,
Key = s3Id.ObjectKey,
UploadId = multipartResponse.UploadId,
PartNumber = partCounter,
InputStream = uploadChunk,
};
// could call UploadPart on multiple threads, instead of using await, but that would
// cause more data to be loaded into memory, which might be too much
var part2Task = await S3Client.UploadPartAsync(uploadRequest);
completeRequest.AddPartETags(part2Task);
}
partCounter++;
}
var completeResponse = await S3Client.CompleteMultipartUploadAsync(completeRequest);
}
}
catch
{
await S3Client.AbortMultipartUploadAsync(s3Id.BucketName, s3Id.ObjectKey
, multipartResponse.UploadId);
throw;
}
}

Related

Unable to copy to memorystream from gzipstream

I want to compress a binary file in memory using System.IO.Compression.GZipStream. For this, I am using the following method
public byte[] Encrypt()
{
var payload = GetPayload();
Console.WriteLine("[!] Payload Size: {0} bytes", payload.Length);
using (var compressedStream = new MemoryStream(payload))
using (var zipStream = new GZipStream(compressedStream, CompressionMode.Compress))
using (var resultStream = new MemoryStream())
{
zipStream.CopyTo(resultStream);
return resultStream.ToArray();
}
}
But while .CopyTo, I am getting System.NotSupportedException: Stream does not support reading.
You need to "inverse" your logic: create GZipStream over empty MemoryStream and copy your original content into this gzip stream:
using var compressedData = new MemoryStream();
using var gzip = new GZipStream(compressedData);
originalUncompressedStream.CopyTo(gzip); // <- "magic" happens here
gzip.Flush();
// and "rewind" result stream back to beginning (for next reads)
compressedData.Position = 0;

Extracting AES encrypted zip to MemoryStream

I am developing a process to compress and encrypt a byte array in my desktop application and send it via a WebMethod to my web application, then uncompress/unencrypt it back to a byte array. I am currently attempting to do this with SharpZipLib. The compression of the file seems to be working as expected. I am able to save the file to disk and extract it using 7zip without issue.
The problem I am having is when I receive the byte array on my web server and attempt to extract it.
I use the CompressData method to compress the data on the desktop side.
private byte[] CompressData(byte[] data, string password)
{
MemoryStream input = new MemoryStream(data);
MemoryStream ms = new MemoryStream();
ZipOutputStream os = new ZipOutputStream(ms);
os.SetLevel(9);
if (!string.IsNullOrEmpty(password)) os.Password = password;
ZipEntry entry = new ZipEntry("data")
{
DateTime = DateTime.Now
};
if (!string.IsNullOrEmpty(password)) entry.AESKeySize = 256;
os.PutNextEntry(entry);
StreamUtils.Copy(input, os, new byte[4096]);
os.CloseEntry();
os.IsStreamOwner = false;
os.Close();
ms.Position = 0;
return ms.ToArray();
}
I am using the following code to extract the data on the server end (taken almost verbatim from the SharpZipLib examples):
private byte[] DoRebuildData(byte[] data, string password)
{
MemoryStream inStream = new MemoryStream(data);
MemoryStream outputMemStream = new MemoryStream();
ZipOutputStream zipOut = new ZipOutputStream(outputMemStream)
{
IsStreamOwner = false // False stops the Close also Closing the underlying stream.
};
zipOut.SetLevel(3);
zipOut.Password = password; // optional
RecursiveExtractRebuild(inStream, zipOut);
inStream.Close();
// Must finish the ZipOutputStream to finalise output before using outputMemStream.
zipOut.Close();
outputMemStream.Position = 0;
return outputMemStream.ToArray();
}
// Calls itself recursively if embedded zip
//
private void RecursiveExtractRebuild(Stream str, ZipOutputStream os)
{
ZipFile zipFile = new ZipFile(str)
{
IsStreamOwner = false
};
foreach (ZipEntry zipEntry in zipFile)
{
if (!zipEntry.IsFile)
continue;
String entryFileName = zipEntry.Name; // or Path.GetFileName(zipEntry.Name) to omit folder
// Specify any other filtering here.
Stream zipStream = zipFile.GetInputStream(zipEntry);
// Zips-within-zips are extracted. If you don't want this and wish to keep embedded zips as-is, just delete these 3 lines.
if (entryFileName.EndsWith(".zip", StringComparison.OrdinalIgnoreCase))
{
RecursiveExtractRebuild(zipStream, os);
}
else
{
ZipEntry newEntry = new ZipEntry(entryFileName);
newEntry.DateTime = zipEntry.DateTime;
newEntry.Size = zipEntry.Size;
// Setting the Size will allow the zip to be unpacked by XP's built-in extractor and other older code.
os.PutNextEntry(newEntry);
StreamUtils.Copy(zipStream, os, new byte[4096]);
os.CloseEntry();
}
}
}
The expected result is to get back my original byte array on the server.
On the server, when it comes to the line:
Stream zipStream = zipFile.GetInputStream(zipEntry);
I receive the error 'No password available for AES encrypted stream.'
The only place I see to set a password is in the ZipOutputStream object, and I have checked at runtime, and this is set appropriately.
When unpacking, the password must be assigned to the password-property of the ZipFile-instance, i.e. it must be set in the RecursiveExtractRebuild-method (for this the password has to be added as an additional parameter):
zipFile.Password = password;
as shown in this example.
It should be noted that the current DoRebuildData-method doesn't actually unpack the data, but re-packs it into a new zip. The (optional) line in the DoRebuildData-method:
zipOut.Password = password;
does not specify the password for the unpacking (i.e. for the old zip), but defines the password for the new zip.

Strange wikipedia mojibake (faulty coding)

This is the first Wikipedia page that appears a problem to me. When I use HttpWebResponse.GetResponseStream() to open this page https://en.wikipedia.org/wiki/London, it's full of mojibake. But my browser can encode it without problems.
I have used three methods to download the text file. And all of them get different files.
The first method downloaded a file of 274,851 bytes
string TargetUri = "https://en.wikipedia.org/wiki/London";
HttpWebRequest queryPage = (HttpWebRequest)WebRequest.Create(TargetUri);
queryPage.Credentials = CredentialCache.DefaultCredentials;
using (HttpWebResponse response = (HttpWebResponse)queryPage.GetResponse())
{
using (Stream PageRawCode = response.GetResponseStream())
{
using (MemoryStream PageRawCodeDuplicate = new MemoryStream())
{
byte[] buffer = new byte[1024];
int ByteCount;
do
{
ByteCount = PageRawCode.Read(buffer, 0, buffer.Length);
PageRawCodeDuplicate.Write(buffer, 0, ByteCount);
} while (ByteCount > 0);
PageRawCodeDuplicate.Seek(0, SeekOrigin.Begin);
using (StreamReader CodeInUTF8 = new StreamReader(PageRawCodeDuplicate))
{
string PageText = CodeInUTF8.ReadToEnd();
using (StreamWriter sw = new StreamWriter(#"E:\My Documents\Desktop\london1.html"))
{
sw.Write(PageText);
}
}
}
}
}
The second method is
WebClient myWebClient = new WebClient();
myWebClient.DownloadFile("https://en.wikipedia.org/wiki/London", #"E:\My Documents\Desktop\london2.html");
This method only downloaded a file of 152.297 bytes
The third method is to open the https://en.wikipedia.org/wiki/London, and save the source code file. This method will get a file of 1,746,420 bytes
I don't understand why there is a such a difference using different method get a text file.
I have used ASCII, BigEndianUnicode, Unicode, UTF32, UTF7, UTF8 to read the first 2 files. None of them shows the code correctly.
then I read the hex code of the files. The first 32 characters of london1.html is
1FEFBFBD0800000000000003EFBFBDEF
The first 32 characters of london2.html is
1F8B0800000000000003ECFD4B8F1C49
Obviously they are not <!DOCTYPE html>
What are these two files? I don't even know how to inspect them.
There is a simple issue in your code. You forgot to flush the memorystream. I've also added a second solution that doesn't copy the Stream in memory first...
If I run this slightly adapted code, I get a complete html file:
using (HttpWebResponse response = (HttpWebResponse)queryPage.GetResponse())
{
using (Stream PageRawCode = response.GetResponseStream())
{
using (MemoryStream PageRawCodeDuplicate = new MemoryStream())
{
byte[] buffer = new byte[1024];
int ByteCount;
do
{
ByteCount = PageRawCode.Read(buffer, 0, buffer.Length);
PageRawCodeDuplicate.Write(buffer, 0, ByteCount);
} while (ByteCount > 0);
// FLUSH!
PageRawCodeDuplicate.Flush();
PageRawCodeDuplicate.Seek(0, SeekOrigin.Begin);
// Pick an encoding here
using (StreamReader CodeInUTF8 = new StreamReader(
PageRawCodeDuplicate, Encoding.UTF8))
{
string PageText = CodeInUTF8.ReadToEnd();
using (StreamWriter sw = new StreamWriter(#"london1.html"))
{
sw.Write(PageText);
}
}
}
}
}
Direct copy of the stream
using (HttpWebResponse response = (HttpWebResponse)queryPage.GetResponse())
{
using (Stream PageRawCode = response.GetResponseStream())
{
using (StreamReader CodeInUTF8 = new StreamReader(
PageRawCode, Encoding.UTF8))
{
using (StreamWriter sw = new StreamWriter(#"london1.html"))
{
while (!CodeInUTF8.EndOfStream)
{
sw.WriteLine(CodeInUTF8.ReadLine());
}
}
}
}
}
Finally mystery SOLVED! The text stream is a GZipStream. Using GZipStream decompress can read the code.
http://msdn.microsoft.com/en-us/library/system.io.compression.gzipstream.aspx
It's hard to imagine how much the browser does behind

Download large file in small chunks in C#

I need to download some file which is more than 25 MB large, but my network only allow to request a file of 25 MB only.
I am using following code
const long DefaultSize = 26214400;
long Chunk = 26214400;
long offset = 0;
byte[] bytesInStream;
public void Download(string url, string filename)
{
long size = Size(url);
int blocksize = Convert.ToInt32(size / DefaultSize);
int remainder = Convert.ToInt32(size % DefaultSize);
if (remainder > 0) { blocksize++; }
FileStream fileStream = File.Create(#"D:\Download TEST\" + filename);
for (int i = 0; i < blocksize; i++)
{
if (i == blocksize - 1)
{
Chunk = remainder;
}
HttpWebRequest req = (HttpWebRequest)System.Net.WebRequest.Create(url);
req.Method = "GET";
req.AddRange(Convert.ToInt32(offset), Convert.ToInt32(Chunk+offset));
HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
// StreamReader sr = new StreamReader(resp.GetResponseStream());
using (Stream responseStream = resp.GetResponseStream())
{
bytesInStream = new byte[Chunk];
responseStream.Read(bytesInStream, 0, (int)bytesInStream.Length);
// Use FileStream object to write to the specified file
fileStream.Seek((int)offset, SeekOrigin.Begin);
fileStream.Write(bytesInStream,0, bytesInStream.Length);
}
offset += Chunk;
}
fileStream.Close();
}
public long Size(string url)
{
System.Net.WebRequest req = System.Net.HttpWebRequest.Create(url);
req.Method = "HEAD";
System.Net.WebResponse resp = req.GetResponse();
resp.Close();
return resp.ContentLength;
}
It is properly writing content on disk but content is not working
You should check how much was read before write, something like this (and you don't need to remember the offset to seek, the seek is automatic when you write):
int read;
do
{
read = responseStream.Read(bytesInStream, 0, (int)bytesInStream.Length);
if (read > 0)
fileStream.Write(bytesInStream, 0, read);
}
while(read > 0);
There is a similar SO questions that might help you
Segmented C# file downloader
and
How to open multiple connections to download single file?
Also this code project article
http://www.codeproject.com/Tips/307548/Resume-Suppoert-Downloading
Range is zero based and you should subtract 1 from upper bound.
request.Headers.Range = new System.Net.Http.Headers.RangeHeaderValue(offset, chunkSize + offset - 1);
I published correct code fragment at the following link:
https://stackoverflow.com/a/48019611/1099716
Akka streams can help download file in small chunks from a System.IO.Stream using multithreading. https://getakka.net/articles/intro/what-is-akka.html
The Download method will append the bytes to the file starting with long fileStart. If the file does not exist, fileStart value must be 0.
using Akka.Actor;
using Akka.IO;
using Akka.Streams;
using Akka.Streams.Dsl;
using Akka.Streams.IO;
private static Sink<ByteString, Task<IOResult>> FileSink(string filename)
{
return Flow.Create<ByteString>()
.ToMaterialized(FileIO.ToFile(new FileInfo(filename), FileMode.Append), Keep.Right);
}
private async Task Download(string path, Uri uri, long fileStart)
{
using (var system = ActorSystem.Create("system"))
using (var materializer = system.Materializer())
{
HttpWebRequest request = WebRequest.Create(uri) as HttpWebRequest;
request.AddRange(fileStart);
using (WebResponse response = request.GetResponse())
{
Stream stream = response.GetResponseStream();
await StreamConverters.FromInputStream(() => stream, chunkSize: 1024)
.RunWith(FileSink(path), materializer);
}
}
}

Biztalk 2010 Custom Pipeline Component returns binary

I have created a custom pipeline component which transforms a complex excel spreadsheet to XML. The transformation works fine and I can write out the data to check. However when I assign this data to the BodyPart.Data part of the inMsg or a new message I always get a routing failure. When I look at the message in the admin console it appears that the body contains binary data (I presume the original excel) rather than the XML I have assigned - see screen shot below. I have followed numerous tutorials and many different ways of doing this but always get the same result.
My current code is:
public Microsoft.BizTalk.Message.Interop.IBaseMessage Execute(Microsoft.BizTalk.Component.Interop.IPipelineContext pc, Microsoft.BizTalk.Message.Interop.IBaseMessage inmsg)
{
//make sure we have something
if (inmsg == null || inmsg.BodyPart == null || inmsg.BodyPart.Data == null)
{
throw new ArgumentNullException("inmsg");
}
IBaseMessagePart bodyPart = inmsg.BodyPart;
//create a temporary directory
const string tempDir = #"C:\test\excel";
if (!Directory.Exists(tempDir))
{
Directory.CreateDirectory(tempDir);
}
//get the input filename
string inputFileName = Convert.ToString(inmsg.Context.Read("ReceivedFileName", "http://schemas.microsoft.com/BizTalk/2003/file-properties"));
swTemp.WriteLine("inputFileName: " + inputFileName);
//set path to write excel file
string excelPath = tempDir + #"\" + Path.GetFileName(inputFileName);
swTemp.WriteLine("excelPath: " + excelPath);
//write the excel file to a temporary folder
bodyPart = inmsg.BodyPart;
Stream inboundStream = bodyPart.GetOriginalDataStream();
Stream outFile = File.Create(excelPath);
inboundStream.CopyTo(outFile);
outFile.Close();
//process excel file to return XML
var spreadsheet = new SpreadSheet();
string strXmlOut = spreadsheet.ProcessWorkbook(excelPath);
//now build an XML doc to hold this data
XmlDocument xDoc = new XmlDocument();
xDoc.LoadXml(strXmlOut);
XmlDocument finalMsg = new XmlDocument();
XmlElement xEle;
xEle = finalMsg.CreateElement("ns0", "BizTalk_Test_Amey_Pipeline.textXML",
"http://tempuri.org/INT018_Workbook.xsd");
finalMsg.AppendChild(xEle);
finalMsg.FirstChild.InnerXml = xDoc.FirstChild.InnerXml;
//write xml to memory stream
swTemp.WriteLine("Write xml to memory stream");
MemoryStream streamXmlOut = new MemoryStream();
finalMsg.Save(streamXmlOut);
streamXmlOut.Position = 0;
inmsg.BodyPart.Data = streamXmlOut;
pc.ResourceTracker.AddResource(streamXmlOut);
return inmsg;
}
Here is a sample of writing the message back:
IBaseMessage Microsoft.BizTalk.Component.Interop.IComponent.Execute(IPipelineContext pContext, IBaseMessage pInMsg)
{
IBaseMessagePart bodyPart = pInMsg.BodyPart;
if (bodyPart != null)
{
using (Stream originalStrm = bodyPart.GetOriginalDataStream())
{
byte[] changedMessage = ConvertToBytes(ret);
using (Stream strm = new AsciiStream(originalStrm, changedMessage, resManager))
{
// Setup the custom stream to put it back in the message.
bodyPart.Data = strm;
pContext.ResourceTracker.AddResource(strm);
}
}
}
return pInMsg;
}
The AsciiStream used a method like this to read the stream:
override public int Read(byte[] buffer, int offset, int count)
{
int ret = 0;
int bytesRead = 0;
byte[] FixedData = this.changedBytes;
if (FixedData != null)
{
bytesRead = count > (FixedData.Length - overallOffset) ? FixedData.Length - overallOffset : count;
Array.Copy(FixedData, overallOffset, buffer, offset, bytesRead);
if (FixedData.Length == (bytesRead + overallOffset))
this.changedBytes = null;
// Increment the overall offset.
overallOffset += bytesRead;
offset += bytesRead;
count -= bytesRead;
ret += bytesRead;
}
return ret;
}
I would first of all add more logging to your component around the MemoryStream logic - maybe write the file out to the file system so you can make sure the Xml version is correct. You can also attach to the BizTalk process and step through the code for the component which makes debugging a lot easier.
I would try switching the use of MemoryStream to a more basic custom stream that writes the bytes for you. In the BizTalk SDK samples for pipeline components there are some examples for a custom stream. You would have to customize the stream sample so it just writes the stream. I can work on posting an example. So do the additional diagnostics above first.
Thanks,

Resources