Strange wikipedia mojibake (faulty coding) - asp.net

This is the first Wikipedia page that has caused me a problem. When I use HttpWebResponse.GetResponseStream() to open the page https://en.wikipedia.org/wiki/London, the result is full of mojibake, but my browser can decode it without problems.
I have used three methods to download the page as a text file, and each of them produces a different file.
The first method downloaded a file of 274,851 bytes:
string TargetUri = "https://en.wikipedia.org/wiki/London";
HttpWebRequest queryPage = (HttpWebRequest)WebRequest.Create(TargetUri);
queryPage.Credentials = CredentialCache.DefaultCredentials;
using (HttpWebResponse response = (HttpWebResponse)queryPage.GetResponse())
{
using (Stream PageRawCode = response.GetResponseStream())
{
using (MemoryStream PageRawCodeDuplicate = new MemoryStream())
{
byte[] buffer = new byte[1024];
int ByteCount;
do
{
ByteCount = PageRawCode.Read(buffer, 0, buffer.Length);
PageRawCodeDuplicate.Write(buffer, 0, ByteCount);
} while (ByteCount > 0);
PageRawCodeDuplicate.Seek(0, SeekOrigin.Begin);
using (StreamReader CodeInUTF8 = new StreamReader(PageRawCodeDuplicate))
{
string PageText = CodeInUTF8.ReadToEnd();
using (StreamWriter sw = new StreamWriter(@"E:\My Documents\Desktop\london1.html"))
{
sw.Write(PageText);
}
}
}
}
}
The second method is
WebClient myWebClient = new WebClient();
myWebClient.DownloadFile("https://en.wikipedia.org/wiki/London", @"E:\My Documents\Desktop\london2.html");
This method downloaded a file of only 152,297 bytes.
The third method is to open https://en.wikipedia.org/wiki/London in the browser and save the page source. This produces a file of 1,746,420 bytes.
I don't understand why the different methods produce such different files.
I have tried reading the first two files as ASCII, BigEndianUnicode, Unicode, UTF32, UTF7, and UTF8. None of them shows the text correctly.
Then I looked at the hex dump of the files. The first 32 hex digits of london1.html are
1FEFBFBD0800000000000003EFBFBDEF
The first 32 hex digits of london2.html are
1F8B0800000000000003ECFD4B8F1C49
Obviously they are not <!DOCTYPE html>
What are these two files? I don't even know how to inspect them.

There is a simple issue in your code: you forgot to flush the MemoryStream. I've also added a second solution that doesn't copy the stream into memory first.
If I run this slightly adapted code, I get a complete html file:
using (HttpWebResponse response = (HttpWebResponse)queryPage.GetResponse())
{
using (Stream PageRawCode = response.GetResponseStream())
{
using (MemoryStream PageRawCodeDuplicate = new MemoryStream())
{
byte[] buffer = new byte[1024];
int ByteCount;
do
{
ByteCount = PageRawCode.Read(buffer, 0, buffer.Length);
PageRawCodeDuplicate.Write(buffer, 0, ByteCount);
} while (ByteCount > 0);
// FLUSH!
PageRawCodeDuplicate.Flush();
PageRawCodeDuplicate.Seek(0, SeekOrigin.Begin);
// Pick an encoding here
using (StreamReader CodeInUTF8 = new StreamReader(
PageRawCodeDuplicate, Encoding.UTF8))
{
string PageText = CodeInUTF8.ReadToEnd();
using (StreamWriter sw = new StreamWriter(@"london1.html"))
{
sw.Write(PageText);
}
}
}
}
}
Direct copy of the stream
using (HttpWebResponse response = (HttpWebResponse)queryPage.GetResponse())
{
using (Stream PageRawCode = response.GetResponseStream())
{
using (StreamReader CodeInUTF8 = new StreamReader(
PageRawCode, Encoding.UTF8))
{
using (StreamWriter sw = new StreamWriter(@"london1.html"))
{
while (!CodeInUTF8.EndOfStream)
{
sw.WriteLine(CodeInUTF8.ReadLine());
}
}
}
}
}

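Finally, mystery SOLVED! The response stream is gzip-compressed: london2.html starts with the gzip magic bytes 1F 8B, and in london1.html the 8B byte became EF BF BD (the UTF-8 replacement character) because the compressed bytes were decoded as text. Decompressing the stream with GZipStream makes the page readable.
http://msdn.microsoft.com/en-us/library/system.io.compression.gzipstream.aspx
It's hard to imagine how much the browser does behind the scenes.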
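As a minimal sketch of that fix, assuming the downloaded bytes are already in memory, the helper below checks for the gzip magic bytes and decompresses the payload as UTF-8 text. The class and method names (GzipSketch, LooksLikeGzip, GunzipToText) are made up for illustration and are not part of the original code.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipSketch
{
    // True when the buffer starts with the gzip magic bytes 0x1F 0x8B.
    public static bool LooksLikeGzip(byte[] data)
    {
        return data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }

    // Decompresses a gzip payload and decodes the result as UTF-8 text.
    // Assumption: the page is UTF-8; check the Content-Type header if unsure.
    public static string GunzipToText(byte[] gzipBytes)
    {
        using (var input = new MemoryStream(gzipBytes))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With HttpWebRequest, it should also be possible to let the framework do this by setting queryPage.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate before calling GetResponse().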

Related

Unable to copy to memorystream from gzipstream

I want to compress a binary file in memory using System.IO.Compression.GZipStream. For this, I am using the following method
public byte[] Encrypt()
{
var payload = GetPayload();
Console.WriteLine("[!] Payload Size: {0} bytes", payload.Length);
using (var compressedStream = new MemoryStream(payload))
using (var zipStream = new GZipStream(compressedStream, CompressionMode.Compress))
using (var resultStream = new MemoryStream())
{
zipStream.CopyTo(resultStream);
return resultStream.ToArray();
}
}
But on the call to .CopyTo, I get System.NotSupportedException: Stream does not support reading.
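You need to invert your logic: create a GZipStream over an empty MemoryStream and copy your original content into this gzip stream. Dispose the GZipStream (rather than only flushing it) before reading the result, so that the gzip footer is written:
using var compressedData = new MemoryStream();
// leaveOpen: true keeps compressedData usable after the gzip stream is disposed
using (var gzip = new GZipStream(compressedData, CompressionMode.Compress, leaveOpen: true))
{
originalUncompressedStream.CopyTo(gzip); // <- "magic" happens here
} // disposing the GZipStream writes the remaining data and the gzip footer
// and "rewind" result stream back to beginning (for next reads)
compressedData.Position = 0;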
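For the byte[] case in the question, a minimal sketch might look like the following. The names CompressSketch, Compress, and Decompress are hypothetical; Decompress is included only so the result can be checked by a round trip.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class CompressSketch
{
    // Writes the payload INTO a GZipStream that wraps an empty output stream.
    public static byte[] Compress(byte[] payload)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(payload, 0, payload.Length);
            } // disposing gzip here writes the final block and the footer
            // ToArray still works after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    // Reads FROM a GZipStream that wraps the compressed bytes.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

The direction is the point: in Compress mode the GZipStream is write-only, which is why calling CopyTo on it (a read) throws NotSupportedException.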

Download file with progress bar in Xamarin Forms

I am trying to make a download page in Xamarin Forms (PCL, so WebClient is not usable) with a Download progress bar. I have used the following information from Xamarin, but without success:
http://developer.xamarin.com/recipes/ios/network/web_requests/download_a_file/
http://developer.xamarin.com/recipes/cross-platform/networking/download_progress/
This is my current code (with a working progress bar):
using System;
using System.Collections.Generic;
using Xamarin.Forms;
using System.Net.Http;
using System.IO;
using System.Threading.Tasks;
namespace DownloadExample
{
public partial class DownloadPage : ContentPage
{
public DownloadPage ()
{
InitializeComponent ();
DownloadFile("https://upload.wikimedia.org/wikipedia/commons/3/3d/LARGE_elevation.jpg");
}
private async Task<long> DownloadFile(string url)
{
long receivedBytes = 0;
long totalBytes = 0;
HttpClient client = new HttpClient ();
using (var stream = await client.GetStreamAsync(url)) {
byte[] buffer = new byte[4096];
totalBytes = stream.Length;
for (;;) {
int bytesRead = await stream.ReadAsync (buffer, 0, buffer.Length);
if (bytesRead == 0) {
await Task.Yield ();
break;
}
receivedBytes += bytesRead;
int received = unchecked((int)receivedBytes);
int total = unchecked((int)totalBytes);
double percentage = ((float) received) / total;
progressBar1.Progress = percentage;
}
}
return receivedBytes;
}
}
}
Now, I need to save the file to my local storage. But, in this example, I'm not getting the file content, so I can't write it to my local storage. What do I need to change in the code to make this possible?
FYI: In this example, I'm downloading an image, but it will be a .pdf / .doc / .docx in the future.
Thanks in advance.
BR, FG
Thanks to your post, I also solved the file download:
WebClient client = new WebClient();
using (var stream = await client.OpenReadTaskAsync(Download point))
{
using (MemoryStream ms = new MemoryStream())
{
var buffer = new byte[BufferSize];
int read = 0;
totalBytes = Int32.Parse(client.ResponseHeaders[HttpResponseHeader.ContentLength]);
while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
{
ms.Write(buffer, 0, read);//important : receive every buffer
receivedBytes += read;
received = unchecked((int)receivedBytes);
total = unchecked((int)totalBytes);
percentage = ((float)received) / total;
progressBar1.Progress = percentage;
labProg.Text = AppResources.MsgDownloadprogress + Math.Truncate(percentage * 100).ToString() + "%";
}//END while
Stream ALLstream = new MemoryStream(ms.ToArray());//important change Stream
App.APK.GenApkFile(ALLstream);//change APK file and save memory of phone
}//END using (MemoryStream
stream.Close();
}//END using (var stream
You already copy the file content into your buffer inside the for loop. On each pass of that loop, append the bytes that were read to a new byte[] fileContentBuffer (or write them to a MemoryStream), and you will have the full contents to save to local storage.
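A minimal sketch of that idea, assuming the destination is any writable Stream (a MemoryStream or a file stream from the platform's storage API). DownloadSketch, CopyWithProgressAsync, and the onProgress callback are illustrative names, not from the original code.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class DownloadSketch
{
    // Copies source to destination in chunks and reports the running byte count,
    // so the caller can both keep the content and update a progress bar.
    public static async Task<long> CopyWithProgressAsync(
        Stream source, Stream destination, Action<long> onProgress)
    {
        var buffer = new byte[4096];
        long received = 0;
        int read;
        while ((read = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            // Write only the bytes actually read in this pass.
            await destination.WriteAsync(buffer, 0, read);
            received += read;
            if (onProgress != null) onProgress(received);
        }
        return received;
    }
}
```

In the page above, source would be the HTTP response stream. The total size for the percentage would have to come from the response's Content-Length header (for example response.Content.Headers.ContentLength after client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead)), since a network stream generally does not support Length.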

Get image from URL and upload to Amazon S3

I'd like to load an image directly from a URL but without saving it on the server, I want to upload it directly from memory to Amazon S3 server.
This is my code:
Dim wc As New WebClient
Dim fileStream As IO.Stream = wc.OpenRead("http://www.domain.com/image.jpg")
Dim request As New PutObjectRequest()
request.BucketName = "mybucket"
request.Key = "file.jpg"
request.InputStream = fileStream
client.PutObject(request)
The Amazon API gives me the error "Could not determine content length". The stream fileStream ends up as "System.Net.ConnectStream" which I'm not sure if it's correct.
The exact same code works with files from the HttpPostedFile but I need to use it in this way now.
Any ideas how I can convert the stream to become what Amazon API is expecting (with the length intact)?
I had the same problem when using the GetObjectResponse() method and its ResponseStream property to copy a file from one folder to another in the same bucket. I noticed that the AWS SDK (2.3.45) has some faults, such as another method, WriteResponseStreamToFile in GetObjectResponse(), that simply doesn't work. These gaps need workarounds.
I solved the problem by reading the file into a byte array and wrapping it in a MemoryStream object.
Try this (C# code):
WebClient wc = new WebClient();
Stream fileStream = wc.OpenRead("http://www.domain.com/image.jpg");
byte[] fileBytes = fileStream.ToArrayBytes();
PutObjectRequest request = new PutObjectRequest();
request.BucketName = "mybucket";
request.Key = "file.jpg";
request.InputStream = new MemoryStream(fileBytes);
client.PutObject(request);
The extension method:
public static byte[] ToArrayBytes(this Stream input)
{
byte[] buffer = new byte[16 * 1024];
using (MemoryStream ms = new MemoryStream())
{
int read;
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
{
ms.Write(buffer, 0, read);
}
return ms.ToArray();
}
}
You can also create the MemoryStream without an intermediate byte array. But after the first PutObject to S3, the MemoryStream will be discarded. If you need to put other objects, I recommend the first option.
WebClient wc = new WebClient();
Stream fileStream = wc.OpenRead("http://www.domain.com/image.jpg");
MemoryStream fileMemoryStream = fileStream.ToMemoryStream();
PutObjectRequest request = new PutObjectRequest();
request.BucketName = "mybucket";
request.Key = "file.jpg";
request.InputStream = fileMemoryStream ;
client.PutObject(request);
The extension method:
public static MemoryStream ToMemoryStream(this Stream input)
{
byte[] buffer = new byte[16 * 1024];
int read;
MemoryStream ms = new MemoryStream();
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
{
ms.Write(buffer, 0, read);
}
ms.Position = 0; // rewind so the consumer reads from the beginning
return ms;
}
I had the same problem in a similar scenario.
The reason for the error is that to upload an object the SDK needs to know the whole content length that is going to be uploaded. To be able to obtain stream length it must be seekable, but the stream returned from WebClient is not. To indicate the expected length set Headers.ContentLength in PutObjectRequest. The SDK will use this value if it cannot determine length from the stream object.
To make your code work, obtain the content length from the response headers returned by the call made by WebClient. Then set PutObjectRequest.Headers.ContentLength. Of course, this relies on the server returning a Content-Length value.
Dim wc As New WebClient
Dim fileStream As IO.Stream = wc.OpenRead("http://www.example.com/image.jpg")
Dim contentLength As Long = Long.Parse(wc.ResponseHeaders("Content-Length"))
Dim request As New PutObjectRequest()
request.BucketName = "mybucket"
request.Key = "file.jpg"
request.InputStream = fileStream
request.Headers.ContentLength = contentLength
client.PutObject(request)
I came up with a solution that uses UploadPart when the length is not available by any other means. It also avoids loading the entire file into memory.
if (args.DocumentContents.CanSeek)
{
PutObjectRequest r = new PutObjectRequest();
r.InputStream = args.DocumentContents;
r.BucketName = s3Id.BucketName;
r.Key = s3Id.ObjectKey;
foreach (var item in args.CustomData)
{
r.Metadata[item.Key] = item.Value;
}
await S3Client.PutObjectAsync(r);
}
else
{
// if stream does not allow seeking, S3 client will throw error:
// Amazon.S3.AmazonS3Exception : Could not determine content length
// as a work around, if cannot use length property, will chunk
// file into sections and use UploadPart, so do not have to load
// entire file into memory as a single MemoryStream.
var r = new InitiateMultipartUploadRequest();
r.BucketName = s3Id.BucketName;
r.Key = s3Id.ObjectKey;
foreach (var item in args.CustomData)
{
r.Metadata[item.Key] = item.Value;
}
var multipartResponse = await S3Client.InitiateMultipartUploadAsync(r);
try
{
var completeRequest = new CompleteMultipartUploadRequest
{
UploadId = multipartResponse.UploadId,
BucketName = s3Id.BucketName,
Key = s3Id.ObjectKey,
};
// S3 requires every part except the last to be at least 5 MB, so the block size
// must be at or above that minimum (it could also be a configured value)
const int blockSize = 5 * 1024 * 1024;
// BinaryReader gives us access to ReadBytes
using (var reader = new BinaryReader(args.DocumentContents))
{
var partCounter = 1;
while (true)
{
byte[] buffer = reader.ReadBytes(blockSize);
if (buffer.Length == 0)
break;
using (MemoryStream uploadChunk = new MemoryStream(buffer))
{
uploadChunk.Position = 0;
var uploadRequest = new UploadPartRequest
{
BucketName = s3Id.BucketName,
Key = s3Id.ObjectKey,
UploadId = multipartResponse.UploadId,
PartNumber = partCounter,
InputStream = uploadChunk,
};
// could call UploadPart on multiple threads, instead of using await, but that would
// cause more data to be loaded into memory, which might be too much
var part2Task = await S3Client.UploadPartAsync(uploadRequest);
completeRequest.AddPartETags(part2Task);
}
partCounter++;
}
var completeResponse = await S3Client.CompleteMultipartUploadAsync(completeRequest);
}
}
catch
{
await S3Client.AbortMultipartUploadAsync(s3Id.BucketName, s3Id.ObjectKey
, multipartResponse.UploadId);
throw;
}
}

Download large file in small chunks in C#

I need to download a file that is larger than 25 MB, but my network only allows requests of up to 25 MB at a time.
I am using the following code:
const long DefaultSize = 26214400;
long Chunk = 26214400;
long offset = 0;
byte[] bytesInStream;
public void Download(string url, string filename)
{
long size = Size(url);
int blocksize = Convert.ToInt32(size / DefaultSize);
int remainder = Convert.ToInt32(size % DefaultSize);
if (remainder > 0) { blocksize++; }
FileStream fileStream = File.Create(@"D:\Download TEST\" + filename);
for (int i = 0; i < blocksize; i++)
{
if (i == blocksize - 1)
{
Chunk = remainder;
}
HttpWebRequest req = (HttpWebRequest)System.Net.WebRequest.Create(url);
req.Method = "GET";
req.AddRange(Convert.ToInt32(offset), Convert.ToInt32(Chunk+offset));
HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
// StreamReader sr = new StreamReader(resp.GetResponseStream());
using (Stream responseStream = resp.GetResponseStream())
{
bytesInStream = new byte[Chunk];
responseStream.Read(bytesInStream, 0, (int)bytesInStream.Length);
// Use FileStream object to write to the specified file
fileStream.Seek((int)offset, SeekOrigin.Begin);
fileStream.Write(bytesInStream,0, bytesInStream.Length);
}
offset += Chunk;
}
fileStream.Close();
}
public long Size(string url)
{
System.Net.WebRequest req = System.Net.HttpWebRequest.Create(url);
req.Method = "HEAD";
System.Net.WebResponse resp = req.GetResponse();
resp.Close();
return resp.ContentLength;
}
It writes the content to disk, but the resulting file does not work.
You should check how much was read before write, something like this (and you don't need to remember the offset to seek, the seek is automatic when you write):
int read;
do
{
read = responseStream.Read(bytesInStream, 0, (int)bytesInStream.Length);
if (read > 0)
fileStream.Write(bytesInStream, 0, read);
}
while(read > 0);
There are similar SO questions that might help you:
Segmented C# file downloader
and
How to open multiple connections to download single file?
Also this CodeProject article:
http://www.codeproject.com/Tips/307548/Resume-Suppoert-Downloading
Range is zero based and you should subtract 1 from upper bound.
request.Headers.Range = new System.Net.Http.Headers.RangeHeaderValue(offset, chunkSize + offset - 1);
I published correct code fragment at the following link:
https://stackoverflow.com/a/48019611/1099716
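To illustrate the zero-based, inclusive range arithmetic, here is a small sketch that splits a file size into byte ranges. RangeSketch and SplitIntoRanges are hypothetical names introduced for this example.

```csharp
using System;
using System.Collections.Generic;

public static class RangeSketch
{
    // Returns inclusive, zero-based byte ranges that cover [0, totalSize).
    public static List<(long From, long To)> SplitIntoRanges(long totalSize, long chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        var ranges = new List<(long From, long To)>();
        for (long from = 0; from < totalSize; from += chunkSize)
        {
            // The upper bound is inclusive, hence the "- 1"; the last range may be shorter.
            long to = Math.Min(from + chunkSize, totalSize) - 1;
            ranges.Add((from, to));
        }
        return ranges;
    }
}
```

Each pair could then be passed to HttpWebRequest.AddRange(from, to), using the long overload so that large offsets are not truncated to int.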
Akka streams can help download file in small chunks from a System.IO.Stream using multithreading. https://getakka.net/articles/intro/what-is-akka.html
The Download method will append the bytes to the file starting with long fileStart. If the file does not exist, fileStart value must be 0.
using Akka.Actor;
using Akka.IO;
using Akka.Streams;
using Akka.Streams.Dsl;
using Akka.Streams.IO;
private static Sink<ByteString, Task<IOResult>> FileSink(string filename)
{
return Flow.Create<ByteString>()
.ToMaterialized(FileIO.ToFile(new FileInfo(filename), FileMode.Append), Keep.Right);
}
private async Task Download(string path, Uri uri, long fileStart)
{
using (var system = ActorSystem.Create("system"))
using (var materializer = system.Materializer())
{
HttpWebRequest request = WebRequest.Create(uri) as HttpWebRequest;
request.AddRange(fileStart);
using (WebResponse response = request.GetResponse())
{
Stream stream = response.GetResponseStream();
await StreamConverters.FromInputStream(() => stream, chunkSize: 1024)
.RunWith(FileSink(path), materializer);
}
}
}

How do I set Filestream from an fileupload control?

This is my code, and I can't seem to get the file I have in my FileUpload control into the FileStream.
// The buffer size is set to 2kb
int buffLength = 2048;
byte[] buff = new byte[buffLength];
int contentLen;
// Opens a file stream (System.IO.FileStream) to read the file to be uploaded
FileStream fs = fileInf.OpenRead();
try
{
// Stream to which the file to be upload is written
Stream strm = reqFTP.GetRequestStream();
// Read from the file stream 2kb at a time
contentLen = fs.Read(buff, 0, buffLength);
// Till Stream content ends
while (contentLen != 0)
{
// Write Content from the file stream to the FTP Upload Stream
strm.Write(buff, 0, contentLen);
contentLen = fs.Read(buff, 0, buffLength);
}
// Close the file stream and the Request Stream
strm.Close();
fs.Close();
}
It seems that I should be using the FileUpload control to do the upload from my website, yet it seems strange that the control creates a Stream and not a FileStream. Yes, I am FTPing a file.
Here is a sample method that takes the two types we are targeting, FileInfo and FtpWebRequest, as arguments and streams data between them. I believe this will work.
void UploadFileToFtp(FileInfo file, FtpWebRequest req)
{
int buffLength = 2048;
using (var source = file.OpenRead())
{
using (var target = req.GetRequestStream())
{
// CopyTo copies raw bytes until the end of the file; PeekChar decodes
// characters and can stop early or throw on binary data
source.CopyTo(target, buffLength);
}
}
}
Hope this helps!
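Since the upload loop only needs Read, any Stream should work as the source, not just a FileStream. A minimal sketch, with UploadSketch and CopyAll as made-up names:

```csharp
using System;
using System.IO;

public static class UploadSketch
{
    // Copies every byte from source to target and returns the number of bytes copied.
    // Works with any readable Stream, so a FileStream is not required.
    public static long CopyAll(Stream source, Stream target, int bufferSize = 2048)
    {
        var buffer = new byte[bufferSize];
        long total = 0;
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            target.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

Assuming a FileUpload control named FileUpload1 (a hypothetical name), the call might look like UploadSketch.CopyAll(FileUpload1.FileContent, reqFTP.GetRequestStream()), with both streams wrapped in using blocks.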

Resources