I have created a custom pipeline component which transforms a complex Excel spreadsheet to XML. The transformation works fine, and I can write out the data to check it. However, when I assign this data to the BodyPart.Data property of the inMsg, or of a new message, I always get a routing failure. When I look at the message in the admin console, the body appears to contain binary data (I presume the original Excel file) rather than the XML I assigned. I have followed numerous tutorials and tried many different ways of doing this, but I always get the same result.
My current code is:
public Microsoft.BizTalk.Message.Interop.IBaseMessage Execute(Microsoft.BizTalk.Component.Interop.IPipelineContext pc, Microsoft.BizTalk.Message.Interop.IBaseMessage inmsg)
{
//make sure we have something
if (inmsg == null || inmsg.BodyPart == null || inmsg.BodyPart.Data == null)
{
throw new ArgumentNullException("inmsg");
}
IBaseMessagePart bodyPart = inmsg.BodyPart;
//create a temporary directory
const string tempDir = @"C:\test\excel";
if (!Directory.Exists(tempDir))
{
Directory.CreateDirectory(tempDir);
}
//get the input filename
string inputFileName = Convert.ToString(inmsg.Context.Read("ReceivedFileName", "http://schemas.microsoft.com/BizTalk/2003/file-properties"));
// swTemp is a diagnostic StreamWriter defined elsewhere in the component (not shown).
swTemp.WriteLine("inputFileName: " + inputFileName);
//set path to write excel file
string excelPath = Path.Combine(tempDir, Path.GetFileName(inputFileName));
swTemp.WriteLine("excelPath: " + excelPath);
//write the excel file to a temporary folder
bodyPart = inmsg.BodyPart;
Stream inboundStream = bodyPart.GetOriginalDataStream();
using (Stream outFile = File.Create(excelPath))
{
inboundStream.CopyTo(outFile);
}
//process excel file to return XML
var spreadsheet = new SpreadSheet();
string strXmlOut = spreadsheet.ProcessWorkbook(excelPath);
//now build an XML doc to hold this data
XmlDocument xDoc = new XmlDocument();
xDoc.LoadXml(strXmlOut);
XmlDocument finalMsg = new XmlDocument();
XmlElement xEle;
xEle = finalMsg.CreateElement("ns0", "BizTalk_Test_Amey_Pipeline.textXML",
"http://tempuri.org/INT018_Workbook.xsd");
finalMsg.AppendChild(xEle);
finalMsg.FirstChild.InnerXml = xDoc.FirstChild.InnerXml;
//write xml to memory stream
swTemp.WriteLine("Write xml to memory stream");
MemoryStream streamXmlOut = new MemoryStream();
finalMsg.Save(streamXmlOut);
streamXmlOut.Position = 0;
inmsg.BodyPart.Data = streamXmlOut;
pc.ResourceTracker.AddResource(streamXmlOut);
return inmsg;
}
Here is a sample of writing the message back:
IBaseMessage Microsoft.BizTalk.Component.Interop.IComponent.Execute(IPipelineContext pContext, IBaseMessage pInMsg)
{
IBaseMessagePart bodyPart = pInMsg.BodyPart;
if (bodyPart != null)
{
// No using blocks here: BizTalk reads bodyPart.Data after Execute returns,
// so disposing the streams is left to the pipeline's ResourceTracker.
Stream originalStrm = bodyPart.GetOriginalDataStream();
// ret holds the transformed message content; ConvertToBytes, AsciiStream and
// resManager are the answerer's own helpers (not shown).
byte[] changedMessage = ConvertToBytes(ret);
Stream strm = new AsciiStream(originalStrm, changedMessage, resManager);
// Setup the custom stream to put it back in the message.
bodyPart.Data = strm;
pContext.ResourceTracker.AddResource(strm);
}
return pInMsg;
}
The AsciiStream used a method like this to read the stream:
public override int Read(byte[] buffer, int offset, int count)
{
// changedBytes (the replacement payload) and overallOffset (bytes already served) are fields of AsciiStream.
int ret = 0;
int bytesRead = 0;
byte[] FixedData = this.changedBytes;
if (FixedData != null)
{
// Copy at most the bytes still remaining in the payload.
bytesRead = Math.Min(count, FixedData.Length - overallOffset);
Array.Copy(FixedData, overallOffset, buffer, offset, bytesRead);
// Drop the payload once it has been fully served; later reads return 0 (end of stream).
if (FixedData.Length == (bytesRead + overallOffset))
this.changedBytes = null;
// Increment the overall offset.
overallOffset += bytesRead;
ret += bytesRead;
}
return ret;
}
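The AsciiStream Read override above hands out a replacement byte array while tracking how much of it has already been consumed. As a rough illustration in Java (the class name ReplacementInputStream is hypothetical), the same bookkeeping looks like this; note that Java's read returns -1 at end of stream where .NET's Stream.Read returns 0, and that java.io.ByteArrayInputStream already provides this behavior out of the box.

```java
import java.io.InputStream;
import java.util.Objects;

/** Hypothetical stream that serves a fixed replacement payload, like AsciiStream above. */
class ReplacementInputStream extends InputStream {
    private final byte[] data;
    private int position = 0; // plays the role of overallOffset

    ReplacementInputStream(byte[] data) {
        this.data = Objects.requireNonNull(data);
    }

    @Override
    public int read() {
        // Single-byte read: the next byte as 0..255, or -1 once the payload is exhausted.
        return position < data.length ? (data[position++] & 0xFF) : -1;
    }

    @Override
    public int read(byte[] buffer, int offset, int count) {
        Objects.checkFromIndexSize(offset, count, buffer.length); // Java 9+
        if (count == 0) {
            return 0;
        }
        int remaining = data.length - position;
        if (remaining <= 0) {
            return -1; // Java signals end of stream with -1, where .NET's Read returns 0
        }
        // Copy at most what is left in the payload, and never more than requested.
        int bytesRead = Math.min(count, remaining);
        System.arraycopy(data, position, buffer, offset, bytesRead);
        position += bytesRead;
        return bytesRead;
    }
}
```

A 7-byte payload read through a 4-byte buffer comes back as 4 bytes, then 3, then end of stream.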
First, I would add more logging to your component around the MemoryStream logic; for example, write the file out to the file system so you can make sure the XML version is correct. You can also attach to the BizTalk process and step through the component's code, which makes debugging a lot easier.
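The diagnostic suggested above (writing the payload out to the file system to check the XML) could look roughly like this, sketched in Java: read the bytes, dump a copy to a file, and pass on a fresh stream that starts at the first byte. The class name and dump path are hypothetical; in the BizTalk component the equivalent would be done in C# around the MemoryStream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical diagnostic helper. */
class DiagnosticDump {
    /**
     * Reads the whole payload, writes a copy to dumpFile for inspection,
     * and returns a new stream positioned at the first byte.
     */
    static InputStream dumpAndRewind(InputStream payload, Path dumpFile) throws IOException {
        byte[] bytes = payload.readAllBytes();   // Java 9+
        Files.write(dumpFile, bytes);            // open this file to confirm the XML is what you expect
        return new ByteArrayInputStream(bytes);  // the returned stream starts at position 0
    }
}
```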
I would also try switching from MemoryStream to a more basic custom stream that writes the bytes for you. The BizTalk SDK samples for pipeline components include some examples of a custom stream; you would have to customize the stream sample so it just writes the stream. I can work on posting an example, but do the additional diagnostics above first.
Thanks,
I want to read 50 million records from a 15 GB text file and write them into Elasticsearch.
if (file.Length > 0)
{
string wwroot = _he.WebRootPath;
string contentpath = _he.ContentRootPath;
string path = Path.Combine(wwroot, "file/" + foldername);
if (!Directory.Exists(path))
{
var rcheck = Directory.CreateDirectory(path);
}
var filename = file.FileName;
var filepath = Path.Combine(path, filename);
if (filepath.Any())
{
using (FileStream stream = new FileStream(Path.Combine(path, filename), FileMode.Create))
{
file.CopyTo(stream);
}
}
string[] lines = System.IO.File.ReadAllLines(filepath);
var Plist = new List<Person>();
int i = 0;
foreach (var line in lines)
{
var newperson = new Person();
string[] sub = line.Split(":");
newperson.PId = sub[1];
newperson.FirstName = sub[2];
newperson.LastName = sub[3];
newperson.Gender = sub[4];
Plist.Add(newperson);
}
return View();
I can upload and read the file, but when I try to add the records to the list I get an error: it only reads 16000 items and then my application shuts down.
You need to read the file using a buffer. With proper buffered reading logic, you'll be able to read a file of any size.
This line here:
System.IO.File.ReadAllLines(filepath);
Reads ALL the content of the 15 GB file at once and attempts to put it all into memory. I don't know how your code managed to get past that line without throwing an OutOfMemoryException (reading "only" a 4.62 GB file ate 19.2 GB of my memory when debugging).
Instead, read the file through a buffer, one line at a time:
using var streamReader = File.OpenText(bigFilePath);
var fileLine = string.Empty;
while ((fileLine = streamReader.ReadLine()) != null)
{
// Your string line reading logic.
}
You will most probably not be able to keep all the records in memory (depending on the memory available), and sending them one by one to Elasticsearch would be the opposite of efficient, so you'll need to find a middle ground between those limitations. I would suggest batching, that is, sending records in fixed-size groups. The size is for you to pick, but it shouldn't be very large or very small; otherwise the benefits of batching will be smaller.
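The batching idea can be sketched in Java as a small helper that buffers items and hands them to a sink once a fixed count is reached, plus a final flush for the remainder. The Batcher class is hypothetical, and the sink stands in for whatever bulk-insert call you use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: collects items and hands them to a sink in fixed-size batches. */
class Batcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink; // e.g. a bulk insert into Elasticsearch
    private final List<T> buffer = new ArrayList<>();

    Batcher(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers one item and sends the batch as soon as it is full. */
    void add(T item) {
        buffer.add(item);
        if (buffer.size() == batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered; call once more after the last line has been read. */
    void flush() {
        if (!buffer.isEmpty()) {
            sink.accept(new ArrayList<>(buffer)); // pass a copy so the sink may keep it
            buffer.clear();
        }
    }
}
```

With a batch size of 1024, 2500 records produce batches of 1024, 1024 and 452, and at most one batch is held in memory at a time.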
Full code:
// An instance method rather than static Main, because _he (the controller's injected
// hosting environment) is an instance field. ImportPeople is an illustrative name.
void ImportPeople(string foldername, string filename)
{
string wwroot = _he.WebRootPath;
// Open the uploaded file itself, not the folder that contains it.
string filePath = Path.Combine(wwroot, "file/" + foldername, filename);
var peopleListBatch = new List<Person>();
const int BatchSize = 1024;
using var streamReader = File.OpenText(filePath);
var fileLine = string.Empty;
while ((fileLine = streamReader.ReadLine()) != null)
{
var lineParts = fileLine.Split(":");
var newperson = new Person
{
PId = lineParts[1],
FirstName = lineParts[2],
LastName = lineParts[3],
Gender = lineParts[4],
};
peopleListBatch.Add(newperson);
// Add to Elastic, but only when batch is full.
if (peopleListBatch.Count == BatchSize)
{
AddPersonsToElasticSearch(peopleListBatch);
peopleListBatch.Clear();
}
}
// Add remaining people, if any.
if (peopleListBatch.Count > 0)
{
AddPersonsToElasticSearch(peopleListBatch);
peopleListBatch.Clear();
}
}
Inserting into Elasticsearch is another story, and I leave that task to you:
static void AddPersonsToElasticSearch(List<Person> people)
{
// TODO: Add your inserting logic here.
}
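If it helps with the TODO: Elasticsearch's _bulk endpoint accepts newline-delimited JSON, an action line followed by a document line for each record, with a trailing newline at the end. The Java sketch below only builds such a body for one batch and does not send it. The index name "people", the field names, and the hand-rolled JSON escaping are assumptions; a client library or JSON serializer would normally do this for you.

```java
import java.util.List;

/** Hypothetical builder for an Elasticsearch _bulk request body. */
class BulkBody {
    /** Hypothetical record mirroring the Person class from the question. */
    record Person(String pId, String firstName, String lastName, String gender) {}

    /** Builds an NDJSON body: one action line plus one document line per person. */
    static String build(String index, List<Person> people) {
        StringBuilder sb = new StringBuilder();
        for (Person p : people) {
            sb.append("{\"index\":{\"_index\":\"").append(escape(index)).append("\"}}\n");
            sb.append("{\"PId\":\"").append(escape(p.pId()))
              .append("\",\"FirstName\":\"").append(escape(p.firstName()))
              .append("\",\"LastName\":\"").append(escape(p.lastName()))
              .append("\",\"Gender\":\"").append(escape(p.gender()))
              .append("\"}\n"); // every line, including the last, ends with a newline
        }
        return sb.toString();
    }

    /** Minimal JSON string escaping for quotes, backslashes and control characters. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```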
I am trying to upload video files to Amazon S3 using the multipart upload method in ASP.NET, and I traced the upload progress using logs. It uploads 106496 bytes at a time and runs only a single thread; I did not see multiple threads running. Can you clarify why it runs on a single thread? It also takes a long time: even a 20 MB file takes almost 2 minutes to upload.
Here is my code, which uses UploadPartRequest.
private void UploadFileOnAmazon(string subUrl, string filename, Stream audioStream, string extension)
{
client = new AmazonS3Client(accessKey, secretKey, Amazon.RegionEndpoint.USEast1);
// List to store upload part responses.
List<UploadPartResponse> uploadResponses = new List<UploadPartResponse>();
// 1. Initialize.
InitiateMultipartUploadRequest initiateRequest = new InitiateMultipartUploadRequest
{
BucketName = bucketName,
Key = subUrl + filename
};
InitiateMultipartUploadResponse initResponse =
client.InitiateMultipartUpload(initiateRequest);
// 2. Upload Parts.
//long contentLength = new FileInfo(filePath).Length;
long contentLength = audioStream.Length;
long partSize = 5 * (long)Math.Pow(2, 20); // 5 MB
try
{
long filePosition = 0;
for (int i = 1; filePosition < contentLength; i++)
{
UploadPartRequest uploadRequest = new UploadPartRequest
{
BucketName = bucketName,
Key = subUrl + filename,
UploadId = initResponse.UploadId,
PartNumber = i,
PartSize = partSize,
FilePosition = filePosition,
InputStream = audioStream
//FilePath = filePath
};
// Upload part and add response to our list.
uploadRequest.StreamTransferProgress += new EventHandler<StreamTransferProgressArgs>(UploadPartProgressEventCallback);
uploadResponses.Add(client.UploadPart(uploadRequest));
filePosition += partSize;
}
logger.Info("Done");
// Step 3: complete.
CompleteMultipartUploadRequest completeRequest = new CompleteMultipartUploadRequest
{
BucketName = bucketName,
Key = subUrl + filename,
UploadId = initResponse.UploadId,
//PartETags = new List<PartETag>(uploadResponses)
};
completeRequest.AddPartETags(uploadResponses);
CompleteMultipartUploadResponse completeUploadResponse =
client.CompleteMultipartUpload(completeRequest);
}
catch (Exception exception)
{
Console.WriteLine("Exception occurred: {0}", exception.Message);
AbortMultipartUploadRequest abortMPURequest = new AbortMultipartUploadRequest
{
BucketName = bucketName,
Key = subUrl + filename,
UploadId = initResponse.UploadId
};
client.AbortMultipartUpload(abortMPURequest);
}
}
public static void UploadPartProgressEventCallback(object sender, StreamTransferProgressArgs e)
{
// Process event.
logger.DebugFormat("{0}/{1}", e.TransferredBytes, e.TotalBytes);
}
Is there anything wrong with my code, or how can I make the threads run simultaneously to speed up the upload?
Rather than managing the multipart upload yourself, try using TransferUtility, which does all the hard work for you!
See: Using the High-Level .NET API for Multipart Upload
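Part of what TransferUtility does for you is upload several parts concurrently, whereas the loop in the question waits for each UploadPart call to finish before starting the next. As a language-neutral illustration in Java, the idea is to split the content into part ranges and submit them to a thread pool. PartUploader is a hypothetical stand-in for the real SDK call, not an AWS API, and each task must read its own slice of the data rather than share one stream position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelParts {
    /** Hypothetical callback: uploads one part and returns its ETag. */
    interface PartUploader {
        String uploadPart(int partNumber, long offset, long length) throws Exception;
    }

    /** Uploads all parts with up to `threads` in flight; returns ETags ordered by part number. */
    static List<String> uploadAll(long contentLength, long partSize, int threads, PartUploader uploader)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            int partNumber = 1; // S3 part numbers start at 1
            for (long offset = 0; offset < contentLength; offset += partSize, partNumber++) {
                final int number = partNumber;
                final long start = offset;
                final long length = Math.min(partSize, contentLength - offset); // last part may be shorter
                futures.add(pool.submit(() -> uploader.uploadPart(number, start, length)));
            }
            List<String> eTags = new ArrayList<>();
            for (Future<String> f : futures) {
                eTags.add(f.get()); // waits for the part; rethrows its failure as ExecutionException
            }
            return eTags;
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting the futures in submission order keeps the ETags in part-number order, which is what the complete request needs.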
The AmazonS3Client internally uses an AmazonS3Config instance to determine the buffer size used for transfers (ref 1). This AmazonS3Config (ref 2) has a property named BufferSize whose default value is taken from a constant in AWSSDKUtils (ref 3), which in the current SDK version is 8192 bytes: quite a small value, IMHO.
You may use a custom instance of AmazonS3Config with an arbitrary BufferSize value. To build an AmazonS3Client instance that respects your custom configs, you have to pass the custom config to the client constructor. Example:
// Create credentials.
AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);
// Create custom config.
AmazonS3Config config = new AmazonS3Config
{
RegionEndpoint = Amazon.RegionEndpoint.USEast1,
BufferSize = 512 * 1024, // 512 KiB
};
// Pass credentials + custom config to the client.
AmazonS3Client client = new AmazonS3Client(credentials, config);
// They uploaded happily ever after.
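To see why the buffer size matters, count the read calls a copy loop makes. This Java sketch (the helper name is hypothetical, and it is not how the AWS SDK is implemented internally) copies a stream with a given buffer size and reports how many reads returned data. For a 20 MiB in-memory payload that is 2560 reads with an 8 KiB buffer versus 40 with a 512 KiB buffer; network streams may return fewer bytes per read, but the ratio is the point.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper that makes the effect of the buffer size visible. */
class BufferedCopy {
    /** Copies in to out using a buffer of bufferSize bytes; returns the number of non-empty reads. */
    static int copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        int reads = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // write only the bytes this read returned
            reads++;
        }
        return reads;
    }
}
```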
I'd like to load an image directly from a URL without saving it on the server; I want to upload it directly from memory to Amazon S3.
This is my code:
Dim wc As New WebClient
Dim fileStream As IO.Stream = wc.OpenRead("http://www.domain.com/image.jpg")
Dim request As New PutObjectRequest()
request.BucketName = "mybucket"
request.Key = "file.jpg"
request.InputStream = fileStream
client.PutObject(request)
The Amazon API gives me the error "Could not determine content length". The stream fileStream ends up as a "System.Net.ConnectStream", and I'm not sure whether that's correct.
The exact same code works with files from the HttpPostedFile but I need to use it in this way now.
Any ideas how I can convert the stream to become what Amazon API is expecting (with the length intact)?
I had the same problem when using GetObjectResponse and its ResponseStream property to copy a file from one folder to another in the same bucket. I noticed that the AWS SDK (2.3.45) has some faults, like another method, WriteResponseStreamToFile in GetObjectResponse, that simply doesn't work. These missing functions need some workarounds.
I solved the problem by reading the file into a byte array and putting it in a MemoryStream object.
Try this (C# code)
WebClient wc = new WebClient();
Stream fileStream = wc.OpenRead("http://www.domain.com/image.jpg");
byte[] fileBytes = fileStream.ToArrayBytes();
PutObjectRequest request = new PutObjectRequest();
request.BucketName = "mybucket";
request.Key = "file.jpg";
request.InputStream = new MemoryStream(fileBytes);
client.PutObject(request);
The extension method:
public static byte[] ToArrayBytes(this Stream input)
{
byte[] buffer = new byte[16 * 1024];
using (MemoryStream ms = new MemoryStream())
{
int read;
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
{
ms.Write(buffer, 0, read);
}
return ms.ToArray();
}
}
You can also create a MemoryStream without an intermediate byte array. But after the first PutObject to S3, the MemoryStream will be disposed, so if you need to put other objects from the same data, I recommend the first option.
WebClient wc = new WebClient();
Stream fileStream = wc.OpenRead("http://www.domain.com/image.jpg");
MemoryStream fileMemoryStream = fileStream.ToMemoryStream();
PutObjectRequest request = new PutObjectRequest();
request.BucketName = "mybucket";
request.Key = "file.jpg";
request.InputStream = fileMemoryStream ;
client.PutObject(request);
The extension method:
public static MemoryStream ToMemoryStream(this Stream input)
{
byte[] buffer = new byte[16 * 1024];
int read;
MemoryStream ms = new MemoryStream();
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
{
ms.Write(buffer, 0, read);
}
// Rewind so the upload reads the stream from its first byte instead of from the end.
ms.Position = 0;
return ms;
}
I had the same problem in a similar scenario.
The reason for the error is that, to upload an object, the SDK needs to know the whole content length to be uploaded. To obtain the stream length, the stream must be seekable, but the stream returned from WebClient is not. To indicate the expected length, set Headers.ContentLength in PutObjectRequest; the SDK will use this value if it cannot determine the length from the stream object.
To make your code work, obtain the content length from the response headers of the request made by WebClient, then set PutObjectRequest.Headers.ContentLength. Of course, this relies on the server returning a Content-Length value.
Dim wc As New WebClient
Dim fileStream As IO.Stream = wc.OpenRead("http://www.example.com/image.jpg")
' Read Content-Length from the WebClient (wc) that made the download, not from the S3 client.
Dim contentLength As Long = Long.Parse(wc.ResponseHeaders("Content-Length"))
Dim request As New PutObjectRequest()
request.BucketName = "mybucket"
request.Key = "file.jpg"
request.InputStream = fileStream
request.Headers.ContentLength = contentLength
client.PutObject(request)
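The rule described above (the upload needs the total length up front, either from a seekable stream or supplied explicitly) can be sketched in Java: trust the Content-Length header when the server sends a valid one, otherwise buffer the body to learn its size. KnownLengthBody is a hypothetical name and not part of the AWS SDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical pairing of an upload stream with its known length. */
class KnownLengthBody {
    final InputStream stream;
    final long length;

    private KnownLengthBody(InputStream stream, long length) {
        this.stream = stream;
        this.length = length;
    }

    /** Uses the Content-Length header if valid; otherwise reads the body into memory to measure it. */
    static KnownLengthBody resolve(String contentLengthHeader, InputStream body) throws IOException {
        if (contentLengthHeader != null) {
            try {
                long declared = Long.parseLong(contentLengthHeader.trim());
                if (declared >= 0) {
                    return new KnownLengthBody(body, declared); // stream the body as-is
                }
            } catch (NumberFormatException ignored) {
                // malformed header: fall through to buffering
            }
        }
        byte[] bytes = body.readAllBytes(); // fallback costs memory proportional to the body size
        return new KnownLengthBody(new ByteArrayInputStream(bytes), bytes.length);
    }
}
```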
I came up with a solution that uses UploadPart when the length is not available by any other means; it also does not load the entire file into memory.
if (args.DocumentContents.CanSeek)
{
PutObjectRequest r = new PutObjectRequest();
r.InputStream = args.DocumentContents;
r.BucketName = s3Id.BucketName;
r.Key = s3Id.ObjectKey;
foreach (var item in args.CustomData)
{
r.Metadata[item.Key] = item.Value;
}
await S3Client.PutObjectAsync(r);
}
else
{
// if stream does not allow seeking, S3 client will throw error:
// Amazon.S3.AmazonS3Exception : Could not determine content length
// as a work around, if cannot use length property, will chunk
// file into sections and use UploadPart, so do not have to load
// entire file into memory as a single MemoryStream.
var r = new InitiateMultipartUploadRequest();
r.BucketName = s3Id.BucketName;
r.Key = s3Id.ObjectKey;
foreach (var item in args.CustomData)
{
r.Metadata[item.Key] = item.Value;
}
var multipartResponse = await S3Client.InitiateMultipartUploadAsync(r);
try
{
var completeRequest = new CompleteMultipartUploadRequest
{
UploadId = multipartResponse.UploadId,
BucketName = s3Id.BucketName,
Key = s3Id.ObjectKey,
};
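// S3 requires every part except the last to be at least 5 MiB, so the 4 MiB Azure File
// Share maximum used originally would be rejected once there is more than one part.
// Any size of 5 MiB or more works for S3, even a configured value.
const int blockSize = 5 * 1024 * 1024;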
// BinaryReader gives us access to ReadBytes
using (var reader = new BinaryReader(args.DocumentContents))
{
var partCounter = 1;
while (true)
{
byte[] buffer = reader.ReadBytes(blockSize);
if (buffer.Length == 0)
break;
using (MemoryStream uploadChunk = new MemoryStream(buffer))
{
uploadChunk.Position = 0;
var uploadRequest = new UploadPartRequest
{
BucketName = s3Id.BucketName,
Key = s3Id.ObjectKey,
UploadId = multipartResponse.UploadId,
PartNumber = partCounter,
InputStream = uploadChunk,
};
// could call UploadPart on multiple threads, instead of using await, but that would
// cause more data to be loaded into memory, which might be too much
var partResponse = await S3Client.UploadPartAsync(uploadRequest);
completeRequest.AddPartETags(partResponse);
}
partCounter++;
}
var completeResponse = await S3Client.CompleteMultipartUploadAsync(completeRequest);
}
}
catch
{
await S3Client.AbortMultipartUploadAsync(s3Id.BucketName, s3Id.ObjectKey
, multipartResponse.UploadId);
throw;
}
}
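The chunking loop above reads fixed-size blocks from a stream whose total length is unknown. In Java, InputStream.readNBytes(int) (Java 11+) plays the role of BinaryReader.ReadBytes: it keeps reading until it has the requested count or reaches end of stream. This sketch (StreamChunker and PartSink are hypothetical names) hands each block to a callback instead of keeping it, so only one part is in memory at a time; the actual upload call is left out.

```java
import java.io.IOException;
import java.io.InputStream;

class StreamChunker {
    static final int MIN_S3_PART_SIZE = 5 * 1024 * 1024; // S3 minimum for every part except the last

    /** Hypothetical callback that would upload one part. */
    interface PartSink {
        void accept(int partNumber, byte[] data) throws IOException;
    }

    /** Splits a stream of unknown length into parts of partSize bytes; returns the part count. */
    static int forEachPart(InputStream in, int partSize, PartSink sink) throws IOException {
        int partNumber = 0;
        while (true) {
            byte[] part = in.readNBytes(partSize); // up to partSize bytes; shorter only at end of stream
            if (part.length == 0) {
                return partNumber;
            }
            sink.accept(++partNumber, part); // S3 part numbers start at 1
        }
    }
}
```

A 12 MiB stream split at 5 MiB yields parts of 5 MiB, 5 MiB and 2 MiB.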
I tried to develop a servlet that allows users to download a file. The download works, but the file content contains binary garbage and is not human-readable. What could be the reason?
Code
int length = -1, index = 0;
byte[] buffer = null;
String attachmentPath = null, contentType = null, extension = null;
File attachmentFile = null;
BufferedInputStream input = null;
ServletOutputStream output = null;
ServletContext context = null;
attachmentPath = request.getParameter("attachmentPath");
if (attachmentPath != null && !attachmentPath.isEmpty()) {
attachmentFile = new File(attachmentPath);
if (attachmentFile.exists()) {
response.reset();
context = super.getContext();
contentType = context.getMimeType(attachmentFile.getName());
response.setContentType(contentType);
response.addHeader("content-length", String.valueOf(attachmentFile.length()));
response.addHeader("content-disposition", "attachment;filename=" + attachmentFile.getName());
try {
buffer = new byte[AttachmentTask.DEFAULT_BUFFER_SIZE];
input = new BufferedInputStream(new FileInputStream(attachmentFile));
output = response.getOutputStream();
while ((length = input.read(buffer)) != -1) {
output.write(buffer, 0, length);
index += length;
// output.write(length);
}
output.flush();
input.close();
output.close();
} catch (FileNotFoundException exp) {
logger.error(exp.getMessage());
} catch (IOException exp) {
logger.error(exp.getMessage());
}
} else {
try {
response.sendError(HttpServletResponse.SC_NOT_FOUND);
} catch (IOException exp) {
logger.error(exp.getMessage());
}
}
}
Is it related to writing the file in binary or text mode, or to browser settings?
Please help.
Thanks.
The problem is not in the code given so far. You're properly using InputStream/OutputStream instead of a Reader/Writer to stream the file.
The cause of the problem is more likely in the way you created/saved the file. This problem will manifest when you've used a Reader and/or Writer that has not been instructed to use the proper character encoding for the characters being read/written. Perhaps you're creating an upload/download service and the fault was in the upload process itself?
Assuming that the data is in UTF-8, you should have created the reader as follows:
Reader reader = new InputStreamReader(new FileInputStream(file), "UTF-8");
and the writer as follows:
Writer writer = new OutputStreamWriter(new FileOutputStream(file), "UTF-8");
But if you don't actually need to manipulate the stream on a per-character basis and just want to transfer the data unmodified, then you should have used InputStream/OutputStream all along.
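The effect described above is easy to reproduce: decode bytes with one charset, encode them back with another, and the file no longer matches the original, whereas a reader and writer that agree on the charset (or a plain byte copy) leave the bytes intact. A small Java demonstration; the class name and sample text are arbitrary.

```java
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    /** Simulates reading UTF-8 bytes with an ISO-8859-1 Reader and writing them with a UTF-8 Writer. */
    static byte[] wrongCharsetRoundTrip(byte[] utf8Bytes) {
        String misread = new String(utf8Bytes, StandardCharsets.ISO_8859_1); // each byte becomes one char
        return misread.getBytes(StandardCharsets.UTF_8);                    // non-ASCII chars re-encoded as 2 bytes
    }

    /** Simulates a correct round trip where reader and writer both use UTF-8. */
    static byte[] matchingCharsetRoundTrip(byte[] utf8Bytes) {
        String read = new String(utf8Bytes, StandardCharsets.UTF_8);
        return read.getBytes(StandardCharsets.UTF_8);
    }
}
```

For example, "café" is 5 bytes in UTF-8; after the mismatched round trip it becomes the 7 bytes of "cafÃ©", which is exactly the kind of garbage seen in a corrupted download.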
See also:
Unicode - How to get the characters right?
I need to download a file that is larger than 25 MB, but my network only allows requesting up to 25 MB at a time.
I am using the following code:
const long DefaultSize = 26214400;
long Chunk = 26214400;
long offset = 0;
byte[] bytesInStream;
public void Download(string url, string filename)
{
long size = Size(url);
int blocksize = Convert.ToInt32(size / DefaultSize);
int remainder = Convert.ToInt32(size % DefaultSize);
if (remainder > 0) { blocksize++; }
FileStream fileStream = File.Create(@"D:\Download TEST\" + filename);
for (int i = 0; i < blocksize; i++)
{
if (i == blocksize - 1)
{
Chunk = remainder;
}
HttpWebRequest req = (HttpWebRequest)System.Net.WebRequest.Create(url);
req.Method = "GET";
req.AddRange(Convert.ToInt32(offset), Convert.ToInt32(Chunk+offset));
HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
// StreamReader sr = new StreamReader(resp.GetResponseStream());
using (Stream responseStream = resp.GetResponseStream())
{
bytesInStream = new byte[Chunk];
responseStream.Read(bytesInStream, 0, (int)bytesInStream.Length);
// Use FileStream object to write to the specified file
fileStream.Seek((int)offset, SeekOrigin.Begin);
fileStream.Write(bytesInStream,0, bytesInStream.Length);
}
offset += Chunk;
}
fileStream.Close();
}
public long Size(string url)
{
System.Net.WebRequest req = System.Net.HttpWebRequest.Create(url);
req.Method = "HEAD";
System.Net.WebResponse resp = req.GetResponse();
resp.Close();
return resp.ContentLength;
}
It writes the content to disk, but the resulting file does not work.
You should check how many bytes were actually read before writing, something like this (and you don't need to track the offset in order to seek; the file position advances automatically when you write):
int read;
do
{
read = responseStream.Read(bytesInStream, 0, (int)bytesInStream.Length);
if (read > 0)
fileStream.Write(bytesInStream, 0, read);
}
while(read > 0);
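The key point of the loop above is that a single Read call may return fewer bytes than requested, so only the bytes actually read may be written. Here is the same loop in Java, exercised against a hypothetical test stream that deliberately returns short reads the way a network stream can.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class ChunkCopy {
    /** Writes only the bytes each read actually returned; returns the total copied. */
    static long copyAll(InputStream in, OutputStream out, byte[] buffer) throws IOException {
        long total = 0;
        int read;
        // Java's read returns -1 at end of stream, so "> 0" stops there too.
        while ((read = in.read(buffer, 0, buffer.length)) > 0) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    /** Hypothetical test stream that never returns more than maxPerRead bytes per call. */
    static class ShortReadStream extends ByteArrayInputStream {
        private final int maxPerRead;

        ShortReadStream(byte[] data, int maxPerRead) {
            super(data);
            this.maxPerRead = maxPerRead;
        }

        @Override
        public synchronized int read(byte[] b, int off, int len) {
            return super.read(b, off, Math.min(len, maxPerRead));
        }
    }
}
```

Writing the whole buffer instead of only `read` bytes would append stale data after every short read, which corrupts the file in the same way as in the question.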
There are similar SO questions that might help you:
Segmented C# file downloader
and
How to open multiple connections to download single file?
Also this CodeProject article:
http://www.codeproject.com/Tips/307548/Resume-Suppoert-Downloading
Range is zero-based and its upper bound is inclusive, so you should subtract 1 from the upper bound:
request.Headers.Range = new System.Net.Http.Headers.RangeHeaderValue(offset, chunkSize + offset - 1);
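Because HTTP byte ranges are zero-based and inclusive, a chunk of chunkSize bytes starting at offset ends at offset + chunkSize - 1, and the final chunk has to be clipped to the file size. Planning the ranges this way also avoids the question's remainder handling, which sets the last chunk to 0 bytes when the size is an exact multiple of the chunk size. A Java sketch that produces the Range header values (RangePlanner is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.List;

class RangePlanner {
    /** Returns Range header values covering bytes [0, size) in chunks of at most chunkSize bytes. */
    static List<String> ranges(long size, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> result = new ArrayList<>();
        for (long offset = 0; offset < size; offset += chunkSize) {
            long last = Math.min(offset + chunkSize, size) - 1; // inclusive end byte
            result.add("bytes=" + offset + "-" + last);
        }
        return result;
    }
}
```

A 10-byte file in 4-byte chunks gives bytes=0-3, bytes=4-7 and bytes=8-9; an 8-byte file gives exactly two ranges, with no empty trailing chunk.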
I published a correct code fragment at the following link:
https://stackoverflow.com/a/48019611/1099716
Akka.NET Streams can help download a file in small chunks from a System.IO.Stream using multithreading: https://getakka.net/articles/intro/what-is-akka.html
The Download method appends the bytes to the file, starting at the byte offset given by the long fileStart. If the file does not exist, fileStart must be 0.
using System;
using System.IO;
using System.Net;
using System.Threading.Tasks;
using Akka.Actor;
using Akka.IO;
using Akka.Streams;
using Akka.Streams.Dsl;
using Akka.Streams.IO;
private static Sink<ByteString, Task<IOResult>> FileSink(string filename)
{
return Flow.Create<ByteString>()
.ToMaterialized(FileIO.ToFile(new FileInfo(filename), FileMode.Append), Keep.Right);
}
private async Task Download(string path, Uri uri, long fileStart)
{
using (var system = ActorSystem.Create("system"))
using (var materializer = system.Materializer())
{
HttpWebRequest request = WebRequest.Create(uri) as HttpWebRequest;
request.AddRange(fileStart);
using (WebResponse response = request.GetResponse())
{
Stream stream = response.GetResponseStream();
await StreamConverters.FromInputStream(() => stream, chunkSize: 1024)
.RunWith(FileSink(path), materializer);
}
}
}
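The fileStart argument is what makes resuming possible: it should equal the number of bytes already on disk, and the request then asks for everything from that byte onward. A small Java sketch of that calculation; ResumePoint is a hypothetical name, and it assumes the partial file was written sequentially from byte 0.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ResumePoint {
    /** Byte offset to resume from: the current file size, or 0 if nothing was downloaded yet. */
    static long fileStart(Path partialFile) throws IOException {
        return Files.exists(partialFile) ? Files.size(partialFile) : 0L;
    }

    /** Open-ended Range header value requesting every byte from fileStart onward. */
    static String rangeHeader(long fileStart) {
        return "bytes=" + fileStart + "-";
    }
}
```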