I have a working web service which, on load, contacts different websites and scrapes relevant information from them. As the requirements grew, so did the number of HttpWebRequests.
Right now I'm not using any asynchronous requests in the web service, which means the requests are issued one at a time. This has obviously become a burden, as a single request to the web service itself can take up to 2 minutes to complete.
Is there a way to make all these HttpWebRequests inside the web service multi-threaded?
What would be the best way to achieve this?
Thanks!
If you are working with .NET 4 or later, you can use the Parallel class or the Task library, which make this kind of thing easy.
If you call all your web services the same way (assuming they all respect the same WSDL and differ only by URL), you can use something like this:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

namespace ConsoleApplication2
{
    class Program
    {
        private const string StartUrl = @"http://blog.hand-net.com";

        private static void Main()
        {
            var content = DownloadAsString(StartUrl);

            // The "AsParallel" here is the key
            var result = ExtractUrls(content).AsParallel().Select(
                link =>
                {
                    Console.WriteLine("... Fetching {0} started", link);
                    var req = WebRequest.CreateDefault(new Uri(link));
                    var resp = req.GetResponse();
                    var info = new { Link = link, Size = resp.ContentLength };
                    resp.Close();
                    return info;
                }
            );

            foreach (var linkInfo in result)
            {
                Console.WriteLine("Link : {0}", linkInfo.Link);
                Console.WriteLine("Size : {0}", linkInfo.Size);
            }
        }

        private static string DownloadAsString(string url)
        {
            using (var wc = new WebClient())
            {
                return wc.DownloadString(url);
            }
        }

        private static IEnumerable<string> ExtractUrls(string content)
        {
            var regEx = new Regex(@"<a\s+href=""(?<url>.*?)""");
            var matches = regEx.Matches(content);
            return matches.Cast<Match>().Select(m => m.Groups["url"].Value);
        }
    }
}
This small program first downloads an HTML page, then extracts all the href attributes, producing a list of remote files.
The AsParallel here allows the body of the Select to run in parallel.
This code has no error handling or cancellation support, but it illustrates the AsParallel method.
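For completeness, cancellation can be grafted onto the same query with PLINQ's WithCancellation; here is a minimal sketch (the CancellationTokenSource is an assumption, not part of the original sample):

// Sketch: same query as above, with cancellation added. Calling
// cts.Cancel() from another thread aborts the enumeration with an
// OperationCanceledException.
var cts = new CancellationTokenSource();

var result = ExtractUrls(content)
    .AsParallel()
    .WithCancellation(cts.Token)
    .Select(link =>
    {
        var req = WebRequest.CreateDefault(new Uri(link));
        using (var resp = req.GetResponse())
        {
            return new { Link = link, Size = resp.ContentLength };
        }
    });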
If you can't call all your web services the same way, you can also use something like this:
Task.WaitAll(
    Task.Factory.StartNew(() => GetDataFromWebServiceA()),
    Task.Factory.StartNew(() => GetDataFromWebServiceB()),
    Task.Factory.StartNew(() => GetDataFromWebServiceC()),
    Task.Factory.StartNew(() => GetDataFromWebServiceD())
);
This code adds 4 tasks that will run "when possible". The WaitAll method simply waits for all the tasks to complete before returning.
By "when possible" I mean when a worker thread in the thread pool is free. When using the Task library, tasks are scheduled on the shared thread pool, which by default keeps roughly one worker thread per processor core. So if you have 100 tasks, the 100 tasks will be processed by about 4 worker threads on a 4-core computer.
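To make the scheduling concrete, here is a hedged sketch that queues one task per URL; GetDataFromService is a hypothetical placeholder for your per-site scrape:

// Sketch: queue one task per URL onto the shared thread pool and wait.
// With the default scheduler the pool starts with roughly one worker
// per core and grows as needed, so 100 tasks do not mean 100 threads.
var tasks = urls
    .Select(url => Task.Factory.StartNew(() => GetDataFromService(url)))
    .ToArray();

Task.WaitAll(tasks);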
I found an interesting difference between the .NET Framework's HttpClient class/objects and the VS 2013 project PhotoServer (DLL) class/objects. It made me wonder if there's a bug in the script.
I'm using .NET Framework v4.5.1.
I'm using the HttpClient script in a synchronous web generic handler. Note that I'm using .Result on the asynchronous POST to wait for the response. The HttpClient version, which works, is:
using (var httpClient = new HttpClient())
{
    var response = httpClient.PostAsync(
        _baseUrl,
        new FormUrlEncodedContent
        (
            new List<KeyValuePair<string, string>>
            {
                new KeyValuePair<string, string>("Vin", parmVin),
                new KeyValuePair<string, string>("ImageSize", parmImageSize)
            }.ToArray()
        )
    ).Result;

    //returned string[] datatype...
    var photoUrls = response.Content.ReadAsStringAsync().Result;
}
I'm using the "GetPhotoUrlsAsync" script in the sychronous Web Generic Handler. This "GetPhotoUrlsAsync" object comes from the Project class (DLL). Again, I'm using the ".Result" and it doesn't work, it just deadlocked and hung. What I wanna know is why is that and was there a bug with the script?
//[Scripts in Web Generic Handlers]...
var managerVehiclePhoto = new ManagerVehiclePhoto();
var photoUrls = managerVehiclePhoto.GetPhotoUrlsAsync("12345678901234567").Result;

//[Project Class]...
namespace BIO.Dealer.Integration.PhotoServer
{
    public sealed class VehiclePhotoManager
    {
        public async Task<string[]> GetPhotoUrlsAsync(string vin)
        {
            var listResponse = await _client.ListAsync(vin);
            return listResponse.ToArray();
        }
    }
}
Thanks...
Edit #1
//Synchronous API Call...
public string[] GetPhotoUrls(string vin)
{
    return GetPhotoUrlsAsync(vin).Result;
}
Using .Result like this is actually a bug in both cases; it just happens not to deadlock in the HttpClient case. Note that the same HttpClient library on other platforms (notably Windows Phone, IIRC) will deadlock if used like this.
I describe the deadlock in detail on my blog, but the gist of it is this:
There's an ASP.NET "request context" that is captured by default every time you use await. When the async method resumes, it will resume within that context. However, types such as HttpContext are not multithread-safe, so ASP.NET restricts that context to one thread at a time. So if you block a thread by calling .Result, it's blocking a thread inside that context.
The reason GetPhotoUrlsAsync deadlocks is because it's an async method that is attempting to resume inside that context, but there is already a thread blocked in that context. The reason HttpClient happens to work is because GetAsync etc. are not actually async methods (note that this is an implementation detail and you should not depend on this behavior).
The best way to fix this is to replace .Result with await:
var managerVehiclePhoto = new ManagerVehiclePhoto();
var photoUrls = await managerVehiclePhoto.GetPhotoUrlsAsync("12345678901234567");
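If making every caller async isn't feasible right away, a common partial mitigation, sketched below, is to stop the library method from capturing the request context with ConfigureAwait(false); awaiting all the way up remains the preferred fix:

public async Task<string[]> GetPhotoUrlsAsync(string vin)
{
    // ConfigureAwait(false) means the continuation does not resume on
    // the captured ASP.NET request context, so a caller blocking with
    // .Result no longer deadlocks this method.
    var listResponse = await _client.ListAsync(vin).ConfigureAwait(false);
    return listResponse.ToArray();
}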
Me again (re: the same quest for WebPages async) :) That said, I'm one of those whom Stephen Cleary identifies as "trying to migrate into async", so all of this is (still) a learning moment.
The issue is the SynchronizationContext in GUI/ASP.NET applications. I won't mangle Stephen's explanation, so the link is your best bet to grok it.
Given the best practices in that article, here's my approach (and consequently what I use in WebPages for the "top-level call") to "mock" awaiting a PostAsync call like the one you're making. In this case I'm using ConfigureAwait (the call is from WebPages, not MVC):
public static async Task<string> PostToRequestBin()
{
    var _strContent = new FormUrlEncodedContent(new[] { new KeyValuePair<string, string>("fizz", "buzz") });
    using (var client = new HttpClient())
    {
        /*
         * See http://requestb.in/ for usage
         */
        var result = await client.PostAsync("http://requestb.in/xyzblah", _strContent).ConfigureAwait(false);
        return await result.Content.ReadAsStringAsync();
    }
}
In my WebPages page:
@{
    //Class2 is a mock "library" in App_Code where the above async code lives
    var postcatcher = Class2.PostToRequestBin();
}
And to make use of it somewhere in the page (where I use Task<string>.Result):
<p>@postcatcher.Result</p>
Again, this is a learning moment and I hope it helps/guides you. I fully expect the SO community to comment on and/or correct/improve this, e.g.:
"Why don't I have to ConfigureAwait on ReadAsStringAsync" (it works either way)?
because at this point, it's "async all the way". I could have awaited some other async method...
...so the learning moments continue :)
I have been experimenting with a lightweight solution for handling my business logic. It consists of a vanilla ADO.NET connection that is extended with Dapper and monitored by Glimpse.ADO. The use case for this setup is a web application that has to process a handful of queries asynchronously per request. Below is a simple implementation of my setup in an MVC controller.
public class CatsAndDogsController : Controller
{
    public async Task<ActionResult> Index()
    {
        var fetchCatsTask = FetchCats(42);
        var fetchDogsTask = FetchDogs(true);
        await Task.WhenAll(fetchCatsTask, fetchDogsTask);
        ViewBag.Cats = fetchCatsTask.Result;
        ViewBag.Dogs = fetchDogsTask.Result;
        return View();
    }

    public async Task<IEnumerable<Cat>> FetchCats(int breedId)
    {
        IEnumerable<Cat> result = null;
        using (var connection = CreateAdoConnection())
        {
            await connection.OpenAsync();
            result = await connection.QueryAsync<Cat>("SELECT * FROM Cat WHERE BreedId = @bid;", new { bid = breedId });
            connection.Close();
        }
        return result;
    }

    public async Task<IEnumerable<Dog>> FetchDogs(bool isMale)
    {
        IEnumerable<Dog> result = null;
        using (var connection = CreateAdoConnection())
        {
            await connection.OpenAsync();
            result = await connection.QueryAsync<Dog>("SELECT * FROM Dog WHERE IsMale = @im;", new { im = isMale });
            connection.Close();
        }
        return result;
    }

    public System.Data.Common.DbConnection CreateAdoConnection()
    {
        var sqlClientProviderFactory = System.Data.Common.DbProviderFactories.GetFactory("System.Data.SqlClient");
        var dbConnection = sqlClientProviderFactory.CreateConnection();
        dbConnection.ConnectionString = "SomeConnectionStringToAwesomeData";
        return dbConnection;
    }
}
I have some questions concerning the creation of the connection in the CreateAdoConnection() method. I assume the following is happening behind the scenes.
The call to sqlClientProviderFactory.CreateConnection() returns an instance of System.Data.SqlClient.SqlConnection passed as a System.Data.Common.DbConnection. At this point Glimpse.ADO.AlternateType.GlimpseDbProviderFactory kicks in and wraps this connection in an instance of Glimpse.Ado.AlternateType.GlimpseDbConnection, which is also passed as a System.Data.Common.DbConnection. Finally, this connection is indirectly extended by the Dapper library with its query methods, among them the QueryAsync<>() method used to fetch the cats and dogs.
The questions:
Is the above assumption correct?
If I use Dapper's async methods with this connection - or create a System.Data.Common.DbCommand with this connection's CreateCommand() method and use its async methods - will those calls internally always end up using the vanilla async implementations of these methods as Microsoft wrote them for System.Data.SqlClient.SqlConnection and System.Data.SqlClient.SqlCommand? And not some other implementations of these methods that actually block?
How much perf do I lose with this setup compared to just returning a new System.Data.SqlClient.SqlConnection directly? (So, without the Glimpse.ADO wrapper)
Any suggestions on improving this setup?
Yes, pretty much. GlimpseDbProviderFactory wraps/decorates/proxies all the registered factories. We then pass any calls we get through to the factory we wrap (in this case SQL Server). In the case of CreateConnection(), we ask the inner factory to create a connection; when we get that connection, we wrap it and return it to the originating caller.
Yes. Glimpse doesn't turn what was an async request into a blocking request. We preserve the async chain all the way through. If you are interested, the code in question is here.
Very little. In essence, using a decorator pattern like this adds only one or two frames to the call stack. Compared to most operations performed during the request lifecycle, the time needed to observe what's happening here is extremely minimal.
What you have looks great. The only suggestion is to maybe use this code to build the factory. This means you can shift your connection string, etc., to the web.config.
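For illustration, a minimal sketch of what building the factory from web.config could look like (the connection string name "AwesomeData" is an assumption):

using System.Configuration;
using System.Data.Common;

public static DbConnection CreateAdoConnection()
{
    // Assumes web.config contains something like:
    // <connectionStrings>
    //   <add name="AwesomeData"
    //        providerName="System.Data.SqlClient"
    //        connectionString="SomeConnectionStringToAwesomeData" />
    // </connectionStrings>
    var settings = ConfigurationManager.ConnectionStrings["AwesomeData"];
    var factory = DbProviderFactories.GetFactory(settings.ProviderName);
    var connection = factory.CreateConnection();
    connection.ConnectionString = settings.ConnectionString;
    return connection;
}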
I'm hoping to use SignalR to provide updates to the client. The updates are going to come from a message table which is updated when things happen across the application.
My problem is that the application will have around 500-600 concurrent users, and I can't have all of them holding a connection to the database and constantly polling against the table.
What I'd like to do is have a single thing(?) polling the table and then updating the hubs, rather than each connection polling. I was thinking of using a singleton for this? So maybe when the application starts, something is created that will then do all the work, really.
My question is: say I had a singleton with an event which was fired every time there was an update, what would the performance be like for, say, 500 controllers subscribing to this event?
Also, if there is a better way to do this then please say; this is my first and only idea, sadly!
Any help would be fantastic!
EDIT: the data is being provided by a legacy application and I have no control over how the data is entered, so database polling will be needed.
ste.
I'd rather not poll the database, as it would be wasteful. I would approach this problem by opening only one single point of entry for my data (an HTTP API, etc.) and then broadcasting each update to all connected clients through the SignalR hub. Brad Wilson has a super cool presentation which demonstrates this approach:
Brad Wilson - Microsoft’s Modern Web Stack, Starring ASP.NET Web API
Here is a code sample for this approach which uses ASP.NET Web API for data entry. It uses an in-memory collection for the data store, but the storage technique is not the concern here:
// This hub has no inbound APIs, since all inbound communication is done
// via the HTTP API. It's here for clients which want to get continuous
// notification of changes to the ToDo database.
[HubName("todo")]
public class ToDoHub : Hub { }

public abstract class ApiControllerWithHub<THub> : ApiController
    where THub : IHub {

    Lazy<IHubContext> hub = new Lazy<IHubContext>(
        () => GlobalHost.ConnectionManager.GetHubContext<THub>()
    );

    protected IHubContext Hub {
        get { return hub.Value; }
    }
}

public class ToDoController : ApiControllerWithHub<ToDoHub> {
    private static List<ToDoItem> db = new List<ToDoItem> {
        new ToDoItem { ID = 0, Title = "Do a silly demo on-stage at NDC" },
        new ToDoItem { ID = 1, Title = "Wash the car" },
        new ToDoItem { ID = 2, Title = "Get a haircut", Finished = true }
    };
    private static int lastId = db.Max(tdi => tdi.ID);

    // Lines removed for brevity

    public HttpResponseMessage PostNewToDoItem(ToDoItem item) {
        lock (db) {
            // Add item to the "database"
            item.ID = Interlocked.Increment(ref lastId);
            db.Add(item);

            // Notify the connected clients
            Hub.Clients.addItem(item);

            // Return the new item, inside a 201 response
            var response = Request.CreateResponse(HttpStatusCode.Created, item);
            string link = Url.Link("apiRoute", new { controller = "todo", id = item.ID });
            response.Headers.Location = new Uri(link);
            return response;
        }
    }

    // Lines removed for brevity
}
The full source code for the application which Brad demoed is also available: https://github.com/bradwilson/ndc2012.
The other option, which you don't prefer, is to make your database fire notifications as soon as data is changed; then you can pick those up and broadcast them through SignalR. Here is an example:
Database Change Notifications in ASP.NET using SignalR and SqlDependency
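For reference, a rough sketch of that SqlDependency approach (assuming SignalR 2; the Message table, its columns, and the MessageHub class are hypothetical). SqlDependency requires Service Broker to be enabled on the database, and the query must follow the notification rules (explicit column list, two-part table names):

using System.Data.SqlClient;
using Microsoft.AspNet.SignalR;

public class MessageHub : Hub { }

public class MessageWatcher
{
    private readonly string _connectionString;

    public MessageWatcher(string connectionString)
    {
        _connectionString = connectionString;
        SqlDependency.Start(_connectionString); // once per application start
    }

    public void Listen()
    {
        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand(
            "SELECT MessageId, Body FROM dbo.Message", connection))
        {
            var dependency = new SqlDependency(command);
            dependency.OnChange += (sender, e) =>
            {
                Listen(); // notifications are one-shot: re-subscribe first
                GlobalHost.ConnectionManager
                          .GetHubContext<MessageHub>()
                          .Clients.All.messagesChanged();
            };

            connection.Open();
            // The command must actually execute for the subscription to arm.
            using (command.ExecuteReader()) { }
        }
    }
}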
Sorry that this solution is not SignalR, but maybe you can get ideas from it.
Here is the full example for download on GitHub
I'm working on a web service using ASP.NET MVC's new WebAPI that will serve up binary files, mostly .cab and .exe files.
The following controller method seems to work, meaning that it returns a file, but it's setting the content type to application/json:
public HttpResponseMessage<Stream> Post(string version, string environment, string filetype)
{
    var path = @"C:\Temp\test.exe";
    var stream = new FileStream(path, FileMode.Open);
    return new HttpResponseMessage<Stream>(stream, new MediaTypeHeaderValue("application/octet-stream"));
}
Is there a better way to do this?
Try using a simple HttpResponseMessage with its Content property set to a StreamContent:
// using System.IO;
// using System.Net.Http;
// using System.Net.Http.Headers;

public HttpResponseMessage Post(string version, string environment,
                                string filetype)
{
    var path = @"C:\Temp\test.exe";
    HttpResponseMessage result = new HttpResponseMessage(HttpStatusCode.OK);
    var stream = new FileStream(path, FileMode.Open, FileAccess.Read);
    result.Content = new StreamContent(stream);
    result.Content.Headers.ContentType =
        new MediaTypeHeaderValue("application/octet-stream");
    return result;
}
A few things to note about the stream used:
You must not call stream.Dispose(), since Web API still needs to be able to access it when it processes the controller method's result to send data back to the client. Therefore, do not use a using (var stream = …) block. Web API will dispose the stream for you.
Make sure that the stream has its current position set to 0 (i.e. the beginning of the stream's data). In the above example, this is a given since you've only just opened the file. However, in other scenarios (such as when you first write some binary data to a MemoryStream), make sure to call stream.Seek(0, SeekOrigin.Begin); or set stream.Position = 0; (see the sketch after these notes).
With file streams, explicitly specifying FileAccess.Read permission can help prevent access rights issues on web servers; IIS application pool accounts are often given only read / list / execute access rights to wwwroot.
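To illustrate the second note, here is a hedged sketch of serving content built up in a MemoryStream, where forgetting to rewind would produce an empty response; the method name and payload are placeholders:

public HttpResponseMessage GetGenerated()
{
    // Build some content in memory; the payload here is a stand-in.
    var stream = new MemoryStream();
    var writer = new StreamWriter(stream);
    writer.Write("generated payload");
    writer.Flush();

    // Rewind: StreamContent reads from the current position, which is
    // now at the end of what was just written.
    stream.Position = 0;

    var result = new HttpResponseMessage(HttpStatusCode.OK)
    {
        Content = new StreamContent(stream)
    };
    result.Content.Headers.ContentType =
        new MediaTypeHeaderValue("application/octet-stream");
    return result;
}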
For Web API 2, you can implement IHttpActionResult. Here's mine:
using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading;
using System.Threading.Tasks;
using System.Web;
using System.Web.Http;
class FileResult : IHttpActionResult
{
    private readonly string _filePath;
    private readonly string _contentType;

    public FileResult(string filePath, string contentType = null)
    {
        if (filePath == null) throw new ArgumentNullException("filePath");
        _filePath = filePath;
        _contentType = contentType;
    }

    public Task<HttpResponseMessage> ExecuteAsync(CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StreamContent(File.OpenRead(_filePath))
        };

        var contentType = _contentType ?? MimeMapping.GetMimeMapping(Path.GetExtension(_filePath));
        response.Content.Headers.ContentType = new MediaTypeHeaderValue(contentType);

        return Task.FromResult(response);
    }
}
Then something like this in your controller:
[Route("Images/{*imagePath}")]
public IHttpActionResult GetImage(string imagePath)
{
    var serverPath = Path.Combine(_rootPath, imagePath);
    var fileInfo = new FileInfo(serverPath);

    return !fileInfo.Exists
        ? (IHttpActionResult)NotFound()
        : new FileResult(fileInfo.FullName);
}
And here's one way you can tell IIS to ignore requests with an extension so that the request will make it to the controller:
<!-- web.config -->
<system.webServer>
  <modules runAllManagedModulesForAllRequests="true"/>
</system.webServer>
For those using .NET Core:
You can make use of the IActionResult interface in an API controller method, like so.
[HttpGet("GetReportData/{year}")]
public async Task<IActionResult> GetReportData(int year)
{
    // Render Excel document in memory and return as Byte[]
    Byte[] file = await this._reportDao.RenderReportAsExcel(year);

    return File(file, "application/vnd.openxmlformats", "fileName.xlsx");
}
This example is simplified, but should get the point across. In .NET Core this process is much simpler than in previous versions of .NET, i.e. no setting of response type, content, headers, etc.
Also, of course, the MIME type for the file and the extension will depend on individual needs.
Reference: SO post answer by @NKosi
While the suggested solution works fine, there is another way: return a byte array from the controller, with the response stream properly formatted.
In the request, set the header "Accept: application/octet-stream".
Server-side, add a media type formatter to support this MIME type.
Unfortunately, Web API does not include any formatter for "application/octet-stream". There is an implementation here on GitHub: BinaryMediaTypeFormatter (with minor adaptations to make it work for Web API 2, as method signatures changed).
You can add this formatter into your global config :
HttpConfiguration config;
// ...
config.Formatters.Add(new BinaryMediaTypeFormatter(false));
WebApi should now use BinaryMediaTypeFormatter if the request specifies the correct Accept header.
I prefer this solution because a controller action returning byte[] is more comfortable to test. That said, the other solution allows you more control if you want to return a content type other than "application/octet-stream" (for example, "image/gif").
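For illustration, a sketch of the kind of action this enables once the formatter is registered (the file path is a placeholder); a unit test can call it directly and assert on the returned bytes, with no HttpResponseMessage plumbing:

// With BinaryMediaTypeFormatter registered, the action just returns
// the bytes; the formatter writes the response body when the client
// sends "Accept: application/octet-stream".
public byte[] Get(string version, string environment, string filetype)
{
    return File.ReadAllBytes(@"C:\Temp\test.exe");
}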
For anyone having the problem of the API being called more than once while downloading a fairly large file using the method in the accepted answer, set response buffering to true:
System.Web.HttpContext.Current.Response.Buffer = true;
This makes sure that the entire binary content is buffered on the server side before it is sent to the client. Otherwise you will see multiple requests being sent to the controller, and if you do not handle them properly the file will become corrupt.
The overload that you're using sets the enumeration of serialization formatters. You need to specify the content type explicitly like:
httpResponseMessage.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
You could try
httpResponseMessage.Content.Headers.Add("Content-Type", "application/octet-stream");
Is there any quick way/trick to delete around 85K entries from the workflow process history? Trying from the GUI gives a storage issue, and resolving that issue requires bouncing the box.
Trying the PowerTool also crashes after a long time, so I thought I'd ask the wider community. I appreciate your thoughts.
Thanks
Vin
Which version of Tridion? 2011?
You could probably get away with a CoreService client app that does this regularly for you. By "PowerTool" I assume you mean the Purge tool?
Also - I would likely contact Customer Support about the errors you see; it doesn't seem like using the GUI or the Purge Tool should fail.
If you're on 2011 SP1 you could use the following code:
using System;
using System.ServiceModel;
using System.Xml;
using Tridion.ContentManager.CoreService.Client;

namespace DeleteWorkflowHistory
{
    class Program
    {
        private const string NetTcpEndpoint =
            "net.tcp://localhost:2660/CoreService/2011/netTcp";

        private static readonly EndpointAddress EndpointAddress =
            new EndpointAddress(NetTcpEndpoint);

        static void Main(string[] args)
        {
            var binding = new NetTcpBinding
            {
                MaxReceivedMessageSize = 2147483647
            };
            var quota = new XmlDictionaryReaderQuotas
            {
                MaxStringContentLength = 2147483647,
                MaxArrayLength = 2147483647
            };
            binding.ReaderQuotas = quota;

            var client = new SessionAwareCoreServiceClient(binding, EndpointAddress);
            Log("Connected to Tridion Content Manager version: " + client.GetApiVersion());

            ProcessesFilterData filter = new ProcessesFilterData
            {
                BaseColumns = ListBaseColumns.IdAndTitle,
                ProcessType = ProcessType.Historical
            };

            foreach (IdentifiableObjectData data in client.GetSystemWideList(filter))
            {
                var processHistory = data as ProcessHistoryData;
                if (processHistory != null)
                {
                    Log("Deleting history: " + processHistory.Id + " / " + processHistory.Title);
                    client.Delete(processHistory.Id);
                }
            }
            client.Close();
        }

        private static void Log(string message)
        {
            Console.WriteLine(string.Format("[{0}] {1}", DateTime.Now.ToString("HH:mm:ss.fff"), message));
        }
    }
}
If you can't use the Core Service, have a look at this blog entry, which describes using PowerShell to force workflow processes to complete. With some very minor modifications, the same technique would work for deleting workflow processes.