Optimizing a set 20 webrequests with threads

Optimizing a set 20 webrequests with threads - asp.net

This is for ASP.NET. I want to improve the time it takes run my function, today it takes around 20-30 seconds, more towards 30secs than 20secs though. That's running on one thread making 20 webrequests.
I'm thinking threads that do all the 20 webreqeusts, in order to quickly find the result or just go through the data (IE do all the 20 requests not finding anything).
Here's how it works.
1. I'm using html agility pack to fetch htmldocuments. 2. Then I parse them for information 3. Lastly I add that information to a dictionary OR I move on to the next webrequest until I reach 20 requests made.
I make at most 20 webRequests, at minimum 1. I have set the function to end when the info I'm searching for is found. Sometimes the info isn't there hence the 20 webrequests(it goes through all the data).
Every webrequest adds between 5-20 entries to the dictionary. This is then compared with the information I sent to it, if it's in the list I get the Key back, otherwise it returns 201. If found it gets added to the database.
QUESTIONS
*A:*If I want to do this with threads, how many should I create? 20 One for each request and let them all loose to do the job? Or should i create like 4 of them making at most 5 requests each?B: What if two threads are finished at the same time and wants to add info to the directory, can it lock the whole site(I'm using ASP.NET), or will it try to add one from thread A and then one result from Thread B? I have a check already today that checks if the key exists before adding it.
C:What would be the fastest way to this?
This is my code, depicting the loop which just shows that 20 requests are being made?
public void FetchAndParseAllPages()
{
int _maxSearchDepth = 200;
int _searchIncrement = 10;
PageFetcher fetcher = new PageFetcher();
for (int i = 0; i < _maxSearchDepth; i += _searchIncrement)
{
string keywordNsearch = _keyword + i;
ParseHtmldocuments(fetcher.GetWebpage(keywordNsearch));
if (GetPostion() != 201)
{ //ADD DATA TO DATABASE
InsertRankingData(DocParser.GetSearchResults(), _theSearchedKeyword);
return;
}
}
}

.NET allows only 2 requests open at the same time. If you want more than that, you need to configure it in web.config. Look here: http://msdn.microsoft.com/en-us/library/aa480507.aspx
You can the Parallel.For method which is very straightforward and handles the "how much threads" for you. Of course you can tweak it to set how much threads (or tasks) you want with ParallelOptions. Look here: http://msdn.microsoft.com/en-us/library/dd781401.aspx
For making a thread-safe dictionary you can use the ConcurrentDictionary. Look here: http://msdn.microsoft.com/en-us/library/dd287191.aspx

Related

How to make Kaa log upload event based instead of time based

I've only recently started to work with KaaIoT and I am wondering if there is another way to store a log bucked to the server.
/* some headers */
static void main_callback(void *context)
{
kaa_user_log_record_t *log_record = kaa_logging_time_collection_create();
log_record->test_time = kaa_string_copy_create("some_time");
kaa_logging_add_record(kaa_client_get_context(context)->log_collector, log_record, NULL);
}
/* some other configuration */
error = kaa_client_start(kaa_client, main_callback, kaa_client, 5);
When I execute this code, the string "some_time" will be stored to the server every 5 seconds.
I was wondering if there was an other way to do this, like upload the log to the server when I press my 'enter' key? But I can't seem to find a command for this.

To my understanding kaa_logging_add_record, just add the record to the storing bucket waiting to be sent according to the logging strategy you have defined. (https://kaaproject.github.io/kaa/autogen-docs/client-c/v0.10.0/kaa__logging_8h.html#af0fadc09a50f5e38603271a08c581417) . The parameter 5 sec in kaa_client_start is only a delay to cycle the call back function. If you want to register an event, first you have to store it in the log bucket and the timestamp if you want to record at what time happened. If you want to notify at the moment, the I think you should use Notifications or Events. I am also scratching my head in something similar and I wonder if there is a better way.

How to create an alert to notify an user when some amount % of threshold reached DailyAsyncApex Executions

On 2 occasions in the past month, we have managed to hit our daily limit on asynchronous apex executions. Salesforce temporarily increased our limit to 425000 but it will be scaled down to 250000 in a week's time. Once we reach the limit, a lot of the SF functions will fail and this has tremendously impacted both internal staff and external customers.
So to prevent this from happening in the future, we need to create some kind of alert in Salesforce to monitor our daily asynchronous apex method executions. Our maximum daily limit is 250000. The alert will need to create a P3 helpdesk ticket and notify couple of users say USER A and USER B once it reaches 70% threshold.
Kindly advise what is possible to achieve the same
Thanks & Regards,
Harjeet

There's a promising Limits method but it doesn't seem to work currently ("reserved for future use"): System.debug(Limits.getAsyncCalls() + ' / ' + Limits.getLimitAsyncCalls());
There's an idea you can upvote: https://success.salesforce.com/ideaView?id=0873A0000003VIFQA2 ;)
You could query SELECT COUNT() FROM AsyncApexJob WHERE ... but that sounds like a bad idea ;)
I think your best course of action is to use SF REST API. There's a "limits" resource you can fetch. You could do it from SF itself (bad idea because if you'd schedule it to run every hour then well, of course it will contribute to the limit consumption too ;)) or from some external app that'd connect to your SF...
https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/dome_limits.htm
You can quickly try it out for example in workbench.developerforce.com before you decide you do want to deep dive into coding it.
Of course if you have control over your batch jobs, queuable, schedulable & #future calls you could implement some rough counter of executions in a helper object for example... won't help you much if most of the jobs are coming from managed packages though...
Got 1 more idea but it's pretty hardcore - you should be able to make a REST API call from javascript. so you could create a simple VF page (even without any apex controller), put JS callout on it, have it check every 5 mins and do something if threshold is hit... But that means IT person would have to have this page open all the time (perhaps as a home page component)... Messy :)

I was having the exact same issue so I created a simple JsForce script in NodeJS to monitor the call to the /limits endpoint.
You can connect a Free Monitoring service like UpTimerobot.com or PingDom.com and get an email when you find the Word "Warning" >50% or "Error" > 80%.
async function getSfLimits() {
try {
//Let's login into salesforce
const login = await conn.login(SF_USERNAME, SF_PASSWORD+SF_SECURITY_TOKEN);
//Call the API
const sfLimits = await conn.requestGet('/services/data/v51.0/limits');
return sfLimits;
} catch(err) {
console.log(err);
}
}
https://github.com/carlosdevia/salesforcelimits

System.Io.Directory::GetFiles() Polling from AX 2009, Only Seeing New Files Every 10s

I wrote code in AX 2009 to poll a directory on a network drive, every 1 second, waiting for a response file from another system. I noticed that using a file explorer window, I could see the file appear, yet my code was not seeing and processing the file for several seconds - up to 9 seconds (and 9 polls) after the file appeared!
The AX code calls System.IO.Directory::GetFiles() using ClrInterop:
interopPerm = new InteropPermission(InteropKind::ClrInterop);
interopPerm.assert();
files = System.IO.Directory::GetFiles(#POLLDIR,'*.csv');
// etc...
CodeAccessPermission::revertAssert();
After much experimentation, it emerges that the first time in my program's lifetime, that I call ::GetFiles(), it starts a notional "ticking clock" with a period of 10 seconds. Only calls every 10 seconds find any new files that may have appeared, though they do still report files that were found on an earlier 10s "tick" since the first call to ::GetFiles().
If, when I start the program, the file is not there, then all the other calls to ::GetFiles(), 1 second after the first call, 2 seconds after, etc., up to 9 seconds after, simply do not see the file, even though it may have sitting there since 0.5s after the first call!
Then, reliably, and repeatably, the call 10s after the first call, will find the file. Then no calls from 11s to 19s will see any new file that might have appeared, yet the call 20s after the first call, will reliably see any new files. And so on, every 10 seconds.
Further investigation revealed that if the polled directory is on the AX AOS machine, this does not happen, and the file is found immediately, as one would expect, on the call after the file appears in the directory.
But this figure of 10s is reliable and repeatable, no matter what network drive I poll, no matter what server it's on.
Our network certainly doesn't have 10s of latency to see files; as I said, a file explorer window on the polled directory sees the file immediately.
What is going on?

Sounds like your issue is due to SMB caching - from this technet page:
Name, type, and ID
Directory Cache [DWORD] DirectoryCacheLifetime
Registry key the cache setting is controlled by
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Lanmanworkstation\Parameters
This is a cache of recent directory enumerations performed by the
client. Subsequent enumeration requests made by client applications as
well as metadata queries for files in the directory can be satisfied
from the cache. The client also uses the directory cache to determine
the presence or absence of a file in the directory and uses that
information to prevent clients from repeatedly attempting to open
files which are known not to exist on the server. This cache is likely
to affect distributed applications running on multiple computers
accessing a set of files on a server – where the applications use an
out of band mechanism to signal each other about
modification/addition/deletion of files on the server.
In short try to set the registry key
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Lanmanworkstation\Parameters\DirectoryCacheLifetime
to 0

Thanks to #Jan B. Kjeldsen , I have been able to solve my problem using FileSystemWatcher. Here is my implementation in X++ :
class SelTestThreadDirPolling
{
}
public server static Container SetStaticFileWatcher(str _dirPath,str _filenamePattern,int _timeoutMs)
{
InteropPermission interopPerm;
System.IO.FileSystemWatcher fw;
System.IO.WatcherChangeTypes watcherChangeType;
System.IO.WaitForChangedResult res;
Container cont;
str fileName;
str oldFileName;
str changeType;
;
interopPerm = new InteropPermission(InteropKind::ClrInterop);
interopPerm.assert();
fw = new System.IO.FileSystemWatcher();
fw.set_Path(_dirPath);
fw.set_IncludeSubdirectories(false);
fw.set_Filter(_filenamePattern);
watcherChangeType = ClrInterop::parseClrEnum('System.IO.WatcherChangeTypes', 'Created');
res = fw.WaitForChanged(watcherChangeType,_timeoutMs);
if (res.get_TimedOut()) return conNull();
fileName = res.get_Name();
//ChangeTypeName can be: Created, Deleted, Renamed and Changed
changeType = System.Enum::GetName(watcherChangeType.GetType(), res.get_ChangeType());
fw.Dispose();
CodeAccessPermission::revertAssert();
if (changeType == 'Renamed') oldFileName = res.get_OldName();
cont += fileName;
cont += changeType;
cont += oldFileName;
return cont;
}
void waitFileSystemWatcher(str _dirPath,str _filenamePattern,int _timeoutMs)
{
container cResult;
str filename,changeType,oldFilename;
;
cResult=SelTestThreadDirPolling::SetStaticFileWatcher(_dirPath,_filenamePattern,_timeoutMs);
if (cResult)
{
[filename,changeType,oldFilename]=cResult;
info(strfmt("filename=%1, changeType=%2, oldFilename=%3",filename,changeType,oldFilename));
}
else
{
info("TIMED OUT");
}
}
void run()
{;
this.waitFileSystemWatcher(#'\\myserver\mydir','filepattern*.csv',10000);
}
I should acknowledge the following for forming the basis of my X++ implementation:
https://blogs.msdn.microsoft.com/floditt/2008/09/01/how-to-implement-filesystemwatcher-with-x/

I would guess DAXaholic's answer is correct, but you could try other solutions like EnumerateFiles.
In your case I would rather wait for the files rather than poll for the files.
Using FileSystemWatcher there will be a minimal delay from file creation till your process wakes up. It is more tricky to use, but avoiding polling is a good thing. I have never used it over a network.

Memory leak while sending response from rebus handler

I saw a very strange behavior in my rebus handler which is self hosted in exe. Right after sending response using bus.send method it adds up some memory consumed by process. I tried to look up object graph using memory profile and found that rebus is holding response message in serialized format somewhere.
Object graph was showing below hierarchy to the root.
System.Message --> CachedBodyMessage --> stream
Give me some pointers if anybody is aware of this thing.

I understand that a memory leak is a grave concern, but my belief is that it is unlikely that Rebus should contain a memory leak.
This belief is rooted in the fact that I have been running Windows Service-hosted Rebus endpoints in production for 1,5 years now, and several of them (e.g. the timeout managers) have sometimes been running for several months without being restarted.
I'd like to be absolutely bulletproof sure though, so I'm willing to investigate the issue you're reporting.
You're mentioning "CachedBodyMessage" - judging by the names of fields inside System.Messaging.Message, it sounds like it's something within MSMQ. To try to reproduce your issue, I coded the following test:
[Test, Ignore("Only works in RELEASE mode because otherwise object references are held on to for the duration of the method")]
public void DoesNotLeakMessages()
{
// arrange
const string inputQueueName = "test.leak.input";
var queue = new MsmqMessageQueue(inputQueueName);
disposables.Add(queue);
var body = Encoding.UTF8.GetBytes(new string('*', 32768));
var message = new TransportMessageToSend
{
Headers = new Dictionary<string, object> { { Headers.MessageId, "msg-1" } },
Body = body
};
var weakMessageRef = new WeakReference(message);
var weakBodyRef = new WeakReference(body);
// act
queue.Send(inputQueueName, message, new NoTransaction());
message = null;
body = null;
GC.Collect();
GC.WaitForPendingFinalizers();
// assert
Assert.That(weakMessageRef.IsAlive, Is.False, "Expected the message to have been collected");
Assert.That(weakBodyRef.IsAlive, Is.False, "Expected the body bytes to have been collected");
}
which verifies that the sent transport message is collected as it should (will only do this in RELEASE mode though, because of the way DEBUG mode holds on to object references within scope)
I'll try and run the TimePrinter sample now and leave it running for a while to see if I can reproduce the issue. If you stumble upon more information about e.g. exactly which objects are leaking, it would be very helpful.
Thanks again for taking the time to report your worries to me :)
Followup:
I've modified the TimePrinter sample so that it sends 50 msg/s and includes a 64 KB random string payload with each message, and I've tracked the memory usage for almost four hours now. As you can see, it does not look like memory is being leaked.
I'll leave it running the rest of the day, just to be sure.
Maybe you can tell me some more about why you suspected there was a memory leak in the first place?
Update:
As you can see from the trace, it has now been running for 7 hours and thus more than 1,200,000 messages containing more than 70 GB of data has been sent and consumed by the same process. If cached message bodies were leaking, I am pretty sure that we would have been able to see something rising on the graph.

Flex slow first Http request

When i use loader.load(request); for the first time, my flex freeze for 10 secondes before posting the data (i can see the web server result in real time).
However if redo a similar POST with other data but same request.url, it's instantaneous.
// Multi form encoded data
variables = new URLVariables();
variables.user = "aaa";
variables.boardjpg = new URLFileVariable(data.boardBytes, "foo.jpg");
request = new URLRequestBuilder(variables).build();
request.url = "http://localhost:8000/upload/";
loader.load(request);
How can i see what is taking so long ?
Thanks !

Ok, this is an old question, anyway I find it searching for other things so quick adding this
URLFileVariables nor URLRequestBuilder are core classes in AS3, so I guess you're using some custom library to build your request. I don't know which library you use, but it seems that the purpose is to serialize some binary data to build a POST. Serializing usually takes some times the first time (lookup initialization and the like) and goes faster next, a well known example is Remoting in his different flavours

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex