I want to send a newsletter to our site's registered users - more than 100,000 of them.
Today I send the emails in a simple for loop, which is very heavy and slow.
I want the process to finish as fast as possible (even if it runs in the background and doesn't block the users' UI).
I tried TPL, async, smtpClient.SendAsyncMail() and more, but I could never build a working example proving that these approaches are faster (or work at all).
I've read many posts and explanations about using threads, but they seem old, deprecated or irrelevant, and there are surely newer technologies I don't know about.
Could you please tell me the most efficient and fastest way to send multiple emails in parallel? Could you show me an example that works?
UPDATE 20/06/2018
Each email is different for each user. Therefore a BCC solution is irrelevant for me.
I will specify my question a little more:
I am only focusing on the high level, where I use a sending function such as ASP.NET's smtpClient.Send() to hand the mails to the SMTP server.
I ignore for now the part of the SMTP itself. I have a great SMTP server that knows how to handle the queuing.
The main bottleneck right now is the Send function that uploads the bytes to the SMTP server from ASP.NET.
That's where I couldn't find a way to send faster. (It is needless to say that if I buy a stronger server, even an iterative method will work faster.)
But I want a scalable solution - a smarter one, where I can send huge amounts of mail in parallel.
UPDATE 28/06/2018
Apparently a parallel solution will not solve my problem. As explained by @Terry Carmen and others, I may have to look at other bottlenecks in the process. Thank you all for helping!
What a great community!!!
Add the users as BCC recipients to the message and let the mail server do the work.
The mail server probably has a BCC limit, so you may have to see what the server's limit is and batch the requests.
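A minimal sketch of that batching idea, assuming a hypothetical host, sender address, and batch size (Enumerable.Chunk needs .NET 6+; check your server's real BCC limit first):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Mail;

static class NewsletterSender
{
    // Sketch: split the recipient list into batches and BCC each batch.
    // Host, sender address, and batch size are placeholders.
    public static void SendInBatches(IEnumerable<string> recipients, int batchSize = 100)
    {
        using var client = new SmtpClient("smtp.example.com");
        foreach (var batch in recipients.Chunk(batchSize)) // Enumerable.Chunk: .NET 6+
        {
            using var message = new MailMessage
            {
                From = new MailAddress("newsletter@example.com"),
                Subject = "Newsletter",
                Body = "..."
            };
            foreach (var address in batch)
                message.Bcc.Add(address);

            client.Send(message); // one SMTP transaction per batch, not per user
        }
    }
}
```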
edit
Regarding "sending in parallel"
You're looking for the wrong answer. If your machine takes X seconds to produce Y emails, running them "in parallel" won't fix anything. The most you would gain is splitting the generation across several cores, which might improve performance a little, but that is almost certainly not the problem.
You need to find out what the actual slow part is. It might be your disk, CPU, network connection, or database server, or possibly a rate-limit setting on the mail server. The actual SMTPClient object is almost certainly not the issue.
another edit
You can try sending in parallel by running separate instances of SMTPClient in separate threads; however, this is unlikely to actually help you.
As mentioned before by myself and others, you are assuming that parallel send is some sort of magic bullet. It's not. There is something that is rate-limiting you and it's almost certainly not that you're single threaded.
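If you want to benchmark a parallel send for yourself anyway, here's a minimal sketch, with a hypothetical host name. It uses one SmtpClient per worker, since SmtpClient instances should not be shared across threads:

```csharp
using System.Collections.Generic;
using System.Net.Mail;
using System.Threading.Tasks;

static class ParallelSender
{
    // Sketch: bounded parallel send with one SmtpClient per worker.
    public static void SendParallel(IEnumerable<MailMessage> messages, int workers = 8)
    {
        Parallel.ForEach(
            messages,
            new ParallelOptions { MaxDegreeOfParallelism = workers },
            () => new SmtpClient("smtp.example.com"),  // localInit: one client per worker
            (message, loopState, client) =>
            {
                client.Send(message);                  // blocking send on this worker
                return client;
            },
            client => client.Dispose());               // localFinally: clean up
    }
}
```

If the measured throughput barely changes as you raise the worker count, the bottleneck is elsewhere, which is the point being made above.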
Related
I am currently developing a website for an energy-monitoring company. We are trying to send high volumes of data from the devices which record the data to a server, so the data can be processed into a database. The guy developing the firmware thinks the best way to send the data is to produce CSV files and send them via FTP. A program on the server would then monitor the files received via FTP and run a PHP script to process them. I, however, feel that the best way of sending the data is via HTTP POST.
We had HTTP POST working, and then I began trying to work with the CSVs, which became a pain: reliably monitoring the files received via FTP meant editing the ProFTPD configuration file (which I found to be a near-impossible task) and installing a package called mod_exec (which comes with security risks) so that ProFTPD could run a PHP script. These issues, and the fact that I am unfamiliar with the Linux console I'm required to use extensively to set this up, make the CSV method very difficult to set up. HTTP POST seems to me a more direct way of sending the data, without having to worry about files or rely on ProFTPD. It would also allow us to use identifiers to give the data being passed meaning, as opposed to a string of values whose meaning is not immediately apparent. In addition, the query string could be URL-encoded to pass a multidimensional array, which would work well given the type of data being passed.
Nevertheless, just because the HTTP POST method would be easier doesn't mean the CSV method has no advantages. Furthermore, the firmware guy has far more experience than I do with computers, so I trust his opinion.
Can you please help me to understand his point of view on the advantages of the CSV method and explain what the best method is?
You're right. FTP has major issues with firewalls, and especially doesn't work well on mobile (NAT'ted) IPv4. HTTP POST works far, far better under such circumstances, if only because nobody accepts an "internet" connection that breaks HTTP.
Furthermore, HTTP is a lot easier on the device as well. It's just a single-socket protocol, with trivial read/write semantics on that socket.
Some more benefits? HTTP has almost-native support for compression (gzip). HTTP transmission can start before the input is complete. HTTP is easier to secure (HTTPS)...
No, there really is little reason to use FTP.
The 'CSV method' (I'd call it the 'FTP method' though) has the advantage of being known to the embedded developer. The receiving side will have to create some way of checking if there is a file though. That adds complexity.
The 'HTTP method' has several advantages:
HTTP is easy to implement on the sending side
No need to create a file-checker
You can reply to the embedded device if everything went OK
I actually implemented a system just like that recently (not too much data, but still) and used HTTP POST to send the data. I implemented the HTTP POST side myself.
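For illustration only, here is roughly what the POST side can look like. I've sketched it in C# rather than firmware C (the shape is the same either way); the URL and field names are made up:

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

static class DeviceUploader
{
    // Sketch: one reading posted as a form-encoded body. Unlike FTP,
    // the device gets an immediate success/failure response.
    public static async Task PostReadingAsync(HttpClient http)
    {
        var payload = new FormUrlEncodedContent(new Dictionary<string, string>
        {
            ["device_id"] = "42",
            ["timestamp"] = DateTimeOffset.UtcNow.ToUnixTimeSeconds().ToString(),
            ["kwh"]       = "1.234"
        });

        HttpResponseMessage response =
            await http.PostAsync("https://example.com/ingest.php", payload);

        response.EnsureSuccessStatusCode(); // the "reply if everything went OK" benefit
    }
}
```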
Part of the problem with this question is that I'm not even sure exactly what I need to ask, so I'll start with the situation and work out from there.
One project I'm working on involves use of COMET via the aspComet library. The use case of the program is somewhat of a collaborative slideshow. One person runs the bulk of it, with one or more participants able to perform certain actions. Low latency between when an action is performed and when it appears on everyone's screen is important.
Previously, it was just running on one server. Now, we're wanting to scale it out a bit, more for reliability than performance reasons. So, we have some boxes out in Rackspace's cloud and all that fun stuff.
I knew from the start of this that I was going to need to make some changes to the way the COMET stuff works since different people in the same "show" might be on different servers, and I have no way of knowing what "show" they belong to until after they have already arrived on the site.
I initially tackled this using the WCF Mesh provider, which wasn't well documented to start with. Now I'm running into issues where messages dispatched to it sometimes get lost or delayed (I'm not 100% sure what is going on there), which screws up the long poll for COMET and breaks things in rather strange ways (clicking a button may trigger an event, or it'll hang for the 10-second long-poll duration and not actually do anything).
More research leads me to believe one of the .NET service bus providers may do what I need. However, I can't find examples covering my requirements:
No single point of failure (outside of a database)
No hardcoding of peers.
Near-realtime (no polling, event based would be best)
My ideal solution would involve that when a server comes up, it lets other servers know of its existence (Even if it's just a row in a table somewhere), and they can start sending broadcast messages among each other, with each server being both a publisher and subscriber. This is what I somewhat had in the WCF Mesh provider, but I'm not overly confident in that code.
Can anyone point me in the right direction with this? Even proper terms to look for in the docs for service bus providers would be good at this point. Or are service buses not what I want? At this point I would settle for setting up a Jabber server on each web server and use that, if it could fit within my constraints.
I can't speak a ton to NServiceBus, but I expect the answers will be similar.
Single point of failure: MSMQ can use multicast, which means each endpoint broadcasts its existence and no DB table is needed. RabbitMQ uses an exchange-to-queue binding process, which means that as long as the Rabbit instance or cluster is up, messages still exist. RabbitMQ can be clustered; MSMQ cannot. *Note: you might have issues with multicast on Rackspace; I have no idea how they handle it. If so, you'll have to fall back on the runtime services for MSMQ (not RabbitMQ), which would create a single point of failure, because everyone has a single point to coordinate control messages through.
Hard-coding of peers: discussed a bit above; MSMQ's multicast handles it. With Rabbit it can also be done: just bind queues to the exchange you want to listen to. MassTransit takes care of this for you.
Near-realtime: These both use messaging which is near real time. There's no polling in your message consumer code.
I think a service bus is a reasonable solution for what you're trying to do. Some more details would likely be needed, but the general messaging approach is correct. There are other, more lightweight messaging libraries if you decide you just want something thin on top of RabbitMQ and to configure Rabbit to handle most of the work.
To get started with MassTransit, we have documentation up: http://readthedocs.org/projects/masstransit/ and mailing list http://groups.google.com/group/masstransit-discuss. Join the mailing list if you have future questions and someone will try and help you out.
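To give a feel for the shape of it, here is a minimal MassTransit-over-RabbitMQ sketch. The configuration API has changed across MassTransit versions, and the event type, queue name, and broker address here are assumptions, not your actual setup:

```csharp
using System;
using System.Threading.Tasks;
using MassTransit;

// Hypothetical event published when the presenter changes slides.
public record SlideChanged(Guid ShowId, int SlideNumber);

public static class ShowBus
{
    public static async Task Main()
    {
        // Each web server creates its own bus; RabbitMQ's exchange-to-queue
        // binding means no peer list is hard-coded anywhere.
        var bus = Bus.Factory.CreateUsingRabbitMq(cfg =>
        {
            cfg.Host("localhost"); // assumed broker address

            // A per-server queue, bound automatically to the SlideChanged exchange.
            cfg.ReceiveEndpoint($"show-events-{Environment.MachineName}", e =>
            {
                e.Handler<SlideChanged>(ctx =>
                {
                    Console.WriteLine(
                        $"Show {ctx.Message.ShowId}: slide {ctx.Message.SlideNumber}");
                    return Task.CompletedTask;
                });
            });
        });

        await bus.StartAsync();
        await bus.Publish(new SlideChanged(Guid.NewGuid(), 1)); // every server's queue gets a copy
        await bus.StopAsync();
    }
}
```

Because each server has its own queue bound to the same exchange, every server is both publisher and subscriber, which matches the "broadcast among peers" design described in the question.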
I am looking for networking designs and tricks specific to games. I know about a few problems and I have some partial solutions to some of them but there can be problems I can't see yet. I think there is no definite answer to this but I will accept an answer I really like. I can think of 4 categories of problems.
Bad network
The messages sent by the clients take some time to reach the server. The server can't just process them FCFS, because that is unfair to players with higher latency. A partial solution for this would be timestamps on the messages, but you need 2 things for that:
Be able to trust the client's clock. (I think this is impossible.)
Constant latencies you can measure. What can you do about variable latency? (A common workaround is sketched below.)
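One standard workaround for the untrusted clock (the usual NTP-style technique, not something from this question) is to never trust the client's timestamp at all and instead estimate its clock offset from round trips the server measures itself. A sketch:

```csharp
// NTP-style clock offset estimation, as a sketch. All four timestamps
// are in milliseconds: t0 = client send, t1 = server receive,
// t2 = server reply, t3 = client receive.
static class ClockSync
{
    // Estimated offset of the client's clock relative to the server's.
    public static double EstimateOffsetMs(double t0, double t1, double t2, double t3)
        => ((t1 - t0) + (t2 - t3)) / 2.0;

    // Round-trip time, excluding server processing. To cope with variable
    // latency, sample repeatedly and keep the estimate from the sample
    // with the smallest round trip (or use a filtered average).
    public static double RoundTripMs(double t0, double t1, double t2, double t3)
        => (t3 - t0) - (t2 - t1);
}
```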
A lot of games use UDP, which means messages can be lost. In that case they try to estimate the game state based on the information they already have. How do you know if the estimated state is correct or not after the connection is working again?
In MMO games the server handles a large number of clients. What is the best way of distributing the load? Based on location in game? Binding groups of clients to servers? Can you avoid sending everything through the server?
Players leaving
I have seen 2 different behaviours when this happens. In most FPS games, if the player who hosted the game (I guess he is the server) leaves, the others can't play. In most RTS games, if any player leaves, the others can continue playing without him. How is that possible without a dedicated server? Does everyone know the full state? Are they transferring the role of the server somehow?
Access to information
The next problem can be solved by a dedicated server, but I am curious whether it can be done without one. In a lot of games the players should not know the full state of the game. Fog of war in RTS and walls in FPS are good examples. However, players need to know whether an action is valid or not. (E.g., can you shoot me from there, or are you on the other side of the map?) In this case clients need to validate changes against a state they can't see. This sounds like something that could be solved with clever use of cryptographic primitives. Any ideas?
Cheating
Some of the above problems are easy in a trusted-client environment, but that cannot be assumed. Are there solutions which work, for example, in an 80% normal user / 20% cheater environment? Can you really make anti-cheat software that works (and does not require ridiculous things like kernel modules)?
I did read this question and some of its answers (https://stackoverflow.com/questions/901592/best-game-network-programming-articles-and-books), but other answers link to unavailable/restricted content. This is a platform/OS-independent question, but solutions for specific platforms/OSs are welcome as well.
Thinking cryptography will solve this kind of problem is a very common and very bad mistake: the client itself, of course, has to be able to decrypt the data, so it is completely pointless. You are not adding security; you're just adding obscurity (and that will be cracked).
Cheating is too game-specific. There are some kinds of games where it can't be totally eliminated (aimbots in FPS) and some where, if you didn't screw up, it will not be possible at all (server-based turn games).
In general, network problems like these are deeply related to prediction, which is a very complicated subject at best and is very well explained in the famous Valve article about it.
The server can't just process them FCFS, because that is unfair to players with higher latency.
Yes it can. Trying to guess exactly how much latency someone has is no fairer, since latency varies.
In that case they try to estimate the game state based on the information they already have. How do you know if the estimated state is correct or not after the connection is working again?
The server doesn't have to guess at all - it knows the state. The client only has to guess while the connection is down - when it's back up, it will be sent the new state.
In MMO games the server handles a large number of clients. What is the best way of distributing the load? Based on location in game?
There's no "best way". Geographical partitioning works fairly well, however.
Can you avoid sending everything through the server?
Only for untrusted communications, which generally are so low on bandwidth that there's no point.
In most RTS games, if any player leaves, the others can continue playing without him. How is that possible without a dedicated server? Does everyone know the full state?
Many RTS games maintain the full state simultaneously across all machines.
Some of the above problems are easy in a trusted-client environment, but that cannot be assumed.
Most games open to the public need to assume a 100% cheater environment.
Bad network
Players with high latency should buy a new modem. I don't think it's a good idea to add even more latency just because one person in the game has a bad connection. Or, if you mean minor latency differences: who cares? You will only make things slower and more complicated if you refuse to process FCFS.
Cheating: aimbots and similar
Can you really make anti-cheat software that works? No, you cannot. You can't know whether they are running your program or another program that acts like yours.
Cheating: access to information
If you have a secure connection with a dedicated server you can trust, then cheating, like seeing more state than allowed, should be impossible.
There are a few games where cryptography can prevent cheating: card games like poker, where every player gets a chance to 'shuffle the deck'. Details on Wikipedia: Mental Poker.
With an RTS or FPS you could, in theory, encrypt your part of the game state, then send it to everyone and only send decryption keys for the parts they are allowed to see, at the moment they are allowed to see them. However, I doubt that in 2010 we can do this in real time.
For example, suppose I want to verify that you could indeed be at location B. Then I need to know where you came from and when you were there. But if you've told me that before, I knew something I was not allowed to know. If you tell me afterwards, you can tell me anything you want me to believe. You could have told me beforehand, encrypted, and given me the decryption key when I need to verify it. That would mean you'd have to encrypt every move you make with a different encryption key. Ouch.
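A cheaper variant of that "tell me beforehand, encrypted, reveal the key later" idea is a hash commitment. A sketch, with illustrative names:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class MoveCommitment
{
    // Commit: each tick, publish SHA-256(move || nonce). This reveals
    // nothing about the move, but pins the sender to it.
    public static byte[] Commit(string move, byte[] nonce)
    {
        var moveBytes = Encoding.UTF8.GetBytes(move);
        var buffer = new byte[moveBytes.Length + nonce.Length];
        moveBytes.CopyTo(buffer, 0);
        nonce.CopyTo(buffer, moveBytes.Length);

        using var sha = SHA256.Create();
        return sha.ComputeHash(buffer);
    }

    // Reveal: later, send (move, nonce); anyone can re-hash and compare
    // against the commitment they already hold.
    public static bool Verify(byte[] commitment, string move, byte[] nonce)
        => Commit(move, nonce).AsSpan().SequenceEqual(commitment);
}
```

This avoids per-move encryption keys, but note it only proves consistency after the fact; it does not hide information any better than the encryption scheme described above.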
If you're not implementing a poker site, cheating won't be your biggest problem anyway.
With a lot of people accessing games on mobile devices, a "bad network" can occur whenever a player is in an area of poor reception or connected to slow Wi-Fi. So it's not just a problem of people connecting from sparsely populated areas. With mobile clients, "bad networks" can occur very often, and they are usually extremely hard to diagnose.
UDP brings packet loss, but even games built on TCP and HTTP can experience problems where client-server communication slows to a crawl while packets are verified as delivered. With UDP, compensating for packet loss usually depends on what the packets contain. For motion data, if packets aren't received, the server usually extrapolates the previous trajectory and corrects the position when fresh data arrives. How this is handled is typically custom to the game, which is why people often avoid UDP unless their game type requires it. To handle high-latency problems, games will often automatically degrade the set of features available so users can still interact without being kicked or hitting too many broken features.
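A minimal sketch of that trajectory-extrapolation ("dead reckoning") idea, with illustrative names:

```csharp
// Dead reckoning sketch: when an update packet is lost, advance the
// entity along its last known velocity until fresh data arrives.
struct EntityState
{
    public float X, Y;    // last known position
    public float Vx, Vy;  // last known velocity, units per second

    // Predicted position dt seconds after the last received update.
    public (float X, float Y) Extrapolate(float dt)
        => (X + Vx * dt, Y + Vy * dt);

    // When the next real packet arrives, snap (or smoothly blend)
    // toward the authoritative position it carries.
}
```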
Optimally, you want a logging tool like Loggly that can surface errors related to bad connections and latency and show you the conditions on the client and server at the time they happened. That visibility lets you diagnose the common problems users experience and develop strategies to address them.
Players leaving
Most games these days have dedicated servers, so this issue is mostly moot. However, sometimes yes, the server role can be handed over to another client.
Cheating
It's extremely hard to anticipate how players will cheat and to create a cheat-proof system no one can hack. These days a lot of cheat-detection strategies are based on heuristic analysis of logging and behavioral-analytics data, spotting abnormalities when they happen and flagging them for review. You should definitely cheat-proof as much as is reasonable, but you also really need an early-detection system that can spot new flaws people are exploiting.
Our app needs instant notification, so obviously I should use some WCF duplex or socket communication. The problem is that the app is a partial-trust XBAP, and thus I'm not allowed to use anything but BasicHttpBinding. Therefore I need to poll for changes.
Now comes the question: my PM says the update interval should be around 2 seconds, running on an intranet with 500 users against a single web server.
Do any of you have experience with how polling would influence the web server?
The service is fairly simple: it takes a GUID as an argument and returns a list of GUIDs. All data access is cached, so I guess the load on the server is minimal for a single call, but for 500...
Apart from the polling, the web server has little work to do.
So, based on this little info (assume standard server hardware, whatever that is), is it possible to make a qualified guess?
Is it possible to implement this, and will it work?
Yes, I know estimating this is difficult, but I'd be really glad if some of you could share some thoughts on it.
Regards
Larsi
Don't estimate, benchmark.
Polling.. bad, but if you have no other choice, then it's ideal :)
Bear in mind that you will no doubt have keep-alives on, so you will have 500 permanently connected users. The memory usage of that will probably be more significant than the processor usage. I can't imagine the service itself (even a relatively bloaty web service) would use much network capacity, but network latency might become an issue, especially as we've all seen web applications 'pause' for a little while.
In the end, though, you'll probably be OK, but you'll have to check it yourself. There are plenty of web service stress testers; you can use Microsoft's WAS tool, for one, and here are a few links to others.
Try using soapui, a web service testing tool, to check the performance of your web service. There is a paid version and an open source version that is free.
cheers
I don't think it will be a particular problem. I'd imagine the response time for each request would be pretty low, unless you're pulling back a hell of a lot of data, so 500 connections spread over 2 seconds shouldn't hit it too hard.
You can use a stress testing tool to verify your webserver can handle the load though, before you commit to this design.
250 qps (500 users polling every 2 seconds) is probably doable with quite modest hardware and network bandwidth, provided you minimize the data sent back and forth. I assume you're caching these GUID lists on the client, so you can just send a small "no updates" response in the normal case.
It should be pretty easy to measure with a simple prototype, though, if you want to be more confident.
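To make that "small no-updates response" concrete, here is a sketch of the client-side loop. It's written with modern async C# for brevity (an XBAP of that era would call the generated WCF proxy instead), and every name is illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class Poller
{
    // getChangesAsync stands in for the BasicHttpBinding service call:
    // GUID in, list of GUIDs out. An empty list means "no updates",
    // which should be the cheap, cached, common case on the server.
    public static async Task PollLoopAsync(
        Func<Guid, Task<IReadOnlyList<Guid>>> getChangesAsync,
        Guid clientId,
        CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            IReadOnlyList<Guid> changes = await getChangesAsync(clientId);

            if (changes.Count > 0)
                ApplyChanges(changes); // hypothetical UI update

            await Task.Delay(TimeSpan.FromSeconds(2), ct); // the 2-second interval
        }
    }

    static void ApplyChanges(IReadOnlyList<Guid> changes) { /* update the UI */ }
}
```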
Even on big-time sites such as Google, I sometimes make a request and the browser just sits there. The hourglass will turn indefinitely until I click again, after which I get a response instantly. So the request or response is simply getting lost on the internet.
As a developer of ASP.NET web applications, is there any way for me to mitigate this problem, so that users of the sites I develop do not experience it? If there is, it seems like Google would be doing it already. Still, I'm hopeful there is a solution.
Edit: I can verify, for our web applications, that every request actually reaching the server is served in a few seconds even in the absolute worst case (e.g. a complex report). I have an email notification sent out if a server ever takes more than 4 seconds to process a request, or if it fails to process a request, and have not received that email in 30 days.
It's possible that a request made from the client took a particular path which happened not to work at that particular moment. Such failures are unavoidable; they're simply a result of the internet, which is built on unreliable components over which TCP can only provide a certain kind of guarantee.
Like someone else said: make sure that when a request does hit your server, you're ready to reply. Everything else is out of your hands.
They get lost because the internet is a big place, and sometimes packets get dropped or servers get overloaded. To give your users the best experience, make sure you have plenty of hardware, robust software, and a very good network connection.
You cannot control the pipe from the client all the way to your server. There could be network connectivity issues anywhere along the pipeline, including from your PC to your ISP's router which is a likely place to look first.
The bottom line is that if you have issues bringing Google.com up in your browser, you are guaranteed to have the same issue with your own web application at least as often.
That's not to say an ASP application cannot generate the same sort of downtime experience completely on its own... "Test often" and "code defensively" are the key phrases to keep in mind.
Let's not forget browser bugs. They aren't nearly perfect applications themselves...
This problem/situation isn't only ASP-related; it covers the whole concept of keeping your apps up, informally called the "five nines" or "99.999% availability".
The Wikipedia article is here.
If you look up the five nines you'll find tons of useful information, which you can apply as needed to your apps.