LDAP Maximum number of users - openldap

I have an LDAP server which manages users as a single access point for several applications (web and desktop).
Now I need to import more than 200k entries into one OU (organizational unit), and this number will keep growing.
My question is: is there a recommendation on the maximum number of users that an LDAP server can handle?
Can a single OU accommodate such a high number of users?
Thanks

When we are talking about 200K entries, I think the performance of your LDAP service will depend on your server and network resources rather than on the LDAP technology itself. There should be no real problem with that amount of data.
To give your users better performance, you SHOULD keep in mind (when you design the project and decide which data to put into LDAP) the 1:10,000 rule (1 write for every 10,000 reads), since LDAP is designed to be very fast to read at the cost of being slow to write. You MAY also consider using more than one instance, with replication and a load balancer in front of them (if you need more details, I can provide them).
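For context on the read side (since the applications mostly read users from the directory), here is a minimal C# sketch of such a lookup using System.DirectoryServices.Protocols; the host, bind DN, OU and filter are placeholders, not values from the question:

```csharp
using System;
using System.DirectoryServices.Protocols;
using System.Net;

class LdapReadSketch
{
    static void Main()
    {
        // Placeholder host, bind DN and OU; replace with your own values.
        using (var connection = new LdapConnection("ldap.example.com:389"))
        {
            connection.SessionOptions.ProtocolVersion = 3;
            connection.AuthType = AuthType.Basic;
            connection.Credential = new NetworkCredential("cn=reader,dc=example,dc=com", "secret");
            connection.Bind();

            // A lookup on an indexed attribute (e.g. uid) stays fast even
            // with hundreds of thousands of entries under the OU.
            var request = new SearchRequest(
                "ou=people,dc=example,dc=com",
                "(uid=jdoe)",
                SearchScope.Subtree,
                "cn", "mail");

            var response = (SearchResponse)connection.SendRequest(request);
            foreach (SearchResultEntry entry in response.Entries)
                Console.WriteLine(entry.DistinguishedName);
        }
    }
}
```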

OpenLDAP should be able to handle millions of users, provided you set up the indexing correctly.

I have run eDirectory as the underlying data store for LDAP services with 600,000 objects in one OU, with another deployment at 2 million, and I know of a telecom down under with over 8 million.

Related

AppFabric caching: failure exceptions = GetAndLock requests for session state

I'm using the session provider in an ASP.NET app with a 3-host AppFabric cluster.
The cluster is version 3 and is running on Windows Server 2008.
My sessions cluster has secondaries set to 1 and min secondaries set to 0.
When I look at the cache statistics I notice a very large (disproportionate) number in the miss count category; in fact it almost equals the request count. So I decided to look at the performance counters to figure out why the session provider does not seem to be able to hold on to the objects correctly, or why it keeps missing.
What I found was that the GetAndLock requests/sec counter is identical to the failure exceptions/sec counter. It is also running constantly, which is not normal considering there is only so much traffic being generated by our staff. The object count is not large, but the rejection rate is clearly much higher than the number of objects that should be coming out of it. I'm not writing or modifying that much information in the session; for the most part it doesn't change, but I'm clearly getting a significantly larger number of requests than my users could create.
Any help is welcome.
PS:
Ideally I'd love to know what these failure exceptions say, but there seems to be no way to capture them.

Cross-server In-memory data (as variable) per user or global (for all users)

My question is about aggregated data that needs fast access across several servers on Amazon EC2. In an ASP.NET application, I would normally store that data in an Application["somevar"] variable so it can be accessed quickly (in memory) by all users.
The problem starts when I want that aggregated data to be gathered with the same value on all servers. If I deploy two servers, a user might be sending data to a different server on every request (the servers sit behind a load balancer or Elastic Beanstalk), so if, for example, I count the number of times a user asked for the page, each server's Application variable will hold a different value.
For example:
Server 1:
Application["counter1"] = 120
Server 2:
Application["counter1"] = 130
What I want is a variable that is the same on all servers. The reason I want the data in an Application-like variable is that I want it in memory for fast access; later I might write that data to the database.
What I want to know is how I can achieve this. I thought about using Amazon ElastiCache: even if I have 10 servers under the load balancer, I can access the ElastiCache value via its API, and it doesn't matter from which server I access the memcached value, it will get/set the same variable, and therefore I can achieve my goal of keeping a cross-server global variable.
I wanted to know whether this is good practice and whether there is a better way to implement such a feature.
I am developing my application in ASP.NET C# with MySQL. Also take into consideration that some of the aggregated data should be written to the database; I do that to avoid a lot of simultaneous writes, for example only writing the data once it has accumulated 20 updates.
Just to clear up a few things: first, let's make sure we understand how to use ElastiCache. The ElastiCache API doesn't give us any CRUD operations on the cache cluster; Amazon's API is strictly for managing the servers and configuration. You will need to use a memcached client library for .NET to connect to the cluster. Using a cache like memcached is a good solution for your first problem: it will easily and quickly store simple application variables in a distributed environment. Using a cache is generally good practice, even with smaller applications.
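As a rough sketch of what that looks like from code (assuming the Enyim.Caching memcached client, which is just one of several .NET options, and a placeholder cluster endpoint), the page counter from the question could be kept like this:

```csharp
using System;
using Enyim.Caching;
using Enyim.Caching.Configuration;
using Enyim.Caching.Memcached;

class SharedCounterSketch
{
    static void Main()
    {
        // Placeholder endpoint: point this at your ElastiCache node or
        // configuration endpoint ("host:port").
        var config = new MemcachedClientConfiguration();
        config.AddServer("my-cache.example.cache.amazonaws.com:11211");

        using (var client = new MemcachedClient(config))
        {
            // Every web server increments the same key, so the value stays
            // consistent no matter which instance served the request.
            ulong hits = client.Increment("counter1", 1, 1);

            // Plain store/get works for other shared values as well.
            client.Store(StoreMode.Set, "lastRequestUtc", DateTime.UtcNow.ToString("o"));
            Console.WriteLine("counter1 = " + hits);
        }
    }
}
```

In a real ASP.NET app the client would normally be created once and shared rather than built per request.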
I'm not sure how many users you have or expect to have, but one thing I've learned over my years of programming is that premature optimization is usually a bad idea: optimizing your code before it's really necessary. Take your proposed optimization, for example. We know that making 1 write to the database is quicker than making 20 writes, generally speaking of course. However, unless your database is the bottleneck in your application, implementing such a feature introduces a significant amount of complexity for no immediate benefit. If a memcached cluster server crashes, which it will, then the data waiting to be written to the database is lost. And if you really do have a lot of users, then you have to start thinking about concurrency and locks on the memcached items.
Without knowing more about your application I can't make any real recommendations, except to say: make sure your optimizations are actually required before you spend time increasing the complexity of your application for nothing.

Projecting simultaneous database queries

I’m after some thoughts on how people go about calculating database load for the purposes of capacity planning. I haven’t put this on Server Fault because the question is related to measuring just the application rather than defining the infrastructure. In this case, it’s someone else’s job to worry about that bit!
I’m aware there are a huge number of variables here but I’m interested in how others go about getting a sense of rough order of magnitude. This is simply a costing exercise early in a project lifecycle before any specific design has been created so not a lot of info to go on at this stage.
The question I’ve had put forward from the infrastructure folks is “how many simultaneous users”. Let’s not debate the rationale of seeking only this one figure; it’s just what’s been asked for in this case!
This is a web front end, SQL Server backend with a fairly fixed, easily quantifiable audience. To nail this down to actual simultaneous requests in a very rough fashion, the way I see it, it comes down to increasingly granular units of measurement:
Total audience
Simultaneous sessions
Simultaneous requests
Simultaneous DB queries
This doesn’t account for factors such as web app caching, partial page requests, record volume etc and there’s some creative license needed to define frequency of requests per user and number of DB hits and execution time but it seems like a reasonable starting point. I’m also conscious of the need to scale for peak load but that’s something else that can be plugged into the simultaneous sessions if required.
This is admittedly very basic and I’m sure there’s more comprehensive guidance out there. If anyone can share their approach to this exercise or point me towards other resources that might make the process a little less ad hoc, that would be great!
I will try, but obviously without knowing the details it is quite difficult to give precise advice.
First of all, the infrastructure guys might have asked this question from a licensing perspective (SQL Server can be licensed per user or per CPU).
Now back to your question. "Total audience" is important if you can predict or work out this number. It gives you the worst-case scenario, when all users hit the database at once (e.g. 9am when everyone logs in).
If you store session information you would probably have at least 2 connections per user (1 for session state + 1 for the main DB). This number can be (sometimes noticeably) reduced by connection pooling (depending on how you connect to the database).
As a worst-case scenario, use 50 system connections + 2 * number of users.
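As a rough illustration of how those numbers combine (the audience size and 10% peak-concurrency factor below are invented for the example, not figures from the question):

```csharp
using System;

class ConnectionEstimateSketch
{
    static void Main()
    {
        // Hypothetical inputs for the worst-case formula above.
        int totalAudience = 2000;         // fixed, quantifiable audience
        double peakConcurrency = 0.10;    // assume 10% are active at the peak (e.g. 9am)
        int systemConnections = 50;       // background/system connections
        int connectionsPerUser = 2;       // 1 session-state + 1 main DB connection

        int simultaneousUsers = (int)Math.Ceiling(totalAudience * peakConcurrency);
        int worstCaseConnections = systemConnections + connectionsPerUser * simultaneousUsers;

        // 2000 * 10% = 200 simultaneous users
        // 50 + 2 * 200 = 450 connections, before connection pooling is considered
        Console.WriteLine($"{simultaneousUsers} simultaneous users, ~{worstCaseConnections} connections worst case");
    }
}
```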
Simultaneous requests/queries depend on the nature of the application. Need more details.
More simultaneous requests (to your front end) will not necessarily translate to more requests on the back end.
Having said all of that, for costing purposes you need to focus on the bigger picture.
A SQL Server license (if my memory serves me right) will cost ~128K AUD (dual Xeon). Hot/warm standby? Double the cost.
Disk storage: how much storage will you need? Disks are relatively cheap, but if you are going to use a SAN the cost might become noticeable. Also, the more disks the better from a performance perspective.

How many is too many databases on SQL Server?

I am working with an application where we store our client data in a separate SQL database for each client. So far this has worked great; there was even a case where some bad code selected the wrong customer IDs from the database, and since the only data in the database belonged to that client, the damage was not as bad as it could have been. My concern is about the number of databases you can realistically have on a SQL Server.
Is there any additional overhead for each new database you create? Will we eventually hit a wall where we have just too many databases on one server? The SQL Server specs say you can have something like 32,000 databases, but is that realistic? Does anyone have a large number of databases on one server, and what problems have you encountered?
Thanks,
Frank
The upper limits are
disk space
memory
maintenance
Examples:
Rebuilding indexes for 32k databases? When?
If 10% of 32k databases each have an active set of 100MB of data in memory at any one time, you're already at 320GB of target server memory
knowing which DB you're connected to
...
The effective limit depends on load, usage, database size etc.
Edit: and bandwidth, as Wyatt Barnett mentioned. I forgot about the network, the bottleneck everyone forgets about...
The biggest problem with multiple databases is keeping them all in sync as you make schema changes.
As for the realistic number of databases you can have while keeping the system working well, as usual it depends. It depends on how powerful the server is and how large the databases are. You would likely want multiple servers at some point, not just because it will be faster for your clients, but because it puts fewer clients at risk at any one time if something happens to a server. At what point that is, only your company can decide. Certainly if you start getting a lot of time-outs, another server might be indicated (or fixing your poor queries might also do it).
Big clients will often pay a premium to be on a separate server, so consider that in your pricing. We had one client so paranoid about their data that we had to have a separate server that was not even co-located with the other servers. They paid big bucks for that, as we had to rent extra space.
ISPs routinely have one database server that is shared by hundreds or thousands of databases.
Architecturally, this is the right call in general. You've seen the first huge advantage: oftentimes damage can be limited to a single client, and you have near-zero risk of one client getting into another client's data. But you are missing the other big advantage: you don't have to keep all the clients on the same database server. When you do get big enough that your server is struggling, you can offload clients onto another box entirely with minimal effort.
I'd also bet you'll run out of bandwidth to manage the databases before your server runs out of steam to handle more databases . . .
What you are really asking about is scalability. Setting up 32,000 databases on one server is probably not advantageous, but it is possible (though not recommended).
Read - http://www.sql-server-performance.com/articles/clustering/massive_scalability_p1.aspx
I know this is an old thread, but it's the same structure we've had in place for the past 2 years, and we currently run 1,768 databases over 3 servers.
We have the following setup (not including mirrors and so on):
2 web farm servers and 4 content servers
A SQL instance just for a master database of customers, which is queried (by customer ID) when they access their webpage to get the server/instance and database name where their data resides; this is then stored in the authentication ticket (see the sketch after this list).
3 SQL servers to host the customer databases, with the load spread at creation time based on the current total number of learners across all databases on each server (quickly calculated from a license-number field in the master database).
On each SQL server there is a smaller master database containing shared static data that is used by all clients, which allows smaller client databases and quicker updating of the content.
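A minimal sketch of that master-database lookup (the table and column names here are invented for illustration; the post doesn't show the real schema):

```csharp
using System;
using System.Data.SqlClient;

static class CustomerRouting
{
    // Hypothetical routing lookup: given a customer ID, ask the master
    // database which server/instance and database hold that customer's data,
    // then build a connection string for it.
    public static string GetCustomerConnectionString(int customerId, string masterConnectionString)
    {
        const string sql =
            "SELECT ServerInstance, DatabaseName FROM dbo.Customers WHERE CustomerId = @id";

        using (var connection = new SqlConnection(masterConnectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@id", customerId);
            connection.Open();

            using (var reader = command.ExecuteReader())
            {
                if (!reader.Read())
                    throw new InvalidOperationException("Unknown customer " + customerId);

                var builder = new SqlConnectionStringBuilder
                {
                    DataSource = reader.GetString(0),     // e.g. "SQL02\\INSTANCEA"
                    InitialCatalog = reader.GetString(1), // the customer's own database
                    IntegratedSecurity = true
                };
                // The post stores this result in the authentication ticket so
                // the lookup happens at login rather than on every request.
                return builder.ConnectionString;
            }
        }
    }
}
```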
The biggest thing, as mentioned above, is keeping the database structures synchronised! For this I ended up writing a small .NET Windows Forms tool that looks up all customers in the master database; you paste in the SQL to execute, and it loops through them, getting each database's location and running the SQL you pasted.
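A stripped-down console version of that loop might look like the following (reusing the same hypothetical master-database schema as the sketch above; the Windows Forms UI is left out):

```csharp
using System;
using System.Collections.Generic;
using System.Data.SqlClient;

class SchemaSyncSketch
{
    // Runs one schema-change script against every customer database listed
    // in the master database. Table/column names are illustrative only.
    static void Main()
    {
        string masterConnectionString = "Server=SQLMASTER;Database=Master;Integrated Security=true";
        string schemaChangeScript = "ALTER TABLE dbo.Learner ADD MiddleName nvarchar(50) NULL";

        var targets = new List<(string server, string database)>();
        using (var connection = new SqlConnection(masterConnectionString))
        using (var command = new SqlCommand("SELECT ServerInstance, DatabaseName FROM dbo.Customers", connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
                while (reader.Read())
                    targets.Add((reader.GetString(0), reader.GetString(1)));
        }

        foreach (var (server, database) in targets)
        {
            var builder = new SqlConnectionStringBuilder
            {
                DataSource = server,
                InitialCatalog = database,
                IntegratedSecurity = true
            };

            using (var connection = new SqlConnection(builder.ConnectionString))
            using (var command = new SqlCommand(schemaChangeScript, connection))
            {
                connection.Open();
                command.ExecuteNonQuery();
                Console.WriteLine($"Applied change to {server}/{database}");
            }
        }
    }
}
```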
Creating new customers also caused some issues for us, so I ended up writing a management system for our sales people. It creates a new database from a backup of an inactive "blank" database, so we always have the latest DB without needing to re-run the entire database creation script. It then inserts the customer details into the master database, along with the location where the database was created, and migrates any old data from previous versions of our software. All of this is done on a separate instance before moving, thereby reducing SQL locks.
We are now moving to a single database for the next version of the software, because database redundancy is near impossible with so many databases! This is a huge thing to consider, as SQL Server creates a couple of waiting tasks per database to mirror your data; once you start multiplying the databases it gets out of hand, and the system becomes almost solely tasked with synchronising and can lock up due to the sheer number of threads. See page 30 of the Microsoft document below:
SQLCAT's Guide to High Availability Disaster Recovery.pdf
I do, however, have doubts about moving to a single database, due to some of the concerns mentioned above, such as having to check in every single procedure that the current customer can only access their own data, and the fact that one little issue will now affect every single customer, for example a table indexing problem. Also, at the moment our customers are spread over 3 servers; with a single database we do get redundancy, but if the error is within the database rather than a server going down, then that's every single customer down, not just one customer's database.
All in all, it depends what you're doing and whether you want the redundancy. For me, redundancy is now key, and everything else, in a perfect world, shouldn't happen (such as an error which breaks the database for everyone). We only started off expecting a hundred or so customers to move to the system from the old self-hosted software, and that quickly turned into 200, 500, 1000, 1500... We now have over 750,000 users using our system each year, and in August/September we have over 15,000 concurrent users online (expecting to hit 20,000 this year).
Hope that this is of help to someone along the line :-)
Regards
Liam

Bandwidth Monitoring in asp.net

Hi, we are developing a multi-tenant application in ASP.NET with a separate database for each tenant, and one of the requirements is to monitor the bandwidth usage of each tenant.
I have tried searching but haven't found much help on the topic. We want to monitor exactly how much bandwidth is being used by each tenant, while each tenant can have its own top-level domain, a subdomain, or a combination of both.
So what are the available options? The ones I can think of are:
IIS log monitoring, i.e. a separate application which calculates the bandwidth for each tenant.
Logging each request and response per tenant from within the application, and then calculating the total bandwidth usage from that.
Using third-party components, if available.
So which do you think would be the best approach, and is there any other way to do this?
OK, here is an idea (which I have not tested; I leave that to you).
In Global.asax,
use one of these events (find the one that sees a valid final size):
Application_PostRequestHandlerExecute
Application_ReleaseRequestState
and get the size you have sent with
Response.Filter.Length
Needless to say, you get the path of the call using
HttpContext.Current.Request.Path
These handlers are called on every single request, so you can get your size there and do the rest.
Note that you first need to test this idea to see if it works, and maybe improve it. Also keep in mind that if you compress the pages on the server, this length will not be correct, and you may need to do the compression in Global.asax to get the actual length.
Hope this helps.
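Building on that idea, here is a minimal sketch (untested, in the same spirit as the answer above) that wraps Response.Filter in a counting stream and keeps a running per-tenant byte total keyed by host name. Persisting the totals is left out, and if IIS compresses the output afterwards these numbers are pre-compression sizes:

```csharp
// Global.asax.cs (sketch): meter request/response bytes per tenant by
// wrapping Response.Filter in a counting stream.
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Web;

public class Global : HttpApplication
{
    // Running byte totals per tenant; the tenant key here is simply the host
    // header, adjust it if tenants are identified some other way.
    public static readonly ConcurrentDictionary<string, long> BytesPerTenant =
        new ConcurrentDictionary<string, long>();

    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        string tenant = Request.Url.Host;
        long requestBytes = Request.ContentLength;

        // Count the request body against the tenant straight away.
        BytesPerTenant.AddOrUpdate(tenant, requestBytes, (_, total) => total + requestBytes);

        // Wrap the existing filter so response bytes are counted as they are
        // actually written out, regardless of buffering.
        Response.Filter = new TenantCountingStream(Response.Filter, tenant);
    }
}

// Pass-through stream that adds every written byte to the tenant's total.
public class TenantCountingStream : Stream
{
    private readonly Stream _inner;
    private readonly string _tenant;

    public TenantCountingStream(Stream inner, string tenant)
    {
        _inner = inner;
        _tenant = tenant;
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        Global.BytesPerTenant.AddOrUpdate(_tenant, count, (_, total) => total + count);
        _inner.Write(buffer, offset, count);
    }

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() => _inner.Flush();
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}
```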
Well, since the IIS logs already contain the request size and response size, it doesn't seem like too much trouble to develop a small tool to parse them and calculate the total per day/week/month/whatever.
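For example, a quick parser along those lines might look like this (a sketch only: it assumes the site's W3C logging includes the cs-host, sc-bytes and cs-bytes fields, which are not enabled by default):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Sketch of a W3C extended log parser that sums request/response bytes per
// host name, given the path to one IIS log file as the first argument.
class IisLogBandwidth
{
    static void Main(string[] args)
    {
        var totals = new Dictionary<string, long>(StringComparer.OrdinalIgnoreCase);
        int hostIndex = -1, scBytesIndex = -1, csBytesIndex = -1;

        foreach (string line in File.ReadLines(args[0]))
        {
            if (line.StartsWith("#Fields:", StringComparison.Ordinal))
            {
                // The field list can vary per site, so work out the column
                // positions from the header instead of hard-coding them.
                var fields = line.Substring("#Fields:".Length)
                                 .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                                 .ToList();
                hostIndex = fields.IndexOf("cs-host");
                scBytesIndex = fields.IndexOf("sc-bytes");
                csBytesIndex = fields.IndexOf("cs-bytes");
                continue;
            }
            if (line.StartsWith("#") || hostIndex < 0 || scBytesIndex < 0 || csBytesIndex < 0)
                continue;

            var columns = line.Split(' ');
            string host = columns[hostIndex];
            long bytes = long.Parse(columns[scBytesIndex]) + long.Parse(columns[csBytesIndex]);

            totals[host] = totals.TryGetValue(host, out long current) ? current + bytes : bytes;
        }

        foreach (var pair in totals.OrderByDescending(p => p.Value))
            Console.WriteLine($"{pair.Key}: {pair.Value / (1024.0 * 1024.0):F1} MB");
    }
}
```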
Trying to segment traffic based on host is difficult in my experience. Instead, if you give each tenant their own IP(s) for the applications you should be able to find programs that will monitor bandwidth based on IP.
ADDITION: Is your IIS structure one website to rule them all, where on login the system forks to the proper database for each tenant? If so, this may create problems with respect to versioning, in that all tenants' sites will have to have exactly the same schema and would all need to be updated simultaneously whenever an application update requires a schema change.
Another structure, which sounds like what you may have, is that each tenant has their own website like so:
tenant1_site/appvirtualdir
tenant2_site/appvirtualdir
...
Where appvirtualdir points to the same physical path for all tenants' sites. When all clients are on the same application version, they are all literally running the same code. If you have this scenario and some sort of authentication, then you will need one IP per tenant anyway because of SSL: SSL binds only to IP and port, unlike non-SSL, which binds to IP, port and host. If that is the case, then monitoring traffic based on IP will still be simpler and more accurate, as it can be done at the router or via a network monitor.
