Our application back-end is built with Symfony 2.5.x.
We are using MySQL 5.x and Elasticsearch 1.7.x to store our data.
In our application, we store each company's data in its own database, i.e. a multi-tenant approach. Similarly, each company's data lives under its own alias in Elasticsearch.
We had to extend the default fos:elastica:populate command with a custom command that goes through every company database and syncs it with the corresponding Elasticsearch alias.
All of this works fine, no problems so far. By now we have almost 2,000K (2 million) records in Elasticsearch.
Now, due to a new requirement, we need to update the mapping and store another property in Elasticsearch, i.e. store the value of another column in Elasticsearch.
The last time we added a new column for all existing aliases and performed a bulk update, roughly 340K records were updated per hour on average. At that rate, the process would take about 6 hours this time.
Is there any way to speed up the process to, say, within 2 hours?
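For reference, per alias the update is conceptually something like the sketch below (a simplified Python sketch just to illustrate; the real command is a Symfony/PHP console command, and the index, type, field and chunk-size values are placeholders). Would larger bulk chunks, or running several aliases in parallel, be the usual levers here?

```python
# Hypothetical sketch: re-populate one tenant alias with bulk requests instead of
# per-document updates. All names and values below are placeholders.
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch(["http://localhost:9200"])

def repopulate_alias(alias, rows):
    """rows: iterable of dicts read from the tenant's MySQL database."""
    def actions():
        for row in rows:
            yield {
                "_op_type": "index",
                "_index": alias,
                "_type": "company_record",   # mapping types are still required on ES 1.x
                "_id": row["id"],
                "_source": {
                    "name": row["name"],
                    "new_column": row["new_column"],  # the newly mapped property
                },
            }

    # Bigger chunks and a generous timeout usually beat per-record updates.
    bulk(es, actions(), chunk_size=1000, request_timeout=120)
```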
Our MySQL database is on AWS RDS, the application code runs on Heroku, and Elasticsearch is hosted by Found (the Elasticsearch provider).
Any help or ideas will be much appreciated.
Thanks in advance!
In the project I am working on, we have a database per tenant, and each tenant consists of at least one department. One of the requirements we have is that when an admin user deletes a department using a custom frontend we've provided, the system should first archive that department's data to blob storage before the data is deleted. We have the same requirement for the tenant: we need to archive the data before that tenant's database is removed from the account.
Now, my question: is there any best practice for doing this? We are planning to retrieve all the data from all collections with a Mongo query filtered on the department id (which is also the partition key) and then send it to blob storage. The challenge is executing that query to retrieve all the data: the amount can be huge, and the RUs required for the operation may affect the performance of the system, because other users may be using the system while we remove the data.
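Roughly, the plan looks like the sketch below (a simplified Python sketch; the collection iteration, the departmentId field, the container name and the connection strings are all assumptions/placeholders):

```python
# Hypothetical sketch: stream one department's documents to blob storage before deleting them.
import json
from pymongo import MongoClient
from azure.storage.blob import BlobServiceClient

mongo = MongoClient("<cosmos-mongo-connection-string>")
db = mongo["tenant_db"]
blob_service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = blob_service.get_container_client("department-archives")

def archive_department(department_id):
    for coll_name in db.list_collection_names():
        # Small batches keep the per-request charge down, at the cost of a longer run.
        docs = list(db[coll_name].find({"departmentId": department_id}, batch_size=100))
        if not docs:
            continue
        blob_name = f"{department_id}/{coll_name}.json"
        container.upload_blob(blob_name, json.dumps(docs, default=str), overwrite=True)
```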
I looked at mongodump and mongoexport, but those are standalone applications, so we cannot execute them from our code, can we?
Any ideas? Thanks a lot.
I think one way to solve this is by using the Change Feed, as it really helps and simplifies writing a carbon copy somewhere else.
However, as of now the change feed processor won't notify you about deleted documents, so you can't listen for them; that feature is planned but not yet available.
Your best bet is to write a custom application that does the archiving using the query language support.
I would like to know how to send data to a specific collection in a running Solr instance (actually a running SolrCloud instance).
I've started a SolrCloud instance with a bunch of hand-made collections (using the SolrCloud Collections REST API) and hence wanted to send some data to a specific collection in order to easily distinguish one sort of data from another. Unfortunately I didn't find a way to do that.
Is it possible? If it is, then how?
The collection is part of the URL you're using when querying any server, usually located under http://localhost:8983/solr/<collection name>/. If you're querying the products collection to retrieve all documents, the URL would be http://localhost:8983/solr/products/select?q=*:*. The same goes for updating a collection: POST your content to http://localhost:8983/solr/products/update.
Replace localhost:8983 with one of your own server host/port combinations.
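For example, a minimal Python sketch (host, port, collection and field names are placeholders) that indexes a couple of documents into the products collection and queries them back:

```python
# Hypothetical sketch: POST JSON documents to a specific collection, then query it.
import requests

docs = [
    {"id": "1", "name": "First product"},
    {"id": "2", "name": "Second product"},
]

# Index into the "products" collection; commit=true makes the docs searchable immediately.
resp = requests.post("http://localhost:8983/solr/products/update?commit=true", json=docs)
resp.raise_for_status()

# Query the same collection back.
hits = requests.get(
    "http://localhost:8983/solr/products/select",
    params={"q": "*:*", "wt": "json"},
).json()
print(hits["response"]["numFound"])
```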
You can also see further examples in the Getting Started with Solr Cloud tutorial.
I'm currently developing an app where users are first asked to create an account through a website (ASP.NET) in order to use the app. For a particular reason, I need to automatically generate a database on the hosted SQL Server for each customer who creates an account. The databases are the same for all customers.
I was thinking of doing it like this: since I have the script for creating the database, I would put it in a stored procedure or a trigger that is launched as soon as the user has fully created his account.
I don't really see any other solutions; maybe somebody could give me some guidelines? Thanks in advance.
I think such a design has been shown to not scale. I'd recommend redesigning the schema to allow multiple customers in a single database.
Amazon does no such thing. Neither should you.
I agree with duffymo that you would have scalability issues.
However, there are situations in which you might prefer a separate database per tenant as your multi-tenant data approach.
In my last project I had to adopt the separate-DB approach because the business wanted complete isolation for each customer. It was a school administration system, and the number of customers was not expected to grow beyond three digits within 5-10 years.
The solution I designed used the Entity Framework code-first approach. Every school has a unique school identifier, which is used to name that school's database uniquely. The connection string is generated at runtime, obviously, and a connection factory creates the appropriate DataContext based on the passed school identifier. The database is created on first usage if it does not exist, and at the same time a SQL script is executed to create the db users if they do not exist.
If this approach sounds appealing, I can share a code snippet if that helps.
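Roughly, the factory idea looks like the sketch below (this is not the actual EF code, just a simplified Python/pyodbc illustration of the flow; the server name, naming convention and driver are assumptions):

```python
# Hypothetical sketch: build a per-school connection and create the database on first use.
import pyodbc

SERVER = "sql.example.local"  # placeholder

def connection_for_school(school_identifier):
    if not school_identifier.isalnum():
        raise ValueError("unexpected school identifier")
    db_name = f"School_{school_identifier}"

    # Create the tenant database on first use, as EF's database initializer would.
    master = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=" + SERVER +
        ";DATABASE=master;Trusted_Connection=yes;", autocommit=True)
    master.cursor().execute(
        f"IF DB_ID('{db_name}') IS NULL CREATE DATABASE [{db_name}]")
    # A users/permissions script could be executed here right after creation.
    master.close()

    return pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=" + SERVER +
        ";DATABASE=" + db_name + ";Trusted_Connection=yes;")
```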
I have an application which consumes RSS feeds and makes them searchable by performing the following steps (a rough sketch of this pipeline follows the list):
pulling articles from the feed URL
storing that data in a relational DB
indexing the data in Elasticsearch
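In code, the pipeline is roughly the following (a simplified Python sketch just to illustrate the flow; the actual app is Rails, as described in the edit below, and the feed URL, table and index names are placeholders):

```python
# Hypothetical sketch of the current pipeline; assumes the feed provides GUIDs and
# that the articles table has a unique constraint on guid.
import feedparser
import psycopg2
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])
pg = psycopg2.connect("dbname=articles_db")

def ingest(feed_url):
    feed = feedparser.parse(feed_url)                        # 1. pull articles from the feed
    with pg, pg.cursor() as cur:
        for entry in feed.entries:
            cur.execute(
                "INSERT INTO articles (guid, title, url) VALUES (%s, %s, %s) "
                "ON CONFLICT (guid) DO NOTHING RETURNING id",
                (entry.id, entry.title, entry.link),
            )                                                # 2. store in the relational DB
            row = cur.fetchone()
            if row:
                es.index(index="articles", id=row[0],        # 3. index in Elasticsearch
                         body={"title": entry.title, "url": entry.link})
```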
I'd like to reverse this process so that I can use the RSS River Elasticsearch plugin to pull data from feeds. However, this plugin integrates directly with Elasticsearch, bypassing my relational DB (which is a problem for other parts of the application which rely on each article having a record in the DB).
How can I have Elasticsearch notify the DB when a new article has been indexed (and de-indexed)?
Edit
Currently I'm using Ruby on Rails 4 with a PostgreSQL DB. RSS feeds are fetched in the background using Sidekiq to manage jobs. They go directly into PG and are then indexed by Elasticsearch. I'm using Chewy to provide an interface to the ES index. It doesn't support callbacks like I'm looking for (no Ruby library does afaik?).
Searching queries ES for matches and then loads the records from PG to display the results.
It sounds like you are looking for the sort of notification/trigger functionality described in this feature request. In the absence of that feature I think the approach suggested in that thread by the user "cravergara" is your best bet - that is, you can alter the RSS river Elasticsearch plugin to update your DB whenever an article is indexed.
That would handle the indexing requirement. To sync the de-indexing, you should make sure that any code that deletes your Elasticsearch documents also deletes the corresponding DB records.
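For the de-indexing side, that could be as simple as the following sketch (Python rather than the app's Ruby, purely to illustrate; index, table and column names are placeholders):

```python
# Hypothetical sketch: delete the search document and the DB row in the same code path.
import psycopg2
from elasticsearch import Elasticsearch
from elasticsearch.exceptions import NotFoundError

es = Elasticsearch(["http://localhost:9200"])
pg = psycopg2.connect("dbname=articles_db")

def delete_article(article_id):
    # Remove the search document first; a missing document shouldn't block the DB delete.
    try:
        es.delete(index="articles", id=article_id)
    except NotFoundError:
        pass
    with pg, pg.cursor() as cur:
        cur.execute("DELETE FROM articles WHERE id = %s", (article_id,))
```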
We are going to be selling a service that will be hosted by us, and each client we host will have their own database, but there will be one centralized website. I currently have a blank database with the few things that a new client will need. What is the best way to copy this database so I can setup another client? I want to be able to do this from an .aspx page. Thanks in advance!
Update:
By .aspx page, I just meant that I need to be able to kick off the process from an .aspx page.
Update2:
We're running SQL Server 2008.
Update 3: Referencing Cade Roux's answer... Thanks for a great answer, but...
What is the reason for merging all of the databases into one and then distinguishing clients based on an identifier in each table? Wouldn't this greatly complicate the architecture of the entire product? I would need to add these Client ID columns to practically every table, and the DAL would need to know which client's data it's looking for. With the current setup I have, I just switch out the connection string in the DAL depending on which user is accessing the site. That way, after the connection string is set, I never need to worry about finding client-specific data! How do these approaches compare (and should I add this as a separate question)?
You have a few different options:
You can detach your empty database, then, when a user signs up, copy the database files and attach them under a unique name for that user, and map it to their account in your master database, say.
You can create a database from scratch using scripts and populate any base data, either from an online template database or by scripting the base data, and map it to their account in your master database.
You should seriously consider going to a multi-tenant architecture where all users are in the same database (with most tables having CustomerID columns to segregate the data) if you are going to have more than a few dozen customers.
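For illustration, option 3 boils down to something like the sketch below (a Python/pyodbc sketch with placeholder table and column names; in an ASP.NET DAL this would be the equivalent ADO.NET code):

```python
# Hypothetical sketch: one shared database; every tenant-owned table carries a CustomerID
# column, and every query in the DAL filters on it.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=.;"
    "DATABASE=SharedAppDb;Trusted_Connection=yes;")

def orders_for_customer(customer_id):
    cur = conn.cursor()
    cur.execute(
        "SELECT OrderId, CreatedOn, Total FROM dbo.Orders WHERE CustomerId = ?",
        customer_id)
    return cur.fetchall()
```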
Regarding your notes about option 3 - it depends on your application. Multi-tenant can be difficult to retrofit. On the other hand, managing and upgrading hundreds of individual customer databases can be difficult in the long haul.
There are previous Stack Overflow questions regarding this:
What are the advantages of using a single database for EACH client?
One database or many?
I think I'll see about re-tagging them with multi-tenant-db or something. Anyhow, the fact that this comes up as a consideration secondary to your question about a particular tactic does show the importance of including details about your overall goals and strategy in every question on Stack Overflow.
Depending on what database you're using, there are several approaches. The simplest is to ask your database software to generate SQL code for creating the database and include that with your software. Another would be to just script out in C#/VB the steps needed to recreate your empty database.
Why the need for an .aspx page?
You don't say which database version you're using, but in SQL Server 2005-2008 you have the ability to "Script Database As" and then "CREATE To" and have it put the SQL in a query window. You could then work that into a stored procedure that can be called from your .aspx page.
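Calling such a procedure from the page is then a single command; sketched here in Python/pyodbc only to show the shape (the procedure name and parameter are hypothetical, and in the .aspx code-behind this would be plain ADO.NET):

```python
# Hypothetical sketch: kick off client-database creation by calling a stored procedure
# that wraps the generated creation script.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=.;"
    "DATABASE=master;Trusted_Connection=yes;", autocommit=True)

def create_client_database(client_name):
    conn.cursor().execute("EXEC dbo.usp_CreateClientDatabase @ClientName = ?", client_name)
```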
SQL Server has a system database called 'model'. Any database objects (tables, views, stored procedures) that exist in the model are added to any new database created.
You could create your 'client database' schema in model, and any new database would then have all the same tables...
But, if you need to change your database schema later, your best option is to write change scripts which are part of your code-behind file. Since changes to the 'model' database are not propagated to existing databases, the application needs to detect and upgrade the database schema as necessary.
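A rough sketch of that detect-and-upgrade step (the version table, the script numbering and the pyodbc usage are all assumptions, not an established convention):

```python
# Hypothetical sketch: a SchemaVersion table in each client database plus numbered
# change scripts applied in order at application startup.
import pyodbc

CHANGE_SCRIPTS = {
    1: "ALTER TABLE dbo.Clients ADD Email nvarchar(256) NULL;",
    2: "CREATE TABLE dbo.AuditLog (Id int IDENTITY PRIMARY KEY, Message nvarchar(max));",
}

def upgrade(client_db_connection_string):
    conn = pyodbc.connect(client_db_connection_string, autocommit=True)
    cur = conn.cursor()
    cur.execute("IF OBJECT_ID('dbo.SchemaVersion') IS NULL "
                "CREATE TABLE dbo.SchemaVersion (Version int NOT NULL)")
    cur.execute("SELECT ISNULL(MAX(Version), 0) FROM dbo.SchemaVersion")
    current = cur.fetchone()[0]
    # Apply every change script newer than the database's current version, in order.
    for version in sorted(v for v in CHANGE_SCRIPTS if v > current):
        cur.execute(CHANGE_SCRIPTS[version])
        cur.execute("INSERT INTO dbo.SchemaVersion (Version) VALUES (?)", version)
```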
Disadvantage to the 'model' approach: if you want a database which isn't a 'client database', then you would need to create the database and then delete the 'client database' tables.