The Pimcore Grid export is taking far too long, even though we only have 50,000 products/records; a full export takes a couple of hours.
Is there any way to speed this process up?
Any help or recommendation would be appreciated.
Related
I'm running Rcrawler on a very large website, so it takes a very long time (3+ days with the default page depth). Is there a way to avoid downloading all the HTML pages, to make the process faster?
I only need the URLs that are stored in the INDEX.
Or can anyone recommend another way to make Rcrawler run faster?
I have tried running it with a smaller page depth (5), but it is still taking forever.
I am dealing with the same issue. Depending on the source, in some cases I am even running at depth 1.
Best,
Janusz
I currently have a table in BigQuery with a size of 100+ GB that I would like to retrieve into R. I am using the list_tabledata() function from the bigrquery package in R, but it takes a huge amount of time.
Does anyone have recommendations on handling this amount of data in R and boosting performance, e.g. any packages or tools?
tabledata.list is not a great way to consume a large amount of table data from BigQuery - as you note, it's not very performant. I'm not sure if bigrquery has support for table exports, but the best way to retrieve data from a large BigQuery table is using an export job. This will dump the data to a file on Google Cloud Storage that you can then download to your desktop. You can find more info on exporting tables in our documentation.
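For illustration, here is a rough sketch of such an extract job using the google/cloud-bigquery PHP client (the same job can be started from the BigQuery console, the bq command-line tool via bq extract, or any other client library); the project, dataset, table and bucket names below are placeholders:

use Google\Cloud\BigQuery\BigQueryClient;

// Sketch only: export the table to sharded, gzipped CSV files on Cloud Storage.
// The wildcard in the URI matters, because exports larger than 1 GB must be
// split across multiple files.
$bigQuery = new BigQueryClient(array('projectId' => 'my-project'));
$table = $bigQuery->dataset('my_dataset')->table('my_table');

$extractConfig = $table->extract('gs://my-bucket/export/my_table-*.csv.gz')
  ->destinationFormat('CSV')
  ->compression('GZIP');

$job = $table->runJob($extractConfig);  // runs the export job and waits for it to finish

// The resulting shards can then be downloaded from Cloud Storage and read into R
// with read.csv() or data.table::fread().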
Another option: instead of bringing that large volume of data to your code, try to bring your code to the data. This can be challenging in terms of implementing your logic in BigQuery SQL; a JavaScript UDF might help. It depends on what you need to do.
If this is not doable, I would recommend either using sampled data or revisiting your data model.
I am trying to export all of my leads from Marketo (we have over 20M) into a CSV file, but there is a 10k row limit per CSV export.
Is there any other way that I can export a CSV file with more than 10k rows? I tried searching for data loader tools on Marketo Launchpoint but couldn't find one that would work.
Have you considered using the API? It may not be practical unless you have a developer on your team (I'm a programmer).
See the Marketo Lead API documentation.
If your leads are in Salesforce and Marketo/Salesforce are in parity, then instead of exporting all your leads, do a sync from Salesforce to the new marketing automation tool (if you are switching). It's a cleaner, easier sync.
For important campaigns etc., you can create smart lists and export those.
There is no 10k row limit for exporting leads from a list. However, there is a practical limit, especially if you choose to export all columns (instead of only the visible columns). I would generally advise exporting a maximum of 200,000-300,000 leads per list, so you'd need to create multiple lists.
As Michael mentioned, the API is also a good option. I would still advise creating multiple lists, so you can run multiple processes in parallel, which will speed things up. You will also need to look at your daily API quota: the default is either 10,000 or 50,000 calls per day. 10,000 API calls let you download 3 million leads (at a batch size of 300).
I am trying out Data Loader for Marketo on Marketo Launchpoint to export my lead and activity data to my local database. Although it cannot transfer Marketo data to a CSV file directly, you can download leads to your local database and then export from there to get a CSV file. For reference, we have 100K leads and 1 billion activity records.
You might have to run it multiple times for 20M leads, but the tool is quite easy and convenient to use, so it may be worth a try.
There are 4 steps to getting bulk leads out of Marketo (a rough sketch of these calls follows the link below):
1. Create an export job
2. Enqueue the export lead job
3. Poll the job status
4. Retrieve your data
http://developers.marketo.com/rest-api/bulk-extract/bulk-lead-extract/
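For anyone scripting this, here is a rough PHP/cURL sketch of those four calls against the bulk extract endpoints documented at the link above. The REST base URL, credentials, field list and date filter are placeholders, and as far as I recall the createdAt filter covers at most 31 days per job, so a 20M-lead database would be pulled in date-range chunks:

// Placeholders: your Marketo REST endpoint and custom-service credentials.
define('REST_BASE', 'https://123-ABC-456.mktorest.com');

function marketo_call($method, $url, $body = NULL) {
  $ch = curl_init($url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
  curl_setopt($ch, CURLOPT_CUSTOMREQUEST, $method);
  if ($body !== NULL) {
    curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($body));
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
  }
  $response = curl_exec($ch);
  curl_close($ch);
  return json_decode($response, TRUE);
}

// 0. Authenticate (access tokens expire, so refresh during long polls).
$auth = marketo_call('GET', REST_BASE . '/identity/oauth/token'
  . '?grant_type=client_credentials&client_id=CLIENT_ID&client_secret=CLIENT_SECRET');
$token = '?access_token=' . $auth['access_token'];

// 1. Create the export job (a filter is required; here, leads created in a date range).
$create = marketo_call('POST', REST_BASE . '/bulk/v1/leads/export/create.json' . $token, array(
  'fields' => array('firstName', 'lastName', 'email'),
  'format' => 'CSV',
  'filter' => array('createdAt' => array('startAt' => '2017-01-01', 'endAt' => '2017-01-31')),
));
$exportId = $create['result'][0]['exportId'];

// 2. Enqueue the job.
marketo_call('POST', REST_BASE . "/bulk/v1/leads/export/{$exportId}/enqueue.json" . $token);

// 3. Poll the job status until it is Completed.
do {
  sleep(60);
  $status = marketo_call('GET', REST_BASE . "/bulk/v1/leads/export/{$exportId}/status.json" . $token);
} while ($status['result'][0]['status'] !== 'Completed');

// 4. Retrieve the CSV file.
file_put_contents('leads.csv',
  file_get_contents(REST_BASE . "/bulk/v1/leads/export/{$exportId}/file.json" . $token));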
I developed a web application in ASP.NET and VB. It displays a report in a DataGrid, and there is an option to export the report from the DataGrid to MS Excel. With a small amount of data the export function works properly, but with a large amount of data I am not able to export to MS Excel. Is there an option to export the data page by page, or any other option? Please help me solve this issue.
You could export it as a CSV; the overhead should be a lot less.
I'm using Table Wizard + Migrate module to import nodes into my Drupal installation.
I need to import around 60,000 questions/answers (they are both nodes) and I thought it would be an easy task.
However, the migrate process imports 4 nodes per minute, and it would take approximately 11 days to finish the import.
I was wondering if I could make it faster by importing directly into MySQL, but I actually need to create 60,000 nodes, and I guess Drupal stores additional information in other tables... so it doesn't seem that safe.
What do you suggest I do? Wait 10 days?
Thanks
Table migrate should be orders of magnitude faster than that.
Are you using pathauto?
If so, try disabling the pathauto module; it often causes big performance problems on import.
Second, if disabling pathauto doesn't work, turn off all non-essential modules you may have running - some modules do crazy stuff. Eliminate other modules as possible sources of the problem.
Third, is the MySQL query log turned on? That can have a big performance impact - not at the level you are talking about, but it's something to consider.
Fourth, install xdebug and tail your MySQL log to see exactly what's happening.
What is your PHP memory limit?
Do you have plenty of disk space left?
If you're not already doing so, you should use drush to migrate the nodes in batches. You could even write a shell script for it if you want it automated. Running the import from the command line should lower the time it takes to import the nodes a lot, and with a script you can make it an automated task that you don't have to worry about.
One thing I want to note, though: 4 nodes per minute is very low. I once needed to import 300 nodes from a CSV file using Migrate, each with a location and 4-5 CCK fields, and it finished in a matter of seconds. So if you are only importing 4 nodes per minute, you either have extremely complex nodes or something fishy is going on.
What are the specs of the computer you are using for this? Where's the import source located?
This is a tough topic, but within Drupal it is actually very well covered. I don't know the ins and outs, but I do know where to look.
The Data Mining Drupal group has some pointers, knowledge and information on processing large amounts of data in PHP/Drupal.
Drupal core also has batch functionality built in, called the Batch API, at your service when writing modules. For a working example, see this tutorial on CSV import.
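As a rough, Drupal 7-flavoured sketch of the idea (the module name, content type and CSV layout are assumptions, not a drop-in solution), a Batch API operation can create the nodes a slice at a time so no single request runs long enough to time out:

function mymodule_start_import($file_path) {
  batch_set(array(
    'title' => t('Importing questions and answers'),
    'operations' => array(array('mymodule_import_rows', array($file_path))),
    'finished' => 'mymodule_import_finished',
  ));
  batch_process('admin/content');  // only needed when not called from a form submit handler
}

function mymodule_import_rows($file_path, &$context) {
  if (!isset($context['sandbox']['offset'])) {
    $context['sandbox']['offset'] = 0;
  }
  $handle = fopen($file_path, 'r');
  fseek($handle, $context['sandbox']['offset']);
  module_load_include('inc', 'node', 'node.pages');

  // Save at most 250 nodes per pass, then let the Batch API start a fresh request.
  for ($i = 0; $i < 250 && ($row = fgetcsv($handle)) !== FALSE; $i++) {
    if ($row === array(NULL)) {
      continue;  // skip blank lines
    }
    $node = new stdClass();
    $node->type = 'question';  // assumed content type
    $node->language = LANGUAGE_NONE;
    node_object_prepare($node);
    $node->uid = 1;
    $node->title = $row[0];
    $node->body[LANGUAGE_NONE][0]['value'] = $row[1];
    node_save($node);
  }

  // Remember where we stopped and tell the Batch API whether we are done.
  $context['sandbox']['offset'] = ftell($handle);
  $context['finished'] = feof($handle) ? 1 : 0;
  fclose($handle);
}

function mymodule_import_finished($success, $results, $operations) {
  drupal_set_message($success ? t('Import finished.') : t('Import finished with errors.'));
}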
4 nodes per minute is incredibly slow; Migrate shouldn't normally take that long. You could speed things up a bit by using Drush, but probably not enough to get a reasonable import time (hours, not days). That wouldn't really address your core problem: your import itself is taking too long. The overhead of the Migrate GUI isn't that big.
Importing directly into MySQL would certainly be faster, but there's a reason Migrate exists. Node database storage in Drupal is complicated, so it's generally best to let Drupal work it out rather than trying to figure out what goes where.
Are you using Migrate's hooks to do additional processing on each node? I'd suggest adding some logging to see what exactly is taking so long. Test it on 10 nodes at a time until you figure out the lag before doing the whole 60k.
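As a rough sketch of that kind of logging (Migrate 2.x on Drupal 7; the class name and threshold are made up, and the two methods go inside your existing migration class), timing each save makes it obvious whether the lag is in Migrate itself or in another module reacting to node_save():

class QuestionMigration extends Migration {
  // ... your existing constructor, source, destination and field mappings ...

  protected $timer = 0;

  // Called by Migrate just before each node is saved.
  public function prepare($node, stdClass $row) {
    $this->timer = microtime(TRUE);
  }

  // Called by Migrate just after each node is saved.
  public function complete($node, stdClass $row) {
    $elapsed = microtime(TRUE) - $this->timer;
    if ($elapsed > 1) {
      watchdog('import_profiling', 'Node @nid took @sec seconds to save.',
        array('@nid' => $node->nid, '@sec' => round($elapsed, 2)));
    }
  }
}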
We had a similar problem on a Drupal 7 install. We left it running all weekend on an import, and it only imported 1,000 lines of the file.
The funny thing is that exactly the same import on a pre-production machine was taking 90 minutes.
We ended up comparing the source code (making sure we were at the same commit in Git), the database schema (identical), the number of nodes on each machine (not identical, but similar)...
Long story short, the only significant difference between the two machines was the max_execution_time option in the php.ini settings file.
The production machine had max_execution_time = 30, while the pre-production machine had max_execution_time = 3000. It looks like the Migrate module has a kind of system for handling a "short" max_execution_time that is less than optimal.
Conclusion: set max_execution_time = 3000 or more in your php.ini; it helps the Migrate module a lot.
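If you can't touch php.ini on the box, roughly the same effect can usually be had at runtime (a small sketch, e.g. from settings.php; note that drush/CLI runs have no time limit by default):

// settings.php - raise the limit for long-running web requests such as imports.
ini_set('max_execution_time', 3000);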
I just wanted to add a note saying that disabling pathauto really does help. I had an import of over 22k rows; before disabling it, the import took over 12 hours and crashed multiple times. After disabling pathauto and running the import again, it took only 62 minutes and didn't crash once.
Just a heads-up: I created a module that disables the pathauto module before the import starts and re-enables it once the feed has finished. Here's the code from the module in case anyone needs this ability:
/**
 * Implements hook_feeds_before_import().
 *
 * Disable pathauto while the feed import runs.
 */
function YOURMODULENAME_feeds_before_import(FeedsSource $source) {
  $modules = array('pathauto');
  module_disable($modules);
  drupal_set_message(t('The @module module has been disabled for the import.', array('@module' => $modules[0])), 'warning');
}

/**
 * Implements hook_feeds_after_import().
 *
 * Re-enable pathauto once the feed import has finished.
 */
function YOURMODULENAME_feeds_after_import(FeedsSource $source) {
  $modules = array('pathauto');
  module_enable($modules);
  drupal_set_message(t('The @module module has been re-enabled.', array('@module' => $modules[0])), 'warning');
}