Tika and Solr Drupal 7 Indexing on Cron

Using Drupal/Search API module/Solr/Tika we are trying to index a large number of files.
I've set up the index and everything works fine until I include the Search API attachments module.
When we run cron, Tika is not being called. We know this because we put a snippet of PHP code that writes to the system log at the end of the Tika module, and that message never shows up. It does show up when indexing manually.
Additionally, the number of items indexed does not go up after a cron run.
We also noticed that if we run tika from the command line we get the following error at the top of the output:
INFO - unsupported/disabled operation: EI
Without the box to index attachments checked, the index works as expected both on cron and when indexing manually.
Any idea what the problem might be?
Thanks!
Site Built On:
Drupal 7
Modules In Question:
Search API
Search API Attachments
Indexing with:
Apache Solr
Indexing Attachments using:
Tika Library
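
For reference, the kind of "write to the system log" snippet mentioned in the question can be as simple as the following (a minimal sketch using Drupal 7's watchdog(); the wrapper function name and where you hook it into the extraction code are assumptions, not part of the Search API Attachments module):

function mymodule_log_tika_attempt($file_uri) {
  // Write an entry to the Drupal 7 system log (Reports > Recent log messages)
  // so you can tell whether the extraction code path is reached at all
  // during a cron run.
  watchdog('tika_debug', 'Tika extraction attempted for @uri', array('@uri' => $file_uri), WATCHDOG_DEBUG);
}

If this entry shows up when indexing manually but never after cron, the extraction code path is simply not being reached during cron runs.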

I have the same problem. But it does not seem to be a problem at all, because the document seems to get indexed anyway.
I guess it is a Tika problem, because some documents (PDFs) work well and others don't. Maybe it depends on the PDF version. Try something simpler: e.g. I wrote some sample text and used the print-to-PDF function on my Mac to get a simple PDF document. Or use a Word doc. We also had to apply the real-path patch to get Tika working with the files, and the Transliteration module to get clean filenames.
For debugging Search API I use the dd() function from Devel. In search_api_solr/includes/solr_httptransport.inc, inside performHttpRequest(), I call
dd($url); dd($options);
right before $response = drupal_http_request($url, $options); (line 92). Hopefully this helps.


How can I use the SonarQube web service API for reporting purposes?

I want to create a custom report. The response format of the SonarQube web service API /api/issues/search is JSON or XML. How can I use that response to create an HTML or CSV file using "unix shell without using command line tools", so that I can use it as a report? Or is there any other, better way to achieve this?
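For what it's worth, turning the /api/issues/search JSON response into a CSV file only takes a short script. Below is a rough sketch in PHP rather than pure shell; the server URL, project key, page size, and the exact issue fields are assumptions to adapt to your instance and SonarQube version:

<?php
// Rough sketch: fetch issues for one project from the SonarQube web service
// API and write selected fields to a CSV file. Only the first page is
// fetched here; page through the API for larger result sets.
$server = 'http://localhost:9000';                     // assumed server URL
$projectKey = 'my:project';                            // assumed project key
$url = $server . '/api/issues/search?componentKeys=' . urlencode($projectKey) . '&ps=500';

// Fetch and decode the JSON response (add authentication if your instance requires it).
$data = json_decode(file_get_contents($url), TRUE);

$out = fopen('issues_report.csv', 'w');
fputcsv($out, array('key', 'severity', 'component', 'line', 'message'));
foreach ($data['issues'] as $issue) {
  fputcsv($out, array(
    $issue['key'],
    $issue['severity'],
    $issue['component'],
    isset($issue['line']) ? $issue['line'] : '',
    $issue['message'],
  ));
}
fclose($out);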
You can generate an HTML file if you run an analysis in preview mode: http://docs.sonarqube.org/pages/viewpage.action?pageId=6947686
It looks as if the SonarQube team has been working hard to not allow people to do this. They appear to want people to purchase an Enterprise Subscription in order to export reports.
An old version of sonar-runner (now called sonar-scanner) had an option to allow local report output. But that feature is "no more supported".
ERROR: The preview mode, along with the 'sonar.analysis.mode' parameter, is no more supported. You should stop using this parameter.
Looks like version 2.4 of Sonar Runner does what you want. If you can find it. Of course they only have 2.5RC1 available on the site now.
Using the following command should work on version 2.4:
sonar-runner -Dsonar.analysis.mode=preview -Dsonar.issuesReport.html.enable=true
There are at least two open-source projects that query the SQ API to generate reports in various formats.
https://github.com/cnescatlab/sonar-cnes-report/tree/dev (Java)
https://github.com/soprasteria/sonar-report (JavaScript/Node)
At the time of writing both are active.

single download file for multiple applications

I have a website on which I have published several of my applications.
Right now I have to update it each time one of the applications is updated.
The applications themselves check for updates so the user only visits the website if they don't have a previous version installed.
I would like to make this easier for myself by creating a single executable that, when downloaded and executed, checks with the database which version is the most recent, then downloads that version and runs its setup.
I could make a downloader for each application, but I would rather make something more universal, with a parameter or argument as the only difference.
For the downloader to 'know' which database to check for the most recent version, I need to pass that data to it.
My first thought was putting it in an XML file, so I would only have to generate a different XML file for each application, but then it wouldn't be a single executable anymore.
My second thought was using command-line arguments, like: downloader.exe databasename
But how would I do that when the file is downloaded?
Would a link like: "https://my.website.com/downloader.exe databasename" work?
How could I best do this?
rg.
Eric
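
One hedged sketch for the "check with the database which version is the most recent" part, assuming the website can run PHP: a small server-side endpoint the downloader could call with the application name. The script name, table layout, and JSON response format below are made up for illustration:

<?php
// latest.php (hypothetical): called as latest.php?app=databasename.
// Looks up the newest version and its download URL for the requested
// application and returns them as JSON for the downloader to consume.
$app = isset($_GET['app']) ? $_GET['app'] : '';

$pdo = new PDO('mysql:host=localhost;dbname=releases;charset=utf8', 'user', 'pass');  // assumed credentials
$stmt = $pdo->prepare('SELECT version, download_url FROM releases WHERE app_name = ? ORDER BY released_at DESC LIMIT 1');
$stmt->execute(array($app));
$row = $stmt->fetch(PDO::FETCH_ASSOC);

header('Content-Type: application/json');
echo json_encode($row ? $row : array('error' => 'unknown application'));

The downloader itself would then only need the application name to request something like latest.php?app=databasename and download whatever URL comes back.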

Searching through ELMAH error log files (perhaps in the 1000s)

We have a folder of ELMAH error logs in XML format. There will be millions of these files, and each file might be up to 50 KB in size. We need to be able to search the files (e.g. what errors occurred, which system failed, etc.). Is there an open source system that will index the files and help us search through them using keywords? I have looked at Lucene.Net, but it seems I would have to code the application myself.
Please advise.
If you need to have the logs in a folder in XML, elmah-loganalyzer might be of use.
You can also use Microsoft's Log Parser to perform "SQL-like" queries over the XML files:
LogParser -i:XML "SELECT * FROM *.xml WHERE detail like '%something%'"
EDIT:
You could use a combination of Nutch + Solr or Logstash + Elasticsearch as an indexing solution.
http://wiki.apache.org/nutch/NutchTutorial
http://lucene.apache.org/solr/tutorial.html
http://blog.building-blocks.com/building-a-search-engine-with-nutch-and-solr-in-10-minutes
http://www.logstash.net/
http://www.elasticsearch.org/tutorials/using-elasticsearch-for-logs/
http://www.javacodegeeks.com/2013/02/your-logs-are-your-data-logstash-elasticsearch.html
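If you only need an ad-hoc keyword search before putting a real indexer in place, a small script that walks the folder can also work, along the lines of the LogParser query above. A rough PHP sketch (the time/type/message attributes match typical ELMAH error XML files, but check yours):

<?php
// Rough sketch: scan a folder of ELMAH error XML files and print the ones
// that contain a keyword. Not an index - it re-reads every file on each
// run - but fine for occasional searches.
$folder  = '/path/to/elmah/logs';        // assumed location of the XML files
$keyword = 'NullReferenceException';     // whatever you are hunting for

foreach (glob($folder . '/*.xml') as $file) {
  if (stripos(file_get_contents($file), $keyword) === FALSE) {
    continue;
  }
  $xml = @simplexml_load_file($file);
  // time/type/message are attributes of the root <error> element in typical
  // ELMAH XML logs; fall back to the file name if the file can't be parsed.
  if ($xml !== FALSE && isset($xml['message'])) {
    echo $xml['time'] . "\t" . $xml['type'] . "\t" . $xml['message'] . "\n";
  } else {
    echo $file . "\n";
  }
}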
We are a couple of developers building the website http://elmah.io. elmah.io indexes all your errors (in Elasticsearch) and makes it possible to do funky searches, group errors, hide errors, filter errors by time, and more. We are currently in beta, but you will get a link to the beta site if you sign up at http://elmah.io.
Unfortunately elmah.io doesn't import your existing error logs. We will open source an implementation of the ELMAH ErrorLog type which indexes your errors in your own Elasticsearch (watch https://github.com/elmahio for the project). Again, this error logger will not index your existing error logs, but you could implement a parser which runs through your XML files and indexes everything using our open source error logger. You could also import the errors directly into elmah.io through our API, if you don't want to implement a new UI on top of Elasticsearch.

Tridion: Binary components do not get deployed when published in bulk

I am using Tridion 5.3.
I have a web page that has over 100 PDF links attached to it. When I publish that page, not all PDFs get published, even though I get a URL for each PDF like "/pdf/xyzpdfname_tcm8-912.pdf". When I click on those links I get a 404 error. For the same PDF components that give the error, if I publish them by attaching 5 to 10 PDFs at a time, they get published, there is no 404 error, and everything works fine. But that's not the functionality I need. Does anyone know why Tridion is not able to deploy the binary content when I publish it in bulk?
I am using engine.PublishingContext.RenderedItem.AddBinary(pdfComponent).Url to get the pdf url.
Could this be to do with the naming of your PDF?
Tridion has a mechanism in place to prevent you from accidentally overwriting a binary file, with a different binary file that is named the same.
I can see the Binary you are trying to deploy has the ID:
tcm:8-755-16
and you are naming it as follows:
/www.mysite.com/multimedia/pdfname_tcm8-765.pdf
Using the Variant Id:
variantId=tcm:8-755
is it possible you are also publishing the same binary from a different template? Perhaps with the same filename, but with a different Variant Id?
If so, Tridion assumes you are trying to publish two 'variants' of the same binary (for example a resized image; obviously not relevant for PDFs).
The deployer is therefore throwing an error to prevent you from accidentally overwriting the binary that is published first.
You can get around this in two ways:
1. Use the same variant ID for publishing both binaries.
2. If you do want to publish a variant, change the filename to something different.
I hope this helps!
Have a look at the log files for your transport service and deployer. If those don't provide clarity, set Cleanup to false in cd_transport_conf.xml, restart the transport service and publish again. Then check if all PDFs ended up in your transport package.
engine.PublishingContext.RenderedItem.AddBinary(pdfComponent).Url gives you the URL of an item as it will be published in case of success, not a guarantee that it will publish.
Pretty sure you're just hitting a maximum size limit on your transport package.
PS - Check the status of your transaction in the publishing queue; it might give you a hint.
After you updated the question:
There's something terribly wrong with the template and/or your environment. The published URL says "tcm8-765.pdf" but the item URI is "tcm:8-755".
Can you double check what's happening in here?

need help in choosing the right tool

I have a client who has set up a testing environment in some AI language. It basically runs some predefined test cases and stores the results as log files (comma-separated txt files). My job is to identify and suggest a reporting system, and I have these options in mind: either
1. import the logs into MS SQL and use its reporting (SSRS), or
2. import the logs into MySQL and use PHP to develop custom reporting.
I am thinking that going with option 2 is better. The reason is that the logs are inconsistent and contain unexpected wild characters that databases normally don't accept, so I can write some scripts in PHP to clean them up before loading them into the database.
If this were your problem, what would you suggest doing?
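For what it's worth, the "clean the logs in PHP before loading" step from option 2 can be quite small. A rough sketch (the table name, column layout, and character whitelist are made up for illustration):

<?php
// Rough sketch: read a comma-separated log file, strip characters the
// database would choke on, and insert the rows into MySQL via PDO.
$pdo = new PDO('mysql:host=localhost;dbname=test_reports;charset=utf8', 'user', 'pass');  // assumed credentials
$stmt = $pdo->prepare('INSERT INTO test_results (test_case, status, message) VALUES (?, ?, ?)');

$handle = fopen('/path/to/results.txt', 'r');          // assumed log file location
while (($row = fgetcsv($handle)) !== FALSE) {
  if (count($row) < 3) {
    continue;                                          // skip malformed lines
  }
  // Drop anything outside printable ASCII; widen the whitelist to match
  // whatever "wild characters" the logs actually contain.
  $clean = array_map(function ($field) {
    return preg_replace('/[^\x20-\x7E]/', '', $field);
  }, array_slice($row, 0, 3));
  $stmt->execute($clean);
}
fclose($handle);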
It depends how fancy you need to be. If the data is in CSV files, you could go as simple as loading it into Excel (or their favorite spreadsheet tool) and using spreadsheet macros to analyze it.
