I have a question regarding the search module of DNN. I am currently using DNN Community edition 7.4 and I want to index PDF documents so that I can search the content of the PDF files.
After a lot of research I came to the following conclusions:
By default DNN does not support this. You can enable it by using external modules such as Dnn search engine and Search boost 3.2.
The Community edition can index only pages, not documents. The Professional edition can index both documents and pages.
Are these conclusions valid? If I upgrade my Community edition, can I then index documents, and if I keep using the Community edition, is my only option to buy external modules?
You are correct.
The Evoq (professional) editions have three search crawler schedule tasks that index different types of content:
File Crawler will index any supported document file types in the File Manager to include in the search results. This crawler will parse files such as PDF and Office documents.
Site Crawler will index module content from HTML modules as well as any third-party modules that support the search integration.
Url Crawler will crawl and parse HTML pages, typically on external URLs where you cannot get the content directly from the CMS with the Site Crawler.
The DNN Community edition only comes with the Site Crawler. So if you need file parsing and indexing, you will need to use a third-party module like Searchboost or upgrade to one of the Evoq editions.
I am working on a project where we have implemented content management with Word.
We have some Word files that are processed using OpenXML.
Users can open those files in two ways - download a copy or edit online. Online editing is implemented using Office Online Server and custom Wopi server, built based on this example.
Editing online works fine, but Word Online has limited features compared to desktop Word.
I am trying to build functionality similar to SharePoint, where the user has two options, Edit in Word and Edit in Browser:
In Office Online Server I don't have such options; I can only edit in the browser:
Even in edit mode SharePoint provides a link for Edit in Word:
whereas Office Online Server does not have it:
My question is: how is this implemented in SharePoint?
In other words, am I missing something in my WOPI server to enable it, or has Microsoft built this functionality into SharePoint without the need for WOPI and/or OWA?
Any ideas would be appreciated!
To enable "Edit in Word" in Office Online Server when using a WOPI handler, you need to set the ClientUrl property in CheckFileInfo (and CheckFolderInfo if you implement that). ClientUrl should be set to a direct editable link for the document file, either WebDAV or FSHTTP, but you could even use a file:// link for testing.
When you set the ClientUrl property, Office Online behavior becomes very similar to OneDrive/SharePoint Online. The current WOPI documentation is a bit outdated: it lists this property under Unused and future properties, but there is nothing secret about it. I confirmed this with dochelp@microsoft.com, which is Microsoft's "Open Specifications Support" mailbox, mentioned in many of their presentations and publications about WOPI and Office Online.
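As a rough illustration of the point above, here is a minimal sketch of a CheckFileInfo response body that includes ClientUrl. The helper name, the sample values and the dav.example.com host are made up for the example; only the property names come from the WOPI CheckFileInfo contract, and a real host would return more fields than this.

```python
import json


def check_file_info(file_name, size, owner_id, user_id, version, client_url):
    """Build a CheckFileInfo response body as a JSON string (illustrative helper)."""
    info = {
        # Basic properties that identify the file and the current user.
        "BaseFileName": file_name,
        "OwnerId": owner_id,
        "Size": size,
        "UserId": user_id,
        "Version": version,
        # Editing capabilities advertised by the host.
        "UserCanWrite": True,
        "SupportsUpdate": True,
        "SupportsLocks": True,
        # Direct editable link to the file (WebDAV or FSHTTP).
        # The URL passed in below is a hypothetical example host.
        "ClientUrl": client_url,
    }
    return json.dumps(info)


body = check_file_info(
    "report.docx", 18432, "owner-1", "user-7", "v3",
    "https://dav.example.com/docs/report.docx",
)
print(body)
```

The ClientUrl value must point at an endpoint that the desktop application can actually open and save to, so the WOPI endpoint URL itself is not suitable here.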
Word Online Reading View:
Word Online Editing View after clicking OPEN IN WORD:
I'm pretty sure that the Edit in Word functionality is not part of Office Online Server and that it doesn't use the WOPI protocol. In previous versions of SharePoint it was implemented using WebDAV, and I guess this hasn't changed. If you want to support opening, editing and saving, you should implement your own WebDAV server. You can save a lot of time if you use a pre-built server like the one from ITHit. They also have a JS framework to support opening files from the browser.
If you want a cheap, cross-browser alternative that will just invoke the editing apps I suggest you have a look at Office URIs.
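For illustration, a small sketch of how such a link could be built. It assumes the Office URI scheme form `<scheme>:<command>|u|<document URL>`, with `ofe` for edit and `ofv` for view; the helper name, the extension table and the example host are invented for this sketch.

```python
import posixpath
from urllib.parse import urlparse

# Extension to Office URI scheme name (only a few common types are listed).
SCHEMES = {
    ".doc": "ms-word",
    ".docx": "ms-word",
    ".xls": "ms-excel",
    ".xlsx": "ms-excel",
    ".ppt": "ms-powerpoint",
    ".pptx": "ms-powerpoint",
}


def office_uri(document_url, edit=True):
    """Return an Office URI that asks the desktop app to open the document."""
    # Look only at the path so a query string does not confuse the extension check.
    ext = posixpath.splitext(urlparse(document_url).path)[1].lower()
    scheme = SCHEMES.get(ext)
    if scheme is None:
        raise ValueError(f"no Office URI scheme known for {ext!r}")
    command = "ofe" if edit else "ofv"  # ofe = open for edit, ofv = open for view
    return f"{scheme}:{command}|u|{document_url}"


print(office_uri("https://dav.example.com/docs/report.docx"))
```

The resulting string can be used as the href of an ordinary link. The document URL still has to be something the desktop application can open and save to, such as a WebDAV location.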
I have to implement full-text search for a website based on SDL Tridion WCMS. Any suggestions or ideas on how to implement full-text search using Tridion Query?
The SDL Tridion Content Delivery API is designed for retrieval of content based on system or custom metadata and/or taxonomy. The full text is not available via the API for searching. To implement a full text site search on a Tridion site it is normal to use/integrate a separate search engine, such as Google Site Search or one of the Lucene based solutions. The best integrations usually use a storage extension to notify the search indexer when content has changed.
See How can we integrate Microsoft FAST with SDL Tridion 2011 SP1? and Extending Content Delivery Storage in SDL Tridion 2011 for some ideas/examples.
If your site is accessible to a Google bot, Google Site Search is easy.
You might also look to the app server for your full text search (for instance, if it's in a .NET/SQL environment).
If you want an enterprise search platform, check out the open source Solr. With Java, .NET and JavaScript APIs and a REST-based server/service, it is worth a long look.
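To give a feel for the REST-style interface, here is a minimal sketch that builds a query URL for Solr's select handler. The base URL, the core name "site" and the helper name are assumptions for the example; adjust them to your installation.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, text, rows=10):
    """Build a URL for a Solr select query (hypothetical helper)."""
    # q is the query string, rows limits the result count, wt asks for JSON output.
    params = urlencode({"q": text, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/{core}/select?{params}"


# The host, port and core name below are placeholders.
print(solr_select_url("http://localhost:8983", "site", "annual report"))
```

Fetching that URL with any HTTP client returns the matching documents, which the site's own search results page can then render.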
Not to go too far off topic, but this helped me visualize when I was answering the same question for the first time: site search means three things. One, a search engine; two, a search schema/index (decide what the beast eats and feed it); three, a search user interface.
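Those three parts can be shown with a toy example. This is only a sketch to make the split concrete, not a replacement for a real engine: the page paths and texts are invented, the "schema" is simply "index the words of each page", and the "user interface" is a function call.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """The index: map each term to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """The engine: return pages that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


# What the beast eats: hypothetical published pages.
pages = {
    "/about": "About our company",
    "/products": "Our products and pricing",
    "/contact": "Contact the company",
}
index = build_index(pages)

# The user interface, reduced to two calls.
print(search(index, "company"))
print(search(index, "our company"))
```

A real search engine adds ranking, stemming, document parsing and incremental updates on top of this, which is why integrating an existing one is usually the better route.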
Does anyone know of a way to include PDF documents in the search for Drupal 7?
I can't find anything to achieve this.
The Apache Solr Attachments module does this, but currently only has a development (not stable) release for Drupal 7.
Check out the Search API module and its sibling module Search API Attachments. It uses Apache Tika to parse text from documents.
Another option:
https://drupal.org/project/search_files
This is slightly lighter weight, but given Solr's power and community adoption, Solr is probably still the best return for effort (as a service or installed yourself), whether you use the Apache Solr modules or the Search API modules.
I am comparing Alfresco, Magnolia & Joomla, specifically with respect to the following features:
a. Ease of Integration of user created templates.
b. JCR (JSR-170?) or CMIS compliance.
c. Scalability in architecture.
d. Mobile site deployment.
I used cmsmatrix.org to compare features, but I could not get some of the specific information related to the above-mentioned points.
Any insights based on your experience working with one or more of the above CMS products will be helpful.
Thanks,
Krish.
While these four products are branded as CMS, I don't think they are really comparable. Drupal and, as far as I know, Joomla are web publishing CMS (or WCMS): they are designed to create web sites and manage their content. They are not designed as generic CMS, DMS or ECM. Alfresco, and probably Magnolia, are ECM/DMS designed to manage enterprise content.
For instance, while manageable in Drupal (given enough effort and custom PHP code), complex multi-state, multi-actor workflows for multilingual documents (PDF, Office, etc.) are probably easier to manage with Alfresco. And Alfresco is probably not suitable for managing web content with a lightweight publishing workflow and user-generated content.
Having the managed content published on a web site does not mean it has to be managed by the same tool that is used to manage the web site. For instance, using the Drupal CMIS module, you can bridge Drupal with Alfresco (or any CMIS-compliant ECM) to manage your enterprise content in the suitable tool but publish parts of it on a Drupal site.
Summarizing the input I received here along with what I found in my search of various discussions so far (thanks @mongolito404 and bkraft).
For web content management features - Drupal / Joomla is recommended.
For Enterprise Content Management / Document Management features with minimal web publishing features - Alfresco / Magnolia is recommended.
For specific requirements the best of different tools can be used - Drupal to publish web content via CMIS support. Alfresco as solution for workflow & document management.
Alfresco already supports & continues to have CMIS in product roadmap (contributes to CMIS community).
Drupal is CMIS compliant (OOTB) with strong web content capability.
Leveraging best of both (Alfresco & Drupal) could also be one of the options depending on the requirement. Refer: http://www.optaros.com/blogs/drupal-alfresco-integration#
Another interesting option seems to be Liferay (v6+ specifically) with their CMIS integration capability: http://www.liferay.com/web/jonas.yuan/blog/-/blogs/integrating-alfresco-through-cmis-in-liferay
Thanks,
Krish.
Can't speak for the others, but from Magnolia's perspective, ease of integration is certainly a core feature. It runs on the Java platform, so integration is a given from the platform side. In addition, Magnolia has been rated the most flexible CMS on the market today by independent analyst Tony White of Ars Logica; his free report is available for download (always worth a read, and other reports are also available).
JCR: Magnolia is based on JCR, and has been since the first line of code
CMIS: not implemented yet, but planned for Magnolia 5 to be shipped late this year
Scalability: Magnolia's got it covered. See our case studies
Mobile site deployment: again, comes naturally to Magnolia thanks to its architecture and rich out of the box functionality.
Regards
- Boris
Update: CMIS is available as a community module since Magnolia v4.5
What is the best way to create a site search engine for a dynamic ASP.NET site with hundreds of dynamic pages? I have seen many products and articles:
http://www.karamasoft.com/UltimateSearch/overview.aspx
http://www.sitesearchasp.net
http://www.easysearchasp.net/
http://msdn.microsoft.com/en-us/magazine/cc163355.aspx
http://www.codeproject.com/KB/asp/indexserver.aspx
Priyan,
Another high-quality open-source option would be the .NET port of Lucene:
CodeProject - Introducing Lucene
dotlucene
lucene.net
You haven't mentioned Google's SiteSearch "product". Is one of your requirements that you'd like to host the search engine/catalog yourself?
Microsoft also has a product Search Server 2008 Express although I'm not sure if you can install it on any hosting provider.
And (disclaimer: I am the author) there is also a very basic open source project on CodeProject called Searcharoo (also at searcharoo.net). It is really meant as a 'demonstration/learning experience' - hence the six how to articles - but it might suffice for a small dynamic site.
I have used SQL Server Full Text Search for some projects - works well but it's really just searching database content, not a combination of static and dynamic Html/Pdf/Word/Jpg etc documents which a "real" web crawler will do.