I have created an index that pulls data from a SQL Server database using TikaEntityProcessor. The query in my configuration file reads from a table that holds file information along with the file content in a binary column. The index returns every database column I have configured, plus the "text" column that holds the body of the file, so the file text is indexed correctly. The metadata columns, however, are not working: the text/body field below is fine, but I cannot get any metadata from the file, such as the last-modified date or the author.
Any suggestions would be greatly appreciated!
data-config:
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
url="jdbc:sqlserver://server;databaseName=db1;integratedSecurity=false"
user="user"
password="XXXXXX" convertType="false"
name="ds"/>
<dataSource name="fieldReader"
type="FieldStreamDataSource" />
<document name="tika">
<entity name="tika" pk="id" transformer="TemplateTransformer" dataSource="ds"
query="select id, title from myDatabaseTable">
<entity name="tika-test" processor="TikaEntityProcessor" dataSource="fieldReader"
dataField="tika.FileContent" format="text">
<field column="text" name="body"/>
<field column="Last-Modified" name="lastModified" meta="true" /> <!-- not working -->
</entity>
</entity>
</document>
</dataConfig>
schema:
<field name="id" type="integer" indexed="true" stored="true" />
<field name="body" type="text" indexed="true" stored="true" />
<field name="lastModified" type="text" indexed="true" stored="true" />
<field name="title" type="text" indexed="true" stored="true" />
Thanks!!
Solr Version : 5.0
I am working with Solr for the first time and do not fully understand it yet. Here is what I did:
I created a core named search.
My schema.xml file contains the following:
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="simple" version="1.5">
<types>
<fieldtype name='string' class='solr.StrField' />
<fieldtype name='long' class='solr.TrieLongField' />
</types>
<fields>
<field name='id' type='int' required='true' indexed="true"/>
<field name='name' type='text' required='true' indexed="true"/>
</fields>
<uniqueKey>id</uniqueKey>
<defaultSearchField>fullText</defaultSearchField>
<solrQueryParser defaultOperator='OR' />
</schema>
solrconfig.xml :
<?xml version='1.0' encoding='UTF-8' ?>
<config>
<luceneMatchVersion>5.0.0</luceneMatchVersion>
<lib dir="../../../../dist/" regex="solr-dataimporthandler-.*\.jar" />
<requestHandler name="standard" class="solr.StandardRequestHandler" default='true' />
<requestHandler name="/update" class="solr.UpdateRequestHandler" />
<requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">db-data-config.xml</str>
</lst>
</requestHandler>
<admin>
<defaultQuery>*:*</defaultQuery>
</admin>
</config>
db-data-config.xml :
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost:3306/solr"
user="root"
password="" />
<document>
<entity name="users" query="select id,name from users;" />
</document>
</dataConfig>
I have created the database in phpMyAdmin.
When I run a query from the Solr admin panel, the result is empty. Why?
Can anyone help me with this? I am new to Solr search. What am I doing wrong?
I don't see a field named "fullText" in schema.xml, so why is it defined as the default search field?
<defaultSearchField>fullText</defaultSearchField>
Change it to:
<defaultSearchField>name</defaultSearchField>
List the fields in the data-config XML:
<field column="ID" name="id" />
<field column="NAME" name="name" />
Your data-config should look like this:
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost:3306/solr"
user="root"
password="" />
<document>
<entity name="users" query="select id,name from users">
<field column="ID" name="id" />
<field column="NAME" name="name" />
</entity>
</document>
</dataConfig>
Add the following to schema.xml:
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
</types>
<fields>
<field name='id' type='int' required='true' indexed="true" stored="true"/>
<field name='name' type='string' required='true' indexed="true" stored="true"/>
</fields>
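The schema in the question refers to a type called text that is never declared, and the snippet above replaces it with string, which only matches exact values. If tokenized, case-insensitive matching on name is wanted instead, a text type could be declared along these lines. This is only a minimal sketch, not the asker's or the answerer's configuration: the type name text_general is an assumption (borrowed from the naming used in Solr's sample schemas), and the analyzer chain is kept deliberately small.

```xml
<types>
  <!-- Hypothetical type name; splits text into words and lowercases each term -->
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
</types>
<fields>
  <!-- Same field as in the answer, switched from string to the tokenized type -->
  <field name="name" type="text_general" required="true" indexed="true" stored="true"/>
</fields>
```

After changing a field's type, the core has to be reloaded and the data imported again before queries reflect the new analysis.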
Make changes in your db-data-config.xml similar to what I have done:
<entity name="city_masters" pk="city_id" query="SELECT delete_status as
city_masters_delete_status,city_id,country_id,city_name,city_updated from
city_masters">
<field column="city_id" name="id"/>
<field column="city_name" name="city_name" indexed="true" stored="true" />
<field column="country_id" name="country_id" indexed="true" stored="true" />
<field column="city_masters_delete_status" name="city_masters_delete_status"
indexed="true" stored="true" />
</entity>
You left out the field column mappings. Add them as I have done in my config and it should work. If it still doesn't work, let me know.
I am starting to model the database entities in Symfony2.
I have two entities: User and Group.
I want to connect them. I think I should use a many-to-many relation.
My main problem is how to get a list of all groups together with the information whether the user has joined each group.
Group Entity:
<?xml version="1.0" encoding="utf-8"?>
<doctrine-mapping xmlns="http://doctrine-project.org/schemas/orm/doctrine-mapping" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://doctrine-project.org/schemas/orm/doctrine-mapping http://doctrine-project.org/schemas/orm/doctrine-mapping.xsd">
<entity name="GroupBundle\Entity\Group">
<id name="id" type="integer" column="id">
<generator strategy="AUTO"/>
</id>
<field name="name" type="string" column="name" length="50"/>
<field name="description" type="string" column="description"/>
<many-to-many field="users" mapped-by="groups" target-entity="AccountBundle\Entity\User"/>
</entity>
</doctrine-mapping>
and User Entity:
<?xml version="1.0" encoding="utf-8"?>
<doctrine-mapping xmlns="http://doctrine-project.org/schemas/orm/doctrine-mapping" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://doctrine-project.org/schemas/orm/doctrine-mapping http://doctrine-project.org/schemas/orm/doctrine-mapping.xsd">
<entity name="AccountBundle\Entity\User">
<id name="id" type="integer" column="id">
<generator strategy="AUTO"/>
</id>
<field name="username" type="string" column="username" length="255"/>
<field name="password" type="string" column="password" length="255"/>
<field name="salt" type="string" column="salt" length="255"/>
<field name="email" type="string" column="email" length="255"/>
<field name="active" type="boolean" column="active"></field>
<field name="token" type="string" column="token" length="255"/>
<field name="lastLoginTime" type="datetime" column="lastLoginTime" nullable="true"/>
<field name="registerTime" type="datetime" column="registerTime"/>
<one-to-many field="events" target-entity="CoreBundle\Entity\Event" mapped-by="user" />
<many-to-many field="groups" inversed-by="users" target-entity="GroupBundle\Entity\Group">
<join-table name="UserGroups">
<join-columns>
<join-column name="userId" referenced-column-name="id" />
</join-columns>
<inverse-join-columns>
<join-column name="groupId" referenced-column-name="id" />
</inverse-join-columns>
</join-table>
</many-to-many>
</entity>
</doctrine-mapping>
There is no relation between them yet because I don't know what is best.
Maybe I must create an additional entity in between, like UserGroup?
@D4V1D, you are right. The assumption is: a user can join many groups and a group can have many users.
OK, I have added the many-to-many relations. I hope I did it right.
Now, this is a fragment of my controller:
$em = $this->getDoctrine()->getManager();
$repo = $em->getRepository("GroupBundle\Entity\Group");
$groups = $repo->findAll();
$user = $this->getUser();
$user_groups = $user->getGroups();
foreach($user_groups as $user_group){
/** @var Group $group */
foreach($groups as $group){
if($group->getId() == $user_group->getId()){
$group->setUserInGroup(true); // I extended the entity class with this extra setter and a matching getter.
}
}
}
return $this->render('GroupBundle:showAll.html.twig', array('groups' => $groups));
and view:
{% for group in groups %}
<div>
{{ group.name }}
{% if(group.getUserInGroup()) %}
join
{% endif %}
</div>
{% endfor %}
I am trying to find the best and correct way to do this.
Without a relation between User and Group, how would you link a group to a user? Which field would hold this information? So yes, you need a relation.
Group entity :
<many-to-many field="users" target-entity="AccountBundle\Entity\User" mapped-by="groups" />
User entity :
<many-to-many field="groups" target-entity="GroupBundle\Entity\Group" inversed-by="users"/>
I work with Symfony2 and use Doctrine there. As an example, I have a simple repository class configured in my Doctrine XML file:
<?xml version="1.0" encoding="utf-8"?>
<doctrine-mapping xmlns="http://doctrine-project.org/schemas/orm/doctrine-mapping" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://doctrine-project.org/schemas/orm/doctrine-mapping http://doctrine-project.org/schemas/orm/doctrine-mapping.xsd">
<entity name="AllgemeinBundle\Entity\ObjektPosition" table="objekt_position" repository-class="AllgemeinBundle\Repository\ObjektPositionRepository">
<indexes>
<index name="id_objekt_subunternehmer_position_fk2_idx" columns="id_subunternehmer"/>
<index name="id_objekt_objektposition_idx" columns="id_objekt"/>
</indexes>
<id name="id" type="integer" column="id">
<generator strategy="IDENTITY"/>
</id>
<field name="artikelnummer" type="integer" column="artikelnummer" nullable="false"/>
<field name="preisProEinheit" type="float" column="preis_pro_einheit" precision="10" scale="2" nullable="false"/>
<field name="p1Einheit" type="float" column="p1_einheit" precision="10" scale="2" nullable="false"/>
<field name="p2Einheit" type="float" column="p2_einheit" precision="10" scale="2" nullable="true"/>
<field name="p3Einheit" type="float" column="p3_einheit" precision="10" scale="2" nullable="true"/>
<field name="zusatztext" type="text" column="zusatztext" nullable="true"/>
<field name="position" type="integer" column="position" nullable="false"/>
<many-to-one field="idSubunternehmer" target-entity="Subunternehmer">
<join-columns>
<join-column name="id_subunternehmer" referenced-column-name="subunternehmernummer"/>
</join-columns>
</many-to-one>
<many-to-one field="idObjekt" target-entity="Objekt">
<join-columns>
<join-column name="id_objekt" referenced-column-name="id"/>
</join-columns>
</many-to-one>
</entity>
</doctrine-mapping>
When I generate my entities from the database, the repository-class attribute is deleted, along with some other things I added to the XML file.
Is it possible to keep the customized data in a separate folder or file, so that I can regenerate the entities as often as I like without losing it?
If you are using console commands, don't pass '--no-backup' when generating entities. That way your existing entity class is preserved: it is kept as a backup copy with a '~' appended to the file name. The entity generation command then looks like this (without a trailing --no-backup):
php app/console doctrine:generate:entities
Hope this helps,
Cheers!
Hi,
I have Drupal 7 with the Apache Solr integration module and the Solr attachments module. I never got that notice before, and I am puzzled why the search snippets are not displayed. I looked at the code mentioned in the notice, ran print_r($snippets), and found that the snippets were in the variable but just not displayed in the search results. What could be the reason for this?
solrconfig.xml
<requestHandler name="drupal" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="defType">dismax</str>
<str name="echoParams">explicit</str>
<bool name="omitHeader">true</bool>
<float name="tie">0.01</float>
<str name="pf">
content^2.0
</str>
<int name="ps">15</int>
<!-- Abort any searches longer than 4 seconds -->
<!-- <int name="timeAllowed">4000</int> -->
<str name="mm">1</str>
<str name="q.alt">*:*</str>
<!-- example highlighter config, enable per-query with hl=true -->
<str name="hl">true</str>
<str name="hl.fl">content</str>
<int name="hl.snippets">1</int>
<str name="hl.mergeContiguous">true</str>
<!-- instructs Solr to return the field itself if no query terms are
found -->
<str name="f.content.hl.alternateField">teaser</str>
<str name="f.content.hl.maxAlternateFieldLength">256</str>
<!-- JS: I wasn't getting good results here... I'm turning off for now
because I was getting periods (.) by themselves at the beginning of
snippets and don't feel like debugging anymore. Without the regex is
faster too -->
<!--<str name="f.content.hl.fragmenter">regex</str>--> <!-- defined below -->
<!-- By default, don't spell check -->
<str name="spellcheck">false</str>
<!-- Defaults for the spell checker when used -->
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">false</str>
<!-- The number of suggestions to return -->
<str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
This is the code from the notice, which is in apachesolr_attachments.module:
function theme_apachesolr_search_snippets__file($vars) {
$doc = $vars['doc'];
$snippets = $vars['snippets'];
$parent_entity_links = array();
// Retrieve our parent entities. They have been saved as
// a small serialized entity
foreach ($doc->zm_parent_entity as $parent_entity_encoded) {
$parent_entity = (object) drupal_json_decode($parent_entity_encoded);
$parent_entity_uri = entity_uri($parent_entity->entity_type, $parent_entity);
$parent_entity_uri['options']['absolute'] = TRUE;
$parent_label = entity_label($parent_entity->entity_type, $parent_entity);
$parent_entity_links[] = l($parent_label, $parent_entity_uri['path'], $parent_entity_uri['options']);
}
if (module_exists('file')) {
$file_type = t('!icon @filemime', array('@filemime' => $doc->ss_filemime, '!icon' => theme('file_icon', array('file' => (object) array('filemime' => $doc->ss_filemime)))));
}
else {
$file_type = t('@filemime', array('@filemime' => $doc->ss_filemime));
}
//print_r($snippets);echo "\n";
return implode(' ... ', $snippets) . '<span>' . $file_type . ' <em>attached to:</em>' . implode(', ', $parent_entity_links) . '</span>';
}
schema.xml
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="string" stored="true" indexed="true"/>
<!-- entity_id is the numeric object ID, e.g. Node ID, File ID -->
<field name="entity_id" type="long" indexed="true" stored="true" />
<!-- entity_type is 'node', 'file', 'user', or some other Drupal object type -->
<field name="entity_type" type="string" indexed="true" stored="true" />
<!-- bundle is a node type, or as appropriate for other entity types -->
<field name="bundle" type="string" indexed="true" stored="true"/>
<field name="bundle_name" type="string" indexed="true" stored="true"/>
<field name="text" type="text" stored="true" indexed="true"/>
<field name="site" type="string" indexed="true" stored="true"/>
<field name="hash" type="string" indexed="true" stored="true"/>
<field name="url" type="string" indexed="true" stored="true"/>
<!-- label is the default field for a human-readable string for this entity (e.g. the title of a node) -->
<field name="label" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true"/>
<!-- The string version of the title is used for sorting -->
<copyField source="label" dest="sort_label"/>
<!-- content is the default field for full text search - dump crap here -->
<field name="content" type="text" indexed="true" stored="true" termVectors="true"/>
<field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="teaser" type="text" indexed="false" stored="true"/>
<field name="language" type="text_en" stored="true" indexed="true"/>
<field name="path" type="string" indexed="true" stored="true"/>
<field name="path_alias" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true"/>
<field name="created" type="string" indexed="true" stored="true" termVectors="true"/>
<field name="Question" type="text" indexed="true" stored="true" termVectors="true"/>
<field name="Response" type="text" indexed="true" stored="true" termVectors="true"/>
<field name="Module" type="text" indexed="true" stored="true" termVectors="true"/>
<field name="Meets" type="text" indexed="true" stored="true" termVectors="true"/>
<field name="cat" type="string" indexed="true" stored="true" termVectors="true"/>
Any suggestions for getting rid of the notice and displaying the snippets?
I replaced return implode(' ... ', $snippets) with return implode(' ... ', $snippets['content']) and it worked for me.
I can't guarantee that no problems will arise, but it is worth a try.
Indexing failed on one of the following entity ids: node/2
"400" Status: ERROR: [doc=l8febs/node/2] unknown field 'language': ERROR: [doc=l8febs/node/2] unknown field 'language'
Error 400 ERROR: [doc=l8febs/node/2] unknown field 'language'
HTTP ERROR 400
Problem accessing /solr/update. Reason:
ERROR: [doc=l8febs/node/2] unknown field 'language'
Powered by Jetty://
I am wondering what the problem is. My schema.xml file does not have a field named language; would adding one solve the problem? I have never needed it before.
schema.xml
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="string" stored="true" indexed="true"/>
<!-- entity_id is the numeric object ID, e.g. Node ID, File ID -->
<field name="entity_id" type="long" indexed="true" stored="true" />
<!-- entity_type is 'node', 'file', 'user', or some other Drupal object type -->
<field name="entity_type" type="string" indexed="true" stored="true" required="true" />
<!-- bundle is a node type, or as appropriate for other entity types -->
<field name="bundle" type="string" indexed="true" stored="true"/>
<field name="bundle_name" type="string" indexed="true" stored="true"/>
<field name="text" type="text" stored="true" indexed="true"/>
<field name="site" type="string" indexed="true" stored="true"/>
<field name="hash" type="string" indexed="true" stored="true"/>
<field name="url" type="string" indexed="true" stored="true"/>
<!-- label is the default field for a human-readable string for this entity (e.g. the title of a node) -->
<field name="label" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true"/>
<!-- The string version of the title is used for sorting -->
<copyField source="label" dest="sort_label"/>
<!-- content is the default field for full text search - dump crap here -->
<field name="content" type="text" indexed="true" stored="true" termVectors="true"/>
<field name="teaser" type="text" indexed="false" stored="true"/>
<field name="language" type="text_en" stored="true" indexed="true"/>
<field name="path" type="string" indexed="true" stored="true"/>
<field name="path_alias" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true"/>
I am using Apache Solr 3.2, and the schema.xml is from the Apache Solr integration module for Drupal 7.
solrconfig.xml is:
<requestHandler name="dismax" class="solr.SearchHandler">
<lst name="defaults">
<str name="defType">dismax</str>
<str name="echoParams">explicit</str>
<bool name="omitHeader">true</bool>
</lst>
</requestHandler>
<!-- Note how you can register the same handler multiple times with
different names (and different init parameters)
-->
<requestHandler name="drupal" class="solr.SearchHandler">
<lst name="defaults">
<str name="defType">dismax</str>
<str name="echoParams">explicit</str>
<bool name="omitHeader">true</bool>
<float name="tie">0.01</float>
<str name="pf">
content^2.0
</str>
<int name="ps">15</int>
<!-- Abort any searches longer than 4 seconds -->
<!-- <int name="timeAllowed">4000</int> -->
<str name="mm">1</str>
<str name="q.alt">*:*</str>
<!-- example highlighter config, enable per-query with hl=true -->
<str name="hl">true</str>
<str name="hl.fl">content</str>
<int name="hl.snippets">3</int>
<str name="hl.mergeContiguous">true</str>
<!-- instructs Solr to return the field itself if no query terms are
found -->
<str name="f.content.hl.alternateField">teaser</str>
<str name="f.content.hl.maxAlternateFieldLength">256</str>
<!-- JS: I wasn't getting good results here... I'm turning off for now
because I was getting periods (.) by themselves at the beginning of
snippets and don't feel like debugging anymore. Without the regex is
faster too -->
<!--<str name="f.content.hl.fragmenter">regex</str>--> <!-- defined below -->
<!-- By default, don't spell check -->
<str name="spellcheck">false</str>
<!-- Defaults for the spell checker when used -->
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.extendedResults">false</str>
<!-- The number of suggestions to return -->
<str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
If you don't want these fields, you can simply ignore them so that they don't cause errors, e.g.:
Define an ignored field type:
<fieldType name="ignored" stored="false" indexed="false" class="solr.StrField" />
And define a dynamic field whose name is a pattern (a glob with a leading or trailing *, not a full regex):
<dynamicField name="pattern that captures all unwanted fields" type="ignored"/>
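As a concrete illustration of the two definitions above: as far as I know, dynamicField names are matched as simple globs with a single leading or trailing *, so a catch-all is written as *. The sketch below follows the ignored type found in Solr's sample schemas. Whether a catch-all is acceptable for this Drupal setup is an assumption, because it silently discards every field that is not declared elsewhere, including misspelled ones.

```xml
<!-- Field type that neither indexes nor stores its values -->
<fieldType name="ignored" class="solr.StrField" indexed="false" stored="false" multiValued="true"/>
<!-- Catch-all: any field not declared explicitly is accepted and discarded -->
<dynamicField name="*" type="ignored" multiValued="true"/>
```

To ignore only the field named in the error message, a narrower pattern such as lang* (an illustrative choice) would be the safer option.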