How to search a database using Solr - Solr 5

Solr version: 5.0
I am working with Solr for the first time and do not fully understand it yet. Here is what I did:
I created a core named search.
My schema.xml file contains the following:
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="simple" version="1.5">
<types>
<fieldtype name='string' class='solr.StrField' />
<fieldtype name='long' class='solr.TrieLongField' />
</types>
<fields>
<field name='id' type='int' required='true' indexed="true"/>
<field name='name' type='text' required='true' indexed="true"/>
</fields>
<uniqueKey>id</uniqueKey>
<defaultSearchField>fullText</defaultSearchField>
<solrQueryParser defaultOperator='OR' />
</schema>
solrconfig.xml :
<?xml version='1.0' encoding='UTF-8' ?>
<config>
<luceneMatchVersion>5.0.0</luceneMatchVersion>
<lib dir="../../../../dist/" regex="solr-dataimporthandler-.*\.jar" />
<requestHandler name="standard" class="solr.StandardRequestHandler" default='true' />
<requestHandler name="/update" class="solr.UpdateRequestHandler" />
<requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">db-data-config.xml</str>
</lst>
</requestHandler>
<admin>
<defaultQuery>*:*</defaultQuery>
</admin>
</config>
db-data-config.xml :
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost:3306/solr"
user="root"
password="" />
<document>
<entity name="users" query="select id,name from users;" />
</document>
</dataConfig>
I created the database in phpMyAdmin.
When I run a query in the Solr admin panel, the result is empty. Why?
Can anyone help me with this? I am new to Solr search. What am I doing wrong?

I don't see a field named "fullText" in schema.xml, so why is it defined as the default search field?
<defaultSearchField>fullText</defaultSearchField>
Change it to:
<defaultSearchField>name</defaultSearchField>
Map the columns to fields in the data config XML:
<field column="ID" name="id" />
<field column="NAME" name="name" />
Your data-config should look like this:
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost:3306/solr"
user="root"
password="" />
<document>
<entity name="users" query="select id,name from users">
<field column="ID" name="id" />
<field column="NAME" name="name" />
</entity>
</document>
</dataConfig>
Add the following to schema.xml:
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
</types>
<fields>
<field name='id' type='int' required='true' indexed="true" stored="true"/>
<field name='name' type='string' required='true' indexed="true" stored="true"/>
</fields>
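Putting the pieces of this answer together, the whole schema.xml might look like the sketch below. It only combines the question's original file with the changes suggested above (the int and string types, stored fields, and name as the default search field); nothing else is assumed, and it has not been tested against a running Solr 5 core.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="simple" version="1.5">
  <types>
    <!-- Types referenced by the fields below; the original schema declared
         neither "int" nor "text", which the fields relied on. -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
  </types>
  <fields>
    <!-- stored="true" so the values come back in query results -->
    <field name="id" type="int" required="true" indexed="true" stored="true"/>
    <field name="name" type="string" required="true" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <!-- Points at a field that actually exists, instead of fullText -->
  <defaultSearchField>name</defaultSearchField>
  <solrQueryParser defaultOperator="OR"/>
</schema>
```

After changing the schema, the core has to be reloaded and the data imported again before a query returns documents.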

Make changes in your db-data-config.xml similar to what I have done:
<entity name="city_masters" pk="city_id" query="SELECT delete_status as
city_masters_delete_status,city_id,country_id,city_name,city_updated from
city_masters">
<field column="city_id" name="id"/>
<field column="city_name" name="city_name" indexed="true" stored="true" />
<field column="country_id" name="country_id" indexed="true" stored="true" />
<field column="city_masters_delete_status" name="city_masters_delete_status"
indexed="true" stored="true" />
</entity>
You missed the field column mappings. Add them as I have done in my code and it should work. If it still doesn't work, let me know.

Related

Apache Tika is indexing HTTP response instead of document content

I'm using Solr 8.3 and Tika to index WordPress (version 4.9.7) contents and attachments. The Solr and WordPress servers are on the same internal company network. Due to an organizational decision, I'm not using plugins such as WP-Solr and others (all of them good enough).
I wrote the data-config.xml and managed-schema files and uploaded them to ZooKeeper. The updated files show up in the Solr admin interface. I then created a new collection, called wp, and indexed some files (in the Solr admin interface, I set the range from 0 to 200).
When I query the contents, the meta fields are correctly indexed, but the conteudo_text and _text_ fields contain a 301 HTTP response (example below):
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":18,
"params":{
"q":"*:*",
"start":"0",
"rows":"1",
"_":"1580842143627"}},
"response":{"numFound":500,"start":0,"maxScore":1.0,"docs":[
{
"data_alteracao":"2019-09-06T11:05:10Z",
"conteudo":"Criação",
"titulo":"Criação",
"id":"37829",
"data_publicacao":"2019-09-06T11:04:55Z",
"url":"http://www.homolog.tjrs.jus.br/static/2019/09/estag-criacao.pdf",
"conteudo_text":["\nMoved Permanently\n\nThe document has moved here.\n\n\n\nApache Server at www.homolog.tjrs.jus.br Port 80\n\n"],
"_text_":["\nMoved Permanently\n\nThe document has moved here.\n\n\n\nApache Server at www.homolog.tjrs.jus.br Port 80\n\n"],
"_version_":1657631228775366656}]
}}
My data-config.xml:
<dataConfig>
<dataSource
type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://mysql-grid-homol.tjrs.gov.br:3306/wordpress"
user="usr"
password="pwd"
name="wpdb"
batchSize="-1"
readOnly="true"
/>
<dataSource
type="BinURLDataSource"
name="url_doc"
/>
<document name="docs">
<entity
dataSource="wpdb"
name="wp"
pk="ID"
query="
SELECT
post.Id ID,
post_title TITULO,
IF (post_content = '', post_title, post_content) CONTEUDO,
CONCAT
(
DATE_FORMAT(post.Post_date, '%Y-%m-%d'),
'T',
DATE_FORMAT(post.Post_date, '%H:%i:%s'),
'Z'
) DATA_PUBLICACAO,
CONCAT
(
DATE_FORMAT(post.Post_modified, '%Y-%m-%d'),
'T',
DATE_FORMAT(post.Post_modified, '%H:%i:%s'),
'Z'
)DATA_ALTERACAO,
CONCAT
(
'http:',
guid
) URL
FROM
wpw_posts post
LEFT JOIN wpw_postmeta postmeta
ON (postmeta.Post_id = post.Id AND postmeta.Meta_key = 'publico')
WHERE
post.Post_type IN ('page', 'noticia', 'evento', 'curso', 'sistema', 'classificado', 'discurso', 'attachment')
AND post.post_status = 'inherit'
AND post.post_mime_type like 'application%'
ORDER BY post.Post_date DESC
"
>
<field column="ID" name="id"/>
<field column="TITULO" name="titulo"/>
<field column="CONTEUDO" name="conteudo"/>
<field column="DATA_PUBLICACAO" name="data_publicacao" dateTimeFormat="DD/MM/YYYY'T'hh:mm:ss"/>
<field column="DATA_ALTERACAO" name="data_alteracao" dateTimeFormat="DD/MM/YYYY'T'hh:mm:ss"/>
<field column="URL" name="url"/>
<entity
name="arquivo"
dataSource="url_doc"
processor="TikaEntityProcessor"
url="${wp.URL}"
format="text"
onError="continue"
extractEmbedded="true"
>
<field column="text" name="conteudo_text" />
</entity>
</entity>
</document>
</dataConfig>
My managed-schema:
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="v2" version="1.6">
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<uniqueKey>id</uniqueKey>
<field name="titulo" type="string" indexed="true" stored="true" required="true" />
<field name="conteudo" type="string" indexed="true" stored="true" required="true" />
<field name="data_publicacao" type="date" indexed="true" stored="true" docValues="true"/>
<field name="data_alteracao" type="date" indexed="true" stored="true" docValues="true" />
<field name="url" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="conteudo_text" type="text" indexed="true" stored="true" required="true" multiValued="true" default=" "/>
<field name="text" type="sem_aspas" indexed="true" stored="true" required="true" multiValued="true"/>
<field name="_version_" type="long" indexed="false" stored="false" />
<field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
<field name="_text_" type="sem_aspas" indexed="true" stored="true" multiValued="true"/>
<!-- primitive types -->
<fieldType name="integer" class="solr.IntPointField" docValues="true"/>
<fieldType name="integers" class="solr.IntPointField" docValues="true" multiValued="true"/>
<fieldType name="long" class="solr.LongPointField" docValues="true"/>
<fieldType name="longs" class="solr.LongPointField" docValues="true" multiValued="true"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true" />
<fieldType name="strings" class="solr.StrField" sortMissingLast="true" docValues="true" multiValued="true"/>
<fieldType name="date" class="solr.DatePointField" docValues="true"/>
<fieldType name="dates" class="solr.DatePointField" docValues="true" multiValued="true"/>
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<fieldType name="booleans" class="solr.BoolField" sortMissingLast="true" multiValued="true"/>
<fieldType name="float" class="solr.FloatPointField" docValues="true" multiValued="false"/>
<fieldType name="floats" class="solr.FloatPointField" docValues="true" multiValued="true"/>
<fieldType name="double" class="solr.DoublePointField" docValues="true" multiValued="false"/>
<fieldType name="doubles" class="solr.DoublePointField" docValues="true" multiValued="true"/>
<fieldType name="binary" class="solr.BinaryField"/>
<copyField source="conteudo_text" dest="_text_" />
<fieldType name="sem_aspas" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.ClassicTokenizerFactory"/>
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" format="snowball" />
<filter class="solr.BrazilianStemFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.ClassicTokenizerFactory"/>
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" format="snowball" />
<filter class="solr.BrazilianStemFilterFactory"/>
<filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
</analyzer>
</fieldType>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100" multiValued="true">
<analyzer type="index">
<tokenizer class="solr.ClassicTokenizerFactory"/>
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.ClassicTokenizerFactory"/>
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
</schema>
Things I tried to solve the problem:
1) Change from BinURLDataSource to URLDataSource or FieldStreamDataSource;
2) Include, in the BinURLDataSource definition, a user and password with permission to access the files.
I'm a new user in Solr/Lucene and Tika technology (my 2nd project only), and any help is welcome.
Regards.
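
One thing that might be worth checking, offered as an assumption rather than a diagnosis: the indexed text is the body of a 301 redirect page, which suggests the fetch did not follow the redirect. Java's HTTP client does not follow redirects that switch protocol, so if the server redirects http to https (an assumption; the response above does not show where the redirect points), building the URL with the final scheme in the SQL query would avoid the redirect altogether. The sketch below shortens the query to the relevant columns and changes only the 'http:' prefix to a hypothetical 'https:'.

```xml
<!-- Sketch only: query shortened for readability; 'https:' is an assumed
     redirect target, not something confirmed by the question. -->
<entity dataSource="wpdb" name="wp" pk="ID"
        query="SELECT post.Id ID, CONCAT('https:', guid) URL
               FROM wpw_posts post
               WHERE post.post_status = 'inherit'
                 AND post.post_mime_type LIKE 'application%'">
  <field column="ID" name="id"/>
  <field column="URL" name="url"/>
  <!-- Inner Tika entity unchanged from the question -->
  <entity name="arquivo" dataSource="url_doc" processor="TikaEntityProcessor"
          url="${wp.URL}" format="text" onError="continue" extractEmbedded="true">
    <field column="text" name="conteudo_text"/>
  </entity>
</entity>
```

Requesting the headers of one of the stored URLs (for example with curl -I) shows the Location the server redirects to, which tells you what prefix the query should really produce.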

How to store customized data for Doctrine?

I work with Symfony2 and use Doctrine. As an example, I have a simple repository-class in my Doctrine XML file:
<?xml version="1.0" encoding="utf-8"?>
<doctrine-mapping xmlns="http://doctrine-project.org/schemas/orm/doctrine-mapping" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://doctrine-project.org/schemas/orm/doctrine-mapping http://doctrine-project.org/schemas/orm/doctrine-mapping.xsd">
<entity name="AllgemeinBundle\Entity\ObjektPosition" table="objekt_position" repository-class="AllgemeinBundle\Repository\ObjektPositionRepository">
<indexes>
<index name="id_objekt_subunternehmer_position_fk2_idx" columns="id_subunternehmer"/>
<index name="id_objekt_objektposition_idx" columns="id_objekt"/>
</indexes>
<id name="id" type="integer" column="id">
<generator strategy="IDENTITY"/>
</id>
<field name="artikelnummer" type="integer" column="artikelnummer" nullable="false"/>
<field name="preisProEinheit" type="float" column="preis_pro_einheit" precision="10" scale="2" nullable="false"/>
<field name="p1Einheit" type="float" column="p1_einheit" precision="10" scale="2" nullable="false"/>
<field name="p2Einheit" type="float" column="p2_einheit" precision="10" scale="2" nullable="true"/>
<field name="p3Einheit" type="float" column="p3_einheit" precision="10" scale="2" nullable="true"/>
<field name="zusatztext" type="text" column="zusatztext" nullable="true"/>
<field name="position" type="integer" column="position" nullable="false"/>
<many-to-one field="idSubunternehmer" target-entity="Subunternehmer">
<join-columns>
<join-column name="id_subunternehmer" referenced-column-name="subunternehmernummer"/>
</join-columns>
</many-to-one>
<many-to-one field="idObjekt" target-entity="Objekt">
<join-columns>
<join-column name="id_objekt" referenced-column-name="id"/>
</join-columns>
</many-to-one>
When I generate my entities from the database, the repository-class is deleted, along with some other things I added to the XML file.
Is it possible to save the customized data in a separate folder or file, so that I can regenerate the entities as often as I like without losing it?
If you are using console commands, don't pass '--no-backup' at the end of the command when generating entities. That way your existing entity class is preserved. However, the existing entity class file will be renamed with a '~' sign appended to the file name. Your entity generation command will look like this (without a trailing --no-backup):
php app/console doctrine:generate:entities
Hope this helps,
Cheers!

Configure Solr Index With File Metadata using TikaEntityProcessor & FieldStreamDataSource

I have created an index that pulls data from a SQL Server database using TikaEntityProcessor. The query in my configuration file reads from a table containing file information, with the file content stored in a binary column. The index returns all the fields I have configured from the database table, as well as the "text" column holding the body of the file content, so the file text is indexed correctly. However, the meta columns are not working: the text/body field works fine, but I cannot get any metadata from the file, such as the last modified date or the author.
Any suggestions would be greatly appreciated!!
data-config:
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
url="jdbc:sqlserver://server;databaseName=db1;integratedSecurity=false"
user="user"
password="XXXXXX" convertType="false"
name="ds"/>
<dataSource name="fieldReader"
type="FieldStreamDataSource" />
<document name="tika">
<entity name="tika" pk="id" transformer="TemplateTransformer" dataSource="ds"
query="select id, title from myDatabaseTable">
<entity name="tika-test" processor="TikaEntityProcessor" dataSource="fieldReader"
dataField="tika.FileContent" format="text">
<field column="text" name="body"/>
<field column="Last-Modified" name="lastModified" meta="true" /> <!-- not working -->
</entity>
</entity>
</document>
</dataConfig>
schema:
<field name="id" type="integer" indexed="true" stored="true" />
<field name="body" type="text" indexed="true" stored="true" />
<field name="lastModified" type="text" indexed="true" stored="true" />
<field name="title" type="text" indexed="true" stored="true" />
Thanks!!
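
A hedged sketch of one thing to try: with meta="true", the column attribute has to match a metadata key that Tika actually emits for the file, and those key names vary across Tika versions and document formats. The alternative keys below (dcterms:modified, meta:author) are assumptions to experiment with, not confirmed names for this setup, and the Solr field names modified and author are hypothetical; they would need matching entries in the schema.

```xml
<entity name="tika-test" processor="TikaEntityProcessor" dataSource="fieldReader"
        dataField="tika.FileContent" format="text">
  <field column="text" name="body"/>
  <!-- Key from the question -->
  <field column="Last-Modified" name="lastModified" meta="true"/>
  <!-- Assumed alternative keys; "modified" and "author" are hypothetical
       Solr fields that must also be declared in the schema -->
  <field column="dcterms:modified" name="modified" meta="true"/>
  <field column="meta:author" name="author" meta="true"/>
</entity>
```

Running Tika on one of the files outside Solr and listing the metadata it reports is a direct way to see which keys exist for that file type.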

Search results displayed without snippets (Solr, Drupal)

Hi,
I have a Drupal 7 site with the Apache Solr integration module and the Solr attachments module. I never got that notice before, and I am puzzled why the search snippets are not being displayed. I looked at the code mentioned in the notice, did a print_r($snippets), and found that the snippets were in the variable but just not displayed in the search results. What could be the reason for this?
solrconfig.xml
<requestHandler name="drupal" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="defType">dismax</str>
<str name="echoParams">explicit</str>
<bool name="omitHeader">true</bool>
<float name="tie">0.01</float>
<str name="pf">
content^2.0
</str>
<int name="ps">15</int>
<!-- Abort any searches longer than 4 seconds -->
<!-- <int name="timeAllowed">4000</int> -->
<str name="mm">1</str>
<str name="q.alt">*:*</str>
<!-- example highlighter config, enable per-query with hl=true -->
<str name="hl">true</str>
<str name="hl.fl">content</str>
<int name="hl.snippets">1</int>
<str name="hl.mergeContiguous">true</str>
<!-- instructs Solr to return the field itself if no query terms are
found -->
<str name="f.content.hl.alternateField">teaser</str>
<str name="f.content.hl.maxAlternateFieldLength">256</str>
<!-- JS: I wasn't getting good results here... I'm turning off for now
because I was getting periods (.) by themselves at the beginning of
snippets and don't feel like debugging anymore. Without the regex is
faster too -->
<!--<str name="f.content.hl.fragmenter">regex</str>--> <!-- defined below -->
<!-- By default, don't spell check -->
<str name="spellcheck">false</str>
<!-- Defaults for the spell checker when used -->
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">false</str>
<!-- The number of suggestions to return -->
<str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
This is the code from the notice which is in the apachesolr_attachments.module
function theme_apachesolr_search_snippets__file($vars) {
$doc = $vars['doc'];
$snippets = $vars['snippets'];
$parent_entity_links = array();
// Retrieve our parent entities. They have been saved as
// a small serialized entity
foreach ($doc->zm_parent_entity as $parent_entity_encoded) {
$parent_entity = (object) drupal_json_decode($parent_entity_encoded);
$parent_entity_uri = entity_uri($parent_entity->entity_type, $parent_entity);
$parent_entity_uri['options']['absolute'] = TRUE;
$parent_label = entity_label($parent_entity->entity_type, $parent_entity);
$parent_entity_links[] = l($parent_label, $parent_entity_uri['path'], $parent_entity_uri['options']);
}
if (module_exists('file')) {
$file_type = t('!icon #filemime', array('#filemime' => $doc->ss_filemime, '!icon' => theme('file_icon', array('file' => (object) array('filemime' => $doc->ss_filemime)))));
}
else {
$file_type = t('#filemime', array('#filemime' => $doc->ss_filemime));
}
//print_r($snippets);echo "\n";
return implode(' ... ', $snippets) . '<span>' . $file_type . ' <em>attached to:</em>' . implode(', ', $parent_entity_links) . '</span>';
}
schema.xml
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="string" stored="true" indexed="true"/>
<!-- entity_id is the numeric object ID, e.g. Node ID, File ID -->
<field name="entity_id" type="long" indexed="true" stored="true" />
<!-- entity_type is 'node', 'file', 'user', or some other Drupal object type -->
<field name="entity_type" type="string" indexed="true" stored="true" />
<!-- bundle is a node type, or as appropriate for other entity types -->
<field name="bundle" type="string" indexed="true" stored="true"/>
<field name="bundle_name" type="string" indexed="true" stored="true"/>
<field name="text" type="text" stored="true" indexed="true"/>
<field name="site" type="string" indexed="true" stored="true"/>
<field name="hash" type="string" indexed="true" stored="true"/>
<field name="url" type="string" indexed="true" stored="true"/>
<!-- label is the default field for a human-readable string for this entity (e.g. the title of a node) -->
<field name="label" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true"/>
<!-- The string version of the title is used for sorting -->
<copyField source="label" dest="sort_label"/>
<!-- content is the default field for full text search - dump crap here -->
<field name="content" type="text" indexed="true" stored="true" termVectors="true"/>
<field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="teaser" type="text" indexed="false" stored="true"/>
<field name="language" type="text_en" stored="true" indexed="true"/>
<field name="path" type="string" indexed="true" stored="true"/>
<field name="path_alias" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true"/>
<field name="created" type="string" indexed="true" stored="true" termVectors="true"/>
<field name="Question" type="text" indexed="true" stored="true" termVectors="true"/>
<field name="Response" type="text" indexed="true" stored="true" termVectors="true"/>
<field name="Module" type="text" indexed="true" stored="true" termVectors="true"/>
<field name="Meets" type="text" indexed="true" stored="true" termVectors="true"/>
<field name="cat" type="string" indexed="true" stored="true" termVectors="true"/>
Any suggestions to get rid of the notice as well as to display the snippets ?
I replaced return implode(' ... ', $snippets) with return implode(' ... ', $snippets['content']) and it worked for me.
I can't guarantee that this won't cause problems elsewhere.
Worth a try anyway.

Doctrine 2 unknown column type requested

I'm trying to update my doctrine schema with the command:
php app/console doctrine:schema:update --force
I'm getting this error:
[Doctrine\DBAL\DBALException]
Unknown column type requested.
I'm getting this error since I've updated the xml mapping of the user entity in the Sonata UserBundle like this:
<?xml version="1.0" encoding="UTF-8"?>
<doctrine-mapping xmlns="http://doctrine-project.org/schemas/orm/doctrine-mapping"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://doctrine-project.org/schemas/orm/doctrine-mapping
http://doctrine-project.org/schemas/orm/doctrine-mapping.xsd">
<entity name="Application\Sonata\UserBundle\Entity\User" table="fos_user_user" repository-class="Application\Sonata\UserBundle\Repository\UserRepository">
<id name="id" column="id" type="integer">
<generator strategy="AUTO" />
</id>
<field name="name" type="string" length="50" />
<field name="birthdate" type="date" />
<field name="natRanking" type="string" length="10" />
<field name="interNatRanking" type="string" length="10" nullable="true" />
<field name="natDoublesRanking" type="string" length="10" />
<field name="interNatDoublesRanking" type="string" length="10" nullable="true" />
<field name="doublesPartner" type="string" length="50" nullable="true" />
<field name="nationality" type="string" length="50" />
<field name="fileName" type="string" length="255" nullable="true" />
<field name="path" type="string" length="255" nullable="true" />
<field name="file" />
<many-to-many field="teams" target-entity="Tennisconnect\DashboardBundle\Entity\Team" mapped-by="players">
<join-table name="team_user">
<join-columns>
<join-column name="team_id" referenced-column-name="id"/>
</join-columns>
<inverse-join-columns>
<join-column name="user_id" referenced-column-name="id"/>
</inverse-join-columns>
</join-table>
</many-to-many>
<one-to-many field="my_friends" target-entity="Friend" mapped-by="friends_of_mine" />
<one-to-many field="friended_me" target-entity="Friend" mapped-by="friends_with_me" />
</entity>
Is the type of the field "file" missing?
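
A minimal sketch, assuming the untyped file field is what triggers the error: file is the only <field> element in the mapping without a type attribute. If file holds an uploaded file object rather than a database column (an assumption; the entity class is not shown), it may not need to be mapped at all and the element could simply be removed. Otherwise, giving it an explicit type would look like this, where string, the length, and nullable are illustrative choices rather than values from the question:

```xml
<!-- Assumption: "file" is meant to be a persisted column.
     The type, length and nullable values are illustrative. -->
<field name="file" type="string" length="255" nullable="true" />
```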

Resources