Can't remove punctuation in Solr - drupal

I have a Solr install that queries content on a Drupal site. Many of the title fields start with punctuation, so when I sort by title those entries appear at the top of the list.
I would like Solr to ignore the leading punctuation when sorting by title, but none of the solutions I have tried work.
I am fairly new to Solr, so it may be something really simple that I am doing wrong... I don't really understand much of what is going on in the schema.xml file!
The title field is called label in Solr, and I have tried various patterns with solr.PatternReplaceFilterFactory, none of which work.
<field name="label" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true"/>
<copyField source="label" dest="sort_label"/>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory"
pattern="(^\p{Punct}+)" replacement="" replace="all"
/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords.txt"
enablePositionIncrements="true"
/>
<filter class="solr.WordDelimiterFilterFactory"
protected="protwords.txt"
generateWordParts="1"
generateNumberParts="1"
catenateWords="1"
catenateNumbers="1"
catenateAll="0"
splitOnCaseChange="0"
preserveOriginal="1"/>
<filter class="solr.LengthFilterFactory" min="2" max="100" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
…
</analyzer>
My query is
start=0&rows=25&q=education&fl=id%2Centity_id%2Centity_type%2Cbundle%2Cbundle_name%2Csort_label%2Css_language%2Cis_comment_count%2Cds_created%2Cds_changed%2Cscore%2Cpath%2Curl%2Cis_uid%2Ctos_name%2Czm_parent_entity%2Css_filemime%2Css_file_entity_title%2Css_file_entity_url&pf=content%5E2.0&&sort=sort_label%20asc

This is done with the WordDelimiterFilterFactory. Set generateWordParts="1" and add this filter to your field type's analyzer, as in the definition below.
After modifying schema.xml, restart the server and re-index the data.
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords.txt"
enablePositionIncrements="true"
/>
<filter class="solr.WordDelimiterFilterFactory"
protected="protwords.txt"
generateWordParts="1"
generateNumberParts="1"
catenateWords="1"
catenateNumbers="1"
catenateAll="0"
splitOnCaseChange="0"
preserveOriginal="1"/>
<filter class="solr.LengthFilterFactory" min="2" max="100" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
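To make the intended ordering concrete, here is a small sketch outside Solr, in Python, of a sort key that drops leading punctuation before titles are compared. It is only an illustration of the goal, not Solr configuration: the sample titles and the `sort_key` helper are made up, and `string.punctuation` is used as a stand-in for the `\p{Punct}` class in the question's pattern.

```python
import re
import string

# Stand-in for the pattern "^\p{Punct}+" from the question:
# one or more ASCII punctuation characters at the start of the string.
LEADING_PUNCT = re.compile(r"^[" + re.escape(string.punctuation) + r"]+")


def sort_key(title: str) -> str:
    """Return the title lowercased, with leading punctuation removed."""
    return LEADING_PUNCT.sub("", title.strip()).lower()


# Hypothetical titles, some of which start with punctuation.
titles = ['"Education" in focus', "Adult learning", "(Re)training", "...Basics"]
ordered = sorted(titles, key=sort_key)
print(ordered)
```

With this key the quoted and parenthesised titles sort by their first letter instead of clustering at the top, which is the behaviour the sort_label field is meant to have.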


Creating a custom "cross section" line (svg) for Recharts

I am currently using Recharts for React.
I am trying to recreate this line style:
This is what I've been able to achieve so far. What I am missing is for the middle line to be dashed, and maybe for the start and end of the line to taper off.
I am pretty new to the SVG world, so I am not sure how to proceed.
I've used a filter to create the current line.
Here is my current code:
<defs>
<filter id="crossSection">
<feMorphology in="SourceGraphic" result="a" operator="dilate" radius="3" />
<feMorphology in="SourceGraphic" result="b" operator="dilate" radius="1.8" />
<feComposite in="SourceGraphic" in2="a" result="aa" operator="xor" />
<feComposite in="aa" in2="b" operator="xor" />
</filter>
</defs>
<Line
strokeWidth={2}
yAxisId="section"
activeDot={{ r: 8 }}
dot={false}
type="monotone"
dataKey="section"
stroke="black"
filter="url(#crossSection)"
/>
Any insight or help would be greatly appreciated!

Apache Tika is indexing HTTP response instead of document content

I'm using Solr 8.3 and Tika to index Wordpress (version 4.9.7) contents and attachments. Solr and Wordpress servers are in the same internal network in the company. Due to an organizational decision, I'm not using plugins such as WP-Solr and others (all of them good enough).
I wrote data-config.xml and managed-schema files, and uploaded them to Zookeeper. These files are updated in Solr admin interface. So I created a new collection, called wp, and indexed some files (in Solr admin interface, I set the range from 0 to 200).
So, when I query the contents, the meta fields are indexed correctly, but the conteudo_text and _text_ fields contain the body of a 301 HTTP response (example below):
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":18,
"params":{
"q":"*:*",
"start":"0",
"rows":"1",
"_":"1580842143627"}},
"response":{"numFound":500,"start":0,"maxScore":1.0,"docs":[
{
"data_alteracao":"2019-09-06T11:05:10Z",
"conteudo":"Criação",
"titulo":"Criação",
"id":"37829",
"data_publicacao":"2019-09-06T11:04:55Z",
"url":"http://www.homolog.tjrs.jus.br/static/2019/09/estag-criacao.pdf",
"conteudo_text":["\nMoved Permanently\n\nThe document has moved here.\n\n\n\nApache Server at www.homolog.tjrs.jus.br Port 80\n\n"],
"_text_":["\nMoved Permanently\n\nThe document has moved here.\n\n\n\nApache Server at www.homolog.tjrs.jus.br Port 80\n\n"],
"_version_":1657631228775366656}]
}}
My data-config.xml:
<dataConfig>
<dataSource
type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://mysql-grid-homol.tjrs.gov.br:3306/wordpress"
user="usr"
password="pwd"
name="wpdb"
batchSize="-1"
readOnly="true"
/>
<dataSource
type="BinURLDataSource"
name="url_doc"
/>
<document name="docs">
<entity
dataSource="wpdb"
name="wp"
pk="ID"
query="
SELECT
post.Id ID,
post_title TITULO,
IF (post_content = '', post_title, post_content) CONTEUDO,
CONCAT
(
DATE_FORMAT(post.Post_date, '%Y-%m-%d'),
'T',
DATE_FORMAT(post.Post_date, '%H:%i:%s'),
'Z'
) DATA_PUBLICACAO,
CONCAT
(
DATE_FORMAT(post.Post_modified, '%Y-%m-%d'),
'T',
DATE_FORMAT(post.Post_modified, '%H:%i:%s'),
'Z'
)DATA_ALTERACAO,
CONCAT
(
'http:',
guid
) URL
FROM
wpw_posts post
LEFT JOIN wpw_postmeta postmeta
ON (postmeta.Post_id = post.Id AND postmeta.Meta_key = 'publico')
WHERE
post.Post_type IN ('page', 'noticia', 'evento', 'curso', 'sistema', 'classificado', 'discurso', 'attachment')
AND post.post_status = 'inherit'
AND post.post_mime_type like 'application%'
ORDER BY post.Post_date DESC
"
>
<field column="ID" name="id"/>
<field column="TITULO" name="titulo"/>
<field column="CONTEUDO" name="conteudo"/>
<field column="DATA_PUBLICACAO" name="data_publicacao" dateTimeFormat="DD/MM/YYYY'T'hh:mm:ss"/>
<field column="DATA_ALTERACAO" name="data_alteracao" dateTimeFormat="DD/MM/YYYY'T'hh:mm:ss"/>
<field column="URL" name="url"/>
<entity
name="arquivo"
dataSource="url_doc"
processor="TikaEntityProcessor"
url="${wp.URL}"
format="text"
onError="continue"
extractEmbedded="true"
>
<field column="text" name="conteudo_text" />
</entity>
</entity>
</document>
</dataConfig>
My managed-schema:
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="v2" version="1.6">
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<uniqueKey>id</uniqueKey>
<field name="titulo" type="string" indexed="true" stored="true" required="true" />
<field name="conteudo" type="string" indexed="true" stored="true" required="true" />
<field name="data_publicacao" type="date" indexed="true" stored="true" docValues="true"/>
<field name="data_alteracao" type="date" indexed="true" stored="true" docValues="true" />
<field name="url" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="conteudo_text" type="text" indexed="true" stored="true" required="true" multiValued="true" default=" "/>
<field name="text" type="sem_aspas" indexed="true" stored="true" required="true" multiValued="true"/>
<field name="_version_" type="long" indexed="false" stored="false" />
<field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
<field name="_text_" type="sem_aspas" indexed="true" stored="true" multiValued="true"/>
<!-- primitive types -->
<fieldType name="integer" class="solr.IntPointField" docValues="true"/>
<fieldType name="integers" class="solr.IntPointField" docValues="true" multiValued="true"/>
<fieldType name="long" class="solr.LongPointField" docValues="true"/>
<fieldType name="longs" class="solr.LongPointField" docValues="true" multiValued="true"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true" />
<fieldType name="strings" class="solr.StrField" sortMissingLast="true" docValues="true" multiValued="true"/>
<fieldType name="date" class="solr.DatePointField" docValues="true"/>
<fieldType name="dates" class="solr.DatePointField" docValues="true" multiValued="true"/>
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<fieldType name="booleans" class="solr.BoolField" sortMissingLast="true" multiValued="true"/>
<fieldType name="float" class="solr.FloatPointField" docValues="true" multiValued="false"/>
<fieldType name="floats" class="solr.FloatPointField" docValues="true" multiValued="true"/>
<fieldType name="double" class="solr.DoublePointField" docValues="true" multiValued="false"/>
<fieldType name="doubles" class="solr.DoublePointField" docValues="true" multiValued="true"/>
<fieldType name="binary" class="solr.BinaryField"/>
<copyField source="conteudo_text" dest="_text_" />
<fieldType name="sem_aspas" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.ClassicTokenizerFactory"/>
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" format="snowball" />
<filter class="solr.BrazilianStemFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.ClassicTokenizerFactory"/>
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" format="snowball" />
<filter class="solr.BrazilianStemFilterFactory"/>
<filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
</analyzer>
</fieldType>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100" multiValued="true">
<analyzer type="index">
<tokenizer class="solr.ClassicTokenizerFactory"/>
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.ClassicTokenizerFactory"/>
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
</schema>
Things I tried to solve the problem:
1) Change from BinURLDataSource to URLDataSource or FieldStreamDataSource;
2) Include, in the BinURLDataSource definition, a user and password with permission to access the files.
I'm a new user in Solr/Lucene and Tika technology (my 2nd project only), and any help is welcome.
Regards.
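One way to confirm that the 301 comes from the file URL itself, rather than from Tika, is to request the URL without following redirects and look at the status code and the `Location` header. The sketch below is only a diagnostic aid, not a fix. The `probe` helper and the `_NoRedirect` class are names made up for illustration, and it assumes the URL is reachable from the machine where the script runs.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow any 3xx response."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib surface the 3xx as an HTTPError.
        return None


def probe(url: str, timeout: float = 10.0):
    """Return (status_code, location) for url without following redirects.

    location is None unless the server answered with a redirect.
    """
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, None
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")


# Example call (not executed here), using the URL from the response above:
# probe("http://www.homolog.tjrs.jus.br/static/2019/09/estag-criacao.pdf")
```

If the call reports a 301 with a `Location` header, the URL built in the SQL query points at an address that the web server redirects elsewhere.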

Drupal 8 - Solr 6.6.x Does not split fulltext field type values into multiple words in spell

I'm trying to set up a Solr 6.6.x schema.xml to work with Drupal 8 (using Search API Solr). The main problem is that indexed values are not split into multiple words in spell (for example, "Web developer" should be split into "Web" and "Developer"). See my schema.xml below. Any tips?
Screenshot of indexed examples in Solr UI
Note: all my fields use this config:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<!-- in this example, we will only use synonyms at query time
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
-->
<!-- Case insensitive stop word removal. -->
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords.txt"
/>
<!--<filter class="solr.WordDelimiterGraphFilterFactory" />-->
<filter class="solr.WordDelimiterGraphFilterFactory"
protected="protwords.txt"
generateWordParts="1"
generateNumberParts="1"
catenateWords="0"
catenateNumbers="0"
catenateAll="0"
splitOnCaseChange="1"
preserveOriginal="1"/>
<filter class="solr.LengthFilterFactory" min="2" max="100" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="100"/>
</fieldType>
https://gist.github.com/anonymous/af00ed3052fc6e85140ed5ec9842e4bf

How to go about debugging in Media Foundation

The question
Which tools or code constructs (like defines) should be used in this area, and how do I get them to work? Is there something I, as a newcomer to Media Foundation, should do before asking a question here to avoid simple mistakes?
The question is not "what is your favorite tool, let's fight over who is right" but simply: for the Media Foundation framework, which options are, in your experience, worth considering for debugging, and how do I use them?
Background to why I am asking this
Looking around on Stack Overflow, it seems that some questions are asked without knowledge of how to properly debug Media Foundation applications. In some cases a specific question gets an answer stating that the OP should use MFTrace (1, 2). I also believe that my earlier questions here would have been helped by using proper debugging tools or traces specific to Media Foundation.
Things I as someone new to this framework have encountered
I myself have not even been able to get MFTrace or Event Viewer to work, both tools that are mentioned in the official media foundation blog.
The documentation on how to get MFTrace is lacking. Is it only available in the old Windows 7 SDK on .NET 4.0, which is referred to here? Or can one use a newer SDK? Installing the older Windows 7 SDK involves some pain points on Windows 10 (first change registry values, work out how to do that, then hit a new error, for which SO suggests looking at the log and maybe uninstalling any existing Visual C++ 2010 redistributable).
It would be nice to know if this is something you have to go through, in which case I will, or if MFTrace can be found elsewhere.
I did not get any logs from the Event Viewer. But maybe one should skip that tool altogether and only use MFTrace since the official blog says the following?
However, MFTrace is much more powerful, and collects way more information, than Event viewer. source
Besides tools, is there no code construct such as the following define?
#define MF_TRACE_LEVEL 15
In this ms blog post they mention EventWriteString and a few TRACE_LEVEL defines. Is this something that is useful outside MFTrace?
Usually I use the following:
VS Debugger, with debug logging via OutputDebugString for most work. It works quite well, even with the asynchronous nature of media foundation.
MFTrace for detailed analysis in hard-to-analyze cases. This often involves looking up obscure GUIDs in MFAPI.h.
Occasionally TOPOEDIT is helpful in testing things out. It's not nearly as capable as GraphEdit though.
Running the Microsoft Media Foundation SDK Samples, or have a look at MFNode. The samples from the Developing Microsoft Media Foundation book are also downloadable from the web. Be aware that some SDK samples have been obsoleted, if you need to look at those you may need to download older SDKs until you find them. There are more samples floating around. Find them.
Look on Stack Overflow, or on the MSDN Media Foundation Forum. Pay close attention to any answers from Roman Ryltsov ;)
If you are new to COM development, be sure to read The N habits of Highly Defective DirectShow Applications. Although it's DirectShow-specific, a lot of it still applies. In particular: use and understand CComPtr, and know when you need to use mutexes.
Running MFTRACE:
MFTrace is not pretty, but it is not hard to use once you have figured it out. The MS blog entries referenced at the end helped a lot, as does the Text Analysis Tool.
Start an ADMIN command prompt.
run MFTRACEPATH.bat to add the MFTRACE.EXE location to the path
cd {YourExecutableLoc}
Run MFTRACECALLER.BAT {YOUREXECUTABLENAME} (without any extension)
Load YOUREXECUTABLENAME.TXT into the Text Analysis Tool to help filter the output.
Occasionally MFTRACEParseTopologies.bat, mentioned in the MSDN blogs, is useful.
I use these .bat scripts to run MFTRACE (Remember: Use ADMIN command prompt!)
MFTRACEPATH.BAT:
@echo off
Echo MFTracePath.bat adds MFTrace to path
SET _NT_SYMBOL_PATH=C:\Users\sschi\AppData\Local\Temp\SymbolCache;%QTDIR%\bin
SET PATH=%PATH%;%PROGRAMFILES(x86)%\Windows Kits\10\bin\x86
cd {your Binary Folder}
echo run MFTraceCaller CapstoneDebug next
MFTRACECALLER.BAT:
@echo off
rem Default executable name, used when no argument is given
SET exFile=MYEXECUTABLEFILENAME
if "%~1" == "" goto START
set exFile=%~1
:START
echo Starting MFTRACE using %exFile%, saving output to %exFile%.txt
@echo on
mftrace -es -k all -l 4 -o %exFile%.txt %exFile%.exe %2 %3 %4 %5
@echo off
echo.
echo Trace completed - output is in %exFile%.txt
echo.
echo Post Processing is available using
echo MFTraceParseTimeStamps.bat
echo MFTraceParseTopologies.bat
echo a) Open %exFile%.txt in TextAnalysisTool
echo b) Load TextAnalysisToolDebugFilters.tat
The Text Analysis Tool helps a lot in filtering the mountain of output. Use "File", "Load Filters" to load the filters as needed. You can turn individual filters on and off to help zero in on what you are doing. Higher filters override lower ones, so, for example, the text "Error" in a line overrides everything below it. Also turn on any "OutputDebugString" logging in your code; it will appear in the traced output.
Below is my 'kitchen sink' filters file. Turn off everything but the red error traces to start.
FILTERS.TAT:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<TextAnalysisTool.NET version="2016-06-16" showOnlyFilteredLines="False">
<filters>
<filter enabled="y" excluding="n" description="" foreColor="ff0000" type="matches_text" case_sensitive="n" regex="n" text="Error" />
<filter enabled="y" excluding="n" description="" foreColor="000000" backColor="ffa500" type="matches_text" case_sensitive="n" regex="n" text="Warning" />
<filter enabled="n" excluding="n" description="" backColor="ffa500" type="matches_text" case_sensitive="n" regex="n" text="Process Frame" />
<filter enabled="y" excluding="n" description="" backColor="90ee90" type="matches_text" case_sensitive="n" regex="n" text="MESessionStart" />
<filter enabled="y" excluding="n" description="" backColor="90ee90" type="matches_text" case_sensitive="n" regex="n" text="MESessionStopped" />
<filter enabled="y" excluding="n" description="" backColor="90ee90" type="matches_text" case_sensitive="n" regex="n" text="MESessionPaused" />
<filter enabled="y" excluding="n" description="" backColor="90ee90" type="matches_text" case_sensitive="n" regex="n" text="UpdatePendingCommands" />
<filter enabled="y" excluding="n" description="" backColor="90ee90" type="matches_text" case_sensitive="n" regex="n" text="SetPositionInternal" />
<filter enabled="n" excluding="n" description="" foreColor="008000" backColor="add8e6" type="matches_text" case_sensitive="n" regex="n" text="Scrub" />
<filter enabled="y" excluding="n" description="" foreColor="008000" backColor="d3d3d3" type="matches_text" case_sensitive="n" regex="n" text="&lt;&lt;&lt;&lt;&lt;&lt; " />
<filter enabled="y" excluding="n" description="" foreColor="008080" backColor="f0e68c" type="matches_text" case_sensitive="n" regex="n" text="RequestSample" />
<filter enabled="y" excluding="n" description="" foreColor="006400" type="matches_text" case_sensitive="n" regex="n" text="MESession" />
<filter enabled="y" excluding="n" description="" foreColor="008000" type="matches_text" case_sensitive="n" regex="n" text="MFStartup" />
<filter enabled="y" excluding="n" description="" foreColor="008000" type="matches_text" case_sensitive="n" regex="n" text="MFShutdown" />
<filter enabled="y" excluding="n" description="" foreColor="800080" type="matches_text" case_sensitive="n" regex="n" text="Grabber" />
<filter enabled="y" excluding="n" description="" foreColor="800080" type="matches_text" case_sensitive="n" regex="n" text="Seek" />
<filter enabled="y" excluding="n" description="" foreColor="d2691e" type="matches_text" case_sensitive="n" regex="n" text="GraphBuilder" />
<filter enabled="y" excluding="n" description="" foreColor="2e8b57" type="matches_text" case_sensitive="n" regex="n" text="MF_TOPOLOGY" />
<filter enabled="y" excluding="n" description="" foreColor="2e8b57" type="matches_text" case_sensitive="n" regex="n" text="MF_TOPONODE" />
<filter enabled="y" excluding="n" description="" foreColor="2e8b57" type="matches_text" case_sensitive="n" regex="n" text="MF_TRANSFORM" />
<filter enabled="y" excluding="n" description="" foreColor="5f9ea0" type="matches_text" case_sensitive="n" regex="n" text="CurrentPosition" />
<filter enabled="y" excluding="n" description="" foreColor="5f9ea0" type="matches_text" case_sensitive="n" regex="n" text="CMFMediaSession" />
<filter enabled="y" excluding="n" description="" foreColor="0000ff" type="matches_text" case_sensitive="n" regex="n" text="OutputDebugString" />
<filter enabled="n" excluding="n" description="" foreColor="b22222" type="matches_text" case_sensitive="n" regex="n" text="MF_SOURCE_READER" />
<filter enabled="n" excluding="n" description="" foreColor="008080" type="matches_text" case_sensitive="n" regex="n" text="CoCreateInstance" />
<filter enabled="y" excluding="n" description="" foreColor="008b8b" type="matches_text" case_sensitive="n" regex="n" text="MeStream" />
<filter enabled="y" excluding="n" description="" foreColor="008b8b" type="matches_text" case_sensitive="n" regex="n" text="MESource" />
<filter enabled="y" excluding="n" description="" foreColor="008b8b" type="matches_text" case_sensitive="n" regex="n" text="MFT_MESSAGE" />
<filter enabled="y" excluding="n" description="" foreColor="008080" type="matches_text" case_sensitive="n" regex="n" text="MF_MT_SUBTYPE" />
<filter enabled="y" excluding="n" description="" foreColor="008b8b" type="matches_text" case_sensitive="n" regex="n" text="Sample" />
<filter enabled="y" excluding="n" description="" foreColor="008b8b" type="matches_text" case_sensitive="n" regex="n" text="ProcessInput" />
<filter enabled="y" excluding="n" description="" foreColor="008b8b" type="matches_text" case_sensitive="n" regex="n" text="ProcessOutput" />
<filter enabled="y" excluding="n" description="" foreColor="008000" type="matches_text" case_sensitive="n" regex="n" text="OnClock" />
<filter enabled="y" excluding="n" description="" foreColor="b22222" type="matches_text" case_sensitive="n" regex="n" text="Met=" />
</filters>
</TextAnalysisTool.NET>
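The precedence rule described above (a filter higher in the list wins over those below it) can be sketched in a few lines of Python. This only models the rule as stated in this answer and is not the tool's actual implementation. The `pick_color` helper is a made-up name, and the three filters are copied from the file above in their original order.

```python
from typing import List, Optional, Tuple

# (text, foreColor) pairs taken from the filters file, in file order.
FILTERS: List[Tuple[str, str]] = [
    ("Error", "ff0000"),
    ("MESession", "006400"),
    ("Sample", "008b8b"),
]


def pick_color(line: str, filters: List[Tuple[str, str]] = FILTERS) -> Optional[str]:
    """Return the colour of the first filter whose text occurs in the line.

    Matching is a case-insensitive substring test, mirroring
    case_sensitive="n" and regex="n" in the file. The first match wins,
    which is how this sketch models "higher filters override lower ones".
    """
    lowered = line.lower()
    for text, color in filters:
        if text.lower() in lowered:
            return color
    return None


# A line that matches both "Error" and "MESession" takes the colour of "Error".
print(pick_color("MESessionStarted failed with error 0x80004005"))
```

This is also why starting with only the red "Error" filter enabled is a sensible first pass: any line containing that text is highlighted regardless of what else it matches.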
Additional References
Introduction to Text Analysis Tool
textanalysistool.github.io
Microsoft Media Foundation Blog Entries

How to search from database using solr

Solr version: 5.0
I am working with Solr for the first time and do not fully understand it yet. Here is what I did:
I have created a core named search.
My schema.xml file has the following code:
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="simple" version="1.5">
<types>
<fieldtype name='string' class='solr.StrField' />
<fieldtype name='long' class='solr.TrieLongField' />
</types>
<fields>
<field name='id' type='int' required='true' indexed="true"/>
<field name='name' type='text' required='true' indexed="true"/>
</fields>
<uniqueKey>id</uniqueKey>
<defaultSearchField>fullText</defaultSearchField>
<solrQueryParser defaultOperator='OR' />
</schema>
solrconfig.xml :
<?xml version='1.0' encoding='UTF-8' ?>
<config>
<luceneMatchVersion>5.0.0</luceneMatchVersion>
<lib dir="../../../../dist/" regex="solr-dataimporthandler-.*\.jar" />
<requestHandler name="standard" class="solr.StandardRequestHandler" default='true' />
<requestHandler name="/update" class="solr.UpdateRequestHandler" />
<requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">db-data-config.xml</str>
</lst>
</requestHandler>
<admin>
<defaultQuery>*:*</defaultQuery>
</admin>
</config>
db-data-config.xml :
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost:3306/solr"
user="root"
password="" />
<document>
<entity name="users" query="select id,name from users;" />
</document>
</dataConfig>
I have created a database in phpMyAdmin; please find the screenshot below:
When I click Query in the Solr admin panel, the result is empty. Why?
Can anyone help me with this? I am new to Solr search. What am I doing wrong?
I don't see a field named "fullText" in schema.xml, so why is it defined as the default search field?
<defaultSearchField>fullText</defaultSearchField>
Change it to:
<defaultSearchField>name</defaultSearchField>
Map the columns to fields in the data config XML:
<field column="ID" name="id" />
<field column="NAME" name="name" />
Your data-config should look like this:
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost:3306/solr"
user="root"
password="" />
<document>
<entity name="users" query="select id,name from users">
<field column="ID" name="id" />
<field column="NAME" name="name" />
</entity>
</document>
</dataConfig>
Define the field types and fields in schema.xml like this:
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
</types>
<fields>
<field name='id' type='int' required='true' indexed="true" stored="true"/>
<field name='name' type='string' required='true' indexed="true" stored="true"/>
</fields>
Make changes in your db-data-config.xml similar to what I have done:
<entity name="city_masters" pk="city_id" query="SELECT delete_status as
city_masters_delete_status,city_id,country_id,city_name,city_updated from
city_masters">
<field column="city_id" name="id"/>
<field column="city_name" name="city_name" indexed="true" stored="true" />
<field column="country_id" name="country_id" indexed="true" stored="true" />
<field column="city_masters_delete_status" name="city_masters_delete_status"
indexed="true" stored="true" />
</entity>
You left out the field column mappings. Add them as I have done in my config and it should work. If it still doesn't work, let me know.
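To check quickly which columns each entity maps to which Solr fields, a data-config file can be parsed with a few lines of Python. This is a rough sketch rather than part of Solr: the `entity_mappings` helper is a made-up name, and the sample is a trimmed copy of the users config shown earlier, without the dataSource element.

```python
import xml.etree.ElementTree as ET


def entity_mappings(data_config_xml: str) -> dict:
    """Map each entity name to its {column: solr_field} pairs."""
    root = ET.fromstring(data_config_xml)
    result = {}
    for entity in root.iter("entity"):
        # Only direct <field> children belong to this entity.
        fields = {f.get("column"): f.get("name") for f in entity.findall("field")}
        result[entity.get("name")] = fields
    return result


SAMPLE = """
<dataConfig>
  <document>
    <entity name="users" query="select id,name from users">
      <field column="ID" name="id" />
      <field column="NAME" name="name" />
    </entity>
  </document>
</dataConfig>
"""

for name, fields in entity_mappings(SAMPLE).items():
    print(name, fields or "no <field> mappings declared")
```

An entity that reports no mappings is the situation this answer describes.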
