Index XML attributes along with plain text in CLucene - xhtml

I've been able to compile the CLucene project on iOS and am currently trying to use it within my iOS application. I'm trying to index XHTML documents, and have been able to do that by extracting the text nodes out of those documents and concatenating them before indexing, so that all the text from one XHTML document appears in a single Lucene document.
However, each text node of my XHTML document has custom attributes, so when a search is made on the indexed text, I should also be able to obtain the attribute associated with that text.
My xml data looks like:
<span data-value="/1/2/3">This is a sample text for this span</span>
<span data-value="/2/3/4">This is a example text for another span</span>
<span data-value="/3/4/5">Searching for this span text</span>
So when I search the Lucene index for the word sample, I should be able to retrieve the data-value attribute associated with the text containing that word. In the case above it will be data-value="/1/2/3".
The way I'm currently creating the index is by concatenating the data-value attribute and the text node together and having Lucene index the result. This way, whenever search results come back, they also carry the data-value attribute. I can read the attribute value at search time and strip it from the results before displaying them. However, this breaks down when a span contains a large amount of text: the searched word(s) may be returned, but the data-value attribute may not be part of the returned result, so it can be neither read nor stripped.
However, I think this is not the optimal way of indexing XML attributes along with their text data.
I'd appreciate it if someone could help me with an approach for indexing the relationship between the text and its attributes.
Update: I found that the tokens generated from the text can have payloads associated with them. If I don't analyze the text, the entire string is treated as a single token, and the XML attribute could be attached to that token as a payload, which might serve my purpose. I wanted to know if anyone can help me figure out whether this is the right approach for my case. Many thanks for your help.
Thanks & regards,
Asheesh

If you want to keep all of the XHTML text as one Lucene document, then payloads are probably the way to go.
An alternate approach is to create a document ID field (like "documentID:42") and a field denoting that a given Lucene document is the whole document concatenated together (like "AllOfDocument:42"). This would let you index each text node individually and limit the attributes to just the attributes for that node, while still tying that text node to the whole document. With that approach, you could put the attributes in their own field in the text node's Lucene document rather than having to use payloads. That might be simpler.
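A rough sketch of the one-document-per-text-node idea, written in plain Python with a toy in-memory inverted index rather than against the CLucene API. The field names dataValue and text, and the helper names build_index and search, are made up for illustration; only documentID comes from the answer above.

```python
from collections import defaultdict
from html.parser import HTMLParser


class SpanCollector(HTMLParser):
    """Collect (data-value, text) pairs from <span data-value="..."> elements."""

    def __init__(self):
        super().__init__()
        self.nodes = []       # list of (data_value, text)
        self._current = None  # data-value of the span being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._current = dict(attrs).get("data-value")
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None:
            self.nodes.append((self._current, "".join(self._text)))
            self._current = None


def build_index(document_id, xhtml):
    """Create one small 'document' per text node, each with its own attribute field."""
    collector = SpanCollector()
    collector.feed(xhtml)
    node_docs = []
    postings = defaultdict(set)  # term -> ids of node documents containing it
    for data_value, text in collector.nodes:
        node_id = len(node_docs)
        node_docs.append(
            {"documentID": document_id, "dataValue": data_value, "text": text}
        )
        for term in text.lower().split():
            postings[term].add(node_id)
    return node_docs, postings


def search(term, node_docs, postings):
    """Return the node documents whose text contains the term."""
    return [node_docs[i] for i in sorted(postings.get(term.lower(), ()))]


sample = """
<span data-value="/1/2/3">This is a sample text for this span</span>
<span data-value="/2/3/4">This is a example text for another span</span>
<span data-value="/3/4/5">Searching for this span text</span>
"""
node_docs, postings = build_index(42, sample)
hits = search("sample", node_docs, postings)
```

Each hit carries both the attribute of the matching node and the ID of the XHTML document it came from, so nothing has to be stripped out of the displayed text.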

Related

How to determine if Mammo image is MLO or CC?

I'm confused as to which tags need to be examined to determine whether a Mammo image is MLO or CC.
In the sample sets I have, I see the relevant information in View Position (0018,5101), i.e. a value of "MLO".
However looking at the standard it refers to Partial View Code Sequence (0028,1352) which is a sequence and there are more than just MLO/CC values.
The correct tag to check is the View Code Sequence, not the Partial View Code Sequence (that is only for partial views, as the name suggests, which seems not to be your case). In the sequence, you have to check the Code Value for the respective code. For example, MLO images get the code R-10226, as can be seen in the table.
Only if that sequence is not present or is empty do you have to check fallback tags like View Position. View Position is defined for CR and DX images, not for MG images (and actually does not have MLO or CC as defined terms), but as always, deviations from the DICOM standard by some modalities have to be taken into account, and you have to check whether the images you process conform to the standard.
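The lookup order described above could be sketched like this. To keep the example self-contained, the dataset is a plain Python dict keyed by attribute keyword rather than an object from a real DICOM toolkit, and mammo_view is a hypothetical helper. The MLO code is the one quoted above; the CC code R-10242 is an assumption that should be checked against the view code table in the standard.

```python
# Code Value -> view name. R-10226 (MLO) is quoted in the answer above;
# R-10242 (CC) is an assumption to verify against the standard's table.
VIEW_CODES = {
    "R-10226": "MLO",
    "R-10242": "CC",
}


def mammo_view(dataset):
    """Return "MLO", "CC", or None for a dict-like dataset.

    View Code Sequence is checked first; View Position is only used as a
    fallback when the sequence is absent or empty.
    """
    for item in dataset.get("ViewCodeSequence") or []:
        view = VIEW_CODES.get(item.get("CodeValue"))
        if view is not None:
            return view

    # Fallback for non-conformant images that only fill in View Position.
    position = (dataset.get("ViewPosition") or "").strip().upper()
    return position if position in ("MLO", "CC") else None


conformant = {"ViewCodeSequence": [{"CodeValue": "R-10226"}]}
fallback_only = {"ViewCodeSequence": [], "ViewPosition": "MLO"}
```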

Riak: searchable list of maps (with CRDTs)

I have a use case which consists of storing several maps under an attribute. It's equivalent to the following JSON list:
[
{"tel": "+33123456789", "type": "work"},
{"tel": "+33600000000", "type": "mobile"},
{"tel": "+79001234567", "type": "mobile"}
]
Also, I'd like to have it searchable at the object level. For instance:
- search entries that have a mobile number
- search entries that have a mobile number and whose number starts with the string "+7"
Since Riak doesn't support sets of maps (only sets of registers), I was wondering if there is a trick to achieve it. So far I've had 2 ideas:
Map of maps
The idea is to generate (randomly?) keys for the objects, and store each object of the list in a parent map under one of those keys, which exist only so that each object has a key.
It seems to me that this doesn't allow searching the objects (the maps inside the parent map), because Riak Solr search requires the full path to the attribute. One cannot simply write the following Solr query: phone_numbers.*.tel:+7*. Also, composed searches (e.g. search entries that have a mobile number and whose number starts with the string "+7") seem hard to achieve.
Sets with simulated multi-valued attributes
This solution consists in using a set and insert all the values of the object as a single string, with separators between them. Yes, it's a hack. An attribute value would look like: $tel:+79001234567$type:mobile$ with : as the attribute name-value separator and $ as the attribute separator.
Search could be feasible using the * wildcard (ugly, but still), except for the problem of escaping the separators: what if the type attribute includes a :? Are there some separators that are accepted by Riak and would not be acceptable in a string (I'm thinking of control characters)?
In a nutshell
I'm looking for a solution whatever the hackiness of it, as long as it works. Any idea is welcome.

Display label based on, field on one data-source (singular) being within another data-source fields many

I am still learning, and I'm looking for help on how to display a label when a field value from one data source appears anywhere in a field of another data source.
I have one calculated table displaying rows of documents within a folder, and I wish to use the field representing the document number in that data source, so that if that number appears ANYWHERE within another table's field, my label is displayed.
I've been trying to use projection, as I think this is how to achieve it.
I can get it working based on both the current #datasouce.item.fieldnames, but I need the calculation to be based on all possible numbers in that table's field.
I expect that it has something to do with projections, but I can't find anything within the learning templates or anywhere else to resolve the issue.
I think the following should work for you. For the 'Reserved' label have the following binding for the text property:
(#datasources.project_quotes.items..quotenumber).indexOf(#widget.datasource.item.Qnumber) !== -1 ? 'Reserved' : ''
Alternatively, I would suggest just including a field in your calculated datasource and making the determination in your server script.

C - GtkTextBuffer text with tag to gchar

I am currently learning about GtkTextTags and their application to GtkTextView and GtkTextBuffer. I did notice this question, but I was not looking to export my data to a rich text file, which I understood to be the main purpose of that question.
I have an application which stores the contents of a GtkTextBuffer into a TEXT field of a SQLITE3 database. Having read the GtkTextWidget Overview and the documentation on GtkTextTag, I (mistakenly) understood that the tag system worked much like a mark up language such as XML/HTML.
I was under the impression (after setting the &start and &end GtkTextIters) when I called gtk_text_buffer_get_text (...) with gboolean include_hidden_chars set to TRUE I would essentially obtain a gchar* that would also include GtkTextTags so the string might look like <b>some text</b> (but obviously with GtkTextTag formatters not HTML). I realise now this is not the case.
Problem: I store the gchar* obtained from gtk_text_buffer_get_text(...) into the database TEXT field. At a later time, or when I reopen the application, I want to reload this data into the GtkTextBuffer and do so by retrieving the relevant TEXT field data from my database and setting text with gtk_text_buffer_set_text (...). At this point I discover all of the formatting tags are gone and formatting somewhat becomes moot. What I would like to be able to do is store the text from the GtkTextBuffer into the TEXT field of the database and when it reloads the formatting is retained.
Q: Is there a way to store both text and tags from a GtkTextBuffer into a SQLITE3 database so that when reloading this text into the GtkTextBuffer the formatting is retained?
I had considered using a BLOB field rather than a TEXT field in the database but was uncertain if there is a better way to achieve what I am after.
I would suggest using gtk_text_buffer_register_serialize_tagset() and then gtk_text_buffer_serialize() to get a byte array (guint8[]) that you can then read back into another text buffer later with gtk_text_buffer_deserialize().
I think you will have to use a BLOB field rather than TEXT, since the return value of gtk_text_buffer_serialize() is a byte array rather than a string.
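The BLOB part can be sketched independently of GTK. The example below uses Python's sqlite3 module and an arbitrary byte string as a stand-in for whatever gtk_text_buffer_serialize() returns; the table name notes, the column names, and the two helper functions are hypothetical. The stand-in bytes contain embedded NUL bytes to show why binary data like this cannot be treated as a C string and belongs in a BLOB column.

```python
import sqlite3


def save_buffer(conn, note_id, serialized):
    """Store the serialized buffer bytes in a BLOB column."""
    conn.execute(
        "INSERT OR REPLACE INTO notes (id, body) VALUES (?, ?)",
        (note_id, serialized),
    )
    conn.commit()


def load_buffer(conn, note_id):
    """Return the stored bytes, or None if the note does not exist."""
    row = conn.execute(
        "SELECT body FROM notes WHERE id = ?", (note_id,)
    ).fetchone()
    return None if row is None else bytes(row[0])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body BLOB)")

# Stand-in for serialized buffer contents; not the real GTK format.
payload = b"\x00\x01bold\x00text\xff"
save_buffer(conn, 1, payload)
restored = load_buffer(conn, 1)
```

In the C application the same round trip would pass the byte array and its length to the SQLite blob binding functions, and hand the bytes read back to gtk_text_buffer_deserialize().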

Comma Separated check in asp.net

How to search every word separated by comma in textbox
Please refer above post
It's working perfectly, but I have a small issue. When I enter something like c,c++,4-5 yrs in the text box, it has to check the database for either the c or c++ skill and for 4-5 yrs of experience, and only then show the result. But your query shows results whenever any one of the keywords matches the database. I want to compare the years as well. How?
If you want that behavior, you have to program that behavior. One design is to have multiple input boxes: one where you check if any of the words exist, another where you check that all of the words exist. (Perhaps even another for an exact phrase match.) Another design possibility would be for you to develop a syntax to indicate optional and required words all within a single input box. The point is it is up to you.
After you've decided on a design, you can write code that builds your query so that it "or"-matches on the optional words and "and"-matches on the required ones. Something like this pseudocode:
Select * From Table Where
(Field Like OptionalWord1 Or Field Like OptionalWord2 Or Field Like OptionalWord3)
And Field Like RequiredWord1
And Field Like RequiredWord2
(etc.)
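A minimal sketch of building such a query, shown in Python with sqlite3 rather than ASP.NET so that it is self-contained. The table name candidates, the column name skills, and the function build_query are invented for the example. The words are passed as bound parameters instead of being concatenated into the SQL string, and LIKE wildcards inside the words themselves (% and _) are not escaped here.

```python
import sqlite3


def build_query(optional_words, required_words, table="candidates", column="skills"):
    """Return (sql, params): any optional word may match, every required word must."""
    clauses = []
    params = []
    if optional_words:
        ors = " OR ".join(f"{column} LIKE ?" for _ in optional_words)
        clauses.append(f"({ors})")
        params.extend(f"%{word}%" for word in optional_words)
    for word in required_words:
        clauses.append(f"{column} LIKE ?")
        params.append(f"%{word}%")
    where = " AND ".join(clauses) or "1 = 1"
    return f"SELECT * FROM {table} WHERE {where}", params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidates (name TEXT, skills TEXT)")
conn.executemany(
    "INSERT INTO candidates VALUES (?, ?)",
    [
        ("alice", "c, c++, 4-5 yrs"),
        ("bob", "java, 4-5 yrs"),
        ("carol", "c++, 1-2 yrs"),
    ],
)

sql, params = build_query(["c++", "python"], ["4-5 yrs"])
names = [row[0] for row in conn.execute(sql, params)]
```

With this data only alice matches: bob has the required experience but none of the optional skills, and carol has a matching skill but not the required experience.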

Resources