Cannot execute TREC customised file in Terrier - information-retrieval

I'm having a problem running the evaluation for a TREC topic file with the Terrier tools. I applied query expansion to the TREC file, so the <title> tag now contains a weighted term. What I want to do is run this customized TREC file against WT10G using Terrier. I have succeeded in indexing WT10G with Terrier, so my next step is to run retrieval and evaluation with this file.
Here is an example of the modified TREC file:
<top>
<num> Number: 501
<title> peirce^570.66156
<desc> Description:
What is the difference between deduction and induction in the
process of reasoning?
<narr> Narrative:
A relevant document will contrast inductive and deductive reasoning.
A document that discusses only one or the other is not relevant.
</top>
When I try to input that file into Terrier, Terrier processes it as:
See the yellow rectangle. It treats the title as 2 inputs instead of 1 single input with a weight. I read in the documentation that Terrier accepts weighted query terms as input http://terrier.org/docs/v3.5/querylanguage.html (I tried interactive mode and it works with weighted terms). Does anyone know how to solve this problem?
Thank you
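Before digging into Terrier itself, it can help to confirm what the topic file really contains. The snippet below is a minimal sketch in plain Python, not Terrier code: the function name parse_weighted_title is hypothetical, and it assumes the term^weight syntax shown in the example topic. It only splits the <title> line into (term, weight) pairs so you can compare them with what Terrier reports.

```python
import re

# Matches one <title> line of a TREC topic, e.g. "<title> peirce^570.66156".
TITLE_RE = re.compile(r"^<title>[ \t]*(.*)$", re.MULTILINE)


def parse_weighted_title(topic_text):
    """Return [(term, weight), ...] for the first <title> line of a topic.

    Terms without an explicit ^weight get 1.0 (an assumption of this sketch).
    """
    match = TITLE_RE.search(topic_text)
    if match is None:
        return []
    pairs = []
    for token in match.group(1).split():
        term, _, weight = token.partition("^")
        pairs.append((term, float(weight) if weight else 1.0))
    return pairs


topic = """<top>
<num> Number: 501
<title> peirce^570.66156
<desc> Description:
What is the difference between deduction and induction in the
process of reasoning?
</top>"""

print(parse_weighted_title(topic))
```

If this prints a single pair for topic 501 but Terrier still shows two query terms, the split is happening in how Terrier tokenises the topic file rather than in the file itself.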

Related

RobotFramework: Purpose and best practice for the resource- and library-folders

I am pondering the purpose of, and best practice for, the Resource and Library folders in RobotFramework.
Below I have formulated some statements which serve to illustrate my questions. (Abbreviations used: KW = KeyWord, RF = RobotFramework, TS = TestSuite).
Statements/Questions:
Every KW, that is designed to be shared among TS, and written in RF-syntax, should be put inside a .resource-file in the Resource-folder?
Every KW written in Python should be put (as a method inside a .py-file) in the Library-folder?
I.e. the dividing line between the Resource- and Library-folder is drawn based on the syntax used when writing the KW (RF-KW go into the Resource-folder and Python-KW go into the Library-folder).
Or should the dividing line rather be drawn based on closeness to the test rig and system under test (i.e. high- or low-level keywords, where low-level keywords are those that interact with the system under test)? And hence you could place Python KW (methods) in the Resource-folder?
My take - Yes on everything, even on the last paragraph with the "Or,". Everything up to that point was a question about the content/syntax of a file. And if your Python (library) file has KWs that make contextual sense to be in a folder with other similar RF (resource) files - place it there.
Remember two things: for Robotframework the distinction between Resource and Library is mainly what syntax it is expecting & how to import the target's resources. It doesn't enforce any rigid expectations on its purpose.
E.g. nothing stops you from having a high-level keyword developed in Python, like
def login_with_user_and_do_complex_compound_action(user, password, other_arg)
, nor from creating a relatively low-level KW written in Robotframework syntax:
Keyword For Complex Math That Should Better Be In Python
    [Arguments]    ${complex_number}    ${transformer_function}    ${other_arg}
The other thing is that Robotframework is the tool(-set) with which you construct your automated testing framework for the SUT. By your framework I mean the structure & organization of suites and tests, their interconnections and hierarchy, and - the "helpers" for their operations - the aforementioned resource (RF) and library (py) files.
As long as this framework is logically sound, has established conventions and is easy to grasp & follow, you can have any structure that suits you.
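To make the point about high-level keywords in Python concrete, here is a minimal sketch of a library module. Everything in it is hypothetical (the file name my_keywords.py, the function names, and the fake login logic); the only Robot Framework behaviour it relies on is that public functions of an imported Python module become keywords, while names starting with an underscore do not.

```python
# my_keywords.py: a hypothetical library module. Robot Framework can import a
# plain Python module as a library and exposes its public functions as keywords.


def _open_session(user, password):
    # Leading underscore: helper only, not exposed as a keyword.
    # Stand-in for real login logic against the system under test.
    if not password:
        raise ValueError("password must not be empty")
    return {"user": user, "logged_in": True}


def login_and_fetch_report(user, password, report_name):
    """High-level keyword, usable in a suite as `Login And Fetch Report`."""
    session = _open_session(user, password)
    return f"{report_name} for {session['user']}"


if __name__ == "__main__":
    # Quick manual check outside Robot Framework.
    print(login_and_fetch_report("alice", "secret", "daily-summary"))
```

Whether such a file lives in the Library folder or next to related .resource files is then purely your convention; the import line in the suite just has to point at wherever you put it.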

First token could not be read or is not the keyword 'FoamFile' in OpenFOAM

I am a beginner to programming. I am trying to run a simulation of a combustion chamber using reactingFoam.
I have modified the counterflow2D tutorial.
For those who may not know OpenFOAM: it is a program written in C++, but using it does not require C++ programming, only defining the variables correctly in the required files.
In one of my first tries I made a very simple model, but since I wanted to check it thoroughly I set it to simulate 60 seconds with a 1e-6 timestep.
My computer is not very powerful, so it took about a day (by this I mean I'd like to find a solution rather than repeat the simulation).
I executed the solver reactingFoam on 4 processors in parallel using
mpirun -np 4 reactingFoam -parallel > log
The log does not show any evidence of an error.
The problem is that reconstructPar works perfectly, but when I then try to view the results with paraFoam this error is shown:
From function bool Foam::IOobject::readHeader(Foam::Istream&)
in file db/IOobject/IOobjectReadHeader.C at line 88
Reading "mypath/constant/reactions" at line 1
First token could not be read or is not the keyword 'FoamFile'
I have read that maybe some files are empty when they are not supposed to be so, but I have not found that problem.
My 'reactions' file has not been modified from the tutorial and has always worked.
edit:
Sorry for the vague question. I have modified it a bit.
A typical OpenFOAM dictionary file always begins with a header entry named FoamFile, which is read from the file's Foam::Istream. An example from a typical system/controlDict file can be seen below:
FoamFile
{
version 2.0;
format ascii;
class dictionary;
location "system";
object controlDict;
}
While reading the file header, if this entry is absent, OpenFOAM stops with the error message that you have experienced:
First token could not be read or is not the keyword 'FoamFile'
The header probably exists to support OpenFOAM's abstraction mechanisms, which would be difficult to implement otherwise.
As mentioned in the comments, adding the header entry almost always solves this problem.
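If it is not obvious which file is missing the header, a small script can list the candidates. This is a rough sketch in Python, not an OpenFOAM utility: the function names are made up, and it only approximates the check by skipping C-style and C++-style comments and testing whether the first remaining token is FoamFile. Some files in a case may legitimately have no header, so treat the output as a list of things to inspect.

```python
import re
import tempfile
from pathlib import Path

# C-style block comments (the usual OpenFOAM banner) and C++ line comments.
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def has_foamfile_header(text):
    """True if the first token after comments and whitespace is 'FoamFile'."""
    stripped = COMMENT_RE.sub(" ", text)
    return re.match(r"\s*FoamFile\b", stripped) is not None


def files_missing_header(directory):
    """Names of regular files directly under `directory` without the header."""
    missing = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            text = path.read_text(errors="replace")
            if not has_foamfile_header(text):
                missing.append(path.name)
    return missing


def demo():
    """Build a throwaway directory with one good file and one empty file."""
    with tempfile.TemporaryDirectory() as tmp:
        good = "/* banner */\nFoamFile\n{\n    version 2.0;\n}\n"
        Path(tmp, "controlDict").write_text(good)
        Path(tmp, "reactions").write_text("")
        return files_missing_header(tmp)


print(demo())
```

On a real case you would call files_missing_header on the directory named in the error, e.g. the constant directory of the case, and open whatever it reports in a text editor.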

Robot Framework (RIDE): Data Sanity Check failing

I am getting the error below when I copy this value into my RIDE resource file in Robot Framework. I don't know what is wrong with this value.
I am able to save 900 other values in the same format, but not this one.
ERROR:Data Sanity Check Failed.Reset Changes?
The value I am trying to save is as below:
${MISDOB2ATTMM}    Dobson to VF Migration (Manual)
Examples of correct values:
${CHILE}    Chile
${CI}    Cote d Ivoire (Ivory Coast)
${CN}    China, Peoples Republic of
${COSTARICA}    Costa recei
The error message you got comes from RIDE's Text Editor (you did not mention this).
It could be caused by invisible characters in that variable definition, such as stray indentation or even a TAB character.
Also, that format for variable definition is correct in the section
*** Variables ***
but not in keyword or test case sections.
I recommend always using the '=' symbol for variable assignments outside the Variables section. For example:
${MISDOB2ATTMM}=    Set Variable    Dobson to VF Migration (Manual)
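To check the invisible-character theory, you can paste the offending line into a short script that reports anything other than ordinary printable characters and plain spaces. This is a minimal sketch using only Python's standard library; the function name is made up, and the sample line with a no-break space and a TAB is invented for illustration, since the actual contents of your line are unknown.

```python
import unicodedata


def find_suspicious_chars(line):
    """Return (index, description) for tabs, control and format characters,
    and for any whitespace other than the plain ASCII space."""
    hits = []
    for index, char in enumerate(line):
        if char == " ":
            continue
        category = unicodedata.category(char)
        if category in ("Cc", "Cf") or char.isspace():
            # Control characters have no Unicode name, so fall back to U+XXXX.
            name = unicodedata.name(char, f"U+{ord(char):04X}")
            hits.append((index, name))
    return hits


# Invented example: a no-break space after the variable name and a TAB later on.
sample = "${MISDOB2ATTMM}\u00a0   Dobson to VF\tMigration (Manual)"
for index, name in find_suspicious_chars(sample):
    print(index, name)
```

An empty result for your real line would point away from invisible characters and towards the section or syntax issue described above.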

Accessing XML data in R that is several layers embedded

I'm working with a well-structured XML file. So far, I have successfully accessed elements of this dataset that are only one layer/subfield deep. However, now I need to access one type of data that is more deeply embedded within this data structure, and the expected method is not working...
Excerpt from the XML data; this is the "target" field that I need to access, where each node (i.e. drug) can have between 0 and N targets (I am arbitrarily setting N to 20 for now, since I'm not sure what this value is for the entire dataset):
<targets> --> 51st field in each node
<target> --> there are a variable number of targets per drug
<id>BE0000048</id> --> this is the value I want for each Target
<name>Prothrombin</name>
<organism>Human</organism>
<actions>
<action>inhibitor</action>
</actions>
<references>
<articles>
<article>
<pubmed-id>10505536</pubmed-id>
<citation>Turpie AG: Anticoagulants in acute coronary syndromes.
...
I have determined that the main Target field that I need is Field 51 within each node's structure, thus the hardcoded value below. I would think that accessing the i'th node's id value within the j'th target within the node's Target field should have an index of [[i]][[51]][[j]][[1]] or [[i]][[51]][[j]][['id']]:
This is my code that isn't working as expected:
Target <- array(1:NumNodes, dim=c(1,NumNodes,MaxTargets))
for (i in 1:NumNodes){
for (j in 1:MaxTargets){
Target[i][j] <- Data[[i]][[51]][[j]][[1]]
}
}
The behavior I'm seeing is that I can extend the subscripts out numerous levels on the command line, and never narrow the result any more than the following:
> Data[[1]][[51]][[1]][[1]][[1]][[1]][[1]][[1]][[1]][[1]]
[1] "BE0000048ProthrombinHumaninhibitor10505536Turpie AG: Anticoagulants...
It doesn't seem to matter how many subscripts I add; all of the fields in the Target subfield are always conjoined and don't seem to be able to be separated...
Confusingly, when I run my code, I get the following error message:
Error in Data[[i]][[51]][[1]] : subscript out of bounds
... which doesn't seem to make sense, given that I am limiting i to the number of nodes, and that there is no error thrown for even the ridiculously long list of subscripts shown above when I query that phrase on the command line...
Thanks in advance for any insights you can provide.
Thanks for your suggestion, cderv; I will plan to check out the xml2 package and XPATH. I really appreciate your willingness to provide an example.
I am pasting what should be a functional subset of my XML file; however, now instead of the "targets" field being the 51st field, it is the sixth. Again, it is the targets --> target --> id value that I want to report for each target, with each node having a variable number of target values. My code follows the XML content.
<?xml version="1.0" encoding="UTF-8"?>
<drugbank xmlns="http://www.drugbank.ca" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.drugbank.ca http://www.drugbank.ca/docs/drugbank.xsd" version="5.0" exported-on="2017-07-06">
<drug type="biotech" created="2005-06-13" updated="2016-08-17">
<drugbank-id primary="true">DB00001</drugbank-id>
<drugbank-id>BTD00024</drugbank-id>
<drugbank-id>BIOD00024</drugbank-id>
<name>Lepirudin</name>
<description>Lepirudin is identical to natural hirudin except for substitution of leucine for isoleucine at the N-terminal end of the molecule and the absence of a sulfate group on the tyrosine at position 63. It is produced via yeast cells. Bayer ceased the production of lepirudin (Refludan) effective May 31, 2012.</description>
<targets>
<target>
<id>BE0000048</id>
<name>Prothrombin</name>
<organism>Human</organism>
<actions>
<action>inhibitor</action>
</actions>
<references>
<articles>
<article>
<pubmed-id>10505536</pubmed-id>
<citation>Turpie AG: Anticoagulants in acute coronary syndromes. Am J Cardiol. 1999 Sep 2;84(5A):2M-6M.</citation>
</article>
<article>
<pubmed-id>10912644</pubmed-id>
<citation>Warkentin TE: Venous thromboembolism in heparin-induced thrombocytopenia. Curr Opin Pulm Med. 2000 Jul;6(4):343-51.</citation>
</article>
</articles>
</references>
<known-action>yes</known-action>
</target>
</targets>
</drug>
</drugbank>
Now that I have significantly truncated the above file, my code is now giving an error message that any subscripts above Data[[1]][[1]] are out of bounds, but hopefully this code gives you an idea of what I'm aiming to do...
library(XML)
# Save the database file as a tree structure
xmldata = xmlRoot(xmlTreeParse("DrugBank_TruncatedDatabase_v4_Tiny.xml"))
# Number of nodes in the entire database file
NumNodes <- xmlSize(xmldata)
MaxTargets <- 20
Data <- xmlSApply(xmldata, function(x) xmlSApply(x, xmlValue))
Target <- array(1:NumNodes, dim=c(1,NumNodes,MaxTargets))
for (i in 1:NumNodes){
for (j in 1:MaxTargets){
Target[i][j] <- Data[[i]][[5]][[j]][[1]]
}
}
Thanks for your input!
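For what it's worth, here is the XPath idea that cderv suggested, sketched on a trimmed copy of the sample above. It is written in Python with the standard library rather than R, purely to show the shape of the query: select each drug, then every targets/target/id beneath it, with no positional indexes such as [[51]] and no fixed MaxTargets. The function name and the NS prefix "db" are my own choices; the namespace URI is the one declared in the file, and queries that ignore it will match nothing.

```python
import xml.etree.ElementTree as ET

# The file declares a default namespace, so every element name is qualified.
NS = {"db": "http://www.drugbank.ca"}

SAMPLE = """<drugbank xmlns="http://www.drugbank.ca" version="5.0">
<drug type="biotech">
<drugbank-id primary="true">DB00001</drugbank-id>
<drugbank-id>BTD00024</drugbank-id>
<name>Lepirudin</name>
<targets>
<target>
<id>BE0000048</id>
<name>Prothrombin</name>
</target>
</targets>
</drug>
</drugbank>"""


def target_ids_per_drug(xml_text):
    """Map each drug's primary drugbank-id to the list of its target ids."""
    root = ET.fromstring(xml_text)
    result = {}
    for drug in root.findall("db:drug", NS):
        drug_id = drug.findtext("db:drugbank-id[@primary='true']", namespaces=NS)
        targets = drug.findall("db:targets/db:target", NS)
        result[drug_id] = [t.findtext("db:id", namespaces=NS) for t in targets]
    return result


print(target_ids_per_drug(SAMPLE))
```

A drug with no targets simply gets an empty list, so the variable number of targets per drug needs no special handling.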

Is there any equivalent of the C/C++ __FILE__ and __LINE__ macros in R?

I'm trying to get the equivalent of the __FILE__ and __LINE__ macros of C or C++ in R (or S+). Any ideas?
__FILE__ The presumed name of the current source file (a character string literal).
__LINE__ The presumed line number (within the current source file) of the current source line (an integer constant).
As for context - I have log messages being flushed to the console from different sections of the code, and given that the messages themselves are built at run-time, it is often very difficult to find out where a log message is coming from (with the size of the R code growing to many thousands of lines and running on a distributed grid). However, if I could dump the __FILE__ and __LINE__ values along with the log messages, it would be much easier to trace the logs...
Use the #line directive. The structure is #line nn "filename". See Duncan Murdoch's article on source references for more.
