MarkLogic: How to omit search facet values that are empty?

Using the default search endpoint and passing in an options file, I have a path range index set up on a particular element /path/to/my/element/.
My facet currently returns the following:
<search:facet name="facetName" type="xs:string">
<search:facet-value name="" count="3"/>
<search:facet-value name="real value 1" count="1">real value 1</search:facet-value>
<search:facet-value name="real value 2" count="1">real value 2</search:facet-value>
</search:facet>
My desired output:
<search:facet name="facetName" type="xs:string">
<search:facet-value name="real value 1" count="1">real value 1</search:facet-value>
<search:facet-value name="real value 2" count="1">real value 2</search:facet-value>
</search:facet>
Facet values are being returned for empty elements, which I do not want. Is there a way to omit those empty facet values without having to write another transform to apply to the search results?

The only sensible way would be to remove empty elements from your documents, if that is an option.
HTH!

To get the desired output, store the generated facet in a variable and filter out the facet values whose name attribute is empty, as shown below.
declare namespace search = "http://marklogic.com/appservices/search";
(: Sample facet as returned by the search endpoint :)
let $var1 :=
  <search:facet name="facetName" type="xs:string">
    <search:facet-value name="" count="3"/>
    <search:facet-value name="real value 1" count="1">real value 1</search:facet-value>
    <search:facet-value name="real value 2" count="1">real value 2</search:facet-value>
  </search:facet>
return
  (: Rebuild the facet, keeping only values with a non-empty name :)
  <search:facet name="facetName" type="xs:string">{
    $var1/search:facet-value[not(@name = '')]
  }</search:facet>
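If the search response is consumed by a client rather than inside MarkLogic, the same filter can be applied there. The following is only a minimal sketch in Python, assuming the facet XML has already been fetched as a string; the helper name drop_empty_facet_values is hypothetical, and the sample XML is the facet from the question.

```python
import xml.etree.ElementTree as ET

# Namespace used by the MarkLogic Search API response elements.
SEARCH_NS = "http://marklogic.com/appservices/search"
ET.register_namespace("search", SEARCH_NS)

# Sample facet copied from the question (namespace declaration added so it parses standalone).
FACET_XML = """<search:facet xmlns:search="http://marklogic.com/appservices/search" name="facetName" type="xs:string">
<search:facet-value name="" count="3"/>
<search:facet-value name="real value 1" count="1">real value 1</search:facet-value>
<search:facet-value name="real value 2" count="1">real value 2</search:facet-value>
</search:facet>"""


def drop_empty_facet_values(facet):
    """Hypothetical helper: remove facet-value children whose name is empty (in place)."""
    # findall returns a list, so removing children while looping over it is safe.
    for value in facet.findall(f"{{{SEARCH_NS}}}facet-value"):
        if not value.get("name"):
            facet.remove(value)
    return facet


facet = drop_empty_facet_values(ET.fromstring(FACET_XML))
remaining_names = [value.get("name") for value in facet]
print(remaining_names)
```

This is still post-processing, so it does not avoid the extra step the question hoped to skip; it only moves that step to the client.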

Related

Airflow SqlToS3Operator adds an unwanted index column at the beginning

Recent versions of the Amazon provider package (apache-airflow-providers-amazon) have deprecated MySQLToS3Operator and introduced SqlToS3Operator, which now adds an index column at the beginning of the CSV dump.
For example, if I run the following
sql_to_s3_task = SqlToS3Operator(
    task_id="sql_to_s3_task",
    sql_conn_id=conn_id_name,
    query="SELECT created_at, score FROM my_table",
    s3_bucket=bucket_name,
    s3_key=key,
    replace=True,
)
The S3 file has something like this:
,created_at,score
1,2023-01-01,5
2,2023-01-02,6
The output seems to be a direct dump from Pandas. How can I remove this unwanted preceding index column?

The operator uses a pandas DataFrame under the hood.
You should use pd_kwargs, which lets you pass keyword arguments through to DataFrame .to_parquet(), .to_json() or .to_csv().
Since your output is CSV, the relevant pandas.DataFrame.to_csv parameters are:
header: bool or list of str, default True
Write out the column names. If a list of strings is given it is assumed to be aliases for the column names.
index: bool, default True
Write row names (index).
Thus you can do:
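sql_to_s3_task = SqlToS3Operator(
    task_id="sql_to_s3_task",
    sql_conn_id=conn_id_name,
    query="SELECT created_at, score FROM my_table",
    s3_bucket=bucket_name,
    s3_key=key,
    replace=True,
    file_format="csv",
    # index=False drops the leading index column.
    # header=False also omits the column-name row; leave it out to keep the header.
    pd_kwargs={"index": False, "header": False},
)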

Changing standard_name and long_name with NCO

I calculated the potential temperature from a NetCDF file. I would like to change standard_name and long_name with NCO.
I have tried some commands without success, e.g.:
> ncatted -a name,Temperature,o,c,"Potential_Temperature" pt_19891020-19891022.nc
ncatted: ERROR File contains no variables or groups that match name Temperature so attribute name cannot be changed
> ncrename -a air_temperature,air_potential_temperature -a Temperature,Potential_Temperature pt_19891020-19891022.nc
ncrename: ERROR Required attribute 'air_temperature' not present in group '/'.
HINT: If attribute presence is intended to be optional, then prefix attribute name with the period character '.', e.g., .air_temperature. With this syntax ncrename would succeed even when no variables or groups contain the attribute. If the attribute is intended to be renamed only in a specific variable, then prepend the variable name plus an at-sign '@' to the attribute name, e.g., var_nm@att_nm. If attribute presence is required only for root group (i.e., a global attribute), then prefix attribute name with "global" and an at-sign, e.g., global@att_nm. If attribute presence is required for all groups, then prefix attribute name with "group" and an at-sign, e.g., group@att_nm.
Current file attributes:
$ cdo showattsvar pt_19891020-19891022.nc
pt:
standard_name = "air_temperature"
long_name = "Temperature"
units = "K"
missing_value = -32767
Desired attributes:
pt:
standard_name = "air_potential_temperature"
long_name = "Potential_Temperature"
units = "K"
missing_value = -32767

These are attributes, so ncatted is the correct tool; its documentation has examples of the correct syntax. Each -a argument takes the form attribute,variable,mode,type,value:
ncatted -a standard_name,pt,o,c,air_potential_temperature -a long_name,pt,o,c,Potential_Temperature pt_19891020-19891022.nc
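The field order of each -a argument (attribute name, variable name, mode, type, value) is easy to get wrong, as the failed attempts in the question show. The following is a minimal Python sketch that only assembles the command line; the helper names ncatted_attr_arg and build_ncatted_command are hypothetical, and nothing is executed here.

```python
import shlex


def ncatted_attr_arg(att_name, var_name, value, mode="o", att_type="c"):
    """Hypothetical helper: format one -a argument as att,var,mode,type,value.

    mode "o" means overwrite and att_type "c" means character data,
    matching the command shown in the answer.
    """
    return f"{att_name},{var_name},{mode},{att_type},{value}"


def build_ncatted_command(path, edits):
    """Hypothetical helper: build the ncatted argument list for several edits."""
    cmd = ["ncatted"]
    for att_name, var_name, value in edits:
        cmd += ["-a", ncatted_attr_arg(att_name, var_name, value)]
    cmd.append(path)
    return cmd


command = build_ncatted_command(
    "pt_19891020-19891022.nc",
    [
        ("standard_name", "pt", "air_potential_temperature"),
        ("long_name", "pt", "Potential_Temperature"),
    ],
)
print(shlex.join(command))
```

The resulting list could be passed to subprocess.run(command, check=True) on a machine where NCO is installed.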

Regex find the string between last two quotes " "?

For example, this is my string -> abcd 1234abcda="author 1" content="author 2.">\n
I only want the string author 2. by using the function str_extract() in R. How can I use regex to do that? Thank you so much.

You can use sub:
string = 'abcd 1234abcda="author 1" content="author 2.">\n'
sub('.*"(.*)".*', '\\1', string)
#[1] "author 2."
With str_match:
library(stringr)
str_match(string, '.*"(.*)"')[, 2]
Another option is to extract all the values with "author" followed by a number and select the last one using tail. Note that this pattern does not capture the trailing period.
tail(str_extract_all(string, 'author \\d+')[[1]], 1)
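For comparison, here is the same idea as a minimal Python sketch (not R, and not part of the original answer): collect every double-quoted substring and take the last one. The variable names are illustrative only.

```python
import re

# Sample string from the question.
text = 'abcd 1234abcda="author 1" content="author 2.">\n'

# Capture the contents of every pair of double quotes, in order of appearance.
quoted = re.findall(r'"([^"]*)"', text)

# The last match is the string between the last two quotes.
last_quoted = quoted[-1]
print(last_quoted)
```

Using [^"]* instead of a greedy .* keeps each match inside a single pair of quotes, so the approach does not depend on backtracking from the end of the string.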

Decimals (involuntarily) trimmed from values when loaded into environment

I'm working with R 3.6.1 in RStudio 1.2.1335.
When I assign a column of my data frame to a variable, the decimals of the values in that column appear to be trimmed in the assigned variable:
Dataframe$Column1 has values [368.121 376.436]
Value <- Dataframe$Column1
After I run my code chunk, the environment shows the column values as: Value num [1:2] 368 376
My decimals are gone and I need them. Why does this happen and is there a way to fix it?
EDIT:
Set_1:
380.283 332.108 327.405 371.570 325.832 345.583 396.377 367.020 428.980 389.524 379.597 407.483 456.271 312.084 391.198 345.813 406.229 346.450 459.307 392.321 337.638 429.377 353.705 377.512 384.921 346.471 411.855 368.406 386.921 397.797 322.416 412.042 383.240 381.244 440.021 372.444 399.301 345.395 359.865 355.449 314.270 453.173 329.055 299.674 351.675 324.334 425.205 437.013 513.334 436.452 335.658 422.669 300.030 287.893 380.611 297.890 351.203 317.065 350.824 269.149 389.509 467.375 399.065 354.954 465.086 353.615 336.454 372.067 424.167 389.172 357.799 321.663 353.633 388.465 342.489 353.487 398.721 416.194 383.376 355.553 398.667 339.722 316.240 383.894 453.429 351.443 460.038 348.860 304.085 258.921
264.107 241.861 278.548 455.216 393.201 348.211 359.426 427.194 391.599 381.335 340.558 369.617 351.342 318.718 338.960 386.547 388.872 283.943 340.501
Set_2:
380.603 332.100 327.391 371.540 325.826 345.602 396.386 367.029 428.949 389.545 379.584 407.454 456.276 312.093 391.414 345.861 406.235 346.259 459.284 392.334 337.626 429.283 353.539 377.568 384.941 346.491 411.820 368.253 386.816 397.723 322.337 412.020 383.158 381.331 440.066 372.361 399.210 345.438 359.948 355.425 314.271 453.169 328.751 299.701 351.388 324.371 425.219 436.906 513.384 436.475 335.508 422.661 300.036 287.908 380.453 297.306 351.275 317.206 351.165 269.122 389.499 467.402 399.136 354.943 465.057 353.593 336.549 372.079 424.062 389.119 357.753 321.758 353.650 388.599 342.285 353.507 398.682 416.289 383.309 355.456 398.816 339.681 316.273 383.898 453.418 351.395 460.027 348.731 304.111 258.452
264.298 241.829 278.297 455.104 393.228 348.117 359.645 427.096 391.526 381.260 340.474 369.791 351.061 318.780 338.949 386.458 389.030 284.093 340.512
Code:
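plot(Set_1, Set_2, col = "red", xlab = "Set_1", ylab = "Set_2",
     main = "Comparison Set_1 and Set_2", type = "p")
# Fit a linear model and draw the regression line
fit5 <- lm(Set_2 ~ Set_1)
abline(fit5, col = "blue")
# Annotate the plot with the adjusted R-squared
r5 <- round(summary(fit5)$adj.r.squared, 4)
text(410, 330, paste("R2=", r5))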

The decimals aren't gone; they are just not shown in your environment pane, which abbreviates the displayed values. Try accessing the values with Value[1], which gives you the desired result, 368.121.

range index on mixed content node in exist db

My XML file has the following structure:
<root>
<compound>abc<parts>a b c</parts></compound>
<compound>xyz<parts>x y z</parts></compound>
</root>
I have created the following range index:
<range>
<create qname="compound" type="xs:string"/>
</range>
I expected the index terms to be abca b c and xyzx y z, but I found abc and xyz under the index link in the monitoring and profiling window. Also, the search expression
//compound[.="abca b c"] gives 0 results.
Can anyone help with creating an index on the whole contents of compound, such as abca b c, xyzx y z, and so on?
Thanks
sony

In XQuery, you can use the data() function to get the atomized value of an element, which includes the text of its descendant elements.
So, to test whether the full values of the compound elements can be matched, you can use the following:
//compound/data()[.="abca b c"]

Adding the nested="yes" attribute solved the problem.
I have changed the range index to:
<range>
<create qname="compound" type="xs:string" nested="yes" />
</range>
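The difference between the two sets of index terms can be reproduced outside eXist-db. The following is a minimal Python sketch, assuming only the sample document from the question: it contrasts the text directly inside each compound element (which matches the terms abc and xyz seen without nested="yes") with the full string value that also includes descendant text.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
DOC = """<root>
<compound>abc<parts>a b c</parts></compound>
<compound>xyz<parts>x y z</parts></compound>
</root>"""

root = ET.fromstring(DOC)
compounds = root.findall("compound")

# Text placed directly inside <compound>, before the first child element.
own_text = [c.text for c in compounds]

# Full string value: the element's own text plus all descendant text, in document order.
string_values = ["".join(c.itertext()) for c in compounds]

print(own_text)
print(string_values)
```

This only illustrates the two notions of an element's value; how eXist-db builds its index terms internally is not shown here.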
