As you can see, I am trying to list the ads on my page by highest price, but it didn't work. Checking manually, the result is weird: the prices are not in order at all.
Your prices appear to be string values, which means they are compared lexicographically, not numerically. In a lexicographic sort, two strings are compared by looking at the Unicode values of their characters from left to right, like words in a dictionary. If you want a numerical sort, where values are compared by their numeric magnitude, you should be using a number-type field instead of a string. This means you'll have to update each document to use numbers instead of strings where appropriate.
See also:
https://en.wikipedia.org/wiki/Lexicographical_order
https://www.quora.com/What-is-lexicographic-order
I also see you have what appear to be time values that don't look much like times. Look into using timestamp-type fields instead, or numbers that will sort in chronological order.
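To illustrate the difference, here is a small Python sketch (the values are made up); the same applies to times stored as numbers, e.g. Unix epoch seconds, which sort chronologically:

# String "prices" sort character by character, so "4" ends up above "1000";
# numeric prices sort by magnitude, which is what you actually want.
prices_as_strings = ["1000", "200", "35", "4"]
prices_as_numbers = [1000, 200, 35, 4]
print(sorted(prices_as_strings, reverse=True))  # ['4', '35', '200', '1000']
print(sorted(prices_as_numbers, reverse=True))  # [1000, 200, 35, 4]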
Related
I have a CSV file with multiple lists. See picture. What I want to do is query every single value so it tells me which list(s) the value is found in.
E.g. I query number 898774 and it tells me 898774 - prim6 in set 1, set 2 and set 4.
I did find a quick workaround by making one big list in Excel, removing dupes and then manually searching all of it for each number. Doable for a small amount, but not that good for thousands of sets.
I created a vector for each column and started a search with which(sapply) but then remembered I needed the names. Just a little bit out of my knowledge.
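Not an R answer, but here is a rough sketch in Python of the kind of lookup described above (the file name, and the assumption that each CSV column is one set with its name in the header row, are mine); the same idea translates to R with which/sapply plus names:

import csv
from collections import defaultdict

# Map each value to the names of the sets (columns) it appears in.
value_to_sets = defaultdict(list)
with open("sets.csv", newline="") as f:    # hypothetical file name
    for row in csv.DictReader(f):
        for set_name, value in row.items():
            if value:                      # skip empty cells in shorter columns
                value_to_sets[value].append(set_name)

print("898774 found in:", value_to_sets.get("898774", []))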
I have a few XML documents in MarkLogic which have the structure
<abc:doc>
  <abc:doc-meta>
    <abc:meetings>
      <abc:meeting>
      </abc:meeting>
      <abc:meeting>
      </abc:meeting>
    </abc:meetings>
  </abc:doc-meta>
</abc:doc>
We can have more than one <abc:meeting> element under the <abc:meetings> element.
I am trying to write a cts:search query to get only documents that have more than one <abc:meeting> element in the document.
Please advise
This is tricky. Ideally, you'd want to drive searches from indexes for best performance. Unfortunately, MarkLogic doesn't keep track of element counts in its universal index, and aggregating counts from a range index can be cumbersome.
The overall simplest solution would be to add a count attribute on abc:meetings, and then add a range index on that. It does mean you'd have to change your data, and you'd have to keep that attribute in sync with each change.
You could also just search on the presence of abc:meeting with cts:element-query(), and append an XPath predicate to count the number of elements afterwards. Something like:
(: assumes the abc namespace prefix is declared in scope :)
cts:search(
  collection(),
  cts:element-query(xs:QName('abc:meeting'), cts:true-query())
)[count(.//abc:meeting) > 1]
If not many documents contain meetings, this might work fairly well for you, but it still requires pulling up all documents containing meetings, hence could be expensive.
I toyed with the idea of leveraging cts:near-query(), but that is driven by word positions, so it depends on the actual number of tokens inside a meeting. If that were always an exact number of tokens (unlikely, I'd guess), you could use the minimal-distance option on a double cts:element-query() wrapped in a cts:near-query(). It might help optimize the previous option a little, though.
The most performant option I can think of right now involves adding a User-Defined aggregate Function (UDF). It unfortunately means compiling C++ code. I happen to have written such a UDF in the past, which you should be able to use as-is after compilation and installation. For details see:
https://github.com/grtjn/doc-count-udf
and
http://docs.marklogic.com/guide/app-dev/aggregateUDFs
HTH!
It boils down to how many "a few" is. If it's thousands or fewer, then what grtjn presents above - a cts:search plus an XPath predicate - will work fine. If it's more, I'd add the count attribute to abc:meetings and then use a pre-commit trigger (e.g. on the collection of these documents) to ensure that the count attribute value is kept in sync. You'd need a range index to be able to query for "documents that have a count of meetings of 2 or greater".
Of course, if all you need to query on is whether there's more than one meeting, then just add a "multiple" attribute to abc:meetings with a value of "true". Then you don't need a range index - you can do a cts:element-attribute-value-query on abc:meetings and multiple="true".
I've found that sometimes a timestamp returned by a query in Google Sheets differs from the original value the query was based on.
At the online community I'm volunteering in, we use Google Forms to record volunteer hours. For our users to be able to verify their clock in/clock outs, we take the form responses with timestamps and filter them via a Query to only display those for one specific user:
=QUERY(A:F,"Select A,B,D where '"&J4&"'=F")
where J4 contains the username we are filtering for.
We calculate the row each stamp can be found in via a MATCH function, where M2:M is the range containing the timestamps the query above returns and A2:A holds the original timestamps.
=iferror(arrayformula(MATCH(M2:M,A2:A,0)+1),)
Now we found that sometimes, the MATCH failed even though we could verify that the timestamp in question existed. Some format wrangling later, we found the problem, illustrated for one example below:
The timestamp in question read 2/8/2018 4:12:47. Converted to a decimal, the value in column A turned into 43139.1755413195, while the very same time stamp in the query result read 43139.1755413194. The very last decimal, invisible unless you change the format to number and look at the formula line at the top of the sheet, has changed.
We have several different time stamps where the last decimal in the query result differs from the original the query is based on. Whether the last decimal in the query was one higher or lower than the original was inconsistent.
For our sheet, we have now implemented a workaround of truncating the number earlier. However, that seems very inelegant. Is there a more elegant solution, or a way to prevent (what we assume to be) rounding errors like this from happening? My searches of Google and the forums have not turned up anything like it, though I'm having trouble phrasing it in a way that gives me relevant hits.
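For illustration, here is the same comparison in Python (the values are copied from the example above; the tolerance is an assumption). The difference sits in the 15th-16th significant digit, right at the edge of double precision, which is why an exact comparison is fragile while a tolerant comparison behaves like the truncation workaround:

import math

original   = 43139.1755413195  # serial timestamp in column A
from_query = 43139.1755413194  # the same timestamp as returned by the query

print(original == from_query)                            # False: the last decimal differs
print(math.isclose(original, from_query, abs_tol=1e-9))  # True: a small tolerance absorbs it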
I have an element range index configured for an element in my database. I am trying to run a search query on that element. The element contains string values and I need to search for one particular string value (not a range of values or a date). Though both element-value and element-range queries can be used, and the index is already present, will both queries perform the same way, or does an element-range query perform better in this scenario?
The range query will be faster.
The element value query uses the universal index, and that isn't fully held in memory.
The range query uses a range index, and that is an in-memory index.
The range query will be much faster as your data grows. It will also be faster if you have a lot of unique terms in that element.
The range-query is also answering a different question from the value-query.
Value queries query for matching word sequences, not matching strings. By default they are stemmed too, so cts:element-value-query(xs:QName("x"),"be fine") will match <x>Is finer</x>. Unless you do an exact unstemmed unwildcarded value query, an unfiltered search will not be able to resolve space and punctuation differences, either.
Range queries (on strings) are matching strings under the rules of a particular collation.
I have a piece of software which takes in a database and uses it to produce graphs based on what the user wants (primarily queries of the form SELECT AVG(<input1>) AS x, AVG(<input2>) AS y FROM <input3> WHERE <key> IN (<vals..>) AND ...). This works nicely.
I have a simple script that is passed an (often large) number of files, each describing a row:
name=foo
x=12
y=23.4
....... etc.......
The script goes through each file, saving the variable names and an INSERT query for each. It then takes the variable names, sort | uniq's them, and makes a CREATE TABLE statement out of them (SQLite, amusingly enough, is OK with having all columns be NUMERIC, even if they actually end up containing text data). Once this is done, it executes the INSERTs (in a single transaction, otherwise it would take ages).
To improve performance, I added a basic index on each column. However, this increases the database size rather significantly, and only provides a moderate improvement.
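For context, a rough Python/sqlite3 sketch of the kind of loader and per-column indexing described above (the file pattern, table name and database name are made up):

import glob
import sqlite3

rows, columns = [], set()
for path in glob.glob("results/*.dat"):          # hypothetical input location
    row = {}
    with open(path) as f:
        for line in f:
            if "=" in line:
                key, _, value = line.strip().partition("=")
                row[key] = value
                columns.add(key)
    rows.append(row)

columns = sorted(columns)
db = sqlite3.connect("results.db")
db.execute("CREATE TABLE results (%s)" % ", ".join('"%s" NUMERIC' % c for c in columns))
# All INSERTs in one transaction: executemany plus a single commit.
db.executemany("INSERT INTO results VALUES (%s)" % ", ".join("?" * len(columns)),
               [tuple(r.get(c) for c in columns) for r in rows])
# One basic index per column, as described above.
for c in columns:
    db.execute('CREATE INDEX "idx_%s" ON results ("%s")' % (c, c))
db.commit()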
Data comes in three basic types:
single value, indicating things like program version, etc.
a few values (<10), indicating things like input parameters used
many values (>1000), primarily output data.
The first type obviously shouldn't need an index, since it will never be sorted upon.
The second type should have an index, because it will commonly be filtered by.
The third type probably shouldn't need an index, because it will be used in output.
It would be annoying to determine which type a particular value is before it is put in the database, but it is possible.
My question is twofold:
Is there some hidden cost to extraneous indexes, beyond the size increase that I have seen?
Is there a better way to index for filtration queries of the form WHERE foo IN (5) AND bar IN (12,14,15)? Note that I don't know which columns the user will pick, beyond the fact that they will be type 2 columns.
Read the relevant documentation:
Query Planning;
Query Optimizer Overview;
EXPLAIN QUERY PLAN.
The most important thing for optimizing queries is avoiding I/O. Tables with fewer than ten rows should not be indexed, because all the data fits into a single page anyway, and an index would just force SQLite to read another page.
Indexes are important when you are looking up records in a big table.
Extraneous indexes make table updates slower, because each index needs to be updated as well.
SQLite can use at most one index per table in a query.
This particular query could be optimized best by having a single index on the two columns foo and bar.
However, creating such indexes for all possible combinations of lookup columns is most likely not worth the effort.
If the queries are generated dynamically, the best idea probably is to create one index for each column that has good selectivity, and rely on SQLite to pick the best one.
And don't forget to run ANALYZE.
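As a small illustration of the composite-index and EXPLAIN QUERY PLAN points (Python's sqlite3; the table and column names follow the question's example):

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (foo NUMERIC, bar NUMERIC, x NUMERIC, y NUMERIC)")
# Single index covering both lookup columns.
db.execute("CREATE INDEX idx_foo_bar ON results (foo, bar)")
db.execute("ANALYZE")  # let the query planner gather statistics

# EXPLAIN QUERY PLAN shows whether the index is actually used.
for row in db.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT AVG(x) AS x, AVG(y) AS y FROM results "
        "WHERE foo IN (5) AND bar IN (12, 14, 15)"):
    print(row)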