How can I set a cap on the counts Sphinx search returns for a facet?
For example, I have a faceted search over 2100 products with 190 filters (price, color, etc.) and the result time is 0.004 seconds. Very good for me.
But there's something I've wondered about.
Faceted search example
Blue(1700)
Yellow(676)
Green(224)
I want this:
Blue(999+) <- Sphinx should count to a maximum of 1000 and no more. How can I do this? Is it possible?
Yellow(676)
Green(224)
Would this improve performance?
Thank you.
There is no setting for this.
And even if there was, I doubt it would improve performance. In the counting loop, Sphinx would have to add a conditional to check whether the counter is above the threshold and, if so, do nothing; otherwise increment. It's less work to just increment regardless.
max_matches affects the number of groups rather than the count within each group.
You can do this in application code for display purposes if you like.
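A minimal sketch of that application-side capping in Python (display_count and the facet data are made up for illustration, not a Sphinx API):

```python
# Hypothetical helper: cap a facet count for display, e.g. 1700 -> "999+"
def display_count(count, cap=999):
    return f"{cap}+" if count > cap else str(count)

facets = {"Blue": 1700, "Yellow": 676, "Green": 224}
for color, n in facets.items():
    print(f"{color}({display_count(n)})")  # Blue(999+), Yellow(676), Green(224)
```

Sphinx still does the full count; only the presentation changes, which matches the point above that skipping the increment would not save any work.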
You can certainly set a limit on the number of results returned by Sphinx. With PHP, I can refer you to this:
http://php.net/manual/en/sphinxclient.setlimits.php
Otherwise, check out the setLimits() function in the Sphinx API here: http://sphinxsearch.com/docs/current.html
Related
Say I have a data frame in R with 500 player records and the following columns:
PlayerID
TotalRuns
RunRate
AutionCost
Now, out of the 500 players, I want my code to give me multiple combinations of 3 players that satisfy the following criteria. Something like a Moneyball problem.
The sum of auction cost of all the 3 players shouldn't exceed X
They should have a minimum of Y TotalRuns
Their RunRate must be higher than the average run rate of all the players.
Kindly help with this. Thank you.
There are choose(500, 3) ways to choose 3 players, which is 20,708,500. It's not impossible to generate all these combinations; combn might do it for you, but I couldn't be bothered waiting to find out. If you generate them from player IDs and then test your three conditions, that's one way to solve your problem. An alternative is a Monte Carlo method: select three players that initially satisfy your conditions, then randomly select another player who doesn't belong to the current trio; if he satisfies the conditions, save the combination and repeat. If you're optimizing (it's not clear, but your question has optimization in the tags), then the new player has to produce a trio that's better than the last, so if he doesn't improve your objective function (whatever it might be), you don't accept the trade.
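For the brute-force route, here is a rough sketch (in Python rather than R, but the same idea) with invented sample data. Note one assumption: I'm reading the runs and cost criteria as applying to the trio combined, and the run-rate criterion as applying to each player; adjust if the question means otherwise.

```python
from itertools import combinations
from statistics import mean

# Invented sample data: (player_id, total_runs, run_rate, auction_cost)
players = [
    (1, 500, 8.1, 10), (2, 450, 7.2, 12), (3, 620, 9.0, 15),
    (4, 300, 6.5, 5),  (5, 510, 8.4, 9),  (6, 410, 7.9, 7),
]
X, Y = 30, 1200                      # assumed cost cap and minimum combined runs
avg_rate = mean(p[2] for p in players)

valid = [
    trio for trio in combinations(players, 3)
    if sum(p[3] for p in trio) <= X          # combined auction cost under X
    and sum(p[1] for p in trio) >= Y         # combined TotalRuns at least Y
    and all(p[2] > avg_rate for p in trio)   # each RunRate above the overall average
]
```

With the real 500-player pool this is ~20.7 million trios to test, which is slow but feasible in one pass.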
choose(500,3)
Shows there are almost 21,000,000 combinations of 3 players drawn from a pool of 500, which means a complete analysis of the entire search space ought to be doable in a reasonable time on a modern machine.
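For what it's worth, the same figure can be sanity-checked outside R:

```python
import math

# Number of ways to pick 3 players from a pool of 500
print(math.comb(500, 3))  # 20708500
```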
You can generate the indices of these combinations using iterpc() and getnext() from the iterpc package, as in:
# library(iterpc) # uncomment if not loaded
I <- iterpc(500, 3)
getnext(I)
You can also drastically cut the search space in a number of ways: set up initial filtering criteria, take the first solution (a while loop whose condition is meeting the criterion), rank-order all solutions (loop through all combinations), or something in between where you collect n solutions. Preprocessing can also reduce the search space. For example, ordering auction costs in ascending order first will give you the cheapest solution first, and ordering the file by descending runs will give you the highest-runs solutions first.
NOTE: While this works fine, I see that iterpc is now superseded by the arrangements package, where the relevant iterator is icombinations(). getnext() is still the method for advancing the iterator.
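If you happen to be outside R, Python's itertools.combinations gives the same lazy, one-combination-at-a-time iteration as iterpc()/getnext():

```python
from itertools import combinations

# Lazily iterate index triples drawn from 1..500, in lexicographic order
combos = combinations(range(1, 501), 3)
print(next(combos))  # (1, 2, 3)
print(next(combos))  # (1, 2, 4)
```

Because the iterator is lazy, an early-exit search (stop at the first trio meeting the criteria) never has to materialize all 20.7 million combinations.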
Thanks, I used a combination of both John's and James's answers.
Filtered out all the players who don't satisfy the criteria, which boiled the pool down to just over 90 players.
Then I picked players at random until all the variations were exhausted.
Finally, I computed combined metrics for each variation (set) of players to arrive at the optimal set.
The code is a bit messy, so I won't post it here.
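For anyone curious, a rough hypothetical reconstruction of that pipeline in Python, with invented data and an assumed combined metric (runs per unit of auction cost); once the pool is pre-filtered to ~90 players, full enumeration of choose(90, 3) is cheap enough that random sampling isn't even needed:

```python
from itertools import combinations

# Invented, already pre-filtered pool: (player_id, total_runs, auction_cost)
pool = [
    (1, 500, 10), (2, 450, 12), (3, 620, 15), (4, 300, 5), (5, 510, 9),
]

def score(trio):
    # Assumed combined metric: total runs per unit of auction cost
    return sum(p[1] for p in trio) / sum(p[2] for p in trio)

best = max(combinations(pool, 3), key=score)
```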
I'd like to have a Calculated Column in a table that counts the instances of a concatenation.
I get the following error when inputting Abs(Count([concat])) as the column formula for the calculation: The expression Abs(Count([concat])) cannot be used in a calculated column.
Is there any other way to do it without a query? I'm pretty sure it can't be done, but I figured I'd ask anyway since I didn't see any other posts about it.
No, and even if there were, you should create and use a query for this.
Besides, applying Abs to a count doesn't make much sense, as a count cannot be negative.
I'm trying to find an easy formula to do the following:
=IF(AND(H6="OK";H7="OK";H8="OK";H9="OK";H10="OK";H11="OK";);"OK";"X")
This actually works, but I want to apply it to a range of cells within a column (H6:H11) instead of having to create a rule for each and every one of them. Trying it as a range:
=IF(AND(H6:H11="OK";);"OK";"X")
Does not work.
Any insights?
Thanks.
=ArrayFormula(IF(AND(H6:H11="OK");"OK";"X"))
also works
Array formulas work the same way they do in Excel; they just need ArrayFormula() around them to work (it is added automatically when pressing Ctrl+Shift+Enter, like in Excel).
In google sheets the formula is:
=ArrayFormula(IF(SUM(IF(H6:H11="OK";1;0))=6;"OK";"X"))
in excel:
=IF(SUM(IF(H6:H11="OK";1;0))=6;"OK";"X")
And confirm with Ctrl-Shift-Enter
This basically counts the number of times the range equals the criterion and compares it to the expected number. So if the range is extended, increase the number 6 to match.
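The counting logic above, restated in Python for clarity (all_ok is just an illustration of what the formula computes):

```python
# Mirror of the sheet logic: count the matching cells, then compare
# the tally to the size of the range.
def all_ok(cells):
    return "OK" if sum(c == "OK" for c in cells) == len(cells) else "X"

print(all_ok(["OK"] * 6))          # the H6:H11 all-OK case
print(all_ok(["OK", "X", "OK"]))   # any non-OK cell yields "X"
```

Deriving the expected count from len(cells) rather than hard-coding 6 is exactly the adjustment the note above describes.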
I was wondering what the most efficient way is to get the available articles for a given NNTP group. The method I have implemented works as follows:
(i) Select the group:
GROUP group.name.subname
(ii) Get a list of article numbers from the group (pushed back into a vector 'codes'):
LISTGROUP
(iii) Loop over codes and grab articles (e.g. headers)
for code in codes do
HEAD code
end
However, this doesn't scale well with large groups with many article codes.
In RFC 3977, the GROUP command is indicated as also returning the 'low' and 'high' article numbers. For example,
[C] GROUP misc.test
[S] 211 1234 3000234 3002322 misc.test
where 3000234 and 3002322 are the low and high numbers. I'm therefore thinking of using these instead of initially pushing back all article numbers. But can these numbers be relied upon? Is 3000234 definitely the first article number in the selected group, and is 3002322 definitely the last, or are they just estimates?
Many thanks,
Ben
It turns out I was thinking about this all wrong. All I need to do is
(i) set the group using GROUP
(ii) execute the NEXT command followed by HEAD for however many headers I want (up to count):
for c : count do
articleId <-- NEXT
HEAD articleId
end
EDIT: I'm sure there must be a better way, but until anyone suggests otherwise I'll assume this is the most effective. Cheers.
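Incidentally, if you do want the low/high numbers, pulling them out of the GROUP response line from RFC 3977 is straightforward; a minimal Python sketch (parse_group_response is a hypothetical helper operating on the raw response text):

```python
# Parse an RFC 3977 GROUP response of the form "211 count low high group"
def parse_group_response(line):
    code, count, low, high, name = line.split()
    if code != "211":
        raise ValueError(f"unexpected GROUP response: {line}")
    return int(count), int(low), int(high), name

print(parse_group_response("211 1234 3000234 3002322 misc.test"))
```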
I'd like to find patterns in a hex file I have and sort them by number of occurrences.
I'm not looking for a specific pattern, just to gather statistics on the occurrences in the file and sort them.
DB0DDAEEDAF7DAF5DB1FDB1DDB20DB1BDAFCDAFBDB1FDB18DB23DB06DB21DB15DB25DB1DDB2EDB36DB43DB59DB32DB28DB2ADB46DB6FDB32DB44DB40DB50DB87DBB0DBA1DBABDBA0DB9ADBA6DBACDBA0DB96DB95DBB7DBCFDBCBDBD6DB9CDBB5DB9DDB9FDBA3DB88DB89DB93DBA5DB9CDBC1DBC1DBC6DBC3DBC9DBB3DBB8DBB6DBC8DBA8DBB6DBA2DB98DBA9DBB9DBDBDBD5DBD9DBC3DB9BDBA2DB84DB83DB7DDB6BDB58DB4EDB42DB16DB0DDB01DB02DAFCDAE9DAE5DAD9DAE2DAB7DA9BDAA6DA9EDAAADAC9DACADAC4DA92DA90DA84DA89DA93DAA9DA8CDA7FDA62DA53DA6EDA
That's an excerpt of the HEX file, and as an example I'd like to get:
XX occurrences of BDBDBD
XX occurrences of B93D
Is there a way to mine the file to generate that output?
Sure. Use a sliding window to create the counts (the link is for Perl, but it's general enough to understand the algorithm). Your patterns are called N-grams. You will have to cap the maximal pattern length, though.
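A minimal sliding-window counter in Python (hexdata here is just a short excerpt of the string above; pick n as your maximal pattern length):

```python
from collections import Counter

# Slide a window of length n across s, counting every substring seen
def ngram_counts(s, n):
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

hexdata = "DB0DDAEEDAF7DAF5DB1FDB1DDB20DB1B"
for gram, count in ngram_counts(hexdata, 4).most_common(3):
    print(f"{count} occurrences of {gram}")
```

To cover all pattern lengths up to some maximum, run this once per n and merge; that is the memory/processor trade-off mentioned below.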
This is a pretty classic CS problem. The code in general is non-trivial to implement, as it requires at least one full pass over the sequence and, depending on your efficiency and memory/processor constraints, might require several. See here.
You will need to partition your input string in some way to ensure that you get a good subsequence across it.
If there is a specific problem we might be able to help more, but the general strategy is in the Wikipedia article above.
You can use Regular Expressions to make a pattern to search for.
The regex needed would be very simple: just use the exact phrase you're searching for. Then the language you're using (you didn't specify) should have a regular-expression function that can count the number of matches.
Use that to create a simple counter.
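For example, in Python (note that re.findall counts non-overlapping matches; a lookahead is needed if you want overlapping occurrences counted too):

```python
import re

s = "DB0DDB1FDB0DDB0D"
print(len(re.findall("DB0D", s)))         # non-overlapping matches: 3
# Overlapping occurrences need a zero-width lookahead:
print(len(re.findall("(?=AA)", "AAAA")))  # 3, versus 2 non-overlapping
```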