What does it mean "IDF is just dependent on the term"? - information-retrieval

Is it possible for someone to explain "Tf is dependent on term and document" and "IDF is just dependent on the term" with an example?

Suppose that we have these two documents:
d_1: "Tf is dependent on term and document"
d_2: "IDF is just dependent on the term"
The count of terms in each document is as follows:
d_1:
{Tf: 1, is: 1, dependent: 1, on: 1, term: 1, and: 1, document: 1}
d_2:
{IDF: 1, is: 1, just: 1, dependent: 1, on: 1, the: 1, term: 1}
The term frequencies (i.e., the ratio of times that term t appears in document d to the total count of terms of that document) for term "on" are:
tf(on, d_1) = 1 / 7
tf(on, d_2) = 1 / 7
To calculate the term frequency of a term, you must specify which document you are talking about. tf(on, d_1) = 1/7 tells you that 1/7 of all words in d_1 are "on".
The inverse document frequency (the logarithm of the ratio of the total number of documents to the number of documents that include the word "on") is:
idf(on) = log(2/2) = 0
As you see, the idf is constant for all documents in this corpus of two documents. It's just a measure of how common a term is in a set of documents. idf(on) = 0 tells you that "on" is not special at all and it appears in all documents.
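The calculation above can be reproduced with a short Python sketch. The function names tf and idf and the whitespace tokenization are assumptions made for illustration; the idf here is the plain log(N / n_t) used in the example, and it assumes the term occurs in at least one document.

```python
import math

# The two example documents from the text, split on whitespace.
docs = {
    "d_1": "Tf is dependent on term and document".split(),
    "d_2": "IDF is just dependent on the term".split(),
}

def tf(term, doc_id):
    """Fraction of the words in one specific document that equal `term`."""
    words = docs[doc_id]
    return words.count(term) / len(words)

def idf(term):
    """log(N / n_t): needs only the term and the corpus, never a single document."""
    containing = sum(1 for words in docs.values() if term in words)
    return math.log(len(docs) / containing)

print(tf("on", "d_1"), tf("on", "d_2"))  # tf takes a term AND a document
print(idf("on"))                         # idf takes only a term; "on" is in every document
print(idf("just"))                       # "just" is in one of two documents, so idf > 0
```

Note that tf has two parameters while idf has one; that difference in the signatures is exactly the "depends on term and document" versus "depends just on the term" distinction.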

Related

Parameterization with user defined functions, KQL

I would like to add a variable into a function in KQL as in the following. Any idea how to do it? I tried a str_concat() but that threw an error message. Some other ideas? This function should return the regex pattern inside the function, which in this case is just the word after the one specified in X.
let MyFunction = (X: string, arg1: string){extract("{X}\\s([^\\s]+", 1, Y)};
datatable(Num:int, Message: string)[
1, "Ye who in rhymes dispersed the echoes hear",
2, "Of those sad sighs with which my heart I fed",
3, "When early youth my mazy wanderings led",
4, "Fondly different from what I now appear"]
extend
My = MyFunction("my", Message),
Preposition = MyFunction("[Of|in|from]", Message)
You can use a verbatim string (@"...") so there is no need for double escaping (\\).
A capture group (the part that you want to extract) is defined by parentheses ((...)).
The regex OR operator | should be used like X|Y|Z; [] is used to define a set of characters.
By using [Of|in|from] you basically defined a set of characters that includes the characters O, f, |, i, n, r, o & m.
Use a non-capturing group ((?:...)) to separate the argument from the text to its right without treating it as a capture group. The code below uses (?i:...), which additionally makes the argument match case-insensitively.
let MyFunction = (X: string, Y: string){extract(strcat("(?i:",X, @")\s+(\S+)"), 1, Y)};
datatable(Num:int, Message: string)[
1, "Ye who in rhymes dispersed the echoes hear",
2, "Of those sad sighs with which my heart I fed",
3, "When early youth my mazy wanderings led",
4, "Fondly different from what I now appear"]
| extend My = MyFunction("my", Message), Preposition = MyFunction("of|in|from", Message)
Num | Message                                      | My    | Preposition
----|----------------------------------------------|-------|------------
1   | Ye who in rhymes dispersed the echoes hear   |       | rhymes
2   | Of those sad sighs with which my heart I fed | heart | those
3   | When early youth my mazy wanderings led      | mazy  |
4   | Fondly different from what I now appear      |       | what
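The regex points above can be checked outside KQL with a small Python sketch. This is only an analogy: Python's re module is not the engine KQL uses, and word_after is a hypothetical helper name, but the group and character-class behaviour shown here is the same for this pattern.

```python
import re

def word_after(x, text):
    # (?i:...) is a case-insensitive, non-capturing group around the argument;
    # (\S+) is the only capture group, so group(1) is the word that follows.
    match = re.search(r"(?i:" + x + r")\s+(\S+)", text)
    return match.group(1) if match else ""

line1 = "Ye who in rhymes dispersed the echoes hear"
line2 = "Of those sad sighs with which my heart I fed"

print(word_after("my", line2))          # heart
print(word_after("of|in|from", line2))  # those
print(word_after("of|in|from", line1))  # rhymes

# With [Of|in|from] the brackets define a set of single characters, so the
# first match is the "o" at the end of "who", and the captured word is "in".
char_class_result = re.search(r"[Of|in|from]\s(\S+)", line1).group(1)
print(char_class_result)                # in
```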

Robot in a Grid - how to get all possible paths

I'm trying to solve this problem:
There is a grid with r rows and c columns. A robot sitting in the top left cell can only move in 2 directions, right and down. But certain cells have to be avoided and the robot cannot step on them. Find a path for the robot from the top left to the bottom right.
The problem specifically asks for a single path, and that seems straightforward:
With the grid as boolean[][], the pseudocode I have is:
List<String> path = new ArrayList<String>()
boolean found = false
void getPath(r, c) {
    if (found)
        return
    if ((r or c is out of bounds) || !grid[r][c])
        return
    if (r == 0 AND c == 0)  // we reached the start cell
        found = true
    getPath(r - 1, c)
    getPath(r, c - 1)
    if (found)  // record only cells that lie on the successful path
        path.add("(" + r + ", " + c + ")")
}
Though I was wondering how I can get all the possible paths (NOT just the count, but the paths themselves as well). Note that the grid has r rows and c columns, so it's not an n x n grid. I'm trying to think of a DP/recursive solution but am stuck and unable to come up with one. It's hard to think when the recursion goes in two ways.
Any pointers? And also any general help on how to "think" about such problems would be appreciated :).
Approach to the problem:
Mentally construct the graph G of the problem. In this case the vertices are the cells of the grid, and there is a directed edge wherever a valid robot move exists.
Search for properties of G. In this case G is a DAG (Directed Acyclic Graph).
Use such properties to come up with a solution. In this case (G is a DAG) it's common to use topological sort and dynamic programming to find the number of valid paths.
Actually, you don't need to construct the graph, since the set of edges is clear, nor do you need an explicit topological sort: the usual iteration over the matrix (increasing row index, then increasing column index) already visits the cells in a topological order of this implicit graph.
The dynamic programming part can be solved by storing in each cell [x][y] the number of valid paths from [0][0] to [x][y] and checking where to move next.
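Recurrence (terms with an index out of range count as 0):
dp[0][0] = 1 if cell [0][0] is free
dp[x][y] = 0 if cell [x][y] must be avoided
dp[x][y] = dp[x - 1][y] + dp[x][y - 1] otherwise
After the computation, the answer is stored in dp[n - 1][m - 1], where n is the number of rows and m is the number of columns. Overall runtime is O(nm).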
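A minimal Python sketch of filling the dp matrix in one row-by-row pass, assuming the grid is given as a matrix bad in which True marks a cell to avoid. The function name count_paths is hypothetical; the rule used is that a free cell receives the sum of the counts of the cell above it and the cell to its left.

```python
def count_paths(bad):
    """Return dp, where dp[x][y] is the number of valid paths from (0, 0) to (x, y)."""
    n, m = len(bad), len(bad[0])
    dp = [[0] * m for _ in range(n)]
    # Row-major iteration is already a topological order of the implicit DAG.
    for x in range(n):
        for y in range(m):
            if bad[x][y]:
                continue  # blocked cell: no path can end here
            if x == 0 and y == 0:
                dp[x][y] = 1  # the start cell is reached by the empty path
                continue
            from_above = dp[x - 1][y] if x > 0 else 0
            from_left = dp[x][y - 1] if y > 0 else 0
            dp[x][y] = from_above + from_left
    return dp

grid = [[False, False, False, False],
        [True,  True,  False, False],
        [False, False, False, False]]
dp = count_paths(grid)
print(dp[-1][-1])  # number of valid paths to the bottom right cell
```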
How to find all possible valid paths:
Usual backtracking works, and we can speed it up by pruning early. In fact, if we calculate the dp matrix first and then backtrack from cell [n - 1][m - 1], we can abandon an invalid path as soon as the robot enters a cell whose dp value is zero.
Python code with the dp matrix calculated beforehand:
n, m = 3, 4
bad = [[False, False, False, False],
       [ True,  True, False, False],
       [False, False, False, False]]
dp = [[1, 1, 1, 1],
      [0, 0, 1, 2],
      [0, 0, 1, 3]]
paths = []
curpath = []

def getPath(r, c):
    # Check bounds first: a negative index would silently wrap around in Python.
    if r < 0 or c < 0 or dp[r][c] == 0:
        return
    curpath.append((r, c))
    if r == 0 and c == 0:
        # Store a reversed copy so later pops do not alter the saved path.
        paths.append(list(reversed(curpath)))
    getPath(r - 1, c)
    getPath(r, c - 1)
    curpath.pop()

getPath(n - 1, m - 1)
print(paths)
# valid paths are [[(0, 0), (0, 1), (0, 2), (0, 3), (1, 3), (2, 3)],
#                  [(0, 0), (0, 1), (0, 2), (1, 2), (1, 3), (2, 3)],
#                  [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 3)]]
Notice that this is very similar to your code; the differences are that all valid paths must be stored together and that each appended list must be a copy of curpath, to avoid ending up with a list of empty lists.
Runtime: O((n + m) * (number of valid paths)), since every simulated robot move either belongs to a valid path or is the first step into an invalid path, which is detected immediately using foresight (dp). Warning: this method is exponential, as the number of valid paths can itself be exponential (an obstacle-free grid has C(n + m - 2, n - 1) of them).

Prolog methods with certain conditions

I am trying to write a function that gets new values on each recursive call and, if it fails, increases the numbers and tries again until the number has reached 6 or 1.
The important part here is that it should go to the second part only if it has succeeded once, and then increase or lower the number. Once it has succeeded and increased/lowered the number, it should check whether the function try_to_do_math works with the new number; if it doesn't, it should increase or lower the number again, until either it succeeds or the number reaches 6 or 1.
It should also decide whether to add or subtract by checking the value of higher_or_lower.
?- iterate_over(1, 1, Z, X).
iterate_over(X,Y,Z,higher_or_lower):-
    try_to_do_math(X,Y,X1,Y1,high_low),
    iterate_over(X1, Y1,1,high_low).
iterate_over(X,Y,Z,higher_or_lower):-
    higher_or_lower = 1, %Go higher
    Z = 1,
    X1 is X+1, X=<6,
    Y1 is Y+1, Y=<6,
    iterate_over(X1, Y1, 0,higher_or_lower).
iterate_over(X,Y,Z,higher_or_lower):-
    higher_or_lower = 2, %Go lower
    Z = 1,
    X1 is X-1, 1=< X,
    Y1 is Y-1, 1=< Y,
    iterate_over(X1, Y1, 0,higher_or_lower).
I am using variable Z to check if it succeeded before or not. If it is 1, that means it has succeeded and if it is _ empty then it has not.
With this code it always fails on Z=1 because it does not have the value 1 when it reaches there to check it.
If try_to_do_math does not succeed on the first try then nothing should happen and false is returned.
If try_to_do_math succeeds the first time then it gives back new X1 and Y1 and higher/lower. And then I tried to pass on the variable Z as the state variable that is changed to 1 just on calling it again here: iterate_over(X1, Y1,1,high_low).
EDIT
I will try to do an example that would succeed.
Let's say the correct one is X = 5 and Y = 5.
First ?- iterate_over(1, 1, Z, X). is called.
Then it goes to try_to_do_math(1, 1, X, Y, Z) and gives back try_to_do_math(1, 1, 3, 3, 0).
Then it should try it again by calling ?- iterate_over(3, 3, 1, 0).. Now it tries to call the try_to_do_math(3,3,X1,Y1,higher_or_lower), and it fails.
Now it should go to the lower iterate_over methods with values iterate_over(3, 3, 1, 0), but this is where it does not work as I want it to. The method is called with numbers iterate_over(3, 3, _, 0) and since 1 != _ it all fails.
I am not sure if this is helpful, but I tried to explain it the way I see it.
Also, if it succeeds then I do not care about the response, as try_to_do_math will only be changing things.

How to use predicate exactly in MiniZinc

New MiniZinc user here ... I'm having a problem understanding the syntax of the counting constraint:
predicate exactly(int: n, array[int] of var int: x, int: v)
"Requires exactly n variables in x to take the value v."
I want to make sure each column in my 10r x 30c array has at least one each of 1,2 and 3, with the remaining 7 rows equal to zero.
If I declare my array as
array[1..10,1..30] of var 0..3: s;
how can I use predicate exactly to populate it as I need? Thanks!
Well, the "exactly" constraint is not so useful here since you want at least one occurrence of 1, 2, and 3. It's better to use for example the count function:
include "globals.mzn";
array[1..10,1..30] of var 0..3: s;
solve satisfy;
constraint
  forall(j in 1..30) (
    forall(c in 1..3) (
      count([s[i,j] | i in 1..10], c) >= 1
    )
  )
;
output [
  if j = 1 then "\n" else " " endif ++
  show(s[i,j])
  | i in 1..10, j in 1..30
];
You don't have to do anything about 0, since the domain is 0..3 and all values that are not 1, 2, or 3 must be 0.
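To make the meaning of the count constraint concrete, here is a small Python sketch that checks a candidate grid against the same condition: every column must contain each of 1, 2 and 3 at least once. This is only a checker written for illustration, not a solver and not part of MiniZinc; the name column_counts_ok and the tiny 4 x 2 grids are assumptions.

```python
def column_counts_ok(grid, required=(1, 2, 3)):
    """True if every column contains each required value at least once."""
    n_cols = len(grid[0])
    for j in range(n_cols):
        column = [row[j] for row in grid]
        # Same test as count([s[i,j] | i in rows], c) >= 1 for each c.
        if any(column.count(v) < 1 for v in required):
            return False
    return True

good = [[1, 3],
        [2, 1],
        [3, 2],
        [0, 0]]
bad_grid = [[1, 3],
            [2, 1],
            [0, 2],   # the first column never takes the value 3
            [0, 0]]

print(column_counts_ok(good))      # True
print(column_counts_ok(bad_grid))  # False
```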
Another option is the "at_least" constraint; see https://www.minizinc.org/2.0/doc-lib/doc-globals-counting.html .
If you haven't read the MiniZinc tutorial (https://www.minizinc.org/downloads/doc-latest/minizinc-tute.pdf), I strongly advise you to. The tutorial teaches you how to think in Constraint Programming terms and, of course, MiniZinc.

Prolog: ID number mapping to a list

I have a variable X that may contain multiple values: X = 1; X = 4; X = 7...
These values map to a list that contains x, y, z, or w. Each of these value/list pairs is stored as a separate fact, so I could have:
map(2,[x,y]).
map(3,[x]).
map(9,[y,w]).
I'm trying to write a program that, given X, looks up these lists and counts how many occurrences of x, y, z, or w there are.
This is my attempt:
count(A,B,C,D,X) :- A = 0, B = 0, C = 0, D = 0,
    check_list(X,x,A),
    check_list(X,y,B),
    check_list(X,z.C),
    check_list(X,w,D).
check_list(X,Element,Counter) :-
    map(X, List),
    member(List, Element),
    S is Counter + 1,
    Counter = S.
The idea behind my program is I call check_list to check if there is a member that contains x,y,z,w for every possible value of X. If there is that member, I will increment the counter. I then want the values of A,B,C,D to have A = number of occurrences of x, B = number of occurrences of y, etc etc.
You are using Prolog variables incorrectly. Variables cannot change their values once they are instantiated unless Prolog backtracks to a choice-point previous to the instantiation. For example, in the rule for count/5 you unify A with zero and then you expect that satisfying check_list(X,x,A) will bind A to the number of occurrences of x, but A is no longer a free variable at that point.
So, you have to remove A = 0, ..., D = 0 from the first rule.
Next, you need a predicate that can be used to find the number of occurrences of an element in a list. You can use findall/3 for that:
occurrences(X, List, N):- findall(_, member(X, List), O), length(O, N).
Or you can write it yourself:
occurrences(_, [], 0).
occurrences(X, [X|Tail], N):-!, occurrences(X, Tail, N1), N is N1 + 1.
occurrences(X, [_|Tail], N):-occurrences(X, Tail, N).
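For comparison, the overall goal (looking up the lists for several values of X and totalling how often x, y, z and w occur) can be sketched in Python. This is an analogue rather than a translation of the Prolog above: the dictionary MAP stands in for the map/2 facts from the question, and count_symbols is a hypothetical name.

```python
from collections import Counter

# Stand-in for the facts map(2,[x,y]). map(3,[x]). map(9,[y,w]).
MAP = {2: ["x", "y"], 3: ["x"], 9: ["y", "w"]}

def count_symbols(ids):
    """Return the number of occurrences of x, y, z and w over the given ids."""
    counts = Counter()
    for i in ids:
        # Ids without a fact contribute nothing, like a failed map/2 lookup.
        counts.update(MAP.get(i, []))
    return [counts[s] for s in ("x", "y", "z", "w")]

print(count_symbols([2, 3, 9]))  # [2, 2, 0, 1]
```

The counts are built up in a fresh accumulator and returned once, which mirrors the advice above: a result is bound a single time instead of being "incremented" after it already has a value.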

Resources