Product of two permutations (cycles) in Prolog - recursion

I would like to find the product of two permutations in Prolog (in cycle form) and I'm having problems with it (mostly because I can't even imagine what it will look like).
I thought about changing these permutations into another representation, but I'm not sure which way is right.
So please help me; any hint is greatly appreciated.
%permutationproduct(+P1,+P2,-Result)
EDIT: By the product of permutations I mean the usual composition of permutations (but our inputs are in cycle notation, which makes the project more difficult). The inputs are two permutations (P1, P2) and the expected result is the third parameter, the product of the permutations.
And I'm actually working on a bigger project, this is just one part of it, but as mentioned, I can't even start it, because I can't imagine it.
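One hint, sketched below: convert each permutation from cycle form into a flat list of From-To pairs, compose the two mappings over the union of their elements, and read the result back into cycles. This is only a minimal sketch, assuming SWI-Prolog (it uses library(pairs) and library(lists)) and inputs like [[1,2,3],[4,5]]; it applies P1 first and then P2, which is one of two opposite conventions, so check which one your course expects.

% cycles_pairs(+Cycles, -Pairs): [[1,2,3]] becomes [1-2,2-3,3-1]
cycles_pairs([], []).
cycles_pairs([Cycle|Cycles], Pairs) :-
    cycle_pairs(Cycle, CyclePairs),
    cycles_pairs(Cycles, RestPairs),
    append(CyclePairs, RestPairs, Pairs).

cycle_pairs([First|Rest], Pairs) :-
    cycle_pairs_(Rest, First, First, Pairs).

cycle_pairs_([], Last, First, [Last-First]).
cycle_pairs_([Next|Rest], Prev, First, [Prev-Next|Pairs]) :-
    cycle_pairs_(Rest, Next, First, Pairs).

% apply_perm(+Pairs, +X, -Y): X maps to Y; fixed points map to themselves
apply_perm(Pairs, X, Y) :-
    (   memberchk(X-Y0, Pairs) -> Y = Y0 ; Y = X ).

%permutationproduct(+P1,+P2,-Result): apply P1 first, then P2
permutationproduct(P1, P2, Result) :-
    cycles_pairs(P1, Pairs1),
    cycles_pairs(P2, Pairs2),
    pairs_keys(Pairs1, Dom1),
    pairs_keys(Pairs2, Dom2),
    append(Dom1, Dom2, Dom0),
    sort(Dom0, Domain),
    findall(X-Z,
            ( member(X, Domain),
              apply_perm(Pairs1, X, Y),
              apply_perm(Pairs2, Y, Z) ),
            Composed),
    pairs_cycles(Composed, Domain, Result).

% read the composed mapping back into cycles, dropping fixed points
pairs_cycles(_, [], []).
pairs_cycles(Pairs, [X|Xs], Result) :-
    walk_cycle(Pairs, X, X, Cycle),
    subtract(Xs, Cycle, Rest),
    (   Cycle = [_] -> Result = Cycles ; Result = [Cycle|Cycles] ),
    pairs_cycles(Pairs, Rest, Cycles).

walk_cycle(Pairs, Start, X, [X|Cycle]) :-
    apply_perm(Pairs, X, Y),
    (   Y == Start -> Cycle = [] ; walk_cycle(Pairs, Start, Y, Cycle) ).

% ?- permutationproduct([[1,2]], [[2,3]], R).
% R = [[1,3,2]]  (under the P1-then-P2 convention)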

Related

Find the number of permutations where no two identical elements sit next to each other

I need some help with this problem I've been facing.
Suppose I have an array = [3,4,1,5,6,1,3].
Now I need the number of permutations in which the duplicate element 3 does not sit beside the other 3, and the same for 1.
How am I supposed to solve this? I've watched a ton of YouTube videos and googled it, but no luck.
Thanks in advance for the help.
Are you looking for a general-case solution or just for that particular array? If you are looking for the more general case, I think you should specify the restrictions, or the problem becomes too complex. The same applies if you want to write code. Some languages (like Python) have libraries that make this work relatively simple, but the time complexity can get ugly.
Here's a mathematical approach to the problem:
Step 1: Suppose all the elements are different: a = [3,4,5,6,1]
In this case we will have 5! different options (you have 5 options to choose the first element, 4 options to choose the second, and so on).
Step 2: Suppose you have one repeated element: a = [1,3,4,5,6,1]
In this case we have 6!/2! different options (6! comes from Step 1, and we divide it by 2! because swapping the two repeated elements with each other does not change the array).
Now you want to exclude the options where the repeated elements appear next to each other. The trick is to treat them as one element. So now we have a = [(1,1), 3, 4, 5, 6]. There are 5! different options. We subtract this from the total; that is, 6!/2! - 5! = 360 - 120 = 240 gives the answer.
Step 3: (your case) Two repeated elements: a = [3,4,1,5,6,1,3]
We continue with the same logic. In total we have 7!/(2!x2!) options. Following Step 2, if we want to exclude the cases where 1 appears next to 1, we have to subtract 6!/2! from the total (gluing the two 1s leaves 6 items, among which 3 is still repeated). The element 3 also appears twice, so we subtract another 6!/2!. Unfortunately, we have now subtracted some cases twice (can you guess which?). If we find the cases we subtracted twice and add them back, we get the answer.
The cases we subtracted twice are those where 1 comes next to 1 at the same time as 3 comes next to 3, that is a = [(1,1),4,5,6,(3,3)]. We subtracted those options once for the ones and once for the threes. There are 5! cases like that (can you guess why?).
To sum it up: 7!/(2!x2!) - 2x(6!/2!) + 5! = 1260 - 720 + 120 = 660.
If you are not looking for a general solution, these numbers are not big, so you can write brute-force code (to save some time/space, convert the array to a string); a quick check is sketched below.
I might have missed something in the calculations, but if you follow the logic you will get the answer. Also, if you want to understand why these things work, try them with small data to get the intuition. If you need code, let me know and I will update the solution.
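For reference, a minimal brute-force check of that count in Python (standard library only; the array is the one from the question):

from itertools import permutations

a = [3, 4, 1, 5, 6, 1, 3]

def no_adjacent_duplicates(p):
    return all(x != y for x, y in zip(p, p[1:]))

# set() collapses arrangements that merely swap identical elements,
# which is what dividing by 2!x2! does in the formula above
distinct = set(permutations(a))
print(sum(1 for p in distinct if no_adjacent_duplicates(p)))  # prints 660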

My looping code in Prolog won't work with the rest of my code

I am working on a school assignment where I have to find the smallest hitting set of a list of lists. Right now I am working on finding all the hitting sets before I narrow it down to the smallest one. I implemented a way for my code to find out if two lists have at least one number in common. To my knowledge it works as intended by itself but whenever I try to connect it to the main part of my code it won't work or runs indefinitely. Any help would be appreciated.
Minimal, Reproducible Example:
This code has been made in a way that it returns true any time two lists have at least one number in common. I have tested this code and to my knowledge it works as intended; the first result back was always correct when I tested it.
checkForIntersection([], []) :- false.
checkForIntersection([Head|Tail], [Head2|Tail2]) :-
    Head = Head2;
    checkForIntersection(Tail, [Head2|Tail2]);
    checkForIntersection([Head|Tail], Tail2).
This part of the code is where I believe the error occurs. I have an answer set (CheckListsAgenstThisList) as the list that I want to check for intersections. The [ListToCheck|NextListToCheck] is a list of lists, and I want to check each of its lists against the answer set, looping until it is empty. The issue is that when I call it, I get an infinite number of trues, even when the answer should be false.
loopThroughListOfLists(CheckListsAgenstThisList, []).
loopThroughListOfLists(CheckListsAgenstThisList, [ListToCheck|NextListToCheck]) :-
    checkForIntersection(ListToCheck, CheckListsAgenstThisList),
    loopThroughListOfLists(CheckListsAgenstThisList, NextListToCheck).
loopThroughListOfLists([3], [[1],[1],[3],[4],[4]]). This is one of the cases I used to test my code with. Instead of returning false, it returns an infinite number of trues whenever I test it.
Thank you so much for reading. I am sorry if this is a really stupid question; I am really struggling with Prolog.
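As an aside, here is a minimal sketch of the intended behaviour using the standard member/2 and memberchk/2; the predicate names are invented, not part of the assignment:

% intersects(+AnswerSet, +List): some element of List occurs in AnswerSet
intersects(AnswerSet, List) :-
    member(X, List),
    memberchk(X, AnswerSet).

% allIntersect(+AnswerSet, +Lists): every list shares an element with AnswerSet
allIntersect(_, []).
allIntersect(AnswerSet, [List|Lists]) :-
    intersects(AnswerSet, List),
    allIntersect(AnswerSet, Lists).

% ?- allIntersect([3], [[1],[1],[3],[4],[4]]).
% false, as expected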

Correlations and what brackets indicate

I have this code, from Julian Faraway's linear models book:
round(cor(seatpos[,-9]),2)
I am unsure what [,-9] and the 2 are doing - could someone please assist?
When you are learning new stuff, nested functions can be difficult. This same computation can be done in steps, which might make it easier to see what KeonV and MrFlick are suggesting.
Here is an alternative way of doing this with the same functions, but in easier-to-follow steps with simple explanations.
sub_seatpos <- seatpos[,-9]
This says: take a subset with all rows and all columns EXCEPT column number nine, and save it into sub_seatpos (this subsetting was done in the initial code, but not saved into a new variable; saving it just makes it easier to see how each step works).
It reflects the seatpos[,-9] portion of the call below:
round(cor(seatpos[,-9]),2)
cor_seatpos <- cor(sub_seatpos)
This takes the correlations for sub_seatpos and saves them into a variable named cor_seatpos. It reflects the cor(...) part of the call below:
round( cor( seatpos[,-9] ),2)
The final step just says: round the correlations to 2 decimal places. As a separate line of code it looks like this:
round(cor_seatpos, 2)
It reflects the round(..., 2) part of the call below:
round( cor(seatpos[,-9]),2)
What makes this confusing is that all of the functions are nested. As you become more proficient, this becomes less difficult to read, but it can be confusing with new functions.
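To convince yourself that the two forms agree, you can run both (assuming the faraway package, which provides the seatpos data, is installed):

library(faraway)

one_liner <- round(cor(seatpos[,-9]), 2)

sub_seatpos <- seatpos[,-9]           # drop column 9
cor_seatpos <- cor(sub_seatpos)       # correlation matrix
stepwise    <- round(cor_seatpos, 2)  # round to 2 decimal places

identical(one_liner, stepwise)        # TRUE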

What is wrong with these lines of my R script? What am I missing?

OK, so I got this long line of code as part of a script which someone else wrote (I know it looks horrible), and I tried to simplify it.
dH=((-HMF )*( 1.013*10^10*((T+273.2)/298)*exp((292131/1.987)*(1/298-1/(T+273.2)))/(1+(exp((331573/1.987)*(1/284.9-1/(T+273.2))))))*( 4.371*10^-8*((RH+273.2)/298)*exp((55763.5/1.987)*(1/298-1/(RH+273.2)))/(1+(exp((77245.3/1.987)*(1/365.3-1/(RH+273.2))))))*(H[hour]*I[hour]))-((LGR1)*( 123.8*((T+273.2)/298)*exp((-390540/1.987)*(1/298-1/(T+273.2)))/(1+(exp((-402880/1.987)*(1/300.1-1/(T+273.2)))))) *H[hour]*L1a[hour])-((LGR2)*( 123.8*((T+273.2)/298)*exp((-390540/1.987)*(1/298-1/(T+273.2)))/(1+(exp((-402880/1.987)*(1/300.1-1/(T+273.2)))))) *H[hour]*L2a[hour])- ((LGR3)*( 123.8*((T+273.2)/298)*exp((-390540/1.987)*(1/298-1/(T+273.2)))/(1+(exp((-402880/1.987)*(1/300.1-1/(T+273.2)))))) *H[hour]*L3a[hour])
I simplified it like this:
a<-(1.013*10^10*((T+273.2)/298)*exp((292131/1.987)*(1/298-1/(T+273.2)))/(1+(exp((331573/1.987)*(1/284.9-1/(T+273.2))))))
b<-( 4.371*10^-8*((RH+273.2)/298)*exp((55763.5/1.987)*(1/298-1/(RH+273.2)))/(1+(exp((77245.3/1.987)*(1/365.3-1/(RH+273.2))))))
c<-(123.8*((T+273.2)/298)*exp((-390540/1.987)*(1/298-1/(T+273.2)))/(1+(exp((-402880/1.987)*(1/300.1-1/(T+273.2))))))
d<-(1.7168*((T+273.2)/298)*exp((14275.4/1.987)*(1/298-1/(T+273.2)))/(1+(exp((49087.1/1.987)*(1/298.85-1/(T+273.2))))))
dH=((-HMF )*a*b*(H[hour]*I[hour]))-(LGR1*c*H[hour]*L1a[hour])-(LGR2*c*H[hour]*L2a[hour])-(LGR3*c*H[hour]*L3a[hour])
So basically the model takes T and RH for different hours; LGR1, LGR2 and LGR3 are constant values. L1a, L2a and L3a are also calculated for different hours, and a, b, c and d are used to calculate L1a, L2a and L3a for those hours.
The odd thing is that when I simply replace the messy long formula with a, b, c and d, my model output changes, which I did not expect. I know this might be vague, but I was not sure whether I could post the full script here.
Thanks in advance for your advice
I took it into a syntax-aware editor and used its parenthesis-matching capability to break the expression into its four arithmetic terms (separated by minus signs):
dH=
((-HMF )*( 1.013*10^10*((T+273.2)/298)*exp((292131/1.987)*(1/298-1/(T+273.2)))/(1+(exp((331573/1.987)*(1/284.9-1/(T+273.2))))))*( 4.371*10^-8*((RH+273.2)/298)*exp((55763.5/1.987)*(1/298-1/(RH+273.2)))/(1+(exp((77245.3/1.987)*(1/365.3-1/(RH+273.2))))))*(H[hour]*I[hour]))-
((LGR1)*( 123.8*((T+273.2)/298)*exp((-390540/1.987)*(1/298-1/(T+273.2)))/(1+(exp((-402880/1.987)*(1/300.1-1/(T+273.2)))))) *H[hour]*L1a[hour])-
((LGR2)*( 123.8*((T+273.2)/298)*exp((-390540/1.987)*(1/298-1/(T+273.2)))/(1+(exp((-402880/1.987)*(1/300.1-1/(T+273.2)))))) *H[hour]*L2a[hour])-
((LGR3)*( 123.8*((T+273.2)/298)*exp((-390540/1.987)*(1/298-1/(T+273.2)))/(1+(exp((-402880/1.987)*(1/300.1-1/(T+273.2)))))) *H[hour]*L3a[hour])
The fact that your terms seem to start in different sections of that expression makes me think you separated the terms inappropriately. It does appear (in my editor) that the last three terms all share a common factor, and that the only items that vary across those three terms are the first and last factors:
( 123.8*((T+273.2)/298)*exp((-390540/1.987)*(1/298-1/(T+273.2)))/(1+(exp((-402880/1.987)*(1/300.1-1/(T+273.2)))))) *H[hour]
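One way to make that shared structure explicit (and avoid copy-paste drift) is to pull the repeated factor into a function. A rough sketch with invented names, assuming T, H, hour and the LGR/La variables are defined as in the script:

shared_factor <- function(T) {
  123.8 * ((T + 273.2) / 298) *
    exp((-390540 / 1.987) * (1/298 - 1/(T + 273.2))) /
    (1 + exp((-402880 / 1.987) * (1/300.1 - 1/(T + 273.2))))
}

# each of the last three terms is LGRi * shared_factor(T) * H[hour] * Lia[hour]
term <- function(LGR, La) LGR * shared_factor(T) * H[hour] * La[hour]

# dH = <first term> - term(LGR1, L1a) - term(LGR2, L2a) - term(LGR3, L3a)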

Fuzzy matching of product names

I need to automatically match product names (cameras, laptops, TVs, etc.) that come from different sources to a canonical name in the database.
For example "Canon PowerShot a20IS", "NEW powershot A20 IS from Canon" and "Digital Camera Canon PS A20IS"
should all match "Canon PowerShot A20 IS". I've worked with Levenshtein distance with some added heuristics (removing obvious common words, assigning a higher cost to number changes, etc.), which works to some extent, but unfortunately not well enough.
The main problem is that even single-letter changes in relevant keywords can make a huge difference, but it's not easy to detect which are the relevant keywords. Consider for example three product names:
Lenovo T400
Lenovo R400
New Lenovo T-400, Core 2 Duo
The first two are ridiculously similar strings by any standard (OK, Soundex might help to distinguish the T and R in this case, but the names might as well be 400T and 400R); the first and the third are quite far from each other as strings, but are the same product.
Obviously, the matching algorithm cannot be 100% precise; my goal is to automatically match around 80% of the names with high confidence.
Any ideas or references are much appreciated.
I think this will boil down to distinguishing key words such as Lenovo from chaff such as New.
I would run some analysis over the database of names to identify key words. You could use code similar to that used to generate a word cloud.
Then I would hand-edit the list to remove anything obviously chaff, like maybe New is actually common but not key.
Then you will have a list of key words that can be used to help identify similarities. You would associate the "raw" name with its keywords, and use those keywords when comparing two or more raw names for similarities (literally, the percentage of shared keywords).
Not a perfect solution by any stretch, but I don't think you are expecting one?
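A rough sketch of that first analysis step in Python (word-cloud-style counting; the sample names are from the question, everything else is invented):

from collections import Counter
import re

raw_names = [
    "Canon PowerShot a20IS",
    "NEW powershot A20 IS from Canon",
    "Digital Camera Canon PS A20IS",
]

# count normalized words across all raw names; the frequent-but-generic
# ones (like "new") are what you would hand-edit out of the keyword list
counts = Counter(
    word
    for name in raw_names
    for word in re.findall(r"[a-z0-9]+", name.lower())
)
print(counts.most_common())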
The key understanding here is that you do have a proper distance metric. That is in fact not your problem at all. Your problem is in classification.
Let me give you an example. Say you have 20 entries for the Foo X1 and 20 for the Foo Y1. You can safely assume they are two groups. On the other hand, if you have 39 entries for the Bar X1 and 1 for the Bar Y1, you should treat them as a single group.
Now, the distance X1 <-> Y1 is the same in both examples, so why is there a difference in the classification? That is because Bar Y1 is an outlier, whereas Foo Y1 isn't.
The funny part is that you do not actually need to do a whole lot of work to determine these groups up front. You simply do a recursive classification: you start out with one node per entry, and then add a supernode for the two closest nodes. In the supernode, store the best assumption, the size of its subtree and the variation within it. As many of your strings will be identical, you'll soon get large subtrees with identical entries. The recursion ends with a single supernode at the root of the tree.
Now map the canonical names against this tree. You'll quickly see that each will match an entire subtree. Now, use the distances between these trees to pick the distance cutoff for that entry. If you have both Foo X1 and Foo Y1 products in the database, the cut-off distance will need to be lower to reflect that.
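A compressed illustration of building such a tree and cutting it, using scipy's hierarchical clustering (difflib's ratio stands in for a proper distance metric, and the single cutoff t=0.3 is invented; the adaptive per-subtree cutoff described above would replace it):

from difflib import SequenceMatcher
from scipy.cluster.hierarchy import linkage, fcluster

names = ["Foo X1", "Foo X1", "Foo Y1", "Bar X1", "Bar X1", "Bar Y1"]

def dist(a, b):
    return 1.0 - SequenceMatcher(None, a, b).ratio()

# condensed pairwise distance matrix, then an agglomerative tree
n = len(names)
condensed = [dist(names[i], names[j]) for i in range(n) for j in range(i + 1, n)]
tree = linkage(condensed, method="average")

# cut the tree at a distance threshold to obtain the groups
print(fcluster(tree, t=0.3, criterion="distance"))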
edg's answer is in the right direction, I think - you need to distinguish key words from fluff.
Context matters. To take your example, Core 2 Duo is fluff when looking at two instances of a T400, but not when looking at a CPU OEM package.
If you can mark in your database which parts of the canonical form of a product name are more important and must appear in one form or another to identify a product, you should do that. Maybe through the use of some sort of semantic markup? Can you afford to have a human mark up the database?
You can try to define equivalency classes for things like "T-400", "T400", "T 400" etc. Maybe a set of rules that say "numbers bind more strongly than letters attached to those numbers."
Breaking down into cases based on manufacturer, model number, etc. might be a good approach. I would recommend that you look at techniques for term spotting to try and accomplish that: http://www.worldcat.org/isbn/9780262100854
Designing everything in a flexible framework that's mostly rule driven, where the rules can be modified based on your needs and emerging bad patterns (read: things that break your algorithm) would be a good idea, as well. This way you'd be able to improve the system's performance based on real world data.
You might be able to make use of a trigram search for this. I must admit I've never seen the algorithm used to implement an index, but I have seen it working in pharmaceutical applications, where it copes very well indeed with badly misspelt drug names. You might be able to apply the same kind of logic to this problem.
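To see why trigrams tolerate misspellings, here is a toy similarity measure (the padding and the Jaccard overlap are common choices, not any particular library's behaviour):

def trigrams(s):
    s = "  " + s.lower() + " "    # pad so word boundaries yield trigrams too
    return {s[i:i + 3] for i in range(len(s) - 2)}

def trigram_similarity(a, b):
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)    # Jaccard overlap of trigram sets

print(trigram_similarity("amoxicillin", "amoxicilin"))  # high despite the typo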
This is a problem of record linkage. The dedupe python library provides a complete implementation, but even if you don't use python, the documentation has a good overview of how to approach this problem.
Briefly, within the standard paradigm, this task is broken into three stages:
Compare the fields, in this case just the name. You can use one or more comparators for this, for example an edit distance like the Levenshtein distance, or something like the cosine distance that compares the number of common words.
Turn the array of distance scores into a probability that a pair of records are truly about the same thing.
Cluster those pairwise probability scores into groups of records that likely all refer to the same thing.
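In miniature, and with every weight and threshold invented purely for illustration, the three stages might look like this:

from difflib import SequenceMatcher

names = ["Canon PowerShot A20 IS", "NEW powershot A20 IS from Canon",
         "Digital Camera Canon PS A20IS", "Lenovo T400"]

def edit_sim(a, b):    # stage 1: an edit-distance-style comparator
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def word_sim(a, b):    # stage 1: a common-words comparator
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def match_probability(a, b):    # stage 2: naive blend of the two scores
    return 0.5 * edit_sim(a, b) + 0.5 * word_sim(a, b)

# stage 3: greedy grouping with an invented 0.5 cutoff
groups = []
for name in names:
    for group in groups:
        if any(match_probability(name, member) > 0.5 for member in group):
            group.append(name)
            break
    else:
        groups.append([name])
print(groups)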
You might want to create logic that ignores the letter/number combination of model numbers (since they're nigh always extremely similar).
I don't have any experience with this type of problem, but I think a very naive implementation would be to tokenize the search term and search for matches that happen to contain any of the tokens.
"Canon PowerShot A20 IS", for example, tokenizes into:
Canon
Powershot
A20
IS
which would match each of the other items you want to show up in the results. Of course, this strategy will likely produce a whole lot of false matches as well.
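A naive version of that token-overlap search (product names from the question; the helper names are invented):

canonical = ["Canon PowerShot A20 IS", "Lenovo T400", "Lenovo R400"]

def tokens(name):
    return set(name.lower().split())

def candidates(query, catalog):
    q = tokens(query)
    return [name for name in catalog if q & tokens(name)]

print(candidates("NEW powershot A20 IS from Canon", canonical))
# -> ['Canon PowerShot A20 IS']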
Another strategy would be to store "keywords" with each item, such as "camera", "canon", "digital camera", and searching based on items that have matching keywords. In addition, if you stored other attributes such as Maker, Brand, etc., you could search on each of these.
Spell checking algorithms come to mind.
Although I could not find a good sample implementation, I believe you can modify a basic spell-checking algorithm to come up with satisfactory results, i.e. working with words as the unit instead of characters.
The bits and pieces left in my memory:
Strip out all common words (a, an, the, new). What counts as "common" depends on context.
Take the first letter of each word and its length, and make that a word key.
When a suspect word comes up, look for words with the same or a similar word key.
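A sketch of that word-key lookup (the vocabulary and all names are invented):

def word_key(word):
    return (word[0].lower(), len(word))    # first letter plus length

vocab = ["canon", "powershot", "lenovo", "t400"]
index = {}
for w in vocab:
    index.setdefault(word_key(w), []).append(w)

# a suspect word is checked against vocabulary entries sharing its key
print(index.get(word_key("pawershot"), []))  # -> ['powershot']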
It might not solve your problems directly... but you say you were looking for ideas, right?
:-)
That is exactly the problem I'm working on in my spare time. What I came up with is:
based on keywords narrow down the scope of search:
in this case you could have some hierarchy:
type --> company --> model
so that you'd match
"Digital Camera" for the type and
"Canon" for the company, and there you'd be left with a much narrower scope to search.
You could work this down even further by introducing product lines etc.
But the main point is, this probably has to be done iteratively.
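A skeletal version of that narrowing (the catalog entries are invented):

catalog = [
    {"type": "digital camera", "company": "canon", "model": "PowerShot A20 IS"},
    {"type": "digital camera", "company": "canon", "model": "PowerShot A10"},
    {"type": "laptop", "company": "lenovo", "model": "T400"},
]

def narrow(items, **facets):
    # filter by each recognized facet before any fuzzy matching on models
    for key, value in facets.items():
        items = [item for item in items if item[key] == value]
    return items

print(narrow(catalog, type="digital camera", company="canon"))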
We can use the Datadecision service for matching products.
It will allow you to automatically match your product data using statistical algorithms. This operation is done after defining a confidence threshold score.
All data that cannot be automatically matched will have to be manually reviewed through a dedicated user interface.
The online service uses lookup tables to store synonyms as well as your manual matching history. This allows you to improve the data matching automation next time you import new data.
I worked on exactly the same thing in the past. What I did was use an NLP method, a TF-IDF vectorizer, to assign weights to each word. For example, in your case:
Canon PowerShot a20IS
Canon --> weight = 0.05 (not a very distinguishing word)
PowerShot --> weight = 0.37 (can be distinguishing)
a20IS --> weight = 0.96 (very distinguishing)
This will tell your model which words to care about and which to ignore. I got quite good matches thanks to TF-IDF.
But note this: a20IS cannot be recognized as a20 IS; you may consider using some kind of regex to filter such cases.
After that, you can use a numeric calculation like cosine similarity.
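A minimal sketch of that pipeline with scikit-learn (the actual weights will differ from the illustrative numbers above):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

names = [
    "Canon PowerShot a20IS",
    "NEW powershot A20 IS from Canon",
    "Digital Camera Canon PS A20IS",
    "Lenovo T400",
]

# fit TF-IDF over the corpus of names, then compare every pair of names
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(names)
print(cosine_similarity(tfidf).round(2))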
