How to find 5 closest number from matrix having attributes? - r

I have a matrix as follows
`> y
1 2 3
1 0.8802216 1.2277843 0.6875047
2 0.9381081 1.3189847 0.2046542
3 1.3245534 0.8221709 0.4630722
4 1.2006974 0.8890464 0.6710844
5 1.2344071 0.8354292 0.7259998
6 1.1670665 0.9214787 0.6826173
7 0.9670581 1.1070461 0.7742342
8 0.8867365 1.2160533 0.7024281
9 0.8235792 1.4424190 0.2030302
10 0.8821301 1.0541099 1.2279813
11 1.1958634 0.9708839 0.4297043
12 1.3542734 0.7747481 0.5119648
13 0.4385487 0.3588158 4.9167998
14 0.8530141 1.3578511 0.3698620
15 0.9651803 0.8426226 1.6132899
16 0.8854192 1.2272616 0.6715839
17 0.7779642 0.8132233 2.3386331
18 0.9936722 1.1629110 0.5083558
19 1.1235897 1.0018480 0.5764672
20 0.7887222 1.3101684 0.7373181
21 2.2276176 0.0000000 0.0000000`
I found one clue, but it can give position for the whole matrix,`
n<-read.table(file.choose(),header=T)
y<-n[,c("1","2","3")]
my.number=1.12270420185886 .
z<-abs(y-my.number)==min(abs(y-my.number))
which(z)
[1] 19 `
I want to find at least the 5 closest values with letter & column no too, in another way, I want the 5 closest single values from the matrix with their position.

I don't know what language it is; is it R?
In a procedural language, I would save all values to a map (val, (pos)) = (val (row, col); example (0.880..-> (1, 1)), then sort by value.
Then iterate over i<-pos (1 to map.size-5), and get the diff (pos (i), pos (i+5)), search for the minimum (diff), get the values and their position then.
Here is a solution in Scala:
val matrix = """1 0.8802216 1.2277843 0.6875047
2 0.9381081 1.3189847 0.2046542
3 1.3245534 0.8221709 0.4630722
4 1.2006974 0.8890464 0.6710844
5 1.2344071 0.8354292 0.7259998
6 1.1670665 0.9214787 0.6826173
7 0.9670581 1.1070461 0.7742342
8 0.8867365 1.2160533 0.7024281
9 0.8235792 1.4424190 0.2030302
10 0.8821301 1.0541099 1.2279813
11 1.1958634 0.9708839 0.4297043
12 1.3542734 0.7747481 0.5119648
13 0.4385487 0.3588158 4.9167998
14 0.8530141 1.3578511 0.3698620
15 0.9651803 0.8426226 1.6132899
16 0.8854192 1.2272616 0.6715839
17 0.7779642 0.8132233 2.3386331
18 0.9936722 1.1629110 0.5083558
19 1.1235897 1.0018480 0.5764672
20 0.7887222 1.3101684 0.7373181
21 2.2276176 0.0000000 0.0000000"""
// split block of text into lines
val lines=matrix.split ("\n")
// split lines into words
val rows = lines.map (l => l.split (" \\+"))
// remove the index from the beginning (1, 2, ... 21) and
// transform values from Strings to double numbers
// triples is: Array(Array(0.8802216, 1.2277843, 0.6875047), Array(0.9381081, 1.3189847, 0.2046542),
val triples = rows.map (_.tail).map(triple=> triple.map (_.toDouble))
// generate an own index for the rows and columns
// elems is: elems: Array[Array[(Double, (Int, Int))]] = Array(Array((0.8802216,(0,0)), (1.2277843,(0,1)), (0.6875047,(0,2))), Array((0.9381081,(1,0)), ...
val elems = triples.zipWithIndex.map {t=> t._1.zipWithIndex.map (vc=> (vc._1 -> (t._2, vc._2)))}
// sorted = Array((0.0,(20,1)), (0.0,(20,2)), (0.2030302,(8,2)), (0.2046542,(1,2)),
val sorted = elems.sortBy (e => e._1)
// delta5 = List(0.3588158, 0.369862, 0.2266741, 0.2338945, 0.10425639999999997, 0.1384938,
val delta5 = sorted.sliding (5, 1).map (q => q(4)._1-q(0)._1).toList
val minindex = delta5.indexOf (delta5.min) // minindex: Int = 29, delta5.min = 0.008824799999999966
// we found the smallest intervall of 5 values beginning at 29:
(29 to 29 +5).map (sorted (_))
res568: scala.collection.immutable.IndexedSeq[(Double, (Int, Int))] =
Vector( (0.8802216,(0,0)),
(0.8821301,(9,0)),
(0.8854192,(15,0)),
(0.8867365,(7,0)),
(0.8890464,(3,1)),
(0.9214787,(5,1)))
Since Scala counts from 0 to 20 and 0 to 2, where your index runs from 1 to 3 and 1 to 21 respectively, you have to add (1,1) to each of the positions=> (1,1), (10,1), and so on.

Related

How to know if there is a different element in one array in Scilab?

My goal is to check if there are misplaced objects in one array.
for example the array is
2.
2.
2.
2.
2.
1.
3.
1.
3.
3.
3.
1.
3.
1.
1.
1.
1.
I want to know if the first 5 elements, 6 to 13 and 14-17 are the same.
The purpose of this is to identify the misplaced elements in a clustering solution.
I have tried for the first 5 elements
ISet=5
IVer=7
IVir=5
for i=1:ISet
if(isequal(FIRSTMIN(i,1,2),FIRSTMIN(i+1,1,2))==%f)
numMisp=numMisp+1
mprintf("Set misp: %i",numMisp)
end
end
For the next 6 to 13 elements
for i=ISet+1:IVer+ISet-1
if(isequal(FIRSTMIN(i,1,2),FIRSTMIN(i+1,1,2))==%f)
mprintf("%i %i Ver misp: %i\n",FIRSTMIN(i,1,2),FIRSTMIN(i+1,1,2),i)
numMisp=numMisp+1
end
end
For the next 14 to 17 elements
for i=IVer+ISet:IVer+IVir-1
if(isequal(FIRSTMIN(i,1,2),FIRSTMIN(i+1,1,2))==%f)
mprintf("%i %i Ver misp: %i\n",FIRSTMIN(i,1,2),FIRSTMIN(i+1,1,2),i)
numMisp=numMisp+1
mprintf("Vir misp: %i",i)
end
end
You can use unique for that purpose. For example the following test checks if the first five elements are the same
x=[2 2 2 2 2 1 3 1 3 3 3 1 3 1 1 1 1];
if length(unique(x(1:5))) == 1
//
end
You can do the the same for the other clusters by replacing 1:5 by 6:13 then 14:17.

Complex select in SQLite view

I have two tables where Security holds the access bit mask for a given NTFS file system scan and FileSystemRights which equates to the string representations for the well known bit masks. I need to create a view which exposes the expected (not just proper) string representations for a given bit mask. The problem is several enum values composite and contain combinations of lower values, so the desired idea is not to repeat the implicit values.
For example, a value of 1179817 (Security.Id = 24) should only report ReadAndExecute and Synchronize, excluding ExecuteFile, ListDirectory, Read, ReadAttributes, ReadData, ReadExtendedAttributes, ReadPermissions and Traverse, as those are all part of ReadAndExecute (eg. ReadAndExecute & Read == Read). Its obviously correct to show them all, but a user wants only to see the non implicit values.
I'm lost within the constraints of SQL to produce a join that behaves like this without some abysmal nested case that would be a nightmare to look at.
Does a better programmatic approach exist?
FileSystemRights
================
Id Name Value
-- ---- -----
1 None 0
2 ListDirectory 1
3 ReadData 1
4 WriteData 2
5 CreateFiles 2
6 CreateDirectories 4
7 AppendData 4
8 ReadExtendedAttributes 8
9 WriteExtendedAttributes 16
10 ExecuteFile 32
11 Traverse 32
12 DeleteSubdirectoriesAndFiles 64
13 ReadAttributes 128
14 WriteAttributes 256
15 Write 278
16 Delete 65536
17 ReadPermissions 131072
18 Read 131209
19 ReadAndExecute 131241
20 Modify 197055
21 ChangePermissions 262144
22 TakeOwnership 524288
23 Synchronize 1048576
24 FullControl 2032127
25 GenericAll 268435456
26 GenericExecute 536870912
27 GenericWrite 1073741824
28 GenericRead 2147483648
Security
========
Id FileSystemRights IdentityReference
-- ---------------- -----------------
20 2032127 BUILTIN\Administrators
21 2032127 BUILTIN\Administrators
22 2032127 NT AUTHORITY\SYSTEM
23 268435456 CREATOR OWNER
24 1179817 BUILTIN\Users
25 4 BUILTIN\Users
26 2 BUILTIN\Users
MyView
======
SELECT s.Id AS SecurityId,
f.Name
FROM Security s
JOIN FileSystemRights f
ON CASE f.Value
WHEN 0 THEN s.FileSystemRights = f.Value
ELSE (s.FileSystemRights & f.Value) == f.Value
END
ORDER BY s.Id, f.Name;
Add the actual value of the name to the query.
Then wrap another query around that to filter out values for the same entry that are a subset of another value:
WITH AllValues(SecurityId, Name, Value) AS (
SELECT s.Id,
f.Name,
f.Value
FROM Security s
JOIN FileSystemRights f
ON CASE f.Value
WHEN 0 THEN s.FileSystemRights = f.Value
ELSE (s.FileSystemRights & f.Value) == f.Value
END
)
SELECT SecurityId,
Name
FROM AllValues
WHERE NOT EXISTS (SELECT *
FROM AllValues AS AV2
WHERE AV2.SecurityId = AllValues.SecurityId
AND (AV2.Value & AllValues.Value) != 0
AND AV2.Value > AllValues.Value
)
ORDER BY 1, 2;

GameTheory package: Convert data frame of games to Coalition Set

I am looking to explore the GameTheory package from CRAN, but I would appreciate help in converting my data (in the form of a data frame of unique combinations and results) in to the required coalition object. The precursor to this I believe to be an ordered list of all coalition values (https://cran.r-project.org/web/packages/GameTheory/vignettes/GameTheory.pdf).
My real data has n ~ 30 'players', and unique combinations = large (say 1000 unique combinations), for which I have 1 and 0 identifiers to describe the combinations. This data is sparsely populated in that I do not have data for all combinations, but will assume combinations not described have zero value. I plan to have one specific 'player' who will appear in all combinations, and act as a baseline.
By way of example this is the data frame I am starting with:
require(GameTheory)
games <- read.csv('C:\\Users\\me\\Desktop\\SampleGames.csv', header = TRUE, row.names = 1)
games
n1 n2 n3 n4 Stakes Wins Success_Rate
1 1 1 0 0 800 60 7.50%
2 1 0 1 0 850 45 5.29%
3 1 0 0 1 150000 10 0.01%
4 1 1 1 0 300 25 8.33%
5 1 1 0 1 1800 65 3.61%
6 1 0 1 1 1900 55 2.89%
7 1 1 1 1 700 40 5.71%
8 1 0 0 0 3000000 10 0.00333%
where n1 is my universal player, and in this instance, I have described all combinations.
To calculate my 'base' coalition value from player {1} alone, I am looking to perform the calculation: 0.00333% (success rate) * all stakes, i.e.
0.00333% * (800 + 850 + 150000 + 300 + 1800 + 1900 + 700 + 3000000) = 105
I'll then have zero values for {2}, {3} and {4} as they never "play" alone in this example.
To calculate my first pair coalition value, I am looking to perform the calculation:
7.5%(800 + 300 + 1800 + 700) + 0.00333%(850 + 150000 + 1900 + 3000000) = 375
This is calculated as players {1,2} base win rate (7.5%) by the stakes they feature in, plus player {1} base win rate (0.00333%) by the combinations he features in that player {2} does not - i.e. exclusive sets.
This logic is repeated for the other unique combinations. For example row 4 would be the combination of {1,2,3} so the calculation is:
7.5%(800+1800) + 5.29%(850+1900) + 8.33%(300+700) + 0.00333%(3000000+150000) = 529 which descriptively is set {1,2} success rate% by Stakes for the combinations it appears in that {3} does not, {1,3} by where {2} does not feature, {1,2,3} by their occurrences, and the base player {1} by examples where neither {2} nor {3} occur.
My expected outcome therefore should look like this I believe:
c(105,0,0,0, 375,304,110,0,0,0, 529,283,246,0, 400)
where the first four numbers are the single player combinations {1} {2} {3} and {4}, the next six numbers are two player combinations {1,2} {1,3} {1,4} (and the null cases {2,3} {2,4} {3,4} which don't exist), then the next four are the three player combinations {1,2,3} {1,2,4} {1,3,4} and the null case {2,3,4}, and lastly the full combination set {1,2,3,4}.
I'd then feed this in to the DefineGame function of the package to create my coalitions object.
Appreciate any help: I have tried to be as descriptive as possible. I really don't know where to start on generating the necessary sets and set exclusions.

Number pattern for checkboxes in asp.net?

I have a database table which contanis a field name Province bits and it adds up the count of the pattern like:
AB-1
BC-2
CD-4
DE-8
EF-16.... and so on.
Now in the table entry I have a value-13(Province bit), which implies checkboxes against entry AB,CD,DE(adds up to 13)should be checked.
I am not able to get the logic behind the same, how can check only those checkboxes whose sum adds up to the entry in the table?
You need to check to see if the value is in the bitwise total.
if( interestedInValue & totalValue == interestedInValue)
{
// this value is in the total, check the box
}
Documentation on & http://msdn.microsoft.com/en-us/library/sbf85k1c(v=vs.71).aspx
e.g. 13 = 1 + 4 + 8
13 & 1 == 1 // true
13 & 2 == 2 // false
13 & 4 == 4 // true
13 & 8 == 8 // true
13 & 16 == 16 // false
EDIT: for more clarification
ab.Checked = 1 && ProvinceBit == 1 // checkbox AB
bc.Checked = 2 && ProvinceBit == 2 // checkbox BC
...
The field is using bit flags.
13 is 1101 binary.
So convert the value to bits and assign one bit to each checkbox.
By converting your number to a string, you can convert to an array or just iterate through the string. A bit brute force, but will give you what you need.
var value = 13
string binary = Convert.ToString(value, 2);
//binary = "1101"

Implementing proximity matrix for clustering

Please I am a little new to this field so pardon me if the question sound trivial or basic.
I have a group of dataset(Bag of words to be specific) and I need to generate a proximity matrix by using their edit distance from each other to find and generate the proximity matrix .
I am however quite confused how I will keep track of my data/strings in the matrix. I need the proximity matrix for the purpose of clustering.
Or How generally do you approach this kinds of problem in the field. I am using perl and R to implement this.
Here is a typical code in perl I have written that reads from a text file containing my bag of words
use strict ;
use warnings ;
use Text::Levenshtein qw(distance) ;
main(#ARGV);
sub main
{
my #TokenDistances ;
my $Tokenfile = 'TokenDistinct.txt';
my #Token ;
my $AppendingCount = 0 ;
my #Tokencompare ;
my %Levcount = ();
open (FH ,"< $Tokenfile" ) or die ("Error opening file . $!");
while(<FH>)
{
chomp $_;
$_ =~ s/^(\s+)$//g;
push (#Token , $_ );
}
close(FH);
#Tokencompare = #Token ;
foreach my $tokenWord(#Tokencompare)
{
my $lengthoffile = scalar #Tokencompare;
my $i = 0 ;
chomp $tokenWord ;
##TokenDistances = levDistance($tokenWord , \#Tokencompare );
for($i = 0 ; $i < $lengthoffile ;$i++)
{
if(scalar #TokenDistances == scalar #Tokencompare)
{
print "Yipeeeeeeeeeeeeeeeeeeeee\n";
}
chomp $tokenWord ;
chomp $Tokencompare[$i];
#print $tokenWord. " {$Tokencompare[$i]} " . " $TokenDistances[$i] " . "\n";
#$Levcount{$tokenWord}{$Tokencompare[$i]} = $TokenDistances[$i];
$Levcount{$tokenWord}{$Tokencompare[$i]} = levDistance($tokenWord , $Tokencompare[$i] );
}
StoreSortedValues ( \%Levcount ,\$tokenWord , \$AppendingCount);
$AppendingCount++;
%Levcount = () ;
}
# %Levcount = ();
}
sub levDistance
{
my $string1 = shift ;
#my #StringList = #{(shift)};
my $string2 = shift ;
return distance($string1 , $string2);
}
sub StoreSortedValues {
my $Levcount = shift;
my $tokenWordTopMost = ${(shift)} ;
my $j = ${(shift)};
my #ListToken;
my $Tokenfile = 'LevResult.txt';
if($j == 0 )
{
open (FH ,"> $Tokenfile" ) or die ("Error opening file . $!");
}
else
{
open (FH ,">> $Tokenfile" ) or die ("Error opening file . $!");
}
print $tokenWordTopMost;
my %tokenWordMaster = %{$Levcount->{$tokenWordTopMost}};
#ListToken = sort { $tokenWordMaster{$a} cmp $tokenWordMaster{$b} } keys %tokenWordMaster;
##ListToken = keys %tokenWordMaster;
print FH "-------------------------- " . $tokenWordTopMost . "-------------------------------------\n";
#print FH map {"$_ \t=> $tokenWordMaster{$_} \n "} #ListToken;
foreach my $tokey (#ListToken)
{
print FH "$tokey=>\t" . $tokenWordMaster{$tokey} . "\n"
}
close(FH) or die ("Error Closing File. $!");
}
the problem is how can I represent the proximity matrix from this and still be able to keep track of which comparison represent which in my matrix.
In the RecordLinkage package there is the levenshteinDist function, which is one way of calculating an edit distance between strings.
install.packages("RecordLinkage")
library(RecordLinkage)
Set up some data:
fruit <- c("Apple", "Apricot", "Avocado", "Banana", "Bilberry", "Blackberry",
"Blackcurrant", "Blueberry", "Currant", "Cherry")
Now create a matrix consisting of zeros to reserve memory for the distance table. Then use nested for loops to calculate the individual distances. We end with a matrix with a row and a column for each fruit. Thus we can rename the columns and rows to be identical to the original vector.
fdist <- matrix(rep(0, length(fruit)^2), ncol=length(fruit))
for(i in seq_along(fruit)){
for(j in seq_along(fruit)){
fdist[i, j] <- levenshteinDist(fruit[i], fruit[j])
}
}
rownames(fdist) <- colnames(fdist) <- fruit
The results:
fdist
Apple Apricot Avocado Banana Bilberry Blackberry Blackcurrant
Apple 0 5 6 6 7 9 12
Apricot 5 0 6 7 8 10 10
Avocado 6 6 0 6 8 9 10
Banana 6 7 6 0 7 8 8
Bilberry 7 8 8 7 0 4 9
Blackberry 9 10 9 8 4 0 5
Blackcurrant 12 10 10 8 9 5 0
Blueberry 8 9 9 8 3 3 8
Currant 7 5 6 5 8 10 6
Cherry 6 7 7 6 4 6 10
The proximity or similarity (or dissimilarity) matrix is just a table that stores the similarity score for pairs of objects. So, if you have N objects, then the R code can be simMat <- matrix(nrow = N, ncol = N), and then each entry, (i,j), of simMat indicates the similarity between item i and item j.
In R, you can use several packages, including vwr, to calculate the Levenshtein edit distance.
You may also find this Wikibook to be of interest: http://en.wikibooks.org/wiki/R_Programming/Text_Processing

Resources