Find all words containing characters in UNIX - unix

Given a word W, I want to find all words containing the letters in W from /usr/dict/words.
For example, "bat" should return "bat" and "tab" (but not "table").
Here is one solution which involves sorting the input word and matching:
word=$1
sortedWord=`echo $word | grep -o . | sort | tr -d '\n'`
while read line
do
sortedLine=`echo $line | grep -o . | sort | tr -d '\n'`
if [ "$sortedWord" == "$sortedLine" ]
then
echo $line
fi
done < /usr/dict/words
Is there a better way? I'd prefer using basic commands (instead of perl/awk etc), but all solutions are welcome!
To clarify, I want to find all permutations of the original word. Addition or deletion of characters is not allowed.
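The sort-and-compare idea from the script above can be sketched outside the shell as well. The Python version below is only an illustration: the function name `find_anagrams` and the in-memory candidate list are assumptions, not part of the original script.

```python
def find_anagrams(word, candidates):
    """Return the candidates that are permutations of word."""
    key = sorted(word)  # canonical form: the letters in sorted order
    # Two words are permutations of each other exactly when their
    # sorted letter sequences are equal.
    return [c for c in candidates if sorted(c) == key]
```

Calling `find_anagrams("bat", words)` with `words` read from the dictionary file keeps "bat" and "tab" and rejects "table", because the sorted letters differ.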

Here's an awk implementation. It finds the dictionary words that are built from exactly the letters in "W".
dict="/usr/share/dict/words"
word=$1
awk -vw="$word" 'BEGIN{
m=split(w,c,"")
for(p=1;p<=m;p++){ chars[c[p]]++ }
}
length($0)==length(w){
f=0;g=0
n=split($0,t,"")
for(o=1;o<=n;o++){
if (!( t[o] in chars) ){
f=1; break
}else{ st[t[o]]++ }
}
if (!f || $0==w){
for(z in st){
if ( st[z] != chars[z] ) { g=1 ;break}
}
if(!g){ print "found: "$0 }
}
delete st
}' $dict
output
$ wc -l < /usr/share/dict/words
479829
$ time ./shell.sh look
found: kolo
found: look
real 0m1.361s
user 0m1.074s
sys 0m0.015s
Update: change of algorithm, using sorting (asort is a gawk extension)
dict="/usr/share/dict/words"
awk 'BEGIN{
w="table"
m=split(w,c,"")
b=asort(c,chars)
}
length($0)==length(w){
f=0
n=split($0,t,"")
e=asort(t,d)
for(i=1;i<=e;i++) {
if(d[i]!=chars[i]){
f=1;break
}
}
if(!f) print $0
}' $dict
output
$ time ./shell.sh #looking for table
ablet
batel
belat
blate
bleat
tabel
table
real 0m1.416s
user 0m1.343s
sys 0m0.014s
$ time ./shell.sh #looking for chairs
chairs
ischar
rachis
real 0m1.697s
user 0m1.660s
sys 0m0.014s
$ time perl perl.pl #using beamrider's Perl script
table
tabel
ablet
batel
blate
bleat
belat
real 0m2.680s
user 0m1.633s
sys 0m0.881s
$ time perl perl.pl # looking for chairs
chairs
ischar
rachis
real 0m14.044s
user 0m8.328s
sys 0m5.236s

Here's a shell solution. The best algorithm seems to be #4. It filters out all words that are of incorrect length. Then, it sums the words using a simple substitution cipher (a=1, b=2, A=27, ...). If the sums match, then it will actually do the original sort and compare.
On my system, it can churn through ~235k words looking for "bat" in just under 1/2 second.
I'm providing all of my solutions so you can see the different approaches.
Update: not shown, but I also tried putting the sum inside the first bin of the histogram approach I tried, but it was even slower than the histograms without. I thought it would function as a short circuit, but it didn't work.
Update2: I tried the awk solution and it runs in about 1/3 the time of my best shell solution or ~0.126s versus ~0.490s. The perl solution runs ~1.1s.
#!/bin/bash
word=$1
#dict=words
dict=/usr/share/dict/words
#dict=/usr/dict/words
alg1() {
sortedWord=`echo $word | grep -o . | sort | tr -d '\n'`
while read line
do
sortedLine=`echo $line | grep -o . | sort | tr -d '\n'`
if [ "$sortedWord" == "$sortedLine" ]
then
echo $line
fi
done < $dict
}
check_sorted_versus_not() {
local word=$1
local line=`echo $2 | grep -o . | sort | tr -d '\n'`
if [ "$word" == "$line" ]
then
echo $2
fi
}
# Filter out all words of incorrect length
alg2() {
sortedWord=`echo $word | grep -o . | sort | tr -d '\n'`
grep_string="^`echo -n $word | tr 'a-zA-Z' '.'`\$"
grep "$grep_string" "$dict" | \
while read line
do
sortedLine=`echo $line | grep -o . | sort | tr -d '\n'`
if [ "$sortedWord" == "$sortedLine" ]
then
echo $line
fi
done
}
# Create a lot of variables like this:
# _a=1, _b=2, ... _z=26, _A=27, _B=28, ... _Z=52
gen_chars() {
# [ -n "$GEN_CHARS" ] && return
GEN_CHARS=1
local alpha="abcdefghijklmnopqrstuvwxyz"
local upperalpha=`echo -n $alpha | tr 'a-z' 'A-Z'`
local both="$alpha$upperalpha"
for ((i=0; i < ${#both}; i++))
do
ACHAR=${both:i:1}
eval "_$ACHAR=$((i+1))"
done
}
# I think it's faster to return the value in a var than to echo it in a sub process.
# Try summing the word one char at a time by building an arithmetic expression
# and then evaluate that expression.
# Requires: gen_chars
sum_word() {
SUM=0
local s=""
# parsing input one character at a time
for ((i=0; i < ${#1}; i++))
do
ACHAR=${1:i:1}
s="$s\$_$ACHAR+"
done
SUM=$(( $(eval echo -n ${s}0) ))
}
# I think it's faster to return the value in a var than to echo it in a sub process.
# Try summing the word one char at a time using a case statement.
sum_word2() {
SUM=0
local s=""
# parsing input one character at a time
for ((i=0; i < ${#1}; i++))
do
ACHAR=${1:i:1}
case $ACHAR in
a) SUM=$((SUM+ 1));;
b) SUM=$((SUM+ 2));;
c) SUM=$((SUM+ 3));;
d) SUM=$((SUM+ 4));;
e) SUM=$((SUM+ 5));;
f) SUM=$((SUM+ 6));;
g) SUM=$((SUM+ 7));;
h) SUM=$((SUM+ 8));;
i) SUM=$((SUM+ 9));;
j) SUM=$((SUM+ 10));;
k) SUM=$((SUM+ 11));;
l) SUM=$((SUM+ 12));;
m) SUM=$((SUM+ 13));;
n) SUM=$((SUM+ 14));;
o) SUM=$((SUM+ 15));;
p) SUM=$((SUM+ 16));;
q) SUM=$((SUM+ 17));;
r) SUM=$((SUM+ 18));;
s) SUM=$((SUM+ 19));;
t) SUM=$((SUM+ 20));;
u) SUM=$((SUM+ 21));;
v) SUM=$((SUM+ 22));;
w) SUM=$((SUM+ 23));;
x) SUM=$((SUM+ 24));;
y) SUM=$((SUM+ 25));;
z) SUM=$((SUM+ 26));;
A) SUM=$((SUM+ 27));;
B) SUM=$((SUM+ 28));;
C) SUM=$((SUM+ 29));;
D) SUM=$((SUM+ 30));;
E) SUM=$((SUM+ 31));;
F) SUM=$((SUM+ 32));;
G) SUM=$((SUM+ 33));;
H) SUM=$((SUM+ 34));;
I) SUM=$((SUM+ 35));;
J) SUM=$((SUM+ 36));;
K) SUM=$((SUM+ 37));;
L) SUM=$((SUM+ 38));;
M) SUM=$((SUM+ 39));;
N) SUM=$((SUM+ 40));;
O) SUM=$((SUM+ 41));;
P) SUM=$((SUM+ 42));;
Q) SUM=$((SUM+ 43));;
R) SUM=$((SUM+ 44));;
S) SUM=$((SUM+ 45));;
T) SUM=$((SUM+ 46));;
U) SUM=$((SUM+ 47));;
V) SUM=$((SUM+ 48));;
W) SUM=$((SUM+ 49));;
X) SUM=$((SUM+ 50));;
Y) SUM=$((SUM+ 51));;
Z) SUM=$((SUM+ 52));;
*) SUM=0; return;;
esac
done
}
# I think it's faster to return the value in a var than to echo it in a sub process.
# Try summing the word by building an arithmetic expression using sed and then evaluating
# the expression.
# Requires: gen_chars
sum_word3() {
SUM=$(( $(eval echo -n `echo -n $1 | sed -E -ne 's,.,$_&+,pg'`) 0))
#echo "SUM($1)=$SUM"
}
# Filter out all words of incorrect length
# Sum the characters in the word: i.e. a=1, b=2, ... and "abbc" = 1+2+2+3 = 8
alg3() {
gen_chars
sortedWord=`echo $word | grep -o . | sort | tr -d '\n'`
sum_word $word
word_sum=$SUM
grep_string="^`echo -n $word | tr 'a-zA-Z' '.'`\$"
grep "$grep_string" "$dict" | \
while read line
do
sum_word $line
line_sum=$SUM
if [ $word_sum == $line_sum ]
then
check_sorted_versus_not $sortedWord $line
fi
done
}
# Filter out all words of incorrect length
# Sum the characters in the word: i.e. a=1, b=2, ... and "abbc" = 1+2+2+3 = 8
# Use sum_word2
alg4() {
sortedWord=`echo $word | grep -o . | sort | tr -d '\n'`
sum_word2 $word
word_sum=$SUM
grep_string="^`echo -n $word | tr 'a-zA-Z' '.'`\$"
grep "$grep_string" "$dict" | \
while read line
do
sum_word2 $line
line_sum=$SUM
if [ $word_sum == $line_sum ]
then
check_sorted_versus_not $sortedWord $line
fi
done
}
# Filter out all words of incorrect length
# Sum the characters in the word: i.e. a=1, b=2, ... and "abbc" = 1+2+2+3 = 8
# Use sum_word3
alg5() {
gen_chars
sortedWord=`echo $word | grep -o . | sort | tr -d '\n'`
sum_word3 $word
word_sum=$SUM
grep_string="^`echo -n $word | tr 'a-zA-Z' '.'`\$"
grep "$grep_string" "$dict" | \
while read line
do
sum_word3 $line
line_sum=$SUM
if [ $word_sum == $line_sum ]
then
check_sorted_versus_not $sortedWord $line
fi
done
}
# I think it's faster to return the value in a var than to echo it in a sub process.
# Try summing the word one char at a time using a case statement.
# Place results in a histogram
sum_word4() {
SUM=(0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0
0)
# parsing input one character at a time
for ((i=0; i < ${#1}; i++))
do
ACHAR=${1:i:1}
case $ACHAR in
a) SUM[1]=$((SUM[ 1] + 1));;
b) SUM[2]=$((SUM[ 2] + 1));;
c) SUM[3]=$((SUM[ 3] + 1));;
d) SUM[4]=$((SUM[ 4] + 1));;
e) SUM[5]=$((SUM[ 5] + 1));;
f) SUM[6]=$((SUM[ 6] + 1));;
g) SUM[7]=$((SUM[ 7] + 1));;
h) SUM[8]=$((SUM[ 8] + 1));;
i) SUM[9]=$((SUM[ 9] + 1));;
j) SUM[10]=$((SUM[10] + 1));;
k) SUM[11]=$((SUM[11] + 1));;
l) SUM[12]=$((SUM[12] + 1));;
m) SUM[13]=$((SUM[13] + 1));;
n) SUM[14]=$((SUM[14] + 1));;
o) SUM[15]=$((SUM[15] + 1));;
p) SUM[16]=$((SUM[16] + 1));;
q) SUM[17]=$((SUM[17] + 1));;
r) SUM[18]=$((SUM[18] + 1));;
s) SUM[19]=$((SUM[19] + 1));;
t) SUM[20]=$((SUM[20] + 1));;
u) SUM[21]=$((SUM[21] + 1));;
v) SUM[22]=$((SUM[22] + 1));;
w) SUM[23]=$((SUM[23] + 1));;
x) SUM[24]=$((SUM[24] + 1));;
y) SUM[25]=$((SUM[25] + 1));;
z) SUM[26]=$((SUM[26] + 1));;
A) SUM[27]=$((SUM[27] + 1));;
B) SUM[28]=$((SUM[28] + 1));;
C) SUM[29]=$((SUM[29] + 1));;
D) SUM[30]=$((SUM[30] + 1));;
E) SUM[31]=$((SUM[31] + 1));;
F) SUM[32]=$((SUM[32] + 1));;
G) SUM[33]=$((SUM[33] + 1));;
H) SUM[34]=$((SUM[34] + 1));;
I) SUM[35]=$((SUM[35] + 1));;
J) SUM[36]=$((SUM[36] + 1));;
K) SUM[37]=$((SUM[37] + 1));;
L) SUM[38]=$((SUM[38] + 1));;
M) SUM[39]=$((SUM[39] + 1));;
N) SUM[40]=$((SUM[40] + 1));;
O) SUM[41]=$((SUM[41] + 1));;
P) SUM[42]=$((SUM[42] + 1));;
Q) SUM[43]=$((SUM[43] + 1));;
R) SUM[44]=$((SUM[44] + 1));;
S) SUM[45]=$((SUM[45] + 1));;
T) SUM[46]=$((SUM[46] + 1));;
U) SUM[47]=$((SUM[47] + 1));;
V) SUM[48]=$((SUM[48] + 1));;
W) SUM[49]=$((SUM[49] + 1));;
X) SUM[50]=$((SUM[50] + 1));;
Y) SUM[51]=$((SUM[51] + 1));;
Z) SUM[52]=$((SUM[52] + 1));;
*) SUM[53]=-1; return;;
esac
done
#echo ${SUM[*]}
}
# Check if two histograms are equal
hist_are_equal() {
# Array sizes differ?
[ ${#_h1[*]} != ${#SUM[*]} ] && return 1
# parsing input one index at a time
for ((i=0; i < ${#_h1[*]}; i++))
do
[ ${_h1[i]} != ${SUM[i]} ] && return 1
done
return 0
}
# Check if two histograms are equal
hist_are_equal2() {
# Array sizes differ?
local size=${#_h1[*]}
[ $size != ${#SUM[*]} ] && return 1
# parsing input one index at a time
for ((i=0; i < $size; i++))
do
[ ${_h1[i]} != ${SUM[i]} ] && return 1
done
return 0
}
# Filter out all words of incorrect length
# Use sum_word4 which generates a histogram of character frequency
alg6() {
sum_word4 $word
_h1=("${SUM[@]}")
grep_string="^`echo -n $word | tr 'a-zA-Z' '.'`\$"
grep "$grep_string" "$dict" | \
while read line
do
sum_word4 $line
if hist_are_equal
then
echo $line
fi
done
}
# Filter out all words of incorrect length
# Use sum_word4 which generates a histogram of character frequency
alg7() {
sum_word4 $word
_h1=("${SUM[@]}")
grep_string="^`echo -n $word | tr 'a-zA-Z' '.'`\$"
grep "$grep_string" "$dict" | \
while read line
do
sum_word4 $line
if hist_are_equal2
then
echo $line
fi
done
}
run_test() {
echo alg$1
eval time alg$1
}
#run_test 1
#run_test 2
#run_test 3
run_test 4
#run_test 5
run_test 6
#run_test 7
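The filtering order used by alg4 (length first, then letter sum, then the full sort and compare) can be sketched compactly. This is a Python illustration under stated assumptions, not a port of the script: the names `letter_sum` and `find_anagrams_prefiltered` are made up here, and a word containing a non-letter gets the sum `None` rather than 0.

```python
import string

# Same letter values as the script: a=1 ... z=26, A=27 ... Z=52.
LETTER_VALUE = {ch: i + 1 for i, ch in enumerate(string.ascii_letters)}


def letter_sum(word):
    """Sum of the letter values, or None if the word has a non-letter."""
    total = 0
    for ch in word:
        if ch not in LETTER_VALUE:
            return None
        total += LETTER_VALUE[ch]
    return total


def find_anagrams_prefiltered(word, candidates):
    """Apply the cheap checks first and sort only the survivors."""
    target_sum = letter_sum(word)
    target_key = sorted(word)
    matches = []
    for c in candidates:
        if len(c) != len(word):          # cheap check 1: length
            continue
        if letter_sum(c) != target_sum:  # cheap check 2: letter sum
            continue
        if sorted(c) == target_key:      # expensive check, rarely reached
            matches.append(c)
    return matches
```

The sum is only a prefilter: "acs" and "bat" both sum to 23, so the final sorted comparison is still needed to reject such collisions.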

#!/usr/bin/perl
$myword=join("", sort split (//, $ARGV[0]));
shift;
while (<>) {
chomp;
print "$_\n" if (join("", sort split (//)) eq $myword);
}
Use it like this:
bla.pl < /usr/dict/words searchword

You want to find words containing only a given set of characters. A regex for that would be:
'^[letters_you_care_about]*$'
So, you could do:
grep "^[$W]*$" /usr/dict/words
The '^' matches the beginning of the line and '$' the end, so the whole line has to match, not just part of it (this is what excludes "table").
'[' and ']' define the set of characters allowed at one position. We use this to find words in /usr/dict/words that contain only the characters in $W.
The '*' repeats the preceding '[...]' element zero or more times, so the pattern matches a word of any length whose characters are all in $W.
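To see what such a character-class pattern accepts, here is a small Python sketch of the same idea. The function name `only_these_letters` is an assumption for illustration, and `letters` is assumed to be non-empty.

```python
import re


def only_these_letters(letters, candidates):
    """Keep candidates made up solely of characters from letters."""
    # Equivalent in spirit to grep "^[$W]*$": fullmatch anchors both ends.
    pattern = re.compile("[" + re.escape(letters) + "]*")
    return [c for c in candidates if pattern.fullmatch(c)]
```

With the letters "bat" this keeps "bat" and "tab" and drops "table", but it also keeps words such as "tat" or "abba", since the class says nothing about how often each letter occurs. For strict permutations a length or letter-count check would still be needed afterwards.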

So we have the following:
n = length of input word
L = lines in dictionary file
If n tends to be small and L tends to be huge, might we be better off finding all permutations of the input word and looking for those, rather than doing something (like sorting) to all L lines of the dictionary file? (Actually, since finding all permutations of a word is O(n!), and we have to run through the entire dictionary file once for each word, maybe not, but I wrote the code anyway.)
This is Perl - I know you wanted command-line operations but I don't have a way to do that in shell script that's not super-hacky:
sub dedupe {
    my (@list) = @_;
    my (@new_list, %seen_entries, $entry);
    foreach $entry (@list) {
        if (!(defined($seen_entries{$entry}))) {
            push(@new_list, $entry);
            $seen_entries{$entry} = 1;
        }
    }
    return @new_list;
}
sub find_all_permutations {
    my ($word) = @_;
    my (@permutations, $subword, $letter, $rest_of_word, $i);
    if (length($word) == 1) {
        push(@permutations, $word);
    } else {
        for ($i=0; $i<length($word); $i++) {
            $letter = substr($word, $i, 1);
            $rest_of_word = substr($word, 0, $i) . substr($word, $i + 1);
            foreach $subword (find_all_permutations($rest_of_word)) {
                push(@permutations, $letter . $subword);
            }
        }
    }
    return @permutations;
}
$words_file = '/usr/share/dict/words';
$word = 'table';
@all_permutations = dedupe(find_all_permutations($word));
foreach $permutation (@all_permutations) {
    if (`grep -c -m 1 ^$permutation\$ $words_file` == 1) {
        print $permutation . "\n";
    }
}
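A variation on the permutation approach, sketched in Python: instead of running one grep per permutation, the dictionary is loaded into a set once and intersected with the set of permutations. The function name and the in-memory word list are assumptions for illustration; the n! growth in the number of permutations still applies.

```python
from itertools import permutations


def find_anagrams_by_permutation(word, dictionary_words):
    """Generate every permutation of word and keep those in the dictionary."""
    # Building a set removes duplicate permutations (e.g. for "look").
    wanted = {"".join(p) for p in permutations(word)}
    # One pass over the dictionary to build the set, then a set intersection.
    return sorted(wanted & set(dictionary_words))
```

For short words this does a single pass over the dictionary; for long words the permutation set itself becomes the bottleneck.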

This utility might interest you:
an -w "tab" -m 3
...gives bat and tab only.
The original author seems to not be around any more, but you can find information at http://packages.qa.debian.org/a/an.html (even if you don't want to use it itself, the source might be worth a look).

Related

Unix - circular shift of rows and columns in a file

Given a file that contains something like this:
1 2 3 4
5 6 7 8
a b c d
e f g h
Is there any unix command that I could use to circular shift the rows and columns?
I am looking for something like say,
circular_shift -r 2 <file> (shift row by 2) to give :
a b c d
e f g h
1 2 3 4
5 6 7 8
and
circular_shift -c 2 <file> (shift column by 2) to give :
3 4 1 2
7 8 5 6
c d a b
g h e f
Thanks!
Using awk for row shift processing the file twice:
$ awk -v r=2 'NR==FNR && FNR>r || NR>FNR && FNR<=r' file file
a b c d
e f g h
1 2 3 4
5 6 7 8
Basically it prints the records with FNR > r on the first pass over the file and those with FNR <= r on the second.
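The effect of the two passes can be sketched on an in-memory list of lines. This Python snippet is only an illustration of the selection logic; the function name `shift_rows` is an assumption.

```python
def shift_rows(lines, r):
    """Move the first r lines to the end, like the two-pass awk command."""
    first_pass = lines[r:]   # records with FNR > r
    second_pass = lines[:r]  # records with FNR <= r
    return first_pass + second_pass
```

For the sample file and r=2 this yields the "a b c d" and "e f g h" lines first, followed by the two numeric lines.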
Edit: Version regarding records and fields:
$ awk -v r=1 -v c=1 '
NR==FNR && FNR>r || NR>FNR && FNR<=r {
j=0;
for(i=c+1;++j<=NF;i=(i<NF?i+1:1)){
printf "%s%s",$i,(i==c?ORS:OFS)
}
}
' foo foo
6 7 8 5
b c d a
f g h e
2 3 4 1
(Pretty much untested as I'm in a meeting... it fails at least for c=0)
Another solution using multidimensional arrays in gawk
circular_shift.awk
{for(i=1; i<=NF; ++i){d[NR][i]=$i}}
END{
c=c%NF; r=r%NR
for(i=1; i<=NR; ++i){
nr = i + (i>r?0:NR) - r
for(j=1; j<=NF; ++j){
nc = j + (j>c?0:NF) - c
printf "%s%s", d[nr][nc], (j!=NF?OFS:ORS)
}
}
}
awk -vr=2 -f circular_shift.awk file
a b c d
e f g h
1 2 3 4
5 6 7 8
awk -vc=2 -f circular_shift.awk file
3 4 1 2
7 8 5 6
c d a b
g h e f
awk -vr=2 -vc=2 -f circular_shift.awk file
c d a b
g h e f
3 4 1 2
7 8 5 6
Shifting Rows
You can use head, tail and the shell:
function circular_shift() {
n=$1
file=$2
tail -n +"$((n+1))" "$file"
head -n "$n" "$file"
}
Call the function like this:
circular_shift 2 <file>
One restriction: the function above only works for n <= nlines(file). To lift that restriction, you need to know the number of lines in the file in advance and use the modulo operator:
function circular_shift() {
n=$1
file=$2
len="$(wc -l < "$file")"
n=$((n%len))
tail -n +"$((n+1))" "$file"
head -n "$n" "$file"
}
Now try to call:
circular_shift 6 <file>
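The role of the modulo step can be checked with a short Python sketch. The function name `shift_rows_mod` is an assumption; the point is only that reducing n modulo the line count makes a shift by 6 on a four-line file behave like a shift by 2.

```python
def shift_rows_mod(lines, n):
    """Circularly shift lines up by n, for any non-negative n."""
    if not lines:
        return []
    n %= len(lines)  # e.g. 6 % 4 == 2, so whole turns are dropped
    return lines[n:] + lines[:n]
```

Without the reduction, slicing at an index past the end (like `tail -n +7` on a four-line file) would return nothing for the first part and the whole input for the second, leaving the order unchanged.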
Shifting Columns
For the column shift I would use awk:
column-shift.awk
{
    n = n % NF
    c = 1
    # the last n fields come first ...
    for(i=NF-n+1; i<=NF; i++) {
        a[c++] = $i
    }
    # ... followed by the remaining fields
    for(i=1; i<NF-n+1; i++) {
        a[c++] = $i
    }
    for(i=1; i<c; i++) {
        $i = a[i]
    }
    print
}
Wrap it in a shell function:
function column_shift() {
n="$1"
file="$2"
awk -v n="$n" -f column-shift.awk "$file"
}
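For comparison, the same field rotation on a single line can be sketched in Python. The function name `shift_columns` is an assumption, and fields are assumed to be separated by whitespace and rejoined with single spaces, as awk does with the default FS and OFS.

```python
def shift_columns(line, n):
    """Move the last n fields of line to the front, as column-shift.awk does."""
    fields = line.split()
    if not fields:
        return line
    n %= len(fields)
    cut = len(fields) - n  # index of the first field that moves to the front
    return " ".join(fields[cut:] + fields[:cut])
```

With four fields and n=2 the direction does not matter ("1 2 3 4" becomes "3 4 1 2"), but for n=1 this sketch, like the awk script, gives "4 1 2 3".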
@Vivek V K, try:
To move the rows up by a given count:
awk -vcount=2 'NR>count{print;next} NR<=count{Q=Q?Q ORS $0:$0} END{print Q}' Input_file
To shift the fields, try the following:
awk -vcount=2 '{for(i=count+1;i<=NF;i++){Q=Q?Q FS $i:$i};for(j=1;j<=count;j++){P=P?P FS $j:$j};print Q FS P;Q=P=""}' Input_file
awk -v C=$1 -v R=$2 '
function PrintReverse () {
if( ! R ) return
for( i=1; i>=0; i--) {
for( j=1; j<=R; j++) {
#print "DEBUG:: i: "i " j:" j " i * R + j :" i * R + j " lr:" lr
print L[ i * R + j ]
L[ i * R + j ] = ""
}
}
}
{
if( C ) {
# Reverse Column
for ( i=1; i<=NF; i+=2*C) {
for( j=0; j<C; j++) {
#print "DEBUG:: i: "i " j:" j " NF:" NF
tmp = $(i+j)
$(i+j) = $(i+j+C)
$(i+j+C) = tmp
}
}
$1=$1
}
if ( R ) {
# Line buffer
lr = ( FNR - 1 ) % ( R * 2 ) + 1
L[ lr] = $0
}
else print
}
lr >= ( R * 2) { PrintReverse() }
END { if( lr < ( R * 2 )) PrintReverse() }
' YourFile
This handles both of your shifts.
R is the number of rows and C the number of columns to shift by.
It uses two nested loops, which is not the fastest approach but the most explicit one for understanding the concept in this case.
Rows: lines are loaded into a buffer of 2 * R lines, and the two halves of the buffer are printed in reverse order.
Columns: fields are swapped in groups of 2 * C, each field exchanging its content with the field C positions further on.
Rows are processed once the buffer is full (that is, every R * 2 lines).
Columns are processed on every line.
The tests ( C ), ( ! R ), ... allow a single shift (rows only or columns only).

Given XOR & SUM of two numbers. How to find the numbers?

Given XOR & SUM of two numbers. How to find the numbers?
For example, x = a+b, y = a^b; if x,y are given, how to get a, b?
And if can't, give the reason.
This cannot be done reliably. A single counter-example is enough to destroy any theory and, in your case, that example is 0, 100 and 4, 96. Both of these sum to 100 and xor to 100 as well:
      0 = 0000 0000              4 = 0000 0100
    100 = 0110 0100             96 = 0110 0000
          ---- ----                  ---- ----
    xor   0110 0100 = 100      xor   0110 0100 = 100
Hence given a sum of 100 and an xor of 100, you cannot know which of the possibilities generated that situation.
For what it's worth, this program checks the possibilities with just the numbers 0..255:
#include <stdio.h>
static void output (unsigned int a, unsigned int b) {
printf ("%u:%u = %u %u\n", a+b, a^b, a, b);
}
int main (void) {
unsigned int limit = 256;
unsigned int a, b;
output (0, 0);
for (b = 1; b != limit; b++)
output (0, b);
for (a = 1; a != limit; a++)
for (b = 1; b != limit; b++)
output (a, b);
return 0;
}
You can then take that output and massage it to give you all the repeated possibilities:
testprog | sed 's/ =.*$//' | sort | uniq -c | grep -v ' 1 ' | sort -k1 -n -r
which gives:
255 255:255
128 383:127
128 319:191
128 287:223
128 271:239
128 263:247
:
and so on.
Even in that reduced set, there are quite a few combinations which generate the same sum and xor, the worst being the large number of possibilities that generate a sum/xor of 255/255, which are:
255:255 = 0 255
255:255 = 1 254
255:255 = 2 253
255:255 = <n> <255-n>, for n = 3 thru 254 inclusive
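The same kind of collision count can be sketched in a few lines of Python. Note the assumptions: the function name `collision_counts` is made up here, and unlike the C program above this version visits every ordered pair including those with b = 0, so individual counts can differ slightly from the listing.

```python
from collections import Counter


def collision_counts(limit=256):
    """Count how many ordered pairs (a, b) share each (sum, xor) result."""
    counts = Counter()
    for a in range(limit):
        for b in range(limit):
            counts[(a + b, a ^ b)] += 1
    return counts
```

Any key with a count above 1 is a (sum, xor) combination that cannot be mapped back to a unique pair, for example `collision_counts()[(100, 100)]`.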
It has already been shown that it can't be done, but here are two further reasons why.
For the (rather large) subset of a's and b's (a & b) == 0, you have a + b == (a ^ b) (because there can be no carries) (the reverse implication does not hold). In such a case, you can, for each bit that is 1 in the sum, choose which one of a or b contributed that bit. Obviously this subset does not cover the entire input, but it at least proves that it can't be done in general.
Furthermore, there exist many pairs of (x, y) such that there is no solution to a + b == x && (a ^ b) == y, for example (there are more than just these) all pairs (x, y) where ((x ^ y) & 1) == 1 (ie one is odd and the other is even), because the lowest bit of the xor and the sum are equal (the lowest bit has no carry-in). By a simple counting-argument, that must mean that at least some pairs (x, y) must have multiple solutions: clearly all pairs of (a, b) have some pair of (x, y) associated with them, so if not all pairs of (x, y) can be used, some other pairs (x, y) must be shared.
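Both observations are easy to check by brute force over a small range. The Python sketch below is only a sanity check with made-up function names; it does not replace the arguments above.

```python
def no_carry_means_sum_equals_xor(limit=64):
    """If a and b share no set bits, then a + b equals a ^ b."""
    return all(a + b == a ^ b
               for a in range(limit)
               for b in range(limit)
               if a & b == 0)


def lowest_bits_always_agree(limit=64):
    """The lowest bit of a + b always equals the lowest bit of a ^ b."""
    return all(((a + b) ^ (a ^ b)) & 1 == 0
               for a in range(limit)
               for b in range(limit))
```

Since the lowest bits always agree, a request such as sum 3 and xor 2 (one odd, one even) can have no solution.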
Here is the solution to get all such pairs
Logic:
let the numbers be a and b, we know
s = a + b
x = a ^ b
therefore
x = (s-b) ^ b
Since we know x and s, we can try every integer b from 0 to s and keep the values that satisfy this last equation.
Here is the code for this:
public List<Pair<Integer>> pairs(int s, int x) {
List<Pair<Integer>> pairs = new ArrayList<Pair<Integer>>();
for (int i = 0; i <= s; i++) {
int calc = (s - i) ^ i;
if (calc == x) {
pairs.add(new Pair<Integer>(i, s - i));
}
}
return pairs;
}
The Pair class is defined as:
class Pair<T> {
T a;
T b;
public String toString() {
return a.toString() + "," + b.toString();
}
public Pair(T a, T b) {
this.a = a;
this.b = b;
}
}
Code to test this:
public static void main(String[] args) {
List<Pair<Integer>> pairs = new Test().pairs(100,100);
for (Pair<Integer> p : pairs) {
System.out.println(p);
}
}
Output:
0,100
4,96
32,68
36,64
64,36
68,32
96,4
100,0
If you have a and b, then the sum satisfies a+b = (a^b) + (a&b)*2. This identity may be useful to you.
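One way to use that identity: with s = a+b and x = a^b, the value (s - x)/2 must equal a&b, which decides whether any solution exists and lets all of them be enumerated without scanning 0..s. The Python sketch below assumes non-negative integers; the function name `solve_sum_xor` is hypothetical.

```python
def solve_sum_xor(s, x):
    """All ordered pairs (a, b) of non-negative ints with a+b == s, a^b == x."""
    if s < x or (s - x) % 2:
        return []              # (s - x) must be a non-negative even number
    both = (s - x) // 2        # this is a & b: bits set in both numbers
    if both & x:
        return []              # a bit cannot be in both and in exactly one
    # Each set bit of x belongs to exactly one of a and b; try every split.
    xor_bits = [1 << i for i in range(x.bit_length()) if (x >> i) & 1]
    pairs = []
    for mask in range(1 << len(xor_bits)):
        a = both
        for j, bit in enumerate(xor_bits):
            if (mask >> j) & 1:
                a |= bit
        pairs.append((a, s - a))
    return sorted(pairs)
```

When a solution exists there are 2^k ordered pairs, where k is the number of set bits in x, so the answer is unique only if x is 0.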

AWK division by zero error

I'm getting a division by 0 error from my awk command. I'm not sure what is causing this as the result should not be 0.
In this case it should be printing 1.11557887 from 1.7229/1.5444.
Could it be a problem with how I assigned the variables?
This is my script:
#!/usr/bin/awk -f
FNR == 22 { measC = $2 }
FNR == 23 { refC = $2 }
factorC = refC / measC
{ print factorC }
It returns:
/usr/bin/awk: division by zero
input record number 1, file 1.txt
source line number 5
This is what my input data looks like:
#!xxx x
# x x x x x x
# x: x x
# x: x x
# x: x x x x x
# (x) x x, x x x, x.
x: x x x x
x: 3.0.0
x: x
x: 0
x: x x
x: 0
x: x x
x: x
x: 0
x: 0
x: 2
x: x x x
x: x
x: 1
x: 4
origmax: 1.5444 1.5188 1.0221 1.4932
currentmax: 1.7229 1.6888 1.1069 1.6238
Because you put factorC = refC / measC outside of a block, awk thinks you want to use that expression as a pattern. So it evaluates that expression for each line of input. On the first line of input, measC hasn't been defined yet, so it defaults to zero.
I think you want this:
#!/usr/bin/awk -f
FNR == 22 { measC = $2 }
FNR == 23 { refC = $2 }
END {
factorC = refC / measC
print factorC
}
or this:
#!/usr/bin/awk -f
FNR == 22 { measC = $2 }
FNR == 23 {
refC = $2
factorC = refC / measC
print factorC
}
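The failure mode described above can be reproduced with a rough Python emulation of how awk runs the original rules on every record. This is a sketch under assumptions: the function name `run_buggy_script` is made up, and only the behaviour relevant here (unset variables counting as 0, a bare expression acting as a pattern) is modelled.

```python
def run_buggy_script(records):
    """Emulate the original awk program; return the record number where it fails."""
    meas_c = 0.0  # awk treats an unset variable as 0 in arithmetic
    ref_c = 0.0
    for fnr, record in enumerate(records, start=1):
        fields = record.split()
        if fnr == 22:
            meas_c = float(fields[1])
        if fnr == 23:
            ref_c = float(fields[1])
        try:
            # The bare "factorC = refC / measC" is evaluated for every record.
            factor_c = ref_c / meas_c
        except ZeroDivisionError:
            return fnr
        if factor_c:  # its value decides whether the print action runs
            print(factor_c)
    return None
```

Fed with 23 records shaped like the sample input, the emulation stops at record 1, matching the "input record number 1" in the awk error message.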
Your script says:
FNR == 22 { measC = $2 }
So ... when only, say, five lines of your input file have been read by this awk script, what is the value of measC?
I'll tell you a secret. It will be zero. Because nothing has assigned anything else to measC yet.
Also, your line:
factorC = refC / measC
is outside the block, so it's being used to evaluate whether the { print factorC } should be run. And because it's a condition, it gets run for every line. And wouldn't you know it, before line 22, measC is 0.
I don't understand the data or the output, so I don't know what measC should be, if anything.
What are you trying to achieve with this?

Haskell and Quadratics

I have to write a program to solve quadratics, returning a complex number result.
I've got as far as defining a complex number and declaring it an instance of Num, so that +, - and * can be used.
I've also defined a data type for a quadratic equation, but I'm now stuck with the actual solving of the quadratic. My math is quite poor, so any help would be greatly appreciated...
data Complex = C {
re :: Float,
im :: Float
} deriving Eq
-- Display complex numbers in the normal way
instance Show Complex where
show (C r i)
| i == 0 = show r
| r == 0 = show i++"i"
| r < 0 && i < 0 = show r ++ " - "++ show (C 0 (i*(-1)))
| r < 0 && i > 0 = show r ++ " + "++ show (C 0 i)
| r > 0 && i < 0 = show r ++ " - "++ show (C 0 (i*(-1)))
| r > 0 && i > 0 = show r ++ " + "++ show (C 0 i)
-- Define algebraic operations on complex numbers
instance Num Complex where
fromInteger n = C (fromInteger n) 0 -- tech reasons
(C a b) + (C x y) = C (a+x) (b+y)
(C a b) * (C x y) = C (a*x - b*y) (a*y + b*x)
negate (C a b) = C (-a) (-b)
instance Fractional Complex where
fromRational r = C (fromRational r) 0 -- tech reasons
recip (C a b) = C (a/((a^2)+(b^2))) ((-b)/((a^2)+(b^2)))
root :: Complex -> Complex
root (C x y)
| y == 0 && x == 0 = C 0 0
| y == 0 && x > 0 = C (sqrt ( ( x + sqrt ( (x^2) + 0 ) ) / 2 ) ) 0
| otherwise = C (sqrt ( ( x + sqrt ( (x^2) + (y^2) ) ) / 2 ) ) ((y/(2*(sqrt ( ( x + sqrt ( (x^2) + (y^2) ) ) / 2 ) ) ) ) )
-- quadratic polynomial : a.x^2 + b.x + c
data Quad = Q {
aCoeff, bCoeff, cCoeff :: Complex
} deriving Eq
instance Show Quad where
show (Q a b c) = show a ++ "x^2 + " ++ show b ++ "x + " ++ show c
solve :: Quad -> (Complex, Complex)
solve (Q a b c) = STUCK!
EDIT: I seem to have left out that the whole point of using my own complex number datatype is to learn about custom datatypes. I'm well aware that I could use Data.Complex. Any help that builds on my solution so far would be greatly appreciated.
EDIT 2: It seems that my initial question was worded horribly. I'm aware that the quadratic formula will return both (or just the one) root to me. Where I am having trouble is returning these roots as a (complex, complex) tuple with the code above.
I'm well aware that I could use the built in quadratic functions as have been displayed below, but this is not the exercise. The idea behind the exercise, and creating ones own complex number data type, is to learn about custom data types.
Like newacct said, it's just the quadratic formula:
(-b +- sqrt(b^2 - 4ac)) / (2a)
module QuadraticSolver where
import Data.Complex
data Quadratic a = Quadratic a a a deriving (Show, Eq)
roots :: (RealFloat a) => Quadratic a -> [ Complex a ]
roots (Quadratic a b c) =
if discriminant == 0
then [ numer / denom ]
else [ (numer + root_discriminant) / denom,
(numer - root_discriminant) / denom ]
where discriminant = (b*b - 4*a*c)
root_discriminant = if (discriminant < 0)
then 0 :+ (sqrt $ -discriminant)
else (sqrt discriminant) :+ 0
denom = 2*a :+ 0
numer = (negate b) :+ 0
in practice:
ghci> :l QuadraticSolver
Ok, modules loaded: QuadraticSolver.
ghci> roots (Quadratic 1 2 1)
[(-1.0) :+ 0.0]
ghci> roots (Quadratic 1 0 1)
[0.0 :+ 1.0,(-0.0) :+ (-1.0)]
And adapting to use your terms:
solve :: Quad -> (Complex, Complex)
solve (Q a b c) = ( sol (+), sol (-) )
where sol op = (op (negate b) $ root $ b*b - 4*a*c) / (2 * a)
Although I haven't tested that code
Since Haskell's sqrt can also handle complex numbers, rampion's solution can even be further simplified:
import Data.Complex
-- roots for quadratic equations with complex coefficients
croots :: (RealFloat a) =>
(Complex a) -> (Complex a) -> (Complex a) -> [Complex a]
croots a b c
| disc == 0 = [solution (+)]
| otherwise = [solution (+), solution (-)]
where disc = b*b - 4*a*c
solution plmi = plmi (-b) (sqrt disc) / (2*a)
-- roots for quadratic equations with real coefficients
roots :: (RealFloat a) => a -> a -> a -> [Complex a]
roots a b c = croots (a :+ 0) (b :+ 0) (c :+ 0)
You can also use this croots function with your own datatype, if you change the types to fit your implementation (and call your root function instead of sqrt).
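As a cross-check for whatever implementation you end up with, the same formula can be evaluated with a library complex square root. The Python sketch below uses the standard `cmath` module; the function name `quadratic_roots` is an assumption, and a is assumed to be non-zero.

```python
import cmath


def quadratic_roots(a, b, c):
    """Both roots of a*x^2 + b*x + c = 0 (real or complex coefficients, a != 0)."""
    root_disc = cmath.sqrt(b * b - 4 * a * c)  # complex square root
    return ((-b + root_disc) / (2 * a), (-b - root_disc) / (2 * a))
```

For the coefficients 1, 2, 2 the discriminant is -4, and the two roots come out as the conjugate pair -1+1i and -1-1i.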

Haskell floating point error

So I have finished creating my own complex number data type in haskell.
I've also, thanks to another question on here, got a function that will solve a quadratic equation.
The only problem now is that the code generates a parsing error in hugs, when trying to solve a quadratic with complex roots.
i.e. In hugs...
Main> solve (Q 1 2 1)
(-1.0,-1.0)
Main> solve (Q 1 2 0)
(0.0,-2.0)
Main> solve (Q 1 2 2)
(
Program error: pattern match failure: v1618_v1655 (C -1.#IND -1.#IND)
It looks to me like it's a problem after the square root has been applied, but I'm really not sure. Any help trying to pick up what is going wrong or any indications as to what this error means would be brilliant.
Thanks,
Thomas
The Code:
-- A complex number z = (re +im.i) is represented as a pair of Floats
data Complex = C {
re :: Float,
im :: Float
} deriving Eq
-- Display complex numbers in the normal way
instance Show Complex where
show (C r i)
| i == 0 = show r
| r == 0 = show i++"i"
| r < 0 && i < 0 = show r ++ " - "++ show (C 0 (i*(-1)))
| r < 0 && i > 0 = show r ++ " + "++ show (C 0 i)
| r > 0 && i < 0 = show r ++ " - "++ show (C 0 (i*(-1)))
| r > 0 && i > 0 = show r ++ " + "++ show (C 0 i)
-- Define algebraic operations on complex numbers
instance Num Complex where
fromInteger n = C (fromInteger n) 0 -- tech reasons
(C a b) + (C x y) = C (a+x) (b+y)
(C a b) * (C x y) = C (a*x - b*y) (a*y + b*x)
negate (C a b) = C (-a) (-b)
instance Fractional Complex where
fromRational r = C (fromRational r) 0 -- tech reasons
recip (C a b) = C (a/((a^2)+(b^2))) ((-b)/((a^2)+(b^2)))
root :: Complex -> Complex
root (C x y)
| y == 0 && x == 0 = C 0 0
| y == 0 && x > 0 = C (sqrt ( ( x + sqrt ( (x^2) + 0 ) ) / 2 ) ) 0
| otherwise = C (sqrt ( ( x + sqrt ( (x^2) + (y^2) ) ) / 2 ) ) ((y/(2*(sqrt ( ( x + sqrt ( (x^2) + (y^2) ) ) / 2 ) ) ) ) )
-- quadratic polynomial : a.x^2 + b.x + c
data Quad = Q {
aCoeff, bCoeff, cCoeff :: Complex
} deriving Eq
instance Show Quad where
show (Q a b c) = show a ++ "x^2 + " ++ show b ++ "x + " ++ show c
solve :: Quad -> (Complex, Complex)
solve (Q a b c) = ( sol (+), sol (-) )
where sol op = (op (negate b) $ root $ b*b - 4*a*c) / (2 * a)
The numbers in your error are NaN (printed here as -1.#IND, "indeterminate"):
(C -1.#IND -1.#IND)
In this case, you can't rely on any comparison on the floats anymore: every comparison with a NaN is False. This is in the definition of floating point numbers. Then your definition of show
show (C r i)
| i == 0 = show r
| r == 0 = show i++"i"
| r < 0 && i < 0 = show r ++ " - "++ show (C 0 (i*(-1)))
| r < 0 && i > 0 = show r ++ " + "++ show (C 0 i)
| r > 0 && i < 0 = show r ++ " - "++ show (C 0 (i*(-1)))
| r > 0 && i > 0 = show r ++ " + "++ show (C 0 i)
leaves room for a pattern match failure, because when r or i is NaN (as in the error above) none of the guards is True. You can add the following guard:
| otherwise = show r ++ "i" ++ show i
Now for the why is it like that, when you evaluate
b * b - 4 * a * c
with Q 1 2 2, you obtain -4. In root you then fall into your last case, whose second component is:
\[ \frac{y}{2\sqrt{\dfrac{x + \sqrt{x^2 + y^2}}{2}}} \]
Here x = -4 and y = 0, so x + sqrt(x^2 + y^2) = -4 + sqrt((-4)^2) == 0 and the denominator is 0. From there you're doomed: 0/0 gives a "NaN" (not a number), which then propagates through everything else.
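One way to avoid the 0/0 is to compute the imaginary part from its own square root instead of dividing by the real part. The Python sketch below illustrates that alternative formula; it is not the asker's root function, and the name `complex_root` is an assumption.

```python
import math


def complex_root(x, y):
    """Principal square root of x + y*i, returned as a (real, imag) pair."""
    modulus = math.hypot(x, y)  # sqrt(x^2 + y^2)
    # max() guards against a tiny negative value caused by rounding.
    real = math.sqrt(max(0.0, (modulus + x) / 2))
    imag = math.sqrt(max(0.0, (modulus - x) / 2))
    # The imaginary part takes the sign of y; no division is involved.
    return (real, math.copysign(imag, y))
```

For x = -4 and y = 0 this gives a real part of 0 and an imaginary part of 2, where the division-based formula produces NaN.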
Dave hit the nail on the head.
With the original code in GHCi, I get:
*Main> solve (Q 1 2 2)
(*** Exception: c.hs:(11,4)-(17,63): Non-exhaustive patterns in function show
If we update the show block:
instance Show Complex where
show (C r i)
| i == 0 = show r
| r == 0 = show i++"i"
| r < 0 && i < 0 = show r ++ " - "++ show (C 0 (i*(-1)))
| r < 0 && i > 0 = show r ++ " + "++ show (C 0 i)
| r > 0 && i < 0 = show r ++ " - "++ show (C 0 (i*(-1)))
| r > 0 && i > 0 = show r ++ " + "++ show (C 0 i)
| otherwise = "???(" ++ show r ++ " " ++ show i ++ ")"
then we get this information in GHCi:
*Main> :l c.hs
[1 of 1] Compiling Main ( c.hs, interpreted )
c.hs:22:0:
Warning: No explicit method nor default method for `abs'
In the instance declaration for `Num Complex'
c.hs:22:0:
Warning: No explicit method nor default method for `signum'
In the instance declaration for `Num Complex'
Ok, modules loaded: Main.
*Main> solve (Q 1 2 2)
(???(NaN NaN),???(NaN NaN))
I was "born and raised" on GHCi, so I don't know exactly how Hugs compares in verbosity of warnings and errors; but it looks like GHCi is a clear winner in telling you what went wrong.
Off the top of my head: It could be a problem with your definition of show for Complex.
I notice you don't have default case like this:
| otherwise = ...
Therefore if your conditions with r and i are non exhaustive you'll get a pattern match failure.
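Why no guard matches can be demonstrated by evaluating the same six conditions on NaN values. The Python sketch below mirrors the guards of the show instance; the function name `matching_guards` is an assumption for illustration.

```python
def matching_guards(r, i):
    """Return the guard conditions from show that hold for r and i."""
    guards = {
        "i == 0": i == 0,
        "r == 0": r == 0,
        "r < 0 && i < 0": r < 0 and i < 0,
        "r < 0 && i > 0": r < 0 and i > 0,
        "r > 0 && i < 0": r > 0 and i < 0,
        "r > 0 && i > 0": r > 0 and i > 0,
    }
    return [name for name, holds in guards.items() if holds]
```

For ordinary numbers at least one guard always holds, but `matching_guards(float("nan"), float("nan"))` returns an empty list, which is exactly the situation an `otherwise` case has to cover.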
