Unix - circular shift of rows and columns in a file

Given a file that contains something like this:
1 2 3 4
5 6 7 8
a b c d
e f g h
Is there any Unix command that I could use to circularly shift the rows and columns?
I am looking for something like say,
circular_shift -r 2 <file> (shift row by 2) to give :
a b c d
e f g h
1 2 3 4
5 6 7 8
and
circular_shift -c 2 <file> (shift column by 2) to give :
3 4 1 2
7 8 5 6
c d a b
g h e f
Thanks!

Using awk for row shift processing the file twice:
$ awk -v r=2 'NR==FNR && FNR>r || NR>FNR && FNR<=r' file file
a b c d
e f g h
1 2 3 4
5 6 7 8
Basically, it prints the records with FNR > r on the first pass over the file and the records with FNR <= r on the second pass.
Edit: Version regarding records and fields:
$ awk -v r=1 -v c=1 '
NR==FNR && FNR>r || NR>FNR && FNR<=r {
j=0;
for(i=c+1;++j<=NF;i=(i<NF?i+1:1)){
printf "%s%s",$i,(i==c?ORS:OFS)
}
}
' foo foo
6 7 8 5
b c d a
f g h e
2 3 4 1
(Pretty much untested as I'm in a meeting... it fails at least for c=0)
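For comparison, here is a minimal Python sketch of the same combined rotation using list slicing. Taking each shift modulo the length covers the c=0 case and shifts larger than the table. The name circular_shift and its parameters r and c are illustrative assumptions, not an existing tool.

```python
def circular_shift(rows, r=0, c=0):
    """Rotate a list of rows up by r and each row left by c (hypothetical helper)."""
    if not rows:
        return []
    r %= len(rows)                     # shifts larger than the row count wrap around
    rotated = rows[r:] + rows[:r]      # rows after the first r come first
    result = []
    for fields in rotated:
        k = c % len(fields) if fields else 0
        result.append(fields[k:] + fields[:k])
    return result


text = "1 2 3 4\n5 6 7 8\na b c d\ne f g h"
table = [line.split() for line in text.splitlines()]
for fields in circular_shift(table, r=1, c=1):
    print(" ".join(fields))
```

With r=1 and c=1 this prints the same four lines as the awk run above.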

Another solution using multidimensional arrays in gawk
circular_shift.awk
{for(i=1; i<=NF; ++i){d[NR][i]=$i}}
END{
c=c%NF; r=r%NR
for(i=1; i<=NR; ++i){
nr = i + (i>r?0:NR) - r
for(j=1; j<=NF; ++j){
nc = j + (j>c?0:NF) - c
printf "%s%s", d[nr][nc], (j!=NF?OFS:ORS)
}
}
}
awk -vr=2 -f circular_shift.awk file
a b c d
e f g h
1 2 3 4
5 6 7 8
awk -vc=2 -f circular_shift.awk file
3 4 1 2
7 8 5 6
c d a b
g h e f
awk -vr=2 -vc=2 -f circular_shift.awk file
c d a b
g h e f
3 4 1 2
7 8 5 6

Shifting Rows
You can use head, tail and the shell:
function circular_shift() {
n=$1
file=$2
tail -n +"$((n+1))" "$file"
head -n "$n" "$file"
}
Call the function like this:
circular_shift 2 <file>
One restriction: the above function only works for n <= nlines(file). To lift that restriction, determine the number of lines in the file first and apply the modulo operator:
function circular_shift() {
n=$1
file=$2
len="$(wc -l < "$file")"
n=$((n%len))
tail -n +"$((n+1))" "$file"
head -n "$n" "$file"
}
Now try to call:
circular_shift 6 <file>
Shifting Columns
For the column shift I would use awk:
column-shift.awk
{
n = n % NF
c = 1
for(i=NF-n+1; i<=NF; i++) {
a[c++] = $i
}
for(i=1; i<NF-n+1; i++) {
a[c++] = $i
}
for(i=1; i<c; i++) {
$i = a[i]
}
print
}
Wrap it in a shell function:
function column_shift() {
n="$1"
file="$2"
awk -v n="$n" -f column-shift.awk "$file"
}

@Vivek V K, try:
To move the first count rows to the end (shifting the rows upward):
awk -vcount=2 'NR>count{print;next} NR<=count{Q=Q?Q ORS $0:$0} END{print Q}' Input_file
To shift the fields, try the following:
awk -vcount=2 '{for(i=count+1;i<=NF;i++){Q=Q?Q FS $i:$i};for(j=1;j<=count;j++){P=P?P FS $j:$j};print Q FS P;Q=P=""}' Input_file

awk -v C=$1 -v R=$2 '
function PrintReverse () {
if( ! R ) return
for( i=1; i>=0; i--) {
for( j=1; j<=R; j++) {
#print "DEBUG:: i: "i " j:" j " i * R + j :" i * R + j " lr:" lr
print L[ i * R + j ]
L[ i * R + j ] = ""
}
}
}
{
if( C ) {
# Reverse Column
for ( i=1; i<=NF; i+=2*C) {
for( j=0; j<C; j++) {
#print "DEBUG:: i: "i " j:" j " NF:" NF
tmp = $(i+j)
$(i+j) = $(i+j+C)
$(i+j+C) = tmp
}
}
$1=$1
}
if ( R ) {
# Line buffer
lr = ( FNR - 1 ) % ( R * 2 ) + 1
L[ lr] = $0
}
else print
}
lr >= ( R * 2) { PrintReverse() }
END { if( lr < ( R * 2 )) PrintReverse() }
' YourFile
This performs both of your operations.
R is the number of rows and C the number of columns to move.
It uses two nested loops, which is not the fastest approach but is the most explicit one for understanding the concept in this case.
Rows are permuted through a buffer: lines are loaded into a buffer of 2 * R lines, and its two halves are then printed in swapped order.
Columns are permuted by swapping fields: the loop advances in steps of 2 * C and swaps each field with the field C positions further along.
Rows are processed once the buffer is full (that is, every 2 * R lines).
Columns are processed on every line.
The tests ( C ), ( ! R ), ... allow a single operation (rows only or columns only).

Related

Expressing Natural Number by sum of Triangular numbers

Triangular numbers are the numbers of objects that can be arranged in the shape of a triangle.
For example, 1, 3, 6, 10, 15, ... are triangular numbers.
   o
  o o
 o o o
o o o o
is the shape of the n=4 triangular number.
What I have to do: a natural number N is given, and I have to print
every way of expressing N as a sum of triangular numbers.
if N = 4
output should be
1 1 1 1
1 3
3 1
else if N = 6
output should be
1 1 1 1 1 1
1 1 1 3
1 1 3 1
1 3 1 1
3 1 1 1
3 3
6
I have searched for a few hours and couldn't find an answer.
Please help.
(I am not sure whether this helps, but I found that
if T(k) is the number of such sums for N = k, then
T(k) = T(k-1) + T(k-3) + T(k-6) + ... + T(k-p) while (k-p) >= 0,
where p is a triangular number and T(0) = 1.)
Here's Code for k=-1(Read comments below)
#include <iostream>
#include <vector>
using namespace std;
long TriangleNumber(int index);
void PrintTriangles(int index);
vector<long> triangleNumList(450); //(450 power raised by 2 is about 200,000)
vector<long> storage(100001);
int main() {
int n, p;
for (int i = 0; i < 450; i++) {
triangleNumList[i] = i * (i + 1) / 2;
}
cin >> n >> p;
cout << TriangleNumber(n);
if (p == 1) {
//PrintTriangles();
}
return 0;
}
long TriangleNumber(int index) {
int iter = 1, out = 0;
if (index == 1 || index == 0) {
return 1;
}
else {
if (storage[index] != 0) {
return storage[index];
}
else {
while (triangleNumList[iter] <= index) {
storage[index] = ( storage[index] + TriangleNumber(index - triangleNumList[iter]) ) % 1000000;
iter++;
}
}
}
return storage[index];
}
void PrintTriangles(int index) {
// What Algorithm?
}
Here is some recursive Python 3.6 code that prints the sums of triangular numbers that total the inputted target. I prioritized simplicity of code in this version. You may want to add error-checking on the input value, counting the sums, storing the lists rather than just printing them, and wrapping the entire routine into a function. Setting up the list of triangular numbers could also be done in fewer lines of code.
Your code saved time but worsened memory usage by "memoizing" the triangular numbers (storing and reusing them rather than always calculating them when needed). You could do the same to the sum lists, if you like. It is also possible to make this more in the dynamic programming style: find the sum lists for n=1 then for n=2 etc. I'll leave all that to you.
""" Given a positive integer n, print all the ways n can be expressed as
the sum of triangular numbers.
"""
def print_sums_of_triangular_numbers(prefix, target):
    """Print sums totalling to target, each after printing the prefix."""
    if target == 0:
        print(*prefix)
        return
    for tri in triangle_num_list:
        if tri > target:
            return
        print_sums_of_triangular_numbers(prefix + [tri], target - tri)
n = int(input('Value of n ? '))
# Set up list of triangular numbers not greater than n
triangle_num_list = []
index = 1
tri_sum = 1
while tri_sum <= n:
    triangle_num_list.append(tri_sum)
    index += 1
    tri_sum += index
# Print the sums totalling to n
print_sums_of_triangular_numbers([], n)
Here are the printouts of two runs of this code:
Value of n ? 4
1 1 1 1
1 3
3 1
Value of n ? 6
1 1 1 1 1 1
1 1 1 3
1 1 3 1
1 3 1 1
3 1 1 1
3 3
6
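The counting recurrence mentioned in the question can also be written in the bottom-up dynamic programming style suggested above. The following is only a sketch: the function names are illustrative, and it counts the sums rather than printing them.

```python
def triangular_numbers_up_to(n):
    """Return the triangular numbers 1, 3, 6, ... that do not exceed n."""
    result, index, tri = [], 1, 1
    while tri <= n:
        result.append(tri)
        index += 1
        tri += index
    return result


def count_sums(n):
    """Count the ordered sums of triangular numbers that total n."""
    tris = triangular_numbers_up_to(n)
    ways = [0] * (n + 1)
    ways[0] = 1                       # the empty sum is the one way to reach 0
    for k in range(1, n + 1):
        # the last term of a sum for k is some triangular number t <= k
        ways[k] = sum(ways[k - t] for t in tris if t <= k)
    return ways[n]


print(count_sums(4))   # 3, one per line printed for n = 4
print(count_sums(6))   # 7, one per line printed for n = 6
```

Each entry is computed once from smaller entries, so no recursion or memo table is needed.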

Turning row-based data into columns by header

I have one (fairly large) file, formatted like such:
SET1
A B C D E F G
SET2
H I J K L M
SETX
(...)
etc.
I would prefer to have them
SET1 SET2 SETX
A H (...)
B I
C J
D K
E L
F M
G
Note that the columns are unequally long, and they are not ordered by size. My file is too big to use the column function inbuilt in unix, and attempts at getting cute by splicing the file and then pasting it together have had problematic results (that is, it has resulted in the empty columns getting the same content as the separator, which doesn't work for my purposes - they both ended up being "\t"). Note that each set may contain several hundred entries, and I have thousands of sets, making awk impractical (at least with my admittedly limited skills there).
Ideally, the output should be readable in R, but at this point I'd be very happy for something that is practically translatable into R input. Note that I can totally live with this having a non-whitespace separator if that is more practical.
Many thanks in advance for any help! Working in an external linux environment.
Edit:
I also have the file available as
SET1
A
B
C
D
E
F
G
SET2
H
I
J
K
L
M
If that could make it easier.
I guess this is more what you wanted:
awk -v OFS="\t" '
/^SET/ {sets[++cols]=$0; set=$0; max_recs=(c>max_recs?c:max_recs); c=0; next}
NF{a[cols,++c]=$0}
END {
max_recs=(c>max_recs?c:max_recs) # the last set has no following header, so count it here
for (i=1;i<=cols; i++) printf "%s%s", sets[i], OFS
print ""
for (i=1; i<=max_recs; i++) {
for (j=1; j<=cols; j++) printf "%s%s", a[j,i], OFS
print ""
}
}' file
For this given input:
SET1
B
C
D
E
F
G
SET2
H
I
J
K
L
M
AAA
SET3
A
B
C
D
It returns:
$ awk -v OFS="\t" '/^SET/ {sets[++cols]=$0; set=$0; max_recs=(c>max_recs?c:max_recs); c=0; next} NF{a[cols,++c]=$0} END {for (i=1;i<=cols; i++) printf "%s%s", sets[i], OFS; print ""; for (i=1; i<=max_recs; i++) { for (j=1; j<=cols; j++) printf "%s%s", a[j,i], OFS; print ""}}' file
SET1 SET2 SET3
B H A
C I B
D J C
E K D
F L
G M
AAA
Previous solution with just one block.
You can use paste to show files side by side.
In this case, let's use head and tail to get each half of the file. Then xargs -n1 prints one item per line, and the two streams are ready to be pasted:
paste -d"\t" <(head -2 file | xargs -n1) <(tail -2 file | xargs -n1)
For your given input it returns:
SET1 SET2
A H
B I
C J
D K
E L
F M
G
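Since the question allows other tools, here is a hedged Python sketch of the same transposition for the one-entry-per-line layout. It relies on itertools.zip_longest to pad the shorter sets with empty cells. The function name sets_to_columns and the assumption that every header line starts with "SET" are illustrative only.

```python
from itertools import zip_longest


def sets_to_columns(lines, sep="\t"):
    """Turn 'SET...' headers followed by one entry per line into columns."""
    columns = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("SET"):
            columns.append([line])       # start a new column with its header
        elif columns:
            columns[-1].append(line)     # entry belongs to the latest header
    # zip_longest pads short columns with "" so that empty cells stay empty
    return [sep.join(row) for row in zip_longest(*columns, fillvalue="")]


sample = "SET1\nA\nB\nC\nSET2\nH\nI".splitlines()
for row in sets_to_columns(sample):
    print(row)
```

Empty cells come out as empty strings between separators, so they cannot be confused with the separator itself.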

AWK division by zero error

I'm getting a division by 0 error from my awk command. I'm not sure what is causing this as the result should not be 0.
In this case it should be printing 1.11557887 from 1.7229/1.5444.
Could it be a problem with how I assigned the variables?
This is my script:
#!/usr/bin/awk -f
FNR == 22 { measC = $2 }
FNR == 23 { refC = $2 }
factorC = refC / measC
{ print factorC }
It returns:
/usr/bin/awk: division by zero
input record number 1, file 1.txt
source line number 5
This is what my input data looks like:
#!xxx x
# x x x x x x
# x: x x
# x: x x
# x: x x x x x
# (x) x x, x x x, x.
x: x x x x
x: 3.0.0
x: x
x: 0
x: x x
x: 0
x: x x
x: x
x: 0
x: 0
x: 2
x: x x x
x: x
x: 1
x: 4
origmax: 1.5444 1.5188 1.0221 1.4932
currentmax: 1.7229 1.6888 1.1069 1.6238
Because you put factorC = refC / measC outside of a block, awk thinks you want to use that expression as a pattern. So it evaluates that expression for each line of input. On the first line of input, measC hasn't been defined yet, so it defaults to zero.
I think you want this:
#!/usr/bin/awk -f
FNR == 22 { measC = $2 }
FNR == 23 { refC = $2 }
END {
factorC = refC / measC
print factorC
}
or this:
#!/usr/bin/awk -f
FNR == 22 { measC = $2 }
FNR == 23 {
refC = $2
factorC = refC / measC
print factorC
}
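The same idea, dividing only after both values have been read, can be sketched in Python for comparison. The function name and the default line numbers 22 and 23 are assumptions taken from the script in the question.

```python
def factor_from_lines(lines, meas_line=22, ref_line=23):
    """Return ref / meas using the second field of two chosen lines."""
    meas = ref = None
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if number == meas_line:
            meas = float(fields[1])
        elif number == ref_line:
            ref = float(fields[1])
    # Divide only once both values are known, like the END block above
    if meas is None or ref is None:
        raise ValueError("input has fewer lines than expected")
    return ref / meas


lines = ["x: x"] * 21 + [
    "origmax: 1.5444 1.5188 1.0221 1.4932",
    "currentmax: 1.7229 1.6888 1.1069 1.6238",
]
print(factor_from_lines(lines))
```

Unlike awk, Python does not default an unset variable to zero, so a missing line is reported explicitly instead of surfacing as a division by zero.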
Your script says:
FNR == 22 { measC = $2 }
So ... when only, say, five lines of your input file have been read by this awk script, what is the value of measC?
I'll tell you a secret. It will be zero. Because nothing has assigned anything else to measC yet.
Also, your line:
factorC = refC / measC
is outside the block, so it's being used to evaluate whether the { print factorC } should be run. And because it's a condition, it gets run for every line. And wouldn't you know it, before line 22, measC is 0.
I don't understand the data or the output, so I don't know what measC should be, if anything.
What are you trying to achieve with this?

Find all words containing characters in UNIX

Given a word W, I want to find all words containing the letters in W from /usr/dict/words.
For example, "bat" should return "bat" and "tab" (but not "table").
Here is one solution which involves sorting the input word and matching:
word=$1
sortedWord=`echo $word | grep -o . | sort | tr -d '\n'`
while read line
do
sortedLine=`echo $line | grep -o . | sort | tr -d '\n'`
if [ "$sortedWord" == "$sortedLine" ]
then
echo $line
fi
done < /usr/dict/words
Is there a better way? I'd prefer using basic commands (instead of perl/awk etc), but all solutions are welcome!
To clarify, I want to find all permutations of the original word. Addition or deletion of characters is not allowed.
Here's an awk implementation. It finds the words made up of the letters in "W".
dict="/usr/share/dict/words"
word=$1
awk -vw="$word" 'BEGIN{
m=split(w,c,"")
for(p=1;p<=m;p++){ chars[c[p]]++ }
}
length($0)==length(w){
f=0;g=0
n=split($0,t,"")
for(o=1;o<=n;o++){
if (!( t[o] in chars) ){
f=1; break
}else{ st[t[o]]++ }
}
if (!f || $0==w){
for(z in st){
if ( st[z] != chars[z] ) { g=1 ;break}
}
if(!g){ print "found: "$0 }
}
delete st
}' $dict
output
$ wc -l < /usr/share/dict/words
479829
$ time ./shell.sh look
found: kolo
found: look
real 0m1.361s
user 0m1.074s
sys 0m0.015s
Update: change of algorithm, using sorting
dict="/usr/share/dict/words"
awk 'BEGIN{
w="table"
m=split(w,c,"")
b=asort(c,chars)
}
length($0)==length(w){
f=0
n=split($0,t,"")
e=asort(t,d)
for(i=1;i<=e;i++) {
if(d[i]!=chars[i]){
f=1;break
}
}
if(!f) print $0
}' $dict
output
$ time ./shell.sh #looking for table
ablet
batel
belat
blate
bleat
tabel
table
real 0m1.416s
user 0m1.343s
sys 0m0.014s
$ time ./shell.sh #looking for chairs
chairs
ischar
rachis
real 0m1.697s
user 0m1.660s
sys 0m0.014s
$ time perl perl.pl #using beamrider's Perl script
table
tabel
ablet
batel
blate
bleat
belat
real 0m2.680s
user 0m1.633s
sys 0m0.881s
$ time perl perl.pl # looking for chairs
chairs
ischar
rachis
real 0m14.044s
user 0m8.328s
sys 0m5.236s
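The sorting approach used in the update above boils down to comparing sorted letter sequences. A minimal Python sketch of that comparison follows; the function name and the small word list are illustrative, not taken from a dictionary file.

```python
def find_anagrams(word, candidates):
    """Return the candidates that use exactly the letters of word."""
    signature = sorted(word)
    return [
        w for w in candidates
        if len(w) == len(word) and sorted(w) == signature
    ]


words = ["bat", "tab", "table", "tat", "at", "bleat"]
print(find_anagrams("bat", words))      # ['bat', 'tab']
print(find_anagrams("table", words))    # ['table', 'bleat']
```

The length check is redundant for correctness but skips the sort for most dictionary words.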
Here's a shell solution. The best algorithm seems to be #4. It filters out all words that are of incorrect length. Then, it sums the words using a simple substitution cipher (a=1, b=2, A=27, ...). If the sums match, then it will actually do the original sort and compare.
On my system, it can churn through ~235k words looking for "bat" in just under 1/2 second.
I'm providing all of my solutions so you can see the different approaches.
Update: not shown, but I also tried putting the sum inside the first bin of the histogram approach I tried, but it was even slower than the histograms without. I thought it would function as a short circuit, but it didn't work.
Update2: I tried the awk solution and it runs in about 1/3 the time of my best shell solution or ~0.126s versus ~0.490s. The perl solution runs ~1.1s.
#!/bin/bash
word=$1
#dict=words
dict=/usr/share/dict/words
#dict=/usr/dict/words
alg1() {
sortedWord=`echo $word | grep -o . | sort | tr -d '\n'`
while read line
do
sortedLine=`echo $line | grep -o . | sort | tr -d '\n'`
if [ "$sortedWord" == "$sortedLine" ]
then
echo $line
fi
done < $dict
}
check_sorted_versus_not() {
local word=$1
local line=`echo $2 | grep -o . | sort | tr -d '\n'`
if [ "$word" == "$line" ]
then
echo $2
fi
}
# Filter out all words of incorrect length
alg2() {
sortedWord=`echo $word | grep -o . | sort | tr -d '\n'`
grep_string="^`echo -n $word | tr 'a-zA-Z' '.'`\$"
grep "$grep_string" "$dict" | \
while read line
do
sortedLine=`echo $line | grep -o . | sort | tr -d '\n'`
if [ "$sortedWord" == "$sortedLine" ]
then
echo $line
fi
done
}
# Create a lot of variables like this:
# _a=1, _b=2, ... _z=26, _A=27, _B=28, ... _Z=52
gen_chars() {
# [ -n "$GEN_CHARS" ] && return
GEN_CHARS=1
local alpha="abcdefghijklmnopqrstuvwxyz"
local upperalpha=`echo -n $alpha | tr 'a-z' 'A-Z'`
local both="$alpha$upperalpha"
for ((i=0; i < ${#both}; i++))
do
ACHAR=${both:i:1}
eval "_$ACHAR=$((i+1))"
done
}
# I think it's faster to return the value in a var than to echo it in a sub process.
# Try summing the word one char at a time by building an arithmetic expression
# and then evaluate that expression.
# Requires: gen_chars
sum_word() {
SUM=0
local s=""
# parsing input one character at a time
for ((i=0; i < ${#1}; i++))
do
ACHAR=${1:i:1}
s="$s\$_$ACHAR+"
done
SUM=$(( $(eval echo -n ${s}0) ))
}
# I think it's faster to return the value in a var than to echo it in a sub process.
# Try summing the word one char at a time using a case statement.
sum_word2() {
SUM=0
local s=""
# parsing input one character at a time
for ((i=0; i < ${#1}; i++))
do
ACHAR=${1:i:1}
case $ACHAR in
a) SUM=$((SUM+ 1));;
b) SUM=$((SUM+ 2));;
c) SUM=$((SUM+ 3));;
d) SUM=$((SUM+ 4));;
e) SUM=$((SUM+ 5));;
f) SUM=$((SUM+ 6));;
g) SUM=$((SUM+ 7));;
h) SUM=$((SUM+ 8));;
i) SUM=$((SUM+ 9));;
j) SUM=$((SUM+ 10));;
k) SUM=$((SUM+ 11));;
l) SUM=$((SUM+ 12));;
m) SUM=$((SUM+ 13));;
n) SUM=$((SUM+ 14));;
o) SUM=$((SUM+ 15));;
p) SUM=$((SUM+ 16));;
q) SUM=$((SUM+ 17));;
r) SUM=$((SUM+ 18));;
s) SUM=$((SUM+ 19));;
t) SUM=$((SUM+ 20));;
u) SUM=$((SUM+ 21));;
v) SUM=$((SUM+ 22));;
w) SUM=$((SUM+ 23));;
x) SUM=$((SUM+ 24));;
y) SUM=$((SUM+ 25));;
z) SUM=$((SUM+ 26));;
A) SUM=$((SUM+ 27));;
B) SUM=$((SUM+ 28));;
C) SUM=$((SUM+ 29));;
D) SUM=$((SUM+ 30));;
E) SUM=$((SUM+ 31));;
F) SUM=$((SUM+ 32));;
G) SUM=$((SUM+ 33));;
H) SUM=$((SUM+ 34));;
I) SUM=$((SUM+ 35));;
J) SUM=$((SUM+ 36));;
K) SUM=$((SUM+ 37));;
L) SUM=$((SUM+ 38));;
M) SUM=$((SUM+ 39));;
N) SUM=$((SUM+ 40));;
O) SUM=$((SUM+ 41));;
P) SUM=$((SUM+ 42));;
Q) SUM=$((SUM+ 43));;
R) SUM=$((SUM+ 44));;
S) SUM=$((SUM+ 45));;
T) SUM=$((SUM+ 46));;
U) SUM=$((SUM+ 47));;
V) SUM=$((SUM+ 48));;
W) SUM=$((SUM+ 49));;
X) SUM=$((SUM+ 50));;
Y) SUM=$((SUM+ 51));;
Z) SUM=$((SUM+ 52));;
*) SUM=0; return;;
esac
done
}
# I think it's faster to return the value in a var than to echo it in a sub process.
# Try summing the word by building an arithmetic expression using sed and then evaluating
# the expression.
# Requires: gen_chars
sum_word3() {
SUM=$(( $(eval echo -n `echo -n $1 | sed -E -ne 's,.,$_&+,pg'`) 0))
#echo "SUM($1)=$SUM"
}
# Filter out all words of incorrect length
# Sum the characters in the word: i.e. a=1, b=2, ... and "abbc" = 1+2+2+3 = 8
alg3() {
gen_chars
sortedWord=`echo $word | grep -o . | sort | tr -d '\n'`
sum_word $word
word_sum=$SUM
grep_string="^`echo -n $word | tr 'a-zA-Z' '.'`\$"
grep "$grep_string" "$dict" | \
while read line
do
sum_word $line
line_sum=$SUM
if [ $word_sum == $line_sum ]
then
check_sorted_versus_not $sortedWord $line
fi
done
}
# Filter out all words of incorrect length
# Sum the characters in the word: i.e. a=1, b=2, ... and "abbc" = 1+2+2+3 = 8
# Use sum_word2
alg4() {
sortedWord=`echo $word | grep -o . | sort | tr -d '\n'`
sum_word2 $word
word_sum=$SUM
grep_string="^`echo -n $word | tr 'a-zA-Z' '.'`\$"
grep "$grep_string" "$dict" | \
while read line
do
sum_word2 $line
line_sum=$SUM
if [ $word_sum == $line_sum ]
then
check_sorted_versus_not $sortedWord $line
fi
done
}
# Filter out all words of incorrect length
# Sum the characters in the word: i.e. a=1, b=2, ... and "abbc" = 1+2+2+3 = 8
# Use sum_word3
alg5() {
gen_chars
sortedWord=`echo $word | grep -o . | sort | tr -d '\n'`
sum_word3 $word
word_sum=$SUM
grep_string="^`echo -n $word | tr 'a-zA-Z' '.'`\$"
grep "$grep_string" "$dict" | \
while read line
do
sum_word3 $line
line_sum=$SUM
if [ $word_sum == $line_sum ]
then
check_sorted_versus_not $sortedWord $line
fi
done
}
# I think it's faster to return the value in a var than to echo it in a sub process.
# Try summing the word one char at a time using a case statement.
# Place results in a histogram
sum_word4() {
SUM=(0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0
0)
# parsing input one character at a time
for ((i=0; i < ${#1}; i++))
do
ACHAR=${1:i:1}
case $ACHAR in
a) SUM[1]=$((SUM[ 1] + 1));;
b) SUM[2]=$((SUM[ 2] + 1));;
c) SUM[3]=$((SUM[ 3] + 1));;
d) SUM[4]=$((SUM[ 4] + 1));;
e) SUM[5]=$((SUM[ 5] + 1));;
f) SUM[6]=$((SUM[ 6] + 1));;
g) SUM[7]=$((SUM[ 7] + 1));;
h) SUM[8]=$((SUM[ 8] + 1));;
i) SUM[9]=$((SUM[ 9] + 1));;
j) SUM[10]=$((SUM[10] + 1));;
k) SUM[11]=$((SUM[11] + 1));;
l) SUM[12]=$((SUM[12] + 1));;
m) SUM[13]=$((SUM[13] + 1));;
n) SUM[14]=$((SUM[14] + 1));;
o) SUM[15]=$((SUM[15] + 1));;
p) SUM[16]=$((SUM[16] + 1));;
q) SUM[17]=$((SUM[17] + 1));;
r) SUM[18]=$((SUM[18] + 1));;
s) SUM[19]=$((SUM[19] + 1));;
t) SUM[20]=$((SUM[20] + 1));;
u) SUM[21]=$((SUM[21] + 1));;
v) SUM[22]=$((SUM[22] + 1));;
w) SUM[23]=$((SUM[23] + 1));;
x) SUM[24]=$((SUM[24] + 1));;
y) SUM[25]=$((SUM[25] + 1));;
z) SUM[26]=$((SUM[26] + 1));;
A) SUM[27]=$((SUM[27] + 1));;
B) SUM[28]=$((SUM[28] + 1));;
C) SUM[29]=$((SUM[29] + 1));;
D) SUM[30]=$((SUM[30] + 1));;
E) SUM[31]=$((SUM[31] + 1));;
F) SUM[32]=$((SUM[32] + 1));;
G) SUM[33]=$((SUM[33] + 1));;
H) SUM[34]=$((SUM[34] + 1));;
I) SUM[35]=$((SUM[35] + 1));;
J) SUM[36]=$((SUM[36] + 1));;
K) SUM[37]=$((SUM[37] + 1));;
L) SUM[38]=$((SUM[38] + 1));;
M) SUM[39]=$((SUM[39] + 1));;
N) SUM[40]=$((SUM[40] + 1));;
O) SUM[41]=$((SUM[41] + 1));;
P) SUM[42]=$((SUM[42] + 1));;
Q) SUM[43]=$((SUM[43] + 1));;
R) SUM[44]=$((SUM[44] + 1));;
S) SUM[45]=$((SUM[45] + 1));;
T) SUM[46]=$((SUM[46] + 1));;
U) SUM[47]=$((SUM[47] + 1));;
V) SUM[48]=$((SUM[48] + 1));;
W) SUM[49]=$((SUM[49] + 1));;
X) SUM[50]=$((SUM[50] + 1));;
Y) SUM[51]=$((SUM[51] + 1));;
Z) SUM[52]=$((SUM[52] + 1));;
*) SUM[53]=-1; return;;
esac
done
#echo ${SUM[*]}
}
# Check if two histograms are equal
hist_are_equal() {
# Array sizes differ?
[ ${#_h1[*]} != ${#SUM[*]} ] && return 1
# parsing input one index at a time
for ((i=0; i < ${#_h1[*]}; i++))
do
[ ${_h1[i]} != ${SUM[i]} ] && return 1
done
return 0
}
# Check if two histograms are equal
hist_are_equal2() {
# Array sizes differ?
local size=${#_h1[*]}
[ $size != ${#SUM[*]} ] && return 1
# parsing input one index at a time
for ((i=0; i < $size; i++))
do
[ ${_h1[i]} != ${SUM[i]} ] && return 1
done
return 0
}
# Filter out all words of incorrect length
# Use sum_word4 which generates a histogram of character frequency
alg6() {
sum_word4 $word
_h1=("${SUM[@]}") # copy as an array; a plain assignment would store one string
grep_string="^`echo -n $word | tr 'a-zA-Z' '.'`\$"
grep "$grep_string" "$dict" | \
while read line
do
sum_word4 $line
if hist_are_equal
then
echo $line
fi
done
}
# Filter out all words of incorrect length
# Use sum_word4 which generates a histogram of character frequency
alg7() {
sum_word4 $word
_h1=("${SUM[@]}") # copy as an array; a plain assignment would store one string
grep_string="^`echo -n $word | tr 'a-zA-Z' '.'`\$"
grep "$grep_string" "$dict" | \
while read line
do
sum_word4 $line
if hist_are_equal2
then
echo $line
fi
done
}
run_test() {
echo alg$1
eval time alg$1
}
#run_test 1
#run_test 2
#run_test 3
run_test 4
#run_test 5
run_test 6
#run_test 7
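The idea behind alg4 (filter by length, then by a cheap letter sum, and only then sort and compare) can be sketched in Python as follows. This is an illustration under assumptions, not a port of the script: the names are hypothetical, and a word containing a non-letter yields None here where the script sets SUM=0.

```python
import string

# a=1 ... z=26, A=27 ... Z=52, the same values as the script's case statement
LETTER_VALUE = {ch: i for i, ch in enumerate(string.ascii_letters, start=1)}


def letter_sum(word):
    """Sum the letter values; return None if a non-letter is present."""
    total = 0
    for ch in word:
        if ch not in LETTER_VALUE:
            return None
        total += LETTER_VALUE[ch]
    return total


def find_anagrams_prefiltered(word, candidates):
    """Length filter, then sum filter, then the exact sorted comparison."""
    target_sum = letter_sum(word)
    target_sorted = sorted(word)
    matches = []
    for cand in candidates:
        if len(cand) != len(word):           # cheapest test first
            continue
        if letter_sum(cand) != target_sum:   # cheap checksum, can give false positives
            continue
        if sorted(cand) == target_sorted:    # exact check only for the survivors
            matches.append(cand)
    return matches


print(find_anagrams_prefiltered("bat", ["bat", "tab", "table", "cas"]))
```

Here "cas" has the same letter sum as "bat" (23), so it passes the checksum and is rejected only by the final sorted comparison.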
#!/usr/bin/perl
$myword=join("", sort split (//, $ARGV[0]));
shift;
while (<>) {
chomp;
print "$_\n" if (join("", sort split (//)) eq $myword);
}
Use it like this:
bla.pl < /usr/dict/words searchword
You want to find words containing only a given set of characters. A regex for that would be:
'^[letters_you_care_about]*$'
So, you could do:
grep "^[$W]*$" /usr/dict/words
The '^' matches the beginning of the line; '$' is for the end of the line. This means we must have an exact match, not just a partial match (e.g. "table").
'[' and ']' are used to define a group of possible characters allowed in one character space of the input file. We use this to find words in /usr/dict/words that only contain the characters in $W.
The '*' repeats the previous item (the '[...]' rule), which says to find a word of any length, where all the characters are in $W. Note that this also accepts words that repeat or omit letters (for "bat", words such as "tat" or "at"), so it returns a superset of the exact permutations.
So we have the following:
n = length of input word
L = lines in dictionary file
If n tends to be small and L tends to be huge, might we be better off finding all permutations of the input word and looking for those, rather than doing something (like sorting) to all L lines of the dictionary file? (Actually, since finding all permutations of a word is O(n!), and we have to run through the entire dictionary file once for each word, maybe not, but I wrote the code anyway.)
This is Perl - I know you wanted command-line operations but I don't have a way to do that in shell script that's not super-hacky:
sub dedupe {
my (@list) = @_;
my (@new_list, %seen_entries, $entry);
foreach $entry (@list) {
if (!(defined($seen_entries{$entry}))) {
push(@new_list, $entry);
$seen_entries{$entry} = 1;
}
}
return @new_list;
}
sub find_all_permutations {
my ($word) = @_;
my (@permutations, $subword, $letter, $rest_of_word, $i);
if (length($word) == 1) {
push(@permutations, $word);
} else {
for ($i=0; $i<length($word); $i++) {
$letter = substr($word, $i, 1);
$rest_of_word = substr($word, 0, $i) . substr($word, $i + 1);
foreach $subword (find_all_permutations($rest_of_word)) {
push(@permutations, $letter . $subword);
}
}
}
return @permutations;
}
$words_file = '/usr/share/dict/words';
$word = 'table';
@all_permutations = dedupe(find_all_permutations($word));
foreach $permutation (@all_permutations) {
if (`grep -c -m 1 ^$permutation\$ $words_file` == 1) {
print $permutation . "\n";
}
}
This utility might interest you:
an -w "tab" -m 3
...gives bat and tab only.
The original author seems to not be around any more, but you can find information at http://packages.qa.debian.org/a/an.html (even if you don't want to use it itself, the source might be worth a look).

Haskell floating point error

So I have finished creating my own complex number data type in haskell.
I've also, thanks to another question on here, got a function that will solve a quadratic equation.
The only problem now is that the code generates a parsing error in hugs, when trying to solve a quadratic with complex roots.
i.e. In hugs...
Main> solve (Q 1 2 1)
(-1.0,-1.0)
Main> solve (Q 1 2 0)
(0.0,-2.0)
Main> solve (Q 1 2 2)
(
Program error: pattern match failure: v1618_v1655 (C -1.#IND -1.#IND)
It looks to me like it's a problem after the square root has been applied, but I'm really not sure. Any help working out what is going wrong, or any indication of what this error means, would be brilliant.
Thanks,
Thomas
The Code:
-- A complex number z = (re +im.i) is represented as a pair of Floats
data Complex = C {
re :: Float,
im :: Float
} deriving Eq
-- Display complex numbers in the normal way
instance Show Complex where
show (C r i)
| i == 0 = show r
| r == 0 = show i++"i"
| r < 0 && i < 0 = show r ++ " - "++ show (C 0 (i*(-1)))
| r < 0 && i > 0 = show r ++ " + "++ show (C 0 i)
| r > 0 && i < 0 = show r ++ " - "++ show (C 0 (i*(-1)))
| r > 0 && i > 0 = show r ++ " + "++ show (C 0 i)
-- Define algebraic operations on complex numbers
instance Num Complex where
fromInteger n = C (fromInteger n) 0 -- tech reasons
(C a b) + (C x y) = C (a+x) (b+y)
(C a b) * (C x y) = C (a*x - b*y) (b*x + b*y)
negate (C a b) = C (-a) (-b)
instance Fractional Complex where
fromRational r = C (fromRational r) 0 -- tech reasons
recip (C a b) = C (a/((a^2)+(b^2))) (b/((a^2)+(b^2)))
root :: Complex -> Complex
root (C x y)
| y == 0 && x == 0 = C 0 0
| y == 0 && x > 0 = C (sqrt ( ( x + sqrt ( (x^2) + 0 ) ) / 2 ) ) 0
| otherwise = C (sqrt ( ( x + sqrt ( (x^2) + (y^2) ) ) / 2 ) ) ((y/(2*(sqrt ( ( x + sqrt ( (x^2) + (y^2) ) ) / 2 ) ) ) ) )
-- quadratic polynomial : a.x^2 + b.x + c
data Quad = Q {
aCoeff, bCoeff, cCoeff :: Complex
} deriving Eq
instance Show Quad where
show (Q a b c) = show a ++ "x^2 + " ++ show b ++ "x + " ++ show c
solve :: Quad -> (Complex, Complex)
solve (Q a b c) = ( sol (+), sol (-) )
where sol op = (op (negate b) $ root $ b*b - 4*a*c) / (2 * a)
The numbers in your error are NaN ("not a number"); -1.#IND is how the Microsoft C runtime prints an indeterminate value:
(C -1.#IND -1.#IND)
Once a value is NaN, every comparison against it (==, <, >) is false. This is part of the definition of floating-point numbers. So your definition of show
show (C r i)
| i == 0 = show r
| r == 0 = show i++"i"
| r < 0 && i < 0 = show r ++ " - "++ show (C 0 (i*(-1)))
| r < 0 && i > 0 = show r ++ " + "++ show (C 0 i)
| r > 0 && i < 0 = show r ++ " - "++ show (C 0 (i*(-1)))
| r > 0 && i > 0 = show r ++ " + "++ show (C 0 i)
leaves room for a pattern match failure, because none of the guards is true for NaN. You can add the following fallback guard:
| otherwise = show r ++ " + " ++ show i ++ "i"
As for why this happens: when you evaluate
b * b - 4 * a * c
with Q 1 2 2, you obtain -4, and then in root you fall into the last case, whose imaginary part is
y / (2 * sqrt((x + sqrt(x^2 + y^2)) / 2))
With x = -4 and y = 0, x + sqrt(x^2 + y^2) = -4 + sqrt((-4)^2) = 0, so the denominator is 0. From there you're doomed: 0/0 yields NaN ("not a number"), which then contaminates every later result.
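The failing computation can be reproduced outside Haskell. The Python sketch below evaluates the imaginary part from the last case of root for x = -4, y = 0 and then checks the comparisons that the guards in show rely on. The function name is illustrative, and because Python raises ZeroDivisionError on float division by zero, the IEEE-754 results are spelled out by hand.

```python
import math


def imag_part_of_root(x, y):
    """Imaginary part as in the last case of root, following IEEE-754 rules."""
    denom = 2 * math.sqrt((x + math.sqrt(x * x + y * y)) / 2)
    # Python raises on float division by zero, so the IEEE-754 results
    # (0/0 = NaN, nonzero/0 = signed infinity) are returned explicitly.
    if denom == 0:
        return math.nan if y == 0 else math.copysign(math.inf, y)
    return y / denom


i = imag_part_of_root(-4.0, 0.0)
print(i)                        # nan
print(i == 0, i < 0, i > 0)     # False False False
```

Since all three comparisons are false, a chain of guards built only from ==, < and > has no branch left to take, which is why a final otherwise case is needed.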
Dave hit the nail on the head.
With the original code in GHCi, I get:
*Main> solve (Q 1 2 2)
(*** Exception: c.hs:(11,4)-(17,63): Non-exhaustive patterns in function show
If we update the show block:
instance Show Complex where
show (C r i)
| i == 0 = show r
| r == 0 = show i++"i"
| r < 0 && i < 0 = show r ++ " - "++ show (C 0 (i*(-1)))
| r < 0 && i > 0 = show r ++ " + "++ show (C 0 i)
| r > 0 && i < 0 = show r ++ " - "++ show (C 0 (i*(-1)))
| r > 0 && i > 0 = show r ++ " + "++ show (C 0 i)
| otherwise = "???(" ++ show r ++ " " ++ show i ++ ")"
then we get this information in GHCi:
*Main> :l c.hs
[1 of 1] Compiling Main ( c.hs, interpreted )
c.hs:22:0:
Warning: No explicit method nor default method for `abs'
In the instance declaration for `Num Complex'
c.hs:22:0:
Warning: No explicit method nor default method for `signum'
In the instance declaration for `Num Complex'
Ok, modules loaded: Main.
*Main> solve (Q 1 2 2)
(???(NaN NaN),???(NaN NaN))
I was "born and raised" on GHCi, so I don't know exactly how Hugs compares in verbosity of warnings and errors; but it looks like GHCi is a clear winner in telling you what went wrong.
Off the top of my head: It could be a problem with your definition of show for Complex.
I notice you don't have default case like this:
| otherwise = ...
Therefore if your conditions with r and i are non exhaustive you'll get a pattern match failure.

Resources