Transposing a text file in Haskell - functional-programming

I was doing the exercises at http://book.realworldhaskell.org/read/functional-programming.html . My solution to the problem of transposing a text file seems to take a lot of CPU time. How can I improve the algorithm below, if at all, to make it less CPU hungry?
import System.Environment (getArgs)
import Data.Char (isAlpha)
interactWith function inputFile outputFile = do
  input <- readFile inputFile
  writeFile outputFile (function input)
main = mainWith myFunction
  where mainWith function = do
          args <- getArgs
          case args of
            [input,output] -> interactWith function input output
            _ -> putStrLn "error: exactly two arguments needed"
        -- replace "id" with the name of our function below
        myFunction = transpose
transpose :: String -> String
transpose input = tpose (lines input)
tpose [] = []
tpose xs = concat (map (take 1) xs) ++ "\n" ++ tpose (map (drop 1) xs)

Skip ahead to chapter 8, which discusses how inefficient the String datatype is and proposes using ByteString instead. There's also Data.Text, if your file is Unicode.

The Data.List module contains some useful functions, such as transpose :: [[a]] -> [[a]]. There's also lines and unlines in the Prelude, which convert between a String and a [String] (by breaking on newlines).
So basically, you probably want something like
main = do
  [input,output] <- getArgs
  interactWith (unlines . transpose . lines) input output
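A likely contributor to the CPU usage in the question is that tpose only stops on an empty list of rows. After every character has been dropped, the rows become a non-empty list of empty strings, so the recursion keeps emitting newlines. The following is a minimal sketch of both options; the names transposeText and tposeFixed are hypothetical and chosen only for illustration.

```haskell
import Data.List (transpose)

-- Library-based version: split into lines, transpose, join again.
transposeText :: String -> String
transposeText = unlines . transpose . lines

-- Hand-rolled version that terminates once every row is exhausted
-- (this also covers the case of no rows at all).
tposeFixed :: [String] -> String
tposeFixed rows
  | all null rows = ""
  | otherwise     = concatMap (take 1) rows ++ "\n" ++ tposeFixed (map (drop 1) rows)
```

Both versions skip over rows that have run out of characters, so ragged input such as ["ab","c"] yields the columns "ac" and "b".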

Related

Is there a way to display this only once?

I wrote this sml function that allows me to display the first 5 columns of the Ascii table.
fun radix (n, base) =
  let
    val b = size base
    val digit = fn n => str (String.sub (base, n))
    val radix' =
      fn (true, n) => digit n
       | (false, n) => radix (n div b, base) ^ digit (n mod b)
  in
    radix' (n < b, n)
  end;
val n = 255;
val charList = List.tabulate(n+1,
  fn x => print(
    "DEC"^"\t"^"OCT"^"\t"^"HEX"^"\t"^"BIN"^"\t"^"Symbol"^"\n"^
    Int.toString(x)^"\t"^
    radix (x, "01234567")^"\t"^
    radix (x, "0123456789abcdef")^"\t"^
    radix (x, "01")^"\t"^
    Char.toCString(chr(x))^"\t"
  )
);
But I want the header "DEC"^"\t"^"OCT"^"\t"^"HEX"^"\t"^"BIN"^"\t"^"Symbol" to be displayed only once, at the beginning, and I can't work out how. Does anyone know a way to do it?
I would also like to do without the recursive call of the "radix" function. Is that possible? And is it a wise way to write this function?
I want the header : "DEC"... to be displayed only once at the beginning
Currently the header displays multiple times because it is being printed inside of List.tabulate's function, once for each number in the table. So you can move printing the header outside of this function and into a parent function.
For clarity I might also move the printing of an individual character into a separate function. (I think you have indented the code in your charList very nicely, but if a function does more than one thing, it is doing too many things.)
E.g.
fun printChar (i : int) =
  print (Int.toString i ^ ...)
fun printTable () =
  ( print "DEC\tOCT\tHEX\tBIN\tSymbol\n"
  ; List.tabulate (256, printChar)
  ; () (* why this? *)
  )
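For comparison, the same separation (header once, then one row per number) can be sketched in Haskell. This is only an illustration under stated assumptions: the names radix, header, row and table are hypothetical, the Symbol column is left out, and the base conversion uses the library function showIntAtBase rather than a hand-written recursion, so it expects non-negative input and a base from 2 to 16.

```haskell
import Data.Char (intToDigit)
import Data.List (intercalate)
import Numeric (showIntAtBase)

-- Render a non-negative number in the given base (2 to 16).
radix :: Int -> Int -> String
radix base n = showIntAtBase base intToDigit n ""

-- The header is a plain value, built once.
header :: String
header = intercalate "\t" ["DEC", "OCT", "HEX", "BIN"]

-- One table row for one number.
row :: Int -> String
row n = intercalate "\t" [show n, radix 8 n, radix 16 n, radix 2 n]

-- The whole table: the header once, followed by one row per number.
table :: [Int] -> String
table ns = unlines (header : map row ns)
```

Printing the full table would then be putStr (table [0 .. 255]).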
It is very cool that you found Char.toCString, which is safe compared to simply printing any character. It seems to give some pretty good names for e.g. \t and \n, but hardly for every character. So if you really want to spice up your table, you could add a helper function,
fun prettyName character =
  if Char.isPrint character
  then ...
  else case ord character of
         0 => "NUL (null)"
       | 1 => "SOH (start of heading)"
       | 2 => "STX (start of text)"
       | ...
and use that instead of Char.toCString.
Whether to print a character itself or some description of it might be up to Char.isPrint.
I would like to do without the recursive call of the "radix" function.
Is that possible?
And is it a wise way to write this function?
You would need something equivalent to your radix function either way.
Sure, it seems okay. You could shorten it a bit, but the general approach is good.
You have avoided list recursion by doing String.sub constant lookups. That's great.

F# (F sharp) unzip function explained

I'm taking a university course in functional programming, using F#, and I keep getting confused about the logical flow of the following program. Would anyone care to explain?
let rec unzip = function
    | [] -> ([],[])
    | (x,y)::rest ->
        let (xs,ys) = unzip rest
        (x::xs,y::ys);;
So this program is supposed to take a list of pairs, and output a pair of lists.
[(1,'a');(2,'b')] -> ([1;2],['a';'b'])
It seems to me, like the base case where the argument (list) is empty, the format of the output is given, but I don't understand how the third and fourth line is evaluated.
        let (xs,ys) = unzip rest
        (x::xs,y::ys);;
Firstly, this is a recursive function - the rec keyword is a giveaway :).
These can be quite hard to get your head around, but are quite common in functional programming.
I'll assume you are OK with most of the pattern matching going on, and that you are aware of the function keyword shorthand.
let rec unzip = function
    | [] -> ([],[])
    | (x,y)::rest ->
        let (xs,ys) = unzip rest
        (x::xs,y::ys);;
You seem quite happy with:
| [] -> ([],[])
Given an empty list, return a tuple with 2 empty lists. This isn't just a guard clause, it will be used later to stop the recursive program running forever.
The next bit...
| (x,y)::rest ->
Takes the first element (head) of the list and splits it off from the tail. It also deconstructs the head element which is a tuple into 2 values x and y.
This could be written out longhand as:
    | head::rest ->
        let x,y = head
Now is the fun part where it calls itself:
        let (xs,ys) = unzip rest
        (x::xs,y::ys);;
It might help to walk through an example and look at what goes on at each step:
unzip [(1,'a');(2,'b');(3,'c')]
  x = 1
  y = 'a'
  rest = [(2,'b');(3,'c')]
  unzip rest
    x = 2
    y = 'b'
    rest = [(3,'c')]
    unzip rest
      x = 3
      y = 'c'
      rest = []
      unzip rest
        return ([],[])
      xs = []
      ys = []
      return (x::xs, y::ys)   // 3::[] = [3], 'c'::[] = ['c']
    xs = [3]
    ys = ['c']
    return (x::xs, y::ys)     // 2::[3] = [2;3], 'b'::['c'] = ['b';'c']
  xs = [2;3]
  ys = ['b';'c']
  return (x::xs, y::ys)       // 1::[2;3] = [1;2;3], 'a'::['b';'c'] = ['a';'b';'c']
done
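For readers who know Haskell, here is the same recursion written there, followed by an equivalent right fold. This is a sketch for comparison only: the names unzipRec and unzipFold are hypothetical, and the standard library already provides unzip.

```haskell
-- Direct recursion, mirroring the F# definition: unzip the tail first,
-- then put the components of the head pair in front of each result list.
unzipRec :: [(a, b)] -> ([a], [b])
unzipRec [] = ([], [])
unzipRec ((x, y) : rest) =
  let (xs, ys) = unzipRec rest
  in (x : xs, y : ys)

-- The same idea as a right fold: the lambda plays the role of the
-- recursive case and ([], []) is the base case.
unzipFold :: [(a, b)] -> ([a], [b])
unzipFold = foldr (\(x, y) (xs, ys) -> (x : xs, y : ys)) ([], [])
```

The fold makes the "work happens on the way back out" order from the trace explicit: the pair for the last element is combined with the empty result first.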

How to split a string on spaces in Clean?

I'm a newbie with functional programming and Clean. I want to split a string on whitespace, like the words function in Haskell.
words :: String -> [String]
input: "my separated list "
output: ["my","separated","list"]
This is the definition in Haskell:
words :: String -> [String]
words s = case dropWhile {-partain:Char.-}isSpace s of
            "" -> []
            s' -> w : words s''
                  where (w, s'') =
                          break {-partain:Char.-}isSpace s'
But Clean doesn't have break, and I don't know what this part means or how to implement it in Clean:
s' -> w : words s''
where (w, s'')
As the StdEnvApi document advises you should convert the String to a list to use the StdList API functions (section 6, page 20).
This results in something like this:
splitString :: String -> [String]
splitString x = [foldr (+++) "" i \\ i <- splitString` (fromString x)]
where
    splitString` :: [String] -> [[String]]
    splitString` x = let (p, n) = span ((<>) " ") x in
        if (isEmpty n) [p] [p:splitString` (tl n)]
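As for what break means in the Haskell definition: break p xs splits xs at the first element for which p holds, returning the prefix before it and the rest, so it behaves like span with the predicate negated. The sketch below writes it out by hand; myBreak and myWords are hypothetical names used to avoid clashing with the Prelude, and the same shape should carry over to a list of characters in Clean.

```haskell
import Data.Char (isSpace)

-- Split a list at the first element that satisfies the predicate.
-- The first component holds the elements before it, the second
-- holds that element and everything after it.
myBreak :: (a -> Bool) -> [a] -> ([a], [a])
myBreak _ [] = ([], [])
myBreak p xs@(x : rest)
  | p x       = ([], xs)
  | otherwise = let (ys, zs) = myBreak p rest in (x : ys, zs)

-- words, written with myBreak: skip leading whitespace, take one word,
-- then recurse on whatever follows that word.
myWords :: String -> [String]
myWords s = case dropWhile isSpace s of
  "" -> []
  s' -> let (w, s'') = myBreak isSpace s' in w : myWords s''
```

In the original definition, w is the word just read and s'' is the remainder of the string after it, which is what the where clause binds.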

functional programming with less recursion?

I am currently doing reasonably well in functional programming using F#. I tend, however, to do a lot of programming using recursion, when it seems that there are better idioms in the F#/functional programming community. So in the spirit of learning, is there a better/more idiomatic way of writing the function below without recursion?
let rec convert line =
    if line.[0..1] = "  " then
        match convert line.[2..] with
        | (i, subline) -> (i+1, subline)
    else
        (0, line)
with results such as:
> convert "asdf";;
val it : int * string = (0, "asdf")
> convert "  asdf";;
val it : int * string = (1, "asdf")
> convert "      asdf";;
val it : int * string = (3, "asdf")
Recursion is the basic mechanism for writing loops in functional languages, so if you need to iterate over characters (as you do in your sample), then recursion is what you need.
If you want to improve your code, then you should probably avoid using line.[2..] because that is going to be inefficient (strings are not designed for this kind of processing). It is better to convert the string to a list and then process it:
let convert (line:string) =
    let rec loop acc line =
        match line with
        | ' '::' '::rest -> loop (acc + 1) rest
        | _ -> (acc, line)
    loop 0 (List.ofSeq line)
You can use various functions from the standard library to implement this in a shorter way, but they are usually recursive too (you just do not see the recursion!), so I think using functions like Seq.unfold and Seq.fold is still recursive (and it looks way more complex than your code).
A more concise approach using standard libraries is to use the TrimLeft method (see comments), or using standard F# library functions, do something like this:
let convert (line:string) =
    // Count the number of spaces at the beginning
    let spaces = line |> Seq.takeWhile (fun c -> c = ' ') |> Seq.length
    // Divide by two - we want to count & skip two-spaces only
    let count = spaces / 2
    // Get substring starting after all removed two-spaces
    count, line.[(count * 2) ..]
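The same counting idea can be sketched in Haskell for comparison, where a String is already a list of characters and no conversion is needed. The name convert simply mirrors the F# function; this is an illustration, not a translation of any particular answer.

```haskell
-- Count leading two-space indents and return the line without them.
-- An odd leftover space stays at the front of the result.
convert :: String -> (Int, String)
convert line = (count, drop (2 * count) line)
  where
    -- number of complete two-space pairs at the start of the line
    count = length (takeWhile (== ' ') line) `div` 2
```

There is no explicit recursion here, but takeWhile, length and drop each walk the list recursively inside the library.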
EDIT Regarding the performance of string vs. list processing, the problem is that slicing allocates a new string (because that is how strings are represented on the .NET platform), while slicing a list just changes a reference. Here is a simple test:
let rec countList n s =
    match s with
    | x::xs -> countList (n + 1) xs
    | _ -> n
let rec countString n (s:string) =
    if s.Length = 0 then n
    else countString (n + 1) (s.[1 ..])
let l = [ for i in 1 .. 10000 -> 'x' ]
let s = new System.String('x', 10000)
#time
for i in 0 .. 100 do countList 0 l |> ignore    // 0.002 sec (on my machine)
for i in 0 .. 100 do countString 0 s |> ignore  // 5.720 sec (on my machine)
Because you traverse the string in a non-uniform way, a recursive solution is much more suitable in this example. I would rewrite your tail-recursive solution for readability as follows:
let convert (line: string) =
    let rec loop i line =
        match line.[0..1] with
        | "  " -> loop (i+1) line.[2..]
        | _ -> i, line
    loop 0 line
Since you asked, here is a (bizarre) non-recursive solution :).
let convert (line: string) =
    (0, line) |> Seq.unfold (fun (i, line) ->
        let subline = line.[2..]
        match line.[0..1] with
        | "  " -> Some((i+1, subline), (i+1, subline))
        | _ -> None)
    |> Seq.fold (fun _ x -> x) (0, line)
Using tail recursion, it can be written as
let rec convert_ acc line =
    if line.[0..1] <> "  " then
        (acc, line)
    else
        convert_ (acc + 1) line.[2..]
let convert = convert_ 0
still looking for a non-recursive answer, though.
Here's a faster way to write your function -- it checks the characters explicitly instead of using string slicing (which, as Tomas said, is slow); it's also tail-recursive. Finally, it uses a StringBuilder to create the "filtered" string, which will provide better performance once your input string reaches a decent length (though it'd be a bit slower for very small strings due to the overhead of creating the StringBuilder).
let convert' str =
    let strLen = String.length str
    let sb = System.Text.StringBuilder strLen
    let rec convertRec (count, idx) =
        match strLen - idx with
        | 0 ->
            count, sb.ToString ()
        | 1 ->
            // Append the last character in the string to the StringBuilder.
            sb.Append str.[idx] |> ignore
            convertRec (count, idx + 1)
        | _ ->
            if str.[idx] = ' ' && str.[idx + 1] = ' ' then
                convertRec (count + 1, idx + 2)
            else
                sb.Append str.[idx] |> ignore
                convertRec (count, idx + 1)
    // Call the internal, recursive implementation.
    convertRec (0, 0)

Ocaml: matching on one item in a pair

I have a function that takes in a temp, which is a pair.
type temp = (pd * string);;
I want to extract the string from a temp, but I can't write a function that matches on temp itself, since it's a type.
I wrote a function:
let print_temp(t:temp) (out: out_channel) : unit =
  fun z -> match z with
    (_,a) -> output_string out a "
;;
But that gives me an error saying it's not a function. I basically want to extract that string and print it. Any input on this would be appreciated.
Your solution is almost correct -- you don't need the "fun z ->" part, and it looks like you might have an extraneous ". Instead, you need to pattern match against t, like this:
let print_temp (t:temp) (out:out_channel) : unit =
  match t with
    (_,a) -> output_string out a
You can also do this more succinctly by pattern matching in the function definition:
let print_temp ((_,a):temp) (out:out_channel) : unit = output_string out a
In your code, the type error you get is telling you that you declared print_temp to return unit, but actually returned a function (fun z -> ...). Note that since the t:temp is what you want to "take apart", it makes sense that you would pattern match on it.
Instead of
match t with (_, a) -> output_string out a
you can also use the functions fst and snd. The string is the second component of the pair, so snd is the one needed here:
let a = snd t in output_string out a
or even more concise
output_string out (snd t)

Resources