How to read from stdin?

How to do this in Clean?
Pseudo code:
loop:
    input = read_stdin
    if input == "q":
        break loop
    else:
        print "you input: ", input
I have glanced at some of the PDF documentation, but my impression is that dealing with stdin and stdout is difficult. Could I have a code example that uses stdio?
Following Keelan's instructions, I finished my little program:
module std_in_out_loop

import StdEnv

loop :: *File -> *File
loop io
    # io         = fwrites "input your name: " io
    # (name, io) = freadline io
    # name       = name % (0, size name - 2)
    | name == "q"
        # io = fwrites "Bye!\n" io
        = io
    | name == ""
        # io = fwrites "What's your name?\n" io
        = loop io
    | otherwise
        # io = fwrites ("hello " +++ name +++ "\n") io
        = loop io

Start :: *World -> *World
Start world
    # (io, world) = stdio world
    # io          = loop io
    # (ok, world) = fclose io world
    | not ok      = abort "Cannot close io.\n"
    | otherwise   = world

From the Clean 2.2 manual, chapter 9:
Although Clean is purely functional, operations with side-effects (I/O operations, for instance) are permitted. To achieve
this without violating the semantics, the classical types are supplied with so called uniqueness attributes. If an argument
of a function is indicated as unique, it is guaranteed that at run-time the corresponding actual object is local, i.e. there are
no other references to it. Clearly, a destructive update of such a “unique object” can be performed safely.
Concretely, you can make Start, which normally has arity 0 (takes no arguments), a function from *World to *World. The idea is that we now have a function that changes the world, which means that side effects are allowed (they're not really side effects any more, but operations on the world).
The * indicates the uniqueness of the World type: there can never be more than one reference to the world value. For example, the following will give a compile-time uniqueness error:
Start :: *World -> *(*World, *World)
Start w = (w, w)
To use standard IO, you will need functions from the StdFile module in StdEnv. The functions you're going to need are:
stdio :: !*World -> *(!*File, !*World)
fclose :: !*File !*World -> !(!Bool, !*World)
I simplified the types a bit; actually, they're from the class FileSystem. stdio opens a unique File from a world and also returns the new, modified world. fclose closes a file in a world and returns a success flag along with the modified world.
Then, to read and write from that stdio file, you can use:
freadline :: !*File -> *(!*String, !*File)
fwrites :: !String !*File -> !*File
freadline reads a line into a String, including the newline character. fwrites writes a string to a file; you usually want to include a newline character when writing to stdio.
Putting it together:
Start :: *World -> *World
Start w
    # (io,w)    = stdio w                                    // open stdio
    # io        = fwrites "What is your name?\n" io          // ask for name
    # (name,io) = freadline io                               // read in name
    # name      = name % (0, size name - 2)                  // remove \n from name
    # io        = fwrites ("Hello, " +++ name +++ "!\n") io  // greet user
    # (ok,w)    = fclose io w                                // close stdio
    | not ok    = abort "Couldn't close stdio"               // abort in case of failure
    = w                                                      // return world from Start
The # syntax might be new to you. It's a kind of let which allows you to use the same name for files (or other things), which is more convenient than using, e.g.:
Start w = w3
where
    (io, w1)    = stdio w
    io1         = fwrites "What is your name?\n" io
    (name, io2) = freadline io1
    //...
    (ok, w3)    = fclose io10 w2
Now you should be able to do what you want in your pseudocode using a helper function loop :: *File -> *File, which calls itself recursively until q is entered.
There are more functions than just freadline and fwrites; see StdFile.dcl for an idea.

Related

Clean language: append number in end of file, fwritei doesn't work

I'm trying to write a function that receives a [String] of file names, a String with the name of the files' directory, and a *f environment. The function should append an integer to the end of each file.
Here is what I got so far:
import StdEnv
import StdFile
import FileManipulation

appendNumInEndOfVmFiles :: [String] String *f -> String
appendNumInEndOfVmFiles [] dirname w = "finished"
appendNumInEndOfVmFiles [x:xs] dirname w
    # path            = dirname +++ "\\\\" +++ x
    # (ok,file,files) = fopen path FAppendText w
    # file            = fwritei 12 file
    # (ok2,_)         = fclose file w
    = appendNumInEndOfVmFiles xs dirname w

Start w
    // 1. Receive name of directory from the user.
    # (io,w)    = stdio w                                  // open stdio
    # io        = fwrites "Enter name of directory:\n" io  // ask for name
    # (name,io) = freadline io                             // read in name
    # name      = name % (0, size name - 2)                // remove \n from name
    # (ok,w)    = fclose io w                              // close stdio
    | not ok    = abort "Couldn't close stdio"             // abort in case of failure
    // 2. Get a list of all file names in that directory.
    # (dir,w)   = getDirectoryContents (RelativePath [PathDown name]) w
    # fileList  = getNamesOfFilesInDirectory (getEntriesList dir)
    = appendNumInEndOfVmFiles (getVmFiles fileList) name w
Assume that getVmFiles is defined in my FileManipulation.dcl file, and that in the context of this problem name is "myDir" and the file list is ["hello.vm","Wiki.vm"].
For some reason, even though I get the "finished" message on the screen, the files aren't modified. No matter what mode I give to fopen, even FWriteText or FWriteData, it still does nothing; even when I use fwritec or fwrites with characters, nothing happens.
What am I missing here? Thanks a lot!
For some reason, even though I get the "finished" message on the screen, the files aren't modified.
This is due to lazy evaluation. In appendNumInEndOfVmFiles, the result of fclose is not used, so fclose is not evaluated. Because of this, fwritei does not need to be evaluated either. You can fix this by adding a guard on ok2:
# (ok2,_) = fclose file w
| not ok2 = abort "fclose failed\n"
= appendNumInEndOfVmFiles xs dirname w
However, the typical way to do this would be to rewrite the function to return a *f instead of a String, so that this unique value is not lost. As long as the result is used, the fwritei is evaluated. You can additionally make the *f argument strict (i.e. add a ! in front of it). This makes sure that it is evaluated before entering the function, so that all lingering file closes have been performed.
There are some more issues with your code:
In the following, w is used twice, which is illegal because it is of a unique type. You should use (ok2,w) in the guard to continue with the same environment:
# (ok2,_) = fclose file w
= appendNumInEndOfVmFiles xs dirname w
appendNumInEndOfVmFiles needs a type context | FileSystem f to resolve the overloading of fopen and fclose.
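Putting those fixes together, a corrected version might look something like this (a sketch, untested; the literal 12 and the path construction are kept from the question):

appendNumInEndOfVmFiles :: [String] String !*f -> *f | FileSystem f
appendNumInEndOfVmFiles [] dirname env = env              // return the environment instead of a String
appendNumInEndOfVmFiles [x:xs] dirname env
    # path          = dirname +++ "\\\\" +++ x
    # (ok,file,env) = fopen path FAppendText env          // thread env; don't discard the new one
    | not ok        = abort ("Cannot open " +++ path +++ "\n")
    # file          = fwritei 12 file
    # (ok2,env)     = fclose file env                     // guarding on ok2 forces the write
    | not ok2       = abort ("Cannot close " +++ path +++ "\n")
    = appendNumInEndOfVmFiles xs dirname env

Because the environment is returned and used by the caller, the fclose (and with it the fwritei) is forced.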
Lastly:
... even FWriteText or FWriteData ...
Just so you know: the difference is that the first writes the integer in an ASCII representation, whereas the second writes it in binary as 4 or 8 bytes (depending on the bit width of your system).

Creating Sequence of Sequences is Causing a StackOverflowException

I'm trying to take a large file and split it into many smaller files. The location where each split occurs is based on a predicate returned from examining the contents of each given line (isNextObject function).
I have attempted to read in the large file via the File.ReadLines function so that I can iterate through the file one line at a time without having to hold the entire file in memory. My approach was to group the sequence into a sequence of smaller sub-sequences (one per file to be written out).
I found a useful function that Tomas Petricek created on fssnip, called groupWhen. This function worked great for my initial testing on a small subset of the file, but a StackOverflowException is thrown when using the real file. I am not sure how to adjust the groupWhen function to prevent this (I'm still an F# greenie).
Here is a simplified version of the code, showing only the relevant parts that will recreate the StackOverflowException:
// This is the function created by Tomas Petricek where the StackOverflowException occurs
module Seq =
    /// Iterates over elements of the input sequence and groups adjacent elements.
    /// A new group is started when the specified predicate holds about the element
    /// of the sequence (and at the beginning of the iteration).
    ///
    /// For example:
    ///    Seq.groupWhen isOdd [3;3;2;4;1;2] = seq [[3]; [3; 2; 4]; [1; 2]]
    let groupWhen f (input:seq<_>) = seq {
        use en = input.GetEnumerator()
        let running = ref true

        // Generate a group starting with the current element. Stops generating
        // when it finds an element such that 'f en.Current' is 'true'
        let rec group() =
            [ yield en.Current
              if en.MoveNext() then
                  if not (f en.Current) then yield! group() // *** Exception occurs here ***
                  else running := false ]

        if en.MoveNext() then
            // While there are still elements, start a new group
            while running.Value do
                yield group() |> Seq.ofList }
This is the gist of the code making use of Tomas' function:
module Extractor =
    open System
    open System.IO
    open Microsoft.FSharp.Reflection

    // ... elided a few functions, including "isNextObject", which is
    // a string -> bool (it examines the line and returns true
    // if the string meets the criteria indicating that we are at
    // the start of the next inner file)

    let writeFile outputDir file =
        // ... write out "file" to the file system
        // NOTE: file is a seq<string>

    let writeFiles outputDir (files : seq<seq<_>>) =
        files
        |> Seq.iter (fun file -> writeFile outputDir file)
And here is the relevant code in the console application that makes use of the functions:
let lines = inputFile |> File.ReadLines
writeFiles outputDir (lines |> Seq.groupWhen isNextObject)
Any ideas on the proper way to stop groupWhen from blowing the stack? I'm not sure how I would convert the function to use an accumulator (or to use a continuation instead, which I think is the correct terminology).
The problem with this is that the group() function returns a list, which is an eagerly evaluated data structure. This means that every time you call group(), it has to run to the end, collect all results in a list, and return that list, so the recursive call happens within that same evaluation, i.e. truly recursively, creating stack pressure.
To mitigate this problem, you could just replace the list with a lazy sequence:
let rec group() = seq {
    yield en.Current
    if en.MoveNext() then
        if not (f en.Current) then yield! group()
        else running := false }
However, I would consider less drastic approaches. This example is a good illustration of why you should avoid doing recursion yourself and resort to ready-made folds instead.
For example, judging by your description, it seems that Seq.windowed may work for you.
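For reference, Seq.windowed produces all overlapping windows of a given size, computed lazily; a quick illustration:

// Seq.windowed n yields every contiguous window of n elements as an array
Seq.windowed 2 [1; 2; 3; 4] // seq [[|1; 2|]; [|2; 3|]; [|3; 4|]]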
It's easy to overuse sequences in F#, IMO. You can accidentally get stack overflows, plus they are slow.
So (not actually answering your question), personally I would just fold over the seq of lines using something like this:
let isNextObject line =
    line = "---"

type State = {
    fileIndex : int
    filename : string
    writer : System.IO.TextWriter
    }

let makeFilename index =
    sprintf "File%i" index

let closeFile (state:State) =
    //state.writer.Close() // would use this in real code
    state.writer.WriteLine("=== Closing {0} ===", state.filename)

let createFile index =
    let newFilename = makeFilename index
    let newWriter = System.Console.Out // dummy
    newWriter.WriteLine("=== Creating {0} ===", newFilename)
    // create new state with new writer
    { fileIndex = index + 1; writer = newWriter; filename = newFilename }

let writeLine (state:State) line =
    if isNextObject line then
        // finish old file here
        closeFile state
        // create new file here and return updated state
        createFile state.fileIndex
    else
        // write the line to the current file
        state.writer.WriteLine(line)
        // return the unchanged state
        state

let processLines (lines: string seq) =
    // setup
    let initialState = createFile 1
    // process the file
    let finalState = lines |> Seq.fold writeLine initialState
    // tidy up
    closeFile finalState
(Obviously a real version would use files rather than the console)
Yes, it is crude, but it is easy to reason about, with no unpleasant surprises.
Here's a test:
processLines [
    "a"; "b"
    "---"; "c"; "d"
    "---"; "e"; "f"
    ]
And here's what the output looks like:
=== Creating File1 ===
a
b
=== Closing File1 ===
=== Creating File2 ===
c
d
=== Closing File2 ===
=== Creating File3 ===
e
f
=== Closing File3 ===

One processing conduit, 2 IO sources of the same type

In my GHC Haskell application utilizing stm, network-conduit and conduit, I have a strand for each socket which is forked automatically using runTCPServer. Strands can communicate with other strands through the use of a broadcasting TChan.
This showcases how I would like to set up the conduit "chain":
[diagram of the planned conduit chain]
So, what we have here is two sources (each bound to helper conduits) which produce a Packet object that encoder will accept and turn into a ByteString, then send out the socket. I've had a great amount of difficulty with the efficient fusing of the two inputs (performance is a concern).
I would appreciate if somebody could point me in the right direction.
Since it would be rude of me to post this question without making an attempt, I'll put what I've previously tried here.
I've written/cherry-picked a function which (blocking) produces a Source from a TMChan (a closeable channel):
-- | Takes a generic type of STM chan and, given read and close functionality,
-- returns a conduit 'Source' which consumes the elements of the channel.
chanSource
    :: (MonadIO m, MonadSTM m)
    => a                    -- ^ The channel
    -> (a -> STM (Maybe b)) -- ^ The read function
    -> (a -> STM ())        -- ^ The close/finalizer function
    -> Source m b
chanSource ch readCh closeCh = ConduitM pull
  where
    close     = liftSTM $ closeCh ch
    pull      = PipeM $ liftSTM $ readCh ch >>= translate
    translate = return . maybe (Done ()) (HaveOutput pull close)
Likewise, a function to transform a Chan into a sink:
-- | Takes a stream and, given write and close functionality, returns a sink
-- which will consume elements and broadcast them into the channel.
chanSink
    :: (MonadIO m, MonadSTM m)
    => a                  -- ^ The channel
    -> (a -> b -> STM ()) -- ^ The write function
    -> (a -> STM ())      -- ^ The close/finalizer function
    -> Sink b m ()
chanSink ch writeCh closeCh = ConduitM sink
  where
    close  = const . liftSTM $ closeCh ch
    sink   = NeedInput push close
    write  = liftSTM . writeCh ch
    push x = PipeM $ write x >> return sink
Then mergeSources is straightforward: fork 2 threads (which I really don't want to do, but what the heck) which push their new items into one channel, of which I then produce a source:
-- | Merges a list of 'Source' objects, sinking them into a 'TMChan', and returns
-- a source which consumes the elements of the channel.
mergeSources
    :: (MonadIO m, MonadBaseControl IO m, MonadSTM m)
    => [Source (ResourceT m) a] -- ^ The list of sources
    -> ResourceT m (Source (ResourceT m) a)
mergeSources sx = liftSTM newTMChan >>= liftA2 (>>) (fsrc sx) retn
  where
    push c s = s $$ chanSink c writeTMChan closeTMChan
    fsrc x c = mapM_ (\s -> resourceForkIO $ push c s) x
    retn c   = return $ chanSource c readTMChan closeTMChan
While I was successful in making these functions typecheck, I was unsuccessful in getting any use of them to typecheck:
-- | Helper which represents a conduit chain for each client connection
serverApp :: Application SessionIO
serverApp appdata = do
    use ssBroadcast >>= liftIO . atomically . dupTMChan >>= assign ssBroadcast
    -- appSource appdata $$ decoder $= protocol =$= encoder =$ appSink appdata
    mergsrc $$ protocol $= encoder =$ appSink appdata
  where
    chansrc = chanSource (use ssBroadcast) readTMChan closeTMChan
    mergsrc = mergeSources [appSource appdata $= decoder, chansrc]

-- | Structure which holds mutable information for clients
data SessionState = SessionState
    { _ssBroadcast :: TMChan Packet -- ^ Outbound packet broadcast channel
    }
makeLenses ''SessionState

-- | A transformer encompassing both SessionReader and SessionState
type Session m = ReaderT SessionReader (StateT SessionState m)

-- | Macro providing Session applied to an IO monad
type SessionIO = Session IO
I see this method as flawed anyhow: there are many intermediate lists and conversions, which cannot be good for performance. I'm seeking guidance.
PS. From what I can understand, this is not a duplicate of Fusing conduits with multiple inputs, because in my situation both sources produce the same type and I don't care which source a Packet object comes from, as long as I'm not waiting on one source while the other has objects ready to be consumed.
PPS. I apologize for the use (and therefore the required knowledge) of Lens in the example code.
I don't know if it's any help, but I tried to implement Iain's suggestion and made a variant, mergeSources', that stops as soon as any of the channels does:
mergeSources' :: (MonadIO m, MonadBaseControl IO m)
              => [Source (ResourceT m) a] -- ^ The sources to merge.
              -> Int                      -- ^ The bound of the intermediate channel.
              -> ResourceT m (Source (ResourceT m) a)
mergeSources' sx bound = do
    c <- liftSTM $ newTBMChan bound
    mapM_ (\s -> resourceForkIO $
                 s $$ chanSink c writeTBMChan closeTBMChan) sx
    return $ sourceTBMChan c
(This simple addition is available here).
Some comments on your version of mergeSources (take them with a grain of salt; it may be that I didn't understand something well):
Using ...TMChan instead of ...TBMChan seems dangerous. If the writers are faster than the reader, your heap will blow up. Looking at your diagram, it seems this can easily happen if your TCP peer doesn't read data fast enough. So I'd definitely use ...TBMChan with a perhaps large, but limited, bound.
You don't need the MonadSTM m constraint. All STM stuff is wrapped into IO with
liftSTM = liftIO . atomically
Maybe this will help you slightly when using mergeSources' in serverApp.
Just a cosmetic issue: I found
liftSTM newTMChan >>= liftA2 (>>) (fsrc sx) retn
very hard to read due to its use of liftA2 on the (->) r monad. I'd say
do
    c <- liftSTM newTMChan
    fsrc sx c
    retn c
would be longer, but much easier to read.
Could you perhaps create a self-contained project where it would be possible to play with serverApp?

File seeking with SML Basis

Is there a way, using the SML Basis library, to open a file at a specific position? That is, to use an operating-system call to change the position, rather than scanning through the file and throwing away the data.
This is tricky. Unfortunately, seeking isn't directly supported. Moreover, file positions are only transparent for binary files, i.e., those that you have opened with the BinIO structure [1]. For this structure, the corresponding type BinIO.StreamIO.pos is defined to be Position.int, which is some integer type.
However, in an SML system that supports the complete I/O stack from the standard you should be able to synthesise the following seek function using the lower I/O layers:
(* seekIn : BinIO.instream * Position.int -> unit *)
fun seekIn (instream, pos) =
    case BinIO.StreamIO.getReader (BinIO.getInstream instream) of
        (reader as BinPrimIO.RD {setPos = SOME f, ...}, _) =>
            ( f pos;
              BinIO.setInstream (instream,
                  BinIO.StreamIO.mkInstream (reader, Word8Vector.fromList []))
            )
      | (BinPrimIO.RD {name, ...}, _) =>
            raise IO.Io {
                name = name,
                function = "seekIn",
                cause = IO.RandomAccessNotSupported
            }
Use it like:
val file = BinIO.openIn "filename"
val _ = seekIn(file, 200)
val bin = BinIO.inputN(file, 1000)
If you need to convert from Word8Vector to string:
val s = Byte.bytesToString bin
You can do the equivalent for out streams as well.
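For instance, an outstream analogue might look something like this (a sketch mirroring seekIn, untested; it assumes getWriter/mkOutstream from the same lower layers, which flush and rebuild the stream around the repositioning):

(* seekOut : BinIO.outstream * Position.int -> unit *)
fun seekOut (outstream, pos) =
    case BinIO.StreamIO.getWriter (BinIO.getOutstream outstream) of
        (writer as BinPrimIO.WR {setPos = SOME f, ...}, mode) =>
            ( f pos;
              BinIO.setOutstream (outstream,
                  BinIO.StreamIO.mkOutstream (writer, mode))
            )
      | (BinPrimIO.WR {name, ...}, _) =>
            raise IO.Io {
                name = name,
                function = "seekOut",
                cause = IO.RandomAccessNotSupported
            }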
[1] http://standardml.org/Basis/bin-io.html#BIN_IO:SIG:SPEC
If you can manage to get hold of the reader/writer, then they should have getPos, setPos and endPos functions, depending on which kind of reader/writer you are dealing with.

Writing cat in OCaml: use of Unix.read

I'm trying to write small utilities to get used to Unix programming with OCaml. Here's my try for cat:
open Unix ;;

let buffer_size = 10
let buffer = String.create buffer_size

let rec cat = function
  | [] -> ()
  | x :: xs ->
      let descr = openfile x [O_RDONLY] 0 in
      let rec loop () =
        match read descr buffer 0 buffer_size with
        | 0 -> ()
        | _ -> print_string buffer; loop () in
      loop ();
      print_newline ();
      close descr;
      cat xs ;;

handle_unix_error cat (List.tl (Array.to_list Sys.argv))
It seems that the problem is that on the last call to read, the buffer doesn't fill entirely (since there's nothing more to read), so the end of what the buffer previously contained gets printed too. I read a few code examples using read and they didn't seem to call String.create every time they refill the buffer (which would, anyway, still leave it filled with some characters...); so what should I do? Thanks.
The return value of Unix.read (which you ignore, except to check for 0) is the number of bytes actually read, so you should use only that many characters of the buffer.
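A corrected version might look like this (a sketch; it uses Bytes, which recent versions of Unix.read expect, instead of String.create):

open Unix

let buffer_size = 10
let buffer = Bytes.create buffer_size

let rec cat = function
  | [] -> ()
  | x :: xs ->
      let descr = openfile x [O_RDONLY] 0 in
      let rec loop () =
        match read descr buffer 0 buffer_size with
        | 0 -> ()
        (* print only the n bytes actually read, not the whole buffer *)
        | n -> print_string (Bytes.sub_string buffer 0 n); loop () in
      loop ();
      print_newline ();
      close descr;
      cat xs

let () = handle_unix_error cat (List.tl (Array.to_list Sys.argv))

Note that read may return fewer bytes than requested even in the middle of a file, so looping until it returns 0 is the right approach.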
But really, why bother using the low-level Unix stuff? Why not use the regular OCaml file opening and reading functions?
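For comparison, a version using the standard in_channel API needs no Unix module at all (again a sketch):

let buffer_size = 4096
let buffer = Bytes.create buffer_size

let cat_file name =
  let ic = open_in_bin name in
  let rec loop () =
    (* input returns the number of bytes read; 0 means end of file *)
    let n = input ic buffer 0 buffer_size in
    if n > 0 then begin
      print_string (Bytes.sub_string buffer 0 n);
      loop ()
    end
  in
  loop ();
  close_in ic

let () = List.iter cat_file (List.tl (Array.to_list Sys.argv))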
