Writing cat in OCaml: use of Unix.read - unix

I'm trying to write small utilities to get used to Unix programming with OCaml. Here's my try for cat:
open Unix ;;
let buffer_size = 10
let buffer = String.create buffer_size
let rec cat = function
  | [] -> ()
  | x :: xs ->
    let descr = openfile x [O_RDONLY] 0 in
    let rec loop () =
      match read descr buffer 0 buffer_size with
      | 0 -> ()
      | _ -> print_string buffer; loop () in
    loop ();
    print_newline ();
    close descr;
    cat xs ;;
handle_unix_error cat (List.tl (Array.to_list Sys.argv))
It seems that the problem is that, on the last call to read, the buffer isn't entirely refilled because there's nothing more to read, so the tail of what the buffer previously contained gets printed too. I read a few code examples using read, and they didn't seem to call String.create every time they refill the buffer (which, anyway, would still fill it with some characters...), so what should I do? Thanks.

The return value of Unix.read (which you ignore, except to check for 0) is the number of characters actually read, so you should use only that many characters of the buffer.
But really, why bother using the low-level Unix stuff? Why not use the regular OCaml file opening and reading functions?
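As a minimal sketch of both suggestions, the loop below uses the standard channel functions and writes only as many bytes as each read returned. The names copy_channel and cat are illustrative, not from the question. It assumes a recent OCaml (4.08 or later for Fun.protect), where buffers are of type bytes rather than the mutable strings created by String.create. The same rule applies to Unix.read, whose result is likewise the number of bytes read.

```ocaml
(* Copy everything from [ic] to [oc], writing only the bytes actually read. *)
let copy_channel ic oc =
  let buffer_size = 10 in
  let buffer = Bytes.create buffer_size in
  let rec loop () =
    match input ic buffer 0 buffer_size with
    | 0 -> ()                      (* 0 means end of file *)
    | n ->
      output oc buffer 0 n;        (* only the first n bytes are valid *)
      loop ()
  in
  loop ()

(* Print each named file to standard output, closing it even on error. *)
let cat files =
  List.iter
    (fun name ->
       let ic = open_in_bin name in
       Fun.protect
         ~finally:(fun () -> close_in ic)
         (fun () -> copy_channel ic stdout))
    files
```

A program would call it as cat (List.tl (Array.to_list Sys.argv)). The small buffer size is kept from the question to make the partial last read easy to observe; a real tool would use a much larger one.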

Related

Simple example of recursive run through string characters in Erlang

I can't get how I can go through all characters in a string, can you please share a simple example?
I have a string, like
"function(){var a = 10; var b = 5; return a + b;}".
Now I want to "cycle" through the string character by character and do something depending on its value.
Here is my code, which doesn't work when run as lexme("some string here").:
lexme(S) ->
lexme(S, 1).
lexme([H | T], _) ->
io:fwrite("~p~n", [H]),
T.
In order to make lexme/2 recursive, it must call itself.
Try this:
lexme([H | T], _) ->
io:fwrite("~p~n", [H]),
lexme(T, 1).
I'm not sure what you intend to do with the second parameter. You're ignoring it, so why is it there?
You'll also want a function head that deals with the empty list so that the recursion can terminate, so the full definition would be something like this:
lexme([], _) ->
done;
lexme([H | T], _) ->
io:fwrite("~p~n", [H]),
lexme(T, 1).
See http://learnyousomeerlang.com/recursion for more information.

How to read from stdin?

How to do this in Clean?
Pseudo code:
loop:
input = read_stdin
if input == "q":
break loop
else:
print "you input: ", input
I have glanced at some PDFs, but I got the impression that it is difficult to deal with stdin and stdout. Could I have a code example that uses stdio?
Following Keelan's instructions, I finished my little program.
module std_in_out_loop
import StdEnv
loop :: *File -> *File
loop io
# io = fwrites "input your name: " io
# ( name, io ) = freadline io
# name = name % ( 0, size name - 2 )
| name == "q"
# io = fwrites "Bye!\n" io
= io
| name == ""
# io = fwrites "What's your name?\n" io
= loop io
| otherwise
# io = fwrites ( "hello " +++ name +++ "\n" ) io
= loop io
Start:: *World -> *World
Start world
# ( io, world ) = stdio world
# io = loop io
# ( ok, world ) = fclose io world
| not ok = abort "Cannot close io.\n"
| otherwise = world
From the Clean 2.2 manual, chapter 9:
Although Clean is purely functional, operations with side-effects (I/O operations, for instance) are permitted. To achieve
this without violating the semantics, the classical types are supplied with so called uniqueness attributes. If an argument
of a function is indicated as unique, it is guaranteed that at run-time the corresponding actual object is local, i.e. there are
no other references to it. Clearly, a destructive update of such a “unique object” can be performed safely.
Concretely, you can make Start, which normally has arity 0 (takes no arguments), a function from *World to *World. The idea is that we now have a function that changes the world, which means that side effects are allowed (they're not really side effects any more, but operations on the world).
The * indicates the uniqueness of the World type. This means that you cannot ever have two instances of the world argument. For example, the following will give a compile-time uniqueness error:
Start :: *World -> *(*World, *World)
Start w = (w, w)
To use standard IO, you will need functions from the StdFile module in StdEnv. The functions you're going to need are:
stdio :: !*World -> *(!*File, !*World)
fclose :: !*File !*World -> !(!Bool, !*World)
I simplified the types a bit; they are actually from the class FileSystem. stdio opens a unique File from a world and also returns the new, modified world. fclose closes a file in a world, and returns a success flag and the modified world.
Then, to read and write from that stdio file, you can use:
freadline :: !*File -> *(!*String, !*File)
fwrites :: !String !*File -> !*File
freadline reads a line into a String, including the newline character. fwrites writes a string to a file; you usually want to include a newline character when writing to stdio.
Putting it together:
Start :: *World -> *World
Start w
# (io,w) = stdio w // open stdio
# io = fwrites "What is your name?\n" io // ask for name
# (name,io) = freadline io // read in name
# name = name % (0, size name - 2) // remove \n from name
# io = fwrites ("Hello, " +++ name +++ "!\n") io // greet user
# (ok,w) = fclose io w // close stdio
| not ok = abort "Couldn't close stdio" // abort in case of failure
= w // return world from Start
The # syntax might be new to you. It's a kind of let which allows you to use the same name for files (or other things), which is more convenient than using, e.g.:
Start w = w3
where
(io, w1) = stdio w
io1 = fwrites "What is your name?\n" io
(name, io2) = freadline io1
//...
(ok, w3) = fclose io10 w2
Now you should be able to do what you want in your pseudocode using a helper function loop :: *File -> *File, which calls itself recursively until q is entered.
There are more functions than only freadline and fwrites, see StdFile.dcl for an idea.

How to convert a string to integer list in ocaml?

I need to pass two lists as command line arguments in OCaml.
I used the following code to access it in the program.
let list1=Sys.argv.(1);;
let list2=Sys.argv.(2);;
I need to have the list1 and list2 as list of integers.
I am getting the error
This expression has type string but an expression was expected of type
int list
while processing.
How can I convert those arguments to lists of integers?
The arguments are passed in this format [1;2;3;4] [1;5;6;7]
Sys.argv.(n) will always be a string. You need to parse the string into a list of integers. You could try something like this:
$ ocaml
OCaml version 4.01.0
# #load "str.cma";;
# List.map int_of_string (Str.split (Str.regexp "[^0-9]+") "[1;5;6;7]");;
- : int list = [1; 5; 6; 7]
Of course this doesn't check the input for correct form. It just pulls out sequences of digits by brute force. To do better you need to do some real lexical analysis and simple parsing.
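A slightly stricter alternative, as a sketch only: the function below checks the surrounding brackets, splits on ';', and rejects any element that is not an integer. The name int_list_of_string is made up for this example. It relies only on the standard library (String.split_on_char and int_of_string_opt), and it accepts whatever int_of_string accepts for each element, including negative numbers.

```ocaml
(* Parse a string such as "[1;2;3;4]" into an int list.
   Returns None when the brackets are missing or an element is not an integer. *)
let int_list_of_string (s : string) : int list option =
  let s = String.trim s in
  let len = String.length s in
  if len < 2 || s.[0] <> '[' || s.[len - 1] <> ']' then None
  else
    let body = String.trim (String.sub s 1 (len - 2)) in
    if body = "" then Some []          (* "[]" is the empty list *)
    else
      (* Convert every item; a single bad item makes the whole result None. *)
      List.fold_right
        (fun item acc ->
           match acc, int_of_string_opt (String.trim item) with
           | Some rest, Some n -> Some (n :: rest)
           | _ -> None)
        (String.split_on_char ';' body)
        (Some [])
```

With this, Sys.argv.(1) can be converted with int_list_of_string Sys.argv.(1), and the None case can be reported as a usage error instead of silently skipping malformed input.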
(Maybe this is obvious, but you could also test your function in the toplevel (the OCaml read-eval-print loop). The toplevel will handle the work of making a list from what you type in.)
As Sys.argv is a string array, you need to write your own transcription function.
I guess the simplest way to do this is to use the Genlex module provided by the standard library.
let lexer = Genlex.make_lexer ["["; ";"; "]"; ]
let list_of_string s =
let open Genlex in
let open Stream in
let stream = lexer (of_string s) in
let fail () = failwith "Malformed string" in
let rec aux acc =
match next stream with
| Int i ->
( match next stream with
| Kwd ";" -> aux (i::acc)
| Kwd "]" -> i::acc
| _ -> fail () )
| Kwd "]" -> acc
| _ -> fail ()
in
try
match next stream with
| Kwd "[" -> List.rev (aux [])
| _ -> fail ()
with Stream.Failure -> fail ()
let list1 = list_of_string Sys.argv.(1)
let list2 = list_of_string Sys.argv.(2)
Depending on the OCaml flavor you want to use, some other library may look more interesting. If you like yacc, Menhir may solve your problem in a few lines of code.

One processing conduit, 2 IO sources of the same type

In my GHC Haskell application utilizing stm, network-conduit and conduit, I have a strand for each socket which is forked automatically using runTCPServer. Strands can communicate with other strands through the use of a broadcasting TChan.
This showcases how I would like to set up the conduit "chain":
So, what we have here is two sources (each bound to a helper conduit), each of which produces a Packet object that encoder accepts and turns into a ByteString, which is then sent out over the socket. I've had a great deal of difficulty fusing the two inputs efficiently (performance is a concern).
I would appreciate if somebody could point me in the right direction.
Since it would be rude of me to post this question without making an attempt, I'll put what I've previously tried here;
I've written/cherrypicked a function which (blocking) produces a Source from a TMChan (closeable channel);
-- | Takes a generic type of STM chan and, given read and close functionality,
-- returns a conduit 'Source' which consumes the elements of the channel.
chanSource
:: (MonadIO m, MonadSTM m)
=> a -- ^ The channel
-> (a -> STM (Maybe b)) -- ^ The read function
-> (a -> STM ()) -- ^ The close/finalizer function
-> Source m b
chanSource ch readCh closeCh = ConduitM pull
where close = liftSTM $ closeCh ch
pull = PipeM $ liftSTM $ readCh ch >>= translate
translate = return . maybe (Done ()) (HaveOutput pull close)
Likewise, a function to transform a Chan into a sink;
-- | Takes a stream and, given write and close functionality, returns a sink
-- which will consume elements and broadcast them into the channel
chanSink
:: (MonadIO m, MonadSTM m)
=> a -- ^ The channel
-> (a -> b -> STM()) -- ^ The write function
-> (a -> STM()) -- ^ The close/finalizer function
-> Sink b m ()
chanSink ch writeCh closeCh = ConduitM sink
where close = const . liftSTM $ closeCh ch
sink = NeedInput push close
write = liftSTM . writeCh ch
push x = PipeM $ write x >> return sink
Then mergeSources is straightforward; fork 2 threads (which I really don't want to do, but what the heck) which can put their new items into the one list which I then produce a source of;
-- | Merges a list of 'Source' objects, sinking them into a 'TMChan' and returns
-- a source which consumes the elements of the channel.
mergeSources
:: (MonadIO m, MonadBaseControl IO m, MonadSTM m)
=> [Source (ResourceT m) a] -- ^ The list of sources
-> ResourceT m (Source (ResourceT m) a)
mergeSources sx = liftSTM newTMChan >>= liftA2 (>>) (fsrc sx) retn
where push c s = s $$ chanSink c writeTMChan closeTMChan
fsrc x c = mapM_ (\s -> resourceForkIO $ push c s) x
retn c = return $ chanSource c readTMChan closeTMChan
While I was successful in making these functions typecheck, I was unsuccessful in getting any utilization of these functions to typecheck;
-- | Helper which represents a conduit chain for each client connection
serverApp :: Application SessionIO
serverApp appdata = do
use ssBroadcast >>= liftIO . atomically . dupTMChan >>= assign ssBroadcast
-- appSource appdata $$ decoder $= protocol =$= encoder =$ appSink appdata
mergsrc $$ protocol $= encoder =$ appSink appdata
where chansrc = chanSource (use ssBroadcast) readTMChan closeTMChan
mergsrc = mergeSources [appSource appdata $= decoder, chansrc]
-- | Structure which holds mutable information for clients
data SessionState = SessionState
{ _ssBroadcast :: TMChan Packet -- ^ Outbound packet broadcast channel
}
makeLenses ''SessionState
-- | A transformer encompassing both SessionReader and SessionState
type Session m = ReaderT SessionReader (StateT SessionState m)
-- | Macro providing Session applied to an IO monad
type SessionIO = Session IO
I see this method as being flawed anyhow -- there are many intermediate lists and conversions. This can not be good for performance. Seeking guidance.
PS. From what I can understand, this is not a duplicate of "Fusing conduits with multiple inputs", as in my situation both sources produce the same type and I don't care from which source the Packet object is produced, as long as I'm not waiting on one while another has objects ready to be consumed.
PPS. I apologize for the usage (and therefore requirement of knowledge) of Lens in example code.
I don't know if it's any help, but I tried to implement Iain's suggestion and made a variant of mergeSources' that stops as soon as any of the channels does:
mergeSources' :: (MonadIO m, MonadBaseControl IO m)
=> [Source (ResourceT m) a] -- ^ The sources to merge.
-> Int -- ^ The bound of the intermediate channel.
-> ResourceT m (Source (ResourceT m) a)
mergeSources' sx bound = do
c <- liftSTM $ newTBMChan bound
mapM_ (\s -> resourceForkIO $
s $$ chanSink c writeTBMChan closeTBMChan) sx
return $ sourceTBMChan c
(This simple addition is available here).
Some comments on your version of mergeSources (take them with a grain of salt; I may have misunderstood something):
Using ...TMChan instead of ...TBMChan seems dangerous. If the writers are faster than the reader, your heap will blow up. Looking at your diagram, it seems that this can easily happen if your TCP peer doesn't read data fast enough. So I'd definitely use ...TBMChan with a perhaps large but limited bound.
You don't need the MonadSTM m constraint. All STM stuff is wrapped into IO with
liftSTM = liftIO . atomically
Maybe this will help you slightly when using mergeSources' in serverApp.
Just a cosmetic issue, I found
liftSTM newTMChan >>= liftA2 (>>) (fsrc sx) retn
very hard to read due to its use of liftA2 on the (->) r monad. I'd say
do
c <- liftSTM newTMChan
fsrc sx c
retn c
would be longer, but much easier to read.
Could you perhaps create a self-contained project where it would be possible to play with serverApp?

How to return the index of a for loop in OCaml?

let find_free_next heap start =
for i = start to ((Array.length heap)-1) do
match heap.(i) with
Hdr (Free (h), g) ->
i
done
How can I return the index of a loop as an integer once the match has been found?
If you want to stick to the imperative style, you can use an exception to exit the loop:
exception Found of int
let find_free_next heap start =
  try
    for i = start to Array.length heap - 1 do
      match heap.(i) with
      | Hdr (Free (h), g) -> raise (Found i)
      | _ -> () (* If it is not what you are seeking *)
    done;
    raise Not_found
  with
  | Found n -> n
But generally, as people have already written, a functional style is preferred in OCaml:
let find_free_next heap start =
  let len = Array.length heap in
  let rec find i =
    if i >= len then None
    else
      match heap.(i) with
      | Hdr (Free h, g) -> Some i
      | _ -> find (i+1)
  in
  find start
In this example, there is not much difference between the two versions, but exceptions for exiting loops or recursion must be used with caution; you can introduce control flow bugs pretty easily with them, and they are sometimes hard to debug.
BTW, you can use Array.unsafe_get heap i to speed up your array access, since you can be sure that i is always in the valid range of the array in the above examples. (Oh, we need a start >= 0 check in addition, though.)
Asumu Takikawa is right: the for loop in OCaml doesn't return a result. In idiomatic OCaml, you should use recursion instead. Ideally there would be a standard function like List.find that works for arrays. There is a function BatArray.findi in OCaml Batteries Included that does what you seem to want.
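If pulling in Batteries is not an option, such a helper is short to write by hand. The sketch below is an assumption-laden illustration: find_index_from and is_free_header are made-up names, and the status and cell types are hypothetical stand-ins, since the question does not show its own heap types.

```ocaml
(* Return the index of the first element at or after [start] that satisfies
   [p], or None if there is none. An out-of-range [start] yields None. *)
let find_index_from (p : 'a -> bool) (a : 'a array) (start : int) : int option =
  let len = Array.length a in
  let rec go i =
    if i >= len then None
    else if p a.(i) then Some i
    else go (i + 1)
  in
  if start < 0 then None else go start

(* Hypothetical cell types; the original question does not show them. *)
type status = Free of int | Used of int
type cell = Hdr of status * int | Data of int

(* True only for a header cell that is marked free. *)
let is_free_header = function
  | Hdr (Free _, _) -> true
  | _ -> false

let find_free_next heap start = find_index_from is_free_header heap start
```

Returning an option makes the "no free cell" case explicit in the type, at the cost of one small allocation for Some i; raising Not_found instead avoids that allocation but moves the failure case out of the type.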
Simpler, and more efficient (no allocation at all):
let rec find_free_next heap start =
  if start >= Array.length heap then raise Not_found;
  match heap.(start) with
  | Hdr (Free h, g) -> start
  | _ -> find_free_next heap (start+1)
Or, in imperative style:
let exit = Exit
let find_free_next heap start =
  let pos = ref (-1) in
  try
    for i = start to Array.length heap - 1 do
      match heap.(i) with
      | Hdr (Free h, g) -> pos := i; raise exit
      | _ -> ()
    done;
    raise Not_found
  with Exit -> !pos
(notice that raise exit does not allocate only because the exception is precomputed).
Loops in OCaml are supposed to be imperative, so they shouldn't return a result (aside from unit). So if you try to return a non-unit result, the compiler will give a warning.
The reason that OCaml doesn't let you return a result from a loop is that this isn't a very functional idiom. If you use a recursive function instead of a loop, it's easy to exit early and return a result (by returning the result instead of recurring). If you want to write idiomatic OCaml, you probably want to use recursion in this case.
