F# - Need help converting this to use a threadpool - asynchronous

I am new to F# and I have frankensteined the code below from various examples I found online in an attempt to get a better understanding of how I can use it. Currently the code below reads in a list of machines from a file and pings each of the machines. I had to divide the initial array from the file up into a smaller arrays of 25 machines to control the number of concurrent actions otherwise it takes far to long to map out the list of machines. I would like be able to use a threadpool to manage the threads but I have not found a way to make it work. Any guidance would be great. I am not able to make this work:
let creatework = FileLines|> Seq.map (fun elem -> ThreadPool.QueueUserWorkItem(new WaitCallback(dowork), elem))
Here is the complete code:
open System.Threading
open System
open System.IO
let filePath = "c:\qa\machines.txt"
let FileLines = File.ReadAllLines(filePath)
let count = FileLines.Length/25
type ProcessResult = { exitCode : int; stdout : string; stderr : string }
let executeProcess (exe,cmdline) =
let psi = new System.Diagnostics.ProcessStartInfo(exe,cmdline)
psi.UseShellExecute <- false
psi.RedirectStandardOutput <- true
psi.RedirectStandardError <- true
psi.CreateNoWindow <- true
let p = System.Diagnostics.Process.Start(psi, EnableRaisingEvents = true)
let output = new System.Text.StringBuilder()
let error = new System.Text.StringBuilder()
p.OutputDataReceived.Add(fun args -> output.AppendLine(args.Data)|> ignore)
p.ErrorDataReceived.Add(fun args -> error.AppendLine(args.Data) |> ignore)
p.BeginErrorReadLine()
p.BeginOutputReadLine()
p.WaitForExit()
{ exitCode = p.ExitCode; stdout = output.ToString(); stderr = error.ToString() }
let dowork machinename=
async{
let exeout = executeProcess(#"c:\windows\system32\ping.exe", "-n 1 " + machinename)
let exelines =
if exeout.stdout.Contains("Reply from") then Console.WriteLine(machinename + " " + "REPLY")
elif exeout.stdout.Contains("Request timed out.") then Console.WriteLine(machinename + " " + "RTO")
elif exeout.stdout.Contains("Ping request could not find host") then Console.WriteLine(machinename + " " + "Unknown Host")
else Console.WriteLine(machinename + " " + "ERROR")
exelines
}
printfn "%A" (System.DateTime.Now.ToString())
for i in 0..count do
let x = i*25
let y = if i = count then FileLines.Length-1 else (i+1)*25
printfn "%s %d" "X equals: " x
printfn "%s %d" "Y equals: " y
let filesection = FileLines.[x..y]
let creatework = filesection |> Seq.map dowork |> Async.Parallel |> Async.RunSynchronously|>ignore
creatework
printfn "%A" (System.DateTime.Now.ToString())
printfn "finished"
UPDATE:
The code below works and provides a framework for what I want to do. The link that was referenced by Tomas Petricek did have the bits of code that made this work. I just had to figure which example was the right one. It is within 3 seconds of duplicate framework written in Java so I think I am headed in the right direction. I hope the example below will be useful to anyone else trying to thread out various executables in F#:
open System
open System.IO
open System.Diagnostics
let filePath = "c:\qa\machines.txt"
let FileLines = File.ReadAllLines(filePath)
type Process with
static member AsyncStart psi =
let proc = new Process(StartInfo = psi, EnableRaisingEvents = true)
let asyncExit = Async.AwaitEvent proc.Exited
async {
proc.Start() |> ignore
let! args = asyncExit
return proc
}
let shellExecute(program : string, args : string) =
let startInfo =
new ProcessStartInfo(FileName = program, Arguments = args,
UseShellExecute = false,
CreateNoWindow = true,
RedirectStandardError = true,
RedirectStandardOutput = true)
Process.AsyncStart(startInfo)
let dowork (machinename : string)=
async{
let nonbtstat = "NONE"
use! pingout = shellExecute(#"c:\windows\system32\ping.exe", "-n 1 " + machinename)
let pingRdToEnd = pingout.StandardOutput.ReadToEnd()
let pingresults =
if pingRdToEnd.ToString().Contains("Reply from") then (machinename + " " + "REPLY")
elif pingRdToEnd.ToString().Contains("Request timed out.") then (machinename + " " + "RTO")
elif pingRdToEnd.ToString().Contains("Ping request could not find host") then (machinename + " " + "Unknown Host")
else (machinename + " " + "PING_ERROR")
if pingresults.ToString().Contains("REPLY") then
use! nbtstatout = shellExecute(#"c:\windows\system32\nbtstat.exe", "-a " + machinename)
let nbtstatRdToEnd = nbtstatout.StandardOutput.ReadToEnd().Split('\n')
let nbtstatline = Array.tryFind(fun elem -> elem.ToString().Contains("<00> UNIQUE Registered")) nbtstatRdToEnd
return Console.WriteLine(pingresults + nbtstatline.Value.ToString())
else return Console.WriteLine(pingresults + " " + nonbtstat)
}
printfn "%A" (System.DateTime.Now.ToString())
let creatework = FileLines |> Seq.map dowork |> Async.Parallel |> Async.RunSynchronously|>ignore
creatework
printfn "%A" (System.DateTime.Now.ToString())
printfn "finished"

The main problem with your code is that executeProcess is a synchronous function that takes a long time to run (it runs the ping.exe process and waits for its result). The general rule is that tasks in a thread pool should not block for a long time (because then they block thread pool threads, which means that the thread pool cannot efficiently schedule other work).
I think you can solve this quite easily by making executeProcess asynchronous. Instead of calling WaitForExit (which blocks), you can wait for the Exitted event using Async.AwaitEvent:
let executeProcess (exe,cmdline) = async {
let psi = new System.Diagnostics.ProcessStartInfo(exe,cmdline)
psi.UseShellExecute <- false
// [Lots of stuff omitted]
p.BeginOutputReadLine()
let! _ = Async.AwaitEvent p.Exited
return { exitCode = p.ExitCode
stdout = output.ToString(); stderr = error.ToString() } }
This should unblock threads in the thread pool and so you'll be able to use Async.Parallel on all the URLs from the input array without any manual scheduling.
EDIT As #desco pointed out in a comment, the above is not quite right if the process exits before the AwaitEvent line is reached (before it may miss the event). To fix that, you need to use Event.guard function, which was discussed in this SO question:
Need help regarding Async and fsi

Related

Why is the variable modified when nginx Lua processes the request?

I just started studying Lua.
Every time I request, I want to check the name parameter in the request parameter. However, it is actually found that the self.name has changed occasionally.
For example,
request A with params: request_id = 123 & name = ABC,
request B with params: request_id = 321 & name = EFG,
in the log, I found that there are requests_id = 123, but name = EFG.
Why is that? Is my class incorrectly written?
Here is the sample codeļ¼š
main.lua:
local checker = require "checker"
local ch = checker:new()
if ch:check_name() then
ngx.header["Content-Type"] = "application/json"
ngx.status = ngx.HTTP_FORBIDDEN
ngx.exit(ngx.HTTP_FORBIDDEN)
end
checker.lua:
local utils = require "utils"
local _M = {}
function _M:new()
local o = {}
setmetatable(o, self)
self.__index = self
self.args = utils.get_req_args() or {}
local name = self.args["name"] or ""
local request_id = self.args["request_id"] or ""
self.name = name
return o
end
function _M:check_name()
ngx.log(ngx.ERR, "request_id: ", self.request_id, " name: ", self.name)
-- do some check ...
end
utils.lua
local json = require "cjson"
local _M = {}
function _M.new(self)
return self
end
function _M.get_req_args()
-- GET
local args = nil
if ngx.var.request_method == "GET" then
args = ngx.req.get_uri_args()
-- POST
elseif ngx.var.request_method == "POST" then
ngx.req.read_body()
local data = ngx.req.get_body_data()
args = json.decode(data)
end
return args
end

what is the pointer to pointer equivalent in lua

I know that you can use tables in a similar way to pointers in lua. That being said, what would pointers to pointers look like? Would they look something like dp = {p = {}}? if so what would the equivalent to the c code below be in lua?
void InsertItem(node **head, node *newp){
node **dp = head;
while((*dp) && (*dp)->value > newp->value
{
dp = &(*dp)->next;
}
newp->next = *dp;
*dp = newp;
}
Yes, double pointer may be translated to Lua as nested table.
local function InsertItem(head, newitem)
while head.next and head.next.value > newitem.value do
head = head.next
end
newitem.next = head.next
head.next = newitem
end
-- Typical Usage:
local head = {}
InsertItem(head, {value = 3.14})
InsertItem(head, {value = 42})
InsertItem(head, {value = 1})
-- Now the data is the following:
-- head = {next = elem1}
-- elem1 = {next = elem2, value = 42 }
-- elem2 = {next = elem3, value = 3.14}
-- elem3 = { value = 1 }
The big difference between C pointers and Lua tables is that in C, you can take the address of a variable and pass it to a function to modify it. You can't do that in Lua, but the function could always return the modified value.
Would they look something like dp = {p = {}}?
Yes, that's about as close as close as you can get to a pointer to a pointer in Lua.
if so what would the equivalent to the c code below be in lua?
Linked lists tend to work more smoothly with recursion:
local function InsertItem(head, newp)
if not head or head.value <= newp.value then
newp.next = head
return newp
end
head.next = InsertItem(head.next, newp)
return head
end

F# Async - An item with the same key has already been added

I am trying some async ops in f# but w/o much luck. I am trying to grab records from the db and perform operations on each record in Parallel.
let IsA1 companyId =
query { for comp in db.Company do
join cc in db.CC on (comp.CompanyId = int(cc.CompanyId))
join pp in db.PP on (cc.PartId = pp.PartId)
join tl in db.TL on (pp.CompanyId = tl.CompanyId)
where (comp.CompanyId = companyId)
select (comp.CompanyId > 0)
}
|> Seq.length |> fun len -> len > 0
let IsA2 companyId =
query { for t in db.Title do
join pp in db.PP on (t.Tid = pp.Tid)
join comp in db.Company on (pp.CompanyId = comp.CompanyId)
where (comp.CompanyId = companyId)
select (comp.CompanyId > 0)
}
|> Seq.length |> fun len -> len > 0
let GetAffiliations id =
async {
if (IsA1 id) then return "AffilBBB"
elif (IsA2 id) then return "AffilCCD"
else return Unknown
}
let ProcessCompany (company:dbSchema.ServiceTypes.Company) =
async {
let grp = GetAffiliations company.CompanyId
let result = { Id=company.CompanyId; Name=company.Name; Affiliations=grp; ContactType="ok"; }
return result
}
let GetCompanyNames =
let companies = db.Company |> Seq.distinctBy(fun d -> d.CompanyId)
companies
|> Seq.map(fun co -> ProcessCompany co)
|> Async.Parallel
|> Async.RunSynchronously
When I run the above code, I get error:
System.ArgumentException: An item with the same key has already been added.
The error is occurring as a result of another function call inside async { }:
let grp = GetAffiliations company.CompanyId
I am sure its a newbie issue, but I am not sure what the issue is. I even tried making the call inside of the async{ } another async call and used let! grp = (GetAffiliations company.CompanyId) but that does not resolve.
Because the two concurrent queries are sharing the same context, when the second result is added to the same context, you get an error saying that the context already has an item with the same key.
Using distinct instances of the 'db' context for each of the queries, should solve your issue.

Concurrent Download with limited number of Workers and AsyncSeq from FSharpX (or ExtCore)

I try a concurrent download with limited number of Workers using the AsyncSeq module.
Based on the FSharpX example of https://github.com/fsprojects/fsharpx/blob/master/samples/Crawler.fsx
let rec concurrentDownload concurrentWorkers download transform (urls:string list) =
asyncSeq {
let requests = BlockingQueueAgent<_>(1000)
let results = BlockingQueueAgent<_>(50)
let worker() = async {
while true do
let! url = requests.AsyncGet()
let! doc = download url
match doc with
| Some txt -> do! results.AsyncAdd( transform txt )
| _ -> ()
}
// fill in all the requests
for url in urls do
do! requests.AsyncAdd url
// create the workers and start them
for i in 1 .. concurrentWorkers do
worker() |> Async.Start
// get the results and yield them in the asynchronous sequence
while requests.Count > 0 && results.Count > 0 do
let! res = results.AsyncGet()
yield res
}
let rand = new System.Random()
let rnd() = rand.Next(0,4000)
// a simulated download, sleep time depends on random number
let download str = async {
let name = "dl " + str
let payload = rnd()
printfn "Started : %s (payload=%d)" name payload
do! Async.Sleep(1000 + payload)
printfn "Finished: %s" name
return Some name
}
let urls = [1..10] |> List.map (sprintf "URL %d")
let concurrentWorkers = 5
let transform = id
let ret = concurrentDownload concurrentWorkers download transform urls
//ret // val it : AsyncSeq<string> = Microsoft.FSharp.Control.FSharpAsync`1[FSI_0012.FSharp.Control.AsyncSeqInner`1[System.String]]
let z =
ret
|> AsyncSeq.toBlockingSeq
|> Seq.toList
I assumed that z gets something like seq ["dl URL 3"; "dl URL 5"; ... ]
because 'download' returns Some content.
The workers on the blocking queues working as expected:
Started : dl URL 1 (payload=2281)
Started : dl URL 3 (payload=741)
Started : dl URL 4 (payload=3283)
Started : dl URL 5 (payload=1117)
Started : dl URL 2 (payload=2435)
Finished: dl URL 3
Started : dl URL 6 (payload=263)
Finished: dl URL 5
Started : dl URL 7 (payload=1115)
Finished: dl URL 6
Started : dl URL 8 (payload=1041)
Finished: dl URL 1
Started : dl URL 9 (payload=959)
Finished: dl URL 2
Started : dl URL 10 (payload=604)
Finished: dl URL 7
Finished: dl URL 4
Finished: dl URL 10
Finished: dl URL 8
Finished: dl URL 9
The problem is, why is z an empty list?
And not as exprected seq ["dl URL 3"; "dl URL 5"; ... ]?
As a reference, here is the toBlockingSeq function:
// FSharpX AsyncSeq.toBlockingSeq
let toBlockingSeq (input : AsyncSeq<'T>) =
// Write all elements to a blocking buffer and then add None to denote end
let buf = new BlockingQueueAgent<_>(1)
async {
do! iterAsync (Some >> buf.AsyncAdd) input
do! buf.AsyncAdd None
} |> Async.Start
// Read elements from the blocking buffer & return a sequences
let rec loop () =
seq {
match buf.Get() with
| None -> ()
| Some v ->
yield v
yield! loop()
}
loop ()

IO and parallel async in Fsharp

I have some computation intensive tasks, which are now only running on 1 core, so 1/8th of my machine capacity. At the end of each task, I write a log in a file.
What would be the most graceful way to handle this IO using parallel tasks ?
Having my write be itself async ?
Sending messages to an agent who'd process the write sequentially ?
[<Fact>]
let boom () =
let tasks = [1 .. 10]
|> Seq.map (fun components -> async { //do compute intensive stuff
use writer = new StreamWriter("samefile")
writer.WriteLine "toto" }
)
tasks |> Async.Parallel |> Async.RunSynchronously
Edit
I ended up doing this, and replacing the new Stream in my async to be code by synchronous call to the agent.
[<Fact>]
let pasBoom () =
let tasks = [2 .. 2 .. 17]
|> Seq.map (fun components -> async { //do compute intensive stuff
//use writer = new StreamWriter("samefile")
use writerhanlde = repoFileHandle.PostAndReply(fun replyChannel -> GetFile(#"samefile", replyChannel))
printfn "%A" (writerhanlde.getWriter().ToString())
writerhanlde.getWriter().WriteLine "toto" }
)
tasks |> Async.Parallel |> Async.RunSynchronously
and the agent (there might be bugs please be careful, I just need something quick myself)
type IDisposableWriter =
inherit IDisposable
abstract getWriter : unit -> StreamWriter
type StreamMessage = | GetFile of string * AsyncReplyChannel<IDisposableWriter>
let repoFileHandle =
let writerCount = new Dictionary<string, int>()
let writerRepo = new Dictionary<string, StreamWriter> ()
Agent.Start(fun inbox ->
async { while true do
let! msg = inbox.Receive()
match msg with
| GetFile(filename, reply) ->
if not (writerRepo.ContainsKey(filename)) then
writerRepo.[filename] <- new StreamWriter(filename,true)
writerCount.[filename] <- 0
writerCount.[filename] <- writerCount.[filename] + 1
let obj = {new IDisposableWriter with
member this.getWriter () = writerRepo.[filename]
member IDisposable.Dispose() =
writerCount.[filename] <- writerCount.[filename] - 1
if writerCount.[filename] = 0 then
writerRepo.[filename].Dispose()
writerRepo.Remove(filename) |> ignore
}
reply.Reply(obj) })
and to avoid concurrent write
type WriteToStreamMessage = | WriteToStream of string * string
let fileWriterAgent =
Agent.Start(fun inbox ->
async { while true do
let! msg = inbox.Receive()
match msg with
| WriteToStream(filename, content) ->
use writerhanlde = repoFileHandle.PostAndReply(fun replyChannel -> GetFile(filename, replyChannel))
writerhanlde.getWriter().WriteLine content
})
Can you change your computation to return the message to be logged instead of writing it to a file? Then you could use PSeq in PowerPack, which is a thin wrapper over TPL:
open Microsoft.FSharp.Collections
let work n = sprintf "running task %d" n
let msgs = PSeq.init 10 work |> PSeq.toList
use writer = System.IO.StreamWriter(#"C:\out.log")
msgs |> List.iter writer.WriteLine

Resources