TCP/IP communication from a Unix server to Pure Data - unix

I am interested in TCP/IP communication from a Unix server to Pure Data. I have implemented it using sockets on the Unix server side and [netclient] on the Pure Data side, based on the chat-client tutorial (3.Networking > 10.chat_client.pd).
The problem is that the server streams the data out as a "string" message delimited with ";".
My question is: is there a way to send something other than a string message to Pure Data, such as a byte stream or a serialized number stream? Can Pure Data receive such messages?
A string takes too many bytes to transfer; for example, the number "1024;" is already 5 bytes, while the same integer is just 4 bytes.
UPDATE: For everyone who stumbles upon this post in search of the answer.
Apparently [netclient] on the Pure Data side cannot receive anything other than ;-delimited messages.
So the solution to the problem posed above (sending something other than a string message, like a byte stream) is to use [tcpclient], which can receive byte-stream data.
Now my question is: how do I turn these bytes into four usable numbers?
I now receive a series of bytes, at least in the correct order.
From my Unix server I am sending this structure:
typedef struct {
    int   var_code;
    int   sample_time;
    int   hr;
    float hs;
} phy_data;
Sample data might be 2 1000000 51 2000.56
When received and printed in Pure Data I get output like this:
: 0 0 0 2 0 10 114 26 0 0 0 51 0 16 242 78
You can clearly see the numbers 2 and 51; I guess the others are correct as well.
How can I get these numbers back into a usable format?
Maybe some manipulation with [bytes2any] and [route]? I haven't been able to extract the data with them.

here's an outline of what you have to do:
repackage the bytelist to small messages of the correct size for the various types.
since all your elements are 4 bytes long, you simply repackage your list (or bytestream; TCP/IP doesn't guarantee to deliver your 16 bytes as a single list, it could also break them into lists of arbitrary length) into a number of 4-atom lists.
the most stable way would probably be to first serialize the list (check the "serializer" example in the [list] help) and then reassemble that list into groups of 4 elements.
if you can use externals like zexy you could use [repack 4] for that.
if you trust [netclient] to output your messages as complete lists, you could simply use a large [unpack ....] and 4 [pack]s
interpret the raw data for each sublist
integers are rather simple, floats are way more complicated
integers:
 |
[unpack 0 0 0 0]
 |      |  |  |
[<< 8]  |  |  |
 |      |  |  |
[+    ]    |  |
 |         |  |
[<< 8]     |  |
 |         |  |
[+       ]    |
 |            |
[<< 8]        |
 |            |
[+          ]
 |
floats are left as an exercise to the user :-)
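as a cross-check outside of Pd: the 16 bytes are just the struct fields in network byte order, so a small Python sketch (assuming the sender writes big-endian, which the byte dump above suggests) shows what the shift-and-add chain reconstructs:

import struct

# phy_data as three big-endian int32s followed by one big-endian float32 (assumed layout)
raw = struct.pack(">iiif", 2, 1000000, 51, 2000.56)
print(list(raw))                       # the 16 raw byte values, comparable to the dump above
var_code, sample_time, hr, hs = struct.unpack(">iiif", raw)
print(var_code, sample_time, hr, hs)   # 2 1000000 51 and roughly 2000.56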

the real solution to your problem would be to use a well-defined application-layer protocol, rather than brew your own.
the most widespread protocol in use for applications like Pd is certainly OSC.
in order to decode the raw OSC-bytes into Pd-messages, use [unpackOSC] (part of the "mrpeach" library; on Debian, you install it via the pd-osc package)
on the "server" side, you can use liblo for encoding data and sending it.
note
be aware that since OSC is packet-based, you will need a packetizing mechanism for stream-based protocols like TCP/IP. as with OSC-1.2, this should be SLIP. liblo should already take care of this. check the patches accompanying [unpackOSC] for how to do this within Pd.
all this is not needed if you are using UDP as a transport.
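for illustration, a minimal sketch of the UDP route, using the python-osc package on the sending side instead of liblo (the address pattern /phy_data and port 9000 are made up):

from pythonosc.udp_client import SimpleUDPClient

# send the four struct fields as one OSC message over UDP
client = SimpleUDPClient("127.0.0.1", 9000)            # Pd host and port (assumed)
client.send_message("/phy_data", [2, 1000000, 51, 2000.56])

on the Pd side this would be picked up with something like [udpreceive 9000] feeding [unpackOSC] and [routeOSC /phy_data] (both from the mrpeach library).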

Related

What is the structure of the Presentation-Data-Value in P-Data-TF?

I found this for the Presentation Data Value:
0 | 0 | 0 | 0 | 0 | 0 | last-or-not | command-or-dataset | **Some Message**
But I couldn't find the Some Message part. I suppose this part includes C-Find, C-Get etc. How can I learn this structure?
Where did you get this from? In fact it is a bit different.
Your example should read
0 | 0 | 0 | 0 | 0 | 0 | last-or-not | command-or-data*set* | **Some Message**
So the "command-or-dataset" flag indicates whether the following bytes are encoding a command (as defined in PS3.7) or a dataset as defined in PS3.3 or 3.4 respectively).
E.g. for DICOM Queries, there is a C-FIND command defined in PS3.7, Chapter 9.1.2.1. In C-FIND, the query criteria are part of the command (the "Identifier") in table 9.1-2. How the identifier is formed, and all its semantics, is the subject of the Query/Retrieve Service Class as defined in PS3.4, C.4.1.
For transferring objects, there is a C-STORE command, also defined in PS3.7 (chapter 9.1.1.1). The Data-Set is also a part of the C-STORE command, and its contents depend on the type of data (SOP Class). This is referred to as an Information Object Definition (IOD) and defined in PS3.3. The protocol for Storage is also defined in PS3.4 (Annex B).
However, the length limitation of the PDV will usually not allow the whole object to be encoded in a single PDV, so it needs to be split. For the following PDVs, no command set will be present but only a fragment of the dataset. In this case, the "command-or-dataset" bit must be set to 0.
I hope I could make it a bit clearer. When you begin learning DICOM, it is difficult to know all the terms and their interrelationships.
Encoding
Logically, command sets and datasets are encoded in the same way. The data dictionary (Part 6) is a complete list of all possible attributes, and the major difference between command and dataset attributes is that command attributes have the group number 0, while dataset attributes have any even group number other than 0.
For each attribute, the data dictionary gives you the Value Representation (VR) which needs to be considered for encoding the value. E.g. "PN" for Patient Name, "UI" for Unique Identifier, and so forth. The VRs are defined in PS3.5, Chapter 6.2.
The encoding of attributes is then
group | element | (VR) | length (always even) | value
How this is transformed to the binary level depends on the Transfer Syntax (TS) that was agreed for the service during association negotiation. For this reason "VR" is enclosed in brackets above: whether it must or must not be present depends on whether the TS is implicit or explicit.
There are some more things to consider (endianness, sequence encoding) when encoding command sets or datasets in binary form. Basically everything about it is described in various chapters of PS3.5.
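To make the attribute layout above concrete, here is a minimal Python sketch (assuming Explicit VR Little Endian and a short-form VR such as PN; long-form VRs like OB or SQ use a different length encoding, and this is not a complete encoder):

import struct

def encode_short_vr(group, element, vr, value):
    """Encode one attribute in Explicit VR Little Endian (short-form VRs only)."""
    data = value.encode("ascii")
    if len(data) % 2:                  # the length must always be even
        data += b" "                   # PN-style padding; other VRs pad differently (e.g. UI uses NUL)
    # group (uint16 LE) | element (uint16 LE) | VR (2 chars) | length (uint16 LE) | value
    return struct.pack("<HH2sH", group, element, vr, len(data)) + data

# Patient Name (0010,0010) with VR "PN"
print(encode_short_vr(0x0010, 0x0010, b"PN", "Doe^John").hex(" "))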

Interpreting and converting zone file data

I have a zone file for a single TLD. I am trying to process the file data and convert it into JSON for other services that use this data. Here are the first few lines of the file I have:
com. 900 in soa a.gtld-servers.net. nstld.verisign-grs.com. 1612915221 1800 900 604800 86400
0-------------------------------------------------------------0.com. 172800 in ns ns1.domainit.com.
0-------------------------------------------------------------0.com. 172800 in ns ns2.domainit.com.
0-------------------------------------------------------------5.com. 172800 in ns fns.frogsmart.net.
0-------------------------------------------------------------5.com. 172800 in ns sns.frogsmart.net.
0-------------------------------------------------------------5.com. 172800 in ns tns.frogsmart.net.
Now I am not sure how to interpret this file's data. I have looked at reference and example zone files in multiple places, but they do not resemble this format. One of the references can be found here. I just need some pointers on how to interpret each line. My understanding is the following:
The first value is the domain name
The next value is a number which, if I use the first line as an example, seems to be 900 (not sure what this is)
The next value is in (not sure what this is)
The next value is soa or ns (I think soa means the Start of Authority for the domain is with the name server)
Lastly, the name server which, if I use the first line as an example, seems to be a.gtld-servers.net (I think this is the primary SOA address)
The first line seems to have more properties (10, I think), but these are not present in the other lines of the file I am trying to process. That's all I could figure out so far, and some help would be greatly appreciated.
First a warning: zone files can be big, especially the .com one, and if you convert it to JSON, especially if you intend to fully build the object in memory before using it, you might run into trouble.
So you should start by asking yourself whether you really need all the data (for example, as seen below, what will you do with the SOA content?) and whether JSON is the most adequate representation, especially if it is not produced in a streaming way.
DNS data is explained in RFC 1034+1035.
More specifically §3.3.13 in RFC 1035:
3.3.13. SOA RDATA format
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
/                     MNAME                     /
/                                               /
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
/                     RNAME                     /
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    SERIAL                     |
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    REFRESH                    |
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     RETRY                     |
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    EXPIRE                     |
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    MINIMUM                    |
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
where:
MNAME    The <domain-name> of the name server that was the
         original or primary source of data for this zone.
RNAME    A <domain-name> which specifies the mailbox of the
         person responsible for this zone.
SERIAL   The unsigned 32 bit version number of the original copy
         of the zone. Zone transfers preserve this value. This
         value wraps and should be compared using sequence space
         arithmetic.
REFRESH  A 32 bit time interval before the zone should be
         refreshed.
RETRY    A 32 bit time interval that should elapse before a
         failed refresh should be retried.
EXPIRE   A 32 bit time value that specifies the upper limit on
         the time interval that can elapse before the zone is no
         longer authoritative.
MINIMUM  The unsigned 32 bit minimum TTL field that should be
         exported with any RR from this zone.
But do note that the semantics have changed in later RFCs; the MINIMUM is now called the negative TTL.
Also, IN (case not significant) means INternet, but all records will have that; consider it a leftover of past DNS experiments around classes that never worked.
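If you do go the JSON route, here is a minimal streaming sketch in Python. It assumes every record sits on one line in the five-column layout shown above (true for this dump, but not for zone files in general, which may use parentheses, omitted TTLs or $ORIGIN directives); the file name com.zone is made up:

import json

def parse_record(line):
    # owner | TTL | class | type | rdata...  (layout as in the dump above)
    name, ttl, klass, rtype, *rdata = line.split()
    return {"name": name, "ttl": int(ttl), "class": klass.upper(),
            "type": rtype.upper(), "rdata": " ".join(rdata)}

# Emit one JSON object per record instead of building everything in memory.
with open("com.zone") as zone:          # hypothetical file name
    for line in zone:
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        print(json.dumps(parse_record(line)))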

Google Sheets FILTER() and QUERY() not working with SUM()

I'm trying to pull and sum data from one sheet on another. This is GA data being built into a report, so I have sessions split up by landing page and device type, and would like to group them in different ways.
I usually use FILTER() for this sort of thing, but it keeps returning a 0 sum. Thinking this may be an odd edge case with FILTER(), I switched to using QUERY() instead. That gave me an error, but a Google search doesn't offer much documentation about what the error actually means. Taking a guess that it could be indicating an issue with the data type (i.e. not numeric), I changed the format of the source from "Automatic" to "Number", but to no avail.
Maybe it's a lack of coffee, but I'm at a loss as to why neither function works to do a simple lookup and sum by criteria.
FILTER() function
SUM(FILTER(AllData!C:C,AllData!A:A="/chestnut/",AllData!B:B="desktop"))
No error, but returns 0 regardless of filter parameters.
QUERY() function
QUERY(AllData!A:G, "SELECT SUM(C) WHERE A='/chestnut/' AND B='desktop'",1)
Error returned:
Unable to parse query string for Function QUERY parameter 2: AVG_SUM_ONLY_NUMERIC
Sample data:
landingPage | deviceCategory | sessions
-------------|----------------|----------
/chestnut/ | desktop | 4
/chestnut/ | desktop | 2
/chestnut/ | tablet | 5
/chestnut/ | tablet | 1
/maple/ | desktop | 1
/maple/ | desktop | 2
/maple/ | mobile | 3
/maple/ | mobile | 1
I think the summing doesn't work because your numbers are text formatted.
See if any of these work (change ranges to suit):
using FILTER()
=SUM(FILTER(VALUE(AllData!C:C),AllData!A:A="/chestnut/",AllData!B:B="desktop"))
using QUERY()
=ArrayFormula(QUERY({AllData!A:B, VALUE(AllData!C:C)}, "SELECT SUM(Col3) WHERE Col1='/chestnut/' AND Col2='desktop' label SUM(Col3)''",1))
using SUMPRODUCT()
=SUMPRODUCT(VALUE(AllData!C2:C),AllData!A2:A="/chestnut/",AllData!B2:B="desktop")

Why can't I use hPutStr after printing the result of hGetContents?

I'm new to Stack Overflow, so forgive me if I do something wrong. I'm trying to understand how a simple server would work in Haskell. I think I'm missing something very simple or fundamental about how hGetContents works.
import Network
import System.IO

main = withSocketsDo $ do
    socket <- listenOn $ PortNumber 5002
    (h, _, _) <- accept socket
    c <- hGetContents h
    -- putStrLn c -- doesn't work
    -- putStrLn $ head $ lines c -- works!
    -- putStrLn $ unlines $ take 2 $ lines c -- works!
    -- putStrLn $ unlines $ take 3 $ lines c -- works!
    -- putStrLn $ unlines $ take 6 $ lines c -- works!
    putStrLn $ unlines $ take 10 $ lines c -- doesn't work
    hPutStr h $ "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nHello!\r\n"
    hClose h
After running the program, I navigate via web browser to http://localhost:5002. The problem seems to be that, depending on how much of the handle's contents I've parsed, I eventually become unable to send a response. I'd like to be able to parse the request before I send a response. I've marked in the code the cases that work and the cases that don't. Hoogle says that for the lazy hGetContents the handle is "semi-closed" as it is being read. Am I misunderstanding the laziness, or should I consider the handle closed once I begin parsing its contents?
The error I get is "hPutChar: resource vanished (Broken pipe)." Thanks for any help.
I tried to reproduce your problem. For that I executed your code and sent it a request using nc:
printf "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11" | nc localhost 5002
As expected, the server (the code from your question) printed out the first 10 lines and exited without any error. The client (nc) printed:
HTTP/1.0 200 OK
Content-Length: 5
Hello!
and also exited without an error.
So at first I couldn't understand what your problem was, but then I tried to send a smaller request:
printf "1\n2\n3\n4\n5\n6\n" | nc localhost 5002
The server printed the first 6 lines and didn't exit. The client also didn't exit, so I interrupted it with Ctrl-C, and after that the server exited with the "resource vanished" error.
It took some thinking, but then it started making sense to me. I don't understand lazy IO too well, so if my explanation isn't clear or correct, it would be helpful if someone with a better understanding improved it.
Let's follow your code. First:
(h, _, _) <- accept socket
c <- hGetContents h
You open a handle and read its content. Note that the handle is read lazily, so the content that you get is also lazy. When we say that something is lazy, we mean that it can be passed around without being evaluated (this is often referred to as 'call by name' vs 'call by value').
Now:
putStrLn $ unlines $ take 10 $ lines c
Here you pass your lazy, unevaluated content to another function, take 10. take 10 will try to evaluate the first 10 elements of a list and return them; if there are fewer than 10 elements in the list, it simply returns all of them. After take 10 we have putStrLn and unlines, which are both perfectly compatible with laziness.
Now let's say that the client sends an input that is only 6 lines long and then starts waiting for the response. Our server lazily receives the content and tries to print the first 10 lines. First, take 10 happily consumes the first 6 lines and passes them over to putStrLn . unlines; what happens then? take 10 can't just finish its output, because there is absolutely no indication that this is the end. The handle is still open, bytes can still be flowing from client to server, so it just waits for more input.
This behaviour can be observed by running:
nc localhost 5002
and manually typing 10 lines there. The input appears on the server line by line as you type. After you type the 10th line, the server responds with the "Hello" message.
P.S.: I guess that the behaviour you described happens because your web browser sends 6 to 9 lines of something with the request.
To test, debug and analyze this kind of low-level server, you should use simple tools like nc and curl instead of your web browser :)
When you initiate a lazy read on a handle, you give up the right to do anything much else with the handle until the contents string is fully forced, or you close the handle manually (at which point attempting to force any more of the contents string will lead to bad behavior or an error).
TL;DR
This is not a situation where lazy I/O is appropriate. The situations where a lazy read on a socket is appropriate can probably be counted on zero fingers. You can use regular strict I/O if you like, or conduit, or pipes, or some Haskell web framework like Yesod or Scotty or various other competitors.
Calling hGetContents puts the handle into a "semi-closed" state. You should not perform any operations on the handle after that point. You should only use the string returned from hGetContents.
Put simply, don't use lazy I/O here. You need to manually read and write individual strings one at a time, since the timing matters.
In general, lazy I/O is kind of neat, but it doesn't work well for anything much beyond toy examples.

Downloading the entire Bitcoin transaction chain with R

I'm pretty new here so thank you in advance for the help. I'm trying to do some analysis of the entire Bitcoin transaction chain. In order to do that, I'm trying to create 2 tables
1) A full list of all Bitcoin addresses and their balance, i.e.,:
| ID | Address | Balance |
-------------------------------
| 1 | 7d4kExk... | 32 |
| 2 | 9Eckjes... | 0 |
| . | ... | ... |
2) A record of the number of transactions that have ever occurred between any two addresses in the Bitcoin network
| ID | Sender | Receiver | Transactions |
--------------------------------------------------
| 1 | 7d4kExk... | klDk39D... | 2 |
| 2 | 9Eckjes... | 7d4kExk... | 3 |
| . | ... | ... | .. |
To do this I've written a (probably very inefficient) script in R that loops through every block and scrapes blockexplorer.com to compile the tables. I've tried running it a couple of times so far but I'm running into two main issues
1 - It's very slow... I can imagine it's going to take at least a week at the rate that it's going
2 - I haven't been able to run it for more than a day or two without it hanging. It seems to just freeze RStudio.
I'd really appreciate your help in two areas:
1 - Is there a better way to do this in R to make the code run significantly faster?
2 - Should I stop using R altogether for this and try a different approach?
Thanks in advance for the help! Please see below for the relevant chunks of code I'm using
url_start <- "http://blockexplorer.com/b/"
url_end <- ""

readUrl <- function(url) {
  table <- try(readHTMLTable(url)[[1]])
  if (inherits(table, "try-error")) {
    message(paste("URL does not seem to exist:", url))
    errors <- errors + 1
    return(NA)
  } else {
    processed <- processed + 1
    return(table)
  }
}
block_loop <- function(end, start = 0) {
  ...
  addr_row <- 1   # starting row to fill out table
  links_row <- 1  # starting row to fill out table
  for (i in start:end) {
    print(paste0("Reading block: ", i))
    url <- paste(url_start, i, url_end, sep = "")
    table <- readUrl(url)
    if (is.na(table)) { next }
    ....
There are very close to 250,000 blocks on the site you mentioned (at least, 260,000 gives a 404). Curling from my connection (1 MB/s down) takes about half a second per page on average. Try it yourself from the command line (just copy and paste) to see what you get:
curl -s -w "%{time_total}\n" -o /dev/null http://blockexplorer.com/b/220000
I'll assume your requests are about as fast as mine. Half a second times 250,000 is 125,000 seconds, or a day and a half. This is the absolute best you can get using any methods because you have to request the page.
Now, after doing an install.packages("XML"), I saw that running readHTMLTable("http://blockexplorer.com/b/220000") takes about five seconds on average. Five seconds times 250,000 is 1.25 million seconds, which is about two weeks. So your estimates were correct; this is really, really slow. For reference, I'm running a 2011 MacBook Pro with a 2.2 GHz Intel Core i7 and 8GB of memory (1333 MHz).
Next, table merges in R are quite slow. Assuming 100 records per table (that seems about average), you'll have 25 million rows, and some of these rows have a kilobyte of data in them. Even assuming you can fit this table in memory, concatenating the tables will be a problem.
The solution to these problems that I'm most familiar with is to use Python instead of R, BeautifulSoup4 instead of readHTMLTable, and Pandas to replace R's dataframe. BeautifulSoup is fast (install lxml, a parser written in C) and easy to use, and Pandas is very quick too. Its dataframe class is modeled after R's, so you probably can work with it just fine. If you need something to request URLs and return the HTML for BeautifulSoup to parse, I'd suggest Requests. It's lean and simple, and the documentation is good. All of these are pip installable.
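To give an idea of what that looks like, here is a minimal sketch; the URL pattern is taken from your script, but the column layout of the block pages is an assumption you would need to check before scaling up:

import requests
from bs4 import BeautifulSoup
import pandas as pd

def read_block(block_height):
    """Fetch one block page and return its table rows as lists of cell text."""
    url = "http://blockexplorer.com/b/%d" % block_height
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "lxml")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:
            rows.append(cells)
    return rows

# Try a small range first to verify the column layout before looping over all blocks.
frames = [pd.DataFrame(read_block(i)) for i in range(220000, 220010)]
blocks = pd.concat(frames, ignore_index=True)
print(blocks.head())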
If you still run into problems the only thing I can think of is to get maybe 1% of the data in memory at a time, statistically reduce it, and move on to the next 1%. If you're on a machine similar to mine, you might not have another option.
