Interpreting and converting zone file data - networking

I have a zone file for a single TLD. I am trying to process the file data and convert it into JSON for other services that use this data. Here are the first few lines of the file I have:
com. 900 in soa a.gtld-servers.net. nstld.verisign-grs.com. 1612915221 1800 900 604800 86400
0-------------------------------------------------------------0.com. 172800 in ns ns1.domainit.com.
0-------------------------------------------------------------0.com. 172800 in ns ns2.domainit.com.
0-------------------------------------------------------------5.com. 172800 in ns fns.frogsmart.net.
0-------------------------------------------------------------5.com. 172800 in ns sns.frogsmart.net.
0-------------------------------------------------------------5.com. 172800 in ns tns.frogsmart.net.
Now I am not sure how to interpret this file's data. I have looked at reference and example zone files in multiple places, but they do not resemble this format. One of the references can be found here. I just need some pointers on how to interpret each line. My understanding is the following:
The first value is the domain name
The next value is a number which, if I use the first line as header, seems to be 900 (not sure what it is)
The next value is in (not sure what this is)
The next value is soa which is ns (I think this means Start of Authority for domain is with Name server)
Lastly, the name server which, if I use the first line as header, seems to be a.gtld-servers.net (I think this is the primary SOA address)
Now the other properties (the first line, I think, indicates 10 properties) are not present in this file I am trying to process. That's all I could figure out so far, and some help will be greatly appreciated.

First, a warning: zone files can be big, especially the .com one, and if you convert that to JSON, especially if you intend to build the whole object in memory before using it, you might run into trouble.
So start by asking yourself whether you really need all the data (for example, as seen below, what will you do with the SOA content?) and whether JSON is the most adequate representation, especially if you do not process it in a streaming way.
DNS data is explained in RFC 1034+1035.
More specifically §3.3.13 in RFC 1035:
3.3.13. SOA RDATA format
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
/ MNAME /
/ /
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
/ RNAME /
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| SERIAL |
| |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| REFRESH |
| |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| RETRY |
| |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| EXPIRE |
| |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| MINIMUM |
| |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
where:
MNAME The <domain-name> of the name server that was the
original or primary source of data for this zone.
RNAME A <domain-name> which specifies the mailbox of the
person responsible for this zone.
SERIAL The unsigned 32 bit version number of the original copy
of the zone. Zone transfers preserve this value. This
value wraps and should be compared using sequence space
arithmetic.
REFRESH A 32 bit time interval before the zone should be
refreshed.
RETRY A 32 bit time interval that should elapse before a
failed refresh should be retried.
EXPIRE A 32 bit time value that specifies the upper limit on
the time interval that can elapse before the zone is no
longer authoritative.
But do note that the semantics have changed in later RFCs: the MINIMUM field is now used as the negative caching TTL (see RFC 2308).
Also, IN (case is not significant) means INternet, but all records will have that; consider it a leftover of past DNS experiments with classes that never worked.
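To tie this back to the sample lines: each one reads as owner name, TTL, class, type and then the type-specific RDATA, and for SOA the RDATA is the seven fields listed above. Below is a minimal Python sketch of a streaming conversion to one JSON object per line. It assumes every line spells out owner, TTL, class and type in that order, as the sample does; general RFC 1035 master files allow omissions, $ORIGIN/$TTL directives and multi-line records, which this does not handle. The function names and JSON keys are made up for illustration.

```python
import json

# SOA RDATA field names in the order given by RFC 1035 section 3.3.13.
SOA_FIELDS = ("mname", "rname", "serial", "refresh", "retry", "expire", "minimum")


def parse_record(line):
    """Parse one fully spelled-out line: owner TTL class type rdata..."""
    parts = line.split()
    record = {
        "name": parts[0],
        "ttl": int(parts[1]),
        "class": parts[2].upper(),  # case is not significant
        "type": parts[3].upper(),
    }
    rdata = parts[4:]
    if record["type"] == "SOA":
        # Two names followed by five 32-bit numbers.
        values = rdata[:2] + [int(v) for v in rdata[2:7]]
        record["rdata"] = dict(zip(SOA_FIELDS, values))
    else:
        # For NS this is simply the name server's domain name.
        record["rdata"] = " ".join(rdata)
    return record


def zone_to_json_lines(lines):
    """Yield one JSON document per record instead of building one big object."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):  # skip blanks and comments
            continue
        yield json.dumps(parse_record(line))
```

Passing an open file object to zone_to_json_lines and writing each yielded string straight to the output keeps memory use flat, which matters for a zone the size of .com.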

Related

What is the structure of the Presentation-Data-Value in P-Data-TF?

I find these for Presentation Data Value:
0 | 0 | 0 | 0 | 0 | 0 | last-or-not | command-or-dataset | **Some Message**
But I couldn't find Some Message part. I suppose this part include C-Find, C-Get etc. How can I know this structure?
Where do you have this from? In fact it is a bit different.
Your example should read
0 | 0 | 0 | 0 | 0 | 0 | last-or-not | command-or-data*set* | **Some Message**
So the "command-or-dataset" flag indicates whether the following bytes encode a command (as defined in PS3.7) or a dataset (as defined in PS3.3 or PS3.4 respectively).
E.g. for DICOM Queries, there is a C-FIND command defined in PS3.7, Chapter 9.1.2.1. In C-FIND, the query criteria are part of the command ("Identifier") in table 9.1-2. How the identifier is formed and all its semantics are the subject of the Query/Retrieve Service Class as defined in PS3.4, C.4.1.
For transferring objects, there is a C-STORE command, also defined in PS3.7 (chapter 9.1.1.1). The Data-Set is also a part of the C-STORE command, and its contents depend on the type of data (SOP Class). This is referred to as an Information Object Definition (IOD) and defined in PS3.3. The protocol for Storage is also defined in PS3.4 (Annex B).
However, the length limitation of the PDV will usually not allow the whole object to be encoded in a single PDV, so it needs to be split. For the following PDVs, no command set will be present but only a fragment of the dataset. In this case, the "command-or-dataset" bit must be set to 0.
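As an illustration of the two flag bits, here is a small Python sketch that reads the message control header byte at the start of a PDV. It assumes the layout shown above, with the command-or-dataset flag in the least significant bit and the last-or-not flag in the next bit; the function name and dictionary keys are invented for this example.

```python
def parse_message_control_header(header_byte):
    """Split the PDV message control header into its two flags."""
    return {
        # Bit 0: 1 = command fragment, 0 = dataset fragment.
        "is_command": bool(header_byte & 0x01),
        # Bit 1: 1 = last fragment of the command or dataset, 0 = more follow.
        "is_last": bool(header_byte & 0x02),
    }
```

A C-STORE that is split over several PDVs would therefore typically show one header with the command bit set, followed by dataset fragments with the command bit cleared, the last of which has the last-fragment bit set.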
I hope this makes it a bit clearer. When you start learning DICOM, it is a bit difficult to know all the terms and their interrelationships.
Encoding
Logically, command sets and datasets are encoded in the same way. The data dictionary (Part 6) is a complete list of all possible attributes, and the major difference between command and dataset attributes is that command attributes have group number 0 while dataset attributes have "any even number but 0".
For each attribute, the data dictionary gives you the Value Representation (VR), which needs to be considered for encoding the value. E.g. "PN" for patient name, "UI" for Unique Identifier and so forth. The VRs are defined in PS3.5, Chapter 6.2.
The encoding of attributes is then
group | element | (VR) | length (always even) | value
How this is transformed to the binary level depends on the Transfer Syntax (TS) that was agreed for the service during association negotiation. For this reason "VR" is enclosed in brackets above: whether it must or must not be present depends on whether the TS is implicit or explicit.
There are some more things to consider (endianness, sequence encoding) when encoding command sets or datasets in binary form. Basically everything about it is described in various chapters of PS3.5.
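To make the group | element | (VR) | length | value layout concrete, here is a hedged Python sketch for the two little-endian transfer syntaxes. It only covers VRs that use the short explicit form with a 2-byte length (such as PN or UI); VRs like OB, OW, SQ or UN use a longer header and are not handled. The function names are invented for this example.

```python
import struct


def pad_to_even(vr, value):
    """Values must have even length: UI pads with NUL, text VRs with a space."""
    if len(value) % 2:
        value += b"\x00" if vr == "UI" else b" "
    return value


def encode_explicit_vr_le(group, element, vr, value):
    """Explicit VR Little Endian, short form: group, element, VR, 2-byte length."""
    value = pad_to_even(vr, value)
    header = struct.pack("<HH2sH", group, element, vr.encode("ascii"), len(value))
    return header + value


def encode_implicit_vr_le(group, element, vr, value):
    """Implicit VR Little Endian: no VR on the wire, 4-byte length."""
    value = pad_to_even(vr, value)  # the VR is still needed to choose the padding
    return struct.pack("<HHI", group, element, len(value)) + value
```

For example, encode_explicit_vr_le(0x0010, 0x0010, "PN", b"Doe^John") produces the tag bytes, the two characters PN, the length 8 and then the name itself.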

Downloading the entire Bitcoin transaction chain with R

I'm pretty new here so thank you in advance for the help. I'm trying to do some analysis of the entire Bitcoin transaction chain. In order to do that, I'm trying to create 2 tables
1) A full list of all Bitcoin addresses and their balance, i.e.,:
| ID | Address | Balance |
-------------------------------
| 1 | 7d4kExk... | 32 |
| 2 | 9Eckjes... | 0 |
| . | ... | ... |
2) A record of the number of transactions that have ever occurred between any two addresses in the Bitcoin network
| ID | Sender | Receiver | Transactions |
--------------------------------------------------
| 1 | 7d4kExk... | klDk39D... | 2 |
| 2 | 9Eckjes... | 7d4kExk... | 3 |
| . | ... | ... | .. |
To do this I've written a (probably very inefficient) script in R that loops through every block and scrapes blockexplorer.com to compile the tables. I've tried running it a couple of times so far but I'm running into two main issues
1 - It's very slow... I can imagine it's going to take at least a week at the rate that it's going
2 - I haven't been able to run it for more than a day or two without it hanging. It seems to just freeze RStudio.
I'd really appreciate your help in two areas:
1 - Is there a better way to do this in R to make the code run significantly faster?
2 - Should I stop using R altogether for this and try a different approach?
Thanks in advance for the help! Please see below for the relevant chunks of code I'm using
url_start <- "http://blockexplorer.com/b/"
url_end <- ""
readUrl <- function(url) {
  table <- try(readHTMLTable(url)[[1]])
  if (inherits(table, "try-error")) {
    message(paste("URL does not seem to exist:", url))
    errors <- errors + 1
    return(NA)
  } else {
    processed <- processed + 1
    return(table)
  }
}
block_loop <- function (end, start = 0) {
  ...
  addr_row <- 1 #starting row to fill out table
  links_row <- 1 #starting row to fill out table
  for (i in start:end) {
    print(paste0("Reading block: ",i))
    url <- paste(url_start,i,url_end, sep = "")
    table <- readUrl(url)
    if(is.na(table)){ next }
    ....
There are very close to 250,000 blocks on the site you mentioned (at least, 260,000 gives a 404). Curling from my connection (1 MB/s down) gives an average response time of about half a second. Try it yourself from the command line (just copy and paste) to see what you get:
curl -s -w "%{time_total}\n" -o /dev/null http://blockexplorer.com/b/220000
I'll assume your requests are about as fast as mine. Half a second times 250,000 is 125,000 seconds, or a day and a half. This is the absolute best you can get with one request at a time, whatever method you use, because you have to request each page.
Now, after doing an install.packages("XML"), I saw that running readHTMLTable("http://blockexplorer.com/b/220000") takes about five seconds on average. Five seconds times 250,000 is 1.25 million seconds, which is about two weeks. So your estimates were correct; this is really, really slow. For reference, I'm running a 2011 MacBook Pro with a 2.2 GHz Intel Core i7 and 8GB of memory (1333 MHz).
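The two estimates are simple enough to check in a few lines. This is just the arithmetic from the paragraphs above written out in Python, with the block count and per-block timings taken from this answer; the function name is made up.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def total_days(n_blocks, seconds_per_block):
    """Days needed to fetch every block one after another."""
    return n_blocks * seconds_per_block / SECONDS_PER_DAY


download_only = total_days(250_000, 0.5)  # raw page requests
with_parsing = total_days(250_000, 5.0)   # request plus readHTMLTable
print(f"{download_only:.1f} days vs {with_parsing:.1f} days")
```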
Next, table merges in R are quite slow. Assuming 100 records per table (seems about average), you'll have 25 million rows, and some of these rows have a kilobyte of data in them. Even assuming you can fit this table in memory, concatenating tables will be a problem.
The solution to these problems that I'm most familiar with is to use Python instead of R, BeautifulSoup4 instead of readHTMLTable, and Pandas to replace R's dataframe. BeautifulSoup is fast (install lxml, a parser written in C) and easy to use, and Pandas is very quick too. Its dataframe class is modeled after R's, so you probably can work with it just fine. If you need something to request URLs and return the HTML for BeautifulSoup to parse, I'd suggest Requests. It's lean and simple, and the documentation is good. All of these are pip installable.
If you still run into problems the only thing I can think of is to get maybe 1% of the data in memory at a time, statistically reduce it, and move on to the next 1%. If you're on a machine similar to mine, you might not have another option.
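The "reduce a slice, then move on" idea can be sketched with the standard library alone. The example below assumes the scraped data arrives as an iterable of (sender, receiver) pairs, which is an assumption about how you would structure the scraper's output, and it keeps only the running pair counts in memory. All names are hypothetical.

```python
from collections import Counter
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items without loading everything."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def count_pairs(transactions, chunk_size=10_000):
    """Count transactions per (sender, receiver) pair, one chunk at a time."""
    totals = Counter()
    for chunk in chunks(transactions, chunk_size):
        # Reduce the chunk to counts, then let the raw rows be garbage collected.
        totals.update((sender, receiver) for sender, receiver in chunk)
    return totals
```

The result maps each pair to its transaction count, which corresponds to the second table in the question; address balances could be accumulated the same way with a second counter.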

Bind query resolution time in munin

Is it possible to graph the query resolution time of bind9 in munin?
I know there is a way to graph it for an unbound server; has it already been done for bind? If not, how do I start writing a munin plugin for that? I'm getting stats from http://127.0.0.1:8053/ on the bind9 server.
I don't believe that "query time" is a function of BIND. About the only time that I see that value (with individual lookups) is when using dig. If you're willing to use that, the following might be a good starting point:
#!/bin/sh
case $1 in
config)
cat <<'EOM'
graph_title Red Hat Query Time
graph_vlabel time
time.label msec
EOM
exit 0;;
esac
printf 'time.value '
dig www.redhat.com | grep Query | cut -d':' -f2 | cut -d' ' -f2
Note that the delimiter in the second cut statement is a single space; the text after the colon starts with a space, so the number is the second field. If you save the above as "querytime" and run it at the command line, output should look something like:
root@pi1:~# ./querytime
time.value 189
root@pi1:~# ./querytime config
graph_title Red Hat Query Time
graph_vlabel time
time.label msec
I'm not sure of the value in tracking the above though. The response time can be affected by whether the query is an initial lookup, whether the answer is cached locally, server load, intervening network congestion, etc.
Note: the above may be a bit buggy as I've written it on the fly, but it should give you a good starting point. That it returned the above output is a good sign.
In any case, I recommend reading the following before you write your own: http://munin-monitoring.org/wiki/HowToWritePlugins
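If you would rather write the plugin in a language with proper string handling, the parsing step of the shell pipeline above can be sketched in Python as follows. It assumes dig prints a line of the form ";; Query time: 189 msec" and that reporting "U" for an unknown value is acceptable to your munin setup; both function names are made up, and actually running dig (for example via subprocess) is left out.

```python
import re

# Matches the statistics line that dig prints after the answer section.
QUERY_TIME = re.compile(r"^;; Query time: (\d+) msec", re.MULTILINE)


def parse_query_time(dig_output):
    """Return the query time in milliseconds, or None if dig printed none."""
    match = QUERY_TIME.search(dig_output)
    return int(match.group(1)) if match else None


def munin_value_line(dig_output):
    """Format the value line for the 'time' field of the plugin above."""
    msec = parse_query_time(dig_output)
    return "time.value " + ("U" if msec is None else str(msec))
```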

TCP/IP communication from the unix server to the Pure Data

I am interested in TCP/IP communication from the Unix server to the Pure Data. I have it realized using sockets on the Unix server side, and netclient on the Pure Data side. I exploited the chat-server tutorial for this (3.Networking > 10.chat_client.pd).
Now the problem is that the server streams the data out as a "string" message delimited with ";".
My question is, is there a way to send something other than string message to Pure Data, like byte-stream or serialized number stream? Can Pure Data receive such messages?
A string takes too many bytes to transfer: for example, the number "1024;" is already 5 bytes, while such an integer is just 4 bytes.
UPDATE: For everyone who stumbles upon this post in search of the answer.
Apparently [netclient] on the Pure Data side cannot receive anything other than ;-delimited messages.
So the solution for the problem posed above:
My question is, is there a way to send something other than string message to Pure Data, like byte-stream or serialized number stream? Can Pure Data receive such messages?
The solution is to use [tcpclient], it can receive byte-stream data.
Now my question is, how do I get four compact numbers to work with?
Now I have a series of bytes, at least in the correct order.
From my UNIX server I am sending a structure
typedef struct {
int var_code;
int sample_time;
int hr;
float hs;
} phy_data;
Sample data might be 2 1000000 51 2000.56
When received and printed in Pure Data I get output like this:
: 0 0 0 2 0 10 114 26 0 0 0 51 0 16 242 78
You can notice number 2 and number 51 clearly, I guess the others are correct as well.
How can I get these numbers back to a usable format?
Maybe some manipulation with [bytes2any] and [route], but I haven't been able to extract the data with it?
here's an outline of what you have to do:
repackage the bytelist to small messages of the correct size for the various types.
since all your elements are 4 byte long, you simply repackage your list (or bytestream, as TCP/IP doesn't guarantee to deliver your 16 bytes as a single list, but could also decide to break it into a list of arbitrary length) to a number of 4 atom lists.
the most stable way would probably be to first serialize the list (check the "serializer" example in the [list] help) and then reassemble that list into 4-element lists.
if you can use externals like zexy you could use [repack 4] for that.
if you trust [netclient] to output your messages as complete lists, you could simply use a large [unpack ....] and 4 [pack]s
interpret the raw data for each sublist
integers are rather simple, floats are way more complicated
integers:
|
[unpack 0 0 0 0]
|      |   |   |
[<< 8] |   |   |
|      |   |   |
[+     ]   |   |
|          |   |
[<< 8]     |   |
|          |   |
[+         ]   |
|              |
[<< 8]         |
|              |
[+             ]
|
floats are left as an exercise to the user :-)
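For comparison outside Pd, the same shift-and-add logic is easy to write down in Python, and it also shows what the float case involves. The sketch assumes the server sends the struct in network byte order (big-endian), which is what the bytes 0 0 0 2 and 0 0 0 51 in the question suggest, and that the float is an IEEE 754 single sent in the same byte order. Function names are made up.

```python
import struct


def int_from_bytes_be(four_bytes):
    """Mirror of the patch above: shift left by 8, add the next byte, repeat."""
    value = 0
    for byte in four_bytes:
        value = (value << 8) + byte
    return value


def decode_phy_data(byte_list):
    """Decode 16 bytes into (var_code, sample_time, hr, hs)."""
    if len(byte_list) != 16:
        raise ValueError("expected exactly 16 bytes")
    # '>' = big-endian, 'i' = 32-bit signed int, 'f' = 32-bit IEEE 754 float.
    return struct.unpack(">iiif", bytes(byte_list))
```

The float cannot be rebuilt with shifts and additions alone: its four bytes hold a sign bit, an 8-bit exponent and a 23-bit mantissa, which is why that part is the harder one to patch in Pd.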
the real solution to your problem would be to use a well-defined application-layer protocol, rather than brew your own.
the most widespread protocol in use for applications like Pd, is certainly OSC.
in order to decode the raw OSC-bytes into Pd-messages, use [unpackOSC] (part of the "mrpeach" library; on Debian, you install it via the pd-osc package)
on the "server" side, you can use liblo for encoding data and sending it.
note
be aware that since OSC is packet-based, you will need a packetizing mechanism for stream-based protocols like TCP/IP. as with OSC-1.2, this should be SLIP. liblo should already take care of this. check the patches accompanying [unpackOSC] for how to do this within Pd.
all this is not needed if you are using UDP as the transport.
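to show what the packetizing amounts to, here is a small Python sketch of SLIP framing with the byte values from RFC 1055. it puts an END byte both before and after each packet, and it is only an illustration with made-up function names; in practice liblo and the patches accompanying [unpackOSC] should do this for you.

```python
END, ESC, ESC_END, ESC_ESC = 0xC0, 0xDB, 0xDC, 0xDD


def slip_encode(packet):
    """Wrap one packet in END bytes and escape END/ESC inside it."""
    out = bytearray([END])
    for byte in packet:
        if byte == END:
            out += bytes([ESC, ESC_END])
        elif byte == ESC:
            out += bytes([ESC, ESC_ESC])
        else:
            out.append(byte)
    out.append(END)
    return bytes(out)


def slip_decode(stream):
    """Split a received byte stream back into the original packets."""
    packets, current, escaped = [], bytearray(), False
    for byte in stream:
        if escaped:
            if byte == ESC_END:
                current.append(END)
            elif byte == ESC_ESC:
                current.append(ESC)
            else:
                current.append(byte)  # protocol violation, keep the byte as-is
            escaped = False
        elif byte == ESC:
            escaped = True
        elif byte == END:
            if current:  # empty frames between two END bytes are ignored
                packets.append(bytes(current))
                current = bytearray()
        else:
            current.append(byte)
    return packets
```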

How to setup futures instruments in FinancialInstrument to lookup data from CSIdata

Background
I am trying to set up my trade analysis environment. I am running some rule-based strategies on futures with different brokers and trying to aggregate the trades from those brokers in one place. I am using the blotter package as my main tool for analysis.
The idea is to use blotter and PerformanceAnalytics to analyze the live performance of the various strategies I am running.
Problem at hand
My source of futures EOD data is CSIData. All the EOD OHLC prices for these futures are stored in CSV format in the following directory structure. For each future there is a separate directory, and each contract of the future has one CSV file with the OHLC price series.
|
+---AD
| AD_201203.TXT
| AD_201206.TXT
| AD_201209.TXT
| AD_201212.TXT
| AD_201303.TXT
| AD_201306.TXT
| AD_201309.TXT
| AD_201312.TXT
| AD_201403.TXT
| AD_201406.TXT
| AD_54.TXT
...
+---BO2
| BO2195012.TXT
| BO2201201.TXT
| BO2201203.TXT
| BO2201205.TXT
| BO2201207.TXT
| BO2201208.TXT
| BO2201209.TXT
| BO2201210.TXT
| BO2201212.TXT
| BO2201301.TXT
...
I have managed to define root contracts for all the futures I will be using (e.g. in the above case AD, BO2 etc.) in FinancialInstrument, with CSIData symbols as primary identifiers.
I am now struggling with how to define all the actual individual future contracts (e.g. AD_201203, AD_201206 etc.) and set up their lookup using setSymbolLookup.FI.
Any pointers on how to do that?
To set up individual future contracts, I looked into ?future_series and ?build_series_symbols; however, the suffixes they support seem to be only in the futures month code format. So I have a feeling I am left with setting up each individual future contract manually. e.g.
build_series_symbols(data.frame(primary_id=c('ES','NQ'), month_cycle=c('H,M,U,Z'), yearlist = c(10,11)))
[1] "ESH0" "ESM0" "ESU0" "ESZ0" "NQH0" "NQM0" "NQU0" "NQZ0" "ESH1" "ESM1" "ESU1" "ESZ1" "NQH1" "NQM1" "NQU1" "NQZ1"
I have no clue where to start digging for the second part of my question, i.e. setting up price lookup for these futures from CSI.
PS: If this is not right forum for this kind of question, I am happy to get it moved to right section or even ask on totally different forum altogether.
PPS: Can someone with higher reputation tag this question with FinancialInstrument and CSIdata? Thanks!
The first part just works.
R> currency("USD")
[1] "USD"
R> future("AD", "USD", 100000)
[1] "AD"
Warning message:
In future("AD", "USD", 1e+05) :
underlying_id should only be NULL for cash-settled futures
R> future_series("AD_201206", expires="2012-06-18")
[1] "AD_201206"
R> getInstrument("AD_201206")
primary_id :"AD_201206"
currency :"USD"
multiplier :1e+05
tick_size : NULL
identifiers: list()
type :"future_series" "future"
root_id :"AD"
suffix_id :"201206"
expires :"2012-06-18"
Regarding the second part, I've never used setSymbolLookup.FI. I'd either use setSymbolLookup directly, or set a src instrument attribute if I were going to go that route.
However, I'd probably make a getSymbols method, maybe getSymbols.mycsv, that knows how to find your data if you give it a dir argument. Then, I'd just setDefaults on your getSymbols method (assuming that's how most of your data are stored).
I save data with saveSymbols.days(), and use getSymbols.FI daily. I think it wouldn't be much effort to tweak getSymbols.FI to read csv files instead of RData files. So, I suggest looking at that code.
Then, you can just
setDefaults("getSymbols", src="mycsv")
setDefaults("getSymbols.mycsv", dir="path/to/dir")
Or, if you prefer
setSymbolLookup(AD_201206=list(src="mycsv", dir="/path/to/dir"))
or (essentially the same thing)
instrument_attr("AD_201206", "src", list(src="mycsv", dir="/path/to/dir"))
