Resource Records - networking

I have these two statements, but I don't know whether each of them is possible or impossible:
1. The RR included in response message 7 is of type NS.
2. The RR included in response message 3 is an authoritative response.
My thinking: NS-type RRs usually come in non-authoritative referrals, so statement 1 should be impossible; and since root DNS servers are distinct from authoritative servers, statement 2 should also be impossible. Is that right?
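If you want to check the referral behaviour yourself, and assuming dig is available, you can query a root server directly; the reply is a referral, so the header carries no aa (authoritative answer) flag and the NS records sit in the AUTHORITY section:
# shell
dig @k.root-servers.net cis.poly.edu
# the header flags contain no "aa", and the AUTHORITY SECTION lists
# the NS records for edu. that delegate the query onward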

For question 1: you can use the dig command with +trace, which logs each step of the resolution so you can see which root name server was used.
Below are my command and output for querying cis.poly.edu's resource records; the root name server that answered was k.root-servers.net.
# shell
dig cis.poly.edu +trace
# output
; <<>> DiG 9.10.6 <<>> cis.poly.edu +trace
;; global options: +cmd
. 57690 IN NS j.root-servers.net.
. 57690 IN NS a.root-servers.net.
. 57690 IN NS g.root-servers.net.
. 57690 IN NS d.root-servers.net.
. 57690 IN NS c.root-servers.net.
. 57690 IN NS i.root-servers.net.
. 57690 IN NS m.root-servers.net.
. 57690 IN NS h.root-servers.net.
. 57690 IN NS e.root-servers.net.
. 57690 IN NS l.root-servers.net.
. 57690 IN NS b.root-servers.net.
. 57690 IN NS f.root-servers.net.
. 57690 IN NS k.root-servers.net.
. 518398 IN RRSIG NS 8 0 518400 20221028050000 20221015040000 18733 . DfUYqN7yeMl12Qrjlk3XF3uXzeVKCPOO4Z8cJIy67ago71Oad9W2iyey rpKZ5wK3FJqSB5HY0s7IICtLOUGIgWjYGJ33xAI9JzbkMVX67bxYr3lQ hkQrPLs/KAU+vBR1cq18+97gDaBb0LZ6jmNP5TQ73wKDvAlK0sQ5X0J8 08ZdLsGtyCoyiO+xxYwdD/t4AaetIjIuSaImTlRQ+PHL/CobmS563iUX dnSBzM5zzN5kmWAtyVvLtZ7illgOmoIrF3wkau9f28YOrpuKqliz33vy 9thColeVbEcXG2AIy4yBP4ZwPO0DgBeWSz9zPKKMTySoIhKaLV10l9sK zhizPw==
;; Received 1097 bytes from 8.8.8.8#53(8.8.8.8) in 6 ms
edu. 172800 IN NS a.edu-servers.net.
edu. 172800 IN NS b.edu-servers.net.
edu. 172800 IN NS c.edu-servers.net.
edu. 172800 IN NS d.edu-servers.net.
edu. 172800 IN NS e.edu-servers.net.
edu. 172800 IN NS f.edu-servers.net.
edu. 172800 IN NS g.edu-servers.net.
edu. 172800 IN NS h.edu-servers.net.
edu. 172800 IN NS i.edu-servers.net.
edu. 172800 IN NS j.edu-servers.net.
edu. 172800 IN NS k.edu-servers.net.
edu. 172800 IN NS l.edu-servers.net.
edu. 172800 IN NS m.edu-servers.net.
edu. 86400 IN DS 28065 8 2 4172496CDE85534E51129040355BD04B1FCFEBAE996DFDDE652006F6 F8B2CE76
edu. 86400 IN RRSIG DS 8 1 86400 20221028050000 20221015040000 18733 . s8v68Yclxtfl3mgPNWGA6e1HoMtodjbCurEEghXwlfJgVNDLCRJ5TSF9 r4UUkrE/eTVEUyeXls9XkD2AzkwnG4lS/Hdl2FajFzfCWBoCx7LwkSZy iaFfYqZosTS13BFjlRv7GnNevXJUrxkrJafUNCDNVVLH1UCChp11GaSZ LvGiTqvfbx6XdAJEygnEDPtvC3BO8YQUykGZahp4x8zF7mtkOnNDbU7x wyjcZWvEiGQO2BRZ1sujVIe3/TZZ+lgViba6s0KAYxQPI/76wtrQuvh2 oikExIxVkznOAJK1HRc7mZdtWCxJxNKW7NEBW9WOdIXteKzV+c/GYBjs QY2c7w==
;; Received 1171 bytes from 193.0.14.129#53(k.root-servers.net) in 35 ms
poly.edu. 172800 IN NS photon.poly.edu.
poly.edu. 172800 IN NS gatekeeper.poly.edu.
poly.edu. 172800 IN NS ns1.poly.edu.
poly.edu. 172800 IN NS ns2.poly.edu.
9DHS4EP5G85PF9NUFK06HEK0O48QGK77.edu. 86400 IN NSEC3 1 1 0 - 9DQL407IJB4JHADMF7CJMUFNQMR4DF7M NS SOA RRSIG DNSKEY NSEC3PARAM
9DHS4EP5G85PF9NUFK06HEK0O48QGK77.edu. 86400 IN RRSIG NSEC3 8 2 86400 20221020062758 20221013051758 18290 edu. LwzY23sau/ano67Cw5nOcaKY0H7b6LcTb0mdBJnAhZEJ9ywUJoKNtsjk lOSu6XRpheobElDX9M0Oz5YGvXLmnyOR83oAPACJFGFxttR3f+sACxXj OkVNptQgJF0TBF+4O26a+bX0pA354hYvnw/u+xOLv2hjRXjBZ9zaFXwE 7ztFKP09ndUtDUITJUnSfxYxXGxomY+7UVyXohDX6eI7tw==
JGCR51VNUV920A68TOFV2PDTHLDARKQ2.edu. 86400 IN NSEC3 1 1 0 - JIK2N7K7PE541135EMJS71QQL384FDGV NS DS RRSIG
JGCR51VNUV920A68TOFV2PDTHLDARKQ2.edu. 86400 IN RRSIG NSEC3 8 2 86400 20221022063427 20221015052427 18290 edu. Ynqc+LyXMy/0l3/JANMOp0jk9CYNItwTPLPJrAtkmkramgOM2B4F5Han 7VYnj7BXJjKLjmigu8+IyfiO7s0bwIqjhfhM1PJcFFRBWkXSzxMbu/sw P3ZCvHcmd5Qhov+c0a2qegwy4gPCemgh8bDUgPLDKUehqDqE6JsFHMmx yhUfTOTmSNpEHZYoYPEFVeYIeWiVUyVHWi2z6ZE2KBJmOw==
;; Received 736 bytes from 192.12.94.30#53(e.edu-servers.net) in 267 ms
cis.poly.edu. 300 IN A 128.238.64.106
poly.edu. 300 IN NS poly-addr-vm-01.poly.edu.
poly.edu. 300 IN NS poly-ad-vm-02.poly.edu.
poly.edu. 300 IN NS dns-vm-01.poly.edu.
poly.edu. 300 IN NS photon.poly.edu.
poly.edu. 300 IN NS ns1.poly.edu.
poly.edu. 300 IN NS ns2.poly.edu.
poly.edu. 300 IN NS poly-ad-vm-01.poly.edu.
poly.edu. 300 IN NS gatekeeper.poly.edu.
poly.edu. 300 IN NS wins-vm-01.poly.edu.
;; Received 418 bytes from 128.238.2.38#53(ns1.poly.edu) in 193 ms


Unexpected memory allocation in basic for loop?

I am wondering why @btime reports one memory allocation per element in basic loops like these:
julia> using BenchmarkTools
julia> v=[1:15;]
15-element Array{Int64,1}:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
julia> @btime for vi in v end
1.420 μs (15 allocations: 480 bytes)
julia> @btime for i in eachindex(v) v[i]=-i end
2.355 μs (15 allocations: 464 bytes)
I do not know how to interpret this result:
is it a bug/artifact of @btime?
is there really one alloc per element? (this would ruin performance...)
julia> versioninfo()
Julia Version 1.5.1
Commit 697e782ab8 (2020-08-25 20:08 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, haswell)
You're benchmarking access to the global variable v; avoiding non-constant globals is the very first performance tip you should be aware of.
With BenchmarkTools you can work around that by interpolating v:
julia> @btime for vi in v end
555.962 ns (15 allocations: 480 bytes)
julia> @btime for vi in $v end
1.630 ns (0 allocations: 0 bytes)
But note that in general it's better to put your code in functions. The global scope is just bad for performance:
julia> f(v) = for vi in v end
f (generic function with 1 method)
julia> @btime f(v)
11.410 ns (0 allocations: 0 bytes)
julia> @btime f($v)
1.413 ns (0 allocations: 0 bytes)

Any better equivalent for pandas value_counts in julia dataframes?

I was searching for an equivalent of pandas' very convenient value_counts for a series in a Julia dataframe.
Unfortunately I could not find anything, so my solution for a value_counts in a Julia dataframe is as follows. However, I don't like my solution very much, as it is not as convenient as pandas' .value_counts() method. So my question: is there another (more convenient) option than this?
jdf = DataFrame(rand(Int8, (1000000, 3)))
which gives me:
│ Row │ x1 │ x2 │ x3 │
│ │ Int8 │ Int8 │ Int8 │
├─────────┼──────┼──────┼──────┤
│ 1 │ -97 │ 98 │ 79 │
│ 2 │ -77 │ -118 │ -19 │
⋮
│ 999998 │ -115 │ 17 │ 107 │
│ 999999 │ -43 │ -64 │ 72 │
│ 1000000 │ 40 │ -11 │ 31 │
Value count for the first column would be:
combine(nrow,groupby(jdf,:x1))
which returns:
│ Row │ x1 │ nrow │
│ │ Int8 │ Int64 │
├─────┼──────┼───────┤
│ 1 │ -97 │ 3942 │
│ 2 │ -77 │ 3986 │
⋮
│ 254 │ 12 │ 3899 │
│ 255 │ -92 │ 3973 │
│ 256 │ -49 │ 3952 │
In DataFrames.jl this is the way to get the result you want. In general the approach in DataFrames.jl is to have a minimal API. If you use combine(nrow,groupby(jdf,:x1)) often then you can just define:
value_counts(df, col) = combine(groupby(df, col), nrow)
in your script.
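Then the call reads almost like the pandas method:
julia> value_counts(jdf, :x1)
This returns the same 256×2 DataFrame as the combine call above.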
Alternative ways to achieve what you want are using FreqTables.jl or StatsBase.jl:
julia> freqtable(jdf, :x1)
256-element Named Array{Int64,1}
x1 │
─────┼─────
-128 │ 3875
-127 │ 3931
-126 │ 3924
⋮ ⋮
125 │ 3873
126 │ 3917
127 │ 3975
julia> countmap(jdf.x1)
Dict{Int8,Int64} with 256 entries:
-98 => 3925
-74 => 4054
11 => 3798
-56 => 3853
29 => 3765
-105 => 3918
⋮ => ⋮
(the difference between them is just the output type)
In terms of performance countmap is fastest, and combine is slowest:
julia> using BenchmarkTools
julia> @benchmark countmap($jdf.x1)
BenchmarkTools.Trial:
memory estimate: 16.80 KiB
allocs estimate: 14
--------------
minimum time: 436.000 μs (0.00% GC)
median time: 443.200 μs (0.00% GC)
mean time: 455.244 μs (0.22% GC)
maximum time: 5.362 ms (91.59% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark freqtable($jdf, :x1)
BenchmarkTools.Trial:
memory estimate: 37.22 KiB
allocs estimate: 86
--------------
minimum time: 7.972 ms (0.00% GC)
median time: 8.089 ms (0.00% GC)
mean time: 8.158 ms (0.00% GC)
maximum time: 10.016 ms (0.00% GC)
--------------
samples: 613
evals/sample: 1
julia> @benchmark combine(groupby($jdf,:x1), nrow)
BenchmarkTools.Trial:
memory estimate: 23.28 MiB
allocs estimate: 183
--------------
minimum time: 12.679 ms (0.00% GC)
median time: 14.572 ms (8.68% GC)
mean time: 15.239 ms (14.50% GC)
maximum time: 20.385 ms (21.83% GC)
--------------
samples: 328
evals/sample: 1
Note though that in combine most of the cost is grouping, so if you have the GroupedDataFrame object created already then combine is relatively fast:
julia> gdf = groupby(jdf,:x1);
julia> @benchmark combine($gdf, nrow)
BenchmarkTools.Trial:
memory estimate: 16.16 KiB
allocs estimate: 152
--------------
minimum time: 680.801 μs (0.00% GC)
median time: 714.800 μs (0.00% GC)
mean time: 737.568 μs (0.15% GC)
maximum time: 4.561 ms (83.47% GC)
--------------
samples: 6766
evals/sample: 1
EDIT
If you want a sorted dict then load DataStructures.jl and then do:
sort!(OrderedDict(countmap(jdf.x1)))
or
sort!(OrderedDict(countmap(jdf.x1)), byvalue=true)
depending on whether you want the dictionary sorted by key or by value.

ezANOVA inputs with Shiny

So I am having a little trouble using Shiny inputs inside function calls, particularly ezANOVA(). Here is the code I have so far:
ui.R:
library(shiny)
shinyUI(pageWithSidebar(
headerPanel('Analysis of Variance'),
sidebarPanel(
fileInput("file1", "CSV File", accept=c("text/csv", "text/comma-separated-values,text/plain", ".csv")),
checkboxInput("header", "Header", TRUE),
radioButtons('sep', 'Separator',c(Comma=',',Semicolon=';',Tab='\t'),','),
uiOutput('var')
),
mainPanel(
tableOutput('aovSummary')
)
)
)
server.R:
library(shiny)
library(ez)
shinyServer(function(input, output) {
csvfile <- reactive({
csvfile <- input$file1
if (is.null(csvfile)){return(NULL)}
dt <- read.csv(csvfile$datapath, header=input$header, sep=input$sep)
dt
})
output$var <- renderUI({
if(is.null(input$file1$datapath)){return()}
else{
return(list(radioButtons("estimate", "Please Pick The Dependent Variable", choices = names(csvfile())),
radioButtons("between1", "Please Pick The Between Subjects Factor", choices = names(csvfile())),
radioButtons("within1", "Please Pick The Within Subjects Factor", choices = names(csvfile())),
radioButtons("sid", "Please Pick The Subject Id Variable", choices = names(csvfile())),
actionButton("submit", "Submit")))
}
})
output$aovSummary = renderTable({
if(is.null(input$file1$datapath)){return()}
if(input$submit > 0){
aov.out <- ezANOVA(data = csvfile(), dv = .(input$estimate), wid = .(input$sid), between = .(input$between1),
within = .(input$within1), detailed = TRUE, type = "III")
return(aov.out)
}
})
})
Here is the data I have been testing it with:
Animal Visit Dose Estimate
2556 0 3 1.813206946
2557 0 3 1.933397744
2558 0 3 1.689893603
2559 0 3 1.780301984
2560 0 3 1.654374476
2566 0 10 3.401283412
2567 0 10 3.015958525
2568 0 10 2.808705611
2569 0 10 3.185718418
2570 0 10 2.767128836
2576 0 30 3.941412617
2577 0 30 3.793328436
2578 0 30 4.240736154
2579 0 30 3.859611218
2580 0 30 4.049743097
2586 0 100 5.600261483
2587 0 100 5.588115651
2588 0 100 5.089081008
2589 0 100 5.108262681
2590 0 100 5.343876403
2556 27 3 1.453587471
2557 27 3 1.994413484
2558 27 3 1.638132168
2559 27 3 2.138289747
2560 27 3 1.799769874
2566 27 10 3.302851871
2567 27 10 3.014199997
2568 27 10 3.190990162
2569 27 10 3.577924375
2570 27 10 3.537461068
2576 27 30 4.470837132
2577 27 30 4.081833308
2578 27 30 4.497192825
2579 27 30 4.205494309
2580 27 30 4.234496088
2586 27 100 6.054284369
2587 27 100 5.436697078
2588 27 100 5.398721492
2589 27 100 4.990794986
2590 27 100 5.573305744
2551 0 3 1.838550166
2552 0 3 1.847992942
2553 0 3 1.349892703
2554 0 3 1.725937126
2555 0 3 1.534652719
2561 0 10 2.931535704
2562 0 10 2.947599556
2563 0 10 3.092658629
2564 0 10 2.837625632
2565 0 10 2.970227467
2571 0 30 4.00746885
2572 0 30 3.921844968
2573 0 30 3.575724773
2574 0 30 4.17137839
2575 0 30 4.25251528
2581 0 100 4.785295667
2582 0 100 5.610955803
2583 0 100 5.497109771
2584 0 100 5.262724458
2585 0 100 5.430003698
2551 27 3 1.9326519
2552 27 3 2.313193186
2553 27 3 1.815261865
2554 27 3 1.345218914
2555 27 3 1.339432001
2561 27 10 3.305894401
2562 27 10 3.192621055
2563 27 10 3.76947789
2564 27 10 3.127887366
2565 27 10 3.231750087
2571 27 30 4.306556353
2572 27 30 4.232038905
2573 27 30 4.042378186
2574 27 30 4.784843929
2575 27 30 4.723665015
2581 27 100 5.601181262
2582 27 100 5.828647795
2583 27 100 5.652171222
2584 27 100 5.326512658
2585 27 100 6.009774247
The error I receive in the browser is:
"input$estimate" is not a variable in the data frame provided.
So the function ezANOVA() is not using the actual variable name but rather the literal string "input$estimate", which is not what I want it to do.
How would I go about fixing this problem, or is it hopeless?
Thanks in advance for all your help!
You need to dynamically construct the call to ezANOVA(), i.e. use the values of the strings in your input variables to build the function call. Due to its LISP heritage, this is relatively easy in R via eval (relatively easy because strings are still painful in R, and you need some string manipulation to make this work). Here's a minimal working version of your code.
server.R
library(shiny)
library(ez)
shinyServer(function(input, output) {
csvfile <- reactive({
csvfile <- input$file1
if (is.null(csvfile)){return(NULL)}
dt <- read.csv(csvfile$datapath, header=input$header, sep=input$sep)
dt
})
output$var <- renderUI({
if(!is.null(input$file1$datapath)){
d <- csvfile()
anova.opts <- list(
radioButtons("estimate", "Please Pick The Dependent Variable", choices = names(d)),
radioButtons("between1", "Please Pick The Between Subjects Factor", choices = names(d)),
radioButtons("within1", "Please Pick The Within Subjects Factor", choices = names(d)),
radioButtons("sid", "Please Pick The Subject Id Variable", choices = names(d)),
actionButton("submit", "Submit")
)
anova.opts
}
})
output$aovSummary = renderTable({
if(!is.null(input$submit)){
aov.out <- eval(parse(text=paste(
"ezANOVA(data = csvfile()
, dv = .(", input$estimate, ")
, wid = .(", input$sid, ")
, between = .(", input$between1, ")
, within = .(", input$within1, ")
, detailed = TRUE, type = \"III\")")))
aov.out$ANOVA
}
})
})
ui.R
library(shiny)
shinyUI(pageWithSidebar(
headerPanel('Analysis of Variance'),
sidebarPanel(
fileInput("file1", "CSV File", accept=c("text/csv", "text/comma-separated-values,text/plain", ".csv")),
checkboxInput("header", "Header", TRUE),
radioButtons('sep', 'Separator',c(Comma=',',Semicolon=';',Tab='\t', `White Space`=''),''),
uiOutput('var')
),
mainPanel(
tableOutput('aovSummary')
)
)
)
I've changed/fixed a number of smaller issues, but the two most significant changes not related to eval() were:
Including an option for letting R do its usual thing with white-space as a field separator.
Changed the render function to include the actual ANOVA table. ezANOVA returns a list whose first entry is always ANOVA and contains the ANOVA table. However, there are sometimes further entries for assumption tests and post-hoc corrections, e.g. Mauchly's Test for Sphericity and the Huynh-Feldt correction. You really need to add logic to deal with these when they're present; a rough sketch follows this list.
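A console-level sketch of that logic (assuming aov.out as computed above; the component names are the ones the ez package uses, so check them against your version):
# ezANOVA's tables ("ANOVA" plus, when present, "Mauchly's Test for
# Sphericity", "Sphericity Corrections", ...) are plain data.frames,
# so keep just those and print each one
tables <- Filter(is.data.frame, aov.out)
for (nm in names(tables)) {
  cat("==", nm, "==\n")
  print(tables[[nm]])
}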
Code style is also an issue -- it's better to get rid of empty if blocks followed by a full else, and instead just test for the condition where you actually have code to run. Letting R "fall off" the end of the function simulates a non-existent return value.
I'm assuming UI improvements were waiting for a working example, but you need to consider:
meaningful defaults for the different arguments, perhaps based on variable type, and/or reacting only to an action button rather than to the radio buttons; otherwise you get confusing errors from ezANOVA while you're still setting the values.
what happens if you have pure between or pure within designs?
You might also want to take a look at conditionalPanel() for hiding further options until an initial option (data file) is set in a meaningful way.
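For example, a minimal sketch of the usual pattern (the fileUploaded name is invented here; the other IDs are from the code above):
# server.R: expose a flag the browser can test
output$fileUploaded <- reactive(!is.null(input$file1))
outputOptions(output, "fileUploaded", suspendWhenHidden = FALSE)
# ui.R: show the variable pickers only once a file is chosen
conditionalPanel(condition = "output.fileUploaded", uiOutput('var'))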

Why is the round-trip time different between the two test hosts?

I have written an HTTP PUT client (using the libcurl library) to upload a file to an Apache WebDAV server. I used tcpdump to capture packets on the server side, then tcptrace (www.tcptrace.org) to analyze the dump file. Below is the result:
Host a is the client side, Host b is the server side:
a->b: b->a:
total packets: 152120 total packets: 151974
ack pkts sent: 152120 ack pkts sent: 151974
pure acks sent: 120 pure acks sent: 151854
sack pkts sent: 0 sack pkts sent: 0
dsack pkts sent: 0 dsack pkts sent: 0
max sack blks/ack: 0 max sack blks/ack: 0
unique bytes sent: 3532149672 unique bytes sent: 30420
actual data pkts: 152000 actual data pkts: 120
actual data bytes: 3532149672 actual data bytes: 30420
rexmt data pkts: 0 rexmt data pkts: 0
rexmt data bytes: 0 rexmt data bytes: 0
zwnd probe pkts: 0 zwnd probe pkts: 0
zwnd probe bytes: 0 zwnd probe bytes: 0
outoforder pkts: 0 outoforder pkts: 0
pushed data pkts: 3341 pushed data pkts: 120
SYN/FIN pkts sent: 0/0 SYN/FIN pkts sent: 0/0
req 1323 ws/ts: N/Y req 1323 ws/ts: N/Y
urgent data pkts: 0 pkts urgent data pkts: 0 pkts
urgent data bytes: 0 bytes urgent data bytes: 0 bytes
mss requested: 0 bytes mss requested: 0 bytes
max segm size: 31856 bytes max segm size: 482 bytes
min segm size: 216 bytes min segm size: 25 bytes
avg segm size: 23237 bytes avg segm size: 253 bytes
max win adv: 125 bytes max win adv: 5402 bytes
min win adv: 125 bytes min win adv: 5402 bytes
zero win adv: 0 times zero win adv: 0 times
avg win adv: 125 bytes avg win adv: 5402 bytes
initial window: 15928 bytes initial window: 0 bytes
initial window: 1 pkts initial window: 0 pkts
ttl stream length: NA ttl stream length: NA
missed data: NA missed data: NA
truncated data: 0 bytes truncated data: 0 bytes
truncated packets: 0 pkts truncated packets: 0 pkts
data xmit time: 151.297 secs data xmit time: 150.696 secs
idletime max: 44571.3 ms idletime max: 44571.3 ms
throughput: 23345867 Bps throughput: 201 Bps
RTT samples: 151915 RTT samples: 120
RTT min: 0.0 ms RTT min: 0.1 ms
RTT max: 0.3 ms RTT max: 40.1 ms
RTT avg: 0.0 ms RTT avg: 19.9 ms
RTT stdev: 0.0 ms RTT stdev: 19.8 ms
RTT from 3WHS: 0.0 ms RTT from 3WHS: 0.0 ms
RTT full_sz smpls: 74427 RTT full_sz smpls: 60
RTT full_sz min: 0.0 ms RTT full_sz min: 39.1 ms
RTT full_sz max: 0.3 ms RTT full_sz max: 40.1 ms
RTT full_sz avg: 0.0 ms RTT full_sz avg: 39.6 ms
RTT full_sz stdev: 0.0 ms RTT full_sz stdev: 0.3 ms
post-loss acks: 0 post-loss acks: 0
segs cum acked: 89 segs cum acked: 0
duplicate acks: 0 duplicate acks: 0
triple dupacks: 0 triple dupacks: 0
max # retrans: 0 max # retrans: 0
min retr time: 0.0 ms min retr time: 0.0 ms
max retr time: 0.0 ms max retr time: 0.0 ms
avg retr time: 0.0 ms avg retr time: 0.0 ms
sdv retr time: 0.0 ms sdv retr time: 0.0 ms
According to the results above, the RTT from client to server is small, but from server to client it is large. Can anyone explain this for me?
Because of this:
unique bytes sent: 3532149672 unique bytes sent: 30420
actual data pkts: 152000 actual data pkts: 120
actual data bytes: 3532149672 actual data bytes: 30420
a->b is sending a steady flow of data, which ensures buffers get filled and things get pushed.
b->a is only sending a few acks etc, doing next to nothing at all, so as a result things get left in buffers for a while (a few ms).
In addition to that, RTT is round-trip time: the time from when a packet is queued for sending to when the corresponding response is received. Since host a is busy pushing data, and probably filling its own buffers, there's a small amount of additional overhead before anything from b gets acknowledged.
Firstly host b sent very little data (a very small sample size). Secondly, I suspect that host a has an asymmetrical Internet connection (e.g. 10MB/1MB).

Converting unknown binary data into series of numbers? (with a known example)

I'm trying to find a way to convert files in a little-used archaic file format into something human readable...
As an example, od -x myfile gives:
0000000 2800 4620 1000 461e c800 461d a000 461e
0000020 8000 461e 2800 461e 5000 461f b800 461e
0000040 b800 461d 4000 461c a000 461e 3800 4620
0000060 f800 4621 7800 462a e000 4622 2800 463c
0000100 2000 464a 1000 4654 8c00 4693 5000 4661
0000120 7000 46ac 6c00 46d1 a400 4695 3c00 470a
0000140 b000 46ca 7400 46e9 c200 471b 9400 469e
0000160 9c00 4709 cc00 4719 4000 46b0 6400 46cc
...
which I know corresponds to these integers:
10250 10116 10098 10152 10144 10122 10196 10158
10094 10000 10152 10254 10366 10910 10424 12042
12936 13572 18886 14420 22072 ...
but I have no idea how to convert one to the other!!
Many many thanks to anyone who can help.
If possible, general tips for what to try/where to begin in this situation would also be appreciated.
Update: I put the full binary file online here http://pastebin.com/YL2ApExG and the numbers it corresponds to here http://pastebin.com/gXNntsaJ
In the hex dump, the values seem to alternate: a four-digit word, presumably corresponding to one of the numbers I want, separated by words beginning 46 or 47. Unfortunately, I don't know where to go from here!
Someone else asked below: the binary file is a .dat file generated by an old spectroscopy program... it's 1336 bytes and corresponds to 334 integers, so it's four bytes per integer.
Well this is what you can do -
Step I: Do the od -x of the file and redirect it to a temp file (eg. hexdump.txt)
od -x myfile > hexdump.txt
Step II: You will now have a text file that contains hexadecimal values which you can view using the cat command. Something like this -
[jaypal~/Temp]$ cat hexdump.txt
0000000 2800 4620 1000 461e c800 461d a000 461e
0000020 8000 461e 2800 461e 5000 461f b800 461e
0000040 b800 461d 4000 461c a000 461e 3800 4620
0000060 f800 4621 7800 462a e000 4622 2800 463c
0000100 2000 464a 1000 4654 8c00 4693 5000 4661
0000120 7000 46ac 6c00 46d1 a400 4695 3c00 470a
0000140 b000 46ca 7400 46e9 c200 471b 9400 469e
0000160 9c00 4709 cc00 4719 4000 46b0 6400 46cc
Step III: The first column isn't really important to you; columns 2 through 9 are the important ones. We will now strip the file using AWK so that you can convert the values to decimal: we add a space so that each value can be treated as an individual field, and we prefix "0x" so that each can be passed as a hexadecimal value.
[jaypal~/Temp]$ awk '{for (i=2;i<=NF;i++) printf "0x"$i" "}' hexdump.txt > hexdump1.txt
[jaypal~/Temp]$ cat hexdump1.txt
0x2800 0x4620 0x1000 0x461e 0xc800 0x461d 0xa000 0x461e 0x8000 0x461e 0x2800 0x461e 0x5000 0x461f 0xb800 0x461e 0xb800 0x461d 0x4000 0x461c 0xa000 0x461e 0x3800 0x4620 0xf800 0x4621 0x7800 0x462a 0xe000 0x4622 0x2800 0x463c 0x2000 0x464a 0x1000 0x4654 0x8c00 0x4693 0x5000 0x4661 0x7000 0x46ac 0x6c00 0x46d1 0xa400 0x4695 0x3c00 0x470a 0xb000 0x46ca 0x7400 0x46e9 0xc200 0x471b 0x9400 0x469e 0x9c00 0x4709 0xcc00 0x4719 0x4000 0x46b0 0x6400 0x46cc
Step IV: Now we will convert each hexadecimal value into decimal using printf function with AWK.
[jaypal~/Temp]$ gawk --non-decimal-data '{ for (i=1;i<=NF;i++) printf ("%05d ", $i)}' hexdump1.txt > hexdump2.txt
[jaypal~/Temp]$ cat hexdump2.txt
10240 17952 04096 17950 51200 17949 40960 17950 32768 17950 10240 17950 20480 17951 47104 17950 47104 17949 16384 17948 40960 17950 14336 17952 63488 17953 30720 17962 57344 17954 10240 17980 08192 17994 04096 18004 35840 18067 20480 18017 28672 18092 27648 18129 41984 18069 15360 18186 45056 18122 29696 18153 49664 18203 37888 18078 39936 18185 52224 18201 16384 18096 25600 18124
Step V: Formatting to make it easily readable
[jaypal~/Temp]$ sed 's/.\{48\}/&\n/g' < hexdump2.txt > hexdump3.txt
[jaypal~/Temp]$ cat hexdump3.txt
10240 17952 04096 17950 51200 17949 40960 17950
32768 17950 10240 17950 20480 17951 47104 17950
47104 17949 16384 17948 40960 17950 14336 17952
63488 17953 30720 17962 57344 17954 10240 17980
08192 17994 04096 18004 35840 18067 20480 18017
28672 18092 27648 18129 41984 18069 15360 18186
45056 18122 29696 18153 49664 18203 37888 18078
39936 18185 52224 18201 16384 18096 25600 18124
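One follow-up on the pattern spotted in the question: high words of 46xx/47xx are exactly what IEEE 754 single-precision floats in the 10000-20000 range look like when stored little-endian. Reading the first four word pairs as the 32-bit floats 0x46202800, 0x461E1000, 0x461DC800 and 0x461EA000 gives 10250, 10116, 10098 and 10152, which match the first four expected numbers. If the whole file follows that layout, od can decode it in one step (on a little-endian machine; the exact output format varies between od implementations):
# shell: interpret every 4 bytes as a native-endian 32-bit float
od -A d -t f4 myfile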
