R - subsetting by date

I'm trying to subset a large data frame by a date field and am facing strange behaviour:
1) find interesting time interval:
> ld[ld$bps>30000000,]
Date.first.seen Duration Proto Src.IP.Addr Src.Pt Dst.IP.Addr Dst.Pt Tos Packets Bytes bps
1400199 2015-03-31 13:52:24 0.008 TCP 3.3.3.3 3128 4.4.4.4 65115 0 39 32507 32500000
1711899 2015-03-31 14:58:10 0.004 TCP 3.3.3.3 3128 4.4.4.7 49357 0 29 23830 47700000
2) and try to look at what's happening in that second:
> ld[ld$Date.first.seen=="2015-03-31 13:52:24",]
Date.first.seen Duration Proto Src.IP.Addr Src.Pt Dst.IP.Addr Dst.Pt Tos Packets Bytes bps
1401732 2015-03-31 13:52:24 17.436 TCP 3.3.3.3 3128 6.6.6.6 51527 0 3 1608 737
I don't really understand the behavior; I should get far more results.
for example
> ld[1399074,]
Date.first.seen Duration Proto Src.IP.Addr Src.Pt Dst.IP.Addr Dst.Pt Tos Packets Bytes bps
1399074 2015-03-31 13:52:24 0.152 TCP 10.10.10.10 3128 11.11.11.11 62375 0 8 3910 205789
For the date I use POSIXlt:
> str(ld)
'data.frame': 2657583 obs. of 11 variables:
$ Date.first.seen: POSIXlt, format: "2015-03-31 06:00:00" "2015-03-31 06:00:00" "2015-03-31 06:00:00" "2015-03-31 06:00:01" ...
...
I would appreciate any assistance. Thanks!

POSIXlt may carry additional information that is suppressed when printing the entire data.frame: timezone, daylight saving flag, etc. Have a look at https://stat.ethz.ch/R-manual/R-devel/library/base/html/DateTimeClasses.html.
Printing only the POSIXlt variable (ld$Date.first.seen) generally shows at least some of this additional information.
If you are not required to keep the variable as POSIXlt for some particular reason, and you don't need the extra functionality the class provides, a simple:
ld$Date.first.seen <- as.character(ld$Date.first.seen)
added before your subset statement will probably solve your problem.
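The pitfall can be illustrated outside R as well. A minimal Python sketch (an analogy only, not the R mechanics): a timestamp can carry hidden detail, here sub-second precision, that is invisible when printed at second resolution but still breaks equality tests:

```python
from datetime import datetime

# One value carries hidden sub-second detail, the other does not.
a = datetime(2015, 3, 31, 13, 52, 24, microsecond=152000)
b = datetime(2015, 3, 31, 13, 52, 24)

# Both print identically at second resolution...
print(a.strftime("%Y-%m-%d %H:%M:%S"))  # 2015-03-31 13:52:24
print(b.strftime("%Y-%m-%d %H:%M:%S"))  # 2015-03-31 13:52:24

# ...but direct comparison sees the hidden microseconds.
print(a == b)                            # False

# Comparing formatted strings (the effect of converting the column
# with as.character() in R) discards the hidden detail.
print(a.strftime("%Y-%m-%d %H:%M:%S") ==
      b.strftime("%Y-%m-%d %H:%M:%S"))  # True
```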

Related

iperf3 - Meaning of Retr column in TCP measurement

Sorry to bring this topic up again, but after reading the tool's documentation and a ticket similar to my question (https://github.com/esnet/iperf/issues/343), I still don't really understand the meaning of the Retr column in a TCP measurement, or how to "use" it :-(
Let's say there is a result like the one below, where there are 5 retries. I understand these are the numbers of retransmitted TCP segments, but
were these retransmitted successfully, or were they merely retried without knowing the outcome?
If I would like to see some kind of summary percentage (%) at the end, can the tool print it, similar to the UDP measurement? If not, how can I get the total sent/received segments to compute the failure ratio?
Version of the tool:
batman#bat-image:~$ iperf3 -v
iperf 3.8.1 (cJSON 1.7.13)
Linux bat-image 4.15.0-106-generic #107-Ubuntu SMP Thu Jun 4 11:27:52 UTC 2020 x86_64
Optional features available: CPU affinity setting, IPv6 flow label, TCP congestion algorithm setting, sendfile / zerocopy, socket pacing
batman#bat-image:~$
OS:
Ubuntu-18.04
batman#bat-image:~$ uname -a
Linux bat-image 4.15.0-106-generic #107-Ubuntu SMP Thu Jun 4 11:27:52 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
batman#bat-image:~$
The log:
batman#bat-image:~$ iperf3 -c 192.168.122.1 -f K -B 192.168.122.141 -b 10m -t 10
Connecting to host 192.168.122.1, port 5201
[ 5] local 192.168.122.141 port 34665 connected to 192.168.122.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.25 MBytes 10.5 Mbits/sec 0 297 KBytes
[ 5] 1.00-2.00 sec 1.25 MBytes 10.5 Mbits/sec 0 297 KBytes
[ 5] 2.00-3.00 sec 1.12 MBytes 9.43 Mbits/sec 0 297 KBytes
[ 5] 3.00-4.00 sec 1.25 MBytes 10.5 Mbits/sec 0 297 KBytes
[ 5] 4.00-5.00 sec 1.12 MBytes 9.43 Mbits/sec 0 297 KBytes
[ 5] 5.00-6.00 sec 1.25 MBytes 10.5 Mbits/sec 0 297 KBytes
[ 5] 6.00-7.00 sec 1.12 MBytes 9.44 Mbits/sec 2 1.41 KBytes
[ 5] 7.00-8.00 sec 512 KBytes 4.19 Mbits/sec 1 1.41 KBytes
[ 5] 8.00-9.00 sec 0.00 Bytes 0.00 Mbits/sec 1 1.41 KBytes
[ 5] 9.00-10.00 sec 0.00 Bytes 0.00 Mbits/sec 1 1.41 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 8.87 MBytes 7.44 Mbits/sec 5 sender
[ 5] 0.00-16.91 sec 7.62 MBytes 3.78 Mbits/sec receiver
iperf Done.
thanks for your help,
/Robi
In iperf3 the Retr column stands for retransmitted TCP segments and indicates how many segments had to be sent again (retransmitted).
The lower the Retr value the better. The optimal value is 0, meaning that no matter how many segments were sent, not a single one had to be resent. A value greater than zero indicates packet loss, which can arise from network congestion (too much traffic) or from corruption due to faulty hardware.
Your original question on GitHub has also been answered: https://github.com/esnet/iperf/issues/343
You are asking about the different outputs of iperf3 based on whether you test UDP or TCP.
When using UDP it is acceptable for packets not to arrive at the destination. To indicate the quality of the connection/data transfer you get the percentage of packets that did not arrive.
When using TCP all packets are supposed to reach the destination and are checked for missing or corrupted ones (hence Transmission Control Protocol). If a packet is missing it gets retransmitted. To indicate the quality of the connection you get the number of packets that had to be retransmitted.
So both the percentage with UDP and the Retr count with TCP are quality indicators that are adjusted to the specifics of each protocol.
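If you want a percentage for TCP anyway, you can approximate one yourself. iperf3 does not print the total segment count, but you can estimate it from the bytes sent and the MSS (1460 is assumed here as a common Ethernet default; check your link). A rough Python sketch:

```python
def retr_percent(bytes_sent, retransmits, mss=1460):
    # Estimate the segment count from the byte total, since iperf3
    # reports bytes and retransmissions but not segments sent.
    segments = max(1, round(bytes_sent / mss))
    return 100.0 * retransmits / segments

# Values from the sender summary line above: 8.87 MBytes, 5 retransmissions.
print(f"{retr_percent(8.87 * 1024**2, 5):.3f} %")
```

With `iperf3 -J` the JSON output exposes the sent byte and retransmit totals, which you can feed into this programmatically.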
If you are wondering what the Cwnd column means, it stands for Congestion Window. The Congestion Window is a TCP state variable that limits the amount of data the TCP can send into the network before receiving an ACK.
Source: https://blog.stackpath.com/glossary-cwnd-and-rwnd/

Incorrect wav header generated by sox

I was using sox to convert a 2-channel, 48000 Hz, 24-bit wav file (new.wav) to a mono wav file (post.wav).
Here are the related commands and outputs:
[Farmer#Ubuntu recording]$ soxi new.wav
Input File : 'new.wav'
Channels : 2
Sample Rate : 48000
Precision : 24-bit
Duration : 00:00:01.52 = 72901 samples ~ 113.908 CDDA sectors
File Size : 447k
Bit Rate : 2.35M
Sample Encoding: 24-bit Signed Integer PCM
[Farmer#Ubuntu recording]$ sox new.wav -c 1 post.wav
[Farmer#Ubuntu recording]$ soxi post.wav
Input File : 'post.wav'
Channels : 1
Sample Rate : 48000
Precision : 24-bit
Duration : 00:00:01.52 = 72901 samples ~ 113.908 CDDA sectors
File Size : 219k
Bit Rate : 1.15M
Sample Encoding: 24-bit Signed Integer PCM
It looks fine. But let us check the header of post.wav:
[Farmer#Ubuntu recording]$ xxd post.wav | head -10
00000000: 5249 4646 9856 0300 5741 5645 666d 7420 RIFF.V..WAVEfmt
00000010: 2800 0000 feff 0100 80bb 0000 8032 0200 (............2..
00000020: 0300 1800 1600 1800 0400 0000 0100 0000 ................
00000030: 0000 1000 8000 00aa 0038 9b71 6661 6374 .........8.qfact
00000040: 0400 0000 c51c 0100 6461 7461 4f56 0300 ........dataOV..
This is the standard wav file header structure.
The first line is no problem.
The second line's "2800 0000" gives the size of the "fmt " sub-chunk; as this is little-endian it should be 0x00000028 = 40 bytes. But there are 54 bytes between the "fmt " sub-chunk and the "data" sub-chunk.
The third line shows "ExtraParamSize" as 0x0018 = 22 bytes, but it is actually 36 bytes (from the third line's "1600" to the fifth line's "0100"); the preceding 16 bytes are standard.
So what are the extra 36 bytes?
OK, I found out the answer.
Looking at the second line, we can see that the audio format is "feff"; the actual value is 0xFFFE, so this is not the standard PCM wave format but an extensible format (WAVE_FORMAT_EXTENSIBLE).
A detailed introduction to the wav header can be found at this link; the article is well written, thanks to the author.
Since this is a non-PCM wav format, it is no problem that the "fmt " chunk occupies 40 bytes, followed by a "fact" chunk and then the "data" chunk. Everything makes sense.
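To see this programmatically, here is a minimal Python sketch that unpacks the common 16-byte prefix of the "fmt " chunk from the xxd dump above (a toy parser for illustration, not a full chunk walker):

```python
import struct

def parse_fmt(chunk: bytes) -> dict:
    # The 16-byte prefix of a WAV "fmt " chunk, little-endian:
    # format tag, channels, sample rate, byte rate, block align, bits.
    (fmt_tag, channels, rate, byte_rate,
     block_align, bits) = struct.unpack("<HHIIHH", chunk[:16])
    return {"format_tag": hex(fmt_tag), "channels": channels,
            "sample_rate": rate, "bits": bits}

# Bytes copied from the dump: "feff 0100 80bb 0000 8032 0200 0300 1800"
print(parse_fmt(bytes.fromhex("feff010080bb00008032020003001800")))
# format_tag 0xfffe marks WAVE_FORMAT_EXTENSIBLE, not plain PCM.
```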

Output R dataframe to SAS format Issue

I have a dataset that looks like this:
df_dummy = data.frame(
Company=c("0001","0002","0003","0004","0005"),
Measure=c("A","B","C","D","E"),
Num=c(10,10,10,10,10),
Den=c(20,20,20,20,20),
Rate=c(50.0,50.0,50.0,50.0,50.0)
)
df_dummy$Company <- as.character(df_dummy$Company)
df_dummy$Measure <- as.character(df_dummy$Measure)
I am using this to export to an .xpt file
write.xport(df_dummy, file = "data/tmp.xpt")
lookup.xport("data/tmp.xpt")
In SAS, I use this code to import:
libname sasfile 'PATH\data';
libname xptfile xport 'PATH\data\tmp.xpt' access=readonly;
proc copy inlib=xptfile outlib=sasfile;
run;
The table looks fine, but the rate doesn't show the decimal point.
My actual dataset has a lot more rows but essentially the same format; if I run lookup.xport I get this:
Variables in data set `MEASURES':
dataset name type format flength fdigits iformat iflength ifdigits label nobs
MEASURES ID character 0 0 0 0 29064
MEASURES MEASURE character 0 0 0 0 29064
MEASURES NUM numeric 0 0 0 0 29064
MEASURES DEN numeric 0 0 0 0 29064
MEASURES RATE numeric 0 0 0 0 29064
However, if I use the same SAS code to import this, I get something that looks completely off and I can't figure out what's causing it.
I cannot replicate your issue using R (3.4.1) and SAS (9.4 TS1M4) on Mac OS X, both 64-bit versions. Mixing 32- and 64-bit versions can sometimes cause issues.
I used R Studio and SAS UE, both freely available for educational use.
Full R code:
install.packages("SASxport")
library("SASxport")
df_dummy = data.frame(
Company=c("0001","0002","0003","0004","0005"),
Measure=c("A","B","C","D","E"),
Num=c(10,10,10,10,10),
Den=c(20,20,20,20,20),
Rate=c(50.0,50.0,50.0,50.0,50.0)
)
df_dummy$Company <- as.character(df_dummy$Company)
df_dummy$Measure <- as.character(df_dummy$Measure)
write.xport(df_dummy, file = "tmp.xpt")
Full SAS Code:
libname sasfile '/folders/myfolders/';
libname xptfile xport '/folders/myfolders/tmp.xpt' access=readonly;
proc copy inlib=xptfile outlib=sasfile;
run;
Your example works, even with older versions of R. Make sure your transport file has not been corrupted by transferring it between machines. A transport file is binary data with fixed-length 80-byte records, although much of the data looks like ASCII text.
SAS transport files follow the SAS V5 rules for names. Make sure that your member name and variable names are valid SAS names no longer than 8 characters. Character variables cannot be longer than 200 characters.
You can quickly look at the file using a simple data step, especially for a small example like yours. If you see that the length is not an exact multiple of 80, or that the header records do not start at the beginning of an 80-byte record, then something has corrupted the file.
56 data _null_;
57 infile '/test/tmp.xpt' lrecl=80 recfm=f ;
58 input;
59 list;
60 run;
NOTE: The infile '/test/tmp.xpt' is:
Filename=/test/tmp.xpt,
Owner Name=xxxxx,Group Name=xxxxx,
Access Permission=-rw-r--r--,
Last Modified=29Sep2017:09:16:16,
File Size (bytes)=1680
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1 HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!000000000000000000000000000000
2 CHAR SAS SAS SASLIB 7.00 R 3.0.2. 29SEP17:09:16:16
ZONE 54522222545222225454442232332222523232302222222222222222222222223354533333333333
NUMR 3130000031300000313C92007E000000203E0E200000000000000000000000002935017A09A16A16
3 29SEP17:09:16:16
4 HEADER RECORD*******MEMBER HEADER RECORD!!!!!!!000000000000000001600000000140
5 HEADER RECORD*******DSCRPTR HEADER RECORD!!!!!!!000000000000000000000000000000
6 CHAR SAS DF_DUMMYSASDATA 7.00 R 3.0.2. 29SEP17:09:16:16
ZONE 54522222445454455454454232332222523232302222222222222222222222223354533333333333
NUMR 3130000046F45DD9313414107E000000203E0E200000000000000000000000002935017A09A16A16
7 29SEP17:09:16:16
8 HEADER RECORD*******NAMESTR HEADER RECORD!!!!!!!000000000500000000000000000000
9 CHAR ........COMPANY ........
ZONE 00000000444544522222222222222222222222222222222222222222222222220000000022222222
NUMR 020008013FD01E900000000000000000000000000000000000000000000000000000000000000000
10 CHAR ....................................................................MEASURE
ZONE 00000000000000000000000000000000000000000000000000000000000000000000444555422222
NUMR 00000000000000000000000000000000000000000000000000000000000002000802D51352500000
11 CHAR ........ ....................
ZONE 22222222222222222222222222222222222222222222000000002222222200000000000000000000
NUMR 00000000000000000000000000000000000000000000000000000000000000000008000000000000
12 CHAR ................................................NUM
ZONE 00000000000000000000000000000000000000000000000045422222222222222222222222222222
NUMR 000000000000000000000000000000000000000001000803E5D00000000000000000000000000000
13 CHAR ........ ........................................
ZONE 22222222222222222222222200000000222222220000000100000000000000000000000000000000
NUMR 00000000000000000000000000000000000000000000000000000000000000000000000000000000
14 CHAR ............................DEN
ZONE 00000000000000000000000000004442222222222222222222222222222222222222222222222222
NUMR 000000000000000000000100080445E0000000000000000000000000000000000000000000000000
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
15 CHAR ........ ............................................................
ZONE 22220000000022222222000000010000000000000000000000000000000000000000000000000000
NUMR 00000000000000000000000000080000000000000000000000000000000000000000000000000000
16 CHAR ........RATE ........
ZONE 00000000545422222222222222222222222222222222222222222222222222220000000022222222
NUMR 01000805214500000000000000000000000000000000000000000000000000000000000000000000
17 CHAR ....... ....................................................
ZONE 00000002000000000000000000000000000000000000000000000000000022222222222222222222
NUMR 00000000000000000000000000000000000000000000000000000000000000000000000000000000
18 HEADER RECORD*******OBS HEADER RECORD!!!!!!!000000000000000000000000000000
19 CHAR 0001 A A ......B.......B2......0002 B A ......B.......B2......
ZONE 33332222422222224A000000410000004300000033332222422222224A0000004100000043000000
NUMR 00010000100000001000000024000000220000000002000020000000100000002400000022000000
20 CHAR 0003 C A ......B.......B2......0004 D A ......B.......B2......
ZONE 33332222422222224A000000410000004300000033332222422222224A0000004100000043000000
NUMR 00030000300000001000000024000000220000000004000040000000100000002400000022000000
21 CHAR 0005 E A ......B.......B2......
ZONE 33332222422222224A00000041000000430000002222222222222222222222222222222222222222
NUMR 00050000500000001000000024000000220000000000000000000000000000000000000000000000
NOTE: 21 records were read from the infile '/test/tmp.xpt'.
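The same sanity checks (length a multiple of 80, library header record at the start) are easy to script outside SAS. A minimal Python sketch, assuming the whole file fits in memory:

```python
LIBRARY_HEADER = b"HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!"

def looks_like_xport(data: bytes) -> bool:
    # A SAS V5 transport file is a sequence of fixed 80-byte records
    # beginning with the library header record; anything else suggests
    # the file was corrupted in transfer.
    return len(data) % 80 == 0 and data.startswith(LIBRARY_HEADER)

# Example usage against a file on disk:
# with open("tmp.xpt", "rb") as f:
#     print(looks_like_xport(f.read()))
```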

How to convert a hex IP from Java?

I used cat /proc/pid/net/udp6 and got:
sl local_address remote_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode ref pointer drops
63: 00000000000000000000000000000000:D9BF 00000000000000000000000000000000:0000 07 00000000:00000000 00:00000000 00000000 1000 0 181584 2 c16e8d00 0
I know how it is structured, and 00000000000000000000000000000000:D9BF must be the local IP. How can I convert it to a normal IP format like 127.0.0.1?
import java.net.InetAddress;
import javax.xml.bind.DatatypeConverter;

InetAddress a = InetAddress.getByAddress(DatatypeConverter.parseHexBinary("0A064156"));
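For comparison, a Python sketch that handles both address families as they appear under /proc. One caveat (an assumption worth verifying on your machine): /proc stores each 32-bit word in host byte order, little-endian on x86, so the bytes usually need per-word reversal, which a naive hex-to-bytes decode does not do:

```python
import socket
import struct

def proc_hex_to_ipv4(h: str) -> str:
    # /proc/net/tcp, /proc/net/udp store IPv4 as one 32-bit word
    # in host byte order (little-endian on x86).
    return socket.inet_ntoa(struct.pack("<I", int(h, 16)))

def proc_hex_to_ipv6(h: str) -> str:
    # The IPv6 form is four 32-bit little-endian words back to back.
    words = (int(h[i:i + 8], 16) for i in range(0, 32, 8))
    return socket.inet_ntop(socket.AF_INET6, struct.pack("<4I", *words))

print(proc_hex_to_ipv4("0100007F"))                      # 127.0.0.1
print(proc_hex_to_ipv6("0" * 32))                        # ::
print(int("D9BF", 16))  # the port is plain hex: 55743
```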

IBrokers request Historical Futures Contract Data?

I tried to request historical futures data, but for a beginner the ibrokers.pdf document is not documented well enough.
Example: Gold Miny contract, Dec11, NYSELIFFE:
goldminy<-twsFuture("YG","NYSELIFFE","201112",multiplier="33.2")
reqHistoricalData(conn,
Contract = goldminy,
endDateTime = "",
barSize = "1 S",
duration = "1 D",
useRTH = "0",
whatToShow = "TRADES","BID", "ASK", "BID_ASK",
timeFormat = "1",
tzone = "",
verbose = TRUE,
tickerId = "1",
eventHistoricalData,
file)
I also don't know how to specify some of the parameters correctly:
whatToShow? I need Date, Time, BidSize, Bid, Ask, AskSize, Last, LastSize, Volume.
tickerId?
eventHistoricalData?
file?
I wrote the twsInstrument package (on RForge) to alleviate these sorts of headaches.
getContract will find the contract for you if you give it anything reasonable. Any of these formats should work:
"YG_Z1", "YG_Z11", "YGZ1", "YGZ11", "YGZ2011", "YGDEC2011", "YG_DEC2011", etc. (also you could use the conId, or give it an instrument object, or the name of an instrument object)
> library(twsInstrument)
> goldminy <- getContract("YG_Z1")
Connected with clientId 100.
Contract details request complete. Disconnected.
> goldminy
List of 16
$ conId : chr "42334455"
$ symbol : chr "YG"
$ sectype : chr "FUT"
$ exch : chr "NYSELIFFE"
$ primary : chr ""
$ expiry : chr "20111228"
$ strike : chr "0"
$ currency : chr "USD"
$ right : chr ""
$ local : chr "YG DEC 11"
$ multiplier : chr "33.2"
$ combo_legs_desc: chr ""
$ comboleg : chr ""
$ include_expired: chr "0"
$ secIdType : chr ""
$ secId : chr ""
I don't have a subscription to market data for NYSELIFFE, so I will use the Dec 2011 e-mini S&P future for the rest of this answer.
You could get historical data like this
tws <- twsConnect()
hist.data <- reqHistoricalData(tws, getContract("ES_Z1"))
This will give you back these columns, and it will all be 'TRADES' data
> colnames(hist.data)
[1] "ESZ1.Open" "ESZ1.High" "ESZ1.Low" "ESZ1.Close" "ESZ1.Volume"
[6] "ESZ1.WAP" "ESZ1.hasGaps" "ESZ1.Count"
whatToShow must be one of 'TRADES', 'BID', 'ASK', or 'BID_ASK'. If your request uses whatToShow='BID' then you will get the OHLC etc. of the BID prices. "BID_ASK" means that the Ask price will be used for the High and the Bid price will be used for the Low.
Since you said the vignette was too advanced, it bears repeating that Interactive Brokers limits historical data requests to 6 every 60 seconds. So you should pause for 10 seconds between requests (or, when downloading lots of data, I usually pause for 30 seconds after every 3 requests, so that if I have BID data for something I am also likely to have ASK data for it).
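That pacing rule is easy to wrap in a helper. A hypothetical Python sketch (paced_requests and fetch are made-up names, not part of any IB package; fetch stands in for the actual historical-data call):

```python
import time

def paced_requests(symbols, fetch, batch=3, pause=30):
    # Call fetch() for each symbol, sleeping after every `batch`
    # requests to stay under IB's 6-requests-per-60-seconds limit.
    results = {}
    for i, sym in enumerate(symbols, start=1):
        results[sym] = fetch(sym)
        if i % batch == 0 and i < len(symbols):
            time.sleep(pause)
    return results
```

For example, paced_requests(["ES_Z1", "YG_Z1"], fetch) would sleep 30 seconds after every third download.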
The function getBAT will download the BID, ASK and TRADES data, and merge together only the closing values of those into a single xts object that looks like this:
> getBAT("ES_Z1")
Connected with clientId 120.
waiting for TWS reply on ES ............. done.
Pausing 10 seconds between requests ...
waiting for TWS reply on ES .... done.
Pausing 10 seconds between requests ...
waiting for TWS reply on ES .... done.
Pausing 10 seconds between requests ...
Disconnecting ...
[1] "ES_Z1"
> tail(ES_Z1)
ES.Bid.Price ES.Ask.Price ES.Trade.Price ES.Mid.Price
2011-09-27 15:09:00 1170.25 1170.50 1170.50 1170.375
2011-09-27 15:10:00 1170.50 1170.75 1170.50 1170.625
2011-09-27 15:11:00 1171.25 1171.50 1171.25 1171.375
2011-09-27 15:12:00 1171.50 1171.75 1171.50 1171.625
2011-09-27 15:13:00 1171.25 1171.50 1171.25 1171.375
2011-09-27 15:14:00 1169.75 1170.00 1170.00 1169.875
ES.Volume
2011-09-27 15:09:00 6830
2011-09-27 15:10:00 4509
2011-09-27 15:11:00 4902
2011-09-27 15:12:00 6089
2011-09-27 15:13:00 6075
2011-09-27 15:14:00 14380
You asked for both LastSize and Volume. The "Volume" that getBAT returns is the total amount traded over the time of the bar. So, with 1 minute bars, it's the total volume that took place in that 1 minute.
Here's an answer that doesn't use twsInstrument:
I'm almost certain this will work, but as I said, I don't have the required market data subscription, so I can't test.
reqHistoricalData(tws, twsFuture("YG","NYSELIFFE","201112"))
Using the e-mini S&P again:
> mydata <- reqHistoricalData(tws, twsFuture("ES","GLOBEX","201112"), barSize='1 min', duration='5 D', useRTH='0', whatToShow='TRADES')
waiting for TWS reply on ES .... done.
> head(mydata)
ESZ1.Open ESZ1.High ESZ1.Low ESZ1.Close ESZ1.Volume ESZ1.WAP ESZ1.hasGaps ESZ1.Count
2011-09-21 15:30:00 1155.25 1156.25 1155.00 1155.75 3335 1155.50 0 607
2011-09-21 15:31:00 1155.75 1156.25 1155.50 1155.75 917 1155.95 0 164
2011-09-21 15:32:00 1155.75 1156.25 1155.50 1156.00 859 1155.90 0 168
2011-09-21 15:33:00 1156.00 1156.25 1155.50 1155.75 642 1155.83 0 134
2011-09-21 15:34:00 1155.50 1156.00 1155.25 1155.25 1768 1155.65 0 232
2011-09-21 15:35:00 1155.25 1155.75 1155.25 1155.25 479 1155.45 0 94
One of the problems with your attempt is that if you use a barSize of '1 S', your duration cannot be greater than '60 S'. See IB Historical Data Limitations.
