How to follow a URL in R - r

Sorry for the bad title but I don't know how else to phrase "follow".
I'm looking to remotely download a csv file from a website. I could do this by clicking the download button using RSelenium, but I've found that there's a direct link that will initiate the download for me. I.e. I could go to https://www.fake-website-url.com and click the download button, or I could just enter https://www.fake-website-url.com/exportcsv into my browser and it would automatically download.
I try not to use RSelenium whenever I can help it since it's clunky, but I'm not sure how to just initiate the download. Nothing from rvest stands out since I'm not actually reading html.
Basically, I'm looking for an R function like gotoURL('https://www.website.com/exportfullcsv') that will download the file just like it would if I entered the URL into my browser.

Since you were able to find a direct URL, the issue is not that the download fails; it's that you aren't accessing the downloaded content correctly.
I uploaded a small zip file to a personal website and ran this code:
dl <- httr::GET("https://.../sessions_tracker.zip")
dl
# Response [https://.../sessions_tracker.zip]
# Date: 2020-04-08 20:59
# Status: 200
# Content-Type: application/zip
# Size: 19.2 kB
# <BINARY BODY>
length(httr::content(dl))
# [1] 19184
19184 / 1000
# [1] 19.184 ### confirmation of download, this rounds to 19.2kB
head(httr::content(dl), n=80)
# [1] 50 4b 03 04 14 00 00 00 08 00 60 7e 7b 50 1e c3 ed e8 32 4a 00 00 fa b7 01 00 14 00 1c 00
# [31] 73 65 73 73 69 6f 6e 73 5f 74 72 61 63 6b 65 72 2e 63 73 76 55 54 09 00 03 53 83 7e 5e 5e
# [61] 01 85 5e 75 78 0b 00 01 04 d3 c6 2d 00 04 64 00 00 00 b4 5d
writeBin(httr::content(dl), "sessions_tracker.zip")

Related

Accessing Files for a UICC Application

I have two different UICCs with USIM applications installed.
With the first card, I can access the USIM EFs without first selecting the application, provided I know the file ID of the ADF (which in this particular case is 7FF0).
00 A4 08 0C 04 7FF0 6F07 (select by path from MF)
I found the file ID by looking at the FCP template for the ADF.
SELECT EF-DIR
Command: 00 A4 00 0C 02 2F00
SW: 9000
READ RECORD
Command: 00 B2 00 02 00 (next record)
SW: 6C26
61 18
4F 10 A0000000871002FFFFFFFF8903050001 AID
50 04 5553494D USIM
SELECT AID
Command: 00 A4 04 0C 10 A0000000871002FFFFFFFF8903050001
SW: 9000
STATUS (after selecting AID)
Command: 80 F2 01 00 00
SW: 6C38
62 36
82 02 7821 File descriptor
83 02 7FF0 File identifier
84 10 A0000000871002FFFFFFFF8903050001 DF name (AID)
8A 01 05 Life cycle indicator
8B 03 2F0609 Security attributes
C6 0C PIN status template DO
90 01 60
83 01 01
83 01 81
83 01 0A
81 04 00002C48
However, when I try to do the same with the second card, I find that the FCP template does not include a file ID, so I need to first select the application and then select the EF.
00 A4 04 0C 0C A0000000871002FF49FF0589 (select by DF name)
00 A4 00 0C 02 6F07 (Select by file ID)
SELECT EF-DIR
Command: 00 A4 00 0C 02 2F00
SW: 9000
READ RECORD
Command: 00 B2 00 02 00 (next record)
SW: 6C26
61 14
4F 0C A0000000871002FF49FF0589 AID
50 04 5553494D USIM
SELECT AID
Command: 00 A4 04 0C 0C A0000000871002FF49FF0589
SW: 9000
STATUS (after selecting AID)
Command: 80 F2 01 00 00
SW: 6C2A
62 28
82 02 7821 File descriptor
84 0C A0000000871002FF49FF0589 DF name (AID)
8A 01 05 Life cycle indicator
8B 03 2F0601 Security attributes
C6 0C PIN status template DO
90 01 A0
83 01 81
83 01 01
83 01 0A
My questions are:
Why are the two USIM applications configured differently, where one allows access to the application's files directly by specifying a path from the MF, and the other does not; only allowing access relative to the ADF after first selecting the application?
Are there security benefits to not allowing direct access?
Does one method better facilitate access to files in a multi-application environment?
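The difference between the two STATUS responses can be checked mechanically. The following Python sketch parses the two FCP templates quoted above and reports whether tag 83 (file identifier) is present. The helper names parse_tlv and fcp_fields are made up for illustration, and the parser assumes one-byte tags and one-byte lengths, which holds for these two responses but not for BER-TLV in general.

```python
def parse_tlv(data):
    """Parse a flat run of TLVs with one-byte tags and one-byte lengths.

    Assumption: no multi-byte tags and no lengths >= 0x80, which is true
    for the two FCP templates quoted in the question.
    """
    items = []
    i = 0
    while i < len(data):
        if i + 2 > len(data):
            raise ValueError("truncated TLV header")
        tag, length = data[i], data[i + 1]
        if length >= 0x80:
            raise ValueError("long-form length not handled in this sketch")
        value = data[i + 2:i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        items.append((tag, value))
        i += 2 + length
    return items


def fcp_fields(response):
    """Return {tag: value} for the objects inside the outer 62 template."""
    (tag, body), = parse_tlv(response)
    if tag != 0x62:
        raise ValueError("not an FCP template")
    return dict(parse_tlv(body))


# STATUS response data of the first card (SW 6C38, 0x38 bytes).
CARD_1 = bytes.fromhex(
    "6236" "82027821" "83027FF0"
    "8410A0000000871002FFFFFFFF8903050001"
    "8A0105" "8B032F0609"
    "C60C" "900160" "830101" "830181" "83010A"
    "810400002C48"
)
# STATUS response data of the second card (SW 6C2A, 0x2A bytes).
CARD_2 = bytes.fromhex(
    "6228" "82027821"
    "840CA0000000871002FF49FF0589"
    "8A0105" "8B032F0601"
    "C60C" "9001A0" "830181" "830101" "83010A"
)

for name, fcp in (("card 1", CARD_1), ("card 2", CARD_2)):
    file_id = fcp_fields(fcp).get(0x83)
    print(name, "file ID:", file_id.hex().upper() if file_id else "absent")
```

Only the first template carries tag 83, which is the value used in the select-by-path command 00 A4 08 0C 04 7FF0 6F07. The sketch says nothing about why the second card omits it.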

Understanding how DNS queries work at a deeper level

It's currently 04:40 AM and I am stuck on something I simply do not understand. I am trying to look up a domain's nameservers directly using the DNS protocol. If I run host -t ns google.com 1.1.1.1 and monitor it with Wireshark, I can see the full DNS query. However, I cannot figure out why a particular ASCII character is used in one query but not in another. Here is an example:
0000 70 4d 7b 94 dd e0 00 d8 61 a9 c5 ec 08 00 45 00 pM{.....a.....E.
0010 00 38 d6 ff 00 00 80 11 9f 50 c0 a8 01 bb 01 01 .8.......P......
0020 01 01 e8 40 00 35 00 24 a0 19 9e f7 01 00 00 01 ...@.5.$........
0030 00 00 00 00 00 00 06 67 6f 6f 67 6c 65 03 63 6f .......google.co
0040 6d 00 00 02 00 01 m.....
In this DNS query, I am looking up the nameservers for google.com. The actual query starts at 06 67.
06 in ASCII is ACK/Acknowledgment.
Now, if we take a look at gmail.com instead:
0000 70 4d 7b 94 dd e0 00 d8 61 a9 c5 ec 08 00 45 00 pM{.....a.....E.
0010 00 37 d7 00 00 00 80 11 9f 50 c0 a8 01 bb 01 01 .7.......P......
0020 01 01 e8 58 00 35 00 23 8f cc 6f e2 01 00 00 01 ...X.5.#..o.....
0030 00 00 00 00 00 00 05 67 6d 61 69 6c 03 63 6f 6d .......gmail.com
0040 00 00 02 00 01 .....
the query starts at 05 67 instead.
05 is ENQ/Enquiry.
Why are they different? If I try to send 06 instead of 05 the DNS server gives me no response but Wireshark tells me:
Unknown extended label
I've seen 05, 06, and 09 so far. 09 is my biggest "wat" of all time, because it's a HT/Horizontal Tab.
Anyone with a lot of DNS knowledge who can help me here? I'm not looking for "just use dig/nslookup/host command". I'm currently trying to research a bit on the DNS protocol, and this is a thing I do not understand.
Good read where I got a lot of help: http://dev.lab427.net/dns-query-wth-netcat.html
For a binary protocol like this, you can't assume each byte corresponds to the matching ASCII character.
Take a look at section 4.1.2 of the DNS RFC (https://www.ietf.org/rfc/rfc1035.txt).
The domain name in a DNS request is broken up into "labels". For each label, the first byte is the length of the label, then the bytes for the string are written.
For your google.com example, the labels are "google" and "com". The 06 is the number of bytes in the first label, followed by the bytes for "google". Then the 03 is the number of bytes in the "com" label. After the "com" bytes, the 00 byte is the zero-length (null) label that marks the end of the name. Likewise, the 05 in the gmail.com query is simply the length of "gmail".
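The label format can be tried out with a short Python sketch. The function names encode_name and decode_name are made up for illustration, and the decoder deliberately ignores compression pointers, which do not occur in these two queries. The payload is the DNS part of the google.com capture above, starting right after the 8-byte UDP header.

```python
def encode_name(name):
    """Encode a domain name as length-prefixed labels (RFC 1035)."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError("label length out of range: %r" % label)
        out.append(len(raw))      # length byte, e.g. 0x06 for "google"
        out += raw
    out.append(0)                 # zero-length label ends the name
    return bytes(out)


def decode_name(data, offset=0):
    """Decode an uncompressed name; return (name, offset just past it)."""
    labels = []
    while True:
        length = data[offset]
        offset += 1
        if length == 0:
            break
        if length > 63:
            # Top two bits set to something other than 00: a compression
            # pointer or extended label type, not handled in this sketch.
            raise ValueError("not a plain label length: 0x%02x" % length)
        labels.append(data[offset:offset + length].decode("ascii"))
        offset += length
    return ".".join(labels), offset


# 12-byte DNS header, then the question section, from the google.com capture.
payload = bytes.fromhex(
    "9ef7 0100 0001 0000 0000 0000"
    "06 676f6f676c65 03 636f6d 00"
    "0002 0001"
)

name, end = decode_name(payload, 12)
qtype = int.from_bytes(payload[end:end + 2], "big")
qclass = int.from_bytes(payload[end + 2:end + 4], "big")
print(name, qtype, qclass)
print(encode_name("gmail.com").hex())
```

This also suggests why swapping 05 for 06 in the gmail.com query fails: a parser would take "gmail" plus the following 03 byte as a six-byte label and then read 0x63 (the letter c) as the next length byte. That value is above 63, so it is not a plain label length, which matches Wireshark's "Unknown extended label" message.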

Get jpg from UDP hex dump with wireshark

I am trying to take part in a CTF challenge. I have a pcap file of a jpg file transfer. I know that the jpg starts with FF D8 FF and ends with FF D9.
The problem is that I have no idea how to extract the file itself.
The file is in here:
00000226 67 0d 0a 0d 0a ff d8 ff e0 00 10 4a 46 49 46 00 g....... ...JFIF.
00000236 01 01 00 00 01 00 01 00 00 ff db 00 43 00 03 02 ........ ....C...
..
00015617 d2 51 95 15 f7 e1 c0 d8 e9 6d 58 c8 07 71 c7 40 .Q...... .mX..q.@
00015627 3a 79 53 19 33 54 00 05 b4 92 07 33 5e af 54 2d :yS.3T.. ...3^.T-
00015637 1f ff d9 ...
As you can see it's mixed with 67 0d 0a 0d and the other information. I tried to copy the relevant parts and cut out the offset and ascii (?) section left and right with python and then imported the hexdump to this site to create a jpg of the hex dump.
Unfortunately that didn't work. The resulting image is extremely distorted and I can't read anything on it.
Does anyone have any advice? Not a full solution, just a tip so I can wrap my head around it myself.
Thanks.
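The approach described above (dropping the offset and ASCII columns, then cutting from FF D8 FF to FF D9) can be sketched in Python as follows. The function names parse_hexdump and carve_jpeg are made up for illustration, and the column layout is assumed from the excerpt: an 8-digit hex offset, up to 16 hex bytes, then the ASCII rendering. The sample contains only the lines quoted in the question, so the carved bytes are not a viewable image.

```python
import re

OFFSET = re.compile(r"^[0-9a-fA-F]{8}$")
HEX_BYTE = re.compile(r"^[0-9a-fA-F]{2}$")


def parse_hexdump(text):
    """Collect the byte column of each dump line, ignoring offset and ASCII.

    Caveat: on a short final line, an ASCII column that happens to look
    like two hex digits would be misread as a byte.
    """
    out = bytearray()
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or not OFFSET.match(tokens[0]):
            continue                      # skip blank or elided lines
        for token in tokens[1:17]:        # at most 16 bytes per line
            if not HEX_BYTE.match(token):
                break                     # reached the ASCII column
            out.append(int(token, 16))
    return bytes(out)


def carve_jpeg(data):
    """Return the bytes from the first FF D8 FF to the last FF D9."""
    start = data.find(b"\xff\xd8\xff")
    end = data.rfind(b"\xff\xd9")
    if start == -1 or end == -1 or end < start:
        return None
    return data[start:end + 2]


SAMPLE = """\
00000226 67 0d 0a 0d 0a ff d8 ff e0 00 10 4a 46 49 46 00 g....... ...JFIF.
00000236 01 01 00 00 01 00 01 00 00 ff db 00 43 00 03 02 ........ ....C...
..
00015617 d2 51 95 15 f7 e1 c0 d8 e9 6d 58 c8 07 71 c7 40 .Q...... .mX..q.@
00015627 3a 79 53 19 33 54 00 05 b4 92 07 33 5e af 54 2d :yS.3T.. ...3^.T-
00015637 1f ff d9 ...
"""

image = carve_jpeg(parse_hexdump(SAMPLE))
print(len(image), image[:4].hex(), image[-2:].hex())
```

With the complete dump in place of SAMPLE, the result could be written out with open("out.jpg", "wb").write(image). Every line of the dump has to be present; any skipped line shifts or drops image data.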

Dota2 packet analysis unknown wiretype for proto message

I am trying to gain access to in-game chat information from dota2 packets. I know this used to be possible, since there were multiple projects that intercepted dota2 network traffic and translated chat text to print on an overlay over dota2. Right now I am using Wireshark with a protobuf addon installed. I can see a few packets here and there to Valve servers outside the USA, and the protobuf addon works on those, but I get an unknown wiretype error for 95% of the packets I believe to be related to dota. In almost all of these packets the UDP data payload starts with 56 53 30 31.
Here is an example hex dump from Wireshark. Are these 4 bytes some sort of header, after which the proto messages start?
0000 c8 a7 0a a4 63 ed 6c fd b9 4b 6e 16 08 00 45 00
0010 00 70 58 db 40 00 40 11 85 1a c0 a8 01 f5 d0 40
0020 c9 a9 9e 96 69 89 00 5c 72 7c **56 53 30 31** 30 00
0030 06 00 00 02 00 00 00 1d fe 11 11 10 00 00 d7 0a
0040 00 00 01 00 00 00 11 10 00 00 30 00 00 00 24 fd
0050 37 3c b4 30 a5 48 fa 3d ea 30 1a 1f d8 a9 41 e0
0060 e0 6c 44 ba bb 4e ba fc e7 ac ed f9 40 19 86 20
0070 84 71 52 5d b3 1f da 36 40 d9 b6 2e e1 e5
That is the ASCII code for "VS01", so yes, it might be some kind of version identifier.
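This can be checked against the dump with a few lines of Python. The sketch assumes an untagged Ethernet II frame carrying IPv4 and UDP, which is what the header bytes in the example show; the variable names are illustrative only.

```python
# The frame from the question, with the ** markers around 56 53 30 31 removed.
FRAME = bytes.fromhex(
    "c8 a7 0a a4 63 ed 6c fd b9 4b 6e 16 08 00 45 00 "
    "00 70 58 db 40 00 40 11 85 1a c0 a8 01 f5 d0 40 "
    "c9 a9 9e 96 69 89 00 5c 72 7c 56 53 30 31 30 00 "
    "06 00 00 02 00 00 00 1d fe 11 11 10 00 00 d7 0a "
    "00 00 01 00 00 00 11 10 00 00 30 00 00 00 24 fd "
    "37 3c b4 30 a5 48 fa 3d ea 30 1a 1f d8 a9 41 e0 "
    "e0 6c 44 ba bb 4e ba fc e7 ac ed f9 40 19 86 20 "
    "84 71 52 5d b3 1f da 36 40 d9 b6 2e e1 e5"
)

ETH_HEADER = 14                                   # Ethernet II, no VLAN tag
ip_header = (FRAME[ETH_HEADER] & 0x0F) * 4        # IHL field, in bytes
udp_start = ETH_HEADER + ip_header
udp_length = int.from_bytes(FRAME[udp_start + 4:udp_start + 6], "big")

# UDP length covers the 8-byte UDP header plus the payload.
payload = FRAME[udp_start + 8:udp_start + udp_length]
magic = payload[:4]

print(magic.decode("ascii"))                      # the four leading bytes as text
print(len(payload) - 4, "bytes follow the marker")
```

Whether the bytes after the marker are protobuf at all is not something this shows; it only confirms that the payload begins with the four ASCII characters.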

Creating a MIDI file and encoding event times: why do notes that should be spaced uniformly time-wise actually slow down?

I am trying to write a simple interface for creating MIDI files. As a test, I tried to create a file that plays a major scale, all notes of the same length. The file I get is as follows (indented for readability)
4d 54 68 64 00 00 00 06 00 01 00 02 00 08
4d 54 72 6b 00 00 00 0b
00 ff 51 03 00 27 0f
00 ff 2f 00
4d 54 72 6b 00 00 00 54
00 c0 00
00 90 40 7f
7d 80 40 7f
7d 90 42 7f
81 7a 80 42 7f
81 7a 90 44 7f
82 77 80 44 7f
82 77 90 45 7f
83 74 80 45 7f
83 74 90 47 7f
84 71 80 47 7f
84 71 90 49 7f
85 6e 80 49 7f
85 6e 90 4b 7f
86 6b 80 4b 7f
86 6b 90 4c 7f
87 68 80 4c 7f
00 ff 2f 00
Explanations: Line 1 is file header. Line 2 is a track header. (In my interface I reserve one track for percussion, and also to set the tempo. Since I have no percussion in this example it contains no notes.) Line 3 sets the tempo, line 4 ends the track. Line 5 is another track header. This track contains the melody. Line 6 sets the instrument for channel 0. Next come 8 alternating note-on and 8 note-off events for channel 0, and then track end. Times for starting and ending notes are:
00, 7d, 81 7a, 82 77, 83 74, 84 71, 85 6e, 86 6b, 87 68
As far as I understand, they should be uniformly spaced, because for event times MIDI uses a 7-bit-per-byte format where the length of the number is flexible and all bytes except the last one have their most significant bit set. So 00 should translate to 0, 7d should translate to 125, 81 7a should translate to 250, etc. But for some reason, when you play the file, it does not sound uniform time-wise, but rather slows down. Why is that? Have I misunderstood the correct way to encode event timing, and if so, what would be the correct way? Or is there some other issue with my file that causes the problem?
Timestamps in a MIDI file are delta-times -- you don't encode the time that an event occurs, you encode the time difference between sequential events on that track. If events are evenly spaced in time, their delta times should be the same.
From the standard:
The syntax of an MTrk event is very simple:
<MTrk event> = <delta-time><event>
<delta-time> is stored as a variable-length quantity. It represents the amount of time before the following event. If the first event in a track occurs at the very beginning of a track, or if two events occur simultaneously, a delta-time of zero is used. Delta-times are always present. (Not storing delta-times of 0 requires at least two bytes for any other value, and most delta-times aren't zero.) Delta-time is in some fraction of a beat (or a second, for recording a track with SMPTE times), as specified in the header chunk.
see e.g. http://www.music.mcgill.ca/~ich/classes/mumt306/StandardMIDIfileformat.html
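To make the difference concrete, here is a small Python sketch that decodes the time fields from the track in the question, shows where the events land when those values are treated as delta-times, and then re-encodes the intended absolute times as deltas. The function names are made up for illustration, and the "intended" schedule (each note-on at the same tick as the previous note-off) is an assumption read off the list of times in the question.

```python
from itertools import accumulate


def encode_vlq(value):
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    if value < 0:
        raise ValueError("delta-times cannot be negative")
    groups = [value & 0x7F]                    # last byte: high bit clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)   # earlier bytes: high bit set
        value >>= 7
    return bytes(reversed(groups))


def decode_vlq(data, offset=0):
    """Decode one variable-length quantity; return (value, next offset)."""
    value = 0
    while True:
        byte = data[offset]
        offset += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, offset


def to_deltas(absolute_ticks):
    """Convert ascending absolute event times to per-event delta-times."""
    previous = 0
    deltas = []
    for tick in absolute_ticks:
        deltas.append(tick - previous)
        previous = tick
    return deltas


# Time fields of the 16 note events, exactly as written in the question.
written = ["00", "7d", "7d", "817a", "817a", "8277", "8277", "8374", "8374",
           "8471", "8471", "856e", "856e", "866b", "866b", "8768"]
as_deltas = [decode_vlq(bytes.fromhex(h))[0] for h in written]

# Where the events actually land when a player adds the deltas up.
print(list(accumulate(as_deltas)))

# Assumed intent: on at 0, off/on at 125, 250, ..., 875, final off at 1000.
intended = [0] + [t for k in range(1, 8) for t in (125 * k, 125 * k)] + [1000]
print([encode_vlq(d).hex() for d in to_deltas(intended)])
```

The decoded values are exactly the intended absolute times, which is why each gap in the played file is longer than the one before it. Written as deltas, the note-off events would carry 7d and the note-on events 00.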
