I have a lot (1400) of Outlook emails (.msg format) which I want to process further. R meets most of my text-mining needs, but for this I'm unable to find any solution.
I have used readMail from tm.plugin.mail, but haven't been successful:
newsgroup <- file.path("D:", "mails")
news <- VCorpus(DirSource(newsgroup), readerControl = list(reader = readMail))
inspect(news)
Any help/suggestions would be greatly appreciated. Thanks!
You can now use msgxtractr to do this:
devtools::install_github("hrbrmstr/msgxtractr")
library(msgxtractr)
print(str(read_msg(system.file("extdata/unicode.msg", package="msgxtractr"))))
## List of 7
## $ headers :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 18 variables:
## ..$ Return-path : chr "<brizhou#gmail.com>"
## ..$ Received :List of 1
## .. ..$ : chr [1:4] "from st11p00mm-smtpin007.mac.com ([17.172.84.240])\nby ms06561.mac.com (Oracle Communications Messaging Server "| __truncated__ "from mail-vc0-f182.google.com ([209.85.220.182])\nby st11p00mm-smtpin007.mac.com\n(Oracle Communications Messag"| __truncated__ "by mail-vc0-f182.google.com with SMTP id ie18so3484487vcb.13 for\n<brianzhou#me.com>; Mon, 18 Nov 2013 00:26:25 -0800 (PST)" "by 10.58.207.196 with HTTP; Mon, 18 Nov 2013 00:26:24 -0800 (PST)"
## ..$ Original-recipient : chr "rfc822;brianzhou#me.com"
## ..$ Received-SPF : chr "pass (st11p00mm-smtpin006.mac.com: domain of brizhou#gmail.com\ndesignates 209.85.220.182 as permitted sender)\"| __truncated__
## ..$ DKIM-Signature : chr "v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com;\ns=20120113; h=mime-version:date:message-id:subject:f"| __truncated__
## ..$ MIME-version : chr "1.0"
## ..$ X-Received : chr "by 10.221.47.193 with SMTP id ut1mr14470624vcb.8.1384763184960;\nMon, 18 Nov 2013 00:26:24 -0800 (PST)"
## ..$ Date : chr "Mon, 18 Nov 2013 10:26:24 +0200"
## ..$ Message-id : chr "<CADtJ4eNjQSkGcBtVteCiTF+YFG89+AcHxK3QZ=-Mt48xygkvdQ#mail.gmail.com>"
## ..$ Subject : chr "Test for TIF files"
## ..$ From : chr "Brian Zhou <brizhou#gmail.com>"
## ..$ To : chr "brianzhou#me.com"
## ..$ Cc : chr "Brian Zhou <brizhou#gmail.com>"
## ..$ Content-type : chr "multipart/mixed; boundary=001a113392ecbd7a5404eb6f4d6a"
## ..$ Authentication-results : chr "st11p00mm-smtpin007.mac.com; dkim=pass\nreason=\"2048-bit key\" header.d=gmail.com header.i=#gmail.com\nheader."| __truncated__
## ..$ x-icloud-spam-score : chr "33322\nf=gmail.com;e=gmail.com;pp=ham;spf=pass;dkim=pass;wl=absent;pwl=absent"
## ..$ X-Proofpoint-Virus-Version: chr "vendor=fsecure\nengine=2.50.10432:5.10.8794,1.0.14,0.0.0000\ndefinitions=2013-11-18_02:2013-11-18,2013-11-17,19"| __truncated__
## ..$ X-Proofpoint-Spam-Details : chr "rule=notspam policy=default score=0 spamscore=0\nsuspectscore=0 phishscore=0 bulkscore=0 adultscore=0 classifie"| __truncated__
## $ sender :List of 2
## ..$ sender_email: chr "brizhou#gmail.com"
## ..$ sender_name : chr "Brian Zhou"
## $ recipients :List of 2
## ..$ :List of 3
## .. ..$ display_name : NULL
## .. ..$ address_type : chr "SMTP"
## .. ..$ email_address: chr "brianzhou#me.com"
## ..$ :List of 3
## .. ..$ display_name : NULL
## .. ..$ address_type : chr "SMTP"
## .. ..$ email_address: chr "brizhou#gmail.com"
## $ subject : chr "Test for TIF files"
## $ body : chr "This is a test email to experiment with the MS Outlook MSG Extractor\r\n\r\n\r\n-- \r\n\r\n\r\nKind regards\r\n"| __truncated__
## $ attachments :List of 2
## ..$ :List of 4
## .. ..$ filename : chr "importOl.tif"
## .. ..$ long_filename: chr "import OleFileIO.tif"
## .. ..$ mime : chr "image/tiff"
## .. ..$ content : raw [1:969674] 49 49 2a 00 ...
## ..$ :List of 4
## .. ..$ filename : chr "raisedva.tif"
## .. ..$ long_filename: chr "raised value error.tif"
## .. ..$ mime : chr "image/tiff"
## .. ..$ content : raw [1:1033142] 49 49 2a 00 ...
## $ display_envelope:List of 2
## ..$ display_cc: chr "Brian Zhou"
## ..$ display_to: chr "brianzhou#me.com"
## NULL
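For the original question's 1400 files, the single-file call above can be extended over a whole folder. A minimal sketch, assuming the files live in D:/mails (the path from the question) and that read_msg returns the field layout shown in the str() output above; the get_chr helper and the folder/package guards are my additions:

```r
# Coerce a possibly-NULL (or oddly shaped) field to a single character value
get_chr <- function(x) if (is.null(x)) NA_character_ else as.character(unlist(x))[1]

# Hypothetical folder from the question; requires msgxtractr to be installed
if (requireNamespace("msgxtractr", quietly = TRUE) && dir.exists("D:/mails")) {
  msg_files <- list.files("D:/mails", pattern = "\\.msg$", full.names = TRUE)
  msgs <- lapply(msg_files, msgxtractr::read_msg)
  # One row per message, ready for tm/text-mining work
  corpus_df <- data.frame(
    file    = basename(msg_files),
    subject = vapply(msgs, function(m) get_chr(m$subject), character(1)),
    body    = vapply(msgs, function(m) get_chr(m$body), character(1)),
    stringsAsFactors = FALSE
  )
}
```

From there, corpus_df$body can be fed into a VCorpus via VectorSource rather than DirSource.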
The easiest way to do it would be to make use of the excellent Python msg extractor available on GitHub. If you feel like being creative, you can use the rPython package to call that code from R.
I'm trying to scrape https://www.yachtfocus.com/boten-te-koop.html#price=10000%7C30000&length=9.2%7C&super_cat_nl=Zeil using the rvest package and its read_html function. I do this with the following code:
library('rvest')
#scrape yachtfocus
url <- "https://www.yachtfocus.com/boten-te-koop.html#price=10000|30000&length=9.2|&super_cat_nl=Zeil"
webpage <- read_html(url)
#Use CSS selectors to scrape the results-count section
amount_results_html <- html_node(webpage,".res_number")
#create text
amount_results <- html_text(amount_results_html)
This does not return the value expected given the filters in the URL; instead it returns the "unfiltered" value. I get the same result when I use:
url <- "https://www.yachtfocus.com/boten-te-koop.html"
webpage <- read_html(url)
Can I "force" read_html to execute the filter parameters correctly?
The issue is that the site turns the anchor link into an asynchronous POST request, retrieves JSON and then dynamically builds the page.
You can use your browser's Developer Tools and reload the page to watch this XHR request happen.
If you right-click the request and choose "Copy as cURL", you can use the curlconverter package to automagically turn it into an httr function call:
httr::POST(
url = "https://www.yachtfocus.com/wp-content/themes/yachtfocus/search/",
body = list(
hash = "#price=10000%7C30000&length=9.2%7C&super_cat_nl=Zeil"
),
encode = "form"
) -> res
dat <- jsonlite::fromJSON(httr::content(res, "text"))
This is what you get (you still need to parse some HTML):
str(dat)
## List of 8
## $ content : chr " <!-- <div class=\"list_part\"> <span class=\"list_icon\">lijst</span> <span class=\"foto\"><"| __truncated__
## $ top : chr " <h3 class=\"res_number\">317 <em>boten\tgevonden</em></h3> <p class=\"filters_list red_border\"> <span>prijs: "| __truncated__
## $ facets :List of 5
## ..$ categories_nl :List of 15
## .. ..$ 6u3son : int 292
## .. ..$ 1v3znnf: int 28
## .. ..$ 10opzfl: int 27
## .. ..$ 1mrn15c: int 23
## .. ..$ qn3nip : int 3
## .. ..$ 112l5mh: int 2
## .. ..$ 1xjlw46: int 1
## .. ..$ ci62ni : int 1
## .. ..$ 1x1x806: int 0
## .. ..$ 1s9bgxg: int 0
## .. ..$ 1i7r9mm: int 0
## .. ..$ qlys89 : int 0
## .. ..$ 1wwlclv: int 0
## .. ..$ 84qiky : int 0
## .. ..$ 3ahnnr : int 0
## ..$ material_facet_nl:List of 11
## .. ..$ 911206 : int 212
## .. ..$ c9twlr : int 53
## .. ..$ 1g88z3 : int 23
## .. ..$ fwfz2d : int 14
## .. ..$ gvrlp6 : int 5
## .. ..$ 10i8nq1: int 4
## .. ..$ h98ynr : int 4
## .. ..$ 1qt48ef: int 1
## .. ..$ 1oxq1p2: int 1
## .. ..$ 1kc1p0j: int 0
## .. ..$ 10dkoie: int 0
## ..$ audience_facet_nl:List of 13
## .. ..$ 71agu9 : int 69
## .. ..$ eb9lzb : int 63
## .. ..$ o40emg : int 55
## .. ..$ vd2cm9 : int 41
## .. ..$ tyffgj : int 24
## .. ..$ icsp53 : int 20
## .. ..$ aoqm1 : int 11
## .. ..$ 1puyni5: int 6
## .. ..$ 1eyfin8: int 5
## .. ..$ 1920ood: int 4
## .. ..$ dacmg4 : int 4
## .. ..$ e7bzw : int 3
## .. ..$ offcbq : int 3
## ..$ memberships :List of 7
## .. ..$ 137wtpl: int 185
## .. ..$ 17vn92y: int 166
## .. ..$ wkz6oe : int 109
## .. ..$ 1mdn78e: int 87
## .. ..$ aklw3a : int 27
## .. ..$ 1d9qtvu: int 20
## .. ..$ zqsmlf : int 3
## ..$ super_cat_nl :List of 3
## .. ..$ 2xl9ac : int 271
## .. ..$ glli8c : int 317
## .. ..$ 1key6o0: int 0
## $ filter :List of 3
## ..$ brand : chr "<label><input type=\"checkbox\" name=\"yfilter[brand][Dehler]\" data-solr=\"brand\" value=\"Dehler\" class=\"cu"| __truncated__
## ..$ brokers: chr "<label><input type=\"checkbox\" name=\"yfilter[brokers][Scheepsmakelaardij Goliath]\" data-solr=\"brokers\" val"| __truncated__
## ..$ land_nl: chr "<label><input type=\"checkbox\" name=\"yfilter[land_nl][Nederland]\" data-solr=\"land_nl\" value=\"Nederland\" "| __truncated__
## $ hash : chr "&price=10000|30000&length=9.2|&super_cat_nl=Zeil"
## $ ifield :List of 3
## ..$ y_price_min : chr "10000"
## ..$ y_price_max : chr "30000"
## ..$ y_length_min: chr "9.2"
## $ rcfield :List of 1
## ..$ y_glli8c: chr "1"
## $ session_id: chr "spghrfb8urv50u2kfg6bp3hejm"
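As noted, the HTML fragments still need parsing. The filtered result count the question was after sits inside `$top`; a base-R sketch using the fragment shown in the str(dat) output above (rvest's html_text would work just as well):

```r
# $top as returned by the POST (fragment from str(dat) above)
top <- "<h3 class=\"res_number\">317 <em>boten\tgevonden</em></h3>"

# Grab the digits that follow the res_number heading
n_results <- as.integer(sub(".*<h3[^>]*>([0-9]+).*", "\\1", top))
n_results
# [1] 317
```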
Note that this is a super common problem that's been covered many times on SO. Each situation requires finding the right URL in the XHR requests, but that's usually the only difference. If you're going to web scrape, you should spend some time reading up on how to do so (even 10 minutes of searching on SO would likely have solved this for you).
If you don't want to do this type of page introspection, you need to use RSelenium, splashr or decapitated. Again, the use of those tools in the context of a problem like this is a well-covered topic on SO.
I have a dataframe nested within a dataframe that I'm getting from Mongo. The row counts match, so that when viewed it looks like a typical dataframe. My question: how do I expand the nested dataframe into the parent so that I can run dplyr selects? See the layout below.
'data.frame': 10 obs. of 2 variables:
$ _id : int 1551 1033 1061 1262 1032 1896 1080 1099 1679 1690
$ personalInfo:'data.frame': 10 obs. of 2 variables:
..$ FirstName :List of 10
.. ..$ : chr "Jack"
.. ..$ : chr "Yogesh"
.. ..$ : chr "Steven"
.. ..$ : chr "Richard"
.. ..$ : chr "Thomas"
.. ..$ : chr "Craig"
.. ..$ : chr "David"
.. ..$ : chr "Aman"
.. ..$ : chr "Frank"
.. ..$ : chr "Robert"
..$ MiddleName :List of 10
.. ..$ : chr "B"
.. ..$ : NULL
.. ..$ : chr "J"
.. ..$ : chr "I"
.. ..$ : chr "E"
.. ..$ : chr "A"
.. ..$ : chr "R"
.. ..$ : NULL
.. ..$ : chr "J"
.. ..$ : chr "E"
As per suggestion, here's how you recreate the data:
id <- c(1551, 1033, 1061, 1262, 1032, 1896, 1080, 1099, 1679, 1690)
fname <- list("Jack","Yogesh","Steven","Richard","Thomas","Craig","David","Aman","Frank","Robert")
mname <- list("B",NULL,"J","I","E","A","R",NULL,"J","E")
sub <- as.data.frame(cbind(fname, mname))
master <- as.data.frame(id)
master$personalInfo <- sub
We could loop over 'personalInfo', change the NULL elements of the list to NA, and convert it to a real dataset with 3 columns:
library(tidyverse)
out <- master %>%
pull(personalInfo) %>%
map_df(~ map_chr(.x, ~ replace(.x, is.null(.x), NA))) %>%
bind_cols(master %>%
select(id), .)
str(out)
#'data.frame': 10 obs. of 3 variables:
# $ id : num 1551 1033 1061 1262 1032 ...
# $ fname: chr "Jack" "Yogesh" "Steven" "Richard" ...
# $ mname: chr "B" NA "J" "I" ...
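The same NULL-to-NA flattening can also be done in base R without the tidyverse; a sketch built on the reproduction code from the question (the to_chr helper is my addition):

```r
# Rebuild the nested columns from the question
id <- c(1551, 1033, 1061, 1262, 1032, 1896, 1080, 1099, 1679, 1690)
fname <- list("Jack","Yogesh","Steven","Richard","Thomas","Craig","David","Aman","Frank","Robert")
mname <- list("B",NULL,"J","I","E","A","R",NULL,"J","E")

# Replace NULLs with NA, then collapse each list column to a character vector
to_chr <- function(col)
  vapply(col, function(x) if (is.null(x)) NA_character_ else x, character(1))

out <- data.frame(id = id,
                  fname = to_chr(fname),
                  mname = to_chr(mname),
                  stringsAsFactors = FALSE)
str(out)
```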
While @akrun's answer is probably more practical and the better way to tidy your data, I think this output is closer to what you describe.
I create a new environment, put the data.frame's contents into it, unlist the problematic column into that same environment, and finally wrap everything back into a data.frame.
I use a slightly hacky cbind because as.data.frame is awkward with list columns; tibble::as_tibble works fine, however.
new_env <- new.env()
list2env(master,new_env)
list2env(new_env$personalInfo,new_env)
rm(personalInfo,envir = new_env)
res <- as.data.frame(do.call(cbind,as.list(new_env))) # or as_tibble(as.list(new_env))
rm(new_env)
res
# fname id mname
# 1 Jack 1551 B
# 2 Yogesh 1033 NULL
# 3 Steven 1061 J
# 4 Richard 1262 I
# 5 Thomas 1032 E
# 6 Craig 1896 A
# 7 David 1080 R
# 8 Aman 1099 NULL
# 9 Frank 1679 J
# 10 Robert 1690 E
str(res)
# 'data.frame': 10 obs. of 3 variables:
# $ fname:List of 10
# ..$ : chr "Jack"
# ..$ : chr "Yogesh"
# ..$ : chr "Steven"
# ..$ : chr "Richard"
# ..$ : chr "Thomas"
# ..$ : chr "Craig"
# ..$ : chr "David"
# ..$ : chr "Aman"
# ..$ : chr "Frank"
# ..$ : chr "Robert"
# $ id :List of 10
# ..$ : num 1551
# ..$ : num 1033
# ..$ : num 1061
# ..$ : num 1262
# ..$ : num 1032
# ..$ : num 1896
# ..$ : num 1080
# ..$ : num 1099
# ..$ : num 1679
# ..$ : num 1690
# $ mname:List of 10
# ..$ : chr "B"
# ..$ : NULL
# ..$ : chr "J"
# ..$ : chr "I"
# ..$ : chr "E"
# ..$ : chr "A"
# ..$ : chr "R"
# ..$ : NULL
# ..$ : chr "J"
# ..$ : chr "E"
I have this code: mapply(annotate, maxs, s, ann)
where annotate is a function that takes an annotator object (maxs), a string (s), and an annotation object (ann). s is a list of strings and ann is an equal-length list of annotation objects.
This is the error I am getting:
Error in dots[[1L]][[1L]] : object of type 'closure' is not
subsettable
Info & Code:
s <- c("hello world", "hello world")
require(openNLP)
require(NLP)
require(openNLPmodels.en)
require(tm) #optional
require(hash) #optional
require(openNLPdata) #optional
sent_token_annotator <- Maxent_Sent_Token_Annotator()
word_token_annotator <- Maxent_Word_Token_Annotator()
pos_tag_annotator <- Maxent_POS_Tag_Annotator()
ann <- sapply(s, annotate, list(sent_token_annotator,
word_token_annotator,
pos_tag_annotator))
maxs<-Maxent_Chunk_Annotator(probs = FALSE)
> maxs; str(maxs); s; str(s); ann; str(ann);
An annotator inheriting from classes
Simple_Chunk_Annotator Annotator
with description
Computes chunk annotations using the Apache OpenNLP Maxent chunker employing the default model for language 'en'.
function (s, a)
- attr(*, "meta")=List of 1
..$ description: chr "Computes chunk annotations using the Apache OpenNLP Maxent chunker employing the default model for language 'en'."
- attr(*, "class")= chr [1:2] "Simple_Chunk_Annotator" "Annotator"
[1] "hello world" "hello world"
chr [1:2] "hello world" "hello world"
id type start end features
1 sentence 1 23 constituents=<<integer,4>>
2 word 1 5 POS=UH
3 word 7 11 POS=NN
4 word 13 17 POS=UH
5 word 19 23 POS=NN
List of 5
$ :Classes 'Annotation', 'Span' hidden list of 5
..$ id : int 1
..$ type : chr "sentence"
..$ start : int 1
..$ end : int 23
..$ features:List of 1
.. ..$ :List of 1
.. .. ..$ constituents: int [1:4] 2 3 4 5
..- attr(*, "meta")=List of 2
.. ..$ POS_tagset : chr "en-ptb"
.. ..$ POS_tagset_URL: chr "http://www.comp.leeds.ac.uk/ccalas/tagsets/upenn.html"
$ :Classes 'Annotation', 'Span' hidden list of 5
..$ id : int 2
..$ type : chr "word"
..$ start : int 1
..$ end : int 5
..$ features:List of 1
.. ..$ :List of 1
.. .. ..$ POS: chr "UH"
..- attr(*, "meta")=List of 2
.. ..$ POS_tagset : chr "en-ptb"
.. ..$ POS_tagset_URL: chr "http://www.comp.leeds.ac.uk/ccalas/tagsets/upenn.html"
$ :Classes 'Annotation', 'Span' hidden list of 5
..$ id : int 3
..$ type : chr "word"
..$ start : int 7
..$ end : int 11
..$ features:List of 1
.. ..$ :List of 1
.. .. ..$ POS: chr "NN"
..- attr(*, "meta")=List of 2
.. ..$ POS_tagset : chr "en-ptb"
.. ..$ POS_tagset_URL: chr "http://www.comp.leeds.ac.uk/ccalas/tagsets/upenn.html"
$ :Classes 'Annotation', 'Span' hidden list of 5
..$ id : int 4
..$ type : chr "word"
..$ start : int 13
..$ end : int 17
..$ features:List of 1
.. ..$ :List of 1
.. .. ..$ POS: chr "UH"
..- attr(*, "meta")=List of 2
.. ..$ POS_tagset : chr "en-ptb"
.. ..$ POS_tagset_URL: chr "http://www.comp.leeds.ac.uk/ccalas/tagsets/upenn.html"
$ :Classes 'Annotation', 'Span' hidden list of 5
..$ id : int 5
..$ type : chr "word"
..$ start : int 19
..$ end : int 23
..$ features:List of 1
.. ..$ :List of 1
.. .. ..$ POS: chr "NN"
..- attr(*, "meta")=List of 2
.. ..$ POS_tagset : chr "en-ptb"
.. ..$ POS_tagset_URL: chr "http://www.comp.leeds.ac.uk/ccalas/tagsets/upenn.html"
- attr(*, "class")= chr [1:2] "Annotation" "Span"
- attr(*, "meta")=List of 2
..$ POS_tagset : chr "en-ptb"
..$ POS_tagset_URL: chr "http://www.comp.leeds.ac.uk/ccalas/tagsets/upenn.html"
Your help is much appreciated! TY
I'm trying to parse data from a web API with jsonlite, but for some reason the object it returns is a list.
The jsonlite documentation says that the simplification process will automatically convert JSON lists into more specific R classes, but in my case it doesn't work.
It is as if the simplifyVector, simplifyDataFrame and simplifyMatrix options were disabled, even though each is enabled by default.
What I would like is a dataframe from which to retrieve the $Name data (EAC, EFL, ELC, etc.).
I also tried the rjson library, but I have the same problem.
Any idea what could be wrong?
Thank you,
Here is the code I use:
raw <- getURL("https://www.cryptocompare.com/api/data/coinlist")
library(jsonlite)
data <- fromJSON(txt=raw)
> class(data)
[1] "list"
> typeof(data)
[1] "list"
> str(data)
[...]
..$ EAC :List of 13
.. ..$ Id : chr "4437"
.. ..$ Url : chr "/coins/eac/overview"
.. ..$ ImageUrl : chr "/media/19690/eac.png"
.. ..$ Name : chr "EAC"
.. ..$ CoinName : chr "EarthCoin"
.. ..$ FullName : chr "EarthCoin (EAC)"
.. ..$ Algorithm : chr "Scrypt"
.. ..$ ProofType : chr "PoW"
.. ..$ FullyPremined : chr "0"
.. ..$ TotalCoinSupply : chr "13500000000"
.. ..$ PreMinedValue : chr "N/A"
.. ..$ TotalCoinsFreeFloat: chr "N/A"
.. ..$ SortOrder : chr "100"
..$ EFL :List of 13
.. ..$ Id : chr "4438"
.. ..$ Url : chr "/coins/efl/overview"
.. ..$ ImageUrl : chr "/media/19692/efl.png"
.. ..$ Name : chr "EFL"
.. ..$ CoinName : chr "E-Gulden"
.. ..$ FullName : chr "E-Gulden (EFL)"
.. ..$ Algorithm : chr "Scrypt"
.. ..$ ProofType : chr "PoW"
.. ..$ FullyPremined : chr "0"
.. ..$ TotalCoinSupply : chr "21000000 "
.. ..$ PreMinedValue : chr "N/A"
.. ..$ TotalCoinsFreeFloat: chr "N/A"
.. ..$ SortOrder : chr "101"
..$ ELC :List of 13
.. ..$ Id : chr "4439"
.. ..$ Url : chr "/coins/elc/overview"
.. ..$ ImageUrl : chr "/media/19694/elc.png"
.. ..$ Name : chr "ELC"
.. ..$ CoinName : chr "Elacoin"
.. ..$ FullName : chr "Elacoin (ELC)"
.. ..$ Algorithm : chr "Scrypt"
.. ..$ ProofType : chr "PoW"
.. ..$ FullyPremined : chr "0"
.. ..$ TotalCoinSupply : chr "75000000"
.. ..$ PreMinedValue : chr "N/A"
.. ..$ TotalCoinsFreeFloat: chr "N/A"
.. ..$ SortOrder : chr "102"
.. [list output truncated]
$ Type : int 100
NULL
You showed the lower end of the structure, but the answer to the question regarding why a dataframe was not returned is seen at the top of the structure:
# note: needed `require(RCurl)` to obtain getURL
> str(data)
List of 6
$ Response : chr "Success"
$ Message : chr "Coin list succesfully returned!"
$ BaseImageUrl: chr "https://www.cryptocompare.com"
$ BaseLinkUrl : chr "https://www.cryptocompare.com"
$ Data :List of 492
..$ BTC :List of 13
.. ..$ Id : chr "1182"
.. ..$ Url : chr "/coins/btc/overview"
.. ..$ ImageUrl : chr "/media/19633/btc.png"
.. ..$ Name : chr "BTC"
.. ..$ CoinName : chr "Bitcoin"
.. ..$ FullName : chr "Bitcoin (BTC)"
.. ..$ Algorithm : chr "SHA256"
# ------snipped the many, many pages of output that followed---------
Furthermore, the $Data node of that list has elements of irregular lengths, so coercing it to a dataframe in one step might be difficult:
> table( sapply(data$Data, length))
12 13 14
2 478 12
After loading pkg:plyr, which provides a useful function for rbind-ing similar but not identical dataframes, I'm able to construct a useful starting point for further analysis:
require(plyr)
money <- do.call(rbind.fill, lapply( data$Data, data.frame, stringsAsFactors=FALSE))
str(money)
#------------
'data.frame': 492 obs. of 14 variables:
$ Id : chr "1182" "3808" "3807" "5038" ...
$ Url : chr "/coins/btc/overview" "/coins/ltc/overview" "/coins/dash/overview" "/coins/xmr/overview" ...
$ ImageUrl : chr "/media/19633/btc.png" "/media/19782/ltc.png" "/media/20626/dash.png" "/media/19969/xmr.png" ...
$ Name : chr "BTC" "LTC" "DASH" "XMR" ...
$ CoinName : chr "Bitcoin" "Litecoin" "DigitalCash" "Monero" ...
$ FullName : chr "Bitcoin (BTC)" "Litecoin (LTC)" "DigitalCash (DASH)" "Monero (XMR)" ...
$ Algorithm : chr "SHA256" "Scrypt" "X11" "CryptoNight" ...
$ ProofType : chr "PoW" "PoW" "PoW/PoS" "PoW" ...
$ FullyPremined : chr "0" "0" "0" "0" ...
$ TotalCoinSupply : chr "21000000" "84000000" "22000000" "0" ...
$ PreMinedValue : chr "N/A" "N/A" "N/A" "N/A" ...
$ TotalCoinsFreeFloat: chr "N/A" "N/A" "N/A" "N/A" ...
$ SortOrder : chr "1" "3" "4" "5" ...
$ TotalCoinsMined : chr NA NA NA NA ...
If you wanted to be able to access the rows by way of the abbreviations for those crypto-currencies, you could do:
rownames(money) <- names(data$Data)
Which now lets you do this:
> money[ "BTC", ]
Id Url ImageUrl Name CoinName
BTC 1182 /coins/btc/overview /media/19633/btc.png BTC Bitcoin
FullName Algorithm ProofType FullyPremined TotalCoinSupply
BTC Bitcoin (BTC) SHA256 PoW 0 21000000
PreMinedValue TotalCoinsFreeFloat SortOrder TotalCoinsMined
BTC N/A N/A 1 <NA>
Where before access would have been a bit more clunky:
> money[ money$Name=="BTC", ]
I'm replying to my own question: as already said in the comment section, the returned object is already in its simplest form. Probably jsonlite cannot create a data frame from deeply nested lists.
The solution I have found is to use unlist and data.frame like this:
> df <- data.frame(unlist(data))
> class(df)
[1] "data.frame"
Having worked out the OAuth signature approval system for Dropbox, I wanted to download an .RData file that I had saved there, using the API and httr's GET function.
The request was successful and comes back with data, but it is in raw format, and I was wondering how to convert it back into an .RData file on my local drive.
This is what I've done so far:
require(httr)
db.file.name <- "test.RData"
db.app <- oauth_app("db",key="xxxxx", secret="xxxxxxx")
db.sig <- sign_oauth1.0(db.app, token="xxxxxxx", token_secret="xxxxxx")
response <- GET(url=paste0("https://api-content.dropbox.com/1/files/dropbox/",db.file.name),config=c(db.sig,add_headers(Accept="x-dropbox-metadata")))
str(response)
List of 8
$ url : chr "https://api-content.dropbox.com/1/files/dropbox/test.RData"
$ handle :List of 2
..$ handle:Formal class 'CURLHandle' [package "RCurl"] with 1 slots
.. .. ..# ref:<externalptr>
..$ url :List of 8
.. ..$ scheme : chr "https"
.. ..$ hostname: chr "api-content.dropbox.com"
.. ..$ port : NULL
.. ..$ path : chr ""
.. ..$ query : NULL
.. ..$ params : NULL
.. ..$ username: NULL
.. ..$ password: NULL
.. ..- attr(*, "class")= chr "url"
..- attr(*, "class")= chr "handle"
$ status_code: num 200
$ headers :List of 14
..$ server : chr "nginx/1.2.6"
..$ date : chr "Tue, 29 Jan 2013 10:18:58 GMT"
..$ content-type : chr "application/octet-stream"
..$ content-length : chr "1142953"
..$ connection : chr "keep-alive"
..$ access-control-expose-headers: chr "X-Dropbox-Metadata, Accept-Ranges, Content-Range"
..$ accept-ranges : chr "bytes"
..$ x-dropbox-metadata : chr "{\"revision\": 8398, \"rev\": \"20ce0573b0e8\", \"thumb_exists\": false, \"bytes\": 1142953, \"modified\": \"Thu, 24 Jan 2013 2"| __truncated__
..$ etag : chr "8398n"
..$ pragma : chr "public"
..$ cache-control : chr "max-age=0"
..$ access-control-allow-origin : chr "*"
..$ status : chr "200"
..$ statusmessage : chr "OK"
..- attr(*, "class")= chr [1:2] "insensitive" "list"
$ cookies : list()
$ content : raw [1:1142953] 1f 8b 08 00 ...
$ times : Named num [1:6] 0 0.4 0.518 0.879 1.898 ...
..- attr(*, "names")= chr [1:6] "redirect" "namelookup" "connect" "pretransfer" ...
$ config :List of 1
..$ httpheader: Named chr [1:2] "x-dropbox-metadata" "OAuth oauth_consumer_key=\"xxxxxx\", oauth_nonce=\"xxxxxxxx\", oauth_signature=\"xxxxxxxxxxxxxx\", o"| __truncated__
.. ..- attr(*, "names")= chr [1:2] "Accept" "Authorization"
..- attr(*, "class")= chr "config"
- attr(*, "class")= chr "response"
raw.content.of.file <- content(response)
head(raw.content.of.file)
[1] 1f 8b 08 00 00 00
Basically I want to save the raw.content.of.file object into a file called downloaded.RData that is identical to test.RData, or failing that, at least be able to load the objects in test.RData into my global environment.
You can use writeBin to write the binary response content to an .Rda file. Here is a complete working example:
library(httr)
test <- 1:10
save(test, file="~/Dropbox/test.Rda")
response <- GET(url="https://dl.dropbox.com/s/9rjbjwqxid7yj53/test.Rda?dl=1")
writeBin(response$content, "test2.Rda")
rm(test)
load("test2.Rda")
test
[1] 1 2 3 4 5 6 7 8 9 10
And there is an even simpler way if you don't want to save your binary data to a file. You can just do this directly:
rm(test)
load(rawConnection(response$content))
test
[1] 1 2 3 4 5 6 7 8 9 10
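The same raw-to-RData roundtrip can be exercised entirely offline, which also shows that response$content is just the raw bytes of the saved file; a self-contained sketch with no Dropbox involved (the tempfile paths are my stand-ins for the URLs above):

```r
# Simulate the downloaded bytes: save an object, then read the file back as raw
test <- 1:10
f <- tempfile(fileext = ".Rda")
save(test, file = f)
raw_bytes <- readBin(f, what = "raw", n = file.info(f)$size)

# Option 1: write the bytes to disk (as with response$content) and load the file
f2 <- tempfile(fileext = ".Rda")
writeBin(raw_bytes, f2)
rm(test)
load(f2)

# Option 2: load straight from memory via a raw connection
# (load() transparently wraps the connection in gzcon for compressed saves)
rm(test)
load(rawConnection(raw_bytes))
test
# [1]  1  2  3  4  5  6  7  8  9 10
```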