I am using the rDrop package available from https://github.com/karthikram/rDrop, and after a bit of tweaking (not all of the functions work quite as you would expect) I have finally got it working the way I would like. However, it still requires authorisation in the browser to allow the app each time I request a token, as I think tokens expire over time. (If that is not the case and I can hard-code my token, please tell me, as that would be a good solution too.)
Basically, I want a near-seamless way of downloading CSV files from my Dropbox folders from the command line in R, in one line of code, so that I don't need to click the "Allow" button after each token request.
Is this possible?
Here is the code I used to wrap up a dropbox csv download.
db.csv.download <- function(dropbox.path, ...){
  # keys are read from options set in .Rprofile (see below)
  cKey <- getOption('DropboxKey')
  cSecret <- getOption('DropboxSecret')
  reqURL <- "https://api.dropbox.com/1/oauth/request_token"
  authURL <- "https://www.dropbox.com/1/oauth/authorize"
  accessURL <- "https://api.dropbox.com/1/oauth/access_token/"
  require(devtools)
  install_github("ROAuth", "ropensci")
  install_github("rDrop", "karthikram")
  require(rDrop)
  # OAuth handshake: this is the step that opens the browser for authorisation
  dropbox_oa <- oauth(cKey, cSecret, reqURL, authURL, accessURL, obj = new("DropboxCredentials"))
  cred <- handshake(dropbox_oa, post = TRUE)
  raw.data <- dropbox_get(cred, dropbox.path)
  data <- read.csv(textConnection(raw.data), ...)
  data
}
Oh, and in case it's not obvious, I have put my Dropbox key and secret in my .Rprofile file, which is what the getOption calls refer to.
Thanks in advance for any help. (For bonus points: if anybody knows how to get rid of all the loading messages, even for the install, that would be great.)
library(rDrop)
# my keys are in my .Rprofile; otherwise specify them inline
db_token <- dropbox_auth()
# Hit OK to authorize once through the browser, then hit Enter back at the R prompt.
save(db_token, file="my_dropbox_token.rdata")
Dropbox tokens are non-expiring and can be revoked at any time from the Dropbox web panel.
For future use:
library(rDrop)
load('~/Desktop/my_dropbox_token.rdata')
df <- data.frame(x=1:10, y=rnorm(10))
> df
x y
1 1 -0.6135835
2 2 0.3624928
3 3 0.5138807
4 4 -0.2824156
5 5 0.9230591
6 6 0.6759700
7 7 -1.9744624
8 8 -1.2061920
9 9 0.9481213
10 10 -0.5997218
dropbox_save(db_token, list(df), file="foo", ext=".rda")
rm(df)
df2 <- db.read.csv(db_token, file='foo.rda')
> df2
x y
1 1 -0.6135835
2 2 0.3624928
3 3 0.5138807
4 4 -0.2824156
5 5 0.9230591
6 6 0.6759700
7 7 -1.9744624
8 8 -1.2061920
9 9 0.9481213
10 10 -0.5997218
If you have additional problems, please file an issue.
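For the bonus points about silencing the loading messages, here is a minimal sketch (my addition, not part of the original answer) using base R's message-suppression wrappers:
# quiet the start-up messages when attaching the package
suppressPackageStartupMessages(library(rDrop))
# install output can be reduced similarly; the quiet argument is hedged here,
# check your devtools version supports it:
# suppressMessages(devtools::install_github("karthikram/rDrop", quiet = TRUE))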
I am trying to scrape the phone prices from the page below with rvest, but I have tried several ways and it always ends up with blanks or NA only.
library('rvest')
library(stringr)
url <- 'https://www.kimovil.com/en/compare-smartphones/f_min_dm+unveileddate.3,i_b+slug.samsung'
webpage <- read_html(url)
device_cost_html <- html_nodes(webpage,'.price')
device_cost <- html_text(device_cost_html)
device_cost <- as.numeric(device_cost)
This is not a static web page that can be scraped with rvest. The span elements are actually empty in the requested HTML document. What happens in your web browser is that when the HTML document is loaded, the browser runs the JavaScript code on the page, which issues further requests to the server. These requests return the actual data in JSON format, which the JavaScript code then uses to populate the empty span elements.
The reason this doesn't work in rvest is that it has no facility to run the JavaScript on the page. It just returns the original empty HTML spans.
However, all is not lost. Using your browser's developer tools, you can find the URL of the JSON that contains the data and request that directly. In your case, this is surprisingly straightforward:
json <- httr::GET("https://www.kimovil.com/uploads/last_prefetch.json")
all_phones <- httr::content(json, "parsed")
df <- do.call(rbind, lapply(all_phones$smartphones, function(x) {
  data.frame(name = x$full_name, price_usd = paste("$", x$usd))
}))
head(df)
#> name price_usd
#> 1 Realme GT Neo 2 $ 489
#> 2 Google Pixel 6 $ 754
#> 3 OnePlus 9RT $ 577
#> 4 Google Pixel 6 Pro $ 1044
#> 5 Apple iPhone 13 Pro Max $ 1432
#> 6 Apple iPhone 13 Pro $ 31
Created on 2021-10-31 by the reprex package (v2.0.0)
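As a small follow-up sketch (my addition, hedged): the original code tried as.numeric() on the scraped text, so if numeric prices are needed, the dollar strings above can be stripped back to numbers:
# strip everything except digits and the decimal point before converting
df$price_num <- as.numeric(gsub("[^0-9.]", "", df$price_usd))
head(df$price_num)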
Here is my question:
I am trying my hand at the datapasta package. I was able to do everything required, but my final result is not good.
How do I make my clipboard accessible, so that I can copy what I need straight from the clipboard instead of having to paste my copied text into the text box and press Save to import my data? I believe this is what is stopping me from viewing my copied data.frame correctly when I run head().
Here are the steps I followed via code chunk
install.packages("datapasta")
test<-data.frame(
stringsAsFactors = FALSE,
check.names = FALSE,
`Pos Team P W D L GD Pts` = c("1\tChelsea\t7\t5\t1\t1\t12\t16",
"2\tLiverpool\t7\t4\t3\t0\t15","3\tManchester City\t7\t4\t3\t0\t11\t14",
"4\tManchester United\t7\t4\t2\t1\t8\t14",
"5\tEverton\t7\t4\t2\t1\t5\t14",
"6\tBrighton\t7\t4\t2\t1\t3\t14","7\tBrentford\t7\t3\t3\t1\t4\t12",
"8\tTottenham\t7\t4\t0\t3\t12",
"9\tWest Ham\t7\t3\t2\t2\t4\t11",
"10\tAston Villa\t7\t3\t1\t3\t10","11\tArsenal\t7\t3\t1\t3\t10",
"12\tWolves\t7\t3\t0\t4\t9",
"13\tLeicester City\t7\t2\t2\t3\t8","14\tCrystal Palace\t7\t1\t4\t2\t7",
"15\tWatford\t7\t2\t1\t4\t7",
"16\tLeeds United\t7\t1\t3\t3\t6","17\tSouthampton\t7\t0\t4\t3\t4",
"18\tBurnley\t7\t0\t3\t4\t3",
"19\tNewcastle\t7\t0\t3\t4\t3","20\tNorwich City\t7\t0\t1\t6\t1")
)
head(test)
Here is my result, which is not what I wanted:
Pos\tTeam\tP\tW\tD\tL\tGD\tPts
1 1\tChelsea\t7\t5\t1\t1\t12\t16
2 2\tLiverpool\t7\t4\t3\t0\t15
3 3\tManchester City\t7\t4\t3\t0\t11\t14
4 4\tManchester United\t7\t4\t2\t1\t8\t14
5 5\tEverton\t7\t4\t2\t1\t5\t14
6 6\tBrighton\t7\t4\t2\t1\t3\t14
>
This is what pops up on my screen.
Any help or suggestions would be greatly appreciated.
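A minimal diagnostic sketch (my addition, not a confirmed answer), assuming the issue is clipboard access: to the best of my knowledge datapasta reads the system clipboard via the clipr package, so checking that R can reach the clipboard usually explains why the addin falls back to the manual text box.
# install.packages(c("datapasta", "clipr"))
library(clipr)
clipr_available()   # TRUE means R can read/write the system clipboard
# read_clip()       # shows the raw lines R currently sees on the clipboard
# with the table copied and the clipboard accessible, the paste helpers
# should then write out a fully parsed data.frame definition:
# datapasta::df_paste()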
I am trying to use R to retrieve data from the YouTube API v3 and there are few/no tutorials out there that show the basic process. I have figured out this much so far:
# YouTube API query
library(httr)     # for GET()
library(stringr)  # for str_c()

base_url <- "https://youtube.googleapis.com/youtube/v3/"
# my_api_key is assumed to be defined elsewhere (e.g. in .Rprofile)
my_yt_search <- function(search_term, max_results = 20) {
  my_api_url <- str_c(base_url, "search?part=snippet&", "maxResults=", max_results, "&",
                      "q=", search_term, "&key=", my_api_key, sep = "")
  result <- GET(my_api_url)
  return(result)
}
my_yt_search(search_term = "salmon")
But I am just getting some general meta-data and not the search results. Help?
PS. I know there is a package 'tuber' out there but I found it very unstable and I just need to perform simple searches so I prefer to code the requests myself.
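A minimal parsing sketch (my addition, not from the original post), assuming the missing piece is extracting the items from the JSON body rather than just printing the response object:
library(jsonlite)
res <- my_yt_search(search_term = "salmon")
parsed <- fromJSON(content(res, as = "text"), flatten = TRUE)
# titles and video ids of the returned search results
parsed$items$snippet.title
parsed$items$id.videoId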
Sadly, there is no way to get the durations directly from the search endpoint; you'll need to call the videos endpoint (with part=contentDetails) after doing the search if you want that information. However, you can pass up to 50 ids in a single call, so we can save some time by pasting all the ids together.
library(httr)
library(jsonlite)
library(tidyverse)
my_yt_duration <- function(...){
  my_api_url <- paste0(base_url, "videos?part=contentDetails",
                       paste0("&id=", ..., collapse = ""), "&key=", my_api_key)
  GET(my_api_url) -> resp
  fromJSON(content(resp, "text"))$items %>% as_tibble() %>% select(id, contentDetails) -> tb
  tb$contentDetails$duration %>% tibble(id = tb$id, duration = .)
}
### getting the video IDs
my_yt_search(search_term = "salmon") -> res
## converting from JSON, then selecting all the video ids
# fromJSON(content(res, as = "text"))$items$id$videoId
my_yt_duration(fromJSON(content(res, as = "text"))$items$id$videoId) -> tib.id.duration
# A tibble: 20 x 2
id duration
<chr> <chr>
1 -x2E7T3-r7k PT4M14S
2 b0ahREpQqsM PT3M35S
3 ROz8898B3dU PT14M17S
4 jD9VJ92xyzA PT5M42S
5 ACfeJuZuyxY PT3M1S
6 bSOd8r4wjec PT6M29S
7 522BBAsijU0 PT10M51S
8 1P55j9ub4es PT14M59S
9 da8JtU1YAyc PT3M4S
10 4MpYuaJsvRw PT8M27S
11 _NbbtnXkL-k PT2M53S
12 3q1JN_3s3gw PT6M17S
13 7A-4-S_k_rk PT9M37S
14 txKUTx5fNbg PT10M2S
15 TSSPDwAQLXs PT3M11S
16 NOHEZSVzpT8 PT7M51S
17 4rTMdQzsm6U PT17M24S
18 V9eeg8d9XEg PT10M35S
19 K4TWAvZPURg PT3M3S
20 rR9wq5uN_q8 PT4M53S
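A small follow-up sketch (my addition, hedged): the durations come back in ISO 8601 form (e.g. "PT4M14S"), so if you need them as seconds, a base-R helper along these lines should work for YouTube-style "PT...H...M...S" strings:
iso8601_to_seconds <- function(x) {
  sapply(x, function(d) {
    # pull the number in front of a given unit letter, defaulting to 0
    get_num <- function(unit) {
      m <- regmatches(d, regexpr(paste0("[0-9]+(?=", unit, ")"), d, perl = TRUE))
      if (length(m) == 0) 0 else as.numeric(m)
    }
    get_num("H") * 3600 + get_num("M") * 60 + get_num("S")
  }, USE.NAMES = FALSE)
}
iso8601_to_seconds(c("PT4M14S", "PT14M17S"))
# 254 857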
I run the following R code in RKWard to read a CSV file:
# I) Go to the working directory
setwd("/home/***/Desktop/***")
# II) Verify the current working directory
print(getwd())
# III) Load the needed package
require("csv")
# IV) Read the desired file
read.csv(file="CSV_Example.csv", header=TRUE, sep=";")
The data in the CSV file is as follows (an example taken from this website):
id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance
But the result is as follow:
id.name.salary.start_date.dept
1 1,Rick,623.3,2012-01-01,IT
2 2,Dan,515.2,2013-09-23,Operations
3 3,Michelle,611,2014-11-15,IT
4 4,Ryan,729,2014-05-11,HR
5 5,Gary,843.25,2015-03-27,Finance
6 6,Nina,578,2013-05-21,IT
7 7,Simon,632.8,2013-07-30,Operations
8 8,Guru,722.5,2014-06-17,Finance
PROBLEM: The data are not split into columns as they are supposed to be.
Please, can anyone help me?
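A quick observation as a sketch (my addition, hedged): the sample data shown above is comma-separated, while the code passes sep=";", so everything ends up in a single column. If the file really uses commas, the default separator should parse it into columns:
# assuming the file is actually comma-separated, as in the sample above
read.csv(file = "CSV_Example.csv", header = TRUE)   # sep = "," is the default
# or keep the argument explicit: read.csv("CSV_Example.csv", header = TRUE, sep = ",")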
I would like to read online data to R using download.file() as shown below.
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
download.file(URL, destfile = "./data/data.csv", method="curl")
Someone suggested to me that I add the line setInternet2(TRUE), but it still doesn't work.
The error I get is:
Warning messages:
1: running command 'curl "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv" -o "./data/data.csv"' had status 127
2: In download.file(URL, destfile = "./data/data.csv", method = "curl", :
download had nonzero exit status
Appreciate your help.
It might be easiest to try the RCurl package. Install the package and try the following:
# install.packages("RCurl")
library(RCurl)
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)
## Or
## x <- getURL(URL, ssl.verifypeer = FALSE)
out <- read.csv(textConnection(x))
head(out[1:6])
# RT SERIALNO DIVISION PUMA REGION ST
# 1 H 186 8 700 4 16
# 2 H 306 8 700 4 16
# 3 H 395 8 100 4 16
# 4 H 506 8 700 4 16
# 5 H 835 8 800 4 16
# 6 H 989 8 700 4 16
dim(out)
# [1] 6496 188
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv",destfile="reviews.csv",method="libcurl")
Here's an update as of Nov 2014. I found that setting method='curl' did the trick for me (while method='auto' did not).
For example:
# does not work
download.file(url='https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
destfile='localfile.zip')
# does not work. this appears to be the default anyway
download.file(url='https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
destfile='localfile.zip', method='auto')
# works!
download.file(url='https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
destfile='localfile.zip', method='curl')
I've succeeded with the following code:
url = "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x = read.csv(file=url)
Note that I've changed the protocol from https to http, since the former doesn't seem to be supported in R.
If you get an SSL error from RCurl's getURL() function, set these options before calling getURL(). This sets the CurlSSL options globally.
The extended code:
install.packages("RCurl")
library(RCurl)
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)
Worked for me on Windows 7 64-bit using R 3.1.0!
Offering the curl package as an alternative that I found to be reliable when extracting large files from an online database. In a recent project, I had to download 120 files from an online database and found curl_download to halve the transfer times and to be much more reliable than download.file.
#install.packages("curl")
library(curl)
#install.packages("RCurl")
library(RCurl)
# time RCurl::getURL()
ptm <- proc.time()
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)
proc.time() - ptm
ptm

# time curl::curl_download()
ptm1 <- proc.time()
curl_download(url = URL, destfile = "TEST.CSV", quiet = FALSE, mode = "wb")
proc.time() - ptm1
ptm1

# time base download.file() with method = "curl"
ptm2 <- proc.time()
y <- download.file(URL, destfile = "./data/data.csv", method = "curl")
proc.time() - ptm2
ptm2
In this case, rough timing on your URL showed no consistent difference in transfer times. In my application, using curl_download in a script to select and download 120 files from a website decreased my transfer times from 2000 seconds per file to 1000 seconds, and improved reliability from roughly a 50% failure rate to 2 failures in 120 files. The script is posted in my answer to an earlier question of mine.
Try the following with large files:
library(data.table)
URL <- "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- fread(URL)
Exit status 127 means "command not found".
In your case, the curl command was not found, which means curl is not installed (or not on your PATH).
You need to install/reinstall curl. That's all. Get the latest version for your OS from http://curl.haxx.se/download.html
Close RStudio before installation.
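A quick check you can run from R (my addition, not part of the original answer): Sys.which() returns an empty string when the curl binary is not on the PATH.
Sys.which("curl")
# ""                          -> curl not found; install it or add it to the PATH
# "/usr/bin/curl" (or similar) -> curl is available to download.file(method = "curl")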
I had exactly the same problem as UseR (the original question); I'm also using Windows 7. I tried all the proposed solutions and they didn't work.
I resolved the problem as follows:
Using RStudio instead of the R console.
Updating the version of R (from 3.1.0 to 3.1.1) so that the RCurl library runs OK on it. (I'm now using R 3.1.1 32-bit, although my system is 64-bit.)
Typing the URL with https (secure connection) and with forward slashes / instead of backslashes \\.
Setting method = "auto".
It works for me now. You should see the message:
Content type 'text/csv; charset=utf-8' length 9294 bytes
opened URL
downloaded 9294 bytes
You can set the global option and try:
options('download.file.method'='curl')
download.file(URL, destfile = "./data/data.csv", method="auto")
For background on the issue, refer to this link:
https://stat.ethz.ch/pipermail/bioconductor/2011-February/037723.html
Downloading files through the httr-package also works:
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
httr::GET(URL,
          httr::write_disk(path = basename(URL),
                           overwrite = TRUE))