"Resource not found" from NSE India even when using curl - web-scraping

So I am scraping the NSE India results calendar via a curl command, but it still gives me a "Resource not found" error. Here's my code:
import os
import time
import json

import pandas as pd

url = "https://www.nseindia.com/api/event-calendar?index=equities"
header1 = "Host:www.nseindia.com"
header2 = "User-Agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:82.0) Gecko/20100101 Firefox/82.0"
header3 = "Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
header4 = "Accept-Language:en-US,en;q=0.5"
header5 = "Accept-Encoding:gzip, deflate, br"
header6 = "DNT:1"
header7 = "Connection:keep-alive"
header8 = "Upgrade-Insecure-Requests:1"
header9 = "Pragma:no-cache"
header10 = "Cache-Control:no-cache"

def run_curl_command(curl_command, max_attempts):
    # Retry the call as long as the API keeps answering "Resource not found".
    result = os.popen(curl_command).read()
    count = 0
    while "Resource not found" in result and count < max_attempts:
        result = os.popen(curl_command).read()
        count += 1
        time.sleep(1)
    print("API Read")
    result = json.loads(result)
    return pd.DataFrame(result)

def init():
    max_attempts = 100
    curl_command = (
        f'curl "{url}" -H "{header1}" -H "{header2}" -H "{header3}" -H "{header4}" '
        f'-H "{header5}" -H "{header6}" -H "{header7}" -H "{header8}" -H "{header9}" '
        f'-H "{header10}" --compressed'
    )
    print(f"curl_command : {curl_command}")
    return run_curl_command(curl_command, max_attempts)

df = init()
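As a side note (not part of the original post), os.popen discards curl's exit status and stderr, which makes failures like this hard to diagnose. Below is a minimal sketch of the same retry loop using subprocess.run instead, assuming the same endpoint and an abbreviated header set:

import json
import time
import subprocess

import pandas as pd

URL = "https://www.nseindia.com/api/event-calendar?index=equities"
HEADERS = [
    "User-Agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:82.0) Gecko/20100101 Firefox/82.0",
    "Accept-Language:en-US,en;q=0.5",
]

def fetch_calendar(max_attempts=5):
    # Passing the command as an argument list avoids shell-quoting problems.
    cmd = ["curl", "-sS", "--compressed", URL]
    for h in HEADERS:
        cmd += ["-H", h]
    for _ in range(max_attempts):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode == 0 and "Resource not found" not in proc.stdout:
            return pd.DataFrame(json.loads(proc.stdout))
        time.sleep(1)
    raise RuntimeError("API kept returning 'Resource not found': " + proc.stderr)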

Related

How can I show more than 100 results per page?

I want to change the number of results on this page, https://fifatracker.net/players/, to more than 100 and then export the table to Excel, which would make things much easier for me. I tried to scrape it using Python following a tutorial, but I can't make it work. If there is a way to extract the table from all the pages, that would also help me.
As stated, it's restricted to 100 results per request. Simply iterate over the page number in the query payload sent to the API to get each page:
import pandas as pd
import requests

url = 'https://fifatracker.net/api/v1/players/'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36'}

page = 1
payload = {
    "pagination": {
        "per_page": "100", "page": page},
    "filters": {
        "attackingworkrate": [],
        "defensiveworkrate": [],
        "primarypositions": [],
        "otherpositions": [],
        "nationality": [],
        "order_by": "-overallrating"},
    "context": {
        "username": "guest",
        "slot": "1", "season": 1},
    "currency": "eur"}

jsonData = requests.post(url, headers=headers, json=payload).json()
current_page = jsonData['pagination']['current_page']
last_page = jsonData['pagination']['last_page']

dfs = []
for page in range(1, last_page + 1):
    if page == 1:
        pass
    else:
        payload['pagination']['page'] = page
        jsonData = requests.post(url, headers=headers, json=payload).json()

    players = pd.json_normalize(jsonData['result'])
    dfs.append(players)
    print('Page %s of %s' % (page, last_page))

df = pd.concat(dfs).reset_index(drop=True)
Output:
print(df)
slug ... info.contract.loanedto_clubname
0 lionel-messi ... NaN
1 cristiano-ronaldo ... NaN
2 robert-lewandowski ... NaN
3 neymar-jr ... NaN
4 kevin-de-bruyne ... NaN
... ... ...
19137 levi-kaye ... NaN
19138 phillip-cancar ... NaN
19139 julio-pérez ... NaN
19140 alan-mclaughlin ... NaN
19141 tatsuki-yoshitomi ... NaN
[19142 rows x 92 columns]
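The question also asked about exporting the table to Excel, which the answer above stops short of. A short follow-up sketch, assuming the df built above and an Excel engine such as openpyxl installed (the file name is just an example):

# df is the concatenated DataFrame from the loop above.
df.to_excel("fifatracker_players.xlsx", index=False)

# Or, to keep one sheet per API page using the per-page frames in dfs:
# with pd.ExcelWriter("fifatracker_players.xlsx") as writer:
#     for i, players in enumerate(dfs, start=1):
#         players.to_excel(writer, sheet_name=f"page_{i}", index=False)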

JSON URL From NBA Website Not Working Anymore

I've been working on this project that scrapes data from the nba.com stats website using R. A couple of months ago, I was able to use it easily, but now the url does not seem to work and I can't figure out why. Looking at the website, it doesn't seem like the url changed at all, but I can't access it via my browser.
library(rjson)
url <- "https://stats.nba.com/stats/scoreboardV2?DayOffset=0&LeagueID=00&gameDate=02%2F07%2F2020"
data_json <- fromJSON(file = url)
Is anyone else experiencing this problem?
It was a header-related issue. The following fixed it:
library(httr)
library(rjson)
library(magrittr)   # for the %>% pipe

url <- "https://stats.nba.com/stats/scoreboardV2?DayOffset=0&LeagueID=00&gameDate=02%2F07%2F2020"

headers = c(
  `Connection` = 'keep-alive',
  `Accept` = 'application/json, text/plain, */*',
  `x-nba-stats-token` = 'true',
  `User-Agent` = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36',
  `x-nba-stats-origin` = 'stats',
  `Sec-Fetch-Site` = 'same-origin',
  `Sec-Fetch-Mode` = 'cors',
  `Referer` = 'http://stats.nba.com/%referer%/',
  `Accept-Encoding` = 'gzip, deflate, br',
  `Accept-Language` = 'en-US,en;q=0.9'
)

res <- GET(url, add_headers(.headers = headers))
data_json <- res$content %>%
  rawToChar() %>%
  fromJSON()
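For reference (not from the original answer), the same header-based fix can be sketched in Python with the requests library; the header values below simply mirror the R snippet above:

import requests

url = "https://stats.nba.com/stats/scoreboardV2?DayOffset=0&LeagueID=00&gameDate=02%2F07%2F2020"

# Browser-like headers copied from the R answer; requests sets Accept-Encoding itself.
headers = {
    "Connection": "keep-alive",
    "Accept": "application/json, text/plain, */*",
    "x-nba-stats-token": "true",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36",
    "x-nba-stats-origin": "stats",
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-Mode": "cors",
    "Referer": "http://stats.nba.com/%referer%/",
    "Accept-Language": "en-US,en;q=0.9",
}

data_json = requests.get(url, headers=headers, timeout=30).json()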

R - RMySQL - how to save multiple SQL queries to a file?

I do some data analysis in R. At the end of the script I want to save my results to a file. I know there are several options for doing this, but none of them work properly for me. When I try sink() it works, but it gives me:
<MySQLResult:1,5,1>
host logname user time request_fline status
1 142.4.5.115 - - 2018-01-03 12:08:58 GET /phpmyadmin?</script><script>alert('<!--VAIBS-->');</script><script> HTTP/1.1 400
size_varchar referer agent ip_adress size_int cookie time_microsec filename
1 Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0 445 - 142.4.5.115 445 - 159 -
request_protocol keepalive request_method contents_of_foobar contents_of_notefoobar port child_id
<MySQLResult:1,5,1>
[1] host logname user time request_fline status
[7] size_varchar referer agent ip_adress size_int cookie
<0 rows> (or 0-length row.names)
which is totally unusable because I can't export that type of data. If I try write.table, it gives me a file with one row that is possible to read, but after that one row the R script stops and gives me the error: Error in isOpen(file, "w") : invalid connection. When I try write.csv the result is the same, and when I try lapply it gives me just an empty file.
Here is my code:
fileConn <- file("outputX.txt")
fileCon2 <- file("outputX.csv")
sink("outputQuery.txt")
for (i in 1:length(awq)) {
  sql <- paste("SELECT * FROM mtable ORDER BY cookie LIMIT ", awq[i], ",1")
  nb <- dbGetQuery(mydb, sql)
  print(nb)
  write.table(nb, file = fileConn, append = TRUE, quote = FALSE, sep = " ", eol = "\n", na = "NA", row.names = FALSE, col.names = FALSE)
  write.csv(nb, file = fileCon2, row.names = FALSE, sep = " ")
  lapply(nb, write, fileConn, append = TRUE, ncolumns = 7)
  writeLines(unlist(lapply(nb, paste, collapse = " ")))
}
sink()
close(fileConn)
close(fileCon2)
I am new to R, so I don't know what else I should try. What I want is one file where the data is printed in a form that is easy to read and export. For example, like this:
142.4.5.115 - - 2018-01-03 12:08:58 GET /phpmyadmin?/><!--VAIBS--> HTTP/1.1 400 Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0 445 - 142.4.5.115 445 - 145 - HTTP/1.1 0 GET - - 80 7216 ?/><!--VAIBS--> GET /phpmyadmin?/><!--VAIBS--> HTTP/1.1 - 0 /phpmyadmin - 354 0
142.4.5.115 - - 2018-01-03 12:10:23 GET /phpmyadmin?/><!--VAIBS--> HTTP/1.1 400 Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0 445 - 142.4.5.115 445 - 145 - HTTP/1.1 0 GET - - 80 7216 ?/><!--VAIBS--> GET /phpmyadmin?/><!--VAIBS--> HTTP/1.1 - 0 /phpmyadmin - 354 0
142.4.5.115 - - 2018-01-03 12:12:41 GET /phpmyadmin?/><!--VAIBS--> HTTP/1.1 400 Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0 445 - 142.4.5.115 445 - 145 - HTTP/1.1 0 GET - - 80 7216 ?/><!--VAIBS--> GET /phpmyadmin?/><!--VAIBS--> HTTP/1.1 - 0 /phpmyadmin - 354 0
142.4.5.115 - - 2018-01-03 12:15:29 GET /phpmyadmin?/><!--VAIBS--> HTTP/1.1 400 Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0 445 - 142.4.5.115 445 - 145 - HTTP/1.1 0 GET - - 80 7216 ?/><!--VAIBS--> GET /phpmyadmin?/><!--VAIBS--> HTTP/1.1 - 0 /phpmyadmin - 354 0
or this :
host,logname,user,time, request_fline status,size_varchar,referer agent,ip_adress,size_int,cookie,time_microsec,filename,request_protocol,keepalive,request_method,contents_of_foobar,contents_of_notefoobar port child_id
1 142.4.5.115 - - 2018-01-03 12:08:58 GET /phpmyadmin?/><!--VAIBS--> HTTP/1.1 400 Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0 445 - 142.4.5.115 445 - 145 - HTTP/1.1 0 GET - - 80 7216 ?/><!--VAIBS--> GET /phpmyadmin?/><!--VAIBS--> HTTP/1.1 - 0 /phpmyadmin - 354 0
2 142.4.5.115 - - 2018-01-03 12:10:23 GET /phpmyadmin?/><!--VAIBS--> HTTP/1.1 400 Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0 445 - 142.4.5.115 445 - 145 - HTTP/1.1 0 GET - - 80 7216 ?/><!--VAIBS--> GET /phpmyadmin?/><!--VAIBS--> HTTP/1.1 - 0 /phpmyadmin - 354 0
3 142.4.5.115 - - 2018-01-03 12:12:41 GET /phpmyadmin?/><!--VAIBS--> HTTP/1.1 400 Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0 445 - 142.4.5.115 445 - 145 - HTTP/1.1 0 GET - - 80 7216 ?/><!--VAIBS--> GET /phpmyadmin?/><!--VAIBS--> HTTP/1.1 - 0 /phpmyadmin - 354 0
4 142.4.5.115 - - 2018-01-03 12:15:29 GET /phpmyadmin?/><!--VAIBS--> HTTP/1.1 400 Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0 445 - 142.4.5.115 445 - 145 - HTTP/1.1 0 GET - - 80 7216 ?/><!--VAIBS--> GET /phpmyadmin?/><!--VAIBS--> HTTP/1.1 - 0 /phpmyadmin - 354 0
or something similar. Best of all would be some help with how to call write.table in a loop without the error, but I will welcome any working solution. The best I have is:
sql <- paste("SELECT * FROM idsaccess ORDER BY cookie LIMIT ", awq[1], ",1")
nb <- dbGetQuery(mydb, sql)
write.table(nb, file = fileConn, append = TRUE, quote = FALSE, sep = " ", eol = "\n", na = "NA", row.names = FALSE, col.names = FALSE)

fileConn <- file("outputX1.txt")
sql <- paste("SELECT * FROM idsaccess ORDER BY cookie LIMIT ", awq[2], ",1")
nb <- dbGetQuery(mydb, sql)
write.table(nb, file = fileConn, append = TRUE, quote = FALSE, sep = " ", eol = "\n", na = "NA", row.names = FALSE, col.names = FALSE)
But this writes every query to its own file, and I don't want every query in its own file. Any help?
Simply concatenate all the query data frames into one large data frame, since they all share the same structure, and then output to file in one call, which is really the typical way to use write.table or its wrapper, write.csv.
Specifically, turn the for loop:
for (i in 1:length(awq)) {
  sql <- paste("SELECT * FROM mtable ORDER BY cookie LIMIT ", awq[i], ",1")
  nb <- dbGetQuery(mydb, sql)
}
into an lapply that returns a list of data frames:
df_list <- lapply(1:length(awq), function(i) {
  sql <- paste0("SELECT * FROM mtable ORDER BY cookie LIMIT ", awq[i], ",1")
  dbGetQuery(mydb, sql)
})
Then, row bind with do.call to stack all dfs into a single dataframe and output to file:
final_df <- do.call(rbind, df_list)
write.table(final_df, file = "outputX.txt", append = TRUE, quote = FALSE, sep = " ",
            eol = "\n", na = "NA", row.names = FALSE, col.names = FALSE)

Convert a curl command to RCurl

How do I convert this command:
curl -v -u abcdefghij1234567890:X -H "Content-Type: application/json" -X GET 'https://domain.freshdesk.com/api/v2/tickets'
into the equivalent request in RCurl?
The dev version of curlconverter (devtools::install_github("hrbrmstr/curlconverter")) can now convert curl command-line strings with authentication and verbose params:
Copy your curl command to the clipboard:
curl -v -u abcdefghij1234567890:X -H "Content-Type: application/json" -X GET 'https://domain.freshdesk.com/api/v2/tickets'
Then run:
library(curlconverter)
req <- make_req(straighten())[[1]]
The following will now be in your clipboard:
httr::VERB(verb = "GET", url = "https://domain.freshdesk.com/api/v2/tickets",
httr::authenticate(user = "abcdefghij1234567890",
password = "X"), httr::verbose(),
httr::add_headers(), encode = "json")
but req is now also a callable function. You can see that by doing:
req
## function ()
## httr::VERB(verb = "GET", url = "https://domain.freshdesk.com/api/v2/tickets",
## httr::authenticate(user = "abcdefghij1234567890", password = "X"),
## httr::verbose(), httr::add_headers(), encode = "json")
or by actually calling it:
req()
I usually reformat the function source to make it more readable:
httr::VERB(verb = "GET",
url = "https://domain.freshdesk.com/api/v2/tickets",
httr::authenticate(user = "abcdefghij1234567890", password = "X"),
httr::verbose(),
httr::add_headers(),
encode = "json")
and you can easily translate that to a plain GET call without namespacing:
GET(url = "https://domain.freshdesk.com/api/v2/tickets",
    authenticate(user = "abcdefghij1234567890", password = "X"),
    verbose(),
    add_headers(),
    encode = "json")
We can validate that it works with authenticated curl command lines by making a small substitution in your example:
curl_string <- 'curl -v -u abcdefghij1234567890:X -H "Content-Type: application/json" -X GET "https://httpbin.org/basic-auth/abcdefghij1234567890/X"'
make_req(straighten(curl_string))[[1]]()
## -> GET /basic-auth/abcdefghij1234567890/X HTTP/1.1
## -> Host: httpbin.org
## -> Authorization: Basic YWJjZGVmZ2hpajEyMzQ1Njc4OTA6WA==
## -> User-Agent: libcurl/7.43.0 r-curl/1.2 httr/1.2.1
## -> Accept-Encoding: gzip, deflate
## -> Accept: application/json, text/xml, application/xml, */*
## ->
## <- HTTP/1.1 200 OK
## <- Server: nginx
## <- Date: Tue, 30 Aug 2016 14:13:12 GMT
## <- Content-Type: application/json
## <- Content-Length: 63
## <- Connection: keep-alive
## <- Access-Control-Allow-Origin: *
## <- Access-Control-Allow-Credentials: true
## <-
## Response [https://httpbin.org/basic-auth/abcdefghij1234567890/X]
## Date: 2016-08-30 14:13
## Status: 200
## Content-Type: application/json
## Size: 63 B
## {
## "authenticated": true,
## "user": "abcdefghij1234567890"
## }
You can use httr to do this as follows:
require(httr)

GET('https://domain.freshdesk.com/api/v2/tickets',
    verbose(),
    authenticate("user", "passwd"),
    content_type("application/json"))

RCurl JSON data to JIRA REST add issue

I'm trying to POST data to a JIRA project using R and I keep getting: Error: Bad Request. At first I thought it must be the JSON format that I created, so I wrote the JSON to a file and ran a curl command from the console (see below), and the POST worked just fine.
curl -D- -u fred:fred -X POST -d @sample.json -H "Content-Type: application/json" http://localhost:8090/rest/api/2/issue/
Which brings the issue back to my R code. Can someone tell me what I am doing wrong with RCurl's postForm?
Source:
library(RJSONIO)
library(RCurl)

x <- list(
  fields = list(
    project = c(
      c(key = "TEST")
    ),
    summary = "The quick brown fox jumped over the lazy dog",
    description = "silly old billy",
    issuetype = c(name = "Task")
  )
)

curl.opts <- list(
  userpwd = "fred:fred",
  verbose = TRUE,
  httpheader = c('Content-Type' = 'application/json', Accept = 'application/json'),
  useragent = "RCurl"
)

postForm("http://jirahost:8080/jira/rest/api/2/issue/",
         .params = c(data = toJSON(x)),
         .opts = curl.opts,
         style = "POST"
)

rm(list = ls())
gc()
Here's the output of the response:
* About to connect() to jirahost port 80 (#0)
* Trying 10.102.42.58... * connected
* Connected to jirahost (10.102.42.58) port 80 (#0)
> POST /jira/rest/api/2/issue/ HTTP/1.1
User-Agent: RCurl
Host: jirahost
Content-Type: application/json
Accept: application/json
Content-Length: 337
< HTTP/1.1 400 Bad Request
< Date: Mon, 07 Apr 2014 19:44:08 GMT
< Server: Apache-Coyote/1.1
< X-AREQUESTID: 764x1525x1
< X-AUSERNAME: anonymous
< Cache-Control: no-cache, no-store, no-transform
< Content-Type: application/json;charset=UTF-8
< Set-Cookie: atlassian.xsrf.token=B2LW-L6Q7-15BO- MTQ3|bcf6e0a9786f879a7b8df47c8b41a916ab51da0a|lout; Path=/jira
< Connection: close
< Transfer-Encoding: chunked
<
* Closing connection #0
Error: Bad Request
You might find it easier to use httr, which has been constructed with the needs of modern APIs in mind and tends to set better default options. The equivalent httr code would be:
library(httr)

x <- list(
  fields = list(
    project = c(key = "TEST"),
    summary = "The quick brown fox jumped over the lazy dog",
    description = "silly old billy",
    issuetype = c(name = "Task")
  )
)

POST("http://jirahost:8080/jira/rest/api/2/issue/",
     body = RJSONIO::toJSON(x),
     authenticate("fred", "fred", "basic"),
     add_headers("Content-Type" = "application/json"),
     verbose()
)
If that doesn't work, you'll need to supply the output from a successful verbose curl on the console, and a failed httr call in R.
