Hi all.
I'm new to ASP programming.
I want to extract data from this page: http://sports.williamhill.com/bet/en-gb/betting/y/5/tm/Football.html
My plan is: when a button is clicked, fetch the page above and extract the 'Daily Match List'.
Then, from the extracted data, I want to remove the unneeded parts of the HTML source.
So the data I want to end up with looks like the following:
19:45 UK Swindon VS Bristol Rovers 21/20 23/10 13/5
19:45 UK Brazil VS Ukraine 4/9 16/5 6/1
...
Finally, I want to insert this extracted data into an Excel file.
Sorry for my English.
Thanks in advance.
I would look at using HttpWebRequest: you can make a web request to that page and access the HTML from the response. You then need to look for something that identifies the block of information you're after and strip it out of the response.
However, there are some downsides to a screen-scraping approach: if William Hill decide to change the format of the web page, it may break your code.
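To illustrate the idea of finding a marker and stripping out the block, here is a minimal sketch (in Python rather than ASP.NET, but the approach with HttpWebRequest is the same). The sample HTML and the `daily-match-list` marker are made up for illustration; the real William Hill page will use different markup.

```python
# Sample HTML standing in for the downloaded page; the div id is an
# assumption, not the real William Hill markup.
sample_html = """
<html><body>
<div id="header">navigation...</div>
<div id="daily-match-list">
19:45 UK Swindon VS Bristol Rovers 21/20 23/10 13/5
19:45 UK Brazil VS Ukraine 4/9 16/5 6/1
</div>
<div id="footer">legal...</div>
</body></html>
"""

# Find a marker that identifies the block you're after, then strip out
# everything between the marker and its closing tag.
marker = '<div id="daily-match-list">'
start = sample_html.index(marker) + len(marker)
end = sample_html.index('</div>', start)
matches = [line.strip() for line in sample_html[start:end].splitlines() if line.strip()]
for m in matches:
    print(m)
```

In a real program you would replace `sample_html` with the response body from the web request, and pick a marker that is stable across page loads.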
http://www.netomatix.com/HttpPostData.aspx
http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.aspx
Cheers Tigger
Related
I am playing around with the Skyscanner API from their webpage in Postman and testing the endpoint for browsing flights. This is what the API docs say on their page:
And this is what I am trying: browsing for flights from Stockholm Arlanda Airport (ARN-sky) to Heathrow (LHR-sky), on 22nd July (around 4 days from now) for the first leg and 25th July for the return. But as you can see, I am not getting any result. The URL I am trying is this.
Any idea what I am doing wrong, and how to fix it?
Please note that the image you present is for the Browse Quotes endpoint, but you are trying to consume the Browse Routes endpoint.
Assuming that you actually want to browse routes, I believe the problem is this:
The endpoint is of the form:
GET /browseroutes/v1.0/{country}/{currency}/{locale}/{originPlace}/{destinationPlace}/{outboundPartialDate}/{inboundPartialDate}
You are writing a URL like:
.../browseroutes/v1.0/FR/eur/en-US/us/ARN-sky/LHR-sky/2021-07-22/2021-07-25?apikey=<api-key>
So it seems that you are actually specifying:
originPlace = us
destinationPlace = ARN-sky
But I think you wanted to define:
originPlace = ARN-sky
destinationPlace = LHR-sky
To solve this, remove the /us segment, giving: http://partners.api.skyscanner.net/apiservices/browseroutes/v1.0/FR/eur/en-US/ARN-sky/LHR-sky/2021-07-22/2021-07-25?apikey=api-key
(Please replace the api-key value with an actual API key.)
This URL already returns a valid 200 OK result :)
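A small sketch of assembling the Browse Routes URL from its path parameters can make the mistake harder to repeat; the structure follows the endpoint template quoted above, and the API key is a placeholder.

```python
from urllib.parse import urlencode

# Path parameters from the Browse Routes template:
# /browseroutes/v1.0/{country}/{currency}/{locale}/{originPlace}/{destinationPlace}/{outboundPartialDate}/{inboundPartialDate}
base = "http://partners.api.skyscanner.net/apiservices/browseroutes/v1.0"
segments = [
    "FR",          # country
    "eur",         # currency
    "en-US",       # locale
    "ARN-sky",     # originPlace (no stray "us" segment before it!)
    "LHR-sky",     # destinationPlace
    "2021-07-22",  # outboundPartialDate
    "2021-07-25",  # inboundPartialDate
]
url = base + "/" + "/".join(segments) + "?" + urlencode({"apikey": "api-key"})
print(url)
```

Building the path from a named list like this makes it obvious when an extra segment shifts every parameter that follows it.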
I wrote some code which should check whether a product is back in stock and when it is, send me an email to notify me. This works when the things I'm looking for are in the html.
However, sometimes certain objects are loaded through JavaScript. How could I edit my code so that the web scraping also works with JavaScript?
This is my code thus far:
import time
import requests

while True:
    # Get the url of the IKEA page
    url = 'https://www.ikea.com/nl/nl/p/flintan-bureaustoel-vissle-zwart-20336841/'
    # Get the text from that page and put everything in lower case
    productpage = requests.get(url).text.lower()
    # Strings that appear on the page when the product is not available
    outofstockstrings = ['niet beschikbaar voor levering', 'alleen beschikbaar in de winkel']
    # Check whether any of the strings are in the text of the webpage
    if any(x in productpage for x in outofstockstrings):
        # Still out of stock: wait 30 minutes and check again
        time.sleep(1800)
        continue
    else:
        # send me an email and break the loop
        break
Instead of scraping and analyzing the HTML, you could use the unofficial stock API that the IKEA website itself uses. That API returns JSON data, which is much easier to analyze, and you'll also get estimates of when the product will be back in stock.
There even is a project written in javascript / node which provides you this kind of information straight from the command line: https://github.com/Ephigenia/ikea-availability-checker
You can easily check the stock amount of the chair in all stores in the Netherlands:
npx ikea-availability-checker stock --country nl 20336841
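To show why JSON is easier to work with than scraping HTML, here is a small sketch of checking stock from such a payload. The payload shape below is entirely made up for illustration and does not match the real (unofficial) IKEA API; see the ikea-availability-checker project for the actual endpoints and fields.

```python
import json

# Hypothetical JSON response; field names are illustrative only.
payload = json.loads("""
{
  "productId": "20336841",
  "stores": [
    {"store": "Amsterdam", "stock": 0, "restockDate": "2021-08-02"},
    {"store": "Utrecht",   "stock": 5, "restockDate": null}
  ]
}
""")

# With structured data, "is it in stock anywhere?" is a one-liner,
# instead of searching the page text for Dutch out-of-stock phrases.
in_stock = [s["store"] for s in payload["stores"] if s["stock"] > 0]
print(in_stock)
```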
I am trying to get the title for "Blood Oath" (1990), https://www.imdb.com/title/tt0100414/ . In this example I am using Jupyter, but it works the same in my .py program:
from imdb import IMDb

ia = IMDb()
movie = ia.get_movie('0100414')
movie
<Movie id:0100414[http] title:_Prisoners of the Sun (1990)_>
Am I doing something wrong? This seems to be the 'USA aka' title. I do know how to get the AKA titles back via the API, but I am puzzled as to why it's returning this one. On the IMDb web page, "Blood Oath" is listed under the AKA section as the "(original title)". Thank you.
What you do is correct.
IMDbPY takes the movie title from the value of a meta tag whose property is set to "og:title". So what's considered the title of a movie depends on decisions made by IMDb.
You can also use the "original title" key, which is taken from what is actually shown to the reader of the web page. This, however, is even more subject to change, since it's usually shown in the language guessed by the IMDb web servers from the language set by a registered user, the settings of your browser, or geolocation of your IP.
So, for example, for that title I get "Blood Oath" via the browser, since my browser is set to English, and "Giuramento di sangue (1990)" if I access movie['original title'] (geolocation of my IP, I guess).
To conclude, if you really need another title, you may get the whole list this way:
ia.update(movie, 'release info')
print(movie.get('akas from release info'))
You will get a list that you can parse looking for a string ending in '(original title)'
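The parsing step above can be sketched as follows; the akas list here is a made-up example of the kind of strings returned, not actual IMDbPY output.

```python
# Hypothetical list of AKA strings, in the shape of
# movie.get('akas from release info') results.
akas = [
    "Prisoners of the Sun (USA)",
    "Giuramento di sangue (Italy)",
    "Blood Oath (original title)",
]

# Look for the entry ending in '(original title)'.
original = next((t for t in akas if t.endswith("(original title)")), None)
print(original)
```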
(disclaimer: I'm one of the main authors of IMDbPY)
I am fetching data from the HERE autocomplete REST endpoint. It works fine, but, for example, searching for Berlin returns 3 results which are (at least in my use case) de facto the same. I assume Berlin is found as a state, a city, and an area or something, but since these cover the same area I consider them duplicates.
I tried playing around with the resultType attribute, but using area gives the same behavior. If I use city, I have to enter all the letters of Berlin to get the suggestion.
You can try resultType=city&country=DEU&query=Ber. With the country filter you do not need to enter all the letters of Berlin to get the suggestion, and there are no duplicates.
https://developer.here.com/documentation/geocoder-autocomplete/dev_guide/topics/resource-suggest.html
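As a sketch, the suggested request can be built like this. The host is the Geocoder Autocomplete endpoint from the linked documentation (verify it against your plan, since HERE has since introduced newer API generations), and the apiKey value is a placeholder.

```python
from urllib.parse import urlencode

# Endpoint per the linked Geocoder Autocomplete docs; confirm for your account.
base = "https://autocomplete.geocoder.ls.hereapi.com/6.2/suggest.json"
query = urlencode({
    "query": "Ber",
    "resultType": "city",  # restrict suggestions to cities
    "country": "DEU",      # the country filter removes the duplicates
    "apiKey": "your-api-key",
})
url = base + "?" + query
print(url)
```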
Hope this helps !
I have access to the new version coming out in a few weeks, and it seems this issue is fixed:
Current:
[screenshot: current behavior for HERE autocomplete]
View of upcoming version:
[screenshot: preview of the upcoming behavior for HERE autocomplete]
I already checked the copyright terms of the Brazilian Central Bank, from now on "BR Central Bank" (link here), which state:
The total or partial reproduction of the content of this site is allowed, preserving the integrity of the information and citing the source. It is also authorized to insert links on other websites to the Central Bank of Brazil (BCB) website. However, the BCB reserves the right to change the provision of information on the site as necessary without notice.
Thus, I'm trying to scrape this website: https://www.bcb.gov.br/estabilidadefinanceira/leiautedoc2061e2071/atuais , but I can't understand why I'm not able to. Below you'll find what I'm doing. The HTML file, when saved, is empty. What am I doing wrong? Can anybody help me, please? After this step I'll read the HTML code and look for new additions since the last database update.
url_bacen <- "https://www.bcb.gov.br/estabilidadefinanceira/leiautedoc2061e2071/atuais"
file_bacen_2061 <- paste("Y:/Dir_Path/" , "BACEN_2061.html", sep="" )
download.file(url_bacen,file_bacen_2061, method="auto",quiet= FALSE, mode="wb")
Thanks for any help,
Felipe
The data is dynamically pulled in via an API call, which you can find in the network tab when pressing F5 to refresh the page; i.e. the landing page makes an additional XHR request for the info, and that request is what you are not capturing. If you mimic this request, it returns JSON you can parse for whatever info you want:
library(jsonlite)
data <- jsonlite::read_json('https://www.bcb.gov.br/api/servico/sitebcb/leiautes2061')
print(data$conteudo)