How to import data from an HTML table on a website to Excel? - web-scraping

I would like to do some statistical analysis with Python on the live casino game Crazy Time from Evolution Gaming. There is a website that has the data for this: https://tracksino.com/crazytime. I want the data from the lowest table, 'Spin History', to be imported into Excel. However, I do not know how this can be done. Could anyone give me an idea where to start?
Thanks in advance!

Try the below code:
import csv
import datetime
import json

import requests
from urllib3.exceptions import InsecureRequestWarning

requests.packages.urllib3.disable_warnings(InsecureRequestWarning)


def scrap_history():
    file_path = ''  # directory where the file should be saved
    file_name = 'spin_history.csv'  # filename
    page_number = 1
    while True:
        # Dynamic URL fetching data in chunks of 100
        url = ('https://api.tracksino.com/crazytime_history?filter=&sort_by=&sort_desc=false'
               '&page_num=' + str(page_number) + '&per_page=100&period=24hours')
        print('-' * 100)
        print('URL created : ', url)
        response = requests.get(url, verify=False)
        result = json.loads(response.text)  # parse the response as JSON
        history_data = result['data']
        print(history_data)
        if history_data:
            with open(file_path + file_name, 'a+') as history:
                # Headers for the file
                csv_headers = ['Occurred At', 'Slot Result', 'Spin Result', 'Total Winners', 'Total Payout']
                csvwriter = csv.DictWriter(history, delimiter=',', lineterminator='\n', fieldnames=csv_headers)
                if page_number == 1:
                    print('Writing CSV header now...')
                    csvwriter.writeheader()
                # write the extracted data into the csv file row by row
                for item in history_data:
                    value = datetime.datetime.fromtimestamp(item['when'])
                    occurred_at = f'{value:%d-%B-%Y # %H:%M:%S}'
                    csvwriter.writerow({'Occurred At': occurred_at,
                                        'Slot Result': item['slot_result'],
                                        'Spin Result': item['result'],
                                        'Total Winners': item['total_winners'],
                                        'Total Payout': item['total_payout'],
                                        })
            print('-' * 100)
            page_number += 1
            print(page_number)
            print('-' * 100)
        else:
            break


scrap_history()
Explanation:
The script above uses the Python requests approach. The API URL https://api.tracksino.com/crazytime_history?filter=&sort_by=&sort_desc=false&page_num=1&per_page=50&period=24hours was extracted from the website itself (see the screenshot). On every iteration the script builds a dynamic URL in which only the page number changes: first page_num=1, then page_num=2, and so on until all the data has been extracted.
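If the end goal is an Excel workbook rather than a CSV, the generated file can be converted afterwards. Here is a minimal sketch, assuming pandas and openpyxl are installed (extra dependencies that the script above does not use):

import pandas as pd

# Sketch: read the CSV produced by scrap_history() and save it as .xlsx.
df = pd.read_csv('spin_history.csv')
df.to_excel('spin_history.xlsx', index=False, sheet_name='Spin History')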

Related

How can I create a BLf file based on a measurement data?

I'm trying to make a certain BLF CAN data file.
After creating an arbitrary measurements table, I try to encode messages and write them in BLF format with the following code.
The BLF file is created, however, it doesn't contain any data at all.
Please let me know what the problem is.
What I tried:
import can
import cantools
import numpy as np

db = cantools.database.load_file('T_Fuel_HB.dbc')
ex_msg = db.get_message_by_name("DEVICE_56604591_0")
time = 0
write_created = can.BLFWriter("sample_created.blf")
for i in range(10):
    r_int = np.random.randint(0, 100)
    data_created = ex_msg.encode({'C_1_T_Air': r_int, 'C_2_T_EG_room': r_int,
                                  'C_3_T_Pump_room': r_int, 'C_4_T_Fuel_tank': r_int})
    msg_created = can.Message(timestamp=time, arbitration_id=ex_msg.frame_id,
                              data=data_created, channel=0)
    print(msg_created, r_int)
    time += 2
    write_created.on_message_received(msg_created)
What I expected:
filename = "VDK14.blf"
log = can.BLFReader(filename)
log = list(log)
for msg in log:
    write.on_message_received(msg)
-> When I use a BLF file with existing log data, there is no problem reading the file through CANanalyzer.
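As a quick sanity check (a sketch, assuming python-can is installed), you can flush the writer and read the new file back before opening it in CANanalyzer; note that BLFWriter buffers messages internally, so it has to be stopped before the data reaches the disk:

# Sketch: stop() flushes and closes the writer; without it the BLF file
# can stay empty because BLFWriter buffers messages internally.
write_created.stop()

# Read the freshly written file back with python-can itself.
read_back = list(can.BLFReader("sample_created.blf"))
print(len(read_back), "messages read back")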

Warp10 and streamlit integration?

Two simple questions:
Does Warp10 integrate into streamlit to feed visualisations?
If so, please would you specify how this can be accomplished?
Thanking you in advance.
Best wishes,
There's no direct integration of Warp 10 in streamlit.
Although streamlit can handle any kind of data, it's mainly focused on pandas DataFrame. DataFrames are tables whereas Warp 10 Geo Time Series are time series. So even if Warp 10 was integrated in streamlit, it would require some code to properly format the data for streamlit to give its full potential.
That being said, here is a small example on how to display data stored in Warp 10 with streamlit:
import json
from datetime import datetime, timedelta

import requests
import streamlit as st
from bokeh.palettes import Category10_10 as palette
from bokeh.plotting import figure

# Should be put in a configuration file.
fetch_endpoint = 'http://localhost:8080/api/v0/fetch'
token = 'READ'  # Change that to your actual token

def load_data_as_json(selector, start, end):
    headers = {'X-Warp10-Token': token}
    params = {'selector': selector, 'start': start, 'end': end, 'format': 'json'}
    r = requests.get(fetch_endpoint, params=params, headers=headers)
    return r.text

st.title('Warp 10 Test')

# Input parameters
selector = st.text_input('Selector', value="~streamlit.*{}")
start_date = st.date_input('Start date', value=datetime.now() - timedelta(days=10))
start_time = st.time_input('Start time')
end_date = st.date_input('End date')
end_time = st.time_input('End time')

# Convert datetime.dates and datetime.times to microseconds (default time unit in Warp 10)
start = int(datetime.combine(start_date, start_time).timestamp()) * 1000000
end = int(datetime.combine(end_date, end_time).timestamp()) * 1000000

# Make the query to Warp 10 and get back a json.
json_data = load_data_as_json(selector, start, end)
gtss = json.loads(json_data)

# Iterate through the json and populate a Bokeh graph.
p = figure(title='GTSs', x_axis_label='time', y_axis_label='value')
for gts_index, gts in enumerate(gtss):
    tss = []
    vals = []
    for point in gts['v']:
        tss.append(point[0])
        vals.append(point[-1])
    p.line(x=tss, y=vals, legend_label=gts['c'] + json.dumps(gts['l']), color=palette[gts_index % len(palette)])

st.bokeh_chart(p, use_container_width=True)

# Also display the json.
st.json(json_data)
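If you would rather feed streamlit's DataFrame-centric widgets instead of Bokeh, a small sketch (assuming pandas is installed) of reshaping the fetched GTS list follows; it keeps the same point[0]/point[-1] access as the Bokeh loop above:

import pandas as pd

# Sketch only: build one DataFrame per Geo Time Series from the gtss list
# parsed above, then hand it to streamlit's built-in table/chart widgets.
for gts in gtss:
    df = pd.DataFrame({'timestamp': [p[0] for p in gts['v']],
                       'value': [p[-1] for p in gts['v']]})
    st.dataframe(df)                          # raw table
    st.line_chart(df.set_index('timestamp'))  # quick line chart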

Uploading a file to Figshare using rapiclient swagger api

I am trying to programmatically update my Figshare repository using rapiclient.
Following the answer to this question, I managed to authenticate and see my repository by:
library(rapiclient)
library(httr)
# figshare repo id
id = 3761562
fs_api <- get_api("https://docs.figshare.com/swagger.json")
header <- c(Authorization = sprintf("token %s", Sys.getenv("RFIGSHARE_PAT")))
fs_api <- list(operations = get_operations(fs_api, header),
               schemas = get_schemas(fs_api))
reply <- fs_api$operations$article_files(id)
I also managed to delete a file using:
fs_api$operations$private_article_file_delete(article_id = id, file_id = F)
Now, I would like to upload a new file to the repository. There seem to be two methods I need:
fs_api$operations$private_article_upload_initiate
fs_api$operations$private_article_upload_complete
But I do not understand the documentation. According to fs_api$operations$private_article_upload_initiate help:
> fs_api$operations$private_article_upload_initiate
private_article_upload_initiate
Initiate Upload
Description:
Initiate new file upload within the article. Either use link to
provide only an existing file that will not be uploaded on figshare
or use the other 3 parameters(md5, name, size)
Parameters:
link (string)
Url for an existing file that will not be uploaded on figshare
md5 (string)
MD5 sum pre computed on the client side
name (string)
File name including the extension; can be omitted only for linked
files.
size (integer)
File size in bytes; can be omitted only for linked files.
What does "file that will not be uploaded on Figshare" mean? How would I use the API to upload a local file ~/foo.txt?
fs_api$operations$private_article_upload_initiate(link='~/foo.txt')
returns HTTP 400.
I feel like I sent you down a bad path with my previous answer because I am not sure how to edit some of the api endpoints when using rapiclient. For example, the corresponding endpoint for fs_api$operations$private_article_upload_initiate() will be https://api.figshare.com/v2/account/articles/{article_id}/files, and I am not sure how to substitute for {article_id} prior to sending the request.
You may have to define your own client for operations you cannot get working any other way.
Here is an example of uploading a file to an existing private article as per the goal of your question.
library(httr)
# id of previously created figshare article
my_article_id <- 99999999
# make example file to upload
my_file <- tempfile("my_file", fileext = ".txt")
writeLines("Hello World!", my_file)
# Step 1 initiate upload
# https://docs.figshare.com/#private_article_upload_initiate
r <- POST(
  url = sprintf("https://api.figshare.com/v2/account/articles/%s/files", my_article_id),
  add_headers(c(Authorization = sprintf("token %s", Sys.getenv("RFIGSHARE_PAT")))),
  body = list(
    md5 = tools::md5sum(my_file)[[1]],
    name = basename(my_file),
    size = file.size(my_file)
  ),
  encode = "json"
)
initiate_upload_response <- content(r)

# Step 2 single file info (get upload url)
# https://docs.figshare.com/#private_article_file
r <- GET(url = initiate_upload_response$location,
         add_headers(c(Authorization = sprintf("token %s", Sys.getenv("RFIGSHARE_PAT"))))
)
single_file_response <- content(r)

# Step 3 uploader service (get number of upload parts required)
# https://docs.figshare.com/#endpoints
r <- GET(url = single_file_response$upload_url,
         add_headers(c(Authorization = sprintf("token %s", Sys.getenv("RFIGSHARE_PAT"))))
)
upload_service_response <- content(r)

# Step 4 upload parts (this example only has one part)
# https://docs.figshare.com/#endpoints_1
r <- PUT(url = single_file_response$upload_url, path = 1,
         add_headers(c(Authorization = sprintf("token %s", Sys.getenv("RFIGSHARE_PAT")))),
         body = upload_file(my_file)
)
upload_parts_response <- content(r)

# Step 5 complete upload (after all part uploads are successful)
# https://docs.figshare.com/#private_article_upload_complete
r <- POST(
  url = initiate_upload_response$location,
  add_headers(c(Authorization = sprintf("token %s", Sys.getenv("RFIGSHARE_PAT"))))
)
complete_upload_response <- content(r)
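For comparison, the same five-step flow can also be scripted in Python with requests. The sketch below simply mirrors the httr calls above (it assumes the same RFIGSHARE_PAT token in the environment, an existing private article id, and a file small enough for a single upload part; it is not an official Figshare client):

import hashlib
import os

import requests

# Sketch only: mirrors the five httr steps above with plain requests.
BASE = "https://api.figshare.com/v2"
headers = {"Authorization": "token " + os.environ["RFIGSHARE_PAT"]}
my_article_id = 99999999
my_file = "my_file.txt"

with open(my_file, "rb") as fh:
    content = fh.read()

# Step 1: initiate the upload (md5, name, size)
r = requests.post(
    f"{BASE}/account/articles/{my_article_id}/files",
    headers=headers,
    json={"md5": hashlib.md5(content).hexdigest(),
          "name": os.path.basename(my_file),
          "size": len(content)},
)
location = r.json()["location"]

# Step 2: single file info (gives the upload_url)
upload_url = requests.get(location, headers=headers).json()["upload_url"]

# Step 3: uploader service (how many parts are required)
upload_info = requests.get(upload_url, headers=headers).json()

# Step 4: upload the single part, as in the httr example (PUT .../1)
requests.put(f"{upload_url}/1", headers=headers, data=content)

# Step 5: complete the upload
requests.post(location, headers=headers)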

Wrong page parsed BeautifulSoup?

I want to enter two values on the website https://hausratversicherung.friday.de/ and retrieve the resulting value after submitting them. I wrote the following code:
import requests, re
from robobrowser import RoboBrowser
br = RoboBrowser(parser='html.parser')
br.open("https://hausratversicherung.friday.de/")
form = br.get_form()
form['area'] = 100
form['postalCode'] = 44326
br.submit_form(form)
src = str(br.parsed())
start = '<div class="Typography-sc-3c3fuf-0 jEIicc" data-testid="totalPrice">'
end = ' €</div>'
result = re.search('%s(.*)%s' % (start, end), src).group(1)
print(result)
However, the browser br does not open the mentioned page or fill in these values.
The postal code 44326 isn't accepted by the server. For other postal codes you can query their API directly:
import json
import requests
area = 100
postalcode = 44309
url = 'https://fdy2-policycenter-production.k8s.blue.friday-prod.de/rest/friday/hc/price?area={area}&postalCode={postalcode}'
data = requests.get(url.format(area=area, postalcode=postalcode)).json()
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
# print some info to screen:
print(data['basicCoverages']['coverages'][0]['insuredSum']['amount'])
print(data['basicCoverages']['coverages'][0]['price']['amount'])
Prints:
65000.0
7.81
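Because some postal codes are rejected, a slightly more defensive variant (a sketch against the same endpoint and url template as above) checks the HTTP status and the payload before indexing into it:

import requests

# Sketch: reuse the url template from above, but verify the response
# before assuming the expected JSON structure is present.
resp = requests.get(url.format(area=100, postalcode=44326))
if resp.ok:
    data = resp.json()
    coverages = data.get('basicCoverages', {}).get('coverages', [])
    if coverages:
        print(coverages[0]['price']['amount'])
    else:
        print('No coverage data returned:', data)
else:
    print('Request failed:', resp.status_code, resp.text[:200])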

I'm Getting error json.decoder.JSONDecodeError: while running a python code

I got this code from the internet for extracting data from the Justdial website.
While running it I get the following error:
ERROR: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Please help me run this code, as I'm not familiar with Python. What changes do I need to make to get it running?
Thank you in advance.
Here is my code:
import csv
import json
import requests
from bs4 import BeautifulSoup

print(25 * "=")
print("Just Dial Scraper")
print(25 * "=")

url = 'http://www.justdial.com/functions/ajxsearch.php?national_search=0&act'\
      '=pagination&city={0}&search={1}&page={2}'

what = input("Enter your Query: ")
what = what.replace(' ', '+')
where = input("Enter the Location: ")

with open(what + "_" + where + '.csv', 'w') as f:
    f.write('company, address, phone\n')
    page = 1
    while True:
        print('Scraping Page', page)
        resp = requests.get(url.format(where, what, page))
        if not resp.json()['paidDocIds']:
            print(25 * "-")
            print('Scraping Finished')
            print(25 * "-")
            break
        markup = resp.json()['markup'].replace('\/', '/')
        soup = BeautifulSoup(markup, 'html.parser')
        for thing in soup.find_all('section'):
            csv_list = []
            if thing.get('class') == ['jcar']:
                # Company name
                for a_tag in thing.find_all('a'):
                    if a_tag.get('onclick') == "_ct('clntnm', 'lspg');":
                        csv_list.append(a_tag.get('title'))
                # Address
                for span_tag in thing.find_all('span'):
                    if span_tag.get('class') == ['mrehover', 'dn']:
                        csv_list.append(span_tag.get_text().strip())
                # Phone number
                for a_tag in thing.find_all('a'):
                    if a_tag.get('href').startswith('tel:'):
                        csv_list.append(a_tag.get('href').split(':')[-1])
                csv_list = ['"' + item + '"' for item in csv_list]
                writeline = ','.join(csv_list) + '\n'
                f.write(writeline)
        page += 1
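The JSONDecodeError at line 1 column 1 means the response body is not JSON at all (for example an error or block page), so it is worth inspecting the raw response before calling .json(). A small debugging sketch, reusing url, where, what and page from the script above and adding a browser-like User-Agent header as a common (but not guaranteed) mitigation:

import requests

# Debugging sketch: look at what the server actually returns before parsing.
resp = requests.get(url.format(where, what, page),
                    headers={'User-Agent': 'Mozilla/5.0'})
print('Status code :', resp.status_code)
print('Content-Type:', resp.headers.get('Content-Type'))
print(resp.text[:500])  # first 500 characters of the raw body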
