How can I model a scalable set of definition/term pairs?

How can I model a scalable set of definition/term pairs? - dictionary

Right now my flashcard game is using a prepvocab() method where I
define the terms and translations for a week's worth of terms as a dictionary
add a description of that week's terms
lump them into a list of dictionaries, where a user selects their "weeks" to study
Every time I add a new week's worth of terms and translations, I'm stuck adding another element to the list of available dictionaries. I can definitely see this as not being a Good Thing.
class Vocab(object):
def __init__(self):
vocab = {}
self.new_vocab = vocab
self.prepvocab()
def prepvocab(self):
week01 = {"term":"translation"} #and many more...
week01d = "Simple Latvian words"
week02 = {"term":"translation"}
week02d = "Simple Latvian colors"
week03 = {"I need to add this":"to self.selvocab below"}
week03d = "Body parts"
self.selvocab = [week01, week02] #, week03, weekn]
self.descs = [week01d, week02d] #, week03, weekn]
Vocab.selvocab(self)
def selvocab(self):
"""I like this because as long as I maintain self.selvocab,
the for loop cycles through the options just fine"""
for x in range(self.selvocab):
YN = input("Would you like to add week " \
+ repr(x + 1) + " vocab? (y or n) \n" \
"Description: " + self.descs[x] + " ").lower()
if YN in "yes":
self.new_vocab.update(self.selvocab[x])
self.makevocab()
I can definitely see that this is going to be a pain with 20+ yes no questions. I'm reading up on curses at the moment, and was thinking of printing all the descriptions at once, and letting the user pick all that they'd like to study for the round.
How do I keep this part of my code better maintained? Anybody got a radical overhaul that isn't so....procedural?

You should store your term:translation pairs and descriptions in a text file in some manner. Your program should then parse the text file and discover all available lessons. This will allow you to extend the set of lessons available without having to edit any code.
As for your selection of lessons, write a print_lesson_choices function that displays the available lessons and descriptions to the user, and then ask for their input in selecting them. Instead of asking a question of them for every lesson, why not make your prompt something like:
self.selected_weeks = []
def selvocab(self):
self.print_lesson_choices()
selection = input("Select a lesson number or leave blank if done selecting: ")
if selection == "": #Done selecting
self.makevocab()
elif selection in self.available_lessons:
if selection not in self.selected_weeks:
self.selected_weeks.append(selection)
print "Added lesson %s"%selection
self.selvocab() #Display the list of options so the user can select again
else:
print "Bad selection, try again."
self.selvocab()

Pickling objects into a database means it'll take some effort to create an interface to modify the weekly lessons from the front end, but is well worth the time.

Related

What should I learn to code a bot in Telegram?

I want to code and creat a bot for telegram that does these things:
1 - shows a massage to the person that hit start button
2 - then it gets a name as an input
3 - then again shows a massage
4 - getting an input
5 - at the end add the inputs to a defualt text and showing it;
for exampele:
-start
+Hi What is your name?
-X
+How old are you?
-Y
+Your name is X and you are Y years old.
My second Question is that how can I Connect to bots together, for example imagine I want to pass some input from this bot to make a poll(voting massage), in order to do that I should send the name to let's say #vote, how is that possible and what should I learn to do such things with my bot?

First you're gonna have to explore telegram bot API documentation here.
Then you should choose your programming language and the library you want to use.
There are different libraries for each language, I'm gonna name a few:
Go: https://github.com/aliforever/go-telegram-bot-api (DISCLAIMER: I wrote and maintain it)
Python: https://github.com/eternnoir/pyTelegramBotAPI
NodeJS: https://github.com/telegraf/telegraf
I'm gonna give you an example for what you want in python using pyTelegramBotAPI:
First install the library using pip:
pip install git+https://github.com/eternnoir/pyTelegramBotAPI.git
Then run this script:
import telebot
API_TOKEN = 'PLACE_BOT_TOKEN_HERE'
bot = telebot.TeleBot(API_TOKEN)
user_info = {}
def set_user_state(user_id, state):
if user_id not in user_info:
user_info[user_id] = {}
user_info[user_id]["state"] = state
def get_user_state(user_id):
if user_id in user_info:
if "state" in user_info[user_id]:
return user_info[user_id]["state"]
return "Welcome"
def set_user_info(user_id, name=None, age=None):
if name is None and age is None:
return
if name is not None:
user_info[user_id]["name"] = name
if age is not None:
user_info[user_id]["age"] = age
def get_user_info(user_id):
return user_info[user_id]
#bot.message_handler()
def echo_all(message):
user_id = message.from_user.id
if message.text == "/start":
bot.reply_to(message, "Hi What is your name?")
set_user_state(user_id, "EnterName")
return
user_state = get_user_state(user_id)
if user_state == "EnterName":
set_user_info(user_id, name=message.text)
bot.reply_to(message, "How old are you?")
set_user_state(user_id, "EnterAge")
return
if user_state == "EnterAge":
set_user_info(user_id, age=message.text)
info = get_user_info(user_id)
bot.reply_to(message, "Your name is %s and you are %s years old." %(info["name"], info["age"]))
set_user_state(user_id, "Welcome")
return
bot.reply_to(message, "To restart please send /start")
bot.infinity_polling()
Here we use a dictionary to place user state and info, you can place them anywhere like databases or json files.
Then we update a user's state based on their interactions with the bot.
For your second question, bots cannot communicate with each other so you should look for other solutions. In the case of your question where you want to create a poll, you should check sendPoll method as well as PollAnswer object which you receive when a user votes in a poll.

In R: Search all emails by subject line, pull comma-separate values from body, then save values in a dataframe

Each day, I get an email with the quantities of fruit sold on a particular day. The structure of the email is as below:
Date of report:,04-JAN-2022
Time report produced:,5-JAN-2022 02:04
Apples,6
Pears,1
Lemons,4
Oranges,2
Grapes,7
Grapefruit,2
I'm trying to build some code in R that will search through my emails, find all emails with a particular subject, iterate through each email to find the variables I'm looking for, take the values and place them in a dataframe with the "Date of report" put in a date column.
With the assistance of people in the community, I was able to achieve the desired result in Python. However as my project has developed, I need to now achieve the same result in R if at all possible.
Unfortunately, I'm quite new to R and therefore if anyone has any advice on how to take this forward I would greatly appreciate it.
For those interested, my Python code is below:
#PREP THE STUFF
Fruit_1 = "Apples"
Fruit_2 = "Pears"
searchf = [
Fruit_1,
Fruit_2
]
#DEF THE STUFF
def get_report_vals(report, searches):
dct = {}
for line in report:
term, *value = line
if term.casefold().startswith('date'):
dct['date'] = pd.to_datetime(value[0])
elif term in searches:
dct[term] = float(value[0])
if len(dct.keys()) != len(searches):
dct.update({x: None for x in searches if x not in dct})
return dct
#DO THE STUFF
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)
messages = inbox.Items
messages.Sort("[ReceivedTime]", True)
results = []
for message in messages:
if message.subject == 'FRUIT QUANTITIES':
if Fruit_1 in message.body and Fruit_2 in message.body:
data = [line.strip().split(",") for line in message.body.split('\n')]
results.append(get_report_vals(data, searchf))
else:
pass
fruit_vals = pd.DataFrame(results)
fruit_vals.columns = map(str.upper, fruit_vals.columns)
I'm probably going about this the wrong way, but I'm trying to use the steps I took in Python to achieve the same result in R. So for example I create some variables to hold the fruit sales I'm searching for, then I create a vector to store the searchables, and then when I create an equivalent 'get_vals' function, I create an empty vector.
library(RDCOMClient)
Fruit_1 <- "Apples"
Fruit_2 <- "Pears"
##Create vector to store searchables
searchf <- c(Fruit_1, Fruit_2)
## create object for outlook
OutApp <- COMCreate("Outlook.Application")
outlookNameSpace = OutApp$GetNameSpace("MAPI")
search <- OutApp$AdvancedSearch("Inbox", "urn:schemas:httpmail:subject = 'FRUIT QUANTITIES'")
inbox <- outlookNameSpace$Folders(6)$Folders("Inbox")
vec <- c()
for (x in emails)
{
subject <- emails(i)$Subject(1)
if (grepl(search, subject)[1])
{
text <- emails(i)$Body()
print(text)
break
}
}

read.table could be a good start for get_report_vals.
Code below outputs result as a list, exception handling still needs to be implemented :
report <- "
Date of report:,04-JAN-2022
Apples,6
Pears,1
Lemons,4
Oranges,2
Grapes,7
Grapefruit,2
"
get_report_vals <- function(report,searches) {
data <- read.table(text=report,sep=",")
colnames(data) <- c('key','value')
# find date
date <- data[grepl("date",data$key,ignore.case=T),"value"]
# transform dataframe to list
lst <- split(data$value,data$key)
# output result as list
c(list(date=date),lst[searches])
}
get_report_vals(report,c('Lemons','Oranges'))
$date
[1] "04-JAN-2022"
$Lemons
[1] "4"
$Oranges
[1] "2"
The results of various reports can then be concatenated in a data.frame using rbind:
rbind(get_report_vals(report,c('Lemons','Oranges')),get_report_vals(report,c('Lemons','Oranges')))
date Lemons Oranges
[1,] "04-JAN-2022" "4" "2"
[2,] "04-JAN-2022" "4" "2"

The code now functions as intended. Function was written quite a bit differently from those recommended:
get_vals <- function(email) {
body <- email$body()
date <- str_extract(body, "\\d{2}-[:alpha:]{3}-\\d{4}") %>%
as.character()
data <- read.table(text = body, sep = ",", skip = 9, strip.white = T) %>%
row_to_names(1) %>%
mutate("Date" = date)
return(data)
}
In addition I've written this to bind the rows together:
info <- sapply(results, get_vals, simplify = F) %>%
bind_rows()

May this is not what you are expecting to get as an answer, but I must state that here to help other readers to avoid such mistakes in future.
Unfortunately your Python code is not well-written. For example, I've noticed the following code where you iterate over all items in a folder and check the Subject and message bodies for keywords:
for message in messages:
if message.subject == 'FRUIT QUANTITIES':
if Fruit_1 in message.body and Fruit_2 in message.body:
You need to use the Find/FindNext or Restrict methods of the Items class instead. So, you don't need to iterate over all items in a folder. Instead, you get only items that correspond to your conditions. Read more about these methods in the following articles:
How To: Use Find and FindNext methods to retrieve Outlook mail items from a folder (C#, VB.NET)
How To: Use Restrict method to retrieve Outlook mail items from a folder
You may combine all your search criteria into a single query. So, you just need to iterate over found items and extract the data.
Also you may find the AdvancedSearch method helpful. The key benefits of using the AdvancedSearch method in Outlook are:
The search is performed in another thread. You don’t need to run another thread manually since the AdvancedSearch method runs it automatically in the background.
Possibility to search for any item types: mail, appointment, calendar, notes etc. in any location, i.e. beyond the scope of a certain folder. The Restrict and Find/FindNext methods can be applied to a particular Items collection (see the Items property of the Folder class in Outlook).
Full support for DASL queries (custom properties can be used for searching too). You can read more about this in the Filtering article in MSDN. To improve the search performance, Instant Search keywords can be used if Instant Search is enabled for the store (see the IsInstantSearchEnabled property of the Store class).
You can stop the search process at any moment using the Stop method of the Search class.
See Advanced search in Outlook programmatically: C#, VB.NET for more information.

Using Multiple Variables to Reference a Sub-Sub-Sub Field in a Lua Dictionary

I'm new to Lua (like, yesterday new), so please bear with me...
I apologize for the convoluted nature of this question, but I had no better idea of how to demonstrate what I'm trying to do:
I have a Lua table being used as a dictionary. The tuples(?) are not numerically indexed, but use mostly string indices. Many of the indices actually relate to sub-tables that contain more detailed information, and some of the indices in those tables relate to still more tables - some of them three or four "levels" deep.
I need to make a function that can search for a specific item description from several "levels" into the dictionary's structure, without knowing ahead of time which keys/sub-keys/sub-sub-keys led me to it. I have tried to do this using variables and for loops, but have run into a problem where two keys in a row are being dynamically tested using these variables.
In the example below, I'm trying to get at the value:
myWarehouselist.Warehouse_North.departments.department_one["rjXO./SS"].item_description
But since I don't know ahead of time that I'm looking in "Warehouse_North", or in "department_one", I run through these alternatives using variables, searching for the specific Item ID "rjXO./SS", and so the reference to that value ends up looking like this:
myWarehouseList[warehouse_key].departments[department_key][myItemID]...?
Basically, the problem I'm having is when I need to put two variables back-to-back in the reference chain of a value being stored at level N of a dictionary. I can't seem to write it out as [x][y], or as [x[y]], or as [x.y] or as [x].[y]... I understand that in Lua, x.y is not the same as x[y] (the former directly references a key by string index "y", while the latter uses the value being stored in variable "y", which could be anything.)
I've tried many different ways and only gotten errors.
What's interesting is that if I use the exact same approach, but add an additional "level" to the dictionary with a constant value, such as ["items"] (under each specific department), it allows me to reference the value without issue, and my script runs fine...
myWarehouseList[warehouse_key].departments[department_key].items[item_key].item_description
Is this how Lua syntax is supposed to look? I've changed the table structure to include that extra layer of "items" under each department, but it seems redundant and unnecessary. Is there a syntactical change that I can make to allow me to use two variables back-to-back in a Lua table value reference chain?
Thanks in advance for any help!
myWarehouseList = {
["Warehouse_North"] = {
["description"] = "The northern warehouse"
,["departments"] = {
["department_one"] = {
["rjXO./SS"] = {
["item_description"] = "A description of item 'rjXO./SS'"
}
}
}
}
,["Warehouse_South"] = {
["description"] = "The southern warehouse"
,["departments"] = {
["department_one"] = {
["rjXO./SX"] = {
["item_description"] = "A description of item 'rjXO./SX'"
}
}
}
}
}
function get_item_description(item_id)
myItemID = item_id
for warehouse_key, warehouse_value in pairs(myWarehouseList) do
for department_key, department_value in pairs(myWarehouseList[warehouse_key].departments) do
for item_key, item_value in pairs(myWarehouseList[warehouse_key].departments[department_key]) do
if item_key == myItemID
then
print(myWarehouseList[warehouse_key].departments[department_key]...?)
-- [department_key[item_key]].item_description?
-- If I had another level above "department_X", with a constant key, I could do it like this:
-- print(
-- "\n\t" .. "Item ID " .. item_key .. " was found in warehouse '" .. warehouse_key .. "'" ..
-- "\n\t" .. "In the department: '" .. dapartment_key .. "'" ..
-- "\n\t" .. "With the description: '" .. myWarehouseList[warehouse_key].departments[department_key].items[item_key].item_description .. "'")
-- but without that extra, constant "level", I can't figure it out :)
else
end
end
end
end
end

If you make full use of your looping variables, you don't need those long index chains. You appear to be relying only on the key variables, but it's actually the value variables that have most of the information you need:
function get_item_description(item_id)
for warehouse_key, warehouse_value in pairs(myWarehouseList) do
for department_key, department_value in pairs(warehouse_value.departments) do
for item_key, item_value in pairs(department_value) do
if item_key == item_id then
print(warehouse_key, department_key, item_value.item_description)
end
end
end
end
end
get_item_description'rjXO./SS'
get_item_description'rjXO./SX'

Building a query in appmaker with "or"

So say I am using a form to build a query against my datasource (i've come so far in two weeks! I can do this!), how do I make it more complex?
What if I want books by austen that include the word "pride" AND books by gabaldon that contain the word "Snow"
the individual queries would be
widget.datasource.query.filters['author']._contains = "austen";
widget.datasource.query.filters['title']._contains = "pride";
and
widget.datasource.query.filters['author']._contains = "gabaldon";
widget.datasource.query.filters['title']._contains = "snow";
in pseudosql it would be
select * from table
where
((author like 'austen') and (title like 'snow'))
or
((author like 'gabaldon') and (title like 'pride'))
Is there a way to filter a data source on a complex query like this and cut out the whole widget.datasource aspect? I'd be fine with using a calculated table.
Edit: Ok i'm making some progress towards the kind of functionality I need, can anyone tell me why this works:
widget.datasource.query.filters.document_name._contains = 'x';
but this does not?
widget.datasource.query.parameters.v1 = "x";
widget.datasource.query.where = 'document_name contains :v1';
this also doesn't work:
widget.datasource.query.where = 'document_name contains "x"';

How to scrape options from dropdown list and store them in table?

I am trying to make an interactive dashboard with analysis, base on car side. I would like user to be able to pick car brand for example BMW, Audi etc. and base on this choise he will have only avaiablity to pick BMW/Audi etc. models. I have a problem after selecting each brand, I am not able to scrape the models that belongs to that brand. Page that I am scraping from:
main page --> https://www.otomoto.pl/osobowe/
sub car brand page example --> https://www.otomoto.pl/osobowe/audi/
I have tried to scrape every option, so later on I can maybe somehow clean the data to store only models
code:
otomoto_models - paste0("https://www.otomoto.pl/osobowe/"audi/")
models <- read_html(otomoto_models) %>%
html_nodes("option") %>%
html_text()
But it is just scraping the brands with other options avaiable on the page engine type etc. While after inspecting element I can clearly see models types.
otomoto <- "https://www.otomoto.pl/osobowe/"
brands <- read_html(otomoto) %>%
html_nodes("option") %>%
html_text()
brands <- data.frame(brands)
for (i in 1:nrow(brands)){
no_marka_pojazdu <- i
if(brands[i,1] == "Marka pojazdu"){
break
}
}
no_marka_pojazdu <- no_marka_pojazdu + 1
for (i in 1:nrow(brands)){
zuk <- i
if(substr(brands[i,1],1,3) == "Żuk"){
break
}
}
Modele_pojazdow <- as.character(brands[no_marka_pojazdu:zuk,1])
Modele_pojazdow <- removeNumbers(Modele_pojazdow)
Modele_pojazdow <- substr(Modele_pojazdow,1,nchar(Modele_pojazdow)-2)
Modele_pojazdow <- data.frame(Modele_pojazdow)
Above code is only to pick supported car brands on the webpage and store them in the data frame. With that I am able to create html link and direct everything to one selected brand.
I would like to have similar object to "Modele_pojazdow" but with models limited on previous selected car brand.
Dropdown list with models appears as white box with text "Model pojazdu" next to the "Audi" box on the right side.

Some may frown on the solution language being Python, but the aim of this is was to give some pointers (high level process). I haven't written R in a long time so Python was quicker.
EDIT: R script now added
General outline:
The first dropdown options can be grabbed from the value attribute of each node returned by using a css selector of #param571 option. This uses an id selector (#) to target the parent dropdown select element, and then option type selector in descendant combination, to specify the option tag elements within. The html to apply this selector combination to can be retrieved by an xhr request to the url you initially provided. You want a nodeList returned to iterate over; akin to applying selector with js document.querySelectorAll.
The page uses ajax POST requests to update the second dropdown based on your first dropdown choice. Your first dropdown choice determines the value of a parameter search[filter_enum_make], which is used in the POST request to the server. The subsequent response contains a list of the available options (it includes some case alternatives which can be trimmed out).
I captured the POST request by using fiddler. This showed me the request headers and params in the request body. Screenshot sample shown at end.
The simplest way to extract the options from the response text, IMO, is to regex the appropriate string out (I wouldn't normally recommend regex for working with html but in this case it serves us nicely). If you don't want to use regex, you can grab the relevant info from the data-facets attribute of the element with id body-container. For the non-regex version you need to handle unquoted nulls, and retrieve the inner dictionary whose key is filter_enum_model. I show a function re-write, at the end, to handle this.
The retrieved string is a string representation of a dictionary. This needs converting to an actual dictionary object which you can then extract the option values from. Edit: As R doesn't have a dictionary object a similar structure needs to be found. I will look at this when converting.
I create a user defined function, getOptions(), to return the options for each make. Each car make value comes from the list of possible items in the first dropdown. I loop those possible values, use the function to return a list of options for that make, and add those lists as values to a dictionary, results ,whose keys are the make of car. Again, for R an object with similar functionality to a python dictionary needs to be found.
That dictionary of lists needs converting to a dataframe which includes a transpose operation to make a tidy output of headers, which are the car makes, and columns underneath each header, which contain the associated models.
The whole thing can be written to csv at the end.
So, hopefully that gives you an idea of one way to achieve what you want. Perhaps someone else can use this to help write you a solution.
Python demonstration of this below:
import requests
from bs4 import BeautifulSoup as bs
import re
import ast
import pandas as pd
headers = {
'User-Agent' : 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36'
}
def getOptions(make): #function to return options based on make
data = {
'search[filter_enum_make]': make,
'search[dist]' : '5',
'search[category_id]' : '29'
}
r = requests.post('https://www.otomoto.pl/ajax/search/list/', data = data, headers = headers)
try:
# verify the regex here: https://regex101.com/r/emvqXs/1
data = re.search(r'"filter_enum_model":(.*),"new_used"', r.text ,flags=re.DOTALL).group(1) #regex to extract the string containing the models associated with the car make filter
aDict = ast.literal_eval(data) #convert string representation of dictionary to python dictionary
d = len({k.lower(): v for k, v in aDict.items()}.keys()) #find length of unique keys when accounting for case
dirtyList = list(aDict)[:d] #trim to unique values
cleanedList = [item for item in dirtyList if item != 'other' ] #remove 'other' as doesn't appear in dropdown
except:
cleanedList = [] # sometimes there are no associated values in 2nd dropdown
return cleanedList
r = requests.get('https://www.otomoto.pl/osobowe/')
soup = bs(r.content, 'lxml')
values = [item['value'] for item in soup.select('#param571 option') if item['value'] != '']
results = {}
# build a dictionary of lists to hold options for each make
for value in values:
results[value] = getOptions(value) #function call to return options based on make
# turn into a dataframe and transpose so each column header is the make and the options are listed below
df = pd.DataFrame.from_dict(results,orient='index').transpose()
#write to csv
df.to_csv(r'C:\Users\User\Desktop\Data.csv', sep=',', encoding='utf-8-sig',index = False )
Sample of csv output:
Example as sample json for alfa-romeo:
Example of regex match for alfa-romeo:
{"145":1,"146":1,"147":218,"155":1,"156":118,"159":559,"164":2,"166":39,"33":1,"Alfasud":2,"Brera":34,"Crosswagon":2,"GT":89,"GTV":7,"Giulia":251,"Giulietta":378,"Mito":224,"Spider":24,"Sportwagon":2,"Stelvio":242,"alfasud":2,"brera":34,"crosswagon":2,"giulia":251,"giulietta":378,"gt":89,"gtv":7,"mito":224,"spider":24,"sportwagon":2,"stelvio":242}
Example of the filter option list returned from function call with make parameter value alfa-romeo:
['145', '146', '147', '155', '156', '159', '164', '166', '33', 'Alfasud', 'Brera', 'Crosswagon', 'GT', 'GTV', 'Giulia', 'Giulietta', 'Mito', 'Spider', 'Sportwagon', 'Stelvio']
Sample of fiddler request:
Sample of ajax response html containing options:
<section id="body-container" class="om-offers-list"
data-facets='{"offer_seek":{"offer":2198},"private_business":{"business":1326,"private":872,"all":2198},"categories":{"29":2198,"161":953,"163":953},"categoriesParent":[],"filter_enum_model":{"145":1,"146":1,"147":219,"155":1,"156":116,"159":561,"164":2,"166":37,"33":1,"Alfasud":2,"Brera":34,"Crosswagon":2,"GT":88,"GTV":7,"Giulia":251,"Giulietta":380,"Mito":226,"Spider":25,"Sportwagon":2,"Stelvio":242,"alfasud":2,"brera":34,"crosswagon":2,"giulia":251,"giulietta":380,"gt":88,"gtv":7,"mito":226,"spider":25,"sportwagon":2,"stelvio":242},"new_used":{"new":371,"used":1827,"all":2198},"sellout":null}'
data-showfacets=""
data-pagetitle="Alfa Romeo samochody osobowe - otomoto.pl"
data-ajaxurl="https://www.otomoto.pl/osobowe/alfa-romeo/?search%5Bbrand_program_id%5D%5B0%5D=&search%5Bcountry%5D="
data-searchid=""
data-keys=''
data-vars=""
Alternative version of function without regex:
from bs4 import BeautifulSoup as bs
def getOptions(make): #function to return options based on make
data = {
'search[filter_enum_make]': make,
'search[dist]' : '5',
'search[category_id]' : '29'
}
r = requests.post('https://www.otomoto.pl/ajax/search/list/', data = data, headers = headers)
soup = bs(r.content, 'lxml')
data = soup.select_one('#body-container')['data-facets'].replace('null','"null"')
aDict = ast.literal_eval(data)['filter_enum_model'] #convert string representation of dictionary to python dictionary
d = len({k.lower(): v for k, v in aDict.items()}.keys()) #find length of unique keys when accounting for case
dirtyList = list(aDict)[:d] #trim to unique values
cleanedList = [item for item in dirtyList if item != 'other' ] #remove 'other' as doesn't appear in dropdown
return cleanedList
print(getOptions('alfa-romeo'))
R conversion and improved python:
Whilst converting to R I found a better way of extracting the parameters from a js file on the server. If you open dev tools you can see the file listed in the sources tab.
R (To be improved):
library(httr)
library(jsonlite)
url <- 'https://www.otomoto.pl/ajax/jsdata/params/'
r <- GET(url)
contents <- content(r, "text")
data <- strsplit(contents, "var searchConditions = ")[[1]][2]
data <- strsplit(as.character(data), ";var searchCondition")[[1]][1]
source <- fromJSON(data)$values$'573'$'571'
makes <- names(source)
for(make in makes){
print(make)
print(source[make][[1]]$value)
#break
}
Python:
import requests
import json
import pandas as pd
r = requests.get('https://www.otomoto.pl/ajax/jsdata/params/')
data = r.text.split('var searchConditions = ')[1]
data = data.split(';var searchCondition')[0]
items = json.loads(data)
source = items['values']['573']['571']
makes = [item for item in source]
results = {}
for make in makes:
df = pd.DataFrame(source[make]) ## build a dictionary of lists to hold options for each make
results[make] = list(df['value'])
dfFinal = pd.DataFrame.from_dict(results,orient='index').transpose() # turn into a dataframe and transpose so each column header is the make and the options are listed below
mask = dfFinal.applymap(lambda x: x is None) #tidy up None values to empty strings https://stackoverflow.com/a/31295814/6241235
cols = dfFinal.columns[(mask).any()]
for col in dfFinal[cols]:
dfFinal.loc[mask[col], col] = ''
print(dfFinal)

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex