Data scraping with BeautifulSoup - web-scraping

I am trying to collect the SMILES codes of molecules from the www.chemspider.com website.
For instance, I would like to search the ChemSpider database with the keyword "peroxide",
and it returns a table of results split across subpages.
As you can see, each subpage linked above contains data on molecules.
I started with the BeautifulSoup module, but I have no idea how to proceed from there. For example, how can I access page "2" of the table?
Also, sometimes the results come folded and show only the header of each row in the table.
Thanks in advance.
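A minimal sketch of the usual approach, assuming each results page has its own URL that differs only in a page-number query parameter. The path Search.aspx and the parameter names q and page below are hypothetical: click page "2" in your browser and copy the real ones from the address bar. To stay runnable without extra installs it parses with the standard library's html.parser; with BeautifulSoup the same step is soup.find_all("tr") followed by row.find_all("td").

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical search URL and parameter names: copy the real ones from the
# address bar after clicking page "2" of the results in your browser.
BASE_URL = "https://www.chemspider.com/Search.aspx"


def page_url(query, page):
    """Build the URL of one results page (parameter names are assumptions)."""
    return BASE_URL + "?" + urlencode({"q": query, "page": page})


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
            self._in_cell = False  # a new row closes any unclosed cell
        elif tag == "td" and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_cell and text:
            cell = self.rows[-1][-1]
            # Join text split by inline tags such as <b> with a single space.
            self.rows[-1][-1] = f"{cell} {text}" if cell else text


def extract_rows(html):
    """Return the data rows of every table; header rows (only <th>) are dropped."""
    parser = TableRows()
    parser.feed(html)
    return [row for row in parser.rows if row]


def scrape(query, pages):
    """Fetch results pages 1..pages and yield their rows (needs network access)."""
    for page in range(1, pages + 1):
        with urlopen(page_url(query, page)) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        yield from extract_rows(html)
```

If the folded rows are filled in by JavaScript after the page loads, they will not be in the downloaded HTML at all, and the parser above will not see them; in that case you would need to find the request the page itself makes for that data.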

Related

WordPress search in custom table data

I am using WordPress. Is it possible to search the data in a custom table on the front end?
Suppose I have an Arabic list table with 7,000 rows. Is there any plugin that can find a row in this table and show that row's data as the result? Something like this plugin:
https://wordpress.org/plugins/custom-searchable-data-entry-system/
This plugin has only one problem. Suppose I save this data in a row:
اور ان عورتوں کے شر سے جو گرہوں میں پھونکتی ہیں
(English: "And from the evil of the women who blow on knots")
When a user searches for the word عورتوں ("women") on the front end, it shows "no result found"; however, if the user searches for اور ("and"), it shows the full line in the result. It seems that this plugin matches only the first word, whereas I want to show the full row data when any word from it is typed into the search bar.
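One likely cause, though I have not checked that plugin's source, is a prefix match such as LIKE 'term%', which only finds rows that start with the term; a contains match, LIKE '%term%', finds the term anywhere in the row. The sketch below shows the difference using Python's built-in sqlite3 with an in-memory table; the table name arabic_list and column line are hypothetical. In WordPress the equivalent query would go through $wpdb->prepare(), escaping the search term with $wpdb->esc_like() before wrapping it in %.

```python
import sqlite3


def like_escape(term):
    """Escape LIKE wildcards so the user's text is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def search(conn, term, anywhere=True):
    """Return matching rows; anywhere=False mimics a first-word (prefix) search."""
    escaped = like_escape(term)
    pattern = f"%{escaped}%" if anywhere else f"{escaped}%"
    cur = conn.execute(
        "SELECT line FROM arabic_list WHERE line LIKE ? ESCAPE '\\'", (pattern,)
    )
    return [row[0] for row in cur.fetchall()]


# Hypothetical table holding the example row from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arabic_list (line TEXT)")
conn.execute(
    "INSERT INTO arabic_list VALUES (?)",
    ("اور ان عورتوں کے شر سے جو گرہوں میں پھونکتی ہیں",),
)

print(search(conn, "عورتوں", anywhere=False))  # prefix match: []
print(search(conn, "عورتوں"))  # contains match: the full row
```

Note that a contains search cannot use an ordinary index, so on much larger tables a full-text index is the usual next step; at 7,000 rows a plain LIKE scan should be fine.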

Extracting table from a webpage in automation anywhere

Is there a way to extract a table from a web page in Automation Anywhere after performing certain steps with the web recorder? The table does not appear directly; it appears only after clicking a few controls once the URL has been launched.
The table I want to extract appears after logging in to the website and filtering with a search-criteria control.
I used the web recorder to log in and enter the desired search criteria in a text field, and now I want to extract the table. When I use the web recorder, it launches the URL again and takes me back to the login page, which I don't want. I want the bot to stay on the current page. Please help.
Also, what is the significance of the session name of an extracted table?
If you click Advanced View, Step 5 offers the option to run this command using an existing IE window. Try entering the URL of the page that contains the table rather than the URL of the login page.
The extracted table is then accessed through the variable $Table Column(Index)$, where Index is the column number or the column name.
You can also export directly using Object Cloning and, in the selection criteria, choosing export to a CSV file. Note that HTML InnerText must also be selected in the search criteria.
An old question, but in my experience the Extract Data/Table commands are rather poor. Not only do they work only in IE, but you also cannot call them as standalone commands; they have to be called via a web recording.
Instead, I've found it much more useful to object-clone the initial element, grab its DOMXPath, and turn that into a variable. Then put it inside a Loop While command and set the condition to finding at least one element (of the elements for the table you are trying to build). You can grab all sorts of useful info in the Object Clone command and then write that to a variable/table.
For example, two consecutive rows of a table have these DOMXPaths:
//div[@id='updatable-standings']/div[1]/div[1]/div[2]/div[1]/table[1]/tbody[1]/tr[3]/td[2]/div[1]/span[2]
//div[@id='updatable-standings']/div[1]/div[1]/div[2]/div[1]/table[1]/tbody[1]/tr[4]/td[2]/div[1]/span[2]
I can create an incremental variable for the row index in tr[3], call it $vTeamLoop$, and change my DOMXPath value in the Object Clone to:
//div[@id='updatable-standings']/div[1]/div[1]/div[2]/div[1]/table[1]/tbody[1]/tr[$vTeamLoop$]/td[2]/div[1]/span[2]
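The same loop logic, sketched in Python rather than in Automation Anywhere, shows how the incrementing row index drives the XPath and when the Loop While condition stops. This is an illustration only: the sample page, its shortened path, and the function name collect_column are hypothetical, and because xml.etree.ElementTree supports only a subset of XPath, the leading // becomes .// here.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the answer's DOMXPath; {row} plays the
# role of $vTeamLoop$. ElementTree needs ".//" instead of a leading "//".
XPATH_TEMPLATE = ".//div[@id='updatable-standings']/table/tbody/tr[{row}]/td[2]/span"


def collect_column(root, start_row=1):
    """Read td[2]/span from successive rows until no element matches."""
    values = []
    row = start_row
    while True:
        cell = root.find(XPATH_TEMPLATE.format(row=row))
        if cell is None:  # loop condition fails: no element found for this row
            break
        values.append(cell.text)
        row += 1  # same as incrementing $vTeamLoop$
    return values


# Hypothetical, well-formed sample page (ElementTree needs valid XML).
page = ET.fromstring(
    "<html><body><div id='updatable-standings'><table><tbody>"
    "<tr><td>1</td><td><span>Team A</span></td></tr>"
    "<tr><td>2</td><td><span>Team B</span></td></tr>"
    "</tbody></table></div></body></html>"
)
print(collect_column(page))  # ['Team A', 'Team B']
```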
Ultimately, it is more steps than the Data/Table Extract command, but it is far less limited in scope.
Hope that helps.

Accessing imported data in google appmaker

I have managed to create a data model and, in doing so, import data variables and values from my Google spreadsheets (by clicking 'import data from sheet'). My table has the form:
table = {"age": {15,22}, "name": {"ted", "sally"}, "surname":{"anderson","medina"}}
I would like a table that displays this data on a page, where I can click each row to open another page and edit its contents.
I opened a new page, dragged a table onto it, and linked it to my data model. However, it only shows the variable titles and not the actual data. How can I get the data to load into the table automatically?
I read a lot of the tutorials, but they all assume I want to start with only headers and then enter the data manually to create a data-entry table. However, my table updates automatically in Google Sheets, so I would like to import it into App Maker and then be able to click each row and add notes/edits. Any help would be greatly appreciated! Thanks
At this time App Maker doesn't support spreadsheets as a data backend. You'll first need to import all the data into App Maker's models and then work with it in the deployed application. You can find all the pieces of the puzzle in the Vendor Ratings template:
https://developers.google.com/appmaker/templates/vendor-rating/
Your actual data won't come up in your editor view, only when you preview it.
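A model, and a table widget bound to it, works with one record per row, whereas the table in the question is written column by column. To see how the question's data maps onto records, here is a small Python sketch; it uses lists instead of the question's {...} braces, because in Python those would be sets, which are unordered and could scramble which age belongs to which name. The function name columns_to_records is hypothetical.

```python
def columns_to_records(table):
    """Turn {"field": [v1, v2, ...]} into [{"field": v1, ...}, {"field": v2, ...}].

    zip() stops at the shortest column, so columns should have equal length.
    """
    fields = list(table)
    return [dict(zip(fields, values)) for values in zip(*table.values())]


table = {"age": [15, 22], "name": ["ted", "sally"], "surname": ["anderson", "medina"]}
for record in columns_to_records(table):
    print(record)
# {'age': 15, 'name': 'ted', 'surname': 'anderson'}
# {'age': 22, 'name': 'sally', 'surname': 'medina'}
```

Each printed dict corresponds to one row of the sheet and one record in the model.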
It seems to me that you could do this using an embedded Google Sheet on your page.
For example:
-drag an HTML block onto the page
-set its content to something like:
<iframe src="[link to your google sheet]" height="1380" width="1100"></iframe>
*Get the embed link from 'Publish to the Web'.
Hope that helps; it might not be what you're looking for.

Scrapy returning numbers and letters instead of "?" for href value

I am trying to scrape the href links from a web forum using Scrapy, and when I do so, the href comes back with a long string of letters and numbers right after the question mark.
This is a sample of the html document that I am scraping:
I am scraping the html data for the href link using the following code:
response.xpath('.//*[contains(@id, "thread_title")]/@href').extract()
When I run this, I get the following results:
[u'showthread.php?s=f969fe6ed424b22d8fddf605a9effe90&t=2676278']
What should be returned is:
[u'showthread.php?t=2676278']
I have run other tests scraping href data containing question marks elsewhere in the document, and I also get "s=f969fe6ed424b22d8fddf605a9effe90&" returned.
Why am I getting this data returned with the "s=f969fe6ed424b22d8fddf605a9effe90&" instead of just the question mark?
Thanks!
It seems that the site I am scraping uses a unique identifier to track the number of views per thread more accurately. I was not able to get the scraped links without the unique ID, and the ID changed over time, so I scraped a different HTML tag for the thread ID and joined it to the web address (showthread.php?t=) to create the link I was looking for.
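An alternative to rebuilding the link from the thread ID is to drop the s parameter from each scraped href. A minimal sketch using the standard library's urllib.parse; the helper name drop_query_param is hypothetical, and it assumes the parameter is always named s, as in the question's output.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_query_param(href, name="s"):
    """Return href with every query parameter called `name` removed."""
    parts = urlsplit(href)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != name]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(drop_query_param("showthread.php?s=f969fe6ed424b22d8fddf605a9effe90&t=2676278"))
# showthread.php?t=2676278
```

In the spider this would be applied to each value returned by the response.xpath(...).extract() call from the question.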

How to use Facebook Graph api

I am using the Facebook Graph API for my application, where I want the data from this page:
http://www.facebook.com/pages/
On this page there is a TV SHOW option, and I want to collect all the information on that page.
But I couldn't find any Graph API method for this page.
Please help me with this problem.
As far as I can tell, this isn't something you can do. It has to do with which columns Facebook sets as "indexable" in its tables. For example, the page table has a type ID, and the TV SHOWS category ID appears to be 89, so you'd think you could run an FQL query such as:
https://api.facebook.com/method/fql.query?query=SELECT page_id, name FROM page WHERE type=89
or perhaps:
https://api.facebook.com/method/fql.query?query=SELECT page_id, name FROM page WHERE type='TV SHOW'
But you'll get this notice:
Your statement is not indexable. The WHERE clause must contain an indexable column.
Unfortunately, the only two indexable columns for WHERE statements using the page table are page_id and name, so you can only query this table for one page at a time, which makes sense when you consider how expensive such a search would be on billions of records.
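For reference, a query that satisfies the indexable-column rule looks up a single page by page_id. The sketch below only builds the URL-encoded request string in the fql.query format used above; the page ID is a placeholder and the function name is hypothetical. Facebook has since retired FQL and the api.facebook.com REST endpoint, so treat this as historical.

```python
from urllib.parse import quote, urlencode

FQL_ENDPOINT = "https://api.facebook.com/method/fql.query"


def fql_page_lookup_url(page_id):
    """Build an FQL URL whose WHERE clause uses the indexable page_id column."""
    query = f"SELECT page_id, name FROM page WHERE page_id={int(page_id)}"
    # quote_via=quote encodes spaces as %20 (the default, quote_plus, uses "+").
    return FQL_ENDPOINT + "?" + urlencode({"query": query}, quote_via=quote)


print(fql_page_lookup_url(123456789))  # placeholder page ID
```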
You can do a generic Graph search for "TV SHOWS", but it won't give you the results from the page index as you requested:
https://graph.facebook.com/search?q=TV%20SHOWS&type=page
The Graph API is more about introspection/connections between objects than an open search protocol.
