I'm trying to scrape the following site:
https://israeldrugs.health.gov.il/#!/byDrug
You need to enter a search term in the form and press the blue button on the left.
However, my attempt with bs4 failed because it cannot find the form element.
Thanks for your help.
The data on this site is loaded dynamically, using JavaScript. If you dig into the XHRs (using the Network tab of your browser's developer tools), you'll see how this information is loaded into the page. BTW, the following assumes you're using Python; if not, you'll have to find an equivalent in another language.
import requests
import json
target = 'ATORVASTATIN AS CALCIUM' #this is just a random drug from their list
#serialize with json.dumps so quotes in the search term cannot break the payload
data = json.dumps({"val": target, "prescription": False, "healthServices": False, "pageIndex": 1, "orderBy": 0})
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0',
'Accept': 'application/json, text/plain, */*',
'Accept-Language': 'en-US,en;q=0.5',
'Content-Type': 'application/json',
'Origin': 'https://israeldrugs.health.gov.il',
'Connection': 'keep-alive',
'Referer': 'https://israeldrugs.health.gov.il/',
}
response = requests.post('https://israeldrugs.health.gov.il/GovServiceList/IDRServer/SearchByName', headers=headers, data=data)
#load the json response
meds = json.loads(response.text)
#a random item from the 8th (random, again) drug in the response
meds['results'][7]['dragHebName']
output:
'טורבה 10'
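For anyone following along in JavaScript rather than Python, here is a minimal sketch of how the same request body could be built. The field names are copied from the Python example above; the helper name `buildSearchPayload` is made up for illustration, and the idea that `pageIndex` selects the results page is an assumption that has not been checked against the server.

```javascript
// Hypothetical helper: builds the JSON body used by the search request.
// Field names come from the Python example; pageIndex is assumed (not
// verified) to select which page of results is returned.
function buildSearchPayload(term, pageIndex = 1) {
  return JSON.stringify({
    val: term,
    prescription: false,
    healthServices: false,
    pageIndex,
    orderBy: 0,
  });
}

const body = buildSearchPayload('ATORVASTATIN AS CALCIUM');
console.log(body);
```

Serializing an object instead of concatenating strings keeps the body valid JSON even when the search term contains quotes.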
I don't know if this is still relevant, but I was able to scrape the whole database (or at least most of it) with Node.js and Puppeteer (I needed it for a personal project).
// after installing puppeteer with `npm i puppeteer`
const puppeteer = require("puppeteer");
// using the file system module to save scraped data to a JSON file
const fs = require('fs');
const scrape = async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
// go to the url and wait until it loads
await page.goto(`https://israeldrugs.health.gov.il/#!/byDrug`, {
waitUntil: 'networkidle2',
timeout: 0
});
const searchInputSelector = '#homeCtrl > div.display-wrapper > ui-view > div > div > div.search-textbox-container > idr-search-textbox > div:nth-child(1) > div > form > input'
await page.waitForSelector(searchInputSelector)
// grab the search input field and type 'a' into it. This fetches most of the database, because the search matches the character anywhere in the commercial and generic names
await page.type(searchInputSelector, 'a')
const searchButtonSelector = '#homeCtrl > div.display-wrapper > ui-view > div > div > div.search-textbox-container > idr-search-textbox > div:nth-child(1) > div > div'
// grabbing search button & clicking it
await page.click(searchButtonSelector)
// starting to create database array
fs.writeFileSync('israMeds.json', '[')
const pagesSelector = '#homeCtrl > div.display-wrapper > ui-view > div > search-list > div > div > div.compareAndSortBarWrap > div > div.checkbox.selectAll > span'
await page.waitForSelector(pagesSelector)
// grabbing number of results. divided by 10 it will give number of pages to scrape
const pagesNum = await page.evaluate(() => {
const pagesSelector = '#homeCtrl > div.display-wrapper > ui-view > div > search-list > div > div > div.compareAndSortBarWrap > div > div.checkbox.selectAll > span'
return +document.querySelector(pagesSelector).textContent.trim().split('').filter(x => !isNaN(x)).join('').trim() / 10 + 1
})
const roundPages = Math.floor(pagesNum)
for (let i = 0; i < roundPages; i++){
if (i !== 0 ) fs.appendFileSync('israMeds.json',',')
await page.waitForNetworkIdle({ idleTime: 500, timeout: 0 })
const elements = await page.evaluate(() => {
const resultSelector = '#homeCtrl > div.display-wrapper > ui-view > div > search-list > div > div > div.search_wrap.ng-scope > div'
// grab the result elements and loop over them. Here you can choose your desired data by using the field's HTML selector.
// The expression has to start on the same line as `return`; otherwise automatic semicolon insertion makes the function return undefined.
return [...document.querySelectorAll(resultSelector)]
.map(el => {
const hebTitleSelector = 'div.infoText > div > div > div.firstRowTitle.ng-binding'
const engTitleSelector = 'div.infoText > div > div > div > span'
const activeIngredientSelector = 'div.infoText > div > div > div.secondRowTitle.moreInfo.ng-binding.ng-scope'
// creating object out of the desired data
return JSON.stringify({
drugHebTitle : el.querySelector(hebTitleSelector).textContent,
drugEngTitle : el.querySelector(engTitleSelector).textContent,
activeIngredient : el.querySelector(activeIngredientSelector).textContent.trim().split(' ')[2]
})
})
})
// adding data to external json file
fs.appendFileSync('israMeds.json',elements.toString())
const nextPageSelector = '#homeCtrl > div.display-wrapper > ui-view > div > search-list > div > div > div.text-center > ul > li:nth-child(8) > a'
// moving to next result page
await page.click(nextPageSelector)
}
await browser.close();
fs.appendFileSync('israMeds.json', ']')
};
// run the scraper
scrape();
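A side note on the page count: the loop above uses floor(count / 10 + 1), which visits one page too many when the result count is an exact multiple of 10. Below is a small sketch of an alternative, assuming (as the script does) 10 results per page and a counter text that contains the result count as its only digits. The helper name `countPages` is made up for illustration.

```javascript
// Hypothetical helper: strips every non-digit character from the counter
// text and converts the remaining number into a page count.
function countPages(counterText, perPage = 10) {
  const digits = counterText.replace(/\D/g, '');
  if (digits === '') return 0; // no number found in the text
  return Math.ceil(Number(digits) / perPage);
}

console.log(countPages('100 results')); // 10
console.log(countPages('101 results')); // 11
```

Inside `page.evaluate` the same logic could be applied to the `textContent` of the counter element.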
I'm trying to get the image src values from WXML. All of the image src values should be obtained and replaced.
What should replace document.getElementsByTagName in a WeChat Mini Program?
js:
var that = this;
var query = wx.createSelectorQuery();
query.select('.classname').boundingClientRect(function (params) {
console.log(params)
}).exec() // the query only runs once exec() is called
As for your original question, perhaps you could use a "custom property" (data-xxx) to get your src
<image src="{{src}}" data-src="{{src}}" bindtap="ontap" />
js:
Page({
data: {
src:'1.jpg'
},
ontap:function(e) {
const {src} = e.currentTarget.dataset;
console.log(src);
}
})
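Since a Mini Program's logic layer has no DOM, another way to "get and replace" every src is to keep the sources in the page data and rewrite them there. The sketch below shows only the data transformation. The `images` array, the `replaceSources` helper and the CDN prefix are all hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: returns a copy of the image list with every src rewritten.
function replaceSources(images, rewrite) {
  return images.map(image => ({ ...image, src: rewrite(image.src) }));
}

// Example data shaped like a page's `data.images` array (an assumed name).
const images = [{ id: 1, src: '1.jpg' }, { id: 2, src: '2.jpg' }];
const updated = replaceSources(images, src => 'https://cdn.example.com/' + src);
console.log(updated.map(image => image.src));
// In a page, this would be followed by: this.setData({ images: updated })
```

The WXML would then render the list with wx:for and bind each image to {{item.src}}, so calling setData updates every image at once.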
So what should you do when the client has extra or different information than the server?
For example, reading something from localStorage and displaying it. Of course the content is different. Why does this hydration error occur?
Error: Text content does not match server-rendered HTML.
const getTempUserShortId = () => {
if (typeof window === 'undefined') {
return ''
} else {
let tempUserShortId = localStorage.getItem('tempUserShortId')
if (tempUserShortId === null) {
tempUserShortId = randomString(4)
localStorage.setItem('tempUserShortId', tempUserShortId)
}
return tempUserShortId
}
}
So what is the fundamental issue here?
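Found the answer here: https://nextjs.org/docs/messages/react-hydration-error
Render the same initial value on the server and the client, then read localStorage in useEffect, like: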
const [tempUserShortId, setTempUserShortId] = useState('')
useEffect(() => setTempUserShortId(getTempUserShortId()), [])
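To make the storage logic easy to run outside a browser, it can be separated from `localStorage` itself. The following is only a sketch: `randomString` is a stand-in for the helper the question calls but does not show, and `getOrCreateShortId` and `createMemoryStorage` are hypothetical names.

```javascript
// Hypothetical stand-in for the randomString helper used in the question.
function randomString(length) {
  const alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789';
  let out = '';
  for (let i = 0; i < length; i++) {
    out += alphabet[Math.floor(Math.random() * alphabet.length)];
  }
  return out;
}

// Reads the id from a Storage-like object, creating it on first use.
// Passing `storage` in (instead of touching localStorage directly) keeps the
// function safe to import on the server, where localStorage does not exist.
function getOrCreateShortId(storage, key = 'tempUserShortId') {
  let id = storage.getItem(key);
  if (id === null) {
    id = randomString(4);
    storage.setItem(key, id);
  }
  return id;
}

// Minimal in-memory Storage stand-in so the sketch runs outside a browser.
function createMemoryStorage() {
  const map = new Map();
  return {
    getItem: key => (map.has(key) ? map.get(key) : null),
    setItem: (key, value) => { map.set(key, String(value)); },
  };
}

const storage = createMemoryStorage();
console.log(getOrCreateShortId(storage) === getOrCreateShortId(storage)); // true
```

In the component, the effect would call getOrCreateShortId(window.localStorage), so the first render (server and client) still shows the empty string and the markup matches.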
I am using React.createElement(...) to dynamically generate a basic react site using a json file.
The JSON file is an array of JSON objects, each of which represents an element (and its children). The JSON object for an individual element looks like this.
{
"id": "e960f0ad-b2c5-4b0b-9ae0-0f6dd19ca27d",
"render": true,
"component": "small",
"styles":[],
"classes":[],
"children": [
"©2017",
{
"id": "04fa3b1a-2fdd-4e55-9492-870d681187a4",
"render": true,
"component": "strong",
"styles":[],
"classes":[],
"children": [
"Awesome Company"
]
},
", All Rights Reserved"
]
}
this object "should" ideally generate the following html code
<small>©2017 <strong>Awesome Company</strong>, All Rights Reserved</small>
I have created React components to render these HTML elements, for example:
Small.jsx
import React from 'react'
const Small = ({ id, className, style, children }) => {
return (
<small id={id} style={style} className={className}>
{children}
</small>
)
}
export default Small
Similarly I have created jsx for other html elements like anchor tag, div, footer, button etc
There is renderer.js which is called by App.js to render each component in the json file
import React from 'react'
import Button from "../components/Button";
import Input from "../components/Input";
import Header from "../components/header/Header";
import Footer from "../components/footer/Footer";
import Div from "../components/containers/Div";
import Article from "../components/containers/article/Article";
import Fragment from "../components/containers/Fragment";
import Anchor from "../components/Anchor";
import Nav from "../components/Nav";
import Heading1 from "../components/typography/Heading1";
import Strong from "../components/typography/Strong";
import Small from "../components/typography/Small";
const componentMap = {
button: Button,
input: Input,
header: Header,
footer: Footer,
div: Div,
article: Article,
fragment: Fragment,
a: Anchor,
nav: Nav,
h1: Heading1,
small: Small,
strong: Strong
};
const generateComponentStyles = (styles) => {
let mappedStyles = {};
styles.forEach(style => {
mappedStyles[style.name] = style.value;
});
return mappedStyles;
}
function renderer(config) {
if (typeof componentMap[config.component] !== "undefined") {
//creating children array for this element
let elementChildren = [];
if(config.children && config.children.length > 0) {
for (let index = 0; index < config.children.length; index++) {
const child = config.children[index];
if(typeof config.children === "string") {
elementChildren.push(child);
} else {
elementChildren.push(renderer(child));
}
}
}
return React.createElement(
componentMap[config.component],
{
id: config.id,
key: config.id,
className: config.classes ? config.classes : null,
style: config.styles ? generateComponentStyles(config.styles) : null
},
elementChildren
);
}
}
export default renderer;
The problem is that even though the <small> tag is generated with the inner <strong> element, the plain text inside the small and strong tags is skipped, so the elements come out empty.
This is what is generated on the site
<small id="e960f0ad-b2c5-4b0b-9ae0-0f6dd19ca27d" class=""><strong id="04fa3b1a-2fdd-4e55-9492-870d681187a4" class=""></strong></small>
As you can see it did not insert the text.
If I however do not supply an array to the "children" attribute but just give a simple string then it shows up. This same thing happens on an anchor tag.
Based on this link: https://reactjs.org/docs/react-api.html#createelement the createElement takes the third param as array of children.
I even tried to wrap my text in an empty fragment to pass it but it did not work either.
How do I insert a plain text within the inner html of an anchor, small, strong etc when there are other children to be inserted as well?
This might be a simple one, because as far as I understand it mostly works, with the exception of the string nodes.
Please check the condition:
typeof config.children === "string"
should be
typeof child === "string"
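Here is a minimal sketch of the recursive walk with the corrected check. To keep it runnable without React, `createElement` is passed in as a parameter; in the real renderer it would be React.createElement, and the component would be looked up in componentMap. The names `renderNode` and `toHtml` are made up for this example.

```javascript
// Sketch of the recursive walk with the corrected string check.
// `createElement` is injected so the example runs without React.
function renderNode(config, createElement) {
  const children = (config.children || []).map(child =>
    // test the individual child, not the whole children array
    typeof child === 'string' ? child : renderNode(child, createElement)
  );
  return createElement(config.component, { id: config.id, key: config.id }, ...children);
}

// Stand-in for React.createElement that serializes to an HTML-like string.
const toHtml = (tag, props, ...children) => `<${tag}>${children.join('')}</${tag}>`;

const tree = {
  id: 'a',
  component: 'small',
  children: [
    '©2017 ',
    { id: 'b', component: 'strong', children: ['Awesome Company'] },
    ', All Rights Reserved',
  ],
};
console.log(renderNode(tree, toHtml));
// <small>©2017 <strong>Awesome Company</strong>, All Rights Reserved</small>
```

With the original check, typeof config.children is always "object" for an array, so every string child was passed to renderer, which returned undefined for it because a string has no component property.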
Using the Firefox Addon SDK, I am creating a toolbar with several buttons and I want to create a mouseover effect for the buttons.
At first I thought to use a mouseover event, but then I would have to create a mouseout event to return it to normal, so I figured the best way would be to use css
In my old XUL version of my addon I was able to attach the stylesheet by linking to it in the XUL code and just add css for my #buttonID, which worked perfectly.
But how do I add the css stylesheet for my toolbar using the Addon SDK?
Here's what I've tried so far (which does not produce any errors), but I think this is just for content; if this is correct, then I'm not sure how to bind to the element:
const { browserWindows } = require("sdk/windows");
const { loadSheet } = require("sdk/stylesheet/utils");
//This is how to load an external stylesheet
for(let w of browserWindows){
loadSheet(viewFor(w), "./myStyleSheet.css","author" );
}
I've also tried this:
var Style = require("sdk/stylesheet/style").Style;
let myStyle = Style({source:'./myStyleSheet.css'});
for(let w of browserWindows){
attachTo(myStyle, viewFor(w))
};
And this:
var { attach, detach } = require('sdk/content/mod');
const { browserWindows } = require("sdk/windows");
var { Style } = require('sdk/stylesheet/style');
var stylesheet = Style({
uri: self.data.url('myStyleSheet.css')
});
for(let w of browserWindows){
attach(stylesheet, viewFor(w))
};
And here is my css:
#myButton:hover{ list-style-image: url("./icon-16b.png") !important; }
Tested this in Browser Toolbox:
const { require } = Cu.import("resource://gre/modules/commonjs/toolkit/require.js"); // skip this in SDK
const { browserWindows: windows } = require("sdk/windows");
const { viewFor } = require("sdk/view/core");
const { attachTo } = require("sdk/content/mod");
const { Style } = require("sdk/stylesheet/style");
let style = Style({ source: "#my-button{ display: none!important; }" });
// let self = require("sdk/self");
// let style = Style({ uri: self.data.url("style.css") });
for (let w of windows)
attachTo(style, viewFor(w));
The commented-out part lets you load a stylesheet file from the add-on's data directory.
Note that you need to import the SDK loader to use it in the toolbox.
When in an SDK addon, just use require directly.
NB: there is a difference in spelling: self.data.url vs { uri }
See self/data documentation.
NB2: SDK uses a custom widget ID scheme for toggle and action buttons so your button ID might not be what you expect:
const toWidgetId = id =>
('toggle-button--' + addonID.toLowerCase()+ '-' + id).replace(/[^a-z0-9_-]/g, '');
OR
const toWidgetId = id =>
('action-button--' + addonID.toLowerCase()+ '-' + id).replace(/[^a-z0-9_-]/g, '');
Using this code, you should be able to use hover to change how the button looks.
#buttonID {
/* normal state css here */
}
#buttonID:hover {
/* hover state css here */
}
This goes in the javascript file:
const { browserWindows } = require("sdk/windows");
const { viewFor } = require("sdk/view/core");
const { loadSheet } = require("sdk/stylesheet/utils");
const { ActionButton } = require("sdk/ui/button/action");
var StyleUtils = require('sdk/stylesheet/utils');
var myButton = ActionButton({
id: "mybutton",
label: "My Button",
icon: { "16": "./icon-16.png", "32":"./icon-32.png", "64": "./icon-64.png" },
onClick: function(state) {
console.log("mybutton '" + state.label + "' was clicked");
}
});
//this is how you attach the stylesheet to the browser window
function styleWindow(aWindow) {
let domWin = viewFor(aWindow);
StyleUtils.loadSheet(domWin, "chrome://myaddonname/content/myCSSfile.css", "agent");
}
// style every window that opens later, plus the one that is already active
browserWindows.on("open", function(aWindow) {
styleWindow(aWindow);
});
styleWindow(browserWindows.activeWindow);
And here is the css for that
/* don't forget to add the .toolbarbutton-icon class at the end */
#action-button--mystrippedadonid-mybuttonid .toolbarbutton-icon {
background-color: green;
}
There are several gotchas here.
First, as of this posting, you should not use capital letters in the id for the button because they get completely removed - only lowercase letters and hyphens are allowed.
The id of the element is not the same as the id you gave it in the button declaration. See below for how to come up with this identifier.
To specify content in the url for the stylesheet file (in the loadSheet function call) you will also need to create a chrome.manifest in the root of your addon folder, and put this in it: content spadmintoolbar data/ where "data" is the name of a real directory in the root folder. I needed a data/ folder so I could load icons for the button declarations, but you need to declare your virtual directories in chrome.manifest which jpm init does not do for you.
How to get the element id for your css file:
The easy way to get the id for your button element for use in an external style sheet is by testing your addon and then using the browser-toolbox's inspector to locate the element, whence you can fetch the id from the outputted code.
However, if you want to figure it out yourself, try this formula.
[button-class] = the sdk class for the button. An Action Button becomes action-button
[mybuttonid] = the id you gave the button in the sdk button declaration
[myaddonname] = the name you gave the addon in its package.json file.
[strippedaddonid] = take the id you assigned the addon in the package.json file, remove any @ symbol or dots, and change it to all lowercase.
Now put it all together (don't include the square brackets):
`#[button-class]--[strippedaddonid]-[mybuttonid]`
An example: action-button--myaddonsomewherecom-mybutton
Really simple isn't it?!
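The formula can also be checked in code. This sketch generalizes the toWidgetId snippets quoted in the other answer by taking the button class as a parameter. It assumes that pattern is what the SDK actually applies, and 'myaddon@somewhere.com' is a hypothetical add-on id.

```javascript
// Builds the element id from the button class, the add-on id and the button id.
// Only the add-on id is lowercased; every character outside a-z, 0-9, _ and -
// is then removed from the whole string.
function toWidgetId(buttonClass, addonID, id) {
  return (buttonClass + '--' + addonID.toLowerCase() + '-' + id).replace(/[^a-z0-9_-]/g, '');
}

console.log(toWidgetId('action-button', 'myaddon@somewhere.com', 'mybutton'));
// action-button--myaddonsomewherecom-mybutton
console.log(toWidgetId('action-button', 'myaddon@somewhere.com', 'myButton'));
// action-button--myaddonsomewherecom-myutton
```

The second call shows the capital-letter gotcha: the "B" in the button id is dropped rather than lowercased.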
credit for the stylesheet attach code goes to mconley
I am working on my first set of Cucumber tests in a Meteor app, but I cannot get the login step to work. My app uses a custom login plugin I wrote specifically for this project. Here is the step, as I currently have it defined with debug output:
this.Given(/^I am logged in as a\/an "([^"]*)"$/, function(roleName, callback) {
this.browser
.url(url.resolve(process.env.HOST, '/'))
.waitForExist('#appSignIn');
this.browser.getHTML('#appSignIn', function(err, html) {
if (err) {
console.log('err: ', err);
} else {
console.log('link HTML: ', html);
}
});
this.browser.getCssProperty('#appSignIn', 'display', function(err, value) {
console.log('link HTML display: ', value);
});
browser.isVisible('#appSignIn', function(err, isVisible) {
console.log('#appSignIn', isVisible);
});
this.browser
.waitForVisible('#appSignIn')
.click('#appSignIn')
.waitForExist('#username')
.waitForVisible('#username')
.setValue('#username', 'test' + roleName)
.setValue('#password', 'test' + roleName)
.leftClick('#signin')
.waitForExist('#appSignOut')
.waitForVisible('#appSignOut')
.call(callback);
});
What I am seeing in the logs is this:
Scenario: # features/my.feature:11
Given The server data has been reset # features/my.feature:12
link HTML: <a id="appSignIn" href="/signin">Sign In</a>
link HTML display: { property: 'display',
value: 'block',
parsed: { type: 'ident', string: 'block' } }
#appSignIn false
And I am logged in as a/an "ADMIN" # features/my.feature:13
RuntimeError: RuntimeError
(ScriptTimeout:28) A script did not complete before its timeout expired.
Problem: Timed out waiting for asyncrhonous script result after 511 ms
Basically, I see the HTML output, so I know the element is there. I see the CSS is set to display: block, but WebDriver reports that the element is not visible in the isVisible call, and the waitForVisible call similarly times out. The "Sign In" link is part of a Bootstrap collapsible nav-bar, located in the upper-right.
The issue was simple: The default size of the viewport was too small, which was causing the Bootstrap nav element to collapse. I set the browser size to 1000x600 and it worked as expected.
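A sketch of how the resize could be wired in before the visibility checks. It assumes the WebdriverIO client in use exposes setViewportSize({ width, height }), which depends on the version and has not been checked against this project. The helper name `useDesktopViewport` and the stub client are hypothetical.

```javascript
// Hypothetical helper: resizes the browser before any visibility checks.
// `browser` is expected to expose setViewportSize({ width, height }); whether
// the client in the step definition does is an assumption about its version.
function useDesktopViewport(browser, size = { width: 1000, height: 600 }) {
  return browser.setViewportSize(size);
}

// Stub client so the sketch runs without a Selenium session.
const calls = [];
const fakeBrowser = {
  setViewportSize: size => { calls.push(size); return fakeBrowser; },
};
useDesktopViewport(fakeBrowser);
console.log(calls[0]); // { width: 1000, height: 600 }
```

In the step definition this would be called with this.browser before the first waitForVisible, so the nav-bar is rendered expanded.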