I apologize if this duplicates an existing topic, but nothing I've tried so far has helped.
I am trying to scrape detail data from each ad link on a listing page.
After a few links have been opened, every subsequent page comes back blank, without any data.
Also, if I stop the script and start it again, I get a blank page right from the start and have to wait a few minutes before it works again.
This is my code. Can you please tell me where I made a mistake?
Thank you :)
const puppeteer = require("puppeteer-extra");
(async () => {
const browser = await puppeteer.launch({
headless: false,
args: [
"--no-sandbox",
"--disable-setuid-sandbox",
"--disable-infobars",
"--window-position=0,0",
"--ignore-certificate-errors",
"--ignore-certificate-errors-spki-list",
'--user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3312.0 Safari/537.36"',
],
});
const page = await browser.newPage();
await page.goto(
"https://www.ebay-kleinanzeigen.de/s-immobilien/01309/c195l23973"
);
const banner = (await page.$("#gdpr-banner-accept")) !== null;
if (banner) {
await Promise.all([page.click("#gdpr-banner-accept")]);
}
let isBtnDisabled = false;
while (!isBtnDisabled) {
const productsHandles = await page.$$(".ad-listitem.lazyload-item");
for (const producthandle of productsHandles) {
try {
const link = await page.evaluate(
(el) => el.querySelector("a.ellipsis").href,
producthandle
);
const eachPage = await browser.newPage();
await eachPage.goto(link, {
waitUntil: "networkidle0",
});
await eachPage.waitForSelector("#viewad-price", {
visible: true,
});
const price = await eachPage.$eval(
"#viewad-price",
(price) => price.textContent
);
console.log(price);
await eachPage.close();
} catch (error) {
console.log(error);
}
}
await page.waitForSelector("#srchrslt-pagination > div", { visible: true });
const is_disabled =
(await page.$(
"#srchrslt-pagination > div > div.pagination-nav > .pagination-next"
)) === null;
isBtnDisabled = is_disabled;
if (!is_disabled) {
await Promise.all([
page.click(".pagination-nav > .pagination-next"),
page.waitForNavigation({ waitUntil: "networkidle2" }),
]);
}
}
await browser.close();
})();
I am trying to fetch data from an Algolia index (index.search works much like fetch) inside useEffect, but the code does not execute in the order I expected. I log "queryNews1", "queryNews2", ..., "queryNews6" inside the async function queryNews(), and I expected them to be printed sequentially (see the image below). Instead, after "queryNews2" execution "jumps out" of queryNews() and runs the code outside it; only after console.log("5") does it go back and log "queryNews3".
I guess it's an asynchronous issue, so I wrapped queryNews() in another async function, const getData = async () => { await queryNews(keyword); };, and call getData(), but the order is still wrong. Why, and does anybody know how to fix it?
Sorry for my bad English!
The "mobile" entry in the console is logged from articleState.map(() => { console.log("mobile"); return (); });
console results image
const [articleState, setArticles] = useState<ArticleType[]>([]);
useEffect(() => {
console.log("1");
if (windowResized === "large" || windowResized === undefined) return;
let isFetching = false;
let isPaging = true;
let paging = 0;
console.log("2");
async function queryNews(input: string) {
console.log("queryNews1");
isFetching = true;
setIsLoading(true);
setSearchState(true);
setPageOnLoad(true);
console.log("queryNews2");
const resp = await index.search(`${input}`, {
page: paging,
});
console.log("queryNews3");
const hits = resp?.hits as ArticleType[];
setTotalArticle(resp?.nbHits);
console.log("queryNews4");
setArticles((prev) => [...prev, ...hits]);
console.log("queryNews5");
setIsLoading(false);
console.log("queryNews6");
paging = paging + 1;
if (paging === resp?.nbPages) {
isPaging = false;
setScrolling(true);
return;
}
console.log("queryNews7");
isFetching = false;
setSearchState(false);
setPageOnLoad(false);
console.log("queryNews8");
}
console.log("3");
async function scrollHandler(e: WheelEvent) {
if (window.innerHeight + window.scrollY >=
document.body.offsetHeight - 100) {
if (isFetching || !isPaging) return;
console.log("scrollHandler");
getData();
}
}
const getData = async () => {
await queryNews(keyword);
};
getData()
console.log("4");
window.addEventListener("wheel", scrollHandler);
console.log("5");
return () => {
window.removeEventListener("wheel", scrollHandler);
};
}, [keyword, setSearchState, windowResized]);
Thanks to kim3er, that really helped.
But a similar situation happens again when I scroll: it logs "queryNews2", then logs "mobile" (which means the component rendered again), and only then goes back to queryNews() to finish executing the rest of the function. Why didn't it wait, given that I already put all the code in
getData().then(() => {
console.log("6");
window.addEventListener("wheel", scrollHandler);
console.log("7");
});
Thanks!!
(Stack Overflow suddenly says I can't embed images, so I'm pasting an image link instead.)
https://imgur.com/a/lDKEzxy
useEffect(() => {
console.log("1");
if (windowResized === "large" || windowResized === undefined) return;
let isFetching = false;
let isPaging = true;
let paging = 0;
console.log("2");
async function queryNews(input: string) {
console.log("queryNews1");
isFetching = true;
setIsLoading(true);
setSearchState(true);
setPageOnLoad(true);
console.log("queryNews2");
const resp = await index.search(`${input}`, {
page: paging,
});
console.log("queryNews3");
const hits = resp?.hits as ArticleType[];
setTotalArticle(resp?.nbHits);
console.log("queryNews4");
setArticles((prev) => [...prev, ...hits]);
console.log("queryNews5");
setIsLoading(false);
console.log("queryNews6");
paging = paging + 1;
if (paging === resp?.nbPages) {
isPaging = false;
setScrolling(true);
return;
}
console.log("queryNews7");
isFetching = false;
setSearchState(false);
setPageOnLoad(false);
console.log("queryNews8");
}
console.log("3");
async function scrollHandler(e: WheelEvent) {
if (
window.innerHeight + window.scrollY >=
document.body.offsetHeight - 100
) {
if (isFetching || !isPaging) return;
console.log("scrollHandler");
getData().then(() => {
console.log("6");
window.addEventListener("wheel", scrollHandler);
console.log("7");
});
}
}
const getData = async () => {
await queryNews(keyword);
};
getData().then(() => {
console.log("4");
window.addEventListener("wheel", scrollHandler);
console.log("5");
});
return () => {
window.removeEventListener("wheel", scrollHandler);
};
}, [keyword, setSearchState, windowResized]);
You're calling getData() without awaiting it. Because of this, the rest of the top-level useEffect code keeps running without waiting for getData() to finish. This isn't an issue if getData() is the last line of code in the function.
But if you do need that initial getData() to complete before hitting console.log("4");, you could do this:
getData()
.then(() => {
console.log("4");
window.addEventListener("wheel", scrollHandler);
console.log("5");
});
Everything from console.log("4"); onwards will now run after getData() has completed.
Clarifier on async and .then()
With this function in mind:
const doSomething = async () => {
// Do something async
console.log('during');
};
The following:
const asyncFunc = async () => {
console.log('before');
await doSomething();
console.log('after');
};
Is equivalent to:
const asyncFunc = () => {
console.log('before');
doSomething()
.then(() => {
console.log('after');
});
};
Either will return:
before
during
after
However, if you used:
const syncFunc = () => {
console.log('before');
doSomething();
console.log('after');
};
Because I have not awaited doSomething(), I have created a race condition: during and after could be logged in a different order, depending on how long the async code takes to complete. syncFunc continues running as soon as doSomething has been called (but, crucially, has not finished).
So you could get:
before
after
during
Wrapping await queryNews(keyword); in another function called getData() does not make the function synchronous; it just means that you don't have to keep typing in the keyword parameter. You still need to await getData() to ensure it has completed before continuing.
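To make the difference concrete, here is a small runnable sketch. It is only an illustration: doSomething is a hypothetical stand-in for an async call such as index.search(), simulated with a 10 ms timer, and the log array replaces console.log so the order can be inspected afterwards.

```javascript
// Hypothetical stand-in for an async call; the 10 ms timer is an assumption.
const doSomething = (log) =>
  new Promise((resolve) => {
    setTimeout(() => {
      log.push("during");
      resolve();
    }, 10);
  });

// Awaited: "after" is only recorded once doSomething() has settled.
async function awaited() {
  const log = [];
  log.push("before");
  await doSomething(log);
  log.push("after");
  return log; // ["before", "during", "after"]
}

// Not awaited: execution continues immediately, so "after" is recorded first.
async function notAwaited() {
  const log = [];
  log.push("before");
  const pending = doSomething(log); // started, but not waited for
  log.push("after");
  await pending; // only so the caller can inspect the finished log
  return log; // ["before", "after", "during"]
}

awaited().then((log) => console.log("awaited:", log.join(" ")));
notAwaited().then((log) => console.log("not awaited:", log.join(" ")));
```

With a real network call the delay is not fixed, which is why the un-awaited version is best treated as having no guaranteed order.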
@kim3er
Thank you for your detailed answer, and sorry for my late response.
Following your explanation of the difference between async/await and .then(), I found that I didn't await getData() in scrollHandler(), so I modified my code again. This time I call queryNews(keyword) directly, but something weird still happens.
The console.log sequence is right on first load (No. 1-10 in the image below). When I scroll, scrollHandler() is called and await queryNews() is executed inside it, but once again it logs up to "queryNews2" and then "jumps out" of queryNews(), I guess to render the component again, because the word "mobile" from the JSX is logged before "queryNews3" (No. 13-18 in the image below).
I use await queryNews(keyword) in scrollHandler() this time, but the console sequence is still wrong. Why? Does it have anything to do with the setState calls inside queryNews()? As far as I know, setState triggers a component re-render.
Thank you for patiently answering my question.
console image
useEffect(() => {
console.log("1");
if (windowResized === "large" || windowResized === undefined) return;
let isFetching = false;
let isPaging = true;
let paging = 0;
console.log("2");
async function queryNews(input: string) {
console.log("queryNews1");
isFetching = true;
setIsLoading(true);
setSearchState(true);
setPageOnLoad(true);
console.log("queryNews2");
const resp = await index.search(`${input}`, {
page: paging,
});
console.log("queryNews3");
const hits = resp?.hits as ArticleType[];
setTotalArticle(resp?.nbHits);
console.log("queryNews4");
setArticles((prev) => [...prev, ...hits]);
console.log("queryNews5");
setIsLoading(false);
console.log("queryNews6");
paging = paging + 1;
if (paging === resp?.nbPages) {
isPaging = false;
setScrolling(true);
return;
}
console.log("queryNews7");
isFetching = false;
setSearchState(false);
setPageOnLoad(false);
console.log("queryNews8");
}
console.log("3");
async function scrollHandler(e: WheelEvent) {
if (
window.innerHeight + window.scrollY >=
document.body.offsetHeight - 100
) {
if (isFetching || !isPaging) return;
console.log("scrollHandler");
await queryNews(keyword);
console.log("end scrollHandler");
}
}
queryNews(keyword).then(() => {
console.log("4");
window.addEventListener("wheel", scrollHandler);
console.log("5");
});
return () => {
window.removeEventListener("wheel", scrollHandler);
};
}, [keyword, setSearchState, windowResized]);
I am trying to read the contents of an iframe that has no src or URL, only an id. Below is the code I am trying to use. Is there anything I am missing? The content comes back empty.
await page.goto("https://sites.google.com/view/pinnednote/home", { waitUntil: 'load', timeout: 30000 });
const myFrames = await page.frames();
console.log("Parent IFrame = "+myFrames.length)
for (const x of myFrames) { // Getting all iFrames
try {
const frameElement = await x.frameElement();
const contentFrame = await frameElement.contentFrame();
console.log(await contentFrame.content());
} catch(error){
console.log(error)
}
const childs = await x.childFrames();
console.log("Child IFrame = "+childs.length)
for (const y of childs) {
const frameElement = await y.frameElement();
const contentFrame = await frameElement.contentFrame();
console.log(await contentFrame.content());
}
}
Try this:
await page.goto("https://sites.google.com/view/pinnednote/home", { waitUntil: 'load', timeout: 30000 });
const myFrames = await page.frames();
console.log("Parent IFrame = "+myFrames.length)
for (const x of myFrames) { // Getting all iFrames
try {
const frameContent = await x.content();
console.log(frameContent)
} catch(error){
console.log(error)
}
}
You already have the frames in your myFrames array. Inside the loop you only need to read each frame's content.
I am playing around with SvelteKit and I am struggling a bit.
My problem is that adding something to the DB works, but the new item does not show up in the list until I refresh the page. My code looks like this:
index.js
import { connectToDatabase } from '$lib/db';
export const post = async ({ request}) => {
const body = await request.json()
console.log(body)
const dbConnection = await connectToDatabase();
const db = dbConnection.db;
const einkaufszettel = db.collection('Einkaufszettel')
await einkaufszettel.insertOne({
name: body.newArticle
});
const einkaufsliste = await einkaufszettel.find().toArray();
return {
status: 200,
body: {
einkaufsliste
}
};
}
export const get = async () => {
const dbConnection = await connectToDatabase();
const db = dbConnection.db;
const einkaufszettel = db.collection('Einkaufszettel')
const einkaufsliste = await einkaufszettel.find().toArray();
console.log(einkaufsliste)
return {
status: 200,
body: {
einkaufsliste,
}
};
}
and the script of index.svelte:
<script>
import Title from '$lib/title.svelte'
export let einkaufsliste = []
let newArticle = ''
const addArticle = async () => {
const res = await fetch('/', {
method: 'POST',
body: JSON.stringify({
newArticle
}),
headers: {
'Content-Type': 'application/json'
}
})
fetchArticles()
}
async function fetchArticles() {
const res = await fetch('/')
console.log(res.body)
}
</script>
In the Preview pane of the Network tab, the new item is already in the list.
Svelte's reactivity is triggered by assignments, so you need to reassign einkaufsliste after fetching the list of elements from the API.
You can do this in your fetchArticles method, like this:
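async function fetchArticles() {
// fetch() resolves to a Response, so parse the JSON body before reassigning.
// Assumption: the accept header makes the page endpoint answer with JSON instead of HTML.
const res = await fetch('/', { headers: { accept: 'application/json' } })
const data = await res.json()
einkaufsliste = data.einkaufsliste
}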
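As a rough sketch of the fetch-then-reassign step: fetch() resolves to a Response object, so the JSON body has to be parsed before its einkaufsliste array can be assigned. The helper name loadEinkaufsliste, the injected fetchImpl parameter, the accept header, and the fake response data are all assumptions made so the example can run without a server.

```javascript
// Hypothetical helper: returns the array that the component variable should be assigned.
// fetchImpl is injected so the sketch can be tried without a running endpoint.
async function loadEinkaufsliste(fetchImpl, url = "/") {
  const res = await fetchImpl(url, { headers: { accept: "application/json" } });
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  // Assumed body shape: { einkaufsliste: [...] }, as returned by the get handler above.
  const data = await res.json();
  return data.einkaufsliste;
}

// Stand-in for the real endpoint (made-up data, for illustration only).
const fakeFetch = async () => ({
  ok: true,
  status: 200,
  json: async () => ({ einkaufsliste: [{ name: "Milch" }, { name: "Brot" }] }),
});

loadEinkaufsliste(fakeFetch).then((list) => console.log(list.map((item) => item.name)));
```

Inside the component this would be used as einkaufsliste = await loadEinkaufsliste(fetch), and it is that assignment which lets Svelte update the list.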
This seems very strange to me, and I have no idea what's wrong, because everything works fine in the development environment. The app flow is simple: the user signs in, chooses a therapist, and pays; after a successful payment the booking is confirmed. The problem is that the booking is written exactly three times to the Firebase Realtime Database, no matter what, and I don't know why. (In development everything is fine and the booking is stored just once, as the user requested.)
Here's my booking code:
const bookingHandler = () => {
Linking.openURL('http://www.medicalbookingapp.cloudsite.ir/sendPay.php');
}
const handler = (e) => handleOpenUrl(e.url);
useEffect(() => {
Linking.addEventListener('url', handler)
return () => {
Linking.removeEventListener('url', handler);
}
});
const handleOpenUrl = useCallback((url) => {
const route = url.replace(/.*?:\/\/\w*:\w*\/\W/g, '') // exp://.... --> ''
const id = route.split('=')[1]
if (id == 1) {
handleDispatch();
toggleModal();
} else if (id == 0) {
console.log('purchase failed...');
toggleModal();
}
});
const handleDispatch = useCallback(() => {
dispatch(
BookingActions.addBooking(
therapistId,
therapistFirstName,
therapistLastName,
selected.title,
moment(selectedDate).format("YYYY-MMM-DD"),
selected.slots,
)
);
dispatch(
doctorActions.updateTherapists(therapistId, selected.slots, selectedDate, selected.title, selectedPlanIndex, selectedTimeIndex)
);
setBookingConfirm(true)
})
booking action:
export const addBooking = (therapistId, therapistFirstName, therapistLastName, sessionTime, sessionDate, slotTaken) => {
return async (dispatch, getState) => {
let userId = firebase.auth().currentUser.uid
const confirmDate = moment(new Date()).format("ddd DD MMMM YYYY")
const response = await fetch(
`https://mymedicalbooking.firebaseio.com/bookings/${userId}.json`,
{
method: 'POST',
headers: {
'Content-Type': 'application/json'
},
body: JSON.stringify({
userId,
therapistId,
confirmDate,
therapistFirstName,
therapistLastName,
sessionTime,
sessionDate,
slotTaken
})
}
);
if (!response.ok) {
throw new Error('Something went wrong!');
}
const resData = await response.json();
dispatch({
type: ADD_BOOKING,
bookingData: {
userId: userId,
therapistId: therapistId,
therapistFirstName: therapistFirstName,
therapistLastName: therapistLastName,
sessionTime: sessionTime,
sessionDate: sessionDate
}
});
};
};
Booking reducer:
const initialState = {
bookings: [],
userBookings: []
};
export default (state = initialState, action) => {
switch (action.type) {
case ADD_BOOKING:
const newBooking = new Booking(
action.bookingData.id,
action.bookingData.therapistId,
action.bookingData.therapistFirstName,
action.bookingData.therapistLastName,
action.bookingData.bookingdate
);
return {
...state,
bookings: state.bookings.concat(newBooking)
};
case FETCH_BOOKING:
const userBookings = action.userBookings;
return {
...state,
userBookings: userBookings
};
}
return state;
};
Also, I use Expo (SDK 38) with Firebase as the database.
I really need to solve this. If you have any idea, please don't hesitate to leave a comment or an answer; all of them are appreciated.
UPDATE:
I commented out all the deep-linking functionality and tested the result, and everything is fine. So I think the problem is with the event listener or with how I implemented my deep-linking code, but I still can't figure out what's wrong with code that works in Expo and has a bug in the standalone app.
UPDATE 2
I tried adding a dependency array as suggested, but I still have the same problem.
There is an issue in expo-linking where, in a standalone (detached) Android app, the url event fires multiple times.
I worked around it by wrapping my handler function in lodash's debounce with a 1000 ms wait.
Install lodash like this:
yarn add lodash
import _ from 'lodash';
const handleOpenUrl = _.debounce((event) => {
// here is other logic
},1000);
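To show why debouncing helps here, the following sketch uses a small hand-written trailing-edge debounce (an illustration of the idea, not lodash's implementation) and simulates the url event firing three times in a row. The bookings array, the 50 ms wait, and the example URL are made up for the demo.

```javascript
// Minimal trailing-edge debounce, written out for illustration only.
function debounce(fn, waitMs) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer); // cancel the call scheduled by the previous event
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// Hypothetical stand-in for the booking logic; it records every time it actually runs.
const bookings = [];
const handleOpenUrl = debounce(({ url }) => {
  bookings.push(url);
}, 50);

// Simulate the 'url' event firing three times in quick succession.
for (let i = 0; i < 3; i++) {
  handleOpenUrl({ url: "exp://host:19000/--/?id=1" });
}

// By now the single surviving timer has fired: logs "handler ran 1 time(s)".
setTimeout(() => console.log(`handler ran ${bookings.length} time(s)`), 100);
```

Only the last call in the burst survives, so the booking logic runs once. The trade-off is that the handler runs a little later than the first event.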
Here is your code with that change applied.
Just add an empty dependency array to useEffect and give useCallback a dependency array, like this:
useEffect(() => {
Linking.addEventListener('url', handleOpenUrl)
return () => {
Linking.removeEventListener('url', handleOpenUrl);
}
},[]); //like this []
const handleOpenUrl = _.debounce(({ url }) => { // the listener receives an event object, so read its url property
const route = url.replace(/.*?:\/\/\w*:\w*\/\W/g, '') // exp://.... --> ''
const id = route.split('=')[1]
if (id == 1) {
handleDispatch();
toggleModal();
} else if (id == 0) {
console.log('purchase failed...');
toggleModal();
}
}, 1000); // debounce wait in ms
const handleDispatch = useCallback(() => {
dispatch(
BookingActions.addBooking(
therapistId,
therapistFirstName,
therapistLastName,
selected.title,
moment(selectedDate).format("YYYY-MMM-DD"),
selected.slots,
)
);
dispatch(
doctorActions.updateTherapists(therapistId, selected.slots, selectedDate, selected.title, selectedPlanIndex, selectedTimeIndex)
);
setBookingConfirm(true)
},[selected])
I'm trying to crawl a webpage that has an h3 tag under an a tag. I'm getting the a tag just fine, but when I try to get the innerText of the h3 I get an undefined value.
This is what I'm trying to crawl:
const puppeteer = require('puppeteer');
const pageURL = "https://producthunt.com";
const webScraping = async pageURL => {
const browser = await puppeteer.launch({
headless: false,
args: ["--no-sandbox"]
});
const page = await browser.newPage();
let dataObj = {};
try {
await page.goto(pageURL, { waitUntil: 'networkidle2' });
const publishedNews = await page.evaluate(() => {
const newsDOM = document.querySelectorAll("main ul li");
let newsList = [];
newsDOM.forEach(linkElement => {
const text = linkElement.querySelector("a").textContent;
const innerText = linkElement.querySelector("a").innerText;
const url = linkElement.querySelector("a").getAttribute('href');
const title = linkElement.querySelector("h3").innerText;
console.log(title);
newsList.push({
title,
text,
url
});
});
return newsList;
});
dataObj = {
amount: publishedNews.length,
publishedNews
};
} catch (e) {
console.log(e);
}
console.log(dataObj);
await browser.close();
return dataObj;
};
webScraping(pageURL).catch(console.error);
Console log works great, but puppeteer throws:
Cannot read property 'innerText' of null
It looks like your solution is working just fine, but you're not checking whether the h3 tag is null. Try adding an if statement before accessing the innerText property, or use the code I left below.
const puppeteer = require('puppeteer');
const pageURL = "https://producthunt.com";
const webScraping = async pageURL => {
const browser = await puppeteer.launch({
headless: false,
args: ["--no-sandbox"]
});
const page = await browser.newPage();
let dataObj = {};
try {
await page.goto(pageURL, { waitUntil: 'networkidle2' });
const publishedNews = await page.evaluate(() => {
let newsList = [];
const newsDOM = document.querySelectorAll("main ul li");
newsDOM.forEach(linkElement => {
const aTag = linkElement.querySelector("a");
const text = aTag.textContent;
const innerText = aTag.innerText;
const url = aTag.getAttribute('href');
let title = aTag.querySelector("h3");
// some <a> elements may not contain an h3, so guard
// against null here and only read innerText
// when title is not null.
if (title) title = title.innerText;
console.log(title);
// changed the object structure to add a key for each attr
newsList.push({
title: title,
text: text,
url: url
});
});
return newsList;
});
// changed the object structure to add a key for the array
dataObj = {
amount: publishedNews.length,
list: publishedNews
};
} catch (e) {
console.log(e);
}
console.log({receivedData: dataObj});
await browser.close();
return dataObj;
};
webScraping(pageURL).catch(console.error);
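As an alternative to the explicit if, the same guard can be sketched with optional chaining. This is only an illustration: extractItem is a hypothetical helper, and the fake elements below exist solely so the example runs outside a browser.

```javascript
// Hypothetical helper: reads one list item defensively.
// It works on any object exposing querySelector(), so it can be tried without a page.
function extractItem(li) {
  const aTag = li.querySelector("a");
  if (!aTag) return null; // skip items that contain no link at all
  return {
    title: aTag.querySelector("h3")?.innerText ?? null, // null when there is no <h3>
    text: aTag.textContent,
    url: aTag.getAttribute("href"),
  };
}

// Fake elements standing in for real DOM nodes (made-up data).
const fakeAnchor = (h3Text) => ({
  textContent: "Some product",
  getAttribute: () => "/posts/some-product",
  querySelector: (sel) => (sel === "h3" && h3Text ? { innerText: h3Text } : null),
});
const withTitle = { querySelector: () => fakeAnchor("Some product") };
const withoutTitle = { querySelector: () => fakeAnchor(null) };
const noLink = { querySelector: () => null };

console.log([withTitle, withoutTitle, noLink].map(extractItem));
```

In the scraper, the helper would have to be defined inside the page.evaluate callback, because that function is serialized and run in the page, and the results could be filtered with .filter(Boolean) to drop items without a link. Optional chaining also assumes a reasonably recent Chromium and Node version.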
Let me know if this fixes your problem!