How to export data from a Puppeteer script? - web-scraping

So, I have this script that collects titles from a news website.
The result of the scraping is pushed into the empty array x.
const puppeteer = require('puppeteer');
export let x = []
async function scrapeNewsTitles(url){
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url);
const [el] = await page.$x('/html/body/main/div[1]/div[1]/article/figure/a/img');
const src = await el.getProperty('src');
const srcTxt = await src.jsonValue();
console.log(srcTxt);
const [el2] = await page.$x('/html/body/main/div[1]/div[1]/article/div/h1/a');
const txt = await el2.getProperty('textContent');
const rawTxt = await txt.jsonValue();
const newArticle = {srcTxt, rawTxt};
x.push(newArticle);
await browser.close();
console.log(x)
}
scrapeNewsTitles('https://www.lmneuquen.com');
What I want now is to export the x array, which contains the collected data, so I can use it in another script. The problem is that if I do this...
export let x = []
and then I import it into another file like this...
import {x} from './file.js'
...it gives me the following error:
SyntaxError: Cannot use import statement outside a module
Could you point me in the right direction?
Thank you in advance! Have a nice day.

Instead of using "export let x = []", use "module.exports".
So, change your code to:
let x = []
and at the end of the code, write
module.exports = {"x": x};
When you import this array in the other file, use
const { x } = require("./index.js"); // Replace index.js with the name of your first file.
console.log(x);
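The reason is that Node.js treats .js files as CommonJS modules by default, and CommonJS uses require() and module.exports. The export and import keywords belong to ES modules, which Node.js only enables for .mjs files or when package.json contains "type": "module". Also note that scraping is asynchronous, so x is still empty right after require() returns; it only fills up once scrapeNewsTitles has finished.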

Related

Uploaded image URL not being stored in firestore the first time I setDoc

I am trying to store the download link from Firebase Storage in Firestore. All the other data is saved, but the link is only saved the second time I click the "post" button.
I know the issue has to do with asynchronous functions, but I'm not experienced enough to know how to solve the issue.
In the "createPost" function, I am console logging "i am the URL: {url}" and in the "uploadFile" function, I am console logging "look at me {url}" to debug.
I noticed the "I am the URL" outputs nothing and then shortly after, the "look at me" outputs the URL.
setDoc() of course stores the imageLink as an empty string.
What can I do to solve this? Any help, or any documentation that would improve my understanding of async functions, would be greatly appreciated.
Here is my code:
const PostModal = (props) => {
const makeid = (length) => {
var result = '';
var characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
var charactersLength = characters.length;
for ( var i = 0; i < length; i++ ) {
result += characters.charAt(Math.floor(Math.random() * charactersLength));
}
return result;
}
const [descriptionText, setDescriptionText] = useState("");
const [addressText, setAddressText] = useState("");
const [venueText, setVenueText] = useState("");
const [startTimeText, setStartTimeText] = useState("");
const [endTimeText, setEndTimeText] = useState("");
const [shareImage, setShareImage] = useState("");
const [videoLink, setVideoLink] = useState("");
const [assetArea, setAssetArea] = useState("");
const [url, setURL] = useState("");
const { data } = useSession();
const storage = getStorage();
const storageRef = ref(storage, `images/${makeid(5) + shareImage.name}`);
const uploadFile = () => {
if (shareImage == null) return;
uploadBytes(storageRef, shareImage).then( (snapshot) => {
//console.log("Image uploaded")
getDownloadURL(snapshot.ref).then( (URL) =>
{
setURL(URL);
console.log(`look at me: ${URL}`)});
});
}
const createPost = async () => {
var idLength = makeid(25);
const uploadTask = uploadBytesResumable(storageRef, file);
uploadFile()
console.log(`I am the URL: ${url} `)
setDoc(doc(db, "posts", idLength), {
eventDescription: descriptionText,
eventAddress: addressText,
venueName: venueText,
startTime: startTimeText,
endTime: endTimeText,
imageLink: url,
videoLink: videoLink,
username: data.user.name,
companyName: !data.user.company ? "" : data.user.company,
timestamp: Timestamp.now(),
});
}
const handleChange = (e) => {
const image = e.target.files[0];
if(image === '' || image === undefined) {
alert(`not an image, the file is a ${typeof image}`);
return;
}
setShareImage(image);
};
const switchAssetArea = (area) => {
setShareImage("");
setVideoLink("");
setAssetArea(area);
};
const reset = (e) => {
setDescriptionText("");
setAddressText("");
setVenueText("");
setStartTimeText("");
setEndTimeText("");
setShareImage("");
setVideoLink("");
setURL("");
props.handleClick(e);
};
This was taken from a Reddit user who solved my problem. A big thank you to them for taking the time to write out a thoughtful response.
So, you're kinda right that your issue has a bit to do with asynchronicity, but it's actually got nothing to do with your functions being async, and everything to do with how useState works.
Suffice it to say, when you call uploadFile in the middle of your createPost function, on the next line the value of url has not yet changed. This would still be true even if uploadFile were synchronous, because when you call a useState setter function, in this case setURL, the getter value url doesn't change until the next time the component renders.
This actually makes perfect sense if you stop thinking about it as a React component for a moment, and imagine that this was just vanilla JavaScript:
function someFunction() {
const url = 'https://www.website.com';
console.log(url);
anotherFunction();
yetAnotherFunction();
evenMoreFunction();
console.log(url);
}
In this example, would you ever expect the value of url to change? Probably not, since url is declared as const, which means if the code runs literally at all, it's physically impossible for the value of url to change within a single invocation of someFunction.
Functional components and hooks are the same; in a single "invocation" (render) of a functional component, url will have the same value at every point in your code, and it's not until the entire functional component re-renders that any calls to setURL would take effect.
This is an extremely common misunderstanding; you're not the first and you won't be the last. Usually, it's indicative of a design flaw in your data flow - why are you storing url in a useState to begin with? If you don't need it to persist across distinct, uncoupled events, it's probably better to treat it like a regular JavaScript value.
Since uploadBytes returns a promise, you could make uploadFile asynchronous as well, and ultimately make uploadFile return the information you need back to createPost, like this:
const uploadFile = async () => {
if (!shareImage) return ""; // shareImage starts as "", so a null check alone would not catch it
const snapshot = await uploadBytes(storageRef, shareImage);
// console.log("Image uploaded")
const URL = await getDownloadURL(snapshot.ref);
return URL;
};
All I've done here is un-nest your .then calls, pulling the trapped values out into the usable scope of your uploadFile function. Now you can change that one line of createPost to this:
const url = await uploadFile();
and eliminate your useState altogether.
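With those changes, createPost no longer depends on a state update. The sketch below is a plain-JavaScript version of that flow. It assumes the Firebase calls are passed in as functions: `uploadFile` stands for the async uploadFile above, and `saveDoc` is a hypothetical wrapper around setDoc. This lets the control flow run without Firebase.

```javascript
// Sketch of the revised createPost flow. uploadFile and saveDoc are injected
// (hypothetical wrappers around the async uploadFile above and setDoc).
async function createPost({ uploadFile, saveDoc, fields }) {
  const url = await uploadFile(); // wait for the download URL before writing
  return saveDoc({ ...fields, imageLink: url });
}

// Usage with stand-ins: the write sees the URL on the very first call.
const fakeUploadFile = async () => 'https://example.com/images/abc.png';
const fakeSaveDoc = async (post) => post; // just echoes what would be stored

createPost({
  uploadFile: fakeUploadFile,
  saveDoc: fakeSaveDoc,
  fields: { eventDescription: 'Launch party' },
}).then((post) => console.log(post.imageLink)); // https://example.com/images/abc.png
```

Passing the URL as a return value, rather than through setURL, is what makes the first click work.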

Vue 3 - onMounted doesn't load array

In Vue 3 I need to fill some arrays with results from a store. I import the store like this:
Imports
import { onMounted, ref, watch } from "vue";
import { useTableStore } from "../stores/table";
Then I declare the values and try to fill them:
const search = ref(null);
const searchInput = ref("");
const edition = ref([]);
const compilation = ref([]);
const debug = ref([]);
const navigation = ref([]);
const refactoring = ref([]);
const store = useTableStore();
onMounted(() => {
store.fetchTable();
edition.value = store.getEdition;
compilation.value = store.getCompilation;
debug.value = store.getDebug;
navigation.value = store.getNavigation;
refactoring.value = store.getRefactoring;
});
The values are not filled. Strangely, if I use a watcher like this
edition.value = store.getEdition.filter((edition: String) => {
for (let key in edition) {
if (
edition[key].toLowerCase().includes(searchInput.value.toLowerCase())
) {
return true;
}
}
});
the array gets the values.
So the question is: how can I get the store values when the view loads?
Maybe the problem is that the store returns a Proxy object...
UPDATE 1
I created a gist with full code
https://gist.github.com/ElHombreSinNombre/4796da5bcdcf6bf4f36f009132dd9f48
UPDATE 2
Pinia loads the array data, but 'setup' can't get it
UPDATE 3: SOLUTION
Finally I resolved the problem and uploaded the code to my GitHub. I used computed to keep the data updated. Maybe another solution would have been better.
https://github.com/ElHombreSinNombre/vue-shortcuts
Your onMounted lambda needs to be async, and you need to await the fetchTable function. Edit: try using reactive instead of ref for your arrays (and import reactive from "vue"). The rule of thumb is ref for primitive values and reactive for objects and arrays.
const search = ref(null);
const searchInput = ref("");
const edition = reactive([]);
const compilation = reactive([]);
const debug = reactive([]);
const navigation = reactive([]);
const refactoring = reactive([]);
const store = useTableStore();
onMounted(async () => {
await store.fetchTable();
edition.push(...store.getEdition);
compilation.push(...store.getCompilation);
debug.push(...store.getDebug);
navigation.push(...store.getNavigation);
refactoring.push(...store.getRefactoring);
});
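You can reproduce why the original onMounted saw empty arrays without Vue at all. If fetchTable is asynchronous, reading a getter right after calling it returns the data from before the fetch. Below is a minimal sketch using a hypothetical in-memory store. `createFakeTableStore` and its sample data are made up for illustration and are not Pinia's API.

```javascript
// Hypothetical in-memory store standing in for the Pinia store in the question.
function createFakeTableStore() {
  const state = { edition: [] };
  return {
    async fetchTable() {
      // Simulate network latency before the data arrives.
      await new Promise((resolve) => setTimeout(resolve, 10));
      state.edition = ['Copy', 'Paste'];
    },
    get getEdition() {
      return state.edition; // getter read as a property, like a Pinia getter
    },
  };
}

async function demo() {
  const store = createFakeTableStore();
  const pending = store.fetchTable(); // started but not awaited, as in the question
  const tooEarly = store.getEdition; // read immediately: still the initial []
  await pending; // what the answer's `await store.fetchTable()` achieves
  const afterAwait = store.getEdition;
  return { tooEarly, afterAwait };
}

demo().then(({ tooEarly, afterAwait }) => console.log(tooEarly, afterAwait));
```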
If what you need is the component to not be rendered until data is ready, you'll need a flag in your data that works along with a v-if to render the component when everything is ready, something like this:
<!-- in your template -->
<div v-if="dataReady">
<!-- your html code -->
</div>
// inside your script
const dataReady = ref(false)
onMounted(async () => {
await store.fetchTable();
dataReady.value = true;
});

Scrape supporters name from buymeacoffee website

I am trying to scrape supporters' names from the https://www.buymeacoffee.com/singtaousa website.
Currently, I am able to get the total number of supporters using the axios and cheerio modules. The problem is that I can't figure out how to get the supporters' names.
I also tried selecting span elements, but not a single supporter's name comes out. I'm not sure whether my code is wrong or whether the names simply can't be retrieved.
Here is my code:
import cheerio from 'cheerio'
import axios from 'axios'
export default async function handler(req, res) {
const { data } = await axios.get('https://www.buymeacoffee.com/singtaousa') // example
const $ = cheerio.load(data)
const count = $('.text-fs-16.av-medium.clr-grey.xs-text-fs-14.mg-t-8').text()
const supporters = []
// to be changed
$('span').each((i, element) => {
const name = $(element).text()
supporters.push(name)
})
res.status(200).json({ count, supporters })
}
The names are added by JavaScript after the page loads, so they are not in the HTML that axios downloads. You need puppeteer or another headless browser to get the fully script-rendered page content. Here is an example for your case using puppeteer:
import puppeteer from 'puppeteer';
const browser = await puppeteer.launch();
try {
const [page] = await browser.pages();
await page.goto('https://www.buymeacoffee.com/singtaousa');
const namesMinimum = 20;
const nameSelector = 'div.supp-wrapper span.av-heavy';
const moreSelector = 'button#load-more-recent';
await page.waitForSelector(moreSelector);
while (await page.$$eval(nameSelector, names => names.length) < namesMinimum) {
await Promise.all([
page.click(moreSelector),
page.waitForResponse(
response => response.url().includes('www.buymeacoffee.com')
),
]);
}
const data = await page.evaluate(() => {
const names = Array.from(
document.querySelectorAll('div.supp-wrapper span.av-heavy'),
span => span.innerText,
);
return names;
});
console.log(data);
} catch (err) { console.error(err); } finally { await browser.close(); }
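The while loop above only exits once namesMinimum names are on the page. On a page with fewer supporters, it ends only when a call fails. Below is a minimal sketch of a bounded version that stops when a round adds nothing new. The Puppeteer calls are passed in as callbacks, and the names `loadAtLeast`, `countItems` and `loadMore` are hypothetical. A real `loadMore` would also need to cope with the button disappearing.

```javascript
// Repeat a "load more" action until at least `minimum` items are present,
// stopping early if a round adds nothing or after `maxRounds` rounds.
// countItems/loadMore are hypothetical async callbacks; with Puppeteer they could
// wrap page.$$eval(nameSelector, ...) and the click + waitForResponse pair above.
async function loadAtLeast(countItems, loadMore, minimum, maxRounds = 50) {
  let count = await countItems();
  for (let round = 0; round < maxRounds && count < minimum; round++) {
    await loadMore();
    const next = await countItems();
    if (next <= count) break; // nothing new arrived: assume the list is exhausted
    count = next;
  }
  return count;
}
```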
Alternatively, from the browser console you first need to load all supporters, either manually or by repeating this call, because they are not all loaded at once:
document.getElementById("load-more-recent").click();
The request that loads more supporters can be traced in the Network tab of the developer tools. After loading all of them, you can copy the list of names from the output of the code below. You can change how the output is concatenated or skip missing values, but basically this works:
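// querySelectorAll works whether or not the page defines jQuery's $
var supporters = document.querySelectorAll("div.supp-wrapper");
var list = [];
for (var i = 0; i < supporters.length; i++) {
var nameEl = supporters[i].querySelector("span.av-heavy");
if (nameEl) list.push(nameEl.textContent.trim()); // skip entries without a name element
}
console.log(list);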
This script outputs:
(10) ['Amy', 'Wong', 'Someone', 'Someone', 'Someone', 'Emily', 'KWONG Wai Oi Anna', 'Simon wong', 'Elaine Liu', 'Someone']
To get all supporters' names, you first need to load them all with the click call above. Alternatively, check the Network tab and call the underlying API request directly.

Problem scraping a url that has src = embed # async_embed using [cheerio]

Hello devs,
I am trying to get some COVID-19 data for my country from the following website:
https://e.infogram.com/dab81851-e3af-4767-b1f5-9b54eb900274?parent_url=https%3A%2F%2Festadisticas.pr%2Fen%2Fcovid-19&src=embed#async_embed
using the cheerio library, but apparently I cannot access the data.
If there is a way to access the data, I would appreciate it.
index.js
const cheerio = require('cheerio');
const axios = require('axios').default;
const main = async() =>{
const url = 'https://e.infogram.com/dab81851-e3af-4767-b1f5-9b54eb900274?parent_url=https%3A%2F%2Festadisticas.pr%2Fen%2Fcovid-19&src=embed#async_embed'
const {data} = await axios.get(url);
const $ = cheerio.load(data);
console.log($.html())
}
main();
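That data is in a JSON blob assigned to window.infographicData inside the page's HTML, so you can extract it with a regular expression and parse it:
const match = data.match(/window\.infographicData=(\{.*?\});/);
const parsed = match ? JSON.parse(match[1]) : null; // null if the pattern was not found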
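Here is a self-contained version of that extraction. The sample HTML is made up to mirror the `window.infographicData=...;` pattern. On the real page, the same function would be applied to the `data` string returned by `axios.get(url)`. The field names in the sample (`elements`, `title`, `value`) are hypothetical.

```javascript
// Extract a JSON object assigned to window.infographicData in an inline script.
// The lazy match stops at the first "}" that is directly followed by ";".
function extractInfographicData(html) {
  const match = html.match(/window\.infographicData=(\{.*?\});/);
  if (!match) return null; // the page layout may have changed
  return JSON.parse(match[1]);
}

// Made-up sample HTML with the same shape as the embedded blob.
const sampleHtml =
  '<html><script>window.infographicData={"elements":[{"title":"Cases","value":42}]};</script></html>';
const parsed = extractInfographicData(sampleHtml);
console.log(parsed.elements[0].title); // Cases
```

A regular expression is fragile if the blob itself contains the text `};`. It is a pragmatic choice here because cheerio does not execute scripts, so the data never appears as regular DOM content.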

How to use firebase query in actions-on-google

I want to use a Firebase query to search for a particular user by name in my Actions on Google app. I have used the following code, but it doesn't print anything.
const ref = firebase.database();
const nm = ref.child('Users');
const q = nm.orderByChild('Name').equalTo('abcd');
q.on('value', snap => {
conv.ask(snap.val());
});
});
Can somebody help me fix my code?
Keep in mind that JavaScript is largely asynchronous, so your handler must make it clear when its asynchronous work has finished.
The standard way to do this is with a Promise. Many async functions return one, and if your intent handler returns it, the library waits for it to resolve before sending the response.
So you can rewrite your code as:
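app.intent('intent name', conv => {
// database() has no child(); get a reference with ref() instead
const usersRef = firebase.database().ref('Users');
const q = usersRef.orderByChild('Name').equalTo('abcd');
// Return the promise so the handler waits for the query result
return q.once('value').then(snap => {
// snap.val() is an object (or null), so convert it to text before replying
conv.ask(JSON.stringify(snap.val()));
});
});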
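The difference between returning and not returning the promise can be shown with a tiny model of a dispatcher. The dispatcher awaits whatever the handler returns, then replies with whatever was passed to conv.ask by that point. `runIntent` and `fakeOnce` are hypothetical stand-ins. This is not the actions-on-google implementation, just a sketch of the documented "return a Promise for async work" contract.

```javascript
// Hypothetical dispatcher: await the handler's return value, then "send" a
// snapshot of everything passed to conv.ask so far.
async function runIntent(handler) {
  const said = [];
  const conv = { ask: (text) => said.push(text) };
  await handler(conv);
  return [...said]; // copy: later asks can no longer reach the user
}

// Stand-in for q.once('value'): resolves later with a fake snapshot.
const fakeOnce = () =>
  new Promise((resolve) =>
    setTimeout(() => resolve({ val: () => ({ Name: 'abcd' }) }), 10)
  );

// Promise not returned: the dispatcher replies before the query finishes.
const notReturned = (conv) => {
  fakeOnce().then((snap) => conv.ask(snap.val().Name));
};

// Promise returned: the dispatcher waits, so the answer is included.
const returned = (conv) => fakeOnce().then((snap) => conv.ask(snap.val().Name));
```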
