I am trying to scrape supporters' names from https://www.buymeacoffee.com/singtaousa.
Currently, I can get the total number of supporters using the axios and cheerio modules. The problem is that I can't figure out how to get the supporters' names.
I also tried searching all span elements, but not a single supporter name comes out. I'm not sure whether my code is wrong or whether the names simply can't be retrieved this way.
Here is my code:
import cheerio from 'cheerio'
import axios from 'axios'
export default async function handler(req, res) {
  const { data } = await axios.get('https://www.buymeacoffee.com/singtaousa') // example
  const $ = cheerio.load(data)
  const count = $('.text-fs-16.av-medium.clr-grey.xs-text-fs-14.mg-t-8').text()
  const supporters = []
  // to be changed
  $('span').each((i, element) => {
    const name = $(element).text()
    supporters.push(name)
  })
  res.status(200).json({ count, supporters })
}
The names are added by JavaScript after the page loads, so the HTML that axios receives does not contain them. You need a headless browser such as Puppeteer to get the fully rendered page content. Here is an example for your case using Puppeteer:
import puppeteer from 'puppeteer';

const browser = await puppeteer.launch();

try {
  const [page] = await browser.pages();
  await page.goto('https://www.buymeacoffee.com/singtaousa');

  const namesMinimum = 20;
  const nameSelector = 'div.supp-wrapper span.av-heavy';
  const moreSelector = 'button#load-more-recent';

  await page.waitForSelector(moreSelector);

  // Keep clicking "load more" until at least namesMinimum names are on the page.
  while (await page.$$eval(nameSelector, names => names.length) < namesMinimum) {
    await Promise.all([
      page.click(moreSelector),
      page.waitForResponse(
        response => response.url().includes('www.buymeacoffee.com')
      ),
    ]);
  }

  // Collect the text of every name element.
  const data = await page.evaluate(() => {
    const names = Array.from(
      document.querySelectorAll('div.supp-wrapper span.av-heavy'),
      span => span.innerText,
    );
    return names;
  });
  console.log(data);
} catch (err) {
  console.error(err);
} finally {
  await browser.close();
}
Not all supporters are loaded at once, so you first need to load the rest, either by clicking the button manually or by running this in the browser console:
document.getElementById("load-more-recent").click();
The request that loads more supporters can be traced in the Network tab of the developer tools. Once everything is loaded, you can copy the list of names from the output of the code below. You can change how the output is assembled or skip null values, but this basic version works:
var supporters = document.querySelectorAll("div.supp-wrapper");
var list = [];
for (var i = 0; i < supporters.length; i++) {
  // Take the first name element inside each supporter block.
  list.push(supporters[i].querySelectorAll("span.av-heavy")[0].textContent.trim());
}
console.log(list);
This script produces:
(10) ['Amy', 'Wong', 'Someone', 'Someone', 'Someone', 'Emily', 'KWONG Wai Oi Anna', 'Simon wong', 'Elaine Liu', 'Someone']
To get all of the supporters' names, you need to load them all with the click script above. Alternatively, you can check the Network tab and call the underlying API request directly.
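The answer mentions that you may want to skip null values in the collected list. A minimal sketch of that clean-up step follows. The helper name cleanNames is hypothetical, and treating 'Someone' as the placeholder for anonymous supporters is an assumption based on the sample output above.

```javascript
// Hypothetical helper: tidy a scraped list of supporter names.
// Assumption: 'Someone' is the site's placeholder for anonymous supporters.
function cleanNames(rawNames, { placeholder = 'Someone' } = {}) {
  return rawNames
    .filter((name) => typeof name === 'string') // drop null/undefined entries
    .map((name) => name.trim())                 // remove surrounding whitespace
    .filter((name) => name !== '' && name !== placeholder);
}

const scraped = ['Amy', ' Wong ', 'Someone', null, 'Emily', ''];
const cleaned = cleanNames(scraped);
console.log(cleaned); // [ 'Amy', 'Wong', 'Emily' ]
```

Pass a different placeholder (or an empty string) if you want to keep the anonymous entries.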
I am trying to store the download link from Firebase Storage in Firestore. I am able to set all the data, but the link is only set the second time I click the "post" button.
I know the issue has to do with asynchronous functions, but I'm not experienced enough to know how to solve the issue.
In the "createPost" function, I am console logging "i am the URL: {url}" and in the "uploadFile" function, I am console logging "look at me {url}" to debug.
I noticed the "I am the URL" outputs nothing and then shortly after, the "look at me" outputs the URL.
setDoc() of course stores the imageLink as an empty string.
What can I do to solve this? Any help would be greatly appreciated or any documentation to help with my understanding of async functions.
Here is my code:
const PostModal = (props) => {
const makeid = (length) => {
var result = '';
var characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
var charactersLength = characters.length;
for ( var i = 0; i < length; i++ ) {
result += characters.charAt(Math.floor(Math.random() * charactersLength));
}
return result;
}
const [descriptionText, setDescriptionText] = useState("");
const [addressText, setAddressText] = useState("");
const [venueText, setVenueText] = useState("");
const [startTimeText, setStartTimeText] = useState("");
const [endTimeText, setEndTimeText] = useState("");
const [shareImage, setShareImage] = useState("");
const [videoLink, setVideoLink] = useState("");
const [assetArea, setAssetArea] = useState("");
const [url, setURL] = useState("");
const { data } = useSession();
const storage = getStorage();
const storageRef = ref(storage, `images/${makeid(5) + shareImage.name}`);
const uploadFile = () => {
if (shareImage == null) return;
uploadBytes(storageRef, shareImage).then( (snapshot) => {
//console.log("Image uploaded")
getDownloadURL(snapshot.ref).then( (URL) =>
{
setURL(URL);
console.log(`look at me: ${URL}`)});
});
}
const createPost = async () => {
var idLength = makeid(25);
const uploadTask = uploadBytesResumable(storageRef, file);
uploadFile()
console.log(`I am the URL: ${url} `)
setDoc(doc(db, "posts", idLength), {
eventDescription: descriptionText,
eventAddress: addressText,
venueName: venueText,
startTime: startTimeText,
endTime: endTimeText,
imageLink: url,
videoLink: videoLink,
username: data.user.name,
companyName: !data.user.company ? "" : data.user.company,
timestamp: Timestamp.now(),
});
}
const handleChange = (e) => {
const image = e.target.files[0];
if(image === '' || image === undefined) {
alert(`not an image, the file is a ${typeof image}`);
return;
}
setShareImage(image);
};
const switchAssetArea = (area) => {
setShareImage("");
setVideoLink("");
setAssetArea(area);
};
const reset = (e) => {
setDescriptionText("");
setAddressText("");
setVenueText("");
setStartTimeText("");
setEndTimeText("");
setShareImage("");
setVideoLink("");
setURL("");
props.handleClick(e);
};
This was taken from a Reddit user who solved my problem. A big thank you to him for taking the time to write out a thoughtful response.
So, you're kinda right that your issue has a bit to do with asynchronicity, but it's actually got nothing to do with your functions being async, and everything to do with how useState works.
Suffice it to say, when you call uploadFile in the middle of your createPost function, on the next line the value of url has not yet changed. This would still be true even if uploadFile were synchronous, because when you call a useState setter function, in this case setURL, the getter value url doesn't change until the next time the component renders.
This actually makes perfect sense if you stop thinking about it as a React component for a moment, and imagine that this was just vanilla JavaScript:
function someFunction() {
  const url = 'https://www.website.com';
  console.log(url);
  anotherFunction();
  yetAnotherFunction();
  evenMoreFunction();
  console.log(url);
}
In this example, would you ever expect the value of url to change? Probably not, since url is declared as const, which means if the code runs literally at all, it's physically impossible for the value of url to change within a single invocation of someFunction.
Functional components and hooks are the same; in a single "invocation" (render) of a functional component, url will have the same value at every point in your code, and it's not until the entire functional component re-renders that any calls to setURL would take effect.
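To make the "one render, one value" idea concrete, here is a toy model in plain JavaScript. It is only a sketch and not how React is actually implemented: storedUrl, setURL, and render are stand-ins written for this illustration, and the URL is made up.

```javascript
// Toy model of component state. NOT React's real implementation.
let storedUrl = ''; // the value React would keep between renders
const setURL = (next) => { storedUrl = next; }; // only visible to the NEXT render

// Each call to render() plays the role of one invocation of the component.
function render() {
  const url = storedUrl; // snapshot taken once per render

  // The handler closes over this render's snapshot of url.
  const createPost = () => {
    setURL('https://example.com/image.png'); // hypothetical URL
    return url; // still this render's value
  };

  return { url, createPost };
}

const firstRender = render();
const seenInsideHandler = firstRender.createPost();
const secondRender = render();

console.log(JSON.stringify(seenInsideHandler)); // ""
console.log(secondRender.url); // https://example.com/image.png
```

The handler created during the first render keeps returning the empty string no matter when setURL runs; only a later render sees the new value.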
This is an extremely common misunderstanding; you're not the first and you won't be the last. Usually, it's indicative of a design flaw in your data flow - why are you storing url in a useState to begin with? If you don't need it to persist across distinct, uncoupled events, it's probably better to treat it like a regular JavaScript value.
Since uploadBytes returns a promise, you could make uploadFile asynchronous as well, and ultimately make uploadFile return the information you need back to createPost, like this:
const uploadFile = async () => {
  if (shareImage == null) return;
  const snapshot = await uploadBytes(storageRef, shareImage);
  // console.log("Image uploaded")
  const URL = await getDownloadURL(snapshot.ref);
  return URL;
};
All I've done here is un-nest your .then calls, pulling the trapped values out into the usable scope of your uploadFile function. Now, you can change that one line of createPost to this:
const url = await uploadFile();
and eliminate your useState altogether.
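Putting the two changes together, the flow might look like the sketch below. So that it runs on its own, uploadBytes, getDownloadURL, and setDoc are replaced by simplified stubs (they are stand-ins, not the Firebase functions), and the storage reference and document path are made-up values.

```javascript
// Simplified stubs standing in for the Firebase calls (assumptions for this sketch).
const uploadBytes = async (storageRef, file) => ({ ref: storageRef });
const getDownloadURL = async (ref) => `https://example.com/${ref.path}`;
const savedDocs = [];
const setDoc = async (docRef, data) => { savedDocs.push({ docRef, data }); };

// Returns the download URL instead of storing it in component state.
const uploadFile = async (storageRef, shareImage) => {
  if (!shareImage) return '';
  const snapshot = await uploadBytes(storageRef, shareImage);
  return getDownloadURL(snapshot.ref);
};

const createPost = async (storageRef, shareImage) => {
  // Wait for the upload to finish and use the returned value directly.
  const url = await uploadFile(storageRef, shareImage);
  await setDoc('posts/demo', { imageLink: url });
  return url;
};

const postPromise = createPost({ path: 'images/abc.png' }, { name: 'abc.png' });
postPromise.then((url) => console.log(url)); // https://example.com/images/abc.png
```

Because createPost awaits uploadFile, the document is only written after the URL is known, so imageLink is never an empty string when a file was selected.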
I just started learning Nuxt 3. In my project I get a list of movies from an API:
<script setup>
const config = useAppConfig();
let page = ref(1);
let year = 2022;
let url = computed(() => `https://api.themoviedb.org/3/discover/movie?api_key=${config.apiKey}&sort_by=popularity.desc&page=${page.value}&year=${year}`);
const { data: list } = await useAsyncData("list", () => $fetch(url.value));
const next = () => {
page.value++;
refreshNuxtData("list");
};
const prev = () => {
if (page.value > 1) {
page.value--;
refreshNuxtData("list");
}
};
</script>
Then I have a page for each movie where I get information about it:
<script setup>
const config = useAppConfig();
const route = useRoute();
const movieId = route.params.id;
const url = `https://api.themoviedb.org/3/movie/${movieId}?api_key=${config.apiKey}`;
const { data: movie } = await useAsyncData("movie", () => $fetch(url));
refreshNuxtData("movie");
</script>
My problem is that when I open a new movie page, I first see information about the previous movie, and only after a second does it change. How can I fix this?
I'm also not sure whether I'm using refreshNuxtData() correctly. If not, can you show me a correct example of working with an API in Nuxt 3?
OP fixed the issue by using
const { data: movie } = await useFetch(url, { key: movieId })
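Because movieId is dynamic, each movie gets its own key, so calls are de-duplicated per movie rather than across all movie pages, as the documentation explains for the key parameter: https://v3.nuxtjs.org/api/composables/use-async-data/#params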
key: a unique key to ensure that data fetching can be properly de-duplicated across requests. If you do not provide a key, then a key that is unique to the file name and line number of the instance of useAsyncData will be generated for you
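One way to picture why a key shared by every movie page can briefly show the previous movie is the toy model below. It is a deliberately simplified sketch of key-based caching, not Nuxt's actual implementation: useKeyedData and fetchMovie are hypothetical names invented for this illustration.

```javascript
// Toy model of key-based caching. NOT Nuxt's actual implementation.
const cache = new Map();

function useKeyedData(key, fetcher) {
  // Start from whatever was last stored under this key (possibly stale).
  const state = { data: cache.has(key) ? cache.get(key) : null };
  const done = fetcher().then((value) => {
    cache.set(key, value);
    state.data = value;
  });
  return { state, done };
}

// Hypothetical fetcher standing in for $fetch(url).
const fetchMovie = (id) => Promise.resolve({ id, title: `Movie ${id}` });

async function demo() {
  const first = useKeyedData('movie', () => fetchMovie(1));
  await first.done;
  // Same static key: the second page starts with movie 1's data.
  const staleStart = useKeyedData('movie', () => fetchMovie(2)).state.data;
  // Key derived from the id: nothing is cached yet, so nothing stale is shown.
  const freshStart = useKeyedData('movie-3', () => fetchMovie(3)).state.data;
  return { staleStart, freshStart };
}

const demoResult = demo();
demoResult.then(({ staleStart, freshStart }) => {
  console.log(staleStart); // { id: 1, title: 'Movie 1' }
  console.log(freshStart); // null
});
```

In this model, deriving the key from movieId means each movie reads and writes its own cache slot, which matches the fix the OP applied.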
With GraphQL and Next.js, I'm trying to retrieve some data from Strapi.
When I try to access this data from another file and display it in the UI, console.log shows Promise {<pending>} instead of the data.
This is what I tried:
sliderAdapter.js
import { fetchSlider } from "./apiClient";
export const sliderAdapter = async (data, locale, url) => {
const sl = await fetchSlider();
const deepDownSlides = sl.data?.slides?.data;
if (deepDownSlides.length > 0) {
const slider = deepDownSlides[0]?.attributes?.slider;
// console.log("slider", slider);
return slider;
}
// This code is working but not properly, just return the data into the console.
return "";
};
fetchSlider, imported from apiClient, is where I put the query.
Next:
import { sliderAdapter } from "../../lib/sliderAdapter";
const Slider = (data) => {
const slide= sliderAdapter(data)
console.log("slide", slide)
If anyone knows or can find the issue, please let me know :)
Your function is asynchronous, so it returns a promise. You have to read the value once the promise has resolved:
sliderAdapter(data).then(slide=>console.log(slide))
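The difference between calling an async function directly and waiting for its result can be seen in a small standalone sketch. Here fetchSlider is a hypothetical stub returning made-up data in the same shape as the question's response, and sliderAdapter is a condensed version of the function above.

```javascript
// Hypothetical stub standing in for the real fetchSlider query.
const fetchSlider = async () => ({
  data: { slides: { data: [{ attributes: { slider: ['a.jpg', 'b.jpg'] } }] } },
});

const sliderAdapter = async () => {
  const sl = await fetchSlider();
  const slides = sl.data?.slides?.data ?? [];
  return slides.length > 0 ? (slides[0]?.attributes?.slider ?? '') : '';
};

// Without await (or .then), you get the promise itself, not the data.
const notAwaited = sliderAdapter();
console.log(notAwaited instanceof Promise); // true

// Inside an async function, await unwraps the resolved value.
const awaited = (async () => {
  const slide = await sliderAdapter();
  console.log(slide); // [ 'a.jpg', 'b.jpg' ]
  return slide;
})();
```

A React component body cannot itself be async, so in a component you would typically resolve the promise in an effect or in a data-fetching function and store or pass the result from there.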
I am new to web scraping and built a simple scraper by following a tutorial. I then wanted to add another feature on my own. On https://old.reddit.com/r/programming/, I am trying to fetch all the bullet points from the 'guidelines' section (on the right side of the page). Right now, my code scrapes everything from 'guidelines', 'info', and 'relatedReddits', but I only want the 'guidelines'. Does anyone know how I can modify my code to access only the first ul tag under the div, rather than all of them? Thanks for stopping by.
const axios = require('axios');
const cheerio = require('cheerio');
const getPostTitles = async () => {
try{
const {data} = await axios.get('https://old.reddit.com/r/programming/');
//console.log(data);
const $ = cheerio.load(data);
const guidelines = [];
const postTitles = [];
// to get text in form of array
$('p.title > a').each((idx, el) => {
const postTitle = $(el).text();
postTitles.push(postTitle);
});
$('.md ul li').each((idx, el) => {
const guideline = $(el).text();
guidelines.push(guideline);
});
console.log(guidelines);
return postTitles;
}
catch(error){
throw error;
}
}
getPostTitles()
.then((postTitles) => console.log(postTitles))
.catch(err => console.log(err));
// Take only the first ul under .md, then collect each of its bullet points.
$('.md').find('ul').first().find('li').each((i, el) => {
  const guideline = $(el).text();
  guidelines.push(guideline);
});
This was the solution, for anyone who comes here looking.
I am developing an e-commerce-like website using Next.js.
I fetch and display a list of products on the /products page. On clicking any product, I navigate to /details/[productId] and fetch that product's details as follows.
// In /details/[productId].js file
export async function getServerSideProps({ params }) {
  const res = await fetch(`https://my-api-url/api/products/${params.productId}`)
  const product = await res.json()
  return {
    props: {
      product
    }
  }
}
Problem
Everything works up to this point. However, I want to reduce the number of database reads, so instead of fetching the product details again on the details page, I plan to reuse the data already fetched on the previous page (/products), which contains the product information. I therefore need a way to pass the product object into getServerSideProps for /details/[productId] (to keep SSR for SEO purposes).
Workaround
One solution I currently have is to stringify the product JSON, pass it as a query parameter, and read it back in getServerSideProps({params, query}). But this clutters the URL in the browser, which doesn't look good at all.
Expectation
Is there any other way to pass the data into getServerSideProps so that it can use the data to render the whole page on the server? Please guide me on how to overcome this issue. Any help would be appreciated.
Thanks in advance.. (:
You can bring in a custom server such as Express, which provides a locals property that is available for the lifetime of your application (app.locals) or of a single request (res.locals).
const next = require('next');
const express = require('express');

const app = next({ dev: process.env.NODE_ENV !== 'production' });
const handle = app.getRequestHandler();
const env = process.env.NODE_ENV || 'dev';

app.prepare().then(() => {
  const server = express();

  server.get('/products', async (req, reply) => {
    // fetchProducts is a hypothetical helper: fetch the products with their details here
    const products = await fetchProducts();
    // app.locals lives for the lifetime of the application
    req.app.locals.products = products;
    return app.render(req, reply, '/path/to/products/page', req.query);
  });

  server.get('/details/:productId', async (req, reply) => {
    const { productId } = req.params;
    const { products = [] } = req.app.locals;
    // Find the product by id (assumes each product has an `id` field)
    // and make it available for this request only via res.locals
    reply.locals.product = products.find((p) => String(p.id) === productId);
    return app.render(req, reply, '/path/to/product/detail/page', req.query);
  });

  server.get('*', (req, reply) => {
    return handle(req, reply);
  });

  server.listen(3000);
});
Keep an eye on how large your product list grows, so that your application does not run out of memory.
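One way to keep such an in-memory store from growing without limit is to cap the number of entries and expire old ones. The sketch below is only an illustration of that idea: createBoundedCache and its maxEntries and ttlMs options are hypothetical names, not part of Express or Next.js.

```javascript
// Hypothetical helper: a size-capped, time-limited in-memory cache.
function createBoundedCache({ maxEntries = 100, ttlMs = 60000 } = {}) {
  const entries = new Map(); // Map keeps insertion order, oldest first

  return {
    set(key, value, now = Date.now()) {
      entries.delete(key); // re-inserting moves the key to the newest position
      entries.set(key, { value, expiresAt: now + ttlMs });
      if (entries.size > maxEntries) {
        // Evict the oldest entry.
        entries.delete(entries.keys().next().value);
      }
    },
    get(key, now = Date.now()) {
      const entry = entries.get(key);
      if (!entry) return undefined;
      if (entry.expiresAt <= now) {
        entries.delete(key); // expired: drop it and report a miss
        return undefined;
      }
      return entry.value;
    },
    get size() {
      return entries.size;
    },
  };
}

const productCache = createBoundedCache({ maxEntries: 2, ttlMs: 1000 });
productCache.set('p1', { name: 'One' }, 0);
productCache.set('p2', { name: 'Two' }, 0);
productCache.set('p3', { name: 'Three' }, 0); // evicts p1
console.log(productCache.get('p1', 10)); // undefined
console.log(productCache.get('p3', 10)); // { name: 'Three' }
console.log(productCache.get('p3', 2000)); // undefined (expired)
```

On a cache miss, the details route would fall back to fetching the product from the API, so the cache only saves reads and never becomes the sole source of data.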
You could also return a cookie containing the list of products on the request for products (See limits for HTTP cookies). Then read that on the product detail page.
When I enter URL http://localhost:3000/blog/wfe436
//getting the meta tags dynamically
export const getServerSideProps = async ({ params }) => {
  // Get external data from the file system, API, DB, etc.
  console.log(params) // here is the data of the url { blogname: 'wfe436' }
  const posts = Data
  // The value of the `props` key will be
  // passed to the `Home` component
  return {
    props: { posts }
  }
}