I am new to web scraping and built a simple web scraper by following a tutorial. I then wanted to add another feature on my own: on https://old.reddit.com/r/programming/, I am trying to fetch all the bullet points from the 'guidelines' section (on the right side of the page). Right now, my code scrapes everything from 'guidelines', 'info', and 'relatedReddits', but I only want the information from 'guidelines'. Does anyone know how I can modify my code to access only the first ul tag under the div, instead of all of them? Thanks for stopping by.
const axios = require('axios');
const cheerio = require('cheerio');
const getPostTitles = async () => {
  try {
    const { data } = await axios.get('https://old.reddit.com/r/programming/');
    //console.log(data);
    const $ = cheerio.load(data);
    const guidelines = [];
    const postTitles = [];
    // to get text in form of array
    $('p.title > a').each((idx, el) => {
      const postTitle = $(el).text();
      postTitles.push(postTitle);
    });
    $('.md ul li').each((idx, el) => {
      const guideline = $(el).text();
      guidelines.push(guideline);
    });
    console.log(guidelines);
    return postTitles;
  }
  catch (error) {
    throw error;
  }
}
getPostTitles()
  .then((postTitles) => console.log(postTitles))
  .catch(err => console.log(err));
This was the solution, for anyone who comes here looking:
// first() narrows the match to the first <ul> only
$('.md').find('ul').first().each((i, el) => {
  const guideline = $(el).text();
  guidelines.push(guideline);
});
I am trying to store the download link from Firebase Storage in Firestore. All the other data is set correctly, but the link is only set the second time I click the "post" button.
I know the issue has to do with asynchronous functions, but I'm not experienced enough to know how to solve the issue.
In the "createPost" function, I am console logging "i am the URL: {url}" and in the "uploadFile" function, I am console logging "look at me {url}" to debug.
I noticed the "I am the URL" outputs nothing and then shortly after, the "look at me" outputs the URL.
setDoc() of course stores the imageLink as an empty string.
What can I do to solve this? Any help, or any documentation that would improve my understanding of async functions, would be greatly appreciated.
Here is my code:
const PostModal = (props) => {
const makeid = (length) => {
var result = '';
var characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
var charactersLength = characters.length;
for ( var i = 0; i < length; i++ ) {
result += characters.charAt(Math.floor(Math.random() * charactersLength));
}
return result;
}
const [descriptionText, setDescriptionText] = useState("");
const [addressText, setAddressText] = useState("");
const [venueText, setVenueText] = useState("");
const [startTimeText, setStartTimeText] = useState("");
const [endTimeText, setEndTimeText] = useState("");
const [shareImage, setShareImage] = useState("");
const [videoLink, setVideoLink] = useState("");
const [assetArea, setAssetArea] = useState("");
const [url, setURL] = useState("");
const { data } = useSession();
const storage = getStorage();
const storageRef = ref(storage, `images/${makeid(5) + shareImage.name}`);
const uploadFile = () => {
if (shareImage == null) return;
uploadBytes(storageRef, shareImage).then( (snapshot) => {
//console.log("Image uploaded")
getDownloadURL(snapshot.ref).then( (URL) =>
{
setURL(URL);
console.log(`look at me: ${URL}`)});
});
}
const createPost = async () => {
var idLength = makeid(25);
const uploadTask = uploadBytesResumable(storageRef, file);
uploadFile()
console.log(`I am the URL: ${url} `)
setDoc(doc(db, "posts", idLength), {
eventDescription: descriptionText,
eventAddress: addressText,
venueName: venueText,
startTime: startTimeText,
endTime: endTimeText,
imageLink: url,
videoLink: videoLink,
username: data.user.name,
companyName: !data.user.company ? "" : data.user.company,
timestamp: Timestamp.now(),
});
}
const handleChange = (e) => {
const image = e.target.files[0];
if(image === '' || image === undefined) {
alert(`not an image, the file is a ${typeof image}`);
return;
}
setShareImage(image);
};
const switchAssetArea = (area) => {
setShareImage("");
setVideoLink("");
setAssetArea(area);
};
const reset = (e) => {
setDescriptionText("");
setAddressText("");
setVenueText("");
setStartTimeText("");
setEndTimeText("");
setShareImage("");
setVideoLink("");
setURL("");
props.handleClick(e);
};
This was taken from a Reddit user who answered my question. Big thank you to him for taking the time to write out a thoughtful response.
So, you're kinda right that your issue has a bit to do with asynchronicity, but it's actually got nothing to do with your functions being async, and everything to do with how useState works.
Suffice it to say, when you call uploadFile in the middle of your createPost function, on the next line the value of url has not yet changed. This would still be true even if uploadFile were synchronous, because when you call a useState setter function, in this case setURL, the getter value url doesn't change until the next time the component renders.
This actually makes perfect sense if you stop thinking about it as a React component for a moment, and imagine that this was just vanilla JavaScript:
function someFunction() {
  const url = 'https://www.website.com';
  console.log(url);
  anotherFunction();
  yetAnotherFunction();
  evenMoreFunction();
  console.log(url);
}
In this example, would you ever expect the value of url to change? Probably not, since url is declared as const, which means if the code runs literally at all, it's physically impossible for the value of url to change within a single invocation of someFunction.
Functional components and hooks are the same; in a single "invocation" (render) of a functional component, url will have the same value at every point in your code, and it's not until the entire functional component re-renders that any calls to setURL would take effect.
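To make that concrete, here is a toy model in plain JavaScript. It is only a sketch of the idea, not how React is implemented: `render` and `storedUrl` are hypothetical stand-ins for a component render and the state React keeps between renders.

```javascript
// Toy model: state lives outside the "component" and is read once per render.
let storedUrl = "";
const setURL = (next) => { storedUrl = next; };

// Each call to render() captures its own constant `url`.
const render = () => {
  const url = storedUrl; // snapshot of the state for this render
  const handler = () => {
    setURL("https://example.invalid/image.png");
    return url; // still the value captured by this render
  };
  return { url, handler };
};

const first = render();
const seenInsideHandler = first.handler(); // value seen right after calling setURL
const second = render();                   // the next render picks up the new value

console.log(JSON.stringify(seenInsideHandler), second.url);
```

The handler created by the first render keeps seeing the empty string even after calling `setURL`; only the second render sees the new value.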
This is an extremely common misunderstanding; you're not the first and you won't be the last. Usually, it's indicative of a design flaw in your data flow - why are you storing url in a useState to begin with? If you don't need it to persist across distinct, uncoupled events, it's probably better to treat it like a regular JavaScript value.
Since uploadBytes returns a promise, you could make uploadFile asynchronous as well, and ultimately make uploadFile return the information you need back to createPost, like this:
const uploadFile = async () => {
  if (shareImage == null) return;
  const snapshot = await uploadBytes(storageRef, shareImage);
  // console.log("Image uploaded")
  const URL = await getDownloadURL(snapshot.ref);
  return URL;
};
All I've done here is un-nest your .then calls, pulling the trapped values out into the usable scope of your uploadFile function. Now, you can change that one line of createPost to this:
const url = await uploadFile();
and eliminate your useState altogether.
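Put together, the flow might look like the sketch below. The Firebase calls are replaced with hypothetical stand-ins (`fakeUploadBytes`, `fakeGetDownloadURL`, `fakeSetDoc`) so that the snippet runs on its own in Node; in the real component they would be `uploadBytes`, `getDownloadURL`, and `setDoc`.

```javascript
// Hypothetical stand-ins for the Firebase SDK calls, so this runs without Firebase.
const fakeUploadBytes = async (path, file) => ({ ref: `${path}/${file.name}` });
const fakeGetDownloadURL = async (ref) => `https://example.invalid/${ref}`;
const savedDocs = [];
const fakeSetDoc = async (doc) => { savedDocs.push(doc); };

// Returns the download URL instead of storing it in component state.
const uploadFile = async (file) => {
  if (!file) return "";
  const snapshot = await fakeUploadBytes("images", file);
  return fakeGetDownloadURL(snapshot.ref);
};

// Waits for the URL before writing the document, so imageLink is filled in.
const createPost = async (file, description) => {
  const url = await uploadFile(file);
  await fakeSetDoc({ eventDescription: description, imageLink: url });
  return url;
};

const postPromise = createPost({ name: "poster.png" }, "Launch party");
postPromise.then((url) => console.log(`I am the URL: ${url}`));
```

Because `createPost` awaits `uploadFile`, the document is only written once the URL is known, which removes the need for a second click.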
hope you have an amazing day!
I'm new to Next.js and I have a problem mapping my Firebase documents: I need to show 5 cards on my page, but I am getting all the documents that I have.
This is my call ref:
const [posts, setPosts] = useState([]);
useEffect(() => {
(async () => {
const callref = collection(db, "posts", limit(3));
const snapshots = await getDocs(callref);
if (posts.length < 5) {
const docs = snapshots.docs.map(doc => {
const data = doc.data()
data.id = doc.id
return data
});
setPosts(docs);
console.log(docs)
} else {
return;
}
})()
}, [])
And this is the map code in my page:
{
posts.map((post) => (
<div className="contProduct">
<div className="imageProduct">
<img src={post.image}></img>
</div>
<div className="descriptionProduct">
<h3>{post.title}</h3>
<p>{post.subtitle}</p>
</div>
</div>
))
}
I tried adding an if statement, but it gives me an error. I don't know how to show only five posts.
Thank you for your time and I hope you can help me.
I'm not sure I understand the question correctly, but maybe what you need is Firestore's limit() function. In the modular SDK it is passed to query() rather than to collection() (query and limit are both imported from "firebase/firestore"), like this:
const callref = query(collection(db, "posts"), limit(5));
const snapshots = await getDocs(callref);
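If the documents are already loaded, another option is to trim the array on the client before rendering. A minimal sketch in plain JavaScript, where `firstN` is a hypothetical helper and the sample posts are made up:

```javascript
// Hypothetical helper: keep at most `n` items without mutating the input.
const firstN = (items, n) => items.slice(0, n);

// Made-up sample data standing in for the mapped Firestore documents.
const allPosts = Array.from({ length: 8 }, (_, i) => ({
  id: `post-${i + 1}`,
  title: `Post ${i + 1}`,
}));

const visiblePosts = firstN(allPosts, 5);
console.log(visiblePosts.map((post) => post.id));
```

In the component this would amount to `posts.slice(0, 5).map(...)`. Limiting in the query is probably still preferable, since fewer documents are read.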
I just started learning Nuxt3. In my project I get list of movies from an API:
<script setup>
const config = useAppConfig();
let page = ref(1);
let year = 2022;
let url = computed(() => `https://api.themoviedb.org/3/discover/movie?api_key=${config.apiKey}&sort_by=popularity.desc&page=${page.value}&year=${year}`);
const { data: list } = await useAsyncData("list", () => $fetch(url.value));
const next = () => {
page.value++;
refreshNuxtData("list");
};
const prev = () => {
if (page.value > 1) {
page.value--;
refreshNuxtData("list");
}
};
</script>
Then I have a page for each movie where I get information about it:
<script setup>
const config = useAppConfig();
const route = useRoute();
const movieId = route.params.id;
const url = `https://api.themoviedb.org/3/movie/${movieId}?api_key=${config.apiKey}`;
const { data: movie } = await useAsyncData("movie", () => $fetch(url));
refreshNuxtData("movie");
</script>
My problem is that when I open a new movie page, I see information about the old one, but after a second it changes. How can I fix it?
And I have doubts if I'm using refreshNuxtData() correctly. If not, can you show me the correct example of working with API in Nuxt3?
OP fixed the issue by using
const { data: movie } = await useFetch(url, { key: movieId })
Because movieId is dynamic, each movie page gets its own key, so the calls are de-duplicated per movie, as explained in the documentation for the key parameter: https://v3.nuxtjs.org/api/composables/use-async-data/#params
"key: a unique key to ensure that data fetching can be properly de-duplicated across requests. If you do not provide a key, then a key that is unique to the file name and line number of the instance of useAsyncData will be generated for you"
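To see why a shared key can show the previous movie first, here is a toy model of a key-based cache. It is only an illustration and not Nuxt's actual implementation; `cachedFetch` and `fetchMovie` are hypothetical.

```javascript
// Toy cache keyed by string; NOT how Nuxt is implemented, just the idea.
const cache = new Map();

// Returns the cached value for `key` if present, otherwise calls `fetcher` and stores the result.
const cachedFetch = (key, fetcher) => {
  if (!cache.has(key)) cache.set(key, fetcher());
  return cache.get(key);
};

// Hypothetical synchronous fetcher standing in for $fetch.
const fetchMovie = (id) => ({ id, title: `Movie ${id}` });

// Static key: the second page gets the first page's data back.
const staleA = cachedFetch("movie", () => fetchMovie(1));
const staleB = cachedFetch("movie", () => fetchMovie(2));

// Key that includes the id: each page gets its own entry.
const freshA = cachedFetch("movie-1", () => fetchMovie(1));
const freshB = cachedFetch("movie-2", () => fetchMovie(2));

console.log(staleB.id, freshB.id);
```

With the static key, the second lookup returns movie 1; with a key derived from the id, it returns movie 2.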
I am trying to scrape supporters' names from this website: https://www.buymeacoffee.com/singtaousa.
Currently, I am able to get the total number of supporters using the axios and cheerio modules. The problem is that I can't figure out how to get the supporters' names.
I also tried selecting every span, but not a single supporter's name comes out. I am not sure whether my code is wrong or the names simply cannot be retrieved this way.
Here is my code:
import cheerio from 'cheerio'
import axios from 'axios'
export default async function handler(req, res) {
const { data } = await axios.get('https://www.buymeacoffee.com/singtaousa') // example
const $ = cheerio.load(data)
const count = $('.text-fs-16.av-medium.clr-grey.xs-text-fs-14.mg-t-8').text()
const supporters = []
// to be change
$('span').each((i, element) => {
const name = $(element).text()
supporters.push(name)
})
res.status(200).json({ count, supporters })
}
The names are added by JavaScript, so you need something like puppeteer or any other headless browser runner to get full-fledged script-based page content. Here is an example for your case using puppeteer:
import puppeteer from 'puppeteer';
const browser = await puppeteer.launch();
try {
const [page] = await browser.pages();
await page.goto('https://www.buymeacoffee.com/singtaousa');
const namesMinimum = 20;
const nameSelector = 'div.supp-wrapper span.av-heavy';
const moreSelector = 'button#load-more-recent';
await page.waitForSelector(moreSelector);
while (await page.$$eval(nameSelector, names => names.length) < namesMinimum) {
await Promise.all([
page.click(moreSelector),
page.waitForResponse(
response => response.url().includes('www.buymeacoffee.com')
),
]);
}
const data = await page.evaluate(() => {
const names = Array.from(
document.querySelectorAll('div.supp-wrapper span.av-heavy'),
span => span.innerText,
);
return names;
});
console.log(data);
} catch (err) { console.error(err); } finally { await browser.close(); }
You will need to load all the supporters first, either by running the line below from the console or by clicking the button manually, because they are not all loaded at once:
document.getElementById("load-more-recent").click();
The request that loads more supporters can be traced in the Network tab of the developer tools. After loading them all, you can copy the list of names from the output of the code below. You can change how the output is joined, or skip null values, but basically this works:
// querySelectorAll works in the console whether or not the page defines $
var supporters = document.querySelectorAll("div.supp-wrapper");
var list = [];
for (var i = 0; i < supporters.length; i++) {
  list.push(supporters[i].querySelectorAll("span.av-heavy")[0].textContent.trim());
}
console.log(list);
This script outputs:
(10) ['Amy', 'Wong', 'Someone', 'Someone', 'Someone', 'Emily', 'KWONG Wai Oi Anna', 'Simon wong', 'Elaine Liu', 'Someone']
To get all of the supporters' names, you need to load them all with the click script above. Otherwise, you can check the Network tab and use the API request directly.
Sometimes the app works well,
but sometimes, and I don't know why, I get a blank screen instead of the data from Firebase.
After I reopen the app, it works.
For example,
here is one of my pages:
useEffect( () => {
const subscriber = firestore()
.collection('Trails')
.onSnapshot(querySnapshot => { //const querySnapshot = await firebase.firestore().collection('users').get();
const trails = [];
console.log('subscribe')
if (querySnapshot)
querySnapshot.forEach(async documentSnapshot => {
trails.push({
...documentSnapshot.data(),
key: documentSnapshot.id,
});
console.log("trails test", trails)
});
setTrails(trails);
setLoading(false);
});
return () => {subscriber()};
}, []);
I use useEffect to get the data from the DB and then show it, and the same thing happens: sometimes it gives me a blank screen and sometimes it works well.
I want to publish the app, but I'm not happy with this bug.
Sorry for my English; I don't know how to describe this problem better.
Can anyone please guide me through this? Maybe my useEffect is not working correctly?
I think you should use debugging; see the React Native documentation and the related Stack Overflow question.
I think there's an issue with the return in useEffect: the returned function is called when the component unmounts. Here's an example of how I handle async data fetching:
...
const [data, setData] = useState([]);
const isCurrentView = useRef(true);
useEffect(() => {
  isCurrentView.current = true;
  const asyncFetch = async () => {
    const response = await fetch(...something) // here some asynchronous call (axios, fetch, ...)
    // only update state if the component is still mounted when the call finishes
    if (isCurrentView.current === true) {
      setData([...response]);
    }
  };
  asyncFetch();
  return () => {
    // runs on unmount, so late responses are ignored
    isCurrentView.current = false;
  };
}, []);
...
I'm not 100% sure this is the very best approach, but I have seen similar code in places, so I adopted it.
The problem was solved:
setTrails was in the wrong scope and kept refreshing with empty data.
querySnapshot.forEach(async documentSnapshot => {
trails.push({
...documentSnapshot.data(),
key: documentSnapshot.id,
});
setTrails(trails); // <<<~~~~ put the set in this scope.
});
setLoading(false);
});
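The mapping from snapshot to array can also be pulled out into a small pure function, which makes it easier to check what ends up in state. This is a sketch that assumes only that the snapshot exposes `forEach` and that each document has `id` and `data()`, as in the code above; `toTrails` and the fake snapshot are hypothetical.

```javascript
// Hypothetical helper: turn a snapshot-like object into a fresh array of trails.
const toTrails = (querySnapshot) => {
  const trails = [];
  if (!querySnapshot) return trails; // a missing snapshot yields an empty list
  querySnapshot.forEach((documentSnapshot) => {
    trails.push({ ...documentSnapshot.data(), key: documentSnapshot.id });
  });
  return trails;
};

// Fake snapshot standing in for Firestore's QuerySnapshot.
const fakeSnapshot = {
  forEach(callback) {
    [
      { id: "a1", data: () => ({ name: "Ridge Loop" }) },
      { id: "b2", data: () => ({ name: "River Walk" }) },
    ].forEach(callback);
  },
};

const trails = toTrails(fakeSnapshot);
console.log(trails);
```

Inside `onSnapshot`, the result could then be passed to `setTrails` in a single call.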