Problem scraping a URL that has src=embed#async_embed using cheerio

Hello devs,
I am trying to scrape some COVID-19 data for my country from the following page
https://e.infogram.com/dab81851-e3af-4767-b1f5-9b54eb900274?parent_url=https%3A%2F%2Festadisticas.pr%2Fen%2Fcovid-19&src=embed#async_embed
using the cheerio library, but apparently I cannot access the data.
If there is a way to access the data, I would appreciate any pointers.
index.js
const cheerio = require('cheerio');
const axios = require('axios').default;
const main = async () => {
  const url = 'https://e.infogram.com/dab81851-e3af-4767-b1f5-9b54eb900274?parent_url=https%3A%2F%2Festadisticas.pr%2Fen%2Fcovid-19&src=embed#async_embed';
  // axios.get already issues a GET request, so no method option is needed
  const { data } = await axios.get(url);
  const $ = cheerio.load(data);
  console.log($.html());
};
main();

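The data is embedded in the page as a JSON blob assigned to window.infographicData, so you can pull it out of the raw HTML with a regex instead of cheerio:
let match = data.match(/window\.infographicData=(\{.*?\});/)
let parsed = JSON.parse(match[1])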
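As a minimal sketch of the same idea, the snippet below runs a similar regex against a made-up HTML string. The sample markup, the helper name extractInfographicData, and the extra anchoring on the closing script tag are assumptions for illustration, not Infogram's actual page layout.

```javascript
// Hypothetical helper: pull the JSON assigned to window.infographicData out of raw HTML.
function extractInfographicData(html) {
  // Non-greedy match that must end at "};" right before the closing script tag (assumed layout).
  const match = html.match(/window\.infographicData\s*=\s*(\{.*?\});\s*<\/script>/s);
  if (!match) return null; // the blob is missing or the page layout differs
  return JSON.parse(match[1]);
}

// Made-up sample page, used only to exercise the helper.
const sampleHtml =
  '<html><body><script>window.infographicData={"title":"demo","values":[1,2,3]};</script></body></html>';

const parsedSample = extractInfographicData(sampleHtml);
console.log(parsedSample.title, parsedSample.values.length);
```

Returning null when nothing matches avoids the TypeError that match[1] would throw if the page ever stops containing the blob.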


Vue 3 - onMounted doesn't load array

In Vue 3 I need to fill some arrays with the results from a store. I import the store like this:
import { onMounted, ref, watch } from "vue";
import { useTableStore } from "../stores/table";
Then I declare the values and try to fill them:
const search = ref(null);
const searchInput = ref("");
const edition = ref([]);
const compilation = ref([]);
const debug = ref([]);
const navigation = ref([]);
const refactoring = ref([]);
const store = useTableStore();
onMounted(() => {
  store.fetchTable();
  edition.value = store.getEdition;
  compilation.value = store.getCompilation;
  debug.value = store.getDebug;
  navigation.value = store.getNavigation;
  refactoring.value = store.getRefactoring;
});
The values are not filled. Strangely, if I use a watcher like this
edition.value = store.getEdition.filter((edition: String) => {
  for (let key in edition) {
    if (
      edition[key].toLowerCase().includes(searchInput.value.toLowerCase())
    ) {
      return true;
    }
  }
});
the array gets its values.
So the question is: how can I get the store values when the view loads?
Maybe the problem is that the store returns a Proxy object...
UPDATE 1
I created a gist with full code
https://gist.github.com/ElHombreSinNombre/4796da5bcdcf6bf4f36f009132dd9f48
UPDATE 2
Pinia loads the array data, but 'setup' can't get it.
UPDATE 3: SOLUTION
I finally resolved the problem and uploaded the project to my GitHub. I used computed so the data stays up to date. Maybe another solution would have been better.
https://github.com/ElHombreSinNombre/vue-shortcuts
Your onMounted callback needs to be async, and you need to await the fetchTable call; otherwise, assuming fetchTable returns a promise, the getters are read before the fetch has finished. Edit: Try using reactive instead of ref for your arrays. A common rule of thumb is ref for primitive values and reactive for objects and arrays.
import { onMounted, reactive, ref } from "vue"; // reactive has to be imported as well
const search = ref(null);
const searchInput = ref("");
const edition = reactive([]);
const compilation = reactive([]);
const debug = reactive([]);
const navigation = reactive([]);
const refactoring = reactive([]);
const store = useTableStore();
onMounted(async () => {
  // wait for the fetch to finish before reading the getters
  await store.fetchTable();
  edition.push(...store.getEdition);
  compilation.push(...store.getCompilation);
  debug.push(...store.getDebug);
  navigation.push(...store.getNavigation);
  refactoring.push(...store.getRefactoring);
});
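To see why the await matters, here is a minimal plain-JavaScript sketch with no Vue or Pinia involved. The fakeStore object and its 10 ms delay are assumptions standing in for the real store, whose fetchTable presumably returns a promise.

```javascript
// Hypothetical stand-in for the Pinia store: fetchTable fills `edition` asynchronously.
const fakeStore = {
  edition: [],
  fetchTable() {
    return new Promise((resolve) => {
      setTimeout(() => {
        this.edition = ['copy', 'paste'];
        resolve();
      }, 10);
    });
  },
};

// Without await: the value is read before the fetch has finished.
function loadWithoutAwait(store) {
  store.fetchTable();
  return store.edition.length;
}

// With await: the value is read only after the fetch has resolved.
async function loadWithAwait(store) {
  await store.fetchTable();
  return store.edition.length;
}

const lengthBefore = loadWithoutAwait(fakeStore);
console.log('without await:', lengthBefore);
loadWithAwait(fakeStore).then((lengthAfter) => console.log('with await:', lengthAfter));
```

The first call returns while the array is still empty, which mirrors what happens in the original onMounted callback.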
If you need the component not to render until the data is ready, add a flag that works together with a v-if so the component renders only once everything is loaded, something like this:
<!-- in your template -->
<div v-if="dataReady">
  <!-- your html code -->
</div>
// inside your script
const dataReady = ref(false);
onMounted(async () => {
  await store.fetchTable();
  dataReady.value = true;
});

How to export data from a Puppeteer script?

I have this script that collects titles from a news website.
The result of the scraping is pushed into the initially empty array x.
const puppeteer = require('puppeteer');
export let x = []
async function scrapeNewsTitles(url){
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url);
  const [el] = await page.$x('/html/body/main/div[1]/div[1]/article/figure/a/img');
  const src = await el.getProperty('src');
  const srcTxt = await src.jsonValue();
  console.log(srcTxt);
  const [el2] = await page.$x('/html/body/main/div[1]/div[1]/article/div/h1/a');
  const txt = await el2.getProperty('textContent');
  const rawTxt = await txt.jsonValue();
  const newArticle = {srcTxt, rawTxt};
  x.push(newArticle);
  browser.close();
  console.log(x)
}
scrapeNewsTitles('https://www.lmneuquen.com');
What I want now is to export the x array, which contains the collected data, so I can use it in another script. The problem is that if I do this...
export let x = []
and then I import it into another file like this...
import {x} from './file.js'
...it gives me the following error:
SyntaxError: Cannot use import statement outside a module
Would you point me in the right direction to do it?
Thank you in advance! Have a nice day.
Instead of "export let x = []", use "module.exports".
So, change the declaration to:
let x = []
and at the end of the file, write
module.exports = { x };
When you import this array in the other file, use
const { x } = require("./index.js"); // replace index.js with the name of your first file
console.log(x);
The reason is that "export" and "import ... from" are ES module syntax, while Node.js treats a .js file as a CommonJS module by default (unless package.json sets "type": "module" or the file uses the .mjs extension), and CommonJS uses module.exports and require instead. Keep in mind that x is filled only after scrapeNewsTitles finishes, so the importing file will see an empty array if it reads x immediately.

firebase cloud functions: thumbnail creation with sharp library is not generating access token

I am generating thumbnail images with sharp library as follows:
const filePath128 = path.join(path.dirname(filePath), `${_128_PREFIX}${fileName}`);
const filePath512 = path.join(path.dirname(filePath), `${_512_PREFIX}${fileName}`);
const filePath1024 = path.join(path.dirname(filePath), `${fileName}`);
const uploadStream128 = bucket.file(filePath128).createWriteStream({metadata});
const uploadStream512 = bucket.file(filePath512).createWriteStream({metadata});
const uploadStream1024 = bucket.file(filePath1024).createWriteStream({metadata});
const pipeline128 = sharp();
pipeline128.resize(_128_MAX_WIDTH, _128_MAX_HEIGHT).pipe(uploadStream128);
const pipeline512 = sharp();
pipeline512.resize(_512_MAX_WIDTH, _512_MAX_HEIGHT).pipe(uploadStream512);
const pipeline1024 = sharp();
pipeline1024.resize(_1024_MAX_WIDTH, _1024_MAX_HEIGHT).pipe(uploadStream1024);
bucket.file(filePath).createReadStream().pipe(pipeline128);
bucket.file(filePath).createReadStream().pipe(pipeline512);
bucket.file(filePath).createReadStream().pipe(pipeline1024);
return new Promise((resolve, reject) => {
  let finishCount = 0;
  const checkFinish = (ee: any) => {
    finishCount++;
    console.log('finishCount: ', finishCount, ee);
    if (finishCount === 3) { resolve(); }
  }
  uploadStream128.on('finish', checkFinish).on('error', checkFinish);
  uploadStream512.on('finish', checkFinish).on('error', checkFinish);
  uploadStream1024.on('finish', checkFinish).on('error', checkFinish);
});
All the thumbnail images are generated without any problem, but no access token is created for them by default. In the Firebase console, I can't even view the generated image files. Once I click the "Create new access token" link in the console, the image becomes viewable.
Any help on how to tackle this situation?
Thanks.
This currently can't be solved from Cloud Functions. You can create a download URL programmatically from a Firebase client app (which will create the token), but not from backend code.

Filtering data by child for reading || React Native || Rest Api

I am using React Native with Firebase, where I store my data like this:
infographics-
  mdhd98djj24340-
    cat:'health',
    imgUrl:'something....'
  mdhd98djj24340-
    cat:'Education',
    imgUrl:'something....'
I want to read the data by a specific cat; for example, for one screen I want to fetch all entries where cat is equal to health. I am sending the request like this, but it's not showing anything.
const fetchData = async () => {
  const req = await fetch(
    'https://stratic-research-institute.firebaseio.com/infographics.json?orderBy="cat"&startAt="d"&endAt="e"'
  );
  const res = await req.json();
  const vl = Object.keys(res);
  const loadedData = [];
  vl.map((item) => loadedData.push(res[item]));
  setstate({
    info: loadedData.reverse(),
    loading: false,
  });
};
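For an exact match on a child key, the Realtime Database REST API accepts an equalTo parameter alongside orderBy, with string values wrapped in double quotes. The sketch below only builds such a URL; the helper name buildCategoryUrl is hypothetical, and it assumes the database rules define an index on cat (".indexOn": "cat"), which REST queries on a child key generally need.

```javascript
// Hypothetical helper: build a REST URL that returns only entries whose `cat` equals `category`.
function buildCategoryUrl(baseUrl, category) {
  const params = new URLSearchParams({
    orderBy: JSON.stringify('cat'), // the REST API expects the key wrapped in double quotes
    equalTo: JSON.stringify(category), // string values are quoted as well
  });
  return `${baseUrl}/infographics.json?${params.toString()}`;
}

const healthUrl = buildCategoryUrl('https://stratic-research-institute.firebaseio.com', 'health');
console.log(healthUrl);
// The request itself would stay as in the question: fetch(healthUrl), then req.json().
```

Matching is case-sensitive, so 'health' and 'Health' are treated as different values.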

cheerioJS not grabbing element from page

I'm having issues grabbing an element with cheerio.
I opened the following website: https://www.voobly.com/ladder/view/Age-of-Mythology-The-Titans/1v1-Supremacy
When I view the page source I can see the id pagebrowser1, but when I try to grab it using cheerio, it returns null.
const axios = require('axios');
const cheerio = require('cheerio');
axios.get('https://www.voobly.com/ladder/view/Age-of-Mythology-The-Titans/1v1-Supremacy').then((res) => {
  const $ = cheerio.load(res.data)
  console.log($("#pagebrowser1").html());
});
