Unable to fetch rendered dom using puppeteer - web-scraping

I have recently been experimenting with Puppeteer to fetch content from webpages, and I have noticed that for certain pages, such as https://www.scmp.com/news/china/diplomacy/article/3174562/china-needs-new-playbook-counter-eus-tougher-trade-and?module=lead_hero_story&pgtype=homepage, I am unable to fetch the actual DOM that is rendered when the link is opened in a browser. The snippet I use to fetch the content is below:
const puppeteer = require('puppeteer');
const fs = require('fs');
(async () => {
const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.goto('https://www.scmp.com/news/china/diplomacy/article/3174562/china-needs-new-playbook-counter-eus-tougher-trade-and?module=lead_hero_story&pgtype=homepage', { waitUntil: 'domcontentloaded', timeout: 60000 });
const data = await page.evaluate(() => document.querySelector('*').outerHTML);
await fs.writeFile("./test.html", data, err => {
if (err) {
console.error(err)
return
}
//file written successfully
});
console.log(data);
await browser.close();
})();
I already tried a waitUntil value of networkidle0 but still could not extract the expected DOM. What am I doing wrong here?

Related

expo react native upload image to firebase storage

I am trying to upload an image from the phone library with expo-image-picker to Firebase Storage and get the download URL to save in Firestore, but for some reason my app keeps crashing (on iPhone) without any error. I have tried everything I could think of to fix this (running my code line by line, etc.), but nothing has worked so far.
Has anyone encountered a similar issue and could help me with this particular problem? I have been stuck for a few days now. It would be a big help. Thank you in advance.
Here is my code:
Turning the image into a blob. At first I used the fetch method, but this seems to work better.
const urlToBlob = async (url) => {
return await new Promise((resolve, reject) => {
var xhr = new XMLHttpRequest();
xhr.onerror = reject;
xhr.onreadystatechange = () => {
if (xhr.readyState === 4) {
resolve(xhr.response);
}
};
xhr.open("GET", url);
xhr.responseType = "blob"; // convert type
xhr.send();
});
};
Uploading an image to storage. Sometimes it uploads but if you upload again it crashes.
const uploadImageAsync = async (imageUri) => {
let blob;
const imageRef = imageUri.substring(imageUri.lastIndexOf("/"));
try {
blob = await urlToBlob(imageUri);
const ref = await firebase.storage().ref().child(imageRef);
await ref.put(blob);
return await ref.getDownloadURL();
} catch (error) {
console.log(
"🚀 ~ file: eventServices.jsx ~ line 33 ~ createEvent ~ error",
error
);
} finally {
blob.close();
console.log("blob closed");
}
};
Here I get the image and pass it to my function, which should return the URL of the image. The URL should then be saved in Firestore.
export const createEvent = async (eventObj) => {
const imageUri = eventObj.image;
try {
const downloadUrl = uploadImageAsync(imageUri);
console.log(downloadUrl );
await firebase
.firestore()
.collection("events")
.add({ ...eventObj, image: downloadUrl });
console.log("Event added!");
} catch (error) {
console.log(
"🚀 ~ file: eventServices.jsx ~ line 62 ~ createEvent ~ error",
error
);
}
};
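One thing worth checking in the snippet above: createEvent calls uploadImageAsync(imageUri) without await, so downloadUrl is a pending Promise rather than a string at the point where it is passed to Firestore. The sketch below reproduces that behaviour with a hypothetical stand-in, fakeUpload, in place of the real Firebase call. Whether this is what causes the crash is an assumption, not something confirmed here.

```javascript
// Hypothetical stand-in for uploadImageAsync: resolves to a URL string.
// The example.invalid host is a placeholder, not a real storage URL.
const fakeUpload = async (imageUri) => `https://example.invalid/files${imageUri}`;

// Without await, the variable holds the Promise itself.
const withoutAwait = () => {
  const downloadUrl = fakeUpload("/photo.jpg");
  return downloadUrl instanceof Promise;
};

// With await, the variable holds the resolved string.
const withAwait = async () => {
  const downloadUrl = await fakeUpload("/photo.jpg");
  return typeof downloadUrl;
};
```

In the question's code this would correspond to writing const downloadUrl = await uploadImageAsync(imageUri); inside createEvent.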

Download image from firebase storage and add to jszip using node.js cloud function

I've been trying various approaches on this for a few days and am hitting a wall. I've got images stored in firebase storage that I want to add to a zip file that gets emailed out with some other forms. I've tried quite a few iterations but, while the jpeg file gets added to the outputted zip, it's not able to be opened by any application.
Here is my latest iteration:
exports.sendEmailPacket = functions.https.onRequest(async (request, response) => {
const userId = request.query.userId;
const image = await admin
.storage()
.bucket()
.file(`images/${userId}`)
.download();
const zipped = new JSZip();
zipped.file('my-image.jpg', image, { binary: true });
const content = await zipped.generateAsync({ type: 'nodebuffer' });
// this gets picked up by another cloud function that delivers the email
await admin.firestore()
.collection("emails")
.doc(userId)
.set({
to: 'myemail@gmail.com',
message: {
attachments: [
{
filename: 'test.mctesty.zip',
content: Buffer.from(content)
}
]
}
});
});
Was able to figure this out after a bit more research:
exports.sendEmailPacket = functions.https.onRequest(async (request, response) => {
const userId = request.query.userId;
const image = await admin
.storage()
.bucket()
.file(`images/${userId}`)
.get(); // get instead of download
const zipped = new JSZip();
zipped.file('my-image.jpg', image[0].createReadStream(), { binary: true }); // from the 'File' type, call .createReadStream()
const content = await zipped.generateAsync({ type: 'nodebuffer' });
// this gets picked up by another cloud function that delivers the email
await admin.firestore()
.collection("emails")
.doc(userId)
.set({
to: 'myemail@gmail.com',
message: {
attachments: [
{
filename: 'test.mctesty.zip',
content: Buffer.from(content)
}
]
}
});
});
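A possible reason the first attempt produced an unreadable file: in the Cloud Storage Node.js client, File.download() resolves to an array whose first element is the contents Buffer, so passing the whole array to zipped.file() does not hand JSZip the raw image bytes. A minimal sketch of unwrapping that array, using a hypothetical fakeFile stand-in rather than a real bucket (the helper name readImageBytes is also made up for illustration):

```javascript
// Hypothetical stand-in for bucket.file(...): download() resolves to [Buffer],
// mirroring the array-shaped response of the storage client.
const fakeFile = {
  download: async () => [Buffer.from([0xff, 0xd8, 0xff, 0xe0])],
};

// Unwraps the first element of the download response.
const readImageBytes = async (file) => {
  const [contents] = await file.download();
  return contents; // a Buffer, which could be passed to zipped.file(name, contents)
};
```

This is an alternative to the createReadStream() approach above, not a replacement for it; both give JSZip a supported input type.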

Clicking a button within an iframe with puppeteer

I am trying to click the "I accept all cookies" button, which is inside an iframe (the popup only shows for IP addresses in EU countries).
You can check here also jsfiddle.net/#&togetherjs=VgKpE0jfJF.
//index.js
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({
headless:false,
ignoreHTTPSErrors: true,
slowMo: 50,
args: ['--window-size=1440,900', '--disable-gpu', "--disable-features=IsolateOrigins,site-per-process", '--blink-settings=imagesEnabled=true']
});
const page = await browser.newPage();
await page.goto('https://www.oracle.com/cloud/cost-estimator.html');
await page.waitFor(3000)
const frame = page.frames().find(f => f.name() === 'iframe');
const acceptBtn = await frame.$(`a[class="call"]`);
await acceptBtn.click();
await page.screenshot({path: 'example.png'});
//await browser.close();
})();
The error I get:
UnhandledPromiseRejectionWarning: TypeError: Cannot read property '$' of undefined
at
Please help. Thanks
As far as I can tell, this iframe has no name in the HTML code, so you can try its src (URL):
const frame = page.frames().find(f => f.url().startsWith('https://consent-pref.trustarc.com/'));
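Because find() returns undefined when nothing matches, it may help to fail with a clear message instead of the "Cannot read property '$' of undefined" error. A small sketch, where findFrameByUrlPrefix is a hypothetical helper name and the frames are anything exposing a url() method, as Puppeteer frames do:

```javascript
// Returns the first frame whose URL starts with `prefix`, or throws a
// descriptive error instead of failing later on `frame.$`.
const findFrameByUrlPrefix = (frames, prefix) => {
  const frame = frames.find((f) => f.url().startsWith(prefix));
  if (!frame) {
    throw new Error(`No frame with URL starting with ${prefix}`);
  }
  return frame;
};

// Usage with Puppeteer (not run here):
// const frame = findFrameByUrlPrefix(page.frames(), 'https://consent-pref.trustarc.com/');
// const acceptBtn = await frame.$(`a[class="call"]`);
```

If the consent iframe loads late, the lookup can still fail right after page.goto, so the error message at least points at the missing frame rather than at the later $ call.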

Expo/Firebase: Image chosen from camera roll uploading as octet-stream instead of .jpg

I've been having trouble viewing the image files I've uploaded to Firebase and just noticed the issue is with the file type in Firebase.
There are two files in my Firebase Storage console: one uploaded from my iOS simulator (octet-stream), and one uploaded directly into the console from the browser, which uploads properly and is viewable.
Here are my select and upload functions:
_selectPhoto = async () => {
const status = await getPermission(Permissions.CAMERA_ROLL);
if (status) {
let imageName = "pic"
const result = await ImagePicker.launchImageLibraryAsync(options);
if (!result.cancelled) {
Animated.timing(this.animatedWidth, {
toValue: 600,
duration: 15000
}).start()
this.uploadImage(result.uri, imageName)
.then(() => {
this.props.navigation.navigate('Profile')
})
.catch((error) => {
Alert.alert('Must Sign In');
this.props.navigation.navigate('Login')
console.log(error);
})
}
}
};
uploadImage = async (uri, imageName) => {
const user = firebase.auth().currentUser;
const response = await fetch(uri);
const blob = await response.blob();
let storageRef = firebase.storage().ref().child('images/' + user.displayName + '/' + imageName + '.jpg');
const snapshot = await storageRef.put(blob);
blob.close();
snapshot.ref.getDownloadURL().then(function(downloadURL) {
console.log("File available at", downloadURL);
user.updateProfile({
photoURL: downloadURL.toString(),
}).then(function() {
console.log('saved photo')
}).catch(function(error) {
console.log('failed photo')
});
});
}
When I get the link in my console, it also has the alt=media and token parameters:
... .appspot.com/o/profile-pic.jpg?alt=media&token=56eb9c36-b5cd-4dbb-bec1-3ea5c3a74bdd
If I CMD+Click in VS Code I receive an error:
{
error: {
code: 400,
message: "Invalid HTTP method/URL pair."
}
}
So naturally, when I put that link in the browser it downloads a file with that name but says:
The file “pic.jpg” could not be opened.
It may be damaged or use a
file format that Preview doesn’t recognize.
Maybe it could be something with mediaTypes, but I'm not exactly sure how to use it.
mediaTypes : String -- Choose what type of media to pick. Usage:
ImagePicker.MediaTypeOptions.<Type>, where <Type> is one of: Images,
Videos, All.
Thanks!
I've been fighting with this same issue for the past few days. I was finally able to get images to upload and render as expected by following the Firebase Upload example in the Expo repo. I don't fully understand why it works, but it seems like Firebase doesn't like the blob that's generated by
const blob = await response.blob();
Try replacing the above with:
const blob = await new Promise((resolve, reject) => {
const xhr = new XMLHttpRequest();
xhr.onload = function() {
resolve(xhr.response);
};
xhr.onerror = function(e) {
console.log(e);
reject(new TypeError('Network request failed'));
};
xhr.responseType = 'blob';
xhr.open('GET', uri, true);
xhr.send(null);
});
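Separately from the blob itself, the octet-stream label in the console is the object's content type, and put() accepts an optional metadata object with a contentType field. The sketch below is an assumption about what might help, not part of the Expo example: contentTypeFor is a hypothetical helper that maps a file extension to a MIME type so it can be passed as metadata.

```javascript
// Hypothetical helper: maps a file name's extension to a MIME type,
// falling back to the generic binary type for unknown extensions.
const MIME_TYPES = new Map([
  ['jpg', 'image/jpeg'],
  ['jpeg', 'image/jpeg'],
  ['png', 'image/png'],
]);

const contentTypeFor = (fileName) => {
  const ext = fileName.slice(fileName.lastIndexOf('.') + 1).toLowerCase();
  return MIME_TYPES.get(ext) || 'application/octet-stream';
};

// Usage with the question's code (not run here):
// const metadata = { contentType: contentTypeFor(imageName + '.jpg') };
// const snapshot = await storageRef.put(blob, metadata);
```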

google places api returns a string, how do I parse to JSON object?

In a small webshop that I am trying to set up, I need to update the opening hours in the background with Firebase functions and Google Place Details when a user creates a shopping cart.
I can successfully send a GET request with Postman to retrieve the opening hours of a shop using the following instructions:
https://developers.google.com/places/web-service/details
But I cannot access the response from the GET request as I usually do with JSON responses.
I also tried: response.result.opening_hours.json()
Can someone tell me what I am doing wrong?
export const mapGooglePlaces = functions.database
.ref('/shopping-carts/{shoppingCartId}/shippingManner')
.onWrite(event => {
const shippingManner = event.data.val();
const optionsAPI = {
method: 'GET',
uri: 'https://maps.googleapis.com/maps/api/place/details/json?placeid=ChIJN1t_tDeuEmsRUsoyG83frY4&key=YOUR_API_KEY',
};
return request(optionsAPI)
.then(response => {
const openingHours = response.result.opening_hours;
console.log(openingHours);
return;
})
.catch(function (err) {
console.log(err);
});
});
The response is not a JSON object. It is JSON formatted text and must be parsed to create an object. Modify the code as follows:
return request(optionsAPI)
.then(response => {
const responseObject = JSON.parse(response);
const openingHours = responseObject.result.opening_hours;
console.log(openingHours);
return;
})
.catch(function (err) {
console.log(err);
});
Also, before using the opening_hours or any other property of result, you should test responseObject.status === 'OK' to confirm that a place was found and at least one result was returned.
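The parse-then-check-status advice can be sketched as a small function. The name extractOpeningHours and the two sample bodies are made up for illustration; the sample bodies are not real API output, only the status and result.opening_hours fields from the text above are assumed.

```javascript
// Parses a Place Details response body (JSON text) and returns
// result.opening_hours, or null when the status is not 'OK' or the
// field is absent.
const extractOpeningHours = (body) => {
  const responseObject = JSON.parse(body);
  if (responseObject.status !== 'OK') {
    return null;
  }
  const result = responseObject.result || {};
  return result.opening_hours || null;
};

// Illustrative sample bodies (made up for this sketch).
const okBody = JSON.stringify({
  status: 'OK',
  result: { opening_hours: { open_now: true } },
});
const notFoundBody = JSON.stringify({ status: 'NOT_FOUND' });
```

Inside the .then(response => ...) handler, the call would be extractOpeningHours(response), with a null result treated as "no opening hours available".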
