I'm scraping a website with Puppeteer using a Nest.js module.
The scraping works the first time I call the GET method, but it won't scrape again after that, which means I have to restart the server for each scrape. That isn't ideal.
Scraping code (Simplified):
@Injectable()
export class AppService {
  constructor(@InjectBrowser() private readonly browser: Browser) {}

  async getFoodoraRestaurantMenus(url: string): Promise<string> {
    console.log("Calling method");
    const page = await this.browser.newPage();
    console.log('Never gets here the second time');
    await page.goto(url);
    const content = await page.evaluate(() => {});
    await this.browser.close();
    return JSON.stringify(content, null, 4);
  }
}
The second time I call getFoodoraRestaurantMenus I never get to the second console log, so a new page is never opened. I'm unsure whether I need to do something more than browser.close(); that was all I could find.
I have no idea why I need to restart the server each time I want to perform a scrape. All help I can get is greatly appreciated.
Thanks.
If you only want to free the memory and reuse the same browser, you could try only closing the page instead of the browser:
await page.close();
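A minimal sketch of that idea, assuming the injected browser stays open for the lifetime of the server and only the page is opened and closed per request. The `PageLike` and `BrowserLike` interfaces and the `scrapeOnce` function are hypothetical names used to keep the example self-contained; Puppeteer's own `Browser` and `Page` should fit these shapes.

```typescript
// Structural stand-ins for the parts of Puppeteer's Browser/Page used here
// (hypothetical names, introduced only for this sketch).
interface PageLike {
  goto(url: string): Promise<unknown>;
  content(): Promise<string>;
  close(): Promise<void>;
}

interface BrowserLike {
  newPage(): Promise<PageLike>;
}

// Open a fresh page per request and always close only that page,
// leaving the shared browser running for the next call.
async function scrapeOnce(browser: BrowserLike, url: string): Promise<string> {
  const page = await browser.newPage();
  try {
    await page.goto(url);
    return await page.content();
  } finally {
    await page.close(); // runs even if goto/content throws
  }
}
```

Putting `page.close()` in a `finally` block means a failed navigation does not leave pages piling up in the shared browser.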
I'm using getServerSideProps to fetch data from an internal API, but the TTFB is really high and my page runs slowly.
So I'm looking for other fetching strategies. My MongoDB data is not large (database size: 33.84KB) and does not change often, so I think the best approach is static generation; only 25 pages in total would be generated. The problem is that getStaticProps() can't fetch the internal API (it works in development, but not in production).
I tried:
useEffect: slower than getServerSideProps
Exporting the MongoDB data to data.js and putting it into the project as a fake API: this works with getStaticProps, but I still want to store the data in the database.
Hosting the API on another domain as an external API: getStaticProps works, but the approach feels weird
Hard-coding all 25 pages (X)
Questions:
How can I improve the code and the TTFB?
Why can't getStaticProps fetch the internal API? Why is it designed like that?
I saw an article on MongoDB, here is the link: just don't use the internal API, and instead fetch the data directly from MongoDB in getStaticProps. Here is my code.
BEFORE
export async function getServerSideProps() {
  const response = await fetch(`${server}/api/gallery`);
  const data = await response.json();
  if (!data) {
    return {
      notFound: true,
    };
  }
  return {
    props: { data },
  };
}
AFTER
export async function getStaticProps() {
  // connect to MongoDB
  await dbConnect()
  // I use a mongoose model to fetch the data
  const gallery = await art.find()
  return {
    props: {
      data: JSON.parse(JSON.stringify(gallery))
    }
  }
}
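One way to picture why the AFTER version works: getStaticProps runs at build time, when the app's own API routes are typically not being served yet, so the query has to be callable as a plain function rather than over HTTP. The sketch below assumes a hypothetical `loadGallery` helper that both an API route and getStaticProps could share; `GalleryItem`, `FindAll`, and `buildStaticProps` are illustrative names, and `findAll` stands in for the mongoose call (`art.find()`).

```typescript
// Hypothetical document shape, for illustration only.
interface GalleryItem {
  id: string;
  title: string;
}

// Stand-in for the database query; in the real app this would wrap `art.find()`.
type FindAll = () => Promise<GalleryItem[]>;

// Shared loader: a plain function with no HTTP round trip,
// so it can run at build time as well as inside an API route.
async function loadGallery(findAll: FindAll): Promise<GalleryItem[]> {
  const items = await findAll();
  // Round-trip through JSON so only serializable values reach the props.
  return JSON.parse(JSON.stringify(items)) as GalleryItem[];
}

// Shape of the object getStaticProps would return.
async function buildStaticProps(findAll: FindAll) {
  return { props: { data: await loadGallery(findAll) } };
}
```

With this split, the API route and the page both call `loadGallery`, and the page no longer depends on the server being reachable during the build.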
I have a basic Next.js application that does two things:
provide an authentication mechanism using Keycloak
talk to a backend server that authorizes each request using the Keycloak access token
I use the @react-keycloak/ssr library to achieve this. The problem is that after I log in and get redirected back to my application, the cookie that contains the kcToken is empty. After I refresh the page, it works as expected.
I understand that maybe my entire process flow is wrong. If so, what is the "usual" way to achieve what is described above?
export async function getServerSideProps(context) {
  const base64KcToken = context.req.cookies.kcToken // the cookie that keycloak places after login
  const kcToken = base64KcToken ? Buffer.from(base64KcToken, "base64") : ""
  // the backend server passes the token along to keycloak for role-based authorization
  const res = await fetch(`${BACKEND_URL}/info`, {
    headers: {
      "Authorization": "Bearer " + kcToken
    }
  })
  const data = await res.json()
  // ... exception handling is left out for readability ...
  return {
    props: {
      data
    }
  }
}

export default function Home({data}) {
  const router = useRouter() // the next.js client side router to redirect to keycloak
  const { keycloak, initialized } = useKeycloak() // keycloak instance configured in _app.js
  if (keycloak && !initialized && keycloak.createLoginUrl) router.push(keycloak.createLoginUrl())
  return (
    <div> ... some jsx that displays data ... </div>
  )
}
This process basically works, but it feels really bad because a user who gets redirected after login cannot see the fetched data unless they refresh the entire page. This is because when getServerSideProps() is called right after the redirect, the base64KcToken is not there yet.
Also, everything related to the login status (e.g. the logout button) only gets displayed after about 1 second, when the cookie is loaded by the react-keycloak library.
Using NextJS, I am defining some routes in getStaticPaths by making an API call:
/**
 * @dev Fetches the articles and exports each title and id to define the available routes
 */
const getAllArticles = async () => {
  const result = await fetch("https://some_api_url");
  const articles = await result.json();
  return articles.results.map((article) => {
    const articleTitle = `${article.title}`;
    return {
      params: {
        title: articleTitle,
        id: `${article.id}`,
      },
    };
  });
};

/**
 * @dev Defines the paths available to reach directly
 */
export async function getStaticPaths() {
  const paths = await getAllArticles();
  return {
    paths,
    fallback: false,
  };
}
Everything works most of the time: I can access most of the articles, Router.push works with all URLs defined.
However, when the article name includes a special character such as &, Router.push keeps working, but copying a URL that worked inside the app and pasting it into another tab returns a page that says:
An unexpected error has occurred.
In the Network tab of the inspector, the GET request fails with a 404.
The component code is mostly made of API calls such as:
await API.put(`/set_article/${article.id}`, { object });
With API being defined by axios.
Any idea why it happens and how to make the getStaticPaths work with special characters?
When you transport values in URLs, they need to be URL-encoded. (When you transport values in HTML, they need to be HTML encoded. In JSON, they need to be JSON-encoded. And so on. Any text-based system that can transport structured data has an encoding scheme that you need to apply to data. URLs are not an exception.)
Turn your raw values in your client code
await API.put(`/set_article/${article.id}`)
into encoded ones
await API.put(`/set_article/${encodeURIComponent(article.id)}`)
It might be tempting, but don't pre-encode the values on the server-side. Do this on the client end, at the time you actually use them in a URL.
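As a small illustration of encoding at the point of use, here is a sketch with a hypothetical `articlePath` helper (the `/set_article/` prefix comes from the question; the sample values are made up):

```typescript
// Build the request path, encoding the dynamic segment at the moment it
// is placed into the URL rather than storing a pre-encoded value.
function articlePath(id: string): string {
  return `/set_article/${encodeURIComponent(id)}`;
}

console.log(articlePath("Tom & Jerry")); // "/set_article/Tom%20%26%20Jerry"
console.log(articlePath("a/b?c"));       // "/set_article/a%2Fb%3Fc"
```

Because `&`, `/`, and `?` all have structural meaning in a URL, leaving them raw changes how the path is parsed; encoding them keeps the value inside a single path segment.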
I would like to use cookies for authentication in my Next.js app. I have a bug where SSR won't work: somewhere during execution the code does not find the cookie on the first render of the page, so it throws an error. I have played with the code a lot and have gotten it to a state where the data eventually loads, but the page is no longer server-side rendered. Has anyone else dealt with this problem?
I am using Next.js, Apollo Client and Apollo Server Express.
When a page is server-side rendered, the code runs on the server, so the cookies you set in the browser are not available by default. You can access them in getInitialProps or getServerSideProps via req.headers.cookie and pass them on to the authentication code.
Alternatively, you can use an npm module like react-cookie https://www.npmjs.com/package/react-cookie which supports isomorphic cookies. More examples on integration are available at the link.
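For the first option, the raw header is a single string such as `token=abc; theme=dark`, so it has to be split before a specific cookie can be read. A minimal sketch, assuming a hypothetical `parseCookieHeader` helper (not part of Next.js or Apollo) that receives `req.headers.cookie`:

```typescript
// Decode a cookie value, falling back to the raw text if it is malformed.
function safeDecode(value: string): string {
  try {
    return decodeURIComponent(value);
  } catch {
    return value;
  }
}

// Parse a raw Cookie header ("a=1; b=2") into a name/value map.
function parseCookieHeader(header: string | undefined): Record<string, string> {
  const cookies: Record<string, string> = {};
  if (!header) return cookies; // no cookies were sent with the request
  for (const part of header.split(";")) {
    const index = part.indexOf("=");
    if (index === -1) continue; // skip malformed segments
    const name = part.slice(0, index).trim();
    const value = part.slice(index + 1).trim();
    if (name) cookies[name] = safeDecode(value);
  }
  return cookies;
}
```

On the server, the result of `parseCookieHeader(req.headers.cookie)` can then be handed to whatever builds the authenticated request.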
We can customize the headers before sending the request.
Please check my full answer at this link: https://github.com/apollographql/apollo-client/issues/5089#issuecomment-749301669
async function getHeaders(ctx) {
  if (ctx?.req?.cookies) {
    const cookieItems = []
    for (let key of Object.keys(ctx?.req?.cookies)) {
      cookieItems.push(`${key}=${ctx.req.cookies[key]}`)
    }
    return {
      cookie: cookieItems.join('; ')
    }
  }
  return {
  }
}

WithApollo.getInitialProps = async (ctx) => {
  const { AppTree } = ctx
  // Initialize ApolloClient, add it to the ctx object so
  // we can use it in `PageComponent.getInitialProps`.
  const apolloClient = (ctx.apolloClient = initApolloClient(null, await getHeaders(ctx)))
  // Run wrapped getInitialProps methods
  let pageProps = {}
  if (PageComponent.getInitialProps) {
    pageProps = await PageComponent.getInitialProps(ctx)
  }
  ............
}
}
I am trying to implement an auto-logout feature after x minutes of inactivity in Flutter, using Firebase with email as the authentication method.
I have searched online, but whatever I've found is not for Flutter.
Any help will be greatly appreciated. Thank you!
You can use an interceptor on your API client instance like this, customizing the onRequest method.
The idea is to save the time whenever an API call occurs. Then, whenever another API call occurs, check the duration between now and the last saved time.
If the duration is longer than, let's say, 5 minutes, call your logout method; otherwise, continue with the request.
Here is a snippet to make it clear:
Future<Dio> getApiClient() async {
  _dio.interceptors.clear();
  _dio.interceptors
      .add(InterceptorsWrapper(onRequest: (RequestOptions options) async {
    // Before the request is sent, compare now with the last recorded API hit
    var pref = await SharedPreferences.getInstance();
    var timeNow = DateTime.now().millisecondsSinceEpoch;
    // Fall back to now on the first request, when nothing has been saved yet
    var lastHitApi = pref.getInt(LAST_HIT_API) ?? timeNow;
    var delay = timeNow - lastHitApi;
    await pref.setInt(LAST_HIT_API, timeNow);
    if (delay > DELAY_MAX) {
      // do logout here
    }
    return options;
  }, onResponse: (Response response) {
    // Do something with response data
    return response; // continue
  }, onError: (DioError error) async {
    // Do something with response error
  }));
  _dio.options.baseUrl = baseUrl;
  return _dio;
}
Edit: I guess this approach is preferable.
Set the timeout duration and call the logout function:
Timer(Duration(seconds: 5), () => logOut());