I'm trying to download a few million web pages that Google has already indexed. I've used proxies directly with mixed success, but for the bandwidth I'm considering they are cost-prohibitive.
Is there any way to access Google's cache of websites from datacenter IPs? I've not gotten it to work reliably.
I've tried HTTrack, OutWit, and writing a basic script on Google Cloud.
I always run into limitations with proxy services.
In a perfect world I could just batch-download Google cache pages.
Related
I'm hosting a static website generated by Hugo on Google Firebase. I know Firebase simplifies a lot of things from Google Cloud via its console and default settings. However, I'm hoping to make my site faster, and I was wondering if there were any settings I could change on the larger Google Cloud platform, such as increasing the number of locations the site is served from, using an SSD, or using a faster virtual machine.
This is for a static website generated by Hugo being hosted on Google Firebase and with a CDN via CloudFlare. I've done plenty of optimization via the typical website optimization stuff, but I was wondering if there was anything I could do on Google Cloud Platform to increase the reach of my site and its speed etc.
Firebase Hosting doesn't have any configurations that tune its runtime behavior, in terms of performance. The only configurations it has are documented here.
I use Firebase hosting to host my Single Page App.
I have two versions of my website:
one optimized for HTTP/1.x
one optimized for HTTP/2
Firebase Hosting now uses HTTP/2. But how can I optimize the website for people still using a browser that only supports HTTP/1.x? As of today, about 20% of browsers still don't support HTTP/2.
I have just started playing with Google Cloud. I used to work on normal servers, so I need advice.
I created my first instance and deployed WordPress. I installed the WooCommerce plugin. The shop is quite fast and I am happy (with the lowest settings), but now:
I wanted to edit functions.php but I can't. The file attributes are read-only, so how can I change it?
How do I get access to all my files? I can't see them in Cloud Storage. How do I set up FTP?
What about the database for my shop? I understand I can create a new database, but where do I access the current database of my WordPress install?
What else should I deploy to work comfortably with WordPress?
About ssl
SNI SSL certificate slots are offered for no additional charge for
accounts that have billing activated. Free accounts are limited to 5
certificates.
I have no experience with SSL, but I plan to run a shop, so what does this mean? Free certificates for 5 instances or 5 deployments? How many certificates do I need to run one shop?
I know there are many questions, but I wanted to go further, and all the advice on the internet is outdated because it covers older versions of Google Cloud. Please help me understand all this.
I assume you're attempting to use WordPress on Google App Engine.
GAE has no real filesystem, so you cannot write to it (unless you juggle with the API GAE offers). Editing happens locally using the GAE SDK development server and you deploy your changes to the App Engine ecosystem using the SDK interface (GUI or CLI). All application writes should go to Google Cloud Storage (which is similar to Amazon S3 and the like).
I'm not certain whether the Google Cloud Storage can be accessed via traditional FTP. There might be some middleware required. You can see and browse the contents of your buckets in the developer project console (https://console.developers.google.com/).
The databases are on a separate "server" when using GAE. MySQL instances are spawned in the Google Cloud SQL ecosystem and are available to App Engine and Compute Engine instances (and possibly other places too). You can define the Cloud SQL address and port in wp-config.php as usual. You need to create a local MySQL database for your local installation. More: https://cloud.google.com/appengine/docs/php/cloud-sql/
When working with Google App Engine you should deploy the whole WordPress installation (wp-config.php, wp-includes/, wp-admin/, wp-content/, etc.) in order for it to work in the GAE system. For a "better" deployment system you should do some searching or ask a new question dedicated to that issue.
The certificates themselves on GAE are not free, but the "slots" you put the certificates into are. Free projects (no billing enabled) offer 5 free slots where you can put your purchased certificates. SNI SSL means that you can use multiple different domain/host certificates under a single listening IP address (which some years back was not that simple to do). What this all means is that GCP offers a way to use certificates with their services, but you still need to get the certificates themselves elsewhere.
Have you seen the GAE starter project offered by Google: https://googlecloudplatform.github.io/appengine-php-wordpress-starter-project/ ? It makes your life a bit easier when developing WP sites for Google App Engine.
If you're working with Google Compute Engine instances, then they should operate just like regular VPS machines, with some Google restrictions applied. I have not used them so I do not know the specifics.
We need to use WordPress for a site that is going to have high traffic. We expect an initial load of 500K page views a month, increasing to about 8M page views a month. Usage will be mainly during working hours, which is around 20 days a month, 8 hours a day.
We are thinking of using Google App Engine with Google Cloud SQL. We were wondering how well it scales for that kind of load. In theory Google App Engine should scale automatically, but we are not sure how well Google Cloud SQL scales. This will be a mostly-read database, with a few writes.
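To put those figures in perspective, here is a rough back-of-the-envelope calculation of the average request rate during working hours. It uses only the numbers stated above; it assumes traffic is spread evenly over the working hours, which real traffic will not be, so peaks will be higher, and each page view typically triggers several database queries.

```python
# Average page views per second during working hours,
# using the figures from the question (20 days x 8 hours per month).
WORKING_DAYS_PER_MONTH = 20
WORKING_HOURS_PER_DAY = 8
SECONDS_PER_HOUR = 3600


def average_views_per_second(monthly_page_views: int) -> float:
    """Spread the monthly page views evenly over the working seconds."""
    working_seconds = (
        WORKING_DAYS_PER_MONTH * WORKING_HOURS_PER_DAY * SECONDS_PER_HOUR
    )
    return monthly_page_views / working_seconds


initial = average_views_per_second(500_000)
target = average_views_per_second(8_000_000)

print(f"initial: {initial:.2f} page views/s")  # about 0.87
print(f"target:  {target:.2f} page views/s")   # about 13.89
```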
So the questions are:
Does anyone have experience deploying WordPress on Google App Engine + Google Cloud SQL under high load?
Do you know if there are problems installing plugins for WordPress on Google App Engine? Do they need any special modification?
To save you some time, look to other solutions.
I'm working on this exact task now, but I'm about to give up due to Cloud SQL's very poor performance. It might work fine for websites like Orane's, but for larger, more complex websites the high latency and slow response time from Cloud SQL mean 3-second load times for us instead of the 0.7s we have on our VPS. I have tested connecting by both IP and socket, with and without SSL, and it's just not usable as-is. If you test with Amazon RDS, the difference in speed is shocking.
The only other solution we've been able to come up with is to set up an API server that continuously caches data to memcache and to serve only static pages on App Engine, with most dynamic content loading through AJAX. Scary!
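As a rough illustration of the caching idea, here is a minimal cache-aside sketch with a time-to-live. It is not the setup described above: a plain dictionary stands in for memcache, and `RefreshingCache` and `load_recent_posts` are hypothetical names invented for the example. The point is only that repeated reads within the TTL never reach the slow database call.

```python
import time
from typing import Callable, Dict, Tuple


class RefreshingCache:
    """In-memory stand-in for memcache (assumption: a dict is enough here)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry timestamp, cached value)
        self._store: Dict[str, Tuple[float, str]] = {}

    def refresh(self, key: str, loader: Callable[[], str]) -> str:
        """Call the slow loader and store its result with a fresh expiry."""
        value = loader()
        self._store[key] = (self._clock() + self._ttl, value)
        return value

    def get(self, key: str, loader: Callable[[], str]) -> str:
        """Return the cached value if still fresh, otherwise reload it."""
        entry = self._store.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]
        return self.refresh(key, loader)


calls = []


def load_recent_posts() -> str:
    """Hypothetical slow database query; counts how often it runs."""
    calls.append(1)
    return "<ul><li>post</li></ul>"


cache = RefreshingCache(ttl_seconds=60)
first = cache.get("recent_posts", load_recent_posts)
second = cache.get("recent_posts", load_recent_posts)  # served from cache
```

A background job could call `refresh` on a schedule so that page requests only ever read from the cache, which is closer to the "continuously caches" approach mentioned above.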
Keep trying, but you'd be better off looking into RackSpace Cloud DB or Amazon RDS.
There are no problems at all and it doesn't need any modifications. Everything works perfectly, and from previous projects I've done on App Engine, I know it scales extremely well. I've just set up my new WordPress blog on App Engine here and everything works the same but loads a lot faster. It's a little tricky to set up, however. I'm working on a tutorial for that.
We have an application deployed on Windows Azure as a Web Role and we are using Pingdom for testing page load times: http://tools.pingdom.com/fpt/
The url for the application on Windows Azure is: http://www.doctorspring.com .
The load time of the app is usually around 7s.
The database is an SQL Azure database and the role and the database are in the same zone.
Sample pingdom result: http://tools.pingdom.com/fpt/#!/CllGggrMz/http://www.doctorspring.com/
Sample pingdom result (with gzip): http://tools.pingdom.com/fpt/#!/f2TUbR6OX/www.doctorspring.com
Suspecting that Azure could be the problem, we tried a free hosting from Somee as:
http://www.doctorspring.somee.com
The load time of the app on Somee is around 3.5s.
Sample pingdom result: http://tools.pingdom.com/fpt/#!/o3gZOjTwH/http://www.doctorspring.somee.com/
That is a huge performance issue for us.
Can you please help us understand the problem with Azure, or suggest how we can overcome it?
Thanks,
Manish
In both cases, loading the homepage is unacceptably slow - 3.5 seconds to generate a page is around 10 times slower than you need to be when there's no load on the site. I'd expect the site to crumble under even moderate load with this kind of performance.
Without knowing how the site is constructed, it's hard to explain the reason one environment is faster than the other - but my guess is that whatever is generating the page (some kind of CMS?) is the cause. Azure is known to be a touch slow when doing database queries - though normally this only manifests itself under extreme conditions.
I'd recommend tuning the CMS - especially with caching. We found that Azure is normally pretty fast, but when doing database lookups (e.g. retrieving content for the CMS), it can be variable; if your CMS is doing a LOT of database queries to get the homepage content, it's going to be slow.
It's also worth running Yslow - there's some low-hanging fruit on getting performance up.
What services are you running in Azure? Web role, VM, Website? Are you connecting to an Azure Database instance from the homepage (and if so, how many distinct calls are you making)? I'm getting around a 7.5 second load time from London, but to be honest even 3 seconds is too slow for the homepage. It's hard to know what's causing the prolonged page load, but if you are connecting to a DB instance there's a great deal you can do, e.g.:
Render the page and make some asynchronous calls to spool in additional data.
Make sure your Azure services are running close together
Consider caching database content to a blob. E.g. if you are pulling the data in "Medical Questions Answered in Last 24 Hours" from a DB on every load, you could considerably speed up access by routinely caching it to an HTML file stored in a blob container and injecting that file into the page.
If you must make DB calls from the homepage try to make as few round trips as possible by batching up your queries into a stored procedure.
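The blob-caching suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions: a local file stands in for the blob container, the question titles are made-up sample data, and `render_fragment`, `publish_fragment`, and `load_fragment` are hypothetical helpers, not part of any Azure SDK. A scheduled job would run the render and publish steps; the homepage would only read the pre-rendered file.

```python
import html
import os
import tempfile
from pathlib import Path
from typing import Iterable


def render_fragment(questions: Iterable[str]) -> str:
    """Render question titles (normally fetched from the DB) as an HTML list."""
    items = "".join(f"<li>{html.escape(q)}</li>" for q in questions)
    return f"<ul>{items}</ul>"


def publish_fragment(fragment: str, target: Path) -> None:
    """Write to a temp file, then rename, so readers never see a partial file."""
    fd, tmp_name = tempfile.mkstemp(dir=str(target.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write(fragment)
    os.replace(tmp_name, str(target))


def load_fragment(target: Path) -> str:
    """What the homepage does on each request: read the cached file, no DB call."""
    return target.read_text(encoding="utf-8")


# A local directory stands in for the blob container (assumption).
out_dir = Path(tempfile.mkdtemp())
fragment_path = out_dir / "recent_questions.html"

# Sample data only; in practice this list would come from one DB query
# run by a scheduled job rather than on every page load.
publish_fragment(
    render_fragment(["Is <b>this</b> safe?", "How often?"]),
    fragment_path,
)

page = f"<main>{load_fragment(fragment_path)}</main>"
```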
I've made a lot of assumptions here, but there are certainly things you could do to drastically improve performance on this page.