imgix.com downloads images instead of browsing to them - cdn

I am using the imgix.com CDN for a test project, and for some reason it keeps downloading the images instead of displaying them in the browser and applying the rules to them.
So if I type in myprefix.imgix.net/myimage.png it simply downloads the image, and if I type https://myprefix.imgix.net/myimage.png~text?txtsize=44&txt=470%C3%97480&w=450&h=480 nothing happens.
Has anyone come across this problem?
Thanks

These are two separate issues:
1) If you request an imgix URL without adding any query parameters, imgix will just act as a passthrough to your source. If your images are being treated as a download by the browser rather than as images to display, there must be something mis-configured at the source level. Not knowing anything about your source, I really can't offer any better advice here.
2) The myimage.png~text URL isn't working because you shouldn't be using ~text at all here. Take those five characters out of your URL (i.e., https://myprefix.imgix.net/myimage.png?txtsize=44&txt=470%C3%97480&w=450&h=480) and it should work as you expect.
Imgix's ~text endpoint is a way to request an image where the "base image" is text rather than a real image. In trying to combine a real base image (myimage.png, in your URL above) with this text-only endpoint (~text), you're making a request that imgix doesn't know how to handle.
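To illustrate the difference, using the domain and parameters from the question (these URLs are illustrative and assume a properly configured Source): https://myprefix.imgix.net/myimage.png?txtsize=44&txt=470%C3%97480&w=450&h=480 should return myimage.png resized to 450x480 with the text drawn on top, while https://myprefix.imgix.net/~text?txtsize=44&txt=470%C3%97480&w=450&h=480 would ask the ~text endpoint to render the text itself as the image, with no base image involved.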
If you've got further questions about your imgix integration, especially if they're configuration questions that involve your specific account and settings, I'd encourage you to send your questions to support@imgix.com instead of StackOverflow. While SO is a great place to answer one-off questions, writing into our support-ticket system will allow us to answer account-specific questions a lot more easily.

Once your Source has been configured and deployed, you can begin making image requests to imgix. These requests differ slightly for each imgix Source type, but they all have the same basic structure:
https://example.imgix.net/products/desk.jpg?w=600&exp=1
(example.imgix.net is the imgix domain, /products/desk.jpg is the path, and w=600&exp=1 is the query string)
The hostname, or domain, of the imgix URL will have the form YOUR_SOURCE_NAME.imgix.net. In the above URL, the name of the Source is example, so the hostname takes the form of example.imgix.net. Different hostnames can be set in your Source by clicking Manage under the Domains header.
The path consists of any additional directory information required to locate your image within your image storage (e.g. if you have different subfolders for your images). In this example, /products/desk.jpg completes the full path to the image.
imgix’s parameters are added to the query string of the URL. In the above example, the query string begins with ?w=600 and the additional parameters are linked with ampersands. These parameters dictate how images are processed. In the above URL, w=600 specifies the width of the image and exp=1 adjusts the exposure setting.
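As a quick illustration of how those pieces fit together, here is a minimal Python sketch that assembles an imgix URL from a Source name, a path, and rendering parameters (imgix_url is a hypothetical helper, not part of any imgix library; the Source name and path are the sample values from above):

from urllib.parse import urlencode

def imgix_url(source_name, path, **params):
    # hostname is YOUR_SOURCE_NAME.imgix.net; parameters go in the query string
    query = urlencode(params)
    url = f"https://{source_name}.imgix.net/{path.lstrip('/')}"
    return f"{url}?{query}" if query else url

print(imgix_url("example", "/products/desk.jpg", w=600, exp=1))
# https://example.imgix.net/products/desk.jpg?w=600&exp=1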

Related

How do websites like Craigslist create content depending on the city your computer is located in

I am looking to create a website that generates content depending on your city location. The best example I found was Craigslist. They generate a web domain name like https://yourcity.craigslist.org/ when you either click on the city or it detects where you are. I was just wondering if I could get some help on how to build something like that.
The web pages are created using a template that doesn't change, populated with data that is selected from a database server, using your location to look up appropriate items.
The subdomain (your city) is usually defined in the DNS record, just like www. There would be an entry for chicago.craigslist.org, for example.
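A hypothetical zone fragment illustrating the idea (the names and the address are placeholders, not Craigslist's real records):

www.craigslist.org.      IN  A      203.0.113.10
chicago.craigslist.org.  IN  CNAME  www.craigslist.org.
newyork.craigslist.org.  IN  CNAME  www.craigslist.org.

In practice a wildcard record (*.craigslist.org) can cover every city with a single entry, and the web application decides what to show based on the Host header it receives.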
edit
If you're asking how they know where you are, they can take a guess based on your IP address, but this isn't very reliable. Google does this too when giving you search results that could be localized.
So yes, it is expected that you type a few searches into Google to (try to) find your answer (for example, "detect city from javascript" will bring up a lot of results for your problem).
You would use a service like https://ipstack.com/ to detect where a visitor is; the accuracy depends on where they live (the EU has rules and regulations that make it a lot less accurate than if they were living in the US).
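As a rough sketch, an IP lookup against ipstack could look something like this in Python (the access key is a placeholder and the exact response fields depend on your plan, so check their documentation before relying on this):

import json
import urllib.request

def city_for_ip(ip, access_key):
    # ipstack returns a JSON document describing the IP's estimated location
    url = f"http://api.ipstack.com/{ip}?access_key={access_key}"
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return data.get("city")

# city_for_ip("203.0.113.10", "YOUR_ACCESS_KEY")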
Once you have a database with content (for example, Craigslist has a database of second-hand items sold by people from all over), you check where the request came from and then apply a filter based on that location to match the results.
Good luck
Your IP address can be used to make an educated guess as to where you are, but it's not very accurate. Google also does this when providing you with search results that might be localised. To learn more about creating a website like Craigslist, see
https://www.yarddiant.com/blog/classifieds/how-to-build-a-website-like-craigslist.html

Scene7 URL parameters

Our business uses Adobe Scene7. One of the things we need to be able to do is share the URL of an image with a vendor, for all of the products that have an image.
We have identified the structure of the URL so we can predict the link, and then we ping the image URL to ensure it is valid and available for viewing.
Lately, we've run into a problem where many of the images are not rendering...
Most images:
http://s7d5.scene7.com/is/image/LuckyBrandJeans/7W64372_960_1
Some images:
https://s7d9.scene7.com/is/image/LuckyBrandJeans/7Q64372_960_1
The only difference appears to be that s7d5 becomes s7d9 on some images. What drives that?
How do we get a list of all of those URLs if we can't predict d9 vs d5?
I'm not sure it matters. I think all you need is the filename. It looks like if you take the filename "7W64372_960_1" it works on both s7d5 and s7d9:
http://s7d5.scene7.com/is/image/LuckyBrandJeans/7W64372_960_1
http://s7d9.scene7.com/is/image/LuckyBrandJeans/7W64372_960_1
In fact, you can change it to s7d1, s7d2, s7d3, etc. and it still works.
So, I think if you were to build some sort of template you could just pick whatever URL you wanted and just append the filename on the end like:
http://s7d5.scene7.com/is/image/LuckyBrandJeans/{{imageFileName}}
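As a rough Python sketch of that idea, using the host and file name from the question, you could fill in the template and send a HEAD request to confirm the URL actually resolves before sharing it (image_url_if_valid is a hypothetical helper):

import urllib.error
import urllib.request

BASE = "http://s7d5.scene7.com/is/image/LuckyBrandJeans/{image_file_name}"

def image_url_if_valid(image_file_name):
    # Build the URL from the template and check that Scene7 actually serves it.
    url = BASE.format(image_file_name=image_file_name)
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return url if resp.status == 200 else None
    except urllib.error.HTTPError:
        return None

# image_url_if_valid("7W64372_960_1")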
We have the same thing with our company.
One domain serves the images for the lower "sandbox" environment (d5) and the other serves the images to your live environment (d9).

How to know if a user clicked a link using its network traffic

I have large traffic files that I'm trying to analyze in order to get statistical features of users.
One of the features that I would like to extract is link clicks on specific sites (for example, clicking on popups and the like).
My first idea was to look at the packets' content and search for hrefs and links, save them all in some kind of data structure with their timestamps, and then iterate over the packets again to search for requests made close to the time the links appeared.
Something like the following pseudocode (here the packets are grouped by flow, where a flow is IP1 <=> IP2):
for each packet in each flow:
    search for "href" or "http://" or "https://"
    save the links with their timestamps
for each packet in each flow:
    if it's an HTTP request and its URL matches any URL in the list
    and the time is close enough, record it
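For reference, a rough Python sketch of that first pass, assuming scapy is installed and the capture lives in a file named traffic.pcap (a hypothetical name), could look like:

import re
from scapy.all import rdpcap, TCP, Raw

packets = rdpcap("traffic.pcap")
link_re = re.compile(rb'href="(https?://[^"]+)"')

# First pass: remember every link seen in a payload, with its timestamp.
seen_links = []
for pkt in packets:
    if pkt.haslayer(TCP) and pkt.haslayer(Raw):
        for url in link_re.findall(pkt[Raw].load):
            seen_links.append((float(pkt.time), url))

# Second pass: a GET whose path matches a recently seen link counts as a click.
clicks = []
for pkt in packets:
    if pkt.haslayer(TCP) and pkt.haslayer(Raw) and pkt[Raw].load.startswith(b"GET "):
        request_line = pkt[Raw].load.split(b"\r\n", 1)[0]
        for ts, url in seen_links:
            path = url.split(b"/", 3)[-1]
            if path and path in request_line and 0 < float(pkt.time) - ts < 30:
                clicks.append((float(pkt.time), url))
                break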
The problem with this approach is that some links are dynamically generated while the page is loading (using JavaScript or so), and cannot be found using the above method.
I have also tried to check the referrer field in the HTTP header and look for packets that were referred by the relevant sites. This method generates a lot of false positives because of iframes and embedded objects.
It is important to mention that this is not my server; my intention is to make a tool for statistical analysis of users' behavior (thus, I can't add some kind of click tracker to the site).
Does anyone have an idea what I can do to check whether users clicked on links, based on their network traffic?
Any help will be appreciated!
Thank you

Parsing Web page with R

This is my first time posting here. I do not have much experience (less than a week) with HTML parsing/web scraping and am having difficulties parsing this webpage:
https://www.jobsbank.gov.sg/
What I want to do is parse the content of all available job listings on the site.
My approach:
1) Click search on an empty search bar, which will return all records listed. The resulting web page is https://www.jobsbank.gov.sg/ICMSPortal/portlets/JobBankHandler/SearchResult.do
2) Provide the search result web address to R and identify all the job listing links.
3) Supply the job listing links to R and ask R to go to each listing and extract the content.
4) Look for the next page and repeat steps 2 and 3.
However, the problem is that the resulting webpage I got from step 1 does not direct me to the search result page. Instead, it will direct me back to the home page.
Is there any way to overcome this problem?
Supposing I manage to get the web address for the search results, I intend to use the following code:
base_url <- "https://www.jobsbank.gov.sg/ICMSPortal/portlets/JobBankHandler/SearchResult.do"
base_html <- getURLContent(base_url,cainfo="cacert.pem")[[1]]
links <- strsplit(base_html,"a href=")[[1]]
1) Learn to use the web developer tools in your web browser (hint: use Chrome or Firefox).
2) Learn about HTTP GET and HTTP POST requests.
3) Notice that the search box sends a POST request.
4) See what the Form Data parameters are (they seem to be {actionForm.checkValidRequest}: YES and {actionForm.keyWord}: my search string).
5) Construct a POST request using one of the R HTTP packages with that form data in it.
6) Hope the server doesn't care about the cookies; if it does, get the cookies and feed it cookies.
Hence you end up using postForm from the RCurl package:
p <- postForm(url, .params = list("actionForm.checkValidRequest" = "YES", "actionForm.keyWord" = "finance"))
And then just extract the table from p. Getting the next page involves constructing another form request with a bunch of different form parameters.
Basically, a web request is more than just a URL; there's a whole conversation going on between the browser and the server involving form parameters, cookies, and sometimes AJAX requests happening inside the web page to update parts of it.
There are a lot of "I can't scrape this site" questions on SO, and although we could spoonfeed you the precise answer to this exact problem, I do feel the world would be better served if we just told you to go learn about the HTTP protocol, and forms, and cookies, and then you'll understand how to use the tools better.
Note that I've never seen a job site or a financial site that likes you scraping its content - although I can't see a warning about it on this site, that doesn't mean it isn't there, and I would be careful about breaking the Terms and Conditions of Use. Otherwise you might find all your requests failing.

Caching images with different query strings (S3 signed urls)

I'm trying to figure out if I can get browsers to cache images with signed URLs.
What I want is to generate a new signed URL for every request (same image, but with an updated signature), but have the browser not re-download it every time.
So, assuming the cache-related headers are set correctly, and all of the URL is the same except for the query string, is there any way to make the browser cache it?
The URLs would look something like:
http://example.s3.amazonaws.com/magic.jpg?WSAccessKeyId=stuff&Signature=stuff&Expires=1276297463
http://example.s3.amazonaws.com/magic.jpg?WSAccessKeyId=stuff&Signature=stuff&Expires=1276297500
We plan to set the ETags to be an MD5 sum, so will it at least figure out it's the same image at that point?
My other option is to keep track of when we last gave out a URL, then start giving out new ones slightly before the old ones expire, but I'd prefer not to deal with session info.
The browser will use the entire URL for caching purposes, including request parameters. So if you change a request parameter it will effectively be a new "key" in the cache and will always download a new copy of that image. This is a popular technique in the ad-serving world - you add a random number (or the current timestamp) to the end of the URL as a parameter to ensure the browser always goes back to the server to make a new request.
The only way you might get this to work is if you can make the URL static - i.e. by using Apache rewrite rules or a proxy of some sort.
I've been having exactly the same issue with S3 signed URLs. The only solution I came up with is to have the URLs expire on the same day. This is not ideal but at least it will provide caching for some time.
For example, all URLs signed during April get an expiry of May 10th, and all URLs signed in June are set to expire on July 10th. This means the signed URLs will be identical for the whole month.
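A sketch of why this works, using the old-style (signature version 2) query-string signing where Expires is an absolute Unix timestamp: as long as the bucket, key, credentials, and chosen expiry are the same, the generated URL is byte-for-byte identical, so the browser can reuse its cached copy. All names and keys below are placeholders.

import base64
import hashlib
import hmac
from urllib.parse import quote

def signed_url(bucket, key, access_key, secret_key, expires_epoch):
    # SigV2 string-to-sign for a GET with no Content-MD5, Content-Type, or amz headers
    string_to_sign = f"GET\n\n\n{expires_epoch}\n/{bucket}/{key}"
    sig = base64.b64encode(
        hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    ).decode()
    return (f"https://{bucket}.s3.amazonaws.com/{key}"
            f"?AWSAccessKeyId={access_key}&Expires={expires_epoch}&Signature={quote(sig, safe='')}")

# Same expiry timestamp all month long -> the same URL every time it is generated.
# signed_url("example", "magic.jpg", "AKIA...", "SECRET...", 1276297463)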
Just stumbled on this problem and found a way to solve it. Here's what you need to do:
Store the first URL string (in localStorage, for example);
When you receive the image URL the next time, just check whether their main URLs match (str1.split('?')[0] === str2.split('?')[0]);
If they do, use the first one as the img src attribute.
Hope it helps someone.
