I'm running some tests with PhantomJS / CasperJS on Ubuntu against Google Analytics, and I'm having problems getting GA to correctly recognize the language settings I send in the HTTP request headers.
No matter what I put in my Accept-Language header, I end up with GA classifying the language as "c".
I'm sure my Accept-Language headers are correct; here's an example:
ACCEPT-ENCODING:gzip, deflate
CONNECTION:Keep-Alive
ACCEPT-LANGUAGE:en-US
USER-AGENT:Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1
ACCEPT:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
After long hours of trial and error I found out that "C" was in fact the default value of the LANG environment variable in Ubuntu itself:
LANG=C.UTF-8
I can in fact influence the Google Analytics classification by altering my environment variables, using the following command from the command line:
export LC_ALL="en_US.UTF-8"
It does not work if I only set LC_LANG or LANGUAGE; I am not sure why.
But how do I control this setting from inside PhantomJS / CasperJS? I can't (and don't want to) change my environment variables for each PhantomJS run from the CLI; I test many languages at once, in large numbers.
Has anyone experienced this and can help?
I managed to find a hack-ish solution to this problem. I simply use the following command from the CLI:
$ LC_ALL=en-gb phantomjs script.js
and that passes the Accept-Language correctly to Google Analytics.
I think there's a problem with CasperJS request headers not being correctly passed on to PhantomJS.
I am having issues with the Dexador library (and the same issue with the Drakma library) when attempting to web-scrape. They work fine with HTTP requests, but I receive an error when making HTTPS requests. Here is an example of the basic GET request that I sent:
(defvar *url* "https://www.amazon.com/")
(defvar *request* (dex:get *url*))
Then I receive this backtrace:
No OpenSSL version number could be determined, both SSLeay and OpenSSL_version_num failed.
So I'm assuming something may be wrong with the OpenSSL library, but I'm not 100% sure what's going on. I made the request successfully from the command line on my Windows machine, and OpenSSL works fine on my Windows 10 machine, but not in Common Lisp.
One solution that I saw was to set the keyword argument :insecure to true:
(defvar *request* (dex:get *url* :insecure t))
But this does not work, and I receive the same error. When I attempted to scrape an HTTP website, the request was successful:
(defvar *request* (dex:get "http://paulgraham.com"))
I am wondering if anyone else has had this problem and if there are any solutions that anyone has found. I do not know if this is an issue with Windows 10, a dependency library, or something else.
This works fine from the command line with:
curl -v "https://www.amazon.com"
The issue here is that my version of OpenSSL was too new. CL+SSL is a dependency of Dexador, and CL+SSL wants version 1.0.1, specifically the 32-bit build of OpenSSL. After playing around with various versions, installing that one fixed the problem.
Out of nowhere I'm getting this 404 error when the browser requests jquery.min.map.
The funny thing is that I've never added this file to my solution.
Can anyone explain to me how to get rid of this error?
I have no idea where this file is being referenced, since I did not add a reference to it.
Request URL:http://localhost:22773/Scripts/jquery.min.map
Request Method:GET
Status Code:404 Not Found
Request Headers
Accept:*/*
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Connection:keep-alive
Host:localhost:22773
Referer:http://localhost:22773/Manager/ControlPanel.aspx
User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.62 Safari/537.36
Response Headers
Cache-Control:private
Content-Length:4992
Content-Type:text/html; charset=utf-8
Date:Tue, 10 Sep 2013 17:37:40 GMT
Server:Microsoft-IIS/8.0
X-Powered-By:ASP.NET
X-SourceFiles:=?UTF-8?B?YzpcdXNlcnNcYWRtaW5pc3RyYXRvclxkb2N1bWVudHNcdmlzdWFsIHN0dWRpbyAyMDEyXFByb2plY3RzXEFsdW1DbG91ZFxBbHVtQ2xvdWRcU2NyaXB0c1xqcXVlcnkubWluLm1hcA==?=
Source maps are like favicons: a thing that browsers will load under some circumstances.
Typically, JavaScript is minified on production servers, and debugging it there is difficult.
Source maps let browsers map minified JavaScript back to the original sources. It's up to the developers to include them or not on their websites.
In Chrome, you have to activate this functionality for the browser to attempt to download the original, non-minified version of a minified script. It is then easier to debug client-side.
Basically, you can't get rid of this error other than by providing the source maps.
Anyway, see: http://www.html5rocks.com/en/tutorials/developertools/sourcemaps/
I just fixed this in my own app.
Files that you copy from a CDN often have the sourcemap line at the bottom. For axios.min.js, it's
//# sourceMappingURL=axios.min.map
Just remove that line and you won't get that error. Better still, use the version they provide for local loading.
I came across this when developing something without reliable internet access, so I needed the local version. Removing that line solved the problem.
I am running Robot Framework 2.8.7 (Python 2.6.6 on win32) on my laptop and on a VM.
Laptop: Windows 7 Enterprise
VM: Windows Embedded Standard
The POST works in Postman from both the laptop and the VM. When I run the test case's POST from the laptop, it works fine, but the same POST on the VM gives a 400 response. The 400 response is a bad request.
Both the laptop and the VM have the same environment variable settings and the same POST test case files.
This is the Post command:
${tmp}    Set Variable    Basic${SPACE}dmVyaXNlcTpWZVJpU2VRNTc0Lg==
${headers}=    Create Dictionary
...    Content-Type    application/json    Authorization    ${tmp}
# read the raw data
${file_data}=    Get Binary File    ${jFileName}
Log To Console    ${file_data}
Log    *Posting Data*: ${file_data}
${result}=    Post Request    webapiuri    /    data=${file_data}    headers=${headers}
Any idea why the VM robot framework response is a 400 (Bad request)?
Solved the problem by uninstalling the current version of robotframework-requests and installing the older version 0.3.8 of robotframework-requests.
For some reason, the newer version changed the way the JSON file was being sent to Tomcat. Now it works great! :)
pip uninstall robotframework-requests
pip install robotframework-requests==0.3.8
I need authentication to use the internet; say these are my variables:
Proxy : 1ncproxy1
Port : 80
Login : MyLoGiN
Pass : MyPaSs
How can I install a package in R, along with its add-on packages?
Such that the following would work:
install.packages("TSA", dependencies=TRUE)
without internet connection failures?
I tried this:
Sys.setenv("ftp_proxy" = "1ncproxy1","ftp_proxy_user"="MyLoGiN","ftp_proxy_password"="MyPaSs")#Port = 80
But I get:
Warning: unable to access index for repository http://cran.ma.imperial.ac.uk/src/contrib
# or
cannot open: HTTP status was '407 Proxy Authentication Required'
Many thanks,
You are probably on Windows, so I would advise you to check the 'R for Windows FAQ' that came with your installation, particularly Question 2.19: "The Internet download functions fail". You may need to restart R with the --internet2 option (IIRC) for the proxy settings to come into effect.
I always found this very cumbersome. An alternative is to install a proxy-aware downloader such as wget (as a Windows binary), where you set the proxy options in a file in your home directory. This is all from memory; I think the last time I was faced with such a proxy was in 2005, so YMMV.
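If you go the wget route, R can be told to shell out to it for downloads; a minimal sketch, assuming wget is on your PATH and your ~/.wgetrc holds the proxy settings (e.g. http_proxy, proxy_user, proxy_password):
# Make download.file() and install.packages() use wget,
# which reads its proxy configuration from ~/.wgetrc
options(download.file.method = "wget")
install.packages("TSA", dependencies = TRUE)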
As @juba states, I think you want to set the http_proxy. From ?download.file:
Usernames and passwords can be set for HTTP proxy transfers via
environment variable http_proxy_user in the form user:passwd.
Alternatively, http_proxy can be of the form
"http://user:pass#proxy.dom.com:8080/"
So, try: Sys.setenv(http_proxy="http://MyLoGiN:MyPaSs@1ncproxy1:80")
Be aware though:
These environment variables must be set before the download code is
first used: they cannot be altered later by calling Sys.setenv.
So you are best off setting it in your .Rprofile.
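For example, a minimal ~/.Rprofile entry, reusing the credentials from the question, could look like this:
# Runs at startup, before any download code, so the proxy is picked up in time
Sys.setenv(http_proxy = "http://MyLoGiN:MyPaSs@1ncproxy1:80")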
+1 for Juba, above. This worked for me:
$ export http_proxy=http://username:password@the-proxy.mycompany.com:80
$ R
> install.packages("quantmod")
I tried to install the swirl package and had the same problem - a proxy with authorisation.
After some experiments I found a solution.
Maybe my answer will help somebody.
On Windows 7:
Set one or more environment variables: http_proxy (plus https_proxy and ftp_proxy if you need them). If you don't know how, see http://www.computerhope.com/issues/ch000549.htm
The format is:
http_proxy="http://Proxyusername:ProxyUserPassw@proxyServName:ProxyPort"
Use '@' rather than '%40' in the URL.
In RStudio, go to Tools -> Global Options -> Packages and clear the checkbox "Use Internet Explorer library/proxy for HTTP".
As Jeff Taylor wrote, R can indirectly make use of a proxy server. You need to specify the proxy server for both the http and https protocols, as follows:
$ export http_proxy=http://user:pass@proxy_server:port
$ export https_proxy=http://user:pass@proxy_server:port
$ R
> install.packages("<package_name>")
I just tested this solution and it works like a charm. The answer from Jeff was correct but unfortunately, for most cases, incomplete, as most servers are nowadays accessible over https.
I am facing a problem connecting R to the internet at my office, possibly due to the LAN settings. I tried almost all the possible ways I came across on the web (see below), but still in vain.
Method 1: Invoking R using --internet2
Method 2: Invoking R by setting ~/Rgui.exe http_proxy=http://999.99.99.99:8080/ http_proxy_user=ask
Method 3: Setting setInternet2(TRUE)
Method 4:
library(RCurl)  # the calls below need RCurl loaded
curl <- getCurlHandle()
curlSetOpt(.opts = list(proxy = '999.99.99.99:8080'), curl = curl)
Res <- getURL('http://www.cricinfo.com', curl = curl)
With all of the above methods I am able to load packages directly from CRAN and also to download files using the download.file command.
But using the getURL (RCurl), readHTMLTable (XML), and htmlTreeParse (XML) commands I am unable to extract web data; I get an <HEAD>\n<TITLE>Access Denied</TITLE>\n</HEAD> error.
How to set LAN proxy settings for XML package in R?
On Mac OS, I found the best solution here. Quoting the author, two simple steps are:
1) Open Terminal and do the following:
export http_proxy=http://staff-proxy.ul.ie:8080
export HTTP_PROXY=http://staff-proxy.ul.ie:8080
2) Run R and do the following:
Sys.setenv(http_proxy="http://staff-proxy.ul.ie:8080")
double-check this with:
Sys.getenv("http_proxy")
I am behind university proxy, and this solution worked perfectly. The major issue is to export the items in Terminal before running R, both in upper- and lower-case.
For RStudio you just have to do this:
Firstly, open RStudio like always, select from the top menu:
Tools -> Global Options -> Packages
Uncheck the option: Use Internet Explorer library/proxy for HTTP
Then close RStudio. Furthermore, you have to:
Find the .Renviron file on your computer; most probably you will find it here: C:\Users\your user name\Documents. Note that if it does not exist, you can create it just by running this command in RStudio:
file.edit('~/.Renviron')
Add these two lines at the beginning of the file:
options(internet.info = 0)
http_proxy="http://user_id:password#your_proxy:your_port"
And that's it!
The problem is with your curl options – the RCurl package doesn't seem to use internet2.dll.
You need to specify the port separately, and will probably need to give your user login details as network credentials, e.g.,
library(RCurl)
opts <- list(
  proxy = "999.999.999.999",
  proxyusername = "mydomain\\myusername",
  proxypassword = "mypassword",
  proxyport = 8080
)
getURL("http://stackoverflow.com", .opts = opts)
Remember to escape any backslashes in your password. You may also need to wrap the URL in a call to curlEscape.
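For the XML side of the question, one workaround (a sketch reusing the opts list above) is to let RCurl perform the transfer through the proxy and hand the downloaded text to the XML parsers:
library(XML)
# Fetch through the proxy with RCurl, then parse the text locally;
# htmlParse() accepts raw HTML when asText = TRUE
html <- getURL("http://www.cricinfo.com", .opts = opts)
doc <- htmlParse(html, asText = TRUE)
tables <- readHTMLTable(doc)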
I had the same problem at my office and I solved it by adding the proxy to the destination of the R shortcut: right-click the R icon, select Properties, and in the Destination field add
"C:\Program Files\R\your_R_version\bin\Rgui.exe" http_proxy=http://user_id:password@your_proxy:your_port/
Be sure to use the directory where you have R installed. That works for me. Hope this helps.
This post pertains to R proxy issues on *nix. You should know that R has many libraries/methods to fetch data over the internet.
For the 'curl', 'libcurl', 'wget', etc. methods, just do the following:
Open a terminal. Type the following command:
sudo gedit /etc/R/Renviron.site
Enter the following lines:
http_proxy='http://username:password@abc.com:port/'
https_proxy='https://username:password@xyz.com:port/'
Replace username, password, abc.com, xyz.com and port with these settings specific to your network.
Quit R and launch it again.
This should solve your problem with the 'libcurl' and 'curl' methods. However, I have not tried it with 'httr'. One way to do it with 'httr', for the current session only, is as follows:
library(httr)
set_config(use_proxy(url="abc.com",port=8080, username="username", password="password"))
You need to substitute settings specific to your network in the relevant fields.
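Once set_config() has been called, later httr requests in the same session should go through the proxy; a quick smoke test (assuming the settings above):
r <- GET("http://www.cricinfo.com")
status_code(r)  # 200 means the request made it through the proxy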
Inspired by all the related responses on the internet, I finally found the solution to correctly configure the proxy for R and RStudio.
There are several steps to follow; perhaps some of them are unnecessary, but the combination works!
Add environment variables http_proxy and https_proxy with proxy details.
variable name: http_proxy
variable value: http://user_id:password@your_proxy:your_port/
variable name: https_proxy
variable value: https://user_id:password@your_proxy:your_port
If you start R from a desktop icon, you can add the --internet2 flag to the target line (right click -> Properties),
e.g. "C:\Program Files\R\R-2.8.1\bin\Rgui.exe" --internet2
For RStudio you just have to do this:
Firstly, open RStudio like always and select from the top menu:
Tools -> Global Options -> Packages
Uncheck the option "Use Internet Explorer library/proxy for HTTP".
Find the .Renviron file on your computer; most probably you will find it here: C:\Users\your user name\Documents.
Note that if it does not exist, you can create it just by running this command in R:
file.edit('~/.Renviron')
Then add these six lines at the beginning of the file:
options(internet.info = 0)
http_proxy = http://user_id:password@your_proxy:your_port
http_proxy_user = user_id:password
https_proxy = https://user_id:password@your_proxy:your_port
https_proxy_user = user_id:password
ftp_proxy = user_id:password@your_proxy:your_port
Restart R. Type the following commands in R to check that the configuration above works:
Sys.getenv("http_proxy")
Sys.getenv("http_proxy_user")
Sys.getenv("https_proxy")
Sys.getenv("https_proxy_user")
Sys.getenv("ftp_proxy")
Now you can install the packages as you want by using the command like:
install.packages("mlr",method="libcurl")
It's important to add method="libcurl", otherwise it won't work.
On Windows 7 I solved this by going into my environment settings and adding user variables http_proxy and https_proxy with my proxy details.
If you start R from a desktop icon, you can add the --internet2 flag to the target line (right click -> Properties), e.g.
"C:\Program Files\R\R-2.8.1\bin\Rgui.exe" --internet2
Simplest way to get everything working in RStudio under Windows 10:
Open up Internet Explorer and select Internet Options.
Open the editor for environment variables.
Add a variable HTTP_PROXY in the form:
HTTP_PROXY=http://username:password@localhost:port/
Example:
HTTP_PROXY=http://John:JohnPassword@localhost:8080/
RStudio should then work.
Tried all of these and also the solutions using netsh, winhttp etc.
Geek On Acid's answer helped me download packages from the server but none of these solutions worked for using the package I wanted to run (twitteR package).
The best solution is to use software that lets you configure a system-wide proxy.
FreeCap (free) and Proxifier (trial) worked perfectly for me at my company.
Please note that you need to remove proxy settings from your browser and any other apps that you have configured to use the proxy, as these tools provide a system-wide proxy for all network traffic from your computer.
Find your R home with R.home("home")
Add the following lines to Renviron.site in your R home:
http_proxy=http://proxy.dom.com/
http_proxy_user=user:passwd
https_proxy=https://proxy.dom.com/
https_proxy_user=user:passwd
Open R -> R reads Renviron.site in its home -> it should work :)
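After restarting R, a quick sanity check that the values were picked up:
Sys.getenv(c("http_proxy", "https_proxy"))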
My solution on Windows 7 (32-bit), R version 3.0.2:
Sys.setenv(http_proxy="http://proxy.*_add_your_proxy_here_*:8080")
setInternet2(TRUE)
updateR(2)