Airflow SFTPHook - No hostkey for host found

I'm trying to use the Airflow SFTPHook by passing in an ssh_conn_id, and I'm getting an error:
No hostkey for host myhostname found.
Using the SFTPOperator with the same ssh_conn_id works fine, however. How can I resolve this error?

Just had this issue; the simple trick is to keep your SSH connection inside Airflow and to add the following in the "Extra" field:
{"no_host_key_check": true}
Hope it helps!
Edit: indeed, this allows man-in-the-middle attacks, so even if it helps temporarily, you should get the host's SSH fingerprint and whitelist it in your known hosts instead.
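For illustration, a minimal sketch of using the hook once the Extra field is set. The connection id and remote path are placeholders, and the import path varies across Airflow versions (older releases use airflow.contrib.hooks.sftp_hook):
from airflow.providers.sftp.hooks.sftp import SFTPHook

# "my_sftp_conn" is a placeholder connection id whose Extra field
# contains {"no_host_key_check": true}, so the underlying paramiko
# client skips known_hosts verification
hook = SFTPHook(ssh_conn_id="my_sftp_conn")
print(hook.list_directory("/upload"))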

The SFTPOperator uses SSHHook. Hence, you should use SSHHook instead.
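A rough sketch of that approach, reusing the same placeholder connection id (SSHHook.get_conn() returns a paramiko SSHClient, from which an SFTP client can be opened):
from airflow.providers.ssh.hooks.ssh import SSHHook

ssh_hook = SSHHook(ssh_conn_id="my_sftp_conn")
with ssh_hook.get_conn() as client:
    sftp = client.open_sftp()  # a paramiko SFTPClient
    print(sftp.listdir("/upload"))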

Related

default_time_zone not recognised by MariaDB

I'm trying to set the GLOBAL time zone of my MariaDB database to UTC. I've followed the recommendations of the official documentation and set default_time_zone="+00:00" in the my.cnf file; however, it does NOT work, and I get the following error when I start it in the shell: mysql: unknown variable 'default_time_zone=+00:00'.
Does anyone have an idea?
Thanks
Remove the quotes
As the error message states, the command-line client (mysql) complains about an unknown variable: default_time_zone is a server variable, not a client variable, so you added it in the wrong section. Move the entry to the server section:
[server]
default_time_zone=+00:00
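After restarting the server, you can verify that the setting took effect with:
SELECT @@global.time_zone;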

Docker container failed to start when deploying to Google Cloud Run

I am new to GCP, and am trying to teach myself by deploying a simple R script in a Docker container that connects to BigQuery and writes the system time. I am following along with this great tutorial: https://arbenkqiku.github.io/create-docker-image-with-r-and-deploy-as-cron-job-on-google-cloud
So far, I have:
1.- Made the R script
library(bigrquery)
library(tidyverse)
bq_auth(path = "/home/rstudio/xxxx-xxxx.json", email = "xxxx@xxxx.com")
project = "xxxx-project"
dataset = "xxxx-dataset"
table = "xxxx-table"
job = insert_upload_job(project = project, data = dataset, table = table,
                        write_disposition = "WRITE_APPEND",
                        values = Sys.time() %>% as_tibble(), billing = project)
2.- Made the Dockerfile
FROM rocker/tidyverse:latest
RUN R -e "install.packages('bigrquery', repos='http://cran.us.r-project.org')"
ADD xxxx-xxxx.json /home/rstudio
ADD big-query-tutorial.R /home/rstudio
EXPOSE 8080
CMD ["Rscript", "/home/rstudio/big-query-tutorial.R", "--host", "0.0.0.0"]
3.- Successfully ran the container locally on my machine, with the system time being written to BigQuery
4.- Pushed the container to my container registry in Google Cloud
When I try to deploy the container to Cloud Run (fully managed) using the Google Cloud Console I get this error: 
"Cloud Run error: Container failed to start. Failed to start and then listen on the port defined by the PORT environment variable. Logs for this revision might contain more information. Logs URL:https://console.cloud.google.com/logs/xxxxxx"
When I review the log, I find these noteworthy entries:
1.- A warning that says "Container called exit(0)"
2.- Two errors that say "Container Sandbox: Unsupported syscall setsockopt(0x8,0x0,0xb,0x3e78729eedc4,0x4,0x8). It is very likely that you can safely ignore this message and that this is not the cause of any error you might be troubleshooting. Please, refer to https://gvisor.dev/c/linux/amd64/setsockopt for more information."
When I check BigQuery, I find that the system time was written to the table, even though the container failed to deploy.
When I use the port specified in the tutorial (8787) Cloud Run throws an error about an "Invalid ENTRYPOINT".
What does this error mean? How can it be fixed? I'd greatly appreciate input as I'm totally stuck!
Thank you!
H.
John's comment points at the right source of the errors: you need to expose a web server that listens on $PORT and answers HTTP/1 or HTTP/2 requests.
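For completeness, if you do want the container to start on Cloud Run itself, the script has to be wrapped in a small web server. A rough sketch using the plumber package (my assumption, not part of the tutorial; the package would also need to be installed in the Dockerfile):
library(plumber)

# Build a tiny API whose only endpoint reports the system time,
# and listen on the port Cloud Run injects via $PORT.
pr <- pr()
pr <- pr_get(pr, "/", function() list(time = as.character(Sys.time())))
pr_run(pr, host = "0.0.0.0", port = as.integer(Sys.getenv("PORT", "8080")))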
However, I have a workaround: you can use Cloud Build for this. Simply define a build step with your container name, plus args if needed, as sketched below.
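A minimal sketch of such a cloudbuild.yaml (the image path is a placeholder): Cloud Build pulls the image and runs it as a step, so the script executes without Cloud Run's listening-port requirement.
steps:
  - name: "gcr.io/your-project/your-image"  # placeholder image path
    # args: ["..."]  # optional arguments passed to the container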
Let me know if you need more guidance on this (strange) workaround.
The log message "Container Sandbox: Unsupported syscall setsockopt" from Google Cloud Run is documented as a known gVisor issue.

symfony/silex csrf.token_manager not defined

I have a project in Silex that works perfectly on my Windows machine running XAMPP, but when I cloned it to my Ubuntu machine it threw the error "InvalidArgumentException in Container.php line 96:
Identifier "csrf.token_manager" is not defined."
I'm not using XAMPP or LAMP on Ubuntu, so I guess it could be some configuration of Apache2 or PHP.
I had an error before with csrf_provider and solved it with
$app['form.csrf_provider'] = null;
but if I try to do something like that here, it says it expects CsrfProviderInterface, CsrfTokenManagerInterface, or null, or it just ignores whether I'm logged in or not. I tried to find something on this, but I only found how to handle the token manager manually; I just want it to work as is. Thanks in advance.
Thanks to mTorres, I solved it by registering the CsrfServiceProvider:
use Silex\Provider\CsrfServiceProvider;
$app->register(new CsrfServiceProvider());
No idea why it works on Windows without it, though.
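One possible explanation for the Windows/Ubuntu difference (an assumption, not something confirmed in the question): Silex's CsrfServiceProvider relies on the symfony/security-csrf component, so differing composer install states between the two machines could make the csrf.token_manager service resolve on one and not the other.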

Getting InvalidProtocolBufferException while running oozie job

I'm getting the below exception while running the sample Oozie examples.
I've modified the job.properties located at /examples/apps/map-reduce with the appropriate nameNode and jobTracker details.
I'm using the below command to run the Oozie job:
sudo oozie job -oozie http://ip-10-0-20-143.ec2.internal:11000/oozie -config examples/apps/map-reduce/job.properties -run
Error: E0501 : E0501: Could not perform authorization operation, Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.; Host Details : local host is: "ip-10-0-20-143.ec2.internal/10.0.20.143"; destination host is: "ip-10-0-20-144.ec2.internal":50070;
The Hadoop core-site.xml also has the correct proxyuser details for the oozie user.
I really don't know where it's going wrong.
I'll answer in case someone googles up this page.
In my case the cause was using the HTTP address for the NameNode: port 50070 is the NameNode's web UI port, while Oozie needs the HDFS RPC port (8020 by default), which is why the protobuf parser chokes on the HTTP response.
Check your job configuration, and if there is something like:
nameNode=yourhostname:50070
you should change it to something like this:
nameNode=hdfs://yourhostname:8020
Check your ports first, of course!
Note that the jobTracker parameter uses a different notation. In my case it's:
jobTracker=yourhostname:8021
and it works fine.
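Putting the two together, the relevant lines of job.properties end up looking something like this (hostname and ports are examples; check your cluster's actual RPC ports):
nameNode=hdfs://yourhostname:8020
jobTracker=yourhostname:8021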
Hope this helps someone.

How to install R packages via proxy [user + password]

I need authentication to use the internet; say these are my variables:
Proxy: 1ncproxy1
Port: 80
Login: MyLoGiN
Pass: MyPaSs
How can I install a package in R, along with its dependencies?
Such that the following would work:
install.packages("TSA", dependencies=TRUE)
without internet connection failures?
I tried this:
Sys.setenv("ftp_proxy" = "1ncproxy1", "ftp_proxy_user" = "MyLoGiN", "ftp_proxy_password" = "MyPaSs")  # Port = 80
But I get:
Warning: unable to access index for repository http://cran.ma.imperial.ac.uk/src/contrib
or
cannot open: HTTP status was '407 Proxy Authentication Required'
Many thanks,
You are probably on Windows, so I would advise you to check the 'R for Windows FAQ' that came with your installation, particularly Question 2.19: 'The Internet download functions fail.' You may need to restart R with the --internet2 option (IIRC) for the proxy settings to take effect.
I always found this very cumbersome. An alternative is to install a proxy-aware downloader such as wget (as a Windows binary), where you set the proxy options in a file in your home directory. This is all from memory; I think the last time I was faced with such a proxy was in 2005, so YMMV.
As @juba states, I think you want to set http_proxy. From ?download.file:
Usernames and passwords can be set for HTTP proxy transfers via
environment variable http_proxy_user in the form user:passwd.
Alternatively, http_proxy can be of the form
"http://user:pass@proxy.dom.com:8080/"
So, try: Sys.setenv(http_proxy="http://MyLoGiN:MyPaSs@1ncproxy1:80")
Be aware though:
These environment variables must be set before the download code is
first used: they cannot be altered later by calling Sys.setenv.
So you are best off calling it in your .Rprofile
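For example, a sketch of the relevant .Rprofile lines, reusing the placeholder credentials from the question:
# in ~/.Rprofile - runs before any download code is used
Sys.setenv(http_proxy = "http://MyLoGiN:MyPaSs@1ncproxy1:80")
Sys.setenv(https_proxy = "http://MyLoGiN:MyPaSs@1ncproxy1:80")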
+1 for Juba, above. This worked for me:
$ export http_proxy=http://username:password@the-proxy.mycompany.com:80
$ R
> install.packages("quantmod")
I tried to install the swirl package and had the same problem: a proxy with authorisation.
After some experimenting I found a solution; maybe my answer will help somebody.
On Windows 7:
Set one or more environment variables as needed: http_proxy (plus https_proxy and ftp_proxy if you need them). If you don't know how, see http://www.computerhope.com/issues/ch000549.htm
The format is http_proxy="http://ProxyUsername:ProxyUserPassword@ProxyServerName:ProxyPort" (use '@', not the encoded '%40', as the separator).
In RStudio, under Tools -> Global Options -> Packages, check the box "Use Internet Explorer library/proxy for HTTP".
As Jeff Taylor wrote, R can indirectly make use of a proxy server. You need to specify the proxy server for both the http and https protocols, as follows:
$ export http_proxy=http://user:pass@proxy_server:port
$ export https_proxy=http://user:pass@proxy_server:port
$ R
> install.packages("<package_name>")
I just tested this solution and it works like a charm. Jeff's answer was correct but unfortunately incomplete for most cases, as most servers are nowadays accessible over https.
