Google Cloud Composer and MS SQL using Cloud Proxy - airflow

I'm building a Cloud SQL (MS SQL Server) to BigQuery integration using Airflow on GCP (Composer). I've set up a Cloud SQL Proxy in the GKE cluster, and it is running fine with no errors:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: cloud-sql-proxy
  name: cloud-sql-proxy
  namespace: cloud-sql-to-bq
spec:
  replicas: 1
  selector:
    matchLabels:
      run: cloud-sql-proxy
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        run: cloud-sql-proxy
    spec:
      containers:
      - command:
        - /cloud_sql_proxy
        - -instances=[INSTANCE-NAME]=tcp:0.0.0.0:1433
        image: b.gcr.io/cloudsql-docker/gce-proxy:latest
        imagePullPolicy: IfNotPresent
        name: airflow-sqlproxy
        ports:
        - containerPort: 1433
          protocol: TCP
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      nodeSelector:
        cloud.google.com/gke-nodepool: default-pool
      restartPolicy: Always
My DAG:
dag = DAG('mssql-export-demo', catchup=False, default_args=default_args)

cloud_storage_bucket_name = 'mssql-export-test'

export_customers = MsSqlToGoogleCloudStorageOperator(
    task_id='export_analysis',
    sql='SELECT * FROM vwAnalysis;',
    bucket=cloud_storage_bucket_name,
    filename='data/customers/export.json',
    schema_filename='schemas/export.json',
    mssql_conn_id='cloud_sql_proxy_conn',
    dag=dag
)
I've also created a connection in Airflow, cloud_sql_proxy_conn, that points to the proxy.
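For reference, the mssql_conn_id above resolves to a connection whose host must be reachable from the Composer workers on port 1433. A minimal sketch of defining it programmatically (all values below are placeholders; the same connection can equally be created through the Airflow UI or an AIRFLOW_CONN_* environment variable):
from airflow import settings
from airflow.models import Connection

# Hypothetical connection pointing at the proxy; host and credentials are placeholders.
conn = Connection(
    conn_id='cloud_sql_proxy_conn',
    conn_type='mssql',
    host='cloud-sql-proxy.cloud-sql-to-bq.svc.cluster.local',  # placeholder: in-cluster address of the proxy
    port=1433,
    login='sqlserver',       # placeholder
    password='CHANGE_ME',    # placeholder
    schema='my_database',    # placeholder: database name
)
session = settings.Session()
session.add(conn)
session.commit()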
When I run the DAG I get the following error:
[2020-11-28 01:59:20,555] {taskinstance.py:1153} ERROR - Connection to the database failed for an unknown reason.
Traceback (most recent call last):
  File "src/pymssql.pyx", line 636, in pymssql.connect
  File "src/_mssql.pyx", line 1964, in _mssql.connect
  File "src/_mssql.pyx", line 683, in _mssql.MSSQLConnection.__init__
_mssql.MSSQLDriverException: Connection to the database failed for an unknown reason
There's no other error message, which makes this quite difficult to debug. Does anybody have experience with MS SQL Server on Cloud SQL and Composer who can help me figure this out?

Airflow now provides the CloudSqlInstanceExportOperator, which means there is no need to set up a Cloud SQL Proxy in GKE: the operator calls the Cloud SQL Admin API to export data from the instance directly to a GCS bucket.
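For reference, a sketch of a CSV export with that operator, assuming the contrib import path from Airflow 1.10 (recent google provider releases expose it as CloudSQLExportInstanceOperator); the project, instance, and bucket values are placeholders, and the body follows the Cloud SQL Admin API instances.export request format:
from airflow.contrib.operators.gcp_sql_operator import CloudSqlInstanceExportOperator

# Export request body for the Cloud SQL Admin API; which fileType values are
# supported depends on the database engine (SQL Server also offers BAK).
export_body = {
    "exportContext": {
        "fileType": "CSV",
        "uri": "gs://mssql-export-test/data/customers/export.csv",
        "csvExportOptions": {
            "selectQuery": "SELECT * FROM vwAnalysis;"
        },
    }
}

export_analysis_via_api = CloudSqlInstanceExportOperator(
    task_id='export_analysis_via_api',
    project_id='my-gcp-project',         # placeholder
    instance='my-cloud-sql-instance',    # placeholder: Cloud SQL instance name
    body=export_body,
    dag=dag,
)
The export runs inside Cloud SQL and writes directly to the GCS bucket, so no proxy or database connection from Composer is needed.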

Related

Airflow helm chart 1.7 - Mounting DAGs from an externally populated PVC and non-default DAG path

I want to use Airflow in Kubernetes on my local machine.
According to the Airflow Helm chart docs, I should use a PVC to mount my local DAG files, so I set up my PV and PVC like so:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: dags-pv
spec:
  volumeMode: Filesystem
  storageClassName: local-path
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: /mnt/c/Users/me/dags
    type: Directory
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dags-pvc
spec:
  storageClassName: local-path
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 3Gi
Then I create an override-values.yaml file:
config:
  core:
    dags_folder: "/usr/somepath/airflow/dags"
dags:
  persistence:
    enabled: true
    existingClaim: "dags-pvc"
  gitSync:
    enabled: false
Note that I want to change the default DAG folder path, and that's where I am having difficulties (it works if I keep the default DAG folder path): I don't know how to create a mount point and attach the PVC to it.
I tried adding the following to my override file:
worker:
  extraVolumeMounts:
    - name: w-dags
      mountPath: "/usr/somepath/airflow/dags"
  extraVolumes:
    - name: w-dags
      persistentVolumeClaim:
        claimName: "dags-pvc"
scheduler:
  extraVolumeMounts:
    - name: s-dags
      mountPath: "/usr/somepath/airflow/dags"
  extraVolumes:
    - name: s-dags
      persistentVolumeClaim:
        claimName: "dags-pvc"
But that doesn't work; my scheduler is stuck on Init:0/1 with: "Unable to attach or mount volumes: unmounted volumes=[dags], unattached volumes=[logs dags s-dags config kube-api-access-9mc4c]: timed out waiting for the condition". So I can tell I broke a condition (dags should be mounted, i.e. my extraVolumes section is wrong), but I am not sure where to go from here.

Kubernetes Mariadb service cannot be accessed

I wanted to set up WordPress with Kubernetes, but WordPress can't reach the database via the mariadb Service host. This is my manifest:
---
apiVersion: v1
kind: Service
metadata:
  name: db-wordpress
  labels:
    app: mariadb-database
spec:
  selector:
    app: mariadb-database
  ports:
    - port: 3306
  clusterIP: None
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mariadb-database
spec:
  selector:
    matchLabels:
      app: mariadb-database
  template:
    metadata:
      labels:
        app: mariadb-database
    spec:
      containers:
        - name: mariadb-database
          image: darywinata/mariadb:1.0
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: database-secret
                  key: password
            - name: MYSQL_USER
              value: blibli
            - name: MYSQL_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: database-secret
                  key: password
            - name: MYSQL_DATABASE
              value: wpdb
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: servicetype
                    operator: In
                    values:
                      - database-mariadb
I've already been fighting this error for a week; can somebody help me with this?
Note: inside the Docker container, port 3306 is not listening; I don't know whether that is wrong or not.
Hi there and welcome to Stack Overflow.
There are two issues with your setup. First of all, I tried running your mysql Docker image locally and, compared to the official mysql image, it is not listening on any port. Without the mysql process listening on a port, you will not be able to connect to it.
Also, you might want to consider a standard internal Service instead of one with clusterIP: None, which is a headless Service and is usually used for StatefulSets rather than Deployments; more information can be found in the official documentation.
So, in order to connect from your application to your pod:
Fix the problem with your custom mysql image so that it actually listens on port 3306 (or whatever port you have configured in your image).
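For the headless-Service point above, a regular ClusterIP Service could look roughly like this; it simply drops clusterIP: None and reuses the selector and port from the question:
apiVersion: v1
kind: Service
metadata:
  name: db-wordpress
  labels:
    app: mariadb-database
spec:
  selector:
    app: mariadb-database
  ports:
    - port: 3306
      targetPort: 3306
WordPress should then be able to reach the database at db-wordpress:3306 (or db-wordpress.<namespace>.svc.cluster.local from another namespace), once the image actually listens on that port.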

kubernetes persistent volume for nginx not showing the default index.html file

I am testing out something with PVs and wanted to get some clarification. We have an 18-node cluster (using Docker EE), and we have mounted an NFS share on each of these nodes to be used for the k8s persistent storage. I created a PV (using hostPath) to bind it to my nginx deployment (mounting /usr/share/nginx/html to the PV).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-test-namespace-pv
  namespace: test-namespace
spec:
  storageClassName: manual
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: "/nfs_share/docker/mynginx/demo"
And this is how I create the PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-test-namespace-pvc
  namespace: test-namespace
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
Deployment File:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mynginx
spec:
  selector:
    matchLabels:
      run: mynginx-apps
  replicas: 2
  template:
    metadata:
      labels:
        run: mynginx-apps
    spec:
      volumes:
        - name: task-pv-storage
          persistentVolumeClaim:
            claimName: nfs-test-namespace-pvc
      containers:
        - name: mynginx
          image: dtr.midev.spglobal.com/spgmi/base:mynginx-v1
          ports:
            - containerPort: 80
              name: "http-server"
          volumeMounts:
            - mountPath: "/usr/share/nginx/html"
              name: task-pv-storage
So I assume that when my pod starts, the default index.html file from the nginx image should be available at /usr/share/nginx/html within my pod, and it should also be copied/available at /nfs_share/mynginx/demo.
However, I am not seeing any file there, and when I expose this deployment and access the service it gives me a 403 error because the index file is not available. When I instead create an HTML file, either from inside the pod or from the node on the NFS share mounted as the PV, it works as expected.
Is my assumption that the default file gets copied to the hostPath correct, or am I missing something?
Your /nfs_share/docker/mynginx/demo will not be available in the pod the way you expect; an explanation is available here:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
The configuration file specifies that the volume is at /mnt/data on the cluster’s Node. The configuration also specifies a size of 10 gibibytes and an access mode of ReadWriteOnce, which means the volume can be mounted as read-write by a single Node. It defines the StorageClass name manual for the PersistentVolume, which will be used to bind PersistentVolumeClaim requests to this PersistentVolume.
You do not see the PV directly in your pod; the PV is consumed through a PVC, which is then mounted inside the pod.
You can read the whole article, Configure a Pod to Use a PersistentVolume for Storage, which should answer all of these questions.
Also note that the /mnt/data directory in the example above must actually be created on the node where your pod is running; in your case that is /nfs_share/docker/mynginx/demo.

GKE Load Balancer Connection Refused

I am trying to set up my app on GKE and use an internal load balancer for public access. I am able to deploy the cluster / load balancer service without any issues, but when I try to access the external IP address of the load balancer, I get Connection Refused and I am not sure what is wrong or how to debug this.
These are the steps I did:
I applied my deployment YAML file via kubectl apply -f file.yaml, and then applied my load balancer service YAML file with kubectl apply -f service.yaml. After both were deployed, I ran kubectl get service to fetch the external IP address of the load balancer.
Here is my deployment.yaml file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app-api
          image: gcr.io/...
          ports:
            - containerPort: 8000
          resources:
            requests:
              memory: "250M"
              cpu: "250m"
            limits:
              memory: "1G"
              cpu: "500m"
        - name: my-app
          image: gcr.io/...
          ports:
            - containerPort: 3000
          resources:
            requests:
              memory: "250M"
              cpu: "250m"
            limits:
              memory: "1G"
              cpu: "500m"
and here is my service.yaml file:
apiVersion: v1
kind: Service
metadata:
  name: my-app-ilb
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
  labels:
    app: my-app-ilb
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 3000
      targetPort: 3000
      protocol: TCP
My deployment has two containers: a backend API and a frontend. What I want is to be able to go to [external ip address]:3000 and see my web app.
I hope this is enough information; please let me know if there is anything else I may be missing / can add.
Thank you all!
You need to allow traffic to flow into your cluster by creating a firewall rule:
gcloud compute firewall-rules create my-rule --allow=tcp:3000
Remove this annotation:
annotations:
  cloud.google.com/load-balancer-type: "Internal"
You need an external load balancer.
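Putting that together, a sketch of the Service with the internal annotation removed (same selector and ports as in the question):
apiVersion: v1
kind: Service
metadata:
  name: my-app-ilb
  labels:
    app: my-app-ilb
spec:
  type: LoadBalancer        # provisions an external network load balancer on GKE by default
  selector:
    app: my-app
  ports:
    - port: 3000
      targetPort: 3000
      protocol: TCP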

How to prevent a Cronjob execution in Kubernetes if there is already a job running

I have to deploy a CronJob in Kubernetes that will create a Job pod every 15 minutes. The Job will check whether a service is ready to provide new data; once that service is ready, the Job will take more than an hour to complete its execution. The problem is that other Jobs would be scheduled and executed during that time.
In short, how can I prevent a Job from being executed by a Kubernetes CronJob when there's already a Job running?
The CronJob resource has a property called concurrencyPolicy; setting it to Forbid makes Kubernetes skip a scheduled run while the previous Job is still running. Here is an example:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: your-cron
spec:
  schedule: "*/40 8-18 * * 1-6"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: your-periodic-job
        spec:
          containers:
            - name: your-container
              image: your_image
              imagePullPolicy: IfNotPresent
          restartPolicy: OnFailure
