Flume to stream in weather data - hadoop-streaming

I am new to Flume, but I want to stream weather data from a website to my HDFS location.
So I have created the sink, source, and channel as below:
weather.channels= memory-channel
weather.channels.memory-channel.capacity=10000
weather.channels.memory-channel.type = memory
weather.sinks = hdfs-write
weather.sinks.hdfs-write.channel=memory-channel
weather.sinks.hdfs-write.type = logger
weather.sinks.hdfs-write.hdfs.path = hdfs://localhost:8020/user/hadoop/flume
weather.sinks.hdfs-write.rollInterval = 1200
weather.sinks.hdfs-write.hdfs.writeFormat=Text
weather.sinks.hdfs-write.hdfs.fileType=DataStream
weather.sources= Weather
weather.sources.Weather.bind = api.openweathermap.org/data/2.5/forecast/city?id=524901&APPID=********************************
weather.sources.Weather.channels=memory-channel
weather.sources.Weather.type = netcat
weather.sources.Weather.port = 80
So I am using an API to work with this here.
What else can I use to stream in weather data? Which online website or API should I use to configure the source?
While executing the flume-ng command to start the agent, I get the following:
15/03/18 11:13:28 ERROR lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner:{
source:org.apache.flume.source.http.HTTPSource{name:Weather,state:IDLE} } - Exception follows.
java.lang.IllegalStateException: Running HTTP Server found in
source:Weather before I started one.Will not attempt to start.
at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
at org.apache.flume.source.http.HTTPSource.start(HTTPSource.java:189)
at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/03/18 11:13:31 INFO lifecycle.LifecycleSupervisor: Stopping lifecycle supervisor 10
15/03/18 11:13:31 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider stopping
15/03/18 11:13:31 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: memory-channel stopped

The "lifecycle" error you see is the consequence of a previous error while trying to start the HTTP server.
The original error is likely due to trying to bind to the privileged port 80 as a non-root user. Change the port to something >1024, e.g. 8080.
However, it won't work the way you intend: an HTTP or netcat source listens for incoming calls; it doesn't go and fetch the URL you are setting in bind.
I see two options:
Create a Linux daemon that runs wget or curl against that URL at regular intervals and saves the result to a file, then configure Flume with the spooling directory source (see the sketch below).
Create your own Flume source that polls that URL at regular intervals.
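A minimal sketch of the first option, assuming a cron job and a local spool directory (/var/spool/weather is a placeholder). Note also that for the hdfs.* settings in your config to take effect, the sink type presumably needs to be hdfs rather than logger:

# crontab entry: fetch the forecast every 10 minutes; write to a temp file and
# move it in, because the spooldir source expects files to be immutable once they appear
*/10 * * * * curl -s "http://api.openweathermap.org/data/2.5/forecast/city?id=524901&APPID=YOUR_KEY" > /tmp/weather.json && mv /tmp/weather.json /var/spool/weather/weather-$(date +\%s).json

# Flume agent config: spooling directory source feeding the existing HDFS sink
weather.sources = Weather
weather.sources.Weather.type = spooldir
weather.sources.Weather.spoolDir = /var/spool/weather
weather.sources.Weather.channels = memory-channel
weather.sinks.hdfs-write.type = hdfs
weather.sinks.hdfs-write.hdfs.rollInterval = 1200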

Related

Error while trying to send logs with rsyslog without local storage

I'm trying to send logs into datadog using rsyslog. Ideally, I'm trying to do this without having the logs stored on the server hosting rsyslog. I've run into an error in my config that I haven't been able to find out much about. The error occurs on startup of rsyslog.
omfwd: could not get addrinfo for hostname '(null)':'(null)': Name or service not known [v8.2001.0 try https://www.rsyslog.com/e/2007 ]
Here's the portion I've added into the default rsyslog.config
module(load="imudp")
input(type="imudp" port="514" ruleset="datadog")
ruleset(name="datadog"){
action(
type="omfwd"
action.resumeRetryCount="-1"
queue.type="linkedList"
queue.saveOnShutdown="on"
queue.maxDiskSpace="1g"
queue.fileName="fwdRule1"
)
$template DatadogFormat,"00000000000000000 <%pri%>%protocol-version% %timestamp:::date-rfc3339% %HOSTNAME% %app-name% - - - %msg%\n "
$DefaultNetstreamDriverCAFile /etc/ssl/certs/ca-certificates.crt
$ActionSendStreamDriver gtls
$ActionSendStreamDriverMode 1
$ActionSendStreamDriverAuthMode x509/name
$ActionSendStreamDriverPermittedPeer *.logs.datadoghq.com
*.* ##intake.logs.datadoghq.com:10516;DatadogFormat
}
First things first.
The module imudp enables log reception over UDP.
The module omfwd enables log forwarding over TCP, UDP, etc.
So most probably - or at least as far as I can tell - with rsyslog you just want to receive messages and then forward them to Datadog.
I don't know anything about the $ActionSendStreamDriver directives, so I can't help you there. But what jumps out is that your action doesn't define where the logs should be sent, which is exactly what the "could not get addrinfo for hostname '(null)'" error is complaining about:
ruleset(name="datadog"){
action(
type="omfwd"
target="10.100.1.1"
port="514"
protocol="udp"
...
)
...
}
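Applied to the Datadog case, the target would presumably be the intake endpoint already present in the legacy *.* line of your config (a sketch; the in-action TLS stream-driver parameters are an assumption based on newer rsyslog v8 syntax, so verify them against your version):

ruleset(name="datadog"){
    action(
        type="omfwd"
        target="intake.logs.datadoghq.com"
        port="10516"
        protocol="tcp"
        template="DatadogFormat"
        StreamDriver="gtls"
        StreamDriverMode="1"
        StreamDriverAuthMode="x509/name"
        StreamDriverPermittedPeers="*.logs.datadoghq.com"
        action.resumeRetryCount="-1"
        queue.type="linkedList"
        queue.saveOnShutdown="on"
        queue.maxDiskSpace="1g"
        queue.fileName="fwdRule1"
    )
}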

Running WireMock server as a standalone process

I am trying to set up a mock server using wireMock as a standalone process. I downloaded the jar file and executed the following command:
java -jar wiremock-standalone-2.23.2.jar --port 0
I had to dynamically determine a port because I am already using the default 8080 port for another program running on my machine. It gave me the port number 55142, but when I tried accessing that on the web, it gave me the following error:
HTTP ERROR 403
Problem accessing /__files/. Reason:
Forbidden
Powered by Jetty://
It's probably because you just entered http://localhost:55142,
and there are no mappings in the ./mappings directory and no files in the ./__files directory (both located next to where your wiremock JAR file is), so WireMock reports:
2019-06-04 00:10:58.890 Request was not matched as there were no stubs registered:
{
"url" : "/"
...
}
Please try calling the __admin endpoint to see whether WireMock is working:
http://localhost:55142/__admin
See also the WireMock documentation for more useful admin commands.
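If you want / itself to return something instead of a 403/404, a minimal stub mapping does the trick (a sketch; the file name hello.json is arbitrary). Save it in the ./mappings directory next to the JAR and restart:

{
  "request": {
    "method": "GET",
    "url": "/"
  },
  "response": {
    "status": 200,
    "body": "Hello from WireMock"
  }
}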

Airflow: How to setup log directory?

I uploaded a DAG file via the web page, and when I click 'Graph View' -> ${my_dag} -> 'View Log', it shows:
*** Log file isn't local.
*** Fetching here: http://:8793/log/demo_dag/hello_task/2018-11-14T15:06:00
*** Failed to fetch log file from worker.
*** Reading remote logs...
*** Unsupported remote log location.
I have checked the airflow.cfg and find these config info:
worker_log_server_port = 8793
base_log_folder = /root/airflow/logs
My questions are:
How do I set up the IP address for the log service (only the port is set up)?
I have set up a directory for the log service, so why does it still go to /log/.. ?
Any help is appreciated.
This can happen when the task status was manually changed (likely through the "Mark Success" option) and the task never receives a hostname value on the record.
The webserver is attempting to reach out to a server, with no name, to get logs for a task that never ran.
PS: Be careful running processes as the root user.
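As for the "Unsupported remote log location" line: that only matters if you actually want remote logging. A sketch of the relevant airflow.cfg keys (names from the Airflow 1.10 era; newer versions move them under a [logging] section, and the bucket and connection id here are placeholders):

[core]
remote_logging = True
remote_base_log_folder = s3://my-bucket/airflow/logs
remote_log_conn_id = my_s3_conn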
I've been getting this error and fixed it by correcting the socket volume path:
WARNING - OSError while attempting to symlink the latest log directory
On Windows the volume needs a double slash, like this:
volumes:
- //var/run/docker.sock:/var/run/docker.sock
See: Bind to docker socket on Windows; Setting up Airflow to run with Docker Swarm's orchestration.

Kubernetes GKE Error dialing backend: EOF on random exec command

On GKE we are experiencing some random errors with the API.
From time to time we get "Error dialing backend: EOF".
We use Jenkins on top of Kubernetes to manage our builds, and every so often a job is killed with this error:
Executing shell script inside container [protobuf] of pod [kubernetes-bad0aa993add416e80bdc1e66d1b30fc-536045ac8bbe]
java.net.ProtocolException: Expected HTTP 101 response but was '500 Internal Server Error'
at com.squareup.okhttp.ws.WebSocketCall.createWebSocket(WebSocketCall.java:123)
at com.squareup.okhttp.ws.WebSocketCall.access$000(WebSocketCall.java:40)
at com.squareup.okhttp.ws.WebSocketCall$1.onResponse(WebSocketCall.java:98)
at com.squareup.okhttp.Call$AsyncCall.execute(Call.java:177)
at com.squareup.okhttp.internal.NamedRunnable.run(NamedRunnable.java:33)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
This case looks a lot like: https://gitlab.com/gitlab-org/gitlab-runner/issues/3247
Many audit log entries look like:
permission: "io.k8s.core.v1.pods.exec.create"
resource: "core/v1/namespaces/default/pods/pubsub-6132c0bc-2542-46a2-8041-c865f238698d-4ccc0-c1nkz-lqg5x/exec/pubsub-6132c0bc-2542-46a2-8041-c865f238698d-4ccc0-c1nkz-lqg5x"
and
permission: "io.k8s.core.v1.pods.exec.get"
resource: "core/v1/namespaces/default/pods/pubsub-a5a21f14-0bd1-4338-87b1-8658c3bbc7ad-9gm4n-8nz14/exec"
But I don't understand why this error occurs on Kubernetes...
Update:
These errors can be correlated with two kube-state-metrics values:
- ssh_tunnel_open_count
- ssh_tunnel_open_fail_count
For me, ssh_tunnel_open_fail_count started to grow once there were more than 200 SSH tunnels open.
For information, we ran some tests on GKE:
- switching from a zonal to a regional cluster
- using the new VPC-native networking (alias IPs)
But this did not solve the problem.
After disabling auto-scaling on the node pool, we have no more errors.
I could fix this issue by deactivating the optimize-utilization autoscaling profile, i.e., resetting the profile back to the default balanced. optimize-utilization is in beta status anyway.
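A sketch of resetting the profile (cluster name and zone are placeholders; depending on your gcloud version the flag may still sit behind the beta track):

gcloud container clusters update my-cluster \
    --autoscaling-profile balanced \
    --zone us-central1-a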

How to make HTTP stream from RTSP

I have an LPR camera which produces an RTSP stream at rtsp://172.16.4.6. I use VLC to view this stream. Then I need to pass an http:// stream to the ALPR daemon to recognize registration plates captured by the camera; according to its documentation it should be http:// only. So using VLC I am trying to convert/transcode it to the proper format. The problem is that I am not familiar with this field and have no time to study the basics.
I installed apache2 on Ubuntu, which uses port 80 and the address http://127.0.0.1. Then I tried some of the approaches from the documentation at https://wiki.videolan.org/Documentation:Streaming_HowTo/Command_Line_Examples/ , for example:
vlc -I http rtsp://172.16.4.6:554/HighResolutionVideo :sout='#transcode{vcodec=MJPG,vb=800,fps=5}:std{access=http{mime=multipart/x-mixed-repace},mux=mpjpeg,dst=127.0.0.1:80/go.mjpg,delay=0}'
But then I have this error log:
[00007f5fb0001268] core access out error: socket bind error: Permission denied
[00007f5fb0001268] core access out error: cannot create socket(s) for HTTP host
[00007f5fb0001268] access_output_http access out error: cannot start HTTP server
[00007f5fb0003388] stream_out_standard stream out error: no suitable sout access module for `http{mine=multipart/x-mixed-repace}/mpjpeg://172.0.0.1:80/go.mjpg'
[00007f5fb0000b18] core stream output error: stream chain failed for `transcode{vcodec=MJPG,vb=800,fps=5}:std{access=http{mine=multipart/x-mixed-repace},mux=mpjpeg,dst=172.0.0.1:80/go.mjpg,delay=0}'
[00007f5fb42929f8] core input error: cannot start stream output instance, aborting
[00007f5fb0003388] access_output_http access out: Consider passing --http-host=IP on the command line instead.
[00007f5fb0003388] core access out error: socket bind error: Permission denied
[00007f5fb0003388] core access out error: cannot create socket(s) for HTTP host
[00007f5fb0003388] access_output_http access out error: cannot start HTTP server
[00007f5fb0001268] stream_out_standard stream out error: no suitable sout access module for `http{mine=multipart/x-mixed-repace}/mpjpeg://172.0.0.1:80/go.mjpg'
[00007f5fb0000b18] core stream output error: stream chain failed for `transcode{vcodec=MJPG,vb=800,fps=5}:std{access=http{mine=multipart/x-mixed-repace},mux=mpjpeg,dst=172.0.0.1:80/go.mjpg,delay=0}'
[00007f5fb42929f8] core input error: cannot start stream output instance, aborting
I think you have a typo in "repace". Note also the "Permission denied" bind errors: port 80 is privileged, so either run VLC with sufficient rights or use a port >1024. Try this:
--sout '#transcode{vcodec=MJPG,venc=ffmpeg{strict=1}}:standard{access=http{mime=multipart/x-mixed-replace;boundary=--7b3cc56e5f51db803f790dad720ed50a},mux=mpjpeg,dst=:80/go.mjpg}'
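Putting it together with the camera URL from the question (a sketch; port 8080 is an assumption to sidestep the privileged-port problem, and the camera path is taken from your command):

vlc -I dummy rtsp://172.16.4.6:554/HighResolutionVideo \
    --sout '#transcode{vcodec=MJPG,venc=ffmpeg{strict=1}}:standard{access=http{mime=multipart/x-mixed-replace;boundary=--7b3cc56e5f51db803f790dad720ed50a},mux=mpjpeg,dst=:8080/go.mjpg}'

The ALPR daemon would then read the stream from http://<host>:8080/go.mjpg.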
