Where is emrfs-site.xml? - emr

Where is emrfs-site.xml on the EMR master node?
Consistent view is disabled within the EMR UI but I am unable to find the configuration file to verify.
sudo find / -name emrfs-site.xml
yields
/var/aws/emr/bigtop-deploy/puppet/modules/emrfs/templates/emrfs-site.xml
/usr/share/aws/emr/emrfs/conf/emrfs-site.xml
Neither of these seems to be relevant.

This is the relevant one:
/usr/share/aws/emr/emrfs/conf/emrfs-site.xml
It seems to be a strange place, but the path to the config file is hardcoded in the Java code. You can verify this by sending the root logger's debug output to the console:
export HADOOP_ROOT_LOGGER="DEBUG,console"
If consistent view is enabled, the following should be in that file:
<property>
<name>fs.s3.consistent</name>
<value>true</value>
</property>
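If you would rather not read the file by eye, the check can be scripted. The following is only a minimal sketch using the JDK's built-in XML parser; the class name EmrfsSiteCheck and the method readProperty are made up for illustration and are not part of EMR or Hadoop. It assumes the file has the usual Hadoop layout of <property> elements with <name> and <value> children, and uses the path given above as its default.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not an EMR or Hadoop class): looks up one property
// in a Hadoop-style XML configuration file.
class EmrfsSiteCheck {

    // Returns the <value> of the named property, or null if it is absent.
    static String readProperty(InputStream xml, String name) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(xml);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse configuration XML", e);
        }
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            NodeList names = prop.getElementsByTagName("name");
            NodeList values = prop.getElementsByTagName("value");
            if (names.getLength() == 0 || values.getLength() == 0) {
                continue; // skip malformed entries
            }
            if (name.equals(names.item(0).getTextContent().trim())) {
                return values.item(0).getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Default path is the one from the answer above; pass another as args[0].
        String path = args.length > 0
                ? args[0]
                : "/usr/share/aws/emr/emrfs/conf/emrfs-site.xml";
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            System.out.println("fs.s3.consistent = "
                    + readProperty(in, "fs.s3.consistent"));
        }
    }
}
```

A null result only means the property is not present in that file; how EMRFS treats a missing entry is not something this sketch determines.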

Related

How to configure log4j

How to configure log4j to show only my
log.debug("test log");
messages in console without other system generated information?
It's very distracting when, in a small app, the console is cluttered with tons of useless (at least for me) information like
DEBUG org.springframework.beans.CachedIntrospectionResults: Getting BeanInfo for class [org.thymeleaf.spring4.view.ThymeleafView]
My log4j.properties file:
log4j.rootLogger=DEBUG, stdout, file
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
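You can use the LoggerMatchFilter and DenyAllFilter to restrict your logging appender to messages coming from your code only. The snippet assumes the LoggerMatchFilter shipped in apache-log4j-extras (org.apache.log4j.filter); DenyAllFilter is in org.apache.log4j.varia.
// "stdout" is the appender name defined in the log4j.properties above.
Appender stdout = Logger.getRootLogger().getAppender("stdout");
LoggerMatchFilter filter = new LoggerMatchFilter();
filter.setLoggerToMatch("Your.Root.Namespace");
filter.setAcceptOnMatch(true);
stdout.addFilter(filter); // Match your messages only.
stdout.addFilter(new DenyAllFilter()); // Don't match anything else.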
Is your application already using log4j, or have you yet to configure it?
log4j configure steps
If log4j is already in use, change the logger level to ERROR.
Check whether your application uses an XML configuration file or a properties file for log4j, so that you can make the change there.
logger level configuration steps.

Oozie shell action: exec and file tags

I'm a newbie in Oozie and I've read some Oozie shell action examples, but they left me confused about certain things.
There are examples I've seen where there is no <file> tag.
Some examples, like the Cloudera one here, repeat the shell script in the <file> tag:
<shell xmlns="uri:oozie:shell-action:0.2">
<exec>check-hour.sh</exec>
<argument>${earthquakeMinThreshold}</argument>
<file>check-hour.sh</file>
</shell>
The example on Oozie's website, meanwhile, writes the shell script (referenced as ${EXEC} from job.properties, which points to the script.sh file) twice, separated by #.
<shell xmlns="uri:oozie:shell-action:0.1">
...
<exec>${EXEC}</exec>
<argument>A</argument>
<argument>B</argument>
<file>${EXEC}#${EXEC}</file>
</shell>
There are also examples I've seen where the path (HDFS or local?) is prepended before the script.sh#script.sh within the <file> tag.
<shell xmlns="uri:oozie:shell-action:0.1">
...
<exec>script.sh</exec>
<argument>A</argument>
<argument>B</argument>
<file>/path/script.sh#script.sh</file>
</shell>
As I understand, any shell script file can be included in the workflow HDFS path (same path where workflow.xml resides).
Can someone explain the differences in these examples and how <exec>, <file>, script.sh#script.sh, and the /path/script.sh#script.sh are used?
<file>hdfs:///apps/duh/mystuff/check-hour.sh</file> means "download that HDFS file into the Current Working Dir of the YARN container that runs the Oozie Launcher for the Shell action, using the same file name by default, so that I can reference it as ./check-hour.sh or simply check-hour.sh in the <exec> element".
<file>check-hour.sh</file> means "download that HDFS file -- from my user's home dir e.g. hdfs:///user/borat/check-hour.sh -- into etc. etc.".
<file>hdfs:///apps/duh/mystuff/check-hour.sh#youpi</file> means "download that HDFS file etc. etc., renaming it as youpi, so that I can reference it as ./youpi or simply youpi in the <exec> element".
Note that the Hue UI often inserts unnecessary # stuff with no actual name change. That's why you will see it so often.
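To make the path#name convention concrete, here is a small sketch that splits a <file> value the way the explanation above describes it. It is purely illustrative: FileTagSpec is a made-up class, not an Oozie API, and it does not resolve relative paths or download anything. It only shows which part is the path to fetch and which name the script ends up with in the working directory.

```java
// Hypothetical helper, not an Oozie API: splits a <file> value into the
// path to fetch and the name it gets in the container's working directory.
class FileTagSpec {
    final String source;    // part before '#': the path to download
    final String localName; // part after '#', or the file's base name if there is no '#'

    FileTagSpec(String source, String localName) {
        this.source = source;
        this.localName = localName;
    }

    static FileTagSpec parse(String value) {
        int hash = value.indexOf('#');
        String source = hash >= 0 ? value.substring(0, hash) : value;
        String localName = hash >= 0
                ? value.substring(hash + 1)
                // No '#': keep the original file name (text after the last '/').
                : source.substring(source.lastIndexOf('/') + 1);
        return new FileTagSpec(source, localName);
    }

    public static void main(String[] args) {
        String[] examples = {
            "check-hour.sh",
            "hdfs:///apps/duh/mystuff/check-hour.sh",
            "hdfs:///apps/duh/mystuff/check-hour.sh#youpi",
            "/path/script.sh#script.sh"
        };
        for (String value : examples) {
            FileTagSpec spec = parse(value);
            System.out.println(value + " : fetch " + spec.source
                    + ", reference in <exec> as " + spec.localName);
        }
    }
}
```

Seen this way, /path/script.sh#script.sh and /path/script.sh yield the same local name, which is why the # part is redundant when no rename is intended.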

How to retrieve currently applied node configuration from Riak v2.0+

Showing currently applied configuration values
In v2.0+ of Riak there is a new command option: riak config effective
I read this as reporting the values the Riak node is currently running with:
At any time, you can get a snapshot of currently applied
configurations through the command line. For a listing of all of the
configs currently applied in the node
Config changes applied only on start of each node?
In multiple places the Riak documentation contains statements like:
Remember that you must stop and then re-start each node when you
change storage backends or modify any other configuration
Problem:
However, when I change a setting (I've tested this in both riak.conf and advanced.config), I see the newest value when running: riak config effective
For example:
Start node: riak start
View current setting for log level: riak config effective | grep log.console.level
log.console.level = info
Change the level to debug (something that will output a lot to console.log)
Re-run: riak config effective | grep log.console.level, we get:
log.console.level = debug
Checking the console log file for debug: cat /var/log/riak/console.log | grep debug gives no results (indicating the config change has not been applied)
So the question is, how can I retrieve and verify what config setting each Riak node is running under?
When Riak starts, it creates two files: 'app..config' and 'vm..config'. The default location is in a 'generated.configs' directory under the platform data directory (usually /var/lib/riak).
These files contain the settings that were in place when Riak was started. The command riak config effective, by contrast, processes the current riak.conf and advanced.config files on disk, so it shows edits that the running node has not picked up yet.
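To find the file that reflects the running node, one option is to pick the most recently written generated config. The sketch below is an assumption-laden illustration, not a Riak tool: GeneratedConfigFinder is a made-up class, the default directory is the /var/lib/riak/generated.configs location mentioned above, and the "app." file-name prefix is taken from the file names described above. Adjust both if your platform data directory or file naming differs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper, not part of Riak: finds the newest generated config file.
class GeneratedConfigFinder {

    // Returns the most recently modified regular file in dir whose name
    // starts with the given prefix, or an empty Optional if there is none.
    static Optional<Path> newest(Path dir, String prefix) {
        try (Stream<Path> files = Files.list(dir)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().startsWith(prefix))
                    .max(Comparator.comparingLong(GeneratedConfigFinder::lastModified));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static long lastModified(Path p) {
        try {
            return Files.getLastModifiedTime(p).toMillis();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed default location; pass a different directory as args[0].
        Path dir = Paths.get(args.length > 0
                ? args[0]
                : "/var/lib/riak/generated.configs");
        System.out.println(newest(dir, "app.")
                .map(Path::toString)
                .orElse("no generated app config found in " + dir));
    }
}
```

The file it prints can then be inspected for the setting in question (for example the console log level) and compared with what riak config effective reports.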

Cloudera configuration - Multi NIC

I'm trying to set up a multi-NIC cluster with Cloudera 5. Each node has an Ethernet interface (eth1 - 172.17.2.x) plus an InfiniBand interface (ib0 - 192.168.69.x).
The problem is, the cluster communicates the infiniband addresses to the "outside world" when using HDFS.
I found out that the right parameter to get such a configuration working is "dfs.datanode.dns.interface" and that it has to be set to "eth1".
However, this parameter is not present in the Cloudera Manager interface. Because Cloudera Manager automatically overwrites the hdfs-site.xml file, I can't simply write the parameter into the file.
I tried to use the Cloudera manager "Safety Valves" (Configuration > Service-Wide > Advanced > HDFS Service Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml), and set it to
<property>
<name>dfs.datanode.dns.interface</name>
<value>eth1</value>
</property>
but the HDFS Canary fails.
Could anyone please:
- confirm that it's the right parameter
- give me some help on how to set it in the Cloudera Manager interface?
Thanks in advance.
You can add configuration properties that are not present in the CM interface by filling in these fields on the HDFS configuration page:
- HDFS Service Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml
- Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml
- HDFS Service Advanced Configuration Snippet (Safety Valve) for hadoop-policy.xml
In your case you have to insert this code:
<property>
<name>dfs.datanode.dns.interface</name>
<value>eth1</value>
</property>
in the HDFS Service Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml.

trying to use log4j.xml file within WinRun4j

Has anyone tried to use a log4j.xml reference within a WinRun4j service configuration? Here is a copy of my service.ini file. I have tried many configuration combinations; this is just my latest attempt:
service.class=org.boris.winrun4j.MainService
service.id=SimpleBacnetIpDataTransfer
service.name=Simple Backnet IP DataTransfer Service
service.description=This is the service for the Simple Backnet IP DataTransfer.
service.startup=auto
classpath.1=C:\Inbox\DataTransferClient-1.0-SNAPSHOT-jar-with-dependencies.jar
classpath.2=WinRun4J.jar
classpath.3=C:\Inbox\log4j-1.2.16.jar
arg.1=C:\Inbox\DataTransferClient.xml
log=C:\WinRun4J-Service\SimpleBacnetIpDataTransfer\NBP-DT-service.log
log.overwrite=true
log.roll.size=10MB
[MainService]
class=com.shiftenergy.ws.App
vmarg.1=-Xdebug
vmarg.2=-Xnoagent
vmarg.3=-Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n
vmarg.4=-Dlog4j.configuration=file:C:\Inbox\log4j.xml
Within the log4j.xml file, there is a reference to a log file that is written when the application runs. If I run java -jar -Dlog4j.configuration=file:C:\Inbox\log4j.xml ...., the log file is created accordingly. If I register my service and start it, the log file does not get created.
Has anyone had success using the -D log4j configuration with WinRun4j?
Thanks
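I think the vmarg.4 parameter is specified incorrectly. In your case it should look like this (note that log4j.configurationFile is the property read by Log4j 2; Log4j 1.2, such as the log4j-1.2.16.jar on your classpath, reads log4j.configuration):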
vmarg.4=-Dlog4j.configurationFile=[Path for log4j.xml]
I use the same setting and it works fine in my case. See the example below:
vmarg.1=-Dlog4j.configurationFile=.\log4j2.xml
Have you tried setting the path in your code instead?
System.setProperty("log4j.configurationFile", "config/log4j.xml");
I'm using a relative path to a folder named config that contains log4j.xml. An absolute path is not recommended, but may work as well.
Just be sure to set this before making any calls to log4j, including any log4j config settings or static method calls!
System.setProperty("log4j.configurationFile", "config/log4j.xml");
final Logger log = Logger.getLogger(Main.class);
log.info("Starting up");
I didn't specify the log4j path in the ini file; I only placed the log4j.xml file in the same directory as the jar.
I also did not call
System.setProperty("log4j.configurationFile", "config/log4j.xml");
In the Java project the file was stored in src/main/resources, so it is included in the jar, but that bundled copy is not the one used when a log4j.xml is placed outside the jar.
