Hadoop: the Mapper doesn't read files from multiple input paths (directories)

The Mapper isn't reading files from multiple directories. Could anyone help?
I need to read one whole file per mapper, so I added multiple input paths and implemented a custom WholeFileInputFormat and WholeFileRecordReader. In the map method I don't need the input key; I make sure that each map task reads one whole file.
Command line: hadoop jar AutoProduce.jar Autoproduce /input_a /input_b /output
I specified two input paths: 1. /input_a; 2. /input_b.
Run method snippets:
Job job = new Job(getConf());
job.setInputFormatClass(WholeFileInputFormat.class);
FileInputFormat.setInputPaths(job, new Path(args[0]), new Path(args[1]));
FileOutputFormat.setOutputPath(job, new Path(args[2]));
map method snippets:
public void map(NullWritable key, BytesWritable value, Context context){
FileSplit fileSplit = (FileSplit) context.getInputSplit();
System.out.println("Directory :" + fileSplit.getPath().toString());
......
}
Custom WholeFileInputFormat:
class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {
@Override
protected boolean isSplitable(JobContext context, Path file) {
return false;
}
@Override
public RecordReader<NullWritable, BytesWritable> createRecordReader(
InputSplit split, TaskAttemptContext context) throws IOException,
InterruptedException {
WholeFileRecordReader reader = new WholeFileRecordReader();
reader.initialize(split, context);
return reader;
}
}
Custom WholeFileRecordReader:
class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable> {
private FileSplit fileSplit;
private Configuration conf;
private BytesWritable value = new BytesWritable();
private boolean processed = false;
@Override
public void initialize(InputSplit split, TaskAttemptContext context)
throws IOException, InterruptedException {
this.fileSplit = (FileSplit) split;
this.conf = context.getConfiguration();
}
@Override
public boolean nextKeyValue() throws IOException, InterruptedException {
if (!processed) {
byte[] contents = new byte[(int) fileSplit.getLength()];
Path file = fileSplit.getPath();
FileSystem fs = file.getFileSystem(conf);
FSDataInputStream in = null;
try {
in = fs.open(file);
IOUtils.readFully(in, contents, 0, contents.length);
value.set(contents, 0, contents.length);
} finally {
IOUtils.closeStream(in);
}
processed = true;
return true;
}
return false;
}
@Override
public NullWritable getCurrentKey() throws IOException,InterruptedException {
return NullWritable.get();
}
@Override
public BytesWritable getCurrentValue() throws IOException,InterruptedException {
return value;
}
@Override
public float getProgress() throws IOException {
return processed ? 1.0f : 0.0f;
}
@Override
public void close() throws IOException {
// do nothing
}
}
PROBLEM:
After setting two input paths, all map tasks read files from only one directory.
Thanks in advance.

You'll have to use MultipleInputs instead of FileInputFormat in the driver, so your code should be:
MultipleInputs.addInputPath(job, new Path(args[0]), <Input_Format_Class_1>);
MultipleInputs.addInputPath(job, new Path(args[1]), <Input_Format_Class_2>);
...
MultipleInputs.addInputPath(job, new Path(args[N-1]), <Input_Format_Class_N>);
So if you want to use WholeFileInputFormat for the first input path and TextInputFormat for the second input path, you'll have to use it the following way:
MultipleInputs.addInputPath(job, new Path(args[0]), WholeFileInputFormat.class);
MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class);
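In your case both directories should go through the same whole-file reader, so a minimal sketch of the driver change (assuming the mapreduce-API MultipleInputs, i.e. org.apache.hadoop.mapreduce.lib.input.MultipleInputs) would be:
// Register both paths with the same WholeFileInputFormat. MultipleInputs sets its own
// DelegatingInputFormat on the job, so the job.setInputFormatClass(WholeFileInputFormat.class)
// call from the original driver can be dropped.
MultipleInputs.addInputPath(job, new Path(args[0]), WholeFileInputFormat.class);
MultipleInputs.addInputPath(job, new Path(args[1]), WholeFileInputFormat.class);
FileOutputFormat.setOutputPath(job, new Path(args[2]));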
Hope this works for you!

Related

Unit test for a POST Sling servlet in AEM 6.5

I have the following POST servlet that adds a new node under a certain resource, using parameters (name and last name) from the request:
@Component(
service = Servlet.class,
property = {
"sling.servlet.paths=/bin/createuser",
"sling.servlet.methods=" + HttpConstants.METHOD_POST
})
public class CreateNodeServlet extends SlingAllMethodsServlet {
/**
* Logger
*/
private static final Logger log = LoggerFactory.getLogger(CreateNodeServlet.class);
@Override
protected void doPost(final SlingHttpServletRequest req, final SlingHttpServletResponse resp) throws IOException {
log.info("Inside CreateNodeServlet");
ResourceResolver resourceResolver = req.getResourceResolver();
final Resource resource = resourceResolver.getResource("/content/test/us/en");
String name = req.getParameter("name");
String lastname = req.getParameter("lastname");
log.info("name :{}",name);
log.info("lastname :{}",lastname);
Node node = resource.adaptTo(Node.class);
try {
log.info("Node {}", node.getName() );
Node newNode = node.addNode(name+lastname, "nt:unstructured");
newNode.setProperty("name", name);
newNode.setProperty("lastname", lastname);
resourceResolver.commit();
} catch (RepositoryException e) {
e.printStackTrace();
} catch (PersistenceException e) {
e.printStackTrace();
}
resp.setStatus(200);
resp.getWriter().write("Simple Post Test");
}
}
I tried creating a unit test for this; this is what I have so far:
@ExtendWith(AemContextExtension.class)
class CreateNodeServletTest {
private final AemContext context = new AemContext();
private CreateNodeServlet createNodeServlet = new CreateNodeServlet();
@Test
void doPost() throws IOException, JSONException {
context.currentPage(context.pageManager().getPage("/bin/createuser"));
context.currentResource(context.resourceResolver().getResource("/bin/createuser"));
context.requestPathInfo().setResourcePath("/bin/createuser");
MockSlingHttpServletRequest request = context.request();
MockSlingHttpServletResponse response = context.response();
createNodeServlet.doPost(request, response);
JSONArray output = new JSONArray(context.response().getOutputAsString());
assertEquals("Simple Post Test", output);
}
}
However, this is not working; I am getting a NullPointerException on this line:
Node node = resource.adaptTo(Node.class);
Can someone help with what I am missing? Any tips would be a great help, as I am new to AEM and there aren't many resources about unit testing Sling servlets.
I think you need to register JCR_MOCK as the resource resolver type:
new AemContext(ResourceResolverType.JCR_MOCK);
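A minimal sketch of how the test could then look (the resource path and the expected output come from the servlet above; the parameter values are just placeholders):
@ExtendWith(AemContextExtension.class)
class CreateNodeServletTest {
    // JCR_MOCK backs the mock resource resolver with an in-memory JCR,
    // so resource.adaptTo(Node.class) no longer returns null
    private final AemContext context = new AemContext(ResourceResolverType.JCR_MOCK);
    private final CreateNodeServlet createNodeServlet = new CreateNodeServlet();
    @Test
    void doPost() throws Exception {
        // create the resource the servlet looks up, otherwise getResource(...) returns null
        context.create().resource("/content/test/us/en");
        // placeholder request parameters
        Map<String, Object> params = new HashMap<>();
        params.put("name", "John");
        params.put("lastname", "Doe");
        context.request().setParameterMap(params);
        createNodeServlet.doPost(context.request(), context.response());
        assertEquals("Simple Post Test", context.response().getOutputAsString());
    }
}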

Extent Reports 3.1.5 not appending after each test run

I am using Extent Reports 3.1.5 and my reports keep getting overwritten on each test run. I have implemented, to the best of my ability, the solutions found on Stack Overflow and various other sites, with no progress. I have spent 24 hours troubleshooting this issue and need help.
I even went and examined the following site: http://extentreports.com/docs/versions/3/java/#htmlreporter-features
and checked my code, and still could not find the issue. If there is something I am overlooking, please help me.
Again, my tests run, but the report only contains the last test that ran from my testng.xml file.
This is my Base Test Class:
package test;
public class BaseTest {
//-------Page Objects----------
login_Page objBELogin;
poll_Page objCreatePoll;
survey_Page objCreateSurvey;
task_Page objCreateTaskGroup;
discussion_Page objCreateDiscussion;
userprofile_Page objUserProfile;
event_Page objectEvent;
workgroup_Page objectWorkgroup;
workroom_Page objectWorkroom;
//-----------------------------
static WebDriver driver;
static String homePage = "https://automation-ozzie.boardeffect.com/login";
ExtentReports report;
ExtentTest test;//--parent test
@BeforeClass
public void setUp() throws InterruptedException{
//--------Extent Report--------
report = ExtentFactory.getInstance();
//-----------------------------
System.setProperty("webdriver.chrome.driver","C:\\GRID\\chromedriver.exe");
ChromeOptions option = new ChromeOptions();
option.addArguments("disable-infobars");
driver = new ChromeDriver(option);
driver.manage().window().maximize();
driver.get(homePage);
driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
}
@BeforeMethod
public void register(Method method) {
String testName = method.getName();
test = report.createTest(testName);
}
@AfterMethod
public void captureStatus(ITestResult result) {
if (result.getStatus()==ITestResult.SUCCESS) {
test.log(Status.PASS,"Test Method named as : "+ result.getName()+" is passed");
}else if(result.getStatus()==ITestResult.FAILURE) {
test.log(Status.PASS,"Test Method named as : "+ result.getName()+" is FAILED");
test.log(Status.FAIL,"Test failure : "+ result.getThrowable());
}
else if(result.getStatus()==ITestResult.SKIP) {
test.log(Status.PASS,"Test Method named as : "+ result.getName()+" is skipped");
}
}
@AfterClass
public void tearDown() throws InterruptedException {
report.flush();
driver.quit();
}
}
public class ExtentFactory {
public static ExtentReports getInstance() {
ExtentHtmlReporter html = new ExtentHtmlReporter("surefire-reports//Extent.html");
html.setAppendExisting(true);
ExtentReports extent = new ExtentReports();
extent.attachReporter(html);
return extent;
}
}
Consider giving the report file a unique name for each run:
import java.text.SimpleDateFormat;
import java.util.Date;
public class ExtentFactory {
public static ExtentReports getInstance() {
String out = new SimpleDateFormat("yyyy-MM-dd hh-mm-ss'.html'").format(new Date());
ExtentHtmlReporter html = new ExtentHtmlReporter("surefire-reports//"+out);
ExtentReports extent = new ExtentReports();
extent.attachReporter(html);
return extent;
}
}
You will also need to move the report instance creation to @BeforeSuite, so that it runs only once for the entire test suite:
@BeforeSuite
public void OneTimesetUp() throws InterruptedException{
//--------Extent Report--------
report = ExtentFactory.getInstance();
//-----------------------------
}
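Alternatively, if you want every test class in the run to append to one shared report, a lazily-initialized singleton factory is a common pattern (a sketch, keeping the file name from your original factory):
public class ExtentFactory {
    // one shared ExtentReports instance for the whole suite
    private static ExtentReports extent;
    public static synchronized ExtentReports getInstance() {
        if (extent == null) {
            ExtentHtmlReporter html = new ExtentHtmlReporter("surefire-reports//Extent.html");
            html.setAppendExisting(true); // append to an existing report file instead of overwriting it
            extent = new ExtentReports();
            extent.attachReporter(html);
        }
        return extent;
    }
}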

Explicit Wait for automating windows application using winappdriver

I am a newbie to Windows Application Driver and my project requires automating a desktop application, so I decided to use WinAppDriver as it is similar to Selenium, which I am pretty confident using.
Speaking of the issue:
Just wondering if there is a way to achieve explicit and implicit waits using WinAppDriver. Following is the code I used as part of my test cases. The test fails with a NoSuchElementException; however, if I put a static wait in place instead of the explicit wait, it works as expected.
//Driver Setup
public class OscBase {
public static WindowsDriver<WebElement> applicaitonSession, driver = null;
public static WindowsDriver<RemoteWebElement> desktopSession = null;
public static DesiredCapabilities capabilities, cap1, cap2;
public static ProcessBuilder pBuilder;
public static Process p;
public void startDriver() {
try {
pBuilder = new ProcessBuilder("C:\\Program Files (x86)\\Windows Application Driver\\WinAppDriver.exe");
pBuilder.inheritIO();
p = pBuilder.start();
}
catch (IOException e) {
e.printStackTrace();
}
}
public void stopDriver() {
p.destroy();
}
public void createDesktopSession() throws MalformedURLException {
cap1 = new DesiredCapabilities();
cap1.setCapability("app", "Root");
desktopSession = new WindowsDriver<RemoteWebElement>(new URL("http://localhost:4723"), cap1);
}
public void openApplication() throws InterruptedException, MalformedURLException {
if (driver == null) {
try {
capabilities = new DesiredCapabilities();
capabilities.setCapability("app",
"Appnamewithlocation");
applicaitonSession = new WindowsDriver<WebElement>(new URL("http://localhost:4723"),
capabilities);
} catch (Exception e) {
System.out.println("Application opened!!!");
} finally {
createDesktopSession();
}
Thread.sleep(8000L);
String handle = desktopSession.findElementByAccessibilityId("InstallerView5")
.getAttribute("NativeWindowHandle");
System.out.println(handle);
int inthandle = Integer.parseInt(handle);
String hexHandle = Integer.toHexString(inthandle);
//System.out.println(hexHandle);
cap2 = new DesiredCapabilities();
cap2.setCapability("appTopLevelWindow", hexHandle);
driver = new WindowsDriver<WebElement>(new URL("http://localhost:4723"), cap2);
driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
}
}
public boolean isDisplayed_SafeLoginNoBtn() {
wait = new WebDriverWait(driver, 40);
return wait.until(ExpectedConditions.visibilityOf(safeLoginNoBtn())).isDisplayed();
}
@Test
public void osc_Get_Data() throws InterruptedException, IOException {
//Thread.sleep(20000);
// Boolean value=oscLogin.safeLoginNoBtn().isDisplayed();
try {
Boolean value = oscLogin.isDisplayed_SafeLoginNoBtn();
System.out.println("IS displayed========>"+value);
if (value) {
oscLogin.click_safeLogin();
}
} catch (Exception e) {
System.out.println("Safe Login!!!!");
}
Of course yes, the WebDriverWait class will work. Here's an example:
WebDriverWait waitForMe = new WebDriverWait(session, TimeSpan.FromSeconds(10));
var txtLocation = session.FindElementByName("Enter a location");
waitForMe.Until(pred => txtLocation.Displayed);
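Since the code in the question is Java, the equivalent with the Java client would look roughly like this (WindowsDriver extends RemoteWebDriver, so the standard selenium-support WebDriverWait works with it; the By.name locator and the 40-second timeout are just placeholders):
// wait up to 40 seconds for the element to become visible instead of using a static sleep
WebDriverWait wait = new WebDriverWait(driver, 40);
WebElement safeLoginNoBtn = wait.until(
        ExpectedConditions.visibilityOfElementLocated(By.name("No"))); // placeholder locator
safeLoginNoBtn.click();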
I've created a detailed course about UI Automation using WinAppDriver and C# .Net. I'll be publishing it in a few days. Do let me know if you're interested :)

I am getting null for one of the column values after extraction. What's wrong with the following program?

I am getting null for one of the selected columns with IterableCSVToBean<MessageFileExtractHeader>.
DTO class:
public class MessageFileExtractHeader implements Serializable {
private static final long serialVersionUID = -3052197544136826142L;
private String mesgid;
private String mesg_type;
// getters and setters
Main Class:
public class FileExtraction {
public static void main(String[] args) throws IOException, IllegalAccessException, InvocationTargetException, InstantiationException, IntrospectionException, CsvBadConverterException, CsvDataTypeMismatchException, CsvRequiredFieldEmptyException, CsvConstraintViolationException {
Properties prop = new Properties();
ExtractFieldUtils efUtils= new ExtractFieldUtils();
MessageFileExtractHeader msgFilxtractRecord = null;
try {
InputStream inputStream =
SAADumpFileExtraction.class.getClassLoader().getResourceAsStream("config.properties");
prop.load(inputStream);
} catch (IOException e) {
e.printStackTrace();
}
String fileDirectory= prop.getProperty("file.directory");
//get the filenames
String mesgfilename= fileDirectory+prop.getProperty("mesg.file.name");
//get the headers
String mesgheader= fileDirectory+prop.getProperty("mesg.file.header.fields");
int msgskiplines=1;
CSVReader reader = null;
try {
reader = new CSVReader(new FileReader(mesgfilename));
Map<String, String> msgmapping = efUtils.getMapping(mesgheader);
HeaderColumnNameTranslateMappingStrategy<MessageFileExtractHeader> strategy = new HeaderColumnNameTranslateMappingStrategy<MessageFileExtractHeader>();
strategy.setType(MessageFileExtractHeader.class);
strategy.setColumnMapping(msgmapping);
IterableCSVToBean<MessageFileExtractHeader> msgCTBIterator= new IterableCSVToBean<MessageFileExtractHeader>(reader, strategy, null);
Iterator<MessageFileExtractHeader> mesgIterator= msgCTBIterator.iterator();
while(mesgIterator.hasNext()){
msgFilxtractRecord = mesgIterator.next();
System.out.println(msgFilxtractRecord);
//
}} finally {
reader.close();
}
}
}
Output:
MessageFileExtractHeaders [mesgid=null, mesg_type=081]
Please suggest a good solution to get the mesgid populated.
Please post a short sample of your CSV file (header and one line) and the value of your header property.
My guess is there is a typo in either the CSV header, the headers in the property file, or both, so that it does not match what is in the DTO (mesgid). Because of that it will not be populated.
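For reference, with HeaderColumnNameTranslateMappingStrategy the keys of the column mapping must match the CSV header names and the values must match the bean property names; a hypothetical example (the CSV column names are made up):
// hypothetical CSV header: MESSAGE_ID,MESSAGE_TYPE
Map<String, String> msgmapping = new HashMap<>();
msgmapping.put("MESSAGE_ID", "mesgid");       // CSV column -> bean property
msgmapping.put("MESSAGE_TYPE", "mesg_type");
// if a key here doesn't match the real header text, that bean property stays null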

Can Storm's HdfsBolt flush data after a timeout as well?

We are using Storm to process streaming data and store it into HDFS. We have got everything working, but have one issue. I understand that we can specify the number of tuples after which the data gets flushed to HDFS using a SyncPolicy, something like this:
SyncPolicy syncPolicy = new CountSyncPolicy(Integer.parseInt(args[3]));
The question I have is: can the data also be flushed after a timeout? For example, say we have set the SyncPolicy above to 1000 tuples. If for whatever reason we receive 995 tuples and then the data stops coming in for a while, is there any way Storm can flush the 995 records to HDFS after a specified timeout (e.g. 5 seconds)?
Thanks in advance for any help on this!
Shay
Yes, if you send a tick tuple to the HDFS bolt, it will cause the bolt to try to sync to the HDFS file system. All this happens in the HDFS bolt's execute function.
Tick tuples are configured in your topology config. In Java, to set the frequency to every 300 seconds, the code would look like:
Config topologyConfig = new Config();
topologyConfig.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 300);
StormSubmitter.submitTopology("mytopology", topologyConfig, builder.createTopology());
You'll have to adjust that last line depending on your circumstances.
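If you would rather not enable tick tuples for the whole topology, a bolt can also request them just for itself by overriding getComponentConfiguration() (a sketch; the 300-second frequency is only an example):
@Override
public Map<String, Object> getComponentConfiguration() {
    // ask Storm to send this bolt a tick tuple every 300 seconds
    Map<String, Object> conf = new HashMap<>();
    conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 300);
    return conf;
}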
There is an alternative solution to this problem.
First, let's clarify the sync policy: if your sync policy is 1000, then HdfsBolt only syncs the data after 1000 tuples, by calling the hsync() method in execute(). That only clears the buffer by pushing data to disk, but for faster writes the disk may use its cache instead of writing to the file directly.
The data is written to the file only when the amount of data matches the rotation policy you specify at the time of bolt creation, e.g.:
FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(100.0f, Units.KB);
So, to flush the records to the file after a timeout, separate the tick tuples from the normal tuples in the execute() method and compute the time difference between the two; if the difference is greater than the timeout period, write the data to the file.
By handling the tick tuple separately you also avoid writing the tick tuples themselves to your file.
See the code below for a better understanding:
public class CustomHdfsBolt1 extends AbstractHdfsBolt {
private static final Logger LOG = LoggerFactory.getLogger(CustomHdfsBolt1.class);
private transient FSDataOutputStream out;
private RecordFormat format;
private long offset = 0L;
private int tickTupleCount = 0;
private String type;
private long normalTupleTime;
private long tickTupleTime;
public CustomHdfsBolt1() {
}
public CustomHdfsBolt1(String type) {
this.type = type;
}
public CustomHdfsBolt1 withFsUrl(String fsUrl) {
this.fsUrl = fsUrl;
return this;
}
public CustomHdfsBolt1 withConfigKey(String configKey) {
this.configKey = configKey;
return this;
}
public CustomHdfsBolt1 withFileNameFormat(FileNameFormat fileNameFormat) {
this.fileNameFormat = fileNameFormat;
return this;
}
public CustomHdfsBolt1 withRecordFormat(RecordFormat format) {
this.format = format;
return this;
}
public CustomHdfsBolt1 withSyncPolicy(SyncPolicy syncPolicy) {
this.syncPolicy = syncPolicy;
return this;
}
public CustomHdfsBolt1 withRotationPolicy(FileRotationPolicy rotationPolicy) {
this.rotationPolicy = rotationPolicy;
return this;
}
public CustomHdfsBolt1 addRotationAction(RotationAction action) {
this.rotationActions.add(action);
return this;
}
protected static boolean isTickTuple(Tuple tuple) {
return tuple.getSourceComponent().equals(Constants.SYSTEM_COMPONENT_ID)
&& tuple.getSourceStreamId().equals(Constants.SYSTEM_TICK_STREAM_ID);
}
public void execute(Tuple tuple) {
try {
if (isTickTuple(tuple)) {
tickTupleTime = Calendar.getInstance().getTimeInMillis();
long timeDiff = tickTupleTime - normalTupleTime; // time since the last normal tuple
long diffInSeconds = TimeUnit.MILLISECONDS.toSeconds(timeDiff);
if (diffInSeconds > 5) { // specify the value you want.
this.rotateWithOutFileSize(tuple);
}
} else {
normalTupleTime = Calendar.getInstance().getTimeInMillis();
this.rotateWithFileSize(tuple);
}
} catch (IOException var6) {
LOG.warn("write/sync failed.", var6);
this.collector.fail(tuple);
}
}
public void rotateWithFileSize(Tuple tuple) throws IOException {
syncHdfs(tuple);
this.collector.ack(tuple);
if (this.rotationPolicy.mark(tuple, this.offset)) {
this.rotateOutputFile();
this.offset = 0L;
this.rotationPolicy.reset();
}
}
public void rotateWithOutFileSize(Tuple tuple) throws IOException {
syncHdfs(tuple);
this.collector.ack(tuple);
this.rotateOutputFile();
this.offset = 0L;
this.rotationPolicy.reset();
}
public void syncHdfs(Tuple tuple) throws IOException {
byte[] e = this.format.format(tuple);
synchronized (this.writeLock) {
this.out.write(e);
this.offset += (long) e.length;
if (this.syncPolicy.mark(tuple, this.offset)) {
if (this.out instanceof HdfsDataOutputStream) {
((HdfsDataOutputStream) this.out).hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));
} else {
this.out.hsync();
}
this.syncPolicy.reset();
}
}
}
public void closeOutputFile() throws IOException {
this.out.close();
}
public void doPrepare(Map conf, TopologyContext topologyContext, OutputCollector collector) throws IOException {
LOG.info("Preparing HDFS Bolt...");
this.fs = FileSystem.get(URI.create(this.fsUrl), this.hdfsConfig);
this.tickTupleCount = 0;
this.normalTupleTime = 0;
this.tickTupleTime = 0;
}
public Path createOutputFile() throws IOException {
Path path = new Path(this.fileNameFormat.getPath(),
this.fileNameFormat.getName((long) this.rotation, System.currentTimeMillis()));
this.out = this.fs.create(path);
return path;
}
}
You can directly use this class in your project.
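Wiring it into a topology would then look roughly like this (the fs URL, output path, component names and frequencies are placeholders):
// build the custom bolt the same way as the stock HdfsBolt
SyncPolicy syncPolicy = new CountSyncPolicy(1000);
FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(100.0f, Units.KB);
FileNameFormat fileNameFormat = new DefaultFileNameFormat().withPath("/storm/output/");
CustomHdfsBolt1 hdfsBolt = new CustomHdfsBolt1()
        .withFsUrl("hdfs://namenode:8020")            // placeholder namenode URL
        .withFileNameFormat(fileNameFormat)
        .withRecordFormat(new DelimitedRecordFormat().withFieldDelimiter(","))
        .withSyncPolicy(syncPolicy)
        .withRotationPolicy(rotationPolicy);
Config conf = new Config();
conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 300);  // the bolt needs tick tuples to detect the timeout
TopologyBuilder builder = new TopologyBuilder();
builder.setBolt("hdfs-bolt", hdfsBolt, 1).shuffleGrouping("my-spout"); // "my-spout" is a placeholder
// submit with StormSubmitter as shown in the first answer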
Thanks,
