Exception when adding an element to an ArrayList while iterating - collections

I am trying to add a String object to an ArrayList<String> while iterating over it. Then I get an exception like:
Exception in thread "main" java.util.ConcurrentModificationException
at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)
at java.util.ArrayList$Itr.next(ArrayList.java:831)
at com.alonegk.corejava.collections.list.ArrayListDemo.main(ArrayListDemo.java:19)
The piece of code is:
public static void main(String[] args) {
    ArrayList<String> al = new ArrayList<String>();
    al.add("str1");
    al.add("str2");
    Iterator<String> it = al.iterator();
    while (it.hasNext()) {
        System.out.println(it.next());
        al.add("gkgk");
    }
}
There is no synchronization involved here. What is the cause of this exception?

Refer to this for ConcurrentModificationException. Try using a ListIterator<String> if you want to add a new value while iterating.
public static void main(String[] args) {
    ArrayList<String> al = new ArrayList<String>();
    al.add("str1");
    al.add("str2");
    ListIterator<String> it = al.listIterator();
    while (it.hasNext()) {
        System.out.println(it.next());
        it.add("gkgk");
    }
}

ConcurrentModificationException is thrown to fail fast when a collection is structurally modified while it is being iterated. You can modify the list safely by going through the iterator itself, for example to remove elements:
for (Iterator<Integer> iterator = integers.iterator(); iterator.hasNext();) {
    Integer integer = iterator.next();
    if (integer == 2) {
        iterator.remove();
    }
}
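If you specifically need to add elements while you are walking the list and a ListIterator is not an option, another common workaround (a sketch, not taken from the answers above) is to collect the new elements in a second list and add them after the loop finishes:

import java.util.ArrayList;
import java.util.List;

public class DeferredAddDemo {
    public static void main(String[] args) {
        List<String> al = new ArrayList<String>();
        al.add("str1");
        al.add("str2");

        // Collect additions separately so the list is not modified while the iterator is active.
        List<String> toAdd = new ArrayList<String>();
        for (String s : al) {
            System.out.println(s);
            toAdd.add("gkgk");
        }

        // Apply the additions once iteration has finished.
        al.addAll(toAdd);
    }
}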

Related

List to Map giving compilation error

I have the following code:
List<Object> result = new ArrayList<Object>();
// Object is actually a Map<String, Object>
return Maps.uniqueIndex(result, new Function<Map<String, Object>, Long>() {
    @Override
    public Long apply(Map<String, Object> input) {
        return (Long) input.remove("id");
    }
});
I get a compilation error:
The method uniqueIndex(Iterable<V>, Function<? super V,K>) in the type Maps is not applicable for the arguments (List, new Function<Map<String,Object>,Long>(){}).
How do I rewrite this piece of code such that I don't get into this issue?
The first generic parameter of Function must match the type of the elements held by the List.
So, if you have a List<T>, a Function will be used for doing something with elements from that List, hence it needs to be a Function<T, WHATEVER>.
So, in your case:
List<Object> result = new ArrayList<>();
Maps.uniqueIndex(result, new Function<Object, WHATEVER>() {
    @Nullable
    @Override
    public WHATEVER apply(@Nullable Object s) {
        return null; // do whatever you want here
    }
});
If you want to store Map<String, Object> instances in a List, why not use a List<Map<String, Object>>?
List<Map<String, Object>> result = new ArrayList<>();
Maps.uniqueIndex(result, new Function<Map<String, Object>, WHATEVER>() {
    @Nullable
    @Override
    public WHATEVER apply(@Nullable Map<String, Object> s) {
        return null; // do whatever you want here
    }
});
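Applied to the original snippet, which indexes the rows by the Long stored under "id", the fix might look like this (a sketch; it assumes Guava's com.google.common.base.Function, com.google.common.collect.Maps, and com.google.common.collect.ImmutableMap, and declares the list as List<Map<String, Object>> so the element type matches the Function):

List<Map<String, Object>> result = new ArrayList<>();
// The elements really are Map<String, Object>, so declare them that way.
ImmutableMap<Long, Map<String, Object>> byId =
        Maps.uniqueIndex(result, new Function<Map<String, Object>, Long>() {
            @Override
            public Long apply(Map<String, Object> input) {
                // Use each row's id as the key of the resulting map.
                return (Long) input.remove("id");
            }
        });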

I am getting null for one of the column values after extraction. What's wrong with the following program?

I am getting null for one of the selected columns with IterableCSVToBean<MessageFileExtractHeader>.
DTO Class:
public class MessageFileExtractHeader implements Serializable {

    private static final long serialVersionUID = -3052197544136826142L;

    private String mesgid;
    private String mesg_type;

    // getters and setters
}
Main Class:
public class FileExtraction {

    public static void main(String[] args) throws IOException, IllegalAccessException, InvocationTargetException,
            InstantiationException, IntrospectionException, CsvBadConverterException, CsvDataTypeMismatchException,
            CsvRequiredFieldEmptyException, CsvConstraintViolationException {

        Properties prop = new Properties();
        ExtractFieldUtils efUtils = new ExtractFieldUtils();
        MessageFileExtractHeader msgFilxtractRecord = null;
        try {
            InputStream inputStream =
                    SAADumpFileExtraction.class.getClassLoader().getResourceAsStream("config.properties");
            prop.load(inputStream);
        } catch (IOException e) {
            e.printStackTrace();
        }
        String fileDirectory = prop.getProperty("file.directory");
        // get the filenames
        String mesgfilename = fileDirectory + prop.getProperty("mesg.file.name");
        // get the headers
        String mesgheader = fileDirectory + prop.getProperty("mesg.file.header.fields");
        int msgskiplines = 1;
        CSVReader reader = null;
        try {
            reader = new CSVReader(new FileReader(mesgfilename));
            Map<String, String> msgmapping = efUtils.getMapping(mesgheader);
            HeaderColumnNameTranslateMappingStrategy<MessageFileExtractHeader> strategy =
                    new HeaderColumnNameTranslateMappingStrategy<MessageFileExtractHeader>();
            strategy.setType(MessageFileExtractHeader.class);
            strategy.setColumnMapping(msgmapping);
            IterableCSVToBean<MessageFileExtractHeader> msgCTBIterator =
                    new IterableCSVToBean<MessageFileExtractHeader>(reader, strategy, null);
            Iterator<MessageFileExtractHeader> mesgIterator = msgCTBIterator.iterator();
            while (mesgIterator.hasNext()) {
                msgFilxtractRecord = mesgIterator.next();
                System.out.println(msgFilxtractRecord);
            }
        } finally {
            reader.close();
        }
    }
}
Output:
MessageFileExtractHeaders [mesgid=null, mesg_type=081]
Please suggest a good solution to get the mesgid populated.
Please post a short sample of your CSV file (the header and one line) and the value of your header property.
My guess is there is a typo in either the CSV header, the headers in the property file, or both, so the column name does not match what is in the DTO (mesgid). Because of that it will not be populated.
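To illustrate the point, the column mapping passed to HeaderColumnNameTranslateMappingStrategy maps CSV header names to bean property names; if the header key on the left does not match the actual header in the file, opencsv simply leaves the property null. The header names below are only assumed examples:

// Hypothetical header names; replace them with the exact headers from your CSV file.
Map<String, String> msgmapping = new HashMap<String, String>();
msgmapping.put("MESG_ID", "mesgid");       // CSV column header -> bean property
msgmapping.put("MESG_TYPE", "mesg_type");  // a misspelled header here leaves the field null

HeaderColumnNameTranslateMappingStrategy<MessageFileExtractHeader> strategy =
        new HeaderColumnNameTranslateMappingStrategy<MessageFileExtractHeader>();
strategy.setType(MessageFileExtractHeader.class);
strategy.setColumnMapping(msgmapping);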

Can Storm's HdfsBolt flush data after a timeout as well?

We are using Storm to process streaming data and store it into HDFS. We have everything working but have one issue. I understand that we can specify the number of tuples after which the data gets flushed to HDFS using a SyncPolicy, something like this:
SyncPolicy syncPolicy = new CountSyncPolicy(Integer.parseInt(args[3]));
The question I have is: can the data also be flushed after a timeout? For example, we have set the SyncPolicy above to 1000 tuples. If for whatever reason we get 995 tuples and then the data stops coming in for a while, is there any way that Storm can flush the 995 records to HDFS after a specified timeout (say 5 seconds)?
Thanks in advance for any help on this!
Shay
Yes, if you send a tick tuple to the HDFS bolt, it will cause the bolt to try to sync to the HDFS file system. All this happens in the HDFS bolt's execute function.
To configure tick tuples, set Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS in your topology config. In Java, to set that to every 300 seconds, the code would look like:
Config topologyConfig = new Config();
topologyConfig.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 300);
StormSubmitter.submitTopology("mytopology", topologyConfig, builder.createTopology());
You'll have to adjust that last line depending on your circumstances.
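Note that the topology-level setting sends tick tuples to every bolt in the topology. If you only want the HDFS bolt to receive them, one option (a sketch, not part of the original answer) is to override getComponentConfiguration() in a bolt subclass so the frequency applies to that component only:

// Hypothetical subclass; the 300-second frequency is just an example value.
public class TimedHdfsBolt extends HdfsBolt {

    @Override
    public Map<String, Object> getComponentConfiguration() {
        Map<String, Object> conf = new HashMap<String, Object>();
        // Ask Storm to deliver a tick tuple to this bolt every 300 seconds.
        conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 300);
        return conf;
    }
}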
There is an alternative solution to this problem.
First, let's clarify the sync policy: if your sync policy is 1000, then HdfsBolt only syncs the data after every 1000 tuples, by calling the hsync() method in execute(). That only clears the buffer by pushing data towards the disk, and for faster writes the disk may use its cache rather than writing to the file directly.
The data is written to the file only when the amount of data reaches the rotation policy that you specify at bolt creation time.
FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(100.0f, Units.KB);
So, to flush the records to the file after a timeout, separate tick tuples from normal tuples in the execute() method and compare their timestamps; if the difference exceeds the timeout period, write the data to the file.
By handling the tick tuple separately, you also avoid writing the tick tuples themselves into your file.
See the code below for a better understanding:
public class CustomHdfsBolt1 extends AbstractHdfsBolt {

    private static final Logger LOG = LoggerFactory.getLogger(CustomHdfsBolt1.class);

    private transient FSDataOutputStream out;
    private RecordFormat format;
    private long offset = 0L;
    private int tickTupleCount = 0;
    private String type;
    private long normalTupleTime;
    private long tickTupleTime;

    public CustomHdfsBolt1() {
    }

    public CustomHdfsBolt1(String type) {
        this.type = type;
    }

    public CustomHdfsBolt1 withFsUrl(String fsUrl) {
        this.fsUrl = fsUrl;
        return this;
    }

    public CustomHdfsBolt1 withConfigKey(String configKey) {
        this.configKey = configKey;
        return this;
    }

    public CustomHdfsBolt1 withFileNameFormat(FileNameFormat fileNameFormat) {
        this.fileNameFormat = fileNameFormat;
        return this;
    }

    public CustomHdfsBolt1 withRecordFormat(RecordFormat format) {
        this.format = format;
        return this;
    }

    public CustomHdfsBolt1 withSyncPolicy(SyncPolicy syncPolicy) {
        this.syncPolicy = syncPolicy;
        return this;
    }

    public CustomHdfsBolt1 withRotationPolicy(FileRotationPolicy rotationPolicy) {
        this.rotationPolicy = rotationPolicy;
        return this;
    }

    public CustomHdfsBolt1 addRotationAction(RotationAction action) {
        this.rotationActions.add(action);
        return this;
    }

    protected static boolean isTickTuple(Tuple tuple) {
        return tuple.getSourceComponent().equals(Constants.SYSTEM_COMPONENT_ID)
                && tuple.getSourceStreamId().equals(Constants.SYSTEM_TICK_STREAM_ID);
    }
    public void execute(Tuple tuple) {
        try {
            if (isTickTuple(tuple)) {
                tickTupleTime = Calendar.getInstance().getTimeInMillis();
                // Time elapsed since the last normal tuple was seen.
                long timeDiff = tickTupleTime - normalTupleTime;
                long diffInSeconds = TimeUnit.MILLISECONDS.toSeconds(timeDiff);
                if (diffInSeconds > 5) { // specify the timeout value you want
                    this.rotateWithOutFileSize(tuple);
                }
            } else {
                normalTupleTime = Calendar.getInstance().getTimeInMillis();
                this.rotateWithFileSize(tuple);
            }
        } catch (IOException var6) {
            LOG.warn("write/sync failed.", var6);
            this.collector.fail(tuple);
        }
    }
    public void rotateWithFileSize(Tuple tuple) throws IOException {
        syncHdfs(tuple);
        this.collector.ack(tuple);
        if (this.rotationPolicy.mark(tuple, this.offset)) {
            this.rotateOutputFile();
            this.offset = 0L;
            this.rotationPolicy.reset();
        }
    }

    public void rotateWithOutFileSize(Tuple tuple) throws IOException {
        syncHdfs(tuple);
        this.collector.ack(tuple);
        this.rotateOutputFile();
        this.offset = 0L;
        this.rotationPolicy.reset();
    }

    public void syncHdfs(Tuple tuple) throws IOException {
        byte[] e = this.format.format(tuple);
        synchronized (this.writeLock) {
            this.out.write(e);
            this.offset += (long) e.length;
            if (this.syncPolicy.mark(tuple, this.offset)) {
                if (this.out instanceof HdfsDataOutputStream) {
                    ((HdfsDataOutputStream) this.out).hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));
                } else {
                    this.out.hsync();
                }
                this.syncPolicy.reset();
            }
        }
    }

    public void closeOutputFile() throws IOException {
        this.out.close();
    }

    public void doPrepare(Map conf, TopologyContext topologyContext, OutputCollector collector) throws IOException {
        LOG.info("Preparing HDFS Bolt...");
        this.fs = FileSystem.get(URI.create(this.fsUrl), this.hdfsConfig);
        this.tickTupleCount = 0;
        this.normalTupleTime = 0;
        this.tickTupleTime = 0;
    }

    public Path createOutputFile() throws IOException {
        Path path = new Path(this.fileNameFormat.getPath(),
                this.fileNameFormat.getName((long) this.rotation, System.currentTimeMillis()));
        this.out = this.fs.create(path);
        return path;
    }
}
You can directly use this class in your project.
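Wiring the custom bolt into a topology follows the usual storm-hdfs builder pattern; the values below are only placeholders and would need to match your cluster and output layout:

// Hypothetical configuration values; adjust the URL, path and policies to your setup.
RecordFormat format = new DelimitedRecordFormat().withFieldDelimiter("|");
SyncPolicy syncPolicy = new CountSyncPolicy(1000);
FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(100.0f, Units.KB);
FileNameFormat fileNameFormat = new DefaultFileNameFormat().withPath("/storm/output/");

CustomHdfsBolt1 hdfsBolt = new CustomHdfsBolt1()
        .withFsUrl("hdfs://namenode:8020")
        .withFileNameFormat(fileNameFormat)
        .withRecordFormat(format)
        .withRotationPolicy(rotationPolicy)
        .withSyncPolicy(syncPolicy);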
Thanks,

Can Crawler4j be run from another class

I need to call Crawler4j from a different class. Instead of the main method in the Controller class I used a simple method called setup.
class Controller {
    public void setup(String seed) {
        try {
            String rootFolder = "data/crawler";
            int numberOfCrawlers = 1;

            CrawlConfig config = new CrawlConfig();
            config.setCrawlStorageFolder(rootFolder);
            config.setPolitenessDelay(300);
            config.setMaxDepthOfCrawling(1);

            PageFetcher pageFetcher = new PageFetcher(config);
            RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
            RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
            CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);

            controller.addSeed(seed);
            controller.setCustomData(seed);
            controller.start(MyCrawler.class, numberOfCrawlers);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
I tried to call it like this from another class, but it throws an error.
Controller c = new Controller();
c.setup(seed);
Is it possible to not have a main method in the Controller class and still run crawler4j? In short, I would like to know how to integrate the crawler into my application, which already has its own main method. Help would be appreciated.
There should be no problem running the crawler the way you want. The code below is tested and works as expected:
public class Controller {
    public void setup(String seed) {
        try {
            String rootFolder = "data/crawler";
            int numberOfCrawlers = 4;

            CrawlConfig config = new CrawlConfig();
            config.setCrawlStorageFolder(rootFolder);
            config.setPolitenessDelay(300);
            config.setMaxDepthOfCrawling(2);

            PageFetcher pageFetcher = new PageFetcher(config);
            RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
            RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
            CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);

            controller.addSeed(seed);
            controller.setCustomData(seed);
            controller.start(BasicCrawler.class, numberOfCrawlers);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws Exception {
        Controller crawler = new Controller();
        crawler.setup("http://www.ics.uci.edu/");
    }
}
Sorry, I forgot to place the access modifier "public" before the class name; hence the error. Thank you for your answer.
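For completeness, the BasicCrawler referenced in controller.start(...) is just a WebCrawler subclass. A minimal sketch might look like this (assuming the older crawler4j 3.x API, which matches the setCustomData call used above):

public class BasicCrawler extends WebCrawler {

    @Override
    public boolean shouldVisit(WebURL url) {
        // Only follow links under the seed host; adjust the filter to your needs.
        return url.getURL().toLowerCase().startsWith("http://www.ics.uci.edu/");
    }

    @Override
    public void visit(Page page) {
        // Do something useful with the fetched page, e.g. log its URL.
        System.out.println("Visited: " + page.getWebURL().getURL());
    }
}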

Hadoop: the Mapper didn't read files from multiple input paths

The Mapper didn't manage to read a file from multiple directories. Could anyone help?
I need to read one file in each mapper. I've added multiple input paths and implemented a custom WholeFileInputFormat and WholeFileRecordReader. In the map method I don't need the input key; I make sure that each mapper reads one whole file.
Command line: hadoop jar AutoProduce.jar Autoproduce /input_a /input_b /output
I specified two input paths: 1. /input_a; 2. /input_b.
Run method snippets:
Job job = new Job(getConf());
job.setInputFormatClass(WholeFileInputFormat.class);
FileInputFormat.setInputPaths(job, new Path(args[0]), new Path(args[1]));
FileOutputFormat.setOutputPath(job, new Path(args[2]));
map method snippets:
public void map(NullWritable key, BytesWritable value, Context context) {
    FileSplit fileSplit = (FileSplit) context.getInputSplit();
    System.out.println("Directory :" + fileSplit.getPath().toString());
    ......
}
Custom WholeFileInputFormat:
class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) throws IOException, InterruptedException {
        WholeFileRecordReader reader = new WholeFileRecordReader();
        reader.initialize(split, context);
        return reader;
    }
}
Custom WholeFileRecordReader:
class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable> {

    private FileSplit fileSplit;
    private Configuration conf;
    private BytesWritable value = new BytesWritable();
    private boolean processed = false;

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        this.fileSplit = (FileSplit) split;
        this.conf = context.getConfiguration();
    }

    @Override
    public boolean nextKeyValue() throws IOException, InterruptedException {
        if (!processed) {
            byte[] contents = new byte[(int) fileSplit.getLength()];
            Path file = fileSplit.getPath();
            FileSystem fs = file.getFileSystem(conf);
            FSDataInputStream in = null;
            try {
                in = fs.open(file);
                IOUtils.readFully(in, contents, 0, contents.length);
                value.set(contents, 0, contents.length);
            } finally {
                IOUtils.closeStream(in);
            }
            processed = true;
            return true;
        }
        return false;
    }

    @Override
    public NullWritable getCurrentKey() throws IOException, InterruptedException {
        return NullWritable.get();
    }

    @Override
    public BytesWritable getCurrentValue() throws IOException, InterruptedException {
        return value;
    }

    @Override
    public float getProgress() throws IOException {
        return processed ? 1.0f : 0.0f;
    }

    @Override
    public void close() throws IOException {
        // do nothing
    }
}
PROBLEM:
After setting the two input paths, all map tasks read files from only one directory.
Thanks in advance.
You'll have to use MultipleInputs instead of FileInputFormat in the driver, so your code should be:
MultipleInputs.addInputPath(job, new Path(args[0]), <Input_Format_Class_1>);
MultipleInputs.addInputPath(job, new Path(args[1]), <Input_Format_Class_2>);
.
.
.
MultipleInputs.addInputPath(job, new Path(args[N-1]), <Input_Format_Class_N>);
So if you want to use WholeFileInputFormat for the first input path and TextInputFormat for the second input path, you'll have to use it the following way:
MultipleInputs.addInputPath(job, new Path(args[0]), WholeFileInputFormat.class);
MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class);
Hope this works for you!
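Putting that together with the run-method snippets from the question, the driver setup might look roughly like this (a sketch that keeps the question's argument order and uses WholeFileInputFormat for both input directories):

Job job = new Job(getConf());
job.setJarByClass(Autoproduce.class); // driver class name taken from the command line above

// Register each input path with its own InputFormat instead of FileInputFormat.setInputPaths.
MultipleInputs.addInputPath(job, new Path(args[0]), WholeFileInputFormat.class);
MultipleInputs.addInputPath(job, new Path(args[1]), WholeFileInputFormat.class);

FileOutputFormat.setOutputPath(job, new Path(args[2]));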
