OpenNLP Custom POS Tagger: How to make the Dictionary override input tags

I am using OpenNLP to create my own POS tagger as follows:
public Trainer(String trainingData, String modelSavePath, String dictionary){
try {
dataIn = new MarkableFileInputStreamFactory(
new File(classLoader.getResource(trainingData).getFile()));
lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
ObjectStream<POSSample> sampleStream = new WordTagSampleStream(lineStream);
POSTaggerFactory fac=new POSTaggerFactory();
if(dictionary!=null && dictionary.length()>0)
{
fac.setDictionary(new Dictionary(new FileInputStream(classLoader.getResource(dictionary).getFile())));
}
model = POSTaggerME.train("en", sampleStream, TrainingParameters.defaultParams(), fac);
} catch (IOException e) {
// Failed to read or parse training data, training failed
e.printStackTrace();
} finally {
if (lineStream != null) {
try {
lineStream.close();
} catch (IOException e) {
// Not an issue, training already finished.
// The exception should be logged and investigated
// if part of a production system.
e.printStackTrace();
}
}
}
OutputStream modelOut = null;
try {
modelOut = new BufferedOutputStream(new FileOutputStream(modelSavePath));
//modelOut = new BufferedOutputStream(new FileOutputStream(new File(getClass().getResource(modelSavePath).toURI())));
model.serialize(modelOut);
} catch (IOException e) {
// Failed to save model
e.printStackTrace();
} finally {
if (modelOut != null) {
try {
modelOut.close();
} catch (IOException e) {
// Failed to correctly save model.
// Written model might be invalid.
e.printStackTrace();
}
}
}
}
This works well and saves the newly created model as a .bin file. I want the dictionary entries to override the tags in my training input, but I do not see this behavior.
For example, consider the input:
Mary_NNP had_VBD a_DT little_JJ lamb_NN
I want the tag for lamb to be:
lamb_LAMB
so I put this in the dictionary:
<?xml version="1.0" encoding="UTF-8"?>
<dictionary>
<entry tags="LAMB">
<token>lamb</token>
</entry>
</dictionary>
But when I try the newly trained tagger, I still see the tag NN for lamb.
However, if my training data is:
Mary_NNP had_VBD a_DT little_JJ lamb_LAMB
then it works as expected. Also, if I do not have the word lamb in my training data at all, the custom-generated tagger uses the dictionary tag.
How can I make sure that the dictionary tag always overrides the training data tag? Do I have to modify the training in any way?
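Since the question observes that training on lamb_LAMB gives the desired result, one possible workaround (an assumption, not an OpenNLP feature) is to rewrite the training data with the dictionary tags before training. A minimal plain-Java sketch, assuming the word_tag format shown above and an override map built from your dictionary entries; DictionaryTagOverride and applyOverrides are hypothetical names:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical pre-processing step: rewrites word_tag training lines so that
 * tokens listed in an override map always carry the dictionary tag.
 */
public class DictionaryTagOverride {

    /** Returns the line with every matching token's tag replaced by its override. */
    public static String applyOverrides(String line, Map<String, String> overrides) {
        StringBuilder out = new StringBuilder();
        for (String pair : line.trim().split("\\s+")) {
            if (pair.isEmpty()) {
                continue; // blank line: nothing to rewrite
            }
            // Split on the LAST underscore so tokens that contain '_' stay intact
            int sep = pair.lastIndexOf('_');
            String rewritten;
            if (sep <= 0) {
                rewritten = pair; // no tag found: keep the pair unchanged
            } else {
                String word = pair.substring(0, sep);
                String tag = overrides.getOrDefault(word, pair.substring(sep + 1));
                rewritten = word + "_" + tag;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(rewritten);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumption: this map is built from the same entries as your XML dictionary
        Map<String, String> overrides = new HashMap<>();
        overrides.put("lamb", "LAMB");
        System.out.println(applyOverrides("Mary_NNP had_VBD a_DT little_JJ lamb_NN", overrides));
        // prints: Mary_NNP had_VBD a_DT little_JJ lamb_LAMB
    }
}
```

You would run each training line through applyOverrides and write the result to a new file before building the PlainTextByLineStream. Matching here is exact and case-sensitive, so "Lamb" at the start of a sentence would not be rewritten unless you add it to the map.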

Related

Spring MVC Multipart file upload random FileNotFoundException

I built a web application using Spring MVC. Everything works fine except the file upload, where I get random FileNotFoundExceptions. I found some solutions online, such as using a different tmp folder, but I keep getting the random error.
My code is:
@RequestMapping(value="/upload", method=RequestMethod.POST)
public @ResponseBody String handleFileUpload(@RequestParam("file") final MultipartFile multipartFile,
@RequestHeader("email") final String email, @RequestHeader("password") String password){
if (authenticateUser(email, password)) {
if (!multipartFile.isEmpty()) {
System.out.println("Start processing");
Thread thread = new Thread(){
public void run(){
ProcessCSV obj = new ProcessCSV();
try {
File file = multipartToFile(multipartFile);
if(file !=null) {
obj.extractEvents(file, email, cluster, session);
}
else {
System.out.println("null File");
}
} catch (IOException e) {
System.out.println("File conversion error");
e.printStackTrace();
}
}
};
thread.start();
return "true";
} else {
return "false";
}
}
else {
return "false";
}
}
and:
public File multipartToFile(MultipartFile multipartFile) throws IOException {
File uploadFile = null;
if(multipartFile != null && multipartFile.getSize() > 0) {
uploadFile = new File("/tmp/" + multipartFile.getOriginalFilename());
FileOutputStream fos = null;
try {
uploadFile.createNewFile();
fos = new FileOutputStream(uploadFile);
IOUtils.copy(multipartFile.getInputStream(), fos);
} catch (FileNotFoundException e) {
System.out.println("File conversion error");
e.printStackTrace();
} catch (IOException e) {
System.out.println("File conversion error");
e.printStackTrace();
} finally {
if (fos != null) {
try {
fos.close();
} catch (IOException e) {
System.out.println("File conversion error");
e.printStackTrace();
}
}
}
}
else {
System.out.println("null MultipartFile");
}
return uploadFile;
}
and the configuration file:
multipart.maxFileSize: 100MB
multipart.maxRequestSize: 100MB
multipart.location = ${user.home}
server.port = 8090
I tried different versions of the multipartToFile function, one of them using multipartFile.transferTo(), but I got the same random error. Any advice?
Thank you
EDIT: stack trace:
java.io.IOException: java.io.FileNotFoundException: /Users/aaa/upload_07720775_4b37_4b86_b370_40280388f3a4_00000003.tmp (No such file or directory)
at org.apache.catalina.core.ApplicationPart.write(ApplicationPart.java:121)
at org.springframework.web.multipart.support.StandardMultipartHttpServletRequest$StandardMultipartFile.transferTo(StandardMultipartHttpServletRequest.java:260)
at main.RESTController.multipartToFile(RESTController.java:358)
at main.RESTController$1.run(RESTController.java:241)
Caused by: java.io.FileNotFoundException: /Users/aaa/upload_07720775_4b37_4b86_b370_40280388f3a4_00000003.tmp (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at org.apache.tomcat.util.http.fileupload.disk.DiskFileItem.write(DiskFileItem.java:392)
at org.apache.catalina.core.ApplicationPart.write(ApplicationPart.java:119)
... 3 more
I just had a night of terror with this error. I found out that a MultipartFile is only usable by the @Controller class that receives it. So if you pass it to another bean that is not a controller, Spring will not be able to help you. It somewhat makes sense that the @Controller is tightly bound to the front end (communication from the browser to the system; controllers are the entry point from the browser). So any conversion must happen there, in the controller.
In my case, I did something like the following:
@Controller
public class FileUploadingController{
@PostMapping("/uploadHistoricData")
public String saveUploadedDataFromBrowser(@RequestParam("file") MultipartFile file) {
try {
String pathToFile = "/home/username/destination/";
new File(pathToFile).mkdirs();
File newFile = new File(pathToFile + "uploadedFile.csv");
file.transferTo(newFile); //transfer the uploaded file data to a java.io.File which can be passed between layers
dataService.processUploadedFile(newFile);
} catch (IOException e) {
//handle your exception here please
}
return "redirect:/index?successfulDataUpload";
}
}
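The answer above copies the upload to its own file inside the controller before passing it on. A likely reason this helps (an assumption, but consistent with the stack trace, where the missing file is the container's upload_*.tmp file) is that the temporary file behind a MultipartFile is cleaned up when the request completes, while the question's code reads it from a new thread that can run after the controller has already returned. A minimal plain-Java sketch of the "persist first, process later" idea, with hypothetical names (UploadHandoff, persistUpload, processLater) and a ByteArrayInputStream standing in for multipartFile.getInputStream():

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

public class UploadHandoff {

    // Daemon threads so this demo pool never keeps the JVM alive on its own
    private static final ExecutorService WORKERS = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, "upload-worker");
        t.setDaemon(true);
        return t;
    });

    /** Step 1 (synchronous, inside the request): copy the upload into a file we own. */
    public static Path persistUpload(InputStream upload, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        // A unique name avoids collisions between concurrent uploads of the same file name
        Path target = Files.createTempFile(targetDir, "upload-", ".csv");
        try (InputStream in = upload) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    /** Step 2 (asynchronous): the worker only sees our copy, never the request's upload. */
    public static Future<Long> processLater(Path file) {
        return WORKERS.submit(() -> {
            // Placeholder for the real work (e.g. ProcessCSV.extractEvents): count lines
            try (Stream<String> lines = Files.lines(file)) {
                return lines.count();
            }
        });
    }

    public static void main(String[] args) throws Exception {
        // ByteArrayInputStream stands in for multipartFile.getInputStream()
        InputStream fakeUpload = new ByteArrayInputStream("a,b\nc,d\n".getBytes(StandardCharsets.UTF_8));
        Path saved = persistUpload(fakeUpload, Files.createTempDirectory("uploads"));
        // At this point the controller could return; the background job uses 'saved'
        System.out.println(processLater(saved).get()); // prints 2
    }
}
```

In a controller you would call persistUpload(multipartFile.getInputStream(), someAbsoluteDir) before returning, then pass the returned Path to processLater. In a Spring application a managed executor (for example a TaskExecutor bean) would likely replace the static pool used here.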
I had the same problem. It looks like MultipartFile internally resolves paths against a different current directory, so relative (non-absolute) paths do not work.
I had to convert my path to an absolute path, and then it worked.
It works inside a @RestController and in other beans too.
Path path = Paths.get(filename).toAbsolutePath();
fileToImport.transferTo(path.toFile());
Here, fileToImport is the MultipartFile.
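The behaviour described above can be illustrated with plain java.nio. Under the servlet Part.write() contract, a relative target is created relative to the configured multipart location (in the question, multipart.location = ${user.home}), while in Tomcat's implementation, as far as I can tell, an absolute path is used as given. The sketch below models that resolution with Path.resolve; it approximates what the container does rather than reproducing its code, and UploadTargetPaths/containerTarget are hypothetical names:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class UploadTargetPaths {

    /**
     * Models (as an assumption) how a servlet container places a Part.write() target:
     * relative names are resolved under the multipart location, absolute ones are kept.
     */
    public static Path containerTarget(Path multipartLocation, String target) {
        // Path.resolve returns its argument unchanged when the argument is absolute
        return multipartLocation.resolve(target);
    }

    public static void main(String[] args) {
        Path location = Paths.get(System.getProperty("java.io.tmpdir"), "multipart");
        String relative = "import.csv";
        String absolute = Paths.get(relative).toAbsolutePath().toString();

        // Relative: ends up under the multipart location, not the working directory
        System.out.println(containerTarget(location, relative));
        // Absolute (the fix from the answer): used exactly as given
        System.out.println(containerTarget(location, absolute));
    }
}
```

This is why calling toAbsolutePath() before transferTo makes the destination independent of where the container keeps its multipart files.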

Riak Yokozuna: schema upload, index creation and search query always result in errors 60, 56, 27

public class RiakSearch {
public static final String RIAK_SERVER = "10.11.172.17";
private static RiakCluster setUpCluster() throws UnknownHostException {
// This example uses a single node at the address in RIAK_SERVER
RiakNode node = new RiakNode.Builder()
.withRemoteAddress(RIAK_SERVER)
.withAuth("administrator", "password#123", null).build();
// This cluster object takes our one node as an argument
RiakCluster cluster = new RiakCluster.Builder(node).build();
// The cluster must be started to work, otherwise you will see errors
cluster.start();
return cluster;
}
public void uploadSchema() {
try {
RiakCluster cluster = setUpCluster();
RiakClient client = new RiakClient(cluster);
System.out.println("Client object successfully created");
File xml = new File("blog_post_schema.xml");
String xmlString = FileUtils.readFileToString(xml);
YokozunaSchema schema = new YokozunaSchema("blog_post_schema",
xmlString);
StoreSchema storeSchemaOp = new StoreSchema.Builder(schema).build();
client.execute(storeSchemaOp);
} catch (IOException | ExecutionException e) {
// UnknownHostException is a subclass of IOException, so it is caught here too
e.printStackTrace();
} catch (InterruptedException e) {
// Restore the interrupt flag before reporting
Thread.currentThread().interrupt();
e.printStackTrace();
}
}
public static void main(String[] args) {
RiakSearch obj = new RiakSearch();
obj.uploadSchema();
}
}
java.util.concurrent.ExecutionException: com.basho.riak.client.core.netty.RiakResponseException: Unknown message code: 56
at com.basho.riak.client.core.FutureOperation.get(FutureOperation.java:260)
at com.basho.riak.client.api.commands.CoreFutureAdapter.get(CoreFutureAdapter.java:52)
at com.basho.riak.client.api.RiakCommand.execute(RiakCommand.java:89)
at com.basho.riak.client.api.RiakClient.execute(RiakClient.java:293)
at com.search.RiakSearch.main(RiakSearch.java:64)
Caused by: com.basho.riak.client.core.netty.RiakResponseException: Unknown message code: 56
at com.basho.riak.client.core.netty.RiakResponseHandler.channelRead(RiakResponseHandler.java:52)
at io.netty.channel.ChannelHandlerInvokerUtil.invokeChannelReadNow(ChannelHandlerInvokerUtil.java:84)
at io.netty.channel.DefaultChannelHandlerInvoker.invokeChannelRead(DefaultChannelHandlerInvoker.java:153)
at io.netty.channel.PausableChannelEventExecutor.invokeChannelRead(PausableChannelEventExecutor.java:86)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:389)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:243)
at io.netty.handler.codec.ByteToMessageCodec.channelRead(ByteToMessageCodec.java:103)
at io.netty.channel.ChannelHandlerInvokerUtil.invokeChannelReadNow(ChannelHandlerInvokerUtil.java:84)
at io.netty.channel.DefaultChannelHandlerInvoker.invokeChannelRead(DefaultChannelHandlerInvoker.java:153)
at io.netty.channel.PausableChannelEventExecutor.invokeChannelRead(PausableChannelEventExecutor.java:86)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:389)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:956)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:127)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:514)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:471)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:385)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:351)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at io.netty.util.internal.chmv8.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1412)
at io.netty.util.internal.chmv8.ForkJoinTask.doExec(ForkJoinTask.java:280)
at io.netty.util.internal.chmv8.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:877)
at io.netty.util.internal.chmv8.ForkJoinPool.scan(ForkJoinPool.java:1706)
at io.netty.util.internal.chmv8.ForkJoinPool.runWorker(ForkJoinPool.java:1661)
at io.netty.util.internal.chmv8.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:126)
Make sure that Solr is actually running. Search is disabled by default in Riak 2.x; to enable it, set the search property in /etc/riak/riak.conf to on, then restart Riak.
I had a similar issue:
RiakError: 'Unknown message code: 56'
I solved it by changing the search parameter in the riak.conf file.
If you are using a Mac and installed Riak via Homebrew, the file is located at:
/usr/local/Cellar/riak/2.2.2/libexec/etc/riak.conf
Here are the lines I changed (search from off to on):
## To enable Search set this 'on'.
##
## Default: off
##
## Acceptable values:
## - on or off
search = on
I found the documentation's explanation a little tricky to follow, but it is essentially the reference for solving this issue.

Error Handling in Web Form

I'm trying to handle an error from my data access layer, which signals failure by returning -1. See below:
protected void FolderBtn_Click(object sender, EventArgs e)
{
if (Page.IsValid)
{
try
{
DocsDALC dalc = new DocsDALC();
// Updating two tables here - Folders and FolderAccess tables
// - as an ADO.NET transaction
int folderID = dalc.CreateFolder(...);
if (folderID > 0)
{
Response.Redirect(Request.Url.ToString(), false);
// Re-construct this to include newly-created folderID
}
else
{
// How do I throw error from here?
}
}
catch (Exception ex)
{
HandleErrors(ex);
}
}
}
If the data layer returns -1, how can I throw an error from within the try block?
Throwing is as simple as shown below. However, since you are already catching errors and you know at this point that there is a problem, it would be better to call an overload of the HandleErrors method that takes a string describing the problem, rather than throw an exception (which is costly for what it achieves here).
If you still want to throw the exception:
if (folderID > 0)
{
Response.Redirect(Request.Url.ToString(), false);
// Re-construct this to include newly-created folderID
}
else
{
throw new Exception("Database returned -1 from CreateFolder method");
}
A possible alternative:
if (folderID > 0)
{
Response.Redirect(Request.Url.ToString(), false);
// Re-construct this to include newly-created folderID
}
else
{
HandleErrors("Database returned -1 from CreateFolder method");
}
This of course requires an overloaded HandleErrors method that accepts a string.

Vaadin 7: File Upload

I have an Upload component through which I need to import an XML file in order to parse it.
I'm trying to use the File.createTempFile method to create the file physically, but something weird is happening.
For example, if I take the file named "test.xml" and use the createTempFile method to create it on disk, the name of the generated file becomes something like 'test.xml13234xml'. How can I create the file correctly?
This is expected when using createTempFile: it inserts a random number between the prefix and the suffix you pass in, as this excerpt from its implementation shows:
// a part of createTempFile method
private static final SecureRandom random = new SecureRandom();
static File generateFile(String prefix, String suffix, File dir) {
long n = random.nextLong();
if (n == Long.MIN_VALUE) {
n = 0; // corner case
} else {
n = Math.abs(n);
}
return new File(dir, prefix + Long.toString(n) + suffix);
}
With the prefix "test.xml" and the suffix "xml", this gives a name like 'test.xml13234xml'.
If you want to create a file with the correct name and keep it for later use, you can rename/move it in the uploadSucceeded method:
public class ExampleUpload implements Upload.Receiver, Upload.SucceededListener {
private Upload xmlUpload;
private File tempFile;
public ExampleUpload() {
this.xmlUpload = new Upload("Upload:", this);
this.xmlUpload.addSucceededListener(this);
}
@Override
public OutputStream receiveUpload(String filename, String mimeType) {
try {
tempFile = File.createTempFile(filename, "xml");
tempFile.deleteOnExit();
return new FileOutputStream(tempFile);
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
@Override
public void uploadSucceeded(SucceededEvent event) {
try {
File destinationFile = new File("c:\\" + event.getFilename());
FileUtils.moveFile(tempFile, destinationFile);
// TODO read and parse destinationFile
} catch (IOException e) {
e.printStackTrace();
}
}
}
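The odd name comes from the two arguments: the random number goes between the prefix ("test.xml") and the suffix, and a suffix of "xml" has no leading dot. A minimal plain-Java sketch that splits the original name into a readable prefix and a ".xml" suffix, then moves the finished file to its original name; createTempFor and moveToFinalName are hypothetical helpers, and java.nio's Files.move is used here in place of Commons IO's FileUtils.moveFile:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TempUploadFile {

    /** Creates a temp file like "test-123456789.xml" for an upload named "test.xml". */
    public static File createTempFor(String originalName) throws IOException {
        int dot = originalName.lastIndexOf('.');
        String base = dot > 0 ? originalName.substring(0, dot) : originalName;
        // The suffix must include the dot, otherwise you get names ending in "...xml"
        String suffix = dot > 0 ? originalName.substring(dot) : ".tmp";
        // File.createTempFile rejects prefixes shorter than 3 characters
        String prefix = base.length() >= 3 ? base + "-" : base + "---";
        File temp = File.createTempFile(prefix, suffix);
        temp.deleteOnExit(); // cleaned up at JVM exit if it is never moved
        return temp;
    }

    /** Moves the completed temp file to targetDir under the original file name. */
    public static Path moveToFinalName(File temp, Path targetDir, String originalName) throws IOException {
        Files.createDirectories(targetDir);
        // Keep only the last name element, in case the client sent a path
        Path target = targetDir.resolve(new File(originalName).getName());
        return Files.move(temp.toPath(), target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        File temp = createTempFor("test.xml");
        System.out.println("temp file:  " + temp.getName()); // test-<random digits>.xml
        Path finalPath = moveToFinalName(temp, Files.createTempDirectory("uploads"), "test.xml");
        System.out.println("final file: " + finalPath.getFileName()); // test.xml
    }
}
```

In the Vaadin receiver above, receiveUpload would call createTempFor(filename) and uploadSucceeded would call moveToFinalName(tempFile, targetDir, event.getFilename()).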

TrueZip and MultiPart form

I am currently using TrueZip to add a file to a ZIP file that was uploaded to a server as a MultipartFile.
The Problem
After I append a file, the ZIP becomes invalid: it can no longer be opened as a ZIP file.
The Code
Let's start with the relevant code in my upload controller (file is the MultipartFile):
// Get the file
File dest = null;
TFile zip = null;
try {
// Obtain the file locally, zip, and delete the old
dest = new File(request.getRealPath("") + "/datasource/uploads/" + fixedFileName);
file.transferTo(dest);
// Validate
zip = new TFile(dest);
resp = mls.validateMapLayer(zip);
// Now perform the upload and delete the temp file
FoundryUserDetails userDetails = (FoundryUserDetails) SecurityContextHolder.getContext().getAuthentication()
.getPrincipal();
UserIdentity ui = userDetails.getUserIdentity();
MapLayer newLayer = new MapLayer();
// generate the prj
mls.generateProjection(resp, dest.getAbsolutePath(), projection);
The method "generateProjection" is where the file is added:
public void generateProjection(UploadMapResponse resp, String fLoc, FoundryCRS proj) throws NoSuchAuthorityCodeException,
FactoryException, IOException {
TFile projFile = new TFile(fLoc, resp.getLayerName() + ".prj");
CoordinateReferenceSystem crs = CRS.decode(proj.getEpsg());
String wkt = crs.toWKT();
TConfig config = TConfig.push();
try {
config.setOutputPreferences(config.getOutputPreferences().set(FsOutputOption.GROW));
TFileOutputStream writer = new TFileOutputStream(projFile);
try {
writer.write(wkt.getBytes());
} finally {
writer.close();
}
} finally {
config.close();
}
}
To test whether this worked at all, I tried it in a simple main method:
public static void main(String[] args) {
File f = new File("C:/Data/SierritaDec2011TopoContours.zip");
TFile tf = new TFile(f);
tf.listFiles();
TFile proj = new TFile(f, "test.prj");
TConfig config = TConfig.push();
try {
config.setOutputPreferences(config.getOutputPreferences().set(FsOutputOption.GROW));
TFileOutputStream writer = null;
try {
writer = new TFileOutputStream(proj);
} catch (FileNotFoundException e1) {
e1.printStackTrace();
}
try {
writer.write("Hello Zip world".getBytes());
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} finally {
try {
writer.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
} finally {
// Pop the current configuration off the inheritable thread local
// stack.
config.close();
}
}
This, of course, works just fine.
The Question
Does anyone have insight into why, in a web server with a MultipartFile copied to a local file, the TFileOutputStream fails to write properly?
In a long-running server app, you may need to call TVFS.sync() or TVFS.umount() to sync or unmount archive files. For ZIP files, this triggers writing the Central Directory at the end of the ZIP file, which is required to form a valid ZIP file.
Please check the Javadoc to decide which call is best for your use case: http://truezip.java.net/apidocs/de/schlichtherle/truezip/file/TVFS.html
Also note that calling TVFS.sync() or TVFS.umount() after each append operation writes a growing Central Directory each time, which results in huge overhead. So it is worth considering when exactly you need to do this. Generally speaking, this is only required when you want a third party to access the ZIP file. A third party is anyone not using the TrueZIP Kernel to access the ZIP file.
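The same rule (an archive is only valid once its central directory has been written) can be seen with the JDK alone. The sketch below swaps in the JDK's built-in ZIP file system provider instead of TrueZIP: entries written through it are committed, together with a new central directory, only when the file system is closed, which plays a role similar to TVFS.umount(). ZipAppendDemo and its methods are hypothetical names used for illustration:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ZipAppendDemo {

    /** Creates a ZIP with one entry; closing the stream writes the central directory. */
    public static void createZip(Path zip, String entryName, String content) throws IOException {
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
            out.putNextEntry(new ZipEntry(entryName));
            out.write(content.getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
    }

    /** Adds (or replaces) an entry in an existing ZIP via the JDK's zip file system. */
    public static void addEntry(Path zip, String entryName, String content) throws IOException {
        // The cast picks the (Path, ClassLoader) overload unambiguously on newer JDKs
        try (FileSystem zipFs = FileSystems.newFileSystem(zip, (ClassLoader) null)) {
            Files.write(zipFs.getPath("/", entryName), content.getBytes(StandardCharsets.UTF_8));
        } // close() commits the change, rewriting the archive and its central directory
    }

    public static void main(String[] args) throws IOException {
        Path zip = Files.createTempFile("layer", ".zip");
        createZip(zip, "layer.shp", "shape data");
        addEntry(zip, "layer.prj", "Hello Zip world");
        try (ZipFile check = new ZipFile(zip.toFile())) {
            System.out.println(check.size()); // prints 2: still a valid archive
        }
    }
}
```

As with the TrueZIP advice above, closing after every single append rewrites the archive each time, so it is cheaper to batch several entries into one open/close cycle.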
