How to generate text from Seq2SeqLMOutput of Transformers? - decode

I am working on a text generation task using T5. For some reason, I can't use T5ForConditionalGeneraton class which includes functions like model.generate(). Now the output of model is Seq2SeqLMOutput, same as T5ForConditionalGeneraton.
This blog shows a simple example of greedy encoding, but model.generate() uses more complicated method beam search. Are there simple ways to decode with beam search?

Related

Why use macros in Julia?

I was reading up on the documentation of macros and ran into the following under the `Hold up: why macros' section. The reasoning given to use macros is as follows:
Macros are necessary because they execute when code is parsed,
therefore, macros allow the programmer to generate and include
fragments of customized code before the full program is run
This leads me to wonder why someone would want to use "generate and include fragments of customized code before the full program is run". Can someone provide context as to why this would be beneficial and/or other good use cases for macros?
Let me give you my view on macros.
A macro basically is a code -> code function. It takes code (a Julia expression) as input and spits out code (a different Julia expression).
Why is this useful? It has multiple purposes:
compile time copy-and-paste: You don't have to write the same piece of code multiple times but instead can define a short macro that writes it for you wherever you put it. (example)
domain specific language (DSL): You can create special syntax that after the macros code -> code transform is replaced by pure Julia constructs. This is used in many packages to define special syntax, for example here and here.
code generation: Imagine you want to write a really long piece of code which, although being long, is very simple because it has some kind of pattern that repeats itself rather trivially. Writing that code by hand can be a pain (or even practically impossible). A macro can programmatically generate the code for you. One example is for-loop unrolling (see here and here). But even the #time macro isn't doing much more than just putting a bunch of Base.time_ns() function calls around the provided Julia expression.
special string parsing: If you type the literal 3.2 in Julia it will be parsed and interpreted as a Float64. Now, imagine you want to supply a number literally that goes beyond Float64 precision but would fit into a BigFloat. Typing big(3.123124812498124812498) won't work, because the literal number is first interpreted as a Float64 and then handed to the big function. Instead you need a way to tell Julia at parse time that this should become a BigFloat. This is handled by a #big_str 3.2 macro which (for convenience) can also be written as big"3.2". The latter is just syntactic sugar.
There might be many more applications of macros, but those are the most important to me.
Let me end by referencing Steven G. Johnson's great talk at JuliaCon 2019:
Most of the time, don't do metaprogramming :)

How to write epsilon in Xmgrace?

In Xmgrace, the var-epsilon (the inverted-3 form) can be written as \xe\f{}.
I need to write the epsilon (the lunate form) in xmgrace.
You can use the "\xÎ" combination to get something that is similar to (but not exactly) a lunate epsilon, but otherwise I am not aware of any other methods that can be used to directly input an \epsilon into xmgrace. Unfortunately, xmgrace is an old piece of software and does not have unicode support.
If you have no alternatives, I think that best course of action would be to generate an eps file and use inkscape or a similar software to directly edit the "right" epsilon in.

In chainer, How to write BPTT updater using multiple GPUs?

I don't find example because existing example only extends training.StandardUpdater, thus only use One GPU.
I assume that you are talking about the BPTTUpdater of the ptb example of Chainer.
It's not straight forward to make the customized updater support learning on multiple GPUs. The MultiprocessParallelUpdater hard code the way to compute the gradient (only the target link implementation is customizable), so you have to copy the overall implementation of MultiprocessParallelUpdater and modify the gradient computation parts. What you have to copy and edit is chainer/training/updaters/multiprocess_parallel_updater.py.
There are two parts in this file that compute gradient; one in _Worker.run, which represents a worker process task, and the other in MultiprocessParallelUpdater.update_core, which represents the master process task. You have to make these code do BPTT by modifying the code starting from _calc_loss to backward in each of these two parts:
# Change self._master into self.model for _Worker.run code
loss = _calc_loss(self._master, batch)
self._master.cleargrads()
loss.backward()
It should be modified by inserting the code of BPTTUpdater.update_core.
You also have to take care on the data iterators. MultiprocessParallelUpdater accept the set of iterators that will be distributed to master/worker processes. Since the ptb example uses a customized iterator (ParallelSequentialIterator), you have to make sure that these iterators iterate over different portions of the dataset or using different initial offsets of word positions. It may require customization to ParalellSequentialIterator as well.

Cheat sheet for caffe / pycaffe?

Does anyone know whether there is a cheat sheet for all important pycaffe commands?
I was so far using caffe only via Matlab interface and terminal + bash scripts.
I wanted to shift towards using ipython and work through the ipython notebook examples. However I find it hard to get an overview of all the functions that are inside the caffe module for python. (I'm also quite new to python).
The pycaffe tests and this file are the main gateway to the python coding interface.
First of all, you would like to choose whether to use Caffe with CPU or GPU. It is sufficient to call caffe.set_mode_cpu() or caffe.set_mode_gpu(), respectively.
Net
The main class that the pycaffe interface exposes is the Net. It has two constructors:
net = caffe.Net('/path/prototxt/descriptor/file', caffe.TRAIN)
which simply create a Net (in this case using the Data Layer specified for training), or
net = caffe.Net('/path/prototxt/descriptor/file', '/path/caffemodel/weights/file', caffe.TEST)
which creates a Net and automatically loads the weights as saved in the provided caffemodel file - in this case using the Data Layer specified for testing.
A Net object has several attributes and methods. They can be found here. I will cite just the ones I use more often.
You can access the network blobs by means of Net.blobs. E.g.
data = net.blobs['data'].data
net.blobs['data'].data[...] = my_image
fc7_activations = net.blobs['fc7'].data
You can access the parameters (weights) too, in a similar way. E.g.
nice_edge_detectors = net.params['conv1'].data
higher_level_filter = net.params['fc7'].data
Ok, now it's time to actually feed the net with some data. So, you will use backward() and forward() methods. So, if you want to classify a single image
net.blobs['data'].data[...] = my_image
net.forward() # equivalent to net.forward_all()
softmax_probabilities = net.blobs['prob'].data
The backward() method is equivalent, if one is interested in computing gradients.
You can save the net weights to subsequently reuse them. It's just a matter of
net.save('/path/to/new/caffemodel/file')
Solver
The other core component exposed by pycaffe is the Solver. There are several types of solver, but I'm going to use only SGDSolver for the sake of clarity. It is needed in order to train a caffe model.
You can instantiate the solver with
solver = caffe.SGDSolver('/path/to/solver/prototxt/file')
The Solver will encapsulate the network you are training and, if present, the network used for testing. Note that they are usually the same network, only with a different Data Layer. The networks are accessible with
training_net = solver.net
test_net = solver.test_nets[0] # more than one test net is supported
Then, you can perform a solver iteration, that is, a forward/backward pass with weight update, typing just
solver.step(1)
or run the solver until the last iteration, with
solver.solve()
Other features
Note that pycaffe allows you to do more stuff, such as specifying the network architecture through a Python class or creating a new Layer type.
These features are less often used, but they are pretty easy to understand by reading the test cases.
Please note that the answer by Flavio Ferrara has a litte problem which may cause you waste a lot of time:
net.blobs['data'].data[...] = my_image
net.forward()
The code above is noneffective if your first layer is a Data type layer, because when net.forward() is called, it will begin from the first layer, and then your inserted data my_image will be covered. So it will show no error but give you totally irrelevant output. The correct way is to assign the start and end layer, for example:
net.forward(start='conv1', end='fc')
Here is a Github repository of Face Verification Experiment on LFW Dataset, using pycaffe and some matlab code. I guess it could help a lot, especially the caffe_ftr.py file.
https://github.com/AlfredXiangWu/face_verification_experiment
Besides, here are some short example code of using pycaffe for image classification:
http://codrspace.com/Jaleyhd/caffe-python-tutorial/
http://prog3.com/sbdm/blog/u011762313/article/details/48342495

Information Extraction -> Relationships

"The movie was amazing. The background music was eccentric and the lighting was perfect"
Movie:Amazing
Background Music:Eccentric
Lighting:Perfect
How viable is this in C#?
I'm using the stanford NLP library but I don't have a clue on how I'd do this.
You can use http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordCoreNLP.html which is a .Net port of CoreNLP, though I'm not sure how up-to-date it is.
You are basically doing POS tagging: finding the adjective associated with the noun.
You can try your text here: http://nlp.stanford.edu:8080/corenlp/process
Get your POS model from CoreNLP, find the nouns (NN) that you're interested in, and get their corresponding adjectives (JJ)

Resources