Using the CamemBERT pre-trained model with DeepPavlov

I'm learning how to use DeepPavlov and can't figure out how to use it for NER with the CamemBERT (French) pre-trained model. My goal is to tag a short paragraph in French.
The DeepPavlov docs explicitly list the CamemBERT model as a viable transformer architecture. I tried to follow them as best I could, but I keep getting this error when I try to build the model.
>>> ner_model = build_model('ner_ontonotes_bert_mult')
/home/philippe/.local/lib/python3.10/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
Some weights of the model checkpoint at camembert-base were not used when initializing CamembertForTokenClassification: ['lm_head.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.bias', 'lm_head.layer_norm.weight', 'lm_head.decoder.weight', 'lm_head.dense.bias']
- This IS expected if you are initializing CamembertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CamembertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of CamembertForTokenClassification were not initialized from the model checkpoint at camembert-base and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
2023-01-29 09:58:13.308 ERROR in 'deeppavlov.core.common.params'['params'] at line 108: Exception in <class 'deeppavlov.models.torch_bert.torch_transformers_sequence_tagger.TorchTransformersSequenceTagger'>
Traceback (most recent call last):
File "/home/philippe/.local/lib/python3.10/site-packages/deeppavlov/core/common/params.py", line 102, in from_params
component = obj(**dict(config_params, **kwargs))
File "/home/philippe/.local/lib/python3.10/site-packages/deeppavlov/models/torch_bert/torch_transformers_sequence_tagger.py", line 182, in __init__
super().__init__(optimizer=optimizer,
File "/home/philippe/.local/lib/python3.10/site-packages/deeppavlov/core/models/torch_model.py", line 98, in __init__
self.load()
File "/home/philippe/.local/lib/python3.10/site-packages/deeppavlov/models/torch_bert/torch_transformers_sequence_tagger.py", line 295, in load
self.crf = CRF(self.n_classes).to(self.device)
File "/home/philippe/.local/lib/python3.10/site-packages/deeppavlov/models/torch_bert/crf.py", line 13, in __init__
super().__init__(num_tags=num_tags, batch_first=batch_first)
File "/home/philippe/.local/lib/python3.10/site-packages/torchcrf/__init__.py", line 40, in __init__
raise ValueError(f'invalid number of tags: {num_tags}')
ValueError: invalid number of tags: 0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/philippe/.local/lib/python3.10/site-packages/deeppavlov/core/commands/infer.py", line 55, in build_model
component = from_params(component_config, mode=mode)
File "/home/philippe/.local/lib/python3.10/site-packages/deeppavlov/core/common/params.py", line 102, in from_params
component = obj(**dict(config_params, **kwargs))
File "/home/philippe/.local/lib/python3.10/site-packages/deeppavlov/models/torch_bert/torch_transformers_sequence_tagger.py", line 182, in __init__
super().__init__(optimizer=optimizer,
File "/home/philippe/.local/lib/python3.10/site-packages/deeppavlov/core/models/torch_model.py", line 98, in __init__
self.load()
File "/home/philippe/.local/lib/python3.10/site-packages/deeppavlov/models/torch_bert/torch_transformers_sequence_tagger.py", line 295, in load
self.crf = CRF(self.n_classes).to(self.device)
File "/home/philippe/.local/lib/python3.10/site-packages/deeppavlov/models/torch_bert/crf.py", line 13, in __init__
super().__init__(num_tags=num_tags, batch_first=batch_first)
File "/home/philippe/.local/lib/python3.10/site-packages/torchcrf/__init__.py", line 40, in __init__
raise ValueError(f'invalid number of tags: {num_tags}')
ValueError: invalid number of tags: 0
I downloaded the camembert-base model from https://huggingface.co/camembert-base and copied the files into the .deeppavlov/models/camembert-base directory.
Then I figured DeepPavlov's 'ner_ontonotes_bert_mult' model was the best fit for my use case, so I edited its config file and changed these lines in the metadata section at the end. The DeepPavlov docs say to change the TRANSFORMER value, which I did, and I changed MODEL_PATH so it points to the files I downloaded previously.
"variables": {
"ROOT_PATH": "~/.deeppavlov",
"DOWNLOADS_PATH": "{ROOT_PATH}/downloads",
"MODELS_PATH": "{ROOT_PATH}/models",
"TRANSFORMER": "camembert-base",
"MODEL_PATH": "{MODELS_PATH}/camembert-base"
},
I am aware that I should have copied the config file to a new one with a different name, but that should not be the problem.
Then in Python I did the following:
from deeppavlov import configs, build_model
build_model('ner_ontonotes_bert_mult')
And then I get the error mentioned above. I am lost and don't know where to look next.
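The ValueError means the tagger is being constructed with n_classes == 0, which usually indicates that the tag vocabulary the pipeline expects under MODEL_PATH was not found; the camembert-base download contains only the transformer weights, not a NER tag dictionary. As a first diagnostic, here is a minimal sketch that prints where each pipeline component actually tries to load from (parse_config is DeepPavlov's config loader; the exact component fields vary by config):

from deeppavlov.core.commands.utils import parse_config

config = parse_config('ner_ontonotes_bert_mult')
# Print each component's load path; the tag vocabulary component's
# load_path should point at an existing dictionary file.
for component in config['chainer']['pipe']:
    print(component.get('class_name'), component.get('load_path'))

If the tag vocabulary is missing, build_model('ner_ontonotes_bert_mult', download=True) fetches the published model files, though downloading may overwrite files under a locally edited MODEL_PATH.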

Related

Failing to train in Google Colab: custom StyleGAN2-ADA training

I've been trying to get the following Colab notebook to work, without success.
I was able to scrape the images I needed and (hopefully) resized them all to 256x256.
After connecting my Drive and following all the steps, I reached the training part.
There, I am using the default values with resume_from = "ffhq256".
The process starts but ends after ~20 seconds with the following error:
Constructing networks...
Setting up TensorFlow plugin "fused_bias_act.cu": Compiling... Loading... Done.
Setting up TensorFlow plugin "upfirdn_2d.cu": Compiling... Loading... Done.
Resuming from "https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/transfer-learning-source-nets/ffhq-res256-mirror-paper256-noaug.pkl"
Downloading https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/transfer-learning-source-nets/ffhq-res256-mirror-paper256-noaug.pkl ... done
Traceback (most recent call last):
File "train.py", line 645, in <module>
main()
File "train.py", line 637, in main
run_training(**vars(args))
File "train.py", line 522, in run_training
training_loop.training_loop(**training_options)
File "/content/drive/MyDrive/colab-sg2-ada/stylegan2-ada/training/training_loop.py", line 129, in training_loop
G.copy_vars_from(rG)
File "/content/drive/MyDrive/colab-sg2-ada/stylegan2-ada/dnnlib/tflib/network.py", line 512, in copy_vars_from
self._components[name].copy_vars_from(src_comp)
File "/content/drive/MyDrive/colab-sg2-ada/stylegan2-ada/dnnlib/tflib/network.py", line 509, in copy_vars_from
self.copy_own_vars_from(src_net)
File "/content/drive/MyDrive/colab-sg2-ada/stylegan2-ada/dnnlib/tflib/network.py", line 482, in copy_own_vars_from
tfutil.set_vars({self._get_vars()[name]: value for name, value in value_dict.items() if name in self._get_vars()})
File "/content/drive/MyDrive/colab-sg2-ada/stylegan2-ada/dnnlib/tflib/tfutil.py", line 227, in set_vars
run(ops, feed_dict)
File "/content/drive/MyDrive/colab-sg2-ada/stylegan2-ada/dnnlib/tflib/tfutil.py", line 33, in run
return tf.get_default_session().run(*args, **kwargs)
File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/client/session.py", line 1156, in _run
(np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (3, 3, 512, 256) for Tensor 'G_synthesis/64x64/Conv0_up/weight/new_value:0', which has shape '(3, 3, 512, 512)'
Any help would be greatly appreciated!
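Two things seem worth ruling out. First, the mismatch of (3, 3, 512, 256) against (3, 3, 512, 512) means the freshly constructed generator has different channel widths than the checkpoint, which can happen if the notebook's training config differs from the paper256 configuration the ffhq256 checkpoint was trained with. Second, it is worth verifying that every scraped image really is 256x256 before the dataset is built; a minimal sketch (the directory name is a placeholder, and every file in it is assumed to be an image):

from pathlib import Path
from PIL import Image

# Placeholder path: point this at the folder of scraped images.
for path in sorted(Path('scraped_images').glob('*')):
    with Image.open(path) as im:
        if im.size != (256, 256):
            print(f'{path} is {im.size}, not 256x256')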

Error when exporting from BigQuery to MySQL

I am trying to export a table from BigQuery to Google Cloud MySQL database.
I found this operator called BigQueryToMySqlOperator (documented here https://airflow.apache.org/docs/apache-airflow-providers-google/stable/_api/airflow/providers/google/cloud/transfers/bigquery_to_mysql/index.html?highlight=bigquerytomysqloperator#module-airflow.providers.google.cloud.transfers.bigquery_to_mysql)
When I deploy the DAG containing this task to Cloud Composer, the task always fails with the error
Traceback (most recent call last):
File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1113, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1287, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1317, in _execute_task
result = task_copy.execute(context=context)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/transfers/bigquery_to_mysql.py", line 166, in execute
for rows in self._bq_get_data():
File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/transfers/bigquery_to_mysql.py", line 138, in _bq_get_data
response = cursor.get_tabledata(
File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/bigquery.py", line 2508, in get_tabledata
return self.hook.get_tabledata(*args, **kwargs)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/bigquery.py", line 1284, in get_tabledata
rows = self.list_rows(dataset_id, table_id, max_results, selected_fields, page_token, start_index)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/common/hooks/base_google.py", line 412, in inner_wrapper
raise AirflowException(
airflow.exceptions.AirflowException: You must use keyword arguments in this methods rather than positional
I don't really understand why it is throwing this error. Can anyone help me figure out what went wrong, or how I should export data from BigQuery to a MySQL DB? Many thanks for your help!
EDIT: My operator code basically looks like this:
transfer_data = BigQueryToMySqlOperator(
    task_id='task_id',
    dataset_table='origin_bq_table',
    mysql_table='dest_table_name',
    replace=True,
)
Based on the stacktrace, you are most likely using apache-airflow-providers-google==2.2.0.
airflow.exceptions.AirflowException: You must use keyword arguments in this methods rather than positional
This error originates from the GoogleBaseHook and can be traced back to the BigQueryToMySqlOperator:
BigQueryToMySqlOperator > BigQueryHook > BigQueryConnection > BigQueryCursor > get_tabledata
You are getting the AirflowException because get_tabledata is called as part of the execute method.
Unfortunately, the test for the operator is not comprehensive, since it only checks whether the method was called with the correct parameters.
I think this will require a new release of the google provider in which BigQueryToMySqlOperator calls list_rows with keyword arguments instead of going through get_tabledata, which calls list_rows positionally.
I have also opened a GitHub issue in the Airflow repository.
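Until a fixed provider release is out, one workaround is to do the transfer directly with the hooks, avoiding the broken get_tabledata code path entirely. A minimal sketch (connection ids and table names are placeholders, and the whole result set is loaded into memory, so this only suits small tables):

from airflow.decorators import task
from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook
from airflow.providers.mysql.hooks.mysql import MySqlHook

@task
def bq_to_mysql():
    # Read the BigQuery table into a DataFrame, using keyword arguments only.
    bq = BigQueryHook(gcp_conn_id='google_cloud_default', use_legacy_sql=False)
    df = bq.get_pandas_df(sql='SELECT * FROM `project.dataset.origin_bq_table`')
    # Write the rows to MySQL.
    mysql = MySqlHook(mysql_conn_id='mysql_default')
    mysql.insert_rows(
        table='dest_table_name',
        rows=df.itertuples(index=False, name=None),
        target_fields=list(df.columns),
    )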

Specializing configuration with files instead of variables in Hydra

I'd like to use specialized configuration as described in the Hydra documentation under Common Patterns -> Specializing Configuration. The difference is that my specialized configuration is in a file, not just one variable. In the example below I want to choose the transform based on the model and the dataset. The configs for the different transforms are in files. This would work if I specified the entire transform configuration in the dataset_model/cifar10_alexnet.yaml file, but that would defeat the purpose because I couldn't reuse the transform config. Elsewhere in Hydra, specifying the name of a file makes it automatically pick up the config in that file, but that does not seem to work in the specialized configuration.
I've modified the example in the documentation as follows:
config.yaml:
defaults:
  - dataset: cifar10
  - model: alexnet
  - transform: crop
  - dataset_model: ${defaults.0.dataset}_${defaults.1.model}
    optional: true
I added a directory called transform with two files inside it:
crop.yaml:
# @package _group_
type: crop
test1: 7
resize.yaml:
# @package _group_
type: resize
test1: 50
and changed the file dataset_model/cifar10_alexnet.yaml:
# @package _global_
model:
  num_layers: 5
transform: resize
Everything else is exactly as per the documentation. When I run this I get an exception:
Traceback (most recent call last):
File "/home/natalia/.pyenv/versions/3.7.9/lib/python3.7/site-packages/hydra/_internal/config_loader_impl.py", line 720, in _merge_config
ret = OmegaConf.merge(cfg, loaded_cfg)
File "/home/natalia/.pyenv/versions/3.7.9/lib/python3.7/site-packages/omegaconf/omegaconf.py", line 321, in merge
target.merge_with(*others[1:])
File "/home/natalia/.pyenv/versions/3.7.9/lib/python3.7/site-packages/omegaconf/basecontainer.py", line 331, in merge_with
self._format_and_raise(key=None, value=None, cause=e)
File "/home/natalia/.pyenv/versions/3.7.9/lib/python3.7/site-packages/omegaconf/base.py", line 101, in _format_and_raise
type_override=type_override,
File "/home/natalia/.pyenv/versions/3.7.9/lib/python3.7/site-packages/omegaconf/_utils.py", line 629, in format_and_raise
_raise(ex, cause)
File "/home/natalia/.pyenv/versions/3.7.9/lib/python3.7/site-packages/omegaconf/_utils.py", line 610, in _raise
raise ex # set end OC_CAUSE=1 for full backtrace
File "/home/natalia/.pyenv/versions/3.7.9/lib/python3.7/site-packages/omegaconf/basecontainer.py", line 329, in merge_with
self._merge_with(*others)
File "/home/natalia/.pyenv/versions/3.7.9/lib/python3.7/site-packages/omegaconf/basecontainer.py", line 347, in _merge_with
BaseContainer._map_merge(self, other)
File "/home/natalia/.pyenv/versions/3.7.9/lib/python3.7/site-packages/omegaconf/basecontainer.py", line 296, in _map_merge
dest.__setitem__(key, src_value)
File "/home/natalia/.pyenv/versions/3.7.9/lib/python3.7/site-packages/omegaconf/dictconfig.py", line 262, in __setitem__
self._format_and_raise(key=key, value=value, cause=e)
File "/home/natalia/.pyenv/versions/3.7.9/lib/python3.7/site-packages/omegaconf/base.py", line 101, in _format_and_raise
type_override=type_override,
File "/home/natalia/.pyenv/versions/3.7.9/lib/python3.7/site-packages/omegaconf/_utils.py", line 694, in format_and_raise
_raise(ex, cause)
File "/home/natalia/.pyenv/versions/3.7.9/lib/python3.7/site-packages/omegaconf/_utils.py", line 610, in _raise
raise ex # set end OC_CAUSE=1 for full backtrace
omegaconf.errors.ValidationError:
full_key: transform
reference_type=Optional[Dict[Union[str, Enum], Any]]
object_type=dict
So, the question is: is this functionality supported, and if so, what am I doing wrong?
Your config is trying to merge the string "resize" into a dictionary like:
transform:
  type: crop
  test1: 7
This is not something you can do.
You are not explaining very well what you are trying to do, but my guess is that you want to compose a different transform based on the selected dataset.
Hydra 1.1 will add support for a recursive defaults list, which will probably allow you to do what you want.
This is the doc for the new defaults list. You can install this version as a pre-release (see the primary project README).
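For illustration, with the Hydra 1.1 defaults list the specialization file could itself select the transform group instead of assigning a string; a sketch against the pre-release syntax, so details may change:

# dataset_model/cifar10_alexnet.yaml
# @package _global_
defaults:
  - override /transform: resize

model:
  num_layers: 5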

How do I enable through-the-filesystem diazo editing with Plone 4.3

Summary: through-the-filesystem editing is not working for my diazo theme. Plone breaks.
Details:
I've created my first live Plone site with 4.3.2 and diazo. You can see the live version at borogreen.org. I would like to keep editing the theme going forward.
My Ubuntu 12.04 LTS test server has only plone432 + diazo + dexterity (not used) + Static resource storage 1.0.2 enabled. For test purposes, I'm using the available sunrain theme.
I've placed the sunrain theme manually inside the /resources folder, as suggested in
http://developer.plone.org/reference_manuals/external/plone.app.theming/userguide.html#deploying-and-testing-themes
Trying to enable that theme in the Site Setup | Theming panel | Advanced, I set the path to the theme rules to
/++theme++sunrain/rules.xml
and the absolute path prefix to
/++theme++sunrain/
Plone does not recognize it: no theme gets enabled. In debug mode it spits out the following errors:
2014-03-29 00:10:07 ERROR plone.subrequest Error handling subrequest to /++theme++sunrain/rules.xml
Traceback (most recent call last):
File "/home/plone/Plone/buildout-cache/eggs/plone.subrequest-1.6.7-py2.7.egg/plone/subrequest/__init__.py", line 116, in subrequest
traversed = request.traverse(path)
File "/home/plone/Plone/buildout-cache/eggs/Zope2-2.13.21-py2.7.egg/ZPublisher/BaseRequest.py", line 502, in traverse
subobject = self.traverseName(object, entry_name)
File "/home/plone/Plone/buildout-cache/eggs/Zope2-2.13.21-py2.7.egg/ZPublisher/BaseRequest.py", line 326, in traverseName
ob2 = namespaceLookup(ns, nm, ob, self)
File "/home/plone/Plone/buildout-cache/eggs/zope.traversing-3.13.2-py2.7.egg/zope/traversing/namespace.py", line 112, in namespaceLookup
return traverser.traverse(name, ())
File "/home/plone/Plone/buildout-cache/eggs/plone.resource-1.0.2-py2.7.egg/plone/resource/traversal.py", line 27, in traverse
raise NotFound
NotFound
2014-03-29 00:10:07 ERROR plone.transformchain Unexpected error whilst trying to apply transform chain
Traceback (most recent call last):
File "/home/plone/Plone/buildout-cache/eggs/plone.transformchain-1.0.3-py2.7.egg/plone/transformchain/transformer.py", line 48, in __call__
newResult = handler.transformIterable(result, encoding)
File "/home/plone/Plone/buildout-cache/eggs/plone.app.theming-1.1.1-py2.7.egg/plone/app/theming/transform.py", line 170, in transformIterable
transform = self.setupTransform(runtrace=runtrace)
File "/home/plone/Plone/buildout-cache/eggs/plone.app.theming-1.1.1-py2.7.egg/plone/app/theming/transform.py", line 108, in setupTransform
transform = compileThemeTransform(rules, absolutePrefix, readNetwork, parameterExpressions, runtrace=runtrace)
File "/home/plone/Plone/buildout-cache/eggs/plone.app.theming-1.1.1-py2.7.egg/plone/app/theming/utils.py", line 580, in compileThemeTransform
runtrace=runtrace,
File "/home/plone/Plone/buildout-cache/eggs/diazo-1.0.4-py2.7.egg/diazo/compiler.py", line 115, in compile_theme
read_network=read_network,
File "/home/plone/Plone/buildout-cache/eggs/diazo-1.0.4-py2.7.egg/diazo/rules.py", line 195, in process_rules
rules_doc = etree.parse(rules, parser=rules_parser)
File "lxml.etree.pyx", line 2957, in lxml.etree.parse (src/lxml/lxml.etree.c:56299)
File "parser.pxi", line 1526, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:82331)
File "parser.pxi", line 1555, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:82624)
File "parser.pxi", line 1455, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:81663)
File "parser.pxi", line 1002, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:78623)
File "parser.pxi", line 569, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:74567)
File "parser.pxi", line 650, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:75458)
File "parser.pxi", line 588, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:74760)
IOError: Error reading file '/++theme++sunrain/rules.xml': failed to load external entity "/++theme++sunrain/rules.xml"
What's wrong here?
PS: Of course I can upload the theme as a zip file and enable it that way, which works fine. I would really like to edit through the filesystem, as I can foresee a lot of development in the future.
An up-to-date, working write-up of how to edit diazo themes through the filesystem using the /resources directory on Plone 4.3.2 would be the answer, but I have not found one outside of the plone.app.theming user guide. Help!
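One thing to double-check, as a guess based on the plone.app.theming user guide (the NotFound above is raised by plone.resource during traversal): filesystem themes are looked up under a theme subfolder of the resources directory, so the expected layout is roughly:

<buildout-root>/resources/theme/sunrain/rules.xml
<buildout-root>/resources/theme/sunrain/index.html

If rules.xml sits at resources/sunrain/rules.xml instead, the ++theme++sunrain namespace cannot resolve it, which would match the NotFound in the subrequest log.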

How to debug "TypeError: Can't pickle objects in acquisition wrappers." in Plone

I have a handler that adds a Member to a Group. The last line in this handler causes an error:
TypeError: Can't pickle objects in acquisition wrappers.
> /home/mnieber/.buildout/eggs/ZODB3-3.10.3-py2.6-linux-i686.egg/ZODB/serialize.py(431)_dump()
430 self._p.dump(classmeta)
--> 431 self._p.dump(state)
432 self._file.truncate()
In the pdb debugger I can see that indeed Plone is trying to pickle a value that is an Acquisition wrapper:
ipdb> state
((((<PloneUser 'newuser@usecm.com'>, ('Default_Group',), 'maarten@usecm.com', ('PAS',)),),),)
ipdb> type(state[0][0][0][0])
<type 'Acquisition.ImplicitAcquisitionWrapper'>
However, I cannot see which object is being pickled, and therefore I have no idea which part of my code needs fixing. My question is: how should I go about debugging this error? I have tried looking at all the stack frames, but none of them reveal which object is being serialized.
The handler is this one (run_insecure is a decorator that I use to temporarily install a new security manager that avoids a NotAuthorized error when adding the new member):
@adapter(IPrincipalCreatedEvent)
@run_insecure
def userCreatedHandler(event):
    portal_groups = getToolByName(getSite(), "portal_groups")
    membersGroup = portal_groups.getGroupById('Default_Group')
    membersGroup.addMember(event.principal)
The full error is this one:
Traceback (innermost last):
Module ZPublisher.Publish, line 134, in publish
Module Zope2.App.startup, line 301, in commit
Module transaction._manager, line 89, in commit
Module transaction._transaction, line 329, in commit
Module transaction._transaction, line 443, in _commitResources
Module ZODB.Connection, line 567, in commit
Module ZODB.Connection, line 623, in _commit
Module ZODB.Connection, line 658, in _store_objects
Module ZODB.serialize, line 422, in serialize
Module ZODB.serialize, line 431, in _dump
TypeError: Can't pickle objects in acquisition wrappers.
> /home/mnieber/.buildout/eggs/ZODB3-3.10.3-py2.6-linux-i686.egg/ZODB/serialize.py(431)_dump()
430 self._p.dump(classmeta)
--> 431 self._p.dump(state)
432 self._file.truncate()
I got this kind of problem with pickle too, and solved it by debugging like you did.
Pickle (used to store objects in the ZODB) is trying to serialize your PloneUser, and raising this acquisition wrapper error.
In my case, I was wrapping the portal_workflow object in another class, and had to inherit from pickle.Pickler and override the __getstate__ method to solve my problem.
This method is called by pickle in order to serialize your object. If you override it and return your object's __dict__ without the PloneUser, this error will not be raised.
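A minimal sketch of that idea; the wrapper class and its _user attribute are invented for illustration:

class WorkflowWrapper(object):
    def __init__(self, workflow, user):
        self.workflow = workflow
        self._user = user  # acquisition-wrapped, so not safe to pickle

    def __getstate__(self):
        # Called by pickle during serialization: return a copy of the
        # instance dict with the unpicklable wrapper left out.
        state = self.__dict__.copy()
        state.pop('_user', None)
        return state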
This question (although not about your exact problem) has more info about what I'm trying to say.
