Cannot download tensorflow model of cahya/bert-base-indonesian-522M - bert-language-model

I want to download this model and save it for later use with bert-serving. Since bert-serving only supports TensorFlow models, I need to download the TensorFlow version, not the PyTorch one. The PyTorch model downloads just fine, but I cannot download the TensorFlow model. I used this code to download it:
from transformers import BertTokenizer, TFBertModel
model_name='cahya/bert-base-indonesian-522M'
model = TFBertModel.from_pretrained(model_name)
Here's what I got when running the code on Ubuntu 16.04, Python 3.5, transformers==2.5.1:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/username/.local/lib/python3.5/site-packages/transformers/modeling_tf_utils.py", line 346, in from_pretrained
assert os.path.isfile(resolved_archive_file), "Error retrieving file {}".format(resolved_archive_file)
File "/usr/lib/python3.5/genericpath.py", line 30, in isfile
st = os.stat(path)
TypeError: stat: can't specify None for path argument
And here's what I got when running it on Windows 10, python 3.6.5, transformers 3.1.0:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\transformers\modeling_tf_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
579 if resolved_archive_file is None:
--> 580 raise EnvironmentError
581 except EnvironmentError:
OSError:
During handling of the above exception, another exception occurred:
OSError Traceback (most recent call last)
<ipython-input-3-c2f14f761f05> in <module>()
3 model_name='cahya/bert-base-indonesian-522M'
4 tokenizer = BertTokenizer.from_pretrained(model_name)
----> 5 model = TFBertModel.from_pretrained(model_name)
C:\ProgramData\Anaconda3\lib\site-packages\transformers\modeling_tf_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
585 f"- or '{pretrained_model_name_or_path}' is the correct path to a directory containing a file named one of {TF2_WEIGHTS_NAME}, {WEIGHTS_NAME}.\n\n"
586 )
--> 587 raise EnvironmentError(msg)
588 if resolved_archive_file == archive_file:
589 logger.info("loading weights file {}".format(archive_file))
OSError: Can't load weights for 'cahya/bert-base-indonesian-522M'. Make sure that:
- 'cahya/bert-base-indonesian-522M' is a correct model identifier listed on 'https://huggingface.co/models'
- or 'cahya/bert-base-indonesian-522M' is the correct path to a directory containing a file named one of tf_model.h5, pytorch_model.bin.
This also happens with other cahya/ models. This page says that you can use the TensorFlow model. However, based on the error, it seems that the file does not exist there.
I tried downloading other pretrained models such as bert-base-uncased, and they download just fine. This issue only happens with cahya/ models.
Am I missing something? Or should I report this on the forum or as a GitHub issue?

This seems to be purely an issue with your environment.
Running the first code sample worked fine for me under Ubuntu 18.04 with transformers 3.1.0 and tensorflow 2.3.0. (I think Ubuntu 16.04 should work as well; I cannot guarantee Windows 10.)
The first environment seems to fail purely because of outdated versions: Python (the general recommendation is currently 3.6 or later, regardless of transformers) and transformers itself, where the latest release is needed for full compatibility with models from the ModelHub.
For the second environment, I cannot fully confirm this, but I suspect the cause is path handling under Windows 10, since transformers needs to interpret the argument as either an OS path or a ModelHub id.
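Separately from the environment question, the Windows error text names both tf_model.h5 and pytorch_model.bin, and the question reports that the PyTorch weights download fine. If the repository turns out to host only PyTorch weights (an assumption; nothing in this thread confirms it), one option might be the from_pt=True argument of from_pretrained, which converts a PyTorch checkpoint on load and needs PyTorch installed alongside TensorFlow. The helper name load_tf_bert and its model_cls parameter are hypothetical, introduced only for this sketch:

```python
def load_tf_bert(model_name, model_cls=None):
    """Try native TensorFlow weights first, then fall back to PyTorch weights."""
    if model_cls is None:
        # Imported lazily so the helper itself does not require transformers
        # until it is actually used with the real model class.
        from transformers import TFBertModel
        model_cls = TFBertModel
    try:
        return model_cls.from_pretrained(model_name)
    except (OSError, TypeError):
        # OSError is what transformers 3.1.0 raised in the question,
        # TypeError is what 2.5.1 raised.
        # from_pt=True asks transformers to convert the PyTorch checkpoint.
        return model_cls.from_pretrained(model_name, from_pt=True)


# Example call (downloads the model; needs tensorflow and torch installed):
# model = load_tf_bert("cahya/bert-base-indonesian-522M")
```

Catching both exception types is a design choice to cover the two tracebacks shown in the question; a stricter version would catch only the one raised by the installed release.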

GCP Composer v1.18.6 and 2.0.10 incompatible with CloudSqlProxyRunner

In my Composer Airflow DAGs, I have been using the CloudSqlProxyRunner to connect to my Cloud SQL instance.
However, after updating Google Cloud Composer from v1.18.4 to 1.18.6, my DAG started to encounter a strange error:
[2022-04-22, 23:20:18 UTC] {cloud_sql.py:462} INFO - Downloading cloud_sql_proxy from https://dl.google.com/cloudsql/cloud_sql_proxy.linux.x86_64 to /home/airflow/dXhOYoU_cloud_sql_proxy.tmp
[2022-04-22, 23:20:18 UTC] {taskinstance.py:1702} ERROR - Task failed with exception
Traceback (most recent call last):
File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1330, in _run_raw_task
self._execute_task_with_callbacks(context)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1457, in _execute_task_with_callbacks
result = self._execute_task(context, self.task)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1513, in _execute_task
result = execute_callable(context=context)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/decorators/base.py", line 134, in execute
return_value = super().execute(context)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/operators/python.py", line 174, in execute
return_value = self.execute_callable()
File "/opt/python3.8/lib/python3.8/site-packages/airflow/operators/python.py", line 185, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File "/home/airflow/gcs/dags/real_time_scoring_pipeline.py", line 99, in get_messages_db
with SQLConnection() as sql_conn:
File "/home/airflow/gcs/dags/helpers/helpers.py", line 71, in __enter__
self.proxy_runner.start_proxy()
File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/cloud_sql.py", line 524, in start_proxy
self._download_sql_proxy_if_needed()
File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/cloud_sql.py", line 474, in _download_sql_proxy_if_needed
raise AirflowException(
airflow.exceptions.AirflowException: The cloud-sql-proxy could not be downloaded. Status code = 404. Reason = Not Found
Checking manually, https://dl.google.com/cloudsql/cloud_sql_proxy.linux.x86_64 indeed returns a 404.
Looking at the function that raises the exception, _download_sql_proxy_if_needed, it has this code:
system = platform.system().lower()
processor = os.uname().machine
if not self.sql_proxy_version:
    download_url = CLOUD_SQL_PROXY_DOWNLOAD_URL.format(system, processor)
else:
    download_url = CLOUD_SQL_PROXY_VERSION_DOWNLOAD_URL.format(
        self.sql_proxy_version, system, processor
    )
So, for whatever reason, in both of these latest images of Composer, processor = os.uname().machine returns x86_64. Previously, it returned amd64, and https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64 is in fact a valid link to the binary we need.
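To make the mismatch concrete, here is a minimal sketch of how such a URL is assembled and how a small alias table could map the kernel's machine name to the name used in the working download link. The template string is inferred from the two URLs quoted in this question, and ARCH_ALIASES and build_proxy_url are hypothetical names, not part of the Airflow provider:

```python
import platform

# Template inferred from the URLs quoted above (an assumption, not copied
# from the Airflow source).
URL_TEMPLATE = "https://dl.google.com/cloudsql/cloud_sql_proxy.{}.{}"

# Kernel machine names mapped to the architecture names used in the links.
# Only the mapping confirmed in this question is listed.
ARCH_ALIASES = {"x86_64": "amd64"}


def build_proxy_url(system=None, machine=None):
    """Build the download URL, translating the machine name if needed."""
    system = (system or platform.system()).lower()
    machine = machine or platform.machine()
    return URL_TEMPLATE.format(system, ARCH_ALIASES.get(machine, machine))


print(build_proxy_url("Linux", "x86_64"))
```

With the arguments shown, this produces the amd64 link that the question reports as valid, rather than the x86_64 link that returns a 404.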
I replicated this error in Composer 2.0.10 as well.
I am still investigating possible workarounds, but posting this here in case someone else encounters this issue, and has figured out a workaround, and to raise this with Google engineers (who, according to Composer's docs, monitor this tag).
My current workaround is patching the CloudSqlProxyRunner to hardcode the correct URL:
import os
import shutil
from inspect import signature

import httpx
from airflow.exceptions import AirflowException
from airflow.providers.google.cloud.hooks.cloud_sql import CloudSqlProxyRunner


class PatchedCloudSqlProxyRunner(CloudSqlProxyRunner):
    """
    This is a patched version of CloudSqlProxyRunner to provide a workaround for an incorrectly
    generated URL to the Cloud SQL proxy binary.
    """

    def _download_sql_proxy_if_needed(self) -> None:
        download_url = "https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64"
        # the rest of the code is taken from the original method
        proxy_path_tmp = self.sql_proxy_path + ".tmp"
        self.log.info(
            "Downloading cloud_sql_proxy from %s to %s", download_url, proxy_path_tmp
        )
        # httpx has a breaking API change (follow_redirects vs allow_redirects)
        # and this should work with both versions (cf. issue #20088)
        if "follow_redirects" in signature(httpx.get).parameters.keys():
            response = httpx.get(download_url, follow_redirects=True)
        else:
            response = httpx.get(download_url, allow_redirects=True)  # type: ignore[call-arg]
        # Downloading to .tmp file first to avoid case where partially downloaded
        # binary is used by parallel operator which uses the same fixed binary path
        with open(proxy_path_tmp, "wb") as file:
            file.write(response.content)
        if response.status_code != 200:
            raise AirflowException(
                "The cloud-sql-proxy could not be downloaded. "
                f"Status code = {response.status_code}. Reason = {response.reason_phrase}"
            )
        self.log.info(
            "Moving sql_proxy binary from %s to %s", proxy_path_tmp, self.sql_proxy_path
        )
        shutil.move(proxy_path_tmp, self.sql_proxy_path)
        os.chmod(self.sql_proxy_path, 0o744)  # Set executable bit
        self.sql_proxy_was_downloaded = True
And then instantiate it and use it as I would the original CloudSqlProxyRunner:
proxy_runner = PatchedCloudSqlProxyRunner(path_prefix, instance_spec)
proxy_runner.start_proxy()
But I am hoping that someone at Google fixes this properly soon, either by fixing the os.uname().machine value
or by uploading a Cloud SQL proxy binary to the URL currently generated in _download_sql_proxy_if_needed.
As mentioned by @enocom, this commit to support arm64 download links had the side effect of generating broken download links. I assume the author of the commit thought that the Cloud SQL Proxy had binaries for each machine type, although in fact there are no Linux x86_64 links.
I have created an Airflow PR to fix the broken links. Hopefully it will get merged soon and resolve this. I will update the thread with any news.
Update (I've been working with Jack on this): I just merged that PR! When a new version of the providers is added to PyPI, you'll need to add it to your Composer environment. In the meantime, as a workaround, you could take the fix from Jack's PR and use it as a local dependency. (Similar to the other reply here!) If you do this, I highly recommend setting a calendar reminder (maybe a month from now?) to remove the workaround and go back to importing from the provider package, just to make sure you don't miss out on other updates to it! :)

"TypeError: expected bytes, str found" when calling .MidiIn() from rtmidi-python

I installed rtmidi_python for Python 3.4.2 from the .whl provided on http://www.lfd.uci.edu/~gohlke/pythonlibs/, and the import works fine, but as soon as I call "rtmidi_python.MidiIn()", I get a TypeError as follows:
Python 3.4.2 (v3.4.2:ab2c023a9432, Oct 6 2014, 22:15:05) [MSC v.1600 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import rtmidi_python
>>> rtmidi_python.MidiIn()
Traceback (most recent call last):
File "<pyshell#3>", line 1, in <module>
rtmidi_python.MidiIn()
File "rtmidi_python.pyx", line 72, in rtmidi_python.MidiIn.__cinit__ (rtmidi_python.cpp:1440)
TypeError: expected bytes, str found
As I understand it, after some research, this means that there's a mistake somewhere in the package itself or in the build of it, and there's nothing I can do about it, but I might be wrong. Can anyone confirm?
I use 3.4.2 because that is the version of Python used by the current version of Blender. I want to use rtmidi-python within the Blender Game Engine.
I am currently working on Windows 7 32 bit, and use .whls to install packages as I do not have the necessary C++ compiler for regular pip installs.
For comparison, I previously installed rtmidi-python for 3.5.1, also from the adequate .whl provided on the link above, and there the command worked perfectly fine.
Should any necessary information be missing, feel free to ask. Thanks ahead if the answer comes as a comment and I don't get to upvote it.
While not a perfect solution, this can be fixed in the manner described by sehqlr here: https://github.com/superquadratic/rtmidi-python/issues/17
The fix is to call MidiIn() like this:
>>> rtmidi_python.MidiIn(b'in')
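The reason this helps is that in Python 3 a plain string literal is str (text), while the extension apparently expects bytes for this argument. A minimal sketch of a helper that normalizes either form to bytes before the call; as_bytes is a hypothetical name, not part of rtmidi-python:

```python
def as_bytes(name, encoding="utf-8"):
    """Return name as bytes, encoding it first if it is a str."""
    if isinstance(name, bytes):
        return name
    return name.encode(encoding)


print(type("in").__name__, type(as_bytes("in")).__name__)

# Intended use, following the workaround above (not executed here):
# midi_in = rtmidi_python.MidiIn(as_bytes("in"))
```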

Error while running depswriter.py from google closure library

I am trying to build XTK by following this link, on Linux running in Oracle VirtualBox, to get a non-minified xtk.js. I get the following error when I run the deps.py file to generate xtk-deps.js:
Generating dependency file for XTK...
Traceback (most recent call last):
File "/root/Downloads/X-master/lib/google-closure-library/closure/bin/build/depswriter.py", line 212, in <module>
main()
File "/root/Downloads/X-master/lib/google-closure-library/closure/bin/build/depswriter.py", line 196, in main
path_to_source[depspath] = source.Source(source.GetFileContents(srcpath))
File "/root/Downloads/X-master/lib/google-closure-library/closure/bin/build/source.py", line 126, in GetFileContents
return fileobj.read()
File "/usr/lib/python2.7/codecs.py", line 668, in read
return self.reader.read(size)
File "/usr/lib/python2.7/codecs.py", line 474, in read
newchars, decodedbytes = self.decode(data, self.errors)
File "/usr/lib/python2.7/encodings/utf_8_sig.py", line 104, in decode
return codecs.utf_8_decode(input, errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9a in position 4584: invalid start byte
Could not generate dependency file.
Could anybody please explain why this error occurs?
There are probably some non-UTF-8 characters in your code (most likely in X.js).
In my case, I found a non-English word (maybe a German or French name) on line 210 of XTK's X.js. I deleted the character and ran build.py again, and the encoding error no longer appeared.
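To locate the offending byte instead of searching by eye, a short script can report where decoding first fails. This is a minimal sketch; find_invalid_utf8 is a hypothetical helper name, and the file path in the usage comment is only an example:

```python
def find_invalid_utf8(data):
    """Return (byte offset, 1-based line number) of the first invalid
    UTF-8 byte in data, or None if the data decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Count the newlines before the failing offset to get the line number.
        line = data.count(b"\n", 0, exc.start) + 1
        return exc.start, line
    return None


sample = b"var a;\nname = '\x9a';\n"
print(find_invalid_utf8(sample))

# Intended use on a real file (path is an example only):
# with open("X.js", "rb") as handle:
#     print(find_invalid_utf8(handle.read()))
```

Reading the file in binary mode matters here: opening it as text would raise the same UnicodeDecodeError before the check could run.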
What worked for me was building XTK with an earlier commit of the Google Closure Library; with that, it worked perfectly.
I had to search XTK's commit history extensively to find out which version of the Closure Library they were using to build it.
PS: I posted a similar solution here earlier, but the post was deleted by a moderator, so I am sharing it here again.

TypeError: Expected bytes While printing Any Report Using Client Database in OpenERP 7.0

I am using a client database. It restores successfully on my local system and works fine, but an error occurs whenever I print any report from within that database.
I get the traceback below in the terminal.
Traceback (most recent call last):
File "/home/best/workspace/dynaweld/web/addons/web/http.py", line 285, in dispatch
r = method(self, **self.params)
File "/home/best/workspace/dynaweld/web/addons/web/controllers/main.py", line 1769, in index
cookies={'fileToken': int(token)})
File "/home/best/workspace/dynaweld/web/addons/web/http.py", line 332, in make_response
response.set_cookie(k, v)
File "/usr/local/lib/python2.7/dist-packages/Werkzeug-0.10.4-py2.7.egg/werkzeug/wrappers.py", line 1008, in set_cookie
self.charset))
File "/usr/local/lib/python2.7/dist-packages/Werkzeug-0.10.4-py2.7.egg/werkzeug/http.py", line 920, in dump_cookie
value = to_bytes(value, charset)
File "/usr/local/lib/python2.7/dist-packages/Werkzeug-0.10.4-py2.7.egg/werkzeug/_compat.py", line 106, in to_bytes
raise TypeError('Expected bytes')
TypeError: Expected bytes
I have tried the following to resolve the traceback above, but have not yet succeeded:
1. Removed unwanted data from my local client database, including all the data of the mail.message object.
2. Removed all unnecessary databases from my system, keeping only 2-3 databases for my OpenERP server.
3. Cleaned my PC of unwanted files and other data not relevant to my database.
4. Checked the available disk space; I have enough space to restore that database file.
Can anyone help me fix this issue?
This is because cookies are not intended to support Unicode characters; you must pass an encoded (bytes) value for the cookie you are trying to set, something like:
set_cookie(k, bytes(v))
or at least send your variable as bytes.
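Note that the traceback shows the cookie being built as cookies={'fileToken': int(token)}, so the value reaching set_cookie is an int rather than text. A minimal Python 3 sketch of that situation; strict_to_bytes is a simplified stand-in written for illustration, not Werkzeug's actual code, and cookie_safe is a hypothetical helper:

```python
def strict_to_bytes(value, charset="utf-8"):
    """Simplified stand-in for a converter that accepts only text or bytes."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)
    if isinstance(value, str):
        return value.encode(charset)
    raise TypeError("Expected bytes")


def cookie_safe(value):
    """Turn non-text values such as ints into text before setting a cookie."""
    if isinstance(value, (bytes, bytearray, str)):
        return value
    return str(value)


try:
    strict_to_bytes(1234)
except TypeError as exc:
    print("int value rejected:", exc)

print(strict_to_bytes(cookie_safe(1234)))
```

Under that reading, converting the value to a string before it reaches set_cookie would avoid the error without changing the Werkzeug version.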
I have fixed this by installing an older version of werkzeug, 0.6.2

problems with monkeyrunner

I am making some changes to the Android framework layer and building my own version. I am working from Froyo and trying to use monkeyrunner for some testing. I have pulled the source and can build and run it in the emulator, but when I try to use a monkeyrunner script I can't get anything to work. I built the code using lunch full-eng and it runs fine on the device. I am just trying to get a simple script running, based on the example at http://developer.android.com/guide/developing/tools/monkeyrunner_concepts.html and shown below, with a print statement added just to see whether I could get anything to run.
# Imports the monkeyrunner modules used by this program
from com.android.monkeyrunner import MonkeyRunner, MonkeyDevice
# Connects to the current device, returning a MonkeyDevice object
device = MonkeyRunner.waitForConnection()
print "Hello World!"
When the following line is in the script I get an error as follows.
from com.android.monkeyrunner import MonkeyRunner, MonkeyDevice
Traceback (most recent call last):
File "../../MRTesting/MyTest.py, line 4, in
from com.android.monkeyrunner import MonkeyRunner, MonkeyDevice
ImportError: cannot import name MonkeyDevice
So if I remove MonkeyDevice from the import as shown below I get a different error on the call to waitForConnection()
from com.android.monkeyrunner import MonkeyRunner
Traceback (most recent call last):
File "../../MRTesting/MyTest.py, line 6, in
device = MonkeyRunner.waitForConnection()
AttributeError: type object 'com.android.monkeyrunner.MonkeyRunner' has no attribute 'waitForConnection'
I tried modifying the call to have some arguments as indicated in the documentation as follows but I still get the same error. The second argument matches the value returned by a call to adb devices.
device = MonkeyRunner.waitForConnection(5, 'emulator-5554')
I have done some digging around, and one person said that the shebang needs to be at the beginning of the file, as follows (I have modified the path to leave out information I would rather not share).
#! /home/<path>/monkeyrunner
I could not see how this would differ from invoking monkeyrunner directly from the command line, but I tried it, with no luck. I did not install the SDK anywhere on my system, as it is included in the build tree. It seems to me that the monkeyrunner tool might not be able to locate it as needed, but I can't find any indication of how to fix this. I run the following commands from the root of my build directory when I build my system:
. build/envsetup.sh
setpaths
lunch full-eng
make -j16
Anyone have any thoughts on this?
