Can't load tokenizer for '/content/drive/MyDrive/Colab Notebooks/classification_models_2/space1/' - bert-language-model

I used BERT to predict review scores. After the trained model was saved to a folder, the above error occurred when I loaded it again. This is the second time I have trained this model; that is to say, after the first training I continued training the saved model on new data to improve its accuracy. Why did the first training not report an error, while the second time it did? I only changed the save path.
import os

model.save_pretrained("/content/drive/MyDrive/Colab Notebooks/test/classification_models_2/space1/")
tokenizer.save_pretrained("/content/drive/MyDrive/Colab Notebooks/test/classification_models_2/space1/")
torch.save(args, os.path.join("/content/drive/MyDrive/Colab Notebooks/test/classification_models_2/space1/", "training_args.bin"))

# 1. Load the trained model
args_eval = {
    "model_name_or_path": "/content/drive/MyDrive/Colab Notebooks/test/classification_models_2/space1/",
    "config_name": "/content/drive/MyDrive/Colab Notebooks/test/classification_models_2/space1/",
    "tokenizer_name": "/content/drive/MyDrive/Colab Notebooks/classification_models_2/space1/",
}
config_class, tokenizer_class = MODEL_CLASSES["bert"]
model_class = BertForClassification
config = config_class.from_pretrained(
    args_eval["config_name"],
    finetuning_task="",
    cache_dir=None,
)
tokenizer = tokenizer_class.from_pretrained(
    args_eval["tokenizer_name"],
    do_lower_case=True,
    cache_dir=None,
)
model = model_class.from_pretrained(
    args_eval["model_name_or_path"],
    from_tf=bool(".ckpt" in args_eval["model_name_or_path"]),
    config=config,
    cache_dir=None,
)
model.to(device)
OSError Traceback (most recent call last)
<ipython-input-57-4658f3d30e04> in <module>()
17 args_eval["tokenizer_name"],
18 do_lower_case=True,
---> 19 cache_dir=None,
20 )
21 model = model_class.from_pretrained(
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py in from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
1766 if all(full_file_name is None for full_file_name in resolved_vocab_files.values()):
1767 raise EnvironmentError(
-> 1768 f"Can't load tokenizer for '{pretrained_model_name_or_path}'. If you were trying to load it from "
1769 "'https://huggingface.co/models', make sure you don't have a local directory with the same name. "
1770 f"Otherwise, make sure '{pretrained_model_name_or_path}' is the correct path to a directory "
OSError: Can't load tokenizer for '/content/drive/MyDrive/Colab Notebooks/classification_models_2/space1/'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '/content/drive/MyDrive/Colab Notebooks/classification_models_2/space1/' is the correct path to a directory containing all relevant files for a BertTokenizer tokenizer.
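Note the mismatch: the model and tokenizer were saved under .../test/classification_models_2/space1/, but "tokenizer_name" (and the path in the error) omits the test/ component, so the tokenizer is loaded from a directory it was never saved to. A minimal sketch of the usual guard against this, defining the directory once (MODEL_DIR is a name introduced here for illustration):

MODEL_DIR = "/content/drive/MyDrive/Colab Notebooks/test/classification_models_2/space1/"

model.save_pretrained(MODEL_DIR)
tokenizer.save_pretrained(MODEL_DIR)

# Later, point config, tokenizer, and model at the same directory
args_eval = {
    "model_name_or_path": MODEL_DIR,
    "config_name": MODEL_DIR,
    "tokenizer_name": MODEL_DIR,
}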

Related

I get this error while trying to join a file path using os.path

import os

filepath = os.path.join('SalesData')
filepath
disFile = os.listdir(filepath)
disFile
joinFile = os.path.join(disFile, "Sales_Data_2020")
I keep getting this error:
TypeError Traceback (most recent call last)
Input In [15], in <cell line: 1>()
----> 1 joinFile=os.path.join(disFile,"Sales_Data_2020")
File ~\anaconda3\lib\ntpath.py:78, in join(path, *paths)
77 def join(path, *paths):
---> 78 path = os.fspath(path)
79 if isinstance(path, bytes):
80 sep = b'\\'
TypeError: expected str, bytes or os.PathLike object, not list
Depending on what disFile represents in your case, this should do the trick:
joinFile = os.path.join("disFile", "Sales_Data_2020")
According to the error, disFile on your end is a list, which won't work. Thus, you would have to select a string from that list to join it.
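For example, a minimal sketch (assuming SalesData is a directory in the current working directory containing files such as Sales_Data_2020) that selects one entry from the list, or joins every entry onto the parent path:

import os

filepath = 'SalesData'
disFile = os.listdir(filepath)   # a list, e.g. ['Sales_Data_2019', 'Sales_Data_2020']

joinFile = os.path.join(filepath, disFile[0])              # join one selected entry
allFiles = [os.path.join(filepath, f) for f in disFile]    # or full paths for all entries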
See also the official documentation for os.path.join.

How to visualize a clog2 file from MPI/MPE using jumpshot

I am following the examples given in the “Using MPI” book and am working on the example that turns on MPE logging (pmatmatlog.c, which I ported from the Fortran example). It runs and produces a log file called “pmatmat.log.clog2”. I would like to visualize this log file.
I started by trying to use “jumpshot-4”, because that is what is installed and I cannot find a way to download jumpshot-2 (which appears to favor log files in the clog2 format?). Jumpshot-4, however, wants files in slog2 format and gives an error about not finding “clog2TOslog2” in the TAU directory tree.
Looking at the head of the clog2 file, it appears to be correct as far as I can tell:
$ head pmatmat.log.clog2
CLOG-02.44is_big_endian=TRUE is_finalzed=TRUE block_size=65536num_buffered_blocks=128max_comm_world_size=4max_thread_count=1known_eventID_start=0user_eventID_start=600known_solo_eventID_start=-10user_solo_eventID_start=5000known_stateID_count=300user_stateID_count=4known_solo_eventID_count=0user_solo_eventID_count=0commtable_fptr=0107374182466560>? … <and on into a lot of unreadable binary>
When I try to convert my clog2 file on the command line by calling “clog2TOslog2”, I get the following:
$ Clog2ToSlog2 ./pmatmat.log.clog2
GUI_LIBDIR is set. GUI_LIBDIR = /Users/markrbower/mpi/lib
**** Error! State!=State
**** Error! State!=State
**** Error! State!=State
**** Error! State!=State
java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at logformat.clog2TOdrawable.InputLog$ContentIterator.hasNext(InputLog.java:481)
at logformat.clog2TOdrawable.InputLog.peekNextKind(InputLog.java:58)
at logformat.slog2.output.Clog2ToSlog2.main(Clog2ToSlog2.java:77)
Caused by: logformat.clog2TOdrawable.NoMatchingEventException: No matching State end-event for Record RecHeader[ time=1.9073486328125E-5, icomm=0, rank=1, thread=0, rectype=6 ], RecCargo[ etype=1, bytes=bstartbendcomputecomputedpma ]
at logformat.clog2TOdrawable.Topo_State.matchFinalEvent(Topo_State.java:95)
... 7 more
java.lang.reflect.InvocationTargetException
After a lot of errors that all look to be “InvocationTargetException” as above, the tail says:
…
SLOG-2 Header:
version = SLOG 2.0.6
NumOfChildrenPerNode = 2
TreeLeafByteSize = 65536
MaxTreeDepth = 0
MaxBufferByteSize = 1346
Categories is FBinfo(157 # 1454)
MethodDefs is FBinfo(0 # 0)
LineIDMaps is FBinfo(232 # 1611)
TreeRoot is FBinfo(1346 # 108)
TreeDir is FBinfo(38 # 1843)
Annotations is FBinfo(0 # 0)
Postamble is FBinfo(0 # 0)
Number of Drawables = 20
Number of Unmatched Events = 0
Total ByteSize of the logfile = 8320
timeElapsed between 1 & 2 = 13 msec
timeElapsed between 2 & 3 = 79 msec
$
There are several routes I could see to finally get to visualizing the log file:
1. re-install TAU and hope that allows jumpshot-4 to find the converter
2. install MPI2 with MPE2 and try the newer version of everything
3. find a way to download and install Jumpshot-2 and hope that reads clog2 files
4. find some other way to convert clog2 to slog2
Why isn’t the conversion working and which is the best option to pursue?

Is it possible to run a custom OpenAI gym environment entirely from within Jupyter Notebook

Long story short: I have been given some Python code for a custom OpenAI Gym environment. I can successfully run the code via ExperimentGrid from the command line but would like to be able to run the entire experiment from within Jupyter notebook, rather than calling scripts. This would be more convenient for some experiments that I will be doing farther down the road.
My question: Is it possible to execute an experiment on a custom OpenAI gym environment entirely from within Jupyter Notebook and if so, how? I've seen plenty of examples of people executing gym's standard environments (like SpaceInvaders-v0 or CartPole-v0) from Jupyter but even then, they are calling the environment with
env=gym.make('SpaceInvaders-v0')
and essentially executing that environment's script behind the scenes.
Below is a basic description of how my code is set-up to run from the command line and the errors that I'm getting in Jupyter.
Any advice would be appreciated. I am admittedly rather new to Gym, Python and Linux.
My basic environment code is structured like this in, say, envs/mygames/Custom_Env.py:
various import statements (numpy, gym, pyglet, copy)
class Entity()
class State()
class The_Custom_Env(core.Env) # This is the main environment class
class Shell_Class # This class calls The_Custom_Env and provides some arguments
In mygames/__init__.py, I import the Shell_Class:
from gym.envs.mygames.Custom_Env import Shell_Class
In envs/__init__.py, I have the environment registered:
register(
    id='TEST-v0',
    entry_point='gym.envs.mygames:Shell_Class',
    max_episode_steps=200,
    reward_threshold=25.0,
)
Finally, if I execute a script containing this code from the command line, the experiment works without issue:
from spinup.utils.run_utils import ExperimentGrid
from spinup import ppo_pytorch
import torch

if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('--cpu', type=int, default=4)
    parser.add_argument('--num_runs', type=int, default=1)
    args = parser.parse_args()

    eg = ExperimentGrid(name='super-cool-test')
    eg.add('env_name', 'TEST-v0', '', True)
    eg.add('seed', [10*i for i in range(args.num_runs)])
    eg.add('epochs', [10])
    eg.add('steps_per_epoch', 4000)
    eg.add('ac_kwargs:hidden_sizes', [(32, 32)], 'hid')
    eg.add('ac_kwargs:activation', [torch.nn.ReLU], '')
    eg.add('pi_lr', [0.001])
    eg.add('clip_ratio', 0.3)
    eg.run(ppo_pytorch, num_cpu=args.cpu)
My Jupyter Attempt
I put all of the code from Custom_Env.py in cell #1.
I then registered the environment in cell #2:
gym.register(
    id='TEST-v1',
    entry_point='__main__:Shell_Class',
    max_episode_steps=200,
    reward_threshold=25.0,
)
Based on this Q/A: Register gym environment that is defined inside a jupyter notebook cell, I make the environment in cell #3:
gym.make('TEST-v1')
and get this non-descriptive output:
<TimeLimit<Shell_Class< TEST-v1 >>>
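(As far as I can tell, that output is actually normal: gym.make returns the environment wrapped in gym's TimeLimit wrapper because max_episode_steps was set at registration. A quick sanity check that the notebook-defined environment really works in-process, a sketch assuming the standard gym API:)

env = gym.make('TEST-v1')
obs = env.reset()                                              # initial observation from Shell_Class
obs, reward, done, info = env.step(env.action_space.sample())  # one random step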
In cell #4, I tried to execute ExperimentGrid code directly within Jupyter like so:
from spinup.utils.run_utils import ExperimentGrid
from spinup import ppo_pytorch
import torch
num_runs=1
cpu=4
env_name='TEST-v1'
eg = ExperimentGrid(name='Jupyter-test')
eg.add('env_name', env_name, '', True)
eg.add('seed', [10*i for i in range(num_runs)])
eg.add('epochs', 500)
eg.add('steps_per_epoch', 4000)
eg.add('ac_kwargs:hidden_sizes', [(32, 32)], 'hid')
eg.add('ac_kwargs:activation', [torch.nn.ReLU], '')
eg.add('pi_lr', 0.001)
eg.add('clip_ratio', 0.3)
eg.run(ppo_pytorch, num_cpu=cpu)
The experiment starts up as usual but then runs into some kind of error:
> ================================================================================
ExperimentGrid [Jupyter-test] runs over parameters:
env_name []
TEST-v1
seed [see]
0
epochs [epo]
500
steps_per_epoch [ste]
4000
ac_kwargs:hidden_sizes [hid]
(32, 32)
ac_kwargs:activation []
ReLU
pi_lr [pi]
0.001
clip_ratio [cli]
0.3
Variants, counting seeds: 1
Variants, not counting seeds: 1
================================================================================
Preparing to run the following experiments...
Jupyter-test_test-v1
================================================================================
Launch delayed to give you a few seconds to review your experiments.
To customize or disable this behavior, change WAIT_BEFORE_LAUNCH in
spinup/user_config.py.
================================================================================
Running experiment:
Jupyter-test_test-v1
with kwargs:
{
"ac_kwargs": {
"activation": "ReLU",
"hidden_sizes": [
32,
32
]
},
"clip_ratio": 0.3,
"env_name": "TEST-v1",
"epochs": 500,
"pi_lr": 0.001,
"seed": 0,
"steps_per_epoch": 4000
}
================================================================================
There appears to have been an error in your experiment.
Check the traceback above to see what actually went wrong. The
traceback below, included for completeness (but probably not useful
for diagnosing the error), shows the stack leading up to the
experiment launch.
================================================================================
---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
<ipython-input-14-de843fd528cf> in <module>
15 eg.add('pi_lr', 0.001)
16 eg.add('clip_ratio', 0.3)
---> 17 eg.run(ppo_pytorch, num_cpu=cpu)
~/Downloads/spinningup/spinup/utils/run_utils.py in run(self, thunk, num_cpu, data_dir, datestamp)
544
545 call_experiment(exp_name, thunk_, num_cpu=num_cpu,
--> 546 data_dir=data_dir, datestamp=datestamp, **var)
547
548
~/Downloads/spinningup/spinup/utils/run_utils.py in call_experiment(exp_name, thunk, seed, num_cpu, data_dir, datestamp, **kwargs)
169 cmd = [sys.executable if sys.executable else 'python', entrypoint, encoded_thunk]
170 try:
--> 171 subprocess.check_call(cmd, env=os.environ)
172 except CalledProcessError:
173 err_msg = '\n'*3 + '='*DIV_LINE_WIDTH + '\n' + dedent("""
~/anaconda3/envs/spinningup/lib/python3.6/subprocess.py in check_call(*popenargs, **kwargs)
309 if cmd is None:
310 cmd = popenargs[0]
--> 311 raise CalledProcessError(retcode, cmd)
312 return 0
313
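For what it's worth, the traceback itself points at the cause: eg.run serializes the experiment and relaunches it via subprocess.check_call in a fresh Python process, and that child process cannot import Shell_Class, which exists only in the notebook's __main__. A minimal sketch that sidesteps the subprocess by calling the algorithm function directly (assuming spinup's documented ppo_pytorch keyword arguments; this gives up ExperimentGrid's grid-search conveniences but keeps everything in one interpreter):

import gym
import torch
from spinup import ppo_pytorch

# env_fn builds the environment in-process, so the notebook-registered
# 'TEST-v1' (and its Shell_Class entry point) is visible to the algorithm.
ppo_pytorch(
    env_fn=lambda: gym.make('TEST-v1'),
    ac_kwargs=dict(hidden_sizes=(32, 32), activation=torch.nn.ReLU),
    steps_per_epoch=4000,
    epochs=10,
    pi_lr=0.001,
    clip_ratio=0.3,
    seed=0,
)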

Why can't TF2 save a tf.function model as a .pb file?

I tried to save a model like the official Transformer example on the TensorFlow website, but when I want to save the train_step graph, or trace it with tf.summary.trace_on, it errors. The error is as follows:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/func_graph.py in convert(x)
936 try:
--> 937 x = ops.convert_to_tensor_or_composite(x)
938 except (ValueError, TypeError):
15 frames
TypeError: Can't convert Operation 'PartitionedFunctionCall' to Tensor (target dtype=None, name=None, as_ref=False)
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/func_graph.py in convert(x)
941 "must return zero or more Tensors; in compilation of %s, found "
942 "return value of type %s, which is not a Tensor." %
--> 943 (str(python_func), type(x)))
944 if add_control_dependencies:
945 x = deps_ctx.mark_as_return(x)
TypeError: To be compatible with tf.contrib.eager.defun, Python functions must return zero or more Tensors; in compilation of <function canonicalize_signatures.<locals>.signature_wrapper at 0x7fcf794b47b8>, found return value of type <class 'tensorflow.python.framework.ops.Operation'>, which is not a Tensor.
I supposed it was some error with a tensor operation and wrote another demo to confirm my idea:
import tensorflow as tf

sig = [tf.TensorSpec(shape=(None, None), dtype=tf.int64),
       tf.TensorSpec(shape=(None, None), dtype=tf.int64)]

@tf.function(input_signature=sig)
def cal(a, d):
    b = a[1:]   # slices the input but never returns it

root = tf.Module()
root.func = cal
# Get the concrete function.
concrete_func = root.func.get_concrete_function(
    tf.TensorSpec(shape=(None, None), dtype=tf.int64),
    tf.TensorSpec(shape=(None, None), dtype=tf.int64)
)
tf.saved_model.save(root, '/correct', concrete_func)
The error occurs as expected. But how can I fix it? The positional_encoding requires it, and I have no idea how to replace this operation. I figure it is because I didn't save the transformer model in the .pb file.
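For comparison, a minimal sketch of the same demo changed in the one way the error message asks for: the traced function returns a Tensor rather than a bare Operation or nothing, after which tf.saved_model.save completes (the '/tmp/correct' path is just an example):

import tensorflow as tf

sig = [tf.TensorSpec(shape=(None, None), dtype=tf.int64),
       tf.TensorSpec(shape=(None, None), dtype=tf.int64)]

@tf.function(input_signature=sig)
def cal(a, d):
    # Return the result: exported signatures must return zero or more
    # Tensors, and a bare Operation (or a dropped value) is neither.
    return a[1:]

root = tf.Module()
root.func = cal
concrete_func = root.func.get_concrete_function(*sig)
tf.saved_model.save(root, '/tmp/correct', signatures=concrete_func)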

Building R package with Reference objects fails

I recently rewrote a package to use the new(er) R reference class objects. I've exported the three classes using export() in the NAMESPACE file so as far as I'm aware that should work. However when I test build the package I get an error at lazy loading stage:
** preparing package for lazy loading
Error in file(con, "rb") : invalid 'description' argument
ERROR: lazy loading failed for package ‘PACKAGE_NAME_HERE’
* removing ‘/Library/Frameworks/R.framework/Versions/3.0/Resources/library/PACKAGE_NAME_HERE’
I'm not sure what the problem is here. I don't know if it's relevant, but the reference classes do store data in files in the tmp directory by having some fields set as accessor functions; I don't know if that's what's being complained about here when it says (con, "rb"), which I guess is some connection thing. Does anybody have any ideas or advice for making sure reference classes get exported properly? My namespace is currently simple -
export(Main)
export(Mainseq)
export(Maintriplet)
Which are the three reference classes I exported by using @export tags in roxygen2.
What is it I'm doing (or not doing) that is throwing the lazy load error?
(ASIDE: I have no compiled code; it's all R, although the reference class methods do call some internal functions that are not exported, but these are supposed to be internal so I don't think I need to export them.)
Thanks,
Ben.
EDIT:
My description file is as follows:
Package: HybRIDS
Type: Package
Title: Quick detection and dating of Recombinant Regions in DNA sequence data.
Version: 1.0
Date: 2013-03-13
Author: Ben J. Ward
Maintainer: Ben J. Ward <b.ward@uea.ac.uk>
Description: A simple R package for the quick detection and dating of Recombinant Regions in DNA sequence data.
License: GPL-2
Depends: ggplot2,grid,gridExtra,png,ape
I can't see what is wrong with this - the Depends are correct.
EDIT:
I've eliminated the first error with the description but I'm still getting the con error.
I think it's because the Mainseq class (which is nested in class Main) has some fields:
FullSequenceFile = "character",
FullSequence = function(value) {
    if (missing(value)) {
        as.character(read.dna(file = FullSequenceFile, format = "fasta", as.matrix = TRUE))
    } else {
        write.dna(value, file = FullSequenceFile, format = "fasta")
    }
},
InformativeSequenceFile = "character",
InformativeSequence = function(value) {
    if (missing(value)) {
        as.character(read.dna(file = InformativeSequenceFile, format = "fasta", as.matrix = TRUE))
    } else {
        write.dna(value, file = InformativeSequenceFile, format = "fasta")
    }
}
The idea is that upon initialisation, the two character fields are filled with a path to a temp file in tmpdir, and when the variables are called or edited, the files containing the variable data are read or written. However, it seems the variables are being accessed before this path is available, because upon package build the following happens:
** preparing package for lazy loading
Warning in file(con, "rb") :
cannot open file '/var/folders/kp/clkqvqn9739ffw2755zjwy74_skf_z/T//RtmpLB8ESC/FullSequenceaba52ac591f3': No such file or directory
Error in file(con, "rb") : cannot open the connection
