Issue saving tf Bert model - bert-language-model

I have this below model, when Im trying to saving it, Im getting
IndexError: Exception encountered when calling layer "bert" (type TFBertMainLayer).
list index out of range
My model:
input_ids = tf.keras.layers.Input(shape=(MAX_SEQ_LEN,), name='input_token', dtype='int32')
input_masks_ids = tf.keras.layers.Input(shape=(MAX_SEQ_LEN,), name='masked_token', dtype='int32')
sequence_output = bert_layer.bert([input_ids, input_masks_ids])["last_hidden_state"]
x = tf.keras.layers.Lambda(lambda seq: seq[:, 0, :])(sequence_output)
out = tf.keras.layers.Dense(3,activation="softmax")(x)
model = tf.keras.Model(inputs=(input_ids, input_masks_ids), outputs=out)
model.save("/content/drive/MyDrive/Berttest",save_format='tf')

Related

Pipeline hyperparameter tunning with optuna

I'm trying to find the best parameter for the lasso polynomial especially : the degree of the polynomial, alpha and max_iteration, using OPTUNA, however i'm getting this error message :
ValueError: Invalid parameter degree for estimator Pipeline(steps=[('poly', PolynomialFeatures()), ('linear', Lasso())]). Check the list of available parameters with `estimator.get_params().keys()`.
I used K-cross-validation and this is the code :
model = Pipeline([('poly', PolynomialFeatures()),('linear',Lasso())])```
def run(trial):
scores =[]
for fold in range(5):
param_grid = {
"degree": trial.suggest_int("degree", 2, 5),
'alpha': trial.suggest_float("alpha", 0.01, 0.3),
'max_iter':trial.suggest_int('max_iter', 1000, 4000)
}
xtrain = df[df.kfold != fold].reset_index(drop=True)
xvalid = df[df.kfold == fold].reset_index(drop=True)
ytrain = xtrain.target
yvalid = xvalid.target
xtrain = xtrain[useful_features]
xvalid = xvalid[useful_features]
model.set_params(**param_grid)
model.fit(xtrain, ytrain)
preds_valid = model.predict(xvalid)
rmse = mean_squared_error(yvalid, preds_valid, squared=False)
scores.append(rmse)
return np.mean(scores)
study = optuna.create_study(direction="minimize")
study.optimize(run, n_trials=100)

TRT inference using onnx - Error Code 1: Cuda Driver (invalid resource handle)

Currently I'm tryin to convert given onnx file to tensorrt file, and do inference on the generated tensorrt file.
To do so, I used tensorrt python binding API, but
"Error Code 1: Cuda Driver (invalid resource handle)" happens and there is no kind description about this.
Can anyone help me to overcome this situation?
Thx in advance, and below is my code snippet.
def trt_export(self):
fp_16_mode = True
## Obviously, I provided appropriate file names
trt_file_name = "PATH_TO_TRT_FILE"
onnx_name = "PATH_TO_ONNX_FILE"
TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(EXPLICIT_BATCH)
parser = trt.OnnxParser(network, TRT_LOGGER)
config = builder.create_builder_config()
config.max_workspace_size = (1<<30)
config.set_flag(trt.BuilderFlag.FP16)
config.default_device_type = trt.DeviceType.GPU
profile = builder.create_optimization_profile()
profile.set_shape('input', (1, 3, IMG_SIZE, IMG_SIZE), (12, 3, IMG_SIZE, IMG_SIZE), (32, 3, IMG_SIZE, IMG_SIZE)) # random nubmers for min. opt. max batch
config.add_optimization_profile(profile)
with open(onnx_name, 'rb') as model:
if not parser.parse(model.read()):
for error in range(parser.num_errors):
print(parser.get_error(error))
engine = builder.build_engine(network, config)
buf = engine.serialize()
with open(trt_file_name, 'wb') as f:
f.write(buf)
def validate_trt_result(self, input_path):
TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
trt_file_name = "PATH_TO_TRT_FILE"
trt_runtime = trt.Runtime(TRT_LOGGER)
with open(trt_file_name, 'rb') as f:
engine_data = f.read()
engine = trt_runtime.deserialize_cuda_engine(engine_data)
cuda.init()
device = cuda.Device(0)
ctx = device.make_context()
inputs, outputs, bindings = [], [], []
context = engine.create_execution_context()
stream = cuda.Stream()
index = 0
for binding in engine:
size = trt.volume(engine.get_binding_shape(binding)) * -1 # assuming one batch
dtype = trt.nptype(engine.get_binding_dtype(binding))
host_mem = cuda.pagelocked_empty(size, dtype)
device_mem = cuda.mem_alloc(host_mem.nbytes)
bindings.append(int(device_mem))
if engine.binding_is_input(binding):
inputs.append(HostDeviceMem(host_mem, device_mem))
context.set_binding_shape(index, [1, 3, IMG_SIZE, IMG_SIZE])
else:
outputs.append(HostDeviceMem(host_mem, device_mem))
index += 1
print(context.all_binding_shapes_specified)
input_img = cv2.imread(input_path)
input_r = cv2.resize(input_img, dsize = (256, 256))
input_p = np.transpose(input_r, (2, 0, 1))
input_e = np.expand_dims(input_p, axis = 0)
input_f = input_e.astype(np.float32)
input_f /= 255
numpy_array_input = [input_f]
hosts = [input.host for input in inputs]
trt_types = [trt.int32]
for numpy_array, host, trt_types in zip(numpy_array_input, hosts, trt_types):
numpy_array = np.asarray(numpy_array).astype(trt.nptype(trt_types)).ravel()
print(numpy_array.shape)
np.copyto(host, numpy_array)
[cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
#### ERROR HAPPENS HERE ####
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
#### ERROR HAPPENS HERE ####
[cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
stream.synchronize()
print("TRT model inference result : ")
output = outputs[0].host
for one in output :
print(one)
ctx.pop()
Looks like ctx.push() line is missing before a line with memcpy_htod_async.
Such a error can happen if TensorFlow / PyTorch is also using CUDA in parallel with TensorRT.
See the related question/answer: https://stackoverflow.com/a/73996477/5655977

Issue with TFX Trainer component not outputting model to filesystem

First of all, I am using TFX version 0.21.2 and Tensorflow version 2.1.
I have constructed a pipeline largely following the Chigaco taxi example. When the Trainer component is executed I can see the following in the logs:
INFO - Training complete. Model written to /root/airflow/tfx/pipelines/fish/Trainer/model/9/serving_model_dir
When checking the above directory it is empty. What am I missing?
This is my DAG definition file (import statements omitted):
_pipeline_name = 'fish'
_airflow_config = AirflowPipelineConfig(airflow_dag_config = {
'schedule_interval': None,
'start_date': datetime.datetime(2019, 1, 1),
})
_project_root = os.path.join(os.environ['HOME'], 'airflow')
_data_root = os.path.join(_project_root, 'data', 'fish_data')
_module_file = os.path.join(_project_root, 'dags', 'fishUtils.py')
_serving_model_dir = os.path.join(_project_root, 'serving_model', _pipeline_name)
_tfx_root = os.path.join(_project_root, 'tfx')
_pipeline_root = os.path.join(_tfx_root, 'pipelines', _pipeline_name)
_metadata_path = os.path.join(_tfx_root, 'metadata', _pipeline_name,
'metadata.db')
def _create_pipeline(pipeline_name: Text, pipeline_root: Text, data_root: Text,
module_file: Text, serving_model_dir: Text,
metadata_path: Text,
direct_num_workers: int) -> pipeline.Pipeline:
examples = external_input(data_root)
example_gen = CsvExampleGen(input=examples)
statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
infer_schema = SchemaGen(
statistics=statistics_gen.outputs['statistics'],
infer_feature_shape=False)
validate_stats = ExampleValidator(
statistics=statistics_gen.outputs['statistics'],
schema=infer_schema.outputs['schema'])
trainer = Trainer(
examples=example_gen.outputs['examples'], schema=infer_schema.outputs['schema'],
module_file=_module_file, train_args= trainer_pb2.TrainArgs(num_steps=10000),
eval_args= trainer_pb2.EvalArgs(num_steps=5000))
model_validator = ModelValidator(
examples=example_gen.outputs['examples'],
model=trainer.outputs['model'])
pusher = Pusher(
model=trainer.outputs['model'],
model_blessing=model_validator.outputs['blessing'],
push_destination=pusher_pb2.PushDestination(
filesystem=pusher_pb2.PushDestination.Filesystem(
base_directory=_serving_model_dir)))
return pipeline.Pipeline(
pipeline_name=_pipeline_name,
pipeline_root=_pipeline_root,
components=[
example_gen,
statistics_gen,
infer_schema,
validate_stats,
trainer,
model_validator,
pusher],
enable_cache=True,
metadata_connection_config=metadata.sqlite_metadata_connection_config(
metadata_path),
beam_pipeline_args=['--direct_num_workers=%d' % direct_num_workers]
)
runner = AirflowDagRunner(config = _airflow_config)
DAG = runner.run(
_create_pipeline(
pipeline_name=_pipeline_name,
pipeline_root=_pipeline_root,
data_root=_data_root,
module_file=_module_file,
serving_model_dir=_serving_model_dir,
metadata_path=_metadata_path,
# 0 means auto-detect based on on the number of CPUs available during
# execution time.
direct_num_workers=0))
And this is my module file:
_DENSE_FLOAT_FEATURE_KEYS = ['length']
real_valued_columns = [tf.feature_column.numeric_column('length')]
def _eval_input_receiver_fn():
serialized_tf_example = tf.compat.v1.placeholder(
dtype=tf.string, shape=[None], name='input_example_tensor')
features = tf.io.parse_example(
serialized=serialized_tf_example,
features={
'length': tf.io.FixedLenFeature([], tf.float32),
'label': tf.io.FixedLenFeature([], tf.int64),
})
receiver_tensors = {'examples': serialized_tf_example}
return tfma.export.EvalInputReceiver(
features={'length' : features['length']},
receiver_tensors=receiver_tensors,
labels= features['label'],
)
def parser(serialized_example):
features = tf.io.parse_single_example(
serialized_example,
features={
'length': tf.io.FixedLenFeature([], tf.float32),
'label': tf.io.FixedLenFeature([], tf.int64),
})
return ({'length' : features['length']}, features['label'])
def _input_fn(filenames):
# TFRecordDataset doesn't directly accept paths with wildcards
filenames = tf.data.Dataset.list_files(filenames)
dataset = tf.data.TFRecordDataset(filenames, 'GZIP')
dataset = dataset.map(parser)
dataset = dataset.shuffle(2000)
dataset = dataset.batch(40)
dataset = dataset.repeat(10)
return dataset
def trainer_fn(trainer_fn_args, schema):
estimator = tf.estimator.LinearClassifier(feature_columns=real_valued_columns)
train_input_fn = lambda: _input_fn(trainer_fn_args.train_files)
train_spec = tf.estimator.TrainSpec(
train_input_fn,
max_steps=trainer_fn_args.train_steps)
eval_input_fn = lambda: _input_fn(trainer_fn_args.eval_files)
eval_spec = tf.estimator.EvalSpec(
eval_input_fn,
steps=trainer_fn_args.eval_steps,
name='fish-eval')
receiver_fn = lambda: _eval_input_receiver_fn()
return {
'estimator': estimator,
'train_spec': train_spec,
'eval_spec': eval_spec,
'eval_input_receiver_fn': receiver_fn
}
Thank you in advance for your help!
Posting the solution for anyone that is facing the same problem that I faced.
The reason that the model was not written in the filesystem was that the estimator needs a config argument to know where to write the model.
The following modification to the trainer_fn function should solve the problem.
run_config = tf.estimator.RunConfig(save_checkpoints_steps=999, keep_checkpoint_max=1)
run_config = run_config.replace(model_dir=trainer_fn_args.serving_model_dir)
estimator=tf.estimator.LinearClassifier(feature_columns=real_valued_columns,config=run_config)

How do I average 20 saved ANN model?

I have created 20 neural network models with the exact same architecture (but different seed numbers) and I have saved them as SM1, SM2, …., SM20. I am basically tying to import these models using the joblib function and use the average model (called NN here) to apply for various analyses. When I run this in Jupyter Notebook , I get the following error. Appreciate any help.
NN1 = joblib.load('SM1.sav')
NN2 = joblib.load('SM2.sav')
NN3 = joblib.load('SM3.sav')
NN4 = joblib.load('SM4.sav')
NN5 = joblib.load('SM5.sav')
NN6 = joblib.load('SM6.sav')
NN7 = joblib.load('SM7.sav')
NN8 = joblib.load('SM8.sav')
NN9 = joblib.load('SM9.sav')
NN10 = joblib.load('SM10.sav')
NN11 = joblib.load('SM11.sav')
NN12 = joblib.load('SM12.sav')
NN13 = joblib.load('SM13.sav')
NN14 = joblib.load('SM14.sav')
NN15 = joblib.load('SM15.sav')
NN16 = joblib.load('SM16.sav')
NN17 = joblib.load('SM17.sav')
NN18 = joblib.load('SM18.sav')
NN19 = joblib.load('SM19.sav')
NN20 = joblib.load('SM20.sav')
NN = (NN1+NN2+NN3+NN4+NN5+NN6+NN7+NN8+NN9+NN10+NN11+NN12+NN13+NN14+NN15+NN16+NN17+NN18+NN19+NN20)/(20)
TypeError: unsupported operand type(s) for +: 'MLPRegressor' and 'MLPRegressor'

error with earth function using MARS algo

I run my ML algo:
EarthAlgo<-earth(cible~., data=train, degree=4, glm=list(family=binomial))
I got:
Error in leaps.setup(x = bx, y = y, force.in = 1, force.out = NULL, intercept = FALSE, :
NA/NaN/Inf in foreign function call (arg 3)
When I try with train<-train[(1:dim(train)[1]-2),], it's ok. I search last two line on my train data set but i do not see error.
Could you help me please?

Resources