Fine-tuning BERT on custom data using Colab

I am running run_lm_finetuning.py on Colab to fine-tune CamemBERT on custom vocabulary.
I am using the following parameters:
!python run_lm_finetuning.py \
    --output_dir Skander/ \
    --model_type camembert \
    --model_name_or_path camembert-base \
    --do_train \
    --train_data_file="Text.txt" \
    --line_by_line \
    --mlm \
    --per_gpu_train_batch_size=32 \
    --num_train_epochs=3
However, I am getting the following error:
tcmalloc: large alloc 1264730112 bytes == 0xe87fe000 # 0x7f9828a8f1e7 0x5ad4cb 0x4bb356 0x5bd993 0x50a8af 0x50c5b9 0x508245 0x509642 0x595311 0x54a6ff 0x551b81 0x5aa6ec 0x50abb3 0x50d390 0x508245 0x50a080 0x50aa7d 0x50d390 0x508245 0x50a080 0x50aa7d 0x50c5b9 0x508245 0x50b403 0x635222 0x6352d7 0x638a8f 0x639631 0x4b0f40 0x7f982868cb97 0x5b2fda
^C
Does anyone have an idea about this error?
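A tcmalloc "large alloc" of roughly 1.2 GB followed by ^C usually means the process was killed mid-allocation, which on Colab typically indicates the runtime ran out of RAM (for example while the line-by-line dataset is tokenized and held in memory). If that is the cause here, lowering the batch size and capping the sequence length may help. A sketch, assuming the script accepts --block_size (the transformers example script of this name does):
!python run_lm_finetuning.py \
    --output_dir Skander/ \
    --model_type camembert \
    --model_name_or_path camembert-base \
    --do_train \
    --train_data_file="Text.txt" \
    --line_by_line \
    --mlm \
    --block_size=128 \
    --per_gpu_train_batch_size=8 \
    --num_train_epochs=3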

Related

Accessing and comparing properties of a stored edge in a traversal

I have the following graph model:
I want to select the user who has performed an action via the controller. The performed-by edge contains a used_user_key property that I want to use to select the called-by edge connected to the required user, under the condition: called-by.user_key == performed-by.used_user_key.
I store the performed-by edge as action_edge and try to use the stored value in a has() step.
Problem: has('user_key', select('action_edge').values('used_user_key')) yields a random edge.
Question: How can I get/reference a property of the stored edge in a has() step?
GraphDB: JanusGraph 0.5.2
gremlinpython: 3.5.0
Python snippet for reproducing the issue (imports and connection setup added so it runs standalone; the Gremlin Server endpoint is an assumption):
from gremlin_python import statics
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
statics.load_statics(globals())  # exposes select(), eq(), etc. as globals
g = traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin', 'g'))
user_a = g.addV('user').property('name', 'a').next()
user_b = g.addV('user').property('name', 'b').next()
user_c = g.addV('user').property('name', 'c').next()
controller = g.addV('controller').property('name', 'controller').next()
action = g.addV('action').property('name', 'action').next()
g.V(user_a).as_('to').V(controller).as_('from') \
    .addE('called-by') \
    .property('user_key', 'user_a') \
    .to('to') \
    .next()
g.V(user_b).as_('to').V(controller).as_('from') \
    .addE('called-by') \
    .property('user_key', 'user_b') \
    .to('to') \
    .next()
g.V(user_c).as_('to').V(controller).as_('from') \
    .addE('called-by') \
    .property('user_key', 'user_c') \
    .to('to') \
    .next()
g.V(controller).as_('to').V(action).as_('from') \
    .addE('performed-by') \
    .property('used_user_key', 'user_a') \
    .to('to') \
    .next()
# Works as expected!
user_performing_the_action = g.V(action).outE('performed-by').as_('action_edge').inV() \
    .outE('called-by').has('user_key', 'user_a').inV() \
    .next()
assert user_a.id == user_performing_the_action.id
# Selects a random user - ignores the action_edge.used_user_key value
user_performing_the_action = g.V(action).outE('performed-by').as_('action_edge').inV() \
    .outE('called-by').has('user_key', select('action_edge').values('used_user_key')).inV()
# Why does it yield 3 edges instead of 1?
assert user_performing_the_action.clone().count().next() == 3
# Returns a random user
assert user_a.id == user_performing_the_action.clone().next().id
Thanks for your help in advance!
After some research, I found the following solution to the problem:
user_performing_the_action = g.V(action).outE('performed-by').as_('action_edge').inV() \
    .outE('called-by').where(eq('action_edge')).by('user_key').by('used_user_key').inV() \
    .next()
assert user_a.id == user_performing_the_action.id
I compare the incoming called-by edge with the stored action_edge edge using where(), reading a differently named property from each side via the two by() modulators: the first, by('user_key'), applies to the current edge, and the second, by('used_user_key'), to the stored one.
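If you prefer not to load the Gremlin statics into the global namespace, the same traversal also works with the qualified predicate (a sketch; P comes from gremlinpython itself):
from gremlin_python.process.traversal import P

user_performing_the_action = g.V(action).outE('performed-by').as_('action_edge').inV() \
    .outE('called-by').where(P.eq('action_edge')).by('user_key').by('used_user_key').inV() \
    .next()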

How can I add a user to a protected branch?

I would like to configure my GitLab project so that every maintainer can merge (after review) but nobody can push to master; only a bot can (for releases).
I'm using Terraform to configure my GitLab, with something like this:
resource "gitlab_branch_protection" "BranchProtect" {
project = local.project_id
branch = "master"
push_access_level = "no one"
merge_access_level = "maintainer"
}
But we have a Premium version, and the Terraform provider does not allow adding a user (see: https://github.com/gitlabhq/terraform-provider-gitlab/issues/165 ).
So what I'd like to do is make HTTP requests against the API to add the specific user.
So I'm doing it like this:
fetch the current protection
delete the current configuration
update the retrieved configuration with what I want
push the new configuration
BTW: I haven't found a way to just update the configuration in place... https://docs.gitlab.com/ee/api/protected_branches.html
TMP_FILE=$(mktemp)
http GET \
    $GITLAB_URL/api/v4/projects/$pid/protected_branches \
    PRIVATE-TOKEN:$GITLAB_TOKEN \
    name=$BRANCH_NAME \
  | jq \
    --arg uid $USER_ID \
    '.[0] | .push_access_levels |= . + [{user_id: ($uid | tonumber)}]' \
  > $TMP_FILE
http DELETE \
    "$GITLAB_URL/api/v4/projects/$pid/protected_branches/$BRANCH_NAME" \
    PRIVATE-TOKEN:$GITLAB_TOKEN
http --verbose POST \
    "$GITLAB_URL/api/v4/projects/$pid/protected_branches" \
    PRIVATE-TOKEN:$GITLAB_TOKEN \
  < $TMP_FILE
But my problem is that the resulting configuration is not what I expect; I get something like this:
"push_access_levels": [
  {
    "access_level": 40,
    "access_level_description": "Maintainers",
    "group_id": null,
    "user_id": null
  }
],
How can I just update the branch protection to add a single user?
OK, like they say: RTFM!
But you need to delete the rule before adding the new configuration.
http \
    DELETE \
    "$GITLAB_URL/api/v4/projects/$pid/protected_branches/$BRANCH_NAME" \
    PRIVATE-TOKEN:$GITLAB_TOKEN
http \
    POST \
    $GITLAB_URL/api/v4/projects/$pid/protected_branches \
    PRIVATE-TOKEN:$GITLAB_TOKEN \
    name==${BRANCH_NAME} \
    push_access_level==0 \
    merge_access_level==40 \
    unprotect_access_level==40 \
    allowed_to_push[][user_id]==$USER_ID
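To double-check the result, you can read the rule back (a sketch in the same httpie style; fetching a single protected branch by name is part of the same API):
http GET \
    "$GITLAB_URL/api/v4/projects/$pid/protected_branches/$BRANCH_NAME" \
    PRIVATE-TOKEN:$GITLAB_TOKEN
The push_access_levels array should now contain an entry with your user_id set.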

awscli doesn't consider global-secondary-indexes when validating attribute-definitions

I'm trying to initialize a DynamoDB table when creating a localstack container.
Consider the following command:
awslocal dynamodb create-table \
    --debug \
    --table-name Journal \
    --global-secondary-indexes 'IndexName=GetJournalRowsIndex, KeySchema=[{AttributeName=persistence-id, KeyType=HASH},{AttributeName=sequence-nr,KeyType=RANGE}], Projection={ProjectionType=ALL}, ProvisionedThroughput={ReadCapacityUnits=10,WriteCapacityUnits=10}' \
    --global-secondary-indexes 'IndexName=TagsIndex, KeySchema=[{AttributeName=tags,KeyType=HASH}],Projection={ProjectionType=ALL},ProvisionedThroughput={ReadCapacityUnits=10,WriteCapacityUnits=10}' \
    --key-schema \
        AttributeName=pkey,KeyType=HASH \
        AttributeName=skey,KeyType=RANGE \
    --attribute-definitions \
        AttributeName=persistence-id,AttributeType=S \
        AttributeName=pkey,AttributeType=S \
        AttributeName=skey,AttributeType=S \
        AttributeName=sequence-nr,AttributeType=N \
        AttributeName=tags,AttributeType=S \
    --billing-mode PAY_PER_REQUEST
I'm getting the following error:
An error occurred (ValidationException) when calling the CreateTable operation: The number of attributes in key schema must match the number of attributes defined in attribute definitions.
I'm using those attributes in the GSIs, so what am I doing wrong here?
I guess you can't specify the --global-secondary-indexes flag twice - the second occurrence seems to override the first, leaving attribute definitions that no key schema uses. Try the following:
awslocal dynamodb create-table \
    --debug \
    --table-name Journal \
    --global-secondary-indexes "[{\"IndexName\": \"GetJournalRowsIndex\", \"KeySchema\": [{\"AttributeName\": \"persistence-id\", \"KeyType\": \"HASH\"}, {\"AttributeName\": \"sequence-nr\", \"KeyType\": \"RANGE\"}], \"Projection\": {\"ProjectionType\": \"ALL\"}, \"ProvisionedThroughput\": {\"ReadCapacityUnits\": 1, \"WriteCapacityUnits\": 1}}, {\"IndexName\": \"TagsIndex\", \"KeySchema\": [{\"AttributeName\": \"tags\", \"KeyType\": \"HASH\"}], \"Projection\": {\"ProjectionType\": \"ALL\"}, \"ProvisionedThroughput\": {\"ReadCapacityUnits\": 1, \"WriteCapacityUnits\": 1}}]" \
    --key-schema \
        AttributeName=pkey,KeyType=HASH \
        AttributeName=skey,KeyType=RANGE \
    --attribute-definitions \
        AttributeName=persistence-id,AttributeType=S \
        AttributeName=pkey,AttributeType=S \
        AttributeName=skey,AttributeType=S \
        AttributeName=sequence-nr,AttributeType=N \
        AttributeName=tags,AttributeType=S \
    --billing-mode PAY_PER_REQUEST
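To verify that both indexes were actually created, something like this should work (a sketch; the --query expression is standard JMESPath):
awslocal dynamodb describe-table \
    --table-name Journal \
    --query 'Table.GlobalSecondaryIndexes[].IndexName'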

Why can't I create a Google DataProc cluster with both Jupyter and DataLab installed?

I want to create a cluster in DataProc with both Jupyter and DataLab installed (I understand they are very similar, but team members have different preferences). I can create a cluster with either of them:
Cluster with Jupyter:
gcloud dataproc clusters create $DATAPROC_CLUSTER_NAME_JUPYTER \
    --project $PROJECT \
    --bucket $BUCKET \
    --zone $ZONE \
    --initialization-actions gs://dataproc-initialization-actions/connectors/connectors.sh,gs://dataproc-initialization-actions/jupyter/jupyter.sh \
    --metadata gcs-connector-version=$GCS_CONNECTOR_VERSION \
    --metadata bigquery-connector-version=$BQ_CONNECTOR_VERSION \
    --metadata JUPYTER_PORT=$JUPYTER_PORT,JUPYTER_CONDA_PACKAGES=numpy:scipy:pandas:scikit-learn
Cluster with DataLab:
gcloud dataproc clusters create $DATAPROC_CLUSTER_NAME_DATALAB \
    --project $PROJECT \
    --bucket $BUCKET \
    --zone $ZONE \
    --master-boot-disk-size $MASTER_DISK_SIZE \
    --worker-boot-disk-size $WORKER_DISK_SIZE \
    --initialization-actions gs://dataproc-initialization-actions/connectors/connectors.sh,gs://dataproc-initialization-actions/datalab/datalab.sh \
    --metadata gcs-connector-version=$GCS_CONNECTOR_VERSION \
    --metadata bigquery-connector-version=$BQ_CONNECTOR_VERSION \
    --scopes cloud-platform,bigquery
And both work well. However, when I try to create a cluster with both of them, it fails:
gcloud dataproc clusters create test \
    --project $PROJECT \
    --bucket $BUCKET \
    --zone $ZONE \
    --initialization-actions gs://dataproc-initialization-actions/connectors/connectors.sh,gs://dataproc-initialization-actions/datalab/datalab.sh,gs://dataproc-initialization-actions/jupyter/jupyter.sh \
    --metadata gcs-connector-version=$GCS_CONNECTOR_VERSION \
    --metadata bigquery-connector-version=$BQ_CONNECTOR_VERSION \
    --metadata JUPYTER_PORT=$JUPYTER_PORT,JUPYTER_CONDA_PACKAGES=numpy:scipy:pandas:scikit-learn \
    --scopes cloud-platform,bigquery
The error messages are:
ERROR: (gcloud.dataproc.clusters.create) Operation [projects/abc/regions/global/operations/d34943dc-5bda-386f-af91-db6e0516e2c5] failed: Multiple Errors:
- Initialization action failed. Failed action 'gs://dataproc-initialization-actions/jupyter/jupyter.sh', see output in: gs://abc/google-cloud-dataproc-metainfo/266175ef-e595-4732-b351-335837a3f30e/test-m/dataproc-initialization-script-2_output
- Initialization action failed. Failed action 'gs://dataproc-initialization-actions/jupyter/jupyter.sh', see output in: gs://abc/google-cloud-dataproc-metainfo/266175ef-e595-4732-b351-335837a3f30e/test-w-0/dataproc-initialization-script-2_output
- Initialization action failed. Failed action 'gs://dataproc-initialization-actions/jupyter/jupyter.sh', see output in: gs://abc/google-cloud-dataproc-metainfo/266175ef-e595-4732-b351-335837a3f30e/test-w-1/dataproc-initialization-script-2_output.
The init action output for test-m looks like the following:
++ /usr/share/google/get_metadata_value attributes/dataproc-role
+ readonly ROLE=Worker
+ ROLE=Worker
++ /usr/share/google/get_metadata_value attributes/INIT_ACTIONS_REPO
++ echo https://github.com/GoogleCloudPlatform/dataproc-initialization-actions.git
+ readonly INIT_ACTIONS_REPO=https://github.com/GoogleCloudPlatform/dataproc-initialization-actions.git
+ INIT_ACTIONS_REPO=https://github.com/GoogleCloudPlatform/dataproc-initialization-actions.git
++ /usr/share/google/get_metadata_value attributes/INIT_ACTIONS_BRANCH
++ echo master
+ readonly INIT_ACTIONS_BRANCH=master
+ INIT_ACTIONS_BRANCH=master
++ /usr/share/google/get_metadata_value attributes/JUPYTER_CONDA_CHANNELS
+ readonly JUPYTER_CONDA_CHANNELS=
+ JUPYTER_CONDA_CHANNELS=
++ /usr/share/google/get_metadata_value attributes/JUPYTER_CONDA_PACKAGES
+ readonly JUPYTER_CONDA_PACKAGES=numpy:scipy:pandas:scikit-learn
+ JUPYTER_CONDA_PACKAGES=numpy:scipy:pandas:scikit-learn
+ echo 'Cloning fresh dataproc-initialization-actions from repo https://github.com/GoogleCloudPlatform/dataproc-initialization-actions.git and branch master...'
Cloning fresh dataproc-initialization-actions from repo https://github.com/GoogleCloudPlatform/dataproc-initialization-actions.git and branch master...
+ git clone -b master --single-branch https://github.com/GoogleCloudPlatform/dataproc-initialization-actions.git
fatal: destination path 'dataproc-initialization-actions' already exists and is not an empty directory.
It looks like there is a clone step that prevents the installation from succeeding. How can I solve this? Any suggestion is appreciated, thank you.
This appears to be a bug in the init actions: the repository cannot be git-cloned twice. We will fix this.
In the meantime, you can try the Jupyter optional component together with the DataLab init action.
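For example, something along these lines (a sketch: the optional component replaces the jupyter.sh init action, so the JUPYTER_* metadata no longer applies, and depending on your gcloud version, --optional-components may only be available on the beta track):
gcloud beta dataproc clusters create test \
    --project $PROJECT \
    --bucket $BUCKET \
    --zone $ZONE \
    --optional-components=ANACONDA,JUPYTER \
    --initialization-actions gs://dataproc-initialization-actions/connectors/connectors.sh,gs://dataproc-initialization-actions/datalab/datalab.sh \
    --metadata gcs-connector-version=$GCS_CONNECTOR_VERSION \
    --metadata bigquery-connector-version=$BQ_CONNECTOR_VERSION \
    --scopes cloud-platform,bigquery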

Which module to install to fix "module QtQuick.Controls.Styles is not installed" error for Yocto?

I've successfully built a Yocto image for the RPi2 following this tutorial. I decided to expand the QML demo and try some of the Qt Quick styles (import QtQuick.Controls.Styles 1.4).
Here is the bb file for the image:
# Pulled from a mix of different images:
include recipes-core/images/rpi-basic-image.bb
# This image is a little more full featured, and includes wifi
# support, provided you have a raspberrypi3
inherit linux-raspberrypi-base
SUMMARY = "The minimal image that can run Qt5 applications"
LICENSE = "MIT"
# depend on bcm2835, which will bring in rpi-config
DEPENDS += "bcm2835-bootfiles"
MY_TOOLS = " \
    qtbase \
    qtbase-dev \
    qtbase-mkspecs \
    qtbase-plugins \
    qtbase-tools \
"
MY_PKGS = " \
    qt3d \
    qt3d-dev \
    qt3d-mkspecs \
    qtcharts \
    qtcharts-dev \
    qtcharts-mkspecs \
    qtconnectivity-dev \
    qtconnectivity-mkspecs \
    qtquickcontrols2 \
    qtquickcontrols2-dev \
    qtquickcontrols2-mkspecs \
    qtdeclarative \
    qtdeclarative-dev \
    qtdeclarative-mkspecs \
    qtgraphicaleffects \
    qtgraphicaleffects-dev \
"
MY_FEATURES = " \
    linux-firmware-bcm43430 \
    bluez5 \
    i2c-tools \
    python-smbus \
    bridge-utils \
    hostapd \
    dhcp-server \
    iptables \
    wpa-supplicant \
"
DISTRO_FEATURES_append += " bluez5 bluetooth wifi"
IMAGE_INSTALL_append = " \
    ${MY_TOOLS} \
    ${MY_PKGS} \
    ${MY_FEATURES} \
    basicquick \
"
# Qt > 5.7 doesn't ship with fonts, so these need to be added explicitly
IMAGE_INSTALL_append = " \
    ttf-dejavu-sans \
    ttf-dejavu-sans-mono \
    ttf-dejavu-sans-condensed \
    ttf-dejavu-serif \
    ttf-dejavu-serif-condensed \
    ttf-dejavu-common \
"
and here is the bb file for the demo itself:
SUMMARY = "Simple Qt5 Quick application"
SECTION = "examples"
LICENSE = "MIT"
LIC_FILES_CHKSUM = "file://${COMMON_LICENSE_DIR}/MIT;md5=0835ade698e0bcf8506ecda2f7b4f302"
# I want to make sure these get installed too.
DEPENDS += "qtbase qtdeclarative qtquickcontrols2"
SRCREV = "${AUTOREV}"
# My own GitLab repo
SRC_URI[sha256sum] = "f2dcc13cda523e9aa9e8e1db6752b3dfaf4b531bfc9bb8e272eb3bfc5974738a"
SRC_URI = "git://git@gitlab.com/some-repo.git;protocol=ssh"
S = "${WORKDIR}/git"
require recipes-qt/qt5/qt5.inc
do_install() {
    install -d ${D}${bindir}
    install -m 0755 BasicQuick ${D}${bindir}
}
Upon execution I got the error
QQmlApplicationEngine failed to load component
qrc:/main.qml:24 Type Page2 unavailable
qrc:/Page2.qml:4 module "QtQuick.Controls.Styles" is not installed
with Page2 being an item I have defined and used inside main.qml. The demo runs without any issues on my PC (custom-built Qt 5.9.1) but fails on the RPi2 due to the missing submodule.
Frankly, I've never used this submodule before (my custom-built Qt 5.9.1 has everything enabled) and I'm not sure what I need to include (if meta-qt5 even provides it) in order to be able to use it on the Yocto system.
The problem is a version mismatch in the Qt Quick Controls package.
You use version 1:
import QtQuick.Controls.Styles 1.4
but build version 2:
MY_PKGS = " \
    ...
    qtquickcontrols2 \
    ...
What you need to include in your image is qtquickcontrols.
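For example (a sketch; both package names are provided by meta-qt5, and the -qmlplugins package ships the QML modules, including the Styles):
MY_PKGS = " \
    ...
    qtquickcontrols \
    qtquickcontrols-qmlplugins \
    qtquickcontrols2 \
    ...
"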
You need to install qtquickcontrols-qmlplugins.
Just add this to build/local.conf:
PACKAGECONFIG_append_pn-qtbase = " accessibility"
PACKAGECONFIG_append_pn-qtquickcontrols = " accessibility"
IMAGE_INSTALL_append = " qtdeclarative-qmlplugins qtquickcontrols-qmlplugins"
Here is the original guide:
https://importgeek.wordpress.com/2018/07/17/module-qtquick-controls-is-not-installed/
