Training the SyntaxNet POS Tagger in SyntaxNet - syntaxnet

I had some difficulties training the SyntaxNet POS tagger and parser, and I found a good solution, which I describe in the Answers section. If you are stuck on one of the following problems, this documentation will really help you:
1. The training, testing, and tuning data sets introduced by Universal Dependencies come in .conllu format, and I did not know how to convert them to .conll files. Even after I found conllu-formconvert.py and conllu_to_conllx.pl, I still had no clue how to use them. If you have a problem like this, the documentation has a Python file named convert.py, which is called in the main body of train.sh and train_p.sh to convert the downloaded datasets into files readable by SyntaxNet.
2. Whenever I ran bazel test on parser_trainer_test.sh (as I was told to do in a Stack Overflow question and answer), it failed and gave me this error in test.log: path to save model cannot be found: --model_path=$TMP_DIR/brain_parser/greedy/$PARAMS/model
The documentation splits training of the POS tagger and the parser and shows how to use different directories in parser_trainer and parser_eval. Even if you don't want to use the document itself, you can update your own files based on it.
3. For me, training the parser took one day, so don't panic; it takes time "if you do not use gpu server", as dsindex said.

I got an answer on GitHub from dsindex, and I found it very useful.
The documentation at https://github.com/dsindex/syntaxnet includes:
convert_corpus
train_pos_tagger
preprocess_with_tagger
As dsindex said, and I quote: "I thought you want to train pos-tagger. if then, run ./train.sh"

Related

Rendering a Quarto blog post trips an error when reading in a brms file object

First, I'll apologize for not having a fuller reproducible example, but I'm not entirely sure how to go about that given the various layers to the question/problem.
I'm moving a blog over from Blogdown to a new Quarto-based website and blog. I have three saved brms object files that I'm trying to read into a code chunk in one of the posts. The code chunks work fine when I run them manually, but when I try to render the blog post I get the following error:
Quitting from lines 75-86 (tables-modelsummary-brms.qmd)
Error in stri_replace_all_charclass(str, "[\\u0020\\r\\n\\t]", " ", merge = TRUE) :
invalid UTF-8 byte sequence detected; try calling stri_enc_toutf8()
Calls: .main ... stri_trim -> stri_trim_both -> stri_replace_all_charclass
Execution halted
I've checked the primary data frame contained in the brms model object, and all of the character vectors there are valid UTF-8 vectors. These model objects can be quite large, so it's possible I'm missing something buried deep within the model object, but so far nothing is apparent.
I tried re-running the models to ensure that the model objects' files weren't corrupted, and also to make sure that the encoding issue wasn't somehow introduced the last time they were run, which would have been on a Windows machine with a different version of brms.
I've also moved the brms files around to different directories to see if it's a file path issue. The same error comes up regardless of whether the files are in the same folder as the blog post qmd file or in a parent directory I use for storing site data.
I've also migrated several other posts to the new Quarto site successfully, and some of them also contain R code, but it's all rendering without a problem.
Finally, I don't quite understand how to implement the alternative function suggested in the error message either.
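For what it's worth, here is a minimal sketch of how the suggested stri_enc_toutf8() call could be applied, assuming the objects were saved as .rds files and that the invalid bytes live in the model's data slot; the object name fit and the file name are placeholders:
library(stringi)

# hypothetical: load one saved brms object and force every character column
# in its data slot to valid UTF-8; "fit" and the file name are placeholders
fit <- readRDS("my_model.rds")
fit$data[] <- lapply(fit$data, function(col) {
  if (is.character(col)) stri_enc_toutf8(col, validate = TRUE) else col
})
If the invalid bytes live somewhere else in the object (attributes, factor levels, the underlying stanfit), this won't reach them, but it shows what calling stri_enc_toutf8() means in practice.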

Parsing failure when working with the rethomics (circadian behavior) package in R

I am having issues both with my data and with the test data provided in the rethomics tutorial (https://rethomics.github.io/damr.html#linking), specifically at the linking step. When I run the following code, I get the following error.
Code:
list.files(DATA_DIR, pattern= "*.txt|*.csv")
setwd(DATA_DIR)
metadata <- link_dam_metadata(metadata, result_dir = DATA_DIR)
metadata
Output:
row col expected  actual
  1  -- date like 7/1/2017 8:00
Error: 1 parsing failure
I also received a similar error when attempting to use my own data. This is coming directly from the tutorial, so is there something wrong with the tutorial code, or is it something that I have not done correctly? Thank you in advance.
This does not answer your question per se, but I thought I'd throw it out there in case it helps someone who stumbles across this.
Here's how you can create a behavr table in memory, without the disk-based link and load steps: use the behavr() function described at https://rdrr.io/cran/behavr/man/behavr.html
I have some very simple data my PI wanted plotted as an actogram using ggetho (two variables from a single piece of equipment over time, so only "one animal"). I wanted to do this in memory, where I already had everything, but all the tutorials and discussions either started from a ready-made behavr table or used the link_dam_metadata(), load_dam() kind of stuff; nobody I saw talked about how to do it in memory.
I luckily stumbled across this function just as I was getting close to getting it right the disk-based way....
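For concreteness, a minimal sketch of the in-memory route, with made-up readings for a single "animal"; the id string and variable names are arbitrary:
library(data.table)
library(behavr)

# metadata and data must both be data.tables keyed by the same id
metadata <- data.table(id = "device01|01")
setkey(metadata, id)

dt <- data.table(
  id = "device01|01",
  t = seq(from = 0, to = days(1), by = mins(1)),  # time in seconds
  var1 = rnorm(1441),  # made-up readings for the two variables
  var2 = rnorm(1441)
)
setkey(dt, id)

bdt <- behavr(dt, metadata)  # ready for ggetho(), no link/load steps needed
summary(bdt)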

R function example requires nonstandard dataset, doesn't jive with devtools

I've been struggling to get the example code for a function to pass devtools::check(), because the data required for the example is not in .RData format. Unfortunately, the way the function is written, an .RData file cannot be loaded and work properly: the function takes in a list of filenames and performs an action on them collectively.
Therefore, example code must be written in a way that check() is able to access a folder and list the files therein. Using the function on my own computer, I input
setwd("/Users/mydirectory")
myfilelist <- list.files(pattern = "mypattern")
output <- myfunction(myfilelist, ...)
and everything is groovy. But this doesn't work with devtools because @examples doesn't know how to access subdirectories on my computer. check() pulls the following error:
base::assign(".ptime", proc.time(), pos = "CheckExEnv")
This is almost undoubtedly because check() doesn't know where to look for the data. I'd like it to look to GitHub to access the online data repository.
I found this brief conversation regarding a similar roxygen-related problem, but overall I haven't seen much advice on how to work through it. I think that perhaps this issue starts to get a little closer to my situation, but here the user failed to export a function, rather than bind data to an example.
I don't think I'm looking for a pull function (though the end goal is to pull data...). Does anyone have advice on moving forward? I have the data stored in the inst/extdata folder on GitHub, so while I don't really have something reproducible for you all, I'm hoping you might have some thoughts.
Edit: I worked around the problem using @alistaire's advice below, pointing roxygen to the package directory (updated on GitHub) and also using \dontrun{}. However, I am leaving the question unanswered for now because I think accessing data stored on GitHub should still be somehow possible, and we haven't yet addressed that.
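For reference, a minimal sketch of that workaround, assuming the example files ship under inst/extdata; the package name and pattern are placeholders:
# inside @examples: locate files shipped with the installed package instead
# of a hard-coded local path ("MyPackage" and "mypattern" are placeholders)
extdata_dir <- system.file("extdata", package = "MyPackage")
myfilelist <- list.files(extdata_dir, pattern = "mypattern", full.names = TRUE)
output <- myfunction(myfilelist)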

How to put an R script into a package

I'm writing my first R package and have made a successful build with documentation using roxygen2 and added data sets.
However, I would also like to ship an example script showing how I use the functions in the R package, but I don't know where to put it.
Let's say I have created MyPackage. I have put my function scripts in the /R folder. Let's say I have:
foo1.R
foo2.R
foo3.R
Somewhere I'd also like to put a script with my workflow. Let's say I have a file, MyWorkflow.R:
library(MyPackage)
load(file='inData.R') # Loads indata variables A, B and C
X=foo1(A)
Y=foo2(X,B)
Z=foo3(Y,C)
Can I do this? If so, where do I put it? Is it an OK procedure - or generally frowned upon?
Any help or thoughts are appreciated.
Thanks.
Carl
Edit:
I looked at the link on demo/ and exec/, but didn't understand the exec/ folder thing. Grateful if you could clarify/exemplify/point to good uses of it...
If I understand correctly, I'm not looking for an example or demo/, since the script won't necessarily be executable without tweaking by the user (e.g. to provide input data or paths). I "just" want to add an example script showing how I work with these functions.
I realise I should probably dive into the world of vignettes, but have difficulty in finding the time/oomph/energy to do so.
I also saw that there's the inst/ folder. Could you shed some light on the different uses of these options, or hint at good examples of where they've been used? (I often find examples more informative than reading an explanatory text that's above my level - I often get the feeling of being like a dog looking at a ceiling fan ;)
Will add info to the GitHub README. Thx for good suggestion!
Created inst/Workflow_Example/workflow.R. Upon build & reload, a Workflow_Example folder was created in the library with the workflow.R script in it.
In combination with an explanatory remark in the README, this looks like what I was after. Problem solved, or am I not seeing something obvious? Am I, e.g., violating conventions/conduct/good practice?
You could either put it in demo/ or exec/ depending on the format of the script. See here for more details. I would mention the workflow and where it lives in the README regardless, and if you host your code on Github, you could create a wiki to describe the workflow and place the script there. This would be similar to what nrussell has mentioned in a comment above.
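Building on the inst/Workflow_Example layout above, a minimal sketch of how a user would reach the script once the package is installed (the package name is a placeholder):
# files under inst/ are installed at the package root, so the shipped script
# can be located with system.file() after installation
workflow_path <- system.file("Workflow_Example", "workflow.R",
                             package = "MyPackage")
file.show(workflow_path)  # view it; source() it only after tweaking paths/data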

How can I remove a lock from a linked environment in R?

I tried to run a Bioconductor package function (truncateCDF) that modifies an environment (hgu133plus2cdf) to remove unwanted probesets from an Affymetrix chip.
Everything went fine until I got the following message (translated from French):
> assign(cdfname, cdf.env, env=CDF.env)
Error in assign(cdfname, cdf.env, env = CDF.env) :
impossible to change the value of a locked link for 'hgu133plus2cdf'
The assign call is the final step of the code; it saves the changes made in CDF.env back to the original environment (hgu133plus2cdf) before it is used in analyses of Affymetrix chip results, so it is essential.
My question: what is this locked link to the hgu133plus2cdf environment, and how can I bypass it?
The author of this package ran it successfully around 2005, so I suppose the lock is a feature introduced in R since then (probably not related to Bioconductor, since assign is a base R function, which is why I am asking this question on this forum instead of Biostar).
I tried to read the docs, but I am overwhelmed when it comes to environments.
Thanks in advance for any help.
I don't think truncateCDF is from a Bioconductor package; at least, it is not current. This sounds like this post and the next two from the same thread on the Bioconductor mailing list. It is a result of a change in R: packages now have not-easily-modified name spaces, and these are implemented by locking the environment in which name space symbols are defined. Removing probes is not an essential part of a typical microarray work flow. Please ask on the Bioconductor mailing list (no subscription required) if you'd like more help.
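For illustration, a minimal sketch of working around such a lock, assuming CDF.env really is the locked environment the error refers to; unlocking a package's bindings at runtime is exactly what name spaces are meant to prevent, so treat it as a last resort:
# unlock the binding, make the assignment, then re-lock it
if (bindingIsLocked(cdfname, CDF.env)) {
  unlockBinding(cdfname, CDF.env)
}
assign(cdfname, cdf.env, envir = CDF.env)
lockBinding(cdfname, CDF.env)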
