'ConcatenatedDoc2Vec' object has no attribute 'docvecs' - jupyter-notebook

I am a beginner in machine learning and am trying document embedding for a university project. I work with Google Colab and Jupyter Notebook (via Anaconda). The problem is that my code runs perfectly in Google Colab, but if I execute the same code in Jupyter Notebook (via Anaconda) I run into an error with the ConcatenatedDoc2Vec object.
With this function I build the vector features for a Classifier (e.g. Logistic Regression).
import numpy as np

def build_vectors(model, length, vector_size):
    # Collect one document vector per tag into a (length, vector_size) array
    vector = np.zeros((length, vector_size))
    for i in range(0, length):
        prefix = 'tag' + '_' + str(i)
        vector[i] = model.docvecs[prefix]
    return vector
I concatenate two Doc2Vec models (d2v_dm, d2v_dbow); both work perfectly through the whole code and have no problems with the function build_vectors():
d2v_combined = ConcatenatedDoc2Vec([d2v_dm, d2v_dbow])
But if I run the function build_vectors() with the concatenated model:
#Compute combined Vector size
d2v_combined_vector_size = d2v_dm.vector_size + d2v_dbow.vector_size
d2v_combined_vec= build_vectors(d2v_combined, len(X_tagged), d2v_combined_vector_size)
I receive this error (but only when I run this in Jupyter Notebook via Anaconda; the same code causes no problem in the notebook in Google Colab):
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [20], in <cell line: 4>()
1 #Compute combined Vector size
2 d2v_combined_vector_size = d2v_dm.vector_size + d2v_dbow.vector_size
----> 4 d2v_combined_vec= build_vectors(d2v_combined, len(X_tagged), d2v_combined_vector_size)
Input In [11], in build_vectors(model, length, vector_size)
3 for i in range(0, length):
4 prefix = 'tag' + '_' + str(i)
----> 5 vector[i] = model.docvecs[prefix]
6 return vector
AttributeError: 'ConcatenatedDoc2Vec' object has no attribute 'docvecs'
This is mysterious to me: the code works in Google Colab but not in Anaconda/Jupyter Notebook, and I did not find anything on the web to solve my problem.

If it's working one place, but not the other, you're probably using different versions of the relevant libraries – in this case, gensim.
Does the following show exactly the same version in both places?
import gensim
print(gensim.__version__)
If not, the most immediate workaround would be to make the place where it doesn't work match the place where it does, by force-installing the same explicit version – pip install gensim==VERSION (where VERSION is the target version) – then ensuring your notebook is restarted to see the change.
Beware, though, that unless starting from a fresh environment, this could introduce other library-version mismatches!
Other things to note:
Last I looked, Colab was using an over-4-year-old version of Gensim (3.6.0), despite more recent releases with many fixes and performance improvements. It's often best to stay at, or closer to, the latest versions of any key libraries used by your project; this answer describes how to trigger the installation of a more recent Gensim at Colab. (Though of course, the initial effect of that might be to cause the same breakage at Colab, since your code is adapted to the older version.)
In more-recent Gensim versions, the property formerly called docvecs is now called just dv - so some older code erroring this way may only need docvecs replaced with dv to work. (Other tips for migrating older code to the latest Gensim conventions are available at: https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4 )
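If you need the same code to run under both naming conventions, a tiny compatibility shim is one option. This is only a sketch, assuming the object you pass in exposes one of the two attribute names; get_docvec is a hypothetical helper, not a gensim API:

def get_docvec(model, tag):
    # gensim >= 4.0 renamed the docvecs property to dv; prefer the new name,
    # but fall back to the old one so the same code also runs on gensim 3.x
    if hasattr(model, 'dv'):
        return model.dv[tag]
    return model.docvecs[tag]

With that, the loop body in build_vectors() becomes vector[i] = get_docvec(model, prefix).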
It's unclear where you're pulling the ConcatenatedDoc2Vec class from. A class of that name exists in some Gensim demo/test code, as a very minimal shim that was at one time used in attempts to reproduce the results of the original "Paragraph Vector" (aka Doc2Vec) paper. But beware: that's not a usual way to use Doc2Vec, and the class of that name that I know of barely does anything outside its original narrow purpose.
Further, beware that as far as I know, no one has ever reproduced the full claimed performance of the two-kinds-of-doc-vectors-concatenated approach reported in that paper, even using the same data, described technique, and evaluation. The claimed results likely relied on some other undisclosed techniques, or on an error in the write-up. So if you're trying to mimic that, don't get too frustrated. And know that most uses of Doc2Vec just pick one mode.
If you have your own separate reasons for creating concatenated feature-vectors from multiple algorithms, you should probably write your own code for that, rather than relying on the peculiar two-modes-of-Doc2Vec code from that one experiment.
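As a sketch of that do-it-yourself route, reusing the question's models and tag scheme (assuming gensim 4.x naming, so .dv; substitute .docvecs on gensim 3.x):

import numpy as np

def build_combined_vectors(models, length):
    # Stack each model's vector for the same document side by side into one row
    width = sum(m.vector_size for m in models)
    combined = np.zeros((length, width))
    for i in range(length):
        tag = 'tag_' + str(i)
        combined[i] = np.concatenate([m.dv[tag] for m in models])
    return combined

# e.g. d2v_combined_vec = build_combined_vectors([d2v_dm, d2v_dbow], len(X_tagged))

This keeps the concatenation explicit and avoids depending on the demo-only ConcatenatedDoc2Vec class at all.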

Related

Using reticulate with targets

I'm having a weird issue where my target, which interfaces with a slightly customized Python module (installed with pip install --editable) through reticulate, gives different results when called from an interactive R session than when targets is started from the command line directly, even when I make sure the other arguments to tar_make are identical (callr_function = NULL, which I use for interactive debugging). The function is deterministic and should return exactly the same result, but doesn't.
It's tricky to provide a reproducible example, but if truly necessary I'll invest the required time in it. I'd like tips on how to debug this and identify the exact issue. I have already safeguarded against potential pointer issues: the Python object is not passed around between different targets/environments (anymore); rather, it is used immediately to compute the result of interest. I also checked that the same Python version is being used by printing the result of reticulate::py_config() to screen, and I verified both approaches use the same version of the customized module.
Thanks in advance!

How to create a sys image that has multiple packages with their precompiled functions cached in julia?

Let's say we have an array of package symbols, packages::Vector{Symbol} = [...], and we want to create a sysimage using PackageCompiler.jl. We could simply use
using PackageCompiler
create_sysimage(packages; incremental = false, sysimage_path = "custom_sys.dll")
but without a precompile_execution_file, this isn't going to be worth it.
Note: sysimage_path = "custom_sys.so" on Linux and "custom_sys.dylib" on macOS...
For the precompile_execution_file, I thought running the test for each package might do it so I did something like this:
precompilation.jl
packages = [...]
@assert typeof(packages) == Vector{Symbol}
import Pkg
m = Module()
try Pkg.test.(Base.require.(m, packages)) catch ; end
The try catch is for when some tests give an error and we don't want it to fail.
Then, executing the following in a Julia session,
using PackageCompiler
import Pkg
packages = [...]
Pkg.add.(String.(packages))
Pkg.update()
Pkg.build.(String.(packages))
create_sysimage(packages; incremental = false,
                sysimage_path = "custom_sys.dll",
                precompile_execution_file = "precompilation.jl")
produced a sysimage dynamic library which loaded without a problem. When I did using Makie, there was no delay, so that part is fine; but when I did some plotting with Makie, there was still the first-time plot delay, so I am guessing the precompilation script didn't do what I thought it would.
Also, when using Tab to get suggestions in the REPL, it would freeze the first time, but I am guessing this is an expected side effect.
There are a few problems with your precompilation.jl script that make the tests throw errors, which you don't see because of the try...catch.
But, although running the tests for each package might be a good idea to exercise precompilation, there are deeper reasons why I don't think it can work so simply:
Pkg.test spawns a new process in which tests actually run. I don't think that PackageCompiler can see what happens in this separate process.
To circumvent that, you might want to simply include() every package's test/runtests.jl file. But this is likely to fail too, because of missing test-specific dependencies.
So I would say that, for this to work reliably and systematically for all packages, you'd have to re-implement (or re-use, if you can) some of the internal logic of Pkg.test in order to add all test-specific dependencies to the current environment.
That being said, some packages have ready-to-use precompilation scripts helping to do just this. This is the case for Makie, which suggests in its documentation to use the following file to build system images:
joinpath(pkgdir(Makie), "test", "test_for_precompile.jl")

How do I update scripts from OpenMx 1 to OpenMx 2?

I have an example OpenMx script written a few years ago to do twin modelling.
It was written for OpenMx version 1.0 (script linked here).
When I run it, there are some warnings about updating fit functions and objectives. How should I update it to use OpenMx 2.0 fit function calls?
There are a small number of changes from OpenMx 1.0 to 2.0 and higher. Nearly all scripts will run fine, but some pre-2012 scripts will need updating, or will gain features, if you update them for OpenMx 2.x.
An example is referenced here
The user had hassles with:
1. No path to the helper functions
This is a more general robustness issue for example R code: it is better to include web URLs rather than disk-based file paths.
source("http://www.vipbg.vcu.edu/~vipbg/Tc24/GenEpiHelperFunctions.R")
A better solution is CRAN-based helper packages like umx, which are easier to keep up to date and to access.
2. Old-style objectives (instead of expectations and fit functions)
Calls like this one are deprecated:
objMZ<- mxFIMLObjective(covariance="expCovMZ", means="expMean", dimnames=selVars)
It's easy to update these across a stack of scripts, replacing mxFIMLObjective with mxExpectationNormal plus a call to mxFitFunctionML.
In addition, in old-style multiple-group objectives like this:
minus2ll <- mxAlgebra( expression = MZ.objective + DZ.objective, name="m2LL")
obj <- mxAlgebraObjective("m2LL")
you should replace mxAlgebraObjective with mxFitFunctionAlgebra.
However, OpenMx 2 has a neat Multigroup function which handles this in one line and enables identification checks, reference model generation etc.
So just replace the whole thing with (for example):
mxFitFunctionMultigroup(c("MZ", "DZ"))

Trouble of understanding the concept of packages in Common Lisp

A while ago I started learning Common Lisp, but now I have hit my first real stumbling block: understanding a concept. I started to change my learning projects from single-file sources to packages. Everything so far went as expected, but then I stumbled upon one file, a sudoku game I coded, that behaves differently than I expected. You can find it here: https://github.com/Silberbogen/cl-sudoku
When I start (spiele-sudoku) after switching into the package via (in-package :cl-sudoku), everything works fine, but when I start it via (cl-sudoku:spiele-sudoku), only my input of coordinates is accepted, while any other input seems not to be interpreted.
What concept am I missing, so that I can start the game via (cl-sudoku:spiele-sudoku)?
You use read-from-string to read your input. That will intern any word encountered as a symbol into the current package.
In your main function, you use case to compare with symbols, but those are interned into the cl-sudoku package. So, if your current package is cl-sudoku, it will work, otherwise not.
You should not use read or read-from-string to parse user input (if you absolutely must, at least bind *read-eval* to nil). Instead, call intern yourself (possibly in combination with string-upcase) to create symbols in the right package. If you want to use package-independent symbols, intern them into the KEYWORD package, so that you can do case on keywords.
It might be helpful to use ecase or ccase, or at least log some debug information on invalid input.

R2PPT crashes R; are there alternatives to R2PPT?

I am attempting to automate the insertion of JPEG images into PowerPoint. I already have a macro for that, but using R would be infinitely better for my purposes.
The package R2PPT should do this, I understand. However, I cannot use it. For example, when I try to use PPT.Open, I understand I can do it two different ways, by calling method = "rcom" or method = "RDCOMClient". Using the latter, R always crashes, sending an error report to Windows. Using the former, it tells me I need to install statconnDCOM, before giving the error:
Error in PPT.Open(x) : attempt to apply non-function.
I cannot install statconnDCOM freely, as I wouldn't call this work non-commercial. So if there isn't a way around this issue, are there at least some free alternatives to R2PPT, so that I can save several hours of manual work with a simple R script? If there is a way for me to use R2PPT, that would be ideal.
Thanks!
Edit:
I'm using R version 2.15 and downloaded the most recent version of R2PPT. Powerpoint is 2007.
Do you have administrative privileges on this machine?
There is an issue with the RDCOMClient package: it needs permission to write the file rdcom.err in the root of drive C:. If you don't have privileges to write to C:\, there is a rather cumbersome workaround:
1. Close R.
2. Create a "c:\temp" folder if it doesn't exist.
3. Locate the file rdcomclient.dll on your hard drive. It is usually placed in \R\library\RDCOMClient\libs\i386\ and in \R\library\RDCOMClient\libs\x64\ (you need to patch the file that corresponds to your Windows version, 32-bit or 64-bit). It is recommended to make a backup copy of these files before patching.
4. Open rdcomclient.dll in a text editor (Notepad++, for example: http://notepad-plus-plus.org/).
5. Find the string c:\rdcom.err in the file; it occurs only once.
6. Go into overwrite mode (usually by pressing the "Ins" key). It is very important that the new path has the same number of characters as the original one. Type C:\temp\e.rr instead of c:\rdcom.err.
7. Save the file.
Now all should work fine.
Arguably not an answer, but have you looked at using Sweave/knitr to render your presentations in LaTeX using something like Beamer? (As discussed on slide 17 here.)
It wouldn't help with getting JPEGs into a PowerPoint, but it would certainly make putting R output (numerical or graphical) into a presentation much easier!
Edit: if you want to use knitr (which I recommend), here's another reference.
