I am new to Stack Overflow and Python, so please bear with me.
I am trying to run a Latent Dirichlet Allocation (LDA) analysis on a text corpus with the gensim package in Python, using the PyCharm editor. I prepared the corpus in R and exported it to a CSV file using this R command:
write.csv(testdf, "C://...//test.csv", fileEncoding = "utf-8")
This creates the following CSV structure (though with much longer, already preprocessed texts):
,"datetimestamp","id","origin","text"
1,"1960-01-01","id_1","Newspaper1","Test text one"
2,"1960-01-02","id_2","Newspaper1","Another text"
3,"1960-01-03","id_3","Newspaper1","Yet another text"
4,"1960-01-04","id_4","Newspaper2","Four Five Six"
5,"1960-01-05","id_5","Newspaper2","Alpha Bravo Charly"
6,"1960-01-06","id_6","Newspaper2","Singing Dancing Laughing"
I then run the following minimal Python code (based on the gensim tutorials) to perform a simple LDA analysis:
import gensim
from gensim import corpora, models, similarities, parsing
import pandas as pd
from six import iteritems
import os
import pyLDAvis.gensim
class MyCorpus(object):
    def __iter__(self):
        for row in pd.read_csv('//mpifg.local/dfs/home/lu/Meine Daten/Imagined Futures and Greek State Bonds/Topic Modelling/Python/test.csv', index_col=False, header=0, encoding='utf-8')['text']:
            # assume there's one document per line, tokens separated by whitespace
            yield dictionary.doc2bow(row.split())
if __name__ == '__main__':
    dictionary = corpora.Dictionary(row.split() for row in pd.read_csv(
        '//.../test.csv', index_col=False, encoding='utf-8')['text'])
    print(dictionary)
    dictionary.save(
        '//.../greekdict.dict')  # store the dictionary, for future reference

    ## create an mmCorpus
    corpora.MmCorpus.serialize('//.../greekcorpus.mm', MyCorpus())
    corpus = corpora.MmCorpus('//.../greekcorpus.mm')
    dictionary = corpora.Dictionary.load('//.../greekdict.dict')
    corpus = corpora.MmCorpus('//.../greekcorpus.mm')

    # train model
    lda = gensim.models.ldamodel.LdaModel(corpus=corpus, id2word=dictionary, num_topics=50, iterations=1000)
I get the following messages and the code exits:
...\Python\venv\lib\site-packages\setuptools-28.8.0-py3.6.egg\pkg_resources\_vendor\pyparsing.py:832: DeprecationWarning: invalid escape sequence \d
...\Python\venv\lib\site-packages\setuptools-28.8.0-py3.6.egg\pkg_resources\_vendor\pyparsing.py:2736: DeprecationWarning: invalid escape sequence \d
...\Python\venv\lib\site-packages\setuptools-28.8.0-py3.6.egg\pkg_resources\_vendor\pyparsing.py:2914: DeprecationWarning: invalid escape sequence \g
...\Python\venv\lib\site-packages\pyLDAvis\_prepare.py:387: DeprecationWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing
I cannot find any solution and, to be honest, have no clue where exactly the problem comes from. I spent hours making sure that the encoding of the CSV is UTF-8 and that it is exported (from R) and imported (in Python) correctly.
What am I doing wrong, or where else could I look? Cheers!
A DeprecationWarning is exactly that: a warning that a feature is deprecated, meant to prompt the user to switch to other functionality in order to stay compatible in the future. So in your case I would just watch for updates of the libraries you use.
Starting with the last warning: it looks like it originates from pandas and has been logged against pyLDAvis here.
The remaining ones come from the pyparsing module, but it does not seem that you are importing it explicitly. Maybe one of the libraries you use depends on it and calls some relatively old, deprecated functionality. To eradicate the warnings, I would first check whether upgrading helps. Good luck!
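Since the tracebacks point at setuptools' vendored pyparsing and at pyLDAvis, a first thing to try (a sketch, not a guaranteed fix) is upgrading those packages:
pip install --upgrade setuptools pyLDAvis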
Try using this to suppress the warnings:
import warnings
warnings.filterwarnings("ignore")
pyLDAvis.enable_notebook()
I have collected six FASTQ files from the same mock sample and merged them using gzip in Linux for further use with Kraken2. The output file from Kraken2 (.report) was converted to .biom format using kraken-biom in Linux. When I then try to import the .biom file into R using import_biom, I receive the following message:
Error in validObject(.Object) : invalid class “phyloseq” object:
Component sample names do not match. Try sample_names()
I have opened the .biom file and can see only one sample name (the one I gave the output file when merging with gzip). I tried to use sample_names(), but can't, since the .biom file is not loaded into R. Does anyone know why the sample names do not match? Since I merged the files into one, should it not be a single sample name?
Edit: When I run Kraken2 on the six FASTQ files without merging them and then use kraken-biom, importing the .biom file into R works.
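For reference, a minimal sketch of the two workflows described above; the file names, database path, and exact flags are illustrative assumptions, not taken from the question:
# Merged workflow (the resulting .biom then fails to import into R):
cat sample1.fastq.gz sample2.fastq.gz > merged.fastq.gz   # gzip streams concatenate cleanly
kraken2 --db $DBNAME --gzip-compressed --report merged.report merged.fastq.gz
kraken-biom merged.report -o merged.biom
# Per-sample workflow (the resulting .biom imports fine):
kraken-biom sample1.report sample2.report -o table.biom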
How many words do you need in word embedding variables to compute semantic similarity with the R package text? I'm trying to run:
library(text)
WEhello<-textEmbed("hello")
WEgoodbye<-textEmbed("goodbye")
textSimilarity(WEhello, WEgoodbye)
But I get this error:
Error in `dplyr::select()`:
! `select()` doesn't handle lists.
To get this to work, you have to select only the word embeddings (and avoid also including the $singlewords_we). Try this:
textSimilarity(WEhello$x, WEgoodbye$x)
I am trying to convert a TensorFlow Object Detection model (ssd-mobilenet-v2-fpnlite, from the TensorFlow 2 Detection Model Zoo) to TFLite. First, I train the model using model_main_tf2.py, and then I use export_tflite_graph_tf2.py to export a SavedModel (.pb). However, when it comes to converting the .pb file to .tflite, it throws this error:
OSError: SavedModel file does not exist at: /content/gdrive/My Drive/models/research/object_detection/fine_tuned_model/saved_model/saved_model.pb/{saved_model.pbtxt|saved_model.pb}
To convert the .pb file I used:
import os
import tensorflow as tf
SAVED_MODEL_PATH = os.path.join(os.getcwd(),'object_detection', 'fine_tuned_model', 'saved_model', 'saved_model.pb')
# SAVED_MODEL_PATH: '/content/gdrive/My Drive/models/research/object_detection/exported_model/saved_model/saved_model.pb'
converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_PATH)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.experimental_new_converter = True
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()
open("detect.tflite", "wb").write(tflite_model)
or "tflite_convert" from command line, but with the same error. I also tried to run it with the latest tf-nightly version as it suggests here, but the outcome is the same. I tried to pass the path with various ways, it seems like the .pd is not well written (not the right file). Is there a way to manage to convert the model to tflite so as to implement it to android? Thank you!
Your saved_model path should be "/content/gdrive/My Drive/models/research/object_detection/fine_tuned_model/saved_model/". The converter expects the folder, not a file inside that folder.
For a quick test, try typing in the terminal:
tflite_convert \
--saved_model_dir="path to saved_folder" \
--output_file="path to tflite file u want to save"
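In the Python snippet from the question, that means dropping the trailing saved_model.pb from the path; a minimal sketch (the Drive path is taken from the question):
import tensorflow as tf

# Point the converter at the SavedModel directory, not at the .pb file inside it
SAVED_MODEL_DIR = '/content/gdrive/My Drive/models/research/object_detection/fine_tuned_model/saved_model'
converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)
tflite_model = converter.convert()

with open('detect.tflite', 'wb') as f:
    f.write(tflite_model)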
I don't have enough reputation to just comment, but the problem here seems to be your SAVED_MODEL_PATH.
You could try to hardcode the path and remove the .pb file from it. I don't remember exactly what the trick is here, but it's definitely due to the path.
I'm using the R package 'googleLanguageR' to transcribe various 30-second audio files (over 500, so I want to automate this). I've followed all the steps in the googleLanguageR tutorials, got my key, and authenticated through R.
I'm able to transcribe the test audio (.wav) that comes with the package, but whenever I apply the same function to my files (.mp3), I get NULL for both transcript and timings.
This is the code provided in tutorials:
# get the sample source file
test_audio <- system.file("woman1_wb.wav", package = "googleLanguageR")
gl_speech(test_audio)$transcript
If I use the same code for my file, I get an empty element, so I've tried the following, with no luck:
test_audio <- "/audio_location/filename.mp3"
gl_speech(test_audio)$transcript
Has anybody encountered a similar problem with this package, or does anyone have any suspicion as to why it produces NULL transcripts?
I've trained a model with Azure Custom Vision and downloaded the TensorFlow files for Android
(see: https://learn.microsoft.com/en-au/azure/cognitive-services/custom-vision-service/export-your-model). How can I use this with tensorflow.js?
I need a model (.pb file) and weights (.json file). However, Azure gives me a .pb file and a text file with tags.
From my research I also understand that there are different kinds of .pb files, but I can't find out which type Azure Custom Vision exports.
I found the tfjs converter. It converts a TensorFlow SavedModel (is the *.pb file from Azure a SavedModel?) or a Keras model to a web-friendly format. However, I need to fill in "output_node_names" (how do I get these?). I'm also not 100% sure whether my .pb file for Android counts as a "tf_saved_model".
I hope someone has a tip or a starting point.
Just parroting what I said here to save you a click. I do hope that the option to export directly to tfjs is available soon.
These are the steps I did to get an exported TensorFlow model working for me:
Replace PadV2 operations with Pad. This Python function should do it. input_filepath is the path to the .pb model file, and output_filepath is the full path of the updated .pb file that will be created.
import tensorflow as tf

def ReplacePadV2(input_filepath, output_filepath):
    graph_def = tf.GraphDef()
    with open(input_filepath, 'rb') as f:
        graph_def.ParseFromString(f.read())

    for node in graph_def.node:
        if node.op == 'PadV2':
            node.op = 'Pad'
            del node.input[-1]  # Pad has no constant_values input, so drop PadV2's third input
            print("Replaced PadV2 node: {}".format(node.name))

    with open(output_filepath, 'wb') as f:
        f.write(graph_def.SerializeToString())
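For example, to write the patched graph next to the original (the file names are hypothetical):
ReplacePadV2('model.pb', 'model_pad_fixed.pb')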
Install tensorflowjs 0.8.6 or earlier. Converting frozen models is deprecated in later versions.
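For example, pinned to the version mentioned above:
pip install tensorflowjs==0.8.6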
When calling the converter, set --input_format to tf_frozen_model and set output_node_names to model_outputs. This is the command I used:
tensorflowjs_converter --input_format=tf_frozen_model --output_json=true --output_node_names='model_outputs' --saved_model_tags=serve path\to\modified\model.pb folder\to\save\converted\output
Ideally, tf.loadGraphModel('path/to/converted/model.json') should now work (tested for tfjs 1.0.0 and above).
Partial answer:
Trying to achieve the same thing; here is the start of an answer that makes use of the output_node_names:
tensorflowjs_converter --input_format=tf_frozen_model --output_node_names='model_outputs' model.pb web_model
I am not yet sure how to incorporate this into the same code - do you have anything, #Kasper Kamperman?