Import wav file in TensorFlow 2

Using Python 3.7 and TensorFlow 2.0, I'm having a hard time reading wav files from the UrbanSounds dataset. This question and answer are helpful because they explain that the input has to be a string tensor, but decode_wav seems to have trouble getting past the initial metadata encoded in the file and reaching the actual audio data. Do I have to preprocess the string before I can load it as a float32 tensor? I already had to preprocess the data by converting it from 24-bit wav to 16-bit wav, so the data-input pipeline is turning out to be much more cumbersome than I expected. That required conversion is particularly frustrating. Here's what I'm trying so far:
import tensorflow as tf # this is TensorFlow 2.0
path_to_wav_file = '/mnt/d/Code/UrbanSounds/audio/fold1/101415-3-0-2.wav'
# Turn the wav file into a string tensor
input_data = tf.io.read_file(path_to_wav_file)
# Convert the string tensor to a float32 tensor
audio, sampling_rate = tf.audio.decode_wav(input_data)
This is the error I get at the last step:
2019-10-08 20:56:09.124254: W tensorflow/core/framework/op_kernel.cc:1546] OP_REQUIRES failed at decode_wav_op.cc:55 : Invalid argument: Header mismatch: Expected fmt but found junk
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/ops/gen_audio_ops.py", line 216, in decode_wav
_six.raise_from(_core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Header mismatch: Expected fmt but found junk [Op:DecodeWav]
And here is the beginning of that string tensor. I'm no expert on wav files, but I think the part after "fmt" is where the actual audio data starts. Before that I think it's all metadata about the file.
input_data.numpy()[:70]
b'RIFFhb\x05\x00WAVEjunk\x1c\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00fmt \x10\x00\x00\x00\x01\x00\x01\x00D\xac\x00\x00\x88X\x01\x00\x02\x00'
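For reference, here is a rough sketch (assuming the standard RIFF layout of a 4-byte chunk ID followed by a 4-byte little-endian size) for walking the chunks in those bytes; it should show the junk chunk sitting in front of fmt:
import struct

# Rough sketch: walk the RIFF chunks in the raw bytes (assumes standard
# 8-byte chunk headers: 4-byte ID + 4-byte little-endian size).
raw = input_data.numpy()
pos = 12                                  # skip 'RIFF' + size + 'WAVE'
while pos + 8 <= len(raw):
    chunk_id = raw[pos:pos + 4]
    size = struct.unpack('<I', raw[pos + 4:pos + 8])[0]
    print(chunk_id, size)
    pos += 8 + size + (size % 2)          # chunk data is padded to an even length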

It seems like your error comes from TensorFlow expecting the fmt chunk right at the beginning of the file, while your file has a junk chunk there first.
TensorFlow's WAV-parsing code can be found here: https://github.com/tensorflow/tensorflow/blob/c9cd1784bf287543d89593ca1432170cdbf694de/tensorflow/core/lib/wav/wav_io.cc#L225
There's also an open issue, awaiting a response from the TensorFlow team, which covers roughly the same error you've reported:
https://github.com/tensorflow/tensorflow/issues/32382
Other libraries just skip the junk chunk, which is why the file works with them.
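If you want to stay inside TensorFlow, one option (an untested sketch, assuming a standard RIFF layout; strip_junk_chunk is just an illustrative helper, not part of any library) is to drop everything except the fmt and data chunks before handing the bytes to decode_wav:
import struct
import tensorflow as tf

def strip_junk_chunk(wav_bytes):
    # Keep only the chunks decode_wav needs ('fmt ' and 'data') and rebuild the RIFF header.
    assert wav_bytes[:4] == b'RIFF' and wav_bytes[8:12] == b'WAVE'
    pos, kept = 12, []
    while pos + 8 <= len(wav_bytes):
        chunk_id = wav_bytes[pos:pos + 4]
        size = struct.unpack('<I', wav_bytes[pos + 4:pos + 8])[0]
        end = pos + 8 + size + (size % 2)      # chunks are padded to an even length
        if chunk_id in (b'fmt ', b'data'):
            kept.append(wav_bytes[pos:end])
        pos = end
    body = b'WAVE' + b''.join(kept)
    return b'RIFF' + struct.pack('<I', len(body)) + body

with open(path_to_wav_file, 'rb') as f:
    cleaned = strip_junk_chunk(f.read())
audio, sampling_rate = tf.audio.decode_wav(cleaned)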

Judging by the header dump, the file is mono 16-bit PCM, so the failure comes from the junk chunk in the header rather than from the audio itself. In your case you can try using scipy, which reads the file fine:
from scipy.io import wavfile as wav
sampling_rate, data = wav.read('101415-3-0-2.wav')
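scipy returns the raw integer samples, so if you still need the same kind of float32 tensor that decode_wav would give you, a small (hedged) conversion for 16-bit PCM might look like this:
import numpy as np
import tensorflow as tf
from scipy.io import wavfile as wav

sampling_rate, data = wav.read('101415-3-0-2.wav')
# decode_wav returns float32 samples in [-1.0, 1.0]; mimic that for int16 data
audio = tf.convert_to_tensor(data.astype(np.float32) / 32768.0)
if audio.shape.ndims == 1:                # decode_wav returns shape [samples, channels]
    audio = tf.expand_dims(audio, axis=-1)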

Related

How to convert Tensorflow Object Detection API model to TFLite?

I am trying to convert a TensorFlow Object Detection model (ssd-mobilenet-v2-fpnlite, from the TensorFlow 2 Detection Model Zoo) to TFLite. First of all, I train the model using model_main_tf2.py and then I use export_tflite_graph_tf2.py to export a saved model (.pb). However, when it comes to converting the .pb file to .tflite, it throws this error:
OSError: SavedModel file does not exist at: /content/gdrive/My Drive/models/research/object_detection/fine_tuned_model/saved_model/saved_model.pb/{saved_model.pbtxt|saved_model.pb}
To convert the .pb file I used:
import os
import tensorflow as tf
SAVED_MODEL_PATH = os.path.join(os.getcwd(),'object_detection', 'fine_tuned_model', 'saved_model', 'saved_model.pb')
# SAVED_MODEL_PATH: '/content/gdrive/My Drive/models/research/object_detection/exported_model/saved_model/saved_model.pb'
converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_PATH)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.experimental_new_converter = True
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()
open("detect.tflite", "wb").write(tflite_model)
or "tflite_convert" from command line, but with the same error. I also tried to run it with the latest tf-nightly version as it suggests here, but the outcome is the same. I tried to pass the path with various ways, it seems like the .pd is not well written (not the right file). Is there a way to manage to convert the model to tflite so as to implement it to android? Thank you!
Your saved_model path should be "/content/gdrive/My Drive/models/research/object_detection/fine_tuned_model/saved_model/". It should be the folder, not a file inside that folder.
For a quick test, try typing in a terminal:
tflite_convert \
--saved_model_dir="path to saved_folder" \
--output_file="path to the tflite file you want to save"
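If you'd rather stay in Python, here is a minimal sketch of the same fix (the only change from the question's code is that the path points at the SavedModel directory rather than the saved_model.pb file inside it):
import os
import tensorflow as tf

# Point the converter at the SavedModel *directory*, not the .pb file inside it
SAVED_MODEL_DIR = os.path.join(os.getcwd(), 'object_detection', 'fine_tuned_model', 'saved_model')
converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]
tflite_model = converter.convert()
with open("detect.tflite", "wb") as f:
    f.write(tflite_model)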
I don't have enough reputation to comment, but the problem here seems to be your SAVED_MODEL_PATH.
You could try hard-coding the path and dropping the .pb filename from it. I don't remember exactly what the trick is, but it's definitely due to the path.

Batch processing a 30GB json file in R

I have a large (30GB) json file of tweets that I'd like to parse and run some text analysis on in R. The tweets were acquired about 2 years ago using the filter_stream function from the twitteR package. Here is a sample (pretty standard): https://www.dropbox.com/s/ecrfo3etk2ingcm/WomensMarch2018.json?dl=0.
My computer grinds to a halt anytime I attempt the following:
library(streamR)
mydata <- parseTweets("BigData.json", simplify = TRUE)
I know I need to batch process the file or else move to a cloud server with tons of RAM, but I don't know how to do either. Can anyone help?
Edit: I tried this solution (Reading a huge json file in R, issues), but I get the following error:
Error: lexical error: invalid char in json text.
_at":"Wed Jul 21 12:54:05 +{"created_at":"Sat Jan 21 17:18:2
(right here) ------^

Problems parsing StreamR JSON data

I am attempting to use the streamR package in R to download and analyze Twitter data, on the premise that this library can overcome the limitations of the twitteR package.
When downloading data everything seems to work fabulously, using the filterStream function (just to clarify, the function captures Twitter data; running it produces the json file, saved in the working directory, that is used in the later steps):
filterStream( file.name="tweets_test.json",
track="NFL", tweets=20, oauth=credential, timeout=10)
Capturing tweets...
Connection to Twitter stream was closed after 10 seconds with up to 21 tweets downloaded.
However, when moving on to parse the json file, I keep getting all sorts of errors:
readTweets("tweets_test.json", verbose = TRUE)
0 tweets have been parsed.
list()
Warning message:
In readLines(tweets) : incomplete final line found on 'tweets_test.json'
Or with this function from the same package:
tweet_df <- parseTweets(tweets='tweets_test.json')
Error in `$<-.data.frame`(`*tmp*`, "country_code", value = NA) :
replacement has 1 row, data has 0
In addition: Warning message:
In stream_in_int(path.expand(path)) : Parsing error on line 0
I have tried reading the json file with jsonlite and rjson with the same results.
Originally, it seemed that the error came from special characters ({, then \) within the json file, which I tried to clean up following the suggestion from this post; however, not much came out of it.
I found out about the streamR package from this post, which presents the process as very straightforward and simple (which it is, except for the parsing part!).
If any of you have experience with this library and/or these parsing issues, I'd really appreciate your input. I have been searching non-stop but haven't been able to locate a solution.
Thanks!

Got "Error parsing message" when importing a pretrained Caffe model into Chainer

I want to import the ResNet-50 pretrained file "ResNet-50-model.caffemodel" into Chainer.
Here is the Chainer code:
class chexnet(L.ResNet50Layers):
    def __init__(self, pretrained_model="auto", out_features=2):
        super(chexnet, self).__init__(pretrained_model)
        with self.init_scope():
            self.classifier = L.Linear(2048, out_features)
But I got the error message below:
File "/home/tamnt27/.local/lib/python3.5/site-packages/chainer/links/model/vision/resnet.py", line 148, in convert_caffemodel_to_npz
caffemodel = CaffeFunction(path_caffemodel)
File "/home/tamnt27/.local/lib/python3.5/site-packages/chainer/links/caffe/caffe_function.py", line 151, in __init__
net.MergeFromString(model_file.read())
google.protobuf.message.DecodeError: Error parsing message
I don't know why this error happens; it should work. Please help me. Thank you all.
I tried to reproduce your situation, but could not.
My environment is:
Python 2.7
Chainer 4.2.0
CuPy 4.2.0
I downloaded a model from
https://onedrive.live.com/?authkey=%21AAFW2-FVoxeVRck&id=4006CBB8476FF777%2117887&cid=4006CBB8476FF777
and placed it at ~/.chainer/dataset/pfnet/chainer/models/ResNet-50-model.caffemodel
I suspect your downloaded file is corrupted, so I recommend checking the md5sum with
$ md5sum ~/.chainer/dataset/pfnet/chainer/models/ResNet-50-model.caffemodel
44b20660c5948391734036963e855dd2
If the md5sum is different from mine, try downloading the model again.
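If md5sum isn't available on your platform, a small Python sketch (the expected hash is the value quoted above; the path is assumed to be Chainer's default cache location) does the same check:
import hashlib
import os

def md5sum(path, chunk_size=1 << 20):
    # Compute the md5 of a file in chunks so a large model doesn't need to fit in memory
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

path = os.path.expanduser('~/.chainer/dataset/pfnet/chainer/models/ResNet-50-model.caffemodel')
print(md5sum(path) == '44b20660c5948391734036963e855dd2')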

Error reading hdf file in R

I have two hdf4 files, namely file 1: "MYD04_L2.A2011001.2340.006.2014078044212.hdf" and file 2: "MYD04_L2.A2011031.mosaic.006.AOD_550_DT_DB_Combined.hdf". The first is a raw data file with 72 sub-datasets and the second is the file I obtained after ordering (i.e. post-processed). For the first file, this R code:
layer_name <- getSds("MYD04_L2.A2011001.2340.006.2014078044212.hdf",method="mrt")
layer_name$SDSnames[66:68]
[1] "AOD_550_Dark_Target_Deep_Blue_Combined"
[2] "AOD_550_Dark_Target_Deep_Blue_Combined_QA_Flag"
[3] "AOD_550_Dark_Target_Deep_Blue_Combined_Algorithm_Flag"
It works OK with method="gdal" as well. However, when I try to read file 2, a window pops up saying gdalinfo.exe has stopped working (with method = "gdal"). The same kind of problem arises for mrt, where it says sdslist.exe has stopped working. I get the following error message:
Error in sds[[i]] <- substr(sdsRaw[i], 1, 11) == "SDgetinfo: " :
attempt to select less than one element in integerOneIndex
Is the single layer the issue here? The first file has 72 sub-datasets and the second has only one (assumed from the file name, since I couldn't read it), so has R failed to read the data file for that reason? Can anyone propose a solution for reading such data files? If the ncdf4 package with HDF4 enabled is the solution, can anyone explain, step by step, how to enable HDF4 and build ncdf4 on Windows?
