I have collected 6 fastq files from the same mock sample and merged them using gzip in Linux so that I could run them through Kraken2. The Kraken2 output file (.report) was converted to .biom format using kraken-biom. When I then try to import the .biom file into R using import_biom() from the phyloseq package, I receive the following message:
Error in validObject(.Object) : invalid class “phyloseq” object:
Component sample names do not match. Try sample_names()
I have opened the .biom file and can see only one sample name (the name I gave the output file during gzip). I tried to use sample_names(), but I can't, since the .biom file is not loaded into R. Does anyone know why the sample names do not match? Since I merged the files into one, shouldn't there be just one sample name?
Edit: When I run Kraken2 on the 6 fastq files without merging them and then use kraken-biom, importing the .biom file into R works.
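As a side note on the merging step: concatenated gzip files form one valid multi-member gzip stream, and nothing in that stream records which source file a read came from. The Python sketch below shows this with two tiny made-up fastq records (the read names and sequences are hypothetical, and it assumes the merge was a plain concatenation, which the question does not confirm).

```python
import gzip

# Two hypothetical single-read fastq files, each gzip-compressed on its own.
part_a = gzip.compress(b"@read1\nACGT\n+\nIIII\n")
part_b = gzip.compress(b"@read2\nTTGA\n+\nIIII\n")

# Byte-level concatenation, the same thing `cat a.gz b.gz > merged.gz` does.
merged = part_a + part_b

# gzip.decompress reads all members of a multi-member stream.
text = gzip.decompress(merged).decode()

# Count fastq records: every 4th line, starting at 0, is a header line.
n_reads = sum(
    1
    for i, line in enumerate(text.splitlines())
    if i % 4 == 0 and line.startswith("@")
)
```

After the merge there is only one stream of reads, so a downstream tool can only treat it as one sample.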
I use SGA Tools to process my images, and it returns the results as .dat files. To work with this data in R, I tried to import a .dat file using the haven package. I installed haven and loaded it with library(), but I still cannot import the data and get this error message:
Error: Failed to parse C:/Users/QuRana/Desktop/SGA Tools/Plate_Image_Example (1).dat: This version of the file format is not supported.
When I run install.packages("haven"), haven is installed, but when I then load it with library(haven), nothing appears on my console except this:
> library(haven)
Then when I use this code:
datatrial1 <- read_dta("C:/Users/QuRana/Desktop/SGA Tools/Plate_Image_Example (1).dat")
It gives me the error mentioned above. When I convert my .dat file to a .csv file and load that instead, the imported data has an extra "\t" before every value except the first one in each row, like this:
Flags: S - Colony spill or edge interference C - Low colony circularity
# row\tcol\tsize\tcircularity\tflags
1\t1\t4355\t0.9053\t
1\t2\t4456\t0.8401\t
1\t3\t3439\t0.8219\t
1\t4\t3215\t0.8707\t
All the \t's before the numeric values are not what I want. Another issue I am facing is that I cannot install the gitter package on my R version, which is R 4.2.2.
You can read your tab-separated file like so: `read.delim("file_path", header = TRUE, sep = "\t")`
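The same idea, splitting on the tab character instead of treating the file as comma-separated, can be sketched in Python with only the standard library. This is an illustration under assumptions, not SGA Tools' own reader: the function name read_colony_table is made up, and the sample text is copied from the question with the "\t" sequences taken to be real tab characters.

```python
import csv
import io

# Sample copied from the question; "\t" here is a real tab character.
SAMPLE = (
    "Flags: S - Colony spill or edge interference C - Low colony circularity\n"
    "# row\tcol\tsize\tcircularity\tflags\n"
    "1\t1\t4355\t0.9053\t\n"
    "1\t2\t4456\t0.8401\t\n"
)

def read_colony_table(text):
    """Return one dict per data row, keyed by the '#'-prefixed header line."""
    header = None
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        if fields[0].startswith("#"):
            fields[0] = fields[0].lstrip("# ")  # "# row" -> "row"
            header = fields
        elif header is not None:
            rows.append(dict(zip(header, fields)))
        # lines before the header (e.g. the "Flags:" legend) are ignored
    return rows

rows = read_colony_table(SAMPLE)
```

Each value comes back as a string (for example rows[0]["size"] is "4355"), so numeric columns would still need converting.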
I'm using the R package 'googleLanguageR' to transcribe various 30-second audio files (over 500, so I want to automate this). I've followed all the steps in the googleLanguageR tutorials, got my key, and authenticated through R.
I'm able to transcribe the test audio (.wav) that comes with the package, but whenever I apply the same function to my files (.mp3), I get NULL for both transcript and timings.
This is the code provided in tutorials:
# get the sample source file
test_audio <- system.file("woman1_wb.wav", package = "googleLanguageR")
gl_speech(test_audio)$transcript
If I use the same for my file, I get an empty element, so I've tried the following with no luck:
test_audio <- "/audio_location/filename.mp3"
gl_speech(test_audio)$transcript
Has anybody encountered a similar problem with this package or have any suspicions of why it produces NULL transcripts?
I am new to stackoverflow and python so please bear with me.
I am trying to run a Latent Dirichlet Allocation (LDA) analysis on a text corpus with the gensim package in Python, using the PyCharm editor. I prepared the corpus in R and exported it to a csv file using this R command:
write.csv(testdf, "C://...//test.csv", fileEncoding = "utf-8")
Which creates the following csv structure (though with much longer and already preprocessed texts):
,"datetimestamp","id","origin","text"
1,"1960-01-01","id_1","Newspaper1","Test text one"
2,"1960-01-02","id_2","Newspaper1","Another text"
3,"1960-01-03","id_3","Newspaper1","Yet another text"
4,"1960-01-04","id_4","Newspaper2","Four Five Six"
5,"1960-01-05","id_5","Newspaper2","Alpha Bravo Charly"
6,"1960-01-06","id_6","Newspaper2","Singing Dancing Laughing"
I then try the following essential python code (based on the gensim tutorials) to perform simple LDA analysis:
import gensim
from gensim import corpora, models, similarities, parsing
import pandas as pd
from six import iteritems
import os
import pyLDAvis.gensim
class MyCorpus(object):
    def __iter__(self):
        for row in pd.read_csv('//mpifg.local/dfs/home/lu/Meine Daten/Imagined Futures and Greek State Bonds/Topic Modelling/Python/test.csv', index_col=False, header=0, encoding='utf-8')['text']:
            # assume there's one document per line, tokens separated by whitespace
            yield dictionary.doc2bow(row.split())

if __name__ == '__main__':
    dictionary = corpora.Dictionary(row.split() for row in pd.read_csv(
        '//.../test.csv', index_col=False, encoding='utf-8')['text'])
    print(dictionary)
    dictionary.save(
        '//.../greekdict.dict')  # store the dictionary, for future reference
    ## create an mmCorpus
    corpora.MmCorpus.serialize('//.../greekcorpus.mm', MyCorpus())
    corpus = corpora.MmCorpus('//.../greekcorpus.mm')
    dictionary = corpora.Dictionary.load('//.../greekdict.dict')
    corpus = corpora.MmCorpus('//.../greekcorpus.mm')
    # train model
    lda = gensim.models.ldamodel.LdaModel(corpus=corpus, id2word=dictionary, num_topics=50, iterations=1000)
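To make the dictionary and doc2bow steps above easier to follow, here is a rough standard-library sketch of the same idea: map each token to an integer id, then turn each document into (id, count) pairs. It is not gensim's implementation, the helper names (iter_tokens, build_vocab, to_bow) are made up for illustration, and gensim may assign ids in a different order.

```python
import csv
import io
from collections import Counter

# First three rows of the csv shown in the question.
CSV_TEXT = ''',"datetimestamp","id","origin","text"
1,"1960-01-01","id_1","Newspaper1","Test text one"
2,"1960-01-02","id_2","Newspaper1","Another text"
3,"1960-01-03","id_3","Newspaper1","Yet another text"
'''

def iter_tokens(csv_text):
    """Yield one whitespace-tokenised document per csv row."""
    for record in csv.DictReader(io.StringIO(csv_text)):
        yield record["text"].split()

def build_vocab(docs):
    """Assign each distinct token the next free integer id."""
    vocab = {}
    for tokens in docs:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def to_bow(tokens, vocab):
    """Return sorted (token_id, count) pairs for one document."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    return sorted(counts.items())

vocab = build_vocab(iter_tokens(CSV_TEXT))
bows = [to_bow(tokens, vocab) for tokens in iter_tokens(CSV_TEXT)]
```

Note that the csv is read twice, once for the vocabulary and once for the vectors, which mirrors the two passes in the code above.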
I get the following error codes and the code exits:
...\Python\venv\lib\site-packages\setuptools-28.8.0-py3.6.egg\pkg_resources_vendor\pyparsing.py:832: DeprecationWarning: invalid escape sequence \d
\...\Python\venv\lib\site-packages\setuptools-28.8.0-py3.6.egg\pkg_resources_vendor\pyparsing.py:2736: DeprecationWarning: invalid escape sequence \d
\...\Python\venv\lib\site-packages\setuptools-28.8.0-py3.6.egg\pkg_resources_vendor\pyparsing.py:2914: DeprecationWarning: invalid escape sequence \g
\...\Python\venv\lib\site-packages\pyLDAvis_prepare.py:387:
DeprecationWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing
I cannot find any solution and, to be honest, I have no clue where exactly the problem comes from. I spent hours making sure that the encoding of the csv is utf-8 and that it is exported (from R) and imported (in Python) correctly.
What am I doing wrong, or where else could I look? Cheers!
A DeprecationWarning is exactly that: a warning that a feature is deprecated, meant to prompt you to switch to some other functionality so that your code stays compatible in the future. It is not an error. So in your case I would just watch for updates of the libraries that you use.
Starting with the last warning, it looks like it originates from pandas and has already been reported against pyLDAvis.
The remaining ones come from the pyparsing module, which you do not seem to import explicitly. Probably one of the libraries you use depends on it and relies on some relatively old, deprecated functionality. To get rid of the warnings, I would first check whether upgrading the packages helps. Good luck!
Try using this:
import warnings
warnings.filterwarnings("ignore")
pyLDAvis.enable_notebook()
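The blanket filter above hides every warning, not only the deprecation ones. A narrower variant, sketched below with the standard warnings module, ignores just DeprecationWarning and leaves other categories visible. The noisy() function is a made-up stand-in for a library call, not part of pandas or pyLDAvis.

```python
import warnings

def noisy():
    """Hypothetical stand-in for a library call that emits two warnings."""
    warnings.warn(".ix is deprecated", DeprecationWarning)
    warnings.warn("something else worth seeing", UserWarning)
    return "done"

# record=True collects the warnings that get through, so we can inspect them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Added last, so it is checked first: drop only DeprecationWarning.
    warnings.filterwarnings("ignore", category=DeprecationWarning)
    result = noisy()

categories = [w.category for w in caught]
```

The call still returns normally, which also shows that a warning by itself does not stop the program.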
I have two HDF4 files: file 1, "MYD04_L2.A2011001.2340.006.2014078044212.hdf", and file 2, "MYD04_L2.A2011031.mosaic.006.AOD_550_DT_DB_Combined.hdf". The first is a raw data file with 72 sub-datasets, and the second is the file I obtained after ordering (i.e. post-processed). For the first file, this R code works:
layer_name <- getSds("MYD04_L2.A2011001.2340.006.2014078044212.hdf",method="mrt")
layer_name$SDSnames[66:68]
[1] "AOD_550_Dark_Target_Deep_Blue_Combined"
[2] "AOD_550_Dark_Target_Deep_Blue_Combined_QA_Flag"
[3] "AOD_550_Dark_Target_Deep_Blue_Combined_Algorithm_Flag"
It works OK with method = "gdal" as well. However, when I try to read file 2, a window pops up saying that gdalinfo.exe has stopped working (method = "gdal"). The same kind of problem arises with method = "mrt", where the window says sdslist.exe has stopped working. I get the following error message:
Error in sds[[i]] <- substr(sdsRaw[i], 1, 11) == "SDgetinfo: " :
attempt to select less than one element in integerOneIndex
Is the single layer the issue here? The first file has 72 sub-datasets and the second has only one (I am assuming this from the file name, as I couldn't read it), so has R failed to read the file for that reason? Can anyone propose a solution for reading such data files? If the ncdf4 package with HDF4 enabled is the solution, can anyone explain, step by step, how to enable HDF4 and build ncdf4 on Windows?
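Before rebuilding anything, it may be worth checking whether the post-processed file is still in the same container format as the raw one. The sketch below, in Python, compares the first bytes of a file against the published HDF4, HDF5 and classic netCDF signatures. It is only a diagnostic under the assumption that the order may have changed the format (nothing in the question confirms this), and sniff_format and sniff_file are made-up helper names.

```python
# File signatures (magic numbers) at byte offset 0.
HDF4_MAGIC = b"\x0e\x03\x13\x01"
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"
NETCDF_CLASSIC_MAGIC = b"CDF"

def sniff_format(head):
    """Guess the container format from the first bytes of a file."""
    if head.startswith(HDF4_MAGIC):
        return "hdf4"
    if head.startswith(HDF5_MAGIC):
        return "hdf5"
    if head.startswith(NETCDF_CLASSIC_MAGIC):
        return "netcdf-classic"
    return "unknown"

def sniff_file(path):
    """Read the first 8 bytes of `path` and classify them."""
    with open(path, "rb") as fh:
        return sniff_format(fh.read(8))

# Checks on hand-written byte strings rather than real data files.
example_hdf4 = sniff_format(b"\x0e\x03\x13\x01\x00\x10\x00\x00")
example_other = sniff_format(b"not an hdf file")
```

Calling sniff_file() on both .hdf files would show whether they report the same format.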
I am doing the following with the Cooccur library in R.
> fb<-read.table("Fb6_peaks.bed")
> f1<-read.table("F16_peaks.bed")
Everything is OK with the first two commands, and I can also display the data:
> fb
> f1
But when I run the next command, shown below,
> explore_pairs(c("fb", "f1"))
I get an error message:
Error in sum(sapply(tf1_s, score_sample, tf2_hits = tf2_s, hit_list = hit_l)) :
invalid 'type' (list) of argument
Could anyone suggest something?
Despite promising, in the article they published over a year ago, to release a version to the Bioconductor repository, the authors have still not delivered. The gz file attached to the article is not in a form that my installation recognizes. You really should be corresponding with the authors about this question.
The nature of the error message suggests that the function is expecting a different data class. You should look at the specification of the arguments in the help(explore_pairs) file. If it expects 2 matrices, then wrapping data.matrix around the arguments may solve the problem, but if it expects a class created by one of that package's functions, then you need to take the necessary steps to construct the right objects.
The help file for explore_pairs does exist (at least in the MAN directory) and says the first argument should be a character vector with further provisos:
\arguments{
\item{factornames}{an vector of character strings, each naming a GFF-like
data frame containing the binding profile of a DNA-binding factor.
There is also a load utility, load_GFF, which I assume is designed for creation of such files.
Try renaming the columns of your data frame:
names(fb)=c("seq","start","end")
Check the example datasets. The column names are as above. I set the names and it worked.
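For readers who want to see what "a GFF-like data frame with named columns" amounts to, here is a small Python sketch that reads BED-style lines and labels the first three fields seq, start and end, the same names as in the rename above. It is not part of the Cooccur package; parse_bed and the two example peaks are made up for illustration.

```python
def parse_bed(text):
    """Return one dict per BED line, using the first three fields only."""
    records = []
    for line in text.splitlines():
        # Skip blank lines and the optional BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        seq, start, end = line.split("\t")[:3]
        records.append({"seq": seq, "start": int(start), "end": int(end)})
    return records

# Hypothetical peak data; real files would come from the peak caller.
BED_TEXT = "chr1\t100\t250\tpeak_1\nchr2\t400\t900\tpeak_2\n"
peaks = parse_bed(BED_TEXT)
```

The point is only that the columns carry explicit names and numeric coordinates rather than the default V1, V2, V3 that read.table assigns.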