I have been trying to run the pcpr2 package using the following tutorial: https://github.com/JoeRothwell/pcpr2
The data for this package is available in this link: https://github.com/JoeRothwell/pcpr2/raw/master/data/PCPR2data.RData
My datamatrix file is: https://drive.google.com/file/d/1wzzw2Jcui-IKICYc_QmTCvXqeFZ_3teX/view?usp=share_link
My metadata file is: https://drive.google.com/file/d/1d52Cj4qNvjTJox7n5uZKlmF9d8YPpfAn/view?usp=share_link
My code:
transcripts <- read.csv("test_matrix.csv", row.names = 1)
Z_metadata <- read.csv("test_trait.csv")
output <- runPCPR2(transcripts , Z_metadata, pct.threshold = 0.8)
Each time I try to run the code from the pcpr2 package on my data, I get the following error:
Error in runPCPR2(transcripts, Z_metadata, pct.threshold = 0.8) :
is.numeric(X_DataMatrix) is not TRUE
I tried converting my datamatrix into a numeric format by the following command:
transcripts = lapply(transcripts , as.double)
transcripts = do.call("cbind", transcripts)
However, this didn't work either. I get another error message:
Error in solve.default(crossprod(model.matrix(mod))) :
Lapack routine dgesv: system is exactly singular: U[15,15] = 0
I can tell something is wrong with my datamatrix format, as its data type and class aren't the same as the ones used in the tutorial. However, I don't understand how to fix this. Any help will be greatly appreciated.
Be careful when importing the test_matrix dataset: the first column holds the row names, and the result should be a matrix rather than a data frame:
transcripts=as.matrix(read.csv("https://raw.githubusercontent.com/dtonmoy/PCPR2-data/main/test_matrix.csv",row.names=1))
Z_metadata=read.csv("https://raw.githubusercontent.com/dtonmoy/PCPR2-data/main/test_trait.csv")
output <- runPCPR2(transcripts , Z_metadata, pct.threshold = 0.8)
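To see why the as.matrix() step matters, here is a minimal sketch using a small made-up CSV (the gene and sample names are hypothetical, not taken from the linked files). It only illustrates the is.numeric() check named in the error message and does not call runPCPR2 itself.

```r
# Hypothetical stand-in for test_matrix.csv: the first column holds row names.
csv_text <- "id,s1,s2\ngeneA,1.5,2.0\ngeneB,3.0,4.5"

# read.csv() always returns a data frame, even when every column is numeric.
as_df <- read.csv(text = csv_text, row.names = 1)
print(is.numeric(as_df))

# as.matrix() on an all-numeric data frame gives a numeric matrix
# and keeps the row names.
as_mat <- as.matrix(as_df)
print(is.numeric(as_mat))
print(rownames(as_mat))
```

If is.numeric() is still FALSE after as.matrix(), at least one column was probably read as text, which is worth checking with str() on the data frame.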
I have an issue when I try to export a data frame to Excel with the openxlsx library. When I try, this error occurs:
openxlsx::write.xlsx(usertl_lp, file = "Mi_Exportación.xlsx")
Error in x[is.na(x)] <- na.string : replacement has length zero
usertl_lp_clean <- usertl_lp %>% mutate(across(where(is.list), as.character))
openxlsx::write.xlsx(usertl_lp_clean, file = "Mi_Exportación.xlsx")
This error may be caused by cells that contain vectors (list columns). The code above uses across() to convert every list column to character before exporting.
I posted this here for others in need.
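For readers without dplyr, the same idea can be sketched in base R. This is an illustration under assumptions: the data frame and its tags column are made up, and the cells are joined with paste() rather than converted with as.character(), so that a vector cell becomes "a, b" instead of a deparsed string.

```r
# Hypothetical data frame with a list column; the second cell is empty.
df <- data.frame(id = 1:2)
df$tags <- list(c("a", "b"), character(0))

# Find the list columns, then flatten every cell to a single string.
list_cols <- vapply(df, is.list, logical(1))
df[list_cols] <- lapply(df[list_cols], function(col) {
  vapply(col, function(cell) paste(cell, collapse = ", "), character(1))
})

str(df)
```

After this step every column is an atomic vector, which is the shape the export step in the answer above relies on.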
I think you are looking for the writeData function from the same package.
Check out writeFormula from the same package as well or even write_xlsx from the writexl package.
I was having a similar problem in a data frame, but, in my case, I was using the related openxlsx::writeData.
The data frame was generated using sapply, with functions that could throw errors depending on the data, so I wrote the code to fill in NA whenever an error was raised. I ended up with NaN and NA values in the same column.
What worked for me is conducting the following treatment before writeData:
df[is.na(df)]<-''
So, for your problem, the following may work:
df[is.na(df)]<-''
openxlsx::write.xlsx(as.data.frame(df), file = "df.xlsx", colNames = TRUE, rowNames = FALSE, append = FALSE)
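A small sketch of what that replacement does, using a made-up data frame (the column names are hypothetical). One side effect to be aware of: assigning '' to a numeric column converts the whole column to character.

```r
# Hypothetical data frame with NA and NaN in the same numeric column.
df <- data.frame(
  score = c(1, NA, NaN),
  label = c("x", NA, "z"),
  stringsAsFactors = FALSE
)

# is.na() is TRUE for both NA and NaN, so one assignment covers both.
print(is.na(df$score))

df[is.na(df)] <- ""

# The numeric column is now character, because "" is a string.
str(df)
```

If the numeric type needs to survive the export, restricting the replacement to character columns is an alternative, at the cost of leaving NA in the numeric ones.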
I'm using the convert function in the highfrequency package in R. The dataset I'm using is TAQ data downloaded from WRDS. The data looks like this.
The function convert is supposed to convert the .csv files into .RData files of xts objects.
I followed the package instructions and used the following code:
library(highfrequency)
from <- "2017-01-05"
to <- "2017-01-05"
format <- "%Y%m%d %H:%M:%S"
datasource <- "C:/Users/feimo/OneDrive/SFU/Thesis-Project/R/IBM"
datadestination <- "C:/Users/feimo/OneDrive/SFU/Thesis-Project/R/IBM"
convert( from=from, to=to, datasource=datasource,
datadestination=datadestination, trades = T, quotes = F,
ticker="IBM", dir = T, extension = "csv",
header = F, tradecolnames = NULL,
format=format, onefile = T )
But I got the following error message:
> Error in `$<-.data.frame`(`*tmp*`, "COND", value = numeric(0)) :
> replacement has 0 rows, data has 23855
I believe the default column names in the function are c("SYMBOL", "DATE", "EX", "TIME", "PRICE", "SIZE", "COND", "CORR", "G127"), which differ from those in my dataset, so I manually changed them in my .csv to match. Then I got another error:
>Error in xts(tdata, order.by = tdobject) : 'order.by' cannot contain 'NA', 'NaN', or 'Inf'
I tried to look at the original code, but couldn't find a solution.
Any suggestions would be really helpful. Thanks!
When I run your code on the data to which you provide a link, I get the second error you mention:
Error in xts(tdata, order.by = tdobject) :
'order.by' cannot contain 'NA', 'NaN', or 'Inf'
This error can be traced to these lines in the function highfrequency:::makeXtsTrades(), which is called by highfrequency::convert():
tdobject = as.POSIXct(paste(as.vector(tdata$DATE), as.vector(tdata$TIME)),
format = format, tz = "GMT")
tdata = xts(tdata, order.by = tdobject)
The error results from two problems:
1. The variable "DATE" in your data file is read into R as numeric, whereas it appears that the code creating tdobject expects tdata$DATE to be a character vector. You could fix this by manually converting that variable to a character vector:
tdata <- read.csv("IBM_trades.csv")
tdata$DATE <- as.character(tdata$DATE)
write.csv(tdata, file = "IBM_trades_DATE_fixed.csv", row.names = FALSE)
2. The variable "TIME_M" in your data file is not a time of the format "%H:%M:%S". It looks like it is only the minutes and seconds component of a more complete time variable, because values only contain one colon and the values before and after the colon vary from 0 to 59.9. Fixing this problem would require finding the hour component of the time variable.
These two problems result in tdobject being filled with NA values rather than valid date-times, which causes an error when xts::xts() tries to order the data by tdobject.
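The effect of a time value that does not match the format string can be reproduced in isolation. The sketch below uses made-up date and time strings (not values from the linked file) with the same format and time zone as in the question.

```r
fmt <- "%Y%m%d %H:%M:%S"

# A time with hours, minutes and seconds parses to a valid date-time.
good <- as.POSIXct(paste("20170105", "09:30:01"), format = fmt, tz = "GMT")

# A minutes-and-seconds value such as "30:01.5" does not match
# "%H:%M:%S", so the parse fails and returns NA.
bad <- as.POSIXct(paste("20170105", "30:01.5"), format = fmt, tz = "GMT")

print(good)
print(is.na(bad))
```

Counting the NA values in the parsed vector with sum(is.na(...)) before building the xts object is a quick way to confirm whether the timestamps are the problem.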
The more general issue seems to be that the function highfrequency::convert() expects your data to follow something like the format described here on the WRDS website, but your data has slightly different column names and possibly different value formats. I would recommend taking a close look at that WRDS page and the documentation for your data file and determining which variables in your data correspond to those described on that page (for instance, it's not clear to me that your data contains any variable that is equivalent to "G127").
I'm trying to make a simple boxplot with the following data:
pop.blind.cataract
2,994,231
17,038,617
87,572
2,130,689
2,425,043
26,551,580
8,332,035
377,354
2,554,610
8,734
128,809
396,198
619,308
25,922
1,944,676
I've tried both these commands and gotten both these errors:
boxplot( x=pop.blind.cataract, range=100)
Error in boxplot(x = pop.blind.cataract, range = 100) :
object 'pop.blind.cataract' not found
boxplot( x=cataract_opths$pop.blind.cataract, range=100)
Error in boxplot.default(x = cataract_opths$pop.blind.cataract, range = 100) :
adding class "factor" to an invalid object
There are no "NA"s in the data, and the values are numbers. I can't figure out what's going on. Please help!
Thanks.
If I understand your question correctly, the problem is in your data. With the commas cleaned up and put into a numeric vector (use a data.frame if you like), it would look like this:
pop.blind.cataract <- c(2994231, 17038617, 87572, 2130689, 2425043, 26551580, 8332035, 377354, 2554610, 8734, 128809, 396198, 619308, 25922, 1944676)
Now just boxplot(pop.blind.cataract) should do the trick.
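Rather than retyping the numbers, the commas can also be stripped in R. This is a sketch that assumes the column was read as a factor or as text because of the thousands separators; only the first three values from the question are used.

```r
# Values as they would arrive from the file: text with thousands separators.
raw <- factor(c("2,994,231", "17,038,617", "87,572"))

# Convert the factor to character first, drop the commas, then parse.
values <- as.numeric(gsub(",", "", as.character(raw), fixed = TRUE))
print(values)

# plot = FALSE returns the box statistics without drawing;
# remove it to draw the plot.
stats <- boxplot(values, range = 100, plot = FALSE)
print(stats$stats)
```

The as.character() step matters: calling as.numeric() directly on a factor returns the internal level codes, not the numbers shown in the data.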
I am using the R package ape to analyze some sequences stored in a DNAbin object:
library(ape)
my.seq <- read.dna("sequences.txt", format = "clustal")
my.dist <- dist.dna(my.seq)
my.tree <- nj(my.dist)
I want to find the bootstrap values, so I use boot.phylo:
boot <- boot.phylo(my.tree, my.seq, FUN = function(xx) nj(dist.dna(xx)), B = 100)
But I get an error message saying:
Error in if (drop[j]) next : missing value where TRUE/FALSE needed
Any idea what this means, and how to fix it? I tried googling the error message, and I could not find anything.
Your if condition evaluated to NA.
It must evaluate to either TRUE or FALSE.
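The message itself can be reproduced in base R, which may help when tracing where the NA comes from. The vector below is made up; it only borrows the name drop from the error message. Whether the NA in the question originates in the distance matrix is an assumption, although checking anyNA(dist.dna(my.seq)) is one thing to try.

```r
# Hypothetical logical vector with a missing entry.
drop <- c(FALSE, NA, TRUE)

# if () stops with an error when its condition is NA.
msg <- tryCatch(
  {
    if (drop[2]) "skipped"
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# isTRUE() treats NA as FALSE, and anyNA() reports whether NAs are present.
safe <- isTRUE(drop[2])
has_missing <- anyNA(drop)
print(c(safe = safe, has_missing = has_missing))
```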
I am attempting to use R package RecordLinkage, and am using two articles by the package authors as usage guides, in addition to the package documentation.
I am using 2 large datasets (100k+ rows), which I hope to link, and so I am using those elements of the package which are built around S4 class RLBigDataLinkage.
I begin by running the following lines in R:
>library('RecordLinkage')
>data1 <- as.data.frame(#source)
>data2 <- as.data.frame(#source)
>rpairs <- RLBigDataLinkage(data1, data2, strcmp = 2:8, exclude = 9:10)
This works fine (though it takes some time), and writes the necessary .ff files to deal with the large data sets.
If I then try:
>rpairs <- epiWeights(rpairs)
Or:
>rpairs <- epiWeights(rpairs, e = 0.01, f = getFrequencies(rpairs))
Then when I run:
>summary(rpairs)
I get the error message:
Error in dbGetQuery(object@con, "select count(*) from data1") :
error in evaluating the argument 'conn' in selecting a method for function 'dbGetQuery': Error: no slot of name "con" for this object of class "RLBigDataLinkage"
If, on the other hand, I run:
>result <- epiClassify(rpairs, 0.5)
>getTable(result)
I get the error message:
Error in table.ff(object@data@pairs$is_match, object@prediction, useNA = "ifany") :
Only vmodes integer currently allowed - are you sure ... contains only factors or integers?
I'm clearly missing something about how these objects need to be handled. Does anyone with experience with this package see my error? Thanks kindly.
When the class of rpairs is RLBigDataLinkage, use print(rpairs) instead; it prints the summary of rpairs.
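The "no slot of name" part of the first error is a general S4 message, and it can be reproduced without the package. The sketch below uses a made-up class named DemoLinkage, which is not part of RecordLinkage; with the real object, slotNames(rpairs) would list the slots that actually exist.

```r
library(methods)

# Hypothetical S4 class with a single slot, standing in for the real one.
setClass("DemoLinkage", representation(pairs = "data.frame"))
obj <- new("DemoLinkage", pairs = data.frame(id1 = 1L, id2 = 2L))

# slotNames() lists the slots an object has.
print(slotNames(obj))
has_con <- "con" %in% slotNames(obj)

# Accessing a slot that does not exist raises the same kind of error.
msg <- tryCatch(obj@con, error = function(e) conditionMessage(e))
print(msg)
```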