RecordLinkage Package and RLBigDataLinkage-Class Objects - r

I am attempting to use R package RecordLinkage, and am using two articles by the package authors as usage guides, in addition to the package documentation.
I am using 2 large datasets (100k+ rows), which I hope to link, and so I am using those elements of the package which are built around S4 class RLBigDataLinkage.
I begin by running the following lines in R:
>library('RecordLinkage')
>data1 <- as.data.frame(#source)
>data2 <- as.data.frame(#source)
>rpairs <- RLBigDataLinkage(data1, data2, strcmp = 2:8, exclude = 9:10)
This works fine (though it takes some time), and writes the necessary .ff files to deal with the large data sets.
If I then try:
>rpairs <- epiWeights(rpairs)
Or:
>rpairs <- epiWeights(rpairs, e = 0.01, f = getFrequencies(rpairs))
Then when I run:
>summary(rpairs)
I get the error message:
Error in dbGetQuery(object#con, "select count(*) from data1") :
error in evaluating the argument 'conn' in selecting a method for function 'dbGetQuery': Error: no slot of name "con" for this object of class "RLBigDataLinkage"
If, on the other hand, I run:
>result <- epiClassify(rpairs, 0.5)
>getTable(result)
I get the error message:
Error in table.ff(object#data#pairs$is_match, object#prediction, useNA = "ifany") :
Only vmodes integer currently allowed - are you sure ... contains only factors or integers?
I'm clearly missing something about how these objects need to be handled. Does anyone have any experience with this package that sees my error? Thanks kindly.

when the type of 'rpairs' is 'RLBigDataLinkage' use print(rpairs) ,you will get the summary of rpairs.

Related

Trying to use Bert (R add-on for Excel) with auto.arima not working

I'm using Bert (R add on for Excel).
When I try to run the following:
sales <- sample(100:170, 4*10, replace = TRUE)
advertising <- sample(50:70, 4*10, replace = TRUE)
sales_ts <- ts(sales, frequency = 4, end = c(2017, 4))
fit <- forecast::auto.arima(sales_ts, xreg = advertising,d=1,ic=c("aic"))
The arima works.
But when I try to use auto.arima
fit.arima <- arima(sales_ts,xreg =advertising,order=c(3,1,1))
I get the following error:
Error in rbind(info, getNamespaceInfo(env, "S3methods")) : number of columns of matrices must match (see arg 2)
Please help!
I was able to run this in R itself, and faced no issues (it worked fine).
Based on your error message, it's possible there's a namespace/library clash causing this, or that one of your libraries/packages wasn't installed properly.
You can either:
remove the package(s)
Use find.package("insert package name here") to find its location, then remove.packages("insert package name here") to remove it. Re-install and proceed
add the package name itself when referencing the function, like you did with forecast, e.g.
# adding "stats" in front, if that's the package being used
fit.arima <- stats::arima(sales_ts,xreg =advertising,order=c(3,1,1))

Creating a compartive object in R from two dataframes for comparitive phylogenetics

I'm trying to read in two dataframes into a comparitive object so I can plot them using pgls.
I'm not sure what the error being returned means, and how to go about getting rid of it.
My code:
library(ape)
library(geiger)
library(caper)
taxatree <- read.nexus("taxonomyforzeldospecies.nex")
LWEVIYRcombodata <- read.csv("LWEVIYR.csv")
LWEVIYRcombodataPGLS <-data.frame(LWEVIYRcombodata$Sum.of.percentage,OGT=LWEVIYRcombodata$OGT, Species=LWEVIYRcombodata$Species)
comp.dat <- comparative.data(taxatree, LWEVIYRcombodataPGLS, "Species")
Returns error:
> comp.dat <- comparative.data(taxatree, LWEVIYRcombodataPGLS, 'Species')
Error in if (tabulate(phy$edge[, 1])[ntips + 1] > 2) FALSE else TRUE :
missing value where TRUE/FALSE needed
This might come from your data set and your phylogeny having some discrepancies that comparative.data struggles to handle (by the look of the error message).
You can try cleaning both the data set and the tree using dispRity::clean.data:
library(dispRity)
## Reading the data
taxatree <- read.nexus("taxonomyforzeldospecies.nex")
LWEVIYRcombodata <- read.csv("LWEVIYR.csv")
LWEVIYRcombodataPGLS <- data.frame(LWEVIYRcombodata$Sum.of.percentage,OGT=LWEVIYRcombodata$OGT, Species=LWEVIYRcombodata$Species)
## Cleaning the data
cleaned_data <- clean.data(LWEVIYRcombodataPGLS, taxatree)
## Preparing the comparative data object
comp.dat <- comparative.data(cleaned_data$tree, cleaned_data$data, "Species")
However, as #MrFlick suggests, it's hard to know if that solves the problem without a reproducible example.
The error here is that I was using a nexus file, although ?comparitive.data does not specify which phylo objects it should use, newick trees seem to work fine, whereas nexus files do not.

eqmcc function in R QCA package exiting with error

When I attempt to call eqmcc() against a truthTable object, the result is this error message:
Error: The outcome's length should be the same as the number of rows in the data.
Here's my script:
library(QCA); library (psych); library(readr)
gamson <- read_csv("/path/to/Gamson.csv", col_names = TRUE)
is.na(gamson)
ttACP2 <- truthTable(data=gamson, outcome = "ACP", conditions = "BUR, LOW, DIS, HLP", n.cut=3, incl.cut=0.750, sort.by="incl, n", complete=FALSE, show.cases=TRUE)
ttACP2
csACP2 <- eqmcc(ttACP2, details=TRUE, show.cases=TRUE, row.dom=TRUE, all.sol=FALSE, use.tilde=FALSE)
The is.na() function shows that there are no missing values in my data set. The data set contains 54 rows, of which the first is the column names. The truth table is generated according to expectations. But the minimization of the selected causal conditions fails.
I found a chunk of source code that matches the error message on line 90 here:
https://github.com/cran/QCApro/blob/master/R/pof.R
But I'm not competent enough in programming to understand what conditions lead to the error message being thrown.
This is because your dataset is a tibble instead of a dataframe. After loading the dataset, and before finding the truth table, do this:
gamson <- as.data.frame(gamson)
It should work after that. (The latest version of the eqmcc function is called minimize now.

How to include bootstrap values in phylogenetic trees in ape package

I am using the R package ape to analyze some sequences stored in a DNAbin object:
library(ape)
my.seq <- read.dna("sequences.txt", format = "clustal")
my.dist <- dist.dna(my.seq)
my.tree <- nj(my.dist)
I want to find the bootstrap values, so I use boot.phylo:
boot <- boot.phylo(my.tree, my.seq, FUN = function(xx) nj(dist.dna(xx)), B = 100)
But I get an error message saying:
Error in if (drop[j]) next : missing value where TRUE/FALSE needed
Any idea what this means, and how to fix it? I tried googling the error message, and I could not find anything.
Your if condition resulted in an NA.
It must have either TRUE or FALSEresult.

Error importing SPSS data into R

I imported a dataset in the .sav SPSS format, and I'm getting an error that I haven't seen before.
1: In read.spss("C:\\Users\\acer\\Desktop\\X\\X\\PIREDEU\\ees2009_v0.9_20110622.sav", ... :
C:\Users\acer\Desktop\X\X\PIREDEU\ees2009_v0.9_20110622.sav: File contains duplicate label for value 1.1 for variable V200
Error in cat(list(...), file, sep, fill, labels, append) :
argument 2 (type 'list') cannot be handled by 'cat'
This came up after I typed warnings(PIREDEU). I imported the data using the foreign library:
library(foreign)
PIREDEU<-read.spss("C:\\Users\\acer\\Desktop\\X\\X\\PIREDEU\\ees2009_v0.9_20110622.sav", use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)
I've fiddled with various combinations for the latter three arguments of the read.spss function, and I've gotten nowhere.
Anyone have any suggestions?
I used the below one and it worked perfectly, just ignore the warning message and check data by typing its name:
mydata4<-read.spss("C:\\Work\\data.sav",use.value.labels=F,to.data.frame=T)
mydata4 # check data
Do you have long strings in the file - longer than 8 bytes? Statistics uses some special arrangements to handle those. It looks like the problem is with the value labels. If you can delete those (using SPSS) you might be able to get the rest of the data.
Try to read data without labels.
library(foreign)
PIREDEU <- read.spss("C:\\Users\\acer\\Desktop\\X\\X\\PIREDEU\\ees2009_v0.9_20110622.sav",
use.value.labels = F,
to.data.frame = T)
Does it work?
Convert the spss datafile into .por (portable file) and in R, install the packages hMisc, memisc and foreign and load the package using library(foreign), library(hMisc) and library(memisc).
Then type the following:
mydata <- spss.get("c:/mydata.por", use.value.labels=TRUE)
# last option converts value labels to R factors

Resources