I am brand new to R, so please excuse anything that may seem overly obvious.
I am using apriori to evaluate frequent item sets. When I execute the code below and my subset call returns items, everything works great. The problem is when there is nothing returned on the subset (the criteria returns no subset). When it does this, I am receiving "object 'rulesMatchLHS' not found" when trying to construct a data frame for output. Can you please tell me what I am doing wrong when checking the validity of rulesMatchLHS on the ifelse line?
rules <- apriori(trnew, parameter=list(supp=0.01, conf=0.5, minlen=2, maxlen=2))
rulesMatchLHS <- subset(rules, lhs %ain% dataset1)
ifelse(exists(rulesMatchLHS),
OutputClient <- data.frame(lhs=labels(lhs(rulesMatchLHS))$elements, rhs=labels(rhs(rulesMatchLHS))$elements,rulesMatchLHS@quality),
OutputClient <- data.frame())
View(OutputClient)
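subset() returns an empty result rather than nothing, so the object does exist. Also, exists() requires its argument to be a character string: exists("rulesMatchLHS"), not exists(rulesMatchLHS). You might want to replace the exists() check in your ifelse with a size check such as nrow() (or length() for an arules rules object, which gives the number of rules). Here is a simple example to demonstrate: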
test <- subset(iris, Species == "Fake")
typeof(test)
exists("test")
nrow(test) == 0
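To make the suggested check concrete, here is a minimal sketch that uses the built-in iris data as a stand-in for the rules object. It swaps ifelse() for a plain if/else, since ifelse() is vectorised and meant for element-wise choices rather than for picking which assignment to run. The column names in the output are made up for illustration.

```r
# Hypothetical stand-in for the empty subset() result in the question
test <- subset(iris, Species == "Fake")

# exists() takes the object's name as a string; the object is defined even with zero rows
object_is_defined <- exists("test")

# if/else evaluates only one branch, so the row-dependent data.frame() call
# is skipped entirely when the subset is empty
output <- if (nrow(test) > 0) {
  data.frame(species = test$Species, sepal_length = test$Sepal.Length)
} else {
  data.frame()
}
```

For the arules case, the same shape should apply with length(rulesMatchLHS) > 0 as the condition.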
This is what I'm trying to do:
I have a large excel sheet I'm importing to R.
The data needs to be cleaned so one of the procedures is to test for character length.
Once the program finds a string that is too long, it needs to prompt the operator for a replacement
The operator inputs an alternative, and the program replaces the original with the input text.
The code I have seems to work procedurally, but the variable I have is not overwriting the original value.
library(tidyr)
library(dplyr)
library(janitor)
library(readxl)
fileToOpen <-read_excel(file.choose(),sheet="Data")
MasterFile <- fileToOpen
#This line checks the remaining bad strings in the column
CPNErrors <- nrow(filter(MasterFile,nchar(Field_to_Check) > 26))
#This line selects the bad field from the first in the list of strings to exceed the limit
TEST <- select(filter(MasterFile,nchar(Field_to_Check) > 26),Field_to_Check)[1,]
#This is the loop -- prompts the operator for a replacement, assigns a variable to the input and then replaces the bad value in the data frame
while (CPNErrors >= 1) {message("Replace ",TEST," with what?"); var=readline();MasterFile$Field_to_Check[MasterFile$Field_to_Check == TEST] <- var;print(var)}
The prompt works and assigns the readline() to the var, but the code will not replace the original string as a variable. When I run the code separately outside the loop, it will replace as long as I input an exact string (no variable assignment), so there's some syntactical thing I'm missing.
I've been searching for hours, and am just starting out in R, so if anyone can offer any assistance I'd greatly appreciate it.
EDIT -- ok... I think I found the source of the problem, but I don't know how to fix it. When I run
MasterFile$Field_to_Check[MasterFile$Field_to_Check == TEST]
It comes with a null result, but if I run
MasterFile$Field_to_Check[MasterFile$Field_to_Check == "Some Text that's in the data frame"]
It comes out with a result. Any idea on why I can't filter this list by the variable? The TEST variable comes out as expected.
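TEST is a one-row tibble rather than a character string (select(...)[1,] keeps the tibble structure), which is probably why comparing the column against it does not behave like comparing against a literal string. Indexing by position avoids the comparison altogether. Try this approach with a for loop: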
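# Positions of the entries that exceed 26 characters
CPNErrors <- which(nchar(MasterFile$Field_to_Check) > 26)
for (i in CPNErrors) {
  # Prompt with the offending value, then overwrite it in place with the operator's input
  var <- readline(paste0("Replace ", MasterFile$Field_to_Check[i], " with what? "))
  MasterFile$Field_to_Check[i] <- var
}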
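The same idea can be wrapped in a small function, which makes it easier to try out without an interactive session. This is only a sketch: replace_long_strings() and its ask argument are hypothetical names, and ask defaults to readline() so that interactive use behaves like the loop above.

```r
# Hypothetical helper: replaces every element of x longer than max_chars
# with whatever the ask() function returns for that element.
replace_long_strings <- function(x, max_chars, ask = readline) {
  too_long <- which(nchar(x) > max_chars)  # positions, not values
  for (i in too_long) {
    x[i] <- ask(paste0("Replace ", x[i], " with what? "))
  }
  x
}

# Non-interactive usage: a stub stands in for the operator's answer
cleaned <- replace_long_strings(
  c("short", "this string is definitely too long"),
  max_chars = 10,
  ask = function(prompt) "fixed"
)
```

With the data from the question, the call would presumably look like MasterFile$Field_to_Check <- replace_long_strings(MasterFile$Field_to_Check, 26).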
Let me start by saying I'm sure this has been answered before but I am unsure of what terms to search.
I have a few data frames that are named like df_A , df_B , and df_C and wish to send them all to ggplot. I tried to loop through them all but have been unsuccessful. Here is what I have now:
for (Param in c("A","B","C")){
chosen_df <- paste0("df_",Param)
ggplot(data=chosen_df...)
}
I receive back an error saying "data must be a data frame". This error makes sense to me since chosen_df is a character vector rather than the actual data frame.
I have tried using noquote but to no avail.
My questions are:
1) What kind of search terms can I look up to solve this problem
2) How close am I to solving this problem?
We can use get() to return the object whose name is given as a string
for (Param in c("A","B","C")){
chosen_df <- get(paste0("df_",Param))
ggplot(data=chosen_df, ...)
}
Or with mget, return the values in a list
lst1 <- mget(ls(pattern = '^df_[A-Z]$'))
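As a rough illustration of the mget() route, the sketch below builds two toy data frames (the contents are invented), collects them into a named list, and applies a function to each. nrow() stands in for the plotting call so the example runs without ggplot2.

```r
# Toy data frames following the df_<letter> naming scheme from the question
df_A <- data.frame(x = 1:3, y = c(2, 4, 6))
df_B <- data.frame(x = 1:3, y = c(1, 3, 5))

# Build the names explicitly, then fetch all objects at once as a named list
df_names <- paste0("df_", c("A", "B"))
lst1 <- mget(df_names)

# Apply the same function to every data frame in the list
row_counts <- vapply(lst1, nrow, integer(1))
```

The same lapply()/vapply() pattern should carry over to a function that builds a plot. Note that ggplot objects created inside a for loop are not printed automatically, so an explicit print() is needed there.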
I'm running a large number of meta-analyses with metafor. To get an overview of the results, I wanted to put together vectors containing the main estimates (to combine them in a dataframe later on). Yet, for some of these calculations, I do not have enough primary studies yet, so R will not be able to create a model for this particular domain. Hence, I will get an error message when I try to create a vector at the end.
library(metafor)
r1<-c(NA,NA)
n1<-c(NA,NA)
data1<-data.frame(r1,n1)
escalc1<-escalc(measure="COR", ri=r1,ni=n1, data = data1, method=REML)
rma1<-rma(yi,vi, data=escalc1)
#note the program will not be able to calculate rma1, because k = 0.
r2<-c(.3,.2)
n2<-c(100,200)
data2<-data.frame(r2,n2)
escalc2<-escalc(measure="COR", ri=r2,ni=n2, data = data2, method=REML)
rma2<-rma(yi,vi, data=escalc2)
#it will create an object for rma2 though
estimates<-c(rma1$beta, rma2$beta)
#as rma2 exists but rma1 doesn't, R will not let me create a vector here
Is there a way to tell R to check if the object exists first and to put in NAs for all cases where no object has been created yet? Specifically, I want R to replace rma1$beta (which does not exist) with NA in the last line of code. Is that possible?
You can use tryCatch to tell R what to do as an alternative if an error occurs, e.g.,
library(metafor)
r1<-c(NA,NA)
n1<-c(NA,NA)
data1<-data.frame(r1,n1)
escalc1<-escalc(measure="COR", ri=r1,ni=n1, data = data1)
e1 <- tryCatch({
rma1<-rma(yi,vi, data=escalc1);
rma1$beta}, error = function(e) NA)
r2<-c(.3,.2)
n2<-c(100,200)
data2<-data.frame(r2,n2)
escalc2<-escalc(measure="COR", ri=r2,ni=n2, data = data2)
e2 <- tryCatch({
rma2<-rma(yi,vi, data=escalc2);
rma2$beta}, error = function(e) NA)
estimates<-c(e1, e2)
#[1] NA 0.2356358
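When there are many domains, the tryCatch() pattern can be applied in a loop instead of being repeated by hand. The sketch below does not use metafor: fit_mean() is a hypothetical stand-in for a model fit that fails when no usable data are available, and the domain names are invented.

```r
# Hypothetical stand-in for a model fit: errors when nothing can be estimated
fit_mean <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) stop("no usable observations")
  mean(x)
}

# One input vector per domain; the first has no data yet
inputs <- list(domain1 = c(NA, NA), domain2 = c(0.3, 0.2))

# Each failing fit contributes NA instead of aborting the whole run
estimates <- vapply(
  inputs,
  function(x) tryCatch(fit_mean(x), error = function(e) NA_real_),
  numeric(1)
)
```

Returning NA_real_ rather than plain NA keeps the result numeric, which vapply() requires here.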
DF <- data.frame(CpGId, tframe$t, tframe$p, q)
dimnames(DF)[[2]] <- c("CpGId", "t_value", "p_value", "q_value")
DFhyper <- DF[with(DF, q_value < 0.05 & t_value> 0), ]
DFhyper <- data.frame(DFhyper, row.names = NULL)
DFhyper <- DFhyper [order(p_value), ]
Up to the fourth line, the code works fine, but why does R then give an error stating that the object p_value is not found?
R executes the bracketed expression first, without paying any attention to how it is going to be used. When you type
DFhyper[order(p_value),]
R will look for p_value in the current scope (probably the global scope). Because p_value only exists as a column inside the data frame, R will not find it there. You need to tell it where the column is located.
Either
DFhyper[order(DFhyper$p_value),]
or
DFhyper[with(DFhyper,order(p_value)),]
(or, nearly equivalently, with(DFhyper,DFhyper[order(p_value),])) will work. The first command tells R explicitly that you are referencing the column in the data frame, and the second tells R to look inside the data frame first when it resolves p_value.
Finally, you can just bind the dataframe into the scope as well, executing
attach(DFhyper)
DFhyper[order(p_value),]
The attach command adds the dataframe columns to the current scope. It can be useful for when you have many operations on the dataframe columns, but don't want to keep referencing it. You can then detach it with detach(DFhyper) when you are done.
It needs to be
DFhyper <- DFhyper[order(DFhyper$p_value), ]
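A small self-contained check of the two non-attach variants, using invented values for the columns, might look like this:

```r
# Toy data with the same column names as in the question (values are made up)
DFhyper <- data.frame(
  CpGId   = c("cg01", "cg02", "cg03"),
  p_value = c(0.03, 0.001, 0.02)
)

# Reference the column explicitly
by_dollar <- DFhyper[order(DFhyper$p_value), ]

# Let with() resolve p_value inside the data frame
by_with <- DFhyper[with(DFhyper, order(p_value)), ]

# Both approaches produce the same ordering
same_result <- identical(by_dollar, by_with)
```

Both variants avoid attach(), which leaves the columns on the search path until detach() is called.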
The following is the code I need an explanation for:
for (i in id) {
data <- read.csv(files[i] )
c <- complete.cases(data)
naRm <- data[c, ]
completeCases <- rbind(completeCases, c(i, nrow(naRm)))
As I understand it, the variable c here stores multiple logical values. The line after that seems foreign to me. How does data[c, ] work?
FYI, I am an R newbie.
complete.cases looks for all rows that are "complete", that is, rows that have no missing values, and returns one logical value per row. data[c, ] then uses that logical vector as a row index, keeping only the rows where c is TRUE. Here is the man page. Thus the completeCases object will tell you the number of "complete" rows in each file you have just read. You really don't need to store the value of i in the rbind call though, as it is just the row number, so it is redundant. A vector would do just fine for this application.
It also looks like you are missing a closing bracket, or this isn't a complete chunk of code.
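To see the logical indexing on its own, here is a minimal sketch with a made-up three-row data frame in place of the CSV files:

```r
# Toy data: only the first row has no missing values
data <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

# One TRUE/FALSE per row: TRUE where every column is non-missing
keep <- complete.cases(data)

# A logical vector in the row position keeps the rows where it is TRUE;
# the empty column position means "all columns"
naRm <- data[keep, ]

n_complete <- nrow(naRm)
```

The name keep is used instead of c here only to avoid shadowing the base function c().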