What's wrong with the order function in a data frame? (R)

DF <- data.frame(CpGId, tframe$t, tframe$p, q)
dimnames(DF)[[2]] <- c("CpGId", "t_value", "p_value", "q_value")
DFhyper <- DF[with(DF, q_value < 0.05 & t_value> 0), ]
DFhyper <- data.frame(DFhyper, row.names = NULL)
DFhyper <- DFhyper [order(p_value), ]
The first four lines work fine, so why does R then give an error saying that object 'p_value' is not found?

R evaluates the expression inside the brackets in the current environment, without paying any attention to the data frame it is going to index. When you type
DFhyper[order(p_value),]
R will look for p_value in the current scope (probably the global environment). Because p_value exists only as a column inside the data frame, R cannot find it there, so you need to tell it where the column is located.
Either
DFhyper[order(DFhyper$p_value),]
or
DFhyper[with(DFhyper,order(p_value)),]
(or the nearly equivalent with(DFhyper, DFhyper[order(p_value), ])) will work. The first command tells R explicitly that you are referencing the column in the data frame; the second tells R to look for the variable in the data frame first, and only then in the enclosing scope.
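A small self-contained sketch of the difference, using a made-up three-row data frame in place of the real DFhyper (the CpG IDs and p-values are invented for illustration), run in a fresh session where no free-standing p_value object exists:

```r
# Invented toy data standing in for DFhyper
DFhyper <- data.frame(CpGId = c("cg1", "cg2", "cg3"),
                      p_value = c(0.03, 0.001, 0.02),
                      stringsAsFactors = FALSE)

# Unqualified: R looks for a free-standing object called p_value and fails
res <- tryCatch(DFhyper[order(p_value), ],
                error = function(e) conditionMessage(e))
print(res)   # the "object 'p_value' not found" message

# Qualified with $ or evaluated inside with(): both find the column
by_dollar <- DFhyper[order(DFhyper$p_value), ]
by_with   <- DFhyper[with(DFhyper, order(p_value)), ]
print(identical(by_dollar, by_with))   # TRUE
print(by_dollar$CpGId)                 # "cg2" "cg3" "cg1"
```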
Finally, you can attach the data frame to the search path, executing
attach(DFhyper)
DFhyper[order(p_value),]
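The attach command places a copy of the data frame's columns on the search path. It can be useful when you have many operations on the data frame's columns and don't want to keep referencing the data frame. Note that later changes to DFhyper are not reflected in the attached copy, and objects with the same names in the global environment take precedence over the attached columns. You can detach it with detach(DFhyper) when you are done.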
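One caveat worth seeing in action: attach() works on a copy of the data frame taken when it is attached, so later changes to the data frame are not visible through the attached names. A minimal sketch with an invented data frame named toy:

```r
toy <- data.frame(p_value = c(3, 1, 2))

attach(toy)
toy$p_value <- toy$p_value * 10   # changes the data frame in the global environment only
seen <- p_value                   # found on the search path: still the attached copy
detach(toy)

print(seen)          # 3 1 2
print(toy$p_value)   # 30 10 20
```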

It needs to be
DFhyper <- DFhyper[order(DFhyper$p_value), ]

Related

Problems with renaming columns via variables in R

I'm having trouble with a specific problem. I have a dataset of many matrices whose columns all show up as V1, i.e. their column names are essentially NULL. I'm trying to write a loop that replaces these with column names from a list, but I'm running into some issues.
To break this down to its simplest form, this code isn't working as I'd expect:
nameofmatrix <- paste('column_', i, sep = "")
colnames(eval(as.name(nameofmatrix))) <- c("test")
I would expect this to take the object column_1, for example, and (in the second line) set its column name to "test".
I tried to break this down further. For example, if I run print(eval(as.name(nameofmatrix))) I get the object's columns and rows printed as expected, and if I run print(colnames(eval(as.name(nameofmatrix)))) I get NULL for the column header, also as expected (since it was set as V1).
I've even tried typing in the object name manually, e.g. colnames(column_1) <- c("test"), and this successfully renames the column. But once the name comes from a variable, as shown above, it no longer works. I'm having difficulty finding a way to rename several matrix columns after they have been created this way. Does anyone have any advice or suggestions?
Note: the error I receive when trying to run this is
Error in eval(as.name(nameofmatrix)) <- `*vtmp*` : could not find function "eval<-"
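We can retrieve an object's value with get (or, for several objects, retrieve them all as a list with mget), rename the columns of each element of the list, and then write the updated objects back to the global environment with list2env. Here newnames is the character vector of new column names: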
list2env(lapply(mget(nameofmatrix), function(x) {
  colnames(x) <- newnames; x   # rename, then return the modified copy
}), envir = .GlobalEnv)
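A runnable sketch of the mget()/list2env() approach, assuming two one-column matrices that follow the question's column_<i> naming pattern; the matrix contents and the replacement name "test" are invented for illustration:

```r
column_1 <- matrix(1:3, ncol = 1)
column_2 <- matrix(4:6, ncol = 1)

nameofmatrix <- paste0("column_", 1:2)   # "column_1" "column_2"
newnames <- "test"                        # one name per matrix column

renamed <- lapply(mget(nameofmatrix, envir = .GlobalEnv), function(x) {
  colnames(x) <- newnames   # modify the local copy inside the function
  x                         # return the modified copy
})
invisible(list2env(renamed, envir = .GlobalEnv))   # overwrite column_1, column_2

print(colnames(column_1))   # "test"
print(colnames(column_2))   # "test"
```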
It can also be done with assign
data(mtcars)
nameofobject <- 'mtcars'
assign(nameofobject, `colnames<-`(get(nameofobject),
c('mpg1', names(mtcars)[-1])))
Now, check the names of 'mtcars'
names(mtcars)[1]
#[1] "mpg1"
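Since the question asked for a loop over column_<i> objects with names taken from a list, here is a sketch of the same get()/assign() idea inside a for loop. The matrices and the names in new_names are hypothetical:

```r
column_1 <- matrix(1:3, ncol = 1)
column_2 <- matrix(4:6, ncol = 1)
new_names <- list("alpha", "beta")   # hypothetical: one column name per matrix

for (i in seq_along(new_names)) {
  nameofmatrix <- paste0("column_", i)
  m <- get(nameofmatrix)              # fetch the object by its name
  colnames(m) <- new_names[[i]]       # rename the local copy
  assign(nameofmatrix, m)             # write it back under the same name
}

print(colnames(column_2))   # "beta"
```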

R Generic References to Data Frames and Variables

I would like to know how to make a reference to a data frame and variable generic, please. Say I have a data frame named 's' and a variable in that data frame named 'Y'.
Regular R code:
look = s$Y
What I would like to do:
data = s
variable = Y
look = data$variable (which functions the same as look = s$Y)
Any thoughts? The reason I would like to do this is that I have s$Y throughout my code, and later I may want to change s to t (or Y to some other variable) without having to go through all of my code manually replacing s$Y with t$Y wherever I need it changed.
Thanks!
This is why the $ operator is considered poor practice inside function definitions: it "locks you in" to a particular spelling of a column name. You are not going to do this, however:
variable = Y
Rather you are going to do this:
variable = "Y"
That is because the first version would make the R interpreter try to find a value for the symbol Y, searching the current environment, its enclosing environments and then the search path (roughly, the global environment followed by the attached packages). In the second version, "Y" is its own value and no lookup is needed. With that fundamental confusion corrected, you would now do this:
look <- data[[ variable ]] # although using 'data' as a name is another "poor-practice"
R then looks up the value of variable, finds it in the global environment (the character string "Y"), and returns the column named "Y" from the data frame (here a copy of s). Column names are not first-class objects in R, whereas named data frames are: the "names" of columns are not true R names, even though they are called colnames. The $ operator is essentially shorthand for [[ with a literal, unevaluated name. Here's a full transcript to test this:
> s <- data.frame(Y=1:10, X=LETTERS[1:10]); data = s
>
> variable <- "Y"
>
> look1 <- data$Y; look2 <- data[["Y"]]
> identical(look1, look2)
[1] TRUE
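Wrapping [[ in a function gives the generic reference the question asks for: both the data frame and the column name become arguments, so switching from s to another data frame, or from Y to another column, changes only the call. A sketch; get_column and the second data frame s2 are hypothetical names:

```r
# Hypothetical helper: return column `variable` (a string) of data frame `dat`
get_column <- function(dat, variable) {
  dat[[variable]]
}

s  <- data.frame(Y = 1:10, X = LETTERS[1:10])
s2 <- data.frame(Y = 11:20, Z = letters[1:10])

look1 <- get_column(s, "Y")    # same as s$Y
look2 <- get_column(s2, "Y")   # same as s2$Y; only the argument changed
print(identical(look1, s$Y))   # TRUE
```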
The confusion that this "non-standard evaluation" (NSE) shorthand has caused new users appears to have been one of the motivations first for ggplot2's aes function and later for the dplyr package and the tidyverse bundle of packages. Those packages allow unquoted names to refer to columns.
In addition to @42-'s answer, you can reference columns dynamically like this:
colName <- "something"
myDataFrame[, colName]
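One detail of the [, colName] form: when a single column is selected, the result is simplified to a plain vector by default, while drop = FALSE keeps it as a one-column data frame. A sketch with an invented myDataFrame:

```r
myDataFrame <- data.frame(something = 1:3, other = c("a", "b", "c"))
colName <- "something"

v <- myDataFrame[, colName]                 # integer vector 1 2 3
d <- myDataFrame[, colName, drop = FALSE]   # one-column data frame
print(class(v))   # "integer"
print(class(d))   # "data.frame"
```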
Edit: Since you also asked about referencing data.frames dynamically, @Rich Scriven suggested writing a function that takes the data.frame as an argument, which is one working solution. You can also just load the data you need at the top of your script, which is easy to change on the fly:
fileName <- "file1.csv"
data <- read.table(fileName, header = TRUE, sep = ",", stringsAsFactors = FALSE)
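A self-contained check of that loading step, writing a small invented CSV to a temporary file first. read.table() splits on whitespace by default, so a comma-separated file needs sep = "," (read.csv() sets this for you):

```r
fileName <- tempfile(fileext = ".csv")   # stand-in for "file1.csv"
write.csv(data.frame(Y = 1:3, X = c("a", "b", "c")), fileName, row.names = FALSE)

data <- read.table(fileName, header = TRUE, sep = ",", stringsAsFactors = FALSE)
print(names(data))   # "Y" "X"
print(data$Y)        # 1 2 3
```

To switch datasets, only the fileName line changes.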
As per @42- above, the best choice seems to be the packages referenced. Using a function comes close, but doesn't seem to allow 'data' and 'variable' to be generic in 'data$variable'.
Thanks everyone!

Check whether object exists in R

I am brand new to R, so please excuse anything that may seem overly obvious.
I am using apriori to evaluate frequent item sets. When I execute the code below and my subset call returns items, everything works great. The problem is when the subset returns nothing (no rules match the criteria). In that case I get "object 'rulesMatchLHS' not found" when trying to construct a data frame for output. Can you please tell me what I am doing wrong when checking the validity of rulesMatchLHS on the ifelse line?
rules <- apriori(trnew, parameter=list(supp=0.01, conf=0.5, minlen=2, maxlen=2))
rulesMatchLHS <- subset(rules, lhs %ain% dataset1)
ifelse(exists(rulesMatchLHS),
OutputClient <- data.frame(lhs=labels(lhs(rulesMatchLHS))$elements, rhs=labels(rhs(rulesMatchLHS))$elements,rulesMatchLHS@quality),
OutputClient <- data.frame())
View(OutputClient)
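subset returns an empty object (for a data frame, one with zero rows), so the object does exist. Also, exists requires its argument to be a character string, i.e. the object's name, not the object itself. You might want to test nrow instead of exists, and use a plain if/else rather than ifelse, which is meant for element-wise selection on vectors. Here is a simple example to demonstrate: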
test <- subset(iris, Species == "Fake")
typeof(test)      # "list": a data frame with zero rows
exists("test")    # TRUE
nrow(test) == 0   # TRUE
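A sketch of the corrected check using the iris example: test the row count rather than calling exists() on the object, and use a plain if/else, since ifelse() is meant for element-wise selection on vectors rather than for choosing which assignment to run. For the arules object in the question, length(rulesMatchLHS) is assumed to play the role of nrow(); that part is not tested here:

```r
test <- subset(iris, Species == "Fake")   # matches nothing: 0 rows, but the object exists

# exists() wants a name (character string); passing the object itself is an error
bad_call_errors <- tryCatch({ exists(test); FALSE }, error = function(e) TRUE)

if (nrow(test) > 0) {
  OutputClient <- test           # use the matching rows
} else {
  OutputClient <- data.frame()   # nothing matched: fall back to an empty data frame
}
print(bad_call_errors)     # TRUE
print(dim(OutputClient))   # 0 0
```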

R: add column to dataframe, named based on formula

More 'feels like it should be' simple stuff which seems to be eluding me today. Thanks in advance for assistance.
Within a loop, that's within a function, I'm trying to add a column, and name it based on a formula.
I can bind a column & its name is taken from the bound object: data<-cbind(data,bothdata)
I can bind a column & manually name the bound object: data<-cbind(data,newname=bothdata)
I can bind a column which is the product of an equation & manually name the bound object: data<-cbind(data,newname2=bothdata-1)
Or another way: data <- transform(data, newColumn = bothdata-1)
What I can't do is have the name be the product of a formula. My actual formula-derived example name is paste("E_wgt",rev(which(rev(Esteps) == q))-1,"%") & equation for column: baddata - q.
A simpler one: data<-cbind(data,paste("magic",100,"beans")=bothdata-1). This fails because cbind isn't expecting the = even though it's fine in previous examples. Same fail for transform.
My first thought was assign, but while I've used it successfully to create formula-named objects, I can't see how to get it to work for formula-named columns.
If I use an intermediary step to put the naming formula in an object container then use that, e.g.:
name <- paste("magic",100,"beans")
data<-cbind(data,name=bothdata-1)
the column name is "name", not "magic100beans". If I assign the equation result to a formula-named object:
assign(paste("magic",100,"beans"),bothdata-1)
Then try to cbind that via get:
data<-cbind(data,get(paste("magic",100,"beans")))
The column is called "get(paste("magic",100,"beans"))". Boo! Any thoughts, anyone? It occurs to me that I can cbind and then separately run colnames(data)[ncol(data)] <- paste("magic",100,"beans"), which I guess I'll settle for for now, but I'd still be interested to know whether there is a direct way.
Thanks.
Chances are that cbind is overkill for your use case. In almost every instance, you can simply mutate the underlying data frame using data$newname2 <- data$bothdata - 1.
In the case where the name of the column is dynamic, you can just refer to it using the [[ operator -- data[["newcol"]] <- data$newname + 1. See ?'[' and ?'[.data.frame' for other tips and usages.
EDIT: Incorporated @Marek's suggestion to use [["newcol"]] instead of [, "newcol"].
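A sketch of the [[ approach with a computed name, using an invented data frame. Because [[ takes any string, it also accepts names that are not syntactic (spaces, %), like the question's E_wgt names; with $ such a name would need backticks:

```r
data <- data.frame(bothdata = c(10, 20, 30))

name <- paste("magic", 100, "beans")    # "magic 100 beans" (paste separates with spaces)
data[[name]] <- data$bothdata - 1       # new column named by the computed string

print(names(data))                      # "bothdata" "magic 100 beans"
print(data$`magic 100 beans`)           # 9 19 29
```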
It may help you to know that data$col1 is the same as data[,"col1"], which is the same as data[,x] if x is "col1". This is how I usually access and set columns programmatically.
So this should work:
name <- paste0("magic", 100, "beans")   # "magic100beans"; plain paste() would insert spaces
data[, name] <- bothdata - 1
Note that you don't have to use the temporary variable name. This is equivalent to:
data$magic100beans <- bothdata - 1
which is itself equivalent, for a data.frame, to:
data<-cbind(data, magic100beans=bothdata-1)
Just so you know, you could also set the names afterwards:
old_names <- names(data)
name <- paste("magic",100,"beans")
data <- cbind(data, bothdata-1)
data <- setNames(data, c(old_names, name))
# or
names(data) <- c(old_names, name)
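On the question's wish for a direct way: one option (a sketch, not taken from the answers above) is to build the new column as a one-column data frame, name it with setNames(), and cbind() that, so the name and the data are added in a single expression. The data are invented:

```r
data <- data.frame(bothdata = c(10, 20, 30))
name <- paste("magic", 100, "beans")

# setNames() names the one-column data frame before it is bound on
data <- cbind(data, setNames(data.frame(data$bothdata - 1), name))

print(names(data))   # "bothdata" "magic 100 beans"
```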

Need an explanation for a particular R code snippet

The following is the code I need an explanation for:
for (i in id) {
data <- read.csv(files[i] )
c <- complete.cases(data)
naRm <- data[c, ]
completeCases <- rbind(completeCases, c(i, nrow(naRm)))
As I understand it, the variable c here stores a vector of logical values. The line after that, however, is foreign to me. How does data[c, ] work?
FYI, I am an R newbie.
complete.cases looks for all rows that are "complete", i.e. have no missing values (see ?complete.cases for the man page). data[c, ] then uses that logical vector as a row index, keeping only the rows where c is TRUE (and all columns, since the column index is left empty). Thus the completeCases object will tell you the number of complete rows in each file you have just read. If id is simply 1, 2, 3, ..., you don't really need to store i in the rbind call, as it just equals the row number; a vector would do fine for this application.
Also, it looks like you are missing a closing brace, or this isn't a complete chunk of code.
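To make the data[c, ] step concrete: complete.cases() returns one TRUE/FALSE per row, and a logical vector in the row position of [ keeps exactly the rows marked TRUE. A sketch on an invented three-row data frame; the name c mirrors the question (R still finds the function c() when it is called, because function lookup skips non-function objects):

```r
data <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

c <- complete.cases(data)   # TRUE FALSE FALSE: only row 1 has no NA
naRm <- data[c, ]           # logical row index: keep rows where c is TRUE, all columns
print(naRm)
print(nrow(naRm))           # 1
```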
