Naming different variables and using i to subset a file - r

I want to go through a vector, name all variables with i and use i to subset a larger file.
Why does this not work?
x <- c(seq(.1,.9,.1),seq(.9,1,.01))
doplot <- function(y)
{
  for (i in unique(y))
  {
    paste("f_", i, sep = "") <- (F_agg[F_agg$Assort==i,])
  }
}
doplot(x)

There are several problems here. First of all, on the left hand side of <- you need a symbol (well, or a special function, but let's not get into that now). So when you do this:
a <- "b"
a <- 15
then a will end up holding 15: R does not first evaluate a to "b" and then assign 15 to b.
Then, if you create variables within a function, they will be (by default) local to that function, and destroyed at the end of the function.
Third, it is not good practice to create variables this way. (For details I will not go into now.) It is better to put your data in a named list, and then return the list from the function.
Here is a solution that should work, although I cannot test it, because you did not provide any test data:
doplot <- function(y) {
  lapply(unique(y), function(i) {
    F_agg[F_agg$Assort == i, ]
  })
}
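To make the "named list" idea concrete, here is a minimal sketch. The F_agg data frame below is a made-up stand-in (the real data was not posted), and the "f_" prefix simply mirrors the names the question tried to create.

```r
# Hypothetical stand-in for F_agg; the real data was not provided.
F_agg <- data.frame(Assort = c(0.1, 0.1, 0.2, 0.3), value = 1:4)

doplot <- function(y) {
  vals <- unique(y)
  subsets <- lapply(vals, function(i) F_agg[F_agg$Assort == i, ])
  # Name each element "f_<value>" so it can be looked up later.
  setNames(subsets, paste0("f_", vals))
}

res <- doplot(c(0.1, 0.2, 0.2))
names(res)        # "f_0.1" "f_0.2"
nrow(res$f_0.1)   # 2
```

Each subset is then available as res$f_0.1, res[["f_0.2"]], and so on, without creating separate variables.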

Related

How do I get an if else statement embedded in a for loop to choose one option over another if ANY of the submissions to the if statement are true?

I am working with microarray data within an ExpressionSet object downloaded from Gene Expression Omnibus. The rows of the expression data in this object are labeled with probe names, but for downstream analysis I really need the gene symbols.
Thankfully, the individuals that compiled this dataset included the corresponding gene symbols in the metadata that accompanies this kind of object.
I am trying to write a for loop within a function that looks at the list of variable labels (effectively row names for the metadata), determines whether a column called "GENE_SYMBOL" is present, then either writes those gene symbols to a vector, or moves on and converts the probe names to gene symbols using gprofileR.
I don't want my if else statement to run for each iteration of my for loop; I just want it to run after the if statement has determined whether any of the row names are "GENE_SYMBOL".
So far I have written the for loop with the if statement, but I can't figure out how to express the condition: if ANY of the column names match, then do A; if none match, then do B.
nums <- as.data.frame(matrix(0, ncol = 27, nrow = 12))
feature_headers <- c(letters, "GENE_SYMBOL")
colnames(nums) <- feature_headers
for (i in seq_along(feature_headers)) {
  if (feature_headers[i] == "GENE_SYMBOL") {
    gene_symb <- nums[["GENE_SYMBOL"]]
  } else {
    # what else it does is more involved than this question needs, so
    # I just wrote out something for the function to say
    cat("boohoo no genes for you\n")
  }
}
Any help you could provide would be much appreciated and let me know if you need more information.
In your specific situation, R has a handy %in% operator that you can use to check this:
if ("GENE_SYMBOL" %in% feature_headers) {
#...
}
As a more general rule, if your goal is "if loop A meets condition B, then do action C", you can follow this pattern:
found <- FALSE
for (loopStatement) {
if(condition) {
found <- TRUE
break
}
}
if(found) {
doActionC()
}
This way, if you get through the whole list without finding the label, found is still FALSE, but if you do find the label, you stop immediately instead of doing a bunch of unnecessary checking. This is essentially the gist of what's happening under the hood with %in%, and %in% is faster to write and probably faster to run. The pattern is still good to know for other situations, though.
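A minimal runnable instance of that pattern, using the feature_headers vector from the question. The any() call at the end is shown only as a vectorized shortcut that gives the same answer here.

```r
feature_headers <- c(letters, "GENE_SYMBOL")

found <- FALSE
for (header in feature_headers) {
  if (header == "GENE_SYMBOL") {
    found <- TRUE
    break  # stop at the first match
  }
}

# Vectorized equivalent: TRUE if at least one element matches.
found_any <- any(feature_headers == "GENE_SYMBOL")
```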
Also, the %in% operator can be used to check if the elements of one list are shared with another list!
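As a small sketch of that vectorized use: the wanted vector below is made up for illustration, with "ENTREZ_ID" standing in for a label that is not present.

```r
wanted <- c("GENE_SYMBOL", "ENTREZ_ID")   # hypothetical labels to look for
feature_headers <- c(letters, "GENE_SYMBOL")

hits <- wanted %in% feature_headers   # one TRUE/FALSE per element of wanted
shared <- wanted[hits]                # here the same as intersect(wanted, feature_headers)

if (any(hits)) {
  message("Found: ", paste(shared, collapse = ", "))
}
```

any(hits) answers "is at least one present?", while all(hits) would answer "are they all present?".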
You can add a boolean variable to record whether your condition is hit in the for loop, and then break to avoid unnecessary computation:
nums <- as.data.frame(matrix(0, ncol = 27, nrow = 12))
feature_headers <- c(letters, "GENE_SYMBOL")
colnames(nums) <- feature_headers
found <- FALSE
for (i in seq_along(feature_headers)) {
  if (feature_headers[i] == "GENE_SYMBOL") {
    found <- TRUE
    break
  }
}
# Keep "} else {" on one line: at top level, an else on its own line is a syntax error.
if (found) {
  dosomething()
} else {
  dosomethingelse()
}

Check if column exists and if it does check something about it

Hi, I want to check whether a column exists in a data.frame and, only if it does, check another condition.
I know I can use a nested if statement as I have in the example.
This is normally for checking inputs to functions. This is a working example which gives me the output I want, I just was wondering if there is a smarter way, as this can get messy especially if I am doing it for a number of conditions. My example:
testfun <- function(dat,...){
library(dplyr)
if("Site" %in% colnames(dat)){
#for example check number of sites, this condition could be anything though
if(n_distinct(dat$Site) > 1) stop ("Function must have site specific data")
}
#do stuff
return(1)
}
testdf1 <- data.frame(x = 1:10, y = 1:10)
testdf2 <- data.frame(x = 1:10, y = 1:10,Site = "A")
testdf3 <- data.frame(x = 1:10, y = 1:10,Site = rep(c("A","B"),each = 5))
testfun(testdf1)
testfun(testdf2)
testfun(testdf3)
Edit with a bit more context: For this example, the reason is that the user may input data that is site specific and therefore doesn't have a Site column (i.e. they have a data.frame with data for only one site, so they never specified the site as a column), or they might be using a data.frame that has data for a number of sites specified in a column. So if there is no Site column, it is safe to assume that the data is for one site and it is valid to continue calculations, but if there is a Site column I have to check that it has only one distinct value (e.g. it might have been filtered on this column before applying the function, or applied through plyr::ddply).
There are a lot of other cases, however, where I want to check that the input data to a function is of the expected form, and if the input is a data.frame this often means checking for column names and something about each column.
You can decide if this is a smarter way or not but one way is by separating the logic using map_if. Here we check the basic condition ("Site" %in% colnames(dat)) in predicate part and based on that we call two functions one for TRUE and other for FALSE. We still check similar conditions but by keeping the functions separate we can keep the code clean and it is easy to understand which part is doing what.
library(dplyr)
library(purrr)
testfun <- function(dat, ...) {
unlist(map_if(list(dat), "Site" %in% colnames(dat), true_fun, .else = false_fun))
}
true_fun <- function(dat) {
if(n_distinct(dat$Site) > 1) stop ("Function must have site specific data")
return(1)
}
false_fun <- function(dat) { return(1) }
testfun(testdf1)
#[1] 1
testfun(testdf2)
#[1] 1
testfun(testdf3)
# Error in .f(.x[[i]], ...) : Function must have site specific data
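For comparison, a minimal base R sketch that keeps the check in a single condition by relying on the short-circuit behaviour of &&: the right-hand side is only evaluated when the column exists. The helper name check_site is an assumption for illustration, and length(unique()) stands in for n_distinct so that no package is needed.

```r
# check_site is a hypothetical helper name, not part of any package.
check_site <- function(dat) {
  # && short-circuits: dat$Site is only inspected when the column exists.
  if ("Site" %in% colnames(dat) && length(unique(dat$Site)) > 1) {
    stop("Function must have site specific data")
  }
  invisible(TRUE)
}

testdf1 <- data.frame(x = 1:10, y = 1:10)
testdf3 <- data.frame(x = 1:10, y = 1:10, Site = rep(c("A", "B"), each = 5))

check_site(testdf1)   # passes silently: no Site column
failed <- inherits(try(check_site(testdf3), silent = TRUE), "try-error")
```

Several such one-line checks can be stacked at the top of a function without nesting, which may scale better when there are many conditions.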

Change all R columns names using a reference file

I am trying to rename columns in a data frame in R. However, the renaming involves circular references, which cannot be avoided, and I would like a solution to this problem. One idea was to rename a column and move it to a new data frame, thereby avoiding the circular reference, but I have not been able to make that work.
The renaming reference is as follows:
The current function I am using is as follows:
standard_mapping <- function(mapping.col, current_name, standard_name, data){
for(i in 1:nrow(mapping.col)) {
# i =32
print(i)
eval(parse(text = paste0("std.name = mapping.col[",i,",'",new_name,"']")))
eval(parse(text = paste0("data.name = mapping.col[",i,",'",old_name,"']")))
if(data.name %in% colnames(data)){
setnames(data, old=c(data.name), new = c(std.name))
}
}
return(data)
}
mapping.col refers to the mapping shown in the image.
You can rename multiple columns at the same time, and there's no need to move the data itself that's stored in your data.frame. If you know the right order, you can just use
names(data) <- mapping.col$new_name
If the order is different, you can use match to first match them to the right positions:
names(data) <- mapping.col$new_name[match(names(data), mapping.col$old_name)]
By the way, names and other attributes are always set through some form of assignment. setNames returns a modified object, which still needs to be assigned.
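A small sketch of the match approach on made-up data, including a circular pair (a becomes b and b becomes a). The fallback for columns missing from the mapping is an addition for illustration: with match alone, those columns would get NA as their name.

```r
# Hypothetical data and mapping; the original mapping image is not available.
data <- data.frame(a = 1:2, b = 3:4, c = 5:6)
mapping.col <- data.frame(
  old_name = c("a", "b"),
  new_name = c("b", "a"),
  stringsAsFactors = FALSE
)

idx <- match(names(data), mapping.col$old_name)
# Keep the current name wherever the mapping has no entry (idx is NA).
names(data) <- ifelse(is.na(idx), names(data), mapping.col$new_name[idx])

names(data)   # "b" "a" "c"
```

Because all new names are computed from the old names before any assignment happens, the circular pair is swapped in one step and no intermediate data frame is needed.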

R assign a list of values to a list of objects

Thank you for trying to help. I am happy to be corrected on all R misdemeanors.
I am not sure that I was entirely clear with my earlier post as below, so I will hope to clarify:
In the R console, I use source() (etc.) to call a .R file.
Code within the .R file uses variables (e.g. for 'extracted info') ex1, ex2, ex3. These may hold strings or (a string of) numbers pulled from text.
In line with your guidance, I've renamed my function to 'reset' (and ?reset indicates that no other occurrences are in scope). I'm passing both x and y from outside the function:
#send variables ex1, ex2, ex3 together with location, loc and parse, prs to be reset with 0
reset(x<-c(loc,prs,ex1,ex2,ex3),y<-rep(c(0),length(x))) #repeats 0 in y variable as many times as there are entries for x
reset<-function(x,y){
print(c("resetting ",x," with ", y))
if (length(x) == length(y)) {x <- y
print(paste(x,"=",y),sep="") #both x and y should now be equal (to y)
} else {
paste("list lengths differ: x=",length(x)," y=",length(y),sep="")
}
}
Now both x and y are 0 but ex1, ex2 and ex3 still contain the previous values
I would like ex1, ex2 and ex3 all to be 0 before they are used in a subsequent section of code, so they don't contaminate extracted data with previous values such as:
loc<-str_locate(data[i],"=")
prs<-str_locate(data[i],",")
#extract data from the end of loc to before the occurrence of prs
ex1<-str_sub(data[i],loc[2]+1,prs[1]-1)
#cleanup
#below is simplified for example;
#in reality I wish to send ex1:ex(n) to be reset with values val1:val(n)
The desired outcome would be that back in the Rconsole >ex1 should now return 0.
Hope you can understand my dilemma and possibly help.
Say my code uses some variables to hold data extracted from a string using Stringr str_sub. The variables are temporary in that I use the values to construct other strings then they should be freed up to be used in an upcoming test: i.e. if (test==true){extract<-str_sub(string, start, end)}
For a later test, I would like extract==0; simple enough, but I have a few of these and would like to do it in one fell swoop.
I've used a for loop, but if there is a simpler way, please identify this.
My attempt is using a function:
#For variables loc, prs, ex1 and ex2, set all values to 0
x<-assign(x<-c(loc, prs, ex1, ex2),y<-rep(c(0),length(x)))
#Function
assign <- function(x, y) {
if(length(x)==length(y)){
for (i in 1:length(x)){x[i]<-y[i]}
print(c("Assigned",x[i]))
return (x)
} else { print (c("list lengths differ: x=",length(x)," y=",length(y)))
}
}
The problem being that this returns x as 0, but the list of variables retain their values.
I'm a bit of a noob to both r and SO, so although I've benefitted from SO's bountiful advice on numerous occasions, this is my first question, so please be gentle. I have searched this issue, but have not found what I need in a few hours now. Hope you can help.
Beware of naming a function assign. There is already one in base-r and you will create confusion.
There are a couple of problems with your function besides its name. First, you do not need the for-loop to replace x by y, as this is a basic vectorized operation; just use x <- y. Second, you should wrap your message in paste.
asgn <- function(x, y) {
if(length(x)==length(y)){
## This step is not needed, return(y) is better as #Rick proposed in their now deleted answer
## I am leaving it to show you how the for-loop is not needed
x<-y
return (x)
} else {
print (paste("list lengths differ: x=",length(x)," y=",length(y)))
return(x)
}
}
Then, there are a couple of problems with your function call. You use <- instead of = to specify the arguments. The two are only roughly synonymous when assigning variables; naming a function argument is another matter. Finally, you are trying to use x in the definition of y in the arguments (length(x)), but this is not possible because x is not yet defined at that point, so R looks for x in the parent environment. Note that length(3) evaluates to 1, not 3, so it is safer to build the vector first and take its length:
vals <- c(loc, prs, ex1, ex2)
x <- asgn(x = vals, y = rep(0, length(vals)))
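Note that a function like this works on copies of its arguments, so it can return a vector of zeros but cannot change ex1, ex2 and ex3 themselves. A minimal sketch of two ways to reset several values at once follows; the starting values are made up, and the list name ex is an assumption for illustration.

```r
# Hypothetical stand-ins for the extracted values.
ex1 <- "abc"; ex2 <- "42"; ex3 <- "xyz"

# Option 1: chained assignment resets several variables in one statement.
ex1 <- ex2 <- ex3 <- 0

# Option 2: keep the values in one named list and reset every element at once.
ex <- list(ex1 = "abc", ex2 = "42", ex3 = "xyz")
ex[] <- 0   # the empty brackets keep the names and replace each element
```

With the second option, values are read as ex$ex1 or ex[["ex1"]], and the number of elements can grow without changing the reset line.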

R copying attributes over to another object

I have an initial variable:
a = c(1,2,3)
attr(a,'name') <- 'numbers'
Now I want to create a new variable that is a subset of a and have it carry the same attributes as a. Is there something like a copy.over.attr function that does this without me having to go in and identify which attributes are user defined, etc.? This gets complicated when I have numerous attributes attached to a single variable.
There is mostattributes<-, which receives a list and attempts to set the attributes in that list on the object in its argument. It should be used with caution and care. At the very least, reading the source code will give you some nice ideas on how to check attributes between objects. Here's a little run on your sample vector a. It succeeds since it does not violate any properties of b:
a = c(1,2,3)
attr(a,'name') <- 'numbers'
b <- a[-1]
attributes(b)
# NULL
mostattributes(b) <- attributes(a)
attributes(b)
# $name
# [1] "numbers"
Here's a sample of the source code where names are checked.
if (h.nam <- !is.na(inam <- match("names", names(value)))) {
  n1 <- value[[inam]]
  value <- value[-inam]
}
if (h.dim <- !is.na(idin <- match("dim", names(value)))) {
  d1 <- value[[idin]]
  value <- value[-idin]
}
if (h.dmn <- !is.na(idmn <- match("dimnames", names(value)))) {
  dn1 <- value[[idmn]]
  value <- value[-idmn]
}
attributes(obj) <- value
There is also attr.all.equal. It's not the operation you want, but I think you would benefit from reading that source code too. There are many good checks you can learn about in that one.
Wouldn't a simple attributes(b) <- attributes(a) work?
This will just be executed after creating b from a subset of the data in a, so it's not really a single statement, but should work.
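One caveat with a plain attributes(b) <- attributes(a): if a also carries names or dim, these no longer fit the shorter b and the assignment fails. A minimal sketch of a helper that copies only the other attributes is shown below. The name copy_custom_attrs is hypothetical, and which attributes to skip is an assumption that may need adjusting (for example for class or levels).

```r
# copy_custom_attrs is a hypothetical helper, not a base R function.
copy_custom_attrs <- function(from, to) {
  # Skip attributes tied to the length or shape of the source object.
  keep <- setdiff(names(attributes(from)), c("names", "dim", "dimnames"))
  for (nm in keep) {
    attr(to, nm) <- attr(from, nm, exact = TRUE)
  }
  to
}

a <- c(x = 1, y = 2, z = 3)
attr(a, "name") <- "numbers"

b <- a[-1]                     # subsetting keeps names but drops "name"
b <- copy_custom_attrs(a, b)
attr(b, "name", exact = TRUE)  # "numbers"
```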
