Apply a function on columns of a matrix [closed] - r

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
I'm using a function ("myfunction") from an R library in this way:
myfunction(obj1, obj2, obj3, c("Name1", "Name2"))
where "Name1" and "Name2" are two gene names. Instead of retrieving information about these two genes, I would like to retrieve information on many other genes that are stored in a file with 1000 columns and 100 rows (each of the 100 rows holds a gene name).
In other words, suppose my file is named fl1000. For each column I would like the following code:
myfunction(obj1, obj2, obj3, fl1000[,1])
myfunction(obj1, obj2, obj3, fl1000[,2])
myfunction(obj1, obj2, obj3, fl1000[,3])
....
myfunction(obj1, obj2, obj3, fl1000[,1000])
Since it is impossible to do this manually, how can it be done in a more compact and faster way?

Your function has four arguments: obj1, obj2, obj3, and an unnamed argument that seems to be a vector of two names. It's not clear what those first three objs are. Are they vectors, single elements, or what?
So the first problem is how you could run a function of this sort on a single column of your rectangle of data. To get anything working with apply on that rectangle, you need a function that takes a single vector of 100 elements as its input. Clearly myfunction() is not such a function as it currently stands. However, if that vector can be fed in as, e.g., obj1, and you just need to supply other things as obj2 and obj3, it would be simple to adjust the function so it will work. But unless we have more of an idea of what you are doing, we can't help more.
Edit (after question's edit)
The question still doesn't quite make sense to me, as the function looks like it wants a vector of Name1 and Name2, and you now want to give it a column with 100 values (not just 2).
But putting that aside, perhaps you want:
apply(fl1000, 2, function(x){myfunction(obj1, obj2, obj3, x)})
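To make the apply call above concrete, here is a minimal sketch on toy data. The real myfunction, obj1, obj2 and obj3 are unknown, so the stand-in function and the NULL placeholders below are assumptions made purely for illustration, and the matrix has 5 columns rather than 1000.

```r
# Hypothetical stand-in for myfunction(): it just counts distinct gene names.
myfunction <- function(obj1, obj2, obj3, genes) length(unique(genes))

# Toy data: 100 rows (gene names) by 5 columns instead of 1000.
set.seed(1)
fl1000 <- matrix(sample(paste0("gene", 1:500), 100 * 5, replace = TRUE),
                 nrow = 100, ncol = 5)
obj1 <- obj2 <- obj3 <- NULL  # placeholders; the real objects are unknown

# One call per column; apply() simplifies the results to a vector here.
res_apply <- apply(fl1000, 2, function(x) myfunction(obj1, obj2, obj3, x))

# lapply() over column indices always returns a list, one element per column,
# which is easier to handle when myfunction() returns something complex.
res_list <- lapply(seq_len(ncol(fl1000)),
                   function(j) myfunction(obj1, obj2, obj3, fl1000[, j]))
```

If the real function returns a data frame or a list per call, the lapply form is probably the one to prefer, since apply may try to simplify the results into a matrix.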

Related

good practice to use "$" and run a function in one line in R [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
Today I saw a "strange" thing and I am wondering if it is good practice. Basically there is a list:
testList <- list("columnA" = c(1, 2, 3),
                 "columnB" = c(11,22,33))
and then a function:
calculateMean <- function(input){
  out <- lapply(input, mean)
  return(out)
}
and then this:
resultTest <- calculateMean(testList)$columnA
Question: Is it good practice to refer to a function's result without first storing it in an intermediate variable?
We may use sapply to return a named vector and store it as a single vector for later use. For example, if we want to take the max of that vector, max can be applied directly instead of having to unlist the list.
calculateMean <- function(input){
  out <- sapply(input, mean)
  return(out)
}
-output
calculateMean(testList)
columnA columnB
      2      22
Regarding storing the output, it depends. If we later want to extract 'columnB' as well, we would need to run the function again and extract that element. Instead, save the result as a single object and extract from it as needed.
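A short sketch of the store-once approach with the sapply version. One thing to watch: sapply returns an atomic named vector here, and $ does not work on atomic vectors, so elements are extracted by name with [[ ]] instead.

```r
testList <- list("columnA" = c(1, 2, 3),
                 "columnB" = c(11, 22, 33))

calculateMean <- function(input) {
  sapply(input, mean)  # named numeric vector, one mean per list element
}

# Compute once, then reuse the stored result as often as needed.
means <- calculateMean(testList)

meanA <- means[["columnA"]]  # extract by name; `$` would fail on a vector
meanB <- means[["columnB"]]
largest <- max(means)        # works directly, no unlist() required
```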
You ask if this is good practice. I'd say there are good and bad aspects to it.
On the positive side, it keeps your code simpler than if you defined a new variable to hold calculateMean(testList) when all you are interested in is one element of it. In some cases (probably not yours though) that could save a lot of memory: that variable might hold a lot of stuff that is of no interest, and it takes up space.
On the negative side, it makes your code harder to debug. Keeping expressions simple makes it easier to see when and why things aren't working. Each line of
temp <- calculateMean(testList)
resultTest <- temp$columnA
is simpler than the one line
resultTest <- calculateMean(testList)$columnA
In some situations you could use an informative name in the two-line version to partially document what you had in mind here (not temp!), making your code easier to understand.
If you were trying to single step through the calculation in a debugger, it would be more confusing, because you'd jump from the calculateMean source to the source for $ (or more likely, to the final result, since that's a primitive function).
Since the one-line version is relatively simple in your case, I'd probably use it, but in other situations I might split it into two lines.

Understanding the logic of R code [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
I am learning R through tutorials, but I have difficulties in "how to read" R code, which in turn makes it difficult to write R code. For example:
dir.create(file.path("testdir2","testdir3"), recursive = TRUE)
vs
dup.names <- as.character(data.combined[which(duplicated(as.character(data.combined$name))), "name"])
While I know what these lines of code do, I cannot read or interpret the logic of each line of code. Whether I read left to right or right to left. What strategies should I use when reading/writing R code?
dup.names <- as.character(data.combined[which(duplicated(as.character(data.combined$name))), "name"])
Don't let lines of code like this ruin writing R code for you
I'm going to be honest here. The code is bad. And for many reasons.
Not a lot of people can read a line like this and intuitively know what the output is.
The point is you should not write lines of code that you don't understand. This is not Excel, you do not have but 1 single line to fit everything within. You have a whole deliciously large script, an empty canvas. Use that space to break your code into smaller bits that make a beautiful mosaic piece of art! Let's dive in~
Dissecting the code: Data Frames
Reading a line of code is like looking at a face for familiar features. You can read left to right, middle to out, whatever -- as long as you can lock onto something that is familiar.
Okay you see data.combined. You know (hope) it has rows and columns... because it's data!
You spot a $ in the code, so you know it has to be a data.frame. This is because only lists and data.frames (which are really just lists) allow you to extract columns using $ followed by the column name. Subsetting, by the way, just means looking at a portion of the whole. In R, subsetting for data.frames and matrices can be done using single brackets [ ], within which you will see [row, column]. Thus if we type data.combined[1,2], it gives you the value in row 1 of column 2.
Now, if you knew that the name of column 2 was name you can use data.combined[1,"name"] to get the same output as data.combined$name[1]. Look back at that code:
dup.names <- as.character(data.combined[which(duplicated(as.character(data.combined$name))), "name"])
Okay, so now our eyes should be locked on data.combined[SOMETHING IS IN HERE?!] and slowly be picking out data.combined[ ?ROW? , Oh the "name" column]. Cool.
Finding those ROW values!
which(duplicated(as.character(data.combined$name)))
Anytime you see the which function, it is just giving you locations. An example: for the numeric vector a = c(1,2,2,1), which(a == 1) would give you 1 and 4, the locations of the 1s in a.
Now duplicated is simple too. duplicated(a) (which is just duplicated(c(1,2,2,1))) will give you back FALSE FALSE TRUE TRUE. If we ran which(duplicated(a)) it would return 3 and 4. Now here is a secret you will learn. If you have TRUEs and FALSEs, you don't need the which function to subset! So maybe which was unnecessary here. The same goes for as.character, since duplicated works on both numbers and strings.
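The which and duplicated behaviour described above can be checked in the console with the same vector a. This sketch also shows that a logical vector subsets directly, giving the same result as going through which.

```r
a <- c(1, 2, 2, 1)

ones <- which(a == 1)      # positions where a equals 1: 1 and 4
dup <- duplicated(a)       # FALSE FALSE TRUE TRUE
dup_pos <- which(dup)      # positions of the duplicates: 3 and 4

# Subsetting with the logical vector or with the positions is equivalent here.
by_logical <- a[dup]       # 2 1
by_position <- a[dup_pos]  # 2 1
```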
What You Should Be Writing
Who am I to tell you how to write code? But here's my take.
Don't mix up ways of subsetting: use EITHER data.frame[,column] or data.frame$column...
The code could have been written a little bit more legibly as:
dupes <- duplicated(data.combined$name)
dupe.names <- data.combined$name[dupes]
or equally:
dupes <- duplicated(data.combined[,"name"])
dupe.names <- data.combined[dupes,"name"]
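As a sanity check, here is a sketch comparing the original one-liner with the two-step version. The contents of data.combined are made up for illustration (the real data is not shown in the question), and the name column is assumed to be character rather than factor.

```r
# Made-up data frame standing in for the real data.combined.
data.combined <- data.frame(name = c("Ann", "Bob", "Ann", "Cy", "Bob"),
                            age = c(30, 41, 30, 25, 41),
                            stringsAsFactors = FALSE)

# The original one-liner from the question.
original <- as.character(
  data.combined[which(duplicated(as.character(data.combined$name))), "name"]
)

# The two-step rewrite: flag the duplicates, then pick them out.
dupes <- duplicated(data.combined$name)
dupe.names <- data.combined$name[dupes]

same <- identical(original, dupe.names)
```

If name were stored as a factor, the rewrite would return a factor, so one as.character call on the result might still be wanted in that case.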
I know this was lengthy but I hope it helps.
An easier way to read any code is to break it up into its components.
dup.names <-
  as.character(
    data.combined[which(
      duplicated(
        as.character(
          data.combined$name
        )
      )
    ), "name"]
  )
For each of the functions - those parts with rounded brackets following them e.g. as.character() you can learn more about what they do and how they work by typing ?as.character in the console
Square brackets [] are used to subset data frames, which are stored in your environment (the box to the upper right if you're using R within RStudio contains your values as well as any defined functions). In this case, you can tell that data.combined is the name that has been given to such a data frame in this example (type ?data.frame to find out more about data frames).
"Unwrapping" long lines of code can be daunting at first. Start by breaking it down into parentheses, brackets, and commas. Parentheses directly tacked onto a word indicate a function, and any commas that lie within them (unless they are part of another nested function or bracket) separate arguments, which contain parameters that modify the way the function behaves. We can reduce your 2nd line to an outer function as.character and its arguments:
dup.names <- as.character(argument_1)
Just from this, we know that dup.names will be assigned a value with the data type "character" off of a single argument.
Two functions in the first line, file.path() and dir.create(), contain a comma to denote two arguments. Arguments can either be a single value or specified with an equal sign. In this case, the output of file.path happens to perform as argument #1 of dir.create().
file.path(argument_1,argument_2)
dir.create(argument_1,argument_2)
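One way to see the nesting is to evaluate the inner call first and store its result, then pass that to the outer call. In this sketch the directory is created under a temporary location (an assumption, to avoid touching the working directory) instead of the relative path used in the question.

```r
# A unique scratch location; the question used a relative path instead.
base <- tempfile("demo")

# Inner call first: file.path() only builds a path string.
path <- file.path(base, "testdir2", "testdir3")

# Outer call second: dir.create() receives that string as its first argument
# and recursive = TRUE as a named argument.
created <- dir.create(path, recursive = TRUE)

exists_now <- dir.exists(path)
```

Reading nested calls from the inside out like this works for any R expression: each inner result becomes an argument of the call that wraps it.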
Brackets are a way of subsetting data frames, with the general notation of dataframe_object[row,column]. Within your second line is a dataframe object, data.combined. You know it's a dataframe object because of the brackets directly tacked onto it, and knowing this allows you to see that any functions inside those brackets are contributing to subsetting this data frame.
data.combined[row, column]
So from there, we can see that the internal functions within this bracket will produce an output that specifies the rows of data.combined that will contribute to the subset, and that only columns with name "name" will be selected.
Use the help function to start to unpack these lines by discovering what each function does, and what its arguments are.

Correct way for R function to reference columns in user's dataframe? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I've got a function which operates on 5 columns of a data frame. Eventually I'd like to release this function so that others can use it with their own data.
What is the idiomatic R way to design a function to allow the user to pass in the 5 required columns?
I.e. my function wants to work on a dataframe which contains (at least) columns 'a', 'b', 'c', 'd', 'e', but in the user's data frame they are labelled differently, for example as 'foo', 'bar', etc...
There are several possibilities, although none seem particularly elegant to me:
Require the user to pass in the columns individually as 5 separate vector arguments
Require the user to name their columns in a specified way and pass in the data frame as a single argument
Require the user to order their columns in a specified way and pass in the data frame as a single argument
Pass in the data frame along with a vector consisting of the names of the required columns in this data frame
There is no one "best" way to do this. The advantages of the different methods vary depending on the situation. In this instance, my personal preference is to give the function (at least) two arguments: the data.frame as "data" and a character vector containing the names of the variables.
Then, if you are applying the same operation to each of these, you may supply the character vector as the main argument to sapply or lapply.
myFunc <- function(data=NULL, variables=NULL) {
  if(is.null(data)) stop("need a dataset")
  if(is.null(variables) || !is.character(variables)) stop("variables improperly supplied")
  # doStuff() is a placeholder for whatever you do to each column
  sapply(variables, FUN=function(i) doStuff(data[, i]))
}
It is usually better to provide the names of the variables rather than their position, as the position might change across datasets.
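Here is a runnable sketch of that pattern. The function name colMeansByName, the choice of mean as the per-column operation, and the extra check for missing columns are all illustrative assumptions, not part of the original answer.

```r
# Hypothetical example: compute the mean of each user-named column.
colMeansByName <- function(data = NULL, variables = NULL) {
  if (is.null(data)) stop("need a dataset")
  if (is.null(variables) || !is.character(variables)) {
    stop("variables improperly supplied")
  }
  # Fail early with a clear message if a requested column is absent.
  missing_cols <- setdiff(variables, names(data))
  if (length(missing_cols) > 0) {
    stop("columns not found: ", paste(missing_cols, collapse = ", "))
  }
  sapply(variables, function(i) mean(data[[i]]))
}

# The user's data frame uses its own column names.
users <- data.frame(foo = c(1, 2, 3), bar = c(10, 20, 30), baz = c(0, 0, 1))
res <- colMeansByName(users, c("foo", "bar"))
```

Because sapply is given a character vector, the result is named after the requested columns, which keeps the output readable whatever the user called them.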

write method in R -where does the file end up? [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
It seems for loops in R do not work exactly the way I thought:
myVector <-c(0,0,0)
> for (i in 0:0){
s1<-sum(e1*data3[,i]);
s2<-sum(e2*data3[,i]);
s3<-sum(e3*data3[,i]);
hilf <- cbind(s1,s2,s3);
myVector <- cbind(myVector, help);
}
works but the result is:
> myVector
myVector s1 s2 s3
[1,] 0 0 0 0
now, I would expect something like 3 zeros.
Does anyone know why I get four dimensions instead of three?
Despite the fact that the code you posted is, eh, interesting, what is happening here has nothing to do with for-loops. It is your multiple cbind statements.
You are attempting to combine a wide 1x3 matrix (hilf) with a vector of length 3 (myVector). cbind keeps the matrix's single row, so myVector loses its last two elements when combined, as you can see in the output you have pasted.
Instead, if you transpose one of the two (either myVector or hilf), you might get something more in line with what you are looking for.
I am assuming that by help you meant hilf.
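The shape mismatch can be reproduced without the loop. In this sketch the values of s1, s2 and s3 are simply set to 0 (an assumption that matches the pasted output), and rbind is shown as one possible alternative to transposing.

```r
myVector <- c(0, 0, 0)

# cbind() of three scalars gives a matrix with 1 row and 3 columns.
hilf <- cbind(s1 = 0, s2 = 0, s3 = 0)
dim_hilf <- dim(hilf)

# Binding a length-3 vector to a 1-row matrix keeps only one row, so the
# vector is cut down to its first element. R may warn about this.
combined <- suppressWarnings(cbind(myVector, hilf))
dim_combined <- dim(combined)

# Stacking by rows keeps all three values of myVector.
stacked <- rbind(myVector, hilf)
dim_stacked <- dim(stacked)
```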

Correct the format of my function output [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
Here is my function to calculate beta for the China stock market:
mybeta <- function(company) {
  require(quantmod)
  setSymbolLookup(CSI300=list(name="000300.ss",src="yahoo"))
  getSymbols("CSI300",from="2010-01-01",to="2011-01-01")
  setSymbolLookup(SDB=list(name=company,src="yahoo"))
  getSymbols("SDB",from="2010-01-01",to="2011-01-01")
  csi=as.data.frame(weeklyReturn(CSI300))
  sdb=as.data.frame(weeklyReturn(SDB))
  cbeta=merge(csi, sdb, by="row.names")
  cov(cbeta[2],cbeta[3])/var(cbeta[2])
}
When I input:
mybeta("600005.ss")
                 weekly.returns.y
weekly.returns.x         1.105631
I only want the 1.105631 from the output, not the "weekly.returns.y" and "weekly.returns.x". How can I do that?
It's clear English isn't your first language, so I will be patient.
You have revealed what you are actually trying to do, so your first two questions (one, two) could have been avoided because they are not useful for solving your actual problem.
Here is a modified version of your function that accomplishes the same goal, with a lot less unnecessary work.
mybeta <- function(company) {
  if(!require(quantmod))
    stop("quantmod must be installed; try install.packages('quantmod')")
  setSymbolLookup(CSI300=list(name="000300.ss",src="yahoo"),
                  SDB=list(name=company,src="yahoo"))
  getSymbols(c("CSI300","SDB"),from="2010-01-01",to="2011-01-01")
  ret <- merge(weeklyReturn(CSI300),weeklyReturn(SDB))
  cbeta <- cov(ret, use="pairwise.complete.obs")
  cbeta[1,2]/cbeta[1,1]
}
Use as.numeric
Make the last line of your function
as.numeric(cov(cbeta[2],cbeta[3])/var(cbeta[2]))
As an aside, there is no reason to be using data.frames here. xts is awesome; embrace it.
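Why the labels appear in the first place can be seen without quantmod. In this sketch the returns are simulated (an assumption, so the example runs offline): passing one-column data frames to cov and var gives 1x1 matrices with dimnames, and as.numeric strips them.

```r
# Simulated weekly returns standing in for the downloaded data.
set.seed(42)
x <- rnorm(50)
returns <- data.frame(market = x, stock = 1.1 * x + rnorm(50, sd = 0.1))

# One-column data frames in, 1 x 1 matrix with row and column names out.
beta_matrix <- cov(returns[1], returns[2]) / var(returns[1])
shape <- dim(beta_matrix)

# as.numeric() drops the dim and dimnames, leaving a bare number.
beta <- as.numeric(beta_matrix)
```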
Edit:
In addition to not needing to convert to data.frame, it's probably safer for your function to not have side effects. For example, getSymbols("SDB") would return a different value depending on what you passed to mybeta last; also, getSymbols assigns data in your .GlobalEnv by default. You might consider using auto.assign=FALSE. This is how I would edit your function:
mybeta <- function(company) {
  require("quantmod")
  CSI300 <- getSymbols("000300.ss", src='yahoo', from="2010-01-01",
                       to="2011-01-01", auto.assign=FALSE)
  SDB <- getSymbols(company, src='yahoo', from="2010-01-01", to="2011-01-01",
                    auto.assign=FALSE)
  csi <- weeklyReturn(CSI300)
  sdb <- weeklyReturn(SDB)
  cbeta=merge(csi, sdb)
  as.numeric(cov(cbeta[, 1], cbeta[, 2])/var(cbeta[, 1]))
}
