Writing user-defined functions - R

I am new to writing functions and I'm not really sure where to start. Below is a subset of a data frame named m1 for this example. I would like to write a function that will go through the data set and extract length and depth information by number. For instance, if it encounters the number 1 it takes the length and depth and inserts them into the first row of a new data frame or vectors. It then does the same if the number equals 2 and so on.
length number depth
[1,] 109 1 10
[2,] 109 1 10
[3,] 109 1 10
[4,] 109 1 10
[5,] 109 1 10
[6,] 109 1 10
[7,] 109 1 10
[8,] 109 1 10
[9,] 109 1 10
[10,] 109 1 10
[11,] 109 1 10
[12,] 109 1 10
[13,] 107 2 10
[14,] 107 2 10
[15,] 107 2 10
[16,] 107 2 10
[17,] 107 2 10
[18,] 107 2 10
[19,] 107 2 10
[20,] 107 2 10
Here is an attempt at writing a function to get the output described above if the number equals 1.
length.fun=function(x)
{
lengths=numeric()
depth=numeric()
if (x[2]==1)
{
lengths=x[1]
depth=x[3]
}
return(cbind(depth,lengths))
}
length.fun(m1)
However, all I get as an output is this:
length.fun(m1)
depth lengths
Any help is greatly appreciated.
Thanks

Edit:
From your comment I understand that you want to get the unique rows. Fortunately, there is a function just for this:
unique(m1)
# length number depth
# [1,] 109 1 10
# [13,] 107 2 10
unique(m1)[,-2] will give you only the two columns. Use as.data.frame to turn a matrix into a data.frame.
m1 is a matrix. A matrix is just a vector with a dimension attribute. m1[2] gives you the second value in the vector, that is 109. Therefore your if condition is FALSE and you cbind empty vectors in your function.
This does what you want:
m1[m1[,2]==1,c(1,3)]
You should read up on matrix subsetting in R.
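To see this in action, here is a self-contained sketch using a small reconstruction of the matrix (the values are copied from the excerpt above, just fewer rows):

```r
# Rebuild a small version of the matrix from the question
m1 <- cbind(length = c(rep(109, 3), rep(107, 2)),
            number = c(rep(1, 3), rep(2, 2)),
            depth  = rep(10, 5))

# Logical vector: TRUE for rows where column 2 ("number") equals 1
keep <- m1[, 2] == 1

# Keep those rows, and only columns 1 ("length") and 3 ("depth")
m1[keep, c(1, 3)]
```

Note that m1[, 2] indexes a whole column, which is why it gives a usable logical vector, while m1[2] indexes the underlying vector and returns a single value.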
You can use debugging functions to inspect what happens. Here is an example:
First insert breakpoints in your function using browser.
length.fun=function(x)
{
lengths=numeric()
depth=numeric()
if (x[2]==1)
{browser("1")
lengths=x[1]
depth=x[3]
}
browser("2")
return(cbind(depth,lengths))
}
Now call the function using trace.
trace(length.fun(m1))
You will get a prompt, that allows you to inspect the state of variables.
> trace(length.fun(m1))
Called from: length.fun(m1)
Browse[1]> browserText()
[1] "2"
Browse[1]> lengths
numeric(0)
Browse[1]> Q
As you see, the first breakpoint that is reached is the second breakpoint. Thus, the condition of the if construct was FALSE and the code inside was never executed. This is also confirmed by the value of lengths.

EDIT: it is not clear from the question whether the data is in matrix or in dataframe form.
If it is a data frame, then x[2] is a one-column data frame, so x[2]==1 has length > 1 and the if condition tests only its first element. If it is a matrix, see the explanation by @Roland.
As a beginner, when writing functions it is advisable to work from the "inside out". Namely, don't write the function first. Begin with simple pieces of code. See what m1[2] gives. See what Boolean value m1[2]==1 produces (whether this expression is TRUE or FALSE). Then try running the condition. Only when the main/key portions of your code work as expected with the specific data at hand, wrap the function around that code.
The particular function you are trying to write must cycle through all the values in column 2, so some sort of loop is required, e.g. for or apply.
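As a sketch of that advice (using a hypothetical stand-in for m1, not the poster's data), a plain for loop over the group numbers in column 2 could collect one length/depth row per group:

```r
# Small stand-in for the data in the question
m1 <- cbind(length = c(rep(109, 4), rep(107, 3)),
            number = c(rep(1, 4), rep(2, 3)),
            depth  = rep(10, 7))

out <- NULL
for (k in unique(m1[, "number"])) {
  rows  <- m1[, "number"] == k     # rows belonging to group k
  first <- which(rows)[1]          # take the first matching row
  out   <- rbind(out, m1[first, c("length", "depth")])
}
out  # one length/depth row per value of "number"
```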

You can use the split function to split your data frame into a list of separate data frames. If your data frame is called foo then:
foo.split<-split(foo[,c('length','depth')],foo$number)
Given this list you can name each element of the list, extract the elements etc.
Note, this only works for data frames. If you have a matrix, convert it first with as.data.frame().
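For instance, on a small data frame shaped like the one in the question (column names assumed from the headers shown there):

```r
foo <- data.frame(length = c(rep(109, 4), rep(107, 3)),
                  number = c(rep(1, 4), rep(2, 3)),
                  depth  = rep(10, 7))

# One data frame of length/depth per distinct value of `number`
foo.split <- split(foo[, c("length", "depth")], foo$number)

foo.split[["1"]]  # all length/depth rows where number == 1
foo.split[["2"]]  # all length/depth rows where number == 2
```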

Related

Import Excel Data into R

I'm working on an excel-file consisting of a 261 x 10 matrix. The matrix consists of the weekly returns of 10 stocks from 2010 to 2015. So, I have 10 variables (stocks) and 261 observations (weekly returns) for each variable.
For my master thesis I have to apply a "rearrangement algorithm" developed by Rüschendorf and Puccetti (2012) to my matrix. I won't go into further detail on the theoretical side of that concept. The thing is that I downloaded a package capable of performing the rearrangement algorithm in R. I tested it out and it works perfectly.
Actually the only thing I need to know is how to import my Excel matrix into R so that I can perform the rearrangement algorithm on it. I can rewrite my matrix into R manually, encoding every element of the matrix with the matrix() function:
A = matrix( c(), nrow= , ncol= , byrow=TRUE)
The problem is that doing so for such a big matrix (261 x 10) would be very time consuming. Is there any way to import my Excel matrix into R so that R recognizes it as a matrix of numerical values ready for calculations (as in the manual case)? In such a way that I just have to run the "rearrangement algorithm" function provided in R.
Thanks in advance.
I made a selection within an open Excel sheet and copied it to the clipboard. This then worked on a Mac:
> con=pipe("pbpaste")
> dat <- data.matrix( read.table(con) )
> dat
V1 V2 V3
[1,] 1 1 1
[2,] 2 2 2
[3,] 3 3 3
[4,] 4 4 4
[5,] 5 5 5
[6,] 6 6 6
[7,] 7 7 7
[8,] 8 8 8
[9,] 9 9 9
[10,] 10 10 10
[11,] 11 11 11
[12,] 12 12 12
[13,] 13 13 13
[14,] 14 14 14
The method is somewhat different on Windows devices but the help page for ?connections should have your OS-specific techniques.
You didn't provide a minimal reproducible example, so the answers are probably going to be of lesser quality. Anyway, you should be able to load the Excel file with something like:
require(XLConnect)
wrkbk <- loadWorkbook("path/to/your/data.xlsx")
df <- readWorksheet(wrkbk, sheet = 1, header = TRUE)
And then convert the data.frame to a matrix via
ans <- as.matrix(df)
Otherwise, you need to save your file as a .txt or .csv plain-text file and use read.table or read.csv and the like. Consult their respective help pages.
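A minimal sketch of that plain-text route, using a toy table written to a temporary file rather than your actual data:

```r
# Write a toy 3 x 3 table to a temporary CSV file
tmp <- tempfile(fileext = ".csv")
write.csv(matrix(1:9, nrow = 3), tmp, row.names = FALSE)

# Read it back and coerce the data frame to a numeric matrix
df  <- read.csv(tmp)
mat <- as.matrix(df)
```

With your real file, exported from Excel as CSV, the same read.csv/as.matrix pair should give a 261 x 10 numeric matrix ready for calculations.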

How to get values on testdata in RSNNS

I have two files: "testi", containing a few numbers, and "testo", containing their square roots. I have a third file, named "test", which contains some numbers for which I want the square roots. I used the commands
model <- mlp(testi,testo,size=50,learnFuncParams = c(0.001),maxit = 5000)
xyz <- predict(model,test)
The values which I get from "xyz" are
xyz
#[1,] 0.9971085
#[2,] 0.9992253
#[3,] 0.9992997
#[4,] 0.9993009
#[5,] 0.9993009
#[6,] 0.9993009
#[7,] 0.9993009
Whereas "test" contains
1 4
2 16
3 36
4 64
5 100
6 144
7 196
Please let me know why this happens.
mlp has logistic output units by default, so predictions are confined to (0, 1); you need to specify linOut=TRUE. In general, normalizing your data would also help.
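A sketch of how that might look; the training data here is made up (numbers and their square roots, standing in for testi/testo), and the RSNNS package must be installed:

```r
library(RSNNS)

# Toy training data: numbers and their square roots
testi <- as.matrix(1:100)
testo <- sqrt(testi)

# linOut = TRUE gives a linear output unit, so predictions are not
# squashed into (0, 1) by the logistic activation
model <- mlp(testi, testo, size = 50, learnFuncParams = c(0.001),
             maxit = 5000, linOut = TRUE)

predict(model, as.matrix(c(4, 16, 36)))
```

Scaling the inputs first (RSNNS provides normalizeData) usually makes training converge faster.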

Filtering permutations to avoid running out of memory

The context of this problem is asset allocation. If I have N assets, and can allocate them in 5% chunks, what are the permutations that exist such that the sum of the allocation is exactly equal to 100%.
For example, if I had 2 assets there would be 21 permutations (created using my function fMakeAllocationsWeb(2); code at the bottom of this post):
[,1] [,2]
[1,] 0 100
[2,] 5 95
[3,] 10 90
[4,] 15 85
[5,] 20 80
[6,] 25 75
[7,] 30 70
[8,] 35 65
[9,] 40 60
[10,] 45 55
[11,] 50 50
[12,] 55 45
[13,] 60 40
[14,] 65 35
[15,] 70 30
[16,] 75 25
[17,] 80 20
[18,] 85 15
[19,] 90 10
[20,] 95 5
[21,] 100 0
The problem of course comes when the number of assets increases, even modestly. This is understandable: with repetition, the number of permutations is k^n (here k = 21 allocation increments and n = assets), and I'm not able to fit the intermediate step of creating all permutations into memory. For example, with 20 assets the number of permutations is 21^20, roughly 2.8E+26!
I would like to be able to filter these on the fly (sum==100) so as to not run into the memory allocation issue. Digging into the code beneath gtools::permutations, it seems to be vectorised, and intervening there to filter seems impossible.
Would gratefully welcome any thoughts - ideally would prefer to stick with R code and packages.
Many thanks
Russ
installifMissing <- function(sPackageName) {
if (!sPackageName %in% installed.packages()) install.packages(sPackageName)
}
fMakeAllocationsWeb<-function(iNumAssets=10,iIncrement=5){
installifMissing("gtools")
require(gtools)
iAlloc<-seq(0,100,by=iIncrement) #'the allocation increments eg 0,5,10...,95,100
#'generate permutations
permut<-permutations(n=length(iAlloc),r=iNumAssets,v=iAlloc,repeats.allowed=TRUE)
#'filter permutations for those which sum to exactly 100
permutSum<-apply(permut,MARGIN=1,FUN=sum)
permut100<-permut[which(permutSum==100),]
return(permut100)
}
If you install the partitions package, you have the restrictedparts function that will enumerate all the ways you can add n numbers together to get a sum S. In your case, you want the summands to be multiples of 5 adding up to S=100; equivalently, divide the summands by 5 and have the total add up to 20. If you want 2 assets, then restrictedparts(100/5, 2) * 5 will give you the 11 unordered pairs.
You can then loop through the columns and enumerate, for each, the set of all permutations of asset allocations. You'll have to deal carefully with the case where there are repeated elements - for example, we generate {100,0}, which represents both <100,0> and <0,100>, whereas {50,50} represents only the single allocation <50,50>. You can deal with this by using the set argument of permutations.
restrictedparts(100/5,20) * 5 gives 627 partitions that add up to 100% - and you'll need to permute each of these to get your full list of allocations.
Even after filtering, you will still have a large number of combinations to deal with.
Your problem essentially boils down to an n-multichoose-k problem, as described here.
You want to choose k=20 slots of 5% weightage each, allocated from n assets.
So in your example case of 20 assets, your number of combinations would still be
choose(39, 20)
## [1] 68923264410
I suggest you have a look at DEoptim package which has specific examples directly related to your problem at hand. It uses differential evolution.
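As a further base-R option (a sketch, not from the answers above): since only allocations summing to 100 are wanted, you can generate them recursively so invalid rows are never created, instead of filtering the full k^n permutation set afterwards:

```r
# Enumerate all ordered allocations of `total` percent across `n`
# assets in steps of `inc`, never materializing invalid rows
make_allocations <- function(n, total = 100, inc = 5) {
  if (n == 1) return(matrix(total, ncol = 1))
  parts <- lapply(seq(0, total, by = inc), function(w) {
    cbind(w, make_allocations(n - 1, total - w, inc))
  })
  do.call(rbind, parts)
}

nrow(make_allocations(2))  # the 21 two-asset allocations shown above
```

For many assets the row count still explodes (it equals choose(n + 19, 20)), so this only postpones the problem; for 20 assets an optimizer such as DEoptim remains the practical route.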

R Pooled DataFrame analysis

I'm trying to perform several analyses on subsets of data in a data frame in R, and I was wondering if there is a generic way of doing this.
Say, I have a dataframe like:
one two three four
[1,] 1 6 11 16
[2,] 2 7 12 17
[3,] 3 8 11 18
[4,] 4 9 11 19
[5,] 5 10 15 20
how could I apply some computation (e.g. a cumulative count) to the values in column "one", grouped by the value in column "three"?
That is, I want to operate on one column based upon the grouping given by another column. I can do this with loops, but I suspect there are standard ways to do it all at once.
Thank you in advance!
ddply(data, .(coln), Stat) from the plyr package does the trick exactly
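Alternatively, with base R alone, a per-group computation such as a cumulative count can be done with ave (values below copied from the table in the question):

```r
df <- data.frame(one   = 1:5,
                 two   = 6:10,
                 three = c(11, 12, 11, 11, 15),
                 four  = 16:20)

# Running count of rows within each group defined by `three`
df$count <- ave(df$one, df$three, FUN = seq_along)
df$count  # 1 1 2 3 1
```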

Finding the index of the minimum value which is larger than a threshold in R

This is probably very simple, but I'm missing the correct syntax in order to simplify it.
Given a matrix, find the entry in one column which is the lowest value greater than some input parameter. Then return an entry in a different column on that corresponding row. Not very complicated... I've found something that works, but a more efficient solution would be greatly appreciated.
I found this link: Better way to find a minimum value that fits a condition?
which is great.. but that method of finding the least entry loses the index information required to find a corresponding value in a corresponding row.
Let's say column 2 is the condition column, and column 1 is the one I want to return. Currently I've made this (note that this only works because column 2 contains only positive values):
matrix[which.max((matrix[,2]>threshold)/matrix[,2]),1]
Any thoughts? I'm expecting that there is probably some quick and easy function which has this effect... it's just never been introduced to me haha.
rmk's answer shows the basic way to get a lot of info out of your matrix. But if you know which column you're testing for the minimum value (above your threshold), and then want to return a different value from that row, maybe something like
incol <- df[,4] # select the column to search
outcol <- 2 # select the column of the found row you want to get
threshold <- 5
rows <- which(incol > threshold) # rows exceeding the threshold
df[rows[which.min(incol[rows])], outcol]
You could try the following. Say,
df <- matrix(sample(1:35,35),7,5)
> df
[,1] [,2] [,3] [,4] [,5]
[1,] 18 16 27 19 31
[2,] 24 1 7 12 5
[3,] 28 35 23 4 6
[4,] 33 3 25 26 15
[5,] 14 10 11 21 20
[6,] 9 2 32 17 13
[7,] 30 8 29 22 34
Say your threshold is 5:
apply(df, 2, function(x){ x[x <= 5] <- NA; which.min(x) })
[1] 6 7 2 2 3
Corresponding to the values:
[1] 9 8 7 12 6
This gives you, for each column, the row index (in the original column indexing) of the smallest entry strictly greater than the threshold; which.min simply ignores the NAs.
