Trying to rename rows of a matrix with a for - r

I'm trying to rename rows in a matrix as follows:
M = matrix(0,50,1)
nf = 50
for (k in 1:50) {
filtro = k*(1/nf)
rownames(M[k,]) <- paste("p.v",filtro)
}
it gives me the following error:
Error in `rownames<-`(`*tmp*`, value = paste("p.v", filtro)) :
attempt to set 'rownames' on an object with no dimensions

In general, you can't use rownames(M[k,]) <- ...; you must use rownames(M)[k] <- ... instead. Why: the inner expression M[k,] returns a plain vector, which has no dimensions and therefore no row names. What you are trying to do is change an attribute of the object M itself, so rownames(M) <- ... is the natural form.
Internally, there is a function called `rownames<-` that changes the row names of its first argument. Its two arguments are x (a matrix or data frame) and value (the new name or names); it is called even when the left-hand side is subsequently indexed with [, as in rownames(M)[k] <- ....
You can't use rownames(M)[k] <- ... right away, though, because there are initially no row names, i.e., rownames(M) is NULL. You can get around this by assigning some names first, even if arbitrary:
rownames(M) <- seq(nrow(M)) # arbitrary, must be same length
for (k in 1:50) {
filtro = k*(1/nf)
rownames(M)[k] <- paste("p.v",filtro)
}
Better, though, use a vectorized approach, no for loop required:
rownames(M) <- paste("p.v", (1:50)/nf)
head(M)
# [,1]
# p.v 0.02 0
# p.v 0.04 0
# p.v 0.06 0
# p.v 0.08 0
# p.v 0.1 0
# p.v 0.12 0
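The points made in this answer can be checked directly. The following is a minimal sketch using the matrix from the question; M2 is a name introduced here purely for illustration.

```r
nf <- 50
M <- matrix(0, nf, 1)

# Selecting a single row drops the dimensions: the result is a plain vector
is.null(dim(M[1, ]))            # TRUE

# A freshly created matrix has no row names yet
is.null(rownames(M))            # TRUE

# The replacement function can also be called explicitly;
# this is what rownames(M) <- value does behind the scenes
M2 <- `rownames<-`(M, paste("p.v", (1:nf) / nf))
rownames(M2)[1:2]               # "p.v 0.02" "p.v 0.04"
```

Calling `rownames<-` by hand is rarely needed in practice; it is shown only to make the mechanism visible.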

Related

Optimization of speed in R

I am currently working on a function that operates on a big two-column matrix (generally more than 2000 values), and it has a performance problem.
Here is the head of my matrix:
(image: matrix)
Here is my function:
get <- function()
{
v <- sample(1:1e6,20000, replace=TRUE) #for example
table <- #mymatrix
for ( i in 1:nrow(table))
{
b <- which(v > table[i,1] & v < table[i,2]) #want index between 2 intervals
}
return(b)
}
The problem is that the which call takes too long when I repeat the loop over the whole table, and I can't find how to fix it (I'm still learning R).
As Andrey said in a comment, you’re only returning the result for the last row. You’re also not passing table into the function (in fact, your function has no arguments), and it’s also unclear what v represents and in particular why it has more values than table has rows.
However, assuming that you want the results for all rows, you can do two things:
Don’t use which, you probably don’t need numeric indices.
Use vectorisation instead of a for loop:
get = function(table) {
v = sample(1 : 1E6, 20000, replace = TRUE)
v > table[, 1] & v < table[, 2]
}
That’s it.
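If the matching positions are in fact needed separately for every row, one way to get them without calling which on the full data inside a hand-written loop is sketched below. The small tbl and v are made-up illustration data, and outer is a swapped-in technique, not the code from the answer above.

```r
# Hypothetical data: two intervals, (0, 10) and (20, 30)
tbl <- matrix(c(0, 10,
                20, 30), ncol = 2, byrow = TRUE)
v <- c(5, 25, 10, 7, 29)

# inside[i, j] is TRUE when v[j] lies strictly inside interval i
inside <- outer(tbl[, 1], v, "<") & outer(tbl[, 2], v, ">")

# One integer vector of positions in v per row of tbl
idx <- lapply(seq_len(nrow(tbl)), function(i) which(inside[i, ]))
idx[[1]]   # 1 4
idx[[2]]   # 2 5
```

Note that outer builds a logical matrix with nrow(tbl) * length(v) cells, so this trades memory for speed and may not suit very large inputs.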
Here is code that, for every value in vector v, tells you which of the bins it falls into (a bin of 0 means the value is not strictly inside any interval).
tbl = matrix(c(0,224,
225,233,
234,239,
240,243,
244,290,
291,292),
byrow = TRUE,
ncol = 2);
v = c(0,100,224,
225, 230, 233,
235)
fi1 = findInterval(v, tbl[,1]+1)
fi2 = findInterval(v, tbl[,2]-1)
set = (fi1!=fi2)
b = double(length(v))
b[set] = fi1[set];
# show the results
cbind(value = v, bin = b)
# value bin
# [1,] 0 0
# [2,] 100 1
# [3,] 224 0
# [4,] 225 0
# [5,] 230 2
# [6,] 233 0
# [7,] 235 3

create multi-dimensional array with named dimensions and elements compactly

A while ago I wanted a simpler way of creating multidimensional arrays with named dimensions.
I ended up writing a function that has worked really well for me, but I worry it may be somewhat of a hack and that there may be a better way of doing it. So before passing this on to a colleague I'm seeking advice here.
What I want to do is to be able to create multi-dimensional arrays where the name of a dimension is specified by the name of a vector and the names of the elements are specified by the contents of that vector. e.g.
sex <- c("F","M")
name2 <- c("a","b","c")
This can be written out like so.
dimnames1 <- list( sex=sex, name2=name2 )
dim1 <- sapply(dimnames1, function(x) length(x))
a <- array(0,dim=dim1, dimnames=dimnames1)
a
name2
sex a b c
F 0 0 0
M 0 0 0
But I wanted to keep this more compact, so I wrote this function:
array_named <- function( ...)
{
listArgs <- as.list(match.call()[-1])
#works only if args are specified by actual ranges not by a varname
#dimnames1 <- lapply(listArgs,eval)
#works, I'm not sure why n=3
dimnames1 <- lapply(listArgs,function(x){eval.parent(x, n=3)})
#setting dimensions of array from dimnames1
dim1 <- sapply(dimnames1, function(x) length(x))
#creating array and filling with fill value
a <- array(0, dim=dim1, dimnames=dimnames1)
return(a)
}
This allows passing vectors by name :
array_named( sex=sex, name2=name2 )
name2
sex a b c
F 0 0 0
M 0 0 0
and directly e.g.
array_named( a=c(1,2), b=c('x','y') )
b
a x y
1 0 0
2 0 0
Are there problems with this? Is there a more sensible way of doing it?
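One possible simplification, offered as a sketch rather than a definitive answer: list(...) already evaluates each argument in the caller's environment and keeps the argument names, so match.call() and eval.parent() may not be needed at all. The name array_named2 and the fill parameter are assumptions introduced here for illustration.

```r
# Hypothetical variant of array_named(): list(...) evaluates the arguments
# and keeps their names, which become the names of the dimensions
array_named2 <- function(..., fill = 0) {
  dimnames1 <- list(...)
  # dimension sizes come from the lengths of the supplied vectors
  dim1 <- unname(lengths(dimnames1))
  array(fill, dim = dim1, dimnames = dimnames1)
}

sex <- c("F", "M")
name2 <- c("a", "b", "c")

a1 <- array_named2(sex = sex, name2 = name2)       # vectors passed by name
a2 <- array_named2(a = c(1, 2), b = c("x", "y"))   # vectors given directly

dim(a1)                # 2 3
names(dimnames(a1))    # "sex" "name2"
```

Both calling styles from the question work with this version, because the arguments are ordinary evaluated values by the time they reach list(...).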

Can I further vectorize this function

I am relatively new to R, and to matrix-based scripting languages in general. I have written this function to return the indices of each row whose content is similar to the content of any other row. It is a primitive form of spam reduction that I am developing.
if (!require("RecordLinkage")) install.packages("RecordLinkage")
library("RecordLinkage")
# Takes a column of strings, returns a vector of indices
check_similarity <- function(x) {
threshold <- 0.8
values <- NULL
for(i in 1:length(x)) {
values <- c(values, which(jarowinkler(x[i], x[-i]) > threshold))
}
return(values)
}
is there a way that I could write this to avoid the for loop entirely?
We can simplify the code somewhat using sapply.
# some test data #
x = c('hello', 'hollow', 'cat', 'turtle', 'bottle', 'xxx')
# same threshold as in the question
threshold = 0.8
# create an x by x matrix specifying which strings are alike
m = sapply(x, jarowinkler, x) > threshold
# set diagonal to FALSE: we're not interested in strings being identical to themselves
diag(m) = FALSE
# And find index positions of all strings that are similar to at least one other string
which(rowSums(m) > 0)
# [1] 1 2 4 5
I.e. this returns the index positions of 'hello', 'hollow', 'turtle', and 'bottle' as being similar to another string
If you prefer, you can use colSums instead of rowSums to get a named vector, but this could be messy if the strings are long:
which(colSums(m) > 0)
# hello hollow turtle bottle
# 1 2 4 5

Change an element of global vector in a function

I have a vector in global environment and I want to create a function that modifies only one element of that vector. The problem is that the vector is too large and standard methods take too long to compute. See functions I already have, both of them are too slow.
x <- rep(0, 1e8)
f1 <- function(n,a) {
x <- x # loads the vector to current environment
x[n] <- a # changes the position in current environment
x <<- x # saves the vector to global environment
}
f2 <- function(n,a) {
x[n] <<- a # changes the vector element in global environment
}
system.time(f1(1,1)) # 0.34
system.time(f2(2,1)) # 0.30
system.time(x[3] <- 1) # 0.00
I am looking for something like this:
assign('x[4]', 1, .GlobalEnv)
I think you can address this with the data.table package, as it modifies objects by reference.
For instance:
library(data.table)
data <- data.table(x=rep(0, 1e8))
f3 <- function(n,a){
data[n,x:=a]
return(TRUE)
}
system.time(f3(2,1)) # 0
print(data)
x
1: 0
2: 1
3: 0
4: 0
...
You can retrieve x as vector at any time with data[["x"]]
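data.table also provides set() for this kind of single-cell update by reference. The following is a minimal sketch on a deliberately small table; f4 is a hypothetical name, the tiny size is only for illustration, and no timing claim is made here.

```r
library(data.table)

# Small stand-in for the large table used above
data <- data.table(x = rep(0, 10))

# Hypothetical helper: updates one cell of the global table in place
f4 <- function(n, a) {
  set(data, i = as.integer(n), j = "x", value = a)
  invisible(TRUE)
}

f4(4, 1)
data[["x"]][4]   # 1
```

As with :=, the table is changed in place, so nothing has to be assigned back to the global environment.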

calculation conditional on the signs of an element in r

I know the question is silly, but I really can't solve it. I just want to apply different operations to the elements of a dataframe depending on their sign. The following code generates a mock dataframe:
mock<-data.frame(matrix(NA,ncol=5,nrow=2))
colnames(mock)<-as.vector(c("m","n","1985-02-04","1985-02-05","1985-02-06"))
rownames(mock)<-as.vector(c("fund1","fund2"))
mock
mock[1,]<-c(0.001,0.0045,-0.03,0.25,NA)
mock[2,]<-c(0.004,0.0004,NA,0.12,-0.087)
mock
so it looks like
m n 1985-02-04 1985-02-05 1985-02-06
fund1 0.001 0.0045 -0.03 0.25 NA
fund2 0.004 0.0004 NA 0.12 -0.087
For each fund, m and n represent two different ratios, and the last three figures are returns on the given days. I wish to do the following operations:
if the return x on one day is positive, I need (x+m)/(1+n) to replace the corresponding figure in the dataframe.
If the return x is negative, I need x+m to replace the corresponding figure in the dataframe.
If it is NA on the day, I will leave it NA.
I tried the following code:
Grossreturn<-function(x){
a<-x[3:5]
m<-x[1]
p<-x[2]
a[a>0]<-(a[a>0]+m)/(1-p)
a[a<0]<-a[a<0]+m
return(a)
}
apply(mock,1,Grossreturn)
and of course it failed and the error message is:
Error in a[a > 0] <- (a[a > 0] + m)/(1 - p) :
NAs are not allowed in subscripted assignments
I am really stuck here and couldn't sort it out. Can someone help?
Thanks!
You should just exclude NAs from all your assignments. A sample syntax for doing this is below:
> foo = data.frame(x=runif(3)-0.5, y=runif(3)) #random data frame
> foo[2,1] <- NA #adding an NA
> foo
x y
1 -0.4616014 0.4892859
2 NA 0.4730237
3 0.4060813 0.1517448
If you now try to reassign without filtering out NAs, you get your error.
> foo[sign(foo$x)==-1, 1] <- -10
Error in `[<-.data.frame`(`*tmp*`, sign(foo$x) == -1, 1, value = -10) :
missing values are not allowed in subscripted assignments of data frames
But not if you explicitly leave out NAs:
> foo[sign(foo$x)==-1 & !is.na(foo$x), 1] <- -10
> foo
x y
1 -10.0000000 0.4892859
2 NA 0.4730237
3 0.4060813 0.1517448
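Applied to a plain numeric vector such as the a inside Grossreturn, the same idea can be sketched with which(), which silently drops NA positions. The values below are taken from the fund1 row of the question, and the divisor follows the (1 + n) rule stated there; pos and neg are names introduced here for illustration.

```r
a <- c(-0.03, 0.25, NA)   # returns for fund1
m <- 0.001
n <- 0.0045

# which() returns only the positions where the test is TRUE, never NA
pos <- which(a > 0)
neg <- which(a < 0)

# Both index sets are computed before any value is changed
a[pos] <- (a[pos] + m) / (1 + n)
a[neg] <- a[neg] + m

a[1]          # -0.029
is.na(a[3])   # TRUE, the NA is left untouched
```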
Here is code that solves your problem:
grossreturn <- function(x) {
m <- x[1]
n <- x[2]
# iterate over all date columns and compute new value
for (i in 3:length(x)) {
if (is.na(x[i])) {
# NA remains NA
} else if (x[i] < 0) {
x[i] <- x[i] + m # x + m
} else {
# x[i] >= 0
# includes case where x[i] == 0
x[i] <- (x[i] + m) / (1 + n) # (x + m) / (1 + n)
}
}
return(x)
}
result <- apply(mock, 1, FUN=function(x) grossreturn(x))
I wanted to use an apply function to iterate over the numerical columns after extracting m and n, but there does not seem to be a vectorized apply function that can also take multiple parameters as input (so mapply would not be a vectorized solution).
I assumed that when a return is exactly 0 you want (x + m) / (1 + n). Also, you should test whether R drops the row or column names when you run this code.
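A loop-free variant is also possible with ifelse(), which propagates NA on its own. This is a sketch under the same assumption as above (a return of exactly 0 is treated like a positive one); ret and adj are names introduced here for illustration.

```r
# Mock data from the question
mock <- data.frame(m = c(0.001, 0.004),
                   n = c(0.0045, 0.0004),
                   `1985-02-04` = c(-0.03, NA),
                   `1985-02-05` = c(0.25, 0.12),
                   `1985-02-06` = c(NA, -0.087),
                   row.names = c("fund1", "fund2"),
                   check.names = FALSE)

ret <- as.matrix(mock[, 3:5])   # returns only, one row per fund

# mock$m and mock$n have one value per row; because matrices are stored
# column by column, recycling lines each value up with its own fund
adj <- ifelse(ret < 0,
              ret + mock$m,
              (ret + mock$m) / (1 + mock$n))

mock[, 3:5] <- adj
mock["fund1", "1985-02-04"]   # -0.03 + 0.001 = -0.029
```

Where the return is NA, the test ret < 0 is NA as well, so ifelse() leaves that cell as NA without any special handling.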

Resources