What if one wants to apply a function, e.g. to each row of a matrix, but also wants to pass the number of that row as an argument to the function? As an example, suppose you wanted to get the n-th root of the numbers in each row of a matrix, where n is the row number. Is there another way (using apply only) than column-binding the row numbers to the initial matrix, like this?
test <- data.frame(x=c(26,21,20),y=c(34,29,28))
t(apply(cbind(as.numeric(rownames(test)),test),1,function(x) x[2:3]^(1/x[1])))
P.S. Actually if test was really a matrix : test <- matrix(c(26,21,20,34,29,28),nrow=3) , rownames(test) doesn't help :(
Thank you.
What I usually do is to run sapply on the row numbers 1:nrow(test) instead of test, and use test[i,] inside the function:
t(sapply(1:nrow(test), function(i) test[i,]^(1/i)))
I am not sure this is really efficient, though.
If you give the function a name rather than making it anonymous, you can pass arguments more easily. We can use nrow to get the number of rows and pass a vector of the row numbers in as a parameter, along with the frame to be indexed this way.
For clarity I used a different example function; this example multiplies column x by column y for a 2 column matrix:
test <- data.frame(x=c(26,21,20),y=c(34,29,28))
myfun <- function(position, df) {
print(df[position,1] * df[position,2])
}
positions <- 1:nrow(test)
lapply(positions, myfun, test)
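The same named-function pattern can be applied to the n-th root problem from the question. This is only a sketch: nth_root_row is a made-up helper name, and sapply plus t() is used instead of lapply so the result comes back as a matrix.

```r
test <- data.frame(x = c(26, 21, 20), y = c(34, 29, 28))

# Hypothetical helper: takes a row number and the data frame,
# returns that row raised to the power 1/row number.
nth_root_row <- function(position, df) {
  unlist(df[position, ])^(1 / position)
}

# sapply returns one column per row number, so transpose the result.
roots <- t(sapply(seq_len(nrow(test)), nth_root_row, df = test))
roots
```

Row 1 is left unchanged (first root), row 2 holds square roots, row 3 cube roots.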
cbind()ing the row numbers seems a pretty straightforward approach. For a matrix (or a data frame) the following should work:
apply( cbind(1:(dim(test)[1]), test), 1, function(x) plot(x[-1], main=x[1]) )
or whatever you want to plot.
Actually, in the case of a matrix, you don't even need apply. Just:
test^(1/row(test))
does what you want, I think. I think the row() function is the thing you are looking for.
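To see why this works, here is a small check using the matrix version of test from the question. The names via_row and via_sapply are just for illustration.

```r
test <- matrix(c(26, 21, 20, 34, 29, 28), nrow = 3)

# row() returns a matrix of the same shape holding each cell's row index.
row(test)

# Element-wise power, one exponent per cell.
via_row <- test^(1 / row(test))

# The sapply-over-row-numbers approach, for comparison.
via_sapply <- t(sapply(1:nrow(test), function(i) test[i, ]^(1 / i)))

all.equal(via_row, via_sapply)
```

Both approaches should give the same 3 x 2 matrix. The row() version avoids the function call per row.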
I'm a little confused, so excuse me if I get this wrong, but you want to work out the n-th root of the numbers in each row of a matrix, where n is the row number. If this is the case then it's really simple: create a new array with the same dimensions as the original, where each cell holds its own row number:
test_row_order = array(seq_len(nrow(test)), dim = dim(test))
Then simply apply a function (the n-th root in this case):
n_root = test^(1/test_row_order)
I have a function that outputs a list containing strings. Now, I want to check if this list contains strings which are all 0's or if there is at least one string which doesn't contain all 0's (can be more).
I have a large dataset, and I am going to execute my function on each of its rows.
Basically:
for each row of the dataset
mylst <- func(row[i])
if (mylst contains strings containing all 0's)
process the next row of the dataset
else
execute some other code
Now, I can code the if-else clause but I am not able to code the part where I have to check the list for all 0's. How can I do this in R?
Thanks!
You can use this for loop:
for (i in seq_len(nrow(dat))) {
  if (!any(grepl("^0+$", dat[i, ]))) {
    # execute some other code here
  }
}
where dat is the name of your data frame.
Here, the regex "^0+$" matches a string that consists of 0s only.
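A quick check of what the pattern matches, on a few made-up strings:

```r
# ^ and $ anchor the match to the whole string; 0+ means one or more zeros.
strings <- c("000", "010", "0", "", "00a")
is_all_zero <- grepl("^0+$", strings)
is_all_zero
# TRUE FALSE TRUE FALSE FALSE
```

Note that the empty string does not match, because 0+ requires at least one zero.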
I'd like to suggest a solution that avoids the use of an explicit for-loop.
For a given data set df, one can find a logical vector that indicates the rows with all zeroes:
all.zeros <- apply(df,1,function(s) all(grepl('^0+$',s))) # grepl() pattern taken from Sven's solution
With this logical vector, it is easy to subset df to remove all-zero rows:
df[!all.zeros,]
and use it for any subsequent transformations.
'Toy' dataset
df <- data.frame(V1=c('00','01','00'),V2=c('000','010','020'))
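Putting the pieces together on the toy dataset, as a sketch (the name kept is only for illustration):

```r
df <- data.frame(V1 = c('00', '01', '00'), V2 = c('000', '010', '020'))

# TRUE for rows in which every string consists of zeros only.
all.zeros <- apply(df, 1, function(s) all(grepl('^0+$', s)))
all.zeros

# Drop the all-zero rows.
kept <- df[!all.zeros, ]
kept
```

Only the first row ('00', '000') is all zeros, so kept holds rows 2 and 3.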
UPDATE
If you'd like to apply the function to each row first and then analyze the resulting strings, you should slightly modify the all.zeros expression:
all.zeros <- apply(df,1,function(s) all(grepl('^0+$',func(s))))
I imagine there is a simple function for this but I can't seem to find it. I have five columns within a larger data frame that I want to add to get a single sum. Here's what I did, but I am wondering if there is a much simpler way to get the same result:
count <- subset(NAMEOFDATA, select=c(COL1,COL2,COL3,COL4,COL5))
colcount <- as.data.frame(colSums(count))
colSums(colcount)
The sum function should do that:
sum(count)
Unlike "+" which is vectorized, sum will "collapse" its arguments and it will accept a data.frame argument. If some of the arguments are logical, then TRUE==1 and FALSE==0 for purposes of summation, which makes the construction sum(is.na(x)) possibly useful.
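A small illustration with invented data. The column values and the extra column other are assumptions, and na.rm = TRUE is added only because the example contains a missing value:

```r
NAMEOFDATA <- data.frame(
  COL1  = 1:3,
  COL2  = 4:6,
  COL3  = c(7, NA, 9),
  other = letters[1:3]   # non-numeric column that must be left out
)
cols <- c("COL1", "COL2", "COL3")

# sum() collapses all selected columns into one number.
total <- sum(NAMEOFDATA[, cols], na.rm = TRUE)
total
# 37

# Logical values count as 1/0, so this counts the missing cells.
n_missing <- sum(is.na(NAMEOFDATA[, cols]))
n_missing
# 1
```

sum() only works here because the selected columns are all numeric; including the other column would raise an error.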
Always easier with a reproducible example, but here's an attempt:
apply( NAMEOFDATA[,paste0("COL",seq(5))], 1, sum )
I am trying to find out which element of each of my sub-lists is the minimum for that particular sub-list. The current chunk of data I am trying to apply the functionality to is a record of 41 entries. They get grouped by another function that produces indices for each of the sub-lists or sub-groups. Elements 1:8 are in the first sub-group; the following sub-groups are as follows: 9:17, 18:23, 24:33, 34:41. Please note I called the data I am working with "b1", and the index created to group b1's elements into sub-groups is "indx". I am able to find the minimum value in each sub-group using sapply like this:
sapply(indx, function(i) min(b1[i]))
But, I am stuck at finding which "b1" element is each of these numbers sapply provided above. I know I probably need the function which() and mapply(), but have not been able to put it together.
Reproducible data:
b1 <- sample(1:20,41,T)
starts <- c(1,9,18,24,34)
stops <- c(8,17,23,33,41)
indx <- mapply(seq, from=starts, to=stops)
You basically figured it out yourself.
Try
sapply(indx, function(i) which.min(b1[i]))
Edit
I'm not sure anymore if that is actually what you want. The answer above returns the position of the minimum element within each subgroup.
If you want the position within b1 itself instead, you could do the following (one of probably quite a few possible ways):
indices <- 1:length(b1)
sapply(indx, function(i) indices[i][which.min(b1[i])])
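The difference between the two results is easier to see on a small fixed vector. The data below is made up, and i[which.min(b1[i])] is used as a shorter equivalent of indices[i][which.min(b1[i])]:

```r
b1   <- c(5, 3, 9, 7, 2, 8, 1)
indx <- list(1:3, 4:7)

# Position of the minimum inside each subgroup.
within_group <- sapply(indx, function(i) which.min(b1[i]))
within_group
# 2 4

# Position of the same element in b1 itself.
in_b1 <- sapply(indx, function(i) i[which.min(b1[i])])
in_b1
# 2 7

b1[in_b1]
# 3 1
```

Note that which.min returns only the first minimum if a subgroup contains ties.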
I have the following problem within R:
I'm working with a huge matrix. Some of the columns contain the value 'zero', which leads to problems during my further work.
Hence, I want to identify the columns, which contain at least one value of 'zero'.
Any ideas how to do it?
If you have a big matrix, this would probably be faster than an apply solution. It keeps only the columns that contain no zeros (colSums(mat==0) > 0 flags the columns that do):
mat[,colSums(mat==0)<0.5]
Let's say your matrix is called x,
x = matrix(runif(300), nrow=10)
to get the indices of the columns that have at least 1 zero:
ix = apply(x, MARGIN=2, function(col){any(col==0)})
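Since runif() will almost never produce an exact zero, here is a sketch on a small hand-made matrix that does contain zeros, comparing the apply and colSums approaches. The variable names are only for illustration.

```r
x <- matrix(c(1, 0, 3,
              4, 5, 6,
              0, 0, 9,
              7, 8, 2), nrow = 3)   # filled column by column

# apply over columns: TRUE where a column holds at least one zero.
has_zero_apply <- apply(x, MARGIN = 2, function(col) any(col == 0))

# Same result by counting zeros per column.
has_zero_colsums <- colSums(x == 0) > 0

which(has_zero_apply)
# 1 3

# Keep only the zero-free columns; drop = FALSE keeps the matrix shape.
clean <- x[, !has_zero_colsums, drop = FALSE]
```

Both logical vectors can be used directly for subsetting, either to inspect the offending columns or to remove them.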
My data frame (m*n) has a few hundred columns. I need to compare each column with all other columns (contingency table), perform a chisq test, and save the results for each column in a different variable.
It's working for one column at a time, like this:
s <- function(x) {
a <- table(x,data[,1])
b <- chisq.test(a)
}
c1 <- apply(data,2,s)
The results are stored in c1 for column 1, but how will I loop this over all columns and save result for each column for further analysis?
If you're sure you want to do this (I wouldn't, thinking about the multitesting problem), work with lists :
Data <- data.frame(
x=sample(letters[1:3],20,TRUE),
y=sample(letters[1:3],20,TRUE),
z=sample(letters[1:3],20,TRUE)
)
# Make a nice list of indices
ids <- combn(names(Data),2,simplify=FALSE)
# use the appropriate apply
my.results <- lapply(ids,
function(z) chisq.test(table(Data[,z]))
)
# use some paste voodoo to give the results the names of the column indices
names(my.results) <- sapply(ids,paste,collapse="-")
# select all values for y :
my.results[grep("y",names(my.results))]
Not harder than that. As I show you in the last line, you can easily get all tests for a specific column, so there is no need to make a list for each column. That just takes longer and takes more space, but gives the same information. You can write a small convenience function to extract the data you need :
extract <- function(col,l){
l[grep(col,names(l))]
}
extract("y",my.results)
This means you can even loop over the different column names of your data frame and get a list of lists returned :
lapply(names(Data),extract,my.results)
I strongly suggest you get yourself acquainted with working with lists, they're one of the most powerful and clean ways of doing things in R.
PS : Be aware that you save the whole chisq.test object in your list. If you only need the value for Chi square or the p-value, select them first.
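A sketch of that last point, pulling out only the p-value and the statistic. The Data frame here is a deterministic stand-in for the sampled one above (its values are invented), and suppressWarnings() is added only to silence the small-count warning that chisq.test gives on such tiny tables.

```r
# Deterministic stand-in for the randomly sampled Data (assumed values).
Data <- data.frame(
  x = rep(letters[1:3], length.out = 21),
  y = rep(letters[1:3], each = 7),
  z = rep(c("a", "b", "c", "c", "b", "a"), length.out = 21)
)

ids <- combn(names(Data), 2, simplify = FALSE)
my.results <- lapply(ids, function(z) {
  suppressWarnings(chisq.test(table(Data[, z])))
})
names(my.results) <- sapply(ids, paste, collapse = "-")

# Keep only the numbers of interest instead of the full test objects.
p.values   <- sapply(my.results, function(res) res$p.value)
statistics <- sapply(my.results, function(res) unname(res$statistic))
p.values
```

Both results are plain named numeric vectors, which are much lighter to store and easier to filter than a list of full test objects.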
Fundamentally, you have a few problems here:
You're relying heavily on global arguments rather than local ones. This makes the double usage of "data" confusing.
Similarly, you rely on a hard-coded value (column 1) instead of passing it as an argument to the function.
You're not extracting the one value you need from the chisq.test(). This means your result gets returned as a list.
You didn't provide any example data. So here's some:
m <- 10
n <- 4
mytable <- matrix(runif(m*n),nrow=m,ncol=n)
Once you fix the above problems, simply run a loop over various columns (since you've now avoided hard-coding the column) and store the result.
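One way such a loop might look, as a sketch only. It assumes categorical columns (a chi-squared test on a contingency table of runif() values would not be meaningful), so a small invented data frame is used instead of mytable; chisq_p and pvals are hypothetical names.

```r
# Invented categorical data (assumption; not the runif matrix above).
mydata <- data.frame(
  a = rep(c("u", "v"), length.out = 20),
  b = rep(c("u", "v"), each = 10),
  c = rep(c("u", "u", "v", "v", "v"), length.out = 20)
)

# Everything the function needs is passed in as an argument,
# and only the p-value is returned.
chisq_p <- function(df, i, j) {
  suppressWarnings(chisq.test(table(df[[i]], df[[j]]))$p.value)
}

n <- ncol(mydata)
pvals <- matrix(NA_real_, n, n,
                dimnames = list(names(mydata), names(mydata)))

# Each pair of columns is tested once; the matrix is filled symmetrically.
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    pvals[i, j] <- pvals[j, i] <- chisq_p(mydata, i, j)
  }
}
pvals
```

Row k of pvals then holds the p-values of column k against every other column, with NA on the diagonal.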