which element is the minimum value of each sublist - r

I am trying to find out which element of each of my sub-lists is the minimum for that sub-list. The data I am working with is a vector of 41 entries, grouped by another function that produces the indices for each sub-group: elements 1:8 form the first sub-group, and the remaining sub-groups are 9:17, 18:23, 24:33 and 34:41. The data is called "b1", and the index that groups b1's elements into sub-groups is called "indx". I can find the minimum value in each sub-group using sapply like this:
sapply(indx, function(i) min(b1[i]))
But I am stuck at finding which element of "b1" each of the minima returned by sapply corresponds to. I know I probably need which() and mapply(), but I have not been able to put it together.
Reproducible data:
b1 <- sample(1:20,41,T)
starts <- c(1,9,18,24,34)
stops <- c(8,17,23,33,41)
indx <- mapply(seq, from=starts, to=stops)

You basically figured it out yourself.
Try
sapply(indx, function(i) which.min(b1[i]))
Edit
I'm not sure anymore if that is actually what you want. The answer above returns the index of the minimum element within each subgroup.
If you want the index in b1 itself instead, you could do the following (one of probably quite a few possible ways):
indices <- 1:length(b1)
sapply(indx, function(i) indices[i][which.min(b1[i])])
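To make the difference between the two results concrete, here is a small sketch on fixed data. The values and the three groups are made up for illustration, and Map() is used instead of mapply() only so that the index is a list even when all groups happen to have the same length.

```r
# Fixed, illustrative data so the result can be checked by hand
b1 <- c(5, 3, 9, 7, 2, 8, 6, 4)
starts <- c(1, 4, 7)
stops  <- c(3, 6, 8)
indx <- Map(seq, starts, stops)  # list of index vectors: 1:3, 4:6, 7:8

# Position of the minimum inside each sub-group
within_pos <- sapply(indx, function(i) which.min(b1[i]))

# Position of the same element in b1 itself
global_pos <- sapply(indx, function(i) i[which.min(b1[i])])

within_pos      # 2 2 2
global_pos      # 2 5 8
b1[global_pos]  # 3 2 4, the minima of the three groups
```

Note that which.min() returns only the first position when a group contains ties.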

Related

R: Removing vector entries from a list of vectors after comparison using operator

I'm trying to remove elements smaller than a given number from the vectors contained in a list. I manage to find exactly which elements in the vector meet my criteria, but somehow I'm failing to select them.
myList <- list(1:7,4:7,5:10)
lapply(myList, function(x)`>`(x ,5))
...
Rmagic
...
desiredoutput <- list(6:7,6:7,6:10)
I'm sure it's something to do with `[` but I can't figure it out and searching for this problem is a nightmare.
We need to subset each vector with the logical index (x >= 6, which for these integers is the same as x > 5):
lapply(myList, function(x) x[x>= 6])

Remove duplicates from list elements

I am trying to remove rows that have duplicate entries, as defined by two columns, from multiple dataframes located in a single list.
Simple data:
aa <- data.frame(a=rnorm(100),b=rnorm(100),x=rnorm(100),y=rnorm(100),Z=rep(1:4, each=25))
split.aa<-split(aa, aa$Z)
For each df in the list 'split.aa' I am trying to remove rows with duplicated x,y pairs.
I could do this one df a time with:
split.aa[[z]][!duplicated(split.aa[[z]][,c('x','y')]),]
where z is the name of each df within 'split.aa'.
How would I write this into lapply so that the action is performed on each element?
I am having a hard time wrapping my head around how to refer to the specific list elements within the lapply function.
lapply(split.aa, function(x) x[!duplicated(x[c("x", "y")]), ])
will do the trick.
Just define a function in lapply:
lapply(split.aa, function(x) x[!duplicated(x[c("x", "y")]), ])
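Since rnorm() data will practically never contain duplicated x,y pairs, the effect is easier to see on a small hand-made data frame. The values below are hypothetical and chosen so that each group contains one repeated pair.

```r
# Hand-made data: rows 1 and 3 share an x,y pair, as do rows 4 and 5
aa <- data.frame(x = c(1, 2, 1, 3, 3),
                 y = c(1, 2, 1, 4, 4),
                 Z = c(1, 1, 1, 2, 2))
split.aa <- split(aa, aa$Z)

# Inside the anonymous function, d is one data frame of the list at a time
dedup <- lapply(split.aa, function(d) d[!duplicated(d[c("x", "y")]), ])

sapply(dedup, nrow)  # group "1" keeps 2 rows, group "2" keeps 1 row
```

The argument of the anonymous function is how you refer to the current list element; lapply passes each data frame in turn.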

in R: combine columns of different dataframes

I am trying to combine the columns of three different dataframes, so that for every column position I get an object with the same number of rows as the original dataframes and three columns, one from each dataframe. Each of the original dataframes has 10 columns and 14 rows.
I tried it with a for-loop, but the result is not usable for me.
t <- NULL
for(i in 1 : length(net)) {
a <- cbind(imp.qua.00.09[i], exp.qua.00.09[i], net[i])
t <- list(t, a)
}
t
But in the end I would like to get 10 separate dataframes with three columns each.
So I want to loop through this:
a <- cbind(imp.qua.00.09[i], exp.qua.00.09[i], net[i])
for every column of each original dataframe. But if I use t <- list(t, a) it constructs a crazy list. Thanks.
The code you're using to append elements to t is wrong; you should do it this way:
t <- list()
for(i in 1:length(net)) {
a <- cbind(imp.qua.00.09[i], exp.qua.00.09[i], net[i])
t[[length(t)+1]] <- a
}
t
Your code is wrong because at each step you turn t into a list whose first element is the previous t (itself a list, except in the first iteration) and whose second element is the subset. So in the end you get a kind of recursive list of two elements, where the second is the data.frame subset and the first is again a two-element list with the same structure, nested ten levels deep.
Anyway, your code is equivalent to this one-liner (that is probably more efficient since it does not perform any list concatenation):
t <- lapply(1:length(net),
function(i){cbind(imp.qua.00.09[i], exp.qua.00.09[i], net[i])})
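As a quick check of the lapply version, here is a sketch with three tiny stand-in data frames. The names imp_df, exp_df and net_df and their values are made up; they only play the role of imp.qua.00.09, exp.qua.00.09 and net. Renaming the columns is an optional extra, since cbind would otherwise give all three columns the same name.

```r
# Hypothetical stand-ins for the three original data frames (3 columns, 2 rows)
imp_df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
exp_df <- data.frame(a = 7:8, b = 9:10, c = 11:12)
net_df <- exp_df - imp_df

combined <- lapply(seq_along(net_df), function(i) {
  out <- cbind(imp_df[i], exp_df[i], net_df[i])
  names(out) <- c("imp", "exp", "net")  # avoid three identically named columns
  out
})
names(combined) <- names(net_df)

length(combined)  # one data frame per original column
combined$b        # columns imp, exp, net for original column "b"
```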
This should work:
do.call(cbind,list(imp.qua.00.09, exp.qua.00.09, net))

Evaluating dataframe and storing the result

My dataframe (m*n) has a few hundred columns. I need to compare each column with all other columns (contingency table), perform a chisq test, and save the results for each column in a different variable.
It's working for one column at a time, like this:
s <- function(x) {
a <- table(x,data[,1])
b <- chisq.test(a)
}
c1 <- apply(data,2,s)
The results are stored in c1 for column 1, but how will I loop this over all columns and save result for each column for further analysis?
If you're sure you want to do this (I wouldn't, thinking about the multitesting problem), work with lists :
Data <- data.frame(
x=sample(letters[1:3],20,TRUE),
y=sample(letters[1:3],20,TRUE),
z=sample(letters[1:3],20,TRUE)
)
# Make a nice list of indices
ids <- combn(names(Data),2,simplify=FALSE)
# use the appropriate apply
my.results <- lapply(ids,
function(z) chisq.test(table(Data[,z]))
)
# use some paste voodoo to give the results the names of the column indices
names(my.results) <- sapply(ids,paste,collapse="-")
# select all values for y :
my.results[grep("y",names(my.results))]
Not harder than that. As I show you in the last line, you can easily get all tests for a specific column, so there is no need to make a list for each column. That just takes longer and takes more space, but gives the same information. You can write a small convenience function to extract the data you need :
extract <- function(col,l){
l[grep(col,names(l))]
}
extract("y",my.results)
This means you can even loop over the column names of your dataframe and get a list of lists returned :
lapply(names(Data),extract,my.results)
I strongly suggest you get yourself acquainted with working with lists, they're one of the most powerful and clean ways of doing things in R.
PS : Be aware that you save the whole chisq.test object in your list. If you only need the value for Chi square or the p-value, select them first.
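For the PS, a minimal sketch of keeping only the p-values could look like this. The seed is arbitrary, and suppressWarnings() is only there because chisq.test warns about small expected counts on toy data of this size.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
Data <- data.frame(
  x = sample(letters[1:3], 20, TRUE),
  y = sample(letters[1:3], 20, TRUE),
  z = sample(letters[1:3], 20, TRUE)
)
ids <- combn(names(Data), 2, simplify = FALSE)
my.results <- lapply(ids, function(z) {
  suppressWarnings(chisq.test(table(Data[, z])))
})
names(my.results) <- sapply(ids, paste, collapse = "-")

# Keep one number per test instead of the whole htest object
p.values <- sapply(my.results, function(res) res$p.value)
p.values  # named numeric vector with names "x-y", "x-z", "y-z"
```

The same pattern works with res$statistic if you want the Chi square value instead.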
Fundamentally, you have a few problems here:
You're relying heavily on global arguments rather than local ones. This makes the double usage of "data" confusing.
Similarly, you rely on a hard-coded value (column 1) instead of passing it as an argument to the function.
You're not extracting the one value you need from the chisq.test(). This means your result gets returned as a list.
You didn't provide some example data. So here's some:
m <- 10
n <- 4
mytable <- matrix(runif(m*n),nrow=m,ncol=n)
Once you fix the above problems, simply run a loop over various columns (since you've now avoided hard-coding the column) and store the result.
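One way such a loop might look is sketched below. It assumes categorical example data rather than the runif() matrix above, because a contingency table only makes sense for discrete values; the names chisq_p, mydata and pvals are made up for the illustration.

```r
# Both the data and the two column indices are passed in as arguments
chisq_p <- function(dat, i, j) {
  # suppressWarnings: small expected counts are likely with toy data
  suppressWarnings(chisq.test(table(dat[, i], dat[, j]))$p.value)
}

set.seed(42)  # arbitrary seed, only for reproducibility
m <- 30
n <- 4
mydata <- as.data.frame(matrix(sample(letters[1:3], m * n, TRUE),
                               nrow = m, ncol = n))

# Store one p-value per ordered pair of columns; the diagonal stays NA
pvals <- matrix(NA_real_, n, n,
                dimnames = list(names(mydata), names(mydata)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i != j) pvals[i, j] <- chisq_p(mydata, i, j)
  }
}
pvals
```

Row i of pvals then holds the results for column i against every other column.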

Row/column counter in 'apply' functions

What if one wants to apply a function to, e.g., each row of a matrix, but also wants to pass the number of that row as an argument to the function? As an example, suppose you wanted to get the n-th root of the numbers in each row of a matrix, where n is the row number. Is there another way (using apply only) than column-binding the row numbers to the initial matrix, like this?
test <- data.frame(x=c(26,21,20),y=c(34,29,28))
t(apply(cbind(as.numeric(rownames(test)),test),1,function(x) x[2:3]^(1/x[1])))
P.S. Actually if test was really a matrix : test <- matrix(c(26,21,20,34,29,28),nrow=3) , rownames(test) doesn't help :(
Thank you.
What I usually do is to run sapply on the row numbers 1:nrow(test) instead of test, and use test[i,] inside the function:
t(sapply(1:nrow(test), function(i) test[i,]^(1/i)))
I am not sure this is really efficient, though.
If you give the function a name rather than making it anonymous, you can pass arguments more easily. We can use nrow to get the number of rows and pass a vector of the row numbers in as a parameter, along with the frame to be indexed this way.
For clarity I used a different example function; this example multiplies column x by column y for a 2 column matrix:
test <- data.frame(x=c(26,21,20),y=c(34,29,28))
myfun <- function(position, df) {
print(df[position,1] * df[position,2])
}
positions <- 1:nrow(test)
lapply(positions, myfun, test)
cbind()ing the row numbers seems a pretty straightforward approach. For a matrix (or a data frame) the following should work:
apply( cbind(1:(dim(test)[1]), test), 1, function(x) plot(x[-1], main=x[1]) )
or whatever you want to plot.
Actually, in the case of a matrix, you don't even need apply. Just:
test^(1/row(test))
does what you want, I think. I think the row() function is the thing you are looking for.
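To see why this works, it may help to look at what row() returns for the matrix from the question:

```r
test <- matrix(c(26, 21, 20, 34, 29, 28), nrow = 3)

# row() gives a matrix of the same shape where every entry is its row number
row(test)
#      [,1] [,2]
# [1,]    1    1
# [2,]    2    2
# [3,]    3    3

# Element-wise power, so row n is raised to 1/n
roots <- test^(1 / row(test))
roots[1, ]   # 26 34, the first row is unchanged
roots[2, 1]  # same as sqrt(21)
```

There is a matching col() function if you need the column number instead.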
I'm a little confused, so excuse me if I get this wrong, but you want to work out the n-th root of the numbers in each row of a matrix, where n is the row number. If this is the case then it's really simple: create a new array with the same dimensions as the original, with every element holding its own row number:
test_row_order = array(seq_len(nrow(test)), dim = dim(test))
Then simply apply a function (the n-th root in this case):
n_root = test^(1/test_row_order)
