Does anyone know of a simple and efficient way to figure out how many values, in an unsorted vector, are greater than a variable?
My vector is 1,000,000 values long, and I have about 400 of these comparisons to make, with different vectors and variables. Any time-saving function would be appreciated...
If you just want to know how many meet the condition rather than which ones meet the condition, try this:
vector<-c(1,2,3,4,5)
sum(vector>1)
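As a rough sketch at the scale mentioned in the question (the names values and threshold, and the random data, are made up for illustration), the same idea looks like this:

```r
set.seed(42)                 # reproducible illustration data (an assumption)
values <- runif(1e6)         # unsorted vector of 1,000,000 values
threshold <- 0.5             # hypothetical comparison value

# The comparison yields a logical vector; sum() counts TRUE as 1 and FALSE as 0.
n_greater <- sum(values > threshold)

# If the vector may contain NA, drop those from the count explicitly.
n_greater_no_na <- sum(values > threshold, na.rm = TRUE)
```

This is a single vectorised pass over the data, so repeating it 400 times on vectors of this size should not be a bottleneck in most cases.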
Just use the which function. So if I have vector,
vector<-c(1,2,3,4,5)
which(vector>1)
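Outputs 2,3,4,5 (the indices of the matching elements), if I understand what you want.
To get the count rather than the positions, wrap the call in length(), i.e. length(which(vector>1)).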
You could sort your vector first (e.g. with quicksort) and then run a binary search on it. In the sorted vector, every element after the first one that is greater than your variable is also greater than it. The opposite holds for <.
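One possible way to sketch the sort-then-search idea in R is with sort() and findInterval(), which does the search on a sorted vector. The data and the names values and thresholds are assumptions for illustration. Sorting costs more than a single sum(values > x), so this probably only pays off when the same vector is compared against many different variables.

```r
set.seed(1)                      # reproducible illustration data (an assumption)
values <- runif(1e6)             # unsorted vector
thresholds <- c(0.1, 0.5, 0.9)   # hypothetical variables to compare against

sorted_values <- sort(values)    # sort once

# findInterval() returns, for each threshold, how many sorted elements are <= it.
n_less_equal <- findInterval(thresholds, sorted_values)

# Everything past that position is strictly greater than the threshold.
n_greater <- length(sorted_values) - n_less_equal
```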
I got this code from elsewhere and I am wondering if someone can explain what the square brackets are doing.
matrix1[i,] <- df[[1]][]
I am using this to assign values to a matrix and it works but I am not sure what exactly it's doing. What does the initial set of [[]] mean followed by another []?
This might help you understand a bit. You can copy and paste this code and see the differences between different ways of indexing using [] and $. The only thing I can't answer for you is the second empty set of square brackets, from my understanding that does nothing, unless a value is within those brackets.
#Retrieves the first column as a data frame
mtcars[1]
#Retrieves the first column values only (three different methods of doing the same thing)
mtcars[,1]
mtcars[[1]]
mtcars$mpg
#Retrieves the first row as a data frame
mtcars[1,]
#I can use a second set of brackets to get the 4th value within the first column
mtcars[[1]][4]
mtcars$mpg[4]
The general function of [ is that of subsetting, which is well documented both in help (as suggested in comments), and in this piece. The rest of my answer is heavily based on that source.
In fact, there are three operators for subsetting in R: [[, [, and $.
The [ and $ operators subset by index and by name, respectively. For example, the first three elements of the vector a = 1:10 can be extracted with a[c(1,2,3)]. You can also subset negatively to remove elements: a[-1] drops the first element.
The $ operator is different in that it only takes element names as input, e.g. if your df was a data frame with a column values, df$values would subset that column. You can achieve the same with brackets, but only with a quoted name: df[["values"]] returns the column itself, while df["values"] returns it as a one-column data frame.
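A small sketch of the operators side by side may help. The data frame df and its column values are hypothetical, chosen to match the names used in the paragraph above.

```r
a <- 1:10
first_three <- a[c(1, 2, 3)]   # positive indices keep elements 1 to 3
without_first <- a[-1]         # a negative index drops the first element

df <- data.frame(values = c(10, 20, 30))   # hypothetical data frame

by_dollar <- df$values         # the column as a plain vector
by_double <- df[["values"]]    # same result, using a quoted name
by_single <- df["values"]      # still a data frame, with one column
```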
To answer more specifically, what does df[[1]][] do?
First, the [[-operator will return the 1st element from df, and the following empty [-operator will pull everything from that output.
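To see this on a toy case, here is a minimal sketch. The data frame, the matrix dimensions and the value of i are all assumptions, since the question does not show them.

```r
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))   # hypothetical data frame
matrix1 <- matrix(0, nrow = 2, ncol = 3)           # hypothetical target matrix

first_col <- df[[1]]          # [[1]] extracts the first column as a vector
same_thing <- df[[1]][]       # the empty [] selects every element of it

i <- 1
matrix1[i, ] <- df[[1]][]     # row i now holds the first column's values
```

In this sketch the trailing [] changes nothing, so matrix1[i, ] <- df[[1]] would fill the row in the same way.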
Sorry if this is a duplicate, I read through a number of threads but couldn't really find a good explanation.
I have a dataset (dataframe) where I calculated the mean value of each column. I now want to do some logical comparisons between these values. I used lapply to get the means
means_list <- lapply(dataset_df, mean)
which outputs a named list. But when I try to compare two elements of this list, e.g.
means_list["condition1"] > means_list["condition2"]
I get an error ("comparison of these types is not implemented").
I don't get that error if I use sapply instead so that I'm working with a named vector. I can also get around the error by converting the list to a dataframe with as.data.frame first.
So, I feel like I'm doing something wrong when subsetting a named list here but I don't quite understand how. Is there a correct way to subset the list so that I can do the logical comparison? Or is this not possible with named lists?
Thanks!
To access an element of a list by its name, you have to use double brackets:
means_list[["condition1"]] > means_list[["condition2"]]
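A short sketch of the difference, using a made-up dataset_df with the two column names from the question:

```r
# Hypothetical data standing in for the asker's data frame.
dataset_df <- data.frame(condition1 = c(1, 2, 3), condition2 = c(4, 5, 6))
means_list <- lapply(dataset_df, mean)

single <- means_list["condition1"]     # single brackets: a list of length 1
double <- means_list[["condition1"]]   # double brackets: the number itself

# Comparing the extracted numbers works as expected.
is_greater <- means_list[["condition1"]] > means_list[["condition2"]]

# sapply() simplifies to a named numeric vector, where single brackets suffice.
means_vec <- sapply(dataset_df, mean)
is_greater_vec <- means_vec["condition1"] > means_vec["condition2"]
```

Single brackets always return an object of the same kind as the one being subset, which is why the list version still has a list on each side of the comparison.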
I have a data frame that contains 150 numerical values that I want the mean of. Which column or row they're on is not relevant at all, that's just how the data was given.
I have found a solution to do this, but it's so shamefully disgusting that I'd prefer to use a better method. I've literally just added up the mean of each column and divided by the number of columns...
This is still a 1-liner so it's not that bad, but there must be better ways to do this.
A thousand thanks in advance!
Best solution for me was to unlist it with the unlist() function. Thanks to @H 1!
Then you can simply use the mean() function.
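For illustration, a minimal sketch with a small made-up data frame (the asker's real data has 150 values, and all columns are assumed to be numeric):

```r
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))   # hypothetical numeric data

# unlist() flattens all columns into one vector, which mean() then averages.
overall_mean <- mean(unlist(df))

# For an all-numeric data frame, going through a matrix gives the same result.
overall_mean_alt <- mean(as.matrix(df))
```

If some cells are NA, mean(unlist(df), na.rm = TRUE) averages only the values that are present.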
I need to subset a list which contains an array as well as a factor variable. Essentially, each component of the array corresponds to a single individual, who is in turn associated with one level of a two-level factor variable (treatment).
list <- list(array=array(rnorm(5*5*10,4,1),c(5,5,10)), treatment= rep(c(1,2),5))
Typically when sub-setting multiple components of the array from the first component of the list I would use something like
list$array[,,c(2,4,6)]
this would return the array components in location 2,4 and 6. However, for the factor component of the list this wouldn't work as subsetting is different, what you would need is this:
list$treatment[c(2,4,6)]
I need to subset a list containing different classes (array and vector) by the same relative index.
You're treating your list of matrices as some kind of 3-dimensional object, but it's not.
Your list$matrices is itself a list as well, which means you can index it as a list; it doesn't matter if it is a list of matrices, numerics, plot-objects, or whatever.
The data you provided as an example can just be indexed at one level, so list$matrices[c(2,4,6)] works fine.
And I don't really get your question about saving the indices in a numeric vector, what's to stop you from this code?
indices <- c(2,4,6)
mysubset <- list(list$matrices[indices], list$treatment[indices])
EDIT, adding new info for edited question:
I see you actually have a 3-D array now. Which is kind of weird, as there is no clear convention of what can be seen as "components". I mean, from your question I understand that list$array[,,n] refers to the n-th individual, but from a pure code point of view there is no reason why something like list$array[n,,] couldn't refer to that.
Maybe you got the idea from other languages, but this is not really R-ish, your earlier example with a list of matrices made more sense to me. And I think the most logical would have been a data.frame with columns matrix and treatment (which is conceptually close to a list with a vector and a list of matrices, but it's clearer to others what you have).
But anyway, what is your desired output?
If it's just subsetting: with this structure, as there are no constraints on what could have been the content, you just have to tell R exactly what you want. There is no one operator that takes a subset of a vector and the 3rd index of an array at the same time. You're going to have to tell R that you want 3rd index to use for subsetting, and that you want to use the same index for subsetting a vector. Which is basically just the code you already have:
idx <- c(2,4,6)
output <- list(list$array[,,idx], list$treatment[idx])
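As a self-contained sketch of the same approach: the object is called my_list here (an assumption, chosen to avoid masking the base list() function), and drop = FALSE is added so the result stays a 3-D array even when idx has length one.

```r
set.seed(1)   # reproducible illustration data (an assumption)
my_list <- list(
  array = array(rnorm(5 * 5 * 10, mean = 4, sd = 1), dim = c(5, 5, 10)),
  treatment = rep(c(1, 2), 5)
)

idx <- c(2, 4, 6)   # the individuals to keep

output <- list(
  array = my_list$array[, , idx, drop = FALSE],   # subset along the 3rd dimension
  treatment = my_list$treatment[idx]              # subset the vector by the same index
)
```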
The way you subset multiple matrices actually gives an error, since you are supplying extra dimensions although you have already specified which sublist you are in. Hence, in order to subset the matrices for the given indices, you can use my_list[[1]][indices] or directly my_list$matrices[indices]. It is the same for treatment: my_list[[2]][indices] or my_list$treatment[indices].
I am trying to do some calculations where I divide two vectors. Sometimes I encounter a division by zero, which cannot take place. Instead of attempting this division, I would like to store an empty element in the output.
The question is: how do I do this? Can vectors have empty fields? Can a structure be the solution to my problem or what else should I use?
No, there must be something in the memory slot. Simply store a NaN or INT_MIN for integer values.
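The thread does not say which language is in use, so the following is only a sketch of the placeholder idea in R, where NA plays the role of the "empty" marker. The vectors a and b are made up.

```r
a <- c(1, 2, 3)   # hypothetical numerators
b <- c(2, 0, 4)   # hypothetical denominators, one of them zero

# Store NA wherever the denominator is zero, and the quotient elsewhere.
result <- ifelse(b == 0, NA_real_, a / b)

# Later code can detect the placeholder slots with is.na().
missing_slots <- which(is.na(result))
```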