How to create a sorted vector in R

I have a list of elements in random order. I want to read the elements one at a time and insert each into another list so that the second list stays sorted. I wonder how to do this in R. I tried the code below.
lst=list()
x=c(2,3,1,4,5)
for(i in 1:length(x)) ## for reading the elements from x
{
  if(lst==NULL)
  {
    lst=x[i]
  }
  else
  {
    lst=x[i]
    print(lst)
    for(k in 2: length(lst)) ## For sorting the elements in a list
    {
      value = lst[k]
      j=k-1
      while(j>=1 && lst[j]>value)
      {
        lst[j+1] = lst[j]
        j= j-1
      }
      lst[j+1] = value
    }
  }
  print(lst)
}
But I get the error:
error in if (lst == NULL) { : argument is of length zero.

For big datasets with lots of columns, you can use do.call
df1 <- df[do.call(order, df),]
Checking the order by specifying the column names,
df2 <- df[with(df, order(V1, V2, V3, V4)),]
identical(df1,df2)
#[1] TRUE
If you need to order in the reverse direction
df[do.call(order, c(df,decreasing=TRUE)),]
data
set.seed(24)
df <- as.data.frame(matrix(sample(letters,10*4,replace=TRUE),ncol=4))

First off, as commenters have pointed out, you could use sort or order. But I believe you are trying to solve an assignment.
Your problem is the comparison with NULL. Try executing in a console:
lst <- list()
lst == NULL
The last line evaluates to a zero-length logical vector (logical(0)), which if cannot interpret as TRUE or FALSE; that is where "argument is of length zero" comes from. Instead you are interested in
is.null(lst)
which will return TRUE or FALSE.
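As a rough sketch of the whole task with that check in place, the loop below inserts each element into an already sorted vector. The names sorted and pos are made up for illustration, and findInterval plus append are swapped in for the hand-written inner sorting loop, so treat it as one possible approach rather than the expected solution to the assignment.

```r
x <- c(2, 3, 1, 4, 5)
sorted <- NULL                       # nothing inserted yet

for (value in x) {
  if (is.null(sorted)) {
    # First element: is.null() returns a single TRUE/FALSE, so if() is happy
    sorted <- value
  } else {
    # Number of elements already in 'sorted' that are <= value
    pos <- findInterval(value, sorted)
    # Insert the new value right after those elements
    sorted <- append(sorted, value, after = pos)
  }
}

print(sorted)
```

Because sorted is kept in order after every insertion, findInterval can locate the insertion point without re-sorting the whole vector.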

Related

How to modify the list/vector argument in place in the R function lapply?

Perhaps an odd question but I have a real use case for it: is it possible to modify in place, i.e. when the function runs and not on a copy, the list/vector argument in the lapply function in R? And if yes, how to do it?
I need it because of some (desired) side effects.
To be more specific, after each element of the list/vector argument is 'read' (or 'used') by lapply, I need to delete it.
Best regards,
Olivier
Some code to illustrate what I am looking for.
Blocked is an atomic vector and B a list.
Unblock <- function(M)
{
  Blocked[M] <<- FALSE
  Length_B_M <- length(B[[M]])
  if(Length_B_M > 0)
  {
    # Vectorised version
    UnblockAndUpdateAdjacencyList <- function(Node.Value)
    {
      Node.Name <- as.character(Node.Value)
      Node.Value <<- NULL # This is not correct because it won't delete the argument in place
      if(Blocked[Node.Name]) Unblock(Node.Name)
    }
    lapply(X = B[[M]], FUN = UnblockAndUpdateAdjacencyList)
    # Unvectorised version
    # 'i' is a vector/list index - integer
    # Important: it is necessary to browse items backwards as items from the atomic vector 'B[[M]]' are deleted and thus its length decreases at each iteration
    for(i in Length_B_M:1)
    {
      P <- as.character(B[[M]][i])
      B[[M]] <<- B[[M]][-i]
      if(Blocked[P]) Unblock(P)
    }
  }
}
Actually I found a somewhat elegant solution, i.e. without any explicit loop and any copy of the vector/list in the argument of lapply.
Unblock <- function(M)
{
  Blocked[M] <<- FALSE
  Length_B_M <- length(B[[M]])
  if(Length_B_M > 0)
  {
    # Vectorised version
    # One solution is to copy the vector/list in the argument of 'lapply' - Actually not even needed
    UnblockAndUpdateAdjacencyList <- function(Node.Value)
    {
      Node.Name <- as.character(Node.Value)
      B[[M]] <<- B[[M]][-1]
      if(Blocked[Node.Name]) Unblock(Node.Name)
    }
    #B_M_copy <- B[[M]]
    #lapply(X = B_M_copy, FUN = UnblockAndUpdateAdjacencyList)
    lapply(X = B[[M]], FUN = UnblockAndUpdateAdjacencyList)
  }
}
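The reason no explicit copy is needed can be seen in a much smaller sketch: lapply evaluates its X argument once, so it keeps iterating over the original values even while <<- shrinks the variable they came from. The names queue, seen and consume below are hypothetical and only stand in for B[[M]] and the unblocking function.

```r
queue <- c("a", "b", "c")   # stands in for B[[M]]
seen  <- character(0)       # records what the function was called with

consume <- function(item) {
  # Drop the first element of the outer 'queue'; lapply's own copy is unaffected
  queue <<- queue[-1]
  seen  <<- c(seen, item)
  invisible(NULL)
}

invisible(lapply(queue, consume))

length(queue)  # the outer vector has been emptied
seen           # every original element was still visited
```

This relies on side effects, so a plain for loop over a copy may be easier to reason about if the recursion can also modify the same vector.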

User defined function - issue with return values

I regularly come up against the issue of how to categorise dataframes from a list of dataframes according to certain values within them (E.g. numeric, factor strings, etc). I am using a simplified version using vectors here.
After writing messy for loops for this task a bunch of times, I am trying to write a function to repeatedly solve the problem. The code below returns a subscripting error (given at the bottom), however I don't think this is a subscripting problem, but to do with my use of return.
As well as fixing this, I would be very grateful for any pointers on whether there are any cleaner / better ways to code this function.
library(plyr)
library(dplyr)
#dummy data
segmentvalues <- c('1_P', '2_B', '3_R', '4_M', '5_D', '6_L')
trialvec <- vector()
for (i in 1:length(segmentvalues)){
  for (j in 1:20) {
    trialvec[i*j] <- segmentvalues[i]
  }
}
#vector categorisation
vcategorise <- function(categories, data) {
  #categorises a vector into a list of vectors
  #requires plyr and dplyr
  assignment <- list()
  catlength <- length(categories)
  for (i in 1:length(catlength)){
    for (j in 1:length(data)) {
      if (any(contains(categories[i], ignore.case = TRUE,
                       as.vector(data[j])))) {
        assignment[[i]][j] <- data[j]
      }
    }
  }
  return (assignment)
}
result <- vcategorise(categories = segmentvalues, data = trialvec)
Error in *tmp*[[i]] : subscript out of bounds
You are indexing assignment with [[i]] and then indexing into the result with [j]. Because assignment starts out as an empty list, there is no element i yet, so assignment[[i]] fails with "subscript out of bounds" before the inner [j] assignment can happen. In other words, you haven't allocated the list to be the right size.
In any case, I don't think it is necessary for you to allocate a table. You are already using a flat indexing structure in your test data generation, so why not do the same with assignment and then set its dimensions afterwards?
Something like this, perhaps?
vcategorise <- function(categories, data) {
  assignment <- vector("list", length = length(data) * length(categories))
  n <- length(data)
  for (i in 1:length(categories)){
    for (j in 1:length(data)) {
      assignment[(i-1)*n + j] <-
        if (any(contains(categories[i],
                         ignore.case = TRUE,
                         as.vector(data[j])))) {
          data[j]
        } else {
          NA
        }
    }
  }
  dim(assignment) <- c(length(data), length(categories))
  assignment
}
It is not the prettiest code, but without fully understanding what you want to achieve, I don't know how to go further.
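If the goal is simply one vector per category, a loop-free sketch in base R might look like the following. It assumes that "belongs to a category" means "contains the category label as a substring, ignoring case", and it swaps grepl in for dplyr's contains; the function name categorise and the shortened dummy data are made up for illustration.

```r
segmentvalues <- c("1_P", "2_B", "3_R")
trialvec <- rep(segmentvalues, each = 4)   # four copies of each label

categorise <- function(categories, data) {
  groups <- lapply(categories, function(category) {
    # Case-insensitive literal substring match of the category in each element
    data[grepl(tolower(category), tolower(data), fixed = TRUE)]
  })
  # Name each list element after its category
  setNames(groups, categories)
}

result <- categorise(segmentvalues, trialvec)
lengths(result)
```

Returning a named list avoids pre-allocating anything, which sidesteps the subscript error altogether.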

R: Remove all data frames from work space that have 0 rows (i.e. are empty)

I know this should be easy, but I am baffled on how to solve this problem.
I have a bunch of data frames, some are empty (0 rows, 42 variables), some have information in them (x rows, 42 variables) from a previous working step. I now simply want to delete all those with 0 rows.
First, I get all DF by
alldfnames <- which(unlist(eapply(.GlobalEnv,is.data.frame)))
Second, I tried to write a function to distinguish between the data frames:
isFullDF <- function(x) dim(x)[1] > 0
Third, I tried to
for (i in seq_along(alldfnames)) {
  if(isFullDF(alldfnames[i]) == FALSE){
    rm(alldfnames[i])
  } else {
    # do nothing
  }
}
But this gives me (for hours now) an error:
Error in if (isFullDF(alldfnames[i]) == FALSE) { :
argument is of length zero
Any idea?
First if you look at alldfnames you'll see it's a vector of integers where names(alldfnames) are the names of the variables you are after. So alldfnames[i] is just a number. So you need
alldfnames <- names(alldfnames)
which is a character vector of df names.
Next, when you do dim(x) and (e.g.) you have a data frame called df in your environment, x is the character string "df", not the data frame. So you need to retrieve it. You can use get for that.
isFullDF <- function(x) nrow(get(x)) > 0
And then when you rm, you need to tell R that you are passing character strings holding the names of the objects to remove, as opposed to removing an object called alldfnames[i], i.e.
rm(list=alldfnames[i])
(as an aside, you don't need the else { } if it's empty).
Using Filter:
alldfnames = names(which(unlist(eapply(.GlobalEnv,is.data.frame))))
rowCounts = sapply(alldfnames,function(x) ifelse(nrow(get(x))==0,1,0))
emptyDF = names(Filter(function(f) f==1, rowCounts))
rm(list = emptyDF)
Try:
x <- eapply(.GlobalEnv,is.data.frame)
alldfnames <- names(x[x==T])
Now alldfnames contains all data frame names in your environment, then use the following function:
isFullDF <- function(nm) nrow(get(nm))>0
And then this one-line code instead of your for loop:
rm(list = alldfnames[!sapply(alldfnames, isFullDF)])
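To try the idea without touching a real workspace, the sketch below runs the same steps inside a throwaway environment. demo_env and the three objects in it are invented for the example; replacing demo_env with .GlobalEnv should give the behaviour asked for in the question.

```r
demo_env <- new.env()
assign("full_df",  data.frame(a = 1:3),        envir = demo_env)
assign("empty_df", data.frame(a = integer(0)), envir = demo_env)
assign("not_a_df", 1:10,                       envir = demo_env)

# Fetch every object by name, then flag the data frames that have zero rows
objects <- mget(ls(demo_env), envir = demo_env)
is_empty_df <- vapply(
  objects,
  function(obj) is.data.frame(obj) && nrow(obj) == 0,
  logical(1)
)

# rm() takes the names as a character vector through 'list ='
rm(list = names(objects)[is_empty_df], envir = demo_env)

ls(demo_env)
```

Objects that are not data frames are left alone because the is.data.frame check short-circuits before nrow is called.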

Trying to vectorize a for loop in R

UPDATE
Thanks to the help and suggestions of @CarlWitthoft my code was simplified to this:
model <- unlist(sapply(1:length(model.list),
function(i) ifelse(length(model.list[[i]][model.lookup[[i]]] == "") == 0,
NA, model.list[[i]][model.lookup[[i]]])))
ORIGINAL POST
Recently I read an article saying that vectorizing operations in R instead of using for loops is good practice. I have a piece of code built around a big for loop and I'm trying to turn it into a vector operation, but I cannot find a way. Could someone help me? Is it possible, or do I need to change my approach? My code works fine with the for loop, but I want to try the other way.
model <- c(0)
price <- c(0)
size <- c(0)
reviews <- c(0)
for(i in 1:length(model.list)) {
  if(length(model.list[[i]][model.lookup[[i]]] == "") == 0) {
    model[i] <- NA
  } else {
    model[i] <- model.list[[i]][model.lookup[[i]]]
  }
  if(length(model.list[[i]][price.lookup[[i]]] == "") == 0) {
    price[i] <- NA
  } else {
    price[i] <- model.list[[i]][price.lookup[[i]]]
  }
  if(length(model.list[[i]][reviews.lookup[[i]]] == "") == 0) {
    reviews[i] <- NA
  } else {
    reviews[i] <- model.list[[i]][reviews.lookup[[i]]]
  }
  size[i] <- product.link[[i]][size.lookup[[i]]]
}
Basically, model.list is a list from which I want to extract a particular value per element. The location of that value is given by model.lookup, price.lookup and reviews.lookup, which contain logical vectors with just one TRUE value that selects the desired entry from model.list. On each iteration of the for loop, the extracted values are stored in the variables model, price, size and reviews.
Could this be changed to a vector operation?
In general, try to avoid if when not needed. I think your desired output can be built as follows.
model <- unlist(sapply(1:length(model.list), function(i) model.list[[i]][model.lookup[[i]]]))
model[model=='']<-NA
And the same for your other variables. This assumes that all model.lookup[[i]] are of length one. If they aren't, you won't be able to write the output to a single element of model in the first place.
I would also note that you are grossly overcoding, e.g. x<-0 is better than x<-c(0), and don't bother with length evaluation on a single item.
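One way to keep the results aligned with model.list when a lookup selects nothing is to wrap the "pick or NA" step in a small helper and reuse it for each field. The sketch below is only an illustration: extract_field and the toy data are hypothetical, and it assumes every element of the lists is a character vector paired with a logical vector of the same length.

```r
model.list   <- list(c("A1", "10"), c("B2", "20"), c("C3", "30"))
model.lookup <- list(c(TRUE, FALSE), c(FALSE, FALSE), c(TRUE, FALSE))
price.lookup <- list(c(FALSE, TRUE), c(FALSE, TRUE), c(FALSE, FALSE))

extract_field <- function(values, lookups) {
  unname(mapply(function(v, keep) {
    hit <- v[keep]
    # No TRUE in the lookup: return NA so positions stay aligned
    if (length(hit) == 0) NA_character_ else hit[1]
  }, values, lookups))
}

model <- extract_field(model.list, model.lookup)
price <- extract_field(model.list, price.lookup)
```

Each call returns a vector with exactly one entry per list element, so the fields can be combined into a data frame afterwards without realignment.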

Using lapply to subset rows from data frames -- incorrect number of dimensions error

I have a list called "scenbase" that contains 40 data frames, which are each 326 rows by 68 columns. I would like to use lapply() to subset the data frames so they only retain rows 33-152. I've written a simple function called trim() (below), and am attempting to apply it to the list of data frames but am getting an error message. The function and my attempt at using it with lapply is below:
trim <- function(i)
{ (i <- i[33:152,]) }
lapply(scenbase, trim)
Error in i[33:152, ] : incorrect number of dimensions
When I try to do the same thing to one of the individual data frames (soil11base.txt) that are included in the list (below), it works as expected:
soil11base.txt <- soil11base.txt[33:152,]
Any idea what I need to do to get the dimensions correct?
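You have 2 solutions. You can either
(a) assign the result to a new list: newList = lapply(scenbase, function(x) { x[33:152,,drop=F]} )
(b) use the <<- operator, which assigns the trimmed data in place: lapply(1:length(scenbase), function(x) { scenbase[[x]] <<- scenbase[[x]][33:152,,drop=F]} ).
Your call does not change scenbase because the assignment to i inside trim only affects the function's local copy; lapply returns a new list rather than modifying its input. You can work around that with the <<- operator, which assigns to the first matching variable it finds in successive parent environments, or by creating a new trimmed list. Note that "incorrect number of dimensions" is the error R raises when [rows, ] is applied to an object that is not two-dimensional, so it is also worth checking that every element of scenbase really is a data frame.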
Here is some code that reproduces solution (a):
listOfDfs = list()
for(i in 1:10) { listOfDfs[[i]] = data.frame("x"=sample(letters,200,replace=T),"y"=sample(letters,200,replace=T)) }
choppedList = lapply(listOfDfs, function(x) { x[33:152,,drop=F]} )
Here is some code that reproduces solution (b):
listOfDfs = list()
for(i in 1:10) { listOfDfs[[i]] = data.frame("x"=sample(letters,200,replace=T),"y"=sample(letters,200,replace=T)) }
lapply(1:length(listOfDfs), function(x) { listOfDfs[[x]] <<- listOfDfs[[x]][33:152,,drop=F]} )
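Before trimming, it can help to confirm that every list element is actually a data frame, since a stray vector in the list would trigger the dimension error. The sketch below uses a made-up two-element list, scen_demo, in place of scenbase.

```r
set.seed(1)
scen_demo <- list(
  first  = data.frame(x = 1:200,   y = rnorm(200)),
  second = data.frame(x = 201:400, y = rnorm(200))
)

# TRUE for each element that is a data frame; anything FALSE needs a closer look
is_df <- vapply(scen_demo, is.data.frame, logical(1))
names(scen_demo)[!is_df]

# Keep rows 33 to 152 of every data frame, returning a new list
trimmed <- lapply(scen_demo[is_df], function(df) df[33:152, , drop = FALSE])
vapply(trimmed, nrow, integer(1))
```

Assigning the result back with scen_demo <- trimmed has the same effect as the <<- variant without reaching into the parent environment.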
