Process and update several data.frames/matrices simultaneously in R

I have a bunch of data.frames in my R workspace, and I need to apply exactly the same processing to each of them. Since I am too "lazy" to run the command for each data.frame one by one, I would like to treat them as a group and process them in a loop, which saves time.
As a simple stand-in for my real data-processing pipeline, say I want to apply as.data.frame to each of these matrices.
# dummy data
set.seed(1026)
a<-matrix(rnorm(100),50,2)
b<-matrix(rnorm(100),50,2)
c<-matrix(rnorm(100),50,2)
# process data one-by-one which is not good
a<-as.data.frame(a)
b<-as.data.frame(b)
c<-as.data.frame(c)
I could do that, but it is time-consuming. So I turned to a lazier but quicker approach, similar to how the *apply functions deal with rows or columns inside a data.frame:
sapply(c(a,b,c),as.data.frame) or sapply(list(a,b,c),as.data.frame), or even:
> for (dt in c(a,b,c)){
+ dt<-as.data.frame(dt)
+ }
But none of them changes the original three matrices.
> class(a)
[1] "matrix"
> class(b)
[1] "matrix"
> class(c)
[1] "matrix"
I would like all of them to be converted to data.frames.

Your problem is that you're using sapply, which simplifies results to vectors or matrices.
You want lapply instead, which always returns a list (assign the result if you want to keep it):
lapply(list(a,b,c), as.data.frame)
Edit: for the (generally frowned upon) practice of changing the objects systematically while keeping the object names the same:
for(i in c("a", "b", "c"))
  assign(i, as.data.frame(get(i)))
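As an alternative to assign/get, here is a minimal sketch that keeps the matrices in a named list, converts them with lapply, and only then copies the results back over the original names with list2env. The names mats and dfs are made up for illustration, and writing back into the workspace is optional; working with the list directly is usually simpler.

```r
set.seed(1026)
a <- matrix(rnorm(100), 50, 2)
b <- matrix(rnorm(100), 50, 2)
c <- matrix(rnorm(100), 50, 2)

# Collect the objects in a named list so the names travel with the data
mats <- list(a = a, b = b, c = c)

# lapply always returns a list, one converted data.frame per element
dfs <- lapply(mats, as.data.frame)

# Optional: overwrite a, b and c in the current environment
list2env(dfs, envir = environment())

class(a)
```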

This should get you a list of 3 data.frames:
set.seed(1026)
lapply(1:3,function(x){as.data.frame(matrix(rnorm(100),50,2))})

Related

Create vector using for loop with if else condition

How can I turn the 3-item output of the for loop below into a data frame? In attempting a solution, I've tried:
- Creating an object from the for loop, but couldn't succeed
- Creating a matrix, to no effect
What code would turn the output into a vector or list?
> for(i in X$Planned)ifelse(is.na(i),print("ISNA"),print("NOTNA"))
[1] "NOTNA"
[1] "NOTNA"
[1] "ISNA"
sapply(X$Planned, function(elem) if (is.na(elem)) {"isNA"} else {"notNA"})
# this will do it!
# however, it will be slower than the vectorized form
ifelse(is.na(X$Planned), "isNA", "notNA")
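To make the result reusable rather than just printed, the vectorized call can be assigned and placed next to the original column. This is a small sketch with a made-up three-row X standing in for the asker's data; status, status_loop and result are illustrative names.

```r
# Hypothetical stand-in for the asker's data frame
X <- data.frame(Planned = c(10, 20, NA))

# Vectorized: one call, returns a character vector of the same length
status <- ifelse(is.na(X$Planned), "ISNA", "NOTNA")

# Element-wise equivalent; vapply checks each result is a single string
status_loop <- vapply(
  X$Planned,
  function(elem) if (is.na(elem)) "ISNA" else "NOTNA",
  character(1)
)

# Store the labels alongside the original column
result <- data.frame(Planned = X$Planned, Status = status)
result
```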

Proper way to subset big.matrix

I would like to know if there is a 'proper' way to subset big.matrix objects in R. It is simple to subset a matrix, but the class always reverts to 'matrix'. This isn't a problem when working with small datasets like this one, but with extremely large datasets the subset could still benefit from the 'big.matrix' class.
require(bigmemory)
data(iris)
# I realize the warning about factors but not important for this example
big <- as.big.matrix(iris)
class(big)
[1] "big.matrix"
attr(,"package")
[1] "bigmemory"
class(big[,c("Sepal.Length", "Sepal.Width")])
[1] "matrix"
class(big[,1:2])
[1] "matrix"
I have since learned that the 'proper' way to subset a big.matrix is to use sub.big.matrix although this is only for contiguous columns and/or rows. Non-contiguous subsetting is not currently implemented.
sm <- sub.big.matrix(big, firstCol=1, lastCol=2)
It doesn't seem to be possible without calling as.big.matrix on the subset.
From the big.matrix documentation,
If x is a big.matrix, then x[1:5,] is returned as an R matrix containing the first five rows of x.
I presume this applies to columns as well. So it seems you would need to call
a <- as.big.matrix(big[,1:2])
in order for the subset to also be a big.matrix object.
class(a)
# [1] "big.matrix"
# attr(,"package")
# [1] "bigmemory"
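The two approaches above differ in what they return: sub.big.matrix gives a view onto a contiguous block of the parent, while as.big.matrix(big[, 1:2]) first materializes an ordinary R matrix and then copies it into a new big.matrix. The sketch below uses a small numeric matrix instead of iris and is wrapped in a requireNamespace guard in case bigmemory is not installed; view and copy are illustrative names.

```r
if (requireNamespace("bigmemory", quietly = TRUE)) {
  library(bigmemory)

  big <- as.big.matrix(matrix(as.numeric(1:20), nrow = 5, ncol = 4))

  # Contiguous columns: a big.matrix that references the parent's storage
  view <- sub.big.matrix(big, firstCol = 1, lastCol = 2)

  # Any column selection: goes through a regular R matrix, then copies
  copy <- as.big.matrix(big[, 1:2])

  # Writing through the view changes the parent; the copy is independent
  view[1, 1] <- 100
  print(big[1, 1])
  print(copy[1, 1])
}
```

The copy route works for non-contiguous selections too, but it needs enough RAM to hold the intermediate matrix, which may defeat the purpose for very large data.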

rbind doing different things for single versus multiple arguments

I am new to R and came across code that uses do.call("rbind", df.list) to combine a list of data frames.
The data frames have arrays as columns and rbind does remove the arrays, but only if there are at least two elements in the list to combine.
Quick example:
> class(rbind(data.frame(a=array(1,2)), data.frame(a=array(3,4)))$a)
[1] "numeric"
> class(rbind(data.frame(a=array(1,2)))$a)
[1] "array"
Is this a bug in rbind? It appears that when it is called with one argument it simply returns that argument, whereas when it is called with several it removes the arrays.
How can I "unarray" such a data frame if length(df.list) == 1?
Example of what I need:
> df.list1 <- list(data.frame(a=array(1,2), b=array("a")), data.frame(a=array(3,4), b=array("b")))
> df.list2 <- list(data.frame(a=array(1,2), b=array("a")))
> df.combined1 <- do.call("rbind", df.list1)
> df.combined2 <- do.call("rbind", df.list2)
> class(df.combined1$a)
[1] "numeric"
> class(df.combined2$a)
[1] "array"
The goal is for df.combined to have no array columns, regardless of whether df.list has one or multiple elements. The types and number of the data frame columns are not known in advance.
Let's start with:
class(rbind(data.frame(a=array(1,2)), data.frame(a=array(3,4))))
class(rbind(data.frame(a=array(1,2))))
Both these have class of data.frame.
Now, as you noticed:
> class(rbind(data.frame(a=array(1,2)), data.frame(a=array(3,4)))$a)
[1] "numeric"
> class(rbind(data.frame(a=array(1,2)))$a)
[1] "array"
The first one is expected; the second one is unexpected, and is due to the way the rbind method for data.frame works. As per the documentation for rbind:
... It [then] takes the classes of the columns from the first data frame,...
If you want to coerce the class in case of a single array to numeric, then you can use something like this:
ifelse(length(df.list) == 1,
class(rbind(data.frame(a=as.vector(array(1,2))))$a),
...)
Coercing it with as.vector gets rid of the array class.
(EDIT: Depending on what you want, you might also benefit from the discussion in the comments below!)
Lastly, note that this is an issue only for a one-dimensional array. For higher dimensions, you get the appropriate class:
class(rbind(data.frame(a=array(as.numeric(1:10),c(2,5))))$a.1)
EDIT: Based on your update, I think here is what you want:
df.list1 <- list(data.frame(a=array(1,2), b=array("a")), data.frame(a=array(3,4), b=array("b")))
df.list2 <- list(data.frame(a=array(1,2), b=array("a")))
combineDFList <- function(df.list) {
  temp <- do.call(rbind, df.list)
  # only flatten one-dimensional arrays; leave matrices and other columns alone
  if (is.array(temp$a) && length(dim(temp$a)) == 1L) temp$a <- as.numeric(temp$a)
  temp
}
df.combined1 <- combineDFList(df.list1)
df.combined2 <- combineDFList(df.list2)
class(df.combined1$a)
class(df.combined2$a)
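Since the question says the column names and types are not known in advance, the same idea can be generalized so it does not hard-code column a. This is a sketch only: drop_1d_arrays is a made-up helper name, and it assumes that stripping the dim attribute with as.vector is acceptable for every one-dimensional array column.

```r
# Hypothetical helper: flatten every 1-d array column, leave the rest untouched
drop_1d_arrays <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.array(col) && length(dim(col)) == 1L) as.vector(col) else col
  })
  df
}

df.list2 <- list(data.frame(a = array(1, 2), b = array("a")))

# Works the same whether the list has one element or many
combined <- drop_1d_arrays(do.call(rbind, df.list2))
sapply(combined, class)
```

Assigning into df[] keeps the data frame's row names and column names while replacing the column contents.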
Hope this helps!

basic R question on manipulating dataframes

I have a data frame with several columns. The rows have names.
I want to calculate some value for each row (col1/col2) and create a new data frame with the original row names. If I just do something like data$col1/data$col2 I get a vector with the results but lose the row names.
I know it's very basic, but I'm quite new to R.
It would help to read ?"[.data.frame" to understand what's going on. Specifically:
Note that there is no ‘data.frame’
method for ‘$’, so ‘x$name’ uses the
default method which treats ‘x’ as a
list.
You will see that the object's names are lost if you convert a data.frame to a list (using Joris' example data):
> as.list(Data)
$col1
[1] -0.2179939 -2.6050843 1.6980104 -0.9712305 1.6953474 0.4422874
[7] -0.5012775 0.2073210 1.0453705 -0.2883248
$col2
[1] -1.3623349 0.4535634 0.3502413 -0.1521901 -0.1032828 -0.9296857
[7] 1.4608866 1.1377755 0.2424622 -0.7814709
My suggestion would be to avoid using $ if you want to keep row names. Use this instead:
> Data["col1"]/Data["col2"]
col1
a 0.1600149
b -5.7435947
c 4.8481157
d 6.3816918
e -16.4146120
f -0.4757387
g -0.3431324
h 0.1822161
i 4.3114785
j 0.3689514
Use the function names() to add the names:
Data <- data.frame(col1=rnorm(10),col2=rnorm(10),row.names=letters[1:10])
x <- Data$col1/Data$col2
names(x) <- row.names(Data)
This solution gives a vector with the names. To get a data frame (solution from Marek):
NewFrame <- data.frame(x=Data$col1/Data$col2,row.names=row.names(Data))
A very simple and neat way is to use row.names(Data) to store the row names as an ordinary column, and then manipulate the data frame further.
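A minimal sketch of that last suggestion, reusing the Data example from above with a fixed seed. The column names id and ratio are made up for illustration.

```r
set.seed(1)
Data <- data.frame(col1 = rnorm(10), col2 = rnorm(10), row.names = letters[1:10])

# Keep the row names as a regular column so they survive later manipulation
Data$id <- row.names(Data)

# The computed values line up with the rows they came from
Data$ratio <- Data$col1 / Data$col2

# New data frame holding just the identifier and the result
NewFrame <- Data[, c("id", "ratio")]
head(NewFrame, 3)
```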

how do I get the difference between two R named lists?

OK, I've got two named lists, one is "expected" and one is "observed". They may be complex in structure, with arbitrary data types. I want to get a new list containing just those elements of the observed list that are different from what's in the expected list. Here's an example:
Lexp <- list(a=1, b="two", c=list(3, "four"))
Lobs <- list(a=1, c=list(3, "four"), b="ni")
Lwant <- list(b="ni")
Lwant is what I want the result to be. I tried this:
> setdiff(Lobs, Lexp)
[[1]]
[1] "ni"
Nope, that loses the name, and I don't think setdiff pays attention to the names. Order clearly doesn't matter here, and I don't want a=1 to match with b=1.
Not sure what a good approach is... Something that loops over a list of names(Lobs)? Sounds clumsy and non-R-like, although workable... Got any elegant ideas?
At least in this case
Lobs[!(Lobs %in% Lexp)]
gives you what you want.
OK, I found one slightly obtuse answer, using the plyr package:
> Lobs[laply(names(Lobs), function(x) !identical(Lobs[[x]], Lexp[[x]]))]
$b
[1] "ni"
So, it takes the names of the observed list, uses double-bracket indexing and the identical() function to compare the sub-lists, then uses the logical vector that results from laply() to index into the original observed list.
Anyone got a better/cleaner/sexier/faster way?
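The same comparison can be sketched in base R without plyr, using vapply in place of laply. Note that the %in% approach matches on values only and ignores names, whereas this version compares each element with the element of the same name. differs and Ldiff are illustrative names.

```r
Lexp <- list(a = 1, b = "two", c = list(3, "four"))
Lobs <- list(a = 1, c = list(3, "four"), b = "ni")

# For each name in Lobs, TRUE when the observed element differs from the
# expected element of the same name (a name missing from Lexp yields NULL
# and therefore counts as different)
differs <- vapply(
  names(Lobs),
  function(nm) !identical(Lobs[[nm]], Lexp[[nm]]),
  logical(1)
)

Ldiff <- Lobs[differs]
Ldiff
```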
