R ffdf sorted data

I want to sort the following ffdf:
z=as.ffdf(data.frame(w=c(4,1,2,5,7,8,65,3,2,9), x=c(12,1,3,5,65,3,2,45,34,11),y=1:10))
I need the data sorted by columns w and x. This would be a simple task with an ordinary data frame.
Thanks.

Use ffdforder from package ff. It returns an ff_vector of row positions, which you can use to index your ffdf without RAM issues.
require(ff)
z=as.ffdf(data.frame(w=c(4,1,2,5,7,8,65,3,2,9), x=c(12,1,3,5,65,3,2,45,34,11),y=1:10))
idx <- ffdforder(z[c("w","x")])
zordered <- z[idx, ]
zordered

You can try something like this, where z$w[] and z$x[] read the two key columns into RAM so that base order() can be used:
require(ffbase)
z <- as.ffdf(data.frame(w=c(4,1,2,5,7,8,65,3,2,9),
x=c(12,1,3,5,65,3,2,45,34,11),y=1:10))
z[order(z$w[], z$x[]), ]
##     w  x  y
## 2   1  1  2
## 3   2  3  3
## 9   2 34  9
## 8   3 45  8
## 1   4 12  1
## 4   5  5  4
## 5   7 65  5
## 6   8  3  6
## 10  9 11 10
## 7  65  2  7
You can use fforder to order your ffdf without using your RAM. Credit to @jwijffels.
z[fforder(z$w, z$x), ]
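For comparison, the in-memory version that the question calls simple can be sketched in base R on an ordinary data frame. The ff-based calls above follow the same pattern: compute an ordering, then index by it. The names df, idx and df_sorted are only illustrative.

```r
# Plain data frame with the same values as z in the question
df <- data.frame(w = c(4, 1, 2, 5, 7, 8, 65, 3, 2, 9),
                 x = c(12, 1, 3, 5, 65, 3, 2, 45, 34, 11),
                 y = 1:10)

# order() returns the row permutation: sort by w, break ties with x
idx <- order(df$w, df$x)
df_sorted <- df[idx, ]
df_sorted
```

This only works when the data fits in memory, which is exactly the case ffdforder and fforder are meant to avoid.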

Related

Need help concatenating column names

I am generating 5 different predictions and adding those predictions to an existing data frame. My code is:
for (j in i) {
…
actual.predicted <- data.frame(test_data, predicted)
}
I am trying to concatenate words together to create new column names, in the loop. Specifically, I have a column named “predicted” and I am generating predictions in each iteration of the loop. So, in the first iteration, I want the new column name to be “predicted.1” and for the second iteration, the new column name should be “predicted.2” and so on.
Any thoughts would be greatly appreciated.
You may not even need a loop here, but assuming you do, one pattern that might work well is to collect the results in a list:
results <- list()
for (j in i) {
  # do something involving j
  name <- paste0("predicted.", j)
  results[[name]] <- data.frame(test_data, predicted)
}
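As a rough sketch of the loop-free idea mentioned above: build all predictions with lapply, name them in one step with paste0, and bind them to the data once. The sample(10) call is only a placeholder for the real prediction step, and test_data here is a made-up stand-in for the asker's data.

```r
set.seed(1)                                  # only to make the placeholder values repeatable
test_data <- data.frame(orig_col = 1:10)     # hypothetical stand-in for the real test data

# One vector of predictions per iteration (placeholder: random permutation)
preds <- lapply(1:5, function(j) sample(10))

# Name all prediction columns at once: predicted.1 ... predicted.5
names(preds) <- paste0("predicted.", 1:5)

# Bind the named columns to the original data in a single step
actual.predicted <- cbind(test_data, as.data.frame(preds))
names(actual.predicted)
```

Binding once at the end also avoids growing the data frame inside a loop.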
One option is to set the names after assigning new columns
actual.predicted <- data.frame(orig_col = sample(10))
for (j in 1:5){
new_col = sample(10)
actual.predicted <- cbind(actual.predicted, new_col)
names(actual.predicted)[length(actual.predicted)] <- paste0('predicted.',j)
}
actual.predicted
#    orig_col predicted.1 predicted.2 predicted.3 predicted.4 predicted.5
# 1         1           4           4           9           1           5
# 2        10           2           3           7           5           9
# 3         8           6           5           4           2           3
# 4         5           9           9          10           7           7
# 5         2           1          10           8           3          10
# 6         9           7           6           6           8           6
# 7         7           8           7           2           4           2
# 8         3           3           1           1           6           8
# 9         6          10           2           3           9           4
# 10        4           5           8           5          10           1

Build a data frame with overlapping observations

Let's say I have a data frame with the following structure:
> DF <- data.frame(x=1:5, y=6:10)
> DF
  x  y
1 1  6
2 2  7
3 3  8
4 4  9
5 5 10
I need to build a new data frame with overlapping observations from the first data frame to be used as an input for building the A matrix for the Rglpk optimization library. I would use n-length observation windows, so that if n=2 the resulting data frame would join rows 1&2, 2&3, 3&4, and so on. The length of the resulting data frame would be
(numberOfObservations-windowSize+1)*windowSize
The result for this example with windowSize=2 would be a structure like
  x  y
1 1  6
2 2  7
3 2  7
4 3  8
5 3  8
6 4  9
7 4  9
8 5 10
I could do a loop like
DFResult <- NULL
numBlocks <- nrow(DF)-windowSize+1
for (i in 1:numBlocks) {
  DFResult <- rbind(DFResult, DF[i:(i+windowSize-1), ])
}
But this seems very inefficient, especially for very large data frames.
I also tried
rollapply(data=DF, width=windowSize, FUN=function(x) x, by.column=FALSE, by=1)
x y
[1,] 1 6
[2,] 2 7
[3,] 2 7
[4,] 3 8
where I was trying to repeat a block of rows without applying any aggregate function. This does not work since I am missing some rows.
I am a bit stumped by this and have looked around for similar problems but could not find any. Does anyone have any better ideas?
We could use a vectorized approach (shown here for windowSize=2):
i1 <- seq_len(nrow(DF))
res <- DF[c(rbind(i1[-length(i1)], i1[-1])),]
row.names(res) <- NULL
res
#  x  y
#1 1  6
#2 2  7
#3 2  7
#4 3  8
#5 3  8
#6 4  9
#7 4  9
#8 5 10
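The index trick above pairs each row with its successor, so it covers a window of 2. One possible generalization to an arbitrary window size, sketched here under the assumption that windows advance by one row, builds the index matrix with outer(). The function name overlapping_rows is made up for illustration.

```r
# Hypothetical helper: stack all overlapping windows of `window_size` rows
overlapping_rows <- function(df, window_size) {
  n_blocks <- nrow(df) - window_size + 1
  stopifnot(n_blocks >= 1)
  # Column j of idx holds the row numbers of window j: j, j+1, ..., j+window_size-1
  idx <- outer(seq_len(window_size) - 1L, seq_len(n_blocks), `+`)
  # as.vector() reads the matrix column by column, i.e. window by window
  res <- df[as.vector(idx), , drop = FALSE]
  row.names(res) <- NULL
  res
}

DF <- data.frame(x = 1:5, y = 6:10)
res2 <- overlapping_rows(DF, 2)
res3 <- overlapping_rows(DF, 3)
```

The result has (nrow(DF) - window_size + 1) * window_size rows, matching the formula in the question, and it is built with a single subsetting call instead of repeated rbind.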

R: merge tables on one identifier, summing other columns that share a name

I currently have multiple tables that I need to merge. For example, I have tbl_1, tbl_2, and tbl_3, and I want to end up with the result table below.
tbl_1:
ID trx_1 Cre_counts Deb_counts
 1    10          9          8
 2     5          6          5
 3    10          4          3
tbl_2:
ID trx_2 Unk_counts Deb_counts
 1    10          1          2
 2     5          6          5
 3    10          3          7
tbl_3:
ID trx_3 Unk_counts Ckc_counts
 1     3          4          4
 2     2          4          3
 3     8          7          6
result:
ID trx_1 trx_2 trx_3 Cre_counts Deb_counts Unk_counts Ckc_counts
 1    10    10     3          9         10          5          4
 2     5     5     2          6         10         10          3
 3    10    10     8          4         10         10          6
I have tried merging the three tables by "ID", but the column names change to Deb_counts.x, Deb_counts.y, and so on. I could use transform() and rowSums() as extra steps to make it work, but I am wondering whether there is an easier way to do it. Thank you!
Maybe not the most elegant but here is a way:
First, you need to put your tables into a list:
l_tbl <- mget(ls(pattern="^tbl"))
Then you go through the list, working with 2 tables at a time, thanks to Reduce, first adding the common columns, then merging:
Reduce(function(x, y) {
  # columns shared by both tables, apart from the key
  col_com <- setdiff(intersect(names(x), names(y)), "ID")
  if(length(col_com)) {
    # element-wise sum: assumes both tables list the IDs in the same row order
    x[, col_com] <- x[, col_com] + y[, col_com]
    y <- y[, !(names(y) %in% col_com)] # you only keep the "not common" columns in the second table
  }
  return(merge(x, y, by="ID"))
}, l_tbl)
  ID trx_1 Cre_counts trx_3 Ckc_counts trx_2 Deb_counts Unk_counts
1  1    10          9     3          4    10         10          5
2  2     5          6     2          3     5         10         10
3  3    10          4     8          6    10         10         10
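An alternative that does not rely on the tables sharing the same row order is to stack everything into long format (ID, column name, value) and sum per ID and column name. The following is a base R sketch of that idea, not the answer's method. The names long, wide and result are illustrative, and the columns come out in sorted name order rather than the order shown in the question.

```r
tbl_1 <- data.frame(ID = 1:3, trx_1 = c(10, 5, 10), Cre_counts = c(9, 6, 4), Deb_counts = c(8, 5, 3))
tbl_2 <- data.frame(ID = 1:3, trx_2 = c(10, 5, 10), Unk_counts = c(1, 6, 3), Deb_counts = c(2, 5, 7))
tbl_3 <- data.frame(ID = 1:3, trx_3 = c(3, 2, 8), Unk_counts = c(4, 4, 7), Ckc_counts = c(4, 3, 6))

# Stack every table into long format: one row per (ID, column name, value)
long <- do.call(rbind, lapply(list(tbl_1, tbl_2, tbl_3), function(d) {
  cols <- setdiff(names(d), "ID")
  data.frame(ID = rep(d$ID, length(cols)),
             variable = rep(cols, each = nrow(d)),
             value = unlist(d[cols], use.names = FALSE))
}))

# Sum values that share an ID and a column name, giving an ID x column matrix
wide <- tapply(long$value, list(long$ID, long$variable), sum)

# Back to a data frame with ID as an ordinary column
result <- data.frame(ID = as.integer(rownames(wide)), wide,
                     row.names = NULL, check.names = FALSE)
result
```

If an ID is missing from every table that carries a given column, that cell will be NA, which may or may not be what you want.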

How can I make a list from an existing data frame, where each element contains a vector built from one or more rows of the data frame?

I am very new to R and still getting my head around it, so my question may be very basic, but please help me out!
I have a large data frame, with more than 400000 rows.
GENE_ID p1 p2 p3 ...
     41  1  2  3
     41  4  5  6
     41  7  8  9
     85  1  2  3
   1923  1  2  3
   1923  4  5  6
First, I wanted to simply use GENE_ID as the row names, but that failed because some gene IDs are not unique.
Now I am thinking of turning this data frame into a list in which each element contains the expression levels of one gene.
So what I want is a list that looks something like
mylist$41
[1] 1 2 3 4 5 6 7 8 9
mylist$85
[1] 1 2 3
mylist$1923
[1] 1 2 3 4 5 6
Any advice to achieve this would be greatly appreciated.
We can do a melt by 'GENE_ID' and then do the split to get a list of vectors
library(reshape2)
mylist <- melt(df1, id.vars = 'GENE_ID')
split(mylist$value, mylist$GENE_ID)
#$`41`
#[1] 1 4 7 2 5 8 3 6 9
#$`85`
#[1] 1 2 3
#$`1923`
#[1] 1 4 2 5 3 6
Also, we can do this in base R
v1 <- unlist(df1[-1], use.names = FALSE)
grp <- rep(df1[,1], ncol(df1[-1]))
split(v1, grp)
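Both approaches above return the values column by column within each gene (1 4 7 2 5 8 ...), while the question shows them row by row (1 2 3 4 5 6 ...). If that order matters, one possible base R sketch is to split the data frame by gene first and then flatten each piece through a transpose. The df1 below is reconstructed from the rows shown in the question.

```r
# Sample data reconstructed from the question
df1 <- data.frame(GENE_ID = c(41, 41, 41, 85, 1923, 1923),
                  p1 = c(1, 4, 7, 1, 1, 4),
                  p2 = c(2, 5, 8, 2, 2, 5),
                  p3 = c(3, 6, 9, 3, 3, 6))

# Split the value columns by gene, then read each block row by row:
# t() turns rows into columns, and as.vector() reads column by column
mylist <- lapply(split(df1[-1], df1$GENE_ID),
                 function(d) as.vector(t(as.matrix(d))))

mylist[["41"]]
```

Because the names are numeric-looking, elements are accessed with mylist[["41"]] or with backticks after the $ operator.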

Build data.frame to set names?

I can convert a list into a data.frame with the do.call function:
z=list(c(1:3),c(5:7),c(7:9))
x=as.data.frame(do.call(rbind,z))
names(x)=c("one","two","three")
x
##   one two three
## 1   1   2     3
## 2   5   6     7
## 3   7   8     9
I want to make this more concise by merging the two statements into one. Can I?
x=as.data.frame(do.call(rbind,z))
names(x)=c("one","two","three")
setNames is what you want. It is in the stats package, which should load with R.
setNames(as.data.frame(do.call(rbind,z)), c('a','b','c'))
##   a b c
## 1 1 2 3
## 2 5 6 7
## 3 7 8 9
An alternative is the structure() function, which is in base and is more general:
structure(as.data.frame(do.call(rbind,z)), names=c('a','b','c'))
