create a vector from a given data frame row wise - r

in R I need to create a vector from a given data frame row wise. for example:
data frame:
A B
1 4
2 5
3 6
vector = 1 4 2 5 3 6
couldn't be so hard i think. Thanks so far.

> df <- data.frame(A=1:3,B=4:6)
> as.vector(t(df))
[1] 1 4 2 5 3 6

Related

Build a data frame with overlapping observations

Lets say I have a data frame with the following structure:
> DF <- data.frame(x=1:5, y=6:10)
> DF
x y
1 1 6
2 2 7
3 3 8
4 4 9
5 5 10
I need to build a new data frame with overlapping observations from the first data frame to be used as an input for building the A matrix for the Rglpk optimization library. I would use n-length observation windows, so that if n=2 the resulting data frame would join rows 1&2, 2&3, 3&4, and so on. The length of the resulting data frame would be
(numberOfObservations-windowSize+1)*windowSize
The result for this example with windowSize=2 would be a structure like
x y
1 1 6
2 2 7
3 2 7
4 3 8
5 3 8
6 4 9
7 4 9
8 5 10
I could do a loop like
DFResult <- NULL
numBlocks <- nrow(DF)-windowSize+1
for (i in 1:numBlocks) {
DFResult <- rbind(DFResult, DF[i:(i+horizon-1), ])
}
But this seems vey inefficient, especially for very large data frames.
I also tried
rollapply(data=DF, width=windowSize, FUN=function(x) x, by.column=FALSE, by=1)
x y
[1,] 1 6
[2,] 2 7
[3,] 2 7
[4,] 3 8
where I was trying to repeat a block of rows without applying any aggregate function. This does not work since I am missing some rows
I am a bit stumped by this and have looked around for similar problems but could not find any. Does anyone have any better ideas?
We could do a vectorized approach
i1 <- seq_len(nrow(DF))
res <- DF[c(rbind(i1[-length(i1)], i1[-1])),]
row.names(res) <- NULL
res
# x y
#1 1 6
#2 2 7
#3 2 7
#4 3 8
#5 3 8
#6 4 9
#7 4 9
#8 5 10

How can I make an list from existing data frame, each object in a list contains a vector of a single or multiple row from the data frame?

I am very new to R, still getting my head around so my question can be very basic but please help me out!
I have a large data frame, with more than 400000 rows.
GENE_ID p1 p2 p3 ...
41 1 2 3
41 4 5 6
41 7 8 9
85 1 2 3
1923 1 2 3
1923 4 5 6
First, I wanted to simply name the GENE_ID as the row name, but due to some gene IDs not unique, I failed.
Now I am thinking of making this data frame into a list each object contains expression level of a gene.
So what I want is a list that has outcome something like,
mylist$41
[1] 1 2 3 4 5 6 7 8 9
mylist$85
[1] 1 2 3
mylist$1923
[1] 1 2 3 4 5 6
Any advice to achieve this would be greatly appreciated.
We can do a melt by 'GENE_ID' and then do the split to get a list of vectors
library(reshape2)
mylist <- melt(df1, id.var = 'GENE_ID')
split(mylist$value, mylist$GENE_ID)
#$`41`
#[1] 1 4 7 2 5 8 3 6 9
#$`85`
#[1] 1 2 3
#$`1923`
#[1] 1 4 2 5 3 6
Also, we can do this in base R
v1 <- unlist(df1[-1], use.names = FALSE)
grp <- rep(df1[,1], ncol(df1[-1]))
split(v1, grp)

Extract data from data.frame based on coordinates in another data.frame

So here is what my problem is. I have a really big data.frame woth two columns, first one represents x coordinates (rows) and another one y coordinates (columns), for example:
x y
1 1
2 3
3 1
4 2
3 4
In another frame I have some data (numbers actually):
a b c d
8 7 8 1
1 2 3 4
5 4 7 8
7 8 9 7
1 5 2 3
I would like to add a third column in first data.frame with data from second data.frame based on coordinates from first data.frame. So the result should look like this:
x y z
1 1 8
2 3 3
3 1 5
4 2 8
3 4 8
Since my data.frames are really big the for loops are too slow. I think there is a way to do this with apply loop family, but I can't find how. Thanks in advance (and sorry for ugly message layout, this is my first post here and I don't know how to produce this nice layout with code and proper data.frames like in another questions).
This is a simple indexing question. No need in external packages or *apply loops, just do
df1$z <- df2[as.matrix(df1)]
df1
# x y z
# 1 1 1 8
# 2 2 3 3
# 3 3 1 5
# 4 4 2 8
# 5 3 4 8
A base R solution: (df1 and df2 are coordinates and numbers as data frames):
df1$z <- mapply(function(x,y) df2[x,y], df1$x, df1$y )
It works if the last y in the first data frame is corrected from 5 to 4.
I guess it was a typo since you don't have 5 columns in the second data drame.
Here's how I would do this.
First, use data.table for fast merging; then convert your data frames (I'll call them dt1 with coordinates and vals with values) to data.tables.
dt1<-data.table(dt)
vals<-data.table(vals)
Second, put vals into a new data.table with coordinates:
vals_dt<-data.table(x=rep(1:dim(vals)[1],dim(vals)[2]),
y=rep(1:dim(vals)[2],each=dim(vals)[1]),
z=matrix(vals,ncol=1)[,1],key=c("x","y"))
Now merge:
setkey(dt1,x,y)[vals_dt,z:=z]
You can also try the data.table package and update df1 by reference
library(data.table)
setDT(df1)[, z := df2[cbind(x, y)]][]
# x y z
# 1: 1 1 8
# 2: 2 3 3
# 3: 3 1 5
# 4: 4 2 8
# 5: 3 4 8

How to get the index column in a data frame in R [duplicate]

Starting with a data.frame...
df = data.frame(k=c(1,5,4,7,6), v=c(3,1,4,1,5))
> df
k v
1 1 3
2 5 1
3 4 4
4 7 1
5 6 5
I might run some number of arbitrary manipulations...
> foo1 = df[df$k>3,]
> foo2 = head(foo1[order(foo1$v),], 2)
> foo2
k v
2 5 1
4 7 1
At this point foo2 has somehow retained the original row numbers fromdf (in this case 2 and 4).
How do I extract these?
> insert_magic_function_here(foo2)
[1] 2 4
I think you're looking for rownames.

Select max or equal value from several columns in a data frame

I'm trying to select the column with the highest value for each row in a data.frame. So for instance, the data is set up as such.
> df <- data.frame(one = c(0:6), two = c(6:0))
> df
one two
1 0 6
2 1 5
3 2 4
4 3 3
5 4 2
6 5 1
7 6 0
Then I'd like to set another column based on those rows. The data frame would look like this.
> df
one two rank
1 0 6 2
2 1 5 2
3 2 4 2
4 3 3 3
5 4 2 1
6 5 1 1
7 6 0 1
I imagine there is some sort of way that I can use plyr or sapply here but it's eluding me at the moment.
There might be a more efficient solution, but
ranks <- apply(df, 1, which.max)
ranks[which(df[, 1] == df[, 2])] <- 3
edit: properly spaced!

Resources