I'm trying to create a data frame in R by generating rows and appending them one by one. I am doing the following:
# create an empty data frame.
x <- data.frame ()
# Create 2 lists.
l1 <- list (a = 9, b = 2, c = 4)
l2 <- list (a = 7, b = 2, c = 3)
# Append and print.
x <- rbind (x, l1)
x
a b c
2 9 2 4
# append l2
x <- rbind (x, l2)
x
a b c
2 9 2 4
21 7 2 3
# Append again
x <- rbind (x, l2)
x
a b c
2 9 2 4
21 7 2 3
3 7 2 3
# Append again.
x <- rbind (x, l2)
x
a b c
2 9 2 4
21 7 2 3
3 7 2 3
4 7 2 3
My question is: when I print x, what is the significance of the values printed at the beginning of each row (i.e. the values 2, 21, 3, 4, ...), and why do they appear as they do? I'd expect them to be 1, 2, 3, 4, and so on, showing the indexes of the corresponding rows.
Please help.
I think your issue is that you are trying to rbind a data.frame with a list. If you change your rbind commands to this:
x <- rbind (x, as.data.frame(l1))
you won't have an issue.
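As a minimal sketch of that suggestion (reusing l1 and l2 from the question), each list can be converted with as.data.frame before binding. The final rownames(x) <- NULL line is an extra step that is not part of the answer above; it resets the row labels to 1, 2, 3, ... regardless of how the rows were added.

```r
l1 <- list(a = 9, b = 2, c = 4)
l2 <- list(a = 7, b = 2, c = 3)

x <- data.frame()
# Convert each list to a one-row data frame before appending it
x <- rbind(x, as.data.frame(l1))
x <- rbind(x, as.data.frame(l2))
x <- rbind(x, as.data.frame(l2))

# Extra step (assumption, not from the answer): reset row names to 1..n
rownames(x) <- NULL
x
```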
If you have many lists, may I suggest the data.table package which is very convenient and fast. An example follows:
library(data.table)
n <- 100
V <- vector("list", n)
for (i in seq_len(n)) {
  V[[i]] <- list(a = runif(1), b = runif(1), c = runif(1))
}
V <- rbindlist(V)
V
Thanks.
You won't have strange row names if you avoid initializing an empty data frame.
x <- as.data.frame(l1)
x <- rbind (x, l1)
x <- rbind (x, l2)
x <- rbind (x, l2)
x
If you want to bind rows more efficiently, I recommend the function rbindlist from the data.table package.
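For readers who prefer to stay in base R, the same "collect first, bind once" idea can be sketched with do.call and rbind. This is an alternative to rbindlist, not what the answers above use, and the variable name rows is made up for illustration.

```r
# Hypothetical container: gather all rows in a list first
rows <- list(
  list(a = 9, b = 2, c = 4),
  list(a = 7, b = 2, c = 3),
  list(a = 7, b = 2, c = 3)
)

# Convert every list to a one-row data frame, then bind them in a single call
x <- do.call(rbind, lapply(rows, as.data.frame))
x
```

Binding once at the end avoids growing the data frame inside a loop, which copies it on every iteration.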
I have a list l and an integer n. I would like to pass l n-times to expand.grid.
Is there a better way than writing expand.grid(l, l, ..., l) with n times l?
The function rep seems to do what you want.
n <- 3 #number of repetitions
x <- list(seq(1,5))
expand.grid(rep(x,n)) #gives a data.frame of 125 rows and 3 columns
x2 <- list(a = seq(1,5), b = seq(6, 10))
expand.grid(rep(x2,n)) #gives a data.frame of 15625 rows and 6 columns
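A short sketch to check the claimed dimensions for the first case. The renaming at the end is an optional addition with a made-up naming scheme (rep1, rep2, ...), not part of the answer.

```r
n <- 3
x <- list(seq(1, 5))

g <- expand.grid(rep(x, n))
dim(g)    # 5^3 = 125 rows, 3 columns
names(g)  # default names: Var1, Var2, Var3

# Hypothetical naming scheme: one column name per repetition
g_named <- expand.grid(setNames(rep(x, n), paste0("rep", seq_len(n))))
names(g_named)
```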
If the solution by @Phann doesn't fit your situation, you can try the following "evil trio" solution:
l <- list(height = seq(60, 80, 5), weight = seq(100, 300, 50), sex = c("male", "female"))
n <- 4
eval(parse(text = paste("expand.grid(",
paste(rep("l", times = n), collapse = ","), ")")))
I think the easiest way to solve the original question is to nest the list using rep.
For example, to expand the same list, n times, use rep to expand the nested list as many times as necessary (n), then use the expanded list as the only argument to expand.grid.
# Example list
l <- list(1, 2, 3)
# Times required
n <- 3
# Expand as many times as needed
m <- rep(list(l), n)
# Expand away
expand.grid(m)
If you instead want expand.grid to act on the individual elements of the list (i.e., with the list flattened so that its members are no longer grouped), the following may be useful:
l <- list(1:5, "s") # A list with numerics and characters
n <- 3 # number of repetitions
expand.grid(unlist(rep(l, n))) # the result is:
Var1
1 1
2 2
3 3
4 4
5 5
6 s
7 1
8 2
9 3
10 4
11 5
12 s
13 1
14 2
15 3
16 4
17 5
18 s
I would like to process all rows in a data frame df by applying a function f to every row. Since f returns a numeric vector with two elements, I would like to assign the individual elements to new columns in df.
Here is a sample df, a trivial function f returning two elements, and my attempt using apply:
df <- data.frame(a = 1:3, b = 3:5)
f <- function (a, b) {
c(a + b, a * b)
}
df[, c('apb', 'amb')] <- apply(df, 1, function(x) f(a = x[1], b = x[2]))
This does not work; the results are assigned column by column:
> df
a b apb amb
1 1 3 4 8
2 2 4 3 8
3 3 5 6 15
You could also use Reduce instead of apply, as it is generally more efficient: it works on whole columns at once rather than row by row. You just need to modify your function slightly to use cbind instead of c:
f <- function (a, b) {
cbind(a + b, a * b) # modified to use `cbind` instead of `c`
}
df[c('apb', 'amb')] <- Reduce(f, df)
df
# a b apb amb
# 1 1 3 4 3
# 2 2 4 6 8
# 3 3 5 8 15
Note: This will only work nicely if you have exactly two columns (as in your example). If you have more columns in your data set, run this only on a two-column subset.
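To see why the two-column restriction matters, here is a small sketch showing that Reduce(f, df) on a two-column data frame is just f(df$a, df$b). The names direct and via_reduce are illustrative only.

```r
df <- data.frame(a = 1:3, b = 3:5)

f <- function(a, b) {
  cbind(a + b, a * b)
}

# With exactly two columns, Reduce makes a single call: f(column 1, column 2)
direct     <- f(df$a, df$b)
via_reduce <- Reduce(f, df)

identical(direct, via_reduce)
dim(via_reduce)  # one row per row of df, one column per element returned by f
```

With a third column, Reduce would go on to call f again with the matrix from the first step as its first argument, which is not the per-row computation intended here.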
You need to transpose the result of apply to get what you want:
df[, c('apb', 'amb')] <- t(apply(df, 1, function(x) f(a = x[1], b = x[2])))
> df
a b apb amb
1 1 3 4 3
2 2 4 6 8
3 3 5 8 15
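The shapes involved make the need for t() easier to see. The sketch below reuses df and f from the question and only adds dim() checks; res is an illustrative name.

```r
df <- data.frame(a = 1:3, b = 3:5)
f <- function(a, b) {
  c(a + b, a * b)
}

res <- apply(df, 1, function(x) f(a = x[1], b = x[2]))
dim(res)     # 2 x 3: one column per row of df, one row per element returned by f
dim(t(res))  # 3 x 2: one row per row of df, which is the shape df needs

df[, c("apb", "amb")] <- t(res)
df
```

Without the transpose, the six values are written into the two new columns in column-major order, which produces the scrambled result shown in the question.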
So, I have several dataframes like this
1 2 a
2 3 b
3 4 c
4 5 d
3 5 e
......
1 2 j
2 3 i
3 4 t
3 5 r
.......
2 3 t
2 4 g
6 7 i
8 9 t
......
I want to merge all of these files into a single file that shows, for each pair of values in columns 1 and 2, the values of the third column from every file, with 0 where that pair is not present in a file.
So, since there are three files here (in reality there are more), the output would be:
1 2 aj0
2 3 bit
3 4 ct0
4 5 d00
3 5 er0
6 7 00i
8 9 00t
......
What I did was combine all my text .txt files in a single list.
Then,
L <- lapply(seq_along(L), function(i) {
L[[i]][, paste0('DF', i)] <- 1
L[[i]]
})
This adds an indicator column to each data frame that will mark the presence of a value once they are merged.
I don't know how to proceed further. Any inputs will be great. Thanks!
Here is one way to do it with Reduce
# function to generate dummy data
gen_data<- function(){
data.frame(
x = 1:3,
y = 2:4,
z = sample(LETTERS, 3, replace = TRUE)
)
}
# generate list of data frames to merge
L <- lapply(1:3, function(x) gen_data())
# function to merge by x and y and concatenate z
f <- function(x, y){
d <- merge(x, y, by = c('x', 'y'), all = TRUE)
# set merged column to zero if no match is found
d[['z.x']] = ifelse(is.na(d[['z.x']]), 0, d[['z.x']])
d[['z.y']] = ifelse(is.na(d[['z.y']]), 0, d[['z.y']])
d$z <- paste0(d[['z.x']], d[['z.y']])
d['z.x'] <- d['z.y'] <- NULL
return(d)
}
# merge data frames
Reduce(f, L)
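A variation on the same Reduce and merge idea, run on the three data frames from the question so that the result can be compared with the desired output. This is a sketch under assumptions: the column names x, y, z1, z2, z3 and the object names d1, d2, d3, merged are made up, and giving each value column a unique name before merging is a design choice that differs from the function f above.

```r
# Data transcribed from the question; column names are hypothetical
d1 <- data.frame(x = c(1, 2, 3, 4, 3), y = c(2, 3, 4, 5, 5),
                 z = c("a", "b", "c", "d", "e"), stringsAsFactors = FALSE)
d2 <- data.frame(x = c(1, 2, 3, 3), y = c(2, 3, 4, 5),
                 z = c("j", "i", "t", "r"), stringsAsFactors = FALSE)
d3 <- data.frame(x = c(2, 2, 6, 8), y = c(3, 4, 7, 9),
                 z = c("t", "g", "i", "t"), stringsAsFactors = FALSE)
L <- list(d1, d2, d3)

# Rename the value column to z1, z2, z3 so merge never has to add suffixes
L <- lapply(seq_along(L), function(i) {
  names(L[[i]])[3] <- paste0("z", i)
  L[[i]]
})

# Full outer join on the key pair, folded over the whole list
merged <- Reduce(function(a, b) merge(a, b, by = c("x", "y"), all = TRUE), L)

# Replace missing values with "0", then concatenate the value columns
zcols <- paste0("z", seq_along(L))
merged[zcols][is.na(merged[zcols])] <- "0"
merged$z <- do.call(paste0, unname(as.list(merged[zcols])))
merged[, c("x", "y", "z")]
```

Because the zero-filling happens once after all merges, a pair that is missing from several files gets one "0" per file, as in the "d00" and "00i" rows of the desired output.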
I have a matrix of mutation counts, say "counts". This matrix has column names V1, V2, ..., Vi, ..., Vn, where not every "i" is present, so the names can jump, e.g. V1, V2, V5. Further, most of the columns contain a 0.
I need to create a sum matrix, called "answer", where element i, j is the sum of the counts at i and j. The i, i element just shows the count at i.
Here's a quick data setup. I already have the correctly dimensioned matrix, called "answer", set up in my code. What I need to automate are the last several lines, where I fill in the matrix.
counts <- matrix(data = c(0,2,0,5,0,6,0), nrow = 1, ncol = 7, dimnames=list("",c("V1","V2","V3","V4","V5","V6","V7")))
answer <- matrix(data =0, nrow = 3, ncol = 3, dimnames = list(c("V2","V4","V6"),c("V2","V4","V6")))
answer[1,1] <- 2
answer[1,2] <- 7
answer[1,3] <- 8
answer[2,1] <- 7
answer[2,2] <- 5
answer[2,3] <- 11
answer[3,1] <- 8
answer[3,2] <- 11
answer[3,3] <- 6
I understand I can do this with two nested for loops, but surely there must be a better way, no? Thanks!
This could be done with the right use of expand.grid and rowSums:
n = counts[, counts > 0]
answer = matrix(rowSums(expand.grid(n, n)), nrow=length(n), dimnames=list(names(n), names(n)))
diag(answer) = n
To show how it works, n would end up being:
V2 V4 V6
2 5 6
and expand.grid(n, n) would be:
Var1 Var2
1 2 2
2 5 2
3 6 2
4 2 5
5 5 5
6 6 5
7 2 6
8 5 6
9 6 6
The last line (diag) is necessary because otherwise the diagonal would be twice the original vector (adding 2+2, 5+5, or 6+6).
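The same matrix can also be sketched with outer, which builds the pairwise sums directly and carries the names over to both dimensions. This is an alternative to the expand.grid approach above, not part of the original answer.

```r
counts <- matrix(c(0, 2, 0, 5, 0, 6, 0), nrow = 1,
                 dimnames = list("", paste0("V", 1:7)))

# Keep only the non-zero counts (a named vector, since counts has one row)
n <- counts[, counts > 0]

# Pairwise sums; the diagonal would be 2 * n, so overwrite it with n
answer <- outer(n, n, "+")
diag(answer) <- n
answer
```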
I would like to interweave two data.frame in R. For example:
a = data.frame(x=1:5, y=5:1)
b = data.frame(x=2:6, y=4:0)
I would like the result to look like:
> x y
1 5
2 4
2 4
3 3
3 3
...
obtained by rbinding a[1, ] with b[1, ], a[2, ] with b[2, ], etc.
What is the cleanest way to do this? Right now my solution involves splitting everything out into a list and merging. This is pretty ugly:
lst = lapply(seq_len(nrow(a)), function(i) rbind(a[i, ], b[i, ]))
res = do.call(rbind, lst)
There is, of course, the interleave function in the "gdata" package:
library(gdata)
interleave(a, b)
# x y
# 1 1 5
# 6 2 4
# 2 2 4
# 7 3 3
# 3 3 3
# 8 4 2
# 4 4 2
# 9 5 1
# 5 5 1
# 10 6 0
You can do this by giving a and b an index, rbinding them, and sorting by the index.
a = data.frame(x=1:5, y=5:1)
b = data.frame(x=2:6, y=4:0)
df <- rbind(data.frame(a, index = 1:nrow(a)), data.frame(b, index = 1:nrow(b)))
df <- df[order(df$index), c("x", "y")]
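A more compact sketch of the same index idea, without storing the index as a column. It relies on order keeping tied values in their original order, so each row of a comes before the matching row of b. The names idx and res are illustrative.

```r
a <- data.frame(x = 1:5, y = 5:1)
b <- data.frame(x = 2:6, y = 4:0)

# Row positions 1..5 for a followed by 1..5 for b; ordering them interleaves the rows
idx <- order(c(seq_len(nrow(a)), seq_len(nrow(b))))
res <- rbind(a, b)[idx, ]

# Optional: drop the shuffled row names
rownames(res) <- NULL
res
```

Unlike sorting on a data column, this does not depend on the values in x or y being ordered.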
This is how I'd approach:
dat <- do.call(rbind.data.frame, list(a, b))
dat[order(dat$x), ]
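do.call is not strictly necessary in the first step, but it makes the solution easier to extend to more data frames. Note that ordering by x interleaves the rows here only because x is increasing in both data frames.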
Perhaps this is cheating a bit, but the (non-exported) function interleave from ggplot2 is something I've stolen for my own uses before:
as.data.frame(mapply(FUN=ggplot2:::interleave,a,b))