How to combine a data frame and a vector

df<-data.frame(w=c("r","q"), x=c("a","b"))
y=c(1,2)
How do I combine df and y into a new data frame that has all combinations of rows from df with elements from y? In this example, the output should be:
data.frame(w=c("r","r","q","q"), x=c("a","a","b","b"),y=c(1,2,1,2))
w x y
1 r a 1
2 r a 2
3 q b 1
4 q b 2

This builds every combination of the values in df's columns and y with expand.grid(), then merge() keeps only the combinations that match an actual row of df:
dl <- unclass(df)
dl$y <- y
merge(df, expand.grid(dl))
# w x y
# 1 q b 1
# 2 q b 2
# 3 r a 1
# 4 r a 2
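If only the set of combinations matters and not their row order, merge() can build the cross join directly: its documentation states that when `by` is empty (length zero or NULL) the result is the Cartesian product of the two inputs. A minimal sketch; the rows may come back in a different order from the one in the question, so reorder them afterwards if that matters:

```r
df <- data.frame(w = c("r", "q"), x = c("a", "b"))
y <- c(1, 2)

# With by = NULL there are no key columns, so merge() returns the
# Cartesian product: every row of df paired with every element of y
cj <- merge(df, data.frame(y = y), by = NULL)
cj
```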

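Alternatively, in base R, repeat every column of df length(y) times and let data.frame() recycle y to the same length:
data.frame(lapply(df, rep, each = length(y)), y = y)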

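This works for the example, where y has as many elements as df has rows; note that with three or more rows, the permutations produce each row/element pair more than once: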
library(combinat)
df<-data.frame(w=c("r","q"), x=c("a","b"))
y=c("one", "two") #for generality
indices <- permn(seq_along(y))
combined <- NULL
for(i in indices){
current <- cbind(df, y=y[unlist(i)])
if(is.null(combined)){
combined <- current
} else {
combined <- rbind(combined, current)
}
}
print(combined)
Here is the output:
w x y
1 r a one
2 q b two
3 r a two
4 q b one
... or to make it shorter (and less obvious):
combined <- do.call(rbind, lapply(indices, function(i){cbind(df, y=y[unlist(i)])}))

First, convert the columns from factor to character:
df <- data.frame(lapply(df, as.character), stringsAsFactors=FALSE)
Then, use expand.grid to get an index matrix for all combinations of rows of df and elements of y:
ind.mat = expand.grid(1:length(y), 1:nrow(df))
Finally, loop through the rows of ind.mat to get the result:
res <- data.frame(t(apply(ind.mat, 1, function(x){c(as.character(df[x[2], ]), y[x[1]])})))
names(res) <- c(names(df), "y")  # apply() returns unnamed columns (X1, X2, X3)
res
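The apply() step loops over rows and turns every value into character. A sketch of the same index-grid idea using vectorized indexing instead (the column names iy and irow are labels chosen here for readability), which keeps y numeric and needs no factor conversion:

```r
df <- data.frame(w = c("r", "q"), x = c("a", "b"))
y <- c(1, 2)

# Index grid: iy indexes y and varies fastest, irow indexes rows of df
ind.mat <- expand.grid(iy = seq_along(y), irow = seq_len(nrow(df)))

# Subset df by row index and y by element index in one step;
# no apply() loop, and the original column types are kept
res <- data.frame(df[ind.mat$irow, , drop = FALSE], y = y[ind.mat$iy])
rownames(res) <- NULL
res
```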

Related

Aggregate rows under certain conditions using aggregate() function with R, without using dplyr

I want to aggregate rows in my table under certain conditions. For example, I have:
x <- data.frame("id"=c("T","T","R","R"),"value"=c(10,-5,10,-5),"level"=c(3,2,1,2))
print(x)
My condition is: for the same "id", if the level of the negative value is lower than the level of the positive value, then the two values are aggregated by summing them. So I get:
x <- data.frame("id"=c("T","R","R"),"value"=c(5,10,-5))
print(x)
Can I do this using the aggregate() function?

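Yes: first give each negative row the level of its id's positive row when the condition holds, then aggregate() by id and that level:
x <- data.frame("id"=c("T","T","R","R"),"value"=c(10,-5,10,-5),"level"=c(3,2,1,2))
# named vector mapping each id to the level of its positive value
lookup_vec <- setNames(x[sign(x$value) == 1, ]$level,
as.character(x[sign(x$value) == 1, ]$id))
x$level_plus <- lookup_vec[as.character(x$id)]
# rows whose level does not exceed the positive row's level join that row's group
x$level_plus <- ifelse(x$level_plus >= x$level, x$level_plus, x$level)
aggregate(value ~ id + level_plus, x, sum)[c("id", "value")]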
# id value
# 1 R 10
# 2 R -5
# 3 T 5
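The lookup-vector step can also be written with ave(), which computes a per-id value without building a named vector by hand. A sketch under the same assumption as the answer above (each id has exactly one positive row); the names pos_level and grp are made up for illustration:

```r
x <- data.frame(id = c("T", "T", "R", "R"), value = c(10, -5, 10, -5),
                level = c(3, 2, 1, 2))

# Level of the positive value within each id, spread to every row of that id
pos_level <- ave(ifelse(x$value > 0, x$level, NA), x$id,
                 FUN = function(v) max(v, na.rm = TRUE))

# A negative row joins its positive row's group when its level is lower;
# otherwise it keeps its own level and stays a separate group
x$grp <- ifelse(x$value < 0 & x$level < pos_level, pos_level, x$level)

res <- aggregate(value ~ id + grp, data = x, FUN = sum)[c("id", "value")]
res
```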

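You could use by(), handling each id separately:
res <- do.call(rbind, by(x, x$id, function(g) {
  # sum the pair only when the negative value's level is below the positive one's
  if (g$level[g$value < 0] < g$level[g$value > 0])
    data.frame(id = g$id[1], value = sum(g$value))
  else
    g[, c("id", "value")]
}))
rownames(res) <- NULL
res
#   id value
# 1  R    10
# 2  R    -5
# 3  T     5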

how to use lapply to add the value of each number in a vector to a list of DFs

I have a list of n data frames (n is the length of the list), each with two columns, and a numeric vector of length n. I want to add each value in the vector to the corresponding data frame in the list: the first value to every cell of the first data frame, the second value to the second, and so on.
For example,
df1 <- data.frame(x = runif(3), y = runif(3))
df2 <- data.frame(x = runif(3), y = runif(3))
dfs <- list(df1, df2)
a <- c(1,2)
print(dfs)
[[1]]
x y
1 0.8272478 0.2574596
2 0.6211760 0.9493301
3 0.7034334 0.9994961
[[2]]
x y
1 0.3088512 0.7153767
2 0.2060098 0.8956978
3 0.5299310 0.1292302
I want the result to be
[[1]]
x y
1 1.8272478 1.2574596
2 1.6211760 1.9493301
3 1.7034334 1.9994961
[[2]]
x y
1 2.3088512 2.7153767
2 2.2060098 2.8956978
3 2.5299310 2.1292302

We can use Map to add each element of 'a' to the corresponding data frame in 'dfs':
Map(`+`, dfs, a)
#[[1]]
# x y
#1 1.827248 1.257460
#2 1.621176 1.949330
#3 1.703433 1.999496
#[[2]]
# x y
#1 2.308851 2.715377
#2 2.206010 2.895698
#3 2.529931 2.129230
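Map() does nothing special for data frames here: its documentation describes it as a simple wrapper around mapply() that does not simplify the result, so it calls `+` on the first data frame and the first number, then on the second pair, and so on. A small check on fixed toy data (the values are arbitrary):

```r
dfs <- list(data.frame(x = c(0.1, 0.2), y = c(0.3, 0.4)),
            data.frame(x = c(0.5, 0.6), y = c(0.7, 0.8)))
a <- c(1, 2)

# Map(f, ...) behaves like mapply(f, ..., SIMPLIFY = FALSE):
# f(dfs[[1]], a[1]), f(dfs[[2]], a[2]), ... collected in a list
via_map    <- Map(`+`, dfs, a)
via_mapply <- mapply(`+`, dfs, a, SIMPLIFY = FALSE)
identical(via_map, via_mapply)
```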

If you want to do it using only lapply, as the question's title asks, you need to keep track of which position you are on. One way is a global counter that persists between function calls:
curIndex <- 1
Then, we need a function that will add the correct element to the given data.frame, and increment the counter.
addNextElement <- function(x) {
nextElement <- a[curIndex]
curIndex <<- curIndex + 1
return(as.data.frame(lapply(x, `+`, nextElement)))
}
Then just apply the above function to each df. Because curIndex is global, reset it to 1 before running this again:
> lapply(dfs, addNextElement)
#[[1]]
# x y
#1 1.391478 1.174208
#2 1.753578 1.222098
#3 1.973113 1.569923
#
#[[2]]
# x y
#1 2.454878 2.135680
#2 2.749754 2.132524
#3 2.269514 2.241654
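The global counter can be avoided by letting lapply() iterate over positions rather than over the data frames themselves, so the index is available inside the function. A sketch with fresh example data (set.seed() is added only to make it reproducible):

```r
set.seed(1)  # reproducibility only
df1 <- data.frame(x = runif(3), y = runif(3))
df2 <- data.frame(x = runif(3), y = runif(3))
dfs <- list(df1, df2)
a <- c(1, 2)

# Iterate over indices: the i-th data frame gets the i-th value of a,
# and no state is kept between calls
res <- lapply(seq_along(dfs), function(i) dfs[[i]] + a[i])
res
```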

Replicate variable based off match of two other variables in R

I've got a seemingly simple question that I can't answer: I've got three vectors:
x <- c(1,2,3,4)
weight <- c(5,6,7,8)
y <- c(1,1,1,2,2,2)
I want to create a new weight vector aligned with y: for each element of y, take the weight of the element of x that matches it. The result should be:
y_weight <- c(5,5,5,6,6,6)
Any thoughts on how to do this (either loop or vectorized)? Thanks

You want the match function. This returns the indices of the matches:
match(y, x)
Then use those indices to build the new weight vector:
weight[match(y, x)]
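To make the two steps concrete, here is what each returns on the question's vectors. A value of y that does not occur in x gets NA from match(), and that NA propagates into the weight:

```r
x <- c(1, 2, 3, 4)
weight <- c(5, 6, 7, 8)
y <- c(1, 1, 1, 2, 2, 2)

idx <- match(y, x)      # position in x of each element of y: 1 1 1 2 2 2
y_weight <- weight[idx]
y_weight                # 5 5 5 6 6 6

# 9 does not occur in x, so its weight is NA
weight[match(c(1, 9), x)]   # 5 NA
```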

#Using plyr
library(plyr)
df<-as.data.frame(cbind(x,weight)) # converting to dataframe
df<-rename(df,c(x="y")) # rename x as y for joining dataframes
y<-as.data.frame(y) # converting to dataframe
mydata <- join(df, y, by = "y",type="right")
> mydata
y weight
1 1 5
2 1 5
3 1 5
4 2 6
5 2 6
6 2 6
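The same join works without plyr using base R's merge(). A sketch, assuming a lookup table whose key column is named y to match; all.x = TRUE keeps every element of y, as type = "right" does in the plyr call above. Note that merge() sorts the result by the key, so when the original order of y matters, match() from the previous answer is the safer choice:

```r
x <- c(1, 2, 3, 4)
weight <- c(5, 6, 7, 8)
y <- c(1, 1, 1, 2, 2, 2)

# Lookup table keyed by the values of x, with the key column named y
lookup <- data.frame(y = x, weight = weight)

# Keep every row of the y table; unmatched keys would get weight NA
mydata <- merge(data.frame(y = y), lookup, by = "y", all.x = TRUE)
mydata
```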

How to find which elements of one set are in another set?

I have two sets, A and B, both data frames with columns x and y.
I need to find the indices of the rows of A that are also in B (both x and y must match).
I have come up with a simple solution (see below), but this comparison runs inside a loop, and paste() adds a lot of extra time.
B <- data.frame(x = sample(1:1000, 1000), y = sample(1:1000, 1000))
A <- B[sample(1:1000, 10),]
#change some elements
A$x[c(1,3,7,10)] <- A$x[c(1,3,7,10)] + 0.5
A$xy <- paste(A$x, A$y, sep='ZZZ')
B$xy <- paste(B$x, B$y, sep='ZZZ')
indx <- which(A$xy %in% B$xy)
indx
For example, for a single observation, comparing x and y directly is almost 3 times faster than matching on the pasted key:
ind <- sample(1:1000, 1)
xx <- B$x[ind]
yy <- B$y[ind]
ind <- which(with(B, x==xx & y==yy))
# [1] 0.0160000324249268 seconds
xy <- paste(xx,'ZZZ',yy, sep='')
ind <- which(B$xy == xy)
# [1] 0.0469999313354492 seconds

How about using merge() to do the matching for you?
A$id <- seq_len(nrow(A))
sort(merge(A, B)$id)
# [1] 2 4 5 6 8 9
Edit:
Or, to avoid the two unnecessary sorts, pass sort = FALSE to merge() (the rows then come back in an unspecified order, so the ids are not guaranteed to be sorted):
merge(A, B, sort=FALSE)$id
# [1] 2 4 5 6 8 9
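A self-contained version of the merge() approach on a tiny, fixed pair of data frames (the values are made up for illustration). Because merge() joins on the shared numeric columns x and y directly, no pasted "ZZZ" key is needed; the id column exists only in A, so it is carried through the join and identifies the matching rows:

```r
B <- data.frame(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40))
A <- data.frame(x = c(2, 5, 4), y = c(20, 50, 40))

A$id <- seq_len(nrow(A))   # remember each row's position in A
hits <- merge(A, B)        # inner join on the common columns x and y
sort(hits$id)              # rows 1 and 3 of A also occur in B
```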

Reorganizing Lists of data.frames

Let's say I have a list of data frames, where each data frame has columns like this:
lists$a
company, x, y, z
lists$b
company, x, y, z
lists$c
company, x, y, z
Any thoughts on how I can change it to something like:
new.list$company
a,x,y,z
b,x,y,z
c,x,y,z
new.list$company2
a,x,y,z
b,x,y,z
c,x,y,z
I've been using:
new.list[[company]] <- ldply(lists, subset, company=company.name)
But this only does one at a time. Is there a shorter way?

Brandon,
You can use the | operator in cast()'s formula to create lists. Using the list of data frames dat_l from Wojciech's answer:
require(reshape)
dat.m <- melt(dat_l, "company")  # melt() on a list adds an L1 column holding each element's name
cast(dat.m, L1 ~ variable | company)

Here's a way using the plyr package: start with Wojciech's dat_l and put the whole thing in a single data frame using ldply:
require(plyr)
df <- ldply(dat_l)
and then turn it back into a list by splitting on the company column:
new_list <- dlply(df, .(company), subset, select = c(.id,x,y,z) )
> new_list[1:3]
$C
.id x y z
3 a 3 0.7209484 1.6247163
35 i 3 0.1630658 0.2158516
37 j 1 0.8779915 -0.9371671
$G
.id x y z
2 a 2 0.1132311 -1.8067876
10 c 2 0.1825166 1.8355509
28 g 4 0.6474877 -0.8052137
$H
.id x y z
1 a 1 0.9562020 -1.450522
25 g 1 0.1322886 0.584342

Example data
dat_l <- lapply(1:10,function(x) data.frame(x=1:4,y=rexp(4),
z=rnorm(4),company=sample(LETTERS,4)))
names(dat_l) <- letters[1:10]
Code
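Nrec <- unlist(lapply(dat_l,nrow))      # number of rows in each data frame
dat <- do.call(rbind,dat_l)             # stack all data frames into one
dat$A <- rep(names(Nrec),Nrec)          # label each row with its list element (a, b, ...)
dat_new <- split(dat[-4],dat$company)   # one data frame per company; [-4] drops the company column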
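A base-R sketch of the same stack-then-split approach on a tiny fixed example (the company names acme and zeta and the column name source are made up). Here Map() tags each row with the name of its list element, playing the role of dat$A above or of the .id column that ldply() adds:

```r
lists <- list(
  a = data.frame(company = c("acme", "zeta"), x = 1:2),
  b = data.frame(company = c("zeta", "acme"), x = 3:4)
)

# Stack the data frames, recording which list element each row came from
stacked <- do.call(rbind, Map(function(d, nm) data.frame(source = nm, d),
                              lists, names(lists)))
rownames(stacked) <- NULL

# One data frame per company, without the company column itself
new.list <- split(stacked[setdiff(names(stacked), "company")], stacked$company)
new.list$acme
```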