How to combine two vectors into a data frame - r

I have two vectors like this
x <-c(1,2,3)
y <-c(100,200,300)
x_name <- "cond"
y_name <- "rating"
I'd like to output the dataframe like this:
> print(df)
cond rating
1 x 1
2 x 2
3 x 3
4 y 100
5 y 200
6 y 300
What's the way to do it?

While this does not answer the question asked, it answers a related question that many people have had:
x <-c(1,2,3)
y <-c(100,200,300)
x_name <- "cond"
y_name <- "rating"
df <- data.frame(x,y)
names(df) <- c(x_name,y_name)
print(df)
cond rating
1 1 100
2 2 200
3 3 300

x <-c(1,2,3)
y <-c(100,200,300)
x_name <- "cond"
y_name <- "rating"
require(reshape2)
df <- melt(data.frame(x,y))
colnames(df) <- c(x_name, y_name)
print(df)
UPDATE (2017-02-07):
As an answer to @cdaringe's comment: there are multiple possible solutions; one of them is below.
library(dplyr)
library(magrittr)
x <- c(1, 2, 3)
y <- c(100, 200, 300)
z <- c(1, 2, 3, 4, 5)
x_name <- "cond"
y_name <- "rating"
# Helper function to create data.frame for the chunk of the data
prepare <- function(name, value, xname = x_name, yname = y_name) {
tibble(rep(name, length(value)), value) %>%  # tibble() replaces the deprecated data_frame()
set_colnames(c(xname, yname))
}
bind_rows(
prepare("x", x),
prepare("y", y),
prepare("z", z)
)
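For readers who prefer to stay in base R, the same idea (any number of vectors, possibly of unequal length) can be sketched as follows. This is only an illustration: the list name `vecs` is an assumption introduced here, and the column names are hard-coded to match the question.

```r
# Hypothetical named list holding the vectors to stack; the list names become labels
vecs <- list(
  x = c(1, 2, 3),
  y = c(100, 200, 300),
  z = c(1, 2, 3, 4, 5)
)

df <- data.frame(
  # repeat each name once per element of its vector
  cond = rep(names(vecs), times = lengths(vecs)),
  # flatten the values in the same order, dropping the auto-generated names
  rating = unlist(vecs, use.names = FALSE),
  stringsAsFactors = FALSE
)

print(df)
```

Keeping the vectors in a named list means adding a fourth vector needs no change to the code that builds the data frame.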

This should do the trick, to produce the data frame you asked for, using only base R:
df <- data.frame(cond=c(rep("x", times=length(x)),
rep("y", times=length(y))),
rating=c(x, y))
df
cond rating
1 x 1
2 x 2
3 x 3
4 y 100
5 y 200
6 y 300
However, from your initial description, I'd say that this is perhaps a more likely use case:
df2 <- data.frame(x, y)
colnames(df2) <- c(x_name, y_name)
df2
cond rating
1 1 100
2 2 200
3 3 300
[edit: moved parentheses in example 1]

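You can use the expand.grid() function. Note that it returns every combination of the two vectors (9 rows here), not the stacked layout shown in the question.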
x <-c(1,2,3)
y <-c(100,200,300)
expand.grid(cond=x,rating=y)

Here's a simple function. It generates a data frame and automatically uses the names of the vectors as values for the first column.
myfunc <- function(a, b, names = NULL) {
setNames(data.frame(c(rep(deparse(substitute(a)), length(a)),
rep(deparse(substitute(b)), length(b))), c(a, b)), names)
}
An example:
x <-c(1,2,3)
y <-c(100,200,300)
x_name <- "cond"
y_name <- "rating"
myfunc(x, y, c(x_name, y_name))
cond rating
1 x 1
2 x 2
3 x 3
4 y 100
5 y 200
6 y 300

df = data.frame(cond=c(rep("x",3),rep("y",3)),rating=c(x,y))

Alt simplification of https://stackoverflow.com/users/1969435/gx1sptdtda above:
cond <-c(1,2,3)
rating <-c(100,200,300)
df <- data.frame(cond, rating)
df
cond rating
1 1 100
2 2 200
3 3 300

Related

Randomise across columns for half a dataset

I have a data set for MMA bouts.
The structure currently is
Fighter 1, Fighter 2, Winner
x y x
x y x
x y x
x y x
x y x
My problem is that Fighter 1 is always the Winner, so my model will learn that fighter 1 always wins.
I need to randomly swap Fighter 1 and Fighter 2 for half of the data set so that the winner is represented equally in both columns.
Ideally I would have this:
Fighter 1, Fighter 2, Winner
x y x
y x x
x y y
y x x
x y y
Is there a way to randomise across columns without messing up the order of the rows?
I'm assuming your xs and ys are arbitrary and just placeholders. I'll further assume that you need the Winner column to stay the same, you just need that the winner not always be in the first column.
Sample data:
set.seed(42)
x <- data.frame(
F1 = sample(letters, size = 5),
F2 = sample(LETTERS, size = 5),
stringsAsFactors = FALSE
)
x$W <- x$F1
x
# F1 F2 W
# 1 x N x
# 2 z S z
# 3 g D g
# 4 t P t
# 5 o W o
Choose some rows to change, randomly:
(ind <- sample(nrow(x), size = ceiling(nrow(x)/2)))
# [1] 3 5 4
This means that we expect rows 3-5 to change.
Now the random changes:
within(x, { tmp <- F1[ind]; F1[ind] = F2[ind]; F2[ind] = tmp; rm(tmp); })
# F1 F2 W
# 1 x N x
# 2 z S z
# 3 D g g
# 4 P t t
# 5 W o o
Rows 1-2 still show the F1 as the Winner, and rows 3-5 show F2 as the Winner.
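The same swap can also be sketched without within(), by assigning the two columns in reversed order for the sampled rows. The data frame `bouts` and the index vector `swap` below are made-up names for illustration; the only assumption is that the winner starts out in the first column, as in the question.

```r
set.seed(42)

# Made-up bouts: the winner is always listed in F1 to begin with
bouts <- data.frame(
  F1 = c("a", "b", "c", "d", "e", "f"),
  F2 = c("U", "V", "W", "X", "Y", "Z"),
  stringsAsFactors = FALSE
)
bouts$W <- bouts$F1

# Pick half of the rows at random
swap <- sample(nrow(bouts), size = nrow(bouts) %/% 2)

# Data frame assignment is positional, so this exchanges F1 and F2 in those rows
bouts[swap, c("F1", "F2")] <- bouts[swap, c("F2", "F1")]

print(bouts)
```

The row order and the W column are untouched; only the two fighter columns are exchanged in the sampled rows.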
I also found that this code worked
matches_clean[, c("fighter1", "fighter2")] <- lapply(matches_clean[, c("fighter1", "fighter2")], as.character)
changeInd <- !!((match(matches_clean$fighter1, levels(as.factor(matches_clean$fighter1))) -
match(matches_clean$fighter2, levels(as.factor(matches_clean$fighter2)))) %% 2)
matches_clean[changeInd, c("fighter1", "fighter2")] <- matches_clean[changeInd, c("fighter2", "fighter1")]

Combine lists in forloop

I have a list of chromosomes chromosomes <- c(1:2, "X", "Y") that I am iterating over to generate random data n times for each chromosome.
I am doing this first by iterating over the chromosomes and generating the data using generateData() and then adding these to a list which I then combine into a data frame outside of the loop using bp_data <- as.data.frame(do.call(rbind, simByChrom)):
chromosomes <- c(1:2, "X", "Y")
simByChrom <- list()
for (c in chromosomes){
n <- sample(1:5,1)
cat(paste("Simulating", n, "breakpoints on chromosome", c), "\n")
bp_data <- generateData(c, n)
simByChrom[[c]] <- bp_data
}
bp_data <- as.data.frame(do.call(rbind, simByChrom))
rownames(bp_data) <- NULL
# generate dummy data
generateData <- function(c, n){
df <- data.frame(chrom = rep(c, n),
pos= sample(1:10000, n))
return(df)
}
chrom pos
1 1 7545
2 2 5798
3 2 3863
4 3 4036
5 3 9347
6 3 4749
I would like to iterate over this multiple times and record the iteration number in bp_data$iteration, to produce a data frame that looks like this:
chrom pos iteration
1 7215 1
1 4606 1
2 8282 1
2 3501 1
2 4350 1
2 6044 1
X 2467 1
Y 2816 1
Y 8848 1
Y 2304 1
Y 4235 1
1 3760 2
1 8205 2
1 4735 2
2 3061 2
X 56 2
X 1722 2
X 2430 2
X 6749 2
X 2081 2
Y 9646 2
However, I'm unsure how to do this. I've tried:
iterations <- 2
for (i in (1:iterations)){
cat("Running iteration", i, "\n")
simByChrom <- list()
for (c in chromosomes){
n <- sample(1:5,1)
cat(paste("Simulating", n, "breakpoints on chromosome", c), "\n")
bp_data <- generateData(c, n)
bp_data$iteration <- i
simByChrom[[c]] <- bp_data
# or
# simByChrom[[c]][[i]] <- bp_data
# or
# simByChrom[[c]] <- bp_data
# simByChrom[[c]]$iteration <- i
}
bp_data <- as.data.frame(do.call(rbind, simByChrom))
rownames(bp_data) <- NULL
}
But this results in only the last iteration being recorded.
Can anyone suggest how I can achieve my desired result?
You only see the last iteration in your result because bp_data is overwritten on each pass through the for loop. You need to save each iteration's result separately and then combine them at the end.
I believe just a few minor adjustments to what you already have will do the trick:
iterations <- 2
#create empty list to store each iteration result
bp_data <- list()
#run each iteration
for (i in 1:iterations){
cat("Running iteration", i, "\n")
simByChrom <- list()
for (c in chromosomes){
n <- sample(1:5,1)
cat(paste("Simulating", n, "breakpoints on chromosome", c), "\n")
aa <- generateData(c, n)
aa$iteration <- i
simByChrom[[c]] <- aa
}
result <- as.data.frame(do.call(rbind, simByChrom))
rownames(result) <- NULL
bp_data[[i]] <- result
}
#combine each iteration into one data frame
final <- as.data.frame(do.call(rbind, bp_data))
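As a variation on the loop above, the per-iteration results can be collected with lapply instead of pre-allocated lists. This is a sketch under the question's setup: generateData is the dummy generator from the question (with its argument renamed to `chrom`), and `simulateIteration` is a hypothetical helper name introduced here.

```r
# Dummy generator from the question
generateData <- function(chrom, n) {
  data.frame(chrom = rep(chrom, n),
             pos = sample(1:10000, n),
             stringsAsFactors = FALSE)
}

chromosomes <- c(1:2, "X", "Y")
iterations <- 2

# Hypothetical helper: simulate every chromosome once and tag rows with the iteration
simulateIteration <- function(i) {
  per_chrom <- lapply(chromosomes, function(chrom) {
    sim <- generateData(chrom, sample(1:5, 1))
    sim$iteration <- i
    sim
  })
  do.call(rbind, per_chrom)
}

# One data frame per iteration, bound together at the end
final <- do.call(rbind, lapply(seq_len(iterations), simulateIteration))
rownames(final) <- NULL
print(final)
```

Because each call returns its own data frame, nothing is overwritten between iterations.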

Matching 2 date columns with unequal length

I have the following sample of data:
X <- c("11/12/2016", "12/12/2016", "13/12/2016","14/12/2016","15/12/2016","16/12/2016", "17/12/2016")
Y <- c("11/12/2016", "13/12/2016", "14/12/2016", "18/12/2016")
The output I want is something like this:
X Y
11/12/2016 11/12/2016
12/12/2016 NA
13/12/2016 13/12/2016
14/12/2016 14/12/2016
15/12/2016 NA
16/12/2016 NA
17/12/2016 NA
I have tried the following code, but I am not getting the desired output:
> X <- as.Date(data$X)
> Y <- as.Date(data$Y)
> Z <- NA
> for (i in 1:length(X)) {
+ if(X[i] == Y){
+ Z <- Y}
+ else NA }
Try this:
Your data:
> X <- c("11/12/2016", "12/12/2016", "13/12/2016","14/12/2016","15/12/2016","16/12/2016", "17/12/2016")
> Y <- c("11/12/2016", "13/12/2016", "14/12/2016", "18/12/2016")
Creating new vector of NA's and doing the match:
> Z<-rep(NA,length(X))
> Z[which(X %in% Y)]<-X[which(X %in% Y)]
> Z
[1] "11/12/2016" NA "13/12/2016" "14/12/2016" NA NA NA
Your data frame:
> data.frame(X,Y=Z)
X Y
1 11/12/2016 11/12/2016
2 12/12/2016 <NA>
3 13/12/2016 13/12/2016
4 14/12/2016 14/12/2016
5 15/12/2016 <NA>
6 16/12/2016 <NA>
7 17/12/2016 <NA>
You could use merge:
X <- c("11/12/2016", "12/12/2016", "13/12/2016","14/12/2016","15/12/2016","16/12/2016", "17/12/2016")
Y <- c("11/12/2016", "13/12/2016", "14/12/2016", "18/12/2016")
df_X <- data.frame(X)
df_Y <- data.frame(X = Y, Y = Y)
merge(df_X, df_Y, all.x = TRUE)  # all.x keeps only the rows of X; all = TRUE would also add 18/12/2016
Or, if you like a tidyverse-approach:
library(tidyverse)
X <- c("11/12/2016", "12/12/2016", "13/12/2016","14/12/2016","15/12/2016","16/12/2016", "17/12/2016")
Y <- c("11/12/2016", "13/12/2016", "14/12/2016", "18/12/2016")
df_X <- tibble(X)
df_Y <- tibble(X = Y, Y = Y)
left_join(df_X, df_Y)  # a left join keeps only the rows of X; full_join would also add 18/12/2016
The important part is, that you duplicate your column you want to match and name it accordingly, or use the by-argument.
Got the answer!
What you want is to match the values of the longer vector against the other vector. The function match is well suited to this because it returns the positions of the matched elements. First, input the data (I added some corrections):
# Input data
X <- c("11/12/2016", "12/12/2016", "13/12/2016","14/12/2016","15/12/2016","16/12/2016", "17/12/2016")
Y <- c("11/12/2016", "13/12/2016", "14/12/2016", "18/12/2016")
# Transform into dates
X <- as.Date(X,"%d/%m/%Y")
Y <- as.Date(Y, "%d/%m/%Y")
Then, I create the data.frame Z based on the long vector X and I add the matched values of the vector Y:
# Run the function match to see what kind of output it generates
match(x = X, table = Y)
# Create data.frame
Z <- data.frame(X = X,
# add matched values
Y = Y[match(x = X, table = Y)])
Hope this helps.
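To make the role of match concrete, here is a small self-contained run on the question's data. The intermediate name `idx` is introduced here only for illustration.

```r
X <- as.Date(c("11/12/2016", "12/12/2016", "13/12/2016", "14/12/2016",
               "15/12/2016", "16/12/2016", "17/12/2016"), "%d/%m/%Y")
Y <- as.Date(c("11/12/2016", "13/12/2016", "14/12/2016", "18/12/2016"), "%d/%m/%Y")

# For each element of X: its position in Y, or NA when it does not occur in Y
idx <- match(X, Y)
idx
# [1]  1 NA  2  3 NA NA NA

# Indexing Y with NA yields NA, which produces the gaps in the second column
Z <- data.frame(X = X, Y = Y[idx])
Z
```

The value 18/12/2016 never appears in the result because it has no counterpart in X.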

R - Looping through datasets and change column names

I'm trying to loop through a bunch of datasets and change columns in R.
I have a bunch of datasets, say a,b,c,etc, and all of them have three columns, say X, Y, Z.
I would like to change their names to be a_X, a_Y, a_Z for dataset a, and b_X, b_Y, b_Z for dataset b, and so on.
Here's my code:
name.list = c("a","b","c")
for(i in name.list){
names(i) = c(paste(i,"_X",sep = ""),paste(i,"_Y",sep = ""),paste(i,"_Z",sep = ""));
}
However, the code above doesn't work since i is in text format.
I've considered the assign function, but it doesn't seem to fit either.
I would appreciate any ideas.
Something like this:
list2env(Map(function(dat, nn){
colnames(dat) <- paste(nn,colnames(dat),sep='_')
dat
}, mget(name.list), name.list),.GlobalEnv)
for ( i in name.list) {
assign(i, setNames( get(i), paste(i, names(get(i)), sep="_")))
}
> a
a_X a_Y a_Z
1 1 3 A
2 2 4 B
> b
b_X b_Y b_Z
1 1 3 A
2 2 4 B
> c
c_X c_Y c_Z
1 1 3 A
2 2 4 B
Here's some free data:
a <- data.frame(X = 1, Y = 2, Z = 3)
b <- data.frame(X = 4, Y = 5, Z = 6)
c <- data.frame(X = 7, Y = 8, Z = 9)
And here's a method that uses mget and a custom function foo
name.list <- c("a", "b", "c")
foo <- function(x, i) setNames(x, paste(name.list[i], names(x), sep = "_"))
list2env(Map(foo, mget(name.list), seq_along(name.list)), .GlobalEnv)
a
# a_X a_Y a_Z
# 1 1 2 3
b
# b_X b_Y b_Z
# 1 4 5 6
c
# c_X c_Y c_Z
# 1 7 8 9
You could also avoid get or mget by putting a, b, and c into their own environment (or even a list). You also wouldn't need the name.list vector if you go this route, because it's the same as ls(e)
e <- new.env()
e$a <- a; e$b <- b; e$c <- c
bar <- function(x, y) setNames(x, paste(y, names(x), sep = "_"))
# sorted = TRUE keeps the list in the same order as ls(e)
list2env(Map(bar, as.list(e, sorted = TRUE), ls(e)), .GlobalEnv)
Another perk of doing it this way is that you still have the untouched data frames in the environment e. Nothing was overwritten (check a versus e$a).
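If the data frames can live in a named list rather than as separate variables, the renaming needs neither get, mget, nor list2env. A minimal sketch, assuming the three one-row data frames from above; `datasets` and `renamed` are names introduced here for illustration.

```r
# Hypothetical named list standing in for the separate objects a, b and c
datasets <- list(
  a = data.frame(X = 1, Y = 2, Z = 3),
  b = data.frame(X = 4, Y = 5, Z = 6),
  c = data.frame(X = 7, Y = 8, Z = 9)
)

# Prefix every column name with the name of the list element it belongs to
renamed <- Map(
  function(dat, nm) setNames(dat, paste(nm, names(dat), sep = "_")),
  datasets,
  names(datasets)
)

names(renamed$a)
names(renamed$b)
```

The original `datasets` list is left unchanged, so both the old and the new column names remain available.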

number elements in a vector with constraints

Given x and y I wish to create the desired.result below:
x <- 1:10
y <- c(2:4,6:7,8:9)
desired.result <- c(1,2,2,2,3,4,4,5,5,6)
where, in effect, each sequence in y is replaced in x by the first element of that sequence, and then the elements of the new x are numbered.
The intermediate step for x would be:
x.intermediate <- c(1,2,2,2,5,6,6,8,8,10)
Below is code that does this. However, the code is not general and is overly complex:
x <- 1:10
y <- list(c(2:4),(6:7),(8:9))
unique.x <- 1:(length(x[-unlist(y)]) + length(y))
y1 <- rep(min(unlist(y[1])), length(unlist(y[1])))
y2 <- rep(min(unlist(y[2])), length(unlist(y[2])))
y3 <- rep(min(unlist(y[3])), length(unlist(y[3])))
new.x <- x
new.x[unlist(y[1])] <- y1
new.x[unlist(y[2])] <- y2
new.x[unlist(y[3])] <- y3
rep(unique.x, rle(new.x)$lengths)
[1] 1 2 2 2 3 4 4 5 5 6
Below is my attempt to generalize the code. However, I am stuck on the second lapply.
x <- 1:10
y <- list(c(2:4),(6:7),(8:9))
unique.x <- 1:(length(x[-unlist(y)]) + length(y))
y2 <- lapply(y, function(i) rep(min(i), length(i)))
new.x <- x
lapply(y2, function(i) new.x[i[1]:(i[1]-1+length(i))] = i)
rep(unique.x, rle(new.x)$lengths)
Thank you for any advice. I suspect there is a much simpler solution I am overlooking. I prefer a solution in base R.
A solution like this should work:
x <- 1:10
y <- list(c(2:4),(6:7),(8:9))
x[unlist(y)] <- rep(sapply(y, '[', 1), lengths(y))
rep(1:length(rle(x)$lengths), rle(x)$lengths)
## [1] 1 2 2 2 3 4 4 5 5 6
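An alternative way to think about the numbering, sketched here under the assumption that every group in y is a run of consecutive positions of x: the counter increases at every element except the non-first members of a group, so a cumulative sum over that condition gives the result directly. The names `starts`, `followers` and `is_new` are introduced here for illustration.

```r
x <- 1:10
y <- list(2:4, 6:7, 8:9)

# First element of each group
starts <- sapply(y, function(g) g[1])

# Elements that continue a group and therefore must not start a new number
followers <- setdiff(unlist(y), starts)

# TRUE wherever a new number begins
is_new <- !(x %in% followers)

result <- cumsum(is_new)
result
# [1] 1 2 2 2 3 4 4 5 5 6
```

This skips the intermediate rewrite of x and the call to rle.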
