Randomise across columns for half a dataset - r

I have a data set for MMA bouts.
The structure currently is
Fighter 1, Fighter 2, Winner
x y x
x y x
x y x
x y x
x y x
My problem is that Fighter 1 is always the Winner, so my model will learn that Fighter 1 always wins.
I need to randomly swap Fighter 1 and Fighter 2 for half the data set so that the winner appears equally often in each column.
Ideally I would have this:
Fighter 1, Fighter 2, Winner
x y x
y x x
x y y
y x x
x y y
Is there a way to randomise across columns without messing up the order of the rows?

I'm assuming your xs and ys are arbitrary and just placeholders. I'll further assume that you need the Winner column to stay the same, you just need that the winner not always be in the first column.
Sample data:
set.seed(42)
x <- data.frame(
F1 = sample(letters, size = 5),
F2 = sample(LETTERS, size = 5),
stringsAsFactors = FALSE
)
x$W <- x$F1
x
# F1 F2 W
# 1 x N x
# 2 z S z
# 3 g D g
# 4 t P t
# 5 o W o
Choose some rows to change, randomly:
(ind <- sample(nrow(x), size = ceiling(nrow(x)/2)))
# [1] 3 5 4
This means that we expect rows 3-5 to change.
Now the random changes:
x <- within(x, {
  tmp <- F1[ind]      # keep the original F1 values for the chosen rows
  F1[ind] <- F2[ind]
  F2[ind] <- tmp
  rm(tmp)             # do not leave tmp behind as a column
})
x
# F1 F2 W
# 1 x N x
# 2 z S z
# 3 D g g
# 4 P t t
# 5 W o o
Rows 1-2 still show the F1 as the Winner, and rows 3-5 show F2 as the Winner.
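A more compact variant of the same idea, offered as a sketch rather than as the answerer's method: build a logical mask that is TRUE for exactly half the rows (rounded down), then exchange the two columns in a single assignment. The `bouts` data frame and its values are made up for illustration.

```r
set.seed(1)  # only so the illustration is repeatable

# Hypothetical toy data: Fighter 1 is always the winner
bouts <- data.frame(
  F1 = c("a", "b", "c", "d"),
  F2 = c("w", "x", "y", "z"),
  stringsAsFactors = FALSE
)
bouts$W <- bouts$F1

# Random permutation of the row numbers; rows that draw a value in the
# lower half are the ones that get swapped
swap <- sample(nrow(bouts)) <= nrow(bouts) / 2

# Data frame assignment is positional, so this exchanges F1 and F2
# in the chosen rows and leaves W and the row order untouched
bouts[swap, c("F1", "F2")] <- bouts[swap, c("F2", "F1")]
bouts
```

Because the mask comes from a permutation rather than independent coin flips, the split is always exactly half, which is what the question asks for.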

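I also found that this code worked. Note that it selects the rows to swap from the parity of the difference between the two fighters' positions in the sorted name lists, so the selection is deterministic rather than random: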
matches_clean[, c("fighter1", "fighter2")] <- lapply(matches_clean[, c("fighter1", "fighter2")], as.character)
changeInd <- !!((match(matches_clean$fighter1, levels(as.factor(matches_clean$fighter1))) -
match(matches_clean$fighter2, levels(as.factor(matches_clean$fighter2)))) %% 2)
matches_clean[changeInd, c("fighter1", "fighter2")] <- matches_clean[changeInd, c("fighter2", "fighter1")]


placing value between specific numbers in cycle

Let's say I have
x = 1,4,2
i = 2
j = 4
k = 3
Here i = 2 and j = 4. I need to place k (3) between the numbers i and j in x, so the result would be x = 1,4,3,2. It has to work in a loop, because i, j and k change on every iteration, and x grows each time a new k is inserted. After step one the new x is
x = 1,4,3,2, and let's say the new values are:
i = 4
j = 3
k = 5
so on the next iteration it should place 5 between 4 and 3, giving the final x = 1,4,5,3,2.
Is there a way I could do this?
When i is always the number immediately before j, you could use the append function, e.g.:
x = c(1,4,2)
i = 4
k = 3
x <- append(x, k, match(i, x))
x
[1] 1 4 3 2
i = 4
k = 5
x <- append(x, k, match(i, x))
x
[1] 1 4 5 3 2
Putting this in a function:
insert <- function(x, k, i){
append(x, k, match(i, x))
}
Note that you did not specify what should happen if the vector contains more than one 4, e.g. x <- c(1,4,2,4,2). Where exactly do you want to place the 3: after the first 4 or the second? As written, match() returns only the first match, so k is inserted after the first occurrence of i.
You can try this function:
insert_after <- function(x, i, k) {
ind <- match(i, x)
new_inds <- sort(c(seq_along(x), ind))
new_x <- x[new_inds]
new_x[duplicated(new_inds)] <- k
new_x
}
x = c(1,4,2)
x <- insert_after(x, 4, 3)
x
#[1] 1 4 3 2
x <- insert_after(x, 4, 5)
x
#[1] 1 4 5 3 2
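Both answers above locate the insertion point from i alone. If the position should depend on the pair, one possible sketch is to look for the place where i and j sit next to each other, in either order as in the question's two steps. The name `insert_between` is made up for illustration, and using the first adjacent pair when several exist is an assumption, since the question does not say which one to pick.

```r
# Hypothetical helper: insert k between the first adjacent pair (i, j) or (j, i)
insert_between <- function(x, i, j, k) {
  n <- length(x)
  if (n < 2) stop("x needs at least two elements")
  left  <- x[-n]   # every element except the last
  right <- x[-1]   # every element except the first
  pos <- which((left == i & right == j) | (left == j & right == i))
  if (length(pos) == 0) stop("i and j are not adjacent in x")
  append(x, k, after = pos[1])
}

x <- c(1, 4, 2)
x <- insert_between(x, 2, 4, 3)   # step one from the question
x <- insert_between(x, 4, 3, 5)   # step two from the question
x
```

Inside a loop, the same call can be repeated with each new i, j and k, reassigning x every time so that the growing vector is carried forward.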

R - build a matrix from other matrices with linking information [duplicate]

I need to build a matrix from data stored in several other matrices that all have a pointer (key) in their first column. This is how the original matrices might look, with a-e being the pointers that connect the data across the matrices and v-z being the data to be linked. The arrow points to what I want my final matrix to look like.
a x x
b y y
c z z
d w w
e v v
e v v
d w w
c z z
b y y
a x x
----->
x x x x
y y y y
z z z z
w w w w
v v v v
I can't seem to write the right algorithm to do this; I am getting either "subscript out of bounds" or "replacement has length zero" errors. Here is what I have now, but it is not working.
for(i in 1:length(matlist)){
  tempmatrix = matlist[[i]] # list of matrices to be combined
  genMatrix[1,i] = tempmatrix[1,2]
  for(j in 2:length(tempmatrix[,1])){
    # the row index for the data that needs to be matched with an ECID
    index = which(indexv == tempmatrix[j,1])
    for(k in 1:length(tempmatrix[1,])){
      genMatrix[index,k+i] = tempmatrix[j,k]
    }
    # places the data in same row as the ecid
  }
}
print(genMatrix)
EDIT: I just want to clarify that my example only shows two matrices but in the list matlist there can be any number of matrices. I need to find a way of merging them without having to know how many matrices are in matlist at the time.
We can merge all the matrices in the list using Reduce and merge from base R (merge converts each matrix to a data frame).
as.matrix(read.table(text="a x x
b y y
c z z
d w w
e v v")) -> mat1
as.matrix(read.table(text="e v v
d w w
c z z
b y y
a x x")) -> mat2
as.matrix(read.table(text="e x z
d z w
c w v
b y x
a v y")) -> mat3
matlist <- list(mat1=mat1, mat2=mat2, mat3=mat3)
Reduce(function(m1, m2) merge(m1, m2, by = "V1", all.x = TRUE),
matlist)[,-1]
#> V2.x V3.x V2.y V3.y V2 V3
#> 1 x x x x v y
#> 2 y y y y y x
#> 3 z z z z w v
#> 4 w w w w z w
#> 5 v v v v x z
Created on 2019-06-05 by the reprex package (v0.3.0)
Or we can append all the matrices together and then use tidyr to go from long to wide and get the desired output.
library(tidyr)
library(dplyr)
bind_rows(lapply(matlist, as.data.frame), .id = "mat") %>%
gather(matkey, val, c("V2","V3")) %>%
unite(matkeyt, mat, matkey, sep = ".") %>%
spread(matkeyt, val) %>%
select(-V1)
#> mat1.V2 mat1.V3 mat2.V2 mat2.V3 mat3.V2 mat3.V3
#> 1 x x x x v y
#> 2 y y y y y x
#> 3 z z z z w v
#> 4 w w w w z w
#> 5 v v v v x z
Created on 2019-06-06 by the reprex package (v0.3.0)
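If the result should stay a plain matrix, another option is to reorder every matrix by its key column with match() and bind the data columns side by side. This is only a sketch under assumptions: the toy matrices below mirror the question, and taking the key order from the first matrix is a choice made here, not something the question specifies.

```r
# Toy matrices mirroring the question: key in column 1, data in the others
mat1 <- cbind(c("a", "b", "c", "d", "e"),
              c("x", "y", "z", "w", "v"),
              c("x", "y", "z", "w", "v"))
mat2 <- mat1[5:1, ]            # same rows in reverse order
matlist <- list(mat1, mat2)    # could hold any number of matrices

# Assumption: the row order of the first matrix defines the result
keys <- mat1[, 1]

# For each matrix, reorder the rows to follow `keys` and drop the key column
aligned <- lapply(matlist, function(m) {
  m[match(keys, m[, 1]), -1, drop = FALSE]
})

result <- do.call(cbind, aligned)
result
```

A key that is missing from one of the matrices produces a row of NA in that matrix's columns, because match() returns NA for it.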

R Replace values in multiple columns based on specified condition?

How can I replace the 2nd to 7th values in the first row, changing "N" to "Y"? The first value stays "N".
SOC_023 SOC_040 SOC_044 SOC_055 SOC_079 SOC_089 SOC_090
1 N N N N N N N
2 N N N N N N Y
3 N N N N N Y N
My desired outcome is:
1 N Y Y Y Y Y Y
Many thanks,
A.
a <- read.table("a.txt", sep = '\t', header=TRUE, stringsAsFactors=FALSE)
a
SOC_023 SOC_040 SOC_044 SOC_055 SOC_079 SOC_089 SOC_090
1 N N N N N N N
2 N N N N N N Y
3 N N N N N Y N
a[1,2:7] <- "Y"
a
SOC_023 SOC_040 SOC_044 SOC_055 SOC_079 SOC_089 SOC_090
1 N Y Y Y Y Y Y
2 N N N N N N Y
3 N N N N N Y N
OK, it's a bit tricky but possible. We want to change N to Y only in rows where columns 2:7 contain only N, so I added a new logical column. If a row has only N in columns 2:7, the value is FALSE because the row contains no Y. I use
b$new <- apply(b[,2:7], 1, function(x) any(x %in% c("Y")))
SOC_023 SOC_040 SOC_044 SOC_055 SOC_079 SOC_089 SOC_090 new
1 N N N N N N N FALSE
2 N N N N N N Y TRUE
3 N N N N N Y N TRUE
Then, wherever the column new is FALSE, we can put Y in columns 2:7:
b[,2:7][b$new==FALSE ,] <- "Y"
So we have the desired result.
SOC_023 SOC_040 SOC_044 SOC_055 SOC_079 SOC_089 SOC_090 new
1 N Y Y Y Y Y Y FALSE
2 N N N N N N Y TRUE
3 N N N N N Y N TRUE
To summarise, every row that has only N in columns 2:7 gets those values replaced with Y.
Of course we don't need the column new any more, so we can remove it with
b$new <- NULL
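The same condition can also be applied without a helper column. The sketch below rebuilds the three example rows by hand (so `b` here is an illustrative stand-in for the data frame used above) and uses rowSums() to find the rows with no Y in columns 2:7.

```r
# Rebuild the example data by hand (illustrative stand-in for the real file)
b <- as.data.frame(matrix("N", nrow = 3, ncol = 7), stringsAsFactors = FALSE)
names(b) <- c("SOC_023", "SOC_040", "SOC_044", "SOC_055",
              "SOC_079", "SOC_089", "SOC_090")
b[2, "SOC_090"] <- "Y"
b[3, "SOC_089"] <- "Y"

# TRUE for rows that contain no "Y" anywhere in columns 2:7
all_n <- rowSums(b[, 2:7] == "Y") == 0

# Overwrite columns 2:7 in those rows only; column 1 is left alone
b[all_n, 2:7] <- "Y"
b
```

This keeps the logic in one logical vector, so there is no extra column to delete afterwards.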
OK, so to count occurrences in each column and draw a barplot:
x <- apply(a, 2, table)
y <- do.call(rbind, x)
A simple barplot using base R:
z <- as.data.frame(t(y))
barplot(data.matrix(z[1:2,]), col=c("darkblue","red"),beside=TRUE)
The x-axis labels will have more room when you plot it yourself.
There's another way to get this plot using the ggplot2 package, but I would have to reshape the data, which is a bit time consuming. Cheers!
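One thing to watch with per-column table() calls is a column that contains only one of the two values, because its table then has a different length from the others. A possible way around that, sketched here with the three example rows rebuilt by hand, is to fix the factor levels before counting so every column reports both N and Y.

```r
# Rebuild the example data by hand (illustrative stand-in for the real file)
a <- as.data.frame(matrix("N", nrow = 3, ncol = 7), stringsAsFactors = FALSE)
names(a) <- c("SOC_023", "SOC_040", "SOC_044", "SOC_055",
              "SOC_079", "SOC_089", "SOC_090")
a[2, "SOC_090"] <- "Y"
a[3, "SOC_089"] <- "Y"

# Fixing the levels makes every column report a count for both N and Y,
# including a zero when a value never occurs
counts <- sapply(a, function(col) table(factor(col, levels = c("N", "Y"))))
counts
```

The resulting matrix has one row per value and one column per variable, so it can be passed straight to barplot(counts, beside = TRUE).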
>dat
SOC_023 SOC_040 SOC_044 SOC_055 SOC_079 SOC_089 SOC_090
1 N Y Y Y Y Y Y
2 N N N N N N Y
3 N N N N N N N
4 N N N N N Y N
5 N Y N Y N N N
6 Y Y Y Y Y Y Y
dat$new <- apply(dat[,1:7], 1, function(x) all(x == "Y") || all(x == "N"))
result <- dat[dat$new!=TRUE, ]
result$new <- NULL
> result
SOC_023 SOC_040 SOC_044 SOC_055 SOC_079 SOC_089 SOC_090
1 N Y Y Y Y Y Y
2 N N N N N N Y
4 N N N N N Y N
5 N Y N Y N N N

number elements in a vector with constraints

Given x and y I wish to create the desired.result below:
x <- 1:10
y <- c(2:4,6:7,8:9)
desired.result <- c(1,2,2,2,3,4,4,5,5,6)
where, in effect, each sequence in y is replaced in x by the first element of that sequence, and then the elements of the new x are numbered.
The intermediate step for x would be:
x.intermediate <- c(1,2,2,2,5,6,6,8,8,10)
Below is code that does this. However, the code is not general and is overly complex:
x <- 1:10
y <- list(c(2:4),(6:7),(8:9))
unique.x <- 1:(length(x[-unlist(y)]) + length(y))
y1 <- rep(min(unlist(y[1])), length(unlist(y[1])))
y2 <- rep(min(unlist(y[2])), length(unlist(y[2])))
y3 <- rep(min(unlist(y[3])), length(unlist(y[3])))
new.x <- x
new.x[unlist(y[1])] <- y1
new.x[unlist(y[2])] <- y2
new.x[unlist(y[3])] <- y3
rep(unique.x, rle(new.x)$lengths)
[1] 1 2 2 2 3 4 4 5 5 6
Below is my attempt to generalize the code. However, I am stuck on the second lapply.
x <- 1:10
y <- list(c(2:4),(6:7),(8:9))
unique.x <- 1:(length(x[-unlist(y)]) + length(y))
y2 <- lapply(y, function(i) rep(min(i), length(i)))
new.x <- x
lapply(y2, function(i) new.x[i[1]:(i[1]-1+length(i))] = i)
rep(unique.x, rle(new.x)$lengths)
Thank you for any advice. I suspect there is a much simpler solution I am overlooking. I prefer a solution in base R.
A solution like this should work:
x <- 1:10
y <- list(c(2:4),(6:7),(8:9))
x[unlist(y)] <- rep(sapply(y, `[`, 1), lengths(y))  # first element of each run, repeated over the run
r <- rle(x)
rep(seq_along(r$lengths), r$lengths)
## [1] 1 2 2 2 3 4 4 5 5 6
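An alternative sketch that skips the intermediate vector: mark every position that starts a new group, then take a running count. It assumes, as in the question, that x is 1:n so that the values in y are also positions in x.

```r
x <- 1:10
y <- list(2:4, 6:7, 8:9)

# All elements of each run except its first one continue an existing group
continues <- unlist(lapply(y, `[`, -1))

# Every other position starts a new group
starts <- !(x %in% continues)

# The running count of group starts is the group number
cumsum(starts)
```

Since the numbering depends only on where groups start, two runs that touch each other (such as 6:7 and 8:9) still get separate numbers.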

How to combine two vectors into a data frame

I have two vectors like this
x <-c(1,2,3)
y <-c(100,200,300)
x_name <- "cond"
y_name <- "rating"
I'd like to output the dataframe like this:
> print(df)
cond rating
1 x 1
2 x 2
3 x 3
4 y 100
5 y 200
6 y 300
What's the way to do it?
While this does not answer the question asked, it answers a related question that many people have had:
x <-c(1,2,3)
y <-c(100,200,300)
x_name <- "cond"
y_name <- "rating"
df <- data.frame(x,y)
names(df) <- c(x_name,y_name)
print(df)
cond rating
1 1 100
2 2 200
3 3 300
x <-c(1,2,3)
y <-c(100,200,300)
x_name <- "cond"
y_name <- "rating"
require(reshape2)
df <- melt(data.frame(x,y))
colnames(df) <- c(x_name, y_name)
print(df)
UPDATE (2017-02-07):
As an answer to @cdaringe's comment: there are multiple possible solutions; one of them is below.
library(dplyr)
library(magrittr)
x <- c(1, 2, 3)
y <- c(100, 200, 300)
z <- c(1, 2, 3, 4, 5)
x_name <- "cond"
y_name <- "rating"
# Helper function to create a data frame for one chunk of the data
# (tibble() replaces the deprecated data_frame())
prepare <- function(name, value, xname = x_name, yname = y_name) {
  tibble(rep(name, length(value)), value) %>%
    set_colnames(c(xname, yname))
}
bind_rows(
prepare("x", x),
prepare("y", y),
prepare("z", z)
)
This should do the trick and produce the data frame you asked for, using only base R:
df <- data.frame(cond=c(rep("x", times=length(x)),
rep("y", times=length(y))),
rating=c(x, y))
df
cond rating
1 x 1
2 x 2
3 x 3
4 y 100
5 y 200
6 y 300
However, from your initial description, I'd say that this is perhaps a more likely use case:
df2 <- data.frame(x, y)
colnames(df2) <- c(x_name, y_name)
df2
cond rating
1 1 100
2 2 200
3 3 300
[edit: moved parentheses in example 1]
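You can use the expand.grid() function. Note that it returns every combination of the two vectors (9 rows here) rather than stacking them: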
x <-c(1,2,3)
y <-c(100,200,300)
expand.grid(cond=x,rating=y)
Here's a simple function. It generates a data frame and automatically uses the names of the vectors as values for the first column.
myfunc <- function(a, b, names = NULL) {
setNames(data.frame(c(rep(deparse(substitute(a)), length(a)),
rep(deparse(substitute(b)), length(b))), c(a, b)), names)
}
An example:
x <-c(1,2,3)
y <-c(100,200,300)
x_name <- "cond"
y_name <- "rating"
myfunc(x, y, c(x_name, y_name))
cond rating
1 x 1
2 x 2
3 x 3
4 y 100
5 y 200
6 y 300
df = data.frame(cond=c(rep("x",3),rep("y",3)),rating=c(x,y))
Alternative simplification of the answer by https://stackoverflow.com/users/1969435/gx1sptdtda above:
cond <-c(1,2,3)
rating <-c(100,200,300)
df <- data.frame(cond, rating)
df
cond rating
1 1 100
2 2 200
3 3 300
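For the long layout the question asks for, base R's stack() is another possibility. This is a sketch rather than one of the answers above: stack() takes a named list, returns the values in a column called `values` and the source names in a factor column called `ind`, so the columns are reordered and renamed afterwards.

```r
x <- c(1, 2, 3)
y <- c(100, 200, 300)
x_name <- "cond"
y_name <- "rating"

# stack() puts the values first and the source name second, so swap them
df <- stack(list(x = x, y = y))[, c("ind", "values")]
names(df) <- c(x_name, y_name)
df
```

Adding a third vector only means adding another named element to the list, which avoids repeating rep() for each vector.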
