number elements in a vector with constraints - r

Given x and y I wish to create the desired.result below:
x <- 1:10
y <- c(2:4,6:7,8:9)
desired.result <- c(1,2,2,2,3,4,4,5,5,6)
where, in effect, each sequence in y is replaced in x by the first element of that sequence, and then the elements of the new x are numbered.
The intermediate step for x would be:
x.intermediate <- c(1,2,2,2,5,6,6,8,8,10)
Below is code that does this. However, the code is not general and is overly complex:
x <- 1:10
y <- list(c(2:4),(6:7),(8:9))
unique.x <- 1:(length(x[-unlist(y)]) + length(y))
y1 <- rep(min(unlist(y[1])), length(unlist(y[1])))
y2 <- rep(min(unlist(y[2])), length(unlist(y[2])))
y3 <- rep(min(unlist(y[3])), length(unlist(y[3])))
new.x <- x
new.x[unlist(y[1])] <- y1
new.x[unlist(y[2])] <- y2
new.x[unlist(y[3])] <- y3
rep(unique.x, rle(new.x)$lengths)
[1] 1 2 2 2 3 4 4 5 5 6
Below is my attempt to generalize the code. However, I am stuck on the second lapply.
x <- 1:10
y <- list(c(2:4),(6:7),(8:9))
unique.x <- 1:(length(x[-unlist(y)]) + length(y))
y2 <- lapply(y, function(i) rep(min(i), length(i)))
new.x <- x
lapply(y2, function(i) new.x[i[1]:(i[1]-1+length(i))] = i)
rep(unique.x, rle(new.x)$lengths)
Thank you for any advice. I suspect there is a much simpler solution I am overlooking. I prefer a solution in base R.

A solution like this should work:
x <- 1:10
y <- list(c(2:4),(6:7),(8:9))
x[unlist(y)]<-rep(sapply(y,'[',1),lapply(y,length))
rep(1:length(rle(x)$lengths), rle(x)$lengths)
## [1] 1 2 2 2 3 4 4 5 5 6
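The same numbering can also be obtained without rle(). The sketch below is only an illustration: number_groups() is a hypothetical helper, not something from the original posts, and it assumes that y holds positions in x and that each run in y is sorted in ascending order. The counter advances at every position except those that continue a run.

```r
# Hypothetical helper (not from the original posts).
# Assumes y is a list of ascending runs of positions in x.
number_groups <- function(x, y) {
  # Positions that continue a run: every element of a run except its first
  continued <- unlist(lapply(y, function(run) run[-1]))
  # The counter increases only where a new value starts
  cumsum(!(seq_along(x) %in% continued))
}

x <- 1:10
y <- list(2:4, 6:7, 8:9)
number_groups(x, y)
## [1] 1 2 2 2 3 4 4 5 5 6
```

Because nothing is assigned inside an lapply(), this also sidesteps the problem in the attempted generalization, where the assignment to new.x only changes a local copy inside the anonymous function.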

Related

Updating a vector within a dataframe using a random experiment

I have the following dataframes "df1" and "df2":
x1 <- c(1,1,1,2,2,3)
y1 <- c(0,0,1,1,2,2)
df1 <- data.frame(x1,y1)
y <- c(0,1,2)
p <- c(0.1,0.6,0.9)
df2 <- data.frame(y,p)
What I want to do is to update df1$x1 to a new vector df1$x2, based on a random experiment. This can be manually done using the following function and "lapply" on vector df1$x1:
example_function <- function(x,p){
if(runif(1) <= p) return(x + 1)
return(x)
}
set.seed(123)
df1$x2 <- unlist(lapply(df1$x1,example_function,0.5))
The function performs a random experiment and compares it with a given probability p. Depending on the result, x either remains the same for df1$x2 or increases by 1.
In the procedure described above, "p" was selected manually within the function (here 0.5 for all x-values in df1). However, I want p to be chosen automatically depending on the combination of df1$x1 and df1$y1. This is where df2 comes into play: it shows which p-value belongs to which y-value. For example, df1$x1[3] equals 1, and the corresponding y-value df1$y1[3] is also equal to 1. df2 shows that the associated p-value has to be 0.6 (the p-value for y equal to 1), so p = 0.6 should be used in "example_function" to determine the corresponding value of df1$x2. Depending on df1$y1, p should be 0.1 for df1$x1[1] and df1$x1[2], 0.6 for df1$x1[3] and df1$x1[4], and 0.9 for df1$x1[5] and df1$x1[6].
The following example is one approach, but it only works if df1$x1 contains no duplicate values:
x1 <- c(1,2,3,4,5,6)
y1 <- c(0,0,1,1,2,2)
df1 <- data.frame(x1,y1)
set.seed(123)
df1$x2 <- unlist(lapply(df1$x1,
function(z) {
example_function(z, df2$p[df2$y == df1$y1[df1$x1 == z]])
}))
df1
x1 y1 x2
#1 1 0 1
#2 2 0 2
#3 3 1 4
#4 4 1 4
#5 5 2 5
#6 6 2 7
Using x1 <- c(1,1,1,2,2,3), as mentioned above, leads to warnings and errors:
x1 <- c(1,1,1,2,2,3)
y1 <- c(0,0,1,1,2,2)
df1 <- data.frame(x1,y1)
set.seed(123)
df1$x2 <- unlist(lapply(df1$x1,
function(z) {
example_function(z, df2$p[df2$y == df1$y1[df1$x1 == z]])
}))
Error in if (runif(1) <= p) return(x + 1) : argument is of length zero
In addition: Warning message:
In df2$y == df1$y1[df1$x1 == z] :
Error in if (runif(1) <= p) return(x + 1) : argument is of length zero
Is there anyone who has an idea how to fix that problem? I am very grateful for any help.
Working with "merge" seems to be one solution:
df_new <- merge(df1, df2, by.x = 'y1', by.y = 'y')
set.seed(123)
df1$x2 <- mapply(example_function,df1$x1,df_new$p)
> df1
x1 y1 x2
1 1 0 1
2 1 0 1
3 1 1 2
4 2 1 2
5 2 2 2
6 3 2 4
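Note that merge() sorts its result by the key column, so pairing df1$x1 with df_new$p as above relies on df1 already being ordered by y1. A minimal sketch that avoids this, assuming every value of df1$y1 appears in df2$y, looks up p row by row with match(). The name p_row is made up for this illustration.

```r
# Same helper as in the question
example_function <- function(x, p) {
  if (runif(1) <= p) return(x + 1)
  x
}

df1 <- data.frame(x1 = c(1, 1, 1, 2, 2, 3), y1 = c(0, 0, 1, 1, 2, 2))
df2 <- data.frame(y = c(0, 1, 2), p = c(0.1, 0.6, 0.9))

# match() gives, for each df1$y1, its row in df2; use it to pick p.
# p_row is a hypothetical name; it keeps the row order of df1.
p_row <- df2$p[match(df1$y1, df2$y)]
p_row
#[1] 0.1 0.1 0.6 0.6 0.9 0.9

set.seed(123)
df1$x2 <- mapply(example_function, df1$x1, p_row)
```

This also works when x1 contains duplicates, because the lookup depends only on y1.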

Split a vector into multiple vectors in R

I want to split one vector (x) into multiple vectors (x1, x2, ..., xn).
My input: x <- 1:10
My desired output:
x1 <- c(1,2,3,4)
x2 <- c(2,3,4,5)
x3 <- c(3,4,5,6)
x4 <- c(4,5,6,7)
x5 <- c(5,6,7,8)
x6 <- c(6,7,8,9)
x7 <- c(7,8,9,10)
My code (thanks to Mrs. Richard Herron for inspiration):
x <- 1:10
n <-3
vectors <- function(x, n) split(x, sort(rank(x) %% n))
vectors(x,n)
Thanks very much!
We can use lapply to loop over the starting indices in 'x', build the index sequence from i to i + n with ':', and subset 'x' with it, so that each element of the list has length 4 (that is, n + 1). If individual vectors are needed, we set the names of the list and use list2env.
n <- 3
lst <- lapply(1:(length(x)-n), function(i) x[i:(i+n)])
names(lst) <- paste0("x", seq_along(lst))
list2env(lst, envir = .GlobalEnv)
x1
#[1] 1 2 3 4
x2
#[1] 2 3 4 5
x3
#[1] 3 4 5 6
Or, instead of multiple vectors in the global environment, we can create a matrix where each row corresponds to one vector of interest (this version relies on 'x' being consecutive integers):
matrix(x[1:4] + rep(0:6, each = 4), ncol=4, byrow = TRUE)
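For an 'x' that is not a run of consecutive integers, one option is embed() from the stats package, which builds the sliding windows directly. This is only a sketch; the names k and windows are made up for the illustration.

```r
x <- 1:10
k <- 4  # window length (hypothetical name; equals n + 1 in the answer above)

# embed() returns each window in reverse order, so flip the columns back
windows <- embed(x, k)[, k:1]
dim(windows)
#[1] 7 4
windows[1, ]
#[1] 1 2 3 4
```

Each row of windows is one of the desired vectors x1, x2, ..., x7.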

Standard deviation for each row value and constant value

N <- c(1,3,4,6)
a <- c(3,4,5,6)
b <- c(4,5,6,7)
w <- c(5,6,7,6)
dat1 <- data.frame(N,May = a, April = b,June = w)
N May April June
1 1 3 4 5
2 3 4 5 6
3 4 5 6 7
4 6 6 7 6
I need a data frame where each value is the sd of the N value and the corresponding value in that row:
sd(c(1,3)) sd(c(1,4)) sd(c(1,5)) # for the first row
sd(c(3,4)) sd(c(3,5)) sd(c(3,6)) # for the second row, and so on.
Try this:
The data:
Norm <- c(1,3,4,6)
a <- c(3,4,5,6)
b <- c(4,5,6,7)
w <- c(5,6,7,6)
mydata <- data.frame(Norm=Norm,May = a, April = b,June = w)
Solution:
finaldata <- do.call('cbind',lapply(names(mydata)[2:4], function(x) apply(mydata[c("Norm",x)],1,sd)))
I hope it helps.
Piece of advice:
Please refrain from using names like data and norm for your variable names. They can easily conflict with things that are native to R. For example norm is a function in R, and so is data.
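Since the standard deviation of just two values a and b equals abs(a - b) / sqrt(2), the same result can be sketched without apply(). The name finaldata2 is made up for this illustration, and the data frame is the one from the answer above.

```r
mydata <- data.frame(Norm  = c(1, 3, 4, 6),
                     May   = c(3, 4, 5, 6),
                     April = c(4, 5, 6, 7),
                     June  = c(5, 6, 7, 6))

# Subtract Norm from every other column, then rescale:
# sd(c(a, b)) == abs(a - b) / sqrt(2)
finaldata2 <- abs(mydata[-1] - mydata$Norm) / sqrt(2)
finaldata2
```

The result stays a data frame and keeps the column names May, April and June.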
I think I got it
x <- matrix(NA, nrow = 4, ncol = 3)
for (j in 1:3) {
  for (i in 1:4) {
    # sd of the N value (column 1) and the value in column j + 1 of row i
    x[i, j] <- sd(c(dat1[i, 1], dat1[i, j + 1]))
  }
}
x

Functions with loops & multiples in R

I'm still getting to grips with R and have been set the task of specifically writing a function where if x and y are vectors:
x <- c(3,7,9)
y <- 20
...then all of x and multiples of x which are less than y need to be output in the form of a vector, e.g.:
v1 <- c(3,6,7,9,12,14,15,18)
But then within the function it needs to sum up all the numbers in the vector v1 - (3+6+...+15+18).
I've had a go at it but I can never really get my head around if else statements, so could anyone help me out and explain so I know for future reference?
No loops needed. Figure out how many times each x value goes into y, then generate a list of the unique numbers:
x <- c(3,7,9)
y <- 20
possible <- y %/% x
#[1] 6 2 2
out <- unique(sequence(possible) * rep(x,possible))
# or alternatively
# out <- unique(unlist(Map(function(a,b) sequence(a) * b, possible, x)))
out
#[1] 3 6 9 12 15 18 7 14
sum(out)
#[1] 84
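Since the task asks for a function, the steps above could be wrapped up as in the sketch below. sum_multiples() is a hypothetical name, and the sketch assumes that x holds positive integers and y is an integer. It uses (y - 1) %/% x rather than y %/% x so that a multiple equal to y itself is left out, which matches "less than y".

```r
# Hypothetical wrapper around the vectorised approach above.
# Assumes x contains positive integers and y is an integer.
sum_multiples <- function(x, y) {
  # Number of multiples of each x that are strictly below y
  possible <- (y - 1) %/% x
  # All multiples, de-duplicated and sorted
  out <- sort(unique(sequence(possible) * rep(x, possible)))
  list(values = out, total = sum(out))
}

res <- sum_multiples(c(3, 7, 9), 20)
res$values
#[1]  3  6  7  9 12 14 15 18
res$total
#[1] 84
```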
Here's an example using basic loops and if else branching in R.
x <- c(3,7,9)
y1 <- 20
v1 <- numeric()
for(i in x){
nex <- i
counter <- 1
repeat{
if(!(nex %in% v1)){
v1 <- c(v1, nex)
}
counter <- counter + 1
nex <- i*counter
if(nex >= y1){
break
}
}
}
v1 <- sort(v1)
v1.sum <- sum(v1)
v1
## 3 6 7 9 12 14 15 18
v1.sum
## 84

How to combine two vectors into a data frame

I have two vectors like this
x <-c(1,2,3)
y <-c(100,200,300)
x_name <- "cond"
y_name <- "rating"
I'd like to output the dataframe like this:
> print(df)
cond rating
1 x 1
2 x 2
3 x 3
4 y 100
5 y 200
6 y 300
What's the way to do it?
While this does not answer the question asked, it answers a related question that many people have had:
x <-c(1,2,3)
y <-c(100,200,300)
x_name <- "cond"
y_name <- "rating"
df <- data.frame(x,y)
names(df) <- c(x_name,y_name)
print(df)
cond rating
1 1 100
2 2 200
3 3 300
x <-c(1,2,3)
y <-c(100,200,300)
x_name <- "cond"
y_name <- "rating"
require(reshape2)
df <- melt(data.frame(x,y))
colnames(df) <- c(x_name, y_name)
print(df)
UPDATE (2017-02-07):
In answer to @cdaringe's comment: there are multiple possible solutions; one of them is below.
library(dplyr)
library(magrittr)
x <- c(1, 2, 3)
y <- c(100, 200, 300)
z <- c(1, 2, 3, 4, 5)
x_name <- "cond"
y_name <- "rating"
# Helper function to create a data frame for one chunk of the data
# (tibble() replaces data_frame(), which is deprecated)
prepare <- function(name, value, xname = x_name, yname = y_name) {
  tibble(rep(name, length(value)), value) %>%
    set_colnames(c(xname, yname))
}
bind_rows(
prepare("x", x),
prepare("y", y),
prepare("z", z)
)
This should do the trick, to produce the data frame you asked for, using only base R:
df <- data.frame(cond=c(rep("x", times=length(x)),
rep("y", times=length(y))),
rating=c(x, y))
df
cond rating
1 x 1
2 x 2
3 x 3
4 y 100
5 y 200
6 y 300
However, from your initial description, I'd say that this is perhaps a more likely usecase:
df2 <- data.frame(x, y)
colnames(df2) <- c(x_name, y_name)
df2
cond rating
1 1 100
2 2 200
3 3 300
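Another base R option for the long format, shown here only as a sketch, is stack(), which works for any number of named vectors. The name long is made up for the illustration. Note that stack() returns the label column as a factor.

```r
x <- c(1, 2, 3)
y <- c(100, 200, 300)
x_name <- "cond"
y_name <- "rating"

# stack() returns the columns `values` and `ind`;
# swap their order, then rename them
long <- setNames(stack(list(x = x, y = y))[2:1], c(x_name, y_name))
long
```

Adding a third vector only requires another entry in the list, for example list(x = x, y = y, z = z).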
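You can use the expand.grid() function. Note that it returns every combination of x and y (9 rows here), not the stacked format shown in the question.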
x <-c(1,2,3)
y <-c(100,200,300)
expand.grid(cond=x,rating=y)
Here's a simple function. It generates a data frame and automatically uses the names of the vectors as values for the first column.
myfunc <- function(a, b, names = NULL) {
setNames(data.frame(c(rep(deparse(substitute(a)), length(a)),
rep(deparse(substitute(b)), length(b))), c(a, b)), names)
}
An example:
x <-c(1,2,3)
y <-c(100,200,300)
x_name <- "cond"
y_name <- "rating"
myfunc(x, y, c(x_name, y_name))
cond rating
1 x 1
2 x 2
3 x 3
4 y 100
5 y 200
6 y 300
df = data.frame(cond=c(rep("x",3),rep("y",3)),rating=c(x,y))
Alt simplification of https://stackoverflow.com/users/1969435/gx1sptdtda above:
cond <-c(1,2,3)
rating <-c(100,200,300)
df <- data.frame(cond, rating)
df
cond rating
1 1 100
2 2 200
3 3 300

Resources