I'm currently working with a large two-column matrix, and I want to check whether each row (that is, each combination of the two columns) is also present in a loaded data frame (which also has two columns).
Example,
(obj_design <- matrix(c(2,5,4,7,6,6,20,12,4,0), nrow = 5, ncol = 2))
[,1] [,2]
[1,] 2 6
[2,] 5 20
[3,] 4 12
[4,] 7 4
[5,] 6 0
(refined_grid <- data.frame(i=1:4, j=1:12))
i j
1 1 1
2 2 2
3 3 3
4 4 4
5 1 5
6 2 6
7 3 7
8 4 8
9 1 9
10 2 10
11 3 11
12 4 12
In the reproducible example above, the rows (2,6) and (4,12) would be selected.
I'm wondering if there's a function that I can use to check the whole matrix, and see if a specific line is in the dataframe, and (if possible) write separately (new dataset) which elements of the matrix it is in.
Any assistance would be wonderful.
Here is an option with match
i1 <- match(do.call(paste, as.data.frame(obj_design)),
do.call(paste, refined_grid), nomatch = 0)
refined_grid[i1,]
This code will give you which rows of the matrix exist in the dataframe.
which(paste(obj_design[,1], obj_design[,2]) %in%
paste(refined_grid$i, refined_grid$j)
)
Then you can just assign it to a vector!
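To also write the matches out as a separate dataset, the two ideas above can be combined. This is only a sketch: the names key_design, key_grid, pos and matches are made up for illustration, and the "_" separator is an arbitrary choice that assumes the values themselves never contain an underscore.

```r
obj_design <- matrix(c(2, 5, 4, 7, 6, 6, 20, 12, 4, 0), nrow = 5, ncol = 2)
refined_grid <- data.frame(i = 1:4, j = 1:12)

# Build one text key per row so that whole rows can be compared at once
key_design <- paste(obj_design[, 1], obj_design[, 2], sep = "_")
key_grid   <- paste(refined_grid$i, refined_grid$j, sep = "_")

# Position of each matrix row in the data frame (NA when it is absent)
pos   <- match(key_design, key_grid)
found <- !is.na(pos)

# New dataset: which matrix row matched which data frame row, plus the values
matches <- data.frame(
  design_row = which(found),
  grid_row   = pos[found],
  i          = obj_design[found, 1],
  j          = obj_design[found, 2]
)
matches
```

With the example data this should report matrix rows 1 and 3, which sit at rows 6 and 12 of refined_grid.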
Let's say I have a data frame with the following structure:
> DF <- data.frame(x=1:5, y=6:10)
> DF
x y
1 1 6
2 2 7
3 3 8
4 4 9
5 5 10
I need to build a new data frame with overlapping observations from the first data frame to be used as an input for building the A matrix for the Rglpk optimization library. I would use n-length observation windows, so that if n=2 the resulting data frame would join rows 1&2, 2&3, 3&4, and so on. The length of the resulting data frame would be
(numberOfObservations-windowSize+1)*windowSize
The result for this example with windowSize=2 would be a structure like
x y
1 1 6
2 2 7
3 2 7
4 3 8
5 3 8
6 4 9
7 4 9
8 5 10
I could do a loop like
DFResult <- NULL
numBlocks <- nrow(DF)-windowSize+1
for (i in 1:numBlocks) {
DFResult <- rbind(DFResult, DF[i:(i+windowSize-1), ])
}
But this seems very inefficient, especially for very large data frames.
I also tried
rollapply(data=DF, width=windowSize, FUN=function(x) x, by.column=FALSE, by=1)
x y
[1,] 1 6
[2,] 2 7
[3,] 2 7
[4,] 3 8
where I was trying to repeat a block of rows without applying any aggregate function. This does not work, since some rows are missing from the result.
I am a bit stumped by this and have looked around for similar problems but could not find any. Does anyone have any better ideas?
We could do a vectorized approach
i1 <- seq_len(nrow(DF))
res <- DF[c(rbind(i1[-length(i1)], i1[-1])),]
row.names(res) <- NULL
res
# x y
#1 1 6
#2 2 7
#3 2 7
#4 3 8
#5 3 8
#6 4 9
#7 4 9
#8 5 10
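The code above covers windowSize = 2. For an arbitrary window size, one possible generalisation (a sketch, not part of the original answer; idx is a name chosen here) builds all row indices at once with outer:

```r
DF <- data.frame(x = 1:5, y = 6:10)
windowSize <- 3
numBlocks <- nrow(DF) - windowSize + 1

# Column k of idx holds the row numbers of window k: k, k + 1, ..., k + windowSize - 1
idx <- outer(seq_len(windowSize) - 1, seq_len(numBlocks), `+`)

# c(idx) reads the matrix column by column, i.e. window after window
res <- DF[c(idx), ]
row.names(res) <- NULL
res
```

The result has (numberOfObservations-windowSize+1)*windowSize rows, which is 9 for this example.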
I have 6 digits (1, 2, 3, 4, 5, 6), and I need to create all possible combinations (i.e. 6*5*4*3*2*1 = 720 combinations) in which no number can be used twice and 0 is not allowed. I would like to obtain combinations like: 123456, 246135, 314256, etc.
Is there a way to create them with Matlab or R? Thank you.
In Matlab you can use
y = perms(1:6);
This gives a numerical 720×6 array y, where each row is a permutation:
y =
6 5 4 3 2 1
6 5 4 3 1 2
6 5 4 2 3 1
6 5 4 2 1 3
6 5 4 1 2 3
···
If you want the result as a char array:
y = char(perms(1:6)+'0');
which produces
y =
654321
654312
654231
654213
654123
···
In R:
library(combinat)
p <- permn(1:6)
gives you a list; do.call(rbind, p) or matrix(unlist(p), ncol=6, byrow=TRUE) will give a numeric array; sapply(p,paste,collapse="") gives a vector of strings.
Here's a base R 'solution':
p <- unique(t(replicate(100000, sample(6,6), simplify="vector")))
nrow(p)
#> [1] 720
head(p)
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 3 5 4 2 1 6
#> [2,] 6 3 5 4 1 2
#> [3,] 5 1 6 2 3 4
#> [4,] 6 5 3 2 4 1
#> [5,] 5 2 3 6 4 1
#> [6,] 1 4 2 5 6 3
It's a hack of course, and this potentially only applies to the example given, but sometimes it's useful to do things in silly ways... this takes an excessive number of samples (without replacement) of the vector 1:6, then removes any duplicates. It does indeed produce the unique 720 results, but they're not sorted.
A base R approach is
x <- do.call(expand.grid, rep(list(1:6), 6))
x <- x[apply(x, MARGIN = 1, function(x) length(unique(x)) == 6), ]
which creates a data frame with 6^6 rows, then retains only the rows that contain all 6 numbers.
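Another base R possibility is a small recursive function that builds only the valid rows, so it never materialises the 6^6 candidates. This is a sketch; permutations is a helper name invented here, not a base R function.

```r
# Returns a matrix with one permutation of v per row
permutations <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  # Put each element first in turn, followed by every permutation of the rest
  do.call(rbind, lapply(seq_along(v), function(k) {
    cbind(v[k], permutations(v[-k]))
  }))
}

p <- permutations(1:6)
nrow(p)

# Collapse each row into a string such as "123456"
s <- apply(p, 1, paste, collapse = "")
head(s)
```

The rows come out in the order in which elements are picked, so the first one is the input order itself.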
Seems like this very simple maneuver used to work for me, and now it simply doesn't. A dummy version of the problem:
df <- data.frame(x = 1:5) # create simple dataframe
df
x
1 1
2 2
3 3
4 4
5 5
df$y <- c(1:5) # adding a new column with a vector of the exact same length. Works out like it should
df
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
df$z <- c(1:4) # trying to add a new column, this time with a vector with fewer elements than there are rows in the dataframe.
Error in `$<-.data.frame`(`*tmp*`, "z", value = 1:4) :
replacement has 4 rows, data has 5
I was expecting this to work with the following result:
x y z
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 1
I.e. the shorter vector should just start repeating itself automatically. I'm pretty certain this used to work for me (it's in a script that I've been running a hundred times before without problems). Now I can't even get the above dummy example to work like I want to. What am I missing?
If the vector can be evenly recycled into the data.frame, you do not get an error or a warning:
df <- data.frame(x = 1:10)
df$z <- 1:5
This may be what you were experiencing before.
You can get your vector to fit as you mention with rep_len:
df$y <- rep_len(1:3, length.out=10)
This results in
df
x z y
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 1
5 5 5 2
6 6 1 3
7 7 2 1
8 8 3 2
9 9 4 3
10 10 5 1
Note that in place of rep_len, you could use the more common rep function:
df$y <- rep(1:3,len=10)
From the help file for rep:
rep.int and rep_len are faster simplified versions for two common cases. They are not generic.
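Applied to the 5-row example from the question, the same idea might look like this (a sketch; nrow(df) is used so that the target length is not hard-coded):

```r
df <- data.frame(x = 1:5)
df$y <- 1:5

# Recycle the shorter vector explicitly up to the number of rows
df$z <- rep_len(1:4, nrow(df))
df
```

Here z starts over at 1 in the fifth row, which is the result the question expected.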
If the total number of rows is a multiple of the length of your new vector, recycling works fine. When it is not, it does not work everywhere. In particular, you have probably used this type of recycling with matrices:
data.frame(1:6, 1:3, 1:4) # not a multiple
# Error in data.frame(1:6, 1:3, 1:4) :
# arguments imply differing number of rows: 6, 3, 4
data.frame(1:6, 1:3) # a multiple
# X1.6 X1.3
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 1
# 5 5 2
# 6 6 3
cbind(1:6, 1:3, 1:4) # works even with not a multiple
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] 2 2 2
# [3,] 3 3 3
# [4,] 4 1 4
# [5,] 5 2 1
# [6,] 6 3 2
# Warning message:
# In cbind(1:6, 1:3, 1:4) :
# number of rows of result is not a multiple of vector length (arg 3)
I have a .txt file with 13 columns and 1000 rows. The first column holds characters (names) and the other 12 hold numbers. I want to filter out every row in which even one column has a value below 10. In other words, I need only the rows whose values are 10 or more in all columns. Could you please let me know how I can do that in R?
Thanks.
You can use the which() function in R to satisfy your condition. Create some test data:
> test
X1 X2 X3 X4
1 9.725585 10.067146 9.473320 9.959529
2 10.104124 11.278900 9.299356 10.317570
3 8.770733 11.092994 9.803285 12.078180
4 10.163150 9.233452 9.425293 9.968435
5 9.815270 9.932501 9.798252 9.194674
6 10.635158 9.175388 10.938356 10.611528
7 10.959444 7.766411 8.955005 10.712767
8 9.907442 10.123078 9.897276 10.467526
9 9.337628 10.811072 11.062031 10.426313
10 10.056789 11.029007 10.875958 11.160633
using which(test < 10, arr.ind = TRUE) gives:
> head(which(test < 10, arr.ind = TRUE))
row col
[1,] 1 1
[2,] 3 1
[3,] 5 1
[4,] 8 1
[5,] 9 1
[6,] 4 2
Then the indices of the rows that contain at least one value below 10 are:
> sort(unique(which(test < 10, arr.ind = TRUE)[, 1]))
[1] 1 2 3 4 5 6 7 8 9
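To actually keep only the rows where every value is at least 10, a logical row filter may be simpler than working with indices. The sketch below uses a small made-up data frame (dat, with a name column first, as in the question) instead of the real file; rowSums counts the values below 10 in each row.

```r
# Hypothetical stand-in for the file: first column names, the rest numeric
dat <- data.frame(
  name = c("a", "b", "c"),
  v1   = c(12, 9, 10),
  v2   = c(15, 20, 10),
  stringsAsFactors = FALSE
)

# Drop the name column before comparing, then count the offending values per row
num  <- dat[, -1]
keep <- rowSums(num < 10) == 0

filtered <- dat[keep, ]
filtered
```

Rows with missing values give NA in keep, so they would need separate handling if the data contains any.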
I am trying to simulate the OFFSET function from Excel. I understand that this can be done for a single value but I would like to return a range. I'd like to return a group of values with an offset of 1 and a group size of 2. For example, on row 4, I would like to have a group with values of column a, rows 3 & 2. Sorry but I am stumped.
Is it possible to add this result to the data frame as another column using cbind or similar? Alternatively, could I use this in a vectorized function so I could sum or mean the result?
Mockup Example:
> df <- data.frame(a=1:10)
> df
a
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
> #PROCESS
> df
a b
1 1 NA
2 2 (1)
3 3 (1,2)
4 4 (2,3)
5 5 (3,4)
6 6 (4,5)
7 7 (5,6)
8 8 (6,7)
9 9 (7,8)
10 10 (8,9)
This should do the trick:
df$b1 <- c(rep(NA, 1), head(df$a, -1))
df$b2 <- c(rep(NA, 2), head(df$a, -2))
Note that the result is easiest to keep in two columns, as data frame columns normally hold atomic vectors (a list column is possible, but awkward to work with). head with a negative argument cuts the negated value of the argument from the tail; try head(1:10, -2). rep is repetition, c is concatenation. The <- assignment adds a new column if it's not there yet.
What Excel calls OFFSET is sometimes also referred to as lag.
EDIT: Following Greg Snow's comment, here's a version that's more elegant, but also more difficult to understand:
df <- cbind(df, as.data.frame((embed(c(NA, NA, df$a), 3))[,c(3,2)]))
Try it component by component to see how it works.
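Once the lagged values sit in their own columns, summing or averaging them per row can be vectorised with rowSums and rowMeans. This is a sketch; the column names lag_sum and lag_mean are invented here.

```r
df <- data.frame(a = 1:10)
df$b1 <- c(NA, head(df$a, -1))      # value from one row above
df$b2 <- c(NA, NA, head(df$a, -2))  # value from two rows above

# Sum is NA while the window is incomplete
df$lag_sum <- rowSums(df[, c("b1", "b2")])

# With na.rm = TRUE the mean uses whatever values are available
df$lag_mean <- rowMeans(df[, c("b1", "b2")], na.rm = TRUE)
df
```

Whether incomplete windows should give NA or a partial result is a design choice; na.rm switches between the two.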
Do you want something like this?
> df <- data.frame(a=1:10)
> b=t(sapply(1:10, function(i) c(df$a[(i+2)%%10+1], df$a[(i+4)%%10+1])))
> s = sapply(1:10, function(i) sum(b[i,]))
> df = data.frame(df, b, s)
> df
a X1 X2 s
1 1 4 6 10
2 2 5 7 12
3 3 6 8 14
4 4 7 9 16
5 5 8 10 18
6 6 9 1 10
7 7 10 2 12
8 8 1 3 4
9 9 2 4 6
10 10 3 5 8