I have a data set that looks like the following
xx = c(1:5, 1:9, 1:7)
# [1] 1 2 3 4 5 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7
I would like to know the index of each 1 and the index of the maximum value reached before the sequence begins again, together with the values at those positions. For example:
[1] 1, 1
[2] 5, 5
[3] 6, 1
[4] 14, 9
[5] 15, 1
[6] 21, 7
and so on.....
One option (assuming the vector consists only of such sequences, each starting at 1):
v1 <- which(xx == 1)
v2 <- c(rbind(v1, c(v1[-1]-1, length(xx))))
cbind(ind = v2, value = xx[v2])
# ind value
#[1,] 1 1
#[2,] 5 5
#[3,] 6 1
#[4,] 14 9
#[5,] 15 1
#[6,] 21 7
Another option is to split the indices of 'xx' into groups, starting a new group at each 1, and take the first and last index of each group:
ind <- unlist(lapply(split(seq_along(xx), cumsum(xx==1)), function(x) x[c(1, length(x))]))
cbind(ind, value = xx[ind])
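A further way to look at it, as a minimal sketch under the same assumption (each run starts at 1 and increases): a run ends wherever the next value is smaller, so diff() can locate the ends directly. The names starts and ends are illustrative and do not come from the answers above.

```r
xx <- c(1:5, 1:9, 1:7)

# A run ends where the following value is smaller; the last element always ends a run
ends <- c(which(diff(xx) < 0), length(xx))
# Each run starts right after the previous run's end
starts <- c(1, head(ends, -1) + 1)

# Interleave start and end positions, then look up the values
ind <- c(rbind(starts, ends))
res <- cbind(ind = ind, value = xx[ind])
res
```

Unlike which(xx == 1), this does not need the runs to start at exactly 1, only to be increasing.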
This can be answered with a somewhat ugly (but efficient!) lapply:
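a <- lapply(seq_along(xx), function(i) {
  if (i == length(xx)) {
    c(i, xx[i])   # the last element always ends a run
  } else if (xx[i] == 1) {
    c(i, 1)       # start of a run
  } else if (xx[i] > xx[i + 1]) {
    c(i, xx[i])   # the next value is smaller, so this is the end of a run
  }               # otherwise NULL, which unlist() drops
})
matrix(unlist(a), ncol = 2, byrow = TRUE)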
I have this df:
dx <- structure(list(a = c(0.916290731874155, 2.89037175789616, -0.156004248476581,
-0.318453731118534, -2.07944154167984, 2.00533356952611, -1.24319351747922,
0.42744401482694, 1.29532258291416, -2.03292152604494, -0.606135803570316,
-0.693147180559945), b = c(0.550046336919272, 0.228258651980981,
-0.577634293438101, 0.135801541159061, 0.644357016390513, -2.30258509299405,
-0.0870113769896297, 1.71297859137494, 0.17958557697508, -1.65140211153313,
1.31218638896617, 0.282862786015832), c = c(0.0988458346366325,
-3.34403896782221, 1.99243016469021, -1.70474809223843, 2.62103882411258,
2.20727491318972, -1.40242374304977, -1.256836293883, -2.16905370036952,
2.91777073208428, 0.138586163286146, -0.946143695023836), d = c(0.268263986594679,
-2.83321334405622, 1.83258146374831, 1.15057202759882, 0.0613689463762919,
-2.23359222150709, 4.34236137828145, -3.44854350225935, 1.29098418131557,
-0.356674943938732, -0.21868920096483, -0.810930216216329), e = c(1.65140211153313,
0.220400065368459, -0.044951387862266, 0.0773866636154201, -1.49877234454658,
1.36219680954083, -0.295845383090942, -0.709676482511156, -0.916290731874155,
1.65822807660353, 0.451985123743057, -0.810930216216329)), class = "data.frame", row.names = 2:13)
and this script
output <- t(as.matrix(rep(NA, ncol=1)))
for(i in 1:12) {
output <- 2*dx[i,]
cmin <- which.min(output)
}
I need to save the result of cmin for each loop of i in another matrix. The result I expect is:
[1]
[1] 3
[2] 4
[3] 2
[4] 1
[5] 1
[6] 2
[7] 3
[8] 4
[9] 3
[10] 1
[11] 1
[12] 3
How can I do this? Thank you!
Just use sapply() here, like this
as.vector(sapply(1:12, \(i) which.min(2*dx[i,])))
Output:
[1] 3 3 2 3 1 2 3 4 3 1 1 3
Just use
as.matrix(apply(dx , 1 , function(x) which.min(2*x)))
[,1]
1 3
2 3
3 2
4 3
5 1
6 2
7 3
8 4
9 3
10 1
11 1
12 3
Initialize a vector of length 12 and then assign the output to each element of the vector
cmin_out <- integer(12)
for(i in 1:12) {
output <- 2*dx[i,]
cmin_out[i] <- which.min(output)
}
cmin_out
[1] 3 3 2 3 1 2 3 4 3 1 1 3
The vector can be converted to a column matrix by wrapping with matrix
matrix(cmin_out)
This can also be done in an efficient, vectorized way in base R without a loop, using max.col:
max.col(-dx, 'first')
#[1] 3 3 2 3 1 2 3 4 3 1 1 3
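To see why max.col on the negated data gives the same answer as a row-wise which.min, here is a small sketch on a made-up matrix m (not the dx from the question). Multiplying by 2 does not change which column holds the minimum, so it is left out.

```r
# Hypothetical 3 x 3 example; row 3 has a tie between columns 1 and 2
m <- matrix(c(3, 1,  2,
              0, 5, -1,
              4, 4,  9), nrow = 3, byrow = TRUE)

# Row-wise loop: position of the minimum in each row
by_row <- apply(m, 1, which.min)

# Vectorized: the column of the maximum of -m is the column of the minimum of m;
# ties.method = "first" matches which.min, which also returns the first minimum
vectorized <- max.col(-m, ties.method = "first")

by_row
vectorized
```

Without ties.method = "first", max.col breaks ties at random, so the two results could differ on rows with repeated minima.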
I have a dataframe (df) in R. All columns are character class.
> dim(df)
[1] 1000 6
I'm trying to remove rows where df$entry == c("7795").
entries_to_remove <- subset(df, entry == c("7795"))
> dim(entries_to_remove)
[1] 35 6
So as you can see above, I have 35 entries to remove from the data frame. However, when I go to remove these using subset, it doesn't remove the correct amount:
entries_to_remove <- subset(df, entry != c("7795"))
> dim(entries_to_remove)
[1] 648 6
The above command was supposed to remove 35 entries, but instead it removed 352. Does anyone know why this might be happening?
Here's another solution, which takes up just one line:
df[!grepl("7995", apply(df, 1, paste0, collapse = " ")), ]
RESULT:
v1 entry1 entry2 entry3
2 2 5 5 2
3 3 2 4 2
4 4 2 3 1
6 6 1 2 1
7 7 2 4 4
8 8 4 5 5
9 9 5 1 5
DATA:
set.seed(121)
df <- data.frame(
v1 = 1:10,
entry1 = c(sample(1:5, 9, replace = T), 7995),
entry2 = c(sample(1:5, 4), 7995, sample(1:5, 5)),
entry3 = c(7995, sample(1:5, 9, replace = T))
)
df[2:4] <- lapply(df[2:4], as.character) # convert to character, as in your data
df
v1 entry1 entry2 entry3
1 1 1 2 7995
2 2 5 5 2
3 3 2 4 2
4 4 2 3 1
5 5 3 7995 2
6 6 1 2 1
7 7 2 4 4
8 8 4 5 5
9 9 5 1 5
10 10 7995 3 5
The above solutions didn't work, and I do not think the issue is with NA. However, I solved the problem myself. It is a workaround, but it worked:
# list the row numbers for the entries to remove
row_remove <- rownames(entries_to_remove )
# make a list of all the row numbers
all_rows <- 1:dim(df)[1]
# create a vector with only the rows to keep
subset_row <- all_rows[!(all_rows%in%row_remove)]
# subset the dataframe with these rows
df<- df[subset_row,]
The issue has to do with NAs. Some of the other solutions will work, but the easiest and, I think, most intuitive fix is to use %in% rather than ==:
entries_to_remove <- subset(df, !(entry %in% c("7795")))
entries_to_remove <- subset(df, entry %in% c("7795"))
This should explain what's happening. Notice how == returns NA rather than FALSE for a missing value:
> c( 5, 6, 7) == 5
[1] TRUE FALSE FALSE
> c( 5, 6, 7 , NA) == 5
[1] TRUE FALSE FALSE NA
> c( 5, 6, 7 , NA) %in% 5
[1] TRUE FALSE FALSE FALSE
and subset() drops every row where the condition is NA, so rows with a missing entry disappear together with the matches.
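The effect on row counts can be reproduced on a tiny data frame. This is a sketch with made-up data (df_demo is hypothetical, not the 1000-row data frame from the question):

```r
# Six rows: two matches, two NAs, two other values
df_demo <- data.frame(entry = c("7795", "12", NA, "7795", NA, "3"),
                      stringsAsFactors = FALSE)

# != yields NA for the missing entries, and subset() drops those rows too
kept_neq <- subset(df_demo, entry != "7795")

# %in% never returns NA, so only the two matching rows are removed
kept_in <- subset(df_demo, !(entry %in% "7795"))

nrow(kept_neq)
nrow(kept_in)
```

Here the != version keeps 2 rows and the %in% version keeps 4. The difference is the number of NA entries, which would also account for the gap between 35 and 352 in the question if the entry column contains missing values.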
I would like to list all unique combinations of vectors of length 3 where each element of the vector can range from 1 to 9.
First I list all such combinations:
df <- expand.grid(1:9, 1:9, 1:9)
Then I would like to remove the rows that contain repetitions.
For example:
1 1 9
9 1 1
1 9 1
should only be included once.
In other words, if two rows contain the same numbers, each the same number of times, then only one of them should be included.
Note that
8 8 8 or
9 9 9 is fine as long as it only appears once.
Based on your approach and the idea to remove repetitions:
df <- expand.grid(1:2, 1:2, 1:2)
# Var1 Var2 Var3
# 1 1 1 1
# 2 2 1 1
# 3 1 2 1
# 4 2 2 1
# 5 1 1 2
# 6 2 1 2
# 7 1 2 2
# 8 2 2 2
df2 <- unique(t(apply(df, 1, sort))) #class matrix
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] 1 1 2
# [3,] 1 2 2
# [4,] 2 2 2
df2 <- as.data.frame(df2) #class data.frame
There are probably more efficient methods, but if I understand you correctly, that is the result you want.
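As a quick check, the same idea can be run on the full 1:9 grid from the question. Sorting each row gives every reordering of the same numbers an identical key, so unique() keeps one row per multiset. The count should then equal the number of combinations with repetition, choose(9 + 3 - 1, 3) = 165. The name full is illustrative.

```r
# All 9^3 = 729 ordered triples
full <- expand.grid(1:9, 1:9, 1:9)

# Sort within each row, then drop duplicated rows
full_unique <- as.data.frame(unique(t(apply(full, 1, sort))))

nrow(full)
nrow(full_unique)
choose(11, 3)
```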
Maybe something like this (your data frame is not large, so the cost is negligible):
len <- apply(df, 1, function(x) length(unique(x)))
two <- df[len == 2, ]  # rows with exactly two distinct values
res <- rbind(df[len != 2, ], two[!duplicated(apply(two, 1, prod)), ])
Here is what is done:
Get the number of unique elements per row.
Build the result in two parts:
First argument of rbind: rows with either 1 unique element (e.g. 1 1 1, 7 7 7) or 3 unique elements (e.g. 5 8 7, 2 4 9) are included in the final result res.
Second argument of rbind: among the rows with exactly 2 unique elements (e.g. 1 1 9, 3 5 3), compute the product of each row and keep one row per distinct product, because 3 3 5, 3 5 3 and 5 3 3 all give the same product.
Caveat: the product is not a reliable key, since different sets of numbers can share a product (1 1 4 and 1 2 2 both give 4), and rows with three distinct elements are still kept in every ordering.
I have 6 digits (1, 2, 3, 4, 5, 6), and I need to create all possible combinations (i.e. 6*5*4*3*2*1 = 720 combinations) in which no number can be used twice and 0 is not allowed. I would like to obtain combinations like: 123456, 246135, 314256, etc.
Is there a way to create them with Matlab or R? Thank you.
In Matlab you can use
y = perms(1:6);
This gives a numerical 720×6 array y, where each row is a permutation:
y =
6 5 4 3 2 1
6 5 4 3 1 2
6 5 4 2 3 1
6 5 4 2 1 3
6 5 4 1 2 3
···
If you want the result as a char array:
y = char(perms(1:6)+'0');
which produces
y =
654321
654312
654231
654213
654123
···
In R:
library(combinat)
p <- permn(1:6)
gives you a list; do.call(rbind, p) or matrix(unlist(p), ncol=6, byrow=TRUE) will give a numeric array; sapply(p,paste,collapse="") gives a vector of strings.
Here's a base R 'solution':
p <- unique(t(replicate(100000, sample(6,6), simplify="vector")))
nrow(p)
#> [1] 720
head(p)
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 3 5 4 2 1 6
#> [2,] 6 3 5 4 1 2
#> [3,] 5 1 6 2 3 4
#> [4,] 6 5 3 2 4 1
#> [5,] 5 2 3 6 4 1
#> [6,] 1 4 2 5 6 3
It's a hack, of course, and it may only be practical for an example this small, but sometimes it's useful to do things in silly ways. This draws far more random permutations of 1:6 than needed (each sample() call is without replacement), then removes the duplicates. With 100,000 draws it all but certainly yields all 720 unique permutations, though they are not sorted.
A base R approach is
x <- do.call(expand.grid, rep(list(1:6), 6))
x <- x[apply(x, MARGIN = 1, function(x) length(unique(x)) == 6), ]
which creates a data frame with 6^6 rows, then retains only the rows that contain all 6 numbers.
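If neither the 6^6 intermediate table nor random sampling is appealing, the permutations can also be built recursively in base R. The following is only a sketch; perms_base is a hypothetical helper name, not a function from base R or combinat.

```r
# Recursively build all permutations of v, one per row
perms_base <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  # Put each element first in turn, followed by every permutation of the rest
  out <- lapply(seq_along(v), function(i) cbind(v[i], perms_base(v[-i])))
  do.call(rbind, out)
}

p <- perms_base(1:6)
dim(p)

# Collapse each row into a string such as "123456"
s <- apply(p, 1, paste, collapse = "")
head(s)
```

Each element is placed first in input order, so for 1:6 the rows start at 123456.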
I have a .txt file with 13 columns: the first contains names (character) and the other 12 contain numbers. There are 1000 rows. I want to filter out every row in which at least one column has a value less than 10; in other words, I only need the rows whose values are 10 or more in all columns. Could you please let me know how I can do that in R?
Thanks.
You can use the which() function in R to satisfy your condition. Create some test data:
> test
X1 X2 X3 X4
1 9.725585 10.067146 9.473320 9.959529
2 10.104124 11.278900 9.299356 10.317570
3 8.770733 11.092994 9.803285 12.078180
4 10.163150 9.233452 9.425293 9.968435
5 9.815270 9.932501 9.798252 9.194674
6 10.635158 9.175388 10.938356 10.611528
7 10.959444 7.766411 8.955005 10.712767
8 9.907442 10.123078 9.897276 10.467526
9 9.337628 10.811072 11.062031 10.426313
10 10.056789 11.029007 10.875958 11.160633
Using which(test < 10, arr.ind = TRUE) gives:
> head(which(test < 10, arr.ind = TRUE))
row col
[1,] 1 1
[2,] 3 1
[3,] 5 1
[4,] 8 1
[5,] 9 1
[6,] 4 2
Then:
> sort(unique(which(test < 10, arr.ind = TRUE)[, 1]))
[1] 1 2 3 4 5 6 7 8 9
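Those are the rows to drop. To go from there to the filtered table, one possible sketch is to count the values below 10 in each row and keep the rows where that count is zero. The data frame dat and its column names are made up for illustration; with the real file it would come from something like read.table("yourfile.txt", header = TRUE), where the file name and the header setting are assumptions.

```r
# Hypothetical stand-in for the file: one name column, then numeric columns
dat <- data.frame(name = c("a", "b", "c", "d"),
                  x1 = c(12, 9.5, 10, 30),
                  x2 = c(10, 20, 15, 9.9),
                  x3 = c(11, 14, 10.1, 40))

# Count values below 10 per row, ignoring the first (name) column
n_below <- rowSums(dat[, -1] < 10)

# Keep only the rows where every numeric value is at least 10
filtered <- dat[n_below == 0, ]
filtered
```

If the numeric columns can contain NA, rowSums would return NA for those rows, so na.rm = TRUE or an explicit decision about missing values would be needed.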