Remove rows by multiple logical conditions (rstudio)

Let's say this is my dataframe:
df <- data.frame(replicate(10,sample(0:50,20,rep=TRUE)))
S X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 3 26 39 25 24 4 46 42 8 42
2 40 6 50 50 22 2 40 24 26 17
3 32 45 18 7 19 6 33 12 0 13
4 3 45 43 32 16 33 25 18 35 45
5 7 36 2 25 16 20 24 14 27 29
6 45 4 12 13 50 35 38 1 27 34
7 18 43 38 16 34 18 19 45 4 34
8 18 9 33 38 18 13 23 44 41 4
9 28 34 6 3 14 11 47 4 21 50
10 6 48 42 46 48 42 12 33 1 32
11 28 20 37 2 26 33 5 2 22 27
12 40 30 41 45 28 6 5 46 21 46
13 1 47 46 37 0 3 11 45 12 11
14 20 0 9 38 42 15 44 1 2 45
15 49 29 25 41 38 26 20 34 50 0
16 2 5 47 6 36 34 28 36 32 38
17 15 22 50 13 26 9 37 40 41 23
18 44 27 47 37 26 34 31 36 44 12
19 47 41 19 2 50 44 48 36 34 38
20 25 31 28 34 8 19 3 13 14 23
I need to exclude subjects ('S') that have values higher than 30 in 8 or more of the columns X1:X10. That is, only exclude those who have values above 30 eight or more times (e.g. Subject 19). I was thinking that maybe the 'ifelse' function could be useful, but I really don't know how to implement it.
Any help is highly appreciated! Thanks a lot!

df[-which(apply(df, 1, function(x) sum(x > 30) >= 8)),]
To illustrate how (and that) this works consider this dataframe:
set.seed(1111)
df <- data.frame(replicate(5,sample(0:50,5,rep=TRUE)))
df
X1 X2 X3 X4 X5
1 23 49 0 8 8
2 21 44 38 46 21
3 46 5 38 28 42
4 6 27 32 45 50
5 37 7 44 39 3
Here the second row has values > 20 in more than 4 columns. To remove that row you subtract (-) from df those rows in which the number of columns where values are greater than 20 is greater than 4:
df[-which(apply(df, 1, function(x) sum(x > 20) > 4)),]
X1 X2 X3 X4 X5
1 23 49 0 8 8
3 46 5 38 28 42
4 6 27 32 45 50
5 37 7 44 39 3
Et voilà, the second row has been removed.
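One caveat with the -which() idiom, sketched below on the same five-row table (typed in by hand here so it does not depend on the random seed): if no row meets the condition, which() returns an empty vector and the negative index then selects zero rows instead of all of them. Negating the logical vector with ! avoids this. The variable names drop, none and kept are only illustrative.

```r
# The illustration data from above, entered explicitly
df <- data.frame(
  X1 = c(23, 21, 46, 6, 37),
  X2 = c(49, 44, 5, 27, 7),
  X3 = c(0, 38, 38, 32, 44),
  X4 = c(8, 46, 28, 45, 39),
  X5 = c(8, 21, 42, 50, 3)
)

# TRUE for rows with more than 4 values above 20
drop <- apply(df, 1, function(x) sum(x > 20) > 4)
kept <- df[!drop, ]            # logical negation keeps rows 1, 3, 4, 5

# A condition that no row satisfies
none <- apply(df, 1, function(x) sum(x > 100) > 4)
nrow(df[-which(none), ])       # 0: the negative index on an empty vector drops everything
nrow(df[!none, ])              # 5: logical indexing keeps every row
```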

You can try subset + rowSums like below
subset(df,!rowSums(df > 30)>=8)
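To see what the rowSums approach does, here is a small sketch using three of the subjects from the question (1, 19 and 20), entered by hand as a hypothetical three-row data frame. Note that in R the comparison >= binds tighter than !, so !rowSums(df > 30) >= 8 reads as !(rowSums(df > 30) >= 8), which is the same as rowSums(df > 30) < 8.

```r
# Three subjects copied from the question; the row numbers here are 1, 2, 3
df <- data.frame(rbind(
  c(3, 26, 39, 25, 24, 4, 46, 42, 8, 42),    # subject 1
  c(47, 41, 19, 2, 50, 44, 48, 36, 34, 38),  # subject 19
  c(25, 31, 28, 34, 8, 19, 3, 13, 14, 23)    # subject 20
))

# df > 30 is a logical matrix; rowSums counts the TRUEs per row
n_above <- rowSums(df > 30)

# Keep only rows with fewer than 8 values above 30
filtered <- df[n_above < 8, ]

# The subset() form from the answer selects the same rows
filtered2 <- subset(df, !rowSums(df > 30) >= 8)
```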

Related

R plot numbers of factor levels having n, n+1, .... counts

I have a very large dataset (> 200000 lines) with 6 variables (only the first two shown)
>head(gt7)
ChromKey POS
1 2447 25
2 2447 183
3 26341 75
4 26341 2213
5 26341 2617
6 54011 1868
I have converted the ChromKey variable to a factor variable made up of > 55000 levels.
> gt7[1] <- lapply(gt7[1], factor)
> is.factor(gt7$ChromKey)
[1] TRUE
I can further make a table with counts of ChromKey levels
> table(gt7$ChromKey)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
88 88 44 33 11 11 33 22 121 11 22 11 11 11 22 11 33
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
22 22 44 55 22 11 22 66 11 11 11 22 11 11 11 187 77
35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
77 11 44 11 11 11 11 11 11 22 66 11 22 11 44 22 22
... output cropped
Which I can save in table format
> table <- table(gt7$ChromKey)
> head(table)
1 2 3 4 5 6
88 88 44 33 11 11
I would like to know whether it is possible to have a table (and histogram) of the number of levels with specific count numbers. From the example above, I would expect
88 44 33 11
2 1 1 2
I would very much appreciate any hint.

We can apply table again on the output to get the frequency count of the frequency
table(table(gt7$ChromKey))
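As a rough sketch of how this plays out, the code below builds a hypothetical factor whose six levels have the first six counts shown in the question (88, 88, 44, 33, 11, 11), tabulates it twice, and draws the result with barplot(). The names chrom_key, level_counts and count_freq are made up for the example; the real data would use gt7$ChromKey.

```r
# Hypothetical stand-in for gt7$ChromKey: six levels with the counts from head(table)
counts_per_key <- c(88, 88, 44, 33, 11, 11)
chrom_key <- factor(rep(1:6, times = counts_per_key))

# First table: rows per level; second table: how many levels share each count
level_counts <- table(chrom_key)
count_freq <- table(level_counts)
print(count_freq)

# Bar chart of the number of levels for each count
barplot(count_freq,
        xlab = "Rows per ChromKey level",
        ylab = "Number of levels")
```

The result lists the counts in increasing order (11, 33, 44, 88) rather than in the order written in the question.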

Distance Matrix from table in R

Good evening,
I need to solve a location problem in R and I'm stuck in one of the first steps.
From a .txt file I need to create a distance matrix using the euclidean method.
datos <- file.choose()
servidores <- read.table(datos)
servidores
From which I obtain the following information:
X50 shows the total number of servers.
X5 shows the number of hubs required.
X120 shows the total capacity.
The first column shows the distance of x.
The second column shows the distance of y.
The third column shows the requirements of the node.
X50 X5 X120
1 2 62 3
2 80 25 14
3 36 88 1
4 57 23 14
5 33 17 19
6 76 43 2
7 77 85 14
8 94 6 6
9 89 11 7
10 59 72 6
11 39 82 10
12 87 24 18
13 44 76 3
14 2 83 6
15 19 43 20
16 5 27 4
17 58 72 14
18 14 50 11
19 43 18 19
20 87 7 15
21 11 56 15
22 31 16 4
23 51 94 13
24 55 13 13
25 84 57 5
26 12 2 16
27 53 33 3
28 53 10 7
29 33 32 14
30 69 67 17
31 43 5 3
32 10 75 3
33 8 26 12
34 3 1 14
35 96 22 20
36 6 48 13
37 59 22 10
38 66 69 9
39 22 50 6
40 75 21 18
41 4 81 7
42 41 97 20
43 92 34 9
44 12 64 1
45 60 84 8
46 35 100 5
47 38 2 1
48 9 9 7
49 54 59 9
50 1 58 2
I tried to use the dist() function:
distance_matrix <-dist(servidores,method = "euclidean",diag = TRUE,upper = TRUE)
but since x and y are on different columns I am not sure what to do to get a 50x50 matrix with all the distances.
Does anybody know how I could create such a matrix?
Many thanks in advance.
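One possible approach, sketched under the assumption that the first two columns are the x and y coordinates of each node: pass only those two columns to dist() and convert the result with as.matrix(). dist() treats each row as a point, so having x and y in separate columns is what it expects. The data frame below is a hypothetical stand-in with the first three rows of the table and made-up column names.

```r
# Hypothetical stand-in for the first three rows of servidores
servidores <- data.frame(
  x      = c(2, 80, 36),
  y      = c(62, 25, 88),
  demand = c(3, 14, 1)
)

# Euclidean distances between rows, using the coordinate columns only
d <- dist(servidores[, c("x", "y")], method = "euclidean")

# Full square matrix (n x n) with zeros on the diagonal
distance_matrix <- as.matrix(d)
distance_matrix
```

With the full 50-row table the same two calls would give a 50x50 matrix. Leaving the third column in would mix the requirements into the distances, which is probably not what is wanted.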

What is the name and reason for the [1] at the output prompt?

What's the name for the [1] below.
What is its significance?
Is it always only [1]? If not, then under what conditions is it something else? (example please)
> bb <- c(5,6,7)
> bb
[1] 5 6 7

It shows the index of the first element printed on that line. In your case the whole vector fits on one line, so only [1] appears:
bb <- c(5,6,7)
bb
# [1] 5 6 7
Try,
c(1:50)
#[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
#[35] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
You can also avoid that being displayed by using cat
cat(c(1:50))
#1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
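To see the bracketed number change, a small sketch: narrowing the console width makes a longer vector wrap, and each wrapped line then starts with the index of its first element. The width of 40 is an arbitrary choice for the example.

```r
bb <- c(5, 6, 7)
print(bb)                    # fits on one line, so only [1] is shown

old <- options(width = 40)   # narrow the console so a longer vector wraps
wrapped <- capture.output(print(101:130))
options(old)                 # restore the previous width

# Each line begins with the index of the first element on that line
writeLines(wrapped)
```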

split a list and increment for loop by 10

How do I split a list in R?
I want to split it cumulatively, in increments.
For example:
x <- 1:50
n <- 5
spt <- split(x,cut(x,quantile(x,(0:n)/n), include.lowest=TRUE, labels=FALSE))
we get
$`1`
[1] 1 2 3 4 5 6 7 8 9 10
$`2`
[1] 11 12 13 14 15 16 17 18 19 20
$`3`
[1] 21 22 23 24 25 26 27 28 29 30
$`4`
[1] 31 32 33 34 35 36 37 38 39 40
$`5`
[1] 41 42 43 44 45 46 47 48 49 50
I don't want this output. I want the output like below,
$`1`
[1] 1 2 3 4 5 6 7 8 9 10
$`2`
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
$`3`
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
$`4`
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
$`5`
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
Any ideas?
I would also like to know how to increment a for loop by 10 in R.
Thanks.

We can use seq
lapply(seq(10,50, by=10), function(i) x[1:i])
Or as @RichardScriven mentioned in the comments, the seq(10,50, by=10) can be replaced by 1:5 * 10L
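For the second part of the question, a for loop can step by 10 by iterating over seq(10, 50, by = 10). The sketch below builds the same cumulative list as the lapply call; the names ends and spt are only illustrative.

```r
x <- 1:50
ends <- seq(10, 50, by = 10)          # 10, 20, 30, 40, 50

# Pre-allocate the result, then fill it in a loop that steps by 10
spt <- vector("list", length(ends))
for (k in seq_along(ends)) {
  spt[[k]] <- x[1:ends[k]]            # elements 1 up to the current end point
}
names(spt) <- seq_along(ends)         # name the pieces "1" to "5"

# If only the stepped counter is needed:
for (i in seq(10, 50, by = 10)) {
  cat("i =", i, "\n")
}
```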

How can I change the column names of a chunk of columns?

I have the following example table and would like to change the column names of e,f,g,h,i. In this example, let's say I just want to paste a "2" onto the end (so e2, f2, etc.) Is there a way to do this simply without a for loop?
m <- matrix(seq_len(12*5), nrow=5, ncol=12)
m <- data.frame(m)
names(m) <- letters[1:12]
m
a b c d e f g h i j k l
1 1 6 11 16 21 26 31 36 41 46 51 56
2 2 7 12 17 22 27 32 37 42 47 52 57
3 3 8 13 18 23 28 33 38 43 48 53 58
4 4 9 14 19 24 29 34 39 44 49 54 59
5 5 10 15 20 25 30 35 40 45 50 55 60
After diligent searching, and trial/error I have not found the answer.

Both sprintf and paste0 will work. If the two who posted good answers in the comments wish to post answers, I'll remove this since they should get the credit.
Here's a paste0 answer.
> names(m)[5:9] <- paste0(names(m[5:9]), 2)
> m
a b c d e2 f2 g2 h2 i2 j k l
1 1 6 11 16 21 26 31 36 41 46 51 56
2 2 7 12 17 22 27 32 37 42 47 52 57
3 3 8 13 18 23 28 33 38 43 48 53 58
4 4 9 14 19 24 29 34 39 44 49 54 59
5 5 10 15 20 25 30 35 40 45 50 55 60
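For completeness, a sketch of the sprintf variant mentioned above. It also looks the columns up by name with match() rather than by position, which is an assumption about what is convenient here; the names cols and idx are made up for the example.

```r
m <- data.frame(matrix(seq_len(12 * 5), nrow = 5, ncol = 12))
names(m) <- letters[1:12]

# Columns to rename, located by name instead of by position
cols <- c("e", "f", "g", "h", "i")
idx <- match(cols, names(m))

# "%s2" appends a literal 2 to each selected name
names(m)[idx] <- sprintf("%s2", names(m)[idx])
names(m)
```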
