subsetting between two data frames


I want to subset everything from df1 except the columns that are also in df2.
df1:
A B C D E F G H I J
80 16 55 74 89 39 4 67 36 87
69 49 91 83 50 1 77 19 73 43
85 45 97 9 47 65 79 81 86 66
37 58 17 38 76 14 54 78 62 98
12 25 56 20 31 82 34 23 33 11
df2:
C D E F
55 74 89 39
91 83 50 1
97 9 47 65
17 38 76 14
56 20 31 82
I would like to utilise this kind of approach if possible:
mydata<-df1[,!colnames(df2)]

If you want the columns that are in df1 but not in df2, this can be done as follows:
not_in_df2 <- setdiff(colnames(df1), colnames(df2))
subSet_df1 <- df1[, not_in_df2, drop = FALSE]  # drop = FALSE keeps a data frame even if only one column remains
Or you could define not_in_df2 as a logical vector:
not_in_df2 <- !(colnames(df1) %in% colnames(df2))
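A small self-contained check of both approaches. The toy df1/df2 below are made up (only their column names match the question). The asker's df1[, !colnames(df2)] cannot work as written, because ! is not defined for a character vector; wrapping the names in %in% produces the logical vector it needs.

```r
# Toy df1 with columns A..J and df2 with columns C..F (values are illustrative only)
df1 <- as.data.frame(matrix(1:50, nrow = 5, dimnames = list(NULL, LETTERS[1:10])))
df2 <- df1[, c("C", "D", "E", "F")]

# Logical version: TRUE for columns of df1 whose names do not appear in df2
keep <- !(names(df1) %in% names(df2))
subSet_df1 <- df1[, keep, drop = FALSE]

# setdiff() version: the names of the columns to keep
subSet_df1b <- df1[, setdiff(names(df1), names(df2)), drop = FALSE]

names(subSet_df1)
```

Both versions select the same columns; the logical form is the closest working variant of the asker's preferred syntax.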


Create new variables by dividing all pre-existing variables by all other variables

I would like to create new variables by dividing all pre-existing variables by each other
e.g.
X1/X1, X1/X2, X1/X3, X1/X4, X1/X5, X1/X6, X1/X7, X1/X8, X1/X9, X1/X10,
X2/X1, X2/X2, X2/X3, X2/X4, X2/X5, X2/X6, X2/X7, X2/X8, X2/X9, X2/X10,
X3/X1, X3/X2 ...
I started by doing each ratio individually, as below, but I need to repeat this for many variables, so an automated approach (I assume a function or lapply) would be ideal.
ds$rom_3_5m <- (ds$roll_open_mean_3m/ds$roll_open_mean_5m)
ds$rom_3_10m <- (ds$roll_open_mean_3m/ds$roll_open_mean_10m)
ds$rom_3_15m <- (ds$roll_open_mean_3m/ds$roll_open_mean_15m)
ds$rom_3_30m <- (ds$roll_open_mean_3m/ds$roll_open_mean_30m)
ds$rom_3_60m <- (ds$roll_open_mean_3m/ds$roll_open_mean_60m)
ds$rom_3_120m <- (ds$roll_open_mean_3m/ds$roll_open_mean_120m)
ds$rom_3_240m <- (ds$roll_open_mean_3m/ds$roll_open_mean_240m)
ds$rom_3_480m <- (ds$roll_open_mean_3m/ds$roll_open_mean_480m)
ds$rom_3_960m <- (ds$roll_open_mean_3m/ds$roll_open_mean_960m)
ds$rom_3_1920m <- (ds$roll_open_mean_3m/ds$roll_open_mean_1920m)
ds$rom_3_3840m <- (ds$roll_open_mean_3m/ds$roll_open_mean_3840m)
ds$rom_3_7680m <- (ds$roll_open_mean_3m/ds$roll_open_mean_7680m)
ds$rom_3_15360m <- (ds$roll_open_mean_3m/ds$roll_open_mean_15360m)
ds$rom_3_30720m <- (ds$roll_open_mean_3m/ds$roll_open_mean_30720m)
ds$rom_3_61440m <- (ds$roll_open_mean_3m/ds$roll_open_mean_61440m)
ds$rom_3_122880m <- (ds$roll_open_mean_3m/ds$roll_open_mean_122880m)
ds$rom_3_245760m <- (ds$roll_open_mean_3m/ds$roll_open_mean_245760m)
ds$rom_3_491520m <- (ds$roll_open_mean_3m/ds$roll_open_mean_491520m)
#5m
ds$rom_5_3m <- (ds$roll_open_mean_5m/ds$roll_open_mean_3m)
ds$rom_5_10m <- (ds$roll_open_mean_5m/ds$roll_open_mean_10m)
ds$rom_5_15m <- (ds$roll_open_mean_5m/ds$roll_open_mean_15m)
ds$rom_5_30m <- (ds$roll_open_mean_5m/ds$roll_open_mean_30m)
ds$rom_5_60m <- (ds$roll_open_mean_5m/ds$roll_open_mean_60m)
ds$rom_5_120m <- (ds$roll_open_mean_5m/ds$roll_open_mean_120m)
ds$rom_5_240m <- (ds$roll_open_mean_5m/ds$roll_open_mean_240m)
ds$rom_5_480m <- (ds$roll_open_mean_5m/ds$roll_open_mean_480m)
ds$rom_5_960m <- (ds$roll_open_mean_5m/ds$roll_open_mean_960m)
ds$rom_5_1920m <- (ds$roll_open_mean_5m/ds$roll_open_mean_1920m)
ds$rom_5_3840m <- (ds$roll_open_mean_5m/ds$roll_open_mean_3840m)
ds$rom_5_7680m <- (ds$roll_open_mean_5m/ds$roll_open_mean_7680m)
ds$rom_5_15360m <- (ds$roll_open_mean_5m/ds$roll_open_mean_15360m)
ds$rom_5_30720m <- (ds$roll_open_mean_5m/ds$roll_open_mean_30720m)
ds$rom_5_61440m <- (ds$roll_open_mean_5m/ds$roll_open_mean_61440m)
ds$rom_5_122880m <- (ds$roll_open_mean_5m/ds$roll_open_mean_122880m)
ds$rom_5_245760m <- (ds$roll_open_mean_5m/ds$roll_open_mean_245760m)
ds$rom_5_491520m <- (ds$roll_open_mean_5m/ds$roll_open_mean_491520m)
#10m
ds$rom_10_3m <- (ds$roll_open_mean_10m/ds$roll_open_mean_3m)
ds$rom_10_5m <- (ds$roll_open_mean_10m/ds$roll_open_mean_5m)
ds$rom_10_15m <- (ds$roll_open_mean_10m/ds$roll_open_mean_15m)
I have a data frame with 40+ variables and 6 million rows; a smaller example data frame is below.
Thanks in advance!
Charlie
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 57 77 48 8 31 43 47 13 26 88
2 25 75 86 77 4 65 5 49 31 57
3 91 90 42 69 82 33 56 99 47 39
4 35 96 86 77 67 77 20 17 77 92
5 6 100 50 62 16 31 0 39 72 4
6 90 34 74 89 71 37 73 45 24 28
7 24 22 92 13 57 97 32 2 12 80
8 74 59 49 2 97 100 15 37 15 67
9 43 38 66 97 8 20 85 25 97 67
10 82 4 56 40 42 46 44 98 98 76
11 60 68 92 99 81 92 78 59 23 81
12 22 57 37 100 7 1 89 41 40 56
13 69 13 1 82 89 45 83 24 71 29
14 8 14 66 48 94 8 20 3 28 63
15 26 70 56 62 9 34 11 86 71 64
16 7 55 15 100 91 89 46 74 98 14
17 29 68 19 66 83 29 84 76 90 45
18 27 76 6 48 17 28 8 7 52 37
19 68 58 51 75 60 57 74 46 98 93
20 15 15 89 55 23 3 3 8 32 37
21 78 49 57 48 96 89 4 95 67 58
22 12 36 42 59 27 92 48 0 92 28
23 51 17 77 61 84 53 46 22 27 36
24 40 84 83 35 19 13 80 78 96 87
25 44 80 25 72 43 17 74 70 52 36
26 14 61 63 82 16 47 32 93 19 84
27 93 19 28 62 74 1 85 65 50 9
28 80 62 6 58 48 97 97 18 65 43
29 12 58 95 79 37 89 89 83 22 85
30 57 73 22 88 99 63 58 87 90 66

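As @27 ϕ 9 suggested in the comments, you should use the lapply solution. It also produces a single data frame with sensible column names (X1.X1, X1.X2, ...):
l <- lapply(df, `/`, df)           # for each column x, divide x by every column of df
l <- unlist(l, recursive = FALSE)  # flatten one level; names become "X1.X1", "X1.X2", ...
data.frame(l)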
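If the ratio of a variable to itself (always 1) is not needed, or the new columns should follow a naming scheme like rom_3_5m from the question, the pairs can also be generated explicitly. A minimal sketch, assuming a hypothetical "<numerator>_<denominator>" naming convention and using a three-column, three-row excerpt of the example data:

```r
# Excerpt of the example data: first three rows of X1..X3
ds <- data.frame(X1 = c(57, 25, 91), X2 = c(77, 75, 90), X3 = c(48, 86, 42))

vars <- names(ds)
# All ordered (numerator, denominator) pairs, dropping the X/X self-ratios
pairs <- expand.grid(num = vars, den = vars, stringsAsFactors = FALSE)
pairs <- pairs[pairs$num != pairs$den, ]

# One ratio vector per pair; hypothetical naming scheme "<numerator>_<denominator>"
ratios <- Map(function(n, d) ds[[n]] / ds[[d]], pairs$num, pairs$den)
names(ratios) <- paste(pairs$num, pairs$den, sep = "_")

ds <- cbind(ds, as.data.frame(ratios))
names(ds)
```

A note on scale: with 40 variables this creates 40 × 39 = 1,560 new columns, and at 6 million rows that is roughly 75 GB of doubles, so restricting the pairs or processing them in batches may be necessary.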

What ways exist to create an array with given dimensions from a given sequence in Julia?

I'm new to Julia and I could not find any useful information on the following: I would like to create an array of given dimensions and fill it with a given sequence.
m,n = 10,10 # dimensions
i = 1:100 # sequence
I've tried to use collect, but this gives me a single-column array. I have also tried the Julia way, with a comprehension:
[? for i in 1:m, j in 1:n]
but I don't know what to insert for ?.

The easiest way is reshape(i, m,n) (potentially together with a collect if you really need an Array{Int64,2}):
julia> reshape(i,m,n)
10×10 reshape(::UnitRange{Int64}, 10, 10) with eltype Int64:
1 11 21 31 41 51 61 71 81 91
2 12 22 32 42 52 62 72 82 92
3 13 23 33 43 53 63 73 83 93
4 14 24 34 44 54 64 74 84 94
5 15 25 35 45 55 65 75 85 95
6 16 26 36 46 56 66 76 86 96
7 17 27 37 47 57 67 77 87 97
8 18 28 38 48 58 68 78 88 98
9 19 29 39 49 59 69 79 89 99
10 20 30 40 50 60 70 80 90 100
julia> collect(ans)
10×10 Array{Int64,2}:
1 11 21 31 41 51 61 71 81 91
2 12 22 32 42 52 62 72 82 92
3 13 23 33 43 53 63 73 83 93
4 14 24 34 44 54 64 74 84 94
5 15 25 35 45 55 65 75 85 95
6 16 26 36 46 56 66 76 86 96
7 17 27 37 47 57 67 77 87 97
8 18 28 38 48 58 68 78 88 98
9 19 29 39 49 59 69 79 89 99
10 20 30 40 50 60 70 80 90 100
To answer your question about what to put in place of ? in the comprehension: you must convert the Cartesian index to a linear index, for example like so:
julia> [i[LinearIndices((m,n))[p,q]] for p in 1:m, q in 1:n]
10×10 Array{Int64,2}:
1 11 21 31 41 51 61 71 81 91
2 12 22 32 42 52 62 72 82 92
3 13 23 33 43 53 63 73 83 93
4 14 24 34 44 54 64 74 84 94
5 15 25 35 45 55 65 75 85 95
6 16 26 36 46 56 66 76 86 96
7 17 27 37 47 57 67 77 87 97
8 18 28 38 48 58 68 78 88 98
9 19 29 39 49 59 69 79 89 99
10 20 30 40 50 60 70 80 90 100
Of course, you can also calculate the linear index yourself, [i[(q-1)*m + p] for p in 1:m, q in 1:n].
Alternatively, you can preallocate the array and fill it in a linear fashion:
julia> result = Matrix{Int64}(undef, m,n);
julia> result[:] .= i;
julia> result
10×10 Array{Int64,2}:
1 11 21 31 41 51 61 71 81 91
2 12 22 32 42 52 62 72 82 92
3 13 23 33 43 53 63 73 83 93
4 14 24 34 44 54 64 74 84 94
5 15 25 35 45 55 65 75 85 95
6 16 26 36 46 56 66 76 86 96
7 17 27 37 47 57 67 77 87 97
8 18 28 38 48 58 68 78 88 98
9 19 29 39 49 59 69 79 89 99
10 20 30 40 50 60 70 80 90 100
which is essentially equivalent to the naive, explicit loop:
julia> result = Matrix{Int64}(undef, m,n);
julia> for k in eachindex(i) result[k] = i[k] end
julia> result
10×10 Array{Int64,2}:
1 11 21 31 41 51 61 71 81 91
2 12 22 32 42 52 62 72 82 92
3 13 23 33 43 53 63 73 83 93
4 14 24 34 44 54 64 74 84 94
5 15 25 35 45 55 65 75 85 95
6 16 26 36 46 56 66 76 86 96
7 17 27 37 47 57 67 77 87 97
8 18 28 38 48 58 68 78 88 98
9 19 29 39 49 59 69 79 89 99
10 20 30 40 50 60 70 80 90 100

How to cut values at a regular interval and put them into separate groups? [duplicate]

This question already has answers here:
Split a vector into chunks
(22 answers)
Closed 3 years ago.
How can I cut the values 1 to 100 at a regular interval (25) and place them into 4 groups, as below?
sdr <- c(1:100)
Group1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Group2: 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
Group3: 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
Group4: 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
Any suggestions, please?

You could use split
sdr <- 1:100
split(sdr, rep(1:4, each = 25))
#$`1`
# [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#
#$`2`
# [1] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
#
#$`3`
# [1] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
#
#$`4`
# [1] 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94
#[20] 95 96 97 98 99 100
This returns a list with 4 vector elements.
Also note that the c() around 1:100 is not necessary.
Or we can define the number of groups instead (this assumes that length(sdr) is a multiple of ngroup):
ngroup <- 4
split(sdr, rep(1:ngroup, each = length(sdr) %/% ngroup))
giving the same result.
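When the length is not a multiple of the group count, the rep() approach no longer lines up. A sketch of two variants (not from the original answer): ceiling() fixes the chunk size, while cut() with labels = FALSE fixes the number of groups by binning the positions into equal-width intervals, which gives near-equal group sizes.

```r
sdr <- 1:100

# Fixed chunk size: element i goes to chunk ceiling(i / 25)
by_size <- split(sdr, ceiling(seq_along(sdr) / 25))

# Fixed number of groups: bin the positions into ngroup equal-width intervals
ngroup <- 4
by_count <- split(sdr, cut(seq_along(sdr), ngroup, labels = FALSE))

# Also works when the length is not a multiple of the number of groups
uneven <- split(1:10, cut(seq_along(1:10), 3, labels = FALSE))
lengths(uneven)
```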

You can make a data frame for your groups and then transpose it using t(), which returns a 4 x 25 matrix with the rows Group1 to Group4:
df <- t(data.frame(Group1 = 1:25, Group2 = 26:50, Group3 = 51:75, Group4 = 76:100))
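The same 4 x 25 layout can be built without typing each range by filling a matrix row by row. A minimal sketch; the object name groups is illustrative:

```r
sdr <- 1:100
# byrow = TRUE fills each row with one consecutive block of 25 values
groups <- matrix(sdr, nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("Group", 1:4), NULL))
groups["Group2", ]
```

Changing nrow changes the number of groups, as long as length(sdr) is a multiple of it.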

Generate sequence with alternating increments in R? [duplicate]

This question already has answers here:
Get a seq() in R with alternating steps
(6 answers)
Closed 6 years ago.
I want to use R to create the sequence of numbers 1:8, 11:18, 21:28, etc. through 1000 (or the closest it can get, i.e. 998). Obviously typing that all out would be tedious, but since the sequence increases by one 7 times and then jumps by 3, I'm not sure what function I could use to achieve this.
I tried seq(1, 998, c(1,1,1,1,1,1,1,3)) but it does not give me the results I am looking for so I must be doing something wrong.

This is a perfect case for vectorisation (and recycling) in R; it is worth reading about both. Here the logical index of length 10 is recycled along 1:100:
(1:100)[rep(c(TRUE,FALSE), c(8,2))]
# [1] 1 2 3 4 5 6 7 8 11 12 13 14 15 16 17 18 21 22 23 24 25 26 27 28 31 32
#[27] 33 34 35 36 37 38 41 42 43 44 45 46 47 48 51 52 53 54 55 56 57 58 61 62 63 64
#[53] 65 66 67 68 71 72 73 74 75 76 77 78 81 82 83 84 85 86 87 88 91 92 93 94 95 96
#[79] 97 98

Alternatively, add the offsets 1 to 8 to every multiple of 10 from 0 to 990:
rep(seq(0, 990, by = 10), each = 8) + seq(1, 8)

You want to exclude numbers that are 0 or 9 (mod 10), so you could also try this:
n <- 1000 # upper bound
x <- 1:n
x <- x[! (x %% 10) %in% c(0,9)] # filter out (0, 9) mod (10)
head(x, 80)
# [1] 1 2 3 4 5 6 7 8 11 12 13 14 15 16 17 18 21 22 23 24 25 26 27 28 31 32
#[27] 33 34 35 36 37 38 41 42 43 44 45 46 47 48 51 52 53 54 55 56 57 58 61 62 63 64
#[53] 65 66 67 68 71 72 73 74 75 76 77 78 81 82 83 84 85 86 87 88 91 92 93 94 95 96
#[79] 97 98
Or in a single line using Filter:
Filter(function(x) !((x %% 10) %in% c(0,9)), 1:100)
# [1] 1 2 3 4 5 6 7 8 11 12 13 14 15 16 17 18 21 22 23 24 25 26 27 28 31 32 33 34 35 36 37 38 41 42 43 44 45 46 47 48 51 52 53 54 55 56 57
# [48] 58 61 62 63 64 65 66 67 68 71 72 73 74 75 76 77 78 81 82 83 84 85 86 87 88 91 92 93 94 95 96 97 98

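With a for loop (initialise the result first; vector is a base R function, so it should not be used as the accumulator):
result <- integer(0)
for (value in seq(1, 991, by = 10)) result <- c(result, seq(value, value + 7))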
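The asker's seq(1, 998, c(1,1,1,1,1,1,1,3)) fails because seq() accepts only a single step. The same idea can be expressed by recycling the step pattern and taking a cumulative sum. A sketch; the count n = 800 assumes 100 blocks of 8 numbers, which ends at 998:

```r
# Step pattern from the question: seven steps of 1, then one step of 3
steps <- c(rep(1, 7), 3)
n <- 800  # 100 blocks of 8 numbers

# Start at 1 and add the recycled steps one after another
x <- cumsum(c(1, rep_len(steps, n - 1)))
head(x, 10)
tail(x, 3)
```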

Subsetting Data frame or matrix based on criteria of values

Suppose I have a matrix or a data frame and I want to remove the values that are less than 5 or between 86 and 90 (both inclusive), keeping all the other values in place.
a<-matrix(1:100,nrow = 10, ncol = 10)
rownames(a) <- LETTERS[1:10]
colnames(a) <- LETTERS[1:10]
A B C D E F G H I J
A 1 11 21 31 41 51 61 71 81 91
B 2 12 22 32 42 52 62 72 82 92
C 3 13 23 33 43 53 63 73 83 93
D 4 14 24 34 44 54 64 74 84 94
E 5 15 25 35 45 55 65 75 85 95
F 6 16 26 36 46 56 66 76 86 96
G 7 17 27 37 47 57 67 77 87 97
H 8 18 28 38 48 58 68 78 88 98
I 9 19 29 39 49 59 69 79 89 99
J 10 20 30 40 50 60 70 80 90 100
Note: you can convert it into a data frame if this kind of operation is possible there.
Now I want the result in a format where the removed values are deleted and replaced with blank space, while all other values keep their positions.
My desired output is:
A B C D E F G H I J
A 11 21 31 41 51 61 71 81 91
B 12 22 32 42 52 62 72 82 92
C 13 23 33 43 53 63 73 83 93
D 14 24 34 44 54 64 74 84 94
E 5 15 25 35 45 55 65 75 85 95
F 6 16 26 36 46 56 66 76 96
G 7 17 27 37 47 57 67 77 97
H 8 18 28 38 48 58 68 78 98
I 9 19 29 39 49 59 69 79 99
J 10 20 30 40 50 60 70 80 100
Is there a function in R that can take my condition and produce the desired result? I want to be able to change the condition depending on the problem. I searched Stack Overflow but didn't find anything like this. I don't want to format based on specific rows or columns.
I tried
a[a> 5 & a!=c(85:90)]
but this gives me the values and loses the matrix structure.

Assuming that 'a' is a matrix, we can set the values of 'a' that are %in% 86:90, or (|) less than 5 (a < 5), to NA. I am not assigning them to '' here, because that would change the class from numeric to character; NA is also more useful for later processing.
a[a %in% 86:90 | a<5] <- NA
However, if we need it to be '':
a[a %in% 86:90 | a<5] <- ""
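If the blanks are only needed for display, one option is to keep the NA version (so the matrix stays numeric) and let print() render NA as an empty string through its na.print argument. A sketch that rebuilds the example matrix:

```r
a <- matrix(1:100, nrow = 10, ncol = 10,
            dimnames = list(LETTERS[1:10], LETTERS[1:10]))

# Blank out values below 5 and values from 86 to 90; the matrix stays numeric
a[a %in% 86:90 | a < 5] <- NA

# Show the NAs as empty strings only when printing
print(a, na.print = "")
```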
If we are using a data.frame:
a1 <- as.data.frame(a)
a1[] <- lapply(a1, function(x) replace(x, x %in% 86:90| x <5, ""))
a1
# A B C D E F G H I J
#A 11 21 31 41 51 61 71 81 91
#B 12 22 32 42 52 62 72 82 92
#C 13 23 33 43 53 63 73 83 93
#D 14 24 34 44 54 64 74 84 94
#E 5 15 25 35 45 55 65 75 85 95
#F 6 16 26 36 46 56 66 76 96
#G 7 17 27 37 47 57 67 77 97
#H 8 18 28 38 48 58 68 78 98
#I 9 19 29 39 49 59 69 79 99
#J 10 20 30 40 50 60 70 80 100
NOTE: This changes the class of each column to character
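In the OP's code, a!=c(85:90) will not work as intended: 85:90 is recycled to the length of 'a', so each element of 'a' is compared only with the element of the recycled vector at the same position (and, since 100 is not a multiple of 6, R also warns that the longer object length is not a multiple of the shorter object length). To test against a vector of length > 1, use %in% instead.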
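To illustrate the recycling issue, here is a made-up vector with the same length as 85:90 (so no length warning is raised). The != comparison pairs elements by position and misses every value that lies in the range, whereas %in% tests membership:

```r
x <- c(86, 85, 3, 90, 88, 20)

# Position-wise: 86 != 85, 85 != 86, 3 != 87, 90 != 88, 88 != 89, 20 != 90
x != 85:90

# Membership: TRUE wherever x is one of 85, 86, ..., 90
x %in% 85:90
```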
