J: Get first N columns of a Matrix

I know that, given a matrix M of size NxN, I can get the first m rows using (i.m){M. I would like to know how to get the first n columns of M.
I assume that, having something like
rows =: (i.m){M
which gives a matrix of size mxN, the same approach would be taken to get the first n columns of this new matrix.
Edit:
I am trying to use code like this:
(i.n)"1{(i.m){M
However, it is not working: it only returns the first element of the n columns in the first row of M, and I need to get all n columns.

You already have several answers from Dan. This one just explains why you might prefer take ({.) to from ({). If n is greater than the number of columns in M, take pads the result with fill (0 for a numeric array), whereas from produces an index error.
$M
10 10
(i. 3){"1 M
0 1 2
10 11 12
20 21 22
30 31 32
40 41 42
50 51 52
60 61 62
70 71 72
80 81 82
90 91 92
3{."1 M
0 1 2
10 11 12
20 21 22
30 31 32
40 41 42
50 51 52
60 61 62
70 71 72
80 81 82
90 91 92
(i. 12){"1 M
|index error
| (i.12) {"1 M
12{."1 M
0 1 2 3 4 5 6 7 8 9 0 0
10 11 12 13 14 15 16 17 18 19 0 0
20 21 22 23 24 25 26 27 28 29 0 0
30 31 32 33 34 35 36 37 38 39 0 0
40 41 42 43 44 45 46 47 48 49 0 0
50 51 52 53 54 55 56 57 58 59 0 0
60 61 62 63 64 65 66 67 68 69 0 0
70 71 72 73 74 75 76 77 78 79 0 0
80 81 82 83 84 85 86 87 88 89 0 0
90 91 92 93 94 95 96 97 98 99 0 0

Related

Loop over a sequence and rounding problem in R

I want to assign values to a vector like this:
a = rep(0, 101)
for(i in seq(0, 1, 0.01)){
u <- 100 * i + 1
a[u] <- u
}
a
plot(a)
The output is
> a
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 30 0 31 32 33 34
[35] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 59 0 60 61 62 63 64 65 66 67 68
[69] 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101
There are problems around the 29th and 59th elements: where 29 and 59 should appear, the vector contains 0, the default value, and the values just before them are also incorrect. Why is this happening? Thank you!
There is a problem with your indexing. I don't know how to explain why it doesn't work as written, but here is a modification to your code that works:
a = rep(0, 101)
s<-seq(0, 1, 0.01)
for(i in 1:101){
a[i] <- 100 * s[i] + 1
}
a
plot(a)
In general, it is best to avoid multiple indexes in the same loop, as this can be confusing and makes problems difficult to diagnose.
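The underlying cause is floating-point rounding combined with how R treats fractional indices: R truncates a non-integer index toward zero, and 100 * 0.29 + 1 evaluates to slightly less than 30, so that value lands in position 29 and position 30 is never written. A minimal sketch showing this, plus one possible fix (rounding the computed index; this fix is a suggestion, not taken from the answer above):

```r
# One of the loop values is (approximately) 0.29; scaling it back up does not give exactly 30
u <- 100 * 0.29 + 1
print(u < 30)                 # TRUE
print(sprintf("%.17g", u))    # "29.999999999999996"

# R truncates fractional indices toward zero, so this writes to position 29
a <- rep(0, 101)
a[u] <- 1
print(which(a == 1))          # 29

# Fix: round the computed index to the nearest integer before using it
a <- rep(0, 101)
for (i in seq(0, 1, 0.01)) {
  u <- round(100 * i + 1)
  a[u] <- u
}
stopifnot(identical(a, as.numeric(1:101)))
```

Looping over integers and deriving the fraction (as in the answer, or with i <- k / 100) avoids the problem in the same way: the index itself is always an exact integer.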

Take a sample from a diminishing population

I would like to take a random sample of rows from a data.frame, apply a function to the subset, then take a sample from the remaining rows, apply the function to the new subset (with different parameters), and so on.
A simple example: if 5% of a population dies each month, then in month 2 I need the population minus those who died in month 1.
I have put together a very verbose method in which I save the IDs from the sampled rows, then subset them out of the data for the second period, etc.
library(data.table)
dt <- data.table(Number=1:100, ID=paste0("A", 1:100))
first<-dt[sample(nrow(dt), nrow(dt)*.05)]$ID
mean(dt[ID %in% first]$Number)
second<-dt[!(ID %in% first)][sample(nrow(dt[!(ID %in% first)]),
nrow(dt[!(ID %in% first)])*.05)]$ID
mean(dt[ID %in% c(first,second)]$Number)
dt[!(ID %in% first)][!(ID %in% second)] #...
Obviously, this is not sustainable past a couple of periods. What is a better way to do this? I imagine this is a standard method but couldn't think what to look for specifically. Thanks for any and all input.
This shows how to "grow" a vector of removed items when 5% of the remainder is sampled in each interval:
removed <- numeric(0)
for ( i in 1:10){
removed <- c(removed, sample( (1:100)[!(1:100) %in% removed], # items out so far
(100-length(removed))*.05)) # 5% of remainder
cat(c(removed, "\n")) # print to console with each iteration.
}
54 1 76 96 93
54 1 76 96 93 81 16 13 79
54 1 76 96 93 81 16 13 79 80 74 30 29
54 1 76 96 93 81 16 13 79 80 74 30 29 52 33 86 19
54 1 76 96 93 81 16 13 79 80 74 30 29 52 33 86 19 34 32 41 62
54 1 76 96 93 81 16 13 79 80 74 30 29 52 33 86 19 34 32 41 62 5 70 8
54 1 76 96 93 81 16 13 79 80 74 30 29 52 33 86 19 34 32 41 62 5 70 8 66 82 50
54 1 76 96 93 81 16 13 79 80 74 30 29 52 33 86 19 34 32 41 62 5 70 8 66 82 50 6 91 99
54 1 76 96 93 81 16 13 79 80 74 30 29 52 33 86 19 34 32 41 62 5 70 8 66 82 50 6 91 99 46 27 51
54 1 76 96 93 81 16 13 79 80 74 30 29 52 33 86 19 34 32 41 62 5 70 8 66 82 50 6 91 99 46 27 51 22 23 20
Notice that the number of items added to the list of "removals" decreases over time, because each draw is 5% of a shrinking remainder.
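The same idea can be applied to the question's data by recording, for each row, the period in which it was sampled, so later periods only draw from rows that are still unsampled. A minimal base R sketch; the `period` column and the use of `mean()` as the per-period function are illustrative assumptions (the question uses data.table, but plain data.frame indexing is enough here):

```r
set.seed(1)
pop <- data.frame(Number = 1:100, ID = paste0("A", 1:100))
pop$period <- NA_integer_   # hypothetical column: period in which the row was sampled

for (p in 1:10) {
  alive  <- which(is.na(pop$period))     # rows not sampled in earlier periods
  k      <- floor(length(alive) * 0.05)  # 5% of the remainder, rounded down
  # sample.int() avoids sample()'s surprise when only one row is left
  picked <- alive[sample.int(length(alive), k)]
  pop$period[picked] <- p
}

table(pop$period)                                   # rows sampled per period
aggregate(Number ~ period, data = pop, FUN = mean)  # apply a function per period
```

Because k is 5% of a shrinking remainder, the per-period counts are 5, 4, 4, 4, 4, 3, 3, 3, 3, 3, the same sizes as in the answer's output. A different function or parameter per period can be applied inside the loop to pop[picked, ].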

Generate sequence with alternating increments in R? [duplicate]

This question already has answers here: Get a seq() in R with alternating steps
I want to use R to create the sequence of numbers 1:8, 11:18, 21:28, etc. through 1000 (or the closest it can get, i.e. 998). Obviously typing that all out would be tedious, but since the sequence increases by one 7 times and then jumps by 3 I'm not sure what function I could use to achieve this.
I tried seq(1, 998, c(1,1,1,1,1,1,1,3)) but it does not give me the results I am looking for so I must be doing something wrong.
This is a perfect case for vectorisation (and recycling) in R; read about them:
(1:100)[rep(c(TRUE,FALSE), c(8,2))]
# [1] 1 2 3 4 5 6 7 8 11 12 13 14 15 16 17 18 21 22 23 24 25 26 27 28 31 32
#[27] 33 34 35 36 37 38 41 42 43 44 45 46 47 48 51 52 53 54 55 56 57 58 61 62 63 64
#[53] 65 66 67 68 71 72 73 74 75 76 77 78 81 82 83 84 85 86 87 88 91 92 93 94 95 96
#[79] 97 98
rep(seq(0,990,by=10), each=8) + seq(1,8)
You want to exclude numbers that are 0 or 9 (mod 10). So you can try this too:
n <- 1000 # upper bound
x <- 1:n
x <- x[! (x %% 10) %in% c(0,9)] # filter out (0, 9) mod (10)
head(x,80)
# [1] 1 2 3 4 5 6 7 8 11 12 13 14 15 16 17 18 21 22 23 24 25 26 27
# 28 31 32 33 34 35 36 37 38 41 42 43 44 45 46 47 48 51 52 53 54 55 56 57
# 58 61 62 63 64 65 66 67 68 71 72 73 74 75 76 77 78 81 82 83 84 85
# 86 87 88 91 92 93 94 95 96 97 98
Or in a single line using Filter:
Filter(function(x) !((x %% 10) %in% c(0,9)), 1:100)
# [1] 1 2 3 4 5 6 7 8 11 12 13 14 15 16 17 18 21 22 23 24 25 26 27 28 31 32 33 34 35 36 37 38 41 42 43 44 45 46 47 48 51 52 53 54 55 56 57
# [48] 58 61 62 63 64 65 66 67 68 71 72 73 74 75 76 77 78 81 82 83 84 85 86 87 88 91 92 93 94 95 96 97 98
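With a loop (initialise the result first; otherwise vector refers to the base R function of that name):
v <- integer(0)
for (value in seq(1, 991, 10)) { v <- c(v, seq(value, value + 7)) }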
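Run up to 1000, these approaches should all give the same 800 numbers. As a cross-check, a minimal sketch comparing a closed-form index formula (term k, counting from 0, is 10 * (k %/% 8) + k %% 8 + 1; this formula is an addition, not from the answers) with outer() and the modulo filter:

```r
n_blocks <- 100                      # blocks 1:8, 11:18, ..., 991:998

# Closed form: term k (counting from 0) is 10 * (k %/% 8) + k %% 8 + 1
k <- 0:(8 * n_blocks - 1)
closed <- 10 * (k %/% 8) + k %% 8 + 1

# outer() builds an 8 x 100 matrix whose columns are the blocks;
# as.vector() reads it column by column
blocks <- as.vector(outer(1:8, seq(0, 10 * (n_blocks - 1), by = 10), "+"))

# Modulo filter from the answers above, run up to 1000
filtered <- (1:1000)[!(1:1000 %% 10) %in% c(0, 9)]

stopifnot(all(closed == blocks), all(closed == filtered))
head(closed, 10)   # 1 2 3 4 5 6 7 8 11 12
tail(closed, 3)    # 996 997 998
```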

Subsetting Data frame or matrix based on criteria of values

Suppose I have a matrix or a data frame and I want to keep only the values that are 5 or greater, excluding the values from 86 to 90 (both inclusive).
a<-matrix(1:100,nrow = 10, ncol = 10)
rownames(a) <- LETTERS[1:10]
colnames(a) <- LETTERS[1:10]
A B C D E F G H I J
A 1 11 21 31 41 51 61 71 81 91
B 2 12 22 32 42 52 62 72 82 92
C 3 13 23 33 43 53 63 73 83 93
D 4 14 24 34 44 54 64 74 84 94
E 5 15 25 35 45 55 65 75 85 95
F 6 16 26 36 46 56 66 76 86 96
G 7 17 27 37 47 57 67 77 87 97
H 8 18 28 38 48 58 68 78 88 98
I 9 19 29 39 49 59 69 79 89 99
J 10 20 30 40 50 60 70 80 90 100
Note: you can convert it into a data frame if this kind of operation is only possible with data frames.
I want the result to keep only those values, with everything else deleted and replaced with a blank.
My desired output is:
   A  B  C  D  E  F  G  H  I   J
A    11 21 31 41 51 61 71 81  91
B    12 22 32 42 52 62 72 82  92
C    13 23 33 43 53 63 73 83  93
D    14 24 34 44 54 64 74 84  94
E  5 15 25 35 45 55 65 75 85  95
F  6 16 26 36 46 56 66 76     96
G  7 17 27 37 47 57 67 77     97
H  8 18 28 38 48 58 68 78     98
I  9 19 29 39 49 59 69 79     99
J 10 20 30 40 50 60 70 80    100
Is there a function in R that can take my condition and produce the desired result? I want to be able to change the condition depending on the problem. I searched Stack Overflow but didn't find anything like this. I don't want to format based on rows or columns.
I tried
a[a> 5 & a!=c(85:90)]
but this gives me the values and loses the structure.
Assuming that 'a' is a matrix, we can set the elements of 'a' that are %in% 86:90 or (|) less than 5 (a < 5) to NA. I am not assigning '' here because that would change the class from numeric to character; NA is also more useful for later processing.
a[a %in% 86:90 | a<5] <- NA
However, if we need it to be ''
a[a %in% 86:90 | a<5] <- ""
If we are using a data.frame
a1 <- as.data.frame(a)
a1[] <- lapply(a1, function(x) replace(x, x %in% 86:90| x <5, ""))
a1
#   A  B  C  D  E  F  G  H  I   J
#A    11 21 31 41 51 61 71 81  91
#B    12 22 32 42 52 62 72 82  92
#C    13 23 33 43 53 63 73 83  93
#D    14 24 34 44 54 64 74 84  94
#E  5 15 25 35 45 55 65 75 85  95
#F  6 16 26 36 46 56 66 76     96
#G  7 17 27 37 47 57 67 77     97
#H  8 18 28 38 48 58 68 78     98
#I  9 19 29 39 49 59 69 79     99
#J 10 20 30 40 50 60 70 80    100
NOTE: This changes the class of each column to character
In the OP's code, a!=c(85:90) will not work as intended in general: 85:90 is recycled to the length of 'a', and each element of 'a' is compared with the corresponding element of the recycled vector rather than with the whole set. To test membership in a vector of length > 1, use %in% instead.
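A small sketch on the question's matrix shows why %in% is the safer tool. It uses c(86, 87, 88, 89) rather than 85:90 on purpose: with a = 1:100 and 85:90 the recycled comparison happens to line up with the right elements (because 84 is a multiple of 6), which hides the problem. The keep condition (at least 5, excluding 86:90) is read off the desired output and is an assumption:

```r
a <- matrix(1:100, nrow = 10, ncol = 10,
            dimnames = list(LETTERS[1:10], LETTERS[1:10]))

# == recycles the right-hand vector element-wise: a[1] vs 86, a[2] vs 87,
# a[3] vs 88, a[4] vs 89, a[5] vs 86 again, ... so 86..89 are never matched
sum(a == c(86, 87, 88, 89))    # 0
# %in% tests membership in the whole set instead
sum(a %in% c(86, 87, 88, 89))  # 4

# Masking with NA keeps the 10 x 10 layout; a[keep] would return a plain vector
keep <- a >= 5 & !(a %in% 86:90)
masked <- a
masked[!keep] <- NA
masked
```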

How to do efficient vectorized update on multiple columns using data.tables?

I have the following code using data.frames, and I'm wondering how to write this using data.tables, using the most efficient, most vectorized code?
data.frame code:
set.seed(1)
to <- cbind(data.frame(time=seq(1:5),bananas=sample(100,5),apples=sample(100,5)),setNames(data.frame(matrix(sample(100,90,replace=T),nrow=5)),paste0(1:18)))
from <- cbind(data.frame(time=seq(1:5),blah=sample(100,5),foo=sample(100,5)),setNames(data.frame(matrix(sample(100,90,replace=T),nrow=5)),paste0(1:18)))
from
to
rownames(to) <- to$time
to[as.character(from$time),paste0(1:18)] <- from[,paste0(1:18)]
to
Running this:
> set.seed(1)
> to <- cbind(data.frame(time=seq(1:5),bananas=sample(100,5),apples=sample(100,5)),setNames(data.frame(matrix(sample(100,90,replace=T),nrow=5)),paste0(1:18)))
> from <- cbind(data.frame(time=seq(1:5),blah=sample(100,5),foo=sample(100,5)),setNames(data.frame(matrix(sample(100,90,replace=T),nrow=5)),paste0(1:18)))
> from
time blah foo 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 1 66 22 98 2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2 2 35 13 74 72 50 52 8 57 61 18 56 53 90 7 85 65 20 76 39 12
3 3 27 47 36 11 49 21 4 53 24 75 33 8 45 34 86 75 89 73 11 85
4 4 97 90 44 45 18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5 5 61 58 15 65 76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79
> to
time bananas apples 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 1 27 90 21 50 94 39 49 67 83 79 48 10 92 26 34 90 44 21 24 80
2 2 37 94 18 72 22 2 60 80 65 3 87 32 30 48 84 87 72 72 6 46
3 3 57 65 69 100 66 39 50 11 79 48 44 52 46 77 35 39 40 13 65 42
4 4 89 62 39 39 13 87 19 73 56 74 25 67 34 9 34 78 33 25 88 82
5 5 20 6 77 78 27 35 83 42 53 70 8 41 66 88 48 97 76 15 78 61
>
> rownames(to) <- to$time
> to[as.character(from$time),paste0(1:18)] <- from[,paste0(1:18)]
> to
time bananas apples 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 1 27 90 98 2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2 2 37 94 74 72 50 52 8 57 61 18 56 53 90 7 85 65 20 76 39 12
3 3 57 65 36 11 49 21 4 53 24 75 33 8 45 34 86 75 89 73 11 85
4 4 89 62 44 45 18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5 5 20 6 15 65 76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79
Basically, we update columns paste0(1:18) of to from columns paste0(1:18) of from, matching up the times.
data.tables apparently have some advantages, such as not needing head when printing them at the console, so I'm thinking about using them.
However, I'd rather not write the := expressions by hand, i.e. try to avoid:
to[from,`1`:=i.`1`,`2`:=i.`2`, ..]
I'd also prefer to use vectorized syntax if possible, rather than some kind of for loop, i.e. try to avoid something like:
for( i in 1:18 ) {
to[from, sprintf("%d",i) := i.sprintf("%d",i)]
}
I read through the faq vignette, and the datatable-intro vignette, though I admit I probably haven't understood everything 100%.
I looked at Loop through columns in a data.table and transform those columns , but I can't say I understand it 100%, and it seems to say that I need to use a for loop?
There does seem to be some kind of a hint at the bottom of 8374816 that it might be possible to just use data frame syntax, adding with=FALSE? But since the data.frame procedure is hacking on the row names, I'm not sure how well / if that will work, and I wonder to what extent that makes use of the efficiencies of data.table?
Good question. The base construct you've shown:
to[as.character(from$time),paste0(1:18)] <- from[,paste0(1:18)]
works on the assumption that row names are not duplicated (if they were, only the first would be matched). Here, the LHS of <- has the same number of rows as the RHS of <-.
data.table is different since routinely, multiple rows in to may match; the default for mult is "all". data.table also prefers long format to wide. So this question is kind of putting data.table through its paces for something it wasn't really designed for. If you have any NA in those 18 columns (i.e. sparse), then a long format may be more appropriate. If all 18 columns are the same type, then a matrix may be more appropriate.
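When all 18 columns share one type, the matrix route can match on time explicitly with match() instead of relying on row names. A minimal base R sketch; to_m, from_m and the two time vectors are hypothetical stand-ins for the question's data:

```r
set.seed(1)
cols      <- paste0(1:18)
to_time   <- 1:5
from_time <- c(4L, 2L, 5L)   # hypothetical: from covers some times, in any order

to_m   <- matrix(0L, nrow = length(to_time), ncol = 18, dimnames = list(NULL, cols))
from_m <- matrix(sample(100L, length(from_time) * 18, replace = TRUE),
                 nrow = length(from_time), dimnames = list(NULL, cols))

# For each row of from_m, match() finds the row of to_m with the same time
idx <- match(from_time, to_time)
to_m[idx, cols] <- from_m[, cols]

to_m   # rows for times 1 and 3 are not in from_m, so they stay 0
```

Like the row-name version, match() uses only the first matching time, so it has the same one-to-one assumption.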
That said, here are three data.table options for completeness.
1. Using := but without a for loop (multiple LHS and multiple RHS in LHS:=RHS)
from = as.data.table(from)
to = as.data.table(to)
from
time blah foo 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1: 1 66 22 98 2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2: 2 35 13 74 72 50 52 8 57 61 18 56 53 90 7 85 65 20 76 39 12
3: 3 27 47 36 11 49 21 4 53 24 75 33 8 45 34 86 75 89 73 11 85
4: 4 97 90 44 45 18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5: 5 61 58 15 65 76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79
to
time bananas apples 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1: 1 27 90 21 50 94 39 49 67 83 79 48 10 92 26 34 90 44 21 24 80
2: 2 37 94 18 72 22 2 60 80 65 3 87 32 30 48 84 87 72 72 6 46
3: 3 57 65 69 100 66 39 50 11 79 48 44 52 46 77 35 39 40 13 65 42
4: 4 89 62 39 39 13 87 19 73 56 74 25 67 34 9 34 78 33 25 88 82
5: 5 20 6 77 78 27 35 83 42 53 70 8 41 66 88 48 97 76 15 78 61
setkey(to,time)
setkey(from,time)
to[from,paste0(1:18):=from[.GRP,paste0(1:18),with=FALSE]]
time bananas apples 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1: 1 27 90 98 2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2: 2 37 94 74 72 50 52 8 57 61 18 56 53 90 7 85 65 20 76 39 12
3: 3 57 65 36 11 49 21 4 53 24 75 33 8 45 34 86 75 89 73 11 85
4: 4 89 62 44 45 18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5: 5 20 6 15 65 76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79
or
to[from,paste0(1:18):=from[,paste0(1:18),with=FALSE],mult="first"]
time bananas apples 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1: 1 27 90 98 2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2: 2 37 94 74 72 50 52 8 57 61 18 56 53 90 7 85 65 20 76 39 12
3: 3 57 65 36 11 49 21 4 53 24 75 33 8 45 34 86 75 89 73 11 85
4: 4 89 62 44 45 18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5: 5 20 6 15 65 76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79
Note I'm using v1.8.3 (the latest release at the time of writing), which is needed for option 1 to work (.GRP has just been added, and the outer with=FALSE is no longer needed).
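Option 1 depends on behaviour of that era of data.table (j evaluated once per row of i without an explicit by). Later releases dropped that implicit by-without-by and gave joins an on= argument, and a commonly used equivalent is (cols) := mget(paste0("i.", cols)), where the i. prefix refers to columns of the joined table. A sketch, assuming a reasonably recent data.table and small stand-in tables built here rather than the question's exact data:

```r
library(data.table)

set.seed(1)
cols <- paste0(1:18)   # the columns to copy, named "1".."18" as in the question

to   <- data.table(time = 1:5, bananas = sample(100, 5))
from <- data.table(time = 5:1, blah = sample(100, 5))   # rows in a different order
to[,   (cols) := lapply(cols, function(x) sample(100, 5, replace = TRUE))]
from[, (cols) := lapply(cols, function(x) sample(100, 5, replace = TRUE))]

bananas_before <- to$bananas
expected <- as.matrix(from[order(time), cols, with = FALSE])

# Join on time and update all 18 columns of `to` by reference in one call
to[from, on = "time", (cols) := mget(paste0("i.", cols))]

stopifnot(all(as.matrix(to[, cols, with = FALSE]) == expected),
          identical(to$bananas, bananas_before))   # other columns are untouched
```

Because from is deliberately in reverse time order, the check confirms that rows are matched by time rather than by position.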
2. Use one list column to store the length 18 vectors, rather than 18 columns
to = data.table( time=seq(1:5),
bananas=sample(100,5),
apples=sample(100,5),
v18=replicate(5,sample(100,18),simplify=FALSE))
from = data.table( time=seq(1:5),
blah=sample(100,5),
foo=sample(100,5),
v18=replicate(5,sample(100,18),simplify=FALSE))
setkey(to,time)
setkey(from,time)
from
time blah foo v18
1: 1 56 97 88,47,1,71,69,18,
2: 2 69 40 96,99,60,3,33,27,
3: 3 65 84 100,38,56,72,84,55,
4: 4 98 74 91,69,24,63,27,100,
5: 5 46 52 65,4,59,41,8,51,
to
time bananas apples v18
1: 1 66 73 100,36,74,77,68,46,
2: 2 19 37 84,88,92,8,37,52,
3: 3 94 77 37,94,13,7,93,43,
4: 4 88 2 27,93,71,16,46,66,
5: 5 91 91 85,94,58,49,19,1,
to[from,v18:=i.v18]
to
time bananas apples v18
1: 1 66 73 88,47,1,71,69,18,
2: 2 19 37 96,99,60,3,33,27,
3: 3 94 77 100,38,56,72,84,55,
4: 4 88 2 91,69,24,63,27,100,
5: 5 91 91 65,4,59,41,8,51,
If you are not used to list column printing, the trailing comma signifies that more items are in that vector. Just the first 6 are printed.
3. Use data.frame syntax on the data.table
to = as.data.table(to)
from = as.data.table(from)
setkey(to,time)
setkey(from,time)
from
time blah foo 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1: 1 66 22 98 2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2: 2 35 13 74 72 50 52 8 57 61 18 56 53 90 7 85 65 20 76 39 12
3: 3 27 47 36 11 49 21 4 53 24 75 33 8 45 34 86 75 89 73 11 85
4: 4 97 90 44 45 18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5: 5 61 58 15 65 76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79
to
time bananas apples 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1: 1 27 90 21 50 94 39 49 67 83 79 48 10 92 26 34 90 44 21 24 80
2: 2 37 94 18 72 22 2 60 80 65 3 87 32 30 48 84 87 72 72 6 46
3: 3 57 65 69 100 66 39 50 11 79 48 44 52 46 77 35 39 40 13 65 42
4: 4 89 62 39 39 13 87 19 73 56 74 25 67 34 9 34 78 33 25 88 82
5: 5 20 6 77 78 27 35 83 42 53 70 8 41 66 88 48 97 76 15 78 61
to[from, paste0(1:18)] <- from[,paste0(1:18),with=FALSE]
to
time bananas apples 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1: 1 27 90 98 2 100 46 58 60 69 46 62 19 29 42 64 90 30 19 72 60
2: 2 37 94 74 72 50 52 8 57 61 18 56 53 90 7 85 65 20 76 39 12
3: 3 57 65 36 11 49 21 4 53 24 75 33 8 45 34 86 75 89 73 11 85
4: 4 89 62 44 45 18 23 65 99 26 11 46 28 78 73 40 61 51 95 93 32
5: 5 20 6 15 65 76 60 93 51 73 87 51 22 89 34 39 91 88 55 29 79
So the LHS of <- can use data.table keyed join syntax, i.e. to[from]. It's just that this method (currently, in R) copies the entire to dataset; := was introduced to avoid that copy by providing update by reference. Also, if a row in from matches multiple rows in to, then the RHS of <- would need to be expanded (by you, the user) to line up; otherwise the RHS would be recycled to fill the LHS. That's one reason why, in data.table, we like := being inside j, all inside [...].
