Thanks to @akrun, I could run the code from my previous question about merging and creating tables with a loop (Merge and create tables using a loop).
However, because my laptop only has 16GB of RAM, I couldn't run the large dataset using the code. So, instead of merging 100 times, I decided to separate the process, and do it step by step using a for-loop.
I was going to create 20 lists of data using a for loop, but I couldn't find a way to make this work.
To be specific, I would run the following 20 lines of code manually without using a for loop.
list1 <- mget(paste0("", 1:5))
list2 <- mget(paste0("", 6:10))
list3 <- mget(paste0("", 11:15))
list4 <- mget(paste0("", 16:20))
list5 <- mget(paste0("", 21:25))
...
list20 <- mget(paste0("", 96:100))
How would I write a for loop in this case?
I tried to find a way to do this (for example as below), but I am getting an error.
for(i in 1:20){
list[i] <- mget(paste0("",5*i-4:5*i))
}
Thanks in advance for all your help!
There are multiple ways to create the list. Either use split with %/%
fulllst <- lapply(split(as.character(1:100), (1:100-1) %/% 5 + 1), mget)
Or keep the code from the OP's post, wrapping each endpoint of the sequence in () so that operator precedence does not change the evaluation
# create an empty list to store the output
lstout <- vector('list', 20)
# loop over the sequence and add the `()` for `(5* i- 4)` and similarly for (5*i)
for(i in 1:20)
lstout[[i]] <- mget(as.character((5 *i -4):(5*i)))
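To try the split approach without the real tables, here is a small self-contained sketch. It assumes ten toy objects named "1" to "10" stored in a hypothetical environment called env, and it passes envir explicitly so that mget looks in that environment regardless of where it is called from.

```r
# Toy stand-ins for the real tables: objects named "1" ... "10" (assumption)
env <- new.env()
for (k in 1:10) assign(as.character(k), k * 10, envir = env)

# Group the names in blocks of five: (0:9) %/% 5 + 1 gives 1,1,1,1,1,2,2,2,2,2
groups <- split(as.character(1:10), (1:10 - 1) %/% 5 + 1)

# Fetch each block of objects as a named list
chunks <- lapply(groups, mget, envir = env)

names(chunks[[2]])     # "6" "7" "8" "9" "10"
chunks[[2]][["7"]]     # 70
```

With the real data, 1:10 would become 1:100 and env would be whichever environment holds the tables.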
Use print to find the difference
> for(i in 1:20) print((5 *i -4):(5*i))
[1] 1 2 3 4 5
[1] 6 7 8 9 10
[1] 11 12 13 14 15
[1] 16 17 18 19 20
[1] 21 22 23 24 25
[1] 26 27 28 29 30
[1] 31 32 33 34 35
[1] 36 37 38 39 40
[1] 41 42 43 44 45
[1] 46 47 48 49 50
[1] 51 52 53 54 55
[1] 56 57 58 59 60
[1] 61 62 63 64 65
[1] 66 67 68 69 70
[1] 71 72 73 74 75
[1] 76 77 78 79 80
[1] 81 82 83 84 85
[1] 86 87 88 89 90
[1] 91 92 93 94 95
[1] 96 97 98 99 100
> for(i in 1:20) print(5 *i -4:5*i)
[1] 1 0
[1] 2 0
[1] 3 0
[1] 4 0
[1] 5 0
[1] 6 0
[1] 7 0
[1] 8 0
[1] 9 0
[1] 10 0
[1] 11 0
[1] 12 0
[1] 13 0
[1] 14 0
[1] 15 0
[1] 16 0
[1] 17 0
[1] 18 0
[1] 19 0
[1] 20 0
i.e., if we don't use the (), the expression is evaluated as
i <- 1
(5 * i) - (4:5 * i)
[1] 1 0
# instead of
(5 * i -4):(5 * i)
[1] 1 2 3 4 5
The operator precedence is shown in ?Syntax
:: ::: access variables in a namespace
$ @ component / slot extraction
[ [[ indexing
^ exponentiation (right to left)
- + unary minus and plus
: sequence operator
%any% |> special operators (including %% and %/%)
* / multiply, divide
+ - (binary) add, subtract
....
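A few one-liners make the table above concrete. This is only an illustrative sketch; the values are arbitrary and chosen so the results are easy to check by hand.

```r
# Unary minus binds tighter than `:`, so this is (-1):3
-1:3                   # -1 0 1 2 3

# `:` binds tighter than `*` and binary `-`
2 * 1:3                # 2 4 6
1:3 - 1                # 0 1 2

# `^` binds tighter than `:`, so this is (2^1):3
2^1:3                  # 2 3

# Parentheses force the arithmetic to happen before the sequence is built
i <- 2
(5 * i - 4):(5 * i)    # 6 7 8 9 10
```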
The loop does iterate over the words, but the variable "word" contains the word itself instead of the number (position) of that word in the row. For example, in the first row, 'yzi' has number 1, and 'runner' has number 3. Can anyone help?
Are you looking for this?
lapply(strsplit(output$text, ' '), function(x) seq_along(x)^2)
#[[1]]
# [1] 1 4 9 16 25 36 49 64 81 100 121 144 169 196
#[[2]]
# [1] 1 4 9 16 25 36 49 64 81 100 121 144 169 196
#[[3]]
# [1] 1 4 9 16 25 36 49 64 81 100 121 144 169
#[[4]]
# [1] 1 4 9 16 25 36 49 64 81 100
#...
#...
Or in a loop -
for(row in 1:nrow(output)){
list=strsplit(output$text[row], " ")[[1]]
for(i in seq_along(list)){
print(i^2)
}
}
We can also use map from purrr
library(purrr)
map(strsplit(output$text, ' '), ~ seq_along(.x)^2)
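If the goal is to see which position belongs to which word, the positions can be labelled with the words themselves. The sketch below uses a made-up two-row stand-in for output (the real data is not shown in the question), so the column contents are an assumption.

```r
# Hypothetical stand-in for the `output` data frame from the question
output <- data.frame(text = c("yzi fast runner", "slow walker"),
                     stringsAsFactors = FALSE)

# For each row, pair every word with its position in that row
positions <- lapply(strsplit(output$text, " "),
                    function(w) setNames(seq_along(w), w))

positions[[1]][["yzi"]]      # 1
positions[[1]][["runner"]]   # 3
```

Any function of the position (for example squaring it, as in the answers above) can then be applied to positions[[k]] directly.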
I have a function like this
extract = function(x)
{
a = x$2007[6:18]
b = x$2007[30:42]
c = x$2007[54:66]
}
The subsetting needs to continue in this way up to 744. I need to skip the first 6 data points, and then pull out every other block of 12 points into a new object or a list. Is there a more elegant way to do this with a for loop or apply?
Side note: if 2007 is truly a column name (you would have had to explicitly do this, R defaults to converting numbers to names starting with letters, see make.names("2007")), then x$"2007"[6:18] (etc) should work for column reference.
To generate that sequence of integers, let's try
nr <- 100
ind <- seq(6, nr, by = 12)
ind
# [1] 6 18 30 42 54 66 78 90
ind[ seq_along(ind) %% 2 == 1 ]
# [1] 6 30 54 78
ind[ seq_along(ind) %% 2 == 0 ]
# [1] 18 42 66 90
Map(seq, ind[ seq_along(ind) %% 2 == 1 ], ind[ seq_along(ind) %% 2 == 0 ])
# [[1]]
# [1] 6 7 8 9 10 11 12 13 14 15 16 17 18
# [[2]]
# [1] 30 31 32 33 34 35 36 37 38 39 40 41 42
# [[3]]
# [1] 54 55 56 57 58 59 60 61 62 63 64 65 66
# [[4]]
# [1] 78 79 80 81 82 83 84 85 86 87 88 89 90
So you can use this in your function to create a list of subsets:
nr <- nrow(x)
ind <- seq(6, nr, by = 12)
out <- lapply(Map(seq, ind[ seq_along(ind) %% 2 == 1 ], ind[ seq_along(ind) %% 2 == 0 ]),
function(i) x$"2007"[i])
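An equivalent way to build the same index blocks is to step through the start positions directly. This is a sketch under the assumption that the data has 744 points, as stated in the question; v is a hypothetical stand-in for the real column.

```r
n <- 744                             # number of data points, from the question
starts <- seq(6, n - 12, by = 24)    # 6, 30, 54, ...: every other block start

# Each block covers a start position and the 12 positions after it
idx <- lapply(starts, function(s) s:(s + 12))

idx[[1]]   # 6 7 8 9 10 11 12 13 14 15 16 17 18
idx[[2]]   # 30 31 32 33 34 35 36 37 38 39 40 41 42

# Hypothetical data vector; replace with the real column
v <- seq_len(n)
out <- lapply(idx, function(i) v[i])
```

Stopping the start positions at n - 12 keeps the last block from running past the end of the data.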
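We could also use split with cut. Note that this splits the values into consecutive blocks of 12 rather than every other block, and that cut leaves the first value (7) outside the first interval unless include.lowest = TRUE is set:
split( x[7:744] , cut(7:744,seq(7,744,12)) )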
I am having difficulties trying to order a list element-wise by decreasing order...
I have a ByPos_Mindex object or a list of 1000 IRange objects (CG_seqP) from
C <- vmatchPattern(CG, CPGi_Seq, max.mismatch = 0, with.indels = FALSE)
IRanges object with 27 ranges and 0 metadata columns:
start end width
<integer> <integer> <integer>
[1] 1 2 2
[2] 3 4 2
[3] 9 10 2
[4] 27 28 2
[5] 34 35 2
... ... ... ...
[23] 189 190 2
[24] 207 208 2
[25] 212 213 2
[26] 215 216 2
[27] 218 219 2
length(1000 of these IRanges)
I then change this to a list of only the start integers (which I want)
CG_SeqP <- sapply(C, function(x) sapply(as.vector(x), "[", 1))
[[1]]
[1] 1 3 9 27 34 47 52 56 62 66 68 70 89 110 112
[16] 136 140 146 154 160 163 178 189 207 212 215 218
(1000 of these)
The problem happens when I try to order the list elements using
CG_SeqP <- sapply(as.vector(CG_SeqP),order, decreasing = TRUE)
I get a list of what I think are row numbers, so if the first IRanges object has 27 ranges I get this...
CG_SeqP[1]
[[1]]
[1] 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8
[21] 7 6 5 4 3 2 1
So the decreasing ordering has worked, but not on my actual list elements?
Any suggestions, thanks in advance.
order returns the positions that would sort the vector, not the actual elements of your vector. To extract the elements, let us look at a toy example (I am following your idea here):
set.seed(1)
alist1 <- list(a = sample(1:100, 30))
If you print alist1 with the current seed value, you will get the results below:
> alist1
$a
[1] 99 51 67 59 23 25 69 43 17 68 10 77 55 49 29 39 93 16 44
[20] 7 96 92 80 94 34 97 66 31 5 24
To sort them, you can use either sort or order. sort returns the sorted values, whereas order returns the positions of the elements in sorted order, not the values themselves. Hence we need to index the original vector with those positions, using square brackets, to get the sorted result.
lapply(as.vector(alist1),function(x)x[order(x, decreasing = TRUE)])
I have used lapply instead of sapply just to force the result to be a list. You are free to choose either function based on your needs.
Will return:
#> lapply(as.vector(alist1),function(x)x[order(x, decreasing = TRUE)])
#$a
# [1] 99 97 96 94 93 92 80 77 69 68 67 66 59 55 51 49 44 43 39
#[20] 34 31 29 25 24 23 17 16 10 7 5
I hope this clarifies your doubt. Thanks
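The difference between the two functions is easiest to see on a tiny vector. The sketch below uses made-up values (not the IRanges data) so that every result can be checked by eye.

```r
x <- c(30, 10, 20)

# order() gives positions: element 1 is largest, then element 3, then element 2
order(x, decreasing = TRUE)        # 1 3 2

# Indexing with those positions gives the sorted values
x[order(x, decreasing = TRUE)]     # 30 20 10

# sort() returns the sorted values directly
sort(x, decreasing = TRUE)         # 30 20 10

# Applied element-wise to a list of vectors
lst <- list(a = c(30, 10, 20), b = c(1, 3, 2))
sorted <- lapply(lst, sort, decreasing = TRUE)
sorted$b                           # 3 2 1
```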
I have a long list of numbers, e.g.
set.seed(123)
y<-round(runif(100, 0, 200))
And I would like to store in column y the number of values that exceed each value in column x of a data frame:
df <- data.frame(x=seq(0,200,20))
I can compute the numbers manually, like this:
length(which(y>=20)) #93 values exceed 20
length(which(y>=40)) #81 values exceed 40
etc. I know I can use a for-loop with all values of x, but is there a more elegant way?
I tried this:
df$y <- length(which(y>=df$x))
But this gives a warning and does not give me the desired output.
The data frame should look like this:
df
x y
1 0 100
2 20 93
3 40 81
4 60 70
5 80 61
6 100 47
7 120 40
8 140 29
9 160 19
10 180 8
11 200 0
You can compare each value of df$x against all values of y using sapply
sapply(df$x, function(a) sum(y>a))
#[1] 99 93 81 70 61 47 40 29 18 6 0
#Looking at your output, maybe you want
sapply(df$x, function(a) sum(y>=a))
#[1] 100 93 81 70 61 47 40 29 19 8 0
Here's another approach using outer that allows for element wise comparison of two vectors
rowSums(outer(df$x,y, "<="))
#[1] 100 93 81 70 61 47 40 29 19 8 0
Yet one more (from alexis_laz's comment)
length(y) - findInterval(df$x, sort(y), left.open = TRUE)
# [1] 100 93 81 70 61 47 40 29 19 8 0
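To see why the original attempt with length(which(y >= df$x)) does not work, and what the approaches above do instead, here is a sketch on small made-up vectors (the names x_small and y_small are illustrative, not from the question). Comparing two vectors with >= pairs them up element by element, recycling the shorter one, whereas the goal is to compare each threshold against every value.

```r
y_small <- c(5, 15, 25, 35)   # toy values
x_small <- c(0, 20, 40)       # toy thresholds

# One count per threshold; vapply also checks that each result is a single integer
counts <- vapply(x_small, function(a) sum(y_small >= a), integer(1))
counts                        # 4 2 0

# Same result via a full threshold-by-value comparison matrix
rowSums(outer(x_small, y_small, "<="))   # 4 2 0
```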
I have the following data frame
df <- data.frame(A = c(0,100,5), B = c(0,100,10), C = c(0,100,25))
which I use with this function
for(i in c(1:3))
{seq(df [1,i], df [2,i], df [3,i])
}
I need to store the output, but so far I have only managed to print the results using
{y<-seq(df [1,i], df [2,i], df [3,i])
print(y)}
Instead I would like to store these output in a list to obtain something like
[[1]]
[1] 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
[[2]]
[1] 0 10 20 30 40 50 60 70 80 90 100
[[3]]
[1] 0 25 50 75 100
Use lapply instead of for loop
> lapply(1:3, function(i) seq(df[1,i], df[2,i], df[3,i]))
[[1]]
[1] 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
[[2]]
[1] 0 10 20 30 40 50 60 70 80 90 100
[[3]]
[1] 0 25 50 75 100
The best solution here is to use lapply, as @Jilber suggested.
In case you want to know how to append a list:
result = list()
for(i in c(1:3))
{
result[[i]]<- seq(df [1,i], df [2,i], df [3,i])
}
result
lapply is the way to go. Here is another approach that uses the fact that a data.frame is simply a list.
lapply(df, function(x) do.call(seq,as.list(x)))
# $A
# [1] 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
#
# $B
# [1] 0 10 20 30 40 50 60 70 80 90 100
#
# $C
# [1] 0 25 50 75 100
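For completeness, here is a loop version that preallocates the result list and keeps the column names. It is a sketch built on the df from the question, with the from, to and by arguments of seq named explicitly so that the role of each row is clear.

```r
df <- data.frame(A = c(0, 100, 5), B = c(0, 100, 10), C = c(0, 100, 25))

# Preallocate one slot per column and carry over the column names
result <- vector("list", ncol(df))
names(result) <- names(df)

# Row 1 holds the start, row 2 the end, row 3 the step
for (i in seq_along(df)) {
  result[[i]] <- seq(from = df[1, i], to = df[2, i], by = df[3, i])
}

result$C           # 0 25 50 75 100
lengths(result)    # A: 21, B: 11, C: 5
```

Using seq_along(df) instead of a hard-coded 1:3 means the loop keeps working if columns are added or removed.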