I have the following data frame
df <- data.frame(A = c(0,100,5), B = c(0,100,10), C = c(0,100,25))
which I use with this function
for(i in c(1:3))
{seq(df [1,i], df [2,i], df [3,i])
}
I need to store the output, but so far I have only managed to print the results using
{y<-seq(df [1,i], df [2,i], df [3,i])
print(y)}
Instead, I would like to store these outputs in a list to obtain something like
[[1]]
[1] 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
[[2]]
[1] 0 10 20 30 40 50 60 70 80 90 100
[[3]]
[1] 0 25 50 75 100
Use lapply instead of a for loop:
> lapply(1:3, function(i) seq(df[1,i], df[2,i], df[3,i]))
[[1]]
[1] 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
[[2]]
[1] 0 10 20 30 40 50 60 70 80 90 100
[[3]]
[1] 0 25 50 75 100
The best solution here is to use lapply, as @Jilber suggests.
In case you want to know how to append to a list:
result = list()
for(i in c(1:3))
{
result[[i]]<- seq(df [1,i], df [2,i], df [3,i])
}
result
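If the number of results is known in advance, the loop above could also preallocate the list rather than growing it. A minimal sketch, assuming the same df as in the question (naming the elements after the columns is an optional extra, not part of the original answer):

```r
df <- data.frame(A = c(0, 100, 5), B = c(0, 100, 10), C = c(0, 100, 25))

# Preallocate one slot per column instead of growing the list
result <- vector("list", ncol(df))
names(result) <- names(df)  # optional: label each element by its column

for (i in seq_along(df)) {
  # Row 1 = start, row 2 = end, row 3 = step
  result[[i]] <- seq(df[1, i], df[2, i], by = df[3, i])
}

result$C
# [1]   0  25  50  75 100
```

Using seq_along(df) instead of a hard-coded 1:3 keeps the loop correct if columns are added later.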
lapply is the way to go. Here is another approach that uses the fact that a data.frame is simply a list.
lapply(df, function(x) do.call(seq,as.list(x)))
# $A
# [1] 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
#
# $B
# [1] 0 10 20 30 40 50 60 70 80 90 100
#
# $C
# [1] 0 25 50 75 100
Thanks to @akrun, I got the code from my previous question about merging and creating tables with a loop to run (Merge and create tables using a loop).
However, because my laptop only has 16GB of RAM, I couldn't run the large dataset using the code. So, instead of merging 100 times, I decided to separate the process, and do it step by step using a for-loop.
I was going to create 20 lists of data using a for loop, but I couldn't find a way to make this happen.
To be specific, I would run the following 20 lines of code manually without using a for loop.
list1 <- mget(paste0("", 1:5))
list2 <- mget(paste0("", 6:10))
list3 <- mget(paste0("", 11:15))
list4 <- mget(paste0("", 16:20))
list5 <- mget(paste0("", 21:25))
...
list20 <- mget(paste0("", 96:100))
How would I write a for loop in this case?
I tried to find a way to do this (for example as below), but I am getting an error.
for(i in 1:20){
list[i] <- mget(paste0("",5*i-4:5*i))
}
Thanks in advance for all your help!
There are multiple ways to create the list. Either use split with %/%
fulllst <- lapply(split(as.character(1:100), (1:100-1) %/% 5 + 1), mget)
Or keep the code from the OP's post, but wrap both endpoints of the sequence in () so that operator precedence does not change how the expression is evaluated
# create an empty list to store the output
lstout <- vector('list', 20)
# loop over the sequence and add the `()` for `(5* i- 4)` and similarly for (5*i)
for(i in 1:20)
lstout[[i]] <- mget(as.character((5 *i -4):(5*i)))
Use print to see the difference
> for(i in 1:20) print((5 *i -4):(5*i))
[1] 1 2 3 4 5
[1] 6 7 8 9 10
[1] 11 12 13 14 15
[1] 16 17 18 19 20
[1] 21 22 23 24 25
[1] 26 27 28 29 30
[1] 31 32 33 34 35
[1] 36 37 38 39 40
[1] 41 42 43 44 45
[1] 46 47 48 49 50
[1] 51 52 53 54 55
[1] 56 57 58 59 60
[1] 61 62 63 64 65
[1] 66 67 68 69 70
[1] 71 72 73 74 75
[1] 76 77 78 79 80
[1] 81 82 83 84 85
[1] 86 87 88 89 90
[1] 91 92 93 94 95
[1] 96 97 98 99 100
> for(i in 1:20) print(5 *i -4:5*i)
[1] 1 0
[1] 2 0
[1] 3 0
[1] 4 0
[1] 5 0
[1] 6 0
[1] 7 0
[1] 8 0
[1] 9 0
[1] 10 0
[1] 11 0
[1] 12 0
[1] 13 0
[1] 14 0
[1] 15 0
[1] 16 0
[1] 17 0
[1] 18 0
[1] 19 0
[1] 20 0
i.e. if we don't use the (), the expression is evaluated as
i <- 1
(5 * i) - (4:5 * i)
[1] 1 0
# instead of
(5 * i -4):(5 * i)
[1] 1 2 3 4 5
The operator precedence is shown in ?Syntax (highest first)
:: ::: access variables in a namespace
$ @ component / slot extraction
[ [[ indexing
^ exponentiation (right to left)
- + unary minus and plus
: sequence operator
%any% |> special operators (including %% and %/%)
* / multiply, divide
+ - (binary) add, subtract
....
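To try the split approach without touching the global environment, here is a small sketch. It assumes the 100 objects are named "1" to "100" as in the question. The environment env and the dummy data frames are hypothetical stand-ins for the real data.

```r
# Hypothetical stand-ins: 100 small objects named "1", ..., "100"
env <- new.env()
ids <- as.character(1:100)
for (nm in ids) assign(nm, data.frame(id = as.integer(nm)), envir = env)

# Group index: 1 for ids 1-5, 2 for ids 6-10, ..., 20 for ids 96-100
grp <- (seq_along(ids) - 1) %/% 5 + 1

# Split the names into 20 chunks, then fetch each chunk's objects with mget
chunks <- split(ids, grp)
fulllst <- lapply(chunks, mget, envir = env)

names(fulllst[[2]])
# [1] "6"  "7"  "8"  "9"  "10"
```

Passing envir explicitly makes it clear where mget looks the names up. With the real data, that would be whichever environment holds the 100 objects.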
We are looking to create a vector with the following sequence:
1,4,5,8,9,12,13,16,17,20,21,...
Start with 1, then skip 2 numbers, then add 2 numbers, then skip 2 numbers, etc., not going above 2000. We also need the inverse sequence 2,3,6,7,10,11,...
We may use a recycled logical vector to filter the sequence
(1:21)[c(TRUE, FALSE, FALSE, TRUE)]
[1] 1 4 5 8 9 12 13 16 17 20 21
Here's an approach using rep and cumsum. Effectively, "add up alternating increments of 1 (successive #s) and 3 (skip two)."
cumsum(rep(c(1,3), 500))
and
cumsum(rep(c(3,1), 500)) - 1
I got this one myself:
head(sort(c(seq(1, 2000, 4), seq(4, 2000, 4))), 20)
We can try like below
> (v <- seq(21))[v %% 4 %in% c(0, 1)]
[1] 1 4 5 8 9 12 13 16 17 20 21
You may arrange the data in a matrix and extract 1st and 4th column.
val <- 1:100
sort(c(matrix(val, ncol = 4, byrow = TRUE)[, c(1, 4)]))
# [1] 1 4 5 8 9 12 13 16 17 20 21 24 25 28 29 32 33
#[18] 36 37 40 41 44 45 48 49 52 53 56 57 60 61 64 65 68
#[35] 69 72 73 76 77 80 81 84 85 88 89 92 93 96 97 100
A tidyverse option.
library(purrr)
library(dplyr)
map_int(1:11, ~ case_when(. == 1 ~ as.integer(1),
. %% 2 == 0 ~ as.integer(.*2),
TRUE ~ as.integer((.*2)-1)))
# [1] 1 4 5 8 9 12 13 16 17 20 21
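Most of the answers above show only the first sequence on a short range. The sketch below is one way to get both the sequence and its inverse up to the limit of 2000 from the question, using the modulo idea. The variable names are illustrative only.

```r
n <- 2000
v <- seq_len(n)

# Positions whose remainder mod 4 is 0 or 1 form 1, 4, 5, 8, 9, 12, ...
keep <- v %% 4 %in% c(0, 1)

s1 <- v[keep]    # 1, 4, 5, 8, 9, 12, ...
s2 <- v[!keep]   # the inverse: 2, 3, 6, 7, 10, 11, ...

head(s1)
# [1]  1  4  5  8  9 12
head(s2)
# [1]  2  3  6  7 10 11
```

Because both sequences come from the same logical mask, together they cover 1:2000 exactly once.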
I have a function like this
extract = function(x)
{
a = x$2007[6:18]
b = x$2007[30:42]
c = x$2007[54:66]
}
The subsetting needs to continue in this way up to 744. I need to skip the first 6 data points and then pull out every other block of 12 points into a new object or a list. Is there a more elegant way to do this with a for loop or apply?
Side note: if 2007 is truly a column name (you would have had to set this explicitly, because by default R converts such names into syntactic names that start with a letter, see make.names("2007")), then x$"2007"[6:18] (etc.) should work for the column reference.
To generate that sequence of integers, let's try
nr <- 100
ind <- seq(6, nr, by = 12)
ind
# [1] 6 18 30 42 54 66 78 90
ind[ seq_along(ind) %% 2 == 1 ]
# [1] 6 30 54 78
ind[ seq_along(ind) %% 2 == 0 ]
# [1] 18 42 66 90
Map(seq, ind[ seq_along(ind) %% 2 == 1 ], ind[ seq_along(ind) %% 2 == 0 ])
# [[1]]
# [1] 6 7 8 9 10 11 12 13 14 15 16 17 18
# [[2]]
# [1] 30 31 32 33 34 35 36 37 38 39 40 41 42
# [[3]]
# [1] 54 55 56 57 58 59 60 61 62 63 64 65 66
# [[4]]
# [1] 78 79 80 81 82 83 84 85 86 87 88 89 90
So you can use this in your function to create a list of subsets:
nr <- nrow(x)
ind <- seq(6, nr, by = 12)
out <- lapply(Map(seq, ind[ seq_along(ind) %% 2 == 1 ], ind[ seq_along(ind) %% 2 == 0 ]),
function(i) x$"2007"[i])
We could use
split( x[7:744] , cut(7:744,seq(7,744,12)) )
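Each block in the question starts 24 positions after the previous one and spans 13 values (6:18, 30:42, ...), so the start indices can also be generated directly. A rough sketch, assuming x is a data frame with 744 rows and a column literally named 2007. The toy data is made up for illustration.

```r
# Made-up stand-in for the real data: 744 rows, one column named "2007"
x <- data.frame(`2007` = seq_len(744), check.names = FALSE)

# Block starts: 6, 30, 54, ... (only blocks that fit fully inside the data)
starts <- seq(6, nrow(x) - 12, by = 24)

# One list element per block, each covering start:(start + 12)
out <- lapply(starts, function(s) x[["2007"]][s:(s + 12)])

length(out)
# [1] 31
out[[2]]
# [1] 30 31 32 33 34 35 36 37 38 39 40 41 42
```

Using x[["2007"]] avoids the quoting issue with a numeric column name mentioned in the side note.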
I have a csv file with three columns. The first column is pentad dates (73 pentads in a year) while the second and third columns are for precipitation values.
What I want to do:
[1]. Get the first pentad when the precipitation exceeds the "annual mean" in "at least three consecutive pentads".
I can subset the first column like this:
dat<-read.csv("test.csv",header=T,sep=",")
aa<-which(dat$RR>mean(dat$RR))
This gives me the following:
[1] 27 28 29 30 31 34 36 37 38 41 42 43 44 45 46 52 53 54 55 56 57
The correct output should be P27 in this case.
In the second column:
[1] 31 32 36 38 39 40 41 42 43 44 45 46 47 48 49 50 53 54 55 57 59 60 61
The correct output should be P38.
How can I add a conditional statement here taking into consideration the "three consecutive pentads"?
I don't know how to implement this in R code. I'd appreciate any suggestions.
I have the following data:
Pentad RR YY
1 0 0.5771428571
2 0.0142857143 0
3 0 1.2828571429
4 0.0885714286 1.4457142857
5 0.0714285714 0.1114285714
6 0 0.36
7 0.0657142857 0
8 0.0285714286 0
9 0.0942857143 0
10 0.0114285714 1
11 0 0.0114285714
12 0 0.0085714286
13 0 0.3057142857
14 0 0
15 0 0
16 0 0
17 0.04 0
18 0 0.8
19 0.8142857143 0.0628571429
20 0.2857142857 0
21 1.14 0
22 5.3342857143 0
23 2.3514285714 0
24 1.9857142857 0.0133333333
25 1.4942857143 0.0433333333
26 2.0057142857 1.4866666667
27 20.0485714286 0
28 25.0085714286 2.4866666667
29 16.32 1.9433333333
30 11.0685714286 0.7733333333
31 8.9657142857 8.1066666667
32 3.9857142857 7.7333333333
33 5.2028571429 0.5
34 7.8028571429 4.3566666667
35 4.4514285714 2.66
36 9.22 6.6266666667
37 32.0485714286 4.4042857143
38 19.5057142857 7.9771428571
39 3.1485714286 12.9428571429
40 2.4342857143 18.4942857143
41 9.0571428571 7.3571428571
42 28.7085714286 11.0828571429
43 34.1514285714 9.0342857143
44 33.0257142857 14.2914285714
45 46.5057142857 34.6142857143
46 70.6171428571 45.3028571429
47 3.1685714286 6.66
48 1.9285714286 6.7028571429
49 7.0314285714 5.9628571429
50 0.9028571429 14.8542857143
51 5.3771428571 2.1
52 11.3571428571 2.8371428571
53 15.0457142857 7.3914285714
54 11.6628571429 32.0371428571
55 21.24 9.0057142857
56 11.4371428571 3.5257142857
57 11.6942857143 12.32
58 2.9771428571 2.32
59 4.3371428571 7.9942857143
60 0.8714285714 6.5657142857
61 1.3914285714 4.7714285714
62 0.8714285714 2.3542857143
63 1.1457142857 0.0057142857
64 2.3171428571 2.5085714286
65 0.1828571429 0.8171428571
66 0.2828571429 2.8857142857
67 0.3485714286 0.8971428571
68 0 0
69 0.3457142857 0
70 0.1428571429 0
71 0.18 0
72 4.8942857143 0.1457142857
73 0.0371428571 0.4342857143
Something like this should do it:
first_exceed_seq <- function(x, thresh = mean(x), len = 3)
{
# Logical vector, does x exceed the threshold
exceed_thresh <- x > thresh
# Indices of transition points; where exceed_thresh[i - 1] != exceed_thresh[i]
transition <- which(diff(c(0, exceed_thresh)) != 0)
# Reference index, grouping observations after each transition
index <- vector("numeric", length(x))
index[transition] <- 1
index <- cumsum(index)
# Break x into groups following the transitions
exceed_list <- split(exceed_thresh, index)
# Get the number of values exceeded in each index period
num_exceed <- vapply(exceed_list, sum, numeric(1))
# Get the starting index of the first run where at least len values exceed thresh
transition[as.numeric(names(which(num_exceed >= len))[1])]
}
first_exceed_seq(dat$RR)
first_exceed_seq(dat$YY)
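The same idea can be written more compactly with rle, which encodes runs of equal values. This is an alternative sketch rather than the answer's method. The function name first_run_start is made up, and it assumes x contains no NA values.

```r
first_run_start <- function(x, thresh = mean(x), len = 3) {
  # Run-length encode the logical vector "x exceeds thresh"
  r <- rle(x > thresh)
  # Starting position of each run in the original vector
  starts <- cumsum(c(1, head(r$lengths, -1)))
  # Runs that are TRUE and at least len long
  hit <- which(r$values & r$lengths >= len)
  if (length(hit) == 0) return(NA_integer_)
  as.integer(starts[hit[1]])
}

# Toy check: the mean is about 2.78, and the first run of three 5s starts at 6
first_run_start(c(0, 0, 5, 5, 0, 5, 5, 5, 0))
# [1] 6
```

With the data from the question it would be called as first_run_start(dat$RR) and first_run_start(dat$YY). It returns NA when no run is long enough.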
I have a long list of numbers, e.g.
set.seed(123)
y<-round(runif(100, 0, 200))
And I would like to store in column y the number of values that exceed each value in column x of a data frame:
df <- data.frame(x=seq(0,200,20))
I can compute the numbers manually, like this:
length(which(y>=20)) #93 values exceed 20
length(which(y>=40)) #81 values exceed 40
etc. I know I can use a for-loop with all values of x, but is there a more elegant way?
I tried this:
df$y <- length(which(y>=df$x))
But this gives a warning and does not give me the desired output.
The data frame should look like this:
df
x y
1 0 100
2 20 93
3 40 81
4 60 70
5 80 61
6 100 47
7 120 40
8 140 29
9 160 19
10 180 8
11 200 0
You can compare each value of df$x against all values of y using sapply
sapply(df$x, function(a) sum(y>a))
#[1] 99 93 81 70 61 47 40 29 18 6 0
#Looking at your output, maybe you want
sapply(df$x, function(a) sum(y>=a))
#[1] 100 93 81 70 61 47 40 29 19 8 0
Here's another approach using outer, which compares every element of df$x with every element of y
rowSums(outer(df$x,y, "<="))
#[1] 100 93 81 70 61 47 40 29 19 8 0
Yet one more (from alexis_laz's comment)
length(y) - findInterval(df$x, sort(y), left.open = TRUE)
# [1] 100 93 81 70 61 47 40 29 19 8 0
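For completeness, here is a sketch that writes the counts straight into the data frame. The attempt in the question, length(which(y >= df$x)), compares the two vectors position by position with recycling and then returns a single number, which is why it does not give one count per row. vapply is used here as a type-checked variant of sapply. That choice is a matter of taste, not something the answers above require.

```r
set.seed(123)
y <- round(runif(100, 0, 200))
df <- data.frame(x = seq(0, 200, 20))

# One count per threshold: how many values of y are >= that threshold
df$y <- vapply(df$x, function(a) sum(y >= a), integer(1))

# Cross-check against the outer() approach from the answer above
all(df$y == rowSums(outer(df$x, y, "<=")))
# [1] TRUE
```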