Subsetting data at an irregular interval in an R function - r

I have a function like this
extract = function(x)
{
a = x$2007[6:18]
b = x$2007[30:42]
c = x$2007[54:66]
}
The subsetting needs to continue in this way up to 744. I need to skip the first 6 data points and then pull every other block of 12 points into a new object or a list. Is there a more elegant way to do this with a for loop or apply?

Side note: if 2007 is truly a column name (you would have had to set this explicitly, since R by default converts numeric names into names starting with a letter; see make.names("2007")), then x$"2007"[6:18] (etc.) should work as a column reference.
To generate that sequence of integers, let's try
nr <- 100
ind <- seq(6, nr, by = 12)
ind
# [1] 6 18 30 42 54 66 78 90
ind[ seq_along(ind) %% 2 == 1 ]
# [1] 6 30 54 78
ind[ seq_along(ind) %% 2 == 0 ]
# [1] 18 42 66 90
Map(seq, ind[ seq_along(ind) %% 2 == 1 ], ind[ seq_along(ind) %% 2 == 0 ])
# [[1]]
# [1] 6 7 8 9 10 11 12 13 14 15 16 17 18
# [[2]]
# [1] 30 31 32 33 34 35 36 37 38 39 40 41 42
# [[3]]
# [1] 54 55 56 57 58 59 60 61 62 63 64 65 66
# [[4]]
# [1] 78 79 80 81 82 83 84 85 86 87 88 89 90
So you can use this in your function to create a list of subsets:
nr <- nrow(x)
ind <- seq(6, nr, by = 12)
out <- lapply(Map(seq, ind[ seq_along(ind) %% 2 == 1 ], ind[ seq_along(ind) %% 2 == 0 ]),
function(i) x$"2007"[i])
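The same index logic can also be written in terms of block starts. The following is only a sketch: the helper name extract_blocks and its arguments are hypothetical, and it assumes each block runs from a start index to start + 12 inclusive (as in 6:18), with starts 24 apart.

```r
# Hypothetical helper: pull blocks v[s:(s + width)] for s = first, first + step, ...
extract_blocks <- function(v, first = 6, width = 12, step = 24) {
  # only keep starts whose block fits entirely inside v
  starts <- seq(first, length(v) - width, by = step)
  lapply(starts, function(s) v[s:(s + width)])
}

# Demo on a plain index vector; in the question this would be x$"2007"
blocks <- extract_blocks(seq_len(744))
length(blocks)   # number of blocks extracted
blocks[[2]]      # the second block, indices 30 to 42
```

Passing the column itself, e.g. extract_blocks(x$"2007"), should then return the list of subsets, assuming the column has at least 18 values.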

We could use
split( x[7:744] , cut(7:744,seq(7,744,12)) )

Related

For loop in multiple of fives

Thanks to @akrun, I could run the code from my previous question about merging and creating tables with a loop (Merge and create tables using a loop).
However, because my laptop only has 16GB of RAM, I couldn't run the large dataset using the code. So, instead of merging 100 times, I decided to separate the process, and do it step by step using a for-loop.
I was going to create 20 lists of data using for loop, but then I couldn't find a way to make this happen.
To be specific, I would run the following 20 lines of code manually without using a for loop.
list1 <- mget(paste0("", 1:5))
list2 <- mget(paste0("", 6:10))
list3 <- mget(paste0("", 11:15))
list4 <- mget(paste0("", 16:20))
list5 <- mget(paste0("", 21:25))
...
list20 <- mget(paste0("", 96:100))
How would I write a for loop in this case?
I tried to find a way to do this (for example as below), but I am getting an error.
for(i in 1:20){
list[i] <- mget(paste0("",5*i-4:5*i))
}
Thanks in advance for all your help!
There are multiple ways to create the list. Either use split with %/%
fulllst <- lapply(split(as.character(1:100), (1:100-1) %/% 5 + 1), mget)
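To see why the split approach works, it may help to look at the grouping vector on its own. The sketch below is self-contained: it creates dummy objects named "1" to "100" in a separate environment (the environment env and the dummy values are assumptions made for illustration, not part of the question) and passes that environment to mget.

```r
# Dummy objects named "1", "2", ..., "100" in their own environment (illustration only)
env <- new.env()
for (k in 1:100) assign(as.character(k), k^2, envir = env)

# Integer division maps 1..5 to group 1, 6..10 to group 2, and so on
grp <- (1:100 - 1) %/% 5 + 1
head(grp, 12)

# One list of five objects per group
fulllst <- lapply(split(as.character(1:100), grp), mget, envir = env)
length(fulllst)
names(fulllst[[20]])
```

Note that 1:100 - 1 is evaluated as (1:100) - 1 because : binds more tightly than binary minus, which is the same precedence rule that trips up the loop in the question.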
Or keep the code from the OP's post, but wrap both ends of the sequence in () so that operator precedence does not change what gets evaluated
# create an empty list to store the output
lstout <- vector('list', 20)
# loop over the sequence and add the `()` for `(5* i- 4)` and similarly for (5*i)
for(i in 1:20)
lstout[[i]] <- mget(as.character((5 *i -4):(5*i)))
Use print to find the difference
> for(i in 1:20) print((5 *i -4):(5*i))
[1] 1 2 3 4 5
[1] 6 7 8 9 10
[1] 11 12 13 14 15
[1] 16 17 18 19 20
[1] 21 22 23 24 25
[1] 26 27 28 29 30
[1] 31 32 33 34 35
[1] 36 37 38 39 40
[1] 41 42 43 44 45
[1] 46 47 48 49 50
[1] 51 52 53 54 55
[1] 56 57 58 59 60
[1] 61 62 63 64 65
[1] 66 67 68 69 70
[1] 71 72 73 74 75
[1] 76 77 78 79 80
[1] 81 82 83 84 85
[1] 86 87 88 89 90
[1] 91 92 93 94 95
[1] 96 97 98 99 100
> for(i in 1:20) print(5 *i -4:5*i)
[1] 1 0
[1] 2 0
[1] 3 0
[1] 4 0
[1] 5 0
[1] 6 0
[1] 7 0
[1] 8 0
[1] 9 0
[1] 10 0
[1] 11 0
[1] 12 0
[1] 13 0
[1] 14 0
[1] 15 0
[1] 16 0
[1] 17 0
[1] 18 0
[1] 19 0
[1] 20 0
i.e., if we don't use the (), the expression is evaluated as
i <- 1
(5 * i) - (4:5 * i)
[1] 1 0
# instead of
(5 * i -4):(5 * i)
[1] 1 2 3 4 5
Operator precedence is shown in ?Syntax (highest first):
:: :::      access variables in a namespace
$ @         component / slot extraction
[ [[        indexing
^           exponentiation (right to left)
- +         unary minus and plus
:           sequence operator
%any% |>    special operators (including %% and %/%)
* /         multiply, divide
+ -         (binary) add, subtract
....
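A few of these rules can be checked directly at the console. This is just a small sketch with an arbitrary value of i; the variable names are made up for the example.

```r
i <- 3

# `:` binds tighter than `*` and binary `-`, so this is (5 * i) - ((4:5) * i)
no_parens <- 5 * i - 4:5 * i
identical(no_parens, (5 * i) - ((4:5) * i))

# With explicit parentheses we get the intended run of five integers
with_parens <- (5 * i - 4):(5 * i)

# Unary minus binds tighter than `:`, so -1:3 means (-1):3
neg_seq <- -1:3

# `^` binds tighter than unary minus, so -2^2 means -(2^2)
neg_pow <- -2^2
```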

R - create vector with sequence c(1,4,5,8,9,12,13,16),etc

We are looking to create a vector with the following sequence:
1,4,5,8,9,12,13,16,17,20,21,...
Start with 1, then skip 2 numbers, then add 2 numbers, then skip 2 numbers, etc., not going above 2000. We also need the inverse sequence 2,3,6,7,10,11,...
We may use a recycled logical vector to filter the sequence
(1:21)[c(TRUE, FALSE, FALSE, TRUE)]
[1] 1 4 5 8 9 12 13 16 17 20 21
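The same recycling idea extends to the full range and to the inverse sequence the question asks for. A minimal sketch, assuming the upper bound of 2000 from the question (the names keep, wanted and inverse are just for illustration):

```r
n <- 2000

# TRUE, FALSE, FALSE, TRUE repeated along 1..n marks the positions to keep
keep <- rep(c(TRUE, FALSE, FALSE, TRUE), length.out = n)

wanted  <- which(keep)    # 1, 4, 5, 8, 9, 12, ...
inverse <- which(!keep)   # 2, 3, 6, 7, 10, 11, ...

head(wanted, 8)
head(inverse, 6)
```

Negating the mask gives the complementary sequence without any extra arithmetic.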
Here's an approach using rep and cumsum. Effectively, "add up alternating increments of 1 (successive #s) and 3 (skip two)."
cumsum(rep(c(1,3), 500))
and
cumsum(rep(c(3,1), 500)) - 1
Got this one myself - head(sort(c(seq(1, 2000, 4), seq(4, 2000, 4))), 20)
We can try like below
> (v <- seq(21))[v %% 4 %in% c(0, 1)]
[1] 1 4 5 8 9 12 13 16 17 20 21
You may arrange the data in a matrix and extract 1st and 4th column.
val <- 1:100
sort(c(matrix(val, ncol = 4, byrow = TRUE)[, c(1, 4)]))
# [1] 1 4 5 8 9 12 13 16 17 20 21 24 25 28 29 32 33
#[18] 36 37 40 41 44 45 48 49 52 53 56 57 60 61 64 65 68
#[35] 69 72 73 76 77 80 81 84 85 88 89 92 93 96 97 100
A tidyverse option.
library(purrr)
library(dplyr)
map_int(1:11, ~ case_when(. == 1 ~ as.integer(1),
. %% 2 == 0 ~ as.integer(.*2),
T ~ as.integer((.*2)-1)))
# [1] 1 4 5 8 9 12 13 16 17 20 21

Transpose and rearrange rows in a matrix

I have several files with the following structure:
data <- matrix(c(1:100000), nrow=1000, ncol=100)
The first 500 rows are X coordinates and the final 500 rows are Y coordinates of several object contours. Row # 1 (X) and row 501 (Y) correspond to coordinates of the same object. I need to:
transpose the whole matrix and arrange it so now row 1 is column 1 and row 501 is column 2 and have paired x, y coordinates in contiguous columns. Row 2 and row 502 should be in column 1 and column 2 below the data of previous object.
ideally, have an extra column with filename info.
thanks.
Simpler version:
Transpose the matrix, then create a vector with the column indices and subset with them:
mat <- matrix(1:100, nrow = 10)
mat2 <- t(mat)
cols <- unlist(lapply(1:(ncol(mat2)/2), function(i) c(i, i+ncol(mat2)/2)))
mat3 <- mat2[,cols]
Then just make it a dataframe as below.
You can subset pairs of rows separated by nrow/2, make them a 2-column matrix and then cbind them all:
df <- as.data.frame(do.call(cbind, lapply(1:(nrow(mat)/2), function(i) {
matrix(mat[c(i, nrow(mat)/2 + i),], ncol = 2, byrow = TRUE)
})))
Then add the new column as necessary, since it's now a data frame:
df$fname <- sample(letters, nrow(df), TRUE)
df
#    V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 fname
# 1   1  6  2  7  3  8  4  9  5  10     a
# 2  11 16 12 17 13 18 14 19 15  20     e
# 3  21 26 22 27 23 28 24 29 25  30     e
# 4  31 36 32 37 33 38 34 39 35  40     o
# 5  41 46 42 47 43 48 44 49 45  50     y
# 6  51 56 52 57 53 58 54 59 55  60     q
# 7  61 66 62 67 63 68 64 69 65  70     v
# 8  71 76 72 77 73 78 74 79 75  80     b
# 9  81 86 82 87 83 88 84 89 85  90     v
# 10 91 96 92 97 93 98 94 99 95 100     y
What about
n <- 500
df <- data.frame(col1 = data[1:n, ],
col2 = data[(nrow(data) - 500):nrow(data), ],
fileinfo = "this is the name of the file...")
Try David's answer, but this way:
n <- 500
df <- data.frame(col1 = data[1:n, ],
col2 = data[(nrow(data) - (n-1)):nrow(data), ],
fileinfo = "this is the name of the file...")
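If the goal is the stacked layout described in the question (one x column and one y column, with each object's points placed below the previous object's), a long data frame is one possible shape. This is a sketch on a tiny made-up matrix; the object column and the file name "example.txt" are assumptions added for illustration.

```r
# Toy input: 2 objects, 6 points each; rows 1-2 are X, rows 3-4 are Y
data <- matrix(1:24, nrow = 4)
n <- nrow(data) / 2

long <- data.frame(
  object = rep(seq_len(n), each = ncol(data)),          # which contour each point belongs to
  x = as.vector(t(data[1:n, , drop = FALSE])),          # X rows, read one object at a time
  y = as.vector(t(data[(n + 1):(2 * n), , drop = FALSE])),  # matching Y rows
  fname = "example.txt"                                  # hypothetical file name
)

head(long)
```

Transposing before as.vector is what makes the values come out row by row, so object 1's points are followed by object 2's.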

Plot a list of variable length vectors in R

I have a list which has multiple vectors (total 80) of various lengths. On the x-axis I want the names of these vectors. On the y-axis I want to plot the values corresponding to each vector. How can I do it in R?
One way to do this is to reshape the data using reshape2::melt or some other method. Please try and make a reproducible example. I think this is the gist of what you are after:
set.seed(4)
mylist <- list(a = sample(1:50, 10, T),
b = sample(25:40, 15, T),
c = sample(51:75, 20, T))
mylist
# $a
# [1] 30 1 15 14 41 14 37 46 48 4
#
# $b
# [1] 37 29 26 40 31 32 40 34 40 37 36 40 33 32 35
#
# $c
# [1] 71 63 72 63 64 65 56 72 67 63 75 62 66 60 51 74 57 65 55 73
library(ggplot2)
library(reshape2)
df <- melt(mylist)
head(df)
# value L1
# 1 30 a
# 2 1 a
# 3 15 a
# 4 14 a
# 5 41 a
# 6 14 a
ggplot(df, aes(x = factor(L1), y = value)) + geom_point()
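If you would rather not depend on reshape2, base R's stack() does a similar reshaping for a named list of vectors. A minimal sketch using the same kind of example list (the sampled values are not shown because they depend on the R version's random number settings):

```r
set.seed(4)
mylist <- list(a = sample(1:50, 10, TRUE),
               b = sample(25:40, 15, TRUE),
               c = sample(51:75, 20, TRUE))

# stack() returns a data frame with columns `values` and `ind` (the list names, as a factor)
long <- stack(mylist)
str(long)
```

The result can be plotted the same way as the melted data, for example with ggplot(long, aes(x = ind, y = values)) + geom_point().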

Storing function output in a list in R

I have the following data frame
df <- data.frame(A = c(0,100,5), B = c(0,100,10), C = c(0,100,25))
which I use with this loop
for(i in c(1:3))
{seq(df [1,i], df [2,i], df [3,i])
}
I need to store the output, but so far I have only managed to print the results using
{y<-seq(df [1,i], df [2,i], df [3,i])
print(y)}
Instead I would like to store these output in a list to obtain something like
[[1]]
[1] 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
[[2]]
[1] 0 10 20 30 40 50 60 70 80 90 100
[[3]]
[1] 0 25 50 75 100
Use lapply instead of for loop
> lapply(1:3, function(i) seq(df[1,i], df[2,i], df[3,i]))
[[1]]
[1] 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
[[2]]
[1] 0 10 20 30 40 50 60 70 80 90 100
[[3]]
[1] 0 25 50 75 100
The best solution here is to use lapply, as @Jilber suggests.
In case you want to know how to append a list:
result = list()
for(i in c(1:3))
{
result[[i]]<- seq(df [1,i], df [2,i], df [3,i])
}
result
lapply is the way to go. Here is another approach that uses the fact that a data.frame is simply a list.
lapply(df, function(x) do.call(seq,as.list(x)))
# $A
# [1] 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
#
# $B
# [1] 0 10 20 30 40 50 60 70 80 90 100
#
# $C
# [1] 0 25 50 75 100
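Another variant along the same lines is Map, which walks the three rows in parallel and names the arguments explicitly. This is only a sketch of an alternative, not a replacement for the answers above; the argument names from, to and by mirror those of seq.

```r
df <- data.frame(A = c(0, 100, 5), B = c(0, 100, 10), C = c(0, 100, 25))

# Each row of df is itself a list of columns, so Map pairs up A, B and C across rows
out <- Map(function(from, to, by) seq(from, to, by = by),
           df[1, ], df[2, ], df[3, ])

names(out)
lengths(out)
```

Naming the by argument makes it a little harder to mix up the row order than relying on positional matching.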
