I have a vector:
vec <- c(44,0,13,18,32,13,25,42,13,24)
I want to calculate fT as follows:
fT <- ifelse(vec >= 10 & vec <= 20, min(vec) - max(vec),
ifelse(vec > 20 & vec <= 50, max(vec) - min(vec),0))
I want to extend this calculation to each row of a data frame, i.e. I have a data frame and I want to calculate fT for each row.
Sample data:
dat <- data.frame(replicate(10,sample(0:50,1000,rep=TRUE)))
That means I will have another dataframe which will have the fT value for each value in dat.
To calculate fT for each row, I thought of using dplyr,
dat%>%
rowwise()%>%
mutate(fT = ifelse(dat[,1:10] >= 10 & dat[,1:10] <= 30, min(dat[,1:10]) - max(dat[,1:10]),
ifelse(dat[,1:10] > 30 & dat[,1:10] <= 50, max(dat[,1:10]) - min(dat[,1:10]),0)))
I am stuck at this stage. I do not know how to index by column so that I get an fT for each row of dat.
If you want the sums of fT, you can do this with apply:
dat$fT = apply(dat, 1, function(x) sum(ifelse(x >= 10 & x <= 20, min(x) - max(x),
ifelse(x > 20 & x <= 50, max(x) - min(x),0))))
Result:
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 fT
1 14 13 8 10 15 12 22 47 29 40 -39
2 40 30 7 48 42 50 20 30 24 44 301
3 20 8 7 19 30 36 18 4 37 12 -33
4 45 43 26 31 41 33 26 43 11 28 272
5 47 43 25 9 14 12 3 1 38 46 138
6 2 24 31 33 7 4 36 41 42 0 252
Note:
The 1 in apply specifies the row margin. This loops through the rows of the input, dat, and outputs a single sum of fT for each row.
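As a quick sanity check, the rule can be wrapped in a small helper and tried on the example vector from the question before running it over a whole data frame. This is only a sketch: the names fT_row and small are made up here for illustration and are not part of the original answer.

```r
# Hypothetical helper: per-element fT for a single row (a numeric vector)
fT_row <- function(x) {
  rng <- max(x) - min(x)
  ifelse(x >= 10 & x <= 20, -rng,
         ifelse(x > 20 & x <= 50, rng, 0))
}

vec <- c(44, 0, 13, 18, 32, 13, 25, 42, 13, 24)
per_element <- fT_row(vec)

# One-row data frame built from vec; MARGIN = 1 makes apply walk over rows
small <- as.data.frame(t(vec))
row_sums <- apply(small, 1, function(x) sum(fT_row(x)))
row_sums  # 44 for this vector: five values give +44, four give -44, one gives 0
```

Keeping the rule in a named function also avoids repeating the nested ifelse in every call.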
Edit:
If you actually want the value of fT (not the sum), you can still use apply, but wrap the output with matrix and specify ncol=10 and byrow=TRUE. This means that you want an output matrix with 10 columns (just like dat) and fill the matrix rowwise with the output of apply:
new_dat = matrix(apply(dat, 1,
function(x) ifelse(x >= 10 & x <= 20, min(x) - max(x),
ifelse(x > 20 & x <= 50, max(x) - min(x),0))),
ncol = 10, byrow = TRUE)
Result:
> head(new_dat)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] -39 -39 0 -39 -39 -39 39 39 39 39
[2,] 43 43 0 43 43 43 -43 43 43 43
[3,] -33 0 0 -33 33 33 -33 0 33 -33
[4,] 34 34 34 34 34 34 34 34 -34 34
[5,] 46 46 46 0 -46 -46 0 0 46 46
[6,] 0 42 42 42 0 0 42 42 42 0
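The reshaping step works because apply over rows returns its results as columns, one column per input row. A minimal sketch, assuming a smaller random data frame than the one in the answer, shows that filling a matrix by row gives the same values as simply transposing the apply output with t. The helper name f is hypothetical.

```r
set.seed(123)
dat_small <- data.frame(replicate(10, sample(0:50, 20, replace = TRUE)))

# Hypothetical helper holding the rule from the answer
f <- function(x) ifelse(x >= 10 & x <= 20, min(x) - max(x),
                        ifelse(x > 20 & x <= 50, max(x) - min(x), 0))

raw <- apply(dat_small, 1, f)   # 10 x 20: each input row became a column
a <- matrix(raw, ncol = ncol(dat_small), byrow = TRUE)
b <- t(raw)                     # same values, and the column names are kept

dim(raw)
all.equal(a, b, check.attributes = FALSE)
```

Using ncol(dat_small) rather than a hard-coded 10 keeps the reshape correct if the number of columns changes.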
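If you prefer sticking to dplyr, you can first transpose your dat and map over the "columns", then transpose back. The transpose used here is assumed to be data.table::transpose, which returns a data.frame (purrr::transpose returns a list instead):
library(dplyr)
library(purrr)
new_dat2 <- dat %>%
  data.table::transpose() %>%
  map_dfr(~ ifelse(. >= 10 & . <= 20, min(.) - max(.),
                   ifelse(. > 20 & . <= 50, max(.) - min(.), 0))) %>%
  data.table::transpose()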
Result:
> head(new_dat2)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 -39 -39 0 -39 -39 -39 39 39 39 39
2 43 43 0 43 43 43 -43 43 43 43
3 -33 0 0 -33 33 33 -33 0 33 -33
4 34 34 34 34 34 34 34 34 -34 34
5 46 46 46 0 -46 -46 0 0 46 46
6 0 42 42 42 0 0 42 42 42 0
Note:
The advantage of using transpose instead of t in Base R is that you get a data.frame after transposing instead of a matrix.
Data:
set.seed(123)
dat <- data.frame(replicate(10,sample(0:50,1000,rep=TRUE)))
Here is one option with pmax/pmin, which should be more efficient:
m1 <- (do.call(pmax, dat) - do.call(pmin, dat))[row(dat)]
out <- (-1*m1 *(dat >=10 & dat <=20)) + (m1*(dat > 20 & dat <=50))
all.equal(new_dat, out, check.attributes = FALSE)
#[1] TRUE
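To see what the row(dat) indexing does, here is a small sketch on a hand-made two-row data frame (dat_small and the intermediate names are assumptions for illustration). The per-row range is expanded to one value per cell, and the two logical masks then pick the sign.

```r
dat_small <- data.frame(a = c(5, 30), b = c(15, 10), c = c(25, 50))

# Per-row range: pmax/pmin take the columns as parallel vectors
rng <- do.call(pmax, dat_small) - do.call(pmin, dat_small)   # 20, 40

# row() gives each cell's row index, so this repeats rng in column-major order
m1 <- rng[row(dat_small)]

low  <- dat_small >= 10 & dat_small <= 20   # cells that get -range
high <- dat_small > 20 & dat_small <= 50    # cells that get +range
out  <- -m1 * low + m1 * high

expected <- rbind(c(0, -20, 20),
                  c(40, -40, 40))
all.equal(out, expected, check.attributes = FALSE)
```

Cells outside both bands are multiplied by FALSE in both terms, so they come out as 0 without a third branch.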
Benchmarks
set.seed(24)
dat <- data.frame(replicate(500,sample(0:50,15000,rep=TRUE)))
system.time({
new_dat = matrix(apply(dat, 1,
function(x) ifelse(x >= 10 & x <= 20, min(x) - max(x),
ifelse(x > 20 & x <= 50, max(x) - min(x),0))),
ncol = ncol(dat), byrow = TRUE)
})
#user system elapsed
# 2.67 0.10 2.77
system.time({
m1 <- (do.call(pmax, dat) - do.call(pmin, dat))[row(dat)]
out <- (-1*m1 *(dat >=10 & dat <=20)) + (m1*(dat > 20 & dat <=50))
})
# user system elapsed
# 0.48 0.11 0.60
#all.equal(new_dat, out, check.attributes = FALSE)
#[1] TRUE
We are looking to create a vector with the following sequence:
1,4,5,8,9,12,13,16,17,20,21,...
Start with 1, then skip 2 numbers, then add 2 numbers, then skip 2 numbers, etc., not going above 2000. We also need the inverse sequence 2,3,6,7,10,11,...
We may use a recycled logical vector to filter the sequence:
(1:21)[c(TRUE, FALSE, FALSE, TRUE)]
[1] 1 4 5 8 9 12 13 16 17 20 21
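The same recycled mask extends to the full range, and negating it gives the inverse sequence the question also asks for. A short sketch, with n_max, mask, keep and skipped as illustrative names:

```r
n_max <- 2000
mask  <- c(TRUE, FALSE, FALSE, TRUE)   # recycled along 1:n_max

keep    <- seq_len(n_max)[mask]        # 1, 4, 5, 8, 9, 12, ...
skipped <- seq_len(n_max)[!mask]       # 2, 3, 6, 7, 10, 11, ...

head(keep)
head(skipped)
```

Because 2000 is a multiple of 4, the mask recycles cleanly and the two vectors together cover 1:2000 exactly once.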
Here's an approach using rep and cumsum. Effectively, "add up alternating increments of 1 (successive #s) and 3 (skip two)."
cumsum(rep(c(1,3), 500))
and
cumsum(rep(c(3,1), 500)) - 1
Got this one myself:
head(sort(c(seq(1, 2000, 4), seq(4, 2000, 4))), 20)
We can try the following:
> (v <- seq(21))[v %% 4 %in% c(0, 1)]
[1] 1 4 5 8 9 12 13 16 17 20 21
You may arrange the data in a matrix and extract 1st and 4th column.
val <- 1:100
sort(c(matrix(val, ncol = 4, byrow = TRUE)[, c(1, 4)]))
# [1] 1 4 5 8 9 12 13 16 17 20 21 24 25 28 29 32 33
#[18] 36 37 40 41 44 45 48 49 52 53 56 57 60 61 64 65 68
#[35] 69 72 73 76 77 80 81 84 85 88 89 92 93 96 97 100
A tidyverse option.
library(purrr)
library(dplyr)
map_int(1:11, ~ case_when(. == 1 ~ 1L,
                          . %% 2 == 0 ~ as.integer(. * 2),
                          TRUE ~ as.integer((. * 2) - 1)))
# [1] 1 4 5 8 9 12 13 16 17 20 21
I have a function like this
extract = function(x)
{
a = x$2007[6:18]
b = x$2007[30:42]
c = x$2007[54:66]
}
The subsetting needs to continue up to 744 in this way. I need to skip the first 6 data points and then pull out every other block of 12 points into a new object or a list. Is there a more elegant way to do this with a for loop or apply?
Side note: if 2007 is truly a column name (you would have had to set this up explicitly, since by default R converts names that start with a digit into syntactically valid names, see make.names("2007")), then x$"2007"[6:18] (etc.) should work for the column reference.
To generate that sequence of integers, let's try
nr <- 100
ind <- seq(6, nr, by = 12)
ind
# [1] 6 18 30 42 54 66 78 90
ind[ seq_along(ind) %% 2 == 1 ]
# [1] 6 30 54 78
ind[ seq_along(ind) %% 2 == 0 ]
# [1] 18 42 66 90
Map(seq, ind[ seq_along(ind) %% 2 == 1 ], ind[ seq_along(ind) %% 2 == 0 ])
# [[1]]
# [1] 6 7 8 9 10 11 12 13 14 15 16 17 18
# [[2]]
# [1] 30 31 32 33 34 35 36 37 38 39 40 41 42
# [[3]]
# [1] 54 55 56 57 58 59 60 61 62 63 64 65 66
# [[4]]
# [1] 78 79 80 81 82 83 84 85 86 87 88 89 90
So you can use this in your function to create a list of subsets:
nr <- nrow(x)
ind <- seq(6, nr, by = 12)
out <- lapply(Map(seq, ind[ seq_along(ind) %% 2 == 1 ], ind[ seq_along(ind) %% 2 == 0 ]),
function(i) x$"2007"[i])
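Put together as a runnable sketch, assuming a made-up data frame x with 744 rows and a single column literally named "2007" (check.names = FALSE is needed to keep that name):

```r
# Fake data for illustration only; the real x is not shown in the question
x <- data.frame(`2007` = seq_len(744), check.names = FALSE)

ind    <- seq(6, nrow(x), by = 12)
starts <- ind[seq_along(ind) %% 2 == 1]   # 6, 30, 54, ...
ends   <- ind[seq_along(ind) %% 2 == 0]   # 18, 42, 66, ...

# Guard against an unmatched final start if ind has odd length
n_blocks <- min(length(starts), length(ends))

out <- Map(function(s, e) x[["2007"]][s:e],
           starts[seq_len(n_blocks)], ends[seq_len(n_blocks)])

length(out)
out[[1]]
```

Because the fake column holds its own row numbers, each list element shows directly which rows were extracted.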
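We could also use split with cut. Note that, as written, this splits the values into consecutive blocks of 12 rather than every other block, and it assumes x is the vector of values (for example x$"2007") rather than the data frame: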
split( x[7:744] , cut(7:744,seq(7,744,12)) )
I have a table with eighty columns and I want to create columns by multiplying var1*var41 var1*var42....var1*var80. var2*var41 var2*var42...var2*var80. How could I write a loop to multiply the columns and write the labeled product into a .csv? The result should have 1600 additional columns.
I took a stab at this with some fake data:
# Fake data (arbitrary 5 rows)
mtx <- sample(1:100, 5 * 80, replace = TRUE)
dim(mtx) <- c(5,80)
colnames(mtx) <- paste0("V", 1:ncol(mtx)) # Name the original columns
mtx[1:5,1:5]
# V1 V2 V3 V4 V5
#[1,] 8 10 69 84 92
#[2,] 59 34 36 96 86
#[3,] 51 26 78 63 8
#[4,] 74 93 73 70 49
#[5,] 62 30 20 43 9
Using a for loop, one might try something like this:
v <- expand.grid(1:40,41:80) # all combos
v[c(1:3,1598:1600),]
# Var1 Var2
#1 1 41
#2 2 41
#3 3 41
#1598 38 80
#1599 39 80
#1600 40 80
# Initialize matrix for multiplication results
newcols <- matrix(NA, nrow = nrow(mtx), ncol = nrow(v))
# Run the for loop
for(i in 1:nrow(v)) newcols[,i] <- mtx[,v[i,1]] * mtx[,v[i,2]]
# save the names as "V1xV41" format with apply over rows (Margin = 1)
# meaning, for each row in v, paste "V" in front and "x" between
colnames(newcols) <- apply(v, MARGIN = 1, function(eachv) paste0("V", eachv, collapse="x"))
# combine the additional 1600 columns
tocsv <- cbind(mtx, newcols)
tocsv[,78:83] # just to view old and new columns
# V78 V79 V80 V1xV41 V2xV41 V3xV41
#[1,] 17 92 13 429 741 1079
#[2,] 70 94 1 4836 4464 5115
#[3,] 6 77 93 3740 1020 3468
#[4,] 88 34 26 486 258 66
#[5,] 48 77 61 873 4365 970
# Write it
write.csv(tocsv, "C:/Users/Evan Friedland/Documents/NEWFILENAME.csv")
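The loop can also be replaced by indexing the matrix with the two columns of the combination table at once. The sketch below uses a reduced 4-column example (columns 1:2 times columns 3:4) so that the result is easy to inspect; prod_cols is a name made up for illustration.

```r
set.seed(1)
mtx <- matrix(sample(1:100, 5 * 4, replace = TRUE), nrow = 5)
colnames(mtx) <- paste0("V", 1:4)

v <- expand.grid(1:2, 3:4)   # pairs (1,3), (2,3), (1,4), (2,4)

# Selecting columns by each side of the pair and multiplying elementwise
# produces all products in one step, in the same order as the rows of v
prod_cols <- mtx[, v[, 1]] * mtx[, v[, 2]]
colnames(prod_cols) <- paste0("V", v[, 1], "xV", v[, 2])

result <- cbind(mtx, prod_cols)
colnames(result)
```

For the original problem the same code applies with expand.grid(1:40, 41:80), which yields the 1600 product columns.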
I am new to R functions and have always preferred to use packages and avoid loops. However, now I am trying to create a loop for a specific question that I have. I would like to subset a dataset based on ranges. I think the code below is self-explanatory.
dt = as.data.frame(sample(1:100))
names(dt) = "num"
subs.it <- function(x) {
subs <- subset(dt, num >= (x - 5) & num <= (x + 5))
return(subs)
}
subs.it(c(15, 50))
wrong output:
num
44 55
47 20
65 19
77 17
83 12
91 16
92 51
100 54
correct:
num
4 15
18 11
47 20
50 13
54 10
65 19
66 14
77 17
82 18
83 12
91 16
17 48
19 53
29 45
33 52
39 46
44 55
45 50
49 49
89 47
92 51
100 54
I can't find what I am doing wrong.
Thanks
It seems like the function you are looking for is subset itself. Try:
subset(dt, num > 15 & num <50)
Edit:
Ah, I see you want two different ranges. You can do this:
x = 15
y = 50
subset(dt, (num >= x-5 & num <= x+5) | (num >= y-5 & num <= y+5))
or a more compact version using absolute values:
subset(dt, (abs(num - x) <= 5 | abs(num - y) <= 5))
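The reason the original function misbehaves is recycling: with x = c(15, 50), the comparison num >= (x - 5) pairs odd-numbered rows with 15 and even-numbered rows with 50, which matches the row numbers in the "wrong output" above. A possible generalisation to any number of centres is sketched below; subs_any and its width argument are hypothetical names, not part of the answers.

```r
set.seed(42)
dt <- data.frame(num = sample(1:100))

# Hypothetical helper: keep rows whose num is within `width` of ANY centre
subs_any <- function(data, centres, width = 5) {
  # outer() compares every value against every centre: one column per centre
  hits <- outer(data$num, centres, function(v, ctr) abs(v - ctr) <= width)
  data[rowSums(hits) > 0, , drop = FALSE]
}

res <- subs_any(dt, c(15, 50))
sort(res$num)
```

Since dt holds a permutation of 1:100, the result contains exactly the values 10 to 20 and 45 to 55.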
Here you go.
set.seed(12345)
library(dplyr)
subs.it <- function(x, y, z) {
subs <- x %>% filter(
(num >= (y-5) & num <= (y+5)) | (num >= (z-5) & num <= (z+5))
)
return(subs)
}
subs.it(dt, 15, 55)
num
1 16
2 14
3 15
4 55
5 52
6 17
7 56
8 13
9 57
10 54
11 18
12 53
13 11
14 58
15 19
16 10
17 51
18 60
19 20
20 50
21 12
22 59
I have several files with the following structure:
data <- matrix(c(1:100000), nrow=1000, ncol=100)
The first 500 rows are X coordinates and the final 500 rows are Y coordinates of several object contours. Row 1 (X) and row 501 (Y) correspond to coordinates of the same object. I need to:
1. Transpose the whole matrix and arrange it so that row 1 becomes column 1 and row 501 becomes column 2, giving paired x, y coordinates in contiguous columns. Row 2 and row 502 should go in column 1 and column 2, below the data of the previous object.
2. Ideally, have an extra column with filename info.
Thanks.
Simpler version:
Transpose the matrix, then create a vector with the column indices and subset with them:
mat <- matrix(1:100, nrow = 10)
mat2 <- t(mat)
cols <- unlist(lapply(1:(nrow(mat2)/2), function(i) c(i, i+nrow(mat2)/2)))
mat3 <- mat2[,cols]
Then just make it a dataframe as below.
You can subset pairs of rows separated by nrow/2, make them a 2-column matrix and then cbind them all:
df <- as.data.frame(do.call(cbind, lapply(1:(nrow(mat)/2), function(i) {
matrix(mat[c(i, nrow(mat)/2 + i),], ncol = 2, byrow = TRUE)
})))
df
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 fname
# 1 1 6 2 7 3 8 4 9 5 10 a
# 2 11 16 12 17 13 18 14 19 15 20 e
# 3 21 26 22 27 23 28 24 29 25 30 e
# 4 31 36 32 37 33 38 34 39 35 40 o
# 5 41 46 42 47 43 48 44 49 45 50 y
# 6 51 56 52 57 53 58 54 59 55 60 q
# 7 61 66 62 67 63 68 64 69 65 70 v
# 8 71 76 72 77 73 78 74 79 75 80 b
# 9 81 86 82 87 83 88 84 89 85 90 v
# 10 91 96 92 97 93 98 94 99 95 100 y
Then just add the new column as necessary, since it's now a dataframe (the printout above was taken after this step, which is why it already shows fname):
df$fname <- sample(letters, nrow(df), TRUE)
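If the goal is instead to stack the objects below each other in just two columns, as one reading of the question suggests, a long-format layout is another option. This is a sketch under that assumption; the column names object, x, y and the placeholder file name are made up for illustration.

```r
mat <- matrix(1:100, nrow = 10)   # rows 1-5 are X, rows 6-10 are Y
n <- nrow(mat) / 2

long <- data.frame(
  object = rep(seq_len(n), each = ncol(mat)),
  # t() then as.vector() lays out row 1, then row 2, ... one after another
  x = as.vector(t(mat[1:n, ])),
  y = as.vector(t(mat[(n + 1):(2 * n), ])),
  fname = "example_file"          # placeholder for the real file name
)

head(long, 3)
```

Each object then occupies ncol(mat) consecutive rows, with its X and Y values side by side.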
What about
n <- 500
df <- data.frame(col1 = data[1:n, ],
col2 = data[(nrow(data) - 500):nrow(data), ],
fileinfo = "this is the name of the file...")
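Try David's answer, but with the row range corrected so that col2 covers exactly the last n rows (the original range spans n + 1 rows):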
n <- 500
df <- data.frame(col1 = data[1:n, ],
col2 = data[(nrow(data) - (n-1)):nrow(data), ],
fileinfo = "this is the name of the file...")
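The difference between the two row ranges is easy to check on its own. A minimal sketch using the matrix from the question, with too_many and exact as illustrative names:

```r
data <- matrix(1:100000, nrow = 1000, ncol = 100)
n <- 500

too_many <- (nrow(data) - n):nrow(data)         # starts at row 500
exact    <- (nrow(data) - (n - 1)):nrow(data)   # starts at row 501

length(too_many)
length(exact)
range(exact)
```

Only the second range has the same number of rows as data[1:n, ], which data.frame needs in order to combine the two pieces.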