I wrote a simple function that produces all combinations of its input. The input is a sequence of 4 coordinate pairs (x, y), referred to inside the function as a, b, c, and d.
intervals<-function(x1,y1,x2,y2,x3,y3,x4,y4){
a<-c(x1,y1)
b<-c(x2,y2)
c<-c(x3,y3)
d<-c(x4,y4)
union<-expand.grid(a,b,c,d)
union
}
intervals(2,10,3,90,6,50,82,7)
> intervals(2,10,3,90,6,50,82,7)
Var1 Var2 Var3 Var4
1 2 3 6 82
2 10 3 6 82
3 2 90 6 82
4 10 90 6 82
5 2 3 50 82
6 10 3 50 82
7 2 90 50 82
8 10 90 50 82
9 2 3 6 7
10 10 3 6 7
11 2 90 6 7
12 10 90 6 7
13 2 3 50 7
14 10 3 50 7
15 2 90 50 7
16 10 90 50 7
>
Now I want to find (max of x) and (min of y) for each row of the given output. E.g. row 2: we have 4 values (10, 3, 6, 82). Here (3,6,82) are from x (x2,x3,x4) and 10 is basically from y (y1). Thus max of x is 82, and the min of y is 10.
So what I want is two values from each row.
I do not actually know how to approach this kind of logical command. Any idea or suggestions?
You can pass the x and y vectors separately to the function. Use expand.grid to create all combinations and take the max of x and the min of y from each row. Note that p %in% x is used to tell the two apart, so this relies on x and y sharing no values.
intervals<-function(x, y){
tmp <- do.call(expand.grid, rbind.data.frame(x, y))
names(tmp) <- paste0('col', seq_along(tmp))
result <- t(apply(tmp, 1, function(p) {
suppressWarnings(c(max(p[p %in% x]), min(p[p %in% y])))
}))
result[is.infinite(result)] <- NA
result <- as.data.frame(result)
names(result) <- c('max_x', 'min_y')
result
}
intervals(c(2,3,6,82), c(10, 90, 50, 7))
# max_x min_y
#1 82 NA
#2 82 10
#3 82 90
#4 82 10
#5 82 50
#6 82 10
#7 82 50
#8 82 10
#9 6 7
#10 6 7
#11 6 7
#12 6 7
#13 3 7
#14 3 7
#15 2 7
#16 NA 7
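If x and y could ever share a value, the %in% test would no longer tell them apart. A minimal sketch that sidesteps this by enumerating which positions come from x; the helper name intervals_idx and the logical matrix pick_x are my own labels, not part of the answer above:

```r
# Hypothetical helper: pick_x[i, j] is TRUE when row i takes x[j]
# and FALSE when it takes y[j].
intervals_idx <- function(x, y) {
  n <- length(x)
  pick_x <- as.matrix(expand.grid(rep(list(c(TRUE, FALSE)), n)))
  # max over the chosen x values, NA when the row contains no x at all
  max_x <- apply(pick_x, 1, function(p) if (any(p)) max(x[p]) else NA_real_)
  # min over the chosen y values, NA when the row contains no y at all
  min_y <- apply(pick_x, 1, function(p) if (any(!p)) min(y[!p]) else NA_real_)
  data.frame(max_x = max_x, min_y = min_y)
}

res <- intervals_idx(c(2, 3, 6, 82), c(10, 90, 50, 7))
```

The rows come out in the same order as expand.grid(a, b, c, d) in the question, because the first position varies fastest in both.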
Related
Let the following be the dataset:
What I need to do is create new columns by multiplying every a column with every b column, naming the new columns a1_b1, a1_b2, ..., a1_b4, a2_b1, a2_b2, and so on, as shown in the figure.
I am using R for data analysis. Although I have shown only two columns by two columns, the real data is 1600 by 25, hence the question.
This might be fast enough:
set.seed(42)
DF <- data.frame(a1 = sample(1:10),
a2 = sample(1:10),
b1 = sample(1:10),
b2 = sample(1:10))
a <- grep("a", names(DF))
b <- grep("b", names(DF))
combs <- expand.grid(a, b)
res <- do.call(mapply, c(list(FUN = \(...) do.call(`*`, DF[, c(...)])), combs))
colnames(res) <- paste(names(DF)[combs[[1]]], names(DF)[combs[[2]]], sep = "_")
cbind(DF, res)
# a1 a2 b1 b2 a1_b1 a2_b1 a1_b2 a2_b2
#1 1 8 9 3 9 72 3 24
#2 5 7 10 1 50 70 5 7
#3 10 4 3 2 30 12 20 8
#4 8 1 4 6 32 4 48 6
#5 2 5 5 10 10 25 20 50
#6 4 10 6 8 24 60 32 80
#7 6 2 1 4 6 2 24 8
#8 9 6 2 5 18 12 45 30
#9 7 9 8 7 56 72 49 63
#10 3 3 7 9 21 21 27 27
The operation in the question is the transpose of the KhatriRao product. We use the Matrix package, which comes with R, so it does not have to be installed.
Using the input in the Note at the end, pick out the two portions, transpose them, apply KhatriRao, and transpose back, giving a sparse matrix (class "dgCMatrix"). We can use as.matrix to convert to a dense matrix as shown, or as.data.frame(as.matrix(...)) to convert to a data.frame.
library(Matrix)
rownames(dat) <- 1:nrow(dat)
ix <- grep("a", colnames(dat))
as.matrix(t(KhatriRao(t(dat[, -ix]), t(dat[, ix]), make.dimnames = TRUE)))
giving:
a1:b1 a2:b1 a1:b2 a2:b2
1 101 838.3 108.3 898.89
2 204 1050.6 220.6 1136.09
3 309 1957.0 357.0 2261.00
4 416 1664.0 464.0 1856.00
5 525 1638.0 578.0 1803.36
6 749 2118.6 838.6 2372.04
Note
dat <- setNames(cbind(BOD, BOD + 100), c("a1", "a2", "b1", "b2"))
dat
giving
a1 a2 b1 b2
1 1 8.3 101 108.3
2 2 10.3 102 110.3
3 3 19.0 103 119.0
4 4 16.0 104 116.0
5 5 15.6 105 115.6
6 7 19.8 107 119.8
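As a cross-check on the same Note data, here is a base R sketch that builds the pairwise products directly. The names a_cols, b_cols, pairs and prods are only illustrative, and the "^a" / "^b" patterns assume the column names start with those letters:

```r
dat <- setNames(cbind(BOD, BOD + 100), c("a1", "a2", "b1", "b2"))

a_cols <- grep("^a", names(dat), value = TRUE)
b_cols <- grep("^b", names(dat), value = TRUE)

# every (a, b) pairing, with the a column varying fastest
pairs <- expand.grid(a = a_cols, b = b_cols, stringsAsFactors = FALSE)

# one product column per pairing; mapply binds them into a matrix
prods <- mapply(function(a, b) dat[[a]] * dat[[b]], pairs$a, pairs$b)
colnames(prods) <- paste(pairs$a, pairs$b, sep = "_")
```

The first row of prods should agree with the first row of the KhatriRao output above (101, 838.3, 108.3, 898.89).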
I have the following dataset:
Class Budget Total Rank
A 120 1926 58 5 9 2 10 3
B 120 3146 52 6 15 1 6 7 8 9
C 120 2358 51 2 1 4
D 120 3252 57 5 16 0.5 9 7 6 33 4 6
I would like to get the maximum and minimum value for each row starting from the column after the Rank (i.e., those columns that don't have titles).
What I want is to include the max and min within the data frame like:
Class Budget Total Rank max min
A 120 1926 58 10 2 5 9 2 10 3
B 120 3146 52 15 1 6 15 1 6 7 8 9
C 120 2358 51 4 1 2 1 4
D 120 3252 57 33 0.5 5 16 0.5 9 7 6 33 4 6
How can I do that?
Try the following:
vals <- df[, 5:ncol(df)]  # the untitled columns after Rank, captured before new columns are added
df[, "Max"] <- apply(vals, 1, max, na.rm = TRUE)
df[, "Min"] <- apply(vals, 1, min, na.rm = TRUE)
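A small self-contained sketch of the same idea on two rows of the question's data. The column names V1 to V5 are made up here, since the original columns have no titles, and the short row is assumed to be padded with NA:

```r
df <- data.frame(Class = c("A", "C"),
                 Budget = c(120, 120),
                 Total = c(1926, 2358),
                 Rank = c(58, 51),
                 V1 = c(5, 2), V2 = c(9, 1), V3 = c(2, 4),
                 V4 = c(10, NA), V5 = c(3, NA))

# take the value columns once, so later additions to df do not leak in
vals <- as.matrix(df[, 5:ncol(df)])

df$max <- apply(vals, 1, max, na.rm = TRUE)  # na.rm skips the NA padding
df$min <- apply(vals, 1, min, na.rm = TRUE)
```

For class A this gives max 10 and min 2, and for class C max 4 and min 1, matching the desired output in the question.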
I have a column with 10 random numbers. From it I want to create a new column in which the two values of every pair have switched places; see the example below. How would you do that?
column newcolumn
1 5
5 1
7 6
6 7
25 67
67 25
-10 2
2 -10
-50 36
36 -50
Taking advantage of the fact that R will replicate smaller vectors when adding them to larger vectors, you can:
a <- data.frame(column=c(1,5,7,6,25,67,-10,2,50,36))
a$newColumn <- a$column[seq(nrow(a)) + c(1, -1)]
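To see what the recycling does, it can help to look at the index vector on its own. A short sketch using the question's numbers (idx and new_column are just illustrative names):

```r
column <- c(1, 5, 7, 6, 25, 67, -10, 2, -50, 36)

# c(1, -1) is recycled along 1..10, so odd positions get +1 and even ones -1
idx <- seq(length(column)) + c(1, -1)   # 2 1 4 3 6 5 8 7 10 9

new_column <- column[idx]
```

This only works cleanly for an even number of rows. With an odd length, R warns about the recycling and the last index points one past the end, which yields NA.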
Something like this.
a <- data.frame(column=c(1,5,7,6,25,67,-10,2,50,36))
a$newColumn <- 0
a[seq(1,nrow(a),by=2),"newColumn"]<-a[seq(2,nrow(a),by=2),"column"]
a[seq(2,nrow(a),by=2),"newColumn"]<-a[seq(1,nrow(a),by=2),"column"]
# results
column newColumn
1 1 5
2 5 1
3 7 6
4 6 7
5 25 67
6 67 25
7 -10 2
8 2 -10
9 50 36
10 36 50
Here is a base R one-liner: We can cast column as 2 x nrow(df)/2 matrix, swap rows, and recast as vector.
df$newcolumn <- c(matrix(df$column, ncol = nrow(df) / 2)[c(2,1), ]);
# column newcolumn
#1 1 5
#2 5 1
#3 7 6
#4 6 7
#5 25 67
#6 67 25
#7 -10 2
#8 2 -10
#9 -50 36
#10 36 -50
Sample data
df <- read.table(text =
"column
1
5
7
6
25
67
-10
2
-50
36", header = TRUE)
Another option would be to use ave and rev
transform(df, newCol = ave(x = df$column, rep(1:5, each = 2), FUN = rev))
# column newCol
#1 1 5
#2 5 1
#3 7 6
#4 6 7
#5 25 67
#6 67 25
#7 -10 2
#8 2 -10
#9 -50 36
#10 36 -50
The part rep(1:5, each = 2) creates a grouping variable ("pairs") for each of which we reverse the elements.
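If the number of rows is not fixed at 10, the grouping variable can be derived from the length instead of being hard-coded. A sketch, assuming an even number of elements (x and g are illustrative names):

```r
x <- c(1, 5, 7, 6, 25, 67)

# integer division maps positions 1,2 -> 1, 3,4 -> 2, 5,6 -> 3
g <- (seq_along(x) + 1) %/% 2

# reverse within each pair, keeping every value at its pair's position
swapped <- ave(x, g, FUN = rev)
```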
Here's a compact way:
a$new_col <- c(matrix(a$column,2)[2:1,])
# column new_col
# 1 1 5
# 2 5 1
# 3 7 6
# 4 6 7
# 5 25 67
# 6 67 25
# 7 -10 2
# 8 2 -10
# 9 50 36
# 10 36 50
The idea is to write in a 2 row matrix, switch the rows, and unfold back in a vector.
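The intermediate steps, spelled out on a shorter vector (v, m and swapped are names chosen for this sketch):

```r
v <- c(1, 5, 7, 6)

# matrices fill column by column, so each column holds one pair
m <- matrix(v, nrow = 2)
#      [,1] [,2]
# [1,]    1    7
# [2,]    5    6

# swap the two rows, then c() reads the matrix back column by column
swapped <- c(m[2:1, ])
```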
Here is my data :
class x1 x2
c 6 90
b 5 50
c 3 70
b 9 40
a 5 30
b 1 60
a 7 20
c 4 80
a 2 10
I first want to order it by class (increasing or decreasing doesn't really matter) and then by x1 (decreasing), so I do the following :
df <- df[with(df, order(class, x1, decreasing = TRUE)), ]
class x1 x2
c 6 90
c 4 80
c 3 70
b 9 40
b 5 50
b 1 60
a 7 20
a 5 30
a 2 10
And then I would like the cumulative sum of x2 for each class, taken in that order:
class x1 x2 cumsum
c 6 90 90
c 4 80 170 # 90+80
c 3 70 240 # 90+80+70
b 9 40 40
b 5 50 90 # 40+50
b 1 60 150 # 40+50+60
a 7 20 20
a 5 30 50 # 20+30
a 2 10 60 # 20+30+10
Following this answer, I did this :
df$cumsum <- unlist(by(df$x2, df$class, cumsum))
# (Also tried this, same result)
df$cumsum <- unlist(by(df[,x2], df[,class], cumsum))
But what I get is a cumulative sum over the whole set, and in the wrong order. To be more specific, here is what I get:
class x1 x2 cumsum
c 6 90 20 # this cumsum
c 4 80 50 # and this cumsum
c 3 70 60 # and this cumsum are the cumsum of the lines of class a,
b 9 40 100 # then it adds the 'x2' values of class b : 60 ('cumsum' from the previous line) + 40
b 5 50 150 # and keeps doing so : 100 + 50
b 1 60 210 # 150 + 60
a 7 20 300 # 210 + 90
a 5 30 380 # 300 + 80
a 2 10 450 # 380 + 70
Any idea on how I could solve this ? Thanks
dplyr can work here too
library(dplyr)
df %>%
group_by(class) %>%
arrange(desc(x1), .by_group = TRUE) %>%
mutate(cumsum=cumsum(x2))
## class x1 x2 cumsum
## (fctr) (int) (int) (int)
## 1 a 7 20 20
## 2 a 5 30 50
## 3 a 2 10 60
## 4 b 9 40 40
## 5 b 5 50 90
## 6 b 1 60 150
## 7 c 6 90 90
## 8 c 4 80 170
## 9 c 3 70 240
As described here (https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html) and elsewhere, group_by in conjunction with arrange used to imply that the data would be sorted by the grouping variable first. In current versions of dplyr, arrange ignores grouping unless .by_group = TRUE is passed, which is why it is set above.
We can use data.table
library(data.table)
setDT(df)[, x2:= cumsum(x2) , class]
df
# class x1 x2
#1: c 6 90
#2: c 4 170
#3: c 3 240
#4: b 9 40
#5: b 5 90
#6: b 1 150
#7: a 7 20
#8: a 5 50
#9: a 2 60
NOTE: In the above I used the ordered data
If we need to order also,
setorder(setDT(df), -class, -x1)[, x2:=cumsum(x2), class]
You can use base R transform and ave to take the cumsum of x2 within each class. Ordering by x1 as well as class gives the running totals the question asks for:
transform(df[order(df$class, df$x1, decreasing = TRUE), ], cumsum = ave(x2, class, FUN=cumsum))
# class x1 x2 cumsum
#1 c 6 90 90
#8 c 4 80 170
#3 c 3 70 240
#4 b 9 40 40
#2 b 5 50 90
#6 b 1 60 150
#7 a 7 20 20
#5 a 5 30 50
#9 a 2 10 60
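One likely reason the assignment in the question ends up misaligned is that split-and-apply helpers hand results back in factor-level order (a, b, c), not in the row order of the data (c, b, a). A sketch of the difference, using tapply as a stand-in for that family of helpers and ave for the position-preserving alternative (level_order and row_order are names made up here):

```r
df <- data.frame(class = c("c", "c", "c", "b", "b", "b", "a", "a", "a"),
                 x1 = c(6, 4, 3, 9, 5, 1, 7, 5, 2),
                 x2 = c(90, 80, 70, 40, 50, 60, 20, 30, 10))

# grouped results come back as a, then b, then c
level_order <- unlist(tapply(df$x2, df$class, cumsum), use.names = FALSE)

# ave writes each group's result back to the rows it came from
row_order <- ave(df$x2, df$class, FUN = cumsum)
```

Assigning level_order to df$cumsum would attach class a's totals to the class c rows, whereas row_order lines up with the data as sorted.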
I have the data frame df and I want to subset df based on a number sequence within each level of a categorical variable.
x <- c(1,2,3,4,5,7,9,11,13)
x2 <- x+77
df <- data.frame(x=c(x,x2),y= c(rep("A",9),rep("B",9)))
df
x y
1 1 A
2 2 A
3 3 A
4 4 A
5 5 A
6 7 A
7 9 A
8 11 A
9 13 A
10 78 B
11 79 B
12 80 B
13 81 B
14 82 B
15 84 B
16 86 B
17 88 B
18 90 B
I want only the rows where x increments by 1 and not the rows where x increases by two: e.g.
x y
1 1 A
2 2 A
3 3 A
4 4 A
5 5 A
10 78 B
11 79 B
12 80 B
13 81 B
14 82 B
I figured I have to do some sort of subtraction between elements, check whether the difference is >1, and combine this with ddply, but this seems cumbersome. Is there a sort of sequence function I am missing?
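Using diff within each level of y, so that the jump from the last A row to the first B row is not counted:
df[ave(df$x, df$y, FUN = function(v) c(1, diff(v))) == 1, ]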
Your example seems to behave well and can be nicely handled by @agstudy's answer. Should your data act up one day, though...
myfun <- function(d, whichDiff = 1) {
# d is the data.frame you'd like to subset, containing the variable 'x'
# whichDiff is the difference between values of x you're looking for
theWh <- which(!as.logical(diff(d$x) - whichDiff))
# Take the diff of x, subtract whichDiff to get the desired values equal to 0
# Coerce this to a logical vector and take the inverse (!)
# which() gets the indexes that are TRUE.
# allWh <- sapply(theWh, "+", 1)
# Since the desired rows may be disjoint, use sapply to get each index + 1
# Seriously? sapply to add 1 to a numeric vector? Not even on a Friday.
allWh <- theWh + 1
return(d[sort(unique(c(theWh, allWh))), ])
}
> library(plyr)
>
> ddply(df, .(y), myfun)
x y
1 1 A
2 2 A
3 3 A
4 4 A
5 5 A
6 78 B
7 79 B
8 80 B
9 81 B
10 82 B
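For anyone who would rather not load plyr, the same split, apply and combine steps can be sketched in base R. The helper keep_steps below is a condensed, hypothetical rewrite of myfun, not the original function:

```r
x <- c(1, 2, 3, 4, 5, 7, 9, 11, 13)
df <- data.frame(x = c(x, x + 77), y = rep(c("A", "B"), each = 9))

# Hypothetical helper: keep rows that take part in a step of size `step`
keep_steps <- function(d, step = 1) {
  hit <- which(diff(d$x) == step)      # first row of each matching step
  d[sort(unique(c(hit, hit + 1))), ]   # keep both ends of every step
}

# split by group, filter each piece, and stack the pieces again
res <- do.call(rbind, lapply(split(df, df$y), keep_steps))
rownames(res) <- NULL
```

Because diff is taken inside each piece, the jump from 13 to 78 between the groups never enters the comparison.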