I want to build a data frame like df2 from df1, each time looking for the name of the column whose value is closest to 0:
clossets_1 - the column among x, y and z whose value is closest to 0.
clossets_2 - the column among x and a whose value is closest to 0, because x is the most frequent value in clossets_1.
clossets_3 - the column among a and b whose value is closest to 0, because a is the most frequent value in clossets_2.
df1
# x y z a b
#1 1 2 3 4 3
#2 2 3 4 1 2
#3 3 2 4 2 1
#4 4 3 2 3 6
Desired output:
df2
# x y z clossets_1 a clossets_2 b clossets_3
#1 1 2 3 x 4 x 3 b
#2 2 3 4 x 1 a 2 a
#3 3 2 4 y 2 a 1 b
#4 4 3 2 z 3 a 2 b
Here is the first step to get you started:
cols = c("x","y","z")
df2 = df1
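# which.min(abs(x)) picks the value closest to 0, also for negative values (first column wins on ties)
df2$clossets_1 = cols[apply(df1[,cols], 1, function(x) which.min(abs(x)))]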
df2
## x y z a b clossets_1
## 1 1 2 3 4 3 x
## 2 2 3 4 1 2 x
## 3 3 2 4 2 1 y
## 4 4 3 2 3 6 z
I solved it this way, using the first step of @BigFinger's answer, Closest() from DescTools to pick the value closest to 0, and the mlv() function from the modeest package to find the most frequent value in the closest columns:
library(DescTools)
library(modeest)
library(tibble)
df1 = tibble(x = c(1,2,3,4),
y = c(2,3,2,3),
z = c(3,4,4,2),
clossest_1 = c("x","y","z")[apply(data.frame(x,y,z),1,function(x){which(x == Closest(x,0))})],
a = c(4,1,2,3),
clossest_2 = c(mlv(clossest_1),"a")[apply(data.frame(get(mlv(clossest_1)),a),1,function(x){which(x == Closest(x,0))})],
b = c(3,2,1,2),
clossest_3 = c(mlv(clossest_2),"b")[apply(data.frame(get(mlv(clossest_2)),b),1,function(x){which(x == Closest(x,0))})])
df1
# A tibble: 4 x 8
# x y z clossest_1 a clossest_2 b clossest_3
# <dbl> <dbl> <dbl> <chr> <dbl> <chr> <dbl> <chr>
#1 1 2 3 x 4 x 3 b
#2 2 3 4 x 1 a 2 a
#3 3 2 4 y 2 a 1 b
#4 4 3 2 z 3 a 2 b
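For comparison, here is a minimal base R sketch of the same three steps without DescTools or modeest, assuming the b values from the tibble above (b = c(3, 2, 1, 2)). The helpers closest_col() and most_frequent() are hypothetical names introduced only for this sketch: closest_col() uses which.min(abs(...)) so it finds the value closest to 0 even for negative numbers, and most_frequent() takes the mode with table().

```r
# Hypothetical helper: for each row of df, the name of the column in `cols`
# whose value is closest to 0 (which.min() returns the first column on ties)
closest_col <- function(df, cols) {
  cols[apply(abs(as.matrix(df[, cols])), 1, which.min)]
}

# Hypothetical helper: the most frequent value of a vector
# (on ties, the alphabetically first one, because table() sorts its names)
most_frequent <- function(v) names(which.max(table(v)))

df1 <- data.frame(x = c(1, 2, 3, 4), y = c(2, 3, 2, 3), z = c(3, 4, 4, 2),
                  a = c(4, 1, 2, 3), b = c(3, 2, 1, 2))

# Step 1 compares x, y, z; each later step compares the previous mode with a new column
df1$clossest_1 <- closest_col(df1, c("x", "y", "z"))
df1$clossest_2 <- closest_col(df1, c(most_frequent(df1$clossest_1), "a"))
df1$clossest_3 <- closest_col(df1, c(most_frequent(df1$clossest_2), "b"))
df1
```

On this data it reproduces the clossest_1, clossest_2 and clossest_3 columns shown above.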
I am using the separate_rows function from tidyr.
Essentially, I would like to change the value of the data that is copied. In the example below, the rule would read: "every time a new row is created, multiply z by 0.5".
I already added an index to the original df, so the rule could also be: "every time index N is the same as the previous one ([-1]), multiply z by 0.5".
library(tibble)
df <- tibble(
x = 1:3,
y = c("a", "b,c,d", "e,f"),
z = 1:3
)
# A tibble: 3 x 3
x y z
<int> <chr> <int>
1 1 a 1
2 2 b,c,d 2
3 3 e,f 3
What we get:
> separate_rows(df, y)
# A tibble: 6 x 3
x y z
<int> <chr> <int>
1 1 a 1
2 2 b 2
3 2 c 2
4 2 d 2
5 3 e 3
6 3 f 3
What I need (the z values of rows that were split into several rows, multiplied by 0.5):
# A tibble: 6 x 3
x y z
<int> <chr> <dbl>
1 1 a 1
2 2 b 1
3 2 c 1
4 2 d 1
5 3 e 1.5
6 3 f 1.5
You can group by z and multiply by 0.5 when the group has more than one row (n() > 1):
library(dplyr)
library(tidyr)
df %>%
separate_rows(y) %>%
group_by(z) %>%
mutate(z = ifelse(n() > 1, z*0.5, z))
x y z
<int> <chr> <dbl>
1 1 a 1
2 2 b 1
3 2 c 1
4 2 d 1
5 3 e 1.5
6 3 f 1.5
Another option is to multiply 'z' by 0.5, take the pmax with 1, and then use separate_rows:
library(dplyr)
library(tidyr)
df %>%
mutate(z = pmax(1, z * 0.5)) %>%
separate_rows(y)
Output:
# A tibble: 6 × 3
x y z
<int> <chr> <dbl>
1 1 a 1
2 2 b 1
3 2 c 1
4 2 d 1
5 3 e 1.5
6 3 f 1.5
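Grouping by z assumes that no two original rows share a z value, and the pmax() approach halves z in every row, so it only matches here because the one unsplit row has z = 1. Since the question mentions an index column, here is a minimal sketch that groups on an index instead; the column name id is a hypothetical stand-in for that index.

```r
library(dplyr)
library(tidyr)

df <- tibble(
  x = 1:3,
  y = c("a", "b,c,d", "e,f"),
  z = 1:3
)

out <- df %>%
  mutate(id = row_number()) %>%  # hypothetical index of the original row
  separate_rows(y) %>%
  group_by(id) %>%
  # halve z only when the original row was split into more than one row
  mutate(z = z * if (n() > 1) 0.5 else 1) %>%
  ungroup() %>%
  select(-id)
out
```

Because the rule depends only on how many rows each original row produced, it keeps working when different original rows happen to share a z value.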
This question already has answers here:
How to generate permutations or combinations of object in R?
(3 answers)
Closed 2 years ago.
x = 1:3
y = 1:3
> expand.grid(x = 1:3, y = 1:3)
x y
1 1 1
2 2 1
3 3 1
4 1 2
5 2 2
6 3 2
7 1 3
8 2 3
9 3 3
Using expand.grid gives me all of the combinations. However, I want only pairwise comparisons, that is, I don't want a comparison of 1 vs 1, 2 vs 2, or 3 vs 3. Moreover, I want to keep only the unique pairs, i.e., I want to keep 1 vs 2 (and not 2 vs 1).
In summary, for the above x and y, I want the following 3 pairwise combinations:
x y
1 1 2
2 1 3
3 2 3
Similarly, for x = y = 1:4, I want the following pairwise combinations:
x y
1 1 2
2 1 3
3 1 4
4 2 3
5 2 4
6 3 4
We can use combn:
f1 <- function(x) setNames(as.data.frame(t(combn(x, 2))), c("x", "y"))
f1(1:3)
# x y
#1 1 2
#2 1 3
#3 2 3
f1(1:4)
# x y
#1 1 2
#2 1 3
#3 1 4
#4 2 3
#5 2 4
#6 3 4
Using data.table:
library(data.table)
x <- 1:4
y <- 1:4
CJ(x, y)[x < y]
x y
1: 1 2
2: 1 3
3: 1 4
4: 2 3
5: 2 4
6: 3 4
Actually, you are already very close to the desired output. You just need subset as well:
> subset(expand.grid(x = x, y = y), x < y)
x y
4 1 2
7 1 3
8 2 3
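A tidyverse equivalent of the data.table and subset answers above, as a minimal sketch assuming dplyr and tidyr are installed: crossing() builds every combination of x and y (de-duplicated and sorted), and filter() keeps only the pairs with x < y, which removes both the self-pairs and the mirrored duplicates.

```r
library(dplyr)
library(tidyr)

# All combinations of x and y, then keep only pairs with x < y:
# this drops self-pairs (1 vs 1) and mirrored duplicates (2 vs 1)
pairs <- crossing(x = 1:4, y = 1:4) %>%
  filter(x < y)
pairs
```

Like CJ(), crossing() sorts its inputs, so the result comes out ordered by x and then y without row names to reset.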
Here is another option, though with longer code:
v <- letters[1:5] # dummy data vector
mat <- diag(length(v))
inds <- upper.tri(mat)
data.frame(
x = v[row(mat)[inds]],
y = v[col(mat)[inds]]
)
which gives
x y
1 a b
2 a c
3 b c
4 a d
5 b d
6 c d
7 a e
8 b e
9 c e
10 d e
library(tidyr)
dat <- expand_grid(df = data.frame(x = 1:2, y = c(2, 1)), z = 1:3)
dat
yields
# A tibble: 6 x 2
z df$x $y
<int> <int> <dbl>
1 1 1 2
2 1 2 1
3 2 1 2
4 2 2 1
5 3 1 2
6 3 2 1
I would like to remove the df prefix so that the data frame has three plain columns: z, x, and y.
I have tried unnest but got nowhere. Any advice?
Just don't name the data frame:
library(tidyr)
expand_grid(data.frame(x = 1:2, y = c(2, 1)), z = 1:3)
# A tibble: 6 x 3
x y z
<int> <dbl> <int>
1 1 2 1
2 1 2 2
3 1 2 3
4 2 1 1
5 2 1 2
6 2 1 3
If the dataset is already created, convert it to a regular data.frame with do.call, and rename the columns that start with df:
library(dplyr)
library(stringr)
do.call(data.frame, dat) %>%
rename_at(vars(starts_with('df')), ~ str_remove(., 'df\\.'))
# x y z
#1 1 2 1
#2 1 2 2
#3 1 2 3
#4 2 1 1
#5 2 1 2
#6 2 1 3
Another option is to pull the column 'df' and then bind it with the rest:
dat %>%
pull(df) %>%
bind_cols(z = dat %>%
pull(z))
# x y z
#1 1 2 1
#2 1 2 2
#3 1 2 3
#4 2 1 1
#5 2 1 2
#6 2 1 3
Or use crossing in this case:
crossing(data.frame(x = 1:2, y = c(2, 1)), z = 1:3)
# A tibble: 6 x 3
# x y z
# <int> <dbl> <int>
#1 1 2 1
#2 1 2 2
#3 1 2 3
#4 2 1 1
#5 2 1 2
#6 2 1 3
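If dat already exists, tidyr also provides unpack(), which turns the columns of a data-frame column into ordinary top-level columns. A minimal sketch, assuming tidyr 1.0.0 or later (the version that introduced unpack()):

```r
library(tidyr)

dat <- expand_grid(df = data.frame(x = 1:2, y = c(2, 1)), z = 1:3)

# unpack() splices the columns of the data-frame column `df` into the
# top level, so the result has plain columns x, y and z
flat <- unpack(dat, df)
flat
```

Unlike the do.call() route, this keeps the tibble class and needs no renaming step.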
This question already has answers here:
Sort data frame by two columns (with condition) [duplicate]
(2 answers)
Closed 5 years ago.
mydata <- data.frame(id = c(rep(1, 3), rep(2, 3), rep(3, 3)),
score = c(c(1, 2, 3), c(3, 2, 1), c(1, 3, 2)),
location = c(rep(c("X", "Y", "Z"), 3)))
> mydata
id score location
1 1 1 X
2 1 2 Y
3 1 3 Z
4 2 3 X
5 2 2 Y
6 2 1 Z
7 3 1 X
8 3 3 Y
9 3 2 Z
I would like to sort my data.frame according to score from smallest to largest for each id.
Simply ordering by score ignores the id column.
> mydata[with(mydata, order(score)),]
id score location
1 1 1 X
6 2 1 Z
7 3 1 X
2 1 2 Y
5 2 2 Y
9 3 2 Z
3 1 3 Z
4 2 3 X
8 3 3 Y
Essentially, I want my output to be
id score location
1 1 1 X
2 1 2 Y
3 1 3 Z
4 2 1 Z
5 2 2 Y
6 2 3 X
7 3 1 X
8 3 2 Z
9 3 3 Y
Using base R only.
mydata[order(mydata$id, mydata$score), ]
id score location
1 1 1 X
2 1 2 Y
3 1 3 Z
6 2 1 Z
5 2 2 Y
4 2 3 X
7 3 1 X
9 3 2 Z
8 3 3 Y
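The base R result keeps the original row names (1, 2, 3, 6, 5, 4, ...), while the desired output is numbered 1 to 9. A minimal sketch that sorts the same way and then drops the old row names:

```r
mydata <- data.frame(id = c(rep(1, 3), rep(2, 3), rep(3, 3)),
                     score = c(c(1, 2, 3), c(3, 2, 1), c(1, 3, 2)),
                     location = c(rep(c("X", "Y", "Z"), 3)))

# Sort by id first, then by score within each id
sorted <- mydata[order(mydata$id, mydata$score), ]

# Subsetting keeps the original row names; setting them to NULL
# renumbers the rows 1..9 as in the desired output
rownames(sorted) <- NULL
sorted
```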
You can use dplyr package:
library(dplyr)
mydata %>% arrange(id,score)
# id score location
# 1 1 1 X
# 2 1 2 Y
# 3 1 3 Z
# 4 2 1 Z
# 5 2 2 Y
# 6 2 3 X
# 7 3 1 X
# 8 3 2 Z
# 9 3 3 Y
How to mutate a column with an ID within each group
A data.frame like:
a b c
1 a 1 1
2 a 1 2
3 a 2 3
4 b 1 4
5 b 2 5
6 b 3 6
Group by a; flag starts at 1 in each group; if b equals the previous b, flag stays the same, otherwise flag increases by 1:
a b c flag
1 a 1 1 1 <- group a start with 1
2 a 1 2 1 <-- in group a, 1(in row 2)=1(in row 1)
3 a 2 3 2 <- in group a, 2(in row 3)!=1(in row 2)
4 b 1 4 1 <- group b start with 1
5 b 2 5 2 <- in group b, 2(in row 5)!=1(in row 4)
6 b 3 6 3 <- in group b, 3(in row 6)!=2(in row 5)
I am currently using this:
x$flag <- 1  # initialise flag; row 1 is never visited by the loop
for(i in 2:nrow(x)){
x[i, 'flag'] = ifelse(x[i, 'a']!=x[i-1,'a'], 1, ifelse(x[i, 'b']==x[i-1, 'b'], x[i-1, 'flag'], x[i-1,'flag']+1))
}
but it is inefficient on large datasets.
UPDATE
dense_rank in dplyr gives me the answer:
> x %>% group_by(a) %>% mutate(dense_rank(b))
Source: local data frame [10 x 4]
Groups: a
a b c dense_rank(b)
1 a x 1 1
2 a x 2 1
3 a y 3 2
4 b x 4 1
5 b y 5 2
6 b z 6 3
7 c x 7 1
8 c y 8 2
9 c z 9 3
10 c z 10 3
thanks.
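To check the update on the question's own data, here is a minimal dplyr sketch (the data frame is rebuilt from the first table in the question). Note that dense_rank() ranks b by value within each group, so it equals the run-based flag only when b is sorted within each group, as it is here; the loop above instead counts consecutive changes.

```r
library(dplyr)

x <- data.frame(a = rep(c("a", "b"), each = 3),
                b = c(1, 1, 2, 1, 2, 3),
                c = 1:6)

res <- x %>%
  group_by(a) %>%
  mutate(flag = dense_rank(b)) %>%  # rank of b within each group, without gaps
  ungroup()
res
```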
I am not entirely sure what you are trying to do, but it seems you want to assign index numbers to the values of b within each group of a ("a" or "b").
#I modified your example here.
a <- rep(c("a","b"), each =3)
b <- c(4,4,5,11,12,13)
c <- 1:6
foo <- data.frame(a,b,c, stringsAsFactors = FALSE)
a b c
1 a 4 1
2 a 4 2
3 a 5 3
4 b 11 4
5 b 12 5
6 b 13 6
#Since you referred to dplyr, I will use it.
library(dplyr)
library(data.table)  # provides rbindlist(), used below
cats <- list()
for(i in unique(foo$a)){
ana <- foo %>%
filter(a == i) %>%
arrange(b) %>%
mutate(indexInb = as.integer(as.factor(b)))
cats[[i]] <- ana
}
bob <- rbindlist(cats)
a b c indexInb
1: a 4 1 1
2: a 4 2 1
3: a 5 3 2
4: b 11 4 1
5: b 12 5 2
6: b 13 6 3
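Here's a quick vectorized way to solve this without any for loops. The flag counts how many times b has changed within each group of a, so diff(b) != 0 marks each change.
A base R solution using ave and transform:
transform(x, flag = ave(b, a, FUN = function(x) cumsum(c(1, diff(x) != 0))))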
# a b c flag
# 1 a 1 1 1
# 2 a 1 2 1
# 3 a 2 3 2
# 4 b 1 4 1
# 5 b 2 5 2
# 6 b 3 6 3
Or a data.table solution (more efficient):
library(data.table)
setDT(x)[, flag := cumsum(c(1, diff(b) != 0)), by = a]
x
# a b c flag
# 1: a 1 1 1
# 2: a 1 2 1
# 3: a 2 3 2
# 4: b 1 4 1
# 5: b 2 5 2
# 6: b 3 6 3
Or a dplyr solution (because you tagged it):
library(dplyr)
x %>%
group_by(a) %>%
mutate(flag = cumsum(c(1, diff(b) != 0)))
# Source: local data frame [6 x 4]
# Groups: a
#
# a b c flag
# 1 a 1 1 1
# 2 a 1 2 1
# 3 a 2 3 2
# 4 b 1 4 1
# 5 b 2 5 2
# 6 b 3 6 3
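All three answers compute the flag with diff(), which requires a numeric b. If b can be non-numeric (like the letters in the UPDATE), one option is to compare each value with the previous one directly. A minimal base R sketch, assuming the column names a and b from the question and using the gapped b values from the modified example:

```r
# Example data with gaps in b, as in the modified example above
x <- data.frame(a = rep(c("a", "b"), each = 3),
                b = c(4, 4, 5, 11, 12, 13),
                c = 1:6)

# Within each group of a, the flag starts at 1 and increases by 1 whenever
# b differs from the previous row; comparing values directly (instead of
# using diff()) also works for character b
x$flag <- ave(seq_along(x$b), x$a, FUN = function(i) {
  bi <- x$b[i]
  cumsum(c(TRUE, bi[-1] != bi[-length(bi)]))
})
x
```

Passing row indices to ave() rather than b itself keeps the result an integer flag regardless of b's type.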