Retrieve Top AND Bottom Values from an R Dataframe

Looking for a way to select the top 3 AND bottom 3 rows by value. I have tried using slice_max() in conjunction with slice_min() with no success.
id value
a 0.9
b 0.2
c -0.4
d -0.9
e 0.6
f 0.8
g -0.3
h 0.1
i 0.2
j 0.5
k -0.2
# Desired output:
a 0.9
f 0.8
e 0.6
d -0.9
c -0.4
g -0.3

Using dplyr:
dat %>%
filter(!between(dense_rank(value), 4, n() - 4))
# id value
# 1 a 0.9
# 2 c -0.4
# 3 d -0.9
# 4 e 0.6
# 5 f 0.8
# 6 g -0.3
or
dat %>%
arrange(value) %>%
slice( unique(c(1:3, n() - 0:2)) )
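Since the question mentions slice_max() and slice_min(), a minimal sketch that combines them directly (assumes dplyr >= 1.0; with_ties = FALSE keeps each slice at exactly three rows if values tie at the cutoff):
library(dplyr)
bind_rows(
slice_max(dat, value, n = 3, with_ties = FALSE), # a, f, e
slice_min(dat, value, n = 3, with_ties = FALSE)  # d, c, g
)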

Related

Taking two columns from your dataframe and creating a squared dataframe

Hello, I have a dataframe that contains information on the correlation of two factors, which looks somewhat like this:
Factor_1 Factor_2 value
a b 0.8
a a 1
a d 0.6
b c 0.4
b b 1
a c 0.2
b d 0.75
b a 0.8
c a 0.2
c c 1
c d 0.1
c b 0.4
As you can see, when Factor_1 and Factor_2 have the same value, their correlation is 1. Also, the sets of factor values do not match (Factor_1 has a, b, c while Factor_2 has a, b, c, d).
With this dataframe, I want to create a square dataframe that has the values of Factor_1 as the row and column names, filled with the matching correlation values.
It should look something like this.
a b c
a 1 0.8 0.2
b 0.8 1 0.4
c 0.2 0.4 1
Any way to create this dataframe, using tidyverse?
Thanks in advance!
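For reference, a reproducible construction of the example data that the answers below operate on (a sketch; df is the name the answers assume):
df <- data.frame(
Factor_1 = c("a","a","a","b","b","a","b","b","c","c","c","c"),
Factor_2 = c("b","a","d","c","b","c","d","a","a","c","d","b"),
value = c(0.8, 1, 0.6, 0.4, 1, 0.2, 0.75, 0.8, 0.2, 1, 0.1, 0.4)
)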
in base R:
xtabs(value~Factor_1 + Factor_2, df)[,-4]
Factor_2
Factor_1 a b c
a 1.00 0.80 0.20
b 0.80 1.00 0.40
c 0.20 0.40 1.00
if you need it as a dataframe:
as.data.frame.matrix(xtabs(value~., df)[,1:3])
a b c
a 1.0 0.8 0.2
b 0.8 1.0 0.4
c 0.2 0.4 1.0
You could also use
xtabs(value ~ ., subset(df, Factor_1 %in% Factor_2 & Factor_2 %in% Factor_1))
Factor_2
Factor_1 a b c
a 1.0 0.8 0.2
b 0.8 1.0 0.4
c 0.2 0.4 1.0
You may try
library(dplyr)
library(tidyr)
df %>%
filter(Factor_2 %in% unique(Factor_1), Factor_1 %in% unique(Factor_2)) %>%
arrange(Factor_1, Factor_2) %>%
pivot_wider(id_cols = Factor_1, names_from = Factor_2, values_from = value) %>%
column_to_rownames(var = "Factor_1")
a b c
a 1.0 0.8 0.2
b 0.8 1.0 0.4
c 0.2 0.4 1.0
df %>% arrange(Factor_1,Factor_2) %>%
pivot_wider(names_from = Factor_2, values_from = value)%>%
select(1:(nrow(.)+1))
# A tibble: 3 × 4
Factor_1 a b c
<chr> <dbl> <dbl> <dbl>
1 a 1 0.8 0.2
2 b 0.8 1 0.4
3 c 0.2 0.4 1

Multiply values depending on the values of certain columns

I have two data frames, df and cf. I want to multiply each value of A in df by the coefficient in cf selected by the values of B and C in that row of df.
For example:
the first row of df has A = 20, B = 4 and C = 2, so the matching coefficient is cf[4, 2] = 0.3,
and the result is 20*0.3 = 6.
Is there a simple way to do that in R?
Thanks in advance!!
df
A B C
20 4 2
30 4 5
35 2 2
24 3 3
43 2 1
cf (rows indexed by B, columns by C)
B/C 1 2 3 4 5
1 0.2 0.3 0.5 0.6 0.7
2 0.1 0.5 0.3 0.3 0.4
3 0.9 0.1 0.6 0.6 0.8
4 0.7 0.3 0.7 0.4 0.6
One solution with apply:
#iterate over df's rows
apply(df, 1, function(x) {
x[1] * cf[x[2], x[3]]
})
#[1] 6.0 18.0 17.5 14.4 4.3
Try this vectorized:
df[,1] * cf[as.matrix(df[,2:3])]
#[1] 6.0 18.0 17.5 14.4 4.3
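The matrix-indexing trick works because the values of B and C coincide with cf's row and column positions. A self-contained sketch of the same lookup, with the data rebuilt from the tables above:
df <- data.frame(A = c(20, 30, 35, 24, 43),
B = c(4, 4, 2, 3, 2),
C = c(2, 5, 2, 3, 1))
cf <- matrix(c(0.2, 0.3, 0.5, 0.6, 0.7,
0.1, 0.5, 0.3, 0.3, 0.4,
0.9, 0.1, 0.6, 0.6, 0.8,
0.7, 0.3, 0.7, 0.4, 0.6),
nrow = 4, byrow = TRUE)
# each (B, C) pair selects one cell of cf; multiply element-wise by A
df$A * cf[cbind(df$B, df$C)]
#[1] 6.0 18.0 17.5 14.4 4.3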
A solution using dplyr and a vectorised function:
df = read.table(text = "
A B C
20 4 2
30 4 5
35 2 2
24 3 3
43 2 1
", header=T, stringsAsFactors=F)
cf = read.table(text = "
0.2 0.3 0.5 0.6 0.7
0.1 0.5 0.3 0.3 0.4
0.9 0.1 0.6 0.6 0.8
0.7 0.3 0.7 0.4 0.6
")
library(dplyr)
# function to get the correct element of cf
# vectorised version
f = function(x,y) cf[x,y]
f = Vectorize(f)
df %>%
mutate(val = f(B,C),
result = val * A)
# A B C val result
# 1 20 4 2 0.3 6.0
# 2 30 4 5 0.6 18.0
# 3 35 2 2 0.5 17.5
# 4 24 3 3 0.6 14.4
# 5 43 2 1 0.1 4.3
The final dataset has both result and val in order to check which value from cf was used each time.
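An alternative sketch that avoids Vectorize(): reshape cf into long format and join it onto df by B and C (assumes dplyr and tidyr; the names cf_long and val are illustrative):
library(dplyr)
library(tidyr)
cf_long <- cf %>%
mutate(B = row_number()) %>%
pivot_longer(-B, names_to = "C", values_to = "val") %>%
mutate(C = as.integer(sub("V", "", C))) # read.table named the columns V1..V5
df %>%
left_join(cf_long, by = c("B", "C")) %>%
mutate(result = A * val)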

Expanding rows of data

I have an issue expanding the rows of my data frame. I tried expand from tidyr inside a dplyr chain. The problem is that the function expands the data but changes the order of the expanded element, which is not desired. I want to keep the order of the sp column after the expand.
Here is my attempt
df <- data.frame(label1=letters[1:6],label2=letters[7:12])
sp <- c(-1,0,seq(0.1,0.5,0.1),seq(-2,-2.5,-0.1),seq(0.1,0.5,0.1))
sp
# [1] -1.0 0.0 0.1 0.2 0.3 0.4 0.5 -2.0 -2.1 -2.2 -2.3 -2.4 -2.5 0.1 0.2 0.3 0.4 0.5
library(dplyr)
library(tidyr)
expanded <- df %>%
expand(df, sp)
> head(expanded)
label1 label2 sp
1 a g -2.5
2 a g -2.4
3 a g -2.3
4 a g -2.2
5 a g -2.1
6 a g -2.0
I want to expand df based on the order of sp. How can we do that?
expected output
label1 label2 sp
1 a g -1.0
2 a g 0.0
3 a g 0.1
4 a g 0.2
5 a g 0.3
6 a g 0.4
7 a g 0.5
8 a g -2
9 a g -2.1
10 a g -2.2
11 a g -2.3
12 a g -2.4
13 a g -2.5
14 b h -1.0
15 b h 0.0
16 b h 0.1
and so on
We can match the column 'sp' with the vector sp in the global environment to do the ordering
r1 <- df %>%
expand(df, sp) %>%
arrange(label1, label2, match(sp, unique(.GlobalEnv$sp)))
dim(r1)
#[1] 78 3
identical(unique(r1$sp), unique(sp))
#[1] TRUE
Update
If there are duplicates in the 'sp' vector and we want to expand on all the values, one option is to do the expansion on the sequence of the vector and later change the values
r2 <- df %>%
expand(df, sp=seq_along(sp)) %>%
mutate(sp = .GlobalEnv$sp[sp])
dim(r2)
#[1] 108 3
head(r2, length(sp))
# label1 label2 sp
# 1 a g -1.0
# 2 a g 0.0
# 3 a g 0.1
# 4 a g 0.2
# 5 a g 0.3
# 6 a g 0.4
# 7 a g 0.5
# 8 a g -2.0
# 9 a g -2.1
# 10 a g -2.2
# 11 a g -2.3
# 12 a g -2.4
# 13 a g -2.5
# 14 a g 0.1
# 15 a g 0.2
# 16 a g 0.3
# 17 a g 0.4
# 18 a g 0.5
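If you want every element of sp (duplicates included) for each row of df, as in the update above, a base R sketch with rep() does it without expand() at all:
r3 <- df[rep(seq_len(nrow(df)), each = length(sp)), ]
r3$sp <- rep(sp, times = nrow(df))
rownames(r3) <- NULL
dim(r3)
#[1] 108 3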

How to reset row names?

Here is a sample data set:
sample1 <- data.frame(Names=letters[1:10], Values=sample(seq(0.1,1,0.1)))
When I reorder the data set, the row names no longer run in order:
sample1[order(sample1$Values), ]
Names Values
7 g 0.1
4 d 0.2
3 c 0.3
9 i 0.4
10 j 0.5
5 e 0.6
8 h 0.7
6 f 0.8
1 a 0.9
2 b 1.0
Desired output:
Names Values
1 g 0.1
2 d 0.2
3 c 0.3
4 i 0.4
5 j 0.5
6 e 0.6
7 h 0.7
8 f 0.8
9 a 0.9
10 b 1.0
Try (where Ordersample2 is the reordered data frame)
rownames(Ordersample2) <- 1:10
or more generally
rownames(Ordersample2) <- NULL
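Applied to the sample data above (a minimal sketch):
Ordersample2 <- sample1[order(sample1$Values), ]
rownames(Ordersample2) <- NULL
Ordersample2 # row names now run from 1 to 10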
I had a dplyr usecase:
df %>% as.data.frame(row.names = 1:nrow(.))

Fill nth columns in a dataframe

I have this data frame:
df <- data.frame(A=c("a","b","c","d","e","f","g","h","i"),
B=c("1","1","1","2","2","2","3","3","3"),
C=c(0.1,0.2,0.4,0.1,0.5,0.7,0.1,0.2,0.5))
> df
A B C
1 a 1 0.1
2 b 1 0.2
3 c 1 0.4
4 d 2 0.1
5 e 2 0.5
6 f 2 0.7
7 g 3 0.1
8 h 3 0.2
9 i 3 0.5
I would like to add 1000 further columns and fill these columns with the values generated by:
transform(df, D=ave(C, B, FUN=function(b) sample(b, replace=TRUE)))
I've tried with a for loop but it does not work:
for (i in 4:1000){
df[, 4:1000] <- NA
df[,i] = transform(df, D=ave(C, B, FUN=function(b) sample(b, replace=TRUE)))
}
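The loop fails because transform() returns a whole data frame rather than a single column. For completeness, a corrected sketch of the loop idea (the column names D1, D2, ... are illustrative; the answers below are faster):
for (i in 1:1000) {
df[[paste0("D", i)]] <- ave(df$C, df$B, FUN = function(b) sample(b, replace = TRUE))
}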
For efficiency, I suggest running sample only once for each group. This can be achieved like this:
sample2 <- function(x, size)
{
if(length(x)==1) rep(x, size) else sample(x, size, replace=TRUE)
}
new_df <- do.call(rbind, by(df, df$B,
function(d) cbind(d, matrix(sample2(d$C, length(d$C)*1000),
ncol=1000))))
Notes:
I've created sample2 in case there is a group with only one C value. Check ?sample to see what I mean.
The names of the columns will be numbers, from 1 to 1000. This can be changed as in the answer by @agstudy.
The row names are also changed. "Fixing" them is similar, just use row.names instead of col.names.
Using replicate for example:
cbind(df,replicate(1000,ave(df$C, df$B,
FUN=function(b) sample(b, replace=TRUE))))
To add 4 columns for example:
cbind(df,replicate(4,ave(df$C, df$B,
FUN=function(b) sample(b, replace=TRUE))))
A B C 1 2 3 4
1 a 1 0.1 0.2 0.2 0.1 0.2
2 b 1 0.2 0.4 0.2 0.4 0.4
3 c 1 0.4 0.1 0.1 0.1 0.1
4 d 2 0.1 0.1 0.5 0.5 0.1
5 e 2 0.5 0.7 0.1 0.5 0.1
6 f 2 0.7 0.1 0.7 0.7 0.7
7 g 3 0.1 0.2 0.5 0.2 0.2
8 h 3 0.2 0.2 0.1 0.2 0.1
9 i 3 0.5 0.5 0.5 0.1 0.5
Maybe you need to rename the columns, with something like:
gsub('([0-9]+)', 'D\\1', colnames(res))
#[1] "A" "B" "C" "D1" "D2" "D3" "D4"
