Subsetting and transposing a table by iterating in R

I have this table (inputdf):
Number  Value
1       0.2
1       0.3
1       0.4
2       0.2
2       0.7
3       0.1
and I want to obtain this (outputdf):
Number1  Number2  Number3
0.2      0.2      0.1
0.3      0.7      NA
0.4      NA       NA
I have tried iterating with a for loop through the numbers in column 1 and subsetting the dataframe by each number, but I have trouble appending the result to an output dataframe:
inputdf <- read.table("input.txt", sep="\t", header = TRUE)
outputdf <- data.frame()
i=1
total=3 ###user has to modify it
for(i in seq(1:total)) {
cat("Collecting values for number", i, "\n")
values <- subset(input, Number == i, select=c(Value))
cbind(outputdf, NewColumn= values, )
names(outputdf)[names(outputdf) == "NewColumn"] <- paste0("Number", i)
}
Any help or hints would be very welcome. Thanks in advance!

In the tidyverse, you can create an id for each element within its group and then use tidyr::pivot_wider (dat here is the input data frame):
library(tidyverse)
dat %>%
group_by(Number) %>%
mutate(id = row_number()) %>%
pivot_wider(names_from = Number, names_prefix = "Number", values_from = "Value")
# A tibble: 3 × 4
     id Number1 Number2 Number3
  <int>   <dbl>   <dbl>   <dbl>
1     1     0.2     0.2     0.1
2     2     0.3     0.7    NA
3     3     0.4    NA      NA
In base R, the same idea applies: create the id column, then reshape to wide:
transform(dat, id = with(dat, ave(rep(1, nrow(dat)), Number, FUN = seq_along))) |>
reshape(direction = "wide", timevar = "Number")
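The same result can also be reached without an id column. The following is a minimal base R sketch, not taken from the answers above: it assumes the question's data is in inputdf (rebuilt inline here so the snippet runs on its own), splits Value by Number, and pads each group with NA up to the length of the longest group.

```r
# Rebuild the question's input inline (assumption: same values as input.txt)
inputdf <- data.frame(
  Number = c(1, 1, 1, 2, 2, 3),
  Value  = c(0.2, 0.3, 0.4, 0.2, 0.7, 0.1)
)

# One numeric vector per distinct Number
groups <- split(inputdf$Value, inputdf$Number)

# Indexing past the end of a vector yields NA, which pads the short groups
max_len <- max(lengths(groups))
padded  <- lapply(groups, function(v) v[seq_len(max_len)])

# Equal-length vectors can now be bound into columns
outputdf <- as.data.frame(padded)
names(outputdf) <- paste0("Number", names(groups))
outputdf
```

Because the groups come from split(), the number of distinct values does not have to be set by hand as it was with total in the question.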

Related

Add new column with appropriate values from separate list

I am trying to fill a new column with appropriate values from a list using dplyr. I tried to come up with a simple reproducible example, which can be found below. In short, I want to add a column "Param" to a dataframe, based on the values of the existing columns. The matching values are found in a separate list. I've tried functions such as ifelse() and switch(), but I cannot make it work. Any tips on how this can be achieved?
Thank you in advance!
library(dplyr)
# Dataframe to start with
df <- as.data.frame(matrix(data = c(rep("A", times = 3),
rep("B", times = 3),
rep(1:3, times = 2)), ncol = 2))
colnames(df) <- c("Method", "Type")
df
#> Method Type
#> 1 A 1
#> 2 A 2
#> 3 A 3
#> 4 B 1
#> 5 B 2
#> 6 B 3
# Desired dataframe
desired <- cbind(df, Param = c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4))
desired
#> Method Type Param
#> 1 A 1 0.9
#> 2 A 2 0.8
#> 3 A 3 0.7
#> 4 B 1 0.6
#> 5 B 2 0.5
#> 6 B 3 0.4
# Failed attempt
param <- list("A" = c("1" = 0.9, "2" = 0.8, "3" = 0.7),
"B" = c("1" = 0.6, "2" = 0.5, "3" = 0.4))
param
#> $A
#> 1 2 3
#> 0.9 0.8 0.7
#>
#> $B
#> 1 2 3
#> 0.6 0.5 0.4
df %>%
mutate(Param = ifelse(.$Method == "A", param$A[[.$Type]],
ifelse(.$Method == "B", param$B[[.$Type]], NA)))
#> Error: Problem with `mutate()` column `Param`.
#> ℹ `Param = ifelse(...)`.
#> x attempt to select more than one element in vectorIndex
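If the rows of df are in the same order as the elements of param (as they are here), you can unlist your list and just add it to your df. Note that this assigns by position, not by matching Method and Type.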
df$Param <- unlist(param)
Method Type Param
1 A 1 0.9
2 A 2 0.8
3 A 3 0.7
4 B 1 0.6
5 B 2 0.5
6 B 3 0.4
As mentioned by @dario, including the matching data in a dataframe would be easier.
library(dplyr)
library(tidyr)
df %>%
nest(data = Type) %>%
left_join(stack(param) %>% nest(data1 = values), by = c('Method' = 'ind')) %>%
unnest(c(data, data1))
# Method Type values
# <chr> <chr> <dbl>
#1 A 1 0.9
#2 A 2 0.8
#3 A 3 0.7
#4 B 1 0.6
#5 B 2 0.5
#6 B 3 0.4
Sure, this could be cleaner, but it will get the job done.
Option 1:
df %>%
mutate(
Param = unlist(param)[
match(
paste0(
df$Method,
df$Type
),
names(
do.call(
c,
lapply(
param,
names
)
)
)
)
]
)
Option 2 (cleaner version; pivot_longer comes from tidyr):
df %>%
type.convert(as.is = TRUE) %>%
left_join(
do.call(cbind, param) %>%
data.frame() %>%
mutate(Type = as.integer(row.names(.))) %>%
pivot_longer(!Type, names_to = "Method", values_to = "Param"),
by = c("Type", "Method")
)
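The error in the question arises because `[[` selects exactly one element, while `.$Type` inside mutate() is the whole column. A minimal base R sketch of a per-row lookup follows. It is not one of the answers above, the helper name lookup is made up for illustration, and the data is rebuilt inline with Type as character, as it is in the question's matrix-based df.

```r
# Question's data, rebuilt inline (Type is character, as in the original df)
df <- data.frame(
  Method = rep(c("A", "B"), each = 3),
  Type   = as.character(rep(1:3, times = 2)),
  stringsAsFactors = FALSE
)
param <- list(A = c("1" = 0.9, "2" = 0.8, "3" = 0.7),
              B = c("1" = 0.6, "2" = 0.5, "3" = 0.4))

# Hypothetical helper: look up one (Method, Type) pair by name
lookup <- function(m, t) param[[m]][[t]]

# mapply() calls lookup() once per row, so [[ ]] only ever sees single values
df$Param <- mapply(lookup, df$Method, df$Type, USE.NAMES = FALSE)
df
```

Since each value is found by name, this does not depend on the rows of df being in the same order as the list.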

How to divide long format R dataframe by a factor and put the factor before divided dataframe?

I'm trying to split a long-format dataframe by a factor (e.g. by subject) and then put the factor level (the subject) before each subject's data as a label. The simplified dataframe looks like this: columns X and Y are numbers, and column Subject is a factor. The real dataset has hundreds of subjects.
X <- c(1,1,2,2)
Y <- c(0.2, 0.3, 1, 0.5)
Subject <- as.factor(c("A", "A", "B", "B"))
M <- tibble(X,Y,Subject)
> M
# A tibble: 4 x 3
X Y Subject
<dbl> <dbl> <fct>
1 1 0.2 A
2 1 0.3 A
3 2 1 B
4 2 0.5 B
The resulting dataframe should look like this:
> M_trans
A
1 0.2
1 0.3
B
2 1
2 0.5
Thank you for your help!
I tried this code and it produces the output below. I couldn't find a way to keep the subject labels as a factor, because each column in R is a vector of a single type. If you find a better solution, please post it.
X <- c(1,1,2,2,3,3)
Y <- c(0.2, 0.3, 1, 0.5,0.2,0.9)
Subject <- as.factor(c("A", "A", "B", "B","C","C"))
M <- tibble(X,Y,Subject)
unq_subjects <- unique(Subject)
final <- data.frame()
for (i in 1: length(unique(Subject)))
{
sub <- unq_subjects[i]
tmp <- as.data.frame(M %>% filter(Subject == sub) %>%
select(-Subject) %>%
add_row(X = sub, Y = NA) %>%
arrange(desc(X)))
final <- union_all(tmp,final)
}
final
Output:
X Y
1 C NA
2 3 0.2
3 3 0.9
4 B NA
5 2 1.0
6 2 0.5
7 A NA
8 1 0.2
9 1 0.3
Does it answer your question now?
Using dplyr and tidyr
library(dplyr)
library(tidyr)
M %>%
group_by(Subject) %>%
nest()
Hope this helps!
Here is an inelegant solution that worked for me, inspired by Bertil Baron's answer. I would be happy to get any simpler code...
trans_output <- function(M) {
  # One row per subject, with that subject's rows nested in the `data` column
  M1 <- M %>%
    group_by(Subject) %>%
    nest()
  df <- NULL
  for (i in seq_len(nrow(M1))) {
    output2 <- M1$data[[i]]
    df_sub <- rbind(as.character(M1$Subject[[i]]), # subject ID
                    output2)                       # output data
    idx <- c(1L)
    df_sub <- df_sub %>%
      mutate(Y = ifelse(row_number() %in% idx, NA, Y)) %>% # NA on the label row; else, stay as Y
      transmute(X = X,
                Y = as.numeric(Y))
    df <- rbind(df, df_sub)
    rm(df_sub)
  }
  return(df)
}
M_trans <- trans_output(M)
The output looks like this:
> M_trans
# A tibble: 6 x 2
X Y
<chr> <dbl>
1 A NA
2 1 0.2
3 2 0.3
4 B NA
5 3 1
6 4 0.5
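For comparison, the label-row idea can be sketched in base R without a loop counter. This is only an illustration under stated assumptions: M is rebuilt inline as a plain data frame from the question's values, and X is converted to character because one column cannot hold both numbers and subject labels.

```r
# Question's data as a plain data frame
M <- data.frame(
  X = c(1, 1, 2, 2),
  Y = c(0.2, 0.3, 1, 0.5),
  Subject = factor(c("A", "A", "B", "B"))
)

# For each subject: one label row (Y is NA), followed by that subject's rows
pieces <- lapply(split(M, M$Subject, drop = TRUE), function(g) {
  rbind(
    data.frame(X = as.character(g$Subject[1]), Y = NA_real_,
               stringsAsFactors = FALSE),
    data.frame(X = as.character(g$X), Y = g$Y,
               stringsAsFactors = FALSE)
  )
})

# Stack the per-subject pieces in factor-level order
M_trans <- do.call(rbind, pieces)
rownames(M_trans) <- NULL
M_trans
```

split() returns the groups in factor-level order, so the subjects appear as A, then B, and the approach scales to any number of subjects.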

Identifying Differences in Rows using Groupby a Column

I have this reproducible dataframe:
df <- data.frame(ID = c("A", "A", "B", "B", "B","C", "C", "D"), cost = c("0.5", "0.4", "0.7", "0.8", "0.5", "1.3", "1.3", "2.6"))
I'm trying to group by ID, test whether there are differences in the cost column within each group, and record the result in a new column called Testdiff.
Intermediate Output
ID cost Testdiff
1 A 0.5 Y
2 A 0.4 Y
3 B 0.7 Y
4 B 0.8 Y
5 B 0.5 Y
6 C 1.3 N
7 C 1.3 N
8 D 2.6 N
I'm looking at using dplyr to do this, but I'm unsure if match is the correct function.
df %>% group_by(ID) %>% mutate(Testdiff = ifelse(match(cost) == T, "Y", "N"))
Once that is completed, I want to keep the 1st row of the unique ID, giving me this output
ID cost Testdiff
1 A 0.5 Y
2 B 0.7 Y
3 C 1.3 N
4 D 2.6 N
We could use n_distinct and then slice
library(dplyr)
df %>%
group_by(ID) %>%
mutate(Testdiff = n_distinct(cost) > 1) %>%
slice(1)
# ID cost Testdiff
# <fct> <fct> <lgl>
#1 A 0.5 TRUE
#2 B 0.7 TRUE
#3 C 1.3 FALSE
#4 D 2.6 FALSE
If you want the output to be "Y"/"N" instead of TRUE/FALSE:
df %>%
group_by(ID) %>%
mutate(Testdiff = ifelse(n_distinct(cost) > 1, "Y", "N")) %>%
slice(1)
We could use ave and aggregate to solve it using base R
df$Testdiff <- ifelse(with(df, ave(as.integer(factor(cost)), ID, FUN = function(x)
length(unique(x)))) > 1, "Y", "N")
aggregate(.~ID, df, head, n = 1)
# ID cost Testdiff
#1 A 0.5 Y
#2 B 0.7 Y
#3 C 1.3 N
#4 D 2.6 N
Since we have dplyr and base R already why not add in data.table:
library(data.table)
setDT(df)[, .(cost = cost[1], testdiff = uniqueN(cost) > 1), by = ID]
ID cost testdiff
1: A 0.5 TRUE
2: B 0.7 TRUE
3: C 1.3 FALSE
4: D 2.6 FALSE
A different tidyverse possibility could be:
df %>%
group_by(ID) %>%
mutate(Testdiff = ifelse(all(cost == first(cost)), "N", "Y")) %>%
filter(row_number() == 1)
ID cost Testdiff
<fct> <fct> <chr>
1 A 0.5 Y
2 B 0.7 Y
3 C 1.3 N
4 D 2.6 N
Or:
df %>%
group_by(ID) %>%
mutate(Testdiff = ifelse(all(cost == first(cost)), "N", "Y")) %>%
top_n(1, wt = desc(row_number()))
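All of the answers above follow the same two steps: count the distinct cost values per ID, then keep the first row of each ID. A minimal base R sketch of those two steps, assuming the question's df with character columns (rebuilt inline), could look like this:

```r
df <- data.frame(
  ID   = c("A", "A", "B", "B", "B", "C", "C", "D"),
  cost = c("0.5", "0.4", "0.7", "0.8", "0.5", "1.3", "1.3", "2.6"),
  stringsAsFactors = FALSE
)

# Step 1: number of distinct cost values per ID (result is named by ID)
n_costs <- tapply(df$cost, df$ID, function(x) length(unique(x)))

# Step 2: keep the first row of each ID
out <- df[!duplicated(df$ID), ]
rownames(out) <- NULL

# Look up each remaining ID's count by name and flag IDs with more than one value
out$Testdiff <- ifelse(as.vector(n_costs[out$ID]) > 1, "Y", "N")
out
```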

use user defined function after grouped data in R

I have a function that counts the zeros in each column of a large dataframe. Now I want to count the zeros in each column after grouping by category.
Here is the example:
zero_rate <- function(df) {
z_rate_list <- sapply(df, function(x) {
data.frame(
n_zero=length(which(x==0)),
n=length(x),
z_rate=length(which(x==0))/length(x))
})
d <- data.frame(z_rate_list)
d <- sapply(d, unlist)
d <- as.data.frame(d)
return(d)}
df = data.frame(var1=c(1,0,NA,4,NA,6,7,0,0,10),var2=c(11,NA,NA,0,NA,16,0,NA,19,NA))
df1= data.frame(cat = c(1,1,1,1,1,2,2,2,2,2),df)
zero_rate_df = df1 %>% group_by(cat) %>% do( zero_rate(.))
Here zero_rate(df) works just as I expected. But when I group the data by cat and calculate in each category the zero_rate for each column, the result is not as I expected.
I expect something like this:
cat var1 var2
1 n_zero 1 1
n 5 5
z_rate 0.2 0.2
2 n_zero 2 1
n 5 5
z_rate 0.4 0.2
Any suggestion? Thank you.
I came up with the following code. .[-1] was used to remove grouping col:
zero_rate <- function(df){
res <- lapply(df, function(x){
y <- c(sum(x == 0, na.rm = T), length(x))
c(y, y[1]/y[2])
})
res <- do.call(cbind.data.frame, res)
res$vars <- c('n_zero', 'n', 'z_rate')
res
}
df1 %>% group_by(cat) %>% do( zero_rate(.[-1]))
# cat var1 var2 vars
# <dbl> <dbl> <dbl> <chr>
# 1 1 1.0 1.0 n_zero
# 2 1 5.0 5.0 n
# 3 1 0.2 0.2 z_rate
# 4 2 2.0 1.0 n_zero
# 5 2 5.0 5.0 n
# 6 2 0.4 0.2 z_rate
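The grouped call can also be written with split() and lapply() in base R. The sketch below is an illustration, not the answerer's code: it assumes the question's df1 (rebuilt inline) and uses a vapply()-based variant of zero_rate so that every column is guaranteed to return exactly three numbers.

```r
df1 <- data.frame(
  cat  = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  var1 = c(1, 0, NA, 4, NA, 6, 7, 0, 0, 10),
  var2 = c(11, NA, NA, 0, NA, 16, 0, NA, 19, NA)
)

# Per-column zero count, length, and zero rate; NAs are not counted as zeros
zero_rate <- function(d) {
  stats <- vapply(d, function(x) {
    n_zero <- sum(x == 0, na.rm = TRUE)
    c(n_zero = n_zero, n = length(x), z_rate = n_zero / length(x))
  }, c(n_zero = 0, n = 0, z_rate = 0))
  data.frame(vars = rownames(stats), stats, row.names = NULL)
}

# Apply zero_rate to each category, dropping the grouping column first
by_cat <- lapply(split(df1[-1], df1$cat), zero_rate)

# Put the category label back and stack the pieces
result <- do.call(rbind, lapply(names(by_cat), function(k) {
  cbind(cat = k, by_cat[[k]])
}))
rownames(result) <- NULL
result
```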

Reshape column values to column names

I've got a dataset with the following structure:
df <- data.frame(mult=c(1,2,3,4),red=c(1,0.9,0.8,0.7),
result=c('value1','value2','value3','value4'))
that I'd like to display in a 3-D plot (x axis: mult, y axis: red, and the x-y points would be 'result') or multiple 2-D plots. Obviously the real DF has a lot more rows and combinations of mult&red.
Columns mult & red do not have values repeated. What I'd like is to reshape DF to DF1:
-    1       0.9     0.8     0.7
1    value1
2            value2
3                    value3
4    .....
so essentially:
1) [mult] values stays as it is (column 1)
2) [red] values become the column names.
3) Each cross between 'mult' and 'red' is a value in the new DF
My preference would be to do this with the reshape function, but other packages are fine too.
Thanks in advance, p.
Try
library(reshape2)
df1 <- transform(df, result=as.character(result),
red= factor(red, levels= unique(red)))
dcast(df1, mult~red, value.var='result', fill='')[-1]
#       1    0.9    0.8    0.7
#1 value1
#2        value2
#3               value3
#4                      value4
Here is a way using tidyr
library(tidyr)
out = rev(spread(df[-1], red, result))
out[is.na(out)] = ''
#> out
#       1    0.9    0.8    0.7
#1 value1
#2        value2
#3               value3
#4                      value4
Using reshape as you requested
df <- data.frame(mult=c(1,2,3,4),red=c(1,0.9,0.8,0.7),
result=c('value1','value2','value3','value4'))
df$result = as.character(df$result)
dfWide = reshape(data = df, idvar = "mult", timevar = "red", v.names = "result", direction = "wide")
rownames(dfWide) = dfWide$mult
dfWide$mult = NULL
colnames(dfWide) = gsub(pattern = "result.", replacement = "", colnames(dfWide) )
dfWide[is.na(dfWide)] = ''
dfWide
#        1    0.9    0.8    0.7
# 1 value1
# 2        value2
# 3               value3
# 4                      value4
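If a plain character matrix is acceptable instead of a data frame, the same layout can be built with matrix indexing in base R. This is a sketch under that assumption, not one of the answers above: row names hold the mult values, column names hold the red values, and a two-column character matrix addresses one cell per input row.

```r
df <- data.frame(
  mult   = c(1, 2, 3, 4),
  red    = c(1, 0.9, 0.8, 0.7),
  result = c("value1", "value2", "value3", "value4"),
  stringsAsFactors = FALSE
)

# Row and column labels, in order of first appearance
rows <- as.character(unique(df$mult))
cols <- as.character(unique(df$red))

# Start with an empty grid, then fill one cell per (mult, red) pair
wide <- matrix("", nrow = length(rows), ncol = length(cols),
               dimnames = list(rows, cols))
wide[cbind(as.character(df$mult), as.character(df$red))] <- df$result
wide
```

Cells with no matching (mult, red) combination stay as empty strings, which matches the blank cells in the outputs above.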
