Rstudio Columns Multiple Binary Features

I want to split a column into multiple binary dummy columns. My data frame, df:
id size age
1 6 10
2 7 11
3 8 10
At the moment I have this code, using the qdapTools and caret packages:
df <- cbind(df[1:3], mtabulate(strsplit(as.character(df$age), ':')))
My question: how can I name these dummy columns so that I get this:
id size age_10 age_11
1 6 1 0
2 7 0 1
3 8 1 0

You can try dummy.data.frame from the dummies package.
library(dummies)
library(dplyr)
df %>%
dummy.data.frame(names="age", sep="_")
Output is:
id size age_10 age_11
1 1 6 1 0
2 2 7 0 1
3 3 8 1 0
Sample data:
df <- structure(list(id = 1:3, size = 6:8, age = c(10L, 11L, 10L)), .Names = c("id",
"size", "age"), class = "data.frame", row.names = c(NA, -3L))
Update:
For the following error, which you are getting on your actual data,
Error in sort.list(y) : 'x' must be atomic for 'sort.list' Have you
called 'sort' on a list?
you can use the code below, which passes the data through data.frame() first:
library(dummies)
library(dplyr)
df %>%
data.frame() %>%
dummy.data.frame(names="Verkoopkanaal_groepering", sep="_")
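
In case the dummies package is not available in your R installation, the same result can be sketched in base R with model.matrix. This is only an illustration on the sample data above; the names dummies and result are made up for the example.

```r
# Sample data from the answer above
df <- data.frame(id = 1:3, size = 6:8, age = c(10L, 11L, 10L))

# One indicator column per distinct age; "- 1" drops the intercept
# so that every level gets its own column
dummies <- model.matrix(~ factor(age) - 1, data = df)

# Replace the default "factor(age)10" style names with "age_10" style names
colnames(dummies) <- paste0("age_", levels(factor(df$age)))

# Keep id and size, then append the dummy columns
result <- cbind(df[c("id", "size")], as.data.frame(dummies))
print(result)
```

The indicator columns come back as numeric 0/1 values rather than integers, which is usually fine for modelling.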

To rename by index: colnames(df)[4:5] <- c("age_10", "age_11")
To rename by existing column name: colnames(df)[colnames(df) == "INSERT_COL_NAME"] <- "NEW_COL_NAME"
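
When there are many dummy columns, typing each new name by hand gets tedious. A minimal sketch that builds the names by prefixing the existing ones, assuming the tabulation step left columns 4 and 5 named "10" and "11" (that starting state is an assumption made for this example):

```r
# Assumed starting point: the original three columns plus two unnamed-looking dummies
df <- data.frame(id = 1:3, size = 6:8, age = c(10L, 11L, 10L),
                 `10` = c(1, 0, 1), `11` = c(0, 1, 0),
                 check.names = FALSE)

# Prefix the existing names of columns 4 and 5 instead of listing them manually
colnames(df)[4:5] <- paste0("age_", colnames(df)[4:5])
print(colnames(df))
```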

Related

Updating Values in New Data with Old Data

I have the following data frame:
library(dplyr)
old_data = data.frame(id = c(1,2,3), var1 = c(11,12,13))
> old_data
id var1
1 1 11
2 2 12
3 3 13
I want to replace the values in the 2nd row of "old_data" with the data in "new_data" (i.e. update the rows in "old_data" whose id also appears in "new_data"):
new_data = data.frame(id = c(4,2,5), var1 = c(11,15,13))
> new_data
id var1
1 4 11
2 2 15
3 5 13
Using the answer found here (Update rows of data frame in R), I tried to do this with the "dplyr" library:
update = old_data %>%
rows_update(new_data, by = "id")
But this gave me the following error:
Error: Attempting to update missing rows.
Run `rlang::last_error()` to see where the error occurred.
This is what I am trying to get:
id var1
1 1 11
2 2 15
3 3 13
Can someone please tell me what I am doing wrong?
Thanks!

A little bit messy, but this works (on this sample data at least):
old_data %>%
left_join(new_data,by="id") %>%
mutate(var1 = if_else(!is.na(var1.y),var1.y,var1.x)) %>%
select(id,var1)
# id var1
#1 1 11
#2 2 15
#3 3 13
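
The same join can be written a little more compactly with dplyr::coalesce, which takes the first non-missing value per row. This is a sketch on the sample data; the suffixes "_old" and "_new" and the name updated are chosen here for readability.

```r
library(dplyr)

old_data <- data.frame(id = c(1, 2, 3), var1 = c(11, 12, 13))
new_data <- data.frame(id = c(4, 2, 5), var1 = c(11, 15, 13))

updated <- old_data %>%
  # Explicit suffixes make it obvious which var1 came from which table
  left_join(new_data, by = "id", suffix = c("_old", "_new")) %>%
  # Prefer the new value; fall back to the old one where there was no match
  mutate(var1 = coalesce(var1_new, var1_old)) %>%
  select(id, var1)

print(updated)
```

Note that, like the if_else version, this treats a genuine NA in new_data as "no update".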

A base R approach using match -
inds <- match(old_data$id, new_data$id)
old_data$var1[!is.na(inds)] <- na.omit(new_data$var1[inds])
old_data
# id var1
#1 1 11
#2 2 15
#3 3 13

A data.table approach (converting the data.table back into a data frame at the end):
library(data.table)
as.data.frame(setDT(old_data)[new_data, var1 := .(i.var1), on = "id"])
Output
id var1
1 1 11
2 2 15
3 3 13
An alternative tidyverse option using rows_update. You can filter new_data to only have ids that appear in old_data. Then, you can update those values, like you had previously tried. Essentially, new_data must only have id values that appear in old_data.
library(tidyverse)
old_data %>%
rows_update(., new_data %>% filter(id %in% old_data$id), by = "id")
Data
old_data <-
structure(list(id = c(1, 2, 3), var1 = c(11, 12, 13)),
class = "data.frame",
row.names = c(NA,-3L))
new_data <-
structure(list(id = c(4, 2, 5), var1 = c(11, 15, 13)),
class = "data.frame",
row.names = c(NA,-3L))

We can use dplyr::rows_update if we first use a semi_join on new_data to filter only those ids that are included in old_data.
library(dplyr)
old_data %>%
rows_update(new_data %>%
semi_join(old_data, by = "id"),
by = "id")
#> id var1
#> 1 1 11
#> 2 2 15
#> 3 3 13
Created on 2021-12-29 by the reprex package (v0.3.0)
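
If your dplyr is version 1.1.0 or later, rows_update also has an unmatched argument, so the pre-filtering step can be skipped. A short sketch, assuming that version is installed:

```r
library(dplyr)

old_data <- data.frame(id = c(1, 2, 3), var1 = c(11, 12, 13))
new_data <- data.frame(id = c(4, 2, 5), var1 = c(11, 15, 13))

# unmatched = "ignore" silently skips ids in new_data that are absent
# from old_data (requires dplyr >= 1.1.0; the default is "error")
updated <- rows_update(old_data, new_data, by = "id", unmatched = "ignore")
print(updated)
```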

Summarizing a dataframe in R with multiple functions in place?

I am new to R and trying to summarize a dataframe with multiple functions and I would like the result to appear in the same column, rather than in separated columns for each function. For example, my data set looks something like this
data =
A B
----
1 2
2 2
3 2
4 2
When I call summarize_all(data, c(min, max)), the dataframe becomes
A_fn1 B_fn1 A_fn2 B_fn2
1 2 4 2
How can I make it so that the result of the summarize_all becomes this:
A B
----
1 2
4 2
Thanks

Does this work:
library(dplyr)
bind_rows(apply(data,2,min),apply(data,2,max))
# A tibble: 2 x 2
A B
<dbl> <dbl>
1 1 2
2 4 2
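
A base R sketch of the same idea that keeps the result as a data frame with one row per function. The name summary_df is made up for this example, and data is rebuilt here from the table in the question.

```r
data <- data.frame(A = 1:4, B = c(2L, 2L, 2L, 2L))

# Apply both functions to each column; every column yields a length-2 vector,
# so the list converts cleanly to a data frame with rows (min, max)
summary_df <- as.data.frame(
  lapply(data, function(col) c(min(col), max(col)))
)
print(summary_df)
```

Any set of functions works the same way, as long as each returns a single value per column.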

Here is an option with transpose
library(dplyr)
library(tidyr)
pivot_longer(df1, cols = everything()) %>%
group_by(name) %>%
summarise(min = min(value), max = max(value)) %>%
data.table::transpose(., make.names = 'name')
A B
1 1 2
2 4 2
data
df1 <- structure(list(A = 1:4, B = c(2L, 2L, 2L, 2L)),
class = "data.frame", row.names = c(NA,
-4L))

How do I merge multiple contingency tables into one using R?

I have multiple columns that I need to merge and return a contingency table counting each number.
Example of an ordinal data set:
df <- data.frame(ab = c(1,2,3,4,5),
ba = c(1,3,3,3,5))
>ab ba
1 1
2 3
3 3
4 3
5 5
I would like to be able to return a contingency table like this:
>1 2 3 4 5
2 1 4 1 2
I've attempted examples featured on here for similar issues, but I get the sums returned instead of a count:
library(plyr)
colSums(rbind.fill(data.frame(t(unclass(df$ab))), data.frame(t(unclass(df$ba)))),
na.rm = TRUE)
Any help is greatly appreciated

We unlist the data.frame into a vector and apply table in base R
table(unlist(df))
# 1 2 3 4 5
# 2 1 4 1 2
Or with tidyverse, by reshaping the data into 'long' format with pivot_longer and get the count
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = everything()) %>%
count(value)
data
df <- structure(list(ab = 1:5, ba = c(1L, 3L, 3L, 3L, 5L)),
class = "data.frame", row.names = c(NA,
-5L))
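
With ordinal data, a scale value that never occurs is dropped from table(unlist(df)). A small sketch that fixes the set of levels up front so that zero counts are kept; the 1 to 6 scale is an assumption made for illustration.

```r
df <- data.frame(ab = c(1, 2, 3, 4, 5), ba = c(1, 3, 3, 3, 5))

# Declaring the levels explicitly keeps values with a count of zero
# (here the hypothetical scale point 6)
counts <- table(factor(unlist(df), levels = 1:6))
print(counts)
```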

For Loop over data frame, using dplyr results in error

I have a data frame (simplified here) with 71 columns and N rows. What I want is a frequency table of the values in the first column for each of the other columns (all other columns are dummies). Simplified to only 4 columns, it looks like this:
df <- data.frame(sample(1:8,20,replace=T),sample(0:1,20,replace = T),sample(0:1,20,replace = T),sample(0:1,20,replace = T))
I have tried this for loop with dplyr (where x is the first column, holding the 8 different values). It works without problems for the first 10 or 11 columns, but after that it only generates NAs and returns the error shown below the code:
freq_df <- data.frame(matrix(NA, nrow=8, ncol=71))
for (i in 2:71){
freq_df[,i] <- df %>%
filter(df[i]==1) %>%
count(x) %>%
select(n)
}
in `[<-.data.frame`(`*tmp*`, , i, value = list(n = c(3L, 5L, 8L, :
replacement element 1 has 7 rows, need 8
Does anyone know why R returns this error? Thank you for your help!

Your error is because not all first column values will occur where other columns are 1. You have 8 unique values in the first column, maybe you have 7 when you filter on the 11th column == 1. So the results can have different lengths, which is a problem.
Try this instead, I think it's what you're trying to do. (If not, please clarify your goal by showing the expected output.)
names(df) = paste0("V", 1:4)
df %>%
group_by(V1) %>%
summarize(across(everything(), sum, .names = "{.col}_count"))
# V1 V2_count V3_count V4_count
# <int> <int> <int> <int>
# 1 1 1 0 1
# 2 2 2 1 2
# 3 3 3 3 2
# 4 4 0 0 0
# 5 5 0 0 0
# 6 6 3 1 2
# 7 7 3 1 1
# 8 8 1 1 0

In base R, we can do
names(df) <- paste0("V", 1:4)
out <- aggregate(.~ V1, df, sum, na.rm = TRUE)
names(out)[-1] <- paste0(names(out)[-1], "_count")
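
If you want to keep the column-by-column structure of the original loop, the length mismatch can be avoided by counting into a fixed number of bins. A sketch with made-up column names (x, d1, d2, d3) and a fixed seed, using tabulate so that every column produces exactly 8 counts:

```r
set.seed(42)
df <- data.frame(
  x  = sample(1:8, 20, replace = TRUE),
  d1 = sample(0:1, 20, replace = TRUE),
  d2 = sample(0:1, 20, replace = TRUE),
  d3 = sample(0:1, 20, replace = TRUE)
)

# For each dummy column, count how often each value 1..8 of x occurs
# among the rows where the dummy equals 1. nbins = 8 guarantees a
# length-8 result even when some values of x never occur.
freq <- sapply(df[-1], function(dummy) tabulate(df$x[dummy == 1], nbins = 8))

freq_df <- cbind(x = 1:8, as.data.frame(freq))
print(freq_df)
```

This relies on x holding the integers 1 to 8; for other kinds of values, table(factor(x, levels = ...)) plays the same role.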

Collapse data frame into single row and creating new columns based on row R

I have a data frame with object names and a list of statistical moments for that object, like this:
Object Mean IQR Skew
x 1 1 1
y 2 2 2
z 3 3 3
What I want is to turn each row into columns holding its statistical moments, with the object name as a prefix. Like so:
xMean xIQR xSkew yMean yIQR ySkew zMean zIQR zSkew
1 1 1 2 2 2 3 3 3
In essence, I need to collapse the data frame into a single row that lists all statistical moments on one line, as I'll have many rows like the final one but a finite set of columns.

You could do:
df1$id <- 1
reshape(df1, idvar="id", timevar="Object", direction="wide")[-1]
# Mean.x IQR.x Skew.x Mean.y IQR.y Skew.y Mean.z IQR.z Skew.z
#1 1 1 1 2 2 2 3 3 3
Or using dcast, melt from reshape2
library(reshape2)
dcast(melt(df1, id.var=c('id', 'Object')), id~..., value.var='value')[-1]
# x_Mean x_IQR x_Skew y_Mean y_IQR y_Skew z_Mean z_IQR z_Skew
#1 1 1 1 2 2 2 3 3 3
Or using dplyr and tidyr
library(dplyr)
library(tidyr)
df1 %>%
gather(Var, Val, Mean:Skew) %>%
unite(VarNew,Object, Var, sep="") %>%
spread(VarNew, Val) %>%
select(-id)
# xIQR xMean xSkew yIQR yMean ySkew zIQR zMean zSkew
#1 1 1 1 2 2 2 3 3 3
data
df1 <- structure(list(Object = c("x", "y", "z"), Mean = 1:3, IQR = 1:3,
Skew = 1:3), .Names = c("Object", "Mean", "IQR", "Skew"), class = "data.frame", row.names = c(NA,
-3L))
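
In current tidyr, gather and spread are superseded by pivot_longer and pivot_wider. A sketch of the same collapse with pivot_wider alone, assuming tidyr 1.2.0 or later (for names_vary) and starting from df1 without the helper id column:

```r
library(tidyr)

df1 <- data.frame(Object = c("x", "y", "z"), Mean = 1:3, IQR = 1:3, Skew = 1:3)

wide <- pivot_wider(
  df1,
  names_from  = Object,
  values_from = c(Mean, IQR, Skew),
  # Build names such as "xMean" from the object and the statistic
  names_glue  = "{Object}{.value}",
  # Group the output columns by object rather than by statistic
  names_vary  = "slowest"
)
print(wide)
```

Because no identifier columns remain after Object is moved into the names, the result collapses to a single row.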

Or maybe something like this (applied to the original data frame df, before any id column is added):
setNames(unlist(data.frame(t(df[-1]))), paste0(rep(df[, 1], each = ncol(df) - 1), names(df)[-1]))
# xMean xIQR xSkew yMean yIQR ySkew zMean zIQR zSkew
# 1 1 1 2 2 2 3 3 3
