Creating a Ones and Zeros matrix in R from features [duplicate]

This question already has answers here:
Reshape three column data frame to matrix ("long" to "wide" format) [duplicate]
(6 answers)
Closed 2 years ago.
I have a dataset like so:
Name | Pet | isTrain |
---------------------------------
Ben | Dog | 1 |
Kim | Cat | 0 |
Kim | Rabbit | 0 |
How do I make this into a matrix in R where the Name is the row and the Pet is the column, and isTrain is the value?

We can use xtabs from base R:
xtabs(isTrain ~ Name + Pet, df1)
#      Pet
# Name  Cat Dog Rabbit
#   Ben   0   1      0
#   Kim   0   0      0
Data
df1 <- data.frame(Name = c('Ben', 'Kim', 'Kim'),
                  Pet = c('Dog', 'Cat', 'Rabbit'),
                  isTrain = c(1, 0, 0))
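Note that xtabs sums isTrain within each Name/Pet cell, so a pair that never occurs and a pair recorded with isTrain = 0 both come out as 0. If a plain data frame or matrix is needed rather than a table object, something like the following should work. This is only a sketch; the names tab, wide and mat are made up for illustration:

```r
df1 <- data.frame(Name = c("Ben", "Kim", "Kim"),
                  Pet = c("Dog", "Cat", "Rabbit"),
                  isTrain = c(1, 0, 0))

# xtabs() sums isTrain for every Name/Pet combination
tab <- xtabs(isTrain ~ Name + Pet, data = df1)

# Convert the contingency table to an ordinary data frame
# (rows = Name, columns = Pet)
wide <- as.data.frame.matrix(tab)

# Plain numeric matrix, if a matrix is what you need
mat <- as.matrix(wide)
```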

Related

How to get all combinations of 2 columns into 1 data frame in R? [duplicate]

This question already has answers here:
Unique combination of all elements from two (or more) vectors
(6 answers)
Closed 2 years ago.
I think an example will make my question clearer
c1<- c('A','B','C')
c2<- c(1,2)
Desired Output
c1 | c2
____________
A | 1
A | 2
B | 1
B | 2
C | 1
C | 2
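A simple solution is the expand.grid function. Note that it varies the first argument fastest, so the rows come out as A 1, B 1, C 1, A 2, ...; sort the result if you need the exact order shown above: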
expand.grid(c1,c2)
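To get named columns and the row order from the desired output, one possible approach is to name the arguments and sort afterwards. A minimal sketch; the name combos is just for illustration:

```r
c1 <- c("A", "B", "C")
c2 <- c(1, 2)

# Named arguments become the column names; keep c1 as character, not factor
combos <- expand.grid(c1 = c1, c2 = c2, stringsAsFactors = FALSE)

# expand.grid() varies the first column fastest, so sort to group by c1
combos <- combos[order(combos$c1, combos$c2), ]
rownames(combos) <- NULL
combos
```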

Split survey text cell in multiple (unique and binary) columns [duplicate]

This question already has answers here:
Split string column to create new binary columns
(10 answers)
Dummy variables from a string variable
(7 answers)
Closed 4 years ago.
I have a survey database at user level. One of the fields holds the multiple-choice options that the user selected. Example:
col1 | col2
ID1 | a, b, c
ID2 | c, f
ID3 | g, k, z
I want to reshape the file as follows using R:
col1| col2(a)| col3(b)| col4(c)| col5(f)| col6(g)| col7(k)| col8(z)
ID1 | 1 | 1 | 1 | 0 | 0 | 0 | 0
ID2 | 0 | 0 | 1 | 1 | 0 | 0 | 0
ID3 | 0 | 0 | 0 | 0 | 1 | 1 | 1
Please note: I don't know in advance how many distinct values exist in the original multiple-choice field.
Thanks
One option is mtabulate (from qdapTools) after splitting 'col2' by ", ":
library(qdapTools)
cbind(df1[1], mtabulate(strsplit(df1$col2, ", ")))
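If you would rather not depend on qdapTools, roughly the same result can be built in base R. This is a sketch under the assumption that col2 is a character column separated by commas; the names choices, levels_found, dummies and result are hypothetical:

```r
df1 <- data.frame(col1 = c("ID1", "ID2", "ID3"),
                  col2 = c("a, b, c", "c, f", "g, k, z"),
                  stringsAsFactors = FALSE)

# Split each cell on commas (with optional whitespace after them)
choices <- strsplit(df1$col2, ",\\s*")

# The set of distinct values is discovered from the data itself
levels_found <- sort(unique(unlist(choices)))

# One row per ID, one 0/1 column per distinct value
dummies <- matrix(
  unlist(lapply(choices, function(x) as.integer(levels_found %in% x))),
  ncol = length(levels_found), byrow = TRUE,
  dimnames = list(NULL, levels_found)
)

result <- cbind(df1["col1"], dummies)
result
```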

How to transpose data frame by a column and also group by each column value in R [duplicate]

This question already has answers here:
Transpose / reshape dataframe without "timevar" from long to wide format
(9 answers)
Closed 3 years ago.
I have a dataset like this:
RULE | GENERATION
A | 1
B | 1
C | 1
D | 2
I would like this output:
1 | 2
A | D
B |
C |
So far I have tried spread, aggregate and a lot of other functions, but I still don't have the desired result. I want to group by "GENERATION" and make its categories the column names of the new dataset, where each column holds its values in the same order as in the first dataset.
Thanks.
Something like this?
library(tidyverse)
df <- data.frame(x = letters[1:4], y = c(1, 1, 1, 2))
df %>%
  group_by(y) %>%
  mutate(num = row_number()) %>%  # position of each value within its group
  spread(y, x) %>%
  select(-num)
# A tibble: 3 x 2
#   `1`   `2`
#   <fct> <fct>
# 1 a     d
# 2 b     NA
# 3 c     NA
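For comparison, the same shape can probably be produced in base R by splitting on the grouping column and padding the shorter groups with NA. A minimal sketch using the column names from the question; groups, n and wide are illustrative names, and the result is a character matrix rather than a tibble:

```r
df <- data.frame(RULE = c("A", "B", "C", "D"),
                 GENERATION = c(1, 1, 1, 2),
                 stringsAsFactors = FALSE)

# One character vector per GENERATION value, in original row order
groups <- split(df$RULE, df$GENERATION)

# Pad every group with NA up to the length of the longest one
n <- max(lengths(groups))
padded <- lapply(groups, `length<-`, n)

# Bind the padded vectors side by side; list names become column names
wide <- do.call(cbind, padded)
wide
```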

Reshaping data with R [duplicate]

This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 6 years ago.
I have my data in the below table structure:
Person ID | Role | Role Count
-----------------------------
1 | A | 24
1 | B | 3
2 | A | 15
2 | B | 4
2 | C | 7
I would like to reshape this so that there is one row for each Person ID, A column for each distinct role (e.g. A,B,C) and then the Role Count for each person as the values. Using the above data the output would be:
Person ID | Role A | Role B | Role C
-------------------------------------
1 | 24 | 3 | 0
2 | 15 | 4 | 7
Coming from a Java background, I would take an iterative approach to this:
1. Find all distinct values for Role.
2. Create a new table with a column for PersonID and each of the distinct roles.
3. Iterate through the first table, get role counts for each Person ID and Role combination, and insert the results into the new table.
Is there another way of doing this in R without iterating through the first table?
Thanks
Try:
library(tidyr)
df %>% spread(Role, `Role Count`)
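To make the column names exactly as per your example (filling missing combinations with 0 and leaving the Person ID column name untouched):
df2 <- df %>% spread(Role, `Role Count`, fill = 0)
names(df2)[-1] <- paste('Role', names(df2)[-1])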
Try this:
library(reshape2)
df <- dcast(df, PersonID~Role, value.var='RoleCount')
df[is.na(df)] <- 0
names(df)[-1] <- paste('Role', names(df[-1]))
df
#   PersonID Role A Role B Role C
# 1        1     24      3      0
# 2        2     15      4      7
With spread from tidyr
library(tidyr)
spread(data, Role, `Role Count`, sep = " ")
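If neither tidyr nor reshape2 is available, base R's reshape() can do the same thing without any explicit loop. A sketch only, assuming the columns are named PersonID, Role and RoleCount (no spaces) as in the dcast answer; wide is an illustrative name:

```r
df <- data.frame(PersonID = c(1, 1, 2, 2, 2),
                 Role = c("A", "B", "A", "B", "C"),
                 RoleCount = c(24, 3, 15, 4, 7),
                 stringsAsFactors = FALSE)

# One row per PersonID, one "RoleCount <role>" column per distinct Role
wide <- reshape(df, idvar = "PersonID", timevar = "Role",
                direction = "wide", sep = " ")

# Missing Person/Role combinations come back as NA; the question wants 0
wide[is.na(wide)] <- 0

# Rename "RoleCount A" to "Role A", and so on
names(wide) <- sub("^RoleCount ", "Role ", names(wide))
rownames(wide) <- NULL
wide
```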

If Column 1 greater than Column 2, assign name "Col1" in column 3 [duplicate]

This question already has answers here:
Nested ifelse statement
(10 answers)
For each row return the column name of the largest value
(10 answers)
Closed 6 years ago.
I'm looking to find a solution here that would decide between 3-4 columns on a line-by-line basis (apply?), and allocate the name of highest value column in a new column [df$NAME]. Below is what I'd like the result to look like; what is the best approach for this? Thank you!
| Bird | Cat | Dog | NAME
| 3 | 4 | 10 | DOG
| 5 | 2 | 4 | BIRD
| 3 | 6 | 2 | CAT
| 4 | 8 | 9 | DOG
The question "Is there something like a pmax index?" is a perfect match for your case.
The max.col()-based solution (second answer in the linked question) is preferable to the apply()-based solution, both for simplicity and efficiency.
df$NAME <- names(df)[max.col(df)]
df
##   Bird Cat Dog NAME
## 1    3   4  10  Dog
## 2    5   2   4 Bird
## 3    3   6   2  Cat
## 4    4   8   9  Dog
Data
df <- data.frame(Bird = c(3L, 5L, 3L, 4L), Cat = c(4L, 2L, 6L, 8L), Dog = c(10L, 4L, 2L, 9L))
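One thing to be aware of: max.col() breaks ties at random by default, so for reproducible results it may be worth setting ties.method explicitly. A small sketch along those lines, which also restricts the lookup to the value columns and upper-cases the names as in the question; value_cols is a hypothetical helper name:

```r
df <- data.frame(Bird = c(3L, 5L, 3L, 4L),
                 Cat = c(4L, 2L, 6L, 8L),
                 Dog = c(10L, 4L, 2L, 9L))

# Only these columns take part in the comparison
value_cols <- c("Bird", "Cat", "Dog")

# ties.method = "first" picks the leftmost column on ties (default is "random")
idx <- max.col(as.matrix(df[value_cols]), ties.method = "first")

df$NAME <- toupper(value_cols[idx])
df
```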
