I have my data in the below table structure:
Person ID | Role | Role Count
-----------------------------
1 | A | 24
1 | B | 3
2 | A | 15
2 | B | 4
2 | C | 7
I would like to reshape this so that there is one row for each Person ID, a column for each distinct role (e.g. A, B, C), and the Role Count for each person as the values. Using the above data, the output would be:
Person ID | Role A | Role B | Role C
-------------------------------------
1 | 24 | 3 | 0
2 | 15 | 4 | 7
Coming from a Java background I would take an iterative approach to this:
Find all distinct values for Role
Create a new table with a column for PersonID and each of the distinct roles
Iterate through the first table, get role counts for each Person ID and Role combination and insert results into new table.
Is there another way of doing this in R without iterating through the first table?
Thanks
Try:
library(tidyr)
df %>% spread(Role, `Role Count`)
To make the column names exactly as per your example:
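df2 <- df %>% spread(Role, `Role Count`, fill = 0)  # fill = 0 replaces missing combinations with 0
names(df2)[-1] <- paste('Role', names(df2)[-1])     # leave the first (Person ID) column name untouched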
Try this:
library(reshape2)
df <- dcast(df, PersonID~Role, value.var='RoleCount')
df[is.na(df)] <- 0
names(df)[-1] <- paste('Role', names(df[-1]))
df
PersonID Role A Role B Role C
1 1 24 3 0
2 2 15 4 7
With spread from tidyr
library(tidyr)
spread(data, Role, `Role Count`, sep = " ")
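For comparison, here is a minimal base R sketch of the same reshape that needs no extra packages. It rebuilds the question's data under the assumed column names PersonID, Role and RoleCount (as in the reshape2 answer), and the names m and wide_df are only illustrative:

```r
# Sample data from the question (column names without spaces are an assumption)
df <- data.frame(
  PersonID  = c(1, 1, 2, 2, 2),
  Role      = c("A", "B", "A", "B", "C"),
  RoleCount = c(24, 3, 15, 4, 7),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per PersonID, one column per Role
m <- tapply(df$RoleCount, list(df$PersonID, df$Role), sum)
m[is.na(m)] <- 0  # person/role combinations that never occur become 0

# Turn the matrix back into a data frame with "Role X" column names
wide_df <- data.frame(PersonID = as.numeric(rownames(m)), m, check.names = FALSE)
names(wide_df)[-1] <- paste("Role", colnames(m))
rownames(wide_df) <- NULL
wide_df
```

Note that tapply with sum adds up the counts if a Person ID/Role pair appears more than once, which may or may not be what you want.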
I have a dataset like so:
Name | Pet | isTrain |
---------------------------------
Ben | Dog | 1 |
Kim | Cat | 0 |
Kim | Rabbit | 0 |
How do I make this into a matrix in R where the Name is the row and the Pet is the column, and isTrain is the value?
We can use xtabs from base R
xtabs(isTrain ~ Name + Pet, df1)
# Pet
#Name Cat Dog Rabbit
# Ben 0 1 0
# Kim 0 0 0
data
df1 <- data.frame(Name = c('Ben', 'Kim', 'Kim'),
Pet = c('Dog', 'Cat', 'Rabbit'), isTrain = c(1, 0, 0))
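If you need a plain numeric matrix rather than an xtabs table, one alternative is to pre-allocate the matrix and fill it by name. This is only a sketch; the names rows, cols and m are illustrative and not part of the original answer:

```r
df1 <- data.frame(Name = c("Ben", "Kim", "Kim"),
                  Pet = c("Dog", "Cat", "Rabbit"),
                  isTrain = c(1, 0, 0),
                  stringsAsFactors = FALSE)

rows <- sort(unique(df1$Name))
cols <- sort(unique(df1$Pet))

# Start from an all-zero matrix with Name as rows and Pet as columns
m <- matrix(0, nrow = length(rows), ncol = length(cols),
            dimnames = list(rows, cols))

# A two-column character matrix indexes cells by (row name, column name)
m[cbind(df1$Name, df1$Pet)] <- df1$isTrain
m
```

Unlike xtabs, which sums repeated Name/Pet pairs, this assignment keeps only the last value for a repeated pair.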
I have a dataset like that:
RULE | GENERATION
A | 1
B | 1
C | 1
D | 2
I would like this output:
1 | 2
A | D
B |
C |
So far I have tried spread, aggregate and a lot of other functions, but I still do not have the desired result. I want to group by "GENERATION" and make its categories the column names of the new dataset, where each column holds its values in the same order as in the first dataset.
Thanks.
Something like this?
library(tidyverse)
df<-data.frame(x=c(letters[1:4]),y=c(1,1,1,2))
df%>%
group_by(y)%>%
mutate(num=row_number())%>%
spread(y,x)%>%
select(-num)
# A tibble: 3 x 2
`1` `2`
<fct> <fct>
1 a d
2 b NA
3 c NA
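A base R sketch of the same idea, using the column names from the question: split RULE by GENERATION, then pad each group with NA up to the length of the longest one. The names groups, n and wide are illustrative only:

```r
df <- data.frame(RULE = c("A", "B", "C", "D"),
                 GENERATION = c(1, 1, 1, 2),
                 stringsAsFactors = FALSE)

# One character vector per generation, in the original row order
groups <- split(df$RULE, df$GENERATION)

# Pad every group with NA to the length of the longest group
n <- max(lengths(groups))
wide <- sapply(groups, function(x) {
  length(x) <- n
  x
})
wide
```

The result is a character matrix whose column names are the GENERATION values; wrap it in as.data.frame() if you need a data frame.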
I have a unique issue that I am trying to solve.
I have a data table that contains a few different types of information.
Example below.
ID|inpSeq|Act |User |Representing
--|----- |----|---- |-----
1 | 123 | s | ABC | NA
1 | 124 | s | ABC | NA
1 | 125 | c | ABC | x1
1 | 126 | c | XYZ | x2
1 | 127 | d | ABC | x2
What I am trying to do is organize the data so that I can view how "User" relates to "Representing".
In other words, I am looking to create following output
ID|Act |User|....
--|------|----|----|----
1 | sscd | ABC| x1 | x2.....
1 | c | XYZ| x2.....
As you can see, the original table is compacted into a "User"-centric view, and "Act" now contains all the activity that the User performed on a single ID.
Additionally, once I have this activity sorted out, I need to show (dynamically, if different) on whose behalf they performed the activity. This is represented by x1, x2, ..., meaning that the number of columns can grow depending on how many unique "Representing" parties there are for each ID/Act/User combination.
An important thing to note is that "s" values in the Act field will always have NA in the Representing field, so those NAs do not need to be included in the transformed view.
Now thus far I was able to get the ID|Act|User part of the code figured out by using following code
aggregate(Act~ID+User, paste, collapse="", data=df)
But I need to figure out how to do the rest. That is where I need all of your help.
P.S. The "inpSeq" field is just a unique numeric field that is created sequentially by an outside application; it allows the activities to be put in the correct sequential order.
With your data as a data frame df, you can use dplyr with the spread function from tidyr to get what you want:
library(dplyr)
library(tidyr)
f <- function(x) { paste(na.omit(x), collapse="") } ## 1.
result <- df %>% spread(Representing, Representing) %>% ## 2.
select(-inpSeq, -`<NA>`) %>% ## 3.
group_by(ID, User) %>% ## 4.
summarise_each(funs(f))
Notes:
1. We define a function f that collapses a character vector into a single string and omits NAs in the process.
2. The first argument to spread is the column name for the keys and the second argument is the column name for the values. The spread function spreads the values into multiple columns, which are named by the keys. Here, we spread the rows of Representing into multiple columns named after the values of Representing. The result of just that command on your data is:
## ID inpSeq Act User x1 x2 <NA>
##1 1 123 s ABC <NA> <NA> <NA>
##2 1 124 s ABC <NA> <NA> <NA>
##3 1 125 c ABC x1 <NA> <NA>
##4 1 126 c XYZ <NA> x2 <NA>
##5 1 127 d ABC <NA> x2 <NA>
Note that there are now three additional columns named x1, x2, and <NA> replacing the original Representing column.
3. From this result, we use select to omit the columns inpSeq and <NA>.
4. We then group_by ID and User and summarise_each of the remaining columns using the function f that we defined.
The result is:
print(result)
##Source: local data frame [2 x 5]
##Groups: ID [?]
## ID User Act x1 x2
## <int> <fctr> <chr> <chr> <chr>
##1 1 ABC sscd x1 x2
##2 1 XYZ c x2
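If you would rather stay in base R, the following is a rough sketch of the same transformation, not a drop-in replacement. It builds on the aggregate call from the question; the names result, parties and hit are illustrative:

```r
df <- data.frame(
  ID = c(1, 1, 1, 1, 1),
  inpSeq = 123:127,
  Act = c("s", "s", "c", "c", "d"),
  User = c("ABC", "ABC", "ABC", "XYZ", "ABC"),
  Representing = c(NA, NA, "x1", "x2", "x2"),
  stringsAsFactors = FALSE
)

# Order by inpSeq so the activities are pasted in sequence
df <- df[order(df$inpSeq), ]
result <- aggregate(Act ~ ID + User, data = df, FUN = paste, collapse = "")

# One extra column per distinct non-NA Representing value
parties <- sort(unique(na.omit(df$Representing)))
for (p in parties) {
  result[[p]] <- mapply(function(id, user) {
    hit <- any(df$ID == id & df$User == user & df$Representing == p, na.rm = TRUE)
    if (hit) p else ""
  }, result$ID, result$User)
}
result
```

One difference from the tidyr version: here each party appears at most once per ID/User row, whereas pasting with f repeats the label if the same party occurs several times.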
I have a data.frame, df, as below:
id | name | value
1 | team1 | 3
1 | team2 | 1
2 | team1 | 1
2 | team2 | 4
3 | team1 | 0
3 | team2 | 6
4 | team1 | 1
4 | team2 | 2
5 | team1 | 3
5 | team2 | 0
How do we subset the data frame to get the rows for all values of id from 2 to 4?
We can apply a condition such as df[df$id >= 2 & df$id <= 4, ]. But is there a way to directly use a vector of integers such as ids <- c(2:4) to subset a data frame?
One way to do this is df[df$id >= min(ids) & df$id <= max(ids), ].
Is there a more elegant R way of doing this?
The most typical way has been mentioned already, but there are also variations using match:
with(df, df[match(id, 2:4, nomatch = 0) > 0, ])
or, similar
with(df, df[is.element(id, 2:4), ])
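For completeness, here is a small self-contained sketch using %in%, which does the same job as is.element and works directly with the ids vector from the question. The name subset_df is illustrative:

```r
df <- data.frame(
  id = rep(1:5, each = 2),
  name = rep(c("team1", "team2"), times = 5),
  value = c(3, 1, 1, 4, 0, 6, 1, 2, 3, 0)
)

ids <- 2:4

# Keep the rows whose id is one of the wanted values; note the trailing comma
subset_df <- df[df$id %in% ids, ]
subset_df
```

Because %in% tests set membership, ids does not have to be a contiguous range; c(2, 5) would work just as well.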
Imagine I have a data frame with data like this:
A | B | C
---+---+---
1 | 2 | a
1 | 2 | b
5 | 5 | a
5 | 5 | b
I want to take only columns A and B, and I want to remove any rows that have become duplicates as a result of eliminating all other columns (that is, column C). So my desired result for the table above would be:
A | B
---+---
1 | 2
5 | 5
What is the best way to do this?
If your data.frame is called df, then do this:
unique(df[, c("A", "B")])
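A quick self-contained check on the sample data from the question (the name res is illustrative):

```r
df <- data.frame(A = c(1, 1, 5, 5),
                 B = c(2, 2, 5, 5),
                 C = c("a", "b", "a", "b"))

# Drop column C, then drop rows that are now duplicates
res <- unique(df[, c("A", "B")])

# unique() keeps the original row names (1 and 3 here); reset them if that matters
rownames(res) <- NULL
res
```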