wide to long multiple columns issue - r

I have something like this:
id role1 Approved by Role1 role2 Approved by Role2
1 Amy 1/1/2019 David 4/4/2019
2 Bob 2/2/2019 Sara 5/5/2019
3 Adam 3/3/2019 Rachel 6/6/2019
I want something like this:
id Name Role Approved
1 Amy role1 1/1/2019
2 Bob role1 2/2/2019
3 Adam role1 3/3/2019
1 David role2 4/4/2019
2 Sara role2 5/5/2019
3 Rachel role2 6/6/2019
I thought something like this would work
melt(df,id.vars= id,
measure.vars= list(c("role1", "role2"),c("Approved by Role1", "Approved by Role2")),
variable.name= c("Role","Approved"),
value.name= c("Name","Date"))
but I am getting Error: measure variables not found in data:c("role1", "role2"),c("Approved by Role1", "Approved by Role2")
I have also tried using the column numbers instead, without any luck.
Any suggestions? Thanks!
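The list form of measure.vars in the attempt above is accepted by data.table's melt(). The error message looks like the one reshape2's melt() gives, which most likely means that version was called; it expects a plain vector of column names. Below is a minimal sketch with data.table, assuming the column names are exactly as shown in the question. With a list of measure sets, the variable column holds the position within each set (1 or 2), so it is mapped back to role1/role2 by hand.

```r
library(data.table)

# Sample data from the question; check.names = FALSE keeps the spaces in the names
df <- data.frame(
  id = 1:3,
  role1 = c("Amy", "Bob", "Adam"),
  `Approved by Role1` = c("1/1/2019", "2/2/2019", "3/3/2019"),
  role2 = c("David", "Sara", "Rachel"),
  `Approved by Role2` = c("4/4/2019", "5/5/2019", "6/6/2019"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Each list element is one set of columns that is stacked into one value column
long <- melt(
  as.data.table(df),
  id.vars = "id",
  measure.vars = list(c("role1", "role2"),
                      c("Approved by Role1", "Approved by Role2")),
  variable.name = "Role",
  value.name = c("Name", "Approved")
)

# Role holds the position within each measure set (1 or 2); turn it into a label
long[, Role := paste0("role", Role)]
long
```

The rows come out grouped by role (all role1 rows, then all role2 rows), which matches the layout asked for.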

I really like the tidyr::pivot_longer() function. It was added in tidyr 1.0.0; at the time of writing it was only available in the development version. First I'm going to clean up the column names slightly so they have a consistent structure:
> df
# A tibble: 3 x 5
id name_role1 approved_role1 name_role2 approved_role2
<dbl> <chr> <chr> <chr> <chr>
1 1 Amy 1/1/2019 David 4/4/2019
2 2 Bob 2/2/2019 Sara 5/5/2019
3 3 Adam 3/3/2019 Rachel 6/6/2019
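The renaming step itself is not shown. One way to do it in base R, assuming the original column names are exactly those in the question, is to rewrite them with sub() so that every name follows a `<field>_<role>` pattern:

```r
# Sample data with the question's original column names
df <- data.frame(
  id = c(1, 2, 3),
  role1 = c("Amy", "Bob", "Adam"),
  `Approved by Role1` = c("1/1/2019", "2/2/2019", "3/3/2019"),
  role2 = c("David", "Sara", "Rachel"),
  `Approved by Role2` = c("4/4/2019", "5/5/2019", "6/6/2019"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# "roleN" -> "name_roleN"
names(df) <- sub("^(role[0-9]+)$", "name_\\1", names(df))
# "Approved by RoleN" -> "approved_roleN"
names(df) <- sub("^Approved by Role([0-9]+)$", "approved_role\\1", names(df))
names(df)
```

After this, the names are id, name_role1, approved_role1, name_role2, approved_role2, so names_sep = "_" splits each one into the field and the role.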
Then it's easy to convert to long format with pivot_longer():
library(tidyr)
df %>%
pivot_longer(
-id,
names_to = c(".value", "role"),
names_sep = "_"
)
Output:
id role name approved
<dbl> <chr> <chr> <chr>
1 1 role1 Amy 1/1/2019
2 1 role2 David 4/4/2019
3 2 role1 Bob 2/2/2019
4 2 role2 Sara 5/5/2019
5 3 role1 Adam 3/3/2019
6 3 role2 Rachel 6/6/2019

Related

R- How to reshape Long to Wide with multiple variables/columns

I started off with the following subset of my data
UserID labelnospaces responses
1 Were you given any info? yes
1 By using this service..? yes
1 How satisfied are you? Very satisfied
2 Were you given any info? no
2 By using this service..? no
2 How satisfied are you? unsatisfied
Using the code below, I was able to go from long to wide perfectly:
service_L_to_W<- reshape(data=service, idvar="UserID",
timevar = "labelnospaces",
direction = "wide")
The code above gave me exactly what I wanted:
UserID Were you given any info? By using this service..? How satisfied are you?
1 yes yes very satisfied
2 no no unsatisfied
My question is: how do I edit my code to convert the following data, which has extra variables/columns, from long to wide?
UserID Full Name DOB EncounterID QuestionID Name Type labelnospaces responses
1 John Smith 1-1-90 13 505 Intro Check Were you given any info? yes
1 John Smith 1-1-90 13 506 Care Check By using this service.. yes
1 John Smith 1-1-90 13 507 Out Check How satisfied are you? vsat
2 Jane Doe 2-2-80 14 505 Intro Check Were you given any info? no
2 Jane Doe 2-2-80 14 506 Care Check By using this service.. no
2 Jane Doe 2-2-80 14 507 Out Check How satisfied are you? unsat
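Several variables can be combined into the new column names by passing them all to names_from; by default their values are joined with "_":
library(tidyr)
df %>%
  pivot_wider(id_cols = c(UserID, Full.Name, DOB, EncounterID),
              names_from = c(QuestionID, QName, labelnospaces),
              values_from = responses)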
UserID Full.Name DOB EncounterID `505_Intro_Were you given any info?` `506_Care_By using this service..`
<int> <chr> <chr> <int> <chr> <chr>
1 1 John Smith 1-1-90 13 yes yes
2 2 Jane Doe 2-2-80 14 no no
`507_Out_How satisfied are you?`
<chr>
1 vsat
2 unsat
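If you prefer to stay with base R's reshape(), as in the question, one possible adaptation is to list all the per-person columns in idvar and drop the per-question columns before widening. This sketch rebuilds the sample data by hand and assumes the column names Full.Name, QName and Type (the question's header is ambiguous on the last two):

```r
# Hypothetical reconstruction of the long data from the question
long <- data.frame(
  UserID        = rep(1:2, each = 3),
  Full.Name     = rep(c("John Smith", "Jane Doe"), each = 3),
  DOB           = rep(c("1-1-90", "2-2-80"), each = 3),
  EncounterID   = rep(13:14, each = 3),
  QuestionID    = rep(505:507, times = 2),
  QName         = rep(c("Intro", "Care", "Out"), times = 2),
  Type          = "Check",
  labelnospaces = rep(c("Were you given any info?", "By using this service..",
                        "How satisfied are you?"), times = 2),
  responses     = c("yes", "yes", "vsat", "no", "no", "unsat"),
  stringsAsFactors = FALSE
)

wide <- reshape(
  long,
  idvar     = c("UserID", "Full.Name", "DOB", "EncounterID"),  # one row per person/encounter
  timevar   = "labelnospaces",                                 # values become column suffixes
  drop      = c("QuestionID", "QName", "Type"),                # these vary per question, so remove
  direction = "wide"
)
wide
```

reshape() names the new columns "responses.<question>", for example "responses.How satisfied are you?".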

Melting dataframe in R

I have the following R data frame:
foo <- data.frame("Department" = c('IT', 'IT', 'Sales'),
"Name.boy" = c('John', 'Mark', 'Louis'),
"Age.boy" = c(21,23,44),
"Name.girl" = c('Jane', 'Charlotte', 'Denise'),
"Age.girl" = c(16,25,32))
which looks like this:
Department Name.boy Age.boy Name.girl Age.girl
IT John 21 Jane 16
IT Mark 23 Charlotte 25
Sales Louis 44 Denise 32
How do I 'melt' the data frame so that, for a given Department, I have three columns: Name, Age, and Sex?
Department Name Age Sex
IT John 21 Boy
IT Jane 16 Girl
IT Mark 23 Boy
IT Charlotte 25 Girl
Sales Louis 44 Boy
Sales Denise 32 Girl
We can use pivot_longer() from tidyr:
library(tidyr)
pivot_longer(foo, cols = -Department, names_to = c(".value", "Sex"),
names_sep="\\.")
# A tibble: 6 x 4
# Department Sex Name Age
# <chr> <chr> <chr> <dbl>
#1 IT boy John 21
#2 IT girl Jane 16
#3 IT boy Mark 23
#4 IT girl Charlotte 25
#5 Sales boy Louis 44
#6 Sales girl Denise 32
Using base R's reshape():
reshape(foo, direction="long", varying=2:5, timevar="Sex")
Department Sex Name Age id
1.boy IT boy John 21 1
2.boy IT boy Mark 23 2
3.boy Sales boy Louis 44 3
1.girl IT girl Jane 16 1
2.girl IT girl Charlotte 25 2
3.girl Sales girl Denise 32 3
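A data.table alternative is melt() with patterns(), which stacks each group of matching columns into one value column. This is a sketch assuming data.table is installed. The Sex column comes back as the position of the matching column within each pattern (1, 2), so it is mapped back to the suffixes by hand, which relies on the .boy columns coming before the .girl columns as they do in foo:

```r
library(data.table)

foo <- data.frame("Department" = c('IT', 'IT', 'Sales'),
                  "Name.boy" = c('John', 'Mark', 'Louis'),
                  "Age.boy" = c(21, 23, 44),
                  "Name.girl" = c('Jane', 'Charlotte', 'Denise'),
                  "Age.girl" = c(16, 25, 32),
                  stringsAsFactors = FALSE)

# Each pattern selects one group of columns (Name.*, Age.*) to stack
long <- melt(as.data.table(foo),
             measure.vars = patterns("^Name\\.", "^Age\\."),
             variable.name = "Sex",
             value.name = c("Name", "Age"))

# Sex holds the column position within each group; map it back to the suffix
long[, Sex := c("boy", "girl")[as.integer(Sex)]]
long
```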

Reshaping a dataset of patients with different numbers of diagnosis from long to wide [duplicate]

This question already has answers here:
How to reshape data from long to wide format
I am a beginner confronted with a big task, and none of the typical long-to-wide reshaping tools I found using the search function really did the job for me. I would be glad if someone could help me.
I am trying to achieve the following:
I have patient data in which every patient has a unique patient number, but multiple hospital stays have led to multiple cases per person. I want to work with these cases. The problem is that I have all the diagnoses per case, but not everybody has the same number of diagnoses, and I don't know how to tell R to create a new diagnosis (and date-of-diagnosis) variable each time there is already a diagnosis. Any help is highly appreciated!
So, I have a huge dataset that looks roughly like this:
Patient Case Diagnosis DateOfDiagnosis
1 John Doe 1 A 2010-10-10
2 John Doe 1 B 2010-10-10
3 John Doe 1 C 2010-10-10
4 Peter Griffin 2 D 2010-10-11
5 Peter Griffin 2 E 2010-10-11
6 Homer Simpson 3 F 2010-10-12
7 Homer Simpson 4 G 2010-10-13
I need one row per case, with all the diagnoses and their dates in separate variables. This would be no problem, except that there is no pattern in the cases or diagnoses: some patients have only one case and others five, and some cases have one diagnosis (with its date) while others have five.
So what I need looks like this:
Patient Case Diag1 DateOfDiag1 Diag2 DateOfDiag2 Diag3 DateOfDiag3 ....
1 John Doe 1 A 2010-10-10 B 2010-10-10 C 2010-10-10
2 Peter Grif 2 D 2010-10-11 E 2010-10-11 NA NA
3 Homer Simp 3 F 2010-10-12 NA NA NA NA
4 Homer Simp 4 G 2010-10-13 NA NA NA NA
The code for my example is:
Patient <- c('John Doe','John Doe','John Doe', 'Peter Griffin','Peter Griffin', 'Homer Simpson', 'Homer Simpson')
Case <- c(1,1,1,2,2,3,4)
Diagnosis <- c('A','B','C','D','E','F','G')
DateOfDiagnosis <- as.Date(c('2010-10-10','2010-10-10','2010-10-10','2010-10-11','2010-10-11','2010-10-12','2010-10-13'))
df<-data.frame(Patient, Case, Diagnosis, DateOfDiagnosis)
Any help is highly appreciated!
Kind regards,
Jan
You could use pivot_wider() after creating a column that numbers the diagnoses within each patient and case:
library(dplyr)
library(tidyr)
df %>%
group_by(Patient, Case) %>%
mutate(row = row_number()) %>%
pivot_wider(values_from = c(Diagnosis, DateOfDiagnosis), names_from = row)
# Patient Case Diagnosis_1 Diagnosis_2 Diagnosis_3 DateOfDiagnosis_1 DateOfDiagnosis_2 DateOfDiagnosis_3
# <fct> <dbl> <fct> <fct> <fct> <date> <date> <date>
#1 John Doe 1 A B C 2010-10-10 2010-10-10 2010-10-10
#2 Peter Griffin 2 D E NA 2010-10-11 2010-10-11 NA
#3 Homer Simpson 3 F NA NA 2010-10-12 NA NA
#4 Homer Simpson 4 G NA NA 2010-10-13 NA NA
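For a base-R version, the same within-case counter can be built with ave() and passed to reshape() as the timevar. A minimal sketch that rebuilds the example data; reshape() interleaves the new columns as Diagnosis.1, DateOfDiagnosis.1, Diagnosis.2, ..., which is close to the layout asked for:

```r
df <- data.frame(
  Patient = c('John Doe', 'John Doe', 'John Doe', 'Peter Griffin', 'Peter Griffin',
              'Homer Simpson', 'Homer Simpson'),
  Case = c(1, 1, 1, 2, 2, 3, 4),
  Diagnosis = c('A', 'B', 'C', 'D', 'E', 'F', 'G'),
  DateOfDiagnosis = as.Date(c('2010-10-10', '2010-10-10', '2010-10-10', '2010-10-11',
                              '2010-10-11', '2010-10-12', '2010-10-13')),
  stringsAsFactors = FALSE
)

# Number the diagnoses 1, 2, 3, ... within each patient/case combination
df$row <- ave(seq_along(df$Case), df$Patient, df$Case, FUN = seq_along)

# One row per patient/case; each value of `row` becomes a pair of columns
wide <- reshape(df,
                idvar = c("Patient", "Case"),
                timevar = "row",
                v.names = c("Diagnosis", "DateOfDiagnosis"),
                direction = "wide")
wide
```

Cases with fewer diagnoses get NA in the higher-numbered columns.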

Getting Data in a single row into multiple rows

I have data showing which people work in which groups. In a survey, I ask the leader of each group to list the people who work for them, and I get a single row containing all of the team members. I need to reshape the data into multiple rows, one per member, each with its group information.
I don't know where to start.
This is what my data frame looks like:
LeaderName <- c('John','Jane','Louis','Carl')
Group <- c('3','1','4','2')
Member1 <- c('Lucy','Stephanie','Chris','Leslie')
Member1ID <- c('1','2','3','4')
Member2 <- c('Earl','Carlos','Devon','Francis')
Member2ID <- c('5','6','7','8')
Member3 <- c('Luther','Peter','','Severus')
Member3ID <- c('9','10','','11')
GroupInfo <- data.frame(LeaderName, Group, Member1, Member1ID, Member2 ,Member2ID, Member3, Member3ID)
This is the result I would like the code to produce:
LeaderName_ <- c('John','Jane','Louis','Carl','John','Jane','Louis','Carl','John','Jane','','Carl')
Group_ <- c('3','1','4','2','3','1','4','2','3','1','','2')
Member <- c('Lucy','Stephanie','Chris','Leslie','Earl','Carlos','Devon','Francis','Luther','Peter','','Severus')
MemberID <- c('1','2','3','4','5','6','7','8','9','10','','11')
ActualGroupInfor <- data.frame(LeaderName_,Group_,Member,MemberID)
One option is melt() from data.table, specifying the column-name patterns in the measure.vars argument:
library(data.table)
melt(setDT(GroupInfo), measure.vars = patterns("^Member\\d+$",
   "^Member\\d+ID$"), value.name = c("Member", "MemberID"))[, variable := NULL][]
# LeaderName Group Member MemberID
# 1: John 3 Lucy 1
# 2: Jane 1 Stephanie 2
# 3: Louis 4 Chris 3
# 4: Carl 2 Leslie 4
# 5: John 3 Earl 5
# 6: Jane 1 Carlos 6
# 7: Louis 4 Devon 7
# 8: Carl 2 Francis 8
# 9: John 3 Luther 9
#10: Jane 1 Peter 10
#11: Louis 4
#12: Carl 2 Severus 11
Here is a solution in base R:
reshape(
  data=GroupInfo,
  idvar=c("LeaderName", "Group"),
  # positions of the name columns and of the ID columns, in matching order
  varying=list(
    Member=grep("^Member[0-9]+$", names(GroupInfo)),
    MemberID=grep("^Member[0-9]+ID$", names(GroupInfo))),
  direction="long",
  v.names = c("Member","MemberID"),
  sep="_")[,-3]  # drop the "time" column
#> LeaderName Group Member MemberID
#> John.3.1 John 3 Lucy 1
#> Jane.1.1 Jane 1 Stephanie 2
#> Louis.4.1 Louis 4 Chris 3
#> Carl.2.1 Carl 2 Leslie 4
#> John.3.2 John 3 Earl 5
#> Jane.1.2 Jane 1 Carlos 6
#> Louis.4.2 Louis 4 Devon 7
#> Carl.2.2 Carl 2 Francis 8
#> John.3.3 John 3 Luther 9
#> Jane.1.3 Jane 1 Peter 10
#> Louis.4.3 Louis 4
#> Carl.2.3 Carl 2 Severus 11
Created on 2019-05-23 by the reprex package (v0.2.1)
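With tidyr, pivot_longer() can do the same once every member column follows one naming pattern. In this sketch the name columns are first renamed from MemberN to MemberNName (a hypothetical naming chosen only so that one pattern fits all columns), and the ".value" sentinel then spreads Name and ID into separate columns. Rows come out ordered by leader rather than by member slot, and Louis's empty third slot is kept, as in the answers above:

```r
library(tidyr)

GroupInfo <- data.frame(
  LeaderName = c("John", "Jane", "Louis", "Carl"),
  Group      = c("3", "1", "4", "2"),
  Member1    = c("Lucy", "Stephanie", "Chris", "Leslie"),
  Member1ID  = c("1", "2", "3", "4"),
  Member2    = c("Earl", "Carlos", "Devon", "Francis"),
  Member2ID  = c("5", "6", "7", "8"),
  Member3    = c("Luther", "Peter", "", "Severus"),
  Member3ID  = c("9", "10", "", "11"),
  stringsAsFactors = FALSE
)

# Give the name columns an explicit suffix: Member1 -> Member1Name
renamed <- GroupInfo
names(renamed) <- sub("^Member([0-9]+)$", "Member\\1Name", names(renamed))

# The first group becomes Slot; the second ("Name"/"ID") becomes a column name
long <- pivot_longer(
  renamed,
  cols = -c(LeaderName, Group),
  names_to = c("Slot", ".value"),
  names_pattern = "^Member([0-9]+)(Name|ID)$"
)
long$Slot <- NULL  # the slot number is not needed in the result
long
```

Rows with an empty Name (such as Louis's third slot) can be removed afterwards if they are not wanted.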

Find the favorite and analyse sequence questions in R

We have a daily meeting when participants nominate each other to speak. The first person is chosen randomly.
I have a data frame containing names and the speaking order for each day.
The columns are day1, day2, day3, etc.
The values in the rows are numbers giving each person's position in the speaking order on that day.
NA means that the person did not participate on that day.
Name day1 day2 day3 day4 ...
Albert 1 3 1 ...
Josh 2 2 NA
Veronica 3 5 3
Tim 4 1 2
Stew 5 4 4
...
I want to do two analyses. First, I want to create a data frame showing whom each person has chosen most often. (I know the result depends on whether a participant was already nominated earlier that day, since that participant cannot be nominated again; I will handle that later, but for now this is enough.)
It should look like this:
Name Favorite
Albert Stew
Josh Veronica
Veronica Tim
Tim Stew
...
My questions (feel free to answer only one if you can):
1. What code should I use for this without having to manually put the names in a separate data frame?
2. How should I handle a tie, for example if Josh chose Veronica and Tim the same number of times? Later I want to visualise it, and I have no idea how to handle ties.
I would also like to analyse the results to visualise strong connections,
for example to show that some people usually choose each other.
Is there a good package specialised for this, or how should I approach it?
I do not need DNA sequences, only simple ones like these, but I have not found a suitable package yet.
Thanks for your help!
If I am not misunderstanding your problem, here is some code to count how many times each person chose each other person as the next speaker. I added a fourth day so that some counts are greater than 1. There are ties in the result; keeping the first pair in each speaker ('who') group may be one solution:
df <- read.table(textConnection(
"Name,day1,day2,day3,day4
Albert,1,3,1,3
Josh,2,2,,2
Veronica,3,5,3,1
Tim,4,1,2,4
Stew,5,4,4,5"),header=TRUE,sep=",",stringsAsFactors=FALSE)
library(dplyr)
purrr::map(colnames(df)[-1],
  function (x) {
    # names in that day's speaking order; absentees (NA) are dropped
    who <- df$Name[order(df[[x]], na.last=NA)]
    # pair each speaker with the person who spoke next
    data.frame(who,lead(who),stringsAsFactors=FALSE)
  }
) %>%
  bind_rows() %>%
  filter(!is.na(lead.who.)) %>%
  group_by(who,lead.who.) %>% summarise(n=n()) %>%
  arrange(who,desc(n))
Input:
Name day1 day2 day3 day4
1 Albert 1 3 1 3
2 Josh 2 2 NA 2
3 Veronica 3 5 3 1
4 Tim 4 1 2 4
5 Stew 5 4 4 5
Result:
# A tibble: 12 x 3
# Groups: who [5]
who lead.who. n
<chr> <chr> <int>
1 Albert Tim 2
2 Albert Josh 1
3 Albert Stew 1
4 Josh Albert 2
5 Josh Veronica 1
6 Stew Veronica 1
7 Tim Stew 2
8 Tim Josh 1
9 Tim Veronica 1
10 Veronica Josh 1
11 Veronica Stew 1
12 Veronica Tim 1
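For the tie question, one option is to keep every top-ranked choice per person and paste tied names together. The sketch below rebuilds the same transitions with base R's head()/tail() instead of lead() and then counts them with dplyr; collapsing ties into one string is just one possible convention. A cross-tabulation of the same pairs with table() gives a who-chose-whom matrix that could feed a heatmap or network plot:

```r
library(dplyr)

df <- read.table(text = "Name,day1,day2,day3,day4
Albert,1,3,1,3
Josh,2,2,,2
Veronica,3,5,3,1
Tim,4,1,2,4
Stew,5,4,4,5", header = TRUE, sep = ",", stringsAsFactors = FALSE)

# One row per (speaker, next speaker) pair, over all days
pairs <- bind_rows(lapply(names(df)[-1], function(day) {
  speakers <- df$Name[order(df[[day]], na.last = NA)]  # absentees dropped
  data.frame(who = head(speakers, -1), chose = tail(speakers, -1),
             stringsAsFactors = FALSE)
}))

# Keep every top-ranked choice per person; tied names are pasted into one string
favorites <- pairs %>%
  count(who, chose) %>%
  group_by(who) %>%
  filter(n == max(n)) %>%
  summarise(Favorite = paste(sort(chose), collapse = ", "),
            times = first(n))
favorites

# Who-chose-whom matrix (rows: speaker, columns: next speaker)
transitions <- table(pairs$who, pairs$chose)
transitions
```

With this data, Veronica's favorite comes out as "Josh, Stew, Tim", a three-way tie at one choice each.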

Resources