Wide to Long format with multiple variables? [duplicate] - r

This question already has answers here:
Reshaping multiple sets of measurement columns (wide format) into single columns (long format)
(8 answers)
Closed 4 years ago.
I'm looking to convert a data frame from wide to long format, while maintaining multiple columns.
Here is sample data:
df <- read.table(header=T, text='
Subject Day Correct1 Correct2 Correct3 Percent1 Percent2 Percent3
1 1 1 0 1 50 25 70
2 1 1 0 0 75 30 80
3 1 0 1 1 70 45 90
4 1 0 1 0 80 50 100
5 1 1 1 1 90 60 100
1 2 0 1 0 30 75 90
2 2 0 0 1 45 70 80
3 2 1 1 0 50 30 90
4 2 1 0 0 60 45 100
5 2 1 1 1 80 45 90
')
I would like it to look like this, with a Correct and a Percent column:
Subject Day Correct CorrectValue Percent PercentValue
1 1 1 1 1 50
2 1 1 1 1 75
3 1 1 0 1 70
4 1 1 0 1 80
5 1 1 1 1 90
1 1 2 0 2 25
2 1 2 0 2 30
3 1 2 1 2 45
4 1 2 1 2 50
5 1 2 1 2 60
1 1 3 1 3 70
2 1 3 0 3 80
3 1 3 1 3 90
4 1 3 0 3 100
5 1 3 1 3 100
1 2 1 0 1 30
2 2 1 0 1 45
3 2 1 1 1 50
4 2 1 1 1 60
5 2 1 1 1 80
1 2 2 1 2 75
2 2 2 0 2 70
3 2 2 1 2 30
4 2 2 0 2 45
5 2 2 1 2 45
1 2 3 0 3 90
2 2 3 1 3 80
3 2 3 0 3 90
4 2 3 0 3 100
5 2 3 1 3 90
Thank you!

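With gather from tidyr. Note that two successive gather calls pair every Correct column with every Percent column, so the result has 90 rows; adding filter(Correct == Percent) after the mutate_at step keeps only the 30 matching rows.
library(dplyr)
library(tidyr)
df %>%
  gather(Correct, CorrectValue, Correct1:Correct3) %>%
  gather(Percent, PercentValue, Percent1:Percent3) %>%
  mutate_at(vars(Correct, Percent), ~sub("[[:alpha:]]+", "", .))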
Result:
Subject Day Correct CorrectValue Percent PercentValue
1 1 1 1 1 1 50
2 2 1 1 1 1 75
3 3 1 1 0 1 70
4 4 1 1 0 1 80
5 5 1 1 1 1 90
6 1 2 1 0 1 30
7 2 2 1 0 1 45
8 3 2 1 1 1 50
9 4 2 1 1 1 60
10 5 2 1 1 1 80
11 1 1 2 0 1 50
12 2 1 2 0 1 75
13 3 1 2 1 1 70
14 4 1 2 1 1 80
15 5 1 2 1 1 90
16 1 2 2 1 1 30
17 2 2 2 0 1 45
18 3 2 2 1 1 50
19 4 2 2 0 1 60
20 5 2 2 1 1 80
21 1 1 3 1 1 50
22 2 1 3 0 1 75
23 3 1 3 1 1 70
24 4 1 3 0 1 80
25 5 1 3 1 1 90
...
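As an alternative, here is a minimal sketch using pivot_longer from tidyr (version 1.0.0 or later) with the special ".value" name, which keeps the Correct and Percent measurements paired instead of crossing them. The column name Trial is an assumption introduced here for illustration; it holds the numeric suffix once rather than in separate Correct and Percent index columns as in the question's desired layout.

```r
library(tidyr)

df <- read.table(header = TRUE, text = "
Subject Day Correct1 Correct2 Correct3 Percent1 Percent2 Percent3
1 1 1 0 1 50 25 70
2 1 1 0 0 75 30 80
3 1 0 1 1 70 45 90
4 1 0 1 0 80 50 100
5 1 1 1 1 90 60 100
1 2 0 1 0 30 75 90
2 2 0 0 1 45 70 80
3 2 1 1 0 50 30 90
4 2 1 0 0 60 45 100
5 2 1 1 1 80 45 90
")

# ".value" sends the alphabetic part of each name (Correct, Percent) to its own
# output column; the digit suffix goes into the hypothetical "Trial" column.
long <- pivot_longer(
  df,
  cols = -c(Subject, Day),
  names_to = c(".value", "Trial"),
  names_pattern = "([A-Za-z]+)(\\d+)"
)
```

This should give one row per Subject, Day and Trial (30 rows for the sample data), with Correct and Percent side by side.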

Related

Recode when there is a missing category in R

I need help with recoding. Here is what my dataset looks like.
df <- data.frame(id = c(1,1,1,1,1, 2,2,2,2,2, 3,3,3,3,3, 4,4,4,4,4),
score = c(0,1,0,1,0, 0,2,0,2,2, 0,3,3,0,0, 0,1,3,1,3))
> df
id score
1 1 0
2 1 1
3 1 0
4 1 1
5 1 0
6 2 0
7 2 2
8 2 0
9 2 2
10 2 2
11 3 0
12 3 3
13 3 3
14 3 0
15 3 0
16 4 0
17 4 1
18 4 3
19 4 1
20 4 3
Some ids have missing score categories. When that is the case for an id, I would like to recode its score categories. So:
a) if the score options are `0,1,2` and score `1` is missing, then `2` needs to be recoded as `1`,
b) if the score options are `0,1,2,3` and scores `1,2` are missing, then `3` needs to be recoded as `1`,
c) if the score options are `0,1,2,3` and score `2` is missing, then `3` needs to be recoded as `2`,
The idea is that there should not be any missing score categories in between.
The desired output would be:
> df.1
id score score.recoded
1 1 0 0
2 1 1 1
3 1 0 0
4 1 1 1
5 1 0 0
6 2 0 0
7 2 2 1
8 2 0 0
9 2 2 1
10 2 2 1
11 3 0 0
12 3 3 1
13 3 3 1
14 3 0 0
15 3 0 0
16 4 0 0
17 4 1 1
18 4 3 2
19 4 1 1
20 4 3 2
Using dplyr (this overwrites score in place):
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(score = as.numeric(factor(score)) - 1)
# A tibble: 20 x 2
# Groups: id [4]
id score
<dbl> <dbl>
1 1 0
2 1 1
3 1 0
4 1 1
5 1 0
6 2 0
7 2 1
8 2 0
9 2 1
10 2 1
11 3 0
12 3 1
13 3 1
14 3 0
15 3 0
16 4 0
17 4 1
18 4 2
19 4 1
20 4 2
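If the original score column should be kept alongside the recoded one, as in the desired output, a minimal sketch with dplyr's dense_rank could look like the following. It assumes every id contains a score of 0, so that the lowest rank maps to 0.

```r
library(dplyr)

df <- data.frame(id = rep(1:4, each = 5),
                 score = c(0,1,0,1,0, 0,2,0,2,2, 0,3,3,0,0, 0,1,3,1,3))

# dense_rank() numbers the distinct scores 1, 2, 3, ... within each id
# without gaps; subtracting 1 makes the recoded scale start at 0.
df.1 <- df %>%
  group_by(id) %>%
  mutate(score.recoded = dense_rank(score) - 1) %>%
  ungroup()
```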
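Using data.table:
library(data.table)
# matching against the sorted unique scores gives a gap-free rank within each id,
# regardless of the order in which the scores first appear
setDT(df)[, score.recoded := 0L][
  score > 0, score.recoded := match(score, sort(unique(score))), by = id]
Output: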
> df
id score score.recoded
<num> <num> <int>
1: 1 0 0
2: 1 1 1
3: 1 0 0
4: 1 1 1
5: 1 0 0
6: 2 0 0
7: 2 2 1
8: 2 0 0
9: 2 2 1
10: 2 2 1
11: 3 0 0
12: 3 3 1
13: 3 3 1
14: 3 0 0
15: 3 0 0
16: 4 0 0
17: 4 1 1
18: 4 3 2
19: 4 1 1
20: 4 3 2

How Would I go About Subsetting this Data Frame?

I have the following data frame:
> resident
X LOS Age Meds MHealth DietRest ReligAff NmChores Employed EdLevel Courses
1 R1 27 35 2 1 3 2 2 0 2 1
2 R2 56 43 0 0 0 1 3 1 3 2
3 R3 101 41 1 1 0 0 2 2 2 3
4 R4 19 54 3 2 4 3 1 0 1 0
5 R5 34 29 0 0 0 2 3 0 2 1
6 R6 78 46 2 0 2 1 2 1 3 2
7 R7 134 51 3 2 4 0 1 1 3 2
8 R8 112 38 0 1 1 4 2 1 2 3
9 R9 83 61 3 1 3 2 2 0 4 3
10 R10 9 50 2 0 2 1 1 2 2 0
11 R11 67 23 0 1 0 0 2 0 3 1
12 R12 30 47 2 2 0 3 2 0 4 0
13 R13 95 65 4 1 4 2 2 0 3 2
14 R14 165 63 5 2 4 1 1 0 2 2
15 R15 29 40 0 1 0 0 3 2 5 0
16 R16 44 33 2 2 1 0 2 0 3 1
17 R17 36 48 2 1 0 3 2 0 1 1
18 R18 58 57 3 0 2 1 1 1 2 1
19 R19 116 39 0 1 0 2 2 1 3 1
20 R20 73 44 1 0 0 2 1 0 4 2
21 R21 79 30 3 2 3 3 1 0 2 1
22 R22 39 41 0 0 0 0 3 2 2 2
23 R23 18 50 2 1 2 1 1 1 3 0
24 R24 60 35 1 0 0 0 2 1 4 2
25 R25 106 48 3 2 3 2 2 0 2 2
26 R26 46 31 2 1 0 0 1 1 3 1
27 R27 52 59 2 0 1 1 3 2 2 1
28 R28 28 62 6 0 4 2 1 0 5 1
29 R29 79 45 4 2 3 3 2 1 3 2
30 R30 24 42 1 1 1 0 1 0 2 1
31 R31 123 36 3 1 0 2 2 1 3 4
32 R32 11 49 2 0 2 1 2 0 1 0
33 R33 95 26 1 1 0 1 3 0 3 4
34 R34 61 24 0 0 0 2 2 1 2 1
35 R35 88 63 2 1 0 1 1 1 4 2
36 R36 64 38 1 2 1 4 1 1 2 3
37 R37 99 40 2 0 0 1 3 2 4 1
>
LOS = length of stay (in days)
I am trying to go through the data frame and create a new column that contains either a zero or a one, depending on whether the resident is completing an average of one course every thirty days. How would I go about doing this? As I understand it, the rule would be: if someone has been there between thirty and fifty-nine days and has completed at least one course, they receive a one; if someone has been there between sixty and eighty-nine days and has finished at least two courses, they receive a one; and so forth. Otherwise they receive a zero. How would I create a function that does this and adds a value of either 1 or 0 to a new vector based on the data for each resident?
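One way to express that rule is with integer division, sketched below on five rows taken from the data above. The column name OnTrack is hypothetical, and the sketch assumes that residents with fewer than thirty days (zero courses required) receive a 1, which the question leaves open.

```r
# A few rows from the resident data frame (only the columns the rule needs)
resident <- data.frame(
  X       = c("R1", "R2", "R3", "R7", "R12"),
  LOS     = c(27, 56, 101, 134, 30),
  Courses = c(1, 2, 3, 2, 0)
)

# Number of completed 30-day periods, i.e. the number of courses required
required <- resident$LOS %/% 30

# 1 if the resident has completed at least the required number of courses, else 0
resident$OnTrack <- as.integer(resident$Courses >= required)
```

Because the comparison is vectorised, no explicit loop or subsetting is needed; the same two lines apply unchanged to the full data frame.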

Reshape wide data to long with multiple variables in R (dplyr) [duplicate]

This question already has an answer here:
How to use Pivot_longer to reshape from wide-type data to long-type data with multiple variables
(1 answer)
Closed 2 years ago.
I have a dataset of adolescents over 3 waves. I need to reshape the data from wide to long, but I haven't been able to figure out how to use pivot_longer (I've checked other questions, but maybe I missed one?). Below is sample data:
HAVE DATA:
id c1sports c2sports c3sports c1smoker c2smoker c3smoker c1drinker c2drinker c3drinker
1 1 1 1 1 1 4 1 5 2
2 1 1 1 5 1 3 4 1 4
3 1 0 0 1 1 5 2 3 2
4 0 0 0 1 3 3 4 2 3
5 0 0 0 2 1 2 1 5 3
6 0 0 0 4 1 4 4 3 1
7 1 0 1 2 2 3 1 4 1
8 0 1 1 4 4 1 4 5 4
9 1 1 1 3 2 2 3 4 2
10 0 1 0 2 5 5 4 2 3
WANT DATA:
id wave sports smoker drinker
1 1 1 1 1
1 2 1 1 5
1 3 1 4 2
2 1 1 5 4
2 2 1 1 1
2 3 1 3 4
3 1 1 1 2
3 2 0 1 3
3 3 0 5 2
4 1 0 1 4
4 2 0 3 2
4 3 0 3 3
5 1 0 2 1
5 2 0 1 5
5 3 0 2 3
6 1 0 4 4
6 2 0 1 3
6 3 0 4 1
7 1 1 2 1
7 2 0 2 4
7 3 1 3 1
8 1 0 4 4
8 2 1 4 5
8 3 1 1 4
9 1 1 3 3
9 2 1 2 4
9 3 1 2 2
10 1 0 2 4
10 2 1 2 2
10 3 0 5 3
So far the only thing that I've been able to run is:
long_dat <- wide_dat %>%
  pivot_longer(., cols = c1sports:c3drinker)
But this doesn't get me separate columns for sports, smoker, drinker.
You could use the names_pattern argument of pivot_longer:
tidyr::pivot_longer(df,
                    cols = -id,
                    names_to = c('wave', '.value'),
                    names_pattern = 'c(\\d+)(.*)')
# id wave sports smoker drinker
# <int> <chr> <int> <int> <int>
# 1 1 1 1 1 1
# 2 1 2 1 1 5
# 3 1 3 1 4 2
# 4 2 1 1 5 4
# 5 2 2 1 1 1
# 6 2 3 1 3 4
# 7 3 1 1 1 2
# 8 3 2 0 1 3
# 9 3 3 0 5 2
#10 4 1 0 1 4
# … with 20 more rows
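In the output above, wave comes back as a character column. If an integer wave is preferred, a possible variation is to pass names_transform (available in tidyr 1.1.0 and later), sketched here on the first two ids of the sample data:

```r
library(tidyr)

# First two rows of the sample data from the question
wide_dat <- data.frame(
  id = 1:2,
  c1sports  = c(1, 1), c2sports  = c(1, 1), c3sports  = c(1, 1),
  c1smoker  = c(1, 5), c2smoker  = c(1, 1), c3smoker  = c(4, 3),
  c1drinker = c(1, 4), c2drinker = c(5, 1), c3drinker = c(2, 4)
)

long_dat <- pivot_longer(
  wide_dat,
  cols = -id,
  names_to = c("wave", ".value"),
  names_pattern = "c(\\d+)(.*)",
  # convert the captured wave digits from character to integer
  names_transform = list(wave = as.integer)
)
```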

Plotting count data in r

I have counted crashes at intersections and am wondering how to plot this data as a time series. The data covers the years 2008 to 2018 and can be found at this link. I am interested in the code and the proper technique for plotting the data.
To get the data into a form that table can tabulate, first reshape it with melt from reshape2:
> attidtudeM=melt(df)
> head(attidtudeM)
variable value
1 F2008 0
2 F2008 1
3 F2008 1
4 F2008 2
5 F2008 0
6 F2008 1
> table(attidtudeM)
variable 0 1 2 3 4 5 6 7
F2008 235 38 11 3 0 0 0 0
F2009 244 27 8 6 2 0 0 0
F2010 237 9 31 3 2 2 3 0
F2011 241 33 11 0 1 0 1 0
F2012 246 31 8 1 1 0 0 0
F2013 251 28 7 1 0 0 0 0
F2014 265 16 5 0 0 1 0 0
F2015 261 6 17 0 2 0 1 0
F2016 263 17 5 0 1 0 0 1
F2017 275 7 4 0 0 0 0 1
F2008 F2009 F2010 F2011 F2012 F2013 F2014 F2015 F2016 F2017
[sample rows of per-intersection crash counts; the column alignment of this sparse table is not recoverable]
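As a rough sketch of one way to turn the table() output above into a time series, the frequency table can be collapsed to a total number of crashes per year and drawn with base R's plot. The numbers below are copied from that output; summing to a yearly total is an assumption about what should be plotted, not something the question specifies.

```r
# Rows are years; columns are the number of crashes (0 to 7) at an intersection;
# each cell is the number of intersections with that many crashes.
freq <- matrix(c(
  235, 38, 11, 3, 0, 0, 0, 0,
  244, 27,  8, 6, 2, 0, 0, 0,
  237,  9, 31, 3, 2, 2, 3, 0,
  241, 33, 11, 0, 1, 0, 1, 0,
  246, 31,  8, 1, 1, 0, 0, 0,
  251, 28,  7, 1, 0, 0, 0, 0,
  265, 16,  5, 0, 0, 1, 0, 0,
  261,  6, 17, 0, 2, 0, 1, 0,
  263, 17,  5, 0, 1, 0, 0, 1,
  275,  7,  4, 0, 0, 0, 0, 1
), ncol = 8, byrow = TRUE,
dimnames = list(paste0("F", 2008:2017), as.character(0:7)))

years <- 2008:2017

# Total crashes per year: weight each column by its crash count and sum by row
totals <- as.vector(freq %*% 0:7)

plot(years, totals, type = "b",
     xlab = "Year", ylab = "Total crashes at counted intersections")
```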

Removing the unordered pairs repeated twice in a file in R

I have a file like this in R.
**0 1**
0 2
**0 3**
0 4
0 5
0 6
0 7
0 8
0 9
0 10
**1 0**
1 11
1 12
1 13
1 14
1 15
1 16
1 17
1 18
1 19
**3 0**
As we can see, some unordered pairs appear twice (the marked pairs), for example
1 0
and
0 1
I wish to remove these duplicate pairs, keeping one row per unordered pair. I also want to count how many times each pair occurs and append that count as a third column of the row that is kept. If a pair is not repeated, then 1 should be written in the third column.
For example (a sample of the output file):
0 1 2
0 2 1
0 3 2
0 4 1
0 5 1
0 6 1
0 7 1
0 8 1
0 9 1
0 10 1
1 11 1
1 12 1
1 13 1
1 14 1
1 15 1
1 16 1
1 17 1
1 18 1
1 19 1
How can I achieve it in R?
Here is a way using transform, pmin and pmax to reorder the data by row, and then aggregate to provide a count:
# data
x <- data.frame(a=c(rep(0,10),rep(1,10),3),b=c(1:10,0,11:19,0))
#logic
aggregate(count~a+b,transform(x,a=pmin(a,b), b=pmax(a,b), count=1),sum)
a b count
1 0 1 2
2 0 2 1
3 0 3 2
4 0 4 1
5 0 5 1
6 0 6 1
7 0 7 1
8 0 8 1
9 0 9 1
10 0 10 1
11 1 11 1
12 1 12 1
13 1 13 1
14 1 14 1
15 1 15 1
16 1 16 1
17 1 17 1
18 1 18 1
19 1 19 1
Here's one approach:
First, create a vector of the columns sorted and then pasted together.
x <- apply(mydf, 1, function(x) paste(sort(x), collapse = " "))
Then, use ave to create the counts you are looking for.
mydf$count <- ave(x, x, FUN = length)
Finally, you can use the "x" vector again, this time to detect and remove duplicated values.
mydf[!duplicated(x), ]
# V1 V2 count
# 1 0 1 2
# 2 0 2 1
# 3 0 3 2
# 4 0 4 1
# 5 0 5 1
# 6 0 6 1
# 7 0 7 1
# 8 0 8 1
# 9 0 9 1
# 10 0 10 1
# 12 1 11 1
# 13 1 12 1
# 14 1 13 1
# 15 1 14 1
# 16 1 15 1
# 17 1 16 1
# 18 1 17 1
# 19 1 18 1
# 20 1 19 1
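The steps above can also be written without apply, as in the following self-contained sketch. It rebuilds the sample data under the assumed column names V1 and V2 and builds the key with pmin and pmax, which is vectorised but only works for exactly two columns.

```r
# Sample data matching the question (column names V1, V2 are assumed)
mydf <- data.frame(V1 = c(rep(0, 10), rep(1, 10), 3),
                   V2 = c(1:10, 0, 11:19, 0))

# Order-independent key: smaller value first, larger value second
key <- paste(pmin(mydf$V1, mydf$V2), pmax(mydf$V1, mydf$V2))

# Count how often each unordered pair occurs (integer result)
mydf$count <- ave(seq_along(key), key, FUN = length)

# Keep only the first occurrence of each unordered pair
result <- mydf[!duplicated(key), ]
```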
