Using reshape where there are multiple values at each time point [duplicate] - r

This question already has answers here:
Combine Multiple Columns Into Tidy Data [duplicate]
(3 answers)
Closed 5 years ago.
I'm trying to reshape a longitudinal dataset containing visual measurements for the left and right eyes of several individuals over a one-year period. I need to end up with a data.frame with the headings 'patient', 'month', 're', 'le' (where 're' means 'right eye' and 'le' means 'left eye').
My data are currently in the format:
'patient','re_month1','le_month1','re_month2','le_month2', ... ,'le_month12'
I know I could use the reshape() function to reshape the data if I only had one value per time point. If I were just working with 'patient', 'month1', 'month2', etc., I could use the following:
reshape(dframe, idvar = 'patient', v.names = 'vision',
        varying = 2:13, direction = "long")
...But how do I do this when there are two pieces of data (or more) at each time point?

We can use melt from data.table and specify the measure columns with the patterns argument; patterns() can take multiple regular expressions (or fixed column names).
library(data.table)
melt(setDT(dframe), id.vars = "patient",
     measure = patterns("^re_", "^le_"))
# patient variable value1 value2
#1: 1 1 20 21
#2: 2 1 25 18
#3: 3 1 23 22
#4: 1 2 18 29
#5: 2 2 22 19
#6: 3 2 25 24
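To get exactly the headings the question asks for ('patient', 'month', 're', 'le'), the generated columns can be renamed afterwards; a small addition to the answer above, using data.table's setnames():
res <- melt(setDT(dframe), id.vars = "patient",
            measure = patterns("^re_", "^le_"))
setnames(res, c("variable", "value1", "value2"), c("month", "re", "le"))
res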
data
dframe <- data.frame(patient = 1:3,
                     re_month1 = c(20, 25, 23), le_month1 = c(21, 18, 22),
                     re_month2 = c(18, 22, 25), le_month2 = c(29, 19, 24))
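For completeness, a tidyr sketch that is not part of the original answer but produces the requested 'patient', 'month', 're', 'le' layout directly (assumes tidyr 1.0+ for pivot_longer()):
library(tidyr)
pivot_longer(dframe, -patient,
             names_to = c(".value", "month"),
             names_sep = "_month")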

Filter by Condition occurring Consecutively in R

I'm hoping to see if there is a dplyr solution to this problem as I'm building a survival dataset.
I am looking to create my 'event' coding so that the event condition is satisfied when it occurs twice consecutively. In this case, the event condition would be Var > 21 on two consecutive dates. For example, in the following dataset:
ID Date Var
1 1/1/20 22
1 1/3/20 23
2 1/2/20 23
2 2/10/20 18
2 2/16/20 21
3 12/1/19 16
3 12/6/19 14
3 12/20/19 22
In this case, patient 1 should remain, and patients 2 and 3 should be filtered out because Var > 21 did not happen on consecutive dates. I'd then like to take the maximum date for each ID so that I can easily calculate the time to the event.
Final result:
ID Date Var
1 1/3/20 23
Thank you
As long as the dates are sorted (the latest date is later in the table), this should work. It is in data.table, since I don't use dplyr that much, but a dplyr version should be similar.
library(data.table)
setDT(df)
# keep rows where Var > 21 on this date and on the next date for the same ID
df <- df[, .SD[Var > 21 & shift(Var > 21, type = "lead", fill = FALSE)], by = ID]
# then keep the latest remaining row per ID (dates assumed sorted ascending)
df <- unique(df, by = "ID", fromLast = TRUE)
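Since the question asks specifically about dplyr, here is a minimal sketch of the same idea (not from the original answer; it assumes rows are already sorted by date within each ID):
library(dplyr)
df %>%
  group_by(ID) %>%
  # keep only IDs where Var > 21 on two consecutive dates
  filter(any(Var > 21 & lead(Var) > 21, na.rm = TRUE)) %>%
  # then take the latest row for each remaining ID
  slice(n()) %>%
  ungroup()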

How to sum a variable by group but do not aggregate the data frame in R? [duplicate]

This question already has answers here:
Count number of rows per group and add result to original data frame
(11 answers)
Calculate group mean, sum, or other summary stats. and assign column to original data
(4 answers)
Closed 4 years ago.
Although I have found a lot of ways to calculate the sum of a variable by group, all the approaches end up creating a new data set that collapses the repeated cases.
To be more precise, if I have a data frame:
id year
1 2010
1 2015
1 2017
2 2011
2 2017
3 2015
and I want to count the number of times each ID appears across the different years, there are a lot of ways (using aggregate, tapply, dplyr, sqldf, etc.) that use a "group by" kind of functionality and in the end produce something like:
id count
1 3
2 2
3 1
I haven't managed to find a way to calculate the same thing but keep my original data frame, in order to obtain:
id year count
1 2010 3
1 2015 3
1 2017 3
2 2011 2
2 2017 2
3 2015 1
and therefore not collapse my repeated cases.
Has somebody already figured this out?
Thank you in advance
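Since this question was closed as a duplicate, no answer appears in the thread. A minimal sketch of the standard approach (a grouped count added as a new column, using dplyr's mutate() with n(), or base R's ave()), with the data reconstructed from the question:
library(dplyr)
df <- data.frame(id = c(1, 1, 1, 2, 2, 3),
                 year = c(2010, 2015, 2017, 2011, 2017, 2015))
# dplyr: add the per-id count without collapsing rows
df %>%
  group_by(id) %>%
  mutate(count = n()) %>%
  ungroup()
# base R equivalent
df$count <- ave(df$id, df$id, FUN = length)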

Average for column value across multiple datasets in R [duplicate]

This question already has answers here:
calculate average over multiple data frames
(5 answers)
Closed 6 years ago.
I am new to R and need help with this. I have 3 data sets from 3 different years. They have the same columns but different values for each year. I want to find the average of the column values across the three years based on the Name field. To be specific, assume the first data set is:
Name Age Height Weight
A 4 20 20
B 5 22 22
C 8 25 21
D 10 25 23
Second data set
Name Age Height Weight
A 5 22 25
B 6 23 26
Third data set
Name Age Height Weight
A 6 24 24
B 7 24 27
C 10 27 28
I want to find the average height for "A" across the three data sets.
We can place them in a list and rbind them, then group by 'Name' and get the mean of each column:
library(data.table)
rbindlist(list(df1, df2, df3))[, lapply(.SD, mean), by = Name]
Or with dplyr (summarise_each(funs(mean)) is superseded in current dplyr; across() does the same):
library(dplyr)
bind_rows(df1, df2, df3) %>%
  group_by(Name) %>%
  summarise(across(everything(), mean))
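data (reconstructed from the tables in the question so the code above can be run)
df1 <- data.frame(Name = c("A", "B", "C", "D"), Age = c(4, 5, 8, 10),
                  Height = c(20, 22, 25, 25), Weight = c(20, 22, 21, 23))
df2 <- data.frame(Name = c("A", "B"), Age = c(5, 6),
                  Height = c(22, 23), Weight = c(25, 26))
df3 <- data.frame(Name = c("A", "B", "C"), Age = c(6, 7, 10),
                  Height = c(24, 24, 27), Weight = c(24, 27, 28))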

R - How can I find duplicated rows based on one column and add extra text to the duplicated values?

I'm looking for an easy solution, instead of doing several steps.
I have a data frame with 36 variables and almost 3000 rows; one of the variables is a character column with names, and they must be unique. I need to find the rows with the same name and then add "duplicated" to the text. I can't delete the duplicated rows because the data come from a relational database and I'll need those row IDs for other operations.
I can find the duplicated rows and then rename the text manually, but that means finding the duplicates, recording the row IDs, and replacing the names by hand.
Is there a way to automatically add the extra text to the duplicated names? I'm still new to R and have a hard time writing condition-based functions.
It would be something like this:
From this:
ID name age sex
1 John 18 M
2 Mary 25 F
3 Mary 19 F
4 Ben 21 M
5 July 35 F
To this:
ID name age sex
1 John 18 M
2 Mary 25 F
3 Mary - duplicated 19 F
4 Ben 21 M
5 July 35 F
Could you guys shed some light?
Thank you very much.
Edit: the comment about adding a column is probably the best thing to do, but if you really want to do what you're suggesting...
The duplicated() function will identify the duplicates; then you just need paste0() to append the text.
df <- data.frame(
  ID = 1:5,
  name = c('John', 'Mary', 'Mary', 'Ben', 'July'),
  age = c(18, 25, 19, 21, 35),
  sex = c('M', 'F', 'F', 'M', 'F'),
  stringsAsFactors = FALSE)
# Add "-duplicated" to every duplicated value (following Laterow's comment)
dup <- duplicated(df$name)
df$name[dup] <- paste0(df$name[dup], '-duplicated')
df
ID name age sex
1 1 John 18 M
2 2 Mary 25 F
3 3 Mary-duplicated 19 F
4 4 Ben 21 M
5 5 July 35 F
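One caveat worth adding (not part of the original answer): if a name can occur three or more times, appending the same text to every duplicate still leaves duplicates. Base R's make.unique() avoids that by numbering the repeats:
# starting again from the original data
df <- data.frame(
  ID = 1:5,
  name = c('John', 'Mary', 'Mary', 'Ben', 'July'),
  age = c(18, 25, 19, 21, 35),
  sex = c('M', 'F', 'F', 'M', 'F'),
  stringsAsFactors = FALSE)
# make.unique() turns John, Mary, Mary into John, Mary, Mary.1 (then Mary.2, ...)
df$name <- make.unique(df$name)
df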

Turning one row into multiple rows in r [duplicate]

This question already has answers here:
Combine Multiple Columns Into Tidy Data [duplicate]
(3 answers)
Closed 5 years ago.
In R, I have data where each person has multiple session dates and scores on some tests, but it is all in one row. I would like to change it so that each person has multiple rows, each with the person's info, one session date, and the corresponding test scores. Also, each person may have completed a different number of sessions.
Ex:
ID Name Session1Date Score Score Session2Date Score Score
23 sjfd 20150904 2 3 20150908 5 7
28 addf 20150905 3 4 20150910 6 8
To:
ID Name SessionDate Score Score
23 sjfd 20150904 2 3
23 sjfd 20150908 5 7
28 addf 20150905 3 4
28 addf 20150910 6 8
You can use melt from data.table (version 1.9.5 or later); it can take multiple 'measure' columns as a list.
library(data.table)#v1.9.5+
melt(setDT(df1), measure = patterns("Date$", "Score(\\.2)*$", "Score\\.[13]"))
# ID Name variable value1 value2 value3
#1: 23 sjfd 1 20150904 2 3
#2: 28 addf 1 20150905 3 4
#3: 23 sjfd 2 20150908 5 7
#4: 28 addf 2 20150910 6 8
Or, using reshape from base R, we can specify the direction as 'long' and varying as a list of column indices:
res <- reshape(df1, idvar=c('ID', 'Name'), varying=list(c(3,6), c(4,7),
c(5,8)), direction='long')
res
# ID Name time Session1Date Score Score.1
#23.sjfd.1 23 sjfd 1 20150904 2 3
#28.addf.1 28 addf 1 20150905 3 4
#23.sjfd.2 23 sjfd 2 20150908 5 7
#28.addf.2 28 addf 2 20150910 6 8
If needed, the rownames can be changed
row.names(res) <- NULL
Update
If the columns follow a specific order, i.e. the 3rd grouped with the 6th, the 4th with the 7th, and the 5th with the 8th, we can create a matrix of column indices and then split it by row to get the list for the varying argument in reshape:
m1 <- matrix(3:8,ncol=2)
lst <- split(m1, row(m1))
reshape(df1, idvar=c('ID', 'Name'), varying=lst, direction='long')
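data (reconstructed from the question; the repeated 'Score' headers become Score, Score.1, Score.2, Score.3 when read into R)
df1 <- data.frame(ID = c(23, 28), Name = c("sjfd", "addf"),
                  Session1Date = c(20150904, 20150905),
                  Score = c(2, 3), Score.1 = c(3, 4),
                  Session2Date = c(20150908, 20150910),
                  Score.2 = c(5, 6), Score.3 = c(7, 8),
                  stringsAsFactors = FALSE)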
If your data frame is named data, you can also simply stack the two halves:
data1 <- data[1:5]
data2 <- data[c(1, 2, 6:8)]
names(data2) <- names(data1)   # rbind() needs matching column names
newdata <- rbind(data1, data2)
This works for the example you've given; for a different number of sessions or columns you would have to adjust the column indices accordingly.
