Creating a long table from a wide table using merged.stack (or reshape) - r

I have a data frame that looks like this:
ID rd_test_2011 rd_score_2011 mt_test_2011 mt_score_2011 rd_test_2012 rd_score_2012 mt_test_2012 mt_score_2012
1 A 80 XX 100 NA NA BB 45
2 XX 90 NA NA AA 80 XX 80
I want to write a script that would, for IDs that don't have NA's in the yy_test_20xx columns, create a new data frame with the subject taken from the column title, the test name, the test score and year taken from the column title. So, in this example ID 1 would have three entries. Expected output would look like this:
ID Subject Test Score Year
1 rd A 80 2011
1 mt XX 100 2011
1 mt BB 45 2012
2 rd XX 90 2011
2 rd AA 80 2012
2 mt XX 80 2012
I've tried both reshape and various forms of merged.stack. They work in the sense that the output is on the way to being right, but I don't understand the inputs well enough to get all the way there:
library(splitstackshape)
merged.stack(x, id.vars='id', var.stubs=c("rd_test","mt_test"), sep="_")
I've had more success (gotten closer) with reshape:
y<- reshape(x, idvar="id", ids=1:nrow(x), times=grep("test", names(x), value=TRUE),
timevar="year", varying=list(grep("test", names(x), value=TRUE), grep("score",
names(x), value=TRUE)), direction="long", v.names=c("test", "score"),
new.row.names=NULL)

This will get your data into the right format:
df.long = reshape(df, idvar="ID", ids=1:nrow(df), times=grep("Test", names(df), value=TRUE),
timevar="Year", varying=list(grep("Test", names(df), value=TRUE),
grep("Score", names(df), value=TRUE)), direction="long", v.names=c("Test", "Score"),
new.row.names=NULL)
Then omitting NA:
df.long = df.long[!is.na(df.long$Test),]
Then splitting Year to remove Test_:
df.long$Year = sapply(strsplit(df.long$Year, "_"), `[`, 2)
And ordering by ID:
df.long[order(df.long$ID),]
ID Year Test Score
1 1 2011 A 80
5 1 2012 XX 100
2 2 2011 XX 90
9 2 2013 AA 80
6 3 2012 A 10
3 4 2011 A 50
7 4 2012 XX 60
10 4 2013 AA 99
4 5 2011 C 50
8 5 2012 A 75
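The same reshape-then-split idea can be sketched for the rd_/mt_ column layout shown in the question. This is only an illustration in base R, assuming every `*_test_*` column has a matching `*_score_*` column; the helper names `test_cols`, `score_cols`, `key` and `parts` are made up for the example.

```r
# Toy data copied from the question (two IDs, two subjects, two years)
mydf <- data.frame(
  ID = 1:2,
  rd_test_2011 = c("A", "XX"),  rd_score_2011 = c(80L, 90L),
  mt_test_2011 = c("XX", NA),   mt_score_2011 = c(100L, NA),
  rd_test_2012 = c(NA, "AA"),   rd_score_2012 = c(NA, 80L),
  mt_test_2012 = c("BB", "XX"), mt_score_2012 = c(45L, 80L),
  stringsAsFactors = FALSE
)

# Pair every *_test_* column with its *_score_* partner (assumed to exist)
test_cols  <- grep("_test_", names(mydf), value = TRUE)
score_cols <- sub("_test_", "_score_", test_cols)

# One long pass; the original test column name is kept as a key
long <- reshape(mydf, direction = "long", idvar = "ID",
                varying = list(test_cols, score_cols),
                v.names = c("Test", "Score"),
                timevar = "key", times = test_cols)

# Split the key (e.g. "rd_test_2011") into subject and year
parts <- do.call(rbind, strsplit(long$key, "_"))
long$Subject <- parts[, 1]
long$Year    <- as.integer(parts[, 3])

# Drop the combinations that were never taken, then tidy up
long <- long[!is.na(long$Test), c("ID", "Subject", "Test", "Score", "Year")]
long <- long[order(long$ID, long$Year), ]
rownames(long) <- NULL
long
```

Keeping the full column name as the time variable means the subject and the year can both be recovered afterwards with a single strsplit.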

Using reshape:
dat.long <- reshape(dat, direction="long", varying=list(c(2, 4,6), c(3, 5,7)),
times=2011:2013,timevar='Year',
sep="_", v.names=c("Test", "Score"))
dat.long[complete.cases(dat.long),]
ID Year Test Score id
1.2011 1 2011 A 80 1
2.2011 2 2011 XX 90 2
4.2011 4 2011 A 50 4
5.2011 5 2011 C 50 5
1.2012 1 2012 XX 100 1
3.2012 3 2012 A 10 3
4.2012 4 2012 XX 60 4
5.2012 5 2012 A 75 5
2.2013 2 2013 AA 80 2
4.2013 4 2013 AA 99 4

Considering your update, I've entirely rewritten this answer. View the history if you want to see the old version.
The main problem is that your data is "double wide" in a way. Thus, you can actually solve your problem by reshaping in the "long" direction twice. Alternatively, use melt and *cast to melt your data into a very long format and convert it to a semi-wide format.
However, I would still suggest "splitstackshape" (and not just because I wrote it). It can handle this problem fine, but it needs you to rearrange the names of your data. The part of the name that will result in the names of the new columns should come first. In your example, that means "test" and "score" should be the first part of the variable name.
For this, we can use some gsub to rearrange the existing names.
library(splitstackshape)
setnames(mydf, gsub("(rd|mt)_(score|test)_(.*)", "\\2_\\1_\\3", names(mydf)))
names(mydf)
# [1] "ID" "test_rd_2011" "score_rd_2011" "test_mt_2011"
# [5] "score_mt_2011" "test_rd_2012" "score_rd_2012" "test_mt_2012"
# [9] "score_mt_2012"
out <- merged.stack(mydf, "ID", var.stubs=c("test", "score"), sep="_")
setnames(out, c(".time_1", ".time_2"), c("Subject", "Year"))
out[complete.cases(out), ]
# ID Subject Year test score
# 1: 1 mt 2011 XX 100
# 2: 1 mt 2012 BB 45
# 3: 1 rd 2011 A 80
# 4: 2 mt 2012 XX 80
# 5: 2 rd 2011 XX 90
# 6: 2 rd 2012 AA 80
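To see what the gsub rename above does in isolation, here is a small base R sketch on a plain character vector. The vector `old_names` is a shortened, made-up subset of the real column names; the pattern and replacement are the ones used in the answer.

```r
# A few of the original column names (shortened list for illustration)
old_names <- c("ID", "rd_test_2011", "rd_score_2011",
               "mt_test_2012", "mt_score_2012")

# Capture groups: \\1 = subject, \\2 = test/score, \\3 = year.
# The replacement moves the test/score stub to the front.
new_names <- gsub("(rd|mt)_(score|test)_(.*)", "\\2_\\1_\\3", old_names)
new_names
```

Names that do not match the pattern, such as "ID", pass through unchanged, so the rename is safe to apply to the whole names vector.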
For the benefit of others, "mydf" in this answer is defined as:
mydf <- structure(list(ID = 1:2, rd_test_2011 = c("A", "XX"),
rd_score_2011 = c(80L, 90L), mt_test_2011 = c("XX", NA),
mt_score_2011 = c(100L, NA), rd_test_2012 = c(NA, "AA"),
rd_score_2012 = c(NA, 80L), mt_test_2012 = c("BB", "XX"),
mt_score_2012 = c(45L, 80L)),
.Names = c("ID", "rd_test_2011", "rd_score_2011", "mt_test_2011",
"mt_score_2011", "rd_test_2012", "rd_score_2012", "mt_test_2012",
"mt_score_2012"), class = "data.frame", row.names = c(NA, -2L))

Related

converting an abbreviation into a full word

I am trying to avoid writing a long nested ifelse statement in excel.
I am working on two datasets, one where I have abbreviations and county names.
Abbre
COUNTY_NAME
1 AD Adams
2 AS Asotin
3 BE Benton
4 CH Chelan
5 CM Clallam
6 CR Clark
And another data set that contains the county abbreviation and votes.
CountyCode Votes
1 WM 97
2 AS 14
3 WM 163
4 WM 144
5 SJ 21
For the second table, how do I convert the countycode (abbreviation) into the full spelled-out text and add that as a new column?
I have been trying to solve this unsuccessfully using grep, match, and %in%. Clearly I am missing something and any insight would be greatly appreciated.
We can use a join
library(dplyr)
library(tidyr)
df2 <- df2 %>%
left_join(Abbre %>%
separate(COUNTY_NAME, into = c("CountyCode", "FullName")),
by = "CountyCode")
Or use base R
tmp <- read.table(text = Abbre$COUNTY_NAME, header = FALSE,
col.names = c("CountyCode", "FullName"))
df2 <- merge(df2, tmp, by = 'CountyCode', all.x = TRUE)
Another base R option using match
df2$COUNTY_NAME <- with(
df1,
COUNTY_NAME[match(df2$CountyCode, Abbre)]
)
gives
> df2
CountyCode Votes COUNTY_NAME
1 WM 97 <NA>
2 AS 14 Asotin
3 WM 163 <NA>
4 WM 144 <NA>
5 SJ 21 <NA>
A data.table option
> setDT(df1)[setDT(df2), on = .(Abbre = CountyCode)]
Abbre COUNTY_NAME Votes
1: WM <NA> 97
2: AS Asotin 14
3: WM <NA> 163
4: WM <NA> 144
5: SJ <NA> 21
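Another way to think about the match approach is as a named lookup vector. The sketch below is base R and assumes, as the match and data.table answers do, that the first table is called `df1` and has separate `Abbre` and `COUNTY_NAME` columns; the name `lookup` is made up for the example.

```r
# Assumed layout: abbreviation and county name in separate columns
df1 <- data.frame(
  Abbre = c("AD", "AS", "BE", "CH", "CM", "CR"),
  COUNTY_NAME = c("Adams", "Asotin", "Benton", "Chelan", "Clallam", "Clark"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  CountyCode = c("WM", "AS", "WM", "WM", "SJ"),
  Votes = c(97, 14, 163, 144, 21),
  stringsAsFactors = FALSE
)

# Named vector: names are the codes, values are the full county names
lookup <- setNames(df1$COUNTY_NAME, df1$Abbre)

# Indexing by name returns NA for codes that are not in the lookup
df2$COUNTY_NAME <- unname(lookup[df2$CountyCode])
df2
```

Codes missing from the first table (WM and SJ in this excerpt) come back as NA, which makes gaps in the lookup table easy to spot with is.na().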

Programmatically Finding, Correcting IDs in Dataframes with Different Column and Row Lengths

I have two data frames of differing lengths and widths. Both contain panel data on sites across several years, with each site having a unique ID code. However, these unique ID codes were altered for some sites between data frames. For example:
Year <- c(2006,2006,2006,2006)
Name <- as.character(c("A","B","C","D.B"))
Qtr.2 <- as.numeric(c(14,32,62,40))
Code <- as.character(c(123,456,789,101))
DF1 <- data.frame(Year,Name,Qtr.2,Code,stringsAsFactors = FALSE)
Year2 <- c(2007,2007,2007,2007,2007,2007)
Name2 <- as.character(c("A","B","C","E","D.B","D.A"))
Qtr.3 <- as.numeric(c(14,32,62,11,40,20))
Code2 <- as.character(c("W33","456","789","121","W133","W111"))
Type <- as.character(c("Blue","Red","Red","Green","Blue","Red"))
DF2 <- data.frame(Year2,Name2,Qtr.3,Code2,Type,stringsAsFactors = FALSE)
> DF1
Year Name Qtr.2 Code
1 2006 A 14 123
2 2006 B 32 456
3 2006 C 62 789
4 2006 D.B 40 101
> DF2
Year2 Name2 Qtr.3 Code2 Type
1 2007 A 14 W33 Blue
2 2007 B 32 456 Red
3 2007 C 62 789 Red
4 2007 E 11 121 Green
5 2007 D.B 40 W133 Blue
6 2007 D.A 20 W111 Red
Here, site “A's” code has changed from “123” in DF1 to “W33” in DF2.
I am having trouble programmatically finding and converting the altered ID codes to match their prior ID code. In other words, I want to match names from DF1 to DF2, and replace "Code2" in DF2 with "Code" from DF1 when a matching name is discovered. My approach thus far has involved a rather convoluted padding and for loop process. However, I feel this must be a semiregular wrangling problem and there must be a simpler approach.
Ideally, my second DF would look as follows:
Year2_fixed <- c(2007,2007,2007,2007,2007,2007)
Name2_fixed <- as.character(c("A","B","C","E","D.B","D.A"))
Qtr.3_fixed <- as.numeric(c(14,32,62,11,40,20))
Code2_fixed <- as.character(c("123","456","789","121","101","W111"))
Type <- as.character(c("Blue","Red","Red","Green","Blue","Red"))
DF2_fixed <-data.frame(Year2_fixed,Name2_fixed,Qtr.3_fixed,Code2_fixed,Type,stringsAsFactors = FALSE)
> DF2_fixed
Year2_fixed Name2_fixed Qtr.3_fixed Code2_fixed Type
1 2007 A 14 123 Blue
2 2007 B 32 456 Red
3 2007 C 62 789 Red
4 2007 E 11 121 Green
5 2007 D.B 40 101 Blue
6 2007 D.A 20 W111 Red
I have done some looking, but I haven't found a clear answer on SO that gets at this problem. It is possible I am not asking the question clearly enough in searches. Please point it out if it is out there, or let me know if I can clarify my question.
A few last points: I want to be able to perform an inner_join BY the code, preserving those observations that appear in both sets. I am providing a toy example, but, as is often the case, the true problem is too large to manually check these names.
Edit
As pointed out by others, stringsAsFactors = FALSE has been added to prevent errors.
Try using the match command:
DF2 <- within(DF2, {
ind <- match(Name2, DF1$Name)
new_code <- DF1$Code[ind]
Code_fixed <- ifelse(is.na(ind), as.character(Code2), as.character(new_code))
rm(ind, new_code)
})
DF2
A solution is to use dplyr::coalesce along with left_join to get the desired result.
library(dplyr)
DF2 %>% left_join(select(DF1, Name, Code), by=c("Name2" = "Name")) %>%
mutate(Code2 = coalesce(Code, Code2)) %>%
select(-Code)
# Year2 Name2 Qtr.3 Code2 Type
# 1 2007 A 14 123 Blue
# 2 2007 B 32 456 Red
# 3 2007 C 62 789 Red
# 4 2007 E 11 121 Green
# 5 2007 D.B 40 101 Blue
# 6 2007 D.A 20 W111 Red
Note: stringsAsFactors = FALSE has been added in OP's code to create data.frames, otherwise it would generate unnecessary warnings.
Data:
Year <- c(2006,2006,2006,2006)
Name <- as.character(c("A","B","C","D.B"))
Qtr.2 <- as.numeric(c(14,32,62,40))
Code <- as.character(c(123,456,789,101))
DF1 <- data.frame(Year,Name,Qtr.2,Code, stringsAsFactors = FALSE)
Year2 <- c(2007,2007,2007,2007,2007,2007)
Name2 <- as.character(c("A","B","C","E","D.B","D.A"))
Qtr.3 <- as.numeric(c(14,32,62,11,40,20))
Code2 <- as.character(c("W33","456","789","121","W133","W111"))
Type <- as.character(c("Blue","Red","Red","Green","Blue","Red"))
DF2 <- data.frame(Year2,Name2,Qtr.3,Code2,Type, stringsAsFactors = FALSE)
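Since the stated goal is an inner join on the code once the IDs agree, the whole round trip can be sketched in base R as follows. This is an illustration only: it overwrites `Code2` in place using the match idea from above and then uses merge(), whose default is an inner join; `idx` and `joined` are names made up for the example.

```r
DF1 <- data.frame(Year = 2006, Name = c("A", "B", "C", "D.B"),
                  Qtr.2 = c(14, 32, 62, 40),
                  Code = c("123", "456", "789", "101"),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(Year2 = 2007, Name2 = c("A", "B", "C", "E", "D.B", "D.A"),
                  Qtr.3 = c(14, 32, 62, 11, 40, 20),
                  Code2 = c("W33", "456", "789", "121", "W133", "W111"),
                  Type = c("Blue", "Red", "Red", "Green", "Blue", "Red"),
                  stringsAsFactors = FALSE)

# Position of each DF2 name in DF1 (NA when the site is new in DF2)
idx <- match(DF2$Name2, DF1$Name)

# Use the old code where the name matched, keep the DF2 code otherwise
DF2$Code2 <- ifelse(is.na(idx), DF2$Code2, DF1$Code[idx])

# merge() keeps only rows whose code appears in both data frames
joined <- merge(DF1, DF2, by.x = "Code", by.y = "Code2")
joined
```

Sites that exist only in DF2 (E and D.A here) keep their own codes and simply drop out of the inner join.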

R Cleaning and reordering names/serial numbers in data frame

Let's say I have a data frame as follows in R:
Data <- data.frame("SerialNum" = character(), "Year" = integer(), "Name" = character(), stringsAsFactors = F)
Data[1,] <- c("983\n837\n424\n ", 2015, "Michael\nLewis\nPaul\n ")
Data[2,] <- c("123\n456\n789\n136", 2014, "Elaine\nJerry\nGeorge\nKramer")
Data[3,] <- c("987\n654\n321\n975\n ", 2010, "John\nPaul\nGeorge\nRingo\nNA")
Data[4,] <- c("424\n983\n837", 2015, "Paul\nMichael\nLewis")
Data[5,] <- c("456\n789\n123\n136", 2014, "Jerry\nGeorge\nElaine\nKramer")
What I want to do is the following:
Split up each string of names and each string of serial numbers so that they are their own vectors (or a list of string vectors).
Eliminate any character "NA" in either set of vectors or any blank spaces denoted by "...\n ".
Reorder each list of names alphabetically and reorder the corresponding serial numbers according to the same permutation.
Concatenate each vector in the same fashion it was originally (I usually do this with paste(., collapse = "\n")).
My issue is how to do this without using a for loop. What is an object-oriented way to do this? As a first attempt in this direction I originally made a list by the command LIST <- strsplit(Data$Name, split = "\n") and from here I need a for loop in order to find the permutations of the names, which seems like a process that won't scale according to my actual data. Additionally, once I make the list LIST I'm not sure how I go about removing NA symbols or blank spaces. Any help is appreciated!
Using lapply I take each row of the data frame and turn it into a new data frame with one name per row. This creates a list of 5 data frames, one for each row of the original data frame.
seinfeld = lapply(1:nrow(Data), function(i) {
# Turn strings into data frame with one name per row
dat = data.frame(SerialNum=unlist(strsplit(Data[i,"SerialNum"], split="\n")),
Year=Data[i,"Year"],
Name=unlist(strsplit(Data[i,"Name"], split="\n")))
# Get rid of empty strings and NA values
dat = dat[!(dat$Name %in% c(""," ","NA")), ]
# Order alphabetically
dat = dat[order(dat$Name), ]
})
UPDATE: Based on your comment, let me know if this is the result you're trying to achieve:
seinfeld = lapply(1:nrow(Data), function(i) {
# Turn strings into data frame with one name per row
dat = data.frame(SerialNum=unlist(strsplit(Data[i,"SerialNum"], split="\n")),
Name=unlist(strsplit(Data[i,"Name"], split="\n")))
# Get rid of empty strings and NA values
dat = dat[!(dat$Name %in% c(""," ","NA")), ]
# Order alphabetically
dat = dat[order(dat$Name), ]
# Collapse back into a single row with the new sort order
dat = data.frame(SerialNum=paste(dat[, "SerialNum"], collapse="\n"),
Year=Data[i, "Year"],
Name=paste(dat[, "Name"], collapse="\n"))
})
do.call(rbind, seinfeld)
SerialNum Year Name
1 837\n983\n424 2015 Lewis\nMichael\nPaul
2 123\n789\n456\n136 2014 Elaine\nGeorge\nJerry\nKramer
3 321\n987\n654\n975 2010 George\nJohn\nPaul\nRingo
4 837\n983\n424 2015 Lewis\nMichael\nPaul
5 123\n789\n456\n136 2014 Elaine\nGeorge\nJerry\nKramer
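The four steps listed in the question (split, clean, reorder, re-join) can also be sketched without building intermediate data frames. This is a base R illustration, not the answer's method; it assumes the serial numbers and names in each row split into the same number of pieces, and `clean_sort` and `res` are hypothetical names.

```r
Data <- data.frame(
  SerialNum = c("983\n837\n424\n ", "123\n456\n789\n136",
                "987\n654\n321\n975\n ", "424\n983\n837",
                "456\n789\n123\n136"),
  Year = c(2015, 2014, 2010, 2015, 2014),
  Name = c("Michael\nLewis\nPaul\n ", "Elaine\nJerry\nGeorge\nKramer",
           "John\nPaul\nGeorge\nRingo\nNA", "Paul\nMichael\nLewis",
           "Jerry\nGeorge\nElaine\nKramer"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: handles one row's pair of strings
clean_sort <- function(serial, name) {
  s <- trimws(strsplit(serial, "\n", fixed = TRUE)[[1]])
  n <- trimws(strsplit(name, "\n", fixed = TRUE)[[1]])
  keep <- !(n %in% c("", "NA"))   # drop blanks and the literal string "NA"
  s <- s[keep]
  n <- n[keep]
  o <- order(n)                   # permutation that sorts the names
  c(SerialNum = paste(s[o], collapse = "\n"),
    Name = paste(n[o], collapse = "\n"))
}

# mapply walks both columns in parallel; t() gives one row per input row
res <- t(mapply(clean_sort, Data$SerialNum, Data$Name, USE.NAMES = FALSE))
Data$SerialNum <- res[, "SerialNum"]
Data$Name <- res[, "Name"]
Data
```

The same permutation `o` is applied to both vectors, which is what keeps each serial number attached to its name.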
eipi10 offered a great answer. In addition, I'd like to leave what I tried, mainly with data.table. First, I split the two columns (SerialNum and Name) with cSplit(), added an index with add_rownames(), and split the data by that index. In the first lapply(), I used Stacked() from the splitstackshape package to stack SerialNum and Name; the separated SerialNum and Name values become two columns, as you can see in the part of temp2 shown below. In the second lapply(), I used merge from the data.table package. Then I removed rows with NAs (lapply(na.omit)), combined all the data tables (rbindlist), and ordered the rows by rowname (the row number in the original data) and Name (setorder(rowname, Name)).
library(data.table)
library(splitstackshape)
library(dplyr)
cSplit(mydf, c("SerialNum", "Name"), direction = "wide",
type.convert = FALSE, sep = "\n") %>%
add_rownames %>%
split(f = .$rowname) -> temp
#a part of temp
#$`1`
#Source: local data frame [1 x 12]
#
#rowname Year SerialNum_1 SerialNum_2 SerialNum_3 SerialNum_4 SerialNum_5 Name_1 Name_2
#(chr) (dbl) (chr) (chr) (chr) (chr) (chr) (chr) (chr)
#1 1 2015 983 837 424 NA NA Michael Lewis
#Variables not shown: Name_3 (chr), Name_4 (chr), Name_5 (chr)
lapply(temp, function(x){
Stacked(x, var.stubs = c("SerialNum", "Name"), sep = "_")
}) -> temp2
# A part of temp2
#$`1`
#$`1`$SerialNum
# rowname Year .time_1 SerialNum
#1: 1 2015 1 983
#2: 1 2015 2 837
#3: 1 2015 3 424
#4: 1 2015 4 NA
#5: 1 2015 5 NA
#
#$`1`$Name
# rowname Year .time_1 Name
#1: 1 2015 1 Michael
#2: 1 2015 2 Lewis
#3: 1 2015 3 Paul
#4: 1 2015 4 NA
#5: 1 2015 5 NA
lapply(1:nrow(mydf), function(x){
merge(temp2[[x]]$SerialNum, temp2[[x]]$Name, by = c("rowname", "Year", ".time_1"))
}) %>%
lapply(na.omit) %>%
rbindlist %>%
setorder(rowname, Name) -> out
print(out)
# rowname Year .time_1 SerialNum Name
# 1: 1 2015 2 837 Lewis
# 2: 1 2015 1 983 Michael
# 3: 1 2015 3 424 Paul
# 4: 2 2014 1 123 Elaine
# 5: 2 2014 3 789 George
# 6: 2 2014 2 456 Jerry
# 7: 2 2014 4 136 Kramer
# 8: 3 2010 3 321 George
# 9: 3 2010 1 987 John
#10: 3 2010 2 654 Paul
#11: 3 2010 4 975 Ringo
#12: 4 2015 3 837 Lewis
#13: 4 2015 2 983 Michael
#14: 4 2015 1 424 Paul
#15: 5 2014 3 123 Elaine
#16: 5 2014 2 789 George
#17: 5 2014 1 456 Jerry
#18: 5 2014 4 136 Kramer
DATA
mydf <- structure(list(SerialNum = c("983\n837\n424\n ", "123\n456\n789\n136",
"987\n654\n321\n975\n ", "424\n983\n837", "456\n789\n123\n136"
), Year = c(2015, 2014, 2010, 2015, 2014), Name = c("Michael\nLewis\nPaul\n ",
"Elaine\nJerry\nGeorge\nKramer", "John\nPaul\nGeorge\nRingo\nNA",
"Paul\nMichael\nLewis", "Jerry\nGeorge\nElaine\nKramer")), .Names = c("SerialNum",
"Year", "Name"), row.names = c(NA, -5L), class = "data.frame")

melt multiple groups of measure.vars

I have a data.table containing a number of variables across multiple years, i.e:
> dt <- data.table(id=1:3, A_2011=rnorm(3), A_2012=rnorm(3),
B_2011=rnorm(3), B_2012=rnorm(3),
C_2011=rnorm(3), C_2012=rnorm(3))
> dt
id A_2011 A_2012 B_2011 B_2012 C_2011 C_2012
1: 1 -0.8262134 0.832013744 -2.320136 0.1275409 -0.1344309 0.7360329
2: 2 0.9350433 0.279966534 -0.725613 0.2514631 1.0246772 -0.2009985
3: 3 1.1520396 -0.005775964 1.376447 -1.2826486 -0.8941282 0.7513872
I would like to melt this table into variable groups by year, i.e:
> dtLong <- data.table(id=rep(dt[,id], 2), year=c(rep(2011, 3), rep(2012, 3)),
A=c(dt[,A_2011], dt[,A_2012]),
B=c(dt[,B_2011], dt[,B_2012]),
C=c(dt[,C_2011], dt[,C_2012]))
> dtLong
id year A B C
1: 1 2011 -0.826213405 -2.3201355 -0.1344309
2: 2 2011 0.935043336 -0.7256130 1.0246772
3: 3 2011 1.152039595 1.3764468 -0.8941282
4: 1 2012 0.832013744 0.1275409 0.7360329
5: 2 2012 0.279966534 0.2514631 -0.2009985
6: 3 2012 -0.005775964 -1.2826486 0.7513872
I can easily do this for one set of variables using melt.data.frame from the reshape2 package:
> melt(dt[,list(id, A_2011, A_2012)], measure.vars=c("A_2011", "A_2012"))
But haven't been able to achieve this for multiple measure.vars with a common "factor".
You can do this easily with reshape from base R
reshape(dt, varying = 2:7, sep = "_", direction = 'long')
This will give you the following output
id time A B C
1.2011 1 2011 -0.1602428 0.428154271 0.384892382
2.2011 2 2011 1.4493949 0.178833067 2.404267878
3.2011 3 2011 -0.1952697 1.072979813 -0.653812311
1.2012 1 2012 1.7151334 0.007261567 1.521799983
2.2012 2 2012 1.0866426 0.060728118 -1.158503305
3.2012 3 2012 1.0584738 -0.508854175 -0.008505982
From ?melt samples:
melt(DT, id=1:2, measure=patterns("^f_", "^d_"), value.factor=TRUE)
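For comparison, here is a small deterministic sketch of the base R reshape route, with the time column named `year` and the row names cleaned up. It uses made-up numbers in a plain data frame called `wide` instead of the random data.table above, and relies on reshape() guessing the variable stubs and times by splitting the column names on `sep`.

```r
# Made-up values so the result is easy to check by eye
wide <- data.frame(id = 1:3,
                   A_2011 = c(1, 2, 3),  A_2012 = c(4, 5, 6),
                   B_2011 = c(7, 8, 9),  B_2012 = c(10, 11, 12))

# "A_2011" is split on "_" into the stub "A" and the time 2011
long <- reshape(wide, direction = "long", idvar = "id",
                varying = 2:5, sep = "_", timevar = "year")

rownames(long) <- NULL   # drop the "1.2011"-style row names
long
```

Each stub (A, B) becomes one value column, so adding C_2011 and C_2012 would only require widening the `varying` range.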

Reshaping repeated measures data in R wide to long

I need to convert a "wide" dataframe of annually repeated measures on individuals into "long" format so that I can model it like lm(y_year2 ~ x_year1) as well as lm(z_year2 ~ y_year2)
I can get it into the format I want "by hand" but cannot figure out how to melt/dcast it into the shape I want.
Below I've illustrated what I'm doing with some simulated data
The dataframe is like this in wide format, one individual per line
ID SITE L_03 M_03 R_03 L_04 M_04 R_04 L_05 M_05 R_05
1 forest X a YES Y b YES Z c NO
2 forest ...
I'd like it in LONG format:
ID SITE L_year1 L_year2 M_year1 M_year2 R_year1 R_year2 year1 year2
1 forest X Y a b YES YES 03 04
1 forest Y Z b c YES NO 04 05
2 forest ...
2 forest ...
Some Simulated data:
L and M are numeric (length & mass), R is a Yes/No factor (reproductive), 3 years of repeated measurements (2003-2005)
ID <- 1:10
SITE <- c(rep("forest",3), rep("swamp",3), rep("field",4))
L_03 <- round(rnorm(10, 100, 1),3)
M_03 <- round((10 + L_03*0.25 + rnorm(10, 0, 1)), 3)
R_03 <- sample(c("Yes", "No"), 10, replace = TRUE)
L_04 <- round((2 + L_03*1.25 + rnorm(10, 1,10)), 3)
M_04 <- round((10 + L_04*0.25 + rnorm(10, 0,10)), 3)
R_04 <- sample(c("Yes", "No"), 10, replace = TRUE)
L_05 <- round((2 + L_04*1.25 + rnorm(10, 1,10)),3)
M_05 <- round((10 + L_05*0.25 + abs(rnorm(10, 0,10))),3)
R_05 <- sample(c("Yes", "No"), 10, replace = TRUE)
rm_data <- data.frame(ID, SITE, L_03, M_03, R_03, L_04, M_04,R_04, L_05, M_05, R_05)
Approach 1: My ad hoc reshaping "by hand" with rbind
1st, make subset with 2003 & 2004 data, then another w/ 2004 & 2005
rm_data1 <- cbind(rm_data[ ,c(1,2,3:5, 6:8)], rep(2003,10), rep(2004,10))
rm_data2 <- cbind(rm_data[ ,c(1,2,6:8, 9:11)],rep(2004,10), rep(2005,10))
names(rm_data1)[3:10]<- c("L1", "M1", "R1", "L2", "M2", "R2", "yr1", "yr2")
names(rm_data2)[3:10]<- c("L1", "M1", "R1", "L2", "M2", "R2", "yr1", "yr2")
data3 <- rbind(rm_data1, rm_data2)
Approach 2?: I'd like to do this with reshape/melt/dcast. I can't figure out if I can use dcast directly on the wide dataframe or, once I melt it, how to dcast it into the format I want.
library(reshape2)
rm_measure_vars <- c("L_03", "M_03", "R_03", "L_04", "M_04","R_04", "L_05", "M_05", "R_05")
rm_data_melt <- melt(data = rm_data, id.vars = c("ID", "SITE"), measure.vars = rm_measure_vars, value.name = "data")
I add a designator of the year the measurement was taken to the melted data
obs_year <- gsub("(.*)([0-9]{2})", "\\2", rm_data_melt$variable)
rm_data_melt <- cbind(rm_data_melt, obs_year)
The dcast seems like it should be something like this, but this is not yet what I need
dcast(data = rm_data_melt, formula = ID + SITE + obs_year ~ variable)
ID SITE obs_year L_03 M_03 R_03 L_04 M_04 R_04 L_05 M_05 R_05
1 1 forest 03 99.96 35.364 No <NA> <NA> <NA> <NA> <NA> <NA>
2 1 forest 04 <NA> <NA> <NA> 129.595 47.256 Yes <NA> <NA> <NA>
3 1 forest 05 <NA> <NA> <NA> <NA> <NA> <NA> 177.607 58.204 Yes
Any suggestions would be greatly appreciated
I gave it a try. The reshape is the easy part. The rest needs some semi-manual handling, I believe. The following should give you what you want.
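output <- reshape(rm_data, idvar=c("ID","SITE"), varying=3:11,
v.names=c("L_","M_","R_"), direction="long")
# reshape stacks the rows year by year; sort so each ID's years are consecutive
output <- output[order(output$ID, output$time), ]
output$time <- output$time + 2 # to get the year
names(output)[3:6] <- c("year1", "L_year1", "M_year1", "R_year1")
output$year2 <- output$year1+1
rownames(output) <- c()
# shift each measurement column up by one row to get the following year's value
sapply(output[,4:6], function(x) {
i <- ncol(output)+1
output[,i] <<- x[c(2:length(x), NA)]
names(output)[i] <<- sub("1","2",names(output)[i-4])
})
output <- output[output$year1 < 5, ] # the last year of each ID has no following year
output <- output[,c(1,2,4,8,5,9,6,10,3,7)] # rearrange columns as necessary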
Hope this helps!
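An alternative way to pair each year with the following one is to reshape to long once and then merge the long table with a copy of itself whose year is shifted back by one. The sketch below is base R on a made-up two-individual data set (`rm_small`) with only the L and M measurements; `first`, `second` and `pairs` are names chosen for the example, and the years are coded 3, 4, 5 as in the question.

```r
# Made-up miniature of the wide data: two individuals, three years
rm_small <- data.frame(ID = 1:2, SITE = c("forest", "swamp"),
                       L_03 = c(100, 101), M_03 = c(35, 36),
                       L_04 = c(127, 128), M_04 = c(42, 43),
                       L_05 = c(160, 161), M_05 = c(50, 51),
                       stringsAsFactors = FALSE)

# Long format: one row per individual and year
long <- reshape(rm_small, direction = "long", idvar = "ID",
                varying = list(c("L_03", "L_04", "L_05"),
                               c("M_03", "M_04", "M_05")),
                v.names = c("L", "M"), timevar = "year", times = 3:5)

# Shift the copy back one year so that year t + 1 lines up with year t
first  <- long
second <- long
second$year <- second$year - 1L

# Inner join: only years that have a following year survive
pairs <- merge(first, second, by = c("ID", "SITE", "year"),
               suffixes = c("_year1", "_year2"))
names(pairs)[names(pairs) == "year"] <- "year1"
pairs$year2 <- pairs$year1 + 1L
pairs <- pairs[order(pairs$ID, pairs$year1), ]
pairs
```

With the data in this shape, a model such as lm(L_year2 ~ L_year1, data = pairs) uses every consecutive pair of years for every individual.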
Install the onetree package.
devtools::install_github("yikeshu0611/onetree")
library(onetree)
Three steps, using the onetree package:
1st step
Reshape the data to long format.
long1=reshape_toLong(data = rm_data,
id = "ID",
j = "year",
value.var.prefix = c("L_","M_","R_"))
2nd step
Drop year 5 and keep years 3 and 4; duplicate year as y.
long2=long1[long1$year!=5,]
long2$y=long2$year
Reshape long2 to wide format by year.
wide1=reshape_toWide(data = long2,
id = "ID",
j = "year",
value.var.prefix = c("L_","M_","R_","y")
)
Now we have data with year 3 and year 4, which are year1 and year2 in your target data.
So we replace 3 with 1 and 4 with 2 in the column names.
colnames(wide1)=gsub(3,1,colnames(wide1))
colnames(wide1)=gsub(4,2,colnames(wide1))
3rd step
Repeat the 2nd step, this time dropping year 3 and keeping years 4 and 5.
long3=long1[long1$year!=3,]
long3$y=long3$year
wide2=reshape_toWide(data = long3,
id = "ID",
j = "year",
value.var.prefix = c("L_","M_","R_","y")
)
colnames(wide2)=gsub(4,1,colnames(wide2))
colnames(wide2)=gsub(5,2,colnames(wide2))
Last step
rbind wide1 and wide2.
data=rbind(wide1,wide2)
data[order(data$ID),]
ID SITE L_1 M_1 R_1 y1 L_2 M_2 R_2 y2
1 1 forest 100.181 34.279 Yes 3 131.88 50.953 No 4
11 1 forest 131.88 50.953 No 4 158.642 50.255 No 5
2 2 forest 101.645 36.667 Yes 3 123.923 43.915 No 4
12 2 forest 123.923 43.915 No 4 163.81 55.979 No 5
3 3 forest 98.961 33.901 Yes 3 125.928 41.611 No 4
13 3 forest 125.928 41.611 No 4 165.865 57.417 No 5
4 4 swamp 100.807 36.254 No 3 117.856 48.634 Yes 4
14 4 swamp 117.856 48.634 Yes 4 137.487 50.945 No 5
5 5 swamp 99.75 33.881 No 3 132.419 50.563 Yes 4
15 5 swamp 132.419 50.563 Yes 4 168.461 58.373 Yes 5
6 6 swamp 100.463 34.859 Yes 3 122.884 40.301 No 4
16 6 swamp 122.884 40.301 No 4 152.85 57.491 No 5
7 7 field 102.527 34.521 No 3 123.363 35.935 No 4
17 7 field 123.363 35.935 No 4 168 55.692 No 5
8 8 field 99.957 35.236 Yes 3 139.083 34.793 No 4
18 8 field 139.083 34.793 No 4 177.648 62.638 Yes 5
9 9 field 100.16 36.454 No 3 135.468 45.115 Yes 4
19 9 field 135.468 45.115 Yes 4 180.666 57.233 No 5
10 10 field 100.037 35.612 No 3 139.165 46.95 No 4
20 10 field 139.165 46.95 No 4 169.333 55.782 Yes 5
