Rearrange data to align dates across years - R

I have the following dataframe in R.
df <- data.frame(
"DateValue" = c("2016-07-01", "2016-07-02", "2016-07-03", "2016-07-04", "2016-07-05", "2016-07-06","2017-07-01", "2017-07-02", "2017-07-03", "2017-07-04", "2017-07-05", "2017-07-06", "2018-07-01", "2018-07-02", "2018-07-03", "2018-07-04", "2018-07-05", "2018-07-06"),
"Age1" = seq(1:18),
"Age2" = c(seq(14,36,2), rep(NA, 6)),
"Age3" = c(seq(45,50),rep(NA, 12))
)
#    DateValue Age1 Age2 Age3
# 1 2016-07-01 1 14 45
# 2 2016-07-02 2 16 46
# 3 2016-07-03 3 18 47
# 4 2016-07-04 4 20 48
# 5 2016-07-05 5 22 49
# 6 2016-07-06 6 24 50
# 7 2017-07-01 7 26 NA
# 8 2017-07-02 8 28 NA
# 9 2017-07-03 9 30 NA
# 10 2017-07-04 10 32 NA
# 11 2017-07-05 11 34 NA
# 12 2017-07-06 12 36 NA
# 13 2018-07-01 13 NA NA
# 14 2018-07-02 14 NA NA
# 15 2018-07-03 15 NA NA
# 16 2018-07-04 16 NA NA
# 17 2018-07-05 17 NA NA
# 18 2018-07-06 18 NA NA
I am trying to write code that aligns the data in the "Age2" and "Age3" columns so that the dates line up. Below is the output I am looking for:
df <- data.frame(
"DateValue" = c("07-01", "07-02", "07-03", "07-04", "07-05", "07-06"),
"Age1" = 13:18,
"Age2" = seq(26,36,2),
"Age3" = seq(45,50)
)
# DateValue Age1 Age2 Age3
# 1 07-01 13 26 45
# 2 07-02 14 28 46
# 3 07-03 15 30 47
# 4 07-04 16 32 48
# 5 07-05 17 34 49
# 6 07-06 18 36 50
I am essentially keeping all the dates and values for the current year (2018) and matching them with the same dates in previous years. Note that previous years may have more dates than the current year, but I need to drop all rows that have no data for the current year. I reviewed the following thread on SO about rearranging a data frame, but its context is quite different from my situation.
R Data Rearrange
I tried the R reshape package but haven't had any luck. Any suggestions or pointers would be appreciated.

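Here's a non-robust solution (it assumes that the last non-NA values of every column are the ones to align, and that Age3 has exactly one value per current-year date):
df$DateValue = format(as.Date(df$DateValue), '%m-%d')
# number of non-NA values in the shortest column (6 here)
Age3_non_NA <- sum(!is.na(df[['Age3']]))
# for every column, drop the NAs and keep only the last Age3_non_NA values
df <- as.data.frame(lapply(df, function(l) tail(na.omit(l), Age3_non_NA)))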
df
DateValue Age1 Age2 Age3
1 07-01 13 26 45
2 07-02 14 28 46
3 07-03 15 30 47
4 07-04 16 32 48
5 07-05 17 34 49
6 07-06 18 36 50
And here's a more robust solution that includes gather and spread:
library(tidyr)
library(dplyr)
library(lubridate)
df %>%
mutate(DateValue = as.Date(DateValue),
Year = year(DateValue),
Mon_Day = format(DateValue, '%m-%d')) %>%
select(-DateValue) %>%
gather(Age, val, -Year, -Mon_Day, na.rm = TRUE) %>%
group_by(Age, Mon_Day) %>%
filter(Year == max(Year)) %>%   # keep the most recent year per Age column and date
ungroup() %>%
select(-Year) %>%
spread(Age, val)
# A tibble: 6 x 4
Mon_Day Age1 Age2 Age3
<chr> <dbl> <dbl> <dbl>
1 07-01 13 26 45
2 07-02 14 28 46
3 07-03 15 30 47
4 07-04 16 32 48
5 07-05 17 34 49
6 07-06 18 36 50

Here's one way to do it. This could definitely be refactored, but it works.
library(dplyr)
# DateValue may be a factor (the default in R < 4.0); convert it to Date
df$DateValue <- as.Date(as.character(df$DateValue), format="%Y-%m-%d")
# grab the month and day from DateValue, and sort from newest to oldest
df <- df %>%
mutate(MonthAndDay = format(DateValue, "%m-%d")) %>%
arrange(desc(DateValue))
# get vector of unique month-day values
dates <- df$MonthAndDay[!duplicated(df$MonthAndDay)]
# define age columns
agecols <- c("Age1","Age2","Age3")
# initialize a data frame to be populated in the loop
temp <- data.frame(MonthAndDay = dates)
# For each column, get the values that a) are in the target dates and b) aren't NA, then keep only the first row per date (the most recent one, which is why we sorted by date above). Add those values as a new column, matched by date.
for (col in agecols) {
temp_col <- filter(df, MonthAndDay %in% dates, !is.na(.data[[col]]))
temp_col <- temp_col[!duplicated(temp_col$MonthAndDay), ]
temp[[col]] <- temp_col[[col]][match(temp$MonthAndDay, temp_col$MonthAndDay)]
}
temp %>% arrange(MonthAndDay)
# MonthAndDay Age1 Age2 Age3
# 1 07-01 13 26 45
# 2 07-02 14 28 46
# 3 07-03 15 30 47
# 4 07-04 16 32 48
# 5 07-05 17 34 49
# 6 07-06 18 36 50

Using base R, here is one way
#Get age columns
age_cols <- grep("^Age", names(df))
#Convert date to actual object
df$DateValue <- as.Date(df$DateValue)
#Get year from date
df$year <- as.integer(format(df$DateValue, "%Y"))
#Get month-date from Date
df$month_date <- format(df$DateValue, "%m-%d")
#Select dates which are present in max year
subset_date <- with(df, month_date[year == max(year)])
#For each age column, order the rows from the newest year, drop the NA values, and take the first (most recent) match for each date in subset_date
cbind.data.frame(DateValue = subset_date,
sapply(df[age_cols], function(x) {
o <- order(df$year, decreasing = TRUE)
keep <- o[!is.na(x[o])]
x[keep][match(subset_date, df$month_date[keep])]
}))
# DateValue Age1 Age2 Age3
#1 07-01 13 26 45
#2 07-02 14 28 46
#3 07-03 15 30 47
#4 07-04 16 32 48
#5 07-05 17 34 49
#6 07-06 18 36 50
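The question also says that previous years may contain dates the current year lacks. A minimal base R sketch of the same approach as a reusable function, assuming the date column can be parsed by as.Date(); the function name align_latest and the extra 2017-07-07 row in the test data are hypothetical, added only to show that such a date is dropped:

```r
# Hypothetical helper: keep the month-days of the latest year and, for each
# value column, take the most recent non-NA value recorded for that month-day.
align_latest <- function(df, value_cols, date_col = "DateValue") {
  dates <- as.Date(df[[date_col]])
  year <- as.integer(format(dates, "%Y"))
  month_day <- format(dates, "%m-%d")
  # Month-days present in the latest year; any other dates are dropped
  target <- month_day[year == max(year)]
  # Row indices ordered from the newest year to the oldest
  o <- order(year, decreasing = TRUE)
  out <- data.frame(DateValue = target, stringsAsFactors = FALSE)
  for (col in value_cols) {
    x <- df[[col]]
    keep <- o[!is.na(x[o])]  # newest-first rows where this column has data
    out[[col]] <- x[keep][match(target, month_day[keep])]
  }
  out
}

# Question data plus a hypothetical extra 2017-07-07 row
df_extra <- data.frame(
  DateValue = c(sprintf("2016-07-%02d", 1:6), sprintf("2017-07-%02d", 1:7),
                sprintf("2018-07-%02d", 1:6)),
  Age1 = 1:19,
  Age2 = c(seq(14, 38, 2), rep(NA, 6)),
  Age3 = c(45:50, rep(NA, 13))
)
res <- align_latest(df_extra, c("Age1", "Age2", "Age3"))
res
```

A date with no value in some column simply gives NA for that column, because match() finds nothing.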


R | Mutate with condition for multiple columns

I want to calculate the row mean if at least three of the six observations in the row are not NA. If four or more NAs are present, the mean should be NA.
Here is an example that gives me the mean, ignoring the NAs:
library(dplyr)
a <- 1:10
b <- a+10
c <- a+20
d <- a+30
e <- a+40
f <- a+50
df <- data.frame(a,b,c,d,e,f)
df[2,c(1,3,4,6)] <- NA
df[5,c(1,4,6)] <- NA
df[8,c(1,2,5,6)] <- NA
df <- df %>% mutate(mean = rowMeans(df[,1:6], na.rm=TRUE))
I thought about using case_when, but I'm not sure how to use it correctly:
df <- df %>% mutate(mean = case_when( ~ rowMeans(df[,1:6], na.rm=TRUE), TRUE ~ NA))
You can try a base R solution: save the number of non-NA values in a new variable, then use ifelse() for the mean:
#Data
a <- 1:10
b <- a+10
c <- a+20
d <- a+30
e <- a+40
f <- a+50
df <- data.frame(a,b,c,d,e,f)
df[2,c(1,3,4,6)] <- NA
df[5,c(1,4,6)] <- NA
df[8,c(1,2,5,6)] <- NA
#Code
#Count number of non NA
df$count <- rowSums(!is.na(df[, 1:6]))
#Compute mean
df$Mean <- ifelse(df$count >= 3, rowMeans(df[, 1:6], na.rm = TRUE), NA)
Output:
a b c d e f count Mean
1 1 11 21 31 41 51 6 26.00000
2 NA 12 NA NA 42 NA 2 NA
3 3 13 23 33 43 53 6 28.00000
4 4 14 24 34 44 54 6 29.00000
5 NA 15 25 NA 45 NA 3 28.33333
6 6 16 26 36 46 56 6 31.00000
7 7 17 27 37 47 57 6 32.00000
8 NA NA 28 38 NA NA 2 NA
9 9 19 29 39 49 59 6 34.00000
10 10 20 30 40 50 60 6 35.00000
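If you need this for other thresholds or column sets, the count-and-mask idea can be wrapped in a small function. This is a sketch; the name mean_if_enough and its min_non_na argument are hypothetical:

```r
# Hypothetical helper: row mean of `cols`, or NA when fewer than
# `min_non_na` of those values are present in the row.
mean_if_enough <- function(data, cols, min_non_na = 3) {
  vals <- data[, cols, drop = FALSE]
  n_ok <- rowSums(!is.na(vals))                # non-NA count per row
  out <- unname(rowMeans(vals, na.rm = TRUE))  # mean of the available values
  out[n_ok < min_non_na] <- NA_real_           # too few values: NA
  out
}

a <- 1:10
df <- data.frame(a, b = a + 10, c = a + 20, d = a + 30, e = a + 40, f = a + 50)
df[2, c(1, 3, 4, 6)] <- NA
df[5, c(1, 4, 6)] <- NA
df[8, c(1, 2, 5, 6)] <- NA
df$Mean <- mean_if_enough(df, 1:6)
df
```

Because rowSums() and rowMeans() are vectorized, this avoids computing anything row by row and also skips the helper count column.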
You could do:
library(dplyr)
df %>%
rowwise %>%
mutate(
mean = case_when(
sum(is.na(c_across())) < 4 ~ mean(c_across(), na.rm = TRUE),
TRUE ~ NA_real_)
) %>% ungroup()
Output:
# A tibble: 10 x 7
a b c d e f mean
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 11 21 31 41 51 26
2 NA 12 NA NA 42 NA NA
3 3 13 23 33 43 53 28
4 4 14 24 34 44 54 29
5 NA 15 25 NA 45 NA 28.3
6 6 16 26 36 46 56 31
7 7 17 27 37 47 57 32
8 NA NA 28 38 NA NA NA
9 9 19 29 39 49 59 34
10 10 20 30 40 50 60 35
This leverages rowwise() and c_across(), which operate at the row level, so you can use functions such as sum and mean in the usual way (also inside case_when).
c_across() also has a cols argument where you can specify which columns to take into account. For example, to use only columns 1:6:
df %>%
rowwise %>%
mutate(
mean = case_when(
sum(is.na(c_across(1:6))) < 4 ~ mean(c_across(1:6), na.rm = TRUE),
TRUE ~ NA_real_)
) %>% ungroup()
Alternatively, to take into account all columns except the second one, you would use c_across(-2). You can also use column names, e.g. c_across(a:f) for all columns as in the first example, or c_across(-b) for all columns except b.
This column selection is handled by dplyr, but you could also take the whole c_across() (which defaults to all columns, i.e. everything()) and subset it as an ordinary vector, e.g. c_across()[1:6] or c_across()[-2].
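For comparison, a rough base R analogue of rowwise() plus c_across(1:6): apply() passes each row of the selected columns to a function as a plain vector. This is a sketch and is usually slower than the vectorized rowSums()/rowMeans() approach on large data:

```r
a <- 1:10
df <- data.frame(a, b = a + 10, c = a + 20, d = a + 30, e = a + 40, f = a + 50)
df[2, c(1, 3, 4, 6)] <- NA
df[5, c(1, 4, 6)] <- NA
df[8, c(1, 2, 5, 6)] <- NA

# apply() hands each row of columns 1:6 to the function as a plain numeric
# vector, roughly what c_across(1:6) gives you inside rowwise()
df$mean <- apply(df[, 1:6], 1, function(row) {
  if (sum(is.na(row)) < 4) mean(row, na.rm = TRUE) else NA_real_
})
df
```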
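We can create an index first and then do the assignment based on that index (restricting both steps to columns 1:6 and initializing Mean so that rows failing the condition get NA):
i1 <- rowSums(!is.na(df[, 1:6])) >= 3
df$Mean <- NA_real_
df$Mean[i1] <- rowMeans(df[i1, 1:6], na.rm = TRUE)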
df
# a b c d e f Mean
#1 1 11 21 31 41 51 26.00000
#2 NA 12 NA NA 42 NA NA
#3 3 13 23 33 43 53 28.00000
#4 4 14 24 34 44 54 29.00000
#5 NA 15 25 NA 45 NA 28.33333
#6 6 16 26 36 46 56 31.00000
#7 7 17 27 37 47 57 32.00000
#8 NA NA 28 38 NA NA NA
#9 9 19 29 39 49 59 34.00000
#10 10 20 30 40 50 60 35.00000

Group_by / summarize by two variables within a function

I would like to write a function that summarizes the provided data by some specified criteria, in this case by age.
The example data is a table of users' age and their stats.
df <- data.frame('Age'=rep(18:25,2), 'X1'=10:17, 'X2'=28:35,'X4'=22:29)
Next I define the output columns that are relevant for the analysis
output_columns <- c('Age', 'X1', 'X2', 'X3')
This function computes the sum of X1, X2 and X3 grouped by age:
aggr <- function(data, criteria, output_columns){
k <- data %>% .[, colnames(.) %in% output_columns] %>%
group_by_(.dots = criteria) %>%
#summarise_each(funs(count), age) %>%
summarize_if(is.numeric, sum)
return (k)
}
When I call it like this
> e <- aggr(df, "Age", output_columns)
> e
# A tibble: 8 x 3
Age X1 X2
<int> <int> <int>
1 18 20 56
2 19 22 58
3 20 24 60
4 21 26 62
5 22 28 64
6 23 30 66
7 24 32 68
8 25 34 70
I want to have another column called count which shows the number of observations in each age group. Desired output is
> desired
Age X1 X2 count
1 18 20 56 2
2 19 22 58 2
3 20 24 60 2
4 21 26 62 2
5 22 28 64 2
6 23 30 66 2
7 24 32 68 2
8 25 34 70 2
I have tried different ways to do that, e.g. tally(), summarize_each(), etc., but they all deliver wrong results.
I believe there should be an easy and simple way to do that.
Any help is appreciated.
Since you're already summing all variables, you can just add a column of 1s before the summary function:
aggr <- function(data, criteria, output_columns){
data %>%
.[, colnames(.) %in% output_columns] %>%
group_by_(.dots = criteria) %>%
mutate(n = 1L) %>%
summarize_if(is.numeric, sum)
}
# A tibble: 8 x 4
Age X1 X2 n
<int> <int> <int> <int>
1 18 20 56 2
2 19 22 58 2
3 20 24 60 2
4 21 26 62 2
5 22 28 64 2
6 23 30 66 2
7 24 32 68 2
8 25 34 70 2
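The same trick (summing a column of 1s to get group sizes) also works without dplyr. A minimal sketch using base R's rowsum(), which sums every column within groups and returns the groups in sorted order; the function name aggr_base is hypothetical, and it assumes a single grouping column:

```r
# Hypothetical base-R variant of aggr(): add a column of 1s, then let
# rowsum() sum every column within each group (groups come back sorted).
aggr_base <- function(data, criteria, output_columns) {
  keep <- setdiff(intersect(names(data), output_columns), criteria)
  vals <- cbind(data[keep], n = 1L)
  sums <- rowsum(vals, group = data[[criteria]])
  out <- data.frame(sort(unique(data[[criteria]])), sums)
  names(out)[1] <- criteria
  rownames(out) <- NULL
  out
}

df <- data.frame(Age = rep(18:25, 2), X1 = 10:17, X2 = 28:35, X4 = 22:29)
output_columns <- c("Age", "X1", "X2", "X3")
res <- aggr_base(df, "Age", output_columns)
res
```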
We could create the 'count' column before summarise_if:
aggr<- function(data, criteria, output_columns){
data %>%
select(intersect(names(.), output_columns))%>%
group_by_at(criteria)%>%
group_by(count = n(), .add = TRUE) %>%
summarize_if(is.numeric,sum) %>%
select(setdiff(names(.), 'count'), count)
}
aggr(df,"Age",output_columns)
# A tibble: 8 x 4
# Groups: Age [8]
# Age X1 X2 count
# <int> <int> <int> <int>
#1 18 20 56 2
#2 19 22 58 2
#3 20 24 60 2
#4 21 26 62 2
#5 22 28 64 2
#6 23 30 66 2
#7 24 32 68 2
#8 25 34 70 2
In base R you could do
aggr <- function(data, criteria, output_columns){
ds <- data[, colnames(data) %in% output_columns]
d <- aggregate(ds, by=list(criteria), function(x) c(sum(x), length(x)))
"names<-"(do.call(data.frame, d)[, -c(2:3, 5)], c(names(ds), "n"))
}
> with(df, aggr(df, Age, output_columns))
Age X1 X2 n
1 18 20 56 2
2 19 22 58 2
3 20 24 60 2
4 21 26 62 2
5 22 28 64 2
6 23 30 66 2
7 24 32 68 2
8 25 34 70 2

Having a subset of a data set based on a specific condition

I have a data set in which the values of "age" have different units (days, months, years). I want to convert the rows whose values are in days or months to years. How can I do this in R?
If there is no letter after the number, then the unit is years.
If there is a ‘D’ after the number, then the unit is days (e.g. 10D means 10 days)
If there is an ‘M’ after the number, then the unit is months (e.g. 5M means 5 months).
Age <- c("33","32","44","54M","67M","34D","33D","44","77","88M","49 D","55D","11M")
ID <- c(1,2,3,4,5,6,7,8,9,10,11,12,13)
Data <- data.frame(ID,Age)
> Data
ID Age
1 1 33
2 2 32
3 3 44
4 4 54M
5 5 67M
6 6 34D
7 7 33D
8 8 44
9 9 77
10 10 88M
11 11 49 D
12 12 55D
13 13 11M
Here's a quick way in base R:
Data$units = ifelse(grepl("M", Data$Age), "month", ifelse(grepl("D", Data$Age), "day", "year"))
Data$value = as.numeric(gsub(pattern = "[^0-9]", replacement = "", Data$Age))
Data$result = with(Data,
ifelse(units == "year", value,
ifelse(units == "month", value / 12, value / 365.25)))
Data
# ID Age units value result
# 1 1 33 year 33 33.00000000
# 2 2 32 year 32 32.00000000
# 3 3 44 year 44 44.00000000
# 4 4 54M month 54 4.50000000
# 5 5 67M month 67 5.58333333
# 6 6 34D day 34 0.09308693
# 7 7 33D day 33 0.09034908
# 8 8 44 year 44 44.00000000
# 9 9 77 year 77 77.00000000
# 10 10 88M month 88 7.33333333
# 11 11 49 D day 49 0.13415469
# 12 12 55D day 55 0.15058179
# 13 13 11M month 11 0.91666667
And here's another option using tidyverse tools:
library(dplyr)
library(stringr)
Data %>%
mutate(Unit = str_extract(string = Age,pattern = "[DM]"),
Unit = if_else(is.na(Unit),'Y',Unit),
Age = as.numeric(gsub(pattern = "[MD]","",Age))) %>%
mutate(AgeYears = Age / c('Y' = 1,'M' = 12,'D' = 365)[Unit])
ID Age Unit AgeYears
1 1 33 Y 33.00000000
2 2 32 Y 32.00000000
3 3 44 Y 44.00000000
4 4 54 M 4.50000000
5 5 67 M 5.58333333
6 6 34 D 0.09315068
7 7 33 D 0.09041096
8 8 44 Y 44.00000000
9 9 77 Y 77.00000000
10 10 88 M 7.33333333
11 11 49 D 0.13424658
12 12 55 D 0.15068493
13 13 11 M 0.91666667
# base R
Age <-c("33","32","44","54M","67M","34D","33D","44","77","88M","49 D","55D","11M")
AgeNum<- as.numeric(sub("\\s*\\D$","",Age))
Age[grepl("M$",Age)] <- AgeNum[grepl("M$",Age)]/12
Age[grepl("D$",Age)] <- AgeNum[grepl("D$",Age)]/365
Age <- as.numeric(Age)
result:
> Age
[1] 33.00000000 32.00000000 44.00000000 4.50000000 5.58333333 0.09315068 0.09041096 44.00000000
[9] 77.00000000 7.33333333 0.13424658 0.15068493 0.91666667
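The answers above differ slightly in the day divisor (365.25 vs 365). A compact base R sketch that also handles the optional space before the unit letter; the function name age_to_years and the choice of 365.25 days per year are assumptions, the unit letters are assumed to be uppercase, and any other unit yields NA:

```r
# Hypothetical helper: convert strings such as "54M", "34D" or "49 D" to
# years. A bare number is already in years; 365.25 days per year is assumed.
age_to_years <- function(x) {
  unit <- sub("^[0-9.]+\\s*", "", x)            # "", "M" or "D"
  value <- as.numeric(sub("\\s*[MD]$", "", x))  # the numeric part
  divisor <- c(Y = 1, M = 12, D = 365.25)[ifelse(unit == "", "Y", unit)]
  unname(value / divisor)
}

Age <- c("33","32","44","54M","67M","34D","33D","44","77","88M","49 D","55D","11M")
round(age_to_years(Age), 4)
```

Looking the divisor up in a named vector keeps all three cases in one expression instead of nested ifelse() calls.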
Additionally, a further solution using data.table:
> library(data.table)
> dt <- data.table(ID, Age)
> dt[, Unit := ifelse(grepl("D$", Age), "D", ifelse(grepl("M$", Age), "M", "Y"))][
, Age := as.integer(gsub("M|D", "", Age))]
> dt[, Age_in_years := ifelse(Unit == "Y", Age,
ifelse(Unit == "M", Age / 12, Age / 365.25))][]
ID Age Unit Age_in_years
1: 1 33 Y 33.00000000
2: 2 32 Y 32.00000000
3: 3 44 Y 44.00000000
4: 4 54 M 4.50000000
5: 5 67 M 5.58333333
6: 6 34 D 0.09308693
7: 7 33 D 0.09034908
8: 8 44 Y 44.00000000
9: 9 77 Y 77.00000000
10: 10 88 M 7.33333333
11: 11 49 D 0.13415469
12: 12 55 D 0.15058179
13: 13 11 M 0.91666667

How to replace NA values with the previous nearest value with respect to a corresponding column

Hi, I have a data frame. How can I replace the NA values in "Val_1" based on the nearest value of "Val_2"?
For example, Val_1 is missing at ID 4, and the corresponding Val_2 is 33.3. The nearest previous Val_2 is 33 (ID 3), so the NA should be replaced with that row's Val_1, i.e. 45. Likewise, ID 8 should get 33, because the nearest Val_2 to 44.6 is 44.5 (ID 6).
ID Date Val_1 Val_2
1 01-02-2014 NA 22
2 02-02-2014 23 NA
3 03-02-2014 45 33
4 04-02-2014 NA 33.3
5 05-02-2014 45 46
6 06-02-2014 33 44.5
7 07-02-2014 56 48
8 08-02-2014 NA 44.6
9 09-02-2014 10 43
10 10-02-2014 14 56
11 11-02-2014 NA NA
12 12-02-2014 22 22
I can fill NA values with:
library(zoo)
na.locf(na.locf(DF$Val_1), fromLast = TRUE)
but the code above replaces NAs with the previous value from the same column.
Desired output:
ID Date Val_1 Val_2
1 01-02-2014 NA 22
2 02-02-2014 23 NA
3 03-02-2014 45 33
4 04-02-2014 45 33.3
5 05-02-2014 45 46
6 06-02-2014 33 44.5
7 07-02-2014 56 48
8 08-02-2014 33 44.6
9 09-02-2014 10 43
10 10-02-2014 14 56
11 11-02-2014 NA NA
12 12-02-2014 22 22
Thanks
Sorry, but I couldn't think of a simpler way:
# To use pipes
library(dplyr)
# Give a threshold. Nearest values must have a difference of at most this threshold
diff.threshold <- 0.5
# Create a vector with IDs that must have Val_1 updated
IDtoReplace <- DF %>% filter(is.na(Val_1), !is.na(Val_2)) %>%
select(ID) %>%
unlist()
for (id in IDtoReplace){
# Get Val_2 from current id
curVal2 <- DF %>% filter(ID==id) %>% select(Val_2) %>% unlist()
# Get value to be input
valuetoinput <- DF %>% filter(!is.na(Val_1),!is.na(Val_2),ID < id) %>% # Filter out all NA values and keep only previous ID
mutate(diff = abs(Val_2-curVal2)) %>% # Calculate all the differences
filter(diff==min(diff),diff<=diff.threshold) %>% # Keep row with minimum difference (it has to be below the threshold)
select(Val_1) %>% # Select Val_1
unlist()
# If any value is found, replace it in the data frame
if(length(valuetoinput)>0)
DF[which(DF$ID==id),"Val_1"] <- valuetoinput
}
And as result:
> DF
ID Date Val_1 Val_2
1 1 01-02-2014 NA 22.0
2 2 02-02-2014 23 NA
3 3 03-02-2014 45 33.0
4 4 04-02-2014 45 33.3
5 5 05-02-2014 45 46.0
6 6 06-02-2014 33 44.5
7 7 07-02-2014 56 48.0
8 8 08-02-2014 33 44.6
9 9 09-02-2014 10 43.0
10 10 10-02-2014 14 56.0
11 11 11-02-2014 NA NA
12 12 12-02-2014 22 22.0
If you will use something like this often, I suggest rewriting the for loop as a function.
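A minimal sketch of such a function in base R, without dplyr; the name fill_from_nearest and its threshold argument are hypothetical. Like the loop, it only looks at rows with a smaller ID and can reuse values filled in earlier iterations; when two candidates are equally close, which.min() picks the first candidate row:

```r
# Hypothetical function version of the loop: fill a missing Val_1 with the
# Val_1 of the earlier row (smaller ID) whose Val_2 is closest, provided the
# difference is at most `threshold`.
fill_from_nearest <- function(DF, threshold = 0.5) {
  for (i in seq_len(nrow(DF))) {
    # only rows with Val_1 missing but Val_2 present need filling
    if (!is.na(DF$Val_1[i]) || is.na(DF$Val_2[i])) next
    prev <- which(DF$ID < DF$ID[i] & !is.na(DF$Val_1) & !is.na(DF$Val_2))
    if (length(prev) == 0) next
    d <- abs(DF$Val_2[prev] - DF$Val_2[i])
    # which.min() returns the first candidate row on ties
    if (min(d) <= threshold) DF$Val_1[i] <- DF$Val_1[prev][which.min(d)]
  }
  DF
}

DF <- data.frame(
  ID = 1:12,
  Date = sprintf("%02d-02-2014", 1:12),
  Val_1 = c(NA, 23, 45, NA, 45, 33, 56, NA, 10, 14, NA, 22),
  Val_2 = c(22, NA, 33, 33.3, 46, 44.5, 48, 44.6, 43, 56, NA, 22)
)
res <- fill_from_nearest(DF)
res
```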

Combine two dataframes one above the other

I have two data frames and I want to put one above the other, with the column names of the second inserted as a row of the new data frame. The column names are different, and one data frame has more columns.
For example:
mydf1 <- data.frame(V1=c(1:5), V2=c(21:25))
mydf1
V1 V2
1 1 21
2 2 22
3 3 23
4 4 24
5 5 25
mydf2 <- data.frame(C1=c(1:10), C2=c(21:30),C3=c(41:50))
mydf2
C1 C2 C3
1 1 21 41
2 2 22 42
3 3 23 43
4 4 24 44
5 5 25 45
6 6 26 46
7 7 27 47
8 8 28 48
9 9 29 49
10 10 30 50
Result:
mydf
V1 V2
1 1 21 NA
2 2 22 NA
3 3 23 NA
4 4 24 NA
5 5 25 NA
6 C1 C2 C3
7 1 21 41
8 2 22 42
9 3 23 43
10 4 24 44
11 5 25 45
12 6 26 46
13 7 27 47
14 8 28 48
15 9 29 49
16 10 30 50
I don't mind if all numeric values are treated as characters.
Many thanks
You can do this easily without any packages:
mydf1 <- data.frame(V1=c(1:5), V2=c(21:25))
mydf1[,3] <- NA
names(mydf1) <- c("one", "two", "three")
mydf2 <- data.frame(C1=c(1:10), C2=c(21:30),C3=c(41:50))
names <- t(as.data.frame(names(mydf2)))
names <- as.data.frame(names)
names(mydf2) <- c("one", "two", "three")
names(names) <- c("one", "two", "three")
mydf3 <- rbind(mydf1, names)
mydf4 <- rbind(mydf3, mydf2)
> mydf4
one two three
1 1 21 <NA>
2 2 22 <NA>
3 3 23 <NA>
4 4 24 <NA>
5 5 25 <NA>
6 C1 C2 C3
7 1 21 41
8 2 22 42
9 3 23 43
10 4 24 44
11 5 25 45
12 6 26 46
13 7 27 47
14 8 28 48
15 9 29 49
16 10 30 50
Of course, you can edit the c("one", "two", "three") vectors to make the final column names whatever you'd like. For example:
> mydf1 <- data.frame(V1=c(1:5), V2=c(21:25))
> mydf1[,3] <- NA
> names(mydf1) <- c("V1", "V2", "NA")
> mydf2 <- data.frame(C1=c(1:10), C2=c(21:30),C3=c(41:50))
> names <- t(as.data.frame(names(mydf2)))
> names <- as.data.frame(names)
> names(mydf2) <- c("V1", "V2", "NA")
> names(names) <- c("V1", "V2", "NA")
> mydf3 <- rbind(mydf1, names)
> mydf4 <- rbind(mydf3, mydf2)
> row.names(mydf4) <- NULL
> mydf4
V1 V2 NA
1 1 21 <NA>
2 2 22 <NA>
3 3 23 <NA>
4 4 24 <NA>
5 5 25 <NA>
6 C1 C2 C3
7 1 21 41
8 2 22 42
9 3 23 43
10 4 24 44
11 5 25 45
12 6 26 46
13 7 27 47
14 8 28 48
15 9 29 49
16 10 30 50
If you need to resort to a package for any reason when scaling this up to your real use case, try melt from reshape2 or the plyr package. However, a package shouldn't be necessary.
I don't know what you tried with write.table, but that seems to me like the way to go.
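I would create a function like this:
myFun <- function(...) {
L <- list(...)
# write every data frame (with its header row) to the same temporary CSV file
temp <- tempfile()
on.exit(unlink(temp))
maxCol <- max(vapply(L, ncol, 1L))
lapply(L, function(x)
suppressWarnings(   # silences the "appending column names to file" warning
write.table(x, file = temp, row.names = FALSE,
sep = ",", append = TRUE)))
# read it back; fill = TRUE pads the shorter rows up to maxCol columns
read.csv(temp, header = FALSE, fill = TRUE,
col.names = paste0("New_", sequence(maxCol)),
stringsAsFactors = FALSE)
}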
Usage would then simply be:
myFun(mydf1, mydf2)
# New_1 New_2 New_3
# 1 V1 V2
# 2 1 21
# 3 2 22
# 4 3 23
# 5 4 24
# 6 5 25
# 7 C1 C2 C3
# 8 1 21 41
# 9 2 22 42
# 10 3 23 43
# 11 4 24 44
# 12 5 25 45
# 13 6 26 46
# 14 7 27 47
# 15 8 28 48
# 16 9 29 49
# 17 10 30 50
The function is written such that you can specify more than two data.frames as input:
mydf3 <- data.frame(matrix(1:8, ncol = 4))
myFun(mydf1, mydf2, mydf3)
# New_1 New_2 New_3 New_4
# 1 V1 V2
# 2 1 21
# 3 2 22
# 4 3 23
# 5 4 24
# 6 5 25
# 7 C1 C2 C3
# 8 1 21 41
# 9 2 22 42
# 10 3 23 43
# 11 4 24 44
# 12 5 25 45
# 13 6 26 46
# 14 7 27 47
# 15 8 28 48
# 16 9 29 49
# 17 10 30 50
# 18 X1 X2 X3 X4
# 19 1 3 5 7
# 20 2 4 6 8
Here's one approach with the rbind.fill function (part of the plyr package).
library(plyr)
setNames(rbind.fill(setNames(mydf1, names(mydf2[seq(mydf1)])),
rbind(names(mydf2), mydf2)), names(mydf1))
V1 V2 NA
1 1 21 <NA>
2 2 22 <NA>
3 3 23 <NA>
4 4 24 <NA>
5 5 25 <NA>
6 C1 C2 C3
7 1 21 41
8 2 22 42
9 3 23 43
10 4 24 44
11 5 25 45
12 6 26 46
13 7 27 47
14 8 28 48
15 9 29 49
16 10 30 50
Give this a try.
Add a third column to the first data set, assign the column names of the second data set to a vector, and then replace the second set's names with those of the first set. Then rbind a list whose middle element is the vector you assigned. Since everything is in the right order, rbind works.
# d1 and d2 are mydf1 and mydf2 from the question
d1 <- mydf1
d2 <- mydf2
d1$V3 <- NA
nm <- names(d2)
names(d2) <- names(d1)
dc <- do.call(rbind, list(d1,nm,d2))
rownames(dc) <- NULL
dc
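The same rename-and-rbind idea can be generalised to any number of data frames of different widths. A sketch in base R; the name stack_with_headers and the V1, V2, ... output names are hypothetical. It converts everything to character, inserts the column names of every data frame after the first as a row (as in the desired result), and pads narrower data frames with NA:

```r
# Hypothetical helper: stack data frames of different widths, inserting the
# column names of every data frame after the first as a row.
stack_with_headers <- function(...) {
  dfs <- list(...)
  width <- max(vapply(dfs, ncol, integer(1)))
  blocks <- lapply(seq_along(dfs), function(i) {
    d <- dfs[[i]]
    # character matrix with the same layout as d (filled column by column)
    m <- matrix(unlist(lapply(d, as.character), use.names = FALSE),
                nrow = nrow(d))
    if (i > 1) m <- rbind(names(d), m)  # header row for later data frames
    # pad on the right with NA up to the widest data frame
    cbind(m, matrix(NA_character_, nrow(m), width - ncol(m)))
  })
  out <- as.data.frame(do.call(rbind, blocks), stringsAsFactors = FALSE)
  names(out) <- paste0("V", seq_len(width))
  out
}

mydf1 <- data.frame(V1 = 1:5, V2 = 21:25)
mydf2 <- data.frame(C1 = 1:10, C2 = 21:30, C3 = 41:50)
res <- stack_with_headers(mydf1, mydf2)
res
```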
