R: Reprint column headers when using rbind

Is there a way to reprint column headers directly below the last row of the first data set (directly above the second data set) when using rbind to put two data sets together? I have searched and searched but haven't seen any examples like this. Thanks!

I generated some example data (it is easier if you provide this yourself when asking the question). Basically, you take the column names of the second dataframe and convert them to a one-row dataframe. You also need the setNames function to give each dataframe that you want to rbind the same column names as the first dataframe.
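# stringsAsFactors = FALSE keeps the columns as character (the default since R 4.0.0),
# so the header text can be added as a row instead of becoming NA
df1 <- data.frame(one=c("a", "b"), two=c("c", "d"), stringsAsFactors = FALSE)
df1
# one two
#1 a c
#2 b d
df2 <- data.frame(three=c("e", "f"), four=c("g", "g"), stringsAsFactors = FALSE)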
df2
# three four
#1 e g
#2 f g
rbind(df1,
setNames(as.data.frame(t(colnames(df2))), names(df1)),
setNames(df2, names(df1)))
# one two
#1 a c
#2 b d
#3 three four
#4 e g
#5 f g
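If you need to stack more than two data frames this way, the same idea can be wrapped in a small function. This is a sketch, not part of the answer above: rbind_with_headers is a hypothetical helper name, and it assumes every data frame has the same number of columns. It converts all columns to character, since the header rows are text.

```r
# Hypothetical helper: stack data frames, inserting each later data frame's
# column names as a row directly above its data. Assumes equal column counts.
rbind_with_headers <- function(...) {
  dfs <- list(...)
  first_names <- names(dfs[[1]])
  # Convert every column to character so header text and data can share a column,
  # then rename the columns to match the first data frame
  to_char <- function(df) {
    df[] <- lapply(df, as.character)
    setNames(df, first_names)
  }
  pieces <- list(to_char(dfs[[1]]))
  for (df in dfs[-1]) {
    # One-row data frame holding this data frame's column names
    header_row <- setNames(as.data.frame(t(names(df)), stringsAsFactors = FALSE),
                           first_names)
    pieces <- c(pieces, list(header_row, to_char(df)))
  }
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL  # renumber rows 1..n
  out
}

df1 <- data.frame(one = c("a", "b"), two = c("c", "d"), stringsAsFactors = FALSE)
df2 <- data.frame(three = c("e", "f"), four = c("g", "g"), stringsAsFactors = FALSE)
result <- rbind_with_headers(df1, df2)
print(result)
```

Calling rbind_with_headers(df1, df2, df3) would insert a header row before both df2 and df3.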

A less sophisticated approach, but it will work for you.
Import both data frames with header = FALSE.
After that, use:
library(dplyr)
final<- bind_rows(df1,df2) ##this will bind both the data frames
names(final) <- as.character(unlist(final[1, ])) ##this will take the 1st row as column names or header
final <- final[-1,] ##this will remove your 1st row which is not useful now.
This should give you the result you want.
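The same header = FALSE trick can be sketched in base R, without dplyr. This is an illustration under assumptions: the two files are replaced by inline CSV text passed to read.csv(text = ...), and rbind() stands in for bind_rows(), which works here because both data frames get the same default column names.

```r
# Stand-ins for the two files; in practice you would pass file paths instead.
csv1 <- "one,two\na,c\nb,d"
csv2 <- "three,four\ne,g\nf,g"

# With header = FALSE each file's header line is read as an ordinary data row,
# and both data frames get the same default column names (V1, V2).
raw1 <- read.csv(text = csv1, header = FALSE, stringsAsFactors = FALSE)
raw2 <- read.csv(text = csv2, header = FALSE, stringsAsFactors = FALSE)

final <- rbind(raw1, raw2)                        # stack both data frames
names(final) <- as.character(unlist(final[1, ]))  # first row becomes the header
final <- final[-1, ]                              # drop that row
rownames(final) <- NULL                           # renumber rows from 1
print(final)
```

The second file's header line stays in place as an ordinary row, which is exactly the reprinted header the question asks for.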

Related

Function to move one observation to a different column in entire data frame list?

I am trying to move observations around to clean my data set. I am having trouble making a function that will move the first value into a row by itself and apply that value all the way down the data set.
X1 X2
A  D
B  E
C  F
This is what I want my data frames from the list to look like:
left center right
B    D      A
C    E      A
I just figured out a way; you need to specify the rows and columns when subsetting:
x1 <- c("a","b","c")
x2 <- c("d","e","f")
df <- data.frame(x1,x2)
To get your desired output, the following code works for me:
df_2 <- data.frame(left = df[2:3,1],center = df[1:2,2],right = df[1,1])
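Since the question asks for a function applied to a whole list of data frames, the subsetting above can be wrapped in a function and applied with lapply(). A minimal sketch: reshape_df, df_a, df_b and df_list are hypothetical names, and it assumes every data frame in the list has exactly three rows with the values in its first two columns.

```r
# Hypothetical helper: the same reshaping as above, for one data frame
# whose first two columns hold the values to move (assumes three rows).
reshape_df <- function(df) {
  data.frame(left   = df[2:3, 1],
             center = df[1:2, 2],
             right  = df[1, 1],   # single value, recycled down the whole column
             stringsAsFactors = FALSE)
}

df_a <- data.frame(x1 = c("a", "b", "c"), x2 = c("d", "e", "f"), stringsAsFactors = FALSE)
df_b <- data.frame(x1 = c("g", "h", "i"), x2 = c("j", "k", "l"), stringsAsFactors = FALSE)
df_list <- list(df_a, df_b)

# Apply the helper to every data frame in the list; the result is a new list
reshaped <- lapply(df_list, reshape_df)
print(reshaped)
```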

R: retrieve dataframe name from another dataframe

I have a dataframe datasetselect that tells me which dataframe to use for each case of an analysis (let's call this the relevant dataframe).
The case is assigned dynamically, and therefore which dataframe is relevant depends on that case.
Based on the case, I would like to assign the relevant dataframe to a pointer "relevantdf". I tried:
datasetselect <- data.frame(case=c("case1","case2"),dataset=c("df1","df2"))
df1 <- data.frame(var1=letters[1:3],var2=1:3)
df2 <- data.frame(var1=letters[4:10],var2=4:10)
currentcase <- "case1"
relevantdf <- get(datasetselect[datasetselect$case == currentcase,"dataset"]) # relevantdf should point to df1
I don't know whether the problem is with the get() function or with the subsetting.
You are almost there. The problem is that the dataset column of datasetselect is a factor (the default for data.frame() before R 4.0.0), so you just need to convert it to character.
You can add this line after the definition of datasetselect:
datasetselect$dataset <- as.character(datasetselect$dataset)
and you get the expected output:
> relevantdf
var1 var2
1 a 1
2 b 2
3 c 3
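An alternative that avoids get() altogether (a different technique from the answer above, not a fix to it) is to keep the candidate data frames in a named list and index it by name. A minimal sketch, assuming the dataset names in datasetselect match the list names; datasets and dataset_name are hypothetical names introduced here.

```r
df1 <- data.frame(var1 = letters[1:3], var2 = 1:3)
df2 <- data.frame(var1 = letters[4:10], var2 = 4:10)

# Keep the candidate data frames in a named list instead of separate variables
datasets <- list(df1 = df1, df2 = df2)

# stringsAsFactors = FALSE matters: indexing a list with a factor uses
# its integer codes, not its labels
datasetselect <- data.frame(case = c("case1", "case2"),
                            dataset = c("df1", "df2"),
                            stringsAsFactors = FALSE)

currentcase <- "case1"
dataset_name <- datasetselect$dataset[datasetselect$case == currentcase]
relevantdf <- datasets[[dataset_name]]  # look up by name; no get() needed
print(relevantdf)
```

Keeping related data frames in one list also makes it easy to loop over them with lapply().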

Count with conditional - dataframe

I would like to count how many times each observation appears, with the condition that one column is greater than another.
For example, how many times "A", "B" and "C" appear, counting only rows where column B is greater than column C.
set.seed(20170524)
A <- rep(c("A","B","C"),5)
B <- round(runif(15,0,20),0)
C <- round(runif(15,1,5),0) + B
D <- as.data.frame(cbind(A,B,C))
D <- D[order(B),]
Thank you!
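# First, cbind() turned the numbers into text and (before R 4.0.0) as.data.frame() turned that into factors; this is problematic.
# as.numeric() on a factor returns its level codes, so convert via character first:
D$B <- as.numeric(as.character(D$B))
D$C <- as.numeric(as.character(D$C))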
#Then, get the counts for the A:
countA = sum(D$A=='A' & D$B < D$C)
Similarly for 'B' and 'C'
If there are many more categories than just "A", "B" and "C", you might want to use data.table for its by= option, but someone will probably be along to say that's overkill.
You can use: table(D$A[which(D$B>D$C)])
Note that when you do D <- as.data.frame(cbind(A,B,C)), cbind() first builds a character matrix and (before R 4.0.0) as.data.frame() turns its columns into factors. So either you convert B and C into numeric variables afterwards, or you create the data.frame directly without going through a matrix:
D <- data.frame(A,B,C)
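Putting the pieces together, here is a sketch that counts every category in one step, using the B < C condition from the first answer (swap < for > to match the condition as worded in the question). With this example data, C = B + round(runif(15, 1, 5), 0) is always at least B + 1, so B < C holds for every row. Wrapping the values in factor() with explicit levels is an addition here so that table() would also report categories with a zero count; counts and counts_tapply are hypothetical names.

```r
set.seed(20170524)
A <- rep(c("A", "B", "C"), 5)
B <- round(runif(15, 0, 20), 0)
C <- round(runif(15, 1, 5), 0) + B
D <- data.frame(A, B, C, stringsAsFactors = FALSE)  # B and C stay numeric

cond <- D$B < D$C  # logical vector: which rows satisfy the condition

# Explicit levels make table() list every category, even with zero matches
counts <- table(factor(D$A[cond], levels = c("A", "B", "C")))
print(counts)

# Same idea with tapply(): sum the TRUE values within each category
counts_tapply <- tapply(cond, D$A, sum)
print(counts_tapply)
```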

order function only partially reordering dataframe

I have created a data frame using rbind() to append two data frames with the same row names together. I am then trying to use the order() function to order the factor levels alphabetically. However, it is still treating the data frames as two separate objects, and ordering the first alphabetically, and then the second alphabetically separately.
Example:
df1 <- data.frame(site=c("A", "F", "C"))
df2 <- data.frame(site=c("B", "G", "D"))
new.df <- rbind(df1, df2)
new.df <- new.df[order(new.df$site),]
outcome:
site
A
C
F
B
D
G
I have looked at other methods of reordering data, for example using the arrange function from package dplyr, but have not had any success. Any suggestions of how to fix this?
Any help much appreciated.
Thanks
Avoid creating factors in the first place:
df1 <- data.frame(site=c("A", "F", "C"), stringsAsFactors = FALSE)
df2 <- data.frame(site=c("B", "G", "D"), stringsAsFactors = FALSE)
Then the rest of your code will work as expected.
I'm guessing you're not doing quite what you think you're doing there: the resulting new.df isn't a data frame any more, it's a factor, because subsetting a one-column data frame with [i, ] drops it to a vector. order() sorts a factor by the order of its levels, not alphabetically (see levels(new.df$site)), which is why you get A, C, F followed by B, D, G. So, if you really want to do it this way (i.e., keeping it as a factor rather than a character vector), you will need to reorder the levels first:
# starting from new.df <- rbind(df1, df2), i.e. before the order() line
new.df$site <- factor(new.df$site, levels = sort(levels(new.df$site)))
new.df[order(new.df$site), ]
[1] A B C D F G
Levels: A B C D F G
But unless you really need it to be a factor from the start, I think you would be best advised to do what @Uwe Block suggests and, if necessary, turn it into a factor after you've used rbind and done the sorting.
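Combining both answers, a sketch of that workflow: keep the column as character, sort with drop = FALSE so the one-column result stays a data frame, and only then convert to a factor. The drop = FALSE argument is an addition here, not something either answer used.

```r
df1 <- data.frame(site = c("A", "F", "C"), stringsAsFactors = FALSE)
df2 <- data.frame(site = c("B", "G", "D"), stringsAsFactors = FALSE)
new.df <- rbind(df1, df2)

# drop = FALSE keeps the one-column result as a data frame instead of a vector
new.df <- new.df[order(new.df$site), , drop = FALSE]
rownames(new.df) <- NULL  # renumber rows after sorting

# Convert to a factor only after combining and sorting; levels are sorted by default
new.df$site <- factor(new.df$site)
print(new.df)
```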

Select columns matching names in a list

I have a data.frame
DF1
a.x.c b.y.l c.z.n d.a.pl f.e.cl
which consists of numeric columns
I also have a list
DF2
a.x.c c.z.n f.e.cl
which contains some of the column names of DF1.
I need to create DF3 that would store only those columns of DF1 which have matching names in DF2.
I have tried using which() to find the indexes of the columns I need, but the problem is that I have a long list of column names, so which() becomes impractical.
Could you please help? Thank you in advance.
We can use intersect to get the names that are common in both the datasets and use that to subset the columns of 'DF1' to create 'DF3'.
DF3 <- DF1[intersect(names(DF1),names(DF2))]
DF3
# a.x.c c.z.n
#1 1 7
#2 2 8
#3 3 9
data
DF1 <- data.frame(a.x.c = 1:3, b.y.l= 4:6, c.z.n=7:9)
DF2 <- list(a.x.c= 1:5, c.z.n=8:15, z.l.y=22:29)
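An equivalent selection (an alternative to intersect(), not what the answer above used) is a logical index built with %in%. This sketch uses the same example data; adding drop = FALSE is a precaution so the result stays a data frame even if only one column matches.

```r
DF1 <- data.frame(a.x.c = 1:3, b.y.l = 4:6, c.z.n = 7:9)
DF2 <- list(a.x.c = 1:5, c.z.n = 8:15, z.l.y = 22:29)

# TRUE for each column of DF1 whose name also appears in DF2;
# columns keep DF1's order
DF3 <- DF1[, names(DF1) %in% names(DF2), drop = FALSE]
print(DF3)
```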
