How to combine several dataframes with different rows using R?


I have several text files, each containing 2 columns but with different numbers of rows. I would like to draw a plot using ggplot2 as explained in the linked example; however, that approach works well for data frames with equal numbers of rows, and I couldn't reproduce it with data frames that have different numbers of rows.
How should I combine these data frames (which have different numbers of rows) in R?
case size
case1 129
case2 129
case3 130
case4 131
case5 132
case6 132
Thank you

It seems from the comments that you're actually trying to merge multiple columns and then plot each column individually. The problem, however, is that each of these columns has a different number of rows, so you need to combine them on some common variable, such as the row names.
Using the examples from the link you provided:
df1 = data.frame(size=runif(300,300,1200))
#now adding an unequal column
df2 = data.frame(size=df1[c(1:275),])
Now merge the data frames by row number: "all=TRUE" keeps all rows (filling the gaps with NA), and "by=0" merges on the row names.
df.all=merge(df1$size,df2$size,by=0,all=TRUE)
#merge() sorts the row names as text ("1", "10", "100", ...), so reorder them numerically
df.all=df.all[order(as.numeric(df.all[,1])),]
#finally, if you want to replace the NA values with 0
df.all[is.na(df.all)]=0
Does that get you the data.frame you want?
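Since the end goal is a ggplot2 plot, one alternative to the merge above (not the answer's method) is to stack the data frames in long format with a column naming the source. Unequal row counts then don't matter, and no NA values need to be replaced with 0, which would otherwise add spurious zeros to a density plot. A minimal sketch, reusing the answer's simulated df1/df2; the labels "case1"/"case2" and the object name long are hypothetical:

```r
# Simulated data mirroring the answer: two samples of different lengths
set.seed(1)
df1 <- data.frame(size = runif(300, 300, 1200))
df2 <- data.frame(size = df1$size[1:275])

# Stack into one long data frame; the case column records which file each row came from
long <- rbind(
  data.frame(case = "case1", size = df1$size),
  data.frame(case = "case2", size = df2$size)
)
table(long$case)

# With ggplot2 installed, each case can then be drawn from the single data frame, e.g.:
# library(ggplot2)
# ggplot(long, aes(x = size, colour = case)) + geom_density()
```

With one data frame per text file, the same pattern extends to a list of data frames combined with do.call(rbind, ...).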

Related

Merge multiple dataframes with matching and different columns and put NA's

I have 5 dataframes with different subsets of variables. For example, the subset of 5 A-variables appears in dataframes 1 and 5, the subset of 7 B-variables appears in dataframes 1 and 4, and so on. A different group of persons took each of the 5 test versions (that's why I have 5 dataframes).
Now I want to merge the dataframes. The result should contain all variables from all dataframes. When a variable appears in two dataframes, its values should be combined into a single column. For every person who did not see a variable because it was in another test version, the value should be NA.
Do you guys have an idea?
Thank you very much in advance!

You'll probably need some combination of inner_join(), left_join(), right_join(), etc.
Check this out; it should have what you need. It's difficult to know exactly what you need without seeing the data.
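Because each person took only one test version, stacking the rows (rather than joining on a key) may be closer to what the question describes: shared variables line up in one column, and variables a test did not include become NA. dplyr's bind_rows() does this; the base-R sketch below does the same by padding missing columns first. The helper bind_rows_fill and the toy data (test1, test5, id, A1, B1, C1) are hypothetical:

```r
# Hypothetical helper: stack data frames whose columns only partly overlap,
# adding each missing column as NA before binding the rows
bind_rows_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    for (m in setdiff(all_cols, names(d))) d[[m]] <- NA
    d[all_cols]  # same column order in every piece
  })
  do.call(rbind, padded)
}

# Hypothetical example: two test versions sharing variable A1
test1 <- data.frame(id = 1:2, A1 = c(3, 4), B1 = c(1, 0))
test5 <- data.frame(id = 3:4, A1 = c(5, 2), C1 = c(7, 8))
combined <- bind_rows_fill(list(test1, test5))
combined
```

For all 5 data frames, pass them in one list, e.g. bind_rows_fill(list(df1, df2, df3, df4, df5)).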

how to divide the value in each cell of a .csv by the value in another cell across multiple rows and variables in R?

I have a .csv file of 39 variables and 713 rows, each containing a count of plastic items. I have another column which is the survey length, and I want to standardise each count of items to a survey length of 100. I am unsure how to create a loop to run through each row and cell individually to do this. Many cells also contain NA values.
Any ideas would be great.
Thank you.

Consider applying the formula directly to the columns, with no need for a loop:
# RETRIEVE ALL COLUMN NAMES (MINUS SURVEY LENGTH)
vars <- names(df)[!grepl("survey_length", names(df))]
# EXPAND SINGLE COLUMN TO EQUAL DIMENSION OF DATA FRAME
survey_length_mat <- matrix(df$survey_length, ncol=length(vars), nrow=nrow(df))
# APPLY FORMULA
df[vars] <- (df[vars] / survey_length_mat) * 100
df
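A small self-contained check of the same idea on hypothetical toy data (the columns bottles and bags are invented for illustration). It also relies on a shortcut: dividing a data frame by a vector of length nrow(df) recycles the vector down each column, so the helper matrix is optional. NA counts simply stay NA.

```r
# Hypothetical toy data: two count columns plus the survey_length column
df <- data.frame(bottles = c(10, NA, 4),
                 bags = c(5, 2, 0),
                 survey_length = c(50, 200, 100))

# Every column except survey_length
vars <- setdiff(names(df), "survey_length")

# The survey_length vector is recycled down each column; NA / x stays NA
df[vars] <- df[vars] / df$survey_length * 100
df
```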

How to create subsets of a data frame in R?

I have two data frames, one with 94 rows and 167 columns (df_1) and the other with 94 rows and 1 column (df_2). I would like to create 167 different data frames, each combining one column of the first data frame with the single column of the second. I have tried a for loop like the following:
for (i in seq_len(ncol(df_1))){
df_[[i]] <- data.frame(df_1[sort(rownames(df_1)),i,df_2[sort(rownames(df_2)),])
}
But it does not work, can someone help me?

To stack two data frames whose column names only partly match, you can use smartbind() from the gtools package:
library(gtools)
df3 <- smartbind(df_1, df_2) # df_1 has 167 columns and df_2 has 1 column
This gives you a single data frame; to then create separate data frames from it, see the answer in the thread below:
How to loop through the columns in an R data frame and create a new data frame using the column name in each iteration?
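As an alternative to binding and then splitting, the list of data frames can be built directly with lapply(), which also sidesteps the loop's need for a pre-created df_ list. A minimal sketch, assuming both data frames share the same row names; the small df_1 and df_2 below are hypothetical stand-ins for the 167-column and 1-column originals:

```r
# Hypothetical stand-ins: df_1 with 3 columns, df_2 with 1 column, same row names
df_1 <- data.frame(a = 1:3, b = 4:6, c = 7:9, row.names = c("r2", "r1", "r3"))
df_2 <- data.frame(y = c(10, 20, 30), row.names = c("r3", "r1", "r2"))

# Align both data frames on sorted row names, as the question's loop attempts
rows <- sort(rownames(df_1))
df_2_sorted <- df_2[rows, , drop = FALSE]

# One data frame per column of df_1, each paired with the column of df_2
df_list <- lapply(names(df_1), function(nm) {
  cbind(df_1[rows, nm, drop = FALSE], df_2_sorted)
})
names(df_list) <- names(df_1)
df_list$b
```

drop = FALSE keeps each single-column selection as a data frame, so the row names are preserved and cbind() returns a data frame.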

Making a New R DataFrame from 2 Existing Ones

I have 2 data frames in R. One has 187 observations and one has 195. I need to create a new data frame consisting of only the 8 observations that are not common to the two. Data frame 1 (with 195 observations) is called merged. Data frame 2 (with 187 observations) is called merged2013. There is a column called Country.Code in both data frames, and each observation has a unique code that distinguishes it from the others. How can I complete this task? Please name a function and explain it if possible!
Thank you!

Try using logical indexing. This returns the rows of merged whose Country.Code does not appear in merged2013:
merged[ !(merged$Country.Code %in% merged2013$Country.Code) , ]
Edited the names of the dataframes to match the question.
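The one-line answer only looks in one direction, which is enough when every code in merged2013 also appears in merged. If codes might be missing from either side, a symmetric version checks both directions and stacks the results. A sketch on hypothetical toy data (the codes and the value column are invented):

```r
# Hypothetical toy data; "EEE" appears only in merged2013
merged <- data.frame(Country.Code = c("AAA", "BBB", "CCC", "DDD"), value = 1:4)
merged2013 <- data.frame(Country.Code = c("AAA", "CCC", "EEE"), value = c(10L, 30L, 50L))

# Rows of merged whose code is absent from merged2013
only_in_merged <- merged[!(merged$Country.Code %in% merged2013$Country.Code), ]
# Rows of merged2013 whose code is absent from merged
only_in_2013 <- merged2013[!(merged2013$Country.Code %in% merged$Country.Code), ]

# Stack both directions to get every observation the two do not share
not_common <- rbind(only_in_merged, only_in_2013)
not_common
```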

Split a dataframe into any number of smaller dataframes with no more than N number of rows

I have a number of dataframes, each with different numbers of rows. I want to break them all into smaller dataframes that have no more than 50 rows each, for example.
So, if I had a dataframe with 107 rows, I want to output the following:
A dataframe containing rows 1-50
A dataframe containing rows 51-100
A dataframe containing rows 101-107
I have been reading many examples using the split() function, but I have not been able to find any usage of split(), or any other solution, that does not pre-define the number of dataframes to split into, scramble the order of the data, or introduce other problems.
This seems like such a simple task that I am surprised that I have not been able to find a solution.

Try:
split(df,(seq_len(nrow(df))-1) %/% 50)
What do the first 50 rows have in common? If you take the integer division (%/%) of each row index minus one by 50, they all give 0. As you can guess, rows 51-100 give 1, and so on. So (seq_len(nrow(df))-1) %/% 50 gives the group each row should be split into.
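The same one-liner wrapped in a small function so the chunk size is a parameter, checked against the 107-row case from the question. The name split_by_rows is hypothetical:

```r
# Hypothetical helper: split a data frame into chunks of at most n rows, keeping order
split_by_rows <- function(df, n) {
  group <- (seq_len(nrow(df)) - 1) %/% n  # 0 for rows 1..n, 1 for rows n+1..2n, ...
  split(df, group)
}

df <- data.frame(x = 1:107)
chunks <- split_by_rows(df, 50)
sapply(chunks, nrow)
```

split() turns the numeric group into a factor whose levels are sorted numerically, so the chunks come back in row order even when there are more than 10 of them, and the number of chunks never has to be fixed in advance.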