How to create subsets of a data.frame in R?

I have two data frames: one with 94 rows and 167 columns (df_1) and another with 94 rows and 1 column (df_2). I would like to create 167 different data frames, each combining one column of the first data frame with the single column of the second. I have tried a for loop like the following:
for (i in seq_len(ncol(df_1))){
df_[[i]] <- data.frame(df_1[sort(rownames(df_1)),i,df_2[sort(rownames(df_2)),])
}
But it does not work. Can someone help me?

To join two data frames that have similar column names, I think you can use the code below:
library(gtools)
df3 <- smartbind(df1, df2)  # df1 with 167 columns and df2 with 1 column
This gives you a single data frame. To create the separate data frames, use the approach from the answer in the thread below:
How to loop through the columns in an R data frame and create a new data frame using the column name in each iteration?
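Since `gtools::smartbind` stacks data frames by row, a column-wise pairing may be closer to what the question asks for. The following is only a minimal base R sketch under that assumption: the toy `df_1` and `df_2` are small stand-ins for the real 94-row data frames, and `df_list` is a hypothetical name for the result.

```r
# Toy stand-ins for the data frames in the question (sizes are assumptions)
df_1 <- data.frame(a = 1:4, b = 5:8, c = 9:12)
df_2 <- data.frame(y = c(10, 20, 30, 40))

# Sort both by row name so the rows line up, as the question attempts
df_1 <- df_1[sort(rownames(df_1)), , drop = FALSE]
df_2 <- df_2[sort(rownames(df_2)), , drop = FALSE]

# Build one data frame per column of df_1, each paired with df_2's column
df_list <- lapply(names(df_1), function(nm) {
  data.frame(df_1[nm], df_2)
})
names(df_list) <- names(df_1)

# Each element has two columns: one from df_1 and the column from df_2
df_list$b
```

Keeping the results in a named list rather than as 167 separate variables makes it possible to loop over them later with `lapply`.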

Related

How can I select all rows and columns from a matrix by matching with another data frame in R?

Imagine you have a matrix of 2000 rows and 2000 columns in R, where the row names and column names are identical. I also have a data frame with 380 rows and one column. How can I select the rows and columns of the big matrix that match those 380 values?
I hope you can help.
Best wishes,
Lukas
If I am understanding correctly, here is an example of how you could create a second column in the dataframe with the matching value from the matrix.
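A <- matrix(1:16, nrow = 4)
colnames(A) <- c("a", "b", "c", "d")
rownames(A) <- c("a", "b", "c", "d")
# one-column data frame of names to look up (kept as character, not factor)
b <- as.data.frame(matrix(c("a", "b", "c", "d")), stringsAsFactors = FALSE)
# note: nrow(b), not nrow(B); R is case-sensitive
for (i in seq_len(nrow(b))) {
  # look up the matrix cell whose row and column name both equal b[i, 1]
  b[i, 2] <- A[b[i, 1], b[i, 1]]
}
b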
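If the goal is instead to keep the whole sub-matrix of matching rows and columns, indexing by name may be enough. This is a minimal sketch, assuming the one-column data frame holds names that appear in the matrix's dimnames; `keys`, `wanted` and `sub` are hypothetical names used only for illustration.

```r
# Small matrix with identical row and column names
A <- matrix(1:16, nrow = 4,
            dimnames = list(c("a", "b", "c", "d"), c("a", "b", "c", "d")))

# One-column data frame of names to select; "z" is not present in A
keys <- data.frame(id = c("b", "d", "z"), stringsAsFactors = FALSE)

# Keep only the names that actually exist in the matrix
wanted <- intersect(keys$id, rownames(A))

# Select the matching rows and columns; drop = FALSE keeps a matrix
sub <- A[wanted, wanted, drop = FALSE]
sub
```

Using `intersect` first avoids a "subscript out of bounds" error when some of the values have no matching row or column.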

Update a data frame within a for loop

The point of this question is that I want to know how to update a data frame inside either a for loop or a function. I know there are other ways to do the specific task I am looking at, but I want to know how to do it the way I am trying to do it.
I have a data frame with 15 columns and 2k observations, some of which are 98 or 99. For each row where there is a 98 or 99 in any variable/column, I want to remove the whole row. I created a function that filters by variable not equal to 98/99 and applied it with lapply. However, instead of continually updating the data frame, it just spits out a series of data frames, each overwriting the previous one, meaning that at the end I only get a data frame with the last column cleaned. How do I get it to update the data frame for each column sequentially?
nafunction = function(variable){
kuwait5=kuwait5%>%
filter(variable<90)
}
lapply(kuwait5, nafunction)
The expected result is a new data frame with all rows that contain a 98 or 99 removed. What I get is a sequence of data frames, each having ONE column in which rows with NAs are removed.
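One likely reason for this behaviour is that an assignment inside a function only changes a local copy, so each `lapply` call starts again from the original `kuwait5`. A plain `for` loop that reassigns the data frame on every iteration keeps the updates. The following is a minimal base R sketch, with a toy `kuwait5` standing in for the real data and 98/99 assumed to be the only codes to drop.

```r
# Toy stand-in for the real data (values are assumptions)
kuwait5 <- data.frame(a = c(1, 98, 3, 4),
                      b = c(5, 6, 99, 8),
                      c = c(9, 10, 11, 12))

# Reassign kuwait5 on every pass so each column filters the already
# filtered result rather than the original data frame
for (col in names(kuwait5)) {
  keep <- !(kuwait5[[col]] %in% c(98, 99))
  kuwait5 <- kuwait5[keep, , drop = FALSE]
}

kuwait5  # rows containing 98 or 99 in any column are gone
```

The same idea could be written with `Reduce`, which threads the updated data frame from one column to the next in a similar way.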

Making a New R DataFrame from 2 Existing Ones

I have 2 data frames in R. One has 187 observations and one has 195. I need to create a new data frame consisting of only the 8 observations that are not common between the two. Data frame 1 (with 195 observations) is called merged. Data frame 2 (with 187 observations) is called merged2013. There is a column called Country.Code in both data frames, and each observation has a unique code that separates it from the others. How can I complete this task? Please list a function and explain it if possible!
Thank you!
Try using logical indexing. This returns the subset of rows of merged whose Country.Code does not appear in merged2013:
merged[ !(merged$Country.Code %in% merged2013$Country.Code) , ]
Edited the names of the dataframes to match the question.
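The indexing above only finds rows of `merged` that are missing from `merged2013`. If codes could also be missing in the other direction, the same test can be applied both ways and the results stacked. This is a minimal sketch with made-up toy data; `only_in_merged`, `only_in_2013` and `not_common` are hypothetical names.

```r
# Toy stand-ins for the two data frames (contents are assumptions)
merged <- data.frame(Country.Code = c("AAA", "BBB", "CCC", "DDD"),
                     value = c(1, 2, 3, 4),
                     stringsAsFactors = FALSE)
merged2013 <- data.frame(Country.Code = c("AAA", "CCC", "EEE"),
                         value = c(10, 30, 50),
                         stringsAsFactors = FALSE)

# Rows of merged whose code is absent from merged2013
only_in_merged <- merged[!(merged$Country.Code %in% merged2013$Country.Code), ]

# Rows of merged2013 whose code is absent from merged
only_in_2013 <- merged2013[!(merged2013$Country.Code %in% merged$Country.Code), ]

# Stack both sets: every observation not common to the two data frames
not_common <- rbind(only_in_merged, only_in_2013)
not_common
```

`%in%` returns TRUE for each code found in the other vector, so negating it with `!` keeps exactly the unmatched rows.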

finding the mean of columns in a data frame in R

I have a vector that contains 50 data frames of re-sampled data. So all of the column names are consistent in each data frame but the numeric values are different. Each data frame consists of 12 rows. How can I find the mean value of each row in one particular column between the 50 data frames and place the 12 mean values into a new one column data frame?
If you want the mean of a specific column from each data frame in your list, collected into a data frame of its own, you can use dplyr and purrr.
library(dplyr)
library(purrr)
map2_df(your_list, "column_name", ~summarize_at(.x, .y, mean))
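The call above yields one mean per data frame. If what is wanted is one mean per row position (12 values, each averaged over the 50 data frames), a base R sketch along these lines may be closer. It assumes the data frames are held in a list and that rows are in the same order in each; `your_list` and `column_name` are the placeholder names from the answer, and the simulated data is purely illustrative.

```r
# Simulated stand-in: a list of 50 data frames with 12 rows each
set.seed(1)
your_list <- replicate(
  50,
  data.frame(column_name = rnorm(12), other = runif(12)),
  simplify = FALSE
)

# Pull the chosen column from every data frame into a 12 x 50 matrix
vals <- sapply(your_list, function(d) d[["column_name"]])

# Average across the 50 data frames for each of the 12 row positions
row_means <- data.frame(mean_value = rowMeans(vals))
row_means
```

Each column of `vals` comes from one data frame, so `rowMeans` averages the same row position across all of them.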

How to combine several dataframes with different rows using R?

I have several text files, each containing 2 columns and a different number of rows. I would like to draw a plot using ggplot2 as explained in a linked post; however, that approach works well for data frames with equal numbers of rows, and I couldn't reproduce it with data frames that have different numbers of rows.
Please let me know how I should combine these data frames (with different numbers of rows) using R.
case size
case1 129
case2 129
case3 130
case4 131
case5 132
case6 132
Thank you
It seems from the comments that you're actually trying to merge multiple columns and then plot each column individually. The problem, however, is that each of these columns has a different number of rows. Therefore you need to combine them based on some common variable (i.e. row names).
Using the examples from the link you provided:
df1 = data.frame(size=runif(300,300,1200))
#now adding an unequal column
df2 = data.frame(size=df1[c(1:275),])
Now merge the data frames based on row number. "all=TRUE" keeps all the values, "by=0" merges by row.names.
df.all=merge(df1$size,df2$size,by=0,all=TRUE)
#and to order the row names.
df.all=df.all[order(as.numeric(df.all[,1])),]
#finally if you want to remove the NA values
df.all[is.na(df.all)]=0
Does that get you the data.frame you want?
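An alternative that avoids padding with NA or 0 is to stack the data frames in long format with a label column, which works regardless of how many rows each one has. This is a minimal base R sketch and not part of the answer above; the `case` label and the names `dfs` and `stacked` are assumptions for illustration.

```r
# Two data frames with different numbers of rows, as in the example above
df1 <- data.frame(size = runif(300, 300, 1200))
df2 <- data.frame(size = runif(275, 300, 1200))

# Put them in a named list; the names become the labels
dfs <- list(case1 = df1, case2 = df2)

# Add a 'case' column to each data frame and stack them row-wise
stacked <- do.call(rbind, lapply(names(dfs), function(nm) {
  data.frame(case = nm, size = dfs[[nm]]$size)
}))

# Number of rows contributed by each case
table(stacked$case)
```

A long data frame like `stacked` could then be passed to ggplot2 with `case` as the grouping variable, without the data frames needing equal lengths.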
