Making a New R DataFrame from 2 Existing Ones - r

I have 2 data frames in R. One has 187 observations and one has 195. I need to create a new data frame consisting of only the 8 observations that are not common to both. Data frame 1 (with 195 observations) is called merged. Data frame 2 (with 187 observations) is called merged2013. There is a column called Country.Code in both data frames, and each observation has a unique code that distinguishes it from the others. How can I complete this task? Please name a function and explain it if possible!
Thank you!

Try using logical indexing. This returns the rows of merged whose Country.Code does not appear in merged2013:
merged[ !(merged$Country.Code %in% merged2013$Country.Code) , ]
Edited the names of the dataframes to match the question.
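To see what the indexing does, here is a minimal sketch on toy data. The country codes and the value columns are made up for illustration; only the names merged, merged2013 and Country.Code come from the question. Since "not common" can mean rows missing from either side, the sketch checks both directions.

```r
# Toy stand-ins for the two data frames (codes and values are invented)
merged <- data.frame(Country.Code = c("AAA", "BBB", "CCC", "DDD"),
                     value = 1:4)
merged2013 <- data.frame(Country.Code = c("AAA", "CCC"),
                         value2013 = c(10, 30))

# %in% gives TRUE for each code in merged that also occurs in merged2013;
# negating it keeps the rows that occur only in merged
only_in_merged <- merged[!(merged$Country.Code %in% merged2013$Country.Code), ]

# The same check in the other direction, in case merged2013 has codes
# that merged lacks
only_in_2013 <- merged2013[!(merged2013$Country.Code %in% merged$Country.Code), ]
```

If the second result is empty, all of the unmatched observations sit in merged, which is what the row counts in the question suggest.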

Related

Enforce dataframe datatype from one to another / Copy schema

I have 2 dataframes with exactly the same columns,
except that the column data types differ.
I want to ensure that both dataframes share the same data types.
For example, let's assume both dataframes have only 1 column.
Dataframe_A
KPI
10
144
14
..
Dataframe_B
KPI
10
144
14
..
Both dataframes have the same column, KPI.
However, KPI in Dataframe_A is defined as a string, while KPI in Dataframe_B is defined as numeric.
I have hundreds of columns.
Is there a faster way than redefining the column types one by one?
I want to use one of the dataframes as the reference. For example, Dataframe_B should follow Dataframe_A's column types.
Thanks!
My current solution is not ideal, as I need to define all the columns one by one.
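The question does not say which language the dataframes live in. Assuming they are R data frames, one possible sketch is to loop over the columns and coerce each column of the target to the class of the matching reference column. The names df_a and df_b are hypothetical stand-ins for Dataframe_A and Dataframe_B, and the approach is only expected to work for basic types such as character, numeric, integer and logical.

```r
# Hypothetical stand-ins: KPI is a string in df_a and numeric in df_b
df_a <- data.frame(KPI = c("10", "144", "14"), stringsAsFactors = FALSE)
df_b <- data.frame(KPI = c(10, 144, 14))

# For every column of df_b, look up the same-named column in df_a and
# coerce to its (first) class; df_b[] keeps df_b a data frame
df_b[] <- Map(function(col, ref) methods::as(col, class(ref)[1]),
              df_b, df_a[names(df_b)])
```

Columns such as factors or dates usually need their own conversion functions, so they would have to be handled separately.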

How to create subsets of a data.frame in R?

I have two data frames, one with 94 rows and 167 columns (df_1) and the other with 94 rows and 1 column (df_2). I would like to create 167 different data frames, each pairing one column of the first data frame with the single column of the second data frame. I have tried a for loop like the following:
for (i in seq_len(ncol(df_1))){
df_[[i]] <- data.frame(df_1[sort(rownames(df_1)),i,df_2[sort(rownames(df_2)),])
}
But it does not work, can someone help me?
I think that to join two data frames that have similar column names, you can use the code below:
library(gtools)
df3 <- smartbind(df1, df2)  # df1 has 167 columns and df2 has 1 column
This will give you a single data frame. To create the various data frames from it, use the answer given in the thread below:
How to loop through the columns in an R data frame and create a new data frame using the column name in each iteration?
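As an alternative to binding first, here is a sketch that builds the 167 two-column data frames directly and keeps them in a list. It assumes the rows of df_1 and df_2 are already in the same order; the random data and the column name y are placeholders, not the asker's data.

```r
set.seed(1)
# Placeholder data with the shapes described in the question
df_1 <- as.data.frame(matrix(rnorm(94 * 167), nrow = 94))
df_2 <- data.frame(y = rnorm(94))

# One two-column data frame per column of df_1, paired with df_2's column
pairs <- lapply(names(df_1), function(nm) {
  out <- data.frame(df_1[[nm]], df_2[[1]])
  names(out) <- c(nm, names(df_2)[1])
  out
})
names(pairs) <- names(df_1)
```

Each piece can then be retrieved by name or position, for example pairs[["V1"]] or pairs[[1]].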

How to combine several dataframes with different rows using R?

I have several text files, each containing 2 columns and a different number of rows. I would like to draw a plot using ggplot2 as explained in a linked example; however, that works well for data frames with equal numbers of rows, and I couldn't reproduce it with data frames that have different numbers of rows.
Please let me know how I should combine these data frames (data frames with different numbers of rows) using R.
case size
case1 129
case2 129
case3 130
case4 131
case5 132
case6 132
Thank you
It seems from the comments that you're actually trying to merge multiple columns and then plot each column individually. The problem, however, is that each of these columns has a different number of rows. Therefore you need to combine them based on some common variable (i.e. row names).
Using the examples from the link you provided:
df1 = data.frame(size=runif(300,300,1200))
#now adding an unequal column
df2 = data.frame(size=df1[c(1:275),])
Now merge the data frames based on row number. "all=TRUE" keeps all the values, "by=0" merges by row.names.
df.all=merge(df1$size,df2$size,by=0,all=TRUE)
#then order the rows by row name
df.all=df.all[order(as.numeric(df.all[,1])),]
#finally, if you want to replace the NA values with 0
df.all[is.na(df.all)]=0
Does that get you the data.frame you want?
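A different technique from the merge above, sketched here as an option: stack the data sets in "long" format with a column saying which case each value came from. Unequal row counts are then not a problem and no NA padding is needed. The list sizes and its contents are hypothetical stand-ins for the text files.

```r
# Hypothetical stand-ins for two files with different numbers of rows
sizes <- list(case1 = runif(129, 300, 1200),
              case2 = runif(131, 300, 1200))

# One data frame per case, stacked on top of each other
long <- do.call(rbind, lapply(names(sizes), function(nm) {
  data.frame(case = nm, size = sizes[[nm]])
}))

# With ggplot2 installed, the cases could then be plotted together, e.g.:
# ggplot2::ggplot(long, ggplot2::aes(x = size, fill = case)) +
#   ggplot2::geom_density(alpha = 0.5)
```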

Create a stack of n subset data frames from a single data frame based on date column

I need to create a set of subset data frames out of a single big df, based on a date column (e.g. "Aug 2015", in month-year format). It should work like the subset function, except that the number of subset data frames should change dynamically depending on the values available in the date column.
All the subset data frames need to have the same structure, and within each subset the date column should hold a single value.
For example, if my big df currently has the last 10 months of data, I need 10 subset data frames now, and 11 if I run the same command next month (with 11 months of base data).
I have tried something like the code below, but on each iteration the subset subdf_i gets overwritten. Thus, I end up with only one subset df, which holds the last value of the month column.
I expected it to create 45 subset dfs, subdf_1, subdf_2, ... and subdf_45, one for each of the 45 unique values of the month column.
uniqmnth <- unique(df$mnth)
for (i in 1:length(uniqmnth)){
subdf_i <- subset(df, mnth == uniqmnth[i])
i==i+1
}
I hope there is some option in the subset function, or some kind of loop, that would do this. I am a beginner in R and not sure how to get there.
I think the best solution here is to use assign(), so that the loop index i is appended to the name of each of the 45 subsets. Thanks to my friend for the note. Here is the solution, which avoids the subset data frame being overwritten on each run of the loop. The for loop advances i by itself, so the i==i+1 line from my attempt is not needed.
uniqmnth <- unique(df$mnth)
for (i in seq_along(uniqmnth)) {
  assign(paste("subdf_", i, sep = ""), subset(df, mnth == uniqmnth[i]))
}
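An alternative sketch that avoids creating many separately named variables: split() returns a named list with one data frame per unique month, and the number of pieces follows the data automatically. The toy df below is made up; only the column name mnth comes from the question.

```r
# Toy data with three distinct months (values are invented)
df <- data.frame(mnth = c("Aug 2015", "Aug 2015", "Sep 2015", "Oct 2015"),
                 value = 1:4)

# One data frame per unique value of mnth, named after that value
subdfs <- split(df, df$mnth)

# Pull out a single month by name
aug <- subdfs[["Aug 2015"]]
```

Keeping the pieces in a list also makes it easy to apply the same operation to every month with lapply().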

How to create a data.frame with 3 factors?

I hope you won't find my question too silly. I did a lot of research, but I can't figure out how to solve this really annoying issue.
I have data for 6 participants (P) in an experiment, with 50 trials (T) per participant and 10 conditions (C). I'd like to create a data frame in R to hold these data.
This data.frame should have 3 factors (P, T and C) and therefore a total of P*T*C rows. The difficulty for me is creating it, since I have the data for the 6 participants in 6 data frames of 100 obs (T) by 10 variables (C).
I'd first like to create the empty data set with these factors, and then copy in the values from the 6 data sets according to the factors P, T and C.
Any help would be greatly appreciated. I'm a novice in R.
Thank you.
OK; First we create one big dataframe for all participants:
result<-rbind(dfrforparticipant1, dfrforparticipant2,...dfrforparticipant6) #you'll have to fill out the proper names of the original data.frames
Next, we add a column for the participant ID:
numTrials<-50 #although 100 is also mentioned in your question
result$P<-as.factor(rep(1:6, each=numTrials))
Finally, we need to go from 'wide' format to 'long' format (I'm assuming your column names holding the results for each condition are called C1, C2 etc. ; I'm also assuming your original data.frames already held a column named T to denote the trial), like this (untested, since you did not provide example data):
orgcolnames<-paste("C", 1:10, sep="")
result2<-reshape(result, varying=list(orgcolnames), v.names="val", idvar=c("T","P"), timevar="C", times=seq_along(orgcolnames), direction="long")
What you want is now in result2.
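Since the answer above is untested, here is a small runnable sketch of the same steps on made-up data: 2 participants, 3 trials and 2 conditions instead of 6, 50 and 10. The helper make_participant is hypothetical and only generates toy input; the columns C1, C2 and T follow the assumptions stated in the answer.

```r
# Hypothetical helper: one wide data frame per participant,
# with condition columns C1..Cn and a trial column T
make_participant <- function(n_trials, n_cond) {
  d <- as.data.frame(matrix(rnorm(n_trials * n_cond), nrow = n_trials))
  names(d) <- paste0("C", seq_len(n_cond))
  d$T <- seq_len(n_trials)
  d
}

set.seed(42)
p1 <- make_participant(3, 2)
p2 <- make_participant(3, 2)

# Stack the participants and add the participant factor
result <- rbind(p1, p2)
result$P <- factor(rep(1:2, each = 3))

# Wide to long: one row per participant x trial x condition
cond_cols <- paste0("C", 1:2)
result2 <- reshape(result, varying = list(cond_cols), v.names = "val",
                   idvar = c("T", "P"), timevar = "C",
                   times = seq_along(cond_cols), direction = "long")
```

With the sizes from the question, the same call would give 6 * 50 * 10 = 3000 rows.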

Resources