Sorry if this has been answered before; I could not find a similar question in past posts here.
Suppose I have two simple n-by-m data frames, df1 and df2. Now I want to combine them to get an n-by-2m data frame called df. In doing so, I want column 1 in df2 to be column 2 in df, column 2 in df2 to be column 4 in df, column 3 in df2 to be column 6 in df, and so on. Meanwhile, column 1 in df1 is column 1 in df, column 2 in df1 is column 3 in df, and so on.
It means that in the new df, columns 1, 3, 5, 7... come from df1 and columns 2, 4, 6, 8... come from df2.
In general, I want to INSERT df2 into df1 at every other column, putting each column of df2 right after its corresponding column in df1.
Can anybody help me with this?
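Why does it need to be that way? They are just variables, after all. But if you want to do it:
m <- ncol(df1)
df <- cbind(df1, df2)  # merge() would join on shared column names; cbind() puts the frames side by side
df <- df[, c(rbind(seq_len(m), m + seq_len(m)))]  # column order 1, m+1, 2, m+2, ...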
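To check the interleaving idea on something small, here is a minimal sketch with two made-up 3-by-2 data frames. The column names a1, a2, b1, b2 are hypothetical and not from the question, and the sketch assumes both frames have the same number of rows and columns.

```r
# Toy data: column names are invented for illustration only
df1 <- data.frame(a1 = 1:3, a2 = 4:6)
df2 <- data.frame(b1 = 7:9, b2 = 10:12)

m <- ncol(df1)
# rbind() stacks the two index vectors into a 2-row matrix; c() then reads
# that matrix column by column, which yields 1, m+1, 2, m+2, ...
idx <- c(rbind(seq_len(m), m + seq_len(m)))
df <- cbind(df1, df2)[, idx]
names(df)
```

The columns of df should come out alternating between df1 and df2, with each df2 column directly after its df1 counterpart.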
I have the following problem: I want to create a column in my dataset which contains values from an already existing column, based on a third column.
Example:
a <- sample(1:3, 20, T)
b <- sample(1:100, 20, T)
df <- data.frame(a,b)
I now want to create a column c in the data frame that contains the value from b for every row where a == 1, and 0 otherwise. I know this should be easy to do, but I somehow only got it to work with a complicated solution that creates a new data frame and then binds it to the original one.
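One possible way to do this without building a second data frame is a vectorised ifelse(). This is a minimal sketch rather than the only approach; the seed is an arbitrary addition so the example is reproducible.

```r
set.seed(1)  # arbitrary seed, added only for reproducibility
a <- sample(1:3, 20, TRUE)
b <- sample(1:100, 20, TRUE)
df <- data.frame(a, b)

# Take b where a == 1, and 0 in every other row
df$c <- ifelse(df$a == 1, df$b, 0)
head(df)
```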
I have a dataframe with 1 column and 162424 rows.
I want to create a new df from this:
The first four values in the original dataframe should be spread across four new columns in the first row of the new df.
The next four values in the original dataframe should be spread across the four columns in the second row of the new df.
And so on.
Original df size - 1 column, 162424 rows
New df size - 4 columns, 40606 rows
Do you want something like this?
m = data.frame(x = sample(1:12))
as.data.frame(matrix(m$x, ncol = 4, byrow = TRUE))  # 4 columns, filled row by row
I have a 9801 by 3 reference table.
The first 2 columns of this table are defined as follows.
x1 = x2 = seq(0.01,0.99,0.01)
x12 = data.matrix(expand.grid(x1,x2))
The 3rd column contains the outcome values.
Now I have another n by 3 matrix where the 1st and 2nd columns are selected rows of the above matrix 'x12' and the 3rd column is to be filled. I would like to fill in the 3rd column of the 2nd table by looking up the same combination of the 1st and 2nd columns in the 1st table and taking the value from its 3rd column.
How can I do this?
You can do this with the merge function:
# Original data frame
x1 = x2 = seq(0.01,0.99,0.01)
x12 = expand.grid(x1,x2)
# Add a fake "outcome"
x12$outcome = rnorm(nrow(x12))
# New data frame with 100 random rows and the first two columns of x12
x12new = x12[sample(1:nrow(x12), 100), c(1,2)]
# Merge the outcome values from x12 into x12new
x12new = merge(x12new, x12, by=c("Var1","Var2"), all.x=TRUE)
by tells merge which columns must match when comparing the two data frames. all.x=TRUE tells merge to keep all rows from the first data frame, x12new in this case, even if they don't have a match in the second data frame (not an issue here, but you'll often want to make sure you don't lose any rows when merging).
One other thing to note is that, unlike vlookup in Excel, merge will increase the number of rows in the new, merged data frame if there are multiple rows that match the criteria. For example, see what happens when you merge values from df2 into df1:
df1 = data.frame(x = c(1,2,3,4), z=c(10,20,30,40))
df2 = data.frame(x = c(1,1,1,2,3), y=c("a","b","c","a","c"))
merge(df1, df2, by="x", all.x=TRUE)
  x  z    y
1 1 10    a
2 1 10    b
3 1 10    c
4 2 20    a
5 3 30    c
6 4 40 <NA>
You can also use left_join from the dplyr package (other types of joins are available as well):
library(dplyr)
left_join(df1, df2, by="x")
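If what you actually want is the vlookup-like behaviour (one output row per input row, taking only the first match), one option is base R's match(). A minimal sketch with the same df1 and df2 as above:

```r
df1 <- data.frame(x = c(1, 2, 3, 4), z = c(10, 20, 30, 40))
df2 <- data.frame(x = c(1, 1, 1, 2, 3), y = c("a", "b", "c", "a", "c"))

# match() returns the position of the FIRST match in df2$x (NA if none),
# so the result always has exactly nrow(df1) rows
df1$y <- df2$y[match(df1$x, df2$x)]
df1
```

Here x = 1 picks up only the first of its three matches in df2, and x = 4 gets NA because it has no match.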
I have a dataframe with a column for the name of individuals and columns for results.
Now I want to attach a new column with either 1, 2 or NA depending on the individual.
I have a vector with all the individuals which are level 1 and one for the individuals from level 2.
How can I attach a column to this data frame that goes something like this:
if dataframe$individual is in (1,3,6,7), the value in the column is 1; if dataframe$individual is in (2,5,8), the value is 2; otherwise the value is NA.
I hope the example makes clear what I am looking for.
Thanks for the help
Try
dat$newCol <- with(dat, ifelse(individual %in% c(1,3,6,7), 1,
ifelse(individual %in% c(2,5,8), 2, NA)))
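To see what the nested ifelse() produces, here is a small sketch on a made-up dat. The individual values 1 to 9 and the vector names level1 and level2 are assumptions for illustration.

```r
# Made-up data: individuals numbered 1..9 (an assumption)
dat <- data.frame(individual = 1:9)
level1 <- c(1, 3, 6, 7)  # hypothetical name for the level-1 vector
level2 <- c(2, 5, 8)     # hypothetical name for the level-2 vector

dat$newCol <- with(dat, ifelse(individual %in% level1, 1,
                        ifelse(individual %in% level2, 2, NA)))
dat$newCol
```

Individuals 4 and 9 are in neither vector, so they end up as NA.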
Looking at a Data Frame like so:
set.seed(3)
Data1<-rnorm(20, mean=20)
Dir_1<-rnorm(20,mean=2)
Data2<-rnorm(20, mean=21)
Dir_2<-rnorm(20,mean=2)
Data3<-rnorm(20, mean=22)
Dir_3<-rnorm(20,mean=2)
Data4<-rnorm(20, mean=19)
Dir_4<-rnorm(20,mean=2)
Data5<-rnorm(20, mean=20)
Dir_5<-rnorm(20,mean=2)
Data6<-rnorm(20, mean=23)
Dir_6<-rnorm(20,mean=2)
Data7<-rnorm(20, mean=21)
Dir_7<-rnorm(20,mean=2)
Data8<-rnorm(20, mean=25)
Dir_8<-rnorm(20,mean=2)
Index<-rnorm(20,mean=5)
DF<-data.frame(Data1,Dir_1,Data2,Dir_2,Data3,Dir_3,Data4,Dir_4,Data5,Dir_5,Data6,Dir_6,Data7,Dir_7,Data8,Dir_8,Index)
I end up with a data frame with two columns of data per observation (based on observation 1-8) and an index. Based on this index I would like to remove (or make NA) certain data observations.
As an example:
If the index is greater than 5, drop observation 8 (both Data and Dir) in that row
If the index is greater than 4, drop observations 7 and 8 in that row
If the index is greater than 3 and less than 3.5, drop 6,7,8 in that row
I was hoping to come up with a series of "if" statements that would let me drop columns for each row based on an index value.
Assuming what you want is not to "drop columns for a row" but to put NAs into the proper columns for the specific rows, you need to use a few index vectors rather than a series of if statements:
DF[DF$Index>3 & DF$Index<3.5, (6*2-1):(8*2)] <- NA
DF[DF$Index>4, (7*2-1):(8*2)] <- NA
DF[DF$Index>5, (8*2-1):(8*2)] <- NA
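The same pattern can be tried on a tiny frame where the result is easy to inspect. This is only a sketch: obs_cols is a hypothetical helper and toy is made-up data, both assuming the columns alternate Data1, Dir_1, Data2, Dir_2, ... as in DF.

```r
# Hypothetical helper: column positions covering observations from..to,
# assuming two adjacent columns (Data, Dir) per observation
obs_cols <- function(from, to) (2 * from - 1):(2 * to)

# Toy frame with the same layout as DF: 16 data columns plus Index
toy <- as.data.frame(matrix(1, nrow = 4, ncol = 16))
toy$Index <- c(2.0, 3.2, 4.5, 5.5)

toy[toy$Index > 3 & toy$Index < 3.5, obs_cols(6, 8)] <- NA
toy[toy$Index > 4, obs_cols(7, 8)] <- NA
toy[toy$Index > 5, obs_cols(8, 8)] <- NA

rowSums(is.na(toy))  # number of blanked cells in each row
```

Note that the rules overlap: any row with Index > 5 also satisfies Index > 4, so it already has observations 7 and 8 blanked by the second rule.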