How to insert a column inside a dataframe based on vector? - r

For example I have a dataframe data1 with these columns:
A B C D G T Q Y U J N
And I have another dataframe data2 with rows as follows:
A B C M
D G K T
Q F Y U
J W E N
Based on the above dataframe, I should have a column M after column C and before column D. I should also have a column K between columns G and T, and so on.
Therefore I want to use data2 to fill up the missing columns in data1. If I do that successfully, data1 should be :
A B C M D G K T Q F Y U J W E N
My code so far:
for(row in 1:nrow(data2))
{
  for(column in 1:ncol(data2)){
    element = data2[row,column]
    # inner loop uses its own index so it does not overwrite `column`
    for(j in 1:ncol(data1))
    {
      if(element!=colnames(data1)[j])
      {
      }
    }
  }
}
I'm not sure where to go from here, and I don't think this code is efficient to begin with. Any help is appreciated.

We can transpose the second dataset, convert it to a vector, and use that vector as the column order after adding the columns that are missing from the original data as NA
nm1 <- c(t(data2))
# names in the target layout that data1 does not have yet
nm2 <- setdiff(nm1, names(data1))
data1[nm2] <- NA
data1 <- data1[nm1]
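A minimal worked sketch of this approach, assuming made-up contents for data1 (the question only gives column names) and assuming data2 stores the names as character columns V1 to V4. The names target and missing_cols are illustrative only.

```r
# Hypothetical data: data1 has the existing columns, data2 the desired layout
cols <- c("A", "B", "C", "D", "G", "T", "Q", "Y", "U", "J", "N")
data1 <- as.data.frame(matrix(seq_along(cols), nrow = 1,
                              dimnames = list(NULL, cols)))
data2 <- data.frame(V1 = c("A", "D", "Q", "J"),
                    V2 = c("B", "G", "F", "W"),
                    V3 = c("C", "K", "Y", "E"),
                    V4 = c("M", "T", "U", "N"),
                    stringsAsFactors = FALSE)

target       <- c(t(data2))                    # names read row by row
missing_cols <- setdiff(target, names(data1))  # names data1 lacks
data1[missing_cols] <- NA                      # add them, filled with NA
data1 <- data1[target]                         # reorder to the target layout
names(data1)
```

The new columns (M, K, F, W, E) hold NA, and the original columns keep their values.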

Related

Add elements from vector to every nth column of dataframe in R

I have the following vector:
samples <- c("bl", "ra", "ye", "gp", "dk")
which I would like to add to the dataframe
df <- data.frame(Country = "FR", Name = "Jean", A="",B="",C="",D="",E="",F="",G="",H="",I="",J="",K="",L="",M="",N="",O="",P="",Q="",R="",S="",T="",U="",V="",W="ok",X="ok",Y="ok",Z="ok",A1="ok",B1="ok")
and give the output
Country Name A B C D E F G H I J K L M N O P Q R S T ....
1 FR Jean bl ra ye gp dk
The aim:
Place elements within the vector into the dataframe that already contains some values.
The first element has to be in column 3
Subsequent elements have to be in every 4th column from the first element, i.e. columns 7, 11, 15, 19, ... (column 4i-1 for the i-th element)
A for loop that automatically adds the elements every 4th column from the first element. Depending on the situation, I may have a much longer vector than the one specified here, and it would be tedious to assign each element to its column individually.
There is no need for a loop. You can directly assign samples to the corresponding columns.
df[, seq_along(samples) * 4 - 1] <- samples
df
# Country Name A B C D E F G H I J K L M N O P Q R S T U V W X Y Z A1 B1
# 1 FR Jean bl ra ye gp dk ok ok ok ok ok ok
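For a longer vector, the same idea can be wrapped in a small function. This is only a sketch: place_every, first and step are hypothetical names, and the 20-column data frame is made up for the example. It also checks that the data frame has enough columns.

```r
# Hypothetical helper: write values into columns first, first + step, ...
place_every <- function(df, values, first = 3, step = 4) {
  pos <- seq(first, by = step, length.out = length(values))
  stopifnot(max(pos) <= ncol(df))  # fail early if df has too few columns
  for (i in seq_along(pos)) df[[pos[i]]] <- values[i]
  df
}

# Made-up one-row data frame with 20 empty character columns
df <- as.data.frame(matrix("", nrow = 1, ncol = 20), stringsAsFactors = FALSE)
out <- place_every(df, c("bl", "ra", "ye", "gp", "dk"))
unlist(out[1, c(3, 7, 11, 15, 19)], use.names = FALSE)
```

Because each value is assigned to one column at a time, this also works when the data frame has more than one row (the value is repeated down the column).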
First define a target_pos vector containing the positions of your target columns, then iterate over it to write the samples into your df.
Note that there are some differences between your input df and your desired output. For example, you don't have column K in your df, but it appears in your desired output.
# one position per sample: 3, 7, 11, ...
target_pos <- seq(3, by = 4, length.out = length(samples))
for (i in seq_along(target_pos)) df[, target_pos[i]] <- samples[i]
df
#> Country Name A B C D E F G H I J L M N O P Q R S T U V W X Y Z A1 B1
#> 1 FR Jean bl ra ye gp dk ok ok ok ok ok ok
Data
df<-data.frame(Country = "FR", Name = "Jean", A="",B="",C="",D="",E="",F="",G="",H="",I="",J="",L="",M="",N="",O="",P="",Q="",R="",S="",T="",U="",V="",W="ok",X="ok",Y="ok",Z="ok",A1="ok",B1="ok")
samples=c("bl","ra","ye","gp","dk")

R: find specific element in list of data frames and assign it to colname and switch element to the right

Let's have a list of data frames.
df1 <- data.frame(V1=c("a", "b", "c"),V2=c("d", "e","f"), V3=c("g","h","i"),V4=c("j","k","l"))
df2 <- data.frame(V1=c("m","n"), V2=c("o","p"), V3=c("q","r"))
l <-list(df1, df2)
> l
[[1]]
V1 V2 V3 V4
1 a d g j
2 b e h k
3 c f i l
[[2]]
V1 V2 V3
1 m o q
2 n p r
In this list, one data frame has been transposed, so that the column names appear as elements of the data frame. Data frame [[1]] consists of data frames that were transposed and merged (see the picture).
For instance, columns V1 and V3 contain column names, while V2 and V4 contain the values.
I would like to run some code that matches the elements of the data frames against ele (a vector containing the names of the original columns):
ele <- c("a","b","c","g","h","i")
When an element matches, it should become a column name, and the element to its right should become the value of that column, so that we get a new data frame, for instance:
dfa<-data.frame(a="d")
> dfa
a
1 d
Important: note that list[[2]] will not be matched. I would prefer a for loop or lapply approach that yields separate data frames such as dfa, dfb, dfc...
Are you looking for something like this?
df1[]=apply(df1,2,as.character)
setNames(as.data.frame(t(unlist(df1[,c(FALSE,TRUE)]))),
unlist(df1[,c(TRUE,FALSE)]))
a b c g h i
1 d e f j k l
We first convert factors to character, as factors don't play well with this kind of data manipulation. We then use c(FALSE,TRUE) to select the even columns, which hold the values, and use the odd columns (c(TRUE,FALSE)) as their names.
edit
To carry out the process only for data frames that contain one of the reference names:
ele <- c("a","b","c","g","h","i")
l = lapply(l, function(x){
x[]=apply(x,2,as.character)
if (any(unlist(x)%in%ele)){
setNames(as.data.frame(t(unlist(x[,c(FALSE,TRUE)]))),
unlist(x[,c(TRUE,FALSE)]))
} else {NULL}
})
l
[[1]]
a b c g h i
1 d e f j k l
[[2]]
NULL
If you want to drop the NULL elements, use l[lengths(l) != 0]
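If separate data frames per matched name are wanted (dfa, dfb, ... in the question), one option is a named list rather than individual variables. A minimal sketch, assuming the odd columns hold names and the even columns hold values as above; keys, values and pieces are illustrative names.

```r
df1 <- data.frame(V1 = c("a", "b", "c"), V2 = c("d", "e", "f"),
                  V3 = c("g", "h", "i"), V4 = c("j", "k", "l"),
                  stringsAsFactors = FALSE)
ele <- c("a", "b", "c", "g", "h", "i")

keys   <- unlist(df1[, c(TRUE, FALSE)], use.names = FALSE)  # odd columns: names
values <- unlist(df1[, c(FALSE, TRUE)], use.names = FALSE)  # even columns: values
keep   <- keys %in% ele                                     # only matched names

# One single-cell data frame per matched name, collected in a named list
pieces <- Map(function(k, v) setNames(data.frame(v, stringsAsFactors = FALSE), k),
              keys[keep], values[keep])
pieces$a
```

Here pieces$a plays the role of dfa. A list keeps the results together and avoids creating variables with assign().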

intersection with tolerance of non-equal vectors and ID

I have a question about matching values between two vectors.
Let's say I have a vector and a data frame:
data.frame
value name vector 2
154.0031 A 154.0084
154.0768 B 159.0344
154.2145 C 154.0755
154.4954 D 156.7758
156.7731 E
156.8399 F
159.0299 G
159.6555 H
159.9384 I
Now I want to compare vector 2 with the values in the data frame using an adjustable global tolerance (e.g. +-0.005) and add the corresponding names to vector 2, so that I get a result like this:
data.frame
value name vector 2 name
154.0031 A 154.0074 A
154.0768 B 159.0334 G
154.2145 C 154.0755 B
154.4954 D 156.7758 E
156.7731 E
156.8399 F
159.0299 G
159.6555 H
159.9384 I
I tried to use intersect(), but it has no option for a tolerance.
Many thanks!
This outcome can be achieved with outer, which, and subsetting.
# calculate distances between elements of each object
# rows are df and columns are vec 2
myDists <- outer(df$value, vec2, FUN=function(x, y) abs(x - y))
# find the pairs whose distance is below the tolerance (0.05 here)
# using arr.ind=TRUE returns a matrix with the row and column positions
matches <- which(myDists < 0.05, arr.ind=TRUE)
data.frame(name = df$name[matches[, 1]], value=vec2[matches[, 2]])
name value
1 A 154.0084
2 G 159.0344
3 B 154.0755
4 E 156.7758
Note that this returns only the elements of vec2 that have a match, and it returns every element of df within the threshold, so one element of vec2 can appear more than once.
To make the results robust to this, use
# for each matched element of vec2, keep the lowest-numbered matching row of df
closest <- tapply(matches[,1], list(matches[,2]), min)
# fill in the names.
# NA will appear where there are no obs that meet the threshold.
data.frame(name = df$name[closest][match(seq_along(vec2),
as.integer(names(closest)))], value=vec2)
With the current data this returns the same result as above, but it returns NA where no observation in df is close enough.
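If the nearest value is wanted rather than the first one inside the threshold, a variation is to take which.min per column of the distance matrix and then apply the tolerance. This is a sketch, not the answer's method: tol, nearest and within are illustrative names, and the extra value 150 is made up to show the no-match case.

```r
df <- data.frame(value = c(154.0031, 154.0768, 154.2145, 154.4954, 156.7731,
                           156.8399, 159.0299, 159.6555, 159.9384),
                 name = LETTERS[1:9], stringsAsFactors = FALSE)
vec2 <- c(154.0084, 159.0344, 154.0755, 156.7758, 150)  # 150: hypothetical non-match
tol  <- 0.05                                            # adjustable tolerance

dists   <- abs(outer(df$value, vec2, "-"))  # rows: df, columns: vec2
nearest <- apply(dists, 2, which.min)       # row of the closest df value
within  <- dists[cbind(nearest, seq_along(vec2))] <= tol

result <- data.frame(value = vec2,
                     name = ifelse(within, df$name[nearest], NA))
result
```

Each element of vec2 appears exactly once in result, with NA as its name when even the nearest value is further away than tol.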
data
Please provide reproducible data if you ask a question in the future. See below.
df <- read.table(header=TRUE, text="value name
154.0031 A
154.0768 B
154.2145 C
154.4954 D
156.7731 E
156.8399 F
159.0299 G
159.6555 H
159.9384 I")
vec2 <- c(154.0084, 159.0344, 154.0755, 156.7758)

convert only some factors into a different factor

I'm trying to build a factor column that relates to two other factor columns with completely different factor levels. Here's example data.
set.seed(1234)
a<-sample(LETTERS[1:10],50,replace=TRUE)
b<-sample(letters[11:20],50,replace=TRUE)
df<-data.frame(a,b)
df$a<-as.factor(df$a)
df$b<-as.factor(df$b)
The rule I want to implement creates a new column, c, whose factor level depends on the value of column a:
if a row in column a equals "F", that row in column c should take the entry from column b; otherwise it keeps the entry from column a. The code I'm trying:
dfn<-dim(df)[1]
for (i in 1:dfn){
df$c[i]<-ifelse(df$a[i]=="F",df$b[i],df$a[i])
}
df
This only returns the integer index of the factor level for column b, not the actual entry. What have I done wrong?
I think you'll need to do a little finagling of character values. This seems to do it.
w <- df$a == "F"
df$c <- factor(replace(as.character(df$a), w, as.character(df$b)[w]))
Here is a quick look at the new column,
factor(replace(as.character(df$a), w, as.character(df$b)[w]))
# [1] B G G G I G A C G s G k C J C I C C B C D D B A C I n J I A
# [31] E C D p B H C C J I l G D G D p G E C H
# Levels: A B C D E G H I J k l n p s
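The behaviour in the question can be reproduced on a small example: ifelse() drops factor attributes, so it returns the underlying integer codes, while converting to character first keeps the labels. The vectors below are made up for illustration.

```r
a <- factor(c("A", "F", "B", "F"))
b <- factor(c("k", "l", "m", "n"))

# ifelse() strips the factor attributes and returns integer codes
codes <- ifelse(a == "F", b, a)

# Converting to character first keeps the labels
labels  <- ifelse(a == "F", as.character(b), as.character(a))
new_col <- factor(labels)  # rebuild a factor from the combined labels
new_col
```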
As in my previous comment, a solution with dplyr:
library(dplyr)
df %>% mutate(c = ifelse(a == "F", as.character(b), as.character(a)))
If you plan on doing anything involving combinations of the columns as factors, for example, comparisons, you should refactor to the same set of levels.
u<-union(levels(df$a),levels(df$b))
df$a<-factor(df$a,u)
df$b<-factor(df$b,u)
df$c<-df$a
ind<-df$a=="F"
df$c[ind]<-df$b[ind]
By taking this precaution, you can sensibly do
> sum(df$c==df$b)
[1] 6
> sum(df$a=="F")
[1] 6
otherwise the first line will fail.

Sorting rows alphabetically

My data looks like,
A B C D
B C A D
X Y M Z
O M L P
How can I sort the rows to get something like
A B C D
A B C D
M X Y Z
L M O P
Thanks,
t(apply(DF, 1, sort))
The t() function is necessary because apply() over rows returns each row's result as a column of the output matrix, so the result has to be transposed back.
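A small worked sketch with the data from the question, assuming it is stored in a data frame DF with character columns V1 to V4:

```r
DF <- data.frame(V1 = c("A", "B", "X", "O"), V2 = c("B", "C", "Y", "M"),
                 V3 = c("C", "A", "M", "L"), V4 = c("D", "D", "Z", "P"),
                 stringsAsFactors = FALSE)

# apply() returns one column per input row, so transpose back
sorted <- t(apply(DF, 1, sort))
sorted
```

The result is a character matrix; wrap it in as.data.frame() if a data frame is needed.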
What did you try? This is really straight-forward and easy to solve with a simple loop.
> s <- x
> for(i in 1:NROW(x)) {
+ s[i,] <- sort(s[i,])
+ }
> s
V1 V2 V3 V4
1 A B C D
2 A B C D
3 M X Y Z
4 L M O P
No plyr answer yet?!
foo <- matrix(sample(LETTERS,10^2,TRUE),10,10)
library("plyr")
aaply(foo,1,sort)
Exactly the same as DWin's answer, except that you don't need t()
Another fast base R option, from Martin Morgan's answer to "Fastest way to select i-th highest value from row and assign to new column", is
# the ordered vector is laid out row by row, so fill the matrix by row
matrix(a[order(row(a), a, method="radix")], ncol=ncol(a), byrow=TRUE)
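A minimal sketch of the ordering approach on a made-up 2 x 3 character matrix. Ordering by row(a) first and by value second yields the elements row by row, which is why the matrix is filled with byrow = TRUE here.

```r
a <- rbind(c("B", "C", "A"),
           c("Z", "X", "Y"))

# Sort by row index first, then by value within each row
ord <- order(row(a), a, method = "radix")

# a[ord] lists row 1 sorted, then row 2 sorted, so fill by row
sorted <- matrix(a[ord], ncol = ncol(a), byrow = TRUE)
sorted
```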
Timings can be found here
