I have a vector of strings naming the variables that should be in my final df. Those names can change every time, depending on other conditions.
x <- colnames(df)
y <- c("blue", "yellow", "red")
z <- setdiff(y,x)
Let's say the result is z = c("blue", "red").
I would like a function that, for every element of y that is missing from the columns of df (i.e. every element of z), creates a column in df with that element as its name.
Here's my inconclusive attempt:
if (length(z) > 0) {
for (i in z) {
df$i <- NA
}
}
The part I don't know how to do is use the value of i as the name of the new variable in df.
In my example, with z = c("blue", "red"), I should end up with df$blue and df$red as new variables of df.
I checked many posts, but either I don't understand how they work or they don't do what I need. Some for reference:
Create new variables based on another df
Rename variable based on textInput value in Shiny
Executing a function with paste to create a new variable in a dataframe in R
Evaluate expression given as a string
This is one possibility without any loops:
df <- data.frame(x = 1:5)
z <- c("blue", "red")
df[z] <- NA_character_
  x blue  red
1 1 <NA> <NA>
2 2 <NA> <NA>
3 3 <NA> <NA>
4 4 <NA> <NA>
5 5 <NA> <NA>
The solution was indeed the simple suggestion from @akrun:
You can use [ instead of $, i.e. df[z] <- NA. Reproducible example: mtcars[z] <- NA; head(mtcars)
Hence, as follows:
if (length(z) > 0) {
  for (i in z) {
    df[i] <- NA
  }
}
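The same idea can be wrapped in a small helper. This is only a sketch: the function name add_missing_cols, its fill argument, and the sample data are made up for illustration, and it assumes NA is an acceptable filler value.

```r
# Hypothetical helper: add every name in `wanted` that is not yet a column of `df`.
add_missing_cols <- function(df, wanted, fill = NA) {
  missing_cols <- setdiff(wanted, colnames(df))  # names still absent from df
  if (length(missing_cols) > 0) {
    df[missing_cols] <- fill  # `[` accepts a character vector of column names
  }
  df
}

# Illustrative data: "yellow" already exists, "blue" and "red" do not
df <- data.frame(x = 1:3, yellow = c("a", "b", "c"))
df <- add_missing_cols(df, c("blue", "yellow", "red"))
colnames(df)
```

Because the names are computed inside the function, it can be called again whenever the wanted names change.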
Related
I currently have the below list and am trying to develop a data frame from the results. Essentially, I would like to take the first list and add it to a new data frame, creating columns with variable names, and storing the results in the first row. Then move to the next list and create a new y column so that any variables with y results would be added to that list.
list
[[1]]
       x       xy
1.000365 1.000365
[[2]]
       x        y
1.007184 1.007184
[[3]]
       x        y
1.020803 1.020803
[[4]]
NA
Is this possible to do? I've been trying to figure out how a for loop or lapply might work in this scenario but am unsure.
Thanks.
You can use [ in lapply on the unique names of the vectors:
i <- unique(unlist(lapply(x, names)))
setNames(as.data.frame(do.call(rbind, lapply(x, `[`, i))), i)
#         x       xy        y
#1 1.000365 1.000365       NA
#2 1.007184       NA 1.007184
#3 1.020803       NA 1.020803
#4       NA       NA       NA
Data:
x <- list(c(x=1.000365, xy=1.000365), c(x=1.007184, y=1.007184),
c(x=1.020803, y=1.020803), NA)
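The reason this works is that indexing a named vector by a name it does not have yields NA rather than an error, so every list element is padded to the full set of names before rbind. A minimal illustration (the names v, i and padded are just for this example):

```r
v <- c(x = 1.000365, xy = 1.000365)  # has no element named "y"
i <- c("x", "xy", "y")               # the full set of wanted names

padded <- v[i]   # the position for "y" is filled with NA
unname(padded)
```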
lapply, or even better bind_rows, would be good for this:
library(dplyr)
d <- bind_rows( your.list )
Note, I assume the xy name of the 2nd element of the first list entry is a typo?
I have two data frames.
df1:
Dis1_SubDIs1_Village1       Dis2_SubDIs1_Village1  Dis1_SubDIs2_Village1
JODHPUR|JODHPUR|JODHPUR     |JODHPUR|JODHPUR       JODHPUR||JODHPUR
JHUNJHUNUN|JHUNJHUNUN|BARI  |JHUNJHUNUN|BARI       JHUNJHUNUN|BARI|BARI
BUNDI|HINDOLI|BUNDI         |HINDOLI|BUNDI         BUNDI|BUNDI|BUNDI
SIROHI|SIROHI|SIROHI        |SIROHI|SIROHI         SIROHI||SIROHI
ALWAR|ALWAR|BASAI           |ALWAR|BASAI           ALWAR||BASAI
BHARATPUR|BHARATPUR|SEEKRI  |BHARATPUR|SEEKRI      BHARATPUR||SEEKRI
and second data,
df2 :
High
|BHARATPUR|SEEKRI
BUNDI|HINDOLI|BUNDI
SIROHI||SIROHI
CHURU|TARANAGAR|DABRI CHHOTI
Now I want to do a VLOOKUP/MATCH-style lookup of each df1 column against the df2 column, the same way we do in Excel.
If there is an exact match, return the matched value; otherwise return 0.
I tried making the function in R
For match
for(i in names(df1)){
match_vector = match(df_final[,i], df$High, incomparables = NA)
df1$High = df2$High[match_vector]
}
but I'm getting an error. It only shows the result for the last column and overwrites the values of the other columns.
For vlookup:
func_vlook = function(a){
for(i in 1:ncol(a)) {
lookup_df = vlookup_df(lookup_value = i,
dict = df2,
lookup_column = 1)
}
return(lookup_df)
}
lookup_df <- func_vlook(a = df1)
I'm still getting an error.
My final output should look like this:
Dis1_SubDIs1_Village1_M1  Dis2_SubDIs1_Village1_M2  Dis1_SubDIs2_Village1_M3
NA                        NA                        NA
NA                        NA                        NA
BUNDI|HINDOLI|BUNDI       NA                        NA
NA                        SIROHI||SIROHI            SIROHI||SIROHI
NA                        NA                        NA
NA                        NA                        |BHARATPUR|SEEKRI
For N input columns, there should be N output columns holding the matches.
Please help.
No need for any loops with this one: apply and match should work fine. apply iterates over however many columns you have, so the output has the same number of columns as the input. In your example, apply simplifies the result to a matrix.
apply(X = df1,
MARGIN = 2,
FUN = function(x) df2$High[match(x, df2$High)])
If you need a data frame as the output, wrap the call in as.data.frame():
as.data.frame(apply(X = df1,
MARGIN = 2,
FUN = function(x) df2$High[match(x, df2$High)]))
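If the result columns should also carry the _M1, _M2, ... suffixes shown in the question, a variant along these lines may help. It is a sketch on shortened toy data (the two-column df1 and the column names a and b are invented here), and it uses lapply instead of apply so the result never passes through a matrix.

```r
# Toy stand-ins for df1 and df2 (values shortened for illustration)
df1 <- data.frame(a = c("BUNDI|HINDOLI|BUNDI", "ALWAR||BASAI"),
                  b = c("SIROHI|SIROHI|SIROHI", "SIROHI||SIROHI"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(High = c("BUNDI|HINDOLI|BUNDI", "SIROHI||SIROHI"),
                  stringsAsFactors = FALSE)

# lapply() walks the columns of df1 and returns one result vector per column;
# match() gives NA where a value has no exact match in df2$High
matched <- as.data.frame(
  lapply(df1, function(col) df2$High[match(col, df2$High)]),
  stringsAsFactors = FALSE
)

# Append _M1, _M2, ... to the original column names
names(matched) <- paste0(names(df1), "_M", seq_along(df1))
matched
```

Unmatched cells come back as NA; replacing them with 0 would be a separate step and would mix numbers with strings in the same column.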
I want to add values from one data frame as column names to another data frame.
I've written code that produces one column at a time if I "manually" assign index values:
df_searchtable <- data.frame(category = c("air", "ground", "ground", "air"), wiggy = c("soar", "trot", "dive", "gallop"))
df_host <- data.frame(textcolum = c("run on the ground", "fly through the air"))
#create vector of categories
categroups <- as.character(unique(df_searchtable$category))
##### if I assign column names one at a time using index numbers, no problem:
group = categroups[1]
df_host[, group] <- NA
##### if I use a loop to assign the column names:
for (i in categroups) {
group = categroups[i]
df_host[, group] <- NA
}
the code fails, giving:
Error in [<-.data.frame(`*tmp*`, , group, value = NA) :
missing values are not allowed in subscripted assignments of data frames
How can I get around this problem?
Here's a simple base R solution:
df_host[categroups] <- NA
df_host
            textcolum air ground
1   run on the ground  NA     NA
2 fly through the air  NA     NA
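The problem with your loop is that for (i in categroups) iterates over the elements themselves ("air", "ground"), whereas your code assumes i runs over the indices 1, 2, ..., n. Indexing an unnamed vector by a string returns NA, and that NA is what triggers the error.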
For instance:
for (i in categroups) {
print(i)
print(categroups[i])
}
[1] "air"
[1] NA
[1] "ground"
[1] NA
To fix your loop, you could do one of two things:
for (group in categroups) {
  df_host[, group] <- NA
}
# or
for (i in seq_along(categroups)) {
  group <- categroups[i]
  df_host[, group] <- NA
}
Here's a solution using purrr's map.
library(dplyr)  # bind_cols, mutate, tibble
library(purrr)  # map_dfc, walk
bind_cols(df_host,
          map_dfc(categroups,
                  function(group) tibble(!!group := rep(NA_real_, nrow(df_host)))))
Gives:
            textcolum air ground
1   run on the ground  NA     NA
2 fly through the air  NA     NA
map_dfc maps over the input categroups, creates a single-column tibble for each one, and column-binds the new tibbles into one data frame.
bind_cols joins the original data frame to the new tibble.
Alternatively you could use walk:
walk(categroups, function(group){df_host <<- mutate(df_host, !!group := rep(NA_real_, nrow(df_host)))})
Here's an ugly base R solution: create an empty matrix with the column names and cbind it to the second dataframe.
df_searchtable <- data.frame(category = c("air", "ground", "ground", "air"),
wiggy = c("soar", "trot", "dive", "gallop"),
stringsAsFactors = FALSE)
df_host <- data.frame(textcolum = c("run on the ground", "fly through the air"),
stringsAsFactors = FALSE)
cbind(df_host,
matrix(nrow = nrow(df_host),
ncol = length(unique(df_searchtable$category)),
dimnames = list(NULL, unique(df_searchtable$category))))
Result:
            textcolum air ground
1   run on the ground  NA     NA
2 fly through the air  NA     NA
I'm trying to use the %in% operator in R to write an equivalent of the SAS code below:
If weather in (2,5) then new_weather=25;
else if weather in (1,3,4,7) then new_weather=14;
else new_weather=weather;
The SAS code produces a variable "new_weather" with the values 25, 14, or the original value of "weather".
R code:
GS <- function(df, col, newcol){
# Pass a dataframe, col name, new column name
df[newcol] = df[col]
df[df[newcol] %in% c(2,5)]= 25
df[df[newcol] %in% c(1,3,4,7)] = 14
return(df)
}
Result: when a data frame is passed through the function GS, the values of "col" and "newcol" come out the same. The syntax does not seem to recode any of the values in "newcol". I would appreciate an explanation of the reason and a possible fix.
Is this what you are trying to do?
df <- data.frame(A=seq(1:4), B=seq(1:4))
add_and_adjust <- function(df, copy_column, new_column_name) {
  df[new_column_name] <- df[copy_column] # make copy of column
  df[,new_column_name] <- ifelse(df[,new_column_name] %in% c(2,5), 25, df[,new_column_name])
  df[,new_column_name] <- ifelse(df[,new_column_name] %in% c(1,3,4,7), 14, df[,new_column_name])
  return(df)
}
Usage:
add_and_adjust(df, 'B', 'my_new_column')
df[newcol] is a data frame (with one column), df[[newcol]] or df[, newcol] is a vector (just the column). You need to use [[ here.
You also need to be assigning the result to df[[newcol]], not to the whole df. And to be perfectly consistent and safe you should probably test the col values, not the newcol values.
GS <- function(df, col, newcol){
  # Pass a dataframe, col name, new column name
  df[[newcol]] = df[[col]]
  df[[newcol]][df[[col]] %in% c(2,5)] = 25
  df[[newcol]][df[[col]] %in% c(1,3,4,7)] = 14
  return(df)
}
GS(data.frame(x = 1:7), "x", "new")
#   x new
# 1 1  14
# 2 2  25
# 3 3  14
# 4 4  14
# 5 5  25
# 6 6   6
# 7 7  14
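To see why the original version silently did nothing, it can help to compare the two kinds of indexing directly. A small sketch (the variable names one_col_df and col_vec are made up for this illustration):

```r
df <- data.frame(x = 1:7)

one_col_df <- df["x"]    # still a data frame, with one column
col_vec    <- df[["x"]]  # the underlying integer vector

class(one_col_df)
class(col_vec)

# %in% on the one-column data frame yields a single value, not one per row
length(one_col_df %in% c(2, 5))

# %in% on the vector yields one TRUE/FALSE per row, which is what recoding needs
col_vec %in% c(2, 5)
```

With a single logical value as the index, the assignment in the original function had no per-row mask to work with.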
@user9231640, before you invest too much time in writing your own function, you may want to explore some of the recode functions that already exist in packages like car and Hmisc.
Depending on how complex your recoding gets, your function will get longer and longer as it checks various boundary conditions or changes data types.
Based just on your example, you can do this in base R, and it will be more self-documenting and transparent:
df <- data.frame(A=seq(1:30), B=seq(1:30))
df$my_new_column <- df$B
df$my_new_column <- ifelse(df$my_new_column %in% c(2,5), 25, df$my_new_column)
df$my_new_column <- ifelse(df$my_new_column %in% c(1,3,4,7), 14, df$my_new_column)
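The same recode can also be written as a single nested ifelse(), which reads closest to the SAS if / else if / else and always tests the original column. This is a sketch; the weather data below is invented for the example.

```r
# Illustrative data: weather codes 1 to 8
df <- data.frame(weather = 1:8)

# if / else if / else from the SAS snippet as one nested ifelse()
df$new_weather <- ifelse(df$weather %in% c(2, 5), 25,
                  ifelse(df$weather %in% c(1, 3, 4, 7), 14,
                         df$weather))  # anything else keeps its original value
df$new_weather
```

Testing df$weather in both conditions means a value recoded by the first branch can never be picked up again by the second.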
I am struggling with what is probably a fairly simple task. I want to create a function whose arguments are a data frame (df), two of its column names (T and R), and a value to match in each of those columns (a and b). I know the function receives the data frame, but I don't know how to select the columns. I'm getting an error.
fun <- function(df,T,a,R,b)
{
col <- ds[c("x","y")]
omit <- na.omit(col)
data1 <- omit[omit$x == 'a',]
data2 <- omit[omit$x == 'b',]
nrow(data2)/nrow(data1)
}
fun(jugs,Place,UK,Price,10)
I'm new to the R language, so please help me.
There are several errors you're making.
col <- ds[c("x","y")]
What are x and y? Presumably they're arguments that you're passing, but you specify T and R in your function, not x and y.
data1 <- omit[omit$x == 'a',]
data2 <- omit[omit$x == 'b',]
Again, presumably you want a and b to be the arguments you passed to the function, but you wrote the literal strings 'a' and 'b' rather than the arguments themselves. Also, I assume that the second "omit$x" should be "omit$y" (or vice versa). And since you just built a new data frame with only these two columns, you can simply use the column index.
nrow(data2)/nrow(data1)
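As the last expression in the function, this value is returned implicitly; printing it or wrapping it in return() just makes the intent explicit. Either one should suffice.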
fun(jugs,Place,UK,Price,10)
Finally, you should use quotes on Place, UK, and Price, at least the way I've done it.
fun <- function(df, col1, val1, col2, val2){
  new_cols <- df[,c(col1, col2)]
  omit <- na.omit(new_cols)
  data1 <- omit[omit[,1] == val1,]
  data2 <- omit[omit[,2] == val2,]
  print(nrow(data2)/nrow(data1))
}
fun(jugs, "Place", "UK", "Price", 10)
And if I understand what you're trying to do, it may be easier to avoid creating multiple dataframes that you don't need and just use counts instead.
fun <- function(df, col1, val1, col2, val2){
  new_cols <- df[,c(col1, col2)]
  omit <- na.omit(new_cols)
  n1 <- sum(omit[,1] == val1)
  n2 <- sum(omit[,2] == val2)
  print(n2/n1)
}
fun(jugs, "Place", "UK", "Price", 10)
I would write this function as follows:
fun <- function(df,T,a,R,b) {
data <- na.omit(df[c(T,R)]);
sum(data[[R]]==b)/sum(data[[T]]==a);
};
As you can see, you can combine the first two lines into one, because in your code col was not reused anywhere. Secondly, since you only care about the number of rows of the two subsets of the intermediate data.frame, you don't actually need to construct those two data.frames; instead, you can just compute the logical vectors that result from the two comparisons, and then call sum() on those logical vectors, which naturally treats FALSE as 0 and TRUE as 1.
Demo:
fun <- function(df,T,a,R,b) { data <- na.omit(df[c(T,R)]); sum(data[[R]]==b)/sum(data[[T]]==a); };
df <- data.frame(place=c(rep(c('p1','p2'),each=4),NA,NA), price=c(10,10,20,NA,20,20,20,NA,20,20), stringsAsFactors=F );
df;
##    place price
## 1     p1    10
## 2     p1    10
## 3     p1    20
## 4     p1    NA
## 5     p2    20
## 6     p2    20
## 7     p2    20
## 8     p2    NA
## 9   <NA>    20
## 10  <NA>    20
fun(df,'place','p1','price',20);
## [1] 1.333333
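As a quick check of the claim that sum() on a logical vector counts the TRUE values, the two ways of counting can be compared side by side. A small sketch with made-up data (d, hits, n_by_sum and n_by_subset are names chosen for this example):

```r
d <- data.frame(price = c(10, 10, 20, 20, 20, 20))

hits <- d$price == 20  # logical vector, one entry per row

n_by_sum    <- sum(hits)                      # TRUE counts as 1, FALSE as 0
n_by_subset <- nrow(d[hits, , drop = FALSE])  # build the subset, then count its rows

c(n_by_sum, n_by_subset)
```

Both counts agree, but the sum() version never materializes the intermediate data frame.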