What I want to do is generate a new column in a dataframe that meets these conditions:
dataframe1$var1 == dataframe2$var1 &
dataframe1$var2 == dataframe2$var2 &
dataframe1$var3 == dataframe2$var3
Basically I need to generate a dummy variable that has the value 1 if the conditions are met, and the value 0 if they are not.
I've tried the following code that doesn't work:
dataframe1$NewVar <- ifelse(dataframe1$var1 == dataframe2$var1 &
dataframe1$var2 == dataframe2$var2 & dataframe1$var3 == dataframe2$var3 , 1, 0)
Data
dput(df1)
structure(list(var1 = c("A", "B", "C"), var2 = c("X", "X", "X"
), var3 = c(1, 2, 2)), .Names = c("var1", "var2", "var3"), row.names = c(NA,
-3L), class = "data.frame")
dput(df2)
structure(list(var1 = c("A", "A", "C"), var2 = c("X", "X", "Y"
), var3 = c(1, 1, 1)), .Names = c("var1", "var2", "var3"), row.names = c(NA,
-3L), class = "data.frame")
btw my dataset is not as simple as the example I posted in the pictures.
I don't know if it's relevant but values in my variables (columns) would look like this:
var1: 24000000000
var2: 1234567
var3: 8
You can simply do,
as.integer(rowSums(df1 == df2) == ncol(df1))
#[1] 1 0 0
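A minimal sketch of how this row-wise comparison could feed the new column, assuming both data frames have the same number of rows and are compared row by row (as the one-liner above does). The `cols` vector is an illustrative name, not part of the original post; it restricts the comparison to the three key columns in case the real data carries others.

```r
# Example data from the question
df1 <- data.frame(var1 = c("A", "B", "C"), var2 = c("X", "X", "X"), var3 = c(1, 2, 2))
df2 <- data.frame(var1 = c("A", "A", "C"), var2 = c("X", "X", "Y"), var3 = c(1, 1, 1))

# Columns that must all match (illustrative helper variable)
cols <- c("var1", "var2", "var3")

# Element-wise comparison gives a logical matrix; a row is a full match
# when the number of TRUE cells equals the number of compared columns.
df1$NewVar <- as.integer(rowSums(df1[cols] == df2[cols]) == length(cols))
df1$NewVar
#[1] 1 0 0
```

If any compared cell is NA, `rowSums` returns NA for that row, so real data may need `na.rm = TRUE` or an explicit NA rule.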
Related
I have a df:
df<-structure(list(Name = c("test", "a", "nb", "c", "r", "f", NA,
"d", "ee", "test", "value", "test", "b")), row.names = c(NA,
-13L), class = c("tbl_df", "tbl", "data.frame"))
How can I keep only the rows whose preceding row is "test" and whose own value is not "value"?
The new df1 should look like this (either case is OK):
library(dplyr)
df %>%
filter(lag(Name == "test"), Name != "value")
# A tibble: 2 x 1
#   Name
#   <chr>
# 1 a
# 2 b
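For comparison, here is a base R sketch of the same filter without dplyr. It assumes a plain data frame rather than a tibble, and `prev` is an illustrative name for the shifted column.

```r
df <- data.frame(Name = c("test", "a", "nb", "c", "r", "f", NA,
                          "d", "ee", "test", "value", "test", "b"))

# Shift the column down by one so prev[i] holds the value of row i - 1
prev <- c(NA, head(df$Name, -1))

# which() drops the NA comparisons, so rows with missing values are excluded
res <- df[which(prev == "test" & df$Name != "value"), , drop = FALSE]
res$Name
#[1] "a" "b"
```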
I have a dataframe generated by a function:
Each time it has a different number of rows:
structure(list(a = c(1, 2, 3), b = c("er", "gd", "ku"), c = c(43,
453, 12)), .Names = c("a", "b", "c"), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"))
structure(list(a = c(1, 2), b = c("er", "gd"), c = c(43, 453)), .Names = c("a",
"b", "c"), row.names = c(NA, -2L), class = c("tbl_df", "tbl",
"data.frame"))
I want to control the total number of rows when I bind them, as I would with a while loop: keep binding while the total is less than n (n = 4, 100, 4242...).
How can I do this with functional programming, without a while loop?
Overshooting is fine: for example, with n = 10 the total might be 7 before the last bind_rows and 20 after it. I want the final row count to be the smallest reachable k with k >= n.
Here is my while loop doing this:
b <- list()
total_rows <- 0
while (total_rows < 1000) {
  df <- f_produce_rand_df()
  b[[length(b) + 1]] <- df
  total_rows <- total_rows + nrow(df)
}
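One possible loop-free formulation is a small recursive function that carries the list and the running total as arguments. This is only a sketch: `f_produce_rand_df()` is replaced by a hypothetical stand-in (the real generator is not shown in the post), and `collect_until` is an illustrative name.

```r
# Hypothetical stand-in for f_produce_rand_df(): returns 1 to 3 rows per call
f_produce_rand_df <- function() {
  k <- sample(1:3, 1)
  data.frame(a = seq_len(k), b = letters[seq_len(k)], c = runif(k))
}

# Keep generating data frames until the accumulated row count reaches n
collect_until <- function(n, acc = list(), total = 0) {
  if (total >= n) return(acc)
  df <- f_produce_rand_df()
  collect_until(n, c(acc, list(df)), total + nrow(df))
}

set.seed(1)
b <- collect_until(10)
result <- do.call(rbind, b)
nrow(result) >= 10
```

The recursion stops at the first total that is at least n, matching the while loop. For very large n with small data frames, the recursion depth may run into R's nesting limits, in which case the loop is probably the more practical choice.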
Suppose I have three tables as below:
table1 <- structure(list(Pos = 1:6, A = c(16.8508287292818, 0, 0.552486187845304,
0, 1.10497237569061, 1.38121546961326), C = c(1.93370165745856,
0.276243093922652, 0.828729281767956, 0.276243093922652, 0, 0.552486187845304
), G = c(1.10497237569061, 2.48618784530387, 0.276243093922652,
0.828729281767956, 0.276243093922652, 0), T = c(0.828729281767956,
0, 0.828729281767956, 1.10497237569061, 0, 0)), .Names = c("Pos",
"A", "C", "G", "T"), row.names = c(NA, 6L), class = "data.frame")
table2<- structure(list(Pos = 1:6, A = c(4.15584415584416, 1.03896103896104,
0.779220779220779, 0.692640692640693, 2.25108225108225, 2.94372294372294
), C = c(1.12554112554113, 0.173160173160173, 0.173160173160173,
0.519480519480519, 0.173160173160173, 0.173160173160173), G = c(1.03896103896104,
0.346320346320346, 0.0865800865800866, 0.432900432900433, 0.519480519480519,
0.0865800865800866), T = c(2.77056277056277, 0.606060606060606,
0.25974025974026, 0.692640692640693, 0.346320346320346, 0.25974025974026
)), .Names = c("Pos", "A", "C", "G", "T"), row.names = c(NA,
6L), class = "data.frame")
table3 <- structure(list(Pos = 1:6, A = c(10.3492063492063, 0.317460317460317,
0.349206349206349, 0.920634920634921, 1.96825396825397, 1.23809523809524
), C = c(0.825396825396825, 0.126984126984127, 0.349206349206349,
0.317460317460317, 0.19047619047619, 0.253968253968254), G = c(0.761904761904762,
0.952380952380952, 0.285714285714286, 0.412698412698413, 0.126984126984127,
0.19047619047619), T = c(1.07936507936508, 0.412698412698413,
0.476190476190476, 0.253968253968254, 0.19047619047619, 0.253968253968254
)), .Names = c("Pos", "A", "C", "G", "T"), row.names = c(NA,
6L), class = "data.frame")
I have now saved the table names as files.table:
files.table <- paste0("table", seq(1:3))
My problem is that I cannot get bind_rows to bind table1, table2 and table3 using files.table instead of listing all three tables. This is the error I get: Error in bind_rows_(x, .id) : Argument 1 must have names
This is the code I tried:
bind.table <- bind_rows(files.table, .id = "table") %>%
gather(Base, Percent, -Pos, -table)
The .id argument of bind_rows sets the name of the column that will hold the name of the list item each row came from, not the names themselves. You set the table names by naming the items in the list; bind_rows then picks up those names and puts them into a column with the name you specify:
table_list <- list(table1, table2, table3)
names(table_list) <- paste0("table", seq(1:3))
bind.table <- bind_rows(table_list, .id = 'id')
From ?bind_rows:
Each argument can either be a data frame, a list that could be a data
frame, or a list of data frames
The easiest way to get the data frames into bind_rows is to assemble them into a list and pass that list in. As @joran suggests, the easiest way to do this is to load or generate them inside an lapply call, which automatically returns a list that can go straight into bind_rows.
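If the tables already exist as separate objects and only their names are stored in files.table, base R's `mget()` can build the named list from that character vector. The sketch below uses small stand-in tables (not the data from the question) and assumes it runs in the environment where the tables are defined.

```r
library(dplyr)

# Small stand-in tables; the real table1..table3 are defined in the question
table1 <- data.frame(Pos = 1:2, A = c(16.9, 0))
table2 <- data.frame(Pos = 1:2, A = c(4.2, 1.0))
table3 <- data.frame(Pos = 1:2, A = c(10.3, 0.3))

files.table <- paste0("table", 1:3)

# mget() looks up each name and returns a list named after the objects
table_list <- mget(files.table)

# The list names end up in the column given by .id
bind.table <- bind_rows(table_list, .id = "table")
```

Because the list is named by `mget()`, no separate `names<-` step is needed before calling bind_rows.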
I've several data objects nested in a huge object which I need to stack using rbind. However, before stacking these, I need to convert column names to lower case, since the data objects were stored with different case styles. How could I make this happen?
Toy data
df <- list(structure(list(a = 1:3, x = c(-1.99, -1.11, -0.34), y = c("C", "B", "A")), .Names = c("a", "x",
"y"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-3L)), structure(list(a = 1:3, x = c(-0.44, -1.07,
-0.23)), .Names = c("A", "x"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -3L)), structure(list(
a = 1:3, x = c(-0.62, -0.60, -0.06
), y = c(3L, 2L, 1L)), .Names = c("a", "X", "y"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -3L)))
lapply(df, names)
rbind
data.table::rbindlist(df, fill=TRUE, idcol = TRUE)
Here is a solution using lapply. However, it creates a duplicate of the original list.
df_lower <- lapply(df, function(x) setNames(x, tolower(names(x))))
Design a function and use lapply to apply that function to all the data frames. This will change all column names to lower case.
colname_fun <- function(dt){
  dt <- setNames(dt, tolower(names(dt)))
  return(dt)
}
lapply(df, colname_fun)
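To show the full path from mixed-case names to a stacked result, here is a small base R sketch. `df_list` is a hypothetical two-element list, simpler than the toy data above, in which both data frames share the same columns once lowercased.

```r
# Hypothetical list with inconsistent column-name case
df_list <- list(
  data.frame(a = 1:2, X = c(0.1, 0.2)),
  data.frame(A = 3:4, x = c(0.3, 0.4))
)

# Lowercase the names of every data frame
df_lower <- lapply(df_list, function(d) setNames(d, tolower(names(d))))

# With matching names, rbind can stack them
stacked <- do.call(rbind, df_lower)
names(stacked)
#[1] "a" "x"
```

Base `rbind` requires every data frame to have the same set of columns, so for lists like the toy data, where some columns are missing, a filling binder such as `data.table::rbindlist(fill = TRUE)` is still needed after the renaming step.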
I have a question about exporting data frames from a list to a txt file. I found some solutions, but they only worked for vectors. Here is one example:
dataframe1 <- data.frame(a= c(1,2,3,4,5), b= c(1,1,1,1,1))
dataframe2 <- data.frame(a= c(5,5,5), b= c(1,1,1))
mylist <- list(dataframe1, dataframe2)
I would like the txt file to look like this:
$dataframe1
a b
1 1
2 1
3 1
4 1
5 1
$dataframe2
a b
5 1
5 1
5 1
Thank you for the help.
Say your list is named:
mylist<-structure(list(dataframe1 = structure(list(a = c(1, 2, 3, 4,
5), b = c(1, 1, 1, 1, 1)), .Names = c("a", "b"), row.names = c(NA,
-5L), class = "data.frame"), dataframe2 = structure(list(a = c(5,
5, 5), b = c(1, 1, 1)), .Names = c("a", "b"), row.names = c(NA,
-3L), class = "data.frame")), .Names = c("dataframe1", "dataframe2"
))
You can try:
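# Open the file in append/text mode so successive writes accumulate
con <- file("temp.csv", open = "at")
# Write each data frame preceded by its list name
Map(function(x, y) {
  cat(file = con, y, "\n")
  write.table(x, file = con, quote = FALSE, row.names = FALSE)
}, mylist, names(mylist))
close(con)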
The above writes the data frames to the file temp.csv. The list must be named for this to work.
Alternatively, if you are ok with the print method, you can just redirect the standard output to a file:
sink("temp.csv")
print(mylist)
sink(NULL)
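To get closer to the layout requested in the question, with a `$name` header line before each table, a variation along these lines may work. It is a sketch only: the output path comes from `tempfile()` and the variable `out` is illustrative, not from the original answer.

```r
mylist <- list(
  dataframe1 = data.frame(a = 1:5, b = 1),
  dataframe2 = data.frame(a = c(5, 5, 5), b = 1)
)

out <- tempfile(fileext = ".txt")
con <- file(out, open = "wt")
for (nm in names(mylist)) {
  # Header line in the "$name" style, then the table without row names
  writeLines(paste0("$", nm), con)
  write.table(mylist[[nm]], file = con, quote = FALSE, row.names = FALSE)
}
close(con)

lines <- readLines(out)
```

Reading the file back with `readLines()` is just a quick way to inspect the result; in practice `out` would be replaced with the desired file name.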