How to append two data frames by overwriting the existing rows - r

I have a data frame, say df. I extracted a 5% sample of its rows into a new data frame, df1, to make a few changes. Now I need to put df1 back into df, overwriting the corresponding rows of df, since df1 is a subset of df.
I tried to extract the rows of df that are not present in df1 using
df2 <- subset(df, !(rownames(df) %in% rownames(df1[])))
But this didn't work.
Can anyone help, please?

Save the filter as a logical vector and reuse it to write the modified values back:
set.seed(357)
xy <- data.frame(col1 = letters[1:5], col2 = runif(5))
xy
col1 col2
1 a 0.10728121
2 b 0.05504568
3 c 0.27987766
4 d 0.22486212
5 e 0.65348521
your.condition <- xy$col1 %in% c("c", "d") # save the filter
newxy1 <- xy[your.condition, ] # subset with it
newxy1$col2 <- 1:2 # modify the subset
xy[your.condition, "col2"] <- newxy1$col2 # write back using the same filter
xy
col1 col2
1 a 0.10728121
2 b 0.05504568
3 c 1.00000000
4 d 2.00000000
5 e 0.65348521
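If the edited rows come back in a different order, or without usable row names, a possible alternative is to match on a key column instead of reusing the filter. This is a sketch that assumes col1 uniquely identifies each row; newxy1 here is built by hand for illustration.

```r
set.seed(357)
xy <- data.frame(col1 = letters[1:5], col2 = runif(5), stringsAsFactors = FALSE)

# Hand-made edited rows, deliberately in a different order than in xy
newxy1 <- data.frame(col1 = c("d", "c"), col2 = c(2, 1), stringsAsFactors = FALSE)

# Position of each edited row in xy (assumes col1 is a unique key)
idx <- match(newxy1$col1, xy$col1)

# Write the new values back to those positions
xy$col2[idx] <- newxy1$col2
xy
```

match() returns NA for keys that are missing from xy, so it is worth checking anyNA(idx) before assigning.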

You should always try to make a reproducible example so that it is easy for others to help you.
I have tried to do that with the mtcars dataset:
# Copy mtcars into df
df = mtcars
# Sample 5 rows from df (set a seed so the sample is reproducible)
set.seed(1)
df1 = df[sample(1:nrow(df), 5), ]
# Make a few changes to the sample (here, double every value)
df1 = df1 * 2
# Overwrite the matching rows of df; rows are matched by row name
df[rownames(df1), ] <- df1
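The approach from the question (drop the sampled rows from df, then append the edited copies) also works once the rows are put back in order. Here is a sketch that checks it against the in-place assignment; the seed and the doubling are only stand-ins for the real sample and manipulations.

```r
set.seed(1)                          # reproducible sample
df  <- mtcars
df1 <- df[sample(nrow(df), 5), ]     # 5 random rows, row names kept
df1 <- df1 * 2                       # stand-in for "a few manipulations"

# Rows of df that were not sampled
rest <- df[!(rownames(df) %in% rownames(df1)), ]

# Append the edited rows, then restore df's original row order
combined <- rbind(rest, df1)
combined <- combined[rownames(df), ]

# Compare with overwriting in place
inplace <- df
inplace[rownames(df1), ] <- df1
isTRUE(all.equal(combined, inplace))
```

Both rely on row names being unique, which is always true for a data.frame.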

Related

Printing a data.frame with zero columns but still has row.names?

Is there a built-in function to display a data.frame with zero columns but still show row.names?
> df
DataFrame with 5 rows and 0 columns
> row.names(df)
[1] "ID1" "ID2" "ID3" "ID4" "ID5"
It would be more useful if it printed this instead:
> df
DataFrame with 5 rows and 0 columns
ID1
ID2
ID3
ID4
ID5
I wrote a custom function that does it via cat(), but it would be nice to know if there's a built-in way of doing it.
library(tidyverse)
df <- df %>%
select(-everything())
print(df) # prints the "data frame with 0 columns" line
cat(rownames(df), sep = "\n") # then one row name per line
Or, as a single pipeline:
df %>%
select(-everything()) %>%
{ print(.); cat(rownames(.), sep = "\n") }
Output
data frame with 0 columns and 2 rows
A
B
Or, in base R, if you don't need the summary line about the data frame:
df <- df[1] # keep only the first column (and the row names)
df[1] <- rep("", nrow(df)) # blank out its values
colnames(df) <- "" # blank out its header
Output
A
B
Data
df <- data.frame(a = c(1, 2),
b = c(1, 2),
c = c(4, 5))
rownames(df) <- c("A", "B")
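If this is needed often, the two steps can be wrapped in a small function. print_rownames_only() below is a hypothetical helper, not a built-in; it is written for a base data.frame, and the question's "DataFrame with 5 rows" output looks like Bioconductor's S4Vectors DataFrame, which this sketch does not cover.

```r
# Hypothetical helper: print the usual summary line, then one row name per line
print_rownames_only <- function(x) {
  writeLines(sprintf("data frame with %d columns and %d rows", ncol(x), nrow(x)))
  writeLines(rownames(x))
  invisible(x)
}

df <- data.frame(a = c(1, 2), b = c(1, 2), c = c(4, 5))
rownames(df) <- c("A", "B")

empty <- df[0]              # drop every column, keep the row names
print_rownames_only(empty)
```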

Select columns based on another column in a different data frame in R

I have a df:
AA <- c("GA","GA", "GA","GA","GA")
A <- c(1,2,3,4,5)
B <- c(5,4,3,2,1)
C <- c(2,3,4,5,1)
D <- c(4,3,2,1,5)
df <- data.frame(AA, A, B, C, D)
The other df is:
E <- c("B", "D")
F <- c("GA","GA")
df2 <- data.frame(E, F)
I would like to select only the columns of df whose names appear in df2$E, keeping AA as well.
That data frame would look like this:
AA <- c("GA","GA", "GA","GA","GA")
B <- c(5,4,3,2,1)
D <- c(4,3,2,1,5)
df3 <- data.frame(AA, B, D)
My current code below gives me an empty data frame with 0 obs. of 5 variables:
df3 <- df %>% filter(df %in% df2$E)
Any assistance in generating a code that works would be greatly appreciated.
Thank you!
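Here we can index the columns by name:
df[, c("AA", df2$E)]
This assumes df2$E is character, which is the default from R 4.0.0 on; in older versions data.frame() converts strings to factors, so use as.character(df2$E) there.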
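If df2$E might contain names that are not columns of df, indexing by name fails with "undefined columns selected". A possible base R guard keeps only the names that exist; drop = FALSE keeps the result a data frame even if a single column remains. The extra name "Z" is hypothetical, added only to show the guard.

```r
AA <- c("GA", "GA", "GA", "GA", "GA")
df <- data.frame(AA, A = c(1, 2, 3, 4, 5), B = c(5, 4, 3, 2, 1),
                 C = c(2, 3, 4, 5, 1), D = c(4, 3, 2, 1, 5))

# "Z" has no matching column in df
wanted <- c("B", "D", "Z")

# Keep AA plus only the requested names that are actual columns
keep <- c("AA", intersect(wanted, names(df)))
df3 <- df[, keep, drop = FALSE]
df3
```

With dplyr, select(AA, any_of(df2$E)) behaves similarly, while select(AA, all_of(df2$E)) errors on a missing name.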

Subset a df and remove the subsetted rows in R

Hello, I have a data frame called df, and I have subsetted it into another data frame called df1. Now I'd like to remove the rows of df1 from df to obtain df2 = df - df1. How can I do this in R?
df <- read.csv("dataframe.csv")
df1 <- df[(df$time <= 0.345),]
Try:
df2 <- df[(df$time > 0.345), ]
or
df2 <- df[-which(df$time <= 0.345), ]
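One caveat with the -which() form: when no row meets the condition, which() returns integer(0), and df[-integer(0), ] selects no rows at all instead of all of them. A small sketch with made-up times:

```r
df <- data.frame(time = c(0.1, 0.5, 0.9))

hits <- which(df$time > 10)                  # integer(0): no row matches
nrow(df[-hits, , drop = FALSE])              # 0: every row is dropped
nrow(df[!(df$time > 10), , drop = FALSE])    # 3: nothing is dropped
```

Negating the logical condition with ! avoids this edge case.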
If you strictly have to keep the structure described (subset first, then remove df1's rows from df), this is a possible approach:
df = data.frame(Sample.Name = c(12,13,14,12,13),
Target=c("A","B","C","A","A"),
Task=c("Sample","Standard","Sample","Standard","Sample"),
Value=c(36,34,34,35,36),
Mean=c(35,32,36,37,35))
df1 = df[(df$Value <= 34),]
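# Keep the rows of df that do not appear in df1 (each row compared as one pasted string)
df2 = df[!(do.call(paste0, df) %in% do.call(paste0, df1)),]
df2
The result is this one:
Sample.Name Target Task Value Mean
1 12 A Sample 36 35
4 12 A Standard 35 37
5 13 A Sample 36 35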
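The do.call(paste0, df) keys join the columns with no separator, so different rows can produce the same key (1 and "23" versus 12 and "3" both give "123"). A sketch with made-up data that uses paste() with a separator instead; "\r" is an assumption and only needs to be a character that never occurs in the data.

```r
df  <- data.frame(x = c(1, 12, 5), y = c("23", "3", "7"), stringsAsFactors = FALSE)
df1 <- df[1, ]                                          # the rows to remove

key0 <- function(d) do.call(paste0, d)                  # no separator
key  <- function(d) do.call(paste, c(d, sep = "\r"))    # explicit separator

df[!(key0(df) %in% key0(df1)), ]   # row 2 is dropped too: its key is also "123"
df[!(key(df) %in% key(df1)), ]     # only row 1 is dropped
```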
This works without even knowing the logic behind the first subset:
library(dplyr)
df2 <- setdiff(df, df1)
OR
df2 <- anti_join(df, df1)

Combining two columns with character strings into a new column

Below I have two columns of data (columns 6 and 7) holding genus and species names. I would like to combine those two character columns into a new column with the full names.
I am quite new to R and the code below does not work! Thank you for the help, wonderful people of Stack Overflow!
#TRYING TO MIX GENUS & SPECIES COLUMN
accepted_genus <- merged_subsets_2[6]
accepted_species <- merged_subsets_2[7]
accepted_genus
accepted_species
merged_subsets_2%>%
bind_cols(accepted_genus, accepted_species)
merged_subsets_2
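We can use str_c() from stringr (Col1 and Col2 stand for your genus and species columns; add sep = " " to put a space between them)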
library(dplyr)
library(stringr)
df %>%
mutate(Col3 = str_c(Col1, Col2))
Or with unite
library(tidyr)
df %>%
unite(Col3, Col1, Col2, sep="", remove = FALSE)
In base R, paste0() joins the values with no separator, and paste() separates them with a space by default:
df <- data.frame(Col1 = letters[1:2], Col2=LETTERS[1:2]) # Sample data
> df
Col1 Col2
1 a A
2 b B
df$Col3 <- paste0(df$Col1, df$Col2) # Without spacing
> df
Col1 Col2 Col3
1 a A aA
2 b B bB
df$Col3 <- paste(df$Col1, df$Col2) # With a space (paste's default sep)
> df
Col1 Col2 Col3
1 a A a A
2 b B b B
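In the question's own code, merged_subsets_2[6] returns a one-column data frame rather than a vector, and bind_cols() places columns side by side instead of joining strings. A sketch using [[ ]] to pull columns 6 and 7 out as vectors; the data frame below is made up, with genus and species placed in columns 6 and 7 to match the question.

```r
# Made-up stand-in for merged_subsets_2: genus in column 6, species in column 7
merged_subsets_2 <- data.frame(
  id = 1:2, site = c("s1", "s2"), year = c(2019, 2020),
  lat = c(1.5, 2.5), lon = c(3.5, 4.5),
  genus = c("Quercus", "Pinus"), species = c("robur", "sylvestris"),
  stringsAsFactors = FALSE
)

# [[ ]] extracts a column as a vector; [ ] would return a one-column data frame
merged_subsets_2$accepted_name <- paste(merged_subsets_2[[6]], merged_subsets_2[[7]])
merged_subsets_2$accepted_name
```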

Selecting rows from a data frame from combinations of lists given by another dataframe [duplicate]

This question already has answers here:
Selecting rows from a data frame from combinations of lists [duplicate]
Closed 5 years ago.
I have a dataframe, dat:
dat<-data.frame(col1=rep(1:4,3),
col2=rep(letters[24:26],4),
col3=letters[1:12])
I want to filter dat on two different columns using ONLY the combinations given by the rows in the data frame filter:
filter<-data.frame(col1=1:3,col2=NA)
lists<-list(list("x","y"),list("y","z"),list("x","z"))
filter$col2<-lists
So for example, rows containing (1,x) and (1,y), would be selected, but not (1,z),(2,x), or (3,y).
I know how I would do it using a for loop:
#create a frame to drop results in
results<-dat[0,]
for(f in 1:nrow(filter)){
temp_filter<-filter[f,]
temp_dat<-dat[dat$col1==temp_filter[1,1] &
dat$col2%in%unlist(temp_filter[1,2]),]
results<-rbind(results,temp_dat)
}
Or if you prefer dplyr style:
library(dplyr)
results<-dat[0,]
for(f in 1:nrow(filter)){
temp_filter<-filter[f,]
temp_dat<-filter(dat,col1==temp_filter[1,1] &
col2%in%unlist(temp_filter[1,2]))
results<-rbind(results,temp_dat)
}
results should be:
col1 col2 col3
1 1 x a
5 1 y e
2 2 y b
6 2 z f
3 3 z c
7 3 x g
I would normally do the filtering using a merge, but I can't now since I have to check col2 against a list rather than a single value. The for loop works but I figured there would be a more efficient way to do this, probably using some variation of apply or do.call.
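The guess about apply and do.call can be sketched like this: build one piece per filter row with lapply(), then bind all pieces once with do.call(rbind, ...), which avoids growing results inside the loop. The filter data frame is renamed filter_df here only to avoid confusion with dplyr's filter().

```r
dat <- data.frame(col1 = rep(1:4, 3),
                  col2 = rep(letters[24:26], 4),
                  col3 = letters[1:12],
                  stringsAsFactors = FALSE)

filter_df <- data.frame(col1 = 1:3, col2 = NA)
filter_df$col2 <- list(list("x", "y"), list("y", "z"), list("x", "z"))

# One data frame per filter row: matching col1, col2 among that row's allowed values
pieces <- lapply(seq_len(nrow(filter_df)), function(i) {
  dat[dat$col1 == filter_df$col1[i] &
        dat$col2 %in% unlist(filter_df$col2[[i]]), ]
})

# Bind all pieces in a single call
results <- do.call(rbind, pieces)
results
```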
A solution using the tidyverse; dat2 is the final output. The idea is to extract the values from the list column of the filter data frame and reshape filter into filter2, whose col1 and col2 columns have the same form as those in dat. Finally, use semi_join() to filter dat and create dat2.
By the way, filter() is a function in dplyr (and in the stats package). Since your example uses dplyr, it is better not to name a data frame filter.
library(tidyverse)
filter2 <- filter %>%
mutate(col2_a = map_chr(col2, 1),
col2_b = map_chr(col2, 2)) %>%
select(-col2) %>%
gather(group, col2, -col1)
dat2 <- dat %>%
semi_join(filter2, by = c("col1", "col2")) %>%
arrange(col1)
dat2
col1 col2 col3
1 1 x a
2 1 y e
3 2 y b
4 2 z f
5 3 z c
6 3 x g
Update
Another way to prepare filter2, which does not need to know how many elements are in each list. The rest is the same as in the previous solution.
library(tidyverse)
filter2 <- filter %>%
rowwise() %>%
do(tibble(col1 = .$col1, col2 = flatten_chr(.$col2))) # tibble() replaces the deprecated data_frame()
dat2 <- dat %>%
semi_join(filter2, by = c("col1", "col2")) %>%
arrange(col1)
This is doable with a straightforward join once you get the filter list back into a standard data.frame:
merge(
dat,
with(filter, data.frame(col1=rep(col1, lengths(col2)), col2=unlist(col2)))
)
# col1 col2 col3
#1 1 x a
#2 1 y e
#3 2 y b
#4 2 z f
#5 3 x g
#6 3 z c
Arguably, I'd do away with whatever process is creating those nested lists in the first place.
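merge() sorts its output by the key columns. If dat's original row order and row names matter, a possible base R alternative is to flatten the list column the same way and then test membership of pasted keys, which acts like a semi-join. The "\r" separator is an assumption: it only needs to be a character that cannot appear in col1 or col2.

```r
dat <- data.frame(col1 = rep(1:4, 3),
                  col2 = rep(letters[24:26], 4),
                  col3 = letters[1:12],
                  stringsAsFactors = FALSE)

filter_df <- data.frame(col1 = 1:3, col2 = NA)
filter_df$col2 <- list(list("x", "y"), list("y", "z"), list("x", "z"))

# Flatten the list column: one row per allowed (col1, col2) pair
allowed <- data.frame(col1 = rep(filter_df$col1, lengths(filter_df$col2)),
                      col2 = unlist(filter_df$col2),
                      stringsAsFactors = FALSE)

# Keep the rows of dat whose (col1, col2) pair is allowed, in dat's order
dat_key     <- paste(dat$col1, dat$col2, sep = "\r")
allowed_key <- paste(allowed$col1, allowed$col2, sep = "\r")
dat2 <- dat[dat_key %in% allowed_key, ]
dat2
```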
