Hello, I have a data frame called df and I have subsetted it into another data frame called df1. Now I'd like to remove the df1 rows from df to obtain df2 = df - df1. How can I do it in R?
df <- read.csv("dataframe.csv")
df1 <- df[(df$time <= 0.345),]
Try:
df2 <- df[(df$time > 0.345), ]
or
df2 <- df[-which(df$time <= 0.345), ]
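One caveat with the `-which()` form, shown here as a small sketch on made-up data (the `time` values are assumptions for illustration): if no row satisfies the condition, `which()` returns an empty vector, and negative indexing with an empty vector drops every row instead of keeping them all. Logical negation does not have this problem.

```r
# Toy data; the values are made up for illustration
df <- data.frame(time = c(0.1, 0.3, 0.5, 0.9))

# No row has time < 0, so which() returns integer(0)
# and the negative index selects zero rows
with_which <- df[-which(df$time < 0), , drop = FALSE]

# Logical negation keeps all four rows, as intended
with_logical <- df[!(df$time < 0), , drop = FALSE]

nrow(with_which)
nrow(with_logical)
```

If the column can contain NA, the logical form returns NA rows for those entries, so you may want to wrap the condition in `which()` without the minus sign, or handle NA explicitly.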
If for any reason you strictly have to keep the structure described, this is a possible approach:
df = data.frame(Sample.Name = c(12,13,14,12,13),
Target=c("A","B","C","A","A"),
Task=c("Sample","Standard","Sample","Standard","Sample"),
Value=c(36,34,34,35,36),
Mean=c(35,32,36,37,35))
df1 = df[(df$Value <= 34),]
df2 = df[!(do.call(paste0, df) %in% do.call(paste0, df1)),]
df2
The result is this one:
Sample.Name Target Task Value Mean
1 12 A Sample 36 35
4 12 A Standard 35 37
5 13 A Sample 36 35
This should work without even knowing the logic of the first subset:
library(dplyr)
df2 <- setdiff(df, df1)
OR
df2 <- anti_join(df, df1)
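If you would rather stay in base R, a sketch along the same lines can rely on row names. This assumes df1 was created by plain `[` subsetting, so it still carries the row names of df; the toy columns below are made up for illustration.

```r
# Toy data; column names and values are assumptions
df <- data.frame(time = c(0.10, 0.30, 0.40, 0.90),
                 value = c(5, 6, 7, 8))

# The subset keeps the original row names ("1", "2")
df1 <- df[df$time <= 0.345, ]

# Keep every row of df whose row name does not appear in df1
df2 <- df[!(rownames(df) %in% rownames(df1)), ]
df2
```

This does not need the original condition either, but it breaks if the row names of df1 were reset after subsetting.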
I have a dataset df1 like so:
snp <- c("rs7513574_T", "rs1627238_A", "rs1171278_C")
p.value <- c(2.635489e-01, 9.836280e-01 , 6.315047e-01 )
df1 <- data.frame(snp, p.value)
I want to remove the _ underscore and the letters after it (representing allele) in df1 and make this into a new dataframe df2
I tried this using the code
df2 <- df1[,c("snp", "allele"):=tstrsplit(`snp`, "_", fixed = TRUE)]
However, this changes the df1 data frame. Is there another way to do this?
This is my best guess as to what you want:
library(tidyr)
separate(df1, snp, into = c("snp", "allele"), sep = "_")
# snp allele p.value
# 1 rs7513574 T 0.2635489
# 2 rs1627238 A 0.9836280
# 3 rs1171278 C 0.6315047
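If your data is instead arranged with one column per SNP (V1, V2, V3), as in the output below, you can strip the suffix from each column:
library(dplyr)
df2 = df1 %>%
  mutate(across(c(V1, V2, V3), ~stringr::str_remove_all(., "_[:alpha:]")))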
> df2
V1 V2 V3
snp rs7513574 rs1627238 rs1171278
p.value 0.2635489 0.983628 0.6315047
Try:
library(dplyr)
df2 <- df1 %>% mutate(snp = sub("_.*", "", snp))
Consider creating a copy of the dataset and running tstrsplit on the copy, so that the original data is not changed:
library(data.table)
df2 <- copy(df1)
setDT(df2)[,c("snp", "allele") := tstrsplit(snp, "_", fixed = TRUE)]
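For comparison, here is a minimal base R sketch of the same split. It assumes every `snp` value contains exactly one underscore and that the column is character (hence `stringsAsFactors = FALSE`). Because it assigns into a new object, df1 is left untouched.

```r
snp <- c("rs7513574_T", "rs1627238_A", "rs1171278_C")
p.value <- c(2.635489e-01, 9.836280e-01, 6.315047e-01)
df1 <- data.frame(snp, p.value, stringsAsFactors = FALSE)

# Work on a copy; plain assignment in base R does not modify df1
df2 <- df1
df2$allele <- sub("^.*_", "", df1$snp)  # everything after the underscore
df2$snp    <- sub("_.*$", "", df1$snp)  # everything before the underscore
df2
```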
I'm coding in R. I have a big data frame (df1) and a little data frame (df2). df2 is a subset of df1, but in a random order. I need to know the row indices of df1 which occur in df2. All of the specific cell values have lots of duplicates: Tapirus terrestris shows up more than once, as does each ModType value. I tried experimenting with which() and grepl() but couldn't get my code to work.
df1 <- data.frame(
SpeciesName = c('Tapirus terrestris', 'Panthera onca', 'Leopardus tigrinus' , 'Leopardus tigrinus'),
ModType = c('ANN', 'GAM', 'GAM','RF'),
Variable_scale = c('aspect_s2_sd', 'CHELSAbio1019_s3_sd','CHELSAbio1015_s4_sd','CHELSAbio1015_s4_sd'))
df2 <- data.frame(
SpeciesName = c('Tapirus terrestris', 'Leopardus tigrinus'),
ModType = c('ANN', 'RF'),
Variable_scale = c('aspect_s2_sd', 'CHELSAbio1015_s4_sd'))
Should output an array: 1,4 because df1 rows 1 and 4 occur in df2.
You can create an index column in df1 and merge the datasets.
df1$index <- 1:nrow(df1)
df3 <- merge(df1, df2)
df3$index
#[1] 4 1
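You can use match. Compare whole rows rather than SpeciesName alone, because individual values repeat:
match(do.call(paste, df2), do.call(paste, df1[names(df2)]))
#[1] 1 4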
Another option is tidyverse
library(dplyr)
df1 %>%
mutate(index = row_number()) %>%
inner_join(df2)
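If you only need the indices in df1 order and want to avoid extra packages, one possible sketch builds a single key string per row and tests membership with `%in%`. The helper name `row_key` and the `"\r"` separator are my own choices, not something from the answers above; the separator is only there to make accidental collisions between pasted values unlikely.

```r
df1 <- data.frame(
  SpeciesName = c('Tapirus terrestris', 'Panthera onca', 'Leopardus tigrinus', 'Leopardus tigrinus'),
  ModType = c('ANN', 'GAM', 'GAM', 'RF'),
  Variable_scale = c('aspect_s2_sd', 'CHELSAbio1019_s3_sd', 'CHELSAbio1015_s4_sd', 'CHELSAbio1015_s4_sd'))
df2 <- data.frame(
  SpeciesName = c('Tapirus terrestris', 'Leopardus tigrinus'),
  ModType = c('ANN', 'RF'),
  Variable_scale = c('aspect_s2_sd', 'CHELSAbio1015_s4_sd'))

# Hypothetical helper: collapse each row into one string
row_key <- function(d) do.call(paste, c(d, sep = "\r"))

# Compare only the columns both data frames share
cols <- intersect(names(df1), names(df2))
idx <- which(row_key(df1[cols]) %in% row_key(df2[cols]))
idx
```

Unlike `match()`, this returns every df1 row that occurs in df2, including repeated rows, and always in ascending order.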
I have two data frames. df1 has stock symbols and values; df2 has correlations with the same names, but arranged as rows. df1 has many more columns than df2, but all columns that are in df2 exist in df1. I need to multiply matching columns and store the newly created values as a new data frame. The new data frame will only have a stock symbol and then all multiplications of df1*df2.
The data looks like this:
df1
A Company Symbol Earn.GR MF Effic MF
TRUE 1.320005832 -0.080712181
df2:
Variable Corr
1 Val MF 0.312140675
2 Earn.GR.withCorr MF 0.992410721
I have tried this code, but not getting the expected result:
Transpose df2:
df2 <- transpose (df2)
rownames(df2) <- colnames(df2)
Match and multiply columns
df3 <- df1[names(df1) %in% names(df2)] <- sapply(names(df1[names(df1) %in% names(df2)]),
function(x) df1[[x]] * df2[[x]])
Thanks in advance.
With base R, you could do something like this
df1 = as.data.frame(matrix(1:14,2,7))
df2 = as.data.frame(matrix(15:28,2,7))
names(df1)= letters[1:7]
names(df2)= c("a","d",letters[9:12],"b")
m = match(names(df1),names(df2))
newdf = setNames(df1[,which(!is.na(m))]*df2[,na.omit(m)],
paste0("mult_",names(df2[,na.omit(m)])))
> newdf
mult_a mult_b mult_d
1 15 81 119
2 32 112 144
Find the common columns using intersect, subset both data frames to those columns, and multiply:
common_cols <- intersect(names(df1), names(df2))
df3 <- df1[common_cols] * df2[common_cols]
df3
# a c
#1 2 144
#2 6 169
#3 12 196
#4 20 225
#5 30 256
data
df1 <- data.frame(a = 1:5, b = 11:15, c = 12:16)
df2 <- data.frame(a = 2:6, d = 11:15, c = 12:16, e = 1:5)
Update
I think you need to merge the two data frames before multiplying:
df3 <- merge(df1[common_cols], df2[common_cols], by = "Company")
cbind(df3[1], df3[-1][c(TRUE, FALSE)] * df3[-1][c(FALSE, TRUE)])
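When the correlations stay in long form (one row per variable, as in the question), another possible sketch is to turn them into a named vector and multiply each matching column by its own value. The column names `Symbol`, `x`, `y`, `z` and all numbers below are made up for illustration; they are not the asker's data.

```r
# Hypothetical wide data: one row per stock
df1 <- data.frame(Symbol = c("AAA", "BBB"),
                  x = c(1, 2), y = c(3, 4), z = c(5, 6))
# Hypothetical long data: one correlation per variable name
df2 <- data.frame(Variable = c("x", "z"), Corr = c(0.5, 2))

# Named lookup vector: names are variable names, values are correlations
corr <- setNames(df2$Corr, df2$Variable)

# Columns of df1 that have a correlation
common <- intersect(names(df1), names(corr))

# Multiply each matching column by its correlation
scaled <- as.data.frame(Map(`*`, df1[common], corr[common]))
df3 <- cbind(df1["Symbol"], scaled)
df3
```

This avoids transposing df2, and it only works if the names in `Variable` match the df1 column names exactly.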
I have 300 locations, let's say "A1, A2, A3,..., A300", and I have 40 values. My locations are in df1 and my values are in df2. I want to add those 40 values to each location, so location A1 would have codes from 1 to 40, and so on.
I tried to make a for loop:
df1 <- c("A1", "A2", "A3")
df1 <- data.frame(df1)
colnames(df1) <- c("location")
df2 <- c(1:40)
df2 <-data.frame(df2)
colnames(df2) <- c("code")
data <- data.frame() #Empty data.frame
for (i in df2) {
temp <- df1
temp$code <- rep(i)
data1 <- rbind(data, temp)
}
This script results in an Error: 'replacement has 40 rows, data has 315'.
Can someone tell me what should I do to make this work?
Desired output:
We can use aggregate
aggregate(Value ~Location, df1, sum)
If the values are in a different dataset and have the same order as in the original dataset 'Location', just do a cbind and aggregate
aggregate(Value ~Location, cbind(df1, df2), sum)
Assuming that there are no common columns in each dataset to merge
Update
Based on the OP's update
expand.grid(location = df1$location, code = df2$code)
Or CJ from data.table
library(data.table)
CJ(location = df1$location, code = df2$code)
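As a side note, the loop in the question fails because `for (i in df2)` iterates over the columns of a data frame, so `i` is the whole vector of 40 codes rather than one code at a time. A base R sketch of the same cross join, using `merge()` with `by = NULL` (which returns the Cartesian product of the two data frames), could look like this; the three locations stand in for the full 300.

```r
df1 <- data.frame(location = c("A1", "A2", "A3"))
df2 <- data.frame(code = 1:40)

# by = NULL asks merge() for every combination of rows
out <- merge(df1, df2, by = NULL)

# Sort so each location is followed by its codes in order
out <- out[order(out$location, out$code), ]
rownames(out) <- NULL

nrow(out)
head(out)
```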
I have a data frame, say df. I have extracted a 5% sample of rows from df and created a new data frame df1 to do a few manipulations on the data. Now I need to append df1 to df and overwrite the existing rows of df1, as it is a subset of df.
I tried to extract the rows of df that are not present in df1 using
df2 <- subset(df, !(rownames(df) %in% rownames(df1[])))
But this didn't work.
Can anyone help, please?
Save the filter and re-use it like so
set.seed(357)
xy <- data.frame(col1 = letters[1:5], col2 = runif(5))
col1 col2
1 a 0.10728121
2 b 0.05504568
3 c 0.27987766
4 d 0.22486212
5 e 0.65348521
your.condition <- xy$col1 %in% c("c", "d")
newxy1 <- xy[your.condition, ]
newxy1$col2 <- 1:2
xy[your.condition, "col2"] <- newxy1$col2
xy
col1 col2
1 a 0.10728121
2 b 0.05504568
3 c 1.00000000
4 d 2.00000000
5 e 0.65348521
You should always try to make a reproducible example so that it is easy for others to help you.
I have tried to do that with the help of the mtcars dataset:
#Copied mtcars data into df
df = mtcars
# sample 5 rows from df
df1 = df[sample(1:nrow(df), 5), ]
# did few manipulations in the dataset
df1 = df1 * 2
# overwrite the existing rows of df1 as it is a subset of df
df[rownames(df1), ] <- df1
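The row-name approach relies on df1 keeping the row names it inherited from df. If they may have been reset or the rows reordered, a sketch that writes back through a key column with `match()` is one alternative. The `id` column and the values here are assumptions for illustration.

```r
# Toy data with an explicit key column (hypothetical)
df <- data.frame(id = 1:6, x = c(10, 20, 30, 40, 50, 60))

# Take a subset, modify it, then shuffle it and reset its row names
df1 <- df[c(2, 5), ]
df1$x <- df1$x * 2
df1 <- df1[c(2, 1), ]
rownames(df1) <- NULL

# Find where each df1 row belongs in df and overwrite those rows
pos <- match(df1$id, df$id)
df[pos, ] <- df1
df
```

This assumes `id` is unique in df and that every `id` in df1 exists in df; otherwise `match()` returns NA and the assignment fails.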