Differences between two data frames in R [closed] - r

I have two data frames, each with 9 columns, and DF2 is a subset of DF1. I'm trying to create a third data frame that contains only the contents of DF1 that are NOT present in DF2.
What is the most efficient way of doing this? I can write a while loop, but I was wondering if there is another way (besides sqldf as for some reason I cannot upload it into my R Studio) that I can do this?

The following can work (directly from Identify records in data frame A not contained in data frame B)
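fun.12 <- function(x.1, x.2, ...) {
  # Collapse each row into a single string key
  x.1p <- do.call("paste", x.1)
  x.2p <- do.call("paste", x.2)
  # Keep the rows of x.1 whose key does not occur in x.2
  x.1[!x.1p %in% x.2p, ]
}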
DF1 <- data.frame(a=c(1,2,3,4,5), b=c(1,2,3,4,5))
DF2 <- data.frame(a=c(1,1,2,3,4), b=c(1,1,99,3,4))
fun.12(DF1, DF2)
#   a b
# 2 2 2
# 5 5 5
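One caveat with pasting rows into keys: with the default space separator, two different rows can collapse to the same string when the values themselves contain spaces. A minimal sketch of a variant with an explicit separator follows; `rows_not_in` and the `"\r"` separator are hypothetical choices for illustration, not part of the linked answer, and it assumes the separator never appears in the data.

```r
# Hypothetical variant of fun.12: build row keys with a separator that is
# assumed not to occur in the data, so distinct rows cannot collide.
rows_not_in <- function(x, y, sep = "\r") {
  key_x <- do.call(paste, c(unname(as.list(x)), sep = sep))
  key_y <- do.call(paste, c(unname(as.list(y)), sep = sep))
  x[!key_x %in% key_y, , drop = FALSE]
}

A <- data.frame(a = c("x y", "x"), b = c("z", "y z"), stringsAsFactors = FALSE)
B <- data.frame(a = "x", b = "y z", stringsAsFactors = FALSE)

# With the default separator both rows of A collapse to "x y z",
# so both would be treated as present in B.
default_hits <- do.call(paste, A) %in% do.call(paste, B)

# With the explicit separator only the second row of A matches B.
res <- rows_not_in(A, B)
res
```

If dplyr is available, `anti_join(DF1, DF2)` is another commonly used way to get the same kind of result without building string keys.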


How do you group data with the same name in a vector? [closed]

I have a data frame with three columns: region, year, grdp.
How do I group the rows that have the same name in the 'region' column?
Here's the code to create a sample dataset:
Here's the desired result:
Store the rows that share a value in the 'region' column together.
For example, the 'region' column contains "서울특별시" three times. I want to group those three "서울특별시" rows, with all three columns, and assign the result to a variable.
I'm not completely understanding the question, but I think one of these two might solve what you're looking for?
library(dplyr)
df <- data.frame(region = sample(c('x', 'y', 'z'), 100, replace = TRUE),
                 year = sample(c(2017, 2018, 2019), 100, replace = TRUE),
                 GRDP = sample(200000000:400000000, 100))
regions <- sort(unique(df$region))
# OPTION 1: one data frame per region, assigned to the variables a, b, c
for (i in seq_along(regions)) {
  assign(letters[i], df %>% filter(region == regions[i]))
}
a
b
c
# OPTION 2: add a column holding the group letter of each row
ltrs <- letters[seq_along(regions)]
df['ex)'] <- sapply(df$region, FUN = function(x) ltrs[which(regions == x)])
head(df)
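If the goal is simply one data frame per region, base R's split() may be a tidier alternative to creating variables with assign(). The following is only a sketch on a small made-up data set (the values are invented for illustration); it returns a named list with one element per region.

```r
# Small made-up data set with the same three columns as in the question
df <- data.frame(region = c("x", "y", "x", "z", "y", "x"),
                 year   = c(2017, 2017, 2018, 2018, 2019, 2019),
                 GRDP   = c(210, 220, 230, 240, 250, 260),
                 stringsAsFactors = FALSE)

# split() returns a named list: one data frame per distinct region
by_region <- split(df, df$region)

names(by_region)   # the distinct regions, in sorted order
by_region[["x"]]   # all rows (and all three columns) for region "x"
```

Keeping the groups in a list means the number of regions does not need to be known in advance, and the groups can be processed together with lapply().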

Scale only certain columns R [closed]

How can I apply scale(x) to only certain columns of a data frame? I have a data frame with 7 columns and I want to scale only columns 3 and 6. The rest should stay as they are.
We can do this with lapply. Subset the columns of interest, loop through them with lapply, and assign the output back to the same subset of the data. Here we use c because the output of scale is a matrix with a single column. Using c or as.vector converts it to a vector.
df[c(3, 6)] <- lapply(df[c(3, 6)], function(x) c(scale(x)))
Another option is mutate_at from dplyr (funs() has been deprecated since dplyr 0.8.0, so a formula is used here instead)
library(dplyr)
df %>%
  mutate_at(c(3, 6), ~ c(scale(.)))
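To check that the lapply approach does what is asked, here is a small sketch on simulated data (the 7-column data frame is made up for illustration). It confirms that columns 3 and 6 end up with mean 0 and standard deviation 1 while the other columns are untouched.

```r
set.seed(42)
# Made-up data frame with 7 numeric columns
df <- as.data.frame(matrix(rnorm(70, mean = 10, sd = 3), ncol = 7))
original <- df

cols <- c(3, 6)
# c() drops the matrix attributes that scale() returns
df[cols] <- lapply(df[cols], function(x) c(scale(x)))

scaled_means <- colMeans(df[cols])     # approximately 0
scaled_sds   <- sapply(df[cols], sd)   # approximately 1
others_same  <- identical(df[-cols], original[-cols])
```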

How to check for a particular data of one data frame in some other data frame using R [closed]

I have two data frames, df1 and df2. df1 has two columns named poi and score. df2 has only one column, poi_, which contains some of the values found in df1$poi. I need to check which df2$poi_ values have a score defined in df1 and, where a score is present, add a new column called score_ to df2 filled with the score found in df1.
try this:
res <- merge(df2,df1,by.x="poi_",by.y="poi")
names(res)[2] <- "score_"
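Note that merge() performs an inner join by default, so rows of df2 without a match in df1 are dropped. If those rows should be kept with a missing score, something along these lines might help; the toy data below is made up for illustration.

```r
# Made-up example data
df1 <- data.frame(poi = c("a", "b", "c"), score = c(10, 20, 30),
                  stringsAsFactors = FALSE)
df2 <- data.frame(poi_ = c("b", "c", "d"), stringsAsFactors = FALSE)

# all.x = TRUE keeps every row of df2; unmatched rows get NA as score
res <- merge(df2, df1, by.x = "poi_", by.y = "poi", all.x = TRUE)
names(res)[names(res) == "score"] <- "score_"
res

# Alternative that adds the column in place and preserves df2's row order
df2$score_ <- df1$score[match(df2$poi_, df1$poi)]
df2
```

The match() version avoids the re-sorting that merge() applies to the key column by default.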

WeiRd: R does not find value but it's just there [closed]

Trying to merge two data frames, using a variable called hash_id. For some reason R does not recognize the hash-id's in one of the data frames, while it does so in the other.
I have checked and I just don't get it. See below how I checked:
> head(df1[46],1) # so I take the first 'hash-id' from df1
# hash_id
# 1 abab123123
> which(df2 == "abab123123", arr.ind=TRUE) # here it shows that row 6847 contains a match
# row col
# [1,] 6847 32
> which(df1 == "abab123123", arr.ind=TRUE) # and here there is NO matching value!
# row col
#
One possibility is trailing or leading spaces in the concerned columns for one of the datasets. You could do:
library(stringr)
df1[, "hash_id"] <- str_trim(df1[,"hash_id"])
df2[, "hash_id"] <- str_trim(df2[, "hash_id"])
which(df1[, "hash_id"]=="abab123123", arr.ind=TRUE)
which(df2[, "hash_id"]=="abab123123", arr.ind=TRUE)
Another way would be to use grepl:
grepl("\\babab123123\\b", df1[,"hash_id"])
grepl("\\babab123123\\b", df2[,"hash_id"])
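To illustrate why stray whitespace produces this symptom, here is a small sketch with made-up IDs. It uses base R's trimws() (available since R 3.2.0) in place of str_trim, so no extra package is needed.

```r
# Made-up IDs: the first has a trailing space, the second a leading space
ids <- c("abab123123 ", " xyz", "abab123123")

raw_match <- ids == "abab123123"          # the padded value does not match
lengths   <- nchar(ids)                   # the extra character shows up here
trimmed_match <- trimws(ids) == "abab123123"

raw_match
lengths
trimmed_match
```

Comparing nchar() of the suspect values between the two data frames is a quick way to confirm whether padding is the cause before changing any data.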

Sliding window using R [closed]

I have a data frame with daily data in R (148 columns by 6230 rows). I want to compute correlation coefficients over sliding windows of length 600 (days), displaced by 5 (days) each time, which should generate approximately 1220 correlation matrices. All the examples I have seen use only a single data vector. Is there an easy way to find these correlation matrices using a sliding window? I'd appreciate any suggestions.
If M is the input matrix then each row of out is one correlation matrix strung out column by column:
library(zoo)
out <- rollapply(M, 600, by = 5, function(x) c(cor(x)), by.column = FALSE)
They could be reshaped into a list of correlation matrices, if need be:
L <- lapply(1:nrow(out), function(i) matrix(out[i, ], ncol(M)))
or as an array:
simplify2array(L)
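For comparison, the same idea can be sketched in base R without zoo by computing the window start positions explicitly. This is only an illustration on a small random matrix; the sizes (50 rows, 4 columns, width 20, step 5) are made up to keep it quick, and only complete windows are used.

```r
set.seed(1)
M <- matrix(rnorm(50 * 4), ncol = 4)   # made-up data: 50 rows, 4 columns

width <- 20   # window length in rows
step  <- 5    # displacement between consecutive windows

# First row of every complete window
starts <- seq(1, nrow(M) - width + 1, by = step)

# One correlation matrix per window
L <- lapply(starts, function(s) cor(M[s:(s + width - 1), , drop = FALSE]))

length(L)     # number of windows
dim(L[[1]])   # each element is an ncol(M) x ncol(M) matrix
```

With 6230 rows, a width of 600 and a step of 5, the same seq() call gives 1127 start positions when only complete windows are kept.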
