Subset a data frame by specific string entries in R


I have my data in a data frame (data). I want to subset it using the specific string "Spatially clustered", so that the result is a data frame containing all the columns that have entries equal to "Spatially clustered". How can I do that? I have tried this:
moran_deviation_data_multiple_correction_1january_raw_pval_conclusion = data
moran_deviation_data_multiple_correction_1january_raw_pval_conclusion_spatially_clustered = select(moran_deviation_data_multiple_correction_1january_raw_pval_conclusion, matches("clustered"))
moran_deviation_data_multiple_correction_1january_raw_pval_conclusion_spatially_clustered
I also tried this:
moran_deviation_data_multiple_correction_1january_raw_pval_conclusion_spatially_clustered = moran_deviation_data_multiple_correction_1january_raw_pval_conclusion[apply(moran_deviation_data_multiple_correction_1january_raw_pval_conclusion,1, function(x) any(grepl("dispersed", x))), ]
moran_deviation_data_multiple_correction_1january_raw_pval_conclusion_spatially_clustered
However, the result is not what I expected.

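Perhaps this helps:
library(dplyr)
library(stringr)
# keep only the columns in which at least one entry contains "Spatially clustered"
# (where() requires dplyr >= 1.0.0; na.rm = TRUE ignores missing cells)
df2 <- df1 %>%
  select(where(~ any(str_detect(.x, "Spatially clustered"), na.rm = TRUE)))
Output: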
> dim(df2)
[1] 5 17989
> dim(df1)
[1] 5 23474
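Note that matches() in the first attempt tests column names, not cell values, which is why it did not return the expected columns. For comparison, here is a minimal base R sketch of the same column selection, plus the row-wise variant that the second attempt was aiming for (that attempt searched for "dispersed" rather than "Spatially clustered"). The toy data frame and its column names a, b and c are hypothetical; fixed = TRUE treats the target as a literal string rather than a regular expression.

```r
# Hypothetical toy data standing in for the real data frame
df1 <- data.frame(
  a = c("Spatially clustered", "Random", "Random"),
  b = c("Random", "Spatially dispersed", "Random"),
  c = c("Random", "Random", "Spatially clustered"),
  stringsAsFactors = FALSE
)
target <- "Spatially clustered"

# Columns: keep every column in which at least one cell contains the target
has_target <- vapply(df1, function(col) any(grepl(target, col, fixed = TRUE)), logical(1))
cols_kept <- df1[, has_target, drop = FALSE]

# Rows: keep every row in which at least one cell contains the target
row_has_target <- apply(df1, 1, function(r) any(grepl(target, r, fixed = TRUE)))
rows_kept <- df1[row_has_target, , drop = FALSE]

cols_kept
rows_kept
```

Here cols_kept keeps columns a and c, and rows_kept keeps rows 1 and 3.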

Related

How to sum values in a DF column whose cells each contain two values

I am unable to add column values in my data frame in R. After reading the CSV, I need to add up the two values contained in each cell. How do I go about doing that?

Here are two solutions with toy data:
df <- data.frame(
Variable = c("1-3", "40-45")
)
With sapply:
library(stringr)
sapply(lapply(str_extract_all(df$Variable, "\\d+"), as.numeric), sum)
[1] 4 85
With map_dbl:
library(purrr)
library(stringr)
library(dplyr)
df %>%
  mutate(
    # extract whatever digits are there:
    add = str_extract_all(Variable, "\\d+"),
    # convert the digits to numbers and add the first to the second:
    add = map_dbl(add, function(x) as.numeric(x)[1] + as.numeric(x)[2]))
Variable add
1 1-3 4
2 40-45 85
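A third option, sketched in base R under the assumption that each cell holds numbers separated by a single hyphen as in the toy data: split each cell on "-" with strsplit and sum the pieces. Unlike the map_dbl version, this sums however many pieces a cell contains.

```r
df <- data.frame(
  Variable = c("1-3", "40-45"),
  stringsAsFactors = FALSE
)
# split "1-3" into c("1", "3"), convert to numbers and sum them
totals <- vapply(strsplit(df$Variable, "-", fixed = TRUE),
                 function(p) sum(as.numeric(p)),
                 numeric(1))
totals
```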

R Subsetting text from a comma separated column in a data-frame

I have a data.frame with a column that looks like this:
diagnosis
F.31.2,A.43.2,R.45.2,F.43.1
I want to split this column into two columns, one containing all the values with F and one containing all the other values, resulting in a data frame that looks like this:
F other
F.31.2,F.43.1 A.43.2,R.45.2
Thanks in advance

Try the following tidyverse approach. Separate the rows at each comma, create a group according to the pattern, and then reshape to wide format to obtain the expected result:
library(dplyr)
library(tidyr)
#Data
df <- data.frame(diagnosis = 'F.31.2,A.43.2,R.45.2,F.43.1', stringsAsFactors = FALSE)
#Code
new <- df %>%
  separate_rows(diagnosis, sep = ',') %>%                          # one code per row
  mutate(Group = ifelse(grepl('^F', diagnosis), 'F', 'Other')) %>% # codes starting with F vs the rest
  pivot_wider(values_fn = toString, names_from = Group, values_from = diagnosis) # collapse each group into one string
Output:
# A tibble: 1 x 2
F Other
<chr> <chr>
1 F.31.2, F.43.1 A.43.2, R.45.2

First, use strsplit to split at the commas. Then use grep to find the indexes of the values containing F, select or deselect them by multiplying the indexes by 1 or -1, and paste them together.
tmp <- el(strsplit(d$diagnosis, ","))
res <- lapply(c(1, -1), function(x) paste(tmp[grep("F", tmp)*x], collapse=","))
res <- setNames(as.data.frame(res), c("F", "other"))
res
# F other
# 1 F.31.2,F.43.1 A.43.2,R.45.2
Data:
d <- setNames(read.table(text="F.31.2,A.43.2,R.45.2,F.43.1", stringsAsFactors = FALSE), "diagnosis")
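Both answers handle a single row. A base R sketch for a data frame with several rows, assuming (as in the example) that the F codes are exactly the ones that start with "F": split every cell, then build each output column with startsWith. The second toy row is hypothetical.

```r
d <- data.frame(
  diagnosis = c("F.31.2,A.43.2,R.45.2,F.43.1", "A.10.1,F.20.0"),
  stringsAsFactors = FALSE
)
parts <- strsplit(d$diagnosis, ",", fixed = TRUE)  # one character vector per row

# codes starting with "F" in one column, everything else in the other
d$F <- vapply(parts, function(p) paste(p[startsWith(p, "F")], collapse = ","), character(1))
d$other <- vapply(parts, function(p) paste(p[!startsWith(p, "F")], collapse = ","), character(1))
d
```

A row without any F code gets an empty string in the F column instead of an error.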

filter dates not contained in separate data frame

I have three data frames:
df1= 2 columns, dates and amounts
df2 = 2 columns, dates and amounts
df3 = 1 column, list of bank holidays
I have combined df1 and df2:
FULLDF <- left_join(df1, df2, by=c("date"))
Now I am trying to filter FULLDF to exclude the dates in df3. I have tried subsetting and filtering, but neither gives me the result I need:
NOBHDF <- subset.data.frame(FULLDF != BH)
NOBHDF <- filter(FULLDF[, 1] != BH )
Could someone provide some guidance on this?
Thanks

This code should do the job (the tidyverse way):
library(dplyr)
df <- df1 %>%
  left_join(df2, by = "date") %>%
  anti_join(df3, by = "date") # drop rows whose date appears in df3 (assumes df3's column is also named "date")
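The attempts in the question compare with !=, which checks element by element rather than membership. A base R sketch of the same filter uses %in%; the toy dates and the column names amount_1 and amount_2 are hypothetical, and the date columns must have the same class (here Date) in all three data frames.

```r
df1 <- data.frame(date = as.Date(c("2020-12-24", "2020-12-25", "2020-12-28")), amount_1 = c(10, 20, 30))
df2 <- data.frame(date = as.Date(c("2020-12-24", "2020-12-25", "2020-12-28")), amount_2 = c(1, 2, 3))
df3 <- data.frame(date = as.Date(c("2020-12-25", "2020-12-28")))  # bank holidays

FULLDF <- merge(df1, df2, by = "date", all.x = TRUE)  # base R left join
# keep only the rows whose date is NOT among the bank holidays
NOBHDF <- FULLDF[!(FULLDF$date %in% df3$date), , drop = FALSE]
NOBHDF
```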

Frequency table of values common to 5 tables

I have 5 data frames and I only have to analyze the first column of each. From these, I must obtain a frequency table of their common words (not necessarily common to all data frames; for example, a word may appear in just two or more of them).
Then I must obtain a frequency table of the words common to ALL data frames.
I tried a for loop, but it seems very complicated. Moreover, the data frames have different dimensions, and I didn't find any useful function.
Then I tried:
lst1 <- list(a,b,c,d,e)
newdat <- stack(setNames(lapply(lst1, "[", 1), seq_along(lst1)))[2:1]
library(dplyr)
newdat %>% group_by(val) %>% filter(uniqueN(ind) > 1) %>% count(val)
but it gives me an error:
> stack(setNames(lapply(lst1, "[", 1), seq_along(lst1)))
Error in stack.default(setNames(lapply(lst1, "[", 1), seq_along(lst1))):
at least one vector element is required
Thank you

Here's my solution using purrr & dplyr:
library(purrr)
library(dplyr)
lst1 <- list(mtcars=mtcars, iris=iris, chick=chickwts, cars=cars, airqual=airquality)
common <- lst1 %>%
  map_dfr(select, value = 1, .id = "df") %>% # select the first column of every data frame and name it "value"
  group_by(value) %>%
  summarise(freq = n(),                      # frequency over all data frames
            n_df = n_distinct(df),           # number of data frames in which this value occurs
            dfs = paste(unique(df), collapse = ",")) %>%
  filter(n_df > 1)                           # values shared by at least two data frames
# values that occur in all 5 data frames
common %>% filter(n_df == 5)
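The stack() error in the question comes from lapply(lst1, "[", 1): single brackets return one-column data frames, and stack() only accepts vectors. Extracting the column with [[ fixes that (and uniqueN comes from data.table; the dplyr equivalent is n_distinct). A base R sketch along the lines of the original attempt, with hypothetical toy data frames a to e:

```r
# Hypothetical toy data: five data frames of different sizes, words in column 1
a <- data.frame(word = c("cat", "dog", "cat"))
b <- data.frame(word = c("dog", "fish"), n = 1:2)
c <- data.frame(word = c("cat", "dog"))
d <- data.frame(word = c("dog", "bird"))
e <- data.frame(word = c("dog", "cat"))
lst1 <- list(a = a, b = b, c = c, d = d, e = e)

# [[ extracts column 1 as a vector, which stack() accepts
words <- lapply(lst1, function(df) as.character(df[[1]]))
long <- stack(words)        # columns: values (the word), ind (source data frame)
freq <- table(long$values)  # total occurrences of each word
# number of distinct data frames containing each word
n_df <- tapply(long$ind, long$values, function(i) length(unique(i)))

common_some <- freq[names(n_df)[n_df > 1]]             # in at least two data frames
common_all <- freq[names(n_df)[n_df == length(lst1)]]  # in all data frames
common_some
common_all
```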

R: subset a data frame conditional on the column sums of the subset

I have a data frame manipulation question.
I would like to find the subset of rows of data frame "data1" whose column sums equal the values in another data frame, "data2".
Here is my code:
AA<-c(2,3,1,4,9)
BB<-c(5,13,9,1,2)
A1<-c(5)
B1<-c(18)
data1<-data.frame(AA,BB)
data2<-data.frame(A1,B1)
library(dplyr)
subset(data1, ((sum(AA) ==data2$A1 ) && (sum(BB) ==data2$B1 ) ) )
Is there an algorithm that would help?
Thanks!

This solution only considers the scenario in which the sum is taken over exactly two rows. To test subsets with a different number of rows, change the second argument of the combn function. final_data is the final output; if there are multiple matches, you may want to keep final_data as a list.
# Prepare example datasets
AA<-c(2,3,1,4,9)
BB<-c(5,13,9,1,2)
A1<-c(5)
B1<-c(18)
data1<-data.frame(AA,BB)
data2<-data.frame(A1,B1)
# Load packages
library(tidyverse)
# Use combn to find all combinations of two row numbers
row_indices <- as.data.frame(t(combn(1:nrow(data1), 2)))
# Prepare a list of data frames; each data frame is one row of row_indices
row_list <- row_indices %>%
  rowid_to_column() %>%
  split(f = .$rowid)
# Subset data1 based on row_list
sub_list <- map(row_list, function(dt){
  temp_data <- data1 %>% filter(row_number() %in% c(dt$V1, dt$V2))
  return(temp_data)
})
# Calculate the sum of each data frame in sub_list
sub_list2 <- map(sub_list, function(dt){
  dt2 <- dt %>%
    summarise(across(everything(), sum)) %>%
    setNames(c("A1", "B1"))
  return(dt2)
})
# Compare each data frame in sub_list2 with data2 column by column
# and store the logical results in result_indices
result_indices <- map_lgl(sub_list2, function(dt) all(unlist(dt) == unlist(data2)))
# Get the final output (the first match; keep sub_list[result_indices] if there can be several)
final_data <- sub_list[result_indices][[1]]
final_data
AA BB
1 2 5
2 3 13
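The combn approach generalises to every subset size. A minimal base R sketch, assuming data1 and data2 as in the question: it loops over k = 1 to nrow(data1), checks each combination of k rows with colSums, and collects every match in a list (the name matches is hypothetical). It is brute force over 2^n - 1 subsets, so it is only practical for small data frames, and with non-integer values you would compare with a tolerance instead of ==.

```r
AA <- c(2, 3, 1, 4, 9)
BB <- c(5, 13, 9, 1, 2)
data1 <- data.frame(AA, BB)
data2 <- data.frame(A1 = 5, B1 = 18)
target <- unlist(data2)  # c(A1 = 5, B1 = 18)

matches <- list()
for (k in seq_len(nrow(data1))) {
  # every combination of k row numbers
  for (rows in combn(nrow(data1), k, simplify = FALSE)) {
    sub <- data1[rows, , drop = FALSE]
    # keep the subset if each column sum equals the matching value in data2
    if (all(colSums(sub) == target)) {
      matches[[length(matches) + 1]] <- sub
    }
  }
}
matches
```

For the example data the only match is rows 1 and 2, the same result as above.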
