I have data in data frame format here (data). I want to subset the data using the specific string "Spatially clustered", so that the result is a data frame containing only the columns that have entries equal to "Spatially clustered". How can I do that? I have tried this:
moran_deviation_data_multiple_correction_1january_raw_pval_conclusion = data
moran_deviation_data_multiple_correction_1january_raw_pval_conclusion_spatially_clustered = select(moran_deviation_data_multiple_correction_1january_raw_pval_conclusion, matches("clustered"))
moran_deviation_data_multiple_correction_1january_raw_pval_conclusion_spatially_clustered
I also tried this one:
moran_deviation_data_multiple_correction_1january_raw_pval_conclusion_spatially_clustered = moran_deviation_data_multiple_correction_1january_raw_pval_conclusion[apply(moran_deviation_data_multiple_correction_1january_raw_pval_conclusion,1, function(x) any(grepl("dispersed", x))), ]
moran_deviation_data_multiple_correction_1january_raw_pval_conclusion_spatially_clustered
However, the result is not what I expected.
Perhaps this helps
library(dplyr)
library(stringr)
df2 <- df1 %>%
select(where(~ any(str_detect(.x, "Spatially clustered"))))
-output
> dim(df2)
[1] 5 17989
> dim(df1)
[1] 5 23474
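For context, select() combined with matches() tests column names rather than cell values, which is why the first attempt in the question does not pick up the entries. Below is a minimal sketch on made-up data; the toy data frame and its column names are assumptions, since the original data is not shown. It also passes na.rm = TRUE so that a column containing NA still yields a plain TRUE or FALSE.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data: each column holds one conclusion per row
toy <- data.frame(
  gene_a = c("Spatially clustered", "Random"),
  gene_b = c("Random", "Dispersed"),
  gene_c = c(NA, "Spatially clustered"),
  stringsAsFactors = FALSE
)

# Keep only the columns in which at least one cell contains the string
kept <- toy %>%
  select(where(~ any(str_detect(.x, fixed("Spatially clustered")), na.rm = TRUE)))

# Base R equivalent, for comparison
keep_cols <- vapply(
  toy,
  function(x) any(grepl("Spatially clustered", x, fixed = TRUE)),
  logical(1)
)
kept_base <- toy[, keep_cols, drop = FALSE]

names(kept)
```

Here gene_b is dropped because none of its cells contain the string.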
I am unable to add up column values in my data frame in R. After reading the CSV, I need to sum the two values contained in each individual cell. How do I go about doing that?
Here are two solutions with toy data:
df <- data.frame(
Variable = c("1-3", "40-45")
)
with sapply:
library(stringr)
sapply(lapply(str_extract_all(df$Variable, "\\d+"), as.numeric), sum)
[1] 4 85
with map_dbl:
library(purrr)
library(stringr)
library(dplyr)
df %>%
mutate(
# extract whatever digits are there:
add = str_extract_all(Variable, "\\d+"),
# map the digits to one another and add the first to the second:
add = map_dbl(add, function(x) as.numeric(x)[1] + as.numeric(x)[2]))
Variable add
1 1-3 4
2 40-45 85
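A base R variant, as a sketch only: it assumes every cell holds numbers separated by a single hyphen, as in the toy data above, and it would need adjusting for other separators or negative numbers.

```r
# Same toy data as above
df <- data.frame(Variable = c("1-3", "40-45"), stringsAsFactors = FALSE)

# Split each cell at the hyphen, convert the pieces to numbers, and sum them
parts <- strsplit(df$Variable, "-", fixed = TRUE)
df$add <- vapply(parts, function(x) sum(as.numeric(x)), numeric(1))

df$add
```

Using vapply with numeric(1) makes the result a plain numeric vector and fails early if a cell does not reduce to a single number.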
I have a data.frame with a column that looks like that:
diagnosis
F.31.2,A.43.2,R.45.2,F.43.1
I want to split this column into two columns, one containing all the values that start with F and one containing all the other values, resulting in a df with two columns that looks like this:
F other
F.31.2,F.43.1 A.43.2,R.45.2
Thanks in advance
Try the following tidyverse approach. You can separate the rows at each comma and then create a group according to the pattern, so that you can reshape to wide format and obtain the expected result:
library(dplyr)
library(tidyr)
#Data
df <- data.frame(diagnosis='F.31.2,A.43.2,R.45.2,F.43.1',stringsAsFactors = F)
#Code
new <- df %>% separate_rows(diagnosis,sep = ',') %>%
mutate(Group=ifelse(grepl('F',diagnosis),'F','Other')) %>%
pivot_wider(values_fn = toString,names_from=Group,values_from=diagnosis)
Output:
# A tibble: 1 x 2
F Other
<chr> <chr>
1 F.31.2, F.43.1 A.43.2, R.45.2
First, split the string at the commas with strsplit. Then use grep to find the indexes of the elements containing F, select or deselect them by multiplying the indexes by 1 or -1, and paste each group back together.
tmp <- el(strsplit(d$diagnosis, ","))
res <- lapply(c(1, -1), function(x) paste(tmp[grep("F", tmp)*x], collapse=","))
res <- setNames(as.data.frame(res), c("F", "other"))
res
# F other
# 1 F.31.2,F.43.1 A.43.2,R.45.2
Data:
d <- setNames(read.table(text="F.31.2,A.43.2,R.45.2,F.43.1"), "diagnosis")
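One caveat with negative indexing: if a row contains no F code, grep returns an empty index, and multiplying it by -1 still gives an empty index, so the "other" group would come back empty as well. The sketch below uses a logical index instead and applies the split to several rows. The helper name split_f and the extra rows are made up for illustration.

```r
# Hypothetical data with several rows, including one without any F code
d <- data.frame(
  diagnosis = c("F.31.2,A.43.2,R.45.2,F.43.1", "A.10.1,F.20.0", "R.11.0"),
  stringsAsFactors = FALSE
)

# Split one diagnosis string into its F codes and all other codes
split_f <- function(x) {
  parts <- strsplit(x, ",", fixed = TRUE)[[1]]
  is_f <- startsWith(parts, "F")  # logical index, safe when nothing matches
  data.frame(
    F = paste(parts[is_f], collapse = ","),
    other = paste(parts[!is_f], collapse = ","),
    stringsAsFactors = FALSE
  )
}

# Apply the helper to every row and stack the results
res <- do.call(rbind, lapply(d$diagnosis, split_f))
res
```

startsWith only treats a code as F when the letter is in first position, which is slightly stricter than grep("F", ...).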
I have three data frames,
df1= 2 columns, dates and amounts
df2 = 2 columns, dates and amounts
df3 = 1 column, list of bank holidays
I have combined DF1+2,
FULLDF <- left_join(df1, df2, by=c("date"))
Now I am trying to filter FULLDF to exclude the dates in df3. I have tried subsetting and filtering, but neither gives me the results I need.
NOBHDF <- subset.data.frame(FULLDF != BH)
NOBHDF <- filter(FULLDF[, 1] != BH )
Is this something someone could provide some guidance on?
Thanks
This code should do the job (the tidyverse way), assuming the date column in df3 is also named date:
library(dplyr)
df <- df1 %>%
left_join(df2, by = "date") %>%
anti_join(df3, by = "date")
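Here is a small sketch on invented data to show the effect. All values and the column name holiday are assumptions; when the holiday column has a different name from the date column, the by argument can map one to the other.

```r
library(dplyr)

# Hypothetical toy data
df1 <- data.frame(
  date = as.Date(c("2021-01-01", "2021-01-04", "2021-01-05")),
  amount1 = c(10, 20, 30)
)
df2 <- data.frame(
  date = as.Date(c("2021-01-01", "2021-01-04")),
  amount2 = c(1, 2)
)
df3 <- data.frame(holiday = as.Date("2021-01-01"))

# Join the two amount tables, then drop every row whose date is a holiday
no_bh <- df1 %>%
  left_join(df2, by = "date") %>%
  anti_join(df3, by = c("date" = "holiday"))

# Base R equivalent of the anti join, for comparison
full <- merge(df1, df2, by = "date", all.x = TRUE)
no_bh_base <- full[!(full$date %in% df3$holiday), ]

no_bh
```

Both versions keep the rows for 2021-01-04 and 2021-01-05 and drop the holiday.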
I have 5 data frames and I have to analyze just the first column of each. From these, I must obtain a frequency table of their common words (not necessarily common to all data frames; for example, a word may appear in just two or more of them).
Then I must obtain a frequency table of the words common to ALL data frames.
I tried a for loop, but it seems very complicated. Moreover, the data frames have different dimensions. I didn't find any useful function.
Then I tried doing
lst1 <- list(a,b,c,d,e)
newdat <- stack(setNames(lapply(lst1, "[", 1), seq_along(lst1)))[2:1]
library(dplyr)
newdat %>% group_by(val) %>% filter(uniqueN(ind) > 1) %>% count(val)
but it gives me an error
> stack(setNames(lapply(lst1, "[", 1), seq_along(lst1)))
Error in stack.default(setNames(lapply(lst1, "[", 1), seq_along(lst1))):
at least one vector element is required
Thank you
Here's my solution using purrr & dplyr:
library(purrr)
library(dplyr)
lst1 <- list(mtcars=mtcars, iris=iris, chick=chickwts, cars=cars, airqual=airquality)
lst1 %>%
map_dfr(select, value=1, .id="df") %>% # select first column of every dataframe and name it "value"
group_by(value) %>%
summarise(freq=n(), # frequency over all dataframes
n_df=n_distinct(df), # number of data frames in which this value occurs
dfs = paste(unique(df), collapse=",")) %>%
filter(n_df > 1) %>% # keep values that occur in at least two data frames
filter(n_df == 5) # additionally require all 5 data frames; drop this line for the first table
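Since the built-in datasets above hold numbers rather than words, here is a sketch of the same idea on made-up word lists. The list word_dfs and its contents are assumptions. It builds the counts once and then derives the two requested tables separately instead of chaining both filters.

```r
library(purrr)
library(dplyr)

# Hypothetical word lists of different lengths
word_dfs <- list(
  a = data.frame(word = c("apple", "pear", "fig"), stringsAsFactors = FALSE),
  b = data.frame(word = c("apple", "fig", "kiwi", "apple"), stringsAsFactors = FALSE),
  c = data.frame(word = c("apple", "plum"), stringsAsFactors = FALSE)
)

# Stack the first column of every data frame, remembering where each word came from
counts <- word_dfs %>%
  map_dfr(
    ~ data.frame(value = as.character(.x[[1]]), stringsAsFactors = FALSE),
    .id = "df"
  ) %>%
  group_by(value) %>%
  summarise(freq = n(), n_df = n_distinct(df), .groups = "drop")

# Words shared by at least two data frames
shared_some <- filter(counts, n_df > 1)

# Words present in every data frame
shared_all <- filter(counts, n_df == length(word_dfs))

shared_some
shared_all
```

Comparing n_df with length(word_dfs) avoids hard-coding the number of data frames.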
I have a data frame manipulation question.
I would like to find the subset of rows of data frame "data1" whose column sums equal the values in another data frame, "data2".
Here is my code:
AA<-c(2,3,1,4,9)
BB<-c(5,13,9,1,2)
A1<-c(5)
B1<-c(18)
data1<-data.frame(AA,BB)
data2<-data.frame(A1,B1)
library(dplyr)
subset(data1, ((sum(AA) ==data2$A1 ) && (sum(BB) ==data2$B1 ) ) )
I am wondering if any other algorithm would help?
Thanks!
This solution only considers the scenario in which the sum is calculated from any two rows. To test subsets of other sizes, create those combinations by changing the second argument of the combn function. final_data is the final output. If there are multiple matches, you may want to keep final_data as a list.
# Prepare example datasets
AA<-c(2,3,1,4,9)
BB<-c(5,13,9,1,2)
A1<-c(5)
B1<-c(18)
data1<-data.frame(AA,BB)
data2<-data.frame(A1,B1)
# Load packages
library(tidyverse)
# Use combn to find all combinations of two row numbers
row_indices <- as.data.frame(t(combn(1:nrow(data1), 2)))
# Prepare a list of data frame. Each data frame is one row from row_indices
row_list <- row_indices %>%
rowid_to_column() %>%
split(f = .$rowid)
# Based on row_list to subset data1
sub_list <- map(row_list, function(dt){
temp_data <- data1 %>% filter(row_number() %in% c(dt$V1, dt$V2))
return(temp_data)
})
# Calculate the column sums of each data frame in sub_list
sub_list2 <- map(sub_list, function(dt){
dt2 <- dt %>%
summarise(across(everything(), sum)) %>% # replaces the deprecated funs()
setNames(c("A1", "B1"))
return(dt2)
})
# Compare each data frame in sub_list2 with data2
# Find the one that is the same and store the logical results in result_indices
result_indices <- map_lgl(sub_list2, function(dt) setequal(dt, data2))
# Get the final output
final_data <- sub_list[result_indices][[1]]
final_data
AA BB
1 2 5
2 3 13
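To search subsets of every size rather than only pairs, one option is a brute-force loop in base R, sketched below. It assumes exact equality is appropriate (integer-like values), and the number of subsets doubles with each added row, so it is only practical for small data frames. The names target and matches are made up for illustration.

```r
# Same example data as above
data1 <- data.frame(AA = c(2, 3, 1, 4, 9), BB = c(5, 13, 9, 1, 2))
target <- c(AA = 5, BB = 18)  # the values held in data2

# Try every subset size k and every combination of k rows
matches <- list()
for (k in seq_len(nrow(data1))) {
  for (idx in combn(nrow(data1), k, simplify = FALSE)) {
    candidate <- data1[idx, , drop = FALSE]
    # Keep the subset if every column sum hits its target
    if (all(colSums(candidate) == target)) {
      matches[[length(matches) + 1]] <- candidate
    }
  }
}

matches
```

For this data the only match is rows 1 and 2. For non-integer values, comparing with a tolerance, for example abs(colSums(candidate) - target) < 1e-8, would be safer than ==.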