I have some data that looks like this:
List_name Condition1 Condition2 Situation1 Situation2
List1 0.01 0.12 66 123
List2 0.23 0.22 45 -34
List3 0.32 0.23 13 -12
List4 0.03 0.56 -3 45
List5 0.56 0.05 12 100
List6 0.90 0.09 22 32
I would like to filter each "Condition" column of the data.frame using a cutoff of 0.5.
After filtering, I want to keep the corresponding "Situation" values alongside. The filter and subset should work pairwise: "Condition1" with "Situation1", "Condition2" with "Situation2", and so on.
Just the desired output:
List_name Condition1 Situation1 List_name Condition2 Situation2
List1 0.01 66 List1 0.12 123
List2 0.23 45 List2 0.22 -34
List3 0.32 13 List3 0.23 -12
List4 0.03 -3 List5 0.05 100
List6 0.09 32
I'm pretty sure a similar question has been asked before, but I searched and couldn't find it.
Similar to @Arun's excellent solution, but based on column names and without any assumption about column order.
# Find the Condition columns by name
cols.conds <- colnames(dat)[grepl('Condition[0-9]+', colnames(dat))]
lapply(cols.conds, function(x) {
  col.list <- colnames(dat)[1]
  col.situ <- gsub('Condition', 'Situation', x)
  dat[which(dat[[x]] < 0.5), c(col.list, x, col.situ)]
})
I assume dat is:
dat <- read.table(text = 'List_name Condition1 Condition2 Situation1 Situation2
List1 0.01 0.12 66 123
List2 0.23 0.22 45 -34
List3 0.32 0.23 13 -12
List4 0.03 0.56 -3 45
List5 0.56 0.05 12 100
List6 0.90 0.09 22 32', header = TRUE)
You can use the fact that comparison operators are vectorized:
x <- c(0.1, 0.3, 0.5, 0.2)
x < 0.5
# [1] TRUE TRUE FALSE TRUE
And some grep results:
grep('Condition', names(DF1))
To do this subsetting you can use apply to generate your boolean vector:
keepers <- apply(DF1[, grep('Condition', names(DF1))], 1, function(x) any(x < 0.5))
And subset:
DF1[keepers,]
Note that this doesn't necessarily return the data structure you showed in your question, but you can alter the anonymous function accordingly, e.g. using all or a different threshold value.
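For instance, a sketch of the all variant on the question's data (here rebuilt as DF1), which keeps only rows where every Condition column passes the cutoff:

```r
DF1 <- read.table(text = 'List_name Condition1 Condition2 Situation1 Situation2
List1 0.01 0.12 66 123
List2 0.23 0.22 45 -34
List3 0.32 0.23 13 -12
List4 0.03 0.56 -3 45
List5 0.56 0.05 12 100
List6 0.90 0.09 22 32', header = TRUE)

# all() instead of any(): every Condition value must be below the cutoff
keepers <- apply(DF1[, grep('Condition', names(DF1))], 1, function(x) all(x < 0.5))
DF1[keepers, ]
```

With this data, only List1, List2, and List3 survive, since both of their Condition values are below 0.5.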
In light of the edits, I would approach this differently. I would use melt from the reshape2 package:
library(reshape2)
dat.c <- melt(DF1,
id.var='List_name',
measure.var=grep('Condition', names(DF1), value=TRUE),
variable.name='condition',
value.name='cond.val')
dat.c$idx <- gsub('Condition', '', dat.c$condition)
dat.s <- melt(DF1,
id.var='List_name',
measure.var=grep('Situation', names(DF1), value=TRUE),
variable.name='situation',
value.name='situ.val')
dat.s$idx <- gsub('Situation', '', dat.s$situation)
dat <- merge(dat.c, dat.s)
out <- dat[dat$cond.val < 0.5,]
out
  List_name idx condition cond.val situation situ.val
1 List1 1 Condition1 0.01 Situation1 66
2 List1 2 Condition2 0.12 Situation2 123
3 List2 1 Condition1 0.23 Situation1 45
4 List2 2 Condition2 0.22 Situation2 -34
5 List3 1 Condition1 0.32 Situation1 13
6 List3 2 Condition2 0.23 Situation2 -12
7 List4 1 Condition1 0.03 Situation1 -3
10 List5 2 Condition2 0.05 Situation2 100
12 List6 2 Condition2 0.09 Situation2 32
You can then use dcast to put the data back in the initial format if you want, but I find data in this "long" form much easier to work with. This form is also pleasant since it avoids the need for NA values where you have rows where one condition is met and others are not.
out.c <- dcast(out, List_name ~ condition, value.var='cond.val')
out.s <- dcast(out, List_name ~ situation, value.var='situ.val')
merge(out.c, out.s)
List_name Condition1 Condition2 Situation1 Situation2
1 List1 0.01 0.12 66 123
2 List2 0.23 0.22 45 -34
3 List3 0.32 0.23 13 -12
4 List4 0.03 NA -3 NA
5 List5 NA 0.05 NA 100
6 List6 NA 0.09 NA 32
I think what you're asking for is attainable, but the subsets can't be column-bound in the way you've shown, as they have unequal numbers of rows. So you'll get a list.
Here, I assume that your data.frame is always of the form List_name, followed by Condition1, ..., ConditionN and then Situation1, ..., SituationN.
Then the result can be obtained by getting the column indices first and then filtering with lapply:
ids <- grep("Condition", names(df))
lapply(ids, function(x) df[which(df[[x]] < 0.5), c(1,x,x+length(ids))])
# [[1]]
# List_name Condition1 Situation1
# 1 List1 0.01 66
# 2 List2 0.23 45
# 3 List3 0.32 13
# 4 List4 0.03 -3
#
# [[2]]
# List_name Condition2 Situation2
# 1 List1 0.12 123
# 2 List2 0.22 -34
# 3 List3 0.23 -12
# 5 List5 0.05 100
# 6 List6 0.09 32
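If you want the list elements labelled by the Condition column they were filtered on, one option (a sketch, using the question's data rebuilt as df) is to name the list after the fact:

```r
df <- read.table(text = 'List_name Condition1 Condition2 Situation1 Situation2
List1 0.01 0.12 66 123
List2 0.23 0.22 45 -34
List3 0.32 0.23 13 -12
List4 0.03 0.56 -3 45
List5 0.56 0.05 12 100
List6 0.90 0.09 22 32', header = TRUE)

ids <- grep("Condition", names(df))
res <- lapply(ids, function(x) df[which(df[[x]] < 0.5), c(1, x, x + length(ids))])
# Label each element by the Condition column it was filtered on
names(res) <- names(df)[ids]
res$Condition2
```

That lets you access each subset as res$Condition1, res$Condition2, etc., instead of by position.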
I have a dataframe (shown below) where there are some asterisks in the "sig" column.
I want to fill asterisks into the empty cells of the sig column everywhere above the furthest-down row that has an asterisk; in this case, that means every row from "h" up, to get something like this:
I'm thinking some sort of for loop that identifies the furthest-down row with an asterisk and then fills in the empty cells above it might be the way to go, but I'm not sure how to code this.
For debugging purposes, I make the data frame in R with
df<- data.frame("variable"= c("a","b","c","d","e","f","g","h","i","j","k"),
"value" = c(0.04,0.03,0.04,0.02,0.03,0.02,0.02,0.01,0.04,0.1,0.02),
"sig" = c("*","*","*","","*","*","","*","","",""))
Any help would be greatly appreciated - thanks!
Another way:
df[1:max(which(df$sig == "*")), "sig"] = "*"
Gives:
variable value sig
1 a 0.04 *
2 b 0.03 *
3 c 0.04 *
4 d 0.02 *
5 e 0.03 *
6 f 0.02 *
7 g 0.02 *
8 h 0.01 *
9 i 0.04
10 j 0.10
11 k 0.02
We could use replace based on finding the index of the last element having *
library(dplyr)
df <- df %>%
mutate(sig = replace(sig, seq(tail(which(sig == "*"), 1)), "*"))
-output
df
variable value sig
1 a 0.04 *
2 b 0.03 *
3 c 0.04 *
4 d 0.02 *
5 e 0.03 *
6 f 0.02 *
7 g 0.02 *
8 h 0.01 *
9 i 0.04
10 j 0.10
11 k 0.02
Another solution would be tidyr's fill, but you need to change "" to NA first.
Libraries
library(tidyverse)
Data
df <-
data.frame("variable"= c("a","b","c","d","e","f","g","h","i","j","k"),
"value" = c(0.04,0.03,0.04,0.02,0.03,0.02,0.02,0.01,0.04,0.1,0.02),
"sig" = c("*","*","*","","*","*","","*","","",""))
Code
df %>%
mutate(sig = if_else(sig == "",NA_character_,sig)) %>%
fill(sig,.direction = "up")
Output
variable value sig
1 a 0.04 *
2 b 0.03 *
3 c 0.04 *
4 d 0.02 *
5 e 0.03 *
6 f 0.02 *
7 g 0.02 *
8 h 0.01 *
9 i 0.04 <NA>
10 j 0.10 <NA>
11 k 0.02 <NA>
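If you want those trailing cells back as "" rather than NA after filling, tidyr's replace_na can restore them (a sketch building on the answer above):

```r
library(tidyverse)

df <- data.frame(variable = c("a","b","c","d","e","f","g","h","i","j","k"),
                 value = c(0.04,0.03,0.04,0.02,0.03,0.02,0.02,0.01,0.04,0.1,0.02),
                 sig = c("*","*","*","","*","*","","*","","",""))

filled <- df %>%
  mutate(sig = if_else(sig == "", NA_character_, sig)) %>%
  fill(sig, .direction = "up") %>%
  mutate(sig = replace_na(sig, ""))   # turn the remaining NAs back into ""
filled
```

This gives the same result as the one-liner answers above, with "" instead of NA in rows 9 through 11.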
I need some help with R.
I have a data frame:
ant <- data.frame(n_scale = c(0.62, 0.29, -0.9),
aa = c('A','B','C'))
It looks like this:
0.62 A
0.29 B
-0.90 C
Then I read a file into a second data frame (df2), which looks like:
-1 0 1 2
C B A A
I want to achieve this:
-1 0 1 2
C B A A
-0.9 0.29 0.62 0.62
How can I loop through df2 to get the matching values from the ant data frame?
Thank you very much for your help! :)
Use merge. After that you can match the hyd column of the result with that of df2 to restore the original order.
res <- merge(ant, df2)
res <- res[match(df2$hyd, res$hyd), ]
res
# aa n_scale hyd
# 4 C -0.90 -1
# 3 B 0.29 0
# 1 A 0.62 1
# 2 A 0.62 2
Next time you ask, please provide your data in reproducible form, as I do below.
Data:
ant <- data.frame(n_scale = c(0.62, 0.29, -0.9),
aa = c('A','B','C'))
df2 <- data.frame(hyd=c(-1, 0, 1, 2),
aa=c("C", "B", "A", "A"))
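An alternative that avoids merge altogether (a sketch on the same data): build a named lookup vector from ant and index it with df2$aa; the result comes back in df2's row order automatically, with no need for match afterwards.

```r
ant <- data.frame(n_scale = c(0.62, 0.29, -0.9),
                  aa = c('A','B','C'))
df2 <- data.frame(hyd = c(-1, 0, 1, 2),
                  aa = c("C", "B", "A", "A"))

# Named vector: names are the aa codes, values are the n_scale scores
lookup <- setNames(ant$n_scale, ant$aa)
df2$n_scale <- unname(lookup[df2$aa])
df2
```

Indexing a named vector by a character vector is a common R idiom for exactly this kind of dictionary-style lookup, and it also repeats values where a key occurs more than once (as "A" does here).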
Can someone explain how to round the 2nd column of a data frame to 2 decimal places (as part of data cleaning)? I have tried the following code:
data <- as.matrix(read.table("Assessment2.txt"))
data
data <- data %>%
mutate_if(is.numeric(), round, digits = 2)
data
Try this:
iris %>%
mutate_if(~is.numeric(.), ~round(., digits = 2))
If it's just the second column, the following rounds it to 2 decimal digits.
Load the required package and create a data set.
library(dplyr)
set.seed(1234)
df <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
Now round the 2nd column to 2 digits.
df %>% mutate_at(2, round, digits = 2)
# a b c
#1 -1.2070657 -0.48 0.1340882
#2 0.2774292 -1.00 -0.4906859
#3 1.0844412 -0.78 -0.4405479
#4 -2.3456977 0.06 0.4595894
#5 0.4291247 0.96 -0.6937202
#6 0.5060559 -0.11 -1.4482049
#7 -0.5747400 -0.51 0.5747557
#8 -0.5466319 -0.91 -1.0236557
#9 -0.5644520 -0.84 -0.0151383
#10 -0.8900378 2.42 -0.9359486
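As an aside, mutate_at() is superseded in dplyr 1.0+; if you are on a recent dplyr, the same positional rounding can be written with across() (a sketch, same data as above):

```r
library(dplyr)

set.seed(1234)
df <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))

# across(2, ...) selects the 2nd column by position, like mutate_at(2, ...)
rounded <- df %>% mutate(across(2, ~ round(.x, digits = 2)))
rounded
```

The output matches the mutate_at() result: only column b is rounded, columns a and c are untouched.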
Here is a quick example, assuming you know the name of the second column and that it is already numeric.
In base R, you can keep it simple:
df[,2] <- round(df[,2], digits = 2)
Alternatively, you can use mutate from dplyr:
> df %>% mutate(b = round(b,digits = 2))
a b
1 -1.63728065 0.89
2 -0.77956851 -0.03
3 -0.64117693 -0.65
4 -0.68113139 0.65
5 -2.03328560 -0.43
6 0.50096356 1.77
7 -1.53179814 -0.02
8 -0.02499764 0.85
9 0.59298472 0.21
10 -0.19819542 -3.01
Data
df = data.frame(a = rnorm(10),
b = rnorm(10))
> df
a b
1 -1.63728065 0.89200839
2 -0.77956851 -0.02571507
3 -0.64117693 -0.64766045
4 -0.68113139 0.64635942
5 -2.03328560 -0.43383274
6 0.50096356 1.77261118
7 -1.53179814 -0.01825971
8 -0.02499764 0.85281499
9 0.59298472 0.20516290
10 -0.19819542 -3.00804860
I have a dataframe of values that represent fold changes as such:
> df1 <- data.frame(A=c(1.74,-1.3,3.1), B=c(1.5,.9,.71), C=c(1.1,3.01,1.4))
A B C
1 1.74 1.50 1.10
2 -1.30 0.90 3.01
3 3.10 0.71 1.40
And a dataframe of p-values whose rows and columns match identically:
> df2 <- data.frame(A=c(.02,.01,.8), B=c(NA,.01,.06), C=c(.01,.01,.03))
A B C
1 0.02 NA 0.01
2 0.01 0.01 0.01
3 0.80 0.06 0.03
What I want is to modify df1 so that I only retain the values whose corresponding p-value in df2 is < .05, replacing the rest with NA. Note there are also NAs in df2.
> desired <- data.frame(A=c(1.74,-1.3,NA), B=c(NA,.9,NA), C=c(1.1,3.01,1.4))
> desired
A B C
1 1.74 NA 1.10
2 -1.30 0.9 3.01
3 NA NA 1.40
I first tried vectorized syntax on these data frames and that didn't work. Then I tried a for loop over columns and that also failed.
I don't think I understand how to index each i,j position and replace df1 values based on a logical test on df2, or whether there is a better way in R.
You can try this:
df1[!df2 < 0.05 | is.na(df2)] <- NA
Out:
> df1
A B C
1 1.74 NA 1.10
2 -1.30 0.9 3.01
3 NA NA 1.40
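A note on why the one-liner works: in R, comparison operators bind more tightly than !, so !df2 < 0.05 is parsed as !(df2 < 0.05). A quick sanity check on the question's data:

```r
df2 <- data.frame(A = c(.02, .01, .8), B = c(NA, .01, .06), C = c(.01, .01, .03))

# The comparison runs first, then the negation, so these are the same matrix
same <- identical(!df2 < 0.05, !(df2 < 0.05))
same
```

Because of this, adding the explicit parentheses is purely a readability choice.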
ifelse and as.matrix seem to do the trick.
df1 <- data.frame(A=c(1.74,-1.3,3.1), B=c(1.5,.9,.71), C=c(1.1,3.01,1.4))
df2 <- data.frame(A=c(.02,.01,.8), B=c(NA,.01,.06), C=c(.01,.01,.03))
x1 <- as.matrix(df1)
x2 <- as.matrix(df2)
as.data.frame( ifelse( x2 >= 0.05 | is.na(x2), NA, x1) )
Result
A B C
1 1.74 NA 1.10
2 -1.30 0.9 3.01
3 NA NA 1.40
This should be a very simple question, but I just don't know how to do it.
I want to delete certain rows of my data.frame. How can I select a row based on the values of two columns?
DATE <- c("01.01.2000","02.01.2000","03.01.2000","06.01.2000","07.01.2000","09.01.2000","10.01.2000","01.01.2000","02.01.2000","04.01.2000","06.01.2000","07.01.2000","09.01.2000","10.01.2000")
RET <- c(-2.0,1.1,3,1.4,-0.2, 0.6, 0.1, -0.21, -1.2, 0.9, 0.3, -0.1,0.3,-0.12)
COMP <- c("A","A","A","A","A","A","A","B","B","B","B","B","B","B")
df <- data.frame(DATE, RET, COMP)
df
# DATE RET COMP
# 1 01.01.2000 -2.00 A
# 2 02.01.2000 1.10 A
# 3 03.01.2000 3.00 A
# 4 06.01.2000 1.40 A
# 5 07.01.2000 -0.20 A
# 6 09.01.2000 0.60 A
# 7 10.01.2000 0.10 A
# 8 01.01.2000 -0.21 B
# 9 02.01.2000 -1.20 B
# 10 04.01.2000 0.90 B
# 11 06.01.2000 0.30 B
# 12 07.01.2000 -0.10 B
# 13 09.01.2000 0.30 B
# 14 10.01.2000 -0.12 B
Let's say I want to delete the row where DATE is "07.01.2000" and COMP is "A". It would be nice if I could do something like:
df["07.01.2000" %in% df$DATE and "A" %in% COMP, ] <- NULL
Well, you can't delete a row by assigning NULL. You can create a new data.frame that does not contain those values. Something like
df[!(df$DATE=="07.01.2000" & df$COMP =="A"), ]
If you have more dates or comps and really want to use %in%, just note that you had the arguments flipped. So
df[!(df$DATE %in% "07.01.2000" & df$COMP %in% "A"), ]
is the way to go.
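A base-R alternative with the same logic (a sketch on the question's data) is subset(), which evaluates the condition inside the data frame, so you don't need the df$ prefixes:

```r
DATE <- c("01.01.2000","02.01.2000","03.01.2000","06.01.2000","07.01.2000",
          "09.01.2000","10.01.2000","01.01.2000","02.01.2000","04.01.2000",
          "06.01.2000","07.01.2000","09.01.2000","10.01.2000")
RET <- c(-2.0,1.1,3,1.4,-0.2,0.6,0.1,-0.21,-1.2,0.9,0.3,-0.1,0.3,-0.12)
COMP <- c(rep("A", 7), rep("B", 7))
df <- data.frame(DATE, RET, COMP)

# Columns are visible as bare names inside subset()
kept <- subset(df, !(DATE == "07.01.2000" & COMP == "A"))
kept
```

One caveat: subset() uses non-standard evaluation, so it is convenient interactively but is generally discouraged inside functions.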