R: check for and replace stuck data

There are two sensors. The collected data should change with time. How can I identify where one sensor's data is stuck and replace it with data from the other sensor?
a<- c(1:24)
b<- seq(0.1,2.4,0.1)
c<- c(0.05,0.2,0.3,rep(0.4,18),2.2,2.3,2.4)
d<- data.frame(a,b,c)
so the data looks like this:
d
a b c
1 0.1 0.05
2 0.2 0.20
3 0.3 0.30
4 0.4 0.40
5 0.5 0.40
6 0.6 0.40
7 0.7 0.40
8 0.8 0.40
9 0.9 0.40
10 1.0 0.40
11 1.1 0.40
12 1.2 0.40
13 1.3 0.40
14 1.4 0.40
15 1.5 0.40
16 1.6 0.40
17 1.7 0.40
18 1.8 0.40
19 1.9 0.40
20 2.0 0.40
21 2.1 0.40
22 2.2 2.20
23 2.3 2.30
24 2.4 2.40
Sensor c is stuck at 0.4 from time a = 4 to a = 21. Is there a quick way to identify this and replace the stuck part using data from sensor b?

The new column c_updated is what you want. I've created some helpful columns (c_previous and c_is_stuck) which you can remove if you want.
library(dplyr)
a<- c(1:24)
b<- seq(0.1,2.4,0.1)
c<- c(0.05,0.2,0.3,rep(0.4,18),2.2,2.3,2.4)
d<- data.frame(a,b,c)
d %>%
mutate(c_previous = lag(c, default = 0), # get previous measurement for sensor c
c_is_stuck = ifelse(c == c_previous, 1 ,0), # flag stuck for sensor c when current measurement is same as previous one
c_updated = ifelse(c_is_stuck == 1, b, c)) # if sensor c is stuck use measurement from sensor b
# a b c c_previous c_is_stuck c_updated
# 1 1 0.1 0.05 0.00 0 0.05
# 2 2 0.2 0.20 0.05 0 0.20
# 3 3 0.3 0.30 0.20 0 0.30
# 4 4 0.4 0.40 0.30 0 0.40
# 5 5 0.5 0.40 0.40 1 0.50
# 6 6 0.6 0.40 0.40 1 0.60
# 7 7 0.7 0.40 0.40 1 0.70
# 8 8 0.8 0.40 0.40 1 0.80
# 9 9 0.9 0.40 0.40 1 0.90
# 10 10 1.0 0.40 0.40 1 1.00
# 11 11 1.1 0.40 0.40 1 1.10
# 12 12 1.2 0.40 0.40 1 1.20
# 13 13 1.3 0.40 0.40 1 1.30
# 14 14 1.4 0.40 0.40 1 1.40
# 15 15 1.5 0.40 0.40 1 1.50
# 16 16 1.6 0.40 0.40 1 1.60
# 17 17 1.7 0.40 0.40 1 1.70
# 18 18 1.8 0.40 0.40 1 1.80
# 19 19 1.9 0.40 0.40 1 1.90
# 20 20 2.0 0.40 0.40 1 2.00
# 21 21 2.1 0.40 0.40 1 2.10
# 22 22 2.2 2.20 0.40 0 2.20
# 23 23 2.3 2.30 2.20 0 2.30
# 24 24 2.4 2.40 2.30 0 2.40

This is a pretty simple way. Duplicate the c column with an offset of 1 and check whether the two values are identical. If so, take the value from b. Note that the first row has no previous value to compare against, so replaced is NA there.
a<- c(1:24)
b<- seq(0.1,2.4,0.1)
c<- c(0.05,0.2,0.3,rep(0.4,18),2.2,2.3,2.4)
d<- data.frame(a,b,c)
d$d <- c(NA, d$c[1:23])
d$replaced <- ifelse(d$c == d$d, d$b, d$c)
a b c d replaced
1 1 0.1 0.05 NA NA
2 2 0.2 0.20 0.05 0.2
3 3 0.3 0.30 0.20 0.3
4 4 0.4 0.40 0.30 0.4
5 5 0.5 0.40 0.40 0.5
6 6 0.6 0.40 0.40 0.6
7 7 0.7 0.40 0.40 0.7
8 8 0.8 0.40 0.40 0.8
9 9 0.9 0.40 0.40 0.9
10 10 1.0 0.40 0.40 1.0
11 11 1.1 0.40 0.40 1.1
12 12 1.2 0.40 0.40 1.2
13 13 1.3 0.40 0.40 1.3
14 14 1.4 0.40 0.40 1.4
15 15 1.5 0.40 0.40 1.5
16 16 1.6 0.40 0.40 1.6
17 17 1.7 0.40 0.40 1.7
18 18 1.8 0.40 0.40 1.8
19 19 1.9 0.40 0.40 1.9
20 20 2.0 0.40 0.40 2.0
21 21 2.1 0.40 0.40 2.1
22 22 2.2 2.20 0.40 2.2
23 23 2.3 2.30 2.20 2.3
24 24 2.4 2.40 2.30 2.4

The solution below is as basic as it gets, I think. No additional packages required. Cheers!
a<- c(1:24)
b<- seq(0.1,2.4,0.1)
c<- c(0.05,0.2,0.3,rep(0.4,18),2.2,2.3,2.4)
d<- data.frame(a,b,c)
d$diff.b <- c(NA, diff(d$b))
d$diff.c <- c(NA, diff(d$c))
stuck.index <- which(d$diff.c==0)
d[stuck.index, "c"] <- d[stuck.index, "b"]
# changing to original data frame format
d$diff.b <- NULL
d$diff.c <- NULL
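One thing the answers above have in common: each reading is compared with the previous one, so the first reading of a flat stretch (row 4 here) is never flagged, and any two equal neighbours count as stuck. If that matters for your data, a run-length approach might be worth a try. This is only a sketch: `min_run` is a threshold I made up and `c_fixed` is just an illustrative column name.

```r
a <- 1:24
b <- seq(0.1, 2.4, 0.1)
c <- c(0.05, 0.2, 0.3, rep(0.4, 18), 2.2, 2.3, 2.4)
d <- data.frame(a, b, c)

min_run <- 3  # hypothetical: shortest run of identical readings treated as stuck

# rle() collapses consecutive identical values into (value, length) pairs
runs <- rle(d$c)

# expand the per-run decision back to one logical flag per row
stuck <- rep(runs$lengths >= min_run, times = runs$lengths)

# take sensor b wherever sensor c sits inside a long flat run
d$c_fixed <- ifelse(stuck, d$b, d$c)
```

With the example data this flags the whole flat stretch, including its first row. Raising `min_run` makes the check more tolerant of readings that legitimately repeat once or twice.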

Related

R: need help matching up table rows and getting differences

I have chromatographic data in a table organized by peak position and integration value of various samples. All samples in the table have a repeated measurement as well with a different sample log number.
What I'm interested in is the repeatability of the measurements of the various peaks. In the ideal case, the difference in peak integration between the original and the repeat measurement would be 0 for each sample.
The data
Sample Log1 Log2 Peak1 Peak2 Peak3 Peak4 Peak5
A 100 104 0.20 0.80 0.30 0.00 0.00
B 101 106 0.25 0.73 0.29 0.01 0.04
C 102 103 0.20 0.80 0.30 0.00 0.07
C 103 102 0.22 0.81 0.31 0.04 0.00
A 104 100 0.21 0.70 0.33 0.00 0.10
B 106 101 0.20 0.73 0.37 0.00 0.03
where Log1 is the original sample log number and Log2 is the repeat log number.
How can I construct a new variable for every peak (being the difference PeakX_Log1 - PeakX_Log2)?
Mind that in my example I only have 5 peaks. The real-life situation is a complex mixture involving >20 peaks, so very hard to do it by hand.
If you will only have two values for each sample, something like this could work:
df <- data.table::fread(
"Sample Log1 Log2 Peak1 Peak2 Peak3 Peak4 Peak5
A 100 104 0.20 0.80 0.30 0.00 0.00
B 101 106 0.25 0.73 0.29 0.01 0.04
C 102 103 0.20 0.80 0.30 0.00 0.07
C 103 102 0.22 0.81 0.31 0.04 0.00
A 104 100 0.21 0.70 0.33 0.00 0.10
B 106 101 0.20 0.73 0.37 0.00 0.03"
)
library(tidyverse)
new_df <- df %>%
mutate(Log = ifelse(Log1 < Log2,"Log1","Log2")) %>%
select(-Log1,-Log2) %>%
pivot_longer(cols = starts_with("Peak"),names_to = "Peak") %>%
pivot_wider(values_from = value, names_from = Log) %>%
mutate(Variation = Log1 - Log2)
new_df
# A tibble: 15 × 5
Sample Peak Log1 Log2 Variation
<chr> <chr> <dbl> <dbl> <dbl>
1 A Peak1 0.2 0.21 -0.0100
2 A Peak2 0.8 0.7 0.100
3 A Peak3 0.3 0.33 -0.0300
4 A Peak4 0 0 0
5 A Peak5 0 0.1 -0.1
6 B Peak1 0.25 0.2 0.05
7 B Peak2 0.73 0.73 0
8 B Peak3 0.29 0.37 -0.08
9 B Peak4 0.01 0 0.01
10 B Peak5 0.04 0.03 0.01
11 C Peak1 0.2 0.22 -0.0200
12 C Peak2 0.8 0.81 -0.0100
13 C Peak3 0.3 0.31 -0.0100
14 C Peak4 0 0.04 -0.04
15 C Peak5 0.07 0 0.07
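If you would rather stay in base R and keep one difference column per peak, something along these lines might also work. It is a sketch under the same assumption as above (exactly one repeat per sample, with Log2 pointing at the Log1 of the repeat row); the `_diff` suffix and the names `first`, `repeats` and `result` are my own choices.

```r
df <- read.table(text = "Sample Log1 Log2 Peak1 Peak2 Peak3 Peak4 Peak5
A 100 104 0.20 0.80 0.30 0.00 0.00
B 101 106 0.25 0.73 0.29 0.01 0.04
C 102 103 0.20 0.80 0.30 0.00 0.07
C 103 102 0.22 0.81 0.31 0.04 0.00
A 104 100 0.21 0.70 0.33 0.00 0.10
B 106 101 0.20 0.73 0.37 0.00 0.03", header = TRUE)

# every column whose name starts with "Peak", however many there are
peaks <- grep("^Peak", names(df), value = TRUE)

# rows holding the original measurement (lower log number)
first <- df[df$Log1 < df$Log2, ]

# look up each original's repeat row via its Log2 number
repeats <- df[match(first$Log2, df$Log1), ]

# original minus repeat, one column per peak
diffs <- as.matrix(first[peaks]) - as.matrix(repeats[peaks])
colnames(diffs) <- paste0(peaks, "_diff")

result <- data.frame(Sample = first$Sample, diffs)
```

Because the peak columns are picked up by name pattern, the same code should carry over to a table with more than 20 peaks without changes.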

Creating new variable in wide data format, R

I have transformed my data into a wide format using the mlogit.data function in order to be able to perform an mlogit multinomial logit regression in R. The data has three different "choices" and looks like this (in its wide format):
Observation Choice Variable A Variable B Variable C
1 1 1.27 0.2 0.81
1 0 1.27 0.2 0.81
1 -1 1.27 0.2 0.81
2 1 0.20 0.45 0.70
2 0 0.20 0.45 0.70
2 -1 0.20 0.45 0.70
However, as the variables A, B and C are linked to the different outcomes I would now like to create a new variable that looks like this:
Observation Choice Variable A Variable B Variable C Variable D
1 1 1.27 0.2 0.81 1.27
1 0 1.27 0.2 0.81 0.2
1 -1 1.27 0.2 0.81 0.81
2 1 0.20 0.45 0.70 0.20
2 0 0.20 0.45 0.70 0.45
2 -1 0.20 0.45 0.70 0.70
I have tried the following code:
Variable D <- ifelse(Choice == "1", Variable A, ifelse(Choice == "-1", Variable B, Variable C))
However, the ifelse function only considers one choice from each observation, creating this:
Observation Choice Variable A Variable B Variable C Variable D
1 1 1.27 0.2 0.81 1.27
1 0 1.27 0.2 0.81 -
1 -1 1.27 0.2 0.81 -
2 1 0.20 0.45 0.70 -
2 0 0.20 0.45 0.70 0.2
2 -1 0.20 0.45 0.70 -
Anyone know how to solve this?
Thanks!
You can create a table mapping choices to variables and then use match
choice_map <-
data.frame(choice = c(1, 0, -1), var = grep('Variable[A-C]', names(df)))
# choice var
# 1 1 3
# 2 0 4
# 3 -1 5
df$VariableD <-
df[cbind(seq_len(nrow(df)), with(choice_map, var[match(df$Choice, choice)]))]
df
# Observation Choice VariableA VariableB VariableC VariableD
# 1 1 1 1.27 0.20 0.81 1.27
# 2 1 0 1.27 0.20 0.81 0.20
# 3 1 -1 1.27 0.20 0.81 0.81
# 4 2 1 0.20 0.45 0.70 0.20
# 5 2 0 0.20 0.45 0.70 0.45
# 6 2 -1 0.20 0.45 0.70 0.70
Data used (removed spaces in colnames)
df <- data.table::fread('
Observation Choice VariableA VariableB VariableC
1 1 1.27 0.2 0.81
1 0 1.27 0.2 0.81
1 -1 1.27 0.2 0.81
2 1 0.20 0.45 0.70
2 0 0.20 0.45 0.70
2 -1 0.20 0.45 0.70
', data.table = F)
df$`Variable D`= sapply(1:nrow(df),function(x){
df[x,4-df$Choice[x]]
})
> df
Observation Choice Variable A Variable B Variable C Variable D
1 1 1 1.27 0.20 0.81 1.27
2 1 0 1.27 0.20 0.81 0.20
3 1 -1 1.27 0.20 0.81 0.81
4 2 1 0.20 0.45 0.70 0.20
5 2 0 0.20 0.45 0.70 0.45
6 2 -1 0.20 0.45 0.70 0.70

Replace values in data frame based on a table in R

Data Frame:
set.seed(90)
df <- data.frame(id = 1:10, values = round(rnorm(10),1))
id values
1 1 0.1
2 2 -0.2
3 3 -0.9
4 4 -0.7
5 5 0.7
6 6 0.4
7 7 1.0
8 8 0.9
9 9 -0.6
10 10 2.4
Table:
table <- data.frame(values = c(-2.0001,1.0023,0.0005,1.0002,2.00009), final_values = round(rnorm(5),2))
values final_values
1 -2.00010 -0.81
2 1.00230 -0.08
3 0.00050 0.87
4 1.00020 1.66
5 2.00009 -0.24
I need to replace the values in data frame based on the closest match of the values in table.
Final Output:
id final_values
1 1 0.87
2 2 0.87
3 3 -0.08
4 4 -0.08
5 5 1.66
6 6 0.87
7 7 1.66
8 8 1.66
9 9 -0.08
10 10 -0.24
What is the best way to do this with base R?
Here is a way and you can overwrite the result back to df:
sapply(df$values, function(x) table$final_values[which.min(abs(x - table$values))])
[1] 0.87 0.87 -0.08 -0.08 1.66 0.87 1.66 1.66 -0.08 -0.24
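A variant of the same idea computes all distances at once with outer() and then writes the result back. Treat it as a sketch: the lookup values below are made up for illustration (they are not the data from the question), and I named the table `lookup` so it does not share a name with base R's table().

```r
# hypothetical lookup table and input values, for illustration only
lookup <- data.frame(values = c(-2, 0, 1, 2),
                     final_values = c(10, 20, 30, 40))
df <- data.frame(id = 1:5, values = c(0.1, -0.9, 0.7, 2.4, -1.2))

# distance matrix: one row per df value, one column per lookup value
dist <- abs(outer(df$values, lookup$values, "-"))

# column index of the smallest distance in each row
nearest <- apply(dist, 1, which.min)

# overwrite with the final value of the closest lookup entry
df$final_values <- lookup$final_values[nearest]
```

On ties, which.min() returns the first minimum, so the earlier row of the lookup table wins.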

Merging two columns in R considering other corresponding columns?

I am using R and have imported data from two Excel sheets, each containing three columns. The first matrix has 3 columns and 380 rows, and the second has 3 columns and 365 rows. In both, columns 2 and 3 hold values that correspond to column 1. I would like to merge the first columns of the two matrices into a single column, sorted in ascending order, in which a value that occurs in both matrices appears only once rather than in two consecutive rows. The main condition is that columns 2 and 3 of each matrix must be rearranged to follow their column 1 values, but must not be merged with each other. Where a value in the merged first column is not present in one of the matrices, that matrix's columns should be filled with zero. I have merged and sorted the first column, but I cannot make the corresponding changes in the other columns. How should I go about this?
Here are the two matrices:
Matrix A
92.6691 1076.5 0.48
93.324 1110.1 0.5
96.9597 1123.3 0.5
97.7539 968.4 0.43
98.992 1006.1 0.45
99.0061 5584.6 2.49
101.0243 1555.7 0.69
101.0606 12821.2 5.72
102.1221 972 0.43
Matrix B
95.4466 974.2 0.43
99.0062 4721.9 2.06
100.0321 1040.1 0.45
101.0241 2115.8 0.92
101.0606 15202.8 6.64
102.2736 945.3 0.41
108.4273 1059.7 0.46
115.0397 25106.3 10.96
115.0761 54740 23.9
After merging, the results should be a single matrix:
Column 1 - Merged 1st columns of matrices A and B (ascending order)
Column 2 - Rearranged based on change in row positions of column 1 in matrix A
Column 3 - Rearranged based on change in row positions of column 1 in matrix A
Column 4 - Rearranged based on change in row positions of column 1 in matrix B
Column 5 - Rearranged based on change in row positions of column 1 in matrix B
Here is the resulting matrix:
92.6691 1076.5 0.48 0 0
93.324 1110.1 0.5 0 0
95.4466 0 0 974.2 0.43
96.9597 1123.3 0.5 0 0
97.7539 968.4 0.43 0 0
98.992 1006.1 0.45 0 0
99.0061 5584.6 2.49 0 0
99.0062 0 0 4721.9 2.06
100.0321 0 0 1040.1 0.45
101.0241 0 0 2115.8 0.92
101.0243 1555.7 0.69 0 0
101.0606 12821.2 5.72 15202.8 6.64
102.1221 972 0.43 0 0
102.2736 0 0 945.3 0.41
108.4273 0 0 1059.7 0.46
115.0397 0 0 25106.3 10.96
115.0761 0 0 54740 23.9
Note that in matrices A and B, the value 101.0606 is common.
This can be done easily with merge().
# read your data:
read.table(
text="92.6691 1076.5 0.48
93.324 1110.1 0.5
96.9597 1123.3 0.5
97.7539 968.4 0.43
98.992 1006.1 0.45
99.0061 5584.6 2.49
101.0243 1555.7 0.69
101.0606 12821.2 5.72
102.1221 972 0.43") -> M1
read.table(
text="95.4466 974.2 0.43
99.0062 4721.9 2.06
100.0321 1040.1 0.45
101.0241 2115.8 0.92
101.0606 15202.8 6.64
102.2736 945.3 0.41
108.4273 1059.7 0.46
115.0397 25106.3 10.96
115.0761 54740 23.90") -> M2
# merge data -- note `all = TRUE`
result <- merge(M1,M2,by = "V1", all = TRUE)
# replace na with 0
result[is.na(result)] <- 0
result
# V1 V2.x V3.x V2.y V3.y
# 1 92.67 1076.5 0.48 0.0 0.00
# 2 93.32 1110.1 0.50 0.0 0.00
# 3 95.45 0.0 0.00 974.2 0.43
# 4 96.96 1123.3 0.50 0.0 0.00
# 5 97.75 968.4 0.43 0.0 0.00
# 6 98.99 1006.1 0.45 0.0 0.00
# 7 99.01 5584.6 2.49 0.0 0.00
# 8 99.01 0.0 0.00 4721.9 2.06
# 9 100.03 0.0 0.00 1040.1 0.45
# 10 101.02 0.0 0.00 2115.8 0.92
# 11 101.02 1555.7 0.69 0.0 0.00
# 12 101.06 12821.2 5.72 15202.8 6.64
# 13 102.12 972.0 0.43 0.0 0.00
# 14 102.27 0.0 0.00 945.3 0.41
# 15 108.43 0.0 0.00 1059.7 0.46
# 16 115.04 0.0 0.00 25106.3 10.96
# 17 115.08 0.0 0.00 54740.0 23.90
df3 <- merge(df1,df2,all.x=T,all.y=T)
df3[is.na(df3)] <- 0
x a b c d
1 92.6691 1076.5 0.48 0.0 0.00
2 93.3240 1110.1 0.50 0.0 0.00
3 95.4466 0.0 0.00 974.2 0.43
4 96.9597 1123.3 0.50 0.0 0.00
5 97.7539 968.4 0.43 0.0 0.00
6 98.9920 1006.1 0.45 0.0 0.00
7 99.0061 5584.6 2.49 0.0 0.00
8 99.0062 0.0 0.00 4721.9 2.06
9 100.0321 0.0 0.00 1040.1 0.45
10 101.0241 0.0 0.00 2115.8 0.92
11 101.0243 1555.7 0.69 0.0 0.00
12 101.0606 12821.2 5.72 15202.8 6.64
13 102.1221 972.0 0.43 0.0 0.00
14 102.2736 0.0 0.00 945.3 0.41
15 108.4273 0.0 0.00 1059.7 0.46
16 115.0397 0.0 0.00 25106.3 10.96
17 115.0761 0.0 0.00 54740.0 23.90
data
df1
x a b
92.6691 1076.5 0.48
93.324 1110.1 0.5
96.9597 1123.3 0.5
97.7539 968.4 0.43
98.992 1006.1 0.45
99.0061 5584.6 2.49
101.0243 1555.7 0.69
101.0606 12821.2 5.72
102.1221 972 0.43
df2
x c d
95.4466 974.2 0.43
99.0062 4721.9 2.06
100.0321 1040.1 0.45
101.0241 2115.8 0.92
101.0606 15202.8 6.64
102.2736 945.3 0.41
108.4273 1059.7 0.46
115.0397 25106.3 10.96
115.0761 54740 23.9
I generated some data myself; you can replace it with yours. Here you need to combine the two data sets twice: first vertically, then horizontally. Finally, order the result by the first column.
set.seed(42)
# Load data 1
dat1<- as.data.frame(matrix(rexp(30), 10))
# Only keep unique rows
dat1 <- unique(dat1)
set.seed(24)
# Load data 2
dat2 <-as.data.frame(matrix(rexp(30), 10))
# Only keep unique rows
dat2 <- unique(dat2)
# Copy it in temp
dat2n <-dat2
# Set second and third columns to 0
dat2n[,2:3] <- 0
# Concatenate them vertically
dat <- rbind(dat1,dat2n)
# Merge dat and dat2 with respect to column 1 and keep everything in dat
fin.dat <- merge(dat, dat2, by="V1", all.x = TRUE)
# Finally order the dataframe
fin.dat <- fin.dat[order(fin.dat[,1], decreasing = FALSE),]
# Replace NA with zeros
fin.dat[is.na(fin.dat)] <- 0

All combinations of values between 0-1 sum to 1 in R

Simple question: I'm trying to get all combinations where the weights of 3 numbers (between 0.1 and 0.9) sums to 1.
Example:
c(0.20,0.20,0.60)
c(0.35,0.15,0.50)
.................
with weights differing by 0.05
I have tried this:
library(gregmisc)
permutations(n = 9, r = 3, v = seq(0.1,0.9,0.05))
combn(seq(0.1,0.9,0.05),c(3))
However I would need the 3 numbers (weights) to equal 1, how can I do this?
x <- expand.grid(seq(0.1,1,0.05),
seq(0.1,1,0.05),
seq(0.1,1,0.05))
x <- x[rowSums(x)==1,]
Edit: Use this instead to avoid floating point errors:
x <- x[abs(rowSums(x)-1) < .Machine$double.eps ^ 0.5,]
#if order doesn't matter
unique(apply(x,1,sort), MARGIN=2)
# 15 33 51 69 87 105 123 141 393 411 429 447 465 483 771 789 807 825 843 1149 1167 1185 1527 1545
#[1,] 0.1 0.10 0.1 0.10 0.1 0.10 0.1 0.10 0.15 0.15 0.15 0.15 0.15 0.15 0.2 0.20 0.2 0.20 0.2 0.25 0.25 0.25 0.3 0.30
#[2,] 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.15 0.20 0.25 0.30 0.35 0.40 0.2 0.25 0.3 0.35 0.4 0.25 0.30 0.35 0.3 0.35
#[3,] 0.8 0.75 0.7 0.65 0.6 0.55 0.5 0.45 0.70 0.65 0.60 0.55 0.50 0.45 0.6 0.55 0.5 0.45 0.4 0.50 0.45 0.40 0.4 0.35
This will run into performance and memory problems if the possible number of combinations gets huge.
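One way to shrink the grid and sidestep the floating point comparison is to count in integer multiples of the step and derive the third weight from the first two. This is a sketch, not a drop-in replacement: the names `step`, `total`, `lo`, `hi` and `g` are mine, and it assumes the step divides 1 evenly.

```r
step  <- 0.05
total <- 20L   # 1 expressed in units of 0.05
lo    <- 2L    # 0.1 in units of 0.05
hi    <- 18L   # 0.9 in units of 0.05

# enumerate only the first two weights; the third follows from the sum
g <- expand.grid(w1 = lo:hi, w2 = lo:hi)
g$w3 <- total - g$w1 - g$w2

# keep rows where the derived third weight is inside the allowed range
g <- g[g$w3 >= lo & g$w3 <= hi, ]

# convert back to weights; integer arithmetic made the sum exact
weights <- g * step

# if order doesn't matter: sort within rows, then drop duplicate rows
unordered <- unique(t(apply(as.matrix(g), 1, sort))) * step
```

The grid here has 17 x 17 rows instead of 19 x 19 x 19, and the filter is an exact integer comparison, so no tolerance is needed.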
This was an easier to read solution for me:
x_grid <- data.frame(expand.grid(seq(0.1,1,0.05),
seq(0.1,1,0.05),
seq(0.1,1,0.05)))
x_combinations <- x_grid[abs(rowSums(x_grid) - 1) < 1e-8, ] # subset x_grid itself; the tolerance avoids floating point misses
