Convert 2 rows into one in a data frame - r

I have a data frame like this:
> mydata <- read.csv("mydata.csv", header=T, stringsAsFactors=F)
> tbl_df(mydata)
# A tibble: 16,499 x 60
SRC_035_01 SRC_035_01.1 SRC_035_02 SRC_035_02.1
1 Force Time Force Time
2 -0.0037 0.000 0.0041 0.000
3 0.0000 0.004 0.0073 0.004
4 0.0079 0.008 0.0156 0.008
5 0.0150 0.012 0.0228 0.012
6 0.0177 0.016 0.0262 0.016
7 0.0141 0.020 0.0236 0.020
8 0.0103 0.024 0.0206 0.024
9 0.0080 0.028 0.0193 0.028
10 0.0102 0.032 0.0226 0.032
I need to combine the first row with the header, column by column, resulting in this output:
# A tibble: 16,498 x 60
SRC_035_01_Force SRC_035_01.1_Time SRC_035_02_Force SRC_035_02.1_Time
2 -0.0037 0.000 0.0041 0.000
3 0.0000 0.004 0.0073 0.004
4 0.0079 0.008 0.0156 0.008
5 0.0150 0.012 0.0228 0.012
6 0.0177 0.016 0.0262 0.016
7 0.0141 0.020 0.0236 0.020
8 0.0103 0.024 0.0206 0.024
9 0.0080 0.028 0.0193 0.028
10 0.0102 0.032 0.0226 0.032
Could anyone give me a code hint?
Thanks a lot!
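A minimal base-R sketch, assuming the file was read with stringsAsFactors = FALSE as in the question, so every column comes in as character because of the Force/Time row. The small data frame below is a hypothetical stand-in for mydata.csv:

```r
# Hypothetical stand-in for mydata.csv: row 1 holds the "Force"/"Time" labels
mydata <- data.frame(
  SRC_035_01   = c("Force", "-0.0037", "0.0000"),
  SRC_035_01.1 = c("Time",  "0.000",   "0.004"),
  SRC_035_02   = c("Force", "0.0041",  "0.0073"),
  SRC_035_02.1 = c("Time",  "0.000",   "0.004"),
  stringsAsFactors = FALSE
)

# Join each column name to its first-row value, e.g. "SRC_035_01_Force"
names(mydata) <- paste(names(mydata), unlist(mydata[1, ]), sep = "_")

# Drop the label row, then turn the remaining text columns back into numbers
mydata <- mydata[-1, ]
mydata[] <- lapply(mydata, as.numeric)
mydata
```

paste() pairs each column name with the matching first-row value, and `mydata[] <- lapply(...)` converts the columns while keeping the data-frame structure; the row names 2, 3, ... are kept, as in the desired output.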

Related

R: How to find which rows in a file are responsible for two 'populations'

Let's say I have two input data files. The first looks like this:
1 0.00038 0.75053 0.50 35 6000 0.75346
2 0.00038 0.75053 0.50 35 6050 0.72079
3 0.00038 0.75053 0.50 35 6100 0.69229
4 0.00038 0.75053 0.50 35 6150 0.66689
5 0.00038 0.75053 0.50 35 6200 0.64382
6 0.00038 0.75053 0.50 35 6250 0.62269
7 0.00038 0.75053 0.50 35 6300 0.60313
8 0.00038 0.75053 0.50 35 6350 0.58481
9 0.00038 0.75053 0.50 35 6400 0.56756
10 0.00038 0.75053 0.50 35 6450 0.55122
And the second one looks like this:
1 -0.123 -0.306 inf 1.043 0.000 0.010 0.000 0.653 0.000 0.091 0.000 0.009 0.000 3.097 0.000 0.137 0.002
2 -0.142 -0.170 inf 1.035 0.000 0.064 0.000 0.538 0.000 0.560 0.000 0.289 0.000 3.168 0.000 6.182 0.000
3 -0.160 -0.143 inf 1.027 0.000 0.086 0.000 0.401 0.000 0.631 0.000 0.400 0.000 3.348 0.000 0.130 0.000
4 -0.176 -0.117 inf 1.020 0.000 0.107 0.000 0.249 0.000 0.592 0.000 0.435 0.000 3.526 0.000 0.402 0.001
5 -0.191 -0.110 inf 1.014 0.000 0.133 0.000 0.091 0.000 0.514 0.000 0.425 0.000 3.644 0.001 0.598 0.001
6 -0.206 -0.099 inf 1.008 0.000 0.162 0.000 6.247 0.000 0.435 0.001 0.392 0.001 3.675 0.001 0.707 0.002
7 -0.220 -0.093 0.976 1.003 0.000 0.194 0.000 6.168 0.001 0.377 0.001 0.352 0.001 3.602 0.003 0.740 0.003
8 -0.233 -0.092 inf 0.999 0.000 0.226 0.000 6.137 0.001 0.353 0.001 0.302 0.001 3.445 0.004 0.712 0.005
9 -0.246 -0.124 inf 0.996 0.000 0.258 0.000 6.145 0.001 0.363 0.001 0.252 0.001 3.242 0.004 0.620 0.006
10 -0.259 -0.119 inf 0.994 0.000 0.289 0.000 6.172 0.001 0.393 0.001 0.206 0.001 3.028 0.005 0.456 0.008
Now, as you can see, there appear to be 2 populations in this graph, no? I would like to find out which rows in the 2nd file belong to each population. What would be the best way to do this?
If you would like to reproduce this yourself, here are the first input file and the second input file.
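The question does not say which columns are plotted, so this is only a hedged sketch. In the excerpt, column 9 of the second file splits into values below 0.7 and values near 6.1 to 6.3; assuming that is the plotted quantity, base R's kmeans() with two centers can label each row with a population:

```r
# Column 9 of the ten example rows of the second file, copied from above.
# Choosing this column is an assumption: the plotted variable is not shown.
v9 <- c(0.653, 0.538, 0.401, 0.249, 0.091, 6.247, 6.168, 6.137, 6.145, 6.172)

set.seed(1)                                  # kmeans picks random starting centers
fit <- kmeans(v9, centers = 2, nstart = 10)  # split the values into two groups

# Row numbers belonging to each population
split(seq_along(v9), fit$cluster)
```

With more than one plotted variable, the same call works on a matrix of those columns; the cluster labels (1 or 2) are arbitrary, so only the grouping matters.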

R: Create a new data frame from an old one by checking if the old data frame's first column value matches a third data frame's first column value

For example, let's say I have a first data frame that looks like this:
1 -0.123 -0.306 inf 1.043 0.000 0.010 0.000 0.653 0.000 0.091 0.000 0.009 0.000 3.097 0.000 0.137 0.002
2 -0.142 -0.170 inf 1.035 0.000 0.064 0.000 0.538 0.000 0.560 0.000 0.289 0.000 3.168 0.000 6.182 0.000
3 -0.160 -0.143 inf 1.027 0.000 0.086 0.000 0.401 0.000 0.631 0.000 0.400 0.000 3.348 0.000 0.130 0.000
4 -0.176 -0.117 inf 1.020 0.000 0.107 0.000 0.249 0.000 0.592 0.000 0.435 0.000 3.526 0.000 0.402 0.001
5 -0.191 -0.110 inf 1.014 0.000 0.133 0.000 0.091 0.000 0.514 0.000 0.425 0.000 3.644 0.001 0.598 0.001
6 -0.206 -0.099 inf 1.008 0.000 0.162 0.000 6.247 0.000 0.435 0.001 0.392 0.001 3.675 0.001 0.707 0.002
7 -0.220 -0.093 0.976 1.003 0.000 0.194 0.000 6.168 0.001 0.377 0.001 0.352 0.001 3.602 0.003 0.740 0.003
8 -0.233 -0.092 inf 0.999 0.000 0.226 0.000 6.137 0.001 0.353 0.001 0.302 0.001 3.445 0.004 0.712 0.005
9 -0.246 -0.124 inf 0.996 0.000 0.258 0.000 6.145 0.001 0.363 0.001 0.252 0.001 3.242 0.004 0.620 0.006
10 -0.259 -0.119 inf 0.994 0.000 0.289 0.000 6.172 0.001 0.393 0.001 0.206 0.001 3.028 0.005 0.456 0.008
I want to create a new second data frame from this first data frame, subject to one condition. I have a third data frame that looks like this:
1 0.00038 0.75053 0.50 35 6000 0.75346
7 0.00038 0.75053 0.50 35 6300 0.60313
10 0.00038 0.75053 0.50 35 6450 0.55122
and I want to keep only the rows of the first data frame whose first-column value also appears in the first column of the third data frame.
Ultimately, I want the second data frame to look like this:
1 -0.123 -0.306 inf 1.043 0.000 0.010 0.000 0.653 0.000 0.091 0.000 0.009 0.000 3.097 0.000 0.137 0.002
7 -0.220 -0.093 0.976 1.003 0.000 0.194 0.000 6.168 0.001 0.377 0.001 0.352 0.001 3.602 0.003 0.740 0.003
10 -0.259 -0.119 inf 0.994 0.000 0.289 0.000 6.172 0.001 0.393 0.001 0.206 0.001 3.028 0.005 0.456 0.008
This can be done like this:
df1[ df1[,1] %in% df3[,1], ]
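A small self-contained check of the %in% approach, using only the first two columns of the example rows (the column names id, v and w, and the trimming, are illustrative assumptions):

```r
# Trimmed versions of the first and third data frames
df1 <- data.frame(id = 1:10,
                  v  = c(-0.123, -0.142, -0.160, -0.176, -0.191,
                         -0.206, -0.220, -0.233, -0.246, -0.259))
df3 <- data.frame(id = c(1, 7, 10), w = c(6000, 6300, 6450))

# TRUE for each row of df1 whose first column appears in df3's first column
keep <- df1[, 1] %in% df3[, 1]
df2  <- df1[keep, ]
df2
```

%in% keeps the rows in df1's original order; merge(df1, df3[1]) would select the same rows through a join on the shared id column.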

Import specific rows from a "txt" file into R

I have an "example.txt" document as follows:
SIGNAL: 40 41 42
0.406 0.043 0.051 0.021 0.013
0.056 0.201 0.026 0.009 0.000
0.000 0.128 0 0.009 0.000
TOTAL: 0.657
SIGNAL: 44 45 46 48
0.128 0.338 0.026
0.333 0.03 0.000
0.060 0.013 0.004
0.009 0.017 0.009
0.013 0 0.000
TOTAL: 0.704
SIGNAL: 51 52 54
0.368 0.081 0.085 0.004
0.162 0.09 0.064 0.073
0.013 0.017 0.009 0.000
TOTAL: 0.266
SIGNAL: 60 61 62 63 64 65 66 67
0.530 0.030
0.009 0.179
0.154 0.004
0.068 0.009
TOTAL: 0.796
I want to import the rows between "SIGNAL: 44 45 46 48" and "TOTAL: 0.704" into R. Using read.table("example.txt", skip = 6, nrows = 5) to extract these specific rows works:
V1 V2 V3
1 0.128 0.338 0.026
2 0.333 0.030 0.000
3 0.060 0.013 0.004
4 0.009 0.017 0.009
5 0.013 0.000 0.000
However, my real data is very big (450,000 rows). If I want to extract the rows between "SIGNAL: 3000 3001 3002 3003" and the next "TOTAL", how can I do it? Thank you so much!
I have worked it out based on akrun's code. For example, to extract the first two sets, I can use:
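lines <- readLines('example.txt')
g <- c(40, 44)                     # first signal number of each block to extract
sig <- grep('^SIGNAL:', lines)     # line numbers of every SIGNAL header
tot <- grep('^TOTAL:', lines)      # line numbers of every TOTAL line
lapply(g, function(x) {
  # anchor the pattern so that 40 does not also match "SIGNAL: 400 ..."
  k <- grep(paste0('^SIGNAL: ', x, '( |$)'), lines[sig])
  read.table(text = lines[(sig[k] + 1):(tot[k] - 1)], header = FALSE)
})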
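For pulling a single block out of a large file, a hedged sketch reads the file once with readLines(), finds the header line by exact match, and stops at the first TOTAL line after it. The helper name read_signal_block is hypothetical, and it assumes the header text matches the file exactly, including spacing. The inline lines vector reproduces the first two blocks of example.txt:

```r
# Hypothetical helper: read the numeric rows between a given SIGNAL header
# and the next TOTAL line. `lines` is the output of readLines().
read_signal_block <- function(lines, header) {
  start <- match(header, lines)                  # exact match on the header line
  if (is.na(start)) stop("header not found: ", header)
  totals <- grep("^TOTAL:", lines)
  end <- totals[totals > start][1]               # first TOTAL after the header
  if (is.na(end)) stop("no TOTAL line after: ", header)
  read.table(text = lines[(start + 1):(end - 1)], header = FALSE)
}

# Inline copy of the first two blocks of example.txt
lines <- c("SIGNAL: 40 41 42",
           "0.406 0.043 0.051 0.021 0.013",
           "0.056 0.201 0.026 0.009 0.000",
           "0.000 0.128 0 0.009 0.000",
           "TOTAL: 0.657",
           "SIGNAL: 44 45 46 48",
           "0.128 0.338 0.026",
           "0.333 0.03 0.000",
           "0.060 0.013 0.004",
           "0.009 0.017 0.009",
           "0.013 0 0.000",
           "TOTAL: 0.704")

block <- read_signal_block(lines, "SIGNAL: 44 45 46 48")
block
```

For the real file, the call would be read_signal_block(readLines("example.txt"), "SIGNAL: 3000 3001 3002 3003").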

Extract rows from a matrix based on values from another matrix

I need your help!
I am trying to pull out rows of the second matrix based on IDs from the first matrix. To check that my function (which is not provided here) works correctly, I run the following code (CritMat is the second matrix and parms is the first):
results <- matrix(0, nrow = 15, ncol = 8)
colnames(results) <- c("alpha", "beta", "omega", "T=64", "T=128", "T=256", "T=512", "T=1024")
for (r in 1:15) {
results [r,] <- CritMat[CritMat[, 1] == parms[r, 2] & CritMat[, 2] ==
parms[r, 1] & CritMat[, 3] == parms[r, 3] , ]
print(results[r,])
}
The loop works for the first four iterations, but the fifth fails with this error:
Error in results[r, ] <- CritMat[CritMat[, 1] == parms[r, 2] & CritMat[, :
replacement has length zero
Any idea why this happens, and how to fix it?
Many thanks
AA
parms matrix:
beta alpha omega
1 0.005 0.005 0.990
2 0.240 0.005 0.755
3 0.490 0.005 0.505
4 0.740 0.005 0.255
5 0.990 0.005 0.005
6 0.005 0.250 0.745
7 0.240 0.250 0.510
8 0.490 0.250 0.260
9 0.740 0.250 0.010
10 0.005 0.500 0.495
11 0.240 0.500 0.260
12 0.490 0.500 0.010
13 0.005 0.750 0.245
14 0.240 0.750 0.010
15 0.005 0.990 0.005
CritMat matrix:
alpha beta omega T.64 T.128 T.256 T.512 T.1024
1 0.005 0.005 0.990 -2.956420 -2.919654 -2.921704 -2.886429 -2.879443
2 0.005 0.240 0.755 -2.959242 -2.917744 -2.923356 -2.885018 -2.881905
3 0.005 0.490 0.505 -2.959395 -2.915798 -2.927405 -2.886637 -2.885186
4 0.005 0.740 0.255 -2.957763 -2.912088 -2.934518 -2.890182 -2.889484
5 0.005 0.990 0.005 -2.937999 -2.857668 -2.864637 -2.819950 -2.820588
6 0.250 0.005 0.745 -2.987160 -2.986864 -2.897846 -2.865875 -2.911572
7 0.250 0.240 0.510 -3.034868 -2.979375 -2.924888 -2.875446 -2.898752
8 0.250 0.490 0.260 -3.052279 -2.995942 -2.969414 -2.926178 -2.918958
9 0.250 0.740 0.010 -3.197169 -3.263336 -3.258011 -3.202253 -3.248068
10 0.500 0.005 0.495 -3.031267 -3.038585 -2.936348 -2.921126 -2.908868
11 0.500 0.240 0.260 -3.142031 -3.086536 -3.026555 -3.079825 -2.871080
12 0.500 0.490 0.010 -3.383052 -3.410789 -3.431221 -3.367462 -3.332024
13 0.750 0.005 0.245 -3.209441 -3.170385 -3.112472 -3.141569 -2.925559
14 0.750 0.240 0.010 -3.452131 -3.517234 -3.428402 -3.477691 -3.178128
15 0.990 0.005 0.005 -3.427804 -3.491805 -3.298037 -3.290127 -3.087541
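One likely cause, offered as an assumption since the code that builds parms is not shown, is exact floating-point comparison: if a value such as omega was computed (for example as 1 - beta - alpha), it can differ from the printed 0.005 in the last bits, so == matches no row and the assignment receives a zero-length replacement. A sketch that compares within a tolerance instead, using two rows of CritMat and a hypothetical parms whose omega column is computed:

```r
# Two rows copied from CritMat above
CritMat <- rbind(
  c(0.005, 0.740, 0.255, -2.957763, -2.912088, -2.934518, -2.890182, -2.889484),
  c(0.005, 0.990, 0.005, -2.937999, -2.857668, -2.864637, -2.819950, -2.820588))
colnames(CritMat) <- c("alpha", "beta", "omega", "T.64", "T.128", "T.256", "T.512", "T.1024")

# Hypothetical parms: omega is computed, so it may carry rounding error
parms <- cbind(beta = c(0.740, 0.990), alpha = c(0.005, 0.005))
parms <- cbind(parms, omega = 1 - parms[, "beta"] - parms[, "alpha"])

tol <- 1e-8  # assumed tolerance; choose one suited to how the values were produced
results <- matrix(0, nrow = nrow(parms), ncol = ncol(CritMat),
                  dimnames = list(NULL, colnames(CritMat)))
for (r in seq_len(nrow(parms))) {
  # compare within a tolerance instead of with ==
  hit <- abs(CritMat[, "alpha"] - parms[r, "alpha"]) < tol &
         abs(CritMat[, "beta"]  - parms[r, "beta"])  < tol &
         abs(CritMat[, "omega"] - parms[r, "omega"]) < tol
  # stop with a clear message instead of "replacement has length zero"
  if (sum(hit) != 1) stop("expected exactly one CritMat row for parms row ", r)
  results[r, ] <- CritMat[hit, ]
}
results
```

The explicit check on sum(hit) also reports which parms row failed, which makes a remaining mismatch easier to track down.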

Evaluating a matrix by row for a condition being met in R

I've got data in the following format.
P10_neg._qn P11_neg._qn P12_neg._qn P14_neg._qn P17_neg._qn P24_neg._qn P25_neg._qn
1 -0.025 -0.037 -0.032 -0.061 -0.176 0.033 -0.011
2 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087
3 0.033 -0.127 0.042 0.014 0.097 0.105 0.048
4 0.033 -0.127 0.042 0.014 0.097 0.105 0.048
5 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087
6 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087
What is the best way to count, for every row, how many entries are greater than some threshold (0.1, for instance) and return a vector of counts?
You can use the rowSums function for this task. Assuming that dat is your matrix:
rowSums(dat > 0.1)
Using the sample data provided, we have:
dat <- read.table(text = ' P10_neg._qn P11_neg._qn P12_neg._qn P14_neg._qn P17_neg._qn P24_neg._qn P25_neg._qn
1 -0.025 -0.037 -0.032 -0.061 -0.176 0.033 -0.011
2 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087
3 0.033 -0.127 0.042 0.014 0.097 0.105 0.048
4 0.033 -0.127 0.042 0.014 0.097 0.105 0.048
5 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087
6 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087',
row.names = 1, header = TRUE)
rowSums(dat > 0.1)
## 1 2 3 4 5 6
## 0 1 1 1 1 1
apply(dat, 1, function(x) sum(x>.1))
# [1] 0 1 1 1 1 1
Here is an Rcpp version:
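#include <Rcpp.h>
#include <algorithm>
using namespace Rcpp;

// Count, for each row of M, how many entries are greater than val
// [[Rcpp::export]]
IntegerVector countGreaterThan2(NumericMatrix M, double val) {
  IntegerVector res;
  for (int i = 0; i < M.nrow(); i++) {
    NumericVector row = M(i, _);
    int num = std::count_if(row.begin(), row.end(),
                            [&val](const double& x) -> bool { return x > val; });
    res.push_back(num);  // note: push_back copies the whole vector on every call
  }
  return res;
}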
But rowSums is hard to beat:
> system.time(rowSums(dfx>0.2))
user system elapsed
0.01 0.00 0.02
> system.time(countGreaterThan2(dfx,0.2))
user system elapsed
0.06 0.00 0.06
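dfx is not defined in the answer. To make the comparison reproducible, a hypothetical matrix of random values can stand in for it (its size is an assumption, and timings will differ by machine). The final check confirms that rowSums() and the apply() version return the same counts:

```r
set.seed(123)
# Hypothetical stand-in for dfx: 100,000 rows x 10 columns of uniform values
dfx <- matrix(runif(1e6), ncol = 10)

# Vectorised comparison followed by one C-level pass over the rows
system.time(counts_rs <- rowSums(dfx > 0.2))
# One R function call per row, typically much slower
system.time(counts_ap <- apply(dfx, 1, function(x) sum(x > 0.2)))

stopifnot(all(counts_rs == counts_ap))
```

rowSums() returns doubles and apply() here returns integers, so the check compares values with == rather than identical().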
