Cross-tabulating data with a function - r

I have data in three columns:
set.seed(42)
N = 1000
XYp = as.data.frame(matrix(cbind(round(runif(N)*100),
round(runif(N)*1000+1000),
round(runif(N),2)),N,3))
colnames(XYp) <- c('X','Y','p')
Now I would like to cross-tabulate the data based on deciles in two dimensions:
colX_deciles = quantile(data[,'X'], probs=seq(0,1,1/10))
colY_deciles = quantile(data[,'Y'], probs=seq(0,1,1/10))
XYp['X_decile'] <- findInterval(XYp[,'X'],colX_deciles,all.inside = TRUE)
XYp['Y_decile'] <- findInterval(XYp[,'Y'],colY_deciles,all.inside = TRUE)
> colX_deciles
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
0.0 9.9 18.0 29.0 39.0 48.0 57.0 69.0 79.2 91.0 100.0
> colY_deciles
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
1000.0 1088.0 1180.0 1279.4 1392.0 1502.5 1602.4 1711.3 1805.2 1902.0 2000.0
I have figured out that it is possible to calculate the sum of elements in column p using xtabs:
> xtabs(p ~ X_decile + Y_decile, XYp)
Y_decile
X_decile 1 2 3 4 5 6 7 8 9 10
1 2.57 8.74 5.51 5.74 4.40 1.77 5.79 3.43 4.66 3.80
2 6.43 4.25 7.29 5.41 3.08 4.43 8.70 2.62 3.37 4.45
3 1.99 2.80 7.54 2.56 5.02 4.30 7.99 2.03 4.91 6.28
4 4.53 4.90 8.04 3.49 2.25 2.87 7.47 5.41 3.54 9.28
5 2.32 5.82 7.18 4.58 5.39 2.26 0.59 9.61 5.91 5.37
6 7.70 5.50 6.45 7.83 4.65 8.45 1.70 6.40 4.88 4.32
7 7.05 3.87 3.54 3.79 6.15 5.55 6.31 2.31 3.42 6.14
8 4.43 4.50 3.04 3.62 9.92 5.66 3.75 7.01 4.92 7.08
9 3.67 5.56 3.56 7.92 5.05 5.00 3.64 6.74 5.85 3.26
10 5.75 3.17 9.50 5.44 3.64 6.13 3.18 5.93 6.18 3.71
But how can I elegantly apply an arbitrary function to each cell of the cross-tabulation, for example avg(p), in the following manner?
> xtabs(mean(p) ~ X_decile + Y_decile, XYp)
Error in model.frame.default(formula = mean(p) ~ X_decile + Y_decile, :
variable lengths differ (found for 'X_decile')
As a bonus, the values of colX_deciles[1:10] and colY_deciles[1:10] could be set as row names and column names, respectively.

I assume you want to use the XYp object throughout (in some places you used data).
I would suggest nesting a call to aggregate inside xtabs:
xtabs(p ~ X_decile + Y_decile, aggregate(p ~ X_decile + Y_decile, XYp, mean))
Y_decile
X_decile 1 2 3 4 5 6 7 8 9 10
1 0.4283333 0.5826667 0.5009091 0.4100000 0.5500000 0.2950000 0.5263636 0.4900000 0.3584615 0.4222222
2 0.5358333 0.5312500 0.6627273 0.4918182 0.3850000 0.5537500 0.5800000 0.4366667 0.4814286 0.4450000
3 0.3980000 0.3500000 0.5800000 0.5120000 0.4183333 0.3583333 0.4205263 0.3383333 0.5455556 0.5233333
4 0.4118182 0.3769231 0.6700000 0.5816667 0.5625000 0.3587500 0.6225000 0.3864286 0.5900000 0.7138462
5 0.4640000 0.4476923 0.6527273 0.5088889 0.4900000 0.4520000 0.1966667 0.6006250 0.4925000 0.5370000
6 0.4812500 0.6111111 0.7166667 0.5592857 0.5166667 0.6035714 0.3400000 0.5818182 0.5422222 0.6171429
7 0.5035714 0.5528571 0.4425000 0.5414286 0.5125000 0.3964286 0.4853846 0.5775000 0.4275000 0.4723077
8 0.4430000 0.4090909 0.6080000 0.5171429 0.6200000 0.5660000 0.4687500 0.5392308 0.3784615 0.5446154
9 0.4077778 0.6177778 0.5085714 0.7200000 0.4208333 0.5000000 0.4550000 0.5616667 0.5318182 0.3622222
10 0.5227273 0.4528571 0.6785714 0.3885714 0.3640000 0.4715385 0.5300000 0.5390909 0.6866667 0.5300000
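As a possible alternative (a sketch, not part of the answer above), tapply can apply an arbitrary function to each cell directly, and the decile lower bounds can then be attached as dimension names for the bonus request. The name cell_means is illustrative, and the data are rebuilt here with data.frame rather than matrix(cbind(...)).

```r
# Rebuild the question's data (same runif call order as the original).
set.seed(42)
N <- 1000
XYp <- data.frame(X = round(runif(N) * 100),
                  Y = round(runif(N) * 1000 + 1000),
                  p = round(runif(N), 2))

# Decile breaks and decile membership, as in the question but using XYp.
colX_deciles <- quantile(XYp$X, probs = seq(0, 1, 0.1))
colY_deciles <- quantile(XYp$Y, probs = seq(0, 1, 0.1))
XYp$X_decile <- findInterval(XYp$X, colX_deciles, all.inside = TRUE)
XYp$Y_decile <- findInterval(XYp$Y, colY_deciles, all.inside = TRUE)

# Apply any summary function per cell; swap mean for median, sd, etc.
cell_means <- tapply(XYp$p, XYp[c("X_decile", "Y_decile")], mean)

# Bonus: label rows and columns with the lower bound of each decile.
dimnames(cell_means) <- list(X = as.character(colX_deciles[1:10]),
                             Y = as.character(colY_deciles[1:10]))
round(cell_means, 3)
```

One difference to keep in mind: an empty cell comes out as NA with tapply, whereas xtabs reports it as 0 because it sums.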

Related

R data.table, select columns with no NA

I have a table of stock prices here:
https://drive.google.com/file/d/1S666wiCzf-8MfgugN3IZOqCiM7tNPFh9/view?usp=sharing
Some columns have NA's because the company does not exist (until later dates), or the company folded.
What I want to do is select the columns that have no NA's. I use data.table because it is faster. Here is my working code:
example <- fread(file = "example.csv", key = "date")
example_select <- example[,
lapply(.SD,
function(x) not(sum(is.na(x) > 0)))
] %>%
as.logical(.)
example[, ..example_select]
Is there a better (shorter) way to do the same? Thank you!
Try:
example[,lapply(.SD, function(x) {if(anyNA(x)) {NULL} else {x}} )]
There are lots of ways you could do this. Here's how I usually do it - a data.table approach without lapply:
example[, .SD, .SDcols = colSums(is.na(example)) == 0]
An answer using tidyverse packages
library(readr)
library(dplyr)
library(purrr)
data <- read_csv("~/Downloads/example.csv")
map2_dfc(data, names(data), .f = function(x, y) {
column <- tibble("{y}" := x)
if(any(is.na(column)))
return(NULL)
else
return(column)
})
Output
# A tibble: 5,076 x 11
date ACU ACY AE AEF AIM AIRI AMS APT ARMP ASXC
<date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2001-01-02 2.75 4.75 14.4 8.44 2376 250 2.5 1.06 490000 179.
2 2001-01-03 2.75 4.5 14.5 9 2409 250 2.5 1.12 472500 193.
3 2001-01-04 2.75 4.5 14.1 8.88 2508 250 2.5 1.06 542500 301.
4 2001-01-05 2.38 4.5 14.1 8.88 2475 250 2.25 1.12 586250 301.
5 2001-01-08 2.56 4.75 14.3 8.75 2376 250 2.38 1.06 638750 276.
6 2001-01-09 2.56 4.75 14.3 8.88 2409 250 2.38 1.06 568750 264.
7 2001-01-10 2.56 5.5 14.5 8.69 2310 300 2.12 1.12 586250 274.
8 2001-01-11 2.69 5.25 14.4 8.69 2310 300 2.25 1.19 564375 333.
9 2001-01-12 2.75 4.81 14.6 8.75 2541 275 2 1.38 564375 370.
10 2001-01-16 2.75 4.88 14.9 8.94 2772 300 2.12 1.62 595000 358.
# … with 5,066 more rows
Using Filter:
library(data.table)
Filter(function(x) all(!is.na(x)), fread('example.csv'))
# date ACU ACY AE AEF AIM AIRI AMS APT
# 1: 2001-01-02 2.75 4.75 14.4 8.44 2376.00 250.00 2.50 1.06
# 2: 2001-01-03 2.75 4.50 14.5 9.00 2409.00 250.00 2.50 1.12
# 3: 2001-01-04 2.75 4.50 14.1 8.88 2508.00 250.00 2.50 1.06
# 4: 2001-01-05 2.38 4.50 14.1 8.88 2475.00 250.00 2.25 1.12
# 5: 2001-01-08 2.56 4.75 14.3 8.75 2376.00 250.00 2.38 1.06
# ---
#5072: 2021-03-02 36.95 10.59 28.1 8.77 2.34 1.61 2.48 14.33
#5073: 2021-03-03 38.40 10.00 30.1 8.78 2.26 1.57 2.47 12.92
#5074: 2021-03-04 37.90 8.03 30.8 8.63 2.09 1.44 2.27 12.44
#5075: 2021-03-05 35.68 8.13 31.5 8.70 2.05 1.48 2.35 12.45
#5076: 2021-03-08 37.87 8.22 31.9 8.59 2.01 1.52 2.47 12.15
# ARMP ASXC
# 1: 4.90e+05 178.75
# 2: 4.72e+05 192.97
# 3: 5.42e+05 300.62
# 4: 5.86e+05 300.62
# 5: 6.39e+05 276.25
# ---
#5072: 5.67e+00 3.92
#5073: 5.58e+00 4.54
#5074: 5.15e+00 4.08
#5075: 4.49e+00 3.81
#5076: 4.73e+00 4.15
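For readers without the CSV, here is a minimal base R sketch of the same idea on a small made-up data.frame (the prices object and its column names are hypothetical, not taken from the linked file). It shows the colSums and Filter approaches selecting the same NA-free columns.

```r
# Hypothetical stand-in for the stock price table: two tickers have gaps.
prices <- data.frame(
  date = as.Date("2001-01-01") + 0:3,
  AAA  = c(1.0, 1.1, 1.2, 1.3),
  BBB  = c(NA, NA, 5.0, 5.1),   # listed late
  CCC  = c(2.0, 2.1, NA, NA)    # folded early
)

# Approach 1: count NAs per column and keep the columns with none.
keep <- colSums(is.na(prices)) == 0
complete_cols <- prices[, keep, drop = FALSE]

# Approach 2: Filter keeps the columns for which the predicate is TRUE.
same <- Filter(function(x) !anyNA(x), prices)

names(complete_cols)
names(same)
```

Both calls should keep only date and AAA here. anyNA can stop at the first missing value, which may help on long columns.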

Retrieve information of winning unit in a self organizing map plot

I would like to find out which unit is the winning unit for each node in a kohonen plot:
library(kohonen)
set.seed(0)
data("wines")
wines <- scale(wines)
som_grid <- somgrid(8, 6, "hexagonal")
som_model <- som(wines, som_grid)
plot(som_model)
This draws the map (the image is not reproduced here).
You can see which unit each observation is assigned to with:
head(data.frame(cbind(wines,unit= som_model$unit.classif)))
alcohol malic.acid ash ash.alkalinity magnesium tot..phenols flavonoids non.flav..phenols proanth col..int. col..hue
1 13.20 1.78 2.14 11.2 100 2.65 2.76 0.26 1.28 4.38 1.05
2 13.16 2.36 2.67 18.6 101 2.80 3.24 0.30 2.81 5.68 1.03
3 14.37 1.95 2.50 16.8 113 3.85 3.49 0.24 2.18 7.80 0.86
4 13.24 2.59 2.87 21.0 118 2.80 2.69 0.39 1.82 4.32 1.04
5 14.20 1.76 2.45 15.2 112 3.27 3.39 0.34 1.97 6.75 1.05
6 14.39 1.87 2.45 14.6 96 2.50 2.52 0.30 1.98 5.25 1.02
OD.ratio proline unit
1 3.40 1050 24
2 3.17 1185 46
3 3.45 1480 48
4 2.93 735 4
5 2.85 1450 48
6 3.58 1290 47
But I would like to show this unit information in the plot itself, for example by writing the unit number as text inside each node, the way the identify function does, but automatically. Thanks in advance!

'max.print' option in R

I have a data.frame with 178 rows and 14 columns. When I print it into the R-console, it only shows me 71 rows, despite the max.print option being set to 1000 rows.
Could anyone please explain why the max.print option doesn't make R print the full dataset in the console, and how I can do that?
I use R 3.4.1 on MacOS.
Here is a data example:
1 1 14.23 1.71 2.43 15.6 127 2.80 3.06 0.28 2.29 5.640000 1.040 3.92 1065
2 1 13.20 1.78 2.14 11.2 100 2.65 2.76 0.26 1.28 4.380000 1.050 3.40 1050
3 1 13.16 2.36 2.67 18.6 101 2.80 3.24 0.30 2.81 5.680000 1.030 3.17 1185
4 1 14.37 1.95 2.50 16.8 113 3.85 3.49 0.24 2.18 7.800000 0.860 3.45 1480
5 1 13.24 2.59 2.87 21.0 118 2.80 2.69 0.39 1.82 4.320000 1.040 2.93 735
6 1 14.20 1.76 2.45 15.2 112 3.27 3.39 0.34 1.97 6.750000 1.050 2.85 1450
7 1 14.39 1.87 2.45 14.6 96 2.50 2.52 0.30 1.98 5.250000 1.020 3.58 1290
8 1 14.06 2.15 2.61 17.6 121 2.60 2.51 0.31 1.25 5.050000 1.060 3.58 1295
9 1 14.83 1.64 2.17 14.0 97 2.80 2.98 0.29 1.98 5.200000 1.080 2.85 1045
10 1 13.86 1.35 2.27 16.0 98 2.98 3.15 0.22 1.85 7.220000 1.010 3.55 1045
11 1 14.10 2.16 2.30 18.0 105 2.95 3.32 0.22 2.38 5.750000 1.250 3.17 1510
12 1 14.12 1.48 2.32 16.8 95 2.20 2.43 0.26 1.57 5.000000 1.170 2.82 1280
13 1 13.75 1.73 2.41 16.0 89 2.60 2.76 0.29 1.81 5.600000 1.150 2.90 1320
14 1 14.75 1.73 2.39 11.4 91 3.10 3.69 0.43 2.81 5.400000 1.250 2.73 1150
15 1 14.38 1.87 2.38 12.0 102 3.30 3.64 0.29 2.96 7.500000 1.200 3.00 1547
16 1 13.63 1.81 2.70 17.2 112 2.85 2.91 0.30 1.46 7.300000 1.280 2.88 1310
17 1 14.30 1.92 2.72 20.0 120 2.80 3.14 0.33 1.97 6.200000 1.070 2.65 1280
18 1 13.83 1.57 2.62 20.0 115 2.95 3.40 0.40 1.72 6.600000 1.130 2.57 1130
19 1 14.19 1.59 2.48 16.5 108 3.30 3.93 0.32 1.86 8.700000 1.230 2.82 1680
20 1 13.64 3.10 2.56 15.2 116 2.70 3.03 0.17 1.66 5.100000 0.960 3.36 845
21 1 14.06 1.63 2.28 16.0 126 3.00 3.17 0.24 2.10 5.650000 1.090 3.71 780
22 1 12.93 3.80 2.65 18.6 102 2.41 2.41 0.25 1.98 4.500000 1.030 3.52 770
23 1 13.71 1.86 2.36 16.6 101 2.61 2.88 0.27 1.69 3.800000 1.110 4.00 1035
24 1 12.85 1.60 2.52 17.8 95 2.48 2.37 0.26 1.46 3.930000 1.090 3.63 1015
25 1 13.50 1.81 2.61 20.0 96 2.53 2.61 0.28 1.66 3.520000 1.120 3.82 845
26 1 13.05 2.05 3.22 25.0 124 2.63 2.68 0.47 1.92 3.580000 1.130 3.20 830
27 1 13.39 1.77 2.62 16.1 93 2.85 2.94 0.34 1.45 4.800000 0.920 3.22 1195
28 1 13.30 1.72 2.14 17.0 94 2.40 2.19 0.27 1.35 3.950000 1.020 2.77 1285
29 1 13.87 1.90 2.80 19.4 107 2.95 2.97 0.37 1.76 4.500000 1.250 3.40 915
30 1 14.02 1.68 2.21 16.0 96 2.65 2.33 0.26 1.98 4.700000 1.040 3.59 1035
31 1 13.73 1.50 2.70 22.5 101 3.00 3.25 0.29 2.38 5.700000 1.190 2.71 1285
32 1 13.58 1.66 2.36 19.1 106 2.86 3.19 0.22 1.95 6.900000 1.090 2.88 1515
33 1 13.68 1.83 2.36 17.2 104 2.42 2.69 0.42 1.97 3.840000 1.230 2.87 990
34 1 13.76 1.53 2.70 19.5 132 2.95 2.74 0.50 1.35 5.400000 1.250 3.00 1235
35 1 13.51 1.80 2.65 19.0 110 2.35 2.53 0.29 1.54 4.200000 1.100 2.87 1095
36 1 13.48 1.81 2.41 20.5 100 2.70 2.98 0.26 1.86 5.100000 1.040 3.47 920
37 1 13.28 1.64 2.84 15.5 110 2.60 2.68 0.34 1.36 4.600000 1.090 2.78 880
38 1 13.05 1.65 2.55 18.0 98 2.45 2.43 0.29 1.44 4.250000 1.120 2.51 1105
39 1 13.07 1.50 2.10 15.5 98 2.40 2.64 0.28 1.37 3.700000 1.180 2.69 1020
40 1 14.22 3.99 2.51 13.2 128 3.00 3.04 0.20 2.08 5.100000 0.890 3.53 760
41 1 13.56 1.71 2.31 16.2 117 3.15 3.29 0.34 2.34 6.130000 0.950 3.38 795
42 1 13.41 3.84 2.12 18.8 90 2.45 2.68 0.27 1.48 4.280000 0.910 3.00 1035
43 1 13.88 1.89 2.59 15.0 101 3.25 3.56 0.17 1.70 5.430000 0.880 3.56 1095
44 1 13.24 3.98 2.29 17.5 103 2.64 2.63 0.32 1.66 4.360000 0.820 3.00 680
45 1 13.05 1.77 2.10 17.0 107 3.00 3.00 0.28 2.03 5.040000 0.880 3.35 885
46 1 14.21 4.04 2.44 18.9 111 2.85 2.65 0.30 1.25 5.240000 0.870 3.33 1080
47 1 14.38 3.59 2.28 16.0 102 3.25 3.17 0.27 2.19 4.900000 1.040 3.44 1065
48 1 13.90 1.68 2.12 16.0 101 3.10 3.39 0.21 2.14 6.100000 0.910 3.33 985
49 1 14.10 2.02 2.40 18.8 103 2.75 2.92 0.32 2.38 6.200000 1.070 2.75 1060
50 1 13.94 1.73 2.27 17.4 108 2.88 3.54 0.32 2.08 8.900000 1.120 3.10 1260
51 1 13.05 1.73 2.04 12.4 92 2.72 3.27 0.17 2.91 7.200000 1.120 2.91 1150
52 1 13.83 1.65 2.60 17.2 94 2.45 2.99 0.22 2.29 5.600000 1.240 3.37 1265
53 1 13.82 1.75 2.42 14.0 111 3.88 3.74 0.32 1.87 7.050000 1.010 3.26 1190
54 1 13.77 1.90 2.68 17.1 115 3.00 2.79 0.39 1.68 6.300000 1.130 2.93 1375
55 1 13.74 1.67 2.25 16.4 118 2.60 2.90 0.21 1.62 5.850000 0.920 3.20 1060
56 1 13.56 1.73 2.46 20.5 116 2.96 2.78 0.20 2.45 6.250000 0.980 3.03 1120
57 1 14.22 1.70 2.30 16.3 118 3.20 3.00 0.26 2.03 6.380000 0.940 3.31 970
58 1 13.29 1.97 2.68 16.8 102 3.00 3.23 0.31 1.66 6.000000 1.070 2.84 1270
59 1 13.72 1.43 2.50 16.7 108 3.40 3.67 0.19 2.04 6.800000 0.890 2.87 1285
60 2 12.37 0.94 1.36 10.6 88 1.98 0.57 0.28 0.42 1.950000 1.050 1.82 520
61 2 12.33 1.10 2.28 16.0 101 2.05 1.09 0.63 0.41 3.270000 1.250 1.67 680
62 2 12.64 1.36 2.02 16.8 100 2.02 1.41 0.53 0.62 5.750000 0.980 1.59 450
63 2 13.67 1.25 1.92 18.0 94 2.10 1.79 0.32 0.73 3.800000 1.230 2.46 630
64 2 12.37 1.13 2.16 19.0 87 3.50 3.10 0.19 1.87 4.450000 1.220 2.87 420
65 2 12.17 1.45 2.53 19.0 104 1.89 1.75 0.45 1.03 2.950000 1.450 2.23 355
66 2 12.37 1.21 2.56 18.1 98 2.42 2.65 0.37 2.08 4.600000 1.190 2.30 678
67 2 13.11 1.01 1.70 15.0 78 2.98 3.18 0.26 2.28 5.300000 1.120 3.18 502
68 2 12.37 1.17 1.92 19.6 78 2.11 2.00 0.27 1.04 4.680000 1.120 3.48 510
69 2 13.34 0.94 2.36 17.0 110 2.53 1.30 0.55 0.42 3.170000 1.020 1.93 750
70 2 12.21 1.19 1.75 16.8 151 1.85 1.28 0.14 2.50 2.850000 1.280 3.07 718
71 2 12.29 1.61 2.21 20.4 103 1.10 1.02 0.37 1.46 3.050000 0.906 1.82 870
[ reached getOption("max.print") -- omitted 107 rows ]
Try this command:
options(max.print = 99999)
Put this line at the start of your R script. It worked for me:
options(max.print = .Machine$integer.max)
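A short sketch of why 71 rows appear: max.print limits the number of entries (cells) printed, not the number of rows, so a 14-column data.frame gets 1000 %/% 14 = 71 rows. The df below is a made-up stand-in with the dimensions from the question.

```r
# Stand-in for the 178 x 14 data.frame from the question (values are made up).
df <- as.data.frame(matrix(0, nrow = 178, ncol = 14))

old <- options(max.print = 1000)           # the setting described in the question
rows_shown <- getOption("max.print") %/% ncol(df)
rows_shown                                 # 71, matching the truncated output

# Allow enough entries for every cell, then restore the previous setting.
options(max.print = nrow(df) * ncol(df))
enough <- getOption("max.print") >= nrow(df) * ncol(df)
options(old)
```

Setting max.print to at least nrow(df) * ncol(df) is therefore sufficient for this data.frame; the larger values suggested above also work.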

Substituting the results of a calculation

I'm munging data. Specifically, I opened this PDF http://pubs.acs.org/doi/suppl/10.1021/ja105035r/suppl_file/ja105035r_si_001.pdf and scraped the data from table S4:
1a 1b 1a 1b
1 5.27 4.76 5.09 4.75
2 2.47 2.74 2.77 2.80
4 1.14 1.38 1.12 1.02
6 7.43 7.35 7.22-7.35a 7.25-7.36a
7 7.38 7.34 7.22-7.35a 7.25-7.36a
8 7.23 7.20 7.22-7.35a 7.25-7.36a
9(R) 4.16 3.89 4.12b 4.18b
9(S) 4.16 3.92 4.12b 4.18b
10 1.19 0.91 1.21 1.25
I then pasted it into Notepad and saved it as a txt file.
s4 <- read.table("s4.txt", header=TRUE, stringsAsFactors=FALSE)
gives,
X1a X1b X1a.1 X1b.1
1 5.27 4.76 5.09 4.75
2 2.47 2.74 2.77 2.80
4 1.14 1.38 1.12 1.02
6 7.43 7.35 7.22-7.35a 7.25-7.36a
7 7.38 7.34 7.22-7.35a 7.25-7.36a
8 7.23 7.20 7.22-7.35a 7.25-7.36a
To use the data I need to remove the letters and convert everything to numeric. Thanks to the question "R regex gsub separate letters and numbers", I can get rid of the extraneous letters with the following code:
gsub("([[:alpha:]])","",s4[,3])
What I want to do now, and the point of the question, is to change the ranges,
"7.22-7.35" "7.22-7.35" "7.22-7.35"
with their means,
"7.29"
Could I use gsub for this? (or would I need to strsplit across the hyphen, combine into a vector and return the mean?).
You need a single regex in strsplit for this task (removing letters and splitting):
s4[] <- lapply(s4, function(x) {
if (is.numeric(x)) x
else sapply(strsplit(as.character(x), "-|[[:alpha:]]"),
function(y) mean(as.numeric(y)))
})
The result:
> s4
X1a X1b X1a.1 X1b.1
1 5.27 4.76 5.090 4.750
2 2.47 2.74 2.770 2.800
4 1.14 1.38 1.120 1.020
6 7.43 7.35 7.285 7.305
7 7.38 7.34 7.285 7.305
8 7.23 7.20 7.285 7.305
Here's an approach that seems to work right on the sample data:
df[] <- lapply(df, function(col){
col <- gsub("([[:alpha:]])","", col)
col <- ifelse(grepl("-", col), mean(as.numeric(unlist(strsplit(col[grepl("-", col)], "-")))), col)
as.numeric(col)
})
> df
# X1a X1b X1a.1 X1b.1
#1 5.27 4.76 5.090 4.750
#2 2.47 2.74 2.770 2.800
#4 1.14 1.38 1.120 1.020
#6 7.43 7.35 7.285 7.305
#7 7.38 7.34 7.285 7.305
#8 7.23 7.20 7.285 7.305
Disclaimer: it only works correctly if all the ranges within a column are the same (as in the sample data), because a single mean is computed over every range in the column.
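The limitation above can be avoided by taking the mean element by element. A minimal sketch, assuming values are non-negative (so a hyphen always separates two numbers) and using a hypothetical helper name range_to_mean:

```r
# Convert strings such as "7.22-7.35a" or "4.12b" to numbers:
# strip footnote letters, split on the hyphen, average the parts.
range_to_mean <- function(x) {
  cleaned <- gsub("[[:alpha:]]", "", x)
  parts <- strsplit(cleaned, "-", fixed = TRUE)
  # One mean per element, so different ranges in a column are handled.
  vapply(parts, function(p) mean(as.numeric(p)), numeric(1))
}

res <- range_to_mean(c("5.09", "7.22-7.35a", "4.12b", "7.25-7.36a"))
res
```

Applied to a whole data.frame, this could be used as s4[] <- lapply(s4, range_to_mean), since gsub coerces numeric columns to character first.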
Something like this:
mean(as.numeric(unlist(strsplit("7.22-7.35","-"))))
should work (and corresponds to what you had in mind, I guess).
Or you can do:
eval(parse(text=paste0("mean(c(",gsub("-",",","7.22-7.35"),"))")))
but I'm not sure this is simpler.
To apply it to a vector:
vec <- c("7.22-7.35","7.22-7.35")
# 1st solution
sapply(vec, function(x) mean(as.numeric(unlist(strsplit(x,"-")))))
# 2nd solution
sapply(vec, function(x) eval(parse(text=paste0("mean(c(",gsub("-",",",x),"))"))))
In both cases, you'll get:
7.22-7.35 7.22-7.35
7.285 7.285
Also,
library(gsubfn)
indx <- !sapply(s4, is.numeric)
s4[indx] <- lapply(s4[indx], function(x)
sapply(strapply(x, '([0-9.]+)', ~as.numeric(x)), mean))
s4
# X1a X1b X1a.1 X1b.1
#1 5.27 4.76 5.090 4.750
#2 2.47 2.74 2.770 2.800
#4 1.14 1.38 1.120 1.020
#6 7.43 7.35 7.285 7.305
#7 7.38 7.34 7.285 7.305
#8 7.23 7.20 7.285 7.305

R: returning row value when certain number of columns reach certain value

For each row of the following table, I want to report how many of the columns have reached a certain value:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 3.93 3.92 3.74 4.84 4.55 4.67 3.99 4.10 4.86 4.06
2 4.00 3.99 3.81 4.90 4.61 4.74 4.04 4.15 4.92 4.11
3 4.67 4.06 3.88 5.01 4.66 4.80 4.09 4.20 4.98 4.16
4 4.73 4.12 3.96 5.03 4.72 4.85 4.14 4.25 5.04 4.21
5 4.79 4.21 4.04 5.09 4.77 4.91 4.18 4.30 5.10 4.26
6 4.86 4.29 4.12 5.15 4.82 4.96 4.23 4.35 5.15 4.30
7 4.92 4.37 4.19 5.21 4.87 5.01 4.27 4.39 5.20 4.35
8 4.98 4.43 4.25 5.26 4.91 5.12 4.31 4.43 5.25 4.38
9 5.04 4.49 4.31 5.30 4.95 5.15 4.34 4.46 5.29 4.41
10 5.04 4.50 4.49 5.31 5.01 5.17 4.50 4.60 5.30 4.45
11 ...
12 ...
As an output, I need a data frame, containing the % reach of the value of interest ('5' in this example) by V1-V10:
Rownum Percent
1 0
2 0
3 10
4 20
5 20
6 20
7 33
8 33
9 40
10 50
Many thanks!
If your matrix is mat:
cbind(1:dim(mat)[1],rowSums(mat>5)/dim(mat)[2]*100)
As long as the values are always 0 or 1 and there are ten columns, I would multiply the whole dataset by 10 (which gives percentage values in this case). Just use the following code:
# Sample data
set.seed(10)
data <- as.data.frame(do.call("rbind", lapply(seq(9), function(...) {
sample(c(0, 1), 10, replace = TRUE)
})))
rownames(data) <- c("abc", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx", "yza")
# Percentages
rowSums(data * 10)
# abc def ghi jkl mno pqr stu vwx yza
# 80 40 80 60 60 10 30 50 50
Ok, so now I believe you want to get the percentage of values in each row that meet some threshold criteria. You give the example > 5. One solution of many is using apply:
apply( df , 1 , function(x) sum( x > 5 )/length(x)*100 )
# 1 2 3 4 5 6 7 8 9 10
# 0 0 10 20 20 20 30 30 40 50
Thomas' solution will be faster for large data.frames because it converts to a matrix first, and these are faster to operate on.
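To make the comparison concrete, here is a small sketch using rows 1, 3, 7 and 10 of the table from the question (the object names mat and result are illustrative). It builds the requested two-column data frame with rowSums.

```r
# Rows 1, 3, 7 and 10 of the question's table.
mat <- matrix(c(
  3.93, 3.92, 3.74, 4.84, 4.55, 4.67, 3.99, 4.10, 4.86, 4.06,
  4.67, 4.06, 3.88, 5.01, 4.66, 4.80, 4.09, 4.20, 4.98, 4.16,
  4.92, 4.37, 4.19, 5.21, 4.87, 5.01, 4.27, 4.39, 5.20, 4.35,
  5.04, 4.50, 4.49, 5.31, 5.01, 5.17, 4.50, 4.60, 5.30, 4.45
), nrow = 4, byrow = TRUE,
   dimnames = list(c("1", "3", "7", "10"), paste0("V", 1:10)))

threshold <- 5

# Count the columns above the threshold per row, then convert to percent.
percent <- 100 * rowSums(mat > threshold) / ncol(mat)

result <- data.frame(Rownum = rownames(mat), Percent = unname(percent))
result$Percent   # 0 10 30 50
```

The same expression works on a data.frame of numeric columns, because the comparison returns a logical matrix that rowSums accepts.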
