I have data in three columns:
set.seed(42)
N = 1000
XYp = as.data.frame(matrix(cbind(round(runif(N)*100),
round(runif(N)*1000+1000),
round(runif(N),2)),N,3))
colnames(XYp) <- c('X','Y','p')
Now I would like to cross-tabulate the data based on deciles in two dimensions:
colX_deciles = quantile(data[,'X'], probs=seq(0,1,1/10))
colY_deciles = quantile(data[,'Y'], probs=seq(0,1,1/10))
XYp['X_decile'] <- findInterval(XYp[,'X'],colX_deciles,all.inside = TRUE)
XYp['Y_decile'] <- findInterval(XYp[,'Y'],colY_deciles,all.inside = TRUE)
> colX_deciles
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
0.0 9.9 18.0 29.0 39.0 48.0 57.0 69.0 79.2 91.0 100.0
> colY_deciles
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
1000.0 1088.0 1180.0 1279.4 1392.0 1502.5 1602.4 1711.3 1805.2 1902.0 2000.0
I have figured out that it is possible to calculate the sum of elements in column p using xtabs:
> xtabs(p ~ X_decile + Y_decile, XYp)
Y_decile
X_decile 1 2 3 4 5 6 7 8 9 10
1 2.57 8.74 5.51 5.74 4.40 1.77 5.79 3.43 4.66 3.80
2 6.43 4.25 7.29 5.41 3.08 4.43 8.70 2.62 3.37 4.45
3 1.99 2.80 7.54 2.56 5.02 4.30 7.99 2.03 4.91 6.28
4 4.53 4.90 8.04 3.49 2.25 2.87 7.47 5.41 3.54 9.28
5 2.32 5.82 7.18 4.58 5.39 2.26 0.59 9.61 5.91 5.37
6 7.70 5.50 6.45 7.83 4.65 8.45 1.70 6.40 4.88 4.32
7 7.05 3.87 3.54 3.79 6.15 5.55 6.31 2.31 3.42 6.14
8 4.43 4.50 3.04 3.62 9.92 5.66 3.75 7.01 4.92 7.08
9 3.67 5.56 3.56 7.92 5.05 5.00 3.64 6.74 5.85 3.26
10 5.75 3.17 9.50 5.44 3.64 6.13 3.18 5.93 6.18 3.71
But how can I elegantly apply an arbitrary function to each cell of the cross-tabulation, for example mean(p)? The obvious attempt fails:
> xtabs(mean(p) ~ X_decile + Y_decile, XYp)
Error in model.frame.default(formula = mean(p) ~ X_decile + Y_decile, :
variable lengths differ (found for 'X_decile')
As a bonus, the values of colX_deciles[1:10] and colY_deciles[1:10] could be set as row names and column names, respectively.
I assume you want to use the XYp object throughout (in some places you used data).
I would suggest nesting an aggregate call inside xtabs:
xtabs(p ~ X_decile + Y_decile, aggregate(p ~ X_decile + Y_decile, XYp, mean))
Y_decile
X_decile 1 2 3 4 5 6 7 8 9 10
1 0.4283333 0.5826667 0.5009091 0.4100000 0.5500000 0.2950000 0.5263636 0.4900000 0.3584615 0.4222222
2 0.5358333 0.5312500 0.6627273 0.4918182 0.3850000 0.5537500 0.5800000 0.4366667 0.4814286 0.4450000
3 0.3980000 0.3500000 0.5800000 0.5120000 0.4183333 0.3583333 0.4205263 0.3383333 0.5455556 0.5233333
4 0.4118182 0.3769231 0.6700000 0.5816667 0.5625000 0.3587500 0.6225000 0.3864286 0.5900000 0.7138462
5 0.4640000 0.4476923 0.6527273 0.5088889 0.4900000 0.4520000 0.1966667 0.6006250 0.4925000 0.5370000
6 0.4812500 0.6111111 0.7166667 0.5592857 0.5166667 0.6035714 0.3400000 0.5818182 0.5422222 0.6171429
7 0.5035714 0.5528571 0.4425000 0.5414286 0.5125000 0.3964286 0.4853846 0.5775000 0.4275000 0.4723077
8 0.4430000 0.4090909 0.6080000 0.5171429 0.6200000 0.5660000 0.4687500 0.5392308 0.3784615 0.5446154
9 0.4077778 0.6177778 0.5085714 0.7200000 0.4208333 0.5000000 0.4550000 0.5616667 0.5318182 0.3622222
10 0.5227273 0.4528571 0.6785714 0.3885714 0.3640000 0.4715385 0.5300000 0.5390909 0.6866667 0.5300000
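As a possible alternative to nesting aggregate inside xtabs, tapply can apply an arbitrary function to each cell directly, and the lower decile bounds can then be set as row and column names (the "bonus" in the question). This is only a sketch under assumptions: it rebuilds the simulated data from the question, stores the deciles as factors so that all ten levels are always present, and the names x_breaks, y_breaks and cell_mean are made up for illustration.

```r
# Rebuild the simulated data from the question
set.seed(42)
N <- 1000
XYp <- data.frame(X = round(runif(N) * 100),
                  Y = round(runif(N) * 1000 + 1000),
                  p = round(runif(N), 2))

# Decile break points, computed on XYp itself
x_breaks <- quantile(XYp$X, probs = seq(0, 1, 0.1))
y_breaks <- quantile(XYp$Y, probs = seq(0, 1, 0.1))

# Decile membership as factors with fixed levels 1..10
XYp$X_decile <- factor(findInterval(XYp$X, x_breaks, all.inside = TRUE), levels = 1:10)
XYp$Y_decile <- factor(findInterval(XYp$Y, y_breaks, all.inside = TRUE), levels = 1:10)

# Apply any summary function per cell; swap mean for median, sd, length, ...
cell_mean <- tapply(XYp$p, list(XYp$X_decile, XYp$Y_decile), mean)

# Label rows and columns with the lower bound of each decile
dimnames(cell_mean) <- list(X_from = as.character(x_breaks[1:10]),
                            Y_from = as.character(y_breaks[1:10]))
round(cell_mean, 2)
```

Cells with no observations come back as NA from tapply, whereas xtabs would report them as 0.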
I have a table of stock prices here:
https://drive.google.com/file/d/1S666wiCzf-8MfgugN3IZOqCiM7tNPFh9/view?usp=sharing
Some columns have NA's because the company does not exist (until later dates), or the company folded.
What I want to do is select the columns that have no NAs. I use data.table because it is faster. Here is my working code:
example <- fread(file = "example.csv", key = "date")
example_select <- example[,
lapply(.SD,
function(x) not(sum(is.na(x) > 0)))
] %>%
as.logical(.)
example[, ..example_select]
Is there a better (shorter) way to do the same? Thank you!
Try:
example[,lapply(.SD, function(x) {if(anyNA(x)) {NULL} else {x}} )]
There are lots of ways you could do this. Here's how I usually do it - a data.table approach without lapply:
example[, .SD, .SDcols = colSums(is.na(example)) == 0]
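To see the idea on something small, here is a sketch on a made-up table rather than on example.csv. The table toy, its column names and the helper vector keep are all hypothetical. It selects the complete columns by name, which is one more variant of the same approach.

```r
library(data.table)

# Hypothetical stand-in for the stock table: B and C contain NAs
toy <- data.table(date = as.Date("2001-01-01") + 0:2,
                  A = c(1, 2, 3),
                  B = c(NA, 2, 3),
                  C = c(1, NA, NA))

# Names of the columns that contain no NA at all
keep <- names(toy)[!vapply(toy, anyNA, logical(1))]

# Select those columns; the .. prefix looks keep up outside the table
toy_complete <- toy[, ..keep]
toy_complete
```

anyNA stops at the first NA it finds, so on long columns it may be cheaper than counting every NA with sum(is.na(x)).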
An answer using tidyverse packages
library(readr)
library(dplyr)
library(purrr)
data <- read_csv("~/Downloads/example.csv")
map2_dfc(data, names(data), .f = function(x, y) {
column <- tibble("{y}" := x)
if(any(is.na(column)))
return(NULL)
else
return(column)
})
Output
# A tibble: 5,076 x 11
date ACU ACY AE AEF AIM AIRI AMS APT ARMP ASXC
<date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2001-01-02 2.75 4.75 14.4 8.44 2376 250 2.5 1.06 490000 179.
2 2001-01-03 2.75 4.5 14.5 9 2409 250 2.5 1.12 472500 193.
3 2001-01-04 2.75 4.5 14.1 8.88 2508 250 2.5 1.06 542500 301.
4 2001-01-05 2.38 4.5 14.1 8.88 2475 250 2.25 1.12 586250 301.
5 2001-01-08 2.56 4.75 14.3 8.75 2376 250 2.38 1.06 638750 276.
6 2001-01-09 2.56 4.75 14.3 8.88 2409 250 2.38 1.06 568750 264.
7 2001-01-10 2.56 5.5 14.5 8.69 2310 300 2.12 1.12 586250 274.
8 2001-01-11 2.69 5.25 14.4 8.69 2310 300 2.25 1.19 564375 333.
9 2001-01-12 2.75 4.81 14.6 8.75 2541 275 2 1.38 564375 370.
10 2001-01-16 2.75 4.88 14.9 8.94 2772 300 2.12 1.62 595000 358.
# … with 5,066 more rows
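If a recent dplyr (1.0.0 or later, which provides where()) is available, the same selection can probably be written more compactly with select. This is a sketch on a made-up tibble, not on the file from the question; toy and toy_complete are hypothetical names.

```r
library(dplyr)

# Hypothetical stand-in for the stock table: B contains an NA
toy <- tibble(date = as.Date("2001-01-01") + 0:2,
              A = c(1, 2, 3),
              B = c(NA, 2, 3))

# Keep only the columns for which anyNA() is FALSE
toy_complete <- select(toy, where(~ !anyNA(.x)))
toy_complete
```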
Using Filter :
library(data.table)
Filter(function(x) all(!is.na(x)), fread('example.csv'))
# date ACU ACY AE AEF AIM AIRI AMS APT
# 1: 2001-01-02 2.75 4.75 14.4 8.44 2376.00 250.00 2.50 1.06
# 2: 2001-01-03 2.75 4.50 14.5 9.00 2409.00 250.00 2.50 1.12
# 3: 2001-01-04 2.75 4.50 14.1 8.88 2508.00 250.00 2.50 1.06
# 4: 2001-01-05 2.38 4.50 14.1 8.88 2475.00 250.00 2.25 1.12
# 5: 2001-01-08 2.56 4.75 14.3 8.75 2376.00 250.00 2.38 1.06
# ---
#5072: 2021-03-02 36.95 10.59 28.1 8.77 2.34 1.61 2.48 14.33
#5073: 2021-03-03 38.40 10.00 30.1 8.78 2.26 1.57 2.47 12.92
#5074: 2021-03-04 37.90 8.03 30.8 8.63 2.09 1.44 2.27 12.44
#5075: 2021-03-05 35.68 8.13 31.5 8.70 2.05 1.48 2.35 12.45
#5076: 2021-03-08 37.87 8.22 31.9 8.59 2.01 1.52 2.47 12.15
# ARMP ASXC
# 1: 4.90e+05 178.75
# 2: 4.72e+05 192.97
# 3: 5.42e+05 300.62
# 4: 5.86e+05 300.62
# 5: 6.39e+05 276.25
# ---
#5072: 5.67e+00 3.92
#5073: 5.58e+00 4.54
#5074: 5.15e+00 4.08
#5075: 4.49e+00 3.81
#5076: 4.73e+00 4.15
I would like to figure out which is the winning unit of a node in the kohonen plot
library(kohonen)
set.seed(0)
data("wines")
wines <- scale(wines)
som_grid <- somgrid(8, 6, "hexagonal")
som_model <- som(wines, som_grid)
plot(som_model)
The plot will look like this:
You can find out which unit each observation is assigned to with:
head(data.frame(cbind(wines,unit= som_model$unit.classif)))
alcohol malic.acid ash ash.alkalinity magnesium tot..phenols flavonoids non.flav..phenols proanth col..int. col..hue
1 13.20 1.78 2.14 11.2 100 2.65 2.76 0.26 1.28 4.38 1.05
2 13.16 2.36 2.67 18.6 101 2.80 3.24 0.30 2.81 5.68 1.03
3 14.37 1.95 2.50 16.8 113 3.85 3.49 0.24 2.18 7.80 0.86
4 13.24 2.59 2.87 21.0 118 2.80 2.69 0.39 1.82 4.32 1.04
5 14.20 1.76 2.45 15.2 112 3.27 3.39 0.34 1.97 6.75 1.05
6 14.39 1.87 2.45 14.6 96 2.50 2.52 0.30 1.98 5.25 1.02
OD.ratio proline unit
1 3.40 1050 24
2 3.17 1185 46
3 3.45 1480 48
4 2.93 735 4
5 2.85 1450 48
6 3.58 1290 47
But I would like to show this unit information on the plot itself, for example as text inside each node giving its unit number, the way the identify function does, but automatically. Thanks in advance!
I have a data.frame with 178 rows and 14 columns. When I print it in the R console, it only shows me 71 rows, despite the max.print option being set to 1000.
Could anyone please explain why the max.print option does not let me print the full dataset in the R console? And how can I do that?
I use R 3.4.1 on MacOS.
Here is a data example:
1 1 14.23 1.71 2.43 15.6 127 2.80 3.06 0.28 2.29 5.640000 1.040 3.92 1065
2 1 13.20 1.78 2.14 11.2 100 2.65 2.76 0.26 1.28 4.380000 1.050 3.40 1050
3 1 13.16 2.36 2.67 18.6 101 2.80 3.24 0.30 2.81 5.680000 1.030 3.17 1185
4 1 14.37 1.95 2.50 16.8 113 3.85 3.49 0.24 2.18 7.800000 0.860 3.45 1480
5 1 13.24 2.59 2.87 21.0 118 2.80 2.69 0.39 1.82 4.320000 1.040 2.93 735
6 1 14.20 1.76 2.45 15.2 112 3.27 3.39 0.34 1.97 6.750000 1.050 2.85 1450
7 1 14.39 1.87 2.45 14.6 96 2.50 2.52 0.30 1.98 5.250000 1.020 3.58 1290
8 1 14.06 2.15 2.61 17.6 121 2.60 2.51 0.31 1.25 5.050000 1.060 3.58 1295
9 1 14.83 1.64 2.17 14.0 97 2.80 2.98 0.29 1.98 5.200000 1.080 2.85 1045
10 1 13.86 1.35 2.27 16.0 98 2.98 3.15 0.22 1.85 7.220000 1.010 3.55 1045
11 1 14.10 2.16 2.30 18.0 105 2.95 3.32 0.22 2.38 5.750000 1.250 3.17 1510
12 1 14.12 1.48 2.32 16.8 95 2.20 2.43 0.26 1.57 5.000000 1.170 2.82 1280
13 1 13.75 1.73 2.41 16.0 89 2.60 2.76 0.29 1.81 5.600000 1.150 2.90 1320
14 1 14.75 1.73 2.39 11.4 91 3.10 3.69 0.43 2.81 5.400000 1.250 2.73 1150
15 1 14.38 1.87 2.38 12.0 102 3.30 3.64 0.29 2.96 7.500000 1.200 3.00 1547
16 1 13.63 1.81 2.70 17.2 112 2.85 2.91 0.30 1.46 7.300000 1.280 2.88 1310
17 1 14.30 1.92 2.72 20.0 120 2.80 3.14 0.33 1.97 6.200000 1.070 2.65 1280
18 1 13.83 1.57 2.62 20.0 115 2.95 3.40 0.40 1.72 6.600000 1.130 2.57 1130
19 1 14.19 1.59 2.48 16.5 108 3.30 3.93 0.32 1.86 8.700000 1.230 2.82 1680
20 1 13.64 3.10 2.56 15.2 116 2.70 3.03 0.17 1.66 5.100000 0.960 3.36 845
21 1 14.06 1.63 2.28 16.0 126 3.00 3.17 0.24 2.10 5.650000 1.090 3.71 780
22 1 12.93 3.80 2.65 18.6 102 2.41 2.41 0.25 1.98 4.500000 1.030 3.52 770
23 1 13.71 1.86 2.36 16.6 101 2.61 2.88 0.27 1.69 3.800000 1.110 4.00 1035
24 1 12.85 1.60 2.52 17.8 95 2.48 2.37 0.26 1.46 3.930000 1.090 3.63 1015
25 1 13.50 1.81 2.61 20.0 96 2.53 2.61 0.28 1.66 3.520000 1.120 3.82 845
26 1 13.05 2.05 3.22 25.0 124 2.63 2.68 0.47 1.92 3.580000 1.130 3.20 830
27 1 13.39 1.77 2.62 16.1 93 2.85 2.94 0.34 1.45 4.800000 0.920 3.22 1195
28 1 13.30 1.72 2.14 17.0 94 2.40 2.19 0.27 1.35 3.950000 1.020 2.77 1285
29 1 13.87 1.90 2.80 19.4 107 2.95 2.97 0.37 1.76 4.500000 1.250 3.40 915
30 1 14.02 1.68 2.21 16.0 96 2.65 2.33 0.26 1.98 4.700000 1.040 3.59 1035
31 1 13.73 1.50 2.70 22.5 101 3.00 3.25 0.29 2.38 5.700000 1.190 2.71 1285
32 1 13.58 1.66 2.36 19.1 106 2.86 3.19 0.22 1.95 6.900000 1.090 2.88 1515
33 1 13.68 1.83 2.36 17.2 104 2.42 2.69 0.42 1.97 3.840000 1.230 2.87 990
34 1 13.76 1.53 2.70 19.5 132 2.95 2.74 0.50 1.35 5.400000 1.250 3.00 1235
35 1 13.51 1.80 2.65 19.0 110 2.35 2.53 0.29 1.54 4.200000 1.100 2.87 1095
36 1 13.48 1.81 2.41 20.5 100 2.70 2.98 0.26 1.86 5.100000 1.040 3.47 920
37 1 13.28 1.64 2.84 15.5 110 2.60 2.68 0.34 1.36 4.600000 1.090 2.78 880
38 1 13.05 1.65 2.55 18.0 98 2.45 2.43 0.29 1.44 4.250000 1.120 2.51 1105
39 1 13.07 1.50 2.10 15.5 98 2.40 2.64 0.28 1.37 3.700000 1.180 2.69 1020
40 1 14.22 3.99 2.51 13.2 128 3.00 3.04 0.20 2.08 5.100000 0.890 3.53 760
41 1 13.56 1.71 2.31 16.2 117 3.15 3.29 0.34 2.34 6.130000 0.950 3.38 795
42 1 13.41 3.84 2.12 18.8 90 2.45 2.68 0.27 1.48 4.280000 0.910 3.00 1035
43 1 13.88 1.89 2.59 15.0 101 3.25 3.56 0.17 1.70 5.430000 0.880 3.56 1095
44 1 13.24 3.98 2.29 17.5 103 2.64 2.63 0.32 1.66 4.360000 0.820 3.00 680
45 1 13.05 1.77 2.10 17.0 107 3.00 3.00 0.28 2.03 5.040000 0.880 3.35 885
46 1 14.21 4.04 2.44 18.9 111 2.85 2.65 0.30 1.25 5.240000 0.870 3.33 1080
47 1 14.38 3.59 2.28 16.0 102 3.25 3.17 0.27 2.19 4.900000 1.040 3.44 1065
48 1 13.90 1.68 2.12 16.0 101 3.10 3.39 0.21 2.14 6.100000 0.910 3.33 985
49 1 14.10 2.02 2.40 18.8 103 2.75 2.92 0.32 2.38 6.200000 1.070 2.75 1060
50 1 13.94 1.73 2.27 17.4 108 2.88 3.54 0.32 2.08 8.900000 1.120 3.10 1260
51 1 13.05 1.73 2.04 12.4 92 2.72 3.27 0.17 2.91 7.200000 1.120 2.91 1150
52 1 13.83 1.65 2.60 17.2 94 2.45 2.99 0.22 2.29 5.600000 1.240 3.37 1265
53 1 13.82 1.75 2.42 14.0 111 3.88 3.74 0.32 1.87 7.050000 1.010 3.26 1190
54 1 13.77 1.90 2.68 17.1 115 3.00 2.79 0.39 1.68 6.300000 1.130 2.93 1375
55 1 13.74 1.67 2.25 16.4 118 2.60 2.90 0.21 1.62 5.850000 0.920 3.20 1060
56 1 13.56 1.73 2.46 20.5 116 2.96 2.78 0.20 2.45 6.250000 0.980 3.03 1120
57 1 14.22 1.70 2.30 16.3 118 3.20 3.00 0.26 2.03 6.380000 0.940 3.31 970
58 1 13.29 1.97 2.68 16.8 102 3.00 3.23 0.31 1.66 6.000000 1.070 2.84 1270
59 1 13.72 1.43 2.50 16.7 108 3.40 3.67 0.19 2.04 6.800000 0.890 2.87 1285
60 2 12.37 0.94 1.36 10.6 88 1.98 0.57 0.28 0.42 1.950000 1.050 1.82 520
61 2 12.33 1.10 2.28 16.0 101 2.05 1.09 0.63 0.41 3.270000 1.250 1.67 680
62 2 12.64 1.36 2.02 16.8 100 2.02 1.41 0.53 0.62 5.750000 0.980 1.59 450
63 2 13.67 1.25 1.92 18.0 94 2.10 1.79 0.32 0.73 3.800000 1.230 2.46 630
64 2 12.37 1.13 2.16 19.0 87 3.50 3.10 0.19 1.87 4.450000 1.220 2.87 420
65 2 12.17 1.45 2.53 19.0 104 1.89 1.75 0.45 1.03 2.950000 1.450 2.23 355
66 2 12.37 1.21 2.56 18.1 98 2.42 2.65 0.37 2.08 4.600000 1.190 2.30 678
67 2 13.11 1.01 1.70 15.0 78 2.98 3.18 0.26 2.28 5.300000 1.120 3.18 502
68 2 12.37 1.17 1.92 19.6 78 2.11 2.00 0.27 1.04 4.680000 1.120 3.48 510
69 2 13.34 0.94 2.36 17.0 110 2.53 1.30 0.55 0.42 3.170000 1.020 1.93 750
70 2 12.21 1.19 1.75 16.8 151 1.85 1.28 0.14 2.50 2.850000 1.280 3.07 718
71 2 12.29 1.61 2.21 20.4 103 1.10 1.02 0.37 1.46 3.050000 0.906 1.82 870
[ reached getOption("max.print") -- omitted 107 rows ]
Try this command:
options(max.print = 99999)
Put this line at the start of your R script. It worked for me:
options(max.print = .Machine$integer.max)
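The reason 1000 is not enough is that max.print counts entries (cells), not rows: a data frame is printed only up to max.print %/% ncol complete rows. The arithmetic below is a small sketch using the numbers from the question; wine_df is a hypothetical name for the data frame being printed.

```r
n_rows <- 178
n_cols <- 14

# With max.print = 1000, only this many complete rows fit
rows_shown <- 1000 %/% n_cols
rows_omitted <- n_rows - rows_shown
c(shown = rows_shown, omitted = rows_omitted)

# Raise the limit to cover every cell, print, then restore the old setting
old_opts <- options(max.print = n_rows * n_cols)
# print(wine_df)  # wine_df is a placeholder for your own data frame
options(old_opts)
```

This reproduces the 71 rows shown and the 107 rows omitted in the output above.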
I'm munging data. Specifically, I opened this PDF, http://pubs.acs.org/doi/suppl/10.1021/ja105035r/suppl_file/ja105035r_si_001.pdf, and scraped the data from Table S4:
1a 1b 1a 1b
1 5.27 4.76 5.09 4.75
2 2.47 2.74 2.77 2.80
4 1.14 1.38 1.12 1.02
6 7.43 7.35 7.22-7.35a 7.25-7.36a
7 7.38 7.34 7.22-7.35a 7.25-7.36a
8 7.23 7.20 7.22-7.35a 7.25-7.36a
9(R) 4.16 3.89 4.12b 4.18b
9(S) 4.16 3.92 4.12b 4.18b
10 1.19 0.91 1.21 1.25
I pasted it into Notepad and saved it as a txt file. Reading it with
s4 <- read.table("s4.txt", header=TRUE, stringsAsFactors=FALSE)
gives,
X1a X1b X1a.1 X1b.1
1 5.27 4.76 5.09 4.75
2 2.47 2.74 2.77 2.80
4 1.14 1.38 1.12 1.02
6 7.43 7.35 7.22-7.35a 7.25-7.36a
7 7.38 7.34 7.22-7.35a 7.25-7.36a
8 7.23 7.20 7.22-7.35a 7.25-7.36a
In order to use the data I need to change it all to numeric and remove the letters. Thanks to this link, R regex gsub separate letters and numbers, I can use the following code:
gsub("([[:alpha:]])","",s4[,3])
I can get rid of the extraneous letters.
What I want to do now, and the point of the question, is to change the ranges,
"7.22-7.35" "7.22-7.35" "7.22-7.35"
with their means,
"7.29"
Could I use gsub for this? (or would I need to strsplit across the hyphen, combine into a vector and return the mean?).
You need a single regex in strsplit for this task (removing letters and splitting):
s4[] <- lapply(s4, function(x) {
if (is.numeric(x)) x
else sapply(strsplit(as.character(x), "-|[[:alpha:]]"),
function(y) mean(as.numeric(y)))
})
The result:
> s4
X1a X1b X1a.1 X1b.1
1 5.27 4.76 5.090 4.750
2 2.47 2.74 2.770 2.800
4 1.14 1.38 1.120 1.020
6 7.43 7.35 7.285 7.305
7 7.38 7.34 7.285 7.305
8 7.23 7.20 7.285 7.305
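To check the regex on individual strings, here is a small sketch that wraps the same split in a helper. The function name range_mean is made up for illustration, and the sketch assumes the values are non-negative, since a leading minus sign would be treated as a range separator.

```r
# Split each string on hyphens and letters, then average the numeric pieces
range_mean <- function(x) {
  parts <- strsplit(as.character(x), "-|[[:alpha:]]")
  # Drop empty pieces (e.g. from a leading letter) before converting
  vapply(parts, function(p) mean(as.numeric(p[nzchar(p)])), numeric(1))
}

range_mean(c("5.09", "7.22-7.35a", "4.12b"))
# 5.090 7.285 4.120
```

Because each element is split and averaged on its own, this also works when the ranges within a column differ from each other.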
Here's an approach that seems to work right on the sample data:
df[] <- lapply(df, function(col){
col <- gsub("([[:alpha:]])","", col)
col <- ifelse(grepl("-", col), mean(as.numeric(unlist(strsplit(col[grepl("-", col)], "-")))), col)
as.numeric(col)
})
> df
# X1a X1b X1a.1 X1b.1
#1 5.27 4.76 5.090 4.750
#2 2.47 2.74 2.770 2.800
#4 1.14 1.38 1.120 1.020
#6 7.43 7.35 7.285 7.305
#7 7.38 7.34 7.285 7.305
#8 7.23 7.20 7.285 7.305
Disclaimer: It only works right if the ranges in each column are all the same (as in the sample data)
Something like this:
mean(as.numeric(unlist(strsplit("7.22-7.35","-"))))
should work (and corresponds to what you had in mind, I guess).
Or you can do:
eval(parse(text=paste0("mean(c(",gsub("-",",","7.22-7.35"),"))")))
but I'm not sure this is simpler...
To apply it to a vector:
vec <- c("7.22-7.35","7.22-7.35")
# 1st solution
sapply(vec, function(x) mean(as.numeric(unlist(strsplit(x,"-")))))
# 2nd solution
sapply(vec, function(x) eval(parse(text=paste0("mean(c(",gsub("-",",",x),"))"))))
In both cases, you'll get:
7.22-7.35 7.22-7.35
7.285 7.285
Also,
library(gsubfn)
indx <- !sapply(s4, is.numeric)
s4[indx] <- lapply(s4[indx], function(x)
sapply(strapply(x, '([0-9.]+)', ~as.numeric(x)), mean))
s4
# X1a X1b X1a.1 X1b.1
#1 5.27 4.76 5.090 4.750
#2 2.47 2.74 2.770 2.800
#4 1.14 1.38 1.120 1.020
#6 7.43 7.35 7.285 7.305
#7 7.38 7.34 7.285 7.305
#8 7.23 7.20 7.285 7.305
Return a row value when a certain number of columns reach a certain value, from the following table:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 3.93 3.92 3.74 4.84 4.55 4.67 3.99 4.10 4.86 4.06
2 4.00 3.99 3.81 4.90 4.61 4.74 4.04 4.15 4.92 4.11
3 4.67 4.06 3.88 5.01 4.66 4.80 4.09 4.20 4.98 4.16
4 4.73 4.12 3.96 5.03 4.72 4.85 4.14 4.25 5.04 4.21
5 4.79 4.21 4.04 5.09 4.77 4.91 4.18 4.30 5.10 4.26
6 4.86 4.29 4.12 5.15 4.82 4.96 4.23 4.35 5.15 4.30
7 4.92 4.37 4.19 5.21 4.87 5.01 4.27 4.39 5.20 4.35
8 4.98 4.43 4.25 5.26 4.91 5.12 4.31 4.43 5.25 4.38
9 5.04 4.49 4.31 5.30 4.95 5.15 4.34 4.46 5.29 4.41
10 5.04 4.50 4.49 5.31 5.01 5.17 4.50 4.60 5.30 4.45
11 ...
12 ...
As output, I need a data frame containing, for each row, the percentage of columns V1-V10 that reach the value of interest ('5' in this example):
Rownum Percent
1 0
2 0
3 10
4 20
5 20
6 20
7 30
8 30
9 40
10 50
Many thanks!
If your matrix is mat:
cbind(1:dim(mat)[1],rowSums(mat>5)/dim(mat)[2]*100)
As long as the data contain only 0s and 1s in ten columns, I would multiply the whole dataset by 10 (each 1 is then worth 10 percentage points) and sum the rows. Just use the following code:
# Sample data
set.seed(10)
data <- as.data.frame(do.call("rbind", lapply(seq(9), function(...) {
sample(c(0, 1), 10, replace = TRUE)
})))
rownames(data) <- c("abc", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx", "yza")
# Percentages
rowSums(data * 10)
# abc def ghi jkl mno pqr stu vwx yza
# 80 40 80 60 60 10 30 50 50
Ok, so now I believe you want to get the percentage of values in each row that meet some threshold criteria. You give the example > 5. One solution of many is using apply:
apply( df , 1 , function(x) sum( x > 5 )/length(x)*100 )
# 1 2 3 4 5 6 7 8 9 10
# 0 0 10 20 20 20 30 30 40 50
#Thomas' solution will be faster for large data.frames because it converts to a matrix first, and these are faster to operate on.
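Putting the threshold approach together on the ten rows shown in the question, a sketch could look like the following. The object names mat and result are made up, and the threshold of 5 is taken from the question.

```r
# The ten rows from the question, entered row by row
mat <- matrix(c(
  3.93, 3.92, 3.74, 4.84, 4.55, 4.67, 3.99, 4.10, 4.86, 4.06,
  4.00, 3.99, 3.81, 4.90, 4.61, 4.74, 4.04, 4.15, 4.92, 4.11,
  4.67, 4.06, 3.88, 5.01, 4.66, 4.80, 4.09, 4.20, 4.98, 4.16,
  4.73, 4.12, 3.96, 5.03, 4.72, 4.85, 4.14, 4.25, 5.04, 4.21,
  4.79, 4.21, 4.04, 5.09, 4.77, 4.91, 4.18, 4.30, 5.10, 4.26,
  4.86, 4.29, 4.12, 5.15, 4.82, 4.96, 4.23, 4.35, 5.15, 4.30,
  4.92, 4.37, 4.19, 5.21, 4.87, 5.01, 4.27, 4.39, 5.20, 4.35,
  4.98, 4.43, 4.25, 5.26, 4.91, 5.12, 4.31, 4.43, 5.25, 4.38,
  5.04, 4.49, 4.31, 5.30, 4.95, 5.15, 4.34, 4.46, 5.29, 4.41,
  5.04, 4.50, 4.49, 5.31, 5.01, 5.17, 4.50, 4.60, 5.30, 4.45),
  nrow = 10, byrow = TRUE,
  dimnames = list(NULL, paste0("V", 1:10)))

# Percentage of columns in each row that exceed the threshold
threshold <- 5
result <- data.frame(Rownum = seq_len(nrow(mat)),
                     Percent = 100 * rowSums(mat > threshold) / ncol(mat))
result$Percent
# 0 0 10 20 20 20 30 30 40 50
```

Use >= instead of > if a value exactly equal to the threshold should count as reaching it.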