I need to repeat, 1000 times, a random sampling of one point per grid polygon over a raster, for grid sizes from 5 to 25.
With a 50 x 50 raster (2500 cells) the process takes more than an hour with the following code:
library(raster)
library(dplyr)
# This is the script for random sampling inside the grid cells
sample_grid <- function(r, w, n){
  grid <- raster(extent(r))
  res(grid) <- w
  proj4string(grid) <- proj4string(r)
  gridpolygon <- rasterToPolygons(grid)
  pickpts <- sapply(gridpolygon@polygons, spsample, n = n, type = "random")
  sapply(pickpts, FUN = extract, x = r)
}
# Let's make a raster
r <- raster(ncol = 50, nrow = 50, xmn = 0, xmx = 50, ymn = 0, ymx = 50)
values(r) <- runif(ncell(r))
# Repeat the random sampling process 1000 times for different grid sizes
sapply(5:25, function(x) replicate(1000, sample_grid(r, x, 1) %>%
mean(., na.rm = TRUE)))
I would like to make it faster. A reasonable target would be about 15 minutes.
Do you have any suggestions?
This is the output for Rprof
Rprof(tmp <- tempfile())
sample_grid(r, 10, 1) %>% mean(., na.rm = TRUE)
Rprof()
summaryRprof(tmp)
#################### summaryRprof output ####################
$by.self
self.time self.pct total.time total.pct
"eval" 0.02 14.29 0.14 100.00
"initialize" 0.02 14.29 0.06 42.86
"getClassDef" 0.02 14.29 0.04 28.57
".getClassFromCache" 0.02 14.29 0.02 14.29
"aperm" 0.02 14.29 0.02 14.29
"merge.data.frame" 0.02 14.29 0.02 14.29
"validityMethod" 0.02 14.29 0.02 14.29
$by.total
total.time total.pct self.time self.pct
"eval" 0.14 100.00 0.02 14.29
"%>%" 0.14 100.00 0.00 0.00
".local" 0.14 100.00 0.00 0.00
"FUN" 0.14 100.00 0.00 0.00
"lapply" 0.14 100.00 0.00 0.00
"sample_grid" 0.14 100.00 0.00 0.00
"sapply" 0.14 100.00 0.00 0.00
"standardGeneric" 0.14 100.00 0.00 0.00
"initialize" 0.06 42.86 0.02 14.29
"new" 0.06 42.86 0.00 0.00
"getClassDef" 0.04 28.57 0.02 14.29
".cellValues" 0.04 28.57 0.00 0.00
".readCells" 0.04 28.57 0.00 0.00
".xyValues" 0.04 28.57 0.00 0.00
"CRS" 0.04 28.57 0.00 0.00
"over" 0.04 28.57 0.00 0.00
"sample.Polygon" 0.04 28.57 0.00 0.00
"validObject" 0.04 28.57 0.00 0.00
".getClassFromCache" 0.02 14.29 0.02 14.29
"aperm" 0.02 14.29 0.02 14.29
"merge.data.frame" 0.02 14.29 0.02 14.29
"validityMethod" 0.02 14.29 0.02 14.29
".bboxCoords" 0.02 14.29 0.00 0.00
".uniqueNames" 0.02 14.29 0.00 0.00
"[" 0.02 14.29 0.00 0.00
"anyStrings" 0.02 14.29 0.00 0.00
"apply" 0.02 14.29 0.00 0.00
"as.matrix" 0.02 14.29 0.00 0.00
"identical" 0.02 14.29 0.00 0.00
"identicalCRS" 0.02 14.29 0.00 0.00
"is" 0.02 14.29 0.00 0.00
"match.arg" 0.02 14.29 0.00 0.00
"merge" 0.02 14.29 0.00 0.00
"merge.default" 0.02 14.29 0.00 0.00
"names" 0.02 14.29 0.00 0.00
"SpatialPolygons" 0.02 14.29 0.00 0.00
"stopifnot" 0.02 14.29 0.00 0.00
"t" 0.02 14.29 0.00 0.00
"table" 0.02 14.29 0.00 0.00
"validNames" 0.02 14.29 0.00 0.00
$sample.interval
[1] 0.02
$sampling.time
[1] 0.14
###############################################################
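The profile above suggests most of the time goes into building and validating sp objects (rasterToPolygons, spsample, initialize, getClassDef) rather than into the sampling itself. Below is a hedged sketch of an alternative that skips the polygon machinery entirely, assuming the grid width w is a whole number of raster cells (which matches the example, where the resolution is 1): pick one random raster cell per w x w block and read its value from the value matrix.
# Sketch only, not a drop-in replacement: assumes w is expressed in raster cells.
sample_grid_fast <- function(r, w) {
  vals <- as.matrix(r)                                    # raster values as a row x col matrix
  rows <- split(seq_len(nrow(r)), ceiling(seq_len(nrow(r)) / w))
  cols <- split(seq_len(ncol(r)), ceiling(seq_len(ncol(r)) / w))
  pick <- function(idx) idx[sample.int(length(idx), 1)]   # safe draw even from a length-1 vector
  unlist(lapply(rows, function(ri)
    vapply(cols, function(ci) vals[pick(ri), pick(ci)], numeric(1))))
}
# Same outer loop as before:
# sapply(5:25, function(x) replicate(1000, mean(sample_grid_fast(r, x), na.rm = TRUE)))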
I am struggling with an optimization problem involving a simple matrix operation. The task is the following: I have a square matrix D containing "damage multipliers" stemming from a production reduction in producing countries (columns) and felt by "receiving" countries (rows).
AUT BEL BGR CYP CZE DEU DNK ESP EST FIN FRA GBR GRC HRV HUN IRL ITA LTU LUX LVA MLT NLD POL PRT ROU SVK SVN SWE
AUT 1.48 0.15 0.18 0.08 0.19 0.22 0.01 0.01 0.02 0.02 0.05 0.01 0.01 0.02 0.14 0.00 0.02 0.03 0.02 0.02 0.00 0.04 0.10 0.09 0.11 0.16 0.17 0.11
BEL 0.03 2.70 0.34 0.09 0.05 0.03 0.02 0.01 0.04 0.09 0.09 0.02 0.01 0.01 0.03 0.01 0.01 0.03 0.08 0.02 0.00 0.04 0.03 0.37 0.09 0.07 0.15 0.29
BGR 0.01 0.02 9.81 0.09 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.06 0.00 0.01 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.02 0.12 0.01 0.00 0.01
CYP 0.00 0.01 0.01 9.87 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
CZE 0.19 0.11 0.08 0.07 4.14 0.27 0.01 0.00 0.01 0.01 0.03 0.01 0.00 0.00 0.05 0.00 0.03 0.05 0.01 0.01 0.00 0.02 0.32 0.07 0.03 2.57 0.05 0.05
DEU 0.29 2.54 0.27 0.15 0.19 1.71 0.10 0.04 0.06 0.22 0.22 0.09 0.03 0.02 0.11 0.03 0.08 0.12 0.08 0.07 0.00 0.28 0.28 0.55 0.25 0.26 0.11 1.09
DNK 0.01 0.09 0.02 0.09 0.01 0.14 3.43 0.00 0.02 0.12 0.02 0.02 0.00 0.00 0.01 0.00 0.01 0.02 0.01 0.02 0.00 0.01 0.03 0.05 0.01 0.01 0.01 1.39
ESP 0.02 0.26 0.06 0.05 0.02 0.03 0.02 2.72 0.45 0.04 0.22 0.05 0.04 0.01 0.01 0.05 0.06 0.02 0.01 0.01 0.00 0.02 0.03 1.28 0.05 0.02 0.01 0.32
EST 0.00 0.01 0.00 0.03 0.00 0.00 0.00 0.00 5.03 0.17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05 0.00 0.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05
FIN 0.01 0.09 0.02 0.03 0.01 0.01 0.06 0.00 0.21 5.48 0.01 0.01 0.00 0.00 0.00 0.01 0.00 0.02 0.01 0.02 0.00 0.01 0.02 0.05 0.01 0.01 0.00 1.99
FRA 0.04 0.89 0.11 0.13 0.03 0.08 0.03 0.18 0.04 0.08 5.19 0.05 0.02 0.01 0.03 0.05 0.06 0.06 0.03 0.03 0.00 0.14 0.04 0.54 0.08 0.04 0.03 0.79
GBR 0.03 0.80 0.09 2.13 0.03 0.05 0.12 0.08 0.03 0.30 0.15 3.13 0.02 0.01 0.02 0.41 0.02 0.12 0.02 0.05 0.00 0.19 0.06 0.36 0.05 0.04 0.02 2.28
GRC 0.00 0.04 0.14 0.26 0.00 0.00 0.00 0.01 0.00 0.00 0.01 0.00 2.10 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.03 0.03 0.00 0.00 0.02
HRV 0.19 0.01 0.01 0.03 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.25 0.03 0.00 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.00 0.09 0.01
HUN 0.29 0.07 0.08 0.17 0.30 0.08 0.02 0.00 0.01 0.01 0.06 0.00 0.00 0.01 4.83 0.00 0.01 0.09 0.01 0.05 0.00 0.01 0.05 0.04 0.13 0.23 0.06 0.04
IRL 0.00 0.03 0.01 0.06 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.03 0.00 0.00 0.00 1.80 0.00 0.00 0.01 0.00 0.00 0.00 0.01 0.02 0.00 0.00 0.00 0.03
ITA 0.76 0.46 0.40 0.20 0.06 0.24 0.02 0.18 0.04 0.05 0.19 0.03 0.14 0.06 0.06 0.06 4.16 0.05 0.02 0.07 0.00 0.14 0.05 0.37 0.15 0.08 0.21 0.34
LTU 0.00 0.02 0.01 0.01 0.00 0.00 0.00 0.00 0.02 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.18 0.00 0.03 0.00 0.00 0.01 0.01 0.00 0.00 0.00 0.02
LUX 0.00 0.14 0.00 0.02 0.00 0.02 0.00 0.00 0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.04 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.01 0.01
LVA 0.00 0.01 0.00 0.03 0.00 0.00 0.00 0.00 0.05 0.15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.16 0.00 6.77 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.03
MLT 0.00 0.00 0.00 0.09 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.67 0.00 0.00 0.00 0.00 0.00 0.00 0.01
NLD 0.02 0.86 0.07 0.08 0.02 0.04 0.03 0.01 0.03 0.11 0.08 0.03 0.01 0.01 0.02 0.05 0.01 0.07 0.03 0.02 0.00 2.03 0.03 0.23 0.04 0.03 0.02 0.43
POL 0.02 0.09 0.03 0.19 0.16 0.13 0.01 0.01 0.01 0.02 0.06 0.01 0.00 0.00 0.02 0.00 0.01 0.33 0.00 0.03 0.00 0.01 2.18 0.05 0.02 0.11 0.01 0.11
PRT 0.00 0.05 0.01 0.10 0.00 0.03 0.01 0.07 0.02 0.01 0.04 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.53 0.01 0.00 0.00 0.07
ROU 0.04 0.06 0.89 0.13 0.04 0.02 0.00 0.00 0.00 0.01 0.01 0.00 0.01 0.00 0.31 0.00 0.02 0.02 0.00 0.01 0.00 0.00 0.03 0.04 10.52 0.06 0.01 0.03
SVK 0.23 0.04 0.02 0.08 1.12 0.60 0.00 0.00 0.00 0.01 0.32 0.00 0.00 0.00 0.11 0.00 0.00 0.07 0.00 0.02 0.00 0.00 0.34 0.03 0.03 7.06 0.02 0.03
SVN 0.13 0.01 0.02 0.01 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02 0.01 0.00 0.05 0.01 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.01 6.77 0.01
SWE 0.02 0.20 0.05 0.08 0.02 0.03 0.26 0.01 0.12 0.90 0.04 0.03 0.00 0.01 0.01 0.01 0.01 0.03 0.01 0.06 0.00 0.02 0.05 0.12 0.03 0.02 0.02 8.05
The values represent the effect of a unitary shock in production: i.e., if country AUT reduces production by one unit, the damage felt in country DEU is 0.29. Hence, the matrix can be seen as a symmetric network of production effects between countries.
My goal is to find the optimal weights of a weighted unitary shock (i.e. weighting the columns so that the total reduction of production summed over all countries equals 1) that:
ensures a certain distribution of damage across receiving (row) countries (i.e. the row sums), let's say an equal distribution,
while at the same time minimizing the damage in the overall economic system (a small code illustration of these quantities follows).
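As a small illustration of the quantities just described (assuming D is the matrix printed above), the damage felt by each receiving country under a weight vector w is the row sums of the column-scaled matrix, which equals D %*% w:
# Purely illustrative, assuming D is the 28 x 28 matrix shown above.
w <- rep(1 / ncol(D), ncol(D))            # a weighted unitary shock: sum(w) == 1
damage_by_country <- as.vector(D %*% w)   # row sums of t(t(D) * w): damage felt per row country
total_damage <- sum(damage_by_country)    # overall damage in the system
damage_share <- damage_by_country / total_damage   # the distribution to be steered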
I've tried solving it as a simple non-linear optimization problem with equality constraints, using the package Rsolnp:
# objective function to be minimized (global damage)
damage <- function(weights) {
  D_weighted <- t(t(D) * weights)
  return(sum(D_weighted))
}
# constraints (combined in one function):
constr <- function(weights) {
  # constraint 1: sum of weights needs to be 1
  c1 = sum(weights)
  # constraint 2: equal distribution in damage outcome
  D_weighted <- t(t(D) * weights)
  damage_per_country <- rowSums(D_weighted) / sum(D_weighted)
  c2 = damage_per_country / sum(D_weighted)
  return(c(c1, c2))
}
# target distribution of damage outcome (for example: equal distribution)
targ_dist <- rep(1 / ncol(D), ncol(D))
# starting weights (start with the same production reduction in every country)
startweights <- rep(1 / ncol(D), ncol(D))
# run optimization with Rsolnp
opt_weights <- solnp(pars = startweights, fun = damage, eqfun = constr,
                     eqB = c(1, targ_dist), LB = rep(0, ncol(D)), UB = rep(1, ncol(D)),
                     control = list(outer.iter = 1000, trace = 0, tol = 0.001))
but it doesn't converge and returns a warning message:
"The linearized problem has no feasible solution. The problem may not be feasible".
Changing the tolerance doesn't solve the problem. It might be that this solver is not suited to this kind of problem, or that I need to reformulate the problem completely. I'd be grateful for any help!
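Since the total damage is linear in the weights (it equals colSums(D) %*% w) and the damage felt per country is D %*% w, the equal-share requirement D %*% w = targ_dist * sum(D %*% w) is also linear, so one possible reformulation is a linear program. A hedged sketch using lpSolve (an assumption on my part, not the asker's Rsolnp setup):
# Sketch, not a verified solution: minimize total damage subject to
#   (D - targ_dist %o% colSums(D)) %*% w = 0   (equal damage shares)
#   sum(w) = 1, w >= 0
library(lpSolve)
cS  <- colSums(D)
A   <- rbind(D - targ_dist %o% cS, rep(1, ncol(D)))
sol <- lp("min", objective.in = cS, const.mat = A,
          const.dir = rep("=", nrow(A)), const.rhs = c(rep(0, nrow(D)), 1))
sol$status     # 0 means a feasible optimum was found; 2 means the constraints are infeasible
sol$solution   # the weights, if feasible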
Occasionally, we find novice R programmers building data frames in a for loop, usually by initializing an empty data frame and then iteratively calling rbind. In response to this inefficient approach, we often cite Patrick Burns' R Inferno, Circle 2: Growing Objects, which emphasizes the hazard of this pattern.
In Python pandas (the other open-source data science tool), experts have pointed out the quadratic copying, O(N^2), behind repeated appends (@unutbu here, @Alexander here). Additionally, the docs (see the section note) stress the copying cost of growing datasets, and the wiki explains that Python's list.append does not have this copy problem. I wonder if similar considerations apply to R.
Specifically, my question:
Can timing alone illustrate or quantify the growing-object-in-a-loop problem? See the microbenchmark results below. Burns uses timings to illustrate the computational cost of growing a sequence.
Or does memory usage illustrate or quantify the problem? See the Rprof results below. Burns cites using Rprof to show memory consumption within code.
Or is the growing-object problem context-specific, with the general rule of thumb being to avoid loops when building objects?
Consider the following examples of growing a random data frame of 500 rows per iteration, in a loop versus using a list:
grow_df_loop <- function(n) {
  final_df <- data.frame()
  for(i in 1:n) {
    df <- data.frame(
      group = sample(c("sas", "stata", "spss", "python", "r", "julia"), 500, replace=TRUE),
      int = sample(1:15, 500, replace=TRUE),
      num = rnorm(500),
      char = replicate(500, paste(sample(c(LETTERS, letters, c(0:9)), 3, replace=TRUE), collapse="")),
      bool = sample(c(TRUE, FALSE), 500, replace=TRUE),
      date = as.Date(sample(10957:as.integer(Sys.Date()), 500, replace=TRUE), origin="1970-01-01")
    )
    final_df <- rbind(final_df, df)
  }
  return(final_df)
}
grow_df_list <- function(n) {
  df_list <- lapply(1:n, function(i)
    data.frame(
      group = sample(c("sas", "stata", "spss", "python", "r", "julia"), 500, replace=TRUE),
      int = sample(1:15, 500, replace=TRUE),
      num = rnorm(500),
      char = replicate(500, paste(sample(c(LETTERS, letters, c(0:9)), 3, replace=TRUE), collapse="")),
      bool = sample(c(TRUE, FALSE), 500, replace=TRUE),
      date = as.Date(sample(10957:as.integer(Sys.Date()), 500, replace=TRUE), origin="1970-01-01")
    )
  )
  final_df <- do.call(rbind, df_list)
  return(final_df)
}
Timing
Benchmarking by timing confirms that the list approach is more efficient across different numbers of iterations. But given these reproducible, uniform data examples, can timing results capture the difference caused by object growth?
library(microbenchmark)
microbenchmark(grow_df_loop(50), grow_df_list(50), times = 5L)
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# grow_df_loop(50) 758.2412 762.3489 809.8988 793.3590 806.4191 929.1256 5 b
# grow_df_list(50) 554.3722 562.1949 577.6891 568.7658 589.8565 613.2560 5 a
microbenchmark(grow_df_loop(100), grow_df_list(100), times = 5L)
# Unit: seconds
# expr min lq mean median uq max neval cld
# grow_df_loop(100) 2.223617 2.225441 2.425668 2.233529 2.677309 2.768447 5 b
# grow_df_list(100) 1.211181 1.255191 1.325670 1.287821 1.396905 1.477252 5 a
microbenchmark(grow_df_loop(500), grow_df_list(500), times = 5L)
# Unit: seconds
# expr min lq mean median uq max neval cld
# grow_df_loop(500) 38.78245 39.74367 41.54976 40.10221 44.36565 44.75483 5 b
# grow_df_list(500) 13.37076 13.90227 14.67498 14.53042 15.49942 16.07203 5 a
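One way to address the concern that uniform data generation may mask the growth cost (a hedged sketch; make_piece() is a hypothetical helper wrapping the data.frame() call used in the functions above) is to build the 500-row pieces once and time only the binding step:
# make_piece() is assumed to contain the data.frame(...) call shown earlier.
pieces <- replicate(500, make_piece(), simplify = FALSE)
microbenchmark(
  loop = { out <- data.frame(); for (p in pieces) out <- rbind(out, p) },
  list = do.call(rbind, pieces),
  times = 5L
)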
Memory Usage
Additionally, memory profiling shows the "rbind" mem.total growing sizeably with the number of iterations, and much more so for the loop approach than for the list approach. Given a reproducible, uniform example, can mem.total results capture the difference caused by object growth? Is there any other approach I could use?
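For memory, an alternative to Rprof worth considering (hedged; assumes the bench package is available and R was built with memory profiling support) is bench::mark(), which reports total allocations per expression:
library(bench)
bench::mark(
  loop = grow_df_loop(100),
  list = grow_df_list(100),
  check = FALSE,    # results contain random data, so don't compare outputs
  iterations = 5
)[, c("expression", "median", "mem_alloc", "n_gc")]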
Loop Approach
n = 50
utils::Rprof(tmp <- tempfile(), memory.profiling = TRUE)
output_df1 <- grow_df_loop(50)
utils::Rprof(NULL)
summaryRprof(tmp, memory="both")
unlink(tmp)
# $by.total
# total.time total.pct mem.total self.time self.pct
# "grow_df_loop" 0.58 100.00 349.1 0.00 0.00
# "data.frame" 0.38 65.52 209.4 0.00 0.00
# "paste" 0.28 48.28 186.4 0.06 10.34
# "FUN" 0.26 44.83 150.8 0.02 3.45
# "lapply" 0.26 44.83 150.8 0.00 0.00
# "replicate" 0.26 44.83 150.8 0.00 0.00
# "sapply" 0.26 44.83 150.8 0.00 0.00
# "sample" 0.20 34.48 131.4 0.08 13.79
# "rbind" 0.20 34.48 139.7 0.00 0.00
# "[<-.factor" 0.12 20.69 66.0 0.10 17.24
# "[<-" 0.12 20.69 66.0 0.00 0.00
# "factor" 0.10 17.24 47.8 0.04 6.90
# "as.data.frame" 0.10 17.24 48.5 0.00 0.00
# "as.data.frame.character" 0.10 17.24 48.5 0.00 0.00
# "order" 0.06 10.34 12.9 0.06 10.34
# "as.vector" 0.04 6.90 38.7 0.04 6.90
# "sample.int" 0.04 6.90 18.7 0.02 3.45
# "as.vector.factor" 0.04 6.90 38.7 0.00 0.00
# "deparse" 0.04 6.90 35.6 0.00 0.00
# "!" 0.02 3.45 18.7 0.02 3.45
# ":" 0.02 3.45 0.0 0.02 3.45
# "anyNA" 0.02 3.45 19.0 0.02 3.45
# "as.POSIXlt.POSIXct" 0.02 3.45 10.1 0.02 3.45
# "c" 0.02 3.45 19.8 0.02 3.45
# "is.na" 0.02 3.45 18.9 0.02 3.45
# "length" 0.02 3.45 13.8 0.02 3.45
# "mode" 0.02 3.45 16.6 0.02 3.45
# "%in%" 0.02 3.45 16.6 0.00 0.00
# ".deparseOpts" 0.02 3.45 19.0 0.00 0.00
# "as.Date" 0.02 3.45 10.1 0.00 0.00
# "as.POSIXlt" 0.02 3.45 10.1 0.00 0.00
# "Sys.Date" 0.02 3.45 10.1 0.00 0.00
#
# $sample.interval
# [1] 0.02
#
# $sampling.time
# [1] 0.58
n = 100
# $by.total
# total.time total.pct mem.total self.time self.pct
# "grow_df_loop" 1.74 98.86 963.0 0.00 0.00
# "rbind" 1.06 60.23 599.3 0.06 3.41
# "data.frame" 0.68 38.64 363.7 0.02 1.14
# "lapply" 0.50 28.41 239.0 0.04 2.27
# "replicate" 0.50 28.41 239.0 0.00 0.00
# "sapply" 0.50 28.41 239.0 0.00 0.00
# "paste" 0.46 26.14 218.4 0.06 3.41
# "FUN" 0.46 26.14 218.4 0.00 0.00
# "factor" 0.44 25.00 249.2 0.24 13.64
# "sample" 0.40 22.73 179.2 0.10 5.68
# "[<-" 0.38 21.59 244.3 0.00 0.00
# "[<-.factor" 0.34 19.32 229.5 0.30 17.05
# "c" 0.26 14.77 136.6 0.26 14.77
# "as.vector" 0.24 13.64 101.2 0.24 13.64
# "as.vector.factor" 0.24 13.64 101.2 0.00 0.00
# "order" 0.14 7.95 87.3 0.14 7.95
# "as.data.frame" 0.14 7.95 87.3 0.00 0.00
# "as.data.frame.character" 0.14 7.95 87.3 0.00 0.00
# "sample.int" 0.10 5.68 28.2 0.10 5.68
# "unique" 0.10 5.68 64.9 0.00 0.00
# "is.na" 0.06 3.41 62.4 0.06 3.41
# "unique.default" 0.04 2.27 42.4 0.04 2.27
# "[<-.Date" 0.04 2.27 14.9 0.00 0.00
# ".Call" 0.02 1.14 0.0 0.02 1.14
# "Make.row.names" 0.02 1.14 0.0 0.02 1.14
# "NextMethod" 0.02 1.14 0.0 0.02 1.14
# "structure" 0.02 1.14 10.3 0.02 1.14
# "unclass" 0.02 1.14 14.9 0.02 1.14
# ".Date" 0.02 1.14 0.0 0.00 0.00
# ".rs.enqueClientEvent" 0.02 1.14 0.0 0.00 0.00
# "as.Date" 0.02 1.14 23.2 0.00 0.00
# "as.Date.character" 0.02 1.14 23.2 0.00 0.00
# "as.Date.numeric" 0.02 1.14 23.2 0.00 0.00
# "charToDate" 0.02 1.14 23.2 0.00 0.00
# "hook" 0.02 1.14 0.0 0.00 0.00
# "is.na.POSIXlt" 0.02 1.14 23.2 0.00 0.00
# "utils::Rprof" 0.02 1.14 0.0 0.00 0.00
#
# $sample.interval
# [1] 0.02
#
# $sampling.time
# [1] 1.76
n = 500
# $by.total
# total.time total.pct mem.total self.time self.pct
# "grow_df_loop" 28.12 100.00 15557.7 0.00 0.00
# "rbind" 25.30 89.97 13418.5 3.06 10.88
# "factor" 8.94 31.79 5026.5 6.98 24.82
# "[<-" 8.72 31.01 4486.9 0.02 0.07
# "[<-.factor" 7.62 27.10 3915.5 7.32 26.03
# "unique" 3.06 10.88 2060.9 0.00 0.00
# "as.vector" 2.96 10.53 1250.1 2.96 10.53
# "as.vector.factor" 2.96 10.53 1250.1 0.00 0.00
# "data.frame" 2.82 10.03 2139.1 0.02 0.07
# "unique.default" 2.30 8.18 1657.9 2.30 8.18
# "replicate" 1.88 6.69 1364.7 0.00 0.00
# "sapply" 1.88 6.69 1364.7 0.00 0.00
# "FUN" 1.84 6.54 1367.2 0.18 0.64
# "lapply" 1.84 6.54 1338.8 0.02 0.07
# "paste" 1.70 6.05 1281.3 0.38 1.35
# "sample" 1.36 4.84 1089.2 0.20 0.71
# "[<-.Date" 1.08 3.84 571.4 0.00 0.00
# "c" 1.04 3.70 688.7 1.04 3.70
# ".Date" 0.96 3.41 488.0 0.34 1.21
# "sample.int" 0.76 2.70 584.2 0.74 2.63
# "as.data.frame" 0.70 2.49 533.6 0.00 0.00
# "as.data.frame.character" 0.64 2.28 476.0 0.00 0.00
# "NextMethod" 0.62 2.20 424.7 0.62 2.20
# "order" 0.60 2.13 475.5 0.50 1.78
# "structure" 0.32 1.14 155.5 0.32 1.14
# "is.na" 0.28 1.00 150.5 0.26 0.92
# "Make.row.names" 0.12 0.43 153.8 0.12 0.43
# "unclass" 0.12 0.43 83.3 0.12 0.43
# "as.Date" 0.10 0.36 120.1 0.02 0.07
# "length" 0.06 0.21 79.2 0.06 0.21
# "seq.int" 0.06 0.21 57.0 0.06 0.21
# "vapply" 0.06 0.21 84.6 0.02 0.07
# ":" 0.04 0.14 1.1 0.04 0.14
# "as.POSIXlt.POSIXct" 0.04 0.14 57.7 0.04 0.14
# "is.factor" 0.04 0.14 0.0 0.04 0.14
# "deparse" 0.04 0.14 55.0 0.02 0.07
# "eval" 0.04 0.14 36.2 0.02 0.07
# "match.arg" 0.04 0.14 25.2 0.02 0.07
# "match.fun" 0.04 0.14 32.4 0.02 0.07
# "as.data.frame.integer" 0.04 0.14 55.0 0.00 0.00
# "as.POSIXlt" 0.04 0.14 57.7 0.00 0.00
# "force" 0.04 0.14 55.0 0.00 0.00
# "make.names" 0.04 0.14 42.1 0.00 0.00
# "Sys.Date" 0.04 0.14 57.7 0.00 0.00
# "!" 0.02 0.07 29.6 0.02 0.07
# "$" 0.02 0.07 2.6 0.02 0.07
# "any" 0.02 0.07 18.3 0.02 0.07
# "as.data.frame.numeric" 0.02 0.07 2.6 0.02 0.07
# "as.data.frame.vector" 0.02 0.07 21.6 0.02 0.07
# "as.list" 0.02 0.07 26.6 0.02 0.07
# "baseenv" 0.02 0.07 25.2 0.02 0.07
# "is.ordered" 0.02 0.07 14.5 0.02 0.07
# "lengths" 0.02 0.07 14.9 0.02 0.07
# "levels" 0.02 0.07 0.0 0.02 0.07
# "mode" 0.02 0.07 30.7 0.02 0.07
# "names" 0.02 0.07 0.0 0.02 0.07
# "rnorm" 0.02 0.07 29.6 0.02 0.07
# "%in%" 0.02 0.07 30.7 0.00 0.00
# "as.Date.character" 0.02 0.07 2.6 0.00 0.00
# "as.Date.numeric" 0.02 0.07 2.6 0.00 0.00
# "as.POSIXct" 0.02 0.07 2.6 0.00 0.00
# "as.POSIXct.POSIXlt" 0.02 0.07 2.6 0.00 0.00
# "charToDate" 0.02 0.07 2.6 0.00 0.00
# "eval.parent" 0.02 0.07 11.0 0.00 0.00
# "is.na.POSIXlt" 0.02 0.07 2.6 0.00 0.00
# "simplify2array" 0.02 0.07 14.9 0.00 0.00
#
# $sample.interval
# [1] 0.02
#
# $sampling.time
# [1] 28.12
List Approach
n = 50
# $by.total
# total.time total.pct mem.total self.time self.pct
# "grow_df_list" 0.40 100 257.0 0.00 0
# "data.frame" 0.32 80 175.6 0.02 5
# "lapply" 0.32 80 175.6 0.02 5
# "FUN" 0.32 80 175.6 0.00 0
# "replicate" 0.24 60 129.6 0.00 0
# "sapply" 0.24 60 129.6 0.00 0
# "paste" 0.22 55 119.2 0.10 25
# "sample" 0.12 30 49.4 0.00 0
# "sample.int" 0.08 20 39.1 0.08 20
# "<Anonymous>" 0.08 20 81.4 0.00 0
# "do.call" 0.08 20 81.4 0.00 0
# "rbind" 0.08 20 81.4 0.00 0
# "factor" 0.06 15 29.7 0.02 5
# "as.data.frame" 0.06 15 29.7 0.00 0
# "as.data.frame.character" 0.06 15 29.7 0.00 0
# "c" 0.04 10 10.3 0.04 10
# "order" 0.04 10 17.3 0.04 10
# "unique.default" 0.04 10 31.1 0.04 10
# "[<-" 0.04 10 50.3 0.00 0
# "unique" 0.04 10 31.1 0.00 0
# ".Date" 0.02 5 27.9 0.02 5
# "[<-.factor" 0.02 5 22.4 0.02 5
# "[<-.Date" 0.02 5 27.9 0.00 0
#
# $sample.interval
# [1] 0.02
#
# $sampling.time
# [1] 0.4
n = 100
# $by.total
# total.time total.pct mem.total self.time self.pct
# "grow_df_list" 1.00 100 620.4 0.00 0
# "data.frame" 0.66 66 401.8 0.00 0
# "FUN" 0.66 66 401.8 0.00 0
# "lapply" 0.66 66 401.8 0.00 0
# "paste" 0.42 42 275.3 0.14 14
# "replicate" 0.42 42 275.3 0.00 0
# "sapply" 0.42 42 275.3 0.00 0
# "rbind" 0.34 34 218.6 0.02 2
# "<Anonymous>" 0.34 34 218.6 0.00 0
# "do.call" 0.34 34 218.6 0.00 0
# "sample" 0.28 28 188.6 0.08 8
# "unique.default" 0.20 20 90.1 0.20 20
# "unique" 0.20 20 90.1 0.00 0
# "as.data.frame" 0.18 18 81.2 0.00 0
# "factor" 0.16 16 81.2 0.02 2
# "as.data.frame.character" 0.16 16 81.2 0.00 0
# "[<-.factor" 0.14 14 112.0 0.14 14
# "sample.int" 0.14 14 96.8 0.14 14
# "[<-" 0.14 14 112.0 0.00 0
# "order" 0.12 12 51.2 0.12 12
# "c" 0.06 6 45.8 0.06 6
# "as.Date" 0.04 4 28.3 0.02 2
# "length" 0.02 2 17.0 0.02 2
# "strptime" 0.02 2 11.2 0.02 2
# "structure" 0.02 2 0.0 0.02 2
# "as.data.frame.integer" 0.02 2 0.0 0.00 0
# "as.Date.character" 0.02 2 11.2 0.00 0
# "as.Date.numeric" 0.02 2 11.2 0.00 0
# "charToDate" 0.02 2 11.2 0.00 0
#
# $sample.interval
# [1] 0.02
#
# $sampling.time
# [1] 1
n = 500
# $by.total
# total.time total.pct mem.total self.time self.pct
# "grow_df_list" 9.40 100.00 5621.8 0.00 0.00
# "rbind" 6.12 65.11 3633.5 0.44 4.68
# "<Anonymous>" 6.12 65.11 3633.5 0.00 0.00
# "do.call" 6.12 65.11 3633.5 0.00 0.00
# "lapply" 3.28 34.89 1988.3 0.34 3.62
# "FUN" 3.28 34.89 1988.3 0.10 1.06
# "data.frame" 3.28 34.89 1988.3 0.02 0.21
# "[<-" 3.28 34.89 2118.4 0.00 0.00
# "[<-.factor" 3.00 31.91 1829.1 3.00 31.91
# "replicate" 2.36 25.11 1422.9 0.00 0.00
# "sapply" 2.36 25.11 1422.9 0.00 0.00
# "unique" 2.32 24.68 1189.9 0.00 0.00
# "paste" 1.98 21.06 1194.2 0.70 7.45
# "unique.default" 1.96 20.85 1017.8 1.96 20.85
# "sample" 1.20 12.77 707.4 0.44 4.68
# "as.data.frame" 0.88 9.36 540.5 0.02 0.21
# "as.data.frame.character" 0.78 8.30 496.2 0.00 0.00
# "factor" 0.72 7.66 444.2 0.06 0.64
# "c" 0.68 7.23 379.6 0.68 7.23
# "order" 0.64 6.81 385.1 0.64 6.81
# "sample.int" 0.40 4.26 233.0 0.38 4.04
# ".Date" 0.28 2.98 289.3 0.10 1.06
# "[<-.Date" 0.28 2.98 289.3 0.00 0.00
# "NextMethod" 0.18 1.91 171.2 0.18 1.91
# "deparse" 0.08 0.85 54.6 0.02 0.21
# "%in%" 0.08 0.85 54.6 0.00 0.00
# "mode" 0.08 0.85 54.6 0.00 0.00
# "length" 0.06 0.64 10.4 0.06 0.64
# "structure" 0.06 0.64 30.8 0.04 0.43
# ".deparseOpts" 0.06 0.64 49.1 0.02 0.21
# "[[" 0.06 0.64 34.2 0.02 0.21
# ":" 0.04 0.43 33.6 0.04 0.43
# "[[.data.frame" 0.04 0.43 22.6 0.04 0.43
# "force" 0.04 0.43 20.0 0.00 0.00
# "as.vector" 0.02 0.21 0.0 0.02 0.21
# "is.na" 0.02 0.21 0.0 0.02 0.21
# "levels" 0.02 0.21 14.6 0.02 0.21
# "make.names" 0.02 0.21 9.4 0.02 0.21
# "pmatch" 0.02 0.21 17.3 0.02 0.21
# "as.data.frame.Date" 0.02 0.21 5.5 0.00 0.00
# "as.data.frame.integer" 0.02 0.21 0.0 0.00 0.00
# "as.data.frame.logical" 0.02 0.21 14.5 0.00 0.00
# "as.data.frame.numeric" 0.02 0.21 13.5 0.00 0.00
# "as.data.frame.vector" 0.02 0.21 17.3 0.00 0.00
# "simplify2array" 0.02 0.21 0.0 0.00 0.00
#
# $sample.interval
# [1] 0.02
#
# $sampling.time
# [1] 9.4
Graphs (using a different call to save $by.total results)
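For reference, a hedged sketch of one way the $by.total tables above could be collected into a single data frame for plotting (the exact call used to produce the graphs is not shown):
profile_mem <- function(f, n) {
  tmp <- tempfile()
  utils::Rprof(tmp, memory.profiling = TRUE)
  f(n)
  utils::Rprof(NULL)
  out <- summaryRprof(tmp, memory = "both")$by.total
  unlink(tmp)
  data.frame(fun = rownames(out), out, n = n, row.names = NULL)
}
# e.g. one data frame per approach, stacked over the three sizes:
loop_mem <- do.call(rbind, lapply(c(50, 100, 500), profile_mem, f = grow_df_loop))
list_mem <- do.call(rbind, lapply(c(50, 100, 500), profile_mem, f = grow_df_list))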
I am learning how to use R's profiling tools and have run Rprof on my code.
The summaryRprof output shows that a lot of time is spent in .External2. What is this? Additionally, a large proportion of the total time is spent in <Anonymous>; is there a way to find out what this is?
> summaryRprof("test")
$by.self
self.time self.pct total.time total.pct
".External2" 4.30 27.74 4.30 27.74
"format.POSIXlt" 2.70 17.42 2.90 18.71
"which.min" 2.38 15.35 4.12 26.58
"-" 1.30 8.39 1.30 8.39
"order" 1.16 7.48 1.16 7.48
"match" 0.58 3.74 0.58 3.74
"file" 0.44 2.84 0.44 2.84
"abs" 0.40 2.58 0.40 2.58
"scan" 0.30 1.94 0.30 1.94
"anyDuplicated.default" 0.20 1.29 0.20 1.29
"unique.default" 0.20 1.29 0.20 1.29
"unlist" 0.18 1.16 0.20 1.29
"c" 0.16 1.03 0.16 1.03
"data.frame" 0.14 0.90 0.22 1.42
"structure" 0.12 0.77 1.74 11.23
"as.POSIXct.POSIXlt" 0.12 0.77 0.12 0.77
"strptime" 0.12 0.77 0.12 0.77
"as.character" 0.08 0.52 0.90 5.81
"make.unique" 0.08 0.52 0.16 1.03
"[.data.frame" 0.06 0.39 1.54 9.94
"<Anonymous>" 0.04 0.26 4.34 28.00
"lapply" 0.04 0.26 1.70 10.97
"rbind" 0.04 0.26 0.94 6.06
"as.POSIXlt.POSIXct" 0.04 0.26 0.04 0.26
"ifelse" 0.04 0.26 0.04 0.26
"paste" 0.02 0.13 0.92 5.94
"merge.data.frame" 0.02 0.13 0.56 3.61
"[<-.factor" 0.02 0.13 0.52 3.35
"stopifnot" 0.02 0.13 0.04 0.26
".deparseOpts" 0.02 0.13 0.02 0.13
".External" 0.02 0.13 0.02 0.13
"close.connection" 0.02 0.13 0.02 0.13
"doTryCatch" 0.02 0.13 0.02 0.13
"is.na" 0.02 0.13 0.02 0.13
"is.na<-.default" 0.02 0.13 0.02 0.13
"mean" 0.02 0.13 0.02 0.13
"seq.int" 0.02 0.13 0.02 0.13
"sum" 0.02 0.13 0.02 0.13
"sys.function" 0.02 0.13 0.02 0.13
$by.total
total.time total.pct self.time self.pct
"write.table" 5.10 32.90 0.00 0.00
"<Anonymous>" 4.34 28.00 0.04 0.26
".External2" 4.30 27.74 4.30 27.74
"mapply" 4.22 27.23 0.00 0.00
"head" 4.16 26.84 0.00 0.00
"which.min" 4.12 26.58 2.38 15.35
"eval" 3.16 20.39 0.00 0.00
"eval.parent" 3.14 20.26 0.00 0.00
"write.csv" 3.14 20.26 0.00 0.00
"format" 2.92 18.84 0.00 0.00
"format.POSIXlt" 2.90 18.71 2.70 17.42
"do.call" 1.78 11.48 0.00 0.00
"structure" 1.74 11.23 0.12 0.77
"lapply" 1.70 10.97 0.04 0.26
"FUN" 1.66 10.71 0.00 0.00
"format.POSIXct" 1.62 10.45 0.00 0.00
"[.data.frame" 1.54 9.94 0.06 0.39
"[" 1.54 9.94 0.00 0.00
"-" 1.30 8.39 1.30 8.39
"order" 1.16 7.48 1.16 7.48
"rbind" 0.94 6.06 0.04 0.26
"paste" 0.92 5.94 0.02 0.13
"as.character" 0.90 5.81 0.08 0.52
"read.csv" 0.84 5.42 0.00 0.00
"read.table" 0.84 5.42 0.00 0.00
"as.character.POSIXt" 0.82 5.29 0.00 0.00
"match" 0.58 3.74 0.58 3.74
"merge.data.frame" 0.56 3.61 0.02 0.13
"merge" 0.56 3.61 0.00 0.00
"[<-.factor" 0.52 3.35 0.02 0.13
"[<-" 0.52 3.35 0.00 0.00
"strftime" 0.48 3.10 0.00 0.00
"file" 0.44 2.84 0.44 2.84
"weekdays" 0.42 2.71 0.00 0.00
"weekdays.POSIXt" 0.42 2.71 0.00 0.00
"abs" 0.40 2.58 0.40 2.58
"unique" 0.38 2.45 0.00 0.00
"scan" 0.30 1.94 0.30 1.94
"data.frame" 0.22 1.42 0.14 0.90
"cbind" 0.22 1.42 0.00 0.00
"anyDuplicated.default" 0.20 1.29 0.20 1.29
"unique.default" 0.20 1.29 0.20 1.29
"unlist" 0.20 1.29 0.18 1.16
"anyDuplicated" 0.20 1.29 0.00 0.00
"as.POSIXct" 0.18 1.16 0.00 0.00
"as.POSIXlt" 0.18 1.16 0.00 0.00
"c" 0.16 1.03 0.16 1.03
"make.unique" 0.16 1.03 0.08 0.52
"as.POSIXct.POSIXlt" 0.12 0.77 0.12 0.77
"strptime" 0.12 0.77 0.12 0.77
"as.POSIXlt.character" 0.12 0.77 0.00 0.00
"object.size" 0.12 0.77 0.00 0.00
"as.POSIXct.default" 0.10 0.65 0.00 0.00
"Ops.POSIXt" 0.08 0.52 0.00 0.00
"type.convert" 0.08 0.52 0.00 0.00
"!=" 0.06 0.39 0.00 0.00
"as.POSIXlt.factor" 0.06 0.39 0.00 0.00
"as.POSIXlt.POSIXct" 0.04 0.26 0.04 0.26
"ifelse" 0.04 0.26 0.04 0.26
"stopifnot" 0.04 0.26 0.02 0.13
"$" 0.04 0.26 0.00 0.00
"$.data.frame" 0.04 0.26 0.00 0.00
"[[" 0.04 0.26 0.00 0.00
"[[.data.frame" 0.04 0.26 0.00 0.00
"head.default" 0.04 0.26 0.00 0.00
".deparseOpts" 0.02 0.13 0.02 0.13
".External" 0.02 0.13 0.02 0.13
"close.connection" 0.02 0.13 0.02 0.13
"doTryCatch" 0.02 0.13 0.02 0.13
"is.na" 0.02 0.13 0.02 0.13
"is.na<-.default" 0.02 0.13 0.02 0.13
"mean" 0.02 0.13 0.02 0.13
"seq.int" 0.02 0.13 0.02 0.13
"sum" 0.02 0.13 0.02 0.13
"sys.function" 0.02 0.13 0.02 0.13
"%in%" 0.02 0.13 0.00 0.00
".rs.getSingleClass" 0.02 0.13 0.00 0.00
"[.POSIXlt" 0.02 0.13 0.00 0.00
"==" 0.02 0.13 0.00 0.00
"close" 0.02 0.13 0.00 0.00
"data.row.names" 0.02 0.13 0.00 0.00
"deparse" 0.02 0.13 0.00 0.00
"factor" 0.02 0.13 0.00 0.00
"is.na<-" 0.02 0.13 0.00 0.00
"match.arg" 0.02 0.13 0.00 0.00
"match.call" 0.02 0.13 0.00 0.00
"pushBack" 0.02 0.13 0.00 0.00
"seq" 0.02 0.13 0.00 0.00
"seq.POSIXt" 0.02 0.13 0.00 0.00
"simplify2array" 0.02 0.13 0.00 0.00
"tryCatch" 0.02 0.13 0.00 0.00
"tryCatchList" 0.02 0.13 0.00 0.00
"tryCatchOne" 0.02 0.13 0.00 0.00
"which" 0.02 0.13 0.00 0.00
$sample.interval
[1] 0.02
$sampling.time
[1] 15.5
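.External2 is the interface R uses to call built-in compiled code (here most likely via write.table, which sits at the top of $by.total), and <Anonymous> is an unnamed function, typically one passed inline to mapply or lapply. A hedged way to locate both, assuming R >= 3.0.0 and that the profiled code lives in a script file ("my_script.R" is a placeholder):
# Line profiling attributes samples to source lines instead of only to functions.
Rprof("test.out", line.profiling = TRUE)
source("my_script.R", keep.source = TRUE)   # placeholder for the profiled code
Rprof(NULL)
summaryRprof("test.out", lines = "show")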
I have code which takes a lot of time executing:
dataRaw <- pblapply(femme, function(x) {
  article <- user(x, date = FALSE, weight = FALSE)
  names <- rep(x, length(article))
  result <- matrix(c(names, article), ncol = 2)
})
dataRaw <- do.call(rbind, dataRaw)
dataRaw[, 3] <- vector(length = length(dataRaw[, 2]))
dataRaw[, 3] <- pbapply(dataRaw, 1, function(x) {
  Rprof(filename = "profile.out")
  revisions <- revisionsPage(x[2])
  rank <- rankingContrib(revisions, 50)
  rank <- rank$contrib
  x[1] %in% rank
  Rprof(NULL)
})
result <- as.vector(dataRaw[dataRaw$ranking == TRUE, 2])
Running the summaryRprof function on the profile gives me this:
$by.self
self.time self.pct total.time total.pct
".Call" 0.46 95.83 0.46 95.83
"as.data.frame.numeric" 0.02 4.17 0.02 4.17
$by.total
total.time total.pct self.time self.pct
"FUN" 0.48 100.00 0.00 0.00
"pbapply" 0.48 100.00 0.00 0.00
".Call" 0.46 95.83 0.46 95.83
"<Anonymous>" 0.46 95.83 0.00 0.00
"GET" 0.46 95.83 0.00 0.00
"request_fetch" 0.46 95.83 0.00 0.00
"request_fetch.write_memory" 0.46 95.83 0.00 0.00
"request_perform" 0.46 95.83 0.00 0.00
"revisionsPage" 0.46 95.83 0.00 0.00
"as.data.frame.numeric" 0.02 4.17 0.02 4.17
"as.data.frame" 0.02 4.17 0.00 0.00
"data.frame" 0.02 4.17 0.00 0.00
"rankingContrib" 0.02 4.17 0.00 0.00
$sample.interval
[1] 0.02
$sampling.time
[1] 0.48
It appears that the .Call function takes up nearly all of the time. What is this .Call entry?
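For what it's worth, .Call is R's interface into compiled C code; the $by.total stack (GET -> request_perform -> request_fetch.write_memory -> .Call) suggests the time is spent inside curl performing the HTTP request made by revisionsPage. A hedged check (revisionsPage and dataRaw are the asker's own objects):
# Time one HTTP-backed call directly to confirm the network round trip dominates.
system.time(revisionsPage(dataRaw[1, 2]))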
I have seen this error multiple times in different projects, and I was wondering whether there is a general way to tell which line caused the error.
My specific case:
http://archive.ics.uci.edu/ml/machine-learning-databases/00275/
# using the bike.csv
data <- read.csv("PATH_HERE\\Bike-Sharing-Dataset\\day.csv", header = TRUE)
require(psych)
corr.test(data)
data <- data[, c("atemp","casual","cnt","holiday","hum","mnth","registered",
                 "season","temp","weathersit","weekday","windspeed","workingday","yr")]
data[data == ''] <- NA
#View(data)
require(psych)
cors <- corr.test(data)
returns the error:
Error in data.frame(lower = lower, r = r[lower.tri(r)], upper = upper, :
arguments imply differing number of rows: 0, 91
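In general (a hedged sketch, not specific to psych::corr.test), the failing call can usually be located with the standard debugging tools:
cors <- corr.test(data)   # reproduce the error
traceback()               # print the call stack at the point of the error
options(error = recover)  # on the next error, browse the active frames interactively
debugonce(corr.test)      # or step through corr.test() one line at a time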
It works for me
> #using the bike.csv
> data <- read.csv("day.csv",header=TRUE)
> require(psych)
> corr.test(data)
Error in cor(x, use = use, method = method) : 'x' must be numeric
> data <- data[,c("atemp","casual","cnt","holiday","hum","mnth","registered",
+ "season","temp","weathersit","weekday","windspeed","workingday","yr")]
> data[data==''] <- NA
> #View(data)
>
> require(psych)
> cors <- corr.test(data)
> cors
Call:corr.test(x = data)
Correlation matrix
atemp casual cnt holiday hum mnth registered season temp
atemp 1.00 0.54 0.63 -0.03 0.14 0.23 0.54 0.34 0.99
casual 0.54 1.00 0.67 0.05 -0.08 0.12 0.40 0.21 0.54
cnt 0.63 0.67 1.00 -0.07 -0.10 0.28 0.95 0.41 0.63
holiday -0.03 0.05 -0.07 1.00 -0.02 0.02 -0.11 -0.01 -0.03
hum 0.14 -0.08 -0.10 -0.02 1.00 0.22 -0.09 0.21 0.13
mnth 0.23 0.12 0.28 0.02 0.22 1.00 0.29 0.83 0.22
registered 0.54 0.40 0.95 -0.11 -0.09 0.29 1.00 0.41 0.54
season 0.34 0.21 0.41 -0.01 0.21 0.83 0.41 1.00 0.33
temp 0.99 0.54 0.63 -0.03 0.13 0.22 0.54 0.33 1.00
weathersit -0.12 -0.25 -0.30 -0.03 0.59 0.04 -0.26 0.02 -0.12
weekday -0.01 0.06 0.07 -0.10 -0.05 0.01 0.06 0.00 0.00
windspeed -0.18 -0.17 -0.23 0.01 -0.25 -0.21 -0.22 -0.23 -0.16
workingday 0.05 -0.52 0.06 -0.25 0.02 -0.01 0.30 0.01 0.05
yr 0.05 0.25 0.57 0.01 -0.11 0.00 0.59 0.00 0.05
weathersit weekday windspeed workingday yr
atemp -0.12 -0.01 -0.18 0.05 0.05
casual -0.25 0.06 -0.17 -0.52 0.25
cnt -0.30 0.07 -0.23 0.06 0.57
holiday -0.03 -0.10 0.01 -0.25 0.01
hum 0.59 -0.05 -0.25 0.02 -0.11
mnth 0.04 0.01 -0.21 -0.01 0.00
registered -0.26 0.06 -0.22 0.30 0.59
season 0.02 0.00 -0.23 0.01 0.00
temp -0.12 0.00 -0.16 0.05 0.05
weathersit 1.00 0.03 0.04 0.06 -0.05
weekday 0.03 1.00 0.01 0.04 -0.01
windspeed 0.04 0.01 1.00 -0.02 -0.01
workingday 0.06 0.04 -0.02 1.00 0.00
yr -0.05 -0.01 -0.01 0.00 1.00
Sample Size
[1] 731
Probability values (Entries above the diagonal are adjusted for multiple tests.)
atemp casual cnt holiday hum mnth registered season temp
atemp 0.00 0.00 0.00 1.00 0.01 0.00 0.00 0.00 0.00
casual 0.00 0.00 0.00 1.00 1.00 0.04 0.00 0.00 0.00
cnt 0.00 0.00 0.00 1.00 0.28 0.00 0.00 0.00 0.00
holiday 0.38 0.14 0.06 0.00 1.00 1.00 0.15 1.00 1.00
hum 0.00 0.04 0.01 0.67 0.00 0.00 0.58 0.00 0.03
mnth 0.00 0.00 0.00 0.60 0.00 0.00 0.00 0.00 0.00
registered 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00
season 0.00 0.00 0.00 0.78 0.00 0.00 0.00 0.00 0.00
temp 0.00 0.00 0.00 0.44 0.00 0.00 0.00 0.00 0.00
weathersit 0.00 0.00 0.00 0.35 0.00 0.24 0.00 0.60 0.00
weekday 0.84 0.11 0.07 0.01 0.16 0.80 0.12 0.93 1.00
windspeed 0.00 0.00 0.00 0.87 0.00 0.00 0.00 0.00 0.00
workingday 0.16 0.00 0.10 0.00 0.51 0.87 0.00 0.74 0.15
yr 0.21 0.00 0.00 0.83 0.00 0.96 0.00 0.96 0.20
weathersit weekday windspeed workingday yr
atemp 0.05 1.00 0.00 1.00 1.00
casual 0.00 1.00 0.00 0.00 0.00
cnt 0.00 1.00 0.00 1.00 0.00
holiday 1.00 0.25 1.00 0.00 1.00
hum 0.00 1.00 0.00 1.00 0.13
mnth 1.00 1.00 0.00 1.00 1.00
registered 0.00 1.00 0.00 0.00 0.00
season 1.00 1.00 0.00 1.00 1.00
temp 0.05 1.00 0.00 1.00 1.00
weathersit 0.00 1.00 1.00 1.00 1.00
weekday 0.40 0.00 1.00 1.00 1.00
windspeed 0.29 0.70 0.00 1.00 1.00
workingday 0.10 0.33 0.61 0.00 1.00
yr 0.19 0.88 0.75 0.96 0.00
To see confidence intervals of the correlations, print with the short=FALSE option
>
It works for me:
rm(list=ls())
# http://archive.ics.uci.edu/ml/machine-learning-databases/00275/
#using the bike.csv
day <- read.csv("Bike-Sharing-Dataset//day.csv")
require(psych)
day<-day[,c("atemp","casual","cnt","holiday","hum","mnth","registered",
"season","temp","weathersit","weekday","windspeed","workingday","yr")]
day[day=='']<-NA
require(psych)
corr.test(day)
# corr.test(day)
# Call:corr.test(x = day)
# Correlation matrix
# atemp casual cnt holiday hum mnth registered season temp weathersit weekday windspeed workingday yr
# atemp 1.00 0.54 0.63 -0.03 0.14 0.23 0.54 0.34 0.99 -0.12 -0.01 -0.18 0.05 0.05
# casual 0.54 1.00 0.67 0.05 -0.08 0.12 0.40 0.21 0.54 -0.25 0.06 -0.17 -0.52 0.25
# cnt 0.63 0.67 1.00 -0.07 -0.10 0.28 0.95 0.41 0.63 -0.30 0.07 -0.23 0.06 0.57
# holiday -0.03 0.05 -0.07 1.00 -0.02 0.02 -0.11 -0.01 -0.03 -0.03 -0.10 0.01 -0.25 0.01
# hum 0.14 -0.08 -0.10 -0.02 1.00 0.22 -0.09 0.21 0.13 0.59 -0.05 -0.25 0.02 -0.11
# mnth 0.23 0.12 0.28 0.02 0.22 1.00 0.29 0.83 0.22 0.04 0.01 -0.21 -0.01 0.00
# registered 0.54 0.40 0.95 -0.11 -0.09 0.29 1.00 0.41 0.54 -0.26 0.06 -0.22 0.30 0.59
# season 0.34 0.21 0.41 -0.01 0.21 0.83 0.41 1.00 0.33 0.02 0.00 -0.23 0.01 0.00
# temp 0.99 0.54 0.63 -0.03 0.13 0.22 0.54 0.33 1.00 -0.12 0.00 -0.16 0.05 0.05
# weathersit -0.12 -0.25 -0.30 -0.03 0.59 0.04 -0.26 0.02 -0.12 1.00 0.03 0.04 0.06 -0.05
# weekday -0.01 0.06 0.07 -0.10 -0.05 0.01 0.06 0.00 0.00 0.03 1.00 0.01 0.04 -0.01
# windspeed -0.18 -0.17 -0.23 0.01 -0.25 -0.21 -0.22 -0.23 -0.16 0.04 0.01 1.00 -0.02 -0.01
# workingday 0.05 -0.52 0.06 -0.25 0.02 -0.01 0.30 0.01 0.05 0.06 0.04 -0.02 1.00 0.00
# yr 0.05 0.25 0.57 0.01 -0.11 0.00 0.59 0.00 0.05 -0.05 -0.01 -0.01 0.00 1.00
# Sample Size
# [1] 731
# Probability values (Entries above the diagonal are adjusted for multiple tests.)
# atemp casual cnt holiday hum mnth registered season temp weathersit weekday windspeed workingday yr
# atemp 0.00 0.00 0.00 1.00 0.01 0.00 0.00 0.00 0.00 0.05 1.00 0.00 1.00 1.00
# casual 0.00 0.00 0.00 1.00 1.00 0.04 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
# cnt 0.00 0.00 0.00 1.00 0.28 0.00 0.00 0.00 0.00 0.00 1.00 0.00 1.00 0.00
# holiday 0.38 0.14 0.06 0.00 1.00 1.00 0.15 1.00 1.00 1.00 0.25 1.00 0.00 1.00
# hum 0.00 0.04 0.01 0.67 0.00 0.00 0.58 0.00 0.03 0.00 1.00 0.00 1.00 0.13
# mnth 0.00 0.00 0.00 0.60 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
# registered 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
# season 0.00 0.00 0.00 0.78 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
# temp 0.00 0.00 0.00 0.44 0.00 0.00 0.00 0.00 0.00 0.05 1.00 0.00 1.00 1.00
# weathersit 0.00 0.00 0.00 0.35 0.00 0.24 0.00 0.60 0.00 0.00 1.00 1.00 1.00 1.00
# weekday 0.84 0.11 0.07 0.01 0.16 0.80 0.12 0.93 1.00 0.40 0.00 1.00 1.00 1.00
# windspeed 0.00 0.00 0.00 0.87 0.00 0.00 0.00 0.00 0.00 0.29 0.70 0.00 1.00 1.00
# workingday 0.16 0.00 0.10 0.00 0.51 0.87 0.00 0.74 0.15 0.10 0.33 0.61 0.00 1.00
# yr 0.21 0.00 0.00 0.83 0.00 0.96 0.00 0.96 0.20 0.19 0.88 0.75 0.96 0.00
#
# To see confidence intervals of the correlations, print with the short=FALSE option
cheers