How to improve speed in parallel cluster processing - r

I'm new to cluster processing and could use some advice on how to better prepare the data and/or the calls to functions from the parallel package. I have read through the parallel package vignettes, so I have a vague idea of what's going on.
The function I want to parallelize calls the 2-D interpolation tool akima::interp. My input consists of three matrices (or vectors; the distinction doesn't matter here): one contains the x-coordinates, one the y-coordinates, and one the "z", or data values, for a set of sample points. interp uses these to produce interpolated data on a regular grid so I can, e.g., plot the field. Once I have these three items set up, I cut them into "chunks" and feed them to clusterApply to run interp chunk by chunk, roughly as in the sketch below.
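Roughly, the setup looks like this (a sketch, not my actual code; xs, ys, zs and nchunks stand in for my real objects, and the grid size matches the 1000x1000 output grid mentioned below):
library(parallel)
nchunks <- 8
idx    <- split(seq_along(zs), cut(seq_along(zs), nchunks, labels = FALSE))
chunks <- lapply(idx, function(i) list(x = xs[i], y = ys[i], z = zs[i]))
# common output grid, computed once on the master
xo <- seq(min(xs), max(xs), length.out = 1000)
yo <- seq(min(ys), max(ys), length.out = 1000)
cl  <- makeCluster(8)
out <- clusterApply(cl, chunks, function(ch, xo, yo)
    akima::interp(ch$x, ch$y, ch$z, xo = xo, yo = yo),
    xo = xo, yo = yo)
stopCluster(cl)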
I'm using a Windows 7 machine with an i7 CPU (8 cores). Here's the summaryRprof output for an input data set of 1e6 points (1000x1000, if you like) mapped onto a 1000x1000 output grid.
So my questions are:
1) It appears that "unserialize" is taking most of the time. What is this operation, and how could it be reduced?
2) In general, since each worker loads the default .Rdata file, is there any speed gained if I first save all input data to .Rdata so that it doesn't need to get passed to the workers?
3) Anything else that I'm simply unaware of that I should have done differently?
Note: the sin, atan2, cos, +, max, min functions take place prior to the clusterApply call I make.
Rgames> summaryRprof('bigprof.txt')
$by.self
self.time self.pct total.time total.pct
"unserialize" 329.04 99.11 329.04 99.11
"socketConnection" 1.74 0.52 1.74 0.52
"serialize" 0.96 0.29 0.96 0.29
"sin" 0.06 0.02 0.06 0.02
"atan2" 0.04 0.01 0.06 0.02
"cos" 0.04 0.01 0.04 0.01
"+" 0.02 0.01 0.02 0.01
"max" 0.02 0.01 0.02 0.01
"min" 0.02 0.01 0.02 0.01
"row" 0.02 0.01 0.02 0.01
"writeLines" 0.02 0.01 0.02 0.01
$by.total
total.time total.pct self.time self.pct
"mcswirl" 331.98 100.00 0.00 0.00
"clusterApply" 330.00 99.40 0.00 0.00
"staticClusterApply" 330.00 99.40 0.00 0.00
"FUN" 329.06 99.12 0.00 0.00
"unserialize" 329.04 99.11 329.04 99.11
"lapply" 329.04 99.11 0.00 0.00
"recvData" 329.04 99.11 0.00 0.00
"recvData.SOCKnode" 329.04 99.11 0.00 0.00
"makeCluster" 1.76 0.53 0.00 0.00
"makePSOCKcluster" 1.76 0.53 0.00 0.00
"newPSOCKnode" 1.76 0.53 0.00 0.00
"socketConnection" 1.74 0.52 1.74 0.52
"serialize" 0.96 0.29 0.96 0.29
"postNode" 0.96 0.29 0.00 0.00
"sendCall" 0.96 0.29 0.00 0.00
"sendData" 0.96 0.29 0.00 0.00
"sendData.SOCKnode" 0.96 0.29 0.00 0.00
"sin" 0.06 0.02 0.06 0.02
"atan2" 0.06 0.02 0.04 0.01
"cos" 0.04 0.01 0.04 0.01
"+" 0.02 0.01 0.02 0.01
"max" 0.02 0.01 0.02 0.01
"min" 0.02 0.01 0.02 0.01
"row" 0.02 0.01 0.02 0.01
"writeLines" 0.02 0.01 0.02 0.01
"outer" 0.02 0.01 0.00 0.00
"system" 0.02 0.01 0.00 0.00
$sample.interval
[1] 0.02
$sampling.time
[1] 331.98

When clusterApply is called, it first sends a task to each of the cluster workers, and then waits for each of them to return the corresponding result. If there are more tasks to do, it repeats that procedure until all of the tasks are complete.
The function that it uses to wait for a result from a particular worker is recvResult, which ultimately calls unserialize to read data from the socket that is connected to that worker. So if the master process is spending most of its time in unserialize, then it is spending most of its time waiting for the cluster workers to return the task results, which is what you would hope to see on the master. If it were spending a lot of time in serialize, that would mean it was spending a lot of time sending the tasks to the workers, which would be a bad sign.
Unfortunately, you can't tell how much time unserialize spends blocking, waiting for the result data to arrive, and how much time it spends actually transferring that data. The results might be easily computed by the workers and huge, or they might take a long time to compute and be tiny: there's no way to tell from the profiling data.
So to make unserialize execute faster, you need to make the workers compute their results faster or make the results smaller, if that's possible. In addition, it might help to use the useXDR=FALSE option of makeCluster: it can improve performance by not encoding your data with XDR, making both serialize and unserialize faster.
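For example (a sketch of that option; the worker count is just illustrative):
library(parallel)
cl <- makeCluster(8, useXDR = FALSE)  # skip XDR encoding; fine when all nodes are little-endian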
I don't think it will help to save all input data to .Rdata since you're not spending much time sending data to the workers, as seen by the short time spent in the serialize function. I suspect that would slow you down a little bit.
The only other advice I can think of is to try using parLapply or clusterApplyLB rather than clusterApply. I recommend parLapply unless you have a specific reason to use one of the other functions, since parLapply is often the most efficient. clusterApplyLB is useful when you have tasks that take a long but variable length of time to execute.
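Concretely (continuing the hypothetical chunks/xo/yo objects from the sketch in the question):
# parLapply splits the tasks into one batch per worker up front
out <- parLapply(cl, chunks, function(ch, xo, yo)
    akima::interp(ch$x, ch$y, ch$z, xo = xo, yo = yo),
    xo = xo, yo = yo)
# clusterApplyLB hands out tasks one at a time as workers become free,
# which helps when task run times vary a lot
out <- clusterApplyLB(cl, chunks, function(ch, xo, yo)
    akima::interp(ch$x, ch$y, ch$z, xo = xo, yo = yo),
    xo = xo, yo = yo)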

Related

How can I find the difference between values without losing the first sample?

I'd like to find the differences between my samples, but when I use diff() my first sample goes missing.
input:
data
XX.3.22 XX.1.2 XX.5.19 XX.2.21 XX.2.16 XX.5.27 XX.3.5 XX.2.12 XX.4.15
0.00 0.12 0.17 0.20 0.21 0.26 0.27 0.27 0.32
diff(data)
output:
XX.1.2 XX.5.19 XX.2.21 XX.2.16 XX.5.27 XX.3.5 XX.2.12 XX.4.15
0.05 0.05 0.03 0.01 0.05 0.01 0.00 0.05
I do not want to lose the first (XX.3.22) sample.
I expect:
XX.3.22 = 0.12
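One way to get that labelling (a sketch): keep the diff() values but relabel them with the names of the left-hand samples, so the first name (XX.3.22) is retained:
d   <- diff(data)
res <- setNames(d, head(names(data), -1))
res["XX.3.22"]
# XX.3.22
#    0.12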

Import dataset in R

Sorry, possibly a very silly question, but I couldn't find the answer. How do I load this kind of .dat file in R and stack the values in one column? I have been trying
NerveData<-as.vector(read.table("D:/Dropbox/nerve.dat", sep=" ")$value)
The data set looks like
0.21 0.03 0.05 0.11 0.59 0.06
0.18 0.55 0.37 0.09 0.14 0.19
0.02 0.14 0.09 0.05 0.15 0.23
0.15 0.08 0.24 0.16 0.06 0.11
0.15 0.09 0.03 0.21 0.02 0.14
0.24 0.29 0.16 0.07 0.07 0.04
0.02 0.15 0.12 0.26 0.15 0.33
If you want to read all the data in as a single vector, use
src <- "http://www.stat.cmu.edu/~larry/all-of-nonpar/=data/nerve.dat"
NerveData <- scan(src, numeric())
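A quick sanity check (sketch) that the result is one long numeric vector, i.e. a single column of values:
str(NerveData)     # num [1:n] ... one reading per element
head(NerveData)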
Actually, I found an easier solution; thanks for the initial help:
Nervedata<-read.table("nerve.dat",sep ="\t")
Nervedata2<-c(t(Nervedata))
Simply use read.table with the correct separator, which in your case is probably \t, a tab character.
So try:
NerveData = read.table("D:/Dropbox/nerve.dat", sep="\t")

Profiling data.table's setkey operation with Rprof

I am working with a relatively large data.table dataset and trying to profile/optimize the code. I am using Rprof, but I'm noticing that the majority of time spent within a setkey operation is not included in the Rprof summary. Is there a way to include this time spent?
Here is a small test to show how time spent setting the key for a data table is not represented in the Rprof summary:
Create a test function that runs a profiled setkey operation on a data table:
testFun <- function(testTbl) {
    Rprof()
    setkey(testTbl, x, y, z)
    Rprof(NULL)
    print(summaryRprof())
}
Then create a test data table that is large enough to feel the weight of the setkey operation:
testTbl = data.table(x=sample(1:1e7, 1e7), y=sample(1:1e7,1e7), z=sample(1:1e7,1e7))
Then run the code, wrapping it in a system.time call to show the difference between the system.time total time and the Rprof total time:
> system.time(testFun(testTbl))
$by.self
self.time self.pct total.time total.pct
"sort.list" 0.88 75.86 0.88 75.86
"<Anonymous>" 0.08 6.90 1.00 86.21
"regularorder1" 0.08 6.90 0.92 79.31
"radixorder1" 0.08 6.90 0.12 10.34
"is.na" 0.02 1.72 0.02 1.72
"structure" 0.02 1.72 0.02 1.72
$by.total
total.time total.pct self.time self.pct
"setkey" 1.16 100.00 0.00 0.00
"setkeyv" 1.16 100.00 0.00 0.00
"system.time" 1.16 100.00 0.00 0.00
"testFun" 1.16 100.00 0.00 0.00
"fastorder" 1.14 98.28 0.00 0.00
"tryCatch" 1.14 98.28 0.00 0.00
"tryCatchList" 1.14 98.28 0.00 0.00
"tryCatchOne" 1.14 98.28 0.00 0.00
"<Anonymous>" 1.00 86.21 0.08 6.90
"regularorder1" 0.92 79.31 0.08 6.90
"sort.list" 0.88 75.86 0.88 75.86
"radixorder1" 0.12 10.34 0.08 6.90
"doTryCatch" 0.12 10.34 0.00 0.00
"is.na" 0.02 1.72 0.02 1.72
"structure" 0.02 1.72 0.02 1.72
"is.unsorted" 0.02 1.72 0.00 0.00
"simpleError" 0.02 1.72 0.00 0.00
$sample.interval
[1] 0.02
$sampling.time
[1] 1.16
user system elapsed
31.112 0.211 31.101
Note the 1.16 and 31.101 time differences.
Reading ?Rprof, I see why this difference might have occurred:
Functions will only be recorded in the profile log if they put a
context on the call stack (see sys.calls). Some primitive functions do
not do so: specifically those which are of type "special" (see the ‘R
Internals’ manual for more details).
So is this the reason why time spent within the setkey operation isn't represented in Rprof? Is there a workaround to have Rprof watch all of data.table's operations (including setkey, and maybe others I haven't noticed)? I essentially want the system.time and Rprof times to match up.
Here is the most-likely relevant sessionInfo():
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
data.table_1.8.11
I still observe this issue when Rprof() isn't within a function call:
> testFun <- function(testTbl) {
+ setkey(testTbl, x, y, z)
+ }
> Rprof()
> system.time(testFun(testTbl))
user system elapsed
28.855 0.191 28.854
> Rprof(NULL)
> summaryRprof()
$by.self
self.time self.pct total.time total.pct
"sort.list" 0.86 71.67 0.88 73.33
"regularorder1" 0.08 6.67 0.92 76.67
"<Anonymous>" 0.06 5.00 0.98 81.67
"radixorder1" 0.06 5.00 0.10 8.33
"gc" 0.06 5.00 0.06 5.00
"proc.time" 0.04 3.33 0.04 3.33
"is.na" 0.02 1.67 0.02 1.67
"sys.function" 0.02 1.67 0.02 1.67
$by.total
total.time total.pct self.time self.pct
"system.time" 1.20 100.00 0.00 0.00
"setkey" 1.10 91.67 0.00 0.00
"setkeyv" 1.10 91.67 0.00 0.00
"testFun" 1.10 91.67 0.00 0.00
"fastorder" 1.08 90.00 0.00 0.00
"tryCatch" 1.08 90.00 0.00 0.00
"tryCatchList" 1.08 90.00 0.00 0.00
"tryCatchOne" 1.08 90.00 0.00 0.00
"<Anonymous>" 0.98 81.67 0.06 5.00
"regularorder1" 0.92 76.67 0.08 6.67
"sort.list" 0.88 73.33 0.86 71.67
"radixorder1" 0.10 8.33 0.06 5.00
"doTryCatch" 0.10 8.33 0.00 0.00
"gc" 0.06 5.00 0.06 5.00
"proc.time" 0.04 3.33 0.04 3.33
"is.na" 0.02 1.67 0.02 1.67
"sys.function" 0.02 1.67 0.02 1.67
"formals" 0.02 1.67 0.00 0.00
"is.unsorted" 0.02 1.67 0.00 0.00
"match.arg" 0.02 1.67 0.00 0.00
$sample.interval
[1] 0.02
$sampling.time
[1] 1.2
EDIT2: Same issue with 1.8.10 on my machine with only the data.table package loaded. Times are not equal even when the Rprof() call is not within a function:
> library(data.table)
data.table 1.8.10 For help type: help("data.table")
> base::source("/tmp/r-plugin-claytonstanley/Rsource-86075-preProcess.R", echo=TRUE)
> testFun <- function(testTbl) {
+ setkey(testTbl, x, y, z)
+ }
> testTbl = data.table(x=sample(1:1e7, 1e7), y=sample(1:1e7,1e7), z=sample(1:1e7,1e7))
> Rprof()
> system.time(testFun(testTbl))
user system elapsed
29.516 0.281 29.760
> Rprof(NULL)
> summaryRprof()
EDIT3: Doesn't work even if setkey is not within a function:
> library(data.table)
data.table 1.8.10 For help type: help("data.table")
> testTbl = data.table(x=sample(1:1e7, 1e7), y=sample(1:1e7,1e7), z=sample(1:1e7,1e7))
> Rprof()
> setkey(testTbl, x, y, z)
> Rprof(NULL)
> summaryRprof()
EDIT4: Doesn't work even when R is called from a --vanilla bare-bones terminal prompt.
EDIT5: Does work when tested on a Linux VM. But still does not work on darwin machine for me.
EDIT6: Doesn't work after watching the Rprof.out file get created, so it isn't a write access issue.
EDIT7: Doesn't work after compiling data.table from source and creating a new temp user and running on that account.
EDIT8: Doesn't work when compiling R 3.0.2 from source for darwin via MacPorts.
EDIT9: Does work on a different darwin machine, a Macbook Pro laptop running the same OS version (10.6.8). Still doesn't work on a MacPro desktop machine running same OS version, R version, data.table version, etc.
I'm thinking it's because the desktop machine is running in 64-bit kernel mode (not the default), and the laptop is running the 32-bit kernel (the default). Confirmed.
Great question. Given the edits, I'm not sure then; I can't reproduce it. I'm leaving the remainder of the answer here for now.
I've tested on my (very slow) netbook and it works fine, see output below.
I can tell you right now why setkey is so slow on that test case. When the number of levels is large (greater than 100,000, as here) it reverts to comparison sort rather than counting sort. Yes, that's pretty poor if you have data like that in practice. Typically we have under 100,000 unique values in the first column and then, say, dates in the second column; both columns can be sorted using counting sort and performance is OK.
It's a known issue and we've been working hard on it. Arun has implemented radix sort for integers with range > 100,000 to solve this problem, and that's in the next release. But we are still tidying up v1.8.11. See our presentation in Cologne, which goes into more detail and gives some idea of the speedups:
Introduction to data.table and news from v1.8.11
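As a rough illustration of that threshold (a sketch, not a benchmark from the presentation; timings will vary by machine and version):
library(data.table)
# under 100,000 unique values per column: counting sort applies
few  <- data.table(x = sample(1:1e5, 1e7, replace = TRUE),
                   y = sample(1:1e5, 1e7, replace = TRUE))
system.time(setkey(few, x, y))
# ~1e7 unique values per column: reverts to comparison sort in 1.8.10
many <- data.table(x = sample(1:1e7, 1e7), y = sample(1:1e7, 1e7))
system.time(setkey(many, x, y))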
Here is the output with v1.8.10, along with R version and lscpu info (for your entertainment). I like to test on a very poor machine with small cache so that in development I can see what's likely to bite when the data is scaled up on larger machines with larger cache.
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 20
Model: 2
Stepping: 0
CPU MHz: 800.000
BogoMIPS: 1995.01
Virtualisation: AMD-V
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
NUMA node0 CPU(s): 0,1
$ R
R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> require(data.table)
Loading required package: data.table
data.table 1.8.10 For help type: help("data.table")
> testTbl = data.table(x=sample(1:1e7, 1e7), y=sample(1:1e7,1e7), z=sample(1:1e7,1e7))
> testTbl
x y z
1: 1748920 6694402 7501082
2: 4571252 565976 5695727
3: 1284455 8282944 7706392
4: 8452994 8765774 6541097
5: 6429283 329475 5271154
---
9999996: 2019750 5956558 1735214
9999997: 1096888 1657401 3519573
9999998: 1310171 9002746 350394
9999999: 5393125 5888350 7657290
10000000: 2210918 7577598 5002307
> Rprof()
> setkey(testTbl, x, y, z)
> Rprof(NULL)
> summaryRprof()
$by.self
self.time self.pct total.time total.pct
"sort.list" 195.44 91.34 195.44 91.34
".Call" 5.38 2.51 5.38 2.51
"<Anonymous>" 4.32 2.02 203.62 95.17
"radixorder1" 4.32 2.02 4.74 2.22
"regularorder1" 4.28 2.00 199.30 93.15
"is.na" 0.12 0.06 0.12 0.06
"any" 0.10 0.05 0.10 0.05
$by.total
total.time total.pct self.time self.pct
"setkey" 213.96 100.00 0.00 0.00
"setkeyv" 213.96 100.00 0.00 0.00
"fastorder" 208.36 97.38 0.00 0.00
"tryCatch" 208.36 97.38 0.00 0.00
"tryCatchList" 208.36 97.38 0.00 0.00
"tryCatchOne" 208.36 97.38 0.00 0.00
"<Anonymous>" 203.62 95.17 4.32 2.02
"regularorder1" 199.30 93.15 4.28 2.00
"sort.list" 195.44 91.34 195.44 91.34
".Call" 5.38 2.51 5.38 2.51
"radixorder1" 4.74 2.22 4.32 2.02
"doTryCatch" 4.74 2.22 0.00 0.00
"is.unsorted" 0.22 0.10 0.00 0.00
"is.na" 0.12 0.06 0.12 0.06
"any" 0.10 0.05 0.10 0.05
$sample.interval
[1] 0.02
$sampling.time
[1] 213.96
>
The problem was that the darwin machine was running Snow Leopard with a 64-bit kernel, which is not the default for that OS X version.
I also verified that this is not a problem for another darwin machine running Mountain Lion which uses a 64-bit kernel by default. So it's an interaction between Snow Leopard and running a 64-bit kernel specifically.
As a side note, the official OS X binary installer for R is still built with Snow Leopard, so I do think that this issue is still relevant, as Snow Leopard is still a widely-used OS X version.
When the 64-bit kernel in Snow Leopard is enabled, no kernel extensions that are compatible only with the 32-bit kernel are loaded. After booting into the default 32-bit kernel for Snow Leopard, kextfind shows that these 32-bit only kernel extensions are on the machine and (most likely) loaded:
$ kextfind -not -arch x86_64
/System/Library/Extensions/ACard6280ATA.kext
/System/Library/Extensions/ACard62xxM.kext
/System/Library/Extensions/ACard67162.kext
/System/Library/Extensions/ACard671xSCSI.kext
/System/Library/Extensions/ACard6885M.kext
/System/Library/Extensions/ACard68xxM.kext
/System/Library/Extensions/AppleIntelGMA950.kext
/System/Library/Extensions/AppleIntelGMAX3100.kext
/System/Library/Extensions/AppleIntelGMAX3100FB.kext
/System/Library/Extensions/AppleIntelIntegratedFramebuffer.kext
/System/Library/Extensions/AppleProfileFamily.kext/Contents/PlugIns/AppleIntelYonahProfile.kext
/System/Library/Extensions/IO80211Family.kext/Contents/PlugIns/AirPortAtheros.kext
/System/Library/Extensions/IONetworkingFamily.kext/Contents/PlugIns/AppleRTL8139Ethernet.kext
/System/Library/Extensions/IOSerialFamily.kext/Contents/PlugIns/InternalModemSupport.kext
/System/Library/Extensions/IOSerialFamily.kext/Contents/PlugIns/MotorolaSM56KUSB.kext
/System/Library/Extensions/JMicronATA.kext
/System/Library/Extensions/System.kext/PlugIns/BSDKernel6.0.kext
/System/Library/Extensions/System.kext/PlugIns/IOKit6.0.kext
/System/Library/Extensions/System.kext/PlugIns/Libkern6.0.kext
/System/Library/Extensions/System.kext/PlugIns/Mach6.0.kext
/System/Library/Extensions/System.kext/PlugIns/System6.0.kext
/System/Library/Extensions/ufs.kext
So it could be any one of those loaded extensions that is enabling something Rprof relies on, so that the setkey operation in data.table is profiled correctly.
If anyone wants to investigate this further, dig a bit deeper, and get to the root cause of the problem, please post an answer and I'll happily accept that one.

How to efficiently grow large data in R

The product of one simulation is a large data.frame with fixed columns and rows. I ran several hundred simulations, with each result stored in a separate RData file (for efficient reading).
Now I want to gather all those files together and compute statistics for each field of this data.frame in the "cells" structure, which is basically a nested list of vectors (for each column and row there is a vector with one value per simulation). This is how I do it:
#colscount, rowscount - number of columns and rows from each simulation
#simcount - number of simulations
#colnames - names of columns of simulation's data frame
#simfilenames - vector with filenames of each simulation
cells <- as.list(rep(NA, colscount))
for (i in 1:colscount)
{
    cells[[i]] <- as.list(rep(NA, rowscount))
    for (j in 1:rowscount)
    {
        cells[[i]][[j]] <- rep(NA, simcount)
    }
}
names(cells) <- colnames

addcells <- function(simnr)
# This function reads and appends simdata to the "simnr" position in each cell of the "cells" structure
{
    simdata <- readRDS(simfilenames[[simnr]])
    for (i in 1:colscount)
    {
        for (j in 1:rowscount)
        {
            if (!is.na(simdata[j, i]))
            {
                cells[[i]][[j]][simnr] <- simdata[j, i]
            }
        }
    }
}

library(plyr)
a_ply(1:simcount, 1, addcells)
The problem is the huge difference in run time. Reading one simulation file is fast:
> system.time(dane<-readRDS(path.cat(args$rdatapath,pliki[[simnr]]))$dane)
user system elapsed
0.088 0.004 0.093
while a single call to addcells is very slow:
> system.time(addcells(1))
user system elapsed
147.328 0.296 147.644
I would expect both commands to have comparable execution times (or at least the latter to be at most 10x slower). I guess I am doing something very inefficient there, but what? The whole cells data structure is rather big; it takes around 1 GB of memory.
I need to transpose the data in this way because later I compute many descriptive statistics on the results (means, sd, quantiles, and maybe histograms), so it is important that the data for each cell is stored as a (one-dimensional) vector.
Here is profiling output:
> summaryRprof('/tmp/temp/rprof.out')
$by.self
self.time self.pct total.time total.pct
"[.data.frame" 71.98 47.20 129.52 84.93
"names" 11.98 7.86 11.98 7.86
"length" 10.84 7.11 10.84 7.11
"addcells" 10.66 6.99 151.52 99.36
".subset" 10.62 6.96 10.62 6.96
"[" 9.68 6.35 139.20 91.28
"match" 6.06 3.97 11.36 7.45
"sys.call" 4.68 3.07 4.68 3.07
"%in%" 4.50 2.95 15.86 10.40
"all" 4.28 2.81 4.28 2.81
"==" 2.34 1.53 2.34 1.53
".subset2" 1.28 0.84 1.28 0.84
"is.na" 1.06 0.70 1.06 0.70
"nargs" 0.62 0.41 0.62 0.41
"gc" 0.54 0.35 0.54 0.35
"!" 0.42 0.28 0.42 0.28
"dim" 0.34 0.22 0.34 0.22
".Call" 0.12 0.08 0.12 0.08
"readRDS" 0.10 0.07 0.12 0.08
"cat" 0.10 0.07 0.10 0.07
"readLines" 0.04 0.03 0.04 0.03
"strsplit" 0.04 0.03 0.04 0.03
"addParaBreaks" 0.02 0.01 0.04 0.03
It looks like indexing the list structure takes a lot of time. But I can't make it an array, because not all cells are numeric, and R doesn't easily support hash maps...
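Since "[.data.frame" dominates the profile, one hedged sketch of a cheaper inner loop is to extract each column once per file and index plain vectors (addcells_fast is a hypothetical name; note the <<-, since a plain <- inside a function only updates a local copy of cells):
addcells_fast <- function(simnr)
{
    simdata <- readRDS(simfilenames[[simnr]])
    for (i in 1:colscount)
    {
        col <- simdata[[i]]              # whole column as a vector: one cheap extraction
        for (j in which(!is.na(col)))
        {
            cells[[i]][[j]][simnr] <<- col[j]
        }
    }
}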

transpose 250,000 rows into columns in R

I always transpose by using the t(file) command in R.
But it is not running properly (not running at all) on a big data file (250,000 rows and 200 columns). Any ideas?
I need to calculate the correlation between the 2nd row (PTBP1) and all other rows (except 8 rows, including the header). In order to do this I transpose rows to columns and then use the cor function.
But I am stuck at the transpose step. Any help would be really appreciated!
I copied the example below from one of the posts on Stack Overflow (they are discussing almost the same problem, but there seems to be no answer yet!):
ID A B C D E F G H I [200 columns]
Row0$-1 0.08 0.47 0.94 0.33 0.08 0.93 0.72 0.51 0.55
Row02$1 0.37 0.87 0.72 0.96 0.20 0.55 0.35 0.73 0.44
Row03$ 0.19 0.71 0.52 0.73 0.03 0.18 0.13 0.13 0.30
Row04$- 0.08 0.77 0.89 0.12 0.39 0.18 0.74 0.61 0.57
Row05$- 0.09 0.60 0.73 0.65 0.43 0.21 0.27 0.52 0.60
Row06-$ 0.60 0.54 0.70 0.56 0.49 0.94 0.23 0.80 0.63
Row07$- 0.02 0.33 0.05 0.90 0.48 0.47 0.51 0.36 0.26
Row08$_ 0.34 0.96 0.37 0.06 0.20 0.14 0.84 0.28 0.47
........
250,000 rows
Use a matrix instead. The only advantage of a data.frame over a matrix is the capacity to have different classes in the columns, and you clearly do not have that situation, since a transposed data.frame could not support such a result.
I don't get why you want to transpose the data.frame. If you just use cor it doesn't matter if your data is in rows or columns.
Actually, it is one of the major advantages of R that it doesn't matter whether your data fits the classical row-column pattern that SPSS and other programs require.
There are numerous ways to correlate the first row with all other rows (I don't get which rows you want to exclude). One is using a loop (here the loop is implicit in the call to one of the *apply family functions):
lapply(2:(dim(fn)[1]), function(x) cor(fn[1,],fn[x,]))
Note that I expect your data.frame to be called fn. To skip some rows, change the 2 to the number you want. Furthermore, I would probably use vapply here.
I hope this answer points you in the right direction, which is to avoid t() if you don't absolutely need it.
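Putting both suggestions together, a hedged sketch (assuming the IDs sit in the first column, the remaining 200 columns are numeric, and "PTBP1" is the row name mentioned in the question; the file name is hypothetical):
fn <- read.table("bigfile.txt", header = TRUE)   # hypothetical file name
m  <- as.matrix(fn[, -1])                        # drop the ID column; numeric matrix
rownames(m) <- fn[[1]]
target <- m["PTBP1", ]
cors   <- cor(t(m), target)   # correlation of every row with PTBP1, no explicit loop
head(cors)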
