I have a two-column tibble with many rows, and I would like to display its contents in an HTML table, flowing the rows across multiple column groups. Here's a sample tibble typical of the data I'm working with.
dat <- structure(list(scores = 328:360, points = c(1.44976648822324,
2.39850620178477, 3.54432361637504, 4.87641377160755, 6.38641933285773,
8.06758106817846, 9.91425882252425, 11.9216360940354, 14.0855256867808,
16.4022354393545, 18.8684718158155, 21.4812684932283, 24.2379320869751,
27.136, 30.1732070790481, 33.3474588150326, 36.6568095024477,
40.0994442217347, 43.6736638137649, 47.3778722279125, 51.2105657758874,
55.1703239324027, 59.2558014037444, 63.46572124494, 67.7988688512606,
72.2540866842331, 76.8302696189673, 81.5263608204015, 86.3413480724772,
91.2742604972992, 96.3241656118041, 101.490166677918, 106.771400309065
)), row.names = c(NA, -33L), class = c("tbl_df", "tbl", "data.frame"
))
I would like to display those data pairs in an HTML table something like the following:
| Score | Points | Score | Points | Score | Points | Score | Points |
|------:|-------:|------:|-------:|------:|-------:|------:|-------:|
| 328 | 1.4 | 337 | 16.4 | 346 | 43.7 | 355 | 81.5 |
| 329 | 2.4 | 338 | 18.9 | 347 | 47.4 | 356 | 86.3 |
| 330 | 3.5 | 339 | 21.5 | 348 | 51.2 | 357 | 91.3 |
| 331 | 4.9 | 340 | 24.2 | 349 | 55.2 | 358 | 96.3 |
| 332 | 6.4 | 341 | 27.1 | 350 | 59.3 | 359 | 101.5 |
| 333 | 8.1 | 342 | 30.2 | 351 | 63.5 | 360 | 106.8 |
| 334 | 9.9 | 343 | 33.3 | 352 | 67.8 | | |
| 335 | 11.9 | 344 | 36.7 | 353 | 72.3 | | |
| 336 | 14.1 | 345 | 40.1 | 354 | 76.8 | | |
I'd like a solution that generates a four-pair (Score/Points) column layout no matter how many rows are in the original tibble.
I started by slicing the tibble into four sections, but got stuck because the fourth one didn't have as many elements as the first three.
Any suggestions on a method to accomplish this?
You can write a small function that takes the data and the number of column groups you need; the default is 4 groups:
reshaping = function(dat, cols = 4){
  n = nrow(dat)
  m = ceiling(n / cols)                   # rows per column group
  time = rep(1:cols, each = m, len = n)   # which column group each row goes to
  id = rep(1:m, times = cols, len = n)    # row position within the group
  reshape(cbind(id, time, dat), idvar = 'id', dir = 'wide')[-1]
}
reshaping(dat)
scores.1 points.1 scores.2 points.2 scores.3 points.3 scores.4 points.4
1 328 1.449766 337 16.40224 346 43.67366 355 81.52636
2 329 2.398506 338 18.86847 347 47.37787 356 86.34135
3 330 3.544324 339 21.48127 348 51.21057 357 91.27426
4 331 4.876414 340 24.23793 349 55.17032 358 96.32417
5 332 6.386419 341 27.13600 350 59.25580 359 101.49017
6 333 8.067581 342 30.17321 351 63.46572 360 106.77140
7 334 9.914259 343 33.34746 352 67.79887 NA NA
8 335 11.921636 344 36.65681 353 72.25409 NA NA
9 336 14.085526 345 40.09944 354 76.83027 NA NA
reshaping(dat,8)
scores.1 points.1 scores.2 points.2 scores.3 points.3 scores.4 points.4 scores.5 points.5 scores.6 points.6 scores.7 points.7
1 328 1.449766 333 8.067581 338 18.86847 343 33.34746 348 51.21057 353 72.25409 358 96.32417
2 329 2.398506 334 9.914259 339 21.48127 344 36.65681 349 55.17032 354 76.83027 359 101.49017
3 330 3.544324 335 11.921636 340 24.23793 345 40.09944 350 59.25580 355 81.52636 360 106.77140
4 331 4.876414 336 14.085526 341 27.13600 346 43.67366 351 63.46572 356 86.34135 NA NA
5 332 6.386419 337 16.402235 342 30.17321 347 47.37787 352 67.79887 357 91.27426 NA NA
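Since the goal is an HTML table, one option is to pass the reshaped data frame to knitr::kable(). A minimal sketch, assuming knitr is installed; the repeated column pairs are relabelled via col.names, and the trailing NA cells are blanked out:
library(knitr)
options(knitr.kable.NA = "")  # render the trailing NA cells as empty
kable(reshaping(dat), format = "html", digits = 1, align = "r",
      col.names = rep(c("Score", "Points"), 4))
This should produce markup close to the markdown mock-up above: right-aligned columns, points rounded to one decimal, and empty cells where the last column group runs out of rows.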
I have a table in a MariaDB 10.3.27 database that looks like this:
+----+------------+---------------+-----------------+
| id | channel_id | timestamp | value |
+----+------------+---------------+-----------------+
| 1 | 2 | 1623669600000 | 2882.4449252449 |
| 2 | 1 | 1623669600000 | 295.46914369742 |
| 3 | 2 | 1623669630000 | 2874.46365243 |
| 4 | 1 | 1623669630000 | 295.68124546516 |
| 5 | 2 | 1623669660000 | 2874.9638893452 |
| 6 | 1 | 1623669660000 | 295.69561247521 |
| 7 | 2 | 1623669690000 | 2878.7120274678 |
and I want to have a result like this:
+------+-------+-------+
| hour | valhh | valwp |
+------+-------+-------+
| 0 | 419 | 115 |
| 1 | 419 | 115 |
| 2 | 419 | 115 |
| 3 | 419 | 115 |
| 4 | 419 | 115 |
| 5 | 419 | 115 |
| 6 | 419 | 115 |
| 7 | 419 | 115 |
| 8 | 419 | 115 |
| 9 | 419 | 115 |
| 10 | 419 | 115 |
| 11 | 419 | 115 |
| 12 | 419 | 115 |
| 13 | 419 | 115 |
| 14 | 419 | 115 |
| 15 | 419 | 115 |
| 16 | 419 | 115 |
| 17 | 419 | 115 |
| 18 | 419 | 115 |
| 19 | 419 | 115 |
| 20 | 419 | 115 |
| 21 | 419 | 115 |
| 22 | 419 | 115 |
| 23 | 419 | 115 |
+------+-------+-------+
but with valhh (valwp) being, for each hour of the day, the average of the values over all days where channel_id is 1 (2), not the overall average. So far, I've tried:
select h.hour, hh.valhh, wp.valwp from
(select hour(from_unixtime(timestamp/1000)) as hour from data) h,
(select hour(from_unixtime(timestamp/1000)) as hour, cast(avg(value) as integer) as valhh from data where channel_id = 1) hh,
(select hour(from_unixtime(timestamp/1000)) as hour, cast(avg(value) as integer) as valwp from data where channel_id = 2) wp group by h.hour;
which gives the result above (average of all values).
I can get what I want by querying the channels separately, i.e.:
select hour(from_unixtime(timestamp/1000)) as hour, cast(avg(value) as integer) as value from data where channel_id = 1 group by hour;
gives
+------+-------+
| hour | value |
+------+-------+
| 0 | 326 |
| 1 | 145 |
| 2 | 411 |
| 3 | 142 |
| 4 | 143 |
| 5 | 171 |
| 6 | 160 |
| 7 | 487 |
| 8 | 408 |
| 9 | 186 |
| 10 | 214 |
| 11 | 199 |
| 12 | 942 |
| 13 | 521 |
| 14 | 196 |
| 15 | 247 |
| 16 | 364 |
| 17 | 252 |
| 18 | 392 |
| 19 | 916 |
| 20 | 1024 |
| 21 | 1524 |
| 22 | 561 |
| 23 | 249 |
+------+-------+
but I want to have both channels in one result set as separate columns.
How would I do that?
Thanks!
After a steep learning curve I think I figured it out:
select
hh.hour, hh.valuehh, wp.valuewp
from
(select
hour(from_unixtime(timestamp/1000)) as hour,
cast(avg(value) as integer) as valuehh
from data
where channel_id=1
group by hour) hh
inner join
(select
hour(from_unixtime(timestamp/1000)) as hour,
cast(avg(value) as integer) as valuewp
from data
where channel_id=2
group by hour) wp
on hh.hour = wp.hour;
gives
+------+---------+---------+
| hour | valuehh | valuewp |
+------+---------+---------+
| 0 | 300 | 38 |
| 1 | 162 | 275 |
| 2 | 338 | 668 |
| 3 | 166 | 38 |
| 4 | 152 | 38 |
| 5 | 176 | 37 |
| 6 | 174 | 38 |
| 7 | 488 | 36 |
| 8 | 553 | 37 |
| 9 | 198 | 36 |
| 10 | 214 | 38 |
| 11 | 199 | 612 |
| 12 | 942 | 40 |
| 13 | 521 | 99 |
| 14 | 187 | 38 |
| 15 | 209 | 38 |
| 16 | 287 | 39 |
| 17 | 667 | 37 |
| 18 | 615 | 39 |
| 19 | 854 | 199 |
| 20 | 1074 | 44 |
| 21 | 1470 | 178 |
| 22 | 665 | 37 |
| 23 | 235 | 38 |
+------+---------+---------+
I am new to R and I am trying to fit a nonlinear correlation of the form below. I have tried a script in R, but it is not working and returns the error message "singular gradient matrix at initial parameter estimates". Can someone please help me with the right script to estimate the updated correlation coefficients from a new data set? The data set consists of 3 variables: Z, X and Y. I would like to estimate Z = f(X, Y).
Thank you
Equation to Fit
z = a + bx + cy + dx^2 + ey^2 + fxy + gx^3 + hy^3 + ixy^2 + jx^2y
a 0.065119008
b -0.002506607
c 0.004586821
d 3.73635E-05
e 8.41116E-07
f -1.7902E-05
g -1.28967E-07
h -1.04123E-10
i -2.40641E-09
j 4.42138E-08
X | Y | Z
_______ | _______ | _______
60 | 100 | 0.41994
60 | 200 | 0.79807
60 | 300 | 1.18778
60 | 400 | 1.58945
60 | 500 | 2.00336
60 | 600 | 2.42971
60 | 700 | 2.86858
60 | 800 | 3.31989
60 | 900 | 3.78335
60 | 1000 | 4.25842
60 | 1100 | 4.74429
60 | 1200 | 5.23983
60 | 1300 | 5.74359
60 | 1400 | 6.25381
60 | 1500 | 6.76844
60 | 1600 | 7.28523
60 | 1700 | 7.80179
60 | 1800 | 8.31574
60 | 1900 | 8.82475
60 | 2000 | 9.32668
80 | 100 | 0.40357
80 | 200 | 0.76552
80 | 300 | 1.13711
80 | 400 | 1.5185
80 | 500 | 1.90979
80 | 600 | 2.311
80 | 700 | 2.72205
80 | 800 | 3.14274
80 | 900 | 3.57269
80 | 1000 | 4.01141
80 | 1100 | 4.45817
80 | 1200 | 4.91207
80 | 1300 | 5.37202
80 | 1400 | 5.83674
80 | 1500 | 6.30477
80 | 1600 | 6.77453
80 | 1700 | 7.24438
80 | 1800 | 7.71262
80 | 1900 | 8.17761
80 | 2000 | 8.63777
100 | 100 | 0.38847
100 | 200 | 0.73573
100 | 300 | 1.09104
100 | 400 | 1.45447
100 | 500 | 1.82598
100 | 600 | 2.20551
100 | 700 | 2.59287
100 | 800 | 2.9878
100 | 900 | 3.38993
100 | 1000 | 3.79877
100 | 1100 | 4.21372
100 | 1200 | 4.63401
100 | 1300 | 5.0588
100 | 1400 | 5.48709
100 | 1500 | 5.91781
100 | 1600 | 6.3498
100 | 1700 | 6.78184
100 | 1800 | 7.21271
100 | 1900 | 7.64119
100 | 2000 | 8.06612
120 | 100 | 0.37451
120 | 200 | 0.70832
120 | 300 | 1.04892
120 | 400 | 1.39627
120 | 500 | 1.7503
120 | 600 | 2.11085
120 | 700 | 2.47771
120 | 800 | 2.85059
120 | 900 | 3.22913
120 | 1000 | 3.61287
120 | 1100 | 4.00129
120 | 1200 | 4.39376
120 | 1300 | 4.78958
120 | 1400 | 5.18797
120 | 1500 | 5.58809
120 | 1600 | 5.98905
120 | 1700 | 6.38994
120 | 1800 | 6.78981
120 | 1900 | 7.18777
120 | 2000 | 7.58291
140 | 100 | 0.36155
140 | 200 | 0.683
140 | 300 | 1.01021
140 | 400 | 1.34307
140 | 500 | 1.68148
140 | 600 | 2.02523
140 | 700 | 2.37411
140 | 800 | 2.72783
140 | 900 | 3.08602
140 | 1000 | 3.4483
140 | 1100 | 3.81418
140 | 1200 | 4.18314
140 | 1300 | 4.55459
140 | 1400 | 4.9279
140 | 1500 | 5.3024
140 | 1600 | 5.67739
140 | 1700 | 6.05216
140 | 1800 | 6.42596
140 | 1900 | 6.7981
140 | 2000 | 7.16787
160 | 100 | 0.34948
160 | 200 | 0.65953
160 | 300 | 0.97447
160 | 400 | 1.29419
160 | 500 | 1.61852
160 | 600 | 1.94728
160 | 700 | 2.28022
160 | 800 | 2.61706
160 | 900 | 2.95748
160 | 1000 | 3.3011
160 | 1100 | 3.64752
160 | 1200 | 3.99628
160 | 1300 | 4.34688
160 | 1400 | 4.6988
160 | 1500 | 5.05149
160 | 1600 | 5.40438
160 | 1700 | 5.7569
160 | 1800 | 6.10847
160 | 1900 | 6.4585
160 | 2000 | 6.80647
180 | 100 | 0.33822
180 | 200 | 0.6377
180 | 300 | 0.94137
180 | 400 | 1.24907
180 | 500 | 1.56064
180 | 600 | 1.87588
180 | 700 | 2.19455
180 | 800 | 2.51639
180 | 900 | 2.84109
180 | 1000 | 3.16833
180 | 1100 | 3.49772
180 | 1200 | 3.82888
180 | 1300 | 4.16138
180 | 1400 | 4.49478
180 | 1500 | 4.82863
180 | 1600 | 5.16245
180 | 1700 | 5.49577
180 | 1800 | 5.82812
180 | 1900 | 6.15903
180 | 2000 | 6.48806
200 | 100 | 0.32767
200 | 200 | 0.61734
200 | 300 | 0.91058
200 | 400 | 1.20725
200 | 500 | 1.50717
200 | 600 | 1.81015
200 | 700 | 2.11596
200 | 800 | 2.42434
200 | 900 | 2.73502
200 | 1000 | 3.04768
200 | 1100 | 3.36202
200 | 1200 | 3.67767
200 | 1300 | 3.99427
200 | 1400 | 4.31145
200 | 1500 | 4.62882
200 | 1600 | 4.94597
200 | 1700 | 5.26253
200 | 1800 | 5.57809
200 | 1900 | 5.89227
200 | 2000 | 6.2047
I'm not entirely sure what it is you would like to do, or why google was unsatisfactory, but maybe something along these lines will give you an idea:
x <- rep(c(60,80,100,160,200), each = 10)
y <- c(seq(from = 100, to = 2000, length.out = 25),seq(1800, 200, length.out = 25))
z <- rnorm(50, 6)
df <- data.frame(x,y,z)
mod <- lm(z ~ 1 + x + y + I(x^2) + I(y^2) + I(x*y) + I(x^3) + I(y^3) + I(x*y^2) + I(x^2*y), data = df)
summary(mod)
summary(mod)$adj.r.squared
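Note that the equation, although a polynomial in x and y, is linear in the coefficients a through j, which is why lm() works here where nls() fails with a singular gradient: there is nothing nonlinear to iterate over. Once the model is fit on your real data, the updated coefficients can be read off directly; a small sketch (the renaming assumes the formula lists the terms in the same order as the equation above):
coefs <- coef(mod)
names(coefs) <- letters[1:10]  # a (intercept) through j, in the equation's order
coefs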
I have a table of doctor visits wherein there are sometimes multiple records for the same encounter key if there are multiple diagnoses, such as:
Enc_Key | Patient_Key | Enc_Date | Diag_Key
123 789 20160512 765
123 789 20160512 263
123 789 20160515 493
546 013 20160226 765
564 444 20160707 004
789 226 20160707 546
789 226 20160707 765
I am trying to create an indicator variable based on the value of the Diag_Key column, but I need to apply it to the entire encounter. In other words, if a record has a Diag_Key value of "765", then I want to set the indicator variable to "1" for every record that has the same Enc_Key as that record, such as below:
Enc_Key | Patient_Key | Enc_Date | Diag_Key | Diag_Ind
123 789 20160512 765 1
123 789 20160512 263 1
123 789 20160515 493 1
546 013 20160226 723 0
564 444 20160707 004 0
789 226 20160707 546 1
789 226 20160707 765 1
I can't seem to figure out a way to apply this binary indicator to multiple different records. I have been using a line of code that resembles this:
tbl$Diag_Ind <- ifelse(grepl('765',tbl$Diag_Key),1,0)
but this would only assign a value of "1" to the single record with that Diag_Key value, and I'm unsure of how to apply it to the rest of the records with the same Enc_Key value
Use == to compare values directly and %in% for filtering with multiple values. For example, this will identify all Enc_Keys which have some Diag_Key == 765:
dat$Enc_Key[dat$Diag_Key == 765]
Then just select the data by Enc_Key and convert boolean to integer:
as.integer(
dat$Enc_Key %in% unique(dat$Enc_Key[dat$Diag_Key == 765])
)
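In base R the grouping can also be done in one step with ave(); a sketch, assuming Diag_Key is numeric:
# TRUE for every row whose Enc_Key group contains a 765, then coerced to 0/1
dat$Diag_Ind <- as.integer(ave(dat$Diag_Key == 765, dat$Enc_Key, FUN = any))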
Use mutate from dplyr. There may be a typo in the question: in the original data, Enc_Key 546 has Diag_Key 765, but in the required output it is shown as 723.
library(dplyr)
input = read.table(text = "Enc_Key Patient_Key Enc_Date Diag_Key
123 789 20160512 765
123 789 20160512 263
123 789 20160515 493
546 013 20160226 765
564 444 20160707 004
789 226 20160707 546
789 226 20160707 765", header = TRUE, stringsAsFactors = FALSE)
input %>%
  group_by(Enc_Key) %>%
  mutate(Diag_Ind = max(grepl('765', Diag_Key)))  # 1 if any diagnosis in the encounter matches
Output:
Enc_Key Patient_Key Enc_Date Diag_Key Diag_Ind
1 123 789 20160512 765 1
2 123 789 20160512 263 1
3 123 789 20160515 493 1
4 546 13 20160226 765 1
5 564 444 20160707 4 0
6 789 226 20160707 546 1
7 789 226 20160707 765 1
With corrected typo output is
Enc_Key Patient_Key Enc_Date Diag_Key Diag_Ind
1 123 789 20160512 765 1
2 123 789 20160512 263 1
3 123 789 20160515 493 1
4 546 13 20160226 723 0
5 564 444 20160707 4 0
6 789 226 20160707 546 1
7 789 226 20160707 765 1
I have data on the work stations where workers worked each day, and I need to find how many days a worker began working at the same station where he left off the previous day. Each observation is one work-day per worker.
worker.id | start.station | end.station | day
1 | 234 | 342 | 2015-01-02
1 | 342 | 425 | 2015-01-03
1 | 235 | 621 | 2015-01-04
2 | 155 | 732 | 2015-01-02
2 | 318 | 632 | 2015-01-03
2 | 632 | 422 | 2015-01-04
So the desired outcome would be a variable (same) that identifies days on which a worker started at the same work station where he left off the previous day (with NA or FALSE in the first observation for each worker).
worker.id | start.station | end.station | day | same
1 | 234 | 342 | 2015-01-02 | FALSE
1 | 342 | 425 | 2015-01-03 | TRUE
1 | 235 | 621 | 2015-01-04 | FALSE
2 | 155 | 732 | 2015-01-02 | FALSE
2 | 318 | 632 | 2015-01-03 | FALSE
2 | 632 | 422 | 2015-01-04 | TRUE
I think something using dplyr would work, but not sure what.
Thanks!
worker.id <- c(1,1,1,2,2,2)
start.station <- c(234,342,235,155,218,632)
end.station <- c(342,425,621,732,632,422)
day <- c("2015-01-02","2015-01-03","2015-01-04","2015-01-02","2015-01-03","2015-01-04")
df <- data.frame(worker.id, start.station, end.station, day)
worker.id start.station end.station day
1 1 234 342 2015-01-02
2 1 342 425 2015-01-03
3 1 235 621 2015-01-04
4 2 155 732 2015-01-02
5 2 218 632 2015-01-03
6 2 632 422 2015-01-04
library(dplyr)  # for lag()
df$same <- ifelse(df$start.station != lag(df$end.station) |
                  df$day == "2015-01-02", "FALSE", "TRUE")
worker.id start.station end.station day same
1 1 234 342 2015-01-02 FALSE
2 1 342 425 2015-01-03 TRUE
3 1 235 621 2015-01-04 FALSE
4 2 155 732 2015-01-02 FALSE
5 2 218 632 2015-01-03 FALSE
6 2 632 422 2015-01-04 TRUE
Per suggestions in the comments below, if you want to group by worker ID but still use ifelse (clunky):
df <- df %>%
  group_by(worker.id) %>%
  mutate(same = ifelse(start.station == lag(end.station), "TRUE", "FALSE")) %>%
  mutate(same = ifelse(is.na(same), "FALSE", same))  # first day of each worker
as.data.frame(df)
worker.id start.station end.station day same
1 1 234 342 2015-01-02 FALSE
2 1 342 425 2015-01-03 TRUE
3 1 235 621 2015-01-04 FALSE
4 2 155 732 2015-01-02 FALSE
5 2 218 632 2015-01-03 FALSE
6 2 632 422 2015-01-04 TRUE
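A tighter grouped version does the comparison once and uses coalesce() to turn the leading NA in each group into FALSE; a sketch, assuming dplyr is loaded:
library(dplyr)
df %>%
  group_by(worker.id) %>%
  mutate(same = coalesce(start.station == lag(end.station), FALSE)) %>%
  ungroup()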
How can I load an input file like this into an R dataframe?
[S1] [E1] | [S2] [E2] | [LEN 1] [LEN 2] | [% IDY] | [TAGS]
=====================================================================================
959335 959806 | 169 640 | 472 472 | 80.84 | LmjF.34 ULAVAL|LtaPseq521
322990 324081 | 1436 342 | 1092 1095 | 83.86 | LmjF.12 ULAVAL|LtaPseq501
324083 324327 | 245 1 | 245 245 | 91.84 | LmjF.12 ULAVAL|LtaPseq501
1097873 1098325 | 892 437 | 453 456 | 76.75 | LmjF.32 ULAVAL|LtaPseq491
1098566 1098772 | 207 4 | 207 204 | 75.60 | LmjF.32 ULAVAL|LtaPseq491
This looks like fixed-width formatted data, and it can be read in easily with read.fwf - the tricky bit might be getting rid of the | marks. What do you want to do with the [TAGS] section?
Here I work out the widths of each field, add some fields (length 3) to skip over the | markers, read it in, then use negative column subsetting to drop the separator columns:
> widths=c(8,9,3,9,9,3,9,9,3,9,3,100)
> read.fwf("data.txt",widths=widths,skip=2)[,-c(3,6,9,11)]
V1 V2 V4 V5 V7 V8 V10 V12
1 959335 959806 169 640 472 472 80.84 LmjF.34 ULAVAL|LtaPseq521
2 322990 324081 1436 342 1092 1095 83.86 LmjF.12 ULAVAL|LtaPseq501
3 324083 324327 245 1 245 245 91.84 LmjF.12 ULAVAL|LtaPseq501
4 1097873 1098325 892 437 453 456 76.75 LmjF.32 ULAVAL|LtaPseq491
5 1098566 1098772 207 4 207 204 75.60 LmjF.32 ULAVAL|LtaPseq491
You might want to split the tags into two columns - just work out the width of each part and add field widths to the widths vector. An exercise for the reader.
Note this only works if the file is spaced out with space characters and NOT tab characters...
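If you prefer the tidyverse, readr::read_fwf() does the same job and takes the widths up front; a sketch using the same assumed widths, with the separator columns dropped afterwards as above:
library(readr)
widths <- c(8, 9, 3, 9, 9, 3, 9, 9, 3, 9, 3, 100)
dat <- read_fwf("data.txt", fwf_widths(widths), skip = 2)
dat <- dat[, -c(3, 6, 9, 11)]  # drop the "|" separator columns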
Read the file using readLines or scan:
test <- ' [S1] [E1] | [S2] [E2] | [LEN 1] [LEN 2] | [% IDY] | [TAGS]
=====================================================================================
959335 959806 | 169 640 | 472 472 | 80.84 | LmjF.34 ULAVAL|LtaPseq521
322990 324081 | 1436 342 | 1092 1095 | 83.86 | LmjF.12 ULAVAL|LtaPseq501
324083 324327 | 245 1 | 245 245 | 91.84 | LmjF.12 ULAVAL|LtaPseq501
1097873 1098325 | 892 437 | 453 456 | 76.75 | LmjF.32 ULAVAL|LtaPseq491
1098566 1098772 | 207 4 | 207 204 | 75.60 | LmjF.32 ULAVAL|LtaPseq491'
test2 <- gsub('|', ' ', test, fixed = TRUE)            # drop the | separators
test2 <- gsub('=', '', test2, fixed = TRUE)            # drop the ===== ruler
test3 <- gsub('[ \t]{2,8}', ';', test2, perl = TRUE)   # collapse runs of whitespace to ;
test3 <- gsub('\n', '', test3, perl = TRUE)            # remove the line breaks
test4 <- strsplit(test3, split = ';')
test5 <- data.frame(matrix(test4[[1]], ncol = 9, byrow = TRUE),
                    stringsAsFactors = FALSE)
colnames(test5)[1:8] <- test5[1, 2:9]                  # first parsed row holds the header labels
test5 <- test5[-1, ]
The output:
test5
[S1] [E1] [S2] [E2] [LEN 1] [LEN 2] [% IDY] [TAGS] X9
2 959335 959806 169 640 472 472 80.84 LmjF.34 ULAVAL LtaPseq521
3 322990 324081 1436 342 1092 1095 83.86 LmjF.12 ULAVAL LtaPseq501
4 324083 324327 245 1 245 245 91.84 LmjF.12 ULAVAL LtaPseq501
5 1097873 1098325 892 437 453 456 76.75 LmjF.32 ULAVAL LtaPseq491
6 1098566 1098772 207 4 207 204 75.60 LmjF.32 ULAVAL LtaPseq491
This is not simple to do in R.
If you are using UNIX, simple scripts (e.g. in awk) can convert even large files before importing them; I use this technique.
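If you are on such a system anyway, the preprocessing can even be driven from R through pipe(), so no intermediate file is needed; a sketch, where the sed program and the skip count are assumptions about the file shown above:
# Replace the '|' separators with spaces, drop the ===== ruler line,
# then read the cleaned stream directly (skipping the header line)
dat <- read.table(pipe("sed -e 's/|/ /g' -e '/^=/d' data.txt"),
                  skip = 1, stringsAsFactors = FALSE)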