After upgrading Rails 3.0 to 7.0 I am getting `retrieve_connection': No connection pool for 'ActiveRecord::Base' found

Traceback (most recent call last):
31: from script/rails:6:in `<main>'
30: from script/rails:6:in `require'
29: from /var/lib/gems/2.7.0/gems/railties-7.0.3/lib/rails/commands.rb:18:in `<top (required)>'
28: from /var/lib/gems/2.7.0/gems/railties-7.0.3/lib/rails/command.rb:48:in `invoke'
27: from /var/lib/gems/2.7.0/gems/railties-7.0.3/lib/rails/command/base.rb:87:in `perform'
26: from /var/lib/gems/2.7.0/gems/thor-1.2.1/lib/thor.rb:392:in `dispatch'
25: from /var/lib/gems/2.7.0/gems/thor-1.2.1/lib/thor/invocation.rb:127:in `invoke_command'
24: from /var/lib/gems/2.7.0/gems/thor-1.2.1/lib/thor/command.rb:27:in `run'
23: from /var/lib/gems/2.7.0/gems/railties-7.0.3/lib/rails/commands/server/server_command.rb:134:in `perform'
22: from /var/lib/gems/2.7.0/gems/railties-7.0.3/lib/rails/commands/server/server_command.rb:134:in `tap'
21: from /var/lib/gems/2.7.0/gems/railties-7.0.3/lib/rails/commands/server/server_command.rb:137:in `block in perform'
20: from /var/lib/gems/2.7.0/gems/railties-7.0.3/lib/rails/commands/server/server_command.rb:137:in `require'
19: from /home/sharique/Desktop/CodeSlash/config/application.rb:21:in `<top (required)>'
18: from /usr/lib/ruby/2.7.0/bundler.rb:174:in `require'
17: from /usr/lib/ruby/2.7.0/bundler/runtime.rb:58:in `require'
16: from /usr/lib/ruby/2.7.0/bundler/runtime.rb:58:in `each'
15: from /usr/lib/ruby/2.7.0/bundler/runtime.rb:69:in `block in require'
14: from /usr/lib/ruby/2.7.0/bundler/runtime.rb:69:in `each'
13: from /usr/lib/ruby/2.7.0/bundler/runtime.rb:74:in `block (2 levels) in require'
12: from /usr/lib/ruby/2.7.0/bundler/runtime.rb:74:in `require'
11: from /var/lib/gems/2.7.0/gems/acts_as_audited-2.0.0/lib/acts_as_audited.rb:36:in `<top (required)>'
10: from /var/lib/gems/2.7.0/gems/acts_as_audited-2.0.0/lib/acts_as_audited.rb:36:in `require'
9: from /var/lib/gems/2.7.0/gems/acts_as_audited-2.0.0/lib/acts_as_audited/audit.rb:12:in `<top (required)>'
8: from /var/lib/gems/2.7.0/gems/acts_as_audited-2.0.0/lib/acts_as_audited/audit.rb:25:in `<class:Audit>'
7: from /var/lib/gems/2.7.0/gems/activerecord-7.0.3/lib/active_record/querying.rb:22:in `order'
6: from /var/lib/gems/2.7.0/gems/activerecord-7.0.3/lib/active_record/relation/query_methods.rb:429:in `order'
5: from /var/lib/gems/2.7.0/gems/activerecord-7.0.3/lib/active_record/relation/query_methods.rb:434:in `order!'
4: from /var/lib/gems/2.7.0/gems/activerecord-7.0.3/lib/active_record/relation/query_methods.rb:1591:in `preprocess_order_args'
3: from /var/lib/gems/2.7.0/gems/activerecord-7.0.3/lib/active_record/relation/delegation.rb:93:in `connection'
2: from /var/lib/gems/2.7.0/gems/activerecord-7.0.3/lib/active_record/connection_handling.rb:280:in `connection'
1: from /var/lib/gems/2.7.0/gems/activerecord-7.0.3/lib/active_record/connection_handling.rb:313:in `retrieve_connection'
/var/lib/gems/2.7.0/gems/activerecord-7.0.3/lib/active_record/connection_adapters/abstract/connection_handler.rb:208:in `retrieve_connection': No connection pool for 'ActiveRecord::Base' found. (ActiveRecord::ConnectionNotEstablished)

Rails 3 -> 7 is not going to be a smooth upgrade; there's a LOT of work involved. I mean, script/rails doesn't even exist anymore.
Honestly, Rails 3 is so old that there's no "right way" to upgrade from 3 to 7. I think the easiest option is to create a blank Rails 3 project and a blank Rails 7 project, then use diff to look at the differences and apply those differences to your project. After that you'll still need to upgrade most of the gems in the project.
Just to put you in the ballpark: I've done these sorts of upgrades professionally, and the time required is around the 40-80 hour mark.
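A sketch of that diff-based approach. The rails new commands are illustrative (they assume both gem versions are installed), so they are left as comments; the stub directories below just stand in for the two generated skeletons so the diff step itself is runnable:

```shell
# Generate a skeleton app with each Rails version (illustrative; requires
# both gems installed):
#   rails _3.0.20_ new blank3
#   rails _7.0.3_ new blank7
# Stub directories standing in for the generated apps:
mkdir -p blank3/script blank7/bin
echo "require 'rails/commands'" > blank3/script/rails
echo "require 'rails/command'"  > blank7/bin/rails
# Compare the two skeletons, ignoring noisy directories, and save the
# delta to apply by hand to your own project (diff exits 1 on differences):
diff -ru --exclude=log --exclude=tmp blank3 blank7 > rails3_to_7.diff || true
wc -l rails3_to_7.diff
```

The point is not the exact file names but the workflow: the diff shows you which boilerplate files were added, removed, or rewritten between the two versions (e.g. script/rails becoming bin/rails), and you replay those changes on your app.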

Related

Unexpected plot when using geom_area to plot multiple stacked levels

I am using geom_area to make a plot that shows binned time series with multiple stacked levels (the bins are 15 minutes long each). The resulting plot seems to have some sort of glitch. I expect the areas for the different levels to be stacked, but instead there is a diagonal red line (corresponding to level 'g') that crosses the plot (see image). At t = 16:10:00 I would expect to see some blue area (corresponding to level 'v'). Instead there is just an empty triangle.
In addition to that issue, the time series contain a gap:
17: "2017-07-23 21:10:00" t 3611
18: "2017-07-24 01:25:00" t 6676
There are no events between these two times, so I would expect the area to be zero until t = 01:25:00. Instead, the plot shows a linear slope starting at (21:10:00, 3611) and ending at (01:25:00, 6676). I suppose this might be fixed if I add the missing intervals in the gap and set them to zero. But, I wonder if there is any easier way to do so.
I am using R version 3.4.1 (2017-06-30) and ggplot2 version 2.2.1.
The following example should reproduce the issues:
library(data.table)
library(ggplot2)
txt <- 'time requester count
1: "2017-07-23 17:40:00" t 6289
2: "2017-07-23 17:55:00" t 7161
3: "2017-07-23 18:10:00" t 7444
4: "2017-07-23 18:25:00" t 7121
5: "2017-07-23 18:40:00" t 6677
6: "2017-07-23 18:55:00" t 6604
7: "2017-07-23 19:10:00" t 7079
8: "2017-07-23 19:25:00" t 6856
9: "2017-07-23 19:40:00" t 6663
10: "2017-07-23 19:55:00" t 6829
11: "2017-07-23 20:10:00" t 6945
12: "2017-07-23 20:25:00" t 6876
13: "2017-07-23 20:25:00" g 5
14: "2017-07-23 20:40:00" t 7087
15: "2017-07-23 20:40:00" g 1
16: "2017-07-23 20:55:00" t 6752
17: "2017-07-23 21:10:00" t 3611
18: "2017-07-24 01:25:00" t 6676
19: "2017-07-24 01:40:00" t 7100
20: "2017-07-24 01:55:00" t 7192
21: "2017-07-24 02:10:00" t 7640
22: "2017-07-24 02:25:00" t 7543
23: "2017-07-24 02:40:00" t 7289
24: "2017-07-24 02:55:00" t 7170
25: "2017-07-24 03:10:00" t 7022
26: "2017-07-24 03:25:00" t 7524
27: "2017-07-24 03:40:00" t 7285
28: "2017-07-24 03:55:00" t 6834
29: "2017-07-24 04:10:00" t 6035
30: "2017-07-24 04:25:00" t 7055
31: "2017-07-24 04:40:00" t 6072
32: "2017-07-24 04:55:00" t 5737
33: "2017-07-24 05:10:00" t 5847
34: "2017-07-24 05:25:00" t 5838
35: "2017-07-24 05:40:00" t 5282
36: "2017-07-24 05:55:00" t 5467
37: "2017-07-24 06:10:00" t 5502
38: "2017-07-24 06:25:00" t 5328
39: "2017-07-24 06:40:00" t 4752
40: "2017-07-24 06:55:00" t 4720
41: "2017-07-24 07:10:00" t 3994
42: "2017-07-24 07:25:00" t 3926
43: "2017-07-24 07:40:00" t 3003
44: "2017-07-24 07:55:00" t 3183
45: "2017-07-24 08:10:00" t 3155
46: "2017-07-24 08:25:00" t 3642
47: "2017-07-24 08:40:00" t 4251
48: "2017-07-24 08:55:00" t 4064
49: "2017-07-24 09:10:00" t 4032
50: "2017-07-24 09:25:00" t 3722
51: "2017-07-24 09:40:00" t 3897
52: "2017-07-24 09:55:00" t 4351
53: "2017-07-24 10:10:00" t 4655
54: "2017-07-24 10:25:00" t 4676
55: "2017-07-24 10:40:00" t 4961
56: "2017-07-24 10:55:00" t 4669
57: "2017-07-24 11:10:00" t 4426
58: "2017-07-24 11:10:00" g 13
59: "2017-07-24 11:25:00" t 5387
60: "2017-07-24 11:40:00" t 5323
61: "2017-07-24 11:55:00" t 4818
62: "2017-07-24 12:10:00" t 4554
63: "2017-07-24 12:10:00" g 6
64: "2017-07-24 12:25:00" t 5000
65: "2017-07-24 12:40:00" t 4597
66: "2017-07-24 12:55:00" t 5196
67: "2017-07-24 12:55:00" g 2
68: "2017-07-24 13:10:00" t 4964
69: "2017-07-24 13:10:00" g 2
70: "2017-07-24 13:25:00" t 4922
71: "2017-07-24 13:25:00" g 2
72: "2017-07-24 13:40:00" t 4843
73: "2017-07-24 13:55:00" t 4803
74: "2017-07-24 13:55:00" g 50
75: "2017-07-24 14:10:00" t 4828
76: "2017-07-24 14:25:00" t 4750
77: "2017-07-24 14:25:00" g 1
78: "2017-07-24 14:40:00" t 4873
79: "2017-07-24 14:40:00" g 3
80: "2017-07-24 14:55:00" t 4679
81: "2017-07-24 15:10:00" t 5262
82: "2017-07-24 15:10:00" g 17
83: "2017-07-24 15:25:00" t 5396
84: "2017-07-24 15:25:00" g 59
85: "2017-07-24 15:40:00" t 5312
86: "2017-07-24 15:55:00" t 5171
87: "2017-07-24 16:10:00" t 5570
88: "2017-07-24 16:10:00" v 48
89: "2017-07-24 16:25:00" t 5606
90: "2017-07-24 16:40:00" t 5041
91: "2017-07-24 16:40:00" g 20
92: "2017-07-24 16:55:00" t 5292
93: "2017-07-24 16:55:00" g 12
94: "2017-07-24 17:10:00" t 5233
95: "2017-07-24 17:10:00" g 2
96: "2017-07-24 17:25:00" t 5355
97: "2017-07-24 17:25:00" g 24
98: "2017-07-24 17:40:00" t 316
99: "2017-07-24 17:40:00" g 9'
dt <- data.table(read.table(text=txt, header=T))
dt[, time := as.POSIXct(time, tz='UTC')]
pl <- ggplot(dt, aes(x = time, y = count)) +
geom_area(stat = 'identity', aes(fill = requester))
print(pl)
In your data you have one value per row. For the stacked area plot, however, you need a value for every requester type in every row, even if it is zero.
For that purpose, you need to reshape your data, creating 0 wherever no count is available.
This code, including the 'reshape' part, will create the stacked area graph:
library(data.table)
library(ggplot2)
library(reshape2)
# insert your data as above
dt <- data.table(read.table(text=txt, header=T))
dt[, time := as.POSIXct(time, tz='UTC')]
####### NEW: Reshaping ########
# reshape your data from long to wide format
data_wide <- dcast(dt, time ~ requester, value.var = "count")
data_wide[is.na(data_wide)] <- 0 # replace all NA with 0
# reshape your wide data, now including the 0s, back to long format
data_long <- melt(data_wide, id.vars = c("time"),
                  variable.name = "requester",
                  value.name = "count")
##############################
# produce the stacked area graph
pl <- ggplot(data_long, aes(x = time, y=count)) +
geom_area(stat = 'identity', aes(fill = requester))
print(pl)
Regarding the gap in your data, I assume you need to add rows for the missing times to your data frame and fill the corresponding count values with 0.
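A sketch of that gap-filling step, assuming the 15-minute bin width from the question: build the full grid of times and requester levels with CJ, right-join the data onto it, and set missing counts to 0 (the toy table below is just the two rows around the gap).

```r
library(data.table)
# Toy slice around the gap from the question
dt <- data.table(time = as.POSIXct(c("2017-07-23 21:10:00",
                                     "2017-07-24 01:25:00"), tz = "UTC"),
                 requester = "t",
                 count = c(3611, 6676))
# Full grid: every 15-minute bin crossed with every requester level
full <- CJ(time = seq(min(dt$time), max(dt$time), by = "15 min"),
           requester = unique(dt$requester))
filled <- dt[full, on = .(time, requester)]  # right join onto the grid
filled[is.na(count), count := 0]             # empty bins become 0
```

Plotting `filled` instead of the raw data should give a flat zero area across the gap rather than a linear slope.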

Origin and destination with R

I have a table listing the places where an Id stopped, sorted by time.
df <- structure(list(Location = c("enter_Skagen", "Nordjyllands-Vaerket",
"exit_Skagen", "enter_Skagen", "Nordjyllands-Vaerket", "exit_Skagen",
"enter_Skagen", "Nordjyllands-Vaerket", "exit_Skagen", "enter_Skagen",
"Nordjyllands-Vaerket", "exit_Skagen", "enter_Skagen", "Nordjyllands-Vaerket",
"exit_Skagen", "enter_Skagen", "Aarhus", "Fredericia", "Copenhagen",
"exit_Skagen"), Ship = c(8131180L, 8131180L, 8131180L, 8131180L,
8131180L, 8131180L, 8131180L, 8131180L, 8131180L, 8131180L, 8131180L,
8131180L, 8131180L, 8131180L, 8131180L, 8201674L, 8201674L, 8201674L,
8201674L, 8201674L)), .Names = c("Location", "Id"), class = "data.frame", row.names = c(61702L,
61698L, 61699L, 61703L, 61704L, 61705L, 61700L, 61707L, 61711L,
61697L, 61701L, 61710L, 61708L, 61709L, 61706L, 63055L, 63053L,
63045L, 63103L, 63159L))
I would like to have a matrix counting the number of moves between the different locations per Id. As a first step, I tried to transform the table into two columns, from and to.
I tried to split by Id and then to transform with the following lines:
spl <- split(df, df$Id)
move.spl <- lapply(spl, function(x) {
ret <- data.frame(from=head(df$Location, -1), to=tail(df$Location, -1),
#year=ceiling((head(x$year, -1)+tail(x$year, -1))/2),
#id=head(x$id, -1),
stringsAsFactors=FALSE)
})
moves <- rbindlist(move.spl)
It gives as output:
> moves
from to
1: enter_Skagen Nordjyllands-Vaerket
2: Nordjyllands-Vaerket exit_Skagen
3: exit_Skagen enter_Skagen
4: enter_Skagen Nordjyllands-Vaerket
5: Nordjyllands-Vaerket exit_Skagen
6: exit_Skagen enter_Skagen
7: enter_Skagen Nordjyllands-Vaerket
8: Nordjyllands-Vaerket exit_Skagen
9: exit_Skagen enter_Skagen
10: enter_Skagen Nordjyllands-Vaerket
11: Nordjyllands-Vaerket exit_Skagen
12: exit_Skagen enter_Skagen
13: enter_Skagen Nordjyllands-Vaerket
14: Nordjyllands-Vaerket exit_Skagen
15: exit_Skagen enter_Skagen
16: enter_Skagen Aarhus
17: Aarhus Fredericia
18: Fredericia Copenhagen
19: Copenhagen exit_Skagen
20: enter_Skagen Nordjyllands-Vaerket
21: Nordjyllands-Vaerket exit_Skagen
22: exit_Skagen enter_Skagen
23: enter_Skagen Nordjyllands-Vaerket
24: Nordjyllands-Vaerket exit_Skagen
25: exit_Skagen enter_Skagen
26: enter_Skagen Nordjyllands-Vaerket
27: Nordjyllands-Vaerket exit_Skagen
28: exit_Skagen enter_Skagen
29: enter_Skagen Nordjyllands-Vaerket
30: Nordjyllands-Vaerket exit_Skagen
31: exit_Skagen enter_Skagen
32: enter_Skagen Nordjyllands-Vaerket
33: Nordjyllands-Vaerket exit_Skagen
34: exit_Skagen enter_Skagen
35: enter_Skagen Aarhus
36: Aarhus Fredericia
37: Fredericia Copenhagen
38: Copenhagen exit_Skagen
from to
Row 15 should not be there, because the Id changes at that point.
After row 15 it does fine for the next Id, but then it repeats everything and goes completely bananas.
(Then I create the matrix of origin / destination with
a <- table(moves$from, moves$to)
a <- data.table(a)
colnames(a) <- c("from","to", "N")
# create matrix
matrix <- dcast(a, from ~to, value.var = "N")
but this is a final step)
The results of the moves are a bit odd, so I am not sure this is working. Although I split per Id, I guess it is taking the list as a whole rather than working per Id.
Is it possible to have a better result taking into account the Id?
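The odd output comes from the lapply body referencing df (the whole table) instead of x (the per-Id chunk), so every group yields all of the transitions. A minimal sketch of the fix on a toy table (the Location/Id values here are illustrative stand-ins for the question's data):

```r
library(data.table)
# Toy stand-in for the question's table
df <- data.frame(Location = c("A", "B", "C", "A", "D"),
                 Id = c(1, 1, 1, 2, 2))
spl <- split(df, df$Id)
move.spl <- lapply(spl, function(x) {
  # use x (this Id's rows), not df (all rows)
  data.frame(from = head(x$Location, -1),
             to   = tail(x$Location, -1),
             stringsAsFactors = FALSE)
})
moves <- rbindlist(move.spl, idcol = "Id")
```

With x in place of df, each Id contributes only its own consecutive pairs, and idcol keeps track of which Id each move belongs to.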

Identifying a cluster of low transit speeds in GPS tracking data

I'm working with a GPS tracking dataset, and I've been playing around with filtering the dataset based on speed and time of day. The species I am working with becomes inactive around dusk, during which it rests on the ocean's surface, but then resumes activity once night has fallen. For each animal in the dataset, I would like to remove all data points after it initially becomes inactive around dusk (21:30). But because each animal becomes inactive at different times, I cannot simply filter out all the data points occurring after 21:30.
My data looks like this...
AnimalID Latitude Longitude Speed Date
99B 50.86190 -129.0875 5.6 2015-05-14 21:26:00
99B 50.86170 -129.0875 0.6 2015-05-14 21:32:00
99B 50.86150 -129.0810 0.5 2015-05-14 21:33:00
99B 50.86140 -129.0800 0.3 2015-05-14 21:40:00
99C.......
Essentially, I want to find a cluster of GPS positions (say, a minimum of 5), occurring after 21:30:00, that all have speeds of <0.8. I then want to delete all points after this point (including the identified cluster).
Does anyone know a way of identifying clusters of points in R? Or is this type of filtering WAY too complex?
Using data.table, you can use a rolling forward/backwards max to find the max of the next five or previous five entries by animal ID. Then, filter out any that don't meet the criteria. For example:
library(data.table)
set.seed(40)
DT <- data.table(Speed = runif(1:1000), AnimalID = rep(c("A","B"), each = 500))
DT[ , FSpeed := Reduce(pmax,shift(Speed,0:4, type = "lead", fill = 1)), by = .(AnimalID)] #0 + 4 forward
DT[ , BSpeed := Reduce(pmax,shift(Speed,0:4, type = "lag", fill = 1)), by = .(AnimalID)] #0 + 4 backwards
DT[FSpeed < 0.5 | BSpeed < 0.5] #min speed
Speed AnimalID FSpeed BSpeed
1: 0.220509197 A 0.4926640 0.8897597
2: 0.225883211 A 0.4926640 0.8897597
3: 0.264809801 A 0.4926640 0.6648507
4: 0.184270587 A 0.4926640 0.6589303
5: 0.492664002 A 0.4926640 0.4926640
6: 0.472144689 A 0.4721447 0.4926640
7: 0.254635219 A 0.7409803 0.4926640
8: 0.281538568 A 0.7409803 0.4926640
9: 0.304875597 A 0.7409803 0.4926640
10: 0.059605991 A 0.7409803 0.4721447
11: 0.132069268 A 0.2569604 0.9224052
12: 0.256960449 A 0.2569604 0.9224052
13: 0.005059727 A 0.8543111 0.2569604
14: 0.191478376 A 0.8543111 0.2569604
15: 0.170969244 A 0.4398143 0.7927442
16: 0.059577719 A 0.4398143 0.7927442
17: 0.439814267 A 0.4398143 0.7927442
18: 0.307714603 A 0.9912536 0.4398143
19: 0.075750773 A 0.9912536 0.4398143
20: 0.100589403 A 0.9912536 0.4398143
21: 0.032957748 A 0.4068012 0.7019594
22: 0.080091554 A 0.4068012 0.7019594
23: 0.406801193 A 0.9761119 0.4068012
24: 0.057445020 A 0.9761119 0.4068012
25: 0.308382143 A 0.4516870 0.9435490
26: 0.451686996 A 0.4516870 0.9248595
27: 0.221964923 A 0.4356419 0.9248595
28: 0.435641917 A 0.5363373 0.4516870
29: 0.237658906 A 0.5363373 0.4516870
30: 0.324597512 A 0.9710011 0.4356419
31: 0.357198893 B 0.4869905 0.9226573
32: 0.486990475 B 0.4869905 0.9226573
33: 0.115922994 B 0.4051843 0.9226573
34: 0.010581766 B 0.9338841 0.4869905
35: 0.003976893 B 0.9338841 0.4869905
36: 0.405184342 B 0.9338841 0.4051843
37: 0.412468699 B 0.4942280 0.9113595
38: 0.402063509 B 0.4942280 0.9113595
39: 0.494228013 B 0.8254665 0.4942280
40: 0.123264949 B 0.8254665 0.4942280
41: 0.251132449 B 0.4960371 0.9475821
42: 0.496037128 B 0.8845043 0.4960371
43: 0.250853014 B 0.3561290 0.9858652
44: 0.356129033 B 0.3603769 0.8429552
45: 0.225943145 B 0.7028077 0.3561290
46: 0.360376907 B 0.7159759 0.3603769
47: 0.169606203 B 0.3438164 0.9745535
48: 0.343816363 B 0.4396962 0.9745535
49: 0.067265545 B 0.9641856 0.3438164
50: 0.439696243 B 0.9641856 0.4396962
51: 0.024403516 B 0.3730828 0.9902976
52: 0.373082846 B 0.4713596 0.9902976
53: 0.290466668 B 0.9689225 0.3730828
54: 0.471359568 B 0.9689225 0.4713596
55: 0.402111615 B 0.4902595 0.8045104
56: 0.490259530 B 0.8801029 0.4902595
57: 0.477884140 B 0.4904800 0.6696598
58: 0.490480001 B 0.8396014 0.4904800
Speed AnimalID FSpeed BSpeed
This shows all the clusters where either the following four or the previous four entries (plus the anchor cell) all have a max speed below our min speed (in this case 0.5).
In your code, just run DT <- as.data.table(myDF) where myDF is the name of the data.frame you are using.
For this analysis, we assume that GPS measurements are taken at constant intervals. I am also throwing out the first 4 and last 4 observations of each animal by setting fill = 1; you should set fill = to your max speed instead.
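To go one step further and actually drop everything from the first slow cluster onward, per animal, here is a sketch on the same simulated data (the 0.5 threshold is the toy stand-in for the question's 0.8):

```r
library(data.table)
set.seed(40)
DT <- data.table(Speed = runif(1000), AnimalID = rep(c("A", "B"), each = 500))
# FSpeed: max of this point and the next four, per animal
DT[, FSpeed := Reduce(pmax, shift(Speed, 0:4, type = "lead", fill = 1)),
   by = AnimalID]
# Keep only the rows before the first 5-point run entirely below 0.5
trimmed <- DT[, {
  first <- which(FSpeed < 0.5)[1]
  if (is.na(first)) .SD else head(.SD, first - 1)
}, by = AnimalID]
```

In the real data you would also restrict `which(...)` to rows with a timestamp after 21:30, so daytime slowdowns don't trigger the cut.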

rbindlist and nested data.table, different behavior with/without using get

I am loading some JSON data using jsonlite which is resulting in some nested data similar (in structure) to the toy data.table dt constructed below. I want to be able to use rbindlist to bind the nested data.tables together.
Setup:
> dt <- data.table(a=c("abc", "def", "ghi"), b=runif(3))
> dt[, c:=list(list(data.table(d=runif(4), e=runif(4))))]
> dt
a b c
1: abc 0.2623218 <data.table>
2: def 0.7092507 <data.table>
3: ghi 0.2795103 <data.table>
Using the NSE built into data.table, I can do:
> rbindlist(dt[, c])
d e
1: 0.8420476 0.26878325
2: 0.1704087 0.59654706
3: 0.6023655 0.42590380
4: 0.9528841 0.06121386
5: 0.8420476 0.26878325
6: 0.1704087 0.59654706
7: 0.6023655 0.42590380
8: 0.9528841 0.06121386
9: 0.8420476 0.26878325
10: 0.1704087 0.59654706
11: 0.6023655 0.42590380
12: 0.9528841 0.06121386
which is exactly what I expect/want. Furthermore, the original dt remains unmodified:
> dt
a b c
1: abc 0.2623218 <data.table>
2: def 0.7092507 <data.table>
3: ghi 0.2795103 <data.table>
However, when manipulating the data.table within a function I generally want to use get with string column names:
> rbindlist(dt[, get("c")])
V1 V2
1: 0.8420476 0.26878325
2: 0.1704087 0.59654706
3: 0.6023655 0.42590380
4: 0.9528841 0.06121386
5: 0.8420476 0.26878325
6: 0.1704087 0.59654706
7: 0.6023655 0.42590380
8: 0.9528841 0.06121386
9: 0.8420476 0.26878325
10: 0.1704087 0.59654706
11: 0.6023655 0.42590380
12: 0.9528841 0.06121386
Now the column names have been lost and replaced by the default "V1" and "V2" values. Is there a way to retain the names?
In the development version (v1.9.5) the problem is worse than simply lost names though. After executing the statement: rbindlist(dt[, get("c")]) the entire data.table becomes corrupt:
> dt
Error in FUN(X[[3L]], ...) :
Invalid column: it has dimensions. Can't format it. If it's the result of data.table(table()), use as.data.table(table()) instead.
To be clear, the lost names issue happens in both v1.9.4 (installed from CRAN) and v1.9.5 (installed from github), but the corrupt data.table issue seems to affect v1.9.5 only (as of today - July 8, 2015).
If I were able to stick with the NSE version of things everything runs smoothly. My issue is that sticking with the NSE version would involve writing multiple NSE functions calling each other which seems to get messy pretty fast.
Are there any (non-NSE-based) known work-arounds? Also, is this a known issue?
This must have been fixed in the 5 years since this question was asked; I am now getting the expected results.
> library(data.table)
data.table 1.13.3 IN DEVELOPMENT built 2020-11-17 18:11:47 UTC; jan using 4 threads (see ?getDTthreads). Latest news: r-datatable.com
> dt <- data.table(a=c("abc", "def", "ghi"), b=runif(3))
> dt[, c:=list(list(data.table(d=runif(4), e=runif(4))))]
> dt
a b c
1: abc 0.2416624 <data.table[4x2]>
2: def 0.0222938 <data.table[4x2]>
3: ghi 0.3510681 <data.table[4x2]>
> rbindlist(dt[, c])
d e
1: 0.5485731 0.32366420
2: 0.5457945 0.45173251
3: 0.6796699 0.03783026
4: 0.4442776 0.03121024
5: 0.5485731 0.32366420
6: 0.5457945 0.45173251
7: 0.6796699 0.03783026
8: 0.4442776 0.03121024
9: 0.5485731 0.32366420
10: 0.5457945 0.45173251
11: 0.6796699 0.03783026
12: 0.4442776 0.03121024
> rbindlist(dt[, get("c")])
d e
1: 0.5485731 0.32366420
2: 0.5457945 0.45173251
3: 0.6796699 0.03783026
4: 0.4442776 0.03121024
5: 0.5485731 0.32366420
6: 0.5457945 0.45173251
7: 0.6796699 0.03783026
8: 0.4442776 0.03121024
9: 0.5485731 0.32366420
10: 0.5457945 0.45173251
11: 0.6796699 0.03783026
12: 0.4442776 0.03121024
> dt
a b c
1: abc 0.2416624 <data.table[4x2]>
2: def 0.0222938 <data.table[4x2]>
3: ghi 0.3510681 <data.table[4x2]>
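For older versions where get() drops the names, one non-NSE workaround is plain [[ extraction with a string, which returns the list column untouched (a minimal sketch on the same toy data):

```r
library(data.table)
dt <- data.table(a = c("abc", "def", "ghi"), b = runif(3))
dt[, c := list(list(data.table(d = runif(4), e = runif(4))))]
col <- "c"                    # column name as a string, as inside a function
res <- rbindlist(dt[[col]])   # [[ ]] keeps the inner data.tables' names
```

Since [[ just pulls the column out of the data.table without building a query, it sidesteps both the lost-names issue and the NSE plumbing.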

interpolation of grouped data using data.table

This is a continuation of a question that I had originally posted at
http://r.789695.n4.nabble.com/subset-between-data-table-list-and-single-data-table-object-tp4673202.html . Matthew had suggested that I post my question here so I am doing that now.
This is my input below:
library(data.table)
library(pracma) # for the interp1 function
tempbigdata1 <- data.table(c(14.80, 14.81, 14.82), c(7900, 7920, 7930), c("02437100", "02437100", "02437100"))
tempbigdata2 <- data.table(c(9.98, 9.99, 10.00), c(816, 819, 821), c("02446500", "02446500", "02446500"))
tempbigdata3 <- data.table(c(75.65, 75.66, 75.67), c(23600, 23700, 23800), c("02467000", "02467000", "02467000"))
tempsbigdata <- rbind(tempbigdata1, tempbigdata2, tempbigdata3)
setnames(tempsbigdata,c("y", "x", "site_no"))
setkey(tempsbigdata, site_no)
tempsbigdata
y x site_no
1: 14.80 7900 02437100
2: 14.81 7920 02437100
3: 14.82 7930 02437100
4: 9.98 816 02446500
5: 9.99 819 02446500
6: 10.00 821 02446500
7: 75.65 23600 02467000
8: 75.66 23700 02467000
9: 75.67 23800 02467000
aimsmall <- data.table(c("02437100", "02446500", "02467000"), c(3882.65, 819.82, 23742.37), c(1830.0, 382.0, 10400.0))
setnames(aimsmall,c("site_no", "mean", "p50"))
setkey(aimsmall, site_no)
aimsmall
site_no mean p50
1: 02437100 3882.65 1830
2: 02446500 819.82 382
3: 02467000 23742.37 10400
I am using this code to generate the interpolated tempsbigdata$y using the aimsmall$mean values by the site_no:
meanpre <- tempsbigdata[, if(aimsmall$mean > min(tempsbigdata$x)){
  interp1(tempsbigdata$x, tempsbigdata$y,
          xi = aimsmall$mean, method = "linear")}, by = site_no]
This is the output from the function meanpre, but it is not correct.
meanpre
site_no V1
1: 02437100 12.07599
2: 02437100 9.99410
3: 02437100 19.56813
4: 02446500 12.07599
5: 02446500 9.99410
6: 02446500 19.56813
7: 02467000 12.07599
8: 02467000 9.99410
9: 02467000 19.56813
This is what I would like to get:
meanpre
site_no V1
1: 02446500 9.99
2: 02467000 75.66
Any suggestions? Thank you.
UPDATE 1:
Hugh, I used the approx function in the past and it is not accurate for my data; however, the interp1 function in pracma is accurate. The mean and p50 columns in aimsmall & the x values in tempsbigdata are discharge values. The y in tempsbigdata represent gage heights. I am using the interp1 function to determine the appropriate gage height or y value for the discharge values or mean (and p50).
Frank, thank you for your advice and suggested code. This is the output for your suggested code:
tempsbigdata[aimsmall][,if(mean[1] > min(x)){interp1(tempsbigdata$x,tempsbigdata$y, xi = aimsmall$mean, method ="linear")},by=site_no]
site_no V1
1: 02446500 12.07599
2: 02446500 9.99410
3: 02446500 75.66424
4: 02467000 12.07599
5: 02467000 9.99410
6: 02467000 75.66424
When I run the following code I get the result below:
interp1(tempsbigdata$x, tempsbigdata$y, xi = aimsmall$mean, method ="linear")
[1] 12.07599 9.99410 75.66424
Is there any way to get this in return? Thank you.
site_no V1
1: 02446500 9.99
2: 02467000 75.66
UPDATE 2
Frank, thank you; I have added the code to make it easier to get the data into R. pracma is an R package of numerical method routines that were ported from GNU Octave [similar to MATLAB(R)] to R. The interp1 function is a one-dimensional interpolation function.
Frank, that was perfect (your last comment about the R code for "do stuff"):
tempsbigdata[aimsmall][,if(mean[1] > min(x)){interp1(x, y, xi = mean[1], method ="linear")},by=site_no]
site_no V1
1: 02446500 9.99410
2: 02467000 75.66424