Folium Heatmap RecursionError based on location coordinates

I am attempting to make a heat map with folium from my data, but I keep getting an error stating: RecursionError: maximum recursion depth exceeded, and I have no clue what that means. Any input? Below is the code for the heatmap.
# Creating a dataframe of the 'month', 'day_of_week' and 'location' columns
day_month = pd.DataFrame(df_criclean[['month', 'day_of_week', 'location']])
day_month.sort_values('month', ascending = False).head(10)
# Trying to use folium to make a heatmap of the data I have in 'day_month'
map = folium.Map(location=[42.3601, -71.0589], tiles='cartodbpositron', zoom_start=1)
HeatMap(day_month['location']).add_to(map)
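This error usually means HeatMap received something it cannot flatten into numeric [lat, lon] pairs, such as a column of coordinate strings. Below is a minimal sketch of one way to clean such a column before passing it in; the "(lat, lon)" string format and the sample values are assumptions for illustration, not taken from the original data:

```python
import pandas as pd

# Hypothetical example: 'location' stored as "(lat, lon)" strings,
# which HeatMap cannot flatten into numeric pairs.
day_month = pd.DataFrame({"location": ["(42.36, -71.06)", "(42.35, -71.07)", ""]})

# Parse each string into a [lat, lon] float pair, dropping blanks.
pairs = (
    day_month["location"]
    .str.strip("()")
    .str.split(",", expand=True)
    .apply(pd.to_numeric, errors="coerce")
    .dropna()
    .values.tolist()
)
print(pairs)  # [[42.36, -71.06], [42.35, -71.07]]
```

The resulting list of float pairs is the (n, 2) shape that HeatMap expects.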

I also have this "bug". I think it's related to variables being of type object instead of float64 or other base types (my dataset had a lot of blanks "" instead of valid GPS coordinates).
#> ./folium-test.py
--------------------
_id city daily_rain date ... wind_degrees wind_dir wind_speed wind_string
0 {'$oid': '5571aaa8e4b07aa3c1c4e231'} NaN 0 2015-06-05 15:56:00 ... NaN NaN 4.8 NaN
1 {'$oid': '5571aaa9e4b07aa3c1c4e232'} NaN 0 2015-06-05 15:56:00 ... NaN NaN 1.6 NaN
2 {'$oid': '5571aaa9e4b07aa3c1c4e233'} NaN 0 2015-06-05 15:56:00 ... NaN NaN 11.3 NaN
3 {'$oid': '5571aaa9e4b07aa3c1c4e234'} NaN 0 2015-06-05 15:56:00 ... NaN NaN 13 NaN
4 {'$oid': '5571aaa9e4b07aa3c1c4e235'} NaN 0 2015-06-05 15:56:00 ... NaN NaN 5 NaN
[5 rows x 18 columns]
(500, 18)
--------------------
0 8.48402346662349
1 8.15408706665039
2 9.81855869293213
3 9.83495235443115
4 9.92164134979248
5 9.26684331789147
6 9.59504252663464
7 9.07091170549393
8 8.99822786450386
9 8.9606299996376
10 8.93120750784874
11 9.02073669538368
12 8.912937
13
...
498 8.912937
499
Name: longitudine, Length: 500, dtype: object
0 44.3720632234234
1 43.9720632409982
2 44.1090045985169
3 44.1142735479457
4 44.145446252325
5 44.3377021234296
6 44.3773853328621
7 44.3798960485217
8 44.4051013957662
9 44.4094088501931
10 44.4160476104163
11 44.4527250625144
12 44.516321
13
...
498 44.516321
499
Name: latitudine, Length: 500, dtype: object
Traceback (most recent call last):
File "./folium-test.py", line 89, in <module>
folium.Marker([row["latitudine"], row["longitudine"]], popup=row["temperatura"]).add_to(marker_cluster)
File "/mnt/ros-data/venvs/meteo-viz/lib/python3.7/site-packages/folium/map.py", line 258, in __init__
self.location = _validate_coordinates(location)
File "/mnt/ros-data/venvs/meteo-viz/lib/python3.7/site-packages/folium/utilities.py", line 53, in _validate_coordinates
if _isnan(coordinates):
File "/mnt/ros-data/venvs/meteo-viz/lib/python3.7/site-packages/folium/utilities.py", line 79, in _isnan
return any(math.isnan(value) for value in _flatten(values))
File "/mnt/ros-data/venvs/meteo-viz/lib/python3.7/site-packages/folium/utilities.py", line 79, in <genexpr>
return any(math.isnan(value) for value in _flatten(values))
File "/mnt/ros-data/venvs/meteo-viz/lib/python3.7/site-packages/folium/utilities.py", line 71, in _flatten
for j in _flatten(i):
File "/mnt/ros-data/venvs/meteo-viz/lib/python3.7/site-packages/folium/utilities.py", line 71, in _flatten
for j in _flatten(i):
File "/mnt/ros-data/venvs/meteo-viz/lib/python3.7/site-packages/folium/utilities.py", line 71, in _flatten
for j in _flatten(i):
[Previous line repeated 982 more times]
File "/mnt/ros-data/venvs/meteo-viz/lib/python3.7/site-packages/folium/utilities.py", line 70, in _flatten
if _is_sized_iterable(i):
File "/mnt/ros-data/venvs/meteo-viz/lib/python3.7/site-packages/folium/utilities.py", line 32, in _is_sized_iterable
return isinstance(arg, abc.Sized) & isinstance(arg, abc.Iterable)
File "/mnt/ros-data/venvs/meteo-viz/lib/python3.7/abc.py", line 139, in __instancecheck__
return _abc_instancecheck(cls, instance)
RecursionError: maximum recursion depth exceeded in comparison
But if I add these lines to my code, folium works fine (even for large datasets):
df['longitudine'] = df['longitudine'].replace(r'\s+', np.nan, regex=True)
df['longitudine'] = df['longitudine'].replace(r'^$', np.nan, regex=True)
df['longitudine'] = df['longitudine'].fillna(-0.99999)
df['longitudine'] = pd.to_numeric(df['longitudine'])
df['latitudine'] = df['latitudine'].replace(r'\s+', np.nan, regex=True)
df['latitudine'] = df['latitudine'].replace(r'^$', np.nan, regex=True)
df['latitudine'] = df['latitudine'].fillna(-0.99999)
df['latitudine'] = pd.to_numeric(df['latitudine'])
This is the output:
#> ./folium-test.py
--------------------
_id city daily_rain date ... wind_degrees wind_dir wind_speed wind_string
0 {'$oid': '5571aaa8e4b07aa3c1c4e231'} NaN 0 2015-06-05 15:56:00 ... NaN NaN 4.8 NaN
1 {'$oid': '5571aaa9e4b07aa3c1c4e232'} NaN 0 2015-06-05 15:56:00 ... NaN NaN 1.6 NaN
2 {'$oid': '5571aaa9e4b07aa3c1c4e233'} NaN 0 2015-06-05 15:56:00 ... NaN NaN 11.3 NaN
3 {'$oid': '5571aaa9e4b07aa3c1c4e234'} NaN 0 2015-06-05 15:56:00 ... NaN NaN 13 NaN
4 {'$oid': '5571aaa9e4b07aa3c1c4e235'} NaN 0 2015-06-05 15:56:00 ... NaN NaN 5 NaN
[5 rows x 18 columns]
(500, 18)
--------------------
0 8.484023
1 8.154087
2 9.818559
3 9.834952
4 9.921641
5 9.266843
6 9.595043
7 9.070912
8 8.998228
9 8.960630
10 8.931208
11 9.020737
12 8.912937
13 -0.999990
...
498 8.912937
499 -0.999990
Name: longitudine, Length: 500, dtype: float64
0 44.372063
1 43.972063
2 44.109005
3 44.114274
4 44.145446
5 44.337702
6 44.377385
7 44.379896
8 44.405101
9 44.409409
10 44.416048
11 44.452725
12 44.516321
13 -0.999990
...
498 44.516321
499 -0.999990
Name: latitudine, Length: 500, dtype: float64
1 43.9720632409982 8.154087066650389 30.6
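The replace/fillna/to_numeric chain above can be collapsed into one pass per column: pd.to_numeric with errors="coerce" turns blanks and whitespace into NaN directly, so the two regex replaces become unnecessary. A sketch with made-up values:

```python
import pandas as pd

# Hypothetical column mixing valid coordinates with blanks/whitespace.
s = pd.Series(["8.484023", "", "  ", "8.912937"])

# errors="coerce" turns anything non-numeric (blanks included) into NaN
# in one pass, replacing the regex-replace/fillna chain.
cleaned = pd.to_numeric(s, errors="coerce").fillna(-0.99999)
print(cleaned.tolist())  # [8.484023, -0.99999, -0.99999, 8.912937]
```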

I got the same error until I converted my coordinates from str to float
df['lat'] = df['lat'].astype(float).fillna(0)
df['long'] = df['long'].astype(float).fillna(0)

If you just have a list of strings, then the easiest might be to use np.array with dtype=float:
tuple_lat_lon = list(zip(
    np.array(myplot.gpsLatitude.split(','), dtype=float),
    np.array(myplot.gpsLongitude.split(','), dtype=float)
))
Here, myplot is a TextField.
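As a sketch of that approach with hypothetical field contents (myplot and its TextFields are not reproduced here, so plain strings stand in for them):

```python
import numpy as np

# Hypothetical comma-separated coordinate strings, standing in for the
# gpsLatitude / gpsLongitude TextFields.
gps_latitude = "44.372, 43.972, 44.109"
gps_longitude = "8.484, 8.154, 9.819"

# np.array(..., dtype=float) converts every element, including those
# with stray whitespace, and raises on anything non-numeric.
tuple_lat_lon = list(zip(
    np.array(gps_latitude.split(','), dtype=float),
    np.array(gps_longitude.split(','), dtype=float),
))
```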

# These are the top 20 'coordinates' according to the data.
sns.set(font_scale=1.25)
f, ax = plt.subplots(figsize=(15,8))
sns.countplot(y='location', data=df_criclean, order=df_criclean.location.value_counts().iloc[:20].index)
# Here, I'm making a Dataframe of the locations and the count. What you see below
# is the top 5 locations.
# I want to use this for my folium map.
df1 = df_criclean.groupby(["lat", "long", "location"]).size().reset_index(name='count')
df1['location'] = df1['location'].str.replace(',', '')
# Sort the count from highest count with location to lowest.
print(df1.sort_values(by = 'count', ascending=False).head())
# The DataFrame not sorted.
print(df1.head())
# convert to (n, 2) nd-array format for heatmap
locationArr = df1[['lat', 'long']].to_numpy()  # .as_matrix() was removed in pandas 1.0
m = folium.Map(location=[42.32, -71.0589], zoom_start=12)
m.add_child(plugins.HeatMap(locationArr, radius=9))
m

I had this same problem and solved it by transforming my latitude and longitude values to floats:
import folium
import numpy as np
plot = folium.Map(location=[40, -95], zoom_start=4)
coords = np.random.rand(1000,2) * 100
for lat, lon in coords:
    folium.Circle(location=[float(lat), float(lon)]).add_to(plot)

I got the same problem. I had to convert it to a numpy array using .to_numpy() method to get it to work.
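For a DataFrame of coordinates, that conversion can look like this (a sketch with invented values; .to_numpy() returns the plain (n, 2) float array that HeatMap can consume):

```python
import pandas as pd

# Invented coordinate frame for illustration.
df1 = pd.DataFrame({"lat": [42.36, 42.35], "long": [-71.06, -71.07]})

# .to_numpy() (the modern replacement for .as_matrix()/.values)
# yields a plain float ndarray with one [lat, lon] row per point.
location_arr = df1[["lat", "long"]].to_numpy()
print(location_arr.shape)  # (2, 2)
```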

Related

How to obtain frequency table in R from one data set, using intervals and breaks form a different data set?

I have tried to get a frequency table for one dataset ("sim") using the intervals and classes from another dataset ("obs") (both of the same type). I've tried using the table() function in R, but it doesn't give me the frequency of the "sim" dataset using the "obs" intervals. There may be data that fall outside the range defined by "obs"; the idea is that those are omitted. Is there a simple way to get the frequency table for this case?
Here is a sample of my data (vector):
X obs sim
1 1 11.2 8.44
2 2 22.5 15.51
3 3 26.0 20.08
4 4 28.1 23.57
5 5 29.0 26.46
6 6 29.5 28.95
...etc...
I leave you the lines of code:
# Set working directory
setwd("C:/Users/...")
# Vector has 2 set of data, "obs" and "sim"
vector <- read.csv("vector.csv", fileEncoding = 'UTF-8-BOM')
# Divide the range of "obs" into intervals, using Sturges for number of classes:
factor_obs <- cut(vector$obs, breaks=nclass.Sturges(vector$obs), include.lowest = T)
# Get a frequency table using the table() function for "obs"
obs_out <- as.data.frame(table(factor_obs))
obs_out <- transform(obs_out, cumFreq = cumsum(Freq), relative = prop.table(Freq))
# Get a frequency table using the table() function for "sim", using cut from "obs"
sim_out <- as.data.frame(table(factor_obs, vector$sim > 0))
This is what I get from "obs" frequency table:
> obs_out
factor_obs Freq cumFreq relative
1 [11.1,25.6] 2 2 0.04166667
2 (25.6,40.1] 10 12 0.20833333
3 (40.1,54.5] 17 29 0.35416667
4 (54.5,69] 4 33 0.08333333
5 (69,83.4] 8 41 0.16666667
6 (83.4,97.9] 5 46 0.10416667
7 (97.9,112] 2 48 0.04166667
This is what I get from "sim" frequency table:
> sim_out
factor_obs Var2 Freq
1 [11.1,25.6] TRUE 2
2 (25.6,40.1] TRUE 10
3 (40.1,54.5] TRUE 17
4 (54.5,69] TRUE 4
5 (69,83.4] TRUE 8
6 (83.4,97.9] TRUE 5
7 (97.9,112] TRUE 2
Which is the same frequency from the table for "obs".
The idea is that the elements of "sim" in each interval defined by the classes of "obs" are counted, and that extreme values outside the ranges of "obs" are omitted.
It would be helpful if someone can guide me. Thanks a lot!!
You will need to define your own breakpoints since if you let cut do it, the values are not saved for you to use with the sim variable. First use dput(vector) to put the data in a simple form for R:
vector <- structure(list(X = 1:48, obs = c(11.2, 22.5, 26, 28.1, 29, 29.5,
30.8, 32, 33.5, 35, 35.5, 38.9, 41, 41, 41, 43, 43.51, 44, 46,
48.5, 50, 50, 50, 50, 50.8, 51.5, 51.5, 53, 54.4, 55, 57.5, 59.5,
66.9, 70.6, 74.2, 75, 77, 80.2, 81.5, 82, 83, 83.6, 85, 85.1,
93.8, 94, 106.7, 112.3), sim = c(8.44, 15.51, 20.08, 23.57, 26.46,
28.95, 31.16, 33.17, 35.02, 36.75, 38.37, 39.92, 41.39, 42.81,
44.19, 45.52, 46.82, 48.09, 49.34, 50.56, 51.78, 52.98, 54.18,
55.37, 56.55, 57.75, 58.94, 60.14, 61.36, 62.59, 63.83, 65.1,
66.4, 67.74, 69.11, 70.53, 72.01, 73.55, 75.18, 76.9, 78.75,
80.76, 82.98, 85.46, 88.35, 91.84, 96.41, 103.48)), class = "data.frame",
row.names = c(NA, -48L))
Now we need the number of categories and the breakpoints:
nbreaks <- nclass.Sturges(vector$obs)
minval <- min(vector$obs)
maxval <- max(vector$obs)
int <- round((maxval - minval) / nbreaks, 3) # round to 1 digit more than obs or sim
brks <- c(minval, minval + seq(nbreaks-1) * int, maxval)
The table for the obs data:
factor_obs <- cut(vector$obs, breaks=brks, include.lowest=TRUE)
obs_out <- transform(table(factor_obs), cumFreq = cumsum(Freq), relative = prop.table(Freq))
print(obs_out, digits=3)
# factor_obs Freq cumFreq relative
# 1 [11.2,25.6] 2 2 0.0417
# 2 (25.6,40.1] 10 12 0.2083
# 3 (40.1,54.5] 17 29 0.3542
# 4 (54.5,69] 4 33 0.0833
# 5 (69,83.4] 8 41 0.1667
# 6 (83.4,97.9] 5 46 0.1042
# 7 (97.9,112] 2 48 0.0417
Now the sim data:
factor_sim <- cut(vector$sim, breaks=brks, include.lowest=TRUE)
sim_out <- transform(table(factor_sim), cumFreq = cumsum(Freq), relative = prop.table(Freq))
print(sim_out, digits=3)
# factor_sim Freq cumFreq relative
# 1 [11.2,25.6] 3 3 0.0638
# 2 (25.6,40.1] 8 11 0.1702
# 3 (40.1,54.5] 11 22 0.2340
# 4 (54.5,69] 11 33 0.2340
# 5 (69,83.4] 9 42 0.1915
# 6 (83.4,97.9] 4 46 0.0851
# 7 (97.9,112] 1 47 0.0213
Notice there are only 47 cases shown instead of 48, since one value is less than the minimum.
addmargins(table(factor_obs, factor_sim, useNA="ifany"))
# factor_sim
# factor_obs [11.2,25.6] (25.6,40.1] (40.1,54.5] (54.5,69] (69,83.4] (83.4,97.9] (97.9,112] <NA> Sum
# [11.2,25.6] 1 0 0 0 0 0 0 1 2
# (25.6,40.1] 2 8 0 0 0 0 0 0 10
# (40.1,54.5] 0 0 11 6 0 0 0 0 17
# (54.5,69] 0 0 0 4 0 0 0 0 4
# (69,83.4] 0 0 0 1 7 0 0 0 8
# (83.4,97.9] 0 0 0 0 2 3 0 0 5
# (97.9,112] 0 0 0 0 0 1 1 0 2
# Sum 3 8 11 11 9 4 1 1 48

Loop function to grab mean value into new data frame return subset error

I have a large number of files, but let's start with files 1:N. There is a data frame inside each file. Each file contains the same header but differs in the number of rows, including some errors and missing numbers here and there. I want to get the mean value of a specific column (temp_c) from each file and make a new list/data frame from it. 'dat' below is an example of one file's contents. Please give me a hand.
head(dat)
X pres_hpa hght_m temp_c dwpt_c relh_pct mixr_g_kg drct_deg sknt_knot thta_k thte_k thtv_k
1 1 1008.0 16 24.0 19.1 74 14.00 230 7 296.5 337.1 299.0
2 2 1007.8 18 24.0 19.1 74 14.00 230 7 296.5 337.1 299.0
3 3 1000.0 88 23.8 18.8 74 13.85 229 8 296.9 337.2 299.4
4 4 975.7 304 24.4 17.8 67 13.34 225 10 299.7 338.9 302.1
5 5 970.0 355 24.6 17.6 65 13.23 224 11 300.4 339.4 302.7
6 6 909.5 914 21.7 14.7 64 11.67 210 19 302.9 337.8 305.1
date from_hr to_hr
1 1981-11-01 0 0
2 1981-11-01 0 0
3 1981-11-01 0 0
4 1981-11-01 0 0
5 1981-11-01 0 0
6 1981-11-01 0 0
y = 1978
N <- 3
for (i in 1:N) {
  yr = y + (as.numeric(i))
  yr = as.character(yr)
  p <- paste0("c:/Users/climatology/yr/", yr, ".csv")
  print(p)
  # read.csv
  dat <- read.csv(p, header = TRUE, stringsAsFactors = F)
  # filter
  dat_sub <- filter(dat, pres_hpa == 1000)
  dat_sub <- filter(dat_sub, hght_m > 0)
  dat_sub <- filter(dat_sub, temp_c > 0)
  # grab mean value into data frame
  # m = sapply(dat_sub$temp_c, function(i) mean(dat_sub$temp_c))
  data[i] = data.frame(index = i, year = as.numeric(yr), temp = mean(dat_sub$temp_c))
}
Error in data[i] <- data.frame(index = i, year = as.numeric(yr), temp = mean(dat_sub$temp_c)) :
object of type 'closure' is not subsettable
data is not subsettable because it is a function: you try to assign into data[i], but data is never defined in your code, so R resolves the name to the built-in data() function. Define it before the loop (e.g. data <- list() or an empty data.frame) and the error goes away.

How to use arguments specified in a user-created R function?

This seems like a basic question; however, I am not sure how to word it to search for the answer that I need.
This is the sample:
id2 sbp1 dbp1 age1 sbp2 dbp2 sex bmi1 bmi2 smoke drink exercise
1 1 134.5 89.5 40 146 84 2 21.74685 22.19658 1 0 1
2 4 128.5 89.5 48 125 70 1 24.61942 22.29476 1 0 0
3 5 105.5 64.5 42 121 80 2 22.15103 26.90204 1 0 0
4 8 116.5 79.5 39 107 72 2 21.08032 27.64403 0 0 1
5 9 106.5 73.5 26 132 81 2 21.26762 29.16131 0 0 0
6 10 120.5 81.5 34 130 85 1 24.91663 26.89427 1 1 0
I have this code here for a function I am making:
linreg.ols <- function(indat, dv, p1, p2, p3){
  data <- read.csv(file = indat, header = T)
  data[1:5, ]
  y <- data$dv
  x <- as.matrix(data.frame(x0 = rep(1, nrow(data)), x1 = data$p1, x2 = data$p2,
                            x3 = data$p3))
  inv <- solve(t(x) %*% x)
  xy <- t(x) %*% y
  betah <- inv %*% xy
  print("Value of beta hat")
  betah
}
And when I run my code with this line:
linreg.ols("bp.csv",sbp1,smoke,drink,exercise)
I get the following error:
Error in data.frame(x0 = rep(1, nrow(data)), x1 = data$p1, x2 = data$p2, :
arguments imply differing number of rows: 75, 0
I have a feeling that it's because of how I am extracting the p1, p2, and p3 columns on the line where I create the x variable.
EDIT: changed to y<-data$dv
EDIT: added part of the sample. Also, I tried:
x <- as.matrix(data.frame(1,data[,c("p1","p2","p3")]))
But that returned the error:
Error in `[.data.frame`(data, , c("p1", "p2", "p3")) : undefined columns selected

How to make a 3D Mesh in RGL with shade3d or wire3d using tmesh3d in R

I have some data which I have collected.
It consists of Vertices and then Triangles which I have made using a meshing software.
I am able to use R with
trimesh(triangles, vertices)
to make a nice mesh plot.
But can't figure out how to use RGL to make an interactive plot that I can view, and I can't work out how to colour the faces of the mesh based on a different value in the data frame.
here are the vertices in a data frame. x, y, z are the coordinates of the nodes/points (nn)
'data.frame': 23796 obs. of 7 variables:
$ nn : int 0 1 2 3 4 5 6 7 8 9 ...
$ x : num 39.5 70.8 49 83.5 -16 ...
$ y : num 28.2 -2.97 -25.67 -9.1 -39.75 ...
$ z: num 160 158 109 121 188 ...
$ uni: num 3.87 6.64 5.02 4.48 1.91 ...
$ bi : num 0.749 0.784 1.045 0.935 0.733 ...
nn x y z uni bi
0 39.527 28.202 160.219 3.86942 0.74871
1 70.804 -2.966 157.578 6.64361 0.78373
2 48.982 -25.674 109.022 5.02491 1.0451
3 83.514 -9.096 120.988 4.47977 0.9348
4 -16.04 -39.749 188.467 1.90873 0.73286
5 74.526 -3.096 174.347 8.4263 0.70594
6 54.93 -56.347 151.496 7.53334 2.17128
7 56.936 -20.131 186.177 7.16118 1.44875
8 -14.627 -47.1 162.185 2.13939 0.70887
9 38.207 -59.201 147.993 5.83457 4.32971
10 50.645 -32.04 110.418 5.3741 1.14543
The triangles for the vertices are
'data.frame': 47602 obs. of 7 variables:
$ X : int 3435 3161 18424 13600 1564 21598 21283 1171 51 9331 ...
$ Y : int 19658 17204 17467 19721 10099 19018 11341 2723 15729 5851 ...
$ Z : int 2764 9466 16955 2669 10091 21205 18399 20833 15865 9106 ...
X Y Z
3435 19658 2764
3161 17204 9466
18424 17467 16955
13600 19721 2669
1564 10099 10091
21598 19018 21205
21283 11341 18399
1171 2723 20833
51 15729 15865
9331 5851 9106
310 3513 9121
5651 11928 15468
8594 2295 6852
22725 22636 11114
I need to make this into a mesh, as I can in trimesh, but with RGL, and I need to colour the faces of the mesh based on a scale of uni, where < 0.5 is red, 0.5-1.5 is orange and > 1.5 is green.
It looks something like this in trimesh, but how do I do it in RGL for R, with colouring based on the value of uni in the first data table?
Here is an example, starting with two dataframes.
> library(rgl)
> vertices
x y z
1 1 -1 1
2 1 -1 -1
3 1 1 -1
4 1 1 1
5 -1 -1 1
6 -1 -1 -1
7 -1 1 -1
8 -1 1 1
> triangles
T1 T2 T3 T4 T5 T6 T7 T8 T9 T10 T11 T12
1 5 5 1 1 2 2 6 6 8 8 1 1
2 1 4 2 3 6 7 5 8 4 3 6 5
3 4 8 3 4 7 3 8 7 3 7 2 6
You need matrices to deal with tmesh3d. A row of 1's must be added to the table of vertices.
verts <- rbind(t(as.matrix(vertices)),1)
trgls <- as.matrix(triangles)
tmesh <- tmesh3d(verts, trgls)
Now you can plot the mesh:
wire3d(tmesh)
About colors, you have to associate one color to each triangle:
tmesh$material <- list(color=rainbow(ncol(trgls)))
wire3d(tmesh)
shade3d(tmesh)
UPDATE 2019-03-09
The newest version of rgl (0.100.18) allows different interpretation of the material colors.
You can assign a color to each face:
vertices <- as.matrix(vertices)
triangles <- as.matrix(triangles)
mesh1 <- tmesh3d(
  vertices = t(vertices),
  indices = triangles,
  homogeneous = FALSE,
  material = list(color = rainbow(ncol(triangles)))
)
shade3d(mesh1, meshColor = "faces")
or assign a color to each vertex:
mesh2 <- tmesh3d(
  vertices = t(vertices),
  indices = triangles,
  homogeneous = FALSE,
  material = list(color = rainbow(nrow(vertices)))
)
shade3d(mesh2, meshColor = "vertices")

mistake in multivePenal but not in frailtyPenal

The libraries used are: library(survival), library(splines), library(boot) and library(frailtypack). The function used is in the frailtypack package.
In my data I have two recurrent events (delta.stable and delta.unstable) and one terminal event (delta.censor). There are some time-varying explanatory variables, like the unemployment rate (u.rate), which is quarterly; that is why my dataset has been split by quarters.
Here is a link to the subsample used in the code below, in case it helps to see the mistake: https://www.dropbox.com/s/spfywobydr94bml/cr_05_males_services.rda
The problem is that it runs for a long time before the warning message appears.
Main variables of the Survival function are:
I have two recurrent events:
delta.unstable (unst.): takes the value one when the individual finds an unstable job.
delta.stable (stable): takes the value one when the individual finds a stable job.
And one terminal event:
delta.censor (d.censor): takes the value one when the individual has died, retired or emigrated.
row id contadorbis unst. stable d.censor .t0 .t
1 78 1 0 1 0 0 88
2 101 2 0 1 0 0 46
3 155 3 0 1 0 0 27
4 170 4 0 0 0 0 61
5 170 4 1 0 0 61 86
6 213 5 0 0 0 0 92
7 213 5 0 0 0 92 182
8 213 5 0 0 0 182 273
9 213 5 0 0 0 273 365
10 213 5 1 0 0 365 394
11 334 6 0 1 0 0 6
12 334 7 1 0 0 0 38
13 369 8 0 0 0 0 27
14 369 8 0 0 0 27 119
15 369 8 0 0 0 119 209
16 369 8 0 0 0 209 300
17 369 8 0 0 0 300 392
When I apply multivePenal I obtain the following message:
Error in aggregate.data.frame(as.data.frame(x), ...) :
arguments must have same length
Additionally: Warning messages:
In Surv(.t0, .t, delta.stable) : Stop time must be > start time, NA created
#### multivePenal function
fit.joint.05_malesP <- multivePenal(Surv(.t0, .t, delta.stable) ~ cluster(contadorbis) + terminal(as.factor(delta.censor)) + event2(delta.unstable), formula.terminalEvent = ~1, formula2 = ~as.factor(h.skill), data = cr_05_males_serv, Frailty = TRUE, recurrentAG = TRUE, cross.validation = F, n.knots = c(7,7,7), kappa = c(1,1,1), maxit = 1000, hazard = "Splines")
I have checked whether Surv(.t0, .t, delta.stable) contains NAs, and there are none.
In addition, when I apply the frailtyPenal function to the same data for both possible combinations, the function runs well and I get results. I have spent a week looking at this and I cannot find the key. I would appreciate some light on this problem.
#delta unstable+death
fit.joint.05_males<-frailtyPenal(Surv(.t0,.t,delta.unstable)~cluster(id)+u.rate+as.factor(h.skill)+as.factor(m.skill)+as.factor(non.manual)+as.factor(municipio)+as.factor(spanish.speakers)+ as.factor(no.spanish.speaker)+as.factor(Aged.16.19)+as.factor(Aged.20.24)+as.factor(Aged.25.29)+as.factor(Aged.30.34)+as.factor(Aged.35.39)+ as.factor(Aged.40.44)+as.factor(Aged.45.51)+as.factor(older61)+ as.factor(responsabilities)+
terminal(delta.censor),formula.terminalEvent=~u.rate+as.factor(h.skill)+as.factor(m.skill)+as.factor(municipio)+as.factor(spanish.speakers)+as.factor(no.spanish.speaker)+as.factor(Aged.16.19)+as.factor(Aged.20.24)+as.factor(Aged.25.29)+as.factor(Aged.30.34)+as.factor(Aged.35.39)+as.factor(Aged.40.44)+as.factor(Aged.45.51)+as.factor(older61)+ as.factor(responsabilities),data=cr_05_males_services,n.knots=12,kappa1=1000,kappa2=1000,maxit=1000, Frailty=TRUE,joint=TRUE, recurrentAG=TRUE)
###Be patient. The program is computing ...
###The program took 2259.42 seconds
#delta stable+death
fit.joint.05_males <- frailtyPenal(Surv(.t0,.t,delta.stable)~cluster(id)+u.rate+as.factor(h.skill)+as.factor(m.skill)+as.factor(non.manual)+as.factor(municipio)+as.factor(spanish.speakers)+as.factor(no.spanish.speaker)+as.factor(Aged.16.19)+as.factor(Aged.20.24)+as.factor(Aged.25.29)+as.factor(Aged.30.34)+as.factor(Aged.35.39)+as.factor(Aged.40.44)+as.factor(Aged.45.51)+as.factor(older61)+as.factor(responsabilities)+terminal(delta.censor),formula.terminalEvent=~u.rate+as.factor(h.skill)+as.factor(m.skill)+as.factor(municipio)+as.factor(spanish.speakers)+as.factor(no.spanish.speaker)+as.factor(Aged.16.19)+as.factor(Aged.20.24)+as.factor(Aged.25.29)+as.factor(Aged.30.34)+as.factor(Aged.35.39)+as.factor(Aged.40.44)+as.factor(Aged.45.51)+as.factor(older61)+as.factor(responsabilities),data=cr_05_males_services,n.knots=12,kappa1=1000,kappa2=1000,maxit=1000, Frailty=TRUE,joint=TRUE, recurrentAG=TRUE)
###The program took 3167.15 seconds
Because you provide neither information about the packages used nor the data necessary to run multivePenal or frailtyPenal, I can only help you with the Surv part (because I happened to have that package loaded).
The Surv warning message you provided (In Surv(.t0, .t, delta.stable) : Stop time must be > start time, NA created) suggests that something is strange with your variables .t0 (the time argument in Surv, referred to as 'start time' in the warning) and/or .t (the time2 argument, 'Stop time' in the warning). I check this possibility with a simple example:
# read the data you feed `Surv` with
df <- read.table(text = "row id contadorbis unst. stable d.censor .t0 .t
1 78 1 0 1 0 0 88
2 101 2 0 1 0 0 46
3 155 3 0 1 0 0 27
4 170 4 0 0 0 0 61
5 170 4 1 0 0 61 86
6 213 5 0 0 0 0 92
7 213 5 0 0 0 92 182
8 213 5 0 0 0 182 273
9 213 5 0 0 0 273 365
10 213 5 1 0 0 365 394
11 334 6 0 1 0 0 6
12 334 7 1 0 0 0 38
13 369 8 0 0 0 0 27
14 369 8 0 0 0 27 119
15 369 8 0 0 0 119 209
16 369 8 0 0 0 209 300
17 369 8 0 0 0 300 392", header = TRUE)
# create survival object
mysurv <- with(df, Surv(time = .t0, time2 = .t, event = stable))
mysurv
# create a new data set where one .t for some reason is less than .t0
# on row five .t0 is 61, so I set .t to 60
df2 <- df
df2$.t[df2$.t == 86] <- 60
# create survival object using new data which contains at least one Stop time that is less than Start time
mysurv2 <- with(df2, Surv(time = .t0, time2 = .t, event = stable))
# Warning message:
# In Surv(time = .t0, time2 = .t, event = stable) :
# Stop time must be > start time, NA created
# i.e. the same warning message as you got
# check the survival object
mysurv2
# as you can see, the fifth interval contains NA
# I would recommend you check .t0 and .t in your data set carefully
# one way to examine rows where Stop time (.t) is less than start time (.t0) is:
df2[which(df2$.t0 > df2$.t), ]
I am not familiar with multivePenal, but it seems that it does not accept a survival object which contains intervals with NA, whereas frailtyPenal might do so.
The authors of the package have told me that the function is not finished yet, so perhaps that is the reason that it is not working well.
I encountered the same error and arrived at this solution.
frailtyPenal() will not accept data.frames of different lengths. The data.frame used in Surv and the data.frame named in data= in frailtyPenal must be the same length. I used a Cox regression to identify the incomplete cases, reset the survival object to exclude the missing cases and, finally, ran frailtyPenal:
library(survival)
library(frailtypack)
data(readmission)
#Reproduce the error
#change the first start time to NA
readmission[1,3] <- NA
#create a survival object with one missing time
surv.obj1 <- with(readmission, Surv(t.start, t.stop, event))
#observe the error
frailtyPenal(surv.obj1 ~ cluster(id) + dukes,
             data = readmission,
             cross.validation = FALSE,
             n.knots = 10,
             kappa = 1,
             hazard = "Splines")
#repair by resetting the surv object to omit the missing value(s)
#identify NAs using a Cox model
cox.na <- coxph(surv.obj1 ~ dukes, data = readmission)
#remove the NA cases from the original set to create complete cases
readmission2 <- readmission[-cox.na$na.action,]
#reset the survival object using the complete cases
surv.obj2 <- with(readmission2, Surv(t.start, t.stop, event))
#run frailtyPenal using the complete cases dataset and the complete cases Surv object
frailtyPenal(surv.obj2 ~ cluster(id) + dukes,
             data = readmission2,
             cross.validation = FALSE,
             n.knots = 10,
             kappa = 1,
             hazard = "Splines")
