I received an error from heatmap.2, and I found a similar error here: R : knnImputation Giving Error, but it doesn't have an answer yet. I couldn't understand the problem. I am new to this world, so apologies in advance for any mistakes.
I have a dataframe df with 144 rows and 177 columns, which holds the monthly averages of the years 2005-2016, produced by the timeAverage function of the openair package.
Here is a small example from df:
date Year Month Adana-Catalan Adana-Dogankent Adana-Meteoroloji
2008/09/01 2008 9 NaN NaN NaN
2008/10/01 2008 10 NaN NaN 1.7948718
2008/11/01 2008 11 NaN NaN 2.0909091
2008/12/01 2008 12 1.2694064 12.2384106 0.7272727
2009/01/01 2009 1 2.3150358 12.7479339 10.3779762
2009/02/01 2009 2 2.8241107 18.4320175 2.4494949
2009/03/01 2009 3 2.0401606 8.4597523 1.6529412
2009/04/01 2009 4 1.8604651 4.8560000 1.1267606
2009/05/01 2009 5 2.1087719 1.8202247 NaN
2009/06/01 2009 6 4.0695103 2.1463415 1.1111111
2009/07/01 2009 7 5.4016393 8.1298905 NaN
2009/08/01 2009 8 0.1313869 16.9874411 NaN
2009/09/01 2009 9 NaN 5.3753943 NaN
2009/10/01 2009 10 1.6626506 8.8000000 1.8388889
2009/11/01 2009 11 1.4177632 NaN 3.9879154
2009/12/01 2009 12 0.9644128 NaN 5.0281457
2010/01/01 2010 1 0.2608696 4.0898876 3.1981424
2010/02/01 2010 2 0.7619048 NaN 4.3169811
First, I removed the non-numeric columns and replaced the NaNs:
df.monthly <- df[,-c(1:3)] #remove non-numeric columns
df.monthly.un <- unlist(df.monthly) #unlist the list
df.monthly.un[is.nan(df.monthly.un)] <- -999 #replace NaNs with -999
monthly.dim <- dim(df.monthly)
monthly.frame <- matrix(df.monthly.un, monthly.dim) #convert unlist to matrix
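As a side note, the unlist/matrix round-trip above can be written more compactly. This is a minimal base-R sketch, assuming df from the question; the as.data.frame call guards against tibble indexing quirks:

```r
# minimal sketch, assuming df from the question
df.base <- as.data.frame(df)                  # guard against tibble indexing quirks
monthly.frame <- as.matrix(df.base[, -(1:3)]) # drop date/Year/Month, coerce to numeric matrix
monthly.frame[is.nan(monthly.frame)] <- -999  # replace NaNs with -999
```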
Then I calculated the distance matrices and produced dendrograms. Finally, I used heatmap.2 to produce a heatmap with the dendrograms.
monthly.dist <- dist(monthly.frame)
monthly.hclust <- hclust(monthly.dist, method="complete")
monthly.dist2 <- dist(t(monthly.frame))
colClust <- as.dendrogram(hclust(monthly.dist2, method="complete"))
rowClust <- as.dendrogram(monthly.hclust)
colpalette <- colorRampPalette(c("red","blue","green"))(n=100)
heatmap.2(monthly.frame, scale="none",
col=colpalette, trace= "none", cexRow=0.6, cexCol=1,
cex.main=0.7, key=T, Rowv=rowClust, labRow=df[,1],
main=expression("2005-2016 SO"[2] * " (ug/m"^3*")"))
However, when I run the code, it gives the error:
Error: Column indexes must be at most 1 if positive, not 22, 23, 24, 25, 21, 18, 19, 20, 16, 17, 12, 10, 11, 15, 13, 14, 3, 9, 8, 4, 7, 5, 6, 2, 124, 125, 121, 122, 123, 133, 132, 131, 134, 135, 126, 129, 127, 128, 130, 136, 137, 143, 144, 141, 142, 138, 139, 140, 57, 58, 55, 56, 42, 47, 41, 40, 36, 38, 37, 39, 46, 43, 44, 45, 34, 35, 26, 27, 28, 29, 30, 31, 32, 33, 59, 54, 53, 48, 49, 50, 51, 112, 116, 117, 114, 115, 88, 89, 52, 60, 63, 70, 75, 73, 74, 79, 77, 76, 78, 66, 67, 62, 65, 71, 64, 61, 72, 97, 87, 85, 86, 90, 98, 91, 83, 84, 92, 94, 96, 93, 95, 68, 69, 82, 80, 81, 113, 110, 111, 109, 118, 119, 120, 101, 105, 103, 104, 99, 106, 100, 102, 107, 108
Any idea why this error occurs? Thanks in advance!
This link shows you how to do KNN another way:
https://www.youtube.com/watch?v=u8XvfhBdbMw
Also, I don't understand why knnImputation(data) wouldn't work, although I have since played around with the data frame and now it does work, even though I don't know why.
What I did was:
mydata <- read_excel("icecreamdata.xlsx") #Here I'm importing my data
newdata <- data.frame() #I've created a blank data frame
newdata <- data.frame(mydata) #I'm putting the data I want into this new data frame
anyNA(newdata) #Checking for missing data. TRUE = yes, data is missing. FALSE = no, data is not missing.
fixeddata <- knnImputation(newdata) #Imputing data to a new object
anyNA(fixeddata) #FALSE = there is now no missing data
Both work, but I'd also like to know from the experts why we got the error "Column indexes must be at most 1 if positive", etc.
The primary explanation for the error reported can be found here.
I ran into this problem today, and found that you should transform your tbl object into a data.frame object! It is an annoying incompatibility between packages.
# check your df's class; I suspect your df is actually a tbl object
class(df)
df_new <- as.data.frame(df)
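A minimal demonstration of the difference (assuming the tibble package is installed):

```r
library(tibble)
tb <- tibble(a = 1:3, b = 4:6)
class(tb)            # "tbl_df" "tbl" "data.frame"
tb_df <- as.data.frame(tb)
class(tb_df)         # "data.frame"
tb_df[, 1]           # drops to a plain vector, which base functions like heatmap.2 expect
```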
I am trying to remove certain rows from lexicon::hash_valence_shifters in the sentimentr package. Specifically, I want to keep only rows:
keep <- c(1, 2, 3, 6, 7, 13, 14, 16, 19, 24, 25, 26, 27, 28, 36, 38, 39, 41, 42, 43, 45, 46, 53, 54, 55, 56, 57, 59, 60, 65, 70, 71, 73, 74, 79, 84, 85, 87, 88, 89, 94, 95, 96, 97, 98, 99, 100, 102, 103, 104, 105, 106, 107, 114, 115, 119, 120, 123, 124, 125, 126, 127, 128, 129, 135, 136, 138)
I have tried the below approach:
vsm = lexicon::hash_valence_shifters[keep, ]
vsm[ , y := as.numeric(y)]
vsm = sentimentr::as_key(vsm, comparison = NULL, sentiment = FALSE)
sentimentr::is_key(vsm)
vsn = sentimentr::update_valence_shifter_table(vsm, drop = c(dropvalue$x), x= lexicon::hash_valence_shifters, sentiment = FALSE, comparison = TRUE )
However, when I calculate the sentiment using the updated valence shifter table "vsn", it gives a sentiment of 0.
Can someone please let me know how to just keep specific rows of the valence shifter table ?
Thanks!
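As a side note on the subsetting itself: hash_valence_shifters is a data.table, and data.tables subset rows with dt[i] (the trailing comma is optional). A generic sketch with illustrative names:

```r
library(data.table)
dt <- data.table(x = letters[1:5], y = 1:5)  # stand-in for the valence shifter table
rows_to_keep <- c(1, 3, 5)
sub <- dt[rows_to_keep]   # row subset; use copy(dt)[rows_to_keep] to leave the original untouched
```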
I am trying to get Rsolnp to constrain my parameters to binary integers or to decimals that are nearly the same (.999 is close enough to 1 for example).
I have three vectors of equal length (52), each of which will get multiplied by my binary parameter vector in my objective function.
library(Rsolnp)
a <- c(251, 179, 215, 251, 63, 45, 54, 63, 47, 34, 40, 47, 141, 101, 121, 141, 47, 34, 40, 47, 94, 67, 81, 94, 47, 34, 40, 47, 157, 108, 133, 157, 126, 85, 106, 126, 126, 85, 106, 126, 110, 74, 92, 110, 110, 74, 92, 110, 63, 40, 52, 63)
b <- c(179, 251, 215, 0, 45, 63, 54, 0, 34, 47, 40, 0, 101, 141, 121, 0, 34, 47, 40, 0, 67, 94, 81, 0, 34, 47, 40, 0, 108, 157, 133, 0, 85, 126, 106, 0, 85, 126, 106, 0, 74, 110, 92, 0, 74, 110, 92, 0, 40, 63, 52, 0)
c <- c(179, 179, 118, 179, 45, 45, 30, 45, 34, 34, 22, 34, 101, 101, 67, 101, 34, 34, 22, 34, 67, 67, 44, 67, 34, 34, 22, 34, 108, 108, 71, 108, 85, 85, 56, 85, 85, 85, 56, 85, 74, 74, 49, 74, 74, 74, 49, 74, 40, 40, 27, 40)
x is my parameter vector, and below is my objective function.
objective_function = function(x){
  -(1166 * sum(x[1:52] * a) / 2000) *
    (((sum(x[1:52] * b)) / 2100) + .05) *
    (((sum(x[1:52] * c)) / 1500) + 1.5)
}
I essentially want one parameter in each group of 4 equal to 1 and the rest 0. I'm not sure how to create the correct constraints for this, but I believe I need to combine these sum constraints with another type of constraint. Here are my constraints:
eqn1=function(x){
  z1=sum(x[1:4])
  z2=sum(x[5:8])
  z3=sum(x[9:12])
  z4=sum(x[13:16])
  z5=sum(x[17:20])
  z6=sum(x[21:24])
  z7=sum(x[25:28])
  z8=sum(x[29:32])
  z9=sum(x[33:36])
  z10=sum(x[37:40])
  z11=sum(x[41:44])
  z12=sum(x[45:48])
  z13=sum(x[49:52])
  return(c(z1,z2,z3,z4,z5,z6,z7,z8,z9,z10,z11,z12,z13))
}
And finally, here is my function call:
opti<-solnp(pars=rep(1,52), fun = objective_function, eqfun = eqn1, eqB = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), LB=rep(0,52))
Calling opti$pars returns my solution vector:
[1] 7.199319e-01 2.800680e-01 6.015388e-08 4.886578e-10 5.540961e-01 4.459036e-01 2.906853e-07 4.635970e-08 5.389325e-01
[10] 4.610672e-01 2.979195e-07 3.651954e-08 6.228346e-01 3.771652e-01 1.980380e-07 3.348488e-09 5.389318e-01 4.610679e-01
[19] 2.979195e-07 3.651954e-08 5.820231e-01 4.179766e-01 2.099869e-07 2.624076e-08 5.389317e-01 4.610680e-01 2.979195e-07
[28] 3.651954e-08 6.499878e-01 3.500120e-01 1.959133e-07 1.059012e-08 6.249098e-01 3.750900e-01 2.588037e-07 1.752927e-08
[37] 6.249106e-01 3.750892e-01 2.588037e-07 1.752927e-08 6.095743e-01 3.904254e-01 2.741968e-07 2.233806e-08 6.095743e-01
[46] 3.904254e-01 2.741968e-07 2.233806e-08 5.679608e-01 4.320385e-01 6.821224e-07 3.997882e-08
As one can see, the weight is getting split between multiple variables in each group of 4, instead of being forced onto just one with the rest being 0.
If this is not possible with this package could someone show me how to convert my objective function to work with other optimization packages? From what I have seen, they require the objective function to be converted to a vector of coefficients. Any help is appreciated. Thanks!
I tried with a few solvers. With MINLP solvers Couenne and Baron we can solve this directly. With Gurobi we need to decompose the objective into two quadratic parts. All these solvers give:
---- 119 VARIABLE x.L
i1 1.000, i5 1.000, i9 1.000, i14 1.000, i17 1.000, i21 1.000, i25 1.000, i29 1.000
i34 1.000, i38 1.000, i41 1.000, i46 1.000, i49 1.000
---- 119 VARIABLE z.L = -889.346 obj
Zeroes are not printed here.
I used GAMS (commercial) but if you want to use free tools you can use Pyomo(Python) + Couenne. I am not sure about MINLP solvers for R, but Gurobi can be used from R.
Note that the group constraint is simply:
groups(g).. sum(group(g,i),x(i)) =e= 1;
where g are the groups and group(g,i) is a 2d set with the mapping between groups and items.
For Gurobi you need to do something like (in pseudo code):
z1 = 1166 * sum(i,x(i)*a(i)) / 2000 (linear)
z2 = ((sum(i, x(i)*b(i))) / 2100) + .05 (linear)
z3 = ((sum(i, x(i)*c(i)))/1500) + 1.5 (linear)
z23 = z2*z3 (non-convex quadratic)
obj = -z1*z23 (non-convex quadratic)
and tell Gurobi to use the nonconvex MIQCP solver.
Sorry, no R code for this. But it may give you something to think about.
Within CPLEX you may try mathematical programming, as Paul wrote, but you may also use Constraint Programming.
In OPL (CPLEX modeling language)
using CP;
execute
{
cp.param.timelimit=5; // time limit 5 seconds
}
int n=52;
range r=1..n;
int a[r]=[251, 179, 215, 251, 63, 45, 54, 63, 47, 34, 40, 47, 141, 101, 121, 141, 47, 34, 40, 47, 94, 67,
81, 94, 47, 34, 40, 47, 157, 108, 133, 157, 126, 85, 106, 126, 126,
85, 106, 126, 110, 74, 92, 110, 110, 74, 92, 110, 63, 40, 52, 63];
int b[r]=[179, 251, 215, 0, 45, 63, 54, 0, 34, 47, 40, 0, 101, 141, 121, 0,
34, 47, 40, 0, 67, 94, 81, 0, 34, 47, 40, 0, 108, 157, 133, 0, 85, 126, 106, 0, 85,
126, 106, 0, 74, 110, 92, 0, 74, 110, 92, 0, 40, 63, 52, 0];
int c[r]=[179, 179, 118, 179, 45, 45, 30, 45, 34, 34, 22, 34, 101, 101, 67, 101,
34, 34, 22, 34, 67, 67, 44, 67, 34, 34, 22, 34, 108, 108, 71, 108, 85, 85, 56, 85,
85, 85, 56, 85, 74, 74, 49, 74, 74, 74, 49, 74, 40, 40, 27, 40];
// decision variable
dvar boolean x[r];
// objective
dexpr float obj=
-(1166 * sum(i in r) (x[i]*a[i]) / 2000) *
(((sum(i in r) (x[i]* b[i])) / 2100) + .05) *
(((sum(i in r) (x[i]*c[i]))/1500) + 1.5);
minimize obj;
subject to
{
// one and only one out of 4 is true
forall(i in 1..n div 4) count(all(j in 1+(i-1)*4..4+(i-1)*4)x[j],1)==1;
}
gives
// solution with objective -889.3463
x = [1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1
0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 0 1 0
0 0];
within 5 seconds
NB: You could call OPL CPLEX from R or rely on any other CPLEX API
And in Python you can write the same:
from docplex.cp.model import CpoModel
n=52
r=range(0,n)
a =[251, 179, 215, 251, 63, 45, 54, 63, 47, 34, 40, 47, 141, 101, 121, 141, 47, 34, 40, 47, 94, 67, 81, 94, 47, 34, 40, 47, 157, 108, 133, 157, 126, 85, 106, 126, 126, 85, 106, 126, 110, 74, 92, 110, 110, 74, 92, 110, 63, 40, 52, 63]
b =[179, 251, 215, 0, 45, 63, 54, 0, 34, 47, 40, 0, 101, 141, 121, 0, 34, 47, 40, 0, 67, 94, 81, 0, 34, 47, 40, 0, 108, 157, 133, 0, 85, 126, 106, 0, 85, 126, 106, 0, 74, 110, 92, 0, 74, 110, 92, 0, 40, 63, 52, 0]
c =[179, 179, 118, 179, 45, 45, 30, 45, 34, 34, 22, 34, 101, 101, 67, 101, 34, 34, 22, 34, 67, 67, 44, 67, 34, 34, 22, 34, 108, 108, 71, 108, 85, 85, 56, 85, 85, 85, 56, 85, 74, 74, 49, 74, 74, 74, 49, 74, 40, 40, 27, 40]
mdl = CpoModel(name='x')
#decision variables
mdl.x = {i: mdl.integer_var(0,n,name="x"+str(i+1)) for i in r}
mdl.minimize(-1166 * sum(mdl.x[i]*a[i] / 2000 for i in r) \
*((sum(mdl.x[i]* b[i] / 2100 for i in r) +0.05)) \
*((sum(mdl.x[i]*c[i]/1500 for i in r) +1.5)) )
for i in range(0, n // 4):
    mdl.add(1 == sum(mdl.x[j] for j in range(i*4, i*4+4)))
msol=mdl.solve(TimeLimit=5)
# Display solution
for i in r:
    if (msol[mdl.x[i]] == 1):
        print(i+1, " ")
and that gives
! Best objective : -889.3464
1
5
9
13
17
22
25
30
34
38
41
45
49
I set up an R notebook to solve (or try to solve) the problem as a mixed integer linear program, using CPLEX as the MIP solver and the Rcplex package as the interface to it. The results were unspectacular. After five minutes of grinding, CPLEX had a solution somewhat inferior to what Erwin got (-886.8748 v. his -889.346) with a gap over 146% (which, given Erwin's result, is mostly just the upper bound converging very slowly). I'm happy to share the notebook, which shows the linearization, but to use it you would need to have CPLEX installed.
Update: I have a second notebook, using the GA genetic algorithm package, that consistently gets close to Erwin's solution (and occasionally hits it) in under five seconds. The results are random, so rerunning may do better (or worse), and there is no proof of optimality.
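For reference, a minimal sketch of that GA approach, assuming a, b, c and objective_function from the question are in scope; the penalty weight of 1000 and the GA settings are arbitrary illustrative choices, not taken from the notebook:

```r
library(GA)
# encode x as 52 bits; penalize any group of 4 whose sum is not exactly 1
fitness <- function(x) {
  group_sums <- sapply(seq(1, 49, by = 4), function(i) sum(x[i:(i + 3)]))
  penalty <- 1000 * sum((group_sums - 1)^2)
  -objective_function(x) - penalty   # ga() maximizes, so negate the objective
}
res <- ga(type = "binary", fitness = fitness, nBits = 52,
          popSize = 200, maxiter = 500, run = 100, seed = 1)
sol <- res@solution[1, ]             # candidate binary solution
```

Because the group constraint is handled by a penalty rather than enforced exactly, it is worth checking that the returned solution actually has one 1 per group before trusting the objective value.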
I have a time series data set with 3 measurement variables and about 2000 samples. I want to classify samples into 1 of 4 categories using an RNN or 1D CNN model with Keras in R. My problem is that I am unable to successfully reshape the data with the k_reshape() function.
I am following along with Ch. 6 of Deep Learning with R by Chollet & Allaire, but their examples are sufficiently different from my data set that I'm now confused. I've tried to mimic the code from that chapter of the book to no avail. Here's a link to the source code for the chapter.
library(keras)
df <- data.frame()
for (i in c(1:20)) {
  time <- c(1:100)
  var1 <- runif(100)
  var2 <- runif(100)
  var3 <- runif(100)
  run <- data.frame(time, var1, var2, var3)
  run$sample <- i
  run$class <- sample(c(1:4), 1)
  df <- rbind(df, run)
}
head(df)
#   time      var1      var2         var3 sample class
# 1    1 0.4168828 0.1152874 0.0004415961      1     4
# 2    2 0.7872770 0.2869975 0.8809415097      1     4
# 3    3 0.7361959 0.5528836 0.7201276931      1     4
# 4    4 0.6991283 0.1019354 0.8873193581      1     4
# 5    5 0.8900918 0.6512922 0.3656302236      1     4
# 6    6 0.6262068 0.1773450 0.3722923032      1     4
k_reshape(df, shape(10, 100, 3))
# Error in py_call_impl(callable, dots$args, dots$keywords) :
# TypeError: Failed to convert object of type <class 'dict'> to Tensor. Contents: {'time': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 3
I'm very new to reshaping arrays, but I would like to have an array with the shape (samples, time, features). I would love suggestions on how to properly reshape this array, or guidance on how this data should be treated for a DL model if I'm off base on that front.
I found two solutions to my question. My confusion stemmed from the error message from k_reshape, which I did not understand how to interpret.
1. Use the array_reshape() function from the reticulate package.
2. Use the k_reshape() function from keras, but this time with the appropriate shape.
Here is the code I successfully executed:
# generate data frame
dat <- data.frame()
for (i in c(1:20)) {
  time <- c(1:100)
  var1 <- runif(100)
  var2 <- runif(100)
  var3 <- runif(100)
  run <- data.frame(time, var1, var2, var3)
  run$sample <- i
  run$class <- sample(c(1:4), 1)
  dat <- rbind(dat, run)   # note: rbind onto dat, not df
}
dat_m <- as.matrix(dat) # convert data frame to matrix
#   time      var1      var2         var3 sample class
# 1    1 0.4168828 0.1152874 0.0004415961      1     4
# 2    2 0.7872770 0.2869975 0.8809415097      1     4
# 3    3 0.7361959 0.5528836 0.7201276931      1     4
# 4    4 0.6991283 0.1019354 0.8873193581      1     4
# 5    5 0.8900918 0.6512922 0.3656302236      1     4
# 6    6 0.6262068 0.1773450 0.3722923032      1     4
# solution with reticulate's array_reshape function
dat_array <- reticulate::array_reshape(x = dat_m[,c(2:4)], dim = c(20, 100, 3))
dim(dat_array)
# [1] 20 100 3
class(dat_array)
# [1] "array"
# solution with keras's k_reshape
dat_array_2 <- keras::k_reshape(x = dat_m[,c(2:4)], shape = c(20, 100, 3))
dim(dat_array_2)
# [1] 20 100 3
class(dat_array_2)
# [1] "tensorflow.tensor" "tensorflow.python.framework.ops.Tensor"
# [3] "tensorflow.python.framework.ops._TensorLike" "python.builtin.object"
A few notes:
Conceptually, this reshaping makes more sense to me as a cast or spreading of the data in R parlance.
The output of array_reshape is of class array, whereas k_reshape() outputs a tensorflow tensor object. Both worked for me in creating deep learning networks, but I find the array class much more interpretable.
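Incidentally, the same (samples, time, features) array can be built in base R without keras or reticulate. Base array() fills column-major, so the trick is to fill (time, samples, features) first and then aperm() the axes. A sketch assuming dat_m from above (2000 rows, time varying fastest within each sample):

```r
# base-R sketch: build the (samples, time, features) array directly
m <- dat_m[, 2:4]                     # 2000 x 3 matrix of the three features
arr <- array(m, dim = c(100, 20, 3))  # column-major fill: time, then sample, then feature
arr <- aperm(arr, c(2, 1, 3))         # reorder axes to (samples, time, features)
dim(arr)                              # 20 100 3
```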
I have a set of data that I have collected which consists of a time series, where each y-value is found by taking the mean of 30 samples of grape cluster weight.
I want to simulate more data from this, with the same number of x and y values, so that I can carry out some Bayesian analysis to find the posterior distribution of the data.
I have the data, and I know that the growth follows a Gompertz curve with formula:
y = a*exp(-exp(-(x - x0)/b)), with a = 88.8, b = 11.7, and x0 = 15.1.
The data I have is
x = c(0, 28, 36, 42, 50, 58, 63, 71, 79, 85, 92, 99, 106, 112)
y = c(0, 15, 35, 55, 62, 74, 80, 96, 127, 120, 146, 160, 177, 165).
Any help would be appreciated thank you
*Will edit when more information is given.*
I am a little confused by your question. I have compiled what you have written into R. Please elaborate so that I can help you:
gompertz <- function(x, x0, a, b){
  a*exp(-exp(-(x-x0)/b))
}
y = c(0, 15, 35, 55, 62, 74, 80, 96, 127, 120, 146, 160, 177, 165) # means of 30 samples of grape cluster weights?
x = c(0, 28, 36, 42, 50, 58, 63, 71, 79, 85, 92, 99, 106, 112) # ?
#??
gompertz(x, x0 = 15.1, a = 88.8, b = 11.7)
gompertz(y, x0 = 15.1, a = 88.8, b = 11.7)
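If the goal is to simulate new data of the same shape, one common sketch is to draw y values as the Gompertz mean plus Gaussian noise. The noise standard deviation below is an illustrative guess, not from the question:

```r
gompertz <- function(x, x0, a, b) a * exp(-exp(-(x - x0) / b))
x <- c(0, 28, 36, 42, 50, 58, 63, 71, 79, 85, 92, 99, 106, 112)
set.seed(42)
sigma <- 10   # assumed residual sd, chosen for illustration
y_sim <- gompertz(x, x0 = 15.1, a = 88.8, b = 11.7) +
         rnorm(length(x), mean = 0, sd = sigma)
```

One caution: with a = 88.8 the curve saturates near 88.8, well below the observed maximum of 177, so the quoted parameters may be worth re-fitting before simulating from them.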
I would like to read.table from a csv file that has dots as thousand separators.
Resulting numbers should be numeric.
This is somewhat complicated as read.table allows to configure decimal signs and quote signs but not thousand separators.
The command gsub(input[10,10], pattern='[.]', replacement='') can delete the dots but converts everything to character. The conversion with as.numeric does work for single numbers:
> input[4,4]
[1] 1.742
97 Levels: 0 1.034 1.132 1.137 1.153 1.164 1.178 1.190 1.208 1.251 1.282 ... 950
> gsub(input[4,4],pattern='[.]',replacement='')
[1] "1742"
> as.numeric(gsub(input[4,4],pattern='[.]',replacement=''))
[1] 1742
but not for tables, as gsub(input,pattern='[.]',replacement='') yields
…
[4] "c(17, 21, 31, 38, 39, 48, 56, 52, 57, 63, 66, 68, 71, 76, 78, 79, 75, 77, 74, 73, 65, 64, 55, 50, 45, 43, 34, 36, 44, 42, 32, 5, 96, 10, 9, 6, 22, 53, 54, 14, 15, 16, 24, 18, 23, 33, 25, 28, 35, 47, 49, 51, 62, 70, 72, 69, 67, 58, 26, 94, 93, 97, 8, 41, 37, 46, 29, 40, 27, 30, 20, 19, 12, 13, 11, 7, 3, 4, 2, 95, 92, 90, 89, 87, 86, 83, 81, 80, 61, 60, 59, 91, 82, 88, 84, 85, 1, 1, 1, 1)" …
which yields a vector of NAs if converted to numeric. Furthermore, something else seems to be wrong with that command, since most of the values are larger than one thousand.
Is there anything else that could be useful, besides editing the original .csv files?
You can use the same answer as here; just change the comma (,) to the escaped period (\\.) in the gsub call to remove the periods.
Assuming input is of type character to begin with, this should work:
library(data.table)
dt <- data.table(input)   # wrap the character column in a data.table
dt[, input := as.numeric(gsub(input, pattern = '[.]', replacement = ''))]
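If you would rather stay in base R, a workable sketch is to read every column as character and convert afterwards. The file name and separator below are placeholders for your actual file:

```r
# read everything as character so read.table does not guess types
input <- read.table("data.csv", sep = ";", header = TRUE,
                    colClasses = "character", stringsAsFactors = FALSE)
# strip the thousand-separator dots column by column, then convert to numeric
input[] <- lapply(input, function(col) as.numeric(gsub("[.]", "", col)))
```

Columns that are genuinely non-numeric will become NA under this blanket conversion, so restrict the lapply to the numeric columns if the file mixes types.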