Fill data.frame with lists data - r

I have a data.frame like this, which I split by "bicho" into a list:
row.names bicho freq date ndvi date2 ndvi2 date3 ndvi3 ...
1 john 3 2009-04-08 5001 2009-04-23 4537 2009-05-09 3540
1.1 john 3 2009-04-08 5001 2009-04-23 4537 2009-05-09 3540
1.2 john 3 2009-04-08 5001 2009-04-23 4537 2009-05-09 3540
... ... . ... .. ... .. ... .. ...
2 steve 4 2010-04-29 6338 2010-05-09 5145 2010-05-25 3318
2.1 steve 4 2010-04-29 6338 2010-05-09 5145 2010-05-25 3318
2.2 steve 4 2010-04-29 6338 2010-05-09 5145 2010-05-25 3318
2.3 steve 4 2010-04-29 6338 2010-05-09 5145 2010-05-25 3318
List example:
$ john:'data.frame': 14 obs. of 152 variables:
..$ bicho : Factor w/ 26 levels "john","john",..: 1 1 1 1 1 1 1 1 1 1 ...
..$ freq : num [1:14] 14 14 14 14 14 14 14 14 14 14 ...
..$ date : Date[1:14], format: "2009-04-08" "2009-04-08" ...
..$ ndvi : num [1:14] 5001 5001 5001 5001 5001 ...
..$ date2 : chr [1:14] "2009-04-23" "2009-04-23" "2009-04-23" "2009-04-23" ...
..$ ndvi2 : num [1:14] 4538 4538 4538 4538 4538 ...
..$ date3 : chr [1:14] "2009-05-09" "2009-05-09" "2009-05-09" "2009-05-09" ...
..$ ndvi3 : num [1:14] 3540 3540 3540 3540 3540 ...
The list has 26 elements, each one looking like the one above.
My goal is to fill a data frame with all of the elements in it, but with new columns in which I want to do some calculations. The final data.frame should look like this:
row.names bicho freq time1 time2 ndvi
1 john 3 0 (date2-date1) 5001
1.1 john 3 (date2-date1) (date3-date2) 4537
1.2 john 3 (date3-date2) (date4-date3) 3540
... ... . ... ... ..
2 steve 4 0 (date2-date1) 6338
2.1 steve 4 (date2-date1) (date3-date2) 5145
2.2 steve 4 (date3-date2) (date4-date3) 3318
2.3 steve 4 (date4-date3) (date5-date4) 1239
My initial code looks like this. The problem is that I want to fill the final data.frame row by row (1:563) with list elements with variable length, but I can't find a way to do that.
for(b in list){
for(j in seq_along(df$bicho)){
for(i in seq_along(b$bicho)){
print(i)
if(i==1){
df$tempo1[j]=0
df$tempo2[j]=as.Date(b$date2[i])-b$date[i]
df$NDVI<-b[i,4]
df$tempo1[j+1]=df$tempo2[j]
}}}}
The objective of this code was to fill only the first row of each variable.

How about filling the data first?
## make toy lists
ll1 <- list(bicho=rep("a",3),frep=rep(3,3),x=1:3,y=7:9)
ll2 <- list(bicho=rep("b",5),frep=rep(4,5),x=1:5,y=6:10)
## fill
ret <- c()
your.list.of.list <- list(ll1,ll2)
for (e in your.list.of.list){
ret <- rbind(ret,do.call(cbind,e))
}
ret <- data.frame(ret,stringsAsFactors=FALSE)
Then you can add any additional columns with column-wise computations:
ret$z <- as.numeric(ret$y)-as.numeric(ret$x)
ret
> ret
bicho frep x y z
1 a 3 1 7 6
2 a 3 2 8 6
3 a 3 3 9 6
4 b 4 1 6 5
5 b 4 2 7 5
6 b 4 3 8 5
7 b 4 4 9 5
8 b 4 5 10 5
>
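The answer above stops at a generic column-wise computation. As a minimal sketch of how the time1/time2/ndvi layout from the question could be built, assuming each list element repeats one wide row per animal (so the first row is enough) and that only three date/ndvi pairs exist: the toy values and the names wide, date_cols, ndvi_cols and one_animal are illustrative, not from the original post.

```r
# Toy input: one wide row per animal (hypothetical values modelled on the question)
wide <- data.frame(
  bicho = c("john", "steve"),
  freq  = c(3, 3),
  date  = as.Date(c("2009-04-08", "2010-04-29")),
  ndvi  = c(5001, 6338),
  date2 = c("2009-04-23", "2010-05-09"),
  ndvi2 = c(4537, 5145),
  date3 = c("2009-05-09", "2010-05-25"),
  ndvi3 = c(3540, 3318),
  stringsAsFactors = FALSE
)

date_cols <- c("date", "date2", "date3")
ndvi_cols <- c("ndvi", "ndvi2", "ndvi3")

# Turn one wide row into one long row per observation
one_animal <- function(row) {
  # date is a Date but date2, date3 are character, so go through character first
  dates <- as.Date(unlist(lapply(row[date_cols], as.character)))
  gaps  <- as.numeric(diff(dates))          # days between consecutive dates
  data.frame(
    bicho = row$bicho,
    freq  = row$freq,
    time1 = c(0, gaps),                     # gap to the previous date, 0 for the first
    time2 = c(gaps, NA),                    # gap to the next date, unknown for the last
    ndvi  = unlist(row[ndvi_cols], use.names = FALSE)
  )
}

# split() mimics the list from the question; only the first row of each group is used
result <- do.call(rbind, lapply(split(wide, wide$bicho),
                                function(g) one_animal(g[1, ])))
result
```

With the full 152-column data, date_cols and ndvi_cols would have to list every date/ndvi pair, and the last time2 stays NA unless a further date is available.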

Related

How to split a CHR column, pivot, then combine tables?

So I have two tables:
1) LST data (24 months in total, already pivoted longer)
Buffer Date LST
<chr> <chr> <chr>
1 100 15/01/2010 6.091741043
2 100 16/02/2010 6.405879111
3 100 20/03/2010 8.925945159
4 100 24/04/2011 6.278147269
5 100 07/05/2010 6.133940129
6 100 08/06/2010 7.705591939
7 100 13/07/2011 4.066052173
8 100 11/08/2010 5.962087092
9 100 12/09/2010 5.761892842
10 100 17/10/2011 3.155769317
# ... with 1,550 more rows
2) Weather data (24 months in total)
Weather variable 15/01/2010 16/02/2010 20/03/2010 24/04/2011 07/05/2010
1 Temperature 12.0 15.0 16.0 23.00 21.50
2 Wind_speed 10.0 9.0 10.5 19.50 9.50
3 Wind_trend 1.0 1.0 1.0 0.00 1.00
4 Wind_direction 22.5 45.0 67.5 191.25 56.25
5 Humidity 40.0 44.5 22.0 24.50 7.00
6 Pressure 1024.0 1018.5 1025.0 1005.50 1015.50
7 Pressure_trend 1.0 1.0 1.0 1.00 1.00
3) If I pivot the weather data I get:
1 Temperature 15/01/2010 12
2 Temperature 16/02/2010 15
3 Temperature 20/03/2010 16
4 Temperature 24/04/2011 23
5 Temperature 07/05/2010 21.5
6 Temperature 08/06/2010 36.5
7 Temperature 13/07/2011 33
8 Temperature 11/08/2010 34.5
9 Temperature 12/09/2010 33
10 Temperature 17/10/2011 27
# ... with 158 more rows
(each weather variable listed in turn).
I need to combine 1) and 3) by date, appending the weather data columns to each row in 1). I think I need something like data_long <- merge(LST_data,weather_data,by="Date"), but I'm stuck.
The solution I found to this was to pivot the weather data (longer):
weather_long <- weather %>% pivot_longer(cols = 2:21, names_to = "Date", values_to = "Value")
which gives a tibble in the format:
# A tibble: 180 x 3
`Weather variable` Date Value
<chr> <chr> <dbl>
1 Temperature 28/10/2016 17
2 Temperature 31/12/2016 22
3 Temperature 16/01/2017 25
4 Temperature 05/03/2017 19
(as described above in the question).
Because the dates come from column names, this process leaves the 'Date' variable as character:
tibble [180 x 3] (S3: tbl_df/tbl/data.frame)
$ Weather variable: chr [1:180] "Temperature" "Temperature" "Temperature" "Temperature" ...
$ Date : chr [1:180] "28/10/2016" "31/12/2016" "16/01/2017" "05/03/2017" ...
$ Value : num [1:180] 17 22 25 19 20 22 11 10 3 9 ...
I then corrected this:
weather_long$Date <- as.Date(weather_long$Date, format = "%d/%m/%Y")
Next was to convert the weather data to the 'wide' format (in preparation for the next step):
weather_wide <- weather_long %>%
pivot_wider(names_from = "Weather variable", values_from = "Value")
Then join it to the LST data using the Date column as the key:
LST_Weather_dataset <- full_join(data_long, weather_wide, by = "Date")
This produced the desired result:
str(LST_Weather_dataset)
'data.frame': 380 obs. of 16 variables:
$ Buffer : int 100 200 300 400 500 600 700 800 900 1000 ...
$ Date : Date, format: "2016-10-28" "2016-10-28" "2016-10-28" "2016-10-28" ...
$ LST : num 0.918 0.951 0.791 0.748 0.687 ...
$ Month : num 10 10 10 10 10 10 10 10 10 10 ...
$ Year : num 2016 2016 2016 2016 2016 ...
$ JulianDay : num 302 302 302 302 302 302 302 302 302 302 ...
$ TimePeriod : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ Temperature : num 17 17 17 17 17 17 17 17 17 17 ...
$ Humidity : num 59 59 59 59 59 59 59 59 59 59 ...
$ Humidity_trend: num 1 1 1 1 1 1 1 1 1 1 ...
$ Wind_speed : num 19 19 19 19 19 19 19 19 19 19 ...
$ Wind_gust : num 0 0 0 0 0 0 0 0 0 0 ...
$ Wind_trend : num 2 2 2 2 2 2 2 2 2 2 ...
$ Wind_direction: num 338 338 338 338 338 ...
$ Pressure : num 1017 1017 1017 1017 1017 ...
$ Pressure_trend: num 2 2 2 2 2 2 2 2 2 2 ...
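For readers without tidyr and dplyr, the same reshape-then-join idea can be sketched in base R. This is a swapped-in technique, not the code used above: it transposes the weather table with t() instead of pivoting, and uses merge() instead of full_join(). The toy tables weather and lst and their values are made up for illustration.

```r
# Toy weather table: one row per variable, one column per date (hypothetical values)
weather <- data.frame(
  `Weather variable` = c("Temperature", "Humidity"),
  `15/01/2010` = c(12, 40),
  `16/02/2010` = c(15, 44.5),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Toy LST table in long format (hypothetical values)
lst <- data.frame(
  Buffer = c(100, 200, 100),
  Date   = c("15/01/2010", "15/01/2010", "16/02/2010"),
  LST    = c(6.09, 6.41, 8.93),
  stringsAsFactors = FALSE
)

# Transpose so that dates become rows and weather variables become columns
weather_wide <- as.data.frame(t(weather[, -1]))
names(weather_wide) <- as.character(weather[["Weather variable"]])

# The dates are now row names; parse them as day/month/year
weather_wide$Date <- as.Date(rownames(weather_wide), format = "%d/%m/%Y")
lst$Date <- as.Date(lst$Date, format = "%d/%m/%Y")

# all = TRUE keeps unmatched rows from both sides, like a full join
combined <- merge(lst, weather_wide, by = "Date", all = TRUE)
combined
```

Both Date columns must have the same class before joining; a character column on one side and a Date column on the other will not match.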

Classify factor output with factors with >60 levels and numeric inputs

I'm a newbie working on a classification problem to identify the causes of coral diseases. The dataset contains 45 variables.
The output variable is a factor with 21 levels (21 diseases), and the inputs are numeric and factor variables. Some of those factors have as many as 94 levels, such as the coral species. I can't collapse or split those factors because I want to be as precise as possible: one species may be less resistant than another. The numeric variables are things like the population in the area, fishing trips, etc.
First problem: I tried genetic algorithms, random forests, etc. to select the most important variables, but the run gets aborted, so the variables I eliminated were chosen only from correlograms. I want something stronger for deciding which variables to select.
Second problem: I've tried everything I know and searched Google extensively for something that runs and produces a classification, but nothing works. I tried SVM, random forests, CART, GBM, bagging and boosting, but none of them can handle this dataset.
This is the structure of the dataset
'data.frame': 136510 obs. of 45 variables:
$ SITE : Factor w/ 144 levels "TUT-1511","TUT-1513",..: 56 15 55 21 12 12 17 53 48 82 ...
$ Zone_Fine : Factor w/ 17 levels "Aunuu_E","Aunuu_W",..: 11 9 10 9 9 9 9 8 10 10 ...
$ TRANSECT : num 1 1 1 1 1 1 1 1 1 1 ...
$ SEGMENT : num 5 1 1 1 7 5 7 5 3 7 ...
$ Seg_WIDTH : num 1 1 1 1 1 1 1 1 1 1 ...
$ Seg_LENGTH : num 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 ...
$ SPECIES : Factor w/ 156 levels "AAAA","AABR",..: 94 126 94 102 9 126 135 94 93 94 ...
$ COLONYLENGTH : num 11 45 10 5 12 10 8 30 20 14 ...
$ OLDDEAD : num 5 2 5 0 0 5 10 0 5 10 ...
$ RECENTDEAD : num 0 10 0 0 0 0 0 0 0 0 ...
$ DZCLASS : Factor w/ 21 levels "Acute Tissue Loss - White Syndrome",..: 14 14 14 14 14 14 14 14 14 14 ...
$ EXTENT : num 52.9 52.9 52.9 52.9 52.9 ...
$ SEVERITY : num 3.11 3.11 3.11 3.11 3.11 ...
$ TAXONNAME.x : Factor w/ 155 levels "Acanthastrea hemprichii",..: 95 132 95 107 7 132 133 95 89 95 ...
$ PHYLUM : Factor w/ 2 levels "Cnidaria","Rhodophyta": 1 1 1 1 1 1 1 1 1 1 ...
$ CLASS : Factor w/ 3 levels "Anthozoa","Florideophyceae",..: 1 1 1 1 1 1 1 1 1 1 ...
$ FAMILY : Factor w/ 20 levels "Acroporidae",..: 1 18 1 2 1 18 18 1 8 1 ...
$ GENUS : Factor w/ 55 levels "Acanthastrea",..: 35 44 35 39 2 44 44 35 34 35 ...
$ RANK : Factor w/ 2 levels "Genus","Species": 1 1 1 1 2 1 2 1 1 1 ...
$ DATE_ : Date, format: "0015-03-27" ...
$ OBS_YEAR : num 2015 2015 2015 2015 2015 ...
$ REEF_ZONE : Factor w/ 2 levels "Backreef","Forereef": 2 2 2 2 2 2 2 2 2 2 ...
$ DEPTH_BIN : Factor w/ 4 levels "Bank","Deep",..: 2 2 4 3 2 2 3 4 3 3 ...
$ LBSP : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1 ...
$ Zone_Fine_ReefZone_Depth: Factor w/ 41 levels "Aunuu_E_Deep",..: 30 24 29 25 24 24 25 23 28 28 ...
$ Area_km2.x : num 50.9 49.1 101.8 49.1 49.1 ...
$ Fishing.trips.per.km2 : num 719 1148 1431 1148 1148 ...
$ Area_km2.y : num 50.9 49.1 50.9 49.1 49.1 ...
$ Pop.km2 : num 167.5 49.1 561.9 49.1 49.1 ...
$ SHED_NAME : Factor w/ 35 levels "Aasu","Afao - Asili",..: 2 9 15 17 17 1 1 35 28 26 ...
$ Shed_Cond : Factor w/ 4 levels "Extensive","Intermediate",..: 3 4 2 4 4 3 3 3 1 2 ...
$ Shed_Area_Calc : num 30202 29422 458542 126361 32595 ...
$ Perc_Area : num 0.00128 0.00107 0.00993 0.00458 0.00118 ...
$ Cond_Scale : num 3 4 2 4 4 3 3 3 1 2 ...
$ Shoreline_m : num 23146 33046 45821 33046 33046 ...
$ Rank : num 5 9 3 9 9 9 9 6 3 3 ...
$ Comp.8 : num 0.826 0.814 0.838 0.814 0.814 ...
$ Ble : num 0.958 0.969 0.959 0.969 0.969 ...
$ DZ : num 0.647 0.837 0.732 0.837 0.837 ...
$ Herb : num 0.682 0.564 0.704 0.564 0.564 ...
$ Rec : num 0.375 0.477 0.467 0.477 0.477 ...
$ MA : num 0.965 0.975 0.907 0.975 0.975 ...
$ Dam : num 0.998 1 0.992 1 1 ...
$ TAXONNAME.y : Factor w/ 94 levels "Abudefduf sordidus",..: 94 94 94 94 94 94 94 94 94 94 ...
$ Dummy : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
I expected a classification of "DZCLASS".
Thanks, every recommendation is welcomed!

chart.Correlation with continuous and categorical variables

I want to see if there is correlation between my variables. This is the structure of the dataset
'data.frame': 189 obs. of 20 variables:
$ age : num 24 31 32 35 36 26 31 24 35 36 ...
$ diplM2 : Factor w/ 3 levels "0","1","2": 3 2 1 3 2 2 3 2 2 1 ...
$ TimeDelcat : Factor w/ 4 levels "0","1","2","3": 1 1 3 3 3 4 2 1 4 4 ...
$ SeasonDel : Factor w/ 4 levels "1","2","3","4": 1 2 4 3 4 3 4 3 2 3 ...
$ BMIM2 : num 23.4 25.7 17 26.6 24.6 21.6 21 22.3 20.8 20.7 ...
$ WgtB2 : int 3740 3615 3705 3485 3420 2775 3365 3770 3075 3000 ...
$ sex : Factor w/ 2 levels "1","2": 2 2 1 2 2 2 1 1 1 1 ...
$ smoke : Factor w/ 3 levels "0","1","2": 1 1 1 2 1 1 1 1 1 3 ...
$ nRBC : num 0.1621 0.0604 0.1935 0.0527 0.1118 ...
$ CD4T : num 0.1427 0.2143 0.1432 0.0686 0.0979 ...
$ CD8T : num 0.1574 0.1549 0.1243 0.0804 0.0782 ...
$ NK : num 0.02817 0 0.04368 0.00641 0.02398 ...
$ Bcell : num 0.1033 0.1124 0.1468 0.0551 0.0696 ...
$ Mono : num 0.0633 0.0641 0.0773 0.0531 0.0656 ...
$ Gran : num 0.428 0.442 0.329 0.716 0.6 ...
$ chip : Factor w/ 92 levels "200251580021",..: 12 24 23 2 27 22 6 22 17 22 ...
$ pos : Factor w/ 12 levels "R01C01","R01C02",..: 11 12 1 6 9 2 12 1 7 11 ...
$ trim1PM25ifdmv4: num 9.45 13.81 15.59 7.13 15.43 ...
$ trim2PM25ifdmv4: num 13.27 15.53 10.69 13.56 9.27 ...
$ trim3PM25ifdmv4: num 16.72 16.21 12.17 6.47 10.66 ...
As you can see, there are both continuous and categorical variables.
When I run chart.Correlation(variables, histogram=TRUE, method = c("pearson"))
I get this error:
Error in pairs.default(x, gap = 0, lower.panel = panel.smooth, upper.panel = panel.cor, :
non-numeric argument to 'pairs'
How can I fix this?
Thank you.
I believe you want correlations only between the numerical variables. The code below does this and outputs each pair of variables only once.
library(reshape2)
data <- data.frame(x1=rnorm(10),
x2=rnorm(10),
x3=rnorm(10),
x4=c("a","b","c","d","e","f","g","h","i","j"),
x5=c("ab","sp","sp","dd","hg","hj","qw","dh","ko","jk"))
data
x1 x2 x3 x4 x5
1 -1.2169793 0.5397598 0.4981513 a ab
2 -0.7032631 -2.1262837 -1.0377371 b sp
3 0.8766831 -0.2326975 -0.1219613 c sp
4 0.3405332 2.4766225 -1.1960618 d dd
5 0.1889945 0.3444534 1.9659062 e hg
6 0.8086956 0.4654644 -1.2526696 f hj
7 -0.6850181 -1.7657241 0.5156620 g qw
8 0.8518034 0.9484547 1.4784063 h dh
9 0.5191793 1.2246566 1.3867829 i ko
10 0.4568953 -0.6881464 0.3548839 j jk
#finding correlation for all numerical values
corr=cor(data[as.numeric(which(sapply(data,class)=="numeric"))])
#convert the correlation table to long format
res=melt(corr)
##keeping only one side of the correlations
res$type=apply(res,1,function(x)
paste(sort(c(as.character(x[1]),as.character(x[2]))),collapse="*"))
res=unique(res[,c("type","value")])
res
type value
x1*x1 1.00000000
x1*x2 0.44024939
x1*x3 0.04936654
x2*x2 1.00000000
x2*x3 0.08859169
x3*x3 1.00000000
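The error in the question comes from passing factor columns to a function that expects numeric input. A minimal sketch of the fix, assuming a data frame shaped like the one in the question: keep only the numeric columns and pass that subset on. The toy values are made up, and the chart.Correlation() call is guarded because PerformanceAnalytics may not be installed.

```r
set.seed(42)

# Toy data with numeric, integer and factor columns (hypothetical values)
variables <- data.frame(
  age   = rnorm(20, mean = 30, sd = 5),
  WgtB2 = as.integer(round(rnorm(20, mean = 3400, sd = 300))),
  sex   = factor(sample(c("1", "2"), 20, replace = TRUE)),
  smoke = factor(sample(c("0", "1", "2"), 20, replace = TRUE))
)

# is.numeric() is TRUE for both double and integer columns, FALSE for factors
numeric_vars <- variables[sapply(variables, is.numeric)]

# Plain correlation matrix of the numeric subset
cor_matrix <- cor(numeric_vars, method = "pearson")
cor_matrix

# The original plot, now on numeric columns only (skipped if the package is missing)
if (requireNamespace("PerformanceAnalytics", quietly = TRUE)) {
  PerformanceAnalytics::chart.Correlation(numeric_vars, histogram = TRUE,
                                          method = "pearson")
}
```

Testing the class against "numeric" alone would drop integer columns such as WgtB2, which is why is.numeric() is used here. Pearson correlation is not meaningful for the factor columns, so they need a different measure of association.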

Find mean from subset of one column based on ranking in the top 50 of another column

I have a data frame that has the following columns:
> str(wbr)
'data.frame': 214 obs. of 12 variables:
$ countrycode : Factor w/ 214 levels "ABW","ADO","AFG",..: 1 2 3 4 5 6 7 8 9 10 ...
$ countryname : Factor w/ 214 levels "Afghanistan",..: 10 5 1 6 2 202 8 9 4 7 ...
$ gdp_per_capita : num 19913 35628 415 2738 4091 ...
$ literacy_female : num 96.7 NA 17.6 59.1 95.7 ...
$ literacy_male : num 96.9 NA 45.4 82.5 98 ...
$ literacy_all : num 96.8 NA 31.7 70.6 96.8 ...
$ infant_mortality : num NA 2.2 70.2 101.6 13.3 ...
$ illiteracy_female: num 3.28 NA 82.39 40.85 4.31 ...
$ illiteracy_mele : num 3.06 NA 54.58 17.53 1.99 ...
$ illiteracy_male : num 3.06 NA 54.58 17.53 1.99 ...
$ illiteracy_all : num 3.18 NA 68.26 29.42 3.15 ...
I would like to find the mean of illiteracy_all from the top 50 countries with the highest GDP.
Note that the data frame has NA values, so to compute a mean I have to write:
mean(wbr$illiteracy_all, na.rm=TRUE)
For a reproducible example, let's take:
data.df <- data.frame(x=101:120, y=rep(c(1,2,3,NA), times=5))
So how could I average the y values for e.g. the top 5 values of x?
> data.df
x y
1 101 1
2 102 2
3 103 3
4 104 NA
5 105 1
6 106 2
7 107 3
8 108 NA
9 109 1
10 110 2
11 111 3
12 112 NA
13 113 1
14 114 2
15 115 3
16 116 NA
17 117 1
18 118 2
19 119 3
20 120 NA
Any of the following would work:
mean(data.df[rank(-data.df$x)<=5,"y"], na.rm=TRUE)
mean(data.df$y[rank(-data.df$x)<=5], na.rm=TRUE)
with(data.df, mean(y[rank(-x)<=5], na.rm=TRUE))
To unpack why this works, note first that rank gives ranks in a different order to what you might expect, 1 being the rank of the smallest number not the largest:
> rank(data.df$x)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
We can get round that by negating the input:
> rank(-data.df$x)
[1] 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
So now ranks 1 to 5 are the "top 5". If we want a vector of TRUE and FALSE to indicate the position of the top 5 we can use:
> rank(-data.df$x)<=5
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[14] FALSE FALSE TRUE TRUE TRUE TRUE TRUE
(In reality you might find you have some ties in your data set. This is only going to cause issues if the 50th position is tied. You might want to have a look at the ties.method argument for rank to see how you want to handle this.)
So let's grab the values of y in those positions:
> data.df[rank(-data.df$x)<=5,"y"]
[1] NA 1 2 3 NA
Or you could use:
> data.df$y[rank(-data.df$x)<=5]
[1] NA 1 2 3 NA
So now we know what to input into mean:
> mean(data.df[rank(-data.df$x)<=5,"y"], na.rm=TRUE)
[1] 2
Or:
> mean(data.df$y[rank(-data.df$x)<=5], na.rm=TRUE)
[1] 2
Or if you don't like repeating the name of the data frame, use with:
> with(data.df, mean(y[rank(-x)<=5], na.rm=TRUE))
[1] 2
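An alternative to rank() is to sort and take the first rows. This is a sketch of that variant on the same example data; top_n, top_rows and top_mean are illustrative names.

```r
data.df <- data.frame(x = 101:120, y = rep(c(1, 2, 3, NA), times = 5))

top_n <- 5

# Sort rows by x, largest first, and keep the first top_n rows
top_rows <- head(data.df[order(data.df$x, decreasing = TRUE), ], top_n)

# Average y over those rows, ignoring missing values
top_mean <- mean(top_rows$y, na.rm = TRUE)
top_mean

# For the data frame in the question, the same pattern would be (not run here):
# top50 <- head(wbr[order(wbr$gdp_per_capita, decreasing = TRUE), ], 50)
# mean(top50$illiteracy_all, na.rm = TRUE)
```

Unlike the rank() cutoff, head() always returns exactly top_n rows, so a tie at the cutoff does not change how many rows are kept. By default order() also puts NA values of the sort column last, so countries with a missing GDP do not land in the top rows.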

Carc data from rda file to numeric matrix

I am trying to run KDA (kernel discriminant analysis) on the carc data, but when I call X<-data.frame(scale(X)); R shows this error:
"Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric"
I tried as.numeric(as.matrix(carc)) and carc<-na.omit(carc), but neither helps.
# install.packages(c("klaR", "FSelector")) # run once if the packages are missing
library(ks);library(MASS);library(klaR);library(FSelector)
attach("carc.rda")
data<-load("carc.rda")
data
carc<-na.omit(carc)
head(carc)
class(carc) # check for its class
class(as.matrix(carc)) # change class, and
as.numeric(as.matrix(carc))
XX<-carc
X<-XX[,1:12];X.class<-XX[,13];
X<-data.frame(scale(X));
fit.pc<-princomp(X,scores=TRUE);
plot(fit.pc,type="line")
X.new<-fit.pc$scores[,1:5]; X.new<-data.frame(X.new);
cfs(X.class~.,cbind(X.new,X.class))
X.new<-fit.pc$scores[,c(1,4)]; X.new<-data.frame(X.new);
fit.kda1<-Hkda(x=X.new,x.group=X.class,pilot="samse",
bw="plugin",pre="sphere")
kda.fit1 <- kda(x=X.new, x.group=X.class, Hs=fit.kda1)
Can you help to resolve this problem and make this analysis?
Added: the car data set (Chambers, Cleveland, Kleiner & Tukey 1983)
> head(carc)
P M R78 R77 H R Tr W L T D G C
AMC_Concord 4099 22 3 2 2.5 27.5 11 2930 186 40 121 3.58 US
AMC_Pacer 4749 17 3 1 3.0 25.5 11 3350 173 40 258 2.53 US
AMC_Spirit 3799 22 . . 3.0 18.5 12 2640 168 35 121 3.08 US
Audi_5000 9690 17 5 2 3.0 27.0 15 2830 189 37 131 3.20 Europe
Audi_Fox 6295 23 3 3 2.5 28.0 11 2070 174 36 97 3.70 Europe
Here is a small dataset with characteristics similar to what you describe, to reproduce this error:
"Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric"
carc <- data.frame(type1=rep(c('1','2'), each=5),
type2=rep(c('5','6'), each=5),
x = rnorm(10,1,2)/10, y = rnorm(10),
stringsAsFactors = TRUE) # needed in R >= 4.0 to get the factor columns shown below
This should be similar to your data.frame
str(carc)
# 'data.frame': 10 obs. of 4 variables:
# $ type1: Factor w/ 2 levels "1","2": 1 1 1 1 1 2 2 2 2 2
# $ type2: Factor w/ 2 levels "5","6": 1 1 1 1 1 2 2 2 2 2
# $ x : num -0.1177 0.3443 0.1351 0.0443 0.4702 ...
# $ y : num -0.355 0.149 -0.208 -1.202 -1.495 ...
scale(carc)
# Similar error
# Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
Fix the factor columns using set() from data.table:
require(data.table)
DT <- data.table(carc)
cols_fix <- c("type1", "type2")
for (col in cols_fix) set(DT, j=col, value = as.numeric(as.character(DT[[col]])))
str(DT)
# Classes ‘data.table’ and 'data.frame': 10 obs. of 4 variables:
# $ type1: num 1 1 1 1 1 2 2 2 2 2
# $ type2: num 5 5 5 5 5 6 6 6 6 6
# $ x : num 0.0465 0.1712 0.1582 0.1684 0.1183 ...
# $ y : num 0.155 -0.977 -0.291 -0.766 -1.02 ...
# - attr(*, ".internal.selfref")=<externalptr>
The first column(s) of your data set may be factors. Taking the data from corrgram:
library(corrgram)
carc <- auto
str(carc)
# 'data.frame': 74 obs. of 14 variables:
# $ Model : Factor w/ 74 levels "AMC Concord ",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ Origin: Factor w/ 3 levels "A","E","J": 1 1 1 2 2 2 1 1 1 1 ...
# $ Price : int 4099 4749 3799 9690 6295 9735 4816 7827 5788 4453 ...
# $ MPG : int 22 17 22 17 23 25 20 15 18 26 ...
# $ Rep78 : num 3 3 NA 5 3 4 3 4 3 NA ...
# $ Rep77 : num 2 1 NA 2 3 4 3 4 4 NA ...
# $ Hroom : num 2.5 3 3 3 2.5 2.5 4.5 4 4 3 ...
# $ Rseat : num 27.5 25.5 18.5 27 28 26 29 31.5 30.5 24 ...
# $ Trunk : int 11 11 12 15 11 12 16 20 21 10 ...
# $ Weight: int 2930 3350 2640 2830 2070 2650 3250 4080 3670 2230 ...
# $ Length: int 186 173 168 189 174 177 196 222 218 170 ...
# $ Turn : int 40 40 35 37 36 34 40 43 43 34 ...
# $ Displa: int 121 258 121 131 97 121 196 350 231 304 ...
# $ Gratio: num 3.58 2.53 3.08 3.2 3.7 3.64 2.93 2.41 2.73 2.87 ...
So exclude them by trying this:
X<-XX[,3:14]
or this
X<-XX[,-(1:2)]
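Instead of hard-coding column positions, the numeric columns can be picked by type. The head(carc) output in the question also shows "." in R78 and R77, which suggests those columns were read as text. The sketch below assumes that "." marks a missing value; carc_toy, to_num and the values are hypothetical.

```r
# Toy data shaped like the question's carc: "." stands in for missing values (assumption)
carc_toy <- data.frame(
  P   = c(4099, 4749, 3799, 9690),
  R78 = c("3", "3", ".", "5"),
  C   = c("US", "US", "US", "Europe"),
  stringsAsFactors = TRUE
)

# Convert a factor/character column to numeric, treating "." as NA
to_num <- function(x) {
  x <- as.character(x)      # go through character so factor codes are not used
  x[x == "."] <- NA
  as.numeric(x)
}
carc_toy$R78 <- to_num(carc_toy$R78)

# Keep only numeric columns; the class label C is left out
X <- carc_toy[sapply(carc_toy, is.numeric)]
X.class <- carc_toy$C

# scale() now works because every remaining column is numeric
X_scaled <- scale(X)
X_scaled
```

Calling as.numeric() directly on a factor returns the internal level codes rather than the printed values, hence the detour through as.character(). Rows that still contain NA can then be dropped with na.omit(), as already attempted in the question.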
