Error in plotting atomic vector - r

I want to plot a log-return graph. I imported my data file in CSV format. My code after that is below, together with the errors. For more information, all the variables do exist in the table.
t <- read.csv("~/Documents/FYP/Log return 1.csv")
View(t)
df<-ts(t)
plot.ts(df$Year,df$IND)
Error in df$Year : $ operator is invalid for atomic vectors
plot.ts(df[Time],df[CHN])
Error in NextMethod("[") : object 'Time' not found
plot.ts(df[Year],df[CHN])
Error in NextMethod("[") : object 'Year' not found
plot.ts(df[[Year]],df[[CHN]])
Error in NCOL(x) : object 'Year' not found

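Given a data frame input t, df <- ts(t) gives you a matrix rather than a data frame, so using $ is invalid. The "object not found" errors occur because an unquoted name such as Year is looked up as a variable, not as a column name. To access a column of a matrix, index it by its quoted name, for example df[, "Time"].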
As an example, let's use R's built-in dataset cars. Originally it is a data frame with two columns: speed and dist, while x <- ts(cars) gives a matrix:
class(x)
# [1] "mts" "ts" "matrix"
head(x)
# speed dist
#[1,] 4 2
#[2,] 4 10
#[3,] 7 4
#[4,] 7 22
#[5,] 8 16
#[6,] 9 10
The error you saw can be reproduced by
x$dist
# Error in x$dist : $ operator is invalid for atomic vectors
Instead, we want
x[, "dist"]
#Time Series:
#Start = 1
#End = 50
#Frequency = 1
# [1] 2 10 4 22 16 10 18 26 34 17 28 14 20 24 28 26 34 34 46
#[20] 26 36 60 80 20 26 54 32 40 32 40 50 42 56 76 84 36 46 68
#[39] 32 48 52 56 64 66 54 70 92 93 120 85
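To connect this back to plotting, here is a minimal sketch using the same built-in cars data. The names cols and dist_ts are only for illustration, and the last line assumes you want one column on the x axis, in which case plotting directly from the data frame is usually simpler than going through ts().

```r
# Built-in data frame, converted to a multivariate time series (a matrix)
x <- ts(cars)

# Column names are looked up as for any matrix
cols <- colnames(x)

# Extract one column by its quoted name; the result is a univariate time series
dist_ts <- x[, "dist"]

# Plot that series against its time index
plot.ts(dist_ts, ylab = "dist")

# If one column should be the x axis, the original data frame still supports $
plot(cars$speed, cars$dist, type = "l", xlab = "speed", ylab = "dist")
```

For the data in the question, the equivalent would presumably be something like plot(t$Year, t$IND, type = "l"), assuming those column names exist in the CSV.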

Related

For loop in multiple of fives

Thanks to @akrun, I got the code from my previous question about merging and creating tables with a loop to run (Merge and create tables using a loop).
However, because my laptop only has 16GB of RAM, I couldn't run the large dataset using the code. So, instead of merging 100 times, I decided to separate the process, and do it step by step using a for-loop.
I was going to create 20 lists of data using a for loop, but I couldn't find a way to make this happen.
To be specific, I would run the following 20 lines of code manually without using a for loop.
list1 <- mget(paste0("", 1:5))
list2 <- mget(paste0("", 6:10))
list3 <- mget(paste0("", 11:15))
list4 <- mget(paste0("", 16:20))
list5 <- mget(paste0("", 21:25))
...
list20 <- mget(paste0("", 96:100))
How would I write a for loop in this case?
I tried to find a way to do this (for example as below), but I am getting an error.
for(i in 1:20){
list[i] <- mget(paste0("",5*i-4:5*i))
}
Thanks in advance for all your help!
There are multiple ways to create the list. One is to use split with %/%:
fulllst <- lapply(split(as.character(1:100), (1:100-1) %/% 5 + 1), mget)
Or keep the code from the OP's post, but wrap each end of the range in () so that operator precedence does not change the meaning:
# create an empty list to store the output
lstout <- vector('list', 20)
# loop over the sequence and add the `()` for `(5* i- 4)` and similarly for (5*i)
for(i in 1:20)
lstout[[i]] <- mget(as.character((5 *i -4):(5*i)))
Printing both versions shows the difference:
> for(i in 1:20) print((5 *i -4):(5*i))
[1] 1 2 3 4 5
[1] 6 7 8 9 10
[1] 11 12 13 14 15
[1] 16 17 18 19 20
[1] 21 22 23 24 25
[1] 26 27 28 29 30
[1] 31 32 33 34 35
[1] 36 37 38 39 40
[1] 41 42 43 44 45
[1] 46 47 48 49 50
[1] 51 52 53 54 55
[1] 56 57 58 59 60
[1] 61 62 63 64 65
[1] 66 67 68 69 70
[1] 71 72 73 74 75
[1] 76 77 78 79 80
[1] 81 82 83 84 85
[1] 86 87 88 89 90
[1] 91 92 93 94 95
[1] 96 97 98 99 100
> for(i in 1:20) print(5 *i -4:5*i)
[1] 1 0
[1] 2 0
[1] 3 0
[1] 4 0
[1] 5 0
[1] 6 0
[1] 7 0
[1] 8 0
[1] 9 0
[1] 10 0
[1] 11 0
[1] 12 0
[1] 13 0
[1] 14 0
[1] 15 0
[1] 16 0
[1] 17 0
[1] 18 0
[1] 19 0
[1] 20 0
That is, if we don't use the (), the expression is evaluated as
i <- 1
(5 * i) - (4:5 * i)
[1] 1 0
# instead of
(5 * i -4):(5 * i)
[1] 1 2 3 4 5
The operator precedence is shown in ?Syntax:
:: ::: access variables in a namespace
$ @ component / slot extraction
[ [[ indexing
^ exponentiation (right to left)
- + unary minus and plus
: sequence operator
%any% |> special operators (including %% and %/%)
* / multiply, divide
+ - (binary) add, subtract
....
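Here is a self-contained sketch of both points, in case it helps to run it. It assumes the 100 tables are stored under the names "1" to "100"; the environment env and the tiny data frames are hypothetical stand-ins for the real tables, used only so the example runs on its own.

```r
# Hypothetical stand-ins for the 100 tables, stored under the names "1".."100"
env <- new.env()
for (j in 1:100) assign(as.character(j), data.frame(id = j), envir = env)

# Precedence: ":" binds tighter than "*" and binary "-"
i <- 2
wrong <- 5 * i - 4:5 * i      # evaluated as (5 * i) - ((4:5) * i)
right <- (5 * i - 4):(5 * i)  # the intended range 6, 7, 8, 9, 10

# Group the names in blocks of five, then fetch each block with mget()
chunk_id <- (seq_len(100) - 1) %/% 5 + 1
groups <- split(as.character(seq_len(100)), chunk_id)
lists <- lapply(groups, function(nms) mget(nms, envir = env))
```

Passing envir explicitly is a design choice here: it avoids depending on which environment mget() searches by default when it is called from inside lapply().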

Can not convert values from factor into only numeric

I want to convert this variable into numeric, as you can see:
> class(DATA$estimate)
[1] "factor"
> head(DATA$estimate)
[1] 0,253001909 0,006235543 0,005285019 0,009080499 6,580140903 0,603060006
57 Levels: 0,000263863 0,000634365 0,004405696 0,005285019 0,006235543 0,009080499 0,009700147 0,018568434 0,253001909 ... 7,790580873
>
But when I try to convert it, look what I get:
> DATA$estimate<-as.numeric(DATA$estimate)
> DATA$estimate
[1] 9 5 4 6 51 12 3 53 11 8 1 7 15 27 30 29 28 31 21 23 22 39 38 37 33 26 34 52 57 50 24 18 20 10 2 55 54 56 36 32 35 44 46
[44] 48 19 25 16 43 41 40 49 42 47 14 17 13 45
It's not numeric and I don't understand how the program gives these numbers!
data:
fac <- factor(c("0,253001909" ,"0,006235543" ,"0,005285019" ,"0,009080499" ,"6,580140903" ,"0,603060006"))
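as.numeric() on a factor returns the internal integer codes of its levels, not the values that are printed. So I convert to character first, then turn the "," into ".", then convert to numeric.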
as.numeric(sub(",",".",as.character(fac)))
In your case, that is:
DATA$estimate<-as.numeric(sub(",",".",as.character(DATA$estimate)))
You can also scan() your factor variable and specify , as the decimal separator:
fac <- factor(c("0,253001909" ,"0,006235543" ,"0,005285019" ,"0,009080499" ,
"6,580140903" ,"0,603060006"))
scan(text = as.character(fac), dec = ",")
#output
[1] 0.253001909 0.006235543 0.005285019 0.009080499 6.580140903
[6] 0.603060006
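A small sketch that shows where the unexpected integers come from, using three of the values from the question. The names codes and values are only for illustration.

```r
fac <- factor(c("0,253001909", "0,006235543", "6,580140903"))

# The levels are the sorted unique strings
levels(fac)

# as.numeric() on the factor gives the position of each value among the levels
codes <- as.numeric(fac)

# Going through character and swapping the decimal comma gives the real numbers
values <- as.numeric(sub(",", ".", as.character(fac), fixed = TRUE))
```

Here codes holds the level positions (2, 1, 3), while values holds the numbers you actually wanted.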

Calculate number of values in vector that exceed values in column of data.frame

I have a long list of numbers, e.g.
set.seed(123)
y<-round(runif(100, 0, 200))
And I would like to store in column y the number of values that exceed each value in column x of a data frame:
df <- data.frame(x=seq(0,200,20))
I can compute the numbers manually, like this:
length(which(y>=20)) #93 values exceed 20
length(which(y>=40)) #81 values exceed 40
etc. I know I can use a for-loop with all values of x, but is there a more elegant way?
I tried this:
df$y <- length(which(y>=df$x))
But this gives a warning and does not give me the desired output.
The data frame should look like this:
df
x y
1 0 100
2 20 93
3 40 81
4 60 70
5 80 61
6 100 47
7 120 40
8 140 29
9 160 19
10 180 8
11 200 0
You can compare each value of df$x against all values of y using sapply:
sapply(df$x, function(a) sum(y>a))
#[1] 99 93 81 70 61 47 40 29 18 6 0
#Looking at your output, maybe you want
sapply(df$x, function(a) sum(y>=a))
#[1] 100 93 81 70 61 47 40 29 19 8 0
Here's another approach using outer, which compares every element of one vector with every element of the other:
rowSums(outer(df$x,y, "<="))
#[1] 100 93 81 70 61 47 40 29 19 8 0
Yet one more (from alexis_laz's comment)
length(y) - findInterval(df$x, sort(y), left.open = TRUE)
# [1] 100 93 81 70 61 47 40 29 19 8 0
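As a sanity check, the three approaches can be run side by side and stored in the data frame. This is only a sketch on the question's own data; the names by_sapply, by_outer and by_interval are made up for the example. The original attempt fails because y >= df$x compares a length-100 vector with a length-11 vector element by element (recycling the shorter one, hence the warning) and length(which(...)) then returns a single number.

```r
set.seed(123)
y <- round(runif(100, 0, 200))
df <- data.frame(x = seq(0, 200, 20))

# One count per value of x: how many elements of y are >= that value
by_sapply   <- sapply(df$x, function(a) sum(y >= a))
by_outer    <- rowSums(outer(df$x, y, "<="))
by_interval <- length(y) - findInterval(df$x, sort(y), left.open = TRUE)

# Store the result as a new column
df$y <- by_sapply
```

For 100 values any of these is fine; the findInterval version avoids building the full comparison matrix, which may matter for much longer vectors.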

Subscript out of bounds-R error

I am using the createFolds function in R to create folds, and it returns a result successfully. But when I use a loop to perform some calculation on each fold, I get the error below.
Code is:
set.seed(1000)
k <- 10
folds <- createFolds(train_data,k=k,list = TRUE, returnTrain = FALSE)
str(folds)
This is giving output as:
List of 10
$ Fold01: int [1:18687] 1 8 10 21 22 25 26 29 34 35 ...
$ Fold02: int [1:18685] 5 11 14 32 40 46 50 52 56 58 ...
$ Fold03: int [1:18685] 16 20 39 47 49 77 78 83 84 86 ...
$ Fold04: int [1:18685] 3 15 30 38 41 44 51 53 54 55 ...
$ Fold05: int [1:18685] 7 9 17 18 23 37 42 67 75 79 ...
$ Fold06: int [1:18686] 6 31 36 48 72 74 90 113 114 121 ...
$ Fold07: int [1:18686] 2 33 59 61 100 103 109 123 137 161 ...
$ Fold08: int [1:18685] 24 64 68 87 88 101 110 130 141 152 ...
$ Fold09: int [1:18684] 4 27 28 66 70 85 97 105 112 148 ...
$ Fold10: int [1:18684] 12 13 19 43 65 91 94 108 134 138 ...
However, the code below gives me an error:
for( i in 1:k ){
testData <- train_data[folds[[i]], ]
trainData <- train_data[(-folds[[i]]), ]
}
Error is:
> for( i in 1:k ){
+ testData <- train_data[folds[[i]], ]
+ trainData <- train_data[(-folds[[i]]), ]
+ }
Error in train_data[folds[[i]], ] : subscript out of bounds
I tried with different seed values but I am getting same error.
Any help is appreciated.
Thank you!
As I understand it, your problem arises because you are passing the whole data frame train_data to createFolds. K-folds should be generated over samples, i.e. the rows of the dataset.
For instance:
library(caret)   # provides createFolds
library(kernlab) # provides the spam dataset
data(spam)
dim(spam) # has 4601 rows/samples
folds <- createFolds(y = spam$type, k = 10, list = TRUE, returnTrain = TRUE)
# Here, only one column, spam$type, is used
# and indeed
max(unlist(folds)) # 4601
# and these can be used as row indices
head(spam[folds[[4]], ])
Passing the whole data frame is very similar to passing a matrix. Such a matrix is first converted to a vector, so a 5x10 matrix becomes a 50-element vector, and the values in folds are indices into that vector. If you then try to use these values as row indices for your data frame, they will overshoot:
r <- 8
c <- 10
m0 <- matrix(rnorm(r*c), r, c)
features<-apply(m0, c(1,2), function(x) sample(c(0,1),1))
features
folds<-createFolds(features,4)
folds
max(unlist(folds))
m0[folds[[2]],] # Error in m0[folds[[2]], ] : subscript out of bounds
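To show the loop from the question working once the folds are built over rows, here is a rough sketch in base R. It does not call createFolds: the folds are made with sample() and split() as a plain, unstratified stand-in. The data frame train_data, its columns and the vector sizes are all invented for the example.

```r
set.seed(1000)

# Hypothetical data: 53 rows, two features and a target column
train_data <- data.frame(a = rnorm(53), b = rnorm(53),
                         target = rbinom(53, 1, 0.5))

k <- 10
n <- nrow(train_data)

# Assign each ROW to one of k folds, so every index is at most nrow(train_data)
fold_id <- sample(rep_len(seq_len(k), n))
folds <- split(seq_len(n), fold_id)

sizes <- integer(k)
for (i in seq_len(k)) {
  testData  <- train_data[folds[[i]], ]   # rows held out in fold i
  trainData <- train_data[-folds[[i]], ]  # all remaining rows
  sizes[i] <- nrow(testData)
}
```

With caret, the corresponding fix would be to pass a single column (for example the outcome) as y, as in the spam example above, so that the fold indices refer to rows.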

Data frame header in R

I am trying to make some calculations with data from oracle db using R. I connected to the DB and extracted the data correctly.
> y=dbGetQuery(con, "select distinct(fk_parametro) from t_datos")
> y
FK_PARAMETRO
1 30
2 42
3 43
4 83
5 87
6 1
7 6
8 44
9 20
10 14
11 86
12 88
13 85
14 81
15 35
16 8
17 80
18 89
19 7
20 12
21 82
22 9
23 10
The following command works:
> sum(y)
[1] 1042
But this one fails:
> mean(y)
[1] NA
Warning message:
In mean.default(y) : argument is not numeric or logical: returning NA
I think it happens because R is treating the header "FK_PARAMETRO" as an element. Can someone help me figure this out?
As commented by @akrun, this works:
mean(y[,1])
Or, as suggested by @PierreLafortune, you could also do:
colMeans(y)
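A short sketch of why this happens, using a small made-up data frame in place of the query result. The header is not being treated as an element: dbGetQuery returns a data frame, and as far as I know mean() has no method for a whole data frame, while sum() does accept one. Extracting the column as a vector first avoids the problem.

```r
# Hypothetical stand-in for the query result: a one-column data frame
y <- data.frame(FK_PARAMETRO = c(30, 42, 43, 83, 87))

# The object is a data frame, not a numeric vector
is.data.frame(y)

# Extract the column as a vector, then take the mean
m1 <- mean(y[, 1])
m2 <- mean(y$FK_PARAMETRO)

# Or compute the mean of every column at once (returns a named vector)
m3 <- colMeans(y)
```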
