Rounding numbers to custom values in R

I want to convert numbers as shown in the table below. I have tried every method I could think of, but I am unable to get the values I expect.
value         round off value
0.0 - 4.9     0
5.0 - 5.9     6
6.0 - 6.9     7
7.0 - 7.9     8
8.0 - 8.9     9
9.0 - 10.0    10
The table above is for reference.
Expected output, e.g.: roundup(5.0) = 6, roundup(6.9) = 7

You can try:
roundup<-function(x) c(0,6:10)[findInterval(x,c(0,5:9))]
roundup(c(5,6.9))
#[1] 6 7
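To see why the one-liner above works, it may help to split it into its two parts. This is only a sketch of the same idea: the names `breaks`, `targets`, and `idx` are mine, not part of the answer. findInterval() returns, for each input, the index of the interval it falls into, and that index is then used to look up the target value.

```r
# Lower edges of the intervals from the question's table.
breaks <- c(0, 5:9)
# Value to return for each interval, in the same order as `breaks`.
targets <- c(0, 6:10)

x <- c(4.9, 5.0, 6.9, 9.5, 10)
idx <- findInterval(x, breaks)  # index of the interval each element of x falls in
idx                             # 1 2 3 6 6
targets[idx]                    # 0 6 7 10 10
```

Note that inputs below 0 get index 0, and indexing with 0 silently drops the element, so you may want to validate the input range first.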


What does note_ind:ncol(dataset) mean in R?

I have this line of code but I don't know what it means especially the note_ind part.
apply(mydat[,-c(1,2,3,note_ind:ncol(dataset))],c(1,2),as.numeric)
The notation x:y is used to create numeric vector sequences where each element is the previous element incremented by 1. For x <= y it is shorthand for `seq(x, y, by = 1)`. It is most commonly used for integer sequences, but it works on doubles too.
1:10
[1] 1 2 3 4 5 6 7 8 9 10
1.1:10.1
[1] 1.1 2.1 3.1 4.1 5.1 6.1 7.1 8.1 9.1 10.1
1.5:10.2 # sequence stops after 9.5 because 10.2 < 9.5 + 1 - seq() behaves the same way
[1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
Presumably note_ind is an integer value defined somewhere else in your code. ncol(dataset) is the number of columns in dataset, so note_ind:ncol(dataset) generates a sequence between those two values, incrementing by 1 for each element.
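As a rough illustration of what the indexing in your line does, here is a small sketch. The data frame and the value of note_ind are made up for the example, since I can't see your data.

```r
# Hypothetical data frame with 8 columns (V1..V8); note_ind is an assumed value.
dataset <- as.data.frame(matrix(1:16, nrow = 2))
note_ind <- 6

note_ind:ncol(dataset)                      # 6 7 8
drop <- c(1, 2, 3, note_ind:ncol(dataset))  # columns to exclude
kept <- dataset[, -drop]                    # negative indices drop columns 1-3 and 6-8
names(kept)                                 # "V4" "V5"
```

So in your code, the negative index removes the first three columns and everything from column note_ind to the last column, and apply() then converts what is left with as.numeric.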

R: Creating an index vector

I need some help with R coding here.
The data set Glass consists of 214 rows of data in which each row corresponds to a glass sample. Each row consists of 10 columns. When viewed as a classification problem, column 10 (Type) specifies the class of each observation/instance. The remaining columns are attributes that might be used to infer column 10. Here is an example of the first row:
RI Na Mg Al Si K Ca Ba Fe Type
1 1.52101 13.64 4.49 1.10 71.78 0.06 8.75 0.0 0.0 1
First, I cast column 10 so that it is interpreted by R as a factor instead of an integer value.
Now I need to create a vector with indices for all observations (it must have the values 1-214). I know how to create a vector with 214 values, but not one that holds the specific indices of the observations in a data frame.
If it helps, this is being done to set up training data for Naive Bayes. Thanks.
I'm not totally sure that I get what you're trying to do... So please forgive me if my solution isn't helpful. If your df's name is 'df', just use the dplyr package for reordering your columns and write
library(dplyr)
df['index'] <- 1:214
df <- df %>% select(index,everything())
Here's an example. So that I can post full dataframes, my dataframes will only have 10 rows...
Let's say my dataframe is:
df <- data.frame(col1 = c(2.3,6.3,9.2,1.7,5.0,8.5,7.9,3.5,2.2,11.5),
col2 = c(1.5,2.8,1.7,3.5,6.0,9.0,12.0,18.0,20.0,25.0))
So it looks like
col1 col2
1 2.3 1.5
2 6.3 2.8
3 9.2 1.7
4 1.7 3.5
5 5.0 6.0
6 8.5 9.0
7 7.9 12.0
8 3.5 18.0
9 2.2 20.0
10 11.5 25.0
If I want to add another column that just is 1,2,3,4,5,6,7,8,9,10... and I'll call it 'index' ...I could do this:
library(dplyr)
df['index'] <- 1:10
df <- df %>% select(index, everything())
That will give me
index col1 col2
1 1 2.3 1.5
2 2 6.3 2.8
3 3 9.2 1.7
4 4 1.7 3.5
5 5 5.0 6.0
6 6 8.5 9.0
7 7 7.9 12.0
8 8 3.5 18.0
9 9 2.2 20.0
10 10 11.5 25.0
Hope this will help
df$ind <- seq.int(nrow(df))
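If the index vector is meant for splitting the data into training and test sets, something along these lines might work. This is only a sketch: the stand-in data frame, the 70/30 split, and the names train_idx and test_idx are my assumptions, not something from the question.

```r
# Stand-in for the Glass data: 214 rows with a factor class column.
set.seed(1)
df <- data.frame(RI = rnorm(214),
                 Type = factor(sample(1:3, 214, replace = TRUE)))

idx <- seq_len(nrow(df))  # index vector 1, 2, ..., 214

# Assumed 70/30 split; sample() draws without replacement by default.
train_idx <- sample(idx, size = round(0.7 * length(idx)))
test_idx  <- setdiff(idx, train_idx)

train <- df[train_idx, ]
test  <- df[test_idx, ]
```

seq_len(nrow(df)) is a bit safer than 1:nrow(df) because it returns an empty vector for a data frame with zero rows.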

Extracting complete paired values (non-NA) from a matrix in R [duplicate]

I apologize if this is elementary or has been answered before, but I haven't found an answer to my question despite extensive searching. I'm also very new to programming so please bear with me here.
I have a bunch of 25 by 2 matrices of data; however, some of the cells have NA values. I'm looking to extract a subset of each matrix consisting of only the complete paired values (so no NA values).
So say I have:
3.6 4.2
9.2 8.4
4.8 NA
1.1 8.2
NA 11.6
NA NA
2.7 3.5
I want:
3.6 4.2
9.2 8.4
1.1 8.2
2.7 3.5
Is there some function that would do this easily?
Thanks!
Try this
df <- read.table(text = "3.6 4.2
9.2 8.4
4.8 NA
1.1 8.2
NA 11.6
NA NA
2.7 3.5")
df[complete.cases(df), ]
# V1 V2
# 1 3.6 4.2
# 2 9.2 8.4
# 4 1.1 8.2
# 7 2.7 3.5
df[ apply(!is.na(df), 1, all) , ]
df <- data.frame(V1 = c(3.6,9.2,4.8,1.1,NA,NA,2.7),
V2 = c(4.2,8.4,NA,8.2,11.6,NA,3.5))
EDIT: I forgot about na.omit and complete.cases. Doh.
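Since the question is about matrices rather than data frames, here is a small sketch showing that the same two functions also work directly on a matrix. The name m is just for illustration.

```r
# The example values from the question as a 7 x 2 matrix.
m <- matrix(c(3.6,  4.2,
              9.2,  8.4,
              4.8,  NA,
              1.1,  8.2,
              NA,  11.6,
              NA,   NA,
              2.7,  3.5), ncol = 2, byrow = TRUE)

# Keep only the rows where both values are present.
complete <- m[complete.cases(m), , drop = FALSE]
complete

# na.omit() gives the same rows, plus an attribute recording what was removed.
na.omit(m)
```

drop = FALSE keeps the result a matrix even if only one complete row is left.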

select column in R with condition

I have a data frame as follows
V2 V4 V6 V8
1 5 5.2 5.1 4.8
2 4.4 4.1 4.5 4.3
3 4.2 3.8 4.2 4.1
4 5 3.2 3.3 4.0
In the actual data the columns go from V2 to V200 and the rows from 1 to 99. I want to select the columns whose values ever go below 4.
The result should be:
V4 V6
1 5.2 5.1
2 4.1 4.5
3 3.8 4.2
4 3.2 3.3
I also want to select the columns whose values never go below 4. The result should be:
V2 V8
1 5 4.8
2 4.4 4.3
3 4.2 4.1
4 5 4.0
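I am trying with the subset command, but have not been able to get it done yet.
You have not specified whether you want to do this for each row or for the whole data.frame. For a full data.frame, compare each column's minimum with 4:
mins <- sapply(df, min)
# columns whose values never go below 4 (>= so that a minimum of exactly 4, as in V8, is kept)
moreThan4 <- df[which(mins >= 4)]
# columns whose values go below 4 at least once
lessThan4 <- df[which(mins < 4)]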
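For what it's worth, here is a self-contained sketch on the data from the question that tests the condition directly with any() instead of going through the minimum. The names below4, ever_below, and never_below are mine.

```r
# The example data frame from the question.
df <- data.frame(V2 = c(5.0, 4.4, 4.2, 5.0),
                 V4 = c(5.2, 4.1, 3.8, 3.2),
                 V6 = c(5.1, 4.5, 4.2, 3.3),
                 V8 = c(4.8, 4.3, 4.1, 4.0))

# TRUE for every column that contains at least one value below 4.
below4 <- sapply(df, function(col) any(col < 4, na.rm = TRUE))

ever_below  <- df[, below4, drop = FALSE]
never_below <- df[, !below4, drop = FALSE]

names(ever_below)   # "V4" "V6"
names(never_below)  # "V2" "V8"
```

Because the two selections use the same logical vector, every column ends up in exactly one of the two results, including columns whose minimum is exactly 4.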

Computing a "rightmost" moving average?

I would like to compute a moving average (ma) over some time series data but I would like the ma to consider the order n starting from the rightmost of my series so my last ma value corresponds to the ma of the last n values of my series. The desired function rightmost_ma would produce this output:
data <- seq(1,10)
> data
[1] 1 2 3 4 5 6 7 8 9 10
rightmost_ma(data, n=2)
NA 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
I was reviewing the different ma possibilities e.g. package forecast and could not find how to cover this use case. Note that the critical requirement for me is to have valid non NA ma values for the last elements of the series or in other words I want my ma to produce valid results without "looking into the future".
Take a look at the rollmean function from the zoo package:
> library(zoo)
> rollmean(zoo(1:10), 2, align ="right", fill=NA)
1 2 3 4 5 6 7 8 9 10
NA 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
You can also use rollapply:
> rollapply(zoo(1:10), width=2, FUN=mean, align = "right", fill=NA)
1 2 3 4 5 6 7 8 9 10
NA 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
I think using stats::filter is less complicated, and might have better performance (though zoo is well written).
This:
stats::filter(1:10, c(1,1)/2, sides=1) # stats:: prefix avoids clashes with dplyr::filter if dplyr is loaded
gives:
Time Series:
Start = 1
End = 10
Frequency = 1
[1] NA 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
If you don't want the result to be a ts object, use as.vector on the result.
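Putting the pieces above together, a rightmost_ma function with the signature from the question could be sketched like this. It is just one possible wrapper around stats::filter, not a tested package function.

```r
# Trailing ("rightmost") moving average of order n, using base R only.
rightmost_ma <- function(x, n) {
  # Equal weights 1/n; sides = 1 means only current and past values are used.
  weights <- rep(1 / n, n)
  # as.vector() strips the ts class so a plain numeric vector is returned.
  as.vector(stats::filter(x, weights, sides = 1))
}

rightmost_ma(1:10, n = 2)
# NA 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
```

The first n - 1 values are NA because there are not enough past observations yet, while the last value is always the mean of the last n elements.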
