How do I make a list of numbers by tenths? - r

I know that 1:10 will give me a vector of all integers from 1 to 10, but how can I get numbers from 1 to 2 going up by tenths (i.e., 1.0, 1.1, 1.2, ..., 2.0)?

Try seq
> seq(1, 2, by = 0.1)
[1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0

In the spirit of there being more than one way to do it, another option is:
> (10:20)/10
[1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
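For anyone who wants to check these approaches side by side, here is a small sketch. It also uses the length.out argument of seq(), which is not mentioned above, and the variable names are only illustrative. Tenths are not exactly representable in binary floating point, so the comparison uses all.equal(), which allows a small tolerance, rather than ==.

```r
# Three ways to build 1.0, 1.1, ..., 2.0
by_step   <- seq(1, 2, by = 0.1)          # fixed step size
by_scale  <- (10:20) / 10                 # integer sequence, then divide
by_length <- seq(1, 2, length.out = 11)   # fixed number of points

# All three have 11 elements
lengths_ok <- c(length(by_step), length(by_scale), length(by_length)) == 11

# Compare with a tolerance instead of exact equality
same_1 <- isTRUE(all.equal(by_step, by_scale))
same_2 <- isTRUE(all.equal(by_step, by_length))
```

The length.out form is handy when you know how many points you want rather than the step size.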

Related

Julia: Floats sum is wrong?

How can I make sure that by adding 0.2 at every iteration I get the correct result?
some = 0.0
for i in 1:10
    some += 0.2
    println(some)
end
The code above gives me:
0.2
0.4
0.6000000000000001
0.8
1.0
1.2
1.4
1.5999999999999999
1.7999999999999998
1.9999999999999998
Floats are only approximately correct, and the rounding error accumulates as you keep adding, but you can still calculate with them quite precisely. If you need to check whether a result is correct, you can use isapprox(a, b) or a ≈ b.
For example:
some = 0.
for i in 1:1000000
    some += 0.2
end
isapprox(some, 1000000 * 0.2)
# true
Otherwise, you can add integer-valued numbers in the for loop and then divide by 10:
some = 0.
for i in 1:10
    some += 2.
    println(some/10.)
end
#0.2
#0.4
#0.6
#0.8
#1.0
#1.2
#1.4
#1.6
#1.8
#2.0
More info about counting with floats:
https://en.wikipedia.org/wiki/Floating-point_arithmetic
You can also iterate over a range, since ranges use some clever tricks to return more "natural" values:
julia> collect(0:0.2:2)
11-element Vector{Float64}:
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
1.8
2.0
julia> collect(range(0.0, step=0.2, length=11))
11-element Vector{Float64}:
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
1.8
2.0

Rbind-ing data.tables with NA values

I have a big data.table with about 40 columns, and I need to add a record for which I only have 3 of the 40 columns (the rest will be just NA). To make a reproducible example:
require(data.table)
data(iris)
setDT(iris)
# this works (and is the expected result):
rbind(iris, list(6, NA, NA, NA, "test"))
The problem is I have 37+ empty columns (the data I want to input is in the 1st, 2nd and 37th columns of the variable). So, I need to rep some of the NAs. But if I try:
rbind(iris, list(6, rep(NA, 3), "test"))
It won't work (sizes are different). I could do
rbind(iris, list(c(6, rep(NA, 3), "test")))
But it will (obviously) coerce the whole first column to char. I've tried unlisting the list, inverting the list(c( sequence (it only accepts lists), and haven't found anything yet.
Please note that this is not a duplicate of the (several) posts about rbind-ing data.tables, as I'm able to do that. What I haven't been able to do is maintain proper column classes while using rep(NA, x).
You can do...
rbind(data.table(iris), c(list(6), rep(NA, 3), list("test")))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1: 5.1 3.5 1.4 0.2 setosa
2: 4.9 3.0 1.4 0.2 setosa
3: 4.7 3.2 1.3 0.2 setosa
4: 4.6 3.1 1.5 0.2 setosa
5: 5.0 3.6 1.4 0.2 setosa
---
147: 6.3 2.5 5.0 1.9 virginica
148: 6.5 3.0 5.2 2.0 virginica
149: 6.2 3.4 5.4 2.3 virginica
150: 5.9 3.0 5.1 1.8 virginica
151: 6.0 NA NA NA test
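Note that logical(n) would give n FALSE values, not NA, so rep(NA, n) is what you want here; c() splices each of its elements into the list as a separate item. I wrapped iris in data.table() so rbindlist is used instead of rbind.data.frame and "test" is treated as a new factor level instead of an invalid level.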
I think there are better ways to go, though, like...
newrow = setDT(iris[NA_integer_, ])
newrow[, `:=`(Sepal.Length = 6, Species = factor("test")) ]
rbind(data.table(iris), newrow)
# or
rbind(data.table(iris), list(Sepal.Length = 6, Species = "test"), fill=TRUE)
These approaches are clearer and don't require fiddling with column counting.
I prefer the newrow way, since it leaves a table I can inspect to review the data transformation.
We can use replicate
rbind(iris, c(6, replicate(3, NA, simplify = FALSE), "test"))
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1: 5.1 3.5 1.4 0.2 setosa
# 2: 4.9 3.0 1.4 0.2 setosa
# 3: 4.7 3.2 1.3 0.2 setosa
# 4: 4.6 3.1 1.5 0.2 setosa
# 5: 5.0 3.6 1.4 0.2 setosa
# ---
#147: 6.3 2.5 5.0 1.9 virginica
#148: 6.5 3.0 5.2 2.0 virginica
#149: 6.2 3.4 5.4 2.3 virginica
#150: 5.9 3.0 5.1 1.8 virginica
#151: 6.0 NA NA NA test
Or, as @Frank commented:
rbind(iris, c(6, as.list(rep(NA, 3)), "test"))

Remove values in vector from double variable in R

I have a variable X of type double: 1.5 1.3 0.6 1.8 2.9 2.1 1.5 1.4 5.8 0.0
and a vector V: c(0.6, 2.9). I want to remove the values in V from X:
test <- X[!X %in% V]
But the values are not removed from test:
test
[1] 1.5 1.3 0.6 1.8 2.9 2.1 1.5 1.4 5.8 0.0
I then tried a comparison with a tolerance:
are.equal <- function(x, y, eps = .Machine$double.eps^0.5) abs(x - y) < eps
test <- X[!are.equal(X, 0.6)]
This time the 0.6 values were removed.
Could there be something odd in my data or my system?
Any ideas?
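One possible way to extend the tolerance idea from the question to every value in V is sketched below. The helper name near_any is hypothetical, and the sketch assumes the stored values in X differ from 0.6 and 2.9 only by floating-point rounding (for example because they were computed rather than typed in), which would explain why %in% finds no exact match.

```r
X <- c(1.5, 1.3, 0.6, 1.8, 2.9, 2.1, 1.5, 1.4, 5.8, 0.0)
V <- c(0.6, 2.9)

# TRUE for each element of x that lies within eps of at least one element of values
# (near_any is an illustrative name, not a base R function)
near_any <- function(x, values, eps = sqrt(.Machine$double.eps)) {
  vapply(x, function(xi) any(abs(xi - values) < eps), logical(1))
}

# Drop every element of X that is (approximately) in V
test <- X[!near_any(X, V)]

# A computed value that %in% misses but the tolerance check catches
computed <- 0.1 + 0.2
exact_hit  <- computed %in% 0.3
approx_hit <- near_any(computed, 0.3)
```

If the values in X were typed in literally, as they are here, %in% already works; the tolerance only matters when they carry rounding error.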

R programming transfering data from excel with missing values to R

So I have an Excel spreadsheet with NA values. What is the best way to copy the data and put it in R? I usually use data=read.delim("clipboard"), but because of those missing values I keep getting this error:
Error in if (del == 0 && to == 0) return(to) :
missing value where TRUE/FALSE needed
What are the possible ways I can get rid of this error? I tried putting zeros instead of the NA values, but that kind of screws up what the code is doing.
Here's the link to the code that I'm using: R programming fixing error. It was really helpful for my data problems.
I was going to post the whole data set, but there's a limit of 30000 characters.
You need to set the fill option to TRUE. When rows have unequal length, this pads the missing fields with NA.
read.table(fileName, header=TRUE, fill=TRUE)
fileName here is the path to the file exported from Excel, for example fileName = 'c:/temp/myfile.csv'.
This should also work with read.delim, which is a wrapper around read.table. You can also give read.table a string, but then you set the text argument rather than the file one. For example:
read.table(text = ' Time Speed Time Speed
0.8 2.9 0.3 2.7
1.3 2.8 0.9 2.7
1.7 2.3 2.5 3.1
2.0 0.6
2.3 1.7 13.6 3.3
3.0 1.4 15.1 3.5
3.5 1.3 17.5 3.3', header = TRUE, fill = TRUE)
Time Speed Time.1 Speed.1
1 0.8 2.9 0.3 2.7
2 1.3 2.8 0.9 2.7
3 1.7 2.3 2.5 3.1
4 2.0 0.6 NA NA
5 2.3 1.7 13.6 3.3
6 3.0 1.4 15.1 3.5
7 3.5 1.3 17.5 3.3
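The error message in the question looks like what seq() can raise when one of its endpoints is NA, which would happen if max() is taken over a column that contains missing values. That diagnosis is an assumption, since the failing code is not shown. The sketch below uses a hypothetical tab-separated string in place of the clipboard, reads it with read.delim, and then builds breaks with na.rm = TRUE so the NA cells do not propagate.

```r
# Hypothetical tab-separated text standing in for what Excel puts on the clipboard
txt <- paste(
  "Time\tSpeed\tTime\tSpeed",
  "0.8\t2.9\t0.3\t2.7",
  "2.0\t0.6\t\t",          # two empty cells
  "2.3\t1.7\tNA\t3.3",     # one literal NA
  sep = "\n"
)

# Treat both empty cells and the string "NA" as missing
dat <- read.delim(text = txt, na.strings = c("NA", ""))

# Ignore missing values when computing the upper bound for the breaks
upper  <- max(dat$Time.1, na.rm = TRUE)
breaks <- seq(0, upper + 1, by = 1)
```

Replacing the missing values with zeros, as tried in the question, changes the averages; skipping them with na.rm = TRUE leaves the remaining data untouched.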

Multiple columns of data and getting average R program

I asked a question like this before, but I decided to simplify my data format because I'm very new to R and didn't understand what was going on. Here's the link to that question: How to handle more than multiple sets of data in R programming?
I edited what my data should look like and decided to leave it in this format:
X1.0 X X2.0 X.1
0.9 0.9 0.2 1.2
1.3 1.4 0.8 1.4
As you can see, I have four columns of data. The real data has up to 2000 data points. Columns "X1.0" and "X2.0" are "Time", so what I want is the average of "X" and "X.1" every 100 seconds, based on my two time columns "X1.0" and "X2.0". I can do it using this command:
cuts <- cut(data$X1.0, breaks=seq(0, max(data$X1.0)+400, 400))
 by(data$X, cuts, mean)
But this only gives me the average for one data set, "X1.0" and "X". How can I get averages for more than one data set? I also want to avoid this kind of output:
cuts: (0,400]
[1] 0.7
------------------------------------------------------------
cuts: (400,800]
[1] 0.805
Note that this output was computed every 400 s. What I really want is a list of the averages for the different intervals. Please help. I just used data=read.delim("clipboard") to get my data into the program.
It is a little unclear what output you want.
First I change the column names, but this is optional:
colnames(dat) <- c('t1','v1','t2','v2')
Then I will use ave, which is like by but with more convenient output. I use a matrix as a trick to index the columns in pairs:
matrix(1:ncol(dat),ncol=2) ## column 1 holds columns 1 and 2, column 2 holds columns 3 and 4
[,1] [,2]
[1,] 1 3
[2,] 2 4
Then I use this matrix with apply. Here is the entire solution:
cbind(dat,
      apply(matrix(1:ncol(dat), ncol = 2), 2,
            function(x, by = 10) {  ## by 10 seconds! you can replace this
                                    ## with 100 or 400 in your real data
              t.col <- dat[, x][, 1]  ## txxx
              v.col <- dat[, x][, 2]  ## vxxx
              ave(v.col, cut(t.col,
                             breaks = seq(0, max(t.col), by)),
                  FUN = mean)
            })
)
EDIT: correct the cut and simplify the code. The value column (the second of each pair) is averaged within each time bin:
cbind(dat,
      apply(matrix(1:ncol(dat), ncol = 2), 2,
            function(x, by = 10) ave(dat[, x][, 2], dat[, x][, 1] %/% by)))
   X1.0   X X2.0 X.1    1   2
1   0.9 0.9  0.2 1.2 1.38 1.3
2   1.3 1.4  0.8 1.4 1.38 1.3
3   2.0 1.7  1.6 1.1 1.38 1.3
4   2.6 1.9  2.2 1.6 1.38 1.3
5   9.7 1.0  2.8 1.3 1.38 1.3
6  10.7 0.8  3.5 1.1 1.65 1.3
7  11.6 1.5  4.1 1.8 1.65 1.3
8  12.1 1.4  4.7 1.2 1.65 1.3
9  12.6 1.8  5.4 1.2 1.65 1.3
10 13.2 2.1  6.3 1.3 1.65 1.3
11 13.7 1.6  6.9 1.1 1.65 1.3
12 14.2 2.2  9.4 1.3 1.65 1.3
13 14.6 1.8 10.0 1.5 1.65 1.5
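The question also asks for a plain list of averages per interval rather than the by() printout. A minimal sketch of one way to get that uses tapply on the 13 rows shown above. The column names t1, v1, t2, v2 and the helper bin_means are illustrative, and the bin width of 10 matches the example rather than the 100 or 400 seconds of the real data.

```r
# The 13 rows from the example output (time/value pairs)
dat <- data.frame(
  t1 = c(0.9, 1.3, 2.0, 2.6, 9.7, 10.7, 11.6, 12.1, 12.6, 13.2, 13.7, 14.2, 14.6),
  v1 = c(0.9, 1.4, 1.7, 1.9, 1.0, 0.8, 1.5, 1.4, 1.8, 2.1, 1.6, 2.2, 1.8),
  t2 = c(0.2, 0.8, 1.6, 2.2, 2.8, 3.5, 4.1, 4.7, 5.4, 6.3, 6.9, 9.4, 10.0),
  v2 = c(1.2, 1.4, 1.1, 1.6, 1.3, 1.1, 1.8, 1.2, 1.2, 1.3, 1.1, 1.3, 1.5)
)

# Mean of `value` within each time bin of the given width;
# bins are labelled by time %/% width (0 = [0, width), 1 = [width, 2*width), ...)
bin_means <- function(time, value, width = 10) {
  tapply(value, time %/% width, mean)
}

m1 <- bin_means(dat$t1, dat$v1)   # one mean per bin for the first pair
m2 <- bin_means(dat$t2, dat$v2)   # one mean per bin for the second pair
```

Each result is a short named vector with one entry per bin, so the two data sets can be inspected or combined without the separator lines that by() prints.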

Resources