Remove values in vector from double variable in R - r

I have a variable of type double X: 1.5 1.3 0.6 1.8 2.9 2.1 1.5 1.4 5.8 0.0
and a vector V: c(0.6,2.9). I want to remove the values in V from X
test<-X[!X %in% V]
The values are not removed from test:
test
[1] 1.5 1.3 0.6 1.8 2.9 2.1 1.5 1.4 5.8 0.0`
I tried the following:
are.equal <- function(x, y, eps = .Machine$double.eps^0.5) abs(x - y) < eps
test=X[!(are.equal(X,0.6))]
0.6 were removed..
I could have something odd in my data or my system.
Any idea?

Related

Julia: Floats sum is wrong?

How can I make sure that by adding 0.2 at every iteration I get the correct result?
some = 0.0
for i in 1:10
some += 0.2
println(some)
end
the code above gives me
0.2
0.4
0.6000000000000001
0.8
1.0
1.2
1.4
1.5999999999999999
1.7999999999999998
1.9999999999999998
Floats are only approximatively correct and if adding up to infinity the error will become infinite, but you can still calculate with it pretty precisely. If you need to evaluate the result and look if it is correct you can use isapprox(a,b) or a ≈ b.
I.e.
some = 0.
for i in 1:1000000
some += 0.2
end
isapprox(some, 1000000 * 0.2)
# true
Otherwise, you can add integer numbers in the for loop and then divide by 10.
some = 0.
for i in 1:10
some += 2.
println(some/10.)
end
#0.2
#0.4
#0.6
#0.8
#1.0
#1.2
#1.4
#1.6
#1.8
#2.0
More info about counting with floats:
https://en.wikipedia.org/wiki/Floating-point_arithmetic
You can iterate over a range since they use some clever tricks to return more "natural" values:
julia> collect(0:0.2:2)
11-element Vector{Float64}:
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
1.8
2.0
julia> collect(range(0.0, step=0.2, length=11))
11-element Vector{Float64}:
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
1.8
2.0

Trying to use a variable as label in ggplots

I'm not sure what's going on here, but when I try to run ggplots, it tells me that u and u1 are not valid lists. Did I enter u and u1 incorrectly, that it thinks these are functions, did I forget something, or did I enter things wrong into ggplots?
u1 <- function(x,y){max(utilityf1(x))}
utilityc1 <- data.frame("utilityc1" =
u(c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20),
c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20)))
utilityc1 <- data.frame("utilityc1" =
u1(c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20),
c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20)))
hhcomp <- data.frame(
pqx, pqy, utility, hours, p1qx, p1qy, utilit, utilityc1,
utilityc, u,u1, o, o1, o2
)
library(ggplot2)
ggplot(hhcomp, aes(x=utility, y=consumption))+
coord_cartesian(xlim = c(0, 16) )+
ylim(0,20)+
labs(x = "leisure(hours)",y="counsumption(units)")+
geom_line(aes(x = u, y = consumption))+
geom_line(aes(x = u1, y = consumption))
I'm not sure what else to explain, so if someone could provide some help on providing code to stack overflow that would be useful. I'm also not sure how much of a description to have, I should have enough code to be reproducible, but there is a problem that Stack Overflow only allows so much code, so it would be good to know the right amount to add.
I think you may need to read the documentation for ggplot2 and maybe r in general.
data.frame
For starters, a data.frame object is a collection of vectors appended together column wise. Most of what you have defined as inputs for hhcomp are functions, which cannot be stored as a data.frame. A canonical example of a data frame in r is iris
head(iris)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.5 1.4 0.2 setosa
#2 4.9 3.0 1.4 0.2 setosa
#3 4.7 3.2 1.3 0.2 setosa
#4 4.6 3.1 1.5 0.2 setosa
#5 5.0 3.6 1.4 0.2 setosa
#6 5.4 3.9 1.7 0.4 setosa
str(iris) #print the structure of an r object
#'data.frame': 150 obs. of 5 variables:
# $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
# $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
# $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
# $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
functions
There is a lot going on with your functions. Nested functions are fine, but it seems as though you are failing to pass all values on. This probably means you are trying to apply R's scoping rules but this makes code ambiguous of where values are found.
With the currently defined functions, calling u(1:2,3:4) passes 1:2 to utilityf but utilityf's y argument is never assigned (but with r's lazy evaluation we reach a different error before r realizes that this value is missing). The next function that gets evaluated in this nest is p1qyf which is defined as follows
p1qyf <- function(y){(w1*16)-(w1*x)}
with this definition, it does not matter what you pass to the argument y it will never be used and will always return the same thing.
#with only the function defined
p1qyf()
#Error in p1qyf() : object 'w1' not found
#defining w1
w1 <- 1.5
p1qyf()
#Error in p1qyf() : object 'x' not found
x <- 10:20
#All variables defined in the function
#can now be found in the global environment
#thus the function can be called with no errors because
#w1 and x are defined somewhere...
p1qyf() #nothing assigned to y
[1] 9.0 7.5 6.0 4.5 3.0 1.5 0.0 -1.5 -3.0 -4.5 -6.0
p1qyf(y = iris) #a data.frame assigned to y
[1] 9.0 7.5 6.0 4.5 3.0 1.5 0.0 -1.5 -3.0 -4.5 -6.0
p1qyf(y = foo_bar) #an object that hasn't even been assigned yet
[1] 9.0 7.5 6.0 4.5 3.0 1.5 0.0 -1.5 -3.0 -4.5 -6.0
I imagine you actually intend to define it this way
p1qyf <- function(y){(w1*16)-(w1*y)}
#Now what we pass to it affects the output
p1qyf(1:10)
#[1] 22.5 21.0 19.5 18.0 16.5 15.0 13.5 12.0 10.5 9.0
head(p1qyf(iris))
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 16.35 18.75 21.90 23.7 NA
#2 16.65 19.50 21.90 23.7 NA
#3 16.95 19.20 22.05 23.7 NA
#4 17.10 19.35 21.75 23.7 NA
#5 16.50 18.60 21.90 23.7 NA
#6 15.90 18.15 21.45 23.4 NA
You can improve this further by defining more arguments so that R doesn't need to search for missing values with it's scoping rules
p1qyf <- function(y, w1 = 1.5){(w1*16)-(w1*y)}
#w1 is defaulted to 1.5 and doesn't need to be searched for.
I would spend some time looking into your functions because they are unclear and some, such as your p1qyf, do not fully use the arguments they are passed.
ggplot
ggplot takes some type of structured data object such as data.frame tbl_df, and allows plotting. The aes mappings can take the symbol names of the column headers you wish to map. Continuing with iris as an example.
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species))+
geom_point() +
geom_line()
I hope this helps clears up why you may be getting some errors. Honestly though, if you were actually able to declare a data.frame then the problem here is that your post is still not that reproducible. Good luck
pqxf <- function(x){(1)*(y)} # replace 1 with py and assign a value to py
pqyf <- function(y){(w * 16)-(w * x)} #
utilityf <- function(x, y) { (pqyf(x)) * ((pqxf(y)))} # the utility function C,l
hours <- c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20)
w1 <- 1.5
p1qxf <- function(x){(1)*(y)} # replace 1 with py and assign a value to p1y
p1qyf <- function(y){(w1 * 16)-(w1 * x)} #
utilityf1 <- function(x, y) { (p1qyf(x)) * ((p1qxf(y)))} # the utility function (C,l)
utilitycf <- function(x,y){max(utilityf(x))/((pqyf(y)))}
utilityc1f <- function(x,y){max(utilityf1(x))/((pqyf(y)))}
u <- function(x,y){max(utilityf(x))}
u1 <- function(x,y){max(utilityf1(x))}```

How can subset a dataframe by nrow and groups in r?

I have a dataframe that contains 240,000 obs. of 7 variables. In the dataframe there are 100 groups of 2400 records each, by Symbol. Example:
Complete DataFrame
I want to split this dataframe in new dataframe that contains every first observation and each 240 observation. The new dataframe will be 1000 obs of 7 variables:
New DataFrame
I tried df[seq(1, nrow(df), 240), ] but the new dataframe has each 240 observation and not distinguished by group (Symbol). I mean, I want a new dataframe that contains the rows 240, 480, 720, 960, and so on, for each symbol. In the original data frame every symbol has 2400 obs thus the new dataframe will have 10 obs by group.
Since we don't have your data, we can use an R database: iris. In this example we split iris by Species and select first n rows using head, in this example I set n=5 to extract first 5 rows by Species
> split_data <- lapply(split(iris, iris$Species), head, n=5)
> do.call(rbind, split_data)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
setosa.1 5.1 3.5 1.4 0.2 setosa
setosa.2 4.9 3.0 1.4 0.2 setosa
setosa.3 4.7 3.2 1.3 0.2 setosa
setosa.4 4.6 3.1 1.5 0.2 setosa
setosa.5 5.0 3.6 1.4 0.2 setosa
versicolor.51 7.0 3.2 4.7 1.4 versicolor
versicolor.52 6.4 3.2 4.5 1.5 versicolor
versicolor.53 6.9 3.1 4.9 1.5 versicolor
versicolor.54 5.5 2.3 4.0 1.3 versicolor
versicolor.55 6.5 2.8 4.6 1.5 versicolor
virginica.101 6.3 3.3 6.0 2.5 virginica
virginica.102 5.8 2.7 5.1 1.9 virginica
virginica.103 7.1 3.0 5.9 2.1 virginica
virginica.104 6.3 2.9 5.6 1.8 virginica
virginica.105 6.5 3.0 5.8 2.2 virginica
>
Update
Given your comment, try this using your data.frame:
ind <- seq(from=240, to=240000, by=240) # a row index of length = 1000
split_data <- lapply(split(yourData, yourData$Symbol), function(x) x[ind,] )
do.call(rbind, split_data)
Here is one way using base R.
just like in the answer by user #Jilber Urbina I will give an example use with the built-in dataset iris.
fun <- function(DF, n = 240, start = n){
DF[seq(start, NROW(DF), by = n), ]
}
res <- lapply(split(iris, iris$Species), fun, n = 24)
res <- do.call(rbind, res)
row.names(res) <- NULL
res
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.3 1.7 0.5 setosa
#2 4.6 3.2 1.4 0.2 setosa
#3 6.1 2.8 4.7 1.2 versicolor
#4 6.2 2.9 4.3 1.3 versicolor
#5 6.3 2.7 4.9 1.8 virginica
#6 6.5 3.0 5.2 2.0 virginica
This can be made into a function, I named selectStepN.
#
# x - dataset to subset
# f - a factor, split criterion
# n - the step
#
selectStepN <- function(x, f, n = 240, start = n){
fun <- function(DF, n){
DF[seq(start, NROW(DF), by = n), ]
}
res <- lapply(split(x, f), fun, n = n)
res <- do.call(rbind, res)
row.names(res) <- NULL
res
}
selectStepN(iris, iris$Species, 24)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.3 1.7 0.5 setosa
#2 4.6 3.2 1.4 0.2 setosa
#3 6.1 2.8 4.7 1.2 versicolor
#4 6.2 2.9 4.3 1.3 versicolor
#5 6.3 2.7 4.9 1.8 virginica
#6 6.5 3.0 5.2 2.0 virginica

R range referencing with different dimensions not causing an error

In using range referencing I normally expect to see an error or at least a warning message when the operations in '[' ']' do not match the dimensions of the parent object, however I have just discovered that I am not seeing said warnings and errors. Is there a setting for this or a way to force an error? Example:
x = 1:5
y = 10:12
x[y>10]
y[x>2]
likewise this applies to data frames and other R objects:
dat = data.frame(x=runif(100),y=1:100)
dat[sample(c(TRUE,FALSE),23),c(TRUE,FALSE)]
The silent repetition and truncation of the references to match the dimensions of the parent object is unexpected, having used R for years, I've somehow never noticed this before.
I'm using R Console (64-bit) 3.0.1 for Windows (could be updated yes, but I hope this isn't the cause).
Edit: Fixed data.frame example as data.frame's don't allow more column references than columns. Thanks zero323.
You could modify the `[.data.frame` function to throw a warning when indexing with a logical vector that doesn't evenly divide the number of rows:
`[.data.frame` <- function(x, i, j, drop = if (missing(i)) TRUE else length(cols) == 1) {
if (!missing(i) && is.logical(i) && nrow(x) %% length(i) != 0) {
warning("Indexing data frame with logical vector that doesn't evenly divide row count")
}
base::`[.data.frame`(x, i, j, drop)
}
Here's a demonstration with the 150-row iris dataset, passing logical indexing vectors of length 11 (should cause warning) and 15 (should not cause warning):
iris[c(rep(FALSE, 10), TRUE),]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 11 5.4 3.7 1.5 0.2 setosa
# 22 5.1 3.7 1.5 0.4 setosa
# 33 5.2 4.1 1.5 0.1 setosa
# 44 5.0 3.5 1.6 0.6 setosa
# 55 6.5 2.8 4.6 1.5 versicolor
# 66 6.7 3.1 4.4 1.4 versicolor
# 77 6.8 2.8 4.8 1.4 versicolor
# 88 6.3 2.3 4.4 1.3 versicolor
# 99 5.1 2.5 3.0 1.1 versicolor
# 110 7.2 3.6 6.1 2.5 virginica
# 121 6.9 3.2 5.7 2.3 virginica
# 132 7.9 3.8 6.4 2.0 virginica
# 143 5.8 2.7 5.1 1.9 virginica
# Warning message:
# In `[.data.frame`(iris, c(rep(FALSE, 10), TRUE), ) :
# Indexing data frame with logical vector that doesn't evenly divide number of rows
iris[c(rep(FALSE, 14), TRUE),]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 15 5.8 4.0 1.2 0.2 setosa
# 30 4.7 3.2 1.6 0.2 setosa
# 45 5.1 3.8 1.9 0.4 setosa
# 60 5.2 2.7 3.9 1.4 versicolor
# 75 6.4 2.9 4.3 1.3 versicolor
# 90 5.5 2.5 4.0 1.3 versicolor
# 105 6.5 3.0 5.8 2.2 virginica
# 120 6.0 2.2 5.0 1.5 virginica
# 135 6.1 2.6 5.6 1.4 virginica
# 150 5.9 3.0 5.1 1.8 virginica
Expanding on #josilber I've written the following for atomic vector and matrix subsetting in case anyone else wants it:
`[` <- function(x, i) {
if(!missin
g(i) && is.logical(i) && (length(x) %% length(i) != 0 || length(i) > length(x))) {
warning("Indexing atomic vector with logical vector that doesn't evenly divide row count")
}
base::`[`(x,i)
}
`[` <- function(x,i,j,...,drop=TRUE) {
if (!missing(i) && is.logical(i) && nrow(x) %% length(i) != 0) {
warning("Indexing matrix with logical vector that doesn't evenly divide row count")
}
if (!missing(j) && is.logical(j) && nrow(x) %% length(j) != 0) {
warning("Indexing matrix with logical vector that doesn't evenly divide column count")
}
base::`[`(x,i,j,...,drop)
}
Testing my original example afterwards with this modification now produces the warning and other operations behave as per normal:
> x =
1:5
> y = 10:12
> x[y>10]
[1] 2 3 5
Warning message:
In x[y > 10] :
Indexing atomic vector with logical vector that doesn't evenly divide row count
> y[x>2]
[1] 12 NA NA
Warning message:
In y[x > 2] :
Indexing atomic vector with logical vector that doesn't evenly divide row count
> x[x>2]
[1] 3 4 5
> x[1:2]
[1] 1 2

How do I make a list of numbers by tenths?

I know that 1:10 will give me a vector of all integers from 1 to 10, but how can I get numbers from 1 to 2 going up by tenths (i.e., 1.0, 1.1, 1.2, ..., 2.0)?
Try seq
> seq(1, 2, by = 0.1)
[1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
Just in the spirit of there is more than one way to do things, another option is:
> (10:20)/10
[1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0

Resources