I have a data set of stock prices that have already been rounded to 2 decimal places (1234.56). I am now trying to round to a specific value which is different for each stock. Here are some examples:
Current Stock Price    Minimum Tick Increment    Desired Output
123.45                 .50                       123.50
155.03                 .10                       155.00
138.24                 .50                       138.00
129.94                 .10                       129.90
...                    ...                       ...
I'm not really sure how to do this but am open to suggestions.
Probably,
round(a/b)*b
will do the job:
> a <- seq(.1,1,.13)
> b <- c(.1,.1,.1,.2,.3,.3,.7)
> data.frame(a, b, out = round(a/b)*b)
a b out
1 0.10 0.1 0.1
2 0.23 0.1 0.2
3 0.36 0.1 0.4
4 0.49 0.2 0.4
5 0.62 0.3 0.6
6 0.75 0.3 0.6
7 0.88 0.7 0.7
I'm not familiar with R, but this method should work in any language with a ceiling function. I assume you want to round UP to the nearest 0.5:
a = ceiling(a*2) / 2
if a = 0.4, a = ceiling(0.4*2)/2 = ceiling(0.8)/2 = 1/2 = 0.5
if a = 0.9, a = ceiling(0.9*2)/2 = ceiling(1.8)/2 = 2/2 = 1
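In R, the same idea generalizes to any tick size by scaling by the tick before taking the ceiling. A minimal sketch (the function name is mine):

```r
# Round a up to the nearest multiple of b (vectorized over both arguments).
round_up_to_tick <- function(a, b) ceiling(a / b) * b

round_up_to_tick(0.4, 0.5)                        # 0.5
round_up_to_tick(0.9, 0.5)                        # 1
round_up_to_tick(c(123.45, 155.03), c(0.5, 0.1))  # 123.5 155.1
```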
As JoshO'Brien said in the comments, round_any from the plyr package works very well:
> library(plyr)
> stocks <- c(123.45, 155.03, 138.24, 129.94)
> round_any(stocks,0.1)
[1] 123.4 155.0 138.2 129.9
>
> round_any(stocks,0.5)
[1] 123.5 155.0 138.0 130.0
>
> round_any(stocks,0.1,f = ceiling)
[1] 123.5 155.1 138.3 130.0
>
> round_any(stocks,0.5,f = floor)
[1] 123.0 155.0 138.0 129.5
Read more here:
https://www.rdocumentation.org/packages/plyr/versions/1.8.4/topics/round_any
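If you'd rather not take on the plyr dependency, round_any is, as far as I can tell, just a thin wrapper that can be sketched in base R:

```r
# Base-R sketch of plyr-style round_any: round x to a multiple of
# accuracy, using f (round, floor, or ceiling) as the rounding rule.
round_any <- function(x, accuracy, f = round) f(x / accuracy) * accuracy

stocks <- c(123.45, 155.03, 138.24, 129.94)
round_any(stocks, 0.5)               # 123.5 155.0 138.0 130.0
round_any(stocks, 0.1, f = ceiling)  # 123.5 155.1 138.3 130.0
```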
The taRifx package has just such a function:
> library(taRifx)
> roundnear( seq(.1,1,.13), c(.1,.1,.1,.2,.3,.3,.7) )
[1] 0.1 0.2 0.3 0.4 0.6 0.6 0.7
In your case, just feed it the stock price and the minimum tick increment as its first and second arguments, and it should work its magic.
N.B. This has now been deprecated. See comment.
Please have a look at the preview of the data in the image. I would like to create 3 new columns, i.e. Start, End, and Density, and create a new row for each record in these 3 columns.
Following the comments above, you can convert the list into a data.frame as below:
# simulation of data.frame with one row and one cell with histogram
z <- hist(rnorm(1000))
z$start <- z$breaks[-length(z$breaks)]
z$end <- z$breaks[-1]
z[c("mids", "xname", "breaks", "equidist", "counts")] <- NULL
names_z <- names(z)
attributes(z) <- NULL
df <- data.frame(a = 1, b = 2, x = I(list(z)))
# Conversion of list to dataframe
setNames(as.data.frame(unlist(df["x"], recursive = FALSE)), names_z)
Output:
density start end
1 0.012 -3.0 -2.5
2 0.042 -2.5 -2.0
3 0.082 -2.0 -1.5
4 0.182 -1.5 -1.0
5 0.288 -1.0 -0.5
6 0.354 -0.5 0.0
7 0.418 0.0 0.5
8 0.300 0.5 1.0
9 0.172 1.0 1.5
10 0.088 1.5 2.0
11 0.050 2.0 2.5
12 0.012 2.5 3.0
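If all you need is the start/end/density frame, a simpler sketch builds it straight from the hist() object, without modifying z or unlisting:

```r
# Alternative sketch: assemble start/end/density directly from hist().
z <- hist(rnorm(1000), plot = FALSE)
df <- data.frame(
  start   = head(z$breaks, -1),  # left edge of each bin
  end     = z$breaks[-1],        # right edge of each bin
  density = z$density
)
```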
I have a data frame that looks like this:
Subject Time Freq1 Freq2 ...
A 6:20 0.6 0.1
A 6:30 0.1 0.5
A 6:40 0.6 0.1
A 6:50 0.6 0.1
A 7:00 0.3 0.4
A 7:10 0.1 0.5
A 7:20 0.1 0.5
B 6:00 ... ...
I need to delete the rows whose times are not in the range from 7:00 to 7:30. So in this case, all the 6:00, 6:10, 6:20, ... rows.
I have tried creating a data frame with just the times I want to keep, but it does not seem to recognize the times as numbers or as names, and I get the same error when trying to directly remove the ones I don't need. It is probably quite simple, but I haven't found a solution.
Any suggestions?
We can convert the Time column to a Period class with the lubridate package and then filter the data frame on that column.
library(dplyr)
library(lubridate)
dat2 <- dat %>%
  mutate(HM = hm(Time)) %>%
  filter(HM < hm("7:00") | HM > hm("7:30")) %>%
  select(-HM)
dat2
# Subject Time Freq1 Freq2
# 1 A 6:20 0.6 0.1
# 2 A 6:30 0.1 0.5
# 3 A 6:40 0.6 0.1
# 4 A 6:50 0.6 0.1
# 5 B 6:00 NA NA
DATA
dat <- read.table(text = "Subject Time Freq1 Freq2
A '6:20' 0.6 0.1
A '6:30' 0.1 0.5
A '6:40' 0.6 0.1
A '6:50' 0.6 0.1
A '7:00' 0.3 0.4
A '7:10' 0.1 0.5
A '7:20' 0.1 0.5
B '6:00' NA NA",
header = TRUE)
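For completeness, the same filtering can be done in base R by converting the "H:M" strings to minutes past midnight. A sketch on a cut-down version of the data (to_minutes is my helper name):

```r
# Base-R sketch: convert "H:M" strings to minutes past midnight, then
# drop rows whose time falls in the 7:00-7:30 window.
dat <- data.frame(Time  = c("6:20", "6:30", "7:00", "7:10", "7:20"),
                  Freq1 = c(0.6, 0.1, 0.3, 0.1, 0.1))
to_minutes <- function(x) {
  hm <- strsplit(x, ":", fixed = TRUE)
  vapply(hm, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]), numeric(1))
}
m <- to_minutes(dat$Time)
dat[m < 7 * 60 | m > 7 * 60 + 30, ]  # keeps only the 6:20 and 6:30 rows
```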
I am new to R and I wanted to generate a matrix of joint probabilities.
Using this function:
> simul.commonprob(margprob=c(0.1,0.25,0.2), corr=0, method="integrate", n1=10^5, n2=10)
I got the following results:
0 0.1 0.1 : done
0 0.1 0.25 : done
0 0.1 0.2 : done
0 0.25 0.25 : done
0 0.25 0.2 : done
0 0.2 0.2 : done
, , 0
0.1 0.25 0.2
0.1 0.010 0.0250 0.02
0.25 0.025 0.0625 0.05
0.2 0.020 0.0500 0.04
I want to extract this matrix so I can use it later as input to another function. In other words, I want this result:
0.1 0.25 0.2
0.1 0.010 0.0250 0.02
0.25 0.025 0.0625 0.05
0.2 0.020 0.0500 0.04
How can I get it?
The matrix that you get is the returned object from the function that you call. You can store it in an object that you define like this:
myOutput <- simul.commonprob(margprob=c(0.1,0.25,0.2), corr=0, method="integrate", n1=10^5, n2=10)
myMatrix <- matrix(data = as.vector(myOutput), nrow = 3, ncol = 3)
rownames(myMatrix) <- c(0.1,0.25,0.2)
colnames(myMatrix) <- c(0.1,0.25,0.2)
Now myMatrix has the output that you wanted. You can use it as input for other functions.
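Alternatively, since the ", , 0" header suggests the return value is a 3-d array with one slice per correlation value, you may be able to subscript it directly and keep the dimnames. A sketch with a stand-in array, as I have not verified simul.commonprob's exact return structure:

```r
# Sketch: if the result is a 3-d array (margins x margins x correlation),
# subscripting with [, , 1] drops to a plain matrix and keeps dimnames.
margins <- c(0.1, 0.25, 0.2)
arr <- array(outer(margins, margins), dim = c(3, 3, 1),
             dimnames = list(margins, margins, 0))  # stand-in for myOutput
mat <- arr[, , 1]  # 3x3 matrix with the 0.1/0.25/0.2 row and column names
```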
I am using the 'diamonds' dataset from ggplot2 and want to find the average of the 'carat' column. However, I want the average for every 0.1 interval:
Between
0.2 and 0.29
0.3 and 0.39
0.4 and 0.49
etc.
You can use the aggregate function to compute the mean by group, where the group is given by carat %/% 0.1:
library(ggplot2)
averageBy <- 0.1
aggregate(diamonds$carat, list(diamonds$carat %/% averageBy * averageBy), mean)
which gives the mean for each 0.1 bin:
Group.1 x
1 0.2 0.2830764
2 0.3 0.3355529
3 0.4 0.4181711
4 0.5 0.5341423
5 0.6 0.6821408
6 0.7 0.7327491
...
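The same binned mean can also be written in base R with tapply, shown here on a toy carat vector so it runs without ggplot2:

```r
# Sketch of the same idea in base R: bin with %/% and average with tapply.
carat <- c(0.21, 0.23, 0.29, 0.31, 0.35, 0.42)  # toy stand-in for diamonds$carat
tapply(carat, carat %/% 0.1 * 0.1, mean)
#       0.2       0.3       0.4
# 0.2433333 0.3300000 0.4200000
```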
I have a big data frame (104029 x 142).
I want to keep the rows where the value is > 0 in any of several specific columns.
df
word abrasive abrasives abrasivefree abrasion slurry solute solution ....
1 composition -0.2 0.2 -0.3 -0.40 0.2 0.1 0.20 ....
2 ceria 0.1 0.2 -0.4 -0.20 -0.1 -0.2 0.20 ....
3 diamond 0.3 -0.5 -0.6 -0.10 -0.1 -0.2 -0.15 ....
4 acid -0.1 -0.1 -0.2 -0.15 0.1 0.3 0.20 ....
....
I have tried using the filter() function, and it works.
But this way is not efficient for me:
because I need to spell out each column name, the code is hard to maintain.
column_names <- c("agent", "agents", "liquid", "liquids", "slurry",
"solute", "solutes", "solution", "solutions")
df_filter <- filter(df, agent>0 | agents>0 | liquid>0 | liquids>0 | slurry>0 | solute>0 |
                    solutes>0 | solution>0 | solutions>0)
df_filter
word abrasive abrasives abrasivefree abrasion slurry solute solution ....
1 composition -0.2 0.2 -0.3 -0.40 0.2 0.1 0.20 ....
2 ceria 0.1 0.2 -0.4 -0.20 -0.1 -0.2 0.20 ....
4 acid -0.1 -0.1 -0.2 -0.15 0.1 0.3 0.20 ....
....
Is there a more efficient way to do this?
This line will return a TRUE/FALSE vector indicating, for each row, whether the condition holds:
filter_condition <- apply(df[, column_names], 1, function(x) sum(x > 0)) > 0
Then you can use
df[filter_condition, ]
I'm sure there is something nicer in dplyr.
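A vectorized base-R variant of the same test avoids the row-wise apply by using rowSums. A sketch on toy data standing in for df:

```r
# Vectorized sketch: any-column-positive test via rowSums, no apply loop.
df <- data.frame(word   = c("composition", "diamond", "acid"),
                 slurry = c(0.2, -0.1, 0.1),
                 solute = c(0.1, -0.2, 0.3))  # toy stand-in
column_names <- c("slurry", "solute")
keep <- rowSums(df[column_names] > 0) > 0  # TRUE if any named column > 0
df[keep, ]
```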
Use dplyr::filter_at(), which allows you to use select()-style helpers to choose the columns:
library(dplyr)
df_filter <- df %>%
  filter_at(
    # select all the columns that are in your column_names vector
    vars(one_of(column_names)),
    # if any of those variables is greater than zero, keep the row
    any_vars(. > 0)
  )
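Note that filter_at() is superseded in dplyr 1.0+; the same filter can be written with if_any() and all_of(). A sketch on toy data standing in for df:

```r
library(dplyr)

# dplyr >= 1.0 sketch: if_any() replaces the superseded filter_at() family.
df <- data.frame(word   = c("composition", "diamond"),
                 slurry = c(0.2, -0.1),
                 solute = c(0.1, -0.2))  # toy stand-in
column_names <- c("slurry", "solute")
df_filter <- filter(df, if_any(all_of(column_names), ~ .x > 0))
```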