This is what my data.table looks like. The lettered columns are just there to draw a comparison to Excel. Column NewShares is my desired column; I DO NOT have that column in my data.
A B C D E F
dt<-fread('
InitialShares Level Price Amount CashPerShare NewShares
1573.333 0 9.5339 13973.71 0 1573.333
0 1 10.2595 0 .06689 1584.73
0 1 10.1575 0 .06689 1596.33
0 1 9.6855 0 .06689 1608.58')
I am trying to calculate NewShares under the assumption that new shares are added to InitialShares by reinvesting dividends (NewShares * CashPerShare) at 90% of the price (Price * 0.9). In Excel terms, the formula as of the second row would be =F2+((F2*E3*B3)/(C3*0.9)). The first row is just equal to InitialShares.
In R land, I am trying this (which is not quite right):
dt[,NewShares:= cumsum(InitialShares[1]*Level * CashPerShare/(Price*.9)+InitialShares[1])]
Please pay attention to the decimal places of NewShares once you generate the field, in order to validate your approach.
Each row multiplies the previous total by (1 + Level*CashPerShare/(Price*0.9)), so unrolling the recursion gives a cumulative product. If you expand your formula, you'll realize that this works:
dt[, NewShares := cumprod(1+Level*CashPerShare/Price/0.9)*InitialShares[1]]
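A quick way to validate against the posted decimals is to compute the result into a separate column and compare it with the NewShares values that came in with the fread() call:
library(data.table)
# Compute into Check so the expected NewShares column stays intact for comparison
dt[, Check := cumprod(1 + Level * CashPerShare / (Price * 0.9)) * InitialShares[1]]
dt[, max(abs(Check - NewShares))]
# < 0.01, i.e. the computed values match the posted ones up to rounding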
I want to compute a proportion of PCR detection and create a new column for it, using the test dataset below. I want the proportion of detection for each row, using only the columns pcr1 to pcr6; the other columns should be ignored.
site   sample pcr1 pcr2 pcr3 pcr4 pcr5 pcr6
pond 1      1    1    1    1    0    1    1
pond 1      2    1    1    0    0    1    1
pond 1      3    0    0    1    1    1    1
I want the output in a new column holding the detection proportion. The dataset above is only a small sample of the one I am using. I've tried:
data$detection.proportion <- rowMeans(subset(testdf, select = c(pcr1, pcr2, pcr3, pcr4, pcr5, pcr6)), na.rm = TRUE)
This works for this small dataset, but when I tried it on my larger one it gave incorrect proportions. What I'm looking for is a way to count all the 1s from pcr1 to pcr6 and divide them by the total number of 1s and 0s (which I know is 6 here, but I would like R to work this out in case it changes).
I found a way to do it, in case anyone else needs it. I don't know if this is the most efficient approach, but it worked for me.
data$detection.proportion <- length(subset(testdf, select = c(pcr1, pcr2, pcr3, pcr4, pcr5, pcr6)))
# Counts the pcr columns (length() on a data frame returns the number of columns) = 6
p.detection <- rowSums(data[, c(-1, -2, -3, -10)] == "1")
# Counts the occurrences of 1 (which is detection) in each row
data$detection.proportion <- p.detection / data$detection.proportion
# Divides the occurrences by the total number of pcrs to get the detected proportion
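For what it's worth, a more compact equivalent selects the pcr columns by name, so the hard-coded column positions aren't needed (a sketch, assuming the columns are all named pcr1 through pcr6):
# Find the pcr columns by name rather than by position
pcr_cols <- grep("^pcr", names(testdf), value = TRUE)
# rowMeans of the logical matrix is (number of 1s) / (number of pcr columns)
testdf$detection.proportion <- rowMeans(testdf[pcr_cols] == 1, na.rm = TRUE)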
Simple R question here, a little similar to this one, but I couldn't figure out how to adapt the insights from there to my setting.
I have a dataframe with relative quality rankings from several firms, e.g.
Firm Quality
A 4
B 5
C 2
D 0
I want to add a third column that is 1 if Quality is at or above the 50th percentile (and 0 otherwise), and a fourth column that is 1 if Quality is at or above the 75th percentile (and 0 otherwise). Solutions like the one linked above seem to rely on cut() and within(); they are relatively old, though, pre-dating dplyr, and I'm wondering if there's a good way to use summarise() and the dplyr summary functions to do this in a way that is more intuitive (at least for this newbie).
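For reference, the example data frame can be rebuilt like this (an assumed construction; the question shows only the printed table):
df <- data.frame(Firm = c("A", "B", "C", "D"),
                 Quality = c(4, 5, 2, 0))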
You could do something like
library(dplyr)
df %>%
mutate(Above50 = as.numeric(Quality >= quantile(Quality, 0.5)),
Above75 = as.numeric(Quality >= quantile(Quality, 0.75)))
# Firm Quality Above50 Above75
#1 A 4 1 0
#2 B 5 1 1
#3 C 2 0 0
#4 D 0 0 0
Ronak's answer is perfectly fine, but just for the fun of it, a fully dplyr solution:
library(dplyr)
df %>%
mutate(Above50 = as.numeric(ntile(Quality, 2)==2),
Above75 = as.numeric(ntile(Quality, 4)==4))
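One caveat: ntile() assigns rows to equal-sized buckets by rank, so it can disagree with the quantile() comparison when the group sizes don't divide evenly. A small made-up illustration:
library(dplyr)
x <- c(1, 2, 3)
as.numeric(x >= quantile(x, 0.5))  # 0 1 1 -- the median value itself is flagged
as.numeric(ntile(x, 2) == 2)       # 0 0 1 -- only the top-ranked row lands in the last bucket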
I am trying to achieve something in R. Here is the explanation:
I have a dataset that contains repeated values in one column, shown below.
A B
1122513454 0
1122513460 0
1600041729 0
2100002632 147905
2840007103 0
2840064133 138142
3190300079 138040
3190301011 138120
3680024411 0
4000000263 4000000263
4100002263 4100002268
4880004352 138159
4880015611 138159
4900007044 0
7084781116 142967
7124925306 0
7225002523 7225001325
23012600000 0
80880593057 0
98880000045 0
I have two columns (A and B). In column B, the value 138159 appears twice.
I want to count the distinct non-zero values in column B: whenever the same value appears more than once, it should count as 1. Here, 0 appears 10 times and the non-zero values fill the other 10 rows, but since 138159 appears twice, there are only 9 distinct non-zero values.
So my expected output is 9.
I have already done this in Excel, but I want to achieve the same in R. Is there a way to do it with the dplyr package?
I have written the following formula in Excel:
=+SUMPRODUCT((I2:I14<>0)/COUNTIFS(I2:I14,I2:I14))
How can I count only the distinct non-zero values, excluding 0? Can you guys help me with that? Any suggestion is really appreciated.
Edit 1: I have done this the following way:
abc <- hardy[hardy$couponid !=0,]
undertaker <- abc %>%
group_by(TYC) %>%
summarise(count_couponid= n_distinct(couponid))
Is there a smarter way to do this?
Thanks
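For what it's worth, the same result comes out of dplyr directly, without the intermediate object (assuming a data frame df with column B as in the table above):
library(dplyr)
df %>%
  filter(B != 0) %>%                  # drop the zeros first
  summarise(count = n_distinct(B))    # duplicates such as 138159 count once
# count = 9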
I am trying to convert a data.frame to a table without packages. I am basically using the Cookbook for R as a reference, and have tried converting from a data frame and from both named and unnamed vectors. The dataset is the Stack Overflow survey from Kaggle.
moreThan1000 is a data.frame that stores the countries with more than 1000 Stack Overflow users, sorted by the Number column, as shown below:
moreThan1000 <- subset(users, users$Number >1000)
moreThan1000 <- moreThan1000[order(moreThan1000$Number),]
When I try to convert it to a table, like this:
tbl <- table(moreThan1000)
tbl <- table(moreThan1000$Country, moreThan1000$Number)
tbl <- table(moreThan1000$Country, moreThan1000$Number, dnn = c("Country","Number"))
after each attempt my conversion looks like this:
Why does the moreThan1000 data.frame send all countries to the table, not just the related ones? It seems to me the conversion looks like a matrix.
I believe that this is because the countries do not relate to each other. To each country corresponds a number; to another country corresponds an unrelated number. So the best way to reflect this is the original data.frame, not a table that will have just a single 1 per row (unless two countries have the very same number of Stack Overflow users). I haven't downloaded the dataset you're using, but look at what happens with a fake dataset ordered by number, just like your moreThan1000:
dat <- data.frame(A = letters[1:5], X = 21:25)
table(dat$A, dat$X)
    21 22 23 24 25
  a  1  0  0  0  0
  b  0  1  0  0  0
  c  0  0  1  0  0
  d  0  0  0  1  0
  e  0  0  0  0  1
Why would you expect anything different from your dataset?
The function "table" is used to tabulate your data.
So it will count how often every value occurs (in the "number"column!). In your case, every number only occurs once, so don't use this function here. It's working correctly, but it's not what you need.
Your data is already a tabulation, no need to count frequencies again.
You could check whether there is an object conversion function; I guess you are looking for as.table() rather than table().
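Building on that: one way to get a table-classed object keyed by country, with the user counts as cell values rather than frequencies, is xtabs(). A sketch with made-up numbers, since the real data comes from Kaggle:
# Made-up stand-in for moreThan1000
moreThan1000 <- data.frame(Country = c("France", "Canada", "Germany"),
                           Number  = c(1200, 1500, 2100))
# One cell per country, holding Number itself instead of a frequency count
tbl <- xtabs(Number ~ Country, data = moreThan1000)
class(tbl)  # "xtabs" "table"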
I am trying to run a cumsum over two separate columns of a data frame. They are essentially tabulations of events for two different variables; only one variable can have an event recorded per row. The way I attacked the problem was to create a new column holding the value 1, and two new columns to hold the running totals for the two variables. This works fine, and I get the correct total number of occurrences, but the problem is that in my current ifelse statement, if the event recorded is for variable "A", then variable "B" is assigned 0. Instead, for every row I want the other variable's previous value carried forward, so that I don't end up with gaps where a total goes from 1 to 2, to 0, to 3.
I don't want to run summarize on this either, I would prefer to keep each recorded instance and run new columns through mutate.
CURRENT DF:
Event Value Variable Total.A Total.B
1 1 A 1 0
2 1 A 2 0
3 1 B 0 1
4 1 A 3 0
DESIRED RESULT:
Event Value Variable Total.A Total.B
1 1 A 1 0
2 1 A 2 0
3 1 B 2 1
4 1 A 3 1
Thanks!
You can use the fact that logical values can be summed as ones and zeros. Therefore, you can use the cumsum function:
DF$Total.A <- cumsum(DF$Variable == "A")
DF$Total.B <- cumsum(DF$Variable == "B")
Or, as a more general approach provided by @Frank, you can do:
uv <- unique(as.character(DF$Variable))
DF[, paste0("Total.", uv)] <- lapply(uv, function(x) cumsum(DF$Variable == x))
If your factor has many levels, you can get this in one line by dummy coding and then cumsum-ing the matrix:
X <- model.matrix(~Variable+0, DF)
apply(X, 2, cumsum)
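To write those running totals back onto the data frame with the Total.* names used above (model.matrix() names the dummy columns VariableA, VariableB, and so on):
# Rename the dummy columns VariableA/VariableB to Total.A/Total.B and assign back
DF[, sub("^Variable", "Total.", colnames(X))] <- apply(X, 2, cumsum)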