What I want to do is take the split_coefficient value from the rows where split_coefficient != 1 and use it in calculations with the adjusted_close values for the prior dates in the data frame. I'm trying to create a loop in R that will multiply the adjusted_close values by the split_coefficient up to, but not including, the row whose split_coefficient != 1, and then repeat the process to the end of the data set. I am able to identify the rows with split_coefficient != 1 using which(y[,6] != 1), but I cannot figure out how to write the loops to accomplish this task. Any help on how to create this loop would be greatly appreciated. Thank you in advance.
timestamp open high low close adjusted_close split_coefficient
7/20/2018 31.61 31.72 30.95 31.04 31.04 1
7/19/2018 31.17 31.57 30.69 31.19 31.19 1
7/18/2018 30.53 31.33 30.26 30.63 30.63 1
7/17/2018 31.67 31.825 30.49 30.89 30.89 1
7/16/2018 31.24 31.79 31 31.23 31.23 1
7/13/2018 32.06 32.37 31.36 31.45 31.45 1
7/12/2018 32.29 32.68 31.69 31.69 31.69 1
7/11/2018 33.37 33.47 32.43 32.93 32.93 1
7/10/2018 32.19 32.8185 31.75 31.84 31.84 1
7/9/2018 33.32 33.37 32.249 32.48 32.48 0.25
7/6/2018 36.03 36.17 34.15 34.23 34.23 1
7/5/2018 36.47 37.46 36.05 36.09 36.09 1
7/3/2018 36.28 37.8299 36 37.33 37.33 1
7/2/2018 38.74 39.22 37.03 37.08 37.08 1
6/29/2018 36.71 37.06 35.78 37 37 1
6/28/2018 38.88 40.51 37.46 38.03 38.03 0.35
6/27/2018 36.14 39.43 35.21 38.56 38.56 1
6/26/2018 36.54 37.89 35.715 36.48 36.48 1
6/25/2018 34.24 39.745 34.24 38.11 38.11 1
6/22/2018 33.04 33.57 32.72 33.06 33.06 1
6/21/2018 32.26 34.84 32.21 34.15 34.15 1
6/20/2018 32.13 32.21 31.655 32.02 32.02 0.5
6/19/2018 33.33 33.92 32.43 32.79 32.79 1
6/18/2018 32.55 33.02 31.19 31.24 31.24 1
6/15/2018 31.94 32.52 31.52 31.67 31.67 1
6/14/2018 31.5 31.83 30.91 31.33 31.33 1
6/13/2018 31.58 32.45 31.44 32.39 32.39 1
6/12/2018 31.86 32.41 31.66 31.97 31.97 1
6/11/2018 32.67 32.77 31.91 32.09 32.09 1
6/8/2018 33.46 33.56 32.41 32.6 32.6 1
I'll try to clarify my question:
On 6/20/2018, the split coefficient is 0.50. What I want to do is multiply the adjusted_close values from 6/8/2018 to 6/19/2018 by that split_coefficient of 0.5. The split_coefficient then changes to 0.35 on 6/28/2018, where I want to multiply the adjusted_close values from 6/21/2018 to 6/27/2018 by 0.35. Since the split_coefficient changes periodically, I thought a loop or series of loops would accomplish this.
Based on what I wrote above, I am looking for the following output with a new column named New.Adj.Close, which contains the values calculated by multiplying the adjusted_close values for 6/8/2018 - 6/19/2018 by the split_coefficient from 6/20/2018:
timestamp open high low close adjusted_close dividend_amount split_coefficient New.Adj.close
6/19/2018 33.33 33.92 32.43 32.79 32.79 0 1 16.395
6/18/2018 32.55 33.02 31.19 31.24 31.24 0 1 15.62
6/15/2018 31.94 32.52 31.52 31.67 31.67 0 1 15.835
6/14/2018 31.5 31.83 30.91 31.33 31.33 0 1 15.665
6/13/2018 31.58 32.45 31.44 32.39 32.39 0 1 16.195
6/12/2018 31.86 32.41 31.66 31.97 31.97 0 1 15.985
6/11/2018 32.67 32.77 31.91 32.09 32.09 0 1 16.045
6/8/2018 33.46 33.56 32.41 32.6 32.6 0 1 16.3
Okay, this uses the tidyverse, but you can recode it in base R or whatever you prefer. The important thing is the logic.
As mentioned in the comments, you do not normally want to use loops for a task like this, and in this case you would need a do-while loop. Instead, take advantage of vectorization.
measure_date <- seq(as.Date("2000/1/1"), by = "day", length.out = 20)
pattern <- c(0.5, 1, 1, 1, 1)
split_coefficient <- rep(pattern, 4)
value_to_multiply <- 1:20
df <- data.frame(measure_date, value_to_multiply, split_coefficient)
# sorting by date because OP's data is reversed (newest first)
df <- dplyr::arrange(df, measure_date)
# change the 1s to NAs so fill() can carry the split values
df$newsplit <- ifelse(df$split_coefficient == 1, NA, df$split_coefficient)
df <- tidyr::fill(df, newsplit)
df$multiplied <- df$value_to_multiply * df$newsplit
df
Results
measure_date value_to_multiply split_coefficient newsplit multiplied
1 2000-01-01 1 0.5 0.5 0.5
2 2000-01-02 2 1.0 0.5 1.0
3 2000-01-03 3 1.0 0.5 1.5
4 2000-01-04 4 1.0 0.5 2.0
5 2000-01-05 5 1.0 0.5 2.5
6 2000-01-06 6 0.5 0.5 3.0
7 2000-01-07 7 1.0 0.5 3.5
8 2000-01-08 8 1.0 0.5 4.0
9 2000-01-09 9 1.0 0.5 4.5
10 2000-01-10 10 1.0 0.5 5.0
11 2000-01-11 11 0.5 0.5 5.5
12 2000-01-12 12 1.0 0.5 6.0
13 2000-01-13 13 1.0 0.5 6.5
14 2000-01-14 14 1.0 0.5 7.0
15 2000-01-15 15 1.0 0.5 7.5
16 2000-01-16 16 0.5 0.5 8.0
17 2000-01-17 17 1.0 0.5 8.5
18 2000-01-18 18 1.0 0.5 9.0
19 2000-01-19 19 1.0 0.5 9.5
20 2000-01-20 20 1.0 0.5 10.0
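For what it's worth, the same logic can be applied directly to the original frame while keeping its newest-first ordering, because the default downward fill then carries each split value onto the earlier (prior-date) rows. This is only a sketch: the frame name y and the column names are taken from the question, and whether the split-date row itself should also be adjusted is left as a judgement call.
library(dplyr)
library(tidyr)
y_adj <- y %>%
  mutate(newsplit = ifelse(split_coefficient == 1, NA, split_coefficient)) %>%
  # data is newest-first, so filling downward pushes each split value onto
  # the prior dates; rows newer than the most recent split stay NA
  fill(newsplit) %>%
  mutate(New.Adj.Close = ifelse(is.na(newsplit),
                                adjusted_close,
                                adjusted_close * newsplit))
# if the split-date row itself should keep its original adjusted_close,
# shift the coefficient first, e.g. mutate(newsplit = lag(newsplit)), before fill()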
To clarify, do you just want to multiply adjusted_close by split_coefficient for the observations where split_coefficient equals 1? If so,
library(dplyr)
y %>% filter(split_coefficient == 1) %>% mutate(new_col = split_coefficient * adjusted_close)
Apologies if I misunderstood the question.
As highlighted in the comments, using loops in R is usually avoided, and better alternatives are available. For example, you can use ifelse():
df <- data.frame(
  adjusted_close = sample(1:5, 10, TRUE),
  split_coefficient = sample(1:2, 10, TRUE)
)
# adjusted_close split_coefficient
# 1 5 1
# 2 2 2
# 3 3 2
# 4 2 2
# 5 4 2
# 6 5 2
# 7 1 1
# 8 2 1
# 9 2 2
# 10 2 1
df$m <- ifelse(df$split_coefficient == 1,
               df$adjusted_close,
               df$adjusted_close * df$split_coefficient)
# df
# adjusted_close split_coefficient m
# 1 5 1 5
# 2 2 2 4
# 3 3 2 6
# 4 2 2 4
# 5 4 2 8
# 6 5 2 10
# 7 1 1 1
# 8 2 1 2
# 9 2 2 4
# 10 2 1 2
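If you also need the non-1 coefficient carried onto the surrounding rows, as in the original question, the same fill-down idea can be written in base R without a loop. This is only a sketch, reusing the df from the example above:
# positions of the rows whose coefficient is not 1
split_pos <- which(df$split_coefficient != 1)
# for each row, index of the nearest split row at or above it (0 = none yet)
grp <- findInterval(seq_len(nrow(df)), split_pos)
coef <- c(NA, df$split_coefficient[split_pos])[grp + 1]
df$m2 <- ifelse(is.na(coef), df$adjusted_close, df$adjusted_close * coef)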
I have this dataset
Longitude Latitude Radius Site_Type
<dbl> <dbl> <dbl> <chr>
1 -102. 1.5 5 OBS
2 -80.0 27.1 5 OBS
3 -158. 21.5 1 FEE;OBS
4 -81.6 3.98 1 FEE;OBS;NA
5 -87.0 5.50 1 OBS
6 -90.7 -0.55 1 FEE;OBS
7 -110. 24.7 1 FEE;OBS;NA
8 -89.5 28.4 1 OBS
9 -91.8 1.38 1 FEE;OBS
I want to replace NA with OBS. I tried using replace(), but nothing changed...
The "NA" here is part of a character string, so str_replace might work for you:
library(tidyverse)
df1 %>%
mutate(Site_Type = str_replace(Site_Type, "NA", "OBS"))
# Longitude Latitude Radius Site_Type
# 1 -102.0 1.50 5 OBS
# 2 -80.0 27.10 5 OBS
# 3 -158.0 21.50 1 FEE;OBS
# 4 -81.6 3.98 1 FEE;OBS;OBS
# 5 -87.0 5.50 1 OBS
# 6 -90.7 -0.55 1 FEE;OBS
# 7 -110.0 24.70 1 FEE;OBS;OBS
# 8 -89.5 28.40 1 OBS
# 9 -91.8 1.38 1 FEE;OBS
We can use sub in base R
df1$Site_Type <- sub("NA", "OBS", df1$Site_Type)
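Note that sub() and str_replace() only replace the first match in each string. If a Site_Type value could ever contain more than one "NA" (not the case in the sample shown), the global variants would be needed, for example:
df1$Site_Type <- gsub("NA", "OBS", df1$Site_Type)
# or, with stringr:
# df1$Site_Type <- stringr::str_replace_all(df1$Site_Type, "NA", "OBS")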
I am working on a table named kpi where, for each BoxID per week, I need to calculate the difference (B1 - B0) using the maximum No. and the minimum No. for that BoxID and week.
I couldn't work out how to calculate the First_b1, Last_b0 and Diff columns. This is what I have so far:
kpi <- kpi %>%
mutate(weekNumber = week(dmy(Date))) %>%
group_by(SolboxID, weekNumber) %>%
arrange(SolboxID)
Date No. BoxID B0 B1 WkNo
29.10.2018 61931 1 0 0 44
15.11.2018 115763 1 5.38 5.38 46
16.11.2018 119833 1 51.86 52.23 46
29.10.2018 60486 3 23.26 22.97 44
10.11.2018 99576 3 1336.53 1336.53 45
14.11.2018 112259 3 1.19 1.04 46
16.11.2018 117965 3 8.68 47.22 46
16.11.2018 118092 3 47.22 47.22 46
15.11.2018 115396 4 82.05 82.05 46
Expected output table -
Date No. BoxID B0 B1 WkNo First_b1 Last_b0 Diff
29.10.2018 61931 1 0 0 44 0 0 0
15.11.2018 115763 1 5.38 5.38 46 52.23 5.38 46.85
16.11.2018 119833 1 51.86 52.23 46 52.23 5.38 46.85
29.10.2018 60486 3 23.26 22.97 44 22.97 23.26 -0.29
10.11.2018 99576 3 1336.53 1336.53 45 1336.53 1336.53 0
14.11.2018 112259 3 1.19 1.04 46 47.22 1.19 46.03
16.11.2018 117965 3 8.68 47.22 46 47.22 1.19 46.03
16.11.2018 118092 3 47.22 47.22 46 47.22 1.19 46.03
15.11.2018 115396 4 82.05 82.05 46 82.05 82.05 0
I need some help computing these three additional columns.
Thank you in advance.
A simple pipe seems to do the job. See if this is it.
library(tidyverse)
kpi %>%
group_by(BoxID, WkNo) %>%
mutate(i = which.min(No.),
j = which.max(No.)) %>%
mutate(First_B0 = B0[i],
Last_B1 = B1[j],
Diff = Last_B1 - First_B0) %>%
select(-i, -j)
## A tibble: 9 x 9
## Groups: BoxID, WkNo [6]
# Date No. BoxID B0 B1 WkNo First_B0 Last_B1 Diff
# <fct> <int> <int> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#1 29.10.2… 61931 1 0 0. 44 0 0 0
#2 15.11.2… 115763 1 5.38 5.38e0 46 5.38 52.2 46.8
#3 16.11.2… 119833 1 51.9 5.22e1 46 5.38 52.2 46.8
#4 29.10.2… 60486 3 23.3 2.30e1 44 23.3 23.0 -0.29
#5 10.11.2… 99576 3 1337. 1.34e3 45 1337. 1337. 0
#6 14.11.2… 112259 3 1.19 1.04e0 46 1.19 47.2 46.0
#7 16.11.2… 117965 3 8.68 4.72e1 46 1.19 47.2 46.0
#8 16.11.2… 118092 3 47.2 4.72e1 46 1.19 47.2 46.0
#9 15.11.2… 115396 4 82.0 8.20e1 46 82.0 82.0 0
Data.
kpi <- read.table(text = "
Date No. BoxID B0 B1 WkNo
29.10.2018 61931 1 0 0 44
15.11.2018 115763 1 5.38 5.38 46
16.11.2018 119833 1 51.86 52.23 46
29.10.2018 60486 3 23.26 22.97 44
10.11.2018 99576 3 1336.53 1336.53 45
14.11.2018 112259 3 1.19 1.04 46
16.11.2018 117965 3 8.68 47.22 46
16.11.2018 118092 3 47.22 47.22 46
15.11.2018 115396 4 82.05 82.05 46
", header = TRUE)
I have a data frame like this:
distance exclude
1.1 F
1.5 F
3 F
2 F
1 F
5 T
3 F
63 F
32 F
21 F
15 F
1 T
I want to get the four boxplot stats for each segment of data in the distance column, where segments are separated by "T" in the exclude column; here "T" serves as the separator.
Can anyone help? Thanks so much!
First, let's create some fake data:
library(dplyr)
# Fake data
set.seed(49349)
dat = data.frame(distance=rnorm(500, 50, 10),
exclude=sample(c("T","F"), 500, replace=TRUE, prob=c(0.03,0.95)))
Now create a new group each time exclude == "T". Then, for each group, calculate whatever statistics you wish and return the results in a data frame:
box.stats = dat %>%
mutate(group = cumsum(exclude=="T")) %>%
group_by(group) %>%
do(data.frame(n=length(.$distance),
out_90 = sum(.$distance > quantile(.$distance, 0.9)),
out_10 = sum(.$distance < quantile(.$distance, 0.1)),
MEAN = round(mean(.$distance),2),
SD = round(sd(.$distance),2),
out_2SD_high = sum(.$distance > mean(.$distance) + 2*sd(.$distance)),
round(t(quantile(.$distance, probs=c(0,0.1,0.25,0.5,0.75,0.9,1))),2)))
names(box.stats) = gsub("X(.*)\\.$", "p\\1", names(box.stats))
box.stats
group n out_90 out_10 MEAN SD out_2SD_high p0 p10 p25 p50 p75 p90 p100
1 0 15 2 2 46.21 8.78 0 28.66 36.03 41.88 46.04 52.33 56.30 61.98
2 1 36 4 4 50.03 10.01 0 21.71 38.78 44.63 51.13 56.66 61.58 67.84
3 2 80 8 8 50.36 9.00 1 20.30 38.10 45.95 51.28 56.51 61.74 70.44
4 3 9 1 1 55.62 8.58 0 42.11 47.10 49.19 54.54 63.63 65.84 67.88
5 4 16 2 2 47.70 7.79 0 29.03 39.89 43.60 49.26 52.92 56.97 58.02
6 5 66 7 7 49.86 9.93 2 24.84 36.00 45.05 50.51 55.65 61.41 75.27
7 6 44 5 5 50.35 10.39 1 31.72 36.36 43.49 50.95 55.78 64.88 73.64
8 7 80 8 8 49.18 9.24 1 27.62 37.86 42.06 50.34 56.60 59.66 72.13
9 8 31 3 3 52.56 11.18 0 25.78 39.94 44.10 51.32 62.02 66.35 70.40
10 9 60 6 6 50.31 9.82 1 25.43 37.44 44.53 50.31 56.78 62.36 71.77
11 10 33 4 4 49.99 9.78 2 32.74 38.72 42.56 49.60 55.75 62.86 72.20
12 11 30 3 3 48.26 11.47 1 30.03 37.68 40.24 45.65 55.42 60.18 79.36
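On newer dplyr versions, where do() is superseded, the same grouping idea can be written with summarise(). A sketch computing a subset of the statistics above, using the same dat:
library(dplyr)
box.stats2 <- dat %>%
  mutate(group = cumsum(exclude == "T")) %>%
  group_by(group) %>%
  summarise(n = n(),
            MEAN = round(mean(distance), 2),
            SD = round(sd(distance), 2),
            p25 = quantile(distance, 0.25),
            p50 = median(distance),
            p75 = quantile(distance, 0.75))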
head(MYK)
X Analyte Subject Cohort DayNominal HourNominal Concentration uniqueID FS EF VTI deltaFS deltaEF deltaVTI HR
2 MYK-461 005-010 1 1 0.25 31.00 005-0100.25 31.82 64.86 0.00 3 -1 -100 58
3 MYK-461 005-010 1 1 0.50 31.80 005-0100.5 NA NA NA NA NA NA NA
4 MYK-461 005-010 1 1 1.00 9.69 005-0101 26.13 69.11 0.00 -15 6 -100 55
5 MYK-461 005-010 1 1 1.50 8.01 005-0101.5 NA NA NA NA NA NA NA
6 MYK-461 005-010 1 1 2.00 5.25 005-0102 NA NA NA NA NA NA NA
7 MYK-461 005-010 1 1 3.00 3.26 005-0103 29.89 60.99 23.49 -3 -7 9 55
105 MYK-461 005-033 2 1 0.25 3.4 005-0330.25 30.18 68.59 23.22 1 0 16 47
106 MYK-461 005-033 2 1 0.50 12.4 005-0330.5 NA NA NA NA NA NA NA
107 MYK-461 005-033 2 1 0.75 27.1 005-0330.75 NA NA NA NA NA NA NA
108 MYK-461 005-033 2 1 1.00 23.5 005-0331 32.12 69.60 21.06 7 2 5 43
109 MYK-461 005-033 2 1 1.50 16.8 005-0331.5 NA NA NA NA NA NA NA
110 MYK-461 005-033 2 1 2.00 15.8 005-0332 NA NA NA NA NA NA NA
organize = function(x, y) {
g1 = subset(x, Cohort == y)
g1 = aggregate(x[,'Concentration'], by=list(x[,'HourNominal']), FUN=mean)
g1 = setNames(g1, c('HourNominal', 'Concentration'))
g2 = aggregate(x[,'Concentration'], by=list(x[,'HourNominal']), FUN=sd)
g2 = setNames(g2, c('HourNominal', 'SD'))
g1[,'SD'] = g2$SD
g1$top = g1$Concentration + g1$SD
g1$bottom = g1$Concentration - g1$SD
return(g1)
}
I have a data frame here, along with some code to subset it based on a certain Cohort and to aggregate the Concentration by Hour. However, all of the resulting data frames look the same.
CA1 = organize(MYK, 1)
CA2 = organize(MYK, 2)
Yet whenever I use these two commands, the two datasets are identical.
I want a dataset that looks like
HourNominal Concentration SD top bottom
1 0.25 27.287500 25.112204 52.399704 2.1752958
2 0.50 41.989722 32.856013 74.845735 9.1337094
3 0.75 49.866667 22.485254 72.351921 27.3814122
4 1.00 107.168889 104.612098 211.780987 2.5567908
5 1.50 191.766389 264.375466 456.141855 -72.6090774
6 1.75 319.233333 290.685423 609.918757 28.5479100
7 2.00 226.785278 272.983234 499.768512 -46.1979560
8 2.25 341.145833 301.555769 642.701602 39.5900645
9 2.50 341.145833 319.099679 660.245512 22.0461542
10 3.00 195.303333 276.530533 471.833866 -81.2271993
11 4.00 107.913889 140.251991 248.165880 -32.3381024
12 6.00 50.174167 64.700785 114.874952 -14.5266184
13 8.00 38.132639 47.099796 85.232435 -8.9671572
14 12.00 31.404444 39.667850 71.072294 -8.2634051
15 24.00 33.488583 41.267392 74.755975 -7.7788087
16 48.00 29.304833 38.233776 67.538609 -8.9289422
17 72.00 7.322792 6.548898 13.871690 0.7738932
18 96.00 7.002833 6.350251 13.353085 0.6525821
19 144.00 6.463875 5.612630 12.076505 0.8512452
20 216.00 5.007792 4.808156 9.815948 0.1996353
21 312.00 3.964727 4.351626 8.316353 -0.3868988
22 480.00 2.452857 3.220947 5.673804 -0.7680897
23 648.00 1.826625 2.569129 4.395754 -0.7425044
The problem is that even when I try to separate the values by Cohort, the two data frames have the same content. They should not be identical.
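A likely cause, offered as a guess since no answer is attached to this question: after subsetting into g1, the aggregate() calls still reference x, so the Cohort filter is never actually used. A sketch of the corrected function:
organize <- function(x, y) {
  g1 <- subset(x, Cohort == y)
  # aggregate the subset (g1), not the full data frame (x);
  # aggregating x is what made CA1 and CA2 come out identical
  m <- aggregate(g1[, 'Concentration'], by = list(g1[, 'HourNominal']), FUN = mean)
  m <- setNames(m, c('HourNominal', 'Concentration'))
  s <- aggregate(g1[, 'Concentration'], by = list(g1[, 'HourNominal']), FUN = sd)
  s <- setNames(s, c('HourNominal', 'SD'))
  m[, 'SD'] <- s$SD
  m$top <- m$Concentration + m$SD
  m$bottom <- m$Concentration - m$SD
  m
}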
I have the following data.table:
Month Day Lat Long Temperature
1: 10 01 80.0 180 -6.383330333333309
2: 10 01 77.5 180 -6.193327999999976
3: 10 01 75.0 180 -6.263328333333312
4: 10 01 72.5 180 -5.759997333333306
5: 10 01 70.0 180 -4.838330999999976
---
117020: 12 31 32.5 310 11.840003833333355
117021: 12 31 30.0 310 13.065001833333357
117022: 12 31 27.5 310 14.685003333333356
117023: 12 31 25.0 310 15.946669666666690
117024: 12 31 22.5 310 16.578336333333358
For every location (given by Lat and Long), I have a temperature for each day from 1 October to 31 December.
There are 1,272 locations consisting of each pairwise combination of Lat:
Lat
1 80.0
2 77.5
3 75.0
4 72.5
5 70.0
--------
21 30.0
22 27.5
23 25.0
24 22.5
and Long:
Long
1 180.0
2 182.5
3 185.0
4 187.5
5 190.0
---------
49 300.0
50 302.5
51 305.0
52 307.5
53 310.0
I'm trying to create a data.table that consists of 1,272 rows (one per location) and 92 columns (one per day). Each element of that data.table will then contain the temperature at that location on that day.
Any advice about how to accomplish that goal without using a for loop?
Here we use ChickWeight as the data, where "Chick-Diet" plays the role of your "Lat-Long" and "Time" plays the role of your date:
dcast.data.table(data.table(ChickWeight), Chick + Diet ~ Time)
Produces:
Chick Diet 0 2 4 6 8 10 12 14 16 18 20 21
1: 18 1 1 1 NA NA NA NA NA NA NA NA NA NA
2: 16 1 1 1 1 1 1 1 1 NA NA NA NA NA
3: 15 1 1 1 1 1 1 1 1 1 NA NA NA NA
4: 13 1 1 1 1 1 1 1 1 1 1 1 1 1
5: ... 46 rows omitted
You will likely need a formula like Lat + Long ~ Month + Day for your data.
In the future, please make your question reproducible as I did here by using a built-in data set.
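Translated to the original data, and assuming it is already a data.table named DT with the columns shown in the question, that might look roughly like:
library(data.table)
# one row per location, one column per Month/Day combination
wide <- dcast(DT, Lat + Long ~ Month + Day, value.var = "Temperature")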
First create a date value using the lubridate package (I assumed year = 2014, adjust as necessary):
library(lubridate)
df$datetext <- paste(df$Month,df$Day,"2014",sep="-")
df$date <- mdy(df$datetext)
Then one option is to use the tidyr package to spread the columns:
library(tidyr)
spread(df[,-c(1:2,6)],date,Temperature)
Lat Long 2014-10-01 2014-12-31
1 22.5 310 NA 16.57834
2 25.0 310 NA 15.94667
3 27.5 310 NA 14.68500
4 30.0 310 NA 13.06500
5 32.5 310 NA 11.84000
6 70.0 180 -4.838331 NA
7 72.5 180 -5.759997 NA
8 75.0 180 -6.263328 NA
9 77.5 180 -6.193328 NA
10 80.0 180 -6.383330 NA
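As an aside, spread() has since been superseded in tidyr; on current versions the equivalent reshape of the df built above would be roughly:
library(dplyr)
library(tidyr)
df %>%
  select(Lat, Long, date, Temperature) %>%
  pivot_wider(names_from = date, values_from = Temperature)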