Taking inverse of certain rows in dataframe


I have a dataframe of market trades and need to multiply only the put returns by -1. I have the code for that, but can't figure out how to assign it back without affecting the calls.
Input df:
Date Type Stock_Open Stock_Close Stock_ROI
0 2016-04-27 Call 5.33 4.80 -0.099437
1 2016-06-03 Put 4.80 4.52 -0.058333
2 2016-06-30 Call 4.52 5.29 0.170354
3 2016-07-21 Put 5.29 4.84 -0.085066
4 2016-08-08 Call 4.84 5.35 0.105372
5 2016-08-25 Put 5.35 4.65 -0.130841
6 2016-09-21 Call 4.65 5.07 0.090323
7 2016-10-13 Put 5.07 4.12 -0.187377
8 2016-11-04 Call 4.12 4.79 0.162621
Code:
flipped_puts = trades_df[trades_df['Type']=='Put']['Stock_ROI']*-1
trades_df['Stock_ROI'] = flipped_puts
Output of flipped puts:
1 0.058333
3 0.085066
5 0.130841
7 0.187377
Output of original DF:
Date Type Stock_Open Stock_Close Stock_ROI
0 2016-04-27 Call 5.33 4.80 NaN
1 2016-06-03 Put 4.80 4.52 0.058333
2 2016-06-30 Call 4.52 5.29 NaN
3 2016-07-21 Put 5.29 4.84 0.085066
4 2016-08-08 Call 4.84 5.35 NaN
5 2016-08-25 Put 5.35 4.65 0.130841
6 2016-09-21 Call 4.65 5.07 NaN
7 2016-10-13 Put 5.07 4.12 0.187377
8 2016-11-04 Call 4.12 4.79 NaN

Try
trades_df.loc[trades_df.Type.eq('Put'), 'Stock_ROI'] *= -1
Or
trades_df.update(trades_df.query('Type == "Put"').Stock_ROI.mul(-1))
Both modify trades_df in place: the Put rows change sign and the Call rows keep their original Stock_ROI.
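For context, here is a minimal sketch of why the assignment in the question produced NaN. It assumes a small stand-in frame built from the first four rows of the posted data; the names broken, fixed and is_put are only illustrative and do not come from the post. Assigning a filtered Series to a whole column aligns on the index, so every row missing from the Series becomes NaN, whereas writing through a boolean mask with .loc touches only the Put rows.

```python
import pandas as pd

# Stand-in for the question's trades_df (first four rows of the posted data).
trades_df = pd.DataFrame({
    "Type": ["Call", "Put", "Call", "Put"],
    "Stock_ROI": [-0.099437, -0.058333, 0.170354, -0.085066],
})

# The attempt from the question: the filtered Series only carries the index
# labels of the Put rows (1 and 3), so assigning it to the whole column
# aligns on the index and leaves every Call row as NaN.
flipped_puts = trades_df.loc[trades_df["Type"] == "Put", "Stock_ROI"] * -1
broken = trades_df.copy()
broken["Stock_ROI"] = flipped_puts

# The fix: write only to the rows selected by the boolean mask.
is_put = trades_df["Type"].eq("Put")
fixed = trades_df.copy()
fixed.loc[is_put, "Stock_ROI"] *= -1

print(broken)
print(fixed)
```

The copies are there only so both variants can be compared side by side; on real data the single .loc line is enough.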

We can use data.table from R. Convert the 'data.frame' to 'data.table' (setDT(trades_df)), specify the logical condition in 'i', multiply the 'Stock_ROI' with -1 and assign (:=) it to a new column. The other values will be filled by NA.
library(data.table)
setDT(trades_df)[Type == 'Put', Stock_ROIN := Stock_ROI * -1][]
If we want to update the same column instead
setDT(trades_df)[Type == 'Put', Stock_ROI := Stock_ROI * -1]
trades_df
# Date Type Stock_Open Stock_Close Stock_ROI
#1: 2016-04-27 Call 5.33 4.80 -0.099437
#2: 2016-06-03 Put 4.80 4.52 0.058333
#3: 2016-06-30 Call 4.52 5.29 0.170354
#4: 2016-07-21 Put 5.29 4.84 0.085066
#5: 2016-08-08 Call 4.84 5.35 0.105372
#6: 2016-08-25 Put 5.35 4.65 0.130841
#7: 2016-09-21 Call 4.65 5.07 0.090323
#8: 2016-10-13 Put 5.07 4.12 0.187377
#9: 2016-11-04 Call 4.12 4.79 0.162621
If we also want to change the other values to NA
setDT(trades_df)[Type == 'Put', Stock_ROI := Stock_ROI * -1
][Type != 'Put', Stock_ROI := NA]
trades_df
# Date Type Stock_Open Stock_Close Stock_ROI
#1: 2016-04-27 Call 5.33 4.80 NA
#2: 2016-06-03 Put 4.80 4.52 0.058333
#3: 2016-06-30 Call 4.52 5.29 NA
#4: 2016-07-21 Put 5.29 4.84 0.085066
#5: 2016-08-08 Call 4.84 5.35 NA
#6: 2016-08-25 Put 5.35 4.65 0.130841
#7: 2016-09-21 Call 4.65 5.07 NA
#8: 2016-10-13 Put 5.07 4.12 0.187377
#9: 2016-11-04 Call 4.12 4.79 NA

Related

How can I change date formats in a column of a dataframe?

I'd like to change the format of an entire column in a dataframe in R.
I saw answers for this in Python, and I've tried all sorts of code, but nothing has worked. I finally found a way to check the type of each column of my dataframe, and the date column comes up as character. I'd like to change that to date.
...also, on another note, I've asked a few questions and people always say to put sample data in here, but I don't know how to copy my dataframe from RStudio cloud...? But I'll attempt to show some data and my codes.
data frame:
Id ActivityDate TotalSteps TotalDistance TrackerDistance LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
1 1503960366 4/12/2016 13162 8.50 8.50 0 1.88 0.55 6.06 0.00 25 13 328 728 1985
2 1503960366 4/13/2016 10735 6.97 6.97 0 1.57 0.69 4.71 0.00 21 19 217 776 1797
3 1503960366 4/14/2016 10460 6.74 6.74 0 2.44 0.40 3.91 0.00 30 11 181 1218 1776
4 1503960366 4/15/2016 9762 6.28 6.28 0 2.14 1.26 2.83 0.00 29 34 209 726 1745
5 1503960366 4/16/2016 12669 8.16 8.16 0 2.71 0.41 5.04 0.00 36 10 221 773 1863
6 1503960366 4/17/2016 9705 6.48 6.48 0 3.19 0.78 2.51 0.00 38 20 164 539 1728
7 1503960366 4/18/2016 13019 8.59 8.59 0 3.25 0.64 4.71 0.00 42 16 233 1149 1921
8 1503960366 4/19/2016 15506 9.88 9.88 0 3.53 1.32 5.03 0.00 50 31 264 775 2035
9 1503960366 4/20/2016 10544 6.68 6.68 0 1.96 0.48 4.24 0.00 28 12 205 818 1786
10 1503960366 4/21/2016 9819 6.34 6.34 0 1.34 0.35 4.65 0.00 19 8 211 838 1775
11 1503960366 4/22/2016 12764 8.13 8.13 0 4.76 1.12 2.24 0.00 66 27 130 1217 1827
12 1503960366 4/23/2016 14371 9.04 9.04 0 2.81 0.87 5.36 0.00 41 21 262 732 1949
13 1503960366 4/24/2016 10039 6.41 6.41 0 2.92 0.21 3.28 0.00 39 5 238 709 1788
14 1503960366 4/25/2016 15355 9.80 9.80 0 5.29 0.57 3.94 0.00 73 14 216 814 2013
15 1503960366 4/26/2016 13755 8.79 8.79 0
I don't know why it pastes like that. Anyways...
daily_activity <- read_csv("dailyActivity_merged.csv")
I then ran:
str(daily_activity)
To check what type of data each column holds. I see my activity dates are chr type, which I looked up and saw means character. Is this correct? I opened the same dataset in Google Sheets to double-check: there are 600 rows in each column, and the dates came back with a ' in front of the numbers, as if they had been entered as distance measurements like the columns that follow the date column. That is obviously wrong, since this is a date, not a distance, so now I'd like to convert the entire column to dates.
I've tried:
as_date(daily_activity, ActivityDate)
mdy(ActivityDate)
help("mdy")
help("print")
help("str")
str(daily_activity) %>% as.date(ActivityDate,"mm/dd/yyyy")
I'm not sure what to do. I can't find any site or reference for this, and I've been googling for answers for 2 days now.
The second part of my quest is to use the newly created date column, together with the existing Id column, to merge two dataframes. Is that possible? Both dataframes have the date column that needs converting, and both have the Id column, so I was thinking of a join statement. Does that exist in R? I want to join both dataframes in their entirety by matching on those two columns, Id and date. The other columns in the two dataframes are different, and I need to work with information from both together.
Has any of this made sense? I hope so. I asked the questions as if I were talking to someone, as this site recommends. Thanks in advance for any advice, help or information.

How to Drop X in Column names after Merge

I've merged two data frames by common row names, and the merge worked fine, but I am getting an X before each column name.
How can I remove the X from each column header?
z<- merge(p, y, by='ID')
head(z)
ID x y V1 X198101 X198102 X198103 X198104 X198105 X198106
1 410320 -122.5417 37.75 NA 119.45 33.15 104.23 5.61 4.85 0
2 410321 -122.5000 37.75 NA 129.49 37.76 114.94 5.28 5.24 0
3 410322 -122.4583 37.75 NA 163.68 42.80 131.22 7.25 6.94 0
4 410323 -122.4167 37.75 NA 141.14 32.26 110.45 7.77 4.62 0
5 410324 -122.3750 37.75 NA 130.87 25.87 102.15 8.38 4.13 0
6 410325 -122.3333 37.75 NA 129.03 25.21 102.37 9.42 4.35 0
Thanks!

It is better for column names not to start with numbers. By default, make.names adds the X prefix when a name starts with a digit. To remove it, one option is sub
names(z) <- sub("^X", "", names(z))
z
# ID x y V1 198101 198102 198103 198104 198105 198106
#1 410320 -122.5417 37.75 NA 119.45 33.15 104.23 5.61 4.85 0
#2 410321 -122.5000 37.75 NA 129.49 37.76 114.94 5.28 5.24 0
#3 410322 -122.4583 37.75 NA 163.68 42.80 131.22 7.25 6.94 0
#4 410323 -122.4167 37.75 NA 141.14 32.26 110.45 7.77 4.62 0
#5 410324 -122.3750 37.75 NA 130.87 25.87 102.15 8.38 4.13 0
#6 410325 -122.3333 37.75 NA 129.03 25.21 102.37 9.42 4.35 0
If we apply make.names
make.names(names(z))
#[1] "ID" "x" "y" "V1" "X198101" "X198102"
#[7] "X198103" "X198104" "X198105" "X198106"
The 'X' prefix is returned. So, in general, it is safer to have column names with a character prefix instead of just numbers. Also, if we want to extract, say, the '198104' column, we need backticks
z$198104
#Error: unexpected numeric constant in "z$198104"
z$`198104`
#[1] 5.61 5.28 7.25 7.77 8.38 9.42

This isn't actually caused by merge, it must be something earlier in your code. If it happens when you read in the data, try the check.names=FALSE option.
a <- data.frame(a=1:3, b=4:6)
b <- data.frame(a=1:3, c=7:9)
names(b)[2] <- 2485
merge(a,b)
## a b 2485
## 1 1 4 7
## 2 2 5 8
## 3 3 6 9

Error in producing the output

I have a problem with my code and can't trace the error. I have coordinate data, coor (a 40 by 2 matrix), shown below, and rainfall data (a 14610 by 40 matrix).
No Longitude Latitude
1 100.69 6.34
2 100.77 6.24
3 100.39 6.11
4 100.43 5.53
5 100.39 5.38
6 101.00 5.71
7 101.06 5.30
8 100.80 4.98
9 101.17 4.48
10 102.26 6.11
11 102.22 5.79
12 102.28 5.31
13 102.02 5.38
14 101.97 4.88
15 102.95 5.53
16 103.13 5.32
17 103.06 4.94
18 103.42 4.76
19 103.42 4.23
20 102.38 4.24
21 101.94 4.23
22 103.04 3.92
23 103.36 3.56
24 102.66 3.03
25 103.19 2.89
26 101.35 3.70
27 101.41 3.37
28 101.75 3.16
29 101.39 2.93
30 102.07 3.09
31 102.51 2.72
32 102.26 2.76
33 101.96 2.74
34 102.19 2.36
35 102.49 2.29
36 103.02 2.38
37 103.74 2.26
38 103.97 1.85
39 103.72 1.76
40 103.75 1.47
rainfall= 14610 by 40 matrix;
coor= 40 by 2 matrix
my_prog=function(rainrain,coordinat,misss,distance)
{
rain3<-rainrain # target station i**
# neighboring stations for target station i
a=coordinat # target station i**
diss=as.matrix(distHaversine(a,coor,r=6371))
mmdis=sort(diss,decreasing=F,index.return=T)
mdis=as.matrix(mmdis$x)
mdis1=as.matrix(mmdis$ix)
dist=cbind(mdis,mdis1)
# NA creation
# create missing values in rainfall data
set.seed(100)
b=sample(1:nrow(rain3),(misss*nrow(rain3)),replace=F)
k=replace(rain3,b,NA)
# pick i closest stations
neig=mdis1[distance] # neighbouring selection distance
# target (with NA) and their neighbors
rainB=rainfal00[,neig]
rainA0=rainB[,2:ncol(rainB)]
rainA<-as.matrix(cbind(k,rainA0))
rain2=na.omit(rainA)
x=as.matrix(rain2[,1]) # used to calculate the correlation
n1=ncol(rainA)-1
#1) normal ratio(nr)
jum=as.matrix(apply(rain2,2,mean))
nr0=(jum[1]/jum)
nr=as.matrix(nr0[2:nrow(nr0),])
m01=as.matrix(rainA[is.na(k),])
m1=m01[,2:ncol(m01)]
out1=as.matrix(sapply(seq_len(nrow(m1)),
function(i) sum(nr*m1[i,],na.rm=T)/n1))
print(out1)
}
impute=my_prog(rainrain=rainfall[,1],coordinat=coor[1,],misss=0.05,distance=mdis<200)
I ran this code and got the following output:
Error in my_prog(rainrain = rainfal00[, 1], misss = 0.05, coordinat = coor[1, :
object 'mdis' not found
I have checked the program but cannot trace the problem. I would really appreciate it if someone could help me.

'origin' must be supplied & unambiguous format

I am a beginner in R. I have the following problem: I want to load a CSV file into R and then convert it into an xts object. However, the conversion gives an error. First, a small snippet of the data:
a=read.csv('/Users/..../Desktop/SYNEKTIK.csv',h=T)
head(a)
Name Date Open High Low Close Volume
1 SYNEKTIK 20110809 5.76 8.23 5.76 8.23 28062
2 SYNEKTIK 20110810 9.78 9.78 8.10 8.13 9882
3 SYNEKTIK 20110811 9.00 9.00 9.00 9.00 2978
4 SYNEKTIK 20110812 9.70 9.70 8.90 9.60 5748
5 SYNEKTIK 20110816 9.70 11.00 9.70 11.00 23100
6 SYNEKTIK 20110818 10.90 11.00 10.90 10.90 319
The following does not work:
w=xts(a[,-1],order.by=as.POSIXct(a[,1]))
As it produces the following error:
Error in as.POSIXlt.character(as.character(x), ...) :
character string is not in a standard unambiguous format
Another try that did not work:
a=a[,-1]
head(a)
Date Open High Low Close Volume
1 20110809 5.76 8.23 5.76 8.23 28062
2 20110810 9.78 9.78 8.10 8.13 9882
3 20110811 9.00 9.00 9.00 9.00 2978
4 20110812 9.70 9.70 8.90 9.60 5748
5 20110816 9.70 11.00 9.70 11.00 23100
6 20110818 10.90 11.00 10.90 10.90 319
w=xts(a[,-1],order.by=as.POSIXct(a[,1]))
Error in as.POSIXct.numeric(a[, 1]) : 'origin' must be supplied
Finally, when I saved the date in the format yyyy-mm-dd, everything turned out right and I could convert the data into an xts object. Why?

Maybe something like this will help:
w <- xts(a[,c(-1,-2)],order.by=as.Date(as.character(a[,2]),"%Y%m%d"))

Transposing and Add column in R in Azure ML Studio

I obtain the following data set in Azure. Each row is a parameter that is relevant to a forecasting model.
I am relatively new to R. I tried the following code but it does not give me the expected output. After I transpose the data set, I want to add an additional column "Month-Year".
Can someone help me? Thanks.
Data set
features V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
A 28.21 42.03 48.56 46.85 46.03 54.6 63.87 50 53.34 43.47 34.66 27.48
B 1333 1348.64 1364.28 1379.92 1395.56 1411.2 1426.84 1442.48 1458.11 1473.75 1489.39 1505.03
C 10.05 5.46 4.82 5.27 5.07 4.07 9.53 1.95 6.95 6.54 5.91 0.56
D 18.22 18.41 14.31 30.28 18.16 15.52 12.52 13.14 15.05 8.89 12.51 25.25
R code
# Map 1-based optional input ports to variables
dataset <- maml.mapInputPort(1)
a <- c("A", "B", "C", "D")
data.set <- cbind(a, dataset)
names(data.set)[1] <- c("features")
# first remember the names
n <- dataset$features
# transpose all but the first column (name)
df.aree <- as.data.frame(t(data.set[,-1]))
names(data.set)[1] <- n
df.aree$myfactor <- factor(row.names(df.aree))
maml.mapOutputPort("df.aree")
Expected result
Month-Year A B C D
01-01-15 28.21 1333 10.05 18.22
01-02-15 42.03 1348.64 5.46 18.41
01-03-15 48.56 1364.28 4.82 14.31
01-04-15 46.85 1379.92 5.27 30.28
01-05-15 46.03 1395.56 5.07 18.16
01-06-15 54.6 1411.2 4.07 15.52
01-07-15 63.87 1426.84 9.53 12.52
01-08-15 50 1442.48 1.95 13.14
01-09-15 53.34 1458.11 6.95 15.05
01-10-15 43.47 1473.75 6.54 8.89
01-11-15 34.66 1489.39 5.91 12.51
01-12-15 27.48 1505.03 0.56 25.25

Create "MonYear" using seq with from and to dates.
MonYear <- format(seq(as.Date('2015-01-01'), as.Date('2015-12-01'),
by = 'month'), '%d-%m-%y')
Transpose the numeric columns of the original dataset, i.e. everything except the first column (the output will be a matrix). We create a data.frame by combining 'MonYear' and the matrix output.
df2 <- data.frame(MonYear,t(df1[-1]))
Change the column names and row names accordingly
colnames(df2)[-1] <- LETTERS[1:4]
row.names(df2) <- NULL
df2
MonYear A B C D
1 01-01-15 28.21 1333.00 10.05 18.22
2 01-02-15 42.03 1348.64 5.46 18.41
3 01-03-15 48.56 1364.28 4.82 14.31
4 01-04-15 46.85 1379.92 5.27 30.28
5 01-05-15 46.03 1395.56 5.07 18.16
6 01-06-15 54.60 1411.20 4.07 15.52
7 01-07-15 63.87 1426.84 9.53 12.52
8 01-08-15 50.00 1442.48 1.95 13.14
9 01-09-15 53.34 1458.11 6.95 15.05
10 01-10-15 43.47 1473.75 6.54 8.89
11 01-11-15 34.66 1489.39 5.91 12.51
12 01-12-15 27.48 1505.03 0.56 25.25
