I don't know if the subject has already been find but here my problem :
I have a dataset from behaviors personality items scored from 1 to 8 and I would like to convert each scored according a range (e.g. 1-2 = Rare ; 3-5 = Occasionally ; 6-8 = Frequent).
I succeed to create new columns and put labels in it but I don't understand why I have same repetition in others columns :
Beh_data[,c(2,3,4,32,33,34)
enter image description here
You can see that columns with "_class" had the same outputs, and there are mistakes about correct match between labels and scores (e.g. row4 -- 8 put as Occasionally)
Here the function code :
l = unlist(names(Beh_data[,2:28]))
for (j in 1:length(l)) {
cl[j] = list(paste(l[j],"class",sep="_"))
for (k in 1:length(cl)) {
Beh_data[,cl[[k]] ] <- cl[[k]]
for(i in 1:nrow(Beh_data)){
Beh_data[,cl[[k]] ][i] <-ifelse(Beh_data[,l[j] ][i]<3, "Rare", Beh_data[,cl[[k]] ][i])
Beh_data[,cl[[k]] ][i] <-ifelse(Beh_data[,l[j] ][i]>2 & Beh_data[,l[j] ][i]<6, "Occasionally", Beh_data[,cl[[k] ] ][i])
Beh_data[,cl[[k]] ][i] <-ifelse(Beh_data[,l[j] ][i]>5, "Frequent", Beh_data[,cl[[k]] ][i])
}
}
}
I tried to see if it's could from a wrong annotation as cl[[k]] ] or something like this but it steels doesn't work
Do you have any ideas please ?
If you're open to a dplyr solution, I think its across and case_when functions are helpful here. It should also run faster since it's vectorized. This will create new columns like aff_sum_class which use the categorization you've specified.
library(dplyr)
Beh_data |>
mutate(across(aff_sum:qui_sum,
~case_when(. >= 6 ~ "Frequent",
. >= 3 ~ "Occasionally",
TRUE ~ "Rare"),
.names = "{.col}_class"))
I have a dataframe with two columns, and I want to remove all rows in which one of the values in each row is either smaller than 0 or bigger than a specified number (for the sake of argument let's call this 2000).
This is the dataframe
structure(list(xx = c(134.697838289433, 222.004361198059, 131.230956160172,
206.658871436917, 111.25078650042, 241.965831417648, 171.46912254679,
116.860666678254, 196.894985820028, 135.309699618638, 133.082437475133,
185.509376072318, 718.998297748551, 745.902984215293, 752.655615982603,
633.199684348903, 764.983924278636, 694.856525559398, 773.56532078895,
757.32358575657, 709.924023536199, 658.863564702233, 733.076690816291,
745.9306541374, 788.134444412421, 759.445624288787, 796.989170170713,
632.952543475636, 746.103571612919, 715.296116988119, 766.899107551248,
628.268453830605, 658.574104878488, 689.916530654021, 820.841422812349,
709.097957368612, 793.109262845978, 716.713801941779, 726.83260343463,
746.547080776193, 759.644057119419, 757.41275593749, 723.539527360327,
839.816318612061, 795.655016954661, 766.245386324182, 756.300015395758,
808.255074043333, 745.915083305187, 685.465492956583, 694.567959198318,
786.919467838804, 699.521900871042, 749.041223560884, 700.079697765533,
753.805501259023, 745.080253997501, 846.982894686656, 775.66384433188,
809.39649823454, 841.009469183585, 790.987061753069, 792.441925234251,
1377.97739642236, 1353.19738061511, 1259.94435540633, 1276.25060187203,
1331.26106031956, 1227.68481147557, 1345.95561236514, 1309.51489973952,
1285.62680259649, 1329.46388049714, 1256.00394500077, 1294.0505313591,
1349.09440181876, 1294.72661682462, 1339.38577920408, 1277.114896541,
1267.54884404031, 1291.32793111573, 1254.85565551553, 1298.78499697743,
1283.89664572036, 1273.92831816666, 1310.221891323, 1327.89682404014,
1310.81394400863, 595.342571560588, 689.892254230306, 562.390766853428,
736.319251501976, 609.577261412134, 641.591997384705, 682.957658696869,
580.320759093636, 560.64984978551, 643.487033739876, 688.457314818318,
631.156743281308, 659.535909106305), yy = c(1169.70954243065,
1259.830208937, 1172.21661417439, 1097.62724268622, 1198.15024522658,
1231.90665701131, 1211.36196331211, 1152.4207367321, 1287.57553021171,
1120.61366993258, 1234.70366243878, 1258.47454705197, 893.983957068268,
994.99854601335, 916.330965835536, 947.536265806389, 950.345051732045,
934.313361799171, 1018.76942964176, 918.182358835366, 1005.51128858608,
967.577307930044, 997.239384198691, 995.866808447868, 962.292293255127,
864.624084608006, 895.091604672023, 906.22162647536, 1024.45206885923,
908.693026118345, 923.625774785301, 931.801569764776, 1007.88553380827,
848.55309782664, 927.608364899483, 1024.60765786828, 1085.64295260059,
1057.90632135992, 1195.30607038065, 1151.39888340311, 1168.2831257626,
1137.15375447446, 1145.42393212912, 1108.89072769468, 1075.15451622384,
1129.91711324634, 1191.94330388541, 1132.41649984784, 1210.89342724886,
1100.60339252755, 1083.5987922884, 1056.69487941162, 1150.2707936581,
1055.75678264632, 1055.53323667429, 1049.79655119467, 1166.86598024805,
1141.82593378866, 1066.37755267981, 1160.55793904653, 1162.65728735716,
1060.29360609309, 1107.40480300404, 1825.01445883899, 1802.95011068891,
1692.84948509132, 1675.97166713074, 1758.10341887143, 1788.48414279738,
1680.15824054313, 1756.01930833023, 1706.98458587119, 1770.57687329296,
1692.21991398915, 1835.60585163662, 1790.6487914694, 1787.52076839767,
1704.25313427813, 1735.96312434652, 1813.02044772293, 1847.21159474717,
1725.63580525853, 1841.32016678, 1713.80845602987, 1770.39756152819,
1747.72988313376, 1778.13110060636, 1786.3871288087, 6.01666671271317,
19.2497357431764, 9.6964112500295, -3.23929433528044, 89.4863211231715,
86.0082947221296, 42.7982120490919, 2.19886414532234, 12.8780844043502,
30.694893442471, 7.58386594976601, 83.8385161493349, 36.4551491976192
)), row.names = 100:200, class = "data.frame")
First I create a function to eliminate points which satisfy the conditions.
routliers<-function(x){
if(x>2000|x<0){
rm(x)
}
}
Then I use the apply function across the rows to eliminate the points using the above function (the above dput() is named cds).
cds<-data.frame(apply(cds,1,routliers))
But this eliminates all points
length(cds)
[1]0
Interestingly, if I replace the rm() function with print(), then I do print out the desired points when using the apply function, but I receive the error "arguments imply differing number of rows: 0, 2". Also, I am not sure when I use the apply() function that the specified function applies to both columns of data, as I am not seeing any data points in print() that satisfies the condition for ONLY the second column of points. The first column is x co-ordinates, and the second column is y co-ordinates. I think the error "arguments imply differing number of rows:0,2" suggests that only the first value in the row is being tested against the function.
How can I write code in which rows are eliminated if one or more of the data points satisfies my condition?
This is easy to do when the columns are separate vectors, (x<-x[!condition]) however I cannot add them together again easily so I prefer to do this on a dataframe of points.
Please check if this code works for you, with df being the data you shared:
#Code
new <- df[!rowSums(df < 0 | df>2000) > 0, ]
Or this:
#Code 2
new <- df[which(apply(df,1,function(x) sum(x<0 | x>2000))==0),]
Let's make your function return TRUE for an outlier and FALSE for not an outlier. And it can be vectorized:
is_outlier = function(x) {
x > 2000 | x < 0
}
Here's how we'd use this to drop rows with outliers in a single column:
cds[!is_outlier(cds$xx), ]
For two columns, we can combine the is_outlier results with & or |. I can't tell from your text whether you want to remove rows where xx AND yy are outliers, or remove rows where xx OR yy are outliers. So pick the appropriate version:
cds[!is_outlier(cds$xx) & !is_outlier(cds$yy), ]
cds[!is_outlier(cds$xx) | !is_outlier(cds$yy), ]
I'm trying to write an xlsx file from a list of dataframes that I created but I'm getting an error due to missing data (I couldn't download it). I just want to write the xlsx file besides having this lacking data. Any help is appreciated.
For replication of the problem:
library(quantmod)
name_of_symbols <- c("AKER","YECO","SNOA")
research_dates <- c("2018-11-19","2018-11-19","2018-11-14")
my_symbols_df <- lapply(name_of_symbols, function(x) tryCatch(getSymbols(x, auto.assign = FALSE),error = function(e) { }))
my_stocks_OHLCV <- list()
for (i in 1:3) {
trade_date <- paste(as.Date(research_dates[i]))
OHLCV_data <- my_symbols_df[[i]][trade_date]
my_stocks_OHLCV[[i]] <- data.frame(OHLCV_data)
}
And you can see the missing data down here in my_stocks_OHLCV[[2]] and the write.xlsx error I'm getting:
print(my_stocks_OHLCV)
[[1]]
AKER.Open AKER.High AKER.Low AKER.Close AKER.Volume AKER.Adjusted
2018-11-19 2.67 3.2 1.56 1.75 15385800 1.75
[[2]]
data frame with 0 columns and 0 rows
[[3]]
SNOA.Open SNOA.High SNOA.Low SNOA.Close SNOA.Volume SNOA.Adjusted
2018-11-14 1.1 1.14 1.01 1.1 107900 1.1
write.xlsx(my_stocks_OHLCV, "C:/Users/MICRO/Downloads/Datasets_stocks/dux_OHLCV.xlsx")
Error in (function (..., row.names = NULL, check.rows = FALSE,
check.names = TRUE,:arguments imply differing number of rows: 1, 0
How do I run write.xlsx even though I have this missing data?
The main question you need to ask is, what do you want instead?
As you are working with stock data, the best idea, is that if you don't have data for a stock, then remove it. Something like this should work,
my_stocks_OHLCV[lapply(my_stocks_OHLCV,nrow)>0]
If you want a row full of NA or 0
Then use the lapply function and for each element of the list, of length 0, replace with either NA's, vector of 0's (c(0,0,0,0,0,0)) etc...
Something like this,
condition <- !lapply(my_stocks_OHLCV,nrow)>0
my_stocks_OHLCV[condition] <- data.frame(rep(NA,6))
Here we define the condition variable, to be the elements in the list where you don't have any data. We can then replace those by NA or swap the NA for 0. However, I can't think of a reason to do this.
A variation on your question, and one you could handle inside your for loop, is to check if you have data, and if you don't, replace the values there, with NAs, and you could given it the correct headers, as you know which stock it relates to.
Hope this helps.
I am very new user for R and want to use R for back testing my Strategy. I try to combine some scripts found in web. However, it did not work according my idea. My problem is the transaction date cannot be generated according to my strategy design date.
library(quantmod)
library(lubridate)
stock1<-getSymbols("AAPL",src="yahoo",from="2016-01-01",auto.assign=F)
stock1<-na.locf(stock1)
stock1$EMA9<-EMA(Cl(stock1),n=9)
stock1$EMA19<-EMA(Cl(stock1),n=19)
stock1$EMACheck<-ifelse(stock1$EMA9>stock1$EMA19,1,0)
stock1$EMA_CrossOverUp<-ifelse(diff(stock1$EMACheck)==1,1,0)
stock1$EMA_CrossOverDown<-ifelse(diff(stock1$EMACheck)==-1,-1,0)
stock1<-stock1[index(stock1)>="2016-01-01",]
stock1_df<-data.frame(index(stock1),coredata(stock1))
colnames(stock1_df)<-c("Date","Open","High","Low","Close","Volume","Adj","EMA9","EMA19","EMACheck","EMACheck_up","EMACheck_down")
#To calculate the number of crossoverup transactions during the duration from 2016-01-01
sum(stock1_df$EMACheck_up==1 & index(stock1)>="2016-01-01",na.rm=T)
stock1_df$Date[stock1_df$EMACheck_up==1 & index(stock1)>="2016-01-01"]
sum(stock1_df$EMACheck_down==-1 & index(stock1)>="2016-01-01",na.rm=T)
stock1_df$Date[stock1_df$EMACheck_down==-1 & index(stock1)>="2016-01-01"]
#To generate the transcation according to the strategy
transaction_dates<-function(stock2,Buy,Sell)
{
Date_buy<-c()
Date_sell<-c()
hold<-F
stock2[["Hold"]]<-hold
for(i in 1:nrow(stock2)) {
if(hold == T) {
stock2[["Hold"]][i]<-T
if(stock2[[Sell]][i] == -1) {
#stock2[["Hold"]][i]<-T
hold<-F
}
} else {
if(stock2[[Buy]][i] == 1) {
hold<-T
stock2[["Hold"]][i]<-T
}
}
}
stock2[["Enter"]]<-c(0,ifelse(diff(stock2[["Hold"]])==1,1,0))
stock2[["Exit"]]<-c(ifelse(diff(stock2[["Hold"]])==-1,-1,0),0)
Buy_date <- stock2[["Date"]][stock2[["Enter"]] == 1]
Sell_date <- stock2[["Date"]][stock2[["Exit"]] == -1]
if (length(Sell_date)<length(Buy_date)){
#Sell_date[length(Sell_date)+1]<-tail(stock2[["Date"]],n=2)[1]
Buy_date<-Buy_date[1:length(Buy_date)-1]
}
return(list(DatesBuy=Buy_date,DatesSell=Sell_date))
}
#transaction dates generate:
stock1_df <- na.locf(stock1_df)
transactionDates<-transaction_dates(stock1_df,"EMACheck_up","EMACheck_down")
transactionDates
num_transaction1<-length(transactionDates[[1]])
Open_price<-function(df,x) {df[as.integer(rownames(df[df[["Date"]]==x,]))+1,][["Open"]]}
transactions_date<-function(df,x) {df[as.integer(rownames(df[df[["Date"]]==x,]))+1,][["Date"]]}
transactions_generate<-function(df,num_transaction)
{
price_buy<-sapply(1:num_transaction,function(x) {Open_price(df,transactionDates[[1]][x])})
price_sell<-sapply(1:num_transaction,function(x) {Open_price(df,transactionDates[[2]][x])})
Dates_buy<-as.Date(sapply(1:num_transaction,function(x) {transactions_date(df,transactionDates[[1]][x])}))
Dates_sell<-as.Date(sapply(1:num_transaction,function(x) {transactions_date(df,transactionDates[[2]][x])}))
transactions_df<-data.frame(DatesBuy=Dates_buy,DatesSell=Dates_sell,pricesBuy=price_buy,pricesSell=price_sell)
#transactions_df$return<-100*(transactions_df$pricesSell-transactions_df$pricesBuy)/transactions_df$pricesBuy
transactions_df$Stop_loss<-NA
return(transactions_df)
}
transaction_summary<-transactions_generate(stock1_df,num_transaction1)
transaction_summary$Return<-100*(transaction_summary$pricesSell-transaction_summary$pricesBuy)/transaction_summary$pricesBuy
transaction_summary
sum(transaction_summary$Return,na.rm=T)
Hi, I am very new user for R and want to use R for back testing my Strategy. I try to combine some scripts found in web. However, it did not work according my idea. My problem is the transaction date cannot be generated according to my strategy design date.
problem as this image
The code you have is to complicated for it's own good.
The issue lies in the fact that the functions Open_price and transactions_date look for use rownames to find a record number and then take the next one. But then instead of looking for the rownames again, it is used as an index. There it goes wrong.
If you look at the following result for the first date, it returns 40.
as.integer(rownames(stock1_df[stock1_df[["Date"]] == "2016-03-01", ]))
[1] 40
So the next record it would look for will be 41. But stock_df[41, ] is not the same as rowname 41. An issue with rownames is that if you filter / remove records from the data.frame the rownames don't change. To get the correct index number you should use which. If you look at the stock1_df, you can see that it returns 21 and we need record 22
which(stock1_df[["Date"]] == "2016-03-01")
[1] 21
I changed the Open_price and transactions_date functions to use the which function. This will now return the correct results.
Open_price <- function(df, x) {
df[which(df[["Date"]] == x) + 1, ][["Open"]]
}
transactions_date <- function(df, x) {
df[which(df[["Date"]] == x) + 1, ][["Date"]]
}
head(transaction_summary)
DatesBuy DatesSell pricesBuy pricesSell Stop_loss Return
1 2016-03-02 2016-04-25 100.51 105.00 NA 4.467215
2 2016-05-27 2016-06-20 99.44 96.00 NA -3.459374
3 2016-07-13 2016-09-12 97.41 102.65 NA 5.379322
4 2016-09-15 2016-11-02 113.86 111.40 NA -2.160547
5 2016-12-12 2017-06-13 113.29 147.16 NA 29.896728
6 2017-07-17 2017-09-19 148.82 159.51 NA 7.183166
A bit of advice, try to use spaces in your code. That makes it more readable. Look for example at this style guide. Your whole code be rewritten to only use stock1 without the need to turning it into a data.frame halfway your code. But for now the code does what it needs to do.