My problem is as detailed below:
My input data is of the format as given in the small example below:
USERID LONGITUDE LATITUDE
1 -8.79659 55.879554
2 -6.874743 56.87896
3 -3.874743 58.87896
4 -10.874743 80.87896
I have used the follwoing code to reverse geocode the latitiude and longitude
dset <- as.data.frame(dataset[,2:3])
dset <- na.omit(dset)
library (ggmap)
location <- dset
nrow(location)
locaddr <- matrix(0,nrow(location),1)
location <- as.matrix(location)
for (i in 1:nrow(location))
{
locaddr[i,] <- revgeocode(location[i,], output = c("address"), messaging = FALSE, sensor = FALSE, override_limit = FALSE)
}
Now certain longitude-latitude return NA from Google Maps API. But when this happens the for loop is terminated for some reason. I would like to circumvent this and continue processing for the remaining data points. One idea I had was the following pseudocode:
if i = nrow(location)
continue
else
repeat revgeocode for loop here
end-for
end-if.
Kindly advise how this can be done or if there is a better way to do this.
Thank you in advance for your time and help.
No need to use a for-loop here. I recommand you to use lapply to avoid side effect, and pre-allocate problems:
locaddr <- lapply(seq(nrow(location)), function(i){
revgeocode(location[i,],
output = c("address"),
messaging = FALSE,
sensor = FALSE,
override_limit = FALSE)
})
Related
I have 24 variables called empl_1 -empl_24 (e.g. empl_2; empl_3..)
I would like to write a loop in R that takes this values 1-24 and puts them in the respective places so the corresponding variables are either called or created with i = 1-24. The sample below shows what I would like to have within the loop (e.g. ye1- ye24; ipw_atet_1 - ipw_atet_14 and so on.
ye1_ipw <- empl$empl_1[insample==1]
ipw_atet_1 <- treatweight(y=ye1_ipw, d=treat_ipw, x=x1_ipw, ATET =TRUE, trim=0.05, boot = 2)
ipw_atet_1
ipw_atet_1$se
ye2_ipw <- empl$empl_2[insample==1]
ipw_atet_2 <- treatweight(y=ye2_ipw, d=treat_ipw, x=x1_ipw, ATET =TRUE, trim=0.05, boot = 2)
ipw_atet_2
ipw_atet_2$se
ye3_ipw <- empl$empl_3[insample==1]
ipw_atet_3 <- treatweight(y=ye3_ipw, d=treat_ipw, x=x1_ipw, ATET =TRUE, trim=0.05, boot = 2)
ipw_atet_3
ipw_atet_3$se
coming from a Stata environment I tried
for (i in seq_anlong(empl_list)){
ye[i]_ipw <- empl$empl_[i][insample==1]
ipw_atet_[i]<-treatweight(y=ye[i]_ipw, d=treat_ipw, x=x1_ipw, ATET=TRUE, trim=0.05, boot =2
}
However this does not work at all. Do you have any idea how to approach this problem by writing a nice loop? Thank you so much for your help =)
You can try with lapply :
result <- lapply(empl[paste0('empl_', 1:24)], function(x)
treatweight(y = x[insample==1], d = treat_ipw,
x = x1_ipw, ATET = TRUE, trim = 0.05, boot = 2))
result would be a list output storing the data of all the 24 variables in same object which is easier to manage and process instead of having different vectors.
I have this code. I have my google API set up already, registered as well in R, Distance Matrix API has been initiated as well in the Google Cloud console.
Here is the dataframe I have, random 25 postal codes FROM and TO postal codes.
Dataset_test = data.frame(
FROM_POSTAL = c("V8A 0E5","T4G 6M4","V1N 8X3",
"C1B 5G1","R5H 2L4","H9S 8L4","L8E 4Y0","H2Y 7N6",
"K1B 7C0","G4A 5B0","E4P 3T2","E4V 5P4","H3J 1R5",
"G0B 4J7","E7A 6E7","E5B 2Y9","S4H 1T8","A2V 4G5",
"V8L 2A9","T9E 1M5","A5A 5M2","E4T 5B4","S2V 6C4",
"S9H 5P8","B1Y 0V0"),
TO_POSTAL = c("G0J 0B8","N0H 9N4","J9B 4Y4",
"L3Z 2Y7","E8K 4R4","B4P 7X9","S4H 2M0","A1Y 0B8",
"A1W 1E9","P9N 7X1","E4R 4B0","N0P 0M8","E1W 9Y7",
"T9W 8E2","G6X 4S9","A0E 0V4","J5X 7N8","N4N 8A1",
"V9K 0B9","L4G 3H7","E1W 0T2","G5R 9G3","L7C 9S2",
"E8P 2X6","E2A 2M1")
)
Here is the simple script I have to try to calculate the distance between the two postal codes by driving using Google's Distance Matrix API.
Driving_Distance = mapdist(from = Dataset_test[["FROM_POSTAL"]], to = Dataset_test[["TO_POSTAL"]], mode = c("driving")) %>% distinct()
When I run this, it throws an error in the Driving_Distance - says
Error: Argument 1 is a list, must contain atomic vectors
Your Canadian postal codes are hereby working with the mapdist() function.
The number of addresses used here were shortened for the sake of brevity.
A tibble was used instead of a dataframe so that the variables were character data types rather than factor data types. The actual Google API key that was used has been replaced with some text.
This was a good mapping question. The working code and output below:
library(ggmap)
library(plyr)
library(googleway)
library(tidyverse)
df = tibble(
FROM_POSTAL = c("V8A 0E5","T4G 6M4","V1N 8X3",
"C1B 5G1","R5H 2L4","H9S 8L4"),
TO_POSTAL = c("G0J 0B8","N0H 9N4","J9B 4Y4",
"L3Z 2Y7","E8K 4R4","B4P 7X9"))
dd <- apply(df, 1, function(x){
google_distance(origins = list(x["from"]),
destinations = list(x["to"]),
key="My_secret_key")
})
dd
Here is my sample dataset (called origAddress):
lat lng
1.436316 103.8299
1.375093 103.8516
1.369347 103.8398
1.367353 103.8426
I have many more rows of latitude and longitude numbers (330) and I would like to find the address. I have used this for loop to do that:
for(i in 1:nrow(origAddress))
{
# Print("Working...")
result <- google_reverse_geocode(location = c(origAddress$lat[i],origAddress$lng[i]),
key = key,
location_type = "rooftop")
if(is.null(result) || length(dim(result)) < 2 || !nrow(result)) next
origAddress$venadd <- geocode_address(result)
}
It works for the first three or four rows but then returns the same address as the first row although the latitude and longitude numbers are definitely different. I have looked at other stackoverflow questions(here) and tried to copy their approach with similar bad results.
Please help!
It looks like the calls to google_geocode can return more than one address for each lat/longitude pair thus you could be overwriting your data in the output data frame.
Also, I am not sure your if statement is evaluating properly.
Here is my attempt on your problem:
library(googleway)
origAddress<-read.table(header = TRUE, text = "lat lng
1.436316 103.8299
1.375093 103.8516
1.369347 103.8398
1.367353 103.8426")
#add the output column
origAddress$venadd<-NA
for(i in 1:nrow(origAddress))
{
# Print("Working...")
result <- google_reverse_geocode(location = c(origAddress$lat[i],origAddress$lng[i]),
key=key,
location_type = "rooftop")
#add a slight pause so not to overload the call requests
Sys.sleep(1)
if(result$status =="OK" ){
#multiple address can be returned with in gecode request picks the first one
origAddress$venadd[i] <- result$results$formatted_address[1]
#use this to collect all addresses:
#paste(result$results$formatted_address, collapse = " ")
}
}
Since the call to google_reverse_geocode returns the address, I just pull the first address from the result saving a call to the internet (performance improvement). Also since the call returns a status, I check for an OK and if exist save the first address.
Hope this helps.
I am trying to create a word cloud for bi-gram (and higher n grams) using the below code -
text_input <- scan("Path/Wordcloud.txt")
corpus <- Corpus(VectorSource(text_input))
corpus.ng = tm_map(corpus,removeWords,c(stopwords(),"s","ve"))
corpus.ng = tm_map(corpus.ng,removePunctuation)
corpus.ng = tm_map(corpus.ng,removeNumbers)
BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2))
tdm.bigram = TermDocumentMatrix(corpus.ng,control = list(tokenize = BigramTokenizer))
tdm.bigram
freq = sort(rowSums(as.matrix(tdm.bigram)),decreasing = TRUE)
freq.df = data.frame(word=names(freq), freq=freq)
head(freq.df, 20)
pal=brewer.pal(8,"Blues")
pal=pal[-(1:3)]
wordcloud(freq.df$word,freq.df$freq,max.words=100,random.order = F, colors=pal)
I have seen similar code on few websites being used for generating n gram but I am getting only single word combinations in my output.
The code is not responding to changes in min and max being set to different values (2,3,4 etc) successively in the NGramTokenizer function.
Am I missing something in the code or is it possible that one of the libraries which I am calling in the code (tm,ggplot2,wordcloud,RWeka) or their dependencies (like rJava) is not responding? I will be really grateful if someone can throw some pointers regarding this issue or suggest modifications in the above code.
Thanks,
Saibal
You are missing out on mentioning the token delimiter.
token_delim <- " \\t\\r\\n.!?,;\"()"
BigramTokenizer <- NGramTokenizer(mycorpus, Weka_control(min=2,max=2, delimiters = token_delim))
This should work.
In case you need a working example, you can check this five-minute video:
https://youtu.be/HellsQ2JF2k
Hope this helps.
Also, some others have had problems using the Corpus function.
Try using the volatile corpus
corpus <- VCorpus(VectorSource(text_input))
I tried the following and it worked:
> minfreq_bigram<-2
> bitoken <- NGramTokenizer(corpus, Weka_control(min=2,max=2))
> two_word <- data.frame(table(bitoken))
> sort_two <- two_word[order(two_word$Freq,decreasing=TRUE),]
> wordcloud(sort_two$bitoken,sort_two$Freq,random.order=FALSE,scale =
c(2,0.35),min.freq = minfreq_bigram,colors = brewer.pal(8,"Dark2"),max.words=150)
I am trying to do some quantitative modeling in R. I'm not getting an error message, but the results are not what I actually need.
I am a newbie, but here is my complete code sample.
`library(quantmod)
#Building the data frame and xts to show dividends, splits and technical indicators
getSymbols(c("AMZN"))
Playground <- data.frame(AMZN)
Playground$date <- as.Date(row.names(Playground))
Playground$wday <- as.POSIXlt(Playground$date)$wday #day of the week
Playground$yday <- as.POSIXlt(Playground$date)$mday #day of the month
Playground$mon <- as.POSIXlt(Playground$date)$mon #month of the year
Playground$RSI <- RSI(Playground$AMZN.Adjusted, n = 5, maType="EMA") #can add Moving Average Type with maType =
Playground$MACD <- MACD(AMZN, nFast = 12, nSlow = 26, nSig = 9)
Playground$Div <- getDividends('AMZN', from = "2007-01-01", to = Sys.Date(), src = "google", auto.assign = FALSE)
Playground$Split <- getSplits('AMZN', from = "2007-01-01", to = Sys.Date(), src = "google", auto.assign = FALSE)
Playground$BuySignal <- ifelse(Playground$RSI < 30 & Playground$MACD < 0, "Buy", "Hold")
All is well up until this point when I start using some logical conditions to come up with decision points.
Playground$boughts <- ifelse(Playground$BuySignal == "Buy", lag(Playground$boughts) + 1000, lag(Playground$boughts))
It will execute but the result will be nothing but NA. I suppose this is because you are trying to add NA to a number, but I'm not 100% sure. How do you tell the computer I want you to keep a running tally of how much you have bought?
Thanks so much for the help.
So we want ot buy 1000 shares every time a buy signal is generated?
Your problem stems from MACD idicator. It actually generates two columns, macd and signal. You have to decide which one you want to keep.
Playground$MACD <- MACD(AMZN, nFast = 12, nSlow = 26, nSig = 9)$signal
This should solve the problem at hand.
Also, please check the reference for ifelse. The class of return value can be tricky at times, and so the approach suggested by Floo0 is preferable.
Also, I'd advocate using 1 and 0 instead of buy and sell to show weather you are holding . It makes the math much easier.
And I'd strongly suggest reading some beginner tutorial on backtesting with PerformanceAnalytics. They make the going much much easier.
BTW, you missed this line in the code:
Playground$boughts<- 0
Hope it helps.
EDIT: And I forgot to mention the obvious. discard the first few rows where MACD will be NA
Something like:
Playground<- Playground[-c(1:26),]
Whenever you want to do an ifelse like
if ... Do something, else stay the same: Do not use ifelse
Try this instead
ind <- which(Playground$BuySignal == "Buy")
Playground$boughts[ind] <- lag(Playground$boughts) + 1000