I have a list of locally saved HTML files. I want to extract multiple nodes from each file and save the results in a vector. Afterwards, I would like to combine them in a data frame. I have a piece of code for one node, which works (see below), but it seems long and inefficient if I apply it to ~20 variables. Also, something strange happens when saving to the vector (XXX_name): it starts with the last observation and then continues with the first, second, and so on. Do you have any suggestions for simplifying the code or making it more efficient?
# Extracts name variable and stores in a vector
XXX_name <- c()
for (i in 1:216) {
XXX_name <- c(XXX_name, name)
mydata <- read_html(files[i], encoding = "latin-1")
reads_name <- html_nodes(mydata, 'h1')
name <- html_text(reads_name)
#print(i)
#print(name)
}
Many thanks!
You can put the workings inside a function, then apply that function to each of your variables with map.
First, create the function:
library(rvest)
read_names <- function(var, node) {
  # `files` is the vector of file paths from the question
  mydata <- read_html(files[var], encoding = "latin-1")
  reads_name <- html_nodes(mydata, node)
  html_text(reads_name)  # return the extracted text
}
Then we create a data frame with all possible combinations of inputs and apply the function to it:
library(tidyverse)
inputs <- crossing(var = 1:216, node = vector_of_nodes)
output <- map2(inputs$var, inputs$node, read_names)
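On the ordering problem in the question: the loop appends `name` to `XXX_name` before `name` is assigned in that iteration, so each pass stores the value from the previous pass, and the first pass picks up whatever `name` was left over from an earlier run (most likely the last file). Below is a minimal sketch of the corrected order, using a hypothetical stand-in `extract_name()` in place of the `read_html()` / `html_nodes()` / `html_text()` calls so that it runs without any HTML files.

```r
# Hypothetical stand-in for read_html() + html_nodes() + html_text()
extract_name <- function(path) toupper(basename(path))

# Hypothetical file names, only for illustration
files <- c("a.html", "b.html", "c.html")

XXX_name <- character(0)
for (i in seq_along(files)) {
  name <- extract_name(files[i])   # extract first ...
  XXX_name <- c(XXX_name, name)    # ... then append
}

# The same result without growing a vector inside a loop
XXX_name_v <- vapply(files, extract_name, character(1), USE.NAMES = FALSE)
```

With the append moved after the extraction, the results line up with the order of `files`.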
How to create a SpatialPointsDataFrame from a list of SpatialPoints?
The following code creates a list containing SpatialPoints:
SP1 <- SpatialPoints(cbind(1,5))
SP2 <- SpatialPoints(cbind(2,4))
SP3 <- SpatialPoints(cbind(3,3))
SP.l<-list(SP1,SP2, SP3)
What I'm looking for is a way to extract the SpatialPoints from the list and create a SpatialPointsDataFrame out of it.
With the following code I can get single SpatialPoints out of the list:
coords_3 = SP.l[[3]]@coords
data_3 = as.data.frame(SP.l[[3]])
SPDF_3 <- SpatialPointsDataFrame(coords=coords_3, data=as.data.frame(data_3))
However, I'd like to get them all at once.
Maybe something like:
SP <- SpatialPoints(lapply(1:length(lidR.clip.SP.l), function(i) {
...
EDIT:
what was missing was:
SP.l <- do.call("rbind", SP.l)
That's what I was actually looking for.
thx!
Since there is no minimal working example, as hrbrmstr noted, you need to provide one. For now, I use sample data from the GISTools package to demonstrate one way. The package contains a data set called newhaven, and breach is part of that data. I made a copy of it called foo, whose class is SpatialPoints, and created two list elements from it.
Using your code, I looped through each list element and converted SpatialPoints to SpatialPointsDataFrame. I hope you can figure out how to apply the following code to your case.
library(GISTools)
data(newhaven)
foo <- breach
mylist <- list(foo1 = breach[1:10, ],
foo2 = breach[11:20, ])
lapply(seq_along(mylist), function(x){
  SpatialPointsDataFrame(coords = mylist[[x]]@coords,
                         data = as.data.frame(mylist[[x]]))
})
If you want to bind all SPDFs, then you can try the following.
do.call(rbind, lapply(seq_along(mylist), function(x){
  SpatialPointsDataFrame(coords = mylist[[x]]@coords,
                         data = as.data.frame(mylist[[x]]))
})
)
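The do.call(rbind, ...) pattern is not specific to sp objects. As a minimal sketch in base R, assuming plain data frames stand in for the spatial objects, it calls rbind with every list element as a separate argument:

```r
# Three one-row data frames standing in for the list of spatial objects
parts <- list(
  data.frame(x = 1, y = 5),
  data.frame(x = 2, y = 4),
  data.frame(x = 3, y = 3)
)

# do.call() passes each list element to rbind() as its own argument ...
combined <- do.call(rbind, parts)

# ... so it is equivalent to writing the call out by hand
by_hand <- rbind(parts[[1]], parts[[2]], parts[[3]])
```

This is why it works for a list of any length without indexing each element.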
I'm a newbie in R and I've seen several posts about downloading multiple stocks, but for one reason or another they don't work as suggested.
My purpose is to download a vector of stocks and create a single xts matrix containing only the Close prices for every stock (so n observations x 3 columns).
Anyway, I'd like to start from a basic script that doesn't work properly:
library(quantmod)
ticker=c("KO","AAPL","^GSPC")
for (i in 1:length(ticker)) {
simbol=as.xts(na.omit(getSymbols(ticker[i],from="2016-01-01",auto.assign=F)))
new=Cl(simbol)
merge(new[i])
}
It would be even better to write a function(symbols) that I can call whenever I need to, changing only the names of the stocks to download.
Thanks to everyone
This is how I would do what you want with a function wrapper (which is a pretty common kind of manipulation with xts):
ticker=c("KO","AAPL","^GSPC")
collect_close_series <- function(ticker) {
# Preallocate a list to store the result from each loop iteration (Note: lapply is another alternative to a direct loop)
lst <- vector("list", length(ticker))
for (i in 1:length(ticker)) {
symbol <- na.omit(getSymbols(ticker[i],from="2016-01-01",auto.assign = FALSE))
lst[[i]] <- Cl(symbol)
}
# You have a list of close prices. You can combine the objects in the list compactly using do.call; this is a common "data manipulation pattern" with xts objects.
rr <- do.call(what = merge, lst)
rr
}
out <- collect_close_series(ticker)
More advanced (better code design): You could write cleaner code by writing a function that handles each symbol (rather than a function that wraps and passes in all the symbols together) and then run lapply on it:
per_sym_close <- function(tick) {
symbol <- na.omit(getSymbols(tick,from="2016-01-01",auto.assign = FALSE))
Cl(symbol)
}
out2 <- do.call(merge, lapply(X = ticker, FUN = per_sym_close))
This gives the same result.
Hope this helps getting you started toward writing good R code!
Thank you in advance for your advice. I am trying to create a new variable over multiple objects in a loop. These new variables are generated by a function.
For example, I have three sets of country-level data:
# Generate Example Data
pop <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(290,300,29,30,50,55))
gas <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(3.10,1.80,4.50,2.50,4.50,2.50))
cars <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(2.1,2.2,1.8,1.9,1.3,1.3))
I want to create a new variable, called “countrycode”, using the countrycode() command in the countrycode package.
I would perform the operation on individual objects like this:
library(countrycode)
pop$ccode <- countrycode(pop$country,"iso2c","cown")
pop$id <- (pop$ccode*10000)+pop$year
But I have a large number of objects. I was hoping to do this in a loop, like this:
# Create list of variables
vars <- c("pop","gas","cars")
for (i in vars){
i$ccode <- countrycode(country,"iso2c","cown")
i$id <- (i$ccode*10000)+i$year
}
But that doesn’t work. I’ve been trying to do this using assign() in loops and apply(), but I’m too dense to get my head around how to make this work in my case.
If someone could provide me with an example of how to do this with my own type of data, I’d be very grateful.
Would this work for you?
pop <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(290,300,29,30,50,55))
gas <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(3.10,1.80,4.50,2.50,4.50,2.50))
cars <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(2.1,2.2,1.8,1.9,1.3,1.3))
attachCodes <- function(dframe)
{
df <- dframe
df$ccode <- countrycode(df$country,"iso2c","cown")
df$id <- (df$ccode*10000)+df$year
return(df)
}
tablesList <- list(pop,gas,cars)
tablesList <- lapply(tablesList,attachCodes)
Special thanks to @Pawel for supplying the missing information needed to solve the problem. The solution was:
rm(list=ls())
pop <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(290,300,29,30,50,55))
gas <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(3.10,1.80,4.50,2.50,4.50,2.50))
cars <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(2.1,2.2,1.8,1.9,1.3,1.3))
attachCodes <- function(dframe)
{
df <- dframe
df$ccode <- countrycode(df$country,"iso2c","cown")
df$id <- (df$ccode*10000)+df$year
return(df)
}
names <- list("pop","gas","cars")
for(i in names){
assign(i,attachCodes(get(i)))
}
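As an alternative to assign() and get(), the lapply approach can keep the object names if the list is named. The sketch below is only illustrative: `add_id()` is a hypothetical stand-in for attachCodes() that skips the countrycode() call so it runs in base R, and the data frames are shortened.

```r
# Shortened example data
pop <- data.frame(country = c("US", "CA"), year = c(1, 2), value = c(290, 29))
gas <- data.frame(country = c("US", "CA"), year = c(1, 2), value = c(3.1, 4.5))

# Hypothetical stand-in for attachCodes(): adds one derived column
add_id <- function(df) {
  df$id <- df$year * 10000
  df
}

# A named list keeps track of which table is which
tables <- list(pop = pop, gas = gas)
tables <- lapply(tables, add_id)

# Access a processed table by name
pop_with_id <- tables$pop
```

Keeping the tables in one named list avoids creating and overwriting global objects by name, and any later step can be applied to all of them with another lapply call.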
I have a large character vector, file, and I need to draw a random sample from it. This works fine. But I need to draw sample after sample. For that, I want to shorten file by every element that has already been drawn from it (so that I can draw a new sample without drawing the same element more than once).
I've got some solutions, but I'm interested in anything else that might work faster and, even more important, correctly.
Here are my attempts:
Approach 1
file <- rep(1:10000)
rand_no <- sample(file, 100)
library(car)
a <- data.frame()
for (i in 1:length(rand_no)){
a <- rbind(a, which.names(rand_no[i], file))
file <- file[-a[1,1]]
}
Problem:
Warning message:
In which.names(rand_no[i], file) : 297 not matched
Approach 2
file <- rep(1:10000)
rand_no <- sample(file, 100)
library(car)
deleter <- function(i) {
a <- which.names(rand_no[i], file)
file <- file[-a]
}
lapply(1:length(rand_no), deleter)
Problem:
This doesn't work at all. Maybe I should split the question, because the second problem clearly lies with me not fully understanding lapply.
Thanks for any suggestions.
Edit
I hoped that it would work with numbers, but of course file actually looks like this:
file <- c("Post-19960101T000000Z-1.tsv", "Post-19960101T000000Z-2.tsv", "Post-19960101T000000Z-3.tsv","Post-19960101T000000Z-4.tsv", "Post-19960101T000000Z-5.tsv", "Post-19960101T000000Z-6.tsv", "Post-19960101T000000Z-7.tsv","Post-19960101T000000Z-9.tsv")
Of course rand_no can't be over 100 files with such a small sample. Therefore:
rand_no <- sample(file, 2)
Use list instead of c. Then you can set the values to NULL and they will be removed.
The line file[file %in% rand_no] <- NULL finds all instances from rand_no in file and removes them.
file <- list("Post-19960101T000000Z-1.tsv",
"Post-19960101T000000Z-2.tsv",
"Post-19960101T000000Z-3.tsv",
"Post-19960101T000000Z-4.tsv",
"Post-19960101T000000Z-5.tsv",
"Post-19960101T000000Z-6.tsv",
"Post-19960101T000000Z-7.tsv",
"Post-19960101T000000Z-9.tsv")
rand_no <- sample(file, 2)
library(car) #From poster's code.
file[file %in% rand_no] <- NULL
If you are working with a large list of files, using %in% to compare strings may bog you down. In that case I would use indexes.
file <- list("Post-19960101T000000Z-1.tsv",
"Post-19960101T000000Z-2.tsv",
"Post-19960101T000000Z-3.tsv",
"Post-19960101T000000Z-4.tsv",
"Post-19960101T000000Z-5.tsv",
"Post-19960101T000000Z-6.tsv",
"Post-19960101T000000Z-7.tsv",
"Post-19960101T000000Z-9.tsv")
rand_no <- sample(1:length(file), 2)
library(car) #From poster's code.
file[rand_no] <- NULL
sample() already returns values in a permuted order without replacement (unless you set replace=TRUE), so it will never pick a value twice.
So if you want three sets of 100 samples that don't share any elements, you can use
file <- rep(1:10000)
rand_no <- sample(seq_along(file), 300)
s1<-file[rand_no[1:100]]
s2<-file[rand_no[101:200]]
s3<-file[rand_no[201:300]]
Or if you wanted to decrease the total size by 100 each time, you could do
s1<-file[-rand_no[1:100]]
s2<-file[-rand_no[1:200]]
s3<-file[-rand_no[1:300]]
A simple approach would be to select random indices and then remove those indices:
file <- 1:10000 # Build sample data
ind <- sample(seq(length(file)), 100) # Select random indices
rand_no <- file[ind] # Compute the actual values selected
file <- file[-ind] # Remove selected indices
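The index-based idea above can be wrapped in a small helper so that it can be called repeatedly. This is a minimal sketch: `draw_batch()` and the file names are hypothetical, and the helper returns both the drawn elements and the shortened pool instead of modifying a global variable.

```r
# Hypothetical helper: draw n elements and return what is left
draw_batch <- function(pool, n) {
  idx <- sample.int(length(pool), n)      # random positions, no repeats
  keep <- setdiff(seq_along(pool), idx)   # positions that were not drawn
  list(drawn = pool[idx], remaining = pool[keep])
}

set.seed(1)
pool <- sprintf("Post-%03d.tsv", 1:20)    # made-up file names

first  <- draw_batch(pool, 5)
second <- draw_batch(first$remaining, 5)  # draws only from what is left
```

Because the second call only sees `first$remaining`, the two batches cannot share an element. Using setdiff on the positions rather than `pool[-idx]` also keeps the pool intact when n is 0.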
I think using sample and split could be a nice way of doing this, without having to alter your files variable. I'm not a big fan of mutation, unless you really need to, and this would let you know exactly which files you used for each chunk of the analysis going forward.
files<-paste("file",1:100,sep="_")
randfiles<-sample(files, 50)
randfiles_chunks<-split(randfiles,seq(1,length(randfiles), by=10))
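The split call above recycles the five values of seq(1, 50, by = 10) as group labels, so each chunk takes every fifth element of randfiles. If contiguous chunks are preferred, one possible variant (a sketch, not part of the original answer) builds the grouping from the element positions:

```r
set.seed(42)
files <- paste("file", 1:100, sep = "_")
randfiles <- sample(files, 50)

# Positions 1-10 go to chunk 1, 11-20 to chunk 2, and so on
chunk_id <- ceiling(seq_along(randfiles) / 10)
randfiles_chunks <- split(randfiles, chunk_id)
```

Either way, the sample is drawn once without replacement, so no file appears in more than one chunk.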