Google Sheets: Split Multiple Rows of Delimited Text from IMPORTDATA() - web-scraping

I'm getting historical data for a cryptocurrency using an API from CoinAPI which returns multiple rows, but each row consists of delimited text:
In the second formula I added SPLIT(), but it only returns the first row, which in this case is the headers.
I've previously returned data for just a current price using a combination of INDEX( SPLIT( QUERY( IMPORTDATA() ))), but that was only splitting one row in the end.
Is there a way to split all the rows using one formula, in one cell?

try:
=INDEX(SPLIT(IMPORTDATA("url_here"), ";"))

Related

How to find a list of words in data in excel or R?

I want to search for matches between a list of words and a column, which has 22000 rows, and if there is a match, export the matched data. How can I find and export all the matches together instead of searching for every word one by one?
Is there any code for that in R?
Thank you

How to skip empty rows while reading multiple tabs in R?

I am trying to read an excel file with multiple tabs. For that, I use the code provided here.
The problem is that each tab has a different number of empty rows before the actual data begins. For example, the first tab has two empty rows, the second tab has three empty rows, and so on.
Normally, I would use the parameter skip in the read_excel function to indicate the number of empty lines to skip. But how do I do that for multiple tabs with different numbers of rows to skip?
Perhaps the easiest solution is to read each tab as it is and then remove the empty rows, e.g. yourdata <- yourdata[!is.na(yourdata$columname),]. This works if you don't expect any NAs in a particular column, such as an id column. If you have data gaps everywhere, you can instead test for rows that are NA across multiple columns; let me know if that's what you need.
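A minimal sketch of that approach in base R, assuming every tab has already been read into a data frame with correct column names and the results are collected in a list. The data frames tab1 and tab2, the column name id, and the two helper functions are made up for illustration.

```r
# Hypothetical stand-ins for two tabs read as-is, each with a different
# number of empty (all-NA) rows before the data starts.
tab1 <- data.frame(id = c(NA, NA, 1, 2), value = c(NA, NA, 10, 20))
tab2 <- data.frame(id = c(NA, NA, NA, 3), value = c(NA, NA, NA, 30))
tabs <- list(first = tab1, second = tab2)

# Option 1: drop rows where a column that should never be NA (here `id`) is NA.
drop_by_column <- function(d, column) d[!is.na(d[[column]]), , drop = FALSE]

# Option 2: drop only rows that are NA in every column, for data with gaps.
drop_all_na <- function(d) d[rowSums(!is.na(d)) > 0, , drop = FALSE]

cleaned_by_id <- lapply(tabs, drop_by_column, column = "id")
cleaned_all_na <- lapply(tabs, drop_all_na)
```

Because the filter is applied per tab via lapply, the number of empty rows does not need to be known in advance for any sheet.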

How to get IMPORTXML/IMPORTHTML result data in one cell?

Working with Google Sheets, scraping from an html table like this:
I want to get all the rows in JUST ONE CELL...
like this:
And I couldn't find a way to do it!
The expected result is to get all the table data in a single cell,
replacing the column divisions with a blank space
and converting rows to simple text lines.
Any help, please? =(
NOTE: The first values do not always include the ":" character. The number of rows in the table may vary.
Building from the previous answer you can try:
=QUERY(TRANSPOSE(ARRAYFORMULA(CONCAT(QUERY(TRANSPOSE(importxml_formula),,9^9),CHAR(10)))),,9^9)
Explaining the breakdown:
QUERY(TRANSPOSE(importxml_formula),,9^9)
This returns a single-row table with one column per row of the original table; each cell holds the space-joined contents of the corresponding original row.
ARRAYFORMULA(CONCAT(QUERY(TRANSPOSE(importxml_formula),,9^9),CHAR(10)))
Each column will be appended with CHAR(10), which corresponds to a line break.
TRANSPOSE(ARRAYFORMULA(CONCAT(QUERY(TRANSPOSE(importxml_formula),,9^9),CHAR(10))))
This transposes the table into one column with as many rows as the original table; the outermost QUERY then merges that column into a single cell.
Sample Output:
Update:
Your IMPORTXML() already returns a single cell, and since its rows are separated by a double space, you can split on that and use this formula instead:
=QUERY(ARRAYFORMULA(CONCAT(TRANSPOSE(SPLIT(IMPORTXML(A1,B1)," ",,FALSE)),CHAR(10))),,9^9)
try:
=INDEX(SUBSTITUTE(SUBSTITUTE(QUERY(SUBSTITUTE(FLATTEN(QUERY(TRANSPOSE(
your_formula_here
),,9^9)), " ", "×"),,9^9), " ", CHAR(10)&CHAR(10)), "×", " "))

In R, how do I select variables from a data frame by string contained in names - either/or

I'm aware of how to select variables from a large data.frame based on the column name containing one defined string, as in: (How do I select variables in an R dataframe whose names contain a particular string?)
But how do I do this to select columns from the object that contain either one string or another?
I'd prefer not to have to split and recombine the df, so that the columns would be kept in their original order.
Here is my sample code, using grep, for obtaining variables matching the first string only, which works well:
df[grep("top",names(df),fixed=TRUE)]
grep won't take logical operators. So how do I select the second set of columns with "base" in the column name?
This should work:
df[grep("base",colnames(df))[2]]
or, in a somewhat more accurate and less error-prone style:
df[,grep("base",colnames(df))[2],drop=FALSE]
In both cases, the [2] at the end of the line specifies that you request the second column of df which contains the string "base" in its name.
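If the goal is instead to keep every column whose name contains either string, in the original column order, one option is a regular-expression alternation. This is only a sketch with a made-up data frame; note that it means dropping fixed=TRUE, because the | operator is interpreted only in regex mode.

```r
# Hypothetical data frame for illustration.
df <- data.frame(top_a = 1:2, other = 3:4, base_a = 5:6, top_b = 7:8)

# "top|base" matches names containing "top" or "base"; grep returns the
# matching positions in ascending order, so column order is preserved.
either <- df[grep("top|base", names(df))]

# Alternative that keeps fixed-string matching and combines two tests
# with a logical OR.
either_fixed <- df[grepl("top", names(df), fixed = TRUE) |
                   grepl("base", names(df), fixed = TRUE)]
```

Both forms select from df in one step, so nothing has to be split and recombined.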

Changing hundreds of column names simultaneously in R

I have a data frame with hundreds of columns whose names I want to change. I'm very new to R, so it's rather easy to think through the logic of this, but I simply can't find a relevant example online.
The closest I could sort of get was this:
projectFileAllCombinedNames <- for (i in 1:200){names(projectFileAllCombined)[i+1] <-variableNames[i]}
Basically, starting at the second column of projectFileAllCombined, I want to loop through the columns in the dataframe and assign them the data values in the second data frame. I was able to change one column name manually with this code:
colnames(projectFileAllCombined)[2]<-"newColumnName"
but I can't possibly do that for hundreds of columns. I've spent multiple hours on this and can't crack it with any number of Google searches on "change multiple columns in r" or "change column names in r". The best I can find online is examples where people change a few columns with a c() function and I get how that works, but that still seems to require typing out all the column names as parameters to the function, unless there is a way to just pass the "variableNames" file into that c() function, but I don't know of one.
Will
colnames(projectFileAllCombined)[-1] <- variableNames
not suffice?
This assumes the ordering of columns in projectFileAllCombined is the same as the ordering of the new variable names in variableNames, and that
length(variableNames) == (ncol(projectFileAllCombined) - 1)
The key point here is that the replacement function 'colnames<-'() is vectorised and can replace any number of column names in a single call if passed a vector of replacement values.
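As a small illustration of that vectorised replacement, here is a sketch with a made-up three-column data frame; the column names and the contents of variableNames are invented for the example.

```r
# Hypothetical small example: keep the first column name, rename the rest.
projectFileAllCombined <- data.frame(id = 1:2, V1 = 3:4, V2 = 5:6)
variableNames <- c("height", "weight")

# Guard against a length mismatch before assigning.
stopifnot(length(variableNames) == ncol(projectFileAllCombined) - 1)

# [-1] selects every column name except the first; all are replaced at once.
colnames(projectFileAllCombined)[-1] <- variableNames
```

If the new names are stored in a column of another data frame rather than in a plain vector, that column would need to be extracted as a character vector first.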
