How can I extract information from a web page into an Excel sheet?
The website is https://www.proudlysa.co.za/members.php and I would like to extract all the companies listed there and all their respective information.
The process you're referring to is called web scraping, and there are several VBA tutorials out there for you to try.
Alternatively, you can always try
(source: netdna-ssl.com)
I tried creating something to grab for all pages. But ran of time and had bugs. This should help you a little. You will have to do this on all 112 pages.
Using chrome go to the page
type javascript: in the url then paste the code below. it should extra what you need. then you will have to just copy and paste it in to excel.
var list = $(document).find(".pricing-list");
var csv ="";
for (i = 0; list.length > i;i++) {
var dataTags = list[i].getElementsByTagName('li');
var dataArr = [];
for (j = 0; dataTags.length > j;j++) {
dataArr.push(dataTags[j].innerText.trim());
}
csv += dataArr.join(', ') + "<br>";
}
you will get something like this
EDITTED
use this instead will automatically download each page as csv then you can just combine them after somehow.
Make sure to type javascript: in url before pasting and pressing enter
Also works with chrome, not sure about other browsers. i dont use them much
var list = $(document).find(".pricing-list");
var csv ="data:text/csv;charset=utf-8,";
for (i = 0; list.length > i;i++) {
var dataTags = list[i].getElementsByTagName('li');
var dataArr = [];
for (j = 0; dataTags.length > j;j++) {
dataArr.push(dataTags[j].innerText.trim());
}
csv += dataArr.join(', ') + "\n";
}
var a = document.createElement("a");
a.href = ""+ encodeURI(csv);
a.download = "data.csv";
a.click();
Related
I'm using ClosedXML library to generate a simple Excel file with 2 worksheets.
I keep getting error message whenever i try to open the file saying
"We found a problem with some content in "example.xlsx". Do you want us to try to recover as much as we can. if you trust source of this workbook, click Yes"
If i click Yes, it displays the data as expected, i don't see any
problems with it. Also if i generate only 1 worksheet this error does
not appear.
This is what my stored procedure returns, first result set is populated in sheet1 and second result set is populated in sheet2, which works as expected.
Workbook data
Here is the method i am using, it returns 2 result sets and populates both result sets in 2 different worksheets:
[HttpPost]
[ValidateAntiForgeryToken]
public ActionResult POAReport(POAReportVM model)
{
POAReportVM poaReportVM = reportService.GetPOAReport(model);
using (var workbook = new XLWorkbook())
{
IXLWorksheet worksheet1 = workbook.Worksheets.Add("ProductOrderAccuracy");
worksheet1.Cell("A1").Value = "DATE";
worksheet1.Cell("B1").Value = "ORDER";
worksheet1.Cell("C1").Value = "";
var countsheet1 = 2;
for (int i = 0; i < poaReportVM.productOrderAccuracyList.Count; i++)
{
worksheet1.Cell(countsheet1, 1).Value = poaReportVM.productOrderAccuracyList[i].CompletedDate.ToString();
worksheet1.Cell(countsheet1, 2).Value = poaReportVM.productOrderAccuracyList[i].WebOrderID.ToString();
worksheet1.Cell(countsheet1, 3).Value = poaReportVM.productOrderAccuracyList[i].CompletedIndicator;
countsheet1++;
}
IXLWorksheet worksheet2 = workbook.Worksheets.Add("Summary");
worksheet2.Cell("A1").Value = "Total Orders Sampled";
worksheet2.Cell("B1").Value = "Passed";
worksheet2.Cell("C1").Value = "% Passed";
worksheet2.Cell(2, 1).Value = poaReportVM.summaryVM.TotalOrdersSampled.ToString();
worksheet2.Cell(2, 2).Value = poaReportVM.summaryVM.Passed.ToString();
worksheet2.Cell(2, 3).Value = poaReportVM.summaryVM.PassedPercentage.ToString();
//save file to memory stream and return it as byte array
using (var ms = new MemoryStream())
{
workbook.SaveAs(ms);
ms.Position = 0;
var content = ms.ToArray();
return File(content, "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
}
}
}
I had similar problem. For me the cause was as mentioned in this answer:
I had this issue when I was using EPPlus to customise an existing template. For me the issue was in the template itself as it contained invalid references to lookup tables. I found this in Formula -> Name Manager.
You may find other solutions there.
Apologies as my reputation is too low to add a comment.
I'm using Google Sheet's IMPORTDATA function to grab information from an XML file that is pulling from an API but the information I pull into the sheet isn't up to date.
How can I modify my sheet to pull in up-to-date data?
Compare the sheet: https://docs.google.com/spreadsheets/d/1W0Bt5z-Tky-tNhG_JtfE4FfjTRgQNRu_eQu2qVhQ-_E/edit?usp=sharing (LiveScores sheet)
To the XML: https://www67.myfantasyleague.com/2019/export?TYPE=liveScoring&L=64741&APIKEY=&W=14&DETAILS=1&JSON=0
Observe franchise id="0015" in both sets of data.
The sheet states <franchise id="0005" score="0.00" gameSecondsRemaining="21600" playersYetToPlay="6" playersCurrentlyPlaying="0" isHome="0">
The XML states <franchise id="0015" score="11.14" gameSecondsRemaining="20004" playersYetToPlay="4" playersCurrentlyPlaying="2"> (This data is for a football game that is currently being played as I'm writing this so the above example may not be exact, but it WON'T be score of 0.00, for example.
Any help would be amazing, thanks!
Have you tried using IMPORTXML? Google Sheets IMPORTXML Page
In IMPORTXML, you can just use the Inspect Element feature to pull the xpath.
Hope this helps. Let me know if I can help further.
Edit: Instructions To Change When Data Is Imported
In the toolbar go to the script editor
Now in the scripts, paste the code listed bellow
/**
* Go through all sheets in a spreadsheet, identify and remove all spreadsheet
* import functions, then replace them a while later. This causes a "refresh"
* of the "import" functions. For periodic refresh of these formulas, set this
* function up as a time-based trigger.
*
* Caution: Formula changes made to the spreadsheet by other scripts or users
* during the refresh period COULD BE OVERWRITTEN.
*
* From: https://stackoverflow.com/a/33875957/1677912
*/
function RefreshImports() {
var lock = LockService.getScriptLock();
if (!lock.tryLock(5000)) return; // Wait up to 5s for previous refresh to end.
// At this point, we are holding the lock.
var id = "YOUR-SHEET-ID";
var ss = SpreadsheetApp.openById(id);
var sheets = ss.getSheets();
for (var sheetNum=0; sheetNum<sheets.length; sheetNum++) {
var sheet = sheets[sheetNum];
var dataRange = sheet.getDataRange();
var formulas = dataRange.getFormulas();
var tempFormulas = [];
for (var row=0; row<formulas.length; row++) {
for (col=0; col<formulas[0].length; col++) {
// Blank all formulas containing any "import" function
// See https://regex101.com/r/bE7fJ6/2
var re = /.*[^a-z0-9]import(?:xml|data|feed|html|range)\(.*/gi;
if (formulas[row][col].search(re) !== -1 ) {
tempFormulas.push({row:row+1,
col:col+1,
formula:formulas[row][col]});
sheet.getRange(row+1, col+1).setFormula("");
}
}
}
// After a pause, replace the import functions
Utilities.sleep(5000);
for (var i=0; i<tempFormulas.length; i++) {
var cell = tempFormulas[i];
sheet.getRange( cell.row, cell.col ).setFormula(cell.formula)
}
// Done refresh; release the lock.
lock.releaseLock();
}
}
This snippet of code came from Periodically refresh IMPORTXML() spreadsheet function
Last and definitely the least, replace the "YOUR-SHEET-ID"
NOTE: I have not personally tested this code and I cannot vouch for it. I recommend making a copy and testing it there first.
Hopefully, this solves the issue of your data not being imported as often as you want. If you want to manually get "fresh" data, you can just delete/cut the import function and paste it back in.
try in A2:
=ARRAYFORMULA(IFNA(VLOOKUP(C2:C, PlayerList!A:F, {2, 6}, 0)))
and in C2:
=ARRAYFORMULA(QUERY(REGEXEXTRACT(QUERY(IMPORTDATA(
"https://www67.myfantasyleague.com/2019/export?TYPE=liveScoring&L=64741&APIKEY=&W=14&DETAILS=1&JSON=0?273"),
"where Col1 contains 'player id'", 0),
"(player id=""(\d+)).+?(score=""(\d+.\d+))"),
"select Col2,Col4"))
spreadsheet demo
Sorry in advance if I’m not phrasing this question correctly. I know nothing about InDesign scripting, but this would solve a workflow problem I’m having at my company, so I would appreciate any and all help.
I want to find all strings in an InDesign file that are between angle brackets (i.e. <variable>) and export that out into a list. Ideally this list would be a separate document but if it can just be dumped into a text frame or something that’s fine.
Any ideas on how to do this? Thank you in advance for any and all help.
Here is something simple:
app.findGrepPreferences=NothingEnum.NOTHING; //to reset the Grep search
app.findGrepPreferences.findWhat = '<[^<]+?>'; //the word(s) you are searching
var fnd = app.activeDocument.findGrep (); //execute search
var temp_str = ''; //initialize an empty string
for (var i = 0; i < fnd.length; i++) { //loop through results and store the results
temp_str += fnd[i].contents.toString() + '\r'; // add next found item text to string
}
var new_doc = app.documents.add (); //create a new document
app.scriptPreferences.measurementUnit = MeasurementUnits.POINTS; //set measurement units to points
var text_frame = new_doc.pages[0].textFrames.add({geometricBounds:[0,0,100,100]});// create a new text frame on the first page
text_frame.contents = temp_str; //output your text to the text frame in the new document
For more data see here.
I'm new to Java scripting and Google Apps Scripts so i am sorry if this has already been answered. I was not able to find what i was looking for over the last few months of working on this project.
I am working on a variant of the scripts here:
Delete row in Google Sheets if certain "word" is found in cell
AND
Google Sheet Script - Find Value in Col and Delete Row
I want to create a button, or menu, that will allow someone to enter specific data, and have each row in the spreadsheet containing that data deleted.
I have a test sheet here that illustrates the data i'm working with, formulas i'm using, and has the beginning of the script attached to it:
https://docs.google.com/spreadsheets/d/1e2ILQYf8MJD3mrmUeFQyET6lOLYEb-4coDTd52QBWtU/edit?usp=sharing
The first 4 sheets are pulling data from the "Form Responses 1" sheet via a formula in cell A:3 in each sheet so the data would only need to be deleted from the "Form Responses 1" sheet to clear it from the rest of the sheets.
I tried working this in but i do not think i am on the right track.
https://developers.google.com/apps-script/guides/dialogs
I also posted this on Google Docs Help Forum 60 days ago, but have not received any responses.
Any help would be greatly appreciated.
There's a few steps. For usability of UI this takes a little longer code. In concise form:
The user activates a dialog and enters a string.
Rows w/ the string are deleted (with error handling and confirmation)
(Hopefully this gets you started and you can tailor it to your needs)
Function that initiates the menu:
function onOpen(){
SpreadsheetApp.getUi()
.createMenu('My Menu')
.addItem('Delete Data', 'deleteFunction')
.addToUi();
}
The main workhorse:
function deleteFunction(){
//declarations
var sheetName = "Form Responses 1";
var ss = SpreadsheetApp.getActive();
var sheet = ss.getSheetByName(sheetName);
var dataRange = sheet.getDataRange();
var numRows = dataRange.getNumRows();
var values = dataRange.getValues();
var delete_string = getUIstring();//open initial UI, save value
if (delete_string.length < 3) return shortStringError()//UI to protect your document from an accidental entry of a very short string.
//removing the rows (start with i=2, so don't delete header row.)
var rowsDeleted = 0;
for (var i = 2; i <= numRows; i++){
var rowValues = values[i-1].toString();//your sheet has various data types, script can be improved here to allow deleting dates, ect.
if (rowValues.indexOf(delete_string) > -1){
sheet.deleteRow(i - rowsDeleted);//keeps loop and sheet in sync
rowsDeleted++;
}
}
postUIconfirm(rowsDeleted);//Open confirmation UI
}
Isolated UI functions to help make above function more concise:
function getUIstring(){
var ui = SpreadsheetApp.getUi();
var response = ui.prompt("Enter the target data element for deletion")
return response.getResponseText()
}
function postUIconfirm(rowsDeleted){
var ui = SpreadsheetApp.getUi();
ui.alert("Operation complete. There were "+rowsDeleted+" rows deleted.")
}
function shortStringError(){
var ui = SpreadsheetApp.getUi();
ui.alert("The string is too short. Enter a longer string to prevent unexpected deletion")
}
I'll just show a way to delete the cell value if it matches your search criteria. It's up to you to connect it to buttons ,etc.
You'll loop through a Sheet Range. When you find the word match, delete it using clearContent()
function deleteSpecificData() {
var ss = SpreadsheetApp.getActiveSpreadsheet();
var sheet = ss.getSheets()[0];
var range = sheet.getRange("Sheet1!A1:C4");
var values = range.getValues();
var numArray = [1,2,3,4,5,6,7,8,9];
var deleteItem = "Garen";
Logger.log(range);
for(var i=0; i< values.length; i++){
for(var j=0; j<values[i].length; j++){
if(values[i][j] == deleteItem){
var row = numArray[i];
var col = numArray[j];
var range = sheet.getRange(row,col).clearContent();
}
}
}
}
Before:
After:
I have a standard HTML page with an CKEditor on it wrapped in a form.
The form submits (POSTS) to Send_Emails.aspx
Send_Emails.aspx reads the content of the FCKEditor into a variable
Dim html As String = Request.Form("ck_content")
Then it sends an email.
Problem
Characters such as:
 -> this seems to show as a special character for blank spaces/carriage returns
’ -> this seems to show as apostrophe's
Can you reccomend some methods to cleanze my post data of these non-standard characters?
Thanks
I figured out how to strip unwanted characters by using this function:
function removeMSWordChars(str) {
var myReplacements = new Array();
var myCode, intReplacement;
myReplacements[8216] = 39;
myReplacements[8217] = 39;
myReplacements[8220] = 34;
myReplacements[8221] = 34;
myReplacements[8212] = 45;
for(c=0; c<str.length; c++) {
var myCode = str.charCodeAt(c);
if(myReplacements[myCode] != undefined) {
intReplacement = myReplacements[myCode];
str = str.substr(0,c) + String.fromCharCode(intReplacement) + str.substr(c+1);
}
}
return str;
}