Using R to Automate Filename Retrieval in a Microsoft Word Table - r

I have a large table within a Microsoft Word document.
The majority of rows, but not all, have a single Microsoft Word file attached.
My job is to go into each row and manually type in the file name where an attachment is provided.
Is there any way to automate this task using an R package? For example, for each row that has a file attachment, automatically pull the filename and record it in the field directly to its left?
This is what the table looks like. The files are in the rightmost column. The column to its left is where I will be typing the filenames.
I've tried importing the docx file using the docxtractr package, but it is not reading in the filenames properly. Instead, it is replacing them with \s.
library(docxtractr)

ievs_raw <- read_docx("ievs-raw.docx")
tbls <- docx_extract_all_tbls(ievs_raw)
View(as.data.frame.list(tbls))
This produces output with \s where there should be filenames like CAP_ATT_H.11.114.docx, etc.
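For anyone exploring the R route further: a .docx file is just a zip archive, and any embedded attachments are stored under word/embeddings/ inside it, so base R can at least list those entries. The sketch below is only a starting point and an assumption on my part: the entry names in the archive are often generic (oleObject1.bin and the like) rather than the original filenames shown on the icons, and in that case only Word itself can recover the icon labels, which is what the macro further down does.

# A .docx file is a zip archive; embedded attachments (if any) are stored
# under word/embeddings/. List those archive entries without extracting them.
docx_path <- "ievs-raw.docx"   # same file as in the example above
entries  <- unzip(docx_path, list = TRUE)
embedded <- entries[grepl("^word/embeddings/", entries$Name), ]
# Note: these names may be generic rather than the original attachment filenames.
print(embedded$Name)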

I wasn't able to figure this out using an R package, but the kind people at the Microsoft Community Forum helped out by providing a super useful Visual Basic Macro. What's great about this is it can accommodate cases where there is more than 1 attachment in a particular row.
Sub ObjectNames()
    Dim ILS As InlineShape
    Dim nObj As Long
    Dim strName As String
    Dim col As Long
    Dim row As Long

    With ActiveDocument.Tables(1)
        col = .Columns.Count
        For row = 1 To .Rows.Count
            strName = ""
            'Loop through all shapes in this row's last cell
            '(if there are none, the loop does nothing)
            For nObj = 1 To .Cell(row, col).Range.InlineShapes.Count
                Set ILS = .Cell(row, col).Range.InlineShapes(nObj)
                If Not ILS.OLEFormat Is Nothing Then
                    'Build up a string with as many names as there are
                    'embedded objects, separated by paragraph marks (vbCr)
                    If nObj > 1 Then strName = strName & vbCr
                    strName = strName & ILS.OLEFormat.IconLabel
                End If
            Next nObj
            If Len(strName) > 0 Then
                .Cell(row, col - 1).Range.Text = strName
            End If
        Next row
    End With
End Sub

Related

How to copy workbook (.aspx file) from html link to current workbook

I'm having trouble with the following task in Excel VBA:
At my work, we use a document management platform called TeamShare: https://www.lector.dk/en/products/
I want to write VBA code that loops over a range of links to this document management platform in my workbook, i.e. it opens each linked workbook and copies a specified sheet into my current workbook.
I have tried putting together bits of code from other sites, and the code works just fine when I step through it in break mode. However, when I run the code all at once, the Excel program reopens, so the current workbook cannot "communicate" with the opened workbook and I end up in an infinite loop (so no direct error message).
This is the code that only works in break mode:
Sub CopySheetFromLinks()
    Dim wbCopyTo As Workbook, wb As Workbook, book As Workbook
    Dim wsCopyTo As Worksheet, ws As Worksheet
    Dim i As Long, Count As Long, WBCount As Long, LastRow As Long
    Dim A As Long, B As Long
    Dim URL As String, DocID As String
    Dim IE As Object, doc As Object, objElement As Object, objCollection As Object

    Set wbCopyTo = ActiveWorkbook
    Set wsCopyTo = ActiveSheet

    LastRow = wsCopyTo.Range("B" & Rows.Count).End(xlUp).Row

    For i = 2 To LastRow
        Set IE = CreateObject("InternetExplorer.Application")
        IE.Visible = True

        'The purpose of this piece of code is to get the DocID
        A = InStr(wsCopyTo.Range("B" & i), "documentid=") + Len("documentid=")
        B = InStrRev(wsCopyTo.Range("B" & i), "&")
        DocID = Mid(wsCopyTo.Range("B" & i), A, B - A)

        'Get URL
        URL = wsCopyTo.Range("B" & i)

        'Count number of open workbooks
        WBCount = Workbooks.Count

        With IE
            '.Navigate is the command that opens the Excel sheet. This works as
            'planned in break mode, but the Excel program reopens when I run the
            'code all at once. I have tried other commands here: Workbooks.Open
            '(couldn't get it to open the file) and Application.FollowHyperlink
            '(also only worked in break mode, and much, much slower).
            .Navigate URL

            'This was my solution to stop the rest of the code from executing
            'until the new workbook has loaded.
            Do Until Workbooks.Count = WBCount + 1: Loop
        End With

        'Unload IE
        Set IE = Nothing
        Set objElement = Nothing
        Set objCollection = Nothing

        'In order to activate the workbook from the URL, I loop over all my open
        'workbooks and match them on their unique Document ID. I found that the
        'workbook from the URL wasn't the active workbook by default.
        For Each book In Workbooks
            If Mid(book.Name, 12, Len(DocID)) = DocID Then
                book.Activate
                Set wb = ActiveWorkbook
                Set ws = ActiveSheet
            End If
        Next book

        'Here I copy the desired sheet to my initial workbook
        wb.Worksheets("SpecificSheetIWantToCopy").Copy After:=wbCopyTo.Worksheets("Sheet1")
        wbCopyTo.Sheets(ActiveSheet.Name).Name = DocID
    Next i
End Sub
I am using Excel 2010.
I hope you can help me resolve this problem. Please ask if you need any more information that I haven't provided.
Thanks in advance.

Upload large file to Microsoft Access

I am quite new to setting up an MS Access database. I'm wondering whether there is a way to upload a comma-delimited file with more than 1.5 million rows while ignoring the first 3 lines (file header) and the last row (footer).
The column header row for the file's content is on the 4th line.
I finally worked it out myself.
The header and footer have a different number of columns from the data rows.
I used the Line Input statement to check each line of my text file.
Here is my code:
Sub FileUpload_CMP_Funding()
    Dim sFile As String, sText As String
    Dim dText As Variant
    Dim db As Database
    Dim rst As Recordset2

    sFile = "C:\NotBackedUp\testfile\CMPFUNding.out"
    Open sFile For Input As #1

    Do While Not EOF(1)
        Line Input #1, sText
        dText = Split(sText, ",")

        'My main content has 24 columns
        If UBound(dText) - LBound(dText) + 1 = 24 Then
            If dText(0) <> "Product ID" Then 'skip the header row on the 4th line
                Set db = CurrentDb
                Set rst = db.OpenRecordset("tblCMP_Funding", dbOpenDynaset)
                rst.AddNew
                rst!ProductID = Trim(Replace(dText(0), """", ""))
                rst!FundID = Trim(Replace(dText(1), """", ""))
                'Update whatever field is required to be updated
                rst.Update
                Set db = Nothing
                Set rst = Nothing
            End If
        End If
    Loop

    Close #1
End Sub
Hope it helps anyone who has the same requirement.

Excel column values being appended as rows instead of columns into datatable

I have an Excel file which I need to read through and extract particular values of a certain range into a datatable so I can then save that data into a table in a database.
Whilst debugging, on every loop I check the DataTable visualizer to see what's going on, and I find that I'm appending values from a different row to the same row. Example in the photo.
SamplePhoto
Here is the code responsible for that action (it is surrounded by a Try-Catch):
Using excel As New ExcelPackage(ulTarget.PostedFile.InputStream)
    Dim _worksheet = excel.Workbook.Worksheets.First()
    Dim _hasHeader = True
    For Each cell In _worksheet.Cells(1, 2, 147, 4)
        _dataTable.Columns.Add(If(_hasHeader, cell.Value, String.Format("{0}", cell.Start.Column)))
        If _worksheet.Cells.Value Is Nothing Then
            Continue For
        End If
    Next
End Using
Assume that the range (1, 2, 147, 4) is correct, as the data going into the DataTable is correct; the row separation is simply the problem. _dataTable is my DataTable (obvious, I know, but nothing bad in clarifying it), and _hasHeader is set to True because the Excel worksheet being uploaded has headers and I don't want them put into the DataTable, since the data will all end up in a SQL Server table where appropriate column names already exist.
Also, ulTarget is my file uploader. I am using the most recent version of EPPlus for this.
Does anybody have any suggestions as to how I can separate the data into rows as per the example in the photo above? Happy to make any clarifications if needed.
I added the horizontal range of the columns, without the headers, and read each cell row by row.
Using excel As New ExcelPackage(ulTargetFile.PostedFile.InputStream)
    'Add the columns
    Dim _worksheet = excel.Workbook.Worksheets.First()
    Dim _hasHeader As Boolean = False
    For Each col In _worksheet.Cells("B1:D1")
        _dataTable.Columns.Add(If(_hasHeader, Nothing, col.Value))
    Next

    'Add the rows
    For rowNumber As Integer = 1 To 147
        Dim worksheetRow = _worksheet.Cells(rowNumber, 2, rowNumber, 4)
        Dim row As DataRow = _dataTable.Rows.Add()
        For Each cell In worksheetRow
            row(cell.Start.Column - 2) = cell.Value
        Next
    Next
End Using

Exporting data from PowerPivot

I have an enormous PowerPivot table (839,726 rows), and it is simply too big to copy and paste into a regular spreadsheet. I have tried copying it and then reading it directly into R using the line data = read.table("clipboard", header = T), but neither of these approaches works. I am wondering if there is some add-on or method I can use to export my PowerPivot table as a CSV or .xlsx file? Thanks very much.
Select the whole PowerPivot table.
Copy the data.
Paste the data into a text file (for example PPtoR.txt).
Read the text file into R using a tab delimiter: read.table("PPtoR.txt", sep="\t"...)
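As a small sketch of a slightly fuller read.table() call (PPtoR.txt is just the example file name from the steps above, and header = TRUE assumes the column headers were copied along with the data):

# Read the pasted PowerPivot data back into R from the tab-delimited text file.
pp <- read.table("PPtoR.txt",
                 sep = "\t",                # tab-delimited, as pasted from PowerPivot
                 header = TRUE,             # assumes the header row was copied too
                 quote = "",                # don't treat quotes in the data as delimiters
                 stringsAsFactors = FALSE)  # keep text columns as character vectors
str(pp)   # quick sanity check of the imported columns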
To get a PowerPivot table into Excel:
Create a pivot table based on your PowerPivot data.
Make sure that the pivot table you created has something in the Values area, but nothing in the Filters, Columns or Rows areas.
Go to Data > Connections.
Select your Data Model and click Properties.
On the Usage tab, under OLAP Drill Through, set the maximum number of records to retrieve as high as you need (the maximum is 9,999,999 records).
Double-click the values area of the pivot table to drill through.
Another solution is to:
Import the PowerPivot model into Power BI Desktop.
Export the results from Power BI Desktop using a PowerShell script.
Here is an example:
https://github.com/djouallah/PowerBI_Desktop_Export_CSV
A pure Excel / VBA solution is below. It is adapted from code posted elsewhere to use FileSystemObject and write 1,000 rows at a time to the file. You'll need to add the Microsoft ActiveX Data Objects Library and Microsoft Scripting Runtime as references.
Option Explicit

Public FSO As New FileSystemObject

Public Sub ExportToCsv()
    Dim wbTarget As Workbook
    Dim ws As Worksheet
    Dim rs As Object
    Dim sQuery As String

    'Suppress alerts and screen updates
    With Application
        .ScreenUpdating = False
        .DisplayAlerts = False
    End With

    'Bind to active workbook
    Set wbTarget = ActiveWorkbook

    Err.Clear
    On Error GoTo ErrHandler

    'Make sure the model is loaded
    wbTarget.Model.Initialize

    'Send query to the model
    sQuery = "EVALUATE <Query>"
    Set rs = CreateObject("ADODB.Recordset")
    rs.Open sQuery, wbTarget.Model.DataModelConnection.ModelConnection.ADOConnection

    Call WriteRecordsetToCSV(rs, "<ExportPath>", True)

    rs.Close
    Set rs = Nothing

ExitPoint:
    With Application
        .ScreenUpdating = True
        .DisplayAlerts = True
    End With
    Set rs = Nothing
    Exit Sub

ErrHandler:
    MsgBox "An error occurred - " & Err.Description, vbOKOnly
    Resume ExitPoint
End Sub

Public Sub WriteRecordsetToCSV(rsData As ADODB.Recordset, _
                               FileName As String, _
                               Optional ShowColumnNames As Boolean = True, _
                               Optional NULLStr As String = "")
    'Writes the recordset out as a .CSV file
    'Option: include the column titles as a header row
    Dim TxtStr As TextStream
    Dim K As Long, CSVData As String

    'Open file
    Set TxtStr = FSO.CreateTextFile(FileName, True, True)

    If ShowColumnNames Then
        For K = 0 To rsData.Fields.Count - 1
            CSVData = CSVData & ",""" & rsData.Fields(K).Name & """"
        Next K
        CSVData = Mid(CSVData, 2) & vbNewLine
        TxtStr.Write CSVData
    End If

    Do While rsData.EOF = False
        'Pull up to 1,000 rows at a time, quoted and comma-separated
        CSVData = """" & rsData.GetString(adClipString, 1000, """,""", """" & vbNewLine & """", NULLStr)
        CSVData = Left(CSVData, Len(CSVData) - IIf(rsData.EOF, 3, 2))
        TxtStr.Write CSVData
    Loop

    TxtStr.Close
End Sub
Here is a lovely low-tech way:
https://www.sqlbi.com/articles/linkback-tables-in-powerpivot-for-excel-2013/
I think the process is a little different in Excel 2016. If you have Excel 2016, you just go to the data tab, go to Get External Data, and then Existing Connections (and look under Tables).
The other important thing is to click on Unlink (under Table Tools - Design - External Table Data). This unlinks it from the source data, so it really is just an export.
You can copy that data into another workbook should you wish to.
Since the data in Power Pivot is modeled, you can use DAX Studio to export it to CSV or SQL. When it finishes, each table in the model corresponds to a CSV file or SQL table.

Prevent scientific notation when creating an Excel file in ASP.NET using Response

I'm creating an Excel file using the Response object in an ASP.NET web application. Some of the long numeric values are being converted to scientific notation. I would like to keep the code I'm using because it prevents the time-out issues I ran into due to the size of the data. Can someone offer any advice on how to modify the existing code to prevent columns from being converted to scientific notation?
Response.Clear()
Response.Buffer = True
Response.AddHeader("content-disposition", "attachment;filename=test.csv")
Response.Charset = ""
Response.Cache.SetCacheability(HttpCacheability.NoCache)
Response.ContentType = "application/vnd.xls"
Try
    sqlconn.Open()
    Dim dr As SqlDataReader = sqlcmd.ExecuteReader()
    Dim sb As New StringBuilder()
    'Add Header
    For count As Integer = 0 To dr.FieldCount - 1
        If dr.GetName(count) IsNot Nothing Then
            sb.Append(dr.GetName(count))
        End If
        If count < dr.FieldCount - 1 Then
            sb.Append(",")
        End If
    Next
    Response.Write(sb.ToString() & vbLf)
    Response.Flush()
    'Append Data
    While dr.Read()
        sb = New StringBuilder()
        For col As Integer = 0 To dr.FieldCount - 2
            If Not dr.IsDBNull(col) Then
                sb.Append(dr.GetValue(col).ToString().Replace(",", " "))
            End If
            sb.Append(",")
        Next
        If Not dr.IsDBNull(dr.FieldCount - 1) Then
            sb.Append(dr.GetValue(dr.FieldCount - 1).ToString().Replace(",", " "))
        End If
        Response.Write(sb.ToString() & vbLf)
        Response.Flush()
    End While
    dr.Dispose()
Catch ex As Exception
Finally
    sqlconn.Close()
End Try
Response.End()
As far as I know, the issue happens because Excel reads your data and guesses at what type it is; if it decides that values which are really text containing digits are numbers, it starts applying scientific notation.
The only way I know of to force this to not happen is to use the Excel API and force a column to a particular format, like this:
xlApp.Columns("A:A").Select()
xlApp.Selection.NumberFormat = "@"
Note: This will select column A in your worksheet and then force the number format to text.
Hi, I know this question is a bit old, but I just want to share my idea on how to prevent the scientific notation in Excel-generated files. I have tried the "@" format, but the cells in the respective row/column where I applied that .NumberFormat get a warning icon on them, saying that converting a number to text is invalid.
The solution I personally used to fix that warning is to use "#" instead of "@". Here is a sample:
mySheet.Range("A:A").NumberFormat = "#"
I think "@" is meant for the text data type only, which is why the warning occurs.
Hope this helps.
