Is there a fast way to re-format and compile odd data in Excel using VBA or R?

I have over 200 sheets in an Excel workbook that are each formatted in a really odd way and I need to figure out how to compile all the data that I need into a single master sheet. I only need the values from certain cells and ranges (shown in the code below). I'd like the final compiled sheet to be in long-form (see attached image).
There is an attached image that is an example of the format of each sheet - it contains all the cells but does not contain any actual data. In reality, there is a lot of data - some sheets have >1000 rows.
I tried to use a function in R to read in all the sheets as separate data frames so that I could merge them, but I couldn't get it to work. I then tried VBA, but I'm not familiar with the syntax. Here's what I came up with:
Sub Copy_Example()
Dim J As Integer
Dim s As Worksheet
On Error Resume Next
Sheets(1).Select
Worksheets.Add ' add a sheet in first place
Sheets(1).Name = "Combined"
Worksheets("Sheet2").Range("D9").Copy Destination:=Worksheets("Combined").Range("A2")
Worksheets("Sheet2").Range("E2").Copy Destination:=Worksheets("Combined").Range("B2")
Worksheets("Sheet2").Range("E3").Copy Destination:=Worksheets("Combined").Range("C2")
Worksheets("Sheet2").Range("E4").Copy Destination:=Worksheets("Combined").Range("D2")
Worksheets("Sheet2").Range("E5").Copy Destination:=Worksheets("Combined").Range("E2")
Worksheets("Sheet2").Range("C22:C2000").Copy Destination:=Worksheets("Combined").Range("F1")
Worksheets("Sheet2").Range("E22:E2000").Copy Destination:=Worksheets("Combined").Range("G1")
Worksheets("Sheet2").Range("F22:F2000").Copy Destination:=Worksheets("Combined").Range("H1")
Worksheets("Sheet2").Range("G22:G2000").Copy Destination:=Worksheets("Combined").Range("I1")
Worksheets("Sheet2").Range("H22:H2000").Copy Destination:=Worksheets("Combined").Range("J1")
Worksheets("Sheet2").Range("I22:I2000").Copy Destination:=Worksheets("Combined").Range("K1")
End Sub
This VBA copies the correct columns and ranges into a newly created worksheet, but only for Sheet 2. I tried to integrate additional snippets of code so that it would run through all sheets in the workbook and paste the data below the last row previously added, but I can't get it to work. I would also love to be able to add a column with the name of the sheet the data was copied from.
If anyone can help me with this, using either R or VBA, I would really appreciate it.
This is an example of the format of each sheet
This is an example of what I'd like the master compiled sheet to look like
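For the R route the question mentions, here is a minimal sketch. It has not been tested on the real workbook.

It assumes each sheet has been read into a data frame with no header row, so that row r and column c of the data frame correspond to the Excel cell at row r, column c. It picks out the same cells as the macro above:

- D9 and E2:E5 as single values.
- Columns C, E, F, G, H and I from row 22 down as the data block.

It then produces one long table with the sheet name on every row. The function names `compile_sheet()` and `compile_workbook()` are hypothetical. Repeating the single-cell values on each data row is an assumed reading of "long-form".

```r
# Minimal sketch: compile many oddly formatted sheets into one long table.
# Assumes each sheet was read with no header row, so sheet[r, c] is the
# Excel cell in row r, column c. Function names are hypothetical.

col_idx <- function(letters) match(letters, LETTERS)  # "D" -> 4 (A-Z only)

compile_sheet <- function(sheet, sheet_name, first_data_row = 22) {
  sheet <- as.data.frame(sheet, stringsAsFactors = FALSE)  # tibble -> data.frame

  # Single cells the question's macro copies once per sheet: D9 and E2:E5
  header <- list(D9 = sheet[9, col_idx("D")],
                 E2 = sheet[2, col_idx("E")], E3 = sheet[3, col_idx("E")],
                 E4 = sheet[4, col_idx("E")], E5 = sheet[5, col_idx("E")])

  # Data block: columns C, E, F, G, H, I from row 22 to the last row read
  if (nrow(sheet) < first_data_row) return(NULL)
  data_cols <- c("C", "E", "F", "G", "H", "I")
  block <- sheet[first_data_row:nrow(sheet), col_idx(data_cols), drop = FALSE]
  names(block) <- data_cols

  # Keep only rows with at least one non-blank data cell
  filled <- !is.na(block) & block != ""
  block <- block[rowSums(filled) > 0, , drop = FALSE]
  if (nrow(block) == 0) return(NULL)

  # Long form: sheet name and header cells repeated on every data row
  data.frame(sheet = sheet_name, header, block,
             row.names = NULL, stringsAsFactors = FALSE)
}

# `sheets` is a named list of data frames, one per worksheet
compile_workbook <- function(sheets) {
  pieces <- Map(compile_sheet, sheets, names(sheets))
  do.call(rbind, unname(pieces))  # rbind skips sheets that returned NULL
}

# Usage with readxl (not run here; the path is hypothetical):
# library(readxl)
# path <- "workbook.xlsx"
# sheet_names <- excel_sheets(path)
# sheets <- lapply(sheet_names, function(s)
#   read_excel(path, sheet = s, range = "A1:I2000",
#              col_names = FALSE, col_types = "text"))
# names(sheets) <- sheet_names
# combined <- compile_workbook(sheets)
# write.csv(combined, "combined.csv", row.names = FALSE)
```

The readxl lines in the trailing comments assume that an explicit `range` makes `read_excel()` return the full rectangle, blank rows included, so that row numbers line up with Excel's. Check that against your readxl version before relying on it. Reading everything as text (`col_types = "text"`) avoids type clashes when the sheets are stacked.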

Try the below code
Sub CopyToCombined()
Dim oComWS As Worksheet, oWS As Worksheet
Dim iLR As Long: iLR = 1
' Add New sheet as "Combined"
Set oComWS = ThisWorkbook.Worksheets.Add
oComWS.Name = "Combined"
' Loop through all sheets in the workbook and copy details in Combined sheet
For Each oWS In ThisWorkbook.Worksheets
If oWS.Name <> "Combined" Then
With oWS
oComWS.Range("A" & iLR).Value = .Range("A3").Value
oComWS.Range("B" & iLR).Value = .Range("B5").Value
oComWS.Range("C" & iLR).Value = .Range("C26").Value
End With
iLR = iLR + 1
End If
Next
End Sub
The above code goes through all sheets in your workbook and copies the relevant data. You will obviously have to change the cells it copies to the ones you need.
EDIT 1:
As per your requirements, the code below should fill the Combined sheet as you requested:
Sub CopyToCombined()
    Dim oComWS As Worksheet, oWS As Worksheet
    Dim iLR As Long: iLR = 1
    Dim iNext As Long
    Dim iC As Long
    Dim aCleanArray As Variant, aMyRange As Variant, aColumn As Variant
    ' Add New sheet as "Combined"
    Set oComWS = ThisWorkbook.Worksheets.Add
    oComWS.Name = "Combined"
    ' Set arrays
    aMyRange = Array("C20:C50", "D20:D50") ' <-- Set all your ranges here (e.g. "C22:C2000", "E22:E2000", ...)
    aColumn = Array("C", "D") ' <-- Set the columns here (e.g. "F", "G", ...)
    ' Loop through all sheets in the workbook and copy details in Combined sheet
    For Each oWS In ThisWorkbook.Worksheets
        If oWS.Name <> "Combined" Then
            ' Each sheet takes at least one row (for the single-cell values)
            iNext = iLR + 1
            With oWS
                oComWS.Range("A" & iLR).Value = .Range("A2").Value
                oComWS.Range("B" & iLR).Value = .Range("B2").Value
                For iC = LBound(aMyRange) To UBound(aMyRange)
                    aCleanArray = CleanUpArray(.Range(aMyRange(iC)).Value)
                    ' Skip ranges that contain no values at all
                    If IsArray(aCleanArray) Then
                        oComWS.Range(aColumn(iC) & iLR & ":" & aColumn(iC) & (iLR + UBound(aCleanArray))).Value = Application.Transpose(aCleanArray)
                        ' The next sheet starts below the longest column pasted for this one
                        If iLR + UBound(aCleanArray) + 1 > iNext Then iNext = iLR + UBound(aCleanArray) + 1
                    End If
                Next
            End With
            iLR = iNext
        End If
    Next
End Sub
Function CleanUpArray(aIncomingArray As Variant) As Variant
    ' Returns a 0-based 1D array of the non-empty values in the first column
    ' of a 2D range array, or Empty if every cell is empty
    Dim aTemp() As Variant
    Dim iC As Long, iN As Long
    ReDim aTemp(0 To UBound(aIncomingArray) - LBound(aIncomingArray))
    iN = -1
    For iC = LBound(aIncomingArray) To UBound(aIncomingArray)
        If Not IsEmpty(aIncomingArray(iC, 1)) Then
            iN = iN + 1
            aTemp(iN) = aIncomingArray(iC, 1)
        End If
    Next
    If iN < 0 Then
        CleanUpArray = Empty
    Else
        ReDim Preserve aTemp(0 To iN)
        CleanUpArray = aTemp
    End If
End Function
Hope this helps

Related

For each value in column copy offset (0,3) value and paste in another worksheet

I have a worksheet (1) that pulls in sections of a larger worksheet (2) based on a filtered value. I then enter data for various invoices: the invoice number is in column A and the entered data is in column D. I want to run a macro that does the following for each value in column A of worksheet (1):

- Find the same value in worksheet (2).
- Offset 3 columns.
- Paste the data I entered in worksheet (1) into that offset cell.

I found this code, but it doesn't seem to copy the offset data, and when I run it nothing happens:
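The lookup-and-write-back step described above is, at its core, a key match. Here is a hedged illustration in R rather than VBA, with each worksheet modelled as a data frame. `match()` finds the row in the report for each invoice, and the entered values are written there.

The column names `invoice` and `entered` and the function `write_back()` are hypothetical. The "offset 3 columns" target is represented by a named column.

```r
# Hypothetical sketch: copy values entered on one sheet back onto the rows of
# another sheet that have the same invoice number. Sheets are modelled as
# data frames; the column names are assumptions, not from the question.
write_back <- function(entered, report, key = "invoice", value = "entered") {
  idx <- match(entered[[key]], report[[key]])  # matching row in `report`, or NA
  found <- !is.na(idx)                         # invoices present in both sheets
  report[[value]][idx[found]] <- entered[[value]][found]
  report
}

entered <- data.frame(invoice = c("INV2", "INV9"), entered = c(50, 70))
report <- data.frame(invoice = c("INV1", "INV2", "INV3"), entered = NA_real_)
write_back(entered, report)  # only the INV2 row receives a value
```

`match()` returns the first matching row. Duplicate invoice numbers in the report would therefore only receive a value on their first occurrence.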
Dim ws1 As Worksheet, ws2 As Worksheet
Dim lRow1 As Long, lRow2 As Long
Dim Cell As Range, Found As Range
Set ws1 = Worksheets("Commission_Projection")
Set ws2 = Worksheets("Comm_Report")
With ws2
'Find last row in Col A
lRow2 = .Range("A" & .Rows.Count).End(xlUp).Row
'Loop through col A to find values
For Each Cell In .Range("A10:A" & lRow2)
'Search ws1 for Value
Set Found = ws1.Columns(2).Find(What:=Cell.Value, _
After:=ws1.Cells(1, 2), _
LookIn:=xlFormulas, _
LookAt:=xlWhole, _
SearchOrder:=xlByRows, _
SearchDirection:=xlPrevious, _
MatchCase:=False)
If Not Found Is Nothing Then
Cell.Offset(0, 5).Value = ws1.Cells(Found.Row, Found.Column + 1).Value
End If
Next Cell
End With

Excel VBA Value wildcards are not returning a result for find a bold function

In range M:M there is text in each cell. I need to bold every occurrence of a date in the format yyyy-mm-dd in each cell of the range. I have been using the following code to do the same thing for fixed text, but I can't get it to work with wildcards.
I suspect I am not defining the wildcard pattern properly.
Dim rCell As Range, sToFind As String, iSeek As Long
Dim myWord As String
myWord = "202#[-]##[-]##"
sToFind = myWord
For Each rCell In Range("M1:M1000")
iSeek = InStr(1, rCell.Value, sToFind)
Do While iSeek > 0
rCell.Characters(iSeek, Len(sToFind)).Font.Bold = True
iSeek = InStr(iSeek + 1, rCell.Value, sToFind)
Loop
Next
End Sub
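`InStr` compares literal text, so the pattern "202#[-]##[-]##" is searched for character by character rather than treated as a pattern. `Len(sToFind)` is also 14, the length of the pattern, not the 10 characters of a date. VBA's `Like` operator does understand `#` and `[-]`, but it only reports whether a whole string matches, not where.

What `Characters(start, length)` needs is the start position and length of each match, which a regular-expression search provides. Below is a minimal sketch of that search in R. The function name and the general four-digit-year pattern are assumptions; the question's `202#` prefix would be `202[0-9]`.

```r
# Return, for each input string, the start position and length of every
# yyyy-mm-dd date: the two arguments Characters(start, length) takes in VBA.
find_dates <- function(text, pattern = "[0-9]{4}-[0-9]{2}-[0-9]{2}") {
  lapply(gregexpr(pattern, text), function(m) {
    # gregexpr reports "no match" as -1 and an NA input as NA
    if (is.na(m[1]) || m[1] == -1) {
      return(data.frame(start = integer(0), length = integer(0)))
    }
    data.frame(start = as.integer(m), length = attr(m, "match.length"))
  })
}

find_dates(c("Due 2021-03-04 and 2022-12-31.", "no date here"))
```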

Trigger Excel Worksheet_Change to change cells using R

I modify a cell in an Excel file using R.
When this cell is changed manually, Worksheet_Change fires and writes the modification date into two other cells.
When I run my code in R, the modification date in column 46 (see the VBA code ModificationDate1) is written. However, the modification date in column 40 (see the VBA code ModificationDate2) doesn't appear, and I get VBA error 1004.
Excel does not say where the error occurs, and I can't click Debug.
I would like the modification date in column 40 to appear as well. (It does appear when I make the change directly in Excel.)
My code in R:
if (NewInput != CurrentData) {
xlApp <- COMCreate("Excel.Application")
wb <- xlApp[["Workbooks"]]$Open("path.xlsm")
sheet <- wb$Worksheets("Sheet1")
cell <- sheet$Cells(Outputrow + 6, 7)
cell[["Value"]] <- paste0(NewInput)
wb$Save()
xlApp$Quit()
}
My code in VBA:
Private Sub Worksheet_Change(ByVal Target As Range)
'**ModificationDate1**
Dim WorkRng As Range
Dim Rng As Range
LastRow = Sheets("Sheet1").Cells(Rows.Count, 1).End(xlUp).Row
Set Rng = Range(Cells(7, 7), Cells(LastRow, 7))
If Not Intersect(Target, Rng) Is Nothing Then
Call UnprotectCells
Cells(Target.Row, 46).Value = Date
Call ProtectCells
End If
'**ModificationDate2**
Dim rng1 As Range
Dim rng2 As Range
Set rng1 = Range(Cells(7, 3), Cells(LastRow, 4))
Set rng2 = Range(Cells(7, 7), Cells(LastRow, 10))
If Not Intersect(Target, Union(rng1, rng2)) Is Nothing Then
Call UnprotectCells
Cells(Target.Row, 40).Value = Date
Call ProtectCells
End If
End Sub

Splitting a dataframe into parts by detection, then writing to multiple csv's?

I have a csv as shown in the image below. The data is a set of separate tables, separated by a blank line, that I require to be in separate csv files.
After importing to R, I'd like to split the data into the separate tables and then write each table to its own csv file. I had the idea of using some kind of string detection, since a 'new' table is signified by the first instance of 'Area' in the first column. Any ideas on how to approach the code for this in R? There are a lot of tables, so doing this manually isn't advisable.
There also seems to be a truncation problem, as the tables need different numbers of columns. However, I don't expect that getting rid of the NULL or NA data will be too difficult.
Thanks for any help.
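Following the idea in the question, a new table starts at every line whose first field is "Area". A running count of those lines therefore gives each line a table number.

Below is a minimal R sketch. It assumes no field contains an embedded comma and that blank separator lines may be empty or consist only of commas. `split_tables()`, `write_tables()` and the `table_N.csv` file names are hypothetical. Dropping columns that are empty in every row of a table also takes care of the differing column counts.

```r
# Minimal sketch: split one CSV of stacked tables into separate data frames.
# Assumes each table starts on a line whose first field is "Area" and that
# fields contain no embedded commas (no quoted "a,b" values).
split_tables <- function(path, marker = "Area") {
  lines <- readLines(path)
  # Drop blank separator lines, including Excel-style ",,,," lines
  lines <- lines[nzchar(trimws(gsub(",", "", lines)))]
  first_field <- gsub('"', "", sub(",.*$", "", lines))
  table_id <- cumsum(first_field == marker)  # increments at each "Area" row
  keep <- table_id > 0                       # ignore anything before the first table
  lapply(split(lines[keep], table_id[keep]), function(chunk) {
    df <- read.csv(text = chunk, stringsAsFactors = FALSE)
    # Drop padding columns that are empty in every row of this table
    filled <- !is.na(df) & df != ""
    df[, colSums(filled) > 0, drop = FALSE]
  })
}

# Write each table to its own CSV; returns the file paths invisibly
write_tables <- function(tables, out_dir = tempdir()) {
  paths <- file.path(out_dir, paste0("table_", seq_along(tables), ".csv"))
  for (i in seq_along(tables)) write.csv(tables[[i]], paths[i], row.names = FALSE)
  invisible(paths)
}
```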
I don't think R is the right tool for this kind of thing; you should always try to pick the tool based on the task. Since you have Excel installed, run this VBA script. It will do what you want.
Sub page_endings()
Dim i As Long 'row counter
Dim searchvalue_for_break_after 'value in column A that ends a table
searchvalue_for_break_after = "" 'blank: the empty line between tables
'a page break is added after every row whose column A equals this value
'(a non-blank marker value works the same way, e.g. on rows 6, 12, 18, 24)
'loops to one row past the last non-empty cell in column A
For i = 1 To Range("A" & Rows.Count).End(xlUp).Row + 1
If Range("A" & i).Value = searchvalue_for_break_after Then
'will add a pagebreak after the row with value break after
ActiveWindow.SelectedSheets.HPageBreaks.Add before:=Range("A" & i).Offset(1)
End If
Next i
Call Create_Separate_Sheet_For_Each_HPageBreak
End Sub
Sub Create_Separate_Sheet_For_Each_HPageBreak()
Dim HPB As HPageBreak
Dim RW As Long
Dim PageNum As Long
Dim Asheet As Worksheet
Dim Nsheet As Worksheet
Dim Acell As Range
'Sheet with the data, you can also use Sheets("Sheet1")
Set Asheet = ActiveSheet
If Asheet.HPageBreaks.Count = 0 Then
MsgBox "There are no HPageBreaks"
Exit Sub
End If
With Application
.ScreenUpdating = False
.EnableEvents = False
End With
'When the macro is ready we return to this cell on the ActiveSheet
Set Acell = Range("A1")
'Because of this bug we select a cell below your data
'http://support.microsoft.com/default.aspx?scid=kb;en-us;210663
Application.Goto Asheet.Range("A" & Rows.Count), True
RW = 1
PageNum = 1
For Each HPB In Asheet.HPageBreaks
'Add a sheet for the page
With Asheet.Parent
Set Nsheet = .Worksheets.Add(after:=.Sheets(.Sheets.Count))
End With
'Give the sheet a name
On Error Resume Next
Nsheet.Name = "Page " & PageNum
If Err.Number > 0 Then
MsgBox "Change the name of : " & Nsheet.Name & " manually"
Err.Clear
End If
On Error GoTo 0
'Copy the cells from the page into the new sheet
With Asheet
.Range(.Cells(RW, "A"), .Cells(HPB.Location.Row - 1, "K")).Copy _
Nsheet.Cells(1)
End With
' If you want to make values of your formulas use this line also
' Nsheet.UsedRange.Value = Nsheet.UsedRange.Value
RW = HPB.Location.Row
PageNum = PageNum + 1
Next HPB
Asheet.DisplayPageBreaks = False
Application.Goto Acell, True
With Application
.ScreenUpdating = True
.EnableEvents = True
End With
Call SaveWorksheetsAsCsv
End Sub
Sub SaveWorksheetsAsCsv()
Dim WS As Excel.Worksheet
Dim SaveToDirectory As String
Dim CurrentWorkbook As String
Dim CurrentFormat As Long
' Store current details for the workbook
CurrentWorkbook = ThisWorkbook.FullName
CurrentFormat = ThisWorkbook.FileFormat
SaveToDirectory = "C:\Users\Excel\Desktop\"
For Each WS In ThisWorkbook.Worksheets
Sheets(WS.Name).Copy
ActiveWorkbook.SaveAs Filename:=SaveToDirectory & ThisWorkbook.Name & "-" & WS.Name & ".csv", FileFormat:=xlCSV
ActiveWorkbook.Close savechanges:=False
ThisWorkbook.Activate
Next
' Temporarily turn alerts off to prevent the user being prompted
' about overwriting the original file.
Application.DisplayAlerts = False
ThisWorkbook.SaveAs Filename:=CurrentWorkbook, FileFormat:=CurrentFormat
Application.DisplayAlerts = True
End Sub
You should move each table to the top of the sheet. In all, you have 5 tables with different dimensions (Table1: 11x13; Table2: 11x9; Table3: 3x12; Table4: 10x5; Table5: 6x7). Place them side by side starting in the first row (A1:M11, N1:V11, etc.), so that each table's headings are in row 1.
library(readxl)
# Use the path returned from getwd() function that is R's working directory
df <- as.data.frame(read_excel("C://Users//User//Documents//Revolution//Your.xlsx"))
Then, you can handle these 5 tables as:
Table1 <- df[,1:13]
Table2 <- df[,14:22]
Table3 <- df[1:3,23:34]
Table4 <- df[1:10,35:39]
Table5 <- df[1:6,40:46]
Because each assignment selects only the rows and columns that its table actually occupies, you do not get any NA or NULL values in Table1 ... Table5.

count the unique values in one column in EXCEL 2010 or R with 1 million rows

After searching the forum, I did not find a good solution for this question. If I missed it, please tell me.
I need to count the unique values in one column in EXCEL 2010.
The worksheet has 1 million rows and 10 columns. All cell values are string or numbers.
I used the solution at Count unique values in a column in Excel
=SUMPRODUCT((A2:A1000000<>"")/COUNTIF(A2:A1000000,A2:A1000000&""))
But it runs for so long that Excel almost freezes, and it spawns 25 processes in Windows 7.
Are there more efficient ways to do it?
Also, all values in the column have the format
AX_Y
where A is a character, X is an integer, and Y is an integer from 1 to 10.
For example, A5389579_10
I need to cut off everything from the underscore onwards (including the underscore). For the example, that gives
A5389579
This is what I need to count as unique values across all cells in the column.
For example, given
A5389579_10
A1543848_6
A5389579_8
there are 2 unique values after removing everything from the underscore onwards.
How can I do this in Excel VBA, and in R if there is no efficient solution in Excel?
If you want to do this in VBA, you can take advantage of the Collection object. A Collection does not allow duplicate keys. If you add every value with itself as the key and ignore the error raised for duplicates, only the distinct values remain.

The code below takes all the values in a selected range and strips everything from the underscore onwards. It then outputs an array of the distinct values to another sheet (in this case a sheet named Output).
Sub ReturnDistinct()
Dim Cell As Range
Dim i As Long ' Long, not Integer: an Integer overflows past 32,767 distinct values
Dim DistCol As New Collection
Dim DistArr()
Dim OutSht As Worksheet
Dim LookupVal As String
Set OutSht = ActiveWorkbook.Sheets("Output") '<~~ Define sheet to output array
If TypeName(Selection) <> "Range" Then Exit Sub
'Add all distinct values to collection
For Each Cell In Selection
If InStr(Cell.Value, "_") > 0 Then
LookupVal = Mid(Cell.Value, 1, InStr(Cell.Value, "_") - 1)
Else
LookupVal = Cell.Value
End If
On Error Resume Next
DistCol.Add LookupVal, CStr(LookupVal)
On Error GoTo 0
Next Cell
'Write collection to array
ReDim DistArr(1 To DistCol.Count, 1 To 1)
For i = 1 To DistCol.Count Step 1
DistArr(i, 1) = DistCol.Item(i)
Next i
'Outputs distinct values
OutSht.Range("A1:A" & UBound(DistArr)).Value = DistArr
End Sub
Note that since this code writes all the distinct values to a single column in the OutSht-sheet, this will return an error if there are more than 1,048,576 distinct values in your dataset. In that case you would have to split the data to be filled into multiple output columns.
For your specific request to count, use the function below in a formula like =COUNTA(GetUniques(LEFT(A1:A100000,FIND("_",A1:A100000)-1))), entered as an array formula with Ctrl+Shift+Enter.
It also accepts multiple ranges / values (e.g. =GetUniques(A1:A10,B2:E4)).
Function GetUniques(ParamArray args())
Dim arg, ele, arr, i As Long
Dim c As Collection
Set c = New Collection
For Each arg In args
If TypeOf arg Is Range Then
If arg.Count = 1 Then
arr = array(arg.value)
Else
arr = arg.Value
End If
ElseIf VarType(arg) > vbArray Then
arr = arg
Else
arr = Array(arg)
End If
For Each ele In arr
On Error Resume Next
c.Add ele, VarType(ele) & "|" & CStr(ele)
On Error GoTo 0
Next ele
Next arg
If c.Count > 0 Then
ReDim arr(0 To c.Count - 1)
For i = 0 To UBound(arr)
arr(i) = c(i + 1)
Next i
Set c = Nothing
GetUniques = arr
End If
End Function
edit: added a performance optimisation for ranges (loads them at once into an array - much faster than enumerating through a range)
In R:
# sample data
df <- data.frame(x=1:1000000,
y=sample(1e6:(1e7-1),1e6,replace=TRUE))
df$y <- paste0("A",df$y,"_",sample(1:10,1e6,replace=TRUE))
# this does the work...
length(unique(sub("_[0-9]+","",df$y)))
# [1] 946442
# and it's fast...
system.time(length(unique(sub("_[0-9]+","",df$y))))
# user system elapsed
# 2.01 0.00 2.02
In Excel 2010, add a formula in the next column (if the original data is in A:A, put it in B1):
=1/COUNTIF(A:A,A1)
Copy it down column B to the bottom of your data. Depending on your PC it may chug away calculating for a long time, but it will work. Then copy column B and paste values over itself.
Then SUM column B. A value that occurs k times contributes k * (1/k) = 1, so the total is the number of distinct values.
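Why the sum works can be checked in a few lines of R. The sketch applies the reciprocal count to the prefix before the underscore, as the question asks. That is an adaptation, since the formula above counts whole cell values. `count_distinct_reciprocal()` is a hypothetical name, and `ave()` plays the role of COUNTIF for each row.

```r
# Each row gets 1 / (number of rows sharing its key), like =1/COUNTIF(A:A,A1).
# A key that occurs k times contributes k * (1/k) = 1, so the column sum is the
# number of distinct keys. Here the key is the part before the underscore.
count_distinct_reciprocal <- function(x) {
  key <- sub("_.*$", "", x)                                  # "A5389579_10" -> "A5389579"
  per_row_count <- ave(rep(1, length(key)), key, FUN = sum)  # COUNTIF for each row
  round(sum(1 / per_row_count))                              # round off float error
}

count_distinct_reciprocal(c("A5389579_10", "A1543848_6", "A5389579_8"))
```

This is only a check of the identity. For actual counting, `length(unique(...))` as in the R answer above is simpler.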
