Copy Row if Sheet1 A contains part of Sheet2 C - rss

So I'm trying to pull the data in a row from a separate sheet (Sheet2), if part of Col A contains the date that is in Sheet1!C1.
Col A ex: "Build 251 at Fri Jun 12 03:03:49 2015"
Col C1 ex: "Fri Jun 12" (Changes date every couple days)
I've tried these formulas but they don't work. The errors I get back are "finished with no results"; "error filter has mismatched range sizes"; "there is no ColumnA"; "formula parse error"
=filter("'GitHub-Changelog'!A", ("'GitHub-Changelog'!A" = 'x64 RSS Data'!C2))
=QUERY('GitHub-Changelog'!A:F,"select * where A contains '(TRANSPOSE(" "&C1:C&" "))'")
=FILTER('GitHub Changelog'!A,MMULT(SEARCH(TRANSPOSE(" "&'x64 RSS Data'!C1:C&" ")," "&'GitHub-Changelog'!A1:A&" "),SIGN(ROW('GitHub-Changelog'!A1:A))))
I'm not sure why I'm not getting results; the date is in A. If I use this =QUERY('GitHub-Changelog'!A:F,"select * where A contains 'Fri Jun 12'") it prints out the single row, so it's just not reading C1 for some reason, and I need it to be dynamic to match whatever C1 changes to.
*The true future ideal goal would be to check Sheet1!C against Sheet2!A: if part of A contains C, then copy the whole row (Sheet2!A:F) into a single cell (Sheet1!E). Something along the lines of IF Sheet2!A contains Sheet1!C1 THEN Sheet1!E = Sheet2!D&C&B. I believe that needs full script writing to accomplish, so I'm not sure how to do it yet, but I will learn; one thing at a time though (just thought I'd share a better version of what I'm trying to accomplish).
Here is the sheet I'm working on: https://docs.google.com/spreadsheets/d/1lPOwiYGBK0kSJXXU9kaQjG7WNHjnNuxy25WCUudE5sk/edit?usp=sharing. It pulls multiple pages onto different sheets, then cleans up the data on separate pages. The plan is to have an update sheet that searches the changelog info for the date of the current build and puts that data next to the build, so the final sheet will show the most recent build plus the commit changes for that nightly build. That's where this function is being used: to scrape the changelog for the same date.

See if this works:
=query('GitHub-Changelog'!A:F; "where A contains '"&C1&"' ")
where C1 (on the same sheet as the formula) is the cell that holds the date (ex: Fri Jun 12).

You don't need to surround the range with quotes.
Also, you can use Find() in your filter to check whether that date is present in the string.
Here is a working Filter formula:
=FILTER('GitHub-Changelog'!A:F, Find('x64 RSS Data'!C1,'GitHub-Changelog'!A:A))

Related

R read csv with comma in column

Update 2020-5-14
Working with a different but similar dataset from here, I found read_csv seems to work fine. I haven't tried it with the original data yet though.
Although the replies didn't help solve the problem because my question was not correct, Shan's reply fits the original question I posted the most, so I accepted his answer.
Update 2020-5-12
I think my original question is not correct. As mentioned in the comment, the data was quoted. Although changing the separator made row 11582 in R look the same as row 11583 in Excel, it doesn't mean it's "right". Maybe there is some incorrect line switch due to inappropriate encoding or something, causing some of the columns to be displaced. If I open the data with Notepad++, the instance at row 11583 in Excel is at row 11596.
Original question
I am trying to read the listings.csv from this dataset in Kaggle into R. I downloaded the file and wrote the code read.csv('listing.csv'). The first column, the id column, is supposed to be numeric. However, it shows:
listing$id[1:10]
[1] 2015 2695 3176 3309 7071 9991 14325 16401 16644 17409
13129 Levels: Ole Berl穩n!,16736423,Nerea,Mitte,Parkviertel,52.55554132116211,13.340658248460871,Entire home/apt,36,6,3,2018-01-26,0.16,1,279\n17312576,Great 2 floor apartment near Friederich Str MITTE,116829651,Selin,Mitte,Alexanderplatz,52.52349354926847,13.391003496971203,Entire home/apt,170,3,31,2018-10-13,1.63,1,92\n17316675,80簡 m of charm in 3 rooms with office space,116862833,Jon,Neuk繹lln,Schillerpromenade,52.47499080234379,13.427509313575928...
I think it is because there are values with commas in the second column. For example, opening the file with Microsoft Excel, I can see one of the values in the second column is Ole,Ole...
How can I read a csv file into R correctly when some values contain commas?
Since you have access to the data in Excel, you can 'Save As' in Excel with a separator other than comma (,). First go into Control Panel -> Region and Language -> Additional settings, where you can change the "List separator". The most common one other than comma is the pipe symbol (|). In R, when you read the csv, specify the separator as '|'.
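For example, assuming the file has been re-saved from Excel with '|' as the list separator, a minimal read sketch would be:
# assumes listings.csv was re-exported with '|' as the separator
listings <- read.csv("listings.csv", sep = "|", stringsAsFactors = FALSE)
str(listings$id)  # check whether id now parses as numeric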
You could try this?
listings <- read.csv("listings.csv", stringsAsFactors = FALSE)
listings$name <- gsub(",", "", listings$name)  # this will remove the commas in the name column
If you don't need the information in the second column, then you can always delete it (in Excel) before importing into R. The read.csv function, which calls scan, can also omit unwanted columns using the colClasses argument. However, the fread function from the data.table package does this much more simply with the drop argument:
library(data.table)
listings <- fread("listings.csv", drop=2)
If you do need the information in that column, then other methods are needed (see other solutions).
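For completeness, here is a rough sketch of the colClasses route mentioned above; the column count of 16 is an assumption about listings.csv and should be adjusted to the actual file:
cls <- rep(NA, 16)   # NA leaves a column's class to be guessed as usual
cls[2] <- "NULL"     # "NULL" tells read.csv to skip the 2nd (name) column
listings <- read.csv("listings.csv", colClasses = cls, stringsAsFactors = FALSE)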

USQL reading last n days when file name pattern does not have day part

In the data lake I have file names with the pattern yyyyMM_data.csv. Now I want to read the previous 3 days of data. I am using the code below:
DECLARE @ReportDate DateTime = DateTime.Parse("05/08/2017");
DECLARE @FeatureSummaryInput string = @"/FolderPath/{InputFileDate:yyyy}{InputFileDate:MM}_data.csv";
@FeaturedUsed =
    EXTRACT Id string, InputFileDate DateTime
    FROM @FeatureSummaryInput
    USING Extractors.Csv(silent : true, skipFirstNRows : 1);
@FeaturedUsed =
    SELECT *
    FROM @FeaturedUsed
    WHERE InputFileDate BETWEEN @ReportDate.AddDays(-3) AND @ReportDate;
If I run the above code it runs with empty input. Please let me know if I am missing something. Why is it not reading the correct file?
It seems like we must have a "day" part in the file name pattern for this to work.
Possibly I am missing something, but as you cast InputFileDate to DateTime it defaults to the first of the month, since no day is specified. For your test ReportDate set to 05/08/2017, your WHERE clause basically evaluates to BETWEEN 2017-08-02 AND 2017-08-05, which will never be true.
Where do you expect the day element to come in with your files structured as yyyyMM?

Modify the dates in a huge file (around 1000 rows) using script

I have a requirement in which I need to subtract x days from the dates present in a delimited file, excluding the first and last rows. If a date does not exist in the specified field, it should be ignored.
For example, aaa.txt contains
header
abc|20160431|dhadjs|20160325|hjkkj|kllls
ddd||dhajded|20160320|dwdas|hfehf
footer
I want the modified file to have 10 days subtracted from each date, something like below:
header
abc|20160421|dhadjs|20160315|hjkkj|kllls
ddd||dhajded|20160310|dwdas|hfehf
footer
I don't want to use a programming language like Java to read the file but rather use a scripting language on unix. Any suggestions on how this can be done?

New cell with range between two cells

I have a cell with vehicle model years (starting in A2 and ending in B2); for example, A2 is 1997 and B2 is 2001. How can I make the single cell C2 show the range of years, for example 1997 1998 1999 2000 2001? I need this for a shopping cart that will query specific years. My files have about 15,000 rows, so I need to do this with a formula.
I need the end result to go into either the single cell C2 or to populate single cells to the right, e.g. C2, D2, E2, etc. It is preferred to have them in a single cell separated by a space, not a comma, as this will be uploaded as a .csv file.
In C2 enter:
=A2
In C3 enter:
=IF( MAX($C$1:C2)=$B$2,"",C2+1)
and copy down
EDIT#1:
In order to place the result into a single cell run the following macro rather than using formulas:
Sub SpanOfYears()
    ' Build a space-separated list of the years from A2 through B2 and place it in C2
    Dim FirstCell As Range, SecondCell As Range, Result As Range
    Dim I As Long
    Set FirstCell = Range("A2")
    Set SecondCell = Range("B2")
    Set Result = Range("C2")
    For I = FirstCell.Value To SecondCell.Value
        Result.Value = Result.Value & " " & I
    Next I
End Sub
with the full span of years placed in C2 as a single space-separated string.
EDIT#2
Macros are very easy to install and use:
ALT-F11 brings up the VBE window
ALT-I followed by ALT-M opens a fresh module
paste the stuff in and close the VBE window
If you save the workbook, the macro will be saved with it.
If you are using a version of Excel later than 2003, you must save the file as .xlsm rather than .xlsx
To remove the macro:
bring up the VBE window as above
clear the code out
close the VBE window
To use the macro from Excel:
ALT-F8
Select the macro
Touch RUN
To learn more about macros in general, see:
http://www.mvps.org/dmcritchie/excel/getstarted.htm
and
http://msdn.microsoft.com/en-us/library/ee814735(v=office.14).aspx
Macros must be enabled for this to work!

fread() error and strange behaviour when reading csv

I used fread() from the data.table library to try to read a 540MB csv file. It returned an error message saying:
' ends field 36 on line 4 when detecting types: 20.00,8/25/2006 0:00:00,"07:05:00 PM","CST",143.00,"OTTAWA","KS","HAIL",1.00,"S","MINNEAPOLIS",8/25/2006 0:00:00,"07:05:00 PM",0.00,,1.00,"S","MINNEAPOLIS",0.00,0.00,,88.00,0.00,0.00,0.00,,0.00,,"TOP","KANSAS, East",,3907.00,9743.00,3907.00,9743.00,"Dime to nickel sized hail.
I have no idea what caused the error and want to track down whether it's a bug or just some data formatting issue that I can tweak fread() to handle.
I managed to read the csv using read.csv(), and decided to track down the row that triggered the error above (line 617174, not line 4 as in the error message above). I then wrote out the offending row, plus the rows immediately before and after it, using write.csv() as testout.csv.
I was able to read back testout.csv using read.csv() creating a data frame with 3 observations, as expected. Using fread() on testout.csv, however, resulted in a data table with only 1 observation, which is the last row.
The four lines in testout.csv are below (I start a new line for each entry below for readability).
"STATE__","BGN_DATE","BGN_TIME","TIME_ZONE","COUNTY","COUNTYNAME","STATE","EVTYPE","BGN_RANGE","BGN_AZI","BGN_LOCATI","END_DATE","END_TIME","COUNTY_END","COUNTYENDN","END_RANGE","END_AZI","END_LOCATI","LENGTH","WIDTH","F","MAG","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP","WFO","STATEOFFIC","ZONENAMES","LATITUDE","LONGITUDE","LATITUDE_E","LONGITUDE_","REMARKS","REFNUM"
20,"8/25/2006 0:00:00","07:01:00 PM","CST",139,"OSAGE","KS","TSTM WIND",5,"WNW","OSAGE CITY","8/25/2006 0:00:00","07:01:00 PM",0,NA,5,"WNW","OSAGE CITY",0,0,NA,52,0,0,0,"",0,"","TOP","KANSAS, East","",3840,9554,3840,9554,".",617129
20,"8/25/2006 0:00:00","07:05:00 PM","CST",143,"OTTAWA","KS","HAIL",1,"S","MINNEAPOLIS","8/25/2006 0:00:00","07:05:00 PM",0,NA,1,"S","MINNEAPOLIS",0,0,NA,88,0,0,0,"",0,"","TOP","KANSAS, East","",3907,9743,3907,9743,"Dime to nickel sized hail.
.",617130
20,"8/25/2006 0:00:00","07:07:00 PM","CST",125,"MONTGOMERY","KS","TSTM WIND",3,"N","COFFEYVILLE","8/25/2006 0:00:00","07:07:00 PM",0,NA,3,"N","COFFEYVILLE",0,0,NA,61,0,0,0,"",0,"","ICT","KANSAS, Southeast","",3705,9538,3705,9538,"",617131
When I ran fread("testout.csv", sep=",", verbose=TRUE), the output was
Input contains no \n. Taking this to be a filename to open
File opened, filesize is 1.05E-06B
File is opened and mapped ok
Detected eol as \r\n (CRLF) in that order, the Windows standard.
Looking for supplied sep ',' on line 5 (the last non blank line in the first 'autostart') ... found ok
Found 37 columns
First row with 37 fields occurs on line 5 (either column names or first row of data)
Some fields on line 5 are not type character (or are empty). Treating as a data row and using default column names.
Count of eol after first data row: 2
Subtracted 1 for last eol and any trailing empty lines, leaving 1 data rows
Type codes: 1444144414444111441111111414444111141 (first 5 rows)
Type codes: 1444144414444111441111111414444111141 (after applying colClasses and integer64)
Type codes: 1444144414444111441111111414444111141 (after applying drop or select (if supplied)
Any idea what may have caused the unexpected results, and the error in the first place? And any way around it? Just to be clear, my aim is to be able to use fread() to read the main file, even though read.csv() works so far.
UPDATE: Now fixed in v1.9.3 on GitHub:
fread() now accepts line breaks inside quoted fields. Thanks to Clayton Stanley for highlighting. See:
fread and a quoted multi-line column value
Windows users are reporting success with the latest version from GitHub.
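If you want to pick up that fix before it reaches CRAN, a minimal sketch would be the following; the Rdatatable/data.table repository path and the use of devtools are assumptions rather than part of the original answer:
install.packages("devtools")
devtools::install_github("Rdatatable/data.table")  # development version with the quote fix
library(data.table)
dt <- fread("testout.csv")  # quoted fields containing line breaks should now parse
nrow(dt)                    # the three data rows from testout.csv should now be read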
