How to insert Dataframe into Google Sheet dynamically using pygsheet - pygsheets

I am trying to insert a number of Dataframes into google spreadsheet. I am using the pygsheets module.
I have a variable that stores the row number in a loop. I am trying to have each Dataframe inserted at the corresponding cell reference. I am doing it as shown below, but the Dataframe gets overwritten in the same cell on every pass through the loop:
sheet.set_dataframe(df, 'A' + '1 + x')
I expect the 3 Dataframes to be inserted starting at cells A6, A11 and A16 respectively. x starts with a value of 5 and changes to 10 and then 15 as the loop runs.

I don't understand why you are not substituting the value of x: inside quotes, '1 + x' is a literal string, so the address never changes. Anyway, this will do what you want in Python 3.6+:
sheet.set_dataframe(df, f'A{1+x}')
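As a minimal sketch of the loop described in the question: the helper name start_cell and the offsets list are assumptions made for illustration, and the pygsheets call is left as a comment because sheet and the data frames come from the asker's own code.

```python
def start_cell(x, column="A"):
    """Build an A1-style address such as 'A6' from a row offset x."""
    return f"{column}{1 + x}"


# Offsets taken from the question: x is 5, then 10, then 15.
offsets = [5, 10, 15]
addresses = [start_cell(x) for x in offsets]
print(addresses)  # ['A6', 'A11', 'A16']

# With a real worksheet the loop body would be, for each df and x:
#     sheet.set_dataframe(df, start_cell(x))
```

Building the address in one place makes it easy to print and check before anything is written to the sheet.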

You can try it like this:
sheet.set_dataframe(df, 'A' + str(1 + x))
I hope this works for you.

Related

Compare cell against series of cell pairs

I'm trying to make a LibreOffice spreadsheet formula that populates a column based on another input column, comparing each input with a series of range pairs defined in another sheet and finally outputting a symbol based on matched criteria. I have a series of ranges that specify a - output, and another series that corresponds to +, but not all inputs will fall into a category. I am using this trinary output later for another expression, which I already have in place.
My question becomes: how can I test an input against each range pair without spelling out the cell coordinates for each individual cell (i.e. OR(AND(">= $A$1", "< $B$1"), AND(">=$A$2", "<$B$2"), ...))? Ideally I could just specify an array to compare against, like $A$1:$B$4. Writing it as a Python macro would work too, since I don't plan on sharing this file.
I wrote a really quick list comp in python to illustrate what I'm after. This snippet would be one half, such as testing - qualification, and these values may be fed into a condition that outputs the symbol:
>>> def cmp(f, r):
...     return r[0] <= f < r[1]
>>> f = (1, 2, 3)
>>> ranges = ((2, 5), (4, 6), (3, 8))
>>> [any([cmp(i, r) for r in ranges]) for i in f]
[False, True, True]
Here is a small test example with real input and real ranges.
Change the range pairs so that they are in two columns starting from A13. Be sure that they are in sorted order (Data -> Sort).
A B C
~~~~~~~~ ~~~~~~~~ ~
145.1000 145.5000 -
146.0000 146.4000 +
146.6000 147.0000 -
147.0000 147.4000 +
147.6000 148.0000 -
440.0000 445.0000 +
In each row, specify whether it is negative or positive. To do this, I entered the following formula in C13 and filled down. If the range pairs are not consistent enough then enter values for C13 and below manually.
=IF(ISODD(ROW());"-";"+")
Now, enter the following formula in cell C3 and fill down.
=IFNA(IF(
VLOOKUP(A3;A$13:C$18;2;1) >= A3;
VLOOKUP(A3;A$13:C$18;3;1);
"None");"None")
The formula finds the closest pair and then checks if the number is inside that range or not. For better testing, I would also suggest using 145.7000 as input, which should result in no shift if I understood the question correctly.
The results in column C:
-
+
None
None
Documentation: VLOOKUP, IFNA, ROW.
EDIT:
The following formula produces correct results for the example data you gave, and it works for anything between 144.0 and 148.0.
=IFNA(VLOOKUP(A3;A$13:C$18;3;1); "None")
However, 150.0 produces - and 550.0 produces +. If that is not what you want, then use the formula above that has two VLOOKUP expressions.
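Since the question mentions that a Python macro would also be acceptable, here is a rough Python sketch of the same lookup. It mirrors the two-VLOOKUP formula: find the last range whose start is not greater than the input, then check the input against that range's end. The names RANGES and classify and the sample inputs are assumptions for illustration. The table is the one shown above and must be sorted by start value.

```python
from bisect import bisect_right

# (start, end, symbol) rows copied from the sorted table above.
RANGES = [
    (145.1, 145.5, "-"),
    (146.0, 146.4, "+"),
    (146.6, 147.0, "-"),
    (147.0, 147.4, "+"),
    (147.6, 148.0, "-"),
    (440.0, 445.0, "+"),
]


def classify(value, ranges=RANGES):
    """Return the symbol of the range containing value, or 'None'."""
    starts = [start for start, _, _ in ranges]
    # Index of the last range whose start is <= value (sorted-range lookup).
    idx = bisect_right(starts, value) - 1
    if idx < 0:
        return "None"  # value lies below the first range
    _, end, symbol = ranges[idx]
    return symbol if value <= end else "None"


for sample in (145.2, 146.2, 145.7, 150.0):
    print(sample, classify(sample))
```

As with the formula, the upper bound is treated as inclusive. Change `value <= end` to `value < end` if the end of a range should be excluded.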

Loading data with multiple delimiter (only in some cases) in a particular data file using apache-pig

150060275,NON-CRIMINAL,LOST PROPERTY,Monday,01/19/2015,14:00,MISSION,NONE,18TH ST / VALENCIA ST,-122.42158168137,37.7617007179518,"(37.7617007179518, -122.42158168137)",15006027571000
150098210,ROBBERY,"ROBBERY, BODILY FORCE",Sunday,02/01/2015,15:45,TENDERLOIN,NONE,300 Block of LEAVENWORTH ST,-122.414406029855,37.7841907151119,"(37.7841907151119, -122.414406029855)",15009821003074
In the second row the third field contains a ',' which shouldn't be taken as a delimiter. How do I solve this?
If I use STRSPLIT(), it works for the 2nd row but generates a wrong result for the 1st row.
Load each line into a single field, replace every comma followed by a space with |, and then use STRSPLIT on each line.
A = LOAD 'data.txt' USING TextLoader() AS (line:chararray);
B = FOREACH A GENERATE REPLACE(line,', ','|');
C = FOREACH B GENERATE STRSPLIT($0,',',13); -- Assuming there are 13 fields.
Alternatively, you can use CSVExcelStorage from PiggyBank, which handles quoted fields.
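To illustrate what a quote-aware loader does with these two rows, here is a small sketch in Python (not Pig) using the standard csv module. The variable names are assumptions for illustration. It only demonstrates the parsing behaviour and is not a replacement for the Pig script.

```python
import csv
import io

# The two sample rows from the question, unchanged.
RAW = (
    '150060275,NON-CRIMINAL,LOST PROPERTY,Monday,01/19/2015,14:00,MISSION,NONE,'
    '18TH ST / VALENCIA ST,-122.42158168137,37.7617007179518,'
    '"(37.7617007179518, -122.42158168137)",15006027571000\n'
    '150098210,ROBBERY,"ROBBERY, BODILY FORCE",Sunday,02/01/2015,15:45,TENDERLOIN,NONE,'
    '300 Block of LEAVENWORTH ST,-122.414406029855,37.7841907151119,'
    '"(37.7841907151119, -122.414406029855)",15009821003074\n'
)

# csv.reader treats commas inside double quotes as part of the field.
rows = list(csv.reader(io.StringIO(RAW)))
for row in rows:
    print(len(row), row[2])
```

Both rows come out with 13 fields, and the third field of the second row keeps its embedded comma.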

reading/writing data frame to google sheets using pygsheets

What is the correct program flow to write data frames of different sizes to the same worksheet while ensuring that only the most recently written values are visible?
Here was my original sequence:
gc = pygsheets.authorize(outh_file=oauth_file)
sh = gc.open(sheet_name)
wks = sh.worksheet_by_title(wks_name)
wks.set_dataframe(df, (1, 1))
The problem with the above sequence is that if the 1st write was 3800 rows x 12 cols and the 2nd write was 2400 rows x 12 cols, the wks would still show data from the prior write in the rows beyond 2400.
My 2nd solution (basically a hack just to get it to work for me):
gc = pygsheets.authorize(outh_file=oauth_file)
sh = gc.open(spreadsheet_name)
wks = sh.worksheet_by_title(sheet_name)
sh.del_worksheet(wks)
sh.add_worksheet(sheet_name, rows=len(df) + 1, cols=len(df.columns))
wks = sh.worksheet_by_title(sheet_name)
wks.set_dataframe(df, (1, 1))
The above sequence basically does what I want, but I do not like having to delete the wks (I lose all my manual formatting). I know there must be a correct way to accomplish this, but I do not know the pygsheets API very well.
Will a more advanced pygsheets user please advise on the proper program flow and methods to use?
TIA,
--Rj
fit=True will basically resize the sheet to fit your data frame. So if you want to keep the sheet at the same size, you can clear the sheet before the next write; that would be easier than your second solution. Also, if you just want to clear the range you wrote earlier, you can pass a range to the clear function.
wks.set_dataframe(df, (1, 1))
wks.clear()
wks.set_dataframe(df, (1, 1))
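The clear-then-write order can be wrapped in a small helper. This is only a sketch: replace_sheet_contents is a name made up here, and RecordingSheet is a hypothetical stand-in that logs calls so the example runs without Google credentials. With a real worksheet you would pass the wks object from the question instead.

```python
def replace_sheet_contents(wks, df, start=(1, 1)):
    """Clear the old cell values, then write df starting at `start`."""
    wks.clear()                   # remove values left by the previous write
    wks.set_dataframe(df, start)  # write the new frame from the top-left cell


class RecordingSheet:
    """Hypothetical stand-in for a pygsheets worksheet; it only logs calls."""

    def __init__(self):
        self.calls = []

    def clear(self):
        self.calls.append("clear")

    def set_dataframe(self, df, start):
        self.calls.append(("set_dataframe", start))


demo = RecordingSheet()
replace_sheet_contents(demo, df=None)  # df is irrelevant for the stand-in
print(demo.calls)
```

Because the worksheet itself is never deleted, this flow avoids the delete-and-recreate step from the second solution.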

How to create excel formula that will add an number to specific digits in a multi digit number

Ex: I enter the number 9876543210 in a cell.
I want to create an IF-THEN formula that adds a number to this, based only on the last digit (the zero in this example).
If the last digit is >= 3 then add 5; if the last digit is <= 2 then add 15.
Then have this formula repeat for 10 numbers. Is that possible?
So I input 9876543210
and it then shows:
9876543225
9876543230
9876543245
and so on
=IF((RIGHT(A1,1)/1)>2,A1+5,A1+15)
This assumes that you enter the number in cell A1. Paste the above formula in A2 and copy it down to the cells below.
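To check the rule against the numbers in the question, here is a quick sketch in Python rather than Excel. The function names next_value and sequence are made up for this illustration.

```python
def next_value(n):
    """Add 5 when the last digit is 3 or more, otherwise add 15."""
    last_digit = n % 10
    return n + 5 if last_digit > 2 else n + 15


def sequence(start, count):
    """Apply next_value repeatedly, like filling the formula down."""
    values = []
    current = start
    for _ in range(count):
        current = next_value(current)
        values.append(current)
    return values


print(sequence(9876543210, 3))  # [9876543225, 9876543230, 9876543245]
```

The first three results match the values listed in the question.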
If this is Excel, you may want to use MOD (modulo or remainder) function to get the last digit and then perform an IF-THEN or nested IF-THEN to achieve this.
=IF(MOD(A1,10)=3, A1+15, IF(MOD(A1,10)=5, A1+20, A1+30))
This formula translates to the following decision tree:
IF the last digit of the value in cell A1 is 3 THEN
    Add 15 to it
ELSEIF the last digit of the value in cell A1 is 5 THEN
    Add 20 to it
ELSE
    Add 30 to it
END IF
Repeating the operation may require some VBA. If you already know the number of times you need to repeat the operation, you can pre-populate formulas in subsequent rows/columns, each time refer to the immediately preceding cell. For example, if you want to repeat it 5 times, you should compute the diff of first two cells and then add that diff to the value of immediately preceding row/column like this (assuming A1 had the original value, B1 had the formula I posted above and C1 through G1 are the next 5 cells):
In C1: =B1 + ($B1 - $A1)
In D1: =C1 + ($B1 - $A1)
and so on...
Note the use of absolute and relative addresses in these formulae. You can copy/paste the formula in C1 to the subsequent cells and it will automatically adjust itself to refer to immediately preceding cell.
EDIT
I just realized that you want to evaluate the MOD formula in each subsequent cell. In that case you simply need to copy/paste it to subsequent cells instead of using 2nd and 3rd formulas I posted above.

Counting specific characters in a string, across a data frame. sapply

I have found similar problems to this here:
Count the number of words in a string in R?
and here
Faster way to split a string and count characters using R?
but I can't get either to work in my example.
I have quite a large dataframe. One of the columns has genomic locations for features and the entries are formatted as follows:
[hg19:2:224840068-224840089:-]
[hg19:17:37092945-37092969:-]
[hg19:20:3904018-3904040:+]
[hg19:16:67000244-67000248,67000628-67000647:+]
I am splitting these entries into their individual elements to get the following (i.e. for the first entry):
hg19 2 224840068 224840089 -
But in the case of the fourth entry, I would like to parse this into two separate locations.
i.e.
[hg19:16:67000244-67000248,67000628-67000647:+]
becomes
hg19 16 67000244 67000248 +
hg19 16 67000628 67000647 +
(with all the associated data in the adjacent columns filled in from the original)
An easy way for me to identify which rows need this action is to simply count the rows with commas ',' as they don't appear in any other text in any other columns, except where there are multiple genomic locations for the feature.
However I am failing at the first hurdle because the sapply command incorrectly returns '1' for every entry.
testdat$multiple <- sapply(gregexpr(",", testdat$genome_coordinates), length)
(or)
testdat$multiple <- sapply(gregexpr("\\,", testdat$genome_coordinates), length)
table(testdat$multiple)
1
4
Using the example I have posted above, I would expect the output to be
testdat$multiple
0
0
0
1
Actually, running grep -c on the same data on the command line shows I have 10 entries containing ','.
So initially I would like to get this working, but I am also a bit stumped for ideas as to how to then extract the two (or more) locations and put them on their own rows, filling in the adjacent data.
What I actually intended to do was to stick to something I know (the command line): grep out the rows with ',', duplicate the file, split and awk the selected columns (1st and 2nd location in the respective files), then cat and sort them. If there is a niftier way for me to do this in R then I would love a pointer.
gregexpr does in fact return an object of length 1 for each of these rows: a single match position when there is one comma, and a single -1 when there is none. If you want to separate the rows that have a match from the ones that don't, you need to look at the returned values, not their length.
Try sapply(gregexpr(',', testdat$genome_coordinates), function(m) sum(m > 0)) to count the commas in each row, or grepl(',', testdat$genome_coordinates) to flag the rows with a comma.
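For the second part of the question, putting each location on its own row, the logic can be sketched as follows. The sketch is in Python rather than R, and the names entries, comma_counts and split_locations are assumptions for illustration. It covers only the coordinate column, so the adjacent columns would still need to be repeated for each new row.

```python
entries = [
    "[hg19:2:224840068-224840089:-]",
    "[hg19:17:37092945-37092969:-]",
    "[hg19:20:3904018-3904040:+]",
    "[hg19:16:67000244-67000248,67000628-67000647:+]",
]

# Number of commas per entry; non-zero marks an entry with several locations.
comma_counts = [entry.count(",") for entry in entries]


def split_locations(entry):
    """Return one (genome, chrom, start, end, strand) tuple per location."""
    genome, chrom, spans, strand = entry.strip("[]").split(":")
    rows = []
    for span in spans.split(","):
        start, end = span.split("-")
        rows.append((genome, chrom, int(start), int(end), strand))
    return rows


print(comma_counts)
for entry in entries:
    for row in split_locations(entry):
        print(row)
```

The fourth entry yields two rows that share the genome, chromosome and strand, which is the layout asked for in the question.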
