How to print git history in rmarkdown? - r

I am writing an analysis report with rmarkdown and would like to have a "document versions" section in which I would indicate the different versions of the document and the changes made.
Instead of writing it down manually, I was thinking about using git history and inserting it automatically in the markdown document (formatting it in a table).
How can I do that? Is it possible?

Install git2r, https://github.com/ropensci/git2r then you can do stuff like:
> library(git2r)
> r = repository(".")
> cs = commits(r)
> cs[[1]]
[02cf9a0] 2017-02-02: uses Rcpp attributes instead of inline
So now I have a list of all the commits on this repo. You can get the info out of each commit and format as per your desire into your document.
> summary(cs[[1]])
Commit: 02cf9a0ff92d3f925b68853374640596530c90b5
Author: barryrowlingson <b.rowlingson#gmail.com>
When: 2017-02-02 23:03:17
uses Rcpp attributes instead of inline
11 files changed, 308 insertions, 151 deletions
DESCRIPTION | - 0 + 2 in 2 hunks
NAMESPACE | - 0 + 2 in 1 hunk
R/RcppExports.R | - 0 + 23 in 1 hunk
R/auxfunctions.R | - 1 + 1 in 1 hunk
R/skewt.r | - 0 + 3 in 1 hunk
R/update_params.R | - 1 + 1 in 1 hunk
R/update_params_cpp.R | -149 + 4 in 2 hunks
src/.gitignore | - 0 + 3 in 1 hunk
src/RcppExports.cpp | - 0 + 76 in 1 hunk
src/hello_world.cpp | - 0 + 13 in 1 hunk
src/update_params.cpp | - 0 +180 in 1 hunk
So if you just want the time and the commit message then you can grab it out of the object.
> cs[[3]]@message
[1] "fix imports etc\n"
> cs[[3]]@committer@when
2017-01-20 23:26:20
I don't know if there are proper accessor functions for these rather than using @-notation to get slots. Need to read the docs a bit more...
You can make a data frame from your commits this way:
> do.call(rbind,lapply(cs,function(cs){as(cs,"data.frame")}))
which converts the dates to POSIXct objects, which is nice. Creating a markdown table from the data frame should be trivial!

You can manually convert git log to markdown with pretty=format [1]
Something like
git log --reverse --pretty=format:'| %H | %s |'
This will output something like this:
| a8d5defb511f1e44ddea21b42aec9b03ee768253 | initial commit |
| fdd9865e9cf01bd53c4f1dc106ee603b0a730f48 | fix tests |
| 10b58e8dd9cf0b9bebbb520408f0b342df613627 | add Dockerfile |
| d039004e8073a20b5d6eab1979c1afa213b78fa3 | update README.md |
1: https://git-scm.com/docs/pretty-formats
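As a rough sketch of the same idea with a header row and dates, the log could be collected and formatted by a small script. This is only an illustration: `read_git_log` and `to_markdown_table` are hypothetical helper names, and the choice of columns (short hash, date, subject) is an assumption rather than anything the answers above prescribe.

```python
import subprocess

FIELD_SEP = "\x1f"  # unit separator, unlikely to appear in a commit subject


def read_git_log(repo_dir="."):
    """Return (short hash, date, subject) tuples, oldest commit first."""
    fmt = "%h%x1f%ad%x1f%s"  # %x1f makes git emit the separator byte
    out = subprocess.run(
        ["git", "log", "--reverse", "--date=short", "--pretty=format:" + fmt],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return [tuple(line.split(FIELD_SEP, 2)) for line in out.splitlines() if line]


def to_markdown_table(rows):
    """Format rows as a Markdown table with a header and a separator line."""
    lines = ["| Commit | Date | Change |", "| --- | --- | --- |"]
    for commit, date, subject in rows:
        # A literal pipe in a subject would otherwise start a new column.
        subject = subject.replace("|", "\\|")
        lines.append("| {} | {} | {} |".format(commit, date, subject))
    return "\n".join(lines)
```

Calling `print(to_markdown_table(read_git_log()))` inside a git working directory prints the table, which can then be placed in the "document versions" section.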

Related

Robot Framework API - how to get suite and its test cases results

I have a test suite directory which contains test suite files with one or more test cases. Let's say it looks like this:
TestSuite
Test-1
Step 1
Step 2
Test-2
Step 1
Test-3
Step 1
Step 2
Step 3
I would like to parse output.xml to get results like this:
Test-1 | PASS
Test-1 | Step 1 | PASS
Test-1 | Step 2 | PASS
Test-2 | PASS
Test-2 | Step 1 | PASS
Test-3 | PASS
Test-3 | Step 1 | PASS
Test-3 | Step 2 | PASS
Test-3 | Step 3 | PASS
So far I have managed to get only suite files names and results using this code:
from robot.api import ExecutionResult, SuiteVisitor
class PrintSuiteInfo(SuiteVisitor):
    def visit_suite(self, suite):
        print('{} | {}'.format(suite.name, suite.status))
result = ExecutionResult('output.xml')
result.suite.suites.visit(PrintSuiteInfo())
which gives this output:
Test-1 | PASS
Test-2 | PASS
Test-3 | PASS
I can get test case names and results with this code:
from robot.api import ExecutionResult, ResultVisitor
class PrintTestInfo(ResultVisitor):
    def visit_test(self, test):
        print('{} | {}'.format(test.name, test.status))
result = ExecutionResult('output.xml')
result.visit(PrintTestInfo())
but the output is:
Step 1 | PASS
Step 2 | PASS
Step 1 | PASS
Step 1 | PASS
Step 2 | PASS
Step 3 | PASS
so there is no relation to suite files which I need to update results in Jira.
The only thing that came to my mind is to include the suite file name in each test case name but I would like to learn more about robot.api. I looked into the documentation many times but it is not clear enough for me now.
One of my colleagues helped me with solving this. What I was missing was:
test.parent
or one that I figured out myself:
test.longname
which gives output like this:
TestSuite.Test-1.Step 1
TestSuite.Test-1.Step 2
...
It is documented here.
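A minimal sketch of how test.parent could be used to produce the combined listing from the question. The helper names `iter_rows` and `print_results` are hypothetical and not part of robot.api; the sketch only relies on the `name`, `status`, `tests`, `suites` and `parent` attributes of the result objects, and it walks the suites with plain recursion instead of a visitor.

```python
def iter_rows(suite):
    """Yield a 'suite | status' row, then one row per test, depth first."""
    yield '{} | {}'.format(suite.name, suite.status)
    for test in suite.tests:
        # test.parent is the suite that directly contains the test
        yield '{} | {} | {}'.format(test.parent.name, test.name, test.status)
    for child in suite.suites:
        yield from iter_rows(child)


def print_results(path='output.xml'):
    """Print rows for every suite file below the top-level directory suite."""
    from robot.api import ExecutionResult  # needs Robot Framework installed
    result = ExecutionResult(path)
    for suite_file in result.suite.suites:
        for row in iter_rows(suite_file):
            print(row)
```

Each row carries the suite file name, so it can be matched to the corresponding Jira item without renaming any test cases.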

Filter for appearance of 2 values that must at least exist 1 times

Title may be bad, couldn't think of a better one.
My comment data, each comment is assigned to an account by usernameChannelId:
usernameChannelId | hasTopic | sentiment_sum | commentId
a | 1 | 4 | xyxe24
a | 0 | 2 | h5hssd
a | 1 | 3 | k785hg
a | 0 | 2 | j7kgbf
b | 1 | -2 | 76hjf2
c | 0 | -1 | 3gqash
c | 1 | 2 | ptkfja
c | 0 | -2 | gbe5gs
c | 1 | 1 | hghggd
My code:
SELECT u.usernameChannelId, avg(sentiment_sum) sentiment_sum, u.hasTopic
FROM total_comments u
WHERE u.hasTopic is True
GROUP BY u.usernameChannelId
HAVING count(u.usernameChannelId) > 0
UNION
SELECT u.usernameChannelId, avg(sentiment_sum) sentiment_sum, u.hasTopic
FROM total_comments u
WHERE u.hasTopic is False
GROUP BY u.usernameChannelId
I want to get all usernameChannelIds that have at least 1 comment with hasTopic == 0 and at least 1 comment with hasTopic == 1 (to compare both groups statistically and remove users who commented only on on-topic or only on off-topic videos).
How can I filter like that?
Here's a little trick that may help. First, you need to get familiar with the CASE expression. Here's an excerpt from the doc:
The CASE expression
A CASE expression serves a role similar to IF-THEN-ELSE in other
programming languages.
The optional expression that occurs in between the CASE keyword and
the first WHEN keyword is called the "base" expression. There are two
basic forms of the CASE expression: those with a base expression and
those without.
An expression like CASE when hasTopic is False then 1 else 0 END will evaluate to 1 if hasTopic is 0. An expression for hasTopic is True would be similar.
Now, those CASEs can be summed, which will tell you whether the user has any rows with hasTopic True and any rows with hasTopic False.
Something like this in the having clause might do the trick (one for each value of course)
HAVING SUM(CASE when hasTopic is False then 1 else 0 END) > 0
(it would be necessary to remove the WHERE clause, and the UNION query would be unnecessary).
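To illustrate, here is a small sketch that loads the sample rows from the question into an in-memory SQLite database and applies both HAVING conditions. Wrapping the filter in a subquery so that the averages can still be grouped by hasTopic is an assumption about the desired result, not something stated in the answer, and hasTopic is compared with 0 and 1 instead of True and False.

```python
import sqlite3

# Sample rows from the question: (usernameChannelId, hasTopic, sentiment_sum, commentId)
rows = [
    ("a", 1, 4, "xyxe24"), ("a", 0, 2, "h5hssd"), ("a", 1, 3, "k785hg"),
    ("a", 0, 2, "j7kgbf"), ("b", 1, -2, "76hjf2"), ("c", 0, -1, "3gqash"),
    ("c", 1, 2, "ptkfja"), ("c", 0, -2, "gbe5gs"), ("c", 1, 1, "hghggd"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE total_comments "
    "(usernameChannelId TEXT, hasTopic INTEGER, sentiment_sum INTEGER, commentId TEXT)"
)
conn.executemany("INSERT INTO total_comments VALUES (?, ?, ?, ?)", rows)

query = """
SELECT usernameChannelId, hasTopic, AVG(sentiment_sum) AS avg_sentiment
FROM total_comments
WHERE usernameChannelId IN (
    -- keep only users with at least one comment in each group
    SELECT usernameChannelId
    FROM total_comments
    GROUP BY usernameChannelId
    HAVING SUM(CASE WHEN hasTopic = 1 THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN hasTopic = 0 THEN 1 ELSE 0 END) > 0
)
GROUP BY usernameChannelId, hasTopic
ORDER BY usernameChannelId, hasTopic
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

User b has only an on-topic comment and is dropped; users a and c each keep one row per hasTopic value.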

SQLite UPDATE returns empty

I'm trying to update a table column from another table with the code below.
Now the editor says '39 rows affected' and I can see something happened because some cells changed from null to empty (nothing shows),
while others are still null.
What could be wrong here?
Why does it not update properly?
PS: I checked manually that the values are not empty in the column to check for.
UPDATE CANZ_CONC
SET EAN = (SELECT t1.EAN_nummer FROM ArtLev_CONC t1 WHERE t1.Artikelcode_leverancier = Artikelcode_leverancier)
WHERE ARTNMR IN (SELECT t1.Artikelcode_leverancier FROM Artlev_CONC t1 WHERE t1.Artikelcode_leverancier = ARTNMR);
Edit:
Table 2 looks like:
NMR | EAN | CUSTOM
-------------------------------
1 | 987 | A
2 | 654 | B
3 | 321 | C
Table 1 looks like:
NMR | EAN | CUSTOM
-------------------------------
1 | null | null
2 | null | null
5 | null | null
After the UPDATE, table 1 looks like:
NMR | EAN | CUSTOM
-------------------------------
1 | | null
2 | | null
5 | null | null
I've got this working.
I guess my data was corrupted after all.
Since it is about 330,000 rows, it was not very easy to spot.
But it came to me when the loading of the data took about 10 minutes!
It used to be about 40 - 60 seconds.
So I ended up back at the drawing board for the initial csv file.
I also saw the columns had not been given a DATA type, so I altered that as well.
Thanks for the help!

How can I access the data in a Cassandra Table using RCassandra

I need to get the data in a column of a table in a Cassandra database. I am using RCassandra for this. After getting the data I need to do some text mining on it. Please suggest how to connect to Cassandra and get the data into my R script using RCassandra.
My RScript :
library(RCassandra)
connect.handle <- RC.connect(host="127.0.0.1", port=9160)
RC.cluster.name(connect.handle)
RC.use(connect.handle, 'mykeyspace')
sourcetable <- RC.read.table(connect.handle, "sourcetable")
print(ncol(sourcetable))
print(nrow(sourcetable))
print(sourcetable)
This will print the output as:
> print(ncol(sourcetable))
[1] 1
> print(nrow(sourcetable))
[1] 18
> print(sourcetable)
144 BBC News
158 IBN Live
123 Reuters
131 IBN Live
My Cassandra table contains four columns, but here it's showing only 1 column. I need to get each column's values separately. So how do I get the individual column values (e.g. each feedurl)? What changes should I make in my R script?
My cassandra table, named sourcetable
I have used Cassandra and R with the correct CRAN JAR files, but RCassandra is easier. RCassandra is a direct interface to Cassandra without the use of Java. To connect to Cassandra, use RC.connect, which returns a connection handle, like this:
conn <- RC.connect(host = <xxx>, port = <xxx>)
RC.login(conn, username = "bar", password = "foo")
You can then use RC.get to retrieve data or RC.read.table to read table data.
BUT, First you should read THIS
I am confused as well. Table demo.emp has 4 rows and 4 columns (empid, deptid, first_name and last_name). Neither RC.get nor RC.read.table gets all the data.
cqlsh:demo> select * from emp;
empid | deptid | first_name | last_name
-------+--------+------------+-----------
1 | 1 | John | Doe
1 | 2 | Mia | Lewis
2 | 1 | Jean | Doe
2 | 2 | Manny | Lewis
> RC.get.range.slices(c, "emp", limit=10)
[[1]]
key value ts
1 1.474796e+15
2 John 1.474796e+15
3 Doe 1.474796e+15
4 1.474796e+15
5 Mia 1.474796e+15
[[2]]
key value ts
1 1.474796e+15
2 Jean 1.474796e+15
3 Doe 1.474796e+15
4 1.474796e+15
5 Manny 1.474796e+15

UNIX (AIX) Command Help - Sed & Awk

I'm running this on an AIX 6.1.
The intended purpose of this command is to display the following information in the following format:
GetUsedRAM:GetUsedSwap:CPU_0_System:CPU_0_User:…CPU_N_System:CPU_N_User
The command is composed of several sub commands:
echo `vmstat 1 2 | tr -s ' ' ':' | cut -d':' -f4,5,14-15 | tail -1 | sed 's/\([0-9]*:[0-9]*:\)\([0-9]*:[0-9]*\)/\1/'``mpstat -a 1 1 | tr -s ' ' '|' | head -8 | tail -4 | cut -d'|' -f 25,27 | awk -F "|" '{printf "%.0f:%.0f:",$2,$1}' | sed '$s/.$//'| sed -e "s/ \{1,\}$//"| awk '{int a[10];split($1, a,":");printf("%d:%d:%d:%d:%d:%d:%d:%d",a[0],a[1],a[2],a[3],a[4],a[5],a[6],a[7])}'`
Which I'll re format for clarity:
echo \
`vmstat 1 2 |
tr -s ' ' ':' |
cut -d':' -f4,5,14-15 |
tail -1 |
sed 's/\([0-9]*:[0-9]*:\)\([0-9]*:[0-9]*\)/\1/' \
` \
`mpstat -a 1 1 |
tr -s ' ' '|' |
head -8 |
tail -4 |
cut -d'|' -f 25,27 |
awk -F "|" '{printf "%.0f:%.0f:",$2,$1}' |
sed '$s/.$//' |
sed -e "s/ \{1,\}$//" |
awk '{int a[10];split($1, a,":");printf("%d:%d:%d:%d:%d:%d:%d:%d",a[0],a[1],a[2],a[3],a[4],a[5],a[6],a[7])}' \
`
I understand all of the tr, cut, head tail, and (roughly) vmstat/mpstat commands. The first sed is where I get lost, I've tried running the command in smaller segments and not quite sure why it seems to work as a whole but not when I truncate the command before the next tr.
I'm also not so sure on the awk command although I understand the premise vaguely, as a function allowing formatted output.
Similarly, I have a vague understanding of sed being a command allowing certain strings/characters being replaced in some file.
I'm not able to make out what this specific implementation in the above case is.
Could anyone provide some clarity or direction as to exactly what is happening at each sed and awk step within the context of the entire command?
Thanks for your help.
Simplification
These two simpler commands will get the exact same output:
# GetUsedRAM:GetUsedSwap:CPU_0_System:CPU_0_User:…CPU_N_System:CPU_N_User
# Select fields 4,5 of last line, and format with :
comm1=`vmstat 1 2 |
awk '$4~/[0-9]/{avm=$4;fre=$5} END{printf "%s:%s",avm,fre}'
`
# Select fields 27 (sy) and 25 (us) for four cpu, print as decimal.
comm2=`mpstat -a 1 1 |
awk -v firstline=6 -v cpus=4 '
BEGIN{start=firstline-1; end=firstline+cpus;}
NR>start && NR<end {printf( ":%d:%d", $27,$25)}'
`
echo "${comm1}${comm2}"
Description of original commands
The whole command is the concatenation of two commands.
The first command:
The output of the vmstat is shown in this link.
Columns 4 and 5 are 'avm' and 'fre'. Columns 14 and 15
seem to be 'us' (user) and 'sy' (system). I say "seem" because no output
from the user is available to confirm.
The first command
`vmstat 1 2 | # Execute the command vmstat.
tr -s ' ' ':' | # convert all spaces to colon (:).
cut -d':' -f4,5,14-15 | # select fields 4,5,14,and 15
tail -1 | # select last line.
sed 's/\([0-9]*:[0-9]*:\)\([0-9]*:[0-9]*\)/\1/' \ # See below.
`
The sed command uses two capture groups, written \( ... \). The first group
matches two runs of digits [0-9]*, each followed by a colon. The second matches
two more runs of digits separated by a colon (no trailing colon). Together they
cover the whole string: « (dd:dd:)(dd:dd) » (d means digit).
Finally, the whole match is replaced by what the first group captured, /\1/.
All this complexity only removes fields 14 and 15, which cut had selected in the first place.
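The effect of that substitution can be checked in isolation. The sketch below reproduces it with Python's re module, where the sed groups \( \) are written as plain parentheses. The sample value is hypothetical and only stands in for the four colon-separated fields left after cut; `drop_second_pair` is an illustrative name.

```python
import re

# Same pattern as the sed expression, in Python regex syntax.
SED_PATTERN = r"([0-9]*:[0-9]*:)([0-9]*:[0-9]*)"


def drop_second_pair(line):
    """Keep only the first capture group, like sed 's/.../\\1/'."""
    # count=1 mirrors sed replacing only the first match on a line
    return re.sub(SED_PATTERN, r"\1", line, count=1)


sample = "1234:5678:12:34"  # hypothetical avm:fre:us:sy values
print(drop_second_pair(sample))  # prints 1234:5678:
```

The trailing colon is kept because it belongs to the first group, which is what lets the mpstat part be appended directly after it.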
A simpler command with exactly the same output is:
Select fields 4,5 of last line, and format with (:).
`vmstat 1 2 | awk '
$4~/[0-9]/{avm=$4;fre=$5} END{printf "%s:%s:",avm,fre}'
`
The second command:
The output of mpstat -A is similar to this one from Linux.
And also similar to this AIX mpstat -d output.
However, the exact output of mpstat -a (ALL) on AIX 6.1 could vary on the
computer used. Going by the intended final
output, CPU_0_System:CPU_0_User:…CPU_N_System:CPU_N_User,
the columns to select should be us (user) and sy
(sys), the percentage of CPU time spent in each mode, for every CPU in use.
There seem to be four CPUs on the computer measured.
The manual for AIX 6.1 mpstat is here.
It has a list of all the 40 columns that are presented when the option
-a ALL is used:
CPU min maj mpcs mpcr dev soft dec ph cs ics bound rq push
S3pull S3grd S0rd S1rd S2rd S3rd S4rd S5rd S3hrd S4hrd S5hrd
sysc us sy wa id pc %ec ilcs vlcs lcs %idon %bdon %istol %bstol %nsp
us and sy are listed as the fields 27 and 28, however the command presented
by the user selects fields number 25 and 27. Close but not the same. The
only way to confirm would be to receive the output of the command from the user.
For testing I will be using the output of mpstat 5 1 from here.
# mpstat 5 1
System configuration: lcpu=4 ent=1.0 mode=Uncapped
cpu min maj mpc int cs ics rq mig lpa sysc us sy wt id pc %ec lcs
0 4940 0 1 632 685 268 0 320 100 263924 42 55 0 4 0.57 35.1 277
1 990 0 3 1387 2234 805 0 684 100 130290 28 47 0 25 0.27 16.6 649
2 3943 0 2 531 663 223 0 389 100 276520 44 54 0 3 0.57 34.9 270
3 1298 0 2 1856 2742 846 0 752 100 82141 31 40 0 29 0.22 13.4 650
ALL 11171 0 8 4406 6324 2142 0 2145 100 752875 39 51 0 10 1.63 163.1 1846
The second command
`mpstat -a 1 1 | # execute command
tr -s ' ' '|' | # replace all spaces with (|).
head -8 | # select 8 first lines.
tail -4 | # select last four lines.
cut -d'|' -f 25,27 | # select fields 25 and 27
awk -F "|" '{printf "%.0f:%.0f:",$2,$1}' | # print the fields as integers.
sed '$s/.$//' | # on the last line ($), substitute the last character (.$) by nothing.
sed -e "s/ \{1,\}$//" | # remove trailing space(s).
awk '{
int a[10];
split($1, a,":");
printf("%d:%d:%d:%d:%d:%d:%d:%d",a[0],a[1],a[2],a[3],a[4],a[5],a[6],a[7])
}' \
`
About the int: in older versions of awk, calling a function without parentheses is equivalent to calling it on $0. So int is equivalent to int($0), whose result is neither printed nor used. The same goes for the value of a[10].
The split stores each colon-separated value of the input in a[i]. Then the values of a[i] are printed as decimal integers.
An equivalent and much simpler command is:
Command #2
`mpstat -a 1 1 |
awk -v firstline=6 -v cpus=4 '
BEGIN{start=firstline-1; end=firstline+cpus;}
NR>start && NR<end {printf( ":%d:%d", $27,$25)}'
`
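Since the exact field numbers could not be confirmed, one way around the uncertainty is to pick the columns by header name instead of position. The following is only a sketch of that idea, run against the mpstat 5 1 test output quoted above; `cpu_sy_us` is a hypothetical helper, and it assumes the header line contains both `us` and `sy` and that per-CPU rows start with a number.

```python
SAMPLE = """\
System configuration: lcpu=4 ent=1.0 mode=Uncapped
cpu min maj mpc int cs ics rq mig lpa sysc us sy wt id pc %ec lcs
0 4940 0 1 632 685 268 0 320 100 263924 42 55 0 4 0.57 35.1 277
1 990 0 3 1387 2234 805 0 684 100 130290 28 47 0 25 0.27 16.6 649
2 3943 0 2 531 663 223 0 389 100 276520 44 54 0 3 0.57 34.9 270
3 1298 0 2 1856 2742 846 0 752 100 82141 31 40 0 29 0.22 13.4 650
ALL 11171 0 8 4406 6324 2142 0 2145 100 752875 39 51 0 10 1.63 163.1 1846
"""


def cpu_sy_us(report):
    """Return 'sy:us' for every numbered CPU row, joined with colons."""
    lines = [ln.split() for ln in report.splitlines() if ln.strip()]
    # The header is the first line that names both columns.
    header_idx = next(i for i, f in enumerate(lines) if "us" in f and "sy" in f)
    header = lines[header_idx]
    us, sy = header.index("us"), header.index("sy")
    values = []
    for fields in lines[header_idx + 1:]:
        # Per-CPU rows start with the CPU number; this skips the "ALL" row.
        if fields[0].isdigit() and len(fields) > max(us, sy):
            values.append(format(float(fields[sy]), ".0f"))
            values.append(format(float(fields[us]), ".0f"))
    return ":".join(values)


print(cpu_sy_us(SAMPLE))  # prints 55:42:47:28:54:44:40:31
```

The same lookup by name would carry over to the wider -a output, as long as its header line is present in the text being parsed.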
