Decrypt Word document knowing part of its content - encryption

I have an encrypted .docx document that I would like to recover, but I don't remember the password. I'm trying to brute-force it, but it's taking far too long, so that is not going to be an option. However, I know the exact content of part of it (296 characters). Any help?

Unfortunately, knowing part of the document does not help.
A cracker still has to attack the password hash that is exported from the document. Doing it your way would mean decrypting the file with each candidate password, interpreting its content and comparing it with the known cleartext. No tool offers such functionality, especially for specialized document formats.
Here is an example of how to approach it:
Document: encrypted_doc.docx
Password: 123horse123
You will have to use office2john to export the hash to be cracked from the document.
wget https://raw.githubusercontent.com/magnumripper/JohnTheRipper/bleeding-jumbo/run/office2john.py
python office2john.py encrypted_doc.docx > doc_pass_hash.txt
cat doc_pass_hash.txt
encrypted_doc.docx:$office$*2013*100000*256*16*e77e386a8e68462d2a0a703718febbc9*08ee275ccf4946ae0e5922e9ff3114b7*0ab5fc00964f7ed4be9e45c77a33b441b2c4874d28e4bc30f38e99bfb169fcf4
If you remember anything about the password (complexity, words you may have chosen, character set, etc.), a mask attack can make the search far more effective.
Run hashcat --help to find the hash mode that matches your document type:
9700 | MS Office <= 2003 $0/$1, MD5 + RC4 | Documents
9710 | MS Office <= 2003 $0/$1, MD5 + RC4, collider #1 | Documents
9720 | MS Office <= 2003 $0/$1, MD5 + RC4, collider #2 | Documents
9800 | MS Office <= 2003 $3/$4, SHA1 + RC4 | Documents
9810 | MS Office <= 2003 $3, SHA1 + RC4, collider #1 | Documents
9820 | MS Office <= 2003 $3, SHA1 + RC4, collider #2 | Documents
9400 | MS Office 2007 | Documents
9500 | MS Office 2010 | Documents
9600 | MS Office 2013 | Documents
Based on what you can recall about the password, you can choose from the following attack modes:
- [ Attack Modes ] -
# | Mode
===+======
0 | Straight
1 | Combination
3 | Brute-force
6 | Hybrid Wordlist + Mask
7 | Hybrid Mask + Wordlist
Here are the built-in hashcat charsets you can use in a mask to describe the password:
?l = abcdefghijklmnopqrstuvwxyz
?u = ABCDEFGHIJKLMNOPQRSTUVWXYZ
?d = 0123456789
?h = 0123456789abcdef
?H = 0123456789ABCDEF
?s = «space»!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
?a = ?l?u?d?s
?b = 0x00 - 0xff
If you remember at least part of the password, you can also create your own dictionary, which is then used to generate the candidate passwords. This is often the most effective help.
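As a rough illustration of such a dictionary, the sketch below builds one for the example password shape (three digits, a remembered word, three digits). The function name gen_candidates and the file names are made up for this example and are not part of hashcat.

```shell
#!/bin/sh
# gen_candidates WORDFILE
# Hypothetical helper: for every word in WORDFILE, print all candidates of
# the form NNN<word>NNN (000word000 .. 999word999), one per line.
gen_candidates() {
    while IFS= read -r word; do
        # Skip empty lines in the word file.
        [ -n "$word" ] || continue
        # awk only runs its BEGIN block; stdin is redirected so it cannot
        # consume the word file that the surrounding loop is reading.
        awk -v w="$word" 'BEGIN {
            for (i = 0; i < 1000; i++)
                for (j = 0; j < 1000; j++)
                    printf "%03d%s%03d\n", i, w, j
        }' < /dev/null
    done < "$1"
}

# Example with assumed file names:
#   gen_candidates wordlist.txt > candidates.txt
```

The resulting file can be passed to hashcat as an ordinary wordlist in straight mode (-a 0). Each word contributes one million lines, so this only makes sense for a short list of words.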
So in my example, let's run a brute-force attack with a mask (3 digits, 5 lowercase letters, and another 3 digits):
hashcat -m 9600 -a 3 doc_pass_hash.txt --username -o cracked_pass.txt ?d?d?d?l?l?l?l?l?d?d?d --force
You can hit [s] for status:
[s]tatus [p]ause [b]ypass [c]heckpoint [q]uit => s
Session..........: hashcat
Status...........: Running
Hash.Type........: MS Office 2013
Hash.Target......: $office$*2013*100000*256*16*e77e386a8e68462d2a0a703...69fcf4
Time.Started.....: Sat May 30 16:59:30 2020 (3 mins, 41 secs)
Time.Estimated...: Next Big Bang (17614 years, 157 days)
Guess.Mask.......: ?d?d?d?l?l?l?l?l?d?d?d [11]
Guess.Queue......: 1/1 (100.00%)
Speed.#1.........: 21 H/s (7.50ms) # Accel:128 Loops:32 Thr:1 Vec:8
Recovered........: 0/1 (0.00%) Digests, 0/1 (0.00%) Salts
Progress.........: 4608/11881376000000 (0.00%)
Rejected.........: 0/4608 (0.00%)
Restore.Point....: 0/1188137600000 (0.00%)
Restore.Sub.#1...: Salt:0 Amplifier:9-10 Iteration:24672-24704
Candidates.#1....: 623anane123 -> 612kerin123
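The total in the Progress line follows directly from the mask: each ?d position has 10 choices and each ?l position has 26. The sketch below redoes that arithmetic and turns it into a rough duration at the reported speed; mask_keyspace is a name made up for this example, and the speed is simply copied from the Speed.#1 line above.

```shell
#!/bin/sh
# mask_keyspace DIGITS LOWERS
# Number of candidates for a mask with DIGITS ?d positions (10 choices each)
# and LOWERS ?l positions (26 choices each).
mask_keyspace() {
    total=1
    i=0
    while [ "$i" -lt "$1" ]; do total=$((total * 10)); i=$((i + 1)); done
    i=0
    while [ "$i" -lt "$2" ]; do total=$((total * 26)); i=$((i + 1)); done
    echo "$total"
}

keyspace=$(mask_keyspace 6 5)   # ?d?d?d?l?l?l?l?l?d?d?d
speed=21                        # H/s, copied from the status output
seconds=$((keyspace / speed))
echo "keyspace: $keyspace"
echo "years at $speed H/s: $((seconds / 31536000))"
```

The keyspace matches the 11881376000000 shown by hashcat, and the duration comes out in the same range as its estimate (the measured speed fluctuates, so the two will not agree exactly).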
As you can see, this is not practical (Time.Estimated: 17614 years, 157 days). Adding a wordlist is a better idea:
cat wordlist.dict
dog
horse
cat
hashcat -m 9600 -a 6 doc_pass_hash.txt wordlist.dict ?d?d?d?l?l?l?l?l?d?d?d --username -o cracked_pass.txt --force
Session..........: hashcat
Status...........: Running
Hash.Type........: MS Office 2013
Hash.Target......: $office$*2013*100000*256*16*e77e386a8e68462d2a0a703...69fcf4
Time.Started.....: Sat May 30 17:15:34 2020 (1 min, 25 secs)
Time.Estimated...: Next Big Bang (734631 years, 226 days)
Guess.Base.......: File (wordlist.dict), Left Side
Guess.Mod........: Mask (?d?d?d?l?l?l?l?l?d?d?d) [11], Right Side
Guess.Queue.Base.: 1/1 (100.00%)
Guess.Queue.Mod..: 1/1 (100.00%)
Speed.#1.........: 2 H/s (0.47ms) # Accel:128 Loops:32 Thr:1 Vec:8
Recovered........: 0/1 (0.00%) Digests, 0/1 (0.00%) Salts
Progress.........: 129/35644128000000 (0.00%)
Rejected.........: 0/129 (0.00%)
Restore.Point....: 0/3 (0.00%)
Restore.Sub.#1...: Salt:0 Amplifier:43-44 Iteration:32000-32032
Candidates.#1....: dog360verin123 -> cat360verin123
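As we can see, this is not yet correct: in mode 6 each word from the list is placed in front of the full 11-character mask, so the candidates look like dog360verin123 and can never match. This needs some more tweaking.
A mask can also contain literal characters, for instance: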
hashcat -m 9600 -a 3 doc_pass_hash.txt ?d?d?dhorse?d?d?d --username -o cracked_pass.txt --force
Session..........: hashcat
Status...........: Cracked
Hash.Type........: MS Office 2013
Hash.Target......: $office$*2013*100000*256*16*e77e386a8e68462d2a0a703...69fcf4
Time.Started.....: Sat May 30 17:24:32 2020 (28 secs)
Time.Estimated...: Sat May 30 17:25:00 2020 (0 secs)
Guess.Mask.......: ?d?d?dhorse?d?d?d [11]
Guess.Queue......: 1/1 (100.00%)
Speed.#1.........: 18 H/s (8.21ms) # Accel:128 Loops:32 Thr:1 Vec:8
Recovered........: 1/1 (100.00%) Digests, 1/1 (100.00%) Salts
Progress.........: 512/1000000 (0.05%)
Rejected.........: 0/512 (0.00%)
Restore.Point....: 0/100000 (0.00%)
Restore.Sub.#1...: Salt:0 Amplifier:0-1 Iteration:99968-100000
Candidates.#1....: 123horse123 -> 112horse778
cat cracked_pass.txt
$office$*2013*100000*256*16*e77e386a8e68462d2a0a703718febbc9*08ee275ccf4946ae0e5922e9ff3114b7*0ab5fc00964f7ed4be9e45c77a33b441b2c4874d28e4bc30f38e99bfb169fcf4:123horse123
The cracked password is at the end of the line: 123horse123
There is more to read about rules, cracking with increasing password length (--increment) and combined attacks, but you get the idea.
Here are the official basic examples to get you started:
- [ Basic Examples ] -
Attack-          | Hash- |
Mode             | Type  | Example command
=================+=======+==================================================================
Wordlist         | $P$   | hashcat -a 0 -m 400 example400.hash example.dict
Wordlist + Rules | MD5   | hashcat -a 0 -m 0 example0.hash example.dict -r rules/best64.rule
Brute-Force      | MD5   | hashcat -a 3 -m 0 example0.hash ?a?a?a?a?a?a
Combinator       | MD5   | hashcat -a 1 -m 0 example0.hash example.dict example.dict

Related

How to print git history in rmarkdown?

I am writing an analysis report with rmarkdown and would like to have a "document versions" section in which I would indicate the different versions of the document and the changes made.
Instead of writing it down manually, I was thinking about using git history and inserting it automatically in the markdown document (formatting it in a table).
How can I do that? Is it possible?
Install git2r, https://github.com/ropensci/git2r then you can do stuff like:
> library(git2r)
> r = repository(".")
> cs = commits(r)
> cs[[1]]
[02cf9a0] 2017-02-02: uses Rcpp attributes instead of inline
So now I have a list of all the commits on this repo. You can get the info out of each commit and format as per your desire into your document.
> summary(cs[[1]])
Commit: 02cf9a0ff92d3f925b68853374640596530c90b5
Author: barryrowlingson <b.rowlingson@gmail.com>
When: 2017-02-02 23:03:17
uses Rcpp attributes instead of inline
11 files changed, 308 insertions, 151 deletions
DESCRIPTION | - 0 + 2 in 2 hunks
NAMESPACE | - 0 + 2 in 1 hunk
R/RcppExports.R | - 0 + 23 in 1 hunk
R/auxfunctions.R | - 1 + 1 in 1 hunk
R/skewt.r | - 0 + 3 in 1 hunk
R/update_params.R | - 1 + 1 in 1 hunk
R/update_params_cpp.R | -149 + 4 in 2 hunks
src/.gitignore | - 0 + 3 in 1 hunk
src/RcppExports.cpp | - 0 + 76 in 1 hunk
src/hello_world.cpp | - 0 + 13 in 1 hunk
src/update_params.cpp | - 0 +180 in 1 hunk
So if you just want the time and the commit message then you can grab it out of the object.
> cs[[3]]@message
[1] "fix imports etc\n"
> cs[[3]]@committer@when
2017-01-20 23:26:20
I don't know if there are proper accessor functions for these rather than using @-notation to get slots. I need to read the docs a bit more...
You can make a data frame from your commits this way:
> do.call(rbind,lapply(cs,function(cs){as(cs,"data.frame")}))
which converts the dates to POSIXct objects, which is nice. Creating a markdown table from the data frame should be trivial!
You can also convert the git log output to Markdown table rows yourself with --pretty=format [1].
Something like:
git log --reverse --pretty=format:'| %H | %s |'
This will output something like this:
| a8d5defb511f1e44ddea21b42aec9b03ee768253 | initial commit |
| fdd9865e9cf01bd53c4f1dc106ee603b0a730f48 | fix tests |
| 10b58e8dd9cf0b9bebbb520408f0b342df613627 | add Dockerfile |
| d039004e8073a20b5d6eab1979c1afa213b78fa3 | update README.md |
1: https://git-scm.com/docs/pretty-formats
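The rows above still lack the header and separator lines that a Markdown table needs. A minimal sketch of one way to add them is shown below; log_to_table is a hypothetical helper name, and the choice of columns (short hash, date, subject) is an assumption rather than something the question asked for.

```shell
#!/bin/sh
# log_to_table: read tab-separated "hash<TAB>date<TAB>subject" lines on
# stdin and print a Markdown table with a header row.
# A subject that itself contains a tab or a "|" would break the table.
log_to_table() {
    printf '| Commit | Date | Change |\n'
    printf '| --- | --- | --- |\n'
    awk -F '\t' 'NF >= 3 { printf "| %s | %s | %s |\n", $1, $2, $3 }'
}

# Intended use inside a repository (not run here):
#   git log --reverse --date=short --pretty=format:'%h%x09%ad%x09%s' | log_to_table
```

In an R Markdown document the output could be captured from a chunk that shells out to this pipeline, with the chunk's results emitted as-is so that the table is rendered.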

Format output of concatenating 2 variables in unix

I am coding a simple shell script that checks the space of the target path and the space utilization per directory on that target path (for example, I check the space of /path1/home and also how much of the total each folder under /path1/home consumes). My question is about the output it produces: it is not pleasing to the eye (uneven spacing). See the sample output lines below.
SIZE USER_FOLDER DATE_LAST_MODIFIED
83G FOLDER 1 Apr 15 03:45
34G FOLDER 10 Mar 9 05:02
26G FOLDER 11 Mar 29 13:01
8.2G FOLDER 100 Apr 1 09:42
1.8G FOLDER 101 Apr 11 13:50
1.3G FOLDER 110 Feb 16 09:30
I just want the output format to be in line with the header so it will look neat because I will use it as a report. Here is the code I am using for this part.
ls -1 | grep -v "lost+found" |grep -v "email_body.tmp" > $v_path/Users.tmp
for user in `cat $v_path/Users.tmp | grep -v "Users.tmp"`
do
folder_size=`du -sh $user 2>/dev/null` # should be run using a more privileged user so that other folders can be read (2>/dev/null was used to discard error messages i.e. "du: cannot read directory `./marcnad/.gnupg': Permission denied")
folder_date=`ls -ltr | tr -s " " | cut -f6,7,8,9, -d" " | grep -w $user | cut -f1,2,3, -d" "`
folder_size="$folder_size $folder_date"
echo $folder_size >> $v_path/Users_Usage.tmp
done
echo "Summary of $v_path Disk Space Utilization per folder." >> email_body.tmp
echo "" >> email_body.tmp
echo "SIZE USER_FOLDER DATE_LAST_MODIFIED" >> email_body.tmp
for i in T G M K
do
cat $v_path/Users_Usage.tmp | grep [0-9]$i | sort -nr -k 1 >> $v_path/email_body.tmp
done
Thanks!
EDIT: Formatting
When you print the data, use printf instead of echo:
cat $v_path/Users_Usage.tmp | while read a b c d e f
do
printf '%-5s%-7s%-4s%-4s%-3s%-6s\n' "$a" "$b" "$c" "$d" "$e" "$f"
done
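To get the header to line up with the rows as well, the header can be printed with the same column widths. The sketch below assumes every row consists of the six whitespace-separated tokens seen in the sample output (size, a two-part folder name, month, day, time); format_report and the chosen widths are illustrative only.

```shell
#!/bin/sh
# format_report: read rows such as "83G FOLDER 1 Apr 15 03:45" on stdin and
# print them under a header that shares the same column widths.
format_report() {
    printf '%-6s %-12s %s\n' 'SIZE' 'USER_FOLDER' 'DATE_LAST_MODIFIED'
    while read -r size name1 name2 month day time; do
        # The two name tokens are joined so they fill one padded column;
        # the day is right-aligned so "9" and "15" line up.
        printf '%-6s %-12s %-3s %2s %s\n' \
            "$size" "$name1 $name2" "$month" "$day" "$time"
    done
}

# Example with the path used in the question:
#   format_report < "$v_path/Users_Usage.tmp"
```

If folder names are a single token on the real system, the read list and the format string need one field fewer.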

UNIX (AIX) Command Help - Sed & Awk

I'm running this on an AIX 6.1.
The intended purpose of this command is to display the following information in the following format:
GetUsedRAM:GetUsedSwap:CPU_0_System:CPU_0_User:…CPU_N_System:CPU_N_User
The command is composed of several sub commands:
echo `vmstat 1 2 | tr -s ' ' ':' | cut -d':' -f4,5,14-15 | tail -1 | sed 's/\([0-9]*:[0-9]*:\)\([0-9]*:[0-9]*\)/\1/'``mpstat -a 1 1 | tr -s ' ' '|' | head -8 | tail -4 | cut -d'|' -f 25,27 | awk -F "|" '{printf "%.0f:%.0f:",$2,$1}' | sed '$s/.$//'| sed -e "s/ \{1,\}$//"| awk '{int a[10];split($1, a,":");printf("%d:%d:%d:%d:%d:%d:%d:%d",a[0],a[1],a[2],a[3],a[4],a[5],a[6],a[7])}'`
Which I'll reformat for clarity:
echo \
`vmstat 1 2 |
tr -s ' ' ':' |
cut -d':' -f4,5,14-15 |
tail -1 |
sed 's/\([0-9]*:[0-9]*:\)\([0-9]*:[0-9]*\)/\1/' \
` \
`mpstat -a 1 1 |
tr -s ' ' '|' |
head -8 |
tail -4 |
cut -d'|' -f 25,27 |
awk -F "|" '{printf "%.0f:%.0f:",$2,$1}' |
sed '$s/.$//' |
sed -e "s/ \{1,\}$//" |
awk '{int a[10];split($1, a,":");printf("%d:%d:%d:%d:%d:%d:%d:%d",a[0],a[1],a[2],a[3],a[4],a[5],a[6],a[7])}' \
`
I understand all of the tr, cut, head tail, and (roughly) vmstat/mpstat commands. The first sed is where I get lost, I've tried running the command in smaller segments and not quite sure why it seems to work as a whole but not when I truncate the command before the next tr.
I'm also not so sure on the awk command although I understand the premise vaguely, as a function allowing formatted output.
Similarly, I have a vague understanding of sed being a command allowing certain strings/characters being replaced in some file.
I'm not able to make out what this specific implementation in the above case is.
Could anyone provide some clarity or direction as to exactly what is happening at each sed and awk step within the context of the entire command?
Thanks for your help.
Simplification
These two simpler commands will produce exactly the same output:
# GetUsedRAM:GetUsedSwap:CPU_0_System:CPU_0_User:…CPU_N_System:CPU_N_User
# Select fields 4,5 of last line, and format with :
comm1=`vmstat 1 2 |
awk '$4~/[0-9]/{avm=$4;fre=$5} END{printf "%s:%s",avm,fre}'
`
# Select fields 27 (sy) and 25 (us) for four cpu, print as decimal.
comm2=`mpstat -A 1 1 |
awk -v firstline=6 -v cpus=4 '
BEGIN{start=firstline-1; end=firstline+cpus;}
NR>start && NR<end {printf( ":%d:%d", $27,$25)}'
`
echo "${comm1}${comm2}"
Description.
Description of original commands
The whole command is the concatenation of two commands.
The first command:
The output of the vmstat is shown in this link.
Columns 4 and 5 are 'avm' and 'fre'. The output in columns 14 and 15
seems to be 'us' (user) and 'sy' (system). I say "seems" because no output
from the user is available to confirm it.
The first command
`vmstat 1 2 | # Execute the command vmstat.
tr -s ' ' ':' | # convert all spaces to colon (:).
cut -d':' -f4,5,14-15 | # select fields 4,5,14,and 15
tail -1 | # select last line.
sed 's/\([0-9]*:[0-9]*:\)\([0-9]*:[0-9]*\)/\1/' \ # See below.
`
The sed expression captures, in the first pair of parentheses, two runs of
digits [0-9]*, each followed by a colon. The second pair captures the same
again, without the last colon. Together they match the whole string in two
parts: « (dd:dd:)(dd:dd) » (d means digit).
Finally, it replaces the whole match with what the first pair of
parentheses captured, \1.
All this complexity just removes fields 14 and 15 as selected by cut.
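The effect of that sed expression can be checked in isolation. The sketch below wraps it in a function (strip_last_two is a name chosen here) and feeds it made-up numbers standing in for the four fields selected by cut.

```shell
#!/bin/sh
# strip_last_two: keep the first two colon-separated numbers (with their
# trailing colons) and drop the two numbers that follow them.
strip_last_two() {
    sed 's/\([0-9]*:[0-9]*:\)\([0-9]*:[0-9]*\)/\1/'
}

# Made-up sample values standing in for the four fields kept by cut.
echo '250000:120000:3:5' | strip_last_two   # prints 250000:120000:
```

Note that the trailing colon is kept, which is what lets the mpstat part be appended directly after it.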
A simpler command with exactly the same output is:
Select fields 4,5 of last line, and format with (:).
`vmstat 1 2 | awk '
$4~/[0-9]/{avm=$4;fre=$5} END{printf "%s:%s:",avm,fre}'
`
The second command:
The output of mpstat -A is similar to this one from Linux.
And also similar to this AIX mpstat -d output.
However, the exact output of mpstat -a (ALL) on the AIX 6.1 computer
in question could vary in several ways. Guided by the intended final
output, CPU_0_System:CPU_0_User:…CPU_N_System:CPU_N_User,
it seems that the columns to select are us (user) and sy
(sys), the percentage of time the CPU spent in each, for every CPU in use,
which seems to be four on the computer measured.
The manual for AIX 6.1 mpstat is here.
It has a list of all the 40 columns that are presented when the option
-a ALL is used:
CPU min maj mpcs mpcr dev soft dec ph cs ics bound rq push
S3pull S3grd S0rd S1rd S2rd S3rd S4rd S5rd S3hrd S4hrd S5hrd
sysc us sy wa id pc %ec ilcs vlcs lcs %idon %bdon %istol %bstol %nsp
us and sy are listed as the fields 27 and 28, however the command presented
by the user selects fields number 25 and 27. Close but not the same. The
only way to confirm would be to receive the output of the command from the user.
For testing I will be using the output of mpstat 5 1 from here.
# mpstat 5 1
System configuration: lcpu=4 ent=1.0 mode=Uncapped
cpu   min maj mpc  int   cs  ics rq  mig lpa   sysc us sy wt id   pc   %ec  lcs
0    4940   0   1  632  685  268  0  320 100 263924 42 55  0  4 0.57  35.1  277
1     990   0   3 1387 2234  805  0  684 100 130290 28 47  0 25 0.27  16.6  649
2    3943   0   2  531  663  223  0  389 100 276520 44 54  0  3 0.57  34.9  270
3    1298   0   2 1856 2742  846  0  752 100  82141 31 40  0 29 0.22  13.4  650
ALL 11171   0   8 4406 6324 2142  0 2145 100 752875 39 51  0 10 1.63 163.1 1846
The second command
`mpstat -a 1 1 | # execute command
tr -s ' ' '|' | # replace all spaces with (|).
head -8 | # select 8 first lines.
tail -4 | # select last four lines.
cut -d'|' -f 25,27 | # select fields 25 and 27
awk -F "|" '{printf "%.0f:%.0f:",$2,$1}' | # print the fields as integers.
sed '$s/.$//' | # on the last line ($), substitute the last character (.$) by nothing.
sed -e "s/ \{1,\}$//" | # remove trailing space(s).
awk '{
int a[10];
split($1, a,":");
printf("%d:%d:%d:%d:%d:%d:%d:%d",a[0],a[1],a[2],a[3],a[4],a[5],a[6],a[7])
}' \
`
About the int: in older versions of awk, calling a function without parentheses is equivalent to calling it on $0, so int is equivalent to int($0), whose result is neither printed nor used. The same goes for the value of a[10].
split stores each colon-separated value in a[i]. Then the values of a[i] are printed as decimal integers.
The equivalent, and much simpler, command is:
Command #2
`mpstat -A 1 1 |
awk -v firstline=6 -v cpus=4 '
BEGIN{start=firstline-1; end=firstline+cpus;}
NR>start && NR<end {printf( ":%d:%d", $27,$25)}'
`

Sort history on number of occurrences

Basically, I want to print the 10 most used commands stored in the bash history, but each one still has to be preceded by the number that indicates when it was used.
I got this far:
history | cut -f 2 | cut -d ' ' -f 3,5 | sort -k 2 -n
Which should sort the second column of the number of occurrences from the command in that row... But it doesn't do that. I know I can head -10 the pipe at the end to take the highest ten of them, but I'm kinda stuck with the sorting part.
The 10 most used commands stored in your history:
history | sed -e 's/ *[0-9][0-9]* *//' | sort | uniq -c | sort -rn | head -10
This gives you the most used command line entries by removing the history number (sed), counting (sort | uniq -c), sorting by frequency (sort -rn) and showing only the top ten entries.
If you just want the commands alone:
history | awk '{print $2;}' | sort | uniq -c | sort -rn | head -10
Both of these strip the history number. I currently have no idea how to keep it in a one-liner.
If you want to find the top used commands in your history file, you will have to count the instances in your history. awk can be used to do this. In the following code, the awk segment will create a hashtable with commands as the key and the number of times they appear as the value. This is printed out with the last history number for that command and sorted:
history | cut -f 2 | cut -d ' ' -f 3,5 | awk '{a[$2]++;b[$2]=$1} END{for (i in a) {print b[i], i, a[i]}}' | sort -k3 -rn | head -n 10
Output looks like:
975 cd 142
972 vim 122
990 ls 118
686 hg 90
974 mvn 51
939 bash 39
978 tac 32
958 cat 28
765 echo 27
981 exit 17
If you don't want the last column you could pipe the output through cut -d' ' -f1,2.
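The two cut stages depend on the exact spacing of the history output. A variant that relies only on awk's whitespace field splitting is sketched below; top_commands is a made-up name, and it assumes the plain "number command arguments" layout (no HISTTIMEFORMAT timestamps).

```shell
#!/bin/sh
# top_commands [N]: read `history`-style lines ("  123  cmd args...") on
# stdin and print "last-history-number command count" for the N most used
# commands (default 10), most frequent first.
top_commands() {
    awk 'NF >= 2 { count[$2]++; last[$2] = $1 }
         END { for (c in count) print last[c], c, count[c] }' |
        sort -k3,3nr | head -n "${1:-10}"
}

# Intended use in an interactive bash session:
#   history | top_commands 10
```

The function reads from stdin because the history builtin normally prints nothing when called from a non-interactive script.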

Finding punctuation and counting the number of each from the Unix Command line

I want to find all of the punctuation marks used in my .txt file and get a count of the number of occurrences of each one. How would I go about doing this? I am new at this but I am trying to learn! This is not homework! I have been doing research on grep and sed right now.
$ perl -CSD -nE '$seen{$1}++ while /(\pP)/g; END { say "$_ $seen{$_}" for keys %seen }' sometextfile.utf8
As in
$ perl -CSD -nE '$seen{$1}++ while /(\pP)/g; END { say "$_ $seen{$_}" for keys %seen }' programming_perl_4th_edition.pod | sort -k2rn
, 21761
. 19578
; 10986
( 8856
) 8853
- 7606
: 7420
" 7300
_ 5305
’ 4906
/ 4528
{ 2966
} 2947
\ 2258
# 2121
# 2070
* 1991
' 1715
“ 1406
” 1404
[ 1007
] 1003
% 881
! 838
? 824
& 555
— 330
‑ 72
– 41
‹ 16
› 16
‐ 10
⁂ 10
… 8
· 3
「 2
」 2
« 1
» 1
‒ 1
― 1
‘ 1
• 1
‥ 1
⁃ 1
・ 1
If you want not just punctuation but punctuation and symbols, use [\pP\pS] in your pattern. Don’t use old-style POSIX classes whatever you do, though.
Use sed, tr, sort and uniq (and no perl):
sed -E 's/[^[:punct:]]//g;s/(.)/\1x/g' myfile.txt | tr 'x' '\n' | sort | uniq -c
I did it this way (sed + tr) so it will work on both Unix and Mac. Mac needs an embedded linefeed in the sed command, but Unix can use \n. This way it works everywhere.
This will work on non-mac unix:
sed -E 's/[^[:punct:]]//g;s/(.)/\1\n/g' myfile.txt | sort | uniq -c
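Another way to put one punctuation character on each line, without choosing a placeholder letter, is tr plus fold. This is a different technique from the sed commands above, sketched under the assumption that only ASCII punctuation matters; count_punct is a hypothetical name.

```shell
#!/bin/sh
# count_punct: read text on stdin and print "count character" for every
# ASCII punctuation character, most frequent first.
count_punct() {
    tr -cd '[:punct:]' |   # delete everything that is not punctuation
        fold -w 1 |        # put each remaining character on its own line
        sort | uniq -c | sort -rn
}

# Example with the file name used above:
#   count_punct < myfile.txt
```

Unlike the Perl version, this works on bytes, so typographic quotes and dashes encoded in UTF-8 are not counted.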

Resources