Convert .doc files to .docx on Windows programmatically using Python

I tried the code below, but it raises an ImportError for win32com:
import win32com
word = win32com.Dispatch("Word.application")
import glob, os
os.chdir(%doc folder path%)
for file in glob.glob("*.doc"):
    file=%doc folder path%+'\\'+file
    print(file)
    doc = word.Documents.Open(file)
    file = file.replace('.doc','.docx')
    file = file.replace('DOCX_F', 'docx_files')
    doc.SaveAs(file, 16)
word.Quit()
But when I tried to install the module, it was not available.

Try win32com.client instead of just win32com:
import win32com.client
word = win32com.client.Dispatch("Word.application")
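Putting the fix together, the whole conversion loop might look like the sketch below. It assumes that the pywin32 package (which provides win32com) is installed and that Microsoft Word is available on the machine. The helper names docx_target and convert_folder are hypothetical, chosen only for illustration. The sketch also uses Path.with_suffix rather than str.replace, so that a '.doc' elsewhere in the path is left alone.

```python
from pathlib import Path


def docx_target(doc_path):
    """Return the .docx path that corresponds to a .doc path (hypothetical helper)."""
    # with_suffix swaps only the final extension, unlike str.replace,
    # which would also rewrite '.doc' appearing elsewhere in the path.
    return Path(doc_path).with_suffix(".docx")


def convert_folder(folder):
    """Convert every .doc file in `folder` to .docx (needs Windows, Word, pywin32)."""
    # Imported here so this module can still be loaded where pywin32 is missing.
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    try:
        # Word expects absolute paths, hence resolve().
        for doc_path in Path(folder).resolve().glob("*.doc"):
            doc = word.Documents.Open(str(doc_path))
            # 16 is the FileFormat value used in the question's SaveAs call.
            doc.SaveAs(str(docx_target(doc_path)), 16)
            doc.Close()
    finally:
        # Always shut Word down, even if a conversion fails.
        word.Quit()
```

Calling convert_folder with the folder path from the question would then write each .docx next to its source .doc file.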

Related

Importing Excel in Watson Studio

I am trying to read an Excel file (.xlsx) into a data frame in IBM Watson Studio. The Excel file is saved in my list of assets. I'm a bit new to Python.
I have tried creating a project token with some help I got here. I would appreciate it if someone could help with the complete code.
I tried this:
from project_lib import Project
project = Project(project_id='',
                  project_access_token='')
pc = project.project_context
file = project.get_file("xx.xlsx")
file.sheet_names
df = pd.ExcelFile(file)
df = file.parse(0)
df.head()
I need to load the Excel file into a pandas data frame.
All you need to do is:
First, insert the project token, as you already did.
Then fetch the file and call .seek(0) on it.
Finally, read it with pandas' read_excel().
# Fetch the file
my_file = project.get_file("tests-example.xls")
# Rewind, then read the Excel file from the object storage into a pandas DataFrame
my_file.seek(0)
import pandas as pd
pd.read_excel(my_file, nrows=10)
For more information, see https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/project-lib-python.html
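The .seek(0) step matters because a file-like object remembers its read position. The sketch below uses io.BytesIO as a stand-in for the object returned by project.get_file(); that substitution is an assumption made for illustration. It shows that a second read returns nothing until the position is rewound.

```python
import io

# Stand-in for the object returned by project.get_file(); assumed to be file-like.
buffer = io.BytesIO(b"example bytes")

first_read = buffer.read()    # consumes everything; the position is now at the end
second_read = buffer.read()   # nothing is left to read from the current position

buffer.seek(0)                # rewind to the start of the buffer
after_rewind = buffer.read()  # the full content is available again
```

If the object has already been read or written before it reaches pandas, rewinding it first avoids handing read_excel() an apparently empty stream.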

Python 3 - OpenPyXL fill cell does not work

I'm trying to fill Excel cells with color.
(I'm on an Ubuntu machine, if that matters. The Excel file was created with pandas.)
import openpyxl
from openpyxl.styles import PatternFill
wb = openpyxl.load_workbook('out.xlsx')
ws = wb.active
ws['B5'].fill = PatternFill(start_color='FFEE1111', end_color='FFEE1111', fill_type='solid')
After running the script above, I opened out.xlsx but nothing changed.
Can someone help please?

Upload images with labels in Google Colab

I am using a Jupyter notebook in Google Colab. My training dataset looks like this:
/data/label1/img1.jpeg
.
.
.
/data/label2/img90.jpeg
I want to import this dataset. Here is what I tried:
Step 1:
!pip install -U -q PyDrive
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
from os import walk
import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
Step 2:
# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
Step 3:
file_to_download = os.path.expanduser('./data/')
file_list = drive.ListFile(
    {'q': 'id_of_the_data_directory'})
I'm not sure how to proceed from here. The data folder is in my Colab notebook folder on Drive. I want to read the images along with their labels. To do that, I am using this code:
filename_queue=tf.train.string_input_producer(tf.train.match_filenames_once('data/*/*.jpeg'))
image_reader=tf.WholeFileReader()
key,image_file=image_reader.read(filename_queue)
#key is the entire path to the jpeg file and we need only the subfolder as the label
S = tf.string_split([key],'\/')
length = tf.cast(S.dense_shape[1],tf.int32)
label = S.values[length-tf.constant(2,dtype=tf.int32)]
label = tf.string_to_number(label,out_type=tf.int32)
#decode the image
image=tf.image.decode_jpeg(image_file)
#then code to place labels and folders in corresponding arrays
You should upload your dataset recursively. Here is how to get your dataset from Google Drive into Colab.
First of all, note that we cannot access the folder directly. We need to set a mount point, and all the Drive contents are accessed through it. Thanks to this answer.
Follow the steps exactly as given in the answer linked above, but make sure to change your path to match the newly created Drive folder.
PS: I have left the question open because others may arrive here with an image dataset whose subfolder names are the labels of the training images. The solution posted here works both for directories with subfolders and for directories that contain only files.
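Once the Drive folder is reachable as an ordinary directory (for example after mounting it with google.colab's drive.mount), the labels can be read straight from the subfolder names. The following is a minimal sketch under that assumption. The function name list_labeled_images is hypothetical, and the data directory path depends on where the Drive was mounted.

```python
from pathlib import Path


def list_labeled_images(data_dir, pattern="*.jpeg"):
    """Return sorted (image_path, label) pairs; the label is the parent folder name."""
    pairs = []
    # Matches data_dir/<label>/<image>, i.e. exactly one level of subfolders.
    for image_path in Path(data_dir).glob(f"*/{pattern}"):
        pairs.append((str(image_path), image_path.parent.name))
    return sorted(pairs)
```

For the layout in the question, each entry would pair a path such as data/label1/img1.jpeg with the label label1, and the resulting list can be fed to whatever image loader is in use.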

Expected BOF record for XLRD when first line is redundant

I ran into this problem when I tried to use xlrd to import an .xls file and create a data frame in Python.
Here is my file format:
[Image: xls file format]
When I run:
import os
import pandas as pd
import xlrd
for filename in os.listdir("."):
    if filename.startswith("report_1"):
        df = pd.read_excel(filename)
It shows "XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'Report g'"
I am pretty sure nothing is wrong with xlrd (version 1.0.0), because when I remove the first row the data frame can be created.
Is there any way I can load the file in its original format?
Try the following, which accounts for a header line:
df = pd.read_excel(filename, header=0)
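If the error persists, it may be worth checking what the file actually contains. The bytes quoted in the error message (b'Report g') are the first bytes of the file, which suggests it might be plain text saved with an .xls extension rather than a binary workbook. The sketch below tests for the OLE2 signature that binary .xls files normally start with. The helper name looks_like_binary_xls is hypothetical.

```python
# Signature found at the start of OLE2 compound documents, including binary .xls.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_binary_xls(path):
    """Return True if the file starts with the OLE2 compound-document signature."""
    with open(path, "rb") as f:
        return f.read(len(OLE2_MAGIC)) == OLE2_MAGIC
```

If this returns False for the report file, one option to try is a text reader such as pd.read_csv with a suitable separator and skiprows=1 to skip the extra first line. Whether that fits depends on the actual file layout, which is an assumption here.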

knitr, pandoc: embedding SVG directly in an HTML document

I am using knitr to generate reports automatically for a MediaWiki page. The report output is HTML, produced via pandoc, but I am having problems uploading the figures to the wiki site. So I decided to use the SVG device and include the SVG code in the final document instead of relying on external files. However, I am having trouble doing that with either knitr or pandoc. Does anybody know of a pandoc or knitr option that embeds the SVG instead of linking to the image? Or even a small shell script that replaces <img src="myFigure.svg"> with the contents of myFigure.svg?
I ended up using a simple Python script for the job:
from sys import argv
import re
import os
def svgreplace(match):
    "replace match with the content of a filename match"
    filename = match.group(1)
    with open(filename) as f:
        return f.read()

def svgfy(string):
    img = re.compile(r'<img src="([^"]*\.svg)"[^>]*>')
    return img.sub(svgreplace, string)

if __name__ == "__main__":
    fname = argv[1]
    with open(fname) as f:
        html = f.read()
    out_fname = fname + ".tmp"
    with open(out_fname, 'w') as out:
        out.write(svgfy(html))
    os.rename(out_fname, fname)
