Expected BOF record for XLRD when first line is redundant - jupyter-notebook

I came across this problem when I tried to use xlrd to import an .xls file and create a dataframe in Python.
Here is my file format (screenshot of the .xls layout).
When I run:
import os
import pandas as pd
import xlrd

for filename in os.listdir("."):
    if filename.startswith("report_1"):
        df = pd.read_excel(filename)
It shows: XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'Report g'
I am pretty sure nothing is wrong with xlrd (version 1.0.0), because when I remove the first row the dataframe can be created.
Is there any way I can load the original file format?

Try the following, which accounts for the redundant first line by skipping it:
df = pd.read_excel(filename, skiprows=1)
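For the loop in the question, a minimal sketch of how that fits together (assuming the workbooks parse once the redundant first line is skipped; frames is just an illustrative name):
import os
import pandas as pd

frames = []
for filename in os.listdir("."):
    if filename.startswith("report_1"):
        # skiprows=1 drops the redundant first line before pandas parses the sheet
        frames.append(pd.read_excel(filename, skiprows=1))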

Related

Seaborn code Anscombe’s quartet does not work

The Seaborn code does not work.
I use JupyterLite to execute Seaborn Python code. First, I import Seaborn in the following way:
import piplite
await piplite.install('seaborn')
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
But when I insert Seaborn code like the following one (link of the code), it shows many errors that I do not understand yet (the problem that I face).
When I insert this code in Google Colab, however, it works nicely (google colab).
The issue is getting the example dataset, as I point out in my comments.
The problematic step is:
# Load the example dataset for Anscombe's quartet
df = sns.load_dataset("anscombe")
You need to replace the line df = sns.load_dataset("anscombe") with the following:
from pyodide.http import open_url
import pandas

# based on the Data repository for seaborn examples: https://github.com/mwaskom/seaborn-data
url = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/anscombe.csv'
df = pandas.read_csv(open_url(url))
That's based on the use of open_url() from pyodide.http; see here for more examples.
Alternative with pyfetch, assigning the string obtained
If you've seen pyfetch around, this also works as a replacement for the sns.load_dataset() line. It is based on John Hanley's post and uses pyfetch to get the CSV data. The code is commented further:
# GET text at URL via pyfetch, based on John Hanley's https://www.jhanley.com/blog/pyscript-loading-python-code-in-the-browser/
from pyodide.http import pyfetch

# based on the Data repository for seaborn examples: https://github.com/mwaskom/seaborn-data
url = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/anscombe.csv'
response = await pyfetch(url)
content = (await response.bytes()).decode('utf-8')

# READ the string into a dataframe, based on farmOS + JupyterLite: Import a CSV of Animals
# https://gist.github.com/symbioquine/7641a2ab258726347ec937e8ea02a167
import io
import pandas
df = pandas.read_csv(io.StringIO(content))
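With either approach, df then stands in for the sns.load_dataset() result and the rest of the gallery example runs unchanged. As a sketch, the plotting step from the seaborn Anscombe example would look like:
import seaborn as sns
sns.set_theme(style="ticks")

# Show the results of a linear regression within each dataset
sns.lmplot(
    data=df, x="x", y="y", col="dataset", hue="dataset",
    col_wrap=2, palette="muted", ci=None,
    height=4, scatter_kws={"s": 50, "alpha": 1},
)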

Importing Excel in Watson Studio

I am trying to read an Excel file (xlsx) into a data frame in IBM Watson Studio. The Excel file is saved in my list of assets. I'm a bit new to Python.
I have tried creating a project token with some help I got here. I would appreciate it if someone could help with the complete code.
I tried this:
from project_lib import Project
project = Project(project_id='', project_access_token='')
pc = project.project_context

file = project.get_file("xx.xlsx")
file.sheet_names
df = pd.ExcelFile(file)
df = file.parse(0)
df.head()
I need to get the Excel file into a pandas data frame (pd, for example).
All you need to do is:
First, insert the project token as you already did.
Then fetch the file and call .seek(0) on it.
Then read it with pandas' read_excel() and you should be able to work with it.
# Fetch the file
my_file = project.get_file("tests-example.xls")
# Read the Excel data from object storage into a pandas DataFrame
my_file.seek(0)
import pandas as pd
df = pd.read_excel(my_file, nrows=10)
For more information: https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/project-lib-python.html
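If you also want the sheet names, as in the original attempt, a minimal sketch along the same lines (the xls variable name is just illustrative) would be:
import pandas as pd

my_file = project.get_file("tests-example.xls")
my_file.seek(0)

xls = pd.ExcelFile(my_file)  # wrap the file-like object
print(xls.sheet_names)       # list the sheets in the workbook
df = xls.parse(0)            # first sheet as a DataFrame
df.head()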

EOF within quoted string when importing all TSV documents with lapply and read.table

I have 200 TSV documents to import.
I use this code:
list_file <- list.files(pattern = "*.tsv")
read.all <- lapply(list_file, read.table, header = TRUE, sep = "\t", na.strings = c("", " "), fill = TRUE)
It produces the warning "EOF within quoted string".
At first I thought it was just a warning, so I moved on; the structure of the data looked fine. But when I checked the number of rows in the last few lists, I found it hadn't imported all rows from those TSV documents. It seems nothing is wrong with my data, because a single TSV file that was imported incompletely via lapply imported successfully, without losing any information, when read on its own.
Since I can't tell which files were not imported properly, I can't trust that I have all the information I need. Can anyone offer suggestions?
Maybe a method that is slower but avoids the errors? Many thanks.

Iterating through a directory and assigning each file a variable to work with

I'm really new to Python, so thank you in advance for this probably stupid question:
I have a directory filled with LAS files and I would like to assign each of them to a variable, so that I can continue working with them all afterwards. I can't seem to find an answer that works, and the code below is not working for a reason I can't figure out. Thanks for the help!
#%%
### Importing function packages ### NumPy, Pandas, PyPlot, OS, LASio, Sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import lasio as las
#%%
# Directory
my_dir = os.walk("C:\\Users\\L_R\\Desktop\\test\\")
my_list = []
# Creates a list with LAS files in the DIR
for i in my_dir:
    my_list.append(i)
print my_list
# Iterates through DIR and creates dataframes out of files
count = 0
current_las = my_list[count]
filename = current_las
for dirname, dirs, files in my_dir:
    while count < len(my_list):
        if filename.endswith(".LAS"):
            las_output = las.read(filename)  # module reading the file
            count = count + 1
A few things. First, I assume you have a dictionary and you want to extract the items and assign them to variables using the names saved in the keys. Is that correct?
You could load the entire directory and then save the contents to the correct variables.
If I am correct, this code will let you first load the data and then assign it to variables.
import numpy as np

D = np.load('Dict.py')
for key, val in D.items():
    exec(key + '=val')
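If the goal is simply to keep working with every LAS file afterwards, an alternative sketch (assuming lasio is installed; las_frames is just an illustrative name) collects each file into a dict keyed by filename instead of creating separate variables with exec:
import os
import lasio

my_dir = "C:\\Users\\L_R\\Desktop\\test\\"
las_frames = {}  # filename -> pandas DataFrame of the curves
for filename in os.listdir(my_dir):
    if filename.lower().endswith(".las"):
        las_file = lasio.read(os.path.join(my_dir, filename))
        las_frames[filename] = las_file.df()  # curve data as a DataFrame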

iPython: Unable to export data to CSV

I have searched multiple articles but am unable to get iPython (Python 2.7) to export data to a CSV. I do not receive an error message to troubleshoot the specific problem, and when I include print(new_links) I get the desired output; thus, the issue is writing to the CSV.
Any suggestions on next steps are much appreciated!
Thanks!
import csv
import requests
import lxml.html as lh
url = 'http://wwwnc.cdc.gov/travel/destinations/list'
page = requests.get(url)
doc = lh.fromstring(page.content)
new_links = []
for link_node in doc.iterdescendants('a'):
    try:
        new_links.append(link_node.attrib['href'])
    except KeyError:
        pass
cdc_part1 = open("cdc_part1.csv", 'wb')
wr = csv.writer(cdc_part1, dialect='excel')
wr.writerow(new_links)
Check to make sure that new_links is a list of lists.
If it is and wr.writerow(new_links) is still not working, you can try:
for row in new_links:
    wr.writerow(row)
I would also check the open statement's file path and mode. Check if you can get it to work with 'w'.
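One more thing to check, not covered above: the script never closes cdc_part1, so the output can sit in the buffer and never reach disk in an interactive session. A minimal sketch using a context manager (written for Python 3, where the mode is 'w' with newline=''; the sample list is just a placeholder for the scraped links):
import csv

new_links = ["/link-a", "/link-b"]  # placeholder; use the list scraped above
with open("cdc_part1.csv", "w", newline="") as cdc_part1:
    wr = csv.writer(cdc_part1, dialect="excel")
    for href in new_links:
        wr.writerow([href])  # each link on its own one-column row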
