Upload images with labels in Google Colab - jupyter-notebook

I am using a Jupyter notebook in Google Colab. My training dataset looks like this:
/data/label1/img1.jpeg
.
.
.
/data/label2/img90.jpeg
I want to import this dataset. Here is what I tried.
Step 1:
!pip install -U -q PyDrive
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
from os import walk
import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
Step 2:
# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
Step 3:
file_to_download = os.path.expanduser('./data/')
file_list = drive.ListFile(
{'q': 'id_of_the_data_directory'})
I am not sure how to proceed from here. The data folder is my Colab notebook folder in Drive. I want to read the images along with their labels. To do so, I am using this code:
import tensorflow as tf

filename_queue = tf.train.string_input_producer(tf.train.match_filenames_once('data/*/*.jpeg'))
image_reader = tf.WholeFileReader()
key, image_file = image_reader.read(filename_queue)
# key is the entire path to the jpeg file and we need only the subfolder as the label
S = tf.string_split([key], '/')
length = tf.cast(S.dense_shape[1], tf.int32)
label = S.values[length - tf.constant(2, dtype=tf.int32)]
label = tf.string_to_number(label, out_type=tf.int32)
# decode the image
image = tf.image.decode_jpeg(image_file)
# then code to place labels and folders in corresponding arrays

You should upload your dataset in a recursive manner. Here is a sample of how to upload your dataset from your Google Drive to Colab.

First of all, I want to mention that we cannot access the folder directly. We need to set a mount point, and all the Drive contents are accessed via that. Thanks to this answer.
Follow the steps exactly as given in the answer linked above, but make sure to change your path according to the newly created drive folder.
PS: I still left the question open because you may reach this page with an image dataset whose subfolder names are the labels of the training images; the solution posted here works both for directories with subfolders and for directories containing only files.
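For illustration, here is a minimal sketch of the mount-then-read approach, assuming the dataset ends up under /content/drive/My Drive/data after mounting (adjust the path to wherever your data folder actually lives in Drive):
from google.colab import drive
import glob
import os
# Mount Google Drive; everything in Drive is then reachable under this mount point.
drive.mount('/content/drive')
# Assumed location of the dataset inside Drive -- change to your own folder.
data_root = '/content/drive/My Drive/data'
# Collect every jpeg one level below data_root; the parent folder name is the label.
image_paths = glob.glob(os.path.join(data_root, '*', '*.jpeg'))
labels = [os.path.basename(os.path.dirname(p)) for p in image_paths]
print(len(image_paths), 'images found')
From there the (path, label) pairs can be fed to whatever input pipeline you prefer.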

Related

How to import an Excel file from the browser

I want to use the GET() function from the httr package, because this is just an example file and for the original file I need to supply a user name and password, i.e.
library(httr)
filename<-"filename_in_url.xls"
URL <- "originalurl"
GET(URL, authenticate("usr", "pwd"), write_disk(paste0("C:/Temp/temp/",filename), overwrite = TRUE))
As a test, I tried to import one of the files from https://www.nordpoolgroup.com/historical-market-data/ without saving it to disk, loading it into the environment instead in order to see the data. However, it also does not work.
library(XML)
library(RCurl)
excel <- readHTMLTable(htmlTreeParse(getURL(paste("https://www.nordpoolgroup.com/4a4c6b/globalassets/marketdata-excel-files/elspot-prices_2021_hourly_eur.xls")), useInternalNodes=TRUE))[[1]]
Or if there are other ways to import the data (functions where login information can be passed as an input), it would be great to see them.

How to write streamlit UploadedFile to temporary directory with original filename?

Streamlit has a function that allows convenient upload of multiple files.
files = st.file_uploader('File upload', type=['txt'],accept_multiple_files=True)
Then files contains a list of UploadedFile objects, which are BytesIO-like. However, it is not clear how to get the original filenames or how to write each file to a temporary directory. It is also not clear whether that approach would conflict with the way Streamlit operates: it basically reruns the underlying script every time an action is performed.
I am using some tools that read files based on their path given as a string. They are expected to be read from the hard drive.
You can access the name of the file with files[i].name and its content with files[i].read().
It looks like this in the end:
import os
import streamlit as st

files = st.file_uploader("File upload", type=["txt"], accept_multiple_files=True)
if len(files) == 0:
    st.error("No files were uploaded")
for i in range(len(files)):
    bytes_data = files[i].read()  # read the content of the file in binary
    print(files[i].name, bytes_data)
    with open(os.path.join("/tmp", files[i].name), "wb") as f:
        f.write(bytes_data)  # write this content elsewhere
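If you would rather not hard-code /tmp, here is a sketch using Python's tempfile module that keeps the original filenames while the files exist only for the lifetime of the context manager (the variable names below are just illustrative):
import os
import tempfile
import streamlit as st

files = st.file_uploader("File upload", type=["txt"], accept_multiple_files=True)
with tempfile.TemporaryDirectory() as tmp_dir:
    paths = []
    for uploaded in files:
        # Keep the original filename inside the temporary directory.
        path = os.path.join(tmp_dir, uploaded.name)
        with open(path, "wb") as f:
            f.write(uploaded.read())
        paths.append(path)
    # Hand `paths` to tools that expect files on disk while still inside this block;
    # the directory and its contents are deleted when the block exits.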

Importing Excel in Watson Studio

I am trying to read an Excel file (xlsx) into a data frame in IBM Watson Studio. The Excel file is saved in my list of assets. I'm a bit new to Python.
I have tried creating a project token with some help I got here. I would appreciate it if someone could help with the complete code.
I tried this:
from project_lib import Project
project = Project(project_id='', project_access_token='')
pc = project.project_context
file = project.get_file("xx.xlsx")
file.sheet_names
df = pd.ExcelFile(file)
df = file.parse(0)
df.head()
I need to get the Excel file into a pandas data frame (pd, for example).
All you need to do is:
First, insert the project token as you already did.
Then fetch the file and call .seek(0) on it.
Then read it using pandas' read_excel() and you should be able to read it.
# Fetch the file from the project's assets
my_file = project.get_file("tests-example.xls")
# Rewind the stream, then read the Excel file from object storage into a pandas DataFrame
my_file.seek(0)
import pandas as pd
pd.read_excel(my_file, nrows=10)
For more information: https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/project-lib-python.html

Jupyter Multiple Notebooks using Same Data

I have built out a Jupyter notebook for my previous analysis.
And I want to start a different branch of analysis, using some of the same dataframes from the previous analysis.
How do I use the previous dataframes in my new notebook without copying all the code needed to rebuild my previous analysis?
You can share data across notebooks with Jupyter magics. For example:
Given
# Notebook 1
import pandas as pd
d = {"one" : pd.Series([1., 2., 3.], index=list("abc"))}
df = pd.DataFrame(d)
Code
%store df
Recall the DataFrame in a separate notebook:
# Notebook 2
%store -r df
df
Output
   one
a  1.0
b  2.0
c  3.0
More on this in the older IPython docs. See also Jupyter's %bookmark magic for sharing directories.
You can pickle the dataframe and then load it in your new notebook. This is fine for short-term data reuse. For long-term data storage, writing and then reading a text CSV file may be more reliable.
pickle_save.py
import os
import pandas as pd
pickle_location = r'd:\temp\pickle_file'
df = pd.DataFrame({'A':1,'B':2}, index=[0])
df.to_pickle(pickle_location)
if os.path.exists(pickle_location):
print('pickle created')
pickle_load.py
import os
import pandas as pd
pickle_location = r'd:\temp\pickle_file'
df_load = pd.read_pickle(pickle_location)
print(df_load)
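For the CSV alternative mentioned above, a minimal sketch (the path below is just an example) would be:
import pandas as pd

csv_location = r'd:\temp\data_file.csv'  # example path, adjust as needed

# In the first notebook: write the dataframe as plain text
df = pd.DataFrame({'A': 1, 'B': 2}, index=[0])
df.to_csv(csv_location, index=False)

# In the new notebook: read it back
df_load = pd.read_csv(csv_location)
print(df_load)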

Using bokeh server in jupyter notebook behind proxy / jupyterhub

I want to develop bokeh apps on a jupyter notebook instance that runs behind jupyterhub (AKA an authenticating proxy). I would like to have interactive bokeh apps calling back to the notebook kernel. I don't want to use the notebook widgets etc because I want to be able to export the notebook as a python file and have something I can serve with bokeh server.
The following code in my notebook gives an empty output with no errors:
from bokeh.layouts import row
from bokeh.models.widgets import Button
from bokeh.io import show, output_notebook
from bokeh.application.handlers import FunctionHandler
from bokeh.application import Application
output_notebook()
# Create the Document Application
def modify_doc(doc):
    layout = row(Button(label="Hello,"), Button(label="world!"))
    doc.add_root(layout)

handler = FunctionHandler(modify_doc)
app = Application(handler)
# Output = BokehJS 0.12.10 successfully loaded.
# New cell
show(app, notebook_url="my-jupyterhub-url.com:80")
# Output = "empty" cell
Inspecting the cell a script tag has been added:
<script src="http://my-jupyterhub-url.com:46249/autoload.js?bokeh-autoload-element=f8fa3bd0-9caf-473d-87a5-6c7b9680648b&bokeh-absolute-url=http://my-jupyterhub-url.com:46249" id="f8fa3bd0-9caf-473d-87a5-6c7b9680648b" data-bokeh-model-id="" data-bokeh-doc-id=""></script>
This will not work because port 46249 isn't open on the jupyterhub proxy. Also the path that routes to my jupyter instance is my-jupyterhub-url.com/user/my-username/ so my-jupyterhub-url.com/autoload.js wouldn't route anywhere.
This feels like it could be a common requirement, but a search hasn't revealed a solution to me yet.
Any ideas?
So I've found a solution that I'm not happy about, but it works, just about.
First install nbserverproxy on your Jupyter instance. This allows you to proxy through JupyterHub (where you are authenticated) onto arbitrary ports on your Jupyter machine/container. I installed by opening a terminal from the Jupyter web front end and typing:
pip install git+https://github.com/jupyterhub/nbserverproxy --user
jupyter serverextension enable --py nbserverproxy --user
Then restart your server. For my install of JupyterHub this was Control Panel -> Stop My Server, wait, then Start My Server.
Finally I monkey-patched IPython.display.publish_display_data (since the source code revealed that bokeh uses it when calling show) in the notebook like so:
from unittest.mock import patch
from IPython.display import publish_display_data
import re

orig = publish_display_data

def proxy_replacer(display_data):
    for key, item in display_data.items():
        if isinstance(item, str):
            item = re.sub(r'(/user/tam203)/?:([0-9]+)', r'\1/proxy/\2', item)
            item = re.sub(r'http:', 'https:', item)
            display_data[key] = item
    return display_data

def mock(data, metadata=None, source=None):
    data = proxy_replacer(data) if data else data
    metadata = proxy_replacer(metadata) if metadata else metadata
    return orig(data, metadata=metadata, source=source)

patcher = patch('IPython.display.publish_display_data', new=mock)
patcher.start()
With that all done, I was then able to run the following and see a nice dynamically updating plot.
import random
from bokeh.io import output_notebook, show
from bokeh.application import Application
from bokeh.application.handlers.function import FunctionHandler
from bokeh.plotting import figure, ColumnDataSource

output_notebook()

def make_document(doc):
    source = ColumnDataSource({'x': [], 'y': [], 'color': []})

    def update():
        new = {'x': [random.random()],
               'y': [random.random()],
               'color': [random.choice(['red', 'blue', 'green'])]}
        source.stream(new)

    doc.add_periodic_callback(update, 100)

    fig = figure(title='Streaming Circle Plot!', sizing_mode='scale_width',
                 x_range=[0, 1], y_range=[0, 1])
    fig.circle(source=source, x='x', y='y', color='color', size=10)

    doc.title = "Now with live updating!"
    doc.add_root(fig)

app = Application(FunctionHandler(make_document))
show(app, notebook_url="<my-domain>.co.uk/user/tam203/")
So while I'm happy to have found a workaround, it doesn't really feel like a solution. I think a smallish change in bokeh could solve this (something like a URL template string where you can specify the path and the port).
According to the official Bokeh documentation, show(obj, notebook_url=remote_jupyter_proxy_url) accepts a notebook_url argument, and apparently this can be a function that accepts a port argument value.
The documentation goes further by providing a reference implementation for the function remote_jupyter_proxy_url in the context of jupyterhub/jupyterlab and proxy extension.
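A minimal sketch of such a function, loosely following the reference implementation in the Bokeh docs; it assumes a server proxy extension (jupyter-server-proxy or nbserverproxy) is installed, that JupyterHub sets JUPYTERHUB_SERVICE_PREFIX, and that you export the hub's public address yourself in an EXTERNAL_URL environment variable:
import os
import urllib.parse

def remote_jupyter_proxy_url(port):
    # EXTERNAL_URL is an assumed variable holding the hub's public address,
    # e.g. "https://my-jupyterhub-url.com".
    base_url = os.environ['EXTERNAL_URL']
    host = urllib.parse.urlparse(base_url).netloc

    # Bokeh calls this with port=None when it only needs the origin host.
    if port is None:
        return host

    # JUPYTERHUB_SERVICE_PREFIX is set by JupyterHub, e.g. "/user/tam203/".
    service_url_path = os.environ['JUPYTERHUB_SERVICE_PREFIX']
    proxy_url_path = 'proxy/%d' % port

    user_url = urllib.parse.urljoin(base_url, service_url_path)
    full_url = urllib.parse.urljoin(user_url, proxy_url_path)
    return full_url

# Then: show(app, notebook_url=remote_jupyter_proxy_url)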
