Jupyter Multiple Notebooks using Same Data - jupyter-notebook

I have built out a Jupyter notebook for my previous analysis.
And I want to start a different branch of analysis, using some of the same dataframes from the previous analysis.
How do I use the previous dataframes in my new notebook without copying all the code that rebuilds my previous analysis?

You can share data across notebooks with Jupyter magics. For example:
Given
# Notebook 1
import pandas as pd
d = {"one" : pd.Series([1., 2., 3.], index=list("abc"))}
df = pd.DataFrame(d)
Code
%store df
Recall the DataFrame in a separate notebook:
# Notebook 2
%store -r df
df
Output
   one
a  1.0
b  2.0
c  3.0
More on this in the older IPython docs. See also Jupyter's %bookmark magic for sharing directories.
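The %bookmark magic keeps a persistent list of named directories, which helps when several notebooks share one data folder. A minimal sketch (the bookmark name and path are made up for illustration):
%bookmark analysis_data ~/projects/analysis/data   # save the directory under a name
%cd -b analysis_data                               # jump to the bookmark from any notebook
%bookmark -l                                       # list all saved bookmarks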

You can pickle the dataframe and then load it in your new notebook. This is fine for short-term data reuse. For long-term storage, writing and then reading a plain-text CSV file may be more reliable.
pickle_save.py
import os
import pandas as pd
pickle_location = r'd:\temp\pickle_file'
df = pd.DataFrame({'A':1,'B':2}, index=[0])
df.to_pickle(pickle_location)
if os.path.exists(pickle_location):
    print('pickle created')
pickle_load.py
import os
import pandas as pd
pickle_location = r'd:\temp\pickle_file'
df_load = pd.read_pickle(pickle_location)
print(df_load)
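For the longer-term CSV route mentioned above, the round trip looks much the same (a minimal sketch; the path is made up for illustration):
import pandas as pd
# df is the dataframe built in pickle_save.py above
csv_location = r'd:\temp\analysis.csv'   # made-up path for illustration
df.to_csv(csv_location, index=False)     # write the dataframe as plain text
df_load = pd.read_csv(csv_location)      # read it back in the new notebook
print(df_load)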

Related

Creating a dataframe in Azure ML Notebook with R kernel

I have written some scripts in R which I have to run in an Azure ML notebook, but I have not found much documentation on how to create a dataset by running code in a notebook with the R kernel. I have written the following Python code, which works with the Python kernel:
from azureml.core import Dataset, Datastore,Workspace
subscription_id = 'abc'
resource_group = 'pqr'
workspace_name = 'xyz'
workspace = Workspace(subscription_id, resource_group, workspace_name)
datastore = Datastore.get(workspace, 'workspaceblobstore')
# create tabular dataset from all parquet files in the directory
tabular_dataset_3 = Dataset.Tabular.from_parquet_files(path=(datastore,'/UI/09-17-2022_125003_UTC/userdata1.parquet'))
df=tabular_dataset_3.to_pandas_dataframe()
It works fine with the Python kernel, but I want to execute the equivalent R code in a notebook with the R kernel.
Can anyone tell me what the equivalent R code is? Any help would be appreciated.
To create an R script and use the dataset, we first need to register the dataset in the portal. Once the dataset has been added to the portal, we get the dataset URL, open the notebook, and use the R kernel.
Upload the dataset and get the data source URL.
Go to Machine Learning Studio and create a new notebook.
Use the R script below to get the dataset and convert it to a dataframe.
azureml_main <- function(dataframe1, dataframe2){
  print("R script run.")
  run <- get_current_run()
  ws <- workspacename   # handle to the workspace
  dataset <- azureml$core$dataset$Dataset$get_by_name(ws, "./path/insurance.csv")
  dataframe2 <- dataset$to_pandas_dataframe()
  # Return the datasets as a named list
  return(list(dataset1 = dataframe1, dataset2 = dataframe2))
}

Importing Excel in Watson Studio

I am trying to read an Excel file (xlsx) into a dataframe in IBM Watson Studio. The Excel file is saved in my list of assets. I'm a bit new to Python.
I have tried creating a project token with some help I got here. I would appreciate it if someone could help with the complete code.
I tried this:
from project_lib import Project
import pandas as pd

project = Project(project_id='',
                  project_access_token='')
pc = project.project_context

file = project.get_file("xx.xlsx")
file.sheet_names
df = pd.ExcelFile(file)
df = file.parse(0)
df.head()
I needed to pass the Excel file into a pandas dataframe, pd for example.
All you need to do is:
First, insert the project token as you already did.
Then simply fetch the file and call .seek(0) on it.
Then read it using pandas' read_excel() and you should be able to read it.
# Fetch the file from the project's assets
my_file = project.get_file("tests-example.xls")

# Rewind the stream, then read the Excel data into a pandas DataFrame
my_file.seek(0)
import pandas as pd
pd.read_excel(my_file, nrows=10)
For more information: https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/project-lib-python.html

Upload images with labels in google collab

I am using a Jupyter notebook in Google Colab. My training dataset looks like this:
/data/label1/img1.jpeg
.
.
.
/data/label2/img90.jpeg
I want to import this dataset. Things that I have tried:
Step 1:
!pip install -U -q PyDrive
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
from os import walk
import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
Step 2:
# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
Step 3:
file_to_download = os.path.expanduser('./data/')
file_list = drive.ListFile(
    {'q': 'id_of_the_data_directory'})
Not sure how to proceed next. The folder data is my Colab notebook folder on the Drive. I want to read the images along with their labels. To do so I am using the code:
filename_queue=tf.train.string_input_producer(tf.train.match_filenames_once('data/*/*.jpeg'))
image_reader=tf.WholeFileReader()
key,image_file=image_reader.read(filename_queue)
#key is the entire path to the jpeg file and we need only the subfolder as the label
S = tf.string_split([key],'\/')
length = tf.cast(S.dense_shape[1],tf.int32)
label = S.values[length-tf.constant(2,dtype=tf.int32)]
label = tf.string_to_number(label,out_type=tf.int32)
#decode the image
image=tf.image.decode_jpeg(image_file)
#then code to place labels and folders in corresponding arrays
You should upload your dataset in a recursive manner. Here is a sample of how to upload your dataset from your Google Drive to Colab.
First of all, I want to mention that we cannot access the folder directly. We need to set a mount point, and all the Drive contents are then accessed via it. Thanks to this answer.
Follow the steps exactly as given in the answer linked above, but make sure to change your path according to the newly created drive folder.
PS: I left the question open because you may reach here with an image dataset whose subfolder names are the labels of the training images; the solution posted here works both for directories with subfolders and for directories with files.
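The mount-point approach described above looks like this in practice (a minimal sketch; the path under /content/drive depends on where the data folder lives in your Drive):
from google.colab import drive
import os

# Mount the whole Drive under a fixed mount point; Colab prompts for authorization
drive.mount('/content/drive')

# After mounting, the dataset is a plain directory tree
data_root = '/content/drive/MyDrive/data'  # assumed location of the data folder
labels = os.listdir(data_root)             # subfolder names double as labels
print(labels)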

Iterating through a directory and assigning each file a variable to work with

I'm really new to Python, so thank you in advance for this probably stupid question:
I have a directory filled with LAS files, and I would like to assign each of them to a variable so that I can keep working with all of them afterwards. I can't seem to find an answer that works... The code below is, for some reason I can't figure out, not working - thanks for the help!
#%%
### Importing function packages ### NumPy, Pandas, PyPlot, OS, LASio, Sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import lasio as las
#%%
#Directory
my_dir = os.walk("C:\\Users\\L_R\\Desktop\\test\\")
my_list = []

#Creates a list with LAS files in the DIR
for i in my_dir:
    my_list.append(i)
print(my_list)

#Iterates through DIR and creates dataframes out of files
count = 0
current_las = my_list[count]
filename = current_las
for dirname, dirs, files in my_dir:
    while count < len(my_list):
        if filename.endswith(".LAS"):
            las_output = las.read(filename)  #module reading the file
            count = count + 1
A few things. First, I assume you have a dictionary and you want to extract the items and assign them to variables using the names saved in the keys. Is that correct?
You could load the entire directory and then save the items to the correct variables.
If I am correct, this code will let you first load the data and then assign it to variables:
D = np.load('Dict.py')
for key, val in D.items():
    exec(key + '=val')
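If the goal is simply one dataframe per LAS file, a dictionary keyed by filename is usually safer than generating variable names with exec. A minimal sketch, assuming lasio is installed and using the directory path from the question:
import os
import lasio

las_dir = "C:\\Users\\L_R\\Desktop\\test\\"
frames = {}

for name in os.listdir(las_dir):
    if name.upper().endswith(".LAS"):
        # lasio reads the file; .df() converts its curves to a pandas DataFrame
        frames[name] = lasio.read(os.path.join(las_dir, name)).df()

# Each file's dataframe is then available by name, e.g.:
# frames["well_01.LAS"].head()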

Animated graphs in ipython notebook

Is there a way of creating animated graphs? For example, showing the same graph with different parameters.
For example, in a SAGE notebook one can write:
a = animate([circle((i,i), 1-1/(i+1), hue=i/10) for i in srange(0,2,0.2)],
            xmin=0, ymin=0, xmax=2, ymax=2, figsize=[2,2])
a.show()
This has horrible flickering, but at least it creates a plot that animates for me. It is based on Aron's, but Aron's does not work as-is.
import time
from numpy import array, sin
import matplotlib.pyplot as plt
from IPython.display import display, clear_output

f, ax = plt.subplots()
n = 30
x = array([i/10.0 for i in range(n)])
y = array([sin(i) for i in x])
for i in range(5, n):
    ax.plot(x[:i], y[:i])
    time.sleep(0.1)
    clear_output()
    display(f)
    ax.cla()  # turn this off if you'd like to "build up" plots
plt.close()
Update: January 2014
Jake Vanderplas has created a Javascript-based package for matplotlib animations available here. Using it is as simple as:
# https://github.com/jakevdp/JSAnimation
from JSAnimation import examples
examples.basic_animation()
See his blog post for a more complete description and examples.
Historical answer (see goger for a correction)
Yes, the Javascript update does not correctly hold the image frame yet, so there is flicker, but you can do something quite simple using this technique:
import time
from numpy import linspace, pi, sin
import matplotlib.pyplot as plt
from IPython.display import display, clear_output

x = linspace(0, 2*pi, 100)
f, ax = plt.subplots()
for i in range(10):
    y = i/10.0*sin(x)
    ax.plot(x, y)
    time.sleep(0.5)
    clear_output()
    display(f)
    ax.cla()  # turn this off if you'd like to "build up" plots
plt.close()
IPython widgets let you manipulate Python objects in the kernel with GUI objects in the Notebook. You might also like Sage hosted IPython Notebooks. One problem you might have with sharing widgets or interactivity in Notebooks is that if someone else doesn't have IPython, they can't run your work. To solve that, you can use Domino to share Notebooks with widgets that others can run.
Below are three examples of widgets you can build in a Notebook using pandas to filter data, fractals, and a slider for a 3D plot. Learn more and see the code and Notebooks here.
If you want to live-stream data or set up a simulation to run as a loop, you can also stream data into plots in a Notebook. Disclaimer: I work for Plotly.
If you use IPython notebook, v2.0 and above support interactive widgets. You can find a good example notebook here (n.b. you need to download and run from your own machine to see the sliders).
It essentially boils down to importing interact and then passing it a function, along with ranges for the parameters. E.g., from the second link:
In [8]:
def pltsin(f, a):
    plot(x, a*sin(2*pi*x*f))
    ylim(-10, 10)
In [9]:
interact(pltsin, f=(1, 10, 0.1), a=(1, 10, 1));
This will produce a plot with two sliders, for f and a.
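For completeness, here is a self-contained version of that snippet (a sketch assuming ipywidgets and an inline matplotlib backend; x is defined explicitly since the original relies on pylab state):
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact

x = np.linspace(0, 1, 200)

def pltsin(f, a):
    # Redraw the sine curve for the current slider values
    plt.plot(x, a*np.sin(2*np.pi*x*f))
    plt.ylim(-10, 10)
    plt.show()

interact(pltsin, f=(1, 10, 0.1), a=(1, 10, 1));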
If you want 3D scatter plot animations, the Ipyvolume Jupyter widget is very impressive.
http://ipyvolume.readthedocs.io/en/latest/animation.html#
bqplot is a really good option to do this now; it's built specifically for animation through Python in the notebook.
https://github.com/bloomberg/bqplot
On @goger's comment about 'horrible flickering', I found that calling clear_output(wait=True) solved my problem. The flag tells clear_output to wait to render until it has something new to render.
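Applied to the loop from the earlier answer, the change is a single argument (same technique as above, just with the wait flag):
import time
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display, clear_output

x = np.linspace(0, 2*np.pi, 100)
f, ax = plt.subplots()
for i in range(10):
    ax.plot(x, i/10.0*np.sin(x))
    clear_output(wait=True)  # hold the old frame until the new one is ready
    display(f)
    time.sleep(0.2)
    ax.cla()
plt.close()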
matplotlib has an animation module to do just that. However, the examples provided on the site will not run as-is in a notebook; you need to make a few tweaks to make them work.
Here is the example from that page, modified to work in a notebook:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from matplotlib import rc
from IPython.display import HTML
fig, ax = plt.subplots()
xdata, ydata = [], []
ln, = plt.plot([], [], 'ro', animated=True)

def init():
    ax.set_xlim(0, 2*np.pi)
    ax.set_ylim(-1, 1)
    return ln,

def update(frame):
    xdata.append(frame)
    ydata.append(np.sin(frame))
    ln.set_data(xdata, ydata)
    return ln,

ani = FuncAnimation(fig, update, frames=np.linspace(0, 2*np.pi, 128),
                    init_func=init, blit=True)
rc('animation', html='html5')
ani
# plt.show()  # not needed anymore
Note that the animation in the notebook is made via a movie and that you need to have ffmpeg installed and matplotlib configured to use it.
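If ffmpeg is not available, matplotlib can render the animation as JavaScript instead of a movie (to_jshtml() ships with matplotlib 2.1+):
from IPython.display import HTML

# Renders the FuncAnimation above as an interactive JS player, no ffmpeg needed
HTML(ani.to_jshtml())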
