This blog post introduces Python® for data management and reporting. It uses Jupyter® Notebook to access data sources, explore reading and writing data, and generate reports from the source data.
Jupyter Notebook consists of the following components:
Note: I used Ubuntu® 18 on my server for this installation.
Perform the following steps:
Install all security patches on the server using the following command:
sudo apt upgrade -y
Run the following command to install one or more packages for converting notebooks to PDF:
sudo apt install -y texlive-xetex
To use Jupyter Notebook, install Anaconda, a data science and machine learning platform based on Python. Download the Anaconda3-2019.03-Linux-x86_64.sh package.
Run the following command to ensure this download package is not corrupt:
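The verification command itself is not shown here. As one possible check, the following Python sketch computes a digest of the downloaded installer so that you can compare it with the hash published for that file. The function name file_digest_hex and the default of SHA-256 are assumptions made for illustration; use whichever algorithm the download page lists.

```python
import hashlib
from pathlib import Path


def file_digest_hex(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so a large installer never sits fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    installer = Path("Anaconda3-2019.03-Linux-x86_64.sh")
    if installer.exists():
        print(file_digest_hex(installer))
    else:
        print(f"{installer} not found in the current directory")
```

If the printed value differs from the published hash, download the package again before installing it.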
Run the following command to install Anaconda:
After the installation completes, restart the bash shell to include the changes by running the following command:
Update the conda® base code by running the following commands:
conda config --set auto_activate_base false
conda update -n base -c defaults conda
Create the conda virtual environment (such as python_data) using the latest Python version by running the following command:
conda create -n python_data python=3
Activate this virtual environment by running the following command:
conda activate python_data
Run the following commands to make a directory to hold this GitHub repo:
mkdir python_data
cd python_data
Run the following commands to install the packages:
conda activate python_data
conda install jupyter psutil
Perform the following steps to connect to the Jupyter Notebook server:
Activate the virtual environment using the following command:
conda activate python_data
Run the following script to start the server:
This script uses the following command to start the Jupyter Notebook server and continues to run it outside of the terminal window:
nohup jupyter notebook --no-browser --port=8086
If you run the preceding command in the background, use get_notebook_token.py to search the nohup output for the token, as highlighted in the following snapshot:
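The contents of get_notebook_token.py are not shown here. As a rough idea of what such a script might do, the following sketch scans the nohup output for the token that Jupyter Notebook prints in its access URL. The function name find_notebook_token and the file name nohup.out (the default file that nohup writes to) are assumptions, not the original script.

```python
import re
from pathlib import Path

# Jupyter Notebook logs access URLs of the form
#   http://localhost:8086/?token=<hex string>
TOKEN_PATTERN = re.compile(r"[?&]token=([0-9a-f]+)")


def find_notebook_token(log_text):
    """Return the last token found in the log text, or None if there is none."""
    matches = TOKEN_PATTERN.findall(log_text)
    return matches[-1] if matches else None


if __name__ == "__main__":
    log_file = Path("nohup.out")  # assumed location of the nohup output
    if log_file.exists():
        print(find_notebook_token(log_file.read_text(errors="ignore")))
    else:
        print("nohup.out not found; start the server first")
```

The sketch returns the last match so that a restarted server's newest token wins. Running jupyter notebook list on the server is another way to see the running servers and their tokens.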
Go to the terminal window of your local machine and enter the following command:
ssh -N -L localhost:8087:localhost:8086 username@<the public IP address of the server>
Leave this terminal open while using the Jupyter Notebook server in this session.
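Before opening the browser, you may want to confirm that the local end of the tunnel is accepting connections. The following sketch tries a TCP connection to localhost:8087; the function name port_is_open is made up for illustration. It only shows that something is listening on the local port, not that the Jupyter Notebook server behind the tunnel is healthy.

```python
import socket


def port_is_open(host="localhost", port=8087, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Port 8087 is listening: open http://localhost:8087 in the browser")
    else:
        print("Nothing is listening on localhost:8087; check the ssh command")
```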
In the browser, enter the URL http://localhost:8087 and provide the token when prompted.
After logging in, you can create a Jupyter Notebook by using Python code and Markdown cells.
Use pandas DataFrames inside Jupyter Notebook to examine the data. Run the following command to import the pandas package:
import pandas as pd
There are different ways to examine data using a pandas DataFrame:
.iloc[row, column]. Remember: Python indexes start at 0.
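To make the positional indexing concrete, here is a small sketch of .iloc on a DataFrame. The column names and values are invented purely for illustration and do not come from the data set used in the report.

```python
import pandas as pd

# Made-up values, used only to illustrate positional indexing.
df = pd.DataFrame(
    {
        "region": ["north", "south", "east"],
        "households": [120, 95, 143],
        "median_income": [48000, 52000, 45500],
    }
)

first_row = df.iloc[0]        # first row, returned as a Series
one_cell = df.iloc[1, 2]      # second row, third column
top_left = df.iloc[0:2, 0:2]  # first two rows and first two columns
last_column = df.iloc[:, -1]  # every row of the last column

print(first_row)
print(one_cell)
print(top_left)
print(last_column)
```

Slices in .iloc exclude the end position, so 0:2 selects positions 0 and 1 only.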
To write a report, you need to use LaTeX®. LaTeX support, built into Jupyter Notebook, helps you create technical and scientific documents and is good at displaying mathematical formulas. You can write LaTeX formulas in a Markdown cell.
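As a small illustration, typing the following into a Markdown cell and running the cell displays the sample mean as a centered formula. The formula is only an example and is not taken from the report.

```latex
$$
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
$$
```

Wrapping an expression in single dollar signs instead, such as $\bar{x}$, places it inline with the surrounding text.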
For the Jupyter report, use a data set that allows you to consider some aspects of socioeconomic factors that may impact it.
To create the report, run the following commands to import the supporting packages:
import matplotlib.pyplot
import numpy
import pandas
import random
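As a rough idea of how these four packages can work together in a report cell, the sketch below builds a small synthetic data set and plots it. The data is random and the function name build_sample_figure is hypothetical; neither reflects the data set or the code used for the actual report. The short aliases plt, np, and pd are a common convention rather than a requirement.

```python
import random

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def build_sample_figure(n_points=50, seed=0):
    """Plot synthetic x/y data and return the DataFrame and the Figure."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    x = np.linspace(0, 10, n_points)
    noise = np.array([rng.gauss(0, 1) for _ in range(n_points)])
    df = pd.DataFrame({"x": x, "y": 2 * x + noise})

    fig, ax = plt.subplots()
    ax.scatter(df["x"], df["y"], label="synthetic data")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.legend()
    return df, fig


df, fig = build_sample_figure()
print(df.describe())
```

In a notebook, leaving fig as the last expression in a cell displays the chart below the cell.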
With Jupyter Notebook, you can create and share documents that contain live code, equations, and visualizations that help with data cleaning, transformation, and statistical modeling in data science.