{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Access Excel Tables with Python"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## **Access Excel Tables with Python**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This post is about extracting data from Excel tables into Python. Source data is with permission from [ExcelisFun](https://www.youtube.com/watch?v=nBu1Bqa1jjs&t=1780s)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Excel Tables are a great way of grouping related data, as it makes analysis easier. Usually, these tables will have names to identify them, as well as some other cool features. An example image is shown below: "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![ExcelTables.png](extract_defined_tables/image1.png)<br>\n",
    "Source : [support.office.com](https://support.office.com/en-us/article/overview-of-excel-tables-7ab0bb7d-3a9e-4b56-a3c9-6c94334e492c)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![ExcelTables.png](extract_defined_tables/image2.png)<br>\n",
    "Data Source : [ExcelisFun](https://www.youtube.com/watch?v=nBu1Bqa1jjs&t=1780s)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the image above, there are a couple of Excel tables, with defined names - SalesRep, Products, Category, and Supplier tables. How do we read this into Python?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### **Option 1 - The Naive way:**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's read it into pandas "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "filename = \"Data_files/016-MSPTDA-Excel.xlsx\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>SalesRepID</th>\n",
       "      <th>SalesRep</th>\n",
       "      <th>Region</th>\n",
       "      <th>Unnamed: 3</th>\n",
       "      <th>ProductID</th>\n",
       "      <th>Products</th>\n",
       "      <th>RetailPrice</th>\n",
       "      <th>CategoryID</th>\n",
       "      <th>SupplierID</th>\n",
       "      <th>Unnamed: 9</th>\n",
       "      <th>CategoryID.1</th>\n",
       "      <th>Category</th>\n",
       "      <th>Unnamed: 12</th>\n",
       "      <th>SupplierID.1</th>\n",
       "      <th>Supplier</th>\n",
       "      <th>City</th>\n",
       "      <th>State</th>\n",
       "      <th>E-mail</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>Sioux Radcoolinator</td>\n",
       "      <td>NW</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>Quad</td>\n",
       "      <td>43.95</td>\n",
       "      <td>3.0</td>\n",
       "      <td>GB</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>Beginner</td>\n",
       "      <td>NaN</td>\n",
       "      <td>GB</td>\n",
       "      <td>Gel Boomerangs</td>\n",
       "      <td>Oakland</td>\n",
       "      <td>CA</td>\n",
       "      <td>gel@gel-boomerang.com</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Tyrone Smithe</td>\n",
       "      <td>NE</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2.0</td>\n",
       "      <td>Yanaki</td>\n",
       "      <td>27.95</td>\n",
       "      <td>1.0</td>\n",
       "      <td>CO</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2.0</td>\n",
       "      <td>Advanced</td>\n",
       "      <td>NaN</td>\n",
       "      <td>CO</td>\n",
       "      <td>Colorado Boomerangs</td>\n",
       "      <td>Gunnison</td>\n",
       "      <td>CO</td>\n",
       "      <td>Pollock@coloradoboomerang.com</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>Chantel Zoya</td>\n",
       "      <td>SW</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Eagle</td>\n",
       "      <td>19.95</td>\n",
       "      <td>2.0</td>\n",
       "      <td>CC</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3.0</td>\n",
       "      <td>Freestyle</td>\n",
       "      <td>NaN</td>\n",
       "      <td>CC</td>\n",
       "      <td>Channel Craft</td>\n",
       "      <td>Richland</td>\n",
       "      <td>WA</td>\n",
       "      <td>Dino@CC.com</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>Chin Pham</td>\n",
       "      <td>SE</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Bellen</td>\n",
       "      <td>26.95</td>\n",
       "      <td>1.0</td>\n",
       "      <td>GB</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Competition</td>\n",
       "      <td>NaN</td>\n",
       "      <td>DB</td>\n",
       "      <td>Darnell Booms</td>\n",
       "      <td>Burlington</td>\n",
       "      <td>VT</td>\n",
       "      <td>Darnell@Darnell.com</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Diego Vasque</td>\n",
       "      <td>MW</td>\n",
       "      <td>NaN</td>\n",
       "      <td>5.0</td>\n",
       "      <td>Aspen</td>\n",
       "      <td>24.95</td>\n",
       "      <td>1.0</td>\n",
       "      <td>CO</td>\n",
       "      <td>NaN</td>\n",
       "      <td>5.0</td>\n",
       "      <td>Long Distance</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   SalesRepID             SalesRep Region  Unnamed: 3  ProductID Products  \\\n",
       "0           1  Sioux Radcoolinator     NW         NaN        1.0     Quad   \n",
       "1           2        Tyrone Smithe     NE         NaN        2.0   Yanaki   \n",
       "2           3         Chantel Zoya     SW         NaN        3.0    Eagle   \n",
       "3           4            Chin Pham     SE         NaN        4.0   Bellen   \n",
       "4           5         Diego Vasque     MW         NaN        5.0    Aspen   \n",
       "\n",
       "   RetailPrice  CategoryID SupplierID  Unnamed: 9  CategoryID.1  \\\n",
       "0        43.95         3.0         GB         NaN           1.0   \n",
       "1        27.95         1.0         CO         NaN           2.0   \n",
       "2        19.95         2.0         CC         NaN           3.0   \n",
       "3        26.95         1.0         GB         NaN           4.0   \n",
       "4        24.95         1.0         CO         NaN           5.0   \n",
       "\n",
       "        Category  Unnamed: 12 SupplierID.1             Supplier        City  \\\n",
       "0       Beginner          NaN           GB       Gel Boomerangs     Oakland   \n",
       "1       Advanced          NaN           CO  Colorado Boomerangs    Gunnison   \n",
       "2      Freestyle          NaN           CC        Channel Craft    Richland   \n",
       "3    Competition          NaN           DB        Darnell Booms  Burlington   \n",
       "4  Long Distance          NaN          NaN                  NaN         NaN   \n",
       "\n",
       "  State                         E-mail  \n",
       "0    CA          gel@gel-boomerang.com  \n",
       "1    CO  Pollock@coloradoboomerang.com  \n",
       "2    WA                    Dino@CC.com  \n",
       "3    VT            Darnell@Darnell.com  \n",
       "4   NaN                            NaN  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_excel(filename, sheet_name = \"Tables\", engine='openpyxl')\n",
    "\n",
    "#view the first five rows: \n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice how Pandas did not identify the tables - it just pulled in everything, even the empty columns. Also note the mangling of column names(*SupplierID.1*, *CategoryID.1*). This is not good enough. Yes, we could fix it, probably use the empty rows as a means of splitting the dataframe into new dataframes, but that is not wise. How do we truly know where one table starts and the other ends? Surely there has to be a better way. Thankfully there is."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### **Option 2 - The better way :**"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Check on [stackoverflow](https://stackoverflow.com/questions/48657867/manipulate-existing-excel-table-using-openpyxl) and you'll see solutions and conversations regarding this. The better way is via [Openpyxl](https://openpyxl.readthedocs.io/en/stable/usage.html), a python module dedicated to working with Excel files. It has a method - ```.tables``` that allows access to defined tables in the spreadsheet. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import library\n",
    "from openpyxl import load_workbook\n",
    "\n",
    "# read file\n",
    "wb = load_workbook(filename)\n",
    "\n",
    "# access specific sheet\n",
    "ws = wb[\"Tables\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can access the tables in the worksheet through the ``tables`` method - this returns a dictionary :"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'dSalesReps': 'A1:C26',\n",
       " 'dProduct': 'E1:I17',\n",
       " 'dCategory': 'K1:L6',\n",
       " 'dSupplier': 'N1:R5'}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "{key : value for key, value in ws.tables.items()}"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From the result above, we can see the name of each table (*name=dSalesReps*) and the span of the data (*ref='A1:C26*). Let's get our data out :"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "mapping = {}\n",
    "\n",
    "for entry, data_boundary in ws.tables.items():\n",
    "    #parse the data within the ref boundary\n",
    "    data = ws[data_boundary]\n",
    "    #extract the data \n",
    "    #the inner list comprehension gets the values for each cell in the table\n",
    "    content = [[cell.value for cell in ent] \n",
    "               for ent in data\n",
    "          ]\n",
    "    \n",
    "    header = content[0]\n",
    "    \n",
    "    #the contents ... excluding the header\n",
    "    rest = content[1:]\n",
    "    \n",
    "    #create dataframe with the column names\n",
    "    #and pair table name with dataframe\n",
    "    df = pd.DataFrame(rest, columns = header)\n",
    "    mapping[entry] = df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'dSalesReps':     SalesRepID              SalesRep Region\n",
       " 0            1   Sioux Radcoolinator     NW\n",
       " 1            2         Tyrone Smithe     NE\n",
       " 2            3          Chantel Zoya     SW\n",
       " 3            4             Chin Pham     SE\n",
       " 4            5          Diego Vasque     MW\n",
       " 5            6      Vannessa Deloach      W\n",
       " 6            7            Shon Stein     NW\n",
       " 7            8           Tomi Benton     NE\n",
       " 8            9     Ghislaine Stidham     SW\n",
       " 9           10       Yoshiko Murillo     SE\n",
       " 10          11            Hoyt Potts     MW\n",
       " 11          12         Alysha Dewitt      W\n",
       " 12          13       Claudine Dupuis     NW\n",
       " 13          14         Shanta Spring     NE\n",
       " 14          15      Ramonita Babcock     SW\n",
       " 15          16     Janyce Betancourt     SE\n",
       " 16          17       Rhiannon Cathey     MW\n",
       " 17          18      Dominica Ordonez      W\n",
       " 18          19       Rana Burchfield     NW\n",
       " 19          20            Neida Ashe     NE\n",
       " 20          21  Marylouise Halverson     MW\n",
       " 21          22           Naoma Bloom     NW\n",
       " 22          23            JoJo Jones      W\n",
       " 23          24       Dean Washington      W\n",
       " 24          25              Kiki Lim      W,\n",
       " 'dProduct':     ProductID      Products  RetailPrice  CategoryID SupplierID\n",
       " 0           1          Quad        43.95           3         GB\n",
       " 1           2        Yanaki        27.95           1         CO\n",
       " 2           3         Eagle        19.95           2         CC\n",
       " 3           4        Bellen        26.95           1         GB\n",
       " 4           5         Aspen        24.95           1         CO\n",
       " 5           6       Carlota        23.95           3         GB\n",
       " 6           7      Sunshine        19.95           4         GB\n",
       " 7           8        Sunset        22.95           4         GB\n",
       " 8           9         Beaut        35.95           5         CO\n",
       " 9          10      Kangaroo        25.00           2         CC\n",
       " 10         11       Elevate        48.95           5         GB\n",
       " 11         12       Flattop        25.95           2         CO\n",
       " 12         13         Vrang        12.95           4         DB\n",
       " 13         14        TriFly        21.95           3         DB\n",
       " 14         15  NaturalElbow        35.00           4         DB\n",
       " 15         16      LongRang        41.00           5         CC,\n",
       " 'dCategory':    CategoryID       Category\n",
       " 0           1       Beginner\n",
       " 1           2       Advanced\n",
       " 2           3      Freestyle\n",
       " 3           4    Competition\n",
       " 4           5  Long Distance,\n",
       " 'dSupplier':   SupplierID             Supplier        City State  \\\n",
       " 0         GB       Gel Boomerangs     Oakland    CA   \n",
       " 1         CO  Colorado Boomerangs    Gunnison    CO   \n",
       " 2         CC        Channel Craft    Richland    WA   \n",
       " 3         DB        Darnell Booms  Burlington    VT   \n",
       " \n",
       "                           E-mail  \n",
       " 0          gel@gel-boomerang.com  \n",
       " 1  Pollock@coloradoboomerang.com  \n",
       " 2                    Dino@CC.com  \n",
       " 3            Darnell@Darnell.com  }"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mapping"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can safely extract our tables from the dictionary :"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "dSalesReps, dProduct, dCategory, dSupplier = mapping.values()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's view some of the dataframes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>SalesRepID</th>\n",
       "      <th>SalesRep</th>\n",
       "      <th>Region</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>Sioux Radcoolinator</td>\n",
       "      <td>NW</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Tyrone Smithe</td>\n",
       "      <td>NE</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>Chantel Zoya</td>\n",
       "      <td>SW</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>Chin Pham</td>\n",
       "      <td>SE</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Diego Vasque</td>\n",
       "      <td>MW</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   SalesRepID             SalesRep Region\n",
       "0           1  Sioux Radcoolinator     NW\n",
       "1           2        Tyrone Smithe     NE\n",
       "2           3         Chantel Zoya     SW\n",
       "3           4            Chin Pham     SE\n",
       "4           5         Diego Vasque     MW"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dSalesReps.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>SupplierID</th>\n",
       "      <th>Supplier</th>\n",
       "      <th>City</th>\n",
       "      <th>State</th>\n",
       "      <th>E-mail</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>GB</td>\n",
       "      <td>Gel Boomerangs</td>\n",
       "      <td>Oakland</td>\n",
       "      <td>CA</td>\n",
       "      <td>gel@gel-boomerang.com</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>CO</td>\n",
       "      <td>Colorado Boomerangs</td>\n",
       "      <td>Gunnison</td>\n",
       "      <td>CO</td>\n",
       "      <td>Pollock@coloradoboomerang.com</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>CC</td>\n",
       "      <td>Channel Craft</td>\n",
       "      <td>Richland</td>\n",
       "      <td>WA</td>\n",
       "      <td>Dino@CC.com</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>DB</td>\n",
       "      <td>Darnell Booms</td>\n",
       "      <td>Burlington</td>\n",
       "      <td>VT</td>\n",
       "      <td>Darnell@Darnell.com</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  SupplierID             Supplier        City State  \\\n",
       "0         GB       Gel Boomerangs     Oakland    CA   \n",
       "1         CO  Colorado Boomerangs    Gunnison    CO   \n",
       "2         CC        Channel Craft    Richland    WA   \n",
       "3         DB        Darnell Booms  Burlington    VT   \n",
       "\n",
       "                          E-mail  \n",
       "0          gel@gel-boomerang.com  \n",
       "1  Pollock@coloradoboomerang.com  \n",
       "2                    Dino@CC.com  \n",
       "3            Darnell@Darnell.com  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dSupplier"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There we have it, our excel tables successfully extracted and assigned to variables. We can proceed from here and run our computations in Pandas."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### **Option 3 - An even simpler form :**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The steps above can be simplified by using the [xlsx_table](https://pyjanitor-devs.github.io/pyjanitor/api/io/#janitor.io.xlsx_table) function from [pyjanitor](https://pyjanitor-devs.github.io/pyjanitor/). Simply install [pyjanitor](https://github.com/pyjanitor-devs/pyjanitor) with pip, and read in the Excel table:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>CategoryID</th>\n",
       "      <th>Category</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>Beginner</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Advanced</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>Freestyle</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>Competition</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Long Distance</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   CategoryID       Category\n",
       "0           1       Beginner\n",
       "1           2       Advanced\n",
       "2           3      Freestyle\n",
       "3           4    Competition\n",
       "4           5  Long Distance"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# pip install pyjanitor\n",
    "from janitor import xlsx_table\n",
    "\n",
    "# read in the specific table:\n",
    "xlsx_table(filename, table='dCategory')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All the tables can be read at once; it will be loaded as a dictionary, and the relevant table accessed:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict_keys(['dSalesReps', 'dProduct', 'dCategory', 'dSupplier'])"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tables = xlsx_table(filename)\n",
    "\n",
    "# view all the keys, which are the table names:\n",
    "tables.keys()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Access a specific table, using Python's dictionary syntax:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>SalesRepID</th>\n",
       "      <th>SalesRep</th>\n",
       "      <th>Region</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>Sioux Radcoolinator</td>\n",
       "      <td>NW</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Tyrone Smithe</td>\n",
       "      <td>NE</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>Chantel Zoya</td>\n",
       "      <td>SW</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>Chin Pham</td>\n",
       "      <td>SE</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Diego Vasque</td>\n",
       "      <td>MW</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6</td>\n",
       "      <td>Vannessa Deloach</td>\n",
       "      <td>W</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>7</td>\n",
       "      <td>Shon Stein</td>\n",
       "      <td>NW</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>8</td>\n",
       "      <td>Tomi Benton</td>\n",
       "      <td>NE</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>9</td>\n",
       "      <td>Ghislaine Stidham</td>\n",
       "      <td>SW</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>10</td>\n",
       "      <td>Yoshiko Murillo</td>\n",
       "      <td>SE</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>11</td>\n",
       "      <td>Hoyt Potts</td>\n",
       "      <td>MW</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>12</td>\n",
       "      <td>Alysha Dewitt</td>\n",
       "      <td>W</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>13</td>\n",
       "      <td>Claudine Dupuis</td>\n",
       "      <td>NW</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>14</td>\n",
       "      <td>Shanta Spring</td>\n",
       "      <td>NE</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>15</td>\n",
       "      <td>Ramonita Babcock</td>\n",
       "      <td>SW</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>16</td>\n",
       "      <td>Janyce Betancourt</td>\n",
       "      <td>SE</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>17</td>\n",
       "      <td>Rhiannon Cathey</td>\n",
       "      <td>MW</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>18</td>\n",
       "      <td>Dominica Ordonez</td>\n",
       "      <td>W</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>19</td>\n",
       "      <td>Rana Burchfield</td>\n",
       "      <td>NW</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>20</td>\n",
       "      <td>Neida Ashe</td>\n",
       "      <td>NE</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>21</td>\n",
       "      <td>Marylouise Halverson</td>\n",
       "      <td>MW</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>22</td>\n",
       "      <td>Naoma Bloom</td>\n",
       "      <td>NW</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>23</td>\n",
       "      <td>JoJo Jones</td>\n",
       "      <td>W</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>24</td>\n",
       "      <td>Dean Washington</td>\n",
       "      <td>W</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>25</td>\n",
       "      <td>Kiki Lim</td>\n",
       "      <td>W</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    SalesRepID              SalesRep Region\n",
       "0            1   Sioux Radcoolinator     NW\n",
       "1            2         Tyrone Smithe     NE\n",
       "2            3          Chantel Zoya     SW\n",
       "3            4             Chin Pham     SE\n",
       "4            5          Diego Vasque     MW\n",
       "5            6      Vannessa Deloach      W\n",
       "6            7            Shon Stein     NW\n",
       "7            8           Tomi Benton     NE\n",
       "8            9     Ghislaine Stidham     SW\n",
       "9           10       Yoshiko Murillo     SE\n",
       "10          11            Hoyt Potts     MW\n",
       "11          12         Alysha Dewitt      W\n",
       "12          13       Claudine Dupuis     NW\n",
       "13          14         Shanta Spring     NE\n",
       "14          15      Ramonita Babcock     SW\n",
       "15          16     Janyce Betancourt     SE\n",
       "16          17       Rhiannon Cathey     MW\n",
       "17          18      Dominica Ordonez      W\n",
       "18          19       Rana Burchfield     NW\n",
       "19          20            Neida Ashe     NE\n",
       "20          21  Marylouise Halverson     MW\n",
       "21          22           Naoma Bloom     NW\n",
       "22          23            JoJo Jones      W\n",
       "23          24       Dean Washington      W\n",
       "24          25              Kiki Lim      W"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tables['dSalesReps']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "you can also read it into a polars DataFrame:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (5, 2)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>CategoryID</th><th>Category</th></tr><tr><td>i64</td><td>str</td></tr></thead><tbody><tr><td>1</td><td>&quot;Beginner&quot;</td></tr><tr><td>2</td><td>&quot;Advanced&quot;</td></tr><tr><td>3</td><td>&quot;Freestyle&quot;</td></tr><tr><td>4</td><td>&quot;Competition&quot;</td></tr><tr><td>5</td><td>&quot;Long Distance&quot;</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (5, 2)\n",
       "┌────────────┬───────────────┐\n",
       "│ CategoryID ┆ Category      │\n",
       "│ ---        ┆ ---           │\n",
       "│ i64        ┆ str           │\n",
       "╞════════════╪═══════════════╡\n",
       "│ 1          ┆ Beginner      │\n",
       "│ 2          ┆ Advanced      │\n",
       "│ 3          ┆ Freestyle     │\n",
       "│ 4          ┆ Competition   │\n",
       "│ 5          ┆ Long Distance │\n",
       "└────────────┴───────────────┘"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "xlsx_table(filename, table='dCategory', engine='polars')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Comments\n",
    "<script src=\"https://utteranc.es/client.js\"\n",
    "        repo=\"samukweku/data-wrangling-blog\"\n",
    "        issue-term=\"title\"\n",
    "        theme=\"github-light\"\n",
    "        crossorigin=\"anonymous\"\n",
    "        async>\n",
    "</script>"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "formats": "ipynb,md"
  },
  "kernelspec": {
   "display_name": "blogger",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}