{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# SpreadSheet Munging Strategies in Python - Meaningful Formats"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## __Meaningful Formats__"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*updated : September 24, 2024*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is part of a series of blog posts about extracting data from spreadsheets using Python. It is based on the [book](https://nacnudus.github.io/spreadsheet-munging-strategies/index.html) written by [Duncan Garmonsway](https://twitter.com/nacnudus?lang=en), which was written primarily for R users. Links to the other posts are on the sidebar.\n",
"\n",
"The key takeaway is this - you understand your data layout; use the tools to achieve your end goal. [xlsx_cells](https://pyjanitor-devs.github.io/pyjanitor/api/io/#janitor.io.xlsx_cells) offers a way to get the cells in a spreadsheet into individual rows, with some metadata. The final outcome however relies on your understanding of the data layout and its proper application."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is an excerpt from [Duncan's book](https://nacnudus.github.io/spreadsheet-munging-strategies/tidy-formatted-rows.html) regarding meaningful formats :\n",
">Sometimes whole rows in a table are highlighted by formatting them with, say, a bright yellow fill. The highlighting could mean “this observation should be ignored”, or “this product is no longer available”. Different colours could mean different levels of a hierarchy, e.g. green for “pass” and red for “fail”.\n",
"\n",
"The examples below highlight various ways of dealing with meaningfully formatted spreadsheets. Let's dive in."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### **Case 1 : Meaningfully Formatted Rows**\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The goal is to capture the color as part of our final output. We'll make use of the [xlsx_cells](https://pyjanitor-devs.github.io/pyjanitor/api/io/#janitor.io.xlsx_cells) function from [pyjanitor](https://pyjanitor-devs.github.io/pyjanitor/) - under the hood, it uses [openpyxl](https://openpyxl.readthedocs.io/en/stable/index.html):"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"# pip install pyjanitor\n",
"import pandas as pd\n",
"import janitor as jn\n",
"import sys\n",
"import numpy as np\n",
"\n",
"filename = \"Data_files/worked-examples.xlsx\""
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'0.29.1'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print('pandas version: ', pd.__version__)\n",
"print('janitor version: ', jn.__version__)\n",
"print('python version: ', sys.version)\n",
"print('numpy version: ', np.__version__)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" value | \n",
" internal_value | \n",
" coordinate | \n",
" row | \n",
" column | \n",
" data_type | \n",
" is_date | \n",
" number_format | \n",
" fill | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Age | \n",
" Age | \n",
" A1 | \n",
" 1 | \n",
" 1 | \n",
" s | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
"
\n",
" \n",
" 1 | \n",
" Height | \n",
" Height | \n",
" B1 | \n",
" 1 | \n",
" 2 | \n",
" s | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
"
\n",
" \n",
" 2 | \n",
" 1 | \n",
" 1 | \n",
" A2 | \n",
" 2 | \n",
" 1 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
"
\n",
" \n",
" 3 | \n",
" 2 | \n",
" 2 | \n",
" B2 | \n",
" 2 | \n",
" 2 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
"
\n",
" \n",
" 4 | \n",
" 3 | \n",
" 3 | \n",
" A3 | \n",
" 3 | \n",
" 1 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'patternType': 'solid', 'fgColor': {'rgb': 'F... | \n",
"
\n",
" \n",
" 5 | \n",
" 4 | \n",
" 4 | \n",
" B3 | \n",
" 3 | \n",
" 2 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'patternType': 'solid', 'fgColor': {'rgb': 'F... | \n",
"
\n",
" \n",
" 6 | \n",
" 5 | \n",
" 5 | \n",
" A4 | \n",
" 4 | \n",
" 1 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
"
\n",
" \n",
" 7 | \n",
" 6 | \n",
" 6 | \n",
" B4 | \n",
" 4 | \n",
" 2 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" value internal_value coordinate row column data_type is_date \\\n",
"0 Age Age A1 1 1 s False \n",
"1 Height Height B1 1 2 s False \n",
"2 1 1 A2 2 1 n False \n",
"3 2 2 B2 2 2 n False \n",
"4 3 3 A3 3 1 n False \n",
"5 4 4 B3 3 2 n False \n",
"6 5 5 A4 4 1 n False \n",
"7 6 6 B4 4 2 n False \n",
"\n",
" number_format fill \n",
"0 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"1 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"2 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"3 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"4 General {'patternType': 'solid', 'fgColor': {'rgb': 'F... \n",
"5 General {'patternType': 'solid', 'fgColor': {'rgb': 'F... \n",
"6 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"7 General {'patternType': None, 'fgColor': {'rgb': '0000... "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"frame = jn.xlsx_cells(filename, sheetnames=\"highlights\", fill=True)\n",
"frame"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can view a single row in the fill column:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'patternType': None,\n",
" 'fgColor': {'rgb': '00000000', 'type': 'rgb', 'tint': 0.0},\n",
" 'bgColor': {'rgb': '00000000', 'type': 'rgb', 'tint': 0.0}}"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"frame.fill[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Steps:\n",
"- get the headers\n",
"- get the data(numbers)\n",
"- get the colours\n",
"- combine to create the final DataFrame"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"headers:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"['Age', 'Height']\n",
"Length: 2, dtype: object"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"headers = frame.loc[frame.row == \"1\", \"value\"].array\n",
"headers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"data:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[1, 2],\n",
" [3, 4],\n",
" [5, 6]], dtype=int8)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"booleans = frame.data_type == \"n\"\n",
"data = frame.loc[booleans, \"value\"].array\n",
"# There are only two columns in the spreadsheet;\n",
"# we'll use that knowledge in reshaping the integer values:\n",
"data = data.astype(np.int8).reshape((-1, 2))\n",
"data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"colours:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"['00000000', 'FFFFFF00', '00000000']\n",
"Length: 3, dtype: object"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fill_colour = frame.fill.str.get(\"fgColor\").str.get(\"rgb\").array\n",
"# keep only rows related to the integers:\n",
"fill_colour = fill_colour[booleans]\n",
"# the number of columns is 2,\n",
"# so we only need to jump 2 steps\n",
"# to get the corresponding colour per row\n",
"fill_colour = fill_colour[::2]\n",
"fill_colour"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Age | \n",
" Height | \n",
" fill_colour | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
" 00000000 | \n",
"
\n",
" \n",
" 1 | \n",
" 3 | \n",
" 4 | \n",
" FFFFFF00 | \n",
"
\n",
" \n",
" 2 | \n",
" 5 | \n",
" 6 | \n",
" 00000000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Age Height fill_colour\n",
"0 1 2 00000000\n",
"1 3 4 FFFFFF00\n",
"2 5 6 00000000"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"outcome = pd.DataFrame(data, columns=headers).assign(fill_colour=fill_colour)\n",
"outcome"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"(case-2)=\n",
"### **Case 2 : Meaningfully Formatted Cells**\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the pic above, the table has different colors for different cells. [xlsx_cells](https://pyjanitor-devs.github.io/pyjanitor/api/io/#janitor.io.xlsx_cells) comes in handy again, and the focus here, just as in Case 1, is the cell's fill attribute."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" value | \n",
" internal_value | \n",
" coordinate | \n",
" row | \n",
" column | \n",
" data_type | \n",
" is_date | \n",
" number_format | \n",
" fill | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Name | \n",
" Name | \n",
" A1 | \n",
" 1 | \n",
" 1 | \n",
" s | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
"
\n",
" \n",
" 1 | \n",
" Age | \n",
" Age | \n",
" B1 | \n",
" 1 | \n",
" 2 | \n",
" s | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
"
\n",
" \n",
" 2 | \n",
" Height | \n",
" Height | \n",
" C1 | \n",
" 1 | \n",
" 3 | \n",
" s | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
"
\n",
" \n",
" 3 | \n",
" Matilda | \n",
" Matilda | \n",
" A2 | \n",
" 2 | \n",
" 1 | \n",
" s | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
"
\n",
" \n",
" 4 | \n",
" 1 | \n",
" 1 | \n",
" B2 | \n",
" 2 | \n",
" 2 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
"
\n",
" \n",
" 5 | \n",
" 2 | \n",
" 2 | \n",
" C2 | \n",
" 2 | \n",
" 3 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
"
\n",
" \n",
" 6 | \n",
" Nicholas | \n",
" Nicholas | \n",
" A3 | \n",
" 3 | \n",
" 1 | \n",
" s | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
"
\n",
" \n",
" 7 | \n",
" 3 | \n",
" 3 | \n",
" B3 | \n",
" 3 | \n",
" 2 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'patternType': 'solid', 'fgColor': {'rgb': 'F... | \n",
"
\n",
" \n",
" 8 | \n",
" 4 | \n",
" 4 | \n",
" C3 | \n",
" 3 | \n",
" 3 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
"
\n",
" \n",
" 9 | \n",
" Olivia | \n",
" Olivia | \n",
" A4 | \n",
" 4 | \n",
" 1 | \n",
" s | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
"
\n",
" \n",
" 10 | \n",
" 5 | \n",
" 5 | \n",
" B4 | \n",
" 4 | \n",
" 2 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
"
\n",
" \n",
" 11 | \n",
" 6 | \n",
" 6 | \n",
" C4 | \n",
" 4 | \n",
" 3 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'patternType': 'solid', 'fgColor': {'rgb': 'F... | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" value internal_value coordinate row column data_type is_date \\\n",
"0 Name Name A1 1 1 s False \n",
"1 Age Age B1 1 2 s False \n",
"2 Height Height C1 1 3 s False \n",
"3 Matilda Matilda A2 2 1 s False \n",
"4 1 1 B2 2 2 n False \n",
"5 2 2 C2 2 3 n False \n",
"6 Nicholas Nicholas A3 3 1 s False \n",
"7 3 3 B3 3 2 n False \n",
"8 4 4 C3 3 3 n False \n",
"9 Olivia Olivia A4 4 1 s False \n",
"10 5 5 B4 4 2 n False \n",
"11 6 6 C4 4 3 n False \n",
"\n",
" number_format fill \n",
"0 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"1 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"2 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"3 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"4 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"5 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"6 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"7 General {'patternType': 'solid', 'fgColor': {'rgb': 'F... \n",
"8 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"9 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"10 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"11 General {'patternType': 'solid', 'fgColor': {'rgb': 'F... "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"frame = jn.xlsx_cells(filename, sheetnames=\"annotations\", fill=True).astype(\n",
" {\"row\": np.int8, \"column\": np.int8}\n",
")\n",
"frame"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Steps:\n",
"- get the headers\n",
"- get the data in separate columns(strings, numbers)\n",
"- get the colours\n",
"- combine to create the final DataFrame"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"headers:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" measure | \n",
" column | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Name | \n",
" 1 | \n",
"
\n",
" \n",
" 1 | \n",
" Age | \n",
" 2 | \n",
"
\n",
" \n",
" 2 | \n",
" Height | \n",
" 3 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" measure column\n",
"0 Name 1\n",
"1 Age 2\n",
"2 Height 3"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# all the headers are on the first row\n",
"# retain the column positions,\n",
"# so we can link them back to the data and colours\n",
"headers = frame.loc[frame.row.eq(1), [\"value\", \"column\"]]\n",
"headers.columns = [\"measure\", \"column\"]\n",
"headers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"data:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" name | \n",
" row | \n",
"
\n",
" \n",
" \n",
" \n",
" 3 | \n",
" Matilda | \n",
" 2 | \n",
"
\n",
" \n",
" 6 | \n",
" Nicholas | \n",
" 3 | \n",
"
\n",
" \n",
" 9 | \n",
" Olivia | \n",
" 4 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name row\n",
"3 Matilda 2\n",
"6 Nicholas 3\n",
"9 Olivia 4"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# the string values have a data type of 's'\n",
"# and the row numbers are greater than 1\n",
"# we'll keep the row number so we can correctly\n",
"# match with the measurements(age, height)\n",
"booleans = frame.row.gt(1) & frame.data_type.eq(\"s\")\n",
"students = frame.loc[booleans, [\"value\", \"row\"]]\n",
"students.columns = [\"name\", \"row\"]\n",
"students"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" value | \n",
" row | \n",
" column | \n",
"
\n",
" \n",
" \n",
" \n",
" 4 | \n",
" 1 | \n",
" 2 | \n",
" 2 | \n",
"
\n",
" \n",
" 5 | \n",
" 2 | \n",
" 2 | \n",
" 3 | \n",
"
\n",
" \n",
" 7 | \n",
" 3 | \n",
" 3 | \n",
" 2 | \n",
"
\n",
" \n",
" 8 | \n",
" 4 | \n",
" 3 | \n",
" 3 | \n",
"
\n",
" \n",
" 10 | \n",
" 5 | \n",
" 4 | \n",
" 2 | \n",
"
\n",
" \n",
" 11 | \n",
" 6 | \n",
" 4 | \n",
" 3 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" value row column\n",
"4 1 2 2\n",
"5 2 2 3\n",
"7 3 3 2\n",
"8 4 3 3\n",
"10 5 4 2\n",
"11 6 4 3"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# the scores values have a data type of 'n'\n",
"# and the row numbers are greater than 1\n",
"# we'll keep the row and column columns\n",
"# so we can correctly\n",
"# match with the student names\n",
"booleans = frame.row.gt(1) & frame.data_type.eq(\"n\")\n",
"numbers = frame.loc[booleans, [\"value\", \"row\", \"column\"]]\n",
"numbers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"colours:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4 00000000\n",
"5 00000000\n",
"7 FFFFFF00\n",
"8 00000000\n",
"10 00000000\n",
"11 FF92D050\n",
"Name: fill, dtype: object"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fill_colour = frame.fill.str.get(\"fgColor\").str.get(\"rgb\")\n",
"# keep only rows for numbers:\n",
"fill_colour = fill_colour[booleans]\n",
"fill_colour"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"combine into a single DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" name | \n",
" measure | \n",
" value | \n",
" fill_colour | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Matilda | \n",
" Age | \n",
" 1 | \n",
" 00000000 | \n",
"
\n",
" \n",
" 1 | \n",
" Matilda | \n",
" Height | \n",
" 2 | \n",
" 00000000 | \n",
"
\n",
" \n",
" 2 | \n",
" Nicholas | \n",
" Age | \n",
" 3 | \n",
" FFFFFF00 | \n",
"
\n",
" \n",
" 3 | \n",
" Nicholas | \n",
" Height | \n",
" 4 | \n",
" 00000000 | \n",
"
\n",
" \n",
" 4 | \n",
" Olivia | \n",
" Age | \n",
" 5 | \n",
" 00000000 | \n",
"
\n",
" \n",
" 5 | \n",
" Olivia | \n",
" Height | \n",
" 6 | \n",
" FF92D050 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name measure value fill_colour\n",
"0 Matilda Age 1 00000000\n",
"1 Matilda Height 2 00000000\n",
"2 Nicholas Age 3 FFFFFF00\n",
"3 Nicholas Height 4 00000000\n",
"4 Olivia Age 5 00000000\n",
"5 Olivia Height 6 FF92D050"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"outcome = (\n",
" numbers.assign(fill_colour=fill_colour)\n",
" .merge(students, on=\"row\")\n",
" .merge(headers, on=\"column\")\n",
" .loc[:, [\"name\", \"measure\", \"value\", \"fill_colour\"]]\n",
")\n",
"\n",
"outcome"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### **Case 3 : Layered Meaningful Formatting**\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this case, we have formats for fill and font. \n",
"\n",
"The format applies to the entire row."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Steps:\n",
"- get the headers\n",
"- get the data in separate columns(kitchen_items, price, weight)\n",
"- get the colours (fill and font)\n",
"- combine to create the final DataFrame"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" value | \n",
" internal_value | \n",
" coordinate | \n",
" row | \n",
" column | \n",
" data_type | \n",
" is_date | \n",
" number_format | \n",
" fill | \n",
" font | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Name | \n",
" Name | \n",
" A1 | \n",
" 1 | \n",
" 1 | \n",
" s | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
" 1 | \n",
" Weight | \n",
" Weight | \n",
" B1 | \n",
" 1 | \n",
" 2 | \n",
" s | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
" 2 | \n",
" Price | \n",
" Price | \n",
" C1 | \n",
" 1 | \n",
" 3 | \n",
" s | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
" 3 | \n",
" Knife | \n",
" Knife | \n",
" A2 | \n",
" 2 | \n",
" 1 | \n",
" s | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
" 4 | \n",
" 7 | \n",
" 7 | \n",
" B2 | \n",
" 2 | \n",
" 2 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
" 5 | \n",
" 8 | \n",
" 8 | \n",
" C2 | \n",
" 2 | \n",
" 3 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
" 6 | \n",
" Fork | \n",
" Fork | \n",
" A3 | \n",
" 3 | \n",
" 1 | \n",
" s | \n",
" False | \n",
" General | \n",
" {'patternType': 'solid', 'fgColor': {'rgb': 'F... | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
" 7 | \n",
" 5 | \n",
" 5 | \n",
" B3 | \n",
" 3 | \n",
" 2 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'patternType': 'solid', 'fgColor': {'rgb': 'F... | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
" 8 | \n",
" 6 | \n",
" 6 | \n",
" C3 | \n",
" 3 | \n",
" 3 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'patternType': 'solid', 'fgColor': {'rgb': 'F... | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
" 9 | \n",
" Spoon | \n",
" Spoon | \n",
" A4 | \n",
" 4 | \n",
" 1 | \n",
" s | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
" 10 | \n",
" 3 | \n",
" 3 | \n",
" B4 | \n",
" 4 | \n",
" 2 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
" 11 | \n",
" 4 | \n",
" 4 | \n",
" C4 | \n",
" 4 | \n",
" 3 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'patternType': None, 'fgColor': {'rgb': '0000... | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
" 12 | \n",
" Teaspoon | \n",
" Teaspoon | \n",
" A5 | \n",
" 5 | \n",
" 1 | \n",
" s | \n",
" False | \n",
" General | \n",
" {'patternType': 'solid', 'fgColor': {'rgb': 'F... | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
" 13 | \n",
" 1 | \n",
" 1 | \n",
" B5 | \n",
" 5 | \n",
" 2 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'patternType': 'solid', 'fgColor': {'rgb': 'F... | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
" 14 | \n",
" 2 | \n",
" 2 | \n",
" C5 | \n",
" 5 | \n",
" 3 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'patternType': 'solid', 'fgColor': {'rgb': 'F... | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" value internal_value coordinate row column data_type is_date \\\n",
"0 Name Name A1 1 1 s False \n",
"1 Weight Weight B1 1 2 s False \n",
"2 Price Price C1 1 3 s False \n",
"3 Knife Knife A2 2 1 s False \n",
"4 7 7 B2 2 2 n False \n",
"5 8 8 C2 2 3 n False \n",
"6 Fork Fork A3 3 1 s False \n",
"7 5 5 B3 3 2 n False \n",
"8 6 6 C3 3 3 n False \n",
"9 Spoon Spoon A4 4 1 s False \n",
"10 3 3 B4 4 2 n False \n",
"11 4 4 C4 4 3 n False \n",
"12 Teaspoon Teaspoon A5 5 1 s False \n",
"13 1 1 B5 5 2 n False \n",
"14 2 2 C5 5 3 n False \n",
"\n",
" number_format fill \\\n",
"0 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"1 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"2 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"3 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"4 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"5 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"6 General {'patternType': 'solid', 'fgColor': {'rgb': 'F... \n",
"7 General {'patternType': 'solid', 'fgColor': {'rgb': 'F... \n",
"8 General {'patternType': 'solid', 'fgColor': {'rgb': 'F... \n",
"9 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"10 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"11 General {'patternType': None, 'fgColor': {'rgb': '0000... \n",
"12 General {'patternType': 'solid', 'fgColor': {'rgb': 'F... \n",
"13 General {'patternType': 'solid', 'fgColor': {'rgb': 'F... \n",
"14 General {'patternType': 'solid', 'fgColor': {'rgb': 'F... \n",
"\n",
" font \n",
"0 {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... \n",
"1 {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... \n",
"2 {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... \n",
"3 {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... \n",
"4 {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... \n",
"5 {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... \n",
"6 {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... \n",
"7 {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... \n",
"8 {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... \n",
"9 {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... \n",
"10 {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... \n",
"11 {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... \n",
"12 {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... \n",
"13 {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... \n",
"14 {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... "
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"frame = jn.xlsx_cells(filename, sheetnames=\"combined-highlights\", fill=True, font=True)\n",
"frame = frame.astype({\"row\": np.int8, \"column\": np.int8})\n",
"frame"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"headers:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" value | \n",
" column | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Name | \n",
" 1 | \n",
"
\n",
" \n",
" 1 | \n",
" Weight | \n",
" 2 | \n",
"
\n",
" \n",
" 2 | \n",
" Price | \n",
" 3 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" value column\n",
"0 Name 1\n",
"1 Weight 2\n",
"2 Price 3"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# headers are on the first row\n",
"# we'll rely on the `column` column\n",
"# to align with the weight and price extracts\n",
"headers = frame.loc[frame.row.eq(1), [\"value\", \"column\"]]\n",
"headers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"data:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"['Knife', 'Fork', 'Spoon', 'Teaspoon']\n",
"Length: 4, dtype: object"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# kitchen_items are string datatype\n",
"# they are on rows > 1\n",
"booleans = frame.row.gt(1) & frame.data_type.eq(\"s\")\n",
"kitchen_items = frame.loc[booleans, \"value\"].array\n",
"kitchen_items"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"[\n",
"[np.int8(7), np.int8(8)],\n",
"[np.int8(5), np.int8(6)],\n",
"[np.int8(3), np.int8(4)],\n",
"[np.int8(1), np.int8(2)]\n",
"]\n",
"Shape: (4, 2), dtype: int8"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# price and weight are number datatypes\n",
"# they are on rows > 1\n",
"booleans = frame.row.gt(1) & frame.data_type.eq(\"n\")\n",
"# retrieve column positions:\n",
"columns = frame.loc[booleans, \"column\"]\n",
"numbers = frame.loc[booleans, \"value\"].astype(np.int8).array\n",
"# we know there are only two number columns - weight and price\n",
"# we'll use that knowledge in reshaping:\n",
"numbers = numbers.reshape((-1, 2))\n",
"numbers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"colors:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"fill_colour = frame.fill.str.get(\"fgColor\").str.get(\"rgb\")\n",
"font_colour = frame.font.str.get(\"color\").str.get(\"rgb\")\n",
"# keep only rows aligned with the numbers:\n",
"fill_colour = fill_colour[booleans].array\n",
"font_colour = font_colour[booleans].array\n",
"# we know there are only two number columns\n",
"# this means we only need to jump two steps per colour:\n",
"fill_colour = fill_colour[::2]\n",
"font_colour = font_colour[::2]"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"['00000000', 'FFFFFF00', '00000000', 'FFFFFF00']\n",
"Length: 4, dtype: object"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fill_colour"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"['FF000000', None, 'FFFF0000', 'FFFF0000']\n",
"Length: 4, dtype: object"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"font_colour"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Combine into one DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" kitchen_item | \n",
" Price | \n",
" Weight | \n",
" fill_colour | \n",
" font_colour | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Knife | \n",
" 8 | \n",
" 7 | \n",
" 00000000 | \n",
" FF000000 | \n",
"
\n",
" \n",
" 1 | \n",
" Fork | \n",
" 6 | \n",
" 5 | \n",
" FFFFFF00 | \n",
" None | \n",
"
\n",
" \n",
" 2 | \n",
" Spoon | \n",
" 4 | \n",
" 3 | \n",
" 00000000 | \n",
" FFFF0000 | \n",
"
\n",
" \n",
" 3 | \n",
" Teaspoon | \n",
" 2 | \n",
" 1 | \n",
" FFFFFF00 | \n",
" FFFF0000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" kitchen_item Price Weight fill_colour font_colour\n",
"0 Knife 8 7 00000000 FF000000\n",
"1 Fork 6 5 FFFFFF00 None\n",
"2 Spoon 4 3 00000000 FFFF0000\n",
"3 Teaspoon 2 1 FFFFFF00 FFFF0000"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# filter headers for only rows\n",
"# that align with the weight and price extracts\n",
"headers = headers.loc[headers.column.isin(columns), \"value\"].array\n",
"outcome = pd.DataFrame(numbers, columns=headers)\n",
"outcome = outcome.assign(\n",
" kitchen_item=kitchen_items, fill_colour=fill_colour, font_colour=font_colour\n",
").loc[:, [\"kitchen_item\", \"Price\", \"Weight\", \"fill_colour\", \"font_colour\"]]\n",
"outcome"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that an alternative would be to use the `row` and `column` positions to reshape the data - similar to the solution in [case 2](case-2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### **Case 4 : Hierarchies in Formatting**\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the pic above, notice the differences in the formatting (bold, italic, bold and italic, none). In Duncan's book, he offers a suggestion on what these different levels of hierarchy might mean :
\n",
"- none - good\n",
"- italic - satisfactory\n",
"- bold - poor\n",
"- bold and italic - fail\n",
"
\n",
"Again, the formatting is applied to the entire row."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" value | \n",
" internal_value | \n",
" coordinate | \n",
" row | \n",
" column | \n",
" data_type | \n",
" is_date | \n",
" number_format | \n",
" font | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Name | \n",
" Name | \n",
" A1 | \n",
" 1 | \n",
" 1 | \n",
" s | \n",
" False | \n",
" General | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
" 1 | \n",
" Score | \n",
" Score | \n",
" B1 | \n",
" 1 | \n",
" 2 | \n",
" s | \n",
" False | \n",
" General | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
" 2 | \n",
" Matilda | \n",
" Matilda | \n",
" A2 | \n",
" 2 | \n",
" 1 | \n",
" s | \n",
" False | \n",
" General | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
" 3 | \n",
" 7 | \n",
" 7 | \n",
" B2 | \n",
" 2 | \n",
" 2 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
" 4 | \n",
" Nicholas | \n",
" Nicholas | \n",
" A3 | \n",
" 3 | \n",
" 1 | \n",
" s | \n",
" False | \n",
" General | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
" 5 | \n",
" 5 | \n",
" 5 | \n",
" B3 | \n",
" 3 | \n",
" 2 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
" 6 | \n",
" Olivia | \n",
" Olivia | \n",
" A4 | \n",
" 4 | \n",
" 1 | \n",
" s | \n",
" False | \n",
" General | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
" 7 | \n",
" 3 | \n",
" 3 | \n",
" B4 | \n",
" 4 | \n",
" 2 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
" 8 | \n",
" Paul | \n",
" Paul | \n",
" A5 | \n",
" 5 | \n",
" 1 | \n",
" s | \n",
" False | \n",
" General | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
" 9 | \n",
" 1 | \n",
" 1 | \n",
" B5 | \n",
" 5 | \n",
" 2 | \n",
" n | \n",
" False | \n",
" General | \n",
" {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" value internal_value coordinate row column data_type is_date \\\n",
"0 Name Name A1 1 1 s False \n",
"1 Score Score B1 1 2 s False \n",
"2 Matilda Matilda A2 2 1 s False \n",
"3 7 7 B2 2 2 n False \n",
"4 Nicholas Nicholas A3 3 1 s False \n",
"5 5 5 B3 3 2 n False \n",
"6 Olivia Olivia A4 4 1 s False \n",
"7 3 3 B4 4 2 n False \n",
"8 Paul Paul A5 5 1 s False \n",
"9 1 1 B5 5 2 n False \n",
"\n",
" number_format font \n",
"0 General {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... \n",
"1 General {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... \n",
"2 General {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... \n",
"3 General {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... \n",
"4 General {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... \n",
"5 General {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... \n",
"6 General {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... \n",
"7 General {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... \n",
"8 General {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... \n",
"9 General {'name': 'Calibri', 'family': 2.0, 'sz': 11.0,... "
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"frame = jn.xlsx_cells(filename, sheetnames=\"highlight-hierarchy\", font=True)\n",
"frame = frame.astype({'row':np.int8, 'column':np.int8})\n",
"frame"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Steps:\n",
"- get the data in separate columns(score, grade)\n",
"- get the fonts (bold, italics)\n",
"- combine to create the final DataFrame"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"data:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"['Matilda', 'Nicholas', 'Olivia', 'Paul']\n",
"Length: 4, dtype: object"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"booleans = frame.row.gt(1) & frame.data_type.eq('s')\n",
"students = frame.loc[booleans, 'value'].array\n",
"students"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([7, 5, 3, 1], dtype=int8)"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"booleans = frame.row.gt(1) & frame.data_type.eq('n')\n",
"score = frame.loc[booleans, 'value'].array.astype(np.int8)\n",
"score"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"fonts:"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([False, False, True, True])"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bold = frame.font.str.get(\"b\")[booleans].to_numpy()\n",
"bold"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([False, True, False, True])"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"italics = frame.font.str.get(\"i\")[booleans][booleans].to_numpy()\n",
"italics"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['good', 'satisfactory', 'poor', 'fail'], dtype='\n",
"\n",
"\n",
" \n",
" \n",
" | \n",
" name | \n",
" bold | \n",
" italics | \n",
" score | \n",
" grade | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Matilda | \n",
" False | \n",
" False | \n",
" 7 | \n",
" good | \n",
"
\n",
" \n",
" 1 | \n",
" Nicholas | \n",
" False | \n",
" True | \n",
" 5 | \n",
" satisfactory | \n",
"
\n",
" \n",
" 2 | \n",
" Olivia | \n",
" True | \n",
" False | \n",
" 3 | \n",
" poor | \n",
"
\n",
" \n",
" 3 | \n",
" Paul | \n",
" True | \n",
" True | \n",
" 1 | \n",
" fail | \n",
"
\n",
" \n",
"
\n",
""
],
"text/plain": [
" name bold italics score grade\n",
"0 Matilda False False 7 good\n",
"1 Nicholas False True 5 satisfactory\n",
"2 Olivia True False 3 poor\n",
"3 Paul True True 1 fail"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"outcome = {'name':students, 'bold':bold,'italics':italics,'score':score,'grade':grade}\n",
"outcome = pd.DataFrame(outcome)\n",
"outcome"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Comments\n",
""
]
}
],
"metadata": {
"jupytext": {
"formats": "ipynb,Rmd"
},
"kernelspec": {
"display_name": "blogger",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
}
},
"nbformat": 4,
"nbformat_minor": 4
}