{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data Wrangling with Python Datatable - Row-wise Transformations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### [Link to Source data](https://stackoverflow.com/questions/63025628/how-to-get-the-difference-of-the-max-and-min-of-the-row-and-input-as-series-for/63025709#63025709)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Task: Get the difference between the maximum and minimum values per row for `Value` columns."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" | Ind | Department | Value1 | Value2 | Value3 | Value4 |
\n",
" | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ |
\n",
" \n",
" \n",
" 0 | 1 | Electronics | 5 | 4 | 3 | 2 |
\n",
" 1 | 2 | Clothing | 4 | 3 | 2 | 1 |
\n",
" 2 | 3 | Grocery | 3 | 3 | 5 | 1 |
\n",
" \n",
"
\n",
" \n",
"
\n"
],
"text/plain": [
""
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from datatable import dt, f\n",
"\n",
"df = dt.Frame({'Ind': [1, 2, 3],\n",
" 'Department': ['Electronics', 'Clothing', 'Grocery'],\n",
" 'Value1': [5, 4, 3],\n",
" 'Value2': [4, 3, 3],\n",
" 'Value3': [3, 2, 5],\n",
" 'Value4': [2, 1, 1]})\n",
"\n",
"\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**COMPLETE SOLUTION**"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" | Ind | Department | Value1 | Value2 | Value3 | Value4 | difference |
\n",
" | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ |
\n",
" \n",
" \n",
" 0 | 1 | Electronics | 5 | 4 | 3 | 2 | 3 |
\n",
" 1 | 2 | Clothing | 4 | 3 | 2 | 1 | 3 |
\n",
" 2 | 3 | Grocery | 3 | 3 | 5 | 1 | 4 |
\n",
" \n",
"
\n",
" \n",
"
\n"
],
"text/plain": [
""
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"value_columns = [name for name in df.names if \"Value\" in name]\n",
"max_min_difference = f[value_columns].rowmax()- f[value_columns].rowmin()\n",
"df[:, [f[:], max_min_difference.alias(\"difference\")]]"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" | Ind | Department | Value1 | Value2 | Value3 | Value4 |
\n",
" | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ |
\n",
" \n",
" \n",
" 0 | 1 | Electronics | 5 | 4 | 3 | 2 |
\n",
" 1 | 2 | Clothing | 4 | 3 | 2 | 1 |
\n",
" 2 | 3 | Grocery | 3 | 3 | 5 | 1 |
\n",
" \n",
"
\n",
" \n",
"
\n"
],
"text/plain": [
""
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**BREAKDOWN OF SOLUTION**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Step 1 : Filter for columns that start with `Value` and prefix with the `f` symbol"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['Value1', 'Value2', 'Value3', 'Value4']"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"value_columns = [name for name in df.names if \"Value\" in name]\n",
"value_columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Step 2 : Create an `f-expression` of the difference between the row maximum and row minimum of `value_columns`. Note that there is no execution at this point; the execution of a `f-expression` only occurs within the brackets of a datatable frame."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"FExpr"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"max_min_difference = f[value_columns].rowmax() - f[value_columns].rowmin()\n",
"max_min_difference"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Step 3: Apply `max_min_difference` to the datatable frame to get the results"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" | Ind | Department | Value1 | Value2 | Value3 | Value4 | difference |
\n",
" | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ |
\n",
" \n",
" \n",
" 0 | 1 | Electronics | 5 | 4 | 3 | 2 | 3 |
\n",
" 1 | 2 | Clothing | 4 | 3 | 2 | 1 | 3 |
\n",
" 2 | 3 | Grocery | 3 | 3 | 5 | 1 | 4 |
\n",
" \n",
"
\n",
" \n",
"
\n"
],
"text/plain": [
""
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[:, [f[:], max_min_difference.alias(\"difference\")]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Resources: \n",
"- [update](https://datatable.readthedocs.io/en/latest/api/dt/update.html#)\n",
"\n",
"- [rowmax](https://datatable.readthedocs.io/en/latest/api/dt/rowmax.html)\n",
"\n",
"- [rowmin](https://datatable.readthedocs.io/en/latest/api/dt/rowmin.html)\n",
"\n",
"- [Documentation on row functions](https://datatable.readthedocs.io/en/latest/manual/row_functions.html)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Comments\n",
""
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
}
},
"nbformat": 4,
"nbformat_minor": 4
}