{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Data Wrangling with Python Datatable - Row-wise Transformations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### [Link to Source data](https://stackoverflow.com/questions/63025628/how-to-get-the-difference-of-the-max-and-min-of-the-row-and-input-as-series-for/63025709#63025709)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Task: Get the difference between the maximum and minimum values per row for `Value` columns." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IndDepartmentValue1Value2Value3Value4
▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪
01Electronics5432
12Clothing4321
23Grocery3351
\n", " \n", "
\n" ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from datatable import dt, f\n", "\n", "df = dt.Frame({'Ind': [1, 2, 3],\n", " 'Department': ['Electronics', 'Clothing', 'Grocery'],\n", " 'Value1': [5, 4, 3],\n", " 'Value2': [4, 3, 3],\n", " 'Value3': [3, 2, 5],\n", " 'Value4': [2, 1, 1]})\n", "\n", "\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**COMPLETE SOLUTION**" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IndDepartmentValue1Value2Value3Value4difference
▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪
01Electronics54323
12Clothing43213
23Grocery33514
\n", " \n", "
\n" ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "value_columns = [name for name in df.names if \"Value\" in name]\n", "max_min_difference = f[value_columns].rowmax()- f[value_columns].rowmin()\n", "df[:, [f[:], max_min_difference.alias(\"difference\")]]" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IndDepartmentValue1Value2Value3Value4
▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪
01Electronics5432
12Clothing4321
23Grocery3351
\n", " \n", "
\n" ], "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**BREAKDOWN OF SOLUTION**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Step 1 : Filter for columns that start with `Value` and prefix with the `f` symbol" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Value1', 'Value2', 'Value3', 'Value4']" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "value_columns = [name for name in df.names if \"Value\" in name]\n", "value_columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Step 2 : Create an `f-expression` of the difference between the row maximum and row minimum of `value_columns`. Note that there is no execution at this point; the execution of a `f-expression` only occurs within the brackets of a datatable frame." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "FExpr" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "max_min_difference = f[value_columns].rowmax() - f[value_columns].rowmin()\n", "max_min_difference" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Step 3: Apply `max_min_difference` to the datatable frame to get the results" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IndDepartmentValue1Value2Value3Value4difference
▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪
01Electronics54323
12Clothing43213
23Grocery33514
\n", " \n", "
\n" ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[:, [f[:], max_min_difference.alias(\"difference\")]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Resources: \n", "- [update](https://datatable.readthedocs.io/en/latest/api/dt/update.html#)\n", "\n", "- [rowmax](https://datatable.readthedocs.io/en/latest/api/dt/rowmax.html)\n", "\n", "- [rowmin](https://datatable.readthedocs.io/en/latest/api/dt/rowmin.html)\n", "\n", "- [Documentation on row functions](https://datatable.readthedocs.io/en/latest/manual/row_functions.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comments\n", "" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.18" } }, "nbformat": 4, "nbformat_minor": 4 }