{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Data Wrangling with Python Datatable - Conditional Statements" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Transformation based on a Single Condition" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Source data](https://stackoverflow.com/questions/65728964/how-to-apply-a-function-to-a-dataframe-row-based-on-a-condition-and-values-of-an/65729035?noredirect=1#comment116213987_65729035\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Task: if `c` is positive, then value should be `a - b`, else `b - a`." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
abc
▪▪▪▪▪▪▪▪▪▪▪▪
0123
112−3
2232
342−1
\n", " \n", "
\n" ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from datatable import dt, f, update, ifelse\n", "\n", "df = dt.Frame({\"a\": [1, 1, 2, 4], \"b\": [2, 2, 3, 2], \"c\": [3, -3, 2, -1]})\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [ifelse](https://datatable.readthedocs.io/en/latest/api/dt/ifelse.html) function handles conditional transformations. It is similar to python's [if...else](https://docs.python.org/3/tutorial/controlflow.html#if-statements) statements or SQL's [case_when](https://www.postgresql.org/docs/current/functions-conditional.html) function." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**BREAKDOWN OF SOLUTION**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Step 1 : Define the condition, with the `True` and `False` values." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "condition = f.c >= 0 # positive values\n", "true = f.a - f.b\n", "false = f.b - f.a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Step 2 : Create the [ifelse](https://datatable.readthedocs.io/en/latest/api/dt/ifelse.html) expression." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "if_statement = ifelse(condition, true, false)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Step 3: Apply the `if_statement` to the datatable frame to get the results" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
abc
▪▪▪▪▪▪▪▪▪▪▪▪
01−13
111−3
22−12
34−2−1
\n", " \n", "
\n" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[:, update(b=if_statement)]\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Transformation based on Multiple Conditions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Source data](https://stackoverflow.com/a/19913845/7175713)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TypeSet
▪▪▪▪▪▪▪▪
0AZ
1BZ
2BX
3CY
\n", " \n", "
\n" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = dt.Frame({\"Type\": [\"A\", \"B\", \"B\", \"C\"], \"Set\": [\"Z\", \"Z\", \"X\", \"Y\"]})\n", "\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Task: \n", "1. if `Set` is equal to 'Z' and `Type` is equal to 'A' then assign 'yellow' to `color`.\n", "2. If `Set` is equal to 'Z' and `Type` is equal to 'B' then assign 'blue' to `color`.\n", "3. If `Type` is equal to 'B' then assign 'purple' to `color`.\n", "4. Otherwise, assign 'black' to `color`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**BREAKDOWN OF SOLUTION**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Step 1 : Define the conditions, with the `True` and `False` values." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "condition1 = (f.Set == \"Z\") & (f.Type == \"A\")\n", "true1 = \"yellow\"\n", "condition2 = (f.Set == \"Z\") & (f.Type == \"B\")\n", "true2 = \"blue\"\n", "condition3 = f.Type == \"B\"\n", "true3 = \"purple\"\n", "false = \"black\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Step 2 : Create the [ifelse](https://datatable.readthedocs.io/en/latest/api/dt/ifelse.html) expression." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "if_statements = ifelse(condition1, true1, condition2, true2, condition3, true3, false)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Step 3: Apply the `if_statements` to the datatable frame to get the results" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TypeSetcolor
▪▪▪▪▪▪▪▪▪▪▪▪
0AZyellow
1BZblue
2BXpurple
3CYblack
\n", " \n", "
\n" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[:, update(color=if_statements)]\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Resources: \n", "\n", "- [ifelse](https://datatable.readthedocs.io/en/latest/api/dt/ifelse.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comments\n", "" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.18" } }, "nbformat": 4, "nbformat_minor": 4 }