{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Dplyr's across: Replicating within Pandas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[dplyr](https://dplyr.tidyverse.org/index.html) has the [across](https://dplyr.tidyverse.org/reference/across.html) function, which is meant to make column wise processing easy. Pandas has always supported column-wise operations, and in a relatively simple manner. \n", "\n", "This article aims to replicate solutions in the dplyr [column-wise operations](https://dplyr.tidyverse.org/articles/colwise.html) vignette with Pandas." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's load in the relevant libraries" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " pandas version : 2.2.2 \n", " numpy version : 2.0.0 \n", " python version : 3.12.4 | packaged by conda-forge | (main, Jun 17 2024, 10:13:44) [Clang 16.0.6 ]\n" ] } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "import sys\n", "\n", "print(\" pandas version :\", pd.__version__,\"\\n\", \n", " \"numpy version :\", np.__version__, \"\\n\", \n", " \"python version :\", sys.version)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameheightmasshair_colorskin_coloreye_colorbirth_yearsexgenderhomeworldspeciesfilmsvehiclesstarships
0Luke Skywalker172.077.0blondfairblue19.0malemasculineTatooineHumanA New Hope, The Empire Strikes Back, Return of...Snowspeeder, Imperial Speeder BikeX-wing, Imperial shuttle
1C-3PO167.075.0NaNgoldyellow112.0nonemasculineTatooineDroidA New Hope, The Empire Strikes Back, Return of...NaNNaN
2R2-D296.032.0NaNwhite, bluered33.0nonemasculineNabooDroidA New Hope, The Empire Strikes Back, Return of...NaNNaN
3Darth Vader202.0136.0nonewhiteyellow41.9malemasculineTatooineHumanA New Hope, The Empire Strikes Back, Return of...NaNTIE Advanced x1
4Leia Organa150.049.0brownlightbrown19.0femalefeminineAlderaanHumanA New Hope, The Empire Strikes Back, Return of...Imperial Speeder BikeNaN
\n", "
" ], "text/plain": [ " name height mass hair_color skin_color eye_color \\\n", "0 Luke Skywalker 172.0 77.0 blond fair blue \n", "1 C-3PO 167.0 75.0 NaN gold yellow \n", "2 R2-D2 96.0 32.0 NaN white, blue red \n", "3 Darth Vader 202.0 136.0 none white yellow \n", "4 Leia Organa 150.0 49.0 brown light brown \n", "\n", " birth_year sex gender homeworld species \\\n", "0 19.0 male masculine Tatooine Human \n", "1 112.0 none masculine Tatooine Droid \n", "2 33.0 none masculine Naboo Droid \n", "3 41.9 male masculine Tatooine Human \n", "4 19.0 female feminine Alderaan Human \n", "\n", " films \\\n", "0 A New Hope, The Empire Strikes Back, Return of... \n", "1 A New Hope, The Empire Strikes Back, Return of... \n", "2 A New Hope, The Empire Strikes Back, Return of... \n", "3 A New Hope, The Empire Strikes Back, Return of... \n", "4 A New Hope, The Empire Strikes Back, Return of... \n", "\n", " vehicles starships \n", "0 Snowspeeder, Imperial Speeder Bike X-wing, Imperial shuttle \n", "1 NaN NaN \n", "2 NaN NaN \n", "3 NaN TIE Advanced x1 \n", "4 Imperial Speeder Bike NaN " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "url = \"https://raw.githubusercontent.com/tidyverse/dplyr/main/data-raw/starwars.csv\"\n", "starwars = pd.read_csv(url)\n", "starwars.head()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "name 87\n", "hair_color 12\n", "skin_color 31\n", "eye_color 15\n", "sex 5\n", "gender 3\n", "homeworld 49\n", "species 38\n", "films 24\n", "vehicles 11\n", "starships 16\n", "dtype: int64" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# dplyr : \n", "# starwars %>% \n", "# summarise(across(where(is.character), n_distinct))\n", "(starwars\n", ".select_dtypes('object')\n", ".nunique(dropna=False))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# dplyr :\n", "# starwars %>% \n", "# group_by(species) %>% \n", "# filter(n() > 1) %>% \n", "# summarise(across(c(sex, gender, homeworld), n_distinct))" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sexgenderhomeworld
species
Droid123
Gungan111
Human2215
Kaminoan221
Mirialan111
Twi'lek221
Wookiee111
Zabrak112
\n", "
" ], "text/plain": [ " sex gender homeworld\n", "species \n", "Droid 1 2 3\n", "Gungan 1 1 1\n", "Human 2 2 15\n", "Kaminoan 2 2 1\n", "Mirialan 1 1 1\n", "Twi'lek 2 2 1\n", "Wookiee 1 1 1\n", "Zabrak 1 1 2" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grp = starwars.groupby('species')\n", "\n", "(grp[['sex', 'gender', 'homeworld']]\n", " .nunique(dropna=False)\n", " .loc[grp.size().gt(1)]\n", " )" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Another approach is with the [pipe](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.pipe.html) function:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sexgenderhomeworld
species
Droid123
Gungan111
Human2215
Kaminoan221
Mirialan111
Twi'lek221
Wookiee111
Zabrak112
\n", "
" ], "text/plain": [ " sex gender homeworld\n", "species \n", "Droid 1 2 3\n", "Gungan 1 1 1\n", "Human 2 2 15\n", "Kaminoan 2 2 1\n", "Mirialan 1 1 1\n", "Twi'lek 2 2 1\n", "Wookiee 1 1 1\n", "Zabrak 1 1 2" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(starwars\n", ".filter(['species', 'sex', 'gender', 'homeworld'])\n", ".groupby('species')\n", ".pipe(lambda df: df.nunique(dropna = False).loc[df.size().gt(1)])\n", ")" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# dplyr : \n", "# starwars %>% \n", "# group_by(homeworld) %>% \n", "# filter(n() > 1) %>% \n", "# summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
heightmassbirth_year
homeworld
Alderaan176.33333364.00000043.000000
Corellia175.00000078.50000025.000000
Coruscant173.66666750.00000091.000000
Kamino208.33333383.10000031.500000
Kashyyyk231.000000124.000000200.000000
Mirial168.00000053.10000049.000000
Naboo177.27272764.16666755.000000
Ryloth179.00000055.00000048.000000
Tatooine169.80000085.37500054.644444
\n", "
" ], "text/plain": [ " height mass birth_year\n", "homeworld \n", "Alderaan 176.333333 64.000000 43.000000\n", "Corellia 175.000000 78.500000 25.000000\n", "Coruscant 173.666667 50.000000 91.000000\n", "Kamino 208.333333 83.100000 31.500000\n", "Kashyyyk 231.000000 124.000000 200.000000\n", "Mirial 168.000000 53.100000 49.000000\n", "Naboo 177.272727 64.166667 55.000000\n", "Ryloth 179.000000 55.000000 48.000000\n", "Tatooine 169.800000 85.375000 54.644444" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# get the grouper\n", "grp = starwars.set_index('homeworld').select_dtypes('number').groupby('homeworld')\n", "# get the size:\n", "n = grp.size()\n", "# filter averages with the size:\n", "grp.mean().loc[n.gt(1)]" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "With the [pipe](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.pipe.html) function:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
heightmassbirth_year
homeworld
Alderaan176.33333364.00000043.000000
Corellia175.00000078.50000025.000000
Coruscant173.66666750.00000091.000000
Kamino208.33333383.10000031.500000
Kashyyyk231.000000124.000000200.000000
Mirial168.00000053.10000049.000000
Naboo177.27272764.16666755.000000
Ryloth179.00000055.00000048.000000
Tatooine169.80000085.37500054.644444
\n", "
" ], "text/plain": [ " height mass birth_year\n", "homeworld \n", "Alderaan 176.333333 64.000000 43.000000\n", "Corellia 175.000000 78.500000 25.000000\n", "Coruscant 173.666667 50.000000 91.000000\n", "Kamino 208.333333 83.100000 31.500000\n", "Kashyyyk 231.000000 124.000000 200.000000\n", "Mirial 168.000000 53.100000 49.000000\n", "Naboo 177.272727 64.166667 55.000000\n", "Ryloth 179.000000 55.000000 48.000000\n", "Tatooine 169.800000 85.375000 54.644444" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(starwars\n", ".set_index('homeworld')\n", ".select_dtypes('number')\n", ".groupby('homeworld')\n", ".pipe(lambda df: df.mean().loc[df.size().gt(1)])\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An alternate approach, where the groupby is reused, is with [pyjanitor's](https://pyjanitor-devs.github.io/pyjanitor/) [get_columns](https://pyjanitor-devs.github.io/pyjanitor/api/functions/#janitor.functions.utils.get_columns) function:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
heightmassbirth_year
homeworld
Alderaan176.33333364.00000043.000000
Corellia175.00000078.50000025.000000
Coruscant173.66666750.00000091.000000
Kamino208.33333383.10000031.500000
Kashyyyk231.000000124.000000200.000000
Mirial168.00000053.10000049.000000
Naboo177.27272764.16666755.000000
Ryloth179.00000055.00000048.000000
Tatooine169.80000085.37500054.644444
\n", "
" ], "text/plain": [ " height mass birth_year\n", "homeworld \n", "Alderaan 176.333333 64.000000 43.000000\n", "Corellia 175.000000 78.500000 25.000000\n", "Coruscant 173.666667 50.000000 91.000000\n", "Kamino 208.333333 83.100000 31.500000\n", "Kashyyyk 231.000000 124.000000 200.000000\n", "Mirial 168.000000 53.100000 49.000000\n", "Naboo 177.272727 64.166667 55.000000\n", "Ryloth 179.000000 55.000000 48.000000\n", "Tatooine 169.800000 85.375000 54.644444" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from janitor import get_columns\n", "from pandas.api.types import is_numeric_dtype\n", "\n", "grp = starwars.groupby('homeworld')\n", "(grp\n", " .pipe(get_columns, is_numeric_dtype)\n", " .mean()\n", " .loc[grp.size().gt(1)]\n", " )" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
hair_colorskin_coloreye_color
0blondfairblue
1NaNgoldyellow
2NaNwhite, bluered
3nonewhiteyellow
4brownlightbrown
............
80nonepalewhite
82blackdarkdark
83brownlighthazel
85nonenoneblack
86nonenoneunknown
\n", "

67 rows × 3 columns

\n", "
" ], "text/plain": [ " hair_color skin_color eye_color\n", "0 blond fair blue\n", "1 NaN gold yellow\n", "2 NaN white, blue red\n", "3 none white yellow\n", "4 brown light brown\n", ".. ... ... ...\n", "80 none pale white\n", "82 black dark dark\n", "83 brown light hazel\n", "85 none none black\n", "86 none none unknown\n", "\n", "[67 rows x 3 columns]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# dplyr:\n", "# starwars %>% distinct(across(contains(\"color\")))\n", "starwars.filter(like='color').drop_duplicates()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "hair_color skin_color eye_color\n", "brown light brown 6\n", " fair blue 4\n", "none grey black 4\n", "black dark brown 3\n", "blond fair blue 3\n", " ..\n", "none dark brown 1\n", " green black 1\n", "auburn, grey fair blue 1\n", "none green red 1\n", "NaN white, red red 1\n", "Name: count, Length: 67, dtype: int64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# dplyr\n", "# starwars %>% count(across(contains(\"color\")), sort = TRUE)\n", "starwars.filter(like='color').value_counts(dropna = False)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameheightmasshair_colorskin_coloreye_colorbirth_yearsexgenderhomeworldspeciesfilmsvehiclesstarships
0Luke Skywalker172.077.0blondfairblue19.0malemasculineTatooineHumanA New Hope, The Empire Strikes Back, Return of...Snowspeeder, Imperial Speeder BikeX-wing, Imperial shuttle
1C-3PO167.075.0NaNgoldyellow112.0nonemasculineTatooineDroidA New Hope, The Empire Strikes Back, Return of...NaNNaN
2R2-D296.032.0NaNwhite, bluered33.0nonemasculineNabooDroidA New Hope, The Empire Strikes Back, Return of...NaNNaN
3Darth Vader202.0136.0nonewhiteyellow41.9malemasculineTatooineHumanA New Hope, The Empire Strikes Back, Return of...NaNTIE Advanced x1
4Leia Organa150.049.0brownlightbrown19.0femalefeminineAlderaanHumanA New Hope, The Empire Strikes Back, Return of...Imperial Speeder BikeNaN
.............................................
82FinnNaNNaNblackdarkdarkNaNmalemasculineNaNHumanThe Force AwakensNaNNaN
83ReyNaNNaNbrownlighthazelNaNfemalefeminineNaNHumanThe Force AwakensNaNNaN
84Poe DameronNaNNaNbrownlightbrownNaNmalemasculineNaNHumanThe Force AwakensNaNX-wing
85BB8NaNNaNnonenoneblackNaNnonemasculineNaNDroidThe Force AwakensNaNNaN
86Captain PhasmaNaNNaNnonenoneunknownNaNfemalefeminineNaNHumanThe Force AwakensNaNNaN
\n", "

87 rows × 14 columns

\n", "
" ], "text/plain": [ " name height mass hair_color skin_color eye_color \\\n", "0 Luke Skywalker 172.0 77.0 blond fair blue \n", "1 C-3PO 167.0 75.0 NaN gold yellow \n", "2 R2-D2 96.0 32.0 NaN white, blue red \n", "3 Darth Vader 202.0 136.0 none white yellow \n", "4 Leia Organa 150.0 49.0 brown light brown \n", ".. ... ... ... ... ... ... \n", "82 Finn NaN NaN black dark dark \n", "83 Rey NaN NaN brown light hazel \n", "84 Poe Dameron NaN NaN brown light brown \n", "85 BB8 NaN NaN none none black \n", "86 Captain Phasma NaN NaN none none unknown \n", "\n", " birth_year sex gender homeworld species \\\n", "0 19.0 male masculine Tatooine Human \n", "1 112.0 none masculine Tatooine Droid \n", "2 33.0 none masculine Naboo Droid \n", "3 41.9 male masculine Tatooine Human \n", "4 19.0 female feminine Alderaan Human \n", ".. ... ... ... ... ... \n", "82 NaN male masculine NaN Human \n", "83 NaN female feminine NaN Human \n", "84 NaN male masculine NaN Human \n", "85 NaN none masculine NaN Droid \n", "86 NaN female feminine NaN Human \n", "\n", " films \\\n", "0 A New Hope, The Empire Strikes Back, Return of... \n", "1 A New Hope, The Empire Strikes Back, Return of... \n", "2 A New Hope, The Empire Strikes Back, Return of... \n", "3 A New Hope, The Empire Strikes Back, Return of... \n", "4 A New Hope, The Empire Strikes Back, Return of... \n", ".. ... \n", "82 The Force Awakens \n", "83 The Force Awakens \n", "84 The Force Awakens \n", "85 The Force Awakens \n", "86 The Force Awakens \n", "\n", " vehicles starships \n", "0 Snowspeeder, Imperial Speeder Bike X-wing, Imperial shuttle \n", "1 NaN NaN \n", "2 NaN NaN \n", "3 NaN TIE Advanced x1 \n", "4 Imperial Speeder Bike NaN \n", ".. ... ... \n", "82 NaN NaN \n", "83 NaN NaN \n", "84 NaN X-wing \n", "85 NaN NaN \n", "86 NaN NaN \n", "\n", "[87 rows x 14 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# starwars %>% \n", "# filter(if_any(everything(), ~ !is.na(.x)))\n", "starwars.loc[starwars.notna().any(axis=1)]" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameheightmasshair_colorskin_coloreye_colorbirth_yearsexgenderhomeworldspeciesfilmsvehiclesstarships
0Luke Skywalker172.077.0blondfairblue19.0malemasculineTatooineHumanA New Hope, The Empire Strikes Back, Return of...Snowspeeder, Imperial Speeder BikeX-wing, Imperial shuttle
9Obi-Wan Kenobi182.077.0auburn, whitefairblue-gray57.0malemasculineStewjonHumanA New Hope, The Empire Strikes Back, Return of...Tribubble bongoJedi starfighter, Trade Federation cruiser, Na...
10Anakin Skywalker188.084.0blondfairblue41.9malemasculineTatooineHumanThe Phantom Menace, Attack of the Clones, Reve...Zephyr-G swoop bike, XJ-6 airspeederNaboo fighter, Trade Federation cruiser, Jedi ...
12Chewbacca228.0112.0brownunknownblue200.0malemasculineKashyyykWookieeA New Hope, The Empire Strikes Back, Return of...AT-STMillennium Falcon, Imperial shuttle
16Wedge Antilles170.077.0brownfairhazel21.0malemasculineCorelliaHumanA New Hope, The Empire Strikes Back, Return of...SnowspeederX-wing
42Darth Maul175.080.0noneredyellow54.0malemasculineDathomirZabrakThe Phantom MenaceSith speederScimitar
\n", "
" ], "text/plain": [ " name height mass hair_color skin_color eye_color \\\n", "0 Luke Skywalker 172.0 77.0 blond fair blue \n", "9 Obi-Wan Kenobi 182.0 77.0 auburn, white fair blue-gray \n", "10 Anakin Skywalker 188.0 84.0 blond fair blue \n", "12 Chewbacca 228.0 112.0 brown unknown blue \n", "16 Wedge Antilles 170.0 77.0 brown fair hazel \n", "42 Darth Maul 175.0 80.0 none red yellow \n", "\n", " birth_year sex gender homeworld species \\\n", "0 19.0 male masculine Tatooine Human \n", "9 57.0 male masculine Stewjon Human \n", "10 41.9 male masculine Tatooine Human \n", "12 200.0 male masculine Kashyyyk Wookiee \n", "16 21.0 male masculine Corellia Human \n", "42 54.0 male masculine Dathomir Zabrak \n", "\n", " films \\\n", "0 A New Hope, The Empire Strikes Back, Return of... \n", "9 A New Hope, The Empire Strikes Back, Return of... \n", "10 The Phantom Menace, Attack of the Clones, Reve... \n", "12 A New Hope, The Empire Strikes Back, Return of... \n", "16 A New Hope, The Empire Strikes Back, Return of... \n", "42 The Phantom Menace \n", "\n", " vehicles \\\n", "0 Snowspeeder, Imperial Speeder Bike \n", "9 Tribubble bongo \n", "10 Zephyr-G swoop bike, XJ-6 airspeeder \n", "12 AT-ST \n", "16 Snowspeeder \n", "42 Sith speeder \n", "\n", " starships \n", "0 X-wing, Imperial shuttle \n", "9 Jedi starfighter, Trade Federation cruiser, Na... \n", "10 Naboo fighter, Trade Federation cruiser, Jedi ... \n", "12 Millennium Falcon, Imperial shuttle \n", "16 X-wing \n", "42 Scimitar " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# starwars %>% \n", "# filter(if_any(everything(), ~ !is.na(.x)))\n", "starwars.loc[starwars.notna().all(axis=1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Output for the above is different from the vignette's because Pandas treats none as NaN, and dplyr does not (at least in the vignette)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at another solution, from [Stack Overflow](https://stackoverflow.com/questions/63200530/python-pandas-equivalent-to-dplyr-1-0-0-summarizeacross):" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
modelmpgcyldisphpdratwtqsecvsamgearcarb
0Mazda RX421.06160.01103.902.62016.460144
1Mazda RX4 Wag21.06160.01103.902.87517.020144
2Datsun 71022.84108.0933.852.32018.611141
3Hornet 4 Drive21.46258.01103.083.21519.441031
4Hornet Sportabout18.78360.01753.153.44017.020032
\n", "
" ], "text/plain": [ " model mpg cyl disp hp drat wt qsec vs am gear \\\n", "0 Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 \n", "1 Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 \n", "2 Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 \n", "3 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 \n", "4 Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 \n", "\n", " carb \n", "0 4 \n", "1 4 \n", "2 1 \n", "3 1 \n", "4 2 " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "url = \"https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv\"\n", "cars = pd.read_csv(url)\n", "cars.head()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
disphpdratwt
cyl
41156.59094.0709092.285727
61283.28563.5857143.117143
84943.429293.2292863.999214
\n", "
" ], "text/plain": [ " disp hp drat wt\n", "cyl \n", "4 1156.5 909 4.070909 2.285727\n", "6 1283.2 856 3.585714 3.117143\n", "8 4943.4 2929 3.229286 3.999214" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# dplyr:\n", "# dat <- group_by(mtcars, cyl) \n", "# summarize(dat, across(ends_with('p'), sum), across(ends_with('t'), mean))\n", "aggs = {key:\"mean\" if key.endswith(\"t\") else \"sum\" for key in cars if key.endswith(('p', 't'))}\n", "cars.groupby('cyl').agg(aggs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An alternate approach, where the groupby is reused, is with [pyjanitor's](https://pyjanitor-devs.github.io/pyjanitor/) [get_columns](https://pyjanitor-devs.github.io/pyjanitor/api/functions/#janitor.functions.utils.get_columns) function:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
disphpdratwt
cyl
41156.59094.0709092.285727
61283.28563.5857143.117143
84943.429293.2292863.999214
\n", "
" ], "text/plain": [ " disp hp drat wt\n", "cyl \n", "4 1156.5 909 4.070909 2.285727\n", "6 1283.2 856 3.585714 3.117143\n", "8 4943.4 2929 3.229286 3.999214" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grp = cars.groupby('cyl')\n", "summary = [get_columns(grp, '*p').sum(), get_columns(grp, '*t').mean()]\n", "pd.concat(summary, axis = 1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comments\n", "" ] } ], "metadata": { "jupytext": { "formats": "ipynb,md" }, "kernelspec": { "display_name": "blogger", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.14" } }, "nbformat": 4, "nbformat_minor": 4 }