Dplyr’s across: Replicating within Pandas
dplyr has the across function, which is meant to make column wise processing easy. Pandas has always supported column-wise operations, and in a relatively simple manner.
This article aims to replicate solutions in the dplyr column-wise operations vignette with Pandas.
Let’s load in the relevant libraries
pandas version : 2.2.2
numpy version : 2.0.2
python version : 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:51:49) [Clang 16.0.6 ]
|
name |
height |
mass |
hair_color |
skin_color |
eye_color |
birth_year |
sex |
gender |
homeworld |
species |
films |
vehicles |
starships |
0 |
Luke Skywalker |
172.0 |
77.0 |
blond |
fair |
blue |
19.0 |
male |
masculine |
Tatooine |
Human |
A New Hope, The Empire Strikes Back, Return of... |
Snowspeeder, Imperial Speeder Bike |
X-wing, Imperial shuttle |
1 |
C-3PO |
167.0 |
75.0 |
NaN |
gold |
yellow |
112.0 |
none |
masculine |
Tatooine |
Droid |
A New Hope, The Empire Strikes Back, Return of... |
NaN |
NaN |
2 |
R2-D2 |
96.0 |
32.0 |
NaN |
white, blue |
red |
33.0 |
none |
masculine |
Naboo |
Droid |
A New Hope, The Empire Strikes Back, Return of... |
NaN |
NaN |
3 |
Darth Vader |
202.0 |
136.0 |
none |
white |
yellow |
41.9 |
male |
masculine |
Tatooine |
Human |
A New Hope, The Empire Strikes Back, Return of... |
NaN |
TIE Advanced x1 |
4 |
Leia Organa |
150.0 |
49.0 |
brown |
light |
brown |
19.0 |
female |
feminine |
Alderaan |
Human |
A New Hope, The Empire Strikes Back, Return of... |
Imperial Speeder Bike |
NaN |
name 87
hair_color 12
skin_color 31
eye_color 15
sex 5
gender 3
homeworld 49
species 38
films 24
vehicles 11
starships 16
dtype: int64
|
sex |
gender |
homeworld |
species |
|
|
|
Droid |
1 |
2 |
3 |
Gungan |
1 |
1 |
1 |
Human |
2 |
2 |
15 |
Kaminoan |
2 |
2 |
1 |
Mirialan |
1 |
1 |
1 |
Twi'lek |
2 |
2 |
1 |
Wookiee |
1 |
1 |
1 |
Zabrak |
1 |
1 |
2 |
Another approach is with the pipe function:
|
sex |
gender |
homeworld |
species |
|
|
|
Droid |
1 |
2 |
3 |
Gungan |
1 |
1 |
1 |
Human |
2 |
2 |
15 |
Kaminoan |
2 |
2 |
1 |
Mirialan |
1 |
1 |
1 |
Twi'lek |
2 |
2 |
1 |
Wookiee |
1 |
1 |
1 |
Zabrak |
1 |
1 |
2 |
|
height |
mass |
birth_year |
homeworld |
|
|
|
Alderaan |
176.333333 |
64.000000 |
43.000000 |
Corellia |
175.000000 |
78.500000 |
25.000000 |
Coruscant |
173.666667 |
50.000000 |
91.000000 |
Kamino |
208.333333 |
83.100000 |
31.500000 |
Kashyyyk |
231.000000 |
124.000000 |
200.000000 |
Mirial |
168.000000 |
53.100000 |
49.000000 |
Naboo |
177.272727 |
64.166667 |
55.000000 |
Ryloth |
179.000000 |
55.000000 |
48.000000 |
Tatooine |
169.800000 |
85.375000 |
54.644444 |
With the pipe function:
|
height |
mass |
birth_year |
homeworld |
|
|
|
Alderaan |
176.333333 |
64.000000 |
43.000000 |
Corellia |
175.000000 |
78.500000 |
25.000000 |
Coruscant |
173.666667 |
50.000000 |
91.000000 |
Kamino |
208.333333 |
83.100000 |
31.500000 |
Kashyyyk |
231.000000 |
124.000000 |
200.000000 |
Mirial |
168.000000 |
53.100000 |
49.000000 |
Naboo |
177.272727 |
64.166667 |
55.000000 |
Ryloth |
179.000000 |
55.000000 |
48.000000 |
Tatooine |
169.800000 |
85.375000 |
54.644444 |
An alternate approach, where the groupby is reused, is with pyjanitor’s get_columns function:
|
height |
mass |
birth_year |
homeworld |
|
|
|
Alderaan |
176.333333 |
64.000000 |
43.000000 |
Corellia |
175.000000 |
78.500000 |
25.000000 |
Coruscant |
173.666667 |
50.000000 |
91.000000 |
Kamino |
208.333333 |
83.100000 |
31.500000 |
Kashyyyk |
231.000000 |
124.000000 |
200.000000 |
Mirial |
168.000000 |
53.100000 |
49.000000 |
Naboo |
177.272727 |
64.166667 |
55.000000 |
Ryloth |
179.000000 |
55.000000 |
48.000000 |
Tatooine |
169.800000 |
85.375000 |
54.644444 |
|
hair_color |
skin_color |
eye_color |
0 |
blond |
fair |
blue |
1 |
NaN |
gold |
yellow |
2 |
NaN |
white, blue |
red |
3 |
none |
white |
yellow |
4 |
brown |
light |
brown |
... |
... |
... |
... |
80 |
none |
pale |
white |
82 |
black |
dark |
dark |
83 |
brown |
light |
hazel |
85 |
none |
none |
black |
86 |
none |
none |
unknown |
67 rows × 3 columns
hair_color skin_color eye_color
brown light brown 6
fair blue 4
none grey black 4
black dark brown 3
blond fair blue 3
..
none dark brown 1
green black 1
auburn, grey fair blue 1
none green red 1
NaN white, red red 1
Name: count, Length: 67, dtype: int64
|
name |
height |
mass |
hair_color |
skin_color |
eye_color |
birth_year |
sex |
gender |
homeworld |
species |
films |
vehicles |
starships |
0 |
Luke Skywalker |
172.0 |
77.0 |
blond |
fair |
blue |
19.0 |
male |
masculine |
Tatooine |
Human |
A New Hope, The Empire Strikes Back, Return of... |
Snowspeeder, Imperial Speeder Bike |
X-wing, Imperial shuttle |
1 |
C-3PO |
167.0 |
75.0 |
NaN |
gold |
yellow |
112.0 |
none |
masculine |
Tatooine |
Droid |
A New Hope, The Empire Strikes Back, Return of... |
NaN |
NaN |
2 |
R2-D2 |
96.0 |
32.0 |
NaN |
white, blue |
red |
33.0 |
none |
masculine |
Naboo |
Droid |
A New Hope, The Empire Strikes Back, Return of... |
NaN |
NaN |
3 |
Darth Vader |
202.0 |
136.0 |
none |
white |
yellow |
41.9 |
male |
masculine |
Tatooine |
Human |
A New Hope, The Empire Strikes Back, Return of... |
NaN |
TIE Advanced x1 |
4 |
Leia Organa |
150.0 |
49.0 |
brown |
light |
brown |
19.0 |
female |
feminine |
Alderaan |
Human |
A New Hope, The Empire Strikes Back, Return of... |
Imperial Speeder Bike |
NaN |
... |
... |
... |
... |
... |
... |
... |
... |
... |
... |
... |
... |
... |
... |
... |
82 |
Finn |
NaN |
NaN |
black |
dark |
dark |
NaN |
male |
masculine |
NaN |
Human |
The Force Awakens |
NaN |
NaN |
83 |
Rey |
NaN |
NaN |
brown |
light |
hazel |
NaN |
female |
feminine |
NaN |
Human |
The Force Awakens |
NaN |
NaN |
84 |
Poe Dameron |
NaN |
NaN |
brown |
light |
brown |
NaN |
male |
masculine |
NaN |
Human |
The Force Awakens |
NaN |
X-wing |
85 |
BB8 |
NaN |
NaN |
none |
none |
black |
NaN |
none |
masculine |
NaN |
Droid |
The Force Awakens |
NaN |
NaN |
86 |
Captain Phasma |
NaN |
NaN |
none |
none |
unknown |
NaN |
female |
feminine |
NaN |
Human |
The Force Awakens |
NaN |
NaN |
87 rows × 14 columns
|
name |
height |
mass |
hair_color |
skin_color |
eye_color |
birth_year |
sex |
gender |
homeworld |
species |
films |
vehicles |
starships |
0 |
Luke Skywalker |
172.0 |
77.0 |
blond |
fair |
blue |
19.0 |
male |
masculine |
Tatooine |
Human |
A New Hope, The Empire Strikes Back, Return of... |
Snowspeeder, Imperial Speeder Bike |
X-wing, Imperial shuttle |
9 |
Obi-Wan Kenobi |
182.0 |
77.0 |
auburn, white |
fair |
blue-gray |
57.0 |
male |
masculine |
Stewjon |
Human |
A New Hope, The Empire Strikes Back, Return of... |
Tribubble bongo |
Jedi starfighter, Trade Federation cruiser, Na... |
10 |
Anakin Skywalker |
188.0 |
84.0 |
blond |
fair |
blue |
41.9 |
male |
masculine |
Tatooine |
Human |
The Phantom Menace, Attack of the Clones, Reve... |
Zephyr-G swoop bike, XJ-6 airspeeder |
Naboo fighter, Trade Federation cruiser, Jedi ... |
12 |
Chewbacca |
228.0 |
112.0 |
brown |
unknown |
blue |
200.0 |
male |
masculine |
Kashyyyk |
Wookiee |
A New Hope, The Empire Strikes Back, Return of... |
AT-ST |
Millennium Falcon, Imperial shuttle |
16 |
Wedge Antilles |
170.0 |
77.0 |
brown |
fair |
hazel |
21.0 |
male |
masculine |
Corellia |
Human |
A New Hope, The Empire Strikes Back, Return of... |
Snowspeeder |
X-wing |
42 |
Darth Maul |
175.0 |
80.0 |
none |
red |
yellow |
54.0 |
male |
masculine |
Dathomir |
Zabrak |
The Phantom Menace |
Sith speeder |
Scimitar |
Output for the above is different from the vignette’s because Pandas treats none as NaN, and dplyr does not (at least in the vignette)
Let’s look at another solution, from Stack Overflow:
|
model |
mpg |
cyl |
disp |
hp |
drat |
wt |
qsec |
vs |
am |
gear |
carb |
0 |
Mazda RX4 |
21.0 |
6 |
160.0 |
110 |
3.90 |
2.620 |
16.46 |
0 |
1 |
4 |
4 |
1 |
Mazda RX4 Wag |
21.0 |
6 |
160.0 |
110 |
3.90 |
2.875 |
17.02 |
0 |
1 |
4 |
4 |
2 |
Datsun 710 |
22.8 |
4 |
108.0 |
93 |
3.85 |
2.320 |
18.61 |
1 |
1 |
4 |
1 |
3 |
Hornet 4 Drive |
21.4 |
6 |
258.0 |
110 |
3.08 |
3.215 |
19.44 |
1 |
0 |
3 |
1 |
4 |
Hornet Sportabout |
18.7 |
8 |
360.0 |
175 |
3.15 |
3.440 |
17.02 |
0 |
0 |
3 |
2 |
|
disp |
hp |
drat |
wt |
cyl |
|
|
|
|
4 |
1156.5 |
909 |
4.070909 |
2.285727 |
6 |
1283.2 |
856 |
3.585714 |
3.117143 |
8 |
4943.4 |
2929 |
3.229286 |
3.999214 |
An alternate approach, where the groupby is reused, is with pyjanitor’s get_columns function:
|
disp |
hp |
drat |
wt |
cyl |
|
|
|
|
4 |
1156.5 |
909 |
4.070909 |
2.285727 |
6 |
1283.2 |
856 |
3.585714 |
3.117143 |
8 |
4943.4 |
2929 |
3.229286 |
3.999214 |
Comments#