Data Wrangling with Python Datatable - Row-wise Transformations#
Link to Source data#
Task: Get the difference between the maximum and minimum values per row for Value
columns.
from datatable import dt, f
df = dt.Frame({'Ind': [1, 2, 3],
'Department': ['Electronics', 'Clothing', 'Grocery'],
'Value1': [5, 4, 3],
'Value2': [4, 3, 3],
'Value3': [3, 2, 5],
'Value4': [2, 1, 1]})
df
Ind | Department | Value1 | Value2 | Value3 | Value4 | |
---|---|---|---|---|---|---|
▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | |
0 | 1 | Electronics | 5 | 4 | 3 | 2 |
1 | 2 | Clothing | 4 | 3 | 2 | 1 |
2 | 3 | Grocery | 3 | 3 | 5 | 1 |
COMPLETE SOLUTION
value_columns = [name for name in df.names if "Value" in name]
max_min_difference = f[value_columns].rowmax()- f[value_columns].rowmin()
df[:, [f[:], max_min_difference.alias("difference")]]
Ind | Department | Value1 | Value2 | Value3 | Value4 | difference | |
---|---|---|---|---|---|---|---|
▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | |
0 | 1 | Electronics | 5 | 4 | 3 | 2 | 3 |
1 | 2 | Clothing | 4 | 3 | 2 | 1 | 3 |
2 | 3 | Grocery | 3 | 3 | 5 | 1 | 4 |
df
Ind | Department | Value1 | Value2 | Value3 | Value4 | |
---|---|---|---|---|---|---|
▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | |
0 | 1 | Electronics | 5 | 4 | 3 | 2 |
1 | 2 | Clothing | 4 | 3 | 2 | 1 |
2 | 3 | Grocery | 3 | 3 | 5 | 1 |
BREAKDOWN OF SOLUTION
Step 1 : Filter for columns that start with
Value
and prefix with thef
symbol
value_columns = [name for name in df.names if "Value" in name]
value_columns
['Value1', 'Value2', 'Value3', 'Value4']
Step 2 : Create an
f-expression
of the difference between the row maximum and row minimum ofvalue_columns
. Note that there is no execution at this point; the execution of af-expression
only occurs within the brackets of a datatable frame.
max_min_difference = f[value_columns].rowmax() - f[value_columns].rowmin()
max_min_difference
FExpr<rowmax(f[['Value1', 'Value2', 'Value3', 'Value4']]) - rowmin(f[['Value1', 'Value2', 'Value3', 'Value4']])>
Step 3: Apply
max_min_difference
to the datatable frame to get the results
df[:, [f[:], max_min_difference.alias("difference")]]
Ind | Department | Value1 | Value2 | Value3 | Value4 | difference | |
---|---|---|---|---|---|---|---|
▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | |
0 | 1 | Electronics | 5 | 4 | 3 | 2 | 3 |
1 | 2 | Clothing | 4 | 3 | 2 | 1 | 3 |
2 | 3 | Grocery | 3 | 3 | 5 | 1 | 4 |
Resources:
Comments#