Data Wrangling with Python Datatable - Selecting Columns
This article highlights various ways to select columns in Python datatable. The examples used here are based on the excellent article by Susan Baert.
The data file can be accessed here.
Selecting Columns
The Basics
from datatable import dt, f, ltype, stype
import re
file_path='Data_files/msleep.txt'
DT = dt.fread(file_path)
DT.head(5)
| | name | genus | vore | order | conservation | sleep_total | sleep_rem | sleep_cycle | awake | brainwt | bodywt |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Cheetah | Acinonyx | carni | Carnivora | lc | 12.1 | NA | NA | 11.9 | NA | 50 |
1 | Owl monkey | Aotus | omni | Primates | NA | 17 | 1.8 | NA | 7 | 0.0155 | 0.48 |
2 | Mountain beaver | Aplodontia | herbi | Rodentia | nt | 14.4 | 2.4 | NA | 9.6 | NA | 1.35 |
3 | Greater short-tailed shrew | Blarina | omni | Soricomorpha | lc | 14.9 | 2.3 | 0.133333 | 9.1 | 0.00029 | 0.019 |
4 | Cow | Bos | herbi | Artiodactyla | domesticated | 4 | 0.7 | 0.666667 | 20 | 0.423 | 600 |
You can select columns by name or position in the j section:
DT[:, 'genus'].head(5)
| | genus |
---|---|
0 | Acinonyx |
1 | Aotus |
2 | Aplodontia |
3 | Blarina |
4 | Bos |
DT[:, 1].head()
| | genus |
---|---|
0 | Acinonyx |
1 | Aotus |
2 | Aplodontia |
3 | Blarina |
4 | Bos |
5 | Bradypus |
6 | Callorhinus |
7 | Calomys |
8 | Canis |
9 | Capreolus |
DT[:, -10].head()
| | genus |
---|---|
0 | Acinonyx |
1 | Aotus |
2 | Aplodontia |
3 | Blarina |
4 | Bos |
5 | Bradypus |
6 | Callorhinus |
7 | Calomys |
8 | Canis |
9 | Capreolus |
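To check which column a position refers to, you can index the frame's names. A minimal sketch, assuming the msleep column names printed above; in a live session DT.names should return them as a tuple, so the hard-coded msleep_names tuple here is only a stand-in:

```python
# Column names of the msleep frame, copied from the DT.head(5) output above.
# In a live session, DT.names returns these as a tuple.
msleep_names = (
    "name", "genus", "vore", "order", "conservation",
    "sleep_total", "sleep_rem", "sleep_cycle", "awake", "brainwt", "bodywt",
)

# Positions follow Python's indexing rules: with 11 columns,
# position -10 is the same column as position 11 - 10 = 1.
print(msleep_names[1])                       # genus
print(msleep_names[-10])                     # genus
print(msleep_names[len(msleep_names) - 10])  # genus
```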
If you are selecting a single column, you can pass it directly into the brackets without specifying the i section:
DT['genus'].head(5)
| | genus |
---|---|
0 | Acinonyx |
1 | Aotus |
2 | Aplodontia |
3 | Blarina |
4 | Bos |
For most of the rest of this article, I will focus on column selection by name.
You can select columns by passing a list/tuple of the column names:
columns_to_select = ["name", "genus", "sleep_total", "awake"]
DT[:, columns_to_select].head(5)
| | name | genus | sleep_total | awake |
---|---|---|---|---|
0 | Cheetah | Acinonyx | 12.1 | 11.9 |
1 | Owl monkey | Aotus | 17 | 7 |
2 | Mountain beaver | Aplodontia | 14.4 | 9.6 |
3 | Greater short-tailed shrew | Blarina | 14.9 | 9.1 |
4 | Cow | Bos | 4 | 20 |
You can pass a list/tuple of booleans, one per column in the frame's column order; the columns marked True are selected:
columns_to_select = [True, True, False, False, False, True, False, True, True, False, False]
DT[:, columns_to_select].head(5)
| | name | genus | sleep_total | sleep_cycle | awake |
---|---|---|---|---|---|
0 | Cheetah | Acinonyx | 12.1 | NA | 11.9 |
1 | Owl monkey | Aotus | 17 | NA | 7 |
2 | Mountain beaver | Aplodontia | 14.4 | NA | 9.6 |
3 | Greater short-tailed shrew | Blarina | 14.9 | 0.133333 | 9.1 |
4 | Cow | Bos | 4 | 0.666667 | 20 |
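Writing the boolean list by hand is error-prone, because it must contain exactly one entry per column. A minimal sketch of a hypothetical helper, mask_from_names, that builds the list from the names you want; msleep_names is the header copied from above, and in a live session you would pass DT.names and hand the result to DT[:, ...]:

```python
def mask_from_names(all_names, wanted):
    """Return one boolean per column: True if the column name is in `wanted`.

    Hypothetical helper for illustration; raises KeyError for unknown names.
    """
    missing = set(wanted) - set(all_names)
    if missing:
        raise KeyError(f"unknown columns: {sorted(missing)}")
    wanted = set(wanted)
    return [name in wanted for name in all_names]


msleep_names = (
    "name", "genus", "vore", "order", "conservation",
    "sleep_total", "sleep_rem", "sleep_cycle", "awake", "brainwt", "bodywt",
)

mask = mask_from_names(
    msleep_names, ["name", "genus", "sleep_total", "sleep_cycle", "awake"]
)
print(mask)
# In a live session: DT[:, mask_from_names(DT.names, [...])]
```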
You can select chunks of columns using Python’s built-in slice object or the start:end shortcut. Unlike integer slices, name-based slices include both the start and the end column:
DT[:, slice("name", "order")].head(5)
| | name | genus | vore | order |
---|---|---|---|---|
0 | Cheetah | Acinonyx | carni | Carnivora |
1 | Owl monkey | Aotus | omni | Primates |
2 | Mountain beaver | Aplodontia | herbi | Rodentia |
3 | Greater short-tailed shrew | Blarina | omni | Soricomorpha |
4 | Cow | Bos | herbi | Artiodactyla |
DT[:, "name" : "order"].head(5)
| | name | genus | vore | order |
---|---|---|---|---|
0 | Cheetah | Acinonyx | carni | Carnivora |
1 | Owl monkey | Aotus | omni | Primates |
2 | Mountain beaver | Aplodontia | herbi | Rodentia |
3 | Greater short-tailed shrew | Blarina | omni | Soricomorpha |
4 | Cow | Bos | herbi | Artiodactyla |
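To see why "name":"order" includes order, here is a minimal pure-Python sketch of inclusive, name-based slicing over a sequence of names. names_between is a hypothetical helper for illustration, not part of datatable:

```python
def names_between(all_names, start, end):
    """Return the names from `start` to `end`, both ends included."""
    names = list(all_names)
    i, j = names.index(start), names.index(end)
    # +1 because the end column is included, unlike an integer slice
    return names[i:j + 1]


msleep_names = (
    "name", "genus", "vore", "order", "conservation",
    "sleep_total", "sleep_rem", "sleep_cycle", "awake", "brainwt", "bodywt",
)

print(names_between(msleep_names, "name", "order"))
# ['name', 'genus', 'vore', 'order']
print(msleep_names[0:3])  # integer slice, end excluded
# ('name', 'genus', 'vore')
```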
Multiple chunks can be selected by passing a list of slices:
columns_to_select = [slice("name", "order"), slice("sleep_total", "sleep_cycle")]
DT[:, columns_to_select].head(5)
| | name | genus | vore | order | sleep_total | sleep_rem | sleep_cycle |
---|---|---|---|---|---|---|---|
0 | Cheetah | Acinonyx | carni | Carnivora | 12.1 | NA | NA |
1 | Owl monkey | Aotus | omni | Primates | 17 | 1.8 | NA |
2 | Mountain beaver | Aplodontia | herbi | Rodentia | 14.4 | 2.4 | NA |
3 | Greater short-tailed shrew | Blarina | omni | Soricomorpha | 14.9 | 2.3 | 0.133333 |
4 | Cow | Bos | herbi | Artiodactyla | 4 | 0.7 | 0.666667 |
To use the shortcut notation for multiple chunks, wrap the slices in datatable’s f symbol:
columns_to_select = [f["name" : "order", "sleep_total" : "sleep_cycle"]]
DT[:, columns_to_select].head(5)
| | name | genus | vore | order | sleep_total | sleep_rem | sleep_cycle |
---|---|---|---|---|---|---|---|
0 | Cheetah | Acinonyx | carni | Carnivora | 12.1 | NA | NA |
1 | Owl monkey | Aotus | omni | Primates | 17 | 1.8 | NA |
2 | Mountain beaver | Aplodontia | herbi | Rodentia | 14.4 | 2.4 | NA |
3 | Greater short-tailed shrew | Blarina | omni | Soricomorpha | 14.9 | 2.3 | 0.133333 |
4 | Cow | Bos | herbi | Artiodactyla | 4 | 0.7 | 0.666667 |
To deselect (drop) columns, you can use the remove() method on f[:], which selects all columns:
columns_to_remove = [f["sleep_total" : "awake", "conservation"]]
DT[:, f[:].remove(columns_to_remove)].head(5)
| | name | genus | vore | order | brainwt | bodywt |
---|---|---|---|---|---|---|
0 | Cheetah | Acinonyx | carni | Carnivora | NA | 50 |
1 | Owl monkey | Aotus | omni | Primates | 0.0155 | 0.48 |
2 | Mountain beaver | Aplodontia | herbi | Rodentia | NA | 1.35 |
3 | Greater short-tailed shrew | Blarina | omni | Soricomorpha | 0.00029 | 0.019 |
4 | Cow | Bos | herbi | Artiodactyla | 0.423 | 600 |
You can deselect a whole chunk and then add a column back; this chains the remove() and extend() methods:
DT[:, f[:].remove(f["name" : "awake"]).extend(f["conservation"])].head(5)
| | brainwt | bodywt | conservation |
---|---|---|---|
0 | NA | 50 | lc |
1 | 0.0155 | 0.48 | NA |
2 | NA | 1.35 | nt |
3 | 0.00029 | 0.019 | lc |
4 | 0.423 | 600 | domesticated |
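At the level of column names, the remove-then-extend expression above behaves like filtering a list and appending to it. A minimal pure-Python sketch of that bookkeeping, assuming the msleep column order shown earlier; it reproduces the column order of the output above but is only an illustration, not datatable's implementation:

```python
msleep_names = (
    "name", "genus", "vore", "order", "conservation",
    "sleep_total", "sleep_rem", "sleep_cycle", "awake", "brainwt", "bodywt",
)

# f["name":"awake"] covers every column from name to awake, inclusive
start, end = msleep_names.index("name"), msleep_names.index("awake")
dropped = set(msleep_names[start:end + 1])

# remove(): keep the columns that were not dropped, in their original order
kept = [name for name in msleep_names if name not in dropped]
# extend(): append the re-added column at the end
result = kept + ["conservation"]
print(result)  # ['brainwt', 'bodywt', 'conservation']
```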
Selecting Columns based on Partial Names
You can use Python’s string methods to select columns whose names partially match a pattern:
columns_to_select = [name.startswith("sleep") for name in DT.names]
DT[:, columns_to_select].head(5)
| | sleep_total | sleep_rem | sleep_cycle |
---|---|---|---|
0 | 12.1 | NA | NA |
1 | 17 | 1.8 | NA |
2 | 14.4 | 2.4 | NA |
3 | 14.9 | 2.3 | 0.133333 |
4 | 4 | 0.7 | 0.666667 |
columns_to_select = ["eep" in name or name.endswith("wt") for name in DT.names]
DT[:, columns_to_select].head(5)
| | sleep_total | sleep_rem | sleep_cycle | brainwt | bodywt |
---|---|---|---|---|---|
0 | 12.1 | NA | NA | NA | 50 |
1 | 17 | 1.8 | NA | 0.0155 | 0.48 |
2 | 14.4 | 2.4 | NA | NA | 1.35 |
3 | 14.9 | 2.3 | 0.133333 | 0.00029 | 0.019 |
4 | 4 | 0.7 | 0.666667 | 0.423 | 600 |
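If you prefer shell-style wildcards over chained string methods, Python's standard-library fnmatch module can do the matching. This swaps in a different technique from the one above; the resulting list of names would then be passed to DT[:, ...]. A minimal sketch using the msleep names copied from above, with the hypothetical helper names_matching; fnmatchcase is used so matching is case-sensitive on every platform:

```python
from fnmatch import fnmatchcase

msleep_names = (
    "name", "genus", "vore", "order", "conservation",
    "sleep_total", "sleep_rem", "sleep_cycle", "awake", "brainwt", "bodywt",
)


def names_matching(all_names, *patterns):
    """Return the names matching any shell-style pattern, in column order."""
    return [name for name in all_names
            if any(fnmatchcase(name, pattern) for pattern in patterns)]


print(names_matching(msleep_names, "sleep*"))
# ['sleep_total', 'sleep_rem', 'sleep_cycle']
print(names_matching(msleep_names, "*eep*", "*wt"))
# ['sleep_total', 'sleep_rem', 'sleep_cycle', 'brainwt', 'bodywt']
```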
Selecting Columns based on Regex
Python’s re module can be used to select columns based on a regular expression:
# this returns a list of booleans: True where the regex matches the column name
columns_to_select = [bool(re.search(r"o.+er", name)) for name in DT.names]
DT[:, columns_to_select].head(5)
| | order | conservation |
---|---|---|
0 | Carnivora | lc |
1 | Primates | NA |
2 | Rodentia | nt |
3 | Soricomorpha | lc |
4 | Artiodactyla | domesticated |
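re.search matches anywhere in the name, which is why conservation is picked up alongside order. When the pattern should cover the whole name, re.fullmatch is stricter. A minimal sketch comparing the two on the msleep names copied from above, with the pattern compiled once:

```python
import re

msleep_names = (
    "name", "genus", "vore", "order", "conservation",
    "sleep_total", "sleep_rem", "sleep_cycle", "awake", "brainwt", "bodywt",
)
pattern = re.compile(r"o.+er")

# search: the pattern may match any part of the name
anywhere = [name for name in msleep_names if pattern.search(name)]
# fullmatch: the pattern must match the entire name
whole = [name for name in msleep_names if pattern.fullmatch(name)]

print(anywhere)  # ['order', 'conservation']
print(whole)     # ['order']
```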
Selecting columns by their data type
You can pass a data type in the j section:
DT[:, str].head(5)
| | name | genus | vore | order | conservation |
---|---|---|---|---|---|
0 | Cheetah | Acinonyx | carni | Carnivora | lc |
1 | Owl monkey | Aotus | omni | Primates | NA |
2 | Mountain beaver | Aplodontia | herbi | Rodentia | nt |
3 | Greater short-tailed shrew | Blarina | omni | Soricomorpha | lc |
4 | Cow | Bos | herbi | Artiodactyla | domesticated |
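You can pass a list of data types. All numeric columns in this dataset are stored as float64, so only float columns are returned here: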
DT[:, [int, float]].head(5)
| | sleep_total | sleep_rem | sleep_cycle | awake | brainwt | bodywt |
---|---|---|---|---|---|---|
0 | 12.1 | NA | NA | 11.9 | NA | 50 |
1 | 17 | 1.8 | NA | 7 | 0.0155 | 0.48 |
2 | 14.4 | 2.4 | NA | 9.6 | NA | 1.35 |
3 | 14.9 | 2.3 | 0.133333 | 9.1 | 0.00029 | 0.019 |
4 | 4 | 0.7 | 0.666667 | 20 | 0.423 | 600 |
You can also pass datatable’s stype (storage type, such as float64) or ltype (logical type, such as str) data types:
DT[:, ltype.str].head(5)
| | name | genus | vore | order | conservation |
---|---|---|---|---|---|
0 | Cheetah | Acinonyx | carni | Carnivora | lc |
1 | Owl monkey | Aotus | omni | Primates | NA |
2 | Mountain beaver | Aplodontia | herbi | Rodentia | nt |
3 | Greater short-tailed shrew | Blarina | omni | Soricomorpha | lc |
4 | Cow | Bos | herbi | Artiodactyla | domesticated |
DT[:, stype.float64].head(5)
| | sleep_total | sleep_rem | sleep_cycle | awake | brainwt | bodywt |
---|---|---|---|---|---|---|
0 | 12.1 | NA | NA | 11.9 | NA | 50 |
1 | 17 | 1.8 | NA | 7 | 0.0155 | 0.48 |
2 | 14.4 | 2.4 | NA | 9.6 | NA | 1.35 |
3 | 14.9 | 2.3 | 0.133333 | 9.1 | 0.00029 | 0.019 |
4 | 4 | 0.7 | 0.666667 | 20 | 0.423 | 600 |
You can remove columns based on their data type:
columns_to_remove = [f[int, float]]
DT[:, f[:].remove(columns_to_remove)].head(5)
| | name | genus | vore | order | conservation |
---|---|---|---|---|---|
0 | Cheetah | Acinonyx | carni | Carnivora | lc |
1 | Owl monkey | Aotus | omni | Primates | NA |
2 | Mountain beaver | Aplodontia | herbi | Rodentia | nt |
3 | Greater short-tailed shrew | Blarina | omni | Soricomorpha | lc |
4 | Cow | Bos | herbi | Artiodactyla | domesticated |
An alternative is to preselect the columns you intend to keep:
# creates a list of booleans: True for every column whose logical type is neither int nor real
columns_to_select = [
    col_type not in (ltype.int, ltype.real)
    for col_type in DT.ltypes
]
DT[:, columns_to_select].head(5)
| | name | genus | vore | order | conservation |
---|---|---|---|---|---|
0 | Cheetah | Acinonyx | carni | Carnivora | lc |
1 | Owl monkey | Aotus | omni | Primates | NA |
2 | Mountain beaver | Aplodontia | herbi | Rodentia | nt |
3 | Greater short-tailed shrew | Blarina | omni | Soricomorpha | lc |
4 | Cow | Bos | herbi | Artiodactyla | domesticated |
You could also iterate through the frame and check each column’s type, before recombining with cbind:
matching_frames = [frame for frame in DT if frame.ltypes[0] not in (ltype.real, ltype.int)]
dt.cbind(matching_frames).head(5)
| | name | genus | vore | order | conservation |
---|---|---|---|---|---|
0 | Cheetah | Acinonyx | carni | Carnivora | lc |
1 | Owl monkey | Aotus | omni | Primates | NA |
2 | Mountain beaver | Aplodontia | herbi | Rodentia | nt |
3 | Greater short-tailed shrew | Blarina | omni | Soricomorpha | lc |
4 | Cow | Bos | herbi | Artiodactyla | domesticated |
Iterating over a frame yields each column as a single-column frame, which is what makes the list comprehension above possible.
You could also pass the matching frames to the j section of DT:
DT[:, matching_frames].head(5)
| | name | genus | vore | order | conservation |
---|---|---|---|---|---|
0 | Cheetah | Acinonyx | carni | Carnivora | lc |
1 | Owl monkey | Aotus | omni | Primates | NA |
2 | Mountain beaver | Aplodontia | herbi | Rodentia | nt |
3 | Greater short-tailed shrew | Blarina | omni | Soricomorpha | lc |
4 | Cow | Bos | herbi | Artiodactyla | domesticated |
Selecting columns by logical expressions
The ideas from the previous section allow for more flexible column selection.
Say we wish to select columns that are numeric, and have a mean greater than 10:
# returns a list of booleans
columns_to_select = [
    # col_type avoids shadowing the imported ltype class
    col_type in (ltype.real, ltype.int) and DT[name].mean()[0, 0] > 10
    for name, col_type in zip(DT.names, DT.ltypes)
]
DT[:, columns_to_select].head(5)
| | sleep_total | awake | bodywt |
---|---|---|---|
0 | 12.1 | 11.9 | 50 |
1 | 17 | 7 | 0.48 |
2 | 14.4 | 9.6 | 1.35 |
3 | 14.9 | 9.1 | 0.019 |
4 | 4 | 20 | 600 |
The code above preselects the columns before passing them to datatable. Note the use of [0, 0] to pull the scalar out of the one-row, one-column frame that mean() returns; this allows the comparison with the scalar value 10.
Alternatively, instead of a list of booleans, the list comprehension could return the column names:
columns_to_select = [
    name
    for name, col_type in zip(DT.names, DT.ltypes)
    if col_type in (ltype.real, ltype.int) and DT[name].mean()[0, 0] > 10
]
DT[:, columns_to_select].head(5)
| | sleep_total | awake | bodywt |
---|---|---|---|
0 | 12.1 | 11.9 | 50 |
1 | 17 | 7 | 0.48 |
2 | 14.4 | 9.6 | 1.35 |
3 | 14.9 | 9.1 | 0.019 |
4 | 4 | 20 | 600 |
You could also iterate through the frame in a list comprehension and check each column, before recombining with cbind:
matching_frames = [frame for frame in DT
                   if frame.ltypes[0] in (ltype.int, ltype.real)
                   and frame.mean()[0, 0] > 10]
dt.cbind(matching_frames).head(5)
| | sleep_total | awake | bodywt |
---|---|---|---|
0 | 12.1 | 11.9 | 50 |
1 | 17 | 7 | 0.48 |
2 | 14.4 | 9.6 | 1.35 |
3 | 14.9 | 9.1 | 0.019 |
4 | 4 | 20 | 600 |
Instead of recombining with cbind, you could pass matching_frames to the j section:
DT[:, matching_frames].head(5)
| | sleep_total | awake | bodywt |
---|---|---|---|
0 | 12.1 | 11.9 | 50 |
1 | 17 | 7 | 0.48 |
2 | 14.4 | 9.6 | 1.35 |
3 | 14.9 | 9.1 | 0.019 |
4 | 4 | 20 | 600 |
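The same filter can be sketched in plain Python to make the NA handling explicit: missing values are skipped before averaging, which is what datatable's mean() does with NA. The sketch assumes a dict holding a subset of columns and only the first five rows shown above, with None standing in for NA, so its means differ from the full-data means datatable computes; the selected columns happen to be the same here. is_numeric and mean_skip_missing are hypothetical helpers:

```python
# First five rows of some msleep columns shown above; None stands in for NA.
columns = {
    "name": ["Cheetah", "Owl monkey", "Mountain beaver",
             "Greater short-tailed shrew", "Cow"],
    "sleep_total": [12.1, 17, 14.4, 14.9, 4],
    "sleep_rem": [None, 1.8, 2.4, 2.3, 0.7],
    "sleep_cycle": [None, None, None, 0.133333, 0.666667],
    "awake": [11.9, 7, 9.6, 9.1, 20],
    "brainwt": [None, 0.0155, None, 0.00029, 0.423],
    "bodywt": [50, 0.48, 1.35, 0.019, 600],
}


def is_numeric(values):
    """True if there is at least one value and every non-missing value is a number."""
    present = [v for v in values if v is not None]
    return bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )


def mean_skip_missing(values):
    """Average of the non-missing values."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)


# `and` short-circuits, so the mean is never computed for string columns
selected = [
    name for name, values in columns.items()
    if is_numeric(values) and mean_skip_missing(values) > 10
]
print(selected)  # ['sleep_total', 'awake', 'bodywt']
```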
Let’s look at another example, where we select only the columns with fewer than 10 distinct values:
# returns a list of booleans
columns_to_select = [frame.nunique()[0, 0] < 10 for frame in DT]
DT[:, columns_to_select].head(5)
| | vore | conservation |
---|---|---|
0 | carni | lc |
1 | omni | NA |
2 | herbi | nt |
3 | omni | lc |
4 | herbi | domesticated |
matching_frames = [frame for frame in DT if frame.nunique()[0,0] < 10]
dt.cbind(matching_frames).head(5)
| | vore | conservation |
---|---|---|
0 | carni | lc |
1 | omni | NA |
2 | herbi | nt |
3 | omni | lc |
4 | herbi | domesticated |
Or pass matching_frames to the j section of DT:
DT[:, matching_frames].head(5)
| | vore | conservation |
---|---|---|
0 | carni | lc |
1 | omni | NA |
2 | herbi | nt |
3 | omni | lc |
4 | herbi | domesticated |
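A pure-Python counterpart of the distinct-value filter counts a set of values. This sketch assumes only the string columns and the five rows shown above, so the threshold is lowered to 4 to get a meaningful split. The hypothetical n_distinct helper skips None (NA) before counting, which may not match how datatable's nunique() treats missing values:

```python
# String columns from the first five msleep rows shown above; None stands in for NA.
columns = {
    "name": ["Cheetah", "Owl monkey", "Mountain beaver",
             "Greater short-tailed shrew", "Cow"],
    "genus": ["Acinonyx", "Aotus", "Aplodontia", "Blarina", "Bos"],
    "vore": ["carni", "omni", "herbi", "omni", "herbi"],
    "order": ["Carnivora", "Primates", "Rodentia", "Soricomorpha", "Artiodactyla"],
    "conservation": ["lc", None, "nt", "lc", "domesticated"],
}


def n_distinct(values):
    """Count distinct non-missing values (None is treated as NA and skipped)."""
    return len({v for v in values if v is not None})


threshold = 4
selected = [name for name, values in columns.items()
            if n_distinct(values) < threshold]
print(selected)  # ['vore', 'conservation']
```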
Reordering Columns
You can select columns in the order that you want:
columns_to_select = ['conservation', 'sleep_total', 'name']
DT[:, columns_to_select].head(5)
| | conservation | sleep_total | name |
---|---|---|---|
0 | lc | 12.1 | Cheetah |
1 | NA | 17 | Owl monkey |
2 | nt | 14.4 | Mountain beaver |
3 | lc | 14.9 | Greater short-tailed shrew |
4 | domesticated | 4 | Cow |
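To move some columns to the front while keeping the remaining columns in their original order, you could write a small helper function:
def move_to_the_front(frame, front_columns):
    # the remaining columns keep their original relative order
    rest = [name for name in frame.names if name not in front_columns]
    # build a new list so the caller's front_columns list is not mutated
    return list(front_columns) + rest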
DT[:, move_to_the_front(DT, ['conservation', 'sleep_total'])].head(5)
| | conservation | sleep_total | name | genus | vore | order | sleep_rem | sleep_cycle | awake | brainwt | bodywt |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | lc | 12.1 | Cheetah | Acinonyx | carni | Carnivora | NA | NA | 11.9 | NA | 50 |
1 | NA | 17 | Owl monkey | Aotus | omni | Primates | 1.8 | NA | 7 | 0.0155 | 0.48 |
2 | nt | 14.4 | Mountain beaver | Aplodontia | herbi | Rodentia | 2.4 | NA | 9.6 | NA | 1.35 |
3 | lc | 14.9 | Greater short-tailed shrew | Blarina | omni | Soricomorpha | 2.3 | 0.133333 | 9.1 | 0.00029 | 0.019 |
4 | domesticated | 4 | Cow | Bos | herbi | Artiodactyla | 0.7 | 0.666667 | 20 | 0.423 | 600 |
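The same idea works in the other direction. A minimal sketch of a hypothetical move_to_the_back helper that takes a plain sequence of names (for example DT.names) rather than a frame; the returned list would be passed to DT[:, ...] just like the output of move_to_the_front:

```python
def move_to_the_back(all_names, back_columns):
    """Return all names with `back_columns` moved to the end, in the given order."""
    unknown = [name for name in back_columns if name not in all_names]
    if unknown:
        raise KeyError(f"unknown columns: {unknown}")
    # the remaining columns keep their original relative order
    rest = [name for name in all_names if name not in back_columns]
    return rest + list(back_columns)


msleep_names = (
    "name", "genus", "vore", "order", "conservation",
    "sleep_total", "sleep_rem", "sleep_cycle", "awake", "brainwt", "bodywt",
)
reordered = move_to_the_back(msleep_names, ["name", "genus"])
print(reordered)
# In a live session: DT[:, move_to_the_back(DT.names, ["name", "genus"])]
```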
Column Names
Renaming Columns
Columns with new names can be created within the j section by passing a dictionary that maps each new name to a column expression:
new_names = {"animal": f.name, "extinction_threat": f.conservation}
DT[:, f.sleep_total.extend(new_names)].head(5)
| | sleep_total | animal | extinction_threat |
---|---|---|---|
0 | 12.1 | Cheetah | lc |
1 | 17 | Owl monkey | NA |
2 | 14.4 | Mountain beaver | nt |
3 | 14.9 | Greater short-tailed shrew | lc |
4 | 4 | Cow | domesticated |
You can also rename columns by assigning to the frame’s names attribute a dictionary that maps old column names to new ones. Here the renaming is done on a copy, so DT itself keeps its original names:
DT_copy = DT.copy()
DT_copy.names = {"name": "animal", "conservation": "extinction_threat"}
DT_copy[:, ['animal', 'sleep_total', 'extinction_threat']].head(5)
| | animal | sleep_total | extinction_threat |
---|---|---|---|
0 | Cheetah | 12.1 | lc |
1 | Owl monkey | 17 | NA |
2 | Mountain beaver | 14.4 | nt |
3 | Greater short-tailed shrew | 14.9 | lc |
4 | Cow | 4 | domesticated |
DT_copy.head(5)
| | animal | genus | vore | order | extinction_threat | sleep_total | sleep_rem | sleep_cycle | awake | brainwt | bodywt |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Cheetah | Acinonyx | carni | Carnivora | lc | 12.1 | NA | NA | 11.9 | NA | 50 |
1 | Owl monkey | Aotus | omni | Primates | NA | 17 | 1.8 | NA | 7 | 0.0155 | 0.48 |
2 | Mountain beaver | Aplodontia | herbi | Rodentia | nt | 14.4 | 2.4 | NA | 9.6 | NA | 1.35 |
3 | Greater short-tailed shrew | Blarina | omni | Soricomorpha | lc | 14.9 | 2.3 | 0.133333 | 9.1 | 0.00029 | 0.019 |
4 | Cow | Bos | herbi | Artiodactyla | domesticated | 4 | 0.7 | 0.666667 | 20 | 0.423 | 600 |
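Assigning a dictionary renames only the listed columns and leaves the others, and the column order, untouched, as the DT_copy output shows. A minimal pure-Python sketch of that mapping over the msleep names copied from above (an illustration, not datatable's implementation); assigning the resulting full list to names, as the next section does with upper(), should give the same result:

```python
msleep_names = (
    "name", "genus", "vore", "order", "conservation",
    "sleep_total", "sleep_rem", "sleep_cycle", "awake", "brainwt", "bodywt",
)
renames = {"name": "animal", "conservation": "extinction_threat"}

# Columns missing from the dictionary keep their original name and position
new_names = [renames.get(name, name) for name in msleep_names]
print(new_names[:5])  # ['animal', 'genus', 'vore', 'order', 'extinction_threat']
```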
Reformatting all Column Names
You can use Python’s string methods to reformat column names.
Let’s convert all column names to uppercase:
DT_copy.names = [name.upper() for name in DT.names] # or list(map(str.upper, DT.names))
DT_copy.head(5)
| | NAME | GENUS | VORE | ORDER | CONSERVATION | SLEEP_TOTAL | SLEEP_REM | SLEEP_CYCLE | AWAKE | BRAINWT | BODYWT |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Cheetah | Acinonyx | carni | Carnivora | lc | 12.1 | NA | NA | 11.9 | NA | 50 |
1 | Owl monkey | Aotus | omni | Primates | NA | 17 | 1.8 | NA | 7 | 0.0155 | 0.48 |
2 | Mountain beaver | Aplodontia | herbi | Rodentia | nt | 14.4 | 2.4 | NA | 9.6 | NA | 1.35 |
3 | Greater short-tailed shrew | Blarina | omni | Soricomorpha | lc | 14.9 | 2.3 | 0.133333 | 9.1 | 0.00029 | 0.019 |
4 | Cow | Bos | herbi | Artiodactyla | domesticated | 4 | 0.7 | 0.666667 | 20 | 0.423 | 600 |
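Beyond upper(), the same list-comprehension pattern can apply any string clean-up. A minimal sketch of a hypothetical clean_name helper that lowercases a name and replaces each run of non-alphanumeric characters with a single underscore; the example names are made up for illustration, since the msleep names are already clean:

```python
import re


def clean_name(name):
    """Lowercase, collapse runs of non-alphanumeric characters to '_', trim stray '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")


raw_names = ["Sleep Total", "brain-wt (kg)", "  Body WT "]
cleaned = [clean_name(name) for name in raw_names]
print(cleaned)  # ['sleep_total', 'brain_wt_kg', 'body_wt']
# In a live session: DT_copy.names = [clean_name(name) for name in DT_copy.names]
```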