{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data Wrangling with Python Datatable - Selecting Columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This article highlights various ways to select columns in Python datatable. The examples are based on the excellent [article](https://suzan.rbind.io/2018/01/dplyr-tutorial-1/) by [Suzan Baert](https://twitter.com/SuzanBaert).\n",
"\n",
"The data file can be accessed [here](https://github.com/samukweku/data-wrangling-blog/raw/master/_notebooks/Data_files/msleep.txt)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## **Selecting Columns**"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {
"iopub.execute_input": "2020-10-31T01:26:23.474681Z",
"iopub.status.busy": "2020-10-31T01:26:23.474464Z",
"iopub.status.idle": "2020-10-31T01:26:23.477682Z",
"shell.execute_reply": "2020-10-31T01:26:23.477047Z",
"shell.execute_reply.started": "2020-10-31T01:26:23.474658Z"
}
},
"source": [
"### The Basics"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
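{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>name</th><th>genus</th><th>vore</th><th>order</th><th>conservation</th><th>sleep_total</th><th>sleep_rem</th><th>sleep_cycle</th><th>awake</th><th>brainwt</th><th>bodywt</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>Cheetah</td><td>Acinonyx</td><td>carni</td><td>Carnivora</td><td>lc</td><td>12.1</td><td>NA</td><td>NA</td><td>11.9</td><td>NA</td><td>50</td></tr>\n",
"<tr><td>1</td><td>Owl monkey</td><td>Aotus</td><td>omni</td><td>Primates</td><td>NA</td><td>17</td><td>1.8</td><td>NA</td><td>7</td><td>0.0155</td><td>0.48</td></tr>\n",
"<tr><td>2</td><td>Mountain beaver</td><td>Aplodontia</td><td>herbi</td><td>Rodentia</td><td>nt</td><td>14.4</td><td>2.4</td><td>NA</td><td>9.6</td><td>NA</td><td>1.35</td></tr>\n",
"<tr><td>3</td><td>Greater short-tailed shrew</td><td>Blarina</td><td>omni</td><td>Soricomorpha</td><td>lc</td><td>14.9</td><td>2.3</td><td>0.133333</td><td>9.1</td><td>0.00029</td><td>0.019</td></tr>\n",
"<tr><td>4</td><td>Cow</td><td>Bos</td><td>herbi</td><td>Artiodactyla</td><td>domesticated</td><td>4</td><td>0.7</td><td>0.666667</td><td>20</td><td>0.423</td><td>600</td></tr>\n",
"</table>\n"
],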
"text/plain": [
""
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from datatable import dt, f, ltype, stype\n",
"import re\n",
"\n",
"file_path='Data_files/msleep.txt'\n",
"DT = dt.fread(file_path)\n",
"DT.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can select columns by name or position in the `j` section:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>genus</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>Acinonyx</td></tr>\n",
"<tr><td>1</td><td>Aotus</td></tr>\n",
"<tr><td>2</td><td>Aplodontia</td></tr>\n",
"<tr><td>3</td><td>Blarina</td></tr>\n",
"<tr><td>4</td><td>Bos</td></tr>\n",
"</table>\n"
],
"text/plain": [
""
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"DT[:, 'genus'].head(5)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
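{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>genus</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>Acinonyx</td></tr>\n",
"<tr><td>1</td><td>Aotus</td></tr>\n",
"<tr><td>2</td><td>Aplodontia</td></tr>\n",
"<tr><td>3</td><td>Blarina</td></tr>\n",
"<tr><td>4</td><td>Bos</td></tr>\n",
"<tr><td>5</td><td>Bradypus</td></tr>\n",
"<tr><td>6</td><td>Callorhinus</td></tr>\n",
"<tr><td>7</td><td>Calomys</td></tr>\n",
"<tr><td>8</td><td>Canis</td></tr>\n",
"<tr><td>9</td><td>Capreolus</td></tr>\n",
"</table>\n"
],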
"text/plain": [
""
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"DT[:, 1].head()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
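{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>genus</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>Acinonyx</td></tr>\n",
"<tr><td>1</td><td>Aotus</td></tr>\n",
"<tr><td>2</td><td>Aplodontia</td></tr>\n",
"<tr><td>3</td><td>Blarina</td></tr>\n",
"<tr><td>4</td><td>Bos</td></tr>\n",
"<tr><td>5</td><td>Bradypus</td></tr>\n",
"<tr><td>6</td><td>Callorhinus</td></tr>\n",
"<tr><td>7</td><td>Calomys</td></tr>\n",
"<tr><td>8</td><td>Canis</td></tr>\n",
"<tr><td>9</td><td>Capreolus</td></tr>\n",
"</table>\n"
],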
"text/plain": [
""
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"DT[:, -10].head()\n"
]
},
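{
"cell_type": "markdown",
"metadata": {},
"source": [
"Negative positions count from the last column, as with Python sequences. A small sketch of why `-10` lands on `genus` here; it uses a hard-coded copy of the column names from the first table as a stand-in for `DT.names`, so no frame is needed:\n",
"\n",
"```python\n",
"# Column names copied from the msleep frame shown above (stand-in for DT.names).\n",
"all_names = (\n",
"    'name', 'genus', 'vore', 'order', 'conservation', 'sleep_total',\n",
"    'sleep_rem', 'sleep_cycle', 'awake', 'brainwt', 'bodywt',\n",
")\n",
"\n",
"# With 11 columns, position -10 refers to the same column as position 1.\n",
"position = -10 % len(all_names)\n",
"print(position, all_names[position])\n",
"```"
]
},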
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you are selecting a single column, you can pass it into the brackets without specifying the `i` section:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>genus</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>Acinonyx</td></tr>\n",
"<tr><td>1</td><td>Aotus</td></tr>\n",
"<tr><td>2</td><td>Aplodontia</td></tr>\n",
"<tr><td>3</td><td>Blarina</td></tr>\n",
"<tr><td>4</td><td>Bos</td></tr>\n",
"</table>\n"
],
"text/plain": [
""
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"DT['genus'].head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the rest of this article, I will be focusing on column selection by name."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can select columns by passing a list/tuple of the column names:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
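{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>name</th><th>genus</th><th>sleep_total</th><th>awake</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>Cheetah</td><td>Acinonyx</td><td>12.1</td><td>11.9</td></tr>\n",
"<tr><td>1</td><td>Owl monkey</td><td>Aotus</td><td>17</td><td>7</td></tr>\n",
"<tr><td>2</td><td>Mountain beaver</td><td>Aplodontia</td><td>14.4</td><td>9.6</td></tr>\n",
"<tr><td>3</td><td>Greater short-tailed shrew</td><td>Blarina</td><td>14.9</td><td>9.1</td></tr>\n",
"<tr><td>4</td><td>Cow</td><td>Bos</td><td>4</td><td>20</td></tr>\n",
"</table>\n"
],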
"text/plain": [
""
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"columns_to_select = [\"name\", \"genus\", \"sleep_total\", \"awake\"]\n",
"DT[:, columns_to_select].head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can pass a list/tuple of booleans, one per column; only the columns marked `True` are kept:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
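{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>name</th><th>genus</th><th>sleep_total</th><th>sleep_cycle</th><th>awake</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>Cheetah</td><td>Acinonyx</td><td>12.1</td><td>NA</td><td>11.9</td></tr>\n",
"<tr><td>1</td><td>Owl monkey</td><td>Aotus</td><td>17</td><td>NA</td><td>7</td></tr>\n",
"<tr><td>2</td><td>Mountain beaver</td><td>Aplodontia</td><td>14.4</td><td>NA</td><td>9.6</td></tr>\n",
"<tr><td>3</td><td>Greater short-tailed shrew</td><td>Blarina</td><td>14.9</td><td>0.133333</td><td>9.1</td></tr>\n",
"<tr><td>4</td><td>Cow</td><td>Bos</td><td>4</td><td>0.666667</td><td>20</td></tr>\n",
"</table>\n"
],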
"text/plain": [
""
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"columns_to_select = [True, True, False, False, False, True, False, True, True, False, False]\n",
"DT[:, columns_to_select].head(5)"
]
},
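{
"cell_type": "markdown",
"metadata": {},
"source": [
"Typing eleven booleans by hand is easy to get wrong. A minimal sketch of deriving the same mask from a set of wanted names follows. Here `all_names` is hard-coded from the first table (with a loaded frame you would presumably use `DT.names`), and `wanted` is a hypothetical variable name:\n",
"\n",
"```python\n",
"# Column names copied from the msleep frame shown earlier (stand-in for DT.names).\n",
"all_names = (\n",
"    'name', 'genus', 'vore', 'order', 'conservation', 'sleep_total',\n",
"    'sleep_rem', 'sleep_cycle', 'awake', 'brainwt', 'bodywt',\n",
")\n",
"\n",
"# Hypothetical set of columns to keep.\n",
"wanted = {'name', 'genus', 'sleep_total', 'sleep_cycle', 'awake'}\n",
"\n",
"# One boolean per column, in frame order.\n",
"mask = [name in wanted for name in all_names]\n",
"print(mask)\n",
"```\n",
"\n",
"The resulting list has one entry per column, so it can stand in for the hand-written list."
]
},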
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can select chunks of columns using Python's [slice](https://docs.python.org/3/library/functions.html#slice) syntax or via the ``start:end`` shortcut. As the output shows, a slice of column names includes the end column:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
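{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>name</th><th>genus</th><th>vore</th><th>order</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>Cheetah</td><td>Acinonyx</td><td>carni</td><td>Carnivora</td></tr>\n",
"<tr><td>1</td><td>Owl monkey</td><td>Aotus</td><td>omni</td><td>Primates</td></tr>\n",
"<tr><td>2</td><td>Mountain beaver</td><td>Aplodontia</td><td>herbi</td><td>Rodentia</td></tr>\n",
"<tr><td>3</td><td>Greater short-tailed shrew</td><td>Blarina</td><td>omni</td><td>Soricomorpha</td></tr>\n",
"<tr><td>4</td><td>Cow</td><td>Bos</td><td>herbi</td><td>Artiodactyla</td></tr>\n",
"</table>\n"
],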
"text/plain": [
""
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"DT[:, slice(\"name\", \"order\")].head(5)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
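{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>name</th><th>genus</th><th>vore</th><th>order</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>Cheetah</td><td>Acinonyx</td><td>carni</td><td>Carnivora</td></tr>\n",
"<tr><td>1</td><td>Owl monkey</td><td>Aotus</td><td>omni</td><td>Primates</td></tr>\n",
"<tr><td>2</td><td>Mountain beaver</td><td>Aplodontia</td><td>herbi</td><td>Rodentia</td></tr>\n",
"<tr><td>3</td><td>Greater short-tailed shrew</td><td>Blarina</td><td>omni</td><td>Soricomorpha</td></tr>\n",
"<tr><td>4</td><td>Cow</td><td>Bos</td><td>herbi</td><td>Artiodactyla</td></tr>\n",
"</table>\n"
],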
"text/plain": [
""
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"DT[:, \"name\" : \"order\"].head(5)\n"
]
},
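{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both outputs above include `order`, so a slice of names keeps its end column, unlike a positional Python slice. A plain-Python sketch of that behaviour follows; it is an emulation on a tuple of names, not datatable's implementation, and `name_slice` is a hypothetical helper:\n",
"\n",
"```python\n",
"# Column names copied from the msleep frame shown earlier (stand-in for DT.names).\n",
"all_names = (\n",
"    'name', 'genus', 'vore', 'order', 'conservation', 'sleep_total',\n",
"    'sleep_rem', 'sleep_cycle', 'awake', 'brainwt', 'bodywt',\n",
")\n",
"\n",
"\n",
"def name_slice(names, start, end):\n",
"    # Return the names from `start` to `end`, keeping both endpoints.\n",
"    first = names.index(start)\n",
"    last = names.index(end)\n",
"    return list(names[first:last + 1])\n",
"\n",
"\n",
"print(name_slice(all_names, 'name', 'order'))\n",
"```"
]
},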
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Multiple chunk selection is possible:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
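{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>name</th><th>genus</th><th>vore</th><th>order</th><th>sleep_total</th><th>sleep_rem</th><th>sleep_cycle</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>Cheetah</td><td>Acinonyx</td><td>carni</td><td>Carnivora</td><td>12.1</td><td>NA</td><td>NA</td></tr>\n",
"<tr><td>1</td><td>Owl monkey</td><td>Aotus</td><td>omni</td><td>Primates</td><td>17</td><td>1.8</td><td>NA</td></tr>\n",
"<tr><td>2</td><td>Mountain beaver</td><td>Aplodontia</td><td>herbi</td><td>Rodentia</td><td>14.4</td><td>2.4</td><td>NA</td></tr>\n",
"<tr><td>3</td><td>Greater short-tailed shrew</td><td>Blarina</td><td>omni</td><td>Soricomorpha</td><td>14.9</td><td>2.3</td><td>0.133333</td></tr>\n",
"<tr><td>4</td><td>Cow</td><td>Bos</td><td>herbi</td><td>Artiodactyla</td><td>4</td><td>0.7</td><td>0.666667</td></tr>\n",
"</table>\n"
],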
"text/plain": [
""
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"columns_to_select = [slice(\"name\", \"order\"), slice(\"sleep_total\", \"sleep_cycle\")]\n",
"DT[:, columns_to_select].head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To combine several selections with the shortcut notation, wrap them in datatable's [f](https://datatable.readthedocs.io/en/latest/manual/f-expressions.html) symbol:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
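{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>name</th><th>genus</th><th>vore</th><th>order</th><th>sleep_total</th><th>sleep_rem</th><th>sleep_cycle</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>Cheetah</td><td>Acinonyx</td><td>carni</td><td>Carnivora</td><td>12.1</td><td>NA</td><td>NA</td></tr>\n",
"<tr><td>1</td><td>Owl monkey</td><td>Aotus</td><td>omni</td><td>Primates</td><td>17</td><td>1.8</td><td>NA</td></tr>\n",
"<tr><td>2</td><td>Mountain beaver</td><td>Aplodontia</td><td>herbi</td><td>Rodentia</td><td>14.4</td><td>2.4</td><td>NA</td></tr>\n",
"<tr><td>3</td><td>Greater short-tailed shrew</td><td>Blarina</td><td>omni</td><td>Soricomorpha</td><td>14.9</td><td>2.3</td><td>0.133333</td></tr>\n",
"<tr><td>4</td><td>Cow</td><td>Bos</td><td>herbi</td><td>Artiodactyla</td><td>4</td><td>0.7</td><td>0.666667</td></tr>\n",
"</table>\n"
],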
"text/plain": [
""
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"columns_to_select = [f[\"name\" : \"order\", \"sleep_total\" : \"sleep_cycle\"]]\n",
"DT[:, columns_to_select].head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To deselect/drop columns, you can use the [remove](https://datatable.readthedocs.io/en/latest/manual/f-expressions.html#modifying-a-columnset) method:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
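{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>name</th><th>genus</th><th>vore</th><th>order</th><th>brainwt</th><th>bodywt</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>Cheetah</td><td>Acinonyx</td><td>carni</td><td>Carnivora</td><td>NA</td><td>50</td></tr>\n",
"<tr><td>1</td><td>Owl monkey</td><td>Aotus</td><td>omni</td><td>Primates</td><td>0.0155</td><td>0.48</td></tr>\n",
"<tr><td>2</td><td>Mountain beaver</td><td>Aplodontia</td><td>herbi</td><td>Rodentia</td><td>NA</td><td>1.35</td></tr>\n",
"<tr><td>3</td><td>Greater short-tailed shrew</td><td>Blarina</td><td>omni</td><td>Soricomorpha</td><td>0.00029</td><td>0.019</td></tr>\n",
"<tr><td>4</td><td>Cow</td><td>Bos</td><td>herbi</td><td>Artiodactyla</td><td>0.423</td><td>600</td></tr>\n",
"</table>\n"
],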
"text/plain": [
""
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"columns_to_remove = [f[\"sleep_total\" : \"awake\", \"conservation\"]]\n",
"DT[:, f[:].remove(columns_to_remove)].head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can deselect a whole chunk and then add a column back; this combines the [remove](https://datatable.readthedocs.io/en/latest/manual/f-expressions.html#modifying-a-columnset) and [extend](https://datatable.readthedocs.io/en/latest/manual/f-expressions.html#modifying-a-columnset) methods:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
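{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>brainwt</th><th>bodywt</th><th>conservation</th></tr>\n",
"<tr><th></th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>NA</td><td>50</td><td>lc</td></tr>\n",
"<tr><td>1</td><td>0.0155</td><td>0.48</td><td>NA</td></tr>\n",
"<tr><td>2</td><td>NA</td><td>1.35</td><td>nt</td></tr>\n",
"<tr><td>3</td><td>0.00029</td><td>0.019</td><td>lc</td></tr>\n",
"<tr><td>4</td><td>0.423</td><td>600</td><td>domesticated</td></tr>\n",
"</table>\n"
],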
"text/plain": [
""
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"DT[:, f[:].remove(f[\"name\" : \"awake\"]).extend(f[\"conservation\"])].head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Selecting Columns based on Partial Names"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can use Python's string methods to filter for columns by partial name matching:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
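{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>sleep_total</th><th>sleep_rem</th><th>sleep_cycle</th></tr>\n",
"<tr><th></th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>12.1</td><td>NA</td><td>NA</td></tr>\n",
"<tr><td>1</td><td>17</td><td>1.8</td><td>NA</td></tr>\n",
"<tr><td>2</td><td>14.4</td><td>2.4</td><td>NA</td></tr>\n",
"<tr><td>3</td><td>14.9</td><td>2.3</td><td>0.133333</td></tr>\n",
"<tr><td>4</td><td>4</td><td>0.7</td><td>0.666667</td></tr>\n",
"</table>\n"
],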
"text/plain": [
""
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"columns_to_select = [name.startswith(\"sleep\") for name in DT.names]\n",
"DT[:, columns_to_select].head(5)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
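{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>sleep_total</th><th>sleep_rem</th><th>sleep_cycle</th><th>brainwt</th><th>bodywt</th></tr>\n",
"<tr><th></th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>12.1</td><td>NA</td><td>NA</td><td>NA</td><td>50</td></tr>\n",
"<tr><td>1</td><td>17</td><td>1.8</td><td>NA</td><td>0.0155</td><td>0.48</td></tr>\n",
"<tr><td>2</td><td>14.4</td><td>2.4</td><td>NA</td><td>NA</td><td>1.35</td></tr>\n",
"<tr><td>3</td><td>14.9</td><td>2.3</td><td>0.133333</td><td>0.00029</td><td>0.019</td></tr>\n",
"<tr><td>4</td><td>4</td><td>0.7</td><td>0.666667</td><td>0.423</td><td>600</td></tr>\n",
"</table>\n"
],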
"text/plain": [
""
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"columns_to_select = [\"eep\" in name or name.endswith(\"wt\") for name in DT.names]\n",
"DT[:, columns_to_select].head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Selecting Columns based on Regex"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Python's [re](https://docs.python.org/3/library/re.html) module can be used to select columns based on a regular expression:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>order</th><th>conservation</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th><th>▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>Carnivora</td><td>lc</td></tr>\n",
"<tr><td>1</td><td>Primates</td><td>NA</td></tr>\n",
"<tr><td>2</td><td>Rodentia</td><td>nt</td></tr>\n",
"<tr><td>3</td><td>Soricomorpha</td><td>lc</td></tr>\n",
"<tr><td>4</td><td>Artiodactyla</td><td>domesticated</td></tr>\n",
"</table>\n"
],
"text/plain": [
""
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# this returns a list of booleans\n",
"columns_to_select = [bool(re.search(r\"o.+er\", name)) for name in DT.names]\n",
"DT[:, columns_to_select].head(5)"
]
},
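{
"cell_type": "markdown",
"metadata": {},
"source": [
"Which `re` function you pick changes the selection: `re.search` accepts a match anywhere in the name, while `re.fullmatch` requires the whole name to match. A small sketch of the difference on a hard-coded copy of the column names (a stand-in for `DT.names`; the variable names are illustrative only):\n",
"\n",
"```python\n",
"import re\n",
"\n",
"# Column names copied from the msleep frame shown earlier (stand-in for DT.names).\n",
"all_names = (\n",
"    'name', 'genus', 'vore', 'order', 'conservation', 'sleep_total',\n",
"    'sleep_rem', 'sleep_cycle', 'awake', 'brainwt', 'bodywt',\n",
")\n",
"pattern = re.compile(r'o.+er')\n",
"\n",
"# search: the pattern may match anywhere inside the name.\n",
"search_mask = [bool(pattern.search(name)) for name in all_names]\n",
"# fullmatch: the pattern must cover the entire name.\n",
"full_mask = [bool(pattern.fullmatch(name)) for name in all_names]\n",
"\n",
"print([name for name, keep in zip(all_names, search_mask) if keep])\n",
"print([name for name, keep in zip(all_names, full_mask) if keep])\n",
"```\n",
"\n",
"Either mask is a list of booleans with one entry per column, so it can be used like the mask in the cell above."
]
},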
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Selecting columns by their data type"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can pass a data type in the ``j`` section:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
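{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>name</th><th>genus</th><th>vore</th><th>order</th><th>conservation</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>Cheetah</td><td>Acinonyx</td><td>carni</td><td>Carnivora</td><td>lc</td></tr>\n",
"<tr><td>1</td><td>Owl monkey</td><td>Aotus</td><td>omni</td><td>Primates</td><td>NA</td></tr>\n",
"<tr><td>2</td><td>Mountain beaver</td><td>Aplodontia</td><td>herbi</td><td>Rodentia</td><td>nt</td></tr>\n",
"<tr><td>3</td><td>Greater short-tailed shrew</td><td>Blarina</td><td>omni</td><td>Soricomorpha</td><td>lc</td></tr>\n",
"<tr><td>4</td><td>Cow</td><td>Bos</td><td>herbi</td><td>Artiodactyla</td><td>domesticated</td></tr>\n",
"</table>\n"
],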
"text/plain": [
""
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"DT[:, str].head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can pass a list of data types:"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
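{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>sleep_total</th><th>sleep_rem</th><th>sleep_cycle</th><th>awake</th><th>brainwt</th><th>bodywt</th></tr>\n",
"<tr><th></th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>12.1</td><td>NA</td><td>NA</td><td>11.9</td><td>NA</td><td>50</td></tr>\n",
"<tr><td>1</td><td>17</td><td>1.8</td><td>NA</td><td>7</td><td>0.0155</td><td>0.48</td></tr>\n",
"<tr><td>2</td><td>14.4</td><td>2.4</td><td>NA</td><td>9.6</td><td>NA</td><td>1.35</td></tr>\n",
"<tr><td>3</td><td>14.9</td><td>2.3</td><td>0.133333</td><td>9.1</td><td>0.00029</td><td>0.019</td></tr>\n",
"<tr><td>4</td><td>4</td><td>0.7</td><td>0.666667</td><td>20</td><td>0.423</td><td>600</td></tr>\n",
"</table>\n"
],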
"text/plain": [
""
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"DT[:, [int, float]].head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also pass datatable's [stype](https://datatable.readthedocs.io/en/latest/api/stype.html#) or [ltype](https://datatable.readthedocs.io/en/latest/api/ltype.html#) data types:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
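{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>name</th><th>genus</th><th>vore</th><th>order</th><th>conservation</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>Cheetah</td><td>Acinonyx</td><td>carni</td><td>Carnivora</td><td>lc</td></tr>\n",
"<tr><td>1</td><td>Owl monkey</td><td>Aotus</td><td>omni</td><td>Primates</td><td>NA</td></tr>\n",
"<tr><td>2</td><td>Mountain beaver</td><td>Aplodontia</td><td>herbi</td><td>Rodentia</td><td>nt</td></tr>\n",
"<tr><td>3</td><td>Greater short-tailed shrew</td><td>Blarina</td><td>omni</td><td>Soricomorpha</td><td>lc</td></tr>\n",
"<tr><td>4</td><td>Cow</td><td>Bos</td><td>herbi</td><td>Artiodactyla</td><td>domesticated</td></tr>\n",
"</table>\n"
],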
"text/plain": [
""
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"DT[:, ltype.str].head(5)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
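{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>sleep_total</th><th>sleep_rem</th><th>sleep_cycle</th><th>awake</th><th>brainwt</th><th>bodywt</th></tr>\n",
"<tr><th></th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>12.1</td><td>NA</td><td>NA</td><td>11.9</td><td>NA</td><td>50</td></tr>\n",
"<tr><td>1</td><td>17</td><td>1.8</td><td>NA</td><td>7</td><td>0.0155</td><td>0.48</td></tr>\n",
"<tr><td>2</td><td>14.4</td><td>2.4</td><td>NA</td><td>9.6</td><td>NA</td><td>1.35</td></tr>\n",
"<tr><td>3</td><td>14.9</td><td>2.3</td><td>0.133333</td><td>9.1</td><td>0.00029</td><td>0.019</td></tr>\n",
"<tr><td>4</td><td>4</td><td>0.7</td><td>0.666667</td><td>20</td><td>0.423</td><td>600</td></tr>\n",
"</table>\n"
],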
"text/plain": [
""
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"DT[:, stype.float64].head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can remove columns based on their data type:"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
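{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>name</th><th>genus</th><th>vore</th><th>order</th><th>conservation</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>Cheetah</td><td>Acinonyx</td><td>carni</td><td>Carnivora</td><td>lc</td></tr>\n",
"<tr><td>1</td><td>Owl monkey</td><td>Aotus</td><td>omni</td><td>Primates</td><td>NA</td></tr>\n",
"<tr><td>2</td><td>Mountain beaver</td><td>Aplodontia</td><td>herbi</td><td>Rodentia</td><td>nt</td></tr>\n",
"<tr><td>3</td><td>Greater short-tailed shrew</td><td>Blarina</td><td>omni</td><td>Soricomorpha</td><td>lc</td></tr>\n",
"<tr><td>4</td><td>Cow</td><td>Bos</td><td>herbi</td><td>Artiodactyla</td><td>domesticated</td></tr>\n",
"</table>\n"
],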
"text/plain": [
""
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"columns_to_remove = [f[int, float]]\n",
"DT[:, f[:].remove(columns_to_remove)].head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An alternative is to preselect the columns you intend to keep:"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
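{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>name</th><th>genus</th><th>vore</th><th>order</th><th>conservation</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>Cheetah</td><td>Acinonyx</td><td>carni</td><td>Carnivora</td><td>lc</td></tr>\n",
"<tr><td>1</td><td>Owl monkey</td><td>Aotus</td><td>omni</td><td>Primates</td><td>NA</td></tr>\n",
"<tr><td>2</td><td>Mountain beaver</td><td>Aplodontia</td><td>herbi</td><td>Rodentia</td><td>nt</td></tr>\n",
"<tr><td>3</td><td>Greater short-tailed shrew</td><td>Blarina</td><td>omni</td><td>Soricomorpha</td><td>lc</td></tr>\n",
"<tr><td>4</td><td>Cow</td><td>Bos</td><td>herbi</td><td>Artiodactyla</td><td>domesticated</td></tr>\n",
"</table>\n"
],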
"text/plain": [
""
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# creates a list of booleans: True for every non-numeric column\n",
"columns_to_select = [\n",
"    dtype not in (ltype.int, ltype.real)\n",
"    for dtype in DT.ltypes\n",
"]\n",
"\n",
"DT[:, columns_to_select].head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You could also iterate through the frame and check each column's type, before recombining with [cbind](https://datatable.readthedocs.io/en/latest/api/dt/cbind.html):"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
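{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>name</th><th>genus</th><th>vore</th><th>order</th><th>conservation</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>Cheetah</td><td>Acinonyx</td><td>carni</td><td>Carnivora</td><td>lc</td></tr>\n",
"<tr><td>1</td><td>Owl monkey</td><td>Aotus</td><td>omni</td><td>Primates</td><td>NA</td></tr>\n",
"<tr><td>2</td><td>Mountain beaver</td><td>Aplodontia</td><td>herbi</td><td>Rodentia</td><td>nt</td></tr>\n",
"<tr><td>3</td><td>Greater short-tailed shrew</td><td>Blarina</td><td>omni</td><td>Soricomorpha</td><td>lc</td></tr>\n",
"<tr><td>4</td><td>Cow</td><td>Bos</td><td>herbi</td><td>Artiodactyla</td><td>domesticated</td></tr>\n",
"</table>\n"
],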
"text/plain": [
""
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"matching_frames = [frame for frame in DT if frame.ltypes[0] not in (ltype.real, ltype.int)]\n",
"dt.cbind(matching_frames).head(5)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Iterating over a frame yields each column as a single-column frame, which is what makes the list comprehension above work.\n",
"\n",
"You could also pass `matching_frames` to the `j` section of `DT`:"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
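{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>name</th><th>genus</th><th>vore</th><th>order</th><th>conservation</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>Cheetah</td><td>Acinonyx</td><td>carni</td><td>Carnivora</td><td>lc</td></tr>\n",
"<tr><td>1</td><td>Owl monkey</td><td>Aotus</td><td>omni</td><td>Primates</td><td>NA</td></tr>\n",
"<tr><td>2</td><td>Mountain beaver</td><td>Aplodontia</td><td>herbi</td><td>Rodentia</td><td>nt</td></tr>\n",
"<tr><td>3</td><td>Greater short-tailed shrew</td><td>Blarina</td><td>omni</td><td>Soricomorpha</td><td>lc</td></tr>\n",
"<tr><td>4</td><td>Cow</td><td>Bos</td><td>herbi</td><td>Artiodactyla</td><td>domesticated</td></tr>\n",
"</table>\n"
],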
"text/plain": [
""
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"DT[:, matching_frames].head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Selecting columns by logical expressions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ideas expressed in the previous section allow for more nifty column selection.\n",
"\n",
"Say we wish to select columns that are numeric, and have a mean greater than 10:"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
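{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>sleep_total</th><th>awake</th><th>bodywt</th></tr>\n",
"<tr><th></th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>12.1</td><td>11.9</td><td>50</td></tr>\n",
"<tr><td>1</td><td>17</td><td>7</td><td>0.48</td></tr>\n",
"<tr><td>2</td><td>14.4</td><td>9.6</td><td>1.35</td></tr>\n",
"<tr><td>3</td><td>14.9</td><td>9.1</td><td>0.019</td></tr>\n",
"<tr><td>4</td><td>4</td><td>20</td><td>600</td></tr>\n",
"</table>\n"
],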
"text/plain": [
""
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# returns a list of booleans\n",
"columns_to_select = [\n",
"    dtype in (ltype.real, ltype.int) and DT[name].mean()[0, 0] > 10\n",
"    for name, dtype in zip(DT.names, DT.ltypes)\n",
"]\n",
"DT[:, columns_to_select].head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The code above preselects the columns before passing them to datatable. `mean()` returns a single-row frame, so `[0, 0]` is used to extract the scalar value, which can then be compared with `10`.\n",
"\n",
"Alternatively, the list comprehension could return the column names instead of a list of booleans:\n"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
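{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>sleep_total</th><th>awake</th><th>bodywt</th></tr>\n",
"<tr><th></th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>12.1</td><td>11.9</td><td>50</td></tr>\n",
"<tr><td>1</td><td>17</td><td>7</td><td>0.48</td></tr>\n",
"<tr><td>2</td><td>14.4</td><td>9.6</td><td>1.35</td></tr>\n",
"<tr><td>3</td><td>14.9</td><td>9.1</td><td>0.019</td></tr>\n",
"<tr><td>4</td><td>4</td><td>20</td><td>600</td></tr>\n",
"</table>\n"
],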
"text/plain": [
""
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"columns_to_select = [\n",
"    name\n",
"    for name, dtype in zip(DT.names, DT.ltypes)\n",
"    if dtype in (ltype.real, ltype.int) and DT[name].mean()[0, 0] > 10\n",
"]\n",
"DT[:, columns_to_select].head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"You could also iterate through the frame in a list comprehension and check each column, before recombining with [cbind](https://datatable.readthedocs.io/en/latest/api/dt/cbind.html):"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
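{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>sleep_total</th><th>awake</th><th>bodywt</th></tr>\n",
"<tr><th></th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>12.1</td><td>11.9</td><td>50</td></tr>\n",
"<tr><td>1</td><td>17</td><td>7</td><td>0.48</td></tr>\n",
"<tr><td>2</td><td>14.4</td><td>9.6</td><td>1.35</td></tr>\n",
"<tr><td>3</td><td>14.9</td><td>9.1</td><td>0.019</td></tr>\n",
"<tr><td>4</td><td>4</td><td>20</td><td>600</td></tr>\n",
"</table>\n"
],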
"text/plain": [
""
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"matching_frames = [frame for frame in DT \n",
" if frame.ltypes[0] in (ltype.int, ltype.real) \n",
" and frame.mean()[0,0] > 10]\n",
"dt.cbind(matching_frames).head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead of recombining with [cbind](https://datatable.readthedocs.io/en/latest/api/dt/cbind.html), you could pass the `matching_frames` to the ``j`` section:"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
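{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>sleep_total</th><th>awake</th><th>bodywt</th></tr>\n",
"<tr><th></th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>12.1</td><td>11.9</td><td>50</td></tr>\n",
"<tr><td>1</td><td>17</td><td>7</td><td>0.48</td></tr>\n",
"<tr><td>2</td><td>14.4</td><td>9.6</td><td>1.35</td></tr>\n",
"<tr><td>3</td><td>14.9</td><td>9.1</td><td>0.019</td></tr>\n",
"<tr><td>4</td><td>4</td><td>20</td><td>600</td></tr>\n",
"</table>\n"
],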
"text/plain": [
""
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"DT[:, matching_frames].head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's look at another example, where we select only the columns with fewer than 10 distinct values:"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
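{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>vore</th><th>conservation</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th><th>▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>carni</td><td>lc</td></tr>\n",
"<tr><td>1</td><td>omni</td><td>NA</td></tr>\n",
"<tr><td>2</td><td>herbi</td><td>nt</td></tr>\n",
"<tr><td>3</td><td>omni</td><td>lc</td></tr>\n",
"<tr><td>4</td><td>herbi</td><td>domesticated</td></tr>\n",
"</table>\n"
],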
"text/plain": [
""
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# returns a list of booleans\n",
"columns_to_select = [frame.nunique()[0, 0] < 10 for frame in DT]\n",
"DT[:, columns_to_select].head(5)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
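{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>vore</th><th>conservation</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th><th>▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>carni</td><td>lc</td></tr>\n",
"<tr><td>1</td><td>omni</td><td>NA</td></tr>\n",
"<tr><td>2</td><td>herbi</td><td>nt</td></tr>\n",
"<tr><td>3</td><td>omni</td><td>lc</td></tr>\n",
"<tr><td>4</td><td>herbi</td><td>domesticated</td></tr>\n",
"</table>\n"
],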
"text/plain": [
""
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"matching_frames = [frame for frame in DT if frame.nunique()[0,0] < 10]\n",
"dt.cbind(matching_frames).head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or pass `matching_frames` to the `j` section in `DT`:"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
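{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>vore</th><th>conservation</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th><th>▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>carni</td><td>lc</td></tr>\n",
"<tr><td>1</td><td>omni</td><td>NA</td></tr>\n",
"<tr><td>2</td><td>herbi</td><td>nt</td></tr>\n",
"<tr><td>3</td><td>omni</td><td>lc</td></tr>\n",
"<tr><td>4</td><td>herbi</td><td>domesticated</td></tr>\n",
"</table>\n"
],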
"text/plain": [
""
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"DT[:, matching_frames].head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## **Reordering Columns**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can select columns in the order that you want:"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
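{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>conservation</th><th>sleep_total</th><th>name</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>lc</td><td>12.1</td><td>Cheetah</td></tr>\n",
"<tr><td>1</td><td>NA</td><td>17</td><td>Owl monkey</td></tr>\n",
"<tr><td>2</td><td>nt</td><td>14.4</td><td>Mountain beaver</td></tr>\n",
"<tr><td>3</td><td>lc</td><td>14.9</td><td>Greater short-tailed shrew</td></tr>\n",
"<tr><td>4</td><td>domesticated</td><td>4</td><td>Cow</td></tr>\n",
"</table>\n"
],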
"text/plain": [
""
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"columns_to_select = ['conservation', 'sleep_total', 'name']\n",
"DT[:, columns_to_select].head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To move some columns to the front, you could write a helper function:"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"def move_to_the_front(frame, front_columns):\n",
"    # Return every column name, with `front_columns` first.\n",
"    # A new list is built, so the caller's `front_columns` is not modified.\n",
"    remaining = [name for name in frame.names if name not in front_columns]\n",
"    return [*front_columns, *remaining]"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
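{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>conservation</th><th>sleep_total</th><th>name</th><th>genus</th><th>vore</th><th>order</th><th>sleep_rem</th><th>sleep_cycle</th><th>awake</th><th>brainwt</th><th>bodywt</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>lc</td><td>12.1</td><td>Cheetah</td><td>Acinonyx</td><td>carni</td><td>Carnivora</td><td>NA</td><td>NA</td><td>11.9</td><td>NA</td><td>50</td></tr>\n",
"<tr><td>1</td><td>NA</td><td>17</td><td>Owl monkey</td><td>Aotus</td><td>omni</td><td>Primates</td><td>1.8</td><td>NA</td><td>7</td><td>0.0155</td><td>0.48</td></tr>\n",
"<tr><td>2</td><td>nt</td><td>14.4</td><td>Mountain beaver</td><td>Aplodontia</td><td>herbi</td><td>Rodentia</td><td>2.4</td><td>NA</td><td>9.6</td><td>NA</td><td>1.35</td></tr>\n",
"<tr><td>3</td><td>lc</td><td>14.9</td><td>Greater short-tailed shrew</td><td>Blarina</td><td>omni</td><td>Soricomorpha</td><td>2.3</td><td>0.133333</td><td>9.1</td><td>0.00029</td><td>0.019</td></tr>\n",
"<tr><td>4</td><td>domesticated</td><td>4</td><td>Cow</td><td>Bos</td><td>herbi</td><td>Artiodactyla</td><td>0.7</td><td>0.666667</td><td>20</td><td>0.423</td><td>600</td></tr>\n",
"</table>\n"
],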
"text/plain": [
""
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"DT[:, move_to_the_front(DT, ['conservation', 'sleep_total'])].head(5)"
]
},
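{
"cell_type": "markdown",
"metadata": {},
"source": [
"A misspelled column name is the most likely way for a reordering helper to fail. The sketch below is one possible variant that works on plain names and reports unknown columns up front. `names_with_front` is a hypothetical helper, and `all_names` is a hard-coded stand-in for `DT.names`:\n",
"\n",
"```python\n",
"# Column names copied from the msleep frame shown earlier (stand-in for DT.names).\n",
"all_names = (\n",
"    'name', 'genus', 'vore', 'order', 'conservation', 'sleep_total',\n",
"    'sleep_rem', 'sleep_cycle', 'awake', 'brainwt', 'bodywt',\n",
")\n",
"\n",
"\n",
"def names_with_front(names, front):\n",
"    # Reject names that are not present, so typos fail early.\n",
"    missing = [name for name in front if name not in names]\n",
"    if missing:\n",
"        raise ValueError(f'unknown columns: {missing}')\n",
"    # Keep the remaining columns in their original order.\n",
"    rest = [name for name in names if name not in front]\n",
"    return list(front) + rest\n",
"\n",
"\n",
"print(names_with_front(all_names, ['conservation', 'sleep_total']))\n",
"```\n",
"\n",
"The returned list contains every column exactly once, so it could be passed to the `j` section in the same way as the result of the function above."
]
},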
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## **Column Names**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Renaming Columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Columns with new names can be created within the `j` section by passing a dictionary:"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
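{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>sleep_total</th><th>animal</th><th>extinction_threat</th></tr>\n",
"<tr><th></th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>12.1</td><td>Cheetah</td><td>lc</td></tr>\n",
"<tr><td>1</td><td>17</td><td>Owl monkey</td><td>NA</td></tr>\n",
"<tr><td>2</td><td>14.4</td><td>Mountain beaver</td><td>nt</td></tr>\n",
"<tr><td>3</td><td>14.9</td><td>Greater short-tailed shrew</td><td>lc</td></tr>\n",
"<tr><td>4</td><td>4</td><td>Cow</td><td>domesticated</td></tr>\n",
"</table>\n"
],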
"text/plain": [
""
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new_names = {\"animal\": f.name, \"extinction_threat\": f.conservation}\n",
"DT[:, f.sleep_total.extend(new_names)].head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also rename the columns via a dictionary that maps the old column name to the new column name, and assign it to ``DT.names``:"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
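{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>animal</th><th>sleep_total</th><th>extinction_threat</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>Cheetah</td><td>12.1</td><td>lc</td></tr>\n",
"<tr><td>1</td><td>Owl monkey</td><td>17</td><td>NA</td></tr>\n",
"<tr><td>2</td><td>Mountain beaver</td><td>14.4</td><td>nt</td></tr>\n",
"<tr><td>3</td><td>Greater short-tailed shrew</td><td>14.9</td><td>lc</td></tr>\n",
"<tr><td>4</td><td>Cow</td><td>4</td><td>domesticated</td></tr>\n",
"</table>\n"
],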
"text/plain": [
""
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"DT_copy = DT.copy()\n",
"DT_copy.names = {\"name\": \"animal\", \"conservation\": \"extinction_threat\"}\n",
"DT_copy[:, ['animal', 'sleep_total', 'extinction_threat']].head(5)\n"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
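{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>animal</th><th>genus</th><th>vore</th><th>order</th><th>extinction_threat</th><th>sleep_total</th><th>sleep_rem</th><th>sleep_cycle</th><th>awake</th><th>brainwt</th><th>bodywt</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>Cheetah</td><td>Acinonyx</td><td>carni</td><td>Carnivora</td><td>lc</td><td>12.1</td><td>NA</td><td>NA</td><td>11.9</td><td>NA</td><td>50</td></tr>\n",
"<tr><td>1</td><td>Owl monkey</td><td>Aotus</td><td>omni</td><td>Primates</td><td>NA</td><td>17</td><td>1.8</td><td>NA</td><td>7</td><td>0.0155</td><td>0.48</td></tr>\n",
"<tr><td>2</td><td>Mountain beaver</td><td>Aplodontia</td><td>herbi</td><td>Rodentia</td><td>nt</td><td>14.4</td><td>2.4</td><td>NA</td><td>9.6</td><td>NA</td><td>1.35</td></tr>\n",
"<tr><td>3</td><td>Greater short-tailed shrew</td><td>Blarina</td><td>omni</td><td>Soricomorpha</td><td>lc</td><td>14.9</td><td>2.3</td><td>0.133333</td><td>9.1</td><td>0.00029</td><td>0.019</td></tr>\n",
"<tr><td>4</td><td>Cow</td><td>Bos</td><td>herbi</td><td>Artiodactyla</td><td>domesticated</td><td>4</td><td>0.7</td><td>0.666667</td><td>20</td><td>0.423</td><td>600</td></tr>\n",
"</table>\n"
],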
"text/plain": [
""
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"DT_copy.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Reformatting all Column Names"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can use Python's string methods to reformat column names.\n",
"\n",
"Let's convert all column names to uppercase:"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
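{
"data": {
"text/html": [
"<table>\n",
"<tr><th></th><th>NAME</th><th>GENUS</th><th>VORE</th><th>ORDER</th><th>CONSERVATION</th><th>SLEEP_TOTAL</th><th>SLEEP_REM</th><th>SLEEP_CYCLE</th><th>AWAKE</th><th>BRAINWT</th><th>BODYWT</th></tr>\n",
"<tr><th></th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th><th>▪▪▪▪▪▪▪▪</th></tr>\n",
"<tr><td>0</td><td>Cheetah</td><td>Acinonyx</td><td>carni</td><td>Carnivora</td><td>lc</td><td>12.1</td><td>NA</td><td>NA</td><td>11.9</td><td>NA</td><td>50</td></tr>\n",
"<tr><td>1</td><td>Owl monkey</td><td>Aotus</td><td>omni</td><td>Primates</td><td>NA</td><td>17</td><td>1.8</td><td>NA</td><td>7</td><td>0.0155</td><td>0.48</td></tr>\n",
"<tr><td>2</td><td>Mountain beaver</td><td>Aplodontia</td><td>herbi</td><td>Rodentia</td><td>nt</td><td>14.4</td><td>2.4</td><td>NA</td><td>9.6</td><td>NA</td><td>1.35</td></tr>\n",
"<tr><td>3</td><td>Greater short-tailed shrew</td><td>Blarina</td><td>omni</td><td>Soricomorpha</td><td>lc</td><td>14.9</td><td>2.3</td><td>0.133333</td><td>9.1</td><td>0.00029</td><td>0.019</td></tr>\n",
"<tr><td>4</td><td>Cow</td><td>Bos</td><td>herbi</td><td>Artiodactyla</td><td>domesticated</td><td>4</td><td>0.7</td><td>0.666667</td><td>20</td><td>0.423</td><td>600</td></tr>\n",
"</table>\n"
],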
"text/plain": [
""
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"DT_copy.names = [name.upper() for name in DT.names] # or list(map(str.upper, DT.names))\n",
"DT_copy.head(5)"
]
},
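{
"cell_type": "markdown",
"metadata": {},
"source": [
"Other string methods work the same way. As a sketch on a hard-coded copy of the original column names (a stand-in for `DT.names`; the variable names `pretty` and `short` are illustrative only), the following builds two alternative name lists that could be assigned to a frame's `names`:\n",
"\n",
"```python\n",
"# Column names copied from the msleep frame shown earlier (stand-in for DT.names).\n",
"all_names = (\n",
"    'name', 'genus', 'vore', 'order', 'conservation', 'sleep_total',\n",
"    'sleep_rem', 'sleep_cycle', 'awake', 'brainwt', 'bodywt',\n",
")\n",
"\n",
"# Replace underscores with spaces and capitalise each word.\n",
"pretty = [name.replace('_', ' ').title() for name in all_names]\n",
"\n",
"# Drop the shared 'sleep_' prefix from the names that carry it.\n",
"prefix = 'sleep_'\n",
"short = [name[len(prefix):] if name.startswith(prefix) else name for name in all_names]\n",
"\n",
"print(pretty)\n",
"print(short)\n",
"```"
]
},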
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Resources: \n",
"\n",
"- [datatable docs](https://datatable.readthedocs.io/en/latest/)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
}
},
"nbformat": 4,
"nbformat_minor": 4
}