{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Read Multiple CSV Files into one Frame in Python"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### [Link to Source data](https://people.highline.edu/mgirvin/excelisfun.htm)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## **Pandas**"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {
"iopub.execute_input": "2020-10-31T01:26:23.474681Z",
"iopub.status.busy": "2020-10-31T01:26:23.474464Z",
"iopub.status.idle": "2020-10-31T01:26:23.477682Z",
"shell.execute_reply": "2020-10-31T01:26:23.477047Z",
"shell.execute_reply.started": "2020-10-31T01:26:23.474658Z"
}
},
"source": [
"### Via read_csv"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" City | \n",
" Date | \n",
" Sales | \n",
" StoreID | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Tacoma | \n",
" 1/1/2014 | \n",
" $17.30 | \n",
" 1522 | \n",
"
\n",
" \n",
" 1 | \n",
" Tacoma | \n",
" 1/1/2014 | \n",
" $15.77 | \n",
" 1520 | \n",
"
\n",
" \n",
" 2 | \n",
" Tacoma | \n",
" 1/1/2014 | \n",
" $151.36 | \n",
" 1520 | \n",
"
\n",
" \n",
" 3 | \n",
" Tacoma | \n",
" 1/1/2014 | \n",
" $168.47 | \n",
" 1522 | \n",
"
\n",
" \n",
" 4 | \n",
" Tacoma | \n",
" 1/1/2014 | \n",
" $150.71 | \n",
" 1520 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 223593 | \n",
" Portland | \n",
" 2/3/2015 | \n",
" $41.03 | \n",
" 1004 | \n",
"
\n",
" \n",
" 223594 | \n",
" Portland | \n",
" 2/3/2015 | \n",
" $162.54 | \n",
" 1005 | \n",
"
\n",
" \n",
" 223595 | \n",
" Portland | \n",
" 2/3/2015 | \n",
" $38.38 | \n",
" 1002 | \n",
"
\n",
" \n",
" 223596 | \n",
" Portland | \n",
" 2/3/2015 | \n",
" $96.40 | \n",
" 1003 | \n",
"
\n",
" \n",
" 223597 | \n",
" Portland | \n",
" 2/3/2015 | \n",
" $8.38 | \n",
" 1005 | \n",
"
\n",
" \n",
"
\n",
"
880349 rows × 4 columns
\n",
"
"
],
"text/plain": [
" City Date Sales StoreID\n",
"0 Tacoma 1/1/2014 $17.30 1522\n",
"1 Tacoma 1/1/2014 $15.77 1520\n",
"2 Tacoma 1/1/2014 $151.36 1520\n",
"3 Tacoma 1/1/2014 $168.47 1522\n",
"4 Tacoma 1/1/2014 $150.71 1520\n",
"... ... ... ... ...\n",
"223593 Portland 2/3/2015 $41.03 1004\n",
"223594 Portland 2/3/2015 $162.54 1005\n",
"223595 Portland 2/3/2015 $38.38 1002\n",
"223596 Portland 2/3/2015 $96.40 1003\n",
"223597 Portland 2/3/2015 $8.38 1005\n",
"\n",
"[880349 rows x 4 columns]"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"from pathlib import Path\n",
"\n",
"# create a Path instance and filter for only csv files\n",
"files = Path(\"./Data_files/multiple_csvs/\").rglob(\"*.csv\")\n",
"\n",
"# read in all the csv files\n",
"all_csvs = [pd.read_csv(file) for file in files]\n",
"\n",
"# lump into one table\n",
"all_csvs = pd.concat(all_csvs)\n",
"\n",
"all_csvs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Via Pyjanitor's read_csvs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The above step can be simplified with [pyjanitor's](https://pyjanitor-devs.github.io/pyjanitor/) [read_csvs](https://pyjanitor-devs.github.io/pyjanitor/api/io/#janitor.io.read_csvs) function:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" City | \n",
" Date | \n",
" Sales | \n",
" StoreID | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Tacoma | \n",
" 1/1/2014 | \n",
" $17.30 | \n",
" 1522 | \n",
"
\n",
" \n",
" 1 | \n",
" Tacoma | \n",
" 1/1/2014 | \n",
" $15.77 | \n",
" 1520 | \n",
"
\n",
" \n",
" 2 | \n",
" Tacoma | \n",
" 1/1/2014 | \n",
" $151.36 | \n",
" 1520 | \n",
"
\n",
" \n",
" 3 | \n",
" Tacoma | \n",
" 1/1/2014 | \n",
" $168.47 | \n",
" 1522 | \n",
"
\n",
" \n",
" 4 | \n",
" Tacoma | \n",
" 1/1/2014 | \n",
" $150.71 | \n",
" 1520 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 880344 | \n",
" Portland | \n",
" 2/3/2015 | \n",
" $41.03 | \n",
" 1004 | \n",
"
\n",
" \n",
" 880345 | \n",
" Portland | \n",
" 2/3/2015 | \n",
" $162.54 | \n",
" 1005 | \n",
"
\n",
" \n",
" 880346 | \n",
" Portland | \n",
" 2/3/2015 | \n",
" $38.38 | \n",
" 1002 | \n",
"
\n",
" \n",
" 880347 | \n",
" Portland | \n",
" 2/3/2015 | \n",
" $96.40 | \n",
" 1003 | \n",
"
\n",
" \n",
" 880348 | \n",
" Portland | \n",
" 2/3/2015 | \n",
" $8.38 | \n",
" 1005 | \n",
"
\n",
" \n",
"
\n",
"
880349 rows × 4 columns
\n",
"
"
],
"text/plain": [
" City Date Sales StoreID\n",
"0 Tacoma 1/1/2014 $17.30 1522\n",
"1 Tacoma 1/1/2014 $15.77 1520\n",
"2 Tacoma 1/1/2014 $151.36 1520\n",
"3 Tacoma 1/1/2014 $168.47 1522\n",
"4 Tacoma 1/1/2014 $150.71 1520\n",
"... ... ... ... ...\n",
"880344 Portland 2/3/2015 $41.03 1004\n",
"880345 Portland 2/3/2015 $162.54 1005\n",
"880346 Portland 2/3/2015 $38.38 1002\n",
"880347 Portland 2/3/2015 $96.40 1003\n",
"880348 Portland 2/3/2015 $8.38 1005\n",
"\n",
"[880349 rows x 4 columns]"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# pip install pyjanitor\n",
"from janitor import read_csvs\n",
"\n",
"files = Path(\"./Data_files/multiple_csvs/\").rglob(\"*.csv\")\n",
"df = read_csvs(files)\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Via read_csv and the command line"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" City | \n",
" Date | \n",
" Sales | \n",
" StoreID | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Oakland | \n",
" 1/1/2014 | \n",
" $9.83 | \n",
" 982 | \n",
"
\n",
" \n",
" 1 | \n",
" Oakland | \n",
" 1/1/2014 | \n",
" $28.18 | \n",
" 983 | \n",
"
\n",
" \n",
" 2 | \n",
" Oakland | \n",
" 1/1/2014 | \n",
" $6.83 | \n",
" 982 | \n",
"
\n",
" \n",
" 3 | \n",
" Oakland | \n",
" 1/1/2014 | \n",
" $43.90 | \n",
" 982 | \n",
"
\n",
" \n",
" 4 | \n",
" Oakland | \n",
" 1/1/2014 | \n",
" $17.16 | \n",
" 980 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 880344 | \n",
" Tacoma | \n",
" 12/31/2015 | \n",
" $33.68 | \n",
" 1523 | \n",
"
\n",
" \n",
" 880345 | \n",
" Tacoma | \n",
" 12/31/2015 | \n",
" $215.98 | \n",
" 1521 | \n",
"
\n",
" \n",
" 880346 | \n",
" Tacoma | \n",
" 12/31/2015 | \n",
" $236.86 | \n",
" 1521 | \n",
"
\n",
" \n",
" 880347 | \n",
" Tacoma | \n",
" 12/31/2015 | \n",
" $33.02 | \n",
" 1522 | \n",
"
\n",
" \n",
" 880348 | \n",
" Tacoma | \n",
" 12/31/2015 | \n",
" $11.32 | \n",
" 1523 | \n",
"
\n",
" \n",
"
\n",
"
880349 rows × 4 columns
\n",
"
"
],
"text/plain": [
" City Date Sales StoreID\n",
"0 Oakland 1/1/2014 $9.83 982\n",
"1 Oakland 1/1/2014 $28.18 983\n",
"2 Oakland 1/1/2014 $6.83 982\n",
"3 Oakland 1/1/2014 $43.90 982\n",
"4 Oakland 1/1/2014 $17.16 980\n",
"... ... ... ... ...\n",
"880344 Tacoma 12/31/2015 $33.68 1523\n",
"880345 Tacoma 12/31/2015 $215.98 1521\n",
"880346 Tacoma 12/31/2015 $236.86 1521\n",
"880347 Tacoma 12/31/2015 $33.02 1522\n",
"880348 Tacoma 12/31/2015 $11.32 1523\n",
"\n",
"[880349 rows x 4 columns]"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import subprocess\n",
"from io import StringIO\n",
"\n",
"data = subprocess.run(\"awk '(NR==1) || (FNR>1)' ./Data_files/multiple_csvs/*.csv\", \n",
" shell=True, \n",
" capture_output=True,\n",
" text=True).stdout\n",
"\n",
"df = pd.read_csv(StringIO(data))\n",
"\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Via Pyjanitor's read_commandline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Reading via the command line can be simplified via [pyjanitor's](https://pyjanitor-devs.github.io/pyjanitor/) [read_commandline](https://pyjanitor-devs.github.io/pyjanitor/api/io/#janitor.io.read_commandline) function:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" City | \n",
" Date | \n",
" Sales | \n",
" StoreID | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Oakland | \n",
" 1/1/2014 | \n",
" $9.83 | \n",
" 982 | \n",
"
\n",
" \n",
" 1 | \n",
" Oakland | \n",
" 1/1/2014 | \n",
" $28.18 | \n",
" 983 | \n",
"
\n",
" \n",
" 2 | \n",
" Oakland | \n",
" 1/1/2014 | \n",
" $6.83 | \n",
" 982 | \n",
"
\n",
" \n",
" 3 | \n",
" Oakland | \n",
" 1/1/2014 | \n",
" $43.90 | \n",
" 982 | \n",
"
\n",
" \n",
" 4 | \n",
" Oakland | \n",
" 1/1/2014 | \n",
" $17.16 | \n",
" 980 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 880344 | \n",
" Tacoma | \n",
" 12/31/2015 | \n",
" $33.68 | \n",
" 1523 | \n",
"
\n",
" \n",
" 880345 | \n",
" Tacoma | \n",
" 12/31/2015 | \n",
" $215.98 | \n",
" 1521 | \n",
"
\n",
" \n",
" 880346 | \n",
" Tacoma | \n",
" 12/31/2015 | \n",
" $236.86 | \n",
" 1521 | \n",
"
\n",
" \n",
" 880347 | \n",
" Tacoma | \n",
" 12/31/2015 | \n",
" $33.02 | \n",
" 1522 | \n",
"
\n",
" \n",
" 880348 | \n",
" Tacoma | \n",
" 12/31/2015 | \n",
" $11.32 | \n",
" 1523 | \n",
"
\n",
" \n",
"
\n",
"
880349 rows × 4 columns
\n",
"
"
],
"text/plain": [
" City Date Sales StoreID\n",
"0 Oakland 1/1/2014 $9.83 982\n",
"1 Oakland 1/1/2014 $28.18 983\n",
"2 Oakland 1/1/2014 $6.83 982\n",
"3 Oakland 1/1/2014 $43.90 982\n",
"4 Oakland 1/1/2014 $17.16 980\n",
"... ... ... ... ...\n",
"880344 Tacoma 12/31/2015 $33.68 1523\n",
"880345 Tacoma 12/31/2015 $215.98 1521\n",
"880346 Tacoma 12/31/2015 $236.86 1521\n",
"880347 Tacoma 12/31/2015 $33.02 1522\n",
"880348 Tacoma 12/31/2015 $11.32 1523\n",
"\n",
"[880349 rows x 4 columns]"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# pip install pyjanitor\n",
"from janitor import read_commandline\n",
"\n",
"cmd = \"awk '(NR==1) || (FNR>1)' ./Data_files/multiple_csvs/*.csv\"\n",
"df = read_commandline(cmd)\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## **datatable**"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {
"iopub.execute_input": "2020-10-31T01:38:57.851388Z",
"iopub.status.busy": "2020-10-31T01:38:57.851145Z",
"iopub.status.idle": "2020-10-31T01:38:57.854014Z",
"shell.execute_reply": "2020-10-31T01:38:57.853407Z",
"shell.execute_reply.started": "2020-10-31T01:38:57.851357Z"
}
},
"source": [
"### Via iread"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" | City | Date | Sales | StoreID |
\n",
" | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ |
\n",
" \n",
" \n",
" 0 | Tacoma | 1/1/2014 | $17.30 | 1522 |
\n",
" 1 | Tacoma | 1/1/2014 | $15.77 | 1520 |
\n",
" 2 | Tacoma | 1/1/2014 | $151.36 | 1520 |
\n",
" 3 | Tacoma | 1/1/2014 | $168.47 | 1522 |
\n",
" 4 | Tacoma | 1/1/2014 | $150.71 | 1520 |
\n",
" 5 | Tacoma | 1/1/2014 | $22.86 | 1523 |
\n",
" 6 | Tacoma | 1/1/2014 | $18.66 | 1521 |
\n",
" 7 | Tacoma | 1/1/2014 | $39.76 | 1520 |
\n",
" 8 | Tacoma | 1/1/2014 | $132.75 | 1523 |
\n",
" 9 | Tacoma | 1/1/2014 | $39.58 | 1521 |
\n",
" 10 | Tacoma | 1/1/2014 | $159.14 | 1523 |
\n",
" 11 | Tacoma | 1/1/2014 | $33.29 | 1522 |
\n",
" 12 | Tacoma | 1/1/2014 | $38.40 | 1522 |
\n",
" 13 | Tacoma | 1/1/2014 | $33.46 | 1520 |
\n",
" 14 | Tacoma | 1/1/2014 | $23.85 | 1522 |
\n",
" ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
\n",
" 880,344 | Portland | 2/3/2015 | $41.03 | 1004 |
\n",
" 880,345 | Portland | 2/3/2015 | $162.54 | 1005 |
\n",
" 880,346 | Portland | 2/3/2015 | $38.38 | 1002 |
\n",
" 880,347 | Portland | 2/3/2015 | $96.40 | 1003 |
\n",
" 880,348 | Portland | 2/3/2015 | $8.38 | 1005 |
\n",
" \n",
"
\n",
" \n",
"
\n"
],
"text/plain": [
""
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from datatable import fread, iread, rbind\n",
"\n",
"files = Path(\"./Data_files/multiple_csvs/\").rglob(\"*.csv\")\n",
"\n",
"# iread returns an iterator of all files\n",
"all_csvs = iread(tuple(files))\n",
"\n",
"# combine into one table with rbind\n",
"all_csvs = rbind(all_csvs)\n",
"\n",
"all_csvs"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {
"iopub.execute_input": "2020-10-31T01:39:15.743149Z",
"iopub.status.busy": "2020-10-31T01:39:15.742716Z",
"iopub.status.idle": "2020-10-31T01:39:15.747343Z",
"shell.execute_reply": "2020-10-31T01:39:15.746321Z",
"shell.execute_reply.started": "2020-10-31T01:39:15.743100Z"
}
},
"source": [
"### Via fread"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" | City | Date | Sales | StoreID |
\n",
" | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ |
\n",
" \n",
" \n",
" 0 | Tacoma | 1/1/2014 | $17.30 | 1522 |
\n",
" 1 | Tacoma | 1/1/2014 | $15.77 | 1520 |
\n",
" 2 | Tacoma | 1/1/2014 | $151.36 | 1520 |
\n",
" 3 | Tacoma | 1/1/2014 | $168.47 | 1522 |
\n",
" 4 | Tacoma | 1/1/2014 | $150.71 | 1520 |
\n",
" 5 | Tacoma | 1/1/2014 | $22.86 | 1523 |
\n",
" 6 | Tacoma | 1/1/2014 | $18.66 | 1521 |
\n",
" 7 | Tacoma | 1/1/2014 | $39.76 | 1520 |
\n",
" 8 | Tacoma | 1/1/2014 | $132.75 | 1523 |
\n",
" 9 | Tacoma | 1/1/2014 | $39.58 | 1521 |
\n",
" 10 | Tacoma | 1/1/2014 | $159.14 | 1523 |
\n",
" 11 | Tacoma | 1/1/2014 | $33.29 | 1522 |
\n",
" 12 | Tacoma | 1/1/2014 | $38.40 | 1522 |
\n",
" 13 | Tacoma | 1/1/2014 | $33.46 | 1520 |
\n",
" 14 | Tacoma | 1/1/2014 | $23.85 | 1522 |
\n",
" ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
\n",
" 880,344 | Portland | 2/3/2015 | $41.03 | 1004 |
\n",
" 880,345 | Portland | 2/3/2015 | $162.54 | 1005 |
\n",
" 880,346 | Portland | 2/3/2015 | $38.38 | 1002 |
\n",
" 880,347 | Portland | 2/3/2015 | $96.40 | 1003 |
\n",
" 880,348 | Portland | 2/3/2015 | $8.38 | 1005 |
\n",
" \n",
"
\n",
" \n",
"
\n"
],
"text/plain": [
""
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"files = Path(\"./Data_files/multiple_csvs/\").rglob(\"*.csv\")\n",
"\n",
"all_csvs = [fread(filename) for filename in files]\n",
"\n",
"# combine into one frame\n",
"rbind(all_csvs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Via fread and the command line"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" | City | Date | Sales | StoreID |
\n",
" | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ |
\n",
" \n",
" \n",
" 0 | Oakland | 1/1/2014 | $9.83 | 982 |
\n",
" 1 | Oakland | 1/1/2014 | $28.18 | 983 |
\n",
" 2 | Oakland | 1/1/2014 | $6.83 | 982 |
\n",
" 3 | Oakland | 1/1/2014 | $43.90 | 982 |
\n",
" 4 | Oakland | 1/1/2014 | $17.16 | 980 |
\n",
" 5 | Oakland | 1/1/2014 | $14.29 | 982 |
\n",
" 6 | Oakland | 1/1/2014 | $6.48 | 982 |
\n",
" 7 | Oakland | 1/1/2014 | $232.13 | 982 |
\n",
" 8 | Oakland | 1/1/2014 | $209.28 | 981 |
\n",
" 9 | Oakland | 1/1/2014 | $12.11 | 981 |
\n",
" 10 | Oakland | 1/1/2014 | $107.96 | 982 |
\n",
" 11 | Oakland | 1/1/2014 | $47.64 | 980 |
\n",
" 12 | Oakland | 1/1/2014 | $15.44 | 981 |
\n",
" 13 | Oakland | 1/1/2014 | $52.72 | 982 |
\n",
" 14 | Oakland | 1/1/2014 | $3.25 | 981 |
\n",
" ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
\n",
" 880,344 | Tacoma | 12/31/2015 | $33.68 | 1523 |
\n",
" 880,345 | Tacoma | 12/31/2015 | $215.98 | 1521 |
\n",
" 880,346 | Tacoma | 12/31/2015 | $236.86 | 1521 |
\n",
" 880,347 | Tacoma | 12/31/2015 | $33.02 | 1522 |
\n",
" 880,348 | Tacoma | 12/31/2015 | $11.32 | 1523 |
\n",
" \n",
"
\n",
" \n",
"
\n"
],
"text/plain": [
""
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# less verbose than the Pandas option\n",
"cmd = \"awk '(NR==1) || (FNR>1)' ./Data_files/multiple_csvs/*.csv\"\n",
"all_csvs = fread(cmd = cmd)\n",
"all_csvs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## **Polars**"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (880_349, 4)City | Date | Sales | StoreID |
---|
str | str | str | i64 |
"Oakland" | "1/1/2014" | "$9.83 " | 982 |
"Oakland" | "1/1/2014" | "$28.18 " | 983 |
"Oakland" | "1/1/2014" | "$6.83 " | 982 |
"Oakland" | "1/1/2014" | "$43.90 " | 982 |
"Oakland" | "1/1/2014" | "$17.16 " | 980 |
… | … | … | … |
"Tacoma" | "12/31/2015" | "$33.68 " | 1523 |
"Tacoma" | "12/31/2015" | "$215.98 " | 1521 |
"Tacoma" | "12/31/2015" | "$236.86 " | 1521 |
"Tacoma" | "12/31/2015" | "$33.02 " | 1522 |
"Tacoma" | "12/31/2015" | "$11.32 " | 1523 |
"
],
"text/plain": [
"shape: (880_349, 4)\n",
"┌─────────┬────────────┬──────────┬─────────┐\n",
"│ City ┆ Date ┆ Sales ┆ StoreID │\n",
"│ --- ┆ --- ┆ --- ┆ --- │\n",
"│ str ┆ str ┆ str ┆ i64 │\n",
"╞═════════╪════════════╪══════════╪═════════╡\n",
"│ Oakland ┆ 1/1/2014 ┆ $9.83 ┆ 982 │\n",
"│ Oakland ┆ 1/1/2014 ┆ $28.18 ┆ 983 │\n",
"│ Oakland ┆ 1/1/2014 ┆ $6.83 ┆ 982 │\n",
"│ Oakland ┆ 1/1/2014 ┆ $43.90 ┆ 982 │\n",
"│ Oakland ┆ 1/1/2014 ┆ $17.16 ┆ 980 │\n",
"│ … ┆ … ┆ … ┆ … │\n",
"│ Tacoma ┆ 12/31/2015 ┆ $33.68 ┆ 1523 │\n",
"│ Tacoma ┆ 12/31/2015 ┆ $215.98 ┆ 1521 │\n",
"│ Tacoma ┆ 12/31/2015 ┆ $236.86 ┆ 1521 │\n",
"│ Tacoma ┆ 12/31/2015 ┆ $33.02 ┆ 1522 │\n",
"│ Tacoma ┆ 12/31/2015 ┆ $11.32 ┆ 1523 │\n",
"└─────────┴────────────┴──────────┴─────────┘"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import polars as pl\n",
"files = Path(\"./Data_files/multiple_csvs/*.csv\")\n",
"pl.read_csv(files)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {
"iopub.execute_input": "2020-10-31T08:59:52.887765Z",
"iopub.status.busy": "2020-10-31T08:59:52.887501Z",
"iopub.status.idle": "2020-10-31T08:59:52.893435Z",
"shell.execute_reply": "2020-10-31T08:59:52.892724Z",
"shell.execute_reply.started": "2020-10-31T08:59:52.887738Z"
}
},
"source": [
"## **Resources Used**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"datatable - [fread](https://datatable.readthedocs.io/en/latest/api/dt/fread.html)\n",
"\n",
"datatable - [iread](https://datatable.readthedocs.io/en/latest/api/dt/iread.html)\n",
"\n",
"pandas - [read_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)\n",
"\n",
"polars - [read_csv](https://docs.pola.rs/user-guide/io/multiple/#dealing-with-multiple-files)\n",
"\n",
"python - [subprocess](https://docs.python.org/3/library/subprocess.html)\n",
"\n",
"python - [pathlib](https://docs.python.org/3/library/pathlib.html)\n",
"\n",
"StackOverflow - [read csv files via the command line](https://stackoverflow.com/a/58131427/7175713)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Comments\n",
""
]
}
],
"metadata": {
"kernelspec": {
"display_name": "blogger",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
}
},
"nbformat": 4,
"nbformat_minor": 4
}