{"id":21,"date":"2023-09-13T09:56:18","date_gmt":"2023-09-13T09:56:18","guid":{"rendered":"http:\/\/harvard-open-data-project.local\/?page_id=21"},"modified":"2023-09-14T01:54:42","modified_gmt":"2023-09-14T01:54:42","slug":"data-wrangling-with-python-intermediate","status":"publish","type":"page","link":"http:\/\/harvard-open-data-project.local\/data-wrangling-with-python-intermediate\/","title":{"rendered":"Data Wrangling with Python &#8211; Intermediate"},"content":{"rendered":"\n<p>Python is known for its incredible versatility and simplicity in handling data, making it an excellent tool for data wrangling. This article will delve into the intermediate aspects of Python, such as file manipulation and reading CSV files. This guide assumes a basic knowledge of Python and Python syntax. If you need a refresher on Python basics, check out&nbsp;<a href=\"http:\/\/harvard-open-data-project.local\/python-data-wrangling\/\" target=\"_blank\" data-type=\"page\" data-id=\"19\" rel=\"noreferrer noopener\">HODP&#8217;s Python for beginners guid<\/a><a href=\"https:\/\/docs.hodp.org\/docs\/python-beginners\" target=\"_blank\" rel=\"noreferrer noopener\">e<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">File Input\/Output (I\/O)<\/h2>\n\n\n\n<p>File I\/O operations are crucial in Python, especially when dealing with large amounts of data stored outside of Python, such as text files or spreadsheets exported to CSV. Python makes it easy to read and write files in different modes, making data manipulation efficient and straightforward.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Opening Files<\/h3>\n\n\n\n<p>To open a file in Python, we use the built-in&nbsp;<code>open()<\/code>&nbsp;function.&nbsp;<code>open()<\/code>&nbsp;takes the file&#8217;s name (or path) and, optionally, the mode in which we want to open the file. If the mode is omitted, the file is opened for reading.<\/p>\n\n\n\n<p>Python provides different modes for opening a file. 
The common modes are:<\/p>\n\n\n\n<ul>\n<li><code>'r'<\/code>&nbsp;for reading.<\/li>\n\n\n\n<li><code>'w'<\/code>&nbsp;for writing; the file is created if it does not exist and truncated if it does.<\/li>\n\n\n\n<li><code>'a'<\/code>&nbsp;for appending to the end of the file.<\/li>\n\n\n\n<li><code>'x'<\/code>&nbsp;for exclusive creation; opening fails if the file already exists.<\/li>\n<\/ul>\n\n\n\n<p>In addition to these modes, you can specify text mode (<code>'t'<\/code>) or binary mode (<code>'b'<\/code>). By default, files are opened in text mode. After processing a file, we should always close it using the&nbsp;<code>close()<\/code>&nbsp;method to free up any resources associated with the file. Opening the file in a&nbsp;<code>with<\/code>&nbsp;statement closes it automatically, even if an error occurs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">File Operations<\/h3>\n\n\n\n<p>Python provides several methods to read and write files. The&nbsp;<code>read()<\/code>&nbsp;method reads the entire file by default, and the&nbsp;<code>write()<\/code>&nbsp;method writes a string to a file. Opening a file in write mode erases all existing data, so use append mode when you want to add data to an existing file.<\/p>\n\n\n\n<p>The&nbsp;<code>tell()<\/code>&nbsp;method returns the current position of the file pointer, and the&nbsp;<code>seek()<\/code>&nbsp;method changes the pointer position. This gives us control over where in the file we are reading or writing.<\/p>\n\n\n\n<p>Python also provides the&nbsp;<code>readlines()<\/code>&nbsp;method, which returns a list of all lines in the file. If you only need to loop through every line, you can iterate over the file object directly. This reads one line at a time instead of loading the whole file into memory, which makes large files more manageable.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Reading CSVs<\/h2>\n\n\n\n<p>Python provides built-in support for reading and writing CSV files through the&nbsp;<code>csv<\/code>&nbsp;module. 
This module provides&nbsp;<code>reader()<\/code>&nbsp;and&nbsp;<code>DictReader()<\/code>, which let us read CSV files conveniently.<\/p>\n\n\n\n<p>With&nbsp;<code>reader()<\/code>, each row in the CSV is returned as a list of strings, so we access data by index. This suits files with few columns or no header row.&nbsp;<code>DictReader()<\/code>&nbsp;works similarly, but it returns each row as a dictionary keyed by the column names from the header row, which is more readable when a file has many columns. Both read the file one row at a time.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-1\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript\"><span class=\"hljs-keyword\">import<\/span> csv\n<span class=\"hljs-keyword\">with<\/span> open(<span class=\"hljs-string\">\"students.csv\"<\/span>, newline=<span class=\"hljs-string\">\"\"<\/span>) <span class=\"hljs-keyword\">as<\/span> f:\n    reader = csv.DictReader(f, delimiter=<span class=\"hljs-string\">\",\"<\/span>)\n    <span class=\"hljs-keyword\">for<\/span> row <span class=\"hljs-keyword\">in<\/span> reader:\n        print(row)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-1\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>This code reads a CSV file and prints each row, which is represented as a dictionary where the keys correspond to the column names and the values to the data in the respective cells. The file is opened with&nbsp;<code>newline=\"\"<\/code>, as the&nbsp;<code>csv<\/code>&nbsp;module documentation recommends, so that line breaks inside quoted fields are handled correctly.<\/p>\n\n\n\n<p>In conclusion, Python&#8217;s intermediate capabilities, such as file handling and CSV parsing, make it a powerful tool for data wrangling. 
With these skills under your belt, you can work more efficiently and flexibly with your data, enhancing your data analysis and manipulation capabilities.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Python is known for its incredible versatility and simplicity in handling data, making it an excellent tool for data wrangling. This article will delve into the intermediate aspects of Python, such as file manipulation and reading CSV files. This guide assumes a basic knowledge of Python and Python syntax. If you need a refresher on &#8230; <a title=\"Data Wrangling with Python &#8211; Intermediate\" class=\"read-more\" href=\"http:\/\/harvard-open-data-project.local\/data-wrangling-with-python-intermediate\/\" aria-label=\"More on Data Wrangling with Python &#8211; Intermediate\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"_links":{"self":[{"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/pages\/21"}],"collection":[{"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/comments?post=21"}],"version-history":[{"count":2,"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/pages\/21\/revisions"}],"predecessor-version":[{"id":23,"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/pages\/21\/revisions\/23"}],"wp:attachment":[{"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/media?parent=21"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}