{"id":21,"date":"2023-09-13T09:56:18","date_gmt":"2023-09-13T09:56:18","guid":{"rendered":"http:\/\/harvard-open-data-project.local\/?page_id=21"},"modified":"2023-09-14T01:54:42","modified_gmt":"2023-09-14T01:54:42","slug":"data-wrangling-with-python-intermediate","status":"publish","type":"page","link":"http:\/\/harvard-open-data-project.local\/data-wrangling-with-python-intermediate\/","title":{"rendered":"Data Wrangling with Python – Intermediate"},"content":{"rendered":"\n
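Python is known for its versatility and simplicity in handling data, making it an excellent tool for data wrangling. This article covers two intermediate topics: file manipulation and reading CSV files. It assumes a basic knowledge of Python syntax. If you need a refresher on Python basics, check out HODP’s Python for beginners guide<\/a>.<\/p>\n\n\n\n File I\/O operations are crucial in Python, especially when dealing with large amounts of data stored outside of Python, such as Excel spreadsheets. Python makes it easy to read and write files in different modes, making data manipulation efficient and straightforward.<\/p>\n\n\n\n To open a file in Python, we use the built-in open() function, which takes two arguments: the file’s name (or path) and the mode in which we want to open the file. Python provides different modes for opening a file. The common modes are 'r' (read-only), 'w' (write-only), 'a' (append), and 'x' (exclusive creation).<\/p>\n\n\n\n In addition to these modes, you can specify text mode ('t') or binary mode ('b'). By default, files are opened in text mode. After processing a file, we should always close it using the close() method to free up any resources associated with the file. Python provides several methods to read and write files. The read() method reads an entire file, and the write() method writes to a file. However, opening a file in write mode erases its existing contents, so it’s often preferable to use append mode when you want to add data to an existing file. The tell() method returns the current position of the file pointer, and the seek() method changes the pointer position. This gives us control over where in the file we read or write. Python also provides the readlines() method, which returns a list of all lines in the file. To process every line, however, you can loop over the file object directly, which reads one line at a time and makes large files more manageable. Python provides built-in support for reading and writing CSV files through the csv module. This module has reader() and DictReader(), which enable us to read CSV files conveniently. The reader() function returns each row of the CSV as a list of strings, so we access values by index. DictReader() works similarly, but it returns each row as a dictionary keyed by column name, which is convenient when the file has a header row. This code reads a CSV file and prints each row, which is represented as a dictionary where the keys correspond to the column names and the values to the data in the respective cells.<\/p>\n\n\n\n In conclusion, Python’s intermediate capabilities, such as file handling and CSV parsing, make it a powerful tool for data wrangling. With these skills under your belt, you can work more efficiently and flexibly with your data, enhancing your data analysis and manipulation capabilities.<\/p>\n","protected":false},"excerpt":{"rendered":" Python is known for its incredible versatility and simplicity in handling data, making it an excellent tool for data wrangling. This article will delve into the intermediate aspects of Python, such as file manipulation and reading CSV files. This guide assumes a basic knowledge of Python and Python syntax. 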
If you need a refresher on … Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"_links":{"self":[{"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/pages\/21"}],"collection":[{"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/comments?post=21"}],"version-history":[{"count":2,"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/pages\/21\/revisions"}],"predecessor-version":[{"id":23,"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/pages\/21\/revisions\/23"}],"wp:attachment":[{"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/media?parent=21"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}File Input\/Output (I\/O)<\/h2>\n\n\n\n
Opening Files<\/h3>\n\n\n\n
open()<\/code> function.
open()<\/code> takes two arguments, the file’s name (or path) and the mode in which we want to open the file.<\/p>\n\n\n\n
\n
'r'<\/code> for read-only.<\/li>\n\n\n\n
'w'<\/code> for write-only.<\/li>\n\n\n\n
'a'<\/code> for append.<\/li>\n\n\n\n
'x'<\/code> for exclusive creation.<\/li>\n<\/ul>\n\n\n\n
't'<\/code>) or binary mode (
'b'<\/code>). By default, files are opened in text mode. After processing a file, we should always close it using the
close()<\/code> method to free up any resources associated with the file.<\/p>\n\n\n\n
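As a minimal sketch of opening and closing files in different modes: the file name notes.txt and the temporary directory are hypothetical, chosen only so the example can run anywhere. It shows an explicit close() in a try/finally block, and a with-block, which is a common alternative that closes the file automatically.

```python
import os
import tempfile

# Hypothetical file name, created in a temporary directory for this demo.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# 'x' creates the file and raises FileExistsError if it already exists.
f = open(path, "x")
try:
    f.write("first line\n")
finally:
    f.close()  # always release the file handle, even if write() fails

# A with-block closes the file automatically when the block ends.
with open(path, "r") as f:
    text = f.read()

# Adding 'b' opens the same file in binary mode: read() returns bytes, not str.
with open(path, "rb") as f:
    raw = f.read()

print(text)
print(type(raw))
```

The with-block form is usually the safer choice, since the file is closed even when an exception interrupts the processing.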
File Operations<\/h3>\n\n\n\n
read()<\/code> method reads an entire file, and the
write()<\/code> method writes to a file. However, opening a file in write mode ('w') erases its existing contents, so it’s often preferable to use append mode ('a') when you want to add data to an existing file.<\/p>\n\n\n\n
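The difference between write mode and append mode can be sketched as follows. The file name log.txt is a hypothetical placeholder, and the file lives in a temporary directory so that nothing real is overwritten.

```python
import os
import tempfile

# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "log.txt")

with open(path, "w") as f:
    f.write("old data\n")

# Reopening in 'w' mode empties the file before anything is written.
with open(path, "w") as f:
    f.write("new data\n")

# 'a' mode keeps the existing contents and writes at the end.
with open(path, "a") as f:
    f.write("appended\n")

with open(path) as f:
    contents = f.read()

print(contents)
```

After these steps the file should hold only the second write plus the appended line; the text from the first write is gone.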
tell()<\/code> method returns the current position of the file pointer, and the
seek()<\/code> method changes the pointer position. This gives us control over where in the file we read or write.<\/p>\n\n\n\n
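Here is a small sketch of moving the file pointer. It assumes a hypothetical ten-byte file and opens it in binary mode, where positions are plain byte offsets; in text mode the value returned by tell() is best treated as an opaque marker to pass back to seek().

```python
import os
import tempfile

# Hypothetical ten-byte file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "letters.bin")
with open(path, "wb") as f:
    f.write(b"abcdefghij")

with open(path, "rb") as f:
    start = f.tell()        # position right after opening
    first = f.read(3)       # reading advances the pointer by 3 bytes
    after_read = f.tell()
    f.seek(7)               # jump to byte offset 7
    tail = f.read()         # read from there to the end
    f.seek(0)               # rewind to the beginning
    again = f.read(3)

print(start, first, after_read, tail, again)
```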
readlines()<\/code> method, which returns a list of all lines in the file. To process every line, however, you can loop over the file object directly. This reads one line at a time instead of loading the whole file into memory, which makes large files more manageable.<\/p>\n\n\n\n
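The two approaches can be compared with a short sketch. The three-line file words.txt is a hypothetical stand-in; with a genuinely large file, the loop version would avoid holding every line in memory at once.

```python
import os
import tempfile

# Hypothetical three-line file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "words.txt")
with open(path, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

# readlines() loads every line into a list; each line keeps its trailing newline.
with open(path) as f:
    all_lines = f.readlines()

# Looping over the file object yields one line at a time.
line_count = 0
longest = ""
with open(path) as f:
    for line in f:
        line = line.rstrip("\n")
        line_count += 1
        if len(line) > len(longest):
            longest = line

print(all_lines)
print(line_count, longest)
```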
Reading CSVs<\/h2>\n\n\n\n
csv<\/code> module. This module has
reader()<\/code> and
DictReader()<\/code> functions that enable us to read CSV files conveniently.<\/p>\n\n\n\n
reader()<\/code> function returns each row of the CSV as a list of strings, so we access values by index. On the other hand,
DictReader()<\/code> is convenient when the file has a header row and we want to access values by column name. It works similarly to
reader()<\/code>, but it returns each row as a dictionary keyed by column name rather than as a list.<\/p>\n\n\n
import csv\n\n# newline=\"\" lets the csv module handle line endings itself\nwith open(\"students.csv\", newline=\"\") as f:\n    reader = csv.DictReader(f, delimiter=\",\")\n    for row in reader:\n        print(row)  # each row is a dict keyed by column name<\/code><\/pre>\n\n\n
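To see reader() and DictReader() side by side, here is a self-contained sketch. The column names (name, year) and the sample rows are assumptions made for illustration, not the contents of any real students.csv; the file is written to a temporary directory first so the example runs on its own.

```python
import csv
import os
import tempfile

# Hypothetical students.csv with assumed columns "name" and "year".
path = os.path.join(tempfile.mkdtemp(), "students.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "year"])
    writer.writerow(["Ada", "2025"])
    writer.writerow(["Grace", "2026"])

# reader(): every row, including the header, is a list of strings.
with open(path, newline="") as f:
    rows = list(csv.reader(f))

# DictReader(): the header row supplies the keys for each later row.
with open(path, newline="") as f:
    records = list(csv.DictReader(f))

print(rows[1][0])          # access by position
print(records[0]["name"])  # access by column name
```

Both readers return every value as a string, so numeric columns such as the assumed year column still need an explicit conversion like int().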