<h1>Data Wrangling with Numpy + Pandas</h1>
<p><time datetime="2023-09-14T01:59:02">September 14, 2023</time></p>

<p>Numpy and Pandas have become the backbone of data analytics in Python, providing efficient, intuitive interfaces for data manipulation and analysis.</p>

<h2>Getting Started with Numpy and Pandas</h2>

<p>To start using these libraries, we first need to install them, since they are not part of the Python standard library (for example, with <code>pip install numpy pandas</code>). Once installed, it is conventional to import Numpy and Pandas with the aliases <code>np</code> and <code>pd</code>, respectively, because their functions are used so frequently.</p>

<pre><code class="language-python">import numpy as np
import pandas as pd</code></pre>

<h2>Numpy: Numerical Python</h2>

<p>Numpy, short for ‘Numerical Python’, is a general-purpose package that gives Python efficient multidimensional array and matrix objects, along with fast operations on them. These high-performance array objects make it an essential library for scientific computing in Python.</p>

<p>Python lists are flexible but slow for numerical work. Numpy arrays can be up to around 50x faster than equivalent list-based code, although the actual speedup depends on the operation and the size of the data. The speed comes from how arrays are stored. A Python list holds references to separate objects scattered across memory. A Numpy array instead stores values of a single fixed type in one contiguous block of memory, so Numpy can run its loops in compiled code over that block. This makes arrays a popular choice for analyzing large datasets.</p>

<p>Here is a simple example of element-wise addition on Numpy arrays:</p>

<pre><code class="language-python">import numpy as np

x = np.array([1, 0, 0, 1])
y = np.array([-1, 5, 10, -1])

# Arrays of the same shape are added element by element
print(x + y)  # [ 0  5 10  0]</code></pre>

<h2>Pandas: Powerful Data Analysis</h2>

<p>Pandas is another powerful Python library, designed specifically for data manipulation and analysis. Its name is derived from ‘panel data’, an econometrics term for datasets that record observations of multiple individuals over several time periods.</p>

<p>Pandas provides two key data structures: Series and DataFrames. A Series is a one-dimensional array of values with an index of labels, much like a single column in a spreadsheet. A DataFrame is a two-dimensional table with labeled rows and columns, in which each column is a Series.</p>

<p>Here’s an example of creating a DataFrame:</p>

<pre><code class="language-python">import pandas as pd

# Dictionary keys become column names; index sets the row labels
df = pd.DataFrame(data={"col1": [1, 2, 3, 4], "col2": [5, 6, 7, 8]},
                  index=["row1", "row2", "row3", "row4"])
print(df)</code></pre>

<p>Pandas also provides a wide range of functionality for importing and exporting, cleaning, and wrangling data. For instance, you can read a CSV file directly into a DataFrame with the <code>read_csv()</code> function:</p>

<pre><code class="language-python">import pandas as pd

# Assumes earthquakes.csv is in the current working directory
earthquakes = pd.read_csv("earthquakes.csv")</code></pre>

<p>Similarly, you can export a DataFrame to a CSV file with its <code>to_csv()</code> method:</p>

<pre><code class="language-python"># By default the row index is also written, as the first column;
# pass index=False to write only the data columns
earthquakes.to_csv("new_earthquakes.csv")</code></pre>

<h2>Wrangling Data with Numpy and Pandas</h2>

<p>When working with large datasets, it is often impractical to print out the entire dataset. Instead, you can use the <code>head()</code> method to examine the first five rows of a DataFrame. Similarly, <code>tail()</code> shows the last five rows. Both accept an optional number of rows, as in <code>head(10)</code>.</p>

<p>For example:</p>

<pre><code class="language-python">print(earthquakes.head())</code></pre>

<p>Additionally, you can inspect the column names of a DataFrame through its <code>columns</code> attribute:</p>

<pre><code class="language-python">print(earthquakes.columns)</code></pre>

<p>With Numpy and Pandas, data can be not only stored efficiently but also manipulated and analyzed flexibly. Together, these libraries make Python an excellent language for data wrangling and analysis, simplifying complex computations with an intuitive syntax that is easy to follow.</p>

<p>Whether you are performing basic mathematical operations with Numpy or conducting sophisticated data cleaning with Pandas, you can achieve your goals quickly and efficiently. Start exploring these libraries and unlock the potential of data wrangling in Python.</p>
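<p>The speed claim in the Numpy section is easy to check on your own machine. The minimal sketch below times a sum of squares computed with a pure-Python loop over a list and with a vectorized Numpy expression. It also confirms that the array occupies one contiguous block of memory. The helper names <code>sum_squares_list</code> and <code>sum_squares_numpy</code> and the size <code>N</code> are hypothetical choices for illustration. The measured ratio will vary with hardware, data size and operation.</p>

```python
import timeit

import numpy as np

N = 1_000_000  # illustrative problem size (an assumption, not from the article)

values_list = list(range(N))
values_array = np.arange(N, dtype=np.int64)


def sum_squares_list(values):
    # Hypothetical helper: pure-Python loop, every element is a separate int object
    return sum(v * v for v in values)


def sum_squares_numpy(values):
    # Hypothetical helper: the multiply and the sum run in compiled code
    # over a single int64 buffer
    return int(np.sum(values * values))


# Both versions must agree before the timings mean anything
assert sum_squares_list(values_list) == sum_squares_numpy(values_array)

# The array's values live in one contiguous block of memory
print(values_array.flags["C_CONTIGUOUS"])  # True

t_list = timeit.timeit(lambda: sum_squares_list(values_list), number=5)
t_numpy = timeit.timeit(lambda: sum_squares_numpy(values_array), number=5)
print(f"list: {t_list:.3f}s  numpy: {t_numpy:.3f}s  ratio: {t_list / t_numpy:.1f}x")
```

<p>The explicit <code>dtype=np.int64</code> keeps the squares and their total within a fixed-width integer type. For this <code>N</code> the total (about 3.3 × 10<sup>17</sup>) fits comfortably.</p>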
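<p>To make the Series/DataFrame distinction from the Pandas section concrete, the sketch below reuses the article’s <code>col1</code>/<code>col2</code> DataFrame. Selecting a single column returns a Series that keeps the DataFrame’s row labels. The standalone Series <code>s</code> and its labels are hypothetical, added only for illustration.</p>

```python
import pandas as pd

df = pd.DataFrame(
    data={"col1": [1, 2, 3, 4], "col2": [5, 6, 7, 8]},
    index=["row1", "row2", "row3", "row4"],
)

col1 = df["col1"]            # selecting one column gives a Series
print(type(col1).__name__)   # Series
print(col1.index.tolist())   # ['row1', 'row2', 'row3', 'row4']
print(df.shape)              # (4, 2): rows x columns

# Hypothetical standalone Series: values plus an index of labels
s = pd.Series([10, 20, 30], index=["a", "b", "c"], name="example")
print(s["b"])                # label-based lookup: 20
```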
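<p>The Pandas section lists data cleaning among the library’s features but does not show it. As a minimal sketch, assuming missing values appear as <code>NaN</code>, the standard <code>isna()</code>, <code>dropna()</code> and <code>fillna()</code> methods cover the most common cases. The small DataFrame here is hypothetical. Filling with the column mean is just one possible strategy, not a general recommendation.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value (NaN) in each column
df = pd.DataFrame({"col1": [1.0, np.nan, 3.0, 4.0],
                   "col2": [5.0, 6.0, np.nan, 8.0]})

missing_per_column = df.isna().sum()  # count of NaN values in each column
print(missing_per_column)

dropped = df.dropna()                 # keep only rows with no missing values
filled = df.fillna(df.mean())         # replace each NaN with its column's mean

print(dropped)
print(filled)
```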
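<p>The article’s <code>earthquakes.csv</code> file is not included, so this sketch runs the CSV round trip on the small <code>col1</code>/<code>col2</code> DataFrame instead and writes to a temporary directory. It illustrates one detail of <code>to_csv()</code>: by default it writes the row index as an extra, unnamed first column. <code>read_csv()</code> then labels that column <code>Unnamed: 0</code>. Passing <code>index=False</code> avoids this. The file names are hypothetical.</p>

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": [5, 6, 7, 8]})

with tempfile.TemporaryDirectory() as tmp:
    with_index = os.path.join(tmp, "with_index.csv")        # hypothetical name
    without_index = os.path.join(tmp, "without_index.csv")  # hypothetical name

    df.to_csv(with_index)                  # default: row index written as first column
    df.to_csv(without_index, index=False)  # only the data columns are written

    cols_with_index = pd.read_csv(with_index).columns.tolist()
    round_trip = pd.read_csv(without_index)

print(cols_with_index)              # ['Unnamed: 0', 'col1', 'col2']
print(round_trip.columns.tolist())  # ['col1', 'col2']
print(round_trip.equals(df))        # True
```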
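<p>The inspection methods from the wrangling section can be tried without the earthquakes data. In the sketch below, a hypothetical 10-row DataFrame stands in for it. It shows that <code>head()</code> returns five rows unless <code>n</code> is given, that <code>tail(3)</code> returns the last three, and that <code>columns</code> is an attribute (no parentheses) holding an <code>Index</code> of labels.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical 10-row frame standing in for the article's earthquakes data
df = pd.DataFrame({"id": np.arange(10), "value": np.arange(10) * 1.5})

first = df.head()   # five rows by default
last = df.tail(3)   # pass n to change the number of rows

print(first)
print(last)
print(df.columns)        # an Index object, accessed without parentheses
print(list(df.columns))  # ['id', 'value']
```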