{"id":25,"date":"2023-09-14T01:59:02","date_gmt":"2023-09-14T01:59:02","guid":{"rendered":"https:\/\/harvard-open-data-project.local\/?page_id=25"},"modified":"2023-09-14T01:59:03","modified_gmt":"2023-09-14T01:59:03","slug":"data-wrangling-with-numpy-pandas","status":"publish","type":"page","link":"http:\/\/harvard-open-data-project.local\/data-wrangling-with-numpy-pandas\/","title":{"rendered":"Data Wrangling with Numpy + Pandas"},"content":{"rendered":"\n<p>Numpy and Pandas have become the backbone of Python data analytics and provide efficient, intuitive interfaces for data manipulation and analysis.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Getting Started with Numpy and Pandas<\/h2>\n\n\n\n<p>To start using these libraries, we first need to install them (for example, with&nbsp;<code>pip install numpy pandas<\/code>), as they are not part of the Python standard library. Once installed, it is conventional to import Numpy and Pandas with the aliases&nbsp;<code>np<\/code>&nbsp;and&nbsp;<code>pd<\/code>&nbsp;respectively, as we will be using their functions frequently.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-1\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">import<\/span> numpy <span class=\"hljs-keyword\">as<\/span> np\n<span class=\"hljs-keyword\">import<\/span> pandas <span class=\"hljs-keyword\">as<\/span> pd<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-1\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\">Numpy: Numerical Python<\/h2>\n\n\n\n<p><a href=\"https:\/\/numpy.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">Numpy<\/a>, short for &#8216;Numerical Python&#8217;, is a general-purpose 
package that gives Python efficient multidimensional array and matrix objects, along with operations on them. These high-performance arrays make it an essential library for scientific computing in Python.<\/p>\n\n\n\n<p>Python lists are flexible but slow for numerical work. Numpy arrays, by contrast, aim to be up to 50x faster than traditional Python lists, although the actual speedup depends on the operation and the size of the data. Much of this speed comes from the fact that, unlike a Python list, a Numpy array stores elements of a single type in one contiguous block of memory, so operations can run in compiled code over the whole array at once. This makes Numpy arrays a popular choice for analyzing large datasets.<\/p>\n\n\n\n<p>Here is a simple example of element-wise addition on Numpy arrays:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-2\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">import<\/span> numpy <span class=\"hljs-keyword\">as<\/span> np\n\nx = np.array(&#91;<span class=\"hljs-number\">1<\/span>, <span class=\"hljs-number\">0<\/span>, <span class=\"hljs-number\">0<\/span>, <span class=\"hljs-number\">1<\/span>])\ny = np.array(&#91;<span class=\"hljs-number\">-1<\/span>, <span class=\"hljs-number\">5<\/span>, <span class=\"hljs-number\">10<\/span>, <span class=\"hljs-number\">-1<\/span>])\n<span class=\"hljs-comment\"># Arrays of the same shape are added element by element<\/span>\n<span class=\"hljs-built_in\">print<\/span>(x + y)\n<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-2\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\">Pandas: Powerful Data Analysis<\/h2>\n\n\n\n<p><a href=\"https:\/\/pandas.pydata.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">Pandas<\/a>&nbsp;is another powerful Python library specifically designed for data 
manipulation and analysis. The name &#8216;Pandas&#8217; is derived from &#8216;panel data&#8217;, a term for datasets that track the same individuals over multiple time periods.<\/p>\n\n\n\n<p>Pandas provides two key data structures: the Series and the DataFrame. A Series is a one-dimensional labeled array of values, like a single column in a spreadsheet. A DataFrame is a two-dimensional table with labeled rows and columns, in which each column is a Series.<\/p>\n\n\n\n<p>Here&#8217;s an example of creating a DataFrame:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-3\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">import<\/span> pandas <span class=\"hljs-keyword\">as<\/span> pd\n\ndf = pd.DataFrame(data={<span class=\"hljs-string\">'col1'<\/span>: &#91;<span class=\"hljs-number\">1<\/span>, <span class=\"hljs-number\">2<\/span>, <span class=\"hljs-number\">3<\/span>, <span class=\"hljs-number\">4<\/span>], <span class=\"hljs-string\">'col2'<\/span>: &#91;<span class=\"hljs-number\">5<\/span>, <span class=\"hljs-number\">6<\/span>, <span class=\"hljs-number\">7<\/span>, <span class=\"hljs-number\">8<\/span>]},\n                  index=&#91;<span class=\"hljs-string\">\"row1\"<\/span>, <span class=\"hljs-string\">\"row2\"<\/span>, <span class=\"hljs-string\">\"row3\"<\/span>, <span class=\"hljs-string\">\"row4\"<\/span>])\n<span class=\"hljs-built_in\">print<\/span>(df)\n<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-3\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Pandas also provides a wide range of functionality, including data import and export, data cleaning, and data wrangling. 
For instance, you can read a CSV file directly into a DataFrame with the&nbsp;<code>read_csv()<\/code>&nbsp;function:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-4\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">import<\/span> pandas <span class=\"hljs-keyword\">as<\/span> pd\n\nearthquakes = pd.read_csv(<span class=\"hljs-string\">\"earthquakes.csv\"<\/span>)\n<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-4\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Similarly, you can export data from a DataFrame to a CSV file with its&nbsp;<code>to_csv()<\/code>&nbsp;method:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-5\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\">earthquakes.to_csv(<span class=\"hljs-string\">\"new_earthquakes.csv\"<\/span>)\n<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-5\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\">Wrangling Data with Numpy and Pandas<\/h2>\n\n\n\n<p>When working with large datasets, it is often impractical to print out the entire dataset. 
Instead, you can use the&nbsp;<code>head()<\/code>&nbsp;method to examine the first rows of the DataFrame (five by default). Similarly, the&nbsp;<code>tail()<\/code>&nbsp;method shows the last rows, and both accept a row count, as in&nbsp;<code>head(10)<\/code>.<\/p>\n\n\n\n<p>For example:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-6\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-built_in\">print<\/span>(earthquakes.head())\n<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-6\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Additionally, you can inspect the column names of a DataFrame through its&nbsp;<code>columns<\/code>&nbsp;attribute:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-7\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-built_in\">print<\/span>(earthquakes.columns)\n<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-7\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>With Numpy and Pandas, data can be handled efficiently and also manipulated and analyzed flexibly. 
These powerful libraries make Python an excellent language for data wrangling and analysis, simplifying complex computations and offering an intuitive syntax that is easy to follow.<\/p>\n\n\n\n<p>Whether you are performing basic mathematical operations with Numpy or conducting sophisticated data cleaning with Pandas, you can achieve your goals quickly and efficiently. Start exploring these libraries and unlock the potential of data wrangling in Python.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Numpy and Pandas have become the backbone of Python data analytics and provide efficient, intuitive interfaces for data manipulation and analysis. Getting Started with Numpy and Pandas To start using these libraries, we first need to install them as they are not included in the standard Python library. Once installed, it is conventional to import &#8230; <a title=\"Data Wrangling with Numpy + Pandas\" class=\"read-more\" href=\"http:\/\/harvard-open-data-project.local\/data-wrangling-with-numpy-pandas\/\" aria-label=\"More on Data Wrangling with Numpy + Pandas\">Read 
more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"_links":{"self":[{"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/pages\/25"}],"collection":[{"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/comments?post=25"}],"version-history":[{"count":1,"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/pages\/25\/revisions"}],"predecessor-version":[{"id":26,"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/pages\/25\/revisions\/26"}],"wp:attachment":[{"href":"http:\/\/harvard-open-data-project.local\/wp-json\/wp\/v2\/media?parent=25"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}