{
"cells": [
{
"cell_type": "markdown",
"source": [
"# Working with Data\n",
"\n",
"This notebook will work through the process of importing data from \n",
"a `csv` spreadsheet file, graphing the data, fitting a function to the\n",
"data, and reporting an experimental measurement (with uncertainty)\n",
"of the acceleration due to gravity.\n",
"\n",
"\n",
"- [A. Import and view data](#A.-Import-and-view-data)\n",
"- [B. Selecting the Relevant Data Slice](#B.-Selecting-the-Relevant-Data-Slice)\n",
"- [C. Fitting a function](#C.-Fitting-a-function)\n",
"- [D. Reporting fit data with uncertainty](#D.-Reporting-fit-data-with-uncertainty)\n",
"\n",
"\n",
"The `pandas` package will be used to \n",
"read and write data in `csv` spreadsheet files.\n",
"Pandas is a powerful data analysis library --\n",
"you can learn more about it [here][pandas_docs].\n",
"\n",
"[pandas_docs]: https://pandas.pydata.org/pandas-docs/stable/user_guide/index.html#user-guide\n",
"\n",
"To begin, we import `pandas`, as well as `numpy` for numerical libraries and\n",
"`matplotlib` for making graphs.\n"
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import pandas as pd\n",
"from numpy import *\n",
"import matplotlib.pyplot as plt"
],
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## A. Import and view data\n",
"\n",
"We will need a `csv` data file before we can proceed. \n",
"For this example we will use the file `data_freefall.csv`, which must be saved\n",
"in the same folder as this notebook. You can download the spreadsheet\n",
"from the [Computational Physics](https://madisoncollegephysics.net/comp/)\n",
"page, where this notebook is found.\n",
"\n",
"We imported `pandas` above as `pd`, so you can use its code with `pd.`.\n",
"The command on the next line will read the `csv` data file. \n",
"The data will be stored in the variable called `mydata`, but you can name it\n",
"whatever you like.\n"
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"mydata = pd.read_csv('data_freefall.csv')"
],
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Show your data in a table:\n"
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"mydata"
],
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"This happens to be data collected in a Physics lab experiment studying the\n",
"free-fall motion of a ball. We will use this data to experimentally determine\n",
"the acceleration due to gravity, $g$.\n",
"\n",
"We want to get the data values out of the table so we can work with the\n",
"numbers. We need to know the column names, called the \"keys\".\n"
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"mydata.keys()"
],
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Using these keys, we put the columns of data into variables \n",
"`t`, `y`, `v` and `a` (for time, position, velocity and acceleration).\n"
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"t = mydata['Time (s)'].values\n",
"y = mydata['Position (m)'].values\n",
"v = mydata['Velocity (m/s)'].values\n",
"a = mydata['Acceleration (m/s\u00b2)'].values"
],
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Let's make a quick plot of the position data. You can do this with the\n",
"libraries we imported above as `plt`.\n"
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"plt.plot(t,y)"
],
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Think about the shape of this plot to be sure it makes sense. \n",
"Note that the ball is in free-fall for only the middle portion \n",
"of this data set, the smooth concave-down parabola from about\n",
"$t=1$ s to $t=2$ s.\n",
"\n",
"A good plot should have a title and axis labels. \n",
"Also, we'll save this figure so we can share it or include it in a report, as needed.\n",
"The `.png` extension is a good format for this type of image.\n"
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"plt.plot(t,y)\n",
"plt.title('free-fall motion of a ball')\n",
"plt.xlabel('time (s)')\n",
"plt.ylabel('position (m)')\n",
"plt.savefig('free-fall_position_graph.png')"
],
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## B. Selecting the Relevant Data Slice\n",
"\n",
"Before we continue, we need to remove the portions of data that we don't want.\n",
"We will find the indicies of the data set that contain \n",
"just the free-fall (parabolic) data.\n",
"This is called getting a \"slice\" of the data.\n",
"\n",
"I will first illustrate this with a simple example array.\n",
"Array `A` has the following data:\n"
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"A = array((42,44,47,50,54,62,77))\n",
"print(A)"
],
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now define a new array called `B` that has just a slice of the `A` data from\n",
"index `1` to index `4`:\n"
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"B = A[1:4]\n",
"print(B)"
],
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Plotting a data slice\n",
"\n",
"We want to find an initial and final index (call these `i,f`) \n",
"for our data set that starts and ends just around the \n",
"free-fall (parabolic) portion of the data.\n",
"\n",
"We will need to play around with the first line in the following code block:\n",
"\n",
"> `i,f = 18,40`\n",
"\n",
"Change the initial and final indicies, then plot the data. Change again until\n",
"you find the perfect data slice.\n",
"\n",
"The code after the data slicing plots $y$ versus $t$ as well as $\\Delta y$\n",
"versus $t$. A look at both will help get the best slice possible. (See the\n",
"[matplotlib documentation] to learn about the poltting functions.)\n",
"\n",
"[matplotlib documentation]: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.plot.html#matplotlib.pyplot.plot\n"
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"i,f = 18,40 # you'll need to change these to get the right slice\n",
"tt = t[i:f] # tt is a slice of t\n",
"yy = y[i:f] # yy is a slice of y\n",
"\n",
"# make plots:\n",
"plt.subplot(121) # 121 means \"on a 1x2 grid, plot number 1\"\n",
"plt.plot(tt,yy,'o',label='')\n",
"plt.xlabel('time (s)') # label time axis \n",
"plt.ylabel('$y$ position (m)') # label position axis \n",
"plt.title('height vs time')\n",
"plt.subplot(122) # 122 means \"on a 1x2 grid, plot number 2\"\n",
"Dy = yy[1:]-yy[:-1] # define Dy to be delta-y values\n",
"plt.plot( Dy, 'o',label='')\n",
"plt.xlabel('index') # label time axis \n",
"plt.title('$\\Delta$ y')"
],
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"After some experimentation, it appears that the best values of \n",
"`i,f` are `16,36`.\n"
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"i,f = 16,36 \n",
"tt = t[i:f] # tt is a slice of t\n",
"yy = y[i:f] # yy is a slice of y"
],
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## C. Fitting a function\n",
"\n",
"The best way to find the value of the free-fall acceleration is to *fit a\n",
"function* to our data. This is a very useful experimental technique.\n",
"\n",
"Suppose we tried to find the second-degree polynomial \n",
"\n",
"$$\n",
"p(t) = A t^2 + B t + C \\approx y(t)\n",
"$$ \n",
"\n",
"that best matches the parabola in our data set. \n",
"Numerical Python (`numpy`) can do this with the\n",
"function `polyfit`. You can learn the details in the [polyfit documentation].\n",
"\n",
"[polyfit documentation]: https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html\n",
"\n",
"We expect our $y$-versus-$t$ data to follow a polynomial because (in free-fall)\n",
"the equation of motion is\n",
"\n",
"$$\n",
"y(t) = y_0 + v_0 t - \\tfrac{1}{2} g t^2\n",
"$$\n",
"\n",
"where $g$ is the acceleration due to gravity.\n",
"\n",
"We give our data to the `polyfit` function and tell it to fit a polynomial of\n",
"degree 2 -- a parabola. \n",
"The function returns the coefficients $A,B,C$ that best fit our data.\n"
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"A,B,C = polyfit(tt,yy,2) # fit a degree-2 polynomial to the t,y data\n",
"print('best fit parabola has coefficients')\n",
"print(' A=',A)\n",
"print(' B=',B)\n",
"print(' C=',C)"
],
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Let's see how well this polynomial fits our data by plotting them together. \n",
"\n",
"To plot the parabola, we'll use the `linspace` function to make an array of\n",
"many time values, from `tt[0]` to the last time, `tt[-1]`. \n",
"Then plug these times into the polynomial with the coefficients `A,B,C` we found above.\n"
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"t_axis = linspace(tt[0],tt[-1],100) # t_axis is an array of 100 values from initial to final\n",
"pp = A*t_axis**2 + B*t_axis + C # evaluate the polynomial at all t_axis values\n",
"plt.plot(t,y,'o',label='raw data') # plot the data with dots ('o')\n",
"plt.plot(t_axis,pp,label='best fit parabola') # plot the parabola at the t_axis times\n",
"plt.xlabel('time (s)')\n",
"plt.ylabel('position (m)')\n",
"plt.legend(loc='upper right') # show graph legend"
],
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"The fit is the line passing accurately through our data.\n",
"\n",
"The process used by the `polyfit` function is called a \"least squares fit\". \n",
"You can learn more about it on [Wikipedia], \n",
"or in a [proper treatment of error analysis].\n",
"\n",
"[Wikipedia]: https://en.wikipedia.org/wiki/Least_squares\n",
"[proper treatment of error analysis]: https://www.amazon.com/Introduction-Error-Analysis-Uncertainties-Measurements/dp/093570275X\n",
"\n",
"## D. Reporting fit data with uncertainty\n",
"\n",
"We now have coefficient $A$ from the fit, but \n",
"what is the uncertainty in the fit coefficients? \n",
"We want to know $A \\pm \\delta A$, where $\\delta A$ is the\n",
"absolute uncertainty, so we can report our best measure of\n",
"$g \\pm \\delta g$. \n",
"\n",
"When the `polyfit` function is called with an additional parameter:\n",
"\n",
" polyfit(tt,yy,2,cov=True)\n",
"\n",
"it returns the `A,B,C` coefficients as before and also a \"covariance matrix\"\n",
"which gives the variance in each of the fit coefficients. The variance is the\n",
"square of the standard deviation, and the standard deviation is the uncertainty\n",
"in the fit coefficient. The variance of each coefficient appears on the\n",
"diagonal of the covariance matrix. \n",
"\n",
"This may all sound foreign if you haven't had any linear algebra or statistics.\n",
"Don't worry about that now -- here's the bottom line.\n",
"We find the uncertainty in each fit parameter (`dA,dB,dC`) as follows:\n"
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"(A,B,C),covariance = polyfit(tt,yy,2,cov=True) # fit with covariance matrix\n",
"dA,dB,dC = sqrt(diag(covariance))\n",
"print('fit coefficients are')\n",
"print(' A = ',A,'\u00b1',dA,'m/s^2')\n",
"print(' B = ',B,'\u00b1',dB,'m/s')\n",
"print(' C = ',C,'\u00b1',dC,'m')"
],
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We can now report out experiment's measure for the value of the\n",
"acceleration due to free-fall, $g$, along with its uncertainty.\n",
"Since the coefficient `A` in the polynomial corresponds to \n",
"$- \\tfrac{1}{2} g$ in the equation\n",
"of motion (see part C above), we have $g = -2A$.\n",
"\n",
"So we finally can report our experimental finding:\n"
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"g = -2*A\n",
"dg = 2*dA\n",
"print(f'measured value of g = {g:0.3f} \u00b1 {dg:0.3f} m/s\u00b2')"
],
"metadata": {},
"execution_count": null,
"outputs": []
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}