# Working with Data¶

This notebook will work through the process of importing data from
a `csv`

spreadsheet file, graphing the data, fitting a function to the
data, and reporting an experimental measurement (with uncertainty)
of the acceleration due to gravity.

- A. Import and view data
- B. Selecting the Relevant Data Slice
- C. Fitting a function
- D. Reporting fit data with uncertainty

The `pandas`

package will be used to
read and write data in `csv`

spreadsheet files.
Pandas is a powerful data analysis library --
you can learn more about it here.

To begin, we import `pandas`

, as well as `numpy`

for numerical libraries and
`matplotlib`

for making graphs.

```
import pandas as pd
from numpy import *
import matplotlib.pyplot as plt
```

## A. Import and view data¶

We will need a `csv`

data file before we can proceed.
For this example we will use the file `data_freefall.csv`

, which must be saved
in the same folder as this notebook. You can download the spreadsheet
from the Computational Physics
page, where this notebook is found.

We imported `pandas`

above as `pd`

, so you can use its code with `pd.<function name>`

.
The command on the next line will read the `csv`

data file.
The data will be stored in the variable called `mydata`

, but you can name it
whatever you like.

```
mydata = pd.read_csv('data_freefall.csv')
```

Show your data in a table:

```
mydata
```

Time (s) | Position (m) | Velocity (m/s) | Acceleration (m/s²) | |
---|---|---|---|---|

0 | 0.05 | 1.478452 | 0.014330 | 0.126563 |

1 | 0.10 | 1.478452 | 0.028660 | -0.360565 |

2 | 0.15 | 1.480098 | 0.033751 | -1.584863 |

3 | 0.20 | 1.486410 | -0.058996 | -4.102331 |

4 | 0.25 | 1.481745 | -0.339341 | -7.380276 |

... | ... | ... | ... | ... |

88 | 4.45 | 0.991392 | -0.000152 | -0.002337 |

89 | 4.50 | 0.991666 | -0.000457 | -0.007605 |

90 | 4.55 | 0.991392 | -0.001555 | -0.001118 |

91 | 4.60 | 0.991392 | -0.000610 | 0.006462 |

92 | 4.65 | 0.991392 | -0.000305 | 0.007385 |

93 rows × 4 columns

This happens to be data collected in a Physics lab experiment studying the free-fall motion of a ball. We will use this data to experimentally determine the acceleration due to gravity, $g$.

We want to get the data values out of the table so we can work with the numbers. We need to know the column names, called the "keys".

```
mydata.keys()
```

Index(['Time (s)', 'Position (m)', 'Velocity (m/s)', 'Acceleration (m/s²)'], dtype='object')

Using these keys, we put the columns of data into variables
`t`

, `y`

, `v`

and `a`

(for time, position, velocity and acceleration).

```
t = mydata['Time (s)'].values
y = mydata['Position (m)'].values
v = mydata['Velocity (m/s)'].values
a = mydata['Acceleration (m/s²)'].values
```

Let's make a quick plot of the position data. You can do this with the
libraries we imported above as `plt`

.

```
plt.plot(t,y)
```

[<matplotlib.lines.Line2D at 0x7fdb37a5ba60>]

Think about the shape of this plot to be sure it makes sense. Note that the ball is in free-fall for only the middle portion of this data set, the smooth concave-down parabola from about $t=1$ s to $t=2$ s.

A good plot should have a title and axis labels.
Also, we'll save this figure so we can share it or include it in a report, as needed.
The `.png`

extension is a good format for this type of image.

```
plt.plot(t,y)
plt.title('free-fall motion of a ball')
plt.xlabel('time (s)')
plt.ylabel('position (m)')
plt.savefig('free-fall_position_graph.png')
```

## B. Selecting the Relevant Data Slice¶

Before we continue, we need to remove the portions of data that we don't want. We will find the indicies of the data set that contain just the free-fall (parabolic) data. This is called getting a "slice" of the data.

I will first illustrate this with a simple example array.
Array `A`

has the following data:

```
A = array((42,44,47,50,54,62,77))
print(A)
```

[42 44 47 50 54 62 77]

Now define a new array called `B`

that has just a slice of the `A`

data from
index `1`

to index `4`

:

```
B = A[1:4]
print(B)
```

[44 47 50]

### Plotting a data slice¶

We want to find an initial and final index (call these `i,f`

)
for our data set that starts and ends just around the
free-fall (parabolic) portion of the data.

We will need to play around with the first line in the following code block:

`i,f = 18,40`

Change the initial and final indicies, then plot the data. Change again until you find the perfect data slice.

The code after the data slicing plots $y$ versus $t$ as well as $\Delta y$ versus $t$. A look at both will help get the best slice possible. (See the matplotlib documentation to learn about the poltting functions.)

```
i,f = 18,40 # you'll need to change these to get the right slice
tt = t[i:f] # tt is a slice of t
yy = y[i:f] # yy is a slice of y
# make plots:
plt.subplot(121) # 121 means "on a 1x2 grid, plot number 1"
plt.plot(tt,yy,'o',label='')
plt.xlabel('time (s)') # label time axis
plt.ylabel('$y$ position (m)') # label position axis
plt.title('height vs time')
plt.subplot(122) # 122 means "on a 1x2 grid, plot number 2"
Dy = yy[1:]-yy[:-1] # define Dy to be delta-y values
plt.plot( Dy, 'o',label='')
plt.xlabel('index') # label time axis
plt.title('$\Delta$ y')
```

Text(0.5, 1.0, '$\\Delta$ y')

After some experimentation, it appears that the best values of
`i,f`

are `16,36`

.

```
i,f = 16,36
tt = t[i:f] # tt is a slice of t
yy = y[i:f] # yy is a slice of y
```

## C. Fitting a function¶

The best way to find the value of the free-fall acceleration is to *fit a
function* to our data. This is a very useful experimental technique.

Suppose we tried to find the second-degree polynomial

$$ p(t) = A t^2 + B t + C \approx y(t) $$

that best matches the parabola in our data set.
Numerical Python (`numpy`

) can do this with the
function `polyfit`

. You can learn the details in the polyfit documentation.

We expect our $y$-versus-$t$ data to follow a polynomial because (in free-fall) the equation of motion is

$$ y(t) = y_0 + v_0 t - \tfrac{1}{2} g t^2 $$

where $g$ is the acceleration due to gravity.

We give our data to the `polyfit`

function and tell it to fit a polynomial of
degree 2 -- a parabola.
The function returns the coefficients $A,B,C$ that best fit our data.

```
A,B,C = polyfit(tt,yy,2) # fit a degree-2 polynomial to the t,y data
print('best fit parabola has coefficients')
print(' A=',A)
print(' B=',B)
print(' C=',C)
```

best fit parabola has coefficients A= -4.90062519936204 B= 12.995800694098895 C= -5.96120029435409

Let's see how well this polynomial fits our data by plotting them together.

To plot the parabola, we'll use the `linspace`

function to make an array of
many time values, from `tt[0]`

to the last time, `tt[-1]`

.
Then plug these times into the polynomial with the coefficients `A,B,C`

we found above.

```
t_axis = linspace(tt[0],tt[-1],100) # t_axis is an array of 100 values from initial to final
pp = A*t_axis**2 + B*t_axis + C # evaluate the polynomial at all t_axis values
plt.plot(t,y,'o',label='raw data') # plot the data with dots ('o')
plt.plot(t_axis,pp,label='best fit parabola') # plot the parabola at the t_axis times
plt.xlabel('time (s)')
plt.ylabel('position (m)')
plt.legend(loc='upper right') # show graph legend
```

<matplotlib.legend.Legend at 0x7fdb358c9540>

The fit is the line passing accurately through our data.

The process used by the `polyfit`

function is called a "least squares fit".
You can learn more about it on Wikipedia,
or in a proper treatment of error analysis.

## D. Reporting fit data with uncertainty¶

We now have coefficient $A$ from the fit, but what is the uncertainty in the fit coefficients? We want to know $A \pm \delta A$, where $\delta A$ is the absolute uncertainty, so we can report our best measure of $g \pm \delta g$.

When the `polyfit`

function is called with an additional parameter:

`polyfit(tt,yy,2,cov=True)`

it returns the `A,B,C`

coefficients as before and also a "covariance matrix"
which gives the variance in each of the fit coefficients. The variance is the
square of the standard deviation, and the standard deviation is the uncertainty
in the fit coefficient. The variance of each coefficient appears on the
diagonal of the covariance matrix.

This may all sound foreign if you haven't had any linear algebra or statistics.
Don't worry about that now -- here's the bottom line.
We find the uncertainty in each fit parameter (`dA,dB,dC`

) as follows:

```
(A,B,C),covariance = polyfit(tt,yy,2,cov=True) # fit with covariance matrix
dA,dB,dC = sqrt(diag(covariance))
print('fit coefficients are')
print(' A = ',A,'±',dA,'m/s^2')
print(' B = ',B,'±',dB,'m/s')
print(' C = ',C,'±',dC,'m')
```

fit coefficients are A = -4.90062519936204 ± 0.010819955279437353 m/s^2 B = 12.995800694098895 ± 0.028807305453603355 m/s C = -5.96120029435409 ± 0.01848475836721139 m

We can now report out experiment's measure for the value of the
acceleration due to free-fall, $g$, along with its uncertainty.
Since the coefficient `A`

in the polynomial corresponds to
$- \tfrac{1}{2} g$ in the equation
of motion (see part C above), we have $g = -2A$.

So we finally can report our experimental finding:

```
g = -2*A
dg = 2*dA
print(f'measured value of g = {g:0.3f} ± {dg:0.3f} m/s²')
```

measured value of g = 9.801 ± 0.022 m/s²