Pearson Correlation Coefficient Formula: How to Calculate and Interpret

Doing quality research is never an easy task. Actually, it’s much more than just coming up with relevant questions and answer options and collecting data. You need to make sure the data obtained is properly analyzed in order to draw meaningful conclusions that can have a positive effect on your business or scientific study.

Research analysis often involves a lot of statistical work – math, formulas, calculations. Whether you’re just starting out with statistics or you’re a seasoned researcher, the chances are high that you’re going to be dealing with the Pearson correlation coefficient formula.

Before we go into more detail about the formula itself, how to calculate the correlation value, and what you can do with it, let’s briefly define correlation coefficients, and Pearson’s in particular.

What are correlation coefficients?

In statistics, correlation coefficients are used to measure the strength of a relationship between variables or sets of data. Even though there are several types of correlation coefficients, when people talk about the correlation coefficient, they’re most likely referring to the Pearson correlation coefficient, which can be calculated for either a population or a sample.

Pearson Correlation Coefficient

The Pearson correlation coefficient (also referred to as the Pearson product-moment correlation coefficient, the Pearson R test, or the bivariate correlation) is the most common correlation measure in statistics, used in linear regression.

In simple terms, the Pearson correlation measures the strength and direction of the linear relationship between two datasets (or two variables – X and Y), answering the question “how well can a straight line represent this data?” It can hold a value between +1 and −1, where +1 represents a total positive linear correlation, 0 stands for no linear correlation, and −1 means total negative linear correlation.

 


The Pearson correlation coefficient formula was developed by Karl Pearson, who built upon a related concept introduced in the 1880s by Francis Galton, which in turn relied upon a mathematical formula first derived in 1844 by Auguste Bravais.

Pearson Correlation Coefficient Formula

The Pearson correlation coefficient formula can be applied to a population or to a sample. To learn more about the difference between the two, here’s a post that explores population vs sample in more detail. Let’s explore both coefficient formulas.

Population

When applied to a population, the Pearson correlation coefficient is represented by ρ (Greek letter “rho”) and is also referred to as the population Pearson correlation coefficient. When we take two variables (X and Y) into account, the Pearson correlation coefficient formula looks like this:

\[ \rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} \]

Where:

  • cov stands for covariance
  • σX represents the standard deviation of X
  • σY represents the standard deviation of Y

The Pearson correlation coefficient formula can also be expressed with regard to mean and expectation. Given that

\[ \operatorname{cov}(X, Y) = \operatorname{E}[(X - \mu_X)(Y - \mu_Y)] \]

the formula would look like this:

\[ \rho_{X,Y} = \frac{\operatorname{E}[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y} \]

Where:

  • E stands for the expected value (or expectation)
  • μX represents the mean of X
  • μY represents the mean of Y
  • σX represents the standard deviation of X
  • σY represents the standard deviation of Y
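To make the population formula more concrete, here is a minimal Python sketch. It assumes your two lists contain the entire population rather than a sample, and the function name population_pearson and the example numbers are made up for illustration only.

```python
from statistics import fmean, pstdev

def population_pearson(xs, ys):
    """Population Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    mu_x, mu_y = fmean(xs), fmean(ys)
    # cov(X, Y) = E[(X - mu_X)(Y - mu_Y)], with E taken as the plain average
    cov = fmean([(x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)])
    # pstdev is the population standard deviation (divides by N, not N - 1)
    return cov / (pstdev(xs) * pstdev(ys))

# Illustrative data only: y is exactly twice x, so the relationship is perfectly linear
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
rho = population_pearson(xs, ys)
print(round(rho, 4))
```

Keep in mind that the function divides by zero if either variable never changes, because its standard deviation is then 0 and the correlation is undefined.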

Sample

When applied to a sample, the Pearson correlation coefficient is represented by r_xy and is also referred to as the sample Pearson correlation coefficient. In this case, the Pearson correlation coefficient formula can be derived by substituting covariance and variance estimates based on a particular sample into the formula given above.

The new formula would look like this:

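\[ r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \]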

Where:

  • n stands for sample size
  • xi and yi represent the individual sample points indexed with i
  • x̄ and ȳ represent the sample means of x and y
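If the summation signs look intimidating, the sample formula translates almost line by line into code. The following is a rough Python sketch rather than a reference implementation; the function name sample_pearson and the five data points are hypothetical.

```python
from math import sqrt

def sample_pearson(xs, ys):
    """Sample Pearson correlation r_xy, following the formula term by term."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    x_bar = sum(xs) / n  # sample mean of x
    y_bar = sum(ys) / n  # sample mean of y
    # Numerator: sum of the products of the deviations from the means
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square roots of the summed squared deviations
    sum_sq_x = sum((x - x_bar) ** 2 for x in xs)
    sum_sq_y = sum((y - y_bar) ** 2 for y in ys)
    return numerator / (sqrt(sum_sq_x) * sqrt(sum_sq_y))

# Made-up sample, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = sample_pearson(xs, ys)
print(round(r, 4))
```

For this made-up sample the numerator is 6 and the two sums of squares are 10 and 6, so r works out to 6 / √60, or roughly 0.77.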

How to calculate the Pearson Correlation Coefficient

Ok, so now you know what the Pearson correlation coefficient formula looks like, but unless you have a diploma in statistics, all those variables and Greek letters might not mean much to you. That’s why I’m going to try to present you with several ways to calculate the correlation coefficient (without a PhD in math).

When looking to determine the strength of a relationship between variables, you first need to apply the Pearson correlation coefficient formula shown above to obtain what is known as the coefficient value. As already mentioned above, it can range anywhere between -1.00 and 1.00.

If the value is negative, the two variables are negatively correlated. This means that as one value increases, the other tends to decrease. If it’s positive, the correlation is positive as well, meaning that the two values tend to increase or decrease together.

Here are two simple ways you can calculate Pearson’s correlation coefficient.

By hand

If you’re looking for some serious skill exercise or just want to refresh your statistics and math knowledge, you could calculate the correlation coefficient by hand. Here’s an easy-to-follow video tutorial that shows the exact steps.
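If you’d like to check your hand calculation, the usual pencil-and-paper steps can be sketched in Python: compute each mean, take the deviations from the mean, multiply and square them, add up the columns, and divide. The four data points below are invented purely for illustration.

```python
from math import sqrt

# Invented data, for illustration only
xs = [2, 4, 6, 8]
ys = [1, 3, 7, 9]

n = len(xs)
x_bar = sum(xs) / n  # step 1: mean of x
y_bar = sum(ys) / n  # step 1: mean of y

# Steps 2-3: one row per data point with deviations, their product and squares
rows = []
for x, y in zip(xs, ys):
    dx, dy = x - x_bar, y - y_bar
    rows.append((x, y, dx, dy, dx * dy, dx ** 2, dy ** 2))

# Step 4: add up the last three columns
sum_products = sum(row[4] for row in rows)
sum_dx_sq = sum(row[5] for row in rows)
sum_dy_sq = sum(row[6] for row in rows)

# Step 5: divide the summed products by the square root of the product of the sums
r_xy = sum_products / sqrt(sum_dx_sq * sum_dy_sq)

print("x  y  dx  dy  dx*dy  dx^2  dy^2")
for row in rows:
    print("  ".join(f"{value:g}" for value in row))
print("sums:", sum_products, sum_dx_sq, sum_dy_sq)
print("r =", round(r_xy, 4))
```

With these numbers the column sums are 28, 20, and 40, which gives r = 28 / √800, or about 0.99.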

Using Google Sheets

Enter your two data sets into two Google Sheet columns (e.g. “x” data can go into column A and “y” data into column B).

Select an empty cell in column C (for example) and choose the CORREL function from the list of available functions.


Select the location of data to be used with the CORREL function in the “Array 1” and “Array 2” fields. For instance, “A2:A15” can be the value of your Array 1 field and “B2:B15” can go into the Array 2 field (adjust the ranges to match the number of values you have for each variable).

Once you assign the Array values, the result will appear in the cell you selected in column C (or whichever cell you have chosen for the CORREL function).

Correlation interpretation

Now that you know how to calculate the Pearson correlation coefficient, you simply have to do the math. Or use an advanced scientific calculator to calculate it for you. Or get your own personal statistician to do the calculations instead of you.

Whatever your calculation weapon of choice, the most important thing is how you can interpret the calculated data and what kind of conclusions you can draw from it. Keep in mind that there are no universal guidelines for interpreting correlation. What does this mean?

Well, if you’re looking to verify a scientific law using high-precision instruments and technology, a correlation of 0.7 might be regarded as extremely low. On the other hand, the same correlation might be considered high if you’re conducting a social sciences experiment that involves highly subjective input from survey respondents.

In any case, it largely depends on the context and purpose of your research.

Quinnipiac University’s Political Science Department has published a list of “crude estimates” for interpreting the meaning of Pearson’s Correlation coefficients.

r value            Interpretation
+.70 or higher     Very strong positive relationship
+.40 to +.69       Strong positive relationship
+.30 to +.39       Moderate positive relationship
+.20 to +.29       Weak positive relationship
+.01 to +.19       No or negligible relationship
0                  No relationship [zero correlation]
-.01 to -.19       No or negligible relationship
-.20 to -.29       Weak negative relationship
-.30 to -.39       Moderate negative relationship
-.40 to -.69       Strong negative relationship
-.70 or lower      Very strong negative relationship
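If you interpret a lot of coefficients, these crude estimates can be turned into a small lookup helper. The sketch below is only one possible encoding: the function name interpret_r is made up, and it assumes each band starts at its lower bound, so values that fall between two listed ranges (such as 0.695) land in the weaker band.

```python
def interpret_r(r):
    """Map a Pearson r value to the crude verbal labels listed above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    if r == 0:
        return "No relationship"
    strength = abs(r)  # the bands are symmetric around zero
    direction = "positive" if r > 0 else "negative"
    if strength >= 0.70:
        return f"Very strong {direction} relationship"
    if strength >= 0.40:
        return f"Strong {direction} relationship"
    if strength >= 0.30:
        return f"Moderate {direction} relationship"
    if strength >= 0.20:
        return f"Weak {direction} relationship"
    return "No or negligible relationship"

for value in (0.75, -0.45, 0.25, -0.06, 0):
    print(value, "->", interpret_r(value))
```

As with the list itself, treat the labels as rough guidance: the same value can mean very different things depending on your field and the purpose of your research.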

How to use the Pearson correlation coefficient in your research (an example)

Let’s imagine you’re doing research on the work-life balance of your employees. You are looking to find out how strong the relationship is between your employees’ stress levels and the number of hours they work each week.

First, you need to have your employees take a survey where they would respond to questions related to their work hours and how those make them feel. If you’re looking for inspiration, here is an employee satisfaction survey you can check out.

Once you’ve used your survey to gather data from your employees, you need to put the Pearson correlation coefficient formula in practice to calculate the relationship between the two variables (work hours & stress level).

If you find out that your Pearson correlation coefficient value is, let’s say, -0.06, this would mean (according to the interpretations in the table above) that there is no or only a negligible relationship between your employees’ work hours and their stress levels.

Now that you know how to calculate and interpret the correlation coefficient, you need to ensure that you are well-equipped with another non-optional prerequisite – data. To collect data for your surveys and research more effectively, feel free to check out the list of our survey examples – they are free for use to all of our active users!