Data Management
Essay by SPAINE • November 6, 2012 • Coursework • 4,255 Words (18 Pages) • 1,476 Views
12 Data Management
MDM4U1
Exam Review
By: Anthony Nguyen
Unit 1: The Power of Information
Terms
Discrete data- data that can take only distinct, separate values (often whole numbers, such as counts), e.g. frequency
Continuous data- data that can take any value within a range, including decimal values, e.g. height or other measurements
Types of Graphs
Bar graph- a graph that displays discrete or categorical data, with the frequency (or another numerical value) of each category shown as a separate bar. Good for displaying frequencies of discrete and nominal data.
Histogram- a graph of continuous data that groups the values into ranges (intervals) instead of showing each individual value. The frequency of each range is shown as a bar. Good for displaying frequency for continuous data.
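To make the grouping into ranges concrete, here is a minimal Python sketch that counts how many values fall in each range. It assumes equal-width ranges that include their lower bound and exclude their upper bound; `histogram_counts` is a hypothetical helper and the heights are made-up sample data.

```python
def histogram_counts(values, start, width, num_bins):
    """Count values in each range [start + i*width, start + (i+1)*width).

    Assumption: equal-width ranges; each includes its lower bound and
    excludes its upper bound. Values outside every range are ignored.
    """
    counts = [0] * num_bins
    for v in values:
        i = int((v - start) // width)  # index of the range containing v
        if 0 <= i < num_bins:
            counts[i] += 1
    return counts


heights = [152, 158, 161, 165, 170, 171, 179]  # hypothetical heights in cm
for i, c in enumerate(histogram_counts(heights, start=150, width=10, num_bins=3)):
    low = 150 + i * 10
    print(f"{low}-{low + 10} cm: {c}")
```

Each count becomes the height of one bar, so a value such as 160 lands in the 160-170 bar rather than the 150-160 bar under this convention.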
Double Bar Graph- a graph in which two sets of data are shown side by side, with two bars for each category. Good for comparing two sets of data.
Circle graph- A graph that displays a complete set of discrete data as a circle. Each section (sector) represents one category, and the size (angle) of each sector is proportional to that category's percentage of the whole. To calculate the angle of each sector:
1. Find the percentage of the whole data set that the category represents.
2. Convert the percentage to a decimal and multiply it by 360°.
3. Use a protractor to draw this angle in your circle graph.
Good for displaying percentages of discrete data.
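The angle calculation above can be sketched in a few lines of Python. This is a minimal illustration assuming the data are already summarized as category counts; `sector_angles` is a hypothetical helper and the survey counts are invented.

```python
def sector_angles(frequencies):
    """Map each category to its circle-graph angle in degrees.

    angle = (category count / total count) * 360
    """
    total = sum(frequencies.values())
    return {label: count / total * 360 for label, count in frequencies.items()}


# Hypothetical example: favourite season of 20 students
angles = sector_angles({"Summer": 9, "Winter": 5, "Fall": 4, "Spring": 2})
for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")
```

Because every sector is a fraction of the same total, the angles always add up to 360°, which is a quick check on hand calculations.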
Line graph- A graph where the data is displayed as a line. Good for displaying trends.
Multiple line graph- A graph that shows several sets of data, each as its own line. Good for comparing the trends of multiple data sets.
Pictograph- A graph where pictures represent the frequency of data. Good for large sets of data.
Frequency Table- A table in which tally marks are recorded in the tally column and the total count for each value or category is recorded in the frequency column.
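A frequency table can be built by counting how often each value appears. The sketch below uses Python's `collections.Counter`; `frequency_table` is a hypothetical helper, the responses are invented, and the tally is simplified to one `|` per occurrence instead of the usual groups of five.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, tally, frequency) rows sorted by value.

    Simplification: the tally is one '|' per occurrence rather than
    hand-drawn groups of five.
    """
    counts = Counter(values)
    return [(value, "|" * counts[value], counts[value]) for value in sorted(counts)]


responses = [3, 1, 3, 2, 3, 1]  # hypothetical survey responses
for value, tally, freq in frequency_table(responses):
    print(value, tally, freq, sep="\t")
```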
Scatter Plots
A scatter plot compares two sets of data that may be related. The independent variable goes on the x-axis and the dependent variable goes on the y-axis, and each pair of values is plotted as a point.
The trend of a scatter plot can be described with a line of best fit: a straight line drawn to represent the general direction of the points.
A scatter plot's trend is described in three ways: strong or weak, linear or non-linear, and positive or negative.
Strong or weak- Two numbers measure the strength of a linear correlation: the correlation coefficient, r, which ranges from -1 to 1, and the coefficient of determination, r², which ranges from 0 to 1. As a rule of thumb, the correlation is strong when r is between 0.7 and 1 or between -0.7 and -1; otherwise it is weaker. You can also judge by eye: if many points lie far from the line of best fit, the correlation is weak, and if the points lie close to the line, it is strong.
Linear or non-linear- Look at the pattern of the points. If they follow a straight line, so that a line of best fit matches them well, the correlation is linear. If they follow a curve instead, the correlation is non-linear.
Positive or negative- If the line of best fit is going upwards, there is a positive correlation. If the line is going downwards, there is a negative correlation.
These descriptions should be combined when describing a graph's trend (e.g. a strong, linear, positive correlation).
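The strength and direction checks above can be sketched in Python by computing the Pearson correlation coefficient r and squaring it to get r². This is a minimal illustration assuming the 0.7 rule of thumb from the notes; `correlation` and `describe` are hypothetical helpers. Linear versus non-linear is not decided here, since that judgment comes from looking at the plot.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data.

    Assumes at least two points and that neither variable is constant
    (otherwise the denominator is zero).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def describe(xs, ys, cutoff=0.7):
    """Return r, r squared, and a strength/direction label (|r| >= cutoff is 'strong')."""
    r = correlation(xs, ys)
    strength = "strong" if abs(r) >= cutoff else "weak"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "no"
    return r, r ** 2, f"{strength}, {direction} correlation"


r, r2, label = describe([1, 2, 3, 4], [2, 4, 6, 8])
print(f"r = {r:.2f}, r^2 = {r2:.2f}: {label}")
```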
Drawing lines of best fit with accuracy (median-median line)
Instead of drawing a line of best fit freehand by eye, you can calculate a median-median line and draw it precisely.
To draw a median-median line:
1. Order the data by x-value and split it into three parts of nearly equal size. If the data cannot be split evenly, make sure the first part and the last part contain the same number of points.
2. Find the median point of each part: the median of the part's x-values paired with the median of its y-values. This gives three median points, one per part.
3. Plot these three median points on the scatter plot, pairing each part's median x-value with the median y-value from the same part.
4. Draw a line through the first and third median points, then slide it, keeping the same slope, one third of the way toward the middle median point. This shifted line is the median-median line.
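Steps 1 to 3 can be sketched in Python as follows. This minimal sketch assumes the data arrive as (x, y) pairs and uses the median of each part's x- and y-values, as in the standard median-median procedure; `split_into_thirds` and `median_point` are hypothetical helper names, not from any library.

```python
from statistics import median


def split_into_thirds(points):
    """Sort (x, y) pairs by x and split them into three parts.

    When the count is not a multiple of 3, the first and last parts are
    kept the same size.
    """
    pts = sorted(points)  # order by x (ties broken by y)
    k, r = divmod(len(pts), 3)
    if r == 0:
        sizes = (k, k, k)
    elif r == 1:
        sizes = (k, k + 1, k)      # one extra point goes to the middle part
    else:
        sizes = (k + 1, k, k + 1)  # two extra points go to the outer parts
    a, b = sizes[0], sizes[0] + sizes[1]
    return pts[:a], pts[a:b], pts[b:]


def median_point(part):
    """Median x paired with median y for one part (need not be an actual data point)."""
    return (median(x for x, _ in part), median(y for _, y in part))


sample = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5), (7, 8)]  # hypothetical data
print([median_point(p) for p in split_into_thirds(sample)])
```

The sizing rule puts a single leftover point in the middle part and two leftover points in the outer parts, which keeps the first and last parts equal as step 1 requires.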
To find the equation of the median-median line:
1. Find the three points using steps 1-2 above.
2. Find the slope using the first and third points: m = (y3-y1)/(x3-x1), where (x1, y1) is the first median point and (x3, y3) is the third.
3. Find b, the y-intercept, by averaging the intercepts of lines with slope m through all three points: b = [(y1 + y2 + y3) - m(x1 + x2 + x3)]/3. This places the line one third of the way from the line through the outer points toward the middle point. The equation of the median-median line is then y = mx + b.
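A minimal Python sketch of the equation step, assuming the standard median-median rule: slope from the first and third median points, and intercept equal to the mean of the three intercepts (the same as sliding the outer-point line one third of the way toward the middle point). `median_median_line` is a hypothetical helper name.

```python
def median_median_line(p1, p2, p3):
    """Return (m, b) for y = m*x + b from the three median points.

    p1, p2, p3 are the (x, y) median points of the first, middle and
    last parts.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    if x3 == x1:
        raise ValueError("outer median points share an x-value; slope is undefined")
    m = (y3 - y1) / (x3 - x1)                      # slope from the two outer points
    b = ((y1 + y2 + y3) - m * (x1 + x2 + x3)) / 3  # mean of the three points' intercepts
    return m, b


m, b = median_median_line((1, 2), (2, 5), (3, 6))
print(f"y = {m:g}x + {b:.3f}")  # prints y = 2x + 0.333
```

If the three median points already lie on one straight line, the function returns exactly that line, because all three intercepts are then equal.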
Example: Find the equation of the median-median line of the following set of data.
Number of cans sold 15 18 23 25 28 30 30 36
Number of cans recycled 6 2 4 1 6 8 4 10
The data are already in order of cans sold, so we split them into three parts. Because there are 8 data points, the parts cannot all be the same size, so we use a 3-2-3 split, which gives the two outer parts the same number of points.
Written as (cans sold, cans recycled) pairs:
Part 1: (15, 6), (18, 2), (23, 4)
Part 2: (25, 1), (28, 6)
Part 3: (30, 8), (30, 4), (36, 10)
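Assuming the standard median-median procedure (median points, slope from the outer points, intercept averaged over all three points), the calculation for the three parts above would run as follows. This is an illustrative working under that assumption, not the original author's solution.

```latex
\begin{aligned}
(x_1, y_1) &= \left(\operatorname{med}\{15, 18, 23\},\ \operatorname{med}\{6, 2, 4\}\right) = (18, 4)\\
(x_2, y_2) &= \left(\tfrac{25 + 28}{2},\ \tfrac{1 + 6}{2}\right) = (26.5, 3.5)\\
(x_3, y_3) &= \left(\operatorname{med}\{30, 30, 36\},\ \operatorname{med}\{8, 4, 10\}\right) = (30, 8)\\
m &= \frac{y_3 - y_1}{x_3 - x_1} = \frac{8 - 4}{30 - 18} = \frac{1}{3}\\
b &= \frac{(y_1 + y_2 + y_3) - m\,(x_1 + x_2 + x_3)}{3}
   = \frac{15.5 - \tfrac{1}{3}(74.5)}{3} = -\frac{28}{9} \approx -3.11\\
y &= \tfrac{1}{3}x - \tfrac{28}{9}
\end{aligned}
```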
...
...