A Gentle Intro To Statistics
Statistics Basics in simple terms
Have you ever heard statements like these?
The average height of a man in the world is 5ft 9 inches.
The most popular food in the world.
The average life expectancy of a male or a female.
And I was always like: those people never came to me to measure my height or ask what kind of food I like. So how can they reach those conclusions?
This is where statistics comes in handy: it lets us draw a calculated conclusion about a large population without having to ask every single member.
So without further ado, let's dive into it.
Statistics is a method of collecting and organizing data so that we can draw meaningful knowledge and conclusions from it.
A population is the entire group of individuals or items that we are interested in.
A sample is a subset drawn from the population.
Let's take an example: we need to find out whether the oranges on a tree taste sweet or sour. All the oranges on the tree are the population. If we pick just five oranges at random, that subset of oranges is called a sample.
A parameter is a value derived from a population.
A statistic is a value derived from a sample.
Suppose I want to know the average salary of the IT staff in a college. There are two ways of doing this. One way is to go to every staff member and ask them. Because these are all the people of interest, they form the population. We collect all the data and calculate the mean salary. This mean is called a parameter.
The other way is to randomly select a subset of staff names from the population and calculate the mean salary of just that subset. This subset of staff is called a sample, and the mean calculated from it is called a statistic.
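To make the difference concrete, here is a minimal Python sketch. The salary figures are made up for illustration (they are not real data), and the sample size of 4 is an arbitrary choice.

```python
import random
import statistics

# Hypothetical yearly salaries for every IT staff member: this is the population.
population_salaries = [42000, 45000, 47000, 50000, 52000,
                       55000, 58000, 61000, 65000, 75000]

# Parameter: the mean calculated from the whole population.
population_mean = statistics.mean(population_salaries)

# Statistic: the mean calculated from a random sample of the population.
rng = random.Random(0)  # fixed seed so the sketch gives the same sample each run
sample_salaries = rng.sample(population_salaries, k=4)
sample_mean = statistics.mean(sample_salaries)

print("Parameter (population mean):", population_mean)
print("Statistic (sample mean):", sample_mean)
```

The statistic will usually be close to the parameter but not identical to it, and a different sample would give a slightly different statistic.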
Types of statistics
- Descriptive: describes only the members of the sample or population from which the data have been collected.
- Inferential: uses sample data to reach conclusions about the larger population.
Sampling methods: there are various ways researchers can select samples. Below are the most common ones.
- Random sampling: as the name implies, every member of the population has an equal chance of being selected into the sample. It tends to give the most trustworthy results but is hard to do in practice. Just imagine sampling the heights of adults truly at random: you would need access to every adult in the country, which is a very time-consuming and tedious task.
- Representative sampling: purposely select cases so that the sample matches the larger population on a particular characteristic. Say I want to conduct a study of the income of adults aged 25 to 35 in the Dublin area. That population includes men, women, people with disabilities, parents, single people, and different ethnic and racial groups, and the sample should reflect the same mix. If 55% of the adults in the population are men, then 55% of the sample should be men too.
- Convenience sampling: select the cases that are easiest to reach. For example, to analyze the exam performance of class 5 students, I pick a nearby school because it is close, easy to access, and less labor-intensive.
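The three sampling methods above can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the population of 100 adults (55 men, 45 women), the sample size of 20, and the function names are all hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100 adults: 55 men first, then 45 women.
population = [{"id": i, "sex": "man" if i < 55 else "woman"} for i in range(100)]

def simple_random_sample(people, k):
    """Every member has the same chance of being selected."""
    return rng.sample(people, k)

def representative_sample(people, k, key):
    """Sample each group in proportion to its share of the population."""
    groups = {}
    for person in people:
        groups.setdefault(person[key], []).append(person)
    chosen = []
    for members in groups.values():
        # Rounding works cleanly here; in general the shares may not sum to k.
        share = round(k * len(members) / len(people))
        chosen.extend(rng.sample(members, share))
    return chosen

def convenience_sample(people, k):
    """Take whoever is easiest to reach; here, simply the first k records."""
    return people[:k]

def share_of_men(sample):
    return sum(p["sex"] == "man" for p in sample) / len(sample)

random_pick = simple_random_sample(population, 20)
representative_pick = representative_sample(population, 20, "sex")
convenient_pick = convenience_sample(population, 20)

print("random:", share_of_men(random_pick))
print("representative:", share_of_men(representative_pick))
print("convenience:", share_of_men(convenient_pick))
```

The representative sample matches the 55% share of men by construction, the random sample only lands near it by chance, and the convenience sample can be badly off because the easiest people to reach are not necessarily typical.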
Constant: can have only a single value.
Variable: anything that can be coded and can take more than one value (for example gender, income, age, height). A variable can also be described as one of the types below.
Quantitative (or continuous): this type of variable represents some sort of amount. Height is a quantitative variable because a higher score on this variable indicates a greater amount of height.
Qualitative (or categorical): in this type of variable, the assigned value does not indicate more or less of a certain quantity. Suppose a study needs to find out the spending habits of adults in India, Ireland, and the UK, coded as 1 = India, 2 = Ireland, 3 = UK. Here the value 3 is not "more" than the value 1 or 2. The labels (1, 2, 3) just represent a qualitative difference in location, not a quantitative one.
A dichotomous variable is a very common kind of variable in statistics. It has exactly two categories (male or female, 0 or 1).
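Here is a small Python sketch of why the distinction matters in practice. The survey records are invented for illustration; only the coding scheme (1 = India, 2 = Ireland, 3 = UK) comes from the example above.

```python
from collections import Counter
import statistics

# Hypothetical survey records; country is coded 1 = India, 2 = Ireland, 3 = UK.
country_labels = {1: "India", 2: "Ireland", 3: "UK"}
country_codes = [1, 3, 2, 1, 3, 3, 2, 1, 1]
heights_cm = [150, 160, 175, 168, 172, 181, 158, 165, 170]

# Quantitative variable: arithmetic such as the mean is meaningful.
mean_height = statistics.mean(heights_cm)

# Qualitative variable: the codes are only labels, so we count them
# instead of averaging them (a "mean country" of 1.9 would mean nothing).
country_counts = Counter(country_labels[code] for code in country_codes)

print("Mean height (cm):", mean_height)
print("Respondents per country:", dict(country_counts))
```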
Scales of measurement: there are four different scales of measurement for variables in statistics.
Nominal: used to identify the different levels of a variable, but the values do not carry any weight. If cat is labeled 0 and dog is labeled 1, neither represents a higher score than the other. They are just values assigned to each group.
Ordinal: values that carry an order. Say I want to know the top 10 students in my class. The student who scored highest in the exam takes spot 1 and the second highest takes spot 2. It does not matter whether the gap between their marks is 1 or 50; only the order counts.
Interval: scores measured this way carry both relative order and distance. Suppose there are 3 students in a class: a, b, and c. a is 150 cm tall, b is 160 cm tall, and c is 175 cm tall. It is easy to see who is taller or shorter than whom. Height is measured in cm and every cm is equal in length, so the variable is measured in equal intervals and tells us both the relative position of the students and the distance between them.
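Ratio: both ratio and interval scales have equal distances between units, but a ratio scale also has an absolute zero, a point where none of the quantity is present. That is what makes statements like "twice as much" meaningful. Height in cm actually meets this stricter standard too, since 0 cm means no height at all. Temperature in degrees Celsius is a common example of a scale that is interval but not ratio, because 0 °C does not mean "no temperature".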
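As a rough illustration of what each scale lets you do, here is a short Python sketch. The student names, marks, and parcel weights are hypothetical; the three heights are the ones from the interval example above.

```python
# Hypothetical exam marks for four students.
marks = {"Asha": 95, "Brian": 45, "Chen": 94, "Dara": 70}

# Ordinal: ranks keep the order but discard how far apart the marks are.
ranked = sorted(marks, key=marks.get, reverse=True)
ranks = {name: position for position, name in enumerate(ranked, start=1)}

# Interval: differences are meaningful because every cm is the same length.
heights_cm = {"a": 150, "b": 160, "c": 175}
gap_b_a = heights_cm["b"] - heights_cm["a"]
gap_c_b = heights_cm["c"] - heights_cm["b"]

# Ratio: with a true zero, "times as much" statements are meaningful.
weights_kg = {"parcel_1": 2.0, "parcel_2": 4.0}  # hypothetical weights
times_heavier = weights_kg["parcel_2"] / weights_kg["parcel_1"]

print("Ranks:", ranks)
print("Height gaps (cm):", gap_b_a, gap_c_b)
print("parcel_2 is", times_heavier, "times as heavy as parcel_1")
```

Notice that Asha and Chen are one rank apart with a 1-mark gap, while Dara and Brian are also one rank apart with a 25-mark gap. The ranks alone cannot tell you that.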