Tuesday, November 12, 2019
Linear Regression
Scatter Plots and Linear Regression

Linear regression is a crucial tool for identifying and defining the key elements influencing data. Essentially, the researcher uses past data to predict future direction. Regression lets you dissect and investigate how certain variables affect your output; once data has been collected, this information can be used to help predict future results. Regression is a form of forecasting that estimates the value of one element under a particular set of conditions, and linear regression lets us write formulas that quantify a variable's effect. Data analysis is an important part of improving business results. There is no reason not to use the data to help forecast the future: the information is available and reliable, and it will explain the breakdown of the entire business process.

Break-Even Calculations

Break-even calculations relate to a firm's capital structure, that is, the extent to which fixed-income securities (debt and preferred stock) are used. Operating leverage can be depicted with graphs showing the relevant probability distributions. The break-even point is the quantity at which operating income (EBIT) equals zero, which means sales revenues are equal to costs. From an operational perspective, break-even analysis focuses on the choice of processes: two processes have equal costs at a specific level of volume, referred to as the break-even point. The volume of business a company must do to break even can be stated in either monetary units or product units. The linear model used to conceptualize the process assumes that the selling price per unit is constant. In a bank's case, the fixed costs are the predetermined interest rates, which is what the bank's financial business depends upon, while the variable costs per unit, such as labor, are also assumed constant.
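To make the break-even point concrete, here is a minimal Python sketch. All of the figures (fixed costs, selling price, variable cost) are invented for illustration and are not taken from any actual bank or ATM data.

```python
# Break-even point: the quantity at which operating income (EBIT) is zero,
# i.e. sales revenue exactly covers fixed plus variable costs.
def break_even_units(fixed_costs, price_per_unit, variable_cost_per_unit):
    """Return the break-even quantity in product units."""
    contribution_margin = price_per_unit - variable_cost_per_unit
    if contribution_margin <= 0:
        raise ValueError("Price must exceed variable cost to break even.")
    return fixed_costs / contribution_margin

# Hypothetical numbers: $50,000 fixed costs, $25 price, $15 variable cost.
units = break_even_units(50_000, 25.0, 15.0)
print(units)          # break-even volume in product units: 5000.0
print(units * 25.0)   # the same point in monetary units (sales revenue)
```

The same point can be stated in monetary units, as the text notes, simply by multiplying the break-even quantity by the selling price.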
Fixed costs remain constant; these are the operating costs that do not change, such as facility operation costs, insurance and taxes on the facility, senior management salaries, and other overhead expenses.

How does this affect our business? Going back to the topics we discussed during week two, our business, Diamond Banking, was interested in the correlation between customer account balances and the number of ATM transactions occurring. Business executives for the company should be interested in using both linear regression models and break-even calculations to examine different aspects of the business model. We want to use linear regression to compare past data and net profit with current usage and profits gained. As with every business, we want to make sure that the money we put into operating and servicing the ATMs does not exceed the profit we receive from consumers using them; this is where break-even calculations would be useful. There are many more examples we could use in our business practices, both in everyday use and for yearly comparisons. Graphs, charts, and equation models are essential for monitoring and understanding past business successes and failures, and must be used in both current and future evaluation of the business model.

Linear Regression: Introduction and Description

Linear regression is a basic linear approach used for predictive analysis. It models the relationship between one dependent variable y and one or more independent variables denoted x. It is used to examine two things: whether a set of predictor variables can predict an outcome (dependent) variable, and which variables are significant predictors of the outcome variable and how they impact it. Simple linear regression examines the relationship between a quantitative outcome (dependent) variable and a single explanatory (independent) variable. The formula is given by

y = β₀ + β₁x + ε
where y = the estimated dependent (response, outcome) variable score, β₀ = the constant, which estimates the y-intercept, β₁ = the regression coefficient, which estimates the slope, x = the score on the independent (predictor, explanatory) variable, and ε = the unexplained, random, or error component. We obtain values of x and y from a sample, and the parameters β₀ and β₁ are estimated using the method of least squares or another method. The resulting estimate of the model is given by

ŷ = b₀ + b₁x

The symbol ŷ, pronounced "y hat", refers to the predicted values of the outcome variable y that are associated with values of x, given the linear model. Since linear regression models depend linearly on their unknown parameters, they are easier to fit than models that are not linearly related to their parameters. Given n observation pairs (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ), the predicted response for the i-th observation is given by

ŷᵢ = b₀ + b₁xᵢ

and the random error component is given by

εᵢ = yᵢ − ŷᵢ

A line that fits the data "best" is one for which the random errors are as small as possible in some overall sense, and this is achieved through least squares. The method of least squares chooses the values of b₀ and b₁ that minimize the sum of squared errors. The Sum of Squared Errors (SSE) is given by

SSE = Σ (yᵢ − ŷᵢ)² = Σ (yᵢ − b₀ − b₁xᵢ)²

where the sums run over i = 1, …, n. The SSE should be kept as small as possible in order to get the line of best fit. (In the figure from Wikipedia that originally accompanied this text, the regression line of best fit is shown in blue, the observations in red, and the random deviations from the underlying relationship between the response variable y and the predictor variable x in green.) The regression parameters that give minimum error variance are:

b₁ = (Σ xᵢyᵢ − n x̄ ȳ) / (Σ xᵢ² − n x̄²)
and

b₀ = ȳ − b₁x̄

where

x̄ = (1/n) Σ xᵢ and ȳ = (1/n) Σ yᵢ

are the sample means, and Σ xᵢyᵢ and Σ xᵢ² are sums over the n observations.

History of Simple Linear Regression Models

Regression through the method of least squares was first published by Adrien-Marie Legendre in 1805 in his paper "New Methods for Determination of the Orbits of Comets". In 1809 another mathematician, Carl Friedrich Gauss, published the method of least squares in his treatise "Theory of the Motion of the Heavenly Bodies Moving About the Sun in Conic Sections", even though Gauss claimed to have discovered it before Legendre. Both mathematicians used least squares in astronomical observations, to determine the orbits of comets and other planets about the Sun and relative to the Earth. They used the method of least squares to predict the position of comets, based on measurements of the comets' previous positions. Gauss later published a further development of least squares that included the Gauss-Markov theorem. The first person to use the term "regression" was Francis Galton in the 1870s. He used regression to explain a biological phenomenon: how "co-related" trees were to their parents. His findings were published in his 1886 paper "Regression Towards Mediocrity in Hereditary Stature". Karl Pearson, Galton's colleague, was the first to link regression with the method of least squares. He discovered that plotting the height of parents on the x-axis and that of their children on the y-axis resulted in a least-squares line of best fit with a slope less than one. R. A. Fisher, a twentieth-century mathematician, combined the methods of Gauss and Pearson to develop regression analysis as we know it today. Through Fisher's work, regression analysis is no longer limited to prediction and understanding correlations, but is also used to determine the relationship between a factor and an outcome.
Over the years regression has developed, and it now includes logistic regression, non-parametric regression, Bayesian regression, and regression that incorporates regularisation. Regression was once limited to manageable data sets, but through technology and computerisation it can now be run on a large data set in less than a second.

Uses of the Simple Linear Regression Model

Simple linear regression is a model that can determine the relationship between two variables and how one impacts the other. Once the relationship has been determined and its strength verified, simple linear regression can be used to forecast the dependent variable when the independent variable changes. It can be used to predict trends and future values of a phenomenon, and its uses overlap in practice. Simple linear regression is applied across many fields of study and the economy, including, but not limited to, the following: in business and economics it can determine the effect of marketing and pricing on the sales of a product, and predict consumer behaviour in response to changes in different variables; in the car sales industry it can predict a used car's selling price given its odometer reading; in agriculture it can predict crop yield from the amount of rainfall received in a particular season; in crime data mining it can predict the crime rate of a province from drug usage, human trafficking, and similar indicators; and sports journalists and analysts use regression to predict future results. These are a few applications of simple linear regression, but the list is endless. Generally, it can be used to simplify, explain, and predict many aspects of life.
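As an illustration of the least-squares formulas given earlier, applied to the used-car example, here is a minimal Python sketch. The odometer readings and prices are invented for illustration only.

```python
# Fit a simple linear regression using the closed-form least-squares
# estimates b1 = (Sum(x*y) - n*x_bar*y_bar) / (Sum(x^2) - n*x_bar^2)
# and b0 = y_bar - b1*x_bar, then predict with y_hat = b0 + b1*x.
def fit_simple_linear_regression(x, y):
    """Return (b0, b1) minimizing the sum of squared errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical used-car data: odometer readings (km) and selling prices.
odometer = [20_000, 45_000, 60_000, 90_000, 120_000]
price    = [18_500, 16_000, 14_800, 12_200, 10_000]

b0, b1 = fit_simple_linear_regression(odometer, price)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(odometer, price))
print(b1)                 # negative slope: price falls as mileage rises
print(b0 + b1 * 75_000)   # forecast the selling price at 75,000 km
```

The slope comes out negative, as expected for this kind of data, and the fitted line can then forecast a price for any odometer reading inside the observed range.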
Linear Regression

Simple linear regression is the statistical method used to summarise and describe the association between two variables that are continuous and quantitative; essentially, it deals with measures that describe how strong a linear relationship we can compute in the data. Simple linear regression consists of one variable x, known as the predictor variable, and another variable y, known as the response variable. A discussion of simple linear regression is expected to touch on deterministic versus statistical relationships and the concept of least squares, as well as the interpretation of b₀ and b₁, which are used to interpret the estimated regression. There is also a distinction between the population regression line and the estimated regression line. Linearity is measured using the correlation coefficient r, which ranges from −1 to 1; the strength of the association is determined from the value of r (https://onlinecourses.science.psu.edu/stat501/node/250).

History of Simple Linear Regression

Karl Pearson established a rigorous treatment of applied statistics with the measure known as the Pearson product-moment correlation. This grew from the thought of Sir Francis Galton, who originated the modern notions of correlation and regression; Galton contributed to the sciences of biology, psychology, and applied statistics. Galton's fascination with genetics and heredity provided the initial inspiration that led to regression and the Pearson product-moment correlation. The thinking that encouraged the advance of the Pearson product-moment correlation began with the vexing problem of heredity: understanding how closely the features of one generation of living things are exhibited in the next generation. Galton took the approach of using the sweet pea to examine these characteristic similarities (Bravais, 1846).
The sweet pea was chosen because it can self-fertilise: daughter plants show genetic differences from the mother without the contribution of a second parent, which avoids the statistical problem of assessing the genetic contribution of both parents. Galton's first insight about regression came from a two-dimensional diagram plotting the sizes of the independent variable (the mother peas) against the dependent variable (the daughter peas). He used this representation of the data to show what statisticians today call regression: from his plot he realised that the median weight of daughter seeds from a particular size of mother seed approximately described a straight line with positive slope less than 1. "Thus he naturally reached a straight regression line, and the constant variability for all arrays of one character for a given character of a second. It was, perhaps, best for the progress of the correlational calculus that this simple special case should be promulgated first; it is so simply grasped by the beginner" (Pearson, 1930, p. 5). The idea was later generalised to the more complex method that is called multiple regression (Galton, 1894).

Importance of Linear Regression

Statistics uses the term linear regression in interpreting the data associations of a particular survey, research study, or experiment, and the linear relationship is used in modelling. Modelling one explanatory variable x against a response variable y requires the simple linear regression approach, which is said to be broadly useful both in methodology and in practical application. Simple linear regression is not used in statistics only; it is applied in much biological, social-science, and environmental research.
Simple linear regression is important because it gives an indication of what is to be expected, mostly for monitoring and corrective purposes in some disciplines (April 20, 2011, plaza).

Description of Linear Regression

The simple linear regression model is described by y = β₀ + β₁x + ε; this is the mathematical way of writing simple linear regression with the labelled variables x and y. This equation gives a clear idea of how x is associated with y, and it includes an error term ε. The term ε accounts for the variability in y that the linear relationship cannot explain, while linear regression gives us the amount of association between the two variables x and y. The parameters β₀ and β₁ represent the population values, giving the model E(y) = β₀ + β₁x, with β₀ the intercept and β₁ the slope; E(y) is the mean of y at a given value of x. The hypotheses are set up as follows: the null hypothesis H₀ assumes there is no linear association between the two variables (β₁ = 0), and the alternative H₁ assumes there is a linear association (β₁ ≠ 0).

Background of Simple Linear Regression

Galton used descriptive statistics in order to generalise his work across different heredity problems. As he completed the process of analysing these data, he realised that if the degree of association between variables were held constant, then the slope of the regression line could be described whenever the variability of the two measures was known. Galton assumed he was estimating a single heredity constant that would generalise to multiple inherited characteristics.
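The hypothesis test just described (H₀: β₁ = 0 versus H₁: β₁ ≠ 0) can be sketched numerically with a t-test on the slope. The data below are made up, and 2.447 is the two-sided 5% critical value of the t distribution with n − 2 = 6 degrees of freedom.

```python
import math

# t-test for the slope of a simple linear regression:
# H0: beta1 = 0 (no linear association) vs H1: beta1 != 0.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.3, 2.9, 4.1, 4.8, 6.2, 6.8, 8.1, 8.9]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

# Residual variance estimate and the standard error of the slope.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se_b1 = math.sqrt(sse / (n - 2) / sxx)
t_stat = b1 / se_b1

# Reject H0 at the 5% level if |t| exceeds the critical value.
print(b1, t_stat, abs(t_stat) > 2.447)
```

For this strongly linear made-up data the t statistic is large and H₀ is rejected, i.e. there is evidence of a linear association between x and y.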
He wondered why, if such a constant existed, the observed slopes in the parent-child plots varied so much across these characteristics. Noticing variation in variability among the generations, he arrived at the idea that the differences in the regression slopes he obtained were due solely to differences in variability between the various sets of measurements. In modern terms, this principle can be illustrated by holding the correlation coefficient constant while varying the standard deviations of the two variables involved. On his plots he examined the correlation in each data set. He then observed three data sets: in the first, the standard deviation of Y is the same as that of X; in the second, the standard deviation of Y is less than that of X; and in the third, the standard deviation of Y is greater than that of X. The correlation remains constant for all three data sets even though the slope of the line changes as a result of the differences in variability between the two variables. Galton used the rudimentary regression equation y = r(Sy/Sx)x to describe the relationship between his paired variables. At first he used an estimated value of r, because he did not yet know how to calculate it. The (Sy/Sx) expression is a correction factor that adjusts the slope according to the variability of the measures; he realised that the ratio of the variabilities of the two measures was the key factor in determining the slope of the regression line.

The Uses of Simple Linear Regression

Linear regression is a common statistical data-analysis technique. It is used to determine the extent to which there is a linear relationship between a dependent variable and one or more independent variables. There are two kinds of linear regression, simple linear regression and multiple linear regression. In simple linear regression a single independent variable is used to predict the value of a dependent variable.
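Galton's relation above is exactly the modern least-squares slope: b₁ = r(Sy/Sx). The short sketch below verifies the algebraic identity on made-up data (the 1/(n−1) factors in the standard deviations cancel in the ratio).

```python
import math

# Verify that the least-squares slope equals r * (Sy / Sx),
# where r is the Pearson correlation and Sx, Sy the standard deviations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.5, 3.1, 4.2, 6.3, 7.4]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)            # correlation coefficient
slope_from_r = r * math.sqrt(syy / sxx)   # Galton's r * (Sy / Sx)
slope_lstsq = sxy / sxx                   # direct least-squares slope

print(abs(slope_from_r - slope_lstsq) < 1e-12)  # True: the two agree
```

This makes Galton's observation precise: for fixed r, changing the ratio of the standard deviations changes the slope, which is exactly what he saw across his three data sets.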
In multiple linear regression two or more independent variables are used to predict the value of a dependent variable. The difference between the two is the number of independent variables; in both cases there is just a single dependent variable. The dependent variable must be measured on a continuous measurement scale (e.g. a 0-100 test score), while the independent variable(s) can be measured on either a categorical (e.g. male versus female) or a continuous measurement scale. There are several other assumptions the data must satisfy in order to qualify for linear regression. Simple linear regression is similar to correlation in that its purpose is to gauge the extent to which there is a linear relationship between two variables. The major difference between the two is that correlation makes no distinction between independent and dependent variables, while linear regression does; in particular, the purpose of linear regression is to "predict" the value of the dependent variable in light of the values of one or more independent variables.
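To illustrate the contrast just drawn, here is a minimal sketch of multiple linear regression with two independent variables predicting one continuous dependent variable. The data are invented and constructed to lie exactly on the plane y = −1 + 2x₁ + 3x₂.

```python
import numpy as np

# Multiple linear regression via least squares: two independent
# variables (x1, x2) predict a single continuous dependent variable y.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([7.0, 6.0, 17.0, 16.0, 24.0])  # exactly -1 + 2*x1 + 3*x2

# Prepend a column of ones so the first coefficient is the intercept.
A = np.column_stack([np.ones(len(X)), X])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coeffs)  # [intercept, coefficient for x1, coefficient for x2]
```

Because the data are noise-free, least squares recovers the intercept −1 and the coefficients 2 and 3 exactly; with real data, the same call returns the coefficients that minimize the sum of squared errors.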