Prediction of the physique performance of college students and its accuracy based on multiple linear regression and BP neural network

Based on physical fitness test data of college students, a multiple linear regression model and a BP neural network model are established to predict the physique performance of college students, and the prediction accuracy of each model is analyzed. The results show that the average errors of the multiple linear regression model and the BP neural network model in predicting college students' physique performance are 1.23 and 0.51 points, respectively, and the average error percentages are 1.84% and 0.74%, respectively. Both models can effectively predict the physique performance of college students, but the BP neural network model has higher prediction accuracy than the multiple linear regression model.

media and forming a kind of inertia in daily life, which in turn weakens their awareness of exercising on their own and reduces the time and space they have for sports and exercise. The physical quality of college students directly affects their physical and mental health. College sports are the last stage of school sports and the best period for students to form the concept of lifelong sports, so college sports should take responsibility for cultivating students' awareness of lifelong physical exercise [1]. With the continuous development and deepening of university sports, colleges and universities have gradually recognized the drawbacks of the idea of "emphasizing academics rather than sports" and the importance of physical exercise for students, and have continuously introduced incentive policies to strengthen college students' physical exercise.
The physical quality of college students is closely related to the physical fitness test, whose results indirectly reflect that physical quality. However, effective methods for analyzing college students' physical fitness test results are currently lacking. In the past, most universities obtained the comprehensive physical fitness score of college students by simple arithmetic, weighting each test item by a fixed percentage. Because there are many test items and each item contributes to the overall score independently, this traditional approach can only give a rough prediction of the score and captures no direct relationship between items. Therefore, it cannot predict the physical fitness scores of college students in multiple dimensions. In recent years, some scholars have studied the prediction of various types of sports performance using linear least squares [2], genetic algorithms [3], extrapolation [4], data mining [5,10], machine learning [6], grey system theory [7], and neural networks [8,9], with reported prediction errors ranging from 0.5% to 6%.
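To make the limitation of the traditional approach concrete, it can be sketched as a fixed weighted sum of per-item scores. This is a minimal Python sketch; the item names and the weights in `ITEM_WEIGHTS` are hypothetical assumptions chosen for illustration and are not taken from this paper. Because each item enters the sum independently, the resulting score carries no information about relationships between items.

```python
# Hypothetical weights for each test item (fractions of the composite score).
# These values are illustrative assumptions, not taken from this paper.
ITEM_WEIGHTS = {
    "bmi": 0.15,
    "lung_capacity": 0.15,
    "run_50m": 0.20,
    "seated_forward_bend": 0.10,
    "standing_long_jump": 0.10,
    "pull_ups": 0.10,
    "run_1000m": 0.20,
}


def composite_score(item_scores: dict[str, float]) -> float:
    """Traditional composite score: a fixed weighted sum of 0-100 item scores.

    Each item contributes independently; no interaction between items is modeled.
    """
    missing = ITEM_WEIGHTS.keys() - item_scores.keys()
    if missing:
        raise ValueError(f"missing item scores: {sorted(missing)}")
    return sum(weight * item_scores[name] for name, weight in ITEM_WEIGHTS.items())


example = {
    "bmi": 100, "lung_capacity": 80, "run_50m": 70,
    "seated_forward_bend": 90, "standing_long_jump": 60,
    "pull_ups": 50, "run_1000m": 75,
}
print(round(composite_score(example), 2))  # 76.0
```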
This study first analyzes the historical physical fitness test data of college students, classifies and processes the student records, and then extracts samples. A neural network algorithm is used to establish a prediction model for college students' sports scores: the historical data are divided into a training set and a test set, the training set is used to build the model and iteratively adjust its weights, and the test set is then used to evaluate model performance by comparing predicted and actual scores. This study aims to construct a multiple linear regression model and a BP neural network model for predicting the comprehensive physical fitness scores of college students, using the 2022 college physical fitness test items and scores as the dataset. Furthermore, it evaluates the prediction accuracy of both models to determine whether they can be used to predict college students' physical fitness scores in colleges and universities.

CONSTRUCTING THE PREDICTION MODEL
2.1 Constructing model data sources
The comprehensive score of the 2019 college students' physical fitness test covers body mass index (BMI), lung capacity, 50-meter run, seated forward bend, standing long jump, 800-meter or 1000-meter run, one-minute sit-ups, and pull-ups. Because the test items and assessment methods differ slightly between male and female students, only the physical fitness test results of the 3843 male students were used in this paper; the results for each item are given in Table 1.

Multiple linear model construction
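A multiple linear regression model describes the linear relationship between several independent variables and one response variable; the number of samples is usually much larger than the number of independent variables. The multiple linear regression model [2] with n independent variables is

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n \tag{1} $$

where $\beta_i$ ($i = 0, 1, 2, \ldots, n$) are the regression coefficients and $x_i$ ($i = 1, 2, \ldots, n$) are the independent variables. By introducing the identity $x_0 = 1$, Eq. (1) reduces to

$$ y = \sum_{i=0}^{n} \beta_i x_i \tag{2} $$

For $k$ samples, Eq. (2) can be expressed in matrix form as

$$ Y = X\beta \tag{3} $$

where

$$ Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{bmatrix}, \quad X = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1n} \\ 1 & x_{21} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{k1} & \cdots & x_{kn} \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_n \end{bmatrix} $$

The sample data in Table 1 comprise 3843 groups and 8 independent variables, so k = 3843 and n = 8 in the matrices.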
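A minimal sketch of estimating the coefficient vector of the model y = β0 + β1·x1 + … + βn·xn in matrix form (a column x0 = 1 prepended to the k × n sample matrix) is shown below. The paper does not state which solver it uses; ordinary least squares via `numpy.linalg.lstsq` is assumed here, the data are synthetic, and the helper names `fit_mlr` and `predict_mlr` are hypothetical.

```python
import numpy as np


def fit_mlr(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares estimate of beta in Y = X_aug @ beta.

    X has shape (k, n): k samples and n independent variables. A leading
    column of ones (x_0 = 1) is prepended so that beta[0] is the intercept.
    """
    X_aug = np.column_stack([np.ones(X.shape[0]), X])  # shape (k, n + 1)
    beta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)    # minimizes ||X_aug @ beta - y||^2
    return beta


def predict_mlr(beta: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Evaluate y = beta_0 + sum_i beta_i * x_i for every row of X."""
    return beta[0] + X @ beta[1:]


# Synthetic, noise-free demo with k = 200 samples and n = 8 variables.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 100.0, size=(200, 8))
true_beta = np.array([5.0, 0.10, -0.05, 0.15, 0.20, 0.10, 0.10, 0.20, 0.10])
y = predict_mlr(true_beta, X)
beta_hat = fit_mlr(X, y)
print(np.allclose(beta_hat, true_beta))  # True
```

With real data the fitted coefficients would not match any "true" vector exactly; the noise-free demo only checks that the solver recovers known coefficients.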

BP neural network model construction
The comprehensive score of the college students' physical fitness test is evaluated from lung capacity, 50 m run, standing long jump, seated forward bend, 1000 m run, and pull-ups, and the height and weight of college students also have an important influence on the comprehensive score. In this paper, the height and weight of college students together with the specified physical fitness test items are used as the input parameters $X = [x_1, x_2, \ldots, x_8]$, and the composite score is used as the output parameter. From the 3843 groups of physical fitness test data, 3800 groups were randomly selected as BP neural network training data, and the remaining 43 groups were used to compare predictions. The network target error was set to $10^{-6}$, the maximum number of training steps to 1000, the maximum number of training iterations to 2000, and the learning rate to 0.001.
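The random 3800/43 split described above, combined with the [-1, 1] min-max normalization that the text applies to every input and output parameter (equation 5), can be sketched as follows. This is an illustrative sketch under stated assumptions: the normalization is taken to be Z* = 2(Z - Zmin)/(Zmax - Zmin) - 1, Zmin and Zmax are computed on the training rows only, the seed is arbitrary, and the data are random placeholders with the same shape as the paper's (3843 rows, 8 inputs plus 1 output). All function names are hypothetical.

```python
import numpy as np


def random_split(n_samples: int, n_train: int, seed: int = 0):
    """Randomly partition row indices into training and held-out subsets."""
    if not 0 < n_train < n_samples:
        raise ValueError("n_train must lie strictly between 0 and n_samples")
    perm = np.random.default_rng(seed).permutation(n_samples)
    return perm[:n_train], perm[n_train:]


def fit_minmax(Z: np.ndarray):
    """Column-wise minimum and maximum, computed on the training rows only."""
    return Z.min(axis=0), Z.max(axis=0)


def normalize(Z, z_min, z_max):
    """Map each column linearly so that z_min -> -1 and z_max -> +1."""
    span = np.where(z_max > z_min, z_max - z_min, 1.0)  # avoid division by zero
    return 2.0 * (Z - z_min) / span - 1.0


def denormalize(Z_star, z_min, z_max):
    """Inverse of normalize(): recover values on the original scale."""
    span = np.where(z_max > z_min, z_max - z_min, 1.0)
    return (Z_star + 1.0) * span / 2.0 + z_min


# Placeholder data with the paper's shape: 3843 rows, 8 inputs + 1 output column.
rng = np.random.default_rng(1)
data = rng.uniform(0.0, 100.0, size=(3843, 9))
train_idx, test_idx = random_split(len(data), 3800)
z_min, z_max = fit_minmax(data[train_idx])
train_norm = normalize(data[train_idx], z_min, z_max)
print(len(train_idx), len(test_idx))  # 3800 43
```

Held-out rows normalized with the training minimum and maximum can fall slightly outside [-1, 1]; `denormalize` converts predicted outputs back to points.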
The design of the hidden layer is one of the most important aspects of building a BP neural network. Selecting an appropriate number of hidden layers helps avoid a network whose training error is very small but which produces large errors when it is applied to new data. Although adding hidden layers can reduce the network error, more hidden layers also lengthen training time and may cause overfitting. Therefore, in this paper, a 5-layer … At present, there is no fixed rule for determining the number of hidden-layer nodes; it is mainly determined by experience. The number of hidden-layer nodes directly affects the computational efficiency and the recognition ability of the network, and a commonly used empirical formula [9] is

$$ m = \sqrt{k + n} + \alpha \tag{4} $$

where m is the number of hidden-layer nodes, k and n are the numbers of neurons in the input and output layers, respectively, and α is a correction constant between 1 and 10. Combined with Figure 1, the input layer has k = 8 neurons and the output layer has n = 1 neuron; taking α = 5, Eq. (4) gives $m = \sqrt{8 + 1} + 5 = 8$ hidden-layer nodes.
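The empirical rule in equation 4 is assumed here to take the common form m = sqrt(k + n) + alpha, which reproduces the worked numbers above (sqrt(8 + 1) + 5 = 8). This is a minimal sketch; the function name is hypothetical, and rounding to the nearest integer is an assumption for cases where k + n is not a perfect square.

```python
import math


def hidden_nodes(k: int, n: int, alpha: int) -> int:
    """Empirical hidden-layer size m = sqrt(k + n) + alpha, rounded to an integer.

    k: input-layer neurons, n: output-layer neurons,
    alpha: correction constant, conventionally chosen in 1..10.
    """
    if not 1 <= alpha <= 10:
        raise ValueError("alpha is expected to lie in 1..10")
    return round(math.sqrt(k + n) + alpha)


print(hidden_nodes(8, 1, 5))  # 8
# Candidate sizes over the whole admissible range of alpha:
print([hidden_nodes(8, 1, a) for a in range(1, 11)])
```

Sweeping alpha in this way gives the candidate range of hidden-layer sizes from which one value is usually picked by trial training.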
According to Table 1, input parameter x1 (height) ranges from 88 to 196.5 cm, input parameter x2 (weight) ranges from 37.5 to 139.4 kg, and input parameters x3 to x8 range from 0 to 100 points. Large differences in the magnitudes of the input parameters reduce training accuracy. To give every input parameter equal standing and improve the accuracy of score prediction, the input and output parameters are normalized with the following expression [10]:

$$ Z^{*} = \frac{2\,(Z - Z_{\min})}{Z_{\max} - Z_{\min}} - 1 \tag{5} $$

In Eq. (5), Z denotes an input or output parameter, $Z^{*}$ its normalized value, and $Z_{\min}$ and $Z_{\max}$ the minimum and maximum values of that parameter. After normalization, all input and output parameters lie in the interval [-1, 1]. The actual values of the output parameter are recovered by inverse normalization of the predicted outputs. Figure 3 shows the fitted curve of the BP neural network model. Figure 3 indicates that only a small number of predicted values deviate from the experimental results, and the correlation R of the fitted values of the sample training set, sample validation …

Using the 3843 sets of 2022 college student physical fitness test composite scores in Table 1 as base data, height (x1), weight (x2), lung capacity score (x3), 50-meter run score (x4), standing long jump score (x5), seated forward bend score (x6), 1000-meter run score (x7), and pull-up score (x8) were taken as the independent variables of the multiple linear regression model and the composite score as the dependent variable. Substituting the independent and dependent variables into Eq. (3) and solving yields the following multiple linear regression equation:
To compare the prediction accuracy of the multiple linear regression model and the BP neural network model on the college physical fitness test, the randomly selected BP neural network validation data were substituted into the multiple linear regression model to predict the composite physical fitness test scores. Figure 4 shows the composite physical fitness test scores of college students predicted by the BP neural network model and the multiple linear regression model. To further compare the accuracy of the two models, the prediction error and error percentage of each model are plotted. As Figure 5 shows, the errors of the scores predicted by the multiple linear regression model lie in the interval [0.02, 3.75], with an average error of 1.23 points and an average error percentage of 1.84%. The errors of the scores predicted by the BP neural network model lie in [0.02, 1.96], with an average error of 0.51 points and an average error percentage of 0.74%. Both models therefore predict the overall physical fitness performance of college students accurately, and the BP neural network model achieves higher prediction accuracy.
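The error statistics reported above (error interval, average error, average error percentage) can be computed as sketched below. The paper does not define its error measures explicitly; this sketch assumes the error is the absolute difference between predicted and actual composite scores and the error percentage is that difference divided by the actual score. The function name and the example scores are hypothetical.

```python
import numpy as np


def error_summary(actual, predicted) -> dict:
    """Absolute prediction errors and percentage errors relative to the actual score."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    abs_err = np.abs(predicted - actual)
    pct_err = 100.0 * abs_err / np.abs(actual)  # assumes no actual score is zero
    return {
        "error_interval": (float(abs_err.min()), float(abs_err.max())),
        "mean_error": float(abs_err.mean()),
        "mean_pct_error": float(pct_err.mean()),
    }


# Hypothetical composite scores, for illustration only.
actual = [70.0, 80.0, 50.0]
predicted = [71.4, 79.2, 50.5]
summary = error_summary(actual, predicted)
print(round(summary["mean_error"], 3), round(summary["mean_pct_error"], 3))  # 0.9 1.333
```

Applying the same function to the 43 held-out groups for each model would give directly comparable summaries for the two models.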

CONCLUSION
Physical education is an important way to improve the physical fitness of college students, and the physical fitness test is an important means for teachers to understand students' physical condition. Its results can help college physical education administrators design a scientific and reasonable curriculum and develop the most effective training mechanisms. By analyzing and processing historical physical fitness test data, predictions of physical fitness test scores can further help college physical education teachers develop more effective training programs, arrange teaching content reasonably, and guide students to exercise consciously, thereby comprehensively improving students' physical quality and supporting both the development of college sports and the cultivation of students' lifelong sports awareness. However, because the collected data are large and redundant, test items vary in type and requirements, auxiliary test equipment varies, and predictive data are used in different ways, the test data are not precise enough. Choosing the most appropriate machine learning algorithm for the specific modeling requirements can therefore solve such practical problems efficiently and yield more accurate predictions.
The mean error of the multiple linear regression model is 1.23 points with a mean error percentage of 1.84%, while the mean error of the BP neural network model is 0.51 points with a mean error percentage of 0.74%. For both models, the mean prediction error was less than 1.5 points and the mean error percentage was less than 2%.
Both the multiple linear regression model and the BP neural network model can be applied to predicting the comprehensive physical fitness scores of college students; their predicted scores are close to the actual scores, and the BP neural network model has higher prediction accuracy than the multiple linear regression model. The BP neural network can further be applied to online learning and prediction of college students' physical fitness, extending the application of this method.

Figure 2: Mean square error curve of BP neural network model

Figure 2 shows the mean square error curve obtained with the Levenberg-Marquardt algorithm; the x-axis indicates the number of training iterations and the y-axis the mean square error on each data set. As Figure 2 shows, the BP neural network converges after 36 training iterations, with a best validation mean square error of 0.0007453 and a training mean square error of 0.0003127, indicating that the BP neural network trains quickly and effectively.
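A compact sketch of backpropagation training that records the mean square error per iteration, analogous to the curve in Figure 2, is shown below. It rests on several simplifying assumptions: a single hidden layer of 8 tanh nodes (the size obtained from the empirical formula) rather than the 5-layer structure mentioned in the text, plain full-batch gradient descent instead of the Levenberg-Marquardt algorithm, a learning rate of 0.05 chosen for this toy example, and synthetic data already normalized to [-1, 1]. The stopping rule mirrors the stated settings (target error 1e-6, at most 2000 iterations). `BPNetwork` and `train` are hypothetical names.

```python
import numpy as np


class BPNetwork:
    """One-hidden-layer feedforward network trained by error backpropagation.

    Uses tanh hidden units, a linear output, and plain full-batch gradient
    descent on the mean squared error (not Levenberg-Marquardt).
    """

    def __init__(self, n_in: int, n_hidden: int, n_out: int, lr: float, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, n_out))
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def predict(self, X: np.ndarray) -> np.ndarray:
        return np.tanh(X @ self.W1 + self.b1) @ self.W2 + self.b2

    def train_epoch(self, X: np.ndarray, Y: np.ndarray) -> float:
        """One forward/backward pass over all samples; returns the MSE before the update."""
        H = np.tanh(X @ self.W1 + self.b1)            # hidden-layer activations
        err = H @ self.W2 + self.b2 - Y               # output error
        mse = float(np.mean(err ** 2))
        d_out = 2.0 * err / err.size                  # gradient of MSE w.r.t. output
        d_hid = (d_out @ self.W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        self.W2 -= self.lr * (H.T @ d_out)
        self.b2 -= self.lr * d_out.sum(axis=0)
        self.W1 -= self.lr * (X.T @ d_hid)
        self.b1 -= self.lr * d_hid.sum(axis=0)
        return mse


def train(net: BPNetwork, X: np.ndarray, Y: np.ndarray, max_epochs: int, goal: float) -> list[float]:
    """Train until the MSE falls below `goal` or `max_epochs` is reached."""
    history = []
    for _ in range(max_epochs):
        history.append(net.train_epoch(X, Y))
        if history[-1] < goal:
            break
    return history


# Synthetic normalized data: 8 inputs in [-1, 1], one output.
rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(500, 8))
Y = 0.5 * np.tanh(X @ rng.normal(size=8))[:, None]
net = BPNetwork(n_in=8, n_hidden=8, n_out=1, lr=0.05)
history = train(net, X, Y, max_epochs=2000, goal=1e-6)
print(f"iterations: {len(history)}, first MSE: {history[0]:.4f}, last MSE: {history[-1]:.6f}")
```

Switching to Levenberg-Marquardt would replace the gradient step with a damped Gauss-Newton update built from the Jacobian of the errors, which typically converges in far fewer iterations on small networks; it is omitted here for brevity.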

Figure 4: Comprehensive results of physical fitness test for college students

Figure 5: Error curves of predicted values from multiple linear regression model and BP neural network model: (a) error value, (b) error percentage

Table 1: Basic information about the 2022 college student sample