Title: | 台灣重點城市房屋價格之研究 |
Alternative title: | A Study of Housing Price of Topic Cities in Taiwan |
Authors: | 陳瑞龍 廖烜唯 劉芸均 吳雅莉 陳軍翰 吳育成 曾譯賢 |
Keywords: | housing price prediction; regression analysis; cross-validation; decision tree; penalized regression |
Department/Unit: | Department of Statistics |
Abstract: | This study examines housing prices in the cities that have recently been the focus of development in Taiwan. To build a housing price prediction model, it considers factors from several perspectives: average annual income, population growth rate, youth ratio, number of unsold houses, and distance to the nearest industrial park. The data are from 2017. The main novelty is the distance-to-nearest-industrial-park factor, since little of the earlier literature has examined how industrial parks relate to housing prices. Housing prices are modeled with multiple regression; cross-validation (leave-one-out CV) is used to select the best model among the different factor combinations, and penalized regression with the SCAD penalty is used to corroborate that choice. The selected model is then compared with the best models obtained from logistic regression and from a decision tree. The results are as follows. 1. Multiple regression: the model consisting of average annual income, youth ratio, and distance to the nearest industrial park performs best, with an adjusted R-squared of about 0.65. The parameter estimates show that average annual income and youth ratio are positively related to housing price, while distance to the nearest industrial park is negatively related to it, as expected. 2. Logistic regression: the best model differs from the one selected by multiple regression, but the difference in accuracy is small. 3. Decision tree: with housing prices divided into three classes, the best model again differs from the one selected by multiple regression, but the gap in accuracy is less than 1%, and the best multiple regression model makes fewer major errors in the confusion matrix. Taken together, the three sets of results indicate that the combination of average annual income, youth ratio, and distance to the nearest industrial park gives the most stable housing price predictions. |
Date: | 2021-04-23T04:15:17Z |
Academic year: | Academic year 109 (2020-2021), first semester |
Instructor: | 王价輝 |
Course: | Statistics Project (I) |
Department: | Department of Statistics |
Category: | Business, academic year 109 |
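
The model selection step described in the abstract (fitting a multiple regression for each combination of factors and keeping the one with the lowest leave-one-out cross-validation error) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the column names in `FACTORS`, the helper functions `loocv_mse` and `best_subset`, and the data are all assumptions, and the data are synthetic rather than the 2017 figures used in the study.

```python
from itertools import combinations

import numpy as np

# Hypothetical column names standing in for the five factors in the abstract.
FACTORS = ["income", "pop_growth", "youth_ratio", "unsold_houses", "park_distance"]


def loocv_mse(X, y):
    """Leave-one-out CV mean squared error of an OLS fit with an intercept."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    squared_errors = []
    for i in range(n):
        keep = np.arange(n) != i  # hold out observation i
        beta, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        squared_errors.append((y[i] - design[i] @ beta) ** 2)
    return float(np.mean(squared_errors))


def best_subset(X, y, names):
    """Score every non-empty subset of columns; return (names, LOOCV MSE) of the best."""
    best = None
    for k in range(1, len(names) + 1):
        for cols in combinations(range(len(names)), k):
            mse = loocv_mse(X[:, list(cols)], y)
            if best is None or mse < best[1]:
                best = (tuple(names[c] for c in cols), mse)
    return best


# Synthetic data only (an assumption for illustration): the response depends on
# columns 0, 2 and 4, with signs chosen to mimic the directions in the abstract.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = 2.0 * X[:, 0] + 1.5 * X[:, 2] - 1.0 * X[:, 4] + rng.normal(scale=0.5, size=60)

chosen, score = best_subset(X, y, FACTORS)
print(chosen, round(score, 3))
```

With five candidate factors there are only 31 non-empty subsets, so an exhaustive search is cheap; the SCAD-penalized fit mentioned in the abstract is a separate check and is not reproduced here.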
Files in this item:
File | Description | Size | Format |
---|---|---|---|
D0680416109102.pdf | | 3.59 MB | Adobe PDF |