Geographically and temporally weighted regression for modeling spatio-temporal variation in house prices
Bo Huang, Bo Wu & Michael Barry
Abstract
By incorporating temporal effects into the geographically weighted regression (GWR) model, an extended GWR model, geographically and temporally weighted regression (GTWR), has been developed to deal with both spatial and temporal nonstationarity simultaneously in real estate market data. Unlike the standard GWR model, GTWR integrates both temporal and spatial information in the weighting matrices to capture spatial and temporal heterogeneity. The GTWR design embodies a local weighting scheme wherein GWR and temporally weighted regression (TWR) become special cases of GTWR. In order to test its improved performance, GTWR was compared with global ordinary least squares, TWR, and GWR in terms of goodness-of-fit and other statistical measures using a case study of residential housing sales in the city of Calgary, Canada, from 2002 to 2004. The results showed that there were substantial benefits in modeling both spatial and temporal nonstationarity simultaneously. In the test sample, the TWR, GWR, and GTWR models, respectively, reduced absolute errors by 3.5%, 31.5%, and 46.4% relative to a global ordinary least squares model. More impressively, the GTWR model demonstrated a better goodness-of-fit (0.9282) than the TWR model (0.7794) and the GWR model (0.8897). McNamara’s test supported...
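The abstract says GTWR puts both spatial and temporal information into the weighting matrices. The C# sketch below makes that idea concrete. It assumes the Gaussian-kernel form commonly used for GTWR: a squared spatio-temporal distance (d_ST)^2 = lambda * [(u_i - u_j)^2 + (v_i - v_j)^2] + mu * (t_i - t_j)^2 and a weight W_ij = exp(-(d_ST)^2 / h^2). The class and method names are illustrative, and so are the scale factors lambda and mu and the bandwidth h. In practice these would be calibrated, for example by cross-validation, rather than fixed by hand.

```csharp
using System;

// Illustrative GTWR-style weights (hypothetical helper, not a library API).
// Assumes (d_ST)^2 = lambda*((u_i-u_j)^2 + (v_i-v_j)^2) + mu*(t_i-t_j)^2
// and a Gaussian kernel W_ij = exp(-(d_ST)^2 / h^2).
public static class GtwrWeights
{
    // Squared spatio-temporal distance between observations i and j.
    public static double SquaredStDistance(
        double ui, double vi, double ti,
        double uj, double vj, double tj,
        double lambda, double mu)
    {
        double du = ui - uj, dv = vi - vj, dt = ti - tj;
        return lambda * (du * du + dv * dv) + mu * dt * dt;
    }

    // Gaussian kernel weight for bandwidth h (must be positive).
    public static double Weight(double squaredDistance, double h)
    {
        if (h <= 0) throw new ArgumentOutOfRangeException(nameof(h));
        return Math.Exp(-squaredDistance / (h * h));
    }

    // Diagonal of the weight matrix W(u_i, v_i, t_i) for regression point i.
    public static double[] WeightsFor(int i, double[] u, double[] v, double[] t,
                                      double lambda, double mu, double h)
    {
        var w = new double[u.Length];
        for (int j = 0; j < u.Length; j++)
        {
            double d2 = SquaredStDistance(u[i], v[i], t[i], u[j], v[j], t[j], lambda, mu);
            w[j] = Weight(d2, h);
        }
        return w;
    }
}
```

With mu = 0 the weights depend only on location, which is the GWR case. With lambda = 0 they depend only on time, which is the TWR case. This mirrors how the abstract describes GWR and TWR as special cases of GTWR.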
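.NET: the 2 GB size limit on a single object
Where the problem came from
Recently I have been writing regression models in .NET using the Math.NET Numerics matrix library.
Environment: Windows 10 (8 GB RAM), .NET 4.5, x64 build.
Taking ordinary least squares as an example, the hat matrix H = X(X^T X)^(-1) X^T has to be computed: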
var hatmatrix = x_normalize.Multiply(x_normalize.Transpose().Multiply(x_normalize).Inverse()).Multiply(x_normalize.Transpose());
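Once the number of samples reaches 20,000, hatmatrix becomes 20,000 × 20,000. That single object exceeds 2 GB, and an out-of-memory exception (OutOfMemoryException) is thrown.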
Allocating a 20,000 × 20,000 array of doubles directly,
double[] array = new double[20000*20000];
also fails with an out-of-memory error.
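The arithmetic behind the failure is easy to check. The C# sketch below assumes 8-byte doubles stored in one contiguous array, and its class and method names are illustrative rather than part of Math.NET. 20,000 × 20,000 = 400,000,000 elements, and at 8 bytes each that comes to 3,200,000,000 bytes, about 1.5 times the 2^31-byte (2 GB) ceiling. The element count itself is well below Int32.MaxValue, so the problem here is total size, not array length.

```csharp
using System;

// Back-of-the-envelope estimate for a dense n x n matrix of doubles,
// assumed to be held in one contiguous array.
public static class HatMatrixMemory
{
    // 2 GB = 2^31 bytes, the default per-object ceiling.
    public const long TwoGb = 2L * 1024 * 1024 * 1024;

    // Bytes for n * n doubles at 8 bytes each; the small array header is ignored.
    public static long DenseBytes(long n) => n * n * sizeof(double);

    // True if the raw element data alone is larger than 2 GB.
    public static bool ExceedsTwoGb(long n) => DenseBytes(n) > TwoGb;

    // Number of elements, for comparison with the per-array length limits.
    public static long ElementCount(long n) => n * n;
}
```

For example, DenseBytes(20000) gives 3,200,000,000 and ExceedsTwoGb(10000) is false (800,000,000 bytes).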
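Solution
This is a default limit from the original design of .NET: a single object may not exceed 2 GB, regardless of platform or installed memory.
The default stood for about ten years. Starting with .NET 4.5, applications running as 64-bit (x64) processes can opt in to arrays larger than 2 GB through configuration.
Add the following to the App.config file: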
<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true"/>
  </runtime>
</configuration>
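For details, see https://bhrnjica.net/2012/07/22/with-net-4-5-10-years-memory-limit-of-2-gb-is-over/
Bottleneck
Updated 2017-11-2
Although the setting above lifts the 2 GB limit on a single object, the maximum length of an array is still capped.
Under .NET 4.5, a byte array can hold at most 2,147,483,591 elements. For other element types, the length cannot exceed 2,146,435,071.
This is also stated on the same MSDN page that describes lifting the 2 GB limit. I had not read far enough down the page the first time.
The documentation says: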
Using this element in your application configuration file enables arrays that are larger than 2 GB in size, but does not change other limits on object size or array size:
- The maximum number of elements in an array is UInt32.MaxValue.
- The maximum index in any single dimension is 2,147,483,591 (0x7FFFFFC7) for byte arrays and arrays of single-byte structures, and 2,146,435,071 (0X7FEFFFFF) for other types.
- The maximum size for strings and other non-array objects is unchanged.
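These limits can be turned into a quick feasibility check. The C# sketch below hard-codes the two per-dimension limits quoted above. It treats an element size of 1 as "byte arrays and arrays of single-byte structures" and also computes the largest square matrix of doubles that fits in one array. The names are illustrative.

```csharp
using System;

// Per-dimension limits quoted from the gcAllowVeryLargeObjects documentation.
public static class ArrayLimits
{
    public const long MaxByteArrayLength = 0x7FFFFFC7;  // 2,147,483,591
    public const long MaxOtherArrayLength = 0x7FEFFFFF; // 2,146,435,071

    // True if a one-dimensional array of `length` elements is within the limit
    // (elementSize 1 = byte arrays and single-byte structures).
    public static bool LengthAllowed(long length, int elementSize) =>
        length >= 0 &&
        length <= (elementSize == 1 ? MaxByteArrayLength : MaxOtherArrayLength);

    // Largest n such that n * n <= maxLength, i.e. the biggest square matrix
    // that fits in a single array of that maximum length.
    public static long MaxSquareSide(long maxLength)
    {
        long n = (long)Math.Sqrt(maxLength);
        // Correct for any floating-point rounding in the square root.
        while (n * n > maxLength) n--;
        while ((n + 1) * (n + 1) <= maxLength) n++;
        return n;
    }
}
```

MaxSquareSide(MaxOtherArrayLength) is 46,329. So even with gcAllowVeryLargeObjects, a dense double matrix stored in one array tops out at 46,329 × 46,329. That is about 17 GB of data, already well beyond the 8 GB machine above.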
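Literature review: GWR in PM2.5 analysis
New idea
New model: GTWR + elevation
Application: hourly PM2.5 data for Ningbo
Extent: 100 km × 100 km
Independent variables:
AOD: aerosol optical depth
TEMP: temperature
RH: relative humidity
WS: wind speed
HPBL: planetary boundary layer height (absent from some models)
LANDCOVER: land cover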
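Written out, the "GTWR + elevation" idea with the factors above could take the form below. This follows the GTWR structure, in which every coefficient varies with location (u_i, v_i) and time t_i. It is a sketch only. ELEV is a hypothetical name for the elevation term, and HPBL may be dropped where it is unavailable. W(u_i, v_i, t_i) is the diagonal matrix of spatio-temporal kernel weights between point i and every observation.

```latex
\begin{aligned}
\mathrm{PM2.5}_i ={}& \beta_0(u_i,v_i,t_i) + \beta_1(u_i,v_i,t_i)\,\mathrm{AOD}_i + \beta_2(u_i,v_i,t_i)\,\mathrm{TEMP}_i \\
&+ \beta_3(u_i,v_i,t_i)\,\mathrm{RH}_i + \beta_4(u_i,v_i,t_i)\,\mathrm{WS}_i + \beta_5(u_i,v_i,t_i)\,\mathrm{HPBL}_i \\
&+ \beta_6(u_i,v_i,t_i)\,\mathrm{LANDCOVER}_i + \beta_7(u_i,v_i,t_i)\,\mathrm{ELEV}_i + \varepsilon_i \\[4pt]
\hat{\boldsymbol{\beta}}(u_i,v_i,t_i) ={}& \left(X^{\top} W(u_i,v_i,t_i)\, X\right)^{-1} X^{\top} W(u_i,v_i,t_i)\, y
\end{aligned}
```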
Data sources
AOD:
- MODIS: Collection 5.1 MODIS Dark Target Level 2 aerosol retrieval product, https://ladsweb.modaps.eosdis.nasa.gov/ , using the data field Optical_Depth_Land_And_Ocean with the best quality assurance (AOD Qualityflag = 3)
- Spatial resolution 10 km, later interpolated to 5 km by kriging
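Meteorological data:
- NARR (North American Regional Reanalysis, National Centers for Environmental Prediction): http://www.emc.ncep.noaa.gov/mmb/rreanl/ . Based on the Eta model, spatial resolution 32 km. Provides boundary layer height, relative humidity, air temperature, and wind speed at 3-hour intervals. Daily values are the average of the 3-hourly values between 10:00 and 16:00.
- NLDAS (Phase 2): http://ldas.gsfc.nasa.gov/nldas/ . Spatial resolution 1/8° (about 13 km), temporal resolution 1 hour; each variable is corrected for elevation. Daily values are the average of the hourly values between 10:00 and 16:00. It seems to cover only North America, and it has no boundary layer height.
- The final model takes meteorological parameters from both sources (mixed).
- Interpolation of meteorological data: thin plate spline spatial interpolation (to 1 km)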
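Both meteorological sources are reduced to one value per day by averaging the records between 10:00 and 16:00. The C# sketch below shows that step under two assumptions. First, each record already carries a local-time timestamp; reanalysis products are typically distributed with UTC timestamps, so a time-zone conversion would normally come first. Second, both endpoints are included, since the note does not say whether 16:00 counts. `DailyMean.WindowMean` is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DailyMean
{
    // Mean of the values whose hour of day lies in [startHour, endHour], inclusive.
    // Works for both 3-hourly (NARR-style) and hourly (NLDAS-style) records of one day.
    public static double WindowMean(IEnumerable<(DateTime Time, double Value)> records,
                                    int startHour = 10, int endHour = 16)
    {
        var inWindow = records
            .Where(r => r.Time.Hour >= startHour && r.Time.Hour <= endHour)
            .Select(r => r.Value)
            .ToList();
        if (inWindow.Count == 0)
            throw new InvalidOperationException("No records between the given hours.");
        return inWindow.Average();
    }
}
```

For 3-hourly data at 00, 03, ..., 21, only the 12:00 and 15:00 records fall in the window. For hourly data, seven records (10:00 through 16:00) are averaged.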
Land use data:
- Obtain a land cover dataset: the 30 m US land cover classification for 2001, http://www.epa.gov/mrlc/nlcd-2001
- Extract vegetation from it (vegetation = 1, non-vegetation = 0)
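Extracting vegetation amounts to reclassifying land-cover codes into a 0/1 layer. The C# sketch below assumes the NLCD 2001 class codes are available as integers. Which classes count as vegetation is a modeling decision the note does not state. The set used here is a hypothetical choice: forest 41-43, shrub/scrub 52, grassland 71, pasture 81, crops 82, and wetlands 90 and 95. `VegetationMask` is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VegetationMask
{
    // Hypothetical set of NLCD 2001 classes treated as vegetation:
    // 41-43 forest, 52 shrub/scrub, 71 grassland/herbaceous,
    // 81 pasture/hay, 82 cultivated crops, 90 and 95 wetlands.
    public static readonly HashSet<int> VegetationClasses =
        new HashSet<int> { 41, 42, 43, 52, 71, 81, 82, 90, 95 };

    // Map each land-cover code to 1 (vegetation) or 0 (non-vegetation).
    public static byte[] ToBinary(int[] landCoverCodes) =>
        landCoverCodes
            .Select(c => VegetationClasses.Contains(c) ? (byte)1 : (byte)0)
            .ToArray();
}
```

Applied cell by cell to a raster band, this yields the vegetation = 1 / non-vegetation = 0 layer described above.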
Literature review
Paper | Journal | Year | Model | Parameters | Methods |
---|---|---|---|---|---|
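High-Resolution Satellite Mapping of Fine Particulates Based on Geographically Weighted Regression | IEEE GEOSCIENCE AND REMOTE SENSING LETTERS | 2016 | GWR compared with OLS | Region: Beijing-Tianjin-Hebei; monitoring sites: 52, hourly (http://cnemc.cn/); AOD: computed with the SARA model; land use: annual, computed within a 1 km buffer around each monitoring site (industrial area density; population density, annual; major road length; density of specific land types such as green space and water); meteorological: annual, 13 stations, values assigned to monitoring sites by nearest neighbor (temperature, humidity, air pressure); a 1 km grid was built for the analysis | Discusses the SARA model and GWR/OLS; maps predicted PM2.5 over the whole region; runs examples at 1, 3, and 10 km scales and compares them |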
A Review on Predicting Ground PM2.5 Concentration Using Satellite Aerosol Optical Depth | atmosphere | ...