Spatial correlation

Functionality

Point statistics may help to get an impression of the nature of your point data, for instance prior to a point interpolation, and to find necessary input parameters for Kriging, Anisotropic Kriging or Universal Kriging. First, the distance and optionally the direction are calculated between the two points of every possible point pair; these distances and directions are also known as the separation vectors. Subsequently, autocorrelation, autovariance and experimental semi-variogram values are calculated from the values of those point pairs which fall within the same user-specified lag, i.e. the same distance (and direction) class.

Spatial autocorrelation measures dependence among nearby values in a spatial distribution. Variables may be correlated because they are affected by similar processes, or phenomena, that extend over a larger region. Odland (1988, p.7) mentions that spatial autocorrelation 'exists whenever a variable exhibits a regular pattern over space in which values at a certain set of locations depend on values of the same variable at other locations'. For example, if the concentration of a certain pollutant is very high at a certain location, it will most likely also be high in the direct surroundings. In other words, the concentration is autocorrelated at small distances. At larger distances, it is less likely that the concentration will be equally high. The correlation will probably be lower, and the variance higher.

By plotting the autocorrelation values against the distance classes, you can see up to which distance spatial autocorrelation exists between point pairs. This distance can be used as the limiting distance in point interpolations such as moving average and moving surface. Furthermore, you are encouraged to compare your data set with a data set that has the same point locations but attribute values created at random (using one of the RND functions in Table Calculation), approximately in the same range as the measured variable. If the graphs for the measured data and the random data are very similar, no spatial autocorrelation exists between the data points, and point interpolation is not useful. For more information, see Interpretation of Moran's I and Geary's c below.

Calculating a semi-variogram is a basic geostatistical technique for determining the rate of change of a regionalized variable with distance, optionally along a specific orientation. A semi-variogram value is defined as the sum of the squared differences between the values of point pairs separated by a certain distance, divided by two times the number of point pairs in that distance class. By plotting experimental semi-variogram values against distance classes in a graph, you obtain a semi-variogram. By finding a model or function which fits these experimental semi-variogram values, you can obtain necessary input information (such as model type, sill, range, and nugget) for a Simple or Ordinary Kriging, an Anisotropic Kriging or a Universal Kriging operation later on. For more information, see the Additional info on semi-variograms below.

Tip:

When you suspect anisotropy in your input data, you can first perform the Variogram Surface operation. The output of this operation will show you the direction of the anisotropy. You can then do Spatial Correlation using the bi-directional method.

General process of this operation (omni directional):

  1. First, the distances between all points are calculated.
  2. Then, distance classes are determined (output column Distance). This is usually done according to the user-specified lag spacing: in the output table, records will appear for each multiple of the user-specified lag spacing. When specifying a lag spacing of 500 m, the values in the Distance column in the output table will be 0, 500, 1000, 1500, etc.
    However, these values in the output column Distance represent the middle value of a distance class. Thus, for lag spacing 500, distance 500 represents the distance interval of 250-750 m, distance 1000 represents the distance interval of 750-1250 m, etc.
    When a variable was sampled at regular distances, you can use this distance for the lag spacing.
  3. Subsequently, for each distance class, the number of point pairs whose mutual distance falls within that class is counted.
    Thus, when the user-specified lag spacing is 500 m.:
  4. On the command line, you can also use a certain expression to obtain log-scaled distance classes.

  5. Then, for all the point pairs within a certain distance class, the following statistical values are calculated:

    The formula to calculate experimental semi-variogram values reads:

    $$\gamma(h) = \frac{\sum (z_i - z_{i+h})^2}{2n}$$

    where:

    $\gamma(h)$ = the experimental semi-variogram value of points that have a certain distance (h) towards each other

    $z_i$ = the value of point i

    $z_{i+h}$ = the value of a point at distance h from point i

    $\sum (z_i - z_{i+h})^2$ = the sum of the squared differences between point values of all point pairs within a certain distance class

    $n$ = the number of point pairs within a distance class


    For more information on formulas, see Spatial correlation : algorithm.
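The omni directional steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used by the operation: the function name `experimental_semivariogram` is hypothetical, distances are Euclidean, and a distance exactly on a class boundary (e.g. 750 for lag spacing 500) is assumed to fall into the higher class.

```python
import math
from itertools import combinations


def experimental_semivariogram(points, lag):
    """Return {class_middle: (n_pairs, semivar)} for a list of (x, y, z) points.

    Hypothetical helper: class k has middle value k * lag and is assumed to
    cover the half-open interval [(k - 0.5) * lag, (k + 0.5) * lag).
    """
    sq_diff_sums = {}
    pair_counts = {}
    for (x1, y1, z1), (x2, y2, z2) in combinations(points, 2):
        # Step 1: distance between the two points of the pair.
        d = math.hypot(x2 - x1, y2 - y1)
        # Step 2: index of the distance class whose middle value is k * lag.
        k = int(math.floor(d / lag + 0.5))
        # Step 3: count the pair and accumulate its squared value difference.
        sq_diff_sums[k] = sq_diff_sums.get(k, 0.0) + (z1 - z2) ** 2
        pair_counts[k] = pair_counts.get(k, 0) + 1
    # Semi-variogram value per class: sum of squared differences / (2 n).
    return {
        k * lag: (pair_counts[k], sq_diff_sums[k] / (2 * pair_counts[k]))
        for k in sorted(pair_counts)
    }


points = [(0, 0, 1.0), (500, 0, 3.0), (1000, 0, 2.0)]
for distance, (n_pairs, semivar) in experimental_semivariogram(points, 500).items():
    print(distance, n_pairs, semivar)
```

Distance classes that contain no point pairs are simply absent from the returned dictionary, which corresponds to undefined values in the output table.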

Methods:

In the dialog box, you can choose to use either the omni directional or the bi-directional method:

For both the omni directional and the bi-directional method, linear distance intervals are created where the middle values of these distance classes are multiples of the user-specified lag spacing.

To calculate experimental semi-variogram values in a certain direction, you thus have to use the bi-directional method. The parameters for the bi-directional method are schematically presented in Figure 1.

  

Fig. 1: Schematic explanation of the bi-directional method when experimental semi-variogram values are calculated for the specified direction as well as for the perpendicular direction. The user has to specify a Direction (blue angle) and a Tolerance (red angle), and optionally, also a Band width (green distance in meters) can be specified. These parameters are used to find valid point pairs. When an input point is located at the origin of this picture, it is calculated whether any other input point is within the specified direction, tolerance angle and band width. If this is the case, the 2 points are a valid point pair; otherwise the pair is ignored. For each valid point pair, the distance between the 2 points is calculated, and the point pair is counted in the appropriate distance class.
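The test for a valid point pair described in the caption of Figure 1 might look like the following sketch. Several details are assumptions that the text does not confirm: the direction is taken as an angle in degrees measured clockwise from the Y-axis (north), a pair is treated as undirected (a direction and its opposite are equivalent), the band width is the maximum perpendicular distance from the direction axis, and coincident points are rejected. The function name `is_valid_pair` is hypothetical.

```python
import math


def is_valid_pair(p, q, direction_deg, tolerance_deg, band_width=None):
    """Check whether q lies within direction, tolerance and band width of p.

    p and q are (x, y) tuples. All conventions here are assumptions made for
    illustration (see the lead-in text), not the operation's documented rules.
    """
    dx, dy = q[0] - p[0], q[1] - p[1]
    if dx == 0 and dy == 0:
        return False  # coincident points have no direction
    # Unit vector of the search direction, clockwise from north (+Y).
    a = math.radians(direction_deg)
    ux, uy = math.sin(a), math.cos(a)
    along = dx * ux + dy * uy          # component along the direction axis
    across = abs(dx * uy - dy * ux)    # perpendicular distance to the axis
    # Angle between the separation vector and the axis, ignoring orientation.
    angle = math.degrees(math.atan2(across, abs(along)))
    if angle > tolerance_deg:
        return False
    if band_width is not None and across > band_width:
        return False
    return True


print(is_valid_pair((0, 0), (0, 100), direction_deg=0, tolerance_deg=10))
print(is_valid_pair((0, 0), (100, 0), direction_deg=0, tolerance_deg=10))
```

For the perpendicular direction, the same test can be repeated with `direction_deg + 90`; valid pairs are then binned by distance exactly as in the omni directional case.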

Finally, from the command line, you can even use another method by which logarithmic distance intervals are used. The lag spacing increases with the distance.

Spherical distance:

Optionally, when using the 'omni directional' method, you can choose to calculate with spherical distances, i.e. distances calculated over the sphere using the projection that is specified in the coordinate system used by the input point map. It is advised to use this spherical distance option for maps that comprise large areas (countries or regions) and for maps that use LatLon coordinates. In more general terms, spherical distance should be used when there are 'large' scale differences within a map as a consequence of projecting the globe-shaped earth surface onto a plane.

When the spherical distance option is not used, distances will be calculated in a plane as Euclidean distances.
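To illustrate the difference between the two options, the sketch below compares a plane (Euclidean) distance with a great-circle distance computed by the haversine formula. It is an assumption-laden illustration only: the haversine formula on a sphere with a mean radius of 6371 km is a common textbook approach for LatLon coordinates and is not necessarily the calculation the operation performs for a given coordinate system. Both function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean earth radius in meters


def euclidean_distance(x1, y1, x2, y2):
    """Distance in a plane, in the units of the map coordinates."""
    return math.hypot(x2 - x1, y2 - y1)


def spherical_distance(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_M):
    """Great-circle distance in meters between two LatLon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(h))


# One degree of longitude spans fewer meters at 60 degrees north than at the
# equator, which a plane distance on raw LatLon coordinates cannot express.
print(spherical_distance(0, 0, 0, 1))
print(spherical_distance(60, 0, 60, 1))
print(euclidean_distance(0, 0, 1, 0))
```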

 

Tip: When you used the spherical distance option in the Spatial Correlation operation, you should also use the spherical distance option in a subsequent point interpolation operation, or in a subsequent Kriging operation.

Input map requirements:

The input point map should either be a value map itself, or a Class or ID point map which has a linked attribute table with one or more value columns.

Output table:

An output table with domain None is created.

When you use the option Omni directional, the output table will contain 6 columns:

When you use the option Bi-directional, the output table will contain 10 columns:

Mind:

When no point pairs are found in a distance interval, the values in columns I, c, AvgLag and SemiVar will be undefined for that distance interval.

Additional information: Semi-variograms

From the results of the Spatial correlation operation, you can make a semi-variogram. In the semi-variogram, the discrete experimental semi-variogram values that are the outcome of Spatial correlation are modeled by a continuous function. In this way, a semi-variogram value γ is available for any desired distance h (and optionally direction) in a Simple or Ordinary Kriging, an Anisotropic Kriging or a Universal Kriging operation later on.

How to display a semi-variogram:

Display the input table of the Spatial correlation operation or a histogram of an input value map in a table window.

Display the output table of the Spatial correlation operation in a table window. Inspect the following columns in the output table:

Create point graphs, i.e. experimental semi-variogram(s), from the Distance and SemiVar columns in the output table of Spatial Correlation.

In the literature, such a graph is called a discrete experimental semi-variogram.

Figure 2 below shows a semi-variogram depicting a spherical model:

Fig. 2: A semi-variogram depicting a spherical model.

Remarks on semi-variograms:

The next step, before Kriging, is to model the discrete values of your experimental semi-variogram by a continuous function which will give an expected value for any desired distance.

To find which semi-variogram model fits your experimental semi-variogram values best, you can also use the Column SemiVariogram operation. This operation calculates semi-variogram values according to a user-specified semi-variogram model and parameters and stores calculated semi-variogram values in an output column.
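As an illustration of what a semi-variogram model computes, the sketch below evaluates the commonly used spherical model for a given nugget, sill and range. It is a minimal sketch under assumptions: the function name `spherical_model` is hypothetical, the sill is taken to be the total plateau value (nugget included), and the value at distance 0 is defined as 0. Check the documentation of the Column SemiVariogram operation for the conventions it actually uses.

```python
def spherical_model(h, nugget, sill, range_):
    """Semi-variogram value of a spherical model at distance h.

    Assumed convention: `sill` is the plateau reached at h >= range_ and
    already includes the nugget; gamma(0) is 0.
    """
    if h <= 0:
        return 0.0
    if h >= range_:
        return float(sill)
    ratio = h / range_
    return nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio ** 3)


# Model values at the middle of each distance class, for comparison with the
# experimental values in the SemiVar column.
for distance in (0, 500, 1000, 1500, 2000):
    print(distance, spherical_model(distance, nugget=1.0, sill=5.0, range_=1000.0))
```

Comparing such model values with the experimental values per distance class, for instance in the same graph, is one way to judge whether a chosen combination of nugget, sill and range fits the data.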

Once you have decided which semi-variogram model, and which values for sill, range and nugget fit your data best, you can continue with the Simple or Ordinary Kriging operation, the Anisotropic Kriging operation or the Universal Kriging operation.

Interpretation of Moran's I and Geary's c

For all point pairs in a distance/direction class, you obtain a value for Moran's I and Geary's c; the formulae for these statistical measures can be found in the topic Spatial correlation : algorithm. Geary's c compares the squared differences of point pair values to the mean of all values. Moran's I relates the product of differences of point pair values to the overall difference.

The general interpretation of both statistics can be summarized as:

  

Geary's c      Interpretation                      Moran's I

0 < c < 1      Strong positive autocorrelation     I > 0

c > 1          Strong negative autocorrelation     I < 0

c = 1          Random distribution of values       I = 0

Geary's c multiplied by the variance of the input equals the semi-variogram values.
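The relation between Geary's c and the semi-variogram value can be checked numerically. The sketch below uses the standard textbook definitions of Moran's I and Geary's c with binary weights (a pair either belongs to the distance class or it does not); these are assumptions and may differ in detail from the formulae in Spatial correlation : algorithm. The function name `moran_geary` is hypothetical, and "variance" is taken to be the sample variance with denominator n - 1.

```python
def moran_geary(values, pairs):
    """Return (Moran's I, Geary's c, semi-variogram value) for one class.

    values: attribute values of all input points.
    pairs:  list of (i, j) index pairs that fall within the distance class.
    Uses textbook formulas with binary weights (an assumption).
    """
    n = len(values)
    mean = sum(values) / n
    ss = sum((z - mean) ** 2 for z in values)  # sum of squared deviations
    n_pairs = len(pairs)
    cross = sum((values[i] - mean) * (values[j] - mean) for i, j in pairs)
    sq_diff = sum((values[i] - values[j]) ** 2 for i, j in pairs)
    moran_i = n * cross / (n_pairs * ss)
    geary_c = (n - 1) * sq_diff / (2 * n_pairs * ss)
    semivar = sq_diff / (2 * n_pairs)
    return moran_i, geary_c, semivar


values = [1.0, 3.0, 2.0]
pairs = [(0, 1), (1, 2)]
moran_i, geary_c, semivar = moran_geary(values, pairs)

mean = sum(values) / len(values)
variance = sum((z - mean) ** 2 for z in values) / (len(values) - 1)
print(moran_i, geary_c, semivar, geary_c * variance)
```

With these definitions, `geary_c * variance` and `semivar` are the same quantity, because both reduce to the sum of squared differences divided by two times the number of point pairs.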

References:

See also: