# EnergyStatistics.jl

In statistics, distance correlation (or distance covariance) is a measure of dependence between two paired random vectors. The population distance correlation coefficient is zero if and only if the random vectors are independent. Thus, distance correlation measures both linear and nonlinear association between two random vectors. This is in contrast to Pearson's correlation, which can only detect linear association between two random variables. See the References section below for references and more details.

## Installation

This package can be installed using the Julia package manager. From the Julia REPL, type `]` to enter the Pkg REPL mode and run

`pkg> add EnergyStatistics`

## General Usage

Given two vectors `x` and `y`, the distance correlation `dcor` can be computed directly:

```
using EnergyStatistics
x = collect(-1:0.01:1)
y = map(x -> x^4 - x^2, x)
dcor(x, y) ≈ 0.374204050
```

These two vectors are clearly associated. However, their (Pearson) correlation coefficient vanishes, which taken alone could wrongly suggest that they are unrelated. The nonzero distance correlation `dcor` reveals their **non-linear** association.
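The vanishing Pearson coefficient can be checked with Julia's `Statistics` standard library. This is a minimal sketch that does not use EnergyStatistics itself; it only illustrates why a linear measure misses the association in this example.

```julia
using Statistics

x = collect(-1:0.01:1)
y = @. x^4 - x^2

# x is symmetric about zero and y is an even function of x, so the
# linear (Pearson) association cancels out up to floating-point rounding.
r = cor(x, y)

# Compare against zero with an absolute tolerance rather than exact equality.
pearson_vanishes = isapprox(r, 0; atol = 1e-8)
```

Comparing with a tolerance rather than `== 0` avoids a spurious failure from rounding error in the sums.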

Functions to compute the distance covariance `dcov` and the distance variance `dvar` are also supplied.

## Advanced Usage

Constructing a `DistanceMatrix` is computationally expensive. In particular, computing the `n(n-1)/2` pairwise distances for vectors of length `n` and subsequently centering the distance matrix take time and memory. When several distance correlations involve the same vector, the intermediate results of the distance computation and centering can be kept and reused. For example:

```
Dx = EnergyStatistics.dcenter!(EnergyStatistics.DistanceMatrix(x))
Dy = EnergyStatistics.dcenter!(EnergyStatistics.DistanceMatrix(y))
Dz = EnergyStatistics.dcenter!(EnergyStatistics.DistanceMatrix(z))
dcor_xy = dcor(Dx, Dy)
dcor_xz = dcor(Dx, Dz)
```

will run faster than

```
dcor_xy = dcor(x, y)
dcor_xz = dcor(x, z)
```

since the distance matrix `Dx` for the vector `x` is only computed once.
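To make the cost structure concrete, the following is a plain-Julia sketch of the textbook computation: pairwise distances, double centering, then the correlation of the centered matrices. It is not the package's implementation. All function names are hypothetical, `z` is a made-up third sample, and it is an assumption that the package uses the same normalization.

```julia
using Statistics

# Pairwise absolute distances |x_i - x_j|: an n×n symmetric matrix, zero diagonal.
pairwise_abs(x) = [abs(a - b) for a in x, b in x]

# Double centering: subtract row means and column means, add back the grand mean.
double_center(D) = D .- mean(D, dims = 1) .- mean(D, dims = 2) .+ mean(D)

# Sample distance covariance of two double-centered matrices: square root of
# the mean elementwise product (clamped at zero to guard against rounding).
dcov_sketch(A, B) = sqrt(max(mean(A .* B), 0.0))

# Distance correlation: distance covariance scaled by both distance variances.
function dcor_sketch(A, B)
    denom = sqrt(dcov_sketch(A, A) * dcov_sketch(B, B))
    return denom == 0 ? 0.0 : dcov_sketch(A, B) / denom
end

x = collect(-1:0.01:1)
y = @. x^4 - x^2
z = @. x^2                              # hypothetical third sample

Ax = double_center(pairwise_abs(x))     # the O(n^2) work for x happens once
Ay = double_center(pairwise_abs(y))
Az = double_center(pairwise_abs(z))

r_xy = dcor_sketch(Ax, Ay)              # reuses Ax
r_xz = dcor_sketch(Ax, Az)              # reuses Ax again
```

Both the distance matrix and its centered version need `n^2` entries, which is why reusing `Ax` across several correlations pays off.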

You can also construct distance matrices using distance measures other than the (default) `abs`.

`AA = EnergyStatistics.dcenter!(EnergyStatistics.DistanceMatrix(Float64, x, abs2))`

Instead of double centering via `dcenter!`, one may also use U-centering via the `ucenter!` function.
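For readers unfamiliar with U-centering, here is a minimal plain-Julia sketch of the usual definition from the distance correlation literature. The function `u_center_sketch` is hypothetical and returns a new matrix rather than working in place; it is an assumption that `ucenter!` follows the same formula.

```julia
# U-centering of a symmetric distance matrix D with zero diagonal.
# Off-diagonal entries: D[i,j] - rowsum_i/(n-2) - colsum_j/(n-2) + total/((n-1)(n-2)).
# Diagonal entries are set to zero.
function u_center_sketch(D::AbstractMatrix{<:Real})
    n = size(D, 1)
    n > 2 || throw(ArgumentError("U-centering needs at least 3 observations"))
    rowsums = sum(D, dims = 2)
    colsums = sum(D, dims = 1)
    total = sum(D)
    U = D .- rowsums ./ (n - 2) .- colsums ./ (n - 2) .+ total / ((n - 1) * (n - 2))
    for i in 1:n
        U[i, i] = 0.0
    end
    return U
end

x = [1.0, 2.0, 4.0, 7.0]
D = [abs(a - b) for a in x, b in x]
U = u_center_sketch(D)
```

With this definition every row and column of `U` sums to zero, which is a quick sanity check on the result.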

## References

See the Wikipedia page on distance correlation for references and more details.

## Functions

`EnergyStatistics.dcor` (Method)

`dcor(x::AbstractVector{T}, y::AbstractVector{T}) where T <: Real`

Computes the distance correlation of samples `x` and `y`.

```
using EnergyStatistics
x = collect(-1:0.01:1)
y = @. x^4 - x^2
dcor(x, y)
# output
0.3742040504583155
```

`EnergyStatistics.dcov` (Method)

`dcov(x::AbstractVector{T}, y::AbstractVector{T}) where T <: Real`

Computes the distance covariance of samples `x` and `y`.

`EnergyStatistics.dvar` (Method)

`dvar(x::AbstractVector{T}) where T <: Real`

Computes the distance variance of a sample `x`.

`EnergyStatistics.DistanceMatrix` (Method)

`DistanceMatrix(x::AbstractVector{T}, dist = abs) where {T}`

Computes the matrix of pairwise distances of `x`. The distance measure `dist` is `abs` by default.

```
using EnergyStatistics
x = [1.0, 2.0]
EnergyStatistics.DistanceMatrix(x)
# output
2×2 EnergyStatistics.DistanceMatrix{Float64}:
0.0 1.0
1.0 0.0
```

`EnergyStatistics.dcenter!` (Method)

`dcenter!(A::DistanceMatrix{T}) where {T <: Real}`

Computes the double-centered matrix of `A` in place.

`EnergyStatistics.ucenter!` (Method)

`ucenter!(A::DistanceMatrix{T}) where {T <: Real}`

Computes the U-centered matrix of `A` in place.

`EnergyStatistics.dcor` (Method)

`dcor(A::DistanceMatrix{T}, B::DistanceMatrix{T})`

Computes the distance correlation of two centered `DistanceMatrix` objects `A` and `B`.

`EnergyStatistics.dcov` (Method)

`dcov(A::DistanceMatrix{T}, B::DistanceMatrix{T}) where {T <: Real}`

Computes the distance covariance of two centered `DistanceMatrix` objects `A` and `B`.

`EnergyStatistics.dvar` (Method)

`dvar(A::DistanceMatrix{T}) where {T <: Real}`

Computes the distance variance of a centered `DistanceMatrix` `A`. Stores the variance alongside the `DistanceMatrix` for future use.