Euclidean Distance (Points)

seadoggie01 · March 29, 2020

Recently I was reading this article about "K-nearest neighbors from scratch" (for school), and I was interested in implementing this in AutoIt... so I needed this piece before I started. It was a beast. Please let me know if you have any suggestions/improvements.

The method may seem a bit backwards, but I wanted to be able to use this for an array of any number of dimensions... which was an interesting challenge. It's based around using Execute to return the value of a particular point in an array.

; #FUNCTION# ====================================================================================================================
; Name ..........: _Point_EuclideanDistance
; Description ...: Gets the Euclidean distance between two points (represented as an array like [x, y, z, ..., n]).
; Syntax ........: _Point_EuclideanDistance($aPoint1, $aData2)
; Parameters ....: $aPoint1              - an array of 1 dimensional data.
;                  $aData2              - an array of 1 dimensional data.
; Return values .: Success - the distance between the two points
;                  Failure - False and sets @error:
;                  |1 - array is not 1-D. @extended is set to the argument number.
;                  |2 - the arrays are not the same size.
; Author ........: Seadoggie01
; Modified ......: 3/29/2020
; Remarks .......: The arrays MUST contain only numerical data. This is not verified in the function.
;                  Special thanks to jchd and RTFC for the mathematical explanations on dimensional measurements!
; Related .......:
; Link ..........:
; Example .......: No
; ===============================================================================================================================
Func _Point_EuclideanDistance($aPoint1, $aData2)

    If Not (UBound($aPoint1, 0) = 1) Then Return SetError(1, 1, False)
    If Not (UBound($aData2, 0) = 1) Then Return SetError(1, 2, False)

    If Not (UBound($aPoint1) = UBound($aData2)) Then Return SetError(2, 0, False)

    Local $iSum = 0

    For $i=0 To UBound($aPoint1) - 1
        $iSum += ($aPoint1[$i] - $aData2[$i])^2
    Next

    Return Sqrt($iSum)

EndFunc

Edited March 29, 2020 by seadoggie01
Updated code to use points ;)

jchd · March 29, 2020

Sorry to jump in your thread but there's something I don't understand.

The Euclidean distance (also called L² distance) is the measure of the distance between two points in Euclidean space of dimension n, that is from vectors in ℝⁿ.

You seem to want to extend this notion of distance to matrices (hence of dimension >= 2, that is between elements of ℝᵏ × ℝⁿ) and I'm unsure the result you get here has the same nice properties as the Euclidean distance in ℝⁿ.

See for instance https://math.stackexchange.com/questions/507742/distance-similarity-between-two-matrices

seadoggie01 · March 29, 2020

Actually, I'm not 100% sure myself. You might be right in that I've implemented this too far. I was honestly just following the code from the link I posted and thought going above to more dimensions would be cool. I didn't think about how matrices would work, I barely understand them as is

"Don't ask me, I just wrote the code 😆"

Edit: And please don't be sorry, I'm just happy someone is interested!

Edited March 29, 2020 by seadoggie01

RTFC · March 29, 2020

A matrix is two-dimensional by definition; higher-dimensional = tensor. Tensor norms (e.g. the Frobenius norm) exist, but are defined differently (see for example here). For matrices, a faster implementation would be:

#include "Eigen4AutoIt.au3"

_Eigen_StartUp()

$rows=123
$cols=234
$matA = _Eigen_CreateMatrix_Random ( $rows, $cols )
$matB = _Eigen_CreateMatrix_Random ( $rows, $cols  )

$specs = ( _Eigen_MatrixSpecs ( _Eigen_CwiseBinaryOp ( $matA, "-", $matB )))

_ArrayDelete($specs,"0-5;10-32")
_ArrayDisplay($specs,"norms")

_Eigen_CleanUp()

The Frobenius norm is the first specs entry (after deleting the other specs).

Edited March 29, 2020 by RTFC
clarification

seadoggie01 · March 29, 2020

7 hours ago, RTFC said:

<Math mumbo jumbo>

I bow to your superiority. But seriously though, I've only taken Calculus and don't understand that at all. I'm a visual person and can barely think about data in 3 dimensions.

After rethinking about this function, it seems to make sense that I should really only be accepting 2 1-D arrays that represent points in an unlimited dimensional space to get the distance between.

RTFC · March 29, 2020

49 minutes ago, seadoggie01 said:

I bow to your superiority.

Please don't.

49 minutes ago, seadoggie01 said:

I've only taken Calculus and don't understand that at all.

No calculus is required, yippeee. :cheer:

You can think of "distance" as: 1. how far is point A from point B in some space with any number of dimensions, and 2. how dissimilar (distant from one another) are two mathematical entities, be it a scalar, a vector, a matrix, or a tensor. There are various so-called "norms" for expressing this, of which the Euclidean norm/distance is one. The problem with high-dimensional entities is that they can be close in some respects and distant/dissimilar in others, so it's not always easy to figure out which method to use for quantifying this (that is, which norm is most appropriate in that particular context). But I think it's great that you're exploring this topic. :thumbsup:

Edited March 29, 2020 by RTFC

jchd · March 29, 2020

I second what @RTFC said.

Depending on the nature of the problem you have to solve, some norm may be completely misleading or unsignificant, while another (more or less unintuitive) will provide enlightning results. It often boils down to which mathematical property your problem requires or implies, which some norm(s) satisfy while other(s) destroy. Digging further into this for the general case needs non-trivial background as you can imagine.

Clearly, stepping into higher dimensions makes things less simple. The distance between two points in 1D, 2D, 3D, ... nD is pretty intuitive. But even in as low as 2D, deciding what means the "distance" between two closed curves (say shadows projected on a paper of two different patatoes) is something open to very different interpretations. Is the distance between the closest points of the two curves more meaningful for your problem than the distance of their center of gravity, or distance between the most distant points, or whatelse? Just with this example you can see the answer can't be unique and fit every problem involving two closed curves in 2D plane.

Going into higher dimensions generally breaks down intuition because "usual" (everyday life) properties don't hold anymore. Even stepping from 2D to 3D can be tricky: for instance, rotations in 3D aren't commutative anymore (the result depends on the order of applying two or more of them), but they are in 2D.

Nonetheless you can very often rely on results published in the huge freely available academic litterature, as well as practical implementations easy to use provided by computational packages like the one @RTFC has wrapped into his useful UDF.

seadoggie01 · March 29, 2020

@RTFC Wasn't actually bowing, just kidding And I think I see what you are getting at... and how @jchd explained it kind of helps. I think what you're gently and kindly trying to say (and I appreciate that!) is that you can't exactly calculate the distance between two non-similar entities using this method, and that completely makes sense. (Potato example was incredibly helpful!)

Just for kicks and giggles I tested my function on two sets of points from two cubes of size 1 that were directly next to each other. It returned the square of 8, where I was expecting 1, which proves both of your points. Each of the points I passed were 1 unit away from their counterparts, so the total was 8, which it took the square of. Then I realized that the distance would be the distance between two the center points (assuming equal objects).

So I should probably rename this to something like point euclidean distance and reign the dimensions back down

Thank you for the explanations!

Edit: Also, jchd, your first post makes total sense now that I re-read it

Edited March 29, 2020 by seadoggie01

RTFC · March 30, 2020

11 hours ago, seadoggie01 said:

just kidding

Me too. And apologies for the mumbo jumbo.

Have you ever read "Flatland" by E. Abbott? It's about a sphere that visits a 2D world, appearing and disappearing as it briefly intersects that plane, thereby disturbing the complacent existence of A. Square (who eventually gets locked up in an insane asylum). Higher-dimensional entities have features that remain either completely hidden or are only perceived in a partial, distorted fashion. That means the rules for an operation may be different depending on their dimensionality (for example, multiplying two matrices is not the same as multiplying each pair of corresponding cell values, as you would in the scalar case), or the operation itself may not even exist (for example, you cannot divide one matrix by another). Think of the difference between a 3D coffee pot and a 2D picture of a coffee pot. How much coffee can the 2D pot contain? Answer: it does not compute. The 3D complexity of "volume" is completely hidden in the 2D case. We could use the 2D area of the coffee pot in the picture instead, but we'd have to apply a different calculation for that, and you still can't put your coffee in there.

In your case, distance/dissimilarity can be expressed in many different ways, depending (for example) on how exceptional we consider outliers in a single dimension to be. The Euclidean version implicitly expects outliers to be fairly common, because you take the square of the difference per dimension (exponent = 2, also called the L2-norm). If, on the other hand, a big difference in even a single dimension is a big deal (even if the differences in a thousand other dimensions are tiny), then you would raise that exponent (Lmax norm), causing the effect of any single big dim-difference on the total sum to be magnified. The opposite effect is achieved when summing just the absolute differences (L1-norm) instead of their squared differences, meaning the effect of outliers will affect the overall "distance" even less than in the Euclidean case. So that power of two you're using in your algorithm is itself a (hopefully) conscious choice about the expected properties of what you're trying to quantify. :think:

Edited March 30, 2020 by RTFC
typos

Sign In

Euclidean Distance (Points)

Recommended Posts

seadoggie01

jchd

seadoggie01

RTFC

seadoggie01

RTFC

jchd

seadoggie01

RTFC

Create an account or sign in to comment

Create an account

Sign in

Recently Browsing 0 members

Browse

AutoIt Resources

Release

Beta