noellarkin Posted April 11, 2023

I can use the OpenAI API to get arrays containing vector embeddings for a word/phrase using this: https://platform.openai.com/docs/guides/embeddings

But what's the process of comparing two such vector arrays using something like this: https://en.wikipedia.org/wiki/Cosine_similarity

In Python, there's a library for this: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html

Anything similar in AutoIt? Thanks!
noellarkin Posted April 11, 2023 (Author)

Did I get this right? Just working off of the Wikipedia definition.

#include <Array.au3>
#include <Math.au3>

Local $embedding1[3] = [1.0, 2.0, 3.0]
Local $embedding2[3] = [4.0, 5.0, 6.0]

; Dot product of the two vectors
Local $dotProduct = 0.0
For $i = 0 To UBound($embedding1) - 1
    $dotProduct += $embedding1[$i] * $embedding2[$i]
Next

; Euclidean magnitude (norm) of each vector
Local $magnitude1 = 0.0
For $i = 0 To UBound($embedding1) - 1
    $magnitude1 += $embedding1[$i] ^ 2
Next
$magnitude1 = Sqrt($magnitude1)

Local $magnitude2 = 0.0
For $i = 0 To UBound($embedding2) - 1
    $magnitude2 += $embedding2[$i] ^ 2
Next
$magnitude2 = Sqrt($magnitude2)

; Cosine similarity = dot product divided by the product of the magnitudes
Local $cosineSimilarity = $dotProduct / ($magnitude1 * $magnitude2)
MsgBox(0, "", "Cosine similarity: " & $cosineSimilarity)
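For reuse, the same arithmetic could be wrapped in a helper function. Here's a minimal sketch (the name _CosineSimilarity is just something I made up, not from any library), with basic guards for mismatched lengths and zero-length vectors, reusing the two arrays from above:

; Sketch only: cosine similarity of two 1-D arrays of equal length
Func _CosineSimilarity(Const ByRef $aVec1, Const ByRef $aVec2)
    If UBound($aVec1) <> UBound($aVec2) Then Return SetError(1, 0, 0) ; dimension mismatch
    Local $dot = 0.0, $norm1 = 0.0, $norm2 = 0.0
    For $i = 0 To UBound($aVec1) - 1
        $dot += $aVec1[$i] * $aVec2[$i]
        $norm1 += $aVec1[$i] ^ 2
        $norm2 += $aVec2[$i] ^ 2
    Next
    If $norm1 = 0 Or $norm2 = 0 Then Return SetError(2, 0, 0) ; a zero vector has no direction
    Return $dot / (Sqrt($norm1) * Sqrt($norm2))
EndFunc

; For [1,2,3] and [4,5,6]: dot product = 32, norms = Sqrt(14) and Sqrt(77),
; so the result is 32 / (3.7417 * 8.7750), roughly 0.9746.
MsgBox(0, "", "Cosine similarity: " & _CosineSimilarity($embedding1, $embedding2))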
RTFC Posted April 11, 2023 (Solution)

looks okay, but you should really look into E4A's DotProduct (section: Multiplication) and GetNorm (section: Reduction) functions.
noellarkin Posted April 11, 2023 (Author)

23 minutes ago, RTFC said: "looks okay, but you should really look into E4A's DotProduct (section: Multiplication) and GetNorm (section: Reduction) functions."

I remember you recommending this library some time back, and I downloaded it, but it looked so daunting (I don't have a CS background) that I backed off immediately :) Okay, I'll give it another go :)
RTFC Posted April 11, 2023 (edited)

How is this daunting?

#include "C:\AutoIt\Eigen\Eigen4AutoIt.au3" ; NB adjust path to wherever you put it

Local $embedding1[3] = [1.0, 2.0, 3.0]
Local $embedding2[3] = [4.0, 5.0, 6.0]

_Eigen_StartUp()

; wrap the plain AutoIt arrays in E4A vector handles
$vec1 = _Eigen_CreateMatrix_FromArray($embedding1)
$vec2 = _Eigen_CreateMatrix_FromArray($embedding2)

; cosine similarity = dot product divided by the product of the norms
MsgBox(0, "", "Cosine similarity: " & _
    _Eigen_DotProduct($vec1, $vec2) / (_Eigen_GetNorm($vec1) * _Eigen_GetNorm($vec2)))

_Eigen_CleanUp()

(I don't have a CS background either.)
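For reuse, those two E4A calls could be wrapped the same way as the plain-AutoIt version earlier in the thread. A minimal sketch (the wrapper name is illustrative; only _Eigen_DotProduct and _Eigen_GetNorm are actual E4A functions), taking two vector handles already created with _Eigen_CreateMatrix_FromArray:

; Sketch only: cosine similarity from two E4A vector handles
Func _E4A_CosineSimilarity($hVec1, $hVec2)
    Local $norms = _Eigen_GetNorm($hVec1) * _Eigen_GetNorm($hVec2)
    If $norms = 0 Then Return SetError(1, 0, 0) ; a zero vector has no direction
    Return _Eigen_DotProduct($hVec1, $hVec2) / $norms
EndFunc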
noellarkin Posted April 11, 2023 (Author)

2 hours ago, RTFC said: "How is this daunting?"

Okay, now I feel really stupid, haha :) Thank you, I'll give the library another go.
RTFC Posted June 9, 2023

Update: as of version 5.4 (released 29 May 2023), E4A supports direct retrieval of the angle between two vectors with the function _Eigen_GetVectorAngle ( $vecA, $vecB, $returnRadians = False ). A zero-degree angle signifies parallel vectors (aligned and pointing in the exact same direction), a 90-degree angle perpendicular ones, and a 180-degree angle implies the vectors are anti-parallel (aligned, but pointing in opposite directions).

#include "C:\AutoIt\Eigen\Eigen4AutoIt.au3" ; NB adjust path to wherever you put it

Local $embedding1[3] = [1.0, 2.0, 3.0]
Local $embedding2[3] = [4.0, 5.0, 6.0]

_Eigen_StartUp()
$vec1 = _Eigen_CreateMatrix_FromArray($embedding1)
$vec2 = _Eigen_CreateMatrix_FromArray($embedding2)

; returns the angle in degrees by default; pass True as the third argument for radians
MsgBox(0, "", "Vector angle (degrees): " & _Eigen_GetVectorAngle($vec1, $vec2))
_Eigen_CleanUp()
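Note that this call returns the angle itself rather than its cosine. If the cosine similarity value is what's needed, the radians form can be passed through AutoIt's built-in Cos(). A one-line sketch, assuming the $returnRadians parameter behaves as described above:

; cosine similarity recovered from the angle (in radians)
Local $cosSim = Cos(_Eigen_GetVectorAngle($vec1, $vec2, True))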
noellarkin Posted June 12, 2023 (Author)

Sounds awesome :) I love that there are some alternatives to using Python for ML.
RTFC Posted June 12, 2023

Never jumped on the Python bandwagon myself either. From what I read on Stack Overflow in various threads, you should be able to get significantly better performance when replacing NumPy with raw Eigen/C++, even without GPU/CUDA/MPI refactoring.

If you're serious about setting up ML in this way, I can probably help you. Because many of Eigen's speed optimisations are obtained at compile time (e.g. lazy evaluation, smart loop unrolling, and matrix operation-specific stuff), if you were to present a snippet of E4A code (say, a UDF that applies a number of E4A functions to some input matrices), I could duplicate/optimise/rewrite that and present you with a single pre-compiled E4A DllCall. I first suggested this when I started the E4A thread many years ago, but so far nobody has taken me up on it. Up to you, of course. If you're worried about your intellectual property, you can PM me instead. In any case, hope it helps.
noellarkin Posted June 12, 2023 (Author)

1 hour ago, RTFC said: "so far nobody has taken me up on this"

Would love to :) but nothing in my workflow (so far) has warranted anything extremely complex; at most, I'm using SBERT embeddings plus a Milvus vector DB and doing some vector comparisons, indexing a corpus, and some n-gram extractions with Yake.