1301: 实验2-2-4 查询向量与库向量的最短距离 / Nearest Library Vector to a Query Vector

Memory Limit:128 MB Time Limit:1.000 S
Judge Style:Text Compare Creator:
Submit:111 Solved:80

Description

【中文题面】
在很多人工智能系统中,一张图片、一段文字或一个用户兴趣画像,最终都会被模型转换成一个由若干实数组成的“特征向量”。向量之间的距离可以用来衡量相似程度:距离越小,通常表示两个对象越相似。
例如,在人脸识别中,系统会先把待识别的人脸图像转换成人脸特征向量,再与库中已登记人员的人脸向量逐一比较,寻找距离最近的候选人;在推荐算法中,系统可以把用户兴趣和商品、文章、视频等内容都表示为向量,再查找与用户向量最接近的内容,作为推荐结果;在医学图像、病历文本或知识库检索中,也常用类似思路寻找相似病例、相似图像或相关文档。
本题给定一个查询向量和若干个库向量,要求计算查询向量与每个库向量之间的欧氏距离,输出距离最近的库向量序号和距离。距离为各维度差值平方和再开方。若距离相同,选择编号较小者。
本题要求使用vector实现。这里采用的是最朴素的线性扫描方法:逐个计算所有库向量到查询向量的距离。真实工业系统面对百万、亿级向量时,会进一步使用索引结构和近似最近邻搜索来加速。
扩展阅读:Faiss(高效相似向量搜索与聚类库)、Milvus(开源向量数据库)、hnswlib(基于HNSW的近似最近邻检索库)。
[English Statement]
In many AI systems, an image, a text passage, or a user preference profile can be converted by a model into a feature vector of real numbers. The distance between vectors can measure similarity: a smaller distance usually means two objects are more similar.
For example, in face recognition, a query face image is converted into a face embedding and compared with registered face embeddings in a gallery to find the nearest candidate. In recommender systems, users and items such as products, articles, or videos can also be represented as vectors, and the system recommends items whose vectors are close to the user vector. Similar ideas are also used in medical image retrieval, clinical text search, and knowledge-base search.
Given a query vector and several library vectors, compute the Euclidean distance from the query vector to each library vector, and output the index and distance of the nearest library vector. The distance is the square root of the sum of squared coordinate differences. If there is a tie, choose the smaller index.
Use vector in this problem. This problem uses the simplest linear scan method: compute the distance from the query vector to every library vector one by one. Real industrial systems with millions or billions of vectors usually use indexes and approximate nearest neighbor search for acceleration.
Further reading: Faiss, Milvus, and hnswlib.

Input

【输入】
第一行输入向量维数d和库向量个数n。第二行输入查询向量,包含d个浮点数。接下来n行,每行输入一个库向量,包含d个浮点数。
[Input]
The first line contains dimension d and the number of library vectors n. The second line contains the query vector with d floating-point values. The next n lines each contain one library vector with d floating-point values.

Output

【输出】
输出最近库向量的编号(从0开始)和距离,距离保留2位小数。
[Output]
Output the index of the nearest library vector (0-based) and the distance rounded to 2 decimal places.

Sample Input Copy

4 3
3 6 9 2
2 -1 3 1
2 -1 9 1
-7.2 3.1 5.5 0.8

Sample Output Copy

1 7.14

HINT

比较最近距离时可以先比较平方距离,最后输出时再开方。这样有助于减少不必要的sqrt计算。
When comparing distances, you may compare squared distances first and apply sqrt only for the final output. This reduces unnecessary sqrt computations.