New Method Makes AI Training Data Valuation 1000x Faster
ALinFiK is a new method that values AI training data 1000x faster without access to model parameters. It achieves up to 98.4% correlation with exact influence functions. Applications include data pricing, data curation, and identifying harmful data.
This is a Plain English Papers summary of a research paper called New Method Makes AI Training Data Valuation 1000x Faster Without Model Access.

Overview

- ALinFiK is a method for valuing training data in third-party Large Language Models (LLMs)
- Uses efficient approximation of influence functions to assign value to data points
- Achieves up to 98.4% correlation with exact influence functions at 1000x greater speed
- Requires only black-box API access to LLMs without needing internal model parameters...
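
To make "exact influence functions" concrete, here is a minimal sketch of the classical influence-function score (test-loss gradient, times inverse Hessian, times training-point gradient) on a one-parameter regression model. This is not ALinFiK itself, whose details are not given in this summary; the toy data, the regularisation strength `lam`, and all function names are assumptions made for illustration only.

```python
# Toy model: y ≈ w * x, trained with L2-regularised squared error.
# All data below is made up for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 2.0]      # the last label is deliberately corrupted
x_test, y_test = 2.5, 5.0      # a single held-out point
lam = 0.1                      # hypothetical regularisation strength
n = len(xs)

def fit(xs, ys, n_ref):
    # Closed-form minimiser of (1/n_ref) * sum 0.5*(w*x - y)^2 + 0.5*lam*w^2
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + n_ref * lam)

def loss(w, x, y):
    return 0.5 * (w * x - y) ** 2

def grad(w, x, y):
    # Derivative of the per-point loss with respect to w
    return (w * x - y) * x

w = fit(xs, ys, n)
hessian = sum(x * x for x in xs) / n + lam   # scalar Hessian of the training objective
g_test = grad(w, x_test, y_test)

# Influence of upweighting each training point on the test loss.
# Positive means the point pushes the test loss up (harmful).
influence = [-g_test * grad(w, x, y) / hessian for x, y in zip(xs, ys)]

# Removing a point corresponds to upweighting it by -1/n.
predicted_change = [-v / n for v in influence]

# Sanity check: refit without each point and measure the actual change.
loo_change = []
for i in range(n):
    w_i = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], n)
    loo_change.append(loss(w_i, x_test, y_test) - loss(w, x_test, y_test))

for i in range(n):
    print(i, round(influence[i], 3), round(predicted_change[i], 3), round(loo_change[i], 3))
```

In this toy setup the corrupted point is the only one with a positive influence score, which is the kind of signal that could be used to flag harmful data. The exact computation needs gradients and a Hessian, which is why it does not scale to large models and why a fast approximation that works without internal parameters is attractive.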