Mike Young @mikeyoung44

New Method Makes AI Training Data Valuation 1000x Faster

ALinFiK is a new method that values AI training data 1000x faster without access to model internals. It achieves up to 98.4% correlation with exact influence functions. Applications include data pricing, data curation, and identifying harmful data.

This is a Plain English Papers summary of a research paper called New Method Makes AI Training Data Valuation 1000x Faster Without Model Access. If you like this kind of analysis, join AImodels.fyi or follow us on Twitter.

  
  
Overview

ALinFiK is a method for valuing training data in third-party Large Language Models (LLMs)
Uses an efficient approximation of influence functions to assign a value to each data point
Achieves up to 98.4% correlation with exact influence functions at 1000x greater speed
Requires only black-box API access to LLMs without needing internal model parameters...
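
To make "correlation with exact influence functions" concrete, here is a minimal sketch on a toy ridge regression model, where the exact influence function has a closed form. It is not ALinFiK itself: the cheap proxy below (a plain gradient dot product that skips the Hessian) is a stand-in chosen for illustration, and the function names `influence_scores` and `pearson`, the regularization strength `lam`, and the synthetic data are all assumptions rather than anything taken from the paper.

```python
import numpy as np

def influence_scores(X, y, x_test, y_test, lam=1e-2):
    """Return (exact, proxy) influence of each training point on one test loss.

    Model: ridge regression minimizing
        (1/n) * sum_i 0.5 * (x_i . w - y_i)^2 + 0.5 * lam * ||w||^2
    """
    n, d = X.shape
    H = X.T @ X / n + lam * np.eye(d)          # Hessian of the training objective
    w = np.linalg.solve(H, X.T @ y / n)        # closed-form minimizer
    grads = (X @ w - y)[:, None] * X           # per-example loss gradients, shape (n, d)
    g_test = (x_test @ w - y_test) * x_test    # gradient of the test loss, shape (d,)
    # Classic influence function: -g_test^T H^{-1} g_i for every training point i.
    exact = -grads @ np.linalg.solve(H, g_test)
    # Illustrative cheap proxy (an assumption, not the paper's method): drop H^{-1}.
    proxy = -grads @ g_test
    return exact, proxy

def pearson(a, b):
    """Pearson correlation between two score vectors."""
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic data, purely for demonstration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=200)
x_test = rng.normal(size=5)
y_test = float(x_test @ w_true)

exact, proxy = influence_scores(X, y, x_test, y_test)
print("correlation between exact influence and cheap proxy:", pearson(exact, proxy))
```

The printed correlation plays the same role as the 98.4% figure quoted above: it measures how closely a fast approximation ranks training points compared with the exact, more expensive computation. For an LLM the Hessian solve is the costly step, which is why approximations of this general kind are attractive.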