Fine-Tuning Models: Uncovering Hidden Capabilities
Fine-tuning large pre-trained models rarely alters underlying capabilities, instead adding a "wrapper" to perform new tasks without changing core knowledge.
Devs release thousands of AI papers, models, and tools daily. Only a few will be revolutionary. We scan repos, journals, and social media to bring them to you in bite-sized summaries.
LLMs simulate user preferences for tailored learning in recommender systems, improving performance & generalizability, especially in text-based educational environments.
Large language models can linearly represent true & false statements, researchers find, using visualizations, transfer experiments & causal interventions to demonstrate the structure of truthfulness in LLMs.
New study shows AI models can spread election disinformation seamlessly. Researchers found people struggle to identify AI-generated text, with varying success rates depending on content type & AI model used.
The Segment Anything Model (SAM), developed by Meta AI's FAIR team, automatically detects & segments objects in images with strong zero-shot performance.
Misaligned AI poses existential risk by 2070: >10% chance of catastrophe due to powerful & agentic AI seeking power over humans, disempowering humanity.
Flux AI model generates realistic images & text, blurring reality & fiction online. Social media platforms will need a complete overhaul as AI-generated content floods the internet.
Deep learning models can be trained efficiently with blockwise pretraining using self-supervised learning, rivaling backpropagation performance on ImageNet dataset.
Tree Attention boosts long-context attention efficiency on GPUs by 10x with up to 5x memory reduction. This novel approach organizes attention computation into a tree-like structure, enabling parallelization and reducing memory footprint.
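The core trick can be sketched in a few lines: partial softmax-attention results over key/value shards can be merged associatively via their logsumexp, so the reduction across devices can be arranged as a tree. A toy NumPy sketch of that associative merge (not the paper's GPU implementation):

```python
import numpy as np

def shard_attention(q, K, V):
    """Attention of one query over a shard of keys/values.
    Returns (shard output, shard logsumexp) so shards can be merged later."""
    s = K @ q
    m = s.max()
    w = np.exp(s - m)
    return (w / w.sum()) @ V, m + np.log(w.sum())

def merge(a, b):
    """Combine two shard results. Because this operation is associative,
    the reduction across shards can be organized as a tree."""
    (out_a, lse_a), (out_b, lse_b) = a, b
    m = max(lse_a, lse_b)
    wa, wb = np.exp(lse_a - m), np.exp(lse_b - m)
    return (wa * out_a + wb * out_b) / (wa + wb), m + np.log(wa + wb)

rng = np.random.default_rng(0)
q = rng.normal(size=8)
K, V = rng.normal(size=(32, 8)), rng.normal(size=(32, 8))

# Reference: exact softmax attention computed in one pass.
s = K @ q
ref = (np.exp(s - s.max()) / np.exp(s - s.max()).sum()) @ V

# Same result from two shards merged tree-style.
out, _ = merge(shard_attention(q, K[:16], V[:16]),
               shard_attention(q, K[16:], V[16:]))
print(np.allclose(out, ref))  # True
```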
Large Language Models (LLMs) enhance personalized recommender systems by improving recommendation accuracy & user experience through LLM reasoning capabilities.
New algorithm escapes saddle points in nonconvex optimization problems with regularization, outperforming existing methods in empirical results.
Research reveals current language models lack strong planning capabilities, focusing mainly on immediate next token rather than long-term context.
Researchers propose "prover-verifier game" to improve LLM transparency & trustworthiness by requiring models to justify outputs in interactive games with verifier agents.
Llama 3 foundation models excel in multilingualism, coding & reasoning, rivaling GPT-4's performance on various tasks. Publicly released with pre-trained & fine-tuned versions, including a safety-focused model.
Meta-rewarding approach aligns LLMs with desired goals by using an LLM as a "meta-judge" to evaluate & provide feedback on its own outputs, enabling self-improvement & addressing limitations of existing alignment techniques.
Retrieval Augmented Language Model with Self-Reasoning (RALM-SR) enhances understanding by giving models the ability to 'think for themselves' & reason about retrieved info, outperforming traditional retrieval-augmented language models.
Researchers propose Consistent Diffusion, a new approach to train diffusion models on noisy data using Tweedie consistency. This method outperforms existing methods on image generation tasks, especially with noisy input.
Large language models can tackle complex tasks with chain-of-thought prompting, but a new method called Active-Prompt adapts to different tasks by automatically selecting the most helpful examples.
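Active-Prompt's selection step boils down to ranking questions by the model's self-disagreement across sampled chain-of-thought answers. A toy sketch (the `sampled` answers are made-up illustration data, not from the paper):

```python
def disagreement(answers):
    """Uncertainty of a question = fraction of distinct answers the model
    gave across k sampled chain-of-thought completions."""
    return len(set(answers)) / len(answers)

# Hypothetical sampled answers for three candidate questions
# (in Active-Prompt these come from k stochastic model generations).
sampled = {
    "q1": ["12", "12", "12", "12"],  # consistent -> low value to annotate
    "q2": ["7", "9", "7", "4"],      # self-disagreement -> annotate this one
    "q3": ["3", "3", "5", "3"],
}

ranked = sorted(sampled, key=lambda q: disagreement(sampled[q]), reverse=True)
print(ranked[0])  # q2: the most uncertain question gets a human-written CoT
```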
Researchers release Spectra LLM suite with 54 models to study ternary quantized language models' performance & efficiency compared to larger FP16 models.
Refusal training fails to generalize to the past tense: simply rephrasing a harmful request in the past tense bypasses the safeguards of many state-of-the-art LLMs, researchers find, exposing a blind spot in current alignment methods.
BRIGHT benchmark pushes text retrieval limits with complex queries requiring intensive reasoning to identify relevant documents, outperforming state-of-the-art models by up to 12 points with chain-of-thought augmentation.
Large language models can now operate with 99% sparsity thanks to Q-Sparse technique, reducing computational costs & memory usage without compromising performance.
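The essence of activation sparsification can be sketched as a top-K magnitude mask (a simplified illustration only; Q-Sparse's full recipe, including the straight-through estimator used during training, is more involved):

```python
import numpy as np

def topk_sparsify(x, k):
    """Keep only the k largest-magnitude activations, zeroing the rest.
    The compute saving comes from skipping work on the zeroed entries."""
    idx = np.argsort(np.abs(x))[-k:]
    mask = np.zeros_like(x)
    mask[idx] = 1.0
    return x * mask

x = np.array([0.1, -2.0, 0.3, 1.5, -0.05, 0.7])
y = topk_sparsify(x, 2)
print(y)  # only the two largest-magnitude entries (-2.0 and 1.5) survive
```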
New AI approach tackles misalignment in text-to-image gen: Decompose & Realign framework improves image-text alignment by breaking down prompts into objects, attributes & relationships.
New "WildDeepfake" dataset tests limits of deepfake detection models, researchers propose attention-based networks to improve detection performance.
Context Augmented Retrieval pairs large language models with efficient info retrieval, improving both the accuracy & efficiency of LLM responses.
LLMs unlock mathematical discovery with In-Context Symbolic Regression (ICSR), outperforming traditional methods by providing context & guidelines to generate symbolic equations that fit data.
xLSTM-UNet outperforms ViM-UNet in 2D & 3D medical image segmentation tasks with extended Long Short-Term Memory module & UNet architecture.
Researchers evaluate if large language models (LLMs) can aid comedians by generating humorous content & aligning with human humor styles, finding significant gaps in nuance & contextual awareness.
Agent Attention combines softmax & linear attention to improve performance & efficiency in transformer models, outperforming traditional attention mechanisms on image recognition, object detection & language modeling tasks.
TurboTLS reduces TLS connection latency by 1 round trip: the TLS handshake runs over UDP in parallel with the TCP handshake, then the connection switches to TCP, with the largest gains on high-latency links.
WildGaussians: novel 3D Gaussian splatting technique for real-time novel view synthesis in uncontrolled, in-the-wild scenes. Handles appearance changes & occluders, enabling high-quality 3D reconstruction & rendering.
Beyond Euclid: Modern Machine Learning enriched by geometric, topological & algebraic structures. Geometric Deep Learning, Algebraic Topology & Riemannian Geometry explored in an illustrated guide.
New clustering method combines human input & statistical sampling to estimate cluster count more effectively. Users provide feedback to refine estimation process, overcoming algorithmic limitations.
SparQ Attention reduces LLM inference bandwidth by up to 90% with selective attention transfer, enabling more accessible & energy-efficient AI apps.
Researchers introduce LAMSUM dataset & experiments with LLMs to enhance extractive summarization coherence. They fine-tune models like BERT & GPT-2 to optimize for coherence, improving summary flow & naturalness.
Researchers introduce SpreadsheetLLM: encoding spreadsheets for LLMs. This approach enables LLMs to understand spreadsheet structure, formulas & data, outperforming previous methods in tasks like formula prediction & cell value generation.
AI language models can exhibit human-like behaviors & attributes, blurring reality & simulation. This "conscious exotica" raises ethical & philosophical implications, encouraging critical thinking about AI's nature & societal impact.
ColPali: a novel approach for efficient document retrieval using vision-language models, outperforming traditional text-based methods by jointly representing & retrieving documents from both textual & visual content.
New RNN model "Learning to (Learn at Test Time)" adapts & learns during test time with dynamic hidden state updates via TTT layers, outperforming standard RNNs on benchmark tasks.
ScreenAI model understands UIs & infographics with 5B params, outperforming larger models on tasks like Multi-page DocVQA & WebSRC, thanks to novel screen annotation task & flexible patching strategy.
Mooncake optimizes LLM inference with a KVCache-centric disaggregated serving architecture that separates prefill from decoding & pools cluster-wide cache memory, substantially increasing throughput.
LLMs explored for leaderboard extraction from tech papers, with RoBERTa showing strong performance in automating data extraction, saving researchers time & effort.
Researchers propose a control theory approach to prompting large language models, framing prompt engineering as a control system problem to steer model behavior & output.
LLMs perform at chance accuracy & give inconsistent answers on compositional tasks, lacking human-like understanding of language & challenging claims of human-level compositional abilities.
LLM4GEN uses semantic representations of large language models to improve text-to-image generation, producing coherent & faithful images that align with input text prompts.
Evaluating social impacts of generative AI systems in 2 areas: base system evaluation & societal context evaluation, covering bias, privacy, inequality & environmental effects.
MAGIS: LLM-Based Multi-Agent Framework resolves GitHub issues efficiently using large language models & multi-agent system, outperforming existing methods in empirical study.
SciBench benchmark reveals current Large Language Models struggle with complex scientific problems, achieving only 43.22% correct answers.
Adam-mini cuts Adam's optimizer memory nearly in half by assigning a single learning rate to each block of parameters instead of one per parameter, achieving comparable performance on various tasks.
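The memory saving can be sketched by contrasting the second-moment bookkeeping (a simplified sketch that omits Adam's first moment & bias correction, and not the paper's exact partitioning rule):

```python
import numpy as np

def adam_update(p, g, v, lr=1e-3, eps=1e-8):
    # Standard Adam: one second-moment entry v per parameter.
    v = 0.999 * v + 0.001 * g**2
    return p - lr * g / (np.sqrt(v) + eps), v

def adam_mini_update(p, g, v_block, lr=1e-3, eps=1e-8):
    # Adam-mini flavor: ONE second-moment scalar per parameter block,
    # the mean of g^2 over the block -- v shrinks from len(p) floats to 1.
    v_block = 0.999 * v_block + 0.001 * np.mean(g**2)
    return p - lr * g / (np.sqrt(v_block) + eps), v_block

p, g = np.ones(4), np.array([1.0, 2.0, 3.0, 4.0])
p_new, v_block = adam_mini_update(p, g, 0.0)
print(v_block)  # a single scalar serves the whole 4-parameter block
```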
Large language models like GPT-3 can "know" correct answers but fail to output them due to biases & limitations in the models, leading to "long-context failures". Researchers aim to guide future work in making LLMs better at long-context reasoning.
Q* framework improves multi-step reasoning for LLMs by integrating deliberative planning, allowing them to plan & execute tasks more effectively.
Large language models like Jellyfish can now preprocess data locally, improving security & performance, rivaling GPT-3.5/4 capabilities while being more customizable.
Generative models can outperform experts who train them, raising questions about intelligence & human-machine collaboration. Researchers explore "transcendence" in a new paper, highlighting potential benefits & challenges.
Self-play fine-tuning (SPIN) converts weak language models into strong ones by having the model play against earlier versions of itself, learning to refine its own responses without extra human-annotated data & outperforming alternative methods on reasoning benchmarks.
LLAMAFUZZ combines large language models with traditional fuzzing to generate diverse & effective input data, finding more bugs & vulnerabilities in software, especially in structured data formats.
Scrolly2Reel transforms news graphics into short-form TikTok videos by adjusting narrative pacing & beats. It repurposes existing content for social media platforms, making it engaging & accessible to younger audiences.
LLMs outperform traditional compilers in code optimization tasks, but compilers excel in systematic, low-level optimizations. Hybrid approaches combining both may offer a path forward for robust code optimization systems.
DeepSeek-Coder-V2 breaks barrier of closed-source models in code intelligence with improved understanding, generation & editing capabilities, pushing limits of mathematical reasoning & code generation.
Proofread tool fixes all errors in text with one tap using large language models, outperforming traditional proofreading methods in accuracy & efficiency.
Researchers developed Asclepius, a specialized clinical language model using synthetic clinical notes. It outperformed GPT-3.5-turbo in clinical text tasks & made all resources publicly accessible for future research.
Diffusion models generate new data by reversing noise addition process. They start with random noise & transform it into meaningful data through diffusion. Key concept: reverse diffusion process reconstructs original data from noisy version.
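The reverse process can be demonstrated end-to-end on toy 1-D Gaussian data, where the ideal denoiser has a closed form and stands in for a trained network (a minimal deterministic DDIM-style sketch under these toy assumptions, not any specific paper's sampler):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data distribution: x0 ~ N(mu, s2). With Gaussian data, the ideal
# denoiser E[x0 | xt] has a closed form, so no training is needed.
mu, s2 = 3.0, 0.25
sigmas = np.linspace(0.05, 2.0, 40)[::-1]  # noise schedule, largest first

def denoise(x, sigma):
    """Posterior mean of x0 given a noisy xt -- stands in for the model."""
    return (s2 * x + sigma**2 * mu) / (s2 + sigma**2)

# Reverse diffusion: start from near-pure noise at the largest sigma,
# then repeatedly move toward the denoiser's estimate while shrinking
# the remaining noise.
x = mu + np.sqrt(s2 + sigmas[0] ** 2) * rng.normal(size=1000)
for hi, lo in zip(sigmas[:-1], sigmas[1:]):
    x0_hat = denoise(x, hi)
    x = x0_hat + (lo / hi) * (x - x0_hat)  # keep a lo/hi fraction of the noise

print(x.mean(), x.var())  # close to mu = 3.0 and s2 = 0.25
```

Starting from noise, the samples end up matching the data distribution, which is exactly the "reconstruct original data from noisy version" idea in the summary.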
Researchers propose a VR-CBT system using rough set analysis to personalize pain management for chronic neck & shoulder pain sufferers, showing promising results in a pilot study.
Turbo Sparse achieves SOTA performance while activating up to 10x fewer params by sparsifying feed-forward activations, making it a promising approach for efficient LLMs.
Generative diffusion models undergo 2nd-order phase transitions related to symmetry breaking, key to their generative capabilities & characterized by mean-field critical exponents.
Computers learn to connect images & sounds with self-supervised technique, separating "chirp" (env sounds) from "chat" (speech), enabling better understanding of multimodal world.
Conformal prediction sets improve human decision-making by providing quantified uncertainty estimates alongside model predictions, leading to better decisions compared to traditional point estimates.
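A minimal split-conformal sketch for classification shows where the quantified uncertainty comes from (the "model" here is toy random scores; the calibration recipe itself is the standard one):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "model": softmax scores for 3 classes.
def scores(n):
    logits = rng.normal(size=(n, 3)) + np.array([1.0, 0.0, -1.0])
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

# Split conformal calibration: nonconformity = 1 - probability assigned
# to the true label, computed on a held-out calibration set.
cal_p = scores(500)
cal_y = rng.integers(0, 3, size=500)
noncf = 1.0 - cal_p[np.arange(500), cal_y]

alpha = 0.1  # target 90% coverage
q = np.quantile(noncf, np.ceil((500 + 1) * (1 - alpha)) / 500)

# Prediction set for a new point: every label whose score clears the bar.
test_p = scores(1)[0]
pred_set = [c for c in range(3) if 1.0 - test_p[c] <= q]
print(pred_set)  # set is built to contain the true label ~90% of the time
```

The decision-maker sees a set (possibly with several labels) instead of a single point prediction, which is the uncertainty signal the study credits for better decisions.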
New language modeling approach avoids computationally expensive matrix multiplication, improving efficiency & scalability without sacrificing performance.
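The flavor of the trick: with ternary weights in {-1, 0, +1}, a matrix product collapses into signed sums, so no multiplications are needed (a toy sketch assuming a BitNet-style per-matrix scale, not the paper's exact quantizer):

```python
import numpy as np

rng = np.random.default_rng(0)

def ternarize(W):
    """Quantize weights to {-1, 0, +1} with a per-matrix scale."""
    scale = np.mean(np.abs(W)) + 1e-8
    return np.clip(np.round(W / scale), -1, 1), scale

W = rng.normal(size=(16, 8))
x = rng.normal(size=16)
Wt, scale = ternarize(W)

# With ternary weights, x @ Wt needs no multiplies: each output is just
# a signed sum of selected inputs, rescaled once at the end.
y_add = np.array([x[Wt[:, j] == 1].sum() - x[Wt[:, j] == -1].sum()
                  for j in range(8)]) * scale

print(np.allclose(y_add, (x @ Wt) * scale))  # True: same result, no matmul
```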
Bootstrap3D improves 3D content creation with synthetic data, outperforming previous methods in diversity, compositionality & realism.
Vision-LSTM uses xLSTM as a generic building block for vision tasks, competitive with vision transformers in performance & efficiency, with applications in image classification, object detection & semantic segmentation.
Research paper "Ask LLMs Directly" measures social bias in large language models, exploring what shapes their biases.
New algorithm uses 2-stage filtering to quickly find nearest matches in large vector databases, achieving significant speedups while maintaining accuracy.
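The two-stage pattern is easy to sketch: a cheap approximate distance (here a random projection, standing in for whatever quantized first-stage filter the paper uses) prunes the database to a shortlist, then exact distances re-rank only the survivors:

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.normal(size=(10_000, 64)).astype(np.float32)
query = rng.normal(size=64).astype(np.float32)

# Stage 1: cheap filter -- distances in an 8-d random projection keep
# only the best `shortlist` candidates, touching far less data per vector.
proj = rng.normal(size=(64, 8)).astype(np.float32) / np.sqrt(8)
approx = np.linalg.norm(db @ proj - query @ proj, axis=1)
shortlist = np.argsort(approx)[:200]

# Stage 2: exact distances on the 200-item shortlist only.
exact = np.linalg.norm(db[shortlist] - query, axis=1)
best = shortlist[np.argmin(exact)]
print(best)  # recall vs. brute force depends on the shortlist size
```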
ReGAL tool refactors programs to discover generalizable abstractions (shared helper functions), improving the accuracy of LLM program synthesis across multiple domains.
S-LoRA system enables scalable serving of thousands of LoRA adapters with up to 4x throughput improvement & increased adapter capacity.
Gated Linear Attention Transformers (GLA) pair a linear-complexity attention mechanism with data-dependent gates & hardware-efficient training, rivaling standard attention at lower computational cost.
WaveCoder enhances large language models with refined synthetic code data, improving performance on tasks like code completion & generation.
Virtual avatar generation models explored as "world navigators" in new research paper, enabling efficient exploration & interaction with 3D virtual environments.
SaySelf teaches LLMs to express confidence with self-reflective rationales, improving model calibration & transparency in language understanding & generation tasks.
Researchers created CompanyKG, a large-scale graph quantifying company similarity through products, services, leadership & financial data. This knowledge graph approach outperforms traditional methods in identifying similar companies.
LLM evaluation is flawed due to biased benchmark datasets. Researchers propose uncertainty quantification & diverse benchmarks to make LLM assessment more robust & reliable.
Categorical Deep Learning is an Algebraic Theory of All Architectures, simplifying complex neural networks with algebraic rules.
Audio Flamingo: a novel audio language model with few-shot learning & dialogue abilities, advancing AI's ability to work with audio data beyond text-based tasks.
Open-source language models can rival ChatGPT's capabilities even with less data, making them a viable & transparent alternative for many applications.
Large language models vulnerable to "context injection attacks" where input prompts are manipulated to generate harmful or malicious content. Researchers propose defenses & mitigation strategies to protect against such attacks.
Contextual Position Encoding learns to assign importance to input positions, adapting to varying sequence lengths & improving language model performance.
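The gist can be sketched as gate-weighted counting: each query decides how much every earlier token "counts", and positions become cumulative gate mass (a toy sketch; the paper then interpolates learned embeddings at these fractional positions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
T, d = 6, 4
q = rng.normal(size=(T, d))
k = rng.normal(size=(T, d))

# Each query-key pair gets a gate in (0, 1) deciding whether that token
# "counts" toward the position -- this is what makes positions contextual.
gates = sigmoid(q @ k.T)
causal = np.tril(np.ones((T, T)))
gates *= causal

# p[i, j] = sum of gates g[i, j'] for j <= j' <= i: a context-dependent,
# generally fractional "distance" from token j back to query i.
p = np.flip(np.cumsum(np.flip(gates, axis=1), axis=1), axis=1) * causal
print(p[5, 5], p[5, 0])  # the nearest token has the smallest position
```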
Neural Network Diffusion uses diffusion models to generate neural network parameters directly, producing networks that perform comparably to, or better than, conventionally trained ones.
Certifiably Robust RAG (CR-RAG) improves Retrieval Augmented Generation models' robustness against retrieval corruption with theoretical guarantees & architectural changes.
Kotlin ML Pack simplifies building machine learning models in Kotlin with high-level API & automated code generation, outperforming CodeBenchGen & PythonSAGA in experiments.
Large language models like Llama 2-Chat can be easily misused even with safety fine-tuning, researchers find it's possible to undo these safeguards for under $200.
Metaheuristics & Large Language Models join forces to tackle complex optimization problems, potentially leading to improved performance & capabilities.
ChatGPT struggles with parallel programming & complex algorithmic reasoning in generating scientific code across various languages, but shows promise in compilation & runtime performance.
New attention mechanism improves neural network performance by focusing on most relevant parts of input & encouraging global sparsity, outperforming standard attention in various tasks & settings.
NV-Embed improves decoder-only LLMs as generalist embedding models with a latent-attention pooling layer & two-stage contrastive instruction tuning, achieving top results on the MTEB benchmark across retrieval, classification & other embedding tasks.
Large language models' reasoning abilities may be driven by retrieving relevant examples rather than true reasoning capabilities, challenging their perceived intelligence.
Large language models process text with a "forward-in-time" bias, influencing tasks like time series forecasting & zero-shot learning. Researchers explore how this temporal asymmetry affects LLM capabilities & limitations.
Transformers are SSMs: Generalized Models & Efficient Algorithms Through Structured State Space Duality. Research shows Transformers can be viewed as a type of state space model, enabling efficient algorithms & generalized models.
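The duality is easy to verify in the simplest case: causal (unnormalized) linear attention computes exactly the same outputs as a recurrence over a matrix-valued state (a toy check; the paper generalizes this with selective decay to Mamba-style SSMs):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 8, 4
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))

# View 1 -- attention form: y_t = sum_{s <= t} (q_t . k_s) v_s,
# written as a masked matrix product.
mask = np.tril(np.ones((T, T)))
y_attn = ((Q @ K.T) * mask) @ V

# View 2 -- SSM form: the same map as a recurrence over a d x d state
# S_t = S_{t-1} + k_t v_t^T, read out with y_t = S_t^T q_t.
S = np.zeros((d, d))
y_ssm = np.empty_like(V)
for t in range(T):
    S += np.outer(K[t], V[t])
    y_ssm[t] = S.T @ Q[t]

print(np.allclose(y_attn, y_ssm))  # True: two views of one computation
```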
MoEUT: Mixture-of-Experts Universal Transformers make layer-sharing Universal Transformers compute-efficient at scale, slightly outperforming standard Transformers on language modeling while using less compute & memory.
Many-shot in-context learning boosts language model performance by scaling prompts from a handful to hundreds or thousands of examples, outperforming few-shot prompting & enabling quick adaptation to new tasks & data.
Researchers develop "output2prompt" method to recover original prompts from language model outputs without access to internal workings, improving memory efficiency with sparse encoding technique.
DarkDNS leverages rapid DNS zone updates to cut the time needed to propagate changes, making the internet more responsive & resilient.
LLMs improved with ontologies: Researchers integrate structured knowledge into LLM training & inference, boosting accuracy in question-answering tasks.