Generated on 2024-11-18 15:53:49 by PubSummarizer
Paper URL: https://openreview.net/attachment?id=sPLTQSf6GI&name=pdf
This paper proposes a measure-theoretic axiomatization of causality, addressing the lack of a universally accepted framework by introducing the concept of "causal spaces." These spaces integrate probability theory with causal information through "causal kernels," which describe the effects of interventions on systems. The authors argue that using Kolmogorov's measure-theoretic framework as a foundation allows for a more rigorous understanding of causal relationships, particularly in complex scenarios involving cycles, latent variables, and stochastic processes, which existing frameworks struggle to address. The paper further compares causal spaces to traditional models like structural causal models and potential outcomes, highlighting their advantages and limitations while emphasizing the need for future work in embedding counterfactuals and actual causality within this new framework.
Paper URL: https://openreview.net/attachment?id=eTHawKFT4h&name=pdf
This paper establishes a rigorous connection between Bayesian methods, variational inference, and deep ensemble techniques in deep learning by reformulating the optimization problems they encounter. By leveraging the framework of Wasserstein gradient flows, the authors unify various approaches for uncertainty quantification and demonstrate that different algorithms arise from choices related to regularizers in a generalized variational inference context. The paper introduces novel algorithms like interacting deep ensembles, which are shown to converge to a global minimizer, and contrasts their efficacy with traditional methods. Through theoretical insights and numerical experiments, the authors highlight the advantages of the proposed methodologies over conventional variational inference approaches, suggesting pathways for future advancements in Bayesian deep learning.
Paper URL: https://openreview.net/attachment?id=wIlmx4bHrO&name=pdf
This paper introduces a novel single-loop Extra-Gradient Difference Acceleration (EGDA) algorithm designed for solving constrained nonconvex-nonconcave (NC-NC) minimax problems, achieving a significant improvement in convergence rates. By utilizing a new extra-gradient difference step and incorporating momentum acceleration, the proposed algorithm attains a complexity of O(ϵ^{-2}) for finding ϵ-stationary points, outperforming existing methods that achieve complexities of O(ϵ^{-4}) or Õ(ϵ^{-3}). Additionally, the algorithm's applicability extends to constrained nonconvex-concave (NC-C) and convex-nonconcave (C-NC) problems, retaining the same optimal complexity of O(ϵ^{-2}). The theoretical analysis and numerical experiments validate the enhanced performance and efficiency of the EGDA algorithm in various minimax optimization contexts.
Paper URL: https://openreview.net/attachment?id=O0Lz8XZT2b&name=pdf
The paper challenges the conventional understanding of the relationship between model complexity and prediction error, traditionally represented by a U-shaped curve. It addresses the phenomenon of "double descent," where test error decreases again after a peak as model parameters exceed the number of training samples. The authors argue that previous claims of double descent in classical statistical learning methods like linear regression, trees, and boosting do not contradict traditional statistical intuition. They demonstrate that these observed curves can be explained by considering multiple underlying complexity axes and that when effective parameter counts are measured appropriately, the double descent shapes revert to traditional U-curves. By interpreting various machine learning models as smoothers, the study provides a new lens for understanding parameter counting and emphasizes the importance of effective parameters in assessing model performance on unseen data.
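The "effective parameters" idea has a compact concrete form: for any linear smoother \(\hat{y} = S y\), the effective degrees of freedom are tr(S). The sketch below (an illustration under an assumed ridge-regression smoother, not the paper's code) shows how the effective count shrinks with regularization even though the raw coefficient count stays fixed:

```python
import numpy as np

# Illustration (not the paper's code): for a linear smoother y_hat = S y,
# the effective number of parameters is tr(S). For ridge regression,
# S = X (X^T X + lam I)^{-1} X^T, so tr(S) falls as lam grows even though
# the raw coefficient count stays fixed at d.
rng = np.random.default_rng(0)
n, d = 50, 100                      # over-parameterized: d > n
X = rng.standard_normal((n, d))

for lam in [1e-3, 1e-1, 1e1, 1e3]:
    S = X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)
    print(f"lambda={lam:8.3f}  raw params={d}  effective params tr(S)={np.trace(S):6.2f}")
```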
Paper URL: https://openreview.net/attachment?id=kMueEV8Eyy&name=pdf
The paper investigates the geometric properties of gradient descent dynamics in large machine learning models, emphasizing the concept of conservation laws—quantities preserved during the optimization process. It rigorously defines these laws, demonstrates how to compute the maximal number of independent conservation laws using Lie algebra techniques, and presents algorithms to identify polynomial conservation laws. The authors showcase their findings through various examples, particularly focusing on ReLU network architectures, confirming that existing conservation laws are complete and no other independent laws exist. This work contributes to understanding the implicit bias of optimization initialization and generalization in over-parameterized models, paving the way for further exploration of optimization dynamics in machine learning.
Paper URL: https://openreview.net/attachment?id=R6KJN1AUAR&name=pdf
This paper presents a novel approach to latent variable identification and out-of-support image generation using a specific class of decoders termed "additive decoders." These decoders are particularly effective for images that can be represented as sums of object-specific images, enabling both the identification of latent variables up to permutation and the generation of new images through a process called Cartesian-product extrapolation. The authors establish theoretical conditions under which these decoders guarantee identifiability and demonstrate empirically that additivity is crucial for both identifiability and extrapolation in simulated datasets. This work contributes to the understanding of object-centric representation learning and nonlinear independent component analysis by providing insights into the mathematical foundations that allow for effective disentanglement of latent factors.
Paper URL: https://openreview.net/attachment?id=ITw9edRDlD&name=pdf
This paper challenges the notion that large language models (LLMs) exhibit emergent abilities—sudden, unpredictable enhancements in performance as model size increases—arguing instead that these phenomena may stem from the selection of metrics used to evaluate model outputs. The authors propose that nonlinear or discontinuous metrics can create the illusion of emergent abilities, while linear or continuous metrics reveal smoother, more predictable performance improvements. They provide a mathematical framework and conduct multiple analyses using the InstructGPT/GPT-3 family and the BIG-Bench benchmark, demonstrating that changing the evaluation metric can eliminate perceived emergent abilities. Their findings suggest that emergent abilities might not be intrinsic properties of the models but rather artifacts of the measurement techniques employed by researchers.
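The metric-choice argument can be made concrete with a toy calculation. Assuming (purely for illustration) that per-token accuracy improves smoothly with log model size, an all-or-nothing metric such as exact match on a multi-token answer still produces an apparent jump:

```python
import numpy as np

# Toy version of the paper's argument (assumed functional forms, not their data):
# suppose per-token accuracy rises smoothly with log model size. Exact match on a
# 10-token answer, a discontinuous all-or-nothing metric, then looks "emergent",
# while the continuous per-token accuracy does not.
log_params = np.linspace(6, 12, 7)                    # 10^6 .. 10^12 parameters
per_token_acc = 1 / (1 + np.exp(-(log_params - 9)))   # smooth improvement

L = 10                                                # answer length in tokens
exact_match = per_token_acc ** L                      # nonlinear metric: sharp jump

for lp, pt, em in zip(log_params, per_token_acc, exact_match):
    print(f"10^{lp:4.1f} params: per-token acc={pt:.3f}  exact-match={em:.4f}")
```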
Paper URL: https://openreview.net/attachment?id=9VqMaSjf7U&name=pdf
The paper introduces Brain Diffusion for Visual Exploration (BrainDiVE), a novel data-driven method that utilizes large-scale generative models to synthesize images aimed at activating specific regions of the human visual cortex, thereby enhancing our understanding of its functional organization. Traditional approaches in neuroscience often rely on manually curated stimuli, which can limit the exploration of brain function. In contrast, BrainDiVE employs diffusion models guided by fMRI data to generate images with high semantic specificity for category-selective regions, allowing for the identification of subtle differences and novel sub-regions within these areas. The results demonstrate that BrainDiVE effectively elucidates fine-grained preferences in the visual system and offers a promising avenue for further investigation into cortical organization.
Paper URL: https://openreview.net/attachment?id=mayAyPrhJI&name=pdf
The paper presents a new approach called ReinMax to enhance gradient estimation for deep learning models with discrete latent variables, addressing the limitations of backpropagation, which is traditionally suited to continuous variables. The authors analyze the Straight-Through (ST) estimator, showing that it works as a first-order approximation of the gradient, and propose ReinMax, a second-order accurate method that integrates Heun's method without requiring second-order derivatives. Extensive experiments show that ReinMax outperforms existing state-of-the-art methods in various tasks, offering insights into hyperparameter optimization and improving the understanding of gradient estimators for discrete variables.
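For reference, here is a minimal sketch of the Straight-Through estimator that the paper analyzes as the first-order baseline; ReinMax's second-order correction via Heun's method is not reproduced here:

```python
import torch
import torch.nn.functional as F

def straight_through_sample(logits: torch.Tensor) -> torch.Tensor:
    """Straight-Through estimator (the first-order baseline the paper analyzes):
    the forward pass emits a one-hot sample, while the backward pass routes
    gradients through the softmax probabilities."""
    probs = F.softmax(logits, dim=-1)
    idx = torch.multinomial(probs, num_samples=1)
    one_hot = F.one_hot(idx.squeeze(-1), num_classes=logits.shape[-1]).float()
    # forward value: one_hot; gradient: d(probs)/d(logits)
    return one_hot + probs - probs.detach()

logits = torch.randn(4, 8, requires_grad=True)
loss = straight_through_sample(logits).pow(2).sum()
loss.backward()                    # gradients flow despite the discrete sample
print(logits.grad.shape)           # torch.Size([4, 8])
```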
Paper URL: https://openreview.net/attachment?id=Lr2swAfwff&name=pdf
The paper introduces a new theoretical complexity measure called the effective horizon, which aims to bridge the gap between reinforcement learning (RL) theory and practice. The authors analyze deep RL algorithms like PPO and DQN in conjunction with a newly constructed dataset, BRIDGE, comprising 155 deterministic Markov Decision Processes (MDPs). They discover that a property related to the alignment of Q-values under random and optimal policies significantly predicts the success of deep RL algorithms. The effective horizon serves as a more reliable predictor of empirical performance than traditional sample complexity bounds, revealing its potential to explain the impact of techniques like reward shaping and pre-trained exploration policies. Overall, the findings suggest that understanding the effective horizon can lead to better theoretical insights and practical improvements in deep RL.
Paper URL: https://openreview.net/attachment?id=QIFoCI7ca1&name=pdf
This paper explores the application of causal normalizing flows (NFs) for causal inference, demonstrating that causal models can be identified using autoregressive NFs from observational data when the causal ordering is known. It first establishes a theoretical framework linking non-linear independent component analysis (ICA) to causal inference, followed by a detailed examination of design choices for causal NFs that effectively capture the underlying causal data-generating processes. The authors introduce a do-operator within the causal NF framework to facilitate interventional and counterfactual analyses. Empirical evaluations show that the proposed causal NFs outperform traditional approaches in both accuracy and efficiency, effectively handling real-world scenarios involving mixed discrete-continuous data and partial causal knowledge, as demonstrated through extensive experiments including a real-world use case on fairness auditing in credit assessments.
Paper URL: https://openreview.net/attachment?id=5W7cXno10k&name=pdf
The paper introduces characteristic circuits (CCs), a novel family of tractable probabilistic models designed to effectively manage heterogeneous data by utilizing characteristic functions in the spectral domain. Unlike traditional probabilistic circuits (PCs), which struggle with mixed data types and often lack closed-form density functions, CCs provide a unified framework that allows for efficient learning and inference of high-dimensional distributions without relying on a specific base measure. The authors demonstrate that CCs can outperform state-of-the-art density estimators on various benchmark datasets, showcasing their ability to compute densities, marginals, and moments efficiently. This work highlights the potential of CCs for enhancing probabilistic modeling in complex, real-world scenarios.
Paper URL: https://openreview.net/attachment?id=i913TUOvTK&name=pdf
The paper presents MinD-Video, a novel method for reconstructing high-quality videos from brain activity recorded via functional Magnetic Resonance Imaging (fMRI). This work addresses the challenge of translating continuous visual experiences into video format, which has been less explored than static image reconstruction. MinD-Video employs a progressive learning approach, integrating masked brain modeling and multimodal contrastive learning with spatiotemporal attention, along with co-training an augmented Stable Diffusion model. The method demonstrates superior video reconstruction capabilities, achieving 85% accuracy in semantic tasks and a structural similarity index (SSIM) of 0.19, outperforming the previous state of the art by 45%. Additionally, the model shows biological plausibility, reflecting established cognitive processes and offering insights into the neural mechanisms of visual perception.
Paper URL: https://openreview.net/attachment?id=n84bzMrGUD&name=pdf
The paper introduces Clifford Group Equivariant Neural Networks (CGENNs), a new framework for constructing equivariant neural networks that leverages the properties of the Clifford algebra and its associated groups. The authors study the Clifford group, which acts on the entire Clifford algebra by orthogonal automorphisms that respect the multivector grading. They demonstrate that this group action preserves both the vector space and multiplicative structures of the algebra, facilitating the development of equivariant neural network layers. CGENNs are shown to generalize effectively to inner-product spaces of arbitrary dimension and achieve state-of-the-art performance on several tasks, including physics experiments and geometric computations, showcasing the advantages of incorporating geometric properties into neural network architectures.
Paper URL: https://openreview.net/attachment?id=IwnINorSZ5&name=pdf
This paper presents a novel framework called conformal meta-learners for predictive inference of individual treatment effects (ITEs) using machine learning. Unlike traditional methods that primarily yield point estimates of conditional average treatment effects (CATE), conformal meta-learners provide predictive intervals for ITEs by applying conformal prediction (CP) to CATE meta-learners. The authors demonstrate that these conformal meta-learners are valid under certain stochastic dominance conditions and can efficiently estimate ITEs while maintaining desirable properties of CATE estimators. Through numerical experiments, the framework shows effective coverage and efficiency in comparison to existing methods. This work addresses challenges in causal inference by enabling direct inference on ITEs, thereby improving the understanding of treatment effect heterogeneity across individuals.
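A hedged sketch of the generic split-conformal step the framework builds on is given below; the paper's specific pseudo-outcome constructions and stochastic-dominance conditions are not reproduced:

```python
import numpy as np

def split_conformal_interval(residuals_cal: np.ndarray, y_pred: np.ndarray, alpha: float = 0.1):
    """Generic split conformal prediction, the building block the paper applies to
    CATE meta-learner pseudo-outcomes (sketch only; the paper's pseudo-outcome
    constructions and validity conditions are not reproduced here)."""
    n = len(residuals_cal)
    # finite-sample-corrected quantile of absolute calibration residuals
    q = np.quantile(np.abs(residuals_cal),
                    np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    return y_pred - q, y_pred + q

rng = np.random.default_rng(0)
cal_resid = rng.standard_normal(500)        # pseudo-outcome minus CATE estimate
lo, hi = split_conformal_interval(cal_resid, y_pred=np.array([0.3, -1.2]))
print(lo, hi)
```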
Paper URL: https://openreview.net/attachment?id=1zo4iioUEs&name=pdf
The paper introduces DiffuseBot, a novel framework that utilizes physics-augmented generative diffusion models to design and optimize soft robots. By integrating physical simulations into the diffusion process, DiffuseBot generates robot morphologies that excel in various tasks, including locomotion and manipulation. The framework allows for co-optimization of robot design and control by leveraging insights from differentiable simulations, bridging the gap between virtual and physical robot capabilities. The authors demonstrate the efficacy of DiffuseBot through extensive simulations and a proof-of-concept physical robot, highlighting its potential for accelerating design cycles and enhancing robotic performance across diverse applications.
Paper URL: https://openreview.net/attachment?id=HPuSIXJaa9&name=pdf
The paper introduces Direct Preference Optimization (DPO), an innovative algorithm designed to enhance the alignment of large-scale unsupervised language models (LMs) with human preferences without relying on traditional reinforcement learning from human feedback (RLHF). DPO simplifies the optimization process by directly optimizing a language model's policy based on a binary cross-entropy objective derived from human preference data, thereby eliminating the need for an explicit reward model and complex sampling strategies during training. Experimental results demonstrate that DPO performs comparably or better than existing RLHF methods in tasks such as sentiment modulation, summarization, and single-turn dialogue, while being more stable and computationally efficient, thus significantly lowering the barrier for implementing preference-based tuning of language models.
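The objective described above has a compact form: a binary cross-entropy on the difference of policy-versus-reference log-ratios between the chosen and rejected responses. A minimal sketch, assuming per-example summed log-probabilities have already been computed:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO's binary cross-entropy objective: prefer the chosen response y_w over
    the rejected y_l by the margin of log-ratios against a frozen reference model.
    Inputs are per-example summed log-probabilities (assumed precomputed)."""
    margin = beta * ((policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()

# toy tensors standing in for sequence log-probs under policy and reference
pw, pl = torch.randn(16, requires_grad=True), torch.randn(16, requires_grad=True)
rw, rl = torch.randn(16), torch.randn(16)
loss = dpo_loss(pw, pl, rw, rl)
loss.backward()
print(float(loss))
```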
Paper URL: https://openreview.net/attachment?id=rybsHQ4DXy&name=pdf
The paper presents EgoEnv, a novel approach for learning human-centric environment representations from egocentric video that enhances standard video understanding techniques by linking visual features to the underlying physical space. By training models on simulated 3D environments, EgoEnv captures the camera-wearer's local surroundings, allowing for predictive modeling of unseen environmental contexts. The approach demonstrates improved performance on human-centric tasks, such as room classification and natural language query localization in real-world video datasets, outperforming traditional clip-based methods. The findings highlight the potential of using simulated data to transfer knowledge to complex real-world scenarios, thereby setting a new state-of-the-art in the field.
Paper URL: https://openreview.net/attachment?id=QzcZb3fWmW&name=pdf
This paper investigates the difference in bias between human visual systems and convolutional neural networks (CNNs), which tend to favor texture over shape in object recognition. The authors propose that enforcing sparse coding, specifically through a non-differentiable Top-K operation, can induce a shape bias in CNNs. By implementing this sparse coding mechanism, they demonstrate that CNNs can better decompose objects into structural parts, leading to improved robustness against texture-based distractions and enhanced coherence in synthetic images. Their experiments reveal that Top-K responses primarily encode structural information, while non-Top-K responses focus on texture, thereby bridging the bias gap between machine and human vision. The findings suggest that sparse coding principles might play a role in the shape bias observed in human visual perception.
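A minimal sketch of the Top-K operation itself (the paper's exact layer placement and choice of K may differ):

```python
import torch

def topk_activation(x: torch.Tensor, k: int) -> torch.Tensor:
    """Sparse-coding-style Top-K: keep the k largest responses along the channel
    dimension and zero out the rest (a minimal sketch of the operation the paper
    inserts into CNN layers; their placement and k schedule may differ)."""
    vals, idx = x.topk(k, dim=1)                   # x: (batch, channels, H, W)
    mask = torch.zeros_like(x).scatter_(1, idx, 1.0)
    return x * mask

x = torch.randn(2, 64, 8, 8)
y = topk_activation(x, k=8)
print((y != 0).sum(dim=1).unique())                # 8 survivors per location
```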
Paper URL: https://openreview.net/attachment?id=fHyLsfMDIs&name=pdf
The paper presents a new neural algorithm for computing the entropic optimal transport (EOT) plan between continuous probability distributions using samples. It introduces a saddle-point reformulation of the dynamic EOT, also known as the Schrödinger Bridge problem, enabling an end-to-end learning approach that is computationally efficient and stable even for small values of the entropy regularization coefficient. The authors demonstrate the efficacy of their method through empirical results on various large-scale EOT tasks, showing significant improvement over existing techniques in terms of performance and applicability to real-world problems, particularly in generating diverse outputs for tasks like image super-resolution. The proposed algorithm and its implementation are publicly accessible.
Paper URL: https://openreview.net/attachment?id=eD534mPhAg&name=pdf
This paper presents a novel evaluation metric for assessing the explainability of Graph Neural Networks (GNNs) called OOD-resistant Adversarial Robustness (OAR). Traditional evaluation methods often struggle with out-of-distribution (OOD) issues, leading to unreliable assessments of explanations. OAR addresses these limitations by leveraging adversarial robustness principles, evaluating the quality of explanations based on their resistance to adversarial attacks while ensuring adherence to the original data distribution through an OOD reweighting mechanism. Additionally, a simplified version, SimOAR, is proposed to enhance computational efficiency, particularly for large datasets, with minimal performance trade-offs. Extensive empirical experiments demonstrate that both OAR and SimOAR significantly outperform existing evaluation metrics, providing more reliable and consistent assessments of GNN explanations.
Paper URL: https://openreview.net/attachment?id=FtNruwFEs3&name=pdf
This paper introduces an exact Bayesian inference method for discrete statistical models, leveraging probability generating functions (PGFs) to facilitate exact computation of posterior probabilities, moments, and variances, even with infinite support and continuous priors. The authors present a new probabilistic programming language, SGCL, designed to express complex discrete and continuous statistical models while ensuring that every program can be translated into a generating function for automated inference. They develop a tool called Genfer, which uses automatic differentiation for efficient computation without requiring computer algebra, demonstrating superior performance compared to existing exact inference tools on various benchmarks. The approach is shown to be competitive with Monte Carlo methods on real-world problems, achieving exact results while avoiding approximation errors, thereby addressing significant challenges in Bayesian statistics related to posterior distribution computation.
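The core trick, representing a distribution by its probability generating function \(G(s) = E[s^X]\) and extracting quantities by automatic differentiation, can be shown in miniature. The toy below uses a Poisson example and is not the Genfer implementation:

```python
import torch

# Core idea in miniature (not the Genfer implementation): encode a discrete
# distribution by its probability generating function G(s) = E[s^X] and recover
# quantities by automatic differentiation. Example: X ~ Poisson(lam),
# G(s) = exp(lam * (s - 1)); then E[X] = G'(1) and P(X = k) = G^(k)(0) / k!.
lam = 2.5

def G(s):
    return torch.exp(lam * (s - 1))

s = torch.tensor(1.0, requires_grad=True)
(mean,) = torch.autograd.grad(G(s), s)
print(float(mean))          # 2.5 = E[X]

# P(X = 2) via the second derivative at s = 0
s0 = torch.tensor(0.0, requires_grad=True)
g1 = torch.autograd.grad(G(s0), s0, create_graph=True)[0]
g2 = torch.autograd.grad(g1, s0)[0]
print(float(g2) / 2)        # lam^2 * exp(-lam) / 2! ~= 0.2565
```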
Paper URL: https://openreview.net/attachment?id=Vota6rFhBQ&name=pdf
This paper introduces MeZO, a memory-efficient zeroth-order (ZO) optimizer designed for fine-tuning large language models (LMs) without the memory overhead associated with traditional backpropagation. By adapting the ZO-SGD method to operate in-place, MeZO allows the training of models with billions of parameters using the same memory footprint as inference. Experiments demonstrate that MeZO significantly outperforms in-context learning and linear probing, achieving comparable performance to full fine-tuning while reducing memory requirements by up to 12 times. Furthermore, MeZO is compatible with parameter-efficient tuning techniques and can optimize non-differentiable objectives, highlighting its versatility for a variety of downstream tasks. Theoretical insights support the empirical results, showing that adequate pre-training and task prompts facilitate effective optimization with MeZO, even for large models.
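A sketch of the in-place SPSA step at the heart of MeZO follows; the seed trick means the random direction z is regenerated rather than stored, keeping memory at inference level. Hyperparameter values are illustrative:

```python
import torch

def mezo_step(model, loss_fn, lr=1e-6, eps=1e-3):
    """One MeZO-style update (sketch of the in-place SPSA idea): perturb every
    parameter along a seeded random direction z, evaluate the loss at
    theta + eps*z and theta - eps*z, then step along z scaled by the projected
    gradient (L+ - L-) / (2*eps). Because z is regenerated from the seed rather
    than stored, memory stays at inference level."""
    seed = torch.randint(0, 2**31 - 1, (1,)).item()

    def perturb(scale):
        torch.manual_seed(seed)                      # regenerate the same z
        for p in model.parameters():
            p.data.add_(torch.randn_like(p), alpha=scale * eps)

    perturb(+1); loss_plus = float(loss_fn(model))   # at theta + eps*z
    perturb(-2); loss_minus = float(loss_fn(model))  # at theta - eps*z
    perturb(+1)                                      # restore theta
    projected_grad = (loss_plus - loss_minus) / (2 * eps)

    torch.manual_seed(seed)
    for p in model.parameters():
        p.data.add_(torch.randn_like(p), alpha=-lr * projected_grad)
```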
Paper URL: https://openreview.net/attachment?id=gI1SOgW3kw&name=pdf
This paper addresses the limitations of nonlinear independent component analysis (ICA) by proposing new identifiability results that extend the framework beyond the conventional assumptions of structural sparsity and independence among sources. The authors demonstrate that identifiability can be achieved in cases of undercompleteness (more observed variables than sources), partial sparsity, and source dependence. They introduce flexible grouping structures of sources, allowing for the identification of latent variables even when certain sparsity or independence conditions are violated. Empirical validation is provided through experiments on synthetic and real-world datasets, suggesting the practical applicability of the proposed framework for scientific discovery and disentangled representations in machine learning.
Paper URL: https://openreview.net/attachment?id=27TdrEvqLD&name=pdf
This paper explores the limitations and expressivity of persistent homology (PH) when applied to attributed graphs, particularly in the context of message-passing graph neural networks (MP-GNNs). The authors introduce the concept of color-separating sets to fully characterize the class of graphs that PH can distinguish based on the persistence of connected components derived from vertex and edge colors. They demonstrate that vertex- and edge-level PH have distinct expressive powers and propose a novel method called RePHINE that integrates both levels to enhance graph classification performance. Theoretical results underpin RePHINE's advantages, which are empirically validated across various datasets, showing significant improvements over standard PH methods and existing topological neural networks.
Paper URL: https://openreview.net/attachment?id=RSGNGiB1q4&name=pdf
This paper presents a novel approach to transforming popular knowledge graph embedding (KGE) models, such as ComplEx and RESCAL, into generative models known as generative KGE circuits (GeKCs). By interpreting KGE score functions as structured computational graphs, the authors demonstrate that these models can achieve efficient maximum-likelihood estimation (MLE) and sampling while adhering to logical constraints. The proposed methods, which include non-negative activation restrictions and squaring of outputs, enhance the scalability and performance of KGE models in link prediction tasks across large graphs with millions of entities. Experimental results indicate that the GeKCs maintain competitive link prediction accuracy compared to traditional KGE models while providing better probabilistic interpretations and allowing for the integration of logical constraints.
Paper URL: https://openreview.net/attachment?id=dVnhdm9MIg&name=pdf
This paper presents a model for few-shot concept learning that mimics human-like inductive reasoning by employing Bayesian methods over natural language hypotheses. The model generates candidate concepts expressed in natural language, which are evaluated against a learned prior based on human judgments, allowing for efficient inference across a diverse hypothesis space. By leveraging large language models, the approach captures human generalization patterns for abstract concepts, such as numerical sets, and demonstrates improved accuracy in concept-learning tasks compared to traditional Bayesian and program-learning models. The findings suggest that integrating human-like inductive biases into AI systems could enhance their data efficiency and generalization capabilities.
Paper URL: https://openreview.net/attachment?id=A7feCufBhL&name=pdf
This paper presents a comparative analysis of image captioning and contrastive pretraining approaches for developing vision encoders, specifically using a standard encoder-decoder transformer architecture. The authors demonstrate that image captioning as a standalone pretraining strategy yields competitive performance in classification tasks and outperforms contrastive pretraining on vision-and-language tasks. Through careful matching of training data, compute resources, and model capacity, they reveal that captioning exhibits superior scaling behavior and offers significant advantages for downstream multimodal applications. Additionally, they introduce a new pretraining technique called CapPa, which alternates between autoregressive and parallel decoding, further enhancing the performance of vision encoders. Overall, the findings challenge the prevailing notion that captioning is an inferior pretraining strategy, highlighting its potential for effective vision representation learning.
Paper URL: https://openreview.net/attachment?id=TXoZiUZywf&name=pdf
This paper introduces enhanced algorithms for the stochastic linear bandit problem, focusing on the development of tighter confidence sequences utilizing a novel tail bound for adaptive martingale mixtures. The algorithms, named Convex Martingale Mixture UCB (CMM-UCB) and Analytic Martingale Mixture UCB (AMM-UCB), leverage these confidence sequences to enable efficient action selection through convex programming, leading to competitive worst-case regret guarantees. The authors demonstrate that their confidence sequences outperform existing methods both theoretically and empirically, resulting in improved performance across various hyperparameter tuning tasks. The study highlights the importance of tighter confidence bounds in optimizing bandit algorithms and suggests potential avenues for future research, particularly in extending these results to non-linear reward functions.
Paper URL: https://openreview.net/attachment?id=jA235JGM09&name=pdf
The paper investigates vulnerabilities in large language models (LLMs) like GPT-4 and Claude v1.3, focusing on how safety training fails against jailbreak attacks. It identifies two primary failure modes: competing objectives, where safety goals conflict with model capabilities, and mismatched generalization, where safety training does not cover all potential input scenarios the model can encounter. Using these insights, the authors constructed new jailbreak methods that successfully bypassed safety measures, demonstrating that despite extensive safety training, LLMs remain vulnerable to adversarial manipulation. The findings highlight the necessity for safety mechanisms to be as sophisticated as the models themselves and argue that merely scaling up models will not inherently solve these safety issues.
Paper URL: https://openreview.net/attachment?id=q131tA7HCT&name=pdf
This paper addresses the challenge of learning causal representations from interventions in scenarios where the mixing function is nonlinear and the latent variables are Gaussian. The authors establish strong identifiability results for models with unknown single-node interventions, extending prior work that focused on simpler cases. They present a contrastive learning algorithm designed to identify latent variables effectively and assess its performance across various tasks. The findings reveal that identifiability can be achieved without requiring knowledge of intervention targets or paired data, thus making significant strides in the field of causal representation learning, particularly in complex real-world applications.
Paper URL: https://openreview.net/attachment?id=Pe9WxkN8Ff&name=pdf
The paper presents a novel approach to training Transformers that are inherently interpretable by design, termed "Transformer Programs." Building on an existing programming language (RASP), the authors propose a modified Transformer architecture that can be trained via gradient-based optimization and subsequently converted into discrete, human-readable programs in Python. This method facilitates the interpretation of model behavior, enabling the debugging of errors and the identification of the circuits employed for problem-solving. The authors validate their approach through various tasks, including algorithmic challenges and natural language processing applications, demonstrating that Transformer Programs achieve comparable performance to standard Transformers while being significantly easier to interpret. Overall, the work aims to advance the field of mechanistic interpretability in machine learning by creating models that are both effective and understandable.
Paper URL: https://openreview.net/attachment?id=AOKU4nRw1W&name=pdf
This paper presents SynGen, an innovative approach to enhance attribute correspondence in text-to-image generation models, specifically addressing issues of improper binding where visual attributes fail to correctly align with their corresponding linguistic modifiers. By syntactically analyzing text prompts to identify entities and modifiers, SynGen employs a novel loss function that aligns cross-attention maps with the linguistic structure of the input prompt during inference. Through evaluation on three datasets, including a newly designed challenge set, SynGen demonstrates significant improvements over existing state-of-the-art methods in generating faithful images that accurately reflect the input descriptions, emphasizing the effectiveness of integrating linguistic information in the image generation process.
Paper URL: https://openreview.net/attachment?id=cB0BImqSS9&name=pdf
The paper presents Monarch Mixer (M2), a novel architecture designed to achieve sub-quadratic scaling in both sequence length and model dimensions, overcoming the limitations of existing architectures such as Transformers that scale quadratically. M2 employs Monarch matrices, a class of structured matrices that efficiently captures various linear transformations, ensuring high hardware efficiency on GPUs. The authors demonstrate M2's efficacy through experiments in three domains: non-causal BERT-style language modeling, ViT-style image classification, and causal GPT-style language modeling, where M2 outperforms or matches state-of-the-art models with fewer parameters and increased throughput. The findings suggest M2 could pave the way for more efficient machine learning models, warranting further exploration and optimization.
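A minimal sketch of a Monarch matrix-vector product, two block-diagonal multiplies interleaved with a transpose, conveys where the sub-quadratic cost comes from; the M2 paper's exact parameterization may differ in details:

```python
import torch

def monarch_matvec(x, L, R):
    """Multiply by a Monarch matrix (sketch): n = m*m, and the dense n x n matrix
    is replaced by two block-diagonal factors L, R (m blocks of size m x m)
    interleaved with a transpose. Cost is O(n * sqrt(n)) instead of O(n^2)."""
    m = L.shape[0]
    b = x.shape[:-1]
    x = x.reshape(*b, m, m)
    x = torch.einsum('kij,...kj->...ki', L, x)   # first block-diagonal multiply
    x = x.transpose(-1, -2)                      # the FFT-like permutation
    x = torch.einsum('kij,...kj->...ki', R, x)   # second block-diagonal multiply
    return x.reshape(*b, m * m)

m = 8
x = torch.randn(4, m * m)
L, R = torch.randn(m, m, m), torch.randn(m, m, m)
print(monarch_matvec(x, L, R).shape)             # torch.Size([4, 64])
```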
Paper URL: https://openreview.net/attachment?id=QDByreuQyk&name=pdf
This paper presents significant advancements in differentially private algorithms for the min s-t cut and multiway k-cut problems, which are crucial for applications in graph theory and various machine learning tasks. The authors establish nearly tight bounds for both lower and upper error limits in the private min s-t cut algorithm, demonstrating that it can achieve privacy without compromising runtime efficiency. Their algorithm maintains an additive error of O(n) for edge-differential privacy while running at the speed of non-private algorithms. Furthermore, they introduce a novel approach for the multiway k-cut problem that reduces the additive error to O(n/log k), significantly more efficient than previous methods. The empirical evaluation supports the theoretical findings, showing that their algorithm's performance closely aligns with non-private counterparts, while also preserving data privacy.
Paper URL: https://openreview.net/attachment?id=HV85SiyrsV&name=pdf
This paper investigates online reinforcement learning (RL) within episodic Markov decision processes (MDPs) under the linear q^π-realizability assumption, which generalizes linear MDPs by allowing for states where all actions have approximately equal values. The authors propose a new algorithm, SkippyEleanor, which identifies states to ignore, effectively transforming the problem into a linear MDP setting. They demonstrate that this approach yields an ε-optimal policy after a polynomial number of interactions with the MDP, thus achieving the first polynomial-sample-complexity result for online RL in linearly q^π-realizable MDPs. The paper includes a thorough theoretical analysis and proves the algorithm's efficiency even in the presence of misspecification errors.
Paper URL: https://openreview.net/attachment?id=w116w62fxH&name=pdf
This paper investigates the statistical complexity of realizable regression within the frameworks of Probably Approximately Correct (PAC) learning and online learning. The authors introduce a minimax instance optimal learner for realizable regression and propose new combinatorial dimensions that characterize learnability in these settings. They establish necessary and sufficient conditions for PAC learnability based on the scaled graph and DS dimensions, and they introduce an optimal online learner that achieves minimax optimal cumulative loss. The results highlight gaps in existing dimensions for regression, contrasting them with binary and multiclass classification, and resolve an open question pertaining to the characterization of online realizable regression. The work aims to deepen the understanding of learning theory, specifically regarding the complexities of real-valued function prediction.
Paper URL: https://openreview.net/attachment?id=mmTy1iyU5G&name=pdf
This paper presents a theoretical framework for analyzing the effectiveness of deep neural networks as solution generators for combinatorial optimization problems, specifically focusing on policy gradient methods. The authors investigate the existence of generative models that can produce approximately optimal solutions while ensuring a polynomial number of parameters and a benign optimization landscape that avoids sub-optimal stationary points. They provide a positive answer to this question for several well-known combinatorial problems, including Max-Cut, Min-Cut, Maximum-Weight-Bipartite-Matching, and the Traveling Salesman Problem. Additionally, the paper introduces novel regularization techniques to enhance the optimization process, demonstrating through theoretical and empirical evidence that these methods can mitigate issues related to vanishing gradients and local minima, thereby improving the performance of solution samplers in practical scenarios.
Paper URL: https://openreview.net/attachment?id=sW8yGZ4uVJ&name=pdf
The paper investigates the global convergence of policy gradient methods, specifically under linear function approximation for finite-arm bandits. It establishes that global convergence is not solely dependent on approximation error, challenging previous assumptions that it is a key factor. The authors demonstrate that both the standard Softmax policy gradient (PG) and natural policy gradient (NPG) can achieve global convergence even with non-zero approximation errors, contingent upon specific conditions related to the representation of policies and rewards. For NPG, convergence is guaranteed if the projection of the reward preserves the optimal action's rank, while for Softmax PG, a non-domination condition and the ability to maintain reward ranking are critical. Experimental results corroborate these theoretical findings, emphasizing a need to reassess the role of approximation error in characterizing the convergence properties of policy gradient methods.
Paper URL: https://openreview.net/attachment?id=f38EY21lBw&name=pdf
The paper introduces a novel framework for auditing differentially private (DP) machine learning systems using just one training run, capitalizing on the ability to independently add or remove multiple training examples. By linking differential privacy with statistical generalization, the authors demonstrate that their approach can yield meaningful empirical lower bounds on privacy parameters without the computational burden of running multiple models, which typically requires hundreds of training sessions. The methodology is validated through experiments with DP-SGD on the CIFAR-10 dataset, achieving significant lower bounds on privacy parameters while maintaining model accuracy. This work represents a significant advancement in privacy auditing, making it more feasible for large-scale machine learning applications.
Paper URL: https://openreview.net/attachment?id=y8UAQQHVTX&name=pdf
This paper introduces the concept of private everlasting prediction (PEP), which extends the notion of private prediction to accommodate an unlimited stream of classification queries while safeguarding the privacy of both the training set and the adaptive queries. The authors highlight the limitations of traditional private learners, which often exhibit high sample complexity, particularly in the context of learning threshold functions. They propose a generic construction for PEP that is applicable to concept classes with finite VC dimensions, demonstrating that their approach requires an initial training sample size that is quadratic in the VC dimension. The paper also discusses the implications of their findings for private prediction and the potential for efficient implementations in specific contexts, while leaving open questions about the reduction of sample complexity and computational efficiency.
Paper URL: https://openreview.net/attachment?id=OUIFPHEgJU&name=pdf
The paper introduces QLoRA, a novel approach for efficiently finetuning quantized large language models (LLMs), specifically enabling the finetuning of a 65B parameter model on a single 48GB GPU without losing performance compared to traditional 16-bit methods. QLoRA utilizes a frozen, 4-bit quantized pretrained model, which incorporates Low-Rank Adapters (LoRA) to facilitate the training process. Key innovations include a new quantization technique called 4-bit NormalFloat (NF4), Double Quantization for further memory reduction, and Paged Optimizers to manage memory spikes. The resulting Guanaco model family achieves state-of-the-art performance on the Vicuna benchmark, closely rivaling ChatGPT while reducing memory requirements significantly. The paper also emphasizes the importance of dataset quality over size in model performance and evaluates chatbot capabilities using both human and GPT-4 assessments, revealing discrepancies in current benchmark evaluations.
Paper URL: https://openreview.net/attachment?id=MFWgLCWgUB&name=pdf
This paper establishes that the RANDOM COORDINATE CUT algorithm achieves the optimal competitive ratio for explainable k-medians clustering in 1-dimensional space, matching the lower bound proposed by Dasgupta et al. (2020). The authors analyze the algorithm's performance, demonstrating that its competitive ratio is bounded by \(2 \ln k + 2\), which effectively aligns with the previously established \(O(\log k)\) lower bound. The study emphasizes the importance of explainability in machine learning clustering techniques by employing threshold decision trees to enhance interpretability, thus allowing algorithmic decisions to be better understood in critical applications. Additionally, the paper provides a straightforward analysis through the concept of a Set Elimination Game, which serves as a foundation for evaluating the algorithm's efficiency.
Paper URL: https://openreview.net/attachment?id=1vzF4zWQ1E&name=pdf
The paper "Rethinking Bias Mitigation: Fairer Architectures Make for Fairer Face Recognition" by Dooley et al. addresses the inherent biases in face recognition systems, traditionally attributed to biased training data. The authors argue that biases are also embedded within the neural network architectures themselves. They conduct a large-scale analysis revealing significant impacts of architectural choices and hyperparameters on fairness. By utilizing a novel approach that combines neural architecture search (NAS) and hyperparameter optimization (HPO), the authors develop models that outperform existing architectures in both accuracy and multiple fairness metrics on prominent datasets like CelebA and VGGFace2. These new models demonstrate promising generalization capabilities across various datasets and sensitive attributes, thereby offering a new paradigm for achieving fairness in face recognition systems. The code and models are made publicly available for further research and applications.
Paper URL: https://openreview.net/attachment?id=fg7iyNK81W&name=pdf
This paper introduces Rotating Features, an innovative approach to scaling continuous and distributed object-centric representations in machine learning, specifically addressing the binding problem in human cognition. The authors critique existing slot-based methods for their limitations in representing uncertainty and flexibility in object representation, proposing Rotating Features as a generalization of complex-valued features to higher dimensions. This method enhances object discovery capabilities from simple toy datasets to complex real-world data by leveraging a novel evaluation procedure and applying pretrained features. The results demonstrate that Rotating Features can effectively represent multiple objects simultaneously and adapt well to various input complexities, advancing the field of object-centric representation learning.
Paper URL: https://openreview.net/attachment?id=Sf9goJtTCE&name=pdf
This paper investigates the application of stochastic gradient descent (SGD) to sample from Gaussian process (GP) posteriors, addressing the computational challenges posed by the cubic cost associated with traditional GP methods. It presents a novel approach that reformulates the posterior sampling problem into optimization tasks amenable to SGD, which allows for efficient sampling even in large-scale or ill-conditioned scenarios. The authors demonstrate that, despite slower convergence rates, SGD can yield high-quality predictions and uncertainty estimates comparable to those obtained from more computationally intensive methods, achieving state-of-the-art performance on various regression tasks and a large-scale Bayesian optimization benchmark. Key findings include a spectral analysis of SGD's implicit bias, which suggests that accurate predictions can still be achieved even when convergence to the optimum is not fully realized.
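The reformulation can be illustrated on the posterior mean, which solves a regularized quadratic in the representer weights. The sketch below uses plain gradient descent on that quadratic for clarity, where the paper would use stochastic gradient estimates, and omits the paper's pathwise posterior sampling:

```python
import numpy as np

# Sketch of the reformulation (posterior mean only). The GP posterior mean is
# k(x, X) @ v with v the solution of (K + sigma^2 I) v = y, i.e. the minimizer
# of the quadratic 0.5 * v^T (K + sigma^2 I) v - y^T v. Iterating on this
# quadratic avoids the O(n^3) Cholesky solve; the paper uses stochastic
# estimates of the same gradient.
rng = np.random.default_rng(0)
n = 100
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)
K = np.exp(-0.5 * (X - X.T) ** 2)            # RBF kernel matrix
A = K + 0.25 * np.eye(n)                     # sigma^2 = 0.25

v = np.zeros(n)
lr = 1.0 / np.linalg.norm(A, 2)              # safe step size for the quadratic
for _ in range(5000):
    v -= lr * (A @ v - y)

print(np.linalg.norm(A @ v - y))             # residual shrinks toward 0
```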
Paper URL: https://openreview.net/attachment?id=j5BuTrEj35&name=pdf
The paper investigates scaling large language models (LLMs) under data-constrained conditions, addressing the diminishing returns of data repetition and optimizing compute allocation. Through extensive experiments, the authors demonstrate that training on repeated data for up to four epochs yields minimal loss changes compared to unique data, while excess repetition leads to diminishing returns. They propose a new scaling law that integrates these findings with the existing Chinchilla scaling laws. Additionally, the study explores alternative strategies to mitigate data scarcity, such as augmenting training datasets with code and relaxing filtering criteria, revealing that these methods can enhance model performance in low-data scenarios. The findings suggest a path forward for scaling language models effectively despite data limitations, emphasizing the importance of adjusting training methodologies.
Paper URL: https://openreview.net/attachment?id=Dkmpa6wCIx&name=pdf
This paper investigates the relationship between sharpness minimization algorithms and generalization in overparameterized neural networks, revealing that the connection is complex and depends on model architecture and data distribution. The authors identify three scenarios involving two-layer ReLU networks: (1) flatness guarantees generalization, (2) non-generalizing flattest models exist where sharpness minimization fails, and (3) sharpness minimization can still achieve generalization even with non-generalizing flattest models. These findings challenge the notion that sharpness minimization directly leads to better generalization, suggesting that additional factors must be considered to fully understand generalization in neural networks.
Paper URL: https://openreview.net/attachment?id=yC3q7vInux&name=pdf
The paper introduces Siamese Masked Autoencoders (SiamMAE), an innovative extension of Masked Autoencoders (MAE) designed to enhance visual correspondence learning from video data. SiamMAE employs a unique asymmetric masking strategy, where 95% of the future frame's patches are masked while the past frame remains intact. This approach encourages the model to focus on object motion and develop object-centric representations. The authors demonstrate that SiamMAE significantly outperforms state-of-the-art self-supervised methods across various tasks, including video object segmentation and pose keypoint propagation, without relying on data augmentation or complex tracking techniques. The findings suggest that leveraging temporal information through asymmetric masking is crucial for effective correspondence learning in video representations, paving the way for future research in this domain.
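A minimal sketch of the asymmetric masking step follows; patch embedding, encoder, and decoder are assumed to exist elsewhere, and the 95% ratio follows the summary above:

```python
import torch

def asymmetric_mask(past_patches, future_patches, mask_ratio=0.95):
    """SiamMAE-style asymmetric masking (sketch): the past frame passes through
    intact, while 95% of the future frame's patches are dropped, so the model
    must reconstruct them from motion cues across frames."""
    b, n, d = future_patches.shape
    n_keep = int(n * (1 - mask_ratio))
    # sample a random subset of patches to keep, independently per example
    noise = torch.rand(b, n)
    keep_idx = noise.argsort(dim=1)[:, :n_keep]
    visible_future = torch.gather(
        future_patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, d))
    return past_patches, visible_future, keep_idx

past = torch.randn(2, 196, 768)       # 14x14 patches, ViT-Base width
future = torch.randn(2, 196, 768)
_, vis, idx = asymmetric_mask(past, future)
print(vis.shape)                      # torch.Size([2, 9, 768]): ~5% of patches
```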
Paper URL: https://openreview.net/attachment?id=73XPopmbXH&name=pdf
This paper investigates the sample complexity required for learning single index models under isotropic Gaussian distributions, particularly focusing on the effectiveness of stochastic gradient descent (SGD). Previous findings indicated a gap between the sample complexity necessary for gradient-based methods and the theoretical lower bounds, with SGD requiring significantly more samples than what was indicated by Correlational Statistical Query (CSQ) lower bounds. The authors address this gap by introducing a smoothing technique applied to the loss landscape, demonstrating that by utilizing a smoothed loss, SGD can achieve the optimal sample complexity of \(n \asymp d^{k/2}\) samples, aligning with CSQ bounds for models where the information exponent \(k > 2\). The analysis connects this improvement to the enhanced signal-to-noise ratio resulting from smoothing, which allows the learning process to avoid poor local minima, thus improving convergence properties.
Paper URL: https://openreview.net/attachment?id=KvPwXVcslY&name=pdf
This paper investigates the spatial frequency information utilized by humans and neural networks for object recognition, employing critical band masking to compare their performance in recognizing natural images under noise. The study finds that humans rely on a narrow, one-octave-wide spatial frequency channel for recognition tasks, consistent across various stimuli, while neural networks exhibit significantly broader channels—2-4 times wider than humans. This discrepancy in channel width correlates with differences in shape bias and adversarial robustness, with adversarial training further increasing the networks' channel bandwidth beyond human levels. The findings suggest that aligning the spatial frequency channels of neural networks with those of humans could enhance their robustness against adversarial attacks.
Paper URL: https://openreview.net/attachment?id=a2Yg9Za6Rb&name=pdf
This paper investigates the effectiveness of model distillation as a privacy-preserving technique in machine learning, focusing on its vulnerability to membership inference attacks. The authors demonstrate that simply relying on distillation does not adequately protect sensitive training data from being inferred, as their developed attacks reveal that information can leak from teacher models to student models, even without direct access to training examples. They find that the similarity between teacher and student datasets, as well as data poisoning, significantly increases privacy risks. Moreover, they propose mitigating strategies such as deduplication and employing differential privacy, emphasizing the need for comprehensive privacy measures beyond model distillation alone.
Paper URL: https://openreview.net/attachment?id=0A9f2jZDGW&name=pdf
The paper investigates task arithmetic in vision-language models, emphasizing its potential for efficient model editing by manipulating weights directly in the tangent space. It identifies weight disentanglement as a critical factor enabling effective task arithmetic, revealing that distinct directions in weight space correspond to localized function space regions for different tasks. The authors demonstrate that fine-tuning in the tangent space enhances weight disentanglement, leading to improved performance on various benchmarks. They also connect task arithmetic to the spatial localization of the neural tangent kernel (NTK) eigenfunctions, establishing that weight disentanglement emerges during pre-training. The findings suggest that linearized fine-tuning can significantly enhance task arithmetic performance, offering insights for developing more effective model editing techniques.
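Task arithmetic itself is simple to state: a task vector is τ = θ_ft − θ_pre, and editing adds scaled task vectors to the pretrained weights. The sketch below shows the arithmetic only; the paper's contribution concerns making these directions disentangled by fine-tuning in the tangent space:

```python
import torch

def task_vector(pretrained: dict, finetuned: dict) -> dict:
    """A task vector is the weight difference tau = theta_ft - theta_pre."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_task_arithmetic(pretrained: dict, vectors: list, coeffs: list) -> dict:
    """Edit a model by adding scaled task vectors:
    theta = theta_pre + sum_i a_i * tau_i (the arithmetic only; the paper's
    finding is that tangent-space fine-tuning disentangles these directions)."""
    edited = {k: v.clone() for k, v in pretrained.items()}
    for tau, a in zip(vectors, coeffs):
        for k in edited:
            edited[k] += a * tau[k]
    return edited

theta_pre = {"w": torch.zeros(3)}
tau_a = task_vector(theta_pre, {"w": torch.tensor([1., 0., 0.])})
tau_b = task_vector(theta_pre, {"w": torch.tensor([0., 1., 0.])})
print(apply_task_arithmetic(theta_pre, [tau_a, tau_b], [1.0, -0.5]))
```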
Paper URL: https://openreview.net/attachment?id=Kv8GJkV19S&name=pdf
The paper introduces the first universal tester-learner for halfspaces that operates efficiently across a broad class of structured distributions, specifically those satisfying the Poincaré inequality. This tester-learner is designed to accept a wide variety of distributions without being tailored to any single target distribution. The proposed algorithm runs in polynomial time and guarantees an error of O(opt) + ε on any labeled distribution it accepts. It utilizes hypercontractivity checks via a sum-of-squares program, marking a significant advancement over previous works that were limited to specific distributions, such as Gaussian or log-concave. Additionally, under the assumption of known Massart noise, it achieves an error rate of opt + ε, thereby extending its applicability and performance for various learning scenarios.
Paper URL: https://openreview.net/attachment?id=S5wmbQc1We&name=pdf
This paper explores whether neural networks trained on algorithmic tasks, specifically modular addition, reliably rediscover known algorithms. By examining two algorithms—Clock and Pizza—the authors demonstrate that small changes in model hyperparameters can lead to qualitatively different algorithmic implementations. The Clock algorithm, which aligns with traditional modular arithmetic, is shown to be one of several possible solutions, as some networks implement the less intuitive Pizza algorithm, characterized by averaging embeddings rather than using multiplication. The findings suggest a rich diversity of algorithmic behaviors in neural networks, emphasizing the need for new interpretability tools to navigate the complex algorithmic phase space that these models inhabit.
Paper URL: https://openreview.net/attachment?id=jDIlzSU8wJ&name=pdf
This paper demonstrates the effectiveness of denoising diffusion probabilistic models for optical flow and monocular depth estimation, challenging the traditional reliance on specialized architectures and loss functions. The authors introduce the Denoising Diffusion Vision Model (DDVM), which excels in uncertainty capturing and allows for Monte Carlo inference, outperforming existing state-of-the-art methods on benchmark datasets like NYU and KITTI. The model integrates self-supervised pre-training, innovative training techniques for handling noisy data, and a coarse-to-fine refinement approach, achieving significant improvements in performance metrics such as relative depth error and optical flow outlier rates. The findings indicate that diffusion models could serve as a powerful and flexible framework for dense vision tasks, emphasizing their potential for capturing multimodal distributions and handling ambiguities in data.
Paper URL: https://openreview.net/attachment?id=Yacmpz84TH&name=pdf
The paper introduces Toolformer, a novel language model that enhances its capabilities by learning to utilize external tools through simple APIs in a self-supervised manner. Unlike traditional models, which often require extensive human annotations or are limited to specific tasks, Toolformer autonomously decides when and how to call various APIs, such as calculators, search engines, and translation systems, based on a few demonstrations. This approach not only improves its performance across a range of downstream tasks in zero-shot settings but also maintains its language modeling abilities. Toolformer, based on a 6.7 billion parameter GPT-J model, surpasses even larger models like GPT-3 in several benchmarks, demonstrating its effectiveness in addressing inherent limitations of language models, such as arithmetic skills and factual accuracy.
Paper URL: https://openreview.net/attachment?id=BHXsb69bSx&name=pdf
The paper introduces ToolkenGPT, a novel approach that enhances large language models (LLMs) by integrating external tools through the concept of tool embeddings, referred to as "toolkens." Unlike traditional methods that require extensive fine-tuning or are constrained by limited context in in-context learning, ToolkenGPT allows for the dynamic addition of multiple tools and utilizes extensive demonstration data to learn toolken embeddings efficiently. The framework prompts the LLM to switch modes when a tool is called, enabling it to generate arguments for tool execution seamlessly. Experimental results demonstrate that ToolkenGPT significantly outperforms existing baselines in various domains, including numerical reasoning, knowledge-based question answering, and embodied plan generation, showcasing its ability to adeptly utilize a wide range of tools in complex scenarios without the need for costly fine-tuning.
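One way to picture toolkens is as extra learnable rows appended to a frozen LM head, so that calling a tool is ordinary next-token prediction over an enlarged vocabulary. The sketch below is illustrative, not the paper's code:

```python
import torch
import torch.nn as nn

class ToolkenHead(nn.Module):
    """Sketch of the toolken idea: append one learnable embedding per tool to a
    frozen LM head, so calling a tool is next-token prediction over an enlarged
    vocabulary. Only `tool_embeddings` is trained; names and shapes here are
    illustrative, not the paper's code."""
    def __init__(self, lm_head_weight: torch.Tensor, num_tools: int):
        super().__init__()
        self.register_buffer("word_weight", lm_head_weight)   # frozen |V| x d
        d = lm_head_weight.shape[1]
        self.tool_embeddings = nn.Parameter(torch.randn(num_tools, d) * 0.02)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        w = torch.cat([self.word_weight, self.tool_embeddings], dim=0)
        return hidden @ w.T                   # logits over words + toolkens

head = ToolkenHead(torch.randn(32000, 512), num_tools=4)
logits = head(torch.randn(2, 10, 512))
print(logits.shape)                           # torch.Size([2, 10, 32004])
```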
Paper URL: https://openreview.net/attachment?id=qHrADgAdYu&name=pdf
This paper investigates the theoretical foundations of Chain-of-Thought (CoT) prompting, which has been shown to significantly enhance the performance of Large Language Models (LLMs) on complex tasks, particularly in mathematics and reasoning. The authors employ circuit complexity theory to demonstrate that bounded-depth Transformers struggle to directly solve basic arithmetic and equation tasks without CoT, requiring super-polynomial model sizes. In contrast, they establish that constant-size autoregressive Transformers can effectively utilize CoT to generate step-by-step derivations, enabling them to tackle a broader class of decision-making problems, such as Dynamic Programming. Empirical experiments further confirm that models trained with CoT consistently outperform those trained for direct predictions, emphasizing the critical role of CoT in unlocking the potential of LLMs for solving intricate real-world tasks.
Paper URL: https://openreview.net/attachment?id=liMSqUuVg9&name=pdf
The paper presents a comprehensive statistical theory for transformers' capabilities in in-context learning (ICL), demonstrating that they can implement various standard machine learning algorithms—such as least squares and Lasso—in context without explicit parameter updates. The authors establish that transformers can perform adaptive in-context algorithm selection, allowing them to choose different algorithms based on input sequences, thus enhancing predictive performance. They construct two mechanisms for algorithm selection: pre-ICL testing and post-ICL validation, providing theoretical guarantees for their approaches. The experimental results affirm the theoretical findings, showcasing strong ICL and algorithm selection capabilities in standard transformer architectures.
Paper URL: https://openreview.net/attachment?id=5Xc1ecxO1h&name=pdf
The paper introduces the Tree of Thoughts (ToT) framework, enhancing the problem-solving capabilities of large language models (LMs) by allowing them to engage in deliberate decision-making processes that involve exploring multiple reasoning paths and self-evaluating their choices. Unlike traditional left-to-right inference mechanisms, ToT enables LMs to maintain a tree of coherent text units (thoughts) and apply search algorithms, such as breadth-first and depth-first search, to navigate through potential solutions. Experimental results demonstrate that ToT significantly improves performance in tasks requiring complex reasoning, including the Game of 24, Creative Writing, and Mini Crosswords, achieving markedly higher success rates compared to existing prompting techniques like Chain of Thought. The framework's flexibility and generality position it as a promising approach for tackling diverse problem-solving challenges in the realm of LMs.
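The BFS variant of the search has a simple generic shape: propose candidate thoughts with the LM, score them with an LM-based evaluator, keep the best few, and repeat. In the sketch below, `propose` and `evaluate` stand in for prompted LM calls, and the toy instantiation is purely illustrative:

```python
from typing import Callable, List

def tree_of_thoughts_bfs(root: str,
                         propose: Callable[[str], List[str]],
                         evaluate: Callable[[str], float],
                         breadth: int = 5,
                         depth: int = 3) -> str:
    """Generic BFS over partial solutions ("thoughts") in the shape of the ToT
    framework: expand every kept state with proposed thoughts, score them, and
    keep the best `breadth` at each level."""
    frontier = [root]
    for _ in range(depth):
        candidates = [c for state in frontier for c in propose(state)]
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:breadth]
    return max(frontier, key=evaluate)

# toy instantiation: "thoughts" append digits, the evaluator prefers big numbers
best = tree_of_thoughts_bfs(
    "", propose=lambda s: [s + d for d in "0123456789"],
    evaluate=lambda s: int(s) if s else 0)
print(best)  # "999"
```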
Paper URL: https://openreview.net/attachment?id=NnMEadcdyD&name=pdf
This paper establishes a theoretical connection between diffusion model objectives and the Evidence Lower Bound (ELBO), demonstrating that common diffusion objectives can be viewed as weighted integrals of ELBOs across varying noise levels, with specific weightings influencing the model's performance. The authors show that when the weighting function is monotonic, these objectives correspond to maximizing the ELBO with Gaussian noise perturbation as a form of data augmentation. Through experiments on the ImageNet dataset, they explore various monotonic weightings and demonstrate that their proposed approaches achieve state-of-the-art results in image generation. The findings suggest significant implications for optimizing diffusion models and understanding their relationship to other generative modeling techniques.
Paper URL: https://openreview.net/attachment?id=PITeSdYQkv&name=pdf
This paper addresses the challenge of user-level differential privacy (DP) in scenarios where each user contributes only a few examples, as opposed to the previously studied example-rich cases. The authors present a generic method to transform item-level DP algorithms into user-level DP algorithms, yielding significant reductions in the number of users required to achieve similar utility, specifically a multiplicative savings of O(m) where m is the number of examples per user. They also propose techniques for both approximate and pure DP, adapting existing mechanisms like the exponential mechanism to fit the user-level framework. The results yield new sample complexity bounds for various learning tasks, including PAC learning, while highlighting the computational inefficiencies of the proposed algorithms. Overall, the paper contributes to advancing the understanding of user-level DP, providing algorithms that are useful for practical machine learning applications while outlining open questions for future research.
Paper URL: https://openreview.net/attachment?id=w0H2xGHlkw&name=pdf
The paper introduces LLaVA, a novel approach to visual instruction tuning that leverages GPT-4 to generate multimodal instruction-following data, aiming to enhance the capabilities of large multimodal models (LMMs) for visual and language tasks. LLaVA effectively connects a vision encoder with a language model to create a general-purpose assistant capable of interpreting and responding to visual instructions. The authors construct two evaluation benchmarks to assess the model's performance across diverse tasks. Experimental results demonstrate that LLaVA exhibits strong multimodal chat capabilities and achieves state-of-the-art accuracy on the Science QA dataset, outperforming existing models. The paper also emphasizes the importance of creating high-quality multimodal instruction-following data and provides open-source access to these resources for further research in the field.
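The architectural glue is a small trainable projection that maps vision-encoder patch features into the language model's token-embedding space; the original LLaVA uses a linear projection from CLIP ViT-L features, and the dimensions below are illustrative:

```python
import torch
import torch.nn as nn

class VisionLanguageConnector(nn.Module):
    """Sketch of the LLaVA-style glue: patch features from a frozen vision
    encoder are projected into the language model's token-embedding space and
    prepended to the text embeddings. Dimensions are illustrative."""
    def __init__(self, vision_dim=1024, lm_dim=4096):
        super().__init__()
        self.proj = nn.Linear(vision_dim, lm_dim)

    def forward(self, image_feats, text_embeds):
        # image_feats: (B, num_patches, vision_dim); text_embeds: (B, T, lm_dim)
        visual_tokens = self.proj(image_feats)
        return torch.cat([visual_tokens, text_embeds], dim=1)

conn = VisionLanguageConnector()
seq = conn(torch.randn(1, 256, 1024), torch.randn(1, 32, 4096))
print(seq.shape)   # torch.Size([1, 288, 4096]): visual tokens + text tokens
```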
Paper URL: https://openreview.net/attachment?id=oML3v2cFg2&name=pdf
The paper introduces a novel approach to Offline Inverse Reinforcement Learning (IRL) by framing it as a maximum likelihood estimation problem, addressing the challenges posed by limited expert demonstrations and distribution shifts in the environment dynamics. The authors develop a bi-level optimization framework where the upper level maximizes the likelihood of observed expert actions, while the lower level conservatively estimates the expert's policy and the world model. This method incorporates uncertainty estimation to penalize state-action pairs with high uncertainty, thereby enhancing the reliability of the reward recovery process. The proposed algorithm, termed Offline ML-IRL, demonstrates significant improvements over existing offline IRL and imitation learning benchmarks, particularly in continuous control tasks using the MuJoCo simulator and various datasets from the D4RL benchmark. The authors provide statistical and computational guarantees for the performance of their method, emphasizing its applicability in safety-sensitive domains.
Paper URL: https://openreview.net/attachment?id=APGXBNkt6h&name=pdf
This paper investigates the effectiveness of Transformers in reinforcement learning (RL) by distinguishing between memory capabilities and credit assignment abilities. The authors define memory length and credit assignment length, and design configurable tasks to empirically evaluate these aspects in RL contexts. The findings reveal that Transformers significantly enhance long-term memory, enabling them to recall observations up to 1500 steps back, but do not provide improvements in long-term credit assignment. The study highlights the need for careful task design in RL benchmarks and suggests that while Transformers are powerful for memory tasks, they do not universally solve all RL challenges.
Paper URL: https://openreview.net/attachment?id=rcXXNFVlEn&name=pdf
The paper investigates the effectiveness of chain-of-thought reasoning in language models and its connection to the statistical structure of training data. It posits that reasoning through intermediate steps allows models to make more accurate inferences when the training data consists of overlapping local clusters of related variables. Experimental results demonstrate that models trained on locally structured data significantly benefit from reasoning steps, leading to lower bias in estimating conditional probabilities, particularly for pairs of variables not frequently co-occurring in training. The findings highlight the importance of local statistical dependencies in enhancing the reasoning capabilities of both humans and language models, suggesting that such reasoning improves data efficiency.