Google AI Blog: Improving the Accuracy of Genomic Analysis with DeepVariant 1.0
ai.googleblog.com

8 min read, listed 10 hours ago in Google AI Blog

73

Posted by Andrew Carroll, Product Lead, and Pi-Chuan Chang, Technical Lead, Google Health. Sequencing genomes involves sampling short piec...

[1912.06594] An Interval-Valued Utility Theory for Decision Making with Dempster-Shafer Belief Functions
arxiv.org

1 min read, listed 12 hours ago in reddit/artificial/Artificial intelligence expert originates new theory for decision-making

1

The main goal of this paper is to describe an axiomatic utility theory for Dempster-Shafer belief function lotteries. The axiomatic framework used is analogous to von Neumann-Morgenstern's utility theory for probabilistic lotteries as described by Luce and Raiffa. Unlike the probabilistic case, our axiomatic framework leads to interval-valued utilities, and therefore, to a partial (incomplete) preference order on the set of all belief function lotteries. If the belief function reference lotteries we use are Bayesian belief functions, then our representation theorem coincides with Jaffray's representation theorem for his linear utility theory ...
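
For orientation, one textbook way interval-valued expectations arise for belief functions (a standard construction, not necessarily the paper's exact axiomatization) is the lower and upper expected utility of a mass function m over focal sets:

\[
\underline{E}[u] \;=\; \sum_{A \subseteq \Omega} m(A)\,\min_{x \in A} u(x),
\qquad
\overline{E}[u] \;=\; \sum_{A \subseteq \Omega} m(A)\,\max_{x \in A} u(x).
\]

A lottery f can then be strictly preferred to g only when its whole interval dominates, e.g. \(\underline{E}_f[u] \ge \overline{E}_g[u]\), which is why such interval-valued utilities yield only a partial preference order.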

Cybersecurity After Coronavirus: Best Practices for a Better Defense of Data
datafloq.com

1 min read, listed 12 hours ago in Datafloq: Driving Innovation through Data

3

Cybersecurity has always been important for businesses, but with everyone working remotely, its importance has only increased.

MutaGAN: A Seq2seq GAN Framework to Predict Mutations of Evolving Protein Populations – arXiv Vanity
www.arxiv-vanity.com

1 min read, listed 16 hours ago in reddit/MachineLearning/[R] MutaGAN: A Seq2seq GAN Framework to Predict Mutations of Evolving Protein Populations

[2008.11790] MutaGAN: A Seq2seq GAN Framework to Predict Mutations of Evolving Protein Populations
arxiv.org

2 min read, listed 16 hours ago in reddit/MachineLearning/[R] MutaGAN: A Seq2seq GAN Framework to Predict Mutations of Evolving Protein Populations

101

The ability to predict the evolution of a pathogen would significantly improve the ability to control, prevent, and treat disease. Despite significant progress in other problem spaces, deep learning has yet to contribute to the issue of predicting mutations of evolving populations. To address this gap, we developed a novel machine learning framework using generative adversarial networks (GANs) with recurrent neural networks (RNNs) to accurately predict genetic mutations and evolution of future biological populations. Using a generalized time-reversible phylogenetic model of protein evolution with bootstrapped maximum likelihood tree estimatio...

[2009.08295] Neural CDEs for Long Time Series via the Log-ODE Method
arxiv.org

1 min read, listed 21 hours ago in reddit/MachineLearning/[R] Neural CDEs for very long (17k) time series (TLDR: magic binning strategy!)

1

Neural Controlled Differential Equations (Neural CDEs) are the continuous-time analogue of an RNN, just as Neural ODEs are analogous to ResNets. However just like RNNs, training Neural CDEs can be difficult for long time series. Here, we propose to apply a technique drawn from stochastic analysis, namely the log-ODE method. Instead of using the original input sequence, our procedure summarises the information over local time intervals via the log-signature map, and uses the resulting shorter stream of log-signatures as the new input. This represents a length/channel trade-off. In doing so we demonstrate efficacy on problems of length up to 17...
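
The length/channel trade-off can be illustrated with a minimal sketch: split a very long series into local windows and replace each window with a fixed-size summary. The paper uses log-signatures for those summaries; here simple per-window statistics stand in purely for illustration.

import numpy as np

def summarise_windows(series, window):
    """Split a (length, channels) series into windows and summarise each one.

    The log-ODE method summarises each window with its log-signature; simple
    statistics (first value, increment, mean) stand in here to show the
    shorter-but-wider stream that the Neural CDE then consumes.
    """
    length, channels = series.shape
    n_windows = length // window
    out = []
    for i in range(n_windows):
        chunk = series[i * window:(i + 1) * window]          # (window, channels)
        summary = np.concatenate([chunk[0], chunk[-1] - chunk[0], chunk.mean(axis=0)])
        out.append(summary)
    return np.stack(out)                                      # (n_windows, 3 * channels)

x = np.random.randn(17000, 4)         # a very long 4-channel time series
z = summarise_windows(x, window=100)  # 170 steps of 12 channels: length traded for channels
print(z.shape)                        # (170, 12)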

A solution to the learning dilemma for recurrent networks of spiking neurons | Nature Communications
www.nature.com

21 min read, listed 21 hours ago in reddit/MachineLearning/[R] EventProp: Backpropagation for Exact Gradients in Spiking Neural Networks

2

Bellec et al. present a mathematically founded approximation for gradient descent training of recurrent neural networks without backwards propagation in time. This enables biologically plausible training of spike-based neural network models with working memory and supports on-chip training of neuromorphic hardware.

EventProp: Backpropagation for Exact Gradients in Spiking Neural Networks – arXiv Vanity
www.arxiv-vanity.com

5 min read, listed 21 hours ago in reddit/MachineLearning/[R] EventProp: Backpropagation for Exact Gradients in Spiking Neural Networks

We derive the backpropagation algorithm for spiking neural networks composed of leaky integrate-and-fire neurons operating in continuous time. This algorithm, EventProp, computes the exact gradient of an arbitrary loss function of spike times and membrane potentials by backpropagating errors in time. For the first time, by leveraging methods from optimal control theory, we are able to backpropagate errors through spike discontinuities and avoid approximations or smoothing operations. EventProp can be applied to spiking networks with arbitrary connectivity, including recurrent, convolutional and deep feed-forward architectures. While we consid...

[2009.08378] EventProp: Backpropagation for Exact Gradients in Spiking Neural Networks
arxiv.org

1 min read, listed 21 hours ago in reddit/MachineLearning/[R] EventProp: Backpropagation for Exact Gradients in Spiking Neural Networks

We derive the backpropagation algorithm for spiking neural networks composed of leaky integrate-and-fire neurons operating in continuous time. This algorithm, EventProp, computes the exact gradient of an arbitrary loss function of spike times and membrane potentials by backpropagating errors in time. For the first time, by leveraging methods from optimal control theory, we are able to backpropagate errors through spike discontinuities and avoid approximations or smoothing operations. EventProp can be applied to spiking networks with arbitrary connectivity, including recurrent, convolutional and deep feed-forward architectures. While we consid...

MoPro: Webly Supervised Learning with Momentum Prototypes
blog.einstein.ai

6 min read, listed 1 day ago in Salesforce Research

TL;DR: We propose a new webly-supervised learning algorithm which achieves state-of-the-art representation learning performance by training on noisy Web images. Deep neural networks are known to be hungry for labeled data. Current state-of-the-art CNNs are trained with supervised learning on datasets such as ImageNet or Places, which contain millions...
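
The "momentum prototype" idea can be pictured as a per-class embedding maintained as an exponential moving average of normalized sample embeddings. The sketch below shows only that general mechanism; the function names and momentum value are illustrative, not the paper's full algorithm.

import numpy as np

def update_prototype(prototype, embedding, momentum=0.999):
    """Exponential-moving-average update of a class prototype (illustrative only)."""
    embedding = embedding / np.linalg.norm(embedding)
    prototype = momentum * prototype + (1.0 - momentum) * embedding
    return prototype / np.linalg.norm(prototype)   # keep prototypes on the unit sphere

# toy usage: one prototype per class, updated as (possibly noisy) web images stream in
dim, n_classes = 128, 10
prototypes = np.random.randn(n_classes, dim)
prototypes /= np.linalg.norm(prototypes, axis=1, keepdims=True)
label, embedding = 3, np.random.randn(dim)
prototypes[label] = update_prototype(prototypes[label], embedding)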

[2009.06489] The Hardware Lottery
arxiv.org

1 min read, listed 1 day ago in reddit/MachineLearning/[R] The Hardware Lottery: “The advent of domain specialized hardware makes it increasingly costly to stray off of the beaten path of research ideas.” (Essay by Sara Hooker)

61

Hardware, systems and algorithms research communities have historically had different incentive structures and fluctuating motivation to engage with each other explicitly. This historical treatment is odd given that hardware and software have frequently determined which research ideas succeed (and fail). This essay introduces the term hardware lottery to describe when a research idea wins because it is suited to the available software and hardware and not because the idea is superior to alternative research directions. Examples from early computer science history illustrate how hardware lotteries can delay research progress by casting success...

[2009.06978] Dialogue Response Ranking Training with Large-Scale Human Feedback Data
arxiv.org

1 min read, listed 1 day ago in reddit/MachineLearning/[R] This model predicts which Reddit comment gets more upvotes

Existing open-domain dialog models are generally trained to minimize the perplexity of target human responses. However, some human replies are more engaging than others, spawning more followup interactions. Current conversational models are increasingly capable of producing turns that are context-relevant, but in order to produce compelling agents, these models need to be able to predict and optimize for turns that are genuinely engaging. We leverage social media feedback data (number of replies and upvotes) to build a large-scale training dataset for feedback prediction. To alleviate possible distortion between the feedback and engagingness,...
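
The data-construction recipe the abstract describes can be sketched as turning (context, replies-with-upvotes) threads into pairwise ranking examples. Field names and the tie-filtering threshold below are hypothetical, not the paper's exact pipeline.

from itertools import combinations

def make_ranking_pairs(context, replies, min_gap=30):
    """Build (context, better_reply, worse_reply) pairs from upvote feedback.

    `replies` is a list of (text, upvotes); `min_gap` drops near-ties whose
    ordering is probably noise.
    """
    pairs = []
    for (a_text, a_up), (b_text, b_up) in combinations(replies, 2):
        if abs(a_up - b_up) < min_gap:
            continue
        better, worse = (a_text, b_text) if a_up > b_up else (b_text, a_text)
        pairs.append((context, better, worse))
    return pairs

pairs = make_ranking_pairs(
    "What's a good beginner ML book?",
    [("Bishop's PRML", 120), ("just read twitter", 2), ("Goodfellow et al.", 95)],
)
print(len(pairs))  # 2 pairs; the 120-vs-95 comparison is dropped as a near-tie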

[2009.07243] A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation
arxiv.org

1 min read, listed 1 day ago in reddit/MachineLearning/[R] A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation

1

This work studies the widely adopted ancestral sampling algorithms for auto-regressive language models, which have not been widely studied in the literature. We use the quality-diversity (Q-D) trade-off to investigate three popular sampling algorithms (top-k, nucleus and tempered sampling). We focus on the task of open-ended language generation. We first show that the existing sampling algorithms have similar performance. After carefully inspecting the transformations defined by different sampling algorithms, we identify three key properties that are shared among them: entropy reduction, order preservation, and slope preservation. To validate the im...
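
The three transformations the abstract names can each be written as a reshaping of the next-token distribution. A minimal numpy sketch (not the paper's code):

import numpy as np

def softmax(logits):
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def top_k(logits, k):
    """Keep the k most likely tokens and renormalise; everything else gets zero mass."""
    p = softmax(logits)
    keep = np.argsort(p)[-k:]
    out = np.zeros_like(p)
    out[keep] = p[keep]
    return out / out.sum()

def nucleus(logits, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p."""
    p = softmax(logits)
    order = np.argsort(p)[::-1]
    cum = np.cumsum(p[order])
    cutoff = np.searchsorted(cum, top_p) + 1
    out = np.zeros_like(p)
    out[order[:cutoff]] = p[order[:cutoff]]
    return out / out.sum()

def tempered(logits, temperature):
    """Rescale logits by a temperature before the softmax."""
    return softmax(logits / temperature)

logits = np.array([2.0, 1.0, 0.5, -1.0, -3.0])
for dist in (top_k(logits, 2), nucleus(logits, 0.9), tempered(logits, 0.7)):
    print(np.round(dist, 3))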

[1812.08466] Fréchet Audio Distance: A Metric for Evaluating Music Enhancement Algorithms
arxiv.org

1 min read, listed 1 day ago in reddit/MachineLearning/[D] Metrics for percepted audio

We propose the Fréchet Audio Distance (FAD), a novel, reference-free evaluation metric for music enhancement algorithms. We demonstrate how typical evaluation metrics for speech enhancement and blind source separation can fail to accurately measure the perceived effect of a wide variety of distortions. As an alternative, we propose adapting the Fréchet Inception Distance (FID) metric used to evaluate generative image models to the audio domain. FAD is validated using a wide variety of artificial distortions and is compared to the signal based metrics signal to distortion ratio (SDR), cosine distance and magnitude L2 distance. We show that, wi...
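
Since FAD adapts FID, its core is the Fréchet distance between two Gaussians fitted to embedding statistics. A sketch of that distance, assuming the reference and test audio have already been embedded by some audio classifier (the surrounding pipeline is not shown here):

import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(emb_ref, emb_test):
    """Frechet distance between Gaussians fitted to two sets of embeddings.

    emb_* are (n_clips, dim) arrays; this mirrors the FID formula the abstract
    says FAD adapts to the audio domain.
    """
    mu_r, mu_t = emb_ref.mean(axis=0), emb_test.mean(axis=0)
    cov_r = np.cov(emb_ref, rowvar=False)
    cov_t = np.cov(emb_test, rowvar=False)
    covmean = sqrtm(cov_r @ cov_t)
    if np.iscomplexobj(covmean):          # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_t
    return float(diff @ diff + np.trace(cov_r + cov_t - 2.0 * covmean))

ref = np.random.randn(500, 128)
test = np.random.randn(500, 128) + 0.1    # slightly shifted "enhanced" embeddings
print(frechet_distance(ref, test))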

[2009.07118] It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
arxiv.org

1 min read, listed 2 days ago in reddit/MachineLearning/[R] It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners

76

When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance on challenging natural language understanding benchmarks. In this work, we show that performance similar to GPT-3 can be obtained with language models whose parameter count is several orders of magnitude smaller. This is achieved by converting textual inputs into cloze questions that contain some form of task description, combined with gradient-based optimization; additionally exploiting unlabeled data gives further improvements. Based on our findings, we identify several key factors required...
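
The core trick of converting an input into a cloze question scored by a masked language model can be sketched with a generic pattern and verbalizer. The model, pattern, and verbalizer words below are illustrative stand-ins, not the paper's exact setup, and no gradient-based fine-tuning is shown.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def cloze_score(text, verbalizers=("great", "terrible")):
    """Score a review by asking the MLM to fill a task-describing cloze pattern."""
    prompt = f"{text} It was {tokenizer.mask_token}."          # pattern encoding the task
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    ids = tokenizer.convert_tokens_to_ids(list(verbalizers))
    scores = logits[0, mask_pos, ids].softmax(dim=-1)
    return dict(zip(verbalizers, scores.tolist()))

print(cloze_score("The plot was thin but the acting carried the film."))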

[2009.06732] Efficient Transformers: A Survey
arxiv.org

1 min read, listed 2 days ago in reddit/MachineLearning/[R] Efficient Transformers: A Survey

39

Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning. In the field of natural language processing for example, Transformers have become an indispensable staple in the modern deep learning stack. Recently, a dizzying number of "X-former" models have been proposed - Reformer, Linformer, Performer, Longformer, to name a few - which improve upon the original Transformer architecture, many of which make improvements around computational and memory efficiency. With the aim of helping the avid researcher navigate this flurry, this p...

Efficient Transformers: A Survey – arXiv Vanity
www.arxiv-vanity.com

2 min read, listed 2 days ago in reddit/MachineLearning/[R] Efficient Transformers: A Survey

Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning. In the field of natural language processing for example, Transformers have become an indispensable staple in the modern deep learning stack. Recently, a dizzying number of “X-former” models have been proposed - Reformer, Linformer, Performer, Longformer, to name a few - which improve upon the original Transformer architecture, many of which make improvements around computational and memory efficiency. With the aim of helping the avid researcher navigate this flurry, this p...

Current Limitations of Language Models: What You Need is Retrieval – arXiv Vanity
www.arxiv-vanity.com

6 min read, listed 2 days ago in reddit/MachineLearning/[R] Current Limitations of Language Models: What You Need is Retrieval

We classify and re-examine some of the current approaches to improve the performance-computes trade-off of language models, including (1) non-causal models (such as masked language models), (2) extension of batch length with efficient attention, (3) recurrence, (4) conditional computation and (5) retrieval. We identify some limitations (1) - (4) suffer from. For example, (1) currently struggles with open-ended text generation with the output loosely constrained by the input as well as performing general textual tasks like GPT-2/3 due to its need for a specific fine-tuning dataset. (2) and (3) do not improve the prediction of the first ∼10³ to...

[2009.06857] Current Limitations of Language Models: What You Need is Retrieval
arxiv.org

1 min read, listed 2 days ago in reddit/MachineLearning/[R] Current Limitations of Language Models: What You Need is Retrieval

We classify and re-examine some of the current approaches to improve the performance-computes trade-off of language models, including (1) non-causal models (such as masked language models), (2) extension of batch length with efficient attention, (3) recurrence, (4) conditional computation and (5) retrieval. We identify some limitations (1) - (4) suffer from. For example, (1) currently struggles with open-ended text generation with the output loosely constrained by the input as well as performing general textual tasks like GPT-2/3 due to its need for a specific fine-tuning dataset. (2) and (3) do not improve the prediction of the first $\sim 1...

Google AI Blog: Improving Sparse Training with RigL
ai.googleblog.com

6 min read, listed 2 days ago in Google AI Blog

62

Posted by Utku Evci and Pablo Samuel Castro, Research Engineers, Google Research, Montreal. Modern deep neural network architectures are o...

Deep Dominance - How to Properly Compare Deep Neural Models - ACL Anthology
www.aclweb.org

1 min read, listed 2 days ago in reddit/MachineLearning/[Discussion] Statistical significance in deep learning papers?

Rotem Dror, Segev Shlomov, Roi Reichart. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019.

The Hitchhiker’s Guide to Testing Statistical Significance in Natural Language Processing - ACL Anthology
www.aclweb.org

1 min read, listed 2 days ago in reddit/MachineLearning/[Discussion] Statistical significance in deep learning papers?

Rotem Dror, Gili Baumer, Segev Shlomov, Roi Reichart. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018.

[2009.01398] It's Hard for Neural Networks To Learn the Game of Life
arxiv.org

2 min read, listed 2 days ago in reddit/MachineLearning/[R] Neural networks vs The Game of Life

5

Efforts to improve the learning abilities of neural networks have focused mostly on the role of optimization methods rather than on weight initializations. Recent findings, however, suggest that neural networks rely on lucky random initial weights of subnetworks called "lottery tickets" that converge quickly to a solution. To investigate how weight initializations affect performance, we examine small convolutional networks that are trained to predict n steps of the two-dimensional cellular automaton Conway's Game of Life, the update rules of which can be implemented efficiently in a 2n+1 layer convolutional network. We find that networks of t...
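
The claim that the update rule fits in a small convolutional network is easy to see concretely: one step of Life is a 3x3 neighbour count (a convolution) followed by a pointwise rule. A minimal numpy/scipy sketch of the ground-truth step the networks are trained to imitate:

import numpy as np
from scipy.signal import convolve2d

KERNEL = np.array([[1, 1, 1],
                   [1, 0, 1],
                   [1, 1, 1]])

def life_step(board):
    """One Game of Life update: a 3x3 convolution plus a pointwise rule."""
    neighbours = convolve2d(board, KERNEL, mode="same", boundary="wrap")
    born = (board == 0) & (neighbours == 3)
    survive = (board == 1) & ((neighbours == 2) | (neighbours == 3))
    return (born | survive).astype(int)

board = (np.random.rand(32, 32) < 0.3).astype(int)
for _ in range(5):                      # n steps <-> the 2n+1-layer network in the paper
    board = life_step(board)
print(board.sum(), "live cells after 5 steps")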

[2009.06962] Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup
arxiv.org

1 min read, listed 2 days ago in reddit/MachineLearning/[R] Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup (ICML 2020)

1

While deep neural networks achieve great performance on fitting the training distribution, the learned networks are prone to overfitting and are susceptible to adversarial attacks. In this regard, a number of mixup based augmentation methods have been recently proposed. However, these approaches mainly focus on creating previously unseen virtual examples and can sometimes provide misleading supervisory signal to the network. To this end, we propose Puzzle Mix, a mixup method for explicitly utilizing the saliency information and the underlying statistics of the natural examples. This leads to an interesting optimization problem alternating bet...
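
For context, the plain mixup baseline these saliency-aware methods extend is just a convex combination of two examples and their labels; a minimal sketch follows (Puzzle Mix itself additionally optimizes saliency-based masks and transport, which is not shown here).

import numpy as np

def mixup(x1, y1, x2, y2, alpha=1.0, rng=np.random.default_rng()):
    """Vanilla mixup: convex-combine two inputs and their one-hot labels."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

x1, x2 = np.random.rand(32, 32, 3), np.random.rand(32, 32, 3)
y1, y2 = np.eye(10)[3], np.eye(10)[7]            # one-hot labels for classes 3 and 7
x_mix, y_mix = mixup(x1, y1, x2, y2)
print(y_mix.round(2))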

[1909.03104] Efficient Sentence Embedding using Discrete Cosine Transform
arxiv.org

1 min read, listed 2 days ago in reddit/LanguageTechnology/Pooling word embeddings to create sentence embeddings?

Vector averaging remains one of the most popular sentence embedding methods in spite of its obvious disregard for syntactic structure. While more complex sequential or convolutional networks potentially yield superior classification performance, the improvements in classification accuracy are typically mediocre compared to the simple vector averaging. As an efficient alternative, we propose the use of discrete cosine transform (DCT) to compress word sequences in an order-preserving manner. The lower order DCT coefficients represent the overall feature patterns in sentences, which results in suitable embeddings for tasks that could benefit fro...
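
The proposal can be sketched directly: stack the word vectors, take the DCT along the sequence axis, and keep only the first few coefficient rows instead of the plain average. A minimal sketch with random vectors standing in for pretrained embeddings:

import numpy as np
from scipy.fft import dct

def dct_sentence_embedding(word_vectors, c=4):
    """Compress a (seq_len, dim) matrix of word vectors into c DCT coefficient rows.

    Keeping only the low-order coefficients preserves coarse word-order patterns,
    unlike plain averaging (which corresponds to the 0-th DCT coefficient up to scale).
    """
    coeffs = dct(word_vectors, axis=0, norm="ortho")   # DCT along the sequence axis
    kept = coeffs[:c]                                  # pad if the sentence is shorter than c
    if kept.shape[0] < c:
        kept = np.vstack([kept, np.zeros((c - kept.shape[0], word_vectors.shape[1]))])
    return kept.reshape(-1)                            # fixed-size (c * dim,) embedding

words = np.random.randn(12, 300)                       # 12 tokens, 300-d vectors (stand-ins)
print(dct_sentence_embedding(words).shape)             # (1200,)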
