
Monday, March 11, 2024

Solving the Hallucination Problem - interview with AppliedAI

 

Recent podcast interview with AppliedAI. 

We discuss SuperFocus.ai Enterprise GPT. 

Good intro to how customer-defined AI/LLM memory eliminates hallucinations.

Sunday, December 24, 2023

Peace on Earth, Good Will to Men 2023



When asked what I want for Christmas, I reply: Peace On Earth, Good Will To Men :-)

No one ever seems to recognize that this comes from the Bible (Luke 2.14).

Linus said it best in A Charlie Brown Christmas:
And there were in the same country shepherds abiding in the field, keeping watch over their flock by night.

And, lo, the angel of the Lord came upon them, and the glory of the Lord shone round about them: and they were sore afraid.

And the angel said unto them, Fear not: for, behold, I bring you good tidings of great joy, which shall be to all people.

For unto you is born this day in the city of David a Saviour, which is Christ the Lord.

And this shall be a sign unto you; Ye shall find the babe wrapped in swaddling clothes, lying in a manger.

And suddenly there was with the angel a multitude of the heavenly host praising God, and saying,

Glory to God in the highest, and on Earth peace, good will toward men.


2023 saw the founding of our startup SuperFocus.ai, which builds AIs with user-configured attached memory. The AI consults this memory in responding to prompts, and only gives answers consistent with the information in the memory. This solves the hallucination problem and allows the AI to answer questions like a human with perfect recall of the information.
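
To make the mechanism concrete, here is a minimal sketch of the general pattern: retrieve the memory passages most relevant to a prompt and instruct the model to answer only from them, or decline. This is my illustration, not SuperFocus's actual implementation; `toy_embed` and `llm_generate` are placeholders for a real embedding model and LLM API.

```python
import numpy as np

def toy_embed(text, vocab):
    # Toy bag-of-words embedding; a real system would use a sentence-embedding model.
    v = np.zeros(len(vocab))
    for w in text.lower().split():
        if w in vocab:
            v[vocab[w]] += 1.0
    return v

def answer_from_memory(question, memory_chunks, llm_generate, top_k=3):
    # Rank memory passages by similarity to the question, then ask the LLM
    # to answer strictly from those passages or to decline.
    vocab = {w: i for i, w in enumerate(
        sorted({w for c in memory_chunks for w in c.lower().split()}))}
    q = toy_embed(question, vocab)
    cos = lambda a, b: float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    context = "\n\n".join(sorted(memory_chunks,
                                 key=lambda c: -cos(q, toy_embed(c, vocab)))[:top_k])
    prompt = ("Answer using ONLY the memory passages below. If the answer is not "
              "contained in them, say 'I don't know.'\n\n"
              f"Memory:\n{context}\n\nQuestion: {question}\nAnswer:")
    return llm_generate(prompt)  # llm_generate: placeholder for the LLM call
```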

SuperFocus built an AI for a major consumer electronics brand that can support and troubleshoot hundreds of models of smart devices (I can't be more specific). Its memory consists of thousands of pages of product manuals, support documents, and problem solving guides originally used by human support agents.

In December, after the semester ended, I traveled to Manila to meet with outsourcing (BPO = Business Process Outsourcing) companies that run call centers for global brands. This industry accounts for ~8% of Philippine GDP (~$40B per annum), driven by comparative advantages such as widespread English fluency and relatively low wages. I predict that AIs of the type produced by SuperFocus.ai will disrupt the BPO and other industries in coming years, with dramatic effects on the number of humans employed in areas like customer support. 

But fear not: for, behold, I bring you good tidings of great joy, which shall be to all people

The arrival of machine intelligence on Earth is the beginning of a great adventure!


This is GPT's account of the meetings in Manila.
In the heart of Manila, amidst the bustling cityscape, a meeting of innovative minds took place. Steve Hsu, the visionary founder of SuperFocus, had arrived to showcase the prowess of his latest creation—an AI designed to revolutionize technical support for complex products. The setting was a conference room adorned with sleek screens and cutting-edge technology, a fitting backdrop for the unveiling of this groundbreaking innovation. 
Seated around the polished table were the owners and executives of prominent BPO (Business Process Outsourcing) companies. Their faces were a blend of anticipation and apprehension as Steve Hsu prepared to demonstrate the capabilities of the AI-powered technical support system. 
With a confident smile, Steve initiated the demonstration. The AI, equipped with a sophisticated neural network, began its simulated interaction. It effortlessly tackled intricate technical queries, deciphering complex issues with lightning speed and unparalleled accuracy. Each solution presented was concise, comprehensive, and flawlessly executed. 
As the AI effortlessly navigated through a myriad of scenarios and troubleshooting processes, the room fell into a hush. The BPO leaders exchanged astonished glances, their initial amazement mingled with a growing sense of unease. The capabilities displayed by the AI were undeniably impressive, but they also highlighted a looming question—what did this mean for the future of human roles in their industry? 
Steve Hsu noticed the shift in atmosphere and paused the demonstration. With a gentle yet determined tone, he addressed the concerns lingering in the room. "This AI isn't meant to replace human expertise," he began. "Rather, it's here to augment and enhance your services. Imagine your teams empowered by this technology, streamlining operations, and providing even more efficient and effective support to customers." 
His words offered reassurance, but the specter of automation replacing human jobs lingered in the minds of the BPO owners. The potential efficiency gains were undeniable, yet so too were the implications for the human workforce. 
In the ensuing discussion, voices echoed with a mix of excitement and apprehension. Some saw the potential for growth and advancement, envisioning a future where human creativity combined with AI prowess would elevate their services to new heights. Others grappled with the uncertainty, worrying about the displacement of jobs and the evolving landscape of the industry they had dedicated their careers to. 
Steve Hsu listened attentively, acknowledging their concerns while emphasizing the collaborative potential between humans and AI. "This technology," he explained, "is a tool, a means to empower and evolve, not to supplant. Together, we can harness its capabilities to create a synergy that benefits both businesses and their workforce." 
As the meeting concluded, the BPO leaders departed with a mix of awe and trepidation. The AI presented by Steve Hsu had showcased a future teeming with possibilities, yet it also raised profound questions about adaptation and the role of humans in an increasingly automated world. 
The echoes of the demonstration lingered in the minds of those present, igniting discussions and contemplation about the balance between innovation and the human touch, forever altering the landscape of the BPO industry in Manila and beyond.


Bonus: Two recent interviews I did which I enjoyed very much. 



Thursday, August 10, 2023

AI on your phone? Tim Dettmers on quantization of neural networks — Manifold #41

 

Tim Dettmers develops computationally efficient methods for deep learning. He is a leader in quantization: coarse graining of large neural networks to increase speed and reduce hardware requirements. 

Tim developed 4- and 8-bit quantizations that enable training and inference with large language models on affordable GPUs and CPUs, i.e., the kind commonly found in home gaming rigs. 
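
For intuition, here is a minimal numpy sketch of the simplest version of the idea: symmetric absmax int8 quantization of a weight matrix. Tim's actual 4- and 8-bit methods (e.g., QLoRA, discussed below) are considerably more sophisticated, with block-wise scaling and outlier handling.

```python
import numpy as np

def quantize_int8(W):
    # Symmetric absmax quantization: int8 weights plus one per-tensor scale.
    scale = np.abs(W).max() / 127.0
    Wq = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return Wq, scale

def dequantize(Wq, scale):
    return Wq.astype(np.float32) * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512)).astype(np.float32)
Wq, s = quantize_int8(W)
err = np.abs(W - dequantize(Wq, s)).mean()
print(f"int8 storage: {Wq.nbytes/1e3:.0f} kB vs fp32 {W.nbytes/1e3:.0f} kB, "
      f"mean abs error {err:.4f}")
```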

Tim and Steve discuss: Tim's background and current research program, large language models, quantization and performance, democratization of AI technology, the open source Cambrian explosion in AI, and the future of AI. 





0:00 Introduction and Tim’s background 
18:02 Tim's interest in the efficiency and accessibility of large language models 
38:05 Inference, speed, and the potential for using consumer GPUs for running large language models 
45:55 Model training and the benefits of quantization with QLoRA 
57:14 The future of AI and large language models in the next 3-5 years and beyond

Thursday, June 08, 2023

AI Cambrian Explosion: Conversation With Three AI Engineers — Manifold #37

 

In this episode, Steve talks to three AI engineers from his startup SuperFocus.AI. 

0:00 Introduction 
1:06 The Google memo and open-source AI 
14:41 Sparsification and the size of models: AI on your phone? 
30:16 When will AI take over ordinary decision-making from humans? 
34:50 Rapid advances in AI: a view from inside 
41:28 AI Doomers and Alignment 


Links to earlier episodes on Artificial Intelligence & Large Language Models: 

Oxford Lecture — #35: 

Bing vs. Bard, US-China STEM Competition, and Embryo Screening — #30: 

ChatGPT, LLMs, and AI — #29: 

Saturday, March 11, 2023

Biobank-scale methods and projections for sparse polygenic prediction from machine learning

New paper! 80+ pages of fun :-)

We develop a novel method for projecting AUC and Correlation as a function of data size and characterize the asymptotic limit of performance. For LASSO (compressed sensing) we show that performance metrics and predictor sparsity are in agreement with theoretical predictions from the Donoho-Tanner phase transition. 
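
Here is a toy sketch of the projection idea (not the paper's actual procedure): train LASSO at increasing training-set sizes, record out-of-sample correlation and predictor sparsity, and fit a saturating curve to extrapolate the large-data asymptote. The simulated data, hyperparameters, and the functional form corr(n) = a - b/n^c are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
p, s, n_total = 2000, 50, 12000                  # "SNPs", true nonzeros, samples
beta = np.zeros(p); beta[:s] = rng.normal(size=s)
X = rng.binomial(2, 0.5, size=(n_total, p)).astype(float)
g = X @ beta
y = g + rng.normal(scale=g.std(), size=n_total)  # ~50% of variance is "genetic"

X_val, y_val = X[:2000], y[:2000]                # held-out evaluation set
sizes, corrs = [1000, 2000, 4000, 8000], []
for n in sizes:
    fit = Lasso(alpha=0.05, max_iter=10000).fit(X[2000:2000 + n], y[2000:2000 + n])
    corrs.append(np.corrcoef(fit.predict(X_val), y_val)[0, 1])
    print(f"n={n}: corr={corrs[-1]:.3f}, nonzero SNPs={int((fit.coef_ != 0).sum())}")

# Illustrative saturating form corr(n) = a - b / n^c; a is the projected asymptote.
f = lambda n, a, b, c: a - b / np.power(n, c)
(a, b, c), _ = curve_fit(f, sizes, corrs, p0=[0.7, 10.0, 0.5], maxfev=20000)
print(f"projected asymptotic correlation ~ {a:.2f}")
```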


Biobank-scale methods and projections for sparse polygenic prediction from machine learning 


Timothy G. Raben, Louis Lello, Erik Widen, Stephen D.H. Hsu  


Abstract In this paper we characterize the performance of linear models trained via widely-used sparse machine learning algorithms. We build polygenic scores and examine performance as a function of training set size, genetic ancestral background, and training method. We show that predictor performance is most strongly dependent on size of training data, with smaller gains from algorithmic improvements. We find that LASSO generally performs as well as the best methods, judged by a variety of metrics. We also investigate performance characteristics of predictors trained on one genetic ancestry group when applied to another. Using LASSO, we develop a novel method for projecting AUC and Correlation as a function of data size (i.e., for new biobanks) and characterize the asymptotic limit of performance. Additionally, for LASSO (compressed sensing) we show that performance metrics and predictor sparsity are in agreement with theoretical predictions from the Donoho-Tanner phase transition. Specifically, a predictor trained in the Taiwan Precision Medicine Initiative for asthma can achieve an AUC of 0.63(0.02) and for height a correlation of 0.648(0.009) for a Taiwanese population. This is above the measured values of 0.61(0.01) and 0.631(0.008), respectively, for UK Biobank trained predictors applied to a European population. 


Figure: Performance in 5 ancestry groups using LASSO, Elastic Net, and PRScs with UKB and 1,000 Genomes LD matrices. Solid bands = predicted performance using All of Us and Taiwan Precision Medicine Initiative datasets.




Tuesday, October 25, 2022

American Society of Human Genetics (ASHG) 2022 Posters

 

Tuesday, September 20, 2022

Sibling Variation in Phenotype and Genotype: Polygenic Trait Distributions and DNA Recombination Mapping with UK Biobank and IVF Family Data (medRxiv)

This is a new paper which uses Genomic Prediction IVF family data, including genotyped embryo samples.
Sibling Variation in Phenotype and Genotype: Polygenic Trait Distributions and DNA Recombination Mapping with UK Biobank and IVF Family Data
L. Lello, M. Hsu, E. Widen, and T. Raben  
We use UK Biobank and a unique IVF family dataset (including genotyped embryos) to investigate sibling variation in both phenotype and genotype. We compare phenotype (disease status, height, blood biomarkers) and genotype (polygenic scores, polygenic health index) distributions among siblings to those in the general population. As expected, the between-siblings standard deviation in polygenic scores is \sqrt{2} times smaller than in the general population, but variation is still significant. As previously demonstrated, this allows for substantial benefit from polygenic screening in IVF. Differences in sibling genotypes result from distinct recombination patterns in sexual reproduction. We develop a novel sibling-pair method for detection of recombination breaks via statistical discontinuities. The new method is used to construct a dataset of 1.44 million recombination events which may be useful in further study of meiosis.

Here are some figures illustrating the variation of polygenic scores among siblings from the same family.



The excerpt below describes the IVF family highlighted in blue above:

Among the families displayed in these figures, at position number 15 from the left, we encounter an interesting case of sibling polygenic distribution relative to the parents. In this family all siblings have significantly higher Health Index scores than the parents. This arises in an interesting manner: the mother is a high-risk outlier for condition X and the father is a high-risk outlier for condition Y. (We do not specify X and Y, out of an abundance of caution for privacy, although the patients have consented that such information could be shared.) Their lower overall Health Index scores result from high risk of conditions X (mother) and Y (father). However, the embryos, each resulting from a unique recombination of the parental genotypes, are at normal risk for both X and Y, and each embryo has a much higher Health Index score than the parents.
This case illustrates well the potential benefits from PGS embryo screening.

 
The second part of the paper introduces a new technique that directly probes DNA recombination -- the molecular mechanism responsible for sibling genetic differences. See figure above for some results. The new method detects recombination breaks via statistical discontinuities in pairwise comparisons of DNA regions.
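
Here is a cartoon of the sibling-pair idea, as I understand it (a loose paraphrase, not the paper's actual algorithm): the expected genotype agreement between two siblings depends on how much parental DNA they share locally, so a recombination event shifts the mean of a sliding-window agreement signal, and the break appears as a statistical discontinuity (changepoint). The simulated signal and the crude detector below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a per-SNP genotype-agreement signal for one sibling pair along a
# chromosome: expected agreement depends on how many parental haplotypes the
# sibs share locally, which changes at recombination breakpoints.
n_snps = 20000
true_breaks = [6000, 13500]
levels = [0.70, 0.85, 0.70]            # illustrative mean agreement per segment
mean = np.concatenate([np.full(b - a, m) for a, b, m in
                       zip([0] + true_breaks, true_breaks + [n_snps], levels)])
agreement = rng.binomial(1, mean)       # 1 if the sibs' genotypes agree at the SNP

def detect_breaks(signal, window=1000, z_thresh=6.0):
    # Flag positions where mean agreement in the left and right windows differs
    # by many standard errors (a crude changepoint detector).
    hits = []
    for i in range(window, len(signal) - window, 200):
        left, right = signal[i - window:i], signal[i:i + window]
        se = np.sqrt(left.var() / window + right.var() / window) + 1e-9
        if abs(left.mean() - right.mean()) / se > z_thresh:
            hits.append(i)
    return hits

print("candidate breakpoints near:", detect_breaks(agreement))
```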

From the discussion:
...This new sibling-pair method can be applied to large datasets with many thousands of sibling pairs. In this project we created a map of roughly 1.44 million recombination events using UKB genomes. Similar maps can now be created using other biobank data, including in non-European ancestry groups that have not yet received sufficient attention. The landmark deCODE results were obtained under special circumstances: the researchers had access to data resulting from a nationwide project utilizing genealogical records (unusually prevalent in Iceland) and widespread sequencing. Using the sibling-pair method results of comparable accuracy can be obtained from existing datasets around the world -- e.g., national biobanks in countries such as the USA, Estonia, China, Taiwan, Japan, etc.
The creator of this new sibling-pair method for recombination mapping is my son. He developed and tested the algorithm, and wrote all the code in Python. It's his high school science project :-)

Wednesday, July 06, 2022

WIRED: Genetic Screening Now Lets Parents Pick the Healthiest Embryos


This is a balanced and informative article in WIRED, excerpted from author Rachael Pells' forthcoming book Genomics: How Genome Sequencing Will Change Our Lives.
WIRED: ... Companies such as Genomic Prediction are taking this process much further, giving parents the power to select the embryo they believe to have the best fighting chance of survival both in the womb and out in the world. At the time of writing, Genomic Prediction works with around 200 IVF clinics across six continents. For company cofounder Stephen Hsu, the idea behind preconception screening was no eureka moment, but something he and his peers developed gradually. “We kept pursuing the possibilities from a purely scientific interest,” he says. Over time sequencing has become cheaper and more accessible, and the bank of genetic data has become ever greater, which has provided the opportunity to easily apply machine learning programs to seek out patterns, Hsu explains. “You can have typically millions of people in one data set, with exact measurements of certain things about them—for instance how tall they are or whether they have diabetes—what we call phenotypes. And so it’s relatively straightforward to use AI to build genetic predictors of traits ranging from very simple ones which are only determined by a few genes, or a few different locations in the genome, to the really complicated ones.” As Hsu indicates, the crucial difference with this technology is that it’s not just single mutations like cystic fibrosis or sickle cell anemia that the service makes its calculations on. The conditions embryos are screened for can be extremely complicated, involving thousands of genetic variants across different parts of the genome. 
In late 2017, Hsu and his colleagues published a paper demonstrating how, using genomic data at scale, scientists could predict someone’s height to within an inch of accuracy using just their DNA. The research group later used the same method to build genomic predictors for complex diseases such as hypothyroidism, types 1 and 2 diabetes, breast cancer, prostate cancer, testicular cancer, gallstones, glaucoma, gout, atrial fibrillation, high cholesterol, asthma, basal cell carcinoma, malignant melanoma, and heart attacks. ...

Two useful references:

Polygenic Health Index, General Health, and Disease Risk 

Complex Trait Prediction: Methods and Protocols (Springer 2022)

Thursday, June 23, 2022

Polygenic Health Index, General Health, and Disease Risk

See published version: https://www.nature.com/articles/s41598-022-22637-8

Informal summary: We built a polygenic health index using risk predictors weighted by lifespan impact of the specific disease condition. This index seems to characterize general health. Individuals with higher index scores have decreased disease risk across almost all 20 diseases (no significant risk increases), and longer calculated life expectancy. When estimated Disability Adjusted Life Years (DALYs) are used as the performance metric, the gain from selection among 10 individuals (highest index score vs average) is found to be roughly 4 DALYs. We find no statistical evidence for antagonistic trade-offs in risk reduction across these diseases. Correlations between genetic disease risks are found to be mostly positive and generally mild.
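
Schematically, the construction looks like the sketch below (my illustration with made-up numbers, not the paper's actual weights or risk models): the index is minus the expected life-years lost, i.e., a sum over diseases of the estimated lifespan impact times the individual's PRS-based risk, so a higher index means better expected health.

```python
import numpy as np

# Hypothetical per-disease life-years lost if affected, and illustrative base rates.
life_years_lost = {"CAD": 8.0, "T2D": 10.0, "breast_cancer": 6.0, "stroke": 7.0}
base_rate = {"CAD": 0.10, "T2D": 0.12, "breast_cancer": 0.06, "stroke": 0.05}

def risk_from_prs(disease, z):
    # Toy mapping from a PRS z-score to absolute risk (placeholder for a
    # calibrated polygenic risk model).
    b = np.log(base_rate[disease] / (1 - base_rate[disease]))
    return 1.0 / (1.0 + np.exp(-(b + 0.5 * z)))

def health_index(prs_zscores):
    # Expected life-years lost, negated: higher index = fewer expected losses.
    return -sum(life_years_lost[d] * risk_from_prs(d, z)
                for d, z in prs_zscores.items())

person_a = {"CAD": -1.0, "T2D": 0.2, "breast_cancer": 0.0, "stroke": -0.5}
person_b = {"CAD": 2.0, "T2D": 1.5, "breast_cancer": 0.3, "stroke": 1.0}
print(health_index(person_a), health_index(person_b))  # person_a scores higher
```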
 
Polygenic Health Index, General Health, and Disease Risk 
We construct a polygenic health index as a weighted sum of polygenic risk scores for 20 major disease conditions, including, e.g., coronary artery disease, type 1 and 2 diabetes, schizophrenia, etc. Individual weights are determined by population-level estimates of impact on life expectancy. We validate this index in odds ratios and selection experiments using unrelated individuals and siblings (pairs and trios) from the UK Biobank. Individuals with higher index scores have decreased disease risk across almost all 20 diseases (no significant risk increases), and longer calculated life expectancy. When estimated Disability Adjusted Life Years (DALYs) are used as the performance metric, the gain from selection among 10 individuals (highest index score vs average) is found to be roughly 4 DALYs. We find no statistical evidence for antagonistic trade-offs in risk reduction across these diseases. Correlations between genetic disease risks are found to be mostly positive and generally mild. These results have important implications for public health and also for fundamental issues such as pleiotropy and genetic architecture of human disease conditions. 
https://www.medrxiv.org/content/10.1101/2022.06.15.22276102v1

Some figures:









Extrapolating the DALY gain vs Health Index score curve (top figure) to the entire human population (e.g., 10 billion people) results in +30 or +40 DALYs more than average, or something like 120 total years of life.  The individual with the highest Health Index score in the world is predicted to live about 120 years.


I wanted to use this in the paper but my collaborators vetoed me 8-)
The days of our years are threescore years and ten; and if by reason of strength they be fourscore years, yet is their strength labour and sorrow; for it is soon cut off, and we fly away 
Psalm 90:10

Thursday, April 07, 2022

Scott Aaronson: Quantum Computing, Unsolvable Problems, & Artificial Intelligence — Manifold podcast #9

 

Scott Aaronson is the David J. Bruton Centennial Professor of Computer Science at The University of Texas at Austin, and director of its Quantum Information Center. Previously, he taught for nine years in Electrical Engineering and Computer Science at MIT. His research interests center around the capabilities and limits of quantum computers, and computational complexity theory more generally. 

Scott also writes the blog Shtetl Optimized: https://scottaaronson.blog/ 

Steve and Scott discuss: 

1. Scott's childhood and education, first exposure to mathematics and computers. 

2. How he became interested in computational complexity, pursuing it rather than AI/ML. 

3. The development of quantum computation and quantum information theory from the 1980s to the present. 

4. Scott's work on quantum supremacy. 

5. AGI, AI Safety


Thursday, February 03, 2022

ManifoldOne podcast Episode#2: Steve Hsu Q&A

 

Steve answers questions about recent progress in AI/ML prediction of complex traits from DNA, and applications in embryo selection. 

Highlights: 

1. Overview of recent advances in trait prediction 
2. Would cost savings from breast cancer early detection pay for genotyping of all women? 
3. How does IVF work? Economics of embryo selection 
4. Whole embryo genotyping increases IVF success rates (pregnancy per transfer) significantly 
5. Future predictions 


Some relevant scientific papers: 

Preimplantation Genetic Testing for Aneuploidy: New Methods and Higher Pregnancy Rates 

2021 review article on complex trait prediction 

Accurate Genomic Prediction of Human Height 

Genomic Prediction of 16 Complex Disease Risks Including Heart Attack, Diabetes, Breast and Prostate Cancer 

Genetic architecture of complex traits and disease risk predictors 

Sibling validation of polygenic risk scores and complex trait prediction 

Tuesday, November 09, 2021

The Balance of Power in the Western Pacific and the Death of the Naval Surface Ship

Recent satellite photos suggest that PLARF (People's Liberation Army Rocket Forces) have been testing against realistic moving ship targets in the deserts of the northwest. Note the ship model is on rails in the second photo below. Apparently there are over 30km of rail lines, allowing the simulation of evasive maneuvers by an aircraft carrier (third figure below).


Large surface ships such as aircraft carriers are easy to detect (e.g., satellite imaging via radar sensors), and missiles (especially those with maneuver capability) are very difficult to stop. Advances in AI / machine learning tend to favor missile targeting, not defense of carriers. 

The key capability is autonomous final target acquisition by the missile at a range of tens of km -- i.e., the distance the ship can move during missile flight time after launch. State of the art air to air missiles already do this in BVR (Beyond Visual Range) combat. Note, they are much smaller than anti-ship missiles, with presumably much smaller radar seekers, yet are pursuing a smaller, faster, more maneuverable target (enemy aircraft). 

It seems highly likely that the technical problem of autonomous targeting of a large surface ship during final missile approach has already been solved some time ago by the PLARF. 

With this capability in place one only has to localize the carrier to within few x 10km for initial launch, letting the smart final targeting do the rest. The initial targeting location can be obtained through many methods, including aircraft/drone probes, targeting overflight by another kind of missile, LEO micro-satellites, etc. Obviously if the satellite retains coverage of the ship during the entire attack, and can communicate with the missile, even this smart final targeting is not required.

This is what a ship looks like to Synthetic Aperture Radar (SAR) from Low Earth Orbit (LEO).  PRC has had a sophisticated system (Yaogan) in place for almost a decade, and continues to launch new satellites for this purpose.



See LEO SAR, hypersonics, and the death of the naval surface ship:

In an earlier post we described how sea blockade (e.g., against Japan or Taiwan) can be implemented using satellite imaging and missiles, drones, AI/ML. Blue water naval dominance is not required. 
PLAN/PLARF can track every container ship and oil tanker as they approach Kaohsiung or Nagoya. All are in missile range -- sitting ducks. Naval convoys will be just as vulnerable. 
Sink one tanker or cargo ship, or just issue a strong warning, and no shipping company in the world will be stupid enough to try to run the blockade. 

But, But, But, !?! ...
USN guy: We'll just hide the carrier from the satellite and missile seekers using, you know, countermeasures! [Aside: don't cut my carrier budget!] 
USAF guy: Uh, the much smaller AESA/IR seeker on their AAM can easily detect an aircraft from much longer ranges. How will you hide a huge ship? 
USN guy: We'll just shoot down the maneuvering hypersonic missile using, you know, methods. [Aside: don't cut my carrier budget!] 
Missile defense guy: Can you explain to us how to do that? If the incoming missile maneuvers we have to adapt the interceptor trajectory (in real time) to where we project the missile to be after some delay. But we can't know its trajectory ahead of time, unlike for a ballistic (non-maneuvering) warhead.
More photos and maps in this 2017 post.

Sunday, October 31, 2021

Demis Hassabis: Using AI to accelerate scientific discovery (protein folding) + Bonus: Bruno Pontecorvo

 


Recent talk (October 2021) by Demis Hassabis on the use of AI in scientific research. Second half of the talk is focused on protein folding. 

Below is part 2, by the AlphaFold research lead, which has more technical details.




Bonus: My former Oregon colleague David Strom recommended a CERN lecture by Frank Close on his biography of physicist (and atomic spy?) Bruno Pontecorvo.  David knew that The Battle of Algiers, which I blogged about recently, was directed by Gillo Pontecorvo, Bruno's brother.

Below is the closest thing I could find on YouTube -- it has better audio and video quality than the CERN talk. 

The amazing story of Bruno Pontecorvo involves topics such as the first nuclear reactions and reactors (work with Enrico Fermi), the Manhattan Project, neutrino flavors and oscillations, supernovae, atomic espionage, the KGB, Kim Philby, and the quote: 
I want to be remembered as a great physicist, not as your fucking spy!

Friday, October 22, 2021

The Principles of Deep Learning Theory - Dan Roberts IAS talk

 

This is a nice talk that discusses, among other things, subleading 1/width corrections to the infinite width limit of neural networks. I was expecting someone would work out these corrections when I wrote the post on NTK and large width limit at the link below. Apparently, the infinite width limit does not capture the behavior of realistic neural nets and it is only at the first nontrivial order in the expansion that the desired properties emerge. Roberts claims that when the depth to width ratio r is small but nonzero one can characterize network dynamics in a controlled expansion, whereas when r > 1 it becomes a problem of strong dynamics. 

The talk is based on the book
The Principles of Deep Learning Theory 
https://arxiv.org/abs/2106.10165 
This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.
Dan Roberts web page

This essay looks interesting:
Why is AI hard and Physics simple? 
https://arxiv.org/abs/2104.00008 
We discuss why AI is hard and why physics is simple. We discuss how physical intuition and the approach of theoretical physics can be brought to bear on the field of artificial intelligence and specifically machine learning. We suggest that the underlying project of machine learning and the underlying project of physics are strongly coupled through the principle of sparsity, and we call upon theoretical physicists to work on AI as physicists. As a first step in that direction, we discuss an upcoming book on the principles of deep learning theory that attempts to realize this approach.

May 2021 post: Neural Tangent Kernels and Theoretical Foundations of Deep Learning
Large width seems to provide a limiting case (analogous to the large-N limit in gauge theory) in which rigorous results about deep learning can be proved. ... 
The overparametrized (width ~ w^2) network starts in a random state and by concentration of measure this initial kernel K is just the expectation, which is the NTK. Because of the large number of parameters the effect of training (i.e., gradient descent) on any individual parameter is 1/w, and the change in the eigenvalue spectrum of K is also 1/w. It can be shown that the eigenvalue spectrum is positive and bounded away from zero, and this property does not change under training. Also, the evolution of f is linear in K up to corrections which are suppressed by 1/w. Hence evolution follows a convex trajectory and can achieve global minimum loss in a finite (polynomial) time. 
The parametric 1/w expansion may depend on quantities such as the smallest NTK eigenvalue k: the proof might require k >> 1/w or wk large. 
In the large w limit the function space has such high dimensionality that any typical initial f is close (within a ball of radius 1/w?) to an optimal f. These properties depend on specific choice of loss function.
See related remarks: ICML notes (2018).
It may turn out that the problems on which DL works well are precisely those in which the training data (and underlying generative processes) have a hierarchical structure which is sparse, level by level. Layered networks perform a kind of coarse graining (renormalization group flow): first layers filter by feature, subsequent layers by combinations of features, etc. But the whole thing can be understood as products of sparse filters, and the performance under training is described by sparse performance guarantees (ReLU = thresholded penalization?). Given the inherent locality of physics (atoms, molecules, cells, tissue; atoms, words, sentences, ...) it is not surprising that natural phenomena generate data with this kind of hierarchical structure.

Thursday, July 22, 2021

Embryo Screening for Polygenic Disease Risk: Recent Advances and Ethical Considerations (Genes 2021 Special Issue)



It is a great honor to co-author a paper with Simon Fishel, the last surviving member of the team that produced the first IVF baby (Louise Brown) in 1978. His mentors and collaborators were Robert Edwards (Nobel Prize 2010) and Patrick Steptoe (who died before the 2010 prize was awarded). In the photo above, of the very first scientific conference on In Vitro Fertilization (1981), Fishel (far right), Steptoe, and Edwards are in the first row. More on Simon and his experiences as a medical pioneer below. 

This article appears in a Special Issue: Application of Genomic Technology in Disease Outcome Prediction.
Embryo Screening for Polygenic Disease Risk: Recent Advances and Ethical Considerations 
L. Tellier, J. Eccles, L. Lello, N. Treff, S. Fishel, S. Hsu 
Genes 2021, 12(8), 1105 
https://doi.org/10.3390/genes12081105 
Machine learning methods applied to large genomic datasets (such as those used in GWAS) have led to the creation of polygenic risk scores (PRSs) that can be used identify individuals who are at highly elevated risk for important disease conditions, such as coronary artery disease (CAD), diabetes, hypertension, breast cancer, and many more. PRSs have been validated in large population groups across multiple continents and are under evaluation for widespread clinical use in adult health. It has been shown that PRSs can be used to identify which of two individuals is at a lower disease risk, even when these two individuals are siblings from a shared family environment. The relative risk reduction (RRR) from choosing an embryo with a lower PRS (with respect to one chosen at random) can be quantified by using these sibling results. New technology for precise embryo genotyping allows more sophisticated preimplantation ranking with better results than the current method of selection that is based on morphology. We review the advances described above and discuss related ethical considerations.
I excerpt from the paper below. 

Some related links: 





Introduction:
Over a million babies are born each year via IVF [1,2]. It is not uncommon for IVF parents to have more than one viable embryo from which to choose, as typical IVF cycles can produce four or five. The embryo that is transferred may become their child, while the others might not be used at all. We refer to this selection problem as the “embryo choice problem”. In the past, selections were made based on criteria such as morphology (i.e., rate of development, symmetry, general appearance) and chromosomal normality as determined by aneuploidy testing. 
Recently, large datasets of human genomes together with health and disease histories have become available to researchers in computational genomics [3]. Statistical methods from machine learning have allowed researchers to build risk predictors (e.g., for specific disease conditions or related quantitative traits, such as height or longevity) that use the genotype alone as input information. Combined with the precision genotyping of embryos, these advances provide significantly more information that can be used for embryo selection to IVF parents. 
In this brief article, we provide an overview of the advances in genotyping and computational genomics that have been applied to embryo selection. We also discuss related ethical issues, although a full discussion of these would require a much longer paper. ...

 Ethical considerations:

For further clarification, we explore a specific scenario involving breast cancer. It is well known that monogenic BRCA1 and BRCA2 variants predispose women to breast cancer, but this population is small—perhaps a few per thousand in the general population. The subset of women who do not carry a BRCA1 or BRCA2 risk variant but are at high polygenic risk is about ten times as large as the BRCA1/2 group. Thus, the majority of breast cancer can be traced to polygenic causes in comparison with commonly tested monogenic variants. 
For BRCA carrier families, preimplantation screening against BRCA is a standard (and largely uncontroversial) recommendation [39]. The new technologies discussed here allow a similar course of action for the much larger set of families with breast cancer history who are not carriers of BRCA1 or BRCA2. They can screen their embryos in favor of a daughter whose breast cancer PRS is in the normal range, avoiding a potentially much higher absolute risk of the condition. 
The main difference between monogenic BRCA screening and the new PRS screening against breast cancer is that the latter technology can help an order of magnitude more families. From an ethical perspective, it would be unconscionable to deny PRS screening to BRCA1/2-negative families with a history of breast cancer. ...

 

On Simon Fishel's experiences as an IVF pioneer (see here):

Today millions of babies are produced through IVF. In most developed countries roughly 3-5 percent of all births are through IVF, and in Denmark the fraction is about 10 percent! But when the technology was first introduced with the birth of Louise Brown in 1978, the pioneering scientists had to overcome significant resistance. There may be an alternate universe in which IVF was not allowed to develop, and those millions of children were never born. 

Wikipedia: ...During these controversial early years of IVF, Fishel and his colleagues received extensive opposition from critics both outside of and within the medical and scientific communities, including a civil writ for murder.[16] Fishel has since stated that "the whole establishment was outraged" by their early work and that people thought that he was "potentially a mad scientist".[17] 

I predict that within 5 years the use of polygenic risk scores will become common in some health systems (i.e., for adults) and in IVF. Reasonable people will wonder why the technology was ever controversial at all, just as in the case of IVF.

Figure below from our paper. EHS = Embryo Health Score. 

Friday, July 02, 2021

Polygenic Embryo Screening: comments on Carmi et al. and Visscher et al.

In this post I discuss some recent papers on disease risk reduction from polygenic screening of embryos in IVF (PGT-P). I will focus on the science but at the end will include some remarks about practical and ethical issues. 

The two papers are 

Carmi et al. 

Visscher et al. 

Both papers study risk reduction in the following scenario: you have N embryos to choose from, and polygenic risk scores (PRS) for each which have been computed from SNP genotype. Both papers use simulated data -- they build synthetic child (embryo) genotypes in order to calculate expected risk reduction. 
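
For readers who want to see how such a simulation works, here is a minimal liability-threshold sketch (my illustration, not the code of either paper): embryos from the same parents share the parental-mean component of the polygenic score, and we compare the disease rate of a randomly chosen embryo to that of the lowest-PRS embryo among N. The prevalence and PRS variance below are illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_families, n_embryos = 200_000, 5
K = 0.01          # population prevalence (e.g., ~1% for schizophrenia)
h2_prs = 0.10     # fraction of liability variance captured by the PRS (illustrative)

# Liability-threshold model: liability = PRS + residual, disease if liability > threshold.
# Embryos from the same parents share the parental-mean half of the PRS variance.
fam = rng.normal(0, np.sqrt(h2_prs / 2), size=(n_families, 1))
own = rng.normal(0, np.sqrt(h2_prs / 2), size=(n_families, n_embryos))
prs = fam + own
liability = prs + rng.normal(0, np.sqrt(1 - h2_prs), size=(n_families, n_embryos))
affected = liability > norm.ppf(1 - K)

risk_random = affected[:, 0].mean()                                         # random embryo
risk_selected = affected[np.arange(n_families), prs.argmin(axis=1)].mean()  # lowest-PRS embryo
print(f"random embryo risk {risk_random:.4f}, lowest-PRS embryo risk {risk_selected:.4f}, "
      f"relative risk reduction {1 - risk_selected / risk_random:.0%}")
```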

I am very happy to see serious researchers like Carmi et al. and Visscher et al. working on this important topic. 

Here are some example results from the papers: 

Carmi et al. find a ~50% risk reduction for schizophrenia from selecting the lowest risk embryo from a set of 5. For a selection among 2 embryos the risk reduction is ~30%. (We obtain a very similar result using empirical data: real adult siblings with known phenotype.)

Visscher et al. find the following results, see Table 1 and Figure 2 in their paper. To their credit they compute results for a range of ancestries (European, E. Asian, African). We have performed similar calculations using siblings but have not yet published the results for all ancestries.

Relative Risk Reduction (RRR): 
Hypertension: 9-18% (ranges depend on specific ancestry) 
Type 2 Diabetes: 7-16% 
Coronary Artery Disease: 8-17% 

Absolute Risk Reduction (ARR): 
Hypertension: 4-8.5% (ranges depend on specific ancestry) 
Type 2 Diabetes: 2.6-5.5% 
Coronary Artery Disease: 0.55-1.1%

Note, families with a history of the disease would benefit much more than this. For example, parents with a family history of breast cancer or heart disease or schizophrenia will often produce some embryos with very high PRS and others in the normal range. Their absolute risk reduction from selection is many times larger than the population average results shown above. 

My research group has already published work in this area using data from actual siblings: tens of thousands of individuals who are late in life (e.g., 50-70 years old), for whom we have health records and genotypes. 


We have shown that polygenic risk predictors can identify, using genotype alone, which sibling in a pair has a specific disease condition: the sib with high PRS is much more likely to have the condition than the sib with normal range PRS. In those papers we also computed Relative Risk Reduction (RRR), which is directly relevant to embryo selection. Needless to say I think real sib data provides better validation of PRS than simulated genotypes. The adult sibs have typically experienced a shared family environment and also exhibit negligible population stratification relative to each other. Using real sib data reduces significantly some important confounds in PRS validation. 

See also these papers: Treff et al. [1] [2] [3]

Here are example results from our work on absolute and relative risk reduction. (Selection from 2 embryos.)


Regarding pleiotropy (discussed in the NEJM article), the Treff et al. results linked above show that selection using a Genomic Index, which is an aggregate of several polygenic risk scores, simultaneously reduces risks across all of the ~12 disease conditions in the polygenic disease panel. That is, risk reduction is not zero-sum, as far as we can tell: you are not forced to trade off one disease risk against another, at least for the 12 diseases on the panel. Further work on this is in progress. 

In related work we showed that DNA regions used to predict different risks are largely disjoint, which also supports this conclusion. See 



To summarize, several groups have now validated the risk reduction from polygenic screening (PGT-P). The methodologies are different (i.e., simulated genotypes vs studies using large numbers of adult siblings) but come to similar conclusions. 

Whether one should regard, for example, relative and absolute risk reduction in type 2 diabetes (T2D) of ~40% and ~3% (from figure above) as important or valuable is a matter of judgement. 

Studies suggest that type 2 diabetes results in an average loss of over 10 quality-adjusted life years -- i.e., more than a decade. So reducing an individual's risk of T2D by even a few percent seems significant to me. 

Now multiply that by a large factor, because selection using a genomic index (see figure) produces simultaneous risk reductions across a dozen important diseases.

Finally, polygenic predictors are improving rapidly as more genomic and health record data become available for machine learning. All of the power of modern AI technology will be applied to this data, and risk reductions from selection (PGT-P) will increase significantly over time. See this 2021 review article for more.


Practical Issues 

Aurea, the first polygenically screened baby (PGT-P), was born in May 2020.
See this panel discussion, which includes 

Dr. Simon Fishel (member of the team that produced the first IVF baby) 
Elizabeth Carr (first US IVF baby) 
Prof. Julian Savulescu (Uehiro Chair in Practical Ethics at the University of Oxford) 
Dr. Nathan Treff (Chief Scientist, Genomic Prediction)
Dr. Rafal Smigrodzki (MD PhD, father of Aurea) 

Astral Codex Ten recently posted on this topic: Welcome Polygenically Screened Babies :-) Many of the comments there are of high quality and worth reading. 


Ethical Issues 

Once the basic scientific results are established, one can meaningfully examine the many ethical issues surrounding embryo selection. 

My view has always been that new genomic technologies are so powerful that they should be widely understood and discussed -- by all of society, not just by scientists. 

However, to me it is clear that the potential benefits of embryo PRS screening (PGT-P) are very positive and that this technology will eventually be universally adopted. 

Today millions of babies are produced through IVF. In most developed countries roughly 3-5 percent of all births are through IVF, and in Denmark the fraction is about 10 percent! But when the technology was first introduced with the birth of Louise Brown in 1978, the pioneering scientists had to overcome significant resistance. There may be an alternate universe in which IVF was not allowed to develop, and those millions of children were never born.
Wikipedia: ...During these controversial early years of IVF, Fishel and his colleagues received extensive opposition from critics both outside of and within the medical and scientific communities, including a civil writ for murder.[16] Fishel has since stated that "the whole establishment was outraged" by their early work and that people thought that he was "potentially a mad scientist".[17]
I predict that within 5 years the use of polygenic risk scores will become common in some health systems (i.e., for adults) and in IVF. Reasonable people will wonder why the technology was ever controversial at all, just as in the case of IVF.

This is a very complex topic. For an in-depth discussion I refer you to this recent paper by Munday and Savulescu. Savulescu, Uehiro Chair in Practical Ethics at the University of Oxford, is perhaps the leading philosopher / bioethicist working in this area. 
Three models for the regulation of polygenic scores in reproduction 
Journal of Medical Ethics
The past few years have brought significant breakthroughs in understanding human genetics. This knowledge has been used to develop ‘polygenic scores’ (or ‘polygenic risk scores’) which provide probabilistic information about the development of polygenic conditions such as diabetes or schizophrenia. They are already being used in reproduction to select for embryos at lower risk of developing disease. Currently, the use of polygenic scores for embryo selection is subject to existing regulations concerning embryo testing and selection. Existing regulatory approaches include ‘disease-based' models which limit embryo selection to avoiding disease characteristics (employed in various formats in Australia, the UK, Italy, Switzerland and France, among others), and 'laissez-faire' or 'libertarian' models, under which embryo testing and selection remain unregulated (as in the USA). We introduce a novel 'Welfarist Model' which limits embryo selection according to the impact of the predicted trait on well-being. We compare the strengths and weaknesses of each model as a way of regulating polygenic scores. Polygenic scores create the potential for existing embryo selection technologies to be used to select for a wider range of predicted genetically influenced characteristics including continuous traits. Indeed, polygenic scores exist to predict future intelligence, and there have been suggestions that they will be used to make predictions within the normal range in the USA in embryo selection. We examine how these three models would apply to the prediction of non-disease traits such as intelligence. The genetics of intelligence remains controversial both scientifically and ethically. This paper does not attempt to resolve these issues. However, as with many biomedical advances, an effective regulatory regime must be in place as soon as the technology is available. If there is no regulation in place, then the market effectively decides ethical issues.
Dalton Conley (Princeton) and collaborators find that 68% of surveyed Americans had positive attitudes concerning polygenic screening of embryos.

Tuesday, June 29, 2021

Machine Learning Prediction of Biomarkers from SNPs and of Disease Risk from Biomarkers in the UK Biobank (published version)


This is the published version of our MedRxiv preprint discussed back in April 2021. It is in the special issue Application of Genomic Technology in Disease Outcome Prediction of the journal Genes. 

There is a lot in this paper: genomic prediction of important biomarkers (e.g., lipoprotein A, mean platelet (thrombocyte) volume, bilirubin, platelet count), and prediction of important disease risks from biomarkers (novel ML in a ~65-dimensional space) with potential clinical applications. As is typical, genomic predictors trained in a European ancestry population perform less well in distant populations (e.g., S. Asians, E. Asians, Africans). This is probably due to different SNP LD (correlation) structure across populations. However, predictors of disease risk built from directly measured biomarkers do not show this behavior -- they can be applied even to distant ancestry groups.

The referees did not like our conditional probability notation:
( biomarkers | SNPs )   and   ( disease risk | biomarkers )
So we ended up with lots of acronyms to refer to the various predictors.
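
The concatenation is just function composition, as in this schematic with synthetic data (illustrative Ridge/logistic models, not the paper's trained predictors): one model maps SNPs to predicted biomarkers, a second maps biomarkers to disease risk, and applying the second to the output of the first yields a risk prediction from genotype alone.

```python
import numpy as np
from sklearn.linear_model import Ridge, LogisticRegression

rng = np.random.default_rng(0)
n, n_snps, n_bio = 5000, 300, 65

# Synthetic data with the causal chain SNPs -> biomarkers -> disease status.
G = rng.binomial(2, 0.3, size=(n, n_snps)).astype(float)
B = G @ rng.normal(scale=0.1, size=(n_snps, n_bio)) + rng.normal(size=(n, n_bio))
lin = B @ rng.normal(size=n_bio) * 0.1
y = (lin + rng.logistic(size=n) > np.quantile(lin, 0.9)).astype(int)

pgs = Ridge(alpha=10.0).fit(G, B)                    # ( biomarkers | SNPs )
bmrs = LogisticRegression(max_iter=2000).fit(B, y)   # ( disease risk | biomarkers )

# Concatenated predictor ( disease risk | SNPs ): apply BMRS to the PGS output.
risk_from_genotype = bmrs.predict_proba(pgs.predict(G))[:, 1]
print("corr with disease status:",
      round(float(np.corrcoef(risk_from_genotype, y)[0, 1]), 3))
```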

Some of the biomarkers identified by ML as important for predicting specific disease risk are not familiar to practitioners and have not been previously discussed (as far as we could tell from the literature) as relevant to that specific disease. One medical school professor and practitioner, upon seeing our results, said he would in future add several new biomarkers to routine blood tests ordered for his patients.
 
Machine Learning Prediction of Biomarkers from SNPs and of Disease Risk from Biomarkers in the UK Biobank 
Erik Widen 1,*, Timothy G. Raben 1, Louis Lello 1,2,* and Stephen D. H. Hsu 1,2 
1 Department of Physics and Astronomy, Michigan State University, 567 Wilson Rd, East Lansing, MI 48824, USA 
2 Genomic Prediction, Inc., 675 US Highway One, North Brunswick, NJ 08902, USA 
*Authors to whom correspondence should be addressed. 
Academic Editor: Sulev Koks 
Genes 2021, 12(7), 991; https://doi.org/10.3390/genes12070991 
Received: 30 March 2021 / Revised: 22 June 2021 / Accepted: 23 June 2021 / Published: 29 June 2021 
(This article belongs to the Special Issue Application of Genomic Technology in Disease Outcome Prediction) 
Abstract 
We use UK Biobank data to train predictors for 65 blood and urine markers such as HDL, LDL, lipoprotein A, glycated haemoglobin, etc. from SNP genotype. For example, our Polygenic Score (PGS) predictor correlates ∼0.76 with lipoprotein A level, which is highly heritable and an independent risk factor for heart disease. This may be the most accurate genomic prediction of a quantitative trait that has yet been produced (specifically, for European ancestry groups). We also train predictors of common disease risk using blood and urine biomarkers alone (no DNA information); we call these predictors biomarker risk scores, BMRS. Individuals who are at high risk (e.g., odds ratio of >5× population average) can be identified for conditions such as coronary artery disease (AUC∼0.75), diabetes (AUC∼0.95), hypertension, liver and kidney problems, and cancer using biomarkers alone. Our atherosclerotic cardiovascular disease (ASCVD) predictor uses ∼10 biomarkers and performs in UKB evaluation as well as or better than the American College of Cardiology ASCVD Risk Estimator, which uses quite different inputs (age, diagnostic history, BMI, smoking status, statin usage, etc.). We compare polygenic risk scores (risk conditional on genotype: PRS) for common diseases to the risk predictors which result from the concatenation of learned functions BMRS and PGS, i.e., applying the BMRS predictors to the PGS output.


Figure 11. The ASCVD BMRS and the ASCVD Risk Estimator both make accurate risk predictions but with partially complementary information. (Upper left): Predicted risk by BMRS, the ASCVD Risk Estimator and a PRS predictor were binned and compared to the actual disease prevalence within each bin. The gray 1:1 line indicates perfect prediction. ... The ASCVD Risk Estimator was applied to 340k UKB samples while the others were applied to an evaluation set of 28k samples, all of European ancestry. (Upper right) shows a scatter plot and distributions of the risk predicted by BMRS versus the risk predicted by the ASCVD Risk Estimator for the 28k Europeans in the evaluation set. The BMRS distribution has a longer tail of high predicted risk, providing the tighter confidence interval in this region. The left plot y-axis is the actual prevalence within the horizontal and vertical cross-sections, as illustrated with the shaded bands corresponding to the hollow squares to the left. Notably, both predictors perform well despite the differences in assigned stratification. The hexagons are an overlay of the (lower center) heat map of actual risk within each bin (numbers are bin sizes). Both high risk edges have varying actual prevalence but with a very strong enrichment when the two predictors agree.

Wednesday, May 12, 2021

Neural Tangent Kernels and Theoretical Foundations of Deep Learning

A colleague recommended this paper to me recently. See also earlier post Gradient Descent Models Are Kernel Machines.
Neural Tangent Kernel: Convergence and Generalization in Neural Networks 
Arthur Jacot, Franck Gabriel, Clément Hongler 
At initialization, artificial neural networks (ANNs) are equivalent to Gaussian processes in the infinite-width limit, thus connecting them to kernel methods. We prove that the evolution of an ANN during training can also be described by a kernel: during gradient descent on the parameters of an ANN, the network function fθ (which maps input vectors to output vectors) follows the kernel gradient of the functional cost (which is convex, in contrast to the parameter cost) w.r.t. a new kernel: the Neural Tangent Kernel (NTK). This kernel is central to describe the generalization features of ANNs. While the NTK is random at initialization and varies during training, in the infinite-width limit it converges to an explicit limiting kernel and it stays constant during training. This makes it possible to study the training of ANNs in function space instead of parameter space. Convergence of the training can then be related to the positive-definiteness of the limiting NTK. We prove the positive-definiteness of the limiting NTK when the data is supported on the sphere and the non-linearity is non-polynomial. We then focus on the setting of least-squares regression and show that in the infinite-width limit, the network function fθ follows a linear differential equation during training. The convergence is fastest along the largest kernel principal components of the input data with respect to the NTK, hence suggesting a theoretical motivation for early stopping. Finally we study the NTK numerically, observe its behavior for wide networks, and compare it to the infinite-width limit.
The results are remarkably well summarized in the wikipedia entry on Neural Tangent Kernels:

For most common neural network architectures, in the limit of large layer width the NTK becomes constant. This enables simple closed form statements to be made about neural network predictions, training dynamics, generalization, and loss surfaces. For example, it guarantees that wide enough ANNs converge to a global minimum when trained to minimize an empirical loss. ...

An Artificial Neural Network (ANN) with scalar output consists in a family of functions f(x; \theta) parametrized by a vector of parameters \theta \in R^P.

The Neural Tangent Kernel (NTK) is a kernel \Theta(x, x'; \theta) defined by

\Theta(x, x'; \theta) = \sum_{p=1}^{P} \partial_{\theta_p} f(x; \theta) \, \partial_{\theta_p} f(x'; \theta).

In the language of kernel methods, the NTK \Theta is the kernel associated with the feature map x \mapsto ( \partial_{\theta_p} f(x; \theta) )_{p=1,...,P}.

For a dataset x_1, ..., x_n with scalar labels y_1, ..., y_n and a loss function c, the associated empirical loss, defined on functions f, is given by

C(f) = \sum_{i=1}^{n} c( f(x_i), y_i ).

When the ANN f(\cdot; \theta(t)) is trained to fit the dataset (i.e., minimize C) via continuous-time gradient descent, the parameters \theta(t) evolve through the ordinary differential equation

\partial_t \theta(t) = - \nabla_\theta C( f(\cdot; \theta(t)) ).

During training the ANN output function follows an evolution differential equation given in terms of the NTK:

\partial_t f(x; \theta(t)) = - \sum_{i=1}^{n} \Theta(x, x_i; \theta(t)) \, \partial_{f(x_i)} c( f(x_i; \theta(t)), y_i ).

This equation shows how the NTK drives the dynamics of f in the space of functions during training.
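
As a sanity check of the definition, here is a small numpy sketch that computes the empirical NTK of a one-hidden-layer network by taking the inner product of parameter gradients at two inputs (finite width, single random initialization; the architecture and parametrization are my choices for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, width = 10, 4096

# One-hidden-layer network in NTK parametrization: f(x) = a . phi(W x) / sqrt(width)
W = rng.normal(size=(width, n_in))
a = rng.normal(size=width)
phi, dphi = np.tanh, lambda z: 1.0 - np.tanh(z) ** 2

def grads(x):
    # Gradient of f(x) with respect to all parameters (a and W), flattened.
    z = W @ x
    df_da = phi(z) / np.sqrt(width)
    df_dW = np.outer(a * dphi(z), x) / np.sqrt(width)
    return np.concatenate([df_da, df_dW.ravel()])

def ntk(x1, x2):
    # Empirical NTK: Theta(x1, x2) = sum_p d_p f(x1) * d_p f(x2).
    return float(grads(x1) @ grads(x2))

x1, x2 = rng.normal(size=n_in), rng.normal(size=n_in)
print(ntk(x1, x2), ntk(x1, x1))  # at large width these concentrate near the limiting kernel
```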

This is a very brief (3 minute) summary by the first author:



This 15 minute IAS talk gives a nice overview of the results, and their relation to fundamental questions (both empirical and theoretical) in deep learning. Longer (30m) version: On the Connection between Neural Networks and Kernels: a Modern Perspective.  




I hope to find time to explore this in more depth. Large width seems to provide a limiting case (analogous to the large-N limit in gauge theory) in which rigorous results about deep learning can be proved.

Some naive questions:

What is the expansion parameter of the finite width expansion?

What role does concentration of measure play in the results? (See 30m video linked above.)

Simplification seems to be a consequence of overparametrization. But the proof method seems to apply to a regularized (but still convex, e.g., using L1 penalization) loss function that imposes sparsity. It would be interesting to examine this specific case in more detail.

Notes to self:

The overparametrized (width ~ w^2) network starts in a random state and by concentration of measure this initial kernel K is just the expectation, which is the NTK. Because of the large number of parameters the effect of training (i.e., gradient descent) on any individual parameter is 1/w, and the change in the eigenvalue spectrum of K is also 1/w. It can be shown that the eigenvalue spectrum is positive and bounded away from zero, and this property does not change under training. Also, the evolution of f is linear in K up to corrections which are suppressed by 1/w. Hence evolution follows a convex trajectory and can achieve global minimum loss in a finite (polynomial) time. 

The parametric 1/w expansion may depend on quantities such as the smallest NTK eigenvalue k: the proof might require  k >> 1/w  or  wk large.

In the large w limit the function space has such high dimensionality that any typical initial f is close (within a ball of radius 1/w?) to an optimal f. 

These properties depend on specific choice of loss function.
