--

This might need some revision. The FF network is fully connected and holds the largest number of parameters in a transformer block. Though MHA has O(n^2) inference cost, its parameter count may actually be lower than the FFN's. Ideally, LoRA could benefit the FFN as well. Reference: https://orenleung.com/transformer-parameter-counting
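To make the comparison concrete, here is a rough sketch of per-layer parameter counts for a standard transformer block (shapes are my assumptions; biases and layer norms are ignored, and `d_ff = 4 * d_model` is the common convention):

```python
def attention_params(d_model):
    # Q, K, V, and output projections are each d_model x d_model.
    return 4 * d_model * d_model

def ffn_params(d_model, d_ff=None):
    # Two linear layers: d_model -> d_ff and d_ff -> d_model.
    if d_ff is None:
        d_ff = 4 * d_model  # conventional expansion factor
    return 2 * d_model * d_ff

d_model = 768  # e.g. GPT-2 small
attn = attention_params(d_model)  # 4 * 768^2 = 2,359,296
ffn = ffn_params(d_model)         # 8 * 768^2 = 4,718,592
print(attn, ffn, ffn / attn)
```

Under these assumptions the FFN carries roughly 2x the parameters of the attention projections per layer, even though attention dominates inference cost at long sequence lengths.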

P.S: Haven't read the full LoRA paper yet :) So my assumption might be incorrect.

--

--

Written by Senthilkumar Gopal

❤️ to code and solving complex problems everyday @AWS . Engineering leader for AI/ML Accelerator using Neuron. Opinions my own and does not represent AWS.
