The Wayback Machine - https://web.archive.org/web/20230108112840/https://github.com/huggingface/diffusers/issues/1880

Refactor attention.py #1880

Open
patrickvonplaten opened this issue Jan 1, 2023 · 2 comments
@patrickvonplaten
Member

patrickvonplaten commented Jan 1, 2023

attention.py currently has two concurrent attention implementations that essentially do the exact same thing:

Both

class CrossAttention(nn.Module):
and
class AttentionBlock(nn.Module):
are already used for "simple" attention, e.g. the former for Stable Diffusion and the latter for the simple DDPM UNet.

We should start deprecating

class AttentionBlock(nn.Module):
very soon, as it is not viable to maintain two attention implementations.

Deprecating this class won't be easy, as it essentially means forcing people to re-upload their weights. Every model checkpoint that makes use of

class AttentionBlock(nn.Module):
will eventually have to have its weights re-uploaded to remain compatible.
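One way to soften the migration would be a one-off conversion script that renames the legacy keys in a checkpoint before re-upload. A minimal sketch, assuming a key mapping along these lines (the exact AttentionBlock-to-CrossAttention correspondence would have to be pinned down as part of the refactor):

```python
import torch

# Hypothetical key mapping from AttentionBlock parameter names to
# CrossAttention parameter names; the real correspondence would need
# to be fixed as part of the refactor.
KEY_MAP = {
    "query.weight": "to_q.weight",
    "key.weight": "to_k.weight",
    "value.weight": "to_v.weight",
    "proj_attn.weight": "to_out.0.weight",
}

def convert_checkpoint(state_dict: dict) -> dict:
    """Return a copy of state_dict with legacy attention keys renamed."""
    converted = {}
    for key, tensor in state_dict.items():
        for old, new in KEY_MAP.items():
            if key.endswith(old):
                key = key[: -len(old)] + new
                break
        converted[key] = tensor
    return converted
```

Keys that don't match any legacy name pass through untouched, so the script is safe to run on a whole checkpoint.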

I would propose to do this in the following way:

Happy to help with this PR. @williamberman are you maybe interested in taking this one?

@williamberman williamberman self-assigned this Jan 1, 2023
@Birch-san
Contributor

Birch-san commented Jan 3, 2023

@patrickvonplaten Rather than having everybody save and re-upload their weights: can diffusers intercept the weights during model load and map them to different parameter names?

Apple uses PyTorch's _register_load_state_dict_pre_hook() idiom to intercept the weights while the state dict is being loaded, transform them, and redirect them into different Parameters:
https://github.com/apple/ml-ane-transformers/blob/da64000fa56cc85b0859bc17cb16a3d753b8304a/ane_transformers/huggingface/distilbert.py#L241
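For reference, the idiom itself is small. A self-contained sketch (the module and parameter names here are made up for illustration, not diffusers code):

```python
import torch
import torch.nn as nn

class RenamedParamModule(nn.Module):
    """Module whose parameter was renamed from `old_weight` to `weight`;
    a load-state-dict pre-hook keeps legacy checkpoints loadable."""

    def __init__(self, dim: int):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(dim))
        # Private PyTorch API (the one used by ml-ane-transformers):
        # the hook runs on the incoming state dict before it is applied.
        self._register_load_state_dict_pre_hook(self._remap_legacy_keys)

    def _remap_legacy_keys(self, state_dict, prefix, local_metadata,
                           strict, missing_keys, unexpected_keys, error_msgs):
        old_key = prefix + "old_weight"
        if old_key in state_dict:
            state_dict[prefix + "weight"] = state_dict.pop(old_key)

# A checkpoint saved under the old parameter name still loads:
module = RenamedParamModule(4)
module.load_state_dict({"old_weight": torch.ones(4)})
```

The hook mutates the state dict in place before strict key checking runs, so no missing/unexpected-key errors are raised for the legacy name.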

However, something about HF's model-loading technique breaks this idiom: my model loading hooks never get invoked. They work in a CompVis repository, but not inside HF diffusers code. I think something about using importlib to load a .bin skips them. It would be really good if you could fix that; it's the number one thing that made it difficult for me to optimize the diffusers UNet for the Neural Engine.

In the end, this is the technique I had to resort to in order to replace every AttentionBlock with CrossAttention (after model loading):
Birch-san/diffusers-play@bf9b13e
You may find it a useful reference for how to map between the two.
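The general shape of that swap-after-load technique can be sketched as follows. OldBlock and NewBlock are hypothetical stand-ins; the actual AttentionBlock-to-CrossAttention mapping in the linked commit has to reconcile more parameters than this:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the two attention classes.
class OldBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Linear(channels, channels)

class NewBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.to_out = nn.Linear(channels, channels)

def swap_blocks(model: nn.Module) -> None:
    """Recursively replace every OldBlock with an equivalent NewBlock,
    copying the weights across the renamed submodule."""
    for name, child in model.named_children():
        if isinstance(child, OldBlock):
            replacement = NewBlock(child.proj.in_features)
            replacement.to_out.load_state_dict(child.proj.state_dict())
            setattr(model, name, replacement)
        else:
            swap_blocks(child)
```

Because the swap runs after load_state_dict has already succeeded, it does not depend on the broken pre-hook path at all.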

@williamberman
Contributor

williamberman commented Jan 4, 2023

@Birch-san Thank you for the added context, super helpful! I don't have much to add right now. When I start working on the refactor, I'll think about it more and we can discuss :)
