The Wayback Machine - https://web.archive.org/web/20200526203335/https://github.com/huggingface/transformers/issues/3895

run_bertology: AttributeError: 'NoneType' object has no attribute 'abs' #3895

Open
ThomasSYT opened this issue Apr 22, 2020 · 6 comments
Comments

ThomasSYT commented Apr 22, 2020

🐛 Bug

Information

One more question: does run_bertology.py also support the ALBERT model?

Model I am using (Bert, XLNet ...): Bert

Language I am using the model on (English, Chinese ...): English

The problem arises when using:

  • the official example scripts: (give details below)
    run_bertology.py
  • my own modified scripts: (give details below)
    python3.7/site-packages/transformers/data/processors/glue.py

Because the SciEntsBank-3way dataset uses the labels "correct", "incorrect", and "contradictory":

-223 return ["contradiction", "entailment", "neutral"]
+223 return ["correct", "incorrect", "contradictory"]

Because of the SciEntsBank-3way dataset structure: the label is in the first position, text_a in the second, and text_b in the third.

-232 text_a = line[8]
-233 text_b = line[9]
-234 label = line[-1]
+232 text_a = line[1]
+233 text_b = line[2]
+234 label = line[0]
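The two edits to glue.py above can be sketched as a small standalone processor. The class name `SciEntsBank3WayProcessor` and the `create_example` helper are hypothetical (the real script patches the MNLI processor in place); only the label set and column indices come from the diff:

```python
# Hypothetical sketch of the patched MNLI-style processor for SciEntsBank-3way.
# Assumed row layout (from the diff above): label, text_a, text_b.
class SciEntsBank3WayProcessor:
    def get_labels(self):
        # replaces MNLI's ["contradiction", "entailment", "neutral"]
        return ["correct", "incorrect", "contradictory"]

    def create_example(self, line):
        # label first, then the two text fields, matching line[0]/line[1]/line[2]
        return {"text_a": line[1], "text_b": line[2], "label": line[0]}


processor = SciEntsBank3WayProcessor()
example = processor.create_example(["correct", "student answer", "reference answer"])
```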

The task I am working on is:

To reproduce

Steps to reproduce the behavior:

python ./run_bertology.py --data_dir SciEntsBank-3way \
    --model_name bert-base-uncased \
    --task_name mnli \
    --max_seq_length 128 \
    --output_dir ./tmp/$TASK_NAME/ \
    --try_masking

Iteration: 100%|██████████| 4561/4561 [02:15<00:00, 33.57it/s]
INFO:main:Attention entropies
INFO:main:lv, h > 1 2 3 4 5 6 7 8 9 10 11 12
INFO:main:layer 1: 3.03255 2.82196 1.77876 1.64802 3.27255 2.91101 3.34266 3.03600 2.73255 3.09043 1.35738 2.52412
INFO:main:layer 2: 2.73629 1.11241 2.86221 2.44852 0.95509 2.39331 0.45580 2.82749 2.93869 2.88269 2.19532 2.48865
INFO:main:layer 3: 0.05847 1.66529 1.91624 2.79214 2.31408 2.67645 2.18180 2.62745 2.48442 0.05168 2.52636 2.49648
INFO:main:layer 4: 1.54150 2.90387 2.40694 2.06858 2.77907 0.80181 2.69664 2.88957 2.70095 1.19583 2.33666 1.83265
INFO:main:layer 5: 2.34246 2.64519 2.03515 1.37404 2.88754 1.67422 2.14421 1.41457 2.03571 2.69347 1.98139 1.44582
INFO:main:layer 6: 1.71052 1.10676 2.28401 1.87228 2.55920 1.75916 1.22450 1.35704 1.92916 1.02535 1.67920 1.60766
INFO:main:layer 7: 1.63887 1.93625 1.83002 1.20811 1.58296 1.65662 1.55572 2.38742 2.09030 1.69326 1.42275 1.08153
INFO:main:layer 8: 1.95536 1.73146 1.59791 1.17307 1.12128 1.95980 1.11606 1.11680 1.97816 1.64787 1.53183 1.28007
INFO:main:layer 9: 1.54698 1.96436 1.45466 2.03807 1.60202 1.44075 1.36014 2.32559 2.59592 2.09076 1.75704 1.85274
INFO:main:layer 10: 2.00444 1.91784 2.12478 1.99289 1.58305 2.48627 2.08822 1.69971 2.70500 1.71860 2.03850 2.38604
INFO:main:layer 11: 2.76158 1.53031 1.99278 2.26007 1.97855 1.66471 1.90139 2.13217 2.45516 1.83803 1.99372 2.15438
INFO:main:layer 12: 1.73656 2.10304 2.72498 1.85723 2.04607 2.20456 2.16210 1.82173 2.18728 2.71702 1.84256 1.83663
INFO:main:Head importance scores
INFO:main:lv, h > 1 2 3 4 5 6 7 8 9 10 11 12
INFO:main:layer 1: 0.30328 0.05899 0.06971 0.03727 0.13938 1.00000 0.04436 0.03679 0.22807 0.07911 0.19918 0.05241
INFO:main:layer 2: 0.04867 0.02256 0.21194 0.04069 0.23058 0.15942 0.65188 0.38251 0.47535 0.40172 0.10869 0.34316
INFO:main:layer 3: 0.06349 0.08003 0.56604 0.41141 0.38410 0.16264 0.29070 0.37301 0.28161 0.18325 0.45048 0.02401
INFO:main:layer 4: 0.74869 0.15986 0.29754 0.02072 0.20961 0.06570 0.35717 0.44580 0.01144 0.11113 0.26962 0.28707
INFO:main:layer 5: 0.05413 0.58029 0.29859 0.64154 0.25539 0.11611 0.36774 0.05591 0.19390 0.34493 0.04906 0.02742
INFO:main:layer 6: 0.24067 0.06599 0.45376 0.22384 0.40461 0.53808 0.06806 0.21937 0.04209 0.13334 0.19226 0.57838
INFO:main:layer 7: 0.33972 0.12576 0.31489 0.10031 0.29630 0.19341 0.28052 0.29937 0.78337 0.09395 0.23640 0.05812
INFO:main:layer 8: 0.23342 0.27415 0.27682 0.22111 0.23234 0.79778 0.03235 0.09092 0.40418 0.01651 0.21795 0.22528
INFO:main:layer 9: 0.01306 0.88878 0.08858 0.45180 0.04019 0.08035 0.13417 0.15899 0.39753 0.01761 0.10785 0.01428
INFO:main:layer 10: 0.01597 0.01365 0.08691 0.04718 0.01268 0.32052 0.00453 0.05614 0.81534 0.00000 0.02659 0.66734
INFO:main:layer 11: 0.86446 0.00818 0.05306 0.12751 0.13587 0.00293 0.06480 0.22173 0.21643 0.04838 0.48050 0.32190
INFO:main:layer 12: 0.08048 0.32489 0.56753 0.28201 0.37204 0.09334 0.26549 0.07130 0.00372 0.53481 0.24909 0.36108
INFO:main:Head ranked by importance scores
INFO:main:lv, h > 1 2 3 4 5 6 7 8 9 10 11 12
INFO:main:layer 1: 41 108 102 123 81 0 119 124 62 100 72 114
INFO:main:layer 2: 116 129 70 121 61 79 8 28 17 25 89 35
INFO:main:layer 3: 107 99 13 22 27 77 46 29 49 76 20 128
INFO:main:layer 4: 6 78 44 130 71 105 33 21 138 88 53 47
INFO:main:layer 5: 112 10 43 9 55 87 31 111 73 34 115 126
INFO:main:layer 6: 57 104 18 64 23 14 103 67 120 84 75 11
INFO:main:layer 7: 36 86 40 91 45 74 50 42 5 92 58 109
INFO:main:layer 8: 59 52 51 66 60 4 125 94 24 132 68 63
INFO:main:layer 9: 136 1 95 19 122 98 83 80 26 131 90 134
INFO:main:layer 10: 133 135 96 118 137 39 140 110 3 143 127 7
INFO:main:layer 11: 2 139 113 85 82 142 106 65 69 117 16 38
INFO:main:layer 12: 97 37 12 48 30 93 54 101 141 15 56 32
Iteration: 100%|██████████| 4561/4561 [01:54<00:00, 39.80it/s]
INFO:main:Attention entropies
INFO:main:lv, h > 1 2 3 4 5 6 7 8 9 10 11 12
INFO:main:layer 1: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 2: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 3: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 4: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 5: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 6: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 7: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 8: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 9: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 10: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 11: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 12: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:Head importance scores
INFO:main:lv, h > 1 2 3 4 5 6 7 8 9 10 11 12
INFO:main:layer 1: 0.30328 0.05899 0.06971 0.03727 0.13938 1.00000 0.04436 0.03679 0.22807 0.07911 0.19918 0.05241
INFO:main:layer 2: 0.04867 0.02256 0.21194 0.04069 0.23058 0.15942 0.65188 0.38251 0.47535 0.40172 0.10869 0.34316
INFO:main:layer 3: 0.06349 0.08003 0.56604 0.41141 0.38410 0.16264 0.29070 0.37301 0.28161 0.18325 0.45048 0.02401
INFO:main:layer 4: 0.74869 0.15986 0.29754 0.02072 0.20961 0.06570 0.35717 0.44580 0.01144 0.11113 0.26962 0.28707
INFO:main:layer 5: 0.05413 0.58029 0.29859 0.64154 0.25539 0.11611 0.36774 0.05591 0.19390 0.34493 0.04906 0.02742
INFO:main:layer 6: 0.24067 0.06599 0.45376 0.22384 0.40461 0.53808 0.06806 0.21937 0.04209 0.13334 0.19226 0.57838
INFO:main:layer 7: 0.33972 0.12576 0.31489 0.10031 0.29630 0.19341 0.28052 0.29937 0.78337 0.09395 0.23640 0.05812
INFO:main:layer 8: 0.23342 0.27415 0.27682 0.22111 0.23234 0.79778 0.03235 0.09092 0.40418 0.01651 0.21795 0.22528
INFO:main:layer 9: 0.01306 0.88878 0.08858 0.45180 0.04019 0.08035 0.13417 0.15899 0.39753 0.01761 0.10785 0.01428
INFO:main:layer 10: 0.01597 0.01365 0.08691 0.04718 0.01268 0.32052 0.00453 0.05614 0.81534 0.00000 0.02659 0.66734
INFO:main:layer 11: 0.86446 0.00818 0.05306 0.12751 0.13587 0.00293 0.06480 0.22173 0.21643 0.04838 0.48050 0.32190
INFO:main:layer 12: 0.08048 0.32489 0.56753 0.28201 0.37204 0.09334 0.26549 0.07130 0.00372 0.53481 0.24909 0.36108
INFO:main:Head ranked by importance scores
INFO:main:lv, h > 1 2 3 4 5 6 7 8 9 10 11 12
INFO:main:layer 1: 41 108 102 123 81 0 119 124 62 100 72 114
INFO:main:layer 2: 116 129 70 121 61 79 8 28 17 25 89 35
INFO:main:layer 3: 107 99 13 22 27 77 46 29 49 76 20 128
INFO:main:layer 4: 6 78 44 130 71 105 33 21 138 88 53 47
INFO:main:layer 5: 112 10 43 9 55 87 31 111 73 34 115 126
INFO:main:layer 6: 57 104 18 64 23 14 103 67 120 84 75 11
INFO:main:layer 7: 36 86 40 91 45 74 50 42 5 92 58 109
INFO:main:layer 8: 59 52 51 66 60 4 125 94 24 132 68 63
INFO:main:layer 9: 136 1 95 19 122 98 83 80 26 131 90 134
INFO:main:layer 10: 133 135 96 118 137 39 140 110 3 143 127 7
INFO:main:layer 11: 2 139 113 85 82 142 106 65 69 117 16 38
INFO:main:layer 12: 97 37 12 48 30 93 54 101 141 15 56 32
INFO:main:Pruning: original score: 0.091866, threshold: 0.082679
INFO:main:Heads to mask: [117, 125, 140, 114, 121, 44, 112, 96, 109, 107, 108, 93, 105, 39]
INFO:main:lv, h > 1 2 3 4 5 6 7 8 9 10 11 12
INFO:main:layer 1: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
INFO:main:layer 2: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
INFO:main:layer 3: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
INFO:main:layer 4: 1.00000 1.00000 1.00000 0.00000 1.00000 1.00000 1.00000 1.00000 0.00000 1.00000 1.00000 1.00000
INFO:main:layer 5: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
INFO:main:layer 6: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
INFO:main:layer 7: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
INFO:main:layer 8: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 0.00000 1.00000 1.00000
INFO:main:layer 9: 0.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 0.00000 1.00000 0.00000
INFO:main:layer 10: 0.00000 0.00000 1.00000 1.00000 0.00000 1.00000 0.00000 1.00000 1.00000 0.00000 1.00000 1.00000
INFO:main:layer 11: 1.00000 0.00000 1.00000 1.00000 1.00000 0.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
INFO:main:layer 12: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 0.00000 1.00000 1.00000 1.00000
Iteration: 100%|██████████| 4561/4561 [01:54<00:00, 39.68it/s]
INFO:main:Attention entropies
INFO:main:lv, h > 1 2 3 4 5 6 7 8 9 10 11 12
INFO:main:layer 1: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 2: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 3: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 4: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 5: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 6: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 7: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 8: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 9: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 10: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 11: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 12: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:Head importance scores
INFO:main:lv, h > 1 2 3 4 5 6 7 8 9 10 11 12
INFO:main:layer 1: 0.39574 0.02468 0.08140 0.12466 0.12766 1.00000 0.09364 0.03733 0.21125 0.05515 0.27669 0.01294
INFO:main:layer 2: 0.04871 0.01492 0.02147 0.00766 0.25008 0.16705 0.73248 0.48019 0.40388 0.39327 0.06609 0.38033
INFO:main:layer 3: 0.14803 0.02220 0.64758 0.29125 0.45867 0.02242 0.34411 0.33109 0.30959 0.29897 0.41782 0.01806
INFO:main:layer 4: 0.82395 0.20768 0.33463 0.05595 0.30457 0.01353 0.38665 0.33563 0.02096 0.12274 0.15990 0.30594
INFO:main:layer 5: 0.24467 0.77639 0.26634 0.43066 0.28580 0.15136 0.29888 0.09479 0.20161 0.38652 0.07106 0.09292
INFO:main:layer 6: 0.24192 0.04294 0.19242 0.18251 0.64465 0.64657 0.03439 0.17273 0.05866 0.20935 0.14715 0.51240
INFO:main:layer 7: 0.24221 0.11722 0.54783 0.09908 0.30887 0.33625 0.18271 0.09798 0.76243 0.19917 0.26639 0.02415
INFO:main:layer 8: 0.34639 0.10483 0.42852 0.23310 0.20756 0.85146 0.05960 0.06187 0.25805 0.12922 0.14193 0.28091
INFO:main:layer 9: 0.02696 0.97099 0.08023 0.36748 0.05116 0.06451 0.07015 0.23535 0.39404 0.14999 0.01570 0.01164
INFO:main:layer 10: 0.02307 0.01918 0.09727 0.05241 0.03105 0.32034 0.02875 0.08710 0.92011 0.00000 0.05011 0.59763
INFO:main:layer 11: 0.81794 0.00753 0.05141 0.14622 0.09715 0.00008 0.03071 0.09413 0.17420 0.05189 0.69652 0.31542
INFO:main:layer 12: 0.09312 0.36467 0.56134 0.25867 0.37935 0.06446 0.34758 0.09796 0.03789 0.56186 0.25654 0.37430
INFO:main:Head ranked by importance scores
INFO:main:lv, h > 1 2 3 4 5 6 7 8 9 10 11 12
INFO:main:layer 1: 24 126 100 85 84 0 96 120 64 111 52 138
INFO:main:layer 2: 117 136 131 140 58 75 8 18 23 26 104 29
INFO:main:layer 3: 79 130 10 49 19 129 36 40 43 47 22 134
INFO:main:layer 4: 4 66 39 110 46 137 27 38 132 86 76 45
INFO:main:layer 5: 59 6 54 20 50 77 48 94 68 28 102 98
INFO:main:layer 6: 61 118 70 72 12 11 121 74 109 65 80 17
INFO:main:layer 7: 60 87 16 89 44 37 71 90 7 69 53 127
INFO:main:layer 8: 35 88 21 63 67 3 108 107 56 83 82 51
INFO:main:layer 9: 125 1 101 32 115 105 103 62 25 78 135 139
INFO:main:layer 10: 128 133 92 112 122 41 124 99 2 143 116 13
INFO:main:layer 11: 5 141 114 81 93 142 123 95 73 113 9 42
INFO:main:layer 12: 97 33 15 55 30 106 34 91 119 14 57 31
INFO:main:Masking: current score: 0.092085, remaning heads 130 (90.3 percents)
INFO:main:Heads to mask: [15, 11, 41, 13, 106, 35, 14, 25, 29, 83, 1, 126, 66, 7]
INFO:main:lv, h > 1 2 3 4 5 6 7 8 9 10 11 12
INFO:main:layer 1: 1.00000 0.00000 1.00000 1.00000 1.00000 1.00000 1.00000 0.00000 1.00000 1.00000 1.00000 0.00000
INFO:main:layer 2: 1.00000 0.00000 0.00000 0.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
INFO:main:layer 3: 1.00000 0.00000 1.00000 1.00000 1.00000 0.00000 1.00000 1.00000 1.00000 1.00000 1.00000 0.00000
INFO:main:layer 4: 1.00000 1.00000 1.00000 0.00000 1.00000 0.00000 1.00000 1.00000 0.00000 1.00000 1.00000 1.00000
INFO:main:layer 5: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
INFO:main:layer 6: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 0.00000 1.00000 1.00000 1.00000 1.00000 1.00000
INFO:main:layer 7: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 0.00000
INFO:main:layer 8: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 0.00000 1.00000 1.00000
INFO:main:layer 9: 0.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 0.00000 0.00000 0.00000
INFO:main:layer 10: 0.00000 0.00000 1.00000 1.00000 0.00000 1.00000 0.00000 1.00000 1.00000 0.00000 1.00000 1.00000
INFO:main:layer 11: 1.00000 0.00000 1.00000 1.00000 1.00000 0.00000 0.00000 1.00000 1.00000 1.00000 1.00000 1.00000
INFO:main:layer 12: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 0.00000 1.00000 1.00000 1.00000
Iteration: 0%| | 0/4561 [00:00<?, ?it/s]
Traceback (most recent call last):
File "run_bertology.py", line 426, in
main()
File "run_bertology.py", line 421, in main
head_mask = mask_heads(args, model, eval_dataloader)
File "run_bertology.py", line 179, in mask_heads
args, model, eval_dataloader, compute_entropy=False, head_mask=new_head_mask
File "run_bertology.py", line 104, in compute_heads_importance
head_importance += head_mask.grad.abs().detach()
AttributeError: 'NoneType' object has no attribute 'abs'

Environment info

  • transformers version: 2.8.0
  • Platform:
  • Python version: 3.7
  • PyTorch version (GPU?):
  • Tensorflow version (GPU?):
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?:
sshleifer changed the title from "AttributeError: 'NoneType' object has no attribute 'abs'" to "run_bertology: AttributeError: 'NoneType' object has no attribute 'abs'" on Apr 22, 2020
ThomasSYT closed this May 1, 2020
ThomasSYT reopened this May 1, 2020
ThomasSYT (Author) commented May 1, 2020

Sorry, I think it is the same issue as #4103.

BramVanroy (Collaborator) commented May 2, 2020

Please don't open duplicate issues. If you have more info about this issue, post it here. Also, please use code blocks.

ThomasSYT (Author) commented May 5, 2020

> Please don't open duplicate issues. If you have more info about this issue, post it here. Also, please use code blocks.

Sorry for this.

aditwhorra42 commented May 6, 2020

Facing the same issue.

Environment:

  • transformers version: 2.2.2
  • Platform:
  • Python version: 3.7.3
  • PyTorch version (GPU?): 1.3.1 (no GPU)
  • Tensorflow version (GPU?): 1.14.0 (no GPU)
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

While computing the initial importance scores (when the head_mask is None), I get the following error:

File "/Users/user/Desktop/org//prune_attention_heads.py", line 226, in compute_heads_importance
head_importance += head_mask.grad.abs().detach()
AttributeError: 'NoneType' object has no attribute 'abs'

On putting the line above in a try/except block and printing the head_mask, I get the following:

tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], requires_grad=True)

Here, head_mask has requires_grad=True, and I am passing it into the model and calling loss.backward() as in the bertology script.

TobiasLee (Contributor) commented May 20, 2020

I ran into the same problem: head_mask has no grad the second time around.

TobiasLee (Contributor) commented May 20, 2020

It seems that during the second pruning pass the head_mask tensor becomes a non-leaf node, which is why its grad is None. I printed the head_mask.is_leaf attribute and got the following warning (PyTorch 1.5.0):

UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
warnings.warn("The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad "
head_mask is leaf: False
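The non-leaf diagnosis can be reproduced in isolation. This is a minimal PyTorch sketch with a toy 2x3 mask and no model (the shapes and the detach-based fix are illustrative assumptions, not the script's actual patch): deriving the new mask from the old one via clone() yields a non-leaf tensor whose .grad stays None, while detaching it back into a leaf restores gradient accumulation.

```python
import torch

# First pass: a fresh leaf tensor with requires_grad=True -> .grad is populated.
head_mask = torch.ones(2, 3, requires_grad=True)
loss = (head_mask * torch.randn(2, 3)).sum()
loss.backward()
assert head_mask.is_leaf and head_mask.grad is not None

# Second pass: deriving the new mask from the old one produces a NON-leaf tensor
# (it has a grad_fn), so after backward() its .grad would remain None -- this is
# the 'NoneType' object has no attribute 'abs' crash.
new_head_mask = head_mask.clone()
new_head_mask[0, 0] = 0.0  # "prune" one head
assert not new_head_mask.is_leaf

# One possible fix: detach the derived mask so it becomes a leaf again.
fixed_mask = new_head_mask.detach().requires_grad_(True)
loss2 = (fixed_mask * torch.randn(2, 3)).sum()
loss2.backward()
assert fixed_mask.is_leaf and fixed_mask.grad is not None
```

Calling .retain_grad() on the non-leaf mask, as the warning suggests, is an alternative way to make .grad available without detaching.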
