run_bertology: AttributeError: 'NoneType' object has no attribute 'abs' #3895
Comments
Sorry, I think it is the same issue as #4103.
Please don't open duplicate issues. If you have more info about this issue, post it here. Also, please use code blocks.

Sorry for this.
Facing the same issue. While computing the initial importance scores (when the head_mask is None), I get the following error:

File "/Users/user/Desktop/org//prune_attention_heads.py", line 226, in compute_heads_importance

Putting the line above in a try/except block and printing the head_mask gives:

tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],

Here the head_mask has requires_grad=True, and I am passing the head_mask into the model and calling loss.backward() as in the bertology script.
I met the same problem: the head_mask has no grad the second time around.
It seems that during the second pruning pass the head_mask tensor becomes a non-leaf node, so its grad is None; I printed the head_mask to confirm.
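A minimal sketch in plain PyTorch (not the bertology script itself) of why the grad goes missing: backward() only populates .grad on leaf tensors, and a mask derived from another tensor is a non-leaf node, so its .grad stays None and calling .abs() on it raises exactly this AttributeError. Re-creating the mask as a fresh leaf restores the gradient.

```python
import torch

# A leaf tensor created directly with requires_grad=True gets a .grad.
leaf = torch.ones(12, requires_grad=True)
(leaf * 2.0).sum().backward()
assert leaf.grad is not None

# A mask derived from another tensor (as happens when the script rebuilds
# the head mask for the second pruning pass) is a non-leaf node:
# backward() runs fine, but its .grad is never populated.
non_leaf = leaf * 1.0
(non_leaf * 2.0).sum().backward()
print(non_leaf.grad)  # None -> non_leaf.grad.abs() raises AttributeError

# Hypothetical workaround: detach and re-mark the mask as a leaf.
fixed = non_leaf.clone().detach().requires_grad_(True)
(fixed * 2.0).sum().backward()
print(fixed.grad is None)  # False: .grad is populated, .abs() works again
```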


Information
One more question: does run_bertology also support the ALBERT model?
Model I am using (Bert, XLNet ...): Bert
Language I am using the model on (English, Chinese ...): English
The problem arises when using:
run_bertology.py
python3.7/site-packages/transformers/data/processors/glue.py
Because the SciEntsBank-3way dataset is labeled with "correct", "incorrect", and "contradictory", I changed line 223:
-223 return ["contradiction", "entailment", "neutral"]
+223 return ["correct", "incorrect", "contradictory"]
Because of the SciEntsBank-3way dataset's structure (the label is in the first position, text_a in the second, and text_b in the third), I changed lines 232-234:
-232 text_a = line[8]
-233 text_b = line[9]
-234 label = line[-1]
+232 text_a = line[1]
+233 text_b = line[2]
+234 label = line[0]
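A hedged alternative to the two edits above: rather than patching glue.py inside site-packages, the same remapping can be sketched as small standalone helpers. The tab-separated column layout (label, text_a, text_b) is the assumption taken from this report; the function names below are hypothetical.

```python
# Hypothetical sketch of the two glue.py edits above as standalone helpers,
# assuming SciEntsBank-3way rows are tab-separated as: label, text_a, text_b.

def get_labels():
    # Replaces MNLI's ["contradiction", "entailment", "neutral"].
    return ["correct", "incorrect", "contradictory"]

def parse_line(line):
    """Split one TSV row into (text_a, text_b, label)."""
    fields = line.rstrip("\n").split("\t")
    label, text_a, text_b = fields[0], fields[1], fields[2]
    return text_a, text_b, label

row = "correct\tThe bulb lights up.\tThe switch closes the circuit.\n"
print(parse_line(row))
# -> ('The bulb lights up.', 'The switch closes the circuit.', 'correct')
```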
The task I am working on is:
mnli
dataset: SciEntsBank-3way(https://www.cs.york.ac.uk/semeval-2013/task7.html)
To reproduce
Steps to reproduce the behavior:
python ./run_bertology.py --data_dir SciEntsBank-3way \
    --model_name bert-base-uncased \
    --task_name mnli \
    --max_seq_length 128 \
    --output_dir ./tmp/$TASK_NAME/ \
    --try_masking
Iteration: 100%|██████████| 4561/4561 [02:15<00:00, 33.57it/s]
INFO:main:Attention entropies
INFO:main:lv, h > 1 2 3 4 5 6 7 8 9 10 11 12
INFO:main:layer 1: 3.03255 2.82196 1.77876 1.64802 3.27255 2.91101 3.34266 3.03600 2.73255 3.09043 1.35738 2.52412
INFO:main:layer 2: 2.73629 1.11241 2.86221 2.44852 0.95509 2.39331 0.45580 2.82749 2.93869 2.88269 2.19532 2.48865
INFO:main:layer 3: 0.05847 1.66529 1.91624 2.79214 2.31408 2.67645 2.18180 2.62745 2.48442 0.05168 2.52636 2.49648
INFO:main:layer 4: 1.54150 2.90387 2.40694 2.06858 2.77907 0.80181 2.69664 2.88957 2.70095 1.19583 2.33666 1.83265
INFO:main:layer 5: 2.34246 2.64519 2.03515 1.37404 2.88754 1.67422 2.14421 1.41457 2.03571 2.69347 1.98139 1.44582
INFO:main:layer 6: 1.71052 1.10676 2.28401 1.87228 2.55920 1.75916 1.22450 1.35704 1.92916 1.02535 1.67920 1.60766
INFO:main:layer 7: 1.63887 1.93625 1.83002 1.20811 1.58296 1.65662 1.55572 2.38742 2.09030 1.69326 1.42275 1.08153
INFO:main:layer 8: 1.95536 1.73146 1.59791 1.17307 1.12128 1.95980 1.11606 1.11680 1.97816 1.64787 1.53183 1.28007
INFO:main:layer 9: 1.54698 1.96436 1.45466 2.03807 1.60202 1.44075 1.36014 2.32559 2.59592 2.09076 1.75704 1.85274
INFO:main:layer 10: 2.00444 1.91784 2.12478 1.99289 1.58305 2.48627 2.08822 1.69971 2.70500 1.71860 2.03850 2.38604
INFO:main:layer 11: 2.76158 1.53031 1.99278 2.26007 1.97855 1.66471 1.90139 2.13217 2.45516 1.83803 1.99372 2.15438
INFO:main:layer 12: 1.73656 2.10304 2.72498 1.85723 2.04607 2.20456 2.16210 1.82173 2.18728 2.71702 1.84256 1.83663
INFO:main:Head importance scores
INFO:main:lv, h > 1 2 3 4 5 6 7 8 9 10 11 12
INFO:main:layer 1: 0.30328 0.05899 0.06971 0.03727 0.13938 1.00000 0.04436 0.03679 0.22807 0.07911 0.19918 0.05241
INFO:main:layer 2: 0.04867 0.02256 0.21194 0.04069 0.23058 0.15942 0.65188 0.38251 0.47535 0.40172 0.10869 0.34316
INFO:main:layer 3: 0.06349 0.08003 0.56604 0.41141 0.38410 0.16264 0.29070 0.37301 0.28161 0.18325 0.45048 0.02401
INFO:main:layer 4: 0.74869 0.15986 0.29754 0.02072 0.20961 0.06570 0.35717 0.44580 0.01144 0.11113 0.26962 0.28707
INFO:main:layer 5: 0.05413 0.58029 0.29859 0.64154 0.25539 0.11611 0.36774 0.05591 0.19390 0.34493 0.04906 0.02742
INFO:main:layer 6: 0.24067 0.06599 0.45376 0.22384 0.40461 0.53808 0.06806 0.21937 0.04209 0.13334 0.19226 0.57838
INFO:main:layer 7: 0.33972 0.12576 0.31489 0.10031 0.29630 0.19341 0.28052 0.29937 0.78337 0.09395 0.23640 0.05812
INFO:main:layer 8: 0.23342 0.27415 0.27682 0.22111 0.23234 0.79778 0.03235 0.09092 0.40418 0.01651 0.21795 0.22528
INFO:main:layer 9: 0.01306 0.88878 0.08858 0.45180 0.04019 0.08035 0.13417 0.15899 0.39753 0.01761 0.10785 0.01428
INFO:main:layer 10: 0.01597 0.01365 0.08691 0.04718 0.01268 0.32052 0.00453 0.05614 0.81534 0.00000 0.02659 0.66734
INFO:main:layer 11: 0.86446 0.00818 0.05306 0.12751 0.13587 0.00293 0.06480 0.22173 0.21643 0.04838 0.48050 0.32190
INFO:main:layer 12: 0.08048 0.32489 0.56753 0.28201 0.37204 0.09334 0.26549 0.07130 0.00372 0.53481 0.24909 0.36108
INFO:main:Head ranked by importance scores
INFO:main:lv, h > 1 2 3 4 5 6 7 8 9 10 11 12
INFO:main:layer 1: 41 108 102 123 81 0 119 124 62 100 72 114
INFO:main:layer 2: 116 129 70 121 61 79 8 28 17 25 89 35
INFO:main:layer 3: 107 99 13 22 27 77 46 29 49 76 20 128
INFO:main:layer 4: 6 78 44 130 71 105 33 21 138 88 53 47
INFO:main:layer 5: 112 10 43 9 55 87 31 111 73 34 115 126
INFO:main:layer 6: 57 104 18 64 23 14 103 67 120 84 75 11
INFO:main:layer 7: 36 86 40 91 45 74 50 42 5 92 58 109
INFO:main:layer 8: 59 52 51 66 60 4 125 94 24 132 68 63
INFO:main:layer 9: 136 1 95 19 122 98 83 80 26 131 90 134
INFO:main:layer 10: 133 135 96 118 137 39 140 110 3 143 127 7
INFO:main:layer 11: 2 139 113 85 82 142 106 65 69 117 16 38
INFO:main:layer 12: 97 37 12 48 30 93 54 101 141 15 56 32
Iteration: 100%|██████████| 4561/4561 [01:54<00:00, 39.80it/s]
INFO:main:Attention entropies
INFO:main:lv, h > 1 2 3 4 5 6 7 8 9 10 11 12
INFO:main:layer 1: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 2: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 3: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 4: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 5: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 6: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 7: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 8: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 9: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 10: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 11: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 12: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:Head importance scores
INFO:main:lv, h > 1 2 3 4 5 6 7 8 9 10 11 12
INFO:main:layer 1: 0.30328 0.05899 0.06971 0.03727 0.13938 1.00000 0.04436 0.03679 0.22807 0.07911 0.19918 0.05241
INFO:main:layer 2: 0.04867 0.02256 0.21194 0.04069 0.23058 0.15942 0.65188 0.38251 0.47535 0.40172 0.10869 0.34316
INFO:main:layer 3: 0.06349 0.08003 0.56604 0.41141 0.38410 0.16264 0.29070 0.37301 0.28161 0.18325 0.45048 0.02401
INFO:main:layer 4: 0.74869 0.15986 0.29754 0.02072 0.20961 0.06570 0.35717 0.44580 0.01144 0.11113 0.26962 0.28707
INFO:main:layer 5: 0.05413 0.58029 0.29859 0.64154 0.25539 0.11611 0.36774 0.05591 0.19390 0.34493 0.04906 0.02742
INFO:main:layer 6: 0.24067 0.06599 0.45376 0.22384 0.40461 0.53808 0.06806 0.21937 0.04209 0.13334 0.19226 0.57838
INFO:main:layer 7: 0.33972 0.12576 0.31489 0.10031 0.29630 0.19341 0.28052 0.29937 0.78337 0.09395 0.23640 0.05812
INFO:main:layer 8: 0.23342 0.27415 0.27682 0.22111 0.23234 0.79778 0.03235 0.09092 0.40418 0.01651 0.21795 0.22528
INFO:main:layer 9: 0.01306 0.88878 0.08858 0.45180 0.04019 0.08035 0.13417 0.15899 0.39753 0.01761 0.10785 0.01428
INFO:main:layer 10: 0.01597 0.01365 0.08691 0.04718 0.01268 0.32052 0.00453 0.05614 0.81534 0.00000 0.02659 0.66734
INFO:main:layer 11: 0.86446 0.00818 0.05306 0.12751 0.13587 0.00293 0.06480 0.22173 0.21643 0.04838 0.48050 0.32190
INFO:main:layer 12: 0.08048 0.32489 0.56753 0.28201 0.37204 0.09334 0.26549 0.07130 0.00372 0.53481 0.24909 0.36108
INFO:main:Head ranked by importance scores
INFO:main:lv, h > 1 2 3 4 5 6 7 8 9 10 11 12
INFO:main:layer 1: 41 108 102 123 81 0 119 124 62 100 72 114
INFO:main:layer 2: 116 129 70 121 61 79 8 28 17 25 89 35
INFO:main:layer 3: 107 99 13 22 27 77 46 29 49 76 20 128
INFO:main:layer 4: 6 78 44 130 71 105 33 21 138 88 53 47
INFO:main:layer 5: 112 10 43 9 55 87 31 111 73 34 115 126
INFO:main:layer 6: 57 104 18 64 23 14 103 67 120 84 75 11
INFO:main:layer 7: 36 86 40 91 45 74 50 42 5 92 58 109
INFO:main:layer 8: 59 52 51 66 60 4 125 94 24 132 68 63
INFO:main:layer 9: 136 1 95 19 122 98 83 80 26 131 90 134
INFO:main:layer 10: 133 135 96 118 137 39 140 110 3 143 127 7
INFO:main:layer 11: 2 139 113 85 82 142 106 65 69 117 16 38
INFO:main:layer 12: 97 37 12 48 30 93 54 101 141 15 56 32
INFO:main:Pruning: original score: 0.091866, threshold: 0.082679
INFO:main:Heads to mask: [117, 125, 140, 114, 121, 44, 112, 96, 109, 107, 108, 93, 105, 39]
INFO:main:lv, h > 1 2 3 4 5 6 7 8 9 10 11 12
INFO:main:layer 1: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
INFO:main:layer 2: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
INFO:main:layer 3: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
INFO:main:layer 4: 1.00000 1.00000 1.00000 0.00000 1.00000 1.00000 1.00000 1.00000 0.00000 1.00000 1.00000 1.00000
INFO:main:layer 5: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
INFO:main:layer 6: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
INFO:main:layer 7: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
INFO:main:layer 8: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 0.00000 1.00000 1.00000
INFO:main:layer 9: 0.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 0.00000 1.00000 0.00000
INFO:main:layer 10: 0.00000 0.00000 1.00000 1.00000 0.00000 1.00000 0.00000 1.00000 1.00000 0.00000 1.00000 1.00000
INFO:main:layer 11: 1.00000 0.00000 1.00000 1.00000 1.00000 0.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
INFO:main:layer 12: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 0.00000 1.00000 1.00000 1.00000
Iteration: 100%|██████████| 4561/4561 [01:54<00:00, 39.68it/s]
INFO:main:Attention entropies
INFO:main:lv, h > 1 2 3 4 5 6 7 8 9 10 11 12
INFO:main:layer 1: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 2: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 3: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 4: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 5: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 6: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 7: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 8: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 9: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 10: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 11: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:layer 12: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
INFO:main:Head importance scores
INFO:main:lv, h > 1 2 3 4 5 6 7 8 9 10 11 12
INFO:main:layer 1: 0.39574 0.02468 0.08140 0.12466 0.12766 1.00000 0.09364 0.03733 0.21125 0.05515 0.27669 0.01294
INFO:main:layer 2: 0.04871 0.01492 0.02147 0.00766 0.25008 0.16705 0.73248 0.48019 0.40388 0.39327 0.06609 0.38033
INFO:main:layer 3: 0.14803 0.02220 0.64758 0.29125 0.45867 0.02242 0.34411 0.33109 0.30959 0.29897 0.41782 0.01806
INFO:main:layer 4: 0.82395 0.20768 0.33463 0.05595 0.30457 0.01353 0.38665 0.33563 0.02096 0.12274 0.15990 0.30594
INFO:main:layer 5: 0.24467 0.77639 0.26634 0.43066 0.28580 0.15136 0.29888 0.09479 0.20161 0.38652 0.07106 0.09292
INFO:main:layer 6: 0.24192 0.04294 0.19242 0.18251 0.64465 0.64657 0.03439 0.17273 0.05866 0.20935 0.14715 0.51240
INFO:main:layer 7: 0.24221 0.11722 0.54783 0.09908 0.30887 0.33625 0.18271 0.09798 0.76243 0.19917 0.26639 0.02415
INFO:main:layer 8: 0.34639 0.10483 0.42852 0.23310 0.20756 0.85146 0.05960 0.06187 0.25805 0.12922 0.14193 0.28091
INFO:main:layer 9: 0.02696 0.97099 0.08023 0.36748 0.05116 0.06451 0.07015 0.23535 0.39404 0.14999 0.01570 0.01164
INFO:main:layer 10: 0.02307 0.01918 0.09727 0.05241 0.03105 0.32034 0.02875 0.08710 0.92011 0.00000 0.05011 0.59763
INFO:main:layer 11: 0.81794 0.00753 0.05141 0.14622 0.09715 0.00008 0.03071 0.09413 0.17420 0.05189 0.69652 0.31542
INFO:main:layer 12: 0.09312 0.36467 0.56134 0.25867 0.37935 0.06446 0.34758 0.09796 0.03789 0.56186 0.25654 0.37430
INFO:main:Head ranked by importance scores
INFO:main:lv, h > 1 2 3 4 5 6 7 8 9 10 11 12
INFO:main:layer 1: 24 126 100 85 84 0 96 120 64 111 52 138
INFO:main:layer 2: 117 136 131 140 58 75 8 18 23 26 104 29
INFO:main:layer 3: 79 130 10 49 19 129 36 40 43 47 22 134
INFO:main:layer 4: 4 66 39 110 46 137 27 38 132 86 76 45
INFO:main:layer 5: 59 6 54 20 50 77 48 94 68 28 102 98
INFO:main:layer 6: 61 118 70 72 12 11 121 74 109 65 80 17
INFO:main:layer 7: 60 87 16 89 44 37 71 90 7 69 53 127
INFO:main:layer 8: 35 88 21 63 67 3 108 107 56 83 82 51
INFO:main:layer 9: 125 1 101 32 115 105 103 62 25 78 135 139
INFO:main:layer 10: 128 133 92 112 122 41 124 99 2 143 116 13
INFO:main:layer 11: 5 141 114 81 93 142 123 95 73 113 9 42
INFO:main:layer 12: 97 33 15 55 30 106 34 91 119 14 57 31
INFO:main:Masking: current score: 0.092085, remaning heads 130 (90.3 percents)
INFO:main:Heads to mask: [15, 11, 41, 13, 106, 35, 14, 25, 29, 83, 1, 126, 66, 7]
INFO:main:lv, h > 1 2 3 4 5 6 7 8 9 10 11 12
INFO:main:layer 1: 1.00000 0.00000 1.00000 1.00000 1.00000 1.00000 1.00000 0.00000 1.00000 1.00000 1.00000 0.00000
INFO:main:layer 2: 1.00000 0.00000 0.00000 0.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
INFO:main:layer 3: 1.00000 0.00000 1.00000 1.00000 1.00000 0.00000 1.00000 1.00000 1.00000 1.00000 1.00000 0.00000
INFO:main:layer 4: 1.00000 1.00000 1.00000 0.00000 1.00000 0.00000 1.00000 1.00000 0.00000 1.00000 1.00000 1.00000
INFO:main:layer 5: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
INFO:main:layer 6: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 0.00000 1.00000 1.00000 1.00000 1.00000 1.00000
INFO:main:layer 7: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 0.00000
INFO:main:layer 8: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 0.00000 1.00000 1.00000
INFO:main:layer 9: 0.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 0.00000 0.00000 0.00000
INFO:main:layer 10: 0.00000 0.00000 1.00000 1.00000 0.00000 1.00000 0.00000 1.00000 1.00000 0.00000 1.00000 1.00000
INFO:main:layer 11: 1.00000 0.00000 1.00000 1.00000 1.00000 0.00000 0.00000 1.00000 1.00000 1.00000 1.00000 1.00000
INFO:main:layer 12: 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 0.00000 1.00000 1.00000 1.00000
Iteration: 0%| | 0/4561 [00:00<?, ?it/s]
Traceback (most recent call last):
File "run_bertology.py", line 426, in <module>
main()
File "run_bertology.py", line 421, in main
head_mask = mask_heads(args, model, eval_dataloader)
File "run_bertology.py", line 179, in mask_heads
args, model, eval_dataloader, compute_entropy=False, head_mask=new_head_mask
File "run_bertology.py", line 104, in compute_heads_importance
head_importance += head_mask.grad.abs().detach()
AttributeError: 'NoneType' object has no attribute 'abs'
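The traceback shows compute_heads_importance reading head_mask.grad after backward(); since the mask built for the second round is no longer a leaf tensor, its .grad is None. One possible workaround (a sketch under that assumption, not the upstream fix) is to re-create the mask as a leaf before handing it to compute_heads_importance; prepare_head_mask below is a hypothetical helper name.

```python
import torch

def prepare_head_mask(new_head_mask):
    """Return a leaf copy of the mask so loss.backward() populates .grad.

    Hypothetical helper: new_head_mask stands in for the tensor built in
    mask_heads() before it is passed to compute_heads_importance().
    """
    return new_head_mask.clone().detach().requires_grad_(True)

# Simulate the failing second round: a mask derived from another tensor.
base = torch.ones(12, 12)
derived = base * 1.0                       # non-leaf in the real script
head_mask = prepare_head_mask(derived)

loss = (head_mask * 2.0).sum()
loss.backward()
importance = head_mask.grad.abs().detach()  # no AttributeError now
print(importance.sum().item())              # -> 288.0
```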
Environment info
transformers version: 2.8.0