
Commit d53c372

Joonsoo Kim authored and committed
zram: implement deduplication in zram
This patch implements a deduplication feature in zram. The purpose of this work is, naturally, to reduce the amount of memory used by zram. Android is one of the biggest users of zram as swap, and reducing its memory usage really matters. A paper reports that the duplication ratio of Android's memory contents is rather high [1], and similar work on zswap reports that experiments have shown around 10-15% of the pages stored in zswap are duplicates, and that deduplicating them provides some benefit [2].

There is also a different kind of workload that uses zram as a block device and stores build outputs on it to reduce wear-out of a real block device. In this workload the deduplication hit rate is very high because of temporary files and intermediate object files; a detailed analysis is at the bottom of this description.

In any case, if we can detect duplicated content and avoid storing it in separate memory, we can save memory. This patch tries to do that.

The implementation is fairly simple and intuitive, but one implementation detail should be noted. To check for duplication, this patch uses a checksum of the page, and collisions of this checksum are possible. There are many ways to handle this situation, but this patch chooses to allow entries with a duplicated checksum to be added to the hash, while not comparing all entries with that checksum when checking for a duplicate. I expect checksum collisions to be quite rare, so we do not need to pay much attention to that case; therefore I chose the simplest way to implement the feature. If there is a different opinion, I can accept it and go that way.

The following are the results with this patch.

Test result #1 (Swap): Android Marshmallow, emulator, x86_64, backported to kernel v3.18

orig_data_size: 145297408
compr_data_size: 32408125
mem_used_total: 32276480
dup_data_size: 3188134
meta_data_size: 1444272

The last two metrics added to mm_stat are related to this work. The first, dup_data_size, is the amount of memory saved by not storing duplicated pages. The second, meta_data_size, is the amount of memory used by the data structures that support deduplication. If dup > meta, we can judge that the patch improves memory usage. On Android, this work saves about 5% of memory usage.

Test result #2 (Blockdev): build the kernel and store the output to an ext4 FS on zram

<no-dedup>
Elapsed time: 249 s
mm_stat: 430845952 191014886 196898816 0 196898816 28320 0 0 0

<dedup>
Elapsed time: 250 s
mm_stat: 430505984 190971334 148365312 0 148365312 28404 0 47287038 3945792

There is no performance degradation, and 23% of memory is saved.

Test result #3 (Blockdev): copy the Android build output dir (out/host) to an ext4 FS on zram

<no-dedup>
Elapsed time: out/host: 88 s
mm_stat: 8834420736 3658184579 3834208256 0 3834208256 32889 0 0 0

<dedup>
Elapsed time: out/host: 100 s
mm_stat: 8832929792 3657329322 2832015360 0 2832015360 32609 0 952568877 80880336

This shows roughly 13% performance degradation and saves 24% of memory. The slowdown is probably due to the overhead of calculating checksums and doing comparisons.

Test result #4 (Blockdev): copy the Android build output dir (out/target/common) to an ext4 FS on zram

<no-dedup>
Elapsed time: out/host: 203 s
mm_stat: 4041678848 2310355010 2346577920 0 2346582016 500 4 0 0

<dedup>
Elapsed time: out/host: 201 s
mm_stat: 4041666560 2310488276 1338150912 0 1338150912 476 0 989088794 24564336

Memory is saved by 42% and performance is the same.
Even though there is overhead from calculating checksums and doing comparisons, a large hit ratio compensates for it, since a hit means one less compression attempt. I checked in detail where the savings in the kernel build workload come from, and there are several cases where deduplication happens.

1) *.cmd
Build commands are usually similar within one directory, so the contents of these files are very similar. On my system, more than 789 lines of fs/ext4/.namei.o.cmd and fs/ext4/.inode.o.cmd are identical, out of 944 and 938 lines in those files, respectively.

2) intermediate object files
built-in.o and temporary object files have similar contents. More than 50% of fs/ext4/ext4.o is identical to fs/ext4/built-in.o.

3) vmlinux
.tmp_vmlinux1, .tmp_vmlinux2 and arch/x86/boot/compressed/vmlinux.bin have similar contents.

The Android test has a similar case: some object files (.class and .so) are similar to one another (e.g. ./host/linux-x86/lib/libartd.so and ./host/linux-x86/lib/libartd-compiler.so).

Anyway, the benefit seems to depend largely on the workload, so a following patch will make this feature optional. However, the feature can help some use cases, so it deserves to be merged.

[1]: MemScope: Analyzing Memory Duplication on Android Systems, dl.acm.org/citation.cfm?id=2797023
[2]: zswap: Optimize compressed pool memory utilization, lkml.kernel.org/r/1341407574.7551.1471584870761.JavaMail.weblogic@epwas3p2

Change-Id: I8fe80c956c33f88a6af337d50d9e210e5c35ce37
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Link: https://lore.kernel.org/patchwork/patch/787162/
Patch-mainline: linux-kernel@ Thu, 11 May 2017 22:30:26
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
Signed-off-by: Marco Zanin <mrczn.bb@gmail.com>
Signed-off-by: snnbyyds <snnbyyds@gmail.com>
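As a rough cross-check of the "save 5%" claim, using only the Test #1 numbers above (this interpretation, net benefit relative to mem_used_total, is an assumption, not part of the original message):

    dup_data_size - meta_data_size = 3188134 - 1444272 = 1743862 bytes (~1.7 MB net savings)
    1743862 / 32276480 (mem_used_total) ≈ 5.4%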
1 parent dffd1d9 commit d53c372

6 files changed: 287 additions (+), 9 deletions (-)

Documentation/blockdev/zram.txt

Lines changed: 5 additions & 1 deletion
@@ -224,6 +224,9 @@ line of text and contains the following stats separated by whitespace:
  pages_compacted   the number of pages freed during compaction
  huge_pages        the number of incompressible pages
 
+ dup_data_size     deduplicated data size
+ meta_data_size    the amount of metadata allocated for deduplication feature
+
 File /sys/block/zram<id>/bd_stat
 
 The stat file represents device's backing device statistics. It consists of
@@ -253,6 +256,7 @@ a single line of text and contains the following stats separated by whitespace:
 = writeback
 
 With CONFIG_ZRAM_WRITEBACK, zram can write idle/incompressible page
+
 to backing storage rather than keeping it in memory.
 To use the feature, admin should set up backing device via
 
@@ -352,4 +356,4 @@ storage. It's a debugging feature so anyone shouldn't rely on it to work
 properly.
 
 Nitin Gupta
-ngupta@vflare.org
+ngupta@vflare.org
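For illustration, the two new fields appear as the last two columns of /sys/block/zram<id>/mm_stat. The example line below simply reuses the <dedup> output from Test #2 in the commit message, where 47287038 is dup_data_size and 3945792 is meta_data_size:

    430505984 190971334 148365312 0 148365312 28404 0 47287038 3945792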

drivers/block/zram/Makefile

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
-zram-y := zcomp.o zram_drv.o
+zram-y := zcomp.o zram_drv.o zram_dedup.o
 
 obj-$(CONFIG_ZRAM) += zram.o

drivers/block/zram/zram_dedup.c

Lines changed: 204 additions & 0 deletions
@@ -0,0 +1,204 @@
+/*
+ * Copyright (C) 2017 Joonsoo Kim.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/vmalloc.h>
+#include <linux/jhash.h>
+#include <linux/highmem.h>
+
+#include "zram_drv.h"
+
+/* One slot will contain 128 pages theoretically */
+#define ZRAM_HASH_SHIFT         7
+#define ZRAM_HASH_SIZE_MIN      (1 << 10)
+#define ZRAM_HASH_SIZE_MAX      (1 << 31)
+
+u64 zram_dedup_dup_size(struct zram *zram)
+{
+        return (u64)atomic64_read(&zram->stats.dup_data_size);
+}
+
+u64 zram_dedup_meta_size(struct zram *zram)
+{
+        return (u64)atomic64_read(&zram->stats.meta_data_size);
+}
+
+static u32 zram_dedup_checksum(unsigned char *mem)
+{
+        return jhash(mem, PAGE_SIZE, 0);
+}
+
+void zram_dedup_insert(struct zram *zram, struct zram_entry *new,
+                                u32 checksum)
+{
+        struct zram_hash *hash;
+        struct rb_root *rb_root;
+        struct rb_node **rb_node, *parent = NULL;
+        struct zram_entry *entry;
+
+        new->checksum = checksum;
+        hash = &zram->hash[checksum % zram->hash_size];
+        rb_root = &hash->rb_root;
+
+        spin_lock(&hash->lock);
+        rb_node = &rb_root->rb_node;
+        while (*rb_node) {
+                parent = *rb_node;
+                entry = rb_entry(parent, struct zram_entry, rb_node);
+                if (checksum < entry->checksum)
+                        rb_node = &parent->rb_left;
+                else if (checksum > entry->checksum)
+                        rb_node = &parent->rb_right;
+                else
+                        rb_node = &parent->rb_left;
+        }
+
+        rb_link_node(&new->rb_node, parent, rb_node);
+        rb_insert_color(&new->rb_node, rb_root);
+        spin_unlock(&hash->lock);
+}
+
+static bool zram_dedup_match(struct zram *zram, struct zram_entry *entry,
+                                unsigned char *mem)
+{
+        bool match = false;
+        unsigned char *cmem;
+        struct zcomp_strm *zstrm;
+
+        cmem = zs_map_object(zram->mem_pool, entry->handle, ZS_MM_RO);
+        if (entry->len == PAGE_SIZE) {
+                match = !memcmp(mem, cmem, PAGE_SIZE);
+        } else {
+                zstrm = zcomp_stream_get(zram->comp);
+                if (!zcomp_decompress(zstrm, cmem, entry->len, zstrm->buffer))
+                        match = !memcmp(mem, zstrm->buffer, PAGE_SIZE);
+                zcomp_stream_put(zram->comp);
+        }
+        zs_unmap_object(zram->mem_pool, entry->handle);
+
+        return match;
+}
+
+static unsigned long zram_dedup_put(struct zram *zram,
+                                struct zram_entry *entry)
+{
+        struct zram_hash *hash;
+        u32 checksum;
+
+        checksum = entry->checksum;
+        hash = &zram->hash[checksum % zram->hash_size];
+
+        spin_lock(&hash->lock);
+
+        entry->refcount--;
+        if (!entry->refcount)
+                rb_erase(&entry->rb_node, &hash->rb_root);
+        else
+                atomic64_sub(entry->len, &zram->stats.dup_data_size);
+
+        spin_unlock(&hash->lock);
+
+        return entry->refcount;
+}
+
+static struct zram_entry *zram_dedup_get(struct zram *zram,
+                                unsigned char *mem, u32 checksum)
+{
+        struct zram_hash *hash;
+        struct zram_entry *entry;
+        struct rb_node *rb_node;
+
+        hash = &zram->hash[checksum % zram->hash_size];
+
+        spin_lock(&hash->lock);
+        rb_node = hash->rb_root.rb_node;
+        while (rb_node) {
+                entry = rb_entry(rb_node, struct zram_entry, rb_node);
+                if (checksum == entry->checksum) {
+                        entry->refcount++;
+                        atomic64_add(entry->len, &zram->stats.dup_data_size);
+                        spin_unlock(&hash->lock);
+
+                        if (zram_dedup_match(zram, entry, mem))
+                                return entry;
+
+                        zram_entry_free(zram, entry);
+
+                        return NULL;
+                }
+
+                if (checksum < entry->checksum)
+                        rb_node = rb_node->rb_left;
+                else
+                        rb_node = rb_node->rb_right;
+        }
+        spin_unlock(&hash->lock);
+
+        return NULL;
+}
+
+struct zram_entry *zram_dedup_find(struct zram *zram, struct page *page,
+                                u32 *checksum)
+{
+        void *mem;
+        struct zram_entry *entry;
+
+        mem = kmap_atomic(page);
+        *checksum = zram_dedup_checksum(mem);
+
+        entry = zram_dedup_get(zram, mem, *checksum);
+        kunmap_atomic(mem);
+
+        return entry;
+}
+
+void zram_dedup_init_entry(struct zram *zram, struct zram_entry *entry,
+                                unsigned long handle, unsigned int len)
+{
+        entry->handle = handle;
+        entry->refcount = 1;
+        entry->len = len;
+}
+
+bool zram_dedup_put_entry(struct zram *zram, struct zram_entry *entry)
+{
+        if (zram_dedup_put(zram, entry))
+                return false;
+
+        return true;
+}
+
+int zram_dedup_init(struct zram *zram, size_t num_pages)
+{
+        int i;
+        struct zram_hash *hash;
+
+        zram->hash_size = num_pages >> ZRAM_HASH_SHIFT;
+        zram->hash_size = min_t(size_t, ZRAM_HASH_SIZE_MAX, zram->hash_size);
+        zram->hash_size = max_t(size_t, ZRAM_HASH_SIZE_MIN, zram->hash_size);
+        zram->hash = vzalloc(zram->hash_size * sizeof(struct zram_hash));
+        if (!zram->hash) {
+                pr_err("Error allocating zram entry hash\n");
+                return -ENOMEM;
+        }
+
+        for (i = 0; i < zram->hash_size; i++) {
+                hash = &zram->hash[i];
+                spin_lock_init(&hash->lock);
+                hash->rb_root = RB_ROOT;
+        }
+
+        return 0;
+}
+
+void zram_dedup_fini(struct zram *zram)
+{
+        vfree(zram->hash);
+        zram->hash = NULL;
+        zram->hash_size = 0;
+}
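A quick worked example of the sizing logic in zram_dedup_init() above, assuming 4 KiB pages and a hypothetical 1 GiB zram device (the device size is illustrative only):

    num_pages = 1 GiB / 4 KiB = 262144
    262144 >> ZRAM_HASH_SHIFT (7) = 2048 hash slots, i.e. ~128 pages per slot
    clamped to [ZRAM_HASH_SIZE_MIN (1 << 10), ZRAM_HASH_SIZE_MAX (1 << 31)] -> 2048 slots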

drivers/block/zram/zram_dedup.h

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+#ifndef _ZRAM_DEDUP_H_
+#define _ZRAM_DEDUP_H_
+
+struct zram;
+struct zram_entry;
+
+u64 zram_dedup_dup_size(struct zram *zram);
+u64 zram_dedup_meta_size(struct zram *zram);
+
+void zram_dedup_insert(struct zram *zram, struct zram_entry *new,
+                                u32 checksum);
+struct zram_entry *zram_dedup_find(struct zram *zram, struct page *page,
+                                u32 *checksum);
+
+void zram_dedup_init_entry(struct zram *zram, struct zram_entry *entry,
+                                unsigned long handle, unsigned int len);
+bool zram_dedup_put_entry(struct zram *zram, struct zram_entry *entry);
+
+int zram_dedup_init(struct zram *zram, size_t num_pages);
+void zram_dedup_fini(struct zram *zram);
+
+#endif /* _ZRAM_DEDUP_H_ */
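A condensed sketch of how the write path is expected to consume this API. It mirrors the __zram_bvec_write() and zram_entry_free() hunks in zram_drv.c below, but it is illustrative only and is not part of the patch; compress_and_store() is a hypothetical placeholder for the existing compression/allocation path.

/* Illustrative sketch only -- not part of the patch. */
static int zram_store_page_sketch(struct zram *zram, struct page *page)
{
        struct zram_entry *entry;
        u32 checksum;

        /* Hash the page; on a hit the existing entry is reused (refcount taken). */
        entry = zram_dedup_find(zram, page, &checksum);
        if (entry)
                return 0;       /* dedup hit: no new allocation, no compression */

        /*
         * Miss: compress and allocate as before (hypothetical helper), then
         * publish the new entry so later writes can deduplicate against it.
         */
        entry = compress_and_store(zram, page);
        if (!entry)
                return -ENOMEM;
        zram_dedup_insert(zram, entry, checksum);
        return 0;
}

On the free side, zram_entry_free() calls zram_dedup_put_entry() first and only releases the zsmalloc handle once the refcount drops to zero, as the zram_drv.c diff below shows.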

drivers/block/zram/zram_drv.c

Lines changed: 34 additions & 7 deletions
@@ -1084,17 +1084,19 @@ static ssize_t mm_stat_show(struct device *dev,
         max_used = atomic_long_read(&zram->stats.max_used_pages);
 
         ret = scnprintf(buf, PAGE_SIZE,
-                        "%8llu %8llu %8llu %8lu %8ld %8llu %8lu %8llu\n",
+                        "%8llu %8llu %8llu %8lu %8ld %8llu %8lu %8llu %8llu %8llu\n",
                         orig_size << PAGE_SHIFT,
                         (u64)atomic64_read(&zram->stats.compr_data_size),
                         mem_used << PAGE_SHIFT,
                         zram->limit_pages << PAGE_SHIFT,
                         max_used << PAGE_SHIFT,
                         (u64)atomic64_read(&zram->stats.same_pages),
                         atomic_long_read(&pool_stats.pages_compacted),
-                        (u64)atomic64_read(&zram->stats.huge_pages));
-        up_read(&zram->init_lock);
+                        (u64)atomic64_read(&zram->stats.huge_pages),
+                        zram_dedup_dup_size(zram),
+                        zram_dedup_meta_size(zram));
 
+        up_read(&zram->init_lock);
         return ret;
 }
 
@@ -1148,26 +1150,35 @@ static struct zram_entry *zram_entry_alloc(struct zram *zram,
                                         unsigned int len, gfp_t flags)
 {
         struct zram_entry *entry;
+        unsigned long handle;
 
         entry = kzalloc(sizeof(*entry),
                         flags & ~(__GFP_HIGHMEM|__GFP_MOVABLE|__GFP_CMA));
         if (!entry)
                 return NULL;
 
-        entry->handle = zs_malloc(zram->mem_pool, len, flags);
-        if (!entry->handle) {
+        handle = zs_malloc(zram->mem_pool, len, flags);
+        if (!handle) {
                 kfree(entry);
                 return NULL;
         }
 
+        zram_dedup_init_entry(zram, entry, handle, len);
+        atomic64_add(sizeof(*entry), &zram->stats.meta_data_size);
+
         return entry;
 }
 
-static inline void zram_entry_free(struct zram *zram,
-                        struct zram_entry *entry)
+void zram_entry_free(struct zram *zram, struct zram_entry *entry)
+
 {
+        if (!zram_dedup_put_entry(zram, entry))
+                return;
+
         zs_free(zram->mem_pool, entry->handle);
         kfree(entry);
+
+        atomic64_sub(sizeof(*entry), &zram->stats.meta_data_size);
 }
 
 static void zram_meta_free(struct zram *zram, u64 disksize)
@@ -1180,6 +1191,7 @@ static void zram_meta_free(struct zram *zram, u64 disksize)
                 zram_free_page(zram, index);
 
         zs_destroy_pool(zram->mem_pool);
+        zram_dedup_fini(zram);
         vfree(zram->table);
 }
 
@@ -1200,6 +1212,13 @@ static bool zram_meta_alloc(struct zram *zram, u64 disksize)
 
         if (!huge_class_size)
                 huge_class_size = zs_huge_class_size(zram->mem_pool);
+
+        if (zram_dedup_init(zram, num_pages)) {
+                vfree(zram->table);
+                zs_destroy_pool(zram->mem_pool);
+                return false;
+        }
+
         return true;
 }
 
@@ -1362,6 +1381,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
         void *src, *dst, *mem;
         struct zcomp_strm *zstrm;
         struct page *page = bvec->bv_page;
+        u32 checksum;
         unsigned long element = 0;
         enum zram_pageflags flags = 0;
 
@@ -1375,6 +1395,12 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
         }
         kunmap_atomic(mem);
 
+        entry = zram_dedup_find(zram, page, &checksum);
+        if (entry) {
+                comp_len = entry->len;
+                goto out;
+        }
+
 compress_again:
         zstrm = zcomp_stream_get(zram->comp);
         src = kmap_atomic(page);
@@ -1442,6 +1468,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
         zcomp_stream_put(zram->comp);
         zs_unmap_object(zram->mem_pool, entry->handle);
         atomic64_add(comp_len, &zram->stats.compr_data_size);
+        zram_dedup_insert(zram, entry, checksum);
 out:
         /*
          * Free memory associated with this sector