A degraded EC read allocates and registers an extra buffer to recover data, which may cause ENOMEM in some cases. Although this workaround does not prevent dynamic buffer allocation and registration, it does provide relatively precise control over the resources consumed by degraded EC reads.
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
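A minimal sketch of the accounting this enables, assuming a per-engine byte budget for recovery buffers. The names `ec_recov_budget`, `ec_recov_buf_alloc()` and friends are hypothetical, not DAOS APIs. Buffer registration is left out, and no lock is taken on the assumption that all callers are ULTs on the same execution stream.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical budget for buffers allocated by degraded EC reads.
 * Invariant: inuse <= limit. No lock: all callers are assumed to be
 * ULTs on the same execution stream. */
struct ec_recov_budget {
	size_t limit; /* max bytes of recovery buffers at any time */
	size_t inuse; /* bytes currently reserved */
};

/* Reserve @size bytes; false means the budget is exhausted. */
static bool
ec_recov_reserve(struct ec_recov_budget *b, size_t size)
{
	if (size > b->limit - b->inuse)
		return false;
	b->inuse += size;
	return true;
}

static void
ec_recov_release(struct ec_recov_budget *b, size_t size)
{
	assert(b->inuse >= size);
	b->inuse -= size;
}

/* Allocate a recovery buffer only after its size fits in the budget.
 * Returns NULL if over budget (caller should wait and retry) or if
 * malloc() fails; in both cases nothing stays reserved. */
static void *
ec_recov_buf_alloc(struct ec_recov_budget *b, size_t size)
{
	void *buf;

	if (!ec_recov_reserve(b, size))
		return NULL;
	buf = malloc(size);
	if (buf == NULL)
		ec_recov_release(b, size);
	return buf;
}

static void
ec_recov_buf_free(struct ec_recov_budget *b, void *buf, size_t size)
{
	free(buf);
	ec_recov_release(b, size);
}
```

Allocation stays dynamic, but the total size of outstanding recovery buffers can never exceed `limit`.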
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
For data migration, after being woken up, a ULT should try to wake up another ULT if resources are still available.
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
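A simplified single-stream model of this chained wakeup, assuming every blocked ULT needs exactly one unit. The releaser signals only one waiter; each woken ULT takes its unit and, if units remain, wakes the next one. All names are hypothetical, and `chain_schedule()` merely stands in for the Argobots scheduler.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_ULTS 4

/* Hypothetical model: one execution stream, every blocked ULT needs
 * exactly one unit of the migration resource. */
struct chain_pool {
	int  avail;            /* free units */
	bool blocked[NR_ULTS]; /* ULT is waiting for a unit */
	bool woken[NR_ULTS];   /* ULT has been signalled but not run yet */
};

/* Signal the first blocked ULT that is not already woken. */
static void
chain_wake_one(struct chain_pool *p)
{
	for (int i = 0; i < NR_ULTS; i++) {
		if (p->blocked[i] && !p->woken[i]) {
			p->woken[i] = true;
			return;
		}
	}
}

/* The releaser returns units but signals only one waiter. */
static void
chain_release(struct chain_pool *p, int units)
{
	p->avail += units;
	chain_wake_one(p);
}

/* What a woken ULT does when it runs: take a unit if one is left and,
 * if more remain, wake another ULT. Returns true if it got a unit. */
static bool
chain_ult_resume(struct chain_pool *p, int id)
{
	assert(p->blocked[id]);  /* only blocked ULTs are ever woken */
	p->woken[id] = false;
	if (p->avail == 0)
		return false;        /* still blocked */
	p->avail--;
	p->blocked[id] = false;
	if (p->avail > 0)
		chain_wake_one(p);   /* pass the wakeup on */
	return true;
}

/* Stand-in for the scheduler: run woken ULTs until none is left.
 * Returns how many ULTs obtained a unit. */
static int
chain_schedule(struct chain_pool *p)
{
	int  ran = 0;
	bool again = true;

	while (again) {
		again = false;
		for (int i = 0; i < NR_ULTS; i++) {
			if (!p->woken[i])
				continue;
			if (chain_ult_resume(p, i))
				ran++;
			again = true;
		}
	}
	return ran;
}
```

With four blocked ULTs, releasing 3 units lets three of them run; without the wakeup in `chain_ult_resume()`, only the first one would.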
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
- Add a resource bucket so that overall resource consumption does not grow on systems configured with more targets
- Track demanded resources and a wait queue for blocked ULTs, and wake up as many waiters as the released resources allow
- Code cleanup
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
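A single-threaded sketch of such a bucket: queued waiters record their demand in a FIFO, and a release grants as many of them, in order, as the freed units allow. The layout, the fixed-size queue and all function names are assumptions; a real ULT would block on a condition variable after `bkt_acquire()` returns false, and no single demand is assumed to exceed the bucket limit.

```c
#include <assert.h>
#include <stdbool.h>

#define BKT_MAX_WAITERS 16

/* A blocked ULT, identified by @id, waiting for @demand units. */
struct bkt_waiter {
	int  id;
	long demand;
};

/* Hypothetical resource bucket shared by several targets, so the total
 * limit does not scale with the number of targets. */
struct res_bucket {
	long              limit;    /* units shared by all targets */
	long              used;     /* units currently granted */
	long              demanded; /* total demand of queued waiters */
	struct bkt_waiter wq[BKT_MAX_WAITERS]; /* circular FIFO */
	int               head;
	int               len;
};

/* Grant @n units to ULT @id, or queue it and return false, after which
 * the ULT would sleep until bkt_release() reports it as woken. */
static bool
bkt_acquire(struct res_bucket *b, int id, long n)
{
	assert(n > 0 && n <= b->limit);
	/* Only take units directly if nobody is queued (FIFO fairness). */
	if (b->len == 0 && b->used + n <= b->limit) {
		b->used += n;
		return true;
	}
	assert(b->len < BKT_MAX_WAITERS);
	b->wq[(b->head + b->len) % BKT_MAX_WAITERS] = (struct bkt_waiter){ id, n };
	b->len++;
	b->demanded += n;
	return false;
}

/* Return @n units, then grant queued waiters in order for as long as the
 * head's demand fits. Woken ids go to @woken; returns how many. */
static int
bkt_release(struct res_bucket *b, long n, int *woken, int max_woken)
{
	int nr = 0;

	assert(n > 0 && b->used >= n);
	b->used -= n;
	while (b->len > 0 && nr < max_woken) {
		struct bkt_waiter *w = &b->wq[b->head];

		if (b->used + w->demand > b->limit)
			break; /* head does not fit yet */
		b->used += w->demand;
		b->demanded -= w->demand;
		woken[nr++] = w->id;
		b->head = (b->head + 1) % BKT_MAX_WAITERS;
		b->len--;
	}
	return nr;
}
```

Keeping strict FIFO order means a small request never overtakes a queued larger one. This avoids starving large requests at the cost of leaving some units idle until the head's demand fits.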
Increase the default resource limit.
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Fix a reference leak in migrate_fini_one_ult() Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Errors are:
- Component not formatted correctly
- Ticket number prefix incorrect
- PR title is malformatted. See https://daosio.atlassian.net/wiki/spaces/DC/pages/11133911069/Commit+Comments
- Unable to load ticket data
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17670/1/testReport/
Signed-off-by: Wang Shilong <shilong.wang@hpe.com>
Force-pushed from 51e9635 to c22d71c
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17670/2/testReport/
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Force-pushed from c22d71c to 839040f
Test stage Functional on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17670/2/execution/node/1280/log
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17670/4/testReport/
- Hulk data handling is no longer required; it has been replaced by the starveling mechanism
- Remove the "yield" and simplify the code
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
If a rebuild hang is detected, dump the resource bucket information and the head of the waiter queue.
Signed-off-by: Wang Shilong <shilong.wang@hpe.com>
Signed-off-by: Wang Shilong <shilong.wang@hpe.com>
Force-pushed from b337427 to 6b707a3
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17670/5/testReport/
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
DAOS currently reserves only RF × group_size targets for a GX object, whereas it should reserve targets_per_domain × RF. In addition, when the number of reserved targets is a fixed value, the likelihood of losing those targets grows significantly as the cluster scales, which can cause collocated shards and extra data movement during rebuild. This patch changes the number of reserved targets for a GX object to be no less than 30% of all targets.
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
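The sizing rule can be sketched as the larger of the two bounds, capped at the pool size. Combining the bounds with a max, rounding the 30% floor up, and capping at `total_tgts` are assumptions about how the patch applies the rule; `gx_reserved_targets()` is a hypothetical helper, not the DAOS placement code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of spare targets to reserve for a GX object. */
static uint32_t
gx_reserved_targets(uint32_t total_tgts, uint32_t tgts_per_dom, uint32_t rf)
{
	assert(total_tgts > 0);
	/* Enough spares to absorb RF whole-domain failures. */
	uint64_t by_rf  = (uint64_t)tgts_per_dom * rf;
	/* At least 30% of all targets (rounded up), so the spare pool
	 * grows with the cluster instead of staying fixed. */
	uint64_t by_pct = ((uint64_t)total_tgts * 30 + 99) / 100;
	uint64_t nr     = by_rf > by_pct ? by_rf : by_pct;

	/* Never reserve more targets than the pool has. */
	return nr < total_tgts ? (uint32_t)nr : total_tgts;
}
```

For example, `gx_reserved_targets(1000, 16, 2)` returns 300, because the 30% floor dominates 16 × 2 = 32.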
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
- Remove _get_target and _get_dom; these functions are not required. The DAOS pool_map guarantees that an extension always appends new children after the existing ones, and dom_avail_children (formerly get_num_domains) returns only the number of valid sub-domains/targets, so there is no need to traverse them, which is very expensive for a large object.
- Instead of checking bit by bit, tgt_isset_range and dom_isset_range can skip an entire byte of the bitmap if it is 0xFF.
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
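A sketch of the byte-skipping check, assuming the function answers "are all bits in [start, end] set" and that bit i is stored at mask `1 << (i % 8)` of byte `i / 8`. `bits_isset_range()` is a hypothetical stand-in for tgt_isset_range/dom_isset_range, whose exact semantics may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if every bit in [start, end] (inclusive)
 * is set. Bit i lives in byte i / 8 at mask 1 << (i % 8). */
static bool
bits_isset_range(const uint8_t *bmap, uint32_t start, uint32_t end)
{
	uint32_t i = start;

	assert(start <= end);
	while (i <= end) {
		/* Byte-aligned and fully inside the range: test 8 bits at once. */
		if ((i & 7) == 0 && end - i >= 7) {
			if (bmap[i >> 3] != 0xFF)
				return false;
			i += 8;
			continue;
		}
		/* Partial byte at either edge of the range: test one bit. */
		if (!(bmap[i >> 3] & (1u << (i & 7))))
			return false;
		i++;
	}
	return true;
}
```

Only byte-aligned, fully covered bytes take the fast path; the partial bytes at either end of the range are still checked bit by bit.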
Skip the big logging loop if DB_PL is not enabled; otherwise it could loop 10K+ times for nothing on a large system. pool_map_find_domain() can simply return the first domain if the domain type is ROOT.
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
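A sketch of the logging change: test the debug mask once before the loop so that a disabled stream costs nothing. `DBG_PL`, `dbg_mask` and `layout_dump()` are hypothetical stand-ins for DB_PL and the DAOS logging macros. The ROOT shortcut in pool_map_find_domain() is not shown.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical debug flag standing in for DB_PL. */
#define DBG_PL 0x1u

static unsigned int  dbg_mask;  /* currently enabled debug streams */
static unsigned long dbg_iters; /* loop iterations spent on logging */

/* Dump the per-shard layout only when the placement debug stream is on.
 * Testing the mask once up front avoids walking @nr entries (10K+ on a
 * large system) just to have every log call drop its output. */
static void
layout_dump(const unsigned int *shard_tgts, unsigned int nr)
{
	assert(nr == 0 || shard_tgts != NULL);
	if (!(dbg_mask & DBG_PL))
		return;

	for (unsigned int i = 0; i < nr; i++) {
		dbg_iters++;
		fprintf(stderr, "shard %u -> target %u\n", i, shard_tgts[i]);
	}
}
```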
Force-pushed from 6b707a3 to 46b871a
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17670/6/testReport/