# afa1262c | 24-Feb-2025 | Yanqin Li <[email protected]>
fix(LoadQueueUncache): exhaust the various cases of flush (#4300)
**Bug trigger point:**
The flush occurs during the `s_wait` phase. The entry has already passed the flush trigger condition of `io.uncache.resp.fire`, so no flush is performed. As a result, `needFlushReg` stays in the register until the next new entry's `io.uncache.resp.fire`, at which point that normal entry is wrongly flushed, causing the program to get stuck.
**Bug analysis:** The granularity of flush handling is too coarse.
In the original calculation:
```
val flush = (needFlush && uncacheState === s_idle) || (io.uncache.resp.fire && needFlushReg)
```
flush is only handled for two cases: `s_idle` and non-`s_idle`. This distinction makes the handling of the three remaining non-`s_idle` states very coarse. In fact, each of these three states needs its own handling, depending on when `needFlush` is generated and when `needFlushReg` is held in the register:
1. In the `s_req` state, before the uncache request is sent, the flush can be performed in time, using `needFlush` to prevent the request from being sent.
2. If the request has been sent and the state reaches `s_resp`, to avoid a mismatch between the uncache request and response, the flush can only be performed after the uncache response is received, i.e. flush on `io.uncache.resp.fire` using `needFlush || needFlushReg`.
3. If a flush occurs during the `s_wait` state, the write-back can also be prevented, using `needFlush` to flush in time.
**Bug Fix:**
For better code readability, the `wire` `flush` is now updated alongside the `uncacheState` state machine update. Here `flush` means actually executing the flush, `needFlush` is the signal that triggers a flush, and `needFlushReg` holds the flush signal for delayed processing.
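Below is a minimal Chisel sketch of the per-state handling described above. The state names and the `needFlush`/`needFlushReg` signals follow the description; the surrounding module, its ports, and the omitted state transitions are illustrative assumptions, not the actual XiangShan code.

```scala
import chisel3._
import chisel3.util._

class UncacheFlushSketch extends Module {
  val io = IO(new Bundle {
    val needFlush       = Input(Bool())  // flush request (e.g. redirect) hitting this entry
    val uncacheRespFire = Input(Bool())  // io.uncache.resp.fire in the description
    val flush           = Output(Bool()) // actually perform the flush this cycle
  })

  val s_idle :: s_req :: s_resp :: s_wait :: Nil = Enum(4)
  val uncacheState = RegInit(s_idle)        // state transitions omitted in this sketch
  // Holds a flush that arrived while the uncache request was already in flight.
  val needFlushReg = RegInit(false.B)

  val flush = WireDefault(false.B)
  switch(uncacheState) {
    is(s_idle) { flush := io.needFlush }    // not issued yet: flush immediately
    is(s_req)  { flush := io.needFlush }    // request not sent yet: cancel it in time
    is(s_resp) {                            // request in flight: wait for the response
      when(io.uncacheRespFire) { flush := io.needFlush || needFlushReg }
      when(io.needFlush) { needFlushReg := true.B }
    }
    is(s_wait) { flush := io.needFlush }    // cancel the pending write-back in time
  }
  when(flush) { needFlushReg := false.B }   // the registered flush is consumed
  io.flush := flush
}
```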
# 9e12e8ed | 08-Feb-2025 | cz4e <[email protected]>
style(Bundles): move bundles to Bundles.scala (#4247)
# c590fb32 | 08-Feb-2025 | cz4e <[email protected]>
refactor(MemBlock): move MemBlock.scala from backend to mem (#4221)
# 74050fc0 | 26-Jan-2025 | Yanqin Li <[email protected]>
perf(Uncache): add merge policy when entering (#4154)
# Background
## Problem
How to design a more efficient entry rule for a new load/store request when a load/store with the same address already exists in the `ubuffer`?
* **Old Design**: Always **reject** the new request.
* **New Design**: Consider **merging** requests.
## Merge Scenarios
‼️ For the new request to be merged into the existing one, both need to be `NC`.
1. **New Store Request:**
   1. **Existing Store:** Merge (the new store is younger).
   2. **Existing Load:** Reject.
2. **New Load Request:**
   1. **Existing Load:** Merge (the new load may be younger or older; both are OK to merge).
   2. **Existing Store:** Reject.
# What does this PR do?
## 1. Entry Actions
1. **Allocate** a new entry and mark it `valid`:
   1. When there is no matching address.
2. **Allocate** a new entry and mark it `valid` and `waitSame`:
   1. When there is a matching address, and:
      * The virtual addresses and attributes are the same.
      * The older entry is either selected to issue or already issued.
3. **Merge** into an existing entry:
   1. When there is a matching address, and:
      * The virtual addresses and attributes are the same.
      * The older entry is **not** selected to issue or issued.
4. **Reject** the new request:
   1. When the ubuffer is full.
   2. When there is a matching address, but:
      * The virtual addresses or attributes are **different**.
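A minimal Chisel sketch of this entry decision follows. The signal names (`addrMatch`, `sameVaddrAndAttr`, `olderSelectedOrIssued`, `bufferFull`) are placeholders for the conditions listed above, not the actual ubuffer interface.

```scala
import chisel3._
import chisel3.util._

class UbufferEnterPolicy extends Module {
  val io = IO(new Bundle {
    val bufferFull            = Input(Bool())
    val addrMatch             = Input(Bool())  // an existing entry matches the address
    val sameVaddrAndAttr      = Input(Bool())  // same virtual address and attributes (incl. mergeable mask)
    val olderSelectedOrIssued = Input(Bool())  // the matched older entry is selected to issue or already issued
    val allocValid    = Output(Bool())         // case 1: fresh entry, no match
    val allocWaitSame = Output(Bool())         // case 2: fresh entry that must wait for the matched one
    val mergeInto     = Output(Bool())         // case 3: merge into the matched entry
    val reject        = Output(Bool())         // case 4: cannot enter this cycle
  })

  val mergeable = io.addrMatch && io.sameVaddrAndAttr
  // Follows the listed rules literally: a full buffer rejects in all cases.
  io.reject        := io.bufferFull || (io.addrMatch && !io.sameVaddrAndAttr)
  io.allocValid    := !io.reject && !io.addrMatch
  io.allocWaitSame := !io.reject && mergeable && io.olderSelectedOrIssued
  io.mergeInto     := !io.reject && mergeable && !io.olderSelectedOrIssued
}
```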
**NOTE:** According to the definition in the TL-UL SPEC, the `mask` must be continuous and naturally aligned, and the `addr` must correspond to the mask. Therefore, the "**same attributes**" here introduces a new condition: the merged `mask` must meet the requirements of being continuous and naturally aligned (function `continueAndAlign`). During merging, the block offset of addr must be synchronously updated in `UncacheEntry.update`.
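A minimal Chisel sketch of a mask check like the `continueAndAlign` function named above; this is an illustrative reimplementation of the TL-UL requirement stated here (contiguous set bytes, power-of-two size, naturally aligned start), not the actual XiangShan code.

```scala
import chisel3._
import chisel3.util._

object ContinueAndAlign {
  // True when the (merged) byte mask is contiguous and naturally aligned.
  def apply(mask: UInt): Bool = {
    val size       = PopCount(mask)                         // number of set bytes
    val isPow2     = (size & (size - 1.U)) === 0.U && size =/= 0.U
    val start      = PriorityEncoder(mask)                  // lowest set byte position
    val aligned    = (start & (size - 1.U)) === 0.U         // start is a multiple of size
    // Contiguity: shifting out the low zeros leaves exactly `size` consecutive ones.
    val contiguous = ((mask >> start) + 1.U) === (1.U << size)
    isPow2 && aligned && contiguous
  }
}
```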
## 2. Handshake Mechanism Between `LoadQueueUncache (M)` and `Uncache (S)`
> `mid`: master id
>
> `sid`: slave id
**Old Design:**
- `M` sends a `req` with a **`mid`**.
- `S` receives the `req` and records the **`mid`**.
- `S` sends a `resp` with the **`mid`**.
- `M` receives the `resp` and matches it with the recorded **`mid`**.
**New Design:**
- `M` sends a `req` with a **`mid`**.
- `S` receives the `req` and responds with `{mid, sid}`.
- `M` matches on the **`mid`** and updates its record with the received **`sid`**.
- `S` sends a `resp` with its **`sid`**.
- `M` receives the `resp` and matches it with the recorded **`sid`**.
**Benefit:** The new design allows `S` to merge requests when a new request enters.
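A minimal Chisel sketch of the `M`-side bookkeeping under this new handshake; the bundle fields (`idResp.mid`, `idResp.sid`, the 4-bit id widths) are assumed names for illustration, not the actual interface.

```scala
import chisel3._
import chisel3.util._

class UncacheIdResp extends Bundle {
  val mid = UInt(4.W)  // master id echoed back by S
  val sid = UInt(4.W)  // slave id assigned by S (possibly a merged entry)
}

class MasterEntrySketch extends Module {
  val io = IO(new Bundle {
    val myMid   = Input(UInt(4.W))
    val idResp  = Flipped(Valid(new UncacheIdResp)) // S answers the request with {mid, sid}
    val respSid = Flipped(Valid(UInt(4.W)))         // later data response carries only the sid
    val respHit = Output(Bool())                    // this entry owns the incoming response
  })

  val sidReg   = Reg(UInt(4.W))
  val sidValid = RegInit(false.B)

  // Steps 2/3: match on mid, remember the sid that S chose.
  when(io.idResp.valid && io.idResp.bits.mid === io.myMid) {
    sidReg   := io.idResp.bits.sid
    sidValid := true.B
  }

  // Steps 4/5: the final response is matched on the recorded sid.
  io.respHit := sidValid && io.respSid.valid && io.respSid.bits === sidReg
}
```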
## 3. Forwarding Mechanism
**Old Design:** Each address in the `ubuffer` is **unique**, so forwarding is straightforward based on a match.
**New Design:**
* A single address may have up to two matched entries in the `ubuffer`.
* If there are two matched entries, one must be marked `inflight` and the other `waitSame`. In this case, the forwarded data is the merged data of the two entries, with the `inflight` entry being the older one.
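A minimal Chisel sketch of merging forward data from the two matched entries, assuming per-byte masks and data for the older (`inflight`) and younger (`waitSame`) entries; the port names and 8-byte width are illustrative.

```scala
import chisel3._
import chisel3.util._

class UbufferForwardMerge(bytes: Int = 8) extends Module {
  val io = IO(new Bundle {
    val oldMask = Input(UInt(bytes.W))           // inflight (older) entry byte mask
    val oldData = Input(Vec(bytes, UInt(8.W)))
    val newMask = Input(UInt(bytes.W))           // waitSame (younger) entry byte mask
    val newData = Input(Vec(bytes, UInt(8.W)))
    val fwdMask = Output(UInt(bytes.W))
    val fwdData = Output(Vec(bytes, UInt(8.W)))
  })

  io.fwdMask := io.oldMask | io.newMask
  // Younger (waitSame) bytes override the older (inflight) bytes where both are valid.
  for (i <- 0 until bytes) {
    io.fwdData(i) := Mux(io.newMask(i), io.newData(i), io.oldData(i))
  }
}
```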
## 4. Bug Fixes
1. In the `loadUnit`, `!tlbMiss` cannot be used directly as `tlbHit`, because `!tlbMiss` can still be true when `tlbValid` is false.
2. `Uncache` state machine transition: the state indicating "**able to send requests**" (previously `s_refill_req`, now `s_inflight`) should not be triggered by `reqFire` but rather by `acquireFire`.
<img width="747" alt="image" src="https://github.com/user-attachments/assets/75fbc761-1da8-43d9-a0e6-615cc58cefef" />
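A minimal Chisel sketch of the first fix: the hit signal is only meaningful when the TLB response is valid (the signal names here are assumptions).

```scala
import chisel3._

class TlbHitFix extends Module {
  val io = IO(new Bundle {
    val tlbValid = Input(Bool())  // TLB response is valid this cycle
    val tlbMiss  = Input(Bool())  // miss indication, only meaningful when tlbValid holds
    val tlbHit   = Output(Bool())
  })

  // Buggy form: !tlbMiss alone is also true when the response is not valid yet.
  // Fixed form: qualify it with tlbValid.
  io.tlbHit := io.tlbValid && !io.tlbMiss
}
```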
# Evaluation
- ✅ timing
- ✅ performance
| Type | 4B*1000 | Speedup1-IO | 1B*4096 | Speedup2-IO |
| -------------- | ------- | ----------- | ------- | ----------- |
| IO | 51026 | 1 | 208149 | 1.00 |
| NC | 42343 | 1.21 | 169248 | 1.23 |
| NC+OT | 20379 | 2.50 | 160101 | 1.30 |
| NC+OT+mergeOpt | 16308 | 3.13 | 126369 | 1.65 |
| cache | 1298 | 39.31 | 4410 | 47.20 |
# a035c20d | 02-Jan-2025 | Yanqin Li <[email protected]>
fix(LQUncache): fix a potential deadlock when enqueue (#4096)
**Old design**: When enqueuing, allocation follows the port order ldu0-1, i.e. ldu0 is allocated first.
**Bug scene:** The LQUncacheBuffer is small. The enqueue `robIdx` of ldu0-1 is [57, 56, 55]; [57, 56] can enqueue, but [55] cannot because the buffer is full. 57/56 send the `NC` request after enqueuing. 55 is rolled back. In principle, 57 and 56 need to be flushed, but to keep uncache requests and responses matched, 57 is only flushed when its uncache response arrives. So when the same sequence [57, 56, 55] comes again, there is still no space to allocate 55, and it is rolled back again. Thus a deadlock emerges. This bug is triggered after cutting `LoadUncacheBufferSize` from 20 to 4.
**One way to fix it**: When enqueuing, allocate in `robIdx` order, i.e. the oldest is allocated first.
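A minimal Chisel sketch of age-ordered enqueue selection, picking the older valid request first by comparing `robIdx`; the two-port arbitration, the simplified circular-pointer bundle, and the signal names are illustrative, not the actual LoadQueueUncache code.

```scala
import chisel3._
import chisel3.util._

// Simplified ROB pointer: wrap flag + index, like a circular queue pointer.
class RobPtrLite(idxW: Int = 6) extends Bundle {
  val flag  = Bool()
  val value = UInt(idxW.W)
}

object IsOlder {
  // a is older (earlier in program order) than b for a circular pointer.
  def apply(a: RobPtrLite, b: RobPtrLite): Bool =
    Mux(a.flag === b.flag, a.value < b.value, a.value > b.value)
}

class UncacheEnqSelect extends Module {
  val io = IO(new Bundle {
    val valid  = Input(Vec(2, Bool()))            // enqueue requests from ldu0 / ldu1
    val robIdx = Input(Vec(2, new RobPtrLite))
    val grant  = Output(Vec(2, Bool()))           // which port claims a free entry first
  })

  val bothValid = io.valid(0) && io.valid(1)
  val zeroOlder = IsOlder(io.robIdx(0), io.robIdx(1))
  // Old design: fixed priority ldu0 > ldu1. New design: the older robIdx wins,
  // so the oldest load can always claim the last free entry and rollback cannot repeat forever.
  io.grant(0) := io.valid(0) && (!bothValid || zeroOlder)
  io.grant(1) := io.valid(1) && (!bothValid || !zeroOlder)
}
```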
# 519244c7 | 25-Dec-2024 | Yanqin Li <[email protected]>
submodule(CoupledL2, OpenLLC): support pbmt in CHI scene (#4071)
* L1: deliver the NC and PMA signals of uncacheReq to L2
* L2: [support Svpbmt on CHI MemAttr](https://github.com/OpenXiangShan/CoupledL2/pull/273)
* LLC: [Non-cache requests are forwarded directly downstream without entering the slice](https://github.com/OpenXiangShan/OpenLLC/pull/28)
# 54b55f34 | 24-Dec-2024 | Yanqin Li <[email protected]>
fix(LQUncache): consider offset when allocating (#4080)
**Bug scene:**
When the valid vector of ldu0-2 is [0, 0, 1] and the freelist can only allocate one entry (so the `canAllocate` vector is [1, 0, 0]), ldu2's request cannot be allocated and is then rolled back. This is because the allocation did not take the valid offset into account.
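A minimal Chisel sketch of offset-aware allocation: free entries are handed to the *valid* enqueue ports in compressed order rather than by fixed port index. The port count and signal names are illustrative assumptions.

```scala
import chisel3._
import chisel3.util._

class FreelistAllocOffset(numPorts: Int = 3) extends Module {
  val io = IO(new Bundle {
    val enqValid    = Input(Vec(numPorts, Bool()))   // enqueue requests from ldu0..ldu2
    val canAllocate = Input(Vec(numPorts, Bool()))   // freelist can hand out the i-th entry this cycle
    val accept      = Output(Vec(numPorts, Bool()))  // request i gets an entry
  })

  for (i <- 0 until numPorts) {
    // Offset = number of valid requests on lower-indexed ports, so a lone valid
    // request on ldu2 uses canAllocate(0) instead of canAllocate(2).
    val offset = if (i == 0) 0.U else PopCount(io.enqValid.take(i))
    io.accept(i) := io.enqValid(i) && io.canAllocate(offset)
  }
}
```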
# 8b33cd30 | 13-Dec-2024 | klin02 <[email protected]>
feat(XSLog): move all XSLog outside WhenContext for collection
Since data inside a WhenContext is not accessible from another module, to support XSLog collection we move all XSLog calls and related signals outside the WhenContext. For example, `when(cond1){XSDebug(cond2, pable)}` becomes `XSDebug(cond1 && cond2, pable)`.
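A minimal Chisel-style sketch of the transformation; `XSDebug` is modeled here as a stand-in helper taking a condition and a `Printable`, not the real XSLog implementation.

```scala
import chisel3._

object XSDebug {
  // Stand-in for the real XSLog helper: print when cond holds.
  def apply(cond: Bool, pable: Printable): Unit = when(cond) { printf(pable) }
}

class LogOutsideWhen extends Module {
  val io = IO(new Bundle {
    val cond1 = Input(Bool())
    val cond2 = Input(Bool())
    val data  = Input(UInt(8.W))
  })

  // Before: the log lives inside a WhenContext, so its condition is not
  // visible for collection from another module.
  // when(io.cond1) { XSDebug(io.cond2, p"data=${io.data}\n") }

  // After: hoist the surrounding condition into the log condition itself.
  XSDebug(io.cond1 && io.cond2, p"data=${io.data}\n")
}
```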
# e10e20c6 | 27-Nov-2024 | Yanqin Li <[email protected]>
style(pbmt): remove the useless and standardize code
* style(pbmt): remove outstanding constant which is just for self-test
* fix(uncache): added mask comparison for `addrMatch`
* style(mem): code normalization
* fix(pbmt): handle cases where the load unit is byte, word, etc
* style(uncache): fix an import
* fix(uncache): address match should use the non-offset address when forwarding (see the sketch after this list)
In this case, to ensure correct forwarding, stores with the same address but overlapping masks cannot be entered at the same time.
* style(RAR): remove redundant design of `nc` reg
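A minimal Chisel sketch of the address match described in the two fix items above: compare addresses with the low offset bits stripped and additionally require the byte masks to overlap. The 8-byte granularity and the names are assumptions for illustration.

```scala
import chisel3._
import chisel3.util._

object UncacheAddrMatch {
  // Match on the non-offset address (here: 8-byte granularity) and require
  // the byte masks to actually overlap.
  def apply(addrA: UInt, maskA: UInt, addrB: UInt, maskB: UInt, offsetBits: Int = 3): Bool = {
    val sameLine    = (addrA >> offsetBits) === (addrB >> offsetBits)
    val maskOverlap = (maskA & maskB).orR
    sameLine && maskOverlap
  }
}
```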
# e9e6cd09 | 27-Nov-2024 | Yanqin Li <[email protected]>
perf(uncache): mmio and nc share LQUncache; nc data can writeback to ldu1-2