LSQWrapper.scala - OpenGrok history log for /XiangShan/src/main/scala/xiangshan/mem/lsqueue/LSQWrapper.scala

Revision	Date	Author	Comments
# 522c7f99	07-Mar-2025	Anzo <[email protected]>	fix(LSU): misaligned violation detection stuck (#4369) Since a load instruction that crosses a 16-byte boundary needs to be split and accessed twice, it needs to enter the `RAR Queue` twice but occupies only one `virtual load queue` entry, so in the extreme case 36 load instructions that span 16 bytes can fill all 72 `RAR Queue` entries. --- There was a problem with our previous handling: if the oldest load instruction spanning 16 bytes enters the `replayqueue` while an instruction in the `loadmisalignbuffer` cannot finish executing because the `RAR Queue` is full, then the oldest load instruction can never be issued, because the `loadmisalignbuffer` always has an instruction in it. --- Therefore, we use a more aggressive scheme. When the RAR is full, we let the misaligned load generate a rollback, and the next load instruction that the loadmisalignbuffer can receive must be the oldest (if it is misaligned).
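The occupancy argument in #4369 can be checked with a few lines of arithmetic. This is a minimal sketch, not code from the repository: the queue size of 72 is the figure quoted in the commit message, and the function names are hypothetical.

```python
RAR_QUEUE_SIZE = 72  # figure quoted in the commit message


def rar_entries_needed(num_loads: int, num_crossing_16b: int) -> int:
    """Entries consumed in the RAR queue.

    Every load takes one entry; a load that crosses a 16-byte boundary
    is split into two accesses and therefore takes one extra entry.
    """
    if num_crossing_16b > num_loads:
        raise ValueError("crossing loads cannot exceed total loads")
    return num_loads + num_crossing_16b


def rar_queue_full(num_loads: int, num_crossing_16b: int) -> bool:
    """True when the loads in flight occupy every RAR queue entry."""
    return rar_entries_needed(num_loads, num_crossing_16b) >= RAR_QUEUE_SIZE
```

With 36 loads that all cross a 16-byte boundary, `rar_entries_needed(36, 36)` is 72, which is the extreme case the message describes.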
# 3c808de0	17-Feb-2025	Anzo <[email protected]>	fix(LSU): fix cbo instr exceptions and implementation (#4262) 1. Typo. 2. `cbo` instructions do not produce a misaligned exception. 3. `cbo zero` needs to flush the `sbuffer`. 4. `cbo zero` sets the mask correctly. 5. Add RAW checks to `cbo zero`. 6. Add trigger (Debug Mode) checks to `cbo zero`. 7. Fix several issues with the CBO instruction in NEMU. ---- To avoid ambiguity with `io.mmioStout`, a new `StoreQueue` port is introduced to write back `cbo zero` after the sbuffer flush. Arbitration is performed in `MemBlock`, and currently `cbo zero` has higher priority by default. `cbo zero` should not be written back at the same time as `mmio`. --- A check on `CacheLine` has been added to `RAWQueue` to ensure memory consistency when executing `cbo zero`. See this issue for details: https://github.com/OpenXiangShan/XiangShan/issues/4240 --- The `cbo` instruction requires a trigger check. --------- Co-authored-by: zhanglinjuan <[email protected]>
# 9e12e8ed	08-Feb-2025	cz4e <[email protected]>	style(Bundles): move bundles to Bundles.scala (#4247)
# 74050fc0	26-Jan-2025	Yanqin Li <[email protected]>	perf(Uncache): add merge policy when entering (#4154) # Background ## Problem How to design a more efficient entry rule for a new load/store request when a load/store with the same address already exists in the `ubuffer`? * Old Design: Always reject the new request. * New Design: Consider merging requests. ## Merge Scenarios ‼️If the new one can be merged into the existing one, both need to be `NC`. 1. New Store Request: 1. Existing Store: Merge (the new store is younger). 2. Existing Load: Reject. 2. New Load Request: 1. Existing Load: Merge (the new load may be younger or older. Both are ok to merge). 2. Existing Store: Reject. # What does this PR do? ## 1. Entry Actions 1. Allocate a new entry and mark as `valid` 1. When there is no matching address. 2. Allocate a new entry and mark as `valid` and `waitSame`: 1. When there is a matching address, and: * The virtual addresses and attributes are the same. * The older entry is either selected to issue or issued. 3. Merge into an Existing Entry: 1. When there is a matching address, and: * The virtual addresses and attributes are the same. * The older entry is not selected to issue or issued. 4. Reject the New Request: 1. When the ubuffer is full. 2. When there is a matching address, but: * The virtual addresses or attributes are different. NOTE: According to the definition in the TL-UL SPEC, the `mask` must be continuous and naturally aligned, and the `addr` must correspond to the mask. Therefore, the "same attributes" here introduces a new condition: the merged `mask` must meet the requirements of being continuous and naturally aligned (function `continueAndAlign`). During merging, the block offset of addr must be synchronously updated in `UncacheEntry.update`. ## 2. 
Handshake Mechanism Between `LoadQueueUncache (M)` and `Uncache (S)` > `mid`: master id > > `sid`: slave id Old Design: - `M` sends a `req` with a `mid`. - `S` receives the `req`, records the `mid`. - `S` sends a `resp` with the `mid`. - `M` receives the `resp` and matches it with the recorded `mid`. New Design: - `M` sends a `req` with a `mid`. - `S` receives the `req` and responds with `{mid, sid}`. - `M` matches it with the `mid` and updates its record with the received `sid`. - `S` sends a `resp` with its `sid`. - `M` receives the `resp` and matches it with the recorded `sid`. Benefit: The new design allows `S` to merge requests when a new request enters. ## 3. Forwarding Mechanism Old Design: Each address in the `ubuffer` is unique, so forwarding is straightforward based on a match. New Design: * A single address may have up to two entries matched in the `ubuffer`. * If it has two matched entries, it must be true that one entry is marked `inflight` and the other entry is marked `waitSame`. In this case, the forwarded data comes from the merged data of the two entries, with the `inflight` entry being the older one. ## 4. Bug Fixes 1. In the `loadUnit`, `!tlbMiss` cannot be directly used as `tlbHit`, because when `tlbValid` is false, `!tlbMiss` can still be true. 2. `Uncache` state machine transition: The state indicating "able to send requests" (previously `s_refill_req`, now `s_inflight`) should not be triggered by `reqFire` but rather by `acquireFire`. 
<img width="747" alt="image" src="https://github.com/user-attachments/assets/75fbc761-1da8-43d9-a0e6-615cc58cefef" /> # Evaluation - ✅ timing - ✅ performance

| Type | 4B1000 | Speedup1-IO | 1B4096 | Speedup2-IO |
| -------------- | ------- | ----------- | ------- | ----------- |
| IO | 51026 | 1 | 208149 | 1.00 |
| NC | 42343 | 1.21 | 169248 | 1.23 |
| NC+OT | 20379 | 2.50 | 160101 | 1.30 |
| NC+OT+mergeOpt | 16308 | 3.13 | 126369 | 1.65 |
| cache | 1298 | 39.31 | 4410 | 47.20 |
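The entry actions and the mask condition described in #4154 can be modelled in software. The following is a behavioural sketch only, not the Chisel implementation: the names `Action`, `entry_action`, and `continuous_and_aligned` are hypothetical, "naturally aligned" is assumed to mean a power-of-two run of bytes starting at a multiple of its own length, and the "reject when full" rule is applied literally (the commit message does not say whether a merge is still allowed when the `ubuffer` is full).

```python
from enum import Enum


class Action(Enum):
    ALLOC = "alloc"                      # new entry, marked valid
    ALLOC_WAIT_SAME = "alloc_wait_same"  # new entry, marked valid and waitSame
    MERGE = "merge"                      # fold into the existing entry
    REJECT = "reject"


def continuous_and_aligned(mask: int) -> bool:
    """Assumed reading of the TL-UL mask rule: the set bits form one
    contiguous run whose length is a power of two and whose start
    offset is a multiple of that length."""
    if mask <= 0:
        return False
    low = (mask & -mask).bit_length() - 1   # index of the lowest set bit
    run = mask >> low
    if run & (run + 1):                     # run must look like 0b1...1
        return False
    size = run.bit_length()
    return size & (size - 1) == 0 and low % size == 0


def entry_action(full: bool, addr_match: bool, same_attrs: bool,
                 older_selected_or_issued: bool) -> Action:
    """Decide what to do with a new request, following the list above."""
    if full:
        return Action.REJECT                # literal reading of rule 4.1
    if not addr_match:
        return Action.ALLOC
    if not same_attrs:
        return Action.REJECT
    if older_selected_or_issued:
        return Action.ALLOC_WAIT_SAME
    return Action.MERGE
```

In this sketch, `same_attrs` would already include the mask condition, for example `continuous_and_aligned(old_mask | new_mask)` for two byte masks being merged.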
# e836c770	16-Jan-2025	Zhaoyang You <[email protected]>	feat(TopDown): add TopDown PMU Events (#4122) This PR adds hardware synthesizable three-level categorized TopDown performance counters. Level-1: Retiring, Frontend Bound, Bad Speculation, Backend Bound. Level-2: Fetch Latency Bound, Fetch Bandwidth Bound, Branch Misprediction, Machine Clears, Core Bound, Memory Bound. Level-3: L1 Bound, L2 Bound, L3 Bound, Mem Bound, Store Bound.
# be8e95bc	25-Dec-2024	Anzo <[email protected]>	fix(MemBlock): fix overflow during lsqptr calculation (#4084) The addition previously used to calculate the `lsq` pointer overflows, because the bit width of `numLsElem` is 5 and accumulating multiple uops exceeds that width. --- Theoretically this would have been a problem in previous versions as well, but for some reason the bug did not occur until `newDispatch`.
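The overflow in #4084 is ordinary fixed-width wraparound. The sketch below models it in Python under stated assumptions: the helper name `add_fixed_width` and the operand values are illustrative, and only the 5-bit width comes from the commit message.

```python
def add_fixed_width(values, width):
    """Accumulate values as hardware would with a `width`-bit result:
    every partial sum is truncated to `width` bits."""
    mask = (1 << width) - 1
    total = 0
    for v in values:
        total = (total + v) & mask
    return total


# Two uops that each claim 16 elements (illustrative values):
narrow = add_fixed_width([16, 16], width=5)  # wraps around
wide = add_fixed_width([16, 16], width=6)    # one extra bit is enough here
```

In Chisel, `+` on two `UInt`s keeps the wider operand's width and drops the carry, while `+&` widens the result by one bit, so accumulating several 5-bit values needs an explicitly wider sum.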
# 0a7d1d5c	22-Nov-2024	xiaofeibao <[email protected]>	feat(backend): NewDispatch
# b240e1c0	07-Nov-2024	Anzooooo <[email protected]>	feat(Zicclsm): refactoring misalign and support vector misalign
# e9e6cd09	27-Nov-2024	Yanqin Li <[email protected]>	perf(uncache): mmio and nc share LQUncache; nc data can writeback to ldu1-2
# e04c5f64	19-Nov-2024	Yanqin Li <[email protected]>	feat(outstanding): support nc outstanding and remove mmio st outstanding
# bb76fc1b	10-Oct-2024	Yanqin Li <[email protected]>	fix(NC): fix a list of bugs of NC WMO access * fix(PBMT): skip nc difftest and handle the conflict of nc and normal store * fix(PBMT): nc st req is changed to a state machine execution * fix(pbmt): fix typo and control error of nc ld * fix(pbmt): nc data assignment error * fix(pbmt): nc should be used to wakeup * fix(pbmt): remove wrong assert * fix(pbmt): lots of bugs of nc st ld forward * fix(pbmt): fix address align error
# c7353d05	03-Sep-2024	Yanqin Li <[email protected]>	feat(NCld): support WMO access for NC ld * feat(LDU): add support for NC in LoadUnit * feat(LQ,UB): add support for NC in load queue and uncache buffer * chore(pbmt): add xsperf for nc ld statistic
# dc4fac13	02-Dec-2024	CharlieLiu <[email protected]>	feat(DCache): merge CMO requests into DCache TL-A Channel (#3968) * remove previous cmo datapath in memblock. * add datapath for cmo requests between lsq and dcache. * add new CMOUnit in MissQueue. * bump rocket-chip & coupledL2.
# 189d8d00	29-Oct-2024	Anzo <[email protected]>	refactor(MemBlock): turn on `dontTouch` only when debugging (#3792) This will result in the delivery of clean generated code and may remove some of the pseudo-paths.
# cee1d5b2	15-Oct-2024	Yanqin Li <[email protected]>	fix(lsq): uncache req can be assigned only in idle state (#3732) Bug Description: When an uncache store (st) is immediately followed by an uncache load (ld), due to the `AddPipelineReg` in MemBlock when the LSQ transfers data with the Uncache, even though Uncache is handling the store request, `MemBlock.uncacheReq.ready` is still true. Under the original assignment conditions, the ld request (ld req) from LQ will be received by `MemBlock.uncacheReq` in the `s_store` state. So when `MemBlock.uncacheReq` is received by Uncache, the LSQ state has already transitioned from `s_store` to `s_idle`, without switching to `s_load`. As a result, the load response (ld resp) from Uncache can never be received by the LSQ. The process is briefly described as follows: 1. SQ: st req 2. Uncache: st req received 3. LQ: ld req in `s_store` state 4. Uncache: st resp 5. SQ: st resp received; Uncache: ld req received 6. LSQ: state to `s_idle` 7. Uncache: ld resp 8. ERROR: LSQ cannot receive ld resp in `s_idle` state Fix: In LSQ, uncache req can be assigned only in idle state. <img width="1179" alt="image" src="https://github.com/user-attachments/assets/1d2d417d-06d6-43bf-a876-5cc53d0ff9ed">
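The eight-step failure in #3732 can be replayed with a small state-machine model. This is a simplified behavioural sketch, not the RTL: the class `LsqUncacheFsm` and its methods are hypothetical, the pipeline register is abstracted into "a request is accepted without a state change", and only the state names `s_idle`, `s_load`, and `s_store` come from the commit message.

```python
class LsqUncacheFsm:
    """Toy model of the LSQ-side uncache arbitration state."""

    def __init__(self, only_in_idle: bool):
        self.state = "s_idle"
        self.only_in_idle = only_in_idle  # True models the fixed behaviour
        self.lost_responses = []

    def req(self, kind: str) -> bool:
        """Try to send a 'load' or 'store' request; True if it is accepted."""
        if self.state == "s_idle":
            self.state = "s_" + kind
            return True
        if self.only_in_idle:
            return False                  # fixed: caller must retry later
        return True                       # buggy: accepted, state unchanged

    def resp(self, kind: str) -> bool:
        """Deliver a response; True if the LSQ is in the state to take it."""
        if self.state == "s_" + kind:
            self.state = "s_idle"
            return True
        self.lost_responses.append(kind)
        return False


def replay(fsm: LsqUncacheFsm) -> list:
    """Store immediately followed by load, as in the bug description."""
    fsm.req("store")
    if not fsm.req("load"):               # rejected while the store is pending
        fsm.resp("store")
        fsm.req("load")                   # retried from s_idle
    else:
        fsm.resp("store")
    fsm.resp("load")
    return fsm.lost_responses
```

Replaying with `only_in_idle=False` loses the load response, while `only_in_idle=True` delivers both responses.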
# 46e9ee74	27-Sep-2024	Haoyuan Feng <[email protected]>	fix(exception): fix exception vaddr generate logic (#3639) In LSU, for exceptions that can be detected before address translation (`preaf`, `prepf` or `pregpf`), the original vaddr should be retained. For exceptions detected after address translation, the 48-bit vaddr needs to be zero-extended or sign-extended according to the mode (`GenExceptionVa`) and then written to *tval. Also fix some connection bugs.
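The zero-extension and sign-extension mentioned in #3639 differ only in how the upper bits are filled. The sketch below shows both for a 48-bit address widened to 64 bits; the function `extend_vaddr` is hypothetical and is not the `GenExceptionVa` logic itself, and which mode selects which extension is not stated in the commit message.

```python
def extend_vaddr(vaddr: int, sign_extend: bool,
                 in_bits: int = 48, out_bits: int = 64) -> int:
    """Widen an `in_bits` address to `out_bits`.

    Zero extension leaves the upper bits clear; sign extension copies
    the top input bit (bit in_bits - 1) into every upper bit.
    """
    vaddr &= (1 << in_bits) - 1
    top_bit_set = (vaddr >> (in_bits - 1)) & 1
    if sign_extend and top_bit_set:
        upper = ((1 << out_bits) - 1) & ~((1 << in_bits) - 1)
        vaddr |= upper
    return vaddr
```

For the 48-bit value `0x8000_0000_0000`, sign extension yields `0xFFFF_8000_0000_0000` and zero extension leaves the value unchanged.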
# ad415ae0	21-Sep-2024	Xiaokun-Pei <[email protected]>	feat(trap): support m/htinst for specific G-stage translation (#3604) According to the RISC-V priv spec, mtinst/htinst may always be written with zero on a trap into M/HS-mode, except for Guest-Page-Fault traps that meet both of the following conditions: - the trap is caused by a G-stage translation which supports VS-stage translation - a nonzero value is written to mtval2/htval "isForVSnonLeafPTE" is used only in the exceptional circumstance that a gpf happens in the G-stage translation which supports VS-stage translation, such as searching the non-leaf pte of VS-stage. This patch adds support for writing the proper value to mtinst/htinst when the specific trap occurs. And bump the nemu.
# db6cfb5a	19-Sep-2024	Haoyuan Feng <[email protected]>	fix(exception): check high address bits of lsu (#3596) In the previous implementation, we simply truncated the higher bits of the jump target or load & store address, which made it impossible to raise exceptions in such cases. Commit https://github.com/OpenXiangShan/XiangShan/commit/c1b28b66879239a5b3a44741376f3b002e8ac834 has already fixed high address bits checking of the jump target. This commit fixes the lsu part, checking the full address in the tlb and passing the full address directly to csr.
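One common form of a high-address-bits check is the Sv48 rule from the RISC-V privileged specification: bits 63 down to 48 of a virtual address must all equal bit 47. The sketch below illustrates that rule only; the function `is_canonical` is hypothetical, and the commit message does not say that the TLB check is implemented this way.

```python
def is_canonical(vaddr: int, va_bits: int = 48, xlen: int = 64) -> bool:
    """True if bits xlen-1 .. va_bits of `vaddr` all equal bit va_bits-1.

    `vaddr` is assumed to be an unsigned value that fits in `xlen` bits.
    Truncating to `va_bits` before this check would hide a violation,
    which is the situation the commit describes.
    """
    upper = vaddr >> (va_bits - 1)              # bits xlen-1 .. va_bits-1
    all_ones = (1 << (xlen - va_bits + 1)) - 1
    return upper == 0 or upper == all_ones
```

An address such as `0x0001_0000_0000_0000` fails the check, although its low 48 bits alone look like a valid address.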
# b4d41c12	10-Sep-2024	xiaofeibao <[email protected]>	timing(LsqEnqCtrl): fix timing of lqAllocNumber and sqAllocNumber
# 94998b06	04-Sep-2024	happy-lx <[email protected]>	fix(Zicclsm, trigger): fix the problem of missing breakpoint exception (#3470) + @wissygh Refactored Trigger check code of Memblock. + Move Trigger address cmp from load S3 to S1. In addition, the detection of trigger is moved from Memblock to LoadUnit. - Once the breakpoint exception is detected, enter the exception Buffer directly to handle the exception (previously, the load instruction was executed first and then the exception was handled, which would cause the mmio load to change the status of the peripheral). + If Trigger address matches and the action is to enter debug mode, both loadUnit and storeUnit will directly write this instruction back without any execution (by setting this instruction as an exception). + Match trigger addresses for vector instructions in LoadUnit. + If both a misalign exception and a breakpoint occur, the breakpoint exception will be processed first. --------- Co-authored-by: chengguanghui <[email protected]>
# e3ed843c	30-Aug-2024	happy-lx <[email protected]>	Remove `RVA23` prefix and enable CMO by default (#3431) + Remove `RVA23` prefix to clean up code + set `hasCMO` to true by default
# 3fbc86fc	26-Aug-2024	Chen Xi <[email protected]>	RVA23 CMO (Cache Maintenance Operation) (#3426) Supports Zicbom Extension (Clean/Flush/Invalid) - https://github.com/OpenXiangShan/CoupledL2/pull/225 This PR also includes other CPL2 changes: - bug fixes - timing fixes - SRAM-Queue \| https://github.com/OpenXiangShan/CoupledL2/pull/228 - data SRAM split into 4 \| https://github.com/OpenXiangShan/CoupledL2/pull/229 --------- Co-authored-by: lixin <[email protected]>
# 41d8d239	21-Aug-2024	happy-lx <[email protected]>	RVA23: Support Zicclsm & Zama16b (Handling Unaligned Load Store by Hardware) (#3320) This PR supports handling load store unaligned exceptions by hardware and provides CSR-controlled switches --------- Co-authored-by: xiaofeibao <[email protected]>
# 5003e6f8	23-Jul-2024	Huijin Li <[email protected]>	LSQ: optimize static clock gating coverage and fix x_value in vcs (#3176) optimize LSQ static clock gating coverage, fix x_value in vcs
# 16ede6bb	11-Jul-2024	weiding liu <[email protected]>	MemBlock: refactor selectOldest of rollback for better timing Don't select the oldest rollback twice in LoadQueueRAW; send the candidates to MemBlock, which selects the oldest together with the other rollback sources. LoadQueueRAW gets a port to send rollback requests to MemBlock.