DCacheWrapper.scala - OpenGrok history log for /XiangShan/src/main/scala/xiangshan/cache/dcache/DCacheWrapper.scala

Revision	Date	Author	Comments
# ebe07d61	20-Mar-2025	梁森 Liang Sen	feat(dfx): reuse dcache data sram read data register as mbist pipeline (#4371) Co-authored-by: sfencevma
# 10cfb21d	03-Mar-2025	cz4e	fix(DCache): use `ParallelMux` instead of `Mux1H` (#4340) * When there are multiple errors, `Mux1H` is equivalent to using `|`. For example: * error 0, valid = 1, addr0 = 0x1000 * error 1, valid = 1, addr1 = 0x0ffff * the result is `io.error.valid == 1`, but `io.error.bits.addr == (addr0 | addr1)`, because `Mux1H` generates a circuit like this: `addr = (valid0 ? addr0 : 'h0) | (valid1 ? addr1 : 'h0)` * This problem can be avoided by using `ParallelMux`.
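The difference between the two selection styles can be modelled in plain Scala. In this sketch each error source is a `(valid, addr)` pair: `andOrMux` reproduces the AND-OR shape that `Mux1H` generates, and `prioritySelect` is an illustrative alternative that returns the first valid address. The object and function names are hypothetical, and the priority behaviour is an assumption chosen for the example; it is not a copy of XiangShan's `ParallelMux`.

```scala
object OneHotMuxModel {
  // AND-OR mux, the circuit shape of Mux1H: every input whose select is
  // high is ORed into the result, so two valid inputs blend their bits.
  def andOrMux(in: Seq[(Boolean, Long)]): Long =
    in.foldLeft(0L) { case (acc, (sel, data)) => if (sel) acc | data else acc }

  // Illustrative priority select (an assumption, not ParallelMux itself):
  // the first input with a high select wins; 0 if nothing is selected.
  def prioritySelect(in: Seq[(Boolean, Long)]): Long =
    in.collectFirst { case (true, data) => data }.getOrElse(0L)
}
```

With two simultaneous errors at `0x1000` and `0x2040`, `andOrMux` returns `0x3040`, an address that belongs to neither error, while `prioritySelect` returns `0x1000`.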
# 51f9a957	21-Feb-2025	cz4e	style(LoadPipe): use `miss_req.bits.cancel` instead of `mq_enq_cancel` (#4296)
# 2df9c392	19-Feb-2025	cz4e	area(TagArray): split `TagArray` from 4way to 2way per array (#4287)
# 74050fc0	26-Jan-2025	Yanqin Li	perf(Uncache): add merge policy when entering (#4154) # Background ## Problem How can we design a more efficient entry rule for a new load/store request when a load/store with the same address already exists in the `ubuffer`? * Old Design: Always reject the new request. * New Design: Consider merging requests. ## Merge Scenarios ‼️ A new request can be merged into an existing one only if both are `NC`. 1. New Store Request: 1. Existing Store: Merge (the new store is younger). 2. Existing Load: Reject. 2. New Load Request: 1. Existing Load: Merge (the new load may be younger or older; both cases can be merged). 2. Existing Store: Reject. # What does this PR do? ## 1. Entry Actions 1. Allocate a new entry and mark it `valid`: 1. When there is no matching address. 2. Allocate a new entry and mark it `valid` and `waitSame`: 1. When there is a matching address, and: * The virtual addresses and attributes are the same. * The older entry is either selected to issue or already issued. 3. Merge into an Existing Entry: 1. When there is a matching address, and: * The virtual addresses and attributes are the same. * The older entry is neither selected to issue nor issued. 4. Reject the New Request: 1. When the ubuffer is full. 2. When there is a matching address, but: * The virtual addresses or attributes are different. NOTE: According to the definition in the TL-UL SPEC, the `mask` must be contiguous and naturally aligned, and the `addr` must correspond to the mask. Therefore, "same attributes" here introduces a new condition: the merged `mask` must also be contiguous and naturally aligned (function `continueAndAlign`). During merging, the block offset of `addr` must be updated accordingly in `UncacheEntry.update`.
## 2. Handshake Mechanism Between `LoadQueueUncache (M)` and `Uncache (S)` (`mid`: master id; `sid`: slave id) Old Design: - `M` sends a `req` with a `mid`. - `S` receives the `req` and records the `mid`. - `S` sends a `resp` with the `mid`. - `M` receives the `resp` and matches it with the recorded `mid`. New Design: - `M` sends a `req` with a `mid`. - `S` receives the `req` and responds with `{mid, sid}`. - `M` matches the response by `mid` and updates its record with the received `sid`. - `S` sends a `resp` with its `sid`. - `M` receives the `resp` and matches it with the recorded `sid`. Benefit: The new design allows `S` to merge requests when a new request enters. ## 3. Forwarding Mechanism Old Design: Each address in the `ubuffer` is unique, so forwarding is a straightforward match. New Design: * A single address may match up to two entries in the `ubuffer`. * If it matches two entries, one must be marked `inflight` and the other `waitSame`. In this case, the forwarded data is the merge of both entries, with the `inflight` entry being the older one. ## 4. Bug Fixes 1. In the `loadUnit`, `!tlbMiss` cannot be used directly as `tlbHit`, because `!tlbMiss` can still be true when `tlbValid` is false. 2. `Uncache` state machine transition: the state indicating "able to send requests" (previously `s_refill_req`, now `s_inflight`) should be triggered by `acquireFire` rather than by `reqFire`. # Evaluation - ✅ timing - ✅ performance
| Type | 4B1000 | Speedup1-IO | 1B4096 | Speedup2-IO |
| -------------- | ------- | ----------- | ------- | ----------- |
| IO | 51026 | 1.00 | 208149 | 1.00 |
| NC | 42343 | 1.21 | 169248 | 1.23 |
| NC+OT | 20379 | 2.50 | 160101 | 1.30 |
| NC+OT+mergeOpt | 16308 | 3.13 | 126369 | 1.65 |
| cache | 1298 | 39.31 | 4410 | 47.20 |
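The mask condition for merging can be illustrated with a software model of a single data beat. This sketch assumes an 8-byte data path with one mask bit per byte; `isContiguousAndAligned` plays the role the entry describes for `continueAndAlign`, and `mergeStore` applies the store-into-store rule, letting the younger store's bytes win. All names are hypothetical, and the model ignores the virtual-address, attribute and issue-state checks.

```scala
object UncacheMergeModel {
  val BusBytes = 8 // assumption: 8-byte beat, one mask bit per byte

  // True when the set bits form one run whose length is a power of two and
  // whose start offset is a multiple of that length (TL-UL style mask).
  def isContiguousAndAligned(mask: Int): Boolean =
    mask != 0 && {
      val lo = Integer.numberOfTrailingZeros(mask)
      val len = Integer.bitCount(mask)
      val contiguous = (mask >>> lo) == (1 << len) - 1
      val powerOfTwo = (len & (len - 1)) == 0
      contiguous && powerOfTwo && lo % len == 0
    }

  // Merge a younger store into an older one; None if the merged mask is illegal.
  def mergeStore(olderMask: Int, olderData: Vector[Int],
                 youngerMask: Int, youngerData: Vector[Int]): Option[(Int, Vector[Int])] = {
    val merged = olderMask | youngerMask
    if (!isContiguousAndAligned(merged)) None
    else {
      // Younger bytes override older bytes wherever the younger mask is set.
      val data = Vector.tabulate(BusBytes) { i =>
        if (((youngerMask >> i) & 1) == 1) youngerData(i) else olderData(i)
      }
      Some((merged, data))
    }
  }

  // New byte offset of the merged request: its lowest enabled byte.
  def blockOffset(mask: Int): Int = Integer.numberOfTrailingZeros(mask)
}
```

For example, merging a younger 2-byte store at bytes 2-3 (`0x0C`) into an older store at bytes 0-1 (`0x03`) yields the legal 4-byte mask `0x0F`, whereas merging a single byte at byte 2 (`0x04`) would produce `0x07`, which is rejected because three bytes is not a power of two.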
# 1abade56	22-Jan-2025	Anzo	fix(LSU): fix cbo instruction exception handling logic (#4215)
# fa5e530d	21-Jan-2025	cz4e	timing(VSegmentUnit): duplicate latchVAddr (#4209) * `latchVAddr` needs to index all dcache data SRAMs from top to bottom, which causes a large fanout, so `latchVAddr` is duplicated.
# e836c770	16-Jan-2025	Zhaoyang You	feat(TopDown): add TopDown PMU Events (#4122) This PR adds hardware-synthesizable, three-level categorized TopDown performance counters. Level-1: Retiring, Frontend Bound, Bad Speculation, Backend Bound. Level-2: Fetch Latency Bound, Fetch Bandwidth Bound, Branch Misprediction, Machine Clears, Core Bound, Memory Bound. Level-3: L1 Bound, L2 Bound, L3 Bound, Mem Bound, Store Bound.
# 5bd65c56	14-Jan-2025	Tang Haojin	feat(Config): add yaml parser for complicated parametrization (#4147) This commit enables complicated parameterization through YAML parsing, using circe. In this commit, we implement 6 configurations: - PmemRanges: physical memory ranges - PMAConfigs - CHIAsyncBridge: set depth to 0 to disable it - L2CacheConfig - L3CacheConfig - DebugModuleBaseAddr For better human readability, this commit renames `WithNKBL2/3` to `L2/3CacheConfig`, turns them into case classes, and makes the first parameter accept only human-readable sizes such as `0.5 MB` or `256kB`. This commit also changes PMAConfigs and PmemRanges into Lists of case classes.
# 4f2cafef	30-Dec-2024	CharlieLiu	fix(DCache): fix dcache TL client parameters (#4110) The previous PR #3968 added a new TL port for CMOUnit in MissQueue but did not update the config for the dcache client, which made CMOUnit and the first releaseEntry share the same sourceId. This commit fixes it.
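The failure mode can be seen in a toy model of TileLink source-ID allocation. In this sketch each requester class receives a contiguous block of IDs laid out back to back from the client configuration; the names and counts (`miss`, `cmo`, `release`, 16/1/18) are hypothetical and do not reflect DCache's real parameters. If a requester that uses IDs is left out of the configuration, the blocks after it shift down and overlap the IDs it still uses.

```scala
object SourceIdModel {
  // Half-open range of TileLink source IDs: [start, end).
  final case class IdRange(start: Int, end: Int) {
    def overlaps(that: IdRange): Boolean = start < that.end && that.start < end
  }

  // Lay requester blocks out back to back, in order, starting at ID 0.
  def layout(sizes: Seq[(String, Int)]): Map[String, IdRange] = {
    val starts = sizes.map(_._2).scanLeft(0)(_ + _)
    sizes.zip(starts).map { case ((name, n), s) => name -> IdRange(s, s + n) }.toMap
  }
}
```

With the CMO block configured, it gets ID 16 and the release entries start at 17. If it is omitted from the configuration while the unit still sends on ID 16, that ID collides with the first release entry, which matches the symptom the commit describes.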
# 066ca249	27-Dec-2024	zhanglinjuan	fix(MemBlock): support non-data error handling for cacheable region (#4093) When a DCache refill response has `denied` or `corrupt` asserted, the loads belonging to that cache line should report a load access fault. This is accomplished by including a `corrupt` bit in the DCache MSHR forwarding and TileLink channel D forwarding logic, and triggering an exception when `corrupt` is detected. A store non-data error from a DCache store miss cannot trigger a precise access-fault trap, only an imprecise bus-error interrupt; this will be handled in another commit.
# 519244c7	25-Dec-2024	Yanqin Li	submodule(CoupledL2, OpenLLC): support pbmt in CHI scene (#4071) * L1: deliver the NC and PMA signals of uncacheReq to L2 * L2: [support Svpbmt on CHI MemAttr](https://github.com/OpenXiangShan/CoupledL2/pull/273) * LLC: [Non-cache requests are forwarded directly downstream without entering the slice](https://github.com/OpenXiangShan/OpenLLC/pull/28)
# 0b9f4b2d	25-Dec-2024	cz4e	area(CacheOpDecoder): remove CacheOpDecoder (#4050) * CacheOpDecoder is no longer used
# 8b33cd30	13-Dec-2024	klin02	feat(XSLog): move all XSLog outside WhenContext for collection Data inside a WhenContext is not accessible from another module. To support XSLog collection, we move all XSLog calls and related signals outside WhenContext. For example, `when(cond1){XSDebug(cond2, pable)}` becomes `XSDebug(cond1 && cond2, pable)`.
# 72dab974	16-Dec-2024	cz4e	feat(CtrlUnit, DCache): support L1 DCache RAS (#4009) # L1 DCache RAS extension support The L1 DCache supports part of the Reliability, Availability, and Serviceability (RAS) Extension. * L1 DCache protection with Single Error Correct Double Error Detect (SECDED) ECC on the RAMs, covering the L1 DCache tag and data RAMs. Erroneous tags or data are not recovered. * Fault handling interrupt (Bus Error Unit (BEU) interrupt, 65) * Error injection ## ECC Error Detect An error may be triggered when the L1 DCache is accessed. * Error Report: * Tag ECC Error: if an ECC error occurs on any path, a tag ECC error is considered to have occurred. * Data ECC Error: if an ECC error occurs in the hit line, a data ECC error is considered to have occurred; on a miss, it is not processed. * If an instruction access triggers an ECC error, it is treated as a hardware error and an exception is reported. * Whenever an error is detected, an error message is sent to the BEU. * When the hardware detects an error, it reports it to the BEU and triggers the NMI external interrupt (65). * Load instruction: * Only tag or data ECC errors are triggered during execution; the errors are reported to the BEU and a `Hardware Error` is reported. * Probe/Snoop: * If a tag ECC error occurs, the cache state is not changed, and a `ProbeAck` with `corrupt=1` is returned to L2. * If a data ECC error occurs, the cache state is changed according to the rules. If data needs to be returned, a `ProbeAckData` with `corrupt=1` is returned to L2. * Replace/Evict: * A `ReleaseData` with `corrupt=1` is returned to L2.
* Store to L1 DCache: * If a tag ECC error occurs, the cacheline is released following the `Replace/Evict` process, and the data is written to the L1 DCache without reporting errors to L2. * If a data ECC error occurs, the data is written directly without reporting the error to L2. * Atomics: * Report a `Hardware Error`; do not report errors to L2. ## Error Inject Each core's L1 DCache is configured with a memory-mapped, register-controlled controller, and each hardware unit that supports ECC is configured with a control bank. After the bank registers are configured, the L1 DCache triggers an ECC error on the first subsequent L1 DCache access. [Figure: err_inject] ### Address Space The address space `0x38022000`-`0x3802207F` (128 bytes in total) is local to each hart. [Figure: ctl_bank] ### L1 DCache Control Bank Each control bank contains the registers `ECCCTL`, `ECCEID`, and `ECCMASK`; each register is 8 bytes. [Figure: eccctl] * ECCCTL (ECC Control): ECC injection control register. * `ese` (error signaling enable): indicates that injection is valid; initialized to 0. When injection succeeds and `pst==0`, `ese` is cleared. * `pst` (persist): continuous injection. When `pst==1`, after the `ECCEID` counter decreases to 0 and the injection succeeds, the injection timer is restored to the last value written to `ECCEID` and injection repeats; when `pst==0`, injection happens only once. * `ede` (error delay enable): indicates that the counter is valid; initialized to 0. If:
* `ese==1` and `ede==0`: error injection takes effect immediately. * `ese==1` and `ede==1`: injection takes effect only after `ECCEID` decrements to 0. * `cmp` (component): injection target; initialized to 0. * 1'b0: the injection target is the tag. * 1'b1: the injection target is the data. * `bank`: bank valid signal; initialized to 0. When a bit in `bank` is set, the corresponding mask is valid. [Figure: ecceid] * ECCEID (ECC Error Inject Delay): ECC injection delay counter. * When `ese==1` and `ede==1`, it decrements until it reaches 0. It currently runs at the core clock frequency; the clock can also be divided. Because ECC injection relies on an L1 DCache access, the moment `ECCEID` reaches 0 and the moment the ECC error is actually triggered may differ. [Figure: eccmask] * ECCMASK (ECC Mask): ECC injection mask register. * A 0 bit leaves the corresponding bit unchanged; a 1 bit flips it. Tag injection uses only the bits of `ECCMASK0` that correspond to the tag width. ### Error Inject Example
```
# set control bank base address
li x3, $(BASEADDR)

# set eid
li x5, 500       # delay 500 cycles
sd x5, 8(x3)     # mmio store to ECCEID

# set mask
li x5, 0x1       # flip bit 0
sd x5, 16(x3)    # mmio store to ECCMASK

# set ctl
li x5, 0x7       # cmp = 0, ede = 1, pst = 1, ese = 1
sd x5, 0(x3)     # mmio store to ECCCTL
```
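The interaction of `ese`, `pst`, `ede` and `ECCEID` can be sketched as a small cycle-level model of one control bank. The bit positions in `encodeCtl` are an assumption inferred from the example value `0x7` (`cmp = 0, ede = 1, pst = 1, ese = 1`) and the order in which the `ECCCTL` fields are listed; the position of `bank` is not given, so it is omitted. The object, class and method names are hypothetical, and the model does not cover the actual bit flipping driven by `ECCMASK`.

```scala
object EccInjectModel {
  // Assumed ECCCTL layout: ese = bit 0, pst = bit 1, ede = bit 2, cmp = bit 3.
  // Inferred from the example value 0x7; `bank` is omitted (position not given).
  def encodeCtl(ese: Boolean, pst: Boolean, ede: Boolean, cmp: Boolean): Long = {
    def bit(on: Boolean, pos: Int): Long = if (on) 1L << pos else 0L
    bit(ese, 0) | bit(pst, 1) | bit(ede, 2) | bit(cmp, 3)
  }

  // Hypothetical model of one control bank; eidReload is the last value written to ECCEID.
  final class Bank(var ese: Boolean, val pst: Boolean, val ede: Boolean, eidReload: Long) {
    private var eid: Long = eidReload

    // Injection is pending once ese is set and, if ede is set, ECCEID has reached 0.
    def armed: Boolean = ese && (!ede || eid == 0)

    // One clock: ECCEID decrements only while ese == 1, ede == 1 and it is nonzero.
    def tick(): Unit = if (ese && ede && eid > 0) eid -= 1

    // One L1 DCache access; returns true if this access receives the injected error.
    def onAccess(): Boolean =
      if (!armed) false
      else {
        if (pst) eid = eidReload // persist: reload the delay and keep injecting
        else ese = false         // one-shot: clear ese after a successful injection
        true
      }
  }
}
```

For instance, with `pst = 0`, `ede = 1` and `ECCEID = 2`, the first access after two clock ticks receives the error and `ese` is then cleared, so later accesses are unaffected. Modelling the access as a separate event also reflects the note that the error fires at the next DCache access, not exactly when the counter hits 0.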
# b240e1c0	07-Nov-2024	Anzooooo	feat(Zicclsm): refactoring misalign and support vector misalign
# 38c29594	26-Nov-2024	zhanglinjuan	feat(MemBlock): add support for Zacas extension fix(AtomicsUnit, MemBlock): fix loss of multiple stds In the previous design, AtomicsUnit receives stds from StdExeUnit and arbitrates at most one std uop per cycle. This works fine for most AMOs and LR/SC because they require only one std uop. However, AMOCAS requires at least two std uops, which may be issued from two separate issue queues at the same time, leading to the loss of std uops. This commit fixes this by taking all the outputs of the StdExeUnits into account with arbitration logic. fix(AtomicsUnit): DCache req can only be sent at `s_cache_req` fix(AtomicsUnit, difftest): fix difftest io for atomic events fix(MainPipe): fix precedence of `&` and `=/=` operator fix(MainPipe): AMOCAS should not wait for AMOALU fix(MemBlock): remove unnecessary assertion fix(MainPipe): only CAS instruction can assert `s3_cas_fail` fix(AtomicsUnit): fix bug in data select logic submodule(difftest): bump difftest
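The `&` versus `=/=` fix follows from Scala's operator precedence, which is decided by an operator's first character: operators starting with `=` bind more tightly than `&`, so `a & b =/= c` parses as `a & (b =/= c)`. This plain-Scala sketch reproduces the effect with a hypothetical `=/=` on `Int` that returns 1 or 0 to stand in for a one-bit `Bool`; it is not Chisel's definition.

```scala
object PrecedenceDemo {
  // Hypothetical stand-in for Chisel's =/= : returns 1 if different, else 0.
  implicit class NotEqOps(val x: Int) extends AnyVal {
    def =/=(y: Int): Int = if (x != y) 1 else 0
  }

  val a = 0x6
  val b = 0x4
  val c = 0x0

  // '=' ranks above '&', so this is a & (b =/= c) = 6 & 1 = 0.
  val unparenthesized: Int = a & b =/= c
  // Explicit grouping gives the intended (a & b) =/= c = (4 != 0) = 1.
  val parenthesized: Int = (a & b) =/= c
}
```

Writing the parentheses explicitly, as in `(a & b) =/= c`, avoids relying on the precedence table at all.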
# e04c5f64	19-Nov-2024	Yanqin Li	feat(outstanding): support nc outstanding and remove mmio st outstanding
# c7353d05	03-Sep-2024	Yanqin Li	feat(NCld): support WMO access for NC ld * feat(LDU): add support for NC in LoadUnit * feat(LQ,UB): add support for NC in load queue and uncache buffer * chore(pbmt): add xsperf for nc ld statistic
# dc4fac13	02-Dec-2024	CharlieLiu	feat(DCache): merge CMO requests into DCache TL-A Channel (#3968) * remove previous cmo datapath in memblock. * add datapath for cmo requests between lsq and dcache. * add new CMOUnit in MissQueue. * bump rocket-chip & coupledL2.
# b34797bc	25-Nov-2024	cz4e	area(DCache ECC): combine ecc with tag/data (#3902)
# b32e9518	08-Nov-2024	Huijin Li	power(MemBlock): add ClockGate for DCache SRAM (#3824) By using ClockGate for DCache SRAM, memory power is reduced by 64% and MemBlock total power by 23.38%.
# 4a2e3bec	26-Sep-2024	Tang Haojin	fix(Pmem): memory range should be 'or'ed rather than 'and'ed (#3651)
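The "or" requirement means an address counts as physical memory if it falls in any configured range, not in every one. A minimal sketch, assuming half-open `[lower, upper)` ranges and hypothetical names (`PmemRange`, `inPmem`, `inPmemAnded`); with two disjoint ranges, the "and" form can never be true.

```scala
object PmemRangeModel {
  // Hypothetical half-open physical memory range [lower, upper).
  final case class PmemRange(lower: BigInt, upper: BigInt) {
    def contains(addr: BigInt): Boolean = addr >= lower && addr < upper
  }

  // Correct: OR across ranges (the address lies in at least one range).
  def inPmem(ranges: Seq[PmemRange], addr: BigInt): Boolean =
    ranges.exists(_.contains(addr))

  // Buggy: AND across ranges; never true when the ranges are disjoint.
  def inPmemAnded(ranges: Seq[PmemRange], addr: BigInt): Boolean =
    ranges.forall(_.contains(addr))
}
```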
# 45def856	21-Sep-2024	Tang Haojin	refactor(Pmem): use `Seq` for physical memory ranges (#3622)
# af95bc32	20-Sep-2024	Haoyuan Feng	fix(prefetch): MMIO address should not send prefetch requests (#3615) TODO: Prefetcher should check pmp & pma in order to decide whether to send requests