#
ebe07d61 |
| 20-Mar-2025 |
梁森 Liang Sen <[email protected]> |
feat(dfx): reuse dcache data sram read data register as mbist pipeline (#4371)
Co-authored-by: sfencevma <[email protected]>
|
#
10cfb21d |
| 03-Mar-2025 |
cz4e <[email protected]> |
fix(DCache): use `ParallelMux` instead of `Mux1H` (#4340)
* When there are multiple errors, `Mux1H` is equivalent to using `|`. For example:
  * error 0: valid = 1, addr0 = 0x1000
  * error 1: valid = 1, addr1 = 0x0ffff
  * The result is `io.error.valid == 1`, but `io.error.bits.addr == (addr0 | addr1)`, because `Mux1H` generates a circuit like this:
    ```
    addr = (valid0 ? addr0 : 'h0) | (valid1 ? addr1 : 'h0)
    ```
* This problem can be avoided by using `ParallelMux`.
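The failure mode above can be reproduced with a small plain-Scala model (not Chisel). `mux1HStyle` mimics the AND-OR circuit shown above, while `prioritySelect` is one illustrative way of reporting a single error unchanged; it is not a claim about how `ParallelMux` is built. The `ErrorInfo` type and the addresses are hypothetical.

```scala
object ErrorSelectSketch {
  // Hypothetical error record: a valid bit and the reported address.
  final case class ErrorInfo(valid: Boolean, addr: Long)

  // Model of the Mux1H circuit: each address is gated by its valid bit,
  // then all gated values are ORed together.
  def mux1HStyle(errs: Seq[ErrorInfo]): ErrorInfo = {
    val valid = errs.exists(_.valid)
    val addr = errs.map(e => if (e.valid) e.addr else 0L).foldLeft(0L)(_ | _)
    ErrorInfo(valid, addr)
  }

  // Illustrative alternative: report the first valid error as-is.
  def prioritySelect(errs: Seq[ErrorInfo]): ErrorInfo =
    errs.find(_.valid).getOrElse(ErrorInfo(valid = false, addr = 0L))
}
```

With two simultaneous errors at `0x1000` and `0x2040`, `mux1HStyle` reports the bogus address `0x3040`, while `prioritySelect` reports `0x1000`.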
|
#
51f9a957 |
| 21-Feb-2025 |
cz4e <[email protected]> |
style(LoadPipe): use `miss_req.bits.cancel` instead of `mq_enq_cancel` (#4296)
|
#
2df9c392 |
| 19-Feb-2025 |
cz4e <[email protected]> |
area(TagArray): split `TagArray` from 4way to 2way per array (#4287)
|
#
74050fc0 |
| 26-Jan-2025 |
Yanqin Li <[email protected]> |
perf(Uncache): add merge policy when entering (#4154)
# Background
## Problem
How to design a more efficient entry rule for a new load/store request when a load/store with the same address already exists in the `ubuffer`?
* **Old Design**: Always **reject** the new request.
* **New Design**: Consider **merging** requests.
## Merge Scenarios
‼️ A new request can be merged into an existing one only if both are `NC`.
1. **New Store Request:**
   1. **Existing Store:** Merge (the new store is younger).
   2. **Existing Load:** Reject.
2. **New Load Request:**
   1. **Existing Load:** Merge (the new load may be younger or older; both cases can be merged).
   2. **Existing Store:** Reject.
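The merge scenarios above reduce to a small compatibility check. This is a minimal sketch assuming a request can be summarized by its kind and an `isNC` flag; `MergeRuleSketch`, `Req` and `canMerge` are hypothetical names, and address/attribute matching is left out.

```scala
object MergeRuleSketch {
  sealed trait Kind
  case object Load extends Kind
  case object Store extends Kind
  final case class Req(kind: Kind, isNC: Boolean)

  // Both requests must be NC, and only store+store or load+load may merge;
  // a load never merges with a store (in either order).
  def canMerge(incoming: Req, existing: Req): Boolean =
    incoming.isNC && existing.isNC && incoming.kind == existing.kind
}
```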
# What does this PR do?
## 1. Entry Actions
1. **Allocate** a new entry and mark it `valid`:
   1. When there is no matching address.
2. **Allocate** a new entry and mark it `valid` and `waitSame`:
   1. When there is a matching address, and:
      * The virtual addresses and attributes are the same.
      * The older entry is either selected to issue or issued.
3. **Merge** into an existing entry:
   1. When there is a matching address, and:
      * The virtual addresses and attributes are the same.
      * The older entry is **not** selected to issue or issued.
4. **Reject** the new request:
   1. When the ubuffer is full.
   2. When there is a matching address, but:
      * The virtual addresses or attributes are **different**.
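The four entry actions can be written as one decision function. This is a minimal sketch under stated assumptions: `Match` is a hypothetical summary of a matching older entry (`None` means no address match), `sameVaddrAndAttrs` folds in the "same attributes" conditions, and a full ubuffer rejects the request first, following the literal order of the list.

```scala
object EntryActionSketch {
  sealed trait Action
  case object Allocate extends Action          // new entry, valid
  case object AllocateWaitSame extends Action  // new entry, valid + waitSame
  case object Merge extends Action             // merge into the older entry
  case object Reject extends Action

  // Hypothetical summary of the older entry with the same address.
  final case class Match(sameVaddrAndAttrs: Boolean, olderSelectedOrIssued: Boolean)

  def decide(ubufferFull: Boolean, m: Option[Match]): Action =
    if (ubufferFull) Reject
    else m match {
      case None                               => Allocate
      case Some(x) if !x.sameVaddrAndAttrs    => Reject
      case Some(x) if x.olderSelectedOrIssued => AllocateWaitSame
      case Some(_)                            => Merge
    }
}
```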
**NOTE:** According to the definition in the TL-UL SPEC, the `mask` must be continuous and naturally aligned, and the `addr` must correspond to the mask. Therefore, the "**same attributes**" here introduces a new condition: the merged `mask` must meet the requirements of being continuous and naturally aligned (function `continueAndAlign`). During merging, the block offset of addr must be synchronously updated in `UncacheEntry.update`.
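The mask condition can be checked with a few bit operations. This sketch is a hypothetical re-implementation of the idea behind `continueAndAlign`, assuming an 8-byte beat with one mask bit per byte; `merge` and its returned byte offset illustrate the address update described for `UncacheEntry.update` and are not the actual code.

```scala
object MaskSketch {
  // Legal mask: a single run of 2^k set bits that starts at a multiple of 2^k.
  def continueAndAlign(mask: Int): Boolean =
    if (mask <= 0 || mask > 0xFF) false
    else {
      val offset = Integer.numberOfTrailingZeros(mask)
      val run = mask >>> offset
      val size = Integer.bitCount(mask)
      val contiguous = (run & (run + 1)) == 0  // run is all ones
      val powerOfTwo = (size & (size - 1)) == 0
      contiguous && powerOfTwo && offset % size == 0
    }

  // Hypothetical merge step: OR the masks; if still legal, also return the
  // new byte offset, which the merged entry's address must be updated to.
  def merge(oldMask: Int, newMask: Int): Option[(Int, Int)] = {
    val merged = oldMask | newMask
    if (continueAndAlign(merged)) Some((merged, Integer.numberOfTrailingZeros(merged)))
    else None
  }
}
```

For example, merging byte masks `0x04` and `0x08` yields the aligned 2-byte mask `0x0C` at offset 2, whereas `0x02` and `0x04` produce `0x06`, which is contiguous but not naturally aligned, so the merge is rejected.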
## 2. Handshake Mechanism Between `LoadQueueUncache (M)` and `Uncache (S)`
> `mid`: master id
>
> `sid`: slave id
**Old Design:**
- `M` sends a `req` with a **`mid`**.
- `S` receives the `req` and records the **`mid`**.
- `S` sends a `resp` with the **`mid`**.
- `M` receives the `resp` and matches it with the recorded **`mid`**.
**New Design:**
- `M` sends a `req` with a **`mid`**.
- `S` receives the `req` and responds with `{mid, sid}`.
- `M` matches it with the **`mid`** and updates its record with the received **`sid`**.
- `S` sends a `resp` with its **`sid`**.
- `M` receives the `resp` and matches it with the recorded **`sid`**.
**Benefit:** The new design allows `S` to merge requests when new request enters.
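The new `mid`/`sid` handshake can be modeled in software to show why it enables merging. This is a minimal sketch with hypothetical `Slave` and `Master` classes: the slave merges any request whose address matches a live entry (ignoring the NC/kind/issue conditions above), and one response carrying a `sid` completes every master entry that recorded it.

```scala
object HandshakeSketch {
  import scala.collection.mutable

  final case class Req(mid: Int, addr: Long)
  final case class Ack(mid: Int, sid: Int)   // S -> M right after accepting a req
  final case class Resp(sid: Int, data: Long)

  final class Slave {
    private val entries = mutable.LinkedHashMap.empty[Int, Long] // sid -> addr
    private var nextSid = 0
    // Reuse the sid of a live entry with the same address (merge), else allocate.
    def accept(req: Req): Ack = {
      val sid = entries.collectFirst { case (s, a) if a == req.addr => s }.getOrElse {
        val s = nextSid
        nextSid += 1
        entries(s) = req.addr
        s
      }
      Ack(req.mid, sid)
    }
    def respond(sid: Int, data: Long): Resp = { entries.remove(sid); Resp(sid, data) }
  }

  final class Master {
    private val sidOf = mutable.Map.empty[Int, Int] // mid -> sid
    def onAck(ack: Ack): Unit = sidOf(ack.mid) = ack.sid
    // Returns the mids completed by this response.
    def onResp(resp: Resp): Seq[Int] = {
      val done = sidOf.toSeq.collect { case (mid, sid) if sid == resp.sid => mid }.sorted
      done.foreach(mid => sidOf.remove(mid))
      done
    }
  }
}
```

Two loads to `0x1000` with `mid` 3 and 5 both receive `sid` 0, so a single response for `sid` 0 completes both.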
## 3. Forwarding Mechanism
**Old Design:** Each address in the `ubuffer` is **unique**, so forwarding is straightforward based on a match.
**New Design:**
* A single address may match up to two entries in the `ubuffer`.
* If two entries match, one must be marked `inflight` and the other `waitSame`. In this case, the forwarded data is the merged data of the two entries, with the `inflight` entry being the older one.
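One way to picture the two-entry forwarding is a byte-wise merge in which bytes written by the younger (`waitSame`) entry override the older (`inflight`) entry. This is a minimal sketch assuming 8-byte data with one mask bit per byte; `ForwardSketch` and `Entry` are hypothetical names.

```scala
object ForwardSketch {
  // Hypothetical entry: 8 bytes of data and one mask bit per byte.
  final case class Entry(mask: Int, data: Long)

  // Byte i comes from the younger entry if its mask bit is set,
  // otherwise from the older entry.
  def forward(older: Entry, younger: Entry): Entry = {
    var data = 0L
    for (i <- 0 until 8) {
      val src = if (((younger.mask >> i) & 1) == 1) younger.data else older.data
      data |= src & (0xFFL << (8 * i))
    }
    Entry(older.mask | younger.mask, data)
  }
}
```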
## 4. Bug Fixes
1. In the `loadUnit`, `!tlbMiss` cannot be used directly as `tlbHit`, because `!tlbMiss` can still be true when `tlbValid` is false.
2. `Uncache` state machine transition: the state indicating "**able to send requests**" (previously `s_refill_req`, now `s_inflight`) should be entered on `acquireFire`, not on `reqFire`.
<img width="747" alt="image" src="https://github.com/user-attachments/assets/75fbc761-1da8-43d9-a0e6-615cc58cefef" />
# Evaluation
- ✅ timing
- ✅ performance
| Type           | 4B*1000 | Speedup1-IO | 1B*4096 | Speedup2-IO |
| -------------- | ------- | ----------- | ------- | ----------- |
| IO             | 51026   | 1.00        | 208149  | 1.00        |
| NC             | 42343   | 1.21        | 169248  | 1.23        |
| NC+OT          | 20379   | 2.50        | 160101  | 1.30        |
| NC+OT+mergeOpt | 16308   | 3.13        | 126369  | 1.65        |
| cache          | 1298    | 39.31       | 4410    | 47.20       |
|
#
1abade56 |
| 22-Jan-2025 |
Anzo <[email protected]> |
fix(LSU): fix cbo instruction exception handling logic (#4215)
|
#
fa5e530d |
| 21-Jan-2025 |
cz4e <[email protected]> |
timing(VSegmentUnit): duplicate latchVAddr (#4209)
* `latchVAddr` has to index every DCache data SRAM from top to bottom, which causes a large fanout, so `latchVAddr` is duplicated.
|
#
e836c770 |
| 16-Jan-2025 |
Zhaoyang You <[email protected]> |
feat(TopDown): add TopDown PMU Events (#4122)
This PR adds synthesizable hardware TopDown performance counters organized into three levels:
- Level-1: Retiring, Frontend Bound, Bad Speculation, Backend Bound.
- Level-2: Fetch Latency Bound, Fetch Bandwidth Bound, Branch Misprediction, Machine Clears, Core Bound, Memory Bound.
- Level-3: L1 Bound, L2 Bound, L3 Bound, Mem Bound, Store Bound.
|
#
5bd65c56 |
| 14-Jan-2025 |
Tang Haojin <[email protected]> |
feat(Config): add yaml parser for complicated parametrization (#4147)
This commit enables complicated parameterization by yaml parsing. We use circe to do this.
In this commit, we implement 6 configurations:
- PmemRanges: physical memory ranges
- PMAConfigs
- CHIAsyncBridge: set depth to 0 to disable it
- L2CacheConfig
- L3CacheConfig
- DebugModuleBaseAddr
For better human readability, this commit renames `WithNKBL2/3` to `L2/3CacheConfig`, turns them into case classes, and makes the first parameter accept only a human-readable size such as `0.5 MB` or `256kB`.
This commit also changes PMAConfigs and PmemRanges into lists of case classes.
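Human-readable sizes such as `0.5 MB` or `256kB` can be parsed with a short regular expression. This is a hypothetical sketch, not the commit's parser: `parseSize` is an illustrative name, and it assumes binary multiples (1 kB = 1024 B) and rejects sizes that are not a whole number of bytes.

```scala
object SizeParserSketch {
  // Number (optionally fractional), optional k/M/G prefix, then B; case-insensitive units.
  private val SizeRe = """\s*([0-9]+(?:\.[0-9]+)?)\s*([kKmMgG]?)[bB]\s*""".r

  def parseSize(s: String): Option[Long] = s match {
    case SizeRe(num, unit) =>
      val mult = unit.toLowerCase match {
        case ""  => 1L
        case "k" => 1L << 10
        case "m" => 1L << 20
        case "g" => 1L << 30
      }
      val bytes = BigDecimal(num) * BigDecimal(mult)
      if (bytes.isWhole) Some(bytes.toLong) else None
    case _ => None
  }
}
```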
|
#
4f2cafef |
| 30-Dec-2024 |
CharlieLiu <[email protected]> |
fix(DCache): fix dcache TL client parameters (#4110)
PR #3968 added a new TL port for CMOUnit in MissQueue but did not update the DCache client parameters, which made CMOUnit and the first releaseEntry share the same sourceId. This commit fixes that.
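The fix amounts to giving every requester in the client its own, non-overlapping sourceId range. This is a minimal sketch of consecutive range allocation; the client names and entry counts are hypothetical, and the actual DCache id layout is not described here.

```scala
object SourceIdSketch {
  // Half-open id range [start, end).
  final case class IdRange(start: Int, end: Int) {
    def overlaps(o: IdRange): Boolean = start < o.end && o.start < end
  }

  // Assign consecutive, disjoint ranges in the given order.
  def allocate(sizes: Seq[(String, Int)]): Map[String, IdRange] = {
    val starts = sizes.map(_._2).scanLeft(0)(_ + _)
    sizes.zip(starts).map { case ((name, n), start) => name -> IdRange(start, start + n) }.toMap
  }
}
```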
|
#
066ca249 |
| 27-Dec-2024 |
zhanglinjuan <[email protected]> |
fix(MemBlock): support non-data error handling for cacheable region (#4093)
When a DCache refill responds with `denied` or `corrupt` asserted, the loads belonging to that cache line should report a load access fault. This is accomplished by including a `corrupt` bit in the DCache MSHR forwarding and TileLink channel D forwarding logic and triggering an exception when `corrupt` is detected.
A store non-data error coming from a DCache store miss cannot trigger a precise access-fault trap, only an imprecise bus-error interrupt; it will be handled in a separate commit.
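The load-side rule can be summarized as: if a load takes its data from a forwarded refill whose `corrupt` flag is set, it reports a load access fault instead of returning data. This is a minimal sketch; `FwdSource` is a hypothetical stand-in for an MSHR or channel D forwarding result, and folding `denied` into the same flag is an assumption.

```scala
object CorruptForwardSketch {
  // Hypothetical forwarding result for the load's cache line.
  final case class FwdSource(hit: Boolean, data: Long, corrupt: Boolean)

  sealed trait LoadResult
  final case class Data(value: Long) extends LoadResult
  case object LoadAccessFault extends LoadResult

  // A forwarded hit with corrupt (or denied) set raises a fault;
  // otherwise use the forwarded data, or fall back to the normal path.
  def resolve(src: FwdSource, fallback: Long): LoadResult =
    if (src.hit && src.corrupt) LoadAccessFault
    else if (src.hit) Data(src.data)
    else Data(fallback)
}
```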
|
#
519244c7 |
| 25-Dec-2024 |
Yanqin Li <[email protected]> |
submodule(CoupledL2, OpenLLC): support pbmt in CHI scene (#4071)
* L1: deliver the NC and PMA signals of uncacheReq to L2
* L2: [support Svpbmt on CHI MemAttr](https://github.com/OpenXiangShan/CoupledL2/pull/273)
* LLC: [Non-cache requests are forwarded directly downstream without entering the slice](https://github.com/OpenXiangShan/OpenLLC/pull/28)
|
#
0b9f4b2d |
| 25-Dec-2024 |
cz4e <[email protected]> |
area(CacheOpDecoder): remove CacheOpDecoder (#4050)
* CacheOpDecoder is no longer used
|
#
8b33cd30 |
| 13-Dec-2024 |
klin02 <[email protected]> |
feat(XSLog): move all XSLog outside WhenContext for collection
Because data inside a WhenContext is not accessible from another module, this commit moves all XSLog calls and their related signals outside WhenContext to support XSLog collection. For example, `when(cond1){XSDebug(cond2, pable)}` becomes `XSDebug(cond1 && cond2, pable)`.
|
#
72dab974 |
| 16-Dec-2024 |
cz4e <[email protected]> |
feat(CtrlUnit, DCache): support L1 DCache RAS (#4009)
# L1 DCache RAS extension support
The L1 DCache supports part of the Reliability, Availability, and Serviceability (RAS) extension:
* L1 DCache protection with Single Error Correct Double Error Detect (SECDED) ECC on the RAMs, covering both the L1 DCache tag and data RAMs. Errors in the tag or data are not recovered.
* Fault handling interrupt (Bus Error Unit interrupt, BEU, 65).
* Error injection.
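To illustrate what SECDED means, here is a textbook extended Hamming (8,4) code: a Hamming(7,4) code plus one overall parity bit. This is a generic sketch over 4 data bits, not the code width or construction used in the DCache RAMs. A single flipped bit gives a non-zero syndrome (or a bad overall parity alone) and can be corrected; two flipped bits leave the overall parity intact with a non-zero syndrome and are only detected.

```scala
object SecdedSketch {
  private val dataPos = Seq(3, 5, 6, 7) // codeword positions holding data bits d0..d3

  // Bits 1..7: Hamming(7,4) with parity at positions 1, 2, 4; bit 0: overall parity.
  def encode(data: Int): Int = {
    var cw = 0
    dataPos.zipWithIndex.foreach { case (p, i) => cw |= ((data >> i) & 1) << p }
    for (p <- Seq(1, 2, 4)) {
      val parity = (1 to 7).filter(i => (i & p) != 0 && i != p).map(i => (cw >> i) & 1).sum & 1
      cw |= parity << p
    }
    cw | ((1 to 7).map(i => (cw >> i) & 1).sum & 1)
  }

  sealed trait Result
  final case class Ok(data: Int) extends Result
  final case class Corrected(data: Int) extends Result
  case object DoubleError extends Result

  def decode(cw: Int): Result = {
    // Syndrome = XOR of the positions of all set bits in 1..7.
    val syndrome = (1 to 7).filter(i => ((cw >> i) & 1) == 1).foldLeft(0)(_ ^ _)
    val overallOk = ((0 to 7).map(i => (cw >> i) & 1).sum & 1) == 0
    def extract(w: Int): Int = dataPos.zipWithIndex.map { case (p, i) => ((w >> p) & 1) << i }.sum
    (syndrome, overallOk) match {
      case (0, true)  => Ok(extract(cw))
      case (s, false) => Corrected(extract(cw ^ (1 << s))) // s == 0: the overall parity bit flipped
      case (_, true)  => DoubleError
    }
  }
}
```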
## ECC Error Detection

An error may be triggered when the L1 DCache is accessed.
* **Error Report**:
  * Tag ECC error: if an ECC error occurs on any way, an ECC error is considered to have occurred.
  * Data ECC error: an ECC error is considered to have occurred only if it is in the hit line; on a miss, it is not processed.
  * If an instruction access triggers an ECC error, it is treated as a hardware error and an exception is reported.
  * Whenever an error is detected, an error message is sent to the BEU.
  * When the hardware detects an error, it reports it to the BEU and triggers the NMI external interrupt (65).
* **Load instruction**:
  * Only tag or data ECC errors can be triggered during execution; they are reported to the BEU and a `Hardware Error` is raised.
* **Probe/Snoop**:
  * If a tag ECC error occurs, the cache state is not changed, and a `ProbeAck` with `corrupt=1` is returned to L2.
  * If a data ECC error occurs, the cache state is changed according to the usual rules. If data must be returned, a `ProbeAckData` with `corrupt=1` is returned to L2.
* **Replace/Evict**:
  * A `ReleaseData` with `corrupt=1` is sent to L2.
* **Store to L1 DCache**:
  * If a tag ECC error occurs, the cache line is released through the `Replace/Evict` process and the data is written to L1 DCache without reporting the error to L2.
  * If a data ECC error occurs, the data is written directly without reporting the error to L2.
* **Atomics**:
  * Report a `Hardware Error`; do not report the error to L2.
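The per-access rules above can be collected into a single lookup. This is a simplified, hypothetical sketch: the `Action` fields are illustrative, probe responses are treated as always carrying `corrupt=1` (the "only if data must be returned" detail is not modeled), and reporting to the BEU, which happens for every detected error, is omitted.

```scala
object RasActionSketch {
  sealed trait Access
  case object Load extends Access
  case object Probe extends Access
  case object Evict extends Access
  case object Store extends Access
  case object Atomic extends Access

  sealed trait EccError
  case object TagError extends EccError
  case object DataError extends EccError

  // hardwareError: raise a Hardware Error on the instruction
  // corruptToL2:   send the TileLink message to L2 with corrupt = 1
  // releaseLine:   release the line through the Replace/Evict path first
  final case class Action(hardwareError: Boolean, corruptToL2: Boolean, releaseLine: Boolean)

  def action(access: Access, err: EccError): Action = (access, err) match {
    case (Load | Atomic, _) => Action(hardwareError = true, corruptToL2 = false, releaseLine = false)
    case (Probe | Evict, _) => Action(hardwareError = false, corruptToL2 = true, releaseLine = false)
    case (Store, TagError)  => Action(hardwareError = false, corruptToL2 = false, releaseLine = true)
    case (Store, DataError) => Action(hardwareError = false, corruptToL2 = false, releaseLine = false)
  }
}
```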
## Error Injection

Each core's L1 DCache has a controller configured through memory-mapped registers, and each hardware unit that supports ECC has its own control bank. Once the bank registers are configured, the L1 DCache triggers an ECC error on the first subsequent L1 DCache access.

<div style="text-align: center;"> <img src="https://github.com/user-attachments/assets/8c4d23c5-0324-4e52-bcf4-29b47a282d72" alt="err_inject" width="200" /> </div>
### Address Space

The address space `0x38022000`-`0x3802207F` (128 bytes in total) is local to each hart.

<div style="text-align: center;"> <img width="292" alt="ctl_bank" src="https://github.com/user-attachments/assets/89f88b24-37a4-4786-a192-401759eb95cf"> </div>
### L1 DCache Control Bank

Each control bank contains three registers, `ECCCTL`, `ECCEID`, and `ECCMASK`, each 8 bytes wide.

<img width="414" alt="eccctl" src="https://github.com/user-attachments/assets/b22ff437-d05d-4b3c-a353-dbea1afdc156">

* ECCCTL (ECC Control): ECC injection control register.
  * `ese` (error signaling enable): indicates that the injection is valid; initialized to 0. When the injection succeeds and `pst==0`, `ese` is cleared.
  * `pst` (persist): continuous injection. When `pst==1`, once the `ECCEID` counter has reached 0 and the injection has succeeded, the counter is reloaded with the last value written to `ECCEID` and the error is injected again; when `pst==0`, the error is injected only once.
  * `ede` (error delay enable): indicates that the counter is valid; initialized to 0.
    * If `ese==1` and `ede==0`, error injection takes effect immediately.
    * If `ese==1` and `ede==1`, injection takes effect only after `ECCEID` has decremented to 0.
  * `cmp` (component): injection target; initialized to 0.
    * 1'b0: the injection target is the tag.
    * 1'b1: the injection target is the data.
  * `bank`: bank valid signal; initialized to 0. When a bit in `bank` is set, the corresponding mask is valid.

<img width="414" alt="ecceid" src="https://github.com/user-attachments/assets/8cea0d8d-2540-44b1-b1f9-c1ed6ec5341e">
* ECCEID (ECC Error Inject Delay): ECC injection delay controller.
  * When `ese==1` and `ede==1`, it decrements until it reaches 0. It currently runs on the core clock, which could also be divided. Because ECC injection relies on an L1 DCache access, the moment `ECCEID` reaches 0 and the moment the ECC error is actually triggered may differ.
<img width="414" alt="eccmask" src="https://github.com/user-attachments/assets/b1be83fd-17a6-4324-8aa6-45858249c476">
* ECCMASK (ECC Mask): ECC injection mask register.
  * A 0 bit leaves the corresponding bit unchanged; a 1 bit flips it. Tag injection only uses the bits of `ECCMASK0` that correspond to the tag width.
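The interaction of `ese`, `pst`, `ede`, `ECCEID` and `ECCMASK` can be modeled as a small state machine. This is a hypothetical software model, not the RTL: field names follow the text, the counter is assumed to decrement once per `tick()`, injection happens on the next access once armed, and the `cmp`/`bank` fields and the register bit layout (shown only in the images) are not modeled.

```scala
object InjectSketch {
  final class ControlBank(var ese: Boolean, val pst: Boolean, val ede: Boolean,
                          val eid: Long, val mask: Long) {
    private var counter = eid

    // One core clock: ECCEID counts down while ese && ede.
    def tick(): Unit = if (ese && ede && counter > 0) counter -= 1

    private def armed: Boolean = ese && (!ede || counter == 0)

    // An L1 DCache access: returns the stored word, flipped by ECCMASK if armed.
    def onAccess(word: Long): Long =
      if (!armed) word
      else {
        if (pst) counter = eid // persist: reload the delay and inject again later
        else ese = false       // one-shot: clear ese after a successful injection
        word ^ mask
      }
  }
}
```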
### Error Injection Example

```
# set control bank base address
li   x3, $(BASEADDR)

# set eid
li   x5, 500           # delay 500 cycles
sd   x5, 8(x3)         # mmio store

# set mask
li   x5, 0x1           # flip bit 0
sd   x5, 16(x3)        # mmio store

# set ctl
li   x5, 0x7           # cmp = 0, ede = 1, pst = 1, ese = 1
sd   x5, 0(x3)         # mmio store
```
|
#
b240e1c0 |
| 07-Nov-2024 |
Anzooooo <[email protected]> |
feat(Zicclsm): refactoring misalign and support vector misalign
|
#
38c29594 |
| 26-Nov-2024 |
zhanglinjuan <[email protected]> |
feat(MemBlock): add support for Zacas extension
fix(AtomicsUnit, MemBlock): fix loss of multiple stds
In the previous design, AtomicsUnit receives stds from StdExeUnit and arbitrates at most one std uop per cycle. This works fine for most AMOs and LR/SC because they require only one std uop. However, AMOCAS requires at least two std uops, which may be issued from two separate issue queues at the same time, leading to the loss of std uops.
This commit fixes this by taking all the outputs of the StdExeUnits into account with arbitration logic.
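The difference between accepting one std uop per cycle and capturing every valid StdExeUnit output can be shown with a small model. This is a hypothetical sketch: `StdUop.idx` (which of the AMO's std uops this is) and `StdCollector` are illustrative names, not the AtomicsUnit's interface.

```scala
object StdCaptureSketch {
  // Hypothetical std uop: idx selects which of the AMO's std uops it is.
  final case class StdUop(idx: Int, data: Long)

  // Old behaviour as described: at most one std uop is accepted per cycle.
  def arbitrateOne(outs: Seq[Option[StdUop]]): Seq[StdUop] = outs.flatten.take(1)

  // Fixed behaviour: every valid output in the cycle is recorded in its slot.
  final class StdCollector(required: Int) {
    private val slots = Array.fill[Option[Long]](required)(None)
    def cycle(outs: Seq[Option[StdUop]]): Unit =
      outs.flatten.foreach(u => slots(u.idx) = Some(u.data))
    def complete: Boolean = slots.forall(_.isDefined)
  }
}
```

For an AMOCAS whose two std uops arrive in the same cycle, `arbitrateOne` keeps only one of them, while the collector is complete after that single cycle.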
fix(AtomicsUnit): DCache req can only be sent at `s_cache_req`
fix(AtomicsUnit, difftest): fix difftest io for atomic events
fix(MainPipe): fix precedence of `&` and `=/=` operator
fix(MainPipe): AMOCAS should not wait for AMOALU
fix(MemBlock): remove unnecessary assertion
fix(MainPipe): only CAS instruction can assert `s3_cas_fail`
fix(AtomicsUnit): fix bug in data select logic
submodule(difftest): bump difftest
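The `&` vs `=/=` precedence fix follows from Scala's operator rules: an operator's precedence is set by its first character, and `=` binds tighter than `&`, so `a & b =/= c` parses as `a & (b =/= c)`. The sketch below demonstrates this with a hypothetical `Sig` stand-in for a hardware signal; it is not the MainPipe code.

```scala
object PrecedenceSketch {
  // Minimal stand-in for a signal with Chisel-style operators.
  final case class Sig(v: Int) {
    def &(o: Sig): Sig = Sig(v & o.v)
    def =/=(o: Sig): Sig = Sig(if (v != o.v) 1 else 0)
  }

  val a = Sig(1); val b = Sig(2); val c = Sig(0)
  // Parsed as a & (b =/= c) because '=' has higher precedence than '&'.
  val unparenthesized: Sig = a & b =/= c
  // What was most likely intended.
  val intended: Sig = (a & b) =/= c
}
```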
|
#
e04c5f64 |
| 19-Nov-2024 |
Yanqin Li <[email protected]> |
feat(outstanding): support nc outstanding and remove mmio st outstanding
|
#
c7353d05 |
| 03-Sep-2024 |
Yanqin Li <[email protected]> |
feat(NCld): support WMO access for NC ld
* feat(LDU): add support for NC in LoadUnit
* feat(LQ,UB): add support for NC in load queue and uncache buffer
* chore(pbmt): add xsperf for nc ld statistic
|
#
dc4fac13 |
| 02-Dec-2024 |
CharlieLiu <[email protected]> |
feat(DCache): merge CMO requests into DCache TL-A Channel (#3968)
* remove previous cmo datapath in memblock.
* add datapath for cmo requests between lsq and dcache.
* add new CMOUnit in MissQueue.
* bump rocket-chip & coupledL2.
|
#
b34797bc |
| 25-Nov-2024 |
cz4e <[email protected]> |
area(DCache ECC): combine ecc with tag/data (#3902)
|
#
b32e9518 |
| 08-Nov-2024 |
Huijin Li <[email protected]> |
power(MemBlock): add ClockGate for DCache SRAM (#3824)
Adding ClockGate to the DCache SRAM reduces memory power by 64%
and total MemBlock power by 23.38%.
|
#
4a2e3bec |
| 26-Sep-2024 |
Tang Haojin <[email protected]> |
fix(Pmem): memory range should be 'or'ed rather than 'and'ed (#3651)
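The fix means an address is physical memory if it lies in any configured range, not in every range. A minimal sketch with hypothetical `MemRange` values:

```scala
object PmemSketch {
  // Half-open range [lower, upper).
  final case class MemRange(lower: Long, upper: Long) {
    def contains(addr: Long): Boolean = lower <= addr && addr < upper
  }

  // Correct: per-range matches are ORed.
  def isPmem(ranges: Seq[MemRange], addr: Long): Boolean = ranges.exists(_.contains(addr))

  // Buggy variant: per-range matches are ANDed, which fails as soon as
  // there is more than one disjoint range.
  def isPmemAnded(ranges: Seq[MemRange], addr: Long): Boolean = ranges.forall(_.contains(addr))
}
```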
|
#
45def856 |
| 21-Sep-2024 |
Tang Haojin <[email protected]> |
refactor(Pmem): use `Seq` for physical memory ranges (#3622)
|
#
af95bc32 |
| 20-Sep-2024 |
Haoyuan Feng <[email protected]> |
fix(prefetch): MMIO address should not send prefetch requests (#3615)
TODO: Prefetcher should check pmp & pma in order to decide whether to
send requests
|