History log of /XiangShan/src/main/scala/xiangshan/cache/dcache/DCacheWrapper.scala (Results 1 – 25 of 153)
Revision Date Author Comments
# ebe07d61 20-Mar-2025 梁森 Liang Sen <[email protected]>

feat(dfx): reuse dcache data sram read data register as mbist pipeline (#4371)

Co-authored-by: sfencevma <[email protected]>


# 10cfb21d 03-Mar-2025 cz4e <[email protected]>

fix(DCache): use `ParallelMux` instead of `Mux1H` (#4340)

* When there are multiple errors, `Mux1H` is equivalent to using `|`. For
example:

* error 0, valid = 1, addr0 = 0x1000
* error 1, valid = 1, addr1 = 0x0ffff
* the result is `io.error.valid == 1`, but `io.error.bits.addr == (addr0
| addr1)`, because `Mux1H` generates a circuit like this:
```
addr = (valid0 ? addr0 : 'h0) |
(valid1 ? addr1 : 'h0)
```
* This problem can be avoided by using `ParallelMux`
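
For illustration, a minimal Chisel sketch (not the actual DCache code; signal widths and names are assumptions) showing how `Mux1H` ORs the selected values when two errors are valid at once, while a mux-tree reduction keeps a single well-formed address:

```scala
import chisel3._
import chisel3.util._

class ErrorSelSketch extends Module {
  val io = IO(new Bundle {
    val valid     = Input(Vec(2, Bool()))
    val addr      = Input(Vec(2, UInt(48.W)))
    val mux1hAddr = Output(UInt(48.W))
    val treeAddr  = Output(UInt(48.W))
  })
  // Mux1H assumes one-hot selects: with valid(0) && valid(1) this is addr(0) | addr(1).
  io.mux1hAddr := Mux1H(io.valid, io.addr)
  // Mux-tree reduction: exactly one of the valid addresses is reported.
  io.treeAddr := io.valid.zip(io.addr).reduce { case ((v0, a0), (v1, a1)) =>
    (v0 || v1, Mux(v0, a0, a1))
  }._2
}
```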



# 51f9a957 21-Feb-2025 cz4e <[email protected]>

style(LoadPipe): use `miss_req.bits.cancel` instead of `mq_enq_cancel` (#4296)


# 2df9c392 19-Feb-2025 cz4e <[email protected]>

area(TagArray): split `TagArray` from 4way to 2way per array (#4287)


# 74050fc0 26-Jan-2025 Yanqin Li <[email protected]>

perf(Uncache): add merge policy when entering (#4154)

# Background

## Problem

How to design a more efficient entry rule for a new load/store request
when a load/store with the same address already exists in the `ubuffer`?

* **Old Design**: Always **reject** the new request.
* **New Design**: Consider **merging** requests.

## Merge Scenarios

‼️ If the new request can be merged into the existing one, both need to be
`NC`.

1. **New Store Request:**
1. **Existing Store:** Merge (the new store is younger).
2. **Existing Load:** Reject.

2. **New Load Request:**
1. **Existing Load:** Merge (the new load may be younger or older. Both
are ok to merge).
2. **Existing Store:** Reject.

# What does this PR do?

## 1. Entry Actions

1. **Allocate** a new entry and mark it as `valid`:
1. When there is no matching address.
2. **Allocate** a new entry and mark it as `valid` and `waitSame`:
1. When there is a matching address, and:
* The virtual addresses and attributes are the same.
* The older entry is either selected to issue or issued.
3. **Merge** into an Existing Entry:
1. When there is a matching address, and:
* The virtual addresses and attributes are the same.
* The older entry is **not** selected to issue or issued.
4. **Reject** the New Request:
1. When the ubuffer is full.
2. When there is a matching address, but:
* The virtual addresses or attributes are **different**.

**NOTE:** According to the definition in the TL-UL SPEC, the `mask` must
be continuous and naturally aligned, and the `addr` must correspond to
the mask. Therefore, the "**same attributes**" here introduces a new
condition: the merged `mask` must meet the requirements of being
continuous and naturally aligned (function `continueAndAlign`). During
merging, the block offset of addr must be synchronously updated in
`UncacheEntry.update`.
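
A minimal sketch of that continuity/alignment condition (an illustration under the TL-UL rule stated above, not the actual `continueAndAlign` helper; the 8-byte beat width is an assumption):

```scala
import chisel3._
import chisel3.util._

object ContinueAndAlignSketch {
  // A TL-UL mask is legal iff it is 2^k contiguous ones starting at a
  // 2^k-aligned offset; for an 8-byte beat the legal patterns can simply
  // be enumerated.
  def apply(mask: UInt): Bool = {
    require(mask.getWidth == 8)
    val legalPatterns = for {
      k   <- 0 to 3                  // block sizes 1, 2, 4, 8 bytes
      off <- 0 until 8 by (1 << k)   // naturally aligned offsets
    } yield (((BigInt(1) << (1 << k)) - 1) << off).U(8.W)
    legalPatterns.map(_ === mask).reduce(_ || _)
  }
}
```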

## 2. Handshake Mechanism Between `LoadQueueUncache (M)` and `Uncache
(S)`

> `mid`: master id
>
> `sid`: slave id

**Old Design:**

- `M` sends a `req` with a **`mid`**.
- `S` receives the `req`, records the **`mid`**.
- `S` sends a `resp` with the **`mid`**.
- `M` receives the `resp` and matches it with the recorded **`mid`**.

**New Design:**

- `M` sends a `req` with a **`mid`**.
- `S` receives the `req` and responds with `{mid, sid}`.
- `M` matches it with the **`mid`** and updates its record with the
received **`sid`**.
- `S` sends a `resp` with its **`sid`**.
- `M` receives the `resp` and matches it with the recorded **`sid`**.

**Benefit:** The new design allows `S` to merge requests when a new
request enters.
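
A hedged sketch of the id exchange (the bundle and field names below are illustrative, not the actual `LoadQueueUncache`/`Uncache` interfaces):

```scala
import chisel3._

// M -> S request: carries the master id.
class UbufReqSketch(midBits: Int) extends Bundle {
  val mid = UInt(midBits.W)
}
// S -> M immediate id response: echoes mid so M can find its entry, and hands
// back the slave entry id that M must record.
class UbufIdRespSketch(midBits: Int, sidBits: Int) extends Bundle {
  val mid = UInt(midBits.W)
  val sid = UInt(sidBits.W)
}
// S -> M final response: matched by sid only. Because S assigns the sid, it is
// free to merge a new request into an existing entry and return that entry's sid.
class UbufRespSketch(sidBits: Int) extends Bundle {
  val sid = UInt(sidBits.W)
}
```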

## 3. Forwarding Mechanism

**Old Design:** Each address in the `ubuffer` is **unique**, so
forwarding is straightforward based on a match.

**New Design:**

* A single address may have up to two entries matched in the `ubuffer`.
* If it has two matched entries, it must be true that one entry is marked
`inflight` and the other entry is marked `waitSame`. In this case, the
forwarded data comes from the merged data of two entries, with the
`inflight` entry being the older one.
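
A minimal sketch of the two-entry forward merge (assuming, as in store-to-load forwarding, that the younger `waitSame` entry's bytes override the older `inflight` entry's bytes; widths and names are assumptions):

```scala
import chisel3._
import chisel3.util._

object MergeForwardSketch {
  // oldData/oldMask come from the inflight (older) entry,
  // youngData/youngMask from the waitSame (younger) entry.
  def apply(oldData: UInt, oldMask: UInt, youngData: UInt, youngMask: UInt): (UInt, UInt) = {
    val nBytes = oldMask.getWidth
    val bytes = VecInit((0 until nBytes).map { i =>
      Mux(youngMask(i), youngData(8 * i + 7, 8 * i), oldData(8 * i + 7, 8 * i))
    })
    (bytes.asUInt, oldMask | youngMask)
  }
}
```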

## 4. Bug Fixes

1. In the `loadUnit`, `!tlbMiss` cannot be directly used as `tlbHit`,
because when `tlbValid` is false, `!tlbMiss` can still be true.
2. `Uncache` state machine transition: The state indicating "**able to
send requests**" (previously `s_refill_req`, now `s_inflight`) should
not be triggered by `reqFire` but rather by `acquireFire`.

![image](https://github.com/user-attachments/assets/75fbc761-1da8-43d9-a0e6-615cc58cefef)

# Evaluation

- ✅ timing
- ✅ performance

| Type | 4B*1000 | Speedup1-IO | 1B*4096 | Speedup2-IO |
| -------------- | ------- | ----------- | ------- | ----------- |
| IO | 51026 | 1 | 208149 | 1.00 |
| NC | 42343 | 1.21 | 169248 | 1.23 |
| NC+OT | 20379 | 2.50 | 160101 | 1.30 |
| NC+OT+mergeOpt | 16308 | 3.13 | 126369 | 1.65 |
| cache | 1298 | 39.31 | 4410 | 47.20 |



# 1abade56 22-Jan-2025 Anzo <[email protected]>

fix(LSU): fix cbo instruction exception handling logic (#4215)


# fa5e530d 21-Jan-2025 cz4e <[email protected]>

timing(VSegmentUnit): duplicate latchVAddr (#4209)

* `latchVAddr` needs to index all DCache data SRAMs from top to bottom,
which causes a large fanout, so `latchVAddr` is duplicated.
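
A hedged sketch of the duplication (not the VSegmentUnit code; the copy count and widths are assumptions):

```scala
import chisel3._
import chisel3.util._

class LatchVaddrDupSketch(numCopies: Int, vaddrBits: Int) extends Module {
  val io = IO(new Bundle {
    val fire   = Input(Bool())
    val vaddr  = Input(UInt(vaddrBits.W))
    val copies = Output(Vec(numCopies, UInt(vaddrBits.W)))
  })
  // Each copy is an independent register with identical contents, so each group
  // of data SRAM index ports is driven by a local register instead of a single
  // register fanning out to all of them.
  io.copies := VecInit(Seq.fill(numCopies)(RegEnable(io.vaddr, io.fire)))
}
```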


# e836c770 16-Jan-2025 Zhaoyang You <[email protected]>

feat(TopDown): add TopDown PMU Events (#4122)

This PR adds hardware synthesizable three-level categorized TopDown
performance counters.
Level-1: Retiring, Frontend Bound, Bad Speculation, Backend Bound.
Level-2: Fetch Latency Bound, Fetch Bandwidth Bound, Branch
Misprediction, Machine Clears, Core Bound, Memory Bound.
Level-3: L1 Bound, L2 Bound, L3 Bound, Mem Bound, Store Bound.



# 5bd65c56 14-Jan-2025 Tang Haojin <[email protected]>

feat(Config): add yaml parser for complicated parametrization (#4147)

This commit enables complicated parameterization by yaml parsing. We use
circe to do this.

In this commit, we implement 6 configurations:

- PmemRanges: physical memory ranges
- PMAConfigs
- CHIAsyncBridge: set depth to 0 to disable it
- L2CacheConfig
- L3CacheConfig
- DebugModuleBaseAddr

For better human-readability, this commit renames `WithNKBL2/3` to
`L2/3CacheConfig`, converts them to case classes, and makes the first
parameter accept only human-readable size configurations like `0.5 MB`
or `256kB`.

This commit also changes PMAConfigs and PmemRanges into List of case
classes.
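
For illustration, a plain-Scala sketch of parsing such human-readable sizes into a byte count (this is not the actual circe-based parser in the commit; the helper name is hypothetical):

```scala
object SizeParseSketch {
  // Accepts strings like "0.5 MB" or "256kB" and returns a byte count.
  def parseSize(s: String): Long = {
    val pattern = """(?i)\s*([\d.]+)\s*([kmg])?b\s*""".r
    s match {
      case pattern(num, unit) =>
        val scale = Option(unit).map(_.toLowerCase) match {
          case Some("k") => 1L << 10
          case Some("m") => 1L << 20
          case Some("g") => 1L << 30
          case _         => 1L
        }
        (num.toDouble * scale).toLong
      case _ => throw new IllegalArgumentException(s"unrecognized size: $s")
    }
  }
}
// Example: SizeParseSketch.parseSize("0.5 MB") == 524288L
```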



# 4f2cafef 30-Dec-2024 CharlieLiu <[email protected]>

fix(DCache): fix dcache TL client parameters (#4110)

The previous PR #3968 added a new TL port for the CMOUnit in the MissQueue
but did not update the DCache client config, which made the CMOUnit and
the first releaseEntry share the same sourceId. This is now fixed.
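
A hedged sketch of the idea behind the fix (the entry counts, names, and client layout are assumptions, not the XiangShan configuration; `TLMasterParameters.v1` and `IdRange` come from rocket-chip):

```scala
import freechips.rocketchip.diplomacy.IdRange
import freechips.rocketchip.tilelink.TLMasterParameters

object DCacheClientSketch {
  // Illustrative entry counts only.
  val nMissEntries    = 16
  val nCmoEntries     = 1
  val nReleaseEntries = 18

  // Each requester gets its own, non-overlapping sourceId range, so a CMOUnit
  // request can never alias the first releaseEntry's id.
  val clients = Seq(
    TLMasterParameters.v1(
      name     = "dcache-miss",
      sourceId = IdRange(0, nMissEntries)),
    TLMasterParameters.v1(
      name     = "dcache-cmo",
      sourceId = IdRange(nMissEntries, nMissEntries + nCmoEntries)),
    TLMasterParameters.v1(
      name     = "dcache-release",
      sourceId = IdRange(nMissEntries + nCmoEntries,
                         nMissEntries + nCmoEntries + nReleaseEntries))
  )
}
```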



# 066ca249 27-Dec-2024 zhanglinjuan <[email protected]>

fix(MemBlock): support non-data error handling for cacheable region (#4093)

When a DCache refill response has `denied` or `corrupt` asserted, the
loads belonging to that cache line should report a load access fault.
This is accomplished by including a `corrupt` bit in the DCache MSHR
forwarding and TileLink channel D forwarding logic and triggering an
exception when `corrupt` is detected.

A store non-data error caused by a DCache store miss cannot trigger a
precise access fault trap, only an imprecise bus-error interrupt; that
will be covered in another commit.
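
A minimal sketch of the idea (bundle and signal names are illustrative, not the actual DCache forwarding interfaces):

```scala
import chisel3._

// Forwarded refill data now carries a corrupt bit, set when the TL-D beat
// arrived with denied/corrupt asserted.
class RefillForwardSketch extends Bundle {
  val valid   = Bool()
  val data    = UInt(64.W)
  val corrupt = Bool()
}

class LoadFaultSketch extends Module {
  val io = IO(new Bundle {
    val fwd             = Input(new RefillForwardSketch)
    val loadAccessFault = Output(Bool())
  })
  // A load consuming a corrupt forwarded beat reports a load access fault.
  io.loadAccessFault := io.fwd.valid && io.fwd.corrupt
}
```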



# 519244c7 25-Dec-2024 Yanqin Li <[email protected]>

submodule(CoupledL2, OpenLLC): support pbmt in CHI scene (#4071)

* L1: deliver the NC and PMA signals of uncacheReq to L2
* L2: [support Svpbmt on CHI
MemAttr](https://github.com/OpenXiangShan/CoupledL2/pull/273)
* LLC: [Non-cache requests are forwarded directly downstream without
entering the slice](https://github.com/OpenXiangShan/OpenLLC/pull/28)



# 0b9f4b2d 25-Dec-2024 cz4e <[email protected]>

area(CacheOpDecoder): remove CacheOpDecoder (#4050)

* CacheOpDecoder is no longer used


# 8b33cd30 13-Dec-2024 klin02 <[email protected]>

feat(XSLog): move all XSLog outside WhenContext for collection

Data inside a WhenContext is not accessible from another module, so to
support XSLog collection we move all XSLog calls and related signals
outside the WhenContext. For example, `when(cond1){XSDebug(cond2, pable)}`
becomes `XSDebug(cond1 && cond2, pable)`.
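
A short before/after sketch (the import path and the `io.fire`/`io.miss` signals are assumptions; the `XSDebug(cond, pable)` form is the one described above):

```scala
import chisel3._
import utility.XSDebug  // assumed import path for XiangShan's logging helper

class MissLogSketch extends Module {
  val io = IO(new Bundle {
    val fire = Input(Bool())
    val miss = Input(Bool())
    val addr = Input(UInt(48.W))
  })
  // Before (data inside the WhenContext is not collectible from another module):
  //   when(io.fire) { XSDebug(io.miss, p"miss addr=${Hexadecimal(io.addr)}\n") }
  // After: the when condition is folded into the log predicate and the call
  // sits at module scope.
  XSDebug(io.fire && io.miss, p"miss addr=${Hexadecimal(io.addr)}\n")
}
```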



# 72dab974 16-Dec-2024 cz4e <[email protected]>

feat(CtrlUnit, DCache): support L1 DCache RAS (#4009)

# L1 DCache RAS extension support

The L1 DCache supports part of the Reliability, Availability, and
Serviceability (RAS) extension:
* L1 DCache protection with Single Error Correct Double Error Detect
(SECDED) ECC on the RAMs, covering the L1 DCache tag and data RAMs.
Erroneous tags or data are not recovered.
* Fault handling interrupt (Bus Error Unit interrupt, BEU, 65)
* Error injection

## ECC Error Detect
An error might be triggered when accessing the L1 DCache.
* **Error Report** (see the sketch after this list):
* Tag ECC Error: if an ECC error occurs on any way, a tag ECC error is
reported.
* Data ECC Error: an ECC error in the hit line is reported as a data ECC
error. If there is no hit, it is not processed.
* If an instruction access triggers an ECC error, it is treated as a
hardware error and an exception is reported.
* Whenever an error is triggered, an error message is sent to the BEU.
* When the hardware detects an error, it reports it to the BEU and
triggers the NMI external interrupt (65).

* **Load instruction**:
* Only tag or data ECC errors can be triggered during execution; the
errors are reported to the BEU and a `Hardware Error` is raised.

* **Probe/Snoop**:
* If a tag ECC error occurs, the cache state does not need to change,
and a `ProbeAck` with `corrupt=1` is returned to L2.
* If a data ECC error occurs, the cache state is changed according to
the rules. If data needs to be returned, a `ProbeAckData` with
`corrupt=1` is returned to L2.

* **Replace/Evict**:
* A `ReleaseData` with `corrupt=1` is returned to L2.

* **Store to L1 DCache**:
* If a tag ECC error occurs, the cacheline is released according to the
`Replace/Evict` process and the data is written to the L1 DCache without
reporting an error to L2.
* If a data ECC error occurs, the data is written directly without
reporting the error to L2.

* **Atomics**:
* Report a `Hardware Error`; do not report the error to L2.
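
A hedged sketch of the tag/data reporting rule from the Error Report item above (all signal names and the way count are assumptions):

```scala
import chisel3._

class EccReportSketch(nWays: Int) extends Module {
  val io = IO(new Bundle {
    val tagEccErr   = Input(Vec(nWays, Bool()))
    val dataEccErr  = Input(Vec(nWays, Bool()))
    val hitWayOH    = Input(UInt(nWays.W))   // one-hot hit vector, all zeros on a miss
    val reportToBeu = Output(Bool())
  })
  // A tag ECC error on any way is reported; a data ECC error is reported only
  // when it is on the hit way.
  val tagError  = io.tagEccErr.asUInt.orR
  val dataError = (io.dataEccErr.asUInt & io.hitWayOH).orR
  io.reportToBeu := tagError || dataError
}
```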

## Error Inject
Each core's L1 DCache is configured with a memory-mapped,
register-controlled injection controller, and each hardware unit that
supports ECC is configured with a control bank. After the bank registers
are configured, the L1 DCache triggers an ECC error on the first
subsequent L1 DCache access.
![err_inject](https://github.com/user-attachments/assets/8c4d23c5-0324-4e52-bcf4-29b47a282d72)

### Address Space
The address space is `0x38022000`-`0x3802207F`, 128 bytes in total;
this space is local to each hart.
![ctl_bank](https://github.com/user-attachments/assets/89f88b24-37a4-4786-a192-401759eb95cf)

### L1 DCache Control Bank
Each control bank contains the registers `ECCCTL`, `ECCEID`, and
`ECCMASK`; each register is 8 bytes.

![eccctl](https://github.com/user-attachments/assets/b22ff437-d05d-4b3c-a353-dbea1afdc156)
* ECCCTL (ECC Control): ECC injection control register.
* `ese (error signaling enable)`: Indicates that injection is armed;
initialized to 0. When the injection succeeds and `pst==0`, `ese` is
cleared.
* `pst (persist)`: Persistent injection. When `pst==1`, after the
`ECCEID` counter reaches 0 and the injection succeeds, the injection
timer is reloaded with the last configured `ECCEID` and injection is
re-armed; when `pst==0`, injection happens only once.
* `ede (error delay enable)`: Indicates that the counter is valid;
initialized to 0.
* If `ese==1` and `ede==0`, error injection takes effect immediately.
* If `ese==1` and `ede==1`, injection takes effect only after `ECCEID`
decrements to 0.
* `cmp (component)`: Injection target, initialized to 0.
* 1'b0: the injection target is the tag.
* 1'b1: the injection target is the data.
* `bank`: Bank valid bits, initialized to 0. When a bit in `bank` is
set, the corresponding mask is valid.

![ecceid](https://github.com/user-attachments/assets/8cea0d8d-2540-44b1-b1f9-c1ed6ec5341e)

* ECCEID (ECC Error Inject Delay): ECC injection delay counter.
* When `ese==1` and `ede==1`, it starts to decrement until it reaches 0.
It currently runs at the core clock, but it could also be divided.
Since ECC injection relies on an L1 DCache access, the time at which
`ECCEID` reaches 0 and the time at which the ECC error is actually
triggered may not be the same.

![eccmask](https://github.com/user-attachments/assets/b1be83fd-17a6-4324-8aa6-45858249c476)

* ECCMASK (ECC Mask): ECC injection mask register.
* A 0 bit means no inversion; a 1 bit means flip the corresponding bit.
Tag injection only uses the bits of `ECCMASK0` corresponding to the tag
width.
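
A hedged Chisel sketch of the three registers' fields as described above (only the field names and meanings come from the text; widths and bit layout are assumptions):

```scala
import chisel3._

class EccCtlSketch extends Bundle {
  val bank = UInt(8.W)  // per-bank mask-valid bits (width assumed)
  val cmp  = Bool()     // injection target: 0 = tag, 1 = data
  val ede  = Bool()     // error delay enable: wait for ECCEID to reach 0
  val pst  = Bool()     // persist: reload ECCEID and re-arm after each successful injection
  val ese  = Bool()     // error signaling enable; cleared after injection when pst == 0
}

class EccEidSketch extends Bundle {
  val eid = UInt(64.W)  // injection delay counter, decremented while ese && ede
}

class EccMaskSketch extends Bundle {
  val mask = UInt(64.W) // 0 = keep the bit, 1 = flip it; tag injection uses the low tag-width bits of ECCMASK0
}
```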

### Error Inject Example
```
# set control bank base address
li x3, $(BASEADDR)

# set eid
li x5, 500   # delay 500 cycles
sd x5, 8(x3) # mmio store

# set mask
li x5, 0x1    # flip bit 0
sd x5, 16(x3) # mmio store

# set ctl
li x5, 0x7   # cmp = 0, ede = 1, pst = 1, ese = 1
sd x5, 0(x3) # mmio store
```



# b240e1c0 07-Nov-2024 Anzooooo <[email protected]>

feat(Zicclsm): refactoring misalign and support vector misalign


# 38c29594 26-Nov-2024 zhanglinjuan <[email protected]>

feat(MemBlock): add support for Zacas extension

fix(AtomicsUnit, MemBlock): fix loss of multiple stds

In the previous design, the AtomicsUnit receives stds from the StdExeUnit
and arbitrates at most one std uop per cycle. This works fine for most
AMOs and LR/SC because they require only one std uop. However, AMOCAS
requires at least two std uops, which may be issued from two separate
issue queues at the same time, leading to the loss of std uops.

This commit fixes this by taking all the outputs of the StdExeUnits into
account with arbitration logic.

fix(AtomicsUnit): DCache req can only be sent at `s_cache_req`

fix(AtomicsUnit, difftest): fix difftest io for atomic events

fix(MainPipe): fix precedence of `&` and `=/=` operator

fix(MainPipe): AMOCAS should not wait for AMOALU

fix(MemBlock): remove unnecessary assertion

fix(MainPipe): only CAS instruction can assert `s3_cas_fail`

fix(AtomicsUnit): fix bug in data select logic

submodule(difftest): bump difftest



# e04c5f64 19-Nov-2024 Yanqin Li <[email protected]>

feat(outstanding): support nc outstanding and remove mmio st outstanding


# c7353d05 03-Sep-2024 Yanqin Li <[email protected]>

feat(NCld): support WMO access for NC ld

* feat(LDU): add support for NC in LoadUnit

* feat(LQ,UB): add support for NC in load queue and uncache buffer

* chore(pbmt): add xsperf for nc ld statistic


# dc4fac13 02-Dec-2024 CharlieLiu <[email protected]>

feat(DCache): merge CMO requests into DCache TL-A Channel (#3968)

* remove previous cmo datapath in memblock.
* add datapath for cmo requests between lsq and dcache.
* add new CMOUnit in MissQueue.
* bump rocket-chip & coupledL2.



# b34797bc 25-Nov-2024 cz4e <[email protected]>

area(DCache ECC): combine ecc with tag/data (#3902)


# b32e9518 08-Nov-2024 Huijin Li <[email protected]>

power(MemBlock): add ClockGate for DCache SRAM (#3824)

By using ClockGate for the DCache SRAM, memory power is reduced by 64%
and MemBlock total power is reduced by 23.38%.


# 4a2e3bec 26-Sep-2024 Tang Haojin <[email protected]>

fix(Pmem): memory range should be 'or'ed rather than 'and'ed (#3651)


# 45def856 21-Sep-2024 Tang Haojin <[email protected]>

refactor(Pmem): use `Seq` for physical memory ranges (#3622)


# af95bc32 20-Sep-2024 Haoyuan Feng <[email protected]>

fix(prefetch): MMIO address should not send prefetch requests (#3615)

TODO: Prefetcher should check pmp & pma in order to decide whether to
send requests

